markitect-tool/workplans/MKTT-WP-0019-source-adapter-contract-refinement.md

id, type, title, domain, status, owner, topic_slug, planning_priority, planning_order, depends_on_workplans, related_workplans, created, updated, state_hub_workstream_id
MKTT-WP-0019 workplan Source Adapter Contract Refinement markitect done markitect-tool markitect complete 142
MKTT-WP-0013
MKTT-WP-0017
MKTT-WP-0018
MKTT-WP-0010
MKTT-WP-0011
2026-05-14 2026-05-14 10a85934-a4b2-4661-83f7-92ac8d322af4

MKTT-WP-0019: Source Adapter Contract Refinement

Purpose

Refine the source adapter contract before implementing MKTT-WP-0018. The goal is to remove the remaining ambiguity in the external adapter surface so markitect-tool can implement the framework and markitect-filter can implement EPUB3 without guessing about model fields, entry points, CLI behavior, or contract-test expectations.

This is a short gating workplan. It should produce decisions, documentation, and test fixtures that make MKTT-WP-0018 implementation straightforward.

Background

MKTT-WP-0018 establishes the correct architecture boundary:

markitect-tool   -> contracts, normalized markdown model, registry, CLI/API
markitect-filter -> concrete source-format adapters, EPUB3 first

The boundary is sound, but a feasibility review found that the implementation workplan still leaves several decisions too implicit:

  • the existing internal extension framework does not yet define external package entry point discovery
  • the normalized source-to-markdown model names are listed, but field-level contracts and serialization rules are not pinned
  • the v1 read-only scope is not stated explicitly; write/export support should be reserved for a later format-by-format decision
  • CLI/API output envelopes, adapter selection, and unsupported-format behavior need deterministic contracts
  • markitect-filter needs a concrete handoff shape for its first EPUB3 adapter

Decision

Add a refinement pass ahead of MKTT-WP-0018. This workplan should define the minimum stable v1 contract and explicitly defer nonessential scope.

The v1 source adapter contract should be:

  • read-only
  • deterministic
  • local-file-first, with URI support either documented as future work or explicitly scoped
  • discoverable through a named package entry point group
  • serializable without heavyweight optional format dependencies
  • testable through fake adapters and small fixtures

Non-Goals

  • Do not implement EPUB3 parsing here.
  • Do not implement the full markitect-tool source adapter framework here.
  • Do not add PDF, DOCX, ODT, OCR, or browser dependencies.
  • Do not design write/export adapters beyond recording the future extension point.
  • Do not make markitect-filter a knowledge platform or ingestion service.

P19.1 - Pin v1 scope and external adapter package shape

id: MKTT-WP-0019-T001
status: done
priority: high
state_hub_task_id: "7ecc6976-c549-47ba-9a16-4d55d1173b41"

Define the v1 source adapter scope:

  • read adapters only
  • local filesystem inputs first
  • explicit future status for URI inputs, binary attachments, and write adapters
  • expected external package layout for markitect-filter
  • dependency policy for optional format libraries
  • compatibility expectations between markitect-tool and adapter packages

Output: concise architecture note or source-adapter contract section that MKTT-WP-0018 can implement directly.

Implemented: docs/source-adapter-contract.md defines the v1 read-only scope, local-file-first posture, external package shape, optional dependency policy, and compatibility boundary for markitect-filter.

P19.2 - Specify normalized data model fields and serialization

id: MKTT-WP-0019-T002
status: done
priority: high
state_hub_task_id: "7b164d67-8374-4aea-9948-f54912ef4cf5"

Specify the field-level v1 model for:

  • SourceAsset
  • SourceMetadata
  • NormalizedMarkdownDocument
  • NormalizedMarkdownSegment
  • SourceProvenance
  • NormalizationQuality
  • adapter diagnostics using the existing Diagnostic/SourceLocation shape
  • optional asset reference envelopes, if needed for v1

The specification should define required vs optional fields, stable dict/JSON serialization, digest/cache-key inputs, segment ordering, segment IDs, headings, anchors, source hrefs, page/section references, and adapter metadata.

Output: model contract documentation and fixture-shaped examples.

Implemented: docs/source-adapter-contract.md pins field-level model contracts for source assets, metadata, provenance, segments, normalized documents, and quality. examples/source-adapters/normalized-document.json and examples/source-adapters/normalized-output.md provide fixture-shaped examples.
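
To illustrate what "stable dict/JSON serialization" and "digest/cache-key inputs" can mean in practice, here is a minimal sketch using standard-library dataclasses. The field names (segment_id, order, markdown, heading, source_href, source_path, media_type) are illustrative assumptions, not the fields pinned in docs/source-adapter-contract.md; only the class names come from this workplan.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NormalizedMarkdownSegment:
    # Hypothetical fields; the real contract defines the authoritative set.
    segment_id: str
    order: int
    markdown: str
    heading: str | None = None
    source_href: str | None = None


@dataclass(frozen=True)
class NormalizedMarkdownDocument:
    # Hypothetical fields; the real contract defines the authoritative set.
    source_path: str
    media_type: str
    segments: tuple[NormalizedMarkdownSegment, ...] = ()

    def to_dict(self) -> dict:
        data = asdict(self)
        # Always emit segments in their declared order so output is deterministic.
        data["segments"] = sorted(data["segments"], key=lambda s: s["order"])
        return data

    def to_json(self) -> str:
        # sort_keys gives a stable key order across runs and Python versions.
        return json.dumps(self.to_dict(), sort_keys=True, ensure_ascii=False, indent=2)

    def digest(self) -> str:
        # Hash a compact canonical form so the digest ignores pretty-printing.
        canonical = json.dumps(self.to_dict(), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two documents that hold the same segments in a different tuple order serialize identically and share a digest, which is the property a cache key needs.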

P19.3 - Specify read adapter protocol and selection semantics

id: MKTT-WP-0019-T003
status: done
priority: high
state_hub_task_id: "f7cc1956-a6f3-4181-b4df-786cbba39198"

Define the v1 read protocol:

  • request/result type names and fields
  • can_read, inspect, and read method signatures
  • media type and file extension matching rules
  • adapter option schema conventions
  • malformed-source and unsupported-format diagnostics
  • deterministic adapter selection when multiple adapters match
  • behavior when optional adapter dependencies are missing

Output: protocol contract that can be implemented as Python Protocol classes in MKTT-WP-0018.

Implemented: docs/source-adapter-contract.md defines the v1 SourceReadAdapter protocol, request/result names, option handling, adapter selection semantics, and deterministic diagnostics for unsupported, malformed, and dependency-missing inputs.
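
As a rough sketch of how this protocol could look as Python Protocol classes: the method names can_read, inspect, and read and the name SourceReadAdapter come from this workplan. Everything else is an assumption made for illustration, including the SourceReadRequest and SourceReadResult fields, the priority attribute, and the tie-break rule (highest priority, then lowest adapter_id).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class SourceReadRequest:
    path: Path
    adapter_id: str | None = None  # hypothetical explicit override
    options: dict = field(default_factory=dict)


@dataclass(frozen=True)
class SourceReadResult:
    adapter_id: str
    markdown: str
    diagnostics: tuple[str, ...] = ()


class SourceReadAdapter(Protocol):
    adapter_id: str
    priority: int  # hypothetical; used only for the tie-break below

    def can_read(self, request: SourceReadRequest) -> bool: ...
    def inspect(self, request: SourceReadRequest) -> dict: ...
    def read(self, request: SourceReadRequest) -> SourceReadResult: ...


class FakeTextAdapter:
    """Fake adapter that satisfies the protocol structurally."""

    adapter_id = "fake.text"
    priority = 0

    def can_read(self, request: SourceReadRequest) -> bool:
        return request.path.suffix == ".txt"

    def inspect(self, request: SourceReadRequest) -> dict:
        return {"adapter_id": self.adapter_id, "path": str(request.path)}

    def read(self, request: SourceReadRequest) -> SourceReadResult:
        text = request.path.read_text(encoding="utf-8")
        return SourceReadResult(self.adapter_id, text)


def select_adapter(adapters, request):
    """Return one adapter deterministically, or None if nothing matches."""
    if request.adapter_id is not None:
        matches = [a for a in adapters if a.adapter_id == request.adapter_id]
    else:
        matches = [a for a in adapters if a.can_read(request)]
    if not matches:
        return None
    # Assumed rule: highest priority wins, ties broken by adapter_id.
    return min(matches, key=lambda a: (-a.priority, a.adapter_id))
```

Because the sort key depends only on adapter attributes, the result does not change with registration or discovery order.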

P19.4 - Define package entry point and registry contract

id: MKTT-WP-0019-T004
status: done
priority: high
state_hub_task_id: "5db7448c-c0d0-48eb-8e44-9f694782af7f"

Define how external source adapter packages register with markitect-tool:

  • entry point group name, initially markitect_tool.source_adapters
  • expected entry point object shape
  • descriptor ID and versioning rules
  • relationship between source adapter descriptors and ExtensionDescriptor
  • duplicate descriptor handling
  • dependency diagnostics for missing optional format libraries
  • compatibility notes for separately versioned packages

Output: discovery contract and fake entry point test plan for MKTT-WP-0018.

Implemented: docs/source-adapter-contract.md defines the markitect_tool.source_adapters entry point group, accepted entry point object shapes, descriptor mapping to ExtensionDescriptor, duplicate handling, and dependency diagnostics. examples/source-adapters/fake-adapter-pyproject.toml provides the fake entry point fixture.
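
A minimal discovery sketch for this contract, assuming Python 3.10+ (where importlib.metadata.entry_points accepts a group keyword). The group name comes from this workplan. Treat the rest as assumptions: the adapter_id attribute, the first-wins duplicate policy, and the plain-string diagnostics. The real contract maps descriptors to ExtensionDescriptor and uses the existing Diagnostic shape.

```python
from __future__ import annotations

from importlib.metadata import entry_points

ENTRY_POINT_GROUP = "markitect_tool.source_adapters"


def discover_adapters(eps=None):
    """Load adapter objects from entry points.

    Returns (adapters_by_id, diagnostics). Load failures and duplicate
    IDs are reported as diagnostics instead of raising.
    """
    if eps is None:
        eps = entry_points(group=ENTRY_POINT_GROUP)
    adapters = {}
    diagnostics = []
    # Sort by entry point name so discovery order is deterministic.
    for ep in sorted(eps, key=lambda e: e.name):
        try:
            obj = ep.load()
        except ImportError as exc:
            # Typically a missing optional format library.
            diagnostics.append(f"{ep.name}: missing dependency ({exc})")
            continue
        adapter_id = getattr(obj, "adapter_id", ep.name)
        if adapter_id in adapters:
            # Assumed policy: keep the first descriptor, report the rest.
            diagnostics.append(f"{ep.name}: duplicate descriptor id {adapter_id!r} ignored")
            continue
        adapters[adapter_id] = obj
    return adapters, diagnostics
```

Passing eps explicitly lets a test supply fake entry point objects (anything with a name attribute and a load() method) without installing a package, which matches the "monkeypatched entry point" option in the test plan.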

P19.5 - Pin CLI/API output envelopes and exit behavior

id: MKTT-WP-0019-T005
status: done
priority: medium
state_hub_task_id: "b57a2fd1-e528-4481-b11b-12b15979a85f"

Specify the public source commands and library functions:

  • mkt source adapters
  • mkt source inspect <path>
  • mkt source normalize <path> --format markdown
  • JSON output for adapters, inspection, normalization, and diagnostics
  • Markdown output for normalized document content
  • adapter selection and explicit adapter override options
  • exit behavior for unsupported, malformed, or dependency-missing inputs
  • public API names that should be exported from markitect_tool

Output: CLI/API contract note and expected-output fixtures.

Implemented: docs/source-adapter-contract.md pins the mkt source command surface, formats, options, exit behavior, and public API export names. examples/source-adapters/adapter-list.json and examples/source-adapters/inspect-result.json provide expected-output fixtures.
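
The command surface above could be wired up roughly as follows with argparse. The three subcommands and the --format option come from this workplan. The --adapter flag name, the exit code values, the "json" format choice, and the envelope keys are hypothetical placeholders, not the values pinned in docs/source-adapter-contract.md.

```python
from __future__ import annotations

import argparse
import json

# Hypothetical exit codes, for illustration only.
EXIT_OK = 0
EXIT_UNSUPPORTED = 3


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mkt")
    commands = parser.add_subparsers(dest="command", required=True)
    source = commands.add_parser("source")
    actions = source.add_subparsers(dest="action", required=True)

    actions.add_parser("adapters")

    inspect_cmd = actions.add_parser("inspect")
    inspect_cmd.add_argument("path")

    normalize = actions.add_parser("normalize")
    normalize.add_argument("path")
    normalize.add_argument("--format", choices=["markdown", "json"], default="markdown")
    normalize.add_argument("--adapter", default=None)  # hypothetical override flag
    return parser


def run(argv, known_suffixes=(".txt",)):
    """Return (exit_code, json_output) without touching the filesystem."""
    args = build_parser().parse_args(argv)
    if args.action == "adapters":
        return EXIT_OK, json.dumps({"ok": True, "adapters": []}, sort_keys=True)
    if not args.path.endswith(tuple(known_suffixes)):
        # Unsupported input: structured diagnostic plus a non-zero exit code.
        envelope = {
            "ok": False,
            "diagnostics": [{"code": "unsupported-format", "path": args.path}],
        }
        return EXIT_UNSUPPORTED, json.dumps(envelope, sort_keys=True)
    return EXIT_OK, json.dumps({"ok": True, "path": args.path}, sort_keys=True)
```

Returning the exit code and output as a pair, rather than calling sys.exit and print directly, keeps the behavior easy to compare against expected-output fixtures.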

P19.6 - Prepare contract-test and markitect-filter handoff criteria

id: MKTT-WP-0019-T006
status: done
priority: high
state_hub_task_id: "a7cb10fd-e1bd-4aee-81af-c93f09496ff8"

Define the contract tests that MKTT-WP-0018 must implement:

  • fake in-tree adapter for core behavior
  • fake external adapter package or monkeypatched entry point for discovery
  • serialization round trips for normalized model fixtures
  • unsupported-format and missing-dependency diagnostics
  • CLI JSON and Markdown output fixtures
  • reusable adapter conformance expectations for markitect-filter

Also seed the markitect-filter handoff:

  • expected package entry point declaration
  • first EPUB3 adapter descriptor shape
  • minimal fixture expectations for EPUB3 spine/nav/body extraction
  • follow-up workplan seed for markitect-filter implementation

Output: contract-test checklist and handoff note.

Implemented: docs/source-adapter-contract.md includes the MKTT-WP-0018 contract test checklist and the first markitect-filter EPUB3 handoff descriptor, fixture expectations, and extraction responsibilities.
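
One way the "reusable adapter conformance expectations" might be packaged is as a helper that any adapter package can call from its own tests. The sketch below is deliberately simplified to plain paths and dicts. The fake adapter, the result keys, and the specific checks are assumptions for illustration, not the checklist in docs/source-adapter-contract.md.

```python
from __future__ import annotations

import json
from pathlib import Path


class FakeUppercaseAdapter:
    """In-tree fake: reads .txt files and emits one uppercase segment."""

    adapter_id = "fake.uppercase"

    def can_read(self, path: Path) -> bool:
        return path.suffix == ".txt"

    def read(self, path: Path) -> dict:
        text = path.read_text(encoding="utf-8")
        return {
            "adapter_id": self.adapter_id,
            "segments": [{"segment_id": "s0001", "order": 0, "markdown": text.upper()}],
        }


def check_conformance(adapter, supported: Path, unsupported: Path) -> list[str]:
    """Return a list of failure messages; an empty list means conformant."""
    failures = []
    if not adapter.can_read(supported):
        failures.append("can_read rejected a supported fixture")
    if adapter.can_read(unsupported):
        failures.append("can_read accepted an unsupported fixture")
    first = adapter.read(supported)
    second = adapter.read(supported)
    if first != second:
        failures.append("read is not deterministic")
    if json.loads(json.dumps(first, sort_keys=True)) != first:
        failures.append("result does not survive a JSON round trip")
    orders = [segment["order"] for segment in first["segments"]]
    if orders != sorted(orders):
        failures.append("segments are not in ascending order")
    return failures
```

A markitect-filter test could call the same helper with its EPUB3 adapter and a small fixture file, so both repositories assert the same expectations.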

Acceptance

  • MKTT-WP-0018 has no unresolved v1 contract ambiguity around model fields, read protocol shape, entry point discovery, CLI/API output, or fake adapter tests.
  • v1 is explicitly read-only; write/export support is deferred to a later workplan.
  • External adapter discovery has a named entry point group and descriptor object contract.
  • markitect-filter has enough handoff detail to implement EPUB3 without importing implementation decisions from infospace-bench.
  • The existing MKTT-WP-0018 workplan is updated to depend on this refinement pass and to reference the pinned decisions rather than reopening them.