---
id: MKTT-WP-0019
type: workplan
title: "Source Adapter Contract Refinement"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: complete
planning_order: 142
depends_on_workplans:
  - MKTT-WP-0013
  - MKTT-WP-0017
related_workplans:
  - MKTT-WP-0018
  - MKTT-WP-0010
  - MKTT-WP-0011
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "10a85934-a4b2-4661-83f7-92ac8d322af4"
---

# MKTT-WP-0019: Source Adapter Contract Refinement

## Purpose

Refine the source adapter contract before implementing `MKTT-WP-0018`. The goal is to remove the remaining ambiguity in the external adapter surface so `markitect-tool` can implement the framework and `markitect-filter` can implement EPUB3 without guessing about model fields, entry points, CLI behavior, or contract-test expectations.

This is a short gating workplan. It should produce decisions, documentation, and test fixtures that make `MKTT-WP-0018` implementation straightforward.

## Background

`MKTT-WP-0018` establishes the correct architecture boundary:

```text
markitect-tool   -> contracts, normalized markdown model, registry, CLI/API
markitect-filter -> concrete source-format adapters, EPUB3 first
```

The boundary is sound, but a feasibility review found that the implementation workplan still leaves several decisions too implicit:

- the existing internal extension framework does not yet define external package entry point discovery
- the normalized source-to-markdown model names are listed, but field-level contracts and serialization rules are not pinned
- v1 should be read-only, with write/export support reserved for a later format-by-format decision
- CLI/API output envelopes, adapter selection, and unsupported-format behavior need deterministic contracts
- `markitect-filter` needs a concrete handoff shape for its first EPUB3 adapter

## Decision

Add a refinement pass ahead of `MKTT-WP-0018`.
This workplan should define the minimum stable v1 contract and explicitly defer nonessential scope.

The v1 source adapter contract should be:

- read-only
- deterministic
- local-file-first, with URI support documented as future or explicitly scoped
- discoverable through a named package entry point group
- serializable without heavyweight optional format dependencies
- testable through fake adapters and small fixtures

## Non-Goals

- Do not implement EPUB3 parsing here.
- Do not implement the full `markitect-tool` source adapter framework here.
- Do not add PDF, DOCX, ODT, OCR, or browser dependencies.
- Do not design write/export adapters beyond recording the future extension point.
- Do not make `markitect-filter` a knowledge platform or ingestion service.

## P19.1 - Pin v1 scope and external adapter package shape

```task
id: MKTT-WP-0019-T001
status: done
priority: high
state_hub_task_id: "7ecc6976-c549-47ba-9a16-4d55d1173b41"
```

Define the v1 source adapter scope:

- read adapters only
- local filesystem inputs first
- explicit future status for URI inputs, binary attachments, and write adapters
- expected external package layout for `markitect-filter`
- dependency policy for optional format libraries
- compatibility expectations between `markitect-tool` and adapter packages

Output: concise architecture note or source-adapter contract section that `MKTT-WP-0018` can implement directly.

Implemented: `docs/source-adapter-contract.md` defines the v1 read-only scope, local-file-first posture, external package shape, optional dependency policy, and compatibility boundary for `markitect-filter`.
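
The optional dependency policy mentioned above can be illustrated with a minimal sketch. It assumes an adapter package probes for its format library without importing it and reports a structured diagnostic instead of failing at import time. The helper name `missing_dependency_diagnostic`, the `code` string, and the dict fields are hypothetical and are not taken from `docs/source-adapter-contract.md`.

```python
from importlib.util import find_spec
from typing import Optional


def missing_dependency_diagnostic(adapter_id: str, module_name: str) -> Optional[dict]:
    """Return a diagnostic dict if `module_name` is not importable, else None.

    Hypothetical helper: the field names and the code string are
    illustrative assumptions, not the pinned contract.
    """
    # For a top-level module name, find_spec locates the module without
    # importing it, so the probe has no import side effects.
    if find_spec(module_name) is not None:
        return None
    return {
        "severity": "error",
        "code": "source-adapter.dependency-missing",
        "adapter_id": adapter_id,
        "message": f"adapter '{adapter_id}' requires optional module '{module_name}'",
    }
```

Under this assumption an adapter stays listable even when its format library is absent, and the missing dependency surfaces as data rather than as an `ImportError`.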
## P19.2 - Specify normalized data model fields and serialization

```task
id: MKTT-WP-0019-T002
status: done
priority: high
state_hub_task_id: "7b164d67-8374-4aea-9948-f54912ef4cf5"
```

Specify the field-level v1 model for:

- `SourceAsset`
- `SourceMetadata`
- `NormalizedMarkdownDocument`
- `NormalizedMarkdownSegment`
- `SourceProvenance`
- `NormalizationQuality`
- adapter diagnostics using the existing `Diagnostic`/`SourceLocation` shape
- optional asset reference envelopes, if needed for v1

The specification should define required vs optional fields, stable dict/JSON serialization, digest/cache-key inputs, segment ordering, segment IDs, headings, anchors, source hrefs, page/section references, and adapter metadata.

Output: model contract documentation and fixture-shaped examples.

Implemented: `docs/source-adapter-contract.md` pins field-level model contracts for source assets, metadata, provenance, segments, normalized documents, and quality. `examples/source-adapters/normalized-document.json` and `examples/source-adapters/normalized-output.md` provide fixture-shaped examples.

## P19.3 - Specify read adapter protocol and selection semantics

```task
id: MKTT-WP-0019-T003
status: done
priority: high
state_hub_task_id: "f7cc1956-a6f3-4181-b4df-786cbba39198"
```

Define the v1 read protocol:

- request/result type names and fields
- `can_read`, `inspect`, and `read` method signatures
- media type and file extension matching rules
- adapter option schema conventions
- malformed-source and unsupported-format diagnostics
- deterministic adapter selection when multiple adapters match
- behavior when optional adapter dependencies are missing

Output: protocol contract that can be implemented as Python `Protocol` classes in `MKTT-WP-0018`.
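
As a rough illustration of the read protocol items listed above, the sketch below shows one way the three methods and a deterministic selection rule might look as a Python `Protocol`. Only `SourceReadAdapter`, `can_read`, `inspect`, and `read` come from this workplan. The `SourceReadRequest` and `SourceReadResult` names, their fields, the `adapter_id` and `extensions` attributes, and the "lowest `adapter_id` wins" tie-break are assumptions made for the example.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional, Protocol, Sequence


@dataclass(frozen=True)
class SourceReadRequest:
    """Hypothetical request shape: a local path plus adapter options."""
    path: Path
    options: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class SourceReadResult:
    """Hypothetical result shape: markdown text plus diagnostics."""
    adapter_id: str
    markdown: str
    diagnostics: tuple[dict[str, Any], ...] = ()


class SourceReadAdapter(Protocol):
    adapter_id: str
    extensions: tuple[str, ...]

    def can_read(self, request: SourceReadRequest) -> bool: ...
    def inspect(self, request: SourceReadRequest) -> dict[str, Any]: ...
    def read(self, request: SourceReadRequest) -> SourceReadResult: ...


class FakeTextAdapter:
    """Fake adapter that exists only to exercise the sketch."""

    def __init__(self, adapter_id: str, extensions: tuple[str, ...]) -> None:
        self.adapter_id = adapter_id
        self.extensions = extensions

    def can_read(self, request: SourceReadRequest) -> bool:
        # Match on the lowercased file extension only.
        return request.path.suffix.lower() in self.extensions

    def inspect(self, request: SourceReadRequest) -> dict[str, Any]:
        return {"adapter_id": self.adapter_id, "path": str(request.path)}

    def read(self, request: SourceReadRequest) -> SourceReadResult:
        text = request.path.read_text(encoding="utf-8")
        return SourceReadResult(self.adapter_id, text)


def select_adapter(
    adapters: Sequence[SourceReadAdapter], request: SourceReadRequest
) -> Optional[SourceReadAdapter]:
    """Pick a matching adapter; the lowest adapter_id wins (assumed tie-break)."""
    matches = sorted(
        (a for a in adapters if a.can_read(request)), key=lambda a: a.adapter_id
    )
    return matches[0] if matches else None
```

Sorting on a stable key makes the outcome independent of registration order, which is the property the "deterministic adapter selection" item asks for. Any other total ordering would serve equally well.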

Implemented: `docs/source-adapter-contract.md` defines the v1 `SourceReadAdapter` protocol, request/result names, option handling, adapter selection semantics, and deterministic diagnostics for unsupported, malformed, and dependency-missing inputs.

## P19.4 - Define package entry point and registry contract

```task
id: MKTT-WP-0019-T004
status: done
priority: high
state_hub_task_id: "5db7448c-c0d0-48eb-8e44-9f694782af7f"
```

Define how external source adapter packages register with `markitect-tool`:

- entry point group name, initially `markitect_tool.source_adapters`
- expected entry point object shape
- descriptor ID and versioning rules
- relationship between source adapter descriptors and `ExtensionDescriptor`
- duplicate descriptor handling
- dependency diagnostics for missing optional format libraries
- compatibility notes for separately versioned packages

Output: discovery contract and fake entry point test plan for `MKTT-WP-0018`.

Implemented: `docs/source-adapter-contract.md` defines the `markitect_tool.source_adapters` entry point group, accepted entry point object shapes, descriptor mapping to `ExtensionDescriptor`, duplicate handling, and dependency diagnostics. `examples/source-adapters/fake-adapter-pyproject.toml` provides the fake entry point fixture.

## P19.5 - Pin CLI/API output envelopes and exit behavior

```task
id: MKTT-WP-0019-T005
status: done
priority: medium
state_hub_task_id: "b57a2fd1-e528-4481-b11b-12b15979a85f"
```

Specify the public source commands and library functions:

- `mkt source adapters`
- `mkt source inspect `
- `mkt source normalize --format markdown`
- JSON output for adapters, inspection, normalization, and diagnostics
- Markdown output for normalized document content
- adapter selection and explicit adapter override options
- exit behavior for unsupported, malformed, or dependency-missing inputs
- public API names that should be exported from `markitect_tool`

Output: CLI/API contract note and expected-output fixtures.
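
The JSON output and exit behavior items above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `ExitCode` values, the diagnostic code strings, the envelope keys, and the `build_envelope` helper are hypothetical, and the actual values are the ones pinned in `docs/source-adapter-contract.md`.

```python
import json
from enum import IntEnum
from typing import Any


class ExitCode(IntEnum):
    """Hypothetical exit codes; not the values pinned in the contract note."""
    OK = 0
    UNSUPPORTED_FORMAT = 2
    MALFORMED_SOURCE = 3
    DEPENDENCY_MISSING = 4


# Assumed mapping from diagnostic code to exit code.
_EXIT_BY_CODE = {
    "source-adapter.unsupported-format": ExitCode.UNSUPPORTED_FORMAT,
    "source-adapter.malformed-source": ExitCode.MALFORMED_SOURCE,
    "source-adapter.dependency-missing": ExitCode.DEPENDENCY_MISSING,
}


def build_envelope(command: str, result: Any, diagnostics: list[dict]) -> tuple[str, int]:
    """Return (json_text, exit_code) for one command invocation."""
    # The highest-numbered matching code wins: an arbitrary but deterministic
    # rule. With no matching diagnostics the command succeeds.
    codes = [_EXIT_BY_CODE[d["code"]] for d in diagnostics if d.get("code") in _EXIT_BY_CODE]
    exit_code = max(codes, default=ExitCode.OK)
    envelope = {
        "command": command,
        "ok": exit_code == ExitCode.OK,
        "result": result,
        "diagnostics": diagnostics,
    }
    # sort_keys and fixed separators keep the serialized text stable across runs.
    text = json.dumps(envelope, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text, int(exit_code)
```

Deriving the exit code from the diagnostics, rather than setting it separately, keeps the JSON envelope and the process status from disagreeing, which makes expected-output fixtures easier to compare byte for byte.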

Implemented: `docs/source-adapter-contract.md` pins the `mkt source` command surface, formats, options, exit behavior, and public API export names. `examples/source-adapters/adapter-list.json` and `examples/source-adapters/inspect-result.json` provide expected-output fixtures.

## P19.6 - Prepare contract-test and markitect-filter handoff criteria

```task
id: MKTT-WP-0019-T006
status: done
priority: high
state_hub_task_id: "a7cb10fd-e1bd-4aee-81af-c93f09496ff8"
```

Define the contract tests that `MKTT-WP-0018` must implement:

- fake in-tree adapter for core behavior
- fake external adapter package or monkeypatched entry point for discovery
- serialization round trips for normalized model fixtures
- unsupported-format and missing-dependency diagnostics
- CLI JSON and Markdown output fixtures
- reusable adapter conformance expectations for `markitect-filter`

Also seed the `markitect-filter` handoff:

- expected package entry point declaration
- first EPUB3 adapter descriptor shape
- minimal fixture expectations for EPUB3 spine/nav/body extraction
- follow-up workplan seed for `markitect-filter` implementation

Output: contract-test checklist and handoff note.

Implemented: `docs/source-adapter-contract.md` includes the WP0018 contract test checklist and the first `markitect-filter` EPUB3 handoff descriptor, fixture expectations, and extraction responsibilities.

## Acceptance

- `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields, read protocol shape, entry point discovery, CLI/API output, or fake adapter tests.
- v1 is explicitly read-only; write/export support is deferred to a later workplan.
- External adapter discovery has a named entry point group and descriptor object contract.
- `markitect-filter` has enough handoff detail to implement EPUB3 without importing implementation decisions from `infospace-bench`.
- The existing `MKTT-WP-0018` workplan is updated to depend on this refinement pass and to reference the pinned decisions rather than reopening them.