---
id: MKTT-WP-0019
type: workplan
title: "Source Adapter Contract Refinement"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: complete
planning_order: 142
depends_on_workplans:
- MKTT-WP-0013
- MKTT-WP-0017
related_workplans:
- MKTT-WP-0018
- MKTT-WP-0010
- MKTT-WP-0011
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "10a85934-a4b2-4661-83f7-92ac8d322af4"
---
# MKTT-WP-0019: Source Adapter Contract Refinement
## Purpose
Refine the source adapter contract before implementing
`MKTT-WP-0018`. The goal is to remove the remaining ambiguity in the external
adapter surface so `markitect-tool` can implement the framework and
`markitect-filter` can implement its EPUB3 adapter without guessing about model
fields, entry points, CLI behavior, or contract-test expectations.
This is a short gating workplan. It should produce decisions, documentation,
and test fixtures that make `MKTT-WP-0018` implementation straightforward.
## Background
`MKTT-WP-0018` establishes the correct architecture boundary:
```text
markitect-tool -> contracts, normalized markdown model, registry, CLI/API
markitect-filter -> concrete source-format adapters, EPUB3 first
```
The boundary is sound, but a feasibility review found that the implementation
workplan still leaves several decisions too implicit:
- the existing internal extension framework does not yet define external
package entry point discovery
- the normalized source-to-markdown model names are listed, but field-level
contracts and serialization rules are not pinned
- the read-only scope of v1 is implied rather than stated; write/export support
should be reserved for a later format-by-format decision
- CLI/API output envelopes, adapter selection, and unsupported-format behavior
need deterministic contracts
- `markitect-filter` needs a concrete handoff shape for its first EPUB3 adapter
## Decision
Add a refinement pass ahead of `MKTT-WP-0018`. This workplan should define the
minimum stable v1 contract and explicitly defer nonessential scope.
The v1 source adapter contract should be:
- read-only
- deterministic
- local-file-first, with URI inputs either documented as future work or explicitly scoped
- discoverable through a named package entry point group
- serializable without heavyweight optional format dependencies
- testable through fake adapters and small fixtures
## Non-Goals
- Do not implement EPUB3 parsing here.
- Do not implement the full `markitect-tool` source adapter framework here.
- Do not add PDF, DOCX, ODT, OCR, or browser dependencies.
- Do not design write/export adapters beyond recording the future extension
point.
- Do not make `markitect-filter` a knowledge platform or ingestion service.
## P19.1 - Pin v1 scope and external adapter package shape
```task
id: MKTT-WP-0019-T001
status: done
priority: high
state_hub_task_id: "7ecc6976-c549-47ba-9a16-4d55d1173b41"
```
Define the v1 source adapter scope:
- read adapters only
- local filesystem inputs first
- explicit future status for URI inputs, binary attachments, and write adapters
- expected external package layout for `markitect-filter`
- dependency policy for optional format libraries
- compatibility expectations between `markitect-tool` and adapter packages
Output: concise architecture note or source-adapter contract section that
`MKTT-WP-0018` can implement directly.
Implemented: `docs/source-adapter-contract.md` defines the v1 read-only scope,
local-file-first posture, external package shape, optional dependency policy,
and compatibility boundary for `markitect-filter`.
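The optional dependency policy can be illustrated with a small probe that checks for format libraries without importing them. That way, listing or selecting adapters never pulls in heavyweight dependencies. This is a minimal sketch under stated assumptions: `DependencyStatus` and `probe_optional_dependencies` are hypothetical names, and adapters are assumed to declare the module names of their optional libraries.

```python
from __future__ import annotations

import importlib.util
from dataclasses import dataclass


@dataclass(frozen=True)
class DependencyStatus:
    """Hypothetical result of probing one adapter's optional format libraries."""

    adapter_id: str
    missing: tuple[str, ...]

    @property
    def available(self) -> bool:
        return not self.missing


def _is_importable(module: str) -> bool:
    try:
        # find_spec locates a module without executing it
        # (for dotted names, the parent package is imported first)
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package itself is missing
        return False


def probe_optional_dependencies(adapter_id: str, modules: list[str]) -> DependencyStatus:
    """Report missing optional modules in sorted order so diagnostics are deterministic."""
    missing = tuple(sorted(m for m in modules if not _is_importable(m)))
    return DependencyStatus(adapter_id=adapter_id, missing=missing)
```

An adapter whose status is not `available` could still be listed by `mkt source adapters`, carrying a dependency-missing diagnostic instead of failing at import time.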
## P19.2 - Specify normalized data model fields and serialization
```task
id: MKTT-WP-0019-T002
status: done
priority: high
state_hub_task_id: "7b164d67-8374-4aea-9948-f54912ef4cf5"
```
Specify the field-level v1 model for:
- `SourceAsset`
- `SourceMetadata`
- `NormalizedMarkdownDocument`
- `NormalizedMarkdownSegment`
- `SourceProvenance`
- `NormalizationQuality`
- adapter diagnostics using the existing `Diagnostic`/`SourceLocation` shape
- optional asset reference envelopes, if needed for v1
The specification should define required vs optional fields, stable dict/JSON
serialization, digest/cache-key inputs, segment ordering, segment IDs, headings,
anchors, source hrefs, page/section references, and adapter metadata.
Output: model contract documentation and fixture-shaped examples.
Implemented: `docs/source-adapter-contract.md` pins field-level model contracts
for source assets, metadata, provenance, segments, normalized documents, and
quality. `examples/source-adapters/normalized-document.json` and
`examples/source-adapters/normalized-output.md` provide fixture-shaped
examples.
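The serialization rules can be sketched with frozen dataclasses. This is an illustration, not the pinned model. The fields shown (`segment_id`, `order`, `source_href`, and so on) are a hypothetical subset. `cache_key` assumes the key is derived from the source digest, the adapter identity and version, and the options. Canonical JSON (sorted keys, fixed separators) makes equal models serialize to identical text, which digest inputs and fixture comparisons rely on.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SourceProvenance:
    adapter_id: str
    adapter_version: str
    source_href: str | None = None  # e.g. a spine href inside an EPUB (optional)


@dataclass(frozen=True)
class NormalizedMarkdownSegment:
    segment_id: str  # stable within one document, e.g. "seg-0001"
    order: int  # explicit ordering; output never relies on input order
    markdown: str
    heading: str | None = None
    anchor: str | None = None
    provenance: SourceProvenance | None = None


@dataclass(frozen=True)
class NormalizedMarkdownDocument:
    source_digest: str
    segments: tuple[NormalizedMarkdownSegment, ...] = ()
    metadata: dict[str, str] = field(default_factory=dict)

    def to_dict(self) -> dict:
        data = asdict(self)  # recursively converts nested dataclasses
        data["segments"] = sorted(data["segments"], key=lambda s: s["order"])
        return data

    def to_json(self) -> str:
        # sort_keys plus fixed separators give identical text for equal models
        return json.dumps(self.to_dict(), sort_keys=True, separators=(",", ":"), ensure_ascii=False)

    @classmethod
    def from_dict(cls, data: dict) -> NormalizedMarkdownDocument:
        segments = []
        for raw in data["segments"]:
            prov = raw.get("provenance")
            segments.append(
                NormalizedMarkdownSegment(**{**raw, "provenance": SourceProvenance(**prov) if prov else None})
            )
        return cls(
            source_digest=data["source_digest"],
            segments=tuple(segments),
            metadata=dict(data.get("metadata", {})),
        )


def cache_key(source_digest: str, adapter_id: str, adapter_version: str, options: dict) -> str:
    """Hash the normalization inputs; option key order does not affect the key."""
    payload = json.dumps(
        {"source": source_digest, "adapter": adapter_id, "version": adapter_version, "options": options},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```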
## P19.3 - Specify read adapter protocol and selection semantics
```task
id: MKTT-WP-0019-T003
status: done
priority: high
state_hub_task_id: "f7cc1956-a6f3-4181-b4df-786cbba39198"
```
Define the v1 read protocol:
- request/result type names and fields
- `can_read`, `inspect`, and `read` method signatures
- media type and file extension matching rules
- adapter option schema conventions
- malformed-source and unsupported-format diagnostics
- deterministic adapter selection when multiple adapters match
- behavior when optional adapter dependencies are missing
Output: protocol contract that can be implemented as Python `Protocol`
classes in `MKTT-WP-0018`.
Implemented: `docs/source-adapter-contract.md` defines the v1
`SourceReadAdapter` protocol, request/result names, option handling, adapter
selection semantics, and deterministic diagnostics for unsupported, malformed,
and dependency-missing inputs.
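A Python `Protocol` for the read side might look like the following. This is a hypothetical sketch, and several details are assumptions: the `priority` attribute, the request/result fields, the rule that an explicit media type outranks the file extension, and the tie-break (highest priority, then lexicographic `adapter_id`). Together they show one way to make selection independent of discovery order.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class SourceReadRequest:
    path: Path
    media_type: str | None = None  # caller-supplied or sniffed media type
    options: dict[str, object] = field(default_factory=dict)


@dataclass(frozen=True)
class SourceReadResult:
    markdown: str
    diagnostics: tuple[str, ...] = ()


@runtime_checkable
class SourceReadAdapter(Protocol):
    adapter_id: str
    priority: int  # hypothetical selection hint; higher wins
    extensions: tuple[str, ...]
    media_types: tuple[str, ...]

    def can_read(self, request: SourceReadRequest) -> bool: ...

    def inspect(self, request: SourceReadRequest) -> dict[str, object]: ...

    def read(self, request: SourceReadRequest) -> SourceReadResult: ...


def matches_by_type(adapter: SourceReadAdapter, request: SourceReadRequest) -> bool:
    """An explicit media type takes precedence; otherwise fall back to the file extension."""
    if request.media_type is not None:
        return request.media_type in adapter.media_types
    return request.path.suffix.lower() in adapter.extensions


def select_adapter(
    adapters: list[SourceReadAdapter], request: SourceReadRequest, override: str | None = None
) -> SourceReadAdapter | None:
    if override is not None:
        # An explicit override bypasses matching entirely
        for adapter in adapters:
            if adapter.adapter_id == override:
                return adapter
        raise LookupError(f"unknown source adapter: {override}")
    candidates = [a for a in adapters if a.can_read(request)]
    if not candidates:
        return None  # the caller reports an unsupported-format diagnostic
    # Highest priority first; ties broken by adapter_id, never by list order
    return min(candidates, key=lambda a: (-a.priority, a.adapter_id))


@dataclass(frozen=True)
class FakeTextAdapter:
    """Hypothetical in-tree fake that satisfies SourceReadAdapter."""

    adapter_id: str = "fake.text"
    priority: int = 0
    extensions: tuple[str, ...] = (".txt",)
    media_types: tuple[str, ...] = ("text/plain",)

    def can_read(self, request: SourceReadRequest) -> bool:
        return matches_by_type(self, request)

    def inspect(self, request: SourceReadRequest) -> dict[str, object]:
        return {"adapter_id": self.adapter_id, "path": str(request.path)}

    def read(self, request: SourceReadRequest) -> SourceReadResult:
        return SourceReadResult(markdown=request.path.read_text(encoding="utf-8"))
```

Because ties are broken on `adapter_id`, `select_adapter` returns the same adapter regardless of the order in which the registry discovers adapters.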
## P19.4 - Define package entry point and registry contract
```task
id: MKTT-WP-0019-T004
status: done
priority: high
state_hub_task_id: "5db7448c-c0d0-48eb-8e44-9f694782af7f"
```
Define how external source adapter packages register with `markitect-tool`:
- entry point group name, initially `markitect_tool.source_adapters`
- expected entry point object shape
- descriptor ID and versioning rules
- relationship between source adapter descriptors and
`ExtensionDescriptor`
- duplicate descriptor handling
- dependency diagnostics for missing optional format libraries
- compatibility notes for separately versioned packages
Output: discovery contract and fake entry point test plan for
`MKTT-WP-0018`.
Implemented: `docs/source-adapter-contract.md` defines the
`markitect_tool.source_adapters` entry point group, accepted entry point object
shapes, descriptor mapping to `ExtensionDescriptor`, duplicate handling, and
dependency diagnostics. `examples/source-adapters/fake-adapter-pyproject.toml`
provides the fake entry point fixture.
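Discovery through the `markitect_tool.source_adapters` group can be sketched with the standard `importlib.metadata` module. Several details here are assumptions rather than the pinned contract: the accepted object shapes (instance, class, or zero-argument factory), the "first loaded wins" duplicate rule, and the diagnostic codes. Entry points are sorted before loading so that the outcome does not depend on installation order. `entry_points(group=...)` requires Python 3.10 or later. The usage portion monkeypatches a fake package into `sys.modules`, which is one way to exercise discovery without installing a distribution.

```python
from __future__ import annotations

import sys
import types
from collections.abc import Iterable
from dataclasses import dataclass
from importlib.metadata import EntryPoint, entry_points

GROUP = "markitect_tool.source_adapters"


@dataclass(frozen=True)
class DiscoveryDiagnostic:
    code: str  # hypothetical codes, not the pinned diagnostic names
    entry_point: str
    message: str


def discover_source_adapters(eps: Iterable[EntryPoint] | None = None):
    """Load adapters from the entry point group in a deterministic order."""
    if eps is None:
        eps = entry_points(group=GROUP)  # selection API requires Python 3.10+
    adapters: dict[str, object] = {}
    diagnostics: list[DiscoveryDiagnostic] = []
    for ep in sorted(eps, key=lambda e: (e.name, e.value)):
        try:
            obj = ep.load()
        except ImportError as exc:  # includes ModuleNotFoundError
            diagnostics.append(DiscoveryDiagnostic("dependency-missing", ep.name, str(exc)))
            continue
        # Assumed shapes: adapter instance, adapter class, or zero-argument factory
        adapter = obj() if isinstance(obj, type) or not hasattr(obj, "adapter_id") else obj
        if adapter.adapter_id in adapters:
            diagnostics.append(
                DiscoveryDiagnostic("duplicate-adapter", ep.name, f"{adapter.adapter_id!r} already registered")
            )
            continue
        adapters[adapter.adapter_id] = adapter
    return adapters, diagnostics


# Usage: a monkeypatched "external package" registered directly in sys.modules
fake_pkg = types.ModuleType("fake_filter_pkg")


class FakeEpubAdapter:
    adapter_id = "fake.epub3"


fake_pkg.FakeEpubAdapter = FakeEpubAdapter
sys.modules["fake_filter_pkg"] = fake_pkg

adapters, diags = discover_source_adapters([
    EntryPoint(name="pdf", value="no_such_pdf_lib_xyz:Adapter", group=GROUP),
    EntryPoint(name="epub3-copy", value="fake_filter_pkg:FakeEpubAdapter", group=GROUP),
    EntryPoint(name="epub3", value="fake_filter_pkg:FakeEpubAdapter", group=GROUP),
])
```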
## P19.5 - Pin CLI/API output envelopes and exit behavior
```task
id: MKTT-WP-0019-T005
status: done
priority: medium
state_hub_task_id: "b57a2fd1-e528-4481-b11b-12b15979a85f"
```
Specify the public source commands and library functions:
- `mkt source adapters`
- `mkt source inspect <path>`
- `mkt source normalize <path> --format markdown`
- JSON output for adapters, inspection, normalization, and diagnostics
- Markdown output for normalized document content
- adapter selection and explicit adapter override options
- exit behavior for unsupported, malformed, or dependency-missing inputs
- public API names that should be exported from `markitect_tool`
Output: CLI/API contract note and expected-output fixtures.
Implemented: `docs/source-adapter-contract.md` pins the `mkt source` command
surface, formats, options, exit behavior, and public API export names.
`examples/source-adapters/adapter-list.json` and
`examples/source-adapters/inspect-result.json` provide expected-output
fixtures.
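Exit behavior can be derived entirely from diagnostics. The sketch below uses hypothetical exit-code values and diagnostic codes (`unsupported-format`, `malformed-source`, `dependency-missing`). It illustrates one idea: applying a fixed precedence when several errors are present, so the same inputs always produce the same exit code and JSON envelope.

```python
from __future__ import annotations

import json
from enum import IntEnum


class SourceExitCode(IntEnum):
    """Hypothetical values; the pinned codes live in the contract document."""

    OK = 0
    ERROR = 1  # any other error-severity diagnostic
    UNSUPPORTED = 2
    MALFORMED = 3
    DEPENDENCY_MISSING = 4


# Checked in this order when several error diagnostics are present (assumption)
_PRECEDENCE: dict[str, SourceExitCode] = {
    "dependency-missing": SourceExitCode.DEPENDENCY_MISSING,
    "unsupported-format": SourceExitCode.UNSUPPORTED,
    "malformed-source": SourceExitCode.MALFORMED,
}


def exit_code_for(diagnostics: list[dict]) -> SourceExitCode:
    error_codes = {d["code"] for d in diagnostics if d.get("severity") == "error"}
    for code, exit_code in _PRECEDENCE.items():
        if code in error_codes:
            return exit_code
    # Warnings alone never fail the command
    return SourceExitCode.ERROR if error_codes else SourceExitCode.OK


def render_json_envelope(command: str, result: object, diagnostics: list[dict]) -> tuple[str, int]:
    """Return the JSON text and the process exit code for one command invocation."""
    exit_code = exit_code_for(diagnostics)
    envelope = {
        "command": command,
        "ok": exit_code is SourceExitCode.OK,
        "result": result,
        "diagnostics": diagnostics,
    }
    return json.dumps(envelope, sort_keys=True, indent=2), int(exit_code)
```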
## P19.6 - Prepare contract-test and markitect-filter handoff criteria
```task
id: MKTT-WP-0019-T006
status: done
priority: high
state_hub_task_id: "a7cb10fd-e1bd-4aee-81af-c93f09496ff8"
```
Define the contract tests that `MKTT-WP-0018` must implement:
- fake in-tree adapter for core behavior
- fake external adapter package or monkeypatched entry point for discovery
- serialization round trips for normalized model fixtures
- unsupported-format and missing-dependency diagnostics
- CLI JSON and Markdown output fixtures
- reusable adapter conformance expectations for `markitect-filter`
Also seed the `markitect-filter` handoff:
- expected package entry point declaration
- first EPUB3 adapter descriptor shape
- minimal fixture expectations for EPUB3 spine/nav/body extraction
- follow-up workplan seed for `markitect-filter` implementation
Output: contract-test checklist and handoff note.
Implemented: `docs/source-adapter-contract.md` includes the `MKTT-WP-0018`
contract-test checklist and the first `markitect-filter` EPUB3 handoff
descriptor, fixture expectations, and extraction responsibilities.
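A reusable conformance check that `markitect-filter` could run against its EPUB3 adapter might take the following shape. This is a hypothetical sketch. To keep it self-contained, the adapter methods are simplified to accept a `Path` rather than a request object. The checks also cover only part of the checklist: required attributes, `can_read` acceptance and rejection, deterministic `read`, and a JSON-serializable `inspect` result.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def check_read_adapter_conformance(adapter: object, fixture: Path, unsupported: Path) -> list[str]:
    """Run hypothetical conformance checks; an empty list means every check passed."""
    failures = [
        f"missing attribute {name}"
        for name in ("adapter_id", "can_read", "inspect", "read")
        if not hasattr(adapter, name)
    ]
    if failures:
        return failures  # the remaining checks need the full adapter surface
    if not adapter.can_read(fixture):
        failures.append("can_read rejected the adapter's own fixture")
    if adapter.can_read(unsupported):
        failures.append("can_read accepted an unsupported fixture")
    if adapter.read(fixture) != adapter.read(fixture):
        failures.append("read is not deterministic for identical input")
    try:
        json.dumps(adapter.inspect(fixture), sort_keys=True)
    except TypeError:
        failures.append("inspect result is not JSON-serializable")
    return failures


class FakeTextAdapter:
    """Hypothetical in-tree fake standing in for a real format adapter."""

    adapter_id = "fake.text"

    def can_read(self, path: Path) -> bool:
        return path.suffix.lower() == ".txt"

    def inspect(self, path: Path) -> dict[str, str]:
        return {"adapter_id": self.adapter_id, "name": path.name}

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8").strip() + "\n"


# Usage: run the checks against the fake with a small temporary fixture
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp, "sample.txt")
    sample.write_text("Hello, source adapters.\n", encoding="utf-8")
    failures = check_read_adapter_conformance(FakeTextAdapter(), sample, Path(tmp, "book.epub"))
```

Returning a list of failures rather than raising on the first one lets an external package report every gap in a single test run.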
## Acceptance
- `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields,
read protocol shape, entry point discovery, CLI/API output, or fake adapter
tests.
- v1 is explicitly read-only; write/export support is deferred to a later
workplan.
- External adapter discovery has a named entry point group and descriptor
object contract.
- `markitect-filter` has enough handoff detail to implement EPUB3 without
importing implementation decisions from `infospace-bench`.
- The existing `MKTT-WP-0018` workplan is updated to depend on this refinement
pass and to reference the pinned decisions rather than reopening them.