generated from coulomb/repo-seed
266 lines
9.2 KiB
Markdown
---
id: MKTT-WP-0019
type: workplan
title: "Source Adapter Contract Refinement"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: complete
planning_order: 142
depends_on_workplans:
  - MKTT-WP-0013
  - MKTT-WP-0017
related_workplans:
  - MKTT-WP-0018
  - MKTT-WP-0010
  - MKTT-WP-0011
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "702e94d9-35e8-4a83-b0ea-40cc19afbe51"
---

# MKTT-WP-0019: Source Adapter Contract Refinement

## Purpose

Refine the source adapter contract before implementing
`MKTT-WP-0018`. The goal is to remove the remaining ambiguity in the external
adapter surface so `markitect-tool` can implement the framework and
`markitect-filter` can implement EPUB3 without guessing about model fields,
entry points, CLI behavior, or contract-test expectations.

This is a short gating workplan. It should produce decisions, documentation,
and test fixtures that make `MKTT-WP-0018` implementation straightforward.

## Background

`MKTT-WP-0018` establishes the correct architecture boundary:

```text
markitect-tool -> contracts, normalized markdown model, registry, CLI/API
markitect-filter -> concrete source-format adapters, EPUB3 first
```

The boundary is sound, but a feasibility review found that the implementation
workplan still leaves several decisions too implicit:

- the existing internal extension framework does not yet define external
  package entry point discovery
- the normalized source-to-markdown model names are listed, but field-level
  contracts and serialization rules are not pinned
- v1 should be read-only, with write/export support reserved for a later
  format-by-format decision
- CLI/API output envelopes, adapter selection, and unsupported-format behavior
  need deterministic contracts
- `markitect-filter` needs a concrete handoff shape for its first EPUB3 adapter

## Decision

Add a refinement pass ahead of `MKTT-WP-0018`. This workplan should define the
minimum stable v1 contract and explicitly defer nonessential scope.

The v1 source adapter contract should be:

- read-only
- deterministic
- local-file-first, with URI support documented as future or explicitly scoped
- discoverable through a named package entry point group
- serializable without heavyweight optional format dependencies
- testable through fake adapters and small fixtures

## Non-Goals

- Do not implement EPUB3 parsing here.
- Do not implement the full `markitect-tool` source adapter framework here.
- Do not add PDF, DOCX, ODT, OCR, or browser dependencies.
- Do not design write/export adapters beyond recording the future extension
  point.
- Do not make `markitect-filter` a knowledge platform or ingestion service.

## P19.1 - Pin v1 scope and external adapter package shape

```task
id: MKTT-WP-0019-T001
status: done
priority: high
state_hub_task_id: "0aa1d9a3-6cf8-47ab-8585-f23b2512d19b"
```

Define the v1 source adapter scope:

- read adapters only
- local filesystem inputs first
- explicit future status for URI inputs, binary attachments, and write adapters
- expected external package layout for `markitect-filter`
- dependency policy for optional format libraries
- compatibility expectations between `markitect-tool` and adapter packages

Output: concise architecture note or source-adapter contract section that
`MKTT-WP-0018` can implement directly.

Implemented: `docs/source-adapter-contract.md` defines the v1 read-only scope,
local-file-first posture, external package shape, optional dependency policy,
and compatibility boundary for `markitect-filter`.

## P19.2 - Specify normalized data model fields and serialization

```task
id: MKTT-WP-0019-T002
status: done
priority: high
state_hub_task_id: "fabd3e76-3c2c-43cb-92b2-2322bd933fa7"
```

Specify the field-level v1 model for:

- `SourceAsset`
- `SourceMetadata`
- `NormalizedMarkdownDocument`
- `NormalizedMarkdownSegment`
- `SourceProvenance`
- `NormalizationQuality`
- adapter diagnostics using the existing `Diagnostic`/`SourceLocation` shape
- optional asset reference envelopes, if needed for v1

The specification should define required vs optional fields, stable dict/JSON
serialization, digest/cache-key inputs, segment ordering, segment IDs, headings,
anchors, source hrefs, page/section references, and adapter metadata.

Output: model contract documentation and fixture-shaped examples.

Implemented: `docs/source-adapter-contract.md` pins field-level model contracts
for source assets, metadata, provenance, segments, normalized documents, and
quality. `examples/source-adapters/normalized-document.json` and
`examples/source-adapters/normalized-output.md` provide fixture-shaped
examples.

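As a rough illustration of what "stable dict/JSON serialization" and "digest/cache-key inputs" could look like, the sketch below models a heavily reduced document and segment. Only the class names come from this workplan; every field name, the sort-by-`order` rule, and the SHA-256 digest are assumptions made for the example, and `docs/source-adapter-contract.md` remains the authority on the real fields.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NormalizedMarkdownSegment:
    # Hypothetical fields; the pinned contract may name or type them differently.
    segment_id: str
    order: int
    markdown: str
    heading: str | None = None
    source_href: str | None = None


@dataclass(frozen=True)
class NormalizedMarkdownDocument:
    # Hypothetical, reduced field set for illustration only.
    source_path: str
    adapter_id: str
    segments: tuple[NormalizedMarkdownSegment, ...] = ()

    def to_dict(self) -> dict:
        data = asdict(self)
        # Emit segments in a defined order so output does not depend on
        # the order in which an adapter happened to produce them.
        data["segments"] = sorted(data["segments"], key=lambda s: s["order"])
        return data

    @classmethod
    def from_dict(cls, data: dict) -> "NormalizedMarkdownDocument":
        segments = tuple(NormalizedMarkdownSegment(**s) for s in data["segments"])
        return cls(data["source_path"], data["adapter_id"], segments)


def stable_json(doc: NormalizedMarkdownDocument) -> str:
    # Sorted keys and fixed separators keep the byte output reproducible.
    return json.dumps(
        doc.to_dict(), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


def content_digest(doc: NormalizedMarkdownDocument) -> str:
    # Assumed cache-key input: SHA-256 over the stable JSON form.
    return hashlib.sha256(stable_json(doc).encode("utf-8")).hexdigest()
```

Under these assumptions, two documents that differ only in the order their segments were appended serialize to the same bytes and therefore share a digest, which is the property a cache key needs.
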
## P19.3 - Specify read adapter protocol and selection semantics

```task
id: MKTT-WP-0019-T003
status: done
priority: high
state_hub_task_id: "2d559e3b-1515-4c88-8ed9-3895026cd2ca"
```

Define the v1 read protocol:

- request/result type names and fields
- `can_read`, `inspect`, and `read` method signatures
- media type and file extension matching rules
- adapter option schema conventions
- malformed-source and unsupported-format diagnostics
- deterministic adapter selection when multiple adapters match
- behavior when optional adapter dependencies are missing

Output: protocol contract that can be implemented as Python `Protocol`
classes in `MKTT-WP-0018`.

Implemented: `docs/source-adapter-contract.md` defines the v1
`SourceReadAdapter` protocol, request/result names, option handling, adapter
selection semantics, and deterministic diagnostics for unsupported, malformed,
and dependency-missing inputs.

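The protocol and selection bullets above might translate into something like the following sketch. `SourceReadAdapter` and the three method names are taken from this workplan; `SourceReadRequest`, its fields, the `priority` attribute, the dict return types, and the tie-break rule (highest priority, then lexicographic `adapter_id`) are assumptions chosen only to show one way of making selection deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath
from typing import Protocol, Sequence


@dataclass(frozen=True)
class SourceReadRequest:
    # Hypothetical request shape; the contract document pins the real one.
    path: str
    media_type: str | None = None


class SourceReadAdapter(Protocol):
    adapter_id: str
    priority: int  # assumed attribute, used here only for tie-breaking

    def can_read(self, request: SourceReadRequest) -> bool: ...
    def inspect(self, request: SourceReadRequest) -> dict: ...
    def read(self, request: SourceReadRequest) -> dict: ...


@dataclass(frozen=True)
class FakeTextAdapter:
    """Fake adapter that matches on file extension only."""

    adapter_id: str
    priority: int = 0
    extensions: tuple[str, ...] = (".txt",)

    def can_read(self, request: SourceReadRequest) -> bool:
        return PurePath(request.path).suffix.lower() in self.extensions

    def inspect(self, request: SourceReadRequest) -> dict:
        return {"adapter_id": self.adapter_id, "path": request.path}

    def read(self, request: SourceReadRequest) -> dict:
        return {"adapter_id": self.adapter_id, "markdown": ""}


def select_adapter(
    adapters: Sequence[SourceReadAdapter],
    request: SourceReadRequest,
    adapter_id: str | None = None,
) -> SourceReadAdapter | None:
    # An explicit override bypasses can_read matching.
    if adapter_id is not None:
        matches = [a for a in adapters if a.adapter_id == adapter_id]
    else:
        matches = [a for a in adapters if a.can_read(request)]
    if not matches:
        return None  # the caller would emit an unsupported-format diagnostic
    # The result must not depend on registration order.
    return sorted(matches, key=lambda a: (-a.priority, a.adapter_id))[0]
```

Sorting on a total key rather than taking the first registered match means the same set of installed adapters always yields the same choice, regardless of discovery order.
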
## P19.4 - Define package entry point and registry contract

```task
id: MKTT-WP-0019-T004
status: done
priority: high
state_hub_task_id: "3d661d24-2496-405a-b525-c7e6d8eb4e68"
```

Define how external source adapter packages register with `markitect-tool`:

- entry point group name, initially `markitect_tool.source_adapters`
- expected entry point object shape
- descriptor ID and versioning rules
- relationship between source adapter descriptors and
  `ExtensionDescriptor`
- duplicate descriptor handling
- dependency diagnostics for missing optional format libraries
- compatibility notes for separately versioned packages

Output: discovery contract and fake entry point test plan for
`MKTT-WP-0018`.

Implemented: `docs/source-adapter-contract.md` defines the
`markitect_tool.source_adapters` entry point group, accepted entry point object
shapes, descriptor mapping to `ExtensionDescriptor`, duplicate handling, and
dependency diagnostics. `examples/source-adapters/fake-adapter-pyproject.toml`
provides the fake entry point fixture.

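A discovery loop over the `markitect_tool.source_adapters` group could be sketched as below. The group name is from this workplan; the descriptor shape (a mapping with an `"id"` key, or a zero-argument factory returning one), the first-wins duplicate policy, the diagnostic codes, and the `DiscoveryDiagnostic` and `FakeEntryPoint` helpers are all assumptions for illustration and do not describe the real `ExtensionDescriptor` mapping.

```python
from __future__ import annotations

from dataclasses import dataclass
from importlib import metadata
from typing import Any, Iterable

GROUP = "markitect_tool.source_adapters"


@dataclass(frozen=True)
class DiscoveryDiagnostic:
    # Hypothetical stand-in for the project's existing Diagnostic shape.
    code: str
    message: str


def load_descriptors(
    entry_points: Iterable[Any],
) -> tuple[dict[str, Any], list[DiscoveryDiagnostic]]:
    descriptors: dict[str, Any] = {}
    diagnostics: list[DiscoveryDiagnostic] = []
    # Sort by entry point name so results do not depend on install order.
    for ep in sorted(entry_points, key=lambda e: e.name):
        try:
            obj = ep.load()
        except ImportError as exc:
            # A missing optional format library becomes a diagnostic, not a crash.
            diagnostics.append(
                DiscoveryDiagnostic("dependency-missing", f"{ep.name}: {exc}")
            )
            continue
        # Assumed accepted shapes: a descriptor mapping or a factory for one.
        descriptor = obj() if callable(obj) else obj
        descriptor_id = descriptor["id"]
        if descriptor_id in descriptors:
            # Assumed policy: keep the first descriptor and report the rest.
            diagnostics.append(
                DiscoveryDiagnostic("duplicate-descriptor", f"{ep.name}: {descriptor_id}")
            )
            continue
        descriptors[descriptor_id] = descriptor
    return descriptors, diagnostics


def discover() -> tuple[dict[str, Any], list[DiscoveryDiagnostic]]:
    # entry_points(group=...) needs Python 3.10 or newer.
    return load_descriptors(metadata.entry_points(group=GROUP))


@dataclass(frozen=True)
class FakeEntryPoint:
    """Test double exposing the `name` and `load()` used above."""

    name: str
    target: Any

    def load(self) -> Any:
        if isinstance(self.target, Exception):
            raise self.target
        return self.target
```

Keeping `load_descriptors` independent of `importlib.metadata` lets contract tests pass in fake entry points directly, which matches the "fake external adapter package or monkeypatched entry point" approach without installing a real package.
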
## P19.5 - Pin CLI/API output envelopes and exit behavior

```task
id: MKTT-WP-0019-T005
status: done
priority: medium
state_hub_task_id: "2c30b0c7-683e-4d60-8268-0b49660f2e30"
```

Specify the public source commands and library functions:

- `mkt source adapters`
- `mkt source inspect <path>`
- `mkt source normalize <path> --format markdown`
- JSON output for adapters, inspection, normalization, and diagnostics
- Markdown output for normalized document content
- adapter selection and explicit adapter override options
- exit behavior for unsupported, malformed, or dependency-missing inputs
- public API names that should be exported from `markitect_tool`

Output: CLI/API contract note and expected-output fixtures.

Implemented: `docs/source-adapter-contract.md` pins the `mkt source` command
surface, formats, options, exit behavior, and public API export names.
`examples/source-adapters/adapter-list.json` and
`examples/source-adapters/inspect-result.json` provide expected-output
fixtures.

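To make "output envelopes and exit behavior" concrete, here is one possible shape for a JSON envelope with diagnostic-driven exit codes. Nothing in it is taken from the pinned contract: the envelope keys, the diagnostic code strings, the numeric exit codes, and the lowest-code-wins rule are placeholders, so treat it as a sketch of the idea rather than the actual `mkt source` behavior.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical exit codes; the contract document is authoritative.
EXIT_OK = 0
EXIT_BY_DIAGNOSTIC = {
    "unsupported-format": 2,
    "malformed-source": 3,
    "dependency-missing": 4,
}


def exit_code_for(diagnostics: list[dict[str, Any]]) -> int:
    # Diagnostics with unknown codes are treated as non-fatal here.
    codes = [
        EXIT_BY_DIAGNOSTIC[d["code"]]
        for d in diagnostics
        if d["code"] in EXIT_BY_DIAGNOSTIC
    ]
    # Assumed rule: the lowest mapped code wins, independent of list order.
    return min(codes) if codes else EXIT_OK


def render_envelope(
    command: str, result: Any, diagnostics: list[dict[str, Any]]
) -> str:
    envelope = {
        "command": command,
        "ok": exit_code_for(diagnostics) == EXIT_OK,
        "result": result,
        "diagnostics": diagnostics,
    }
    # Sorted keys keep expected-output fixtures byte-stable.
    return json.dumps(envelope, sort_keys=True, indent=2)
```

Deriving both the `ok` flag and the process exit code from the same diagnostics list keeps the JSON output and the shell-visible result from disagreeing.
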
## P19.6 - Prepare contract-test and markitect-filter handoff criteria

```task
id: MKTT-WP-0019-T006
status: done
priority: high
state_hub_task_id: "f6845a4d-3465-40b3-970a-714cfafe282c"
```

Define the contract tests that `MKTT-WP-0018` must implement:

- fake in-tree adapter for core behavior
- fake external adapter package or monkeypatched entry point for discovery
- serialization round trips for normalized model fixtures
- unsupported-format and missing-dependency diagnostics
- CLI JSON and Markdown output fixtures
- reusable adapter conformance expectations for `markitect-filter`

Also seed the `markitect-filter` handoff:

- expected package entry point declaration
- first EPUB3 adapter descriptor shape
- minimal fixture expectations for EPUB3 spine/nav/body extraction
- follow-up workplan seed for `markitect-filter` implementation

Output: contract-test checklist and handoff note.

Implemented: `docs/source-adapter-contract.md` includes the WP0018 contract
test checklist and the first `markitect-filter` EPUB3 handoff descriptor,
fixture expectations, and extraction responsibilities.

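The "reusable adapter conformance expectations" item suggests a helper that adapter packages can run against their own adapters. A minimal sketch follows; the method names come from this workplan, while the `adapter_id` attribute, the failure messages, and the two fake classes are hypothetical, and the real checklist in `docs/source-adapter-contract.md` presumably covers far more than structural shape.

```python
from __future__ import annotations

from typing import Any

# Method names listed in the read protocol task of this workplan.
REQUIRED_METHODS = ("can_read", "inspect", "read")


def conformance_failures(adapter: Any) -> list[str]:
    """Return structural failures; an empty list means the shape conforms."""
    failures: list[str] = []
    # Assumed requirement: every adapter exposes a non-empty string ID.
    adapter_id = getattr(adapter, "adapter_id", None)
    if not isinstance(adapter_id, str) or not adapter_id:
        failures.append("adapter_id must be a non-empty string")
    for name in REQUIRED_METHODS:
        if not callable(getattr(adapter, name, None)):
            failures.append(f"missing method: {name}")
    return failures


class ConformingFake:
    """Fake adapter with the full assumed surface."""

    adapter_id = "fake.conforming"

    def can_read(self, request: Any) -> bool:
        return True

    def inspect(self, request: Any) -> dict:
        return {}

    def read(self, request: Any) -> dict:
        return {}


class IncompleteFake:
    """Fake adapter that lacks an ID and the `read` method."""

    adapter_id = ""

    def can_read(self, request: Any) -> bool:
        return False

    def inspect(self, request: Any) -> dict:
        return {}
```

Returning a list of failures rather than raising on the first one lets a downstream test report every gap in a single run, for example `assert conformance_failures(adapter) == []`.
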
## Acceptance

- `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields,
  read protocol shape, entry point discovery, CLI/API output, or fake adapter
  tests.
- v1 is explicitly read-only; write/export support is deferred to a later
  workplan.
- External adapter discovery has a named entry point group and descriptor
  object contract.
- `markitect-filter` has enough handoff detail to implement EPUB3 without
  importing implementation decisions from `infospace-bench`.
- The existing `MKTT-WP-0018` workplan is updated to depend on this refinement
  pass and to reference the pinned decisions rather than reopening them.