---
id: MKTT-WP-0019
type: workplan
title: "Source Adapter Contract Refinement"
domain: markitect
status: active
owner: markitect-tool
topic_slug: markitect
planning_priority: P0
planning_order: 142
depends_on_workplans:
  - MKTT-WP-0013
  - MKTT-WP-0017
related_workplans:
  - MKTT-WP-0018
  - MKTT-WP-0010
  - MKTT-WP-0011
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "702e94d9-35e8-4a83-b0ea-40cc19afbe51"
---

# MKTT-WP-0019: Source Adapter Contract Refinement

## Purpose

Refine the source adapter contract before implementing
`MKTT-WP-0018`. The goal is to remove the remaining ambiguity in the external
adapter surface so `markitect-tool` can implement the framework and
`markitect-filter` can implement EPUB3 without guessing about model fields,
entry points, CLI behavior, or contract-test expectations.

This is a short gating workplan. It should produce decisions, documentation,
and test fixtures that make `MKTT-WP-0018` implementation straightforward.

## Background

`MKTT-WP-0018` establishes the correct architecture boundary:

```text
markitect-tool -> contracts, normalized markdown model, registry, CLI/API
markitect-filter -> concrete source-format adapters, EPUB3 first
```

The boundary is sound, but a feasibility review found that the implementation
workplan still leaves several decisions too implicit:

- the existing internal extension framework does not yet define external
  package entry point discovery
- the normalized source-to-markdown model names are listed, but field-level
  contracts and serialization rules are not pinned
- v1 should be explicitly read-only, with write/export support reserved for a
  later format-by-format decision
- CLI/API output envelopes, adapter selection, and unsupported-format behavior
  need deterministic contracts
- `markitect-filter` needs a concrete handoff shape for its first EPUB3 adapter

## Decision

Add a refinement pass ahead of `MKTT-WP-0018`. This workplan should define the
minimum stable v1 contract and explicitly defer nonessential scope.

The v1 source adapter contract should be:

- read-only
- deterministic
- local-file-first, with URI support documented as future or explicitly scoped
- discoverable through a named package entry point group
- serializable without heavyweight optional format dependencies
- testable through fake adapters and small fixtures

## Non-Goals

- Do not implement EPUB3 parsing here.
- Do not implement the full `markitect-tool` source adapter framework here.
- Do not add PDF, DOCX, ODT, OCR, or browser dependencies.
- Do not design write/export adapters beyond recording the future extension
  point.
- Do not make `markitect-filter` a knowledge platform or ingestion service.

## P19.1 - Pin v1 scope and external adapter package shape

```task
id: MKTT-WP-0019-T001
status: todo
priority: high
state_hub_task_id: "0aa1d9a3-6cf8-47ab-8585-f23b2512d19b"
```

Define the v1 source adapter scope:

- read adapters only
- local filesystem inputs first
- explicit future status for URI inputs, binary attachments, and write adapters
- expected external package layout for `markitect-filter`
- dependency policy for optional format libraries
- compatibility expectations between `markitect-tool` and adapter packages

Output: concise architecture note or source-adapter contract section that
`MKTT-WP-0018` can implement directly.

## P19.2 - Specify normalized data model fields and serialization

```task
id: MKTT-WP-0019-T002
status: todo
priority: high
state_hub_task_id: "fabd3e76-3c2c-43cb-92b2-2322bd933fa7"
```

Specify the field-level v1 model for:

- `SourceAsset`
- `SourceMetadata`
- `NormalizedMarkdownDocument`
- `NormalizedMarkdownSegment`
- `SourceProvenance`
- `NormalizationQuality`
- adapter diagnostics using the existing `Diagnostic`/`SourceLocation` shape
- optional asset reference envelopes, if needed for v1

The specification should define required vs optional fields, stable dict/JSON
serialization, digest/cache-key inputs, segment ordering, segment IDs, headings,
anchors, source hrefs, page/section references, and adapter metadata.

Output: model contract documentation and fixture-shaped examples.

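As a rough illustration of what "stable dict/JSON serialization" and
"digest/cache-key inputs" could mean in practice, the sketch below uses frozen
dataclasses and canonical JSON. Only the two class names come from this
workplan. Every field name, the `canonical_json` and `content_digest` helpers,
and the choice of SHA-256 are assumptions for illustration, not pinned
decisions.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NormalizedMarkdownSegment:
    # Hypothetical fields; the real field list is what P19.2 must pin.
    segment_id: str
    order: int
    markdown: str
    heading: str | None = None
    source_href: str | None = None


@dataclass(frozen=True)
class NormalizedMarkdownDocument:
    # Hypothetical fields, chosen only to make the example concrete.
    source_path: str
    media_type: str
    segments: tuple[NormalizedMarkdownSegment, ...] = ()

    def to_dict(self) -> dict:
        data = asdict(self)
        # Emit segments by their declared order, not construction order.
        data["segments"] = sorted(data["segments"], key=lambda s: s["order"])
        return data


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed separators so output is stable."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


def content_digest(document: NormalizedMarkdownDocument) -> str:
    """Assumed cache-key input: SHA-256 over the canonical JSON form."""
    encoded = canonical_json(document.to_dict()).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Under these assumptions, two documents that differ only in the order their
segments were constructed serialize identically and share a digest, which is
the property a cache key needs.
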
## P19.3 - Specify read adapter protocol and selection semantics

```task
id: MKTT-WP-0019-T003
status: todo
priority: high
state_hub_task_id: "2d559e3b-1515-4c88-8ed9-3895026cd2ca"
```

Define the v1 read protocol:

- request/result type names and fields
- `can_read`, `inspect`, and `read` method signatures
- media type and file extension matching rules
- adapter option schema conventions
- malformed-source and unsupported-format diagnostics
- deterministic adapter selection when multiple adapters match
- behavior when optional adapter dependencies are missing

Output: protocol contract that can be implemented as Python `Protocol`
classes in `MKTT-WP-0018`.

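One possible shape for the protocol and for deterministic selection is sketched
below. The method names `can_read`, `inspect`, and `read` come from the list
above. `ReadRequest`, `SourceReadAdapter`, `UnsupportedFormatError`,
`select_adapter`, the `priority` attribute, and the tie-break rule (highest
priority, then lexicographically smallest adapter ID) are all hypothetical
placeholders for whatever this task pins.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, Sequence


@dataclass(frozen=True)
class ReadRequest:
    """Hypothetical request type; P19.3 pins the real name and fields."""
    path: Path
    media_type: str | None = None


class SourceReadAdapter(Protocol):
    adapter_id: str
    priority: int  # assumed tie-break input, not part of the workplan

    def can_read(self, request: ReadRequest) -> bool: ...
    def inspect(self, request: ReadRequest) -> dict: ...
    def read(self, request: ReadRequest) -> dict: ...


class UnsupportedFormatError(LookupError):
    """Raised when no registered adapter accepts the request."""


def select_adapter(
    adapters: Sequence[SourceReadAdapter],
    request: ReadRequest,
    override: str | None = None,
) -> SourceReadAdapter:
    # An explicit override bypasses matching entirely.
    if override is not None:
        for adapter in adapters:
            if adapter.adapter_id == override:
                return adapter
        raise UnsupportedFormatError(f"no adapter with id {override!r}")

    matches = [a for a in adapters if a.can_read(request)]
    if not matches:
        raise UnsupportedFormatError(f"no adapter can read {request.path}")
    # Sort key makes the winner independent of registration order.
    return min(matches, key=lambda a: (-a.priority, a.adapter_id))
```

The point of the sort key is that the result depends only on adapter
attributes, so installing packages in a different order cannot change which
adapter handles a file.
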
## P19.4 - Define package entry point and registry contract

```task
id: MKTT-WP-0019-T004
status: todo
priority: high
state_hub_task_id: "3d661d24-2496-405a-b525-c7e6d8eb4e68"
```

Define how external source adapter packages register with `markitect-tool`:

- entry point group name, initially `markitect_tool.source_adapters`
- expected entry point object shape
- descriptor ID and versioning rules
- relationship between source adapter descriptors and
  `ExtensionDescriptor`
- duplicate descriptor handling
- dependency diagnostics for missing optional format libraries
- compatibility notes for separately versioned packages

Output: discovery contract and fake entry point test plan for
`MKTT-WP-0018`.

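A minimal discovery sketch using the standard library's `importlib.metadata`
follows. The group name is the one proposed above. Everything else is an
assumption: that the entry point name doubles as the descriptor ID, that
duplicates are ignored with a diagnostic, that a failed import becomes a
diagnostic rather than an error, and the `discover_adapters` function itself.
It relies on `entry_points(group=...)`, which needs Python 3.10 or later.

```python
from __future__ import annotations

from importlib.metadata import EntryPoint, entry_points
from typing import Iterable

GROUP = "markitect_tool.source_adapters"

# An adapter package would presumably declare something like this in its
# pyproject.toml (the module path here is hypothetical):
#
#   [project.entry-points."markitect_tool.source_adapters"]
#   epub3 = "markitect_filter.epub3:descriptor"


def discover_adapters(
    eps: Iterable[EntryPoint] | None = None,
) -> tuple[dict[str, object], list[str]]:
    """Load adapter objects by entry point name; collect diagnostics."""
    if eps is None:
        eps = entry_points(group=GROUP)  # Python 3.10+ keyword form

    adapters: dict[str, object] = {}
    diagnostics: list[str] = []
    # Sort so the outcome does not depend on installation order.
    for ep in sorted(eps, key=lambda e: (e.name, e.value)):
        if ep.name in adapters:
            diagnostics.append(
                f"duplicate adapter id {ep.name!r} from {ep.value}; ignored"
            )
            continue
        try:
            adapters[ep.name] = ep.load()
        except ImportError as exc:
            # Covers a missing optional format library in the adapter package.
            diagnostics.append(f"adapter {ep.name!r} unavailable: {exc}")
    return adapters, diagnostics
```

Accepting an explicit iterable of `EntryPoint` objects is what makes the fake
entry point test plan cheap: a test can build entry points in memory instead
of installing a package.
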
## P19.5 - Pin CLI/API output envelopes and exit behavior

```task
id: MKTT-WP-0019-T005
status: todo
priority: medium
state_hub_task_id: "2c30b0c7-683e-4d60-8268-0b49660f2e30"
```

Specify the public source commands and library functions:

- `mkt source adapters`
- `mkt source inspect <path>`
- `mkt source normalize <path> --format markdown`
- JSON output for adapters, inspection, normalization, and diagnostics
- Markdown output for normalized document content
- adapter selection and explicit adapter override options
- exit behavior for unsupported, malformed, or dependency-missing inputs
- public API names that should be exported from `markitect_tool`

Output: CLI/API contract note and expected-output fixtures.

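To make the idea of a deterministic output envelope concrete, here is one
possible shape. Nothing in it is pinned by this workplan: the envelope keys,
the diagnostic code strings, the numeric exit codes, and the `build_envelope`
and `render` helpers are all placeholders that this task would replace with
real decisions.

```python
from __future__ import annotations

import json
from enum import IntEnum


class ExitCode(IntEnum):
    # Hypothetical values; P19.5 is where the real ones get pinned.
    OK = 0
    MALFORMED_SOURCE = 1
    UNSUPPORTED_FORMAT = 2
    MISSING_DEPENDENCY = 3


# Assumed mapping from diagnostic code to process exit code.
EXIT_BY_DIAGNOSTIC = {
    "malformed-source": ExitCode.MALFORMED_SOURCE,
    "unsupported-format": ExitCode.UNSUPPORTED_FORMAT,
    "missing-dependency": ExitCode.MISSING_DEPENDENCY,
}


def build_envelope(command: str, result: object, diagnostics: list[dict]) -> dict:
    """Wrap a command result so every command emits the same top-level keys."""
    failing = [
        EXIT_BY_DIAGNOSTIC[d["code"]]
        for d in diagnostics
        if d["code"] in EXIT_BY_DIAGNOSTIC
    ]
    exit_code = failing[0] if failing else ExitCode.OK
    return {
        "command": command,
        "ok": exit_code == ExitCode.OK,
        "exit_code": int(exit_code),
        "result": result,
        "diagnostics": diagnostics,
    }


def render(envelope: dict) -> str:
    """Sorted keys and fixed indentation keep fixture diffs stable."""
    return json.dumps(envelope, sort_keys=True, indent=2)
```

With a single envelope builder, expected-output fixtures for
`mkt source adapters`, `mkt source inspect`, and `mkt source normalize` can
share one schema, and the exit code is derived from the same diagnostics the
JSON reports.
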
## P19.6 - Prepare contract-test and markitect-filter handoff criteria

```task
id: MKTT-WP-0019-T006
status: todo
priority: high
state_hub_task_id: "f6845a4d-3465-40b3-970a-714cfafe282c"
```

Define the contract tests that `MKTT-WP-0018` must implement:

- fake in-tree adapter for core behavior
- fake external adapter package or monkeypatched entry point for discovery
- serialization round trips for normalized model fixtures
- unsupported-format and missing-dependency diagnostics
- CLI JSON and Markdown output fixtures
- reusable adapter conformance expectations for `markitect-filter`

Also seed the `markitect-filter` handoff:

- expected package entry point declaration
- first EPUB3 adapter descriptor shape
- minimal fixture expectations for EPUB3 spine/nav/body extraction
- follow-up workplan seed for `markitect-filter` implementation

Output: contract-test checklist and handoff note.

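The fake in-tree adapter and the reusable conformance expectations could look
roughly like the sketch below. `FakeTextAdapter`, `check_conformance`, the
`.fake` extension, and the result keys are invented for illustration, and the
adapter takes a bare path for brevity rather than whatever request type P19.3
settles on.

```python
from __future__ import annotations

import json
from pathlib import Path


class FakeTextAdapter:
    """Hypothetical fake: treats a .fake file as one segment per line."""

    adapter_id = "fake-text"

    def can_read(self, path: Path) -> bool:
        return path.suffix == ".fake"

    def read(self, path: Path) -> dict:
        lines = path.read_text(encoding="utf-8").splitlines()
        return {
            "adapter_id": self.adapter_id,
            "segments": [
                {"segment_id": f"seg-{i:04d}", "order": i, "markdown": line}
                for i, line in enumerate(lines)
            ],
        }


def check_conformance(adapter, fixture: Path) -> list[str]:
    """Return a list of contract violations; an empty list means conformant."""
    if not adapter.can_read(fixture):
        return ["can_read rejected its own fixture"]

    failures: list[str] = []
    first = adapter.read(fixture)
    if adapter.read(fixture) != first:
        failures.append("read is not deterministic")
    if json.loads(json.dumps(first, sort_keys=True)) != first:
        failures.append("result does not survive a JSON round trip")

    orders = [segment["order"] for segment in first["segments"]]
    if orders != sorted(orders):
        failures.append("segments are not in ascending order")
    ids = [segment["segment_id"] for segment in first["segments"]]
    if len(ids) != len(set(ids)):
        failures.append("segment IDs are not unique")
    return failures
```

Because `check_conformance` takes any adapter and a fixture path, the same
function could be imported by `markitect-filter` and pointed at its EPUB3
adapter and fixtures, which is one way to read "reusable adapter conformance
expectations".
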
## Acceptance

- `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields,
  read protocol shape, entry point discovery, CLI/API output, or fake adapter
  tests.
- v1 is explicitly read-only; write/export support is deferred to a later
  workplan.
- External adapter discovery has a named entry point group and descriptor
  object contract.
- `markitect-filter` has enough handoff detail to implement EPUB3 without
  importing implementation decisions from `infospace-bench`.
- The existing `MKTT-WP-0018` workplan is updated to depend on this refinement
  pass and to reference the pinned decisions rather than reopening them.