Workplans to establish markitect-filter integration
@@ -42,6 +42,8 @@ and descriptions mirror the operational view.
| `MKTT-WP-0012` | complete | done | `MKTT-WP-0004`, `MKTT-WP-0010`, `MKTT-WP-0011` | Document function layer is complete: deterministic Markdown-native function descriptors, registry, inline/fenced syntax, pipelines, context bindings, CLI, docs, examples, diagnostics, provenance, and extension descriptor. |
| `MKTT-WP-0008` | complete | done | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory context cache is complete: context package schema, local registry, package creation from queries/search/manifests, deterministic summaries, namespaces, activation/deactivation/refresh/explain lifecycle, policy re-checks, CLI, docs, and examples. |
| `MKTT-WP-0017` | complete | done | `MKTT-WP-0003`, `MKTT-WP-0013` | CLI/API polish and practical adoption track is complete: shell completion, extension discovery, generated CLI/API docs, usecase relevance matrix, E2E fixture matrix, large-corpus smoke coverage, first-use docs, examples index, and command cheat sheet. |
| `MKTT-WP-0019` | P0 | active | `MKTT-WP-0013`, `MKTT-WP-0017` | Source adapter contract refinement: pin v1 read-only scope, field-level normalized model semantics, external adapter entry point discovery, CLI/API envelopes, fake adapter contract tests, and `markitect-filter` handoff before implementation. |
| `MKTT-WP-0018` | P1 | todo | `MKTT-WP-0013`, `MKTT-WP-0017`, `MKTT-WP-0019` | Source adapter framework implementation: implement the contract refined in `MKTT-WP-0019`, keeping format extraction in `markitect-filter` and the base install free of heavyweight conversion dependencies. |
| `MKTT-WP-0015` | P2 | todo | `MKTT-WP-0010`, `MKTT-WP-0011`, `MKTT-WP-0012` | Future render and document-function extensions: typed values, richer syntax, document-local reusable functions, Quarkdown/export adapters, render-aware references, assets, and permission sandboxing. Defer unless publishing/export pressure becomes current. |
| `MKTT-WP-0016` | P2 | todo | `MKTT-WP-0008`, `MKTT-WP-0007`, `MKTT-WP-0009`, `MKTT-WP-0013` | Follow-on agentic memory architecture: reasoning decision graphs, conversational paths, long-term knowledge graphs, memory service blueprints/profiles, graph-to-context-package compilation, and adapter boundaries. |
@@ -123,6 +125,13 @@ before the remaining advanced extension work because users and agents need a
complete, documented, shell-friendly, test-backed public surface before the
tool grows further.

`MKTT-WP-0019` is the current source-adapter refinement track. It intentionally
precedes `MKTT-WP-0018` so the implementation work does not have to invent
field-level normalized model semantics, package entry point discovery, read
protocol behavior, CLI/API envelopes, or `markitect-filter` handoff criteria
while coding. The v1 contract should stay read-only; writer/export adapters
belong in later format-specific work once preservation semantics are explicit.

## State Hub Mirror

Native State Hub dependency edges should mirror the whole-workstream
@@ -162,3 +171,8 @@ dependencies:
- `MKTT-WP-0016 -> MKTT-WP-0013`
- `MKTT-WP-0017 -> MKTT-WP-0003`
- `MKTT-WP-0017 -> MKTT-WP-0013`
- `MKTT-WP-0019 -> MKTT-WP-0013`
- `MKTT-WP-0019 -> MKTT-WP-0017`
- `MKTT-WP-0018 -> MKTT-WP-0013`
- `MKTT-WP-0018 -> MKTT-WP-0017`
- `MKTT-WP-0018 -> MKTT-WP-0019`
286
workplans/MKTT-WP-0018-source-adapter-contract.md
Normal file
@@ -0,0 +1,286 @@
---
id: MKTT-WP-0018
type: workplan
title: "Source Adapter Interface And Markdown Normalization Contract"
domain: markitect
status: todo
owner: markitect-tool
topic_slug: markitect
planning_priority: P1
planning_order: 145
depends_on_workplans:
  - MKTT-WP-0013
  - MKTT-WP-0017
  - MKTT-WP-0019
related_workplans:
  - MKTT-WP-0010
  - MKTT-WP-0011
  - MKTT-WP-0012
  - MKTT-WP-0019
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "c4e4511f-13ea-40b4-9083-6d9ab6d12dad"
---

# MKTT-WP-0018: Source Adapter Interface And Markdown Normalization Contract
## Purpose

Define the `markitect-tool` framework for source-format adapters and canonical
markdown normalization so a separate `markitect-filter` repository can provide
concrete adapters for EPUB3 first, and later PDF, DOCX, ODT, HTML, and other
formats.

This workplan deliberately does **not** make `markitect-tool` a document
conversion product. It keeps `markitect-tool` focused on the syntax layer:

- structured markdown contracts
- canonical normalized markdown representations
- adapter protocols and descriptors
- registry/discovery hooks
- deterministic validation and diagnostics

Concrete format extraction lives outside the core toolkit, initially in
`markitect-filter`.
## Background

The Lefevre EPUB3 example in `infospace-bench` exposed a boundary problem.
`infospace-bench` can pragmatically read EPUB-like files, but that logic is not
application-layer work. The durable split should be:

```text
source formats
  -> markitect-filter concrete adapters
  -> markitect-tool source adapter protocol and markdown normalization contract
  -> infospace-bench generation workflows
  -> optional kontextual-engine persistence, retrieval, governance
```

This lets `infospace-bench` consume normalized markdown sources without owning
EPUB/PDF/DOCX details, while `kontextual-engine` can later use the same
adapter outputs for managed ingestion.
## Decision

Establish a `markitect-tool` source adapter framework and normalization
contract first. Then implement EPUB3 as the first concrete read adapter in a
new `markitect-filter` repo.

`MKTT-WP-0019` must run first and pin the v1 contract details for this
implementation: field-level normalized model semantics, read-only protocol
shape, external package entry point discovery, CLI/API output envelopes, and
fake adapter contract-test expectations.

`markitect-tool` should define:

- adapter request/result types
- canonical markdown document and segment models
- provenance, metadata, quality, and diagnostic envelopes
- adapter capability descriptors
- registry and package discovery hooks
- CLI/API affordances for adapter inspection and conversion
- contract tests that an external adapter package can satisfy

`markitect-filter` should define:

- concrete source readers
- optional dependencies needed for each source format
- format-specific tests and fixtures
- EPUB3 spine/nav/body extraction as its first implementation

Write/export adapters remain future optional scope until a format-specific
preservation contract exists.
## Non-Goals

- Do not implement EPUB3 parsing in `markitect-tool` core.
- Do not add heavyweight PDF, DOCX, or ODT dependencies to the base install.
- Do not move infospace lifecycle or generation concerns into `markitect-tool`.
- Do not implement persistent ingestion, permissions, retrieval, or governance;
  those remain `kontextual-engine` responsibilities.
- Do not define domain-specific entity/relation workflows here.
- Do not implement source writer/export adapters in the v1 slice; keep the
  protocol read-only unless a later workplan deliberately opens write scope.
## P18.1 - Architecture boundary and markitect-filter handoff

```task
id: MKTT-WP-0018-T001
status: todo
priority: high
state_hub_task_id: "a5d05b2a-b9d8-43c6-9e52-5a77094b49d1"
```

Implement the cross-repo architecture pinned by `MKTT-WP-0019`:

- `markitect-tool` owns adapter contracts and markdown normalization
- `markitect-filter` owns concrete source-format adapters
- `infospace-bench` consumes normalized markdown for concrete infospaces
- `kontextual-engine` can ingest adapter outputs into durable knowledge assets

Output: architecture note covering responsibilities, extension package shape,
the pinned entry point contract, dependency policy, and migration path from the
current `infospace-bench` EPUB spike.
## P18.2 - Canonical source-to-markdown data model

```task
id: MKTT-WP-0018-T002
status: todo
priority: high
state_hub_task_id: "f8164264-a9c1-4c82-8617-76bbb84a51bb"
```

Implement the normalized output model specified by `MKTT-WP-0019`:

- `SourceAsset`
- `SourceMetadata`
- `NormalizedMarkdownDocument`
- `NormalizedMarkdownSegment`
- `SourceProvenance`
- `NormalizationDiagnostic`
- `NormalizationQuality`
- optional `SourceBinaryAttachment` or asset reference envelope

The model should represent:

- original path/URI/media type
- title, author/creator, language, rights, source URL, publication metadata
- ordered markdown content
- segment IDs, headings, anchors, page/section references
- digest and cache keys
- extraction diagnostics
- lossiness/quality signals
- adapter name/version/options

Output: public data model, serialization tests, and normalization contract
documentation matching the field-level v1 specification.
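As a rough illustration of the model above, the following sketch shows two of the listed types as frozen dataclasses with an order-stable dict/JSON form. Only the class names `NormalizedMarkdownDocument` and `NormalizedMarkdownSegment` come from this workplan; every field name, and the choice to sort segments by an `order` field, is an assumption until `MKTT-WP-0019` pins the field-level contract.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class NormalizedMarkdownSegment:
    # Field names are illustrative assumptions, not the pinned v1 contract.
    segment_id: str
    order: int
    markdown: str
    heading: str | None = None
    anchor: str | None = None


@dataclass(frozen=True)
class NormalizedMarkdownDocument:
    source_path: str
    media_type: str
    title: str | None
    adapter_name: str
    adapter_version: str
    segments: tuple[NormalizedMarkdownSegment, ...] = field(default_factory=tuple)

    def to_dict(self) -> dict:
        # asdict() recurses into the nested segment dataclasses.
        data = asdict(self)
        # Emit segments in reading order regardless of construction order.
        data["segments"] = sorted(data["segments"], key=lambda s: s["order"])
        return data

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs.
        return json.dumps(self.to_dict(), sort_keys=True, ensure_ascii=False)
```

Keeping the types frozen and the serialization key-sorted is one way to make serialization tests byte-comparable; the real model may choose differently.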
## P18.3 - Source adapter protocol and capability descriptors

```task
id: MKTT-WP-0018-T003
status: todo
priority: high
state_hub_task_id: "5036ff34-49f4-4900-9e90-95c4555b4ce9"
```

Define the read adapter interface:

- source reader protocol: `can_read`, `inspect`, `read`
- media type and file extension matching
- adapter option schema
- capability descriptor shape
- safety and dependency flags
- deterministic diagnostics

Do not add writer protocols in this implementation slice. Preserve room for a
future writer protocol by keeping descriptors capability-based, but avoid
shipping `can_write`/`write` contracts before there is a format-specific
preservation model.

The first implementation slice can ship a fake in-tree adapter for tests only.
Concrete EPUB3 implementation belongs in `markitect-filter`.

Output: protocol module, descriptor integration, tests for matching,
inspection, reading, diagnostics, and unsupported-format behavior.
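A minimal sketch of what the read protocol and a test-only fake adapter could look like. The method names `can_read`, `inspect`, and `read` come from this task; the signatures, the `SourceReader` and `FakeTextAdapter` names, the dict and string return types, and the use of `ValueError` for unsupported input are assumptions for illustration only.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class SourceReader(Protocol):
    # Method names come from the workplan; the signatures are assumptions.
    def can_read(self, path: Path) -> bool: ...
    def inspect(self, path: Path) -> dict: ...
    def read(self, path: Path) -> str: ...


class FakeTextAdapter:
    """Test-only adapter that treats .txt files as already-normalized markdown."""

    extensions = (".txt",)

    def can_read(self, path: Path) -> bool:
        # Match on file extension only; no file access is needed.
        return path.suffix.lower() in self.extensions

    def inspect(self, path: Path) -> dict:
        # Inspection reports cheap facts without converting the content.
        return {"path": str(path), "size_bytes": path.stat().st_size}

    def read(self, path: Path) -> str:
        if not self.can_read(path):
            raise ValueError(f"unsupported format: {path.suffix or '<none>'}")
        return path.read_text(encoding="utf-8")
```

Because the protocol is structural, `FakeTextAdapter` satisfies it without inheriting from it, which is the property an external package such as `markitect-filter` would rely on.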
## P18.4 - Adapter registry and discovery hooks

```task
id: MKTT-WP-0018-T004
status: todo
priority: high
state_hub_task_id: "391fb723-8990-4086-ac6c-656a3d637ba3"
```

Wire source adapters into the existing internal extension framework:

- register source adapter descriptors
- discover package-provided adapters through the entry point group pinned by
  `MKTT-WP-0019`
- expose adapter capabilities via extension listing/inspection
- report missing optional dependency diagnostics
- ensure adapter packages can remain independently versioned

Output: registry implementation, package discovery tests, and compatibility
notes for `markitect-filter`.
## P18.5 - Normalization CLI and public API surface

```task
id: MKTT-WP-0018-T005
status: todo
priority: medium
state_hub_task_id: "c6233bd1-0403-498b-a6ed-c1874b172aa3"
```

Expose a small CLI/API surface:

- `mkt source adapters`
- `mkt source inspect <path-or-uri>`
- `mkt source normalize <path-or-uri> --format markdown`
- structured JSON output for inspection and diagnostics
- markdown output for normalized content
- public API exports for adapter discovery and normalization

Output: CLI commands, API exports, generated command/API docs updates, and
tests.
## P18.6 - Contract tests and fake adapter fixture

```task
id: MKTT-WP-0018-T006
status: todo
priority: high
state_hub_task_id: "263d0351-2942-4c2a-b333-b3aa96f2b8e3"
```

Add deterministic contract tests proving that an external read adapter can:

- register through the extension framework
- advertise read capabilities
- inspect a source without full conversion
- normalize a source into canonical markdown documents and segments
- emit provenance, metadata, diagnostics, and quality signals
- fail gracefully for unsupported or malformed sources

Output: fake adapter fixture, reusable contract-test helpers, and documentation
for `markitect-filter` adapter implementers.
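One possible shape for a reusable contract-test helper is sketched below. It checks only a small subset of the expectations listed above (matching, determinism, and failure on unsupported input). The helper name `check_read_adapter`, the failure messages, and the assumption that unsupported input raises `ValueError` are hypothetical; the pinned contract may report diagnostics instead of raising.

```python
from pathlib import Path


def check_read_adapter(adapter, supported: Path, unsupported: Path) -> list:
    """Return a list of contract violations; an empty list means the adapter passed."""
    failures = []
    if not adapter.can_read(supported):
        failures.append("can_read rejected a supported source")
    if adapter.can_read(unsupported):
        failures.append("can_read accepted an unsupported source")

    # Reading the same source twice must give identical output.
    first, second = adapter.read(supported), adapter.read(supported)
    if first != second:
        failures.append("read is not deterministic")

    try:
        adapter.read(unsupported)
    except ValueError:
        pass  # assumed failure mode; the pinned contract may use diagnostics instead
    else:
        failures.append("read did not fail for an unsupported source")
    return failures
```

Returning a list of violations rather than asserting inside the helper lets an adapter package report every failed expectation in a single test run.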
## P18.7 - Cross-repo migration notes for infospace-bench and markitect-filter

```task
id: MKTT-WP-0018-T007
status: todo
priority: medium
state_hub_task_id: "dfc81c61-f7ca-4266-8908-56b221101fd4"
```

Document how the new contract affects sibling repos:

- `infospace-bench` should replace its local EPUB/read normalization spike with
  calls to the source adapter API
- `markitect-filter` should implement EPUB3 first against this contract
- `kontextual-engine` should treat normalized source outputs as ingestible
  knowledge asset derivatives when it needs durable ingestion

Output: migration note and follow-up workplan seeds for `markitect-filter` and
`infospace-bench`.
## Acceptance

- `markitect-tool` exposes a stable source adapter protocol and canonical
  markdown normalization contract.
- The base install remains markdown-native and does not pull heavyweight
  format-conversion dependencies.
- External adapter packages can register and be discovered through the existing
  extension framework.
- CLI and API users can inspect available source adapters and normalize a source
  through a registered adapter.
- Tests prove the contract with a fake adapter and no network dependency.
- Documentation clearly assigns EPUB3 implementation to `markitect-filter`, not
  `markitect-tool` or `infospace-bench`.
- Writer/export adapter support is explicitly deferred beyond the v1 read
  adapter contract.
234
workplans/MKTT-WP-0019-source-adapter-contract-refinement.md
Normal file
@@ -0,0 +1,234 @@
---
id: MKTT-WP-0019
type: workplan
title: "Source Adapter Contract Refinement"
domain: markitect
status: active
owner: markitect-tool
topic_slug: markitect
planning_priority: P0
planning_order: 142
depends_on_workplans:
  - MKTT-WP-0013
  - MKTT-WP-0017
related_workplans:
  - MKTT-WP-0018
  - MKTT-WP-0010
  - MKTT-WP-0011
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "702e94d9-35e8-4a83-b0ea-40cc19afbe51"
---

# MKTT-WP-0019: Source Adapter Contract Refinement
## Purpose

Refine the source adapter contract before implementing
`MKTT-WP-0018`. The goal is to remove the remaining ambiguity in the external
adapter surface so `markitect-tool` can implement the framework and
`markitect-filter` can implement EPUB3 without guessing about model fields,
entry points, CLI behavior, or contract-test expectations.

This is a short gating workplan. It should produce decisions, documentation,
and test fixtures that make `MKTT-WP-0018` implementation straightforward.
## Background

`MKTT-WP-0018` establishes the correct architecture boundary:

```text
markitect-tool -> contracts, normalized markdown model, registry, CLI/API
markitect-filter -> concrete source-format adapters, EPUB3 first
```

The boundary is sound, but a feasibility review found that the implementation
workplan still leaves several decisions too implicit:

- the existing internal extension framework does not yet define external
  package entry point discovery
- the normalized source-to-markdown model names are listed, but field-level
  contracts and serialization rules are not pinned
- v1 should be read-only, with write/export support reserved for a later
  format-by-format decision
- CLI/API output envelopes, adapter selection, and unsupported-format behavior
  need deterministic contracts
- `markitect-filter` needs a concrete handoff shape for its first EPUB3 adapter
## Decision

Add a refinement pass ahead of `MKTT-WP-0018`. This workplan should define the
minimum stable v1 contract and explicitly defer nonessential scope.

The v1 source adapter contract should be:

- read-only
- deterministic
- local-file-first, with URI support documented as future or explicitly scoped
- discoverable through a named package entry point group
- serializable without heavyweight optional format dependencies
- testable through fake adapters and small fixtures
## Non-Goals

- Do not implement EPUB3 parsing here.
- Do not implement the full `markitect-tool` source adapter framework here.
- Do not add PDF, DOCX, ODT, OCR, or browser dependencies.
- Do not design write/export adapters beyond recording the future extension
  point.
- Do not make `markitect-filter` a knowledge platform or ingestion service.
## P19.1 - Pin v1 scope and external adapter package shape

```task
id: MKTT-WP-0019-T001
status: todo
priority: high
state_hub_task_id: "0aa1d9a3-6cf8-47ab-8585-f23b2512d19b"
```

Define the v1 source adapter scope:

- read adapters only
- local filesystem inputs first
- explicit future status for URI inputs, binary attachments, and write adapters
- expected external package layout for `markitect-filter`
- dependency policy for optional format libraries
- compatibility expectations between `markitect-tool` and adapter packages

Output: concise architecture note or source-adapter contract section that
`MKTT-WP-0018` can implement directly.
## P19.2 - Specify normalized data model fields and serialization

```task
id: MKTT-WP-0019-T002
status: todo
priority: high
state_hub_task_id: "fabd3e76-3c2c-43cb-92b2-2322bd933fa7"
```

Specify the field-level v1 model for:

- `SourceAsset`
- `SourceMetadata`
- `NormalizedMarkdownDocument`
- `NormalizedMarkdownSegment`
- `SourceProvenance`
- `NormalizationQuality`
- adapter diagnostics using the existing `Diagnostic`/`SourceLocation` shape
- optional asset reference envelopes, if needed for v1

The specification should define required vs optional fields, stable dict/JSON
serialization, digest/cache-key inputs, segment ordering, segment IDs, headings,
anchors, source hrefs, page/section references, and adapter metadata.

Output: model contract documentation and fixture-shaped examples.
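To make the "stable dict/JSON serialization" and "digest/cache-key inputs" items concrete, here is a hedged sketch using only the standard library. The function names, the `sha256:` prefix, and the choice of cache-key inputs (source digest, adapter name, adapter version, options) are assumptions; the specification produced by this task decides the real inputs.

```python
import hashlib
import json


def stable_json(value) -> str:
    # Sorted keys and fixed separators keep the text independent of dict order.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_digest(data: bytes) -> str:
    # Prefixing the algorithm name leaves room to change it later.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def cache_key(source_bytes: bytes, adapter_name: str, adapter_version: str, options: dict) -> str:
    # Hypothetical choice of inputs: source digest plus adapter identity and options.
    payload = stable_json(
        {
            "adapter": adapter_name,
            "options": options,
            "source": content_digest(source_bytes),
            "version": adapter_version,
        }
    )
    return content_digest(payload.encode("utf-8"))
```

With this shape, two runs over the same bytes with the same adapter and options produce the same key, while an adapter version bump invalidates cached output.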
## P19.3 - Specify read adapter protocol and selection semantics

```task
id: MKTT-WP-0019-T003
status: todo
priority: high
state_hub_task_id: "2d559e3b-1515-4c88-8ed9-3895026cd2ca"
```

Define the v1 read protocol:

- request/result type names and fields
- `can_read`, `inspect`, and `read` method signatures
- media type and file extension matching rules
- adapter option schema conventions
- malformed-source and unsupported-format diagnostics
- deterministic adapter selection when multiple adapters match
- behavior when optional adapter dependencies are missing

Output: protocol contract that can be implemented as Python `Protocol`
classes in `MKTT-WP-0018`.
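One way to make adapter selection deterministic when several adapters match is to rank candidates on a total order, as in the sketch below. The `AdapterDescriptor` fields, the rule that a media type match outranks an extension match, the `priority` field, and the tie-break on `adapter_id` are all assumptions offered as a starting point for this task, not decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterDescriptor:
    # All fields are hypothetical; the real descriptor shape is pinned elsewhere.
    adapter_id: str
    media_types: frozenset
    extensions: frozenset
    priority: int = 0


def select_adapter(descriptors, *, media_type=None, extension=None, override=None):
    """Return the chosen descriptor, or None when nothing matches."""
    if override is not None:
        # An explicit override bypasses matching entirely.
        return next((d for d in descriptors if d.adapter_id == override), None)

    def rank(d):
        # Media type match beats extension match; 0 means no match.
        if media_type is not None and media_type in d.media_types:
            return 2
        if extension is not None and extension.lower() in d.extensions:
            return 1
        return 0

    matches = [d for d in descriptors if rank(d) > 0]
    # Sort on a total order so registration order never changes the result.
    matches.sort(key=lambda d: (-rank(d), -d.priority, d.adapter_id))
    return matches[0] if matches else None
```

Because the final tie-break is the adapter ID, the same set of installed adapters always yields the same choice, whatever order discovery returned them in.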
## P19.4 - Define package entry point and registry contract

```task
id: MKTT-WP-0019-T004
status: todo
priority: high
state_hub_task_id: "3d661d24-2496-405a-b525-c7e6d8eb4e68"
```

Define how external source adapter packages register with `markitect-tool`:

- entry point group name, initially `markitect_tool.source_adapters`
- expected entry point object shape
- descriptor ID and versioning rules
- relationship between source adapter descriptors and
  `ExtensionDescriptor`
- duplicate descriptor handling
- dependency diagnostics for missing optional format libraries
- compatibility notes for separately versioned packages

Output: discovery contract and fake entry point test plan for
`MKTT-WP-0018`.
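The discovery rules above could be prototyped with the standard library's `importlib.metadata`, as sketched here. The group name `markitect_tool.source_adapters` is taken from this task. The `discover_adapters` function, the diagnostic dict shape, the `missing-dependency` and `duplicate-adapter` codes, and the first-wins duplicate rule are assumptions. Passing `eps` explicitly lets tests inject fake entry points without installing a package.

```python
from importlib.metadata import entry_points

GROUP = "markitect_tool.source_adapters"


def discover_adapters(eps=None):
    """Load adapters from entry points; returns (adapters, diagnostics)."""
    if eps is None:
        # Keyword selection by group requires Python 3.10 or newer.
        eps = entry_points(group=GROUP)
    adapters, diagnostics = {}, []
    # Sorting by name keeps discovery order independent of install order.
    for ep in sorted(eps, key=lambda e: e.name):
        if ep.name in adapters:
            # Hypothetical rule: the first registration wins, later ones are reported.
            diagnostics.append(
                {"code": "duplicate-adapter", "adapter": ep.name, "message": f"ignored {ep.value}"}
            )
            continue
        try:
            obj = ep.load()
        except ImportError as exc:
            # A missing optional format library surfaces as a diagnostic, not a crash.
            diagnostics.append(
                {"code": "missing-dependency", "adapter": ep.name, "message": str(exc)}
            )
            continue
        adapters[ep.name] = obj
    return adapters, diagnostics
```

A fake entry point test can build `importlib.metadata.EntryPoint` objects directly and pass them in, which covers the "monkeypatched entry point" style of test without touching the installed environment.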
## P19.5 - Pin CLI/API output envelopes and exit behavior

```task
id: MKTT-WP-0019-T005
status: todo
priority: medium
state_hub_task_id: "2c30b0c7-683e-4d60-8268-0b49660f2e30"
```

Specify the public source commands and library functions:

- `mkt source adapters`
- `mkt source inspect <path>`
- `mkt source normalize <path> --format markdown`
- JSON output for adapters, inspection, normalization, and diagnostics
- Markdown output for normalized document content
- adapter selection and explicit adapter override options
- exit behavior for unsupported, malformed, or dependency-missing inputs
- public API names that should be exported from `markitect_tool`

Output: CLI/API contract note and expected-output fixtures.
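As a discussion aid for the envelope and exit behavior, the sketch below builds a JSON envelope and derives an exit code from the first error diagnostic. Nothing here reflects an existing `mkt` behavior: the envelope keys, the diagnostic codes, and the numeric exit codes are all placeholder assumptions for this task to replace.

```python
import json

# Hypothetical exit codes; the real values are for this task to pin.
EXIT_OK = 0
EXIT_GENERIC_ERROR = 1
EXIT_UNSUPPORTED = 3
EXIT_MALFORMED = 4
EXIT_MISSING_DEPENDENCY = 5

_EXIT_BY_CODE = {
    "unsupported-format": EXIT_UNSUPPORTED,
    "malformed-source": EXIT_MALFORMED,
    "missing-dependency": EXIT_MISSING_DEPENDENCY,
}


def build_envelope(command, result=None, diagnostics=()):
    """Return (json_text, exit_code) for one CLI invocation."""
    diagnostics = list(diagnostics)
    errors = [d for d in diagnostics if d.get("severity") == "error"]
    exit_code = EXIT_OK
    if errors:
        # The first error decides the exit code; unknown codes map to a generic failure.
        exit_code = _EXIT_BY_CODE.get(errors[0].get("code"), EXIT_GENERIC_ERROR)
    envelope = {
        "command": command,
        "ok": not errors,
        "result": result,
        "diagnostics": diagnostics,
    }
    # sort_keys keeps expected-output fixtures stable.
    return json.dumps(envelope, sort_keys=True, indent=2), exit_code
```

Keeping one envelope shape for every source command means expected-output fixtures can be compared as plain text.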
## P19.6 - Prepare contract-test and markitect-filter handoff criteria

```task
id: MKTT-WP-0019-T006
status: todo
priority: high
state_hub_task_id: "f6845a4d-3465-40b3-970a-714cfafe282c"
```

Define the contract tests that `MKTT-WP-0018` must implement:

- fake in-tree adapter for core behavior
- fake external adapter package or monkeypatched entry point for discovery
- serialization round trips for normalized model fixtures
- unsupported-format and missing-dependency diagnostics
- CLI JSON and Markdown output fixtures
- reusable adapter conformance expectations for `markitect-filter`

Also seed the `markitect-filter` handoff:

- expected package entry point declaration
- first EPUB3 adapter descriptor shape
- minimal fixture expectations for EPUB3 spine/nav/body extraction
- follow-up workplan seed for `markitect-filter` implementation

Output: contract-test checklist and handoff note.
## Acceptance

- `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields,
  read protocol shape, entry point discovery, CLI/API output, or fake adapter
  tests.
- v1 is explicitly read-only; write/export support is deferred to a later
  workplan.
- External adapter discovery has a named entry point group and descriptor
  object contract.
- `markitect-filter` has enough handoff detail to implement EPUB3 without
  importing implementation decisions from `infospace-bench`.
- The existing `MKTT-WP-0018` workplan is updated to depend on this refinement
  pass and to reference the pinned decisions rather than reopening them.