# Source Adapter Migration Notes

## Purpose

These notes describe how sibling repositories should consume the `markitect-tool` source adapter contract implemented by `MKTT-WP-0018`.

The source adapter layer is deliberately split:

```text
external source files
  -> markitect-filter concrete adapters
  -> markitect-tool source adapter protocol and normalized Markdown model
  -> infospace-bench workflows
  -> optional kontextual-engine ingestion
```

## Markitect-tool

`markitect-tool` owns the stable contract:

- normalized source data model
- read-only source adapter protocol
- adapter registry and Python entry point discovery
- `mkt source` CLI commands
- public API helpers such as `inspect_source` and `normalize_source`
- fake adapter contract tests

It does not own EPUB3, PDF, DOCX, ODT, OCR, or browser extraction.

## Markitect-filter

`markitect-filter` should implement concrete adapters behind the entry point group:

```toml
[project.entry-points."markitect_tool.source_adapters"]
epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
```

The first adapter should be:

```text
id: source.epub3
operations: read
media_types: application/epub+zip
extensions: .epub
```

The EPUB3 adapter should satisfy the contract tests described in `docs/source-adapter-contract.md` and add EPUB-specific fixtures for container, OPF, spine, nav, body XHTML, malformed package structure, skipped assets, and lossy extraction diagnostics.

## Infospace-bench

`infospace-bench` should replace its local EPUB intake spike with the public source adapter API:

```python
from markitect_tool import normalize_source

result = normalize_source("source.epub")
if not result.is_valid:
    raise RuntimeError(result.to_dict()["diagnostics"])

markdown = result.document.markdown
segments = result.document.segments
```

Application workflows should consume normalized Markdown and segment metadata. They should not depend on EPUB package internals, spine parsing, XHTML extraction, or boilerplate classification directly.

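To illustrate the boundary described for application workflows, the following is a minimal sketch of a workflow step that sees only normalized Markdown and segments. The `Segment` and `Document` dataclasses are hypothetical stand-ins, not the real `markitect-tool` types, and the segment field names `label` and `markdown` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical stand-in for a normalized segment; field names are assumed.
    label: str
    markdown: str


@dataclass
class Document:
    # Hypothetical stand-in exposing the two attributes a workflow consumes.
    markdown: str
    segments: list[Segment] = field(default_factory=list)


def summarize_segments(document: Document) -> list[dict]:
    """Build per-segment review rows from normalized output only.

    Nothing here touches EPUB containers, spines, or XHTML.
    """
    rows = []
    for index, segment in enumerate(document.segments):
        rows.append(
            {
                "index": index,
                "label": segment.label,
                # Whitespace-delimited token count, including Markdown markers.
                "word_count": len(segment.markdown.split()),
            }
        )
    return rows


doc = Document(
    markdown="# One\n\nalpha beta\n\n# Two\n\ngamma",
    segments=[
        Segment(label="One", markdown="# One\n\nalpha beta"),
        Segment(label="Two", markdown="# Two\n\ngamma"),
    ],
)
rows = summarize_segments(doc)
```

Because the function depends only on the normalized shape, the same step could in principle serve any future adapter format without change.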
## Kontextual-engine

`kontextual-engine` can treat normalized source outputs as ingestible derived knowledge assets when it needs durable ingestion.

The durable layer should persist policy, indexing, retrieval, permissions, audit, and lifecycle state. It should not require source-format dependencies inside `markitect-tool`.

Recommended ingestion boundary:

- keep `NormalizedMarkdownDocument.to_dict()` as the portable derivative
- preserve source asset digest and normalization cache key
- record adapter ID, adapter version, and read options
- preserve document and segment provenance
- store diagnostics and quality signals for human review

## Follow-up Workplan Seeds

Recommended `markitect-filter` workplan:

```text
MKTF-WP-0001: EPUB3 Read Adapter

Implement source.epub3 against docs/source-adapter-contract.md:

- package scaffold and pyproject entry point
- optional EPUB dependencies
- META-INF/container.xml parsing
- OPF metadata and spine reading order
- nav/chapter label extraction
- body XHTML to normalized Markdown segments
- explicit boilerplate skip policy
- malformed/unsupported/lossy diagnostics
- contract tests using markitect-tool fake adapter expectations
```

Recommended `infospace-bench` workplan:

```text
ISB-WP-source-adapter-intake

Replace local EPUB source intake with markitect-tool normalize_source:

- install markitect-filter[epub3] in the relevant environment
- call normalize_source for source documents
- consume NormalizedMarkdownDocument markdown and segments
- remove app-local EPUB package parsing
- preserve source diagnostics in benchmark review artifacts
```
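The ingestion boundary recommended in the Kontextual-engine section could be enforced with a small gate on the durable side. The sketch below is one possible shape, not the `kontextual-engine` API: the function name `build_ingestion_record` and every key in `REQUIRED_FIELDS` are hypothetical, chosen to mirror the bullet list (digest, cache key, adapter identity, read options, portable document, diagnostics).

```python
import json

# Hypothetical key names mirroring the recommended ingestion boundary.
REQUIRED_FIELDS = (
    "source_digest",
    "cache_key",
    "adapter_id",
    "adapter_version",
    "read_options",
    "document",     # assumed to hold NormalizedMarkdownDocument.to_dict() output
    "diagnostics",
)


def build_ingestion_record(payload: dict) -> dict:
    """Return a portable record containing only the agreed boundary fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError("missing ingestion fields: " + ", ".join(missing))
    # Copy only the agreed fields so format-specific extras stay out of
    # the durable layer.
    record = {name: payload[name] for name in REQUIRED_FIELDS}
    # Round-trip through JSON to confirm the record is serializable as-is.
    return json.loads(json.dumps(record, sort_keys=True))


payload = {
    "source_digest": "sha256:0000",
    "cache_key": "norm:0000",
    "adapter_id": "source.epub3",
    "adapter_version": "0.1.0",
    "read_options": {},
    "document": {"markdown": "# Title", "segments": []},
    "diagnostics": [],
    "epub_spine": ["ch1.xhtml"],  # format-specific extra, dropped at the boundary
}
record = build_ingestion_record(payload)
```

Dropping unknown keys instead of rejecting them is a design choice in this sketch; a stricter gate could raise on extras to surface adapters that leak format internals.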