---
id: MKTF-WP-0001
type: workplan
title: "EPUB3 Read Adapter"
domain: markitect
status: done
owner: markitect-filter
topic_slug: markitect
planning_priority: complete
planning_order: 10
related_workplans:
  - MKTT-WP-0018
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_id: "15595fa9-63f9-4ff5-8a9d-45f51893f085"
---

# MKTF-WP-0001: EPUB3 Read Adapter

## Purpose

Implement the first concrete `markitect-filter` source adapter: `source.epub3`, a read-only EPUB3 adapter that satisfies the `markitect-tool` source adapter contract.

The contract dependency is cross-repo and is tracked as related work rather than a same-repo State Hub dependency edge: `markitect-tool` `MKTT-WP-0018`.

## Implemented Scope

- Python package scaffold with `pyproject.toml`.
- Entry point group registration: `markitect_tool.source_adapters`.
- Lightweight `epub3_adapter_descriptor`.
- Stdlib-only EPUB3 package reading with `zipfile` and `ElementTree`.
- `META-INF/container.xml` rootfile discovery.
- OPF metadata, manifest, and spine extraction.
- EPUB nav label extraction.
- XHTML body extraction into ordered Markdown segments.
- Source provenance with package paths, hrefs, anchors, and section labels.
- Structured diagnostics for malformed EPUBs, skipped boilerplate, missing spine items, unsupported media, and malformed XML.
- Tests for descriptor shape, matching, inspection, normalization, malformed packages, Markitect API registry use, and entry point shape.

## Non-Goals

- PDF, DOCX, ODT, OCR, or browser extraction.
- Write/export adapters.
- Network fetching.
- Styling-preserving conversion.
- Image extraction beyond future metadata/attachment handling.

## Validation

Run from `markitect-filter`:

```bash
PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest
```
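
For orientation, the `META-INF/container.xml` rootfile discovery step listed under Implemented Scope can be sketched with the same stdlib modules the workplan names (`zipfile` and `ElementTree`). This is a minimal illustration, not the adapter's actual code: the function name `find_rootfile`, the `ValueError` handling, and the in-memory demo package are assumptions made for this example. The real adapter reports structured diagnostics, whose shape is not shown here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by the OCF container document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical minimal container.xml used only for the demo below.
DEMO_CONTAINER_XML = (
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    "<rootfiles>"
    '<rootfile full-path="OEBPS/content.opf" '
    'media-type="application/oebps-package+xml"/>'
    "</rootfiles>"
    "</container>"
)


def find_rootfile(epub: zipfile.ZipFile) -> str:
    """Return the OPF package path declared in META-INF/container.xml.

    Hypothetical helper: raises ValueError where a real adapter would
    more likely emit a structured diagnostic.
    """
    try:
        raw = epub.read("META-INF/container.xml")
    except KeyError as exc:  # ZipFile.read raises KeyError for a missing member
        raise ValueError("missing META-INF/container.xml") from exc
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        raise ValueError("malformed container.xml") from exc
    node = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    full_path = node.get("full-path") if node is not None else None
    if not full_path:
        raise ValueError("container.xml declares no rootfile")
    return full_path


def build_demo_epub(include_container: bool = True) -> bytes:
    """Build a tiny in-memory archive that is just enough for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        if include_container:
            zf.writestr("META-INF/container.xml", DEMO_CONTAINER_XML)
    return buf.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_demo_epub())) as zf:
        print(find_rootfile(zf))
```

The returned path would then be the starting point for the OPF metadata, manifest, and spine extraction; reading everything through `zipfile` keeps the sketch free of third-party dependencies and network access, in line with the stdlib-only scope and the Non-Goals.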