| id | type | title | domain | status | owner | topic_slug | planning_priority | planning_order | related_workplans | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MKTF-WP-0001 | workplan | EPUB3 Read Adapter | markitect | done | markitect-filter | markitect | complete | 10 | | 2026-05-14 | 2026-05-14 | 15595fa9-63f9-4ff5-8a9d-45f51893f085 |
MKTF-WP-0001: EPUB3 Read Adapter
Purpose
Implement the first concrete markitect-filter source adapter:
source.epub3, a read-only EPUB3 adapter that satisfies the
markitect-tool source adapter contract.
The contract dependency on markitect-tool MKTT-WP-0018 is cross-repo, so it is
tracked as related work rather than as a same-repo State Hub dependency edge.
Implemented Scope
- Python package scaffold with pyproject.toml.
- Entry point group registration: markitect_tool.source_adapters.
- Lightweight epub3_adapter_descriptor.
- Stdlib-only EPUB3 package reading with zipfile and ElementTree.
- META-INF/container.xml rootfile discovery.
- OPF metadata, manifest, and spine extraction.
- EPUB nav label extraction.
- XHTML body extraction into ordered Markdown segments.
- Source provenance with package paths, hrefs, anchors, and section labels.
- Structured diagnostics for malformed EPUBs, skipped boilerplate, missing spine items, unsupported media, and malformed XML.
- Tests for descriptor shape, matching, inspection, normalization, malformed packages, Markitect API registry use, and entry point shape.
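The package-reading items above (rootfile discovery, then OPF metadata, manifest, and spine extraction) can be illustrated with a minimal stdlib-only sketch. This is not the adapter's actual code: the names find_rootfile, read_package, and build_sample_epub, the shape of the returned dict, and the diagnostic wording are all hypothetical and chosen only for illustration. The XML namespaces are the standard OCF container, OPF, and Dublin Core namespaces.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_PATH = "META-INF/container.xml"
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_rootfile(zf):
    """Return the OPF package path declared in META-INF/container.xml."""
    root = ET.fromstring(zf.read(CONTAINER_PATH))
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None or not rootfile.get("full-path"):
        raise ValueError("container.xml declares no rootfile")
    return rootfile.get("full-path")


def read_package(zf):
    """Extract title, manifest, and ordered spine paths (hypothetical shape)."""
    opf_path = find_rootfile(zf)
    base = posixpath.dirname(opf_path)  # manifest hrefs are relative to the OPF
    package = ET.fromstring(zf.read(opf_path))

    title = package.findtext("opf:metadata/dc:title", default=None, namespaces=NS)
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }

    spine, diagnostics = [], []
    for itemref in package.findall("opf:spine/opf:itemref", NS):
        idref = itemref.get("idref")
        href = manifest.get(idref)
        if href is None:
            # Report the problem instead of failing the whole read.
            diagnostics.append(f"spine itemref {idref!r} has no manifest item")
            continue
        spine.append(posixpath.normpath(posixpath.join(base, href)))

    return {
        "opf_path": opf_path,
        "title": title,
        "manifest": manifest,
        "spine": spine,
        "diagnostics": diagnostics,
    }


def build_sample_epub():
    """Build a tiny in-memory EPUB-like zip for demonstration (illustrative data)."""
    container_xml = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    opf_xml = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Sample</dc:title></metadata>"
        '<manifest><item id="ch1" href="text/ch1.xhtml" '
        'media-type="application/xhtml+xml"/></manifest>'
        '<spine><itemref idref="ch1"/><itemref idref="missing"/></spine>'
        "</package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr(CONTAINER_PATH, container_xml)
        zf.writestr("OEBPS/content.opf", opf_xml)
    buf.seek(0)
    return buf


with zipfile.ZipFile(build_sample_epub()) as zf:
    pkg = read_package(zf)

print(pkg["title"])        # Sample
print(pkg["spine"])        # ['OEBPS/text/ch1.xhtml']
print(pkg["diagnostics"])  # one entry for the unresolved 'missing' itemref
```

In this sketch a spine reference that cannot be resolved is collected as a diagnostic rather than raised as an exception, which mirrors the structured-diagnostics item in the list; whether the real adapter makes the same choice is an assumption.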
Non-Goals
- PDF, DOCX, ODT, OCR, or browser extraction.
- Write/export adapters.
- Network fetching.
- Styling-preserving conversion.
- Image extraction beyond future metadata/attachment handling.
Validation
Run from markitect-filter:
PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest