markitect-filter/workplans/MKTF-WP-0001-epub3-read-adapter.md

id: MKTF-WP-0001
type: workplan
title: EPUB3 Read Adapter
domain: markitect
status: done
owner: markitect-filter
topic_slug: markitect
planning_priority: complete
planning_order: 10
related_workplans: MKTT-WP-0018
created: 2026-05-14
updated: 2026-05-14
state_hub_workstream_id: 15595fa9-63f9-4ff5-8a9d-45f51893f085

MKTF-WP-0001: EPUB3 Read Adapter

Purpose

Implement the first concrete markitect-filter source adapter: source.epub3, a read-only EPUB3 adapter that satisfies the markitect-tool source adapter contract.

The contract is defined in another repository (markitect-tool, MKTT-WP-0018), so the dependency is tracked as related work rather than as a same-repo State Hub dependency edge.

Implemented Scope

  • Python package scaffold with pyproject.toml.
  • Entry point group registration: markitect_tool.source_adapters.
  • Lightweight epub3_adapter_descriptor.
  • Stdlib-only EPUB3 package reading with zipfile and ElementTree.
  • META-INF/container.xml rootfile discovery.
  • OPF metadata, manifest, and spine extraction.
  • EPUB nav label extraction.
  • XHTML body extraction into ordered Markdown segments.
  • Source provenance with package paths, hrefs, anchors, and section labels.
  • Structured diagnostics for malformed EPUBs, skipped boilerplate, missing spine items, unsupported media, and malformed XML.
  • Tests for descriptor shape, matching, inspection, normalization, malformed packages, Markitect API registry use, and entry point shape.
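
The package-reading items above (container.xml rootfile discovery, OPF metadata, manifest, and spine extraction, and a diagnostic for missing spine items) can be sketched with the standard library alone. This is a minimal illustration, not the adapter's actual code: read_package, build_sample_epub, the sample XML, and the diagnostic wording are all hypothetical names introduced here, and the real adapter's return types are not shown in this workplan.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container and package documents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

CONTAINER_XML = """<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF_XML = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample</dc:title>
  </metadata>
  <manifest>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ghost"/>
  </spine>
</package>"""


def build_sample_epub():
    """Build a tiny in-memory EPUB-like archive for the demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", OPF_XML)
    return buf.getvalue()


def read_package(epub_bytes):
    """Hypothetical helper: return (title, spine_paths, diagnostics)."""
    diagnostics = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # Step 1: container.xml names the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            return None, [], ["container.xml has no rootfile"]
        opf_path = rootfile.get("full-path")

        # Step 2: read the title and the manifest (id -> href) from the OPF.
        opf = ET.fromstring(zf.read(opf_path))
        title = opf.findtext("opf:metadata/dc:title", namespaces=NS)
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", NS)
        }

        # Step 3: resolve spine idrefs to package paths, in reading order.
        opf_dir = posixpath.dirname(opf_path)
        spine_paths = []
        for itemref in opf.findall("opf:spine/opf:itemref", NS):
            idref = itemref.get("idref")
            href = manifest.get(idref)
            if href is None:
                diagnostics.append(f"spine item {idref!r} missing from manifest")
                continue
            spine_paths.append(posixpath.join(opf_dir, href))
    return title, spine_paths, diagnostics


title, spine_paths, diagnostics = read_package(build_sample_epub())
print(title, spine_paths, diagnostics)
```

Manifest hrefs are resolved relative to the OPF file's directory, which is why the sketch joins them with posixpath rather than os.path: paths inside a ZIP archive always use forward slashes.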
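
The XHTML body extraction and provenance items above can be illustrated in the same stdlib-only style. The sketch below is an assumption about one possible shape, not the adapter's implementation: xhtml_to_segments and the segment dictionary keys (markdown, path, anchor) are hypothetical, and only headings and paragraphs are handled.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Map namespaced heading tags (h1..h6) to their Markdown heading level.
HEADINGS = {f"{XHTML}h{n}": n for n in range(1, 7)}

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1 id="intro">Introduction</h1>
    <p>First <em>paragraph</em>.</p>
    <section><h2 id="s1">Details</h2><p>Second paragraph.</p></section>
  </body>
</html>"""


def xhtml_to_segments(xhtml_text, package_path):
    """Hypothetical helper: one segment per heading or paragraph, in document order."""
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{XHTML}body")
    segments = []
    if body is None:
        return segments
    for el in body.iter():
        if el.tag in HEADINGS:
            prefix = "#" * HEADINGS[el.tag] + " "
        elif el.tag == f"{XHTML}p":
            prefix = ""
        else:
            continue  # containers such as body or section carry no segment
        # Flatten inline markup and collapse runs of whitespace.
        text = " ".join("".join(el.itertext()).split())
        if not text:
            continue
        segments.append(
            {
                "markdown": prefix + text,
                "path": package_path,    # provenance: file inside the package
                "anchor": el.get("id"),  # provenance: fragment id, if any
            }
        )
    return segments


for segment in xhtml_to_segments(SAMPLE, "OEBPS/ch1.xhtml"):
    print(segment)
```

One known limit of this approach: ElementTree rejects named HTML entities such as &nbsp; unless they are declared, so real-world content documents may need entity handling before parsing, or a malformed-XML diagnostic when parsing fails.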

Non-Goals

  • PDF, DOCX, ODT, OCR, or browser extraction.
  • Write/export adapters.
  • Network fetching.
  • Styling-preserving conversion.
  • Image extraction beyond future metadata/attachment handling.

Validation

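Run from the markitect-filter checkout. The PYTHONPATH below assumes a local markitect-tool checkout at /home/worsch/markitect-tool; adjust it to match your environment: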

PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest