markitect-filter/workplans/MKTF-WP-0001-epub3-read-adapter.md
2026-05-14 22:46:51 +02:00

id: MKTF-WP-0001
type: workplan
title: EPUB3 Read Adapter
domain: markitect
status: done
owner: markitect-filter
topic_slug: markitect
planning_priority: complete
planning_order: 10
depends_on_workplans: MKTT-WP-0018
created: 2026-05-14
updated: 2026-05-14

MKTF-WP-0001: EPUB3 Read Adapter

Purpose

Implement the first concrete markitect-filter source adapter: source.epub3, a read-only EPUB3 adapter that satisfies the markitect-tool source adapter contract.

Implemented Scope

  • Python package scaffold with pyproject.toml.
  • Entry point group registration: markitect_tool.source_adapters.
  • Lightweight epub3_adapter_descriptor.
  • Stdlib-only EPUB3 package reading with zipfile and ElementTree.
  • META-INF/container.xml rootfile discovery.
  • OPF metadata, manifest, and spine extraction.
  • EPUB nav label extraction.
  • XHTML body extraction into ordered Markdown segments.
  • Source provenance with package paths, hrefs, anchors, and section labels.
  • Structured diagnostics for malformed EPUBs, skipped boilerplate, missing spine items, unsupported media, and malformed XML.
  • Tests for descriptor shape, matching, inspection, normalization, malformed packages, Markitect API registry use, and entry point shape.
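
The reading steps listed above (rootfile discovery, then manifest and spine extraction, with diagnostics for missing spine items) can be illustrated with a minimal stdlib-only sketch. This is not the adapter's actual code: the function names `read_spine` and `build_demo_epub`, the return shape, and the diagnostic wording are hypothetical and chosen only for illustration. The namespaces and the `META-INF/container.xml` location come from the EPUB container and package formats.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Standard EPUB/OPF XML namespaces.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_spine(epub_bytes):
    """Hypothetical helper: return (title, ordered spine paths, diagnostics)."""
    diagnostics = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # 1. container.xml names the OPF package document (the "rootfile").
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.get("full-path") if rootfile is not None else None
        if not opf_path:
            return None, [], ["container.xml declares no rootfile"]
        opf_dir = posixpath.dirname(opf_path)

        # 2. Read metadata and the manifest (id -> href) from the OPF.
        package = ET.fromstring(zf.read(opf_path))
        title = package.findtext("opf:metadata/dc:title", namespaces=NS)
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }

        # 3. Resolve spine itemrefs, in reading order, to archive paths.
        names = set(zf.namelist())
        spine = []
        for itemref in package.findall("opf:spine/opf:itemref", NS):
            idref = itemref.get("idref")
            href = manifest.get(idref)
            if href is None:
                diagnostics.append(f"spine item {idref!r} is not in the manifest")
                continue
            # Manifest hrefs are relative to the OPF file's directory.
            path = posixpath.normpath(posixpath.join(opf_dir, href))
            if path not in names:
                diagnostics.append(f"spine item {idref!r} is missing from the archive")
                continue
            spine.append(path)
    return title, spine, diagnostics


def build_demo_epub():
    """Build a tiny in-memory EPUB whose spine has one dangling itemref."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Demo</dc:title></metadata>"
        '<manifest><item id="ch1" href="ch1.xhtml" '
        'media-type="application/xhtml+xml"/></manifest>'
        '<spine><itemref idref="ch1"/><itemref idref="ghost"/></spine>'
        "</package>"
    )
    xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/ch1.xhtml", xhtml)
    return buf.getvalue()


title, spine, diagnostics = read_spine(build_demo_epub())
```

In this sketch a dangling spine entry yields a diagnostic string instead of raising, which mirrors the idea of structured diagnostics for missing spine items. A real adapter would presumably use richer diagnostic objects and would also guard against malformed ZIP or XML input.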

Non-Goals

  • PDF, DOCX, ODT, OCR, or browser extraction.
  • Write/export adapters.
  • Network fetching.
  • Styling-preserving conversion.
  • Image extraction beyond future metadata/attachment handling.

Validation

Run the test suite from the markitect-filter directory:

PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest