
tegwick 789602ef93 chore(workplans): sync state hub metadata

2026-05-14 23:03:19 +02:00

1.8 KiB


id, type, title, domain, status, owner, topic_slug, planning_priority, planning_order, related_workplans, created, updated, state_hub_workstream_id


MKTF-WP-0001

workplan

EPUB3 Read Adapter

markitect

done

markitect-filter

markitect

complete

MKTT-WP-0018

2026-05-14

15595fa9-63f9-4ff5-8a9d-45f51893f085

MKTF-WP-0001: EPUB3 Read Adapter

Purpose

Implement the first concrete markitect-filter source adapter: source.epub3, a read-only EPUB3 adapter that satisfies the markitect-tool source adapter contract.

The contract lives in another repository (markitect-tool MKTT-WP-0018), so the dependency is tracked as related work rather than as a same-repo State Hub dependency edge.

Implemented Scope

Python package scaffold with pyproject.toml.
Entry point group registration: markitect_tool.source_adapters.
Lightweight epub3_adapter_descriptor.
Stdlib-only EPUB3 package reading with zipfile and ElementTree.
META-INF/container.xml rootfile discovery.
OPF metadata, manifest, and spine extraction.
EPUB nav label extraction.
XHTML body extraction into ordered Markdown segments.
Source provenance with package paths, hrefs, anchors, and section labels.
Structured diagnostics for malformed EPUBs, skipped boilerplate, missing spine items, unsupported media, and malformed XML.
Tests for descriptor shape, matching, inspection, normalization, malformed packages, Markitect API registry use, and entry point shape.
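Registration under the markitect_tool.source_adapters entry point group lets a consumer enumerate installed adapters through importlib.metadata. The sketch below assumes each entry point resolves to a descriptor object such as epub3_adapter_descriptor. discover_adapters is a hypothetical helper, and the Markitect API registry may perform discovery differently.

```python
import sys
from importlib.metadata import entry_points

# Group name from the workplan; entry point names are whatever packages register.
GROUP = "markitect_tool.source_adapters"


def discover_adapters(group: str = GROUP) -> dict:
    """Load every object registered under `group`, keyed by entry point name."""
    if sys.version_info >= (3, 10):
        eps = entry_points(group=group)
    else:  # Python 3.8/3.9: entry_points() returns a dict of group -> entry points
        eps = entry_points().get(group, [])
    return {ep.name: ep.load() for ep in eps}
```

With markitect-filter installed, the result should include the EPUB3 descriptor under whatever name the package registers. In an environment with no registrations, it is an empty dict.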
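The package-reading items in the scope list cover container.xml rootfile discovery followed by OPF metadata, manifest, and spine extraction with only zipfile and ElementTree. They could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the adapter's code. read_package, PackageInfo, and the diagnostic codes are hypothetical names. Only Dublin Core metadata is collected, and only XHTML spine items are accepted.

```python
from __future__ import annotations

import posixpath
import xml.etree.ElementTree as ET
import zipfile
from dataclasses import dataclass, field
from urllib.parse import unquote

# Namespaces fixed by the EPUB OCF and OPF specifications.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


@dataclass
class PackageInfo:
    """Hypothetical result shape for this sketch, not the adapter's real API."""
    opf_path: str | None = None
    metadata: dict[str, list[str]] = field(default_factory=dict)
    manifest: dict[str, dict[str, str]] = field(default_factory=dict)
    spine: list[str] = field(default_factory=list)  # archive paths, reading order
    diagnostics: list[tuple[str, str]] = field(default_factory=list)  # (code, detail)


def read_package(source) -> PackageInfo:
    """Locate the OPF via container.xml, then read metadata, manifest, and spine."""
    info = PackageInfo()
    try:
        with zipfile.ZipFile(source) as zf:  # accepts a path or a binary file object
            names = set(zf.namelist())
            container = ET.fromstring(zf.read("META-INF/container.xml"))
            rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
            if rootfile is None or not rootfile.get("full-path"):
                info.diagnostics.append(("no-rootfile", "META-INF/container.xml"))
                return info
            info.opf_path = rootfile.get("full-path")
            opf = ET.fromstring(zf.read(info.opf_path))
    except zipfile.BadZipFile as exc:
        info.diagnostics.append(("not-a-zip", str(exc)))
        return info
    except KeyError as exc:  # ZipFile.read raises KeyError for a missing member
        info.diagnostics.append(("missing-member", str(exc)))
        return info
    except ET.ParseError as exc:
        info.diagnostics.append(("malformed-xml", str(exc)))
        return info

    # Dublin Core elements only; OPF <meta> refinements are ignored here.
    metadata = opf.find("opf:metadata", OPF_NS)
    if metadata is not None:
        for child in metadata:
            if isinstance(child.tag, str) and child.tag.startswith(DC_PREFIX):
                key = child.tag[len(DC_PREFIX):]
                info.metadata.setdefault(key, []).append((child.text or "").strip())

    # Manifest hrefs are URL-encoded and relative to the OPF file's directory.
    base = posixpath.dirname(info.opf_path)
    for item in opf.iterfind("opf:manifest/opf:item", OPF_NS):
        href = unquote(item.get("href", ""))
        info.manifest[item.get("id", "")] = {
            "path": posixpath.normpath(posixpath.join(base, href)),
            "media_type": item.get("media-type", ""),
        }

    # Spine order is reading order; report entries that cannot be used.
    for ref in opf.iterfind("opf:spine/opf:itemref", OPF_NS):
        idref = ref.get("idref", "")
        entry = info.manifest.get(idref)
        if entry is None:
            info.diagnostics.append(("missing-spine-item", idref))
        elif entry["path"] not in names:
            info.diagnostics.append(("missing-spine-item", entry["path"]))
        elif entry["media_type"] != "application/xhtml+xml":
            info.diagnostics.append(("unsupported-media", entry["path"]))
        else:
            info.spine.append(entry["path"])
    return info
```

Returning diagnostics instead of raising keeps one malformed part of a package from hiding the rest. That matches the scope's emphasis on structured diagnostics, though the real adapter's diagnostic shape is not specified here.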
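XHTML body extraction into ordered Markdown segments with provenance might look roughly like the sketch below. It assumes a deliberately small mapping. Headings become # lines, and paragraphs, list items, and block quotes each become one segment. It records the most recent id attribute as the anchor and the last heading as the section label. Segment and xhtml_to_segments are hypothetical names, and the real adapter may take section labels from the EPUB nav document instead.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

XHTML = "{http://www.w3.org/1999/xhtml}"
HEADINGS = {f"{XHTML}h{n}": n for n in range(1, 7)}
# Block elements emitted as one segment each, with an assumed Markdown prefix.
BLOCK_PREFIX = {f"{XHTML}p": "", f"{XHTML}li": "- ", f"{XHTML}blockquote": "> "}


@dataclass
class Segment:
    """Hypothetical segment shape: Markdown text plus its source provenance."""
    markdown: str
    path: str            # archive path of the XHTML document
    anchor: str | None   # most recent id attribute seen on a walked element
    section: str | None  # text of the most recent heading


def _text(el: ET.Element) -> str:
    # Flatten inline markup (em, a, span, ...) and collapse runs of whitespace.
    return " ".join("".join(el.itertext()).split())


def xhtml_to_segments(data: bytes, path: str) -> list[Segment]:
    body = ET.fromstring(data).find(f"{XHTML}body")
    segments: list[Segment] = []
    anchor: str | None = None
    section: str | None = None

    def walk(el: ET.Element) -> None:
        nonlocal anchor, section
        if el.get("id"):
            anchor = el.get("id")
        if el.tag in HEADINGS:
            text = _text(el)
            if text:
                section = text
                segments.append(Segment("#" * HEADINGS[el.tag] + " " + text, path, anchor, section))
        elif el.tag in BLOCK_PREFIX:
            text = _text(el)
            if text:
                segments.append(Segment(BLOCK_PREFIX[el.tag] + text, path, anchor, section))
        else:
            # Containers (section, div, ul, ...) are walked in document order.
            for child in el:
                walk(child)

    if body is not None:
        walk(body)
    return segments
```

Nested structure inside a block, such as a list within a list item, is flattened into that block's text. That keeps the sketch short at the cost of fidelity.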

Non-Goals

PDF, DOCX, ODT, OCR, or browser extraction.
Write/export adapters.
Network fetching.
Styling-preserving conversion.
Image extraction beyond future metadata/attachment handling.

Validation

Run from markitect-filter:

PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest
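Tests for malformed packages need small EPUB archives built on the fly. A helper like build_epub (hypothetical; the workplan does not say how the existing tests build their fixtures) can produce them in memory. It follows the OCF rule that mimetype is the first archive entry and is stored uncompressed.

```python
from __future__ import annotations

import io
import zipfile

CONTAINER_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    '<rootfiles><rootfile full-path="{opf}" media-type="application/oebps-package+xml"/>'
    "</rootfiles></container>"
)


def build_epub(files: dict[str, str], opf_path: str = "OEBPS/content.opf",
               with_container: bool = True) -> bytes:
    """Return EPUB bytes holding `files`; drop pieces to simulate malformed packages."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF: "mimetype" must be the first entry and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        if with_container:
            zf.writestr("META-INF/container.xml", CONTAINER_XML.format(opf=opf_path),
                        compress_type=zipfile.ZIP_DEFLATED)
        for name, text in files.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

In a pytest test, the bytes can be written to a file under the tmp_path fixture. Passing with_container=False yields an archive without META-INF/container.xml, which is one way to exercise the malformed-package diagnostics.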
