markitect-filter

markitect-filter provides concrete source-format adapters for converting external document formats into canonical Markitect Markdown representations.

The first adapters are read-only source adapters that implement the markitect-tool source adapter contract:

  • source.epub3 for EPUB3 packages
  • source.pdf for digitally-readable PDFs

Development

Run tests from this checkout:

PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest

Both adapters are registered through the following entry points:

[project.entry-points."markitect_tool.source_adapters"]
epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
pdf = "markitect_filter.adapters:pdf_adapter_descriptor"

The first PDF slice is stdlib-only and targets deterministic text extraction from local, digitally-readable PDFs. OCR, scanned-document recognition, and layout-perfect reconstruction are intentionally deferred.
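To give a feel for what stdlib-only text extraction can mean, here is a deliberately tiny sketch that is not the adapter's implementation. It assumes content streams are either uncompressed or Flate-compressed, and it only reads simple literal strings shown with the Tj operator; fonts, encodings, escape sequences, the TJ operator, and object structure are all ignored. The names extract_literal_text and make_toy_stream are hypothetical.

```python
import re
import zlib

# Bytes between "stream" and "endstream"; the lookbehind skips the
# "stream" that is part of a preceding "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string operand followed by the Tj (show text) operator.
# Nested parentheses are not handled and escapes are left undecoded.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_literal_text(pdf_bytes: bytes) -> list[str]:
    """Return Tj literal strings in file order (illustrative only)."""
    texts = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that
            # trail the compressed data before "endstream".
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # assume the stream was not compressed
        for tj in TJ_RE.finditer(content):
            texts.append(tj.group(1).decode("latin-1"))
    return texts


def make_toy_stream(text: str) -> bytes:
    """Build a Flate-compressed stream fragment for the demo below."""
    content = b"BT /F1 12 Tf (" + text.encode("latin-1") + b") Tj ET"
    data = zlib.compress(content)
    header = b"<< /Length %d /Filter /FlateDecode >>\n" % len(data)
    return header + b"stream\n" + data + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    print(extract_literal_text(make_toy_stream("Hello")))
```

Because the scan walks the bytes in order and uses no external state, the same input always yields the same output, which is the sense of "deterministic" this sketch tries to illustrate.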

Read-side attachment metadata is exposed through NormalizedMarkdownDocument.attachments for EPUB3 package resources, PDF embedded files, and PDF image-resource signals. See docs/source-attachment-metadata.md for the handoff contract to passive render asset manifests.

Description

A collection of read/write filters for Markdown representations of other content formats, including EPUB3.

License: MIT-0