# markitect-filter
`markitect-filter` provides concrete source-format adapters for converting
external document formats into canonical Markitect Markdown representations.
The first adapters are read-only source adapters that implement the
`markitect-tool` source adapter contract:
- `source.epub3` for EPUB3 packages
- `source.pdf` for digitally-readable PDFs
## Development
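Run the tests from this checkout, with `PYTHONPATH` covering both this
package's `src` directory and a local `markitect-tool` checkout (adjust the
second path for your machine):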
```bash
PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest
```
Both adapters are registered under the `markitect_tool.source_adapters`
entry-point group:
```toml
[project.entry-points."markitect_tool.source_adapters"]
epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
pdf = "markitect_filter.adapters:pdf_adapter_descriptor"
```
The first PDF slice is stdlib-only and targets deterministic text extraction
from local, digitally-readable PDFs. OCR, scanned-document recognition, and
layout-perfect reconstruction are intentionally deferred.

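
To give a feel for what stdlib-only extraction can involve, the sketch below
pulls literal strings out of PDF content streams using only `re` and `zlib`.
It is an illustration under heavy assumptions, not the adapter's actual
implementation: the function name `extract_literal_text` is hypothetical, and
it handles only `(text) Tj` operators in uncompressed or Flate-compressed
streams, ignoring escaped parentheses, `TJ` arrays, font encodings, and page
order.

```python
import re
import zlib

# Body of a PDF stream object: everything between "stream" and "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)
# A literal string operand followed by the Tj (show text) operator.
_TJ_RE = re.compile(rb"\((.*?)\)\s*Tj", re.DOTALL)


def extract_literal_text(pdf_bytes: bytes) -> list[str]:
    """Return the literal strings shown with Tj, in file order.

    Hypothetical helper for illustration only.
    """
    texts: list[str] = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes such as a stray "\r".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Assumption: a stream that is not valid zlib data is uncompressed.
            data = raw
        for literal in _TJ_RE.findall(data):
            texts.append(literal.decode("latin-1"))
    return texts
```

The same input bytes always produce the same list, which is the sense of
"deterministic" illustrated here. Scanned pages carry no such text operators,
which is one reason OCR falls outside this kind of approach.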
Read-side attachment metadata is exposed through
`NormalizedMarkdownDocument.attachments` for EPUB3 package resources, PDF
embedded files, and PDF image-resource signals. See
`docs/source-attachment-metadata.md` for the handoff contract to passive render
asset manifests.