# markitect-filter

`markitect-filter` provides concrete source-format adapters for converting
external document formats into canonical Markitect Markdown representations.

The first adapters are read-only source adapters that implement the
`markitect-tool` source adapter contract:

- `source.epub3` for EPUB3 packages
- `source.pdf` for digitally readable PDFs

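## Development

Run the tests from the root of this checkout. The command below puts this
package's `src` directory and the `src` directory of a local `markitect-tool`
checkout on `PYTHONPATH`; replace the absolute path with the location of your
own `markitect-tool` checkout:
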
```bash
PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest
```

Both adapters are registered as entry points in the
`markitect_tool.source_adapters` group:

```toml
[project.entry-points."markitect_tool.source_adapters"]
epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
pdf = "markitect_filter.adapters:pdf_adapter_descriptor"
```

The first PDF slice uses only the Python standard library and targets
deterministic text extraction from local, digitally readable PDFs. OCR,
scanned-document recognition, and layout-perfect reconstruction are
intentionally deferred.

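As a rough illustration of what stdlib-only, deterministic extraction can look
like, the sketch below pulls literal strings shown with the `Tj` operator out
of Flate-compressed or uncompressed content streams. It is not the adapter's
implementation: `inflate` and `extract_text` are hypothetical names, and font
encodings, `TJ` arrays, hex strings, object streams, and encryption are all
ignored.

```python
import re
import zlib

# Body of each "stream ... endstream" section in the file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string operand followed by the Tj (show text) operator.
SHOW_TEXT_RE = re.compile(rb"\((.*?)(?<!\\)\)\s*Tj", re.DOTALL)


def inflate(raw: bytes) -> bytes:
    """Return Flate-decoded bytes, or the input unchanged if it is not zlib data."""
    try:
        # decompressobj tolerates the end-of-line bytes that precede "endstream".
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw  # assumption: treat the stream as uncompressed


def extract_text(pdf_bytes: bytes) -> str:
    """Collect Tj string operands in file order, one per output line."""
    lines = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        content = inflate(stream.group(1))
        for shown in SHOW_TEXT_RE.finditer(content):
            lines.append(shown.group(1).decode("latin-1"))
    return "\n".join(lines)
```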
Read-side attachment metadata is exposed through
`NormalizedMarkdownDocument.attachments` for EPUB3 package resources, PDF
embedded files, and PDF image-resource signals. See
`docs/source-attachment-metadata.md` for the handoff contract to passive render
asset manifests.
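
For orientation, the package resources an EPUB3 file declares can be listed
with the standard library alone: `META-INF/container.xml` names the package
document, and that document's manifest lists each resource. The sketch below is
an illustration under stated assumptions, not the adapter's code;
`list_epub_resources` and the dictionary keys are hypothetical, and the mapping
onto `NormalizedMarkdownDocument.attachments` is defined by the contract
document mentioned above.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the EPUB container file and the package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def list_epub_resources(epub_file) -> list:
    """Return one dict per manifest item: id, archive path, and media type.

    `epub_file` is a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as archive:
        # container.xml points at the package document (the .opf file).
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]

        package = ET.fromstring(archive.read(opf_path))
        base = posixpath.dirname(opf_path)

        resources = []
        for item in package.findall("opf:manifest/opf:item", OPF_NS):
            resources.append({
                "id": item.attrib.get("id"),
                # Manifest hrefs are relative to the package document.
                "path": posixpath.normpath(posixpath.join(base, item.attrib["href"])),
                "media_type": item.attrib.get("media-type"),
            })
        return resources
```

Paths are resolved relative to the package document; percent-encoded `href`
values are left undecoded in this sketch.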