feat(source): add pdf read adapter

2026-05-14 23:33:31 +02:00
parent 24ee499b50
commit 0c9a418e85
8 changed files with 1176 additions and 13 deletions

@@ -3,8 +3,11 @@
 `markitect-filter` provides concrete source-format adapters for converting
 external document formats into canonical Markitect Markdown representations.
-The first adapter is `source.epub3`, a read-only EPUB3 adapter that implements
-the `markitect-tool` source adapter contract.
+The first adapters are read-only source adapters that implement the
+`markitect-tool` source adapter contract:
+- `source.epub3` for EPUB3 packages
+- `source.pdf` for digitally-readable PDFs
 ## Development
@@ -19,4 +22,9 @@ The EPUB3 adapter is registered through:
 ```toml
 [project.entry-points."markitect_tool.source_adapters"]
 epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
+pdf = "markitect_filter.adapters:pdf_adapter_descriptor"
 ```
+The first PDF slice is stdlib-only and targets deterministic text extraction
+from local, digitally-readable PDFs. OCR, scanned-document recognition, and
+layout-perfect reconstruction are intentionally deferred.
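This commit registers each adapter under the `markitect_tool.source_adapters` entry-point group. A host could therefore discover adapters from any installed distribution without importing them by name. The sketch below shows one way such a host might enumerate that group with the standard library's `importlib.metadata`. It assumes Python 3.10+, which is needed for the `group=` keyword. `load_source_adapters` is a hypothetical helper, not `markitect-tool`'s actual discovery code. The commit also does not say what the loaded `*_adapter_descriptor` objects contain, so the sketch treats them as opaque.

```python
from __future__ import annotations

from collections.abc import Iterable
from importlib.metadata import EntryPoint, entry_points
from typing import Any

# Group name taken from the pyproject.toml snippet in this commit.
SOURCE_ADAPTER_GROUP = "markitect_tool.source_adapters"


def load_source_adapters(eps: Iterable[EntryPoint] | None = None) -> dict[str, Any]:
    """Hypothetical helper: map entry-point names (e.g. "epub3", "pdf") to the
    objects they reference, such as markitect_filter.adapters:pdf_adapter_descriptor.

    `eps` defaults to every entry point installed under the adapter group;
    passing an explicit iterable keeps the function easy to test.
    """
    if eps is None:
        eps = entry_points(group=SOURCE_ADAPTER_GROUP)  # keyword form needs Python 3.10+
    adapters: dict[str, Any] = {}
    for ep in sorted(eps, key=lambda e: e.name):  # deterministic, name-ordered loading
        if ep.name in adapters:
            raise ValueError(f"duplicate source adapter name: {ep.name!r}")
        adapters[ep.name] = ep.load()  # imports the module and returns the attribute
    return adapters
```

Loading in name order and rejecting duplicate names are choices made for this sketch. They make two packages that register the same adapter name fail loudly, instead of one silently shadowing the other.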
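The PDF slice is described as stdlib-only text extraction. The minimal sketch below illustrates what that can involve using only `re` and `zlib`. It finds stream bodies, inflates zlib (FlateDecode) data, and collects the literal strings passed to the text-showing operators `Tj`, `TJ`, `'` and `"`. This is not the adapter code from this commit, and `extract_text` is a hypothetical name. The sketch skips most of what a real reader needs:

- the xref table and page tree (it scans every stream, including fonts and images)
- `/Length`
- hex strings
- font encodings and ToUnicode CMaps

It also decodes bytes as Latin-1, which is only correct for simple single-byte fonts.

```python
from __future__ import annotations

import re
import zlib

# Sketch only: streams are located by regex rather than via the xref table
# and each stream's /Length entry, which a real parser would use.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
_OPERATOR_RE = re.compile(b"[A-Za-z'\"*]+")  # content-stream operator tokens
_TEXT_SHOW_OPS = (b"Tj", b"TJ", b"'", b'"')
_WHITESPACE = b" \t\r\n\f\x00"
_DELIMITERS = b"()<>[]{}/%"
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
            b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}


def _read_literal(data: bytes, i: int) -> tuple[bytes, int]:
    """Parse a literal string whose opening '(' sits just before index i.

    Returns the unescaped bytes and the index just past the closing ')'.
    Balanced inner parentheses are kept as part of the string.
    """
    out = bytearray()
    depth = 1
    while i < len(data):
        ch = data[i:i + 1]
        if ch == b"\\":
            nxt = data[i + 1:i + 2]
            if nxt in _ESCAPES:
                out += _ESCAPES[nxt]
                i += 2
            elif nxt and nxt in b"01234567":  # octal escape, 1 to 3 digits
                j = i + 1
                while j < min(i + 4, len(data)) and data[j:j + 1] in b"01234567":
                    j += 1
                out.append(int(data[i + 1:j], 8) & 0xFF)
                i = j
            else:  # other escapes: drop the backslash (line continuations not special-cased)
                i += 1
            continue
        if ch == b"(":
            depth += 1
        elif ch == b")":
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        out += ch
        i += 1
    return bytes(out), i  # unterminated string: return what was read


def _text_from_content(data: bytes) -> list[str]:
    """Return one line per text-showing operator found in a content stream."""
    lines: list[str] = []
    pending: list[bytes] = []  # string operands seen since the last operator
    i = 0
    while i < len(data):
        ch = data[i:i + 1]
        if ch in _WHITESPACE or ch in b"[]":
            i += 1
        elif ch == b"(":
            text, i = _read_literal(data, i + 1)
            pending.append(text)
        elif ch == b"%":  # comment: skip to end of line
            while i < len(data) and data[i:i + 1] not in b"\r\n":
                i += 1
        else:  # numbers, names and operators; hex strings are consumed but ignored
            j = i + 1
            while j < len(data) and data[j:j + 1] not in _WHITESPACE + _DELIMITERS:
                j += 1
            token = data[i:j]
            i = j
            if token in _TEXT_SHOW_OPS:
                if pending:
                    lines.append(b"".join(pending).decode("latin-1"))
                pending = []
            elif _OPERATOR_RE.fullmatch(token):
                pending = []  # another operator consumed the pending operands
    return lines


def extract_text(pdf_bytes: bytes) -> str:
    """Hypothetical stdlib-only extractor: inflate each stream, collect shown text."""
    lines: list[str] = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a truncated trailing checksum, unlike zlib.decompress
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # not zlib data: assume the stream is unfiltered
        lines.extend(_text_from_content(content))
    return "\n".join(lines)
```

Emitting one line per text-showing operator keeps the output deterministic for a given file. It will, however, split words that a PDF producer positioned with separate operators. Recovering reading order is the kind of layout work this commit explicitly defers.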