# markitect-filter

`markitect-filter` provides concrete source-format adapters for converting
external document formats into canonical Markitect Markdown representations.

The first adapters are read-only source adapters that implement the
`markitect-tool` source adapter contract:

- `source.epub3` for EPUB3 packages
- `source.pdf` for digitally-readable PDFs

## Development

Run tests from this checkout:

```bash
PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest
```

The EPUB3 and PDF adapters are registered through these entry points:

```toml
[project.entry-points."markitect_tool.source_adapters"]
epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
pdf = "markitect_filter.adapters:pdf_adapter_descriptor"
```


The first PDF slice is stdlib-only and targets deterministic text extraction
from local, digitally-readable PDFs. OCR, scanned-document recognition, and
layout-perfect reconstruction are intentionally deferred.
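
To illustrate what stdlib-only extraction can look like, here is a deliberately naive sketch. It is not the adapter's actual code. The function name `extract_text_sketch` is hypothetical, and the sketch assumes content streams that are either uncompressed or Flate-compressed and that show text with simple `(...) Tj` operators.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Simple literal strings shown with Tj, e.g. "(Hello) Tj".
# Nested parentheses and backslash escapes are intentionally not handled.
TJ_RE = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def extract_text_sketch(pdf_bytes: bytes) -> str:
    """Collect literal Tj strings from every stream, in file order."""
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes after the zlib stream,
            # such as the newline that precedes "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Assume the stream was stored uncompressed.
            data = raw
        for text in TJ_RE.findall(data):
            chunks.append(text.decode("latin-1"))
    return "\n".join(chunks)


if __name__ == "__main__":
    content = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
    sample = (
        b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + content
        + b"\nendstream\nendobj\n"
    )
    print(extract_text_sketch(sample))
```

The output depends only on the input bytes, which is the sense of "deterministic" illustrated here. Font encodings, `TJ` arrays, and object streams would need real handling beyond this sketch.
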