Introduces a new `markitect/proxy/` module with pluggable extractors that convert non-markdown sources (PDF, HTML) into tracked markdown proxy files. Each proxy file preserves its origin metadata (path, checksum, timestamp) so it can be kept in sync when the original changes.

CLI commands: `proxy create`, `proxy update`, `proxy status`, `proxy extractors`.

Built-in extractors: PDF (pymupdf4llm), HTML (markdownify), Markdown (built-in).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
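The commit message does not say how origin metadata is stored or compared. As a rough illustration only, the sketch below assumes a SHA-256 checksum over the source bytes and an ISO 8601 UTC timestamp, and derives a `current` / `stale` / `missing` status from them, roughly the kind of check a `proxy status` command might perform. `ProxyOrigin`, `file_checksum`, `record_origin` and `origin_status` are hypothetical names, not markitect's API.

```python
# Hypothetical sketch: names, hash choice and status values are assumptions,
# not markitect's actual implementation.
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class ProxyOrigin:
    """Origin metadata kept alongside a proxy file (assumed shape)."""
    path: str
    checksum: str   # hex SHA-256 of the source file's bytes (assumed)
    timestamp: str  # ISO 8601 UTC time the proxy was generated


def file_checksum(path: Path) -> str:
    """Hash the file in 64 KiB chunks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_origin(source: Path) -> ProxyOrigin:
    """Capture the metadata needed to detect later changes to `source`."""
    return ProxyOrigin(
        path=str(source),
        checksum=file_checksum(source),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def origin_status(origin: ProxyOrigin) -> str:
    """Return 'missing' if the source is gone, 'stale' if its bytes changed,
    otherwise 'current'."""
    source = Path(origin.path)
    if not source.exists():
        return "missing"
    if file_checksum(source) != origin.checksum:
        return "stale"
    return "current"
```

Comparing content checksums rather than modification times means a `touch` or a fresh checkout does not mark a proxy as stale; the recorded timestamp then serves only as a record of when the proxy was last regenerated.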
"""
|
|
Built-in extractor registration.
|
|
|
|
Importing this module registers all built-in extractors with the global registry.
|
|
"""
|
|
|
|
from markitect.proxy.registry import registry
|
|
from markitect.proxy.extractors.pdf import PdfExtractor
|
|
from markitect.proxy.extractors.html import HtmlExtractor
|
|
from markitect.proxy.extractors.markdown import MarkdownNormalizer
|
|
|
|
registry.register(PdfExtractor())
|
|
registry.register(HtmlExtractor())
|
|
registry.register(MarkdownNormalizer())
|