feat(proxy): add proxy file system for non-markdown source conversion

Introduces a new `markitect/proxy/` module with pluggable extractors that convert non-markdown sources (PDF, HTML) into tracked markdown proxy files. Proxy files preserve origin metadata (path, checksum, timestamp) so they can be kept in sync when the original changes.

CLI commands: `proxy create`, `proxy update`, `proxy status`, `proxy extractors`.

Built-in extractors: PDF (pymupdf4llm), HTML (markdownify), Markdown (built-in).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:06:09 +01:00
parent 69aea1ada7
commit ac334c679d
13 changed files with 781 additions and 0 deletions
--- a/markitect/proxy/models.py
+++ b/markitect/proxy/models.py
@@ -0,0 +1,26 @@
+"""
+Data models for the proxy file system.
+"""
+
+from dataclasses import dataclass
+
+
+@dataclass
+class ProxyMetadata:
+    """Metadata stored in a proxy file's YAML frontmatter."""
+
+    source_path: str
+    source_checksum: str       # "sha256:<hex>"
+    source_size: int
+    generated_at: str          # ISO 8601
+    extractor: str
+    extractor_version: str
+
+
+@dataclass
+class ExtractionResult:
+    """Result returned by an extractor after processing a source file."""
+
+    content: str
+    extractor: str
+    extractor_version: str
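
The commit message says proxy files carry the source path, checksum, and timestamp so they can be re-synced when the original changes. The following is a minimal sketch of how a `ProxyMetadata` record could be filled in and later used to detect a changed source. It assumes the `"sha256:<hex>"` and ISO 8601 formats noted in the field comments above. The helpers `compute_checksum`, `build_metadata`, and `is_stale` are hypothetical names for illustration, not functions from this commit, and the dataclass is repeated only so the snippet runs on its own.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class ProxyMetadata:
    """Copy of the model from markitect/proxy/models.py, for a standalone example."""

    source_path: str
    source_checksum: str       # "sha256:<hex>"
    source_size: int
    generated_at: str          # ISO 8601
    extractor: str
    extractor_version: str


def compute_checksum(path: Path) -> str:
    """Hypothetical helper: hash the file in chunks and prefix the algorithm name."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def build_metadata(path: Path, extractor: str, extractor_version: str) -> ProxyMetadata:
    """Hypothetical helper: snapshot the source file's identity at extraction time."""
    return ProxyMetadata(
        source_path=str(path),
        source_checksum=compute_checksum(path),
        source_size=path.stat().st_size,
        generated_at=datetime.now(timezone.utc).isoformat(),
        extractor=extractor,
        extractor_version=extractor_version,
    )


def is_stale(meta: ProxyMetadata) -> bool:
    """Hypothetical helper: True when the source no longer matches the stored checksum."""
    return compute_checksum(Path(meta.source_path)) != meta.source_checksum
```

A check like `is_stale` is one plausible basis for what `proxy status` reports and what `proxy update` acts on. Comparing checksums rather than modification times avoids false positives when a file is touched without its content changing, at the cost of rereading the source.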