markitect-main

Author	SHA1	Message	Date
tegwick	120ed89780	fix(proxy): catch markitdown missing-dependency errors with clean hint. When markitdown is installed but a format-specific sub-dependency is missing (e.g. pdfminer-six for PDF), translate the raw traceback into a DependencyMissingError with the correct install command. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 21:00:51 +01:00
tegwick	9fa239c140	fix(proxy): register markitdown extractor unconditionally. Always register MarkitdownExtractor so it overrides specialized extractors for all its extensions. When markitdown-no-magika is not installed, users now see the correct install hint instead of the old pymupdf4llm message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 20:52:07 +01:00
tegwick	e4fbba8a57	feat(proxy): add markitdown as default proxy backend. Uses markitdown-no-magika (lighter fork without magika/onnxruntime) to handle PDF, HTML, DOCX, PPTX, XLSX, XLS, CSV, JSON, and XML files. Specialized extractors (pymupdf4llm, markdownify) remain as fallbacks when markitdown is not installed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 20:48:47 +01:00
tegwick	ac334c679d	feat(proxy): add proxy file system for non-markdown source conversion. Introduces a new `markitect/proxy/` module with pluggable extractors that convert non-markdown sources (PDF, HTML) into tracked markdown proxy files. Proxy files preserve origin metadata (path, checksum, timestamp) so they can be kept in sync when the original changes. CLI commands: `proxy create`, `proxy update`, `proxy status`, `proxy extractors`. Built-in extractors: PDF (pymupdf4llm), HTML (markdownify), Markdown (built-in). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 19:06:09 +01:00
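
Commit ac334c679d says proxy files keep origin metadata (path, checksum, timestamp) so they can be re-synced when the original changes. Here is a minimal sketch of how such a staleness check might work. It is not the markitect implementation: `OriginMetadata`, `record_origin`, `is_stale`, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class OriginMetadata:
    """Hypothetical record of where a proxy file came from."""

    path: str          # location of the original source file
    checksum: str      # hex digest of the original at conversion time
    converted_at: str  # ISO 8601 timestamp (UTC) of the conversion


def file_checksum(path: Path) -> str:
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()  # assumption: the commit only says "checksum"
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_origin(path: Path) -> OriginMetadata:
    """Capture the metadata a proxy would store when it is created."""
    return OriginMetadata(
        path=str(path),
        checksum=file_checksum(path),
        converted_at=datetime.now(timezone.utc).isoformat(),
    )


def is_stale(origin: OriginMetadata) -> bool:
    """A proxy is stale when the original no longer matches the stored checksum."""
    return file_checksum(Path(origin.path)) != origin.checksum
```

Comparing checksums rather than modification times means a file that is touched but unchanged would not trigger a re-conversion. Whether `proxy status` works this way is not stated in the log.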

4 Commits
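
Commits 9fa239c140 and 120ed89780 describe two behaviours: an extractor registered later overrides earlier ones for the extensions it claims, and a missing backend surfaces as a `DependencyMissingError` with an install hint instead of a raw traceback. The sketch below illustrates both under stated assumptions. The registry class, the extractor classes, the `required_module` attribute, the placeholder module name, and the hint text are hypothetical. Only the name `DependencyMissingError` comes from the commit message, and its constructor here is a guess.

```python
import importlib


class DependencyMissingError(RuntimeError):
    """Name taken from the commit log; this constructor is an assumption."""

    def __init__(self, package: str, install_hint: str) -> None:
        super().__init__(f"{package} is not installed. Try: {install_hint}")
        self.package = package
        self.install_hint = install_hint


class ExtractorRegistry:
    """Hypothetical registry mapping file extensions to extractors."""

    def __init__(self) -> None:
        self._by_extension = {}

    def register(self, extractor) -> None:
        # The last registration wins, so a general-purpose extractor
        # registered after the specialized ones takes over its extensions.
        for extension in extractor.extensions:
            self._by_extension[extension] = extractor

    def get(self, extension: str):
        return self._by_extension[extension]


class SpecializedPdfExtractor:
    """Stand-in for a format-specific extractor."""

    name = "specialized-pdf"
    extensions = (".pdf",)


class GeneralExtractor:
    """Stand-in for a default backend that may not be installed."""

    name = "general"
    extensions = (".pdf", ".html", ".docx")
    # Placeholder module name, chosen so the import fails in this sketch.
    required_module = "hypothetical_backend_not_installed"
    install_hint = "pip install hypothetical-backend"  # illustrative only

    def extract(self, path: str) -> str:
        try:
            backend = importlib.import_module(self.required_module)
        except ImportError as exc:
            # Translate the raw import failure into an actionable error.
            raise DependencyMissingError(
                self.required_module, self.install_hint
            ) from exc
        return backend.convert(path)  # hypothetical call, never reached here


registry = ExtractorRegistry()
registry.register(SpecializedPdfExtractor())
registry.register(GeneralExtractor())  # registered unconditionally
```

Registering the general extractor even when its backend is absent means the failure happens at extraction time. At that point the error can name the package the user actually needs.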