generated from coulomb/repo-seed
SCOPE
This file helps humans and agents understand when this repository is useful, what it owns, and where its boundaries stop.
One-liner
markitect-filter provides concrete source-format adapters that convert EPUB3 and digitally readable PDF inputs into normalized Markitect Markdown documents.
Core Idea
This repo keeps source extraction outside markitect-tool while implementing
the Markitect source adapter contract. It turns selected external document
formats into deterministic, normalized read-side artifacts that the Markitect
core can consume without knowing each format's package or file structure.
In Scope
- EPUB3 read adapter descriptor and package-to-Markitect normalization.
- PDF read adapter descriptor for local, digitally readable PDF inputs.
- Source attachment metadata for EPUB package resources, PDF embedded files, and PDF image-resource signals.
- Tests, examples, and docs for adapter contract compatibility.
- Entry points under markitect_tool.source_adapters.
Out of Scope
- Markitect core document, contract, render, memory, or workflow engines.
- OCR, scanned-document recognition, and layout-perfect PDF reconstruction.
- Export or rendering behavior.
- Remote ingestion services, queues, storage, or production hosting.
- Owning the passive render asset manifest contract beyond read-side handoff metadata.
Relevant When
- You need Markitect to ingest EPUB3 or digitally readable PDF sources.
- You are testing source adapter descriptors against markitect-tool.
- You need attachment metadata from source documents for downstream render manifest planning.
- You are maintaining EPUB3/PDF normalization fixtures and examples.
Not Relevant When
- The needed behavior belongs in the source adapter contract itself.
- The work is Quarkdown rendering or export production.
- The input is scanned or image-only PDF requiring OCR.
- The task is general Markdown transformation after normalization.
Current State
- Status: active.
- Implementation: Python 3.12 package with EPUB3, PDF, attachment metadata, tests, examples, and docs.
- Stability: adapter slices are deterministic and test-covered.
- Integration: registered in the Custodian State Hub as markitect-filter.
How It Fits
- Upstream contract: markitect-tool owns the source adapter interfaces and normalized document model.
- Downstream consumers: Markitect workflows load this package through source adapter entry points.
- Adjacent repo: markitect-quarkdown consumes Markitect render/export contracts on the output side.
Terminology
- Preferred terms: source adapter, normalized Markdown document, attachment metadata, EPUB3 package, digitally readable PDF.
- Also known as: Markitect filter adapters.
- Potentially confusing terms: "filter" means source normalization here, not policy filtering or search result filtering.
Related / Overlapping
- markitect-tool - owns core contracts and normalized document APIs.
- markitect-quarkdown - owns concrete rendering/export through Quarkdown.
- the-custodian - State Hub registration, workplan tracking, and consistency sync.
Getting Oriented
- Start with: README.md, docs/pdf-adapter.md, and docs/source-attachment-metadata.md.
- Key directories: src/markitect_filter/, tests/, examples/, and workplans/.
- Entry points: markitect_filter.adapters:epub3_adapter_descriptor and markitect_filter.adapters:pdf_adapter_descriptor.
Provided Capabilities
type: source_adapter
title: source.epub3
description: Convert EPUB3 packages into normalized Markitect Markdown documents with package resource attachment metadata.
keywords: [epub3, source-adapter, markdown, attachments]
type: source_adapter
title: source.pdf
description: Convert local digitally readable PDFs into normalized Markitect Markdown documents with embedded-file and image-resource signals.
keywords: [pdf, source-adapter, markdown, attachments]
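For an agent deciding whether this repository is relevant, the two capability records above can be treated as plain data and matched by keyword. The sketch below is illustrative only: the Capability class and find_capabilities function are hypothetical names, not part of markitect-filter or markitect-tool, and the field values are copied from the records above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One capability record, mirroring the four fields listed above."""

    type: str
    title: str
    description: str
    keywords: tuple[str, ...]


CAPABILITIES = (
    Capability(
        type="source_adapter",
        title="source.epub3",
        description=(
            "Convert EPUB3 packages into normalized Markitect Markdown "
            "documents with package resource attachment metadata."
        ),
        keywords=("epub3", "source-adapter", "markdown", "attachments"),
    ),
    Capability(
        type="source_adapter",
        title="source.pdf",
        description=(
            "Convert local digitally readable PDFs into normalized Markitect "
            "Markdown documents with embedded-file and image-resource signals."
        ),
        keywords=("pdf", "source-adapter", "markdown", "attachments"),
    ),
)


def find_capabilities(keyword: str) -> list[str]:
    """Return the titles of capabilities tagged with the given keyword."""
    return [cap.title for cap in CAPABILITIES if keyword in cap.keywords]
```

A lookup for "ocr" returns an empty list, which matches the Out of Scope section: scanned-document recognition is not provided here.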
Notes
Run tests with PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest from this checkout.