From fca3a22dd572385cac1885591fd03e26290d51f5 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Fri, 15 May 2026 15:20:33 +0200
Subject: [PATCH] Add repository scope profile

---
 SCOPE.md | 137 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)
 create mode 100644 SCOPE.md

diff --git a/SCOPE.md b/SCOPE.md
new file mode 100644
index 0000000..02d0ec1
--- /dev/null
+++ b/SCOPE.md
@@ -0,0 +1,137 @@
+# SCOPE
+
+> This file helps humans and agents understand when this repository is useful,
+> what it owns, and where its boundaries stop.
+
+---
+
+## One-liner
+
+markitect-filter provides concrete source-format adapters that convert EPUB3 and
+digitally readable PDF inputs into normalized Markitect Markdown documents.
+
+---
+
+## Core Idea
+
+This repo keeps source extraction outside `markitect-tool` while implementing
+the Markitect source adapter contract. It turns selected external document
+formats into deterministic, normalized read-side artifacts that the Markitect
+core can consume without knowing each format's package or file structure.
+
+---
+
+## In Scope
+
+- EPUB3 read adapter descriptor and package-to-Markitect normalization.
+- PDF read adapter descriptor for local, digitally readable PDF inputs.
+- Source attachment metadata for EPUB package resources, PDF embedded files,
+  and PDF image-resource signals.
+- Tests, examples, and docs for adapter contract compatibility.
+- Entry points under `markitect_tool.source_adapters`.
+
+---
+
+## Out of Scope
+
+- Markitect core document, contract, render, memory, or workflow engines.
+- OCR, scanned-document recognition, and layout-perfect PDF reconstruction.
+- Export or rendering behavior.
+- Remote ingestion services, queues, storage, or production hosting.
+- Owning the passive render asset manifest contract beyond read-side handoff
+  metadata.
+
+---
+
+## Relevant When
+
+- You need Markitect to ingest EPUB3 or digitally readable PDF sources.
+- You are testing source adapter descriptors against `markitect-tool`.
+- You need attachment metadata from source documents for downstream render
+  manifest planning.
+- You are maintaining EPUB3/PDF normalization fixtures and examples.
+
+---
+
+## Not Relevant When
+
+- The needed behavior belongs in the source adapter contract itself.
+- The work is Quarkdown rendering or export production.
+- The input is scanned or image-only PDF requiring OCR.
+- The task is general Markdown transformation after normalization.
+
+---
+
+## Current State
+
+- Status: active.
+- Implementation: Python 3.12 package with EPUB3, PDF, attachment metadata,
+  tests, examples, and docs.
+- Stability: adapter slices are deterministic and test-covered.
+- Integration: registered in the Custodian State Hub as `markitect-filter`.
+
+---
+
+## How It Fits
+
+- Upstream contract: `markitect-tool` owns the source adapter interfaces and
+  normalized document model.
+- Downstream consumers: Markitect workflows load this package through source
+  adapter entry points.
+- Adjacent repo: `markitect-quarkdown` consumes Markitect render/export
+  contracts on the output side.
+
+---
+
+## Terminology
+
+- Preferred terms: source adapter, normalized Markdown document, attachment
+  metadata, EPUB3 package, digitally readable PDF.
+- Also known as: Markitect filter adapters.
+- Potentially confusing terms: "filter" means source normalization here, not
+  policy filtering or search result filtering.
+
+---
+
+## Related / Overlapping
+
+- `markitect-tool` - owns core contracts and normalized document APIs.
+- `markitect-quarkdown` - owns concrete rendering/export through Quarkdown.
+- `the-custodian` - State Hub registration, workplan tracking, and consistency
+  sync.
+
+---
+
+## Getting Oriented
+
+- Start with: `README.md`, `docs/pdf-adapter.md`, and
+  `docs/source-attachment-metadata.md`.
+- Key directories: `src/markitect_filter/`, `tests/`, `examples/`, and
+  `workplans/`.
+- Entry points: `markitect_filter.adapters:epub3_adapter_descriptor` and
+  `markitect_filter.adapters:pdf_adapter_descriptor`.
+
+---
+
+## Provided Capabilities
+
+```capability
+type: source_adapter
+title: source.epub3
+description: Convert EPUB3 packages into normalized Markitect Markdown documents with package resource attachment metadata.
+keywords: [epub3, source-adapter, markdown, attachments]
+```
+
+```capability
+type: source_adapter
+title: source.pdf
+description: Convert local digitally readable PDFs into normalized Markitect Markdown documents with embedded-file and image-resource signals.
+keywords: [pdf, source-adapter, markdown, attachments]
+```
+
+---
+
+## Notes
+
+Run tests with `PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m
+pytest` from this checkout.