Add repository scope profile

2026-05-15 15:20:33 +02:00
parent a137cba176
commit fca3a22dd5

SCOPE.md Normal file

@@ -0,0 +1,137 @@
# SCOPE
> This file helps humans and agents understand when this repository is useful,
> what it owns, and where its boundaries stop.
---
## One-liner
markitect-filter provides concrete source-format adapters that convert EPUB3 and
digitally readable PDF inputs into normalized Markitect Markdown documents.

---
## Core Idea
This repo keeps source extraction outside `markitect-tool` while implementing
the Markitect source adapter contract. It turns selected external document
formats into deterministic, normalized read-side artifacts that the Markitect
core can consume without knowing each format's package or file structure.

---
## In Scope
- EPUB3 read adapter descriptor and package-to-Markitect normalization.
- PDF read adapter descriptor for local, digitally readable PDF inputs.
- Source attachment metadata for EPUB package resources, PDF embedded files,
and PDF image-resource signals.
- Tests, examples, and docs for adapter contract compatibility.
- Entry points under `markitect_tool.source_adapters`.
---
## Out of Scope
- Markitect core document, contract, render, memory, or workflow engines.
- OCR, scanned-document recognition, and layout-perfect PDF reconstruction.
- Export or rendering behavior.
- Remote ingestion services, queues, storage, or production hosting.
- Owning the passive render asset manifest contract beyond read-side handoff
metadata.
---
## Relevant When
- You need Markitect to ingest EPUB3 or digitally readable PDF sources.
- You are testing source adapter descriptors against `markitect-tool`.
- You need attachment metadata from source documents for downstream render
manifest planning.
- You are maintaining EPUB3/PDF normalization fixtures and examples.
---
## Not Relevant When
- The needed behavior belongs in the source adapter contract itself.
- The work is Quarkdown rendering or export production.
- The input is a scanned or image-only PDF that requires OCR.
- The task is general Markdown transformation after normalization.
---
## Current State
- Status: active.
- Implementation: Python 3.12 package with EPUB3, PDF, attachment metadata,
tests, examples, and docs.
- Stability: adapter slices are deterministic and test-covered.
- Integration: registered in the Custodian State Hub as `markitect-filter`.
---
## How It Fits
- Upstream contract: `markitect-tool` owns the source adapter interfaces and
normalized document model.
- Downstream consumers: Markitect workflows load this package through source
adapter entry points.
- Adjacent repo: `markitect-quarkdown` consumes Markitect render/export
contracts on the output side.
---
## Terminology
- Preferred terms: source adapter, normalized Markdown document, attachment
metadata, EPUB3 package, digitally readable PDF.
- Also known as: Markitect filter adapters.
- Potentially confusing terms: "filter" means source normalization here, not
policy filtering or search result filtering.
---
## Related / Overlapping
- `markitect-tool` - owns core contracts and normalized document APIs.
- `markitect-quarkdown` - owns concrete rendering/export through Quarkdown.
- `the-custodian` - owns State Hub registration, workplan tracking, and
consistency sync.
---
## Getting Oriented
- Start with: `README.md`, `docs/pdf-adapter.md`, and
`docs/source-attachment-metadata.md`.
- Key directories: `src/markitect_filter/`, `tests/`, `examples/`, and
`workplans/`.
- Entry points: `markitect_filter.adapters:epub3_adapter_descriptor` and
`markitect_filter.adapters:pdf_adapter_descriptor`.
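
The entry point strings above use the usual `module:attribute` form. As a rough sketch of what that notation means, the hypothetical helper below resolves such a string by hand; it is not part of this repository, and whether each descriptor is a plain object or a callable is not stated here, so the sketch only returns whatever the attribute is. Resolving the real targets requires `markitect_filter` to be importable, for example via the `PYTHONPATH` setup in the Notes section.

```python
from importlib import import_module


def resolve_target(target: str):
    """Resolve a 'module:attr' string to the object it names.

    Hypothetical helper: imports the module before the colon, then walks
    any dotted attribute path after it. A bare module name returns the
    module itself.
    """
    module_name, _, attr_path = target.partition(":")
    obj = import_module(module_name)
    parts = attr_path.split(".") if attr_path else []
    for part in parts:
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    # Assumes markitect_filter is on the import path.
    descriptor = resolve_target(
        "markitect_filter.adapters:epub3_adapter_descriptor"
    )
    print(type(descriptor).__name__)
```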
---
## Provided Capabilities
```capability
type: source_adapter
title: source.epub3
description: Convert EPUB3 packages into normalized Markitect Markdown documents with package resource attachment metadata.
keywords: [epub3, source-adapter, markdown, attachments]
```
```capability
type: source_adapter
title: source.pdf
description: Convert local digitally readable PDFs into normalized Markitect Markdown documents with embedded-file and image-resource signals.
keywords: [pdf, source-adapter, markdown, attachments]
```
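
For tooling that wants to read the capability blocks above, the following is a minimal sketch of one way to extract them from this file. It assumes each block contains flat `key: value` lines and that bracketed values are comma-separated lists; the actual grammar expected by any consumer of these blocks is not defined here, and the names `parse_capabilities` and `CAPABILITY_BLOCK` are hypothetical.

```python
import re

# Built indirectly so the example can live inside a fenced block itself.
FENCE = "`" * 3

# Matches a fenced block whose info string is "capability".
CAPABILITY_BLOCK = re.compile(
    rf"^{FENCE}capability[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_capabilities(markdown: str) -> list[dict[str, object]]:
    """Return one dict per capability block (assumed flat key/value lines)."""
    capabilities: list[dict[str, object]] = []
    for match in CAPABILITY_BLOCK.finditer(markdown):
        entry: dict[str, object] = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue  # skip lines without a "key: value" shape
            value = value.strip()
            if value.startswith("[") and value.endswith("]"):
                # Assumption: bracketed values are simple comma-separated lists.
                entry[key.strip()] = [
                    item.strip() for item in value[1:-1].split(",") if item.strip()
                ]
            else:
                entry[key.strip()] = value
        capabilities.append(entry)
    return capabilities


SAMPLE = "\n".join([
    FENCE + "capability",
    "type: source_adapter",
    "title: source.epub3",
    "keywords: [epub3, source-adapter, markdown, attachments]",
    FENCE,
])

if __name__ == "__main__":
    for capability in parse_capabilities(SAMPLE):
        print(capability["title"], capability["keywords"])
```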
---
## Notes
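Run tests from this checkout with
`PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest`.
The second path entry points at a local `markitect-tool` checkout; adjust it to
match where that checkout lives on your machine.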