# SCOPE

> This file helps humans and agents understand when this repository is useful,
> what it owns, and where its boundaries stop.

---
## One-liner

markitect-filter provides concrete source-format adapters that convert EPUB3 and
digitally readable PDF inputs into normalized Markitect Markdown documents.

---
## Core Idea

This repo keeps source extraction outside `markitect-tool` while implementing
the Markitect source adapter contract. It turns selected external document
formats into deterministic, normalized read-side artifacts that the Markitect
core can consume without knowing each format's package or file structure.

---
## In Scope

- EPUB3 read adapter descriptor and package-to-Markitect normalization.
- PDF read adapter descriptor for local, digitally readable PDF inputs.
- Source attachment metadata for EPUB package resources, PDF embedded files,
  and PDF image-resource signals.
- Tests, examples, and docs for adapter contract compatibility.
- Entry points under `markitect_tool.source_adapters`.

---
## Out of Scope

- Markitect core document, contract, render, memory, or workflow engines.
- OCR, scanned-document recognition, and layout-perfect PDF reconstruction.
- Export or rendering behavior.
- Remote ingestion services, queues, storage, or production hosting.
- Owning the passive render asset manifest contract beyond read-side handoff
  metadata.

---
## Relevant When

- You need Markitect to ingest EPUB3 or digitally readable PDF sources.
- You are testing source adapter descriptors against `markitect-tool`.
- You need attachment metadata from source documents for downstream render
  manifest planning.
- You are maintaining EPUB3/PDF normalization fixtures and examples.

---
## Not Relevant When

- The needed behavior belongs in the source adapter contract itself.
- The work is Quarkdown rendering or export production.
- The input is scanned or image-only PDF requiring OCR.
- The task is general Markdown transformation after normalization.

---
## Current State

- Status: active.
- Implementation: Python 3.12 package with EPUB3, PDF, attachment metadata,
  tests, examples, and docs.
- Stability: adapter slices are deterministic and test-covered.
- Integration: registered in the Custodian State Hub as `markitect-filter`.

---
## How It Fits

- Upstream contract: `markitect-tool` owns the source adapter interfaces and
  normalized document model.
- Downstream consumers: Markitect workflows load this package through source
  adapter entry points.
- Adjacent repo: `markitect-quarkdown` consumes Markitect render/export
  contracts on the output side.

---
## Terminology

- Preferred terms: source adapter, normalized Markdown document, attachment
  metadata, EPUB3 package, digitally readable PDF.
- Also known as: Markitect filter adapters.
- Potentially confusing terms: "filter" means source normalization here, not
  policy filtering or search result filtering.

---
## Related / Overlapping

- `markitect-tool` - owns core contracts and normalized document APIs.
- `markitect-quarkdown` - owns concrete rendering/export through Quarkdown.
- `the-custodian` - State Hub registration, workplan tracking, and consistency
  sync.

---
## Getting Oriented

- Start with: `README.md`, `docs/pdf-adapter.md`, and
  `docs/source-attachment-metadata.md`.
- Key directories: `src/markitect_filter/`, `tests/`, `examples/`, and
  `workplans/`.
- Entry points: `markitect_filter.adapters:epub3_adapter_descriptor` and
  `markitect_filter.adapters:pdf_adapter_descriptor`.
---

## Provided Capabilities

```capability
type: source_adapter
title: source.epub3
description: Convert EPUB3 packages into normalized Markitect Markdown documents with package resource attachment metadata.
keywords: [epub3, source-adapter, markdown, attachments]
```

```capability
type: source_adapter
title: source.pdf
description: Convert local digitally readable PDFs into normalized Markitect Markdown documents with embedded-file and image-resource signals.
keywords: [pdf, source-adapter, markdown, attachments]
```
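
For agents that want to read the capability blocks above programmatically, a rough sketch of one way to extract them is shown here. It assumes each block is a fenced block with the info string `capability` containing flat `key: value` lines, with bracketed values treated as comma-separated lists; that is inferred from the two blocks in this file, not from a documented schema. The function name `parse_capabilities` is hypothetical and is not the tooling the State Hub uses. Calling `parse_capabilities(Path("SCOPE.md").read_text())` would be expected to return one dictionary per block.

```python
import re

# Built indirectly so this example can sit inside a Markdown fence itself.
FENCE = "`" * 3

# A fenced block whose info string is "capability"; group 1 is the block body.
CAPABILITY_BLOCK = re.compile(
    rf"^{FENCE}capability[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def parse_capabilities(markdown: str) -> list[dict[str, object]]:
    """Return one dict per capability block (assumed flat 'key: value' lines)."""
    capabilities: list[dict[str, object]] = []
    for match in CAPABILITY_BLOCK.finditer(markdown):
        fields: dict[str, object] = {}
        for line in match.group(1).splitlines():
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if not sep or not key:
                continue  # skip blank or malformed lines
            if value.startswith("[") and value.endswith("]"):
                # Bracketed values become a list of trimmed, non-empty items.
                items = value[1:-1].split(",")
                fields[key] = [item.strip() for item in items if item.strip()]
            else:
                fields[key] = value
        capabilities.append(fields)
    return capabilities
```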
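---

## Notes

Run tests from this checkout with
`PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest`.
The second `PYTHONPATH` entry is a machine-specific path to a local
`markitect-tool` checkout; adjust it to match your environment.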