markitect-filter/SCOPE.md
2026-05-15 15:20:33 +02:00

SCOPE

This file helps humans and agents understand when this repository is useful, what it owns, and where its boundaries stop.


One-liner

markitect-filter provides concrete source-format adapters that convert EPUB3 and digitally readable PDF inputs into normalized Markitect Markdown documents.


Core Idea

This repo keeps source extraction outside markitect-tool while implementing the Markitect source adapter contract. It turns selected external document formats into deterministic, normalized read-side artifacts that the Markitect core can consume without knowing each format's package or file structure.
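
As a rough illustration of what "deterministic, normalized" can mean for a read-side artifact, the sketch below canonicalizes Markdown text and fingerprints it. The names `normalize_markdown` and `artifact_digest` and the specific rules (NFC, LF line endings, trailing-whitespace stripping, a single final newline) are assumptions made for this example; they are not taken from markitect-filter or markitect-tool.

```python
import hashlib
import unicodedata


def normalize_markdown(text: str) -> str:
    """Hypothetical normalization: NFC, LF newlines, no trailing whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop trailing blank lines so the result ends with exactly one newline.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n"


def artifact_digest(text: str) -> str:
    """SHA-256 of the normalized text, usable as a simple determinism check."""
    return hashlib.sha256(normalize_markdown(text).encode("utf-8")).hexdigest()
```

Two inputs that differ only in line endings or trailing blank lines then produce the same digest, which is one way a fixture test might assert that an adapter's output is stable across runs.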


In Scope

  • EPUB3 read adapter descriptor and package-to-Markitect normalization.
  • PDF read adapter descriptor for local, digitally readable PDF inputs.
  • Source attachment metadata for EPUB package resources, PDF embedded files, and PDF image-resource signals.
  • Tests, examples, and docs for adapter contract compatibility.
  • Entry points under markitect_tool.source_adapters.
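
To make "source attachment metadata for EPUB package resources" more concrete, here is a minimal sketch that reads the manifest of an EPUB package using only the standard library. It relies on the standard EPUB layout (a `META-INF/container.xml` that points at the OPF package document). The function name `list_package_resources`, the record keys, and the demo helper are hypothetical and do not reflect the metadata shape this repo emits.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def list_package_resources(epub: zipfile.ZipFile) -> list[dict[str, str]]:
    """Return one record per manifest item, sorted by id for stable output."""
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    package = ET.fromstring(epub.read(rootfile.attrib["full-path"]))
    records = [
        {
            "id": item.attrib["id"],
            # href is relative to the OPF document's location in the archive.
            "href": item.attrib["href"],
            "media_type": item.attrib["media-type"],
        }
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    ]
    return sorted(records, key=lambda record: record["id"])


def build_demo_epub() -> zipfile.ZipFile:
    """Build a tiny in-memory archive with just enough structure for the demo."""
    container_xml = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf_xml = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0"><manifest>'
        '<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="cover" href="cover.png" media-type="image/png"/>'
        "</manifest></package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/container.xml", container_xml)
        archive.writestr("OEBPS/content.opf", opf_xml)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

Sorting by manifest id is a design choice made here so that repeated runs over the same package yield the same record order.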

Out of Scope

  • Markitect core document, contract, render, memory, or workflow engines.
  • OCR, scanned-document recognition, and layout-perfect PDF reconstruction.
  • Export or rendering behavior.
  • Remote ingestion services, queues, storage, or production hosting.
  • Owning the passive render asset manifest contract beyond read-side handoff metadata.

Relevant When

  • You need Markitect to ingest EPUB3 or digitally readable PDF sources.
  • You are testing source adapter descriptors against markitect-tool.
  • You need attachment metadata from source documents for downstream render manifest planning.
  • You are maintaining EPUB3/PDF normalization fixtures and examples.

Not Relevant When

  • The needed behavior belongs in the source adapter contract itself.
  • The work is Quarkdown rendering or export production.
  • The input is scanned or image-only PDF requiring OCR.
  • The task is general Markdown transformation after normalization.
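
The boundary between a digitally readable PDF and a scanned or image-only one can be approximated with a simple heuristic. The sketch below assumes some PDF library has already produced one extracted text string per page. The function name and both thresholds are illustrative assumptions, not the check this repo performs.

```python
def looks_digitally_readable(
    page_texts: list[str],
    min_chars_per_page: int = 20,
    min_text_page_ratio: float = 0.5,
) -> bool:
    """Hypothetical heuristic: enough pages carry a usable text layer."""
    if not page_texts:
        return False
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text / len(page_texts) >= min_text_page_ratio
```

A document that fails such a check would fall under the OCR case above and outside this repo's scope.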

Current State

  • Status: active.
  • Implementation: Python 3.12 package with EPUB3 and PDF adapters, attachment metadata, tests, examples, and docs.
  • Stability: adapter slices are deterministic and test-covered.
  • Integration: registered in the Custodian State Hub as markitect-filter.

How It Fits

  • Upstream contract: markitect-tool owns the source adapter interfaces and normalized document model.
  • Downstream consumers: Markitect workflows load this package through source adapter entry points.
  • Adjacent repo: markitect-quarkdown consumes Markitect render/export contracts on the output side.

Terminology

  • Preferred terms: source adapter, normalized Markdown document, attachment metadata, EPUB3 package, digitally readable PDF.
  • Also known as: Markitect filter adapters.
  • Potentially confusing terms: "filter" means source normalization here, not policy filtering or search result filtering.

Related Repositories

  • markitect-tool - owns core contracts and normalized document APIs.
  • markitect-quarkdown - owns concrete rendering/export through Quarkdown.
  • the-custodian - State Hub registration, workplan tracking, and consistency sync.

Getting Oriented

  • Start with: README.md, docs/pdf-adapter.md, and docs/source-attachment-metadata.md.
  • Key directories: src/markitect_filter/, tests/, examples/, and workplans/.
  • Entry points: markitect_filter.adapters:epub3_adapter_descriptor and markitect_filter.adapters:pdf_adapter_descriptor.

Provided Capabilities

type: source_adapter
title: source.epub3
description: Convert EPUB3 packages into normalized Markitect Markdown documents with package resource attachment metadata.
keywords: [epub3, source-adapter, markdown, attachments]

type: source_adapter
title: source.pdf
description: Convert local digitally readable PDFs into normalized Markitect Markdown documents with embedded-file and image-resource signals.
keywords: [pdf, source-adapter, markdown, attachments]

Notes

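Run tests from this checkout with PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest. The second PYTHONPATH entry points at a local markitect-tool checkout; adjust it to match your machine.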