From f8f20c7c32b850ada9ff57bc8af65bdd5e46b553 Mon Sep 17 00:00:00 2001 From: tegwick Date: Thu, 14 May 2026 21:49:43 +0200 Subject: [PATCH] Workplan refinement and examples --- README.md | 1 + docs/examples-index.md | 7 + docs/internal-extension-framework.md | 1 + docs/source-adapter-contract.md | 461 ++++++++++++++++++ docs/workplan-planning-map.md | 18 +- examples/source-adapters/adapter-list.json | 52 ++ .../fake-adapter-pyproject.toml | 2 + examples/source-adapters/inspect-result.json | 35 ++ .../source-adapters/normalized-document.json | 86 ++++ examples/source-adapters/normalized-output.md | 7 + examples/source-adapters/sample.fake | 11 + .../MKTT-WP-0018-source-adapter-contract.md | 21 +- ...0019-source-adapter-contract-refinement.md | 47 +- 13 files changed, 726 insertions(+), 23 deletions(-) create mode 100644 docs/source-adapter-contract.md create mode 100644 examples/source-adapters/adapter-list.json create mode 100644 examples/source-adapters/fake-adapter-pyproject.toml create mode 100644 examples/source-adapters/inspect-result.json create mode 100644 examples/source-adapters/normalized-document.json create mode 100644 examples/source-adapters/normalized-output.md create mode 100644 examples/source-adapters/sample.fake diff --git a/README.md b/README.md index dac63a6..6e23038 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ requirements documents in `wiki/`. 
- `docs/getting-started.md` - first-use guide from checkout to practical commands - `docs/command-cheatsheet.md` - command-oriented workflow cheat sheet - `docs/examples-index.md` - map from examples to usecases and commands +- `docs/source-adapter-contract.md` - v1 source adapter contract for external format adapters - `docs/performance-notes.md` - local performance posture and smoke coverage - `docs/cli-reference.md` - generated `mkt` command reference - `docs/api-reference.md` - generated public API reference diff --git a/docs/examples-index.md b/docs/examples-index.md index a435623..359d98d 100644 --- a/docs/examples-index.md +++ b/docs/examples-index.md @@ -30,6 +30,13 @@ This index maps example files to practical usecases and useful commands. | `examples/workflows/adr-release-notes.workflow.md` | ADR release notes workflow | `mkt workflow plan examples/workflows/adr-release-notes.workflow.md --output-dir /tmp/markitect-workflow` | | `examples/workflows/assisted-review.workflow.md` | Assisted-generation boundary shape | Inspect with `mkt workflow inspect` | +## Source Adapter Contract Fixtures + +| Files | Usecase | Try | +| --- | --- | --- | +| `examples/source-adapters/*.json`, `examples/source-adapters/normalized-output.md` | Expected envelopes for the v1 source adapter contract | Use as fixtures for `mkt source` commands after `MKTT-WP-0018` implementation | +| `examples/source-adapters/fake-adapter-pyproject.toml` | External adapter entry point shape | Use as the fake package discovery fixture for source adapter contract tests | + ## Cache, Backend, Policy, And Context | Files | Usecase | Try | diff --git a/docs/internal-extension-framework.md b/docs/internal-extension-framework.md index 05bd537..477550c 100644 --- a/docs/internal-extension-framework.md +++ b/docs/internal-extension-framework.md @@ -39,6 +39,7 @@ framework organizes how Markitect itself exposes and composes capabilities. 
| `policy-gateway` | local label gateway, future external auth adapters | subject/action/object in, decision or filtered results out | | `template-engine` | deterministic templates | template/data in, Markdown out | | `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out | +| `source-adapter` | EPUB3/PDF/DOCX adapters in external packages | source asset in, normalized Markdown out | | `cli-group` | cache, backend, ref, class | command descriptors or registration hook | | `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out | | `document-function` | future function layer | function call in, typed document value out | diff --git a/docs/source-adapter-contract.md b/docs/source-adapter-contract.md new file mode 100644 index 0000000..a09eecc --- /dev/null +++ b/docs/source-adapter-contract.md @@ -0,0 +1,461 @@ +# Source Adapter Contract + +## Purpose + +This document pins the v1 contract for source-format adapters. It is the +handoff from `MKTT-WP-0019` to `MKTT-WP-0018`: `markitect-tool` implements the +contract, registry, CLI, public API, and tests; `markitect-filter` implements +concrete adapters, starting with EPUB3. + +The v1 contract is intentionally read-only. It normalizes heterogeneous source +formats into canonical Markitect Markdown plus metadata, provenance, quality +signals, and diagnostics. Writer/export adapters are future scope. 
+ +## Scope + +The v1 source adapter layer supports: + +- local filesystem source inputs +- deterministic inspection and normalization +- package-provided read adapters discovered through Python entry points +- optional dependencies isolated in adapter packages +- JSON-serializable normalized Markdown outputs +- contract tests with fake adapters and small fixtures + +The v1 layer does not support: + +- EPUB3, PDF, DOCX, ODT, OCR, browser, or archive parsing in `markitect-tool` +- write/export adapters +- network fetching for source URIs +- durable ingestion, permissions, retrieval, or governance +- hidden AI-assisted repair or enrichment + +URI fields appear in the model so adapters can preserve source identity, but +v1 CLI/API inputs are local paths unless a later workplan opens remote source +loading explicitly. + +## Package Shape + +External adapter packages should depend on `markitect-tool` and register one or +more read adapter descriptors through the entry point group +`markitect_tool.source_adapters`. + +Recommended `markitect-filter` shape: + +```text +markitect_filter/ + src/markitect_filter/ + __init__.py + epub3.py + adapters.py + tests/ + fixtures/epub3/ + test_epub3_adapter.py + pyproject.toml +``` + +The package should expose a lightweight descriptor function that does not +import heavyweight format dependencies until the adapter is instantiated or +used. For example: + +```toml +[project.entry-points."markitect_tool.source_adapters"] +epub3 = "markitect_filter.adapters:epub3_adapter_descriptor" +``` + +Adapter packages may use extras such as `markitect-filter[epub3]` or +`markitect-filter[pdf]`. Missing optional dependencies must be reported through +structured diagnostics; they must not surface as raw import errors. + +## Descriptor Contract + +`MKTT-WP-0018` should implement a `SourceAdapterDescriptor` dataclass and map +it into the existing `ExtensionDescriptor` catalog with kind `source-adapter`. 
+ +Required descriptor fields: + +| Field | Type | Meaning | +| --- | --- | --- | +| `id` | `str` | Stable adapter id, for example `source.epub3`. | +| `version` | `str` | Adapter contract implementation version. | +| `name` | `str` | Human-readable adapter name. | +| `operations` | `list[str]` | V1 must contain only `read`. | +| `media_types` | `list[str]` | Supported media types, lower-case. | +| `extensions` | `list[str]` | Supported file suffixes including dots. | +| `factory` | callable | Returns a `SourceReadAdapter`. | + +Optional descriptor fields: + +| Field | Type | Meaning | +| --- | --- | --- | +| `summary` | `str | None` | Short description for CLI and docs. | +| `option_schema` | `dict` | JSON-schema-like adapter options. | +| `optional_dependencies` | `list[OptionalDependency]` | Runtime libraries needed by the adapter. | +| `safety` | `dict` | Reads files, network, external processes, and related flags. | +| `quality_profile` | `dict` | Known extraction quality behavior. | +| `metadata` | `dict` | Adapter-specific metadata. | + +The corresponding `ExtensionDescriptor` should use: + +```text +id: same as SourceAdapterDescriptor.id +kind: source-adapter +input_contract: SourceInspectRequest | SourceReadRequest +output_contract: SourceInspectResult | SourceReadResult +diagnostics_namespace: source +provenance_prefix: source. +``` + +Capabilities should include: + +```text +source read +markdown normalize +diagnostics emit +provenance emit +filesystem read +``` + +Descriptor IDs are globally unique. Duplicate IDs from external packages are +registry errors. Descriptors from package entry points are sorted by ID for +deterministic listing. 
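The uniqueness and ordering rules above can be sketched as follows. This is a minimal illustration, not the `MKTT-WP-0018` implementation: the dataclass carries only the required descriptor fields, and `DuplicateAdapterError`, `register()`, and `descriptors()` are hypothetical names. Whether a duplicate ID is raised as an exception or reported as a diagnostic is also an assumption here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SourceAdapterDescriptor:
    """Required descriptor fields only; optional fields are left out of this sketch."""

    id: str
    version: str
    name: str
    operations: list[str]
    media_types: list[str]
    extensions: list[str]
    factory: Callable[[], Any]


class DuplicateAdapterError(ValueError):
    """Hypothetical error type for a descriptor ID that is already registered."""


class SourceAdapterRegistry:
    """Hypothetical in-memory registry keyed by descriptor ID."""

    def __init__(self) -> None:
        self._by_id: dict[str, SourceAdapterDescriptor] = {}

    def register(self, descriptor: SourceAdapterDescriptor) -> None:
        # V1 descriptors must declare exactly the `read` operation.
        if descriptor.operations != ["read"]:
            raise ValueError(f"{descriptor.id}: v1 operations must be ['read']")
        # Descriptor IDs are globally unique; a second registration is an error.
        if descriptor.id in self._by_id:
            raise DuplicateAdapterError(f"duplicate adapter id: {descriptor.id}")
        self._by_id[descriptor.id] = descriptor

    def descriptors(self) -> list[SourceAdapterDescriptor]:
        # Sort by ID so listings are deterministic regardless of load order.
        return [self._by_id[key] for key in sorted(self._by_id)]
```

Sorting at listing time rather than at registration keeps `register()` simple and makes the output independent of entry point load order.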
+ +## Entry Point Contract + +The entry point group is: + +```text +markitect_tool.source_adapters +``` + +Each entry point may load to one of: + +- a `SourceAdapterDescriptor` +- an iterable of `SourceAdapterDescriptor` +- a callable returning either of the above + +Discovery must not instantiate adapters unless the loaded object itself is a +descriptor factory. Descriptors should remain cheap enough to list without +format-specific imports. + +Discovery errors should produce diagnostics with code +`source.discovery_failed`. Missing optional dependencies declared by a +descriptor should produce `source.missing_dependency` and mark that adapter +unavailable for reads until the dependency is installed. + +## Data Model + +All model objects must support stable `to_dict()` serialization. Serialization +rules: + +- omit `None`, empty lists, empty dicts, and empty strings +- preserve `False`, `0`, and empty Markdown content where semantically valid +- use UTF-8 text +- use canonical JSON with sorted keys and compact separators when computing + hashes or cache keys +- keep all timestamps and dates as strings unless they are filesystem metadata + such as `mtime_ns` + +### `SourceAsset` + +| Field | Required | Type | Meaning | +| --- | --- | --- | --- | +| `uri` | yes | `str` | Stable source URI. For local files, use a normalized path URI or path string. | +| `path` | no | `str` | Local path when available. | +| `name` | no | `str` | Display name or basename. | +| `media_type` | no | `str` | Detected or declared media type. | +| `extension` | no | `str` | Lower-case suffix including the dot. | +| `size` | no | `int` | Byte size for local files. | +| `mtime_ns` | no | `int` | Local file modification timestamp in nanoseconds. | +| `digest` | no | `str` | `sha256:` of source bytes when available. | +| `metadata` | no | `dict` | Source asset metadata that is not document metadata. 
| + +### `SourceMetadata` + +| Field | Required | Type | Meaning | +| --- | --- | --- | --- | +| `title` | no | `str` | Source title. | +| `creators` | no | `list[str]` | Authors or creators in source order. | +| `language` | no | `str` | BCP 47 language tag when known. | +| `rights` | no | `str` | Rights or license text from the source. | +| `source_url` | no | `str` | Original public URL when known. | +| `publication_date` | no | `str` | Source publication date string. | +| `publisher` | no | `str` | Publisher name. | +| `identifiers` | no | `dict[str, str]` | ISBN, DOI, package IDs, and similar identifiers. | +| `raw` | no | `dict` | Adapter-preserved raw metadata. | + +### `SourceProvenance` + +| Field | Required | Type | Meaning | +| --- | --- | --- | --- | +| `source_uri` | yes | `str` | Source asset URI. | +| `source_path` | no | `str` | Local source path. | +| `source_href` | no | `str` | Package-internal href or document-relative reference. | +| `package_path` | no | `str` | Archive/package member path, such as EPUB XHTML. | +| `anchor` | no | `str` | Source anchor or fragment. | +| `page` | no | `str` | Page label or number where available. | +| `section` | no | `str` | Chapter, section, or nav label. | +| `start_offset` | no | `int` | Adapter-defined start offset. | +| `end_offset` | no | `int` | Adapter-defined end offset. | +| `digest` | no | `str` | Digest of the specific source component. | +| `metadata` | no | `dict` | Adapter-specific provenance details. | + +### `NormalizedMarkdownSegment` + +| Field | Required | Type | Meaning | +| --- | --- | --- | --- | +| `segment_id` | yes | `str` | Stable ID unique within the document. | +| `order` | yes | `int` | Zero-based reading order. | +| `markdown` | yes | `str` | Canonical Markdown for the segment. | +| `heading` | no | `str` | Primary heading text for the segment. | +| `heading_level` | no | `int` | Markdown heading level when known. 
| +| `anchors` | no | `list[str]` | Source anchors covered by the segment. | +| `provenance` | no | `list[SourceProvenance]` | Source spans contributing to the segment. | +| `metadata` | no | `dict` | Adapter-specific segment metadata. | + +Segment IDs should be deterministic. Prefer source anchors when they are stable +and unique. Otherwise use ordinal IDs such as `seg-0001`, `seg-0002`, and so +on. Segment order is always authoritative for reading order. + +### `NormalizationQuality` + +| Field | Required | Type | Meaning | +| --- | --- | --- | --- | +| `lossiness` | yes | `str` | One of `none`, `low`, `medium`, `high`, or `unknown`. | +| `confidence` | no | `float` | Adapter confidence from `0.0` to `1.0`. | +| `skipped_items` | no | `int` | Count of skipped source items. | +| `warnings` | no | `int` | Count of warning diagnostics. | +| `metadata` | no | `dict` | Adapter-specific quality details. | + +### `NormalizedMarkdownDocument` + +| Field | Required | Type | Meaning | +| --- | --- | --- | --- | +| `schema_version` | yes | `str` | V1 uses `markitect.source.v1`. | +| `document_id` | yes | `str` | Stable normalized document ID. | +| `asset` | yes | `SourceAsset` | Original source identity. | +| `metadata` | yes | `SourceMetadata` | Source document metadata. | +| `markdown` | yes | `str` | Full normalized Markdown. | +| `segments` | yes | `list[NormalizedMarkdownSegment]` | Ordered segment list. | +| `quality` | yes | `NormalizationQuality` | Extraction quality summary. | +| `diagnostics` | no | `list[Diagnostic]` | Existing Markitect diagnostic shape. | +| `provenance` | no | `list[SourceProvenance]` | Document-level provenance. | +| `attachments` | no | `list[SourceAsset]` | Referenced binary assets; v1 metadata only. | +| `adapter` | yes | `dict` | Adapter id, version, and options. | +| `cache_key` | yes | `str` | Deterministic normalization cache key. 
| + +The full `markdown` field should be equal to the ordered segment Markdown joined +with exactly two newlines, unless an adapter has a documented reason to emit +document-level frontmatter or separators. + +## Hashing And Cache Keys + +Source asset digests use the source bytes: + +```text +sha256: +``` + +Document IDs should be stable across machines and based on: + +- normalized source asset URI or path +- source asset digest when available +- adapter ID +- adapter version + +Normalization cache keys should be based on canonical JSON containing: + +- source asset URI or path +- source asset digest +- adapter ID +- adapter version +- normalized model version +- read options + +Use this prefix: + +```text +source-normalize:sha256: +``` + +## Read Adapter Protocol + +`MKTT-WP-0018` should implement Python `Protocol` classes equivalent to: + +```python +class SourceReadAdapter(Protocol): + descriptor: SourceAdapterDescriptor + + def can_read(self, request: SourceAdapterMatchRequest) -> SourceAdapterMatch: + ... + + def inspect(self, request: SourceInspectRequest) -> SourceInspectResult: + ... + + def read(self, request: SourceReadRequest) -> SourceReadResult: + ... +``` + +Request and result objects: + +| Type | Required fields | Meaning | +| --- | --- | --- | +| `SourceAdapterMatchRequest` | `asset`, `options` | Cheap matching request. | +| `SourceAdapterMatch` | `adapter_id`, `matched`, `confidence`, `reason`, `diagnostics` | Match result. Confidence is `0` to `100`. | +| `SourceInspectRequest` | `asset`, `options` | Metadata-only inspection request. | +| `SourceInspectResult` | `valid`, `asset`, `adapter`, `metadata`, `capabilities`, `diagnostics`, `quality` | Inspection result without full Markdown conversion. | +| `SourceReadRequest` | `asset`, `options` | Full normalization request. | +| `SourceReadResult` | `valid`, `document`, `diagnostics` | Normalized read result. | + +`inspect` must not perform full conversion. 
It may open enough of the source to +validate structure and collect metadata. `read` may perform full extraction. + +Options must be JSON-serializable. Adapter-specific options should be declared +in `option_schema`. Unknown options should produce +`source.unknown_option` unless the descriptor explicitly permits free-form +options. + +## Adapter Selection + +Selection is deterministic: + +1. If an explicit adapter ID is provided, use only that descriptor. +2. Prefer media type matches over extension-only matches. +3. Prefer higher `can_read().confidence`. +4. Prefer descriptors with required optional dependencies available. +5. Break remaining ties by descriptor ID in ascending lexical order and emit + warning `source.adapter_ambiguous`. + +No matching adapter returns an error diagnostic: + +```text +source.unsupported_format +``` + +Malformed sources return an error diagnostic: + +```text +source.malformed +``` + +Missing required optional dependencies return: + +```text +source.missing_dependency +``` + +Warnings do not make a result invalid. Any error diagnostic makes `valid` +false. + +## CLI Contract + +The public commands are: + +```bash +mkt source adapters +mkt source inspect +mkt source normalize --format markdown +``` + +Common options: + +```text +--adapter Explicit adapter selection. +--format text|json|yaml For adapters and inspect. +--format markdown|json|yaml For normalize. +--option key=value Adapter-specific option, repeatable. +--output Write normalized output. +``` + +Exit behavior: + +| Exit | Meaning | +| --- | --- | +| `0` | Operation valid; warning diagnostics may exist. | +| `1` | Operation completed with error diagnostics. | +| `2` | CLI usage error from Click. | + +JSON output must contain a top-level `valid` field for operations that can +fail. Markdown output writes only normalized Markdown to stdout or `--output`; +diagnostics for Markdown output go to stderr. 
If normalization is invalid, do +not emit partial Markdown unless a future option explicitly requests it. + +## API Contract + +`MKTT-WP-0018` should export these names from `markitect_tool`: + +```text +SourceAsset +SourceMetadata +SourceProvenance +NormalizedMarkdownSegment +NormalizedMarkdownDocument +NormalizationQuality +SourceAdapterDescriptor +SourceReadAdapter +SourceAdapterRegistry +SourceAdapterMatchRequest +SourceAdapterMatch +SourceInspectRequest +SourceInspectResult +SourceReadRequest +SourceReadResult +default_source_adapter_registry +discover_source_adapters +inspect_source +normalize_source +``` + +Direct API helpers should accept an optional registry and adapter ID so tests +and sibling repos can avoid global discovery when they need deterministic +fixtures. + +## Contract Tests For MKTT-WP-0018 + +Implementation should add tests for: + +- `SourceAsset`, metadata, provenance, quality, segment, and document + serialization +- source document cache-key determinism +- fake in-tree adapter registration and read behavior +- fake external entry point discovery +- optional dependency diagnostics +- unsupported format diagnostics +- malformed source diagnostics +- adapter selection tie behavior +- CLI `source adapters` JSON fixture +- CLI `source inspect` JSON fixture +- CLI `source normalize --format json` fixture +- CLI `source normalize --format markdown` fixture +- public API exports + +Fixtures live in `examples/source-adapters/` and should be reused by tests where +practical. 
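The cache-key determinism test listed above could start from a sketch like this one. It follows the rules in "Hashing And Cache Keys": canonical JSON with sorted keys and compact separators, hashed with SHA-256 and prefixed with `source-normalize:sha256:`. The function names and the payload key names are assumptions for illustration; the contract fixes the inputs and the prefix, not these identifiers.

```python
import hashlib
import json
from typing import Any


def canonical_json(payload: dict[str, Any]) -> str:
    # Sorted keys and compact separators give a byte-stable encoding.
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def normalization_cache_key(
    *,
    source_uri: str,
    source_digest: str,
    adapter_id: str,
    adapter_version: str,
    model_version: str,
    options: dict[str, Any],
) -> str:
    # Payload key names are hypothetical; the inputs mirror the contract's list.
    payload = {
        "source_uri": source_uri,
        "source_digest": source_digest,
        "adapter_id": adapter_id,
        "adapter_version": adapter_version,
        "model_version": model_version,
        "options": options,
    }
    digest = hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()
    return f"source-normalize:sha256:{digest}"
```

Because keys are sorted at every nesting level, two requests whose `options` differ only in key order produce the same cache key.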
+ +## Markitect-filter Handoff + +The first `markitect-filter` implementation should provide an EPUB3 descriptor: + +```text +id: source.epub3 +name: EPUB3 +operations: read +media_types: application/epub+zip +extensions: .epub +entry_point: markitect_filter.adapters:epub3_adapter_descriptor +``` + +The EPUB3 adapter should inspect and normalize: + +- `META-INF/container.xml` +- the OPF package document +- Dublin Core and package metadata +- spine reading order +- navigation labels +- body XHTML as ordered Markdown segments +- source hrefs, anchors, sections, and page references where available + +It should classify or skip cover, navigation, table-of-contents, header, +footer, license, and transcriber-note material through explicit options and +diagnostics. It should report unsupported media, malformed package structure, +skipped assets, and lossy extraction. diff --git a/docs/workplan-planning-map.md b/docs/workplan-planning-map.md index cee6584..b31aa58 100644 --- a/docs/workplan-planning-map.md +++ b/docs/workplan-planning-map.md @@ -42,8 +42,8 @@ and descriptions mirror the operational view. | `MKTT-WP-0012` | complete | done | `MKTT-WP-0004`, `MKTT-WP-0010`, `MKTT-WP-0011` | Document function layer is complete: deterministic Markdown-native function descriptors, registry, inline/fenced syntax, pipelines, context bindings, CLI, docs, examples, diagnostics, provenance, and extension descriptor. | | `MKTT-WP-0008` | complete | done | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory context cache is complete: context package schema, local registry, package creation from queries/search/manifests, deterministic summaries, namespaces, activation/deactivation/refresh/explain lifecycle, policy re-checks, CLI, docs, and examples. 
| | `MKTT-WP-0017` | complete | done | `MKTT-WP-0003`, `MKTT-WP-0013` | CLI/API polish and practical adoption track is complete: shell completion, extension discovery, generated CLI/API docs, usecase relevance matrix, E2E fixture matrix, large-corpus smoke coverage, first-use docs, examples index, and command cheat sheet. | -| `MKTT-WP-0019` | P0 | active | `MKTT-WP-0013`, `MKTT-WP-0017` | Source adapter contract refinement: pin v1 read-only scope, field-level normalized model semantics, external adapter entry point discovery, CLI/API envelopes, fake adapter contract tests, and `markitect-filter` handoff before implementation. | -| `MKTT-WP-0018` | P1 | todo | `MKTT-WP-0013`, `MKTT-WP-0017`, `MKTT-WP-0019` | Source adapter framework implementation: implement the contract refined in `MKTT-WP-0019`, keeping format extraction in `markitect-filter` and the base install free of heavyweight conversion dependencies. | +| `MKTT-WP-0019` | complete | done | `MKTT-WP-0013`, `MKTT-WP-0017` | Source adapter contract refinement is complete: v1 read-only scope, normalized model fields, package entry point discovery, CLI/API envelopes, fake adapter fixtures, and `markitect-filter` EPUB3 handoff are pinned in `docs/source-adapter-contract.md`. | +| `MKTT-WP-0018` | P0 | active | `MKTT-WP-0013`, `MKTT-WP-0017`, `MKTT-WP-0019` | Source adapter framework implementation is the current track: implement `docs/source-adapter-contract.md`, keeping format extraction in `markitect-filter` and the base install free of heavyweight conversion dependencies. | | `MKTT-WP-0015` | P2 | todo | `MKTT-WP-0010`, `MKTT-WP-0011`, `MKTT-WP-0012` | Future render and document-function extensions: typed values, richer syntax, document-local reusable functions, Quarkdown/export adapters, render-aware references, assets, and permission sandboxing. Defer unless publishing/export pressure becomes current. 
| | `MKTT-WP-0016` | P2 | todo | `MKTT-WP-0008`, `MKTT-WP-0007`, `MKTT-WP-0009`, `MKTT-WP-0013` | Follow-on agentic memory architecture: reasoning decision graphs, conversational paths, long-term knowledge graphs, memory service blueprints/profiles, graph-to-context-package compilation, and adapter boundaries. | @@ -125,12 +125,16 @@ before the remaining advanced extension work because users and agents need a complete, documented, shell-friendly, test-backed public surface before the tool grows further. -`MKTT-WP-0019` is the current source-adapter refinement track. It intentionally -precedes `MKTT-WP-0018` so the implementation work does not have to invent +`MKTT-WP-0019` completed the source-adapter refinement track. It pins the field-level normalized model semantics, package entry point discovery, read -protocol behavior, CLI/API envelopes, or `markitect-filter` handoff criteria -while coding. The v1 contract should stay read-only; writer/export adapters -belong in later format-specific work once preservation semantics are explicit. +protocol behavior, CLI/API envelopes, fake adapter fixtures, and +`markitect-filter` handoff criteria in `docs/source-adapter-contract.md`. The +v1 contract is read-only; writer/export adapters belong in later +format-specific work once preservation semantics are explicit. + +`MKTT-WP-0018` is now the current source-adapter implementation track. It +should implement the pinned contract directly rather than reopening v1 model, +entry point, protocol, or CLI/API decisions. 
## State Hub Mirror diff --git a/examples/source-adapters/adapter-list.json b/examples/source-adapters/adapter-list.json new file mode 100644 index 0000000..9fc4f70 --- /dev/null +++ b/examples/source-adapters/adapter-list.json @@ -0,0 +1,52 @@ +{ + "count": 1, + "adapters": [ + { + "id": "source.fake", + "kind": "source-adapter", + "version": "1", + "name": "Fake Source Adapter", + "summary": "Contract-test adapter for plain fixture sources.", + "operations": [ + "read" + ], + "media_types": [ + "text/x.markitect-fake" + ], + "extensions": [ + ".fake" + ], + "capabilities": [ + { + "id": "source", + "kind": "read" + }, + { + "id": "markdown", + "kind": "normalize" + }, + { + "id": "diagnostics", + "kind": "emit" + }, + { + "id": "provenance", + "kind": "emit" + }, + { + "id": "filesystem", + "kind": "read" + } + ], + "safety": { + "reads_files": true, + "writes_files": false, + "network": false, + "external_process": false + }, + "docs": [ + "docs/source-adapter-contract.md" + ] + } + ] +} diff --git a/examples/source-adapters/fake-adapter-pyproject.toml b/examples/source-adapters/fake-adapter-pyproject.toml new file mode 100644 index 0000000..18bb59f --- /dev/null +++ b/examples/source-adapters/fake-adapter-pyproject.toml @@ -0,0 +1,2 @@ +[project.entry-points."markitect_tool.source_adapters"] +fake = "fake_adapter:fake_adapter_descriptor" diff --git a/examples/source-adapters/inspect-result.json b/examples/source-adapters/inspect-result.json new file mode 100644 index 0000000..dcd3a00 --- /dev/null +++ b/examples/source-adapters/inspect-result.json @@ -0,0 +1,35 @@ +{ + "valid": true, + "asset": { + "uri": "examples/source-adapters/sample.fake", + "path": "examples/source-adapters/sample.fake", + "name": "sample.fake", + "media_type": "text/x.markitect-fake", + "extension": ".fake", + "size": 128, + "digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111" + }, + "adapter": { + "id": "source.fake", + "version": "1", + "options": {} + }, 
+ "metadata": { + "title": "Fake Source", + "creators": [ + "Markitect Fixture" + ], + "language": "en", + "identifiers": { + "fixture": "fake-source-001" + } + }, + "capabilities": [ + "read" + ], + "quality": { + "lossiness": "none", + "confidence": 1.0 + }, + "diagnostics": [] +} diff --git a/examples/source-adapters/normalized-document.json b/examples/source-adapters/normalized-document.json new file mode 100644 index 0000000..da3138b --- /dev/null +++ b/examples/source-adapters/normalized-document.json @@ -0,0 +1,86 @@ +{ + "valid": true, + "document": { + "schema_version": "markitect.source.v1", + "document_id": "source.fake:fake-source-001", + "asset": { + "uri": "examples/source-adapters/sample.fake", + "path": "examples/source-adapters/sample.fake", + "name": "sample.fake", + "media_type": "text/x.markitect-fake", + "extension": ".fake", + "size": 128, + "digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111" + }, + "metadata": { + "title": "Fake Source", + "creators": [ + "Markitect Fixture" + ], + "language": "en", + "identifiers": { + "fixture": "fake-source-001" + } + }, + "markdown": "# Fake Source\n\nA small normalized segment.\n\n## Second Segment\n\nAnother deterministic segment.", + "segments": [ + { + "segment_id": "seg-0001", + "order": 0, + "heading": "Fake Source", + "heading_level": 1, + "markdown": "# Fake Source\n\nA small normalized segment.", + "anchors": [ + "fake-source" + ], + "provenance": [ + { + "source_uri": "examples/source-adapters/sample.fake", + "source_path": "examples/source-adapters/sample.fake", + "anchor": "fake-source", + "section": "Fake Source" + } + ] + }, + { + "segment_id": "seg-0002", + "order": 1, + "heading": "Second Segment", + "heading_level": 2, + "markdown": "## Second Segment\n\nAnother deterministic segment.", + "anchors": [ + "second-segment" + ], + "provenance": [ + { + "source_uri": "examples/source-adapters/sample.fake", + "source_path": "examples/source-adapters/sample.fake", 
+ "anchor": "second-segment", + "section": "Second Segment" + } + ] + } + ], + "quality": { + "lossiness": "none", + "confidence": 1.0, + "skipped_items": 0, + "warnings": 0 + }, + "diagnostics": [], + "provenance": [ + { + "source_uri": "examples/source-adapters/sample.fake", + "source_path": "examples/source-adapters/sample.fake", + "digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111" + } + ], + "adapter": { + "id": "source.fake", + "version": "1", + "options": {} + }, + "cache_key": "source-normalize:sha256:2222222222222222222222222222222222222222222222222222222222222222" + }, + "diagnostics": [] +} diff --git a/examples/source-adapters/normalized-output.md b/examples/source-adapters/normalized-output.md new file mode 100644 index 0000000..afb87d1 --- /dev/null +++ b/examples/source-adapters/normalized-output.md @@ -0,0 +1,7 @@ +# Fake Source + +A small normalized segment. + +## Second Segment + +Another deterministic segment. diff --git a/examples/source-adapters/sample.fake b/examples/source-adapters/sample.fake new file mode 100644 index 0000000..5aa0cd9 --- /dev/null +++ b/examples/source-adapters/sample.fake @@ -0,0 +1,11 @@ +title: Fake Source +creator: Markitect Fixture +language: en + +# Fake Source + +A small normalized segment. + +## Second Segment + +Another deterministic segment. diff --git a/workplans/MKTT-WP-0018-source-adapter-contract.md b/workplans/MKTT-WP-0018-source-adapter-contract.md index 4197c59..587155c 100644 --- a/workplans/MKTT-WP-0018-source-adapter-contract.md +++ b/workplans/MKTT-WP-0018-source-adapter-contract.md @@ -3,10 +3,10 @@ id: MKTT-WP-0018 type: workplan title: "Source Adapter Interface And Markdown Normalization Contract" domain: markitect -status: todo +status: active owner: markitect-tool topic_slug: markitect -planning_priority: P1 +planning_priority: P0 planning_order: 145 depends_on_workplans: - MKTT-WP-0013 @@ -70,7 +70,8 @@ new `markitect-filter` repo. 
`MKTT-WP-0019` must run first and pin the v1 contract details for this implementation: field-level normalized model semantics, read-only protocol shape, external package entry point discovery, CLI/API output envelopes, and -fake adapter contract-test expectations. +fake adapter contract-test expectations. Those decisions are captured in +`docs/source-adapter-contract.md`. `markitect-tool` should define: @@ -120,8 +121,8 @@ Implement the cross-repo architecture pinned by `MKTT-WP-0019`: - `kontextual-engine` can ingest adapter outputs into durable knowledge assets Output: architecture note covering responsibilities, extension package shape, -the pinned entry point contract, dependency policy, and migration path from the -current `infospace-bench` EPUB spike. +the `docs/source-adapter-contract.md` entry point contract, dependency policy, +and migration path from the current `infospace-bench` EPUB spike. ## P18.2 - Canonical source-to-markdown data model @@ -132,7 +133,8 @@ priority: high state_hub_task_id: "f8164264-a9c1-4c82-8617-76bbb84a51bb" ``` -Implement the normalized output model specified by `MKTT-WP-0019`: +Implement the normalized output model specified by +`docs/source-adapter-contract.md`: - `SourceAsset` - `SourceMetadata` @@ -154,7 +156,8 @@ The model should represent: - lossiness/quality signals - adapter name/version/options -Output: public data model, serialization tests, and normalization contract +Output: public data model, serialization tests using +`examples/source-adapters/normalized-document.json`, and normalization contract documentation matching the field-level v1 specification. 
## P18.3 - Source adapter protocol and capability descriptors @@ -199,7 +202,7 @@ Wire source adapters into the existing internal extension framework: - register source adapter descriptors - discover package-provided adapters through the entry point group pinned by - `MKTT-WP-0019` + `docs/source-adapter-contract.md` - expose adapter capabilities via extension listing/inspection - report missing optional dependency diagnostics - ensure adapter packages can remain independently versioned @@ -284,3 +287,5 @@ Output: migration note and follow-up workplan seeds for `markitect-filter` and `markitect-tool` or `infospace-bench`. - Writer/export adapter support is explicitly deferred beyond the v1 read adapter contract. +- Implementation behavior matches `docs/source-adapter-contract.md` and the + fixtures in `examples/source-adapters/`. diff --git a/workplans/MKTT-WP-0019-source-adapter-contract-refinement.md b/workplans/MKTT-WP-0019-source-adapter-contract-refinement.md index 1eee6c6..e92c97f 100644 --- a/workplans/MKTT-WP-0019-source-adapter-contract-refinement.md +++ b/workplans/MKTT-WP-0019-source-adapter-contract-refinement.md @@ -3,10 +3,10 @@ id: MKTT-WP-0019 type: workplan title: "Source Adapter Contract Refinement" domain: markitect -status: active +status: done owner: markitect-tool topic_slug: markitect -planning_priority: P0 +planning_priority: complete planning_order: 142 depends_on_workplans: - MKTT-WP-0013 @@ -82,7 +82,7 @@ The v1 source adapter contract should be: ```task id: MKTT-WP-0019-T001 -status: todo +status: done priority: high state_hub_task_id: "0aa1d9a3-6cf8-47ab-8585-f23b2512d19b" ``` @@ -99,11 +99,15 @@ Define the v1 source adapter scope: Output: concise architecture note or source-adapter contract section that `MKTT-WP-0018` can implement directly. 
+Implemented: `docs/source-adapter-contract.md` defines the v1 read-only scope, +local-file-first posture, external package shape, optional dependency policy, +and compatibility boundary for `markitect-filter`. + ## P19.2 - Specify normalized data model fields and serialization ```task id: MKTT-WP-0019-T002 -status: todo +status: done priority: high state_hub_task_id: "fabd3e76-3c2c-43cb-92b2-2322bd933fa7" ``` @@ -125,11 +129,17 @@ anchors, source hrefs, page/section references, and adapter metadata. Output: model contract documentation and fixture-shaped examples. +Implemented: `docs/source-adapter-contract.md` pins field-level model contracts +for source assets, metadata, provenance, segments, normalized documents, and +quality. `examples/source-adapters/normalized-document.json` and +`examples/source-adapters/normalized-output.md` provide fixture-shaped +examples. + ## P19.3 - Specify read adapter protocol and selection semantics ```task id: MKTT-WP-0019-T003 -status: todo +status: done priority: high state_hub_task_id: "2d559e3b-1515-4c88-8ed9-3895026cd2ca" ``` @@ -147,11 +157,16 @@ Define the v1 read protocol: Output: protocol contract that can be implemented as Python `Protocol` classes in `MKTT-WP-0018`. +Implemented: `docs/source-adapter-contract.md` defines the v1 +`SourceReadAdapter` protocol, request/result names, option handling, adapter +selection semantics, and deterministic diagnostics for unsupported, malformed, +and dependency-missing inputs. + ## P19.4 - Define package entry point and registry contract ```task id: MKTT-WP-0019-T004 -status: todo +status: done priority: high state_hub_task_id: "3d661d24-2496-405a-b525-c7e6d8eb4e68" ``` @@ -170,11 +185,17 @@ Define how external source adapter packages register with `markitect-tool`: Output: discovery contract and fake entry point test plan for `MKTT-WP-0018`. 
+Implemented: `docs/source-adapter-contract.md` defines the +`markitect_tool.source_adapters` entry point group, accepted entry point object +shapes, descriptor mapping to `ExtensionDescriptor`, duplicate handling, and +dependency diagnostics. `examples/source-adapters/fake-adapter-pyproject.toml` +provides the fake entry point fixture. + ## P19.5 - Pin CLI/API output envelopes and exit behavior ```task id: MKTT-WP-0019-T005 -status: todo +status: done priority: medium state_hub_task_id: "2c30b0c7-683e-4d60-8268-0b49660f2e30" ``` @@ -192,11 +213,17 @@ Specify the public source commands and library functions: Output: CLI/API contract note and expected-output fixtures. +Implemented: `docs/source-adapter-contract.md` pins the `mkt source` command +surface, formats, options, exit behavior, and public API export names. +`examples/source-adapters/adapter-list.json` and +`examples/source-adapters/inspect-result.json` provide expected-output +fixtures. + ## P19.6 - Prepare contract-test and markitect-filter handoff criteria ```task id: MKTT-WP-0019-T006 -status: todo +status: done priority: high state_hub_task_id: "f6845a4d-3465-40b3-970a-714cfafe282c" ``` @@ -219,6 +246,10 @@ Also seed the `markitect-filter` handoff: Output: contract-test checklist and handoff note. +Implemented: `docs/source-adapter-contract.md` includes the WP0018 contract +test checklist and the first `markitect-filter` EPUB3 handoff descriptor, +fixture expectations, and extraction responsibilities. + ## Acceptance - `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields,