Workplan refinement and examples

2026-05-14 21:49:43 +02:00
parent 28ce4b3f65
commit f8f20c7c32
13 changed files with 726 additions and 23 deletions

@@ -18,6 +18,7 @@ requirements documents in `wiki/`.
- `docs/getting-started.md` - first-use guide from checkout to practical commands
- `docs/command-cheatsheet.md` - command-oriented workflow cheat sheet
- `docs/examples-index.md` - map from examples to usecases and commands
- `docs/source-adapter-contract.md` - v1 source adapter contract for external format adapters
- `docs/performance-notes.md` - local performance posture and smoke coverage
- `docs/cli-reference.md` - generated `mkt` command reference
- `docs/api-reference.md` - generated public API reference

@@ -30,6 +30,13 @@ This index maps example files to practical usecases and useful commands.
| `examples/workflows/adr-release-notes.workflow.md` | ADR release notes workflow | `mkt workflow plan examples/workflows/adr-release-notes.workflow.md --output-dir /tmp/markitect-workflow` |
| `examples/workflows/assisted-review.workflow.md` | Assisted-generation boundary shape | Inspect with `mkt workflow inspect` |
## Source Adapter Contract Fixtures
| Files | Usecase | Try |
| --- | --- | --- |
| `examples/source-adapters/*.json`, `examples/source-adapters/normalized-output.md` | Expected envelopes for the v1 source adapter contract | Use as fixtures for `mkt source` commands after `MKTT-WP-0018` implementation |
| `examples/source-adapters/fake-adapter-pyproject.toml` | External adapter entry point shape | Use as the fake package discovery fixture for source adapter contract tests |
## Cache, Backend, Policy, And Context
| Files | Usecase | Try |

@@ -39,6 +39,7 @@ framework organizes how Markitect itself exposes and composes capabilities.
| `policy-gateway` | local label gateway, future external auth adapters | subject/action/object in, decision or filtered results out |
| `template-engine` | deterministic templates | template/data in, Markdown out |
| `generation-adapter` | provider-neutral assisted generation | request in, generated candidate out |
| `source-adapter` | EPUB3/PDF/DOCX adapters in external packages | source asset in, normalized Markdown out |
| `cli-group` | cache, backend, ref, class | command descriptors or registration hook |
| `render-export` | future Quarkdown/export adapters | Markdown source in, rendered/exported artifact out |
| `document-function` | future function layer | function call in, typed document value out |

@@ -0,0 +1,461 @@
# Source Adapter Contract
## Purpose
This document pins the v1 contract for source-format adapters. It is the
handoff from `MKTT-WP-0019` to `MKTT-WP-0018`: `markitect-tool` implements the
contract, registry, CLI, public API, and tests; `markitect-filter` implements
concrete adapters, starting with EPUB3.
The v1 contract is intentionally read-only. It normalizes heterogeneous source
formats into canonical Markitect Markdown plus metadata, provenance, quality
signals, and diagnostics. Writer/export adapters are future scope.
## Scope
The v1 source adapter layer supports:
- local filesystem source inputs
- deterministic inspection and normalization
- package-provided read adapters discovered through Python entry points
- optional dependencies isolated in adapter packages
- JSON-serializable normalized Markdown outputs
- contract tests with fake adapters and small fixtures
The v1 layer does not support:
- EPUB3, PDF, DOCX, ODT, OCR, browser, or archive parsing in `markitect-tool`
- write/export adapters
- network fetching for source URIs
- durable ingestion, permissions, retrieval, or governance
- hidden AI-assisted repair or enrichment
URI fields appear in the model so adapters can preserve source identity, but
v1 CLI/API inputs are local paths unless a later workplan opens remote source
loading explicitly.
## Package Shape
External adapter packages should depend on `markitect-tool` and register one or
more read adapter descriptors through the entry point group
`markitect_tool.source_adapters`.
Recommended `markitect-filter` shape:
```text
markitect_filter/
  src/markitect_filter/
    __init__.py
    epub3.py
    adapters.py
  tests/
    fixtures/epub3/
    test_epub3_adapter.py
  pyproject.toml
```
The package should expose a lightweight descriptor function that does not
import heavyweight format dependencies until the adapter is instantiated or
used. For example:
```toml
[project.entry-points."markitect_tool.source_adapters"]
epub3 = "markitect_filter.adapters:epub3_adapter_descriptor"
```
Adapter packages may use extras such as `markitect-filter[epub3]` or
`markitect-filter[pdf]`. Missing optional dependencies must be reported through
structured diagnostics; they must not surface as raw import errors.
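A minimal sketch of such a descriptor function follows. It assumes a stand-in `SourceAdapterDescriptor` dataclass (the real class is left to `MKTT-WP-0018`), and the names `Epub3Adapter` and `MissingDependency` are hypothetical. The point is only that building the descriptor imports nothing heavy, and that a failed import in the factory is turned into a structured `source.missing_dependency` diagnostic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SourceAdapterDescriptor:
    """Illustrative stand-in; the real class is defined by MKTT-WP-0018."""
    id: str
    version: str
    name: str
    operations: list[str]
    media_types: list[str]
    extensions: list[str]
    factory: Callable[[], Any]


class MissingDependency(Exception):
    """Hypothetical error type that carries a structured diagnostic."""

    def __init__(self, adapter_id: str, extra: str) -> None:
        super().__init__(f"{adapter_id} needs the optional extra {extra}")
        self.diagnostic = {
            "code": "source.missing_dependency",
            "severity": "error",
            "adapter": adapter_id,
            "message": str(self),
        }


def _make_epub3_adapter() -> Any:
    # The format-specific import happens only when an adapter is requested.
    try:
        from markitect_filter.epub3 import Epub3Adapter  # hypothetical class
    except ImportError as exc:
        raise MissingDependency("source.epub3", "markitect-filter[epub3]") from exc
    return Epub3Adapter()


def epub3_adapter_descriptor() -> SourceAdapterDescriptor:
    """Cheap to call: builds metadata only and imports no format libraries."""
    return SourceAdapterDescriptor(
        id="source.epub3",
        version="1",
        name="EPUB3",
        operations=["read"],
        media_types=["application/epub+zip"],
        extensions=[".epub"],
        factory=_make_epub3_adapter,
    )
```

Listing adapters would only ever call `epub3_adapter_descriptor()`. The factory runs later, when a read is actually requested.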
## Descriptor Contract
`MKTT-WP-0018` should implement a `SourceAdapterDescriptor` dataclass and map
it into the existing `ExtensionDescriptor` catalog with kind `source-adapter`.
Required descriptor fields:
| Field | Type | Meaning |
| --- | --- | --- |
| `id` | `str` | Stable adapter id, for example `source.epub3`. |
| `version` | `str` | Adapter contract implementation version. |
| `name` | `str` | Human-readable adapter name. |
| `operations` | `list[str]` | V1 must contain only `read`. |
| `media_types` | `list[str]` | Supported media types, lower-case. |
| `extensions` | `list[str]` | Supported file suffixes including dots. |
| `factory` | callable | Returns a `SourceReadAdapter`. |
Optional descriptor fields:
| Field | Type | Meaning |
| --- | --- | --- |
| `summary` | `str | None` | Short description for CLI and docs. |
| `option_schema` | `dict` | JSON-schema-like adapter options. |
| `optional_dependencies` | `list[OptionalDependency]` | Runtime libraries needed by the adapter. |
| `safety` | `dict` | Flags for file reads, file writes, network access, external processes, and related behavior. |
| `quality_profile` | `dict` | Known extraction quality behavior. |
| `metadata` | `dict` | Adapter-specific metadata. |
The corresponding `ExtensionDescriptor` should use:
```text
id: same as SourceAdapterDescriptor.id
kind: source-adapter
input_contract: SourceInspectRequest | SourceReadRequest
output_contract: SourceInspectResult | SourceReadResult
diagnostics_namespace: source
provenance_prefix: source.<adapter>
```
Capabilities should include:
```text
source read
markdown normalize
diagnostics emit
provenance emit
filesystem read
```
Descriptor IDs are globally unique. Duplicate IDs from external packages are
registry errors. Descriptors from package entry points are sorted by ID for
deterministic listing.
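The uniqueness and ordering rules can be sketched as a small registry. This is an illustration under assumptions, not the planned `SourceAdapterRegistry` API: descriptors are plain dicts here, and the method names `register` and `descriptors` as well as the `DuplicateAdapterError` type are hypothetical.

```python
class DuplicateAdapterError(ValueError):
    """Raised when two descriptors claim the same adapter ID."""


class SourceAdapterRegistry:
    """Minimal illustrative registry keyed by descriptor ID."""

    def __init__(self) -> None:
        self._by_id: dict[str, dict] = {}

    def register(self, descriptor: dict) -> None:
        adapter_id = descriptor["id"]
        if adapter_id in self._by_id:
            raise DuplicateAdapterError(f"duplicate source adapter id: {adapter_id}")
        if descriptor.get("operations") != ["read"]:
            raise ValueError("v1 descriptors must declare only the 'read' operation")
        self._by_id[adapter_id] = descriptor

    def descriptors(self) -> list[dict]:
        # Sorting by ID keeps listings stable regardless of discovery order.
        return [self._by_id[key] for key in sorted(self._by_id)]
```

Whether a duplicate raises or is reported as a diagnostic is a design choice the contract leaves to the implementation. Raising is used here only to keep the sketch short.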
## Entry Point Contract
The entry point group is:
```text
markitect_tool.source_adapters
```
Each entry point may load to one of:
- a `SourceAdapterDescriptor`
- an iterable of `SourceAdapterDescriptor`
- a callable returning either of the above
Discovery must not instantiate adapters. When the loaded object is a callable,
discovery may call it to obtain descriptors, but it must not call a
descriptor's `factory`. Descriptors should remain cheap enough to list without
format-specific imports.
Discovery errors should produce diagnostics with code
`source.discovery_failed`. Missing optional dependencies declared by a
descriptor should produce `source.missing_dependency` and mark that adapter
unavailable for reads until the dependency is installed.
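One way to handle the three accepted entry point shapes is sketched below. It assumes Python 3.10 or later, where `importlib.metadata.entry_points` accepts a `group` keyword. The one-field `SourceAdapterDescriptor` stand-in, the function names, and the diagnostic dict shape are assumptions for illustration.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from importlib.metadata import entry_points

GROUP = "markitect_tool.source_adapters"


@dataclass(frozen=True)
class SourceAdapterDescriptor:
    """Stand-in with only the field this sketch needs."""
    id: str


def normalize_loaded(obj: object) -> list[SourceAdapterDescriptor]:
    """Flatten the three accepted entry point shapes into a descriptor list."""
    if callable(obj) and not isinstance(obj, SourceAdapterDescriptor):
        obj = obj()  # descriptor function: call it, never the adapter factory
    if isinstance(obj, SourceAdapterDescriptor):
        return [obj]
    if isinstance(obj, Iterable) and not isinstance(obj, (str, bytes)):
        items = list(obj)
        if all(isinstance(item, SourceAdapterDescriptor) for item in items):
            return items
    raise TypeError(f"unsupported entry point object: {type(obj).__name__}")


def discover() -> tuple[list[SourceAdapterDescriptor], list[dict]]:
    """Load every entry point in the group; failures become diagnostics."""
    found: list[SourceAdapterDescriptor] = []
    diagnostics: list[dict] = []
    for ep in entry_points(group=GROUP):
        try:
            found.extend(normalize_loaded(ep.load()))
        except Exception as exc:  # one broken package must not stop discovery
            diagnostics.append({
                "code": "source.discovery_failed",
                "severity": "error",
                "message": f"{ep.name}: {exc}",
            })
    return sorted(found, key=lambda d: d.id), diagnostics
```

Keeping `normalize_loaded` separate from the `entry_points` call makes the shape handling testable with plain objects, without installing a fake package.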
## Data Model
All model objects must support stable `to_dict()` serialization. Serialization
rules:
- omit `None`, empty lists, empty dicts, and empty strings
- preserve `False`, `0`, and empty Markdown content where semantically valid
- use UTF-8 text
- use canonical JSON with sorted keys and compact separators when computing
hashes or cache keys
- keep all timestamps and dates as strings unless they are filesystem metadata
such as `mtime_ns`
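The serialization rules above can be sketched with two helpers. The names `prune` and `canonical_json` and the `keep_empty` parameter are hypothetical. The parameter exists because some keys, such as `markdown`, may legitimately be empty; the fixtures in `examples/source-adapters/` also keep empty `diagnostics` and `options` values, so the real implementation may need further exceptions of this kind.

```python
import json
from typing import Any


def _is_empty(value: Any) -> bool:
    # Check by type so that False and 0 are never mistaken for "empty".
    return isinstance(value, (str, list, dict)) and len(value) == 0


def prune(value: Any, keep_empty: frozenset = frozenset({"markdown"})) -> Any:
    """Recursively drop None, '', [] and {} while keeping False and 0."""
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            item = prune(item, keep_empty)
            if item is None:
                continue
            if _is_empty(item) and key not in keep_empty:
                continue
            out[key] = item
        return out
    if isinstance(value, list):
        return [prune(item, keep_empty) for item in value]
    return value


def canonical_json(value: Any) -> str:
    """Sorted keys and compact separators, for hashing and cache keys."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

For example, `canonical_json({"b": 1, "a": [1, 2]})` yields `{"a":[1,2],"b":1}`, independent of the key order in the input dict.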
### `SourceAsset`
| Field | Required | Type | Meaning |
| --- | --- | --- | --- |
| `uri` | yes | `str` | Stable source URI. For local files, use a normalized path URI or path string. |
| `path` | no | `str` | Local path when available. |
| `name` | no | `str` | Display name or basename. |
| `media_type` | no | `str` | Detected or declared media type. |
| `extension` | no | `str` | Lower-case suffix including the dot. |
| `size` | no | `int` | Byte size for local files. |
| `mtime_ns` | no | `int` | Local file modification timestamp in nanoseconds. |
| `digest` | no | `str` | `sha256:<hex>` of source bytes when available. |
| `metadata` | no | `dict` | Source asset metadata that is not document metadata. |
### `SourceMetadata`
| Field | Required | Type | Meaning |
| --- | --- | --- | --- |
| `title` | no | `str` | Source title. |
| `creators` | no | `list[str]` | Authors or creators in source order. |
| `language` | no | `str` | BCP 47 language tag when known. |
| `rights` | no | `str` | Rights or license text from the source. |
| `source_url` | no | `str` | Original public URL when known. |
| `publication_date` | no | `str` | Source publication date string. |
| `publisher` | no | `str` | Publisher name. |
| `identifiers` | no | `dict[str, str]` | ISBN, DOI, package IDs, and similar identifiers. |
| `raw` | no | `dict` | Adapter-preserved raw metadata. |
### `SourceProvenance`
| Field | Required | Type | Meaning |
| --- | --- | --- | --- |
| `source_uri` | yes | `str` | Source asset URI. |
| `source_path` | no | `str` | Local source path. |
| `source_href` | no | `str` | Package-internal href or document-relative reference. |
| `package_path` | no | `str` | Archive/package member path, such as EPUB XHTML. |
| `anchor` | no | `str` | Source anchor or fragment. |
| `page` | no | `str` | Page label or number where available. |
| `section` | no | `str` | Chapter, section, or nav label. |
| `start_offset` | no | `int` | Adapter-defined start offset. |
| `end_offset` | no | `int` | Adapter-defined end offset. |
| `digest` | no | `str` | Digest of the specific source component. |
| `metadata` | no | `dict` | Adapter-specific provenance details. |
### `NormalizedMarkdownSegment`
| Field | Required | Type | Meaning |
| --- | --- | --- | --- |
| `segment_id` | yes | `str` | Stable ID unique within the document. |
| `order` | yes | `int` | Zero-based reading order. |
| `markdown` | yes | `str` | Canonical Markdown for the segment. |
| `heading` | no | `str` | Primary heading text for the segment. |
| `heading_level` | no | `int` | Markdown heading level when known. |
| `anchors` | no | `list[str]` | Source anchors covered by the segment. |
| `provenance` | no | `list[SourceProvenance]` | Source spans contributing to the segment. |
| `metadata` | no | `dict` | Adapter-specific segment metadata. |
Segment IDs should be deterministic. Prefer source anchors when they are stable
and unique. Otherwise use ordinal IDs such as `seg-0001`, `seg-0002`, and so
on. Segment order is always authoritative for reading order.
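A small sketch of the ID rule, assuming the adapter itself decides whether its anchors are stable. The function name and the `anchors_are_stable` flag are hypothetical, and the all-or-nothing fallback to ordinal IDs is one possible reading of the rule rather than a pinned behavior.

```python
from typing import Optional


def assign_segment_ids(
    anchors: list[Optional[str]], anchors_are_stable: bool = False
) -> list[str]:
    """Use anchors as IDs only when every segment has one and all are unique."""
    unique = all(anchors) and len(set(anchors)) == len(anchors)
    if anchors_are_stable and unique:
        return [str(anchor) for anchor in anchors]
    # Fall back to one-based ordinal IDs: seg-0001, seg-0002, ...
    return [f"seg-{index:04d}" for index in range(1, len(anchors) + 1)]
```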
### `NormalizationQuality`
| Field | Required | Type | Meaning |
| --- | --- | --- | --- |
| `lossiness` | yes | `str` | One of `none`, `low`, `medium`, `high`, or `unknown`. |
| `confidence` | no | `float` | Adapter confidence from `0.0` to `1.0`. |
| `skipped_items` | no | `int` | Count of skipped source items. |
| `warnings` | no | `int` | Count of warning diagnostics. |
| `metadata` | no | `dict` | Adapter-specific quality details. |
### `NormalizedMarkdownDocument`
| Field | Required | Type | Meaning |
| --- | --- | --- | --- |
| `schema_version` | yes | `str` | V1 uses `markitect.source.v1`. |
| `document_id` | yes | `str` | Stable normalized document ID. |
| `asset` | yes | `SourceAsset` | Original source identity. |
| `metadata` | yes | `SourceMetadata` | Source document metadata. |
| `markdown` | yes | `str` | Full normalized Markdown. |
| `segments` | yes | `list[NormalizedMarkdownSegment]` | Ordered segment list. |
| `quality` | yes | `NormalizationQuality` | Extraction quality summary. |
| `diagnostics` | no | `list[Diagnostic]` | Existing Markitect diagnostic shape. |
| `provenance` | no | `list[SourceProvenance]` | Document-level provenance. |
| `attachments` | no | `list[SourceAsset]` | Referenced binary assets; v1 metadata only. |
| `adapter` | yes | `dict` | Adapter id, version, and options. |
| `cache_key` | yes | `str` | Deterministic normalization cache key. |
The full `markdown` field should be equal to the ordered segment Markdown joined
with exactly two newlines, unless an adapter has a documented reason to emit
document-level frontmatter or separators.
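The join rule is easy to check mechanically. The sketch below uses segment dicts shaped like the `normalized-document.json` fixture; the function name `join_segments` is hypothetical.

```python
def join_segments(segments: list[dict]) -> str:
    """Join segment Markdown in reading order with exactly two newlines."""
    ordered = sorted(segments, key=lambda segment: segment["order"])
    return "\n\n".join(segment["markdown"] for segment in ordered)


segments = [
    {"order": 1, "markdown": "## Second Segment\n\nAnother deterministic segment."},
    {"order": 0, "markdown": "# Fake Source\n\nA small normalized segment."},
]
document_markdown = join_segments(segments)
```

A contract test could assert that a document's `markdown` field equals `join_segments(document["segments"])` for adapters that do not emit frontmatter or separators.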
## Hashing And Cache Keys
Source asset digests use the source bytes:
```text
sha256:<hex>
```
Document IDs should be stable across machines and based on:
- normalized source asset URI or path
- source asset digest when available
- adapter ID
- adapter version
Normalization cache keys should be based on canonical JSON containing:
- source asset URI or path
- source asset digest
- adapter ID
- adapter version
- normalized model version
- read options
Use this prefix:
```text
source-normalize:sha256:<hex>
```
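A minimal sketch of both computations, using only `hashlib` and `json` from the standard library. The payload key names (`source`, `digest`, `adapter_id`, and so on) are assumptions; the contract lists the inputs but does not pin their names.

```python
import hashlib
import json


def source_digest(data: bytes) -> str:
    """Digest of the raw source bytes in the sha256:<hex> form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def normalization_cache_key(
    *,
    source: str,
    digest: str,
    adapter_id: str,
    adapter_version: str,
    model_version: str,
    options: dict,
) -> str:
    """Hash canonical JSON of the listed inputs and add the pinned prefix."""
    payload = {
        "source": source,
        "digest": digest,
        "adapter_id": adapter_id,
        "adapter_version": adapter_version,
        "model_version": model_version,
        "options": options,
    }
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "source-normalize:sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the JSON is canonical, the key does not depend on the order in which read options were supplied.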
## Read Adapter Protocol
`MKTT-WP-0018` should implement Python `Protocol` classes equivalent to:
```python
from typing import Protocol


class SourceReadAdapter(Protocol):
    descriptor: SourceAdapterDescriptor

    def can_read(self, request: SourceAdapterMatchRequest) -> SourceAdapterMatch:
        ...

    def inspect(self, request: SourceInspectRequest) -> SourceInspectResult:
        ...

    def read(self, request: SourceReadRequest) -> SourceReadResult:
        ...
```
Request and result objects:
| Type | Required fields | Meaning |
| --- | --- | --- |
| `SourceAdapterMatchRequest` | `asset`, `options` | Cheap matching request. |
| `SourceAdapterMatch` | `adapter_id`, `matched`, `confidence`, `reason`, `diagnostics` | Match result. Confidence is `0` to `100`. |
| `SourceInspectRequest` | `asset`, `options` | Metadata-only inspection request. |
| `SourceInspectResult` | `valid`, `asset`, `adapter`, `metadata`, `capabilities`, `diagnostics`, `quality` | Inspection result without full Markdown conversion. |
| `SourceReadRequest` | `asset`, `options` | Full normalization request. |
| `SourceReadResult` | `valid`, `document`, `diagnostics` | Normalized read result. |
`inspect` must not perform full conversion. It may open enough of the source to
validate structure and collect metadata. `read` may perform full extraction.
Options must be JSON-serializable. Adapter-specific options should be declared
in `option_schema`. Unknown options should produce
`source.unknown_option` unless the descriptor explicitly permits free-form
options.
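Option handling might look like the sketch below. It assumes `option_schema` follows the JSON Schema convention of a top-level `properties` mapping, which the contract does not pin, and the function names, the `allow_free_form` flag, and the diagnostic dict shape are hypothetical.

```python
def parse_option_pairs(pairs: list[str]) -> dict[str, str]:
    """Turn repeated key=value strings into an options dict."""
    options: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value
    return options


def unknown_option_diagnostics(
    options: dict, option_schema: dict, allow_free_form: bool = False
) -> list[dict]:
    """Report options that the descriptor's schema does not declare."""
    if allow_free_form:
        return []
    known = set(option_schema.get("properties", {}))
    return [
        {
            "code": "source.unknown_option",
            "severity": "warning",  # the severity is an assumption
            "option": name,
        }
        for name in sorted(options)
        if name not in known
    ]
```

Only the first `=` splits a pair, so values may themselves contain `=`.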
## Adapter Selection
Selection is deterministic:
1. If an explicit adapter ID is provided, use only that descriptor.
2. Prefer media type matches over extension-only matches.
3. Prefer higher `can_read().confidence`.
4. Prefer descriptors whose declared optional dependencies are installed.
5. Break remaining ties by descriptor ID in ascending lexical order and emit
warning `source.adapter_ambiguous`.
No matching adapter returns an error diagnostic:
```text
source.unsupported_format
```
Malformed sources return an error diagnostic:
```text
source.malformed
```
Missing required optional dependencies return:
```text
source.missing_dependency
```
Warnings do not make a result invalid. Any error diagnostic makes `valid`
false.
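The selection order can be expressed as a sort key. The sketch assumes each candidate has already matched and carries its `can_read()` confidence; the `Candidate` type, the function names, and the diagnostic dict shape are hypothetical. Treating an unknown explicit adapter ID as `source.unsupported_format` is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    adapter_id: str
    media_type_match: bool  # False means the match was extension-only
    confidence: int  # 0 to 100, from can_read()
    dependencies_available: bool


def _rank(c: Candidate) -> tuple:
    # Smaller sorts first: media type match, higher confidence, deps present.
    return (not c.media_type_match, -c.confidence, not c.dependencies_available)


def select_adapter(
    candidates: list[Candidate], explicit_id: Optional[str] = None
) -> tuple[Optional[Candidate], list[dict]]:
    if explicit_id is not None:
        candidates = [c for c in candidates if c.adapter_id == explicit_id]
    if not candidates:
        return None, [{"code": "source.unsupported_format", "severity": "error"}]
    ordered = sorted(candidates, key=lambda c: (_rank(c), c.adapter_id))
    best = ordered[0]
    diagnostics: list[dict] = []
    if len(ordered) > 1 and _rank(ordered[1]) == _rank(best):
        diagnostics.append({"code": "source.adapter_ambiguous", "severity": "warning"})
    if not best.dependencies_available:
        diagnostics.append({"code": "source.missing_dependency", "severity": "error"})
    return best, diagnostics


def is_valid(diagnostics: list[dict]) -> bool:
    """Warnings keep a result valid; any error makes it invalid."""
    return not any(d["severity"] == "error" for d in diagnostics)
```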
## CLI Contract
The public commands are:
```bash
mkt source adapters
mkt source inspect <path>
mkt source normalize <path> --format markdown
```
Common options:
```text
--adapter <adapter-id> Explicit adapter selection.
--format text|json|yaml For adapters and inspect.
--format markdown|json|yaml For normalize.
--option key=value Adapter-specific option, repeatable.
--output <path> Write normalized output.
```
Exit behavior:
| Exit | Meaning |
| --- | --- |
| `0` | Operation valid; warning diagnostics may exist. |
| `1` | Operation completed with error diagnostics. |
| `2` | CLI usage error from Click. |
JSON output must contain a top-level `valid` field for operations that can
fail. Markdown output writes only normalized Markdown to stdout or `--output`;
diagnostics for Markdown output go to stderr. If normalization is invalid, do
not emit partial Markdown unless a future option explicitly requests it.
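The Markdown output rules and exit codes `0` and `1` can be sketched as a single function; exit code `2` is left to Click. The function name and the result dict shape (mirroring the JSON fixtures) are assumptions, not the planned CLI internals.

```python
import sys
from typing import TextIO


def emit_markdown_result(
    result: dict, stdout: TextIO = sys.stdout, stderr: TextIO = sys.stderr
) -> int:
    """Write Markdown to stdout, diagnostics to stderr; return the exit code."""
    for diagnostic in result.get("diagnostics", []):
        print(f"{diagnostic['severity']}: {diagnostic['code']}", file=stderr)
    if not result["valid"]:
        # Invalid normalization: emit no partial Markdown.
        return 1
    stdout.write(result["document"]["markdown"])
    return 0
```

Passing the streams as parameters keeps the function testable with `io.StringIO` instead of capturing real process output.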
## API Contract
`MKTT-WP-0018` should export these names from `markitect_tool`:
```text
SourceAsset
SourceMetadata
SourceProvenance
NormalizedMarkdownSegment
NormalizedMarkdownDocument
NormalizationQuality
SourceAdapterDescriptor
SourceReadAdapter
SourceAdapterRegistry
SourceAdapterMatchRequest
SourceAdapterMatch
SourceInspectRequest
SourceInspectResult
SourceReadRequest
SourceReadResult
default_source_adapter_registry
discover_source_adapters
inspect_source
normalize_source
```
Direct API helpers should accept an optional registry and adapter ID so tests
and sibling repos can avoid global discovery when they need deterministic
fixtures.
## Contract Tests For MKTT-WP-0018
Implementation should add tests for:
- `SourceAsset`, metadata, provenance, quality, segment, and document
serialization
- source document cache-key determinism
- fake in-tree adapter registration and read behavior
- fake external entry point discovery
- optional dependency diagnostics
- unsupported format diagnostics
- malformed source diagnostics
- adapter selection tie behavior
- CLI `source adapters` JSON fixture
- CLI `source inspect` JSON fixture
- CLI `source normalize --format json` fixture
- CLI `source normalize --format markdown` fixture
- public API exports
Fixtures live in `examples/source-adapters/` and should be reused by tests where
practical.
## Markitect-filter Handoff
The first `markitect-filter` implementation should provide an EPUB3 descriptor:
```text
id: source.epub3
name: EPUB3
operations: read
media_types: application/epub+zip
extensions: .epub
entry_point: markitect_filter.adapters:epub3_adapter_descriptor
```
The EPUB3 adapter should inspect and normalize:
- `META-INF/container.xml`
- the OPF package document
- Dublin Core and package metadata
- spine reading order
- navigation labels
- body XHTML as ordered Markdown segments
- source hrefs, anchors, sections, and page references where available
It should classify or skip cover, navigation, table-of-contents, header,
footer, license, and transcriber-note material through explicit options and
diagnostics. It should report unsupported media, malformed package structure,
skipped assets, and lossy extraction.
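As an illustration of the first inspection step, the sketch below locates the OPF package document through `META-INF/container.xml` using only the standard library. The function name and the diagnostic dict shape are hypothetical; the container namespace and the `full-path` attribute come from the EPUB Open Container Format. A real adapter would go on to parse the OPF metadata, spine, and navigation.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def _malformed(message: str) -> dict:
    return {"code": "source.malformed", "severity": "error", "message": message}


def find_opf_path(epub_bytes: bytes) -> tuple[Optional[str], list[dict]]:
    """Return the OPF path declared in container.xml, or diagnostics."""
    try:
        with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
            container = archive.read("META-INF/container.xml")
        root = ET.fromstring(container)
    except (zipfile.BadZipFile, KeyError, ET.ParseError) as exc:
        # Not a zip, missing container.xml, or unparseable XML.
        return None, [_malformed(str(exc))]
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None or not rootfile.get("full-path"):
        return None, [_malformed("container.xml declares no rootfile full-path")]
    return rootfile.get("full-path"), []
```

Returning diagnostics instead of raising keeps malformed packages on the structured `source.malformed` path described under Adapter Selection.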

@@ -42,8 +42,8 @@ and descriptions mirror the operational view.
| `MKTT-WP-0012` | complete | done | `MKTT-WP-0004`, `MKTT-WP-0010`, `MKTT-WP-0011` | Document function layer is complete: deterministic Markdown-native function descriptors, registry, inline/fenced syntax, pipelines, context bindings, CLI, docs, examples, diagnostics, provenance, and extension descriptor. |
| `MKTT-WP-0008` | complete | done | `MKTT-WP-0006`, `MKTT-WP-0007`, `MKTT-WP-0009` | Agent working-memory context cache is complete: context package schema, local registry, package creation from queries/search/manifests, deterministic summaries, namespaces, activation/deactivation/refresh/explain lifecycle, policy re-checks, CLI, docs, and examples. |
| `MKTT-WP-0017` | complete | done | `MKTT-WP-0003`, `MKTT-WP-0013` | CLI/API polish and practical adoption track is complete: shell completion, extension discovery, generated CLI/API docs, usecase relevance matrix, E2E fixture matrix, large-corpus smoke coverage, first-use docs, examples index, and command cheat sheet. |
-| `MKTT-WP-0019` | P0 | active | `MKTT-WP-0013`, `MKTT-WP-0017` | Source adapter contract refinement: pin v1 read-only scope, field-level normalized model semantics, external adapter entry point discovery, CLI/API envelopes, fake adapter contract tests, and `markitect-filter` handoff before implementation. |
-| `MKTT-WP-0018` | P1 | todo | `MKTT-WP-0013`, `MKTT-WP-0017`, `MKTT-WP-0019` | Source adapter framework implementation: implement the contract refined in `MKTT-WP-0019`, keeping format extraction in `markitect-filter` and the base install free of heavyweight conversion dependencies. |
+| `MKTT-WP-0019` | complete | done | `MKTT-WP-0013`, `MKTT-WP-0017` | Source adapter contract refinement is complete: v1 read-only scope, normalized model fields, package entry point discovery, CLI/API envelopes, fake adapter fixtures, and `markitect-filter` EPUB3 handoff are pinned in `docs/source-adapter-contract.md`. |
+| `MKTT-WP-0018` | P0 | active | `MKTT-WP-0013`, `MKTT-WP-0017`, `MKTT-WP-0019` | Source adapter framework implementation is the current track: implement `docs/source-adapter-contract.md`, keeping format extraction in `markitect-filter` and the base install free of heavyweight conversion dependencies. |
| `MKTT-WP-0015` | P2 | todo | `MKTT-WP-0010`, `MKTT-WP-0011`, `MKTT-WP-0012` | Future render and document-function extensions: typed values, richer syntax, document-local reusable functions, Quarkdown/export adapters, render-aware references, assets, and permission sandboxing. Defer unless publishing/export pressure becomes current. |
| `MKTT-WP-0016` | P2 | todo | `MKTT-WP-0008`, `MKTT-WP-0007`, `MKTT-WP-0009`, `MKTT-WP-0013` | Follow-on agentic memory architecture: reasoning decision graphs, conversational paths, long-term knowledge graphs, memory service blueprints/profiles, graph-to-context-package compilation, and adapter boundaries. |
@@ -125,12 +125,16 @@ before the remaining advanced extension work because users and agents need a
complete, documented, shell-friendly, test-backed public surface before the
tool grows further.
-`MKTT-WP-0019` is the current source-adapter refinement track. It intentionally
-precedes `MKTT-WP-0018` so the implementation work does not have to invent
+`MKTT-WP-0019` completed the source-adapter refinement track. It pins the
field-level normalized model semantics, package entry point discovery, read
-protocol behavior, CLI/API envelopes, or `markitect-filter` handoff criteria
-while coding. The v1 contract should stay read-only; writer/export adapters
-belong in later format-specific work once preservation semantics are explicit.
+protocol behavior, CLI/API envelopes, fake adapter fixtures, and
+`markitect-filter` handoff criteria in `docs/source-adapter-contract.md`. The
+v1 contract is read-only; writer/export adapters belong in later
+format-specific work once preservation semantics are explicit.
+
+`MKTT-WP-0018` is now the current source-adapter implementation track. It
+should implement the pinned contract directly rather than reopening v1 model,
+entry point, protocol, or CLI/API decisions.
## State Hub Mirror

@@ -0,0 +1,52 @@
{
"count": 1,
"adapters": [
{
"id": "source.fake",
"kind": "source-adapter",
"version": "1",
"name": "Fake Source Adapter",
"summary": "Contract-test adapter for plain fixture sources.",
"operations": [
"read"
],
"media_types": [
"text/x.markitect-fake"
],
"extensions": [
".fake"
],
"capabilities": [
{
"id": "source",
"kind": "read"
},
{
"id": "markdown",
"kind": "normalize"
},
{
"id": "diagnostics",
"kind": "emit"
},
{
"id": "provenance",
"kind": "emit"
},
{
"id": "filesystem",
"kind": "read"
}
],
"safety": {
"reads_files": true,
"writes_files": false,
"network": false,
"external_process": false
},
"docs": [
"docs/source-adapter-contract.md"
]
}
]
}

@@ -0,0 +1,2 @@
[project.entry-points."markitect_tool.source_adapters"]
fake = "fake_adapter:fake_adapter_descriptor"

@@ -0,0 +1,35 @@
{
"valid": true,
"asset": {
"uri": "examples/source-adapters/sample.fake",
"path": "examples/source-adapters/sample.fake",
"name": "sample.fake",
"media_type": "text/x.markitect-fake",
"extension": ".fake",
"size": 128,
"digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111"
},
"adapter": {
"id": "source.fake",
"version": "1",
"options": {}
},
"metadata": {
"title": "Fake Source",
"creators": [
"Markitect Fixture"
],
"language": "en",
"identifiers": {
"fixture": "fake-source-001"
}
},
"capabilities": [
"read"
],
"quality": {
"lossiness": "none",
"confidence": 1.0
},
"diagnostics": []
}

@@ -0,0 +1,86 @@
{
"valid": true,
"document": {
"schema_version": "markitect.source.v1",
"document_id": "source.fake:fake-source-001",
"asset": {
"uri": "examples/source-adapters/sample.fake",
"path": "examples/source-adapters/sample.fake",
"name": "sample.fake",
"media_type": "text/x.markitect-fake",
"extension": ".fake",
"size": 128,
"digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111"
},
"metadata": {
"title": "Fake Source",
"creators": [
"Markitect Fixture"
],
"language": "en",
"identifiers": {
"fixture": "fake-source-001"
}
},
"markdown": "# Fake Source\n\nA small normalized segment.\n\n## Second Segment\n\nAnother deterministic segment.",
"segments": [
{
"segment_id": "seg-0001",
"order": 0,
"heading": "Fake Source",
"heading_level": 1,
"markdown": "# Fake Source\n\nA small normalized segment.",
"anchors": [
"fake-source"
],
"provenance": [
{
"source_uri": "examples/source-adapters/sample.fake",
"source_path": "examples/source-adapters/sample.fake",
"anchor": "fake-source",
"section": "Fake Source"
}
]
},
{
"segment_id": "seg-0002",
"order": 1,
"heading": "Second Segment",
"heading_level": 2,
"markdown": "## Second Segment\n\nAnother deterministic segment.",
"anchors": [
"second-segment"
],
"provenance": [
{
"source_uri": "examples/source-adapters/sample.fake",
"source_path": "examples/source-adapters/sample.fake",
"anchor": "second-segment",
"section": "Second Segment"
}
]
}
],
"quality": {
"lossiness": "none",
"confidence": 1.0,
"skipped_items": 0,
"warnings": 0
},
"diagnostics": [],
"provenance": [
{
"source_uri": "examples/source-adapters/sample.fake",
"source_path": "examples/source-adapters/sample.fake",
"digest": "sha256:1111111111111111111111111111111111111111111111111111111111111111"
}
],
"adapter": {
"id": "source.fake",
"version": "1",
"options": {}
},
"cache_key": "source-normalize:sha256:2222222222222222222222222222222222222222222222222222222222222222"
},
"diagnostics": []
}

@@ -0,0 +1,7 @@
# Fake Source
A small normalized segment.
## Second Segment
Another deterministic segment.

@@ -0,0 +1,11 @@
title: Fake Source
creator: Markitect Fixture
language: en
# Fake Source
A small normalized segment.
## Second Segment
Another deterministic segment.

@@ -3,10 +3,10 @@ id: MKTT-WP-0018
type: workplan
title: "Source Adapter Interface And Markdown Normalization Contract"
domain: markitect
-status: todo
+status: active
owner: markitect-tool
topic_slug: markitect
-planning_priority: P1
+planning_priority: P0
planning_order: 145
depends_on_workplans:
- MKTT-WP-0013
@@ -70,7 +70,8 @@ new `markitect-filter` repo.
`MKTT-WP-0019` must run first and pin the v1 contract details for this
implementation: field-level normalized model semantics, read-only protocol
shape, external package entry point discovery, CLI/API output envelopes, and
-fake adapter contract-test expectations.
+fake adapter contract-test expectations. Those decisions are captured in
+`docs/source-adapter-contract.md`.
`markitect-tool` should define:
@@ -120,8 +121,8 @@ Implement the cross-repo architecture pinned by `MKTT-WP-0019`:
- `kontextual-engine` can ingest adapter outputs into durable knowledge assets
Output: architecture note covering responsibilities, extension package shape,
-the pinned entry point contract, dependency policy, and migration path from the
-current `infospace-bench` EPUB spike.
+the `docs/source-adapter-contract.md` entry point contract, dependency policy,
+and migration path from the current `infospace-bench` EPUB spike.
## P18.2 - Canonical source-to-markdown data model
@@ -132,7 +133,8 @@ priority: high
state_hub_task_id: "f8164264-a9c1-4c82-8617-76bbb84a51bb"
```
-Implement the normalized output model specified by `MKTT-WP-0019`:
+Implement the normalized output model specified by
+`docs/source-adapter-contract.md`:
- `SourceAsset`
- `SourceMetadata`
@@ -154,7 +156,8 @@ The model should represent:
- lossiness/quality signals
- adapter name/version/options
-Output: public data model, serialization tests, and normalization contract
+Output: public data model, serialization tests using
+`examples/source-adapters/normalized-document.json`, and normalization contract
documentation matching the field-level v1 specification.
## P18.3 - Source adapter protocol and capability descriptors
@@ -199,7 +202,7 @@ Wire source adapters into the existing internal extension framework:
- register source adapter descriptors
- discover package-provided adapters through the entry point group pinned by
-`MKTT-WP-0019`
+`docs/source-adapter-contract.md`
- expose adapter capabilities via extension listing/inspection
- report missing optional dependency diagnostics
- ensure adapter packages can remain independently versioned
@@ -284,3 +287,5 @@ Output: migration note and follow-up workplan seeds for `markitect-filter` and
`markitect-tool` or `infospace-bench`.
- Writer/export adapter support is explicitly deferred beyond the v1 read
adapter contract.
- Implementation behavior matches `docs/source-adapter-contract.md` and the
fixtures in `examples/source-adapters/`.

@@ -3,10 +3,10 @@ id: MKTT-WP-0019
type: workplan
title: "Source Adapter Contract Refinement"
domain: markitect
-status: active
+status: done
owner: markitect-tool
topic_slug: markitect
-planning_priority: P0
+planning_priority: complete
planning_order: 142
depends_on_workplans:
- MKTT-WP-0013
@@ -82,7 +82,7 @@ The v1 source adapter contract should be:
```task
id: MKTT-WP-0019-T001
-status: todo
+status: done
priority: high
state_hub_task_id: "0aa1d9a3-6cf8-47ab-8585-f23b2512d19b"
```
@@ -99,11 +99,15 @@ Define the v1 source adapter scope:
Output: concise architecture note or source-adapter contract section that
`MKTT-WP-0018` can implement directly.
Implemented: `docs/source-adapter-contract.md` defines the v1 read-only scope,
local-file-first posture, external package shape, optional dependency policy,
and compatibility boundary for `markitect-filter`.
## P19.2 - Specify normalized data model fields and serialization
```task
id: MKTT-WP-0019-T002
-status: todo
+status: done
priority: high
state_hub_task_id: "fabd3e76-3c2c-43cb-92b2-2322bd933fa7"
```
@@ -125,11 +129,17 @@ anchors, source hrefs, page/section references, and adapter metadata.
Output: model contract documentation and fixture-shaped examples.
Implemented: `docs/source-adapter-contract.md` pins field-level model contracts
for source assets, metadata, provenance, segments, normalized documents, and
quality. `examples/source-adapters/normalized-document.json` and
`examples/source-adapters/normalized-output.md` provide fixture-shaped
examples.
## P19.3 - Specify read adapter protocol and selection semantics
```task
id: MKTT-WP-0019-T003
-status: todo
+status: done
priority: high
state_hub_task_id: "2d559e3b-1515-4c88-8ed9-3895026cd2ca"
```
@@ -147,11 +157,16 @@ Define the v1 read protocol:
Output: protocol contract that can be implemented as Python `Protocol`
classes in `MKTT-WP-0018`.
Implemented: `docs/source-adapter-contract.md` defines the v1
`SourceReadAdapter` protocol, request/result names, option handling, adapter
selection semantics, and deterministic diagnostics for unsupported, malformed,
and dependency-missing inputs.
## P19.4 - Define package entry point and registry contract
```task
id: MKTT-WP-0019-T004
-status: todo
+status: done
priority: high
state_hub_task_id: "3d661d24-2496-405a-b525-c7e6d8eb4e68"
```
@@ -170,11 +185,17 @@ Define how external source adapter packages register with `markitect-tool`:
Output: discovery contract and fake entry point test plan for
`MKTT-WP-0018`.
Implemented: `docs/source-adapter-contract.md` defines the
`markitect_tool.source_adapters` entry point group, accepted entry point object
shapes, descriptor mapping to `ExtensionDescriptor`, duplicate handling, and
dependency diagnostics. `examples/source-adapters/fake-adapter-pyproject.toml`
provides the fake entry point fixture.
## P19.5 - Pin CLI/API output envelopes and exit behavior
```task
id: MKTT-WP-0019-T005
-status: todo
+status: done
priority: medium
state_hub_task_id: "2c30b0c7-683e-4d60-8268-0b49660f2e30"
```
@@ -192,11 +213,17 @@ Specify the public source commands and library functions:
Output: CLI/API contract note and expected-output fixtures.
Implemented: `docs/source-adapter-contract.md` pins the `mkt source` command
surface, formats, options, exit behavior, and public API export names.
`examples/source-adapters/adapter-list.json` and
`examples/source-adapters/inspect-result.json` provide expected-output
fixtures.
## P19.6 - Prepare contract-test and markitect-filter handoff criteria
```task
id: MKTT-WP-0019-T006
-status: todo
+status: done
priority: high
state_hub_task_id: "f6845a4d-3465-40b3-970a-714cfafe282c"
```
@@ -219,6 +246,10 @@ Also seed the `markitect-filter` handoff:
Output: contract-test checklist and handoff note.
Implemented: `docs/source-adapter-contract.md` includes the `MKTT-WP-0018`
contract-test checklist and the first `markitect-filter` EPUB3 handoff
descriptor, fixture expectations, and extraction responsibilities.
## Acceptance
- `MKTT-WP-0018` has no unresolved v1 contract ambiguity around model fields,