Markitect boundary and reuse tests

2026-05-05 19:41:32 +02:00
parent 9f1b8da87a
commit ef8391e6a7
15 changed files with 490 additions and 6 deletions

@@ -16,6 +16,7 @@ Start here:
- `docs/stack-decision.md`
- `docs/markitect-main-scope-assessment.md`
- `docs/markitect-tool-reuse-boundary.md`
- `docs/markitect-tool-integration-usecases.md`
- `docs/phase-memory-boundary.md`
- `docs/system-layer-extraction-inventory.md`
- `docs/system-layer-migration-backlog.md`

@@ -201,8 +201,11 @@ Required MVP ports:
Adapter rules:
-- `markitect-tool` is an adapter for markdown syntax and context-package
-  interoperability.
+- `markitect-tool` is an adapter for markdown syntax, selector extraction,
+  deterministic markdown operations, snapshot identity, contracts/runtime
+  checks, and context-package interoperability. Engine domain code must not
+  import it directly; adapter code should persist serializable Markitect
+  outputs as adapter provenance or representation metadata.
- `llm-connect` or equivalent is an adapter for LLM providers.
- `phase-memory` is an adjacent memory runtime; this engine may exchange opaque
memory references or context packages but should not implement memory phases.

@@ -55,6 +55,11 @@ The strongest implementation wedge is:
review-gated where needed.
- Use adapters for markdown tooling, document extraction, AI providers, search,
workflow engines, external policy systems, and storage backends.
- Treat `markitect-tool` as the Markdown-native adapter dependency described in
`docs/markitect-tool-reuse-boundary.md` and
`docs/markitect-tool-integration-usecases.md`, with contract tests guarding
the expected parser, selector, operations, snapshot, and context-package
behavior.
## Workplan Set

@@ -0,0 +1,213 @@
# markitect-tool Integration Use Cases
Date: 2026-05-05
Status: contract examples for the `kontextual-engine` to `markitect-tool`
adapter boundary.
## Purpose
`kontextual-engine` should use `markitect-tool` for Markdown-native syntax,
selection, deterministic operations, snapshot identity, and portable
markdown-backed context packages. The engine should not rebuild those features.
Instead, it should wrap them as adapters and persist engine-owned assets,
lineage, policy decisions, audit events, and service contracts around them.
The executable companion for this document is
`tests/test_markitect_tool_contract.py`.
## Expected Dependency Shape
- Optional dependency: `kontextual-engine[markdown]`.
- Import preference: public exports from `markitect_tool` or documented
subpackages.
- Failure mode: when unavailable, engine adapters raise structured adapter
errors and non-Markdown functionality continues.
- Persistence posture: store serializable Markitect results and provenance as
adapter metadata, not as canonical domain objects.
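
The failure mode above could be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's adapter code: the `AdapterUnavailableError` class here is a local stand-in whose `details` argument is assumed, and `load_optional_module` is a hypothetical helper name.

```python
import importlib
from types import ModuleType
from typing import Any, Dict, Optional


class AdapterUnavailableError(RuntimeError):
    """Local stand-in for the engine's structured adapter error (shape assumed)."""

    def __init__(self, message: str, details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.details = details or {}


def load_optional_module(name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise a structured adapter error.

    `extra` names the packaging extra that would install the dependency.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise AdapterUnavailableError(
            f"{name} is required for this adapter",
            details={"module": name, "install_extra": extra},
        ) from exc
```

Because the import happens inside the adapter call rather than at module import time, code paths that never touch Markdown are unaffected when the `markdown` extra is not installed.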
## Use Case 1: Markdown Normalization
Intent: convert Markdown source content into structured frontmatter, headings,
sections, blocks, and source-aware metadata.
Expected Markitect APIs:
- `parse_markdown(markdown, source_path=...)`
- `parse_markdown_file(path)`
- `Document.to_dict()`
Example:
```python
from markitect_tool import parse_markdown
document = parse_markdown(markdown, source_path="decision.md")
frontmatter = document.frontmatter
headings = [heading.to_dict() for heading in document.headings]
```
Engine expectation:
- Ingestion stores the original source representation separately from the
normalized representation.
- Frontmatter and headings can become engine metadata.
- The parser object model remains Markitect-owned.
## Use Case 2: Selector-Based Markdown Extraction
Intent: extract stable Markdown units such as a section, heading set,
frontmatter path, or block without inventing a second selector language.
Expected Markitect APIs:
- `query_document(document, "sections[heading=Decision]")`
- `extract_document(document, "frontmatter.status")`
- optional JSONPath support through Markitect when the dependency is present.
Example:
```python
from markitect_tool import parse_markdown, query_document, extract_document
document = parse_markdown(markdown)
matches = query_document(document, "sections[heading=Decision]")
text = extract_document(document, "sections[heading=Decision]")[0]
```
Engine expectation:
- Engine retrieval contracts can reference source-grounded snippets derived
from Markitect matches.
- Cross-format retrieval remains engine-owned and cannot be reduced to
Markitect selectors.
## Use Case 3: Deterministic Markdown Operations
Intent: compose, include, or transform Markdown while preserving Markitect
operation provenance.
Expected Markitect APIs:
- `transform_markdown(...)`
- `compose_files(...)`
- `resolve_includes(...)`
- `OperationProvenance.to_dict()`
Example:
```python
from markitect_tool import resolve_includes, transform_markdown

included = resolve_includes(markdown, base_dir=root)
transformed = transform_markdown(
    included.markdown,
    set_frontmatter={"status": "draft"},
    heading_delta=1,
)
```
Engine expectation:
- The transformation run, actor, policy context, source versions, and derived
artifact identity are engine-owned.
- Markitect provenance is stored as adapter provenance inside the engine run.
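
One way to picture this split is an engine-owned run record that carries Markitect provenance only as serialized data. The sketch below is hypothetical: `TransformationRun`, its field names, and `record_run` are illustrative and not part of the engine or of `markitect-tool`. The provenance dictionaries stand in for whatever `OperationProvenance.to_dict()` returns.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TransformationRun:
    """Hypothetical engine-owned run record; field names are illustrative."""

    run_id: str
    actor: str
    source_versions: List[str]
    derived_artifact_id: str
    # Serialized adapter output, e.g. [event.to_dict() for event in result.provenance].
    adapter_provenance: List[Dict[str, Any]] = field(default_factory=list)


def record_run(run: TransformationRun) -> str:
    """Serialize the run; json.dumps raises TypeError if a payload is not JSON-safe."""
    return json.dumps(asdict(run), sort_keys=True)
```

Serializing at the boundary means a live Markitect object accidentally placed in `adapter_provenance` fails at record time instead of leaking into stored engine state.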
## Use Case 4: Snapshot Identity For Markdown Sources
Intent: reuse Markitect content-addressed snapshot identity for
Markdown-backed normalized representations and dependency tracking.
Expected Markitect APIs:
- `snapshot_identity_for_file(path, parse_options=..., contract_hash=...)`
- `SnapshotIdentity.snapshot_id`
- `SnapshotIdentity.to_dict()`
Example:
```python
from markitect_tool import snapshot_identity_for_file
identity = snapshot_identity_for_file("decision.md")
snapshot_id = identity.snapshot_id
```
Engine expectation:
- The engine can persist Markitect snapshot IDs as adapter metadata.
- Engine asset identity remains independent of Markitect snapshot identity.
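
The independence of the two identities can be illustrated with a small sketch. All names here (`Representation`, `new_asset_id`, `attach_snapshot`, the `markitect_snapshot` key) are hypothetical, and the snapshot dictionary stands in for the output of `SnapshotIdentity.to_dict()`.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Representation:
    """Hypothetical engine representation record; field names are illustrative."""

    asset_id: str
    media_type: str
    adapter_metadata: Dict[str, Any] = field(default_factory=dict)


def new_asset_id() -> str:
    """Engine identity is minted by the engine, not derived from any snapshot."""
    return str(uuid.uuid4())


def attach_snapshot(rep: Representation, snapshot: Dict[str, Any]) -> Representation:
    """Store a serialized snapshot identity as adapter metadata only."""
    rep.adapter_metadata["markitect_snapshot"] = dict(snapshot)
    return rep
```

Under this shape, a re-parse that yields a new snapshot ID updates metadata on the representation while the asset keeps its engine-assigned identity.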
## Use Case 5: Markdown-Backed Agent Context Packages
Intent: build portable context packages from Markdown sources while applying
local label policy where available.
Expected Markitect APIs:
- `create_context_package_from_sources(...)`
- `activate_context_package(...)`
- `MemoryNamespace`
- `ContextBudget`
- `LocalLabelPolicyGateway`
Example:
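```python
from markitect_tool import (
    ContextBudget,
    LocalLabelPolicyGateway,
    MemoryNamespace,
    activate_context_package,
    create_context_package_from_sources,
)

package = create_context_package_from_sources(
    "sections[heading=Decision]",
    [source_path],
    root=root,
    namespace=MemoryNamespace(project="kontextual-engine"),
    budget=ContextBudget(max_items=3),
)
# Policy shape mirrors the one used in tests/test_markitect_tool_contract.py.
gateway = LocalLabelPolicyGateway(
    {
        "id": "kontextual-engine-boundary",
        "subjects": {
            "reader": {
                "allowed_labels": ["public"],
                "allowed_actions": ["read", "activate"],
            }
        },
        "default_subject": "reader",
    }
)
activation = activate_context_package(package, policy_gateway=gateway, subject="reader")
```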
Engine expectation:
- Markitect context packages are portable markdown-backed payloads.
- Engine context packages must remain permission-aware, source-grounded,
auditable, and usable across non-Markdown assets.
## Use Case 6: Contracts Runtime And Markdown Workflows
Intent: reuse Markitect contract checks, runtime context, templates, document
functions, and markdown-centered workflow helpers where Markdown documents are
the subject.
Expected Markitect APIs:
- `check_document_contract(...)`
- `load_contract_file(...)`
- `evaluate_form_state(...)`
- `WorkflowRunner`, `load_workflow_file(...)`
- document function, template, generation, and processor public APIs.
Engine expectation:
- Markitect checks can become workflow steps or validation adapters.
- The engine owns workflow templates, run state, retries, review gates,
exceptions, audit, and derived artifacts.
## Integration Test Matrix

| Test area | Boundary protected |
| --- | --- |
| Parser and document shape | Markdown structure comes from Markitect. |
| Selector query and extraction | Engine does not invent duplicate selector behavior. |
| Transform and include provenance | Markdown ops retain Markitect provenance. |
| Snapshot identity | Engine stores Markitect snapshot metadata without owning the algorithm. |
| Context package policy filtering | Agent context can reuse Markitect packages and local label policy. |

These tests are intentionally small. They are not a replacement for
`markitect-tool`'s own test suite; they assert only the behaviors this engine
depends on.

@@ -8,6 +8,25 @@ This note records what `kontextual-engine` should reuse from
`markitect-tool` instead of reimplementing. `markitect-tool` is the syntax
layer; `kontextual-engine` is the system/runtime layer.
## Dependency Contract
`kontextual-engine` should integrate `markitect-tool` through documented public
Python APIs and adapter modules. The preferred import surface is the
top-level `markitect_tool` package or documented subpackages such as
`markitect_tool.query`, `markitect_tool.ops`, `markitect_tool.memory`,
`markitect_tool.policy`, and `markitect_tool.backend`.
The engine must treat returned Markitect objects as adapter payloads. Domain
state should persist serializable envelopes, source references, digests,
lineage, policy decisions, and audit events rather than storing Markitect
runtime objects as canonical engine entities.
Required integration behavior is captured in
`docs/markitect-tool-integration-usecases.md` and exercised by
`tests/test_markitect_tool_contract.py`. These tests are allowed to skip when
the optional `markitect-tool` dependency is not installed, but they become
stability checks for the boundary when the `markdown` extra is installed.
## Reuse As Adapter Dependencies
| Need in kontextual-engine | markitect-tool owner | Reuse posture |
@@ -22,6 +41,21 @@ layer; `kontextual-engine` is the system/runtime layer.
| Document functions, templates, and generation hooks | `markitect_tool.document_function`, `markitect_tool.generation` | Invoke as syntax-layer processors. Keep provider calls behind `llm-connect`. |
| Local label policy and policy adapter protocols | `markitect_tool.policy.*` | Reuse for markdown source/package filtering. Engine should expose policy-aware operations at artifact/service level. |
## Adapter Ownership Rules
- Markdown ingestion adapters may call `parse_markdown`, `parse_markdown_file`,
`query_document`, `extract_document`, and `snapshot_identity_for_file`.
- Markdown transformation adapters may call `transform_markdown`,
`compose_files`, `resolve_includes`, Markitect contract checks, document
functions, templates, and workflow helpers.
- Agent/context adapters may call Markitect context-package and local policy
APIs for markdown-backed context packages.
- Engine domain code must not import Markitect APIs directly.
- Service APIs must not expose the `mkt` CLI as the engine control surface.
- Cross-format query, policy, audit, workflow run, versioning, and export
contracts remain engine-owned even when Markitect produced the markdown
payload.
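
The rule that domain code must not import Markitect directly could be guarded by a static check along these lines. This is a sketch, not an existing test in the repository: `forbidden_imports` is a hypothetical helper, and it only sees static `import` statements, so dynamic imports through `importlib` would need a separate check.

```python
import ast
from typing import List

FORBIDDEN_ROOT = "markitect_tool"


def forbidden_imports(source: str, forbidden_root: str = FORBIDDEN_ROOT) -> List[str]:
    """Return imported module names in `source` that fall under `forbidden_root`."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Absolute "from X import Y" statements; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            if name == forbidden_root or name.startswith(forbidden_root + "."):
                found.append(name)
    return found
```

A boundary test could read each file under the domain package, call `forbidden_imports` on its text, and fail if the result is non-empty, while leaving adapter modules out of the scan.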
## Do Not Reimplement Here
- Markdown ASTs, section trees, frontmatter parsing, explode/implode, document
@@ -44,4 +78,3 @@ layer; `kontextual-engine` is the system/runtime layer.
- Agent-operable context continuity and service/programmatic APIs.
- Adapter registry that can call `markitect-tool`, `llm-connect`, and storage
backends without embedding their internals.

@@ -37,4 +37,7 @@ where = ["src"]
[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = ["src"]
markers = [
    "integration: tests that exercise optional external package contracts",
    "markitect_tool: tests for the optional markitect-tool adapter boundary",
]

@@ -74,7 +74,7 @@ class MarkdownIngestionAdapter:
     def ingest(self, request: IngestionRequest) -> IngestionResult:
         try:
-            from markitect_tool.core.parser import parse_markdown
+            from markitect_tool import parse_markdown
         except Exception as exc:  # pragma: no cover - exercised when optional dep absent
             raise AdapterUnavailableError(
                 "markitect-tool is required for markdown ingestion",
@@ -122,4 +122,3 @@ class IngestionService:
"No ingestion adapter registered for media type",
details={"media_type": request.media_type},
)

@@ -0,0 +1,154 @@
from pathlib import Path

import pytest


pytestmark = [pytest.mark.integration, pytest.mark.markitect_tool]

mkt = pytest.importorskip(
    "markitect_tool",
    reason="Install kontextual-engine[markdown] to run markitect-tool contract tests.",
)


SAMPLE_MARKDOWN = """---
document_type: decision
status: accepted
policy:
  labels: [public]
---

# Engine Boundary

## Context

The engine needs Markdown-native structure without owning a Markdown parser.

## Decision

Use markitect-tool as the syntax and deterministic operations layer.

## Consequences

- Engine assets stay cross-format.
- Markdown selectors stay Markitect-owned.
"""


def test_markitect_parser_returns_structured_markdown_document() -> None:
    document = mkt.parse_markdown(SAMPLE_MARKDOWN, source_path="docs/decision.md")

    serialized = document.to_dict()

    assert serialized["frontmatter"]["status"] == "accepted"
    assert serialized["source_path"] == "docs/decision.md"
    assert [heading["text"] for heading in serialized["headings"]] == [
        "Engine Boundary",
        "Context",
        "Decision",
        "Consequences",
    ]
    assert any(section["heading"]["text"] == "Decision" for section in serialized["sections"])


def test_markitect_selectors_extract_source_grounded_markdown_units() -> None:
    document = mkt.parse_markdown(SAMPLE_MARKDOWN)

    matches = mkt.query_document(document, "sections[heading=Decision]")
    extracted = mkt.extract_document(document, "sections[heading=Decision]")

    assert len(matches) == 1
    assert matches[0].kind == "section"
    assert matches[0].line is not None
    assert "deterministic operations layer" in matches[0].text
    assert extracted == [
        "## Decision\n\nUse markitect-tool as the syntax and deterministic operations layer."
    ]


def test_markitect_ops_resolve_includes_transform_and_return_provenance(tmp_path: Path) -> None:
    partial = tmp_path / "partial.md"
    partial.write_text(
        "# Included\n\n## Decision\n\nReuse Markitect operations.\n",
        encoding="utf-8",
    )

    included = mkt.resolve_includes(
        '{{include:partial.md}}',
        base_dir=tmp_path,
    )
    transformed = mkt.transform_markdown(
        included.markdown,
        set_frontmatter={"status": "draft"},
        heading_delta=1,
        source_path="composed.md",
    )

    assert included.included_paths == [str(partial.resolve())]
    assert included.provenance[0].operation == "include"
    assert included.provenance[0].target_path == str(partial.resolve())
    assert "status: draft" in transformed.markdown
    assert "## Included" in transformed.markdown
    assert "### Decision" in transformed.markdown
    assert [event.operation for event in transformed.provenance] == [
        "set_frontmatter",
        "shift_headings",
    ]


def test_markitect_snapshot_identity_is_content_addressed_adapter_metadata(tmp_path: Path) -> None:
    source = tmp_path / "decision.md"
    source.write_text(SAMPLE_MARKDOWN, encoding="utf-8")

    first = mkt.snapshot_identity_for_file(source, parse_options={"profile": "default"})
    second = mkt.snapshot_identity_for_file(source, parse_options={"profile": "default"})
    changed = mkt.snapshot_identity_for_file(source, parse_options={"profile": "strict"})

    assert first.snapshot_id == second.snapshot_id
    assert first.content_hash == second.content_hash
    assert first.parser == "markdown-it-py/commonmark"
    assert first.snapshot_id != changed.snapshot_id
    assert first.to_dict()["source_path"] == str(source)


def test_markitect_context_packages_filter_by_local_policy(tmp_path: Path) -> None:
    public = tmp_path / "public.md"
    private = tmp_path / "private.md"
    public.write_text(
        "---\npolicy:\n labels: [public]\n---\n# Public\n\nVisible context.\n",
        encoding="utf-8",
    )
    private.write_text(
        "---\npolicy:\n labels: [internal]\n---\n# Private\n\nHidden context.\n",
        encoding="utf-8",
    )

    package = mkt.create_context_package_from_sources(
        "document",
        [public, private],
        root=tmp_path,
        namespace=mkt.MemoryNamespace(project="kontextual-engine", task="boundary"),
        budget=mkt.ContextBudget(max_items=5),
    )
    gateway = mkt.LocalLabelPolicyGateway(
        {
            "id": "kontextual-engine-boundary",
            "subjects": {
                "reader": {
                    "allowed_labels": ["public"],
                    "allowed_actions": ["read", "activate"],
                }
            },
            "default_subject": "reader",
        }
    )
    activation = mkt.activate_context_package(
        package,
        policy_gateway=gateway,
        subject="reader",
    )

    assert package.namespace.project == "kontextual-engine"
    assert len(activation.items) == 1
    assert "Visible context" in activation.content
    assert "Hidden context" not in activation.content
    assert activation.policy["summary"]["denied"] == 1

@@ -33,10 +33,21 @@ workflow state, exportability, and agent-safe operation from the start.
- Updated scope and roadmap documentation.
- `docs/architecture-blueprint.md` as the architecture baseline for the V0.2
implementation sequence.
- `docs/markitect-tool-reuse-boundary.md` and
`docs/markitect-tool-integration-usecases.md` as the explicit boundary
between markdown syntax tooling and the engine runtime.
- Architecture decision notes for the P0 capability baseline.
- Traceability from PRD/FRS V0.2 requirements to implementation workplans.
- Revised implementation sequence for `KONT-WP-0005` through `KONT-WP-0010`.
## markitect-tool Boundary Remark
The architecture work must treat `markitect-tool` as the Markdown-native syntax
and operations dependency, not as a subsystem to copy into this repo.
Architecture decisions should require adapter-only imports, public Markitect
APIs, and integration contract tests for parser, selector, operation,
snapshot, and context-package behavior.
## A4.1 - Reconcile implementation baseline with V0.2 vision
```task
@@ -192,6 +203,8 @@ Acceptance:
- `docs/knowledge-operations-roadmap.md` maps PRD/FRS areas to workplans.
- `docs/architecture-blueprint.md` defines the implementation shape and review
checklist.
- Markitect dependency boundaries and use cases are linked from roadmap and
scope materials where they affect implementation sequencing.
- `README.md` points to the new research and roadmap materials.
## Definition Of Done

@@ -37,6 +37,14 @@ ports, policy port, audit port, and SQLite/in-memory adapters described in
`docs/architecture-blueprint.md`. The asset registry must not depend on HTTP,
source connectors, document extractors, search backends, or AI providers.
## markitect-tool Boundary Remark
The asset registry may persist Markitect snapshot IDs, parser metadata,
frontmatter-derived metadata, selector references, and operation provenance as
adapter metadata on representations or versions. It must not make Markitect
document classes canonical engine entities, and asset identity must remain
independent of Markitect snapshot identity.
## G5.1 - Implement stable asset identity and source references
```task
@@ -76,6 +84,8 @@ Acceptance:
- Derived artifacts are stored as asset-linked records, not detached strings.
- Representation metadata includes media type, digest, size, extractor or
producer, and provenance.
- Markdown representation metadata can include serialized Markitect snapshot
identity without coupling engine identity to it.
## G5.3 - Implement metadata classification lifecycle and schema validation

@@ -37,6 +37,14 @@ Implement ingestion through connector and extractor ports described in
access, `markitect-tool`, PDF/document libraries, and dataset readers must live
behind adapters, not in the domain core.
## markitect-tool Boundary Remark
Markdown ingestion must use `markitect-tool` for Markdown parsing,
frontmatter, headings, sections, selectors, includes, contract checks where
needed, and snapshot identity. The engine should normalize Markitect results
into its common representation and preserve source/adapter provenance rather
than rebuilding Markdown syntax behavior.
## I6.1 - Implement ingestion job model status and retry surface
```task
@@ -110,6 +118,8 @@ Acceptance:
- Plain text produces normalized text representation and source provenance.
- Markdown extraction delegates to `markitect-tool` when available.
- Missing adapter dependencies fail with structured adapter errors.
- Parser, selector extraction, and snapshot identity behavior are covered by
the Markitect integration contract tests.
## I6.5 - Implement PDF office document and dataset baseline adapters

@@ -36,6 +36,14 @@ and policy checks described in `docs/architecture-blueprint.md`. Search indexes
and ranking backends are adapters; they must not define the stable query or
result contracts.
## markitect-tool Boundary Remark
For Markdown-backed assets, retrieval adapters may use Markitect selectors,
extraction helpers, local index concepts, and context-package source spans to
produce grounded units and snippets. Engine retrieval contracts, result
envelopes, policy filtering, pagination, feedback, and cross-format search
remain engine-owned.
## R7.1 - Implement query contracts pagination sorting and result envelopes
```task
@@ -149,6 +157,8 @@ Acceptance:
- Results explain why they were returned and where they originated.
- Snippets are permission filtered.
- Retrieval packages are suitable for later grounded answer generation.
- Markdown snippets can reference Markitect selector matches or context-package
spans as adapter provenance.
## R7.7 - Capture retrieval feedback and KPI measurement hooks

@@ -37,6 +37,14 @@ services, repository ports, event ports, policy checks, and audit events
described in `docs/architecture-blueprint.md`. Execution may start embedded,
but contracts must allow later queue or workflow-engine adapters.
## markitect-tool Boundary Remark
Markdown-specific transformations should delegate to Markitect operations,
contracts, runtime checks, templates, document functions, processors, and
workflow helpers. The engine owns the operation registry, run state, actors,
policy checks, derived artifact identity, lineage, retries, review gates, and
audit events.
## O8.1 - Implement transformation operation registry
```task
@@ -55,6 +63,8 @@ Acceptance:
supported asset types.
- Provider-specific LLM behavior remains behind adapters.
- Unsupported operations return structured capability errors.
- Markdown compose, include, transform, and validate operations are registered
as adapter-backed operations rather than reimplemented.
## O8.2 - Implement transformation runs with parameters actors and policy context

@@ -38,6 +38,14 @@ Implement the service API as an adapter over application services, following
Agent operations must use the bounded operation catalog, policy checks, audit
events, dry-run behavior, and review gates described in the blueprint.
## markitect-tool Boundary Remark
Service and agent APIs may expose engine operations that internally use
Markitect for markdown-backed context packages, selectors, validation, and
deterministic markdown operations. They must not expose the `mkt` CLI as the
engine control plane or let agents bypass engine policy, audit, lifecycle, and
review gates through Markitect APIs.
## S9.1 - Implement versioned FastAPI service skeleton and health contracts
```task
@@ -151,6 +159,8 @@ Acceptance:
- Package contents are source-grounded and permission filtered.
- External memory references remain opaque and respect
`docs/phase-memory-boundary.md`.
- Markdown-backed packages can interoperate with Markitect context-package
payloads while remaining wrapped in engine permission and audit contracts.
## S9.7 - Implement dry-run review-gate and contract-test coverage

@@ -38,6 +38,14 @@ ports, services, audit model, and export package model described in
`docs/architecture-blueprint.md`. Export and observability must preserve policy
checks and must not require direct storage access.
## markitect-tool Boundary Remark
Observability and export should surface Markitect adapter provenance, snapshot
identity, selector references, context-package manifests, and operation
provenance where markdown-backed assets depend on them. Export formats remain
engine-owned and should include Markitect payloads as documented adapter
sections, not as the whole portability model.
## E10.1 - Expose operational metrics events and job inspection
```task
@@ -138,6 +146,8 @@ Acceptance:
status, policy exceptions, derived artifact creation, and review decisions.
- Storage, index, queue, workflow, AI, and model backend abstractions remain
externally semantic-preserving.
- Markitect adapter contract tests are part of the extension compatibility
posture for markdown-related engine capabilities.
## E10.6 - Capture retrieval AI cost and quality signals