diff --git a/docs/asset-registry-implementation.md b/docs/asset-registry-implementation.md
index 0a278e4..cb21fc3 100644
--- a/docs/asset-registry-implementation.md
+++ b/docs/asset-registry-implementation.md
@@ -82,6 +82,7 @@ and SQLite repositories are adapters behind those ports.
 - `core_relationships`
 - `asset_versions`
 - `audit_events`
+- `retrieval_feedback`
 - `idempotency_records`
 
 Payloads are stored as compact JSON envelopes while indexed columns carry
diff --git a/docs/current-state-overlap-review.md b/docs/current-state-overlap-review.md
new file mode 100644
index 0000000..3ae4eec
--- /dev/null
+++ b/docs/current-state-overlap-review.md
@@ -0,0 +1,165 @@
+# Current State Overlap Review
+
+Date: 2026-05-06
+
+## Purpose
+
+Compare the current `kontextual-engine` implementation with
+`/home/worsch/markitect-main` and `/home/worsch/markitect-tool` after
+completion of `KONT-WP-0007`.
+
+The review asks two questions:
+
+1. Did we capture the useful successor scope from `markitect-main`?
+2. Did we accidentally create unhealthy overlap with `markitect-tool`?
+
+## Inputs Reviewed
+
+- Current engine implementation in `src/kontextual_engine/`.
+- Current workplans, especially `KONT-WP-0005`, `KONT-WP-0006`, and
+  `KONT-WP-0007`.
+- Existing boundary notes:
+  - `docs/markitect-main-scope-assessment.md`
+  - `docs/markitect-tool-reuse-boundary.md`
+  - `docs/markitect-tool-integration-usecases.md`
+  - `docs/system-layer-migration-backlog.md`
+- `markitect-main` areas:
+  - `markitect/infospace/`
+  - `markitect/assets/`
+  - `markitect/spaces/`
+  - `markitect/prompts/`
+  - `markitect/query_paradigms/`
+  - `infrastructure/repositories/`
+- `markitect-tool` public API and source areas:
+  - `core`, `query`, `ops`, `backend`, `memory`, `policy`, `contract`,
+    `schema`, `workflow`, `runtime`, `reference`, and `document_function`.
+
+## Current Engine State
+
+The engine now has a coherent runtime foundation:
+
+- canonical asset identity through `KnowledgeAsset`,
+- separate source, normalized, and derived representations,
+- source references, metadata records, lifecycle, versions, audit, policy, and
+  idempotency,
+- durable memory and SQLite repository adapters,
+- ingestion adapters for local files, text, document metadata, datasets, and
+  Markdown via `markitect-tool`,
+- governed retrieval with lexical search, filters, contextual entities,
+  relationships, policy filtering, snippets, feedback, and KPI hooks.
+
+This is now materially beyond the original minimal artifact/query scaffold.
+The canonical implementation is the `core/`, `ports/`, `services/`, and
+`adapters/` architecture. The older simple modules such as `artifacts.py`,
+`storage.py`, `query.py`, `context.py`, `ingestion.py`, and `relationships.py`
+remain useful as compatibility scaffolding and migrated seed tests, but they
+should not be treated as the long-term canonical model.
+
+## Comparison With markitect-main
+
+### Healthy Successor Coverage
+
+`markitect-main` mixed many concepts into one project: Markdown syntax tooling,
+infospace experiments, assets, spaces, prompt workflow machinery, query
+paradigms, UI, finance, issue tracking, provider adapters, and repository
+infrastructure.
+
+The current engine has correctly lifted the system-layer concepts instead of
+porting old package structure directly.
+
+| `markitect-main` concept | Current engine successor | Assessment |
+| --- | --- | --- |
+| `markitect/assets/*` content-addressed asset records | `KnowledgeAsset`, `AssetRepresentation`, `SourceReference`, repository adapters | Healthy reimplementation. The engine model is more governed and cross-format. |
+| `markitect/infospace/models.py` entity metadata | `ContextEntity`, metadata records, asset classification | Healthy abstraction. Domain-specific section fields were not copied. |
+| `markitect/infospace/relation_models.py` triplets | `CoreRelationship` with target kind, confidence, actor, provenance, validity windows | Healthy reimplementation. More generic than VSM-specific relation metadata. |
+| `markitect/spaces/models.py` information spaces | Partly covered by collections/tags/source context metadata, not yet a first-class scope container | Gap remains. A future collection/scope model should avoid recreating old rendering-oriented `InformationSpace`. |
+| `markitect/query_paradigms/base.py` generic `QueryResult` | `AssetQueryResult`, `ContextEntityQueryResult`, `RelationshipQueryResult` | Healthy reimplementation. The engine now owns stable operational query contracts rather than query-paradigm plugins. |
+| `markitect/prompts/*` artifact, dependency, quality, run, lineage concepts | Workplans `KONT-WP-0008` and `KONT-WP-0010` | Not implemented yet. These should influence transformation/workflow work next. |
+| `infrastructure/repositories/*` SQLite/filesystem lessons | Engine repository ports and SQLite adapter | Healthy reimplementation. The old repository shape was document/workspace-specific and async-heavy. |
+
+### Remaining markitect-main Gaps
+
+The largest successor gaps are not retrieval gaps anymore. They are workflow and
+operation-state gaps:
+
+- transformation runs and derived artifact lineage,
+- workflow templates, step state, retries, review gates, and failures,
+- quality gates, impact debt, recomputation, and traceability,
+- first-class collection/scope membership,
+- service/API surfaces and agent-safe operation envelopes,
+- export and enterprise-readiness concerns.
+
+These are already covered by later workplans, especially `KONT-WP-0008`,
+`KONT-WP-0009`, and `KONT-WP-0010`.
+
+## Comparison With markitect-tool
+
+### Healthy Boundary
+
+The current implementation mostly respects the intended split:
+
+- `markitect-tool` owns Markdown parsing, selectors, transforms, includes,
+  document contracts, Markdown schema validation, local snapshot identity,
+  Markdown context packages, and Markdown-centered workflows.
+- `kontextual-engine` owns governed asset state, metadata, lifecycle, policy,
+  audit, cross-format retrieval, relationship/context graph, feedback, and KPI
+  hooks.
+
+The executable boundary checks pass against the sibling checkout:
+
+```text
+PYTHONPATH=/home/worsch/kontextual-engine/src:/home/worsch/markitect-tool/src \
+  python3 -m pytest tests/test_markitect_tool_contract.py \
+  tests/test_markitect_ingestion_adapter.py -q
+
+10 passed
+```
+
+The current Markdown ingestion adapter delegates parsing and snapshot identity
+to public `markitect_tool` APIs. It persists serializable normalized
+representations and adapter metadata rather than storing Markitect runtime
+objects as engine domain state.
+
+### No Unhealthy Overlap Found
+
+No current engine code reimplements these Markitect-owned capabilities:
+
+- Markdown AST construction,
+- selector language parsing,
+- Markitect document extraction,
+- Markdown transforms/includes/composition,
+- Markdown document contracts,
+- Markdown document schema validation,
+- Markitect context package activation,
+- Markitect local snapshot identity.
+
+That is the important line, and we are still on the correct side of it.
+
+### Watchlist: Benign Today, Risky If Expanded
+
+| Area | Current state | Risk | Recommended guardrail |
+| --- | --- | --- | --- |
+| Lexical retrieval index | Engine builds an in-memory substring index over normalized representation `search_text`. | Healthy as a cross-format MVP, but could become a duplicate of Markitect local index/FTS if expanded for Markdown-specific search. | Keep engine search backend-neutral. For durable Markdown FTS, wrap `markitect_tool.backend.IndexBackend` or query adapters instead of rebuilding Markitect local index semantics. |
+| Source-grounded snippets | Engine creates offset snippets from normalized search text and carries Markitect provenance if present. | Healthy as cross-format fallback, but exact Markdown section/block snippets should not grow into a second selector engine. | For Markdown-specific snippets, call `markitect_tool.query`/`reference` through an adapter and persist selector/source-span provenance. |
+| Policy primitives | Engine and Markitect both have a `PolicyDecision` concept. | Names overlap, but scopes differ: engine policy gates governed asset operations; Markitect policy filters Markdown objects/context packages. | Keep separate models and add explicit adapter mapping when using Markitect local/enterprise policy for Markdown-backed context. |
+| Context packages | Engine still has an older simple `ContextPackage` scaffold while Markitect has rich Markdown context packages. | Future agent context work could duplicate Markitect package behavior. | In `KONT-WP-0009`, treat Markitect context packages as Markdown adapter payloads and make engine context packages cross-format, audited, and policy-aware. |
+| Adapter post-processing | `MarkitectMarkdownExtractor` derives links/tables from serialized token/block payloads. | Low risk, but depends on Markitect serialization details. | Prefer public Markitect fields if they become available; add contract tests for link/table/source-span stability if these fields become operationally important. |
+
+## Recommendations
+
+1. Treat `core/`, `ports/`, `services/`, and `adapters/` as canonical.
+   Plan a cleanup pass to deprecate or quarantine the older simple
+   `artifacts.py`/`query.py`/`context.py` scaffold once successor contracts are
+   fully covered.
+2. Proceed to `KONT-WP-0008` using `markitect-main` prompt/workflow tests as
+   behavioral reference, not as code to port directly.
+3. For Markdown transformations in `KONT-WP-0008`, use `markitect_tool.ops`
+   and `markitect_tool.workflow` adapters. The engine should persist runs,
+   inputs, outputs, decisions, provenance, and derived artifacts.
+4. For future Markdown snippet precision, add an adapter that calls
+   `markitect_tool.query` and `markitect_tool.reference` instead of expanding
+   engine substring search into a Markdown selector system.
+5. Add a small policy mapping contract before integrating Markitect local or
+   enterprise policy into engine retrieval/context packages.
+6. Keep the current Markitect boundary tests in CI or at least in the standard
+   integration check profile. They are doing exactly the right job.
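
Review note on recommendation 5: the policy mapping contract could start as small as the sketch below. It only illustrates the "keep separate models, add explicit adapter mapping" guardrail from the watchlist. Every name and field here (`EnginePolicyDecision`, `MarkdownPolicyDecision`, `PolicyMapper`, `combine`, `rule_id`, and so on) is a hypothetical stand-in; neither the engine's nor Markitect's real `PolicyDecision` shape is shown in this diff, so the actual fields may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class EngineEffect(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class EnginePolicyDecision:
    """Hypothetical stand-in for the engine-side decision model."""
    effect: EngineEffect
    reason: str
    source: str  # which policy system produced the decision


@dataclass(frozen=True)
class MarkdownPolicyDecision:
    """Hypothetical stand-in for a Markitect-side decision; real fields may differ."""
    allowed: bool
    rule_id: str


class PolicyMapper(Protocol):
    """Assumed port: the only place where the two models touch."""

    def to_engine(self, decision: MarkdownPolicyDecision) -> EnginePolicyDecision: ...


class MarkdownPolicyMapper:
    """Translates a Markdown-scoped decision into the engine model."""

    def to_engine(self, decision: MarkdownPolicyDecision) -> EnginePolicyDecision:
        effect = EngineEffect.ALLOW if decision.allowed else EngineEffect.DENY
        return EnginePolicyDecision(
            effect=effect,
            reason=f"markdown-policy:{decision.rule_id}",
            source="markitect",
        )


def combine(
    engine_decision: EnginePolicyDecision, mapped: EnginePolicyDecision
) -> EnginePolicyDecision:
    """Deny wins: return the first denying decision, else the engine's own."""
    for decision in (engine_decision, mapped):
        if decision.effect is EngineEffect.DENY:
            return decision
    return engine_decision
```

Keeping the translation behind one mapper means a contract test can pin the mapping without either side importing the other's model, and the `source` field keeps the origin of a denial visible in audit records.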