stable asset queries, lexical search, filters, contextual entity and relationship retrieval, permission-aware fail-closed behavior, source-grounded snippets, feedback capture, and KPI hooks

2026-05-06 16:27:03 +02:00
parent 80a3e59701
commit 1e3c6fe34a
13 changed files with 3173 additions and 9 deletions

@@ -4,13 +4,13 @@ type: workplan
title: "Governed Retrieval And Context Graph"
domain: markitect
repo: kontextual-engine
status: todo
status: done
owner: codex
topic_slug: markitect
planning_priority: high
planning_order: 7
created: "2026-05-05"
updated: "2026-05-05"
updated: "2026-05-06"
state_hub_workstream_id: "64352515-9677-46bb-909a-9e2db4915dc7"
---
@@ -44,11 +44,32 @@ produce grounded units and snippets. Engine retrieval contracts, result
envelopes, policy filtering, pagination, feedback, and cross-format search
remain engine-owned.
## Implementation Status
As of 2026-05-06, the first retrieval slice is recorded in
`docs/retrieval-implementation.md`. It establishes asset query request/result
contracts, stable sorting and pagination, result envelopes with source
references, representations, metadata records, refreshable lexical search,
relevance metadata, zero-result smoke metadata, and structured validation
diagnostics. It also supports combined metadata, lifecycle, source-context,
tag, collection, timestamp, and representation filters across in-memory and
SQLite-backed repositories. The contextual graph slice adds direct contextual
entity and relationship query envelopes plus asset filters by contextual
entity, workflow run, related asset, and relationship predicate.
Permission-aware retrieval now uses the engine policy
gateway for query-scope and per-resource checks, with fail-closed denied
envelopes and retrieval audit events. Lexical queries can also return
source-grounded snippet packets with representation/source references and
adapter provenance. Feedback and KPI hooks persist retrieval feedback and
derive zero-result, precision, citation precision, safety, confidence, and
permission-filter timing signals. Remaining work is focused on multi-hop graph
traversal and ranking.
## R7.1 - Implement query contracts pagination sorting and result envelopes
```task
id: KONT-WP-0007-T001
status: todo
status: done
priority: high
state_hub_task_id: "5a1b0661-ce22-4ee6-a9e7-0aedce9d4356"
```
@@ -63,11 +84,22 @@ Acceptance:
references, and diagnostics.
- Invalid queries return structured validation errors.
Implemented:
- `AssetQueryRequest`, `AssetQueryItem`, `AssetQueryResult`, and
`AssetRetrievalService` provide the stable asset query contract.
- Queries return deterministic ordering with pagination metadata and
correlation IDs.
- Result entries expose asset identity, classification, source references,
representations, and metadata records.
- Invalid lifecycle, representation kind, sort key, sort order, limit, and
offset return structured diagnostics without raising raw exceptions.
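
The ordering and validation behavior listed above can be illustrated with a minimal sketch. It is not the engine's `AssetRetrievalService`; `Page`, `query_page`, `ALLOWED_SORT_KEYS`, and the diagnostic codes are hypothetical names chosen for illustration, and assets are assumed to be plain dictionaries with an `id` field.

```python
from dataclasses import dataclass, field

# Hypothetical sort keys for this sketch; the engine's real keys may differ.
ALLOWED_SORT_KEYS = {"title", "updated"}


@dataclass
class Page:
    items: list
    total: int
    limit: int
    offset: int
    diagnostics: list = field(default_factory=list)


def query_page(assets, sort_key="updated", descending=False, limit=10, offset=0):
    """Return one page of assets, or diagnostics if the request is invalid."""
    diagnostics = []
    if sort_key not in ALLOWED_SORT_KEYS:
        diagnostics.append({"field": "sort_key", "code": "invalid_value"})
    if not isinstance(limit, int) or limit < 1:
        diagnostics.append({"field": "limit", "code": "out_of_range"})
    if not isinstance(offset, int) or offset < 0:
        diagnostics.append({"field": "offset", "code": "out_of_range"})
    if diagnostics:
        # Report problems as data instead of raising.
        return Page([], 0, limit, offset, diagnostics)

    # Sort by id first; the second, stable sort keeps that order for ties,
    # so equal sort keys always come back in the same sequence.
    ordered = sorted(assets, key=lambda a: a["id"])
    ordered = sorted(ordered, key=lambda a: a[sort_key], reverse=descending)
    return Page(ordered[offset:offset + limit], len(ordered), limit, offset)
```

The id tie-break is what makes pagination stable: without it, two assets with the same `updated` value could swap places between page requests.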
## R7.2 - Implement lexical search over normalized content
```task
id: KONT-WP-0007-T002
status: todo
status: done
priority: high
state_hub_task_id: "5ec90dcb-473c-4d01-85f2-8db18de0b7d1"
```
@@ -81,11 +113,23 @@ Acceptance:
- Search indexes can be refreshed after ingestion or update.
- p95 latency and zero-result rate can be measured in smoke tests.
Implemented:
- Normalized ingestion now stores representation search text and length
metadata for retrieval indexing.
- `AssetRetrievalService.refresh_index()` builds a refreshable lexical index
with indexed asset and representation counts.
- Text queries perform lexical substring matching over normalized
representations and return relevance metadata including strategy, query,
match count, and matching representation IDs.
- Query result metadata includes zero-result and lexical index statistics for
later smoke/performance measurement.
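
As a rough sketch of the substring strategy described above, the following function scans an in-memory index and returns relevance metadata per asset. The index shape, the `lexical_search` name, the metadata keys, and the case-folding step are all assumptions made for illustration; the engine's own index and normalization may work differently.

```python
def lexical_search(index, query):
    """Substring search over {asset_id: {representation_id: search_text}}."""
    needle = query.casefold()  # assumed normalization for this sketch
    hits = []
    for asset_id, representations in index.items():
        matched = {}
        for rep_id, text in representations.items():
            # str.count counts non-overlapping occurrences.
            count = text.casefold().count(needle) if needle else 0
            if count:
                matched[rep_id] = count
        if matched:
            hits.append({
                "asset_id": asset_id,
                "relevance": {
                    "strategy": "lexical_substring",
                    "query": query,
                    "match_count": sum(matched.values()),
                    "representation_ids": sorted(matched),
                },
            })
    # Most matches first; asset id breaks ties so ordering is deterministic.
    hits.sort(key=lambda h: (-h["relevance"]["match_count"], h["asset_id"]))
    return {
        "items": hits,
        "metadata": {
            "zero_result": not hits,
            "indexed_assets": len(index),
            "indexed_representations": sum(len(r) for r in index.values()),
        },
    }
```

Carrying `zero_result` and the index counts in the result metadata lets a smoke test compute a zero-result rate without re-running queries.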
## R7.3 - Implement metadata lifecycle and source-context filters
```task
id: KONT-WP-0007-T003
status: todo
status: done
priority: high
state_hub_task_id: "9e7d0a5c-71d4-44ca-9b71-70f2206e4a02"
```
@@ -100,11 +144,24 @@ Acceptance:
- Filter behavior is covered across in-memory and durable backends where
supported.
Implemented:
- Asset queries support filters for asset type, lifecycle, sensitivity, owner,
topic, review state, source system/path, representation kind, collection,
tags, created/updated timestamp bounds, and custom metadata records.
- Text search can be combined with standard, source, tag, collection,
sensitivity, and metadata filters.
- Combined filter behavior is covered over in-memory and SQLite-backed asset
repositories.
- Permission enforcement is intentionally deferred to R7.5; current lifecycle
and sensitivity filters establish the policy inputs without claiming
authorization semantics.
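
Combined filters of the kind listed above are commonly evaluated as a conjunction, where an unset filter matches everything. The sketch below assumes that behavior and uses a hypothetical `matches` helper over dictionary-shaped assets; it performs no authorization, in line with the deferral to R7.5.

```python
from datetime import datetime


def matches(asset, *, lifecycle=None, tags=(), updated_after=None, metadata=None):
    """Return True when the asset satisfies every filter that was supplied."""
    if lifecycle is not None and asset.get("lifecycle") != lifecycle:
        return False
    # Every requested tag must be present on the asset.
    if not set(tags) <= set(asset.get("tags", ())):
        return False
    if updated_after is not None and asset["updated"] < updated_after:
        return False
    # Custom metadata records must match key by key.
    for key, value in (metadata or {}).items():
        if asset.get("metadata", {}).get(key) != value:
            return False
    return True


asset = {
    "lifecycle": "active",
    "tags": ["retrieval", "graph"],
    "updated": datetime(2026, 5, 6),
    "metadata": {"owner": "codex"},
}
selected = matches(asset, lifecycle="active", tags=["graph"],
                   updated_after=datetime(2026, 5, 5))
```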
## R7.4 - Implement contextual entity model and relationship retrieval
```task
id: KONT-WP-0007-T004
status: todo
status: done
priority: high
state_hub_task_id: "b3358059-ac58-4e37-985c-6e8c1cc6df30"
```
@@ -120,11 +177,27 @@ Acceptance:
- Callers can retrieve assets by project, case, topic, source, workflow run, or
related asset.
Implemented:
- Existing `ContextEntity`/`CoreRelationship` primitives are reused as the
canonical model; entity types now include workflow runs and generated
artifacts for operational graph use cases.
- `ContextEntityQueryRequest`/`ContextEntityQueryResult` provide stable
contextual entity lookup by type, name, external reference, and metadata.
- `RelationshipQueryRequest`/`RelationshipQueryResult` provide stable
relationship retrieval by source, target, asset, contextual entity,
workflow run, predicate, target kind, and direction.
- Asset queries can filter by contextual entity, workflow run, related asset,
and relationship predicate while returning relationship and contextual
entity context for matched assets.
- Graph retrieval behavior is covered across in-memory and SQLite-backed
repositories.
## R7.5 - Enforce permission-aware retrieval and fail-closed semantics
```task
id: KONT-WP-0007-T005
status: todo
status: done
priority: high
state_hub_task_id: "c6c93713-3ab1-41fb-bf35-15dd860b66fa"
```
@@ -140,11 +213,25 @@ Acceptance:
- Retrieval audit events capture actor, query scope, outcome, and policy
context.
Implemented:
- Retrieval services accept the engine `PolicyGateway`, defaulting to the
allow-all local adapter used elsewhere in the system.
- Asset, contextual entity, and relationship queries authorize the query scope
before loading result envelopes.
- Assets, contextual entities, and relationships are policy-filtered before
they are returned; relationships additionally require source and target
resource visibility so traversal cannot reveal denied assets or entities.
- Policy gateway failures produce empty denied envelopes with structured
diagnostics and fail-closed policy decisions.
- Retrieval audit events capture actor, correlation ID, query scope, policy
decision, outcome, result counts, and internal permission-filter counts.
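
The fail-closed behavior above can be sketched as follows. This is an illustration under stated assumptions, not the engine's `PolicyGateway` interface: the gateway is modeled as a plain callable `gateway(actor, resource_id) -> bool`, and `query_relationships`, the envelope keys, and the diagnostic codes are hypothetical.

```python
def _denied(code):
    """Build an empty denied envelope with a structured diagnostic."""
    return {"outcome": "denied", "items": [], "filtered_count": 0,
            "diagnostics": [{"code": code}]}


def query_relationships(gateway, actor, scope, relationships):
    """Return only relationships whose id, source, and target are all visible."""
    try:
        # Authorize the query scope before looking at any results.
        if not gateway(actor, scope):
            return _denied("scope_denied")
        items = [
            rel for rel in relationships
            if all(gateway(actor, rid)
                   for rid in (rel["id"], rel["source"], rel["target"]))
        ]
    except Exception:
        # Fail closed: a gateway error never falls through to "allow".
        return _denied("policy_gateway_error")
    return {"outcome": "allowed", "items": items,
            "filtered_count": len(relationships) - len(items),
            "diagnostics": []}
```

Checking both endpoints matters because a relationship that is itself visible would otherwise disclose the existence and identifier of a denied asset or entity.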
## R7.6 - Return source-grounded snippets citations and explanation data
```task
id: KONT-WP-0007-T006
status: todo
status: done
priority: medium
state_hub_task_id: "1a6d5a95-d87a-447a-a186-cb73162cd9a1"
```
@@ -160,11 +247,26 @@ Acceptance:
- Markdown snippets can reference Markitect selector matches or context-package
spans as adapter provenance.
Implemented:
- `RetrievalSnippet` packets expose asset, representation, source reference,
storage reference, media type, match offsets, match text, snippet text, and
adapter provenance.
- Lexical asset queries can request snippets through `include_snippets`,
`max_snippets`, and `snippet_radius`.
- Snippets are generated from normalized representation search text and are
attached only to policy-authorized asset results.
- Markitect selectors, source spans, context spans, adapter provenance,
snapshots, and extractor identity are preserved when supplied as
representation metadata.
- Snippet behavior is covered with permission filtering so denied matching
content does not leak through snippet packets.
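
To make the roles of `max_snippets` and `snippet_radius` concrete, here is a minimal sketch of snippet extraction over search text. The `extract_snippets` function and its output keys are hypothetical, and case-insensitive matching is an assumption; the engine's `RetrievalSnippet` packets carry more fields (source and storage references, provenance) than shown here.

```python
import re


def extract_snippets(text, query, radius=20, max_snippets=3):
    """Return up to max_snippets matches with `radius` characters of context."""
    if not query:
        return []
    snippets = []
    # re.escape treats the query as a literal string, not a pattern.
    for match in re.finditer(re.escape(query), text, flags=re.IGNORECASE):
        if len(snippets) >= max_snippets:
            break
        lo = max(0, match.start() - radius)
        hi = min(len(text), match.end() + radius)
        snippets.append({
            "match_start": match.start(),  # offsets into the original text
            "match_end": match.end(),
            "match_text": match.group(0),
            "snippet_text": text[lo:hi],
        })
    return snippets
```

In a permission-aware pipeline, a function like this would run only on representations of assets that already passed the policy filter, so denied content never reaches the snippet text.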
## R7.7 - Capture retrieval feedback and KPI measurement hooks
```task
id: KONT-WP-0007-T007
status: todo
status: done
priority: medium
state_hub_task_id: "e17e2839-400f-4348-98e3-f77acc0b2fde"
```
@@ -179,6 +281,20 @@ Acceptance:
- Precision@k, zero-result rate, permission-filter latency, and citation
precision have measurement hooks.
Implemented:
- `RetrievalFeedbackRecord` persists feedback labels for useful, irrelevant,
missing, unsafe, and low-confidence outcomes with actor, correlation ID,
query context, result references, notes, and metadata.
- Asset registry repository ports and memory/SQLite adapters persist and list
retrieval feedback.
- `AssetRetrievalService.record_feedback()` records authorized feedback with
structured diagnostics for invalid labels or denied feedback operations.
- `AssetRetrievalService.quality_metrics()` derives zero-result rate,
precision@k, citation precision, feedback totals, unsafe/low-confidence
counts, and permission-filter timing observations from query results,
feedback records, and retrieval audit events.
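
Two of the metrics named above have simple standard definitions, sketched here. The function names and input shapes are hypothetical and do not reflect how `quality_metrics()` is implemented; in particular, dividing by `k` even when fewer than `k` results were returned is one common convention, assumed here.

```python
def zero_result_rate(result_counts):
    """Fraction of queries that returned no results."""
    if not result_counts:
        return 0.0
    return sum(1 for count in result_counts if count == 0) / len(result_counts)


def precision_at_k(ranked_ids, useful_ids, k):
    """Fraction of the top-k results that feedback labeled as useful."""
    if k < 1:
        raise ValueError("k must be at least 1")
    top = ranked_ids[:k]
    # Missing positions (fewer than k results) count as misses.
    return sum(1 for item in top if item in useful_ids) / k


rate = zero_result_rate([3, 0, 1, 0])
p_at_2 = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=2)
```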
## Definition Of Done
- Retrieval tests cover text, metadata, lifecycle, relationship, contextual