| id | type | title | domain | repo | status | owner | topic_slug | planning_priority | planning_order | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KONT-WP-0007 | workplan | Governed Retrieval And Context Graph | markitect | kontextual-engine | done | codex | markitect | high | 7 | 2026-05-05 | 2026-05-06 | 64352515-9677-46bb-909a-9e2db4915dc7 |
KONT-WP-0007: Governed Retrieval And Context Graph
Purpose
Build retrieval as a governed operational capability: stable query contracts, text search, metadata and lifecycle filtering, contextual entities, relationship traversal, source-grounded snippets, permission checks, and quality feedback.
Requirement Coverage
Primary: FR-040 to FR-050 and FR-060 to FR-071.
Supporting: FR-120 to FR-126, FR-143 to FR-146, FR-163, FR-200 to FR-204.
Architecture Constraint
Implement retrieval through retrieval services, search ports, repository ports,
and policy checks described in docs/architecture-blueprint.md. Search indexes
and ranking backends are adapters; they must not define the stable query or
result contracts.
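
The constraint above can be illustrated with a minimal ports-and-adapters sketch. All names here (SearchPort, SubstringSearchAdapter, SketchRetrievalService) are hypothetical stand-ins and are not taken from docs/architecture-blueprint.md; the point is only that the service owns the result shape while the adapter supplies matches.

```python
from typing import Protocol


class SearchPort(Protocol):
    """Hypothetical port: adapters return matching asset IDs, nothing more."""

    def search(self, text: str) -> list[str]: ...


class SubstringSearchAdapter:
    """Illustrative adapter; a ranking backend could replace it unchanged."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = dict(documents)

    def search(self, text: str) -> list[str]:
        needle = text.lower()
        return sorted(
            asset_id
            for asset_id, body in self._documents.items()
            if needle in body.lower()
        )


class SketchRetrievalService:
    """Owns the result envelope so adapters cannot redefine the contract."""

    def __init__(self, search_port: SearchPort) -> None:
        self._search = search_port

    def query(self, text: str) -> dict:
        asset_ids = self._search.search(text)
        return {"asset_ids": asset_ids, "total": len(asset_ids)}
```

Swapping the adapter for another backend would leave the envelope returned by query() unchanged, which is the property the constraint asks for.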
markitect-tool Boundary Remark
For Markdown-backed assets, retrieval adapters may use Markitect selectors, extraction helpers, local index concepts, and context-package source spans to produce grounded units and snippets. Engine retrieval contracts, result envelopes, policy filtering, pagination, feedback, and cross-format search remain engine-owned.
Implementation Status
As of 2026-05-06, the first retrieval slice is recorded in
docs/retrieval-implementation.md. It establishes asset query request/result
contracts, stable sorting and pagination, result envelopes with source
references, representations, metadata records, refreshable lexical search,
relevance metadata, zero-result smoke metadata, and structured validation
diagnostics. It also supports combined metadata, lifecycle, source-context,
tag, collection, timestamp, and representation filters across in-memory and
SQLite-backed repositories. The contextual graph slice adds direct contextual
entity and relationship query envelopes plus asset filters by contextual
entity, workflow run, related asset, and relationship predicate.
Permission-aware retrieval now uses the engine policy gateway for query-scope
and per-resource checks, with fail-closed denied envelopes and retrieval audit
events. Lexical queries can also return source-grounded snippet packets with
representation/source references and adapter provenance. Feedback and KPI
hooks persist retrieval feedback and derive zero-result, precision, citation
precision, safety, confidence, and permission-filter timing signals. Remaining
work is focused on multi-hop graph traversal and ranking.
R7.1 - Implement query contracts pagination sorting and result envelopes
id: KONT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "5a1b0661-ce22-4ee6-a9e7-0aedce9d4356"
Define query requests, result envelopes, deterministic pagination, sorting, diagnostics, and correlation IDs.
Acceptance:
- Repeated equivalent queries return stable ordering within documented limits.
- Results include asset IDs, representation references, metadata, source references, and diagnostics.
- Invalid queries return structured validation errors.
Implemented:
- AssetQueryRequest, AssetQueryItem, AssetQueryResult, and AssetRetrievalService provide the stable asset query contract.
- Queries return deterministic ordering with pagination metadata and correlation IDs.
- Result entries expose asset identity, classification, source references, representations, and metadata records.
- Invalid lifecycle, representation kind, sort key, sort order, limit, and offset return structured diagnostics without raising raw exceptions.
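
A minimal sketch of the behavior described above, under stated assumptions: SketchQuery, SketchResult, and the allowed sort keys are hypothetical and do not reproduce the real AssetQueryRequest or AssetQueryResult fields. It shows validation returning diagnostics instead of raising, and a tie-break on asset ID so that repeated queries page identically.

```python
from dataclasses import dataclass, field
from uuid import uuid4

ALLOWED_SORT_KEYS = {"title", "updated"}  # assumed keys, for illustration only


@dataclass(frozen=True)
class SketchQuery:
    sort_key: str = "title"
    sort_order: str = "asc"
    limit: int = 10
    offset: int = 0


@dataclass
class SketchResult:
    items: list
    total: int
    diagnostics: list
    correlation_id: str = field(default_factory=lambda: str(uuid4()))


def validate(query):
    """Collect structured diagnostics instead of raising exceptions."""
    problems = []
    if query.sort_key not in ALLOWED_SORT_KEYS:
        problems.append({"field": "sort_key", "code": "invalid_value"})
    if query.sort_order not in ("asc", "desc"):
        problems.append({"field": "sort_order", "code": "invalid_value"})
    if query.limit < 1:
        problems.append({"field": "limit", "code": "out_of_range"})
    if query.offset < 0:
        problems.append({"field": "offset", "code": "out_of_range"})
    return problems


def run_query(assets, query):
    diagnostics = validate(query)
    if diagnostics:
        return SketchResult(items=[], total=0, diagnostics=diagnostics)
    # Include the asset ID in the sort key so equal values order deterministically.
    ordered = sorted(
        assets,
        key=lambda asset: (asset[query.sort_key], asset["id"]),
        reverse=query.sort_order == "desc",
    )
    page = ordered[query.offset:query.offset + query.limit]
    return SketchResult(items=page, total=len(ordered), diagnostics=[])
```

Without the ID tie-break, two assets with the same title could swap positions between pages when the backend returns rows in a different order.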
R7.2 - Implement lexical search over normalized content
id: KONT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "5ec90dcb-473c-4d01-85f2-8db18de0b7d1"
Implement MVP lexical search over normalized representations without making semantic/vector search a blocker.
Acceptance:
- Text search returns matching assets with relevance metadata.
- Search indexes can be refreshed after ingestion or update.
- p95 latency and zero-result rate can be measured in smoke tests.
Implemented:
- Normalized ingestion now stores representation search text and length metadata for retrieval indexing.
- AssetRetrievalService.refresh_index() builds a refreshable lexical index with indexed asset and representation counts.
- Text queries perform lexical substring matching over normalized representations and return relevance metadata including strategy, query, match count, and matching representation IDs.
- Query result metadata includes zero-result and lexical index statistics for later smoke/performance measurement.
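
The refresh-then-search flow above can be sketched as follows. SketchLexicalIndex and its return shapes are hypothetical; the engine's actual index and metadata keys may differ. The sketch assumes case-insensitive substring matching.

```python
class SketchLexicalIndex:
    """Illustrative in-memory index keyed by representation ID."""

    def __init__(self):
        self._texts = {}  # representation_id -> (asset_id, lowercased text)

    def refresh(self, representations):
        """Rebuild from (asset_id, representation_id, text) triples."""
        self._texts = {
            rep_id: (asset_id, text.lower())
            for asset_id, rep_id, text in representations
        }
        asset_ids = {asset_id for asset_id, _ in self._texts.values()}
        return {
            "indexed_assets": len(asset_ids),
            "indexed_representations": len(self._texts),
        }

    def search(self, query):
        needle = query.lower()
        hits = {}
        if needle:
            for rep_id, (asset_id, text) in self._texts.items():
                count = text.count(needle)
                if count:
                    entry = hits.setdefault(asset_id, {
                        "strategy": "lexical_substring",
                        "query": query,
                        "match_count": 0,
                        "representation_ids": [],
                    })
                    entry["match_count"] += count
                    entry["representation_ids"].append(rep_id)
        # Expose the zero-result flag so smoke tests can derive a rate.
        return {"results": hits, "zero_result": not hits}
```

Calling refresh() again after ingestion or update replaces the whole index, which is the simplest way to keep this sketch consistent.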
R7.3 - Implement metadata lifecycle and source-context filters
id: KONT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "9e7d0a5c-71d4-44ca-9b71-70f2206e4a02"
Support filters by asset type, collection, source, owner, tags, classification, sensitivity, lifecycle state, timestamps, and custom metadata.
Acceptance:
- Text search and metadata filters can be combined.
- Lifecycle and sensitivity filters participate in permission checks.
- Filter behavior is covered across in-memory and durable backends where supported.
Implemented:
- Asset queries support filters for asset type, lifecycle, sensitivity, owner, topic, review state, source system/path, representation kind, collection, tags, created/updated timestamp bounds, and custom metadata records.
- Text search can be combined with standard, source, tag, collection, sensitivity, and metadata filters.
- Combined filter behavior is covered over in-memory and SQLite-backed asset repositories.
- Permission enforcement is intentionally deferred to R7.5; current lifecycle and sensitivity filters establish the policy inputs without claiming authorization semantics.
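
As an illustration of combining text search with filters, here is a small sketch that treats every supplied filter as a conjunct. The function name, the asset dictionary layout, and the choice that all requested tags must be present are assumptions made for this example only.

```python
from datetime import date


def matches(asset, *, text=None, lifecycle=None, tags=(),
            updated_after=None, metadata=None):
    """Return True only if the asset satisfies every supplied filter."""
    if text is not None and text.lower() not in asset["text"].lower():
        return False
    if lifecycle is not None and asset["lifecycle"] != lifecycle:
        return False
    # Assumption: every requested tag must be present on the asset.
    if not set(tags) <= set(asset["tags"]):
        return False
    if updated_after is not None and asset["updated"] <= updated_after:
        return False
    for key, value in (metadata or {}).items():
        if asset["metadata"].get(key) != value:
            return False
    return True


example_asset = {
    "text": "Governed retrieval workplan",
    "lifecycle": "active",
    "tags": ["retrieval", "graph"],
    "updated": date(2026, 5, 6),
    "metadata": {"team": "engine"},
}
```

Because unset filters are skipped, the same predicate can run against rows from an in-memory or a durable repository once they are mapped to this shape.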
R7.4 - Implement contextual entity model and relationship retrieval
id: KONT-WP-0007-T004
status: done
priority: high
state_hub_task_id: "b3358059-ac58-4e37-985c-6e8c1cc6df30"
Represent contextual entities such as people, teams, projects, cases, topics, source systems, processes, products, and generated artifacts.
Acceptance:
- Assets can be linked to contextual entities.
- Relationship direction, type, validity, confidence, actor, and provenance are represented where available.
- Callers can retrieve assets by project, case, topic, source, workflow run, or related asset.
Implemented:
- Existing ContextEntity/CoreRelationship primitives are reused as the canonical model; entity types now include workflow runs and generated artifacts for operational graph use cases.
- ContextEntityQueryRequest/ContextEntityQueryResult provide stable contextual entity lookup by type, name, external reference, and metadata.
- RelationshipQueryRequest/RelationshipQueryResult provide stable relationship retrieval by source, target, asset, contextual entity, workflow run, predicate, target kind, and direction.
- Asset queries can filter by contextual entity, workflow run, related asset, and relationship predicate while returning relationship and contextual entity context for matched assets.
- Graph retrieval behavior is covered across in-memory and SQLite-backed repositories.
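
Direct (single-hop) relationship retrieval by predicate and direction might look like the sketch below. SketchRelationship and related() are hypothetical and much smaller than the CoreRelationship model, which also carries validity, confidence, actor, and provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRelationship:
    source: str
    predicate: str
    target: str


def related(relationships, node, *, predicate=None, direction="both"):
    """Return the direct neighbors of `node`, sorted for stable output."""
    found = set()
    for rel in relationships:
        if predicate is not None and rel.predicate != predicate:
            continue
        if direction in ("outgoing", "both") and rel.source == node:
            found.add(rel.target)
        if direction in ("incoming", "both") and rel.target == node:
            found.add(rel.source)
    return sorted(found)


EXAMPLE_GRAPH = [
    SketchRelationship("asset-1", "belongs_to", "project-x"),
    SketchRelationship("asset-2", "belongs_to", "project-x"),
    SketchRelationship("run-7", "generated", "asset-2"),
]
```

For example, related(EXAMPLE_GRAPH, "project-x", direction="incoming") lists the assets linked to the project, while adding predicate="generated" narrows a lookup to artifacts produced by a workflow run.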
R7.5 - Enforce permission-aware retrieval and fail-closed semantics
id: KONT-WP-0007-T005
status: done
priority: high
state_hub_task_id: "c6c93713-3ab1-41fb-bf35-15dd860b66fa"
Apply authorization and policy checks before returning content, metadata, snippets, relationships, derived artifacts, or context packages.
Acceptance:
- Unauthorized assets do not leak through result lists, snippets, relationship traversal, or derived answer packages.
- Missing or stale permission context fails closed according to policy.
- Retrieval audit events capture actor, query scope, outcome, and policy context.
Implemented:
- Retrieval services accept the engine PolicyGateway, defaulting to the allow-all local adapter used elsewhere in the system.
- Asset, contextual entity, and relationship queries authorize the query scope before loading result envelopes.
- Assets, contextual entities, and relationships are policy-filtered before they are returned; relationships additionally require source and target resource visibility so traversal cannot reveal denied assets or entities.
- Policy gateway failures produce empty denied envelopes with structured diagnostics and fail-closed policy decisions.
- Retrieval audit events capture actor, correlation ID, query scope, policy decision, outcome, result counts, and internal permission-filter counts.
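
The fail-closed rule for relationships can be sketched as below. The is_allowed callable stands in for a policy gateway check and is an assumption, as are the envelope keys; the real PolicyGateway interface is not shown in this workplan.

```python
def filter_relationships(relationships, actor, is_allowed):
    """Keep a relationship only if both endpoints are visible to the actor."""
    try:
        visible = [
            rel for rel in relationships
            if is_allowed(actor, rel["source"]) and is_allowed(actor, rel["target"])
        ]
    except Exception as exc:  # any gateway failure is treated as a denial
        return {
            "outcome": "denied",
            "items": [],
            "diagnostics": [{"code": "policy_unavailable", "detail": str(exc)}],
        }
    return {
        "outcome": "allowed",
        "items": visible,
        "diagnostics": [],
        # Internal count, useful for audit events rather than for callers.
        "filtered_count": len(relationships) - len(visible),
    }
```

Returning an empty denied envelope on gateway errors means a partially evaluated result list can never reach the caller.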
R7.6 - Return source-grounded snippets citations and explanation data
id: KONT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "1a6d5a95-d87a-447a-a186-cb73162cd9a1"
Return matched regions, snippets, source references, representation IDs, relationship context, and citation-ready data for grounded AI workflows.
Acceptance:
- Results explain why they were returned and where they originated.
- Snippets are permission filtered.
- Retrieval packages are suitable for later grounded answer generation.
- Markdown snippets can reference Markitect selector matches or context-package spans as adapter provenance.
Implemented:
- RetrievalSnippet packets expose asset, representation, source reference, storage reference, media type, match offsets, match text, snippet text, and adapter provenance.
- Lexical asset queries can request snippets through include_snippets, max_snippets, and snippet_radius.
- Snippets are generated from normalized representation search text and are attached only to policy-authorized asset results.
- Markitect selectors, source spans, context spans, adapter provenance, snapshots, and extractor identity are preserved when supplied as representation metadata.
- Snippet behavior is covered with permission filtering so denied matching content does not leak through snippet packets.
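
A minimal sketch of snippet extraction around lexical matches follows. The parameter names max_snippets and snippet_radius mirror the request options listed above, but the function itself, its dictionary keys, and the character-based radius are assumptions for illustration.

```python
def extract_snippets(text, query, *, max_snippets=3, snippet_radius=20):
    """Return match offsets and surrounding context for each occurrence.

    Assumes lowercasing preserves string length, which holds for ASCII text
    but not for every Unicode character.
    """
    snippets = []
    haystack, needle = text.lower(), query.lower()
    start = haystack.find(needle) if needle else -1
    while start != -1 and len(snippets) < max_snippets:
        end = start + len(needle)
        lo = max(0, start - snippet_radius)
        hi = min(len(text), end + snippet_radius)
        snippets.append({
            "match_start": start,
            "match_end": end,
            "match_text": text[start:end],
            "snippet_text": text[lo:hi],
        })
        # Continue after this match so occurrences never overlap.
        start = haystack.find(needle, end)
    return snippets
```

In a permission-aware pipeline this function would run only on representations of assets that already passed policy filtering, so denied text never enters a snippet.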
R7.7 - Capture retrieval feedback and KPI measurement hooks
id: KONT-WP-0007-T007
status: done
priority: medium
state_hub_task_id: "e17e2839-400f-4348-98e3-f77acc0b2fde"
Capture relevance feedback and quality signals for retrieval improvement.
Acceptance:
- Feedback can mark results useful, irrelevant, missing, unsafe, or low confidence.
- Query context and result metadata are stored with feedback.
- Precision@k, zero-result rate, permission-filter latency, and citation precision have measurement hooks.
Implemented:
- RetrievalFeedbackRecord persists feedback labels for useful, irrelevant, missing, unsafe, and low-confidence outcomes with actor, correlation ID, query context, result references, notes, and metadata.
- Asset registry repository ports and memory/SQLite adapters persist and list retrieval feedback.
- AssetRetrievalService.record_feedback() records authorized feedback with structured diagnostics for invalid labels or denied feedback operations.
- AssetRetrievalService.quality_metrics() derives zero-result rate, precision@k, citation precision, feedback totals, unsafe/low-confidence counts, and permission-filter timing observations from query results, feedback records, and retrieval audit events.
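
Two of the signals named above can be computed as in this sketch. The helper names and input shapes are hypothetical, and dividing by the number of judged results when fewer than k exist is one possible convention, not necessarily the one quality_metrics() uses.

```python
def precision_at_k(labels, k):
    """Fraction of the top-k results labelled useful.

    `labels` holds feedback labels in rank order, best rank first.
    Returns None when there is nothing to judge.
    """
    top = labels[:k]
    if not top:
        return None
    return sum(1 for label in top if label == "useful") / len(top)


def zero_result_rate(result_counts):
    """Fraction of queries that returned no results."""
    if not result_counts:
        return None
    return sum(1 for count in result_counts if count == 0) / len(result_counts)
```

Returning None for empty inputs keeps "no data yet" distinguishable from a measured rate of zero.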
Definition Of Done
- Retrieval tests cover text, metadata, lifecycle, relationship, contextual entity, pagination, permission, snippet, and feedback behavior.
- Retrieval does not bypass policy or source provenance.
- Search, relationship, and context retrieval contracts follow docs/architecture-blueprint.md.
- python3 -m pytest passes.