---
id: KONT-WP-0007
type: workplan
title: "Governed Retrieval And Context Graph"
domain: markitect
repo: kontextual-engine
status: done
owner: codex
topic_slug: markitect
planning_priority: high
planning_order: 7
created: "2026-05-05"
updated: "2026-05-06"
state_hub_workstream_id: "64352515-9677-46bb-909a-9e2db4915dc7"
---

# KONT-WP-0007: Governed Retrieval And Context Graph

## Purpose

Build retrieval as a governed operational capability: stable query contracts,
text search, metadata and lifecycle filtering, contextual entities,
relationship traversal, source-grounded snippets, permission checks, and
quality feedback.

## Requirement Coverage

Primary: FR-040 to FR-050 and FR-060 to FR-071.

Supporting: FR-120 to FR-126, FR-143 to FR-146, FR-163, FR-200 to FR-204.

## Architecture Constraint

Implement retrieval through retrieval services, search ports, repository ports,
and policy checks described in `docs/architecture-blueprint.md`. Search indexes
and ranking backends are adapters; they must not define the stable query or
result contracts.
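
The port/adapter split described above can be sketched as follows. This is a minimal illustration, not the engine's actual interfaces: `SearchPort`, `InMemorySubstringAdapter`, and `RetrievalServiceSketch` are hypothetical names, and the envelope shape is an assumption chosen for brevity.

```python
from typing import Protocol


class SearchPort(Protocol):
    """Hypothetical port: the only search surface the service depends on."""

    def search(self, text: str) -> list[str]:
        """Return the IDs of matching assets."""
        ...


class InMemorySubstringAdapter:
    """Hypothetical adapter; another backend could replace it unchanged."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents

    def search(self, text: str) -> list[str]:
        needle = text.lower()
        return sorted(
            asset_id
            for asset_id, body in self._documents.items()
            if needle in body.lower()
        )


class RetrievalServiceSketch:
    def __init__(self, search_port: SearchPort) -> None:
        self._search_port = search_port

    def query(self, text: str) -> dict:
        # The result envelope is shaped here, by the service, so swapping
        # the adapter cannot change the contract callers see.
        ids = self._search_port.search(text)
        return {"items": ids, "total": len(ids)}
```

Because the service only sees the port, a ranking backend can be exchanged without touching the query or result contracts.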

## markitect-tool Boundary Remark

For Markdown-backed assets, retrieval adapters may use Markitect selectors,
extraction helpers, local index concepts, and context-package source spans to
produce grounded units and snippets. Engine retrieval contracts, result
envelopes, policy filtering, pagination, feedback, and cross-format search
remain engine-owned.

## Implementation Status

As of 2026-05-06, the first retrieval slice is recorded in
`docs/retrieval-implementation.md`. It establishes asset query request/result
contracts, stable sorting and pagination, result envelopes with source
references, representations, metadata records, refreshable lexical search,
relevance metadata, zero-result smoke metadata, and structured validation
diagnostics. It also supports combined metadata, lifecycle, source-context,
tag, collection, timestamp, and representation filters across in-memory and
SQLite-backed repositories. The contextual graph slice adds direct contextual
entity and relationship query envelopes plus asset filters by contextual
entity, workflow run, related asset, and relationship predicate.
Permission-aware retrieval now uses the engine policy gateway for query-scope
and per-resource checks, with fail-closed denied envelopes and retrieval audit
events. Lexical queries can also return source-grounded snippet packets with
representation/source references and adapter provenance. Feedback and KPI
hooks persist retrieval feedback and derive zero-result, precision, citation
precision, safety, confidence, and permission-filter timing signals. Remaining
work is focused on multi-hop graph traversal and ranking.

## R7.1 - Implement query contracts pagination sorting and result envelopes

```task
id: KONT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "5a1b0661-ce22-4ee6-a9e7-0aedce9d4356"
```

Define query requests, result envelopes, deterministic pagination, sorting,
diagnostics, and correlation IDs.

Acceptance:

- Repeated equivalent queries return stable ordering within documented limits.
- Results include asset IDs, representation references, metadata, source
  references, and diagnostics.
- Invalid queries return structured validation errors.

Implemented:

- `AssetQueryRequest`, `AssetQueryItem`, `AssetQueryResult`, and
  `AssetRetrievalService` provide the stable asset query contract.
- Queries return deterministic ordering with pagination metadata and
  correlation IDs.
- Result entries expose asset identity, classification, source references,
  representations, and metadata records.
- Invalid lifecycle, representation kind, sort key, sort order, limit, and
  offset return structured diagnostics without raising raw exceptions.
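
As a rough illustration of the behavior listed above, the sketch below shows deterministic pagination with an ID tie-breaker and validation reported as diagnostics rather than exceptions. Every name here (`SketchQuery`, `SketchResult`, `run_query`, the diagnostic codes, and the limits) is hypothetical and is not the actual `AssetQueryRequest`/`AssetQueryResult` shape.

```python
from dataclasses import dataclass, field
from uuid import uuid4

# Hypothetical limits and keys, chosen only for this sketch.
ALLOWED_SORT_KEYS = {"updated", "title"}
MAX_LIMIT = 100


@dataclass(frozen=True)
class SketchQuery:
    sort_key: str = "updated"
    descending: bool = True
    limit: int = 20
    offset: int = 0


@dataclass
class SketchResult:
    items: list = field(default_factory=list)
    total: int = 0
    diagnostics: list = field(default_factory=list)
    correlation_id: str = field(default_factory=lambda: str(uuid4()))


def run_query(assets: list[dict], query: SketchQuery) -> SketchResult:
    result = SketchResult()
    # Validate up front and report problems as data instead of raising.
    if query.sort_key not in ALLOWED_SORT_KEYS:
        result.diagnostics.append({"code": "invalid_sort_key", "value": query.sort_key})
    if not 1 <= query.limit <= MAX_LIMIT:
        result.diagnostics.append({"code": "invalid_limit", "value": query.limit})
    if query.offset < 0:
        result.diagnostics.append({"code": "invalid_offset", "value": query.offset})
    if result.diagnostics:
        return result
    # Sort by asset ID first, then by the requested key. Python's sort is
    # stable (also with reverse=True), so ties keep ascending-ID order and
    # repeated equivalent queries page identically.
    ordered = sorted(assets, key=lambda a: a["id"])
    ordered.sort(key=lambda a: a[query.sort_key], reverse=query.descending)
    result.total = len(ordered)
    result.items = ordered[query.offset : query.offset + query.limit]
    return result
```

The ID tie-breaker is the design choice that matters: without it, assets sharing a sort value could swap places between pages when the input order changes.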

## R7.2 - Implement lexical search over normalized content

```task
id: KONT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "5ec90dcb-473c-4d01-85f2-8db18de0b7d1"
```

Implement MVP lexical search over normalized representations without making
semantic/vector search a blocker.

Acceptance:

- Text search returns matching assets with relevance metadata.
- Search indexes can be refreshed after ingestion or update.
- p95 latency and zero-result rate can be measured in smoke tests.

Implemented:

- Normalized ingestion now stores representation search text and length
  metadata for retrieval indexing.
- `AssetRetrievalService.refresh_index()` builds a refreshable lexical index
  with indexed asset and representation counts.
- Text queries perform lexical substring matching over normalized
  representations and return relevance metadata including strategy, query,
  match count, and matching representation IDs.
- Query result metadata includes zero-result and lexical index statistics for
  later smoke/performance measurement.
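
A refreshable substring index of the kind described above might look roughly like this. It is a sketch under assumptions: `SketchLexicalIndex`, the input dictionary layout, and the `"lexical_substring"` strategy label are hypothetical and do not reflect the internals of `AssetRetrievalService.refresh_index()`.

```python
from dataclasses import dataclass, field


@dataclass
class SketchLexicalIndex:
    # Maps asset ID -> {representation ID -> lowercased search text}.
    texts: dict[str, dict[str, str]] = field(default_factory=dict)

    def refresh(self, assets: list[dict]) -> dict:
        """Rebuild the index from scratch and report what was indexed."""
        self.texts = {
            asset["id"]: {rep["id"]: rep["text"].lower() for rep in asset["representations"]}
            for asset in assets
        }
        return {
            "indexed_assets": len(self.texts),
            "indexed_representations": sum(len(reps) for reps in self.texts.values()),
        }

    def search(self, query: str) -> dict:
        needle = query.lower()
        if not needle:
            # An empty query matches nothing in this sketch.
            return {"items": [], "zero_result": True}
        hits = []
        for asset_id, reps in self.texts.items():
            counts = {rep_id: text.count(needle) for rep_id, text in reps.items()}
            matched = sorted(rep_id for rep_id, n in counts.items() if n > 0)
            if matched:
                hits.append({
                    "asset_id": asset_id,
                    "relevance": {
                        "strategy": "lexical_substring",
                        "query": query,
                        "match_count": sum(counts.values()),
                        "matching_representation_ids": matched,
                    },
                })
        # Highest match count first; asset ID breaks ties deterministically.
        hits.sort(key=lambda h: (-h["relevance"]["match_count"], h["asset_id"]))
        return {"items": hits, "zero_result": not hits}
```

Returning a `zero_result` flag alongside the items is what lets a smoke test compute a zero-result rate without re-running queries.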

## R7.3 - Implement metadata lifecycle and source-context filters

```task
id: KONT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "9e7d0a5c-71d4-44ca-9b71-70f2206e4a02"
```

Support filters by asset type, collection, source, owner, tags,
classification, sensitivity, lifecycle state, timestamps, and custom metadata.

Acceptance:

- Text search and metadata filters can be combined.
- Lifecycle and sensitivity filters participate in permission checks.
- Filter behavior is covered across in-memory and durable backends where
  supported.

Implemented:

- Asset queries support filters for asset type, lifecycle, sensitivity, owner,
  topic, review state, source system/path, representation kind, collection,
  tags, created/updated timestamp bounds, and custom metadata records.
- Text search can be combined with standard, source, tag, collection,
  sensitivity, and metadata filters.
- Combined filter behavior is covered over in-memory and SQLite-backed asset
  repositories.
- Permission enforcement is intentionally deferred to R7.5; current lifecycle
  and sensitivity filters establish the policy inputs without claiming
  authorization semantics.
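
Combining text search with other filters amounts to a conjunction: an asset is returned only when every supplied filter accepts it. The sketch below shows that idea with a handful of filters; `matches`, `filter_assets`, and the asset dictionary layout are hypothetical, and the real query contract supports more filter kinds than shown.

```python
from datetime import datetime
from typing import Optional


def matches(
    asset: dict,
    *,
    text: Optional[str] = None,
    lifecycle: Optional[str] = None,
    tags: tuple = (),
    updated_after: Optional[datetime] = None,
    metadata: Optional[dict] = None,
) -> bool:
    """Return True only when every supplied filter accepts the asset."""
    checks = [
        text is None or text.lower() in asset["text"].lower(),
        lifecycle is None or asset["lifecycle"] == lifecycle,
        # All requested tags must be present on the asset.
        set(tags) <= set(asset["tags"]),
        updated_after is None or asset["updated"] >= updated_after,
        all(asset["metadata"].get(k) == v for k, v in (metadata or {}).items()),
    ]
    return all(checks)


def filter_assets(assets: list[dict], **filters) -> list[dict]:
    # Omitted filters are treated as "no constraint", so callers can mix
    # text search with any subset of the other filters.
    return [asset for asset in assets if matches(asset, **filters)]
```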

## R7.4 - Implement contextual entity model and relationship retrieval

```task
id: KONT-WP-0007-T004
status: done
priority: high
state_hub_task_id: "b3358059-ac58-4e37-985c-6e8c1cc6df30"
```

Represent contextual entities such as people, teams, projects, cases, topics,
source systems, processes, products, and generated artifacts.

Acceptance:

- Assets can be linked to contextual entities.
- Relationship direction, type, validity, confidence, actor, and provenance are
  represented where available.
- Callers can retrieve assets by project, case, topic, source, workflow run, or
  related asset.

Implemented:

- Existing `ContextEntity`/`CoreRelationship` primitives are reused as the
  canonical model; entity types now include workflow runs and generated
  artifacts for operational graph use cases.
- `ContextEntityQueryRequest`/`ContextEntityQueryResult` provide stable
  contextual entity lookup by type, name, external reference, and metadata.
- `RelationshipQueryRequest`/`RelationshipQueryResult` provide stable
  relationship retrieval by source, target, asset, contextual entity,
  workflow run, predicate, target kind, and direction.
- Asset queries can filter by contextual entity, workflow run, related asset,
  and relationship predicate while returning relationship and contextual
  entity context for matched assets.
- Graph retrieval behavior is covered across in-memory and SQLite-backed
  repositories.
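
Direct (single-hop) relationship retrieval by predicate and direction can be illustrated as below. `SketchRelationship` and `related_nodes` are hypothetical stand-ins, not the `CoreRelationship` or `RelationshipQueryRequest` definitions, and the node ID format is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SketchRelationship:
    source: str
    predicate: str
    target: str
    confidence: float = 1.0


def related_nodes(
    relationships: Iterable[SketchRelationship],
    node: str,
    *,
    predicate: Optional[str] = None,
    direction: str = "both",
) -> list[str]:
    """Return IDs directly connected to `node` (one hop only)."""
    if direction not in {"outgoing", "incoming", "both"}:
        raise ValueError(f"unknown direction: {direction!r}")
    found = set()
    for rel in relationships:
        if predicate is not None and rel.predicate != predicate:
            continue
        # Outgoing edges start at the node; incoming edges end at it.
        if direction in ("outgoing", "both") and rel.source == node:
            found.add(rel.target)
        if direction in ("incoming", "both") and rel.target == node:
            found.add(rel.source)
    return sorted(found)
```

For example, retrieving the assets produced by a workflow run would be an outgoing lookup from the run's node with a single predicate.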

## R7.5 - Enforce permission-aware retrieval and fail-closed semantics

```task
id: KONT-WP-0007-T005
status: done
priority: high
state_hub_task_id: "c6c93713-3ab1-41fb-bf35-15dd860b66fa"
```

Apply authorization and policy checks before returning content, metadata,
snippets, relationships, derived artifacts, or context packages.

Acceptance:

- Unauthorized assets do not leak through result lists, snippets, relationship
  traversal, or derived answer packages.
- Missing or stale permission context fails closed according to policy.
- Retrieval audit events capture actor, query scope, outcome, and policy
  context.

Implemented:

- Retrieval services accept the engine `PolicyGateway`, defaulting to the
  allow-all local adapter used elsewhere in the system.
- Asset, contextual entity, and relationship queries authorize the query scope
  before loading result envelopes.
- Assets, contextual entities, and relationships are policy-filtered before
  they are returned; relationships additionally require source and target
  resource visibility so traversal cannot reveal denied assets or entities.
- Policy gateway failures produce empty denied envelopes with structured
  diagnostics and fail-closed policy decisions.
- Retrieval audit events capture actor, correlation ID, query scope, policy
  decision, outcome, result counts, and internal permission-filter counts.
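
The fail-closed behavior and the source-and-target visibility rule can be sketched as follows. This is illustrative only: `SketchEnvelope`, `authorize_results`, the `is_allowed` callable, and the diagnostic code are hypothetical and do not represent the `PolicyGateway` interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchEnvelope:
    assets: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    decision: str = "allow"
    diagnostics: list = field(default_factory=list)
    filtered_count: int = 0


def authorize_results(
    actor: str,
    asset_ids: list[str],
    relationships: list[tuple[str, str, str]],
    is_allowed: Callable[[str, str], bool],
) -> SketchEnvelope:
    try:
        visible = [asset_id for asset_id in asset_ids if is_allowed(actor, asset_id)]
    except Exception as exc:
        # Fail closed: any gateway error yields an empty, denied envelope.
        return SketchEnvelope(
            decision="deny",
            diagnostics=[{"code": "policy_gateway_error", "detail": str(exc)}],
        )
    visible_set = set(visible)
    # Keep an edge only when both endpoints are visible, so traversal
    # cannot reveal the existence of a denied asset.
    edges = [r for r in relationships if r[0] in visible_set and r[2] in visible_set]
    return SketchEnvelope(
        assets=visible,
        relationships=edges,
        filtered_count=len(asset_ids) - len(visible),
    )
```

The `filtered_count` field mirrors the idea of an internal permission-filter count that can feed an audit event without exposing which assets were denied.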

## R7.6 - Return source-grounded snippets citations and explanation data

```task
id: KONT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "1a6d5a95-d87a-447a-a186-cb73162cd9a1"
```

Return matched regions, snippets, source references, representation IDs,
relationship context, and citation-ready data for grounded AI workflows.

Acceptance:

- Results explain why they were returned and where they originated.
- Snippets are permission filtered.
- Retrieval packages are suitable for later grounded answer generation.
- Markdown snippets can reference Markitect selector matches or context-package
  spans as adapter provenance.

Implemented:

- `RetrievalSnippet` packets expose asset, representation, source reference,
  storage reference, media type, match offsets, match text, snippet text, and
  adapter provenance.
- Lexical asset queries can request snippets through `include_snippets`,
  `max_snippets`, and `snippet_radius`.
- Snippets are generated from normalized representation search text and are
  attached only to policy-authorized asset results.
- Markitect selectors, source spans, context spans, adapter provenance,
  snapshots, and extractor identity are preserved when supplied as
  representation metadata.
- Snippet behavior is covered with permission filtering so denied matching
  content does not leak through snippet packets.
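
One plausible reading of `max_snippets` and `snippet_radius` is a cap on the number of matches and a character window around each match. The sketch below implements that reading; `extract_snippets` and the returned dictionary keys are hypothetical, and the actual `RetrievalSnippet` packets carry more fields (source references, provenance) than shown here.

```python
def extract_snippets(
    text: str,
    query: str,
    *,
    max_snippets: int = 3,
    snippet_radius: int = 20,
) -> list[dict]:
    """Return up to `max_snippets` case-insensitive matches with context."""
    snippets: list[dict] = []
    if not query:
        return snippets
    # Assumption: lowercasing does not change string length, which holds
    # for ASCII but not for every Unicode character.
    lowered, needle = text.lower(), query.lower()
    start = lowered.find(needle)
    while start != -1 and len(snippets) < max_snippets:
        end = start + len(needle)
        # Clamp the context window to the bounds of the text.
        window_start = max(0, start - snippet_radius)
        window_end = min(len(text), end + snippet_radius)
        snippets.append({
            "match_start": start,
            "match_end": end,
            "match_text": text[start:end],
            "snippet_text": text[window_start:window_end],
        })
        # Continue after this match so snippets never overlap on the match.
        start = lowered.find(needle, end)
    return snippets
```

Keeping the offsets next to the snippet text is what makes the packet citation-ready: a consumer can point back to the exact span in the normalized representation.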

## R7.7 - Capture retrieval feedback and KPI measurement hooks

```task
id: KONT-WP-0007-T007
status: done
priority: medium
state_hub_task_id: "e17e2839-400f-4348-98e3-f77acc0b2fde"
```

Capture relevance feedback and quality signals for retrieval improvement.

Acceptance:

- Feedback can mark results useful, irrelevant, missing, unsafe, or low
  confidence.
- Query context and result metadata are stored with feedback.
- Precision@k, zero-result rate, permission-filter latency, and citation
  precision have measurement hooks.

Implemented:

- `RetrievalFeedbackRecord` persists feedback labels for useful, irrelevant,
  missing, unsafe, and low-confidence outcomes with actor, correlation ID,
  query context, result references, notes, and metadata.
- Asset registry repository ports and memory/SQLite adapters persist and list
  retrieval feedback.
- `AssetRetrievalService.record_feedback()` records authorized feedback with
  structured diagnostics for invalid labels or denied feedback operations.
- `AssetRetrievalService.quality_metrics()` derives zero-result rate,
  precision@k, citation precision, feedback totals, unsafe/low-confidence
  counts, and permission-filter timing observations from query results,
  feedback records, and retrieval audit events.
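
Two of the signals named above have simple standard definitions, sketched here for reference. The function names and input shapes are hypothetical and say nothing about how `AssetRetrievalService.quality_metrics()` computes them; precision@k is taken in its conventional form, dividing by `k` even when fewer than `k` results were returned.

```python
def zero_result_rate(result_counts: list[int]) -> float:
    """Fraction of queries that returned no results."""
    if not result_counts:
        return 0.0
    return sum(1 for count in result_counts if count == 0) / len(result_counts)


def precision_at_k(ranked_ids: list[str], useful_ids: set[str], k: int) -> float:
    """Share of the top-k ranked results that feedback marked useful."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    return sum(1 for asset_id in top_k if asset_id in useful_ids) / k
```

For instance, if feedback marks `a` and `c` useful in the ranking `a, b, c, d`, precision@2 is 0.5 because only one of the top two results was useful.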

## Definition Of Done

- Retrieval tests cover text, metadata, lifecycle, relationship, contextual
  entity, pagination, permission, snippet, and feedback behavior.
- Retrieval does not bypass policy or source provenance.
- Search, relationship, and context retrieval contracts follow
  `docs/architecture-blueprint.md`.
- `python3 -m pytest` passes.