kontextual-engine/workplans/KONT-WP-0007-governed-retrieval-context-graph.md

---
id: KONT-WP-0007
type: workplan
title: "Governed Retrieval And Context Graph"
domain: markitect
repo: kontextual-engine
status: done
owner: codex
topic_slug: markitect
planning_priority: high
planning_order: 7
created: "2026-05-05"
updated: "2026-05-06"
state_hub_workstream_id: "64352515-9677-46bb-909a-9e2db4915dc7"
---

# KONT-WP-0007: Governed Retrieval And Context Graph

## Purpose

Build retrieval as a governed operational capability: stable query contracts,
text search, metadata and lifecycle filtering, contextual entities,
relationship traversal, source-grounded snippets, permission checks, and
quality feedback.

## Requirement Coverage

Primary: FR-040 to FR-050 and FR-060 to FR-071.

Supporting: FR-120 to FR-126, FR-143 to FR-146, FR-163, FR-200 to FR-204.

## Architecture Constraint

Implement retrieval through retrieval services, search ports, repository ports,
and policy checks described in `docs/architecture-blueprint.md`. Search indexes
and ranking backends are adapters; they must not define the stable query or
result contracts.
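
As a rough illustration of this constraint, the sketch below keeps the hit shape and ordering in the engine and treats the search backend as a swappable adapter. `SketchHit`, `SearchPort`, `SubstringSearchAdapter`, and `SketchRetrievalService` are hypothetical names for this example only; they are not taken from `docs/architecture-blueprint.md`.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class SketchHit:
    """Engine-owned hit shape; adapters produce this, not their own type."""
    asset_id: str
    score: float


class SearchPort(Protocol):
    """Hypothetical port the retrieval service depends on."""

    def search(self, text: str) -> Sequence[SketchHit]: ...


class SubstringSearchAdapter:
    """One possible backend; another ranking engine could replace it."""

    def __init__(self, documents: dict):
        self._documents = dict(documents)

    def search(self, text: str) -> Sequence[SketchHit]:
        needle = text.casefold()
        if not needle:
            return []
        return [
            SketchHit(asset_id, float(body.casefold().count(needle)))
            for asset_id, body in self._documents.items()
            if needle in body.casefold()
        ]


class SketchRetrievalService:
    def __init__(self, search_port: SearchPort):
        self._search = search_port

    def query(self, text: str) -> dict:
        # Ordering is decided in the engine, so swapping adapters cannot
        # change the shape or ordering rules of the result contract.
        hits = sorted(self._search.search(text), key=lambda h: (-h.score, h.asset_id))
        return {"items": [h.asset_id for h in hits], "total": len(hits)}
```

The service only sees the port, so an adapter that returns hits in a different internal order still yields the same envelope.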

## markitect-tool Boundary Remark

For Markdown-backed assets, retrieval adapters may use Markitect selectors,
extraction helpers, local index concepts, and context-package source spans to
produce grounded units and snippets. Engine retrieval contracts, result
envelopes, policy filtering, pagination, feedback, and cross-format search
remain engine-owned.

## Implementation Status

As of 2026-05-06, the first retrieval slice is recorded in
`docs/retrieval-implementation.md`. It establishes asset query request/result
contracts, stable sorting and pagination, result envelopes with source
references, representations, metadata records, refreshable lexical search,
relevance metadata, zero-result smoke metadata, and structured validation
diagnostics. It also supports combined metadata, lifecycle, source-context,
tag, collection, timestamp, and representation filters across in-memory and
SQLite-backed repositories.

The contextual graph slice adds direct contextual entity and relationship
query envelopes plus asset filters by contextual entity, workflow run, related
asset, and relationship predicate.

Later slices cover most of the work previously listed as remaining.
Permission-aware retrieval uses the engine policy gateway for query-scope and
per-resource checks, with fail-closed denied envelopes and retrieval audit
events. Lexical queries can return source-grounded snippet packets with
representation/source references and adapter provenance. Feedback and KPI
hooks persist retrieval feedback and derive zero-result, precision, citation
precision, safety, confidence, and permission-filter timing signals.

Of the items previously listed as remaining, only multi-hop graph
traversal/ranking is not covered by a slice described here.

## R7.1 - Implement query contracts, pagination, sorting, and result envelopes

```task
id: KONT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "5a1b0661-ce22-4ee6-a9e7-0aedce9d4356"
```

Define query requests, result envelopes, deterministic pagination, sorting,
diagnostics, and correlation IDs.

Acceptance:

- Repeated equivalent queries return stable ordering within documented limits.
- Results include asset IDs, representation references, metadata, source
  references, and diagnostics.
- Invalid queries return structured validation errors.

Implemented:

- `AssetQueryRequest`, `AssetQueryItem`, `AssetQueryResult`, and
  `AssetRetrievalService` provide the stable asset query contract.
- Queries return deterministic ordering with pagination metadata and
  correlation IDs.
- Result entries expose asset identity, classification, source references,
  representations, and metadata records.
- Invalid lifecycle, representation kind, sort key, sort order, limit, and
  offset values return structured diagnostics instead of raising raw
  exceptions.
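
A minimal sketch of the behavior listed above: validation that yields diagnostics rather than exceptions, a tie-breaking sort for deterministic ordering, and offset pagination with a correlation ID. The `Sketch*` classes, field names, sort keys, limit cap, and diagnostic codes are assumptions for illustration and do not describe the actual `AssetQueryRequest` or `AssetQueryResult` fields.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4

ALLOWED_SORT_KEYS = {"asset_id", "title"}  # hypothetical sort keys
MAX_LIMIT = 100  # hypothetical page-size cap


@dataclass(frozen=True)
class SketchQueryRequest:
    limit: int = 10
    offset: int = 0
    sort_key: str = "asset_id"
    sort_order: str = "asc"
    correlation_id: str = field(default_factory=lambda: str(uuid4()))


@dataclass(frozen=True)
class SketchQueryResult:
    items: tuple
    total: int
    next_offset: Optional[int]
    correlation_id: str
    diagnostics: tuple = ()


def validate(request):
    """Collect every problem as a structured diagnostic; never raise."""
    diagnostics = []
    if not 1 <= request.limit <= MAX_LIMIT:
        diagnostics.append({"code": "invalid_limit", "field": "limit"})
    if request.offset < 0:
        diagnostics.append({"code": "invalid_offset", "field": "offset"})
    if request.sort_key not in ALLOWED_SORT_KEYS:
        diagnostics.append({"code": "invalid_sort_key", "field": "sort_key"})
    if request.sort_order not in {"asc", "desc"}:
        diagnostics.append({"code": "invalid_sort_order", "field": "sort_order"})
    return diagnostics


def run_query(assets, request):
    diagnostics = validate(request)
    if diagnostics:
        return SketchQueryResult((), 0, None, request.correlation_id, tuple(diagnostics))
    # Tie-break on asset_id so equal sort keys still order deterministically.
    ordered = sorted(
        assets,
        key=lambda a: (a[request.sort_key], a["asset_id"]),
        reverse=request.sort_order == "desc",
    )
    page = ordered[request.offset:request.offset + request.limit]
    end = request.offset + len(page)
    next_offset = end if end < len(ordered) else None
    return SketchQueryResult(tuple(page), len(ordered), next_offset, request.correlation_id)
```

Because the sort key always ends in the asset ID, repeating the same request over unchanged data returns the same page.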

## R7.2 - Implement lexical search over normalized content

```task
id: KONT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "5ec90dcb-473c-4d01-85f2-8db18de0b7d1"
```

Implement MVP lexical search over normalized representations without making
semantic/vector search a blocker.

Acceptance:

- Text search returns matching assets with relevance metadata.
- Search indexes can be refreshed after ingestion or update.
- p95 latency and zero-result rate can be measured in smoke tests.

Implemented:

- Normalized ingestion now stores representation search text and length
  metadata for retrieval indexing.
- `AssetRetrievalService.refresh_index()` builds a refreshable lexical index
  with indexed asset and representation counts.
- Text queries perform lexical substring matching over normalized
  representations and return relevance metadata including strategy, query,
  match count, and matching representation IDs.
- Query result metadata includes zero-result and lexical index statistics for
  later smoke/performance measurement.
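
The refresh-then-search flow described above could look roughly like the following. `SketchLexicalIndex`, its method names, and the metadata keys are hypothetical; the real `AssetRetrievalService.refresh_index()` signature and return shape are not shown in this workplan.

```python
from dataclasses import dataclass, field


@dataclass
class SketchLexicalIndex:
    # asset_id -> {representation_id -> case-folded search text}
    entries: dict = field(default_factory=dict)

    def refresh(self, representations):
        """Rebuild from (asset_id, representation_id, search_text) triples."""
        self.entries = {}
        for asset_id, rep_id, text in representations:
            self.entries.setdefault(asset_id, {})[rep_id] = text.casefold()
        return {
            "indexed_assets": len(self.entries),
            "indexed_representations": sum(len(r) for r in self.entries.values()),
        }

    def search(self, query):
        needle = query.casefold()
        hits = []
        if needle:  # an empty query matches nothing in this sketch
            for asset_id, reps in self.entries.items():
                counts = {rep_id: text.count(needle) for rep_id, text in reps.items()}
                matched = sorted(rep_id for rep_id, n in counts.items() if n)
                if matched:
                    hits.append({
                        "asset_id": asset_id,
                        "relevance": {
                            "strategy": "lexical_substring",
                            "query": query,
                            "match_count": sum(counts.values()),
                            "representation_ids": matched,
                        },
                    })
        # Most matches first; asset_id breaks ties for stable ordering.
        hits.sort(key=lambda h: (-h["relevance"]["match_count"], h["asset_id"]))
        return {"items": hits, "metadata": {"zero_result": not hits}}
```

A smoke test can then count how often `zero_result` is true across a query set, without the index needing to know about the measurement.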

## R7.3 - Implement metadata, lifecycle, and source-context filters

```task
id: KONT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "9e7d0a5c-71d4-44ca-9b71-70f2206e4a02"
```

Support filters by asset type, collection, source, owner, tags,
classification, sensitivity, lifecycle state, timestamps, and custom metadata.

Acceptance:

- Text search and metadata filters can be combined.
- Lifecycle and sensitivity filters participate in permission checks.
- Filter behavior is covered across in-memory and durable backends where
  supported.

Implemented:

- Asset queries support filters for asset type, lifecycle, sensitivity, owner,
  topic, review state, source system/path, representation kind, collection,
  tags, created/updated timestamp bounds, and custom metadata records.
- Text search can be combined with standard, source, tag, collection,
  sensitivity, and metadata filters.
- Combined filter behavior is covered over in-memory and SQLite-backed asset
  repositories.
- Permission enforcement is intentionally out of scope for this task and is
  handled in R7.5; the lifecycle and sensitivity filters here only establish
  policy inputs and carry no authorization semantics.
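
One way to check that combined filters behave the same over in-memory and SQLite-backed storage is to run the same filter through both and compare. The table layout, column names, and helper functions below are assumptions made for this sketch, not the engine's repository schema.

```python
import sqlite3

ALLOWED_COLUMNS = {"lifecycle", "owner"}  # hypothetical filterable columns

ASSETS = [
    {"asset_id": "a1", "lifecycle": "active", "owner": "ops", "search_text": "Retention policy"},
    {"asset_id": "a2", "lifecycle": "archived", "owner": "ops", "search_text": "Old policy"},
    {"asset_id": "a3", "lifecycle": "active", "owner": "legal", "search_text": "Case notes"},
]


def check_columns(equals):
    unknown = set(equals) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unsupported filter columns: {sorted(unknown)}")


def filter_in_memory(assets, text=None, **equals):
    check_columns(equals)
    matched = []
    for asset in assets:
        if text is not None and text.lower() not in asset["search_text"].lower():
            continue
        if all(asset[column] == value for column, value in equals.items()):
            matched.append(asset["asset_id"])
    return sorted(matched)


def load_sqlite(assets):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets (asset_id TEXT PRIMARY KEY, lifecycle TEXT, "
        "owner TEXT, search_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO assets VALUES (:asset_id, :lifecycle, :owner, :search_text)",
        assets,
    )
    return conn


def filter_sqlite(conn, text=None, **equals):
    check_columns(equals)  # column names are allow-listed before interpolation
    clauses, params = [], []
    if text is not None:
        # SQLite lower() folds ASCII only, so this sketch assumes ASCII text.
        clauses.append("instr(lower(search_text), lower(?)) > 0")
        params.append(text)
    for column, value in equals.items():
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) or "1"
    rows = conn.execute(
        f"SELECT asset_id FROM assets WHERE {where} ORDER BY asset_id", params
    )
    return [row[0] for row in rows]
```

Values always travel as bound parameters; only allow-listed column names are placed into the SQL text.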

## R7.4 - Implement contextual entity model and relationship retrieval

```task
id: KONT-WP-0007-T004
status: done
priority: high
state_hub_task_id: "b3358059-ac58-4e37-985c-6e8c1cc6df30"
```

Represent contextual entities such as people, teams, projects, cases, topics,
source systems, processes, products, and generated artifacts.

Acceptance:

- Assets can be linked to contextual entities.
- Relationship direction, type, validity, confidence, actor, and provenance are
  represented where available.
- Callers can retrieve assets by project, case, topic, source, workflow run, or
  related asset.

Implemented:

- Existing `ContextEntity`/`CoreRelationship` primitives are reused as the
  canonical model; entity types now include workflow runs and generated
  artifacts for operational graph use cases.
- `ContextEntityQueryRequest`/`ContextEntityQueryResult` provide stable
  contextual entity lookup by type, name, external reference, and metadata.
- `RelationshipQueryRequest`/`RelationshipQueryResult` provide stable
  relationship retrieval by source, target, asset, contextual entity,
  workflow run, predicate, target kind, and direction.
- Asset queries can filter by contextual entity, workflow run, related asset,
  and relationship predicate while returning relationship and contextual
  entity context for matched assets.
- Graph retrieval behavior is covered across in-memory and SQLite-backed
  repositories.
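
To make direction and predicate filtering concrete, here is a small single-hop sketch. `SketchRelationship`, the `asset:`/`project:` ID prefixes, and both helper functions are illustrative assumptions; they do not reflect the fields of `CoreRelationship` or `RelationshipQueryRequest`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRelationship:
    source: str
    predicate: str
    target: str
    confidence: float = 1.0


def relationships_for(relationships, node, direction="both", predicate=None):
    """Return single-hop relationships touching `node`, in a stable order."""
    if direction not in {"outgoing", "incoming", "both"}:
        raise ValueError(f"unknown direction: {direction}")
    selected = []
    for rel in relationships:
        touches = {
            "outgoing": rel.source == node,
            "incoming": rel.target == node,
            "both": node in (rel.source, rel.target),
        }[direction]
        if touches and (predicate is None or rel.predicate == predicate):
            selected.append(rel)
    return sorted(selected, key=lambda r: (r.source, r.predicate, r.target))


def assets_linked_to(relationships, entity, predicate=None):
    """Asset IDs that point at a contextual entity (hypothetical ID prefix)."""
    incoming = relationships_for(relationships, entity, "incoming", predicate)
    return sorted({r.source for r in incoming if r.source.startswith("asset:")})
```

This covers direct lookups only; multi-hop traversal would need cycle handling and depth limits that are outside this sketch.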

## R7.5 - Enforce permission-aware retrieval and fail-closed semantics

```task
id: KONT-WP-0007-T005
status: done
priority: high
state_hub_task_id: "c6c93713-3ab1-41fb-bf35-15dd860b66fa"
```

Apply authorization and policy checks before returning content, metadata,
snippets, relationships, derived artifacts, or context packages.

Acceptance:

- Unauthorized assets do not leak through result lists, snippets, relationship
  traversal, or derived answer packages.
- Missing or stale permission context fails closed according to policy.
- Retrieval audit events capture actor, query scope, outcome, and policy
  context.

Implemented:

- Retrieval services accept the engine `PolicyGateway`, defaulting to the
  allow-all local adapter used elsewhere in the system.
- Asset, contextual entity, and relationship queries authorize the query scope
  before loading result envelopes.
- Assets, contextual entities, and relationships are policy-filtered before
  they are returned; relationships additionally require source and target
  resource visibility so traversal cannot reveal denied assets or entities.
- Policy gateway failures produce empty denied envelopes with structured
  diagnostics and fail-closed policy decisions.
- Retrieval audit events capture actor, correlation ID, query scope, policy
  decision, outcome, result counts, and internal permission-filter counts.
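
The fail-closed behavior above can be sketched as follows: only an explicit allow passes, any gateway error produces an empty denied envelope, and relationships need both endpoints visible. The gateway here is a plain callable `(actor, resource) -> bool`, which is an assumption for this example and not the `PolicyGateway` interface; the scope name and diagnostic code are also hypothetical.

```python
QUERY_SCOPE = "scope:asset_query"  # hypothetical scope identifier


def is_allowed(gateway, actor, resource):
    """True only on an explicit allow; errors and non-True values deny."""
    try:
        return gateway(actor, resource) is True
    except Exception:
        return False


def visible_relationships(gateway, actor, relationships):
    # Both endpoints must be visible, so traversal cannot reveal a denied node.
    return [
        rel for rel in relationships
        if is_allowed(gateway, actor, rel["source"])
        and is_allowed(gateway, actor, rel["target"])
    ]


def retrieve(gateway, actor, asset_ids, audit_log):
    try:
        if gateway(actor, QUERY_SCOPE) is not True:
            raise PermissionError("query scope denied")
        items = [a for a in asset_ids if gateway(actor, a) is True]
        decision, diagnostics = "allow", []
    except Exception as exc:  # any gateway failure denies the whole query
        items, decision = [], "deny"
        diagnostics = [{"code": "policy_fail_closed", "detail": type(exc).__name__}]
    audit_log.append({
        "actor": actor,
        "scope": QUERY_SCOPE,
        "decision": decision,
        "returned": len(items),
        "filtered": len(asset_ids) - len(items) if decision == "allow" else None,
    })
    return {"items": items, "decision": decision, "diagnostics": diagnostics}
```

An audit event is appended on every path, including the denied one, so failures remain observable.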

## R7.6 - Return source-grounded snippets, citations, and explanation data

```task
id: KONT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "1a6d5a95-d87a-447a-a186-cb73162cd9a1"
```

Return matched regions, snippets, source references, representation IDs,
relationship context, and citation-ready data for grounded AI workflows.

Acceptance:

- Results explain why they were returned and where they originated.
- Snippets are permission filtered.
- Retrieval packages are suitable for later grounded answer generation.
- Markdown snippets can reference Markitect selector matches or context-package
  spans as adapter provenance.

Implemented:

- `RetrievalSnippet` packets expose asset, representation, source reference,
  storage reference, media type, match offsets, match text, snippet text, and
  adapter provenance.
- Lexical asset queries can request snippets through `include_snippets`,
  `max_snippets`, and `snippet_radius`.
- Snippets are generated from normalized representation search text and are
  attached only to policy-authorized asset results.
- Markitect selectors, source spans, context spans, adapter provenance,
  snapshots, and extractor identity are preserved when supplied as
  representation metadata.
- Snippet behavior is covered with permission filtering so denied matching
  content does not leak through snippet packets.
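
A rough sketch of snippet extraction with a match radius and a snippet cap, attached only to authorized assets. The parameter names `max_snippets` and `snippet_radius` come from the bullets above; the function names, packet keys, and input shapes are assumptions and do not describe the actual `RetrievalSnippet` fields.

```python
def extract_snippets(text, query, max_snippets=3, snippet_radius=20):
    """Find case-insensitive matches and cut a window around each one."""
    snippets = []
    if not query:
        return snippets
    # Assumes lower() preserves string length (true for ASCII), so offsets
    # found in the lowered text are valid offsets into the original text.
    haystack, needle = text.lower(), query.lower()
    start = haystack.find(needle)
    while start != -1 and len(snippets) < max_snippets:
        end = start + len(needle)
        window_start = max(0, start - snippet_radius)
        window_end = min(len(text), end + snippet_radius)
        snippets.append({
            "match_start": start,
            "match_end": end,
            "match_text": text[start:end],
            "snippet_text": text[window_start:window_end],
        })
        start = haystack.find(needle, end)  # continue after this match
    return snippets


def attach_snippets(results, authorized_ids, query, **options):
    """Build snippet packets only for assets the caller may see."""
    return [
        {
            "asset_id": result["asset_id"],
            "snippets": extract_snippets(result["search_text"], query, **options),
        }
        for result in results
        if result["asset_id"] in authorized_ids
    ]
```

Denied assets are dropped before any snippet text is cut, so their matching content never enters a packet.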

## R7.7 - Capture retrieval feedback and KPI measurement hooks

```task
id: KONT-WP-0007-T007
status: done
priority: medium
state_hub_task_id: "e17e2839-400f-4348-98e3-f77acc0b2fde"
```

Capture relevance feedback and quality signals for retrieval improvement.

Acceptance:

- Feedback can mark results useful, irrelevant, missing, unsafe, or low
  confidence.
- Query context and result metadata are stored with feedback.
- Precision@k, zero-result rate, permission-filter latency, and citation
  precision have measurement hooks.

Implemented:

- `RetrievalFeedbackRecord` persists feedback labels for useful, irrelevant,
  missing, unsafe, and low-confidence outcomes with actor, correlation ID,
  query context, result references, notes, and metadata.
- Asset registry repository ports and memory/SQLite adapters persist and list
  retrieval feedback.
- `AssetRetrievalService.record_feedback()` records authorized feedback with
  structured diagnostics for invalid labels or denied feedback operations.
- `AssetRetrievalService.quality_metrics()` derives zero-result rate,
  precision@k, citation precision, feedback totals, unsafe/low-confidence
  counts, and permission-filter timing observations from query results,
  feedback records, and retrieval audit events.
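
For orientation, two of the metrics named above can be computed as shown here, alongside label validation that returns diagnostics. The label spellings, function names, and record shape are assumptions for this sketch; precision@k is taken as useful results in the top k divided by k, which may differ from the definition used by `AssetRetrievalService.quality_metrics()`.

```python
# Hypothetical label spellings for the five outcomes listed above.
FEEDBACK_LABELS = {"useful", "irrelevant", "missing", "unsafe", "low_confidence"}


def record_feedback(store, label, correlation_id, result_ids):
    """Append a feedback record, or return diagnostics for a bad label."""
    if label not in FEEDBACK_LABELS:
        return [{"code": "invalid_feedback_label", "value": label}]
    store.append({
        "label": label,
        "correlation_id": correlation_id,
        "result_ids": list(result_ids),
    })
    return []


def precision_at_k(ranked_ids, useful_ids, k):
    """Share of the top-k positions filled by results marked useful."""
    if k <= 0:
        raise ValueError("k must be positive")
    useful = set(useful_ids)
    return sum(1 for asset_id in ranked_ids[:k] if asset_id in useful) / k


def zero_result_rate(result_counts):
    """Fraction of queries that returned no results."""
    if not result_counts:
        return 0.0
    return sum(1 for count in result_counts if count == 0) / len(result_counts)
```

Dividing by k rather than by the number of returned results penalizes short result lists, which is a deliberate choice in this sketch.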

## Definition Of Done

- Retrieval tests cover text, metadata, lifecycle, relationship, contextual
  entity, pagination, permission, snippet, and feedback behavior.
- Retrieval does not bypass policy or source provenance.
- Search, relationship, and context retrieval contracts follow
  `docs/architecture-blueprint.md`.
- `python3 -m pytest` passes.