Retrieval Implementation Note

Date: 2026-05-06

Status: first implementation slice for KONT-WP-0007.

Purpose

This note records the first governed retrieval implementation over the asset registry. It introduces stable query request/result contracts before search ranking, policy filtering, snippets, relationship traversal, and feedback grow more complex.

Implemented Package Shape

src/kontextual_engine/
  services/retrieval_service.py

The retrieval service depends on the asset registry repository port and domain core contracts. It does not depend on HTTP, search backends, Markitect internals, or AI providers.
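
The dependency direction described above can be illustrated with a minimal sketch. It assumes the repository port is expressed as a typing.Protocol; the names Asset, AssetRegistryRepository, list_assets, RetrievalService.titles, and InMemoryRepository are hypothetical stand-ins, not the engine's actual contracts.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Asset:
    # Hypothetical minimal asset record; the real domain contract is richer.
    asset_id: str
    title: str


class AssetRegistryRepository(Protocol):
    # Hypothetical port: the only storage-facing surface the service sees.
    def list_assets(self) -> Sequence[Asset]: ...


class RetrievalService:
    # Depends on the port only; no HTTP, search backend, or AI provider imports.
    def __init__(self, repository: AssetRegistryRepository) -> None:
        self._repository = repository

    def titles(self) -> list[str]:
        return sorted(asset.title for asset in self._repository.list_assets())


class InMemoryRepository:
    # Any object with a matching list_assets method satisfies the port.
    def __init__(self, assets: Sequence[Asset]) -> None:
        self._assets = list(assets)

    def list_assets(self) -> Sequence[Asset]:
        return self._assets


service = RetrievalService(InMemoryRepository([Asset("a2", "Zeta"), Asset("a1", "Alpha")]))
```

Because the service only sees the port, a SQLite-backed or in-memory repository could be swapped in without touching retrieval logic.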

Implemented Capabilities

  • AssetQueryRequest for stable asset queries with pagination, sorting, and common asset filters.
  • AssetQueryResult envelope with correlation ID, total count, returned count, limit, offset, next offset, sort metadata, results, diagnostics, and success flag.
  • AssetQueryItem result entries carrying asset identity, classification, lifecycle, source references, representations, metadata records, relevance metadata, and diagnostics.
  • Deterministic sorting by title, asset ID, asset type, lifecycle, creation time, or update time, with asset ID as the tie-breaker.
  • Pagination by limit and offset.
  • Structured validation diagnostics for invalid lifecycle, representation kind, limit, offset, sort key, and sort order.
  • Standard filters for asset type, lifecycle, sensitivity, owner, topic, review state, metadata records, source system/path, and representation kind.
  • Lexical search over normalized representation search text produced during ingestion or supplied as representation metadata.
  • Refreshable in-memory lexical index with indexed asset/representation counts.
  • Relevance metadata for lexical matches, including strategy, query, match count, and representation IDs.
  • Zero-result measurement metadata for query smoke tests.
  • Additional source-context filters for collection, tags, and created/updated timestamp bounds.
  • ContextEntityQueryRequest/ContextEntityQueryResult for querying contextual entities by type, name, external reference, and metadata.
  • RelationshipQueryRequest/RelationshipQueryResult for stable relationship retrieval by source, target, asset, contextual entity, predicate, target kind, direction, and workflow run.
  • Asset query constraints by contextual entity, workflow run, related asset, and relationship predicate.
  • Asset result entries can carry relationship context and linked contextual entities when graph filters or relationship inclusion are requested.
  • Relationship payloads expose direction, predicate, validity windows, confidence, actor, provenance, creation time, and resolved source/target context where available.
  • Retrieval uses the engine policy gateway for query-scope authorization and per-resource checks before assets, relationships, or contextual entities are returned.
  • Policy failures or unavailable policy context fail closed with empty denied envelopes and structured diagnostics.
  • Retrieval audit events record actor, correlation ID, query scope, policy decision, outcome, result count, total count, and internal permission-filter counts.
  • RetrievalSnippet packets can be requested for lexical queries. Snippets are built from normalized representation search text and carry asset ID, representation ID, source reference ID, storage reference, media type, match offsets, match text, and adapter provenance such as Markitect selectors or source spans when present.
  • Snippets are attached only to policy-authorized asset results, so denied matching content is not exposed through snippet packets.
  • RetrievalFeedbackRecord persists useful, irrelevant, missing, unsafe, and low-confidence feedback with query context, result references, actor, correlation ID, notes, and metadata.
  • Retrieval quality metrics summarize zero-result rate, precision@k, citation precision, feedback counts, unsafe/low-confidence counts, and permission-filter timing observations from retrieval audit events.
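
As an illustration of the sorting, pagination, and validation bullets above, the following sketch shows one way a deterministic, paginated envelope could be assembled. It is a simplified assumption rather than the service's actual code: QueryEnvelope, run_query, the dictionary-shaped assets, the reduced SORT_KEYS set, and the diagnostic codes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical subset of the supported sort keys, for illustration only.
SORT_KEYS = {"title", "asset_id", "asset_type"}


@dataclass(frozen=True)
class QueryEnvelope:
    total_count: int
    returned_count: int
    limit: int
    offset: int
    next_offset: Optional[int]
    results: list
    diagnostics: list = field(default_factory=list)
    success: bool = True


def run_query(assets, sort_key="title", descending=False, limit=2, offset=0):
    # Validate first and report every problem as a structured diagnostic.
    diagnostics = []
    if sort_key not in SORT_KEYS:
        diagnostics.append({"code": "invalid_sort_key", "value": sort_key})
    if limit < 1:
        diagnostics.append({"code": "invalid_limit", "value": limit})
    if offset < 0:
        diagnostics.append({"code": "invalid_offset", "value": offset})
    if diagnostics:
        return QueryEnvelope(0, 0, limit, offset, None, [], diagnostics, False)

    # Sort by asset ID, then by the requested key. Python's sort is stable
    # (also with reverse=True), so ties keep ascending asset-ID order.
    ordered = sorted(assets, key=lambda a: a["asset_id"])
    ordered = sorted(ordered, key=lambda a: a[sort_key], reverse=descending)

    page = ordered[offset:offset + limit]
    end = offset + len(page)
    next_offset = end if end < len(ordered) else None
    return QueryEnvelope(len(ordered), len(page), limit, offset, next_offset, page)


assets = [
    {"asset_id": "a3", "title": "Beta", "asset_type": "doc"},
    {"asset_id": "a1", "title": "Beta", "asset_type": "doc"},
    {"asset_id": "a2", "title": "Alpha", "asset_type": "doc"},
]
first_page = run_query(assets, limit=2)
second_page = run_query(assets, limit=2, offset=first_page.next_offset)
```

The asset-ID tie-breaker is what keeps page boundaries stable between requests: two assets with the same title always appear in the same relative order.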

Not Yet Implemented

  • Multi-hop relationship traversal and graph ranking.
  • Full grounded answer package assembly.

Test Coverage

tests/test_asset_retrieval_service.py covers:

  • stable paginated query envelopes,
  • result payloads with source references, representations, and metadata records,
  • combined standard, metadata, source, lifecycle, and representation filters,
  • lexical search over normalized content with index refresh and zero-result metadata,
  • combined text, metadata, source, tag, collection, and sensitivity filters over SQLite,
  • contextual entity, workflow run, related asset, and relationship predicate filters,
  • direct contextual entity and relationship queries over SQLite,
  • permission-filtered asset, relationship, and context entity envelopes,
  • fail-closed query scope behavior when policy context is unavailable,
  • source-grounded lexical snippets with representation/source references and adapter provenance,
  • persisted retrieval feedback and KPI measurement hooks over feedback, query results, and retrieval audit timing,
  • structured diagnostics for invalid query requests.
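
The fail-closed and permission-filtering cases listed above could be exercised along the following lines. This is a hedged sketch, not an excerpt from tests/test_asset_retrieval_service.py: Envelope, query_assets, PolicyCheck, and the diagnostic codes are hypothetical, and the real policy gateway interface may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical policy check: (actor, asset_id) -> True when access is allowed.
PolicyCheck = Callable[[str, str], bool]


@dataclass(frozen=True)
class Envelope:
    success: bool
    results: list
    diagnostics: list = field(default_factory=list)
    filtered_count: int = 0  # internal permission-filter count for audit


def query_assets(asset_ids, actor: Optional[str], check: Optional[PolicyCheck]) -> Envelope:
    # Fail closed: with no actor or no policy check, return an empty denied envelope.
    if actor is None or check is None:
        return Envelope(False, [], [{"code": "policy_context_unavailable"}])
    allowed = []
    for asset_id in asset_ids:
        try:
            permitted = check(actor, asset_id)
        except Exception:
            # A failing policy check denies the whole query instead of leaking results.
            return Envelope(False, [], [{"code": "policy_check_failed"}])
        if permitted:
            allowed.append(asset_id)
    return Envelope(True, allowed, [], len(asset_ids) - len(allowed))


def test_fail_closed_without_policy_context():
    envelope = query_assets(["a1", "a2"], actor="alice", check=None)
    assert envelope.success is False
    assert envelope.results == []
    assert envelope.diagnostics == [{"code": "policy_context_unavailable"}]


def test_denied_assets_are_filtered():
    envelope = query_assets(["a1", "a2"], "alice", lambda actor, asset_id: asset_id == "a1")
    assert envelope.success is True
    assert envelope.results == ["a1"]
    assert envelope.filtered_count == 1
```

Both functions follow pytest naming conventions, so a runner such as pytest would collect them without extra configuration.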