stable asset queries, lexical search, filters, contextual entity and relationship retrieval, permission-aware fail-closed behavior, source-grounded snippets, feedback capture, and KPI hooks

2026-05-06 16:27:03 +02:00
parent 80a3e59701
commit 1e3c6fe34a
13 changed files with 3173 additions and 9 deletions
--- a/docs/retrieval-implementation.md
+++ b/docs/retrieval-implementation.md
@@ -0,0 +1,110 @@
+# Retrieval Implementation Note
+
+Date: 2026-05-06
+
+Status: first implementation slice for `KONT-WP-0007`.
+
+## Purpose
+
+This note records the first governed retrieval implementation over the asset
+registry. It introduces stable query request/result contracts before search
+ranking, policy filtering, snippets, relationship traversal, and feedback grow
+more complex.
+
+## Implemented Package Shape
+
+```text
+src/kontextual_engine/
+  services/retrieval_service.py
+```
+
+The retrieval service depends on the asset registry repository port and domain
+core contracts. It does not depend on HTTP, search backends, Markitect internals,
+or AI providers.
+
+## Implemented Capabilities
+
+- `AssetQueryRequest` for stable asset queries with pagination, sorting, and
+  common asset filters.
+- `AssetQueryResult` envelope with correlation ID, total count, returned count,
+  limit, offset, next offset, sort metadata, results, diagnostics, and success
+  flag.
+- `AssetQueryItem` result entries carrying asset identity, classification,
+  lifecycle, source references, representations, metadata records, relevance
+  metadata, and diagnostics.
+- Deterministic sorting by title, asset ID, asset type, lifecycle, creation
+  time, or update time, with asset ID as the tie-breaker for every sort key.
+- Pagination by limit and offset.
+- Structured validation diagnostics for invalid lifecycle, representation
+  kind, limit, offset, sort key, and sort order.
+- Standard filters for asset type, lifecycle, sensitivity, owner, topic, review
+  state, metadata records, source system/path, and representation kind.
+- Lexical search over normalized representation search text produced during
+  ingestion or supplied as representation metadata.
+- Refreshable in-memory lexical index that reports indexed asset and
+  representation counts.
+- Relevance metadata for lexical matches, including strategy, query, match
+  count, and representation IDs.
+- Zero-result measurement metadata for query smoke tests.
+- Additional source-context filters for collection, tags, and created/updated
+  timestamp bounds.
+- `ContextEntityQueryRequest`/`ContextEntityQueryResult` for querying
+  contextual entities by type, name, external reference, and metadata.
+- `RelationshipQueryRequest`/`RelationshipQueryResult` for stable relationship
+  retrieval by source, target, asset, contextual entity, predicate, target
+  kind, direction, and workflow run.
+- Asset queries can be constrained by contextual entity, workflow run, related
+  asset, and relationship predicate.
+- Asset result entries can carry relationship context and linked contextual
+  entities when graph filters or relationship inclusion are requested.
+- Relationship payloads expose direction, predicate, validity windows,
+  confidence, actor, provenance, creation time, and resolved source/target
+  context where available.
+- Retrieval uses the engine policy gateway for query-scope authorization and
+  per-resource checks before assets, relationships, or contextual entities are
+  returned.
+- Policy failures or unavailable policy context make retrieval fail closed:
+  the service returns an empty, denied envelope with structured diagnostics.
+- Retrieval audit events record actor, correlation ID, query scope, policy
+  decision, outcome, result count, total count, and internal
+  permission-filter counts.
+- `RetrievalSnippet` packets can be requested for lexical queries. Snippets are
+  built from normalized representation search text and carry asset ID,
+  representation ID, source reference ID, storage reference, media type, match
+  offsets, match text, and adapter provenance such as Markitect selectors or
+  source spans when present.
+- Snippets are attached only to policy-authorized asset results, so denied
+  matching content is not exposed through snippet packets.
+- Retrieval feedback is persisted as `RetrievalFeedbackRecord` entries that
+  classify feedback as useful, irrelevant, missing, unsafe, or low-confidence
+  and carry query context, result references, actor, correlation ID, notes,
+  and metadata.
+- Retrieval quality metrics summarize zero-result rate, precision@k,
+  citation precision, feedback counts, unsafe/low-confidence counts, and
+  permission-filter timing observations from retrieval audit events.
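The sorting, pagination, validation, and fail-closed rules listed above can be sketched together in Python. This is a minimal illustration, not the service's implementation. `Asset`, `QueryEnvelope`, `query_assets`, and the `is_allowed` callback are hypothetical stand-ins for the request/result contracts and the policy gateway. Diagnostics are plain strings here, and the sketch assumes `total_count` counts only authorized assets.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical attribute names mirroring the sort keys listed in the note.
SORT_KEYS = {"title", "asset_id", "asset_type", "lifecycle", "created_at", "updated_at"}


@dataclass(frozen=True)
class Asset:
    asset_id: str
    title: str
    asset_type: str = "document"
    lifecycle: str = "active"
    created_at: str = ""
    updated_at: str = ""


@dataclass
class QueryEnvelope:
    total_count: int
    returned_count: int
    next_offset: Optional[int]
    results: list[Asset]
    diagnostics: list[str] = field(default_factory=list)
    success: bool = True


def query_assets(
    assets: list[Asset],
    is_allowed: Optional[Callable[[Asset], bool]],
    sort_key: str = "title",
    sort_order: str = "asc",
    limit: int = 20,
    offset: int = 0,
) -> QueryEnvelope:
    # Collect every validation problem instead of stopping at the first one.
    diagnostics = []
    if sort_key not in SORT_KEYS:
        diagnostics.append(f"invalid sort key: {sort_key}")
    if sort_order not in ("asc", "desc"):
        diagnostics.append(f"invalid sort order: {sort_order}")
    if limit < 1:
        diagnostics.append(f"invalid limit: {limit}")
    if offset < 0:
        diagnostics.append(f"invalid offset: {offset}")
    if diagnostics:
        return QueryEnvelope(0, 0, None, [], diagnostics, success=False)

    # Fail closed: missing policy context yields an empty, denied envelope.
    if is_allowed is None:
        return QueryEnvelope(0, 0, None, [], ["policy context unavailable"], success=False)
    try:
        visible = [asset for asset in assets if is_allowed(asset)]
    except Exception as exc:  # a failing policy check also denies everything
        return QueryEnvelope(0, 0, None, [], [f"policy check failed: {exc}"], success=False)

    # Two stable passes: asset ID ascending first, then the requested key.
    ordered = sorted(visible, key=lambda asset: asset.asset_id)
    ordered.sort(key=lambda asset: getattr(asset, sort_key), reverse=(sort_order == "desc"))

    page = ordered[offset : offset + limit]
    next_offset = offset + limit if offset + limit < len(ordered) else None
    return QueryEnvelope(len(ordered), len(page), next_offset, page)
```

Sorting twice relies on Python's guarantee that `sort` stays stable even with `reverse=True`. In this sketch, assets with equal sort values therefore come back in ascending asset-ID order for either sort direction.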
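The snippet rules above can also be sketched in Python. Match offsets are measured in normalized search text, and snippets are built only for authorized assets. `Snippet`, `normalize`, `build_snippet`, and `attach_snippets` are hypothetical names. The lowercase-and-collapse-whitespace normalization and the fixed character window are assumptions, and adapter provenance such as Markitect selectors or source spans is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snippet:
    asset_id: str
    representation_id: str
    match_start: int  # offset into the normalized search text
    match_end: int
    match_text: str
    excerpt: str


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def build_snippet(
    asset_id: str, representation_id: str, search_text: str, query: str, window: int = 20
) -> Optional[Snippet]:
    text = normalize(search_text)
    needle = normalize(query)
    start = text.find(needle) if needle else -1
    if start < 0:
        return None  # no lexical match, so no snippet
    end = start + len(needle)
    excerpt = text[max(0, start - window) : end + window]
    return Snippet(asset_id, representation_id, start, end, text[start:end], excerpt)


def attach_snippets(
    authorized_ids: set[str], representations: list[tuple[str, str, str]], query: str
) -> list[Snippet]:
    snippets = []
    for asset_id, representation_id, search_text in representations:
        # Denied assets are skipped before their text is searched.
        if asset_id not in authorized_ids:
            continue
        snippet = build_snippet(asset_id, representation_id, search_text, query)
        if snippet is not None:
            snippets.append(snippet)
    return snippets
```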
+
+## Not Yet Implemented
+
+- Multi-hop relationship traversal and graph ranking.
+- Full grounded answer package assembly.
+
+## Test Coverage
+
+`tests/test_asset_retrieval_service.py` covers:
+
+- stable paginated query envelopes,
+- result payloads with source references, representations, and metadata
+  records,
+- combined standard, metadata, source, lifecycle, and representation filters,
+- lexical search over normalized content with index refresh and zero-result
+  metadata,
+- combined text, metadata, source, tag, collection, and sensitivity filters
+  over SQLite,
+- contextual entity, workflow run, related asset, and relationship predicate
+  filters,
+- direct contextual entity and relationship queries over SQLite,
+- permission-filtered asset, relationship, and context entity envelopes,
+- fail-closed query scope behavior when policy context is unavailable,
+- source-grounded lexical snippets with representation/source references and
+  adapter provenance,
+- persisted retrieval feedback and KPI measurement hooks over feedback,
+  query results, and retrieval audit timing,
+- structured diagnostics for invalid query requests.
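The KPI hooks exercised by these tests reduce to simple ratios and counts. The following is a minimal Python sketch. It assumes each query observation is an ordered tuple of result asset IDs and that relevance judgments (for example, results marked useful) are supplied as a set. `QueryObservation`, `zero_result_rate`, `precision_at_k`, and `feedback_counts` are hypothetical names. Precision@k uses the common definition: relevant results in the top k, divided by k.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryObservation:
    # Hypothetical shape: the ordered result asset IDs returned for one query.
    query: str
    result_ids: tuple[str, ...]


def zero_result_rate(observations: list[QueryObservation]) -> float:
    # Share of queries that returned nothing; 0.0 when no queries were observed.
    if not observations:
        return 0.0
    empty = sum(1 for obs in observations if not obs.result_ids)
    return empty / len(observations)


def precision_at_k(result_ids: tuple[str, ...], relevant: set[str], k: int) -> float:
    # Relevant hits among the top k results, divided by k.
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rid in result_ids[:k] if rid in relevant) / k


def feedback_counts(labels: list[str]) -> Counter:
    # Tally feedback labels such as "useful", "unsafe", or "low-confidence".
    return Counter(labels)
```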