stable asset queries, lexical search, filters, contextual entity and relationship retrieval, permission-aware fail-closed behavior, source-grounded snippets, feedback capture, and KPI hooks
docs/retrieval-implementation.md
@@ -0,0 +1,110 @@
# Retrieval Implementation Note

Date: 2026-05-06

Status: first implementation slice for `KONT-WP-0007`.

## Purpose

This note records the first governed retrieval implementation over the asset
registry. It introduces stable query request/result contracts before search
ranking, policy filtering, snippets, relationship traversal, and feedback grow
more complex.

## Implemented Package Shape

```text
src/kontextual_engine/
  services/retrieval_service.py
```

The retrieval service depends on the asset registry repository port and domain
core contracts. It does not depend on HTTP, search backends, Markitect internals,
or AI providers.
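
The dependency direction described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Asset`, `AssetRegistryRepository`, `list_assets`, `RetrievalService.titles`, and `InMemoryRepository` are hypothetical names, not the contracts in `retrieval_service.py`.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Asset:
    """Hypothetical minimal domain object; the real contract carries more."""
    asset_id: str
    title: str


class AssetRegistryRepository(Protocol):
    """Hypothetical repository port: the only dependency the service sees."""

    def list_assets(self) -> Iterable[Asset]: ...


class RetrievalService:
    """Depends on the port only, not on HTTP, a search backend, or an AI provider."""

    def __init__(self, repository: AssetRegistryRepository) -> None:
        self._repository = repository

    def titles(self) -> list[str]:
        # Any object with a matching list_assets() satisfies the port.
        return sorted(asset.title for asset in self._repository.list_assets())


class InMemoryRepository:
    """Illustrative test double standing in for a persistent adapter."""

    def __init__(self, assets: list[Asset]) -> None:
        self._assets = assets

    def list_assets(self) -> Iterable[Asset]:
        return list(self._assets)


service = RetrievalService(
    InMemoryRepository([Asset("a-2", "Zeta"), Asset("a-1", "Alpha")])
)
print(service.titles())
```

Because the service only sees the port, a test can swap in an in-memory double without touching transport or storage code.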

## Implemented Capabilities

- `AssetQueryRequest` for stable asset queries with pagination, sorting, and
  common asset filters.
- `AssetQueryResult` envelope with correlation ID, total count, returned count,
  limit, offset, next offset, sort metadata, results, diagnostics, and success
  flag.
- `AssetQueryItem` result entries carrying asset identity, classification,
  lifecycle, source references, representations, metadata records, relevance
  metadata, and diagnostics.
- Deterministic sorting by title, asset ID, asset type, lifecycle, creation
  time, or update time with asset ID as a tie-breaker.
- Pagination by limit and offset.
- Structured validation diagnostics for invalid lifecycle, representation
  kind, limit, offset, sort key, and sort order.
- Standard filters for asset type, lifecycle, sensitivity, owner, topic, review
  state, metadata records, source system/path, and representation kind.
- Lexical search over normalized representation search text produced during
  ingestion or supplied as representation metadata.
- Refreshable in-memory lexical index with indexed asset/representation counts.
- Relevance metadata for lexical matches, including strategy, query, match
  count, and representation IDs.
- Zero-result measurement metadata for query smoke tests.
- Additional source-context filters for collection, tags, and created/updated
  timestamp bounds.
- `ContextEntityQueryRequest`/`ContextEntityQueryResult` for querying
  contextual entities by type, name, external reference, and metadata.
- `RelationshipQueryRequest`/`RelationshipQueryResult` for stable relationship
  retrieval by source, target, asset, contextual entity, predicate, target
  kind, direction, and workflow run.
- Asset queries can be constrained by contextual entity, workflow run, related
  asset, and relationship predicate.
- Asset result entries can carry relationship context and linked contextual
  entities when graph filters or relationship inclusion are requested.
- Relationship payloads expose direction, predicate, validity windows,
  confidence, actor, provenance, creation time, and resolved source/target
  context where available.
- Retrieval uses the engine policy gateway for query-scope authorization and
  per-resource checks before assets, relationships, or contextual entities are
  returned.
- Policy failures or unavailable policy context fail closed with empty denied
  envelopes and structured diagnostics.
- Retrieval audit events record actor, correlation ID, query scope, policy
  decision, outcome, result count, total count, and internal
  permission-filter counts.
- `RetrievalSnippet` packets can be requested for lexical queries. Snippets are
  built from normalized representation search text and carry asset ID,
  representation ID, source reference ID, storage reference, media type, match
  offsets, match text, and adapter provenance such as Markitect selectors or
  source spans when present.
- Snippets are attached only to policy-authorized asset results, so denied
  matching content is not exposed through snippet packets.
- `RetrievalFeedbackRecord` persists useful, irrelevant, missing, unsafe, and
  low-confidence feedback with query context, result references, actor,
  correlation ID, notes, and metadata.
- Retrieval quality metrics summarize zero-result rate, precision@k,
  citation precision, feedback counts, unsafe/low-confidence counts, and
  permission-filter timing observations from retrieval audit events.
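
The query-contract items in the list above (envelope fields, deterministic sorting with an asset ID tie-breaker, limit/offset pagination, and structured validation diagnostics) could fit together roughly as follows. This is a sketch under assumptions: `Envelope`, `query`, the dictionary-shaped assets, and the diagnostic codes are hypothetical stand-ins, not the actual `AssetQueryRequest`/`AssetQueryResult` definitions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Sort keys named in this note; the identifier spellings are assumed.
SORT_KEYS = {"title", "asset_id", "asset_type", "lifecycle", "created_at", "updated_at"}


@dataclass(frozen=True)
class Envelope:
    """Hypothetical stand-in for AssetQueryResult; field names are assumed."""
    total_count: int
    returned_count: int
    limit: int
    offset: int
    next_offset: Optional[int]
    results: list = field(default_factory=list)
    diagnostics: list = field(default_factory=list)
    success: bool = True


def query(assets, limit=20, offset=0, sort_key="title", sort_order="asc"):
    # Collect structured diagnostics instead of raising on bad input.
    diagnostics = []
    if not isinstance(limit, int) or limit < 1:
        diagnostics.append({"field": "limit", "code": "invalid"})
    if not isinstance(offset, int) or offset < 0:
        diagnostics.append({"field": "offset", "code": "invalid"})
    if sort_key not in SORT_KEYS:
        diagnostics.append({"field": "sort_key", "code": "invalid"})
    if sort_order not in ("asc", "desc"):
        diagnostics.append({"field": "sort_order", "code": "invalid"})
    if diagnostics:
        return Envelope(0, 0, limit, offset, None, [], diagnostics, success=False)

    # Two stable sorts: asset ID first, then the requested key, so ties on the
    # key always come back in ascending asset-ID order (also when reversed).
    ordered = sorted(assets, key=lambda a: a["asset_id"])
    ordered = sorted(ordered, key=lambda a: a[sort_key], reverse=(sort_order == "desc"))

    page = ordered[offset:offset + limit]
    end = offset + len(page)
    next_offset = end if end < len(ordered) else None
    return Envelope(len(ordered), len(page), limit, offset, next_offset, page)


assets = [
    {"asset_id": "a-3", "title": "Beta"},
    {"asset_id": "a-1", "title": "Beta"},
    {"asset_id": "a-2", "title": "Alpha"},
]
first = query(assets, limit=2)
second = query(assets, limit=2, offset=first.next_offset)
bad = query(assets, limit=0, sort_key="size")
print([a["asset_id"] for a in first.results], first.next_offset)
print([a["asset_id"] for a in second.results], second.next_offset)
print(bad.success, bad.diagnostics)
```

Returning `next_offset` as `None` on the last page lets a caller loop until exhaustion without comparing counts itself, and returning diagnostics in the envelope keeps invalid requests on the same contract as valid ones.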

## Not Yet Implemented

- Multi-hop relationship traversal and graph ranking.
- Full grounded answer package assembly.

## Test Coverage

`tests/test_asset_retrieval_service.py` covers:

- stable paginated query envelopes,
- result payloads with source references, representations, and metadata
  records,
- combined standard, metadata, source, lifecycle, and representation filters,
- lexical search over normalized content with index refresh and zero-result
  metadata,
- combined text, metadata, source, tag, collection, and sensitivity filters
  over SQLite,
- contextual entity, workflow run, related asset, and relationship predicate
  filters,
- direct contextual entity and relationship queries over SQLite,
- permission-filtered asset, relationship, and context entity envelopes,
- fail-closed query scope behavior when policy context is unavailable,
- source-grounded lexical snippets with representation/source references and
  adapter provenance,
- persisted retrieval feedback and KPI measurement hooks over feedback,
  query results, and retrieval audit timing,
- structured diagnostics for invalid query requests.
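
The fail-closed and snippet cases in this list could be exercised with a self-contained sketch like the one below. It is not taken from `tests/test_asset_retrieval_service.py`: the `retrieve` function, the callable-based policy, the dictionary shapes, and the `policy_unavailable` code are all assumptions made for illustration, and the sketch assumes `search_text` is already normalized to lowercase.

```python
from typing import Callable, Optional


def retrieve(assets: list, query: str,
             policy: Optional[Callable[[str, dict], bool]], actor: str) -> dict:
    # Fail closed: no policy context means an empty, denied envelope.
    if policy is None:
        return {"success": False, "results": [],
                "diagnostics": [{"code": "policy_unavailable"}]}

    results, filtered = [], 0
    needle = query.strip().lower()
    candidates = assets if needle else []  # an empty query matches nothing
    for asset in candidates:
        # search_text is assumed to be pre-normalized, so offsets are exact.
        start = asset["search_text"].find(needle)
        if start < 0:
            continue
        try:
            allowed = policy(actor, asset)
        except Exception:
            allowed = False  # a failing policy check is treated as a denial
        if not allowed:
            filtered += 1  # counted for audit, never returned
            continue
        end = start + len(needle)
        # Snippets are built only after the per-resource check has passed.
        results.append({
            "asset_id": asset["asset_id"],
            "snippet": {"match_start": start, "match_end": end,
                        "match_text": asset["search_text"][start:end]},
        })
    return {"success": True, "results": results, "diagnostics": [],
            "permission_filtered": filtered}


ASSETS = [
    {"asset_id": "a-1", "sensitivity": "public",
     "search_text": "quarterly retrieval report"},
    {"asset_id": "a-2", "sensitivity": "restricted",
     "search_text": "restricted retrieval memo"},
]


def public_only(actor: str, asset: dict) -> bool:
    return asset["sensitivity"] == "public"


denied = retrieve(ASSETS, "retrieval", policy=None, actor="u-1")
allowed = retrieve(ASSETS, "Retrieval", policy=public_only, actor="u-1")
print(denied["success"], denied["results"])
print([r["asset_id"] for r in allowed["results"]], allowed["permission_filtered"])
```

Counting filtered matches separately from returned results mirrors the note's distinction between what the caller sees and what the audit trail records.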