# Retrieval Implementation Note

Date: 2026-05-06

Status: first implementation slice for `KONT-WP-0007`.

## Purpose

This note records the first governed retrieval implementation over the asset
registry. It introduces stable query request/result contracts before search
ranking, policy filtering, snippets, relationship traversal, and feedback grow
more complex.

## Implemented Package Shape

```text
src/kontextual_engine/
  services/retrieval_service.py
```

The retrieval service depends on the asset registry repository port and domain
core contracts. It does not depend on HTTP, search backends, Markitect internals,
or AI providers.
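
The dependency boundary described above can be illustrated with a minimal sketch. It is not the service's actual code: `Asset`, `AssetRegistryRepository`, its `list_assets` method, `InMemoryAssetRegistry`, and the `titles` helper are hypothetical names chosen for illustration. The point is only that the service receives a port and never imports HTTP, search, or AI provider code.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a domain-core asset contract."""

    asset_id: str
    title: str


class AssetRegistryRepository(Protocol):
    """Hypothetical repository port; the real port's methods may differ."""

    def list_assets(self) -> Sequence[Asset]: ...


class InMemoryAssetRegistry:
    """Test double that satisfies the port structurally, without inheritance."""

    def __init__(self, assets: Sequence[Asset]) -> None:
        self._assets = list(assets)

    def list_assets(self) -> Sequence[Asset]:
        return list(self._assets)


class RetrievalService:
    """Depends only on the port passed to its constructor."""

    def __init__(self, repository: AssetRegistryRepository) -> None:
        self._repository = repository

    def titles(self) -> list[str]:
        # Illustrative query: asset titles in ascending order.
        return sorted(asset.title for asset in self._repository.list_assets())


service = RetrievalService(
    InMemoryAssetRegistry([Asset("a-2", "Zeta"), Asset("a-1", "Alpha")])
)
print(service.titles())  # ['Alpha', 'Zeta']
```

Because the port is a structural `Protocol`, a SQLite-backed repository and an in-memory test double can be swapped without changing the service.
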

## Implemented Capabilities

- `AssetQueryRequest` for stable asset queries with pagination, sorting, and
  common asset filters.
- `AssetQueryResult` envelope with correlation ID, total count, returned count,
  limit, offset, next offset, sort metadata, results, diagnostics, and success
  flag.
- `AssetQueryItem` result entries carrying asset identity, classification,
  lifecycle, source references, representations, metadata records, relevance
  metadata, and diagnostics.
- Deterministic sorting by title, asset ID, asset type, lifecycle, creation
  time, or update time with asset ID as a tie-breaker.
- Pagination by limit and offset.
- Structured validation diagnostics for invalid lifecycle, representation
  kind, limit, offset, sort key, and sort order.
- Standard filters for asset type, lifecycle, sensitivity, owner, topic, review
  state, metadata records, source system/path, and representation kind.
- Lexical search over normalized representation search text produced during
  ingestion or supplied as representation metadata.
- Refreshable in-memory lexical index with indexed asset/representation counts.
- Relevance metadata for lexical matches, including strategy, query, match
  count, and representation IDs.
- Zero-result measurement metadata for query smoke tests.
- Additional source-context filters for collection, tags, and created/updated
  timestamp bounds.
- `ContextEntityQueryRequest`/`ContextEntityQueryResult` for querying
  contextual entities by type, name, external reference, and metadata.
- `RelationshipQueryRequest`/`RelationshipQueryResult` for stable relationship
  retrieval by source, target, asset, contextual entity, predicate, target
  kind, direction, and workflow run.
- Asset queries can be constrained by contextual entity, workflow run, related
  asset, and relationship predicate.
- Asset result entries can carry relationship context and linked contextual
  entities when graph filters or relationship inclusion are requested.
- Relationship payloads expose direction, predicate, validity windows,
  confidence, actor, provenance, creation time, and resolved source/target
  context where available.
- Retrieval uses the engine policy gateway for query-scope authorization and
  per-resource checks before assets, relationships, or contextual entities are
  returned.
- Policy failures or unavailable policy context fail closed with empty denied
  envelopes and structured diagnostics.
- Retrieval audit events record actor, correlation ID, query scope, policy
  decision, outcome, result count, total count, and internal
  permission-filter counts.
- `RetrievalSnippet` packets can be requested for lexical queries. Snippets are
  built from normalized representation search text and carry asset ID,
  representation ID, source reference ID, storage reference, media type, match
  offsets, match text, and adapter provenance such as Markitect selectors or
  source spans when present.
- Snippets are attached only to policy-authorized asset results, so denied
  matching content is not exposed through snippet packets.
- `RetrievalFeedbackRecord` persists useful, irrelevant, missing, unsafe, and
  low-confidence feedback with query context, result references, actor,
  correlation ID, notes, and metadata.
- Retrieval quality metrics summarize zero-result rate, precision@k,
  citation precision, feedback counts, unsafe/low-confidence counts, and
  permission-filter timing observations from retrieval audit events.

## Not Yet Implemented

- Multi-hop relationship traversal and graph ranking.
- Full grounded answer package assembly.

## Test Coverage

`tests/test_asset_retrieval_service.py` covers:

- stable paginated query envelopes,
- result payloads with source references, representations, and metadata
  records,
- combined standard, metadata, source, lifecycle, and representation filters,
- lexical search over normalized content with index refresh and zero-result
  metadata,
- combined text, metadata, source, tag, collection, and sensitivity filters
  over SQLite,
- contextual entity, workflow run, related asset, and relationship predicate
  filters,
- direct contextual entity and relationship queries over SQLite,
- permission-filtered asset, relationship, and context entity envelopes,
- fail-closed query scope behavior when policy context is unavailable,
- source-grounded lexical snippets with representation/source references and
  adapter provenance,
- persisted retrieval feedback and KPI measurement hooks over feedback,
  query results, and retrieval audit timing,
- structured diagnostics for invalid query requests.
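
As an illustration of the fail-closed and permission-filtering cases in this list, the sketch below shows the kind of assertion such tests might make. It does not reproduce the contents of `tests/test_asset_retrieval_service.py`: `Envelope`, `query_with_policy`, the `is_allowed` callback, and the `policy_context_unavailable` code are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Envelope:
    success: bool
    results: list = field(default_factory=list)
    diagnostics: list = field(default_factory=list)


def query_with_policy(
    asset_ids: list, is_allowed: Optional[Callable[[str], bool]]
) -> Envelope:
    """Hypothetical stand-in for a policy-gated retrieval call."""
    if is_allowed is None:
        # No policy context: fail closed with an empty denied envelope.
        return Envelope(
            success=False,
            diagnostics=[{"code": "policy_context_unavailable"}],
        )
    # Per-resource check: only authorized assets reach the caller.
    return Envelope(
        success=True,
        results=[asset_id for asset_id in asset_ids if is_allowed(asset_id)],
    )


def test_fails_closed_when_policy_context_is_unavailable():
    envelope = query_with_policy(["a-1", "a-2"], is_allowed=None)
    assert envelope.success is False
    assert envelope.results == []
    assert envelope.diagnostics[0]["code"] == "policy_context_unavailable"


def test_denied_assets_are_filtered_out():
    envelope = query_with_policy(
        ["a-1", "a-2"], is_allowed=lambda asset_id: asset_id == "a-1"
    )
    assert envelope.success is True
    assert envelope.results == ["a-1"]


# Run directly; under pytest the two functions are collected automatically.
test_fails_closed_when_policy_context_is_unavailable()
test_denied_assets_are_filtered_out()
```
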