tegwick 1e3c6fe34a stable asset queries, lexical search, filters, contextual entity and relationship retrieval, permission-aware fail-closed behavior, source-grounded snippets, feedback capture, and KPI hooks

2026-05-06 16:27:03 +02:00

id: KONT-WP-0007
type: workplan
title: Governed Retrieval And Context Graph
domain: markitect
repo: kontextual-engine
status: done
owner: codex
topic_slug: markitect
planning_priority: high
planning_order: 7
created: 2026-05-05
updated: 2026-05-06
state_hub_workstream_id: 64352515-9677-46bb-909a-9e2db4915dc7

KONT-WP-0007: Governed Retrieval And Context Graph

Purpose

Build retrieval as a governed operational capability: stable query contracts, text search, metadata and lifecycle filtering, contextual entities, relationship traversal, source-grounded snippets, permission checks, and quality feedback.

Requirement Coverage

Primary: FR-040 to FR-050 and FR-060 to FR-071.

Supporting: FR-120 to FR-126, FR-143 to FR-146, FR-163, FR-200 to FR-204.

Architecture Constraint

Implement retrieval through retrieval services, search ports, repository ports, and policy checks described in docs/architecture-blueprint.md. Search indexes and ranking backends are adapters; they must not define the stable query or result contracts.
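The port/adapter split can be illustrated with a minimal Python sketch. `SearchPort`, `AssetHit`, `SubstringSearchAdapter`, and this `RetrievalService` are hypothetical names, not the engine's actual classes. The sketch shows that the service, not the backend, builds the result contract, so swapping the adapter cannot change the shape callers receive.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class SearchPort(Protocol):
    """Hypothetical search port: adapters return ranked asset IDs only."""

    def search(self, text: str) -> list[str]: ...


@dataclass(frozen=True)
class AssetHit:
    """Hypothetical engine-owned result contract, independent of any backend."""

    asset_id: str
    rank: int
    backend: str


class SubstringSearchAdapter:
    """Toy adapter: case-insensitive substring match over an in-memory dict."""

    def __init__(self, docs: dict[str, str]) -> None:
        self._docs = docs

    def search(self, text: str) -> list[str]:
        needle = text.casefold()
        return sorted(a for a, body in self._docs.items() if needle in body.casefold())


class RetrievalService:
    """Depends on the port, never on a concrete search backend."""

    def __init__(self, port: SearchPort, backend_name: str) -> None:
        self._port = port
        self._backend_name = backend_name

    def query(self, text: str) -> list[AssetHit]:
        # The service maps raw adapter output into the stable contract.
        return [
            AssetHit(asset_id=a, rank=i + 1, backend=self._backend_name)
            for i, a in enumerate(self._port.search(text))
        ]
```

A SQLite or external search adapter would plug in by implementing the same `search` method. Callers would still receive `AssetHit` values.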

markitect-tool Boundary Remark

For Markdown-backed assets, retrieval adapters may use Markitect selectors, extraction helpers, local index concepts, and context-package source spans to produce grounded units and snippets. Engine retrieval contracts, result envelopes, policy filtering, pagination, feedback, and cross-format search remain engine-owned.

Implementation Status

As of 2026-05-06, the first retrieval slice is recorded in docs/retrieval-implementation.md. It establishes:

- asset query request/result contracts;
- stable sorting and pagination;
- result envelopes with source references, representations, and metadata records;
- refreshable lexical search with relevance metadata;
- zero-result smoke metadata;
- structured validation diagnostics.

It also supports combined metadata, lifecycle, source-context, tag, collection, timestamp, and representation filters across in-memory and SQLite-backed repositories.

The contextual graph slice adds direct contextual entity and relationship query envelopes. It also adds asset filters by contextual entity, workflow run, related asset, and relationship predicate.

Later slices add the following capabilities:

- Permission-aware retrieval uses the engine policy gateway for query-scope and per-resource checks. It produces fail-closed denied envelopes and retrieval audit events.
- Lexical queries can return source-grounded snippet packets with representation/source references and adapter provenance.
- Feedback and KPI hooks persist retrieval feedback. They derive zero-result, precision, citation precision, safety, confidence, and permission-filter timing signals.

Multi-hop graph traversal and ranking were listed as remaining work when the contextual graph slice landed. They are not covered by the slices recorded here.

R7.1 - Implement query contracts, pagination, sorting, and result envelopes

id: KONT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "5a1b0661-ce22-4ee6-a9e7-0aedce9d4356"

Define query requests, result envelopes, deterministic pagination, sorting, diagnostics, and correlation IDs.

Acceptance:

Repeated equivalent queries return stable ordering within documented limits.
Results include asset IDs, representation references, metadata, source references, and diagnostics.
Invalid queries return structured validation errors.

Implemented:

AssetQueryRequest, AssetQueryItem, AssetQueryResult, and AssetRetrievalService provide the stable asset query contract.
Queries return deterministic ordering with pagination metadata and correlation IDs.
Result entries expose asset identity, classification, source references, representations, and metadata records.
Invalid lifecycle, representation kind, sort key, sort order, limit, and offset return structured diagnostics without raising raw exceptions.
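Deterministic paging and diagnostics-instead-of-exceptions can be shown with a minimal Python sketch. The sort keys, the `MAX_LIMIT` value, and the `QueryPage`/`Diagnostic` shapes are hypothetical stand-ins for `AssetQueryRequest`/`AssetQueryResult`, whose real fields are not listed here. Stability comes from using `asset_id` as a tie-breaker after the requested sort key.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any

# Hypothetical limits; the documented engine values may differ.
SORT_KEYS = {"asset_id", "title", "created", "updated"}
SORT_ORDERS = {"asc", "desc"}
MAX_LIMIT = 100


@dataclass(frozen=True)
class Diagnostic:
    parameter: str
    message: str


@dataclass
class QueryPage:
    items: list[dict[str, Any]]
    total: int
    offset: int
    limit: int
    next_offset: int | None
    correlation_id: str
    diagnostics: list[Diagnostic] = field(default_factory=list)


def query_assets(
    assets: list[dict[str, Any]],
    *,
    sort_key: str = "asset_id",
    sort_order: str = "asc",
    limit: int = 20,
    offset: int = 0,
    correlation_id: str | None = None,
) -> QueryPage:
    cid = correlation_id or str(uuid.uuid4())
    problems: list[Diagnostic] = []
    if sort_key not in SORT_KEYS:
        problems.append(Diagnostic("sort_key", f"unsupported sort key {sort_key!r}"))
    if sort_order not in SORT_ORDERS:
        problems.append(Diagnostic("sort_order", f"unsupported sort order {sort_order!r}"))
    if not isinstance(limit, int) or not 1 <= limit <= MAX_LIMIT:
        problems.append(Diagnostic("limit", f"limit must be between 1 and {MAX_LIMIT}"))
    if not isinstance(offset, int) or offset < 0:
        problems.append(Diagnostic("offset", "offset must be a non-negative integer"))
    if problems:
        # Invalid input yields an empty page with diagnostics instead of an exception.
        return QueryPage([], 0, offset, limit, None, cid, problems)

    # asset_id breaks ties, so equal sort values still order deterministically
    # regardless of the order in which the repository returned rows.
    ordered = sorted(
        assets,
        key=lambda a: (a[sort_key], a["asset_id"]),
        reverse=sort_order == "desc",
    )
    end = offset + limit
    next_offset = end if end < len(ordered) else None
    return QueryPage(ordered[offset:end], len(ordered), offset, limit, next_offset, cid)
```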

R7.2 - Implement lexical search over normalized content

id: KONT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "5ec90dcb-473c-4d01-85f2-8db18de0b7d1"

Implement MVP lexical search over normalized representations without making semantic/vector search a blocker.

Acceptance:

Text search returns matching assets with relevance metadata.
Search indexes can be refreshed after ingestion or update.
p95 latency and zero-result rate can be measured in smoke tests.

Implemented:

Normalized ingestion now stores representation search text and length metadata for retrieval indexing.
AssetRetrievalService.refresh_index() builds a refreshable lexical index with indexed asset and representation counts.
Text queries perform lexical substring matching over normalized representations and return relevance metadata including strategy, query, match count, and matching representation IDs.
Query result metadata includes zero-result and lexical index statistics for later smoke/performance measurement.
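Here is a minimal Python sketch of a refreshable lexical substring index that returns the relevance and zero-result metadata described above. `LexicalIndex` and its dictionary shapes are hypothetical, not the engine's `refresh_index()` implementation. Ranking by raw match count is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class LexicalIndex:
    """Hypothetical in-memory lexical index over normalized representation text."""

    # representation_id -> (asset_id, casefolded search text)
    entries: dict[str, tuple[str, str]] = field(default_factory=dict)

    def refresh(self, representations: list[dict[str, str]]) -> dict[str, int]:
        """Rebuild from scratch; intended to run after ingestion or update."""
        self.entries = {
            r["representation_id"]: (r["asset_id"], r["search_text"].casefold())
            for r in representations
        }
        return self.stats()

    def stats(self) -> dict[str, int]:
        return {
            "indexed_assets": len({asset for asset, _ in self.entries.values()}),
            "indexed_representations": len(self.entries),
        }

    def search(self, query: str) -> tuple[list[dict[str, Any]], dict[str, Any]]:
        needle = query.casefold().strip()
        counts: dict[str, int] = {}
        reps: dict[str, list[str]] = {}
        if needle:  # an empty needle would match everything, so skip it
            for rep_id, (asset_id, text) in sorted(self.entries.items()):
                n = text.count(needle)  # non-overlapping substring occurrences
                if n:
                    counts[asset_id] = counts.get(asset_id, 0) + n
                    reps.setdefault(asset_id, []).append(rep_id)
        results = [
            {
                "asset_id": asset_id,
                "relevance": {
                    "strategy": "lexical_substring",
                    "query": query,
                    "match_count": counts[asset_id],
                    "matching_representation_ids": reps[asset_id],
                },
            }
            # Higher match count first; asset_id breaks ties deterministically.
            for asset_id in sorted(counts, key=lambda a: (-counts[a], a))
        ]
        return results, {"zero_result": not results, **self.stats()}
```

Timing `search()` calls across a fixed query set in a smoke test would give the p95 latency and zero-result rate named in the acceptance criteria.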

R7.3 - Implement metadata, lifecycle, and source-context filters

id: KONT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "9e7d0a5c-71d4-44ca-9b71-70f2206e4a02"

Support filters by asset type, collection, source, owner, tags, classification, sensitivity, lifecycle state, timestamps, and custom metadata.

Acceptance:

Text search and metadata filters can be combined.
Lifecycle and sensitivity filters participate in permission checks.
Filter behavior is covered across in-memory and durable backends where supported.

Implemented:

Asset queries support filters for asset type, lifecycle, sensitivity, owner, topic, review state, source system/path, representation kind, collection, tags, created/updated timestamp bounds, and custom metadata records.
Text search can be combined with standard, source, tag, collection, sensitivity, and metadata filters.
Combined filter behavior is covered over in-memory and SQLite-backed asset repositories.
Permission enforcement was intentionally deferred to R7.5. The lifecycle and sensitivity filters here establish policy inputs without claiming authorization semantics.
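Combining text search with metadata filters amounts to AND-composing one predicate per supplied filter. This minimal Python sketch shows that composition. The asset dictionary keys (`search_text`, `tags`, `collections`, `created`, `metadata`) and the inclusive/exclusive timestamp bounds are assumptions, not the engine's repository schema.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Callable

Asset = dict[str, Any]
Predicate = Callable[[Asset], bool]


def build_predicates(
    *,
    text: str | None = None,
    lifecycle: str | None = None,
    sensitivity: str | None = None,
    tags: set[str] | None = None,
    collection: str | None = None,
    created_after: datetime | None = None,
    created_before: datetime | None = None,
    metadata: dict[str, Any] | None = None,
) -> list[Predicate]:
    """Each supplied filter becomes one predicate; unset filters add nothing."""
    preds: list[Predicate] = []
    if text:
        needle = text.casefold()
        preds.append(lambda a: needle in a.get("search_text", "").casefold())
    if lifecycle:
        preds.append(lambda a: a.get("lifecycle") == lifecycle)
    if sensitivity:
        preds.append(lambda a: a.get("sensitivity") == sensitivity)
    if tags:
        wanted = set(tags)
        preds.append(lambda a: wanted <= set(a.get("tags", ())))  # must carry all tags
    if collection:
        preds.append(lambda a: collection in a.get("collections", ()))
    if created_after is not None:
        preds.append(lambda a: a["created"] >= created_after)  # inclusive lower bound
    if created_before is not None:
        preds.append(lambda a: a["created"] < created_before)  # exclusive upper bound
    for key, value in (metadata or {}).items():
        # Default arguments bind the current pair; a bare closure would see only the last one.
        preds.append(lambda a, k=key, v=value: a.get("metadata", {}).get(k) == v)
    return preds


def filter_assets(assets: list[Asset], **filters: Any) -> list[Asset]:
    preds = build_predicates(**filters)
    # AND semantics: an asset must satisfy every supplied filter.
    return [a for a in assets if all(p(a) for p in preds)]
```

Because the predicates are plain functions, the same filter set can be checked against in-memory and SQLite-backed fixtures in one parametrized test.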

R7.4 - Implement contextual entity model and relationship retrieval

id: KONT-WP-0007-T004
status: done
priority: high
state_hub_task_id: "b3358059-ac58-4e37-985c-6e8c1cc6df30"

Represent contextual entities such as people, teams, projects, cases, topics, source systems, processes, products, and generated artifacts.

Acceptance:

Assets can be linked to contextual entities.
Relationship direction, type, validity, confidence, actor, and provenance are represented where available.
Callers can retrieve assets by project, case, topic, source, workflow run, or related asset.

Implemented:

Existing ContextEntity/CoreRelationship primitives are reused as the canonical model; entity types now include workflow runs and generated artifacts for operational graph use cases.
ContextEntityQueryRequest/ContextEntityQueryResult provide stable contextual entity lookup by type, name, external reference, and metadata.
RelationshipQueryRequest/RelationshipQueryResult provide stable relationship retrieval by source, target, asset, contextual entity, workflow run, predicate, target kind, and direction.
Asset queries can filter by contextual entity, workflow run, related asset, and relationship predicate while returning relationship and contextual entity context for matched assets.
Graph retrieval behavior is covered across in-memory and SQLite-backed repositories.
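Here is a minimal Python sketch of direction-aware, one-hop relationship retrieval, such as finding the assets linked to a workflow run. The `Relationship` dataclass is a hypothetical simplification of `CoreRelationship`, and the string IDs like `run:7` are illustrative. Multi-hop traversal is deliberately not shown.

```python
from __future__ import annotations

from dataclasses import dataclass

DIRECTIONS = ("outgoing", "incoming", "both")


@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge shape; CoreRelationship may carry more fields."""

    source_id: str
    target_id: str
    predicate: str
    confidence: float | None = None
    provenance: str | None = None


def query_relationships(
    rels: list[Relationship],
    node_id: str,
    direction: str = "both",
    predicate: str | None = None,
) -> list[Relationship]:
    if direction not in DIRECTIONS:
        # A service layer would return this as a structured diagnostic instead.
        raise ValueError(f"direction must be one of {DIRECTIONS}")

    def touches(r: Relationship) -> bool:
        if direction == "outgoing":
            return r.source_id == node_id
        if direction == "incoming":
            return r.target_id == node_id
        return node_id in (r.source_id, r.target_id)

    hits = [r for r in rels if touches(r) and (predicate is None or r.predicate == predicate)]
    # Sort for stable output across repository backends.
    return sorted(hits, key=lambda r: (r.predicate, r.source_id, r.target_id))


def related_ids(rels: list[Relationship], node_id: str, predicate: str | None = None) -> list[str]:
    """IDs one hop away from node_id, in either direction."""
    neighbours = {
        r.target_id if r.source_id == node_id else r.source_id
        for r in query_relationships(rels, node_id, "both", predicate)
    }
    return sorted(neighbours)
```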

R7.5 - Enforce permission-aware retrieval and fail-closed semantics

id: KONT-WP-0007-T005
status: done
priority: high
state_hub_task_id: "c6c93713-3ab1-41fb-bf35-15dd860b66fa"

Apply authorization and policy checks before returning content, metadata, snippets, relationships, derived artifacts, or context packages.

Acceptance:

Unauthorized assets do not leak through result lists, snippets, relationship traversal, or derived answer packages.
Missing or stale permission context fails closed according to policy.
Retrieval audit events capture actor, query scope, outcome, and policy context.

Implemented:

Retrieval services accept the engine PolicyGateway, defaulting to the allow-all local adapter used elsewhere in the system.
Asset, contextual entity, and relationship queries authorize the query scope before loading result envelopes.
Assets, contextual entities, and relationships are policy-filtered before they are returned; relationships additionally require source and target resource visibility so traversal cannot reveal denied assets or entities.
Policy gateway failures produce empty denied envelopes with structured diagnostics and fail-closed policy decisions.
Retrieval audit events capture actor, correlation ID, query scope, policy decision, outcome, result counts, and internal permission-filter counts.
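The fail-closed rules above can be expressed as a minimal Python sketch. Any policy gateway failure yields an empty denied envelope, and a relationship is returned only if both endpoints are visible. `PolicyCheck` is a hypothetical stand-in for the engine's `PolicyGateway`, whose real interface is not shown in this workplan. The `Envelope` and audit keys are also assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class PolicyCheck(Protocol):
    """Hypothetical stand-in for the engine's PolicyGateway; the real interface may differ."""

    def is_allowed(self, actor: str, action: str, resource_id: str) -> bool: ...


@dataclass
class Envelope:
    items: list[tuple[str, str, str]]
    decision: str  # "allow" or "deny"
    diagnostics: list[str] = field(default_factory=list)
    audit: dict[str, object] = field(default_factory=dict)


def governed_relationships(
    policy: PolicyCheck, actor: str, rels: list[tuple[str, str, str]]
) -> Envelope:
    """rels are (source_id, target_id, predicate) tuples."""
    audit: dict[str, object] = {"actor": actor, "query_scope": "relationships"}
    try:
        if not policy.is_allowed(actor, "query", "relationships"):
            audit["outcome"] = "denied"
            return Envelope([], "deny", ["query scope denied"], audit)
        # Both endpoints must be visible, so traversal cannot reveal a denied node.
        visible = [
            r for r in rels
            if policy.is_allowed(actor, "read", r[0]) and policy.is_allowed(actor, "read", r[1])
        ]
    except Exception as exc:  # any gateway failure fails closed, with no partial results
        audit["outcome"] = "error"
        return Envelope([], "deny", [f"policy gateway failure: {type(exc).__name__}"], audit)
    audit.update(
        outcome="allowed",
        result_count=len(visible),
        permission_filtered=len(rels) - len(visible),
    )
    return Envelope(visible, "allow", [], audit)
```

Building the visible list inside the `try` block is what prevents a partially filtered list from escaping when the gateway fails midway.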

R7.6 - Return source-grounded snippets, citations, and explanation data

id: KONT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "1a6d5a95-d87a-447a-a186-cb73162cd9a1"

Return matched regions, snippets, source references, representation IDs, relationship context, and citation-ready data for grounded AI workflows.

Acceptance:

Results explain why they were returned and where they originated.
Snippets are permission filtered.
Retrieval packages are suitable for later grounded answer generation.
Markdown snippets can reference Markitect selector matches or context-package spans as adapter provenance.

Implemented:

RetrievalSnippet packets expose asset, representation, source reference, storage reference, media type, match offsets, match text, snippet text, and adapter provenance.
Lexical asset queries can request snippets through include_snippets, max_snippets, and snippet_radius.
Snippets are generated from normalized representation search text and are attached only to policy-authorized asset results.
Markitect selectors, source spans, context spans, adapter provenance, snapshots, and extractor identity are preserved when supplied as representation metadata.
Snippet behavior is covered with permission filtering so denied matching content does not leak through snippet packets.
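Here is a minimal Python sketch of snippet extraction with match offsets, a `snippet_radius`, a `max_snippets` cap, and pass-through provenance. Only authorized results receive snippets. The `Snippet` fields and `attach_snippets` helper are hypothetical simplifications of `RetrievalSnippet`. Case-insensitive `re.finditer` is used so that offsets index the original text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Snippet:
    """Hypothetical snippet packet; offsets index the original representation text."""

    asset_id: str
    representation_id: str
    match_start: int
    match_end: int
    match_text: str
    snippet_text: str
    provenance: dict[str, Any] = field(default_factory=dict)


def make_snippets(
    asset_id: str,
    representation_id: str,
    text: str,
    query: str,
    *,
    max_snippets: int = 3,
    snippet_radius: int = 40,
    provenance: dict[str, Any] | None = None,
) -> list[Snippet]:
    if not query:
        return []  # an empty pattern would match at every position
    snippets: list[Snippet] = []
    # re.IGNORECASE keeps offsets aligned with the original text; casefolding
    # the text first could change its length and shift the offsets.
    for m in re.finditer(re.escape(query), text, flags=re.IGNORECASE):
        if len(snippets) >= max_snippets:
            break
        lo = max(0, m.start() - snippet_radius)
        hi = min(len(text), m.end() + snippet_radius)
        snippets.append(Snippet(asset_id, representation_id, m.start(), m.end(),
                                m.group(0), text[lo:hi], dict(provenance or {})))
    return snippets


def attach_snippets(results: list[dict[str, Any]], allowed_ids: set[str],
                    query: str, **opts: Any) -> list[dict[str, Any]]:
    """Denied results are dropped before snippets are built, so their text never leaks."""
    return [
        {**r, "snippets": make_snippets(r["asset_id"], r["representation_id"],
                                        r["search_text"], query, **opts)}
        for r in results
        if r["asset_id"] in allowed_ids
    ]
```

Markitect selector or source-span metadata would travel in the `provenance` dict unchanged. The sketch does not interpret it.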

R7.7 - Capture retrieval feedback and KPI measurement hooks

id: KONT-WP-0007-T007
status: done
priority: medium
state_hub_task_id: "e17e2839-400f-4348-98e3-f77acc0b2fde"

Capture relevance feedback and quality signals for retrieval improvement.

Acceptance:

Feedback can mark results useful, irrelevant, missing, unsafe, or low confidence.
Query context and result metadata are stored with feedback.
Precision@k, zero-result rate, permission-filter latency, and citation precision have measurement hooks.

Implemented:

RetrievalFeedbackRecord persists feedback labels for useful, irrelevant, missing, unsafe, and low-confidence outcomes with actor, correlation ID, query context, result references, notes, and metadata.
Asset registry repository ports and memory/SQLite adapters persist and list retrieval feedback.
AssetRetrievalService.record_feedback() records authorized feedback with structured diagnostics for invalid labels or denied feedback operations.
AssetRetrievalService.quality_metrics() derives zero-result rate, precision@k, citation precision, feedback totals, unsafe/low-confidence counts, and permission-filter timing observations from query results, feedback records, and retrieval audit events.
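Here is a minimal Python sketch of the derived metrics. It uses the standard precision@k definition (relevant results in the top k, divided by k) and a zero-result rate over a set of queries. The `Feedback` shape is hypothetical, and so is the assumption that only queries with feedback count toward precision@k. "Citation precision" is assumed here to mean the share of citations pointing at sources judged supportive; the engine's exact definition may differ.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

LABELS = {"useful", "irrelevant", "missing", "unsafe", "low_confidence"}


@dataclass(frozen=True)
class Feedback:
    """Hypothetical feedback record keyed by query and asset."""

    query_id: str
    asset_id: str
    label: str

    def __post_init__(self) -> None:
        # A service layer would return this as a structured diagnostic instead.
        if self.label not in LABELS:
            raise ValueError(f"unknown feedback label {self.label!r}")


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Standard precision@k: relevant hits in the top k divided by k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for a in ranked[:k] if a in relevant) / k


def citation_precision(cited: list[str], supported: set[str]) -> float | None:
    """Share of citations pointing at supportive sources; None if nothing was cited."""
    return sum(1 for c in cited if c in supported) / len(cited) if cited else None


def quality_metrics(ranked_by_query: dict[str, list[str]], feedback: list[Feedback],
                    k: int = 5) -> dict[str, object]:
    labels = Counter(f.label for f in feedback)
    useful: dict[str, set[str]] = {}
    for f in feedback:
        if f.label == "useful":
            useful.setdefault(f.query_id, set()).add(f.asset_id)
    # Only queries with at least one feedback record count toward precision@k.
    judged = {f.query_id for f in feedback} & set(ranked_by_query)
    scores = [precision_at_k(ranked_by_query[q], useful.get(q, set()), k) for q in sorted(judged)]
    total = len(ranked_by_query)
    return {
        "zero_result_rate": sum(1 for ids in ranked_by_query.values() if not ids) / total if total else 0.0,
        f"precision_at_{k}": sum(scores) / len(scores) if scores else None,
        "feedback_total": sum(labels.values()),
        "unsafe_count": labels["unsafe"],
        "low_confidence_count": labels["low_confidence"],
    }
```

Permission-filter timing is left out of this sketch. It would come from the retrieval audit events rather than from feedback records.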

Definition Of Done

Retrieval tests cover text, metadata, lifecycle, relationship, contextual entity, pagination, permission, snippet, and feedback behavior.
Retrieval does not bypass policy or source provenance.
Search, relationship, and context retrieval contracts follow docs/architecture-blueprint.md.
python3 -m pytest passes.
