From 228b397fc539e8777c81f052b64f912898969535 Mon Sep 17 00:00:00 2001 From: tegwick Date: Tue, 5 May 2026 18:45:09 +0200 Subject: [PATCH] Generated new set of workplans --- README.md | 12 +- SCOPE.md | 180 ++++++-------- docs/knowledge-operations-roadmap.md | 109 ++++++++ workplans/KONT-WP-0004-durable-persistence.md | 234 ------------------ ...-0004-knowledge-operations-architecture.md | 200 +++++++++++++++ ...WP-0005-asset-registry-governance-state.md | 173 +++++++++++++ ...06-multi-format-ingestion-normalization.md | 171 +++++++++++++ ...P-0007-governed-retrieval-context-graph.md | 170 +++++++++++++ ...T-WP-0008-transformations-workflow-jobs.md | 170 +++++++++++++ ...P-0009-service-api-agent-safe-operation.md | 172 +++++++++++++ ...servability-export-enterprise-readiness.md | 179 ++++++++++++++ 11 files changed, 1430 insertions(+), 340 deletions(-) create mode 100644 docs/knowledge-operations-roadmap.md delete mode 100644 workplans/KONT-WP-0004-durable-persistence.md create mode 100644 workplans/KONT-WP-0004-knowledge-operations-architecture.md create mode 100644 workplans/KONT-WP-0005-asset-registry-governance-state.md create mode 100644 workplans/KONT-WP-0006-multi-format-ingestion-normalization.md create mode 100644 workplans/KONT-WP-0007-governed-retrieval-context-graph.md create mode 100644 workplans/KONT-WP-0008-transformations-workflow-jobs.md create mode 100644 workplans/KONT-WP-0009-service-api-agent-safe-operation.md create mode 100644 workplans/KONT-WP-0010-observability-export-enterprise-readiness.md diff --git a/README.md b/README.md index 82a05bb..556a689 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,17 @@ # kontextual-engine -AI-first, headless knowledge engine for persistent, operable structured -knowledge. +Headless knowledge operations engine for turning heterogeneous information +assets into persistent, contextual, governed, retrievable, transformable, and +agent-operable knowledge. 
Start here: - `INTENT.md` - `wiki/ProductRequirementsDocument.md` - `wiki/FunctionalRequirementsSpecification.md` +- `wiki/kontextual-engine_scope_research_md_bundle/` - `SCOPE.md` +- `docs/knowledge-operations-roadmap.md` - `docs/stack-decision.md` - `docs/markitect-main-scope-assessment.md` - `docs/markitect-tool-reuse-boundary.md` @@ -28,4 +31,7 @@ python3 -m pytest The first runtime slice implements artifacts, collections, relationships, in-memory storage, ingestion adapters, query, workflow run manifests, and -agent-facing context packages. +agent-facing context packages. The current roadmap re-scopes the next work +around the V0.2 knowledge operations vision: governed asset identity, +multi-format ingestion, retrieval, traceable transformations, workflows, +service APIs, agent-safe operation, observability, and export. diff --git a/SCOPE.md b/SCOPE.md index 04b26de..c016d8c 100644 --- a/SCOPE.md +++ b/SCOPE.md @@ -5,142 +5,116 @@ --- -## One-liner +## One-Liner -AI-first, headless knowledge engine that makes structured knowledge persistent, -queryable, orchestratable, and operable across formats. +Headless knowledge operations engine for turning heterogeneous information +assets into persistent, contextual, governed, retrievable, transformable, and +agent-operable knowledge. --- ## Core Idea -`kontextual-engine` is the system-layer successor to the platform portions of -`markitect-main`. It should preserve the useful ideas around infospaces, -knowledge artifacts, relationships, retrieval, workflow execution, and agent -context, while avoiding the old repo's mixed ownership of markdown primitives, -UI, provider integrations, and project/domain content. +`kontextual-engine` provides reusable backend capabilities for making fragmented +information operational. 
It is meant to sit behind applications, automation, +workflows, services, and AI agents that need durable knowledge assets rather +than disconnected files, documents, records, notes, datasets, generated outputs, +or content collections. -The engine owns the runtime contract for persistent knowledge systems. Lower -level syntax operations belong in `markitect-tool`; concrete domain workspaces -belong in `infospace-bench`. +The engine owns stable asset identity, source provenance, metadata, +relationships, lifecycle state, permission-aware retrieval, traceable +transformation, workflow state, auditability, exportability, and controlled +agent operation. + +It should support CMS-like, DMS-like, ECM-like, file-service, knowledge-base, +research-support, and AI-assisted workflow use cases without becoming a finished +end-user application in any one category. --- ## In Scope -- Persistent storage and lifecycle management for knowledge artifacts. -- Collections/domains, metadata, and relationships between artifacts. -- Multi-format ingestion interfaces and normalized internal representations. -- Query, retrieval, indexing, and composition APIs. -- Workflow orchestration for transformation, generation, and analysis. -- Agent-facing context continuity and operation surfaces. -- Integration adapters for lower-layer tools such as `markitect-tool`. -- Structured errors, deterministic behavior where applicable, and auditable - state transitions. +- Knowledge asset registry with stable asset IDs. +- Persistent management of source, normalized, and derived asset forms. +- Source references, provenance, ingestion history, and lifecycle state. +- Multi-format ingestion and normalization for common knowledge assets. +- Metadata, classification, custom schemas, contextual entities, and typed + relationships. +- Search, filtering, querying, source-grounded snippets, relationship retrieval, + and permission-aware access. 
+- Traceable transformations that produce derived artifacts with lineage. +- Workflow and job orchestration for ingestion, enrichment, validation, review, + transformation, publication, archival, synchronization, and export. +- Actors, permissions, policy checks, review gates, audit logs, and + fail-closed operation for ambiguous access. +- Agent-safe operation through explicit, bounded, permissioned, auditable, and + reviewable APIs. +- Service and programmatic APIs, adapter boundaries, extension hooks, events, + observability, export, and portability. --- ## Out of Scope -- Low-level markdown parsing, schema primitives, document transforms, or CLI - tooling that belongs in `markitect-tool`. -- Visual UI applications, rendering plugins, or WYSIWYG editing. -- Domain-specific infospace content or benchmark corpora that belong in - `infospace-bench`. -- Direct ownership of LLM provider adapters; use `llm-connect` or equivalent. -- Finance, issue tracking, profile management, release tooling, and other - legacy `markitect-main` utilities unrelated to the engine contract. -- A CLI-first product posture; any CLI should remain an administrative or - development convenience over service/programmatic APIs. +- A finished end-user ECM, DMS, CMS, intranet, workspace, or file-sharing + product by itself. +- A visual website builder, WYSIWYG authoring suite, or document editor. +- A file sync client or simple file browser. +- A markdown-specific tooling layer; low-level markdown operations belong in + `markitect-tool` or equivalent adapters. +- A pure vector database, standalone search appliance, or generic chatbot over + documents. +- A domain-specific knowledge base with hard-coded legal, support, research, or + marketing semantics. +- Direct ownership of every enterprise connector, AI provider, embedding model, + search backend, workflow engine, or deployment platform. 
+- Direct ownership of LLM provider adapters; use `llm-connect` or equivalent + provider-neutral integrations. --- ## Relevant When -- A project needs durable knowledge artifacts instead of one-off file parsing. -- Agents need stable context and retrievable state across sessions. -- Workflows must ingest, normalize, transform, compose, and query knowledge. -- Multiple formats and external tooling need a common runtime layer. -- A higher-level application needs a headless knowledge service. +- A project needs stable knowledge asset identity rather than path-only file + references. +- Heterogeneous documents, files, markdown, PDFs, datasets, notes, records, or + generated outputs need a common operational layer. +- Search, retrieval, transformations, workflows, and AI operations must preserve + provenance, permissions, and auditability. +- Derived summaries, reports, extracts, classifications, or generated artifacts + must remain traceable to source assets and operation context. +- AI agents need bounded, permissioned, source-grounded context and explicit + operation surfaces. +- A higher-level product needs a headless knowledge engine rather than another + monolithic content application. --- ## Not Relevant When -- The task is only markdown syntax manipulation or schema validation. -- The primary need is an end-user visual application. -- The work is domain-specific corpus curation without runtime needs. -- Provider-specific LLM client behavior is the main concern. +- The task is only low-level markdown syntax manipulation or schema validation. +- The primary need is a polished end-user UI. +- The desired product is a file synchronization client or office editor. +- A project only needs a vector index without durable identity, provenance, + permissions, workflows, audit, or derived artifact lineage. +- The work is domain-specific corpus curation without reusable engine needs. --- ## Current State -- Status: scoping / foundation. 
-- Implementation: documentation and workplans only. -- Stability: evolving. -- Usage: successor planning for the in-scope system-layer parts of - `markitect-main`. +- Status: foundation complete; roadmap re-scoped around the V0.2 knowledge + operations vision. +- Implementation: first runtime slice exists for artifacts, collections, + relationships, in-memory storage, ingestion adapters, query, workflow run + manifests, and agent-facing context packages. +- Next work: execute `KONT-WP-0004` through `KONT-WP-0010`, starting with the + architecture rebase and then building durable governed asset operations. --- -## How It Fits +## Keywords -- Upstream dependencies: `markitect-tool` for syntax-layer primitives, - `llm-connect` for provider-neutral LLM access, storage backends to be chosen. -- Downstream consumers: `infospace-bench`, future knowledge services, agents, - and automation systems. -- Often used with: State Hub for planning/coordination, markitect ecosystem - repos for adjacent responsibilities. - ---- - -## Terminology - -- Preferred terms: knowledge artifact, collection, relationship, ingestion, - normalization, workflow, context, operation. -- Also known as: Kontextual Engine, knowledge runtime, headless knowledge - engine. -- Potentially confusing terms: "infospace" is a conceptual collection pattern - inherited from `markitect-main`, not necessarily a project directory or a - UI-facing workspace. - ---- - -## Related / Overlapping - -- `markitect-main` — legacy mixed platform; source for candidate behavior and - tests, not the target architecture. -- `markitect-tool` — syntax layer for markdown and structured document - primitives. -- `infospace-bench` — application/project layer for concrete knowledge spaces. -- `llm-connect` — LLM provider abstraction that this repo may call but should - not replace. -- `the-custodian/state-hub` — coordination and repo/workplan index. 
- ---- - -## Getting Oriented - -- Start with: `INTENT.md`, `wiki/ProductRequirementsDocument.md`, - `wiki/FunctionalRequirementsSpecification.md`. -- Key files / directories: `docs/`, `workplans/`, `SCOPE.md`, `CLAUDE.md`. -- Entry points: none yet; implementation starts from the workplans. - ---- - -## Provided Capabilities - -```capability -type: service -title: Persistent knowledge runtime -description: Provides the planned system layer for storing, querying, transforming, and orchestrating structured knowledge artifacts across formats. -keywords: [knowledge, runtime, persistence, orchestration, retrieval] -``` - -```capability -type: automation -title: Agent-operable knowledge workflows -description: Provides planned APIs and workflow surfaces that let agents access context, trigger transformations, and operate over durable knowledge state. -keywords: [agent, workflow, context, automation, knowledge] -``` +knowledge, content, assets, provenance, retrieval, governance, audit, +workflow, transformation, agent-safe, API-first, metadata, relationships, +ingestion, export, portability diff --git a/docs/knowledge-operations-roadmap.md b/docs/knowledge-operations-roadmap.md new file mode 100644 index 0000000..9248df4 --- /dev/null +++ b/docs/knowledge-operations-roadmap.md @@ -0,0 +1,109 @@ +# Knowledge Operations Roadmap + +Date: 2026-05-05 + +This roadmap re-scopes `kontextual-engine` around the updated `INTENT.md`, +Product Requirements Document V0.2, Functional Requirements Specification V0.2, +and the research bundle under +`wiki/kontextual-engine_scope_research_md_bundle/`. + +## Review Finding + +The refreshed vision changes the center of gravity. The project is no longer +best described as a persistence-oriented system layer that happens to support +agents. 
It is a headless knowledge operations engine: stable asset identity, +context, provenance, governed retrieval, traceable transformation, workflow, +audit, export, and agent-safe operation are all first-order concerns. + +The previous `KONT-WP-0004: Durable Persistence Foundation` captured a real +gap, but it was too narrow. Durable storage is now one part of a broader asset +registry, governance, lineage, audit, workflow, and export foundation. + +## Product Shape + +The new source documents establish the engine as reusable backend capability +for systems that need to operate heterogeneous information assets: + +- files and folders, +- markdown and text repositories, +- office documents and PDFs, +- datasets and structured records, +- notes, policies, and project documentation, +- knowledge-base articles, +- generated AI outputs, +- operational documents and application-linked records. + +The strongest implementation wedge is: + +> Ingest a heterogeneous project or organizational corpus, assign stable asset +> identities, extract metadata and structure, build contextual relationships, +> support governed retrieval, and produce traceable derived artifacts through +> API-accessible workflows. + +## Roadmap Principles + +- Treat identity, provenance, permission checks, audit, and structured errors as + P0 infrastructure, not enterprise-only additions. +- Separate source, normalized, and derived representations. +- Keep transformations traceable to inputs, versions, parameters, actor, policy, + and output artifacts. +- Expose APIs before UI and keep UI/application concerns as consumers. +- Make agent operation explicit, bounded, permissioned, auditable, and + review-gated where needed. +- Use adapters for markdown tooling, document extraction, AI providers, search, + workflow engines, external policy systems, and storage backends. 
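
The traceability principle above (inputs, versions, parameters, actor, policy, and output artifacts) can be sketched as a small lineage record. This is a minimal illustration only: `TransformationRecord`, its field names, and the `fingerprint` helper are hypothetical and are not taken from the existing package.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class TransformationRecord:
    """Hypothetical lineage record for one transformation run."""

    operation: str
    # (asset_id, version) pairs for every input consumed by the run.
    input_asset_versions: tuple[tuple[str, int], ...]
    parameters: dict[str, Any]
    actor_id: str
    policy_id: str
    output_asset_ids: tuple[str, ...]

    def fingerprint(self) -> str:
        """Return a deterministic SHA-256 digest of the record's fields."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


record = TransformationRecord(
    operation="summarize",
    input_asset_versions=(("asset-001", 3),),
    parameters={"max_words": 200},
    actor_id="agent:summarizer",
    policy_id="policy:default",
    output_asset_ids=("asset-002",),
)
print(record.fingerprint())
```

Serializing with sorted keys keeps the digest stable across runs, so two runs with identical inputs, parameters, actor, and policy can be compared by fingerprint alone.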
+ +## Workplan Set + +| Workplan | Role | Primary Coverage | +| --- | --- | --- | +| `KONT-WP-0004` | Architecture rebase | Resolve open product/architecture decisions and publish V0.2 traceability. | +| `KONT-WP-0005` | Asset registry core | Stable identity, source/normalized/derived forms, metadata, permissions, audit, durable state. | +| `KONT-WP-0006` | Ingestion core | Jobs, connectors, extractors, local files, markdown, PDFs, office docs, datasets, normalization. | +| `KONT-WP-0007` | Retrieval core | Query contracts, lexical search, filters, context graph retrieval, permission-aware results, snippets, KPIs. | +| `KONT-WP-0008` | Operations core | Traceable transformations, derived artifacts, workflow templates, jobs, retries, review gates, exceptions. | +| `KONT-WP-0009` | Service and agents | Versioned service API, actor context, authorization middleware, agent operation catalog, context packages. | +| `KONT-WP-0010` | Enterprise readiness | Observability, admin recovery, export packages, governance inspection, events, quality and performance signals. | + +## Superseded Plan + +The old persistence-only `KONT-WP-0004` scope is superseded: + +- Durable asset state moves into `KONT-WP-0005`. +- Workflow run persistence moves into `KONT-WP-0008`. +- Context package references and agent constraints move into `KONT-WP-0009`. +- Snapshot/export behavior moves into `KONT-WP-0010`. + +The same State Hub workstream ID is retained for `KONT-WP-0004`, but its +purpose is now the architecture rebase that makes the new implementation set +coherent. 
+ +## FRS Traceability + +| FRS Area | Workplans | +| --- | --- | +| FR-001 to FR-010 asset registry and persistence | `KONT-WP-0004`, `KONT-WP-0005` | +| FR-020 to FR-030 ingestion and normalization | `KONT-WP-0004`, `KONT-WP-0006` | +| FR-040 to FR-050 metadata, classification, context | `KONT-WP-0004`, `KONT-WP-0005`, `KONT-WP-0007` | +| FR-060 to FR-071 search, query, retrieval | `KONT-WP-0004`, `KONT-WP-0007` | +| FR-080 to FR-090 transformations and derived artifacts | `KONT-WP-0004`, `KONT-WP-0008` | +| FR-100 to FR-110 workflows and jobs | `KONT-WP-0004`, `KONT-WP-0008` | +| FR-120 to FR-132 permissions, governance, audit, lifecycle | `KONT-WP-0004`, `KONT-WP-0005`, `KONT-WP-0010` | +| FR-140 to FR-146 versioning and provenance | `KONT-WP-0004`, `KONT-WP-0005`, `KONT-WP-0008` | +| FR-160 to FR-169 agent-safe operation | `KONT-WP-0004`, `KONT-WP-0009` | +| FR-180 to FR-188 APIs, integration, extensibility | `KONT-WP-0004`, `KONT-WP-0009`, `KONT-WP-0010` | +| FR-200 to FR-207 observability and administration | `KONT-WP-0008`, `KONT-WP-0010` | +| FR-220 to FR-225 export and portability | `KONT-WP-0010` | +| FR-240 to FR-245 errors and correctness | `KONT-WP-0005`, `KONT-WP-0009` | + +## Current Capability Reality + +The existing code remains useful but now represents only an early contract +slice. It has artifacts, collections, relationships, in-memory storage, +ingestion adapters, query, workflow run manifests, relationship graph helpers, +and context packages. It does not yet implement durable governed asset identity, +multi-format document ingestion, permission-aware retrieval, service APIs, +audit logs, transformation execution, export packages, or agent-safe operation +gates. + +That gap is exactly what the new workplan set is designed to close. 
diff --git a/workplans/KONT-WP-0004-durable-persistence.md b/workplans/KONT-WP-0004-durable-persistence.md deleted file mode 100644 index d9477bc..0000000 --- a/workplans/KONT-WP-0004-durable-persistence.md +++ /dev/null @@ -1,234 +0,0 @@ ---- -id: KONT-WP-0004 -type: workplan -title: "Durable Persistence Foundation" -domain: markitect -repo: kontextual-engine -status: todo -owner: codex -topic_slug: markitect -created: "2026-05-05" -updated: "2026-05-05" -state_hub_workstream_id: "e177f2dc-a2a0-41a4-b5cd-82e8f9f12f34" ---- - -# KONT-WP-0004: Durable Persistence Foundation - -## Purpose - -Close the persistence gap identified after `KONT-WP-0003` by turning the current -in-memory repository contract into a durable, local-first storage foundation for -knowledge artifacts, collections, relationships, workflow state, and context -references. - -This workplan deliberately does not implement `phase-memory` behavior. It uses -`docs/phase-memory-boundary.md` as the boundary: `kontextual-engine` persists -durable knowledge runtime state; `phase-memory` owns agentic memory phases, -profiles, compaction, retention, and activation planning. - -## Persistence Scope - -In scope: - -- Durable storage for collections, artifacts, artifact revisions, and - relationships. -- Durable storage for workflow runs and run manifests. -- Explicit update and delete behavior for artifacts and relationships. -- Change records that make artifact evolution inspectable. -- Query support for identifiers, names, digests, metadata, content text, and - relationships. -- Local-first SQLite backend with a repository interface that can later be - backed by service, PostgreSQL, graph, or object-storage adapters. -- Tests proving data survives repository re-instantiation. - -Out of scope: - -- Memory-phase lifecycle behavior from `phase-memory`. -- Vector search, embedding storage, and memory activation planning. -- Markdown parsing or markdown transformations; use `markitect-tool` adapters. 
-- LLM provider execution; use future `llm-connect` adapters. -- Remote multi-tenant deployment concerns beyond schema choices that do not - block later migration. - -## P4.1 - Finalize persistence boundary and ADR - -```task -id: KONT-WP-0004-T001 -status: todo -priority: high -state_hub_task_id: "6b665ab1-cc8e-473b-824a-d953b598bb72" -``` - -Promote the storage decision from deferred to explicit: local-first SQLite for -the first durable backend, wrapped by repository contracts. Decide whether the -implementation uses direct `sqlite3` or SQLAlchemy for this slice. - -Output: update `docs/stack-decision.md` or add an ADR under `docs/`. - -Acceptance: - -- The backend choice is explicit and justified. -- The decision references `docs/phase-memory-boundary.md`. -- Future service-backed storage remains possible. - -## P4.2 - Complete repository contract semantics - -```task -id: KONT-WP-0004-T002 -status: todo -priority: high -state_hub_task_id: "eed4f0b5-9080-4c76-9ae6-841459edbab6" -``` - -Extend `KnowledgeRepository` from create/list/get into a durable lifecycle -contract. Define update, delete, revision, and transaction semantics without -binding callers to a specific backend. - -Output: `src/kontextual_engine/storage.py` and focused tests. - -Acceptance: - -- Artifact update produces explicit revision/change semantics. -- Artifact delete behavior is defined for relationships and query results. -- Duplicate-name and referential-integrity behavior remains deterministic. -- Existing in-memory tests continue to pass. - -## P4.3 - Design durable schema and migrations - -```task -id: KONT-WP-0004-T003 -status: todo -priority: high -state_hub_task_id: "7f34e36f-4e9b-40ab-bbe9-afaee4553a9f" -``` - -Create the first durable schema for collections, artifacts, revisions, -relationships, workflow runs, run manifests, and change records. - -Output: schema/migration files under `src/kontextual_engine/storage/` or an -equivalent package-owned location. 
- -Acceptance: - -- Schema stores content digest, artifact type, size, metadata, timestamps, and - provenance. -- Relationships enforce valid source and target artifacts. -- JSON metadata is preserved roundtrip. -- Migrations can initialize an empty local database deterministically. - -## P4.4 - Implement SQLite repository backend - -```task -id: KONT-WP-0004-T004 -status: todo -priority: high -state_hub_task_id: "6d20a457-7246-4380-943f-c6d726506356" -``` - -Implement `SQLiteKnowledgeRepository` behind the same repository contract used -by the in-memory backend. - -Output: durable repository implementation and tests. - -Acceptance: - -- Collections, artifacts, relationships, and metadata survive closing and - reopening the repository. -- Query behavior matches the in-memory repository for supported filters. -- Tests cover duplicate artifact names, missing relationship endpoints, and - deterministic ordering. -- No markdown or memory-runtime logic is introduced. - -## P4.5 - Persist artifact evolution - -```task -id: KONT-WP-0004-T005 -status: todo -priority: high -state_hub_task_id: "e4e6f188-9ac3-4daf-9633-f11d812e50fa" -``` - -Add artifact revision and change-record support so persistent knowledge can be -versioned and audited over time. - -Output: model additions, repository methods, and tests. - -Acceptance: - -- Updating an artifact records old and new digests. -- Revision history can be retrieved by artifact id. -- Deletion is traceable through a change record. -- Change records are backend-neutral at the programmatic API boundary. - -## P4.6 - Persist workflow run state - -```task -id: KONT-WP-0004-T006 -status: todo -priority: medium -state_hub_task_id: "d0a9e9d4-12eb-406b-b32c-5b45f931f18c" -``` - -Persist `OperationRun`, `WorkflowStep`, `InputBundle`, and `RunManifest` -records so orchestration can resume and inspect prior execution. - -Output: repository methods and persistence tests for workflow records. 
- -Acceptance: - -- Run status transitions survive repository re-instantiation. -- Run manifests roundtrip with inputs, outputs, diagnostics, and timestamps. -- Artifact outputs can be linked to producing runs. - -## P4.7 - Add context and phase-memory reference hooks - -```task -id: KONT-WP-0004-T007 -status: todo -priority: medium -state_hub_task_id: "965738a5-9538-45f6-98bb-7987aba62904" -``` - -Add lightweight persistence for context-package references and external memory -references without implementing memory lifecycle behavior. - -Output: context reference model and tests. - -Acceptance: - -- Context packages can refer to artifacts, relationships, runs, and external - memory records. -- External memory references are opaque and provenance-tagged. -- No retention, decay, compaction, activation planning, or preference-memory - behavior is added to this repo. - -## P4.8 - Add import/export and smoke verification - -```task -id: KONT-WP-0004-T008 -status: todo -priority: medium -state_hub_task_id: "ea7313ce-fb1f-49b1-b5da-66a036893a04" -``` - -Provide a deterministic import/export path for repository snapshots so early -users and agents can inspect or migrate local state. - -Output: programmatic snapshot helpers and tests. - -Acceptance: - -- A repository snapshot can be exported and imported into a fresh backend. -- Imported data preserves ids, digests, metadata, relationships, revisions, and - run links. -- Snapshot format does not become a replacement for the service API. - -## Definition Of Done - -- `python3 -m pytest` passes. -- Existing in-memory behavior remains compatible unless explicitly revised. -- SQLite-backed tests prove durable behavior across repository - re-instantiation. -- Persistence docs explain what is durable now and what remains deferred. -- `docs/phase-memory-boundary.md` remains the boundary for memory-specific - behavior. 
diff --git a/workplans/KONT-WP-0004-knowledge-operations-architecture.md b/workplans/KONT-WP-0004-knowledge-operations-architecture.md new file mode 100644 index 0000000..d24ed02 --- /dev/null +++ b/workplans/KONT-WP-0004-knowledge-operations-architecture.md @@ -0,0 +1,200 @@ +--- +id: KONT-WP-0004 +type: workplan +title: "Knowledge Operations Architecture Rebase" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 4 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "e177f2dc-a2a0-41a4-b5cd-82e8f9f12f34" +--- + +# KONT-WP-0004: Knowledge Operations Architecture Rebase + +## Purpose + +Rebase the implementation roadmap around the V0.2 product vision: +`kontextual-engine` as a headless knowledge operations engine for making +heterogeneous information assets persistent, contextual, governed, retrievable, +transformable, and agent-operable. + +This workplan supersedes the earlier persistence-only interpretation of +`KONT-WP-0004`. Durable persistence remains required, but it must be designed +with asset identity, provenance, permissions, audit, transformation lineage, +workflow state, exportability, and agent-safe operation from the start. + +## Outputs + +- Updated scope and roadmap documentation. +- Architecture decision notes for the P0 capability baseline. +- Traceability from PRD/FRS V0.2 requirements to implementation workplans. +- Revised implementation sequence for `KONT-WP-0005` through `KONT-WP-0010`. + +## A4.1 - Reconcile implementation baseline with V0.2 vision + +```task +id: KONT-WP-0004-T001 +status: todo +priority: high +state_hub_task_id: "6b665ab1-cc8e-473b-824a-d953b598bb72" +``` + +Review the current Python package against the V0.2 PRD/FRS and identify which +existing contracts can remain, which must be renamed or expanded, and which are +now out of date. + +Acceptance: + +- Current modules are mapped to V0.2 capability areas. 
+- In-memory artifacts, collections, relationships, query, workflows, and + context packages are classified as reuse, replace, or defer. +- The old persistence-only roadmap is explicitly superseded. + +## A4.2 - Define canonical asset identity and representation model + +```task +id: KONT-WP-0004-T002 +status: todo +priority: high +state_hub_task_id: "eed4f0b5-9080-4c76-9ae6-841459edbab6" +``` + +Define stable knowledge asset identity, source references, source +representations, normalized representations, derived artifacts, aliases, +supersession, lifecycle state, and duplicate/re-ingestion semantics. + +Acceptance: + +- FR-001 through FR-010 have an implementation model. +- Source, normalized, and derived forms are distinct. +- Identity is independent of path, filename, backend, and representation. + +## A4.3 - Define actor permission policy and audit baseline + +```task +id: KONT-WP-0004-T003 +status: todo +priority: high +state_hub_task_id: "7f34e36f-4e9b-40ab-bbe9-afaee4553a9f" +``` + +Define the minimum actor, authorization context, policy check, sensitivity, +lifecycle, review, fail-closed, and audit event model needed for P0. + +Acceptance: + +- Human, application, automation, service, and AI-agent actors are modeled. +- Permission-aware retrieval and transformation rules are specified. +- Audit records include actor, operation, target, outcome, correlation ID, and + policy context where available. + +## A4.4 - Define provenance lineage versioning and derived artifact model + +```task +id: KONT-WP-0004-T004 +status: todo +priority: high +state_hub_task_id: "6d20a457-7246-4380-943f-c6d726506356" +``` + +Specify how source provenance, versions, content changes, metadata changes, +relationship changes, transformation runs, and derived artifacts are linked. + +Acceptance: + +- FR-080 through FR-090 and FR-140 through FR-146 are mapped to data contracts. 
+- Derived artifacts can explain their source assets, parameters, actor, policy, + run, and output identity. +- Restore, supersession, and re-run behavior is defined at contract level. + +## A4.5 - Define retrieval architecture and quality KPIs + +```task +id: KONT-WP-0004-T005 +status: todo +priority: high +state_hub_task_id: "e4e6f188-9ac3-4daf-9633-f11d812e50fa" +``` + +Define the first retrieval architecture: lexical search, filters, relationship +retrieval, stable pagination, snippets, citations/source-grounding, permission +checks, feedback, and KPIs. + +Acceptance: + +- FR-060 through FR-071 have an implementation path. +- MVP retrieval does not depend on vector search. +- Precision, zero-result rate, p95 latency, citation precision, and permission + fidelity are named as measurable targets. + +## A4.6 - Define workflow job and operation execution architecture + +```task +id: KONT-WP-0004-T006 +status: todo +priority: high +state_hub_task_id: "d0a9e9d4-12eb-406b-b32c-5b45f931f18c" +``` + +Define job and workflow execution boundaries for ingestion, enrichment, +validation, transformation, review, publication, archival, synchronization, +export, retries, cancellation, and exception handling. + +Acceptance: + +- FR-020 through FR-030 and FR-100 through FR-110 have job-state semantics. +- Workflow templates, runs, steps, dependencies, retries, failures, and outputs + are explicitly modeled. +- Embedded execution vs adapter-backed orchestration is decided for MVP. + +## A4.7 - Define agent-safe operation catalog and review gates + +```task +id: KONT-WP-0004-T007 +status: todo +priority: high +state_hub_task_id: "965738a5-9538-45f6-98bb-7987aba62904" +``` + +Define explicit agent operations for inspection, retrieval, metadata +enrichment, classification, transformation, workflow invocation, review +submission, dry runs, and bounded context packages. + +Acceptance: + +- FR-160 through FR-169 have API-level operation contracts. 
+- Agent operations cannot bypass permission, lifecycle, export, or review + policy. +- Destructive or sensitive actions can be denied, dry-run, or routed to review. + +## A4.8 - Publish roadmap traceability and update scope docs + +```task +id: KONT-WP-0004-T008 +status: todo +priority: medium +state_hub_task_id: "ea7313ce-fb1f-49b1-b5da-66a036893a04" +``` + +Update repo-local docs so humans and agents can understand the new product +shape and implementation sequence. + +Acceptance: + +- `SCOPE.md` reflects the V0.2 knowledge operations vision. +- `docs/knowledge-operations-roadmap.md` maps PRD/FRS areas to workplans. +- `README.md` points to the new research and roadmap materials. + +## Definition Of Done + +- Architecture docs clearly distinguish engine, application, connector, + provider, and domain-package responsibilities. +- Workplans `KONT-WP-0005` through `KONT-WP-0010` exist and are linked to State + Hub. +- `python3 -m pytest` passes. +- State Hub consistency passes without using the push-capable fixer. 
diff --git a/workplans/KONT-WP-0005-asset-registry-governance-state.md b/workplans/KONT-WP-0005-asset-registry-governance-state.md new file mode 100644 index 0000000..4f20536 --- /dev/null +++ b/workplans/KONT-WP-0005-asset-registry-governance-state.md @@ -0,0 +1,173 @@ +--- +id: KONT-WP-0005 +type: workplan +title: "Asset Registry Governance And Durable State" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 5 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "231a7794-aa3b-4763-a556-80b4cea731c8" +--- + +# KONT-WP-0005: Asset Registry Governance And Durable State + +## Purpose + +Implement the governed knowledge asset registry that underpins the V0.2 product +vision: stable asset identity, source references, source/normalized/derived +representations, metadata, classification, lifecycle state, actors, +authorization checks, audit events, versioning, and durable local-first state. + +## Requirement Coverage + +Primary: FR-001 to FR-010, FR-040 to FR-049, FR-120 to FR-126, +FR-140 to FR-145, FR-240 to FR-245. + +Supporting: FR-180 to FR-182, FR-200 to FR-201. + +## G5.1 - Implement stable asset identity and source references + +```task +id: KONT-WP-0005-T001 +status: todo +priority: high +state_hub_task_id: "7d61a11c-ca14-4075-ab0b-897bdfe57cb1" +``` + +Replace artifact-centric naming with knowledge asset identity that survives +rename, move, re-ingestion, representation changes, and transformation. + +Acceptance: + +- Assets have stable IDs, source references, source aliases, and content + digests. +- Source system, source path/URL/external ID, checksum, ingestion actor, and + ingestion time can be represented. +- Existing artifact tests are migrated or wrapped without losing deterministic + digest behavior. 
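A minimal sketch of the identity rule in G5.1, assuming a randomly generated asset ID and a SHA-256 content digest. The names `KnowledgeAsset`, `SourceReference`, `content_digest`, and `move` are hypothetical and do not come from the existing codebase; the point is only that the ID is never derived from the path or the content.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


def content_digest(data: bytes) -> str:
    """Return a deterministic SHA-256 digest for the given content bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


@dataclass
class SourceReference:
    # Hypothetical fields mirroring the acceptance list above.
    source_system: str
    locator: str  # path, URL, or external ID
    checksum: str


@dataclass
class KnowledgeAsset:
    # The ID is generated once and is independent of path and content.
    asset_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    source_refs: list = field(default_factory=list)
    aliases: list = field(default_factory=list)

    def move(self, new_locator: str) -> None:
        """Record a rename or move: keep the ID, keep the old locator as an alias."""
        current = self.source_refs[-1]
        self.aliases.append(current.locator)
        self.source_refs.append(
            SourceReference(current.source_system, new_locator, current.checksum)
        )
```

Keeping the digest on the source reference rather than in the ID is one way to let content change without changing identity; other designs are possible.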
+ +## G5.2 - Represent source normalized and derived asset forms + +```task +id: KONT-WP-0005-T002 +status: todo +priority: high +state_hub_task_id: "cd0a2b0a-a2a0-426e-8b8c-6013cd6b9303" +``` + +Introduce explicit representation records for original/source-near content, +normalized engine content, and derived artifacts. + +Acceptance: + +- Retrieval can distinguish source content from normalized content. +- Derived artifacts are stored as asset-linked records, not detached strings. +- Representation metadata includes media type, digest, size, extractor or + producer, and provenance. + +## G5.3 - Implement metadata classification lifecycle and schema validation + +```task +id: KONT-WP-0005-T003 +status: todo +priority: high +state_hub_task_id: "b06c5124-ce54-4241-b712-2fbab856877b" +``` + +Implement standard metadata, custom metadata schemas, classification, +sensitivity, lifecycle state, tags, ownership, and validation behavior. + +Acceptance: + +- Assets can be filtered by standard metadata and lifecycle state. +- Custom schema validation produces structured validation errors. +- Inferred and confirmed metadata can be distinguished for later review flows. + +## G5.4 - Implement actor authorization and policy baseline + +```task +id: KONT-WP-0005-T004 +status: todo +priority: high +state_hub_task_id: "c86e24ee-7e3f-488d-a649-d17a8689f0af" +``` + +Add actor and authorization context models for humans, applications, +automation, service accounts, and AI agents. + +Acceptance: + +- Operations accept explicit actor context. +- Role, group, sensitivity, lifecycle, source-policy, and operation type can + participate in policy checks. +- Ambiguous permission state fails closed by contract. + +## G5.5 - Implement audit events correlation IDs and structured errors + +```task +id: KONT-WP-0005-T005 +status: todo +priority: high +state_hub_task_id: "3d2e98a1-3312-452a-a5f1-f7a73234b45b" +``` + +Create audit and correctness primitives for material operations. 
+ +Acceptance: + +- Asset create, ingest, update, delete/retire, metadata, relationship, + permission, query, transformation, workflow, export, and agent operations can + emit audit events. +- Structured errors include code, message, correlation ID, operation, and + remediation hint where practical. +- Partial failures are represented for batch operations. + +## G5.6 - Implement durable SQLite repository for registry state + +```task +id: KONT-WP-0005-T006 +status: todo +priority: high +state_hub_task_id: "de155d02-3123-42da-8ede-f111bec62747" +``` + +Implement a local-first durable backend for assets, representations, metadata, +classifications, relationships, actors, policies, audit events, and versions. + +Acceptance: + +- State survives repository re-instantiation. +- Referential integrity is enforced for assets, relationships, representations, + versions, and audit references. +- The in-memory backend remains useful for deterministic unit tests. + +## G5.7 - Implement versioning change history conflict and idempotency semantics + +```task +id: KONT-WP-0005-T007 +status: todo +priority: medium +state_hub_task_id: "5288b136-05c1-449c-9215-f8b34db8b274" +``` + +Add version and change history semantics for asset content, metadata, +relationships, policy-relevant lifecycle state, and repeated requests. + +Acceptance: + +- Updates create traceable change records. +- Restore creates a new auditable change rather than erasing history. +- Idempotency keys and conflict detection prevent unintended duplicate or stale + writes where harmful. + +## Definition Of Done + +- Asset lifecycle tests cover create, retrieve, update, retire, delete request, + metadata changes, permission checks, audit events, and durable reload. +- New models map to the V0.2 FRS vocabulary. +- `python3 -m pytest` passes. 
diff --git a/workplans/KONT-WP-0006-multi-format-ingestion-normalization.md b/workplans/KONT-WP-0006-multi-format-ingestion-normalization.md new file mode 100644 index 0000000..fc37cbd --- /dev/null +++ b/workplans/KONT-WP-0006-multi-format-ingestion-normalization.md @@ -0,0 +1,171 @@ +--- +id: KONT-WP-0006 +type: workplan +title: "Multi-Format Ingestion And Normalization" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 6 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "270c83c0-eaed-4143-99d0-bb3fcfd23758" +--- + +# KONT-WP-0006: Multi-Format Ingestion And Normalization + +## Purpose + +Implement ingestion as an observable, retryable, provenance-preserving job +system that can bring heterogeneous information assets into the engine and +normalize them into a common representation for retrieval, metadata, +relationships, transformations, workflows, and agent context. + +## Requirement Coverage + +Primary: FR-020 to FR-030. + +Supporting: FR-001 to FR-008, FR-022 to FR-028, FR-200 to FR-202, +FR-240 to FR-244. + +## I6.1 - Implement ingestion job model status and retry surface + +```task +id: KONT-WP-0006-T001 +status: todo +priority: high +state_hub_task_id: "8e5e514a-6eef-42d9-a93c-2458b4c82753" +``` + +Define ingestion jobs that support queued, running, completed, failed, +partially completed, retried, quarantined, and canceled states. + +Acceptance: + +- Ingestion requests return job IDs and correlation IDs. +- Job status exposes input, actor, source reference, output assets, failures, + retry options, and partial results. +- Failed ingestion does not silently enter the trusted asset set. 
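The job states listed in I6.1 could be modeled as a small state machine. This is a sketch only: the workplan names the states but not the allowed edges, so the transition table below is an assumption, as are the names `JobState`, `ALLOWED`, and `transition`.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    PARTIALLY_COMPLETED = "partially_completed"
    RETRIED = "retried"
    QUARANTINED = "quarantined"
    CANCELED = "canceled"


# Assumed transition table; states missing as keys are treated as terminal.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING, JobState.CANCELED},
    JobState.RUNNING: {
        JobState.COMPLETED,
        JobState.FAILED,
        JobState.PARTIALLY_COMPLETED,
        JobState.QUARANTINED,
        JobState.CANCELED,
    },
    JobState.FAILED: {JobState.RETRIED},
    JobState.PARTIALLY_COMPLETED: {JobState.RETRIED},
    JobState.RETRIED: {JobState.RUNNING, JobState.CANCELED},
}


class InvalidTransition(Exception):
    """Raised when a job is asked to move along an edge that is not allowed."""


def transition(current: JobState, target: JobState) -> JobState:
    """Return the target state if the move is allowed, otherwise raise."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Rejecting unknown edges explicitly keeps a failed or quarantined job from drifting back into the trusted set without a recorded retry.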
+ +## I6.2 - Implement connector and extractor contracts + +```task +id: KONT-WP-0006-T002 +status: todo +priority: high +state_hub_task_id: "3eafdab5-478d-49d9-a17f-3cd7c8847cb1" +``` + +Define source connector and format extractor protocols that can provide source +references, metadata, permission context, content streams, and normalized +outputs. + +Acceptance: + +- Connectors can describe capabilities and supported source types. +- Extractors can describe supported media types and extraction depth. +- External extraction results can be accepted with provenance. + +## I6.3 - Implement local file and directory ingestion + +```task +id: KONT-WP-0006-T003 +status: todo +priority: high +state_hub_task_id: "d3e3d4d2-a581-4438-bee7-6fc4161d3925" +``` + +Create the first concrete source connector for local files and directories. + +Acceptance: + +- Local files can be ingested as source-referenced knowledge assets. +- Directory ingestion reports per-file success, skip, failure, and retry state. +- File path changes can be represented without changing stable asset identity + when identity policy permits. + +## I6.4 - Implement text and markdown normalization via markitect-tool adapter + +```task +id: KONT-WP-0006-T004 +status: todo +priority: high +state_hub_task_id: "63bf2f7e-705d-40ae-a160-75fc508ffb1f" +``` + +Normalize plain text directly and markdown through `markitect-tool` adapter +boundaries, without reimplementing markdown syntax primitives here. + +Acceptance: + +- Plain text produces normalized text representation and source provenance. +- Markdown extraction delegates to `markitect-tool` when available. +- Missing adapter dependencies fail with structured adapter errors. 
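One possible shape for the "structured adapter error" in I6.4, shown with a plain-text normalizer. Nothing here calls `markitect-tool`; the importable module name passed to `require_adapter` is left to the caller because it is not stated in this workplan. `AdapterError`, `require_adapter`, and `normalize_plain_text` are illustrative names.

```python
import importlib.util


class AdapterError(Exception):
    """Structured error raised when an optional adapter cannot be used."""

    def __init__(self, code: str, adapter: str, message: str, remediation: str):
        super().__init__(message)
        self.code = code
        self.adapter = adapter
        self.message = message
        self.remediation = remediation

    def to_dict(self) -> dict:
        return {
            "code": self.code,
            "adapter": self.adapter,
            "message": self.message,
            "remediation": self.remediation,
        }


def require_adapter(module_name: str, adapter: str) -> None:
    """Fail with a structured error if a top-level module is not installed."""
    if importlib.util.find_spec(module_name) is None:
        raise AdapterError(
            code="adapter_dependency_missing",
            adapter=adapter,
            message=f"optional dependency {module_name!r} is not installed",
            remediation=f"install {module_name!r} or disable the {adapter} adapter",
        )


def normalize_plain_text(raw: bytes) -> str:
    """Decode as UTF-8 and normalize line endings to LF."""
    text = raw.decode("utf-8", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Checking with `find_spec` before importing lets the engine report the missing dependency as data instead of surfacing a bare `ImportError`.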
+ +## I6.5 - Implement PDF office document and dataset baseline adapters + +```task +id: KONT-WP-0006-T005 +status: todo +priority: high +state_hub_task_id: "04d7c4b0-abfd-4b14-892f-91d1c1a820cd" +``` + +Provide baseline ingestion adapters for PDFs, office-like documents, and +structured datasets using optional dependencies or adapter stubs with explicit +capability reporting. + +Acceptance: + +- Baseline formats can be represented as knowledge assets. +- Unsupported extraction depth is reported explicitly. +- CSV or table-like datasets produce structured normalized output. + +## I6.6 - Extract structural elements into common normalized representation + +```task +id: KONT-WP-0006-T006 +status: todo +priority: medium +state_hub_task_id: "7421bc87-d962-4938-9aa3-591f8489e542" +``` + +Represent titles, sections, headings, paragraphs, tables, links, embedded +references, fields, and confidence signals where extractors can recover them. + +Acceptance: + +- Normalized representation supports text, structure, tables, links, and + extractor metadata. +- Structural output can feed search, snippets, transformations, and context + packages. +- Extractor confidence and unsupported elements are visible. + +## I6.7 - Validate ingestion output quarantine failures and preserve provenance + +```task +id: KONT-WP-0006-T007 +status: todo +priority: medium +state_hub_task_id: "07b32021-3701-437a-ae87-030bed56a25c" +``` + +Validate normalized content, required metadata, source provenance, permissions, +and policy constraints before ingestion completion. + +Acceptance: + +- Invalid output is quarantined or failed with structured diagnostics. +- Re-ingestion preserves identity, provenance, permissions, versions, and + relationships where policy allows. +- Batch ingestion reports succeeded, failed, skipped, quarantined, and retriable + items separately. 
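The separate batch buckets required by I6.7 can be sketched as a small aggregation step. The outcome names follow the acceptance list; `ItemResult` and `summarize` are hypothetical, and treating "retriable" as a flag that overlaps the other buckets is an assumption.

```python
from dataclasses import dataclass

OUTCOMES = ("succeeded", "failed", "skipped", "quarantined")


@dataclass(frozen=True)
class ItemResult:
    source: str
    outcome: str              # one of OUTCOMES
    retriable: bool = False   # assumed to be independent of the outcome
    diagnostic: str = ""


def summarize(results):
    """Group item sources by outcome and list retriable items separately."""
    report = {name: [] for name in OUTCOMES}
    report["retriable"] = []
    for item in results:
        if item.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {item.outcome!r}")
        report[item.outcome].append(item.source)
        if item.retriable:
            report["retriable"].append(item.source)
    return report
```

An unknown outcome raises instead of being dropped, so a batch report cannot silently under-count.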
+ +## Definition Of Done + +- Local file, text, markdown, PDF/document placeholder, and dataset ingestion + scenarios are covered by tests. +- Job status and provenance are inspectable through programmatic APIs. +- `python3 -m pytest` passes. diff --git a/workplans/KONT-WP-0007-governed-retrieval-context-graph.md b/workplans/KONT-WP-0007-governed-retrieval-context-graph.md new file mode 100644 index 0000000..e757811 --- /dev/null +++ b/workplans/KONT-WP-0007-governed-retrieval-context-graph.md @@ -0,0 +1,170 @@ +--- +id: KONT-WP-0007 +type: workplan +title: "Governed Retrieval And Context Graph" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 7 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "64352515-9677-46bb-909a-9e2db4915dc7" +--- + +# KONT-WP-0007: Governed Retrieval And Context Graph + +## Purpose + +Build retrieval as a governed operational capability: stable query contracts, +text search, metadata and lifecycle filtering, contextual entities, +relationship traversal, source-grounded snippets, permission checks, and +quality feedback. + +## Requirement Coverage + +Primary: FR-040 to FR-050 and FR-060 to FR-071. + +Supporting: FR-120 to FR-126, FR-143 to FR-146, FR-163, FR-200 to FR-204. + +## R7.1 - Implement query contracts pagination sorting and result envelopes + +```task +id: KONT-WP-0007-T001 +status: todo +priority: high +state_hub_task_id: "5a1b0661-ce22-4ee6-a9e7-0aedce9d4356" +``` + +Define query requests, result envelopes, deterministic pagination, sorting, +diagnostics, and correlation IDs. + +Acceptance: + +- Repeated equivalent queries return stable ordering within documented limits. +- Results include asset IDs, representation references, metadata, source + references, and diagnostics. +- Invalid queries return structured validation errors. 
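Stable ordering in R7.1 usually comes down to a total sort order plus a cursor that encodes the last key seen. The sketch below assumes hits are dicts with `score` and `asset_id`, and uses `asset_id` as the tie-breaker; the `paginate` function and the cursor format are illustrative, not a decided contract.

```python
import base64
import json


def _key(hit):
    # Score descending, then asset_id ascending, so ties are always ordered.
    return (-hit["score"], hit["asset_id"])


def paginate(hits, limit, cursor=None):
    """Return one page of hits plus an opaque cursor for the next page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(hits, key=_key)
    start = 0
    if cursor is not None:
        last = json.loads(base64.urlsafe_b64decode(cursor.encode()).decode())
        # Skip everything up to and including the last item already returned.
        start = sum(1 for hit in ordered if _key(hit) <= _key(last))
    page = ordered[start:start + limit]
    next_cursor = None
    if start + limit < len(ordered):
        tail = {"score": page[-1]["score"], "asset_id": page[-1]["asset_id"]}
        next_cursor = base64.urlsafe_b64encode(json.dumps(tail).encode()).decode()
    return {"items": page, "next_cursor": next_cursor}
```

Because the cursor carries a sort key rather than an offset, a page boundary stays put even if earlier results are inserted between requests.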
+ +## R7.2 - Implement lexical search over normalized content + +```task +id: KONT-WP-0007-T002 +status: todo +priority: high +state_hub_task_id: "5ec90dcb-473c-4d01-85f2-8db18de0b7d1" +``` + +Implement MVP lexical search over normalized representations without making +semantic/vector search a blocker. + +Acceptance: + +- Text search returns matching assets with relevance metadata. +- Search indexes can be refreshed after ingestion or update. +- p95 latency and zero-result rate can be measured in smoke tests. + +## R7.3 - Implement metadata lifecycle and source-context filters + +```task +id: KONT-WP-0007-T003 +status: todo +priority: high +state_hub_task_id: "9e7d0a5c-71d4-44ca-9b71-70f2206e4a02" +``` + +Support filters by asset type, collection, source, owner, tags, +classification, sensitivity, lifecycle state, timestamps, and custom metadata. + +Acceptance: + +- Text search and metadata filters can be combined. +- Lifecycle and sensitivity filters participate in permission checks. +- Filter behavior is covered across in-memory and durable backends where + supported. + +## R7.4 - Implement contextual entity model and relationship retrieval + +```task +id: KONT-WP-0007-T004 +status: todo +priority: high +state_hub_task_id: "b3358059-ac58-4e37-985c-6e8c1cc6df30" +``` + +Represent contextual entities such as people, teams, projects, cases, topics, +source systems, processes, products, and generated artifacts. + +Acceptance: + +- Assets can be linked to contextual entities. +- Relationship direction, type, validity, confidence, actor, and provenance are + represented where available. +- Callers can retrieve assets by project, case, topic, source, workflow run, or + related asset. 
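Relationship retrieval as described in R7.4 might look like the in-memory sketch below. The `Relationship` fields are a subset of those named in the acceptance list, and `ContextGraph`, `link`, and `assets_for` are hypothetical names. Treating an edge with no confidence value as passing only when no threshold is set is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    asset_id: str
    target_id: str                      # contextual entity or related asset
    rel_type: str                       # for example "belongs_to" or "about"
    confidence: Optional[float] = None
    actor: Optional[str] = None


class ContextGraph:
    def __init__(self) -> None:
        self._edges = []

    def link(self, rel: Relationship) -> None:
        self._edges.append(rel)

    def assets_for(self, target_id, rel_type=None, min_confidence=0.0):
        """Return sorted asset IDs linked to the given entity or asset."""
        found = set()
        for edge in self._edges:
            if edge.target_id != target_id:
                continue
            if rel_type is not None and edge.rel_type != rel_type:
                continue
            # Edges without a confidence value only pass when no threshold is set.
            score = edge.confidence if edge.confidence is not None else 0.0
            if score < min_confidence:
                continue
            found.add(edge.asset_id)
        return sorted(found)
```

A real implementation would also apply the permission checks from R7.5 before returning any asset ID; that step is left out here.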
+ +## R7.5 - Enforce permission-aware retrieval and fail-closed semantics + +```task +id: KONT-WP-0007-T005 +status: todo +priority: high +state_hub_task_id: "c6c93713-3ab1-41fb-bf35-15dd860b66fa" +``` + +Apply authorization and policy checks before returning content, metadata, +snippets, relationships, derived artifacts, or context packages. + +Acceptance: + +- Unauthorized assets do not leak through result lists, snippets, relationship + traversal, or derived answer packages. +- Missing or stale permission context fails closed according to policy. +- Retrieval audit events capture actor, query scope, outcome, and policy + context. + +## R7.6 - Return source-grounded snippets citations and explanation data + +```task +id: KONT-WP-0007-T006 +status: todo +priority: medium +state_hub_task_id: "1a6d5a95-d87a-447a-a186-cb73162cd9a1" +``` + +Return matched regions, snippets, source references, representation IDs, +relationship context, and citation-ready data for grounded AI workflows. + +Acceptance: + +- Results explain why they were returned and where they originated. +- Snippets are permission filtered. +- Retrieval packages are suitable for later grounded answer generation. + +## R7.7 - Capture retrieval feedback and KPI measurement hooks + +```task +id: KONT-WP-0007-T007 +status: todo +priority: medium +state_hub_task_id: "e17e2839-400f-4348-98e3-f77acc0b2fde" +``` + +Capture relevance feedback and quality signals for retrieval improvement. + +Acceptance: + +- Feedback can mark results useful, irrelevant, missing, unsafe, or low + confidence. +- Query context and result metadata are stored with feedback. +- Precision@k, zero-result rate, permission-filter latency, and citation + precision have measurement hooks. + +## Definition Of Done + +- Retrieval tests cover text, metadata, lifecycle, relationship, contextual + entity, pagination, permission, snippet, and feedback behavior. +- Retrieval does not bypass policy or source provenance. 
+- `python3 -m pytest` passes. diff --git a/workplans/KONT-WP-0008-transformations-workflow-jobs.md b/workplans/KONT-WP-0008-transformations-workflow-jobs.md new file mode 100644 index 0000000..8313a9c --- /dev/null +++ b/workplans/KONT-WP-0008-transformations-workflow-jobs.md @@ -0,0 +1,170 @@ +--- +id: KONT-WP-0008 +type: workplan +title: "Traceable Transformations And Workflow Jobs" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 8 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "1b7a6b04-7879-4862-bb3e-817f7f20fc59" +--- + +# KONT-WP-0008: Traceable Transformations And Workflow Jobs + +## Purpose + +Implement the operations layer that turns knowledge assets into traceable +outputs: transformation operations, derived artifacts, workflow templates, +workflow runs, job execution state, retries, cancellation, review gates, +exception queues, and operation audit. + +## Requirement Coverage + +Primary: FR-080 to FR-090 and FR-100 to FR-110. + +Supporting: FR-083 to FR-085, FR-106, FR-144 to FR-145, FR-165, +FR-200 to FR-202. + +## O8.1 - Implement transformation operation registry + +```task +id: KONT-WP-0008-T001 +status: todo +priority: high +state_hub_task_id: "ee2471b1-fab3-48f5-8b2d-d8f624abfc35" +``` + +Create a registry for transformation operations such as summarize, extract, +classify, compose, validate, generate report, and produce structured view. + +Acceptance: + +- Operations declare inputs, outputs, parameters, required permissions, and + supported asset types. +- Provider-specific LLM behavior remains behind adapters. +- Unsupported operations return structured capability errors. 
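A registry along the lines of O8.1 could be as small as a dictionary of declared operations plus a capability check. All names here (`OperationSpec`, `OperationRegistry`, `CapabilityError`, the `unsupported_operation` code) are assumptions for illustration; provider-specific behavior would sit behind the `handler` callable.

```python
from dataclasses import dataclass
from typing import Callable


class CapabilityError(Exception):
    """Structured error for operations the engine cannot perform."""

    def __init__(self, operation: str, reason: str):
        super().__init__(f"{operation}: {reason}")
        self.code = "unsupported_operation"
        self.operation = operation
        self.reason = reason


@dataclass(frozen=True)
class OperationSpec:
    name: str
    supported_asset_types: frozenset
    required_permissions: frozenset
    handler: Callable[[str], str]


class OperationRegistry:
    def __init__(self) -> None:
        self._ops = {}

    def register(self, spec: OperationSpec) -> None:
        self._ops[spec.name] = spec

    def resolve(self, name: str, asset_type: str) -> OperationSpec:
        """Return the spec for an operation, or raise a capability error."""
        spec = self._ops.get(name)
        if spec is None:
            raise CapabilityError(name, "operation is not registered")
        if asset_type not in spec.supported_asset_types:
            raise CapabilityError(name, f"asset type {asset_type!r} is not supported")
        return spec
```

Resolving before executing gives callers a capability answer without touching source content.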
+ +## O8.2 - Implement transformation runs with parameters actors and policy context + +```task +id: KONT-WP-0008-T002 +status: todo +priority: high +state_hub_task_id: "1eac7b47-8cff-4736-9f7d-599123218bad" +``` + +Represent each transformation as a run with source assets, source versions, +operation type, parameters, actor, policy context, timestamps, and status. + +Acceptance: + +- Transformations can be queued, run, completed, failed, retried, or canceled. +- Transformation permissions are checked before reading sources or writing + outputs. +- Parameters needed to interpret or reproduce the run are preserved. + +## O8.3 - Persist derived artifacts and source lineage + +```task +id: KONT-WP-0008-T003 +status: todo +priority: high +state_hub_task_id: "837ad793-2e9a-41f0-bce6-0a75815b5c15" +``` + +Persist summaries, extracts, reports, structured representations, generated +artifacts, and composed outputs as governed derived artifacts. + +Acceptance: + +- Derived artifacts have stable identity and lineage to source assets. +- Lineage includes transformation run, source versions, actor, parameters, + policy context, and output representation. +- Re-runs create new traceable records rather than silently overwriting outputs. + +## O8.4 - Implement workflow templates steps dependencies and preconditions + +```task +id: KONT-WP-0008-T004 +status: todo +priority: high +state_hub_task_id: "2c55c5dd-f07b-466b-85a5-f229e41fd124" +``` + +Define reusable workflow templates containing steps, dependencies, inputs, +outputs, preconditions, policy checks, and failure behavior. + +Acceptance: + +- Templates can be created and invoked programmatically. +- Step dependencies prevent unsafe or premature execution. +- Workflow inputs can be assets, collections, queries, source events, or + submitted payloads. 
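Step dependencies in O8.4 map naturally onto a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+) and assumes a template is represented as a mapping from step name to the set of steps it depends on; `execution_order` and `ready_steps` are hypothetical helpers.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps):
    """Return step names in an order that respects every dependency.

    Raises ValueError for dependencies on unknown steps and CycleError
    (from graphlib) when the dependencies form a cycle.
    """
    unknown = {dep for deps in steps.values() for dep in deps} - set(steps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    return list(TopologicalSorter(steps).static_order())


def ready_steps(steps, completed):
    """Return steps that are not done yet and whose dependencies are all done."""
    done = set(completed)
    return sorted(
        name for name, deps in steps.items()
        if name not in done and set(deps) <= done
    )
```

`ready_steps` is the piece that prevents premature execution at run time; `execution_order` is mainly useful for validating a template when it is created.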
+ +## O8.5 - Implement job runner status retry resume and cancel behavior + +```task +id: KONT-WP-0008-T005 +status: todo +priority: high +state_hub_task_id: "5f4d6c88-904d-4369-90d5-eaa4d27e3010" +``` + +Implement a simple MVP job runner for workflows and transformations. + +Acceptance: + +- Runs expose queued, running, waiting, completed, failed, retried, canceled, + and partially completed states. +- Safe retry, resume, and cancellation behavior is defined per operation. +- Recovery actions do not require direct storage edits. + +## O8.6 - Implement review gates human tasks and exception queues + +```task +id: KONT-WP-0008-T006 +status: todo +priority: medium +state_hub_task_id: "5fae9005-4d64-4fca-8c51-a19405512377" +``` + +Add workflow primitives for review, approval, correction, rejection, +low-confidence handling, policy conflicts, and blocked exceptions. + +Acceptance: + +- Sensitive or high-impact outputs can pause for human review. +- Exception queues expose failed, blocked, low-confidence, policy-conflicted, + or review-required items. +- Review decisions continue, reject, correct, retry, or escalate runs. + +## O8.7 - Audit workflow and transformation operations + +```task +id: KONT-WP-0008-T007 +status: todo +priority: medium +state_hub_task_id: "9e06aa46-3988-4389-99ec-0a934c68af1b" +``` + +Audit template changes, run starts, step executions, retries, cancellations, +approvals, failures, outputs, and derived artifact changes. + +Acceptance: + +- A workflow run can be reconstructed from run records and audit events. +- Audit records include actor, operation, target, outcome, correlation ID, and + policy context. +- Derived artifact audit events connect to source lineage. + +## Definition Of Done + +- Transformations and workflows produce inspectable run records and audit + events. +- Derived artifacts are persistent, governed, and lineage-linked. +- `python3 -m pytest` passes. 
diff --git a/workplans/KONT-WP-0009-service-api-agent-safe-operation.md b/workplans/KONT-WP-0009-service-api-agent-safe-operation.md new file mode 100644 index 0000000..4cd8e9d --- /dev/null +++ b/workplans/KONT-WP-0009-service-api-agent-safe-operation.md @@ -0,0 +1,172 @@ +--- +id: KONT-WP-0009 +type: workplan +title: "Service API And Agent-Safe Operation" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 9 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "6e672b1a-2e57-489e-8516-cb75611d4354" +--- + +# KONT-WP-0009: Service API And Agent-Safe Operation + +## Purpose + +Expose the engine through versioned service APIs and explicit agent-safe +operations. This workplan turns the programmatic contracts into a headless +service surface for assets, metadata, relationships, ingestion, retrieval, +transformations, workflows, permissions, audit, context packages, and bounded +agent actions. + +## Requirement Coverage + +Primary: FR-160 to FR-169 and FR-180 to FR-188. + +Supporting: FR-060 to FR-066, FR-080 to FR-085, FR-100 to FR-106, +FR-120 to FR-126, FR-200 to FR-202, FR-240 to FR-245. + +## S9.1 - Implement versioned FastAPI service skeleton and health contracts + +```task +id: KONT-WP-0009-T001 +status: todo +priority: high +state_hub_task_id: "bdb2380e-4ea1-4b8c-a6c9-fc8da2122813" +``` + +Add the first optional FastAPI service layer while keeping core behavior in +programmatic contracts. + +Acceptance: + +- Service startup, health, readiness, version, and OpenAPI output are tested. +- Service code wraps core contracts rather than becoming the architecture. +- API versioning policy is documented for MVP. 
+ +## S9.2 - Expose asset metadata relationship audit and policy APIs + +```task +id: KONT-WP-0009-T002 +status: todo +priority: high +state_hub_task_id: "a37e5ba3-e128-4100-b22c-c85cca3f8db3" +``` + +Expose service APIs for asset lifecycle, metadata, classifications, +relationships, policies, permissions, lifecycle state, and audit events. + +Acceptance: + +- Core asset operations are available without a CLI or UI. +- Permission and policy checks run before protected operations. +- Audit history can be queried by authorized callers. + +## S9.3 - Expose ingestion retrieval transformation and workflow APIs + +```task +id: KONT-WP-0009-T003 +status: todo +priority: high +state_hub_task_id: "7271b26d-0dbb-4eca-9140-a7729ad296e4" +``` + +Expose APIs for ingestion jobs, query/retrieval, transformations, derived +artifacts, workflow templates, workflow runs, and job recovery actions. + +Acceptance: + +- Jobs return IDs, state, outputs, failures, retry options, and correlation + IDs. +- Retrieval results are permission-aware and source-grounded. +- Transformations and workflows expose lineage and audit references. + +## S9.4 - Implement actor context delegation and authorization middleware + +```task +id: KONT-WP-0009-T004 +status: todo +priority: high +state_hub_task_id: "7becdec7-ddbb-497f-b762-77043e16046e" +``` + +Implement request-level actor context for human users, applications, +automation, service accounts, delegated users, and AI agents. + +Acceptance: + +- Every material service operation has actor context. +- Delegation and agent identity are represented explicitly. +- Authorization failures do not leak protected content in errors or result + shapes. 
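The actor context and non-leaking failures in S9.4 can be illustrated without any web framework. This sketch assumes a simple role check; `ActorContext`, `AccessDenied`, and `authorize` are hypothetical names, and a real policy check would consider far more than roles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActorContext:
    actor_id: str
    actor_type: str                     # "human", "application", "automation", "service", "agent"
    on_behalf_of: Optional[str] = None  # delegated user, if any
    roles: frozenset = frozenset()


class AccessDenied(Exception):
    """Carries no detail about the protected target, by design."""

    def __init__(self, correlation_id: str):
        super().__init__("access denied")
        self.code = "access_denied"
        self.correlation_id = correlation_id


def authorize(actor, required_role: str, correlation_id: str) -> ActorContext:
    """Return the actor if allowed, otherwise raise a uniform error."""
    # Fail closed: a missing actor and a missing role produce the same error,
    # so the response shape does not reveal whether the target exists.
    if actor is None or required_role not in actor.roles:
        raise AccessDenied(correlation_id)
    return actor
```

Keeping `on_behalf_of` as its own field makes delegation visible in audit records instead of folding it into the actor ID.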
+ +## S9.5 - Implement bounded agent operation catalog + +```task +id: KONT-WP-0009-T005 +status: todo +priority: high +state_hub_task_id: "fc9e1def-229c-4224-8fd3-6fd4f9785c27" +``` + +Define and expose explicit agent operations for inspect, search, retrieve, +assemble context, enrich metadata, classify, transform, invoke workflow, submit +review, and report result. + +Acceptance: + +- Agents can only act through documented operations. +- Each operation declares inputs, outputs, permission requirements, audit + behavior, and failure modes. +- Agent operations are auditable separately from human and deterministic + automation actions. + +## S9.6 - Implement context package API with policy constraints + +```task +id: KONT-WP-0009-T006 +status: todo +priority: medium +state_hub_task_id: "9ff1d345-d0a1-46eb-ae9a-f6beba2fa5e9" +``` + +Provide bounded context packages containing selected assets, snippets, +metadata, relationships, provenance, task instructions, and policy constraints. + +Acceptance: + +- Context packages do not require unrestricted repository access. +- Package contents are source-grounded and permission filtered. +- External memory references remain opaque and respect + `docs/phase-memory-boundary.md`. + +## S9.7 - Implement dry-run review-gate and contract-test coverage + +```task +id: KONT-WP-0009-T007 +status: todo +priority: medium +state_hub_task_id: "bbbdec75-d3c0-4367-b073-ef9c5dffa2b7" +``` + +Add dry-run and review-gate behavior for destructive, sensitive, externally +published, or high-impact service and agent operations. + +Acceptance: + +- Risky actions can be denied, dry-run, or routed to review. +- Contract tests cover API errors, authorization failures, review-required + responses, and partial failures. +- OpenAPI output remains stable for implemented endpoints. + +## Definition Of Done + +- The service API exposes the MVP operation surface without requiring UI. 
+- Agent-safe operations are explicit, bounded, permissioned, auditable, and + reviewable. +- `python3 -m pytest` passes. diff --git a/workplans/KONT-WP-0010-observability-export-enterprise-readiness.md b/workplans/KONT-WP-0010-observability-export-enterprise-readiness.md new file mode 100644 index 0000000..626cb13 --- /dev/null +++ b/workplans/KONT-WP-0010-observability-export-enterprise-readiness.md @@ -0,0 +1,179 @@ +--- +id: KONT-WP-0010 +type: workplan +title: "Observability Export And Enterprise Readiness" +domain: markitect +repo: kontextual-engine +status: todo +owner: codex +topic_slug: markitect +planning_priority: high +planning_order: 10 +created: "2026-05-05" +updated: "2026-05-05" +state_hub_workstream_id: "09d769a5-a3cf-4cdf-ae5e-b4ecf767f109" +--- + +# KONT-WP-0010: Observability Export And Enterprise Readiness + +## Purpose + +Add the operational surfaces that make the engine inspectable, recoverable, +portable, measurable, and ready for enterprise-oriented expansion: metrics, +events, job inspection, recovery actions, governed export packages, governance +inspection, extension hooks, backend abstraction readiness, quality signals, +cost signals, and MVP compliance reporting. + +## Requirement Coverage + +Primary: FR-200 to FR-207 and FR-220 to FR-225. + +Supporting: FR-183 to FR-188, FR-127 to FR-132, FR-070, FR-166 to FR-168, +FR-240 to FR-245. + +## E10.1 - Expose operational metrics events and job inspection + +```task +id: KONT-WP-0010-T001 +status: todo +priority: high +state_hub_task_id: "ce6cfbc4-b171-4f03-a27b-c46abbde85a0" +``` + +Expose operational telemetry for ingestion, retrieval, indexing, +transformations, workflow jobs, permissions, audit, exports, and service +health. + +Acceptance: + +- Operators can inspect current and historical job state. +- Metrics include ingestion throughput, query latency, API latency, workflow + completion, failure rate, queue age, and storage/index health. 
+- Events use correlation IDs that line up with audit records. + +## E10.2 - Implement administrative recovery actions + +```task +id: KONT-WP-0010-T002 +status: todo +priority: high +state_hub_task_id: "8f0ead65-79be-42e3-8ec8-43d146bb3934" +``` + +Provide authorized recovery actions for retry, re-run, re-index, cancel, +quarantine, repair, and failure inspection. + +Acceptance: + +- Recovery actions enforce permissions and audit events. +- Common ingestion, indexing, workflow, and transformation failures are + recoverable without direct database edits. +- Partial failure reports remain available after recovery. + +## E10.3 - Implement export packages manifests and integrity validation + +```task +id: KONT-WP-0010-T003 +status: todo +priority: high +state_hub_task_id: "54ed199f-636e-4cfd-898f-fd6ad0057b61" +``` + +Implement governed export packages for assets, normalized representations, +metadata, relationships, provenance, versions, audit references, and derived +artifacts. + +Acceptance: + +- Exports can be scoped by asset ID, collection, query, workflow run, source + system, lifecycle state, date range, or governance policy. +- Export manifests include schema version, counts, hashes, actor, time, and + policy context. +- Export validation can detect missing records or integrity mismatches. + +## E10.4 - Implement governance inspection and reporting hooks + +```task +id: KONT-WP-0010-T004 +status: todo +priority: medium +state_hub_task_id: "c62c5f36-30d9-4469-90cf-5dc3d37588ba" +``` + +Expose governance inspection for permission coverage, policy gaps, stale +permissions, missing metadata, lifecycle exceptions, access anomalies, retention +coverage, legal holds, and audit completeness. + +Acceptance: + +- Governance reports can be generated for selected scopes. +- Reports identify under-classified, overexposed, stale, held, or + policy-conflicted assets. +- Reporting respects authorization and redaction policy. 
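A governance report of the kind described in E10.4 could start as a simple scan over asset records. The record fields, the 90-day staleness threshold, and the `governance_report` function are all assumptions made for this sketch; authorization and redaction of the report itself are not shown.

```python
from datetime import datetime, timedelta, timezone


def governance_report(assets, now, max_permission_age=timedelta(days=90)):
    """Bucket asset IDs by governance finding.

    Each asset is a dict with "asset_id" and optional "classification",
    "permissions_checked_at" (timezone-aware datetime), and "legal_hold".
    """
    report = {"under_classified": [], "stale_permissions": [], "held": []}
    for asset in assets:
        if not asset.get("classification"):
            report["under_classified"].append(asset["asset_id"])
        checked = asset.get("permissions_checked_at")
        # A missing check time counts as stale rather than as fresh.
        if checked is None or now - checked > max_permission_age:
            report["stale_permissions"].append(asset["asset_id"])
        if asset.get("legal_hold", False):
            report["held"].append(asset["asset_id"])
    return report
```

Passing `now` in explicitly keeps the report deterministic under test.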
+
+## E10.5 - Implement extension events webhooks and backend abstraction readiness
+
+```task
+id: KONT-WP-0010-T005
+status: todo
+priority: medium
+state_hub_task_id: "f1713b41-0535-47fc-ba7e-054aea93f8cf"
+```
+
+Prepare the extension surface for source adapters, extractors,
+transformations, validators, policy modules, webhooks, events, and backend
+swapping.
+
+Acceptance:
+
+- Extension points are documented and covered by contract tests.
+- Events can be emitted for asset changes, ingestion completion, workflow
+  status, policy exceptions, derived artifact creation, and review decisions.
+- Storage, index, queue, workflow, AI, and model backend abstractions remain
+  externally semantic-preserving.
+
+## E10.6 - Capture retrieval AI cost and quality signals
+
+```task
+id: KONT-WP-0010-T006
+status: todo
+priority: medium
+state_hub_task_id: "1d36035a-b211-49e9-935c-382d52aa3639"
+```
+
+Capture retrieval quality, AI operation, and cost signals where available.
+
+Acceptance:
+
+- Retrieval metrics include precision hooks, zero-result rate, low-confidence
+  result rate, and feedback counts.
+- AI usage can record model calls, token or compute usage, provider errors, and
+  estimated operation cost where adapters provide them.
+- Signals can be attributed to assets, workflows, agents, applications, and
+  actors.
+
+## E10.7 - Add performance smoke tests and MVP compliance report
+
+```task
+id: KONT-WP-0010-T007
+status: todo
+priority: medium
+state_hub_task_id: "057c7bcf-f224-4d9f-9161-6bfff4948e95"
+```
+
+Create smoke tests and a compliance report against the V0.2 MVP acceptance
+perspective.
+
+Acceptance:
+
+- Smoke tests measure representative ingestion, query, workflow, and export
+  behavior.
+- MVP compliance report maps implemented behavior to FRS P0 requirements.
+- Remaining P1/P2 gaps are explicit and prioritized.
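
Reviewer note on E10.6: one possible way to compute the retrieval signals named above (zero-result rate, low-confidence result rate, feedback counts) is sketched below. The `QueryOutcome` record, the `retrieval_signals` function, and the 0.5 confidence threshold are hypothetical and chosen only for illustration; the workplan does not define how confidence is scored or where the threshold sits.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class QueryOutcome:
    """Hypothetical per-query record captured by the retrieval layer."""
    result_count: int
    top_score: Optional[float]  # None when the query returned nothing
    feedback_count: int = 0


def retrieval_signals(outcomes: Sequence[QueryOutcome],
                      low_confidence_threshold: float = 0.5) -> Dict[str, float]:
    """Aggregate simple quality signals over a batch of query outcomes."""
    total = len(outcomes)
    if total == 0:
        # Avoid division by zero; an empty batch has no measurable rates.
        return {"queries": 0, "zero_result_rate": 0.0,
                "low_confidence_rate": 0.0, "feedback_count": 0}
    zero_results = sum(1 for o in outcomes if o.result_count == 0)
    low_confidence = sum(
        1 for o in outcomes
        if o.result_count > 0
        and o.top_score is not None
        and o.top_score < low_confidence_threshold
    )
    return {
        "queries": total,
        "zero_result_rate": zero_results / total,
        "low_confidence_rate": low_confidence / total,
        "feedback_count": sum(o.feedback_count for o in outcomes),
    }
```

Attribution to assets, workflows, agents, applications, and actors could then be handled by grouping outcomes by the relevant key before calling the aggregate, so the aggregation itself stays independent of any one attribution dimension.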
+
+## Definition Of Done
+
+- Operators can inspect, diagnose, recover, export, and evaluate MVP engine
+  behavior through supported surfaces.
+- Export packages preserve enough context for inspection and migration.
+- `python3 -m pytest` passes.