Architecture Blueprint
Date: 2026-05-05
Status: planning baseline for the V0.2 knowledge operations roadmap.
This blueprint defines the target architecture for kontextual-engine as a
headless knowledge operations engine. It should guide implementation workplans
without freezing every internal detail too early.
Architectural Aim
kontextual-engine should make heterogeneous information assets durable,
contextual, governed, retrievable, transformable, and agent-operable through a
clean backend architecture.
The architecture should optimize for:
- stable knowledge asset identity,
- explicit source provenance,
- source, normalized, and derived representations,
- governed retrieval and transformation,
- traceable workflow and job execution,
- auditability and structured errors,
- provider-neutral adapters,
- agent-safe operations,
- exportability and operational visibility.
Non-Negotiable Rules
- Core behavior lives in domain and application services, not in HTTP routes.
- All material operations accept explicit actor and operation context.
- Permission and policy checks happen before content exposure or mutation.
- Source content, normalized representations, and derived artifacts remain distinct.
- Derived artifacts preserve lineage to sources, versions, parameters, actor, policy context, and operation run.
- Audit records are emitted for material operations by default.
- External systems are reached through ports and adapters.
- Agent operations are explicit catalog entries, not unrestricted internal method access.
- The service API wraps stable contracts; it does not define the domain model.
- The MVP can use local-first backends, but contracts must not assume one storage, search, workflow, AI, or policy provider.
System Shape

    clients, apps, workflows, agents
      -> service API / SDK / operation catalog
      -> application services
      -> domain core
      -> ports
      -> adapters and infrastructure

The dependency direction should point inward:

    api adapters -> application services -> domain core <- repository/search/workflow ports
                                                                              ^
                                                                              |
                                                                   infrastructure adapters

No domain model should import FastAPI, SQLite, HTTP clients, LLM providers, document parsing libraries, or source-system SDKs directly.
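The inward dependency rule is easiest to keep when a test enforces it. A minimal sketch, assuming the check runs over the source text of each domain module; the deny-list below (including the `markitect_tool` import name) is a hypothetical example, not the project's agreed list.

```python
import ast

# Hypothetical deny-list: top-level modules the domain core must not import.
FORBIDDEN_IN_CORE = frozenset(
    {"fastapi", "sqlite3", "httpx", "requests", "openai", "markitect_tool"}
)


def forbidden_imports(source: str, forbidden: frozenset = FORBIDDEN_IN_CORE) -> list[str]:
    """Return sorted top-level module names imported by `source` that are on the deny-list."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # absolute "from x.y import z" only; relative imports stay in-package
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in forbidden:
                found.add(top_level)
    return sorted(found)


domain_module = "from dataclasses import dataclass\nfrom fastapi import FastAPI\nimport sqlite3.dbapi2\n"
print(forbidden_imports(domain_module))  # ['fastapi', 'sqlite3']
```

Run over every file under src/kontextual_engine/core/ in the test suite, a check like this turns the rule into a failing test rather than a review comment.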
Package Layout
The current flat package can evolve incrementally into this shape:

    src/kontextual_engine/
      core/
        assets.py              stable asset identity and representations
        metadata.py            metadata, classification, lifecycle, schemas
        relationships.py       typed relationships and contextual entities
        provenance.py          source refs, lineage, versions, changes
        actors.py              actors, delegated actors, operation context
        policy.py              policy inputs, decisions, review requirements
        audit.py               audit events and correlation IDs
        errors.py              structured errors and diagnostics
      services/
        asset_service.py       create/update/retire assets
        ingestion_service.py   submit and complete ingestion jobs
        retrieval_service.py   search, filter, snippets, context retrieval
        transform_service.py   transformation runs and derived artifacts
        workflow_service.py    templates, runs, steps, retries, exceptions
        agent_service.py       bounded agent operation catalog
        export_service.py      export packages and validation
      ports/
        repositories.py        asset, audit, run, export repositories
        object_store.py        source/normalized/derived content storage
        search.py              lexical/semantic search index port
        extractors.py          format extraction and normalization port
        connectors.py          source connector port
        policy.py              authorization and policy decision port
        events.py              event publisher and webhook port
        ai.py                  provider-neutral AI/model operation port
      adapters/
        memory/                deterministic in-memory test adapters
        sqlite/                local-first durable repository
        local_files/           local file/directory connector
        markitect_tool/        markdown syntax adapter
        builtin_extractors/    text/csv/simple document extraction
      api/
        app.py                 FastAPI app factory
        routes/                versioned HTTP routes
        schemas/               API request/response DTOs

Migration can be gradual. Existing modules can be retained temporarily as compatibility facades while new code moves into the layered structure.
Domain Core
The domain core should be deterministic, import-light, and usable without a running service.
Primary entities:
| Entity | Responsibility |
|---|---|
| KnowledgeAsset | Stable asset identity and current operational state. |
| SourceReference | Origin information: source system, path, URL, external ID, checksum, connector reference. |
| AssetRepresentation | Source, normalized, or derived content form with media type, digest, size, producer, and storage reference. |
| AssetVersion | Traceable version of content, metadata, relationships, lifecycle, or derived output. |
| MetadataRecord | Standard and custom metadata with provenance and confidence where relevant. |
| Classification | Type, topic, sensitivity, lifecycle, operational category, review state. |
| ContextEntity | Person, team, project, case, customer, product, process, source system, topic, or business object. |
| Relationship | Typed link between assets or between an asset and a contextual entity. |
| Actor | Human, application, automation, service account, or AI agent identity. |
| OperationContext | Actor, delegated identity, correlation ID, request scope, policy scope, and operation metadata. |
| PolicyDecision | Allow, deny, redact, require review, dry-run only, or fail-closed result. |
| AuditEvent | Material operation record with actor, target, operation, outcome, correlation ID, and policy context. |
| IngestionJob | Observable ingestion request and status. |
| TransformationRun | Traceable operation over assets producing derived artifacts. |
| WorkflowTemplate | Reusable workflow definition with steps, dependencies, inputs, outputs, and policy. |
| WorkflowRun | Executed workflow instance and step state. |
| ExportPackage | Governed export with manifest, integrity data, and selected records. |
The old Artifact vocabulary can map to KnowledgeAsset and
AssetRepresentation. The old Collection vocabulary remains useful as an
organizational container, but assets should eventually support multiple
collections/scopes where needed.
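A minimal sketch of the asset/representation split using only the standard library. Field names follow the entity table; the `RepresentationKind` enum, the `sha256:` digest prefix, and the `from_bytes`/`with_representation` helpers are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, replace
from enum import Enum


class RepresentationKind(str, Enum):
    SOURCE = "source"
    NORMALIZED = "normalized"
    DERIVED = "derived"


@dataclass(frozen=True)
class AssetRepresentation:
    kind: RepresentationKind
    media_type: str
    digest: str
    size: int
    producer: str
    storage_ref: str  # opaque reference into the object/content store

    @classmethod
    def from_bytes(cls, kind: RepresentationKind, media_type: str, payload: bytes,
                   producer: str, storage_ref: str) -> "AssetRepresentation":
        digest = "sha256:" + hashlib.sha256(payload).hexdigest()
        return cls(kind, media_type, digest, len(payload), producer, storage_ref)


@dataclass(frozen=True)
class KnowledgeAsset:
    asset_id: str  # stable identity; not derived from content or storage location
    lifecycle: str = "active"
    representations: tuple[AssetRepresentation, ...] = ()

    def with_representation(self, rep: AssetRepresentation) -> "KnowledgeAsset":
        # Returns a new value: the asset ID stays fixed while representations accumulate.
        return replace(self, representations=self.representations + (rep,))

    def representations_of(self, kind: RepresentationKind) -> list[AssetRepresentation]:
        return [r for r in self.representations if r.kind == kind]


asset = KnowledgeAsset("asset-001")
src = AssetRepresentation.from_bytes(
    RepresentationKind.SOURCE, "text/csv", b"a,b\n1,2\n", "local_files", "obj/1"
)
updated = asset.with_representation(src)
print(updated.asset_id, len(updated.representations_of(RepresentationKind.SOURCE)))  # asset-001 1
```

Frozen dataclasses keep domain values import-light and make every change an explicit new value, which suits versioning and audit.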
Application Services
Application services coordinate domain rules and ports. They should be thin but not anemic; this is where operation ordering, policy checks, audit emission, and repository updates meet.
Every material service method should follow this pattern:
1. validate input
2. resolve actor and operation context
3. load required state
4. authorize through policy port
5. perform deterministic domain change or submit job
6. persist changes
7. emit audit and events
8. return typed result or structured error
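The eight steps of this pattern might look like this for a single material operation. This is a sketch, not the engine's API: the `retire_asset` method, the `asset.retire` operation name, and the use of a dict, a callable, and a list in place of the repository, policy, and audit ports are assumptions made to keep it self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class OperationContext:
    actor_id: str
    correlation_id: str


@dataclass(frozen=True)
class StructuredError:
    code: str
    message: str
    correlation_id: str


# Hypothetical policy port signature: (context, operation, target_id) -> allowed?
Authorize = Callable[[OperationContext, str, str], bool]


class AssetService:
    def __init__(self, repo: dict, authorize: Authorize, audit_log: list) -> None:
        self._repo = repo            # stands in for the repository port
        self._authorize = authorize  # stands in for the policy port
        self._audit = audit_log      # stands in for the audit/event ports

    def retire_asset(self, ctx: OperationContext, asset_id: str) -> Union[dict, StructuredError]:
        op = "asset.retire"
        # 1. validate input
        if not asset_id:
            return StructuredError("invalid_input", "asset_id is required", ctx.correlation_id)
        # 2. resolve actor and operation context (here: require an explicit actor)
        if not ctx.actor_id:
            return StructuredError("missing_actor", "an explicit actor is required", ctx.correlation_id)
        # 3. load required state
        asset = self._repo.get(asset_id)
        if asset is None:
            return StructuredError("not_found", f"asset {asset_id} not found", ctx.correlation_id)
        # 4. authorize through the policy port before any mutation
        if not self._authorize(ctx, op, asset_id):
            self._emit(ctx, op, asset_id, "denied")
            return StructuredError("forbidden", "policy denied the operation", ctx.correlation_id)
        # 5. deterministic domain change
        retired = {**asset, "lifecycle": "retired"}
        # 6. persist
        self._repo[asset_id] = retired
        # 7. emit audit
        self._emit(ctx, op, asset_id, "success")
        # 8. return typed result
        return retired

    def _emit(self, ctx: OperationContext, op: str, target: str, outcome: str) -> None:
        self._audit.append({"actor": ctx.actor_id, "operation": op, "target": target,
                            "outcome": outcome, "correlation_id": ctx.correlation_id})


repo = {"a1": {"id": "a1", "lifecycle": "active"}}
audit_log: list = []
service = AssetService(repo, lambda ctx, op, target: ctx.actor_id == "alice", audit_log)
ok = service.retire_asset(OperationContext("alice", "corr-1"), "a1")
denied = service.retire_asset(OperationContext("bob", "corr-2"), "a1")
```

Denials are audited as well, following the rule that audit records are emitted for material operations by default.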
Suggested service boundaries:
- AssetService: create, retrieve, update, retire, delete request, metadata, classification, versioning, relationship changes.
- IngestionService: submit ingestion, run extraction, validate normalized output, quarantine failures, reconcile re-ingestion.
- RetrievalService: query, text search, filters, context graph retrieval, snippets, permission filtering, feedback.
- TransformationService: operation registry, transformation runs, derived artifacts, lineage, review requirements.
- WorkflowService: workflow templates, run execution, retries, cancel, resume, exception queues, human tasks.
- AgentService: bounded agent operations, context packages, dry runs, review gates, agent audit.
- ExportService: package selection, manifest, integrity validation, permission-aware export.
Ports And Adapters
Ports are stable interfaces owned by the engine. Adapters are replaceable implementations.
Required MVP ports:
- Repository port for assets, representations, metadata, relationships, versions, runs, audit events, and exports.
- Object/content store port for source, normalized, and derived content payloads.
- Search index port for lexical search and later semantic/hybrid retrieval.
- Extractor port for format-specific normalization.
- Connector port for source systems.
- Policy decision port for authorization and review requirements.
- Event publisher port for observability, webhooks, and integration.
- AI/model port for provider-neutral summarization, classification, extraction, embedding, or generation when enabled.
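Ports can be expressed as `typing.Protocol` classes, so adapters satisfy them structurally without inheriting from engine classes. A sketch of the object/content store port with an in-memory adapter; the `put`/`get` method names and the content-addressed `sha256:` refs are assumptions, not a settled contract.

```python
import hashlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class ObjectStore(Protocol):
    """Port owned by the engine: store and fetch opaque content payloads."""

    def put(self, payload: bytes) -> str: ...

    def get(self, ref: str) -> bytes: ...


class InMemoryObjectStore:
    """Deterministic test adapter; content-addressed, so identical payloads share one ref."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        ref = "sha256:" + hashlib.sha256(payload).hexdigest()
        self._blobs[ref] = payload
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]  # raises KeyError for unknown refs


store: ObjectStore = InMemoryObjectStore()
ref = store.put(b"normalized text")
assert store.get(ref) == b"normalized text"
```

`runtime_checkable` only checks that the methods exist, not their signatures, so a shared contract test suite run against each adapter (memory, SQLite, and later ones) is still worth having.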
Adapter rules:
- markitect-tool is an adapter for markdown syntax, selector extraction, deterministic markdown operations, snapshot identity, contracts/runtime checks, and context-package interoperability. Engine domain code must not import it directly; adapter code should persist serializable Markitect outputs as adapter provenance or representation metadata.
- llm-connector, or an equivalent, is an adapter for LLM providers.
- phase-memory is an adjacent memory runtime; this engine may exchange opaque memory references or context packages but should not implement memory phases.
- SQLite is an MVP repository adapter, not the domain model.
- Semantic/vector search is an optional retrieval adapter, not the definition of retrieval.
Persistence Blueprint
Use SQLite first, both for local-first durability and for tests that prove state survives repository re-instantiation.
Core tables should map to stable domain concepts:
- assets,
- source references,
- representations,
- metadata records,
- classifications,
- contextual entities,
- relationships,
- versions,
- change records,
- actors,
- policy assignments or policy references,
- audit events,
- ingestion jobs,
- transformation runs,
- workflow templates,
- workflow runs and step runs,
- export packages and manifests.
Recommended storage style:
- Relational columns for identifiers, types, status, timestamps, digests, foreign keys, and lifecycle fields.
- JSON columns for flexible metadata, extractor details, policy context, and adapter-specific payloads.
- Separate content/object references for large source, normalized, or derived payloads.
- Append-only audit events and change records.
- Deterministic ordering fields for pagination and tests.
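A sketch of how that storage style might translate into SQLite DDL using Python's built-in `sqlite3` module. Table and column names are assumptions and only three of the core tables are shown. JSON is stored as `TEXT`, so the sketch does not depend on SQLite's JSON functions, and append-only behavior is enforced with triggers that abort `UPDATE` and `DELETE`.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS assets (
    asset_id   TEXT PRIMARY KEY,
    asset_type TEXT NOT NULL,
    lifecycle  TEXT NOT NULL,
    created_at TEXT NOT NULL,
    metadata   TEXT NOT NULL DEFAULT '{}'  -- JSON: flexible metadata
);
CREATE TABLE IF NOT EXISTS representations (
    asset_id    TEXT NOT NULL REFERENCES assets(asset_id),
    kind        TEXT NOT NULL CHECK (kind IN ('source', 'normalized', 'derived')),
    digest      TEXT NOT NULL,
    storage_ref TEXT NOT NULL,              -- payload lives in the object store
    PRIMARY KEY (asset_id, kind, digest)
);
CREATE TABLE IF NOT EXISTS audit_events (
    seq            INTEGER PRIMARY KEY AUTOINCREMENT,  -- deterministic ordering
    correlation_id TEXT NOT NULL,
    actor_id       TEXT NOT NULL,
    operation      TEXT NOT NULL,
    target         TEXT,
    outcome        TEXT NOT NULL,
    policy_context TEXT NOT NULL DEFAULT '{}'          -- JSON
);
CREATE TRIGGER IF NOT EXISTS audit_events_no_update BEFORE UPDATE ON audit_events
BEGIN SELECT RAISE(ABORT, 'audit_events is append-only'); END;
CREATE TRIGGER IF NOT EXISTS audit_events_no_delete BEFORE DELETE ON audit_events
BEGIN SELECT RAISE(ABORT, 'audit_events is append-only'); END;
"""


def open_repository(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # idempotent thanks to IF NOT EXISTS
    return conn


def append_audit(conn: sqlite3.Connection, correlation_id: str, actor_id: str, operation: str,
                 target: str, outcome: str, policy_context: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_events"
            " (correlation_id, actor_id, operation, target, outcome, policy_context)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (correlation_id, actor_id, operation, target, outcome,
             json.dumps(policy_context, sort_keys=True)),
        )
```

Opening a file-backed database, closing it, and reopening it with `open_repository` is the kind of re-instantiation test the persistence rule calls for.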
Do not store permission-sensitive content in search indexes unless the retrieval layer can enforce permissions before exposing results.
Retrieval Blueprint
MVP retrieval should be useful before semantic search:
- Retrieve by stable asset ID.
- Filter by metadata, classification, lifecycle, source, collection, and time.
- Search normalized text lexically.
- Retrieve by relationship and contextual entity.
- Return source-grounded snippets and explanation data.
- Enforce permissions before returning content, snippets, or relationship data.
- Capture feedback and quality signals.
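A deliberately naive sketch of lexical retrieval that applies the permission check before any snippet is built. `can_read` stands in for the policy port, and the all-terms substring match is an assumption for illustration; a real adapter would sit behind the search port and use an index (for example SQLite FTS5).

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Hit:
    asset_id: str
    snippet: str


def lexical_search(query: str, normalized_text: Mapping[str, str],
                   can_read: Callable[[str], bool], width: int = 30) -> list[Hit]:
    """All-terms match over normalized text; unreadable assets are skipped before any content is used."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = []
    for asset_id in sorted(normalized_text):  # deterministic ordering
        if not can_read(asset_id):
            continue  # no content, snippet, or hit count leaks for unreadable assets
        text = normalized_text[asset_id]
        lowered = text.lower()  # assumes lowercasing keeps offsets aligned (true for ASCII)
        positions = [lowered.find(term) for term in terms]
        if all(pos >= 0 for pos in positions):
            first = min(positions)
            hits.append(Hit(asset_id, text[max(0, first - width): first + width]))
    return hits


docs = {"a1": "Quarterly revenue report for finance", "a2": "Confidential revenue forecast"}
hits = lexical_search("revenue", docs, can_read=lambda asset_id: asset_id != "a2")
```

Because snippets are cut from source text after the permission check, each result stays source-grounded and the filter cannot be bypassed by ranking.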
Later retrieval can add:
- semantic/vector search,
- hybrid ranking,
- facets and aggregations,
- grounded answer packages,
- federated external-source retrieval.
The retrieval contract should not expose backend-specific ranking internals as stable API.
Workflow And Transformation Blueprint
Transformations and workflows should share a common run model.
Transformation:

    source assets + versions + parameters + actor + policy context
      -> transformation run
      -> derived artifact representation
      -> lineage + audit + event

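Lineage for a transformation run can be captured as one immutable record. A sketch under assumed field names, with a fingerprint that hashes everything except the run ID, so two runs over the same versions, parameters, actor, policy context, and output can be recognized as equivalent.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LineageRecord:
    run_id: str
    operation: str
    inputs: tuple[tuple[str, str], ...]      # sorted (asset_id, version_id) pairs
    parameters: tuple[tuple[str, str], ...]  # sorted (name, value) pairs
    actor_id: str
    policy_context: str
    output_digest: str

    def fingerprint(self) -> str:
        """Stable hash of every lineage field except the run ID."""
        data = asdict(self)
        del data["run_id"]
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_lineage(run_id: str, operation: str, inputs: dict[str, str],
                   parameters: dict[str, str], actor_id: str, policy_context: str,
                   output: bytes) -> LineageRecord:
    # Sorting makes the record independent of caller dict ordering.
    return LineageRecord(
        run_id=run_id,
        operation=operation,
        inputs=tuple(sorted(inputs.items())),
        parameters=tuple(sorted(parameters.items())),
        actor_id=actor_id,
        policy_context=policy_context,
        output_digest="sha256:" + hashlib.sha256(output).hexdigest(),
    )
```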
Workflow:

    template + inputs + actor + trigger
      -> workflow run
      -> step runs
      -> assets / metadata / relationships / derived artifacts / review tasks
      -> audit + events + metrics

MVP execution can be embedded in-process and either synchronous or lightweight asynchronous. The contracts should still allow later replacement with a queue or an external workflow engine.
Operation states should include queued, running, waiting, completed, failed, partially completed, retried, canceled, quarantined, and review required where applicable.
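The run states listed above can be modeled as an enum plus an explicit transition table shared by transformations and workflows. The state names come from this blueprint; the allowed transitions are a hypothetical starting point, not a decided state machine.

```python
from enum import Enum


class RunState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    WAITING = "waiting"
    COMPLETED = "completed"
    FAILED = "failed"
    PARTIALLY_COMPLETED = "partially_completed"
    RETRIED = "retried"
    CANCELED = "canceled"
    QUARANTINED = "quarantined"
    REVIEW_REQUIRED = "review_required"


S = RunState
# Hypothetical transition table; COMPLETED and CANCELED are terminal.
TRANSITIONS: dict[RunState, frozenset] = {
    S.QUEUED: frozenset({S.RUNNING, S.CANCELED}),
    S.RUNNING: frozenset({S.WAITING, S.COMPLETED, S.FAILED, S.PARTIALLY_COMPLETED,
                          S.QUARANTINED, S.REVIEW_REQUIRED, S.CANCELED}),
    S.WAITING: frozenset({S.RUNNING, S.CANCELED}),
    S.REVIEW_REQUIRED: frozenset({S.RUNNING, S.CANCELED}),
    S.FAILED: frozenset({S.RETRIED}),
    S.PARTIALLY_COMPLETED: frozenset({S.RETRIED}),
    S.QUARANTINED: frozenset({S.RETRIED, S.CANCELED}),
    S.RETRIED: frozenset({S.QUEUED}),
    S.COMPLETED: frozenset(),
    S.CANCELED: frozenset(),
}


def transition(current: RunState, target: RunState) -> RunState:
    """Return `target` if the move is allowed; otherwise raise instead of silently corrupting state."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal run transition {current.value} -> {target.value}")
    return target
```

Keeping the table as data makes it easy to audit and to replace when a queue or external workflow engine takes over execution.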
Policy, Governance, And Audit Blueprint
Policy is part of the core operating model, not a UI feature.
Policy inputs:
- actor and delegated actor,
- role and group membership,
- operation type,
- source-system permission context,
- sensitivity,
- lifecycle state,
- review state,
- asset policy,
- workflow state,
- requested output or export scope.
Policy outcomes:
- allow,
- deny,
- redact,
- require review,
- dry-run only,
- fail closed.
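A sketch of the policy outcomes and the fail-closed rule: whatever the policy port returns or raises, the caller receives a `PolicyDecision`, and errors or malformed answers become `fail_closed`. `decide`, `PolicyPort`, and `example_policy` with its role and sensitivity rules are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Mapping


class PolicyOutcome(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    REDACT = "redact"
    REQUIRE_REVIEW = "require_review"
    DRY_RUN_ONLY = "dry_run_only"
    FAIL_CLOSED = "fail_closed"


@dataclass(frozen=True)
class PolicyDecision:
    outcome: PolicyOutcome
    reason: str

    @property
    def may_expose_content(self) -> bool:
        # REDACT permits exposure only of the redacted form; every other outcome exposes nothing.
        return self.outcome in (PolicyOutcome.ALLOW, PolicyOutcome.REDACT)


PolicyPort = Callable[[Mapping[str, Any]], PolicyDecision]


def decide(policy: PolicyPort, inputs: Mapping[str, Any]) -> PolicyDecision:
    """Call the policy port; any exception or unrecognized result becomes fail-closed."""
    try:
        decision = policy(inputs)
    except Exception as exc:  # backend unavailable, timeout, bug in a rule, ...
        return PolicyDecision(PolicyOutcome.FAIL_CLOSED, f"policy error: {exc}")
    if not isinstance(decision, PolicyDecision):
        return PolicyDecision(PolicyOutcome.FAIL_CLOSED, "unrecognized policy result")
    return decision


def example_policy(inputs: Mapping[str, Any]) -> PolicyDecision:
    # Hypothetical rules: restricted exports need review; readers may retrieve/export; else deny.
    if inputs.get("operation") == "export" and inputs.get("sensitivity") == "restricted":
        return PolicyDecision(PolicyOutcome.REQUIRE_REVIEW, "restricted export")
    if inputs.get("operation") in {"retrieve", "export"} and "reader" in inputs.get("roles", ()):
        return PolicyDecision(PolicyOutcome.ALLOW, "reader role")
    return PolicyDecision(PolicyOutcome.DENY, "no matching rule")
```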
Audit should record material operations:
- asset creation and updates,
- ingestion,
- metadata and classification changes,
- relationship changes,
- permission or policy changes,
- retrieval/query where configured,
- transformation runs,
- workflow actions,
- export,
- agent operations,
- administrative recovery actions.
Agent-Safe Operation Blueprint
Agents are actors with explicit scope. They must not receive implicit privileged access.
Agent operations should be listed in a bounded catalog:
- inspect asset,
- search assets,
- retrieve permitted snippets,
- assemble context package,
- propose metadata enrichment,
- propose classification,
- request transformation,
- invoke workflow,
- submit review result,
- dry-run change,
- report generated output.
Each operation declares:
- input schema,
- output schema,
- required permissions,
- policy checks,
- audit behavior,
- review-gate behavior,
- failure modes,
- whether dry-run is supported.
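A bounded catalog can be a registry that agents resolve by name, where anything unregistered is refused. This sketch assumes hypothetical names (`AgentOperation`, `OperationCatalog`, the `propose_classification` entry and its permission strings) and uses JSON-Schema-like dicts for the input and output schemas.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AgentOperation:
    name: str
    input_schema: Mapping[str, Any]
    output_schema: Mapping[str, Any]
    required_permissions: frozenset
    policy_checks: tuple[str, ...]
    audited: bool
    review_gated: bool
    failure_modes: tuple[str, ...]
    supports_dry_run: bool


class OperationCatalog:
    """Agents can only resolve operations registered here; there is no fallback to internal methods."""

    def __init__(self, operations: list[AgentOperation]) -> None:
        self._ops = {op.name: op for op in operations}

    def names(self) -> list[str]:
        return sorted(self._ops)

    def resolve(self, name: str, granted: frozenset, dry_run: bool = False) -> AgentOperation:
        op = self._ops.get(name)
        if op is None:
            raise PermissionError(f"{name!r} is not in the agent operation catalog")
        missing = op.required_permissions - granted
        if missing:
            raise PermissionError(f"{name!r} requires {sorted(missing)}")
        if dry_run and not op.supports_dry_run:
            raise ValueError(f"{name!r} does not support dry-run")
        return op


PROPOSE_CLASSIFICATION = AgentOperation(
    name="propose_classification",
    input_schema={"type": "object", "required": ["asset_id"]},
    output_schema={"type": "object", "required": ["proposed_labels"]},
    required_permissions=frozenset({"asset:read", "classification:propose"}),
    policy_checks=("sensitivity", "lifecycle"),
    audited=True,
    review_gated=True,
    failure_modes=("asset_not_found", "policy_denied"),
    supports_dry_run=True,
)
catalog = OperationCatalog([PROPOSE_CLASSIFICATION])
```

Resolving an operation only returns its declaration; execution would still go through the owning application service, so policy checks and audit run on the same path as any other actor.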
Context packages should contain selected assets, snippets, metadata, relationships, provenance, task instructions, and policy constraints. They should be inspectable and bounded; they are not a back door to unrestricted repository access.
Service API Blueprint
The FastAPI service should be an adapter over application services.
Endpoint groups:
- /v1/assets
- /v1/metadata
- /v1/relationships
- /v1/ingestion/jobs
- /v1/retrieval/query
- /v1/transformations
- /v1/workflows
- /v1/audit
- /v1/policies
- /v1/agents/operations
- /v1/context-packages
- /v1/exports
- /v1/admin
- /health, /ready, /version
API DTOs may differ from domain objects. Keep mapping explicit so the domain can evolve without leaking internal storage shape.
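A sketch of explicit DTO mapping. In a FastAPI service the DTO would typically be a Pydantic model passed as `response_model`; plain dataclasses are used here to stay standard-library only, and the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:        # domain-side shape (simplified)
    asset_id: str
    lifecycle: str
    sensitivity: str
    storage_ref: str      # internal storage detail; must not leak through the API


@dataclass(frozen=True)
class AssetResponse:      # v1 API DTO
    id: str
    lifecycle: str
    sensitivity: str


def to_asset_response(asset: AssetRecord) -> AssetResponse:
    # Field-by-field mapping: adding a domain field never changes the API by accident.
    return AssetResponse(id=asset.asset_id, lifecycle=asset.lifecycle, sensitivity=asset.sensitivity)
```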
Observability And Export Blueprint
Observability must cover both system operation and product quality.
Operational signals:
- ingestion throughput,
- source-update-to-index latency,
- query latency,
- API latency,
- workflow completion rate,
- job failure rate,
- queue age,
- storage/index health,
- policy failures,
- audit completeness.
Quality signals:
- retrieval precision hooks,
- zero-result rate,
- low-confidence result rate,
- citation precision,
- unsupported-claim rate where AI adapters are used,
- manual correction rate,
- review turnaround time.
Export packages should include:
- selected assets,
- source and normalized representations where policy permits,
- metadata,
- relationships,
- provenance,
- versions,
- audit references,
- derived artifacts,
- manifest,
- schema version,
- checksums,
- actor and policy context.
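Manifest checksums make an export package verifiable after it leaves the engine. A sketch assuming the package is a mapping of relative paths to bytes and SHA-256 as the checksum algorithm; the manifest field names are illustrative.

```python
import hashlib
from typing import Any, Mapping


def build_manifest(files: Mapping[str, bytes], schema_version: str, actor_id: str,
                   policy_context: Mapping[str, Any]) -> dict:
    """Describe every packaged file with its size and SHA-256 checksum, in sorted path order."""
    return {
        "schema_version": schema_version,
        "actor_id": actor_id,
        "policy_context": dict(policy_context),
        "entries": [
            {"path": path, "size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
            for path, data in sorted(files.items())
        ],
    }


def validate_package(manifest: Mapping[str, Any], files: Mapping[str, bytes]) -> list[str]:
    """Return integrity problems; an empty list means the files match the manifest."""
    expected = {entry["path"]: entry for entry in manifest["entries"]}
    problems = []
    for path in sorted(set(expected) | set(files)):
        if path not in files:
            problems.append(f"missing: {path}")
        elif path not in expected:
            problems.append(f"unlisted: {path}")
        elif hashlib.sha256(files[path]).hexdigest() != expected[path]["sha256"]:
            problems.append(f"checksum mismatch: {path}")
    return problems
```

Validation reports missing, unlisted, and altered files separately, so a consumer can tell an incomplete package from a tampered one.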
Implementation Sequence
- Finish KONT-WP-0004 by turning this blueprint into concrete ADRs and module migration decisions.
- Build KONT-WP-0005 first as the governed asset registry foundation.
- Add ingestion jobs and source/format adapters in KONT-WP-0006.
- Build permission-aware retrieval and context graph behavior in KONT-WP-0007.
- Add transformations, derived artifacts, and workflow jobs in KONT-WP-0008.
- Expose service and agent-safe APIs in KONT-WP-0009.
- Add observability, export, and enterprise-readiness surfaces in KONT-WP-0010.
Review Checklist
Use this checklist before accepting significant implementation changes:
- Does the change preserve stable asset identity?
- Does it distinguish source, normalized, and derived representations?
- Does every material operation have actor context?
- Are permission checks applied before content exposure or mutation?
- Are audit events emitted or explicitly deferred?
- Are errors structured and traceable with correlation IDs?
- Does the code depend inward on domain contracts rather than outward on infrastructure?
- Is the extension point a port owned by the engine?
- Can the feature work with in-memory tests and a durable backend?
- Can an agent use the feature only through explicit bounded operations?