Cache Backend Architecture Blueprint
Date: 2026-05-03
Purpose
This blueprint defines an optional backend architecture for sophisticated
knowledge systems built on top of markitect-tool.
It is a research-lab architecture: powerful enough to support cached ASTs, advanced query backends, agent memory, and access control, but separated from the slim core so one-off CLI use stays fast and simple.
Architectural Boundary
The core package owns:
- Markdown parsing
- document contracts
- simple selectors
- deterministic transforms and generation primitives
- unified diagnostics
The optional backend fabric owns:
- persistent snapshots
- indexes
- advanced query adapters
- memory/context packages
- policy enforcement
- provenance records
- trace and performance metadata
The core must be able to run without the backend fabric.
Conceptual Layers
Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
       -> AST/JSON index
       -> full-text index
       -> vector/semantic index
       -> analytical/index export
  -> Query adapter registry
       -> simple selectors
       -> JSONPath
       -> SQL/FTS
       -> vector/hybrid retrieval
  -> Context package registry
       -> activated working sets
       -> memory namespaces
       -> agent-ready context bundles
  -> Access policy gateway
       -> labels/ACL/ReBAC/ABAC
       -> result filtering and denial diagnostics
  -> Provenance and observability
Core Interfaces
Snapshot Backend
Responsible for durable parsed-document snapshots.
Minimum protocol:
put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff
Snapshot identity should include:
- source content hash
- parser version
- parse options
- contract version when relevant
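The identity inputs listed above can be folded into one deterministic key. The following is a minimal sketch, not the project's implementation: the function name `snapshot_id`, the choice of SHA-256, and the canonical JSON encoding are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Optional


def snapshot_id(content: bytes, parser_version: str, parse_options: dict,
                contract_version: Optional[str] = None) -> str:
    """Derive a deterministic id from every input that affects the parse.

    Hypothetical helper: the blueprint names the inputs but not the encoding.
    """
    identity = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "parser_version": parser_version,
        "parse_options": parse_options,
        "contract_version": contract_version,
    }
    # Sorted keys and fixed separators keep the encoding stable, so the
    # same inputs always hash to the same id regardless of dict ordering.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, changing the parser version or any parse option yields a new snapshot id even when the source bytes are unchanged, which is what lets stale snapshots be detected without re-reading the file.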
Index Backend
Responsible for derived lookup structures.
Minimum protocol:
capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan
Capabilities should include:
- jsonpath
- sql
- fts
- vector
- hybrid
- inline_tokens
- section_graph
- policy_pushdown
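One possible shape for the capability report is a small immutable value that callers can check before routing a query. This is a sketch under assumptions: the blueprint only names `IndexCapabilities`, so the `names` field, the validation step, and the `supports` method are hypothetical.

```python
from dataclasses import dataclass

# Capability names taken from the list above.
KNOWN_CAPABILITIES = frozenset({
    "jsonpath", "sql", "fts", "vector", "hybrid",
    "inline_tokens", "section_graph", "policy_pushdown",
})


@dataclass(frozen=True)
class IndexCapabilities:
    """Hypothetical capability report returned by capabilities()."""
    names: frozenset

    def __post_init__(self):
        # Reject misspelled capabilities early rather than at query time.
        unknown = set(self.names) - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError("unknown capabilities: " + ", ".join(sorted(unknown)))

    def supports(self, *required: str) -> bool:
        """Return True only if every required capability is present."""
        return set(required) <= set(self.names)
```

A caller might then check `caps.supports("fts", "policy_pushdown")` before choosing a backend, and fall back to the core selectors when the check fails.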
Query Adapter
Translates a stable Markitect query request into backend-specific execution.
Minimum protocol:
name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation
Adapters must return a common result envelope:
- kind
- path
- value
- text
- source location
- snapshot id
- provenance
- policy decision
- backend metadata
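To make the adapter protocol and the common envelope concrete, here is a toy sketch. The envelope field names follow the list above; everything else is an assumption for illustration, including the `HeadingAdapter` class, the `dispatch` helper, and the choice to return a list of envelopes rather than a single `QueryResult` object.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResultEnvelope:
    """One result in the common envelope; field names mirror the list above."""
    kind: str
    path: str
    value: Any
    text: str
    source_location: Optional[tuple] = None   # (line, column), assumed shape
    snapshot_id: Optional[str] = None
    provenance: dict = field(default_factory=dict)
    policy_decision: Optional[str] = None
    backend_metadata: dict = field(default_factory=dict)


class HeadingAdapter:
    """Hypothetical adapter that selects ATX headings from raw Markdown text."""
    name = "heading"

    def supports(self, selector_or_query, target) -> bool:
        return selector_or_query == "heading" and isinstance(target, str)

    def execute(self, document_or_backend, request) -> list:
        results = []
        for lineno, line in enumerate(document_or_backend.splitlines(), start=1):
            if line.startswith("#"):
                results.append(ResultEnvelope(
                    kind="heading",
                    path=f"line[{lineno}]",
                    value=line,
                    text=line.lstrip("#").strip(),
                    source_location=(lineno, 1),
                    backend_metadata={"adapter": self.name},
                ))
        return results


def dispatch(adapters, selector, target) -> list:
    """Run the first registered adapter that claims the selector."""
    for adapter in adapters:
        if adapter.supports(selector, target):
            return adapter.execute(target, selector)
    raise LookupError(f"no adapter supports selector {selector!r}")
```

Because every adapter fills the same envelope, downstream code such as policy filtering or context packaging does not need to know which backend produced a result.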
Context Package Registry
Responsible for agent-ready working memory.
Minimum protocol:
create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport
Context packages should include:
- included source spans
- summary layers
- token estimates
- provenance
- freshness
- policy labels
- retrieval recipe
- cache keys
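A budgeted package builder could look roughly like the sketch below. It covers only source spans, token estimates, and policy labels from the list above. The names `SourceSpan`, `ContextPackage`, and `estimate_tokens` are hypothetical, and the four-characters-per-token heuristic is an assumption, not a measured value.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSpan:
    """Hypothetical span record; the blueprint does not fix its fields."""
    snapshot_id: str
    start_line: int
    end_line: int
    text: str
    labels: frozenset = frozenset()


@dataclass
class ContextPackage:
    spans: list = field(default_factory=list)
    token_estimate: int = 0
    policy_labels: set = field(default_factory=set)
    skipped: int = 0


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly four characters per token.
    return max(1, len(text) // 4)


def create_package(candidates, budget: int) -> ContextPackage:
    """Greedily add spans in the given order until the token budget is spent."""
    package = ContextPackage()
    for span in candidates:
        cost = estimate_tokens(span.text)
        if package.token_estimate + cost > budget:
            package.skipped += 1          # recorded so the report can explain gaps
            continue
        package.spans.append(span)
        package.token_estimate += cost
        package.policy_labels |= span.labels
    return package
```

Tracking skipped spans and the union of policy labels keeps the package explainable: a report can state what was left out and which labels govern the bundle as a whole.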
Access Policy Gateway
Responsible for authorization and redaction before results leave a backend.
Minimum protocol:
authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation
Policy should support a ladder of increasingly expressive models:
1. Labels and trust zones.
2. File/path ACLs.
3. Relationship-based access control.
4. Attribute/rule-based policies.
5. External authorization services.
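The first rung of the ladder, label-based policy, can be sketched in a few lines. This is an illustrative assumption about how label mode might behave (a subject must hold every label on the object), not a definition from the blueprint. The result dictionaries with `path` and `labels` keys are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


def authorize(subject_labels, action, object_labels) -> PolicyDecision:
    """Label mode: allow only if the subject holds every label on the object.

    `action` is accepted to match the protocol shape but is ignored in this
    label-only sketch.
    """
    missing = set(object_labels) - set(subject_labels)
    if missing:
        return PolicyDecision(False, "missing labels: " + ", ".join(sorted(missing)))
    return PolicyDecision(True, "subject holds all object labels")


def filter_results(subject_labels, action, results):
    """Split results into those that may be returned and denial diagnostics."""
    kept, denied = [], []
    for result in results:
        decision = authorize(subject_labels, action, result["labels"])
        if decision.allowed:
            kept.append(result)
        else:
            denied.append({"path": result["path"], "reason": decision.reason})
    return kept, denied
```

Returning the denials alongside the kept results is what allows query output to report that policy filtering happened, rather than silently shrinking the result set.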
Suggested Backend Manifest
Backends should register through a Markdown/YAML manifest:
# Local SQLite Backend
```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
- snapshots
- json
- fts
- sql
- provenance
storage:
engine: sqlite
path: .markitect/cache/index.sqlite
policy:
mode: labels
```
CLI Direction
The first backend CLI should be explicit:
mkt cache init
mkt cache build <path>
mkt cache status
mkt cache query <selector-or-query> --backend <name>
mkt ast show <file>
mkt ast query <file> <jsonpath>
mkt context pack <manifest-or-query>
mkt context activate <package-id>
mkt policy check <subject> <action> <object>
Do not hide persistence behind mkt query. The user should know when the tool
is querying live files versus a persistent backend.
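As a rough illustration of keeping persistence explicit, the `cache` command group could be wired with `argparse` subcommands so that nothing cache-related is reachable from a plain query command. This is a sketch only: the real CLI framework is not stated in this document, and treating `--backend` as required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the `mkt cache ...` commands."""
    parser = argparse.ArgumentParser(prog="mkt")
    groups = parser.add_subparsers(dest="group", required=True)

    cache = groups.add_parser("cache", help="persistent backend operations")
    commands = cache.add_subparsers(dest="command", required=True)

    commands.add_parser("init")
    commands.add_parser("status")

    build = commands.add_parser("build")
    build.add_argument("path")

    query = commands.add_parser("query")
    query.add_argument("selector_or_query")
    # Assumption: the backend must always be named, never inferred.
    query.add_argument("--backend", required=True)
    return parser
```

With this layout, `build_parser().parse_args(["cache", "query", "heading", "--backend", "local-sqlite-cache"])` yields a namespace whose `group` is `cache`, so the calling code always knows it is on the persistent path.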
Recommended First Stack
Start with:
- content hashes computed with the Python standard library (hashlib)
- SQLite for snapshot metadata, JSON, and FTS5
- JSONPath as an optional extra
- local filesystem cache directory
- simple label policy
- provenance tables
Defer:
- vector search until text/structure cache works
- external authorization engines until local policy model is stable
- MCP server exposure until resources/tools are secure and explainable
- distributed cache until local invalidation is boring
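The recommended starting stack fits in the Python standard library alone. The sketch below assumes the local SQLite build includes the FTS5 extension. The table names, column names, and helper functions are hypothetical, and the snapshot id is simplified to the content hash (parser version and parse options are left out for brevity).

```python
import hashlib
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Create the snapshot metadata table and an FTS5 index if missing."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS snapshots (
            snapshot_id    TEXT PRIMARY KEY,
            source_path    TEXT NOT NULL,
            content_sha256 TEXT NOT NULL
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS snapshot_fts
            USING fts5(snapshot_id UNINDEXED, body);
    """)
    return conn


def put_document(conn: sqlite3.Connection, source_path: str, content: str) -> str:
    """Store a document once per distinct content hash and index its text."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    known = conn.execute(
        "SELECT 1 FROM snapshots WHERE snapshot_id = ?", (digest,)
    ).fetchone()
    if known is None:
        conn.execute(
            "INSERT INTO snapshots (snapshot_id, source_path, content_sha256) "
            "VALUES (?, ?, ?)",
            (digest, source_path, digest),
        )
        conn.execute(
            "INSERT INTO snapshot_fts (snapshot_id, body) VALUES (?, ?)",
            (digest, content),
        )
        conn.commit()
    return digest


def search(conn: sqlite3.Connection, match_expr: str) -> list:
    """Return source paths whose indexed text matches an FTS5 expression."""
    rows = conn.execute(
        "SELECT source_path FROM snapshots WHERE snapshot_id IN ("
        "  SELECT snapshot_id FROM snapshot_fts WHERE snapshot_fts MATCH ?"
        ") ORDER BY source_path",
        (match_expr,),
    )
    return [row[0] for row in rows]
```

Keeping snapshot metadata in an ordinary table and text in a separate FTS5 table means the full-text index can be dropped and rebuilt without losing the record of which source path and content hash each entry came from.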
Security Notes
Cached data becomes a new data exposure surface.
Minimum requirements before secure use:
- cache location is explicit
- cache entries know source path and content hash
- policy mode is visible
- query results report policy filtering
- context packages list what they include
- destructive cache operations require an explicit command
- no backend silently sends document content to a network service
Architecture Decision
Implement the backend fabric after deterministic transform/composition primitives are underway, but before serious caching, agent memory, or advanced query backends. This lets WP-0003 continue while reserving a clean path for the research-lab track.