# Cache Backend Architecture Blueprint

Date: 2026-05-03

## Purpose

This blueprint defines an optional backend architecture for sophisticated knowledge systems built on top of `markitect-tool`.

It is a research-lab architecture: powerful enough to support cached ASTs, advanced query backends, agent memory, and access control, but separated from the slim core so one-off CLI use stays fast and simple.

## Architectural Boundary

The core package owns:

- Markdown parsing
- document contracts
- simple selectors
- deterministic transforms and generation primitives
- unified diagnostics

The optional backend fabric owns:

- persistent snapshots
- indexes
- advanced query adapters
- memory/context packages
- policy enforcement
- provenance records
- trace and performance metadata

The core must be able to run without the backend fabric.

## Conceptual Layers

```text
Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
       -> AST/JSON index
       -> full-text index
       -> vector/semantic index
       -> analytical/index export
  -> Query adapter registry
       -> simple selectors
       -> JSONPath
       -> SQL/FTS
       -> vector/hybrid retrieval
  -> Context package registry
       -> activated working sets
       -> memory namespaces
       -> agent-ready context bundles
  -> Access policy gateway
       -> labels/ACL/ReBAC/ABAC
       -> result filtering and denial diagnostics
  -> Provenance and observability
```

## Core Interfaces

### Snapshot Backend

Responsible for durable parsed-document snapshots.

Minimum protocol:

```text
put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff
```

Snapshot identity should include:

- source content hash
- parser version
- parse options
- contract version when relevant

### Index Backend

Responsible for derived lookup structures.

Minimum protocol:

```text
capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan
```

Capabilities should include:

- `jsonpath`
- `sql`
- `fts`
- `vector`
- `hybrid`
- `inline_tokens`
- `section_graph`
- `policy_pushdown`

### Query Adapter

Translates a stable Markitect query request into backend-specific execution.

Minimum protocol:

```text
name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation
```

Adapters must return a common result envelope:

- kind
- path
- value
- text
- source location
- snapshot id
- provenance
- policy decision
- backend metadata

### Context Package Registry

Responsible for agent-ready working memory.

Minimum protocol:

```text
create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport
```

Context packages should include:

- included source spans
- summary layers
- token estimates
- provenance
- freshness
- policy labels
- retrieval recipe
- cache keys

### Access Policy Gateway

Responsible for authorization and redaction before results leave a backend.

Minimum protocol:

```text
authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation
```

Policy should support a ladder:

1. Labels and trust zones.
2. File/path ACLs.
3. Relationship-based access control.
4. Attribute/rule-based policies.
5. External authorization services.

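
The first rung of the policy ladder (labels) can be sketched against the three gateway methods above. This is a minimal illustration, not the project's implementation: the class name `LabelPolicyGateway`, the dataclass fields, the dict-shaped results with `path` and `labels` keys, and the rule "a subject must hold every label on an object" are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class PolicyDecision:
    decision_id: int
    allowed: bool
    reason: str


@dataclass
class FilteredResults:
    allowed: list  # results the subject may see
    denied: list   # PolicyDecision records for withheld results


class LabelPolicyGateway:
    """Hypothetical label-only gateway: access requires holding every label."""

    def __init__(self, clearances):
        self._clearances = clearances  # subject -> set of held labels
        self._ids = count(1)
        self._log = {}                 # decision_id -> recorded decision

    def authorize(self, subject, action, obj, context=None):
        held = self._clearances.get(subject, set())
        missing = set(obj.get("labels", ())) - held
        allowed = not missing
        reason = "all labels held" if allowed else f"missing labels: {sorted(missing)}"
        decision = PolicyDecision(next(self._ids), allowed, reason)
        self._log[decision.decision_id] = (subject, action, obj.get("path"), decision)
        return decision

    def filter_results(self, subject, action, results, context=None):
        filtered = FilteredResults(allowed=[], denied=[])
        for result in results:
            decision = self.authorize(subject, action, result, context)
            if decision.allowed:
                filtered.allowed.append(result)
            else:
                filtered.denied.append(decision)  # keep denial diagnostics
        return filtered

    def explain_decision(self, decision_id):
        subject, action, path, decision = self._log[decision_id]
        verdict = "allowed" if decision.allowed else "denied"
        return f"{subject} {action} {path}: {verdict} ({decision.reason})"


# Example data (hypothetical subjects, paths, and labels).
gateway = LabelPolicyGateway({"alice": {"public", "internal"}, "bob": {"public"}})
results = [
    {"path": "docs/a.md", "labels": ["public"]},
    {"path": "docs/b.md", "labels": ["public", "internal"]},
]
filtered = gateway.filter_results("bob", "read", results)
```

Keeping the denied decisions alongside the allowed results is one way to let query output report that filtering happened, rather than silently returning fewer rows.
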
## Suggested Backend Manifest

Backends should register through a Markdown/YAML manifest:

````markdown
# Local SQLite Backend

```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
  - snapshots
  - json
  - fts
  - sql
  - provenance
storage:
  engine: sqlite
  path: .markitect/cache/index.sqlite
policy:
  mode: labels
```
````

## CLI Direction

The first backend CLI should be explicit:

```text
mkt cache init
mkt cache build
mkt cache status
mkt cache query --backend
mkt ast show
mkt ast query
mkt context pack
mkt context activate
mkt policy check
```

Do not hide persistence behind `mkt query`. The user should know when the tool is querying live files versus a persistent backend.

## Recommended First Stack

Start with:

- content hashes in Python standard library
- SQLite for snapshot metadata, JSON, and FTS5
- JSONPath as an optional extra
- local filesystem cache directory
- simple label policy
- provenance tables

Defer:

- vector search until text/structure cache works
- external authorization engines until local policy model is stable
- MCP server exposure until resources/tools are secure and explainable
- distributed cache until local invalidation is boring

## Security Notes

Cached data becomes a new data exposure surface.

Minimum requirements before secure use:

- cache location is explicit
- cache entries know source path and content hash
- policy mode is visible
- query results report policy filtering
- context packages list what they include
- destructive cache operations require explicit command
- no backend silently sends document content to a network service

## Architecture Decision

Implement the backend fabric after deterministic transform/composition primitives are underway, but before serious caching, agent memory, or advanced query backends.

This lets WP-0003 continue while reserving a clean path for the research-lab track.
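
As a closing illustration of the recommended first stack, the snapshot identity rules and two of the Snapshot Backend calls can be sketched with only the Python standard library. Everything named here is an assumption for the example: the `snapshot_id` helper, the `SnapshotStore` class, the table layout, and the placeholder parser version are hypothetical, and the store uses a plain SQLite table rather than FTS5 so that it runs on any SQLite build.

```python
import hashlib
import json
import sqlite3
from typing import Optional

PARSER_VERSION = "0.0.0-example"  # hypothetical placeholder, not a real release


def snapshot_id(content: str, parse_options: dict,
                parser_version: str = PARSER_VERSION,
                contract_version: Optional[str] = None) -> str:
    """Derive an identity from content hash, parser version, options, contract."""
    identity = {
        "content_sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "parser_version": parser_version,
        "parse_options": parse_options,
        "contract_version": contract_version,
    }
    # Sorted keys make the serialization independent of dict insertion order.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SnapshotStore:
    """Hypothetical SQLite-backed store covering put_document and resolve_source."""

    def __init__(self, db_path: str = ":memory:"):
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " snapshot_id TEXT NOT NULL,"
            " source_path TEXT NOT NULL,"
            " content TEXT NOT NULL)"
        )

    def put_document(self, source_path: str, content: str, parse_options: dict) -> str:
        sid = snapshot_id(content, parse_options)
        self._db.execute(
            "INSERT INTO snapshots (snapshot_id, source_path, content) VALUES (?, ?, ?)",
            (sid, source_path, content),
        )
        self._db.commit()
        return sid

    def resolve_source(self, source_path: str) -> Optional[str]:
        # The most recently inserted row for a path is treated as "latest".
        row = self._db.execute(
            "SELECT snapshot_id FROM snapshots WHERE source_path = ?"
            " ORDER BY seq DESC LIMIT 1",
            (source_path,),
        ).fetchone()
        return row[0] if row else None
```

Because the parser version and parse options feed the identity, re-parsing unchanged content with different options yields a new snapshot id, which is what lets derived indexes detect that they are stale.
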