Cache Backend Architecture Blueprint

Date: 2026-05-03

Purpose

This blueprint defines an optional backend architecture for sophisticated knowledge systems built on top of markitect-tool.

It is a research-lab architecture: powerful enough to support cached ASTs, advanced query backends, agent memory, and access control, but separated from the slim core so one-off CLI use stays fast and simple.

Architectural Boundary

The core package owns:

  • Markdown parsing
  • document contracts
  • simple selectors
  • deterministic transforms and generation primitives
  • unified diagnostics

The optional backend fabric owns:

  • persistent snapshots
  • indexes
  • advanced query adapters
  • memory/context packages
  • policy enforcement
  • provenance records
  • trace and performance metadata

The core must be able to run without the backend fabric.

Conceptual Layers

Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
      -> AST/JSON index
      -> full-text index
      -> vector/semantic index
      -> analytical/index export
  -> Query adapter registry
      -> simple selectors
      -> JSONPath
      -> SQL/FTS
      -> vector/hybrid retrieval
  -> Context package registry
      -> activated working sets
      -> memory namespaces
      -> agent-ready context bundles
  -> Access policy gateway
      -> labels/ACL/ReBAC/ABAC
      -> result filtering and denial diagnostics
  -> Provenance and observability

Core Interfaces

Snapshot Backend

Responsible for durable parsed-document snapshots.

Minimum protocol:

put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff

Snapshot identity should include:

  • source content hash
  • parser version
  • parse options
  • contract version when relevant
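One way to combine these identity inputs is to hash a canonical encoding of them. The following is a minimal sketch, not the project's implementation: the function name `compute_snapshot_id`, its parameters, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Optional


def compute_snapshot_id(
    content: bytes,
    parser_version: str,
    parse_options: dict,
    contract_version: Optional[str] = None,
) -> str:
    """Hypothetical helper: derive a deterministic id from every input that affects the parse."""
    identity = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "parser_version": parser_version,
        "parse_options": parse_options,
        "contract_version": contract_version,
    }
    # sort_keys plus fixed separators give a canonical encoding,
    # so the order in which options were supplied does not change the id.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, re-parsing unchanged content with the same parser version and options yields the same id, while a parser upgrade produces a new snapshot instead of silently reusing a stale one.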

Index Backend

Responsible for derived lookup structures.

Minimum protocol:

capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan

Capabilities should include:

  • jsonpath
  • sql
  • fts
  • vector
  • hybrid
  • inline_tokens
  • section_graph
  • policy_pushdown
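A capability report like this lets callers pick a backend by what it can do and not by its name. The sketch below assumes `IndexCapabilities` is a simple set of the flag names listed above; the class layout and the `pick_backend` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Flag names taken from the capability list above.
KNOWN_CAPABILITIES = frozenset({
    "jsonpath", "sql", "fts", "vector", "hybrid",
    "inline_tokens", "section_graph", "policy_pushdown",
})


@dataclass(frozen=True)
class IndexCapabilities:
    """Hypothetical shape: the set of capability flags a backend advertises."""
    names: frozenset

    def __post_init__(self):
        # Reject misspelled or unknown flags early.
        unknown = set(self.names) - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError(f"unknown capabilities: {sorted(unknown)}")

    def supports(self, *required: str) -> bool:
        return set(required) <= set(self.names)


def pick_backend(backends: Dict[str, IndexCapabilities], *required: str) -> Optional[str]:
    """Return the first registered backend name that offers every required capability."""
    for name, caps in backends.items():
        if caps.supports(*required):
            return name
    return None
```

For example, a backend advertising `sql`, `fts`, and `jsonpath` would be chosen for a full-text query but skipped for a `vector` request.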

Query Adapter

Translates a stable Markitect query request into backend-specific execution.

Minimum protocol:

name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation

Adapters must return a common result envelope:

  • kind
  • path
  • value
  • text
  • source location
  • snapshot id
  • provenance
  • policy decision
  • backend metadata
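The envelope fields above could be carried in a single record type. This is a sketch only: the class name `ResultEnvelope`, the field types, and the default values are assumptions; the field names follow the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ResultEnvelope:
    """Hypothetical common result shape returned by every query adapter."""
    kind: str                  # e.g. "heading" or "code_block"
    path: str                  # structural path of the match within the document
    value: Any                 # structured value of the match
    text: str                  # plain-text rendering of the match
    source_location: dict      # e.g. {"path": ..., "line": ...}
    snapshot_id: str
    provenance: dict = field(default_factory=dict)
    policy_decision: str = "allow"
    backend_metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Assumes `value` and the dict fields hold only JSON-serializable data.
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the envelope identical across adapters means downstream consumers, such as the policy gateway, never need to know which backend produced a result.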

Context Package Registry

Responsible for agent-ready working memory.

Minimum protocol:

create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport

Context packages should include:

  • included source spans
  • summary layers
  • token estimates
  • provenance
  • freshness
  • policy labels
  • retrieval recipe
  • cache keys
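To illustrate how a budget might shape a package, here is a minimal sketch that greedily packs pre-ranked source spans under a token budget. The `SourceSpan` type, the `pack_spans` function, and the four-characters-per-token estimate are all assumptions for illustration; a real implementation would use the target model's tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceSpan:
    """Hypothetical span record: a line range in a source file plus its text."""
    source_path: str
    start_line: int
    end_line: int
    text: str


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly four characters per token, never below 1.
    return max(1, len(text) // 4)


def pack_spans(spans: List[SourceSpan], budget: int) -> dict:
    """Greedily include spans, assumed already ranked by relevance, within the budget."""
    included, used = [], 0
    for span in spans:
        cost = estimate_tokens(span.text)
        if used + cost > budget:
            continue  # too large for the remaining budget; a later, smaller span may still fit
        included.append(span)
        used += cost
    return {"included_spans": included, "token_estimate": used, "budget": budget}
```

The returned dictionary covers only two of the listed package fields (included source spans and token estimates); provenance, freshness, and policy labels would be attached by the registry.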

Access Policy Gateway

Responsible for authorization and redaction before results leave a backend.

Minimum protocol:

authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation

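Policy should support a ladder of increasingly expressive models, so a deployment can start at the first rung and climb only as needed: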

  1. Labels and trust zones.
  2. File/path ACLs.
  3. Relationship-based access control.
  4. Attribute/rule-based policies.
  5. External authorization services.
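The first rung of the ladder can be sketched in a few lines. The example below assumes a simple rule in which a subject may access an object only if it holds every label on that object; this rule, the `PolicyDecision` fields, and the result shape are illustrative assumptions, and the signatures are simplified relative to the protocol above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


def authorize(subject_labels: Set[str], action: str, object_labels: Set[str]) -> PolicyDecision:
    """Label rung only: `action` is accepted for protocol shape but not consulted here."""
    missing = object_labels - subject_labels
    if missing:
        return PolicyDecision(False, f"subject lacks labels: {sorted(missing)}")
    return PolicyDecision(True, "subject holds all object labels")


def filter_results(subject_labels: Set[str], action: str, results: Iterable[dict]) -> dict:
    """Split results into those released and those denied, keeping a reason for each denial."""
    kept: List[dict] = []
    denials: List[dict] = []
    for result in results:
        decision = authorize(subject_labels, action, set(result["labels"]))
        if decision.allowed:
            kept.append(result)
        else:
            denials.append({"path": result["path"], "reason": decision.reason})
    return {"results": kept, "denials": denials}
```

Returning the denials alongside the kept results is what makes filtering visible to the caller, in line with the denial diagnostics named in the conceptual layers.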

Suggested Backend Manifest

Backends should register through a Markdown manifest that carries its configuration in a fenced YAML block tagged markitect-backend, for example:

# Local SQLite Backend

```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
  - snapshots
  - json
  - fts
  - sql
  - provenance
storage:
  engine: sqlite
  path: .markitect/cache/index.sqlite
policy:
  mode: labels
```
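A loader would first need to locate such blocks in a manifest file. The sketch below uses only the standard library to pull out the raw YAML text of every fence tagged `yaml markitect-backend`. The function name is hypothetical, the regular expression handles only plain three-backtick fences, and parsing the YAML itself is left to whichever YAML library the project adopts.

```python
import re
from typing import List

FENCE = "`" * 3  # built programmatically so this example contains no literal fence

# Match an opening fence with the "yaml markitect-backend" info string,
# capture everything up to the next line that starts with a closing fence.
MANIFEST_BLOCK = re.compile(
    r"^" + FENCE + r"yaml markitect-backend[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_backend_manifests(markdown_text: str) -> List[str]:
    """Return the raw YAML body of each markitect-backend block, in document order."""
    return [match.group(1) for match in MANIFEST_BLOCK.finditer(markdown_text)]
```

Since the core package already owns Markdown parsing, a real implementation would more likely select these blocks through the parsed AST than through a regular expression.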

CLI Direction

The first backend CLI should be explicit:

mkt cache init
mkt cache build <path>
mkt cache status
mkt cache query <selector-or-query> --backend <name>
mkt ast show <file>
mkt ast query <file> <jsonpath>
mkt context pack <manifest-or-query>
mkt context activate <package-id>
mkt policy check <subject> <action> <object>

Do not hide persistence behind mkt query. The user should know when the tool is querying live files versus a persistent backend.
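The explicit command shape could be wired up with nested subcommands. The sketch below covers only the `mkt cache` group and uses `argparse` from the standard library; it is not the tool's actual CLI code, and making `--backend` mandatory is an assumption chosen to match the explicitness principle.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the `mkt cache ...` commands listed above."""
    parser = argparse.ArgumentParser(prog="mkt")
    groups = parser.add_subparsers(dest="group", required=True)

    cache = groups.add_parser("cache").add_subparsers(dest="command", required=True)
    cache.add_parser("init")
    build = cache.add_parser("build")
    build.add_argument("path")
    cache.add_parser("status")

    query = cache.add_parser("query")
    query.add_argument("query")
    # Assumption: the backend must always be named, so persistence is never implicit.
    query.add_argument("--backend", required=True)
    return parser
```

With this layout, `mkt cache query <selector-or-query>` without `--backend` exits with a usage error instead of silently choosing a persistent store.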

For the first implementation, start with:

  • content hashes computed with the Python standard library (hashlib)
  • SQLite for snapshot metadata, JSON, and FTS5
  • JSONPath as an optional extra
  • local filesystem cache directory
  • simple label policy
  • provenance tables
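A first cut of the snapshot store needs little more than `hashlib` and `sqlite3`. The sketch below is illustrative only: the table layout and function bodies are assumptions, and the snapshot id is simply the content hash, whereas a fuller id would also cover parser version and parse options. FTS5 is left out because its availability depends on how the local SQLite library was compiled.

```python
import hashlib
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database; the schema here is a hypothetical minimum."""
    conn = sqlite3.connect(path)
    # Content-addressed snapshots: one row per distinct content.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " snapshot_id TEXT PRIMARY KEY,"
        " content BLOB NOT NULL)"
    )
    # Pointer from each source path to its most recently stored snapshot.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources ("
        " source_path TEXT PRIMARY KEY,"
        " snapshot_id TEXT NOT NULL REFERENCES snapshots(snapshot_id))"
    )
    return conn


def put_document(conn: sqlite3.Connection, source_path: str, content: bytes) -> str:
    snapshot_id = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO snapshots (snapshot_id, content) VALUES (?, ?)",
        (snapshot_id, content),
    )
    conn.execute(
        "INSERT OR REPLACE INTO sources (source_path, snapshot_id) VALUES (?, ?)",
        (source_path, snapshot_id),
    )
    conn.commit()
    return snapshot_id


def resolve_source(conn: sqlite3.Connection, source_path: str) -> Optional[str]:
    row = conn.execute(
        "SELECT snapshot_id FROM sources WHERE source_path = ?", (source_path,)
    ).fetchone()
    return row[0] if row else None
```

Separating the content-addressed table from the path pointer means a file that reverts to earlier content resolves to the earlier snapshot without storing it twice.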

Defer:

  • vector search until text/structure cache works
  • external authorization engines until local policy model is stable
  • MCP server exposure until resources/tools are secure and explainable
  • distributed cache until local invalidation is boring

Security Notes

Cached data becomes a new data exposure surface.

Minimum requirements before secure use:

  • cache location is explicit
  • cache entries record their source path and content hash
  • policy mode is visible
  • query results report policy filtering
  • context packages list what they include
  • destructive cache operations require an explicit command
  • no backend silently sends document content to a network service

Architecture Decision

Implement the backend fabric after deterministic transform/composition primitives are underway, but before serious caching, agent memory, or advanced query backends. This lets WP-0003 continue while reserving a clean path for the research-lab track.