# Cache Backend Architecture Blueprint

Date: 2026-05-03

## Purpose

This blueprint defines an optional backend architecture for sophisticated
knowledge systems built on top of `markitect-tool`.

It is a research-lab architecture: powerful enough to support cached ASTs,
advanced query backends, agent memory, and access control, but separated from
the slim core so one-off CLI use stays fast and simple.

## Architectural Boundary

The core package owns:

- Markdown parsing
- document contracts
- simple selectors
- deterministic transforms and generation primitives
- unified diagnostics

The optional backend fabric owns:

- persistent snapshots
- indexes
- advanced query adapters
- memory/context packages
- policy enforcement
- provenance records
- trace and performance metadata

The core must be able to run without the backend fabric.

## Conceptual Layers

```text
Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
      -> AST/JSON index
      -> full-text index
      -> vector/semantic index
      -> analytical/index export
  -> Query adapter registry
      -> simple selectors
      -> JSONPath
      -> SQL/FTS
      -> vector/hybrid retrieval
  -> Context package registry
      -> activated working sets
      -> memory namespaces
      -> agent-ready context bundles
  -> Access policy gateway
      -> labels/ACL/ReBAC/ABAC
      -> result filtering and denial diagnostics
  -> Provenance and observability
```

## Core Interfaces

### Snapshot Backend

Responsible for durable parsed-document snapshots.

Minimum protocol:

```text
put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff
```

Snapshot identity should include:

- source content hash
- parser version
- parse options
- contract version when relevant
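A minimal sketch of how these four inputs might be folded into one content-addressed `snapshot_id`, plus a toy in-memory store for three of the protocol calls. The canonical-JSON encoding, SHA-256, the `PARSER_VERSION` constant, and the `InMemorySnapshotBackend` class are all illustrative assumptions, not the project's actual design; `diff_snapshot` is omitted. Snapshot contents are plain dicts standing in for `DocumentSnapshot`.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical parser version; the real value would come from the core parser.
PARSER_VERSION = "0.1.0"


def snapshot_id(content: str, parse_options: dict,
                parser_version: str = PARSER_VERSION,
                contract_version: Optional[str] = None) -> str:
    """Derive a content-addressed id from every input that can change the parse."""
    identity = {
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "parser_version": parser_version,
        "parse_options": parse_options,  # assumed to be JSON-serializable
        "contract_version": contract_version,
    }
    # sort_keys (applied recursively) makes the id independent of dict insertion order.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class InMemorySnapshotBackend:
    """Hypothetical toy store covering put_document, get_snapshot, resolve_source."""
    snapshots: Dict[str, dict] = field(default_factory=dict)
    latest: Dict[str, str] = field(default_factory=dict)  # source_path -> snapshot_id

    def put_document(self, source_path: str, content: str, parse_options: dict) -> str:
        sid = snapshot_id(content, parse_options)
        # Identical inputs yield the same id, so unchanged content stores nothing new.
        self.snapshots.setdefault(sid, {"content": content,
                                        "parse_options": dict(parse_options)})
        self.latest[source_path] = sid
        return sid

    def get_snapshot(self, sid: str) -> dict:
        return self.snapshots[sid]

    def resolve_source(self, source_path: str) -> str:
        return self.latest[source_path]
```

Because the source path is not part of the identity, two files with identical content and options share one snapshot; only `latest` records which path points at it.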

### Index Backend

Responsible for derived lookup structures.

Minimum protocol:

```text
capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan
```

Capabilities should include:

- `jsonpath`
- `sql`
- `fts`
- `vector`
- `hybrid`
- `inline_tokens`
- `section_graph`
- `policy_pushdown`
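One way a caller might use `capabilities()` is to pick a backend before building a request. This sketch assumes capabilities are a flat set of the names listed above; the `IndexCapabilities` shape, `choose_backend`, and the `vector-lab` backend name are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Capability names taken from the list above.
KNOWN_CAPABILITIES = frozenset({
    "jsonpath", "sql", "fts", "vector", "hybrid",
    "inline_tokens", "section_graph", "policy_pushdown",
})


@dataclass(frozen=True)
class IndexCapabilities:
    backend: str
    features: FrozenSet[str]

    def __post_init__(self):
        # Reject typos early instead of silently never matching a request.
        unknown = self.features - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError(f"{self.backend}: unknown capabilities {sorted(unknown)}")

    def covers(self, required: Iterable[str]) -> bool:
        return set(required) <= self.features


def choose_backend(candidates: Iterable[IndexCapabilities],
                   required: Iterable[str]) -> Optional[str]:
    """Return the first backend whose declared capabilities cover the request."""
    required = set(required)
    for caps in candidates:
        if caps.covers(required):
            return caps.backend
    return None
```

Returning `None` rather than falling back silently keeps the choice visible, which matches the document's preference for explicit backends.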

### Query Adapter

Translates a stable Markitect query request into backend-specific execution.

Minimum protocol:

```text
name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation
```

Adapters must return a common result envelope:

- kind
- path
- value
- text
- source location
- snapshot id
- provenance
- policy decision
- backend metadata
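The envelope and `supports`/`execute` dispatch could look roughly like the following sketch. Field names mirror the list above; the `(file, start_line, end_line)` location shape, the `HeadingSelectorAdapter` with its `#`-prefixed selector syntax, and `dispatch` are illustrative assumptions, not Markitect's real selector language.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Tuple


@dataclass
class QueryResult:
    """One hit in the common envelope; field names mirror the list above."""
    kind: str
    path: str
    value: Any
    text: str
    source_location: Tuple[str, int, int]  # assumed (file, start_line, end_line)
    snapshot_id: Optional[str]             # None when querying live files
    provenance: Dict[str, Any] = field(default_factory=dict)
    policy_decision: str = "allow"
    backend_metadata: Dict[str, Any] = field(default_factory=dict)


class QueryAdapter(Protocol):
    name: str
    def supports(self, query: str, target: str) -> bool: ...
    def execute(self, document_or_backend: Any, request: str) -> List[QueryResult]: ...


class HeadingSelectorAdapter:
    """Hypothetical adapter: '#'-prefixed selectors match headings by exact text."""
    name = "simple-heading"

    def supports(self, query: str, target: str) -> bool:
        return query.startswith("#") and target == "live"

    def execute(self, document: str, request: str) -> List[QueryResult]:
        wanted = request.lstrip("#").strip()
        hits = []
        for lineno, line in enumerate(document.splitlines(), start=1):
            if line.startswith("#") and line.lstrip("#").strip() == wanted:
                hits.append(QueryResult(
                    kind="heading", path=f"/headings/{wanted}", value=wanted,
                    text=line, source_location=("<live>", lineno, lineno),
                    snapshot_id=None, backend_metadata={"adapter": self.name},
                ))
        return hits


def dispatch(adapters: List[QueryAdapter], query: str, target: str, document: Any):
    # First adapter that claims the query wins; no silent fallback.
    for adapter in adapters:
        if adapter.supports(query, target):
            return adapter.execute(document, query)
    raise LookupError(f"no adapter supports {query!r} on target {target!r}")
```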

### Context Package Registry

Responsible for agent-ready working memory.

Minimum protocol:

```text
create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport
```

Context packages should include:

- included source spans
- summary layers
- token estimates
- provenance
- freshness
- policy labels
- retrieval recipe
- cache keys
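A sketch of how `create_package` might honor its `budget` argument: greedily pack already-ranked spans, record what was skipped so `explain` can report it, and derive a cache key from the included spans. The four-characters-per-token estimate, the `Span` and `ContextPackage` shapes, and greedy packing itself are assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token); a real package would use the
    # target model's tokenizer.
    return max(1, len(text) // 4)


@dataclass
class Span:
    source_path: str
    snapshot_id: str
    start_line: int
    end_line: int
    text: str
    labels: frozenset = frozenset()  # carried through for the policy gateway


@dataclass
class ContextPackage:
    spans: List[Span] = field(default_factory=list)
    token_estimate: int = 0
    skipped: List[str] = field(default_factory=list)

    def cache_key(self) -> str:
        # The key depends on exactly which snapshot spans were included.
        parts = [f"{s.snapshot_id}:{s.start_line}-{s.end_line}" for s in self.spans]
        return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def pack(ranked_spans: List[Span], budget: int) -> ContextPackage:
    """Greedily include ranked spans while the token budget allows."""
    pkg = ContextPackage()
    for span in ranked_spans:
        cost = estimate_tokens(span.text)
        if pkg.token_estimate + cost > budget:
            # Keep going: a smaller, lower-ranked span may still fit.
            pkg.skipped.append(f"{span.source_path}:{span.start_line}")
            continue
        pkg.spans.append(span)
        pkg.token_estimate += cost
    return pkg
```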

### Access Policy Gateway

Responsible for authorization and redaction before results leave a backend.

Minimum protocol:

```text
authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation
```

Policy should support a ladder of increasingly expressive models:

1. Labels and trust zones.
2. File/path ACLs.
3. Relationship-based access control.
4. Attribute/rule-based policies.
5. External authorization services.
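The first rung, labels and trust zones, might be sketched as below. The three zone names and their ordering, the fail-closed default for unlabeled results, and dropping the `context` argument are assumptions; the point is that `filter_results` returns a denial diagnostic for every removed result instead of dropping it silently.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical trust-zone ordering, lowest to highest sensitivity.
ZONE_RANK = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Subject:
    name: str
    clearance: str          # highest zone this subject may read
    labels: FrozenSet[str]  # extra labels granted, e.g. {"finance"}


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


def authorize(subject: Subject, action: str, obj: Dict) -> PolicyDecision:
    if action != "read":
        return PolicyDecision(False, f"action {action!r} not covered by label mode")
    zone = obj.get("zone", "restricted")  # unlabeled content fails closed
    if ZONE_RANK[zone] > ZONE_RANK[subject.clearance]:
        return PolicyDecision(False, f"zone {zone!r} exceeds clearance {subject.clearance!r}")
    missing = set(obj.get("labels", ())) - subject.labels
    if missing:
        return PolicyDecision(False, f"missing labels {sorted(missing)}")
    return PolicyDecision(True, "labels and zone satisfied")


def filter_results(subject: Subject, action: str,
                   results: List[Dict]) -> Tuple[List[Dict], List[str]]:
    """Return allowed results plus one denial diagnostic per removed result."""
    kept, denials = [], []
    for r in results:
        decision = authorize(subject, action, r)
        if decision.allowed:
            kept.append(r)
        else:
            denials.append(f"{r['path']}: {decision.reason}")
    return kept, denials
```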

## Suggested Backend Manifest

Backends should register through a Markdown/YAML manifest:

````markdown
# Local SQLite Backend

```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
  - snapshots
  - json
  - fts
  - sql
  - provenance
storage:
  engine: sqlite
  path: .markitect/cache/index.sqlite
policy:
  mode: labels
```
````
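A loader first has to find the manifest inside the Markdown file. This sketch locates backtick fences whose info string is exactly `yaml markitect-backend` and lists their top-level keys; it assumes the closing fence has the same length as the opening one and does not parse YAML. A real loader would hand each body to a YAML parser such as PyYAML's `yaml.safe_load`. The function names are hypothetical.

```python
import re
from typing import List, Set

# Backtick fence whose info string is exactly "yaml markitect-backend".
# Assumption: the closing fence repeats the opening fence's length exactly.
MANIFEST_FENCE = re.compile(
    r"^(?P<fence>`{3,})yaml markitect-backend[ \t]*\n(?P<body>.*?)^(?P=fence)[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
# Unindented "key:" lines are the manifest's top-level keys.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):", re.MULTILINE)


def manifest_blocks(markdown: str) -> List[str]:
    """Return the raw YAML body of every backend manifest block."""
    return [m.group("body") for m in MANIFEST_FENCE.finditer(markdown)]


def top_level_keys(body: str) -> Set[str]:
    return {m.group(1) for m in TOP_LEVEL_KEY.finditer(body)}
```

Keying on the full info string means ordinary `yaml` blocks in the same file are ignored.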

## CLI Direction

The first backend CLI should be explicit:

```text
mkt cache init
mkt cache build <path>
mkt cache status
mkt cache query <selector-or-query> --backend <name>
mkt ast show <file>
mkt ast query <file> <jsonpath>
mkt context pack <manifest-or-query>
mkt context activate <package-id>
mkt policy check <subject> <action> <object>
```

Do not hide persistence behind `mkt query`. The user should know when the tool
is querying live files versus a persistent backend.
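One way to keep that distinction structural is to give persistence its own command group, so a backend can only be reached through `mkt cache`. This `argparse` sketch covers the `cache` subcommands listed above; making `--backend` required and the arguments given to the live `mkt query` command are hypothetical choices for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mkt")
    groups = parser.add_subparsers(dest="group", required=True)

    # Persistent-backend commands live under their own "cache" group...
    cache = groups.add_parser("cache", help="persistent snapshot/index backend")
    cache_cmds = cache.add_subparsers(dest="command", required=True)
    cache_cmds.add_parser("init")
    cache_cmds.add_parser("build").add_argument("path")
    cache_cmds.add_parser("status")
    query = cache_cmds.add_parser("query")
    query.add_argument("selector_or_query")
    # Assumption: the backend must be named explicitly, never defaulted.
    query.add_argument("--backend", required=True)

    # ...while the live-file query has no backend option at all (arguments assumed).
    live = groups.add_parser("query", help="query live files (no cache)")
    live.add_argument("selector")
    live.add_argument("files", nargs="+")
    return parser
```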

## Recommended First Stack

Start with:

- content hashes computed with the Python standard library
- SQLite for snapshot metadata, JSON, and FTS5
- JSONPath as an optional extra
- local filesystem cache directory
- simple label policy
- provenance tables

Defer:

- vector search until text/structure cache works
- external authorization engines until the local policy model is stable
- MCP server exposure until resources/tools are secure and explainable
- distributed cache until local invalidation is boring
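The "start with" stack could share a single SQLite file, as in the sketch below: a snapshot table holding the parsed document as JSON, an FTS5 table over sections, and a provenance table. Table and column names are hypothetical, and it assumes the interpreter's SQLite build includes FTS5 and the JSON functions (standard in recent builds, but not guaranteed everywhere).

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    snapshot_id  TEXT PRIMARY KEY,
    source_path  TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    ast_json     TEXT NOT NULL  -- parsed document serialized as JSON
);
CREATE VIRTUAL TABLE IF NOT EXISTS section_fts USING fts5(
    snapshot_id UNINDEXED, heading, body
);
CREATE TABLE IF NOT EXISTS provenance (
    snapshot_id TEXT NOT NULL REFERENCES snapshots(snapshot_id),
    event       TEXT NOT NULL,
    recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_snapshot(conn, snapshot_id, source_path, content_hash, ast_json, sections):
    with conn:  # one transaction per snapshot
        conn.execute("INSERT OR IGNORE INTO snapshots VALUES (?, ?, ?, ?)",
                     (snapshot_id, source_path, content_hash, ast_json))
        conn.executemany("INSERT INTO section_fts VALUES (?, ?, ?)",
                         [(snapshot_id, heading, body) for heading, body in sections])
        conn.execute("INSERT INTO provenance (snapshot_id, event) VALUES (?, ?)",
                     (snapshot_id, "build"))


def search(conn, fts_query):
    # Full-text match over headings and bodies, best-ranked first.
    return conn.execute(
        "SELECT snapshot_id, heading FROM section_fts "
        "WHERE section_fts MATCH ? ORDER BY rank",
        (fts_query,),
    ).fetchall()


def title(conn, snapshot_id):
    # JSON lookup into the stored AST without deserializing it in Python.
    row = conn.execute("SELECT json_extract(ast_json, '$.title') FROM snapshots "
                       "WHERE snapshot_id = ?", (snapshot_id,)).fetchone()
    return row[0] if row else None
```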

## Security Notes

Cached data becomes a new data exposure surface.

Minimum requirements before the cache can be used securely:

- cache location is explicit
- cache entries know source path and content hash
- policy mode is visible
- query results report policy filtering
- context packages list what they include
- destructive cache operations require an explicit command
- no backend silently sends document content to a network service
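Because cache entries record their source path and content hash, a backend can refuse to serve entries that no longer match disk. This sketch assumes entries are a simple path-to-SHA-256 mapping; `content_hash` and `stale_entries` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict


def content_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_entries(entries: Dict[str, str]) -> Dict[str, str]:
    """Map each cached source path whose recorded hash no longer matches to a reason.

    `entries` maps source path -> content hash recorded when the entry was cached.
    """
    stale = {}
    for source, recorded in entries.items():
        p = Path(source)
        if not p.exists():
            stale[source] = "source missing"
        elif content_hash(p) != recorded:
            stale[source] = "content changed"
    return stale
```

Reporting a reason per entry, rather than a bare boolean, keeps the check explainable in the same way the policy layer is.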

## Architecture Decision

Implement the backend fabric after deterministic transform/composition
primitives are underway, but before any serious work on caching, agent memory,
or advanced query backends. This lets WP-0003 continue while reserving a clean
path for the research-lab track.