# Cache Backend Architecture Blueprint

Date: 2026-05-03

## Purpose

This blueprint defines an optional backend architecture for sophisticated
knowledge systems built on top of `markitect-tool`.

It is a research-lab architecture: powerful enough to support cached ASTs,
advanced query backends, agent memory, and access control, but separated from
the slim core so one-off CLI use stays fast and simple.

## Architectural Boundary

The core package owns:

- Markdown parsing
- document contracts
- simple selectors
- deterministic transforms and generation primitives
- unified diagnostics

The optional backend fabric owns:

- persistent snapshots
- indexes
- advanced query adapters
- memory/context packages
- policy enforcement
- provenance records
- trace and performance metadata

The core must be able to run without the backend fabric.

## Conceptual Layers

```text
Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
       -> AST/JSON index
       -> full-text index
       -> vector/semantic index
       -> analytical/index export
  -> Query adapter registry
       -> simple selectors
       -> JSONPath
       -> SQL/FTS
       -> vector/hybrid retrieval
  -> Context package registry
       -> activated working sets
       -> memory namespaces
       -> agent-ready context bundles
  -> Access policy gateway
       -> labels/ACL/ReBAC/ABAC
       -> result filtering and denial diagnostics
  -> Provenance and observability
```

## Core Interfaces

### Snapshot Backend

Responsible for durable parsed-document snapshots.

Minimum protocol:

```text
put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff
```

Snapshot identity should include:

- source content hash
- parser version
- parse options
- contract version when relevant

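The identity rule above can be sketched as a hash over a canonical encoding of its four inputs. This is only an illustration under stated assumptions: `SnapshotKey`, `content_hash`, and `snapshot_id` are hypothetical names, and SHA-256 over sorted-key JSON is one possible encoding, not something this blueprint prescribes.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SnapshotKey:
    """Inputs that determine snapshot identity (hypothetical field names)."""

    content_hash: str
    parser_version: str
    parse_options: Mapping[str, Any]
    contract_version: Optional[str] = None


def content_hash(content: bytes) -> str:
    """Hash the raw source bytes with SHA-256 from the standard library."""
    return hashlib.sha256(content).hexdigest()


def snapshot_id(key: SnapshotKey) -> str:
    """Derive a stable identifier from every identity input."""
    # Sorted keys and fixed separators give a canonical encoding, so the
    # same inputs produce the same id regardless of dict ordering.
    payload = json.dumps(
        {
            "content_hash": key.content_hash,
            "parser_version": key.parser_version,
            "parse_options": dict(key.parse_options),
            "contract_version": key.contract_version,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, reparsing unchanged content under a new parser version or different parse options yields a new snapshot id, so stale snapshots are never mistaken for current ones.
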
### Index Backend

Responsible for derived lookup structures.

Minimum protocol:

```text
capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan
```

Capabilities should include:

- `jsonpath`
- `sql`
- `fts`
- `vector`
- `hybrid`
- `inline_tokens`
- `section_graph`
- `policy_pushdown`

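One way to use the capability flags above is to let callers pick a backend by what it advertises. The sketch below assumes `IndexCapabilities` is a plain set of flag names; the `flags` field, `supports`, and `pick_backend` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional

# The flag names listed in this section.
KNOWN_CAPABILITIES = frozenset({
    "jsonpath", "sql", "fts", "vector", "hybrid",
    "inline_tokens", "section_graph", "policy_pushdown",
})


@dataclass(frozen=True)
class IndexCapabilities:
    """Set of capability flags a backend advertises (hypothetical shape)."""

    flags: FrozenSet[str]

    def __post_init__(self) -> None:
        # Reject misspelled or unregistered flags early.
        unknown = set(self.flags) - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError(f"unknown capabilities: {sorted(unknown)}")

    def supports(self, *required: str) -> bool:
        """True when every required flag is advertised."""
        return set(required) <= self.flags


def pick_backend(
    backends: Mapping[str, IndexCapabilities], *required: str
) -> Optional[str]:
    """Return the first backend name that supports all required flags."""
    for name, caps in backends.items():
        if caps.supports(*required):
            return name
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing capability is an error or a cue to fall back to querying live files.
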
### Query Adapter

Translates a stable Markitect query request into backend-specific execution.

Minimum protocol:

```text
name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation
```

Adapters must return a common result envelope:

- kind
- path
- value
- text
- source location
- snapshot id
- provenance
- policy decision
- backend metadata

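The envelope fields and the `supports`/`execute` pair above could look roughly like this in Python. Everything here is a hypothetical sketch: `ResultEnvelope`, `HeadingAdapter`, the `heading:` query prefix, and `dispatch` are invented for illustration and are not part of `markitect-tool`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Sequence


@dataclass
class ResultEnvelope:
    """One result, carrying the common envelope fields listed above."""

    kind: str
    path: str
    value: Any
    text: str
    source_location: Optional[str] = None
    snapshot_id: Optional[str] = None
    provenance: Dict[str, Any] = field(default_factory=dict)
    policy_decision: Optional[str] = None
    backend_metadata: Dict[str, Any] = field(default_factory=dict)


class QueryAdapter(Protocol):
    name: str

    def supports(self, query: str, target: Any) -> bool: ...

    def execute(self, target: Any, query: str) -> List[ResultEnvelope]: ...


class HeadingAdapter:
    """Toy adapter: finds headings by title in raw Markdown text."""

    name = "heading"

    def supports(self, query: str, target: Any) -> bool:
        return query.startswith("heading:") and isinstance(target, str)

    def execute(self, target: str, query: str) -> List[ResultEnvelope]:
        wanted = query[len("heading:"):].strip().lower()
        results = []
        for line_no, line in enumerate(target.splitlines(), start=1):
            title = line.lstrip("#").strip()
            if line.startswith("#") and title.lower() == wanted:
                results.append(ResultEnvelope(
                    kind="heading",
                    path=f"line[{line_no}]",
                    value=line,
                    text=title,
                    source_location=f"line {line_no}",
                    backend_metadata={"adapter": self.name},
                ))
        return results


def dispatch(
    adapters: Sequence[QueryAdapter], query: str, target: Any
) -> List[ResultEnvelope]:
    """Run the first adapter that claims the query, or fail loudly."""
    for adapter in adapters:
        if adapter.supports(query, target):
            return adapter.execute(target, query)
    raise LookupError(f"no adapter supports query: {query!r}")
```

Because every adapter fills the same envelope, downstream code such as policy filtering or provenance logging does not need to know which backend produced a result.
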
### Context Package Registry

Responsible for agent-ready working memory.

Minimum protocol:

```text
create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport
```

Context packages should include:

- included source spans
- summary layers
- token estimates
- provenance
- freshness
- policy labels
- retrieval recipe
- cache keys

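The `budget` argument and the "token estimates" field suggest some form of budgeted selection. A minimal sketch, assuming a greedy first-fit strategy and a crude whitespace token count (both are assumptions, as are the names `SourceSpan`, `estimate_tokens`, and `pack_spans`):

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SourceSpan:
    """A slice of a source document (hypothetical shape)."""

    source_path: str
    start_line: int
    end_line: int
    text: str


def estimate_tokens(text: str) -> int:
    # Crude whitespace-based estimate; a real tokenizer would give
    # different numbers.
    return len(text.split())


def pack_spans(
    spans: Sequence[SourceSpan], budget: int
) -> Tuple[List[SourceSpan], List[SourceSpan], int]:
    """Greedily include spans in order while they fit the token budget.

    Returns (included, skipped, tokens_used) so a package report can
    list both what went in and what was left out.
    """
    included: List[SourceSpan] = []
    skipped: List[SourceSpan] = []
    used = 0
    for span in spans:
        cost = estimate_tokens(span.text)
        if used + cost <= budget:
            included.append(span)
            used += cost
        else:
            skipped.append(span)
    return included, skipped, used
```

Keeping the skipped spans alongside the included ones is what would let an `explain(package_id)` report say why a given span is absent from the package.
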
### Access Policy Gateway

Responsible for authorization and redaction before results leave a backend.

Minimum protocol:

```text
authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation
```

Policy should support a ladder:

1. Labels and trust zones.
2. File/path ACLs.
3. Relationship-based access control.
4. Attribute/rule-based policies.
5. External authorization services.

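The first rung of the ladder, labels, can be sketched with plain set arithmetic. This simplified version drops the `action` and `context` arguments from the protocol above and assumes a subject may see an object only if it holds every label on that object; the rule and the names are illustrative assumptions, not the blueprint's policy model.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class PolicyDecision:
    """Outcome of a single authorization check (hypothetical shape)."""

    allowed: bool
    reason: str


def authorize(
    subject_labels: FrozenSet[str], object_labels: FrozenSet[str]
) -> PolicyDecision:
    """Allow only when the subject holds every label on the object."""
    missing = object_labels - subject_labels
    if missing:
        return PolicyDecision(False, f"missing labels: {sorted(missing)}")
    return PolicyDecision(True, "subject holds all required labels")


def filter_results(
    subject_labels: FrozenSet[str],
    results: Iterable[Tuple[Any, Iterable[str]]],
) -> Tuple[List[Any], List[Tuple[Any, str]]]:
    """Split (item, labels) pairs into allowed items and denials.

    Denials keep their reason so callers can emit denial diagnostics
    instead of dropping results silently.
    """
    allowed: List[Any] = []
    denied: List[Tuple[Any, str]] = []
    for item, labels in results:
        decision = authorize(subject_labels, frozenset(labels))
        if decision.allowed:
            allowed.append(item)
        else:
            denied.append((item, decision.reason))
    return allowed, denied
```

Unlabeled objects pass this check for any subject; a stricter deployment might prefer to deny unlabeled content by default.
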
## Suggested Backend Manifest

Backends should register through a Markdown/YAML manifest:

````markdown
# Local SQLite Backend

```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
  - snapshots
  - json
  - fts
  - sql
  - provenance
storage:
  engine: sqlite
  path: .markitect/cache/index.sqlite
policy:
  mode: labels
```
````

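Discovering such a manifest could start by pulling the `yaml markitect-backend` fenced blocks out of the Markdown text. The sketch below uses only the standard library and returns the raw YAML text; the function name is hypothetical, and parsing the YAML itself is left to whichever YAML library the project adopts.

```python
import re
from typing import List

# Built indirectly so this example can itself live inside a fenced block.
FENCE = "`" * 3

# Match a fence whose info string is exactly "yaml markitect-backend"
# and capture everything up to the next closing fence at line start.
MANIFEST_RE = re.compile(
    rf"^{FENCE}yaml markitect-backend[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_backend_manifests(markdown: str) -> List[str]:
    """Return the raw YAML body of every backend manifest block."""
    return [match.group(1) for match in MANIFEST_RE.finditer(markdown)]
```

A regular expression is enough for a sketch, but it ignores indented and tilde fences; a production version would more likely walk the parsed Markdown AST that the core already provides.
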
## CLI Direction

The first backend CLI should be explicit:

```text
mkt cache init
mkt cache build <path>
mkt cache status
mkt cache query <selector-or-query> --backend <name>
mkt ast show <file>
mkt ast query <file> <jsonpath>
mkt context pack <manifest-or-query>
mkt context activate <package-id>
mkt policy check <subject> <action> <object>
```

Do not hide persistence behind `mkt query`. The user should know when the tool
is querying live files versus a persistent backend.

## Recommended First Stack

Start with:

- content hashes from the Python standard library
- SQLite for snapshot metadata, JSON, and FTS5
- JSONPath as an optional extra
- a local filesystem cache directory
- a simple label policy
- provenance tables

Defer:

- vector search until the text/structure cache works
- external authorization engines until the local policy model is stable
- MCP server exposure until resources/tools are secure and explainable
- distributed caching until local invalidation is boring

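A minimal sketch of the "start with" stack, using only `hashlib` and `sqlite3` from the standard library. The table names, columns, and helper functions are hypothetical, and for brevity the snapshot id here is just the content hash, so parser version and parse options are left out. FTS5 is a compile-time SQLite option, so `open_cache` raises `sqlite3.OperationalError` on builds without it.

```python
import hashlib
import sqlite3
from typing import List


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with metadata and FTS tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " snapshot_id TEXT PRIMARY KEY,"
        " source_path TEXT NOT NULL,"
        " content_hash TEXT NOT NULL)"
    )
    # snapshot_id is stored but not tokenized; only body is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS snapshot_text"
        " USING fts5(snapshot_id UNINDEXED, body)"
    )
    return conn


def put_document(conn: sqlite3.Connection, source_path: str, content: str) -> str:
    """Store a document once per distinct content; return its snapshot id."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    known = conn.execute(
        "SELECT 1 FROM snapshots WHERE snapshot_id = ?", (digest,)
    ).fetchone()
    if known is None:
        conn.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?)",
            (digest, source_path, digest),
        )
        conn.execute(
            "INSERT INTO snapshot_text (snapshot_id, body) VALUES (?, ?)",
            (digest, content),
        )
        conn.commit()
    return digest


def search(conn: sqlite3.Connection, match: str) -> List[str]:
    """Return source paths whose text matches an FTS5 query string."""
    rows = conn.execute(
        "SELECT source_path FROM snapshots WHERE snapshot_id IN ("
        " SELECT snapshot_id FROM snapshot_text"
        " WHERE snapshot_text MATCH ?)"
        " ORDER BY source_path",
        (match,),
    )
    return [row[0] for row in rows]
```

Passing a file path such as the `.markitect/cache/index.sqlite` location from the manifest example instead of `:memory:` makes the cache persistent, which keeps the cache location explicit as the security notes require.
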
## Security Notes

Cached data becomes a new data exposure surface.

Minimum requirements before secure use:

- the cache location is explicit
- cache entries know their source path and content hash
- the policy mode is visible
- query results report policy filtering
- context packages list what they include
- destructive cache operations require an explicit command
- no backend silently sends document content to a network service

## Architecture Decision

Implement the backend fabric after deterministic transform/composition
primitives are underway, but before serious caching, agent memory, or advanced
query backends. This lets WP-0003 continue while reserving a clean path for the
research-lab track.