# Cache Backend Architecture Blueprint

Date: 2026-05-03

## Purpose

This blueprint defines an optional backend architecture for sophisticated
knowledge systems built on top of `markitect-tool`.

It is a research-lab architecture: powerful enough to support cached ASTs,
advanced query backends, agent memory, and access control, but separated from
the slim core so one-off CLI use stays fast and simple.

## Architectural Boundary

The core package owns:

- Markdown parsing
- document contracts
- simple selectors
- deterministic transforms and generation primitives
- unified diagnostics

The optional backend fabric owns:

- persistent snapshots
- indexes
- advanced query adapters
- memory/context packages
- policy enforcement
- provenance records
- trace and performance metadata

The core must be able to run without the backend fabric.
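One way to keep the core runnable without the backend fabric is to import the fabric lazily and fail with a clear message only when a backend-only command is used. This is a minimal sketch. It assumes a hypothetical optional module named `markitect_backend`. Neither that module name nor the `load_backend_fabric` and `require_backend_fabric` helpers come from `markitect-tool`.

```python
import importlib
from types import ModuleType
from typing import Optional

# Hypothetical import name for the optional backend fabric. The core never
# imports it at module load time, so a plain install keeps working.
BACKEND_MODULE = "markitect_backend"


def load_backend_fabric(module_name: str = BACKEND_MODULE) -> Optional[ModuleType]:
    """Return the backend fabric module if it is installed, else None."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only treat the fabric itself being absent as "not installed"; a
        # missing dependency inside an installed fabric should still fail loudly.
        if exc.name != module_name:
            raise
        return None


def require_backend_fabric(command: str) -> ModuleType:
    """Fail with a clear message when a backend-only command is used."""
    fabric = load_backend_fabric()
    if fabric is None:
        raise RuntimeError(
            f"'{command}' needs the optional backend fabric; "
            "core commands still work on live files without it"
        )
    return fabric
```

Raising here, rather than quietly falling back to live files, keeps the boundary visible to the user.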

## Conceptual Layers

```text
Markdown files
  -> Core parser and contract layer
  -> Content-addressed document snapshots
  -> Index fabric
       -> AST/JSON index
       -> full-text index
       -> vector/semantic index
       -> analytical/index export
  -> Query adapter registry
       -> simple selectors
       -> JSONPath
       -> SQL/FTS
       -> vector/hybrid retrieval
  -> Context package registry
       -> activated working sets
       -> memory namespaces
       -> agent-ready context bundles
  -> Access policy gateway
       -> labels/ACL/ReBAC/ABAC
       -> result filtering and denial diagnostics
  -> Provenance and observability
```

## Core Interfaces

### Snapshot Backend

Responsible for durable parsed-document snapshots.

Minimum protocol:

```text
put_document(source_path, content, parse_options) -> snapshot_id
get_snapshot(snapshot_id) -> DocumentSnapshot
resolve_source(source_path) -> latest snapshot_id
diff_snapshot(old_id, new_id) -> SnapshotDiff
```

Snapshot identity should include:

- source content hash
- parser version
- parse options
- contract version when relevant
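A content-addressed snapshot id can be derived by hashing a canonical serialization of every identity input listed above. This is a minimal sketch that uses only `hashlib` and `json`. The `snapshot_id` function name, the `sha256:` prefix, and JSON with sorted keys as the canonical form are illustrative assumptions, not part of the protocol. It also assumes parse options are JSON-serializable.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def snapshot_id(
    content: bytes,
    parser_version: str,
    parse_options: Mapping[str, Any],
    contract_version: Optional[str] = None,
) -> str:
    """Derive a deterministic snapshot id from all identity inputs."""
    identity = {
        # Hash the raw bytes so the id does not depend on decoding choices.
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "parser_version": parser_version,
        "parse_options": dict(parse_options),
        "contract_version": contract_version,
    }
    # sort_keys (applied at every nesting level) makes option order irrelevant.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The parser version is part of the identity. Upgrading the parser therefore yields new snapshot ids instead of silently reusing ASTs produced by the old parser.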

### Index Backend

Responsible for derived lookup structures.

Minimum protocol:

```text
capabilities() -> IndexCapabilities
build(snapshot_ids, options) -> IndexBuildResult
refresh(changed_snapshots) -> IndexBuildResult
query(request) -> QueryResult
explain(request) -> QueryPlan
```

Capabilities should include:

- `jsonpath`
- `sql`
- `fts`
- `vector`
- `hybrid`
- `inline_tokens`
- `section_graph`
- `policy_pushdown`
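Named capabilities let the query layer choose a backend up front instead of failing partway through a query. This is a minimal sketch. It assumes backends report their capabilities as a set of the string names above. `IndexCapabilities` is modeled here as a hypothetical frozen dataclass, and `select_backend` and the backend names in the usage are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence

# The capability names listed above.
KNOWN_CAPABILITIES: FrozenSet[str] = frozenset({
    "jsonpath", "sql", "fts", "vector", "hybrid",
    "inline_tokens", "section_graph", "policy_pushdown",
})


@dataclass(frozen=True)
class IndexCapabilities:
    backend: str
    names: FrozenSet[str]

    def __post_init__(self) -> None:
        # Reject typos at registration time instead of at query time.
        unknown = self.names - KNOWN_CAPABILITIES
        if unknown:
            raise ValueError(f"{self.backend}: unknown capabilities {sorted(unknown)}")

    def supports(self, required: Iterable[str]) -> bool:
        return set(required) <= self.names


def select_backend(
    candidates: Sequence[IndexCapabilities], required: Iterable[str]
) -> Optional[str]:
    """Return the first registered backend that has every required capability."""
    needed = set(required)
    for caps in candidates:
        if caps.supports(needed):
            return caps.backend
    return None
```

In this sketch, registration order doubles as preference order. A real registry might rank candidates by cost or freshness instead.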

### Query Adapter

Translates a stable Markitect query request into backend-specific execution.

Minimum protocol:

```text
name
supports(selector_or_query, target) -> bool
execute(document_or_backend, request) -> QueryResult
explain(request) -> QueryExplanation
```

Adapters must return a common result envelope:

- kind
- path
- value
- text
- source location
- snapshot id
- provenance
- policy decision
- backend metadata
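The common envelope can be expressed as one frozen dataclass that every adapter returns, with a structural `Protocol` describing the adapter itself. This is a minimal sketch. The field names mirror the list above, but the types, the `SourceLocation` shape, the `ResultEnvelope` name, and the toy `HeadingAdapter` are assumptions. The sketch also assumes `QueryResult` is simply a list of envelopes, and it omits `explain` for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class SourceLocation:
    source_path: str
    line: int  # 1-based line number in the source file


@dataclass(frozen=True)
class ResultEnvelope:
    kind: str
    path: str
    value: Any
    text: Optional[str]
    source_location: SourceLocation
    snapshot_id: Optional[str]  # None when the query ran against live files
    provenance: Dict[str, Any] = field(default_factory=dict)
    policy_decision: str = "not-evaluated"
    backend_metadata: Dict[str, Any] = field(default_factory=dict)


class QueryAdapter(Protocol):
    name: str

    def supports(self, selector_or_query: str, target: str) -> bool: ...

    def execute(self, document_or_backend: Any, request: str) -> List[ResultEnvelope]: ...


class HeadingAdapter:
    """Toy adapter: returns ATX-style heading lines from live Markdown text.

    It deliberately ignores code fences and other Markdown subtleties.
    """

    name = "headings"

    def supports(self, selector_or_query: str, target: str) -> bool:
        return selector_or_query == "headings" and target == "live"

    def execute(self, document_or_backend: Tuple[str, str], request: str) -> List[ResultEnvelope]:
        source_path, markdown = document_or_backend
        results: List[ResultEnvelope] = []
        for lineno, line in enumerate(markdown.splitlines(), start=1):
            if line.startswith("#"):
                results.append(ResultEnvelope(
                    kind="heading",
                    path=f"$.lines[{lineno - 1}]",
                    value=line.lstrip("#").strip(),
                    text=line,
                    source_location=SourceLocation(source_path, lineno),
                    snapshot_id=None,
                    provenance={"source": source_path},
                    backend_metadata={"adapter": self.name},
                ))
        return results
```

`snapshot_id=None` marks the result as coming from live files. A cached backend would fill it in, so callers can always tell the two apart.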

### Context Package Registry

Responsible for agent-ready working memory.

Minimum protocol:

```text
create_package(query_or_manifest, budget, policy) -> context_package_id
activate(package_id, thread_or_workspace) -> activation_id
deactivate(activation_id)
refresh(package_id) -> package_id
explain(package_id) -> ContextPackageReport
```

Context packages should include:

- included source spans
- summary layers
- token estimates
- provenance
- freshness
- policy labels
- retrieval recipe
- cache keys
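Packing under a token budget is the step that turns ranked retrieval results into a bounded context package. This is a minimal sketch with two assumptions. First, tokens are estimated crudely at about four characters per token, a rough rule of thumb rather than a real tokenizer. Second, spans are packed greedily in rank order. `Span`, `ContextPackage`, `estimate_tokens`, and `pack` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass(frozen=True)
class Span:
    source_path: str
    start_line: int
    end_line: int
    text: str
    snapshot_id: str  # provenance and cache key for the span
    policy_label: str


@dataclass
class ContextPackage:
    budget_tokens: int
    recipe: str  # how the spans were retrieved, e.g. a query string
    spans: List[Span] = field(default_factory=list)
    skipped: List[Span] = field(default_factory=list)  # reported, not hidden
    used_tokens: int = 0


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token, rounded up, minimum 1.
    return max(1, (len(text) + 3) // 4)


def pack(ranked: Sequence[Span], budget_tokens: int, recipe: str) -> ContextPackage:
    """Greedily include spans in rank order while they fit in the budget."""
    package = ContextPackage(budget_tokens=budget_tokens, recipe=recipe)
    for span in ranked:
        cost = estimate_tokens(span.text)
        if package.used_tokens + cost <= budget_tokens:
            package.spans.append(span)
            package.used_tokens += cost
        else:
            package.skipped.append(span)
    return package
```

The greedy loop keeps scanning after a span does not fit, so a smaller, lower-ranked span can still use leftover budget. A stricter variant would stop at the first span that overflows.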

### Access Policy Gateway

Responsible for authorization and redaction before results leave a backend.

Minimum protocol:

```text
authorize(subject, action, object, context) -> PolicyDecision
filter_results(subject, action, results, context) -> FilteredResults
explain_decision(decision_id) -> PolicyExplanation
```

Policy should support a ladder:

1. Labels and trust zones.
2. File/path ACLs.
3. Relationship-based access control (ReBAC).
4. Attribute/rule-based policies (ABAC).
5. External authorization services.
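The first rung of the ladder, labels and trust zones, is enough to show how `filter_results` can drop results and also report why. This is a minimal sketch. It assumes labels form a simple ordered clearance scale (`public` < `internal` < `restricted`). The scale, the `Decision` shape, and the simplified signatures are hypothetical. `action` and `context` are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical ordered trust scale; a higher rank means more sensitive.
LABEL_RANK: Dict[str, int] = {"public": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(subject_clearance: str, object_label: str) -> Decision:
    if object_label not in LABEL_RANK:
        # Fail closed: unknown labels are denied, not treated as public.
        return Decision(False, f"unknown label {object_label!r}")
    # Unknown subject clearances rank below everything, so they are denied too.
    if LABEL_RANK[object_label] <= LABEL_RANK.get(subject_clearance, -1):
        return Decision(True, f"clearance {subject_clearance!r} covers {object_label!r}")
    return Decision(False, f"clearance {subject_clearance!r} below {object_label!r}")


def filter_results(
    subject_clearance: str, results: Sequence[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Split (result_id, label) pairs into allowed ids and denial diagnostics."""
    allowed: List[str] = []
    denials: List[str] = []
    for result_id, label in results:
        decision = authorize(subject_clearance, label)
        if decision.allowed:
            allowed.append(result_id)
        else:
            denials.append(f"{result_id}: {decision.reason}")
    return allowed, denials
```

Denials are returned alongside the results rather than dropped silently. That way a caller can tell a user that filtering happened.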

## Suggested Backend Manifest

Backends should register through a Markdown/YAML manifest:

````markdown
# Local SQLite Backend

```yaml markitect-backend
id: local-sqlite-cache
kind: cache-backend
capabilities:
  - snapshots
  - json
  - fts
  - sql
  - provenance
storage:
  engine: sqlite
  path: .markitect/cache/index.sqlite
policy:
  mode: labels
```
````
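Because the manifest is ordinary Markdown, discovery can start by locating the `yaml markitect-backend` fenced block. This is a minimal standard-library sketch. It extracts the block body and checks top-level keys with a shallow line scan. Real loading would hand the body to a YAML parser, for example PyYAML's `yaml.safe_load`. `extract_backend_block`, `top_level_keys`, `missing_keys`, and the choice of `REQUIRED_KEYS` are hypothetical.

```python
import re
from typing import List, Optional

# A backtick fence whose info string is "yaml markitect-backend". The closing
# fence must be exactly three backticks on its own line, so an outer
# four-backtick wrapper is not mistaken for it.
BLOCK_RE = re.compile(
    r"^```yaml[ \t]+markitect-backend[ \t]*\n(.*?)^```[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

REQUIRED_KEYS = ("id", "kind", "capabilities")


def extract_backend_block(markdown: str) -> Optional[str]:
    match = BLOCK_RE.search(markdown)
    return match.group(1) if match else None


def top_level_keys(yaml_text: str) -> List[str]:
    """Collect unindented `key:` names; a shallow check, not a YAML parser."""
    keys = []
    for line in yaml_text.splitlines():
        if line and not line[0].isspace() and not line.startswith(("#", "-")):
            key, sep, _ = line.partition(":")
            if sep:
                keys.append(key.strip())
    return keys


def missing_keys(markdown: str) -> List[str]:
    block = extract_backend_block(markdown)
    if block is None:
        return list(REQUIRED_KEYS)
    present = set(top_level_keys(block))
    return [k for k in REQUIRED_KEYS if k not in present]
```

A shallow check like this can reject obviously incomplete manifests before any YAML dependency is imported. That is in keeping with the slim-core boundary.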

## CLI Direction

The first backend CLI should be explicit:

```text
mkt cache init
mkt cache build <path>
mkt cache status
mkt cache query <selector-or-query> --backend <name>
mkt ast show <file>
mkt ast query <file> <jsonpath>
mkt context pack <manifest-or-query>
mkt context activate <package-id>
mkt policy check <subject> <action> <object>
```

Do not hide persistence behind `mkt query`. The user should know when the tool
is querying live files versus a persistent backend.
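Keeping the persistent path explicit maps naturally onto a separate subcommand group. This is a minimal `argparse` sketch of the `mkt cache` group only. The command names come from the list above. Two things are assumptions: making `--backend` required, so that no backend is ever chosen silently, and leaving out command handlers.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mkt")
    groups = parser.add_subparsers(dest="group", required=True)

    # `mkt cache ...` is the only group that touches persistent state.
    cache = groups.add_parser("cache", help="persistent backend operations")
    cache_cmds = cache.add_subparsers(dest="command", required=True)
    cache_cmds.add_parser("init", help="create the cache directory")
    build = cache_cmds.add_parser("build", help="parse and index files")
    build.add_argument("path")
    cache_cmds.add_parser("status", help="show cache location and policy mode")
    query = cache_cmds.add_parser("query", help="query a named backend")
    query.add_argument("selector_or_query")
    # Required on purpose: the user must name the backend that answers.
    query.add_argument("--backend", required=True)
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Omitting `--backend` makes `argparse` exit with a usage error instead of guessing a default.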

## Recommended First Stack

Start with:

- content hashes from the Python standard library (`hashlib`)
- SQLite for snapshot metadata, JSON, and FTS5
- JSONPath as an optional extra
- local filesystem cache directory
- simple label policy
- provenance tables

Defer:

- vector search until the text/structure cache works
- external authorization engines until the local policy model is stable
- MCP server exposure until resources/tools are secure and explainable
- distributed cache until local invalidation is boring
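The recommended first stack fits in one SQLite file. This is a minimal schema sketch using Python's built-in `sqlite3` module. It assumes the linked SQLite library includes FTS5 and the JSON functions; most current builds do, but it is worth checking. Table names, column names, `store_snapshot`, and `search` are hypothetical. The snapshot id here is simplified to content hash plus parser version.

```python
import hashlib
import json
import sqlite3
from typing import Any, Dict, List

# Hypothetical schema: snapshot metadata with the parsed AST stored as JSON,
# an FTS5 table for full-text lookup, and an append-only provenance log.
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    snapshot_id    TEXT PRIMARY KEY,
    source_path    TEXT NOT NULL,
    content_sha256 TEXT NOT NULL,
    parser_version TEXT NOT NULL,
    ast_json       TEXT NOT NULL CHECK (json_valid(ast_json))
);
CREATE VIRTUAL TABLE IF NOT EXISTS snapshot_fts
    USING fts5(snapshot_id UNINDEXED, body);
CREATE TABLE IF NOT EXISTS provenance (
    snapshot_id TEXT NOT NULL,
    event       TEXT NOT NULL,
    detail      TEXT
);
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_snapshot(
    conn: sqlite3.Connection, source_path: str, content: str,
    ast: Dict[str, Any], parser_version: str,
) -> str:
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # Simplified identity: content hash plus parser version only.
    snapshot_id = hashlib.sha256(f"{digest}:{parser_version}".encode("utf-8")).hexdigest()
    with conn:  # commit the writes below as one transaction
        cur = conn.execute(
            "INSERT OR IGNORE INTO snapshots VALUES (?, ?, ?, ?, ?)",
            (snapshot_id, source_path, digest, parser_version, json.dumps(ast)),
        )
        is_new = cur.rowcount == 1
        if is_new:  # index the text only once per snapshot
            conn.execute("INSERT INTO snapshot_fts VALUES (?, ?)", (snapshot_id, content))
        conn.execute(
            "INSERT INTO provenance VALUES (?, ?, ?)",
            (snapshot_id, "parsed" if is_new else "reused", source_path),
        )
    return snapshot_id


def search(conn: sqlite3.Connection, fts_query: str) -> List[str]:
    rows = conn.execute(
        "SELECT snapshot_id FROM snapshot_fts WHERE snapshot_fts MATCH ?", (fts_query,)
    )
    return [row[0] for row in rows]
```

`INSERT OR IGNORE` makes storing identical content a second time idempotent for the snapshot and FTS rows. The provenance log still records that the document was seen again.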

## Security Notes

Cached data becomes a new data exposure surface.

Minimum requirements before secure use:

- cache location is explicit
- cache entries know their source path and content hash
- policy mode is visible
- query results report policy filtering
- context packages list what they include
- destructive cache operations require an explicit command
- no backend silently sends document content to a network service
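Recording the source path and content hash on every cache entry is what makes staleness detectable. This is a minimal sketch of a freshness check. It assumes each entry stores a SHA-256 of the raw source bytes. `CacheEntry`, `read_source`, and `check_freshness` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    source_path: str
    content_sha256: str  # hash of the raw source bytes at cache time
    snapshot_id: str


def read_source(source_path: str) -> Optional[bytes]:
    """Read the current source bytes, or None if the file no longer exists."""
    path = Path(source_path)
    return path.read_bytes() if path.is_file() else None


def check_freshness(entry: CacheEntry, current: Optional[bytes]) -> str:
    """Classify an entry as 'fresh', 'stale', or 'missing'."""
    if current is None:
        return "missing"
    if hashlib.sha256(current).hexdigest() == entry.content_sha256:
        return "fresh"
    return "stale"
```

A typical call is `check_freshness(entry, read_source(entry.source_path))`. Keeping the hash comparison separate from file I/O makes the check easy to reuse for content that arrives from elsewhere.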

## Architecture Decision

Implement the backend fabric after deterministic transform/composition
primitives are underway, but before serious caching, agent memory, or advanced
query backends. This lets WP-0003 continue while reserving a clean path for the
research-lab track.