# Local Index Backend
`markitect-tool` now includes a local SQLite snapshot/index backend as the
first practical implementation of the optional backend fabric.
## Purpose
The local index is optimized for repeatable Markdown infrastructure work:
- persist parsed document snapshots
- keep cheap source metadata for incremental refresh planning
- store document JSON for later AST/JSONPath use
- index frontmatter, headings, sections, blocks, and metrics
- preserve extension points for dependency edges, references, named regions,
chunks, processor outputs, FTS, and policy-aware access

The backend is optional. Single-file commands such as `mkt parse`, `mkt query`,
and `mkt ast` do not require it.
## Commands
Initialize the SQLite store:
```text
mkt cache init --root .
```
Build or refresh the local index:
```text
mkt cache index docs workplans --root .
```
Query indexed snapshots:
```text
mkt cache query 'sections[heading=Decision]' --root .
mkt cache query '$.headings[*].text' --engine jsonpath --root .
mkt cache query 'sections[heading=Decision]' --policy examples/policy/local-label-policy.yaml --subject public-agent
```
Search indexed section/block text:
```text
mkt search SQLite --root .
mkt search SQLite --policy examples/policy/local-label-policy.yaml --subject public-agent
```
Inspect a parsed AST without using the cache:
```text
mkt ast show docs/backend-fabric.md --format tree
mkt ast stats docs/backend-fabric.md
```
By default, the index is written to:
```text
.markitect/cache/index.sqlite3
```
Use `--index-path` to override it.
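
Because the index is an ordinary SQLite file, it can also be inspected with any SQLite client. The following is a minimal sketch using Python's standard `sqlite3` module; `list_tables` is a hypothetical helper, not part of `mkt`, and it assumes only the default path given above.

```python
import sqlite3
from pathlib import Path

# Default location described in this document; pass another path if
# the index was created with --index-path.
DEFAULT_INDEX = Path(".markitect/cache/index.sqlite3")


def list_tables(index_path: Path = DEFAULT_INDEX) -> list[str]:
    """Return the table names in the index (hypothetical helper)."""
    # mode=ro opens the file read-only, so a wrong path raises an error
    # instead of silently creating an empty database.
    uri = f"{index_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Opening the file read-only keeps ad hoc inspection from interfering with what `mkt cache index` writes.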
## Refresh Behavior
`mkt cache index` uses the same cheap-first refresh planning model as
`mkt backend refresh-plan`:
1. Compare path, size, mtime, parser identity, parse options, and contract hash against the stored record.
2. Hash only files whose metadata changed.
3. Skip parse/index when metadata changed but content hash stayed the same.
4. Parse and index new or changed files.
5. Delete rows for removed source files.

The command reports planned work and actual work separately in JSON/YAML output.
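
The cheap-first steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Markitect's implementation: `SourceRecord`, `plan_refresh`, and the single `parser_key` string (standing in for parser identity, parse options, and contract hash) are hypothetical names, and the sketch assumes that a changed `parser_key` always forces a re-parse.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical metadata remembered from the previous index run."""
    size: int
    mtime_ns: int
    content_hash: str
    parser_key: str  # assumption: parser identity + options + contract hash


def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_refresh(root: Path, paths: list[str],
                 known: dict[str, SourceRecord],
                 parser_key: str) -> dict[str, list[str]]:
    """Classify files; `paths` is assumed to come from a directory scan."""
    plan = {"unchanged": [], "metadata_only": [], "reindex": [], "removed": []}
    for rel in paths:
        stat = (root / rel).stat()
        old = known.get(rel)
        if old is None:
            plan["reindex"].append(rel)  # new file
            continue
        # Step 1: cheap metadata comparison, no file reads.
        if (old.size, old.mtime_ns, old.parser_key) == (
                stat.st_size, stat.st_mtime_ns, parser_key):
            plan["unchanged"].append(rel)
            continue
        # Step 2: hash only because the metadata differed.
        same_content = sha256_file(root / rel) == old.content_hash
        if same_content and old.parser_key == parser_key:
            plan["metadata_only"].append(rel)  # step 3: skip parse/index
        else:
            plan["reindex"].append(rel)  # step 4: parse and index
    # Step 5: anything known but no longer on disk gets its rows deleted.
    plan["removed"] = sorted(set(known) - set(paths))
    return plan
```

Hashing is deferred until the metadata check fails, so an unchanged tree costs one `stat` call per file.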
## Stored Data
The first schema stores:
- `sources`: path, absolute path, size, mtime, content hash, snapshot id,
parser identity, parse option hash, contract hash, document JSON,
frontmatter JSON, metrics JSON, provenance JSON, and indexed flag
- `headings`: heading level, text, and source line
- `sections`: heading metadata, section text, and source span
- `blocks`: block type, text, source span, and heading level
- `dependencies`: reserved dependency edge table for references,
transclusion, literate chunks, and future invalidation graphs
- `search_units`: FTS5 virtual table over sections and blocks

This is enough to carry over the useful idea from markitect-main: keep parsed
structure available for faster and richer query backends, while the normal CLI
stays usable without a cache.
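
To make the table list concrete, here is a reduced sketch of what such a schema could look like, built in memory with Python's `sqlite3` module. The column names, the `kind` column, and the sample rows are assumptions for illustration; the actual schema has more columns and may name them differently. The example needs an SQLite build with FTS5 enabled, which is the case for most Python distributions.

```python
import sqlite3

# Hypothetical, reduced schema: only a few of the columns listed above.
SCHEMA = """
CREATE TABLE sources (
    path          TEXT PRIMARY KEY,
    size          INTEGER NOT NULL,
    mtime_ns      INTEGER NOT NULL,
    content_hash  TEXT NOT NULL,
    document_json TEXT NOT NULL
);
CREATE TABLE headings (
    path  TEXT NOT NULL REFERENCES sources(path),
    level INTEGER NOT NULL,
    text  TEXT NOT NULL,
    line  INTEGER NOT NULL
);
-- Full-text index over section and block text; path and kind are stored
-- but not tokenized.
CREATE VIRTUAL TABLE search_units USING fts5(path UNINDEXED, kind UNINDEXED, text);
"""


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory index with one made-up document."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sources VALUES (?, ?, ?, ?, ?)",
                 ("docs/a.md", 42, 0, "abc", "{}"))
    conn.execute("INSERT INTO headings VALUES (?, ?, ?, ?)",
                 ("docs/a.md", 2, "Decision", 10))
    conn.executemany(
        "INSERT INTO search_units (path, kind, text) VALUES (?, ?, ?)",
        [("docs/a.md", "section", "We store snapshots in SQLite."),
         ("docs/a.md", "block", "A paragraph without the keyword.")],
    )
    return conn


def search(conn: sqlite3.Connection, term: str) -> list[tuple[str, str]]:
    """Full-text search over section/block text, best match first."""
    rows = conn.execute(
        "SELECT path, kind FROM search_units "
        "WHERE search_units MATCH ? ORDER BY rank",
        (term,),
    )
    return rows.fetchall()
```

With this demo data, `search(build_demo_index(), "SQLite")` matches only the section row, since the default FTS5 tokenizer is case-insensitive and the block text does not contain the term.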
## Policy-Aware Retrieval
`mkt cache query` and `mkt search` can apply a local label policy before
results leave the local backend boundary. When `--policy` is supplied, Markitect
extracts labels and trust zones from document frontmatter and applies any path
rules in the policy file. JSON/YAML output includes the policy decisions and
diagnostics.
See `docs/access-control-policy-gateway.md` for the policy vocabulary and
adapter boundaries.
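
The general shape of label-based filtering can be sketched as follows. Everything here is an assumption made for illustration: the `POLICY` structure, the `labels` frontmatter key, the `allowed_labels` field, and the default of allowing unlabeled documents are not taken from Markitect's policy format, which is defined in the document referenced above.

```python
from fnmatch import fnmatch

# Hypothetical policy shape, not the real policy file vocabulary.
POLICY = {
    "subjects": {"public-agent": {"allowed_labels": {"public"}}},
    "path_rules": [{"glob": "workplans/*", "labels": ["internal"]}],
}


def effective_labels(path: str, frontmatter: dict, policy: dict) -> set[str]:
    """Combine frontmatter labels with labels added by matching path rules."""
    labels = set(frontmatter.get("labels", []))
    for rule in policy["path_rules"]:
        if fnmatch(path, rule["glob"]):
            labels.update(rule["labels"])
    return labels


def decide(path: str, frontmatter: dict, subject: str, policy: dict) -> dict:
    """Return a decision record for one result before it is emitted."""
    labels = effective_labels(path, frontmatter, policy)
    allowed = policy["subjects"].get(subject, {}).get("allowed_labels", set())
    # Any label the subject is not cleared for blocks the result.
    denied = sorted(labels - allowed)
    return {"path": path, "allow": not denied, "denied_labels": denied}
```

In this sketch, a document under `workplans/` is withheld from `public-agent` even if its frontmatter says `public`, because the path rule adds the `internal` label. A stricter design might also deny documents that carry no labels at all.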
## Future Work
Follow-on backend work can now focus on richer dependency extraction from
references, transclusion, and literate chunks; persistent decision logs; and
larger-scale memory/context packages.