# Local Index Backend

`markitect-tool` now includes a local SQLite snapshot/index backend as the
first practical implementation of the optional backend fabric.

## Purpose

The local index is optimized for repeatable Markdown infrastructure work:

- persist parsed document snapshots
- keep cheap source metadata for incremental refresh planning
- store document JSON for later AST/JSONPath use
- index frontmatter, headings, sections, blocks, and metrics
- preserve extension points for dependency edges, references, named regions,
  chunks, processor outputs, FTS, and policy-aware access

The backend is optional. Single-file commands such as `mkt parse`, `mkt query`,
and `mkt ast` do not require it.

## Commands

Initialize the SQLite store:

```text
mkt cache init --root .
```

Build or refresh the local index:

```text
mkt cache index docs workplans --root .
```

Query indexed snapshots:

```text
mkt cache query 'sections[heading=Decision]' --root .
mkt cache query '$.headings[*].text' --engine jsonpath --root .
mkt cache query 'sections[heading=Decision]' --policy examples/policy/local-label-policy.yaml --subject public-agent
```

Search indexed section/block text:

```text
mkt search SQLite --root .
mkt search SQLite --policy examples/policy/local-label-policy.yaml --subject public-agent
```

Inspect a parsed AST without using the cache:

```text
mkt ast show docs/backend-fabric.md --format tree
mkt ast stats docs/backend-fabric.md
```

By default, the index is written to:

```text
.markitect/cache/index.sqlite3
```

Use `--index-path` to override it.

## Refresh Behavior

`mkt cache index` uses the same cheap-first refresh planning model as
`mkt backend refresh-plan`:

1. Compare path, size, mtime, parser identity, parse options, and contract hash.
2. Hash only files whose metadata changed.
3. Skip parse/index when metadata changed but content hash stayed the same.
4. Parse and index new or changed files.
5. Delete rows for removed source files.

The command reports planned work and actual work separately in JSON/YAML output.

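The five refresh steps can be sketched in a few lines of Python. This is an illustrative sketch only, not Markitect's actual implementation: `SourceMeta`, `plan_refresh`, the `pipeline` field (standing in for parser identity, parse options, and contract hash), and the plan bucket names are all hypothetical. It also assumes that a changed `pipeline` value forces a reparse even when the file content is unchanged.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMeta:
    # Hypothetical field names; the real `sources` columns may differ.
    size: int
    mtime: float
    pipeline: str  # stands in for parser identity + parse options + contract hash


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_refresh(indexed, current, read_bytes):
    """Classify paths using cheap metadata first and content hashes second.

    indexed:    {path: (SourceMeta, content_hash)} recorded by the previous run
    current:    {path: SourceMeta} observed on disk now
    read_bytes: callable(path) -> bytes, called only when a hash is needed
    """
    plan = {"unchanged": [], "metadata_only": [], "parse": [], "delete": []}
    for path in sorted(current):
        meta = current[path]
        if path not in indexed:
            plan["parse"].append(path)  # new file
            continue
        old_meta, old_hash = indexed[path]
        if meta == old_meta:
            plan["unchanged"].append(path)  # metadata identical: no hashing
        elif meta.pipeline != old_meta.pipeline:
            # Assumption: a new parser/options/contract invalidates the snapshot.
            plan["parse"].append(path)
        elif sha256_hex(read_bytes(path)) == old_hash:
            plan["metadata_only"].append(path)  # touched, content unchanged
        else:
            plan["parse"].append(path)  # content really changed
    plan["delete"] = sorted(set(indexed) - set(current))  # removed sources
    return plan


# Tiny in-memory "filesystem" to exercise the planner.
files = {"a.md": b"# A\n", "b.md": b"# B\n", "c.md": b"# C2\n", "e.md": b"# E\n"}
indexed = {
    "a.md": (SourceMeta(4, 1.0, "p1"), sha256_hex(b"# A\n")),
    "b.md": (SourceMeta(4, 1.0, "p1"), sha256_hex(b"# B\n")),
    "c.md": (SourceMeta(4, 1.0, "p1"), sha256_hex(b"# C\n")),
    "d.md": (SourceMeta(4, 1.0, "p1"), sha256_hex(b"# D\n")),
}
current = {
    "a.md": SourceMeta(4, 1.0, "p1"),  # untouched
    "b.md": SourceMeta(4, 2.0, "p1"),  # mtime changed, same bytes
    "c.md": SourceMeta(5, 2.0, "p1"),  # edited
    "e.md": SourceMeta(4, 2.0, "p1"),  # new
}
hashed = []


def read_bytes(path):
    hashed.append(path)  # record which files actually had to be read
    return files[path]


plan = plan_refresh(indexed, current, read_bytes)
print(plan)
print("hashed:", hashed)
```

In this sketch only `b.md` and `c.md` are read and hashed; the untouched, new, and removed files are classified from metadata alone, which is the point of the cheap-first ordering.
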
## Stored Data

The first schema stores:

- `sources`: path, absolute path, size, mtime, content hash, snapshot id,
  parser identity, parse option hash, contract hash, document JSON,
  frontmatter JSON, metrics JSON, provenance JSON, and indexed flag
- `headings`: heading level, text, and source line
- `sections`: heading metadata, section text, and source span
- `blocks`: block type, text, source span, and heading level
- `dependencies`: reserved dependency edge table for references,
  transclusion, literate chunks, and future invalidation graphs
- `search_units`: FTS5 virtual table over sections and blocks

This is enough to recover the useful markitect-main idea of keeping parsed
structure available for faster and richer query backends, while keeping the
normal CLI usable without a cache.

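To make the table list more concrete, here is a minimal sketch of how two of these tables might be laid out and queried with Python's standard `sqlite3` module. The column names, types, and the `find_sections` helper are assumptions made for illustration; the real schema has more columns and tables and may name things differently.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, trimmed-down versions of the `sources` and `sections` tables.
conn.executescript(
    """
    CREATE TABLE sources (
        id            INTEGER PRIMARY KEY,
        path          TEXT UNIQUE NOT NULL,
        content_hash  TEXT NOT NULL,
        document_json TEXT NOT NULL
    );
    CREATE TABLE sections (
        source_id  INTEGER NOT NULL REFERENCES sources(id),
        heading    TEXT NOT NULL,
        level      INTEGER NOT NULL,
        text       TEXT NOT NULL,
        start_line INTEGER NOT NULL,
        end_line   INTEGER NOT NULL
    );
    """
)

# Store one parsed snapshot: the document JSON plus one indexed section.
document = {"headings": [{"level": 2, "text": "Decision", "line": 3}]}
cur = conn.execute(
    "INSERT INTO sources (path, content_hash, document_json) VALUES (?, ?, ?)",
    ("docs/adr-001.md", "abc123", json.dumps(document)),
)
conn.execute(
    "INSERT INTO sections (source_id, heading, level, text, start_line, end_line) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (cur.lastrowid, "Decision", 2, "Use SQLite.", 3, 5),
)


def find_sections(conn, heading):
    """Rough SQL analogue of a `sections[heading=...]` query."""
    rows = conn.execute(
        "SELECT s.path, sec.start_line, sec.end_line, sec.text "
        "FROM sections AS sec JOIN sources AS s ON s.id = sec.source_id "
        "WHERE sec.heading = ? ORDER BY s.path, sec.start_line",
        (heading,),
    )
    return rows.fetchall()


result = find_sections(conn, "Decision")
print(result)

# The stored document JSON stays available for later AST/JSONPath-style use.
stored = conn.execute("SELECT document_json FROM sources").fetchone()[0]
heading_texts = [h["text"] for h in json.loads(stored)["headings"]]
print(heading_texts)
```

Keeping the full document JSON next to the flattened rows means structured queries can use the indexed tables while richer AST-style queries still have the original parse to fall back on.
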
## Policy-Aware Retrieval

`mkt cache query` and `mkt search` can run with a local label policy before
results leave the local backend boundary. When `--policy` is supplied, Markitect
extracts labels and trust zones from document frontmatter and applies any path
rules in the policy file. JSON/YAML output includes policy decisions and
diagnostics.

See `docs/access-control-policy-gateway.md` for the policy vocabulary and
adapter boundaries.

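The general idea of label-based filtering can be sketched as follows. Everything here is hypothetical: the `POLICY` structure, the `labels` frontmatter key, and the `effective_labels` and `filter_results` helpers are invented for illustration and do not reflect the actual format of `examples/policy/local-label-policy.yaml` or Markitect's decision logic. Trust zones are not modeled, and the sketch allows unlabeled documents, which a real policy might instead deny by default.

```python
from fnmatch import fnmatch

# Hypothetical policy shape: labels each subject may read, plus path rules
# that attach extra labels to matching files.
POLICY = {
    "subjects": {"public-agent": {"public"}},
    "path_rules": [{"glob": "workplans/*", "labels": {"internal"}}],
}


def effective_labels(path, frontmatter, policy):
    """Combine frontmatter labels with labels implied by matching path rules."""
    labels = set(frontmatter.get("labels", []))
    for rule in policy["path_rules"]:
        if fnmatch(path, rule["glob"]):
            labels |= rule["labels"]
    return labels


def filter_results(results, policy, subject):
    """Return (allowed results, per-result decisions) for one subject."""
    readable = policy["subjects"].get(subject, set())
    allowed, decisions = [], []
    for item in results:
        labels = effective_labels(item["path"], item.get("frontmatter", {}), policy)
        allow = labels <= readable  # every label must be readable by the subject
        decisions.append(
            {"path": item["path"], "labels": sorted(labels), "allow": allow}
        )
        if allow:
            allowed.append(item)
    return allowed, decisions


results = [
    {"path": "docs/intro.md", "frontmatter": {"labels": ["public"]}},
    {"path": "docs/secret.md", "frontmatter": {"labels": ["restricted"]}},
    {"path": "workplans/q3.md", "frontmatter": {"labels": ["public"]}},
]
allowed, decisions = filter_results(results, POLICY, "public-agent")
print([item["path"] for item in allowed])
for decision in decisions:
    print(decision)
```

Returning the decisions alongside the filtered results mirrors the idea of reporting policy decisions and diagnostics in the output rather than silently dropping rows.
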
## Future Work

Follow-on backend work can now focus on richer dependency extraction from
references, transclusion, and literate chunks; persistent decision logs; and
larger-scale memory/context packages.