Local Index Backend

markitect-tool now includes a local SQLite snapshot/index backend as the first practical implementation of the optional backend fabric.

Purpose

The local index is optimized for repeatable Markdown infrastructure work:

  • persist parsed document snapshots
  • keep cheap source metadata for incremental refresh planning
  • store document JSON for later AST/JSONPath use
  • index frontmatter, headings, sections, blocks, and metrics
  • preserve extension points for dependency edges, references, named regions, chunks, processor outputs, FTS, and policy-aware access

The backend is optional. Single-file commands such as mkt parse, mkt query, and mkt ast do not require it.

Commands

Initialize the SQLite store:

mkt cache init --root .

Build or refresh the local index:

mkt cache index docs workplans --root .

Query indexed snapshots:

mkt cache query 'sections[heading=Decision]' --root .
mkt cache query '$.headings[*].text' --engine jsonpath --root .
mkt cache query 'sections[heading=Decision]' --policy examples/policy/local-label-policy.yaml --subject public-agent

Search indexed section/block text:

mkt search SQLite --root .
mkt search SQLite --policy examples/policy/local-label-policy.yaml --subject public-agent

Inspect a parsed AST without using the cache:

mkt ast show docs/backend-fabric.md --format tree
mkt ast stats docs/backend-fabric.md

By default, the index is written to:

.markitect/cache/index.sqlite3

Use --index-path to override it.
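Because the index is an ordinary SQLite file, it can be inspected with any SQLite client. The sketch below is one way to list its tables from Python in read-only mode. The helper name `list_tables` is hypothetical and not part of markitect-tool; the only detail taken from this document is the default path.

```python
import sqlite3
from pathlib import Path

# Default location as documented above, relative to the --root directory.
DEFAULT_INDEX = Path(".markitect") / "cache" / "index.sqlite3"


def list_tables(db_path):
    """Return the names of tables in a SQLite file, opened read-only.

    Hypothetical helper for inspection only; it never writes to the index.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    if DEFAULT_INDEX.exists():
        print(list_tables(DEFAULT_INDEX))
```

Opening with `mode=ro` means a mistyped path raises an error instead of silently creating an empty database.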

Refresh Behavior

mkt cache index uses the same cheap-first refresh planning model as mkt backend refresh-plan:

  1. Compare path, size, mtime, parser identity, parse options, and contract hash.
  2. Hash only files whose metadata changed.
  3. Skip parse/index when metadata changed but content hash stayed the same.
  4. Parse and index new or changed files.
  5. Delete rows for removed source files.

The command reports planned work and actual work separately in JSON/YAML output.
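The steps above can be sketched as a small planning function. This is a minimal illustration, not the markitect-tool implementation: `SourceRecord`, `plan_refresh`, and the plan keys are hypothetical names, SHA-256 is an assumed hash, and the sketch assumes that a change in parser identity, parse options, or contract hash always forces a reparse, which the text does not spell out.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical stored row; field names are illustrative only."""
    size: int
    mtime_ns: int
    content_hash: str
    parser_identity: str
    parse_option_hash: str
    contract_hash: str


def file_hash(path):
    # Assumed hash function; the real backend may use a different one.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def plan_refresh(root, paths, stored, parser_identity, parse_option_hash, contract_hash):
    """Classify each source as unchanged, touch (metadata only), parse, or delete."""
    plan = {"unchanged": [], "touch": [], "parse": [], "delete": []}
    current = (parser_identity, parse_option_hash, contract_hash)
    seen = set()
    for rel in paths:
        seen.add(rel)
        path = Path(root) / rel
        old = stored.get(rel)
        if old is None:
            plan["parse"].append(rel)  # step 4: new file
            continue
        if (old.parser_identity, old.parse_option_hash, old.contract_hash) != current:
            plan["parse"].append(rel)  # assumption: parser/contract change forces a reparse
            continue
        st = path.stat()
        if (old.size, old.mtime_ns) == (st.st_size, st.st_mtime_ns):
            plan["unchanged"].append(rel)  # step 1: cheap metadata match, no hashing
        elif file_hash(path) == old.content_hash:
            plan["touch"].append(rel)  # steps 2-3: metadata changed, content did not
        else:
            plan["parse"].append(rel)  # step 4: content changed
    plan["delete"] = sorted(set(stored) - seen)  # step 5: removed sources
    return plan
```

The point of the ordering is that hashing, the first step that reads file contents, only runs for files whose cheap metadata no longer matches the stored row.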

Stored Data

The first schema stores:

  • sources: path, absolute path, size, mtime, content hash, snapshot id, parser identity, parse option hash, contract hash, document JSON, frontmatter JSON, metrics JSON, provenance JSON, and indexed flag
  • headings: heading level, text, and source line
  • sections: heading metadata, section text, and source span
  • blocks: block type, text, source span, and heading level
  • dependencies: reserved dependency edge table for references, transclusion, literate chunks, and future invalidation graphs
  • search_units: FTS5 virtual table over sections and blocks
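To make the shape of this data concrete, here is a reduced sketch of two of the tables using Python's built-in `sqlite3` module. The column names and types are assumptions chosen for illustration and will not match the real schema exactly; the FTS5 `search_units` table is left out because FTS5 availability depends on how the local SQLite library was built.

```python
import json
import sqlite3

# Illustrative subset only; the real schema has more columns and tables.
SCHEMA = """
CREATE TABLE sources (
    path          TEXT PRIMARY KEY,
    content_hash  TEXT NOT NULL,
    document_json TEXT NOT NULL,
    indexed       INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE headings (
    source_path TEXT NOT NULL REFERENCES sources(path) ON DELETE CASCADE,
    level       INTEGER NOT NULL,
    text        TEXT NOT NULL,
    line        INTEGER NOT NULL
);
"""


def build_demo_index():
    """Create an in-memory index holding one made-up document."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    conn.executescript(SCHEMA)
    doc = {"headings": [{"level": 1, "text": "Decision", "line": 3}]}
    conn.execute(
        "INSERT INTO sources (path, content_hash, document_json, indexed) "
        "VALUES (?, ?, ?, 1)",
        ("docs/adr.md", "abc123", json.dumps(doc)),
    )
    conn.executemany(
        "INSERT INTO headings (source_path, level, text, line) VALUES (?, ?, ?, ?)",
        [("docs/adr.md", h["level"], h["text"], h["line"]) for h in doc["headings"]],
    )
    conn.commit()
    return conn


def find_heading(conn, text):
    """Return (source_path, level, line) for every heading with this exact text."""
    return conn.execute(
        "SELECT source_path, level, line FROM headings "
        "WHERE text = ? ORDER BY source_path, line",
        (text,),
    ).fetchall()
```

Keeping the full document JSON on the source row while also flattening headings into their own table lets structural lookups run as plain SQL, with the JSON still available for AST or JSONPath queries.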

This is enough to recover the useful idea from markitect-main of keeping parsed structure available for faster, richer query backends, while the normal CLI stays usable without a cache.

Policy-Aware Retrieval

mkt cache query and mkt search can run with a local label policy before results leave the local backend boundary. When --policy is supplied, Markitect extracts labels and trust zones from document frontmatter and applies any path rules in the policy file. JSON/YAML output includes policy decisions and diagnostics.
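The filtering idea can be illustrated with a small sketch. Everything here is hypothetical: the policy layout (`subjects`, `allow_labels`, `path_rules`), the `labels` frontmatter key, and the deny-when-unlabeled rule are assumptions made for the example, and trust zones are omitted. The real vocabulary is defined in docs/access-control-policy-gateway.md.

```python
from fnmatch import fnmatch

# Hypothetical policy shape, not the format of local-label-policy.yaml.
POLICY = {
    "subjects": {"public-agent": {"allow_labels": ["public"]}},
    "path_rules": [{"glob": "workplans/*", "add_labels": ["internal"]}],
}


def effective_labels(path, frontmatter, policy):
    """Combine frontmatter labels with labels added by matching path rules."""
    labels = set(frontmatter.get("labels", []))
    for rule in policy["path_rules"]:
        if fnmatch(path, rule["glob"]):
            labels.update(rule["add_labels"])
    return labels


def filter_results(results, policy, subject):
    """Return (kept results, per-result decisions) for one subject."""
    allowed = set(policy["subjects"][subject]["allow_labels"])
    kept, decisions = [], []
    for item in results:
        labels = effective_labels(item["path"], item["frontmatter"], policy)
        # Assumption: unlabeled documents are denied, and every label must be allowed.
        permit = bool(labels) and labels <= allowed
        decisions.append({
            "path": item["path"],
            "decision": "allow" if permit else "deny",
            "labels": sorted(labels),
        })
        if permit:
            kept.append(item)
    return kept, decisions
```

Returning the decisions next to the filtered results mirrors the behavior described above, where output carries policy decisions and diagnostics alongside the data.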

See docs/access-control-policy-gateway.md for the policy vocabulary and adapter boundaries.

Future Work

Follow-on backend work can now focus on richer dependency extraction from references, transclusion, and literate chunks; persistent decision logs; and larger-scale memory/context packages.