Local Index Backend

markitect-tool now includes a local SQLite snapshot/index backend as the first practical implementation of the optional backend fabric.

Purpose

The local index is optimized for repeatable Markdown infrastructure work:

  • persist parsed document snapshots
  • keep cheap source metadata for incremental refresh planning
  • store document JSON for later AST/JSONPath use
  • index frontmatter, headings, sections, blocks, and metrics
  • preserve extension points for dependency edges, references, named regions, chunks, processor outputs, FTS, and policy-aware access

The backend is optional. Single-file commands such as mkt parse, mkt query, and mkt ast do not require it.

Commands

Initialize the SQLite store:

mkt cache init --root .

Build or refresh the local index:

mkt cache index docs workplans --root .

Query indexed snapshots:

mkt cache query 'sections[heading=Decision]' --root .
mkt cache query '$.headings[*].text' --engine jsonpath --root .
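
As a rough illustration of what the JSONPath query above selects, the Python sketch below evaluates the equivalent of `$.headings[*].text` by hand over a made-up document JSON. The document shape (a top-level `headings` array whose items carry `level` and `text`) is an assumption inferred from the expression itself, not the tool's documented schema, and `heading_texts` is a hypothetical helper, not part of mkt.

```python
import json

# Hypothetical stored document JSON; the real snapshot shape may differ.
SAMPLE_DOCUMENT_JSON = """
{
  "headings": [
    {"level": 1, "text": "Local Index Backend"},
    {"level": 2, "text": "Purpose"},
    {"level": 2, "text": "Commands"}
  ]
}
"""


def heading_texts(document_json: str) -> list[str]:
    """Hand-rolled equivalent of the JSONPath `$.headings[*].text`."""
    document = json.loads(document_json)
    # Walk every item of the top-level "headings" array and keep its "text".
    return [h["text"] for h in document.get("headings", []) if "text" in h]


if __name__ == "__main__":
    print(heading_texts(SAMPLE_DOCUMENT_JSON))
```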

Search indexed section/block text:

mkt search SQLite --root .

Inspect a parsed AST without using the cache:

mkt ast show docs/backend-fabric.md --format tree
mkt ast stats docs/backend-fabric.md

By default, the index is written to:

.markitect/cache/index.sqlite3

Use --index-path to override it.

Refresh Behavior

mkt cache index uses the same cheap-first refresh planning model as mkt backend refresh-plan:

  1. Compare path, size, mtime, parser identity, parse options, and contract hash.
  2. Hash only files whose metadata changed.
  3. Skip parse/index when metadata changed but content hash stayed the same.
  4. Parse and index new or changed files.
  5. Delete rows for removed source files.

The command reports planned work and actual work separately in JSON/YAML output.
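
The cheap-first steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the tool's implementation: `SourceRecord`, `plan_refresh`, and the single `config_key` string (standing in for parser identity, parse options, and contract hash) are hypothetical names. The sketch also assumes that a changed `config_key` forces a re-parse even when the content hash is unchanged, which the source does not state.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical metadata remembered from the previous index run."""
    size: int
    mtime_ns: int
    content_hash: str
    config_key: str  # stand-in for parser identity, parse options, contract hash


def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_refresh(root: Path, current: list[str],
                 previous: dict[str, SourceRecord],
                 config_key: str) -> dict[str, list[str]]:
    """Classify files, hashing only those whose cheap metadata changed."""
    plan = {"unchanged": [], "touched": [], "parse": [], "delete": []}
    for rel in sorted(current):
        path = root / rel
        old = previous.get(rel)
        stat = path.stat()
        if old is None or old.config_key != config_key:
            # New file, or parser/options changed (assumed to force a re-parse).
            plan["parse"].append(rel)
        elif (stat.st_size, stat.st_mtime_ns) == (old.size, old.mtime_ns):
            # Metadata matches: no hashing and no parsing needed.
            plan["unchanged"].append(rel)
        elif file_hash(path) == old.content_hash:
            # Metadata changed but content did not: skip parse/index.
            plan["touched"].append(rel)
        else:
            plan["parse"].append(rel)
    # Anything indexed before but no longer present gets its rows deleted.
    plan["delete"] = sorted(set(previous) - set(current))
    return plan
```

Keeping the plan as plain lists makes it easy to report planned work separately from what was actually parsed.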

Stored Data

The first schema stores:

  • sources: path, absolute path, size, mtime, content hash, snapshot id, parser identity, parse option hash, contract hash, document JSON, frontmatter JSON, metrics JSON, provenance JSON, and indexed flag
  • headings: heading level, text, and source line
  • sections: heading metadata, section text, and source span
  • blocks: block type, text, source span, and heading level
  • dependencies: reserved dependency edge table for references, transclusion, literate chunks, and future invalidation graphs
  • search_units: FTS5 virtual table over sections and blocks

This is enough to recover the useful idea from markitect-main: keep parsed structure available so that query backends can be faster and richer, while the normal CLI stays usable without a cache.
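
To make the stored-data list more concrete, here is a small Python `sqlite3` sketch of a reduced schema with an FTS5 table over section text. Only the table names `sources`, `sections`, and `search_units` come from the list above. Every column name, the helper functions, and the reduced column set are assumptions for illustration, and the real schema may look quite different. It needs an SQLite build with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

# Illustrative schema only: column names are assumptions, not the tool's schema.
SCHEMA = """
CREATE TABLE sources (
    path TEXT PRIMARY KEY,
    content_hash TEXT,
    document_json TEXT
);
CREATE TABLE sections (
    id INTEGER PRIMARY KEY,
    source_path TEXT NOT NULL,
    heading TEXT,
    level INTEGER,
    text TEXT,
    start_line INTEGER,
    end_line INTEGER
);
CREATE VIRTUAL TABLE search_units
    USING fts5(source_path UNINDEXED, kind UNINDEXED, heading, text);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def index_section(conn, source_path, heading, level, text, start_line, end_line):
    """Store one section and mirror its text into the full-text table."""
    conn.execute("INSERT OR IGNORE INTO sources(path) VALUES (?)", (source_path,))
    conn.execute(
        "INSERT INTO sections(source_path, heading, level, text, start_line, end_line)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (source_path, heading, level, text, start_line, end_line),
    )
    conn.execute(
        "INSERT INTO search_units(source_path, kind, heading, text)"
        " VALUES (?, 'section', ?, ?)",
        (source_path, heading, text),
    )


def search(conn, term):
    """Return (source_path, heading) pairs matching the term, best match first."""
    phrase = '"' + term.replace('"', '""') + '"'  # quote to avoid FTS syntax errors
    rows = conn.execute(
        "SELECT source_path, heading FROM search_units"
        " WHERE search_units MATCH ? ORDER BY rank",
        (phrase,),
    )
    return rows.fetchall()


def remove_source(conn, source_path):
    """Delete every row derived from a source file that no longer exists."""
    conn.execute("DELETE FROM search_units WHERE source_path = ?", (source_path,))
    conn.execute("DELETE FROM sections WHERE source_path = ?", (source_path,))
    conn.execute("DELETE FROM sources WHERE path = ?", (source_path,))
```

Mirroring text into a separate FTS5 table keeps the structured tables queryable with ordinary SQL while full-text search stays a single `MATCH` query.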

Future Work

Follow-on backend work can now focus on richer dependency extraction from references, transclusion, and literate chunks; access-controlled query gateways; and larger-scale memory/context packages.