---
id: MKTT-WP-0007
type: workplan
title: "Advanced Query and Local Index Backend"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 60
depends_on_workplans:
  - MKTT-WP-0006
related_workplans:
  - MKTT-WP-0010
created: "2026-05-03"
updated: "2026-05-04"
state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6"
---

# MKTT-WP-0007: Advanced Query and Local Index Backend

## Purpose

Implement the first practical backend use case: cached AST introspection, JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents.

This backend should later be able to index `MKTT-WP-0010` references, named regions, chunks, and processor provenance without changing its basic storage contract.

## Preliminary Refinement - Snapshot Refresh Planning

Implemented before starting the SQLite/index tasks: `SnapshotState`, `SnapshotPlanEntry`, `SnapshotRefreshPlan`, `plan_snapshot_refresh`, `load_snapshot_state_file`, and CLI `mkt backend refresh-plan`.

This is the performance contract for WP-0007:

- compare cheap metadata before hashing
- hash only likely-changed files when `--verify-hashes` is requested
- parse only files whose identity/content requires a new snapshot
- index only new, changed, unindexed, or dependency-invalidated entries
- carry direct and transitive dependency invalidation forward from `DependencyEdge`
- keep refresh planning inspectable through JSON/YAML/text output

The future SQLite store should persist enough state to feed this planner directly and should report actual refresh work against the same categories.

## P7.1 - Implement local snapshot store

```task
id: MKTT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
```

Persist parsed document snapshots and source metadata in a local cache directory.

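
As a rough illustration of what persisting snapshots and source metadata can look like, the following minimal sketch uses Python's standard `sqlite3` module. The table names, column names, and helper functions (`open_store`, `save_snapshot`, `load_source_state`, `load_document`) are hypothetical and are not taken from the actual `LocalSnapshotStore` schema. Reusing the content hash as the snapshot id is likewise an assumption made only to keep the example short.

```python
import json
import sqlite3

# Hypothetical schema: narrow source rows for refresh decisions,
# with the large document JSON kept in a separate table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sources (
    path TEXT PRIMARY KEY,
    size INTEGER NOT NULL,
    mtime_ns INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    snapshot_id TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS snapshots (
    snapshot_id TEXT PRIMARY KEY,
    document_json TEXT NOT NULL
);
"""


def open_store(db_path=":memory:"):
    """Open (or create) the store and make sure the tables exist."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def save_snapshot(conn, path, size, mtime_ns, content_hash, document):
    """Persist one parsed document together with its source metadata."""
    snapshot_id = content_hash  # assumption: content hash doubles as snapshot id
    with conn:  # commit both rows in a single transaction
        conn.execute(
            "INSERT OR REPLACE INTO snapshots VALUES (?, ?)",
            (snapshot_id, json.dumps(document)),
        )
        conn.execute(
            "INSERT OR REPLACE INTO sources VALUES (?, ?, ?, ?, ?)",
            (path, size, mtime_ns, content_hash, snapshot_id),
        )
    return snapshot_id


def load_source_state(conn):
    """Return only the cheap metadata; document JSON is never read here."""
    rows = conn.execute(
        "SELECT path, size, mtime_ns, content_hash, snapshot_id "
        "FROM sources ORDER BY path"
    )
    return {row[0]: row[1:] for row in rows}


def load_document(conn, snapshot_id):
    """Load the document payload for one snapshot, or None if absent."""
    row = conn.execute(
        "SELECT document_json FROM snapshots WHERE snapshot_id = ?",
        (snapshot_id,),
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Splitting metadata and payload into two tables means a metadata scan never has to deserialize document JSON; the payload is fetched only when a specific snapshot is requested.
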

Implemented: `LocalSnapshotStore`, SQLite schema initialization, source-state loading, parsed document JSON persistence, provenance envelope storage, and relative path handling. See `docs/local-index-backend.md`.

Implementation hints:

- Persist `SnapshotState` fields in the snapshot/source tables.
- Store path, size, mtime, content hash, parser id/version, parse options hash, contract hash, snapshot id, indexed flag, and dependency edges.
- Keep large document/token JSON lazy-loadable so refresh planning does not pull whole AST payloads into memory.

## P7.2 - Add AST introspection commands

```task
id: MKTT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
```

Add:

```text
mkt ast show
mkt ast stats
```

Use the current parsed document and token model. Do not require cache presence for single-file use.

Implemented: `mkt ast show` and `mkt ast stats` with JSON, YAML, tree/text output modes.

## P7.3 - Add optional JSONPath query adapter

```task
id: MKTT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "a7b46b32-f322-4fe0-a6fb-60b0b823593c"
```

Support JSONPath over `Document.to_dict()` behind an optional dependency and shared query result envelope.

Implemented: `query_document_jsonpath()` and `extract_document_jsonpath()` use the optional `jsonpath-ng` dependency and return the same `QueryMatch` envelope as the compact selector engine. CLI `mkt query` and `mkt extract` accept `--engine jsonpath`.

## P7.4 - Build SQLite metadata and JSON index

```task
id: MKTT-WP-0007-T004
status: done
priority: medium
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
```

Persist source files, content hashes, frontmatter, headings, sections, blocks, and metrics in SQLite. Keep schema extension points for reference edges, named regions, chunks, and processor outputs.

Implementation hints:

- Use narrow metadata tables for hot refresh decisions.
- Store document/token JSON separately from searchable section/block rows.
- Add indexes on path, content hash, snapshot id, parser version, and unit ids.
- Preserve source spans and content-unit ids from WP-0010 reference/literate layers.

Implemented: source, heading, section, block, dependency, and metadata tables; document/frontmatter/metrics/provenance JSON payloads; hot-path indexes on path, content hash, snapshot id, parser identity, unit path, and dependency target.

## P7.5 - Add FTS5 section/block search

```task
id: MKTT-WP-0007-T005
status: done
priority: medium
state_hub_task_id: "0f03e9be-b6f0-4e4b-8220-3bbf638a892b"
```

Add full-text search over section and block text with source spans and relevance ranking.

Implemented: local SQLite index creates an FTS5 `search_units` virtual table for sections and blocks, including path, snapshot id, unit kind/index, heading, text, source spans, and BM25 rank. CLI `mkt search` queries it.

## P7.6 - Add incremental refresh

```task
id: MKTT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
```

Refresh only changed files based on content hash and parser version. Include dependency invalidation hooks for future transclusion/reference graphs.

Implementation hints:

- Drive incremental refresh from `SnapshotRefreshPlan`.
- The first pass should use cheap metadata; only hash metadata-changed files.
- With `--verify-hashes`, skip parse/index when content is unchanged and only update metadata.
- Use reverse dependency edges for direct and transitive invalidation.
- Report planned vs actual counts for hash, parse, index, metadata update, delete, and invalidation work.

Implemented first pass: `LocalSnapshotStore.build()` drives refresh from `SnapshotRefreshPlan`, hashes metadata-changed files by default, skips unchanged content, updates metadata-only rows, refreshes changed snapshots, and deletes removed files.
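
The refresh behaviour described above (cheap metadata first, hashing only metadata-changed files, reverse-edge invalidation) can be sketched in a few lines of Python. This is an illustrative approximation under stated assumptions, not the project's `SnapshotRefreshPlan` or `plan_snapshot_refresh`: the names `FileMeta` and `plan_refresh`, the category labels, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Stored per-file state (hypothetical shape, not the real SnapshotState)."""
    size: int
    mtime_ns: int
    content_hash: str


def plan_refresh(stored, current_stat, read_bytes, reverse_deps):
    """Classify paths into refresh categories.

    stored:        {path: FileMeta} from the previous index run
    current_stat:  {path: (size, mtime_ns)} from a cheap directory scan
    read_bytes:    callable(path) -> bytes, called only when hashing is needed
    reverse_deps:  {path: set of paths that depend on it}
    """
    plan = {"unchanged": [], "metadata_only": [], "changed": [],
            "new": [], "deleted": [], "invalidated": []}

    for path, (size, mtime_ns) in sorted(current_stat.items()):
        old = stored.get(path)
        if old is None:
            plan["new"].append(path)
        elif (old.size, old.mtime_ns) == (size, mtime_ns):
            plan["unchanged"].append(path)  # cheap check passed, no hashing
        else:
            digest = hashlib.sha256(read_bytes(path)).hexdigest()
            if digest == old.content_hash:
                plan["metadata_only"].append(path)  # touched, same content
            else:
                plan["changed"].append(path)

    plan["deleted"] = sorted(set(stored) - set(current_stat))

    # Walk reverse dependency edges for direct and transitive invalidation.
    dirty = set(plan["changed"]) | set(plan["deleted"])
    seen = dirty | set(plan["new"])
    stack = list(dirty)
    while stack:
        for dependent in reverse_deps.get(stack.pop(), ()):
            if dependent not in seen and dependent in current_stat:
                seen.add(dependent)
                plan["invalidated"].append(dependent)
                stack.append(dependent)
    plan["invalidated"].sort()

    # An invalidated file must be re-indexed even if its own bytes are intact.
    invalidated = set(plan["invalidated"])
    for key in ("unchanged", "metadata_only"):
        plan[key] = [p for p in plan[key] if p not in invalidated]
    return plan
```

Counting the entries in each category gives the "planned" side of a planned-vs-actual report; the same labels can then be incremented as the work is actually performed.
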

## P7.7 - Add local index CLI

```task
id: MKTT-WP-0007-T007
status: done
priority: high
state_hub_task_id: "35cc63ff-3723-43d5-aaf6-f9312efa0f4b"
```

Add:

```text
mkt cache init
mkt cache build
mkt cache query
mkt search
```

Implemented:

- `mkt cache init`
- `mkt cache index`
- `mkt cache query`
- `mkt search`

The older lightweight manifest commands remain available as `mkt cache build`, `mkt cache status`, and `mkt cache fingerprint`.

## Exit Criteria

- Legacy AST/JSONPath value is recovered as an optional backend.
- Local repeated queries are faster and explainable.
- Simple selectors still work without cache.