---
id: MKTT-WP-0007
type: workplan
title: "Advanced Query and Local Index Backend"
domain: markitect
status: todo
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 60
depends_on_workplans:
  - MKTT-WP-0006
related_workplans:
  - MKTT-WP-0010
created: "2026-05-03"
updated: "2026-05-04"
state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6"
---

# MKTT-WP-0007: Advanced Query and Local Index Backend

## Purpose

Implement the first practical backend use case: cached AST introspection,
JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents.

This backend should later be able to index `MKTT-WP-0010` references, named
regions, chunks, and processor provenance without changing its basic storage
contract.

## Preliminary Refinement - Snapshot Refresh Planning

Implemented before starting the SQLite/index tasks: `SnapshotState`,
`SnapshotPlanEntry`, `SnapshotRefreshPlan`, `plan_snapshot_refresh`,
`load_snapshot_state_file`, and CLI `mkt backend refresh-plan`.

This is the performance contract for WP-0007:

- compare cheap metadata before hashing
- hash only likely-changed files when `--verify-hashes` is requested
- parse only files whose identity/content requires a new snapshot
- index only new, changed, unindexed, or dependency-invalidated entries
- carry direct and transitive dependency invalidation forward from
  `DependencyEdge`
- keep refresh planning inspectable through JSON/YAML/text output

The future SQLite store should persist enough state to feed this planner
directly and should report actual refresh work against the same categories.

## P7.1 - Implement local snapshot store

```task
id: MKTT-WP-0007-T001
status: todo
priority: high
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
```

Persist parsed document snapshots and source metadata in a local cache
directory.

Implementation hints:

- Persist `SnapshotState` fields in the snapshot/source tables.
- Store path, size, mtime, content hash, parser id/version, parse options
  hash, contract hash, snapshot id, indexed flag, and dependency edges.
- Keep large document/token JSON lazy-loadable so refresh planning does not
  pull whole AST payloads into memory.

## P7.2 - Add AST introspection commands

```task
id: MKTT-WP-0007-T002
status: todo
priority: high
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
```

Add:

```text
mkt ast show
mkt ast stats
```

Use the current parsed document and token model. Do not require cache presence
for single-file use.

## P7.3 - Add optional JSONPath query adapter

```task
id: MKTT-WP-0007-T003
status: todo
priority: high
state_hub_task_id: "a7b46b32-f322-4fe0-a6fb-60b0b823593c"
```

Support JSONPath over `Document.to_dict()` behind an optional dependency and
shared query result envelope.

## P7.4 - Build SQLite metadata and JSON index

```task
id: MKTT-WP-0007-T004
status: todo
priority: medium
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
```

Persist source files, content hashes, frontmatter, headings, sections, blocks,
and metrics in SQLite. Keep schema extension points for reference edges, named
regions, chunks, and processor outputs.

Implementation hints:

- Use narrow metadata tables for hot refresh decisions.
- Store document/token JSON separately from searchable section/block rows.
- Add indexes on path, content hash, snapshot id, parser version, and unit ids.
- Preserve source spans and content-unit ids from WP-0010 reference/literate
  layers.

## P7.5 - Add FTS5 section/block search

```task
id: MKTT-WP-0007-T005
status: todo
priority: medium
state_hub_task_id: "0f03e9be-b6f0-4e4b-8220-3bbf638a892b"
```

Add full-text search over section and block text with source spans and
relevance ranking.

## P7.6 - Add incremental refresh

```task
id: MKTT-WP-0007-T006
status: todo
priority: medium
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
```

Refresh only changed files based on content hash and parser version.
Include dependency invalidation hooks for future transclusion/reference graphs.

Implementation hints:

- Drive incremental refresh from `SnapshotRefreshPlan`.
- The first pass should use cheap metadata; only hash metadata-changed files.
- With `--verify-hashes`, skip parse/index when content is unchanged and only
  update metadata.
- Use reverse dependency edges for direct and transitive invalidation.
- Report planned vs actual counts for hash, parse, index, metadata update,
  delete, and invalidation work.

## P7.7 - Add local index CLI

```task
id: MKTT-WP-0007-T007
status: todo
priority: high
state_hub_task_id: "35cc63ff-3723-43d5-aaf6-f9312efa0f4b"
```

Add:

```text
mkt cache init
mkt cache build
mkt cache query
mkt search
```

## Exit Criteria

- Legacy AST/JSONPath value is recovered as an optional backend.
- Local repeated queries are faster and explainable.
- Simple selectors still work without cache.
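
The refresh rules from the performance contract and the P7.6 hints (cheap metadata first, hash only metadata-changed files under `--verify-hashes`, reverse dependency edges for transitive invalidation) can be illustrated with a minimal sketch. This is not the existing `plan_snapshot_refresh` implementation: `StoredEntry`, `ObservedFile`, `plan_refresh`, and the category names are hypothetical stand-ins, and treating new files as invalidation roots is an assumption.

```python
from __future__ import annotations

import hashlib
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredEntry:
    """Hypothetical persisted row per source file (not the real SnapshotState)."""
    path: str
    size: int
    mtime_ns: int
    content_hash: str
    depends_on: tuple[str, ...] = ()


@dataclass(frozen=True)
class ObservedFile:
    """Hypothetical cheap observation; `content` stands in for reading the file."""
    path: str
    size: int
    mtime_ns: int
    content: bytes


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_refresh(
    stored: dict[str, StoredEntry],
    observed: dict[str, ObservedFile],
    verify_hashes: bool = False,
) -> dict[str, list[str]]:
    """Classify paths into refresh categories without parsing anything."""
    plan: dict[str, list[str]] = {
        "new": [], "changed": [], "metadata_only": [],
        "deleted": [], "invalidated": [], "unchanged": [],
    }
    for path, obs in sorted(observed.items()):
        old = stored.get(path)
        if old is None:
            plan["new"].append(path)
        elif (old.size, old.mtime_ns) == (obs.size, obs.mtime_ns):
            # Cheap metadata matches: no hashing, no parsing.
            plan["unchanged"].append(path)
        elif verify_hashes and sha256_hex(obs.content) == old.content_hash:
            # Touched but identical content: update metadata, skip parse/index.
            plan["metadata_only"].append(path)
        else:
            plan["changed"].append(path)
    plan["deleted"] = sorted(set(stored) - set(observed))

    # Reverse edges: target path -> stored paths that depend on it.
    dependents: dict[str, set[str]] = {}
    for entry in stored.values():
        for target in entry.depends_on:
            dependents.setdefault(target, set()).add(entry.path)

    # Breadth-first walk gives direct and transitive invalidation.
    # Assumption: new, changed, and deleted files all act as roots.
    roots = set(plan["new"]) | set(plan["changed"]) | set(plan["deleted"])
    seen = set(roots)
    queue = deque(sorted(roots))
    while queue:
        current = queue.popleft()
        for dependent in sorted(dependents.get(current, ())):
            if dependent in seen or dependent not in observed:
                continue
            seen.add(dependent)
            plan["invalidated"].append(dependent)
            queue.append(dependent)

    # An invalidated file needs re-indexing, so it is no longer "unchanged".
    invalidated = set(plan["invalidated"])
    plan["unchanged"] = [p for p in plan["unchanged"] if p not in invalidated]
    return plan
```

In this sketch the length of each list is the "planned" count for that category, so an executor could report actual work against the same keys. A file can appear in both `metadata_only` and `invalidated` when it was touched without content changes but one of its dependencies changed.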