markitect-tool/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md

id: MKTT-WP-0007
type: workplan
title: Advanced Query and Local Index Backend
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 60
depends_on_workplans: MKTT-WP-0006
related_workplans: MKTT-WP-0010
created: 2026-05-03
updated: 2026-05-04
state_hub_workstream_id: d61a82e4-651a-4df2-944a-9ff996b2e1f6

MKTT-WP-0007: Advanced Query and Local Index Backend

Purpose

Implement the first practical backend use case: cached AST introspection, JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents.

This backend should later be able to index MKTT-WP-0010 references, named regions, chunks, and processor provenance without changing its basic storage contract.

Preliminary Refinement - Snapshot Refresh Planning

Implemented before starting the SQLite/index tasks: SnapshotState, SnapshotPlanEntry, SnapshotRefreshPlan, plan_snapshot_refresh, load_snapshot_state_file, and the CLI command mkt backend refresh-plan.

This is the performance contract for WP-0007:

  • compare cheap metadata before hashing
  • hash only likely-changed files when --verify-hashes is requested
  • parse only files whose identity/content requires a new snapshot
  • index only new, changed, unindexed, or dependency-invalidated entries
  • carry direct and transitive dependency invalidation forward from DependencyEdge
  • keep refresh planning inspectable through JSON/YAML/text output

The future SQLite store should persist enough state to feed this planner directly and should report actual refresh work against the same categories.
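
The contract above can be illustrated with a minimal sketch. It is not the actual plan_snapshot_refresh implementation: FileState, plan_refresh, and the category names are hypothetical stand-ins, and the sketch assumes that size and mtime are the "cheap metadata" and that SHA-256 is the content hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FileState:
    """Hypothetical stand-in for the stored per-file state."""
    size: int
    mtime_ns: int
    content_hash: str


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_refresh(
    stored: Dict[str, FileState],
    current: Dict[str, Tuple[int, int]],  # path -> (size, mtime_ns)
    read_bytes: Callable[[str], bytes],
    verify_hashes: bool = False,
) -> Dict[str, List[str]]:
    """Classify paths using cheap metadata first, hashing only when asked."""
    plan: Dict[str, List[str]] = {
        "new": [], "changed": [], "metadata_only": [], "unchanged": [], "removed": [],
    }
    for path in sorted(current):
        old = stored.get(path)
        if old is None:
            plan["new"].append(path)
        elif (old.size, old.mtime_ns) == current[path]:
            # Metadata matches: no hashing, no parsing.
            plan["unchanged"].append(path)
        elif verify_hashes and sha256_bytes(read_bytes(path)) == old.content_hash:
            # Touched but identical content: only the metadata row needs updating.
            plan["metadata_only"].append(path)
        else:
            plan["changed"].append(path)
    plan["removed"] = sorted(set(stored) - set(current))
    return plan
```

In this sketch, files whose metadata is unchanged are never read, and hashing happens only for metadata-changed files when verify_hashes is set, which mirrors the first three bullets of the contract.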

P7.1 - Implement local snapshot store

id: MKTT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"

Persist parsed document snapshots and source metadata in a local cache directory.

Implemented: LocalSnapshotStore, SQLite schema initialization, source-state loading, parsed document JSON persistence, provenance envelope storage, and relative path handling. See docs/local-index-backend.md.

Implementation hints:

  • Persist SnapshotState fields in the snapshot/source tables.
  • Store path, size, mtime, content hash, parser id/version, parse options hash, contract hash, snapshot id, indexed flag, and dependency edges.
  • Keep large document/token JSON lazy-loadable so refresh planning does not pull whole AST payloads into memory.
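
As a rough illustration of the lazy-loading hint, the sketch below keeps hot refresh state in one table and the large JSON payload in another, so that planning reads never touch the payload. The table names, column names, and helper functions are assumptions made for this example, not the LocalSnapshotStore schema, and several of the listed fields (parse options hash, contract hash, dependency edges) are omitted for brevity.

```python
import json
import sqlite3

# Hypothetical two-table layout: narrow state rows plus a separate payload table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source (
    path TEXT PRIMARY KEY,
    size INTEGER NOT NULL,
    mtime_ns INTEGER NOT NULL,
    content_hash TEXT NOT NULL,
    parser_id TEXT NOT NULL,
    parser_version TEXT NOT NULL,
    snapshot_id TEXT NOT NULL,
    indexed INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS snapshot_payload (
    snapshot_id TEXT PRIMARY KEY,
    document_json TEXT NOT NULL
);
"""


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def save_snapshot(conn: sqlite3.Connection, state: dict, document: dict) -> None:
    """Write the state row and the JSON payload in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO source VALUES (:path, :size, :mtime_ns, "
            ":content_hash, :parser_id, :parser_version, :snapshot_id, :indexed)",
            state,
        )
        conn.execute(
            "INSERT OR REPLACE INTO snapshot_payload VALUES (?, ?)",
            (state["snapshot_id"], json.dumps(document)),
        )


def load_states(conn: sqlite3.Connection) -> dict:
    """Read planning state only; the payload table is not queried."""
    rows = conn.execute(
        "SELECT path, size, mtime_ns, content_hash, snapshot_id FROM source ORDER BY path"
    )
    return {row[0]: row[1:] for row in rows}


def load_document(conn: sqlite3.Connection, snapshot_id: str):
    """Fetch and decode one payload on demand."""
    row = conn.execute(
        "SELECT document_json FROM snapshot_payload WHERE snapshot_id = ?",
        (snapshot_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None
```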

P7.2 - Add AST introspection commands

id: MKTT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"

Add:

mkt ast show <file>
mkt ast stats <file>

Use the current parsed document and token model. Do not require cache presence for single-file use.

Implemented: mkt ast show <file> and mkt ast stats <file> with JSON, YAML, and tree/text output modes.
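
To give a feel for what a stats command might compute, here is a small sketch over a plain nested dict. The node shape (a "type" key and an optional "children" list) and the ast_stats name are assumptions for illustration; the real document and token model may differ.

```python
from collections import Counter


def ast_stats(root: dict) -> dict:
    """Count nodes by type and record the maximum nesting depth."""
    counts: Counter = Counter()
    max_depth = 0
    stack = [(root, 1)]  # iterative walk avoids recursion limits on deep trees
    while stack:
        node, depth = stack.pop()
        counts[node.get("type", "unknown")] += 1
        max_depth = max(max_depth, depth)
        for child in node.get("children", []):
            stack.append((child, depth + 1))
    return {
        "node_count": sum(counts.values()),
        "max_depth": max_depth,
        "by_type": dict(counts),
    }
```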

P7.3 - Add optional JSONPath query adapter

id: MKTT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "a7b46b32-f322-4fe0-a6fb-60b0b823593c"

Support JSONPath over Document.to_dict() behind an optional dependency and shared query result envelope.

Implemented: query_document_jsonpath() and extract_document_jsonpath() use the optional jsonpath-ng dependency and return the same QueryMatch envelope as the compact selector engine. The CLI commands mkt query and mkt extract accept --engine jsonpath.
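
The optional-dependency pattern can be sketched as follows. This is not query_document_jsonpath() itself: the query_jsonpath name and the {"path", "value"} result shape are hypothetical and only stand in for the QueryMatch envelope. The jsonpath-ng import is deferred so that callers without the package can still use the rest of the tool.

```python
from typing import Any, Dict, List


def query_jsonpath(document: Dict[str, Any], expression: str) -> List[Dict[str, Any]]:
    """Run a JSONPath expression over a plain dict, importing jsonpath-ng lazily."""
    try:
        from jsonpath_ng import parse  # optional dependency, imported on first use
    except ImportError as exc:
        raise ImportError(
            "JSONPath queries need the optional 'jsonpath-ng' package"
        ) from exc
    matches = parse(expression).find(document)
    # Hypothetical envelope: one dict per match with its path and value.
    return [{"path": str(m.full_path), "value": m.value} for m in matches]
```

For example, query_jsonpath(doc, "$.children[*].type") would list the type of each top-level child of a dict shaped like {"children": [...]}.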

P7.4 - Build SQLite metadata and JSON index

id: MKTT-WP-0007-T004
status: done
priority: medium
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"

Persist source files, content hashes, frontmatter, headings, sections, blocks, and metrics in SQLite.

Keep schema extension points for reference edges, named regions, chunks, and processor outputs.

Implementation hints:

  • Use narrow metadata tables for hot refresh decisions.
  • Store document/token JSON separately from searchable section/block rows.
  • Add indexes on path, content hash, snapshot id, parser version, and unit ids.
  • Preserve source spans and content-unit ids from WP-0010 reference/literate layers.

Implemented: source, heading, section, block, dependency, and metadata tables; document/frontmatter/metrics/provenance JSON payloads; hot-path indexes on path, content hash, snapshot id, parser identity, unit path, and dependency target.
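
One way to check that hot-path lookups really use their indexes is to inspect SQLite's query plan. The sketch below uses a deliberately reduced, hypothetical schema (two tables, two indexes) rather than the real one, and the uses_index helper is an assumption for this example.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create a reduced, illustrative schema with two hot-path indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source (
            path TEXT PRIMARY KEY,
            content_hash TEXT NOT NULL,
            snapshot_id TEXT NOT NULL
        );
        CREATE TABLE section (
            snapshot_id TEXT NOT NULL,
            unit_path TEXT NOT NULL,
            heading TEXT,
            text TEXT NOT NULL
        );
        CREATE INDEX idx_source_content_hash ON source (content_hash);
        CREATE INDEX idx_section_snapshot ON section (snapshot_id, unit_path);
        """
    )
    return conn


def uses_index(conn: sqlite3.Connection, sql: str, params: tuple, index_name: str) -> bool:
    """Return True if EXPLAIN QUERY PLAN mentions the given index for this query."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return any(index_name in row[-1] for row in plan)
```

A check like uses_index(conn, "SELECT path FROM source WHERE content_hash = ?", ("abc",), "idx_source_content_hash") can then run in a test suite to catch schema changes that silently turn an indexed lookup into a full table scan.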

P7.5 - Add FTS5 full-text search

id: MKTT-WP-0007-T005
status: done
priority: medium
state_hub_task_id: "0f03e9be-b6f0-4e4b-8220-3bbf638a892b"

Add full-text search over section and block text with source spans and relevance ranking.

Implemented: the local SQLite index creates an FTS5 search_units virtual table for sections and blocks, including path, snapshot id, unit kind/index, heading, text, source spans, and BM25 rank. The CLI command mkt search <text> queries it.
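
A minimal sketch of such a table, assuming the Python sqlite3 module is linked against an SQLite build with FTS5 enabled. The search_units name comes from the text above; the column names, the start_line/end_line representation of source spans, and the helper functions are assumptions for illustration.

```python
import sqlite3


def build_search_index(units) -> sqlite3.Connection:
    """Create an in-memory FTS5 table; only heading and text are tokenized."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE VIRTUAL TABLE search_units USING fts5(
            path UNINDEXED,
            snapshot_id UNINDEXED,
            unit_kind UNINDEXED,
            unit_index UNINDEXED,
            start_line UNINDEXED,
            end_line UNINDEXED,
            heading,
            text
        )
        """
    )
    conn.executemany("INSERT INTO search_units VALUES (?, ?, ?, ?, ?, ?, ?, ?)", units)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Return matching units, best match first (lower bm25 score is better)."""
    rows = conn.execute(
        "SELECT path, unit_kind, start_line, end_line, heading, "
        "bm25(search_units) AS rank "
        "FROM search_units WHERE search_units MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Marking the metadata columns UNINDEXED keeps them retrievable with each hit without adding them to the full-text index, so a query for a common word does not match on paths or ids.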

P7.6 - Add incremental refresh

id: MKTT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"

Refresh only changed files based on content hash and parser version.

Include dependency invalidation hooks for future transclusion/reference graphs.

Implementation hints:

  • Drive incremental refresh from SnapshotRefreshPlan.
  • The first pass should use cheap metadata; only hash metadata-changed files.
  • With --verify-hashes, skip parse/index when content is unchanged and only update metadata.
  • Use reverse dependency edges for direct and transitive invalidation.
  • Report planned vs actual counts for hash, parse, index, metadata update, delete, and invalidation work.

Implemented first pass: LocalSnapshotStore.build() drives refresh from SnapshotRefreshPlan, hashes metadata-changed files by default, skips unchanged content, updates metadata-only rows, refreshes changed snapshots, and deletes removed files.
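
The reverse-edge invalidation hint can be sketched as a breadth-first walk. The (source, target) tuple shape for edges and the invalidated_by name are assumptions for this example; the real DependencyEdge type may carry more fields.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def invalidated_by(changed: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every path that depends, directly or transitively, on a changed path.

    Each edge is (source, target), meaning `source` depends on `target`.
    """
    reverse = defaultdict(set)  # target -> set of sources that depend on it
    for source, target in edges:
        reverse[target].add(source)

    seen = set(changed)
    queue = deque(seen)
    invalidated: Set[str] = set()
    while queue:
        node = queue.popleft()
        for dependent in reverse[node]:
            if dependent not in seen:  # the seen set also guards against cycles
                seen.add(dependent)
                invalidated.add(dependent)
                queue.append(dependent)
    return invalidated
```

The result excludes the changed paths themselves, so it can be reported as a separate invalidation count next to the hash, parse, and index counts.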

P7.7 - Add local index CLI

id: MKTT-WP-0007-T007
status: done
priority: high
state_hub_task_id: "35cc63ff-3723-43d5-aaf6-f9312efa0f4b"

Add:

mkt cache init
mkt cache build <path>
mkt cache query <selector-or-query>
mkt search <text>

Implemented:

  • mkt cache init
  • mkt cache index <path>
  • mkt cache query <selector-or-query>
  • mkt search <text>

Path indexing shipped as mkt cache index <path> rather than the planned mkt cache build <path>. The older lightweight manifest commands remain available as mkt cache build, mkt cache status, and mkt cache fingerprint.

Exit Criteria

  • Legacy AST/JSONPath value is recovered as an optional backend.
  • Repeated local queries are faster than uncached queries and remain explainable.
  • Simple selectors still work without cache.