---
id: MKTT-WP-0007
type: workplan
title: "Advanced Query and Local Index Backend"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 60
depends_on_workplans:
- MKTT-WP-0006
related_workplans:
- MKTT-WP-0010
created: "2026-05-03"
updated: "2026-05-04"
state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6"
---
# MKTT-WP-0007: Advanced Query and Local Index Backend
## Purpose
Implement the first practical backend use case: cached AST introspection,
JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents.
This backend should later be able to index `MKTT-WP-0010` references, named
regions, chunks, and processor provenance without changing its basic storage
contract.
## Preliminary Refinement - Snapshot Refresh Planning
Implemented before starting the SQLite/index tasks: `SnapshotState`,
`SnapshotPlanEntry`, `SnapshotRefreshPlan`, `plan_snapshot_refresh`,
`load_snapshot_state_file`, and CLI `mkt backend refresh-plan`.
This is the performance contract for WP-0007:
- compare cheap metadata before hashing
- hash only likely-changed files when `--verify-hashes` is requested
- parse only files whose identity/content requires a new snapshot
- index only new, changed, unindexed, or dependency-invalidated entries
- carry direct and transitive dependency invalidation forward from
`DependencyEdge`
- keep refresh planning inspectable through JSON/YAML/text output
The future SQLite store should persist enough state to feed this planner
directly and should report actual refresh work against the same categories.
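The contract above can be sketched as a per-file classification step. This is a minimal illustration only: `StoredState`, `Observed`, `classify`, and the category strings are hypothetical names, not the actual `SnapshotState` or `plan_snapshot_refresh` API, and dependency invalidation is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StoredState:
    # Hypothetical subset of the state persisted for one source file.
    size: int
    mtime_ns: int
    content_hash: str
    indexed: bool


@dataclass(frozen=True)
class Observed:
    # Cheap metadata read from the filesystem, with no file content access.
    size: int
    mtime_ns: int


def classify(
    stored: Optional[StoredState],
    observed: Optional[Observed],
    hash_file: Callable[[], str],
    verify_hashes: bool = False,
) -> str:
    """Return a refresh category for one path (category names are assumed)."""
    if observed is None:
        return "deleted"
    if stored is None:
        return "new"
    same_metadata = (stored.size, stored.mtime_ns) == (observed.size, observed.mtime_ns)
    if same_metadata:
        # Nothing to hash or parse; index only if that step was never done.
        return "unchanged" if stored.indexed else "unindexed"
    # Metadata differs: hash only now, and only when verification is requested.
    if verify_hashes and hash_file() == stored.content_hash:
        return "metadata-only"
    return "changed"
```

Passing the hash as a callable keeps the expensive read lazy, so a file with unchanged metadata is never opened.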
## P7.1 - Implement local snapshot store
```task
id: MKTT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
```
Persist parsed document snapshots and source metadata in a local cache
directory.
Implemented: `LocalSnapshotStore`, SQLite schema initialization, source-state
loading, parsed document JSON persistence, provenance envelope storage, and
relative path handling. See `docs/local-index-backend.md`.
Implementation hints:
- Persist `SnapshotState` fields in the snapshot/source tables.
- Store path, size, mtime, content hash, parser id/version, parse options hash,
contract hash, snapshot id, indexed flag, and dependency edges.
- Keep large document/token JSON lazy-loadable so refresh planning does not
pull whole AST payloads into memory.
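One way to read these hints is a narrow state table next to a separate payload table. The sketch below uses Python's standard `sqlite3` module; the table names, column names, and helper functions are assumptions for illustration and are not taken from `LocalSnapshotStore`. Dependency edges are omitted for brevity.

```python
import sqlite3

# Assumed layout: hot refresh fields in one table, large JSON in another.
SCHEMA = """
CREATE TABLE source_state (
    path               TEXT PRIMARY KEY,
    size               INTEGER NOT NULL,
    mtime_ns           INTEGER NOT NULL,
    content_hash       TEXT NOT NULL,
    parser_id          TEXT NOT NULL,
    parser_version     TEXT NOT NULL,
    parse_options_hash TEXT NOT NULL,
    contract_hash      TEXT NOT NULL,
    snapshot_id        TEXT NOT NULL,
    indexed            INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE snapshot_payload (
    snapshot_id   TEXT PRIMARY KEY,
    document_json TEXT NOT NULL
);
"""


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables in a fresh database and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def load_states(conn: sqlite3.Connection) -> dict:
    """Read only the narrow state rows; payload JSON is never touched here."""
    rows = conn.execute(
        "SELECT path, size, mtime_ns, content_hash, indexed "
        "FROM source_state ORDER BY path"
    )
    return {row[0]: row[1:] for row in rows}


def load_document_json(conn: sqlite3.Connection, snapshot_id: str):
    """Fetch one document payload on demand, or None if it is missing."""
    row = conn.execute(
        "SELECT document_json FROM snapshot_payload WHERE snapshot_id = ?",
        (snapshot_id,),
    ).fetchone()
    return row[0] if row else None
```

Refresh planning would call only `load_states`, so its memory use grows with the number of files rather than with AST size.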
## P7.2 - Add AST introspection commands
```task
id: MKTT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
```
Add:
```text
mkt ast show <file>
mkt ast stats <file>
```
Use the current parsed document and token model. Do not require cache presence
for single-file use.
Implemented: `mkt ast show <file>` and `mkt ast stats <file>` with JSON, YAML,
and tree/text output modes.
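As a rough illustration of what a stats pass over a parsed document could compute without any cache, the sketch below walks a plain dictionary tree. The `type` and `children` keys and the `ast_stats` function are assumptions; the real document and token model may be shaped differently.

```python
from collections import Counter


def ast_stats(root: dict) -> dict:
    """Count nodes by type and record the maximum depth of a dict-based tree."""
    counts = Counter()
    max_depth = 0
    stack = [(root, 1)]  # iterative walk avoids recursion limits on deep trees
    while stack:
        node, depth = stack.pop()
        counts[node.get("type", "unknown")] += 1
        max_depth = max(max_depth, depth)
        for child in node.get("children", []):
            stack.append((child, depth + 1))
    return {
        "nodes": sum(counts.values()),
        "max_depth": max_depth,
        "by_type": dict(counts),
    }
```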
## P7.3 - Add optional JSONPath query adapter
```task
id: MKTT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "a7b46b32-f322-4fe0-a6fb-60b0b823593c"
```
Support JSONPath over `Document.to_dict()` behind an optional dependency and
shared query result envelope.
Implemented: `query_document_jsonpath()` and `extract_document_jsonpath()` use
the optional `jsonpath-ng` dependency and return the same `QueryMatch` envelope
as the compact selector engine. CLI `mkt query` and `mkt extract` accept
`--engine jsonpath`.
## P7.4 - Build SQLite metadata and JSON index
```task
id: MKTT-WP-0007-T004
status: done
priority: medium
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
```
Persist source files, content hashes, frontmatter, headings, sections, blocks,
and metrics in SQLite.
Keep schema extension points for reference edges, named regions, chunks, and
processor outputs.
Implementation hints:
- Use narrow metadata tables for hot refresh decisions.
- Store document/token JSON separately from searchable section/block rows.
- Add indexes on path, content hash, snapshot id, parser version, and unit ids.
- Preserve source spans and content-unit ids from WP-0010 reference/literate
layers.
Implemented: source, heading, section, block, dependency, and metadata tables;
document/frontmatter/metrics/provenance JSON payloads; hot-path indexes on
path, content hash, snapshot id, parser identity, unit path, and dependency
target.
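To check that a hot-path index is actually used, SQLite's `EXPLAIN QUERY PLAN` can be run against the lookup. The sketch below assumes a hypothetical `headings` table and index name; only the `sqlite3` calls are real.

```python
import sqlite3


def create_heading_table(conn: sqlite3.Connection) -> None:
    """Create an assumed narrow heading table with source spans and an index."""
    conn.executescript(
        """
        CREATE TABLE headings (
            snapshot_id TEXT NOT NULL,
            path        TEXT NOT NULL,
            unit_path   TEXT NOT NULL,
            level       INTEGER NOT NULL,
            title       TEXT NOT NULL,
            start_line  INTEGER NOT NULL,
            end_line    INTEGER NOT NULL
        );
        CREATE INDEX idx_headings_path ON headings(path);
        """
    )


def headings_for(conn: sqlite3.Connection, path: str) -> list:
    """Return (title, start_line, end_line) rows for one file, in source order."""
    return conn.execute(
        "SELECT title, start_line, end_line FROM headings "
        "WHERE path = ? ORDER BY start_line",
        (path,),
    ).fetchall()


def plan_for_path_lookup(conn: sqlite3.Connection) -> list:
    """Return the planner's detail strings for the path lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT title FROM headings WHERE path = ?", ("x",)
    ).fetchall()
    return [row[3] for row in rows]  # the detail text is the fourth column
```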
## P7.5 - Add FTS5 section/block search
```task
id: MKTT-WP-0007-T005
status: done
priority: medium
state_hub_task_id: "0f03e9be-b6f0-4e4b-8220-3bbf638a892b"
```
Add full-text search over section and block text with source spans and
relevance ranking.
Implemented: local SQLite index creates an FTS5 `search_units` virtual table
for sections and blocks, including path, snapshot id, unit kind/index, heading,
text, source spans, and BM25 rank. CLI `mkt search <text>` queries it.
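A reduced version of such a table can be sketched with the standard `sqlite3` module, assuming the linked SQLite build includes FTS5. The column set and helper names below are illustrative and smaller than the implemented `search_units` table.

```python
import sqlite3


def build_search_index(conn: sqlite3.Connection, units: list) -> None:
    """Create an FTS5 table; UNINDEXED columns are stored but not searched."""
    conn.execute(
        """
        CREATE VIRTUAL TABLE search_units USING fts5(
            path UNINDEXED,
            unit_kind UNINDEXED,
            start_line UNINDEXED,
            heading,
            text
        )
        """
    )
    conn.executemany(
        "INSERT INTO search_units (path, unit_kind, start_line, heading, text) "
        "VALUES (?, ?, ?, ?, ?)",
        units,
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list:
    """Return matches best-first; bm25() yields lower values for better matches."""
    return conn.execute(
        """
        SELECT path, unit_kind, start_line, heading, bm25(search_units) AS score
        FROM search_units
        WHERE search_units MATCH ?
        ORDER BY score
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

Because `bm25()` returns smaller numbers for stronger matches, an ascending sort puts the most relevant units first.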
## P7.6 - Add incremental refresh
```task
id: MKTT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
```
Refresh only changed files based on content hash and parser version.
Include dependency invalidation hooks for future transclusion/reference graphs.
Implementation hints:
- Drive incremental refresh from `SnapshotRefreshPlan`.
- The first pass should use cheap metadata; only hash metadata-changed files.
- With `--verify-hashes`, skip parse/index when content is unchanged and only
update metadata.
- Use reverse dependency edges for direct and transitive invalidation.
- Report planned vs actual counts for hash, parse, index, metadata update,
delete, and invalidation work.
Implemented first pass: `LocalSnapshotStore.build()` drives refresh from
`SnapshotRefreshPlan`, hashes metadata-changed files by default, skips
unchanged content, updates metadata-only rows, refreshes changed snapshots, and
deletes removed files.
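The reverse-edge invalidation named in the hints can be sketched as a breadth-first walk. The `(source, target)` tuple shape and the `invalidated_by` function are assumptions for illustration, not the `DependencyEdge` model itself.

```python
from collections import defaultdict, deque


def invalidated_by(changed: set, edges: list) -> set:
    """Return files invalidated, directly or transitively, by changed files.

    Each edge is (source, target), meaning source depends on target.
    """
    dependents = defaultdict(set)  # reverse index: target -> sources
    for source, target in edges:
        dependents[target].add(source)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:  # the seen set also guards against cycles
                seen.add(dependent)
                queue.append(dependent)
    return seen - set(changed)
```

If `a.md` includes `b.md` and `b.md` includes `c.md`, a change to `c.md` invalidates both `a.md` and `b.md`, while unrelated files stay untouched.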
## P7.7 - Add local index CLI
```task
id: MKTT-WP-0007-T007
status: done
priority: high
state_hub_task_id: "35cc63ff-3723-43d5-aaf6-f9312efa0f4b"
```
Add:
```text
mkt cache init
mkt cache build <path>
mkt cache query <selector-or-query>
mkt search <text>
```
Implemented:
- `mkt cache init`
- `mkt cache index <path>`
- `mkt cache query <selector-or-query>`
- `mkt search <text>`
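Index building shipped as `mkt cache index <path>` rather than the planned
`mkt cache build <path>`. The older lightweight manifest commands remain
available as `mkt cache build`, `mkt cache status`, and `mkt cache fingerprint`.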
## Exit Criteria
- Legacy AST/JSONPath value is recovered as an optional backend.
- Repeated local queries are faster with the cache than without it, and refresh
decisions are explainable.
- Simple selectors still work without cache.