---
id: MKTT-WP-0007
type: workplan
title: "Advanced Query and Local Index Backend"
domain: markitect
status: done
owner: markitect-tool
topic_slug: markitect
planning_priority: P2
planning_order: 60
depends_on_workplans:
- MKTT-WP-0006
related_workplans:
- MKTT-WP-0010
created: "2026-05-03"
updated: "2026-05-04"
state_hub_workstream_id: "d61a82e4-651a-4df2-944a-9ff996b2e1f6"
---

# MKTT-WP-0007: Advanced Query and Local Index Backend

## Purpose

Implement the first practical backend use case: cached AST introspection,
JSONPath querying, SQLite metadata, and FTS5 search over Markdown documents.

This backend should later be able to index `MKTT-WP-0010` references, named
regions, chunks, and processor provenance without changing its basic storage
contract.

## Preliminary Refinement - Snapshot Refresh Planning

Implemented before starting the SQLite/index tasks: `SnapshotState`,
`SnapshotPlanEntry`, `SnapshotRefreshPlan`, `plan_snapshot_refresh`,
`load_snapshot_state_file`, and CLI `mkt backend refresh-plan`.

This is the performance contract for WP-0007:

- compare cheap metadata before hashing
- hash only likely-changed files when `--verify-hashes` is requested
- parse only files whose identity/content requires a new snapshot
- index only new, changed, unindexed, or dependency-invalidated entries
- carry direct and transitive dependency invalidation forward from
  `DependencyEdge`
- keep refresh planning inspectable through JSON/YAML/text output

The future SQLite store should persist enough state to feed this planner
directly and should report actual refresh work against the same categories.

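The metadata-first ordering in the contract above can be sketched as follows. This is a minimal illustration, not the actual `plan_snapshot_refresh` implementation: `FileMeta`, `classify`, the category names, and the `hash_file` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FileMeta:
    """Cheap filesystem metadata plus an optional stored hash (hypothetical shape)."""
    size: int
    mtime_ns: int
    content_hash: Optional[str] = None


def classify(
    path: str,
    stored: Optional[FileMeta],
    current: Optional[FileMeta],
    verify_hashes: bool = False,
    hash_file: Optional[Callable[[str], str]] = None,
) -> str:
    """Return 'new', 'deleted', 'unchanged', 'metadata-only', or 'changed'."""
    if stored is None:
        return "new"
    if current is None:
        return "deleted"
    # Step 1: cheap comparison that never reads file contents.
    if (stored.size, stored.mtime_ns) == (current.size, current.mtime_ns):
        return "unchanged"
    # Step 2: hash only files whose metadata moved, and only on request.
    if verify_hashes and hash_file is not None:
        if hash_file(path) == stored.content_hash:
            return "metadata-only"  # touched but not edited: no parse needed
    return "changed"
```

In this sketch only the `changed` and `new` categories would go on to parsing and indexing; `metadata-only` entries just need their stored size and mtime updated.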
## P7.1 - Implement local snapshot store

```task
id: MKTT-WP-0007-T001
status: done
priority: high
state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
```

Persist parsed document snapshots and source metadata in a local cache
directory.

Implemented: `LocalSnapshotStore`, SQLite schema initialization, source-state
loading, parsed document JSON persistence, provenance envelope storage, and
relative path handling. See `docs/local-index-backend.md`.

Implementation hints:

- Persist `SnapshotState` fields in the snapshot/source tables.
- Store path, size, mtime, content hash, parser id/version, parse options hash,
  contract hash, snapshot id, indexed flag, and dependency edges.
- Keep large document/token JSON lazy-loadable so refresh planning does not
  pull whole AST payloads into memory.

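One way to keep payloads lazy-loadable is to split narrow metadata rows from the JSON payload, so the planner never touches the large column. The sketch below assumes a two-table layout; the table names, column set, and helper functions are illustrative and not the actual `LocalSnapshotStore` schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (          -- narrow rows read on every refresh
        path TEXT PRIMARY KEY,
        size INTEGER NOT NULL,
        mtime_ns INTEGER NOT NULL,
        content_hash TEXT NOT NULL,
        snapshot_id TEXT NOT NULL
    );
    CREATE TABLE payloads (         -- large JSON, read only on demand
        snapshot_id TEXT PRIMARY KEY,
        document_json TEXT NOT NULL
    );
    """
)


def save_snapshot(path, size, mtime_ns, content_hash, snapshot_id, document):
    """Write metadata and payload in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO sources VALUES (?, ?, ?, ?, ?)",
            (path, size, mtime_ns, content_hash, snapshot_id),
        )
        conn.execute(
            "INSERT OR REPLACE INTO payloads VALUES (?, ?)",
            (snapshot_id, json.dumps(document)),
        )


def load_states():
    """Refresh planning reads only the narrow table."""
    sql = "SELECT path, size, mtime_ns, content_hash, snapshot_id FROM sources"
    return {row[0]: row[1:] for row in conn.execute(sql)}


def load_document(snapshot_id):
    """Fetch and decode a single payload when a query actually needs it."""
    row = conn.execute(
        "SELECT document_json FROM payloads WHERE snapshot_id = ?", (snapshot_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```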
## P7.2 - Add AST introspection commands

```task
id: MKTT-WP-0007-T002
status: done
priority: high
state_hub_task_id: "fb9eaa9d-5c20-49a9-a7a6-acae28ac5e20"
```

Add:

```text
mkt ast show <file>
mkt ast stats <file>
```

Use the current parsed document and token model. Do not require cache presence
for single-file use.

Implemented: `mkt ast show <file>` and `mkt ast stats <file>` with JSON, YAML,
and tree/text output modes.

## P7.3 - Add optional JSONPath query adapter

```task
id: MKTT-WP-0007-T003
status: done
priority: high
state_hub_task_id: "a7b46b32-f322-4fe0-a6fb-60b0b823593c"
```

Support JSONPath over `Document.to_dict()` behind an optional dependency and
shared query result envelope.

Implemented: `query_document_jsonpath()` and `extract_document_jsonpath()` use
the optional `jsonpath-ng` dependency and return the same `QueryMatch` envelope
as the compact selector engine. CLI `mkt query` and `mkt extract` accept
`--engine jsonpath`.

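The optional-dependency pattern can be illustrated with a deferred import, so that users of the compact selector engine never need `jsonpath-ng` installed. This is a rough sketch only: `query_jsonpath`, the plain-dict result shape, and the sample document are assumptions, and the real `query_document_jsonpath()` returns `QueryMatch` envelopes over whatever `Document.to_dict()` produces.

```python
def query_jsonpath(document: dict, expression: str) -> list:
    """Run a JSONPath expression over a plain dict (hypothetical helper)."""
    try:
        # Imported lazily so the dependency stays optional.
        from jsonpath_ng import parse
    except ImportError as exc:
        raise RuntimeError(
            "JSONPath queries need the optional 'jsonpath-ng' package"
        ) from exc
    return [
        {"path": str(match.full_path), "value": match.value}
        for match in parse(expression).find(document)
    ]


# Hypothetical document shape, for illustration only.
doc = {
    "blocks": [
        {"type": "heading", "text": "Purpose"},
        {"type": "paragraph", "text": "Implement the backend."},
    ]
}
```

Calling `query_jsonpath(doc, "$.blocks[*].type")` would yield one entry per block when `jsonpath-ng` is installed, and a clear error naming the missing package otherwise.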
## P7.4 - Build SQLite metadata and JSON index

```task
id: MKTT-WP-0007-T004
status: done
priority: medium
state_hub_task_id: "479f11a3-4ab4-451b-991c-7f143f2bffea"
```

Persist source files, content hashes, frontmatter, headings, sections, blocks,
and metrics in SQLite.

Keep schema extension points for reference edges, named regions, chunks, and
processor outputs.

Implementation hints:

- Use narrow metadata tables for hot refresh decisions.
- Store document/token JSON separately from searchable section/block rows.
- Add indexes on path, content hash, snapshot id, parser version, and unit ids.
- Preserve source spans and content-unit ids from WP-0010 reference/literate
  layers.

Implemented: source, heading, section, block, dependency, and metadata tables;
document/frontmatter/metrics/provenance JSON payloads; hot-path indexes on
path, content hash, snapshot id, parser identity, unit path, and dependency
target.

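Whether a hot-path lookup actually uses one of these indexes can be checked with SQLite's `EXPLAIN QUERY PLAN`. The sketch below uses a reduced, hypothetical `sources` table and index names, not the shipped schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sources (
        path TEXT PRIMARY KEY,
        content_hash TEXT NOT NULL,
        snapshot_id TEXT NOT NULL,
        parser_version TEXT NOT NULL
    );
    CREATE INDEX idx_sources_content_hash ON sources (content_hash);
    CREATE INDEX idx_sources_snapshot_id ON sources (snapshot_id);
    """
)


def plan_for(sql: str, params: tuple = ()) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


details = plan_for("SELECT path FROM sources WHERE content_hash = ?", ("abc",))
for line in details:
    print(line)
```

The exact wording of the plan text differs between SQLite versions, so a check like this is safer when it looks for the index name than when it compares whole strings.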
## P7.5 - Add FTS5 section/block search

```task
id: MKTT-WP-0007-T005
status: done
priority: medium
state_hub_task_id: "0f03e9be-b6f0-4e4b-8220-3bbf638a892b"
```

Add full-text search over section and block text with source spans and
relevance ranking.

Implemented: local SQLite index creates an FTS5 `search_units` virtual table
for sections and blocks, including path, snapshot id, unit kind/index, heading,
text, source spans, and BM25 rank. CLI `mkt search <text>` queries it.

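A reduced sketch of an FTS5 table with BM25 ordering is shown below. It assumes the Python `sqlite3` module is linked against an SQLite build with FTS5 enabled. The column set is a simplified stand-in for the real `search_units` table, and the sample rows and `search` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE VIRTUAL TABLE search_units USING fts5(
        path UNINDEXED,       -- stored for display, not searched
        unit_kind UNINDEXED,
        heading,
        text
    )
    """
)
rows = [
    ("docs/a.md", "section", "Purpose",
     "Cached AST introspection and JSONPath querying."),
    ("docs/b.md", "block", "Search",
     "Full-text search over section and block text."),
]
conn.executemany("INSERT INTO search_units VALUES (?, ?, ?, ?)", rows)


def search(text: str) -> list:
    """Return matching units, best match first (bm25 scores sort ascending)."""
    sql = """
        SELECT path, unit_kind, heading, bm25(search_units) AS rank
        FROM search_units
        WHERE search_units MATCH ?
        ORDER BY rank
    """
    return conn.execute(sql, (text,)).fetchall()


hits = search("jsonpath")
```

FTS5's `bm25()` returns lower values for better matches, which is why a plain ascending `ORDER BY` puts the most relevant unit first.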
## P7.6 - Add incremental refresh

```task
id: MKTT-WP-0007-T006
status: done
priority: medium
state_hub_task_id: "7d9472e6-0716-435b-866c-d2c66ad786cf"
```

Refresh only changed files based on content hash and parser version.

Include dependency invalidation hooks for future transclusion/reference graphs.

Implementation hints:

- Drive incremental refresh from `SnapshotRefreshPlan`.
- The first pass should use cheap metadata; only hash metadata-changed files.
- With `--verify-hashes`, skip parse/index when content is unchanged and only
  update metadata.
- Use reverse dependency edges for direct and transitive invalidation.
- Report planned vs actual counts for hash, parse, index, metadata update,
  delete, and invalidation work.

Implemented first pass: `LocalSnapshotStore.build()` drives refresh from
`SnapshotRefreshPlan`, hashes metadata-changed files by default, skips
unchanged content, updates metadata-only rows, refreshes changed snapshots, and
deletes removed files.

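Direct and transitive invalidation over reverse dependency edges amounts to a breadth-first walk from the changed files. The sketch below assumes edges are plain `(source, target)` pairs meaning "source depends on target"; the function name and edge shape are hypothetical and do not mirror the real `DependencyEdge` type.

```python
from collections import defaultdict, deque


def invalidated_by(changed, edges):
    """Return every file that depends, directly or transitively, on a changed file."""
    # Build the reverse map: target -> files that depend on it.
    reverse = defaultdict(set)
    for source, target in edges:
        reverse[target].add(source)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in seen:  # also guards against cycles
                seen.add(dependent)
                queue.append(dependent)
    # The changed files themselves are already scheduled for refresh.
    return seen - set(changed)


edges = [("a.md", "b.md"), ("b.md", "c.md"), ("d.md", "e.md")]
stale = invalidated_by({"c.md"}, edges)
```

Here `a.md` includes `b.md`, which includes `c.md`, so a change to `c.md` invalidates both, while `d.md` is left alone.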
## P7.7 - Add local index CLI

```task
id: MKTT-WP-0007-T007
status: done
priority: high
state_hub_task_id: "35cc63ff-3723-43d5-aaf6-f9312efa0f4b"
```

Add:

```text
mkt cache init
mkt cache build <path>
mkt cache query <selector-or-query>
mkt search <text>
```

Implemented:

- `mkt cache init`
- `mkt cache index <path>`
- `mkt cache query <selector-or-query>`
- `mkt search <text>`

The index build ships as `mkt cache index <path>` rather than the planned
`mkt cache build <path>`: the older lightweight manifest commands remain
available as `mkt cache build`, `mkt cache status`, and
`mkt cache fingerprint`.

## Exit Criteria

- Legacy AST/JSONPath value is recovered as an optional backend.
- Local repeated queries are faster and explainable.
- Simple selectors still work without cache.