Refresh planning layer for backend fabric

2026-05-04 03:25:26 +02:00
parent 3f08a27a24
commit b1577d90db
10 changed files with 797 additions and 2 deletions
--- a/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md
+++ b/workplans/MKTT-WP-0007-advanced-query-and-local-index-backend.md
@@ -28,6 +28,25 @@ This backend should later be able to index `MKTT-WP-0010` references, named
 regions, chunks, and processor provenance without changing its basic storage
 contract.

+## Preliminary Refinement - Snapshot Refresh Planning
+
+Implemented before starting the SQLite/index tasks: `SnapshotState`,
+`SnapshotPlanEntry`, `SnapshotRefreshPlan`, `plan_snapshot_refresh`,
+`load_snapshot_state_file`, and the CLI command `mkt backend refresh-plan`.
+
+This is the performance contract for WP-0007:
+
+- compare cheap metadata before hashing
+- hash only likely-changed files when `--verify-hashes` is requested
+- parse only files whose identity/content requires a new snapshot
+- index only new, changed, unindexed, or dependency-invalidated entries
+- propagate direct and transitive dependency invalidation along
+  `DependencyEdge` links
+- keep refresh planning inspectable through JSON/YAML/text output
+
+The future SQLite store should persist enough state to feed this planner
+directly and should report actual refresh work against the same categories.
+
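The planning contract above can be sketched as a single classification pass. This is a minimal illustration, not the `plan_snapshot_refresh` implementation. `StoredState`, `Observed`, `plan_refresh`, and the action names are hypothetical. Size plus nanosecond mtime is assumed to be the "cheap metadata". Treating a metadata change as an edit when `--verify-hashes` is off is also an assumption. Dependency invalidation is left out for brevity.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Callable


# Hypothetical, simplified stand-ins for persisted and on-disk state; the real
# SnapshotState also tracks parse options, contract hash, snapshot id, and edges.
@dataclass(frozen=True)
class StoredState:
    size: int
    mtime_ns: int
    content_hash: str
    parser_version: str
    indexed: bool


@dataclass(frozen=True)
class Observed:
    size: int
    mtime_ns: int


ACTIONS = ("hash", "parse", "index", "metadata_update", "delete", "unchanged")


def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_refresh(
    stored: dict[str, StoredState],
    observed: dict[str, Observed],
    parser_version: str,
    verify_hashes: bool = False,
    hash_file: Callable[[str], str] = sha256_file,
) -> dict[str, list[str]]:
    """Classify each path into refresh actions, cheapest checks first."""
    plan: dict[str, list[str]] = {action: [] for action in ACTIONS}
    # Paths that disappeared from disk only need their stored rows deleted.
    plan["delete"] = sorted(stored.keys() - observed.keys())
    for path in sorted(observed):
        obs, prev = observed[path], stored.get(path)
        if prev is None or prev.parser_version != parser_version:
            # New file or parser upgrade: re-parse and re-index without hashing.
            plan["parse"].append(path)
            plan["index"].append(path)
        elif (prev.size, prev.mtime_ns) == (obs.size, obs.mtime_ns):
            # Cheap metadata matches: never hash; index only if never indexed.
            plan["unchanged" if prev.indexed else "index"].append(path)
        elif verify_hashes:
            # Metadata changed: hash to separate real edits from touched files.
            plan["hash"].append(path)
            if hash_file(path) == prev.content_hash:
                plan["metadata_update"].append(path)
                if not prev.indexed:
                    plan["index"].append(path)
            else:
                plan["parse"].append(path)
                plan["index"].append(path)
        else:
            # Assumption: without verification, a metadata change counts as an edit.
            plan["parse"].append(path)
            plan["index"].append(path)
    return plan
```

Injecting `hash_file` keeps disk I/O out of the planner's tests. It also makes it easy to check that files with unchanged metadata are never hashed.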
 ## P7.1 - Implement local snapshot store

 ```task
@@ -40,6 +59,14 @@ state_hub_task_id: "8894a9a4-586c-457b-b4e6-add8276ff5f2"
 Persist parsed document snapshots and source metadata in a local cache
 directory.

+Implementation hints:
+
+- Persist `SnapshotState` fields in the snapshot/source tables.
+- Store path, size, mtime, content hash, parser id/version, parse options hash,
+  contract hash, snapshot id, indexed flag, and dependency edges.
+- Keep large document/token JSON lazy-loadable so refresh planning does not
+  pull whole AST payloads into memory.
+
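A minimal sketch of these hints using only the standard-library `sqlite3` module. The table and column names (`snapshot_state`, `dependency_edge`, `snapshot_payload`) and the helper functions are hypothetical. The point is the split: the planner reads a narrow metadata table, while document and token JSON sit in a separate table that is fetched by snapshot id only when needed.

```python
from __future__ import annotations

import json
import sqlite3

# Hypothetical schema sketch: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshot_state (
    path           TEXT PRIMARY KEY,
    size           INTEGER NOT NULL,
    mtime_ns       INTEGER NOT NULL,
    content_hash   TEXT NOT NULL,
    parser_id      TEXT NOT NULL,
    parser_version TEXT NOT NULL,
    options_hash   TEXT NOT NULL,
    contract_hash  TEXT NOT NULL,
    snapshot_id    TEXT NOT NULL,
    indexed        INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS dependency_edge (
    source_path TEXT NOT NULL,  -- the dependent document
    target_path TEXT NOT NULL,  -- what it references or transcludes
    PRIMARY KEY (source_path, target_path)
);
-- Large AST/token JSON is kept apart so planner queries never read it.
CREATE TABLE IF NOT EXISTS snapshot_payload (
    snapshot_id   TEXT PRIMARY KEY,
    document_json TEXT NOT NULL,
    tokens_json   TEXT NOT NULL
);
"""

STATE_COLUMNS = ("path", "size", "mtime_ns", "content_hash", "parser_id",
                 "parser_version", "options_hash", "contract_hash",
                 "snapshot_id", "indexed")


def save_snapshot(conn: sqlite3.Connection, state: dict, depends_on: list[str],
                  document: dict, tokens: list) -> None:
    """Persist metadata, dependency edges, and payload in one transaction."""
    placeholders = ", ".join(["?"] * len(STATE_COLUMNS))
    with conn:  # commits on success, rolls back on error
        # Column names are constants, so building this statement is safe.
        conn.execute(
            f"INSERT OR REPLACE INTO snapshot_state ({', '.join(STATE_COLUMNS)}) "
            f"VALUES ({placeholders})",
            [state[column] for column in STATE_COLUMNS],
        )
        # Replace this document's outgoing edges wholesale.
        conn.execute("DELETE FROM dependency_edge WHERE source_path = ?", (state["path"],))
        conn.executemany(
            "INSERT INTO dependency_edge VALUES (?, ?)",
            [(state["path"], target) for target in depends_on],
        )
        # Garbage-collecting payloads of superseded snapshot ids is omitted here.
        conn.execute(
            "INSERT OR REPLACE INTO snapshot_payload VALUES (?, ?, ?)",
            (state["snapshot_id"], json.dumps(document), json.dumps(tokens)),
        )


def load_planner_state(conn: sqlite3.Connection) -> dict[str, tuple]:
    """Read only the narrow columns the refresh planner compares."""
    rows = conn.execute(
        "SELECT path, size, mtime_ns, content_hash, parser_version, indexed "
        "FROM snapshot_state"
    )
    return {row[0]: row[1:] for row in rows}


def load_document(conn: sqlite3.Connection, snapshot_id: str) -> dict:
    """Fetch the AST payload lazily, only when a caller actually needs it."""
    row = conn.execute(
        "SELECT document_json FROM snapshot_payload WHERE snapshot_id = ?",
        (snapshot_id,),
    ).fetchone()
    return json.loads(row[0]) if row else {}
```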
 ## P7.2 - Add AST introspection commands

 ```task
@@ -86,6 +113,14 @@ and metrics in SQLite.
 Keep schema extension points for reference edges, named regions, chunks, and
 processor outputs.

+Implementation hints:
+
+- Use narrow metadata tables for hot refresh decisions.
+- Store document/token JSON separately from searchable section/block rows.
+- Add indexes on path, content hash, snapshot id, parser version, and unit ids.
+- Preserve source spans and content-unit ids from the WP-0010
+  reference/literate layers.
+
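One possible table layout for these hints, sketched with Python's built-in `sqlite3`. All names are hypothetical, and line-based spans are an assumption (byte offsets would work the same way). Document/token JSON is deliberately absent from the searchable rows. Each row keeps the span and content-unit id needed to map a hit back to its source.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema sketch; names are illustrative, not the WP-0007 design.
SCHEMA = """
-- Narrow, hot table consulted by refresh planning.
CREATE TABLE IF NOT EXISTS source (
    path           TEXT PRIMARY KEY,   -- the primary key already indexes path
    content_hash   TEXT NOT NULL,
    parser_version TEXT NOT NULL,
    snapshot_id    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_source_hash     ON source(content_hash);
CREATE INDEX IF NOT EXISTS idx_source_snapshot ON source(snapshot_id);
CREATE INDEX IF NOT EXISTS idx_source_parser   ON source(parser_version);

-- Searchable rows: plain text plus the span and unit id needed to map back.
CREATE TABLE IF NOT EXISTS section (
    snapshot_id TEXT NOT NULL,
    unit_id     TEXT NOT NULL,     -- content-unit id from the reference layer
    kind        TEXT NOT NULL,     -- e.g. 'section' or 'block'
    start_line  INTEGER NOT NULL,
    end_line    INTEGER NOT NULL,
    text        TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_section_unit     ON section(unit_id);
CREATE INDEX IF NOT EXISTS idx_section_snapshot ON section(snapshot_id);
"""


def spans_for_unit(conn: sqlite3.Connection, unit_id: str) -> list[tuple[str, int, int]]:
    """Resolve a content-unit id to (path, start_line, end_line) spans."""
    return conn.execute(
        """SELECT source.path, section.start_line, section.end_line
           FROM section JOIN source USING (snapshot_id)
           WHERE section.unit_id = ?
           ORDER BY source.path, section.start_line""",
        (unit_id,),
    ).fetchall()
```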
 ## P7.5 - Add FTS5 section/block search

 ```task
@@ -111,6 +146,16 @@ Refresh only changed files based on content hash and parser version.

 Include dependency invalidation hooks for future transclusion/reference graphs.

+Implementation hints:
+
+- Drive incremental refresh from `SnapshotRefreshPlan`.
+- The first pass should use cheap metadata; only hash metadata-changed files.
+- With `--verify-hashes`, skip parse/index when content is unchanged and only
+  update metadata.
+- Use reverse dependency edges for direct and transitive invalidation.
+- Report planned vs actual counts for hash, parse, index, metadata update,
+  delete, and invalidation work.
+
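A minimal sketch of direct and transitive invalidation over reverse edges. It assumes each dependency edge can be reduced to a hypothetical `(dependent, dependency)` pair of paths; the real `DependencyEdge` shape is not specified here. A breadth-first walk from the changed paths records each dependent's distance (1 means a direct dependent), and a visited check keeps reference cycles from looping. The `work_report` helper and its category names are likewise hypothetical. It only pairs planned and actual counts for reporting.

```python
from __future__ import annotations

from collections import Counter, defaultdict, deque
from typing import Iterable


def invalidated_dependents(
    edges: Iterable[tuple[str, str]],
    changed: set[str],
) -> dict[str, int]:
    """Map every dependent of a changed path to its distance (1 = direct).

    Each edge is (dependent, dependency): the first path references or
    transcludes the second. Paths already in `changed` are not repeated.
    """
    # Reverse the edges: dependency -> documents that depend on it.
    dependents: dict[str, list[str]] = defaultdict(list)
    for dependent, dependency in edges:
        dependents[dependency].append(dependent)

    distance: dict[str, int] = {}
    queue = deque((path, 0) for path in changed)
    while queue:
        path, depth = queue.popleft()
        for dependent in dependents.get(path, ()):
            # FIFO order yields the shortest distance; the visited check stops cycles.
            if dependent not in changed and dependent not in distance:
                distance[dependent] = depth + 1
                queue.append((dependent, depth + 1))
    return distance


def work_report(planned: Counter, actual: Counter) -> dict[str, tuple[int, int]]:
    """Pair planned vs actual counts per work category (missing counts are 0)."""
    categories = ("hash", "parse", "index", "metadata_update", "delete", "invalidate")
    return {category: (planned[category], actual[category]) for category in categories}
```

Recording the distance, rather than just membership, lets a report separate direct from transitive invalidation without a second traversal.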
 ## P7.7 - Add local index CLI

 ```task