---
id: SHARD-WP-0011
type: workplan
title: "incremental union maintenance + equivalence index + I-2 verification"
domain: whynot
repo: shard-wiki
status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
updated: "2026-06-15"
depends_on:
  - SHARD-WP-0007
  - SHARD-WP-0010
state_hub_workstream_id: "78d48bcf-6482-4266-bc81-084b7ec1cd80"
---

# SHARD-WP-0011 — Incremental union + equivalence index

## Goal

Replace direct, recompute-on-read resolution with the **incremental-first** derived tier from `CoreArchitectureBlueprint` §8.7: change-driven delta maintenance of the union/indexes, an **indexed equivalence** path (blocking/LSH, not O(N²)) with correct retraction/propagation (review B-4), and the **I-2 verification** mechanism (digest + background consistency-checker). Rebuild becomes a bounded fallback, not the operational path.

**Non-goal:** distributed maintenance; persisted on-disk index store (in-memory derived tier is fine for this slice). Per-tenant partitioning (I-13) is honoured structurally but multi-tenant deployment is later.

## Context

- Spec: blueprint §8.7 (incremental, blocking/LSH, rebuild-as-fallback), §8.4; review B-4 + open items O-1/O-4.
- Builds on union resolution (SHARD-WP-0007) and the views (SHARD-WP-0010) it accelerates.

---

## Equivalence index: blocking + verify

```task
id: SHARD-WP-0011-T1
status: done
priority: high
state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"
```

A candidate-generation (**blocking**) layer — normalized title/path buckets + MinHash/LSH bands over content shingles — then **verify** (fingerprint / span-set overlap + curator bindings) to produce equivalence edges. Replaces pairwise O(N²).

Tests: near-duplicates bucket together; unrelated pages don't; verified edges match a brute-force oracle on a small corpus.

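
The blocking + verify split in T1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the shard-wiki implementation: the helper names (`shingles`, `block_keys`, `equivalence_edges`, `brute_force_edges`), the 16-hash / 8-band layout, 3-word shingles, and a plain Jaccard cutoff of 0.5 standing in for the fingerprint / span-set verify step. Curator bindings are left out.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 16              # assumed MinHash signature length
BANDS = 8                    # assumed LSH layout: 8 bands x 2 rows
ROWS = NUM_HASHES // BANDS
THRESHOLD = 0.5              # assumed verify cutoff (Jaccard over shingles)


def shingles(text, k=3):
    """Set of k-word shingles; texts shorter than k words yield one shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle under a given seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(shingle_set):
    """MinHash signature: the minimum hash per seed."""
    return tuple(min(_hash(seed, s) for s in shingle_set)
                 for seed in range(NUM_HASHES))


def block_keys(title, shingle_set):
    """Blocking keys: one normalized-title bucket plus one bucket per LSH band."""
    keys = {("title", re.sub(r"[^a-z0-9]+", "", title.lower()))}
    sig = minhash(shingle_set)
    for band in range(BANDS):
        keys.add(("lsh", band, sig[band * ROWS:(band + 1) * ROWS]))
    return keys


def jaccard(a, b):
    return len(a & b) / len(a | b)


def equivalence_edges(pages):
    """pages: {page_id: (title, text)} -> (candidate pairs, verified edges)."""
    sh = {pid: shingles(text) for pid, (_, text) in pages.items()}
    buckets = defaultdict(set)
    for pid, (title, _) in pages.items():
        for key in block_keys(title, sh[pid]):
            buckets[key].add(pid)
    # Only pages sharing a bucket are ever compared.
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))
    edges = {(a, b) for a, b in candidates
             if jaccard(sh[a], sh[b]) >= THRESHOLD}
    return candidates, edges


def brute_force_edges(pages):
    """O(N^2) oracle used only to check the indexed path on a small corpus."""
    sh = {pid: shingles(text) for pid, (_, text) in pages.items()}
    return {(a, b) for a, b in combinations(sorted(pages), 2)
            if jaccard(sh[a], sh[b]) >= THRESHOLD}
```

LSH recall is probabilistic, so agreement with the oracle depends on the band/row choice and the corpus; the title bucket gives a deterministic path for pages whose titles normalize to the same string.
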
## Incremental maintenance (delta, not additive)

```task
id: SHARD-WP-0011-T2
status: done
priority: high
state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"
```

Change-driven delta updates: a changed/added/removed page **re-buckets**, then (per B-4) **retracts** edges it leaves, **adds** edges it enters, and **propagates** to equivalence neighbours (a retraction can split a chorus set). Drives union/BackLinks/RecentChanges deltas.

Tests: add/edit/remove keep the index equal to a from-scratch rebuild; a bucket-exit retracts a stale edge.

## I-2 verification: digest + consistency-checker

```task
id: SHARD-WP-0011-T3
status: done
priority: high
state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"
```

A per-partition **Merkle-style digest** of the derived tier, maintained alongside deltas, and a **background consistency-checker** that recomputes a sampled fold and compares; mismatch → **scoped recompute** of the affected region (self-healing). Makes `derived = f(canonical)` *verified*, not asserted.

Tests: induced drift is detected and repaired; digest stable under equivalent event orders.

## Wire incremental tier behind resolution + views

```task
id: SHARD-WP-0011-T4
status: done
priority: medium
state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"
```

Route `UnionGraph.resolve` and the SHARD-WP-0010 views through the maintained index (rebuild = explicit fallback). Behaviour is unchanged from the consumer's view; only freshness/cost change. Update SCOPE; `pytest` + pyflakes green.

---

## Acceptance criteria

- Equivalence is indexed (blocking/LSH + verify), not pairwise; matches a brute-force oracle.
- Incremental maintenance (with retraction + propagation) keeps the derived tier equal to a from-scratch rebuild; rebuild is a bounded fallback.
- I-2 is verified by a digest + consistency-checker that detects and self-heals drift.
- Consumer-visible resolution/views behaviour unchanged; `pytest` + pyflakes green; synced.

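
The delta maintenance described under T2 (re-bucket, retract, add, propagate) can be sketched as below. This is a toy under stated assumptions, not the project's code: `EquivalenceIndex`, `apply`, `chorus_of` and `rebuild_edges` are hypothetical names, blocking is one bucket per distinct word, and verify is word-set Jaccard >= 0.5. With that pairing, any two pages that verify necessarily share a bucket, so blocking loses no edges relative to the brute-force rebuild.

```python
from collections import defaultdict
from itertools import combinations


def block_keys(text):
    """Hypothetical stand-in for real blocking: one bucket per distinct word."""
    return set(text.lower().split())


def verify(a_text, b_text):
    """Hypothetical verify step: word-set Jaccard of at least 0.5."""
    wa, wb = block_keys(a_text), block_keys(b_text)
    union = wa | wb
    return bool(union) and len(wa & wb) / len(union) >= 0.5


class EquivalenceIndex:
    def __init__(self):
        self.pages = {}                   # page_id -> text
        self.buckets = defaultdict(set)   # blocking key -> page_ids
        self.adj = defaultdict(set)       # page_id -> equivalent page_ids

    def apply(self, page_id, text=None):
        """Apply one change event (text=None means removal).

        Returns the pages whose equivalence neighbourhood changed, so that
        downstream views can refresh only those.
        """
        touched = {page_id}
        old = self.pages.pop(page_id, None)
        if old is not None:
            # Leave the old buckets and retract every edge of this page.
            for key in block_keys(old):
                self.buckets[key].discard(page_id)
            for other in self.adj.pop(page_id, set()):
                self.adj[other].discard(page_id)
                touched.add(other)
        if text is not None:
            # Re-bucket, then verify only against co-bucketed pages.
            self.pages[page_id] = text
            candidates = set()
            for key in block_keys(text):
                candidates |= self.buckets[key]
                self.buckets[key].add(page_id)
            for other in candidates:
                if verify(text, self.pages[other]):
                    self.adj[page_id].add(other)
                    self.adj[other].add(page_id)
                    touched.add(other)
        return touched

    def edges(self):
        return {frozenset((a, b)) for a, ns in self.adj.items() for b in ns}

    def chorus_of(self, page_id):
        """Connected component of page_id in the equivalence graph."""
        seen, stack = {page_id}, [page_id]
        while stack:
            node = stack.pop()
            for n in self.adj.get(node, ()):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen


def rebuild_edges(pages):
    """From-scratch pairwise rebuild, used as the equality oracle."""
    return {frozenset((a, b)) for a, b in combinations(sorted(pages), 2)
            if verify(pages[a], pages[b])}
```

Editing a page that bridges two others retracts both of its edges, which is the case where a chorus set splits; the returned `touched` set is what a propagation step would act on.
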
## Suggested task order

T1 equivalence index → T2 incremental maintenance → T3 I-2 verification → T4 wiring.
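
For the I-2 mechanism in T3, one way to get a digest that is maintained alongside deltas and stable under equivalent event orders is an XOR fold of per-edge hashes. The sketch below swaps that simpler set hash in for the Merkle-style digest the workplan names, and it is not collision-hardened. `DerivedPartition`, `fold`, and `check_and_heal` are hypothetical names, and the `recompute` callback stands in for deriving one partition from canonical state.

```python
import hashlib


def edge_hash(edge):
    """128-bit hash of one edge, given as a tuple of page-id strings."""
    data = "\x1f".join(edge).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "big")


def fold(edges):
    """Order-independent digest of an edge set: XOR of the edge hashes."""
    acc = 0
    for edge in edges:
        acc ^= edge_hash(edge)
    return acc


class DerivedPartition:
    """One partition of the derived tier plus its running digest."""

    def __init__(self):
        self.edges = set()
        self.digest = 0

    def add(self, edge):
        if edge not in self.edges:
            self.edges.add(edge)
            self.digest ^= edge_hash(edge)   # XOR in

    def retract(self, edge):
        if edge in self.edges:
            self.edges.remove(edge)
            self.digest ^= edge_hash(edge)   # XOR out (self-inverse)


def check_and_heal(partition, recompute):
    """Compare the maintained digest with a from-canonical recompute.

    recompute() is a hypothetical callback returning the edge set this
    partition should hold. Returns True if drift was found and repaired.
    """
    expected = set(recompute())
    expected_digest = fold(expected)
    if (partition.digest == expected_digest
            and fold(partition.edges) == expected_digest):
        return False
    # Scoped recompute: replace only this partition's state.
    partition.edges = expected
    partition.digest = expected_digest
    return True
```

Because XOR is commutative and self-inverse, any sequence of adds and retractions that ends in the same edge set yields the same digest, which is the property the "equivalent event orders" test relies on.
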