shard-wiki/workplans/SHARD-WP-0011-incremental-union.md
tegwick 37681d89b6 feat(incremental): wire maintained tier behind views; rebuild fallback (WP-0011 T4)
Route InformationSpace.all_pages through a maintained UnionIndex: equivalence is
served from the incrementally maintained index (curator bindings re-synced live
from the log fold + detected content edges), exposed in decision-log string form
so results are a behaviour-preserving superset. The index is built lazily and
rebuilt (bounded fallback) when the union mutates (attach/edit invalidate it);
reindex() forces a rebuild and verify_index() runs the I-2 self-healing checker.
all_pages() gains an optional equivalence_groups source (default = fold) so
direct callers are unaffected. SCOPE updated; WP-0011 done. 173 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:21:39 +02:00


id: SHARD-WP-0011
type: workplan
title: incremental union maintenance + equivalence index + I-2 verification
domain: whynot
repo: shard-wiki
status: done
owner: tegwick
topic_slug: whynot
created: 2026-06-15
updated: 2026-06-15
depends_on: SHARD-WP-0007, SHARD-WP-0010
state_hub_workstream_id: 78d48bcf-6482-4266-bc81-084b7ec1cd80

SHARD-WP-0011 — Incremental union + equivalence index

Goal

Replace direct, recompute-on-read resolution with the incremental-first derived tier from CoreArchitectureBlueprint §8.7: change-driven delta maintenance of the union/indexes, an indexed equivalence path (blocking/LSH, not O(N²)) with correct retraction/propagation (review B-4), and the I-2 verification mechanism (digest + background consistency-checker). Rebuild becomes a bounded fallback, not the operational path.

Non-goals: distributed maintenance and a persisted on-disk index store (an in-memory derived tier is fine for this slice). Per-tenant partitioning (I-13) is honoured structurally, but multi-tenant deployment comes later.

Context

  • Spec: blueprint §8.7 (incremental, blocking/LSH, rebuild-as-fallback), §8.4; review B-4 + open items O-1/O-4.
  • Builds on union resolution (SHARD-WP-0007) and the views (SHARD-WP-0010) it accelerates.

Equivalence index: blocking + verify

id: SHARD-WP-0011-T1
status: done
priority: high
state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"

A candidate-generation (blocking) layer — normalized title/path buckets + MinHash/LSH bands over content shingles — then verify (fingerprint / span-set overlap + curator bindings) to produce equivalence edges. Replaces pairwise O(N²). Tests: near-duplicates bucket together; unrelated pages don't; verified edges match a brute-force oracle on a small corpus.
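
As an illustration of the block-then-verify shape described above, here is a minimal sketch. It is not the repository's implementation: the function names, the `(title, body)` page shape, the 3-word shingles, the 16×2 band layout and the 0.5 Jaccard threshold are all assumptions made for the example, and curator bindings are left out of the verify step.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(text, k=3):
    """k-word shingles of lowercased, whitespace-tokenized text."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash(sh, num_hashes):
    """Signature: for each seed, the minimum 64-bit hash over all shingles."""
    def h(seed, s):
        d = hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "big")
    return tuple(min(h(seed, s) for s in sh) for seed in range(num_hashes))

def candidate_pairs(pages, bands=16, rows=2):
    """Blocking: pages sharing a normalized title or any LSH band are candidates."""
    buckets = defaultdict(set)
    for pid, (title, body) in pages.items():
        buckets[("title", " ".join(title.lower().split()))].add(pid)
        sh = shingles(body)
        if sh:  # pages with empty bodies are only bucketed by title
            sig = minhash(sh, bands * rows)
            for b in range(bands):
                buckets[("band", b, sig[b * rows:(b + 1) * rows])].add(pid)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def verified_edges(pages, threshold=0.5):
    """Verify step: keep only candidates whose shingle overlap clears the threshold."""
    sh = {pid: shingles(body) for pid, (_, body) in pages.items()}
    return {(a, b) for a, b in candidate_pairs(pages)
            if jaccard(sh[a], sh[b]) >= threshold}

def brute_force_edges(pages, threshold=0.5):
    """O(N^2) oracle for small corpora, used only in tests."""
    sh = {pid: shingles(body) for pid, (_, body) in pages.items()}
    return {(a, b) for a, b in combinations(sorted(pages), 2)
            if jaccard(sh[a], sh[b]) >= threshold}

pages = {
    "shard-a/install": ("Install Guide",
                        "download the package and run the installer then reboot the machine"),
    "shard-b/install": ("install guide",
                        "download the package and run the installer then reboot the machine"),
    "shard-a/faq": ("FAQ",
                    "answers to common questions about licensing and support contracts"),
}
edges = verified_edges(pages)
```

Because verification only filters candidates, the indexed result is always a subset of the oracle. LSH can miss a true pair with small probability, which is why the oracle comparison on a small corpus is a useful test.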

Incremental maintenance (delta, not additive)

id: SHARD-WP-0011-T2
status: done
priority: high
state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"

Change-driven delta updates: a changed/added/removed page is re-bucketed, then (per B-4) edges that no longer hold once it leaves a bucket are retracted, edges for the buckets it enters are added, and the change propagates to equivalence neighbours (a retraction can split a chorus set). Drives union/BackLinks/RecentChanges deltas. Tests: add/edit/remove keep the index equal to a from-scratch rebuild; a bucket-exit retracts a stale edge.
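
The retract-then-add delta can be sketched as follows. This is a simplified illustration rather than the project's code: `DeltaIndex`, `rebuild`, the title-word blocking keys and the word-set Jaccard check are hypothetical stand-ins, and chorus sets are derived here as connected components of the surviving edges instead of being propagated edge by edge.

```python
from collections import defaultdict

class DeltaIndex:
    """Hypothetical in-memory index maintained by per-page deltas."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.pages = {}                   # pid -> (title, body)
        self.buckets = defaultdict(set)   # blocking key -> pids
        self.edges = set()                # frozenset({a, b})

    @staticmethod
    def _keys(title):
        return set(title.lower().split())

    def _similar(self, a, b):
        wa = set(self.pages[a][1].lower().split())
        wb = set(self.pages[b][1].lower().split())
        union = wa | wb
        return bool(union) and len(wa & wb) / len(union) >= self.threshold

    def remove(self, pid):
        """Retraction: leave every bucket and drop every edge touching pid."""
        if pid not in self.pages:
            return
        title, _ = self.pages.pop(pid)
        for key in self._keys(title):
            self.buckets[key].discard(pid)
            if not self.buckets[key]:
                del self.buckets[key]
        self.edges = {e for e in self.edges if pid not in e}

    def upsert(self, pid, title, body):
        """Add or edit: retract the old state, re-bucket, verify new candidates."""
        self.remove(pid)
        self.pages[pid] = (title, body)
        for key in self._keys(title):
            for other in self.buckets[key]:
                if self._similar(pid, other):
                    self.edges.add(frozenset((pid, other)))
            self.buckets[key].add(pid)

    def groups(self):
        """Chorus sets as connected components of the current edges."""
        adj = defaultdict(set)
        for e in self.edges:
            a, b = tuple(e)
            adj[a].add(b)
            adj[b].add(a)
        seen, out = set(), set()
        for pid in self.pages:
            if pid in seen:
                continue
            stack, comp = [pid], set()
            while stack:
                n = stack.pop()
                if n not in comp:
                    comp.add(n)
                    stack.extend(adj[n] - comp)
            seen |= comp
            out.add(frozenset(comp))
        return out

def rebuild(snapshot, threshold=0.8):
    """From-scratch rebuild over a page snapshot, used as the comparison baseline."""
    fresh = DeltaIndex(threshold)
    for pid, (title, body) in snapshot.items():
        fresh.upsert(pid, title, body)
    return fresh

idx = DeltaIndex(threshold=0.8)
idx.upsert("a", "Install Guide", "run the installer then reboot")
idx.upsert("b", "install guide", "run the installer then reboot now")
idx.upsert("c", "Install Guide", "run the installer then reboot now please")
groups_before = idx.groups()   # a and c are linked only through b
idx.upsert("b", "Changelog", "run the installer then reboot now")
groups_after = idx.groups()    # b left the bucket, so the set splits
```

In this example `a` and `c` are not similar enough to be linked directly, so retitling `b` retracts both of its edges and splits the three-page set into singletons.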

I-2 verification: digest + consistency-checker

id: SHARD-WP-0011-T3
status: done
priority: high
state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"

A per-partition Merkle-style digest of the derived tier, maintained alongside deltas, and a background consistency-checker that recomputes a sampled fold and compares; mismatch → scoped recompute of the affected region (self-healing). Makes derived = f(canonical) verified, not asserted. Tests: induced drift is detected and repaired; digest stable under equivalent event orders.
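
One way to picture the digest and checker is the sketch below. It swaps in an order-independent XOR fold of per-entry SHA-256 hashes for a real Merkle tree, covers a single partition, and checks every entry instead of a sample. `Partition`, `fold`, `check_and_heal` and the `(op, key, value)` event shape are assumptions for illustration, not the project's API.

```python
import hashlib

def entry_hash(key, value):
    data = repr((key, value)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def digest_of(entries):
    """Recompute the digest from scratch: XOR of all entry hashes."""
    d = 0
    for key, value in entries.items():
        d ^= entry_hash(key, value)
    return d

class Partition:
    """Derived entries plus a digest maintained alongside each delta."""

    def __init__(self):
        self.entries = {}
        self.digest = 0

    def put(self, key, value):
        if key in self.entries:
            self.digest ^= entry_hash(key, self.entries[key])  # cancel old entry
        self.entries[key] = value
        self.digest ^= entry_hash(key, value)

    def delete(self, key):
        if key in self.entries:
            self.digest ^= entry_hash(key, self.entries.pop(key))

def fold(events):
    """Canonical recompute: replay (op, key, value) events into a fresh partition."""
    part = Partition()
    for op, key, value in events:
        if op == "put":
            part.put(key, value)
        else:
            part.delete(key)
    return part

def check_and_heal(derived, events):
    """Return True if drift was found; the partition is then replaced by the fold."""
    expected = fold(events)
    if derived.digest == expected.digest == digest_of(derived.entries):
        return False
    derived.entries = dict(expected.entries)
    derived.digest = expected.digest
    return True

events = [("put", "page:x", "v1"), ("put", "page:y", "v1"), ("put", "page:x", "v2")]
derived = fold(events)
derived.entries["page:x"] = "stale"          # induced drift, bypassing put()
drift_found = check_and_heal(derived, events)
```

Because XOR is commutative and an overwritten entry cancels itself out, two event orders that reach the same final state produce the same digest. The checker recomputes the digest from the stored entries as well as comparing the maintained value, so drift that bypasses `put` is still caught.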

Wire incremental tier behind resolution + views

id: SHARD-WP-0011-T4
status: done
priority: medium
state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"

Route UnionGraph.resolve and the SHARD-WP-0010 views through the maintained index (rebuild = explicit fallback). Behaviour is unchanged from the consumer's view; only freshness/cost change. Update SCOPE; pytest + pyflakes green.
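
The wiring pattern (build lazily, invalidate on mutation, rebuild as an explicit fallback) might look roughly like the sketch below. The class `LazySpace`, its `attach` and `equivalence_groups` methods and the `group_by_text` builder are hypothetical. Only the names `reindex()` and `verify_index()` are borrowed from the commit message, and their behaviour here is an assumption.

```python
from collections import defaultdict

def group_by_text(pages):
    """Toy index builder: pages with the same normalized text are equivalent."""
    groups = defaultdict(set)
    for pid, text in pages.items():
        groups[" ".join(text.lower().split())].add(pid)
    return {frozenset(members) for members in groups.values()}

class LazySpace:
    """Hypothetical facade: readers see groups, never the rebuild schedule."""

    def __init__(self, build_index):
        self._build_index = build_index
        self._pages = {}
        self._index = None      # None means "invalidated, rebuild on next read"
        self.rebuilds = 0

    def attach(self, pid, text):
        self._pages[pid] = text
        self._index = None      # any mutation invalidates the cached index

    def equivalence_groups(self):
        if self._index is None:
            self._index = self._build_index(dict(self._pages))
            self.rebuilds += 1
        return self._index

    def reindex(self):
        """Force the bounded fallback: drop the cache and rebuild now."""
        self._index = None
        return self.equivalence_groups()

    def verify_index(self):
        """Compare the cached index to a fresh build; repair it on mismatch."""
        if self._index is None:
            return True
        fresh = self._build_index(dict(self._pages))
        if self._index != fresh:
            self._index = fresh
            return False
        return True

space = LazySpace(group_by_text)
space.attach("a", "Hello World")
space.attach("b", "hello   world")
groups = space.equivalence_groups()   # first read builds the index
groups = space.equivalence_groups()   # second read reuses it
```

Consumers only ever call `equivalence_groups()`, so swapping the recompute-on-read path for a maintained index changes cost and freshness without changing what they observe.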


Acceptance criteria

  • Equivalence is indexed (blocking/LSH + verify), not pairwise; matches a brute-force oracle.
  • Incremental maintenance (with retraction + propagation) keeps the derived tier equal to a from-scratch rebuild; rebuild is a bounded fallback.
  • I-2 is verified by a digest + consistency-checker that detects and self-heals drift.
  • Consumer-visible resolution/views behaviour unchanged; pytest + pyflakes green; synced.

Suggested task order

T1 equivalence index → T2 incremental maintenance → T3 I-2 verification → T4 wiring.