generated from coulomb/repo-seed
Route InformationSpace.all_pages through a maintained UnionIndex: equivalence is served from the incrementally maintained index (curator bindings re-synced live from the log fold + detected content edges), exposed in decision-log string form so results are a behaviour-preserving superset. The index is built lazily and rebuilt (bounded fallback) when the union mutates (attach/edit invalidate it); reindex() forces a rebuild and verify_index() runs the I-2 self-healing checker. all_pages() gains an optional equivalence_groups source (default = fold) so direct callers are unaffected. SCOPE updated; WP-0011 done. 173 tests green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
id: SHARD-WP-0011
type: workplan
title: "incremental union maintenance + equivalence index + I-2 verification"
domain: whynot
repo: shard-wiki
status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
updated: "2026-06-15"
depends_on:
- SHARD-WP-0007
- SHARD-WP-0010
state_hub_workstream_id: "78d48bcf-6482-4266-bc81-084b7ec1cd80"
---

# SHARD-WP-0011 — Incremental union + equivalence index
## Goal

Replace direct, recompute-on-read resolution with the **incremental-first** derived tier from
`CoreArchitectureBlueprint` §8.7: change-driven delta maintenance of the union/indexes, an
**indexed equivalence** path (blocking/LSH, not O(N²)) with correct retraction/propagation
(review B-4), and the **I-2 verification** mechanism (digest + background consistency-checker).
Rebuild becomes a bounded fallback, not the operational path.

**Non-goal:** distributed maintenance; persisted on-disk index store (in-memory derived tier is
fine for this slice). Per-tenant partitioning (I-13) is honoured structurally but multi-tenant
deployment is later.

## Context

- Spec: blueprint §8.7 (incremental, blocking/LSH, rebuild-as-fallback), §8.4; review B-4 +
  open items O-1/O-4.
- Builds on union resolution (SHARD-WP-0007) and the views (SHARD-WP-0010) it accelerates.

---

## Equivalence index: blocking + verify

```task
id: SHARD-WP-0011-T1
status: done
priority: high
state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"
```

A candidate-generation (**blocking**) layer: normalized title/path buckets + MinHash/LSH bands
over content shingles, followed by a **verify** step (fingerprint / span-set overlap + curator
bindings) that produces equivalence edges. Replaces pairwise O(N²). Tests: near-duplicates bucket
together; unrelated pages don't; verified edges match a brute-force oracle on a small corpus.

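The blocking-then-verify idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the repository's code: the function names, the 32-hash / 16-band MinHash parameters, word 3-shingles, and a plain Jaccard threshold standing in for the fingerprint / span-set verifier (curator bindings are left out).

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32  # MinHash signature length (illustrative choice)
BANDS = 16       # LSH bands; rows per band = NUM_HASHES // BANDS


def shingles(text, k=3):
    """Word k-shingles of lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    raw = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(raw, "big")


def minhash(shingle_set):
    """One minimum per seeded hash; an empty page gets an empty signature."""
    if not shingle_set:
        return ()
    return tuple(min(_hash(seed, s) for s in shingle_set)
                 for seed in range(NUM_HASHES))


def bucket_keys(title, text):
    """Blocking keys: one normalised-title bucket plus one key per LSH band."""
    keys = {("title", re.sub(r"\W+", "", title.lower()))}
    sig = minhash(shingles(text))
    rows = NUM_HASHES // BANDS
    for band in range(BANDS if sig else 0):
        keys.add(("lsh", band, sig[band * rows:(band + 1) * rows]))
    return keys


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0


def equivalence_edges(pages, threshold=0.5):
    """pages: {page_id: (title, text)}. Only bucket-mates are verified."""
    buckets = defaultdict(set)
    for pid, (title, text) in pages.items():
        for key in bucket_keys(title, text):
            buckets[key].add(pid)
    candidates = set()
    for members in buckets.values():
        candidates.update(frozenset(p) for p in combinations(sorted(members), 2))
    sh = {pid: shingles(text) for pid, (_, text) in pages.items()}
    edges = set()
    for pair in candidates:
        a, b = tuple(pair)
        if jaccard(sh[a], sh[b]) >= threshold:  # verify step
            edges.add(pair)
    return edges


def brute_force_edges(pages, threshold=0.5):
    """Pairwise O(N^2) oracle, for tests on a small corpus only."""
    sh = {pid: shingles(text) for pid, (_, text) in pages.items()}
    return {frozenset((a, b)) for a, b in combinations(sorted(pages), 2)
            if jaccard(sh[a], sh[b]) >= threshold}
```

Because every candidate is verified, the indexed result is always a subset of the oracle's. Whether it is also complete depends on the blocking keys catching every true pair, which is what a brute-force comparison on a small corpus checks.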
## Incremental maintenance (delta, not additive)

```task
id: SHARD-WP-0011-T2
status: done
priority: high
state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"
```

Change-driven delta updates: a changed/added/removed page **re-buckets**, then (per B-4)
**retracts** edges it leaves, **adds** edges it enters, and **propagates** to equivalence
neighbours (a retraction can split a chorus set). Drives union/BackLinks/RecentChanges deltas.
Tests: add/edit/remove keep the index equal to a from-scratch rebuild; a bucket-exit retracts a
stale edge.

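The delta rule can be sketched as follows. This is a toy model under stated assumptions: `DeltaIndex`, `block_keys`, `verified` and `rebuild_edges` are hypothetical names, blocking is "one bucket per long word", and verification is word-set Jaccard. Groups are derived as connected components of the edge set, so a retraction that removes a bridging edge splits the group without extra bookkeeping.

```python
from collections import defaultdict
from itertools import combinations


def block_keys(text):
    """Toy blocking function (assumption): one bucket per distinct long word."""
    return {w for w in text.lower().split() if len(w) >= 5}


def verified(a, b):
    """Toy verifier (assumption): word-set Jaccard of at least 0.5."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return bool(wa | wb) and len(wa & wb) / len(wa | wb) >= 0.5


class DeltaIndex:
    def __init__(self):
        self.text = {}                   # page_id -> current text
        self.buckets = defaultdict(set)  # block key -> page_ids
        self.edges = set()               # frozenset({a, b})

    def upsert(self, pid, text):
        """Apply an add or edit; return (retracted, added) edge sets."""
        old_keys = block_keys(self.text.get(pid, ""))
        new_keys = block_keys(text)
        for key in old_keys - new_keys:  # buckets the page leaves
            self.buckets[key].discard(pid)
        for key in new_keys - old_keys:  # buckets the page enters
            self.buckets[key].add(pid)
        self.text[pid] = text
        before = {e for e in self.edges if pid in e}
        mates = set().union(*(self.buckets[k] for k in new_keys)) - {pid}
        after = {frozenset((pid, m)) for m in mates
                 if verified(text, self.text[m])}
        self.edges = (self.edges - before) | after
        return before - after, after - before

    def remove(self, pid):
        """Drop a page; every edge touching it is retracted."""
        for key in block_keys(self.text.pop(pid, "")):
            self.buckets[key].discard(pid)
        gone = {e for e in self.edges if pid in e}
        self.edges -= gone
        return gone, set()

    def groups(self):
        """Equivalence groups as connected components (union-find)."""
        parent = {p: p for p in self.text}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for a, b in map(tuple, self.edges):
            parent[find(a)] = find(b)
        out = defaultdict(set)
        for p in self.text:
            out[find(p)].add(p)
        return {frozenset(g) for g in out.values()}


def rebuild_edges(pages):
    """From-scratch oracle: verify every pair that shares a bucket."""
    return {frozenset((a, b)) for a, b in combinations(sorted(pages), 2)
            if block_keys(pages[a]) & block_keys(pages[b])
            and verified(pages[a], pages[b])}
```

The returned `(retracted, added)` pair is the kind of delta a downstream view could consume. Only the changed page's bucket-mates are re-verified, since an edge between two other pages depends only on their own text.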
## I-2 verification: digest + consistency-checker

```task
id: SHARD-WP-0011-T3
status: done
priority: high
state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"
```

A per-partition **Merkle-style digest** of the derived tier, maintained alongside deltas, and a
**background consistency-checker** that recomputes a sampled fold and compares; mismatch →
**scoped recompute** of the affected region (self-healing). Makes `derived = f(canonical)`
*verified*, not asserted. Tests: induced drift is detected and repaired; digest stable under
equivalent event orders.

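A minimal sketch of the digest-and-check loop, assuming a simplified event shape `(partition, op, key, page_id)` with `op` either `"bind"` or `"unbind"`. The names `DerivedTier`, `fold`, `leaf` and `digest_of` are hypothetical. To keep it short, the per-partition digest is an XOR of SHA-256 leaf hashes (a set hash) rather than a full Merkle tree; it shares the two properties the tests call for, O(1) maintenance per delta and independence from event order.

```python
import hashlib


def leaf(key, members):
    """Hash of one derived entry; sorting removes set-order dependence."""
    data = repr((key, sorted(members))).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def fold(log, partition):
    """Canonical recompute: replay the log for a single partition."""
    state = {}
    for part, op, key, page in log:
        if part != partition:
            continue
        if op == "bind":
            state.setdefault(key, set()).add(page)
        else:  # "unbind"
            state.get(key, set()).discard(page)
            if key in state and not state[key]:
                del state[key]
    return state


def digest_of(state):
    d = 0
    for key, members in state.items():
        d ^= leaf(key, members)
    return d


class DerivedTier:
    def __init__(self):
        self.state = {}   # partition -> {key: set(page_ids)}
        self.digest = {}  # partition -> XOR of leaf hashes

    def apply(self, event):
        """Apply one delta and update the partition digest alongside it."""
        part, op, key, page = event
        entries = self.state.setdefault(part, {})
        d = self.digest.get(part, 0)
        if key in entries:
            d ^= leaf(key, entries[key])  # XOR out the old leaf
        members = entries.setdefault(key, set())
        if op == "bind":
            members.add(page)
        else:
            members.discard(page)
        if members:
            d ^= leaf(key, members)       # XOR in the new leaf
        else:
            del entries[key]
        self.digest[part] = d

    def check(self, log, partition):
        """True if consistent; otherwise repair only this partition."""
        expected = fold(log, partition)
        if digest_of(expected) == self.digest.get(partition, 0):
            return True
        self.state[partition] = expected  # scoped recompute
        self.digest[partition] = digest_of(expected)
        return False
```

A background checker would call `check` on a sample of partitions; how the sample is drawn is left open here. Note that this sketch compares the maintained digest against the fold, so it catches missed or misapplied deltas but not state that was altered without touching the digest.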
## Wire incremental tier behind resolution + views

```task
id: SHARD-WP-0011-T4
status: done
priority: medium
state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"
```

Route `UnionGraph.resolve` and the SHARD-WP-0010 views through the maintained index (rebuild =
explicit fallback). Behaviour is unchanged from the consumer's view; only freshness/cost change.
Update SCOPE; `pytest` + pyflakes green.

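One way to picture the wiring is a facade that serves reads from a cached index and marks it stale on mutation. The sketch below is hypothetical throughout: `IndexedSpace`, its method signatures and the toy `group_by_normalised_text` builder are assumptions, not the real `UnionGraph` or `InformationSpace` API. It shows only the lazy-build and rebuild-as-fallback path, not delta maintenance.

```python
from collections import defaultdict


def group_by_normalised_text(pages):
    """Toy index builder (assumption): pages with equal normalised text match."""
    groups = defaultdict(set)
    for pid, text in pages.items():
        groups[" ".join(text.lower().split())].add(pid)
    return {pid: frozenset(members)
            for members in groups.values() for pid in members}


class IndexedSpace:
    def __init__(self, build_index=group_by_normalised_text):
        self._build = build_index
        self._pages = {}
        self._index = None  # None means stale: rebuild on next read
        self.rebuilds = 0   # counter kept only to make the cost visible

    def attach(self, pid, text):
        self._pages[pid] = text
        self._index = None  # mutation invalidates the index

    def edit(self, pid, text):
        self.attach(pid, text)

    def reindex(self):
        """Force a full rebuild (the fallback path)."""
        self._index = self._build(dict(self._pages))
        self.rebuilds += 1

    def resolve(self, pid):
        """Consumer-facing read: same answer as before, served from the index."""
        if self._index is None:
            self.reindex()
        return self._index.get(pid, frozenset())
```

Repeated reads between mutations reuse the same index, so callers see identical results while the rebuild cost is paid at most once per change.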
---

## Acceptance criteria

- Equivalence is indexed (blocking/LSH + verify), not pairwise; matches a brute-force oracle.
- Incremental maintenance (with retraction + propagation) keeps the derived tier equal to a
  from-scratch rebuild; rebuild is a bounded fallback.
- I-2 is verified by a digest + consistency-checker that detects and self-heals drift.
- Consumer-visible resolution/views behaviour unchanged; `pytest` + pyflakes green; synced.

## Suggested task order

T1 equivalence index → T2 incremental maintenance → T3 I-2 verification → T4 wiring.