diff --git a/spec/CoreArchitectureBlueprint.md b/spec/CoreArchitectureBlueprint.md
index b0ac9c7..3e1d1b2 100644
--- a/spec/CoreArchitectureBlueprint.md
+++ b/spec/CoreArchitectureBlueprint.md
@@ -645,14 +645,37 @@ comparison across all pages of all shards is O(N²) and is forbidden. Instead:
    ≈O(N) candidates.
 2. **Verification** — candidate pairs are confirmed by full fingerprint / span-set overlap and
    any curator binding. Confirmed equivalences become union edges.
-3. **Incremental maintenance** — a changed page is re-bucketed and only its *new* candidate set
-   is re-verified; equivalence is maintained per-change, never recomputed globally.
+3. **Incremental maintenance — the delta is *not* additive (review B-4).** A changed page may
+   *leave* buckets as well as *enter* them, and leaving a bucket can **break an existing
+   equivalence edge** another page relied on. So a change is processed as: (i) recompute the
+   page's bucket membership; (ii) for buckets it **left**, re-verify the pairs that depended on
+   the shared bucket and **retract** edges no longer supported; (iii) for buckets it **entered**,
+   verify the new candidate pairs and **add** edges; (iv) **propagate** to the equivalence
+   neighbours of any retracted/added edge (equivalence is transitive-ish via chorus sets, so a
+   retraction can split a set). Maintenance is per-change and bounded by the page's
+   neighbourhood, but it covers retraction and propagation — not just additions.
 
 **The index is itself derived** (disposable, recomputable) and per-tenant-partitioned (§9). Its
 parameters (LSH band/row counts, shingle size, precision/recall) are tunable; the accepted
 **false-negative rate of blocking** is a known, tracked limitation (§12) — blocking trades a
 small miss rate for tractability, and curator bindings are the escape hatch for misses.
 
+**Verifying I-2 (`derived = f(canonical)`) — eventually, not on faith (review B-4).**
+Incremental maintenance can drift from a from-scratch fold over time (a missed retraction, a
+dropped event, a bug). I-2 is therefore an **eventually-verified** property, not a free one,
+and the architecture names the mechanism that verifies it:
+
+- **A digest of the derived tier.** Each partition's derived tier carries a rolling content
+  digest (a Merkle-style hash over union nodes/edges/index entries) maintained alongside the
+  incremental updates.
+- **A background consistency-checker** periodically recomputes the digest over a *sampled* (or,
+  on a slow cadence, full) fold of canonical state and compares. A mismatch localises the drift
+  to a partition/region and triggers a **scoped recompute** of just that region — cheap relative
+  to a global rebuild, and self-healing.
+- **So I-2 holds *eventually and verifiably*:** the incremental engine is the fast path, the
+  checker is the guarantee, and divergence is detected and repaired rather than silently
+  accumulating. The exact sampling rate / digest granularity is an implementation spike (§12).
+
 ### 8.8 Cache freshness & invalidation
 
 Replication-projection caches remote shard content; cache invalidation is the actual hard part
diff --git a/workplans/SHARD-WP-0006-architecture-hardening-2.md b/workplans/SHARD-WP-0006-architecture-hardening-2.md
index 971a708..21ecb5f 100644
--- a/workplans/SHARD-WP-0006-architecture-hardening-2.md
+++ b/workplans/SHARD-WP-0006-architecture-hardening-2.md
@@ -94,7 +94,7 @@ Settle the keystone (review B-1 + B-3 together). Decide and document:
 
 ```task
 id: SHARD-WP-0006-T3
-status: todo
+status: done
 priority: high
 state_hub_task_id: "900c8234-ca73-4225-b2c5-77d218ded28c"
 ```
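The retract/add delta of step 3 and the digest comparison behind I-2 can be sketched together. What follows is a minimal Python sketch under loud assumptions, not the blueprint's implementation: `DerivedTier`, `buckets_of`, `verified`, `fold`, and `THRESHOLD` are hypothetical names; every shingle acts as its own bucket key as a stand-in for LSH bands; Jaccard overlap stands in for full fingerprint / span-set verification; and a flat SHA-256 over sorted entries stands in for the Merkle-style rolling digest. Propagation into chorus sets (step iv) and curator bindings are left out. Because the page's fingerprint changes, the sketch re-verifies every page it was or is co-bucketed with, which is slightly broader than steps (ii) and (iii).

```python
import hashlib
from itertools import combinations

THRESHOLD = 0.5  # hypothetical verification threshold (assumption)


def buckets_of(shingles):
    # Toy blocking: each shingle is its own bucket key (stand-in for LSH bands).
    return set(shingles)


def verified(a, b):
    # Stand-in for full fingerprint / span-set overlap: Jaccard similarity.
    union = a | b
    return bool(union) and len(a & b) / len(union) >= THRESHOLD


class DerivedTier:
    def __init__(self):
        self.pages = {}     # page id -> frozenset of shingles
        self.index = {}     # bucket key -> set of page ids
        self.edges = set()  # frozenset({p, q}) per confirmed equivalence

    def apply_change(self, pid, shingles):
        new = frozenset(shingles)
        old_b = buckets_of(self.pages.get(pid, frozenset()))
        new_b = buckets_of(new)
        # Pages co-bucketed with pid before or after the change.
        touched = set()
        for key in old_b | new_b:
            touched |= self.index.get(key, set())
        touched.discard(pid)
        for key in old_b - new_b:  # buckets the page left
            self.index[key].discard(pid)
            if not self.index[key]:
                del self.index[key]
        for key in new_b - old_b:  # buckets the page entered
            self.index.setdefault(key, set()).add(pid)
        self.pages[pid] = new
        for other in touched:
            edge = frozenset((pid, other))
            shares = bool(new_b & buckets_of(self.pages[other]))
            if shares and verified(new, self.pages[other]):
                self.edges.add(edge)      # add a newly supported edge
            else:
                self.edges.discard(edge)  # retract an unsupported edge

    def digest(self):
        # Flat content hash over index entries and edges, in sorted order.
        h = hashlib.sha256()
        for key in sorted(self.index):
            h.update(repr((key, sorted(self.index[key]))).encode())
        for edge in sorted(sorted(e) for e in self.edges):
            h.update(repr(edge).encode())
        return h.hexdigest()


def fold(canonical):
    # From-scratch fold over canonical state, independent of apply_change.
    tier = DerivedTier()
    tier.pages = {p: frozenset(s) for p, s in canonical.items()}
    for pid, sh in tier.pages.items():
        for key in buckets_of(sh):
            tier.index.setdefault(key, set()).add(pid)
    for members in tier.index.values():
        for p, q in combinations(sorted(members), 2):
            if verified(tier.pages[p], tier.pages[q]):
                tier.edges.add(frozenset((p, q)))
    return tier


canonical = {"a": {"x", "y", "z"}, "b": {"x", "y", "w"}, "c": {"q"}}
tier = fold(canonical)                      # a and b start out equivalent
canonical["b"] = {"w", "q"}                 # b leaves x, y and enters q
tier.apply_change("b", canonical["b"])      # retracts a-b, adds b-c
print(tier.digest() == fold(canonical).digest())  # True
tier.edges.add(frozenset(("a", "b")))       # simulate a missed retraction
print(tier.digest() == fold(canonical).digest())  # False
```

In this toy model the second comparison is what the background checker would see: the incremental tier still holds an edge the from-scratch fold does not produce, so the digests differ and the affected region would be recomputed. A per-region digest, rather than the single flat hash used here, is what would let a mismatch be localised.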