feat(incremental): wire maintained tier behind views; rebuild fallback (WP-0011 T4)

Route InformationSpace.all_pages through a maintained UnionIndex: equivalence is
served from the incrementally maintained index (curator bindings re-synced live
from the log fold + detected content edges), exposed in decision-log string form
so results are a behaviour-preserving superset. The index is built lazily and
rebuilt (bounded fallback) when the union mutates (attach/edit invalidate it);
reindex() forces a rebuild and verify_index() runs the I-2 self-healing checker.
all_pages() gains an optional equivalence_groups source (default = fold) so
direct callers are unaffected. SCOPE updated; WP-0011 done. 173 tests green.
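
The lazy-build / invalidate-on-mutation behaviour described above can be sketched in isolation. This is a minimal illustration, not the code in this commit: `LazyDerived`, `invalidate`, `get` and the `builds` counter are hypothetical names standing in for the `_index` / `_index_stale` pair on `InformationSpace`.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyDerived(Generic[T]):
    """Hypothetical helper (not part of shard_wiki): build lazily, rebuild when stale."""

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build
        self._value: Optional[T] = None
        self._stale = True  # nothing has been built yet
        self.builds = 0  # counts builds, for illustration only

    def invalidate(self) -> None:
        """Called by mutators (attach/edit in the commit) to mark the value stale."""
        self._stale = True

    def get(self) -> T:
        """Return the cached value, rebuilding first if a mutation invalidated it."""
        if self._stale:
            self._value = self._build()
            self.builds += 1
            self._stale = False
        return self._value  # type: ignore[return-value]


pages = ["Home"]
derived = LazyDerived(lambda: tuple(sorted(pages)))
first = derived.get()   # first read builds
second = derived.get()  # second read is served from the cache
pages.append("Guide")
derived.invalidate()    # a mutation marks the derived value stale
third = derived.get()   # the next read rebuilds (the bounded fallback)
```

Under this sketch a read after a mutation pays for one full rebuild, which is the trade the message calls a bounded fallback; the change-driven `note_change` path is what would avoid it.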

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:21:39 +02:00
parent a8e65235a8
commit 37681d89b6
8 changed files with 219 additions and 13 deletions

View File

@@ -17,7 +17,7 @@ Learnings update both SCOPE and INTENT where necessary.
| Layer | State |
|-------|-------|
-| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. Derived views (SHARD-WP-0010): `views` (wikilink + red-link model, BackLinks, RecentChanges, AllPages/SiteMap) — recomputable, provenance-carrying, presentation-free, exposed via `InformationSpace.backlinks/recent_changes/all_pages/site_map`. 152 tests green, ~97% coverage |
+| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. Derived views (SHARD-WP-0010): `views` (wikilink + red-link model, BackLinks, RecentChanges, AllPages/SiteMap) — recomputable, provenance-carrying, presentation-free, exposed via `InformationSpace.backlinks/recent_changes/all_pages/site_map`. Incremental-first derived tier (SHARD-WP-0011): `incremental` (indexed equivalence via MinHash/LSH blocking + verify, change-driven delta maintenance with retraction/propagation, Merkle-style digest + self-healing I-2 consistency-checker, `UnionIndex` routed behind `InformationSpace.all_pages` with rebuild as explicit fallback). 173 tests green, ~97% coverage |
| Intent | `INTENT.md` established; authorization-in-core amendments drafted |
| Research | yawex prior art; c2 origins; federation concepts; wikiengines overview (`research/260608-*/`); XWiki/TWiki/Foswiki deep dives (`research/260613-*/`); Xanadu + ZigZag + Roam + Obsidian + Notion + Joplin + Logseq + local-first workspaces (Anytype/AFFiNE/AppFlowy) + Trilium + Wiki.js + Federated Wiki + Wikibase + git-forge wikis + TiddlyWiki + ikiwiki + Quip + MojoMojo + Oddmuse + UseModWiki deep dives & shard-spectrum synthesis (`research/260614-*/`) |
| Demand | NetKingdom integration asks captured, not yet negotiated |

View File

@@ -22,6 +22,7 @@ from shard_wiki.incremental.minhash import (
     jaccard,
     shingles,
 )
+from shard_wiki.incremental.union_index import UnionIndex
 from shard_wiki.incremental.verification import (
     ConsistencyChecker,
     ConsistencyReport,
@@ -41,4 +42,5 @@ __all__ = [
     "region_digest",
     "ConsistencyReport",
     "ConsistencyChecker",
+    "UnionIndex",
 ]

View File

@@ -134,6 +134,10 @@ class EquivalenceIndex:
     def unbind(self, a: Identity, b: Identity) -> None:
         self._curator_edges.discard(_pair(a, b))

+    def set_curator_edges(self, edges: Iterable[tuple[Identity, Identity]]) -> None:
+        """Replace all curator edges at once (re-syncing from the decision-log fold)."""
+        self._curator_edges = {_pair(a, b) for a, b in edges if a != b}
+
     # -- queries -------------------------------------------------------------

     def identities(self) -> frozenset[Identity]:

View File

@@ -0,0 +1,91 @@
+"""UnionIndex — the maintained derived tier wired behind resolution + views (SHARD-WP-0011 T4).
+
+Wraps a :class:`UnionGraph` + decision log with an incrementally maintained
+:class:`EquivalenceIndex`. Content equivalence is kept fresh by deltas (``note_change`` /
+``note_removed``); curator bindings are re-synced live from the log fold. A full :meth:`rebuild`
+is the bounded fallback. :meth:`verify` runs the I-2 consistency-checker over the live source.
+
+Consumer-visible results are unchanged — equivalence groups are exposed in the same string form the
+decision-log fold uses, a *superset* that additionally collapses genuine content duplicates — only
+freshness and cost differ (recompute-on-read becomes change-driven).
+"""
+
+from __future__ import annotations
+
+from shard_wiki.coordination import DecisionLog
+from shard_wiki.incremental.equivalence import EquivalenceIndex
+from shard_wiki.incremental.verification import (
+    ConsistencyChecker,
+    ConsistencyReport,
+    derived_digest,
+)
+from shard_wiki.model import Identity, Page
+from shard_wiki.union import UnionGraph
+
+__all__ = ["UnionIndex"]
+
+
+def _identity(token: str) -> Identity:
+    shard, _, key = token.partition(":")
+    return Identity(shard, key)
+
+
+class UnionIndex:
+    """An incrementally maintained equivalence index over a union, with a rebuild fallback."""
+
+    def __init__(self, union: UnionGraph, log: DecisionLog, space: str) -> None:
+        self._union = union
+        self._log = log
+        self._space = space
+        self._eq = EquivalenceIndex()
+        self.rebuild()
+
+    def rebuild(self) -> None:
+        """The bounded fallback: re-derive the whole index from current union pages + bindings."""
+        self._eq.build(self._union.iter_pages())
+        self._sync_curator()
+
+    def note_change(self, page: Page) -> None:
+        """Change-driven update for one added/edited page (the operational path)."""
+        self._eq.update(page)
+
+    def note_removed(self, identity: Identity) -> None:
+        self._eq.remove(identity)
+
+    def _sync_curator(self) -> None:
+        """Re-sync curator equivalence from the live decision-log fold (cheap, always correct)."""
+        groups = self._log.fold(self._space).equivalence_groups
+        edges: list[tuple[Identity, Identity]] = []
+        for group in groups:
+            members = [_identity(m) for m in group]
+            edges.extend((members[0], other) for other in members[1:])
+        self._eq.set_curator_edges(edges)
+
+    def equivalence_groups(self) -> tuple[frozenset[str], ...]:
+        """Equivalence groups in decision-log string form (curator content), for the views."""
+        self._sync_curator()
+        return tuple(
+            frozenset(str(identity) for identity in group) for group in self._eq.groups()
+        )
+
+    def digest(self) -> str:
+        """The Merkle-style digest of the maintained derived tier (I-2)."""
+        self._sync_curator()
+        return derived_digest(self._eq)
+
+    def verify(self) -> ConsistencyReport:
+        """Check the maintained index against a from-scratch fold of the live source; self-heal."""
+        self._sync_curator()
+        checker = ConsistencyChecker(
+            self._eq,
+            pages=lambda: list(self._union.iter_pages()),
+            curator_edges=self._curator_pairs,
+        )
+        return checker.check_and_repair()
+
+    def _curator_pairs(self) -> list[tuple[Identity, Identity]]:
+        pairs: list[tuple[Identity, Identity]] = []
+        for group in self._log.fold(self._space).equivalence_groups:
+            members = [_identity(m) for m in group]
+            pairs.extend((members[0], other) for other in members[1:])
+        return pairs
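
Reviewer note: `_sync_curator` and `_curator_pairs` above both expand each fold group into "star" edges, with the first member paired to every other member. A minimal standalone sketch of why that is enough to recover the groups follows. It assumes plain strings in place of `Identity` and a small union-find in place of `EquivalenceIndex`; `star_edges` and `components` are hypothetical names, and the anchor is chosen by sorting here, whereas the commit uses the group's iteration order.

```python
def star_edges(groups):
    """Expand each group into (anchor, other) pairs: n members give n - 1 edges."""
    edges = []
    for group in groups:
        members = sorted(group)  # deterministic anchor for the sketch
        edges.extend((members[0], other) for other in members[1:])
    return edges


def components(edges):
    """Recover the connected groups from the edges with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    grouped = {}
    for node in list(parent):
        grouped.setdefault(find(node), set()).add(node)
    return {frozenset(members) for members in grouped.values()}


groups = [frozenset({"a:Foo", "b:Bar", "c:Baz"}), frozenset({"x:One", "y:Two"})]
edges = star_edges(groups)
recovered = components(edges)
```

A star keeps the edge count linear in group size while still connecting every member, so replacing the whole curator edge set on each sync stays cheap. A singleton group yields no edges and therefore does not appear in `recovered`.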

View File

@@ -20,6 +20,7 @@ from shard_wiki.coordination import (
     Overlay,
     OverlayEngine,
 )
+from shard_wiki.incremental import ConsistencyReport, UnionIndex
 from shard_wiki.model import Page
 from shard_wiki.policy import DEFAULT_POLICY, Policy
 from shard_wiki.union import Resolution, UnionGraph
@@ -51,6 +52,8 @@ class InformationSpace:
         self.log = DecisionLog(store)
         self.union = UnionGraph(space_id, log=self.log, policy=policy)
         self.overlays = OverlayEngine(space_id, self.log)
+        self._index: UnionIndex | None = None  # maintained derived tier, built lazily
+        self._index_stale = True

     @classmethod
     def git_backed(
@@ -67,6 +70,7 @@ class InformationSpace:
         """Attach a shard — only if it passes conformance (verified profile, I-3/§6.6)."""
         assert_conformant(adapter)
         self.union.attach(adapter)
+        self._index_stale = True

     def alias(self, name: str, target: str, actor: str | None = None) -> None:
         """Record a coordination-canonical alias (``name`` → ``"shard:key"``) in the log."""
@@ -101,7 +105,29 @@ class InformationSpace:
         write-through-capable target fast-forwards (write-through); a read-only target keeps the
         draft as local truth (I-5: overlay before mutation, always)."""
         overlay = self.overlay(name, body, actor=actor)
-        return self.apply_overlay(overlay.overlay_id)
+        result = self.apply_overlay(overlay.overlay_id)
+        self._index_stale = True  # the applied edit changes the derived tier
+        return result
+
+    # --- maintained derived tier (SHARD-WP-0011): incremental-first, rebuild as fallback ---
+
+    @property
+    def index(self) -> UnionIndex:
+        """The maintained equivalence index (built lazily; rebuilt when the union has changed)."""
+        if self._index is None:
+            self._index = UnionIndex(self.union, self.log, self.space_id)
+        elif self._index_stale:
+            self._index.rebuild()  # bounded fallback after a mutation
+        self._index_stale = False
+        return self._index
+
+    def reindex(self) -> None:
+        """Force a full rebuild of the maintained derived tier (the explicit fallback path)."""
+        self.index.rebuild()
+
+    def verify_index(self) -> ConsistencyReport:
+        """Run the I-2 consistency-checker over the maintained tier; self-heal any drift."""
+        return self.index.verify()

     # --- derived views (SHARD-WP-0010): recomputable, provenance-carrying, presentation-free ---

@@ -114,8 +140,8 @@ class InformationSpace:
         return recent_changes(self.union, self.log, self.space_id, limit=limit)

     def all_pages(self) -> tuple[AllPagesEntry, ...]:
-        """The union's distinct pages, chorus/equivalence-collapsed with divergence noted."""
-        return all_pages(self.union)
+        """The union's distinct pages, collapsed via the maintained equivalence index."""
+        return all_pages(self.union, equivalence_groups=self.index.equivalence_groups())

     def site_map(self) -> SiteMapNode:
         """The union namespace tree built from page placements."""

View File

@@ -62,8 +62,16 @@ class _UnionFind:
             self._parent[max(ra, rb)] = min(ra, rb)


-def all_pages(union: UnionGraph) -> tuple[AllPagesEntry, ...]:
-    """Enumerate the union's distinct pages, collapsing chorus + equivalence-bound members."""
+def all_pages(
+    union: UnionGraph,
+    equivalence_groups: tuple[frozenset[str], ...] | None = None,
+) -> tuple[AllPagesEntry, ...]:
+    """Enumerate the union's distinct pages, collapsing chorus + equivalence-bound members.
+
+    ``equivalence_groups`` (string identities, decision-log form) overrides the source of
+    equivalence — the orchestrator passes the maintained index's groups (SHARD-WP-0011 T4); the
+    default falls back to the decision-log fold, so direct callers are unaffected.
+    """
     pages: dict[str, Page] = {}
     by_key: dict[str, list[str]] = {}
     for page in union.iter_pages():
@@ -77,8 +85,9 @@ def all_pages(union: UnionGraph) -> tuple[AllPagesEntry, ...]:
     for idents in by_key.values():  # same key across shards → chorus
         for other in idents[1:]:
             uf.union(idents[0], other)
-    fold = union.log.fold(union.space)
-    for group in fold.equivalence_groups:  # decision-log bindings
+    if equivalence_groups is None:
+        equivalence_groups = union.log.fold(union.space).equivalence_groups
+    for group in equivalence_groups:  # curator bindings (+ maintained content edges)
         present = [m for m in group if m in pages]
         for other in present[1:]:
             uf.union(present[0], other)

View File

@@ -0,0 +1,74 @@
+"""Wire the incremental tier behind InformationSpace views (SHARD-WP-0011 T4)."""
+
+from shard_wiki.adapters import FolderAdapter
+from shard_wiki.coordination import EventType
+from shard_wiki.model import Identity
+from shard_wiki.space import InformationSpace
+from shard_wiki.views import all_pages
+
+
+def _shard(tmp_path, name, files):
+    root = tmp_path / name
+    for rel, text in files.items():
+        p = root / rel
+        p.parent.mkdir(parents=True, exist_ok=True)
+        p.write_text(text, encoding="utf-8")
+    return FolderAdapter(name, root)
+
+
+def test_all_pages_via_index_matches_direct_fold(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "wiki", {"Home.md": "welcome", "Guide.md": "the guide"}))
+    space.attach(_shard(tmp_path, "notes", {"Daily.md": "today"}))
+    # Routed-through-index result equals the direct fold-based computation (behaviour unchanged).
+    via_index = {(e.name, e.members) for e in space.all_pages()}
+    direct = {(e.name, e.members) for e in all_pages(space.union)}
+    assert via_index == direct
+
+
+def test_curator_binding_collapses_via_maintained_index(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "x"}))
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "y"}))
+    space.log.append(
+        "space", EventType.BINDING_MADE, {"members": ["a:Foo", "b:Bar"]}
+    )
+    # The maintained index re-syncs curator edges live from the log fold.
+    collapsed = [e for e in space.all_pages() if len(e.members) == 2]
+    assert len(collapsed) == 1
+    assert set(collapsed[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}
+
+
+def test_content_duplicate_collapses_via_index(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "the very same body content here"}))
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "the very same body content here"}))
+    dup = [e for e in space.all_pages() if len(e.members) == 2]
+    assert len(dup) == 1  # content equivalence detected by the maintained index
+    assert set(dup[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}
+
+
+def test_attach_invalidates_index(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
+    assert space.all_pages()  # builds the index (one page, no groups)
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))  # marks index stale
+    dup = [e for e in space.all_pages() if len(e.members) == 2]
+    assert len(dup) == 1  # rebuilt fallback picks up the new equivalent page
+
+
+def test_verify_index_reports_healthy_when_consistent(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))
+    space.all_pages()  # ensure built
+    report = space.verify_index()
+    assert report.healthy is True
+
+
+def test_reindex_is_an_explicit_fallback(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "content"}))
+    before = space.index.digest()
+    space.reindex()
+    assert space.index.digest() == before  # rebuild is deterministic
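
Reviewer note: the content-duplicate tests above depend on the index treating identical bodies as equivalent. The real detection lives in `shard_wiki.incremental.minhash` (MinHash/LSH blocking plus verification), which this diff does not show. The sketch below only illustrates the exact Jaccard measure that MinHash approximates. `word_shingles`, `jaccard_similarity` and the shingle width `k=3` are assumptions made for illustration, not the library's functions or parameters.

```python
def word_shingles(text: str, k: int = 3) -> frozenset:
    """All runs of k consecutive words; short texts become a single shingle."""
    words = text.split()
    if len(words) <= k:
        return frozenset({tuple(words)})
    return frozenset(tuple(words[i : i + k]) for i in range(len(words) - k + 1))


def jaccard_similarity(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


same = jaccard_similarity(
    word_shingles("the very same body content here"),
    word_shingles("the very same body content here"),
)
unrelated = jaccard_similarity(word_shingles("welcome"), word_shingles("the guide"))
partial = jaccard_similarity(word_shingles("a b c d"), word_shingles("a b c e"))
```

Identical bodies score 1.0 and bodies with no shared shingle score 0.0, which matches the two outcomes the tests exercise. Where the real index draws its equivalence threshold for partially overlapping bodies is not stated in this commit.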

View File

@@ -4,7 +4,7 @@ type: workplan
title: "incremental union maintenance + equivalence index + I-2 verification"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -41,7 +41,7 @@ deployment is later.
```task
id: SHARD-WP-0011-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"
```
@@ -55,7 +55,7 @@ unrelated pages don't; verified edges match a brute-force oracle on a small corp
```task
id: SHARD-WP-0011-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"
```
@@ -70,7 +70,7 @@ stale edge.
```task
id: SHARD-WP-0011-T3
-status: todo
+status: done
priority: high
state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"
```
@@ -85,7 +85,7 @@ equivalent event orders.
```task
id: SHARD-WP-0011-T4
-status: todo
+status: done
priority: medium
state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"
```