diff --git a/SCOPE.md b/SCOPE.md
index 5dc4d0d..8e422f7 100644
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -17,7 +17,7 @@ Learnings update both SCOPE and INTENT where necessary.
 
 | Layer | State |
 |-------|-------|
-| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. Derived views (SHARD-WP-0010): `views` (wikilink + red-link model, BackLinks, RecentChanges, AllPages/SiteMap) — recomputable, provenance-carrying, presentation-free, exposed via `InformationSpace.backlinks/recent_changes/all_pages/site_map`. 152 tests green, ~97% coverage |
+| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. Derived views (SHARD-WP-0010): `views` (wikilink + red-link model, BackLinks, RecentChanges, AllPages/SiteMap) — recomputable, provenance-carrying, presentation-free, exposed via `InformationSpace.backlinks/recent_changes/all_pages/site_map`. Incremental-first derived tier (SHARD-WP-0011): `incremental` (indexed equivalence via MinHash/LSH blocking + verify, change-driven delta maintenance with retraction/propagation, Merkle-style digest + self-healing I-2 consistency-checker, `UnionIndex` routed behind `InformationSpace.all_pages` with rebuild as explicit fallback). 173 tests green, ~97% coverage |
 | Intent | `INTENT.md` established; authorization-in-core amendments drafted |
 | Research | yawex prior art; c2 origins; federation concepts; wikiengines overview (`research/260608-*/`); XWiki/TWiki/Foswiki deep dives (`research/260613-*/`); Xanadu + ZigZag + Roam + Obsidian + Notion + Joplin + Logseq + local-first workspaces (Anytype/AFFiNE/AppFlowy) + Trilium + Wiki.js + Federated Wiki + Wikibase + git-forge wikis + TiddlyWiki + ikiwiki + Quip + MojoMojo + Oddmuse + UseModWiki deep dives & shard-spectrum synthesis (`research/260614-*/`) |
 | Demand | NetKingdom integration asks captured, not yet negotiated |
diff --git a/src/shard_wiki/incremental/__init__.py b/src/shard_wiki/incremental/__init__.py
index ca9e646..a3e8178 100644
--- a/src/shard_wiki/incremental/__init__.py
+++ b/src/shard_wiki/incremental/__init__.py
@@ -22,6 +22,7 @@ from shard_wiki.incremental.minhash import (
     jaccard,
     shingles,
 )
+from shard_wiki.incremental.union_index import UnionIndex
 from shard_wiki.incremental.verification import (
     ConsistencyChecker,
     ConsistencyReport,
@@ -41,4 +42,5 @@ __all__ = [
     "region_digest",
     "ConsistencyReport",
     "ConsistencyChecker",
+    "UnionIndex",
 ]
diff --git a/src/shard_wiki/incremental/equivalence.py b/src/shard_wiki/incremental/equivalence.py
index 1357593..e6852c9 100644
--- a/src/shard_wiki/incremental/equivalence.py
+++ b/src/shard_wiki/incremental/equivalence.py
@@ -134,6 +134,10 @@ class EquivalenceIndex:
     def unbind(self, a: Identity, b: Identity) -> None:
         self._curator_edges.discard(_pair(a, b))
 
+    def set_curator_edges(self, edges: Iterable[tuple[Identity, Identity]]) -> None:
+        """Replace all curator edges at once (re-syncing from the decision-log fold)."""
+        self._curator_edges = {_pair(a, b) for a, b in edges if a != b}
+
     # -- queries -------------------------------------------------------------
 
     def identities(self) -> frozenset[Identity]:
diff --git a/src/shard_wiki/incremental/union_index.py b/src/shard_wiki/incremental/union_index.py
new file mode 100644
index 0000000..e5b0ec3
--- /dev/null
+++ b/src/shard_wiki/incremental/union_index.py
@@ -0,0 +1,91 @@
+"""UnionIndex — the maintained derived tier wired behind resolution + views (SHARD-WP-0011 T4).
+
+Wraps a :class:`UnionGraph` + decision log with an incrementally maintained
+:class:`EquivalenceIndex`. Content equivalence is kept fresh by deltas (``note_change`` /
+``note_removed``); curator bindings are re-synced live from the log fold. A full :meth:`rebuild`
+is the bounded fallback. :meth:`verify` runs the I-2 consistency-checker over the live source.
+
+Consumer-visible results are unchanged — equivalence groups are exposed in the same string form the
+decision-log fold uses, a *superset* that additionally collapses genuine content duplicates — only
+freshness and cost differ (recompute-on-read becomes change-driven).
+"""
+
+from __future__ import annotations
+
+from shard_wiki.coordination import DecisionLog
+from shard_wiki.incremental.equivalence import EquivalenceIndex
+from shard_wiki.incremental.verification import (
+    ConsistencyChecker,
+    ConsistencyReport,
+    derived_digest,
+)
+from shard_wiki.model import Identity, Page
+from shard_wiki.union import UnionGraph
+
+__all__ = ["UnionIndex"]
+
+
+def _identity(token: str) -> Identity:
+    shard, _, key = token.partition(":")
+    return Identity(shard, key)
+
+
+class UnionIndex:
+    """An incrementally maintained equivalence index over a union, with a rebuild fallback."""
+
+    def __init__(self, union: UnionGraph, log: DecisionLog, space: str) -> None:
+        self._union = union
+        self._log = log
+        self._space = space
+        self._eq = EquivalenceIndex()
+        self.rebuild()
+
+    def rebuild(self) -> None:
+        """The bounded fallback: re-derive the whole index from current union pages + bindings."""
+        self._eq.build(self._union.iter_pages())
+        self._sync_curator()
+
+    def note_change(self, page: Page) -> None:
+        """Change-driven update for one added/edited page (the operational path)."""
+        self._eq.update(page)
+
+    def note_removed(self, identity: Identity) -> None:
+        self._eq.remove(identity)
+
+    def _sync_curator(self) -> None:
+        """Re-sync curator equivalence from the live decision-log fold (cheap, always correct)."""
+        groups = self._log.fold(self._space).equivalence_groups
+        edges: list[tuple[Identity, Identity]] = []
+        for group in groups:
+            members = [_identity(m) for m in group]
+            edges.extend((members[0], other) for other in members[1:])
+        self._eq.set_curator_edges(edges)
+
+    def equivalence_groups(self) -> tuple[frozenset[str], ...]:
+        """Equivalence groups in decision-log string form (curator ∪ content), for the views."""
+        self._sync_curator()
+        return tuple(
+            frozenset(str(identity) for identity in group) for group in self._eq.groups()
+        )
+
+    def digest(self) -> str:
+        """The Merkle-style digest of the maintained derived tier (I-2)."""
+        self._sync_curator()
+        return derived_digest(self._eq)
+
+    def verify(self) -> ConsistencyReport:
+        """Check the maintained index against a from-scratch fold of the live source; self-heal."""
+        self._sync_curator()
+        checker = ConsistencyChecker(
+            self._eq,
+            pages=lambda: list(self._union.iter_pages()),
+            curator_edges=self._curator_pairs,
+        )
+        return checker.check_and_repair()
+
+    def _curator_pairs(self) -> list[tuple[Identity, Identity]]:
+        pairs: list[tuple[Identity, Identity]] = []
+        for group in self._log.fold(self._space).equivalence_groups:
+            members = [_identity(m) for m in group]
+            pairs.extend((members[0], other) for other in members[1:])
+        return pairs
diff --git a/src/shard_wiki/space.py b/src/shard_wiki/space.py
index 32c4532..5f0a87a 100644
--- a/src/shard_wiki/space.py
+++ b/src/shard_wiki/space.py
@@ -20,6 +20,7 @@ from shard_wiki.coordination import (
     Overlay,
     OverlayEngine,
 )
+from shard_wiki.incremental import ConsistencyReport, UnionIndex
 from shard_wiki.model import Page
 from shard_wiki.policy import DEFAULT_POLICY, Policy
 from shard_wiki.union import Resolution, UnionGraph
@@ -51,6 +52,8 @@ class InformationSpace:
         self.log = DecisionLog(store)
         self.union = UnionGraph(space_id, log=self.log, policy=policy)
         self.overlays = OverlayEngine(space_id, self.log)
+        self._index: UnionIndex | None = None  # maintained derived tier, built lazily
+        self._index_stale = True
 
     @classmethod
     def git_backed(
@@ -67,6 +70,7 @@ class InformationSpace:
         """Attach a shard — only if it passes conformance (verified profile, I-3/§6.6)."""
         assert_conformant(adapter)
         self.union.attach(adapter)
+        self._index_stale = True
 
     def alias(self, name: str, target: str, actor: str | None = None) -> None:
         """Record a coordination-canonical alias (``name`` → ``"shard:key"``) in the log."""
@@ -101,7 +105,29 @@ class InformationSpace:
         write-through-capable target fast-forwards (write-through); a read-only target keeps the
         draft as local truth (I-5: overlay before mutation, always)."""
         overlay = self.overlay(name, body, actor=actor)
-        return self.apply_overlay(overlay.overlay_id)
+        result = self.apply_overlay(overlay.overlay_id)
+        self._index_stale = True  # the applied edit changes the derived tier
+        return result
+
+    # --- maintained derived tier (SHARD-WP-0011): incremental-first, rebuild as fallback ---
+
+    @property
+    def index(self) -> UnionIndex:
+        """The maintained equivalence index (built lazily; rebuilt when the union has changed)."""
+        if self._index is None:
+            self._index = UnionIndex(self.union, self.log, self.space_id)
+        elif self._index_stale:
+            self._index.rebuild()  # bounded fallback after a mutation
+        self._index_stale = False
+        return self._index
+
+    def reindex(self) -> None:
+        """Force a full rebuild of the maintained derived tier (the explicit fallback path)."""
+        self.index.rebuild()
+
+    def verify_index(self) -> ConsistencyReport:
+        """Run the I-2 consistency-checker over the maintained tier; self-heal any drift."""
+        return self.index.verify()
 
     # --- derived views (SHARD-WP-0010): recomputable, provenance-carrying, presentation-free ---
 
@@ -114,8 +140,8 @@ class InformationSpace:
         return recent_changes(self.union, self.log, self.space_id, limit=limit)
 
     def all_pages(self) -> tuple[AllPagesEntry, ...]:
-        """The union's distinct pages, chorus/equivalence-collapsed with divergence noted."""
-        return all_pages(self.union)
+        """The union's distinct pages, collapsed via the maintained equivalence index."""
+        return all_pages(self.union, equivalence_groups=self.index.equivalence_groups())
 
     def site_map(self) -> SiteMapNode:
         """The union namespace tree built from page placements."""
diff --git a/src/shard_wiki/views/allpages.py b/src/shard_wiki/views/allpages.py
index d10e1c6..e704b48 100644
--- a/src/shard_wiki/views/allpages.py
+++ b/src/shard_wiki/views/allpages.py
@@ -62,8 +62,16 @@ class _UnionFind:
             self._parent[max(ra, rb)] = min(ra, rb)
 
 
-def all_pages(union: UnionGraph) -> tuple[AllPagesEntry, ...]:
-    """Enumerate the union's distinct pages, collapsing chorus + equivalence-bound members."""
+def all_pages(
+    union: UnionGraph,
+    equivalence_groups: tuple[frozenset[str], ...] | None = None,
+) -> tuple[AllPagesEntry, ...]:
+    """Enumerate the union's distinct pages, collapsing chorus + equivalence-bound members.
+
+    ``equivalence_groups`` (string identities, decision-log form) overrides the source of
+    equivalence — the orchestrator passes the maintained index's groups (SHARD-WP-0011 T4); the
+    default falls back to the decision-log fold, so direct callers are unaffected.
+    """
     pages: dict[str, Page] = {}
     by_key: dict[str, list[str]] = {}
     for page in union.iter_pages():
@@ -77,8 +85,9 @@ def all_pages(union: UnionGraph) -> tuple[AllPagesEntry, ...]:
     for idents in by_key.values():  # same key across shards → chorus
         for other in idents[1:]:
             uf.union(idents[0], other)
-    fold = union.log.fold(union.space)
-    for group in fold.equivalence_groups:  # decision-log bindings
+    if equivalence_groups is None:
+        equivalence_groups = union.log.fold(union.space).equivalence_groups
+    for group in equivalence_groups:  # curator bindings (+ maintained content edges)
         present = [m for m in group if m in pages]
         for other in present[1:]:
             uf.union(present[0], other)
diff --git a/tests/test_incremental_wiring.py b/tests/test_incremental_wiring.py
new file mode 100644
index 0000000..42a531b
--- /dev/null
+++ b/tests/test_incremental_wiring.py
@@ -0,0 +1,74 @@
+"""Wire the incremental tier behind InformationSpace views (SHARD-WP-0011 T4)."""
+
+from shard_wiki.adapters import FolderAdapter
+from shard_wiki.coordination import EventType
+from shard_wiki.model import Identity
+from shard_wiki.space import InformationSpace
+from shard_wiki.views import all_pages
+
+
+def _shard(tmp_path, name, files):
+    root = tmp_path / name
+    for rel, text in files.items():
+        p = root / rel
+        p.parent.mkdir(parents=True, exist_ok=True)
+        p.write_text(text, encoding="utf-8")
+    return FolderAdapter(name, root)
+
+
+def test_all_pages_via_index_matches_direct_fold(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "wiki", {"Home.md": "welcome", "Guide.md": "the guide"}))
+    space.attach(_shard(tmp_path, "notes", {"Daily.md": "today"}))
+    # Routed-through-index result equals the direct fold-based computation (behaviour unchanged).
+    via_index = {(e.name, e.members) for e in space.all_pages()}
+    direct = {(e.name, e.members) for e in all_pages(space.union)}
+    assert via_index == direct
+
+
+def test_curator_binding_collapses_via_maintained_index(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "x"}))
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "y"}))
+    space.log.append(
+        "space", EventType.BINDING_MADE, {"members": ["a:Foo", "b:Bar"]}
+    )
+    # The maintained index re-syncs curator edges live from the log fold.
+    collapsed = [e for e in space.all_pages() if len(e.members) == 2]
+    assert len(collapsed) == 1
+    assert set(collapsed[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}
+
+
+def test_content_duplicate_collapses_via_index(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "the very same body content here"}))
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "the very same body content here"}))
+    dup = [e for e in space.all_pages() if len(e.members) == 2]
+    assert len(dup) == 1  # content equivalence detected by the maintained index
+    assert set(dup[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}
+
+
+def test_attach_invalidates_index(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
+    assert space.all_pages()  # builds the index (one page, no groups)
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))  # marks index stale
+    dup = [e for e in space.all_pages() if len(e.members) == 2]
+    assert len(dup) == 1  # rebuilt fallback picks up the new equivalent page
+
+
+def test_verify_index_reports_healthy_when_consistent(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
+    space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))
+    space.all_pages()  # ensure built
+    report = space.verify_index()
+    assert report.healthy is True
+
+
+def test_reindex_is_an_explicit_fallback(tmp_path):
+    space = InformationSpace("space")
+    space.attach(_shard(tmp_path, "a", {"Foo.md": "content"}))
+    before = space.index.digest()
+    space.reindex()
+    assert space.index.digest() == before  # rebuild is deterministic
diff --git a/workplans/SHARD-WP-0011-incremental-union.md b/workplans/SHARD-WP-0011-incremental-union.md
index ba5bf45..fc07a45 100644
--- a/workplans/SHARD-WP-0011-incremental-union.md
+++ b/workplans/SHARD-WP-0011-incremental-union.md
@@ -4,7 +4,7 @@ type: workplan
 title: "incremental union maintenance + equivalence index + I-2 verification"
 domain: whynot
 repo: shard-wiki
-status: active
+status: done
 owner: tegwick
 topic_slug: whynot
 created: "2026-06-15"
@@ -41,7 +41,7 @@ deployment is later.
 
 ```task
 id: SHARD-WP-0011-T1
-status: todo
+status: done
 priority: high
 state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"
 ```
@@ -55,7 +55,7 @@ unrelated pages don't; verified edges match a brute-force oracle on a small corp
 
 ```task
 id: SHARD-WP-0011-T2
-status: todo
+status: done
 priority: high
 state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"
 ```
@@ -70,7 +70,7 @@ stale edge.
 
 ```task
 id: SHARD-WP-0011-T3
-status: todo
+status: done
 priority: high
 state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"
 ```
@@ -85,7 +85,7 @@ equivalent event orders.
 
 ```task
 id: SHARD-WP-0011-T4
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"
 ```
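
The content-equivalence path summarized in the SCOPE entry (MinHash/LSH blocking + verify, then curator ∪ content groups that `all_pages` collapses with a union-find) can be illustrated with a minimal, hypothetical sketch. It is not the project's `EquivalenceIndex` or `UnionIndex`: the word 3-shingles, the 8 bands x 4 rows layout, the seeded `blake2b` hashes, the 0.8 Jaccard threshold and every function name below are assumptions chosen for illustration. Identities are plain `"shard:key"` strings, matching the decision-log string form that `UnionIndex.equivalence_groups()` returns.

```python
# Hypothetical sketch (not shard_wiki code): MinHash/LSH blocking + exact-Jaccard
# verification, merged with curator edges through a union-find.
from __future__ import annotations

import hashlib
from collections import defaultdict
from collections.abc import Iterable
from itertools import combinations

BANDS, ROWS = 8, 4  # assumed LSH shape: 8 bands x 4 rows = 32 seeded hash functions


def shingles(text: str, k: int = 3) -> set[str]:
    """Word k-shingles; a text shorter than k words becomes a single shingle."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash(sh: set[str]) -> tuple[int, ...]:
    """One minimum per seeded 64-bit hash; deterministic across processes."""

    def h(seed: int, s: str) -> int:
        digest = hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    return tuple(min(h(seed, s) for s in sh) for seed in range(BANDS * ROWS))


def content_edges(pages: dict[str, str], threshold: float = 0.8) -> set[tuple[str, str]]:
    """Block candidates by identical signature bands, then verify by exact Jaccard."""
    sh = {ident: shingles(body) for ident, body in pages.items()}
    buckets: dict[tuple, list[str]] = defaultdict(list)
    for ident, s in sh.items():
        sig = minhash(s)
        for band in range(BANDS):
            buckets[(band, sig[band * ROWS : (band + 1) * ROWS])].append(ident)
    candidates = {pair for ids in buckets.values() for pair in combinations(sorted(ids), 2)}
    return {(a, b) for a, b in candidates if jaccard(sh[a], sh[b]) >= threshold}


class UnionFind:
    def __init__(self, items: Iterable[str]) -> None:
        self._parent = {x: x for x in items}

    def find(self, x: str) -> str:
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[max(ra, rb)] = min(ra, rb)  # smallest id becomes the root


def equivalence_groups(
    pages: dict[str, str], curator_edges: Iterable[tuple[str, str]]
) -> tuple[frozenset[str], ...]:
    """Connected components of curator ∪ content edges; singletons are omitted."""
    uf = UnionFind(pages)
    for a, b in set(curator_edges) | content_edges(pages):
        if a in pages and b in pages:  # ignore edges to pages not in the union
            uf.union(a, b)
    components: dict[str, set[str]] = defaultdict(set)
    for ident in pages:
        components[uf.find(ident)].add(ident)
    return tuple(frozenset(c) for c in components.values() if len(c) > 1)
```

A usage example, reusing the duplicate bodies from `test_content_duplicate_collapses_via_index`:

```python
pages = {
    "a:Foo": "the very same body content here",
    "b:Bar": "the very same body content here",
    "c:Baz": "something entirely different",
    "d:Qux": "x",
}
groups = equivalence_groups(pages, curator_edges=[("c:Baz", "d:Qux")])
# two groups: {"a:Foo", "b:Bar"} (content) and {"c:Baz", "d:Qux"} (curator)
```

Because every LSH candidate is re-checked by exact Jaccard, blocking only affects which pairs get compared (recall), never whether a reported pair is genuinely similar. Curator edges are unioned in regardless of content, which is why the result is a superset of the decision-log groups. The `max`/`min` root choice mirrors `_UnionFind` in `allpages.py`.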