Compare commits

...

10 Commits

Author SHA1 Message Date
def699c1eb feat(adapters): GitShardAdapter history adopt + cross-substrate integration (WP-0012 T3)
Adopt git-native history (TSD §A.5): a VERSION-gated history(key) surfaces the
commit list for a path (newest-first sha + subject) — declared by every git-IS-store
shard, read-only or not. Integration proves the union/overlay/edit machinery works
unchanged across folder + git substrates: resolve/chorus span both, edit through a
git shard fast-forwards as a commit, apply-under-drift refuses on an external commit
(sha drift) without clobbering, and a read-only git target keeps the overlay as a
draft. SCOPE updated; WP-0012 done. 196 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:41:19 +02:00
a4e0f52ec1 feat(adapters): GitShardAdapter write=commit + current_rev drift (WP-0012 T2)
Writable mode: write(key, body) stages and commits the file (skipping a no-op so
no empty commit is created), returning the page at the new commit sha. The
writable profile declares WRITE + VERSION with PER_PAGE granularity. current_rev
is the per-path commit sha, so a write — or an external commit to the same path —
moves it, driving apply-under-drift. Passes the conformance positive-write probe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:38:41 +02:00
4231daf94f feat(adapters): GitShardAdapter read path + git-IS-store profile (WP-0012 T1)
A second substrate validating the contract beyond plain folders: a git-IS-store
shard reading Markdown from a git repo. Keys are tracked *.md paths; read returns
a Page whose source_rev is the per-path last-commit sha (so an edit to one page
never drifts another); profile is git-IS-store / substrate=git / history=git-native
/ addressing=path, validated against the §6.5 implication rules. Passes the
conformance read path with honest absence of unclaimed verbs. Zero new deps
(git CLI via subprocess). No core changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:36:28 +02:00
37681d89b6 feat(incremental): wire maintained tier behind views; rebuild fallback (WP-0011 T4)
Route InformationSpace.all_pages through a maintained UnionIndex: equivalence is
served from the incrementally maintained index (curator bindings re-synced live
from the log fold + detected content edges), exposed in decision-log string form
so results are a behaviour-preserving superset. The index is built lazily and
rebuilt (bounded fallback) when the union mutates (attach/edit invalidate it);
reindex() forces a rebuild and verify_index() runs the I-2 self-healing checker.
all_pages() gains an optional equivalence_groups source (default = fold) so
direct callers are unaffected. SCOPE updated; WP-0011 done. 173 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:21:39 +02:00
a8e65235a8 feat(incremental): I-2 digest + consistency-checker (WP-0011 T3)
A Merkle-style digest summarizes the derived tier (per-identity fingerprint +
incident edges as order-independent leaves) so equal states have equal digests
and the digest is stable under equivalent event orders. A ConsistencyChecker
recomputes the authoritative fold from the current source, compares it over a
sampled region, and on mismatch scoped-recomputes just the affected identities —
self-healing missed-delta drift, corrupted internal state, and vanished pages.
Makes derived = f(canonical) verified, not asserted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:16:50 +02:00
d7d046cac0 test(incremental): delta maintenance == rebuild, retraction + split (WP-0011 T2)
Verify change-driven maintenance keeps the equivalence index equal to a
from-scratch rebuild under add / edit / remove: an edit into a new bucket
retracts the stale edge, an edit into equivalence adds one, and removing a
connector node propagates a retraction that splits a chorus. Equality checked
against a fresh build() oracle on every operation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:14:32 +02:00
0b3ab2086f feat(incremental): indexed equivalence — blocking + verify (WP-0011 T1)
Detect equivalence (distinct identities holding the same page) without pairwise
O(N²): MinHash/LSH bands over content shingles + normalized-title buckets
generate candidates (blocking), then exact-fingerprint or Jaccard>=threshold
confirm them (verify), with curator decision-log bindings always forming edges.
Groups are the connected components of the edge set. Includes the incremental
add/update/remove internals used by T2. Matches a brute-force oracle. New
incremental/ package (minhash primitives + EquivalenceIndex).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:13:06 +02:00
d85d019543 feat(views): wire derived views onto InformationSpace + integration (WP-0010 T5)
Expose backlinks(name), recent_changes(), all_pages(), site_map() on
InformationSpace. Integration test exercises all four over two shards (BackLinks
aggregate across shards, AllPages/SiteMap span the union, RecentChanges merges an
alias decision with shard edits). SCOPE updated; WP-0010 done. 152 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:05:12 +02:00
3a5acdcb28 feat(views): AllPages + SiteMap enumeration views (WP-0010 T4)
AllPages enumerates the union's distinct pages, collapsing chorus (same key
across shards) and equivalence-bound identities into one entry via union-find,
noting divergence when members' bodies differ (collapse acknowledged, not
silent). SiteMap builds the namespace tree from page placements, spanning shards.
Both derived/recomputable and presentation-free.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:03:15 +02:00
34b0c539f3 feat(views): RecentChanges merged change feed (WP-0010 T3)
One newest-first feed merging the coordination journal (overlay/alias/fork/merge/
binding decisions, with actor + payload) and shard change signals (page
source_rev / mtime). Each entry carries provenance: the originating shard for an
edit, or 'coordination' (and the actor) for a decision. Non-temporal revision
tokens are skipped gracefully. Derived/recomputable; notify-streaming later.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:59:11 +02:00
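The merged feed described in 34b0c539f3 can be pictured as a merge of two streams that are each already sorted newest-first. A minimal sketch under that assumption; `FeedEntry`, `merge_feeds`, and the sample entries are hypothetical and not taken from the `views` package:

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class FeedEntry:
    """Hypothetical feed entry: a timestamp, a provenance label, and a summary."""

    timestamp: float
    source: str  # originating shard id, or "coordination" for a decision
    summary: str


def merge_feeds(*feeds: Iterable[FeedEntry]) -> Iterator[FeedEntry]:
    """Merge newest-first feeds into one newest-first feed."""
    # heapq.merge needs every input sorted in the same direction as the output;
    # reverse=True means "largest timestamp first".
    return heapq.merge(*feeds, key=lambda entry: entry.timestamp, reverse=True)


# Made-up sample data: one coordination journal, one stream of shard edits.
journal = [
    FeedEntry(30.0, "coordination", "alias decision"),
    FeedEntry(10.0, "coordination", "binding decision"),
]
shard_edits = [
    FeedEntry(20.0, "shard-a", "edit Home"),
    FeedEntry(5.0, "shard-b", "edit About"),
]
merged = list(merge_feeds(journal, shard_edits))
```

Each merged entry keeps its `source`, which is the provenance the commit message refers to. Skipping non-temporal revision tokens is not modelled here.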
25 changed files with 1913 additions and 17 deletions


@@ -17,7 +17,7 @@ Learnings update both SCOPE and INTENT where necessary.
| Layer | State |
|-------|-------|
| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. 128 tests green, ~97% coverage |
| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. Derived views (SHARD-WP-0010): `views` (wikilink + red-link model, BackLinks, RecentChanges, AllPages/SiteMap) — recomputable, provenance-carrying, presentation-free, exposed via `InformationSpace.backlinks/recent_changes/all_pages/site_map`. Incremental-first derived tier (SHARD-WP-0011): `incremental` (indexed equivalence via MinHash/LSH blocking + verify, change-driven delta maintenance with retraction/propagation, Merkle-style digest + self-healing I-2 consistency-checker, `UnionIndex` routed behind `InformationSpace.all_pages` with rebuild as explicit fallback). Second adapter (SHARD-WP-0012): `GitShardAdapter` — git-IS-store substrate (read=tracked *.md, write=commit, current_rev=per-path sha for drift, adopted git-native history), passes conformance, works across folder+git shards in union/overlay/edit with no core change (capability-as-data proven on a second substrate). 196 tests green, ~97% coverage |
| Intent | `INTENT.md` established; authorization-in-core amendments drafted |
| Research | yawex prior art; c2 origins; federation concepts; wikiengines overview (`research/260608-*/`); XWiki/TWiki/Foswiki deep dives (`research/260613-*/`); Xanadu + ZigZag + Roam + Obsidian + Notion + Joplin + Logseq + local-first workspaces (Anytype/AFFiNE/AppFlowy) + Trilium + Wiki.js + Federated Wiki + Wikibase + git-forge wikis + TiddlyWiki + ikiwiki + Quip + MojoMojo + Oddmuse + UseModWiki deep dives & shard-spectrum synthesis (`research/260614-*/`) |
| Demand | NetKingdom integration asks captured, not yet negotiated |


@@ -9,10 +9,13 @@ from shard_wiki.adapters.conformance import (
)
from shard_wiki.adapters.contract import CONTRACT_VERSION, ShardAdapter
from shard_wiki.adapters.folder import FolderAdapter
from shard_wiki.adapters.git import GitShardAdapter, PageRevision
__all__ = [
    "ShardAdapter",
    "FolderAdapter",
    "GitShardAdapter",
    "PageRevision",
    "CONTRACT_VERSION",
    "Check",
    "ConformanceReport",


@@ -0,0 +1,180 @@
"""GitShardAdapter — a second substrate: git-as-store (SHARD-WP-0012; TSD §A.3 git-IS-store).

The home case where **git is the store *and* the journal**. Tracked ``*.md`` paths are the page
keys; the working-tree file is the body; a page's ``source_rev`` is the **commit sha of the last
commit touching its path** (per-path, so an edit to one page never drifts another). The declared
profile is *git-IS-store ⟹ substrate=git ∧ history=git-native* — the implication rule the
capability model enforces (§6.5), validated at registration like any other binding.

This adapter adds **no core changes**: it implements the same :class:`ShardAdapter` contract the
folder adapter does, proving "write an adapter + declare a verified profile" is the whole cost of a
new substrate (capability-as-data, I-3). Built on the ``git`` CLI via subprocess — zero new deps.
"""

from __future__ import annotations

import os
import subprocess
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path

from shard_wiki.adapters.contract import ShardAdapter
from shard_wiki.model import (
    AccessGrant,
    Addressing,
    AttachmentMode,
    CapabilityProfile,
    ContentOpacity,
    History,
    Identity,
    MergeModel,
    NativeQuery,
    NotSupported,
    OperationalEnvelope,
    Page,
    Placement,
    Substrate,
    Translation,
    Verb,
    WriteGranularity,
)
from shard_wiki.provenance import Liveness, ProvenanceEnvelope, Staleness

__all__ = ["GitShardAdapter", "PageRevision"]


@dataclass(frozen=True, slots=True)
class PageRevision:
    """One adopted git-native revision of a page: the commit sha and its subject line."""

    sha: str
    message: str


_GIT_IDENTITY = {
    "GIT_AUTHOR_NAME": "shard-wiki",
    "GIT_AUTHOR_EMAIL": "shard@shard-wiki",
    "GIT_COMMITTER_NAME": "shard-wiki",
    "GIT_COMMITTER_EMAIL": "shard@shard-wiki",
}


class GitShardAdapter(ShardAdapter):
    """A shard whose store is a git repo: keys are tracked ``*.md`` paths, revs are commit shas."""

    def __init__(self, shard_id: str, repo_path: str | Path, writable: bool = False) -> None:
        self._shard_id = shard_id
        self._repo = Path(repo_path)
        self._writable = writable
        self._repo.mkdir(parents=True, exist_ok=True)
        if not (self._repo / ".git").exists():
            self._git("init", "--quiet")

    @property
    def shard_id(self) -> str:
        return self._shard_id

    def profile(self) -> CapabilityProfile:
        # VERSION is always available — a git-IS-store has git-native history to adopt (§A.5),
        # read-only or not. WRITE (= commit, PER_PAGE) is added only in writable mode.
        verbs = {Verb.READ, Verb.VERSION}
        granularity = WriteGranularity.NONE
        if self._writable:
            verbs |= {Verb.WRITE}
            granularity = WriteGranularity.PER_PAGE
        return CapabilityProfile(
            substrate=Substrate.GIT,
            attachment_mode=AttachmentMode.GIT_IS_STORE,
            write_granularity=granularity,
            content_opacity=ContentOpacity.TRANSPARENT,
            operational_envelope=OperationalEnvelope.LOCAL_UNBOUNDED,
            access_grant=AccessGrant.OPEN,
            liveness=Liveness.STATIC,
            history=History.GIT_NATIVE,  # git-is-store ⟹ git-native (§6.5)
            merge_model=MergeModel.GIT_TEXT,
            addressing=Addressing.PATH,
            native_query=NativeQuery.NONE,
            translation=Translation.NATIVE,
            supported_verbs=frozenset(verbs),
        ).validate()

    def write(self, key: str, body: str) -> Page:
        """Write = **commit**: stage the file and commit it (skip a no-op so no empty commit),
        returning the page at the new sha. Drift detection rides on ``current_rev`` = that sha."""
        if not self._writable:
            raise NotSupported(f"{type(self).__name__} is read-only")
        rel = f"{key}.md"
        path = self._path_for(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body, encoding="utf-8")
        self._git("add", "--", rel)
        if self._run("diff", "--cached", "--quiet").returncode != 0:  # staged changes present
            self._git("commit", "-m", f"write {rel}", env=_GIT_IDENTITY)
        return self.read(key)

    def keys(self) -> Iterable[str]:
        out = self._git("ls-files", "*.md").decode()
        for line in out.splitlines():
            yield line[: -len(".md")] if line.endswith(".md") else line

    def read(self, key: str) -> Page:
        path = self._path_for(key)
        if not path.is_file():
            raise KeyError(key)
        rev = self.current_rev(key)
        return Page(
            identity=Identity(self._shard_id, key),
            body=path.read_text(encoding="utf-8"),
            envelope=ProvenanceEnvelope(
                source_shard=self._shard_id,
                liveness=Liveness.STATIC,
                staleness=Staleness.FRESH,
                source_rev=rev,
                lineage="git-native",
            ),
            placements=(Placement(self._shard_id, f"{key}.md"),),
        )

    def current_rev(self, key: str) -> str | None:
        """The sha of the last commit touching ``key``'s path (per-path drift token), or None."""
        rel = f"{key}.md"
        if not self._path_for(key).is_file():
            return None
        sha = self._git("log", "-1", "--format=%H", "--", rel).decode().strip()
        return sha or None

    def history(self, key: str) -> tuple[PageRevision, ...]:
        """Adopt git-native history (§A.5): the commit list for ``key``'s path, newest-first.

        VERSION-gated; raises ``KeyError`` for an unknown page. Each revision is a commit sha +
        subject — the native log surfaced through the contract, not re-implemented.
        """
        if not self.profile().supports(Verb.VERSION):
            raise NotSupported(f"{type(self).__name__} does not support version")
        if not self._path_for(key).is_file():
            raise KeyError(key)
        out = self._git("log", "--format=%H%x00%s", "--", f"{key}.md").decode()
        revisions = []
        for line in out.splitlines():
            sha, _, message = line.partition("\x00")
            revisions.append(PageRevision(sha=sha, message=message))
        return tuple(revisions)

    # -- git plumbing --------------------------------------------------------

    def _path_for(self, key: str) -> Path:
        return self._repo / f"{key}.md"

    def _git(self, *args: str, stdin: bytes | None = None, env: dict | None = None) -> bytes:
        return self._run(*args, stdin=stdin, env=env, check=True).stdout

    def _run(
        self, *args: str, stdin: bytes | None = None, env: dict | None = None, check: bool = False
    ) -> subprocess.CompletedProcess:
        return subprocess.run(
            ["git", "-C", str(self._repo), *args],
            input=stdin,
            capture_output=True,
            env={**os.environ, **(env or {})},
            check=check,
        )
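The drift check that `current_rev` feeds can be illustrated without git. A minimal sketch, assuming an in-memory store whose revision token is a content hash standing in for the per-path commit sha; `MemoryStore`, `DriftError`, and `apply_if_unchanged` are hypothetical names and not part of the adapter contract:

```python
from __future__ import annotations

import hashlib


class DriftError(Exception):
    """Raised when a page moved after the draft was based on it."""


class MemoryStore:
    """Hypothetical stand-in for a writable shard: one revision token per key."""

    def __init__(self) -> None:
        self._bodies: dict[str, str] = {}

    def current_rev(self, key: str) -> str | None:
        body = self._bodies.get(key)
        if body is None:
            return None
        # Stand-in for the per-path sha: the token depends only on this key's body.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def write(self, key: str, body: str) -> str:
        self._bodies[key] = body
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def apply_if_unchanged(store: MemoryStore, key: str, base_rev: str | None, body: str) -> str:
    """Apply a drafted edit only if the page still sits at the revision the draft was based on."""
    if store.current_rev(key) != base_rev:
        raise DriftError(f"{key} drifted since {base_rev}")
    return store.write(key, body)


store = MemoryStore()
base = store.write("home", "v1")
store.write("about", "another page")  # a different key: home's token must not move
unmoved = store.current_rev("home") == base
store.write("home", "external edit")  # simulates an external commit to the same path
try:
    apply_if_unchanged(store, "home", base, "my edit")
    refused = False
except DriftError:
    refused = True
```

The per-key token is what keeps an edit to one page from invalidating drafts of another; a repository-wide HEAD sha would refuse far more often.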


@@ -0,0 +1,46 @@
"""incremental/ — the incremental-first derived tier (CoreArchitectureBlueprint §8.7).

Equivalence is **indexed** (blocking/LSH + verify), not pairwise O(N²); maintenance is
**change-driven** (delta with retraction + propagation, review B-4), keeping the derived tier equal
to a from-scratch rebuild — which becomes a bounded fallback, not the operational path. A
Merkle-style **digest** plus a background **consistency-checker** make ``derived = f(canonical)``
verified rather than asserted (I-2), self-healing on detected drift.

In-memory only for this slice (no persisted index store); per-partition structure is honoured but
multi-tenant deployment is later. Per the dependency rule this imports down (model/provenance) and
is wired by the orchestrator.
"""

from shard_wiki.incremental.equivalence import (
    EquivalenceEdge,
    EquivalenceIndex,
    normalized_title,
)
from shard_wiki.incremental.minhash import (
    MinHasher,
    band_keys,
    jaccard,
    shingles,
)
from shard_wiki.incremental.union_index import UnionIndex
from shard_wiki.incremental.verification import (
    ConsistencyChecker,
    ConsistencyReport,
    derived_digest,
    region_digest,
)

__all__ = [
    "shingles",
    "MinHasher",
    "band_keys",
    "jaccard",
    "EquivalenceEdge",
    "EquivalenceIndex",
    "normalized_title",
    "derived_digest",
    "region_digest",
    "ConsistencyReport",
    "ConsistencyChecker",
    "UnionIndex",
]
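The package docstring's claim that the digest makes ``derived = f(canonical)`` checkable rests on the digest being independent of the order in which events arrived. A minimal sketch of that property, assuming leaves are sorted before they are combined; `leaf_digest`, `root_digest`, and the identity strings are hypothetical illustrations:

```python
import hashlib


def leaf_digest(identity: str, fingerprint: str, peers: list[str]) -> str:
    """One leaf: an identity, its content fingerprint, and its equivalence peers (sorted)."""
    payload = f"{identity}|{fingerprint}|{','.join(sorted(peers))}"
    return hashlib.blake2b(payload.encode("utf-8"), digest_size=16).hexdigest()


def root_digest(leaves: list[str]) -> str:
    """Combine leaves into one root; sorting first removes any dependence on arrival order."""
    root = hashlib.blake2b(digest_size=16)
    for leaf in sorted(leaves):
        root.update(leaf.encode("utf-8"))
    return root.hexdigest()


# Made-up state: two equivalent pages and one unrelated page.
a = leaf_digest("s1:home", "f1", ["s2:home"])
b = leaf_digest("s2:home", "f1", ["s1:home"])
c = leaf_digest("s1:about", "f2", [])

first_order = root_digest([a, b, c])
second_order = root_digest([c, a, b])  # same state, different arrival order
after_edit = root_digest([a, b, leaf_digest("s1:about", "f3", [])])  # one page changed
```

Because the root is a pure function of the sorted leaves, comparing a maintained index with a from-scratch rebuild reduces to comparing two short strings.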


@@ -0,0 +1,225 @@
"""Indexed equivalence — blocking + verify, incrementally maintained (SHARD-WP-0011 T1/T2).

Equivalence (two *distinct* identities holding the same page) is detected without pairwise O(N²):

1. **Blocking** generates candidate pairs — pages sharing a normalized-title bucket or an LSH band
   (MinHash over content shingles).
2. **Verify** confirms a candidate — exact-body fingerprint match, or shingle Jaccard ≥ threshold —
   plus **curator bindings** (explicit decision-log edges) which are always equivalence edges.

The index is **incrementally maintained** (T2): ``add`` / ``update`` / ``remove`` re-bucket the
changed page, **retract** the edges it leaves and **add** the edges it enters; equivalence groups
are the connected components of the current edge set, so a retraction that disconnects a component
**splits** a chorus automatically. A full :meth:`build` is just repeated ``add`` — the bounded
rebuild fallback. The invariant (and the test oracle): incremental state == a from-scratch rebuild.
"""

from __future__ import annotations

import hashlib
import re
from collections.abc import Iterable
from dataclasses import dataclass

from shard_wiki.incremental.minhash import MinHasher, band_keys, jaccard, shingles
from shard_wiki.model import Identity, Page

__all__ = ["EquivalenceEdge", "EquivalenceIndex", "normalized_title"]

_NONALNUM_RE = re.compile(r"[^a-z0-9]+")


def normalized_title(key: str) -> str:
    """A blocking bucket key: the last path segment, lowercased, stripped of non-alphanumerics."""
    leaf = key.rsplit("/", 1)[-1]
    return _NONALNUM_RE.sub("", leaf.lower())


@dataclass(frozen=True, slots=True)
class EquivalenceEdge:
    """A verified equivalence between two identities, tagged with why it was accepted."""

    a: Identity
    b: Identity
    reason: str  # "fingerprint" | "content" | "curator"


@dataclass(frozen=True, slots=True)
class _Entry:
    shingle_set: frozenset[str]
    bands: tuple[tuple[int, tuple[int, ...]], ...]
    title: str
    fingerprint: str


def _fingerprint(body: str) -> str:
    return hashlib.blake2b(body.strip().encode("utf-8"), digest_size=16).hexdigest()


def _pair(a: Identity, b: Identity) -> frozenset[Identity]:
    return frozenset((a, b))


class EquivalenceIndex:
    """An incrementally maintained, blocked-and-verified equivalence relation over union pages."""

    def __init__(
        self,
        *,
        num_perm: int = 64,
        num_bands: int = 32,
        threshold: float = 0.7,
        hasher: MinHasher | None = None,
    ) -> None:
        self.threshold = threshold
        self.num_bands = num_bands
        self._hasher = hasher or MinHasher(num_perm=num_perm)
        self._entries: dict[Identity, _Entry] = {}
        self._band_buckets: dict[tuple[int, tuple[int, ...]], set[Identity]] = {}
        self._title_buckets: dict[str, set[Identity]] = {}
        self._content_edges: dict[frozenset[Identity], str] = {}
        self._curator_edges: set[frozenset[Identity]] = set()

    # -- build / maintain ----------------------------------------------------

    def build(
        self,
        pages: Iterable[Page],
        curator_edges: Iterable[tuple[Identity, Identity]] = (),
    ) -> None:
        """Rebuild from scratch (the bounded fallback): add every page, then curator edges."""
        self.__init__(
            num_bands=self.num_bands, threshold=self.threshold, hasher=self._hasher
        )
        for page in pages:
            self.add(page)
        for a, b in curator_edges:
            self.bind(a, b)

    def add(self, page: Page) -> None:
        """Index a new (or, via :meth:`update`, refreshed) page and add its equivalence edges."""
        identity = page.identity
        entry = self._make_entry(page)
        self._entries[identity] = entry
        for key in entry.bands:
            self._band_buckets.setdefault(key, set()).add(identity)
        self._title_buckets.setdefault(entry.title, set()).add(identity)
        for candidate in self._candidates(identity, entry):
            reason = self._verify(identity, candidate)
            if reason is not None:
                self._content_edges[_pair(identity, candidate)] = reason

    def remove(self, identity: Identity) -> None:
        """Drop a page: de-bucket it and retract every content edge incident to it."""
        entry = self._entries.pop(identity, None)
        if entry is None:
            return
        for key in entry.bands:
            self._discard_bucket(self._band_buckets, key, identity)
        self._discard_bucket(self._title_buckets, entry.title, identity)
        for edge in [e for e in self._content_edges if identity in e]:
            del self._content_edges[edge]

    def update(self, page: Page) -> None:
        """Apply a change as retract-then-add: stale (bucket-exit) edges go, new edges arrive."""
        self.remove(page.identity)
        self.add(page)

    def bind(self, a: Identity, b: Identity) -> None:
        """Record a curator equivalence (an explicit decision-log binding); always an edge."""
        if a != b:
            self._curator_edges.add(_pair(a, b))

    def unbind(self, a: Identity, b: Identity) -> None:
        self._curator_edges.discard(_pair(a, b))

    def set_curator_edges(self, edges: Iterable[tuple[Identity, Identity]]) -> None:
        """Replace all curator edges at once (re-syncing from the decision-log fold)."""
        self._curator_edges = {_pair(a, b) for a, b in edges if a != b}

    # -- queries -------------------------------------------------------------

    def identities(self) -> frozenset[Identity]:
        """All identities currently present in the index."""
        return frozenset(self._entries)

    def fingerprint(self, identity: Identity) -> str | None:
        """The content fingerprint indexed for ``identity`` (None if absent) — a digest leaf."""
        entry = self._entries.get(identity)
        return entry.fingerprint if entry is not None else None

    def edges(self) -> frozenset[frozenset[Identity]]:
        """All equivalence edges (content + curator) among currently present identities."""
        present = self._entries.keys()
        curator = {e for e in self._curator_edges if e <= present}
        return frozenset(set(self._content_edges) | curator)

    def groups(self) -> tuple[frozenset[Identity], ...]:
        """Equivalence groups: connected components of size ≥ 2 (union-find over the edges)."""
        parent: dict[Identity, Identity] = {}

        def find(x: Identity) -> Identity:
            parent.setdefault(x, x)
            root = x
            while parent[root] != root:
                root = parent[root]
            while parent[x] != root:
                parent[x], x = root, parent[x]
            return root

        for edge in self.edges():
            a, b = tuple(edge)
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
        comps: dict[Identity, set[Identity]] = {}
        for node in parent:
            comps.setdefault(find(node), set()).add(node)
        return tuple(
            frozenset(members) for members in comps.values() if len(members) > 1
        )

    def equivalent_to(self, identity: Identity) -> frozenset[Identity]:
        """The equivalence group containing ``identity`` (including itself), else just itself."""
        for group in self.groups():
            if identity in group:
                return group
        return frozenset({identity})

    # -- internals -----------------------------------------------------------

    def _make_entry(self, page: Page) -> _Entry:
        shingle_set = shingles(page.body)
        signature = self._hasher.signature(shingle_set)
        return _Entry(
            shingle_set=shingle_set,
            bands=band_keys(signature, self.num_bands),
            title=normalized_title(page.identity.key),
            fingerprint=_fingerprint(page.body),
        )

    def _candidates(self, identity: Identity, entry: _Entry) -> set[Identity]:
        candidates: set[Identity] = set()
        for key in entry.bands:
            candidates |= self._band_buckets.get(key, set())
        candidates |= self._title_buckets.get(entry.title, set())
        candidates.discard(identity)
        return candidates

    def _verify(self, a: Identity, b: Identity) -> str | None:
        ea, eb = self._entries[a], self._entries[b]
        if ea.fingerprint == eb.fingerprint:
            return "fingerprint"
        if jaccard(ea.shingle_set, eb.shingle_set) >= self.threshold:
            return "content"
        return None

    @staticmethod
    def _discard_bucket(buckets: dict, key, identity: Identity) -> None:
        bucket = buckets.get(key)
        if bucket is not None:
            bucket.discard(identity)
            if not bucket:
                del buckets[key]
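The module docstring notes that a retraction which disconnects a component splits a chorus automatically. A minimal sketch of why, assuming groups are recomputed as connected components of whatever edges remain; `components` and the node names are hypothetical, and plain strings stand in for `Identity`:

```python
from __future__ import annotations


def components(edges: set[frozenset[str]]) -> set[frozenset[str]]:
    """Connected components of size >= 2 over undirected two-node edges (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return {frozenset(members) for members in groups.values() if len(members) > 1}


# Made-up edge set: "x" is the connector joining the a-side and the b-side.
edges = {
    frozenset(("a1", "a2")),
    frozenset(("a2", "x")),
    frozenset(("x", "b1")),
    frozenset(("b1", "b2")),
}
before = components(edges)

# Removing "x" retracts every edge incident to it; no explicit split step is needed.
after = components({edge for edge in edges if "x" not in edge})
```

The split falls out of the edge set alone, which is why `remove` only has to retract edges and never touches group membership directly.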


@@ -0,0 +1,71 @@
"""MinHash + LSH banding primitives for content-similarity blocking (SHARD-WP-0011 T1).

Pure, deterministic functions (fixed hashing, no per-run randomness) so the derived tier and its
digest are reproducible. Shingle a body into k-grams, MinHash the shingle set into a signature,
split the signature into LSH bands; two pages sharing a band are *candidates* for equivalence —
the cheap pre-filter that replaces pairwise O(N²) comparison.
"""

from __future__ import annotations

import hashlib
import random
import re
from collections.abc import Iterable

__all__ = ["shingles", "MinHasher", "band_keys", "jaccard"]

_WORD_RE = re.compile(r"\w+")
# Largest Mersenne prime below 2**61 — the modulus for the universal-hash permutations.
_PRIME = (1 << 61) - 1


def shingles(text: str, k: int = 3) -> frozenset[str]:
    """The set of word k-grams in ``text`` (lowercased). Short texts fall back to their word set."""
    words = _WORD_RE.findall(text.lower())
    if len(words) < k:
        return frozenset(words)
    return frozenset(" ".join(words[i : i + k]) for i in range(len(words) - k + 1))


def _stable_hash(token: str) -> int:
    return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")


class MinHasher:
    """A bank of ``num_perm`` universal hash permutations producing a fixed-length signature."""

    def __init__(self, num_perm: int = 64, seed: int = 1) -> None:
        self.num_perm = num_perm
        rng = random.Random(seed)
        self._coeffs = [
            (rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)
        ]

    def signature(self, shingle_set: Iterable[str]) -> tuple[int, ...]:
        """The MinHash signature of ``shingle_set`` (empty set → all-``_PRIME`` sentinel)."""
        hashed = [_stable_hash(s) for s in shingle_set]
        if not hashed:
            return tuple(_PRIME for _ in self._coeffs)
        return tuple(min((a * h + b) % _PRIME for h in hashed) for a, b in self._coeffs)


def band_keys(
    signature: tuple[int, ...], num_bands: int
) -> tuple[tuple[int, tuple[int, ...]], ...]:
    """Split a signature into ``num_bands`` band keys; two pages sharing one are LSH candidates."""
    if num_bands <= 0 or len(signature) % num_bands != 0:
        raise ValueError(f"signature length {len(signature)} not divisible into {num_bands} bands")
    rows = len(signature) // num_bands
    return tuple(
        (b, signature[b * rows : (b + 1) * rows]) for b in range(num_bands)
    )


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two shingle sets; two empty sets are defined as identical (1.0)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
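How aggressive the banding pre-filter is can be estimated with the standard LSH result: with `b` bands of `r` rows each, two pages with Jaccard similarity `s` share at least one band with probability `1 - (1 - s**r)**b`. A minimal sketch, assuming each signature position agrees independently with probability `s` (an idealisation of the universal-hash permutations used here); `candidate_probability` is a hypothetical helper, and its defaults mirror the 64 permutations and 32 bands that `EquivalenceIndex` defaults to:

```python
def candidate_probability(similarity: float, num_perm: int = 64, num_bands: int = 32) -> float:
    """Chance that two pages with the given Jaccard similarity collide in at least one band."""
    if num_bands <= 0 or num_perm % num_bands != 0:
        raise ValueError(f"{num_perm} permutations not divisible into {num_bands} bands")
    rows = num_perm // num_bands
    # One band matches only if all of its rows agree: similarity ** rows.
    # The pair is missed only if every band fails to match.
    return 1.0 - (1.0 - similarity**rows) ** num_bands


at_threshold = candidate_probability(0.7)  # pairs the verify step would accept
far_below = candidate_probability(0.2)     # pairs the verify step would reject
```

Under this model the defaults give 2 rows per band, so pairs at the 0.7 threshold are almost never missed, while pairs at similarity 0.2 still become candidates roughly 73% of the time. Blocking with these settings favours recall, and the verify step is what supplies precision.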


@@ -0,0 +1,91 @@
"""UnionIndex — the maintained derived tier wired behind resolution + views (SHARD-WP-0011 T4).

Wraps a :class:`UnionGraph` + decision log with an incrementally maintained
:class:`EquivalenceIndex`. Content equivalence is kept fresh by deltas (``note_change`` /
``note_removed``); curator bindings are re-synced live from the log fold. A full :meth:`rebuild`
is the bounded fallback. :meth:`verify` runs the I-2 consistency-checker over the live source.

Consumer-visible results are unchanged — equivalence groups are exposed in the same string form the
decision-log fold uses, a *superset* that additionally collapses genuine content duplicates — only
freshness and cost differ (recompute-on-read becomes change-driven).
"""

from __future__ import annotations

from shard_wiki.coordination import DecisionLog
from shard_wiki.incremental.equivalence import EquivalenceIndex
from shard_wiki.incremental.verification import (
    ConsistencyChecker,
    ConsistencyReport,
    derived_digest,
)
from shard_wiki.model import Identity, Page
from shard_wiki.union import UnionGraph

__all__ = ["UnionIndex"]


def _identity(token: str) -> Identity:
    shard, _, key = token.partition(":")
    return Identity(shard, key)


class UnionIndex:
    """An incrementally maintained equivalence index over a union, with a rebuild fallback."""

    def __init__(self, union: UnionGraph, log: DecisionLog, space: str) -> None:
        self._union = union
        self._log = log
        self._space = space
        self._eq = EquivalenceIndex()
        self.rebuild()

    def rebuild(self) -> None:
        """The bounded fallback: re-derive the whole index from current union pages + bindings."""
        self._eq.build(self._union.iter_pages())
        self._sync_curator()

    def note_change(self, page: Page) -> None:
        """Change-driven update for one added/edited page (the operational path)."""
        self._eq.update(page)

    def note_removed(self, identity: Identity) -> None:
        self._eq.remove(identity)

    def _sync_curator(self) -> None:
        """Re-sync curator equivalence from the live decision-log fold (cheap, always correct)."""
        groups = self._log.fold(self._space).equivalence_groups
        edges: list[tuple[Identity, Identity]] = []
        for group in groups:
            members = [_identity(m) for m in group]
            edges.extend((members[0], other) for other in members[1:])
        self._eq.set_curator_edges(edges)

    def equivalence_groups(self) -> tuple[frozenset[str], ...]:
        """Equivalence groups in decision-log string form (curator + content), for the views."""
        self._sync_curator()
        return tuple(
            frozenset(str(identity) for identity in group) for group in self._eq.groups()
        )

    def digest(self) -> str:
        """The Merkle-style digest of the maintained derived tier (I-2)."""
        self._sync_curator()
        return derived_digest(self._eq)

    def verify(self) -> ConsistencyReport:
        """Check the maintained index against a from-scratch fold of the live source; self-heal."""
        self._sync_curator()
        checker = ConsistencyChecker(
            self._eq,
            pages=lambda: list(self._union.iter_pages()),
            curator_edges=self._curator_pairs,
        )
        return checker.check_and_repair()

    def _curator_pairs(self) -> list[tuple[Identity, Identity]]:
        pairs: list[tuple[Identity, Identity]] = []
        for group in self._log.fold(self._space).equivalence_groups:
            members = [_identity(m) for m in group]
            pairs.extend((members[0], other) for other in members[1:])
        return pairs
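The curator re-sync turns each decision-log group into a star of edges rather than every pair. A minimal sketch of that expansion, assuming groups arrive as sets of `shard:key` tokens; the `Identity` NamedTuple is a hypothetical stand-in for `shard_wiki.model.Identity`, and the hub is chosen by sorting purely to make this example deterministic:

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Identity(NamedTuple):
    """Hypothetical stand-in for the model's Identity: a shard id plus a page key."""

    shard: str
    key: str


def parse_token(token: str) -> Identity:
    # partition splits on the first ":" only, so a key may itself contain colons.
    shard, _, key = token.partition(":")
    return Identity(shard, key)


def star_edges(group: Iterable[str]) -> list[tuple[Identity, Identity]]:
    """Connect the first member to every other member: n - 1 edges for n members."""
    members = sorted(parse_token(token) for token in group)
    if not members:
        return []
    hub = members[0]
    return [(hub, other) for other in members[1:]]


edges = star_edges({"s1:a", "s2:a", "s3:docs:a"})
```

A star is enough because groups are read back as connected components: any two members are joined through the hub, so the n(n-1)/2 pairwise edges would add nothing.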


@@ -0,0 +1,112 @@
"""I-2 verification — digest + background consistency-checker (SHARD-WP-0011 T3).
``derived = f(canonical)`` is made *verified*, not asserted. A **Merkle-style digest** summarizes
the derived tier (each identity's content fingerprint + its incident equivalence edges as a leaf,
order-independently combined into a root), so two derived states have matching digests iff they
are equal, up to hash collisions.
A **consistency-checker** recomputes the authoritative fold from the current source, compares it to
the maintained index over a (sampled) region, and on mismatch performs a **scoped recompute** of
just the affected identities — self-healing drift from a missed delta or corrupted state.
The digest is a pure function of index state, so it is "maintained alongside deltas" for free and
is stable under equivalent event orders (leaves are sorted before combination).
"""
from __future__ import annotations
import hashlib
from collections.abc import Callable, Iterable
from dataclasses import dataclass
from shard_wiki.incremental.equivalence import EquivalenceIndex
from shard_wiki.model import Identity, Page
__all__ = ["region_digest", "derived_digest", "ConsistencyReport", "ConsistencyChecker"]
CuratorEdges = Iterable[tuple[Identity, Identity]]
def _leaf(index: EquivalenceIndex, identity: Identity) -> str:
"""A digest leaf for one identity: its fingerprint + its incident edges (as sorted peers)."""
fingerprint = index.fingerprint(identity) or ""
peers = sorted(
str(other)
for edge in index.edges()
if identity in edge
for other in edge
if other != identity
)
payload = f"{identity}|{fingerprint}|{','.join(peers)}"
return hashlib.blake2b(payload.encode("utf-8"), digest_size=16).hexdigest()
def region_digest(index: EquivalenceIndex, identities: Iterable[Identity]) -> str:
"""A Merkle-style root over the given identities' leaves (order-independent)."""
leaves = sorted(_leaf(index, identity) for identity in identities)
root = hashlib.blake2b(digest_size=16)
for leaf in leaves:
root.update(leaf.encode("utf-8"))
return root.hexdigest()
def derived_digest(index: EquivalenceIndex) -> str:
"""The digest of the whole maintained derived tier."""
return region_digest(index, index.identities())
@dataclass(frozen=True, slots=True)
class ConsistencyReport:
"""Outcome of a consistency check: what was examined, whether it drifted, and if it healed."""
checked: int
drifted: bool
repaired: bool
healthy: bool
class ConsistencyChecker:
"""Compares the maintained index against an authoritative rebuild and repairs drift in place."""
def __init__(
self,
index: EquivalenceIndex,
pages: Callable[[], Iterable[Page]],
curator_edges: Callable[[], CuratorEdges] = lambda: (),
) -> None:
self._index = index
self._pages = pages
self._curator = curator_edges
def _authoritative(self) -> EquivalenceIndex:
expected = EquivalenceIndex(
num_bands=self._index.num_bands, threshold=self._index.threshold
)
expected.build(list(self._pages()), list(self._curator()))
return expected
def check_and_repair(self, sample: Iterable[Identity] | None = None) -> ConsistencyReport:
"""Verify the (sampled) region against a from-scratch fold; scoped-recompute on mismatch."""
source = {p.identity: p for p in self._pages()}
expected = self._authoritative()
region = (
set(sample)
if sample is not None
else set(source) | set(self._index.identities())
)
drifted = region_digest(self._index, region) != region_digest(expected, region)
if not drifted:
return ConsistencyReport(len(region), drifted=False, repaired=False, healthy=True)
self._repair(region, source)
healthy = region_digest(self._index, region) == region_digest(expected, region)
return ConsistencyReport(len(region), drifted=True, repaired=True, healthy=healthy)
    def _repair(self, region: set[Identity], source: dict[Identity, Page]) -> None:
        """Scoped recompute: reconcile each affected identity to the current source."""
        # Snapshot membership first: the loop below mutates the index.
        present = set(self._index.identities())
        for identity in region:
            page = source.get(identity)
            if page is not None:
                if identity in present:
                    self._index.update(page)
                else:
                    self._index.add(page)
            elif identity in present:
                self._index.remove(identity)
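
The order-independence claim in the module docstring can be sketched without the real index. The example below is an illustration under assumptions: a plain dict of fingerprints and a set of edges stand in for `EquivalenceIndex`, and the `leaf` and `digest` names are hypothetical.

```python
import hashlib


def leaf(identity: str, fingerprint: str, edges: set[frozenset[str]]) -> str:
    """Hash one identity together with its fingerprint and its sorted peers."""
    peers = sorted(
        other for edge in edges if identity in edge for other in edge if other != identity
    )
    payload = f"{identity}|{fingerprint}|{','.join(peers)}"
    return hashlib.blake2b(payload.encode("utf-8"), digest_size=16).hexdigest()


def digest(fingerprints: dict[str, str], edges: set[frozenset[str]]) -> str:
    """Combine the sorted leaves into one root, so insertion order cannot matter."""
    root = hashlib.blake2b(digest_size=16)
    for item in sorted(leaf(i, fp, edges) for i, fp in fingerprints.items()):
        root.update(item.encode("utf-8"))
    return root.hexdigest()


forward = {"A:Foo": "f1", "B:Bar": "f1", "C:Baz": "f2"}
backward = dict(reversed(list(forward.items())))  # same state, opposite insertion order
edges = {frozenset({"A:Foo", "B:Bar"})}

print(digest(forward, edges) == digest(backward, edges))  # order does not matter
print(digest(forward, edges) != digest(forward, set()))   # dropping an edge does
```

Because every leaf is a fixed-length hex string, concatenating the sorted leaves is unambiguous without a separator.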


@@ -20,9 +20,20 @@ from shard_wiki.coordination import (
Overlay,
OverlayEngine,
)
from shard_wiki.incremental import ConsistencyReport, UnionIndex
from shard_wiki.model import Page
from shard_wiki.policy import DEFAULT_POLICY, Policy
from shard_wiki.union import Resolution, UnionGraph
from shard_wiki.views import (
AllPagesEntry,
BackLink,
ChangeEntry,
SiteMapNode,
all_pages,
build_backlinks,
recent_changes,
site_map,
)
__all__ = ["InformationSpace"]
@@ -41,6 +52,8 @@ class InformationSpace:
self.log = DecisionLog(store)
self.union = UnionGraph(space_id, log=self.log, policy=policy)
self.overlays = OverlayEngine(space_id, self.log)
self._index: UnionIndex | None = None # maintained derived tier, built lazily
self._index_stale = True
@classmethod
def git_backed(
@@ -57,6 +70,7 @@ class InformationSpace:
"""Attach a shard — only if it passes conformance (verified profile, I-3/§6.6)."""
assert_conformant(adapter)
self.union.attach(adapter)
self._index_stale = True
def alias(self, name: str, target: str, actor: str | None = None) -> None:
"""Record a coordination-canonical alias (``name`` → ``"shard:key"``) in the log."""
@@ -91,4 +105,44 @@ class InformationSpace:
write-through-capable target fast-forwards (write-through); a read-only target keeps the
draft as local truth (I-5: overlay before mutation, always)."""
        overlay = self.overlay(name, body, actor=actor)
        result = self.apply_overlay(overlay.overlay_id)
        self._index_stale = True  # the applied edit changes the derived tier
        return result
# --- maintained derived tier (SHARD-WP-0011): incremental-first, rebuild as fallback ---
@property
def index(self) -> UnionIndex:
"""The maintained equivalence index (built lazily; rebuilt when the union has changed)."""
if self._index is None:
self._index = UnionIndex(self.union, self.log, self.space_id)
elif self._index_stale:
self._index.rebuild() # bounded fallback after a mutation
self._index_stale = False
return self._index
def reindex(self) -> None:
"""Force a full rebuild of the maintained derived tier (the explicit fallback path)."""
self.index.rebuild()
def verify_index(self) -> ConsistencyReport:
"""Run the I-2 consistency-checker over the maintained tier; self-heal any drift."""
return self.index.verify()
# --- derived views (SHARD-WP-0010): recomputable, provenance-carrying, presentation-free ---
def backlinks(self, name: str, *, camelcase: bool = False) -> tuple[BackLink, ...]:
"""Pages across the union that link to ``name`` (UC-18)."""
return build_backlinks(self.union, camelcase=camelcase).to(name)
def recent_changes(self, *, limit: int | None = None) -> tuple[ChangeEntry, ...]:
"""The merged newest-first change feed: coordination journal + shard signals (UC-17)."""
return recent_changes(self.union, self.log, self.space_id, limit=limit)
def all_pages(self) -> tuple[AllPagesEntry, ...]:
"""The union's distinct pages, collapsed via the maintained equivalence index."""
return all_pages(self.union, equivalence_groups=self.index.equivalence_groups())
def site_map(self) -> SiteMapNode:
"""The union namespace tree built from page placements."""
return site_map(self.union)
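
The stale-flag pattern used by the `index` property (mark on mutation, rebuild on the next read) can be sketched in isolation. This is a simplified stand-in, not the orchestrator itself: `LazyIndex`, `mutate`, and the `rebuilds` counter are hypothetical names used only for illustration.

```python
from __future__ import annotations


class LazyIndex:
    """Hypothetical stand-in for an orchestrator that owns a derived index."""

    def __init__(self, source: list[str]) -> None:
        self._source = source
        self._index: set[str] | None = None
        self._stale = True
        self.rebuilds = 0

    def mutate(self, item: str) -> None:
        self._source.append(item)
        self._stale = True  # mark only; the rebuild is deferred to the next read

    @property
    def index(self) -> set[str]:
        if self._index is None or self._stale:
            self._index = set(self._source)  # the full-rebuild fallback
            self.rebuilds += 1
            self._stale = False
        return self._index


space = LazyIndex(["Home"])
space.mutate("Daily")
space.mutate("Guide")
print(sorted(space.index), space.rebuilds)
```

Several mutations between two reads cost a single rebuild, which is the point of deferring the work to the accessor.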


@@ -6,6 +6,7 @@ stays out of core (L6) — these produce models, never rendered output. Per the
package imports down (union/model/coordination/provenance) and is imported only by the orchestrator.
"""
from shard_wiki.views.allpages import AllPagesEntry, SiteMapNode, all_pages, site_map
from shard_wiki.views.backlinks import BackLink, BackLinksIndex, build_backlinks
from shard_wiki.views.links import (
ResolvedLink,
@@ -13,6 +14,7 @@ from shard_wiki.views.links import (
extract_links,
resolve_links,
)
from shard_wiki.views.recentchanges import ChangeEntry, recent_changes
__all__ = [
"WikiLink",
@@ -22,4 +24,10 @@ __all__ = [
"BackLink",
"BackLinksIndex",
"build_backlinks",
"ChangeEntry",
"recent_changes",
"AllPagesEntry",
"SiteMapNode",
"all_pages",
"site_map",
]


@@ -0,0 +1,131 @@
"""AllPages + SiteMap — enumeration views over the union (SHARD-WP-0010 T4).
**AllPages** lists the union's distinct pages, collapsing identities that name the same page: a
*chorus* (same key across shards) and *equivalence-bound* identities (decision-log bindings) fold
into one entry, with divergence noted when the members' bodies differ (union without erasure — the
collapse is acknowledged, never silent). **SiteMap** is the namespace tree built from page
placements (paths), spanning shards.
Both are derived/recomputable and presentation-free (the tree is a model, not rendered HTML).
"""
from __future__ import annotations
from dataclasses import dataclass
from shard_wiki.model import Identity, Page
from shard_wiki.union import UnionGraph
__all__ = ["AllPagesEntry", "SiteMapNode", "all_pages", "site_map"]
@dataclass(frozen=True, slots=True)
class AllPagesEntry:
    """One union page: its representative ``name``, the ``members`` collapsed into it, and
    ``diverges``, which is true when those members' bodies differ (a chorus with differing
    content)."""

    name: str
    members: tuple[Identity, ...]
    diverges: bool
@dataclass(frozen=True, slots=True)
class SiteMapNode:
"""A namespace node: its path ``name``, child namespaces, and pages directly under it."""
name: str
children: tuple[SiteMapNode, ...]
pages: tuple[Identity, ...]
class _UnionFind:
    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def add(self, x: str) -> None:
        self._parent.setdefault(x, x)

    def find(self, x: str) -> str:
        self.add(x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[x] != root:
            self._parent[x], x = root, self._parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        self.add(a)
        self.add(b)
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[max(ra, rb)] = min(ra, rb)
def all_pages(
union: UnionGraph,
equivalence_groups: tuple[frozenset[str], ...] | None = None,
) -> tuple[AllPagesEntry, ...]:
"""Enumerate the union's distinct pages, collapsing chorus + equivalence-bound members.
``equivalence_groups`` (string identities, decision-log form) overrides the source of
equivalence — the orchestrator passes the maintained index's groups (SHARD-WP-0011 T4); the
default falls back to the decision-log fold, so direct callers are unaffected.
"""
pages: dict[str, Page] = {}
by_key: dict[str, list[str]] = {}
for page in union.iter_pages():
ident = str(page.identity)
pages[ident] = page
by_key.setdefault(page.identity.key, []).append(ident)
uf = _UnionFind()
for ident in pages:
uf.add(ident)
for idents in by_key.values(): # same key across shards → chorus
for other in idents[1:]:
uf.union(idents[0], other)
if equivalence_groups is None:
equivalence_groups = union.log.fold(union.space).equivalence_groups
for group in equivalence_groups: # curator bindings (+ maintained content edges)
present = [m for m in group if m in pages]
for other in present[1:]:
uf.union(present[0], other)
groups: dict[str, list[str]] = {}
for ident in pages:
groups.setdefault(uf.find(ident), []).append(ident)
entries: list[AllPagesEntry] = []
for members in groups.values():
member_pages = [pages[m] for m in members]
identities = tuple(p.identity for p in member_pages)
name = min(p.identity.key for p in member_pages)
diverges = len({p.body for p in member_pages}) > 1
entries.append(AllPagesEntry(name=name, members=identities, diverges=diverges))
return tuple(sorted(entries, key=lambda e: e.name))
def _segments(page: Page) -> list[str]:
path = page.placements[0].path if page.placements else page.identity.key
if path.endswith(".md"):
path = path[:-3]
return [seg for seg in path.split("/") if seg]
def site_map(union: UnionGraph) -> SiteMapNode:
"""The union namespace tree from page placements (directories nest; pages sit at their dir)."""
root: dict = {"children": {}, "pages": []}
for page in union.iter_pages():
segments = _segments(page)
node = root
for seg in segments[:-1]: # directory segments build the nesting
node = node["children"].setdefault(seg, {"children": {}, "pages": []})
node["pages"].append(page.identity)
return _freeze("", root)
def _freeze(name: str, node: dict) -> SiteMapNode:
children = tuple(_freeze(k, v) for k, v in sorted(node["children"].items()))
pages = tuple(sorted(node["pages"], key=str))
return SiteMapNode(name=name, children=children, pages=pages)
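
The collapse performed by `all_pages` (chorus by shared key, then curator groups, then a divergence check) can be sketched on plain data. This is an illustration under assumptions: identities are `"shard:key"` strings, `pages` maps identity to body, and `collapse` is a hypothetical name.

```python
from collections import defaultdict


def collapse(pages: dict[str, str], groups: list[set[str]]) -> list[tuple[str, tuple[str, ...], bool]]:
    """Collapse "shard:key" identities into (name, members, diverges) entries."""
    parent = {ident: ident for ident in pages}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic root choice

    by_key = defaultdict(list)
    for ident in pages:
        by_key[ident.split(":", 1)[1]].append(ident)
    for idents in by_key.values():  # same key across shards: a chorus
        for other in idents[1:]:
            union(idents[0], other)
    for group in groups:  # curator bindings, restricted to known pages
        present = sorted(m for m in group if m in pages)
        for other in present[1:]:
            union(present[0], other)

    members = defaultdict(list)
    for ident in pages:
        members[find(ident)].append(ident)
    entries = []
    for idents in members.values():
        name = min(i.split(":", 1)[1] for i in idents)
        diverges = len({pages[i] for i in idents}) > 1  # differing bodies are flagged
        entries.append((name, tuple(sorted(idents)), diverges))
    return sorted(entries)


pages = {"git:Home": "hello", "notes:Home": "hello!", "notes:Start": "hello", "git:Solo": "x"}
print(collapse(pages, [{"git:Home", "notes:Start"}]))
```

Here `git:Home` and `notes:Home` fold as a chorus, the curator group pulls in `notes:Start`, and the entry is flagged as diverging because the bodies are not all identical.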


@@ -0,0 +1,108 @@
"""RecentChanges — a merged change feed over the union (SHARD-WP-0010 T3; UC-17).
Two streams, one ordered feed (newest-first):
* the **coordination journal** — overlay/alias/fork/merge/binding decisions from the decision log,
each carrying its actor and the decision payload; and
* **shard change signals** — a page's current revision (folder mtime / ``source_rev``), i.e. the
backend's own "this changed" evidence.
Every entry carries provenance: which shard the edit came from, or that it was a coordination
decision (and by whom). Derived/recomputable — `notify`-driven streaming is a later binding.
"""
from __future__ import annotations
from collections.abc import Mapping
from dataclasses import dataclass, field
from datetime import datetime
from shard_wiki.coordination import DecisionLog, EventType
from shard_wiki.union import UnionGraph
__all__ = ["ChangeEntry", "recent_changes"]
_COORDINATION = "coordination"
# How each journal event names the thing it touched + a human kind label.
_EVENT_KIND = {
EventType.ALIAS_SET: "alias",
EventType.OVERLAY_CREATED: "overlay",
EventType.MERGE_DECIDED: "merge",
EventType.PAGE_FORKED: "fork",
EventType.BINDING_MADE: "binding",
}
@dataclass(frozen=True, slots=True)
class ChangeEntry:
"""One change in the feed. ``source`` is the shard id (a shard edit) or ``"coordination"``."""
when: datetime
kind: str
ref: str
source: str
actor: str | None = None
detail: Mapping[str, object] = field(default_factory=dict)
def _event_ref(event_type: EventType, payload: Mapping[str, object]) -> str:
    if event_type is EventType.ALIAS_SET:
        return str(payload.get("alias", ""))
    if event_type is EventType.OVERLAY_CREATED:
        return f"{payload.get('target_shard')}:{payload.get('target_key')}"
    if event_type is EventType.PAGE_FORKED:
        # Separate source and fork so the two identities stay distinguishable in the ref.
        return f"{payload.get('source')} → {payload.get('fork')}"
    if event_type is EventType.BINDING_MADE:
        return ", ".join(str(m) for m in payload.get("members", ()))
    return str(payload.get("overlay_id", ""))  # MERGE_DECIDED
def recent_changes(
union: UnionGraph,
log: DecisionLog,
space: str,
*,
limit: int | None = None,
) -> tuple[ChangeEntry, ...]:
"""Merge the coordination journal and shard change signals into one newest-first feed."""
entries: list[ChangeEntry] = []
for event in log.events(space):
entries.append(
ChangeEntry(
when=event.timestamp,
kind=_EVENT_KIND.get(event.type, event.type.value),
ref=_event_ref(event.type, event.payload),
source=_COORDINATION,
actor=event.actor,
detail=dict(event.payload),
)
)
for page in union.iter_pages():
rev = page.envelope.source_rev
when = _parse_rev(rev)
if when is None:
continue # shard offers no change signal for this page — skip gracefully
entries.append(
ChangeEntry(
when=when,
kind="edit",
ref=str(page.identity),
source=page.identity.shard,
detail={"source_rev": rev},
)
)
entries.sort(key=lambda e: e.when, reverse=True)
return tuple(entries if limit is None else entries[:limit])
def _parse_rev(rev: str | None) -> datetime | None:
if rev is None:
return None
try:
return datetime.fromisoformat(rev)
except ValueError:
return None # non-temporal revision token (e.g. a content hash) — no feed timestamp
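
The merge described above can be sketched on plain tuples: journal rows always carry a timestamp, while a page contributes an entry only when its revision token parses as ISO-8601. This is a simplified illustration; `parse_rev` mirrors `_parse_rev`, and `merged_feed` with its tuple shapes is hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def parse_rev(rev: str | None) -> datetime | None:
    """Return a timestamp for ISO-8601 revision tokens, else None."""
    if rev is None:
        return None
    try:
        return datetime.fromisoformat(rev)
    except ValueError:
        return None  # e.g. a commit sha: no usable timestamp


def merged_feed(
    journal: list[tuple[str, str]],
    revisions: dict[str, str | None],
    limit: int | None = None,
) -> list[tuple[datetime, str, str]]:
    """Merge (timestamp, ref) journal rows with per-page revisions, newest first."""
    feed = [(datetime.fromisoformat(ts), "coordination", ref) for ts, ref in journal]
    for ref, rev in revisions.items():
        when = parse_rev(rev)
        if when is not None:  # pages without a temporal revision are skipped
            feed.append((when, "edit", ref))
    feed.sort(key=lambda row: row[0], reverse=True)
    return feed if limit is None else feed[:limit]


journal = [("2026-06-15T10:00:00+00:00", "Home")]
revisions = {
    "notes:Daily": "2026-06-16T08:30:00+00:00",
    "git:Home": "4231daf94f",  # commit sha, not a timestamp
    "notes:Empty": None,
}
for when, kind, ref in merged_feed(journal, revisions):
    print(when.isoformat(), kind, ref)
```

Every timestamp in this sketch carries a UTC offset. Python raises `TypeError` when comparing offset-aware and naive datetimes, so a feed that mixes the two would need to normalize them before sorting.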

tests/test_git_adapter.py

@@ -0,0 +1,131 @@
"""Tests for the GitShardAdapter read path + profile (SHARD-WP-0012 T1)."""
import os
import subprocess
import pytest
from shard_wiki.adapters import GitShardAdapter, run_conformance
from shard_wiki.model import (
    AttachmentMode,
    History,
    NotSupported,
    ProfileError,
    Substrate,
    Verb,
)


def _git(repo, *args):
    # Pin author/committer identity so commits succeed without a global git config.
    subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        env={
            "GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@t",
            "GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@t",
            "PATH": os.environ.get("PATH", ""),
        },
    )
def _repo(tmp_path, files, name="repo"):
repo = tmp_path / name
repo.mkdir()
_git(repo, "init", "--quiet")
for rel, text in files.items():
p = repo / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(text, encoding="utf-8")
_git(repo, "add", rel)
_git(repo, "commit", "-m", "seed")
return repo
def test_keys_are_tracked_md_paths(tmp_path):
repo = _repo(tmp_path, {"Home.md": "h", "docs/Guide.md": "g", "ignore.txt": "x"})
adapter = GitShardAdapter("git", repo)
assert set(adapter.keys()) == {"Home", "docs/Guide"} # only tracked *.md
def test_read_returns_page_with_commit_sha_rev(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "welcome"})
    adapter = GitShardAdapter("git", repo)
    page = adapter.read("Home")
    assert page.identity.shard == "git"
    assert page.body == "welcome"
    head = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    # source_rev is the per-path last-commit sha; with a single commit it equals HEAD.
    assert page.envelope.source_rev == head
    assert page.envelope.lineage == "git-native"
def test_read_missing_key_raises(tmp_path):
adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"}))
with pytest.raises(KeyError):
adapter.read("Nope")
def test_profile_validates_implication_rules(tmp_path):
profile = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"})).profile()
assert profile.substrate is Substrate.GIT
assert profile.attachment_mode is AttachmentMode.GIT_IS_STORE
assert profile.history is History.GIT_NATIVE # git-is-store ⟹ git-native
profile.validate() # raises if the implication rule were violated
def test_profile_is_read_only_in_t1(tmp_path):
profile = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"})).profile()
assert profile.supports(Verb.READ)
assert not profile.supports(Verb.WRITE)
def test_conformance_read_path_passes(tmp_path):
adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h", "Other.md": "o"}))
report = run_conformance(adapter)
assert report.ok, report.diff()
def test_unclaimed_write_raises_not_supported(tmp_path):
adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"}))
with pytest.raises(NotSupported):
adapter.write("Home", "new") # read-only: honest absence
def test_empty_repo_has_no_keys(tmp_path):
repo = tmp_path / "empty"
repo.mkdir()
_git(repo, "init", "--quiet")
adapter = GitShardAdapter("git", repo)
assert list(adapter.keys()) == []
def test_bad_profile_combo_is_rejected():
# Sanity: the implication rule that backs the git profile actually bites when violated.
from shard_wiki.model import (
AccessGrant,
Addressing,
CapabilityProfile,
ContentOpacity,
MergeModel,
NativeQuery,
OperationalEnvelope,
Translation,
WriteGranularity,
)
from shard_wiki.provenance import Liveness
with pytest.raises(ProfileError):
CapabilityProfile(
substrate=Substrate.FILES, # not git, but claims git-is-store
attachment_mode=AttachmentMode.GIT_IS_STORE,
write_granularity=WriteGranularity.NONE,
content_opacity=ContentOpacity.TRANSPARENT,
operational_envelope=OperationalEnvelope.LOCAL_UNBOUNDED,
access_grant=AccessGrant.OPEN,
liveness=Liveness.STATIC,
history=History.NONE,
merge_model=MergeModel.NONE,
addressing=Addressing.PATH,
native_query=NativeQuery.NONE,
translation=Translation.NATIVE,
supported_verbs=frozenset({Verb.READ}),
).validate()


@@ -0,0 +1,116 @@
"""GitShardAdapter history adopt + cross-substrate integration (SHARD-WP-0012 T3)."""
import os
import subprocess
import pytest
from shard_wiki.adapters import FolderAdapter, GitShardAdapter
from shard_wiki.coordination import ApplyStatus
from shard_wiki.space import InformationSpace
_ENV = {
"GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@t",
"GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@t",
"PATH": os.environ.get("PATH", ""),
}
def _git(repo, *args):
return subprocess.run(
["git", "-C", str(repo), *args], check=True, capture_output=True, text=True, env=_ENV
).stdout.strip()
def _git_repo(tmp_path, files, name="git"):
repo = tmp_path / name
repo.mkdir()
_git(repo, "init", "--quiet")
for rel, text in files.items():
(repo / rel).parent.mkdir(parents=True, exist_ok=True)
(repo / rel).write_text(text, encoding="utf-8")
_git(repo, "add", rel)
_git(repo, "commit", "-m", "seed")
return repo
def _folder(tmp_path, name, files, writable=False):
root = tmp_path / name
for rel, text in files.items():
p = root / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(text, encoding="utf-8")
return FolderAdapter(name, root, writable=writable)
# -- history adopt -------------------------------------------------------------
def test_history_lists_commits_newest_first(tmp_path):
repo = _git_repo(tmp_path, {"Home.md": "v1"})
adapter = GitShardAdapter("git", repo, writable=True)
adapter.write("Home", "v2")
history = adapter.history("Home")
assert len(history) == 2
assert history[0].message == "write Home.md" # newest first
assert history[-1].message == "seed"
assert all(rev.sha for rev in history)
def test_history_unknown_key_raises(tmp_path):
adapter = GitShardAdapter("git", _git_repo(tmp_path, {"Home.md": "h"}))
with pytest.raises(KeyError):
adapter.history("Nope")
# -- cross-substrate integration ----------------------------------------------
def test_resolve_across_git_and_folder(tmp_path):
space = InformationSpace("space")
space.attach(GitShardAdapter("git", _git_repo(tmp_path, {"Home.md": "git home"})))
space.attach(_folder(tmp_path, "notes", {"Daily.md": "folder daily"}))
assert space.read("Home").body == "git home" # resolved from the git shard
assert space.read("Daily").body == "folder daily" # resolved from the folder shard
def test_chorus_spans_substrates_with_divergence(tmp_path):
space = InformationSpace("space")
space.attach(GitShardAdapter("git", _git_repo(tmp_path, {"Shared.md": "from git"})))
space.attach(_folder(tmp_path, "notes", {"Shared.md": "from folder"}))
res = space.resolve("Shared")
assert {p.body for p in res.pages} == {"from git", "from folder"} # chorus across substrates
git_page = next(p for p in res.pages if p.identity.shard == "git")
assert git_page.envelope.divergence # divergence recorded, not erased
def test_edit_through_git_shard_commits(tmp_path):
repo = _git_repo(tmp_path, {"Home.md": "original"})
space = InformationSpace("space")
space.attach(GitShardAdapter("git", repo, writable=True))
result = space.edit("Home", "edited via overlay")
assert result.status is ApplyStatus.APPLIED # write-through fast-forward on a git shard
assert space.read("Home").body == "edited via overlay"
assert int(_git(repo, "rev-list", "--count", "HEAD")) == 2 # the edit became a commit
def test_apply_under_drift_refuses_on_external_commit(tmp_path):
repo = _git_repo(tmp_path, {"Home.md": "original"})
space = InformationSpace("space")
space.attach(GitShardAdapter("git", repo, writable=True))
overlay = space.overlay("Home", "my draft") # base_rev = current git sha
# Another writer commits to the same path → the sha moves underneath the draft.
(repo / "Home.md").write_text("someone else", encoding="utf-8")
_git(repo, "add", "Home.md")
_git(repo, "commit", "-m", "external")
result = space.apply_overlay(overlay.overlay_id)
assert result.status is ApplyStatus.REFUSED_DRIFT # never clobber (sha drift detected)
# The shard itself is untouched — the external commit stands; the draft remains a draft.
assert space.union.shard("git").read("Home").body == "someone else"
def test_overlay_on_read_only_git_shard_kept_as_draft(tmp_path):
space = InformationSpace("space")
space.attach(GitShardAdapter("git", _git_repo(tmp_path, {"Home.md": "ro"}), writable=False))
result = space.edit("Home", "wanted change")
assert result.status is ApplyStatus.KEPT_DRAFT # read-only target → overlay retained
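
The three outcomes exercised by these tests follow from one comparison: the revision recorded when the draft was made against the target's current revision. A minimal sketch under stated assumptions: `MemoryShard`, `Overlay`, `Status`, and this `apply_overlay` are hypothetical stand-ins, the revision is a hash of the body rather than a per-path commit sha, and the order of the two checks is a guess rather than the engine's documented behavior.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    APPLIED = "applied"
    REFUSED_DRIFT = "refused-drift"
    KEPT_DRAFT = "kept-draft"


@dataclass
class Overlay:
    key: str
    body: str
    base_rev: str  # the target's revision when the draft was made


class MemoryShard:
    """Hypothetical in-memory shard whose revision is a hash of the body."""

    def __init__(self, pages: dict[str, str], writable: bool = True) -> None:
        self.pages = dict(pages)
        self.writable = writable

    def current_rev(self, key: str) -> str:
        return hashlib.blake2b(self.pages[key].encode("utf-8"), digest_size=16).hexdigest()

    def write(self, key: str, body: str) -> None:
        self.pages[key] = body


def apply_overlay(shard: MemoryShard, overlay: Overlay) -> Status:
    if not shard.writable:
        return Status.KEPT_DRAFT  # read-only target: the draft stays local
    if shard.current_rev(overlay.key) != overlay.base_rev:
        return Status.REFUSED_DRIFT  # the target moved underneath the draft: never clobber
    shard.write(overlay.key, overlay.body)
    return Status.APPLIED


shard = MemoryShard({"Home": "original"})
draft = Overlay("Home", "my draft", shard.current_rev("Home"))
shard.write("Home", "someone else")  # an external writer gets there first
print(apply_overlay(shard, draft), shard.pages["Home"])
```

A commit sha differs from a body hash in one respect worth noting: reverting the content still moves the sha, so the git-backed check is stricter than this sketch.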


@@ -0,0 +1,89 @@
"""Tests for GitShardAdapter write=commit + current_rev drift (SHARD-WP-0012 T2)."""
import os
import subprocess
from shard_wiki.adapters import GitShardAdapter, run_conformance
from shard_wiki.model import Verb
_ENV = {
"GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@t",
"GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@t",
"PATH": os.environ.get("PATH", ""),
}
def _git(repo, *args):
    return subprocess.run(
        ["git", "-C", str(repo), *args], check=True, capture_output=True, text=True, env=_ENV
    ).stdout.strip()
def _repo(tmp_path, files):
repo = tmp_path / "repo"
repo.mkdir()
_git(repo, "init", "--quiet")
for rel, text in files.items():
(repo / rel).write_text(text, encoding="utf-8")
_git(repo, "add", rel)
_git(repo, "commit", "-m", "seed")
return repo
def test_writable_profile_declares_write_and_version(tmp_path):
profile = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"}), writable=True).profile()
assert profile.supports(Verb.WRITE)
assert profile.supports(Verb.VERSION)
profile.validate() # PER_PAGE + WRITE is a consistent combination
def test_write_creates_a_commit(tmp_path):
repo = _repo(tmp_path, {"Home.md": "old"})
adapter = GitShardAdapter("git", repo, writable=True)
before = _git(repo, "rev-list", "--count", "HEAD")
page = adapter.write("Home", "new body")
after = _git(repo, "rev-list", "--count", "HEAD")
assert int(after) == int(before) + 1 # one new commit
assert page.body == "new body"
assert page.envelope.source_rev == _git(repo, "rev-parse", "HEAD") # page is at the new sha
def test_write_advances_current_rev(tmp_path):
repo = _repo(tmp_path, {"Home.md": "old"})
adapter = GitShardAdapter("git", repo, writable=True)
rev_before = adapter.current_rev("Home")
adapter.write("Home", "changed")
assert adapter.current_rev("Home") != rev_before # sha moved → drift detectable
def test_write_new_key_tracks_it(tmp_path):
repo = _repo(tmp_path, {"Home.md": "h"})
adapter = GitShardAdapter("git", repo, writable=True)
adapter.write("docs/New", "fresh page")
assert "docs/New" in set(adapter.keys())
assert adapter.read("docs/New").body == "fresh page"
def test_noop_write_creates_no_empty_commit(tmp_path):
repo = _repo(tmp_path, {"Home.md": "same"})
adapter = GitShardAdapter("git", repo, writable=True)
before = _git(repo, "rev-list", "--count", "HEAD")
adapter.write("Home", "same") # identical body → nothing to commit
assert _git(repo, "rev-list", "--count", "HEAD") == before
def test_current_rev_reflects_external_commit(tmp_path):
repo = _repo(tmp_path, {"Home.md": "h"})
adapter = GitShardAdapter("git", repo, writable=True)
rev = adapter.current_rev("Home")
# An out-of-band commit to the same path (another writer) moves the per-path sha.
(repo / "Home.md").write_text("externally edited", encoding="utf-8")
_git(repo, "add", "Home.md")
_git(repo, "commit", "-m", "external")
assert adapter.current_rev("Home") != rev
def test_conformance_positive_write_probe_passes(tmp_path):
adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "body"}), writable=True)
report = run_conformance(adapter)
assert report.ok, report.diff()


@@ -0,0 +1,89 @@
"""Tests for the indexed equivalence relation — blocking + verify (SHARD-WP-0011 T1)."""
from itertools import combinations
from shard_wiki.incremental import EquivalenceIndex, MinHasher, band_keys, jaccard, shingles
from shard_wiki.incremental.equivalence import _fingerprint
from shard_wiki.model import Identity, Page
from shard_wiki.provenance import ProvenanceEnvelope
def _page(shard, key, body):
return Page(
identity=Identity(shard, key),
body=body,
envelope=ProvenanceEnvelope(source_shard=shard),
)
def _brute_force_groups(pages, threshold):
"""Oracle: O(N²) verify of every pair, then connected components."""
parent = {p.identity: p.identity for p in pages}
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
for p, q in combinations(pages, 2):
same_fp = _fingerprint(p.body) == _fingerprint(q.body)
sim = jaccard(shingles(p.body), shingles(q.body))
if same_fp or sim >= threshold:
parent[find(p.identity)] = find(q.identity)
comps = {}
for p in pages:
comps.setdefault(find(p.identity), set()).add(p.identity)
return {frozenset(v) for v in comps.values() if len(v) > 1}
def test_minhash_lsh_buckets_near_duplicates_together():
hasher = MinHasher(num_perm=64)
base = "the quick brown fox jumps over the lazy dog near the river bank today"
near = base + " and then some"
far = "completely unrelated content about astrophysics and distant galaxies far"
b_base = set(band_keys(hasher.signature(shingles(base)), 32))
b_near = set(band_keys(hasher.signature(shingles(near)), 32))
b_far = set(band_keys(hasher.signature(shingles(far)), 32))
assert b_base & b_near # near-duplicates share at least one band
assert not (b_base & b_far) # unrelated pages do not
def test_exact_duplicate_across_shards_is_equivalent():
idx = EquivalenceIndex()
idx.add(_page("A", "Foo", "identical body text here"))
idx.add(_page("B", "Bar", "identical body text here"))
assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
{Identity("A", "Foo"), Identity("B", "Bar")}
)
def test_unrelated_pages_are_not_equivalent():
idx = EquivalenceIndex()
idx.add(_page("A", "Foo", "alpha beta gamma delta epsilon"))
idx.add(_page("B", "Bar", "nothing in common whatsoever entirely"))
assert idx.groups() == ()
def test_curator_binding_forces_equivalence_regardless_of_content():
idx = EquivalenceIndex()
idx.add(_page("A", "Foo", "one thing"))
idx.add(_page("B", "Bar", "totally different"))
idx.bind(Identity("A", "Foo"), Identity("B", "Bar"))
assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
{Identity("A", "Foo"), Identity("B", "Bar")}
)
def test_index_matches_brute_force_oracle():
threshold = 0.7
shared = "shared sentence one shared sentence two shared sentence three end"
pages = [
_page("A", "Doc1", shared),
_page("B", "Doc1copy", shared + " minor tail"), # near-dup of A
_page("C", "Other", "a totally distinct page with no overlapping shingles at all here"),
_page("D", "Lonely", "yet another isolated document about unrelated subject matter alone"),
]
idx = EquivalenceIndex(threshold=threshold)
idx.build(pages)
assert set(idx.groups()) == _brute_force_groups(pages, threshold)
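
The blocking step these tests rely on (shingle, MinHash, band, then verify with Jaccard) can be sketched from first principles. The functions below are illustrative stand-ins that share names with the imports above; the real `MinHasher`, `shingles`, `band_keys`, and `jaccard` are not shown in this diff, so the shingle size, hash choice, and return shapes here are all assumptions.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Word k-grams of the text (the whole text if it is shorter than k words)."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def _hash(item: str, seed: int) -> int:
    # One salted hash per seed stands in for one random permutation.
    salt = seed.to_bytes(8, "big")
    raw = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(raw, "big")


def signature(items: set[str], num_perm: int = 64) -> list[int]:
    """MinHash signature: for each salted hash, the minimum over all shingles."""
    return [min(_hash(item, seed) for item in items) for seed in range(num_perm)]


def band_keys(sig: list[int], num_bands: int) -> list[tuple[int, tuple[int, ...]]]:
    """Split the signature into bands; two pages are candidates if any band key matches."""
    rows = len(sig) // num_bands
    return [(band, tuple(sig[band * rows : (band + 1) * rows])) for band in range(num_bands)]


base = shingles("the quick brown fox jumps over the lazy dog near the river bank today")
near = shingles("the quick brown fox jumps over the lazy dog near the river bank today and then")
shared = set(band_keys(signature(base), 32)) & set(band_keys(signature(near), 32))
print(round(jaccard(base, near), 2), len(shared))
```

Banding only proposes candidate pairs. The exact Jaccard check against the threshold is what decides equivalence, which is why the test above compares the index against a brute-force oracle.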


@@ -0,0 +1,84 @@
"""Incremental maintenance == rebuild, with retraction + propagation (SHARD-WP-0011 T2)."""
from shard_wiki.incremental import EquivalenceIndex
from shard_wiki.model import Identity, Page
from shard_wiki.provenance import ProvenanceEnvelope
def _page(shard, key, body):
return Page(
identity=Identity(shard, key),
body=body,
envelope=ProvenanceEnvelope(source_shard=shard),
)
def _rebuilt(pages, curator=()):
idx = EquivalenceIndex()
idx.build(pages, curator)
return idx
def _equal(a, b):
return a.edges() == b.edges() and set(a.groups()) == set(b.groups())
def test_add_keeps_index_equal_to_rebuild():
pages = [_page("A", "Foo", "same content here"), _page("B", "Bar", "same content here")]
idx = EquivalenceIndex()
for p in pages:
idx.add(p)
assert _equal(idx, _rebuilt(pages))
assert idx.groups() # the two collapse
def test_remove_keeps_index_equal_to_rebuild():
pages = [
_page("A", "Foo", "same content here"),
_page("B", "Bar", "same content here"),
_page("C", "Baz", "unrelated isolated material entirely"),
]
idx = _rebuilt(pages)
idx.remove(Identity("B", "Bar"))
assert _equal(idx, _rebuilt([pages[0], pages[2]]))
def test_edit_into_new_bucket_retracts_stale_edge():
a = _page("A", "Foo", "shared identical body text")
b = _page("B", "Bar", "shared identical body text")
idx = _rebuilt([a, b])
assert idx.groups() # A ≡ B initially
# Edit B to something completely different: it exits A's buckets, the edge is retracted.
b2 = _page("B", "Bar", "now totally divergent unrelated prose about nothing")
idx.update(b2)
assert idx.groups() == () # stale edge gone
assert _equal(idx, _rebuilt([a, b2]))
def test_edit_into_equivalence_adds_edge():
a = _page("A", "Foo", "target body to converge on later")
b = _page("B", "Bar", "initially completely separate writing here")
idx = _rebuilt([a, b])
assert idx.groups() == ()
b2 = _page("B", "Bar", "target body to converge on later") # now identical to A
idx.update(b2)
assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
{Identity("A", "Foo"), Identity("B", "Bar")}
)
assert _equal(idx, _rebuilt([a, b2]))
def test_removing_connector_splits_a_chorus():
# Curator chain A—B—C (no direct A—C): one group of three.
a, b, c = (_page("A", "X", "aaa"), _page("B", "Y", "bbb"), _page("C", "Z", "ccc"))
idx = EquivalenceIndex()
for p in (a, b, c):
idx.add(p)
idx.bind(a.identity, b.identity)
idx.bind(b.identity, c.identity)
assert idx.equivalent_to(a.identity) == {a.identity, b.identity, c.identity}
# Removing the connector B retracts/propagates: the chorus splits.
idx.remove(b.identity)
assert idx.groups() == ()
chain = [(a.identity, b.identity), (b.identity, c.identity)]
assert _equal(idx, _rebuilt([a, c], curator=chain))


@@ -0,0 +1,89 @@
"""Tests for I-2 verification — digest + consistency-checker (SHARD-WP-0011 T3)."""
from shard_wiki.incremental import (
ConsistencyChecker,
EquivalenceIndex,
derived_digest,
)
from shard_wiki.model import Identity, Page
from shard_wiki.provenance import ProvenanceEnvelope
def _page(shard, key, body):
return Page(
identity=Identity(shard, key),
body=body,
envelope=ProvenanceEnvelope(source_shard=shard),
)
def test_digest_is_stable_under_equivalent_event_orders():
pages = [
_page("A", "Foo", "shared body text here"),
_page("B", "Bar", "shared body text here"),
_page("C", "Baz", "an entirely separate unrelated document"),
]
forward = EquivalenceIndex()
for p in pages:
forward.add(p)
reverse = EquivalenceIndex()
for p in reversed(pages):
reverse.add(p)
assert derived_digest(forward) == derived_digest(reverse)
def test_clean_index_reports_healthy():
pages = [_page("A", "Foo", "same body"), _page("B", "Bar", "same body")]
idx = EquivalenceIndex()
idx.build(pages)
checker = ConsistencyChecker(idx, pages_fn := (lambda: pages))
report = checker.check_and_repair()
assert report.drifted is False and report.healthy is True
assert pages_fn() # source unchanged
def test_missed_delta_drift_is_detected_and_repaired():
a = _page("A", "Foo", "converging target body")
b = _page("B", "Bar", "initially unrelated separate text")
source = {"pages": [a, b]}
idx = EquivalenceIndex()
idx.build(source["pages"])
assert idx.groups() == () # not equivalent yet
# Source changes B to match A, but the index is never told (a missed delta → drift).
b2 = _page("B", "Bar", "converging target body")
source["pages"] = [a, b2]
checker = ConsistencyChecker(idx, lambda: source["pages"])
report = checker.check_and_repair()
assert report.drifted is True and report.repaired is True and report.healthy is True
# Self-healed: the index now reflects the equivalence.
assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
{Identity("A", "Foo"), Identity("B", "Bar")}
)
def test_corrupted_internal_state_is_healed():
a = _page("A", "Foo", "identical content")
b = _page("B", "Bar", "identical content")
idx = EquivalenceIndex()
idx.build([a, b])
# Corrupt the derived tier directly: delete a true edge (simulated index corruption).
idx._content_edges.clear()
assert idx.groups() == () # corrupted away
checker = ConsistencyChecker(idx, lambda: [a, b])
report = checker.check_and_repair()
assert report.drifted is True and report.healthy is True
assert idx.groups() # edge restored by scoped recompute
def test_removed_source_page_is_reconciled():
a = _page("A", "Foo", "same body")
b = _page("B", "Bar", "same body")
idx = EquivalenceIndex()
idx.build([a, b])
checker = ConsistencyChecker(idx, lambda: [a]) # B vanished from source
report = checker.check_and_repair()
assert report.healthy is True
assert Identity("B", "Bar") not in idx.identities()


@@ -0,0 +1,74 @@
"""Wire the incremental tier behind InformationSpace views (SHARD-WP-0011 T4)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.coordination import EventType
from shard_wiki.model import Identity
from shard_wiki.space import InformationSpace
from shard_wiki.views import all_pages
def _shard(tmp_path, name, files):
root = tmp_path / name
for rel, text in files.items():
p = root / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(text, encoding="utf-8")
return FolderAdapter(name, root)
def test_all_pages_via_index_matches_direct_fold(tmp_path):
space = InformationSpace("space")
space.attach(_shard(tmp_path, "wiki", {"Home.md": "welcome", "Guide.md": "the guide"}))
space.attach(_shard(tmp_path, "notes", {"Daily.md": "today"}))
# Routed-through-index result equals the direct fold-based computation (behaviour unchanged).
via_index = {(e.name, e.members) for e in space.all_pages()}
direct = {(e.name, e.members) for e in all_pages(space.union)}
assert via_index == direct
def test_curator_binding_collapses_via_maintained_index(tmp_path):
space = InformationSpace("space")
space.attach(_shard(tmp_path, "a", {"Foo.md": "x"}))
space.attach(_shard(tmp_path, "b", {"Bar.md": "y"}))
space.log.append(
"space", EventType.BINDING_MADE, {"members": ["a:Foo", "b:Bar"]}
)
# The maintained index re-syncs curator edges live from the log fold.
collapsed = [e for e in space.all_pages() if len(e.members) == 2]
assert len(collapsed) == 1
assert set(collapsed[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}
def test_content_duplicate_collapses_via_index(tmp_path):
space = InformationSpace("space")
space.attach(_shard(tmp_path, "a", {"Foo.md": "the very same body content here"}))
space.attach(_shard(tmp_path, "b", {"Bar.md": "the very same body content here"}))
dup = [e for e in space.all_pages() if len(e.members) == 2]
assert len(dup) == 1 # content equivalence detected by the maintained index
assert set(dup[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}
def test_attach_invalidates_index(tmp_path):
space = InformationSpace("space")
space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
assert space.all_pages() # builds the index (one page, no groups)
space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"})) # marks index stale
dup = [e for e in space.all_pages() if len(e.members) == 2]
assert len(dup) == 1 # rebuilt fallback picks up the new equivalent page
def test_verify_index_reports_healthy_when_consistent(tmp_path):
space = InformationSpace("space")
space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))
space.all_pages() # ensure built
report = space.verify_index()
assert report.healthy is True
def test_reindex_is_an_explicit_fallback(tmp_path):
space = InformationSpace("space")
space.attach(_shard(tmp_path, "a", {"Foo.md": "content"}))
before = space.index.digest()
space.reindex()
assert space.index.digest() == before # rebuild is deterministic


@@ -0,0 +1,76 @@
"""Tests for the AllPages + SiteMap enumeration views (SHARD-WP-0010 T4)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.coordination import DecisionLog, EventType
from shard_wiki.model import Identity
from shard_wiki.union import UnionGraph
from shard_wiki.views import all_pages, site_map
def _shard(tmp_path, name, files):
root = tmp_path / name
for rel, text in files.items():
p = root / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(text, encoding="utf-8")
return FolderAdapter(name, root)
def test_all_pages_spans_shards(tmp_path):
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"A.md": "a"}))
u.attach(_shard(tmp_path, "shardB", {"B.md": "b"}))
names = {e.name for e in all_pages(u)}
assert names == {"A", "B"}
def test_chorus_collapses_to_one_entry_with_divergence(tmp_path):
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"Home.md": "A home"}))
u.attach(_shard(tmp_path, "shardB", {"Home.md": "B home"}))
entries = all_pages(u)
home = [e for e in entries if e.name == "Home"]
assert len(home) == 1 # chorus → single entry
assert set(home[0].members) == {Identity("shardA", "Home"), Identity("shardB", "Home")}
assert home[0].diverges is True # bodies differ — collapse acknowledged, not silent
def test_chorus_same_body_does_not_diverge(tmp_path):
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"Home.md": "same"}))
u.attach(_shard(tmp_path, "shardB", {"Home.md": "same"}))
(home,) = [e for e in all_pages(u) if e.name == "Home"]
assert home.diverges is False
def test_equivalence_binding_collapses_distinct_keys(tmp_path):
log = DecisionLog()
log.append(
"space", EventType.BINDING_MADE, {"members": ["shardA:Foo", "shardB:Bar"]}
)
u = UnionGraph("space", log=log)
u.attach(_shard(tmp_path, "shardA", {"Foo.md": "x"}))
u.attach(_shard(tmp_path, "shardB", {"Bar.md": "x"}))
pair = {Identity("shardA", "Foo"), Identity("shardB", "Bar")}
# The two bound identities fold into one entry (named by the min key, "Bar").
bound = [e for e in all_pages(u) if {*e.members} == pair]
assert len(bound) == 1
assert bound[0].name == "Bar"
def test_sitemap_reflects_namespace_paths(tmp_path):
u = UnionGraph("space")
u.attach(
_shard(
tmp_path,
"shardA",
{"Home.md": "h", "docs/Guide.md": "g", "docs/api/Ref.md": "r"},
)
)
root = site_map(u)
# Top level: "Home" page directly, and a "docs" namespace.
assert any(p.key == "Home" for p in root.pages)
docs = next(c for c in root.children if c.name == "docs")
assert any(p.key == "docs/Guide" for p in docs.pages)
api = next(c for c in docs.children if c.name == "api")
assert any(p.key == "docs/api/Ref" for p in api.pages)


@@ -0,0 +1,52 @@
"""Integration: derived views exposed on InformationSpace over two shards (SHARD-WP-0010 T5)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.model import Identity
from shard_wiki.space import InformationSpace
def _shard(tmp_path, name, files):
root = tmp_path / name
for rel, text in files.items():
p = root / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(text, encoding="utf-8")
return FolderAdapter(name, root)
def _space(tmp_path):
space = InformationSpace("space")
space.attach(
_shard(tmp_path, "wiki", {"Home.md": "welcome, see [[Guide]]", "Guide.md": "the guide"})
)
space.attach(_shard(tmp_path, "notes", {"Daily.md": "today I read [[Guide]]"}))
return space
def test_backlinks_across_two_shards(tmp_path):
space = _space(tmp_path)
sources = {bl.source for bl in space.backlinks("Guide")}
assert sources == {Identity("wiki", "Home"), Identity("notes", "Daily")}
def test_all_pages_and_site_map_over_union(tmp_path):
space = _space(tmp_path)
names = {e.name for e in space.all_pages()}
assert names == {"Home", "Guide", "Daily"}
leaves = {p.key for p in space.site_map().pages}
assert {"Home", "Guide", "Daily"} <= leaves
def test_recent_changes_includes_alias_and_edits(tmp_path):
space = _space(tmp_path)
space.alias("Start", "wiki:Home", actor="ana")
feed = space.recent_changes()
kinds = {e.kind for e in feed}
assert "alias" in kinds and "edit" in kinds
alias = next(e for e in feed if e.kind == "alias")
assert alias.source == "coordination" and alias.actor == "ana"
def test_red_link_creates_no_backlink_via_space(tmp_path):
space = _space(tmp_path)
assert space.backlinks("Nonexistent") == ()


@@ -0,0 +1,67 @@
"""Tests for the RecentChanges merged feed (SHARD-WP-0010 T3)."""
import os
from datetime import datetime, timezone
from shard_wiki.adapters import FolderAdapter
from shard_wiki.coordination import DecisionLog, EventType
from shard_wiki.union import UnionGraph
from shard_wiki.views import recent_changes
def _shard(tmp_path, name, files, mtime=None):
root = tmp_path / name
for rel, text in files.items():
p = root / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text(text, encoding="utf-8")
if mtime is not None:
os.utime(p, (mtime, mtime))
return FolderAdapter(name, root)
def test_edit_and_alias_both_appear_newest_first(tmp_path):
# Page edit signal pinned to an old mtime; the alias decision happens "now" → alias is newest.
old = datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp()
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"Home.md": "home"}, mtime=old))
log = DecisionLog()
log.append("space", EventType.ALIAS_SET, {"alias": "Start", "target": "shardA:Home"})
feed = recent_changes(u, log, "space")
kinds = [e.kind for e in feed]
assert "edit" in kinds and "alias" in kinds
assert feed[0].kind == "alias" # newest first
assert feed[-1].kind == "edit"
# Monotonic non-increasing by time.
assert all(feed[i].when >= feed[i + 1].when for i in range(len(feed) - 1))
def test_per_shard_attribution_present(tmp_path):
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"A.md": "a"}))
u.attach(_shard(tmp_path, "shardB", {"B.md": "b"}))
feed = recent_changes(u, DecisionLog(), "space")
edits = {e.ref: e.source for e in feed if e.kind == "edit"}
assert edits["shardA:A"] == "shardA"
assert edits["shardB:B"] == "shardB" # each edit attributed to its shard
def test_coordination_entries_carry_actor_and_ref(tmp_path):
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"Doc.md": "x"}))
log = DecisionLog()
log.append(
"space", EventType.PAGE_FORKED, {"source": "shardA:Doc", "fork": "shardB:Doc"}, actor="ana"
)
fork = next(e for e in recent_changes(u, log, "space") if e.kind == "fork")
assert fork.source == "coordination"
assert fork.actor == "ana"
assert fork.ref == "shardA:Doc→shardB:Doc"
def test_limit_truncates_to_newest(tmp_path):
u = UnionGraph("space")
u.attach(_shard(tmp_path, "shardA", {"A.md": "a", "B.md": "b", "C.md": "c"}))
feed = recent_changes(u, DecisionLog(), "space", limit=2)
assert len(feed) == 2


@@ -4,7 +4,7 @@ type: workplan
title: "derived views — wikilinks, BackLinks, RecentChanges, AllPages/SiteMap"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -36,7 +36,7 @@ later by SHARD-WP-0011) and carry provenance. Presentation stays out of core (L6
```task
id: SHARD-WP-0010-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "792660c3-9be9-4771-9f51-69d01f0c7f13"
```
@@ -51,7 +51,7 @@ red-link, CamelCase opt-in.
```task
id: SHARD-WP-0010-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "431a54c3-82b5-4b08-b3f0-762624d4c91d"
```
@@ -65,7 +65,7 @@ chorus pages aggregate.
```task
id: SHARD-WP-0010-T3
-status: todo
+status: done
priority: medium
state_hub_task_id: "270c1c31-0445-42b9-9a49-92d32c298eb2"
```
@@ -79,7 +79,7 @@ alias both appear, newest-first; per-shard attribution present.
```task
id: SHARD-WP-0010-T4
-status: todo
+status: done
priority: low
state_hub_task_id: "898ba43e-cdef-4ce8-9fa3-4ce60ebb4fdd"
```
@@ -92,7 +92,7 @@ collapses to one entry with divergence noted; sitemap reflects paths.
```task
id: SHARD-WP-0010-T5
-status: todo
+status: done
priority: medium
state_hub_task_id: "7157544b-5d3b-45a2-ba5a-c32244c59323"
```


@@ -4,7 +4,7 @@ type: workplan
title: "incremental union maintenance + equivalence index + I-2 verification"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -41,7 +41,7 @@ deployment is later.
```task
id: SHARD-WP-0011-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"
```
@@ -55,7 +55,7 @@ unrelated pages don't; verified edges match a brute-force oracle on a small corp
```task
id: SHARD-WP-0011-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"
```
@@ -70,7 +70,7 @@ stale edge.
```task
id: SHARD-WP-0011-T3
-status: todo
+status: done
priority: high
state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"
```
@@ -85,7 +85,7 @@ equivalent event orders.
```task
id: SHARD-WP-0011-T4
-status: todo
+status: done
priority: medium
state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"
```


@@ -4,7 +4,7 @@ type: workplan
title: "second adapter — git-IS-store shard (contract validation on a new substrate)"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -40,7 +40,7 @@ merge beyond fast-forward (apply-under-drift refuse is enough, as in SHARD-WP-00
```task
id: SHARD-WP-0012-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "8a1c7c80-a0cc-4e02-a611-1f1fd7dec57b"
```
@@ -54,7 +54,7 @@ implication rules. Tests: read tracked files; profile validates; conformance rea
```task
id: SHARD-WP-0012-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "b47dfb86-46c1-4e97-a62f-377719499ff2"
```
@@ -68,7 +68,7 @@ changes after an external commit.
```task
id: SHARD-WP-0012-T3
-status: todo
+status: done
priority: medium
state_hub_task_id: "4c895f42-671d-4948-8bdf-941fd85644bb"
```