Compare commits

17 Commits

Author SHA1 Message Date
cca5bf83c3 Add credential routing instructions for all agent runtimes
Propagate shared credential-routing section (Codex, Claude, Grok, llm-connect)
from state-hub template via scripts/propagate_credential_routing.py.
2026-06-18 22:48:39 +02:00
def699c1eb feat(adapters): GitShardAdapter history adopt + cross-substrate integration (WP-0012 T3)
Adopt git-native history (TSD §A.5): a VERSION-gated history(key) surfaces the
commit list for a path (newest-first sha + subject) — declared by every git-IS-store
shard, read-only or not. Integration proves the union/overlay/edit machinery works
unchanged across folder + git substrates: resolve/chorus span both, edit through a
git shard fast-forwards as a commit, apply-under-drift refuses on an external commit
(sha drift) without clobbering, and a read-only git target keeps the overlay as a
draft. SCOPE updated; WP-0012 done. 196 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:41:19 +02:00
a4e0f52ec1 feat(adapters): GitShardAdapter write=commit + current_rev drift (WP-0012 T2)
Writable mode: write(key, body) stages and commits the file (skipping a no-op so
no empty commit is created), returning the page at the new commit sha. The
writable profile declares WRITE + VERSION with PER_PAGE granularity. current_rev
is the per-path commit sha, so a write — or an external commit to the same path —
moves it, driving apply-under-drift. Passes the conformance positive-write probe.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:38:41 +02:00
4231daf94f feat(adapters): GitShardAdapter read path + git-IS-store profile (WP-0012 T1)
A second substrate validating the contract beyond plain folders: a git-IS-store
shard reading Markdown from a git repo. Keys are tracked *.md paths; read returns
a Page whose source_rev is the per-path last-commit sha (so an edit to one page
never drifts another); profile is git-IS-store / substrate=git / history=git-native
/ addressing=path, validated against the §6.5 implication rules. Passes the
conformance read path with honest absence of unclaimed verbs. Zero new deps
(git CLI via subprocess). No core changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:36:28 +02:00
37681d89b6 feat(incremental): wire maintained tier behind views; rebuild fallback (WP-0011 T4)
Route InformationSpace.all_pages through a maintained UnionIndex: equivalence is
served from the incrementally maintained index (curator bindings re-synced live
from the log fold + detected content edges), exposed in decision-log string form
so results are a behaviour-preserving superset. The index is built lazily and
rebuilt (bounded fallback) when the union mutates (attach/edit invalidate it);
reindex() forces a rebuild and verify_index() runs the I-2 self-healing checker.
all_pages() gains an optional equivalence_groups source (default = fold) so
direct callers are unaffected. SCOPE updated; WP-0011 done. 173 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:21:39 +02:00
a8e65235a8 feat(incremental): I-2 digest + consistency-checker (WP-0011 T3)
A Merkle-style digest summarizes the derived tier (per-identity fingerprint +
incident edges as order-independent leaves) so equal states have equal digests
and the digest is stable under equivalent event orders. A ConsistencyChecker
recomputes the authoritative fold from the current source, compares it over a
sampled region, and on mismatch scoped-recomputes just the affected identities —
self-healing missed-delta drift, corrupted internal state, and vanished pages.
Makes derived = f(canonical) verified, not asserted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:16:50 +02:00
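
The order-independence property described in the commit above can be illustrated with a small sketch. This is not the project's digest: it hashes a flat, sorted list of leaves rather than a real Merkle tree, and the names `leaf` and `digest` and the state layout (identity mapped to fingerprint plus incident edges) are assumptions made for illustration.

```python
import hashlib


def leaf(identity: str, fingerprint: str, edges: set[str]) -> bytes:
    """Hash one identity; incident edges are sorted so insertion order is irrelevant."""
    material = "\x00".join([identity, fingerprint, *sorted(edges)])
    return hashlib.sha256(material.encode("utf-8")).digest()


def digest(state: dict[str, tuple[str, set[str]]]) -> str:
    """Hash all leaves in sorted order, so equal states give equal digests."""
    leaves = sorted(leaf(i, fp, edges) for i, (fp, edges) in state.items())
    return hashlib.sha256(b"".join(leaves)).hexdigest()


# Two states built in different orders, with edges inserted in different orders.
first = {"s1/Home": ("fp1", {"e2", "e1"}), "s2/Home": ("fp2", set())}
second = {"s2/Home": ("fp2", set()), "s1/Home": ("fp1", {"e1", "e2"})}
same = digest(first) == digest(second)
```

A checker in this style can compare digests first and recompute only when they differ.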
d7d046cac0 test(incremental): delta maintenance == rebuild, retraction + split (WP-0011 T2)
Verify change-driven maintenance keeps the equivalence index equal to a
from-scratch rebuild under add / edit / remove: an edit into a new bucket
retracts the stale edge, an edit into equivalence adds one, and removing a
connector node propagates a retraction that splits a chorus. Equality checked
against a fresh build() oracle on every operation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:14:32 +02:00
0b3ab2086f feat(incremental): indexed equivalence — blocking + verify (WP-0011 T1)
Detect equivalence (distinct identities holding the same page) without pairwise
O(N²): MinHash/LSH bands over content shingles + normalized-title buckets
generate candidates (blocking), then exact-fingerprint or Jaccard>=threshold
confirm them (verify), with curator decision-log bindings always forming edges.
Groups are the connected components of the edge set. Includes the incremental
add/update/remove internals used by T2. Matches a brute-force oracle. New
incremental/ package (minhash primitives + EquivalenceIndex).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:13:06 +02:00
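
The blocking and verify steps named in the commit above can be sketched as follows. This is a minimal illustration, not the `incremental/` package: the function names, the word-level shingling, the 64 hash seeds, and the 4-row bands are all assumptions.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Word-level k-shingles of a page body (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}


def minhash(items: set[str], num_perm: int = 64) -> tuple[int, ...]:
    """One minimum per salted hash function; similar sets share many minima."""
    if not items:
        raise ValueError("minhash needs at least one shingle")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                    "big",
                )
                for s in items
            )
        )
    return tuple(signature)


def bands(signature: tuple[int, ...], rows: int = 4) -> set[tuple[int, tuple[int, ...]]]:
    """Split a signature into LSH bands; each band is a bucket key."""
    return {(i, signature[i : i + rows]) for i in range(0, len(signature), rows)}


def is_candidate(sig_a: tuple[int, ...], sig_b: tuple[int, ...]) -> bool:
    """Blocking: two pages become a candidate pair when any band collides."""
    return bool(bands(sig_a) & bands(sig_b))


def jaccard(a: set[str], b: set[str]) -> float:
    """Verify: exact Jaccard similarity of the two shingle sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0
```

Only candidate pairs reach the exact `jaccard` check, which is how the pairwise O(N²) comparison is avoided.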
d85d019543 feat(views): wire derived views onto InformationSpace + integration (WP-0010 T5)
Expose backlinks(name), recent_changes(), all_pages(), site_map() on
InformationSpace. Integration test exercises all four over two shards (BackLinks
aggregate across shards, AllPages/SiteMap span the union, RecentChanges merges an
alias decision with shard edits). SCOPE updated; WP-0010 done. 152 tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:05:12 +02:00
3a5acdcb28 feat(views): AllPages + SiteMap enumeration views (WP-0010 T4)
AllPages enumerates the union's distinct pages, collapsing chorus (same key
across shards) and equivalence-bound identities into one entry via union-find,
noting divergence when members' bodies differ (collapse acknowledged, not
silent). SiteMap builds the namespace tree from page placements, spanning shards.
Both derived/recomputable and presentation-free.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 02:03:15 +02:00
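
The union-find collapse described in the commit above might look like the following sketch. The function name `all_pages_sketch`, the `(shard, key)` identity tuples, and the return shape are assumptions chosen for illustration, not the view's actual API.

```python
Identity = tuple[str, str]  # (shard_id, key): an assumed shape


def all_pages_sketch(
    pages: dict[Identity, str],
    bindings: list[tuple[Identity, Identity]],
) -> list[tuple[list[Identity], bool]]:
    """Collapse chorus members and bound identities; flag groups whose bodies differ."""
    parent = {identity: identity for identity in pages}

    def find(x: Identity) -> Identity:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: Identity, b: Identity) -> None:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Chorus: the same key on different shards is one page.
    first_with_key: dict[str, Identity] = {}
    for identity in pages:
        union(first_with_key.setdefault(identity[1], identity), identity)

    # Curator equivalence bindings join identities with different keys.
    for a, b in bindings:
        union(a, b)

    groups: dict[Identity, list[Identity]] = {}
    for identity in pages:
        groups.setdefault(find(identity), []).append(identity)

    entries = []
    for members in groups.values():
        divergent = len({pages[m] for m in members}) > 1
        entries.append((sorted(members), divergent))
    return sorted(entries)
```

Returning the divergence flag with each entry keeps the collapse acknowledged rather than silent.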
34b0c539f3 feat(views): RecentChanges merged change feed (WP-0010 T3)
One newest-first feed merging the coordination journal (overlay/alias/fork/merge/
binding decisions, with actor + payload) and shard change signals (page
source_rev / mtime). Each entry carries provenance: the originating shard for an
edit, or 'coordination' (and the actor) for a decision. Non-temporal revision
tokens are skipped gracefully. Derived/recomputable; notify-streaming later.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:59:11 +02:00
da540d4eea feat(views): BackLinks derived view over the union link graph (WP-0010 T2)
For any page name, the set of pages that link to it: extract wikilinks from every
union page (new UnionGraph.iter_pages enumeration) and index the resolved ones by
target name. Red-links create no backlinks; entries carry source provenance; a
chorus target aggregates the backlinks of all members under one name. Derived/
recomputable, stores nothing canonical.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:56:48 +02:00
951b24300d feat(views): wikilink + red-link model (WP-0010 T1)
A CommonMark wikilink extension: extract [[Target]] / [[Target|label]] from a
page body (skipping fenced + inline code, preserving offsets), and resolve each
target through the union — resolved is a link, unresolved is a createable
red-link (never a dropped reference). CamelCase auto-linking is off by default,
opt-in per space, and never double-counts a target already inside [[...]]. Link
model + resolution are core; rendering stays L6. New views/ package.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:55:06 +02:00
c731c96634 feat(coordination): git backend wiring + verbatim log migration (WP-0009 T4)
InformationSpace.git_backed(space_id, repo_path) wires the git coordination log;
the default constructor stays in-memory for tests (new keyword-only store=). A
one-time importer (migrate_space / import_log / JSONL export+import) replays an
existing in-memory or JSON log into git verbatim — preserving seq, timestamp and
actor (union-without-erasure) and refusing out-of-order import. Same fold after
migration; no behavioural change to overlay/union. SCOPE updated; WP-0009 done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:49:55 +02:00
f0fee65cc0 test(coordination): cross-process read-your-writes + fold parity (WP-0009 T3)
Verify the git backend's fold reads the durable log into CoordinationState with
unchanged semantics, and that read-your-writes holds across separate handles and
separate OS processes against the same space ref (one test spawns a real
subprocess that appends, then reads it back). Cross-process fold equals the
in-memory fold for the same event sequence (derived = f(log)).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:47:08 +02:00
34432c2e15 feat(coordination): per-space append authority (lease) (WP-0009 T2)
A single append authority per space serializes appends into a total order: at
most one node holds a space's lease; only the holder writes, non-holders forward
their append intent to the holder. Leases are time-bounded and re-grantable, so
a dead holder's lease expires and a new node resumes from the log head (seq stays
contiguous). A stale ex-holder discovers it is no longer the holder and forwards
rather than writing, so a partitioned node cannot fork the log. Works over both
in-memory and git stores. Single-coordinator only (distributed leasing out of scope).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:45:52 +02:00
45a858ead0 feat(coordination): git-backed DecisionLog event store (WP-0009 T1)
Factor DecisionLog storage behind an EventStore abstraction: InMemoryEventStore
stays the default/test double, GitEventStore makes the coordination log
git-addressable. Each space is a ref (refs/spaces/<sha1>); append writes an
immutable one-blob commit and advances the ref under compare-and-swap, so the
commit chain is the per-space total order and a racing appender can never fork
the log. Deterministic stable-JSON event serialization. Zero runtime deps
(git CLI via subprocess). API and fold unchanged across backends.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:41:27 +02:00
43 changed files with 3229 additions and 33 deletions

@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=shard-wiki` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes**: `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`

@@ -59,4 +59,56 @@ Finished or canceled workplans move to `history/` with a `yymmdd-` archive prefi
- Reviewed design ready to guide code → `spec/`
- Implementation tasks → `workplans/`
- User/dev/agent how-to → `docs/`
- Collaborative unstructured notes → `wiki/`
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=shard-wiki` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes**: `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`

@@ -51,3 +51,4 @@ ruff format # format
```
Note: the system `pytest` is 7.4.x; `minversion` in `pyproject.toml` is pinned to `7.0` to match. Bump it if a newer pytest is installed into the dev environment.
@.claude/rules/credential-routing.md

@@ -17,7 +17,7 @@ Learnings update both SCOPE and INTENT where necessary.
| Layer | State |
|-------|-------|
| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. 107 tests green, ~97% coverage |
| Code | Foundation slice implemented (SHARD-WP-0007): `provenance` + `policy` leaves, `model` (Identity/Placement/Span/Page/CapabilityProfile), `adapters` (contract + FolderAdapter + conformance suite), `coordination` (event-sourced DecisionLog), `union` (resolution + chorus, overlay-aware), `InformationSpace` orchestrator. Write path added (SHARD-WP-0008): writable adapter, overlay engine (draft→patch→apply-under-drift), edit() unifies write-through + overlay-before-mutation. Native engine implemented (SHARD-WP-0014): `engine` (kernel + typed-extension runtime + per-shard activation [ADR-0001] + capability-profile-from-extensions + EngineShardAdapter + the `ext.struct` built-in) — an engine shard attaches to an InformationSpace as a canonical-mode shard. Git-backed coordination log (SHARD-WP-0009): `DecisionLog` storage factored behind an `EventStore`; `GitEventStore` makes the log git-addressable (each space a ref, append = immutable CAS-guarded commit), a per-space `AppendAuthority` (lease) gives a single-writer total order with re-grantable HA hand-off, cross-process read-your-writes verified, and a verbatim one-time importer (`migrate_space`/JSONL) replays in-memory logs into git; `InformationSpace.git_backed(...)` wires it. Derived views (SHARD-WP-0010): `views` (wikilink + red-link model, BackLinks, RecentChanges, AllPages/SiteMap) — recomputable, provenance-carrying, presentation-free, exposed via `InformationSpace.backlinks/recent_changes/all_pages/site_map`. Incremental-first derived tier (SHARD-WP-0011): `incremental` (indexed equivalence via MinHash/LSH blocking + verify, change-driven delta maintenance with retraction/propagation, Merkle-style digest + self-healing I-2 consistency-checker, `UnionIndex` routed behind `InformationSpace.all_pages` with rebuild as explicit fallback). Second adapter (SHARD-WP-0012): `GitShardAdapter` — git-IS-store substrate (read=tracked *.md, write=commit, current_rev=per-path sha for drift, adopted git-native history), passes conformance, works across folder+git shards in union/overlay/edit with no core change (capability-as-data proven on a second substrate). 196 tests green, ~97% coverage |
| Intent | `INTENT.md` established; authorization-in-core amendments drafted |
| Research | yawex prior art; c2 origins; federation concepts; wikiengines overview (`research/260608-*/`); XWiki/TWiki/Foswiki deep dives (`research/260613-*/`); Xanadu + ZigZag + Roam + Obsidian + Notion + Joplin + Logseq + local-first workspaces (Anytype/AFFiNE/AppFlowy) + Trilium + Wiki.js + Federated Wiki + Wikibase + git-forge wikis + TiddlyWiki + ikiwiki + Quip + MojoMojo + Oddmuse + UseModWiki deep dives & shard-spectrum synthesis (`research/260614-*/`) |
| Demand | NetKingdom integration asks captured, not yet negotiated |

@@ -9,10 +9,13 @@ from shard_wiki.adapters.conformance import (
)
from shard_wiki.adapters.contract import CONTRACT_VERSION, ShardAdapter
from shard_wiki.adapters.folder import FolderAdapter
from shard_wiki.adapters.git import GitShardAdapter, PageRevision
__all__ = [
"ShardAdapter",
"FolderAdapter",
"GitShardAdapter",
"PageRevision",
"CONTRACT_VERSION",
"Check",
"ConformanceReport",

@@ -0,0 +1,180 @@
"""GitShardAdapter — a second substrate: git-as-store (SHARD-WP-0012; TSD §A.3 git-IS-store).
The home case where **git is the store *and* the journal**. Tracked ``*.md`` paths are the page
keys; the working-tree file is the body; a page's ``source_rev`` is the **commit sha of the last
commit touching its path** (per-path, so an edit to one page never drifts another). The declared
profile is *git-IS-store ⟹ substrate=git ∧ history=git-native* — the implication rule the
capability model enforces (§6.5), validated at registration like any other binding.
This adapter adds **no core changes**: it implements the same :class:`ShardAdapter` contract the
folder adapter does, proving "write an adapter + declare a verified profile" is the whole cost of a
new substrate (capability-as-data, I-3). Built on the ``git`` CLI via subprocess — zero new deps.
"""

from __future__ import annotations

import os
import subprocess
from collections.abc import Iterable
from dataclasses import dataclass
from pathlib import Path

from shard_wiki.adapters.contract import ShardAdapter
from shard_wiki.model import (
    AccessGrant,
    Addressing,
    AttachmentMode,
    CapabilityProfile,
    ContentOpacity,
    History,
    Identity,
    MergeModel,
    NativeQuery,
    NotSupported,
    OperationalEnvelope,
    Page,
    Placement,
    Substrate,
    Translation,
    Verb,
    WriteGranularity,
)
from shard_wiki.provenance import Liveness, ProvenanceEnvelope, Staleness

__all__ = ["GitShardAdapter", "PageRevision"]


@dataclass(frozen=True, slots=True)
class PageRevision:
    """One adopted git-native revision of a page: the commit sha and its subject line."""

    sha: str
    message: str


_GIT_IDENTITY = {
    "GIT_AUTHOR_NAME": "shard-wiki",
    "GIT_AUTHOR_EMAIL": "shard@shard-wiki",
    "GIT_COMMITTER_NAME": "shard-wiki",
    "GIT_COMMITTER_EMAIL": "shard@shard-wiki",
}


class GitShardAdapter(ShardAdapter):
    """A shard whose store is a git repo: keys are tracked ``*.md`` paths, revs are commit shas."""

    def __init__(self, shard_id: str, repo_path: str | Path, writable: bool = False) -> None:
        self._shard_id = shard_id
        self._repo = Path(repo_path)
        self._writable = writable
        self._repo.mkdir(parents=True, exist_ok=True)
        if not (self._repo / ".git").exists():
            self._git("init", "--quiet")

    @property
    def shard_id(self) -> str:
        return self._shard_id

    def profile(self) -> CapabilityProfile:
        # VERSION is always available — a git-IS-store has git-native history to adopt (§A.5),
        # read-only or not. WRITE (= commit, PER_PAGE) is added only in writable mode.
        verbs = {Verb.READ, Verb.VERSION}
        granularity = WriteGranularity.NONE
        if self._writable:
            verbs |= {Verb.WRITE}
            granularity = WriteGranularity.PER_PAGE
        return CapabilityProfile(
            substrate=Substrate.GIT,
            attachment_mode=AttachmentMode.GIT_IS_STORE,
            write_granularity=granularity,
            content_opacity=ContentOpacity.TRANSPARENT,
            operational_envelope=OperationalEnvelope.LOCAL_UNBOUNDED,
            access_grant=AccessGrant.OPEN,
            liveness=Liveness.STATIC,
            history=History.GIT_NATIVE,  # git-is-store ⟹ git-native (§6.5)
            merge_model=MergeModel.GIT_TEXT,
            addressing=Addressing.PATH,
            native_query=NativeQuery.NONE,
            translation=Translation.NATIVE,
            supported_verbs=frozenset(verbs),
        ).validate()

    def write(self, key: str, body: str) -> Page:
        """Write = **commit**: stage the file and commit it (skip a no-op so no empty commit),
        returning the page at the new sha. Drift detection rides on ``current_rev`` = that sha."""
        if not self._writable:
            raise NotSupported(f"{type(self).__name__} is read-only")
        rel = f"{key}.md"
        path = self._path_for(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body, encoding="utf-8")
        self._git("add", "--", rel)
        if self._run("diff", "--cached", "--quiet").returncode != 0:  # staged changes present
            self._git("commit", "-m", f"write {rel}", env=_GIT_IDENTITY)
        return self.read(key)

    def keys(self) -> Iterable[str]:
        out = self._git("ls-files", "*.md").decode()
        for line in out.splitlines():
            yield line[: -len(".md")] if line.endswith(".md") else line

    def read(self, key: str) -> Page:
        path = self._path_for(key)
        if not path.is_file():
            raise KeyError(key)
        rev = self.current_rev(key)
        return Page(
            identity=Identity(self._shard_id, key),
            body=path.read_text(encoding="utf-8"),
            envelope=ProvenanceEnvelope(
                source_shard=self._shard_id,
                liveness=Liveness.STATIC,
                staleness=Staleness.FRESH,
                source_rev=rev,
                lineage="git-native",
            ),
            placements=(Placement(self._shard_id, f"{key}.md"),),
        )

    def current_rev(self, key: str) -> str | None:
        """The sha of the last commit touching ``key``'s path (per-path drift token), or None."""
        rel = f"{key}.md"
        if not self._path_for(key).is_file():
            return None
        sha = self._git("log", "-1", "--format=%H", "--", rel).decode().strip()
        return sha or None

    def history(self, key: str) -> tuple[PageRevision, ...]:
        """Adopt git-native history (§A.5): the commit list for ``key``'s path, newest-first.

        VERSION-gated; raises ``KeyError`` for an unknown page. Each revision is a commit sha +
        subject — the native log surfaced through the contract, not re-implemented.
        """
        if not self.profile().supports(Verb.VERSION):
            raise NotSupported(f"{type(self).__name__} does not support version")
        if not self._path_for(key).is_file():
            raise KeyError(key)
        out = self._git("log", "--format=%H%x00%s", "--", f"{key}.md").decode()
        revisions = []
        for line in out.splitlines():
            sha, _, message = line.partition("\x00")
            revisions.append(PageRevision(sha=sha, message=message))
        return tuple(revisions)

    # -- git plumbing --------------------------------------------------------

    def _path_for(self, key: str) -> Path:
        return self._repo / f"{key}.md"

    def _git(self, *args: str, stdin: bytes | None = None, env: dict | None = None) -> bytes:
        return self._run(*args, stdin=stdin, env=env, check=True).stdout

    def _run(
        self, *args: str, stdin: bytes | None = None, env: dict | None = None, check: bool = False
    ) -> subprocess.CompletedProcess:
        return subprocess.run(
            ["git", "-C", str(self._repo), *args],
            input=stdin,
            capture_output=True,
            env={**os.environ, **(env or {})},
            check=check,
        )

@@ -4,7 +4,24 @@ from shard_wiki.coordination.decision_log import (
CoordinationState,
DecisionEvent,
DecisionLog,
EventStore,
EventType,
InMemoryEventStore,
deserialize_event,
serialize_event,
)
from shard_wiki.coordination.append_authority import (
AppendAuthority,
Lease,
LeaseHeld,
LeaseRegistry,
)
from shard_wiki.coordination.git_event_store import GitEventStore
from shard_wiki.coordination.migration import (
export_jsonl,
import_jsonl,
import_log,
migrate_space,
)
from shard_wiki.coordination.overlay import (
ApplyResult,
@@ -19,6 +36,19 @@ __all__ = [
"DecisionEvent",
"EventType",
"CoordinationState",
"EventStore",
"InMemoryEventStore",
"GitEventStore",
"Lease",
"LeaseHeld",
"LeaseRegistry",
"AppendAuthority",
"import_log",
"migrate_space",
"export_jsonl",
"import_jsonl",
"serialize_event",
"deserialize_event",
"Overlay",
"OverlayEngine",
"ApplyStatus",

@@ -0,0 +1,158 @@
"""Per-space append authority — the single-writer lease over the log (SHARD-WP-0009 T2).
The log is a *total order per space* (§8.6). :class:`~shard_wiki.coordination.git_event_store`
makes a fork physically impossible via compare-and-swap; this layer adds the **policy** that gives
the order a single designated writer: a **per-space lease**. At most one node holds a space's lease
at a time; only the holder writes to the store. A non-holder does not write — it **forwards** its
append intent to the current holder, so intents from anywhere still land in one serialized stream.
The lease is **time-bounded and re-grantable** (HA): if a holder dies, its lease expires and a new
node may take it, resuming appends from the log head (``seq`` stays contiguous across the hand-off).
A node holding a *stale* lease (already re-granted elsewhere) cannot write either — it discovers it
is no longer the holder and forwards instead, so a partitioned ex-holder can never fork the log.
Mechanism over policy (CLAUDE.md): this provides the leasing *primitive*; who acquires when, and
the TTL, are the caller's policy. Single-coordinator only — distributed multi-node leasing and log
sharding are explicit non-goals of this workplan.
"""

from __future__ import annotations

import uuid
from collections.abc import Callable, Mapping
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any

from shard_wiki.coordination.decision_log import DecisionEvent, EventStore, EventType

__all__ = ["Lease", "LeaseHeld", "LeaseRegistry", "AppendAuthority"]


def _utcnow() -> datetime:
    return datetime.now(tz=timezone.utc)


@dataclass(frozen=True, slots=True)
class Lease:
    """A time-bounded grant of single-writer authority over one space."""

    space: str
    holder: str
    token: str
    expires_at: datetime

    def valid_at(self, now: datetime) -> bool:
        return now < self.expires_at


class LeaseHeld(Exception):
    """Raised when a space's lease is validly held by a different node."""

    def __init__(self, lease: Lease) -> None:
        super().__init__(
            f"space {lease.space!r} leased to {lease.holder!r} until {lease.expires_at}"
        )
        self.lease = lease


class LeaseRegistry:
    """The single coordinator's grant table: at most one *valid* lease per space.

    A lease that has expired is freely re-grantable to any node (the HA replacement path); a still
    valid lease is exclusive to its holder (renewable by that holder). The registry also routes
    forwarded append intents to the current holder node.
    """

    def __init__(self, clock: Callable[[], datetime] = _utcnow) -> None:
        self._clock = clock
        self._leases: dict[str, Lease] = {}
        self._nodes: dict[str, AppendAuthority] = {}

    def register(self, node: AppendAuthority) -> None:
        self._nodes[node.node_id] = node

    def grant(self, space: str, holder: str, ttl_seconds: float) -> Lease:
        """Grant/renew the lease for ``space`` to ``holder``; raise :class:`LeaseHeld` if another
        node still holds it validly. An expired lease is re-grantable to anyone."""
        now = self._clock()
        current = self._leases.get(space)
        if current is not None and current.valid_at(now) and current.holder != holder:
            raise LeaseHeld(current)
        lease = Lease(
            space=space,
            holder=holder,
            token=uuid.uuid4().hex,
            expires_at=now + timedelta(seconds=ttl_seconds),
        )
        self._leases[space] = lease
        return lease

    def current(self, space: str) -> Lease | None:
        """The lease for ``space`` if one is currently valid, else None (expired/absent)."""
        lease = self._leases.get(space)
        return lease if lease is not None and lease.valid_at(self._clock()) else None

    def holder_node(self, space: str) -> AppendAuthority | None:
        lease = self.current(space)
        return self._nodes.get(lease.holder) if lease is not None else None
class AppendAuthority:
"""A coordinator node that appends to the shared log only when it holds the space's lease.
Nodes share one :class:`EventStore` and one :class:`LeaseRegistry`. ``append`` routes itself:
the holder writes; a non-holder forwards to whoever holds the lease (acquiring it first if the
space is currently unleased). The append API mirrors :class:`EventStore` so the authority is a
drop-in single-writer guard.
"""
def __init__(
self,
node_id: str,
store: EventStore,
registry: LeaseRegistry,
ttl_seconds: float = 30.0,
) -> None:
self.node_id = node_id
self._store = store
self._registry = registry
self._ttl = ttl_seconds
registry.register(self)
def acquire(self, space: str) -> Lease:
"""Take (or renew) the lease for ``space``. Raises :class:`LeaseHeld` if another node holds
it validly."""
return self._registry.grant(space, self.node_id, self._ttl)
def holds(self, space: str) -> bool:
lease = self._registry.current(space)
return lease is not None and lease.holder == self.node_id
def append(
self,
space: str,
type: EventType,
payload: Mapping[str, Any],
actor: str | None = None,
) -> DecisionEvent:
"""Append via the single authority. If we hold the lease, write; otherwise forward to the
holder. If the space is unleased, acquire it first. A node with a *stale* lease forwards
(it is not the current holder) rather than writing — so it cannot fork the log."""
holder_node = self._registry.holder_node(space)
if holder_node is None:
self.acquire(space) # unleased: take authority, then write below
holder_node = self
if holder_node is self:
return self._store.append(space, type, payload, actor=actor)
return holder_node._write(space, type, payload, actor=actor)
def _write(
self,
space: str,
type: EventType,
payload: Mapping[str, Any],
actor: str | None,
) -> DecisionEvent:
"""Apply a forwarded intent. Called only on the lease holder by a forwarding peer."""
return self._store.append(space, type, payload, actor=actor)
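
The lease rule that `grant` enforces above (a valid lease blocks other holders, an expired one does not) can be sketched in isolation. This is a minimal illustration, not the module's code: `ToyRegistry`, `FakeClock`, and the `valid_at` semantics (`now < expires_at`) are assumptions, since the real `Lease` definition is not shown in this diff.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Lease:
    space: str
    holder: str
    token: str
    expires_at: datetime

    def valid_at(self, now: datetime) -> bool:
        # Assumed semantics: a lease is valid strictly before its expiry.
        return now < self.expires_at


class LeaseHeld(Exception):
    """Raised when another holder still has a valid lease."""


class FakeClock:
    """Hypothetical injectable clock so expiry can be tested without sleeping."""

    def __init__(self) -> None:
        self.now = datetime(2026, 1, 1, tzinfo=timezone.utc)

    def __call__(self) -> datetime:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += timedelta(seconds=seconds)


class ToyRegistry:
    def __init__(self, clock: FakeClock) -> None:
        self._clock = clock
        self._leases: dict[str, Lease] = {}

    def grant(self, space: str, holder: str, ttl_seconds: float) -> Lease:
        now = self._clock()
        current = self._leases.get(space)
        # Only a *valid* lease held by someone else blocks the grant.
        if current is not None and current.valid_at(now) and current.holder != holder:
            raise LeaseHeld(current.holder)
        lease = Lease(space, holder, uuid.uuid4().hex, now + timedelta(seconds=ttl_seconds))
        self._leases[space] = lease
        return lease


def demo_lease_handover() -> list[str]:
    clock = FakeClock()
    registry = ToyRegistry(clock)
    outcomes = []
    registry.grant("wiki", "node-a", 30)
    outcomes.append("a-acquired")
    try:
        registry.grant("wiki", "node-b", 30)
        outcomes.append("b-acquired")
    except LeaseHeld:
        outcomes.append("b-refused")
    clock.advance(31)  # node-a's lease has now expired
    registry.grant("wiki", "node-b", 30)
    outcomes.append("b-acquired-after-expiry")
    return outcomes
```

Injecting the clock keeps the expiry path deterministic in tests; the same holder calling `grant` again simply renews.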


@@ -3,22 +3,36 @@
Coordination-canonical state (overlays, equivalence bindings, aliases, merges, forks) is an
**append-only decision log**, not a mutable file; the queryable *current* state is a **derived
fold** of the log (tier-3 disposable). The log is **totally ordered per space** via a single
**append authority** — here an in-process counter; a git-backed, lease-held authority is a later
binding. That total order is what gives read-your-writes across readers (§8.6).
**append authority**. That total order is what gives read-your-writes across readers (§8.6).
Storage lives behind :class:`EventStore`: :class:`InMemoryEventStore` is the default test double
(an in-process counter); :class:`~shard_wiki.coordination.git_event_store.GitEventStore` is the
git-addressable backend (SHARD-WP-0009). The :class:`DecisionLog` API and the :meth:`fold` are
identical across backends — only storage + the concurrency model differ.
`derived = f(canonical)`: :class:`CoordinationState` is always reproducible by replaying the log.
"""
from __future__ import annotations
import json
from collections.abc import Mapping
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from types import MappingProxyType
from typing import Any
from typing import Any, Protocol, runtime_checkable
__all__ = ["EventType", "DecisionEvent", "CoordinationState", "DecisionLog"]
__all__ = [
"EventType",
"DecisionEvent",
"CoordinationState",
"EventStore",
"InMemoryEventStore",
"DecisionLog",
"serialize_event",
"deserialize_event",
]
class EventType(Enum):
@@ -63,10 +77,57 @@ class CoordinationState:
return frozenset({identity})
class DecisionLog:
"""In-memory append-only log, totally ordered per space (the append authority for a process).
def serialize_event(event: DecisionEvent) -> bytes:
"""Deterministic, stable-JSON wire form of an event (same bytes for equal events, any process).
A later binding swaps the storage for git + a per-space lease without changing this API.
Sorted keys + compact separators make the serialization canonical, so a git object hashed from
it is reproducible — the basis for content-addressable, comparable logs across backends.
"""
obj = {
"seq": event.seq,
"space": event.space,
"type": event.type.value,
"payload": event.payload,
"actor": event.actor,
"timestamp": event.timestamp.isoformat(),
}
return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode()
def deserialize_event(data: bytes | str) -> DecisionEvent:
"""Inverse of :func:`serialize_event` — round-trips an event byte-for-byte by field."""
obj = json.loads(data)
return DecisionEvent(
seq=obj["seq"],
space=obj["space"],
type=EventType(obj["type"]),
payload=obj["payload"],
actor=obj["actor"],
timestamp=datetime.fromisoformat(obj["timestamp"]),
)
@runtime_checkable
class EventStore(Protocol):
"""Append-only, per-space ordered storage behind :class:`DecisionLog`.
Two bindings exist: :class:`InMemoryEventStore` (default/test double) and
:class:`~shard_wiki.coordination.git_event_store.GitEventStore` (git-addressable). Both assign
a per-space monotonic ``seq`` at the log head and guarantee read-your-writes for their reach
(in-process for memory; cross-process for git).
"""
def append(
self, space: str, type: EventType, payload: Mapping[str, Any], actor: str | None = None
) -> DecisionEvent: ...
def events(self, space: str) -> tuple[DecisionEvent, ...]: ...
class InMemoryEventStore:
"""In-process append-only store, totally ordered per space (the append authority for a process).
The default test double; the git backend preserves this exact contract on durable storage.
"""
def __init__(self) -> None:
@@ -84,10 +145,33 @@ class DecisionLog:
self._events.setdefault(space, []).append(event)
return event
def events(self, space: str) -> tuple[DecisionEvent, ...]:
return tuple(self._events.get(space, ()))
class DecisionLog:
"""Append-only decision log, totally ordered per space, with a derived :meth:`fold`.
Storage is delegated to an :class:`EventStore` (default :class:`InMemoryEventStore`); swapping
in the git backend changes only durability + the concurrency model, not this API or the fold.
"""
def __init__(self, store: EventStore | None = None) -> None:
self._store: EventStore = store if store is not None else InMemoryEventStore()
def append(
self,
space: str,
type: EventType,
payload: Mapping[str, Any],
actor: str | None = None,
) -> DecisionEvent:
return self._store.append(space, type, payload, actor=actor)
def events(self, space: str) -> tuple[DecisionEvent, ...]:
"""The space's events in append (total) order. Read-your-writes: a just-appended event
is present immediately."""
return tuple(self._events.get(space, ()))
return self._store.events(space)
def fold(self, space: str) -> CoordinationState:
"""Replay the log into current coordination state (derived = f(log))."""


@@ -0,0 +1,172 @@
"""GitEventStore — a git-addressable binding of :class:`EventStore` (SHARD-WP-0009 T1).
Each space is a ref (``refs/spaces/<sha1(space)>``); each ``append`` writes the event as an
immutable git object (a one-blob tree committed onto the ref) and advances the ref. The commit
chain *is* the totally ordered log: ``seq`` is the depth, ``events`` walks first-parent from the
head oldest→newest. Coordination-canonical state therefore inherits git's history / patch /
review / backup affordances (I-6) and is read-your-writes correct across processes.
The total order is enforced at storage by a **compare-and-swap** ref update
(``git update-ref <ref> <new> <old>``): when two appenders race off the same head, the loser's CAS
fails and it retries off the new head, so a non-holder can never fork the log. The lease layer
(T2) sits *above* this as the append-authority policy; CAS is the mechanism that makes it safe.
Implemented over the ``git`` CLI through :mod:`subprocess` — zero runtime dependencies.
"""
from __future__ import annotations
import hashlib
import os
import subprocess
from collections.abc import Mapping
from pathlib import Path
from typing import Any
from shard_wiki.coordination.decision_log import (
DecisionEvent,
EventType,
deserialize_event,
serialize_event,
)
__all__ = ["GitEventStore"]
# Fixed identity so commit objects are reproducible and never prompt for git config; the event's
# own timestamp/actor carry the real provenance, the commit is just the ordered container.
_GIT_IDENTITY = {
"GIT_AUTHOR_NAME": "shard-wiki",
"GIT_AUTHOR_EMAIL": "coordination@shard-wiki",
"GIT_COMMITTER_NAME": "shard-wiki",
"GIT_COMMITTER_EMAIL": "coordination@shard-wiki",
}
_EVENT_PATH = "event.json"
_MAX_CAS_RETRIES = 50
class GitEventStore:
"""Git-backed, append-only, per-space ordered event store (an :class:`EventStore`)."""
def __init__(self, repo_path: str | Path) -> None:
self.repo_path = Path(repo_path)
self.repo_path.mkdir(parents=True, exist_ok=True)
if not (self.repo_path / "HEAD").exists() and not (self.repo_path / ".git").exists():
self._git("init", "--quiet", str(self.repo_path), at_cwd=True)
# -- EventStore contract -------------------------------------------------
def append(
self,
space: str,
type: EventType,
payload: Mapping[str, Any],
actor: str | None = None,
) -> DecisionEvent:
"""Append one event, advancing the space ref under compare-and-swap (retry-on-race)."""
ref = self._ref(space)
for _ in range(_MAX_CAS_RETRIES):
head = self._head(ref)
seq = self._count(ref, head)
event = DecisionEvent(
seq=seq, space=space, type=type, payload=dict(payload), actor=actor
)
commit = self._commit_event(event, parent=head)
if self._cas_update(ref, new=commit, old=head):
return event
raise RuntimeError(f"append contention on {space!r}: exhausted {_MAX_CAS_RETRIES} retries")
def import_event(self, event: DecisionEvent) -> None:
"""Replay one pre-existing event *verbatim* (preserving seq / timestamp / actor) onto its
space ref — the one-time migration path (SHARD-WP-0009 T4), not a live append.
Refuses out-of-order import so the imported chain stays a contiguous total order; preserving
the original fields keeps provenance intact (union-without-erasure) rather than restamping.
"""
ref = self._ref(event.space)
head = self._head(ref)
expected = self._count(ref, head)
if event.seq != expected:
raise ValueError(
f"out-of-order import on {event.space!r}: expected seq {expected}, got {event.seq}"
)
commit = self._commit_event(event, parent=head)
if not self._cas_update(ref, new=commit, old=head):
raise RuntimeError(f"import race on {ref}")
def events(self, space: str) -> tuple[DecisionEvent, ...]:
"""The space's events oldest→newest (append/total order)."""
ref = self._ref(space)
head = self._head(ref)
if head is None:
return ()
shas = self._git("rev-list", "--reverse", "--first-parent", ref).decode().split()
return tuple(
deserialize_event(self._git("cat-file", "blob", f"{sha}:{_EVENT_PATH}"))
for sha in shas
)
# -- git plumbing --------------------------------------------------------
def _commit_event(self, event: DecisionEvent, parent: str | None) -> str:
blob = self._git(
"hash-object", "-w", "--stdin", stdin=serialize_event(event)
).decode().strip()
tree = self._git(
"mktree", stdin=f"100644 blob {blob}\t{_EVENT_PATH}\n".encode()
).decode().strip()
args = ["commit-tree", tree, "-m", f"event {event.seq} {event.type.value}"]
if parent is not None:
args += ["-p", parent]
# Pin the commit date to the event's timestamp for reproducible objects.
date = event.timestamp.isoformat()
env = {**_GIT_IDENTITY, "GIT_AUTHOR_DATE": date, "GIT_COMMITTER_DATE": date}
return self._git(*args, env=env).decode().strip()
def _cas_update(self, ref: str, new: str, old: str | None) -> bool:
"""``git update-ref`` with the old value as a CAS guard (empty oldvalue == must-not-exist).
Returns False if the ref moved since we read ``old`` (lost the race) — the caller retries.
"""
result = self._run("update-ref", ref, new, old if old is not None else "")
return result.returncode == 0
def _head(self, ref: str) -> str | None:
result = self._run("rev-parse", "--verify", "--quiet", ref)
out = result.stdout.decode().strip()
return out or None
def _count(self, ref: str, head: str | None) -> int:
if head is None:
return 0
return int(self._git("rev-list", "--count", "--first-parent", ref).decode().strip())
@staticmethod
def _ref(space: str) -> str:
return f"refs/spaces/{hashlib.sha1(space.encode()).hexdigest()}"
def _git(
self,
*args: str,
stdin: bytes | None = None,
env: dict | None = None,
at_cwd: bool = False,
) -> bytes:
result = self._run(*args, stdin=stdin, env=env, at_cwd=at_cwd, check=True)
return result.stdout
def _run(
self,
*args: str,
stdin: bytes | None = None,
env: dict | None = None,
at_cwd: bool = False,
check: bool = False,
) -> subprocess.CompletedProcess:
base = ["git"] if at_cwd else ["git", "-C", str(self.repo_path)]
return subprocess.run(
[*base, *args],
input=stdin,
capture_output=True,
env={**os.environ, **(env or {})},
check=check,
)
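
The retry-on-race loop in `GitEventStore.append` can be exercised without git. The sketch below swaps the ref and commit objects for an in-memory stand-in (`ToyRefStore` and the `before_cas` hook are hypothetical, added only to force a race); the control flow mirrors the read-head, commit, CAS, retry sequence above.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class ToyRefStore:
    """Hypothetical in-memory stand-in for one git ref plus its commit objects."""

    def __init__(self) -> None:
        self.head: str | None = None
        self._commits: dict[str, tuple[str | None, str]] = {}

    def commit(self, payload: str, parent: str | None) -> str:
        sha = hashlib.sha1(f"{parent}|{payload}".encode()).hexdigest()
        self._commits[sha] = (parent, payload)
        return sha

    def cas(self, new: str, old: str | None) -> bool:
        # Advance the ref only if it still points at the value the caller read.
        if self.head != old:
            return False
        self.head = new
        return True

    def chain(self, sha: str | None) -> list[str]:
        # Payloads from the root to ``sha`` (oldest first).
        out = []
        while sha is not None:
            parent, payload = self._commits[sha]
            out.append(payload)
            sha = parent
        return out[::-1]


def append(
    store: ToyRefStore,
    payload: str,
    before_cas: Callable[[], None] | None = None,
    max_retries: int = 50,
) -> int:
    for attempt in range(max_retries):
        head = store.head
        seq = len(store.chain(head))  # seq is the depth of the head we read
        commit = store.commit(f"{seq}:{payload}", parent=head)
        if before_cas is not None and attempt == 0:
            before_cas()  # test hook: lets a rival append land first
        if store.cas(new=commit, old=head):
            return seq
    raise RuntimeError("append contention: retries exhausted")


store = ToyRefStore()
# "A" reads the empty head, then "B" slips in before A's CAS.
seq_a = append(store, "A", before_cas=lambda: append(store, "B"))
```

A's first commit object is orphaned rather than linked into the chain, which is the same outcome a lost `update-ref` race leaves behind in git.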


@@ -0,0 +1,53 @@
"""One-time migration of a coordination log into git (SHARD-WP-0009 T4).
Replays an existing decision log — an in-memory store, or a JSON-lines export — into a
:class:`GitEventStore`, preserving each event verbatim (seq / timestamp / actor) so provenance
survives the move (union-without-erasure). After migration the same :meth:`DecisionLog.fold`
reproduces identical coordination state; only durability changes.
"""
from __future__ import annotations
from collections.abc import Iterable
from pathlib import Path
from shard_wiki.coordination.decision_log import (
DecisionEvent,
EventStore,
deserialize_event,
serialize_event,
)
from shard_wiki.coordination.git_event_store import GitEventStore
__all__ = ["import_log", "migrate_space", "export_jsonl", "import_jsonl"]
def import_log(events: Iterable[DecisionEvent], dest: GitEventStore) -> int:
"""Replay ``events`` (in space/seq order) into ``dest``. Returns the count imported."""
count = 0
for event in events:
dest.import_event(event)
count += 1
return count
def migrate_space(source: EventStore, space: str, dest: GitEventStore) -> int:
"""Migrate one space's log from any :class:`EventStore` into the git backend verbatim."""
return import_log(source.events(space), dest)
def export_jsonl(events: Iterable[DecisionEvent], path: str | Path) -> int:
"""Write events as newline-delimited canonical JSON (a portable, diffable log export)."""
count = 0
with open(path, "wb") as handle:
for event in events:
handle.write(serialize_event(event) + b"\n")
count += 1
return count
def import_jsonl(path: str | Path, dest: GitEventStore) -> int:
"""Replay a JSON-lines export (see :func:`export_jsonl`) into the git backend."""
with open(path, "rb") as handle:
events = [deserialize_event(line) for line in handle if line.strip()]
return import_log(events, dest)
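
The export/import pair and the contiguity check in `import_event` can be sketched together. This is an illustration under stated assumptions: events are plain dicts, the destination is a list standing in for `GitEventStore`, and the functions take file-like handles instead of paths.

```python
import io
import json


def export_jsonl(events, handle) -> int:
    count = 0
    for event in events:
        line = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
        handle.write(line + b"\n")
        count += 1
    return count


def import_jsonl(handle, dest: list) -> int:
    count = 0
    for line in handle:
        if not line.strip():
            continue  # tolerate blank lines, as the original import does
        event = json.loads(line)
        expected = len(dest)  # next contiguous seq for this space
        if event["seq"] != expected:
            raise ValueError(f"out-of-order import: expected seq {expected}, got {event['seq']}")
        dest.append(event)
        count += 1
    return count


events = [{"seq": i, "space": "wiki", "payload": {"n": i}} for i in range(3)]
buffer = io.BytesIO()
exported = export_jsonl(events, buffer)
buffer.seek(0)
restored: list = []
imported = import_jsonl(buffer, restored)
```

Refusing a gap up front keeps a partially imported log usable: whatever landed is still a contiguous prefix.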


@@ -0,0 +1,46 @@
"""incremental/ — the incremental-first derived tier (CoreArchitectureBlueprint §8.7).
Equivalence is **indexed** (blocking/LSH + verify), not pairwise O(N²); maintenance is
**change-driven** (delta with retraction + propagation, review B-4), keeping the derived tier equal
to a from-scratch rebuild — which becomes a bounded fallback, not the operational path. A
Merkle-style **digest** plus a background **consistency-checker** make ``derived = f(canonical)``
verified rather than asserted (I-2), self-healing on detected drift.
This slice is in-memory only (no persisted index store); per-partition structure is honoured, but
multi-tenant deployment is deferred. Per the dependency rule, this package imports only downward
(model/provenance) and is wired by the orchestrator.
"""
from shard_wiki.incremental.equivalence import (
EquivalenceEdge,
EquivalenceIndex,
normalized_title,
)
from shard_wiki.incremental.minhash import (
MinHasher,
band_keys,
jaccard,
shingles,
)
from shard_wiki.incremental.union_index import UnionIndex
from shard_wiki.incremental.verification import (
ConsistencyChecker,
ConsistencyReport,
derived_digest,
region_digest,
)
__all__ = [
"shingles",
"MinHasher",
"band_keys",
"jaccard",
"EquivalenceEdge",
"EquivalenceIndex",
"normalized_title",
"derived_digest",
"region_digest",
"ConsistencyReport",
"ConsistencyChecker",
"UnionIndex",
]


@@ -0,0 +1,225 @@
"""Indexed equivalence — blocking + verify, incrementally maintained (SHARD-WP-0011 T1/T2).
Equivalence (two *distinct* identities holding the same page) is detected without pairwise O(N²):
1. **Blocking** generates candidate pairs — pages sharing a normalized-title bucket or an LSH band
(MinHash over content shingles).
2. **Verify** confirms a candidate — exact-body fingerprint match, or shingle Jaccard ≥ threshold —
plus **curator bindings** (explicit decision-log edges) which are always equivalence edges.
The index is **incrementally maintained** (T2): ``add`` / ``update`` / ``remove`` re-bucket the
changed page, **retract** the edges it leaves and **add** the edges it enters; equivalence groups
are the connected components of the current edge set, so a retraction that disconnects a component
**splits** a chorus automatically. A full :meth:`build` is just repeated ``add`` — the bounded
rebuild fallback. The invariant (and the test oracle): incremental state == a from-scratch rebuild.
"""
from __future__ import annotations
import hashlib
import re
from collections.abc import Iterable
from dataclasses import dataclass
from shard_wiki.incremental.minhash import MinHasher, band_keys, jaccard, shingles
from shard_wiki.model import Identity, Page
__all__ = ["EquivalenceEdge", "EquivalenceIndex", "normalized_title"]
_NONALNUM_RE = re.compile(r"[^a-z0-9]+")
def normalized_title(key: str) -> str:
"""A blocking bucket key: the last path segment, lowercased, stripped of non-alphanumerics."""
leaf = key.rsplit("/", 1)[-1]
return _NONALNUM_RE.sub("", leaf.lower())
@dataclass(frozen=True, slots=True)
class EquivalenceEdge:
"""A verified equivalence between two identities, tagged with why it was accepted."""
a: Identity
b: Identity
reason: str # "fingerprint" | "content" | "curator"
@dataclass(frozen=True, slots=True)
class _Entry:
shingle_set: frozenset[str]
bands: tuple[tuple[int, tuple[int, ...]], ...]
title: str
fingerprint: str
def _fingerprint(body: str) -> str:
return hashlib.blake2b(body.strip().encode("utf-8"), digest_size=16).hexdigest()
def _pair(a: Identity, b: Identity) -> frozenset[Identity]:
return frozenset((a, b))
class EquivalenceIndex:
"""An incrementally maintained, blocked-and-verified equivalence relation over union pages."""
def __init__(
self,
*,
num_perm: int = 64,
num_bands: int = 32,
threshold: float = 0.7,
hasher: MinHasher | None = None,
) -> None:
self.threshold = threshold
self.num_bands = num_bands
self._hasher = hasher or MinHasher(num_perm=num_perm)
self._entries: dict[Identity, _Entry] = {}
self._band_buckets: dict[tuple[int, tuple[int, ...]], set[Identity]] = {}
self._title_buckets: dict[str, set[Identity]] = {}
self._content_edges: dict[frozenset[Identity], str] = {}
self._curator_edges: set[frozenset[Identity]] = set()
# -- build / maintain ----------------------------------------------------
def build(
self,
pages: Iterable[Page],
curator_edges: Iterable[tuple[Identity, Identity]] = (),
) -> None:
"""Rebuild from scratch (the bounded fallback): add every page, then curator edges."""
self.__init__(
num_bands=self.num_bands, threshold=self.threshold, hasher=self._hasher
)
for page in pages:
self.add(page)
for a, b in curator_edges:
self.bind(a, b)
def add(self, page: Page) -> None:
"""Index a new (or, via :meth:`update`, refreshed) page and add its equivalence edges."""
identity = page.identity
entry = self._make_entry(page)
self._entries[identity] = entry
for key in entry.bands:
self._band_buckets.setdefault(key, set()).add(identity)
self._title_buckets.setdefault(entry.title, set()).add(identity)
for candidate in self._candidates(identity, entry):
reason = self._verify(identity, candidate)
if reason is not None:
self._content_edges[_pair(identity, candidate)] = reason
def remove(self, identity: Identity) -> None:
"""Drop a page: de-bucket it and retract every content edge incident to it."""
entry = self._entries.pop(identity, None)
if entry is None:
return
for key in entry.bands:
self._discard_bucket(self._band_buckets, key, identity)
self._discard_bucket(self._title_buckets, entry.title, identity)
for edge in [e for e in self._content_edges if identity in e]:
del self._content_edges[edge]
def update(self, page: Page) -> None:
"""Apply a change as retract-then-add: stale (bucket-exit) edges go, new edges arrive."""
self.remove(page.identity)
self.add(page)
def bind(self, a: Identity, b: Identity) -> None:
"""Record a curator equivalence (an explicit decision-log binding); always an edge."""
if a != b:
self._curator_edges.add(_pair(a, b))
def unbind(self, a: Identity, b: Identity) -> None:
self._curator_edges.discard(_pair(a, b))
def set_curator_edges(self, edges: Iterable[tuple[Identity, Identity]]) -> None:
"""Replace all curator edges at once (re-syncing from the decision-log fold)."""
self._curator_edges = {_pair(a, b) for a, b in edges if a != b}
# -- queries -------------------------------------------------------------
def identities(self) -> frozenset[Identity]:
"""All identities currently present in the index."""
return frozenset(self._entries)
def fingerprint(self, identity: Identity) -> str | None:
"""The content fingerprint indexed for ``identity`` (None if absent) — a digest leaf."""
entry = self._entries.get(identity)
return entry.fingerprint if entry is not None else None
def edges(self) -> frozenset[frozenset[Identity]]:
"""All equivalence edges (content + curator) among currently present identities."""
present = self._entries.keys()
curator = {e for e in self._curator_edges if e <= present}
return frozenset(set(self._content_edges) | curator)
def groups(self) -> tuple[frozenset[Identity], ...]:
"""Equivalence groups: connected components of size ≥ 2 (union-find over the edges)."""
parent: dict[Identity, Identity] = {}
def find(x: Identity) -> Identity:
parent.setdefault(x, x)
root = x
while parent[root] != root:
root = parent[root]
while parent[x] != root:
parent[x], x = root, parent[x]
return root
for edge in self.edges():
a, b = tuple(edge)
ra, rb = find(a), find(b)
if ra != rb:
parent[ra] = rb
comps: dict[Identity, set[Identity]] = {}
for node in parent:
comps.setdefault(find(node), set()).add(node)
return tuple(
frozenset(members) for members in comps.values() if len(members) > 1
)
def equivalent_to(self, identity: Identity) -> frozenset[Identity]:
"""The equivalence group containing ``identity`` (including itself), else just itself."""
for group in self.groups():
if identity in group:
return group
return frozenset({identity})
# -- internals -----------------------------------------------------------
def _make_entry(self, page: Page) -> _Entry:
shingle_set = shingles(page.body)
signature = self._hasher.signature(shingle_set)
return _Entry(
shingle_set=shingle_set,
bands=band_keys(signature, self.num_bands),
title=normalized_title(page.identity.key),
fingerprint=_fingerprint(page.body),
)
def _candidates(self, identity: Identity, entry: _Entry) -> set[Identity]:
candidates: set[Identity] = set()
for key in entry.bands:
candidates |= self._band_buckets.get(key, set())
candidates |= self._title_buckets.get(entry.title, set())
candidates.discard(identity)
return candidates
def _verify(self, a: Identity, b: Identity) -> str | None:
ea, eb = self._entries[a], self._entries[b]
if ea.fingerprint == eb.fingerprint:
return "fingerprint"
if jaccard(ea.shingle_set, eb.shingle_set) >= self.threshold:
return "content"
return None
@staticmethod
def _discard_bucket(buckets: dict, key, identity: Identity) -> None:
bucket = buckets.get(key)
if bucket is not None:
bucket.discard(identity)
if not bucket:
del buckets[key]
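
The claim that a retraction "splits a chorus automatically" follows from computing groups as connected components of the current edge set. A minimal sketch with string identities (the real index uses `Identity` objects) and a path-halving union-find, which is a variation on the two-pass compression used above:

```python
from __future__ import annotations


def groups(edges: set[tuple[str, str]]) -> set[frozenset[str]]:
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    comps: dict[str, set[str]] = {}
    for node in list(parent):
        comps.setdefault(find(node), set()).add(node)
    # Singletons are not equivalence groups.
    return {frozenset(members) for members in comps.values() if len(members) > 1}


edges = {("a", "b"), ("b", "c"), ("c", "d")}
before = groups(edges)

# Retract every edge incident to "b", as remove("b") would.
after = groups({e for e in edges if "b" not in e})
```

Because groups are recomputed from edges on demand, no separate split bookkeeping is needed when an edge disappears.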


@@ -0,0 +1,71 @@
"""MinHash + LSH banding primitives for content-similarity blocking (SHARD-WP-0011 T1).
Pure, deterministic functions (fixed hashing, no per-run randomness) so the derived tier and its
digest are reproducible. Shingle a body into k-grams, MinHash the shingle set into a signature,
split the signature into LSH bands; two pages sharing a band are *candidates* for equivalence —
the cheap pre-filter that replaces pairwise O(N²) comparison.
"""
from __future__ import annotations
import hashlib
import random
import re
from collections.abc import Iterable
__all__ = ["shingles", "MinHasher", "band_keys", "jaccard"]
_WORD_RE = re.compile(r"\w+")
# The Mersenne prime 2**61 - 1: the modulus for the universal-hash permutations.
_PRIME = (1 << 61) - 1
def shingles(text: str, k: int = 3) -> frozenset[str]:
"""The set of word k-grams in ``text`` (lowercased). Short texts fall back to their word set."""
words = _WORD_RE.findall(text.lower())
if len(words) < k:
return frozenset(words)
return frozenset(" ".join(words[i : i + k]) for i in range(len(words) - k + 1))
def _stable_hash(token: str) -> int:
return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")
class MinHasher:
"""A bank of ``num_perm`` universal hash permutations producing a fixed-length signature."""
def __init__(self, num_perm: int = 64, seed: int = 1) -> None:
self.num_perm = num_perm
rng = random.Random(seed)
self._coeffs = [
(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME)) for _ in range(num_perm)
]
def signature(self, shingle_set: Iterable[str]) -> tuple[int, ...]:
"""The MinHash signature of ``shingle_set`` (empty set → all-``_PRIME`` sentinel)."""
hashed = [_stable_hash(s) for s in shingle_set]
if not hashed:
return tuple(_PRIME for _ in self._coeffs)
return tuple(min((a * h + b) % _PRIME for h in hashed) for a, b in self._coeffs)
def band_keys(
signature: tuple[int, ...], num_bands: int
) -> tuple[tuple[int, tuple[int, ...]], ...]:
"""Split a signature into ``num_bands`` band keys; two pages sharing one are LSH candidates."""
if num_bands <= 0 or len(signature) % num_bands != 0:
raise ValueError(f"signature length {len(signature)} not divisible into {num_bands} bands")
rows = len(signature) // num_bands
return tuple(
(b, signature[b * rows : (b + 1) * rows]) for b in range(num_bands)
)
def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
"""Jaccard similarity of two shingle sets; two empty sets are defined as identical (1.0)."""
if not a and not b:
return 1.0
if not a or not b:
return 0.0
return len(a & b) / len(a | b)
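
A small worked example may help when choosing `num_perm` and `num_bands`. The sketch below re-implements `shingles` and `jaccard` as defined above and adds `candidate_probability`, a hypothetical helper (not part of the diff) for the standard LSH banding estimate `1 - (1 - s**r)**b`. That estimate assumes each signature position collides with probability equal to the Jaccard similarity `s`, which is the usual idealization of MinHash.

```python
import re

_WORD_RE = re.compile(r"\w+")


def shingles(text: str, k: int = 3) -> frozenset:
    words = _WORD_RE.findall(text.lower())
    if len(words) < k:
        return frozenset(words)
    return frozenset(" ".join(words[i : i + k]) for i in range(len(words) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_probability(similarity: float, rows: int, bands: int) -> float:
    """Chance that two pages share at least one band (idealized MinHash)."""
    return 1.0 - (1.0 - similarity**rows) ** bands


left = shingles("the quick brown fox jumps")
right = shingles("the quick brown fox sleeps")
similarity = jaccard(left, right)  # 2 shared 3-grams out of 4 distinct

# Defaults in EquivalenceIndex: 64 permutations / 32 bands = 2 rows per band.
near_threshold = candidate_probability(0.7, rows=2, bands=32)
low_similarity = candidate_probability(0.2, rows=2, bands=32)
```

With two rows per band the filter is deliberately permissive: pairs at the 0.7 threshold are almost always candidates, and many low-similarity pairs are too, so the verify step carries the precision.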


@@ -0,0 +1,91 @@
"""UnionIndex — the maintained derived tier wired behind resolution + views (SHARD-WP-0011 T4).
Wraps a :class:`UnionGraph` + decision log with an incrementally maintained
:class:`EquivalenceIndex`. Content equivalence is kept fresh by deltas (``note_change`` /
``note_removed``); curator bindings are re-synced live from the log fold. A full :meth:`rebuild`
is the bounded fallback. :meth:`verify` runs the I-2 consistency-checker over the live source.
Consumer-visible results keep the same shape: equivalence groups are exposed in the string form the
decision-log fold uses. The groups are a *superset* of the curator-only fold, because genuine
content duplicates are collapsed as well. Beyond that, only freshness and cost differ
(recompute-on-read becomes change-driven).
"""
from __future__ import annotations
from shard_wiki.coordination import DecisionLog
from shard_wiki.incremental.equivalence import EquivalenceIndex
from shard_wiki.incremental.verification import (
ConsistencyChecker,
ConsistencyReport,
derived_digest,
)
from shard_wiki.model import Identity, Page
from shard_wiki.union import UnionGraph
__all__ = ["UnionIndex"]
def _identity(token: str) -> Identity:
shard, _, key = token.partition(":")
return Identity(shard, key)
class UnionIndex:
"""An incrementally maintained equivalence index over a union, with a rebuild fallback."""
def __init__(self, union: UnionGraph, log: DecisionLog, space: str) -> None:
self._union = union
self._log = log
self._space = space
self._eq = EquivalenceIndex()
self.rebuild()
def rebuild(self) -> None:
"""The bounded fallback: re-derive the whole index from current union pages + bindings."""
self._eq.build(self._union.iter_pages())
self._sync_curator()
def note_change(self, page: Page) -> None:
"""Change-driven update for one added/edited page (the operational path)."""
self._eq.update(page)
def note_removed(self, identity: Identity) -> None:
self._eq.remove(identity)
def _sync_curator(self) -> None:
"""Re-sync curator equivalence from the live decision-log fold (cheap, always correct)."""
self._eq.set_curator_edges(self._curator_pairs())
def equivalence_groups(self) -> tuple[frozenset[str], ...]:
"""Equivalence groups in decision-log string form (curator content), for the views."""
self._sync_curator()
return tuple(
frozenset(str(identity) for identity in group) for group in self._eq.groups()
)
def digest(self) -> str:
"""The Merkle-style digest of the maintained derived tier (I-2)."""
self._sync_curator()
return derived_digest(self._eq)
def verify(self) -> ConsistencyReport:
"""Check the maintained index against a from-scratch fold of the live source; self-heal."""
self._sync_curator()
checker = ConsistencyChecker(
self._eq,
pages=lambda: list(self._union.iter_pages()),
curator_edges=self._curator_pairs,
)
return checker.check_and_repair()
def _curator_pairs(self) -> list[tuple[Identity, Identity]]:
pairs: list[tuple[Identity, Identity]] = []
for group in self._log.fold(self._space).equivalence_groups:
members = [_identity(m) for m in group]
pairs.extend((members[0], other) for other in members[1:])
return pairs
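
The group-to-edges expansion used for curator bindings is a star: one member is linked to every other, which is enough to reconnect the whole group in the union-find. A minimal sketch, assuming `(shard, key)` tuples in place of `Identity` and sorting the members for a deterministic hub (the code above takes whichever member iterates first):

```python
from __future__ import annotations


def parse_identity(token: str) -> tuple[str, str]:
    # Split on the first ":" only, so keys may themselves contain colons.
    shard, _, key = token.partition(":")
    return (shard, key)


def star_edges(group: frozenset[str]) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    members = sorted(parse_identity(m) for m in group)
    return [(members[0], other) for other in members[1:]]


edges = star_edges(frozenset({"docs:intro", "wiki:Intro", "notes:ns:intro"}))
```

A group of n members yields n - 1 edges instead of the n(n - 1)/2 a full clique would need.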


@@ -0,0 +1,112 @@
"""I-2 verification — digest + background consistency-checker (SHARD-WP-0011 T3).
``derived = f(canonical)`` is made *verified*, not asserted. A **Merkle-style digest** summarizes
the derived tier (each identity's content fingerprint + its incident equivalence edges as a leaf,
order-independently combined into a root), so equal derived states always produce equal digests and
a digest mismatch proves drift (matching digests imply equal states up to hash collision).
A **consistency-checker** recomputes the authoritative fold from the current source, compares it to
the maintained index over a (sampled) region, and on mismatch performs a **scoped recompute** of
just the affected identities — self-healing drift from a missed delta or corrupted state.
The digest is a pure function of index state, so it is "maintained alongside deltas" for free and
is stable under equivalent event orders (leaves are sorted before combination).
"""
from __future__ import annotations
import hashlib
from collections.abc import Callable, Iterable
from dataclasses import dataclass
from shard_wiki.incremental.equivalence import EquivalenceIndex
from shard_wiki.model import Identity, Page
__all__ = ["region_digest", "derived_digest", "ConsistencyReport", "ConsistencyChecker"]
CuratorEdges = Iterable[tuple[Identity, Identity]]
def _leaf(index: EquivalenceIndex, identity: Identity) -> str:
"""A digest leaf for one identity: its fingerprint + its incident edges (as sorted peers)."""
fingerprint = index.fingerprint(identity) or ""
peers = sorted(
str(other)
for edge in index.edges()
if identity in edge
for other in edge
if other != identity
)
payload = f"{identity}|{fingerprint}|{','.join(peers)}"
return hashlib.blake2b(payload.encode("utf-8"), digest_size=16).hexdigest()
def region_digest(index: EquivalenceIndex, identities: Iterable[Identity]) -> str:
"""A Merkle-style root over the given identities' leaves (order-independent)."""
leaves = sorted(_leaf(index, identity) for identity in identities)
root = hashlib.blake2b(digest_size=16)
for leaf in leaves:
root.update(leaf.encode("utf-8"))
return root.hexdigest()
def derived_digest(index: EquivalenceIndex) -> str:
"""The digest of the whole maintained derived tier."""
return region_digest(index, index.identities())
@dataclass(frozen=True, slots=True)
class ConsistencyReport:
"""Outcome of a consistency check: what was examined, whether it drifted, and if it healed."""
checked: int
drifted: bool
repaired: bool
healthy: bool
class ConsistencyChecker:
"""Compares the maintained index against an authoritative rebuild and repairs drift in place."""
def __init__(
self,
index: EquivalenceIndex,
pages: Callable[[], Iterable[Page]],
curator_edges: Callable[[], CuratorEdges] = lambda: (),
) -> None:
self._index = index
self._pages = pages
self._curator = curator_edges
def _authoritative(self) -> EquivalenceIndex:
expected = EquivalenceIndex(
num_bands=self._index.num_bands,
threshold=self._index.threshold,
hasher=self._index._hasher,  # reuse the same MinHash permutations so LSH candidates match
)
expected.build(list(self._pages()), list(self._curator()))
return expected
def check_and_repair(self, sample: Iterable[Identity] | None = None) -> ConsistencyReport:
"""Verify the (sampled) region against a from-scratch fold; scoped-recompute on mismatch."""
source = {p.identity: p for p in self._pages()}
expected = self._authoritative()
region = (
set(sample)
if sample is not None
else set(source) | set(self._index.identities())
)
drifted = region_digest(self._index, region) != region_digest(expected, region)
if not drifted:
return ConsistencyReport(len(region), drifted=False, repaired=False, healthy=True)
self._repair(region, source)
healthy = region_digest(self._index, region) == region_digest(expected, region)
return ConsistencyReport(len(region), drifted=True, repaired=True, healthy=healthy)
def _repair(self, region: set[Identity], source: dict[Identity, Page]) -> None:
"""Scoped recompute: reconcile each affected identity to the current source."""
present = self._index.identities()
for identity in region:
page = source.get(identity)
if page is not None:
self._index.update(page) if identity in present else self._index.add(page)
elif identity in present:
self._index.remove(identity)
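The region digest above sorts the leaves before hashing, which is what makes the root independent of iteration order. A minimal standalone sketch of that idea follows. It assumes the leaves are already plain strings (the `_leaf` helper is not shown in this diff), and `order_independent_digest` is a hypothetical name used only for illustration.

```python
import hashlib
from collections.abc import Iterable


def order_independent_digest(leaves: Iterable[str]) -> str:
    """Hash the sorted leaves into one 16-byte BLAKE2b digest (hex-encoded)."""
    root = hashlib.blake2b(digest_size=16)
    for leaf in sorted(leaves):  # sorting removes any dependence on input order
        root.update(leaf.encode("utf-8"))
    return root.hexdigest()


a = order_independent_digest(["x:1", "y:2", "z:3"])
b = order_independent_digest(["z:3", "x:1", "y:2"])  # same leaves, shuffled
c = order_independent_digest(["x:1", "y:2", "z:4"])  # one leaf changed
```

Here `a` and `b` agree while `c` differs. If leaves can vary in length, a separator or length prefix between updates would avoid ambiguous concatenations; fixed-length hex leaves do not need one.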


@@ -8,32 +8,69 @@ a network API is a later workplan.
from __future__ import annotations
from pathlib import Path
from shard_wiki.adapters import ShardAdapter, assert_conformant
from shard_wiki.coordination import (
ApplyResult,
DecisionLog,
EventStore,
EventType,
GitEventStore,
Overlay,
OverlayEngine,
)
from shard_wiki.incremental import ConsistencyReport, UnionIndex
from shard_wiki.model import Page
from shard_wiki.policy import DEFAULT_POLICY, Policy
from shard_wiki.union import Resolution, UnionGraph
from shard_wiki.views import (
AllPagesEntry,
BackLink,
ChangeEntry,
SiteMapNode,
all_pages,
build_backlinks,
recent_changes,
site_map,
)
__all__ = ["InformationSpace"]
class InformationSpace:
def __init__(self, space_id: str, policy: Policy = DEFAULT_POLICY) -> None:
def __init__(
self,
space_id: str,
policy: Policy = DEFAULT_POLICY,
*,
store: EventStore | None = None,
) -> None:
"""Tie the slice together. ``store`` selects the coordination-log backend: the default
in-memory store (tests) or a git-addressable one. Use :meth:`git_backed` for the latter."""
self.space_id = space_id
self.log = DecisionLog()
self.log = DecisionLog(store)
self.union = UnionGraph(space_id, log=self.log, policy=policy)
self.overlays = OverlayEngine(space_id, self.log)
self._index: UnionIndex | None = None # maintained derived tier, built lazily
self._index_stale = True
@classmethod
def git_backed(
cls,
space_id: str,
repo_path: str | Path,
policy: Policy = DEFAULT_POLICY,
) -> InformationSpace:
"""An information space whose coordination log is git-addressable (history/patch/review/
backup — I-6). The decision log lives in the git repo at ``repo_path``."""
return cls(space_id, policy, store=GitEventStore(repo_path))
def attach(self, adapter: ShardAdapter) -> None:
"""Attach a shard — only if it passes conformance (verified profile, I-3/§6.6)."""
assert_conformant(adapter)
self.union.attach(adapter)
self._index_stale = True
def alias(self, name: str, target: str, actor: str | None = None) -> None:
"""Record a coordination-canonical alias (``name`` → ``"shard:key"``) in the log."""
@@ -68,4 +105,44 @@ class InformationSpace:
write-through-capable target fast-forwards (write-through); a read-only target keeps the
draft as local truth (I-5: overlay before mutation, always)."""
overlay = self.overlay(name, body, actor=actor)
return self.apply_overlay(overlay.overlay_id)
result = self.apply_overlay(overlay.overlay_id)
self._index_stale = True # the applied edit changes the derived tier
return result
# --- maintained derived tier (SHARD-WP-0011): incremental-first, rebuild as fallback ---
@property
def index(self) -> UnionIndex:
"""The maintained equivalence index (built lazily; rebuilt when the union has changed)."""
if self._index is None:
self._index = UnionIndex(self.union, self.log, self.space_id)
elif self._index_stale:
self._index.rebuild() # bounded fallback after a mutation
self._index_stale = False
return self._index
def reindex(self) -> None:
"""Force a full rebuild of the maintained derived tier (the explicit fallback path)."""
self.index.rebuild()
def verify_index(self) -> ConsistencyReport:
"""Run the I-2 consistency-checker over the maintained tier; self-heal any drift."""
return self.index.verify()
# --- derived views (SHARD-WP-0010): recomputable, provenance-carrying, presentation-free ---
def backlinks(self, name: str, *, camelcase: bool = False) -> tuple[BackLink, ...]:
"""Pages across the union that link to ``name`` (UC-18)."""
return build_backlinks(self.union, camelcase=camelcase).to(name)
def recent_changes(self, *, limit: int | None = None) -> tuple[ChangeEntry, ...]:
"""The merged newest-first change feed: coordination journal + shard signals (UC-17)."""
return recent_changes(self.union, self.log, self.space_id, limit=limit)
def all_pages(self) -> tuple[AllPagesEntry, ...]:
"""The union's distinct pages, collapsed via the maintained equivalence index."""
return all_pages(self.union, equivalence_groups=self.index.equivalence_groups())
def site_map(self) -> SiteMapNode:
"""The union namespace tree built from page placements."""
return site_map(self.union)
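The `index` property above follows a build-lazily, rebuild-when-stale pattern: mutations only flip a flag, and the cost of recomputation is paid on the next read. A rough sketch of that pattern in isolation, using a hypothetical `LazyDerived` class rather than the real `UnionIndex`:

```python
from __future__ import annotations


class LazyDerived:
    """Hypothetical stand-in for a derived tier: built on first read, rebuilt when stale."""

    def __init__(self, source: list[str]) -> None:
        self._source = source
        self._cache: frozenset[str] | None = None
        self._stale = True
        self.builds = 0  # counts full recomputations, for illustration only

    def mutate(self, item: str) -> None:
        self._source.append(item)
        self._stale = True  # any mutation invalidates the derived tier

    @property
    def view(self) -> frozenset[str]:
        if self._cache is None or self._stale:
            self._cache = frozenset(self._source)
            self.builds += 1
            self._stale = False
        return self._cache


d = LazyDerived(["a"])
first = d.view
again = d.view  # no rebuild: nothing changed since the last read
d.mutate("b")
after = d.view  # rebuilt once, lazily, on the next read
```

Two reads with no mutation in between share one build; the mutation triggers exactly one more.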


@@ -13,6 +13,7 @@ imported by nothing.
from __future__ import annotations
import dataclasses
from collections.abc import Iterator
from dataclasses import dataclass
from enum import Enum
@@ -68,6 +69,20 @@ class UnionGraph:
def shard(self, shard_id: str) -> ShardAdapter | None:
return next((s for s in self._shards if s.shard_id == shard_id), None)
@property
def shards(self) -> tuple[ShardAdapter, ...]:
return tuple(self._shards)
def iter_pages(self) -> Iterator[Page]:
"""Every page across attached shards, raw (per-shard, not chorus-collapsed). The
enumeration substrate for derived views — BackLinks, AllPages, SiteMap (§8.4)."""
for shard in self._shards:
for key in shard.keys():
try:
yield shard.read(key)
except KeyError:
continue
def _read_all(self, key: str) -> list[Page]:
pages: list[Page] = []
for shard in self._shards:


@@ -0,0 +1,33 @@
"""views/ — derived, recomputable, provenance-carrying read views over the union (§8.4).
All views here are *derived tier*: pure functions of the attached shards plus the coordination-log
fold, storing nothing canonical (SHARD-WP-0011 makes them incrementally maintainable). Presentation
stays out of core (L6) — these produce models, never rendered output. Per the dependency rule this
package imports down (union/model/coordination/provenance) and is imported only by the orchestrator.
"""
from shard_wiki.views.allpages import AllPagesEntry, SiteMapNode, all_pages, site_map
from shard_wiki.views.backlinks import BackLink, BackLinksIndex, build_backlinks
from shard_wiki.views.links import (
ResolvedLink,
WikiLink,
extract_links,
resolve_links,
)
from shard_wiki.views.recentchanges import ChangeEntry, recent_changes
__all__ = [
"WikiLink",
"ResolvedLink",
"extract_links",
"resolve_links",
"BackLink",
"BackLinksIndex",
"build_backlinks",
"ChangeEntry",
"recent_changes",
"AllPagesEntry",
"SiteMapNode",
"all_pages",
"site_map",
]


@@ -0,0 +1,131 @@
"""AllPages + SiteMap — enumeration views over the union (SHARD-WP-0010 T4).
**AllPages** lists the union's distinct pages, collapsing identities that name the same page: a
*chorus* (same key across shards) and *equivalence-bound* identities (decision-log bindings) fold
into one entry, with divergence noted when the members' bodies differ (union without erasure — the
collapse is acknowledged, never silent). **SiteMap** is the namespace tree built from page
placements (paths), spanning shards.
Both are derived/recomputable and presentation-free (the tree is a model, not rendered HTML).
"""
from __future__ import annotations
from dataclasses import dataclass
from shard_wiki.model import Identity, Page
from shard_wiki.union import UnionGraph
__all__ = ["AllPagesEntry", "SiteMapNode", "all_pages", "site_map"]
@dataclass(frozen=True, slots=True)
class AllPagesEntry:
"""One union page: its representative ``name``, the ``members`` collapsed into it, and whether
those members' bodies differ (``diverges``: a chorus with differing content)."""
name: str
members: tuple[Identity, ...]
diverges: bool
@dataclass(frozen=True, slots=True)
class SiteMapNode:
"""A namespace node: its path ``name``, child namespaces, and pages directly under it."""
name: str
children: tuple[SiteMapNode, ...]
pages: tuple[Identity, ...]
class _UnionFind:
def __init__(self) -> None:
self._parent: dict[str, str] = {}
def add(self, x: str) -> None:
self._parent.setdefault(x, x)
def find(self, x: str) -> str:
self.add(x)
root = x
while self._parent[root] != root:
root = self._parent[root]
while self._parent[x] != root:
self._parent[x], x = root, self._parent[x]
return root
def union(self, a: str, b: str) -> None:
self.add(a)
self.add(b)
ra, rb = self.find(a), self.find(b)
if ra != rb:
self._parent[max(ra, rb)] = min(ra, rb)
def all_pages(
union: UnionGraph,
equivalence_groups: tuple[frozenset[str], ...] | None = None,
) -> tuple[AllPagesEntry, ...]:
"""Enumerate the union's distinct pages, collapsing chorus + equivalence-bound members.
``equivalence_groups`` (string identities, decision-log form) overrides the source of
equivalence — the orchestrator passes the maintained index's groups (SHARD-WP-0011 T4); the
default falls back to the decision-log fold, so direct callers are unaffected.
"""
pages: dict[str, Page] = {}
by_key: dict[str, list[str]] = {}
for page in union.iter_pages():
ident = str(page.identity)
pages[ident] = page
by_key.setdefault(page.identity.key, []).append(ident)
uf = _UnionFind()
for ident in pages:
uf.add(ident)
for idents in by_key.values(): # same key across shards → chorus
for other in idents[1:]:
uf.union(idents[0], other)
if equivalence_groups is None:
equivalence_groups = union.log.fold(union.space).equivalence_groups
for group in equivalence_groups: # curator bindings (+ maintained content edges)
present = [m for m in group if m in pages]
for other in present[1:]:
uf.union(present[0], other)
groups: dict[str, list[str]] = {}
for ident in pages:
groups.setdefault(uf.find(ident), []).append(ident)
entries: list[AllPagesEntry] = []
for members in groups.values():
member_pages = [pages[m] for m in members]
identities = tuple(p.identity for p in member_pages)
name = min(p.identity.key for p in member_pages)
diverges = len({p.body for p in member_pages}) > 1
entries.append(AllPagesEntry(name=name, members=identities, diverges=diverges))
return tuple(sorted(entries, key=lambda e: e.name))
def _segments(page: Page) -> list[str]:
path = page.placements[0].path if page.placements else page.identity.key
if path.endswith(".md"):
path = path[:-3]
return [seg for seg in path.split("/") if seg]
def site_map(union: UnionGraph) -> SiteMapNode:
"""The union namespace tree from page placements (directories nest; pages sit at their dir)."""
root: dict = {"children": {}, "pages": []}
for page in union.iter_pages():
segments = _segments(page)
node = root
for seg in segments[:-1]: # directory segments build the nesting
node = node["children"].setdefault(seg, {"children": {}, "pages": []})
node["pages"].append(page.identity)
return _freeze("", root)
def _freeze(name: str, node: dict) -> SiteMapNode:
children = tuple(_freeze(k, v) for k, v in sorted(node["children"].items()))
pages = tuple(sorted(node["pages"], key=str))
return SiteMapNode(name=name, children=children, pages=pages)
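The collapse performed by `all_pages` can be sketched without the rest of the package. The version below assumes identities are plain ``"shard:key"`` strings and that equivalence groups arrive as sets of such strings; `group_identities` is a hypothetical helper, not part of shard_wiki.

```python
from __future__ import annotations


def group_identities(
    identities: list[str],
    equivalence_groups: list[set[str]],
) -> list[list[str]]:
    """Collapse 'shard:key' identities that share a key or sit in one equivalence group."""
    parent = {ident: ident for ident in identities}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # deterministic root: smallest string

    by_key: dict[str, list[str]] = {}
    for ident in identities:
        by_key.setdefault(ident.split(":", 1)[1], []).append(ident)
    for same_key in by_key.values():  # chorus: same key across shards
        for other in same_key[1:]:
            union(same_key[0], other)
    for group in equivalence_groups:  # explicit bindings; unknown members are ignored
        present = sorted(m for m in group if m in parent)
        for other in present[1:]:
            union(present[0], other)

    groups: dict[str, list[str]] = {}
    for ident in identities:
        groups.setdefault(find(ident), []).append(ident)
    return sorted(sorted(members) for members in groups.values())


result = group_identities(
    ["a:Home", "b:Home", "a:Start", "b:Other"],
    [{"a:Start", "b:Home"}, {"b:Other", "c:Missing"}],
)
```

The chorus on `Home` and the binding between `a:Start` and `b:Home` merge into one group; `b:Other` stays alone because its only partner is not present in the union.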


@@ -0,0 +1,65 @@
"""BackLinks — the strongest core derived view (SHARD-WP-0010 T2; UC-18).
For any page name, the set of pages that link to it. Built by extracting wikilinks (T1) from every
page across the attached shards and resolving each through the union: only **resolved** links
create a backlink (a red-link points at nothing, so it contributes none). Entries carry their
**source provenance** (the linking page's identity / shard). Keying by the resolved *name* means a
chorus target aggregates the backlinks of all its members into one bucket (union without erasure).
Derived/recomputable — stores nothing canonical; SHARD-WP-0011 maintains it incrementally.
"""
from __future__ import annotations
from collections.abc import Mapping
from dataclasses import dataclass
from shard_wiki.model import Identity
from shard_wiki.union import UnionGraph
from shard_wiki.views.links import resolve_links
__all__ = ["BackLink", "BackLinksIndex", "build_backlinks"]
@dataclass(frozen=True, slots=True)
class BackLink:
"""One inbound link: ``source`` (the linking page) references ``target_name``."""
source: Identity
target_name: str
@property
def source_shard(self) -> str:
return self.source.shard
class BackLinksIndex:
"""An immutable name → inbound-links index over the union link graph."""
def __init__(self, edges: Mapping[str, tuple[BackLink, ...]]) -> None:
self._edges = dict(edges)
def to(self, name: str) -> tuple[BackLink, ...]:
"""The backlinks pointing at ``name`` (empty if none)."""
return self._edges.get(name, ())
def sources(self, name: str) -> frozenset[Identity]:
"""Just the identities linking to ``name`` — convenient for set assertions."""
return frozenset(bl.source for bl in self.to(name))
def names(self) -> frozenset[str]:
return frozenset(self._edges)
def build_backlinks(union: UnionGraph, *, camelcase: bool = False) -> BackLinksIndex:
"""Scan every union page's links and index the resolved ones by target name."""
edges: dict[str, set[BackLink]] = {}
for page in union.iter_pages():
for resolved in resolve_links(union, page.body, camelcase=camelcase):
if resolved.is_red_link:
continue # red-links don't create backlinks
backlink = BackLink(source=page.identity, target_name=resolved.link.target)
edges.setdefault(resolved.link.target, set()).add(backlink)
return BackLinksIndex(
{name: tuple(sorted(links, key=lambda bl: str(bl.source))) for name, links in edges.items()}
)
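As a rough illustration of the rule that only resolved links create backlinks, the sketch below builds an inverted link index over a plain dict of page bodies. It assumes a link resolves exactly when its target is a key of that dict, which is much simpler than union resolution; `LINK` and `backlinks` are hypothetical names.

```python
from __future__ import annotations

import re

# Matches [[Target]] and [[Target|label]]; group 1 is the target.
LINK = re.compile(r"\[\[\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")


def backlinks(pages: dict[str, str]) -> dict[str, tuple[str, ...]]:
    """Map each existing page name to the sorted names of the pages linking to it."""
    inbound: dict[str, set[str]] = {}
    for source, body in pages.items():
        for match in LINK.finditer(body):
            target = match.group(1)
            if target not in pages:
                continue  # red-link: points at nothing, so it contributes no backlink
            inbound.setdefault(target, set()).add(source)
    return {name: tuple(sorted(srcs)) for name, srcs in inbound.items()}


index = backlinks(
    {
        "Home": "See [[Guide]] and [[Missing]].",
        "Guide": "Back to [[Home|the start]]; again [[Home]].",
        "Notes": "Also [[Guide]].",
    }
)
```

Collecting sources in a set means a page that links to the same target twice still yields a single backlink, and `Missing` never appears as a key.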


@@ -0,0 +1,91 @@
"""Wikilink + red-link model (SHARD-WP-0010 T1; FederationRequirements ADR-06).
A CommonMark *wikilink extension*: ``[[Target]]`` and ``[[Target|label]]`` are extracted from a
page body and each target is resolved through the union (ADR-01). A target that resolves is a
**link**; one that does not is a **red-link** — a createable hole (UC-23), never a dropped
reference (union without erasure). CamelCase auto-linking (``WikiWord``) is **off by default** and
opt-in per space, since bare CamelCase is noisy and policy-laden.
The link *model and resolution* are core; turning a :class:`ResolvedLink` into an ``<a>`` (or a
red anchor) is L6 presentation and lives outside this package. Link spans are character offsets
into the body ``str`` (not byte offsets) so a later layer can address them precisely.
"""
from __future__ import annotations
import re
from dataclasses import dataclass
from shard_wiki.union import Resolution, UnionGraph
__all__ = ["WikiLink", "ResolvedLink", "extract_links", "resolve_links"]
_WIKILINK_RE = re.compile(r"\[\[\s*([^\]|]+?)\s*(?:\|\s*([^\]]+?)\s*)?\]\]")
# A WikiWord: ≥2 capitalized alphanumeric segments run together (e.g. FrontPage, WikiWord).
_CAMELCASE_RE = re.compile(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+)\b")
_FENCED_RE = re.compile(r"```.*?```", re.DOTALL)
_INLINE_CODE_RE = re.compile(r"`[^`\n]*`")
@dataclass(frozen=True, slots=True)
class WikiLink:
"""One extracted reference. ``target`` is the resolve key; ``label`` is the display text (or
None to use the target); ``span`` is the ``[start, end)`` offset of the whole token in the body;
``auto`` marks a CamelCase auto-link (vs an explicit ``[[...]]``)."""
target: str
label: str | None
span: tuple[int, int]
auto: bool = False
@property
def text(self) -> str:
return self.label or self.target
@dataclass(frozen=True, slots=True)
class ResolvedLink:
"""A :class:`WikiLink` paired with its union :class:`Resolution` (the link's truth status)."""
link: WikiLink
resolution: Resolution
@property
def is_red_link(self) -> bool:
return self.resolution.is_red_link
def _mask(body: str, pattern: re.Pattern[str]) -> str:
"""Blank out ``pattern`` matches with equal-length spaces so later scans skip them while every
surviving match keeps its true offset."""
return pattern.sub(lambda m: " " * len(m.group(0)), body)
def extract_links(body: str, *, camelcase: bool = False) -> tuple[WikiLink, ...]:
"""Extract wikilinks from ``body`` in document order, skipping fenced/inline code.
With ``camelcase=True`` (per-space opt-in), bare ``WikiWord`` tokens outside code and outside
existing ``[[...]]`` also become links.
"""
scan = _mask(_mask(body, _FENCED_RE), _INLINE_CODE_RE)
links: list[WikiLink] = []
for m in _WIKILINK_RE.finditer(scan):
links.append(WikiLink(target=m.group(1).strip(), label=m.group(2), span=m.span()))
if camelcase:
# Mask explicit-link spans too, so a CamelCase target inside [[...]] isn't double-counted.
cc_scan = _mask(scan, _WIKILINK_RE)
for m in _CAMELCASE_RE.finditer(cc_scan):
links.append(WikiLink(target=m.group(1), label=None, span=m.span(), auto=True))
return tuple(sorted(links, key=lambda link: link.span[0]))
def resolve_links(
union: UnionGraph, body: str, *, camelcase: bool = False
) -> tuple[ResolvedLink, ...]:
"""Extract and resolve every link in ``body`` against ``union`` (link vs red-link, ADR-01)."""
return tuple(
ResolvedLink(link, union.resolve(link.target))
for link in extract_links(body, camelcase=camelcase)
)
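The equal-length masking used by `extract_links` is easiest to see on a small input. This sketch reuses the wikilink and inline-code patterns from the module above on a hypothetical body string; the variable names are illustrative only.

```python
from __future__ import annotations

import re

WIKILINK = re.compile(r"\[\[\s*([^\]|]+?)\s*(?:\|\s*([^\]]+?)\s*)?\]\]")
INLINE_CODE = re.compile(r"`[^`\n]*`")


def mask(body: str, pattern: re.Pattern[str]) -> str:
    """Replace each match with spaces of the same length, keeping later offsets intact."""
    return pattern.sub(lambda m: " " * len(m.group(0)), body)


body = "Use `[[NotALink]]` but see [[Home|start]]."
scan = mask(body, INLINE_CODE)  # the code span is blanked, not removed
found = [(m.group(1), m.group(2), m.span()) for m in WIKILINK.finditer(scan)]
start, end = found[0][2]
```

Because the masked text has the same length as the original, the span found in `scan` slices the matching token out of `body` unchanged.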


@@ -0,0 +1,108 @@
"""RecentChanges — a merged change feed over the union (SHARD-WP-0010 T3; UC-17).
Two streams, one ordered feed (newest-first):
* the **coordination journal** — overlay/alias/fork/merge/binding decisions from the decision log,
each carrying its actor and the decision payload; and
* **shard change signals** — a page's current revision (folder mtime / ``source_rev``), i.e. the
backend's own "this changed" evidence.
Every entry carries provenance: which shard the edit came from, or that it was a coordination
decision (and by whom). Derived/recomputable; ``notify``-driven streaming is a later binding.
"""
from __future__ import annotations
from collections.abc import Mapping
from dataclasses import dataclass, field
from datetime import datetime
from shard_wiki.coordination import DecisionLog, EventType
from shard_wiki.union import UnionGraph
__all__ = ["ChangeEntry", "recent_changes"]
_COORDINATION = "coordination"
# How each journal event names the thing it touched + a human kind label.
_EVENT_KIND = {
EventType.ALIAS_SET: "alias",
EventType.OVERLAY_CREATED: "overlay",
EventType.MERGE_DECIDED: "merge",
EventType.PAGE_FORKED: "fork",
EventType.BINDING_MADE: "binding",
}
@dataclass(frozen=True, slots=True)
class ChangeEntry:
"""One change in the feed. ``source`` is the shard id (a shard edit) or ``"coordination"``."""
when: datetime
kind: str
ref: str
source: str
actor: str | None = None
detail: Mapping[str, object] = field(default_factory=dict)
def _event_ref(event_type: EventType, payload: Mapping[str, object]) -> str:
if event_type is EventType.ALIAS_SET:
return str(payload.get("alias", ""))
if event_type is EventType.OVERLAY_CREATED:
return f"{payload.get('target_shard')}:{payload.get('target_key')}"
if event_type is EventType.PAGE_FORKED:
return f"{payload.get('source')}→{payload.get('fork')}"
if event_type is EventType.BINDING_MADE:
return ", ".join(str(m) for m in payload.get("members", ()))
return str(payload.get("overlay_id", "")) # MERGE_DECIDED
def recent_changes(
union: UnionGraph,
log: DecisionLog,
space: str,
*,
limit: int | None = None,
) -> tuple[ChangeEntry, ...]:
"""Merge the coordination journal and shard change signals into one newest-first feed."""
entries: list[ChangeEntry] = []
for event in log.events(space):
entries.append(
ChangeEntry(
when=event.timestamp,
kind=_EVENT_KIND.get(event.type, event.type.value),
ref=_event_ref(event.type, event.payload),
source=_COORDINATION,
actor=event.actor,
detail=dict(event.payload),
)
)
for page in union.iter_pages():
rev = page.envelope.source_rev
when = _parse_rev(rev)
if when is None:
continue # shard offers no change signal for this page — skip gracefully
entries.append(
ChangeEntry(
when=when,
kind="edit",
ref=str(page.identity),
source=page.identity.shard,
detail={"source_rev": rev},
)
)
entries.sort(key=lambda e: e.when, reverse=True)
return tuple(entries if limit is None else entries[:limit])
def _parse_rev(rev: str | None) -> datetime | None:
if rev is None:
return None
try:
return datetime.fromisoformat(rev)
except ValueError:
return None # non-temporal revision token (e.g. a content hash) — no feed timestamp
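The merge step can be sketched with plain tuples. One caveat worth illustrating: Python refuses to compare timezone-aware and naive datetimes, so a feed that mixes both would fail at sort time. Whether the real journal and shard timestamps differ in that way is not visible in this diff; the UTC normalization below is an assumption, and `parse_rev` here is a hypothetical variant of `_parse_rev`.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_rev(rev: str | None) -> datetime | None:
    """Return a timezone-aware timestamp for ISO-8601 revisions, else None."""
    if rev is None:
        return None
    try:
        when = datetime.fromisoformat(rev)
    except ValueError:
        return None  # e.g. a commit sha: no temporal signal
    # Assumption: naive timestamps are UTC. Normalizing keeps the sort comparable.
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


journal = [(datetime(2026, 6, 16, 2, 0, tzinfo=timezone.utc), "alias Home")]
shard_revs = {
    "a:Home": "2026-06-16T03:00:00",        # naive ISO timestamp
    "g:Guide": "def699c1eb",                # commit sha, skipped
    "a:Old": "2026-06-15T00:00:00+00:00",   # aware ISO timestamp
}

feed = list(journal)
for ref, rev in shard_revs.items():
    when = parse_rev(rev)
    if when is not None:
        feed.append((when, f"edit {ref}"))
feed.sort(key=lambda entry: entry[0], reverse=True)  # newest first
labels = [label for _, label in feed]
```

The sha-valued revision drops out of the feed, and the remaining entries interleave journal and shard signals by time.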


@@ -0,0 +1,120 @@
"""Tests for the per-space append authority / lease (SHARD-WP-0009 T2).
A single append authority per space serializes appends into a total order; non-holders forward
intents to the holder; the lease is time-bounded and re-grantable (HA hand-off); a stale ex-holder
cannot fork the log.
"""
from datetime import datetime, timedelta, timezone
import pytest
from shard_wiki.coordination import (
AppendAuthority,
EventType,
GitEventStore,
InMemoryEventStore,
LeaseHeld,
LeaseRegistry,
)
class FakeClock:
def __init__(self):
self.now = datetime(2026, 1, 1, tzinfo=timezone.utc)
def __call__(self):
return self.now
def advance(self, seconds):
self.now += timedelta(seconds=seconds)
def test_only_one_node_holds_a_space_at_a_time():
reg = LeaseRegistry()
a = AppendAuthority("A", InMemoryEventStore(), reg)
b = AppendAuthority("B", InMemoryEventStore(), reg)
a.acquire("s")
with pytest.raises(LeaseHeld):
b.acquire("s") # B is refused while A's lease is valid
def test_concurrent_appends_serialize_into_one_total_order():
reg = LeaseRegistry()
store = InMemoryEventStore()
a = AppendAuthority("A", store, reg)
b = AppendAuthority("B", store, reg)
a.acquire("s")
# B is a non-holder: its append forwards to A, the holder. Interleave A and B writers.
a.append("s", EventType.ALIAS_SET, {"alias": "1", "target": "x:1"})
b.append("s", EventType.ALIAS_SET, {"alias": "2", "target": "x:2"}) # forwarded
a.append("s", EventType.ALIAS_SET, {"alias": "3", "target": "x:3"})
seqs = [e.seq for e in store.events("s")]
aliases = [e.payload["alias"] for e in store.events("s")]
assert seqs == [0, 1, 2] # contiguous total order despite two writers
assert aliases == ["1", "2", "3"]
def test_non_holder_forwards_rather_than_writing_directly():
reg = LeaseRegistry()
store = InMemoryEventStore()
a = AppendAuthority("A", store, reg)
b = AppendAuthority("B", store, reg)
a.acquire("s")
assert not b.holds("s")
b.append("s", EventType.ALIAS_SET, {"alias": "fwd", "target": "x:1"})
# The write landed on the shared store under A's authority, in one stream.
assert [e.payload["alias"] for e in store.events("s")] == ["fwd"]
def test_lease_handoff_resumes_from_head():
clock = FakeClock()
reg = LeaseRegistry(clock=clock)
store = InMemoryEventStore()
a = AppendAuthority("A", store, reg, ttl_seconds=10)
b = AppendAuthority("B", store, reg, ttl_seconds=10)
a.acquire("s")
a.append("s", EventType.ALIAS_SET, {"alias": "0", "target": "x:0"})
a.append("s", EventType.ALIAS_SET, {"alias": "1", "target": "x:1"})
clock.advance(20) # A's lease expires (A "dies")
b.acquire("s") # re-grantable: B takes over
b.append("s", EventType.ALIAS_SET, {"alias": "2", "target": "x:2"})
assert [e.seq for e in store.events("s")] == [0, 1, 2] # contiguous across hand-off
def test_stale_ex_holder_cannot_fork_the_log():
clock = FakeClock()
reg = LeaseRegistry(clock=clock)
store = InMemoryEventStore()
a = AppendAuthority("A", store, reg, ttl_seconds=10)
b = AppendAuthority("B", store, reg, ttl_seconds=10)
a.acquire("s")
a.append("s", EventType.ALIAS_SET, {"alias": "0", "target": "x:0"})
clock.advance(20)
b.acquire("s") # B is now the holder; A's lease is stale
b.append("s", EventType.ALIAS_SET, {"alias": "1", "target": "x:1"})
# A still thinks it can write, but it's no longer the holder: its intent forwards to B.
assert not a.holds("s")
a.append("s", EventType.ALIAS_SET, {"alias": "2", "target": "x:2"})
aliases = [e.payload["alias"] for e in store.events("s")]
assert aliases == ["0", "1", "2"] # one stream, no fork
def test_authority_over_git_store_keeps_total_order(tmp_path):
reg = LeaseRegistry()
store = GitEventStore(tmp_path / "coord")
a = AppendAuthority("A", store, reg)
b = AppendAuthority("B", store, reg)
a.acquire("s")
a.append("s", EventType.BINDING_MADE, {"members": ["a", "b"]})
b.append("s", EventType.PAGE_FORKED, {"source": "a", "fork": "c"}) # forwarded
assert [e.seq for e in store.events("s")] == [0, 1]
def test_unleased_space_self_acquires_on_append():
reg = LeaseRegistry()
store = InMemoryEventStore()
a = AppendAuthority("A", store, reg)
a.append("s", EventType.ALIAS_SET, {"alias": "x", "target": "y:1"}) # no explicit acquire
assert a.holds("s")
assert len(store.events("s")) == 1
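The tests above rely on a time-bounded, re-grantable lease driven by an injectable clock. The real `LeaseRegistry` is not shown in this diff, so the following is only a toy model of the behaviour the tests describe; `ToyLeaseRegistry` and `LeaseHeldError` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Callable


class LeaseHeldError(Exception):
    """Raised when another node still holds a valid lease (hypothetical name)."""


class ToyLeaseRegistry:
    """Hypothetical single-holder lease table with time-bounded grants."""

    def __init__(self, clock: Callable[[], datetime]) -> None:
        self._clock = clock
        self._leases: dict[str, tuple[str, datetime]] = {}  # space -> (holder, expiry)

    def holder(self, space: str) -> str | None:
        lease = self._leases.get(space)
        if lease is None or lease[1] <= self._clock():
            return None  # no lease, or it has expired
        return lease[0]

    def acquire(self, space: str, node: str, ttl_seconds: float) -> None:
        current = self.holder(space)
        if current is not None and current != node:
            raise LeaseHeldError(f"{space} is held by {current}")
        self._leases[space] = (node, self._clock() + timedelta(seconds=ttl_seconds))


now = [datetime(2026, 1, 1, tzinfo=timezone.utc)]
registry = ToyLeaseRegistry(clock=lambda: now[0])
registry.acquire("s", "A", ttl_seconds=10)
try:
    registry.acquire("s", "B", ttl_seconds=10)
    refused = False
except LeaseHeldError:
    refused = True
now[0] += timedelta(seconds=20)  # A's lease expires
registry.acquire("s", "B", ttl_seconds=10)
```

Injecting the clock, as `FakeClock` does in the tests, lets expiry be exercised deterministically without sleeping.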


@@ -0,0 +1,74 @@
"""Migration + wiring of the git coordination backend (SHARD-WP-0009 T4)."""
from shard_wiki.coordination import (
DecisionLog,
EventType,
GitEventStore,
InMemoryEventStore,
export_jsonl,
import_jsonl,
migrate_space,
)
from shard_wiki.space import InformationSpace
def test_information_space_git_backed_uses_git_log(tmp_path):
space = InformationSpace.git_backed("space-1", tmp_path / "coord")
assert isinstance(space.log._store, GitEventStore)
space.alias("Home", "shardA:Index")
# Read-your-writes through the orchestrator's git-backed log.
assert space.log.fold("space-1").resolve_alias("Home") == "shardA:Index"
def test_default_information_space_stays_in_memory():
space = InformationSpace("space-1")
assert isinstance(space.log._store, InMemoryEventStore)
def test_migrate_space_preserves_order_and_provenance(tmp_path):
source = InMemoryEventStore()
e0 = source.append("s", EventType.ALIAS_SET, {"alias": "Home", "target": "x:1"}, actor="ana")
source.append("s", EventType.BINDING_MADE, {"members": ["a", "b"]}, actor="ben")
dest = GitEventStore(tmp_path / "coord")
n = migrate_space(source, "s", dest)
assert n == 2
migrated = dest.events("s")
assert [e.seq for e in migrated] == [0, 1]
# Provenance preserved verbatim — actor and timestamp survive the move (no restamping).
assert migrated[0].actor == "ana"
assert migrated[1].actor == "ben"
assert migrated[0].timestamp == e0.timestamp
def test_migration_yields_identical_fold(tmp_path):
source = DecisionLog(InMemoryEventStore())
for typ, payload in [
(EventType.ALIAS_SET, {"alias": "Home", "target": "x:1"}),
(EventType.BINDING_MADE, {"members": ["a", "b"]}),
(EventType.BINDING_MADE, {"members": ["b", "c"]}),
(EventType.ALIAS_SET, {"alias": "Home", "target": "x:2"}),
]:
source.append("s", typ, payload)
dest = GitEventStore(tmp_path / "coord")
migrate_space(source._store, "s", dest)
after = DecisionLog(dest)
assert after.fold("s").aliases == source.fold("s").aliases
assert after.fold("s").equivalence_groups == source.fold("s").equivalence_groups
def test_jsonl_round_trip_into_git(tmp_path):
source = InMemoryEventStore()
source.append("s", EventType.ALIAS_SET, {"alias": "Home", "target": "x:1"})
source.append("s", EventType.PAGE_FORKED, {"source": "p", "fork": "q"})
path = tmp_path / "log.jsonl"
assert export_jsonl(source.events("s"), path) == 2
dest = GitEventStore(tmp_path / "coord")
assert import_jsonl(path, dest) == 2
state = DecisionLog(dest).fold("s")
assert state.resolve_alias("Home") == "x:1"
assert state.equivalent_to("p") == frozenset({"p", "q"})
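The JSONL round trip tested above amounts to one JSON object per line, written and read in log order with provenance fields left untouched. A minimal sketch under stated assumptions: events are plain dicts, and the field names and type strings below are illustrative rather than the package's actual schema.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def export_events(events: list[dict], path: Path) -> int:
    """Write one JSON object per line; return the number of events written."""
    with path.open("w", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event, sort_keys=True) + "\n")
    return len(events)


def import_events(path: Path) -> list[dict]:
    """Read events back in file order, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


events = [
    {"seq": 0, "type": "alias_set", "actor": "ana",
     "timestamp": "2026-06-16T02:00:00+00:00",
     "payload": {"alias": "Home", "target": "x:1"}},
    {"seq": 1, "type": "binding_made", "actor": "ben",
     "timestamp": "2026-06-16T02:01:00+00:00",
     "payload": {"members": ["a", "b"]}},
]
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.jsonl"
    written = export_events(events, log_path)
    restored = import_events(log_path)
```

Keeping timestamps as strings in the file means the importer restores them verbatim instead of restamping.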


@@ -0,0 +1,83 @@
"""Cross-process read-your-writes over the git log + fold parity (SHARD-WP-0009 T3).
The git backend's value over the in-memory double is that the totally ordered log is durable and
shared: a write by one process/handle is immediately visible to another opening the same ref, and
the derived fold is identical to the in-memory fold of the same event sequence (derived = f(log)).
"""
import os
import subprocess
import sys
import textwrap
from pathlib import Path
from shard_wiki.coordination import (
DecisionLog,
EventType,
GitEventStore,
InMemoryEventStore,
)
_SRC = str(Path(__file__).resolve().parents[1] / "src")
def test_new_handle_sees_prior_writes(tmp_path):
repo = tmp_path / "coord"
writer = DecisionLog(GitEventStore(repo))
writer.append("s", EventType.ALIAS_SET, {"alias": "Home", "target": "shardA:Index"})
writer.append("s", EventType.BINDING_MADE, {"members": ["a", "b"]})
# A second, independent handle on the same repo — read-your-writes across handles.
reader = DecisionLog(GitEventStore(repo))
assert [e.seq for e in reader.events("s")] == [0, 1]
assert reader.fold("s").resolve_alias("Home") == "shardA:Index"
def test_append_in_separate_process_is_visible(tmp_path):
repo = tmp_path / "coord"
# Seed from this process so the repo exists.
DecisionLog(GitEventStore(repo)).append(
"s", EventType.ALIAS_SET, {"alias": "A", "target": "x:1"}
)
child = textwrap.dedent(
f"""
from shard_wiki.coordination import DecisionLog, EventType, GitEventStore
log = DecisionLog(GitEventStore({str(repo)!r}))
log.append("s", EventType.ALIAS_SET, {{"alias": "B", "target": "x:2"}})
"""
)
result = subprocess.run(
[sys.executable, "-c", child],
capture_output=True,
text=True,
env={"PYTHONPATH": _SRC, "PATH": os.environ.get("PATH", "")},
)
assert result.returncode == 0, result.stderr
# This process, with a fresh handle, sees the child's append in order.
reader = DecisionLog(GitEventStore(repo))
assert [e.payload["alias"] for e in reader.events("s")] == ["A", "B"]
assert [e.seq for e in reader.events("s")] == [0, 1]
def test_cross_process_fold_equals_in_memory_fold(tmp_path):
sequence = [
(EventType.ALIAS_SET, {"alias": "Home", "target": "shardA:Index"}),
(EventType.BINDING_MADE, {"members": ["a", "b"]}),
(EventType.BINDING_MADE, {"members": ["b", "c"]}),
(EventType.PAGE_FORKED, {"source": "p", "fork": "q"}),
(EventType.ALIAS_SET, {"alias": "Home", "target": "shardB:Main"}),
]
mem = DecisionLog(InMemoryEventStore())
for typ, payload in sequence:
mem.append("s", typ, payload)
repo = tmp_path / "coord"
DecisionLog(GitEventStore(repo)) # init repo
for typ, payload in sequence:
# Each append from a fresh handle to simulate distinct writers over time.
DecisionLog(GitEventStore(repo)).append("s", typ, payload)
git_state = DecisionLog(GitEventStore(repo)).fold("s")
mem_state = mem.fold("s")
assert git_state.aliases == mem_state.aliases
assert git_state.equivalence_groups == mem_state.equivalence_groups
assert git_state.equivalent_to("a") == frozenset({"a", "b", "c"})
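The cross-process test above drives a child interpreter with `sys.executable -c` and a dedented script. The same mechanics can be tried without the git backend; this sketch substitutes a plain append-only text file for the log, which is an assumption made purely to keep the example self-contained.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "log.txt"
    log_path.write_text("A\n", encoding="utf-8")  # seed from the parent process

    # The child appends one line; repr() embeds the path as a valid string literal.
    child = textwrap.dedent(
        f"""
        with open({str(log_path)!r}, "a", encoding="utf-8") as fh:
            fh.write("B\\n")
        """
    )
    result = subprocess.run([sys.executable, "-c", child], capture_output=True, text=True)

    # A fresh read in the parent sees the child's append, in order.
    lines = log_path.read_text(encoding="utf-8").splitlines()
```

Unlike the test, this sketch lets the child inherit the parent's environment; the test passes a minimal `env` so the child imports the package from the source tree.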

tests/test_git_adapter.py Normal file

@@ -0,0 +1,131 @@
"""Tests for the GitShardAdapter read path + profile (SHARD-WP-0012 T1)."""
import subprocess
import pytest
from shard_wiki.adapters import GitShardAdapter, run_conformance
from shard_wiki.model import (
AttachmentMode,
History,
NotSupported,
ProfileError,
Substrate,
Verb,
)
def _git(repo, *args):
subprocess.run(
["git", "-C", str(repo), *args],
check=True,
capture_output=True,
env={"GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@t",
"GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@t",
"PATH": __import__("os").environ.get("PATH", "")},
)
def _repo(tmp_path, files, name="repo"):
    repo = tmp_path / name
    repo.mkdir()
    _git(repo, "init", "--quiet")
    for rel, text in files.items():
        p = repo / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
        _git(repo, "add", rel)
    _git(repo, "commit", "-m", "seed")
    return repo


def test_keys_are_tracked_md_paths(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "h", "docs/Guide.md": "g", "ignore.txt": "x"})
    adapter = GitShardAdapter("git", repo)
    assert set(adapter.keys()) == {"Home", "docs/Guide"}  # only tracked *.md


def test_read_returns_page_with_commit_sha_rev(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "welcome"})
    adapter = GitShardAdapter("git", repo)
    page = adapter.read("Home")
    assert page.identity.shard == "git"
    assert page.body == "welcome"
    head = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    assert page.envelope.source_rev == head  # source_rev is the commit sha
    assert page.envelope.lineage == "git-native"


def test_read_missing_key_raises(tmp_path):
    adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"}))
    with pytest.raises(KeyError):
        adapter.read("Nope")


def test_profile_validates_implication_rules(tmp_path):
    profile = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"})).profile()
    assert profile.substrate is Substrate.GIT
    assert profile.attachment_mode is AttachmentMode.GIT_IS_STORE
    assert profile.history is History.GIT_NATIVE  # git-is-store ⟹ git-native
    profile.validate()  # raises if the implication rule were violated


def test_profile_is_read_only_in_t1(tmp_path):
    profile = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"})).profile()
    assert profile.supports(Verb.READ)
    assert not profile.supports(Verb.WRITE)


def test_conformance_read_path_passes(tmp_path):
    adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h", "Other.md": "o"}))
    report = run_conformance(adapter)
    assert report.ok, report.diff()


def test_unclaimed_write_raises_not_supported(tmp_path):
    adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"}))
    with pytest.raises(NotSupported):
        adapter.write("Home", "new")  # read-only: honest absence


def test_empty_repo_has_no_keys(tmp_path):
    repo = tmp_path / "empty"
    repo.mkdir()
    _git(repo, "init", "--quiet")
    adapter = GitShardAdapter("git", repo)
    assert list(adapter.keys()) == []


def test_bad_profile_combo_is_rejected():
    # Sanity: the implication rule that backs the git profile actually bites when violated.
    from shard_wiki.model import (
        AccessGrant,
        Addressing,
        CapabilityProfile,
        ContentOpacity,
        MergeModel,
        NativeQuery,
        OperationalEnvelope,
        Translation,
        WriteGranularity,
    )
    from shard_wiki.provenance import Liveness

    with pytest.raises(ProfileError):
        CapabilityProfile(
            substrate=Substrate.FILES,  # not git, but claims git-is-store
            attachment_mode=AttachmentMode.GIT_IS_STORE,
            write_granularity=WriteGranularity.NONE,
            content_opacity=ContentOpacity.TRANSPARENT,
            operational_envelope=OperationalEnvelope.LOCAL_UNBOUNDED,
            access_grant=AccessGrant.OPEN,
            liveness=Liveness.STATIC,
            history=History.NONE,
            merge_model=MergeModel.NONE,
            addressing=Addressing.PATH,
            native_query=NativeQuery.NONE,
            translation=Translation.NATIVE,
            supported_verbs=frozenset({Verb.READ}),
        ).validate()


@@ -0,0 +1,116 @@
"""GitShardAdapter history adopt + cross-substrate integration (SHARD-WP-0012 T3)."""
import os
import subprocess

import pytest

from shard_wiki.adapters import FolderAdapter, GitShardAdapter
from shard_wiki.coordination import ApplyStatus
from shard_wiki.space import InformationSpace

_ENV = {
    "GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@t",
    "GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@t",
    "PATH": os.environ.get("PATH", ""),
}


def _git(repo, *args):
    return subprocess.run(
        ["git", "-C", str(repo), *args], check=True, capture_output=True, text=True, env=_ENV
    ).stdout.strip()


def _git_repo(tmp_path, files, name="git"):
    repo = tmp_path / name
    repo.mkdir()
    _git(repo, "init", "--quiet")
    for rel, text in files.items():
        (repo / rel).parent.mkdir(parents=True, exist_ok=True)
        (repo / rel).write_text(text, encoding="utf-8")
        _git(repo, "add", rel)
    _git(repo, "commit", "-m", "seed")
    return repo


def _folder(tmp_path, name, files, writable=False):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
    return FolderAdapter(name, root, writable=writable)


# -- history adopt -------------------------------------------------------------


def test_history_lists_commits_newest_first(tmp_path):
    repo = _git_repo(tmp_path, {"Home.md": "v1"})
    adapter = GitShardAdapter("git", repo, writable=True)
    adapter.write("Home", "v2")
    history = adapter.history("Home")
    assert len(history) == 2
    assert history[0].message == "write Home.md"  # newest first
    assert history[-1].message == "seed"
    assert all(rev.sha for rev in history)


def test_history_unknown_key_raises(tmp_path):
    adapter = GitShardAdapter("git", _git_repo(tmp_path, {"Home.md": "h"}))
    with pytest.raises(KeyError):
        adapter.history("Nope")


# -- cross-substrate integration ----------------------------------------------


def test_resolve_across_git_and_folder(tmp_path):
    space = InformationSpace("space")
    space.attach(GitShardAdapter("git", _git_repo(tmp_path, {"Home.md": "git home"})))
    space.attach(_folder(tmp_path, "notes", {"Daily.md": "folder daily"}))
    assert space.read("Home").body == "git home"  # resolved from the git shard
    assert space.read("Daily").body == "folder daily"  # resolved from the folder shard


def test_chorus_spans_substrates_with_divergence(tmp_path):
    space = InformationSpace("space")
    space.attach(GitShardAdapter("git", _git_repo(tmp_path, {"Shared.md": "from git"})))
    space.attach(_folder(tmp_path, "notes", {"Shared.md": "from folder"}))
    res = space.resolve("Shared")
    assert {p.body for p in res.pages} == {"from git", "from folder"}  # chorus across substrates
    git_page = next(p for p in res.pages if p.identity.shard == "git")
    assert git_page.envelope.divergence  # divergence recorded, not erased


def test_edit_through_git_shard_commits(tmp_path):
    repo = _git_repo(tmp_path, {"Home.md": "original"})
    space = InformationSpace("space")
    space.attach(GitShardAdapter("git", repo, writable=True))
    result = space.edit("Home", "edited via overlay")
    assert result.status is ApplyStatus.APPLIED  # write-through fast-forward on a git shard
    assert space.read("Home").body == "edited via overlay"
    assert int(_git(repo, "rev-list", "--count", "HEAD")) == 2  # the edit became a commit


def test_apply_under_drift_refuses_on_external_commit(tmp_path):
    repo = _git_repo(tmp_path, {"Home.md": "original"})
    space = InformationSpace("space")
    space.attach(GitShardAdapter("git", repo, writable=True))
    overlay = space.overlay("Home", "my draft")  # base_rev = current git sha
    # Another writer commits to the same path → the sha moves underneath the draft.
    (repo / "Home.md").write_text("someone else", encoding="utf-8")
    _git(repo, "add", "Home.md")
    _git(repo, "commit", "-m", "external")
    result = space.apply_overlay(overlay.overlay_id)
    assert result.status is ApplyStatus.REFUSED_DRIFT  # never clobber (sha drift detected)
    # The shard itself is untouched — the external commit stands; the draft remains a draft.
    assert space.union.shard("git").read("Home").body == "someone else"


def test_overlay_on_read_only_git_shard_kept_as_draft(tmp_path):
    space = InformationSpace("space")
    space.attach(GitShardAdapter("git", _git_repo(tmp_path, {"Home.md": "ro"}), writable=False))
    result = space.edit("Home", "wanted change")
    assert result.status is ApplyStatus.KEPT_DRAFT  # read-only target → overlay retained
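
The drift tests above rest on one check: an overlay remembers the revision it was drafted against, and applying it is refused once the shard's current revision has moved. A minimal sketch of that check follows. `ToyShard`, `Overlay`, `draft` and `apply_overlay` are hypothetical names for illustration, not the shard_wiki API, and a content hash stands in for the per-path commit sha (an assumption: a real commit sha also moves when content is reverted).

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    APPLIED = "applied"
    REFUSED_DRIFT = "refused_drift"


class ToyShard:
    """Hypothetical in-memory stand-in for a writable shard."""

    def __init__(self):
        self._bodies = {}

    def write(self, key, body):
        self._bodies[key] = body

    def read(self, key):
        return self._bodies[key]

    def current_rev(self, key):
        # Assumption: a content hash plays the role of the per-path commit sha.
        return hashlib.sha1(self._bodies[key].encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Overlay:
    key: str
    body: str
    base_rev: str  # revision the draft was written against


def draft(shard, key, body):
    """Record a draft together with the revision it is based on."""
    return Overlay(key, body, shard.current_rev(key))


def apply_overlay(shard, overlay):
    # Refuse rather than clobber when the target moved since the draft was taken.
    if shard.current_rev(overlay.key) != overlay.base_rev:
        return Status.REFUSED_DRIFT
    shard.write(overlay.key, overlay.body)
    return Status.APPLIED
```

The comparison happens before any write, so a refused overlay leaves the shard exactly as the other writer left it.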


@@ -0,0 +1,89 @@
"""Tests for GitShardAdapter write=commit + current_rev drift (SHARD-WP-0012 T2)."""
import os
import subprocess

from shard_wiki.adapters import GitShardAdapter, run_conformance
from shard_wiki.model import Verb

_ENV = {
    "GIT_AUTHOR_NAME": "t", "GIT_AUTHOR_EMAIL": "t@t",
    "GIT_COMMITTER_NAME": "t", "GIT_COMMITTER_EMAIL": "t@t",
    "PATH": os.environ.get("PATH", ""),
}


def _git(repo, *args, capture=False):
    return subprocess.run(
        ["git", "-C", str(repo), *args], check=True, capture_output=True, text=True, env=_ENV
    ).stdout.strip()


def _repo(tmp_path, files):
    repo = tmp_path / "repo"
    repo.mkdir()
    _git(repo, "init", "--quiet")
    for rel, text in files.items():
        (repo / rel).write_text(text, encoding="utf-8")
        _git(repo, "add", rel)
    _git(repo, "commit", "-m", "seed")
    return repo


def test_writable_profile_declares_write_and_version(tmp_path):
    profile = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "h"}), writable=True).profile()
    assert profile.supports(Verb.WRITE)
    assert profile.supports(Verb.VERSION)
    profile.validate()  # PER_PAGE + WRITE is a consistent combination


def test_write_creates_a_commit(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "old"})
    adapter = GitShardAdapter("git", repo, writable=True)
    before = _git(repo, "rev-list", "--count", "HEAD")
    page = adapter.write("Home", "new body")
    after = _git(repo, "rev-list", "--count", "HEAD")
    assert int(after) == int(before) + 1  # one new commit
    assert page.body == "new body"
    assert page.envelope.source_rev == _git(repo, "rev-parse", "HEAD")  # page is at the new sha


def test_write_advances_current_rev(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "old"})
    adapter = GitShardAdapter("git", repo, writable=True)
    rev_before = adapter.current_rev("Home")
    adapter.write("Home", "changed")
    assert adapter.current_rev("Home") != rev_before  # sha moved → drift detectable


def test_write_new_key_tracks_it(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "h"})
    adapter = GitShardAdapter("git", repo, writable=True)
    adapter.write("docs/New", "fresh page")
    assert "docs/New" in set(adapter.keys())
    assert adapter.read("docs/New").body == "fresh page"


def test_noop_write_creates_no_empty_commit(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "same"})
    adapter = GitShardAdapter("git", repo, writable=True)
    before = _git(repo, "rev-list", "--count", "HEAD")
    adapter.write("Home", "same")  # identical body → nothing to commit
    assert _git(repo, "rev-list", "--count", "HEAD") == before


def test_current_rev_reflects_external_commit(tmp_path):
    repo = _repo(tmp_path, {"Home.md": "h"})
    adapter = GitShardAdapter("git", repo, writable=True)
    rev = adapter.current_rev("Home")
    # An out-of-band commit to the same path (another writer) moves the per-path sha.
    (repo / "Home.md").write_text("externally edited", encoding="utf-8")
    _git(repo, "add", "Home.md")
    _git(repo, "commit", "-m", "external")
    assert adapter.current_rev("Home") != rev


def test_conformance_positive_write_probe_passes(tmp_path):
    adapter = GitShardAdapter("git", _repo(tmp_path, {"Home.md": "body"}), writable=True)
    report = run_conformance(adapter)
    assert report.ok, report.diff()


@@ -0,0 +1,84 @@
"""Tests for the git-backed event store (SHARD-WP-0009 T1).

The git backend must satisfy the same EventStore contract as the in-memory one (round-trip,
ordering, determinism) while making the log git-addressable.
"""
import subprocess

import pytest

from shard_wiki.coordination import (
    DecisionLog,
    EventType,
    GitEventStore,
    InMemoryEventStore,
    deserialize_event,
    serialize_event,
)


@pytest.fixture
def git_store(tmp_path):
    return GitEventStore(tmp_path / "coord")


def test_append_git_read_round_trips(git_store):
    log = DecisionLog(git_store)
    ev = log.append("s", EventType.ALIAS_SET, {"alias": "Home", "target": "shardA:Index"})
    (read,) = log.events("s")
    assert read.seq == ev.seq == 0
    assert read.space == "s"
    assert read.type is EventType.ALIAS_SET
    assert read.payload == {"alias": "Home", "target": "shardA:Index"}


def test_ordering_preserved_and_per_space_monotonic(git_store):
    log = DecisionLog(git_store)
    log.append("a", EventType.ALIAS_SET, {"alias": "X", "target": "s:1"})
    log.append("a", EventType.ALIAS_SET, {"alias": "Y", "target": "s:2"})
    log.append("b", EventType.ALIAS_SET, {"alias": "Z", "target": "s:3"})
    assert [e.seq for e in log.events("a")] == [0, 1]
    assert [e.payload["alias"] for e in log.events("a")] == ["X", "Y"]
    assert [e.seq for e in log.events("b")] == [0]  # independent ref/ordering


def test_each_append_is_a_git_commit(git_store):
    log = DecisionLog(git_store)
    log.append("s", EventType.BINDING_MADE, {"members": ["a", "b"]})
    log.append("s", EventType.PAGE_FORKED, {"source": "a", "fork": "c"})
    ref = GitEventStore._ref("s")
    count = subprocess.run(
        ["git", "-C", str(git_store.repo_path), "rev-list", "--count", ref],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    assert count == "2"  # one immutable commit object per append


def test_deterministic_serialization_is_stable_and_sorted():
    log = InMemoryEventStore()
    ev = log.append("s", EventType.ALIAS_SET, {"target": "z", "alias": "a"})
    blob = serialize_event(ev)
    assert serialize_event(ev) == blob  # stable across calls
    assert blob.index(b'"alias"') < blob.index(b'"target"')  # payload keys sorted, not insertion
    assert deserialize_event(blob).payload == {"alias": "a", "target": "z"}


def test_git_fold_matches_in_memory_fold(git_store):
    events = [
        (EventType.ALIAS_SET, {"alias": "Home", "target": "shardA:Index"}),
        (EventType.BINDING_MADE, {"members": ["a", "b"]}),
        (EventType.BINDING_MADE, {"members": ["b", "c"]}),
        (EventType.ALIAS_SET, {"alias": "Home", "target": "shardB:Main"}),
    ]
    mem = DecisionLog(InMemoryEventStore())
    git = DecisionLog(git_store)
    for typ, payload in events:
        mem.append("s", typ, payload)
        git.append("s", typ, payload)
    assert git.fold("s").aliases == mem.fold("s").aliases
    assert git.fold("s").equivalence_groups == mem.fold("s").equivalence_groups


def test_default_decisionlog_is_in_memory():
    assert isinstance(DecisionLog()._store, InMemoryEventStore)
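
The determinism test above asserts that payload keys come out sorted regardless of insertion order. One common way to get that property is canonical JSON, sketched below. The `serialize` and `deserialize` helpers and the record layout are assumptions made for illustration; the diff does not show how `serialize_event` is actually implemented.

```python
import json


def serialize(seq, space, event_type, payload):
    """Encode an event as canonical JSON bytes (hypothetical field layout)."""
    record = {"seq": seq, "space": space, "type": event_type, "payload": payload}
    # sort_keys orders keys at every nesting level; compact separators
    # remove whitespace variance between encoders.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def deserialize(blob):
    """Decode bytes produced by serialize() back into a dict."""
    return json.loads(blob.decode("utf-8"))
```

Because the bytes depend only on the values, two stores that hold the same event produce byte-identical blobs, which is what makes the blobs safe to use as content-addressed git objects.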


@@ -0,0 +1,89 @@
"""Tests for the indexed equivalence relation — blocking + verify (SHARD-WP-0011 T1)."""
from itertools import combinations

from shard_wiki.incremental import EquivalenceIndex, MinHasher, band_keys, jaccard, shingles
from shard_wiki.incremental.equivalence import _fingerprint
from shard_wiki.model import Identity, Page
from shard_wiki.provenance import ProvenanceEnvelope


def _page(shard, key, body):
    return Page(
        identity=Identity(shard, key),
        body=body,
        envelope=ProvenanceEnvelope(source_shard=shard),
    )


def _brute_force_groups(pages, threshold):
    """Oracle: O(N²) verify of every pair, then connected components."""
    parent = {p.identity: p.identity for p in pages}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(pages, 2):
        same_fp = _fingerprint(p.body) == _fingerprint(q.body)
        sim = jaccard(shingles(p.body), shingles(q.body))
        if same_fp or sim >= threshold:
            parent[find(p.identity)] = find(q.identity)
    comps = {}
    for p in pages:
        comps.setdefault(find(p.identity), set()).add(p.identity)
    return {frozenset(v) for v in comps.values() if len(v) > 1}


def test_minhash_lsh_buckets_near_duplicates_together():
    hasher = MinHasher(num_perm=64)
    base = "the quick brown fox jumps over the lazy dog near the river bank today"
    near = base + " and then some"
    far = "completely unrelated content about astrophysics and distant galaxies far"
    b_base = set(band_keys(hasher.signature(shingles(base)), 32))
    b_near = set(band_keys(hasher.signature(shingles(near)), 32))
    b_far = set(band_keys(hasher.signature(shingles(far)), 32))
    assert b_base & b_near  # near-duplicates share at least one band
    assert not (b_base & b_far)  # unrelated pages do not


def test_exact_duplicate_across_shards_is_equivalent():
    idx = EquivalenceIndex()
    idx.add(_page("A", "Foo", "identical body text here"))
    idx.add(_page("B", "Bar", "identical body text here"))
    assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
        {Identity("A", "Foo"), Identity("B", "Bar")}
    )


def test_unrelated_pages_are_not_equivalent():
    idx = EquivalenceIndex()
    idx.add(_page("A", "Foo", "alpha beta gamma delta epsilon"))
    idx.add(_page("B", "Bar", "nothing in common whatsoever entirely"))
    assert idx.groups() == ()


def test_curator_binding_forces_equivalence_regardless_of_content():
    idx = EquivalenceIndex()
    idx.add(_page("A", "Foo", "one thing"))
    idx.add(_page("B", "Bar", "totally different"))
    idx.bind(Identity("A", "Foo"), Identity("B", "Bar"))
    assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
        {Identity("A", "Foo"), Identity("B", "Bar")}
    )


def test_index_matches_brute_force_oracle():
    threshold = 0.7
    shared = "shared sentence one shared sentence two shared sentence three end"
    pages = [
        _page("A", "Doc1", shared),
        _page("B", "Doc1copy", shared + " minor tail"),  # near-dup of A
        _page("C", "Other", "a totally distinct page with no overlapping shingles at all here"),
        _page("D", "Lonely", "yet another isolated document about unrelated subject matter alone"),
    ]
    idx = EquivalenceIndex(threshold=threshold)
    idx.build(pages)
    assert set(idx.groups()) == _brute_force_groups(pages, threshold)
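
The blocking step exercised above (shingle, MinHash, then band the signature into buckets) can be sketched in a few lines. This is a generic illustration of the technique, not the `shard_wiki.incremental` implementation: the word 3-shingles, the seeded BLAKE2b hash family, and every function body here are assumptions, and only the names mirror the imports in the test file.

```python
import hashlib


def shingles(text, k=3):
    """Word k-shingles of a body, as a set of tuples."""
    words = text.split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact set similarity, used to verify candidate pairs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _h(seed, shingle):
    # One member of a seeded hash family, standing in for a random permutation.
    data = f"{seed}:{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def signature(shingle_set, num_perm=64):
    """MinHash signature: the minimum hash per seed. Assumes a non-empty set."""
    return tuple(min(_h(seed, s) for s in shingle_set) for seed in range(num_perm))


def band_keys(sig, bands):
    """Split the signature into equal bands; each (band index, rows) pair is a bucket key."""
    rows = len(sig) // bands
    return [(i, sig[i * rows:(i + 1) * rows]) for i in range(bands)]
```

Two pages become candidates when they share at least one bucket key, and only candidates pay for the exact `jaccard` check. With 64 hashes split into 32 bands, each band is two rows, which favours recall for moderately similar pages.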


@@ -0,0 +1,84 @@
"""Incremental maintenance == rebuild, with retraction + propagation (SHARD-WP-0011 T2)."""
from shard_wiki.incremental import EquivalenceIndex
from shard_wiki.model import Identity, Page
from shard_wiki.provenance import ProvenanceEnvelope


def _page(shard, key, body):
    return Page(
        identity=Identity(shard, key),
        body=body,
        envelope=ProvenanceEnvelope(source_shard=shard),
    )


def _rebuilt(pages, curator=()):
    idx = EquivalenceIndex()
    idx.build(pages, curator)
    return idx


def _equal(a, b):
    return a.edges() == b.edges() and set(a.groups()) == set(b.groups())


def test_add_keeps_index_equal_to_rebuild():
    pages = [_page("A", "Foo", "same content here"), _page("B", "Bar", "same content here")]
    idx = EquivalenceIndex()
    for p in pages:
        idx.add(p)
    assert _equal(idx, _rebuilt(pages))
    assert idx.groups()  # the two collapse


def test_remove_keeps_index_equal_to_rebuild():
    pages = [
        _page("A", "Foo", "same content here"),
        _page("B", "Bar", "same content here"),
        _page("C", "Baz", "unrelated isolated material entirely"),
    ]
    idx = _rebuilt(pages)
    idx.remove(Identity("B", "Bar"))
    assert _equal(idx, _rebuilt([pages[0], pages[2]]))


def test_edit_into_new_bucket_retracts_stale_edge():
    a = _page("A", "Foo", "shared identical body text")
    b = _page("B", "Bar", "shared identical body text")
    idx = _rebuilt([a, b])
    assert idx.groups()  # A ≡ B initially
    # Edit B to something completely different: it exits A's buckets, the edge is retracted.
    b2 = _page("B", "Bar", "now totally divergent unrelated prose about nothing")
    idx.update(b2)
    assert idx.groups() == ()  # stale edge gone
    assert _equal(idx, _rebuilt([a, b2]))


def test_edit_into_equivalence_adds_edge():
    a = _page("A", "Foo", "target body to converge on later")
    b = _page("B", "Bar", "initially completely separate writing here")
    idx = _rebuilt([a, b])
    assert idx.groups() == ()
    b2 = _page("B", "Bar", "target body to converge on later")  # now identical to A
    idx.update(b2)
    assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
        {Identity("A", "Foo"), Identity("B", "Bar")}
    )
    assert _equal(idx, _rebuilt([a, b2]))


def test_removing_connector_splits_a_chorus():
    # Curator chain A—B—C (no direct A—C): one group of three.
    a, b, c = (_page("A", "X", "aaa"), _page("B", "Y", "bbb"), _page("C", "Z", "ccc"))
    idx = EquivalenceIndex()
    for p in (a, b, c):
        idx.add(p)
    idx.bind(a.identity, b.identity)
    idx.bind(b.identity, c.identity)
    assert idx.equivalent_to(a.identity) == {a.identity, b.identity, c.identity}
    # Removing the connector B retracts/propagates: the chorus splits.
    idx.remove(b.identity)
    assert idx.groups() == ()
    chain = [(a.identity, b.identity), (b.identity, c.identity)]
    assert _equal(idx, _rebuilt([a, c], curator=chain))


@@ -0,0 +1,89 @@
"""Tests for I-2 verification — digest + consistency-checker (SHARD-WP-0011 T3)."""
from shard_wiki.incremental import (
    ConsistencyChecker,
    EquivalenceIndex,
    derived_digest,
)
from shard_wiki.model import Identity, Page
from shard_wiki.provenance import ProvenanceEnvelope


def _page(shard, key, body):
    return Page(
        identity=Identity(shard, key),
        body=body,
        envelope=ProvenanceEnvelope(source_shard=shard),
    )


def test_digest_is_stable_under_equivalent_event_orders():
    pages = [
        _page("A", "Foo", "shared body text here"),
        _page("B", "Bar", "shared body text here"),
        _page("C", "Baz", "an entirely separate unrelated document"),
    ]
    forward = EquivalenceIndex()
    for p in pages:
        forward.add(p)
    reverse = EquivalenceIndex()
    for p in reversed(pages):
        reverse.add(p)
    assert derived_digest(forward) == derived_digest(reverse)


def test_clean_index_reports_healthy():
    pages = [_page("A", "Foo", "same body"), _page("B", "Bar", "same body")]
    idx = EquivalenceIndex()
    idx.build(pages)
    checker = ConsistencyChecker(idx, pages_fn := (lambda: pages))
    report = checker.check_and_repair()
    assert report.drifted is False and report.healthy is True
    assert pages_fn()  # source unchanged


def test_missed_delta_drift_is_detected_and_repaired():
    a = _page("A", "Foo", "converging target body")
    b = _page("B", "Bar", "initially unrelated separate text")
    source = {"pages": [a, b]}
    idx = EquivalenceIndex()
    idx.build(source["pages"])
    assert idx.groups() == ()  # not equivalent yet
    # Source changes B to match A, but the index is never told (a missed delta → drift).
    b2 = _page("B", "Bar", "converging target body")
    source["pages"] = [a, b2]
    checker = ConsistencyChecker(idx, lambda: source["pages"])
    report = checker.check_and_repair()
    assert report.drifted is True and report.repaired is True and report.healthy is True
    # Self-healed: the index now reflects the equivalence.
    assert idx.equivalent_to(Identity("A", "Foo")) == frozenset(
        {Identity("A", "Foo"), Identity("B", "Bar")}
    )


def test_corrupted_internal_state_is_healed():
    a = _page("A", "Foo", "identical content")
    b = _page("B", "Bar", "identical content")
    idx = EquivalenceIndex()
    idx.build([a, b])
    # Corrupt the derived tier directly: delete a true edge (simulated index corruption).
    idx._content_edges.clear()
    assert idx.groups() == ()  # corrupted away
    checker = ConsistencyChecker(idx, lambda: [a, b])
    report = checker.check_and_repair()
    assert report.drifted is True and report.healthy is True
    assert idx.groups()  # edge restored by scoped recompute


def test_removed_source_page_is_reconciled():
    a = _page("A", "Foo", "same body")
    b = _page("B", "Bar", "same body")
    idx = EquivalenceIndex()
    idx.build([a, b])
    checker = ConsistencyChecker(idx, lambda: [a])  # B vanished from source
    report = checker.check_and_repair()
    assert report.healthy is True
    assert Identity("B", "Bar") not in idx.identities()
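
The first test in this file requires the digest to be the same no matter what order pages were added in. A sketch of one way to get an order-independent digest over an undirected edge set is shown below. `edge_digest` is a hypothetical helper taking plain string pairs; the real `derived_digest` takes an index, and its internals are not shown in this diff.

```python
import hashlib


def edge_digest(edges):
    """Digest of an undirected edge set, independent of insertion order.

    `edges` is any iterable of (a, b) pairs of strings.
    """
    # Normalise each edge's endpoint order, dedupe, then sort the whole set.
    canonical = sorted({tuple(sorted(edge)) for edge in edges})
    h = hashlib.sha256()
    for a, b in canonical:
        # NUL and newline delimiters keep distinct edge lists from colliding.
        h.update(f"{a}\x00{b}\n".encode("utf-8"))
    return h.hexdigest()
```

A consistency checker can then compare the digest of the maintained index with the digest of a fresh rebuild and trigger a repair only when they differ.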


@@ -0,0 +1,74 @@
"""Wire the incremental tier behind InformationSpace views (SHARD-WP-0011 T4)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.coordination import EventType
from shard_wiki.model import Identity
from shard_wiki.space import InformationSpace
from shard_wiki.views import all_pages


def _shard(tmp_path, name, files):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
    return FolderAdapter(name, root)


def test_all_pages_via_index_matches_direct_fold(tmp_path):
    space = InformationSpace("space")
    space.attach(_shard(tmp_path, "wiki", {"Home.md": "welcome", "Guide.md": "the guide"}))
    space.attach(_shard(tmp_path, "notes", {"Daily.md": "today"}))
    # Routed-through-index result equals the direct fold-based computation (behaviour unchanged).
    via_index = {(e.name, e.members) for e in space.all_pages()}
    direct = {(e.name, e.members) for e in all_pages(space.union)}
    assert via_index == direct


def test_curator_binding_collapses_via_maintained_index(tmp_path):
    space = InformationSpace("space")
    space.attach(_shard(tmp_path, "a", {"Foo.md": "x"}))
    space.attach(_shard(tmp_path, "b", {"Bar.md": "y"}))
    space.log.append(
        "space", EventType.BINDING_MADE, {"members": ["a:Foo", "b:Bar"]}
    )
    # The maintained index re-syncs curator edges live from the log fold.
    collapsed = [e for e in space.all_pages() if len(e.members) == 2]
    assert len(collapsed) == 1
    assert set(collapsed[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}


def test_content_duplicate_collapses_via_index(tmp_path):
    space = InformationSpace("space")
    space.attach(_shard(tmp_path, "a", {"Foo.md": "the very same body content here"}))
    space.attach(_shard(tmp_path, "b", {"Bar.md": "the very same body content here"}))
    dup = [e for e in space.all_pages() if len(e.members) == 2]
    assert len(dup) == 1  # content equivalence detected by the maintained index
    assert set(dup[0].members) == {Identity("a", "Foo"), Identity("b", "Bar")}


def test_attach_invalidates_index(tmp_path):
    space = InformationSpace("space")
    space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
    assert space.all_pages()  # builds the index (one page, no groups)
    space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))  # marks index stale
    dup = [e for e in space.all_pages() if len(e.members) == 2]
    assert len(dup) == 1  # rebuilt fallback picks up the new equivalent page


def test_verify_index_reports_healthy_when_consistent(tmp_path):
    space = InformationSpace("space")
    space.attach(_shard(tmp_path, "a", {"Foo.md": "same body"}))
    space.attach(_shard(tmp_path, "b", {"Bar.md": "same body"}))
    space.all_pages()  # ensure built
    report = space.verify_index()
    assert report.healthy is True


def test_reindex_is_an_explicit_fallback(tmp_path):
    space = InformationSpace("space")
    space.attach(_shard(tmp_path, "a", {"Foo.md": "content"}))
    before = space.index.digest()
    space.reindex()
    assert space.index.digest() == before  # rebuild is deterministic


@@ -0,0 +1,76 @@
"""Tests for the AllPages + SiteMap enumeration views (SHARD-WP-0010 T4)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.coordination import DecisionLog, EventType
from shard_wiki.model import Identity
from shard_wiki.union import UnionGraph
from shard_wiki.views import all_pages, site_map


def _shard(tmp_path, name, files):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
    return FolderAdapter(name, root)


def test_all_pages_spans_shards(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"A.md": "a"}))
    u.attach(_shard(tmp_path, "shardB", {"B.md": "b"}))
    names = {e.name for e in all_pages(u)}
    assert names == {"A", "B"}


def test_chorus_collapses_to_one_entry_with_divergence(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Home.md": "A home"}))
    u.attach(_shard(tmp_path, "shardB", {"Home.md": "B home"}))
    entries = all_pages(u)
    home = [e for e in entries if e.name == "Home"]
    assert len(home) == 1  # chorus → single entry
    assert set(home[0].members) == {Identity("shardA", "Home"), Identity("shardB", "Home")}
    assert home[0].diverges is True  # bodies differ — collapse acknowledged, not silent


def test_chorus_same_body_does_not_diverge(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Home.md": "same"}))
    u.attach(_shard(tmp_path, "shardB", {"Home.md": "same"}))
    (home,) = [e for e in all_pages(u) if e.name == "Home"]
    assert home.diverges is False


def test_equivalence_binding_collapses_distinct_keys(tmp_path):
    log = DecisionLog()
    log.append(
        "space", EventType.BINDING_MADE, {"members": ["shardA:Foo", "shardB:Bar"]}
    )
    u = UnionGraph("space", log=log)
    u.attach(_shard(tmp_path, "shardA", {"Foo.md": "x"}))
    u.attach(_shard(tmp_path, "shardB", {"Bar.md": "x"}))
    pair = {Identity("shardA", "Foo"), Identity("shardB", "Bar")}
    # The two bound identities fold into one entry (named by the min key, "Bar").
    bound = [e for e in all_pages(u) if {*e.members} == pair]
    assert len(bound) == 1
    assert bound[0].name == "Bar"


def test_sitemap_reflects_namespace_paths(tmp_path):
    u = UnionGraph("space")
    u.attach(
        _shard(
            tmp_path,
            "shardA",
            {"Home.md": "h", "docs/Guide.md": "g", "docs/api/Ref.md": "r"},
        )
    )
    root = site_map(u)
    # Top level: "Home" page directly, and a "docs" namespace.
    assert any(p.key == "Home" for p in root.pages)
    docs = next(c for c in root.children if c.name == "docs")
    assert any(p.key == "docs/Guide" for p in docs.pages)
    api = next(c for c in docs.children if c.name == "api")
    assert any(p.key == "docs/api/Ref" for p in api.pages)
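
The site-map test above expects slash-separated keys to turn into nested namespaces. A minimal sketch of that folding, assuming a hypothetical `Node` type whose `children` is a dict and whose `pages` holds plain key strings (the real `site_map` result exposes `.children` with `.name` and pages with `.key`, so the shapes differ):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    pages: list = field(default_factory=list)     # full keys of pages directly in this namespace
    children: dict = field(default_factory=dict)  # child namespace name -> Node


def build_site_map(keys):
    """Fold slash-separated page keys into a namespace tree."""
    root = Node("")
    for key in sorted(keys):
        *namespaces, _leaf = key.split("/")
        node = root
        for part in namespaces:
            # Create intermediate namespaces on first sight.
            node = node.children.setdefault(part, Node(part))
        node.pages.append(key)
    return root
```

Sorting the keys first makes the resulting page lists deterministic regardless of shard enumeration order.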


@@ -0,0 +1,51 @@
"""Tests for the BackLinks derived view (SHARD-WP-0010 T2)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.model import Identity
from shard_wiki.union import UnionGraph
from shard_wiki.views import build_backlinks


def _shard(tmp_path, name, files):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
    return FolderAdapter(name, root)


def test_link_yields_backlink_with_provenance(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"A.md": "see [[B]]", "B.md": "target"}))
    index = build_backlinks(u)
    assert index.sources("B") == frozenset({Identity("shardA", "A")})
    (bl,) = index.to("B")
    assert bl.source_shard == "shardA"  # entry carries source provenance


def test_red_links_create_no_backlinks(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"A.md": "see [[Ghost]]"}))
    index = build_backlinks(u)
    assert index.to("Ghost") == ()  # unresolved target → no backlink
    assert "Ghost" not in index.names()


def test_chorus_target_aggregates_backlinks(tmp_path):
    # "Home" exists in two shards (a chorus); links to it from anywhere aggregate under one name.
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Home.md": "A home", "A.md": "[[Home]]"}))
    u.attach(_shard(tmp_path, "shardB", {"Home.md": "B home", "B.md": "[[Home]]"}))
    index = build_backlinks(u)
    assert index.sources("Home") == frozenset(
        {Identity("shardA", "A"), Identity("shardB", "B")}
    )


def test_backlinks_span_shards(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Index.md": "x"}))
    u.attach(_shard(tmp_path, "shardB", {"B.md": "links [[Index]]"}))
    index = build_backlinks(u)
    assert index.sources("Index") == frozenset({Identity("shardB", "B")})
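
The behaviour these tests pin down (links aggregate by target name across shards, and unresolved targets produce nothing) amounts to inverting the link graph. A rough sketch follows, assuming shards are plain nested dicts and using a simplified link regex. This `build_backlinks` is a hypothetical stand-in that only shares its name with the `shard_wiki.views` function.

```python
import re
from collections import defaultdict

# Simplified [[Target]] / [[Target|label]] pattern; code regions are not handled here.
_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]+)?\]\]")


def build_backlinks(shards):
    """`shards` maps shard name -> {page key -> body}.

    Returns target name -> set of (shard, key) sources.
    """
    known = {key for pages in shards.values() for key in pages}
    index = defaultdict(set)
    for shard, pages in shards.items():
        for key, body in pages.items():
            for m in _LINK.finditer(body):
                target = m.group(1).strip()
                if target in known:  # red links (unresolved targets) yield no backlink
                    index[target].add((shard, key))
    return dict(index)
```

Keying the index by target name rather than by (shard, key) is what lets a chorus such as "Home" collect backlinks from every shard under one entry.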


@@ -0,0 +1,52 @@
"""Integration: derived views exposed on InformationSpace over two shards (SHARD-WP-0010 T5)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.model import Identity
from shard_wiki.space import InformationSpace


def _shard(tmp_path, name, files):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
    return FolderAdapter(name, root)


def _space(tmp_path):
    space = InformationSpace("space")
    space.attach(
        _shard(tmp_path, "wiki", {"Home.md": "welcome, see [[Guide]]", "Guide.md": "the guide"})
    )
    space.attach(_shard(tmp_path, "notes", {"Daily.md": "today I read [[Guide]]"}))
    return space


def test_backlinks_across_two_shards(tmp_path):
    space = _space(tmp_path)
    sources = {bl.source for bl in space.backlinks("Guide")}
    assert sources == {Identity("wiki", "Home"), Identity("notes", "Daily")}


def test_all_pages_and_site_map_over_union(tmp_path):
    space = _space(tmp_path)
    names = {e.name for e in space.all_pages()}
    assert names == {"Home", "Guide", "Daily"}
    leaves = {p.key for p in space.site_map().pages}
    assert {"Home", "Guide", "Daily"} <= leaves


def test_recent_changes_includes_alias_and_edits(tmp_path):
    space = _space(tmp_path)
    space.alias("Start", "wiki:Home", actor="ana")
    feed = space.recent_changes()
    kinds = {e.kind for e in feed}
    assert "alias" in kinds and "edit" in kinds
    alias = next(e for e in feed if e.kind == "alias")
    assert alias.source == "coordination" and alias.actor == "ana"


def test_red_link_creates_no_backlink_via_space(tmp_path):
    space = _space(tmp_path)
    assert space.backlinks("Nonexistent") == ()

tests/test_views_links.py (new file, 69 lines)

@@ -0,0 +1,69 @@
"""Tests for the wikilink + red-link model (SHARD-WP-0010 T1)."""
from shard_wiki.adapters import FolderAdapter
from shard_wiki.union import ResolutionKind, UnionGraph
from shard_wiki.views import extract_links, resolve_links
def _shard(tmp_path, name, files):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
    return FolderAdapter(name, root)


def test_extracts_plain_and_labelled_links():
    links = extract_links("See [[Home]] and [[Index|the index]].")
    assert [(link.target, link.label, link.text) for link in links] == [
        ("Home", None, "Home"),
        ("Index", "the index", "the index"),
    ]


def test_links_carry_body_offsets_in_document_order():
    body = "a [[One]] b [[Two]]"
    links = extract_links(body)
    assert [link.target for link in links] == ["One", "Two"]
    s, e = links[0].span
    assert body[s:e] == "[[One]]"


def test_code_regions_are_not_scanned():
    body = "real [[Home]]\n```\n[[NotALink]]\n```\ninline `[[AlsoNot]]` done"
    targets = [link.target for link in extract_links(body)]
    assert targets == ["Home"]


def test_camelcase_off_by_default_then_opt_in():
    body = "FrontPage links to [[Home]]"
    assert [link.target for link in extract_links(body)] == ["Home"]  # CamelCase ignored
    on = extract_links(body, camelcase=True)
    assert {link.target for link in on} == {"FrontPage", "Home"}
    assert next(link for link in on if link.target == "FrontPage").auto is True


def test_camelcase_does_not_double_count_inside_explicit_link():
    # [[FrontPage]] is one explicit link, not also a CamelCase auto-link.
    links = extract_links("[[FrontPage]]", camelcase=True)
    assert len(links) == 1
    assert links[0].auto is False


def test_resolve_links_distinguishes_link_from_red_link(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Home.md": "home"}))
    resolved = resolve_links(u, "[[Home]] and [[Ghost]]")
    by_target = {r.link.target: r for r in resolved}
    assert by_target["Home"].resolution.kind is ResolutionKind.SINGLE
    assert by_target["Home"].is_red_link is False
    assert by_target["Ghost"].is_red_link is True  # unresolved → createable red-link


def test_resolve_links_surfaces_chorus(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Home.md": "A"}))
    u.attach(_shard(tmp_path, "shardB", {"Home.md": "B"}))
    (resolved,) = resolve_links(u, "[[Home]]")
    assert resolved.resolution.kind is ResolutionKind.CHORUS
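
For readers without the implementation at hand, the behaviour these tests pin down (explicit `[[Target|label]]` links, body offsets, skipped code regions, opt-in CamelCase) can be sketched roughly as follows. This is a hypothetical stand-in, not the `shard_wiki` implementation: the `Link` fields mirror only what the tests read, and the regular expressions are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical patterns; the real extractor may tokenise differently.
_CODE = re.compile(r"`{3}.*?`{3}|`[^`\n]*`", re.DOTALL)  # fenced, then inline code
_EXPLICIT = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")
_CAMEL = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


@dataclass(frozen=True)
class Link:
    target: str
    label: Optional[str]
    span: Tuple[int, int]  # (start, end) offsets into the original body
    auto: bool = False     # True only for CamelCase auto-links

    @property
    def text(self) -> str:
        # Display text falls back to the target when no label is given.
        return self.label or self.target


def _mask(body: str, pattern: "re.Pattern[str]") -> str:
    # Blank out matches with spaces so later offsets still index the original body.
    return pattern.sub(lambda m: " " * len(m.group(0)), body)


def extract_links(body: str, camelcase: bool = False) -> List[Link]:
    visible = _mask(body, _CODE)  # code regions are never scanned
    links = [
        Link(m.group(1).strip(), (m.group(2) or "").strip() or None, m.span())
        for m in _EXPLICIT.finditer(visible)
    ]
    if camelcase:
        # Mask explicit links first so [[FrontPage]] is not counted twice.
        outside = _mask(visible, _EXPLICIT)
        links += [
            Link(m.group(0), None, m.span(), auto=True)
            for m in _CAMEL.finditer(outside)
        ]
    return sorted(links, key=lambda link: link.span)  # document order
```

Masking rather than deleting code regions keeps every `span` valid against the unmodified body, which is what the offset test relies on.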

@@ -0,0 +1,67 @@
"""Tests for the RecentChanges merged feed (SHARD-WP-0010 T3)."""

import os
from datetime import datetime, timezone

from shard_wiki.adapters import FolderAdapter
from shard_wiki.coordination import DecisionLog, EventType
from shard_wiki.union import UnionGraph
from shard_wiki.views import recent_changes


def _shard(tmp_path, name, files, mtime=None):
    root = tmp_path / name
    for rel, text in files.items():
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(text, encoding="utf-8")
        if mtime is not None:
            os.utime(p, (mtime, mtime))
    return FolderAdapter(name, root)


def test_edit_and_alias_both_appear_newest_first(tmp_path):
    # Page edit signal pinned to an old mtime; the alias decision happens "now" → alias is newest.
    old = datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp()
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Home.md": "home"}, mtime=old))
    log = DecisionLog()
    log.append("space", EventType.ALIAS_SET, {"alias": "Start", "target": "shardA:Home"})
    feed = recent_changes(u, log, "space")
    kinds = [e.kind for e in feed]
    assert "edit" in kinds and "alias" in kinds
    assert feed[0].kind == "alias"  # newest first
    assert feed[-1].kind == "edit"
    # Monotonic non-increasing by time.
    assert all(feed[i].when >= feed[i + 1].when for i in range(len(feed) - 1))


def test_per_shard_attribution_present(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"A.md": "a"}))
    u.attach(_shard(tmp_path, "shardB", {"B.md": "b"}))
    feed = recent_changes(u, DecisionLog(), "space")
    edits = {e.ref: e.source for e in feed if e.kind == "edit"}
    assert edits["shardA:A"] == "shardA"
    assert edits["shardB:B"] == "shardB"  # each edit attributed to its shard


def test_coordination_entries_carry_actor_and_ref(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"Doc.md": "x"}))
    log = DecisionLog()
    log.append(
        "space", EventType.PAGE_FORKED, {"source": "shardA:Doc", "fork": "shardB:Doc"}, actor="ana"
    )
    fork = next(e for e in recent_changes(u, log, "space") if e.kind == "fork")
    assert fork.source == "coordination"
    assert fork.actor == "ana"
    assert fork.ref == "shardA:Doc→shardB:Doc"


def test_limit_truncates_to_newest(tmp_path):
    u = UnionGraph("space")
    u.attach(_shard(tmp_path, "shardA", {"A.md": "a", "B.md": "b", "C.md": "c"}))
    feed = recent_changes(u, DecisionLog(), "space", limit=2)
    assert len(feed) == 2
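
The merge these tests describe (per-shard edit signals and coordination events folded into one newest-first feed, optionally truncated) might look roughly like the sketch below. `Entry` and `merge_feed` are hypothetical names chosen for illustration; only the `kind`, `ref`, `source`, `actor` and `when` attributes come from the assertions above, and the real `recent_changes` signature takes a union, a log and a space name instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    kind: str                    # e.g. "edit", "alias", "fork"
    ref: str                     # e.g. "shardA:Home"
    source: str                  # shard name, or "coordination" for log events
    when: datetime
    actor: Optional[str] = None


def merge_feed(
    edits: Iterable[Entry],
    events: Iterable[Entry],
    limit: Optional[int] = None,
) -> List[Entry]:
    # One combined list, newest first; sorted() is stable, so ties keep input order.
    feed = sorted([*edits, *events], key=lambda e: e.when, reverse=True)
    return feed if limit is None else feed[:limit]


# Usage: an old page edit and a fresh alias decision.
old = datetime(2020, 1, 1, tzinfo=timezone.utc)
now = datetime.now(timezone.utc)
feed = merge_feed(
    [Entry("edit", "shardA:Home", "shardA", old)],
    [Entry("alias", "Start", "coordination", now, actor="ana")],
)
```

Sorting the combined list before truncating is what makes `limit` keep the newest entries rather than an arbitrary prefix of either input.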

@@ -4,7 +4,7 @@ type: workplan
title: "git-backed DecisionLog + per-space append authority"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -39,7 +39,7 @@ sharding (blueprint O-12). Single append authority per space is the target.
```task
id: SHARD-WP-0009-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "a8fcbb3e-fbc4-4f68-9cf0-d8a6ee057191"
```
@@ -54,7 +54,7 @@ ordering preserved; deterministic serialization.
```task
id: SHARD-WP-0009-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "62abd162-4243-4659-8d27-9fc967ab11a0"
```
@@ -69,7 +69,7 @@ hand-off resumes from head; a partitioned non-holder cannot fork the log.
```task
id: SHARD-WP-0009-T3
-status: todo
+status: done
priority: high
state_hub_task_id: "8cc3691e-05a7-443f-9292-a3fdf3fd59a4"
```
@@ -82,7 +82,7 @@ process B (new handle) sees it; fold equals the in-memory fold for the same even
```task
id: SHARD-WP-0009-T4
-status: todo
+status: done
priority: medium
state_hub_task_id: "281e1db4-6a75-456b-a2bc-b761feb10609"
```

@@ -4,7 +4,7 @@ type: workplan
title: "derived views — wikilinks, BackLinks, RecentChanges, AllPages/SiteMap"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -36,7 +36,7 @@ later by SHARD-WP-0011) and carry provenance. Presentation stays out of core (L6
```task
id: SHARD-WP-0010-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "792660c3-9be9-4771-9f51-69d01f0c7f13"
```
@@ -51,7 +51,7 @@ red-link, CamelCase opt-in.
```task
id: SHARD-WP-0010-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "431a54c3-82b5-4b08-b3f0-762624d4c91d"
```
@@ -65,7 +65,7 @@ chorus pages aggregate.
```task
id: SHARD-WP-0010-T3
-status: todo
+status: done
priority: medium
state_hub_task_id: "270c1c31-0445-42b9-9a49-92d32c298eb2"
```
@@ -79,7 +79,7 @@ alias both appear, newest-first; per-shard attribution present.
```task
id: SHARD-WP-0010-T4
-status: todo
+status: done
priority: low
state_hub_task_id: "898ba43e-cdef-4ce8-9fa3-4ce60ebb4fdd"
```
@@ -92,7 +92,7 @@ collapses to one entry with divergence noted; sitemap reflects paths.
```task
id: SHARD-WP-0010-T5
-status: todo
+status: done
priority: medium
state_hub_task_id: "7157544b-5d3b-45a2-ba5a-c32244c59323"
```

@@ -4,7 +4,7 @@ type: workplan
title: "incremental union maintenance + equivalence index + I-2 verification"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -41,7 +41,7 @@ deployment is later.
```task
id: SHARD-WP-0011-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "842f480b-7b14-47cd-818b-012dbda9c187"
```
@@ -55,7 +55,7 @@ unrelated pages don't; verified edges match a brute-force oracle on a small corp
```task
id: SHARD-WP-0011-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "2da4e0b8-22cc-4ad1-a9aa-b5e991515d30"
```
@@ -70,7 +70,7 @@ stale edge.
```task
id: SHARD-WP-0011-T3
-status: todo
+status: done
priority: high
state_hub_task_id: "b602ce31-ad9a-4c7f-b596-f039722373fc"
```
@@ -85,7 +85,7 @@ equivalent event orders.
```task
id: SHARD-WP-0011-T4
-status: todo
+status: done
priority: medium
state_hub_task_id: "2f3d083c-0b2e-4b58-9e96-c0461c5eb089"
```

@@ -4,7 +4,7 @@ type: workplan
title: "second adapter — git-IS-store shard (contract validation on a new substrate)"
domain: whynot
repo: shard-wiki
-status: active
+status: done
owner: tegwick
topic_slug: whynot
created: "2026-06-15"
@@ -40,7 +40,7 @@ merge beyond fast-forward (apply-under-drift refuse is enough, as in SHARD-WP-00
```task
id: SHARD-WP-0012-T1
-status: todo
+status: done
priority: high
state_hub_task_id: "8a1c7c80-a0cc-4e02-a611-1f1fd7dec57b"
```
@@ -54,7 +54,7 @@ implication rules. Tests: read tracked files; profile validates; conformance rea
```task
id: SHARD-WP-0012-T2
-status: todo
+status: done
priority: high
state_hub_task_id: "b47dfb86-46c1-4e97-a62f-377719499ff2"
```
@@ -68,7 +68,7 @@ changes after an external commit.
```task
id: SHARD-WP-0012-T3
-status: todo
+status: done
priority: medium
state_hub_task_id: "4c895f42-671d-4948-8bdf-941fd85644bb"
```