spec(SHARD-WP-0005 T1): re-frame state model — three states, derived=f(canonical)

Replaces the two-bucket thesis with sharded-canonical / coordination-canonical
(journal: overlays, bindings, aliases, merges) / derived-disposable. Fixes the
I-2 contradiction (curator bindings can't be rebuilt). Updates §1, I-2, §3
dependency rule, §4 abstractions, §8.4. (review A-1)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:06:54 +02:00
parent ee2449d987
commit b5d2cbc330


@@ -21,34 +21,51 @@ Scope relationship to the other specs:
---
## 1. The thesis: *canonical at the edges, derived in the middle*
## 1. The thesis: *canonical vs derived* (three states)
Everything in shard-wiki follows from one organising decision:
Everything in shard-wiki follows from one organising decision — that state comes in exactly
**three kinds**, and only one of them is disposable:
> **The canonical truth lives at the edges — in each shard (content) and in the Git
> coordination journal (history). Everything shard-wiki computes in between — the union, the
> projections, the views, the query indexes — is _derived state that can be deleted and
> rebuilt_ from those edges.**
> **1. Sharded-canonical** — content owned by each shard (shard sovereignty).
> **2. Coordination-canonical** — durable state *born inside shard-wiki* that encodes human
> or cross-shard decisions and exists nowhere else: overlays (the local truth against a
> read-only shard), curator equivalence bindings, alias tables, merge/reconciliation
> decisions. It lives in the **Git coordination journal**.
> **3. Derived-disposable** — everything shard-wiki *computes* from (1)+(2): the union graph,
> equivalence index, query indexes, projections, views. It can be deleted and recomputed.
>
> **Canonical = sharded + coordination. Derived = a pure function of canonical:**
> `derived = f(sharded, coordination)`.
This is the architectural form of "orchestrator, not engine." shard-wiki never *becomes* the
source of truth; it composes sources. The research earned this thesis empirically — every
serious system externalises its durable truth to files+VCS and treats the rest as derived:
Logseq (DataScript index over plain files), ikiwiki (static HTML compiled from a git repo),
Glamorous Toolkit / Lepiter (live views over git-versioned JSON), Pharo (Tonel/Iceberg code
as git text), Jupyter teams (nbstripout — outputs are derived noise). The one tradition that
refuses this — the Smalltalk **image** — is exactly the one we record as a *boundary, not a
backend* (`research/260614-squeak-pharo-deep-dive`).
source of truth; it composes sources and records the decisions it makes about them. The
research earned the *files-canonical* half empirically — every serious system externalises its
durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain
files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live
views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams
(nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk
**image** — is exactly the one we record as a *boundary, not a backend*
(`research/260614-squeak-pharo-deep-dive`).
The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges,
derived in the middle"). That was wrong by omission: it had no home for **coordination-
canonical** state, and so contradicted itself by listing curator bindings and alias tables as
"derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model
fixes that crack (review finding A-1) and makes `derived = f(canonical)` *literally* true.
Three consequences fall straight out, and they are the spine of the rest of this document:
1. **Graceful degradation is free.** If the derived middle is always rebuildable, a backend
1. **Graceful degradation is free.** If the derived tier is always recomputable, a backend
that can only be read is still a first-class participant — you just derive less from it.
2. **Provenance is tractable.** Because shard-wiki never claims to *be* the source, every
derived artifact can always point back to the canonical edge it came from (union without
derived artifact can always point back to the canonical input it came from (union without
erasure is a structural property, not a feature bolted on).
3. **The system is a pure function of its inputs.** `union/index/projection = f(shards,
journal)`. Bugs in the middle are recoverable by rebuild; the edges are the only thing
that must be protected (and history protects them).
3. **The derived tier is a pure function of canonical state.** `derived = f(sharded,
coordination)`. Bugs in the derived tier are recoverable by recompute; only the two
canonical tiers must be durably protected — sharded by each shard, coordination by the
Git journal (history). *Recomputability is a correctness property of the derived tier, not
a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the
operational-envelope axis.*
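
The third consequence can be illustrated with a minimal sketch. This is not shard-wiki's implementation: the `Coordination` shape, the `derive` function, and the page-id strings are hypothetical names chosen for illustration. It assumes sharded content is a mapping from page id to text and that journal state has already been read into memory.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Tuple


@dataclass(frozen=True)
class Coordination:
    """Coordination-canonical state as read from the journal (hypothetical shape)."""
    aliases: Mapping[str, str]             # alias name -> page id
    bindings: FrozenSet[Tuple[str, str]]   # curator-declared equivalences


def derive(sharded: Mapping[str, str], coordination: Coordination) -> dict:
    """derived = f(sharded, coordination): no hidden state, no side effects."""
    # Derived index: first line of each page, used here as its title.
    titles = {page: text.split("\n", 1)[0] for page, text in sharded.items()}
    # Human decisions are consumed as inputs; they are filtered, never invented.
    aliases = {name: page for name, page in coordination.aliases.items()
               if page in sharded}
    equivalences = sorted(pair for pair in coordination.bindings
                          if pair[0] in sharded and pair[1] in sharded)
    return {"titles": titles, "aliases": aliases, "equivalences": equivalences}


sharded = {"a/Home": "Home\nbody", "b/Start": "Start\nbody"}
coordination = Coordination(
    aliases={"home": "a/Home", "gone": "c/Missing"},
    bindings=frozenset({("a/Home", "b/Start")}),
)
derived = derive(sharded, coordination)
```

Deleting `derived` and calling `derive` again with the same inputs yields an equal result, which is the recomputability property. The curator binding only appears in the output because it was supplied as canonical input; nothing in `derive` could regenerate it.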
### The dual narrow waist
@@ -73,7 +90,7 @@ principles fused with the research through-lines.
| # | Invariant | Source |
|---|-----------|--------|
| I-1 | **Orchestrator, not engine.** Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
| I-2 | **Canonical at the edges, derived in the middle.** Union/index/projection are rebuildable from shards + journal. | §1; Logseq/ikiwiki/GT through-line |
| I-2 | **Three states; derived = f(canonical).** State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
| I-3 | **Capability-awareness is data.** A binding's abilities are a *profile* (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
| I-4 | **Union without erasure.** Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
| I-5 | **Overlay before mutation.** Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
@@ -124,8 +141,11 @@ carries provenance; an authz decision at L5 records the context under which cont
Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4
(can we delegate a query?), and L5 (can this principal even reach the op?).
The dependency rule is strict and downward: **L4 may be deleted and recomputed from L1–L3.**
Nothing at L1–L3 may depend on L4 state.
The dependency rule is strict and downward, and it tracks the **three states (§1)**, not the
layer numbers: **the derived-disposable tier (the whole of L4) may be deleted and recomputed
from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal).**
Nothing canonical may depend on derived state. Note the journal at L3 is *canonical* (it holds
overlays, bindings, aliases, merges); only L4 is disposable.
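
One way to make the state-based dependency rule checkable is to tag each component with its state and reject any canonical-to-derived edge. The sketch below is an assumption for illustration only: the `State` enum, `dependency_violations`, and the component names are hypothetical and not part of the spec.

```python
from enum import Enum


class State(Enum):
    SHARDED = "sharded-canonical"
    COORDINATION = "coordination-canonical"
    DERIVED = "derived-disposable"


def dependency_violations(kinds, depends_on):
    """Return (component, dependency) pairs where canonical state depends on derived state."""
    violations = []
    for component, deps in depends_on.items():
        if kinds[component] is State.DERIVED:
            continue  # derived components may depend on anything
        for dep in deps:
            if kinds[dep] is State.DERIVED:
                violations.append((component, dep))
    return sorted(violations)


kinds = {
    "shard_content": State.SHARDED,
    "journal": State.COORDINATION,
    "union_graph": State.DERIVED,
    "query_index": State.DERIVED,
}
depends_on = {
    "union_graph": ["shard_content", "journal"],  # allowed: derived reads canonical
    "query_index": ["union_graph"],               # allowed: derived reads derived
    "journal": ["query_index"],                   # forbidden: canonical reads derived
}
violations = dependency_violations(kinds, depends_on)
```

The check keys on the state tag rather than a layer number, so the journal is treated as canonical even though it sits at L3.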
---
@@ -146,9 +166,13 @@ parallel terms.
(paths/shards). Addressing, equivalence, and transclusion key on identity (I-9).
- **Provenance envelope** — the metadata wrapper every artifact carries: source shard,
freshness, liveness, authorization context, overlay status, divergence, derivation lineage.
- **Coordination journal** — the L3 Git-backed record of change flows for a space.
- **Coordination journal** — the L3 Git-backed record of change flows for a space, and the
durable home of all **coordination-canonical** state (§1): overlays, curator equivalence
bindings, alias tables, merge/reconciliation decisions. This state is born inside shard-wiki,
exists nowhere else, and is *not* derived — it must be committed, never recomputed.
- **Overlay** — a non-destructive local edit against a remote/read-only/limited shard,
representable as draft/patch/commit/MR before destructive apply.
representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an
unapplied overlay is the local truth and lives in the journal.
- **Projection** — a derived view of shard content, typed on two axes (§8): *kind*
(replication | derivation) × *liveness* (static … irreducibly-live).
- **Federation model** — the selected coordination strategy for a space (§ taxonomy, T17).
@@ -311,11 +335,16 @@ composes per shard** (mechanism over policy, I-7):
### 8.4 Union & projection (L4) — the derived cache
This whole layer is rebuildable from L1–L3 (I-2). It comprises:
This whole layer is **derived-disposable**: recomputable from canonical state — sharded
content + the **coordination-canonical** inputs in the journal (I-2). Crucially, the *automatic*
equivalence results are derived, but the **human/curatorial inputs they consume — alias tables
and curator equivalence bindings — are coordination-canonical (they live in the journal), not
derived**; recompute reads them, never regenerates them. It comprises:
- **Identity resolution & equivalence** — detect "same topic / derived content" path-
independently (fingerprint, span-set overlap, alias table, curator binding); present as
**chorus-of-voices** or designated-canonical (a *policy* preset).
independently from *derived* signals (content fingerprint, span-set overlap) **plus** the
*coordination-canonical* inputs (alias table, curator binding); present as
**chorus-of-voices** or designated-canonical (a *policy* preset). (Scaling: §8.6.)
- **Union graph** — the navigable join of pages, links, and dimensions (namespace, genealogy,
version, shard, equivalence). A *derived lens over canonical files+journal, never a new
store* (the ZigZag boundary).
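
The split between derived signals and coordination-canonical inputs in identity resolution can be sketched with a small union-find. This is an illustrative assumption, not the spec's algorithm: `equivalence_classes` is a hypothetical name, fingerprints are modelled as opaque strings, exact fingerprint equality stands in for the real matching, and the alias table is assumed to map one page id to another.

```python
from collections import defaultdict


def equivalence_classes(fingerprints, curator_bindings, aliases):
    """Group page ids into equivalence classes (derived output).

    fingerprints: page id -> content fingerprint (derived signal)
    curator_bindings: iterable of (page id, page id) pairs (read from the journal)
    aliases: alias page id -> target page id (read from the journal)
    """
    parent = {page: page for page in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        if a in parent and b in parent:  # ignore references to unknown pages
            parent[find(a)] = find(b)

    # Derived signal: pages with identical fingerprints are joined.
    by_fingerprint = defaultdict(list)
    for page, fingerprint in fingerprints.items():
        by_fingerprint[fingerprint].append(page)
    for pages in by_fingerprint.values():
        for other in pages[1:]:
            union(pages[0], other)

    # Coordination-canonical inputs: consumed as given, never regenerated.
    for a, b in curator_bindings:
        union(a, b)
    for alias, target in aliases.items():
        union(alias, target)

    groups = defaultdict(set)
    for page in parent:
        groups[find(page)].add(page)
    return sorted(sorted(group) for group in groups.values())


fingerprints = {"a/Home": "f1", "b/Home": "f1", "c/Start": "f2", "d/Index": "f3"}
classes = equivalence_classes(fingerprints, [("c/Start", "a/Home")], {})
```

Running the function without the curator binding leaves `c/Start` in its own class. That is the point of the distinction: the index is recomputable, but only when the journal supplies the human decision as input.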