diff --git a/spec/CoreArchitectureBlueprint.md b/spec/CoreArchitectureBlueprint.md
index 33f7c5f..4073650 100644
--- a/spec/CoreArchitectureBlueprint.md
+++ b/spec/CoreArchitectureBlueprint.md
@@ -21,34 +21,51 @@ Scope relationship to the other specs:
 
 ---
 
-## 1. The thesis: *canonical at the edges, derived in the middle*
+## 1. The thesis: *canonical vs derived* (three states)
 
-Everything in shard-wiki follows from one organising decision:
+Everything in shard-wiki follows from one organising decision — that state comes in exactly
+**three kinds**, and only one of them is disposable:
 
-> **The canonical truth lives at the edges — in each shard (content) and in the Git
-> coordination journal (history). Everything shard-wiki computes in between — the union, the
-> projections, the views, the query indexes — is _derived state that can be deleted and
-> rebuilt_ from those edges.**
+> **1. Sharded-canonical** — content owned by each shard (shard sovereignty).
+> **2. Coordination-canonical** — durable state *born inside shard-wiki* that encodes human
+> or cross-shard decisions and exists nowhere else: overlays (the local truth against a
+> read-only shard), curator equivalence bindings, alias tables, merge/reconciliation
+> decisions. It lives in the **Git coordination journal**.
+> **3. Derived-disposable** — everything shard-wiki *computes* from (1)+(2): the union graph,
+> equivalence index, query indexes, projections, views. It can be deleted and recomputed.
+>
+> **Canonical = sharded ∪ coordination. Derived = a pure function of canonical:**
+> `derived = f(sharded, coordination)`.
 
 This is the architectural form of "orchestrator, not engine." shard-wiki never *becomes* the
-source of truth; it composes sources. The research earned this thesis empirically — every
-serious system externalises its durable truth to files+VCS and treats the rest as derived:
-Logseq (DataScript index over plain files), ikiwiki (static HTML compiled from a git repo),
-Glamorous Toolkit / Lepiter (live views over git-versioned JSON), Pharo (Tonel/Iceberg code
-as git text), Jupyter teams (nbstripout — outputs are derived noise). The one tradition that
-refuses this — the Smalltalk **image** — is exactly the one we record as a *boundary, not a
-backend* (`research/260614-squeak-pharo-deep-dive`).
+source of truth; it composes sources and records the decisions it makes about them. The
+research earned the *files-canonical* half empirically — every serious system externalises its
+durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain
+files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live
+views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams
+(nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk
+**image** — is exactly the one we record as a *boundary, not a backend*
+(`research/260614-squeak-pharo-deep-dive`).
+
+The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges,
+derived in the middle"). That was wrong by omission: it had no home for **coordination-
+canonical** state, and so contradicted itself by listing curator bindings and alias tables as
+"derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model
+fixes that crack (review finding A-1) and makes `derived = f(canonical)` *literally* true.
 
 Three consequences fall straight out, and they are the spine of the rest of this document:
 
-1. **Graceful degradation is free.** If the derived middle is always rebuildable, a backend
+1. **Graceful degradation is free.** If the derived tier is always recomputable, a backend
    that can only be read is still a first-class participant — you just derive less from it.
 2. **Provenance is tractable.** Because shard-wiki never claims to *be* the source, every
-   derived artifact can always point back to the canonical edge it came from (union without
+   derived artifact can always point back to the canonical input it came from (union without
    erasure is a structural property, not a feature bolted on).
-3. **The system is a pure function of its inputs.** `union/index/projection = f(shards,
-   journal)`. Bugs in the middle are recoverable by rebuild; the edges are the only thing
-   that must be protected (and history protects them).
+3. **The derived tier is a pure function of canonical state.** `derived = f(sharded,
+   coordination)`. Bugs in the derived tier are recoverable by recompute; only the two
+   canonical tiers must be durably protected — sharded by each shard, coordination by the
+   Git journal (history). *Recomputability is a correctness property of the derived tier, not
+   a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the
+   operational-envelope axis.*
 
 ### The dual narrow waist
 
@@ -73,7 +90,7 @@ principles fused with the research through-lines.
 | # | Invariant | Source |
 |---|-----------|--------|
 | I-1 | **Orchestrator, not engine.** Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
-| I-2 | **Canonical at the edges, derived in the middle.** Union/index/projection are rebuildable from shards + journal. | §1; Logseq/ikiwiki/GT through-line |
+| I-2 | **Three states; derived = f(canonical).** State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
 | I-3 | **Capability-awareness is data.** A binding's abilities are a *profile* (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
 | I-4 | **Union without erasure.** Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
 | I-5 | **Overlay before mutation.** Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
@@ -124,8 +141,11 @@ carries provenance; an authz decision at L5 records the context under which cont
 Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4
 (can we delegate a query?), and L5 (can this principal even reach the op?).
 
-The dependency rule is strict and downward: **L4 may be deleted and recomputed from L1–L3.**
-Nothing at L1–L3 may depend on L4 state.
+The dependency rule is strict and downward, and it tracks the **three states (§1)**, not the
+layer numbers: **the derived-disposable tier (the whole of L4) may be deleted and recomputed
+from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal).**
+Nothing canonical may depend on derived state. Note the journal at L3 is *canonical* (it holds
+overlays, bindings, aliases, merges); only L4 is disposable.
 
 ---
 
@@ -146,9 +166,13 @@ parallel terms.
   (paths/shards). Addressing, equivalence, and transclusion key on identity (I-9).
 - **Provenance envelope** — the metadata wrapper every artifact carries: source shard,
   freshness, liveness, authorization context, overlay status, divergence, derivation lineage.
-- **Coordination journal** — the L3 Git-backed record of change flows for a space.
+- **Coordination journal** — the L3 Git-backed record of change flows for a space, and the
+  durable home of all **coordination-canonical** state (§1): overlays, curator equivalence
+  bindings, alias tables, merge/reconciliation decisions. This state is born inside shard-wiki,
+  exists nowhere else, and is *not* derived — it must be committed, never recomputed.
 - **Overlay** — a non-destructive local edit against a remote/read-only/limited shard,
-  representable as draft/patch/commit/MR before destructive apply.
+  representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an
+  unapplied overlay is the local truth and lives in the journal.
 - **Projection** — a derived view of shard content, typed on two axes (§8): *kind*
   (replication | derivation) × *liveness* (static … irreducibly-live).
 - **Federation model** — the selected coordination strategy for a space (§ taxonomy, T17).
@@ -311,11 +335,16 @@ composes per shard** (mechanism over policy, I-7):
 
 ### 8.4 Union & projection (L4) — the derived cache
 
-This whole layer is rebuildable from L1–L3 (I-2). It comprises:
+This whole layer is **derived-disposable**: recomputable from canonical state — sharded
+content + the **coordination-canonical** inputs in the journal (I-2). Crucially, the *automatic*
+equivalence results are derived, but the **human/curatorial inputs they consume — alias tables
+and curator equivalence bindings — are coordination-canonical (they live in the journal), not
+derived**; recompute reads them, never regenerates them. It comprises:
 
 - **Identity resolution & equivalence** — detect "same topic / derived content" path-
-  independently (fingerprint, span-set overlap, alias table, curator binding); present as
-  **chorus-of-voices** or designated-canonical (a *policy* preset).
+  independently from *derived* signals (content fingerprint, span-set overlap) **plus** the
+  *coordination-canonical* inputs (alias table, curator binding); present as
+  **chorus-of-voices** or designated-canonical (a *policy* preset). (Scaling: §8.6.)
 - **Union graph** — the navigable join of pages, links, and dimensions (namespace,
   genealogy, version, shard, equivalence). A *derived lens over canonical files+journal, never
   a new store* (the ZigZag boundary).