Files
shard-wiki/spec/CoreArchitectureBlueprint.md
tegwick d39e7f7765 spec(SHARD-WP-0006 T5): track §C as O-8..O-12; refresh decisions/traceability; close-out
Adds O-8 (preset bundles), O-9 (shard sharing vs tenant partition), O-10
(span authz + transclusion ⊕), O-11 (union under unavailability), O-12
(append-log throughput) to §12. Refreshes §14 decisions (event-sourced
coordination, conformance, I-2 verification) and §16 traceability (round-2
review + WP-0006). Flips SHARD-WP-0006 done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 02:03:58 +02:00

973 lines
66 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CoreArchitectureBlueprint — shard-wiki
Status: **draft for review** · Date: 2026-06-15 · Owner: tegwick
The whole-system architecture for shard-wiki, synthesised from `INTENT.md`, the 84-entry
`UseCaseCatalog.md`, and the full research arc (`research/260608-*`, `research/260613-*`,
`research/260614-*` — ~23 wiki/knowledge systems plus two cross-dive syntheses). This is the
**core** blueprint: it defines the layers, the abstractions, and the load-bearing decisions
that everything else implements.
Scope relationship to the other specs:
- **`ArchitectureBlueprint.md`** (existing) is the **authorization & history sub-blueprint**
(the L0L4 ladder). This document references it as the design of the cross-cutting
authorization layer (§9) and does not restate it.
- **`SHARD-WP-0002`** is the workplan that turns §6§8 into
`spec/FederationArchitecture.md` + the adapter-contract section of
`spec/TechnicalSpecificationDocument.md`.
- **`UseCaseCatalog.md`** is the demand this architecture must satisfy; UC references below
are load tests, not decoration.
---
## 1. The thesis: *canonical vs derived* (three states)
Everything in shard-wiki follows from one organising decision — that state comes in exactly
**three kinds**, and only one of them is disposable:
> **1. Sharded-canonical** — content owned by each shard (shard sovereignty).
> **2. Coordination-canonical** — durable state *born inside shard-wiki* that encodes human
> or cross-shard decisions and exists nowhere else: overlays (the local truth against a
> read-only shard), curator equivalence bindings, alias tables, merge/reconciliation
> decisions. It is recorded as an **append-only decision log in the Git coordination
> journal** (event-sourced, §8.1); the *queryable current form* of that state (the effective
> alias table, the equivalence set) is a **derived fold** of the log — i.e. tier 3, not tier 2.
> What is canonical is the **log of decisions**, not any mutable snapshot of them.
> **3. Derived-disposable** — everything shard-wiki *computes* from (1)+(2): the union graph,
> equivalence index, query indexes, projections, views. It can be deleted and recomputed.
>
> **Canonical = sharded coordination. Derived = a pure function of canonical:**
> `derived = f(sharded, coordination)`.
This is the architectural form of "orchestrator, not engine." shard-wiki never *becomes* the
source of truth; it composes sources and records the decisions it makes about them. The
research earned the *files-canonical* half empirically — every serious system externalises its
durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain
files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live
views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams
(nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk
**image** — is exactly the one we record as a *boundary, not a backend*
(`research/260614-squeak-pharo-deep-dive`).
The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges,
derived in the middle"). That was wrong by omission: it had no home for **coordination-
canonical** state, and so contradicted itself by listing curator bindings and alias tables as
"derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model
fixes that crack (review finding A-1) and makes `derived = f(canonical)` *literally* true.
Three consequences fall straight out, and they are the spine of the rest of this document:
1. **Graceful degradation is free.** If the derived tier is always recomputable, a backend
that can only be read is still a first-class participant — you just derive less from it.
2. **Provenance is tractable.** Because shard-wiki never claims to *be* the source, every
derived artifact can always point back to the canonical input it came from (union without
erasure is a structural property, not a feature bolted on).
3. **The derived tier is a pure function of canonical state.** `derived = f(sharded,
coordination)`. Bugs in the derived tier are recoverable by recompute; only the two
canonical tiers must be durably protected — sharded by each shard, coordination by the
Git journal (history). *Recomputability is a correctness property of the derived tier, not
a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the
operational-envelope axis.*
### The dual narrow waist
Heterogeneity is mediated at exactly two interfaces, and nowhere else:
- **Bottom waist — the Shard Adapter Contract (§6).** Every backend, however weird, enters
through one versioned, capability-described interface.
- **Top waist — the Wiki Page Model (§7).** Every consumer, however demanding, sees one
backend-neutral, Markdown-first-but-stretchable page model.
Between the waists, core logic is written **once** against capabilities and the page model —
never against a specific backend. Adding TiddlyWiki or Notion or a git forge is writing an
adapter and declaring a capability profile, not editing core algorithms.
---
## 2. Architectural invariants
These are non-negotiable. Violating one is a design bug, not a tradeoff. They are INTENT's
principles fused with the research through-lines.
| # | Invariant | Source |
|---|-----------|--------|
| I-1 | **Orchestrator, not engine.** Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
| I-2 | **Three states; derived = f(canonical).** State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
| I-3 | **Capability-awareness is data.** A binding's abilities are a *profile* (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
| I-4 | **Union without erasure.** Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
| I-5 | **Overlay before mutation.** Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
| I-6 | **Git-addressable coordination.** Every information space has a Git-backed journal even when its shards are not git-native. | INTENT |
| I-7 | **Mechanism over policy.** Canonical-source, conflict, editorial, sync cadence are configurable presets, never hard-coded. | INTENT |
| I-8 | **Graceful degradation.** A limited backend is still usable as read-only / cache / projection / backup / patch target. | INTENT |
| I-9 | **Identity ≠ placement.** A page is an entity that may occupy N locations; address by identity, not by path. | Trilium note/branch; ZigZag |
| I-10 | **History is the floor.** Every write is a recoverable commit; recoverability, not gatekeeping, is the baseline protection. | ArchitectureBlueprint §2 |
| I-11 | **Authorization in core, authentication delegated.** Core decides who-may; an external provider says who-is. | INTENT; ArchitectureBlueprint |
| I-12 | **Not a file-sync daemon; not an execution platform.** Sync is wiki-page-semantic; computation is recognised+projected, not hosted. | INTENT; computational-page-model synthesis |
| I-13 | **Tenant-partitioned derived state.** Derived state is partitioned by tenant/root entity; no derived artifact spans tenants except via explicit, authorised cross-root federation. | §9.1; review B-3 |
---
## 3. The layered architecture
```
┌───────────────────────────────────────────────────────────────┐
│ L6 Consumers — Orchestrator API · CLI/agents · Web/Obsidian │
├───────────────────────────────────────────────────────────────┤
X-cut │ L5 Authorization (PEP/PDP, identity-provider iface) → │ X-cut
Prove- │ see ArchitectureBlueprint.md (L0L4 ladder) │ Capa-
nance ├───────────────────────────────────────────────────────────────┤ bility
▲ │ L4 Union & Projection (DERIVED · rebuild=fallback) │ ▲
│ │ identity resolution · equivalence/chorus · union graph · │ │
│ │ replication+derivation projections · moldable view registry│ │
│ │ · derived query index │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L3 Coordination (Git journal · overlay/patch engine · │ │
│ │ federation-model strategies · reconciliation) │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L2 Wiki Page Model ── TOP WAIST ── │ │
│ │ backend-neutral pages · identity≠placement · span address ·│ │
│ │ provenance envelope · the page shapes │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L1 Shard Adapter Contract ── BOTTOM WAIST ── │ │
│ │ versioned iface · capability profile (orthogonal) · │ │
│ │ attachment-mode binding · operation verbs │ │
└──── ├───────────────────────────────────────────────────────────────┤ ──┘
│ L0 Backends (not ours): git repos, wiki/ subdirs, Gitea/ │
│ GitLab/GitHub wikis, folders, Obsidian, WebDAV, Notion, │
│ Coulomb spaces, notebooks, … │
└───────────────────────────────────────────────────────────────┘
```
**Provenance** and **Capability** are drawn as vertical rails because they are not layers —
they are present at every layer. A page object at L2 carries provenance; a projection at L4
carries provenance; an authz decision at L5 records the context under which content was read.
Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4
(can we delegate a query?), and L5 (can this principal even reach the op?).
The dependency rule is strict and downward, and it tracks the **three states (§1)**, not the
layer numbers: **the derived-disposable tier (the whole of L4) may be deleted and recomputed
from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal).**
Nothing canonical may depend on derived state. Note the journal at L3 is *canonical* (it holds
overlays, bindings, aliases, merges); only L4 is disposable.
---
## 4. Core abstractions (the vocabulary code must use)
Straight from INTENT, sharpened by research. New code maps onto these; it does not invent
parallel terms.
- **Shard** — an independently meaningful page store attached to a root entity, with
*sovereignty*: its own backend, capability profile, history, identity model, limits.
- **Root entity / information space** — the joined space shards attach to; the unit of
Git coordination and of multi-tenancy (a tenant maps to a root entity, ArchitectureBlueprint).
- **Shard adapter contract** — the versioned L1 interface; the bottom waist.
- **Capability profile** — a shard binding's position on each of the 15 spectra (§6) plus its
supported verbs. *The* data structure that drives degradation.
- **Wiki page model** — the L2 backend-neutral page; the top waist.
- **Page identity vs placement vs equivalence** — a page is an entity with a *stable handle*
(identity); it may have N placements (paths/shards); **addressing and transclusion key on
identity, but equivalence keys on content fingerprint *across* identities** (§7.2, I-9). The
three are distinct mechanisms — never conflate identity with a fingerprint.
- **Provenance envelope** — the metadata each artifact carries (source shard, freshness,
liveness, authz context, overlay status, divergence, lineage), stored **layered**: a
page-level envelope + span-level *deltas*, so per-span cost is near-zero when uniform (§7.3).
- **Coordination journal** — the L3 Git-backed, **append-only decision log** for a space: the
durable home of all **coordination-canonical** state (§1, §8.1) as *events* (overlay-created,
binding-made, alias-set, merge-decided), plus the content change-flow record. It is event-
sourced — committed, never overwritten; the queryable current coordination state is a derived
fold of it (§8.1).
- **Overlay** — a non-destructive local edit against a remote/read-only/limited shard,
representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an
unapplied overlay is the local truth and lives in the journal.
- **Projection** — a derived view of shard content. The default is a **plain lazy
replication-projection** (a freshness-stamped cache); only *source* content needing
transform/evaluate uses the **derivation-projection** extension point with its two-axis
typing (kind × liveness) and the moldable view registry (§8.4§8.5).
- **Federation model** — the selected coordination strategy for a space (§ taxonomy, T17).
- **Shard mode** — read-only · write-through · mirrored · projected · cached · canonical
(a *policy* selection constrained by the capability profile).
---
## 5. Why "layered" and not "pipeline" or "plugin-bus"
Two rejected alternatives, recorded so the choice is legible:
- **A sync pipeline** (source → transform → sink) was rejected: it implies a privileged
direction and a canonical sink, which violates shard sovereignty (I-1) and union-without-
erasure (I-4). shard-wiki is a *star* (many shards ↔ one space), not a pipe.
- **A flat plugin bus** (every backend a peer plugin emitting events) was rejected as the
*top-level* shape: it has no narrow waist, so heterogeneity leaks into every consumer.
We keep the plugin idea but confine it to L1 (adapters) and L3 (federation strategies),
behind the waists.
The layered-with-rails shape is what makes I-2/I-3/I-4 hold simultaneously.
---
## 6. Bottom waist — the Shard Adapter Contract (L1)
The single most important design decision in the project: **the adapter contract models
positions on capability spectra, not a flat checklist of boolean verbs.** A backend is not
"can/can't merge"; it sits *somewhere* on the merge spectrum, and federation operations
degrade by position. This is the lesson of putting ~23 systems in one matrix
(`research/260614-shard-spectrum-synthesis`, v3).
### 6.1 The fifteen capability spectra
Each binding declares a position on each axis. Core algorithms read these positions; there is
no per-backend code in core (I-3).
1. **Addressing granularity** — none → path → page-level store-id → in-file span → in-file
block id (Logseq `id::`) → store-UUID → portable tumbler (Xanadu, the unreached ideal)
2. **Content identity** — none → path/title → fingerprint → span-set
3. **Identity vs placement** — path=identity → identity separated from placement (Trilium
note/branch = a DAG)
4. **Structure** — flat MD → frontmatter/`key::` → `%META%` → typed objects → DB schema+
relations → object-graph/ontology → computed (inherited+templated) → typed-graph statements
5. **History** — none → internal-only / CRDT-log → open-file → git-native
6. **Merge model** — none → git/text → conflict-notes/keep-both → native-CRDT → coexist-with-rank
7. **Native query** — none → text → build-your-own derived index → datalog/graph → DB query → SPARQL
8. **Translation** — native → lossless → lossy-with-fidelity-report (incl. HTML)
9. **Attachment mode** — file-store (native | interchange-mirror) → git-IS-store → in-engine-host
→ local-REST → external-API → direct-DB → CRDT-replica → P2P/no-central-endpoint
10. **Operational envelope** — local/unbounded → realtime CRDT/WebSocket → rate-limited/
eventually-consistent/paginated
11. **Access grant** — open → token → OAuth scoped+revocable → P2P key/invite → enterprise ACL
12. **Content opacity** — plaintext → structured re-evaluable value → encrypted whole-shard →
per-item → proprietary-lossy-exportable
13. **Write granularity** — whole-file (TiddlyWiki) → per-page → section/anchor → per-block → story-item
14. **Provenance granularity** — per-shard → per-page → per-edit → per-statement/value (Wikibase rank+refs)
15. **Computational / liveness** — static source → captured-output snapshot → live-over-files →
view-time render → irreducibly-live/temporal
### 6.2 Operation verbs
`read, write, diff, merge, lock, version, publish, notify, transclude-source,
translate-syntax, structured-payload, derive-projection, execute/evaluate`. The last two are
**gated, off by default** (§8, computational content). Verb support is part of the profile and
must reconcile with the federation-ops capability matrix (SHARD-WP-0002 T10).
### 6.3 Attachment-mode taxonomy (axis 9, expanded)
A backend may offer **several** modes; attach mode is a **per-binding, capability-gated
choice**, with one declared authoritative. Modes: file-store (native vault/folder *or* an
interchange/sync mirror), **git-IS-store** (the home case — forge wikis & ikiwiki: git is the
store *and* the journal at once, resolving the engine-mirror write-race), in-engine hosted
adapter (XWiki component, Obsidian/Logseq/Roam plugin, Trilium script), local-REST (Joplin
Data API, Trilium ETAPI), external-API-only (Notion), direct-DB (MojoMojo schema→model),
CRDT-replica (Anytype/AFFiNE/AppFlowy), P2P/no-central-endpoint. **Boundary:** a monolithic
live-memory blob (Smalltalk image, a kernel) is **never** an attach target — it participates
only via export→files (I-12).
### 6.4 Contract rules
- **Versioned interface** (Foswiki::Store + Foswiki::Meta is the proof that a stable
store-interface-with-swappable-backends works). Capability discovery is a static profile
with optional runtime negotiation.
- **Backend-swap tolerance** — shard identity/provenance survives a substrate change
(RCS↔PlainFile, folder→Git, Logseq file→SQLite): bind to *capabilities*, not to "it's files."
- **Absence is first-class** — the profile must express *can't* cleanly (Oddmuse floor), so
degradation paths are explicit, never guessed.
### 6.5 Orthogonal core, implied positions, and the interaction subset
Fifteen independent ordinal axes is *descriptively* right but would be *operationally* a mess
if treated as fifteen free dimensions: the axes are **not orthogonal**, and a degradation
function over all 15 jointly is the flat-checklist problem returning in higher dimensions
(review D-1). Three rules tame it.
**(a) A small orthogonal core; the rest are implied.** Most axes are *correlated* and collapse
to a few independent choices. The **core axes** an adapter must independently declare:
1. **Substrate** → drives attachment-mode, history, merge, and native-query positions together
(git-IS-store ⟹ history=git-native ⟹ merge=git/text ⟹ query=build-your-own-index;
relational-DB ⟹ direct-DB attach ⟹ DB-version-row history ⟹ DB query).
2. **Write granularity** → drives addressing granularity and the overlay/patch shape.
3. **Content opacity** → drives translation and (where encrypted) collapses native-query.
4. **Operational envelope** → drives freshness mode (§8.8) and rebuild expectations (§8.7).
5. **Access grant** → independent (authz, L5).
6. **Computational/liveness** → independent (projection kind, §8.5).
The remaining axes are **implied/derived** from these via published implication rules; an
adapter *may* override an implied position, but the default is computed, not hand-set. This
turns ~15 free dimensions into ~6 independent ones plus derivations — fewer things to get
wrong, and impossible combinations become unrepresentable.
**(b) Implication rules forbid impossible profiles.** E.g. `attachment=git-IS-store ⟹
history≥git-native`; `opacity=encrypted-whole-shard ⟹ native-query=none ∧ translation≤opaque`;
`merge=native-CRDT ⟹ history=CRDT-log ∧ envelope=realtime`. A profile that violates an
implication is rejected at registration — capability-as-data (I-3) with integrity constraints.
**(c) The degradation function reads a *named, small* interaction subset — not all pairs.**
"No per-backend code" is only credible if we say *which* axis interactions the generic logic
actually consults. They are:
| Operation | Axes consulted (jointly) |
|-----------|--------------------------|
| **write / overlay-apply** | write-granularity × merge-model × history × access-grant |
| **transclude / address a span** | addressing-granularity × write-granularity × identity-vs-placement |
| **project / cache** | operational-envelope × computational-liveness × content-opacity |
| **query** | native-query × content-opacity (encrypted ⇒ derive-index-or-none) |
| **translate** | translation × content-opacity × structure |
| **federate** | substrate × history × merge (per the §8.3 model) |
Everything else is a single-axis check. This table *is* the degradation contract: it is small,
enumerated, and testable — the proof obligation behind "core logic written once."
### 6.6 Conformance — profiles are verified, never self-asserted
Capability-as-data (I-3) and the entire degradation contract (§6.5) rest on one assumption:
**the profile tells the truth.** If an adapter declares `merge=git/text` but corrupts merges,
or claims `notify` and never emits, it silently poisons every degradation decision in core —
the failure is invisible because core *believed the data* (review B-2). So the profile is not
taken on trust:
- **The contract ships a versioned conformance suite.** A published battery that, given a live
binding, **exercises each declared verb and each declared spectrum position and checks that
observed behaviour matches the claim** (a `write` round-trips; a `diff` is real; `notify`
actually fires; an "encrypted/opaque" shard genuinely refuses plaintext query; an
implication-rule position, §6.5(b), holds). The suite is versioned *with* the contract, so an
adapter proves conformance against a known contract version.
- **Passing conformance is an admissibility precondition.** A binding that fails (declares a
capability it does not honour) is **rejected at registration**, not run in production with a
lying profile. Capability discovery (§6.4) therefore yields a *verified* profile.
- **Self-reported, then verified.** Adapters still *declare* their profile (discovery stays
cheap); conformance *verifies* the declaration. The two together are what make I-3 and §6.5
sound rather than aspirational — degradation logic acts on verified data.
- **Mismatch is data, not a crash.** A conformance gap is reported as a precise
capability-by-capability diff (what was claimed vs observed), so an adapter author fixes the
profile or the code; degraded-but-honest registration (drop the unsupported claim) is allowed.
This is the same discipline a versioned store interface needs in general (the `Foswiki::Store`
lineage that inspired the contract): a backend may only participate behind the interface if it
*demonstrably* behaves as the interface says.
---
## 7. Top waist — the Wiki Page Model (L2)
Backend-neutral, **Markdown-first but stretchable many ways at once**. The page model is the
lingua franca every consumer sees; an adapter's job is to project its backend into this model
(read) and accept overlays back (write), within its capabilities.
### 7.1 Page shapes the model must carry
- **Prose Markdown** — the baseline.
- **Typed / computed records** — frontmatter/`%META%`/XObjects/Notion DB rows; **computed
metadata** (Trilium inherited+templated) represented as *effective-vs-own with per-attribute
provenance*.
- **Typed-graph statements** — Wikibase claim + qualifiers + references + rank (structure
far-end).
- **Inline-embedded objects** — Quip/Notion spreadsheets & live apps inside prose.
- **Non-Markdown assets** — drawings, canvases, images: typed asset / opaque blob / pluggable
content-type registry, never silent-flattened.
- **The four computational shapes** (§8): one-source-many-projections, notebook (embedded
computed output), program-as-page, live/temporal.
All shapes reduce to a common skeleton: **`(content | source, structure, provenance envelope,
[derivation rule])`**. The page model stores the richest faithful form as canonical and treats
any Markdown rendering of a non-Markdown shape as a *lossy projection* (I-4 + fidelity report).
### 7.2 Identity, placement, addressing — three distinct concepts
The earlier draft used "identity" for two different things and (worse) suggested deriving page
identity from a content fingerprint — which would make *editing a page change its identity* and
break every reference to it (review bug B-1). They are pulled apart here:
- **Page identity — a *stable handle*.** A shard-scoped, durable key that **survives edits**:
the backend's native page/note id where one exists (Roam/Notion/Trilium uid, a git path
treated as a name, a wiki page name), wrapped in a shard scope so it survives projection and
never collides across shards. Identity is *assigned/minted, not computed from content*.
References, placement, transclusion targets, and overlays all key on identity.
- **Placement — *where* an identity sits.** One identity → N placements (paths/shards) = a DAG;
no single canonical path (I-9). Placement can change without changing identity.
- **Content equivalence — *detecting sameness*, never identity.** A **content fingerprint** (or
span-set overlap) identifies a *version / a piece of content*, used to detect that two
*distinct identities* hold the same or derived content (the equivalence/chorus mechanism,
§8.4). A fingerprint is never a page's identity: same page, edited → new fingerprint, **same
identity**; two pages, identical content → same fingerprint, **different identities**.
- **Span addressing** — a sub-page address within an identity: adopt native span IDs where
minted (Roam `:block/uid`, Logseq `id::`, Notion/CRDT UUID); else a *position* address
(path+range) or a *content-fingerprint* address for equivalence/transclusion. The Xanadu
tumbler is the portable ideal the scheme aims at without requiring.
- **Provenance envelope** rides on pages and spans (see §7.3 for its layered, low-cost form).
So the chain is: **identity (stable) → placements (N, mutable) → equivalence (cross-identity
sameness, fingerprint-based)** — three concepts, three mechanisms, never conflated.
### 7.3 Provenance is layered, not per-span-duplicated
A provenance envelope on *every span* (source shard, freshness, liveness, overlay status,
authz context, divergence, lineage) would, at block granularity, mean ~10k near-identical
envelopes for a 10k-block page — provenance dwarfing content (review D-2). The fix is the exact
pattern the page model already uses for Trilium's computed metadata: **effective-vs-own**.
- **Page-level envelope** holds the values that are uniform across the page (almost always:
source shard, observed-at, liveness, authz context).
- **Span-level deltas** record *only where a span differs* from its page envelope — a
transcluded span from another shard, an overlaid span, a span that diverges. A span with no
delta inherits the page envelope at zero storage cost.
- **Effective provenance** for any span = page envelope ⊕ span delta, computed on read.
Per-span cost is therefore **near-zero in the common (uniform) case** and pays only for genuine
heterogeneity — the same "carry only the difference" principle, applied to shard-wiki's own
metadata. Provenance remains complete (I-4); it is just not redundantly materialised.
---
## 8. Coordination, federation & projection
### 8.1 Coordination journal (L3) — Git as the spine
Every information space has a Git-backed coordination journal (I-6). It records cross-shard
operations (fork, import, reconcile, overlay-apply, space-branch) and **is** the history floor
(I-10). For git-IS-store shards the shard's own git log *is* this journal; for non-git shards
the journal supplements (begins-now / mirrors-forward / snapshots-replica) or imports
(backfill open file history). History portability is a spectrum, handled per profile (axis 5).
**The journal is an append-only decision log; current coordination state is a derived fold
(review B-3).** The first draft said coordination-canonical state "lives in the journal"
without saying how Git — excellent for history, poor for mutable structured state — represents
an alias table or an equivalence graph. Resolution: **event sourcing.** The journal stores
*decisions as events* (`overlay-created`, `binding-made`, `alias-set`, `merge-decided`,
`page-forked`), append-only and git-addressable (so history/patch/review/backup over
coordination state come for free — I-6 is *strengthened*, not bypassed). The **queryable
current state** (the effective alias table, the live equivalence set) is a **derived fold** of
the log — tier-3 disposable, indexed like any other derived structure (§8.7), rebuilt by
replaying the log. So "all equivalences touching X" is an index lookup, not an O(scan) of Git.
This is the clean form of the §1 three-state model: **the log is canonical; its folded current
state is derived.**
**Concurrency: who may append (review B-1).** A multi-tenant L4 deployment runs several
orchestrator instances, so "the journal is local Git, single writer" is not given. The model:
- **One *append authority* per information space.** Appends to a space's log are serialized
through a single logical writer (a per-space lease/leader; instances without the lease forward
their append intents to it). This makes the log a **totally-ordered event sequence** per space
— the ordering authority §8.6 relies on — without a distributed transaction. Spaces are
independent, so this scales horizontally *across* spaces (the unit of partition is the space /
root entity, matching the tenant partition, I-13); it is a per-space serialization point, not
a global one.
- **Git is the durable, addressable form; appends are commits** (or fast objects batched into
commits) under the lease — no concurrent-writer merge races because there is one writer at a
time per space.
- **Read-your-writes** holds within a space because every reader resolves current state from
the same ordered log (or its fold); across spaces there is no shared state to be inconsistent.
- **HA / failover:** the lease is time-bounded and re-grantable; a failed append-authority is
replaced and resumes from the log's head (the log is the recovery point). A partition that
splits the authority degrades that *space* to read-only until a single writer is re-elected —
it never forks the log (availability yields to log integrity; an explicit, stated trade).
- **Open residual (→ §12, O-3-adjacent):** whether very high append rates need per-space log
*sharding* (sub-logs merged by a deterministic order) is an implementation spike, not an
architectural change.
**History must stay recoverable *and* bounded (review C-3).** "Every write is a commit" + open
L0 means an unbounded, bot-/vandalism-amplified journal that eventually degrades Git itself.
Recoverability (I-10) is non-negotiable, so the answer is *compaction, not deletion*:
- **Routine git maintenance** — background `gc`/repack, commit-graph, and (for very large
spaces) partial-clone / sparse strategies; operational, no semantic change.
- **Squash-compaction of low-value churn (policy, §10)** — long runs of rapid same-author
edits or revert-pairs can be folded into checkpoint commits *while preserving the recoverable
endpoints*; what is squashed is configurable and always leaves the content recoverable (it
compacts the *path*, not the *reachable states*).
- **Per-shard history offload** — a git-IS-store shard keeps its own history in its own repo;
the coordination journal references it rather than duplicating it (the journal records
*coordination* events, not a second copy of every shard commit).
- **Anti-abuse hooks (policy)** — rate-limiting / quarantine for anonymous L0 writers feed the
authz/policy layer; they throttle *abuse*, never legitimate history. Recoverability is the
floor; bounding is how it survives at scale.
### 8.2 Overlay / patch engine (L3)
The default write path for anything below write-through capability (I-5): an edit becomes a
draft → patch/commit → MR, applied destructively only on explicit intent and only where the
profile + policy both permit. This is what lets a read-only or rate-limited or lossy backend
still be *edited* safely.
### 8.3 Federation is plural & composable (L3) — the model taxonomy
Federation is not one mechanism. shard-wiki selects a **federation model per space and
composes per shard** (mechanism over policy, I-7):
| Model | Anchor | Coordination shape |
|-------|--------|--------------------|
| **Fork + journal** (default home case) | Federated Wiki | copy-with-provenance + per-page action journal (story = replay) |
| **VCS-replication + ping** | ikiwiki | git clone/pull/push + change-ping |
| **Query-time graph-join** | Wikibase SPARQL `SERVICE` | join remote graphs at query time, no copy |
| **Feed aggregation** | RSS/Atom | inbound feed → pages |
| **Activity streams** | ActivityPub | Create/Update events, notify or content-bearing |
| **Engine-mirror** | Wiki.js DB↔Git | engine syncs its own store to a git mirror |
### 8.4 Union & projection (L4) — the derived cache
This whole layer is **derived-disposable**: recomputable from canonical state — sharded
content + the **coordination-canonical** inputs in the journal (I-2). Crucially, the *automatic*
equivalence results are derived, but the **human/curatorial inputs they consume — alias tables
and curator equivalence bindings — are coordination-canonical (they live in the journal), not
derived**; recompute reads them, never regenerates them. It comprises:
- **Identity resolution & equivalence** — detect "same topic / derived content" path-
independently from *derived* signals (content fingerprint, span-set overlap) **plus** the
*coordination-canonical* inputs (alias table, curator binding); present as
**chorus-of-voices** or designated-canonical (a *policy* preset). (Scaling: §8.7.)
- **Union graph** — the navigable join of pages, links, and dimensions (namespace, genealogy,
version, shard, equivalence). A *derived lens over canonical files+journal, never a new
store* (the ZigZag boundary).
- **Transclusion** — one **reference-not-copy** primitive unifying Xanadu transclusion, ZigZag
clone, Roam/Obsidian/Logseq embed, Notion synced block, Trilium note-cloning, and literate
named-chunk assembly, over the addressable union.
- **Projection — trivial by default, extensible for the tail.** The 95% case (Markdown in a
shard) must cost nothing conceptually, so:
- **Default = plain lazy replication-projection** — a freshness-stamped cache of remote
content (§8.8). This is *the* projection for ordinary pages; it needs no taxonomy, no
liveness reasoning, no registry. Most shards never touch anything below.
- **Extension point — derivation-projection** — invoked *only* for content that is a
*source* needing transform/compile/weave/evaluate (computational/typed content, §8.5). It
adds the liveness axis (static → captured → live-over-files → view-time → irreducibly-live)
and facets (materialization timing, multiplicity, continuity); the irreducibly-live far end
has no faithful static form (source + a marked recording). A binding that never serves such
content never instantiates any of this.
- Both kinds stamp freshness + provenance; only derivation carries the liveness machinery.
- **Moldable view registry — also an extension point, not a tax on every page.** Where a content
type offers multiple co-equal views (typed/computed/dimensional content), they are registered
as an **open, type-keyed set, none canonical-by-fact** (display-canonical is policy; GT prior
art, answers the "pluggable content-type registry" question). An ordinary Markdown page has
exactly one view and never consults the registry — the registry is queried only when a type
declares >1 view.
- **Derived query index** — delegate to a shard's native query engine where present
(Roam/Logseq Datalog, Notion DB query, XWiki XWQL, Wikibase SPARQL); else build a derived
index over the projection (the Logseq DataScript-over-files pattern). The index is
disposable (I-2).
### 8.5 Computational / executable content — the scope decision
**In scope as a page-model + projection concern; out of scope as an execution platform.**
shard-wiki *recognises* computational types, attaches the **canonical source**, and presents
derived forms as **provenance- and liveness-marked projections**. Driving a derivation
(tangle/weave, re-execute a notebook, render a sketch, evaluate a pattern) is a **gated
capability, off by default, with a trust/sandbox concern, degrading to a captured snapshot**.
One snapshot-provenance record (run id, source rev, timestamp, environment "unguaranteed")
serves notebooks, renders, and recordings alike. **No INTENT amendment is required** — this
lives inside the existing page model (L2) and projection model (L4).
### 8.6 Consistency, concurrency & conflict model
INTENT makes real-time cross-shard consistency a non-goal — but "no strong consistency" is not
the same as "no defined consistency." This is the guarantee shard-wiki *does* offer, and the
mechanism (not policy) that makes concurrent editing safe (review bug B-2).
**The consistency guarantee — causal, anchored on the journal:**
- **Read-your-writes for coordination-canonical state.** Once an overlay/binding/merge is
appended to the space's decision log, every reader of that space sees it — because the log is
a **single totally-ordered sequence per space** (one append authority, §8.1), and all readers
resolve current state from that one order. The guarantee holds across orchestrator instances,
not just within one process; it is cheap because ordering is per-space, never global.
- **Causal consistency across the derived tier.** The union/index/projections reflect a causal
cut of `(sharded inputs seen so far, journal)`. Effects never appear before their causes; a
projection that has seen journal commit *C* has seen everything *C* depends on.
- **Eventual convergence for sharded-canonical inputs.** Remote shard content is pulled
asynchronously (lazily or by notify/poll, §8.7); the union converges to each shard's latest
*as observed*, bounded by the shard's operational envelope. Freshness is always *shown*
(provenance envelope), never faked — a stale projection is labelled stale, not wrong.
So: **strong + read-your-writes for what shard-wiki owns (the journal); causal for what it
derives; eventual + freshness-labelled for what shards own.** No global clock, no distributed
transaction, no two-phase commit across shards — none is needed, because shard-wiki coordinates
rather than controls.
**Conflict detection & representation is core mechanism; only resolution is policy (I-7).**
The split the earlier draft elided:
- **Detection (core).** Divergence is detected structurally: two identities resolve as
equivalent (§8.4) but their content fingerprints differ, or an overlay's base revision no
longer matches the shard's current revision. Detection is always on; it is never optional.
- **Representation (core).** A detected conflict is **first-class data in the union**, not an
error: equivalent-but-divergent pages are presented as a **coexisting set** (the
chorus/keep-both representation), each fully attributed, with the divergence recorded in the
provenance envelope (union without erasure — a conflict is information, not a failure).
- **Resolution (policy).** *Which* version wins, or whether they stay coexisting, is a
configurable preset (§10): chorus / designated-canonical / git-merge / vote-to-merge /
overlay-only. Core never hard-codes one.
**Overlay-apply under source drift (the concurrent-write case).** An overlay carries the
**base revision** of the shard content it was authored against. On apply, core compares base to
the shard's current revision:
- *unchanged* → apply (fast-forward), commit to journal, propagate if the profile permits;
- *changed, non-overlapping* → three-way merge where the merge capability allows (axis 6),
else keep-both;
- *changed, overlapping* → **refuse + re-present** as a conflict (above); never silently
clobber (I-5, no silent remote mutation). The unapplied overlay remains coordination-
canonical and valid against its base.
**Ordering.** The journal commit is the ordering authority for coordination-canonical effects;
a shard-native write is only *acknowledged* in the journal after the adapter confirms it, so a
crash between journal-intent and shard-write is recoverable (the intent is replayable, the
write is idempotent-keyed on identity+base-rev). Cross-shard operations are ordered by their
journal commits, giving the causal cut above.
**Residual open items** (tracked in *Known scaling risks & open problems*, §12, not pretended
solved): the exact convergence bound for
high-write CRDT shards under partition, and whether per-equivalence-set divergence needs a
vector clock vs. a simple base-rev comparison, are deferred to implementation spikes.
### 8.7 Scaling the union — incremental-first, rebuild as fallback
The derived tier is *recomputable* (I-2) but recompute must never be the **operational**
mechanism. A from-scratch rebuild reads every page of every shard — including rate-limited,
paginated external APIs (Notion) and irreducibly-live sources — which can take hours to days
and directly fights the operational-envelope axis (review C-2). So:
**Incremental, change-driven maintenance is the primary mechanism.** Each shard's `notify`
capability (or a poll/ETag fallback where it has none, §8.8) emits **change events**; an event
drives a **delta update** to exactly the affected union nodes, equivalence candidates, indexes,
and projections. The derived tier is a continuously-maintained materialised view, not a
periodically-recomputed one. Steady-state cost is O(changes), not O(corpus).
**Full rebuild is a rare, bounded fallback** — for cold start, schema/algorithm change, or
suspected corruption — and it is **explicitly not required to be cheap**. It respects each
shard's envelope (it may be slow, throttled, or resumable for a rate-limited shard) and runs
*concurrently with serving the existing derived tier*; it swaps in atomically on completion.
I-2 guarantees rebuild is *possible and correct*, not instant.
**Equivalence detection is indexed, not pairwise (review C-1).** Naive fingerprint/span-set
comparison across all pages of all shards is O(N²) and is forbidden. Instead:
1. **Blocking / candidate generation** — cheap keys bucket pages that *could* be equivalent:
normalised title, normalised path tail, explicit alias-table entries (coordination-
canonical), and **MinHash/LSH bands over content shingles** for near-duplicate and
derived-content detection. Only within-bucket pairs are considered — turning O(N²) into
≈O(N) candidates.
2. **Verification** — candidate pairs are confirmed by full fingerprint / span-set overlap and
any curator binding. Confirmed equivalences become union edges.
3. **Incremental maintenance — the delta is *not* additive (review B-4).** A changed page may
*leave* buckets as well as *enter* them, and leaving a bucket can **break an existing
equivalence edge** another page relied on. So a change is processed as: (i) recompute the
page's bucket membership; (ii) for buckets it **left**, re-verify the pairs that depended on
the shared bucket and **retract** edges no longer supported; (iii) for buckets it **entered**,
verify the new candidate pairs and **add** edges; (iv) **propagate** to the equivalence
neighbours of any retracted/added edge (equivalence is transitive-ish via chorus sets, so a
retraction can split a set). Maintenance is per-change and bounded by the page's
neighbourhood, but it covers retraction and propagation — not just additions.
**The index is itself derived** (disposable, recomputable) and per-tenant-partitioned (§9).
Its parameters (LSH band/row counts, shingle size, precision/recall) are tunable; the accepted
**false-negative rate of blocking** is a known, tracked limitation (§12) — blocking trades a
small miss rate for tractability, and curator bindings are the escape hatch for misses.
**Verifying I-2 (`derived = f(canonical)`) — eventually, not on faith (review B-4).**
Incremental maintenance can drift from a from-scratch fold over time (a missed retraction, a
dropped event, a bug). I-2 is therefore an **eventually-verified** property, not a free one,
and the architecture names the mechanism that verifies it:
- **A digest of the derived tier.** Each partition's derived tier carries a rolling content
digest (a Merkle-style hash over union nodes/edges/index entries) maintained alongside the
incremental updates.
- **A background consistency-checker** periodically recomputes the digest over a *sampled* (or,
on a slow cadence, full) fold of canonical state and compares. A mismatch localises the drift
to a partition/region and triggers a **scoped recompute** of just that region — cheap relative
to a global rebuild, and self-healing.
- **So I-2 holds *eventually and verifiably*:** the incremental engine is the fast path, the
checker is the guarantee, and divergence is detected and repaired rather than silently
accumulating. The exact sampling rate / digest granularity is an implementation spike (§12).
### 8.8 Cache freshness & invalidation
Replication-projection caches remote shard content; cache invalidation is the actual hard part
and was missing from the first draft (review C-2). The protocol is **per-binding, driven by the
capability profile**, with one rule: **freshness is always represented, never assumed** — every
cached page's provenance envelope carries `(observed-at, source-rev-if-known, staleness-state)`,
so a consumer can always tell live from stale.
**Three invalidation modes, chosen by capability, not hard-coded:**
| Mode | When | Mechanism |
|------|------|-----------|
| **Event-driven (push)** | shard has `notify` | a change event invalidates exactly the affected entries and enqueues a delta refresh (§8.7); the preferred mode |
| **Validator poll** | shard exposes ETag / Last-Modified / rev | conditional fetch (`If-None-Match`); cheap "still fresh?" checks without transferring bodies |
| **TTL** | shard offers neither | time-bounded staleness; the floor mode (Oddmuse-class shards) |
Most real bindings are **hybrid**: event-driven for invalidation + a long TTL as a safety net
for missed events + validator polls on read when an entry is past a soft age.
**Operational-envelope coupling.** The mode is constrained by axis-10: a **rate-limited** shard
(Notion) *must* favour event-driven + long TTL and *must not* poll per-read — the freshness
policy is capability-gated like everything else. A local file shard can watch the filesystem
(near-instant invalidation, effectively event-driven for free).
**Thundering-herd / coalescing.** Concurrent reads of the same stale entry trigger a **single
in-flight refresh** (single-flight); other readers await it or are served the stale-but-labelled
value per policy. Bulk invalidations (a shard-wide event) are **batched and rate-shaped** to the
shard's envelope rather than fired as N concurrent fetches.
**Staleness is a policy knob, not a correctness bug.** Whether a reader gets *stale-but-fast* or
*blocks-for-fresh* is a §10 preset (per space or per request); either way the envelope tells the
truth about what was served. This is union-without-erasure applied to time.
---
## 9. Cross-cut — Authorization (L5)
Fully specified in **`ArchitectureBlueprint.md`** (the access & history sub-blueprint);
summarised here for completeness:
- **One core, a ladder of modes** L0 (open/c2, zero deps) → L1 (attributed) → L2
(authenticated) → L3 (role/group) → L4 (multi-tenant enterprise). Climbing is configuration,
not re-architecture.
- **PEP** wraps every adapter op; **PDP** decides `(principal, action, target)` over actions
`read/write/patch/merge/administer`, layered on the adapter's capability profile (a shard
that can't write can't be written regardless of policy — L5 consults the L1 rail).
- **Authentication delegated** to a pluggable IdentityProvider (null provider = L0 default);
real identity from `user-engine` over `net-kingdom` IAM.
- **Fail open only at L0, fail closed at L2+.** Authorization is pure/offline once a Principal
is resolved. Provenance carries authz context so the union never leaks unreadable content
(the L5↔provenance-rail interaction).
### 9.1 Tenant isolation of the derived tier (review B-3)
Read-time authz filtering is necessary but **not sufficient** when the derived tier is
*persisted*: a single cross-tenant union/index cache guarded only by a filter on read is a
standing leak surface (one filtering bug exposes another tenant's content). So isolation is
**structural, not just procedural**:
- **The derived tier is partitioned per tenant / root entity.** A tenant maps to a root entity
(§4); its union graph, equivalence index, projections, and caches live in a **separate
partition** keyed by that tenant. There is no shared cross-tenant derived store to leak from.
- **No cross-tenant equivalence by default.** Blocking/LSH (§8.7) operates *within* a partition;
cross-tenant equivalence is an explicit, authorised, opt-in federation between roots, never an
accident of a shared index.
- **Read-time filtering remains, as defence-in-depth** — the provenance envelope's authz context
is still checked, so even within a partition a principal sees only what it may; partitioning
removes the *blast radius*, filtering removes the *fine-grained* leak.
- **This reconciles I-2 with L5:** recomputability (a persisted-but-disposable derived tier) is
preserved *per partition* — each tenant's derived tier is independently rebuildable from that
tenant's canonical state — so isolation costs nothing in the rebuild model. At L0/L1 (single
tenant) there is one partition and the machinery is invisible.
**Isolation invariant (add to §2 as I-13):** *derived state is partitioned by tenant; no
derived artifact spans tenants except through an explicit, authorised cross-root federation.*
---
## 10. The policy surface (mechanism over policy, made concrete)
I-7 only means something if the policy knobs are enumerated and kept *out* of core algorithms.
The configurable presets are:
- **Canonical-source policy** — chorus / designated-canonical / git-merge / overlay-only /
vote-to-merge (per space or per equivalence set).
- **Federation model** — the §8.3 taxonomy, per space, composable per shard.
- **Shard mode** — read-only / write-through / mirrored / projected / cached / canonical
(constrained by the capability profile).
- **Reconciliation cadence & conflict exposure** — push/poll/manual; show-conflicts vs
auto-merge-when-supported.
- **Conflict-resolution preset** — chorus / designated-canonical / git-merge / vote-to-merge /
overlay-only (the *resolution* policy over §8.6's core detection; per space or equivalence set).
- **Freshness / invalidation mode** — event-driven / validator-poll / TTL / hybrid, and
stale-but-fast vs block-for-fresh on read (§8.8; constrained by the operational envelope).
- **History compaction** — squash policy for low-value churn, gc/repack cadence, per-shard
offload (§8.1), always preserving recoverable endpoints.
- **Tenant partition mapping** — tenant ↔ root-entity, and any explicit cross-root federation
(§9.1, I-13).
- **Execution policy** — derive/execute off (default) / sandboxed / per-shard-allowed.
- **Authorization mode** — the L0L4 ladder.
- **Projection materialization** — lazy/eager; snapshot vs view-time; recording retention.
Core ships sane defaults (L0 open; fork+journal; lazy replication-projection; event-driven+TTL
freshness; overlay-before-mutation; execution off; one tenant = one root) and never hard-codes
any of the above. (**Preset bundles** that package coherent knob-sets per persona are tracked
as O-8, §12 — flexibility without bundles is operator burden.)
---
## 11. Concrete module structure (bridge to implementation)
A proposed package layout for `src/shard_wiki/`, mapping 1:1 to the layers so the dependency
rule (downward only; the derived tier is incrementally maintained, rebuild = fallback) is
enforceable by import lint:
```
src/shard_wiki/
model/ # L2 top waist: Page, Identity, Placement, ProvenanceEnvelope,
# Span, the page-shape types; capability-spectrum value types
adapters/ # L1 bottom waist: AdapterContract (versioned iface), CapabilityProfile,
# attachment-mode binding; concrete adapters:
git/ folder/ gitea/ obsidian/ webdav/ notion/ … # each: profile + verbs
coordination/ # L3: DecisionLog (append-only, git-backed, per-space append authority/
# lease), OverlayEngine (draft→patch→MR), reconcile
# (current coordination state = a derived fold → lives in union/)
federation/ # L3: FederationModel strategies (fork_journal, vcs_ping,
# graph_join, feed, activitypub, engine_mirror)
union/ # L4 (derived): IdentityResolver, EquivalenceGraph, UnionGraph,
# Transclusion (reference-not-copy)
projection/ # L4 (derived): ReplicationProjection, DerivationProjection,
# ViewRegistry (moldable), QueryIndex (delegate|derive)
authz/ # L5 cross-cut: PDP, PEP, IdentityProvider iface, NullProvider
provenance/ # cross-cut LEAF: ProvenanceEnvelope type + ⊕ (effective) only — pure data
policy/ # cross-cut LEAF: the §10 policy surface (presets + a resolve() read by
# coordination/federation/projection/authz); owns NO mechanism
api/ # L6: orchestrator API (server-side union for agents/CLI)
```
**The cross-cutting rails are leaves, not god-modules (review D-4).** `provenance/` and
`policy/` are imported widely, so they are the highest coupling risk; the discipline that caps
it is: **they may import *nothing* in the tree and contain *only* stable data types + pure
functions** (the envelope and its ``; the policy presets and a `resolve(question) → choice`).
Mechanism never lives in a rail — `policy/` says *what* the preset is, `coordination/`/
`projection/` decide *how* to honour it. A change to a rail is then a change to a small, stable,
dependency-free leaf, not a ripple through every layer. Capability-spectrum value types live in
`model/` (also leaf-like) for the same reason.
Hard import rules (enforced by import lint):
- `union/` and `projection/` may import `model/`, `adapters/`, `coordination/`, `policy/`,
`provenance/` — but **nothing may import them** (they are the disposable derived tier).
- `model/`, `adapters/`, `provenance/`, `policy/` import nothing else in the tree (the waists
and rails stay thin); `provenance/` and `policy/` import nothing at all.
- `coordination/` and `federation/` may import the waists + rails, never the derived tier.
---
## 12. Known scaling risks & open problems
Tracked honestly rather than pretend-solved (review disposition F). Each has a **chosen
direction** and a **revisit trigger** — the thing that, if observed, forces a redesign.
| # | Risk / open problem | Chosen direction | Revisit trigger |
|---|---------------------|------------------|-----------------|
| O-1 | **Equivalence blocking misses true matches** (LSH false negatives, §8.7) | accept a small miss rate; curator bindings are the escape hatch | measured recall below an agreed threshold on real corpora |
| O-2 | **Convergence bound for high-write CRDT shards under partition** (§8.6) | causal via journal + CRDT-native merge at the shard; no global bound promised | user-visible divergence that outlives a partition |
| O-3 | **Per-equivalence-set divergence tracking** (§8.6) | start with base-rev comparison; add vector clocks only if needed | 3-way concurrent divergence that base-rev mis-orders |
| O-4 | **Persisted derived-tier cost ceiling** (§8.7/§9.1) | per-tenant partition, incremental-maintained, rebuild is fallback | a tenant whose incremental cost still exceeds budget |
| O-5 | **Axis-interaction completeness** (§6.5) | the named interaction table is the contract; extend deliberately | a real adapter needing an interaction not in the table |
| O-6 | **Span-address portability across projection** (§7.2) | shard-scoped native-id wrapping now; tumbler later | cross-shard transclusion that native ids can't satisfy |
| O-7 | **Squash-compaction vs. perfect auditability** (§8.1) | compact the *path*, preserve reachable states; configurable | a compliance need for every intermediate keystroke |
| O-8 | **Policy-knob proliferation → operator burden** (§10) | ship named **preset bundles** ("personal vault" / "team wiki" / "enterprise federation") over the policy surface | operators mis-configuring interacting knobs |
| O-9 | **Shard sharing across roots vs tenant partition** (§9.1, I-13) | shard exclusive to one root by default; explicit shared-read binding otherwise (avoids double-caching a rate-limited shard) | a shard legitimately needed live in two tenants |
| O-10 | **Span-level authz under transclusion** (aggregation/inference leak; ⊕ across boundaries, §7.3/§9) | a transcluded span inherits the **stricter** of source & host authz; provenance ⊕ composes the source-page envelope under the host | a real cross-authz transclusion |
| O-11 | **Union under shard unavailability** (§8.8 covers stale, not down) | **partial union** + per-shard "unavailable" provenance + last-known-projection where policy allows | an SLA need on partial reads |
| O-12 | **Per-space append-log throughput ceiling** (§8.1 append authority) | single writer per space scales across spaces; per-space log sharding if needed | a single space exceeding one writer's append rate |
These are the spec-writing inputs for `SHARD-WP-0002`; none blocks the architecture, each
scopes an implementation spike.
---
## 13. Canonical data flows (the architecture exercised)
**A. Attach a shard.** Adapter binds (chosen attachment mode) → probes/declares a capability
profile → core registers the shard under a root entity → if not git-native, the coordination
journal is seeded (begin-now/mirror/import per axis 5). No union rebuild yet (lazy).
**B. Read a page through the union.** Consumer asks the union for an identity → Identity
resolver maps it to placements across shards → equivalence yields chorus or canonical →
replication-projection lazily fetches from each shard (cache + freshness) → page returned
wrapped in its provenance envelope → L5 filters anything the principal can't see at source.
**C. Edit a read-only / limited shard.** Write request → L5 PDP allows → capability profile
says < write-through → OverlayEngine records a draft → renders a patch/MR in the shard's native
syntax (lossless) or Markdown (lossy-with-report) → on explicit apply, commit to the journal
and (if the profile permits) propagate; otherwise the overlay stands as the local truth, fully
attributed.
**D. Attach a computational notebook.** Adapter declares profile (attachment=file-store,
opacity=mixed, computational=captured-output). Core attaches the `.ipynb` **source** as
canonical; presents cells + embedded outputs as **derivation-projection snapshots** marked
"run N, env unguaranteed"; offers a static render via the view registry; re-execution stays
gated off. History uses paired-text/nbdime per axis 5.
---
## 14. Key tradeoffs & decisions
Decided:
- **Capability spectra over a verb checklist** — richer contract for precise, uniform
degradation; tamed by an orthogonal core + implied positions + a named interaction table
(§6.5). (Decided.)
- **Three states; derived = f(canonical)** — sharded + coordination canonical, derived
disposable (§1). (Decided; supersedes the earlier "edges vs middle" framing.)
- **Event-sourced coordination, one append authority per space** — coordination-canonical state
is an append-only **decision log** in the git journal; current state is a derived fold; a
per-space append lease gives a totally-ordered log and read-your-writes across orchestrator
instances (§8.1). (Decided — resolves the single-vs-multi-writer keystone.)
- **Profiles are verified, not asserted** — a versioned **conformance suite** gates adapter
admission; capability-as-data acts on verified data (§6.6). (Decided.)
- **I-2 is eventually-verified** — incremental maintenance is the fast path; a digest +
background consistency-checker detects and self-heals drift (§8.7). (Decided.)
- **Incremental-first, rebuild-as-fallback** — the derived tier is continuously maintained from
change events; full rebuild is rare and need not be cheap (§8.7). (Decided — resolves the
earlier "union graph persistence" open item: **persisted, per-tenant, incrementally
maintained, rebuildable**, §9.1.)
- **Identity ≠ fingerprint** — page identity is a stable handle; fingerprints are for
equivalence (§7.2). (Decided.)
- **Consistency = read-your-writes (journal) + causal (derived) + eventual/freshness-labelled
(shards)**; conflict detection/representation is core, resolution is policy (§8.6). (Decided.)
- **Address scheme** — shard-scoped native-id wrapping now; portable tumbler later (§7.2, O-6).
(Decided.)
- **Default federation = fork+journal over Git**; other models opt-in (§8.3). (Decided.)
- **Execution off by default** — recognise+project always; execute only when gated (§8.5). (Decided.)
- **Derived tier is tenant-partitioned** (I-13, §9.1). (Decided.)
Still open (carried to §12 / policy):
1. **L1 "attributed-but-open" mode** — ship it or jump L0→L2? (Carried from ArchitectureBlueprint.)
2. **Per-page ACL default** — off (per-shard/namespace) confirmed; revisit only if demand appears.
3. The implementation spikes in **§12** (O-1…O-7).
---
## 15. What this architecture is *not*
- Not a wiki engine, UI, or rendering pipeline (those are consumers at L6).
- Not a canonical-source-of-truth — shards keep sovereignty; the middle is derived.
- Not a generic file-sync daemon — synchronisation is wiki-page-semantic.
- Not an execution platform — computation is recognised and projected, not hosted.
- Not a universal ontology — no single schema is imposed on all shards.
- Not an authentication/identity store — that is delegated (authorization is owned).
---
## 16. Traceability
- **INTENT** — every invariant in §2 (I-1…I-13) cites an INTENT principle or boundary; no
invariant contradicts the Stability Note.
- **Review & hardening** — this revision folds in
`history/260615-core-architecture-blueprint-review.md` via **`SHARD-WP-0005`**: A-1→§1/§3/§4
(three-state re-frame), B-1→§7.2 (identity vs equivalence), B-2→§8.6 (consistency/conflict),
B-3→§9.1+I-13 (tenant isolation), C-1/C-2→§8.7/§8.8 (incremental + indexed + invalidation),
C-3→§8.1 (history scaling), D-1→§6.5 (orthogonal core), D-2→§7.3 (layered provenance),
D-3→§8.4 (common-case projection), D-4→§11 (policy module + rail discipline); open items→§12.
- **Round-2 review & hardening II** — folds in
`history/260615-core-architecture-blueprint-review-2.md` via **`SHARD-WP-0006`**:
A-1…A-4→§3/§4/§10/§11 (overview reconciled to the body), B-1+B-3→§8.1 (event-sourced
coordination + per-space append authority), B-2→§6.6 (adapter conformance suite),
B-4→§8.7 (incremental retraction/propagation + I-2 digest/checker); C-1…C-4→§12 (O-8…O-11).
- **Research** — §6 (spectra) ← `260614-shard-spectrum-synthesis` v3; §8.3 (federation
taxonomy) ← v3 §2.5; §8.4§8.5 (two-axis projection, view registry, computational scope) ←
`260614-computational-page-model-synthesis`; §7 page shapes ← the engine + modern-tool +
computational dives; §1 thesis ← the files-canonical/index-derived through-line across
Logseq/ikiwiki/GT/Pharo/Jupyter.
- **Use cases** — the architecture is sized to UC-01UC-84: federation/coordination (UC-0107,
2633, 56, 7172, 79) → §8; attachment/adapter (UC-3443, 50, 53, 57, 6062, 6466, 6870,
7682) → §6; page model & fidelity (UC-34, 39, 42, 55, 5859, 67, 73, 80, 8384) → §7/§8.5;
addressing/identity/query (UC-32, 4449, 5152, 54, 63, 74) → §7.2/§8.4; provenance &
metadata (UC-2425, 75) → the provenance rail; collaboration & discovery (UC-0823) → L6
consumers over the union.
- **Workplans** — §6§8 are the design target of `SHARD-WP-0002` (T11T18); §9 is owned by
`ArchitectureBlueprint.md`; §1 (yawex-derived resolution/overlay) aligns with
`SHARD-WP-0001`.
---
## 17. Stability note
This document defines shard-wiki's **internal** architecture; it may evolve as the spec
workplans land. But the **thesis (§1)**, the **invariants (§2)**, and the **dual narrow waist
(§1, §6, §7)** are load-bearing — changing any of them is an architectural change in the sense
of INTENT's Stability Note and should be rare and deliberate.