shard-wiki/spec/CoreArchitectureBlueprint.md
tegwick 04be66161e spec(SHARD-WP-0005 T3): consistency, concurrency & conflict model (§8.6)
Fixes bug B-2. States the guarantee (read-your-writes for journal-owned
coordination-canonical state; causal across the derived tier; eventual+
freshness-labelled for sharded inputs). Conflict detection+representation =
core mechanism, resolution = policy. Overlay-apply-under-drift semantics
(fast-forward / three-way / refuse+re-present) and journal ordering.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:35:00 +02:00
# CoreArchitectureBlueprint — shard-wiki
Status: **draft for review** · Date: 2026-06-15 · Owner: tegwick
The whole-system architecture for shard-wiki, synthesised from `INTENT.md`, the 84-entry
`UseCaseCatalog.md`, and the full research arc (`research/260608-*`, `research/260613-*`,
`research/260614-*` — ~23 wiki/knowledge systems plus two cross-dive syntheses). This is the
**core** blueprint: it defines the layers, the abstractions, and the load-bearing decisions
that everything else implements.
Scope relationship to the other specs:
- **`ArchitectureBlueprint.md`** (existing) is the **authorization & history sub-blueprint**
(the L0–L4 ladder). This document references it as the design of the cross-cutting
authorization layer (§9) and does not restate it.
- **`SHARD-WP-0002`** is the workplan that turns §6–§8 into
`spec/FederationArchitecture.md` + the adapter-contract section of
`spec/TechnicalSpecificationDocument.md`.
- **`UseCaseCatalog.md`** is the demand this architecture must satisfy; UC references below
are load tests, not decoration.
---
## 1. The thesis: *canonical vs derived* (three states)
Everything in shard-wiki follows from one organising decision — that state comes in exactly
**three kinds**, and only one of them is disposable:
> **1. Sharded-canonical** — content owned by each shard (shard sovereignty).
> **2. Coordination-canonical** — durable state *born inside shard-wiki* that encodes human
> or cross-shard decisions and exists nowhere else: overlays (the local truth against a
> read-only shard), curator equivalence bindings, alias tables, merge/reconciliation
> decisions. It lives in the **Git coordination journal**.
> **3. Derived-disposable** — everything shard-wiki *computes* from (1)+(2): the union graph,
> equivalence index, query indexes, projections, views. It can be deleted and recomputed.
>
> **Canonical = sharded + coordination. Derived = a pure function of canonical:**
> `derived = f(sharded, coordination)`.
This is the architectural form of "orchestrator, not engine." shard-wiki never *becomes* the
source of truth; it composes sources and records the decisions it makes about them. The
research earned the *files-canonical* half empirically — every serious system externalises its
durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain
files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live
views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams
(nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk
**image** — is exactly the one we record as a *boundary, not a backend*
(`research/260614-squeak-pharo-deep-dive`).
The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges,
derived in the middle"). That was wrong by omission: it had no home for **coordination-
canonical** state, and so contradicted itself by listing curator bindings and alias tables as
"derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model
fixes that crack (review finding A-1) and makes `derived = f(canonical)` *literally* true.
Three consequences fall straight out, and they are the spine of the rest of this document:
1. **Graceful degradation is free.** If the derived tier is always recomputable, a backend
that can only be read is still a first-class participant — you just derive less from it.
2. **Provenance is tractable.** Because shard-wiki never claims to *be* the source, every
derived artifact can always point back to the canonical input it came from (union without
erasure is a structural property, not a feature bolted on).
3. **The derived tier is a pure function of canonical state.** `derived = f(sharded,
coordination)`. Bugs in the derived tier are recoverable by recompute; only the two
canonical tiers must be durably protected — sharded by each shard, coordination by the
Git journal (history). *Recomputability is a correctness property of the derived tier, not
a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the
operational-envelope axis.*
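As a minimal sketch of consequence 3, the derived tier can be modelled as a pure function whose only inputs are the two canonical tiers. The names `Canonical` and `derive`, and the toy "topic index" it builds, are hypothetical illustrations, not part of the specified design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    # Sharded-canonical: page text keyed by (shard, page id), owned by the shards.
    sharded: dict
    # Coordination-canonical: a curator alias table held in the journal.
    aliases: dict


def derive(canonical: Canonical) -> dict:
    """Pure function: builds a topic index from canonical inputs only."""
    index = {}
    for shard, page_id in sorted(canonical.sharded):
        # The alias table is read here, never regenerated.
        topic = canonical.aliases.get(page_id, page_id)
        index.setdefault(topic, []).append((shard, page_id))
    return index


canonical = Canonical(
    sharded={("git", "Home"): "# Home", ("notion", "Start"): "# Start"},
    aliases={"Start": "Home"},
)
first = derive(canonical)
rebuilt = derive(canonical)  # stands in for "delete the derived tier, recompute"
assert first == rebuilt
```

Because `derive` reads nothing but its argument, discarding its output loses no information; that property, not rebuild speed, is what the sketch demonstrates.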
### The dual narrow waist
Heterogeneity is mediated at exactly two interfaces, and nowhere else:
- **Bottom waist — the Shard Adapter Contract (§6).** Every backend, however weird, enters
through one versioned, capability-described interface.
- **Top waist — the Wiki Page Model (§7).** Every consumer, however demanding, sees one
backend-neutral, Markdown-first-but-stretchable page model.
Between the waists, core logic is written **once** against capabilities and the page model —
never against a specific backend. Adding TiddlyWiki or Notion or a git forge is writing an
adapter and declaring a capability profile, not editing core algorithms.
---
## 2. Architectural invariants
These are non-negotiable. Violating one is a design bug, not a tradeoff. They are INTENT's
principles fused with the research through-lines.
| # | Invariant | Source |
|---|-----------|--------|
| I-1 | **Orchestrator, not engine.** Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
| I-2 | **Three states; derived = f(canonical).** State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
| I-3 | **Capability-awareness is data.** A binding's abilities are a *profile* (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
| I-4 | **Union without erasure.** Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
| I-5 | **Overlay before mutation.** Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
| I-6 | **Git-addressable coordination.** Every information space has a Git-backed journal even when its shards are not git-native. | INTENT |
| I-7 | **Mechanism over policy.** Canonical-source, conflict, editorial, sync cadence are configurable presets, never hard-coded. | INTENT |
| I-8 | **Graceful degradation.** A limited backend is still usable as read-only / cache / projection / backup / patch target. | INTENT |
| I-9 | **Identity ≠ placement.** A page is an entity that may occupy N locations; address by identity, not by path. | Trilium note/branch; ZigZag |
| I-10 | **History is the floor.** Every write is a recoverable commit; recoverability, not gatekeeping, is the baseline protection. | ArchitectureBlueprint §2 |
| I-11 | **Authorization in core, authentication delegated.** Core decides who-may; an external provider says who-is. | INTENT; ArchitectureBlueprint |
| I-12 | **Not a file-sync daemon; not an execution platform.** Sync is wiki-page-semantic; computation is recognised+projected, not hosted. | INTENT; computational-page-model synthesis |
---
## 3. The layered architecture
```
┌───────────────────────────────────────────────────────────────┐
│ L6 Consumers — Orchestrator API · CLI/agents · Web/Obsidian │
├───────────────────────────────────────────────────────────────┤
X-cut │ L5 Authorization (PEP/PDP, identity-provider iface) → │ X-cut
Prove- │ see ArchitectureBlueprint.md (L0–L4 ladder) │ Capa-
nance ├───────────────────────────────────────────────────────────────┤ bility
▲ │ L4 Union & Projection (DERIVED, rebuildable cache) │ ▲
│ │ identity resolution · equivalence/chorus · union graph · │ │
│ │ replication+derivation projections · moldable view registry│ │
│ │ · derived query index │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L3 Coordination (Git journal · overlay/patch engine · │ │
│ │ federation-model strategies · reconciliation) │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L2 Wiki Page Model ── TOP WAIST ── │ │
│ │ backend-neutral pages · identity≠placement · span address ·│ │
│ │ provenance envelope · the page shapes │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L1 Shard Adapter Contract ── BOTTOM WAIST ── │ │
│ │ versioned iface · capability profile (15 spectra) · │ │
│ │ attachment-mode binding · operation verbs │ │
└──── ├───────────────────────────────────────────────────────────────┤ ──┘
│ L0 Backends (not ours): git repos, wiki/ subdirs, Gitea/ │
│ GitLab/GitHub wikis, folders, Obsidian, WebDAV, Notion, │
│ Coulomb spaces, notebooks, … │
└───────────────────────────────────────────────────────────────┘
```
**Provenance** and **Capability** are drawn as vertical rails because they are not layers —
they are present at every layer. A page object at L2 carries provenance; a projection at L4
carries provenance; an authz decision at L5 records the context under which content was read.
Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4
(can we delegate a query?), and L5 (can this principal even reach the op?).
The dependency rule is strict and downward, and it tracks the **three states (§1)**, not the
layer numbers: **the derived-disposable tier (the whole of L4) may be deleted and recomputed
from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal).**
Nothing canonical may depend on derived state. Note the journal at L3 is *canonical* (it holds
overlays, bindings, aliases, merges); only L4 is disposable.
---
## 4. Core abstractions (the vocabulary code must use)
Straight from INTENT, sharpened by research. New code maps onto these; it does not invent
parallel terms.
- **Shard** — an independently meaningful page store attached to a root entity, with
*sovereignty*: its own backend, capability profile, history, identity model, limits.
- **Root entity / information space** — the joined space shards attach to; the unit of
Git coordination and of multi-tenancy (a tenant maps to a root entity, ArchitectureBlueprint).
- **Shard adapter contract** — the versioned L1 interface; the bottom waist.
- **Capability profile** — a shard binding's position on each of the 15 spectra (§6) plus its
supported verbs. *The* data structure that drives degradation.
- **Wiki page model** — the L2 backend-neutral page; the top waist.
- **Page identity vs placement** — a page is an entity (identity); it may have N placements
(paths/shards). Addressing, equivalence, and transclusion key on identity (I-9).
- **Provenance envelope** — the metadata wrapper every artifact carries: source shard,
freshness, liveness, authorization context, overlay status, divergence, derivation lineage.
- **Coordination journal** — the L3 Git-backed record of change flows for a space, and the
durable home of all **coordination-canonical** state (§1): overlays, curator equivalence
bindings, alias tables, merge/reconciliation decisions. This state is born inside shard-wiki,
exists nowhere else, and is *not* derived — it must be committed, never recomputed.
- **Overlay** — a non-destructive local edit against a remote/read-only/limited shard,
representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an
unapplied overlay is the local truth and lives in the journal.
- **Projection** — a derived view of shard content, typed on two axes (§8.4): *kind*
(replication | derivation) × *liveness* (static … irreducibly-live).
- **Federation model** — the selected coordination strategy for a space (§8.3 taxonomy, T17).
- **Shard mode** — read-only · write-through · mirrored · projected · cached · canonical
(a *policy* selection constrained by the capability profile).
---
## 5. Why "layered" and not "pipeline" or "plugin-bus"
Two rejected alternatives, recorded so the choice is legible:
- **A sync pipeline** (source → transform → sink) was rejected: it implies a privileged
direction and a canonical sink, which violates shard sovereignty (I-1) and union-without-
erasure (I-4). shard-wiki is a *star* (many shards ↔ one space), not a pipe.
- **A flat plugin bus** (every backend a peer plugin emitting events) was rejected as the
*top-level* shape: it has no narrow waist, so heterogeneity leaks into every consumer.
We keep the plugin idea but confine it to L1 (adapters) and L3 (federation strategies),
behind the waists.
The layered-with-rails shape is what makes I-2/I-3/I-4 hold simultaneously.
---
## 6. Bottom waist — the Shard Adapter Contract (L1)
The single most important design decision in the project: **the adapter contract models
positions on capability spectra, not a flat checklist of boolean verbs.** A backend is not
"can/can't merge"; it sits *somewhere* on the merge spectrum, and federation operations
degrade by position. This is the lesson of putting ~23 systems in one matrix
(`research/260614-shard-spectrum-synthesis`, v3).
### 6.1 The fifteen capability spectra
Each binding declares a position on each axis. Core algorithms read these positions; there is
no per-backend code in core (I-3).
1. **Addressing granularity** — none → path → page-level store-id → in-file span → in-file
block id (Logseq `id::`) → store-UUID → portable tumbler (Xanadu, the unreached ideal)
2. **Content identity** — none → path/title → fingerprint → span-set
3. **Identity vs placement** — path=identity → identity separated from placement (Trilium
note/branch = a DAG)
4. **Structure** — flat MD → frontmatter/`key::` → `%META%` → typed objects → DB schema+
relations → object-graph/ontology → computed (inherited+templated) → typed-graph statements
5. **History** — none → internal-only / CRDT-log → open-file → git-native
6. **Merge model** — none → git/text → conflict-notes/keep-both → native-CRDT → coexist-with-rank
7. **Native query** — none → text → build-your-own derived index → datalog/graph → DB query → SPARQL
8. **Translation** — native → lossless → lossy-with-fidelity-report (incl. HTML)
9. **Attachment mode** — file-store (native | interchange-mirror) → git-IS-store → in-engine-host
→ local-REST → external-API → direct-DB → CRDT-replica → P2P/no-central-endpoint
10. **Operational envelope** — local/unbounded → realtime CRDT/WebSocket → rate-limited/
eventually-consistent/paginated
11. **Access grant** — open → token → OAuth scoped+revocable → P2P key/invite → enterprise ACL
12. **Content opacity** — plaintext → structured re-evaluable value → encrypted whole-shard →
per-item → proprietary-lossy-exportable
13. **Write granularity** — whole-file (TiddlyWiki) → per-page → section/anchor → per-block → story-item
14. **Provenance granularity** — per-shard → per-page → per-edit → per-statement/value (Wikibase rank+refs)
15. **Computational / liveness** — static source → captured-output snapshot → live-over-files →
view-time render → irreducibly-live/temporal
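One way to make "positions on spectra" concrete is an ordered enum per axis, so generic core logic can compare positions instead of branching on backend names. The sketch below covers only axis 5 (history). `CapabilityProfile` is the name used in §11, but its fields, the `journal_seeding` helper, and the returned strategy labels are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class History(IntEnum):
    """Axis 5, ordered from least to most portable history."""
    NONE = 0
    INTERNAL_ONLY = 1  # internal-only / CRDT-log
    OPEN_FILE = 2
    GIT_NATIVE = 3


@dataclass(frozen=True)
class CapabilityProfile:
    history: History
    verbs: frozenset  # supported operation verbs, e.g. {"read", "write"}


def journal_seeding(profile: CapabilityProfile) -> str:
    """Generic core logic: decides by spectrum position, not backend name."""
    if profile.history >= History.GIT_NATIVE:
        return "shard-log-is-journal"   # hypothetical label
    if profile.history >= History.OPEN_FILE:
        return "import-backfill"        # hypothetical label
    return "begin-now"                  # hypothetical label


# Two made-up bindings; the positions are illustrative, not measured.
git_like = CapabilityProfile(History.GIT_NATIVE, frozenset({"read", "write", "diff"}))
api_only = CapabilityProfile(History.INTERNAL_ONLY, frozenset({"read"}))
print(journal_seeding(git_like), journal_seeding(api_only))
```

A new backend changes only the profile values it declares; `journal_seeding` stays untouched, which is the point of I-3.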
### 6.2 Operation verbs
`read, write, diff, merge, lock, version, publish, notify, transclude-source,
translate-syntax, structured-payload, derive-projection, execute/evaluate`. The last two are
**gated, off by default** (§8.5, computational content). Verb support is part of the profile and
must reconcile with the federation-ops capability matrix (SHARD-WP-0002 T10).
### 6.3 Attachment-mode taxonomy (axis 9, expanded)
A backend may offer **several** modes; attach mode is a **per-binding, capability-gated
choice**, with one declared authoritative. Modes: file-store (native vault/folder *or* an
interchange/sync mirror), **git-IS-store** (the home case — forge wikis & ikiwiki: git is the
store *and* the journal at once, resolving the engine-mirror write-race), in-engine hosted
adapter (XWiki component, Obsidian/Logseq/Roam plugin, Trilium script), local-REST (Joplin
Data API, Trilium ETAPI), external-API-only (Notion), direct-DB (MojoMojo schema→model),
CRDT-replica (Anytype/AFFiNE/AppFlowy), P2P/no-central-endpoint. **Boundary:** a monolithic
live-memory blob (Smalltalk image, a kernel) is **never** an attach target — it participates
only via export→files (I-12).
### 6.4 Contract rules
- **Versioned interface** (Foswiki::Store + Foswiki::Meta is the proof that a stable
store-interface-with-swappable-backends works). Capability discovery is a static profile
with optional runtime negotiation.
- **Backend-swap tolerance** — shard identity/provenance survives a substrate change
(RCS↔PlainFile, folder→Git, Logseq file→SQLite): bind to *capabilities*, not to "it's files."
- **Absence is first-class** — the profile must express *can't* cleanly (Oddmuse floor), so
degradation paths are explicit, never guessed.
---
## 7. Top waist — the Wiki Page Model (L2)
Backend-neutral, **Markdown-first but stretchable many ways at once**. The page model is the
lingua franca every consumer sees; an adapter's job is to project its backend into this model
(read) and accept overlays back (write), within its capabilities.
### 7.1 Page shapes the model must carry
- **Prose Markdown** — the baseline.
- **Typed / computed records** — frontmatter/`%META%`/XObjects/Notion DB rows; **computed
metadata** (Trilium inherited+templated) represented as *effective-vs-own with per-attribute
provenance*.
- **Typed-graph statements** — Wikibase claim + qualifiers + references + rank (structure
far-end).
- **Inline-embedded objects** — Quip/Notion spreadsheets & live apps inside prose.
- **Non-Markdown assets** — drawings, canvases, images: typed asset / opaque blob / pluggable
content-type registry, never silent-flattened.
- **The four computational shapes** (§8): one-source-many-projections, notebook (embedded
computed output), program-as-page, live/temporal.
All shapes reduce to a common skeleton: **`(content | source, structure, provenance envelope,
[derivation rule])`**. The page model stores the richest faithful form as canonical and treats
any Markdown rendering of a non-Markdown shape as a *lossy projection* (I-4 + fidelity report).
### 7.2 Identity, placement, addressing — three distinct concepts
The earlier draft used "identity" for two different things and (worse) suggested deriving page
identity from a content fingerprint — which would make *editing a page change its identity* and
break every reference to it (review bug B-1). They are pulled apart here:
- **Page identity — a *stable handle*.** A shard-scoped, durable key that **survives edits**:
the backend's native page/note id where one exists (Roam/Notion/Trilium uid, a git path
treated as a name, a wiki page name), wrapped in a shard scope so it survives projection and
never collides across shards. Identity is *assigned/minted, not computed from content*.
References, placement, transclusion targets, and overlays all key on identity.
- **Placement — *where* an identity sits.** One identity → N placements (paths/shards) = a DAG;
no single canonical path (I-9). Placement can change without changing identity.
- **Content equivalence — *detecting sameness*, never identity.** A **content fingerprint** (or
span-set overlap) identifies a *version / a piece of content*, used to detect that two
*distinct identities* hold the same or derived content (the equivalence/chorus mechanism,
§8.4). A fingerprint is never a page's identity: same page, edited → new fingerprint, **same
identity**; two pages, identical content → same fingerprint, **different identities**.
- **Span addressing** — a sub-page address within an identity: adopt native span IDs where
minted (Roam `:block/uid`, Logseq `id::`, Notion/CRDT UUID); else a *position* address
(path+range) or a *content-fingerprint* address for equivalence/transclusion. The Xanadu
tumbler is the portable ideal the scheme aims at without requiring.
- **Provenance envelope** rides on pages and spans (see §7.3 for its layered, low-cost form).
So the chain is: **identity (stable) → placements (N, mutable) → equivalence (cross-identity
sameness, fingerprint-based)** — three concepts, three mechanisms, never conflated.
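The separation can be sketched in a few lines. `PageIdentity`, `Page`, and `fingerprint` are hypothetical names, and SHA-256 is an assumed stand-in for whatever content fingerprint the implementation chooses.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageIdentity:
    shard: str      # shard scope prevents collisions across shards
    native_id: str  # minted by the backend, never computed from content


@dataclass
class Page:
    identity: PageIdentity
    content: str
    placements: set = field(default_factory=set)  # one identity, N placements


def fingerprint(content: str) -> str:
    """Identifies a version of content, not a page (SHA-256 is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


a = Page(PageIdentity("vault", "n1"), "Hello", {"notes/hello.md"})
b = Page(PageIdentity("forge", "Hello"), "Hello", {"Hello.md"})
before = fingerprint(a.content)

# Two pages, identical content: same fingerprint, different identities.
same_content = fingerprint(b.content) == before and a.identity != b.identity

# Same page, edited and re-placed: new fingerprint, same identity.
identity_before = a.identity
a.content = "Hello, world"
a.placements.add("inbox/hello.md")
```

References and overlays would key on `a.identity`, which neither the edit nor the extra placement disturbs.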
---
## 8. Coordination, federation & projection
### 8.1 Coordination journal (L3) — Git as the spine
Every information space has a Git-backed coordination journal (I-6). It records cross-shard
operations (fork, import, reconcile, overlay-apply, space-branch) and **is** the history floor
(I-10). For git-IS-store shards the shard's own git log *is* this journal; for non-git shards
the journal supplements (begins-now / mirrors-forward / snapshots-replica) or imports
(backfill open file history). History portability is a spectrum, handled per profile (axis 5).
### 8.2 Overlay / patch engine (L3)
The default write path for anything below write-through capability (I-5): an edit becomes a
draft → patch/commit → MR, applied destructively only on explicit intent and only where the
profile + policy both permit. This is what lets a read-only or rate-limited or lossy backend
still be *edited* safely.
### 8.3 Federation is plural & composable (L3) — the model taxonomy
Federation is not one mechanism. shard-wiki selects a **federation model per space and
composes per shard** (mechanism over policy, I-7):
| Model | Anchor | Coordination shape |
|-------|--------|--------------------|
| **Fork + journal** (default home case) | Federated Wiki | copy-with-provenance + per-page action journal (story = replay) |
| **VCS-replication + ping** | ikiwiki | git clone/pull/push + change-ping |
| **Query-time graph-join** | Wikibase SPARQL `SERVICE` | join remote graphs at query time, no copy |
| **Feed aggregation** | RSS/Atom | inbound feed → pages |
| **Activity streams** | ActivityPub | Create/Update events, notify or content-bearing |
| **Engine-mirror** | Wiki.js DB↔Git | engine syncs its own store to a git mirror |
### 8.4 Union & projection (L4) — the derived cache
This whole layer is **derived-disposable**: recomputable from canonical state — sharded
content + the **coordination-canonical** inputs in the journal (I-2). Crucially, the *automatic*
equivalence results are derived, but the **human/curatorial inputs they consume — alias tables
and curator equivalence bindings — are coordination-canonical (they live in the journal), not
derived**; recompute reads them, never regenerates them. It comprises:
- **Identity resolution & equivalence** — detect "same topic / derived content" path-
independently from *derived* signals (content fingerprint, span-set overlap) **plus** the
*coordination-canonical* inputs (alias table, curator binding); present as
**chorus-of-voices** or designated-canonical (a *policy* preset). (Scaling: §8.7.)
- **Union graph** — the navigable join of pages, links, and dimensions (namespace, genealogy,
version, shard, equivalence). A *derived lens over canonical files+journal, never a new
store* (the ZigZag boundary).
- **Transclusion** — one **reference-not-copy** primitive unifying Xanadu transclusion, ZigZag
clone, Roam/Obsidian/Logseq embed, Notion synced block, Trilium note-cloning, and literate
named-chunk assembly, over the addressable union.
- **Projection — the two-axis model:**
- *Kind:* **replication-projection** (lazy cache of remote content — the default) vs
**derivation-projection** (transform/compile/weave/evaluate a source).
- *Liveness:* static → captured snapshot → live-over-files → view-time → irreducibly-live.
- Derivation facets: materialization timing (ahead-of-time vs view-time), multiplicity (one
output vs N co-equal), continuity (one-shot vs continuous). Every projection declares its
liveness + freshness + provenance; the irreducibly-live far end has no faithful static
form (source + a marked recording).
- **Moldable view registry** — projection generalises to an **open, type-keyed set of
co-equal, possibly-computed views, none canonical-by-fact** (display-canonical is policy).
This unifies replication/derivation/dimensional/query projection and answers the "pluggable
content-type registry" question (GT prior art).
- **Derived query index** — delegate to a shard's native query engine where present
(Roam/Logseq Datalog, Notion DB query, XWiki XWQL, Wikibase SPARQL); else build a derived
index over the projection (the Logseq DataScript-over-files pattern). The index is
disposable (I-2).
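The two-axis projection model above could be typed roughly as follows. `ReplicationProjection` and `DerivationProjection` appear in §11, but this single `Projection` record, its field names, and the `has_faithful_static_form` helper are assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Kind(Enum):
    REPLICATION = "replication"  # lazy cache of remote content (the default)
    DERIVATION = "derivation"    # transform/compile/weave/evaluate a source


class Liveness(IntEnum):
    STATIC = 0
    CAPTURED_SNAPSHOT = 1
    LIVE_OVER_FILES = 2
    VIEW_TIME = 3
    IRREDUCIBLY_LIVE = 4


@dataclass(frozen=True)
class Projection:
    kind: Kind
    liveness: Liveness
    source: str        # provenance: the canonical input it came from
    fetched_at: float  # freshness: when the input was last observed

    def has_faithful_static_form(self) -> bool:
        # The irreducibly-live far end only has source + a marked recording.
        return self.liveness < Liveness.IRREDUCIBLY_LIVE


cached_page = Projection(Kind.REPLICATION, Liveness.STATIC, "forge:Home", 0.0)
live_pattern = Projection(Kind.DERIVATION, Liveness.IRREDUCIBLY_LIVE, "vault:n9", 0.0)
```

Making liveness, source, and freshness required fields is one way to enforce that every projection declares them.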
### 8.5 Computational / executable content — the scope decision
**In scope as a page-model + projection concern; out of scope as an execution platform.**
shard-wiki *recognises* computational types, attaches the **canonical source**, and presents
derived forms as **provenance- and liveness-marked projections**. Driving a derivation
(tangle/weave, re-execute a notebook, render a sketch, evaluate a pattern) is a **gated
capability, off by default, with a trust/sandbox concern, degrading to a captured snapshot**.
One snapshot-provenance record (run id, source rev, timestamp, environment "unguaranteed")
serves notebooks, renders, and recordings alike. **No INTENT amendment is required** — this
lives inside the existing page model (L2) and projection model (L4).
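A hedged sketch of the gating described here: derivation runs only when explicitly enabled, and otherwise the page degrades to a captured snapshot or to bare source. The `Snapshot` fields follow the record listed above; `present`, its parameters, and the `output` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Snapshot:
    run_id: str
    source_rev: str
    timestamp: str
    environment: str = "unguaranteed"
    output: str = ""  # assumed field: the captured result itself


def present(source: str,
            snapshot: Optional[Snapshot],
            run: Optional[Callable[[str], str]] = None,
            execution_enabled: bool = False):
    """Return (liveness label, content); execution is off unless enabled."""
    if execution_enabled and run is not None:
        return ("live", run(source))
    if snapshot is not None:
        return ("snapshot", snapshot.output)  # degrade to the captured form
    return ("source-only", source)


snap = Snapshot("run-1", "rev7", "2026-06-15T01:35:00+02:00", output="out")
```

In this sketch `present("src", snap, run=str.upper)` still returns the snapshot, because passing a runner is not enough without `execution_enabled=True`. Sandboxing is outside the sketch.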
### 8.6 Consistency, concurrency & conflict model
INTENT makes real-time cross-shard consistency a non-goal — but "no strong consistency" is not
the same as "no defined consistency." This is the guarantee shard-wiki *does* offer, and the
mechanism (not policy) that makes concurrent editing safe (review bug B-2).
**The consistency guarantee — causal, anchored on the journal:**
- **Read-your-writes for coordination-canonical state.** Once an overlay/binding/merge is
committed to the journal, this client always sees it (the journal is the client's own causal
spine). This is a *strong* local guarantee, cheap because the journal is local Git.
- **Causal consistency across the derived tier.** The union/index/projections reflect a causal
cut of `(sharded inputs seen so far, journal)`. Effects never appear before their causes; a
projection that has seen journal commit *C* has seen everything *C* depends on.
- **Eventual convergence for sharded-canonical inputs.** Remote shard content is pulled
asynchronously (lazily or by notify/poll, §8.7); the union converges to each shard's latest
*as observed*, bounded by the shard's operational envelope. Freshness is always *shown*
(provenance envelope), never faked — a stale projection is labelled stale, not wrong.
So: **strong + read-your-writes for what shard-wiki owns (the journal); causal for what it
derives; eventual + freshness-labelled for what shards own.** No global clock, no distributed
transaction, no two-phase commit across shards — none is needed, because shard-wiki coordinates
rather than controls.
**Conflict detection & representation is core mechanism; only resolution is policy (I-7).**
The split the earlier draft elided:
- **Detection (core).** Divergence is detected structurally: two identities resolve as
equivalent (§8.4) but their content fingerprints differ, or an overlay's base revision no
longer matches the shard's current revision. Detection is always on; it is never optional.
- **Representation (core).** A detected conflict is **first-class data in the union**, not an
error: equivalent-but-divergent pages are presented as a **coexisting set** (the
chorus/keep-both representation), each fully attributed, with the divergence recorded in the
provenance envelope (union without erasure — a conflict is information, not a failure).
- **Resolution (policy).** *Which* version wins, or whether they stay coexisting, is a
configurable preset (§10): chorus / designated-canonical / git-merge / vote-to-merge /
overlay-only. Core never hard-codes one.
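The detection / representation / resolution split might look like this in miniature. All names are hypothetical, SHA-256 is an assumed fingerprint, and only two of the listed presets are shown.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    identity: str  # shard-scoped page identity
    content: str


def fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect(equivalent_set) -> bool:
    """Detection (core, always on): equivalent pages whose fingerprints differ."""
    return len({fingerprint(v.content) for v in equivalent_set}) > 1


def represent(equivalent_set) -> dict:
    """Representation (core): keep every voice and record the divergence."""
    return {"voices": list(equivalent_set), "divergent": detect(equivalent_set)}


def resolve(chorus: dict, policy: str = "chorus", canonical: str = ""):
    """Resolution (policy): a preset chooses what is shown; nothing is erased."""
    if policy == "designated-canonical":
        return [v for v in chorus["voices"] if v.identity == canonical]
    return chorus["voices"]  # "chorus": all voices stay coexisting


pages = [Voice("vault:n1", "Tea is hot."), Voice("forge:Tea", "Tea is warm.")]
chorus = represent(pages)
```

`detect` and `represent` take no policy argument at all, which mirrors the claim that only resolution is configurable.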
**Overlay-apply under source drift (the concurrent-write case).** An overlay carries the
**base revision** of the shard content it was authored against. On apply, core compares base to
the shard's current revision:
- *unchanged* → apply (fast-forward), commit to journal, propagate if the profile permits;
- *changed, non-overlapping* → three-way merge where the merge capability allows (axis 6),
else keep-both;
- *changed, overlapping* → **refuse + re-present** as a conflict (above); never silently
clobber (I-5, no silent remote mutation). The unapplied overlay remains coordination-
canonical and valid against its base.
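The three cases reduce to a small decision function. This is a sketch under stated assumptions: revisions are opaque comparable values, overlap is judged on sets of span ids, and `decide_apply` with its parameters is a hypothetical name.

```python
from enum import Enum


class Outcome(Enum):
    FAST_FORWARD = "fast-forward"
    THREE_WAY_MERGE = "three-way-merge"
    KEEP_BOTH = "keep-both"
    REFUSE_AND_REPRESENT = "refuse+re-present"


def decide_apply(base_rev, current_rev, overlay_spans, drifted_spans, can_three_way):
    """Pick the apply path for an overlay authored against base_rev."""
    if base_rev == current_rev:
        return Outcome.FAST_FORWARD              # shard unchanged since authoring
    if set(overlay_spans) & set(drifted_spans):
        return Outcome.REFUSE_AND_REPRESENT      # overlapping: never clobber
    # Changed but non-overlapping: merge if axis 6 allows it, else keep both.
    return Outcome.THREE_WAY_MERGE if can_three_way else Outcome.KEEP_BOTH


print(decide_apply("r7", "r8", {"s1"}, {"s2"}, can_three_way=True))
```

The function only decides; committing to the journal and propagating to the shard happen afterwards and are not shown.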
**Ordering.** The journal commit is the ordering authority for coordination-canonical effects;
a shard-native write is only *acknowledged* in the journal after the adapter confirms it, so a
crash between journal-intent and shard-write is recoverable (the intent is replayable, the
write is idempotent-keyed on identity+base-rev). Cross-shard operations are ordered by their
journal commits, giving the causal cut above.
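A toy model of the crash-recovery argument: intent is journalled first, the shard write is keyed on identity+base-rev so that replaying it is harmless, and the acknowledgement is journalled last. The in-memory list standing in for the Git journal, `FakeShard`, and `recover` are all illustrative assumptions.

```python
class FakeShard:
    """Stand-in adapter whose writes are idempotent per key."""

    def __init__(self):
        self.pages = {}
        self.applied = 0

    def write(self, key, content):
        if key not in self.pages:  # a replayed write is a no-op
            self.pages[key] = content
            self.applied += 1


def idempotency_key(identity, base_rev):
    return f"{identity}@{base_rev}"


def recover(journal, shard):
    """Replay every journalled intent that has no matching acknowledgement."""
    acked = {entry[1] for entry in journal if entry[0] == "ack"}
    for kind, key, content in [e for e in journal if e[0] == "intent"]:
        if key not in acked:
            shard.write(key, content)
            journal.append(("ack", key, None))
            acked.add(key)


journal, shard = [], FakeShard()
key = idempotency_key("vault:n1", "rev7")
journal.append(("intent", key, "new text"))  # 1. intent is journalled first
shard.write(key, "new text")                 # 2. the shard write succeeds
# 3. simulated crash: the acknowledgement was never journalled
recover(journal, shard)                      # 4. on restart, replay the intent
```

After recovery the write has been applied exactly once and the journal ends with its acknowledgement; the same holds if the crash had happened before step 2.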
**Residual open items** (tracked in *Known scaling risks & open problems*, §12, not pretended
solved): the exact convergence bound for high-write CRDT shards under partition, and whether
per-equivalence-set divergence needs a vector clock vs. a simple base-rev comparison, are
deferred to implementation spikes.
---
## 9. Cross-cut — Authorization (L5)
Fully specified in **`ArchitectureBlueprint.md`** (the access & history sub-blueprint);
summarised here for completeness:
- **One core, a ladder of modes** L0 (open/c2, zero deps) → L1 (attributed) → L2
(authenticated) → L3 (role/group) → L4 (multi-tenant enterprise). Climbing is configuration,
not re-architecture.
- **PEP** wraps every adapter op; **PDP** decides `(principal, action, target)` over actions
`read/write/patch/merge/administer`, layered on the adapter's capability profile (a shard
that can't write can't be written regardless of policy — L5 consults the L1 rail).
- **Authentication delegated** to a pluggable IdentityProvider (null provider = L0 default);
real identity from `user-engine` over `net-kingdom` IAM.
- **Fail open only at L0, fail closed at L2+.** Authorization is pure/offline once a Principal
is resolved. Provenance carries authz context so the union never leaks unreadable content
(the L5↔provenance-rail interaction).
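A rough sketch of how a PEP might consult the capability rail before asking the PDP. The function names and the grant-set representation are hypothetical. The text above specifies failure behaviour only for L0 and L2+; treating L1 as fail-closed is an assumption of this sketch.

```python
def decide(principal, action, target, grants):
    """PDP: a pure decision over (principal, action, target)."""
    if principal is None:
        raise LookupError("no principal resolved")
    return (principal, action, target) in grants


def enforce(level, verbs, principal, action, target, grants):
    """PEP: wraps an adapter op; returns True if the op may proceed."""
    # Capability rail first: a shard that can't write can't be written,
    # regardless of policy.
    if action == "write" and "write" not in verbs:
        return False
    if level == 0:
        return True  # L0 is open (null provider), so no principal is needed
    try:
        return decide(principal, action, target, grants)
    except LookupError:
        return False  # fail closed above L0 (L1 behaviour is assumed here)


grants = {("ana", "write", "forge:Home")}
print(enforce(3, {"read", "write"}, "ana", "write", "forge:Home", grants))
```

Note that the capability check runs even at L0, so an open wiki still cannot write through a read-only binding.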
---
## 10. The policy surface (mechanism over policy, made concrete)
I-7 only means something if the policy knobs are enumerated and kept *out* of core algorithms.
The configurable presets are:
- **Canonical-source policy** — chorus / designated-canonical / git-merge / overlay-only /
vote-to-merge (per space or per equivalence set).
- **Federation model** — the §8.3 taxonomy, per space, composable per shard.
- **Shard mode** — read-only / write-through / mirrored / projected / cached / canonical
(constrained by the capability profile).
- **Reconciliation cadence & conflict exposure** — push/poll/manual; show-conflicts vs
auto-merge-when-supported.
- **Execution policy** — derive/execute off (default) / sandboxed / per-shard-allowed.
- **Authorization mode** — the L0–L4 ladder.
- **Projection materialization** — lazy/eager; snapshot vs view-time; recording retention.
Core ships sane defaults (L0 open; fork+journal; lazy replication-projection; overlay-before-
mutation; execution off) and never hard-codes any of the above.
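Keeping the knobs out of core algorithms could be as simple as a validated, immutable policy record that core receives as data. `SpacePolicy`, its field names, and the preset spellings are hypothetical; only the knobs with a default stated above are modelled.

```python
from dataclasses import dataclass, replace

# Allowed presets per knob (spellings are assumptions for this sketch).
ALLOWED = {
    "authorization_mode": {"L0", "L1", "L2", "L3", "L4"},
    "federation_model": {"fork+journal", "vcs+ping", "graph-join",
                         "feed", "activity-streams", "engine-mirror"},
    "materialization": {"lazy", "eager"},
    "execution": {"off", "sandboxed", "per-shard-allowed"},
}


@dataclass(frozen=True)
class SpacePolicy:
    authorization_mode: str = "L0"          # default: open
    federation_model: str = "fork+journal"  # default home case
    materialization: str = "lazy"           # lazy replication-projection
    execution: str = "off"                  # derive/execute off by default

    def __post_init__(self):
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not a known preset")


defaults = SpacePolicy()
team_space = replace(defaults, authorization_mode="L3", execution="sandboxed")
```

`dataclasses.replace` re-runs the validation, so an override with an unknown preset fails at configuration time rather than inside core logic.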
---
## 11. Concrete module structure (bridge to implementation)
A proposed package layout for `src/shard_wiki/`, mapping 1:1 to the layers so the dependency
rule (downward only; L4 rebuildable) is enforceable by import lint:
```
src/shard_wiki/
model/ # L2 top waist: Page, Identity, Placement, ProvenanceEnvelope,
# Span, the page-shape types; capability-spectrum value types
adapters/ # L1 bottom waist: AdapterContract (versioned iface), CapabilityProfile,
# attachment-mode binding; concrete adapters:
git/ folder/ gitea/ obsidian/ webdav/ notion/ … # each: profile + verbs
coordination/ # L3: GitJournal, OverlayEngine (draft→patch→MR), reconcile
federation/ # L3: FederationModel strategies (fork_journal, vcs_ping,
# graph_join, feed, activitypub, engine_mirror)
union/ # L4 (derived): IdentityResolver, EquivalenceGraph, UnionGraph,
# Transclusion (reference-not-copy)
projection/ # L4 (derived): ReplicationProjection, DerivationProjection,
# ViewRegistry (moldable), QueryIndex (delegate|derive)
authz/ # L5 cross-cut: PDP, PEP, IdentityProvider iface, NullProvider
provenance/ # cross-cut: the envelope plumbing used by every layer
api/ # L6: orchestrator API (server-side union for agents/CLI)
```
Hard import rules: `union/` and `projection/` may import `model/`, `adapters/`,
`coordination/` but **nothing may import them** (they are the disposable middle). `model/` and
`adapters/` import nothing else in the tree except `provenance/` (the waists stay thin).
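
As a rough illustration of how these rules could be checked, here is a sketch of an import lint built on the standard-library `ast` module. It is an assumption-laden example, not the project's tooling: `violations` and the rule tables are hypothetical, only absolute imports are inspected, and it assumes `model/` and `adapters/` may import each other. It applies "nothing may import them" literally, so an exemption for `api/` (which serves the union at L6) would have to be added if that is the intent.

```python
import ast

PACKAGE = "shard_wiki"
DISPOSABLE = {"union", "projection"}
WAIST = {"model", "adapters"}
# Assumption: the two waists may import each other, plus provenance/.
WAIST_ALLOWED = WAIST | {"provenance"}


def imported_subpackages(source):
    """Return the shard_wiki subpackages named by absolute imports in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Join module and imported name so `from shard_wiki import union`
            # and `from shard_wiki.union import X` are both recognised.
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # relative imports are not handled in this sketch
        for name in names:
            parts = name.split(".")
            if parts[0] == PACKAGE and len(parts) > 1:
                found.add(parts[1])
    return found


def violations(owner, source):
    """List rule violations for a module that lives in subpackage `owner`."""
    problems = []
    for target in sorted(imported_subpackages(source) - {owner}):
        if target in DISPOSABLE:
            problems.append(f"{owner} imports disposable {target}")
        elif owner in WAIST and target not in WAIST_ALLOWED:
            problems.append(f"{owner} imports {target}")
    return problems
```

Running `violations` over every file under `src/shard_wiki/` in CI, with `owner` taken from the first path component, would turn the dependency rule into a failing check rather than a convention.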
---
## 12. Canonical data flows (the architecture exercised)
**A. Attach a shard.** Adapter binds (chosen attachment mode) → probes/declares a capability
profile → core registers the shard under a root entity → if not git-native, the coordination
journal is seeded (begin-now/mirror/import per axis 5). No union rebuild yet (lazy).
**B. Read a page through the union.** Consumer asks the union for an identity → Identity
resolver maps it to placements across shards → equivalence yields chorus or canonical →
replication-projection lazily fetches from each shard (cache + freshness) → page returned
wrapped in its provenance envelope → L5 filters anything the principal can't see at source.
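
Flow B can be sketched as a short pipeline. The example is illustrative only: `read_through_union`, `Placement`, `Envelope`, and the three injected callables are hypothetical stand-ins for the identity resolver, the replication-projection, and the L5 filter, and the canonical-versus-chorus choice is reduced to an optional shard name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    shard: str
    native_id: str


@dataclass(frozen=True)
class Envelope:
    # Minimal stand-in for the provenance envelope.
    placement: Placement
    content: str
    fresh: bool  # freshness label reported by the cache


def read_through_union(identity, resolve, fetch, can_read, canonical_shard=None):
    """Resolve, fetch, wrap, then filter, in the order flow B describes.

    resolve:  identity -> list of Placement
    fetch:    Placement -> (content, fresh)
    can_read: Placement -> bool, the principal's view at the source
    """
    placements = resolve(identity)
    if canonical_shard is not None:
        # Designated-canonical: keep one voice instead of the chorus.
        placements = [p for p in placements if p.shard == canonical_shard]
    envelopes = []
    for placement in placements:
        content, fresh = fetch(placement)
        envelopes.append(Envelope(placement, content, fresh))
    # L5 filter: drop anything the principal cannot see at source.
    return [e for e in envelopes if can_read(e.placement)]
```

A production version would likely check readability before fetching to avoid pulling content that is then discarded; the order here simply follows the flow as written.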
**C. Edit a read-only / limited shard.** Write request → L5 PDP allows → capability profile
reports less than write-through → OverlayEngine records a draft → renders a patch/MR in the shard's native
syntax (lossless) or Markdown (lossy-with-report) → on explicit apply, commit to the journal
and (if the profile permits) propagate; otherwise the overlay stands as the local truth, fully
attributed.
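
The branching in flow C can be made explicit as a planning step that runs before any state changes. This is a sketch under stated assumptions: `plan_edit`, `EditPlan`, and the two-value `WriteLevel` are hypothetical simplifications of the capability profile, and the direct write-through branch is inferred from the "less than write-through" condition rather than spelled out in the flow.

```python
from dataclasses import dataclass
from enum import IntEnum


class WriteLevel(IntEnum):
    # Hypothetical two-point reduction of the write spectrum.
    READ_ONLY = 0
    WRITE_THROUGH = 1


@dataclass(frozen=True)
class EditPlan:
    path: str         # "denied" | "write-through" | "overlay"
    rendering: str    # "native" | "markdown" | "" when not applicable
    lossy: bool       # True means a loss report accompanies the patch
    propagates: bool  # whether an applied overlay is pushed to the shard


def plan_edit(pdp_allows, write_level, can_render_native, accepts_patches):
    """Decide how a write request is handled, without performing it."""
    if not pdp_allows:
        return EditPlan("denied", "", False, False)
    if write_level >= WriteLevel.WRITE_THROUGH:
        return EditPlan("write-through", "", False, True)
    # Below write-through: record a draft overlay and render a patch/MR.
    rendering = "native" if can_render_native else "markdown"
    return EditPlan(
        "overlay",
        rendering,
        lossy=not can_render_native,
        propagates=accepts_patches,
    )
```

When `propagates` is false the overlay remains the attributed local truth after it is committed to the journal.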
**D. Attach a computational notebook.** Adapter declares profile (attachment=file-store,
opacity=mixed, computational=captured-output). Core attaches the `.ipynb` **source** as
canonical; presents cells + embedded outputs as **derivation-projection snapshots** marked
"run N, env unguaranteed"; offers a static render via the view registry; re-execution stays
gated off. History uses paired-text/nbdime per axis 5.
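
For flow D, the snapshot labelling can be illustrated with the standard library alone, since an `.ipynb` file is JSON whose code cells carry `execution_count` and `outputs`. The function name, the returned dictionary shape, and the "never run" label are assumptions made for this sketch; nothing is executed.

```python
import json


def snapshot_cells(ipynb_text):
    """Present embedded outputs of code cells as labelled snapshots.

    The notebook source stays canonical; the outputs are reported as
    recorded, never recomputed.
    """
    notebook = json.loads(ipynb_text)
    snapshots = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no outputs
        run = cell.get("execution_count")
        if run is None:
            label = "never run"  # assumption: label for unexecuted cells
        else:
            label = f"run {run}, env unguaranteed"
        snapshots.append(
            {
                "cell": index,
                "label": label,
                "outputs": cell.get("outputs", []),
                "re_executed": False,  # execution stays gated off
            }
        )
    return snapshots
```

Keeping the label next to each output lets a consumer at L6 render the result while still showing that it is a captured value rather than a fresh computation.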
---
## 13. Key tradeoffs & decisions to confirm
Resolved here:
- **Capability spectra over a verb checklist** — accept richer contract complexity for precise,
uniform degradation. (Decided: spectra.)
- **Derived middle is a cache, not a store** — accept recompute cost for rebuildability,
provenance, and graceful degradation. (Decided: cache.)
- **Default federation = fork+journal over Git** — the home case; other models opt-in. (Decided.)
- **Execution off by default** — recognise+project always; execute only when gated on. (Decided.)
Open — to confirm before SHARD-WP-0002 spec-writing finalises:
1. **Union graph persistence.** Pure-recompute (simplest, honours I-2 hardest) vs a persisted-
but-disposable cache (faster, must guarantee rebuild equivalence). *Recommendation:
persisted-but-disposable with a `rebuild` that must reproduce it byte-for-byte.*
2. **Address scheme.** Ship shard-scoped native-id wrapping now and treat a portable tumbler as
a later capability, or design the tumbler up front? *Recommendation: wrap native ids now.*
3. **L1 "attributed-but-open" mode** — ship it or jump L0→L2? (Carried from ArchitectureBlueprint.)
4. **Per-page ACL default** — off (per-shard/namespace) confirmed; revisit only if demand appears.
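
The recommendation in item 1 implies a concrete check: a persisted cache is acceptable only if a rebuild yields identical bytes. A minimal sketch follows, assuming the union graph can be serialised deterministically to JSON; `serialize` and `cache_is_rebuild_equivalent` are hypothetical helpers, not part of the spec.

```python
import hashlib
import json


def serialize(graph):
    """Deterministic byte form: sorted keys, fixed separators, UTF-8."""
    return json.dumps(graph, sort_keys=True, separators=(",", ":")).encode("utf-8")


def cache_is_rebuild_equivalent(persisted, rebuild):
    """Compare the persisted bytes with a fresh rebuild, byte for byte.

    persisted: bytes read from the disposable cache
    rebuild:   zero-argument callable returning the recomputed graph
    """
    rebuilt = serialize(rebuild())
    # Digests are compared so that only the hash needs to be stored
    # alongside the cache if the full bytes are large.
    return hashlib.sha256(persisted).digest() == hashlib.sha256(rebuilt).digest()
```

Sorting keys removes dictionary-order noise, but list order still counts, so a rebuild that emits placements in a different order would fail the check. Whether that strictness is wanted is part of the open decision.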
---
## 14. What this architecture is *not*
- Not a wiki engine, UI, or rendering pipeline (those are consumers at L6).
- Not a canonical-source-of-truth — shards keep sovereignty; the middle is derived.
- Not a generic file-sync daemon — synchronisation is wiki-page-semantic.
- Not an execution platform — computation is recognised and projected, not hosted.
- Not a universal ontology — no single schema is imposed on all shards.
- Not an authentication/identity store — that is delegated (authorization is owned).
---
## 15. Traceability
- **INTENT** — every invariant in §2 cites an INTENT principle or boundary; no invariant
contradicts the Stability Note.
- **Research** — §6 (spectra) ← `260614-shard-spectrum-synthesis` v3; §8.3 (federation
taxonomy) ← v3 §2.5; §8.4–§8.5 (two-axis projection, view registry, computational scope) ←
`260614-computational-page-model-synthesis`; §7 page shapes ← the engine + modern-tool +
computational dives; §1 thesis ← the files-canonical/index-derived through-line across
Logseq/ikiwiki/GT/Pharo/Jupyter.
- **Use cases** — the architecture is sized to UC-01–UC-84: federation/coordination (UC-01–07,
26–33, 56, 71–72, 79) → §8; attachment/adapter (UC-34–43, 50, 53, 57, 60–62, 64–66, 68–70,
76–82) → §6; page model & fidelity (UC-34, 39, 42, 55, 58–59, 67, 73, 80, 83–84) → §7/§8.5;
addressing/identity/query (UC-32, 44–49, 51–52, 54, 63, 74) → §7.2/§8.4; provenance &
metadata (UC-24–25, 75) → the provenance rail; collaboration & discovery (UC-08–23) → L6
consumers over the union.
- **Workplans** — §6–§8 are the design target of `SHARD-WP-0002` (T11–T18); §9 is owned by
`ArchitectureBlueprint.md`; §1 (yawex-derived resolution/overlay) aligns with
`SHARD-WP-0001`.
---
## 16. Stability note
This document defines shard-wiki's **internal** architecture; it may evolve as the spec
workplans land. But the **thesis (§1)**, the **invariants (§2)**, and the **dual narrow waist
(§1, §6, §7)** are load-bearing — changing any of them is an architectural change in the sense
of INTENT's Stability Note and should be rare and deliberate.