CoreArchitectureBlueprint — shard-wiki
Status: draft for review · Date: 2026-06-15 · Owner: tegwick
The whole-system architecture for shard-wiki, synthesised from INTENT.md, the 84-entry
UseCaseCatalog.md, and the full research arc (research/260608-*, research/260613-*,
research/260614-* — ~23 wiki/knowledge systems plus two cross-dive syntheses). This is the
core blueprint: it defines the layers, the abstractions, and the load-bearing decisions
that everything else implements.
Scope relationship to the other specs:
- ArchitectureBlueprint.md (existing) is the authorization & history sub-blueprint (the L0–L4 ladder). This document references it as the design of the cross-cutting authorization layer (§9) and does not restate it.
- SHARD-WP-0002 is the workplan that turns §6–§8 into spec/FederationArchitecture.md + the adapter-contract section of spec/TechnicalSpecificationDocument.md.
- UseCaseCatalog.md is the demand this architecture must satisfy; UC references below are load tests, not decoration.
1. The thesis: canonical vs derived (three states)
Everything in shard-wiki follows from one organising decision — that state comes in exactly three kinds, and only one of them is disposable:
1. Sharded-canonical — content owned by each shard (shard sovereignty).
2. Coordination-canonical — durable state born inside shard-wiki that encodes human or cross-shard decisions and exists nowhere else: overlays (the local truth against a read-only shard), curator equivalence bindings, alias tables, merge/reconciliation decisions. It lives in the Git coordination journal.
3. Derived-disposable — everything shard-wiki computes from (1)+(2): the union graph, equivalence index, query indexes, projections, views. It can be deleted and recomputed.
Canonical = sharded ∪ coordination. Derived = a pure function of canonical:
derived = f(sharded, coordination).
This is the architectural form of "orchestrator, not engine." shard-wiki never becomes the
source of truth; it composes sources and records the decisions it makes about them. The
research earned the files-canonical half empirically — every serious system externalises its
durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain
files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live
views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams
(nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk
image — is exactly the one we record as a boundary, not a backend
(research/260614-squeak-pharo-deep-dive).
The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges,
derived in the middle"). That was wrong by omission: it had no home for coordination-canonical
state, and so contradicted itself by listing curator bindings and alias tables as
"derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model
fixes that crack (review finding A-1) and makes derived = f(canonical) literally true.
Three consequences fall straight out, and they are the spine of the rest of this document:
- Graceful degradation is free. If the derived tier is always recomputable, a backend that can only be read is still a first-class participant — you just derive less from it.
- Provenance is tractable. Because shard-wiki never claims to be the source, every derived artifact can always point back to the canonical input it came from (union without erasure is a structural property, not a feature bolted on).
- The derived tier is a pure function of canonical state: derived = f(sharded, coordination). Bugs in the derived tier are recoverable by recompute; only the two canonical tiers must be durably protected — sharded by each shard, coordination by the Git journal (history). Recomputability is a correctness property of the derived tier, not a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the operational-envelope axis.
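The purity claim can be illustrated with a minimal sketch. Everything named here (`Canonical`, `derive`, the title-lookup index) is hypothetical and stands in for the real union/index logic; the only point is that the derived value depends on the two canonical inputs and nothing else, so deleting it and recomputing gives the same result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    # Sharded-canonical: shard id -> {page id -> content}, owned by each shard.
    sharded: dict
    # Coordination-canonical: alias table from the journal (alias -> target page id).
    aliases: dict


def derive(canonical):
    """Pure function: builds a lookup index from canonical state only."""
    index = {}
    for shard_id, pages in sorted(canonical.sharded.items()):
        for page_id in sorted(pages):
            index[page_id.lower()] = (shard_id, page_id)
    for alias, target in sorted(canonical.aliases.items()):
        # Human-made alias bindings are read here, never regenerated.
        if target.lower() in index:
            index[alias.lower()] = index[target.lower()]
    return index


canonical = Canonical(
    sharded={"notes": {"Home": "# Home"}, "forge": {"Setup": "# Setup"}},
    aliases={"start": "Home"},
)
index = derive(canonical)
rebuilt = derive(canonical)  # "delete and recompute" yields the same index
```

Note that the alias table is an input to `derive`, not an output: losing the index costs a recompute, while losing the alias table loses a human decision.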
The dual narrow waist
Heterogeneity is mediated at exactly two interfaces, and nowhere else:
- Bottom waist — the Shard Adapter Contract (§6). Every backend, however weird, enters through one versioned, capability-described interface.
- Top waist — the Wiki Page Model (§7). Every consumer, however demanding, sees one backend-neutral, Markdown-first-but-stretchable page model.
Between the waists, core logic is written once against capabilities and the page model — never against a specific backend. Adding TiddlyWiki or Notion or a git forge is writing an adapter and declaring a capability profile, not editing core algorithms.
2. Architectural invariants
These are non-negotiable. Violating one is a design bug, not a tradeoff. They are INTENT's principles fused with the research through-lines.
| # | Invariant | Source |
|---|---|---|
| I-1 | Orchestrator, not engine. Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
| I-2 | Three states; derived = f(canonical). State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
| I-3 | Capability-awareness is data. A binding's abilities are a profile (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
| I-4 | Union without erasure. Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
| I-5 | Overlay before mutation. Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
| I-6 | Git-addressable coordination. Every information space has a Git-backed journal even when its shards are not git-native. | INTENT |
| I-7 | Mechanism over policy. Canonical-source, conflict, editorial, sync cadence are configurable presets, never hard-coded. | INTENT |
| I-8 | Graceful degradation. A limited backend is still usable as read-only / cache / projection / backup / patch target. | INTENT |
| I-9 | Identity ≠ placement. A page is an entity that may occupy N locations; address by identity, not by path. | Trilium note/branch; ZigZag |
| I-10 | History is the floor. Every write is a recoverable commit; recoverability, not gatekeeping, is the baseline protection. | ArchitectureBlueprint §2 |
| I-11 | Authorization in core, authentication delegated. Core decides who-may; an external provider says who-is. | INTENT; ArchitectureBlueprint |
| I-12 | Not a file-sync daemon; not an execution platform. Sync is wiki-page-semantic; computation is recognised+projected, not hosted. | INTENT; computational-page-model synthesis |
3. The layered architecture
┌───────────────────────────────────────────────────────────────┐
│ L6 Consumers — Orchestrator API · CLI/agents · Web/Obsidian │
├───────────────────────────────────────────────────────────────┤
X-cut │ L5 Authorization (PEP/PDP, identity-provider iface) → │ X-cut
Prove- │ see ArchitectureBlueprint.md (L0–L4 ladder) │ Capa-
nance ├───────────────────────────────────────────────────────────────┤ bility
▲ │ L4 Union & Projection (DERIVED, rebuildable cache) │ ▲
│ │ identity resolution · equivalence/chorus · union graph · │ │
│ │ replication+derivation projections · moldable view registry│ │
│ │ · derived query index │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L3 Coordination (Git journal · overlay/patch engine · │ │
│ │ federation-model strategies · reconciliation) │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L2 Wiki Page Model ── TOP WAIST ── │ │
│ │ backend-neutral pages · identity≠placement · span address ·│ │
│ │ provenance envelope · the page shapes │ │
│ ├───────────────────────────────────────────────────────────────┤ │
│ │ L1 Shard Adapter Contract ── BOTTOM WAIST ── │ │
│ │ versioned iface · capability profile (15 spectra) · │ │
│ │ attachment-mode binding · operation verbs │ │
└──── ├───────────────────────────────────────────────────────────────┤ ──┘
│ L0 Backends (not ours): git repos, wiki/ subdirs, Gitea/ │
│ GitLab/GitHub wikis, folders, Obsidian, WebDAV, Notion, │
│ Coulomb spaces, notebooks, … │
└───────────────────────────────────────────────────────────────┘
Provenance and Capability are drawn as vertical rails because they are not layers — they are present at every layer. A page object at L2 carries provenance; a projection at L4 carries provenance; an authz decision at L5 records the context under which content was read. Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4 (can we delegate a query?), and L5 (can this principal even reach the op?).
The dependency rule is strict and downward, and it tracks the three states (§1), not the layer numbers: the derived-disposable tier (the whole of L4) may be deleted and recomputed from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal). Nothing canonical may depend on derived state. Note the journal at L3 is canonical (it holds overlays, bindings, aliases, merges); only L4 is disposable.
4. Core abstractions (the vocabulary code must use)
Straight from INTENT, sharpened by research. New code maps onto these; it does not invent parallel terms.
- Shard — an independently meaningful page store attached to a root entity, with sovereignty: its own backend, capability profile, history, identity model, limits.
- Root entity / information space — the joined space shards attach to; the unit of Git coordination and of multi-tenancy (a tenant maps to a root entity, ArchitectureBlueprint).
- Shard adapter contract — the versioned L1 interface; the bottom waist.
- Capability profile — a shard binding's position on each of the 15 spectra (§6) plus its supported verbs. The data structure that drives degradation.
- Wiki page model — the L2 backend-neutral page; the top waist.
- Page identity vs placement — a page is an entity (identity); it may have N placements (paths/shards). Addressing, equivalence, and transclusion key on identity (I-9).
- Provenance envelope — the metadata wrapper every artifact carries: source shard, freshness, liveness, authorization context, overlay status, divergence, derivation lineage.
- Coordination journal — the L3 Git-backed record of change flows for a space, and the durable home of all coordination-canonical state (§1): overlays, curator equivalence bindings, alias tables, merge/reconciliation decisions. This state is born inside shard-wiki, exists nowhere else, and is not derived — it must be committed, never recomputed.
- Overlay — a non-destructive local edit against a remote/read-only/limited shard, representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an unapplied overlay is the local truth and lives in the journal.
- Projection — a derived view of shard content, typed on two axes (§8): kind (replication | derivation) × liveness (static … irreducibly-live).
- Federation model — the selected coordination strategy for a space (§ taxonomy, T17).
- Shard mode — read-only · write-through · mirrored · projected · cached · canonical (a policy selection constrained by the capability profile).
5. Why "layered" and not "pipeline" or "plugin-bus"
Two rejected alternatives, recorded so the choice is legible:
- A sync pipeline (source → transform → sink) was rejected: it implies a privileged direction and a canonical sink, which violates shard sovereignty (I-1) and union-without-erasure (I-4). shard-wiki is a star (many shards ↔ one space), not a pipe.
- A flat plugin bus (every backend a peer plugin emitting events) was rejected as the top-level shape: it has no narrow waist, so heterogeneity leaks into every consumer. We keep the plugin idea but confine it to L1 (adapters) and L3 (federation strategies), behind the waists.
The layered-with-rails shape is what makes I-2/I-3/I-4 hold simultaneously.
6. Bottom waist — the Shard Adapter Contract (L1)
The single most important design decision in the project: the adapter contract models
positions on capability spectra, not a flat checklist of boolean verbs. A backend is not
"can/can't merge"; it sits somewhere on the merge spectrum, and federation operations
degrade by position. This is the lesson of putting ~23 systems in one matrix
(research/260614-shard-spectrum-synthesis, v3).
6.1 The fifteen capability spectra
Each binding declares a position on each axis. Core algorithms read these positions; there is no per-backend code in core (I-3).
- Addressing granularity — none → path → page-level store-id → in-file span → in-file block id (Logseq id::) → store-UUID → portable tumbler (Xanadu, the unreached ideal)
- Content identity — none → path/title → fingerprint → span-set
- Identity vs placement — path=identity → identity separated from placement (Trilium note/branch = a DAG)
- Structure — flat MD → frontmatter/key:: → %META% → typed objects → DB schema + relations → object-graph/ontology → computed (inherited+templated) → typed-graph statements
- History — none → internal-only / CRDT-log → open-file → git-native
- Merge model — none → git/text → conflict-notes/keep-both → native-CRDT → coexist-with-rank
- Native query — none → text → build-your-own derived index → datalog/graph → DB query → SPARQL
- Translation — native → lossless → lossy-with-fidelity-report (incl. HTML)
- Attachment mode — file-store (native | interchange-mirror) → git-IS-store → in-engine-host → local-REST → external-API → direct-DB → CRDT-replica → P2P/no-central-endpoint
- Operational envelope — local/unbounded → realtime CRDT/WebSocket → rate-limited/eventually-consistent/paginated
- Access grant — open → token → OAuth scoped+revocable → P2P key/invite → enterprise ACL
- Content opacity — plaintext → structured re-evaluable value → encrypted whole-shard → per-item → proprietary-lossy-exportable
- Write granularity — whole-file (TiddlyWiki) → per-page → section/anchor → per-block → story-item
- Provenance granularity — per-shard → per-page → per-edit → per-statement/value (Wikibase rank+refs)
- Computational / liveness — static source → captured-output snapshot → live-over-files → view-time render → irreducibly-live/temporal
6.2 Operation verbs
read, write, diff, merge, lock, version, publish, notify, transclude-source, translate-syntax, structured-payload, derive-projection, execute/evaluate. The last two are
gated, off by default (§8, computational content). Verb support is part of the profile and
must reconcile with the federation-ops capability matrix (SHARD-WP-0002 T10).
6.3 Attachment-mode taxonomy (axis 9, expanded)
A backend may offer several modes; attach mode is a per-binding, capability-gated choice, with one declared authoritative. Modes: file-store (native vault/folder or an interchange/sync mirror), git-IS-store (the home case — forge wikis & ikiwiki: git is the store and the journal at once, resolving the engine-mirror write-race), in-engine hosted adapter (XWiki component, Obsidian/Logseq/Roam plugin, Trilium script), local-REST (Joplin Data API, Trilium ETAPI), external-API-only (Notion), direct-DB (MojoMojo schema→model), CRDT-replica (Anytype/AFFiNE/AppFlowy), P2P/no-central-endpoint. Boundary: a monolithic live-memory blob (Smalltalk image, a kernel) is never an attach target — it participates only via export→files (I-12).
6.4 Contract rules
- Versioned interface (Foswiki::Store + Foswiki::Meta is the proof that a stable store-interface-with-swappable-backends works). Capability discovery is a static profile with optional runtime negotiation.
- Backend-swap tolerance — shard identity/provenance survives a substrate change (RCS↔PlainFile, folder→Git, Logseq file→SQLite): bind to capabilities, not to "it's files."
- Absence is first-class — the profile must express can't cleanly (Oddmuse floor), so degradation paths are explicit, never guessed.
6.5 Orthogonal core, implied positions, and the interaction subset
Fifteen independent ordinal axes is descriptively right but would be operationally a mess if treated as fifteen free dimensions: the axes are not orthogonal, and a degradation function over all 15 jointly is the flat-checklist problem returning in higher dimensions (review D-1). Three rules tame it.
(a) A small orthogonal core; the rest are implied. Most axes are correlated and collapse to a few independent choices. The core axes an adapter must independently declare:
- Substrate → drives attachment-mode, history, merge, and native-query positions together (git-IS-store ⟹ history=git-native ⟹ merge=git/text ⟹ query=build-your-own-index; relational-DB ⟹ direct-DB attach ⟹ DB-version-row history ⟹ DB query).
- Write granularity → drives addressing granularity and the overlay/patch shape.
- Content opacity → drives translation and (where encrypted) collapses native-query.
- Operational envelope → drives freshness mode (§8.8) and rebuild expectations (§8.7).
- Access grant → independent (authz, L5).
- Computational/liveness → independent (projection kind, §8.5).
The remaining axes are implied/derived from these via published implication rules; an adapter may override an implied position, but the default is computed, not hand-set. This turns ~15 free dimensions into ~6 independent ones plus derivations — fewer things to get wrong, and impossible combinations become unrepresentable.
(b) Implication rules forbid impossible profiles. E.g. attachment=git-IS-store ⟹ history≥git-native; opacity=encrypted-whole-shard ⟹ native-query=none ∧ translation≤opaque;
merge=native-CRDT ⟹ history=CRDT-log ∧ envelope=realtime. A profile that violates an
implication is rejected at registration — capability-as-data (I-3) with integrity constraints.
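A minimal sketch of how registration-time rejection might look, assuming a profile is a plain dictionary and using hypothetical names (`HISTORY_RANK`, `violations`, `register`). It encodes only the example rules quoted above; the `translation≤opaque` clause is left out because the translation spectrum in §6.1 does not list an "opaque" position to rank against.

```python
# Ordinal rank of each history position; "internal-only / CRDT-log" share one step.
HISTORY_RANK = {"none": 0, "internal-only": 1, "CRDT-log": 1, "open-file": 2, "git-native": 3}


def violations(profile):
    """Return the implication rules this profile breaks (empty list means valid)."""
    errors = []
    if profile["attachment"] == "git-IS-store":
        if HISTORY_RANK[profile["history"]] < HISTORY_RANK["git-native"]:
            errors.append("git-IS-store requires history >= git-native")
    if profile["opacity"] == "encrypted-whole-shard" and profile["native_query"] != "none":
        errors.append("encrypted-whole-shard requires native-query = none")
    if profile["merge"] == "native-CRDT":
        if profile["history"] != "CRDT-log":
            errors.append("native-CRDT requires history = CRDT-log")
        if profile["envelope"] != "realtime":
            errors.append("native-CRDT requires envelope = realtime")
    return errors


def register(profile):
    """Reject an impossible profile at registration instead of at first use."""
    errors = violations(profile)
    if errors:
        raise ValueError("; ".join(errors))
    return profile


forge_wiki = register({
    "attachment": "git-IS-store", "history": "git-native", "merge": "git/text",
    "opacity": "plaintext", "native_query": "build-your-own-index", "envelope": "local",
})
```

Keeping the rules as checks over data, rather than as branches per backend, is what lets a new adapter be validated without touching core code.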
(c) The degradation function reads a named, small interaction subset — not all pairs. "No per-backend code" is only credible if we say which axis interactions the generic logic actually consults. They are:
| Operation | Axes consulted (jointly) |
|---|---|
| write / overlay-apply | write-granularity × merge-model × history × access-grant |
| transclude / address a span | addressing-granularity × write-granularity × identity-vs-placement |
| project / cache | operational-envelope × computational-liveness × content-opacity |
| query | native-query × content-opacity (encrypted ⇒ derive-index-or-none) |
| translate | translation × content-opacity × structure |
| federate | substrate × history × merge (per the §8.3 model) |
Everything else is a single-axis check. This table is the degradation contract: it is small, enumerated, and testable — the proof obligation behind "core logic written once."
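One way to keep this contract small and testable is to hold the table itself as data and let generic logic see only the axes listed for an operation. The sketch below assumes hypothetical names (`INTERACTIONS`, `axes_for`, `plan_query`) and a dictionary profile; the query branch follows the table row plus the delegate-or-derive rule of §8.4.

```python
# The interaction table above, as data: operation -> axes consulted jointly.
INTERACTIONS = {
    "write": ("write-granularity", "merge-model", "history", "access-grant"),
    "transclude": ("addressing-granularity", "write-granularity", "identity-vs-placement"),
    "project": ("operational-envelope", "computational-liveness", "content-opacity"),
    "query": ("native-query", "content-opacity"),
    "translate": ("translation", "content-opacity", "structure"),
    "federate": ("substrate", "history", "merge-model"),
}


def axes_for(operation, profile):
    """Expose only the profile positions the operation is allowed to consult."""
    return {axis: profile[axis] for axis in INTERACTIONS[operation]}


def plan_query(profile):
    """Generic query planning that reads two axes and no backend name."""
    view = axes_for("query", profile)
    if view["content-opacity"].startswith("encrypted"):
        return "derive-index-or-none"
    if view["native-query"] == "none":
        return "derived-index"
    return "delegate"


logseq_like = {"native-query": "datalog/graph", "content-opacity": "plaintext"}
folder_like = {"native-query": "none", "content-opacity": "plaintext"}
vault_like = {"native-query": "none", "content-opacity": "encrypted whole-shard"}
```

Because `plan_query` can only read what `axes_for` hands it, a test can assert that no operation consults an axis outside its row.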
7. Top waist — the Wiki Page Model (L2)
Backend-neutral, Markdown-first but stretchable many ways at once. The page model is the lingua franca every consumer sees; an adapter's job is to project its backend into this model (read) and accept overlays back (write), within its capabilities.
7.1 Page shapes the model must carry
- Prose Markdown — the baseline.
- Typed / computed records — frontmatter/%META%/XObjects/Notion DB rows; computed metadata (Trilium inherited+templated) represented as effective-vs-own with per-attribute provenance.
- Typed-graph statements — Wikibase claim + qualifiers + references + rank (structure far-end).
- Inline-embedded objects — Quip/Notion spreadsheets & live apps inside prose.
- Non-Markdown assets — drawings, canvases, images: typed asset / opaque blob / pluggable content-type registry, never silent-flattened.
- The four computational shapes (§8): one-source-many-projections, notebook (embedded computed output), program-as-page, live/temporal.
All shapes reduce to a common skeleton: (content | source, structure, provenance envelope, [derivation rule]). The page model stores the richest faithful form as canonical and treats
any Markdown rendering of a non-Markdown shape as a lossy projection (I-4 + fidelity report).
7.2 Identity, placement, addressing — three distinct concepts
The earlier draft used "identity" for two different things and (worse) suggested deriving page identity from a content fingerprint — which would make editing a page change its identity and break every reference to it (review bug B-1). They are pulled apart here:
- Page identity — a stable handle. A shard-scoped, durable key that survives edits: the backend's native page/note id where one exists (Roam/Notion/Trilium uid, a git path treated as a name, a wiki page name), wrapped in a shard scope so it survives projection and never collides across shards. Identity is assigned/minted, not computed from content. References, placement, transclusion targets, and overlays all key on identity.
- Placement — where an identity sits. One identity → N placements (paths/shards) = a DAG; no single canonical path (I-9). Placement can change without changing identity.
- Content equivalence — detecting sameness, never identity. A content fingerprint (or span-set overlap) identifies a version / a piece of content, used to detect that two distinct identities hold the same or derived content (the equivalence/chorus mechanism, §8.4). A fingerprint is never a page's identity: same page, edited → new fingerprint, same identity; two pages, identical content → same fingerprint, different identities.
- Span addressing — a sub-page address within an identity: adopt native span IDs where minted (Roam :block/uid, Logseq id::, Notion/CRDT UUID); else a position address (path+range) or a content-fingerprint address for equivalence/transclusion. The Xanadu tumbler is the portable ideal the scheme aims at without requiring.
- Provenance envelope rides on pages and spans (see §7.3 for its layered, low-cost form).
So the chain is: identity (stable) → placements (N, mutable) → equivalence (cross-identity sameness, fingerprint-based) — three concepts, three mechanisms, never conflated.
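The separation can be made concrete with a small sketch. The class names are hypothetical, and SHA-256 is only an assumed stand-in for whatever fingerprint the implementation chooses; what matters is that identity is minted and stable, placements are a mutable list, and the fingerprint is computed from content and used only for comparison.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageIdentity:
    shard: str      # shard scope, so native ids never collide across shards
    native_id: str  # the backend's own page/note id: minted, not computed


@dataclass
class Page:
    identity: PageIdentity
    content: str
    placements: list = field(default_factory=list)  # N paths, free to change

    def fingerprint(self):
        # Identifies a version of the content, never the page itself.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


a = Page(PageIdentity("notes", "n-17"), "Hello wiki", ["/home", "/archive/home"])
b = Page(PageIdentity("forge", "Home"), "Hello wiki", ["/Home.md"])

same_content = a.fingerprint() == b.fingerprint()  # equivalence signal
same_page = a.identity == b.identity               # identity comparison

before = a.fingerprint()
a.content = "Hello wiki, edited"   # an edit changes the fingerprint only
a.placements.append("/start")      # a move changes placement only
```

After the edit, every reference keyed on `a.identity` still resolves, which is exactly what a fingerprint-derived identity would have broken.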
8. Coordination, federation & projection
8.1 Coordination journal (L3) — Git as the spine
Every information space has a Git-backed coordination journal (I-6). It records cross-shard operations (fork, import, reconcile, overlay-apply, space-branch) and is the history floor (I-10). For git-IS-store shards the shard's own git log is this journal; for non-git shards the journal supplements (begins-now / mirrors-forward / snapshots-replica) or imports (backfill open file history). History portability is a spectrum, handled per profile (axis 5).
8.2 Overlay / patch engine (L3)
The default write path for anything below write-through capability (I-5): an edit becomes a draft → patch/commit → MR, applied destructively only on explicit intent and only where the profile + policy both permit. This is what lets a read-only or rate-limited or lossy backend still be edited safely.
8.3 Federation is plural & composable (L3) — the model taxonomy
Federation is not one mechanism. shard-wiki selects a federation model per space and composes per shard (mechanism over policy, I-7):
| Model | Anchor | Coordination shape |
|---|---|---|
| Fork + journal (default home case) | Federated Wiki | copy-with-provenance + per-page action journal (story = replay) |
| VCS-replication + ping | ikiwiki | git clone/pull/push + change-ping |
| Query-time graph-join | Wikibase SPARQL SERVICE | join remote graphs at query time, no copy |
| Feed aggregation | RSS/Atom | inbound feed → pages |
| Activity streams | ActivityPub | Create/Update events, notify or content-bearing |
| Engine-mirror | Wiki.js DB↔Git | engine syncs its own store to a git mirror |
8.4 Union & projection (L4) — the derived cache
This whole layer is derived-disposable: recomputable from canonical state — sharded content + the coordination-canonical inputs in the journal (I-2). Crucially, the automatic equivalence results are derived, but the human/curatorial inputs they consume — alias tables and curator equivalence bindings — are coordination-canonical (they live in the journal), not derived; recompute reads them, never regenerates them. It comprises:
- Identity resolution & equivalence — detect "same topic / derived content" path-independently from derived signals (content fingerprint, span-set overlap) plus the coordination-canonical inputs (alias table, curator binding); present as chorus-of-voices or designated-canonical (a policy preset). (Scaling: §8.7.)
- Union graph — the navigable join of pages, links, and dimensions (namespace, genealogy, version, shard, equivalence). A derived lens over canonical files+journal, never a new store (the ZigZag boundary).
- Transclusion — one reference-not-copy primitive unifying Xanadu transclusion, ZigZag clone, Roam/Obsidian/Logseq embed, Notion synced block, Trilium note-cloning, and literate named-chunk assembly, over the addressable union.
- Projection — the two-axis model:
- Kind: replication-projection (lazy cache of remote content — the default) vs derivation-projection (transform/compile/weave/evaluate a source).
- Liveness: static → captured snapshot → live-over-files → view-time → irreducibly-live.
- Derivation facets: materialization timing (ahead-of-time vs view-time), multiplicity (one output vs N co-equal), continuity (one-shot vs continuous). Every projection declares its liveness + freshness + provenance; the irreducibly-live far end has no faithful static form (source + a marked recording).
- Moldable view registry — projection generalises to an open, type-keyed set of co-equal, possibly-computed views, none canonical-by-fact (display-canonical is policy). This unifies replication/derivation/dimensional/query projection and answers the "pluggable content-type registry" question (GT prior art).
- Derived query index — delegate to a shard's native query engine where present (Roam/Logseq Datalog, Notion DB query, XWiki XWQL, Wikibase SPARQL); else build a derived index over the projection (the Logseq DataScript-over-files pattern). The index is disposable (I-2).
8.5 Computational / executable content — the scope decision
In scope as a page-model + projection concern; out of scope as an execution platform. shard-wiki recognises computational types, attaches the canonical source, and presents derived forms as provenance- and liveness-marked projections. Driving a derivation (tangle/weave, re-execute a notebook, render a sketch, evaluate a pattern) is a gated capability, off by default, with a trust/sandbox concern, degrading to a captured snapshot. One snapshot-provenance record (run id, source rev, timestamp, environment "unguaranteed") serves notebooks, renders, and recordings alike. No INTENT amendment is required — this lives inside the existing page model (L2) and projection model (L4).
8.6 Consistency, concurrency & conflict model
INTENT makes real-time cross-shard consistency a non-goal — but "no strong consistency" is not the same as "no defined consistency." This is the guarantee shard-wiki does offer, and the mechanism (not policy) that makes concurrent editing safe (review bug B-2).
The consistency guarantee — causal, anchored on the journal:
- Read-your-writes for coordination-canonical state. Once an overlay/binding/merge is committed to the journal, this client always sees it (the journal is the client's own causal spine). This is a strong local guarantee, cheap because the journal is local Git.
- Causal consistency across the derived tier. The union/index/projections reflect a causal cut of (sharded inputs seen so far, journal). Effects never appear before their causes; a projection that has seen journal commit C has seen everything C depends on.
- Eventual convergence for sharded-canonical inputs. Remote shard content is pulled asynchronously (lazily or by notify/poll, §8.7); the union converges to each shard's latest as observed, bounded by the shard's operational envelope. Freshness is always shown (provenance envelope), never faked — a stale projection is labelled stale, not wrong.
So: strong + read-your-writes for what shard-wiki owns (the journal); causal for what it derives; eventual + freshness-labelled for what shards own. No global clock, no distributed transaction, no two-phase commit across shards — none is needed, because shard-wiki coordinates rather than controls.
Conflict detection & representation is core mechanism; only resolution is policy (I-7). The split the earlier draft elided:
- Detection (core). Divergence is detected structurally: two identities resolve as equivalent (§8.4) but their content fingerprints differ, or an overlay's base revision no longer matches the shard's current revision. Detection is always on; it is never optional.
- Representation (core). A detected conflict is first-class data in the union, not an error: equivalent-but-divergent pages are presented as a coexisting set (the chorus/keep-both representation), each fully attributed, with the divergence recorded in the provenance envelope (union without erasure — a conflict is information, not a failure).
- Resolution (policy). Which version wins, or whether they stay coexisting, is a configurable preset (§10): chorus / designated-canonical / git-merge / vote-to-merge / overlay-only. Core never hard-codes one.
Overlay-apply under source drift (the concurrent-write case). An overlay carries the base revision of the shard content it was authored against. On apply, core compares base to the shard's current revision:
- unchanged → apply (fast-forward), commit to journal, propagate if the profile permits;
- changed, non-overlapping → three-way merge where the merge capability allows (axis 6), else keep-both;
- changed, overlapping → refuse + re-present as a conflict (above); never silently clobber (I-5, no silent remote mutation). The unapplied overlay remains coordination-canonical and valid against its base.
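The three cases above reduce to a short decision function. This is a sketch under stated assumptions: overlap is judged by comparing sets of span ids, `changed_upstream` (the spans the shard changed since the overlay's base revision) is supplied by the adapter, and `can_merge` summarises the merge-model position. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPLY = "apply"                      # base unchanged: fast-forward
    THREE_WAY_MERGE = "three-way-merge"  # drifted, disjoint, merge-capable
    KEEP_BOTH = "keep-both"              # drifted, disjoint, no merge capability
    CONFLICT = "conflict"                # drifted and overlapping: refuse


@dataclass(frozen=True)
class Overlay:
    base_rev: str       # shard revision the overlay was authored against
    touched: frozenset  # span ids the overlay edits


def decide_apply(overlay, current_rev, changed_upstream, can_merge):
    """Pick the apply path; never returns a path that silently overwrites."""
    if overlay.base_rev == current_rev:
        return Outcome.APPLY
    if overlay.touched & changed_upstream:
        return Outcome.CONFLICT
    return Outcome.THREE_WAY_MERGE if can_merge else Outcome.KEEP_BOTH


overlay = Overlay(base_rev="r41", touched=frozenset({"intro", "faq"}))
```

For example, `decide_apply(overlay, "r42", frozenset({"faq"}), True)` yields `Outcome.CONFLICT` even on a merge-capable shard, because the overlapping case is refused before merge capability is considered.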
Ordering. The journal commit is the ordering authority for coordination-canonical effects; a shard-native write is only acknowledged in the journal after the adapter confirms it, so a crash between journal-intent and shard-write is recoverable (the intent is replayable, the write is idempotent-keyed on identity+base-rev). Cross-shard operations are ordered by their journal commits, giving the causal cut above.
Residual open items (tracked in Known scaling risks & open problems, §12, not pretended solved): the exact convergence bound for high-write CRDT shards under partition, and whether per-equivalence-set divergence needs a vector clock vs. a simple base-rev comparison, are deferred to implementation spikes.
8.7 Scaling the union — incremental-first, rebuild as fallback
The derived tier is recomputable (I-2) but recompute must never be the operational mechanism. A from-scratch rebuild reads every page of every shard — including rate-limited, paginated external APIs (Notion) and irreducibly-live sources — which can take hours to days and directly fights the operational-envelope axis (review C-2). So:
Incremental, change-driven maintenance is the primary mechanism. Each shard's notify
capability (or a poll/ETag fallback where it has none, §8.8) emits change events; an event
drives a delta update to exactly the affected union nodes, equivalence candidates, indexes,
and projections. The derived tier is a continuously-maintained materialised view, not a
periodically-recomputed one. Steady-state cost is O(changes), not O(corpus).
Full rebuild is a rare, bounded fallback — for cold start, schema/algorithm change, or suspected corruption — and it is explicitly not required to be cheap. It respects each shard's envelope (it may be slow, throttled, or resumable for a rate-limited shard) and runs concurrently with serving the existing derived tier; it swaps in atomically on completion. I-2 guarantees rebuild is possible and correct, not instant.
Equivalence detection is indexed, not pairwise (review C-1). Naive fingerprint/span-set comparison across all pages of all shards is O(N²) and is forbidden. Instead:
- Blocking / candidate generation — cheap keys bucket pages that could be equivalent: normalised title, normalised path tail, explicit alias-table entries (coordination-canonical), and MinHash/LSH bands over content shingles for near-duplicate and derived-content detection. Only within-bucket pairs are considered — turning O(N²) into ≈O(N) candidates.
- Verification — candidate pairs are confirmed by full fingerprint / span-set overlap and any curator binding. Confirmed equivalences become union edges.
- Incremental maintenance — a changed page is re-bucketed and only its new candidate set is re-verified; equivalence is maintained per-change, never recomputed globally.
The index is itself derived (disposable, recomputable) and per-tenant-partitioned (§9). Its parameters (LSH band/row counts, shingle size, precision/recall) are tunable; the accepted false-negative rate of blocking is a known, tracked limitation (§12) — blocking trades a small miss rate for tractability, and curator bindings are the escape hatch for misses.
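The blocking-then-verification pipeline above could look roughly like the following. This is a sketch under stated assumptions: the parameters (32 hashes, 8 bands of 4 rows, 3-word shingles, 0.5 threshold) are arbitrary illustration values, Jaccard overlap of shingles stands in for the full fingerprint / span-set check, and only the content-LSH blocking key is shown (title, path-tail, and alias keys would add further buckets).

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 32            # illustrative parameter, tunable per the text
BANDS = 8
ROWS = NUM_HASHES // BANDS
SHINGLE = 3                # words per shingle


def shingles(text):
    words = text.lower().split()
    if len(words) < SHINGLE:
        return {" ".join(words)}
    return {" ".join(words[i:i + SHINGLE])
            for i in range(len(words) - SHINGLE + 1)}


def _hash(seed, value):
    # One seeded 64-bit hash per MinHash row (blake2b salt carries the seed).
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def minhash(text):
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES)]


def candidate_pairs(pages):
    """Blocking: only pages sharing at least one LSH band become candidates."""
    buckets = defaultdict(set)
    for page_id, text in pages.items():
        sig = minhash(text)
        for b in range(BANDS):
            band = tuple(sig[b * ROWS:(b + 1) * ROWS])
            buckets[(b, band)].add(page_id)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def equivalences(pages, threshold=0.5):
    """Verification: confirm each candidate pair by exact shingle overlap."""
    return {(a, b) for a, b in candidate_pairs(pages)
            if jaccard(pages[a], pages[b]) >= threshold}
```

Pages with identical content always share every band, so they are always candidates. Near-duplicates are caught only with some probability, which is the accepted false-negative rate the paragraph refers to.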
8.8 Cache freshness & invalidation
Replication-projection caches remote shard content; cache invalidation is the actual hard part
and was missing from the first draft (review C-2). The protocol is per-binding, driven by the
capability profile, with one rule: freshness is always represented, never assumed — every
cached page's provenance envelope carries (observed-at, source-rev-if-known, staleness-state),
so a consumer can always tell live from stale.
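One way to picture the freshness triple is as a small immutable record attached to each cached page. The field names mirror the text; the `Staleness` values and the `stamp` helper are assumptions made for this sketch, since the text does not enumerate the states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Staleness(Enum):
    # Hypothetical state names; the text only requires that a state exists.
    FRESH = "fresh"
    STALE = "stale"


@dataclass(frozen=True)
class Freshness:
    observed_at: datetime
    source_rev: Optional[str]  # None when the shard exposes no revision
    state: Staleness


def stamp(observed_at, source_rev, now, max_age):
    """Label a cached entry instead of silently assuming it is current."""
    state = Staleness.FRESH if now - observed_at <= max_age else Staleness.STALE
    return Freshness(observed_at, source_rev, state)


now = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)
recent = stamp(now - timedelta(minutes=5), "etag-abc", now, timedelta(hours=1))
old = stamp(now - timedelta(hours=2), None, now, timedelta(hours=1))
```

Here `recent` is labelled fresh and `old` is labelled stale with no known source revision; in both cases the consumer receives the label rather than a guess.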
Three invalidation modes, chosen by capability, not hard-coded:
| Mode | When | Mechanism |
|---|---|---|
| Event-driven (push) | shard has notify | a change event invalidates exactly the affected entries and enqueues a delta refresh (§8.7); the preferred mode |
| Validator poll | shard exposes ETag / Last-Modified / rev | conditional fetch (If-None-Match); cheap "still fresh?" checks without transferring bodies |
| TTL | shard offers neither | time-bounded staleness; the floor mode (Oddmuse-class shards) |
Most real bindings are hybrid: event-driven for invalidation + a long TTL as a safety net for missed events + validator polls on read when an entry is past a soft age.
Operational-envelope coupling. The mode is constrained by axis-10: a rate-limited shard (Notion) must favour event-driven + long TTL and must not poll per-read — the freshness policy is capability-gated like everything else. A local file shard can watch the filesystem (near-instant invalidation, effectively event-driven for free).
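The capability-gated choice of invalidation mode might be expressed as a pure function of the profile. This is a sketch only: `Profile`, `FreshnessPolicy`, and the TTL values (one day as the "long" safety net, five minutes otherwise) are assumptions, not values from the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    has_notify: bool
    has_validator: bool   # ETag / Last-Modified / rev
    rate_limited: bool


@dataclass(frozen=True)
class FreshnessPolicy:
    primary: str          # "event" | "validator" | "ttl"
    ttl_seconds: int
    poll_on_read: bool


LONG_TTL = 86_400   # assumed safety-net TTL for event-driven or throttled shards
SHORT_TTL = 300     # assumed TTL otherwise


def choose_policy(profile):
    if profile.has_notify:
        primary = "event"
    elif profile.has_validator:
        primary = "validator"
    else:
        primary = "ttl"
    # A rate-limited shard must never be polled per read.
    poll_on_read = profile.has_validator and not profile.rate_limited
    long_ttl = profile.has_notify or profile.rate_limited
    return FreshnessPolicy(primary, LONG_TTL if long_ttl else SHORT_TTL,
                           poll_on_read)
```

For example, a Notion-like profile (notify, no validator, rate-limited) yields event-driven invalidation with the long TTL and no per-read polling, while a shard with neither notify nor a validator falls through to plain TTL.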
Thundering-herd / coalescing. Concurrent reads of the same stale entry trigger a single in-flight refresh (single-flight); other readers await it or are served the stale-but-labelled value per policy. Bulk invalidations (a shard-wide event) are batched and rate-shaped to the shard's envelope rather than fired as N concurrent fetches.
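Single-flight coalescing can be sketched with `asyncio`: the first reader of a stale key starts the refresh and later readers await the same task. `SingleFlight` and `demo` are hypothetical names; the serve-stale-but-labelled option and bulk rate-shaping are left out of this sketch.

```python
import asyncio


class SingleFlight:
    """Coalesce concurrent refreshes of the same cache key into one fetch."""

    def __init__(self):
        self._inflight = {}

    async def do(self, key, fetch):
        existing = self._inflight.get(key)
        if existing is not None:
            return await existing          # join the refresh already running
        task = asyncio.ensure_future(fetch())
        self._inflight[key] = task
        try:
            return await task
        finally:
            self._inflight.pop(key, None)  # allow a later refresh of this key


async def demo():
    calls = 0

    async def fetch():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)          # stands in for a remote shard read
        return "fresh body"

    flight = SingleFlight()
    results = await asyncio.gather(
        *(flight.do("page:Home", fetch) for _ in range(5)))
    return calls, results


calls, results = asyncio.run(demo())
```

Five concurrent readers produce one upstream fetch, and all five receive the same refreshed value.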
Staleness is a policy knob, not a correctness bug. Whether a reader gets stale-but-fast or blocks-for-fresh is a §10 preset (per space or per request); either way the envelope tells the truth about what was served. This is union-without-erasure applied to time.
9. Cross-cut — Authorization (L5)
Fully specified in ArchitectureBlueprint.md (the access & history sub-blueprint);
summarised here for completeness:
- One core, a ladder of modes L0 (open/c2, zero deps) → L1 (attributed) → L2 (authenticated) → L3 (role/group) → L4 (multi-tenant enterprise). Climbing is configuration, not re-architecture.
- PEP wraps every adapter op; PDP decides (principal, action, target) over actions read/write/patch/merge/administer, layered on the adapter's capability profile (a shard that can't write can't be written regardless of policy — L5 consults the L1 rail).
- Authentication delegated to a pluggable IdentityProvider (null provider = L0 default); real identity from user-engine over net-kingdom IAM.
- Fail open only at L0, fail closed at L2+. Authorization is pure/offline once a Principal is resolved. Provenance carries authz context so the union never leaks unreadable content (the L5↔provenance-rail interaction).
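The PEP/PDP split and the fail-open/fail-closed rule can be illustrated as follows. This is a sketch, not the design in ArchitectureBlueprint.md: `enforce`, `Principal`, and `PolicyUnavailable` are hypothetical names, the capability profile is reduced to a single `can_write` flag, and L1 is treated as fail-closed, which the summary above leaves unstated.

```python
from dataclasses import dataclass

ACTIONS = {"read", "write", "patch", "merge", "administer"}
MUTATING = {"write", "patch", "merge"}


@dataclass(frozen=True)
class Principal:
    name: str


class PolicyUnavailable(Exception):
    """Raised by a PDP that cannot reach a decision (hypothetical)."""


def enforce(level, principal, action, target, can_write, pdp):
    """PEP: wraps an adapter operation and returns True if it may proceed."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # Capability rail first: policy cannot grant what the shard cannot do.
    if action in MUTATING and not can_write:
        return False
    try:
        return bool(pdp(principal, action, target))
    except PolicyUnavailable:
        # Fail open only at L0; every other level fails closed here
        # (the L1 behaviour is an assumption of this sketch).
        return level == 0


def allow_all(principal, action, target):
    return True


def unavailable(principal, action, target):
    raise PolicyUnavailable()
```

With these definitions, `enforce(2, Principal("alice"), "write", "Home", False, allow_all)` returns `False` even though the policy allows everything, because the shard cannot be written.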
10. The policy surface (mechanism over policy, made concrete)
I-7 only means something if the policy knobs are enumerated and kept out of core algorithms. The configurable presets are:
- Canonical-source policy — chorus / designated-canonical / git-merge / overlay-only / vote-to-merge (per space or per equivalence set).
- Federation model — the §8.3 taxonomy, per space, composable per shard.
- Shard mode — read-only / write-through / mirrored / projected / cached / canonical (constrained by the capability profile).
- Reconciliation cadence & conflict exposure — push/poll/manual; show-conflicts vs auto-merge-when-supported.
- Execution policy — derive/execute off (default) / sandboxed / per-shard-allowed.
- Authorization mode — the L0–L4 ladder.
- Projection materialization — lazy/eager; snapshot vs view-time; recording retention.
Core ships sane defaults (L0 open; fork+journal; lazy replication-projection; overlay-before-mutation; execution off) and never hard-codes any of the above.
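As an illustration of keeping the knobs out of core algorithms, the shipped defaults could live in one immutable preset record that a space overrides by value. The field names and string encodings below are assumptions; only the five default values come from the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Presets:
    authorization_mode: str = "L0"              # L0 open
    federation_model: str = "fork+journal"
    projection_materialization: str = "lazy"    # lazy replication-projection
    overlay_before_mutation: bool = True
    execution_policy: str = "off"


DEFAULTS = Presets()


def for_space(**overrides):
    """Derive a per-space preset without mutating the shipped defaults."""
    return replace(DEFAULTS, **overrides)


team_space = for_space(authorization_mode="L2")
```

`dataclasses.replace` raises `TypeError` on an unknown field name, so a misspelled knob is rejected rather than silently ignored.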
11. Concrete module structure (bridge to implementation)
A proposed package layout for src/shard_wiki/, mapping 1:1 to the layers so the dependency
rule (downward only; L4 rebuildable) is enforceable by import lint:
src/shard_wiki/
    model/          # L2 top waist: Page, Identity, Placement, ProvenanceEnvelope,
                    # Span, the page-shape types; capability-spectrum value types
    adapters/       # L1 bottom waist: AdapterContract (versioned iface), CapabilityProfile,
                    # attachment-mode binding; concrete adapters:
        git/ folder/ gitea/ obsidian/ webdav/ notion/ …   # each: profile + verbs
    coordination/   # L3: GitJournal, OverlayEngine (draft→patch→MR), reconcile
    federation/     # L3: FederationModel strategies (fork_journal, vcs_ping,
                    # graph_join, feed, activitypub, engine_mirror)
    union/          # L4 (derived): IdentityResolver, EquivalenceGraph, UnionGraph,
                    # Transclusion (reference-not-copy)
    projection/     # L4 (derived): ReplicationProjection, DerivationProjection,
                    # ViewRegistry (moldable), QueryIndex (delegate|derive)
    authz/          # L5 cross-cut: PDP, PEP, IdentityProvider iface, NullProvider
    provenance/     # cross-cut: the envelope plumbing used by every layer
    api/            # L6: orchestrator API (server-side union for agents/CLI)
Hard import rules: union/ and projection/ may import model/, adapters/,
coordination/ but nothing may import them (they are the disposable middle). model/ and
adapters/ import nothing else in the tree except provenance/ (the waists stay thin).
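The hard import rules are mechanical enough to check with the standard-library `ast` module. The sketch below encodes the two rules exactly as worded above; `violations` and the rule tables are hypothetical, relative imports are ignored for brevity, and a real project might prefer an existing import-linting tool.

```python
import ast

ROOT = "shard_wiki"
DISPOSABLE = {"union", "projection"}   # nothing may import these
ALLOWED = {                            # packages with an explicit allow-list
    "model": {"provenance"},
    "adapters": {"provenance"},
    "union": {"model", "adapters", "coordination"},
    "projection": {"model", "adapters", "coordination"},
}


def imported_packages(source):
    """Return the top-level shard_wiki packages a module imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        names = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == ROOT:
                names = [f"{ROOT}.{alias.name}" for alias in node.names]
            else:
                names = [node.module]
        for name in names:
            parts = name.split(".")
            if parts[0] == ROOT and len(parts) > 1:
                found.add(parts[1])
    return found


def violations(package, source):
    """List rule breaches for one module living in the given package."""
    problems = []
    for dep in sorted(imported_packages(source) - {package}):
        if dep in DISPOSABLE:
            problems.append(f"{package} imports disposable package {dep}")
        elif package in ALLOWED and dep not in ALLOWED[package]:
            problems.append(f"{package} may not import {dep}")
    return problems
```

Run over every file under `src/shard_wiki/`, a non-empty result for any module would fail the lint step.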
12. Canonical data flows (the architecture exercised)
A. Attach a shard. Adapter binds (chosen attachment mode) → probes/declares a capability profile → core registers the shard under a root entity → if not git-native, the coordination journal is seeded (begin-now/mirror/import per axis 5). No union rebuild yet (lazy).
B. Read a page through the union. Consumer asks the union for an identity → Identity resolver maps it to placements across shards → equivalence yields chorus or canonical → replication-projection lazily fetches from each shard (cache + freshness) → page returned wrapped in its provenance envelope → L5 filters anything the principal can't see at source.
C. Edit a read-only / limited shard. Write request → L5 PDP allows → capability profile says < write-through → OverlayEngine records a draft → renders a patch/MR in the shard's native syntax (lossless) or Markdown (lossy-with-report) → on explicit apply, commit to the journal and (if the profile permits) propagate; otherwise the overlay stands as the local truth, fully attributed.
D. Attach a computational notebook. Adapter declares profile (attachment=file-store,
opacity=mixed, computational=captured-output). Core attaches the .ipynb source as
canonical; presents cells + embedded outputs as derivation-projection snapshots marked
"run N, env unguaranteed"; offers a static render via the view registry; re-execution stays
gated off. History uses paired-text/nbdime per axis 5.
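Flow C above can be sketched as a routing decision on the capability profile. Everything here is illustrative: the three-step `WriteLevel` ordering, `Overlay`, `handle_write`, and `apply_overlay` are hypothetical, the PDP decision is passed in as a boolean, and patch rendering is omitted.

```python
from dataclasses import dataclass
from enum import IntEnum


class WriteLevel(IntEnum):
    # Assumed ordering of one capability axis; the real spectrum may differ.
    READ_ONLY = 0
    PATCH_ONLY = 1
    WRITE_THROUGH = 2


@dataclass
class Overlay:
    identity: str
    base_rev: str
    body: str
    author: str
    applied: bool = False


def handle_write(allowed, level, identity, base_rev, body, author,
                 journal, overlays):
    """Route a write: deny, write through, or record an overlay draft."""
    if not allowed:                       # L5 PDP said no
        return "denied"
    if level >= WriteLevel.WRITE_THROUGH:
        journal.append(("write", identity, base_rev))
        return "write-through"
    overlays.append(Overlay(identity, base_rev, body, author))
    return "overlay-draft"


def apply_overlay(overlay, level, journal):
    """Explicit apply: journal the overlay, propagate only if permitted."""
    journal.append(("overlay-applied", overlay.identity, overlay.base_rev))
    overlay.applied = True
    if level >= WriteLevel.PATCH_ONLY:
        return "propagated"
    return "local-truth"                  # overlay stands, fully attributed
```

Nothing reaches the journal until the explicit apply step, and a read-only shard ends in the `local-truth` outcome rather than an error.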
13. Key tradeoffs & decisions to confirm
Resolved here:
- Capability spectra over a verb checklist — accept richer contract complexity for precise, uniform degradation. (Decided: spectra.)
- Derived middle is a cache, not a store — accept recompute cost for rebuildability, provenance, and graceful degradation. (Decided: cache.)
- Default federation = fork+journal over Git — the home case; other models opt-in. (Decided.)
- Execution off by default — recognise+project always; execute only when gated on. (Decided.)
Open — to confirm before SHARD-WP-0002 spec-writing finalises:
- Union graph persistence. Pure-recompute (simplest, honours I-2 hardest) vs a persisted-but-disposable cache (faster, must guarantee rebuild equivalence). Recommendation: persisted-but-disposable with a rebuild that must reproduce it byte-for-byte.
- Address scheme. Ship shard-scoped native-id wrapping now and treat a portable tumbler as a later capability, or design the tumbler up front? Recommendation: wrap native ids now.
- L1 "attributed-but-open" mode — ship it or jump L0→L2? (Carried from ArchitectureBlueprint.)
- Per-page ACL default — off (per-shard/namespace) confirmed; revisit only if demand appears.
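If the persisted-but-disposable option in the union-graph item is confirmed, the byte-for-byte requirement implies a canonical serialisation, so that a rebuild and the persisted cache can be compared by digest. A minimal sketch, assuming a toy graph of node ids and undirected edges; `canonical_bytes` and `digest` are hypothetical helpers.

```python
import hashlib
import json


def canonical_bytes(graph):
    """Serialise deterministically: sorted nodes, sorted undirected edges."""
    nodes = sorted(graph["nodes"])
    edges = sorted(sorted(edge) for edge in graph["edges"])
    payload = {"nodes": nodes, "edges": edges}
    return json.dumps(payload, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def digest(graph):
    return hashlib.sha256(canonical_bytes(graph)).hexdigest()


persisted = {"nodes": ["b:Home", "a:Home"],
             "edges": [("b:Home", "a:Home")]}
rebuilt = {"nodes": ["a:Home", "b:Home"],
           "edges": [("a:Home", "b:Home")]}
```

The two graphs above differ only in ordering, so their digests match; any real difference in nodes or edges would change the digest and flag the cache as non-equivalent.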
14. What this architecture is not
- Not a wiki engine, UI, or rendering pipeline (those are consumers at L6).
- Not a canonical-source-of-truth — shards keep sovereignty; the middle is derived.
- Not a generic file-sync daemon — synchronisation is wiki-page-semantic.
- Not an execution platform — computation is recognised and projected, not hosted.
- Not a universal ontology — no single schema is imposed on all shards.
- Not an authentication/identity store — that is delegated (authorization is owned).
15. Traceability
- INTENT — every invariant in §2 cites an INTENT principle or boundary; no invariant contradicts the Stability Note.
- Research — §6 (spectra) ← 260614-shard-spectrum-synthesis v3; §8.3 (federation taxonomy) ← v3 §2.5; §8.4–§8.5 (two-axis projection, view registry, computational scope) ← 260614-computational-page-model-synthesis; §7 page shapes ← the engine + modern-tool + computational dives; §1 thesis ← the files-canonical/index-derived through-line across Logseq/ikiwiki/GT/Pharo/Jupyter.
- Use cases — the architecture is sized to UC-01–UC-84: federation/coordination (UC-01–07, 26–33, 56, 71–72, 79) → §8; attachment/adapter (UC-34–43, 50, 53, 57, 60–62, 64–66, 68–70, 76–82) → §6; page model & fidelity (UC-34, 39, 42, 55, 58–59, 67, 73, 80, 83–84) → §7/§8.5; addressing/identity/query (UC-32, 44–49, 51–52, 54, 63, 74) → §7.2/§8.4; provenance & metadata (UC-24–25, 75) → the provenance rail; collaboration & discovery (UC-08–23) → L6 consumers over the union.
- Workplans — §6–§8 are the design target of SHARD-WP-0002 (T11–T18); §9 is owned by ArchitectureBlueprint.md; §1 (yawex-derived resolution/overlay) aligns with SHARD-WP-0001.
16. Stability note
This document defines shard-wiki's internal architecture; it may evolve as the spec workplans land. But the thesis (§1), the invariants (§2), and the dual narrow waist (§1, §6, §7) are load-bearing — changing any of them is an architectural change in the sense of INTENT's Stability Note and should be rare and deliberate.