tegwick e18397272a spec(SHARD-WP-0013 T6): wire-up + close-out
spec/README + SCOPE list WikiEngineCoreArchitecture.md; CoreArchitectureBlueprint
cross-links the engine as a canonical-mode shard (federation/union stay in the
orchestrator). reuse-surface engine capability promoted D2->D3 (4204255).
Marks T6 + SHARD-WP-0013 done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 23:02:15 +02:00


CoreArchitectureBlueprint — shard-wiki

Status: draft for review · Date: 2026-06-15 · Owner: tegwick

The whole-system architecture for shard-wiki, synthesised from INTENT.md, the 84-entry UseCaseCatalog.md, and the full research arc (research/260608-*, research/260613-*, research/260614-* — ~23 wiki/knowledge systems plus two cross-dive syntheses). This is the core blueprint: it defines the layers, the abstractions, and the load-bearing decisions that everything else implements.

Scope relationship to the other specs:

  • ArchitectureBlueprint.md (existing) is the authorization & history sub-blueprint (the L0-L4 ladder). This document references it as the design of the cross-cutting authorization layer (§9) and does not restate it.
  • SHARD-WP-0002 is the workplan that turns §6-§8 into spec/FederationArchitecture.md + the adapter-contract section of spec/TechnicalSpecificationDocument.md.
  • UseCaseCatalog.md is the demand this architecture must satisfy; UC references below are load tests, not decoration.
  • WikiEngineCoreArchitecture.md designs shard-wiki's native, headless, API-first wiki engine as a canonical-mode shard backend (one shard behind §6/§A — federation, union, and projection stay here in the orchestrator, not in the engine). Added per the 2026-06-15 INTENT amendment (decision 84ffdb48, SHARD-WP-0013).

1. The thesis: canonical vs derived (three states)

Everything in shard-wiki follows from one organising decision — that state comes in exactly three kinds, and only one of them is disposable:

  1. Sharded-canonical — content owned by each shard (shard sovereignty).
  2. Coordination-canonical — durable state born inside shard-wiki that encodes human or cross-shard decisions and exists nowhere else: overlays (the local truth against a read-only shard), curator equivalence bindings, alias tables, merge/reconciliation decisions. It is recorded as an append-only decision log in the Git coordination journal (event-sourced, §8.1). The queryable current form of that state (the effective alias table, the equivalence set) is a derived fold of the log, i.e. tier 3, not tier 2. What is canonical is the log of decisions, not any mutable snapshot of them.
  3. Derived-disposable — everything shard-wiki computes from (1)+(2): the union graph, equivalence index, query indexes, projections, views. It can be deleted and recomputed.

Canonical = sharded + coordination. Derived = a pure function of canonical: derived = f(sharded, coordination).

This is the architectural form of "orchestrator, not engine." shard-wiki never becomes the source of truth; it composes sources and records the decisions it makes about them. The research earned the files-canonical half empirically — every serious system externalises its durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams (nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk image — is exactly the one we record as a boundary, not a backend (research/260614-squeak-pharo-deep-dive).

The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges, derived in the middle"). That was wrong by omission. It had no home for coordination-canonical state, and so it contradicted itself by listing curator bindings and alias tables as "derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model fixes that crack (review finding A-1) and makes derived = f(canonical) literally true.

Three consequences fall straight out, and they are the spine of the rest of this document:

  1. Graceful degradation is free. If the derived tier is always recomputable, a backend that can only be read is still a first-class participant — you just derive less from it.
  2. Provenance is tractable. Because shard-wiki never claims to be the source, every derived artifact can always point back to the canonical input it came from (union without erasure is a structural property, not a feature bolted on).
  3. The derived tier is a pure function of canonical state. derived = f(sharded, coordination). Bugs in the derived tier are recoverable by recompute; only the two canonical tiers must be durably protected — sharded by each shard, coordination by the Git journal (history). Recomputability is a correctness property of the derived tier, not a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the operational-envelope axis.
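Consequence 3 can be made concrete with a small example. The sketch below assumes a hypothetical title index as the derived artifact and a hypothetical `("alias-set", alias, target)` event shape for the coordination log. It is not shard-wiki's API. It shows that deleting the derived result and recomputing it from the same canonical inputs yields the same value.

```python
# Hypothetical shapes: sharded content maps (shard, native page id) -> title;
# the coordination log is a list of (event kind, alias, target) decisions.
def derive_title_index(
    sharded: dict[tuple[str, str], str],
    coordination_log: list[tuple[str, str, str]],
) -> dict[str, set[tuple[str, str]]]:
    """A pure function of the two canonical tiers: same inputs, same output."""
    aliases: dict[str, str] = {}
    for kind, alias, target in coordination_log:
        if kind == "alias-set":            # later decisions supersede earlier ones
            aliases[alias] = target
    index: dict[str, set[tuple[str, str]]] = {}
    for (shard, page), title in sharded.items():
        key = aliases.get(title, title)    # fold curator aliases into the key
        index.setdefault(key, set()).add((shard, page))
    return index


sharded = {("git-a", "p1"): "Home", ("notion", "x9"): "Start"}
log = [("alias-set", "Start", "Home")]
index = derive_title_index(sharded, log)
del index                                   # the derived tier is disposable...
rebuilt = derive_title_index(sharded, log)  # ...and recomputable from canonical state
```

The alias table is an input read from the log. It is never regenerated by the function, which matches the split between coordination-canonical and derived state.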

The dual narrow waist

Heterogeneity is mediated at exactly two interfaces, and nowhere else:

  • Bottom waist — the Shard Adapter Contract (§6). Every backend, however weird, enters through one versioned, capability-described interface.
  • Top waist — the Wiki Page Model (§7). Every consumer, however demanding, sees one backend-neutral, Markdown-first-but-stretchable page model.

Between the waists, core logic is written once against capabilities and the page model — never against a specific backend. Adding TiddlyWiki or Notion or a git forge is writing an adapter and declaring a capability profile, not editing core algorithms.


2. Architectural invariants

These are non-negotiable. Violating one is a design bug, not a tradeoff. They are INTENT's principles fused with the research through-lines.

| # | Invariant | Source |
|---|---|---|
| I-1 | Orchestrator, not engine. Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
| I-2 | Three states; derived = f(canonical). State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
| I-3 | Capability-awareness is data. A binding's abilities are a profile (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
| I-4 | Union without erasure. Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
| I-5 | Overlay before mutation. Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
| I-6 | Git-addressable coordination. Every information space has a Git-backed journal even when its shards are not git-native. | INTENT |
| I-7 | Mechanism over policy. Canonical-source, conflict, editorial, sync cadence are configurable presets, never hard-coded. | INTENT |
| I-8 | Graceful degradation. A limited backend is still usable as read-only / cache / projection / backup / patch target. | INTENT |
| I-9 | Identity ≠ placement. A page is an entity that may occupy N locations; address by identity, not by path. | Trilium note/branch; ZigZag |
| I-10 | History is the floor. Every write is a recoverable commit; recoverability, not gatekeeping, is the baseline protection. | ArchitectureBlueprint §2 |
| I-11 | Authorization in core, authentication delegated. Core decides who-may; an external provider says who-is. | INTENT; ArchitectureBlueprint |
| I-12 | Not a file-sync daemon; not an execution platform. Sync is wiki-page-semantic; computation is recognised+projected, not hosted. | INTENT; computational-page-model synthesis |
| I-13 | Tenant-partitioned derived state. Derived state is partitioned by tenant/root entity; no derived artifact spans tenants except via explicit, authorised cross-root federation. | §9.1; review B-3 |

3. The layered architecture

        ┌───────────────────────────────────────────────────────────────┐
        │  L6  Consumers — Orchestrator API · CLI/agents · Web/Obsidian   │
        ├───────────────────────────────────────────────────────────────┤
 X-cut  │  L5  Authorization (PEP/PDP, identity-provider iface) →         │ X-cut
 Prove- │      see ArchitectureBlueprint.md (L0-L4 ladder)                │ Capa-
 nance  ├───────────────────────────────────────────────────────────────┤ bility
  ▲     │  L4  Union & Projection  (DERIVED · rebuild=fallback)           │   ▲
  │     │      identity resolution · equivalence/chorus · union graph ·   │   │
  │     │      replication+derivation projections · moldable view registry│   │
  │     │      · derived query index                                      │   │
  │     ├───────────────────────────────────────────────────────────────┤   │
  │     │  L3  Coordination  (Git journal · overlay/patch engine ·        │   │
  │     │      federation-model strategies · reconciliation)              │   │
  │     ├───────────────────────────────────────────────────────────────┤   │
  │     │  L2  Wiki Page Model  ── TOP WAIST ──                           │   │
  │     │      backend-neutral pages · identity≠placement · span address ·│   │
  │     │      provenance envelope · the page shapes                      │   │
  │     ├───────────────────────────────────────────────────────────────┤   │
  │     │  L1  Shard Adapter Contract  ── BOTTOM WAIST ──                 │   │
  │     │      versioned iface · capability profile (orthogonal) ·        │   │
  │     │      attachment-mode binding · operation verbs                  │   │
  └──── ├───────────────────────────────────────────────────────────────┤ ──┘
        │  L0  Backends (not ours): git repos, wiki/ subdirs, Gitea/      │
        │      GitLab/GitHub wikis, folders, Obsidian, WebDAV, Notion,    │
        │      Coulomb spaces, notebooks, …                               │
        └───────────────────────────────────────────────────────────────┘

Provenance and Capability are drawn as vertical rails because they are not layers — they are present at every layer. A page object at L2 carries provenance; a projection at L4 carries provenance; an authz decision at L5 records the context under which content was read. Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4 (can we delegate a query?), and L5 (can this principal even reach the op?).

The dependency rule is strict and downward, and it tracks the three states (§1), not the layer numbers: the derived-disposable tier (the whole of L4) may be deleted and recomputed from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal). Nothing canonical may depend on derived state. Note the journal at L3 is canonical (it holds overlays, bindings, aliases, merges); only L4 is disposable.


4. Core abstractions (the vocabulary code must use)

Straight from INTENT, sharpened by research. New code maps onto these; it does not invent parallel terms.

  • Shard — an independently meaningful page store attached to a root entity, with sovereignty: its own backend, capability profile, history, identity model, limits.
  • Root entity / information space — the joined space shards attach to; the unit of Git coordination and of multi-tenancy (a tenant maps to a root entity, ArchitectureBlueprint).
  • Shard adapter contract — the versioned L1 interface; the bottom waist.
  • Capability profile — a shard binding's position on each of the 15 spectra (§6) plus its supported verbs. The data structure that drives degradation.
  • Wiki page model — the L2 backend-neutral page; the top waist.
  • Page identity vs placement vs equivalence — a page is an entity with a stable handle (identity); it may have N placements (paths/shards); addressing and transclusion key on identity, but equivalence keys on content fingerprint across identities (§7.2, I-9). The three are distinct mechanisms — never conflate identity with a fingerprint.
  • Provenance envelope — the metadata each artifact carries (source shard, freshness, liveness, authz context, overlay status, divergence, lineage), stored layered: a page-level envelope + span-level deltas, so per-span cost is near-zero when uniform (§7.3).
  • Coordination journal — the L3 Git-backed, append-only decision log for a space. It is the durable home of all coordination-canonical state (§1, §8.1), stored as events (overlay-created, binding-made, alias-set, merge-decided), plus the content change-flow record. It is event-sourced: committed, never overwritten. The queryable current coordination state is a derived fold of it (§8.1).
  • Overlay — a non-destructive local edit against a remote/read-only/limited shard, representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an unapplied overlay is the local truth and lives in the journal.
  • Projection — a derived view of shard content. The default is a plain lazy replication-projection (a freshness-stamped cache). Only source content that needs transform/evaluate uses the derivation-projection extension point, with its two-axis typing (kind × liveness) and the moldable view registry (§8.4-§8.5).
  • Federation model — the selected coordination strategy for a space (§ taxonomy, T17).
  • Shard mode — read-only · write-through · mirrored · projected · cached · canonical (a policy selection constrained by the capability profile).

5. Why "layered" and not "pipeline" or "plugin-bus"

Two rejected alternatives, recorded so the choice is legible:

  • A sync pipeline (source → transform → sink) was rejected: it implies a privileged direction and a canonical sink, which violates shard sovereignty (I-1) and union-without-erasure (I-4). shard-wiki is a star (many shards ↔ one space), not a pipe.
  • A flat plugin bus (every backend a peer plugin emitting events) was rejected as the top-level shape: it has no narrow waist, so heterogeneity leaks into every consumer. We keep the plugin idea but confine it to L1 (adapters) and L3 (federation strategies), behind the waists.

The layered-with-rails shape is what makes I-2/I-3/I-4 hold simultaneously.


6. Bottom waist — the Shard Adapter Contract (L1)

The single most important design decision in the project: the adapter contract models positions on capability spectra, not a flat checklist of boolean verbs. A backend is not "can/can't merge"; it sits somewhere on the merge spectrum, and federation operations degrade by position. This is the lesson of putting ~23 systems in one matrix (research/260614-shard-spectrum-synthesis, v3).

6.1 The fifteen capability spectra

Each binding declares a position on each axis. Core algorithms read these positions; there is no per-backend code in core (I-3).

  1. Addressing granularity — none → path → page-level store-id → in-file span → in-file block id (Logseq id::) → store-UUID → portable tumbler (Xanadu, the unreached ideal)
  2. Content identity — none → path/title → fingerprint → span-set
  3. Identity vs placement — path=identity → identity separated from placement (Trilium note/branch = a DAG)
  4. Structure — flat MD → frontmatter/key::%META% → typed objects → DB schema+relations → object-graph/ontology → computed (inherited+templated) → typed-graph statements
  5. History — none → internal-only / CRDT-log → open-file → git-native
  6. Merge model — none → git/text → conflict-notes/keep-both → native-CRDT → coexist-with-rank
  7. Native query — none → text → build-your-own derived index → datalog/graph → DB query → SPARQL
  8. Translation — native → lossless → lossy-with-fidelity-report (incl. HTML)
  9. Attachment mode — file-store (native | interchange-mirror) → git-IS-store → in-engine-host → local-REST → external-API → direct-DB → CRDT-replica → P2P/no-central-endpoint
  10. Operational envelope — local/unbounded → realtime CRDT/WebSocket → rate-limited/eventually-consistent/paginated
  11. Access grant — open → token → OAuth scoped+revocable → P2P key/invite → enterprise ACL
  12. Content opacity — plaintext → structured re-evaluable value → encrypted whole-shard → per-item → proprietary-lossy-exportable
  13. Write granularity — whole-file (TiddlyWiki) → per-page → section/anchor → per-block → story-item
  14. Provenance granularity — per-shard → per-page → per-edit → per-statement/value (Wikibase rank+refs)
  15. Computational / liveness — static source → captured-output snapshot → live-over-files → view-time render → irreducibly-live/temporal
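Because each axis is ordinal, a position can be stored as an ordered enum and compared directly. The sketch below encodes only axis 5 (history). The member names are illustrative, not a published schema. It adds one hypothetical single-axis check, under the assumption that only open-file and git-native history can be backfilled into the journal (§8.1).

```python
from enum import IntEnum


class History(IntEnum):
    """Axis 5, positions in the order §6.1 lists them (member names are illustrative)."""
    NONE = 0
    INTERNAL_ONLY_OR_CRDT_LOG = 1
    OPEN_FILE = 2
    GIT_NATIVE = 3


def history_backfillable(position: History) -> bool:
    """Hypothetical single-axis check: can past revisions be imported into the journal?
    Assumes only open-file and git-native history is readable from outside the backend."""
    return position >= History.OPEN_FILE
```

Comparing positions (`>=`) instead of testing named backends is what keeps this check free of per-backend branches (I-3).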

6.2 Operation verbs

read, write, diff, merge, lock, version, publish, notify, transclude-source, translate-syntax, structured-payload, derive-projection, execute/evaluate. The last two are gated, off by default (§8, computational content). Verb support is part of the profile and must reconcile with the federation-ops capability matrix (SHARD-WP-0002 T10).
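Verb support can be represented as a set of flags, with the two gated verbs masked out unless a binding explicitly enables them. The `Verb` flag and `usable_verbs` helper below are hypothetical names chosen for illustration.

```python
from enum import Flag, auto


class Verb(Flag):
    READ = auto()
    WRITE = auto()
    DIFF = auto()
    MERGE = auto()
    LOCK = auto()
    VERSION = auto()
    PUBLISH = auto()
    NOTIFY = auto()
    TRANSCLUDE_SOURCE = auto()
    TRANSLATE_SYNTAX = auto()
    STRUCTURED_PAYLOAD = auto()
    DERIVE_PROJECTION = auto()
    EXECUTE_EVALUATE = auto()


# The last two verbs are gated: off unless explicitly enabled for the binding.
GATED = Verb.DERIVE_PROJECTION | Verb.EXECUTE_EVALUATE


def usable_verbs(declared: Verb, enabled_gated: Verb = Verb(0)) -> Verb:
    """Declared verbs, minus any gated verb that was not explicitly enabled."""
    still_gated = GATED & ~enabled_gated
    return declared & ~still_gated
```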

6.3 Attachment-mode taxonomy (axis 9, expanded)

A backend may offer several modes; attach mode is a per-binding, capability-gated choice, with one declared authoritative. Modes: file-store (native vault/folder or an interchange/sync mirror), git-IS-store (the home case — forge wikis & ikiwiki: git is the store and the journal at once, resolving the engine-mirror write-race), in-engine hosted adapter (XWiki component, Obsidian/Logseq/Roam plugin, Trilium script), local-REST (Joplin Data API, Trilium ETAPI), external-API-only (Notion), direct-DB (MojoMojo schema→model), CRDT-replica (Anytype/AFFiNE/AppFlowy), P2P/no-central-endpoint. Boundary: a monolithic live-memory blob (Smalltalk image, a kernel) is never an attach target — it participates only via export→files (I-12).

6.4 Contract rules

  • Versioned interface (Foswiki::Store + Foswiki::Meta is the proof that a stable store-interface-with-swappable-backends works). Capability discovery is a static profile with optional runtime negotiation.
  • Backend-swap tolerance — shard identity/provenance survives a substrate change (RCS↔PlainFile, folder→Git, Logseq file→SQLite): bind to capabilities, not to "it's files."
  • Absence is first-class — the profile must express can't cleanly (Oddmuse floor), so degradation paths are explicit, never guessed.

6.5 Orthogonal core, implied positions, and the interaction subset

Fifteen independent ordinal axes is descriptively right but would be operationally a mess if treated as fifteen free dimensions: the axes are not orthogonal, and a degradation function over all 15 jointly is the flat-checklist problem returning in higher dimensions (review D-1). Three rules tame it.

(a) A small orthogonal core; the rest are implied. Most axes are correlated and collapse to a few independent choices. The core axes an adapter must independently declare:

  1. Substrate → drives attachment-mode, history, merge, and native-query positions together (git-IS-store ⟹ history=git-native ⟹ merge=git/text ⟹ query=build-your-own-index; relational-DB ⟹ direct-DB attach ⟹ DB-version-row history ⟹ DB query).
  2. Write granularity → drives addressing granularity and the overlay/patch shape.
  3. Content opacity → drives translation and (where encrypted) collapses native-query.
  4. Operational envelope → drives freshness mode (§8.8) and rebuild expectations (§8.7).
  5. Access grant → independent (authz, L5).
  6. Computational/liveness → independent (projection kind, §8.5).

The remaining axes are implied/derived from these via published implication rules; an adapter may override an implied position, but the default is computed, not hand-set. This turns ~15 free dimensions into ~6 independent ones plus derivations — fewer things to get wrong, and impossible combinations become unrepresentable.

(b) Implication rules forbid impossible profiles. E.g. attachment=git-IS-store ⟹ history≥git-native; opacity=encrypted-whole-shard ⟹ native-query=none ∧ translation≤opaque; merge=native-CRDT ⟹ history=CRDT-log ∧ envelope=realtime. A profile that violates an implication is rejected at registration — capability-as-data (I-3) with integrity constraints.
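Implication rules can be held as data and checked at registration, as the paragraph above describes. This is a minimal sketch with a hypothetical `Profile` that covers only the axes the three example rules mention. Positions are plain strings. "history≥git-native" is checked as equality because git-native is the top of axis 5. The translation clause of the encryption rule is omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Profile:
    # Hypothetical subset of axes; positions are plain strings for brevity.
    attachment: str
    history: str
    merge: str
    envelope: str
    opacity: str
    native_query: str


# Each rule: (name, premise, conclusion). A profile violates a rule when the
# premise holds but the conclusion does not.
Rule = tuple[str, Callable[[Profile], bool], Callable[[Profile], bool]]

RULES: list[Rule] = [
    ("git-IS-store => history=git-native",
     lambda p: p.attachment == "git-IS-store",
     lambda p: p.history == "git-native"),
    ("encrypted-whole-shard => native-query=none",
     lambda p: p.opacity == "encrypted-whole-shard",
     lambda p: p.native_query == "none"),
    ("native-CRDT => history=CRDT-log and envelope=realtime",
     lambda p: p.merge == "native-CRDT",
     lambda p: p.history == "CRDT-log" and p.envelope == "realtime"),
]


def violations(p: Profile) -> list[str]:
    return [name for name, premise, conclusion in RULES if premise(p) and not conclusion(p)]


def register(p: Profile) -> Profile:
    """Reject an impossible profile at registration instead of running with it."""
    bad = violations(p)
    if bad:
        raise ValueError(f"profile rejected: {bad}")
    return p
```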

(c) The degradation function reads a named, small interaction subset — not all pairs. "No per-backend code" is only credible if we say which axis interactions the generic logic actually consults. They are:

| Operation | Axes consulted (jointly) |
|---|---|
| write / overlay-apply | write-granularity × merge-model × history × access-grant |
| transclude / address a span | addressing-granularity × write-granularity × identity-vs-placement |
| project / cache | operational-envelope × computational-liveness × content-opacity |
| query | native-query × content-opacity (encrypted ⇒ derive-index-or-none) |
| translate | translation × content-opacity × structure |
| federate | substrate × history × merge (per the §8.3 model) |

Everything else is a single-axis check. This table is the degradation contract: it is small, enumerated, and testable — the proof obligation behind "core logic written once."
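One way to make the interaction subset enforceable is to give each operation a view of the profile that contains only its named axes. The axis keys below are hypothetical transcriptions of the table. The query rule implements the table's "encrypted ⇒ derive-index-or-none" entry. The split between delegating a query and building a derived index is an assumption about how the native-query positions map.

```python
# Axis names per operation, transcribed from the interaction table (key names are hypothetical).
OPERATION_AXES: dict[str, tuple[str, ...]] = {
    "write": ("write_granularity", "merge", "history", "access_grant"),
    "transclude": ("addressing", "write_granularity", "identity_vs_placement"),
    "project": ("envelope", "liveness", "opacity"),
    "query": ("native_query", "opacity"),
    "translate": ("translation", "opacity", "structure"),
    "federate": ("substrate", "history", "merge"),
}

# Assumption: below datalog/graph on axis 7, shard-wiki builds its own derived index.
_NO_DELEGATION = {"none", "text", "build-your-own-index"}


def view(profile: dict[str, str], operation: str) -> dict[str, str]:
    """Expose only the axes the operation may consult; all other axes are invisible."""
    return {axis: profile[axis] for axis in OPERATION_AXES[operation]}


def query_strategy(profile: dict[str, str]) -> str:
    v = view(profile, "query")
    if v["opacity"].startswith("encrypted"):
        return "derive-index-or-none"   # the table's explicit rule
    if v["native_query"] in _NO_DELEGATION:
        return "derived-index"
    return "delegate"                   # datalog/graph, DB query, SPARQL
```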

6.6 Conformance — profiles are verified, never self-asserted

Capability-as-data (I-3) and the entire degradation contract (§6.5) rest on one assumption: the profile tells the truth. If an adapter declares merge=git/text but corrupts merges, or claims notify and never emits, it silently poisons every degradation decision in core — the failure is invisible because core believed the data (review B-2). So the profile is not taken on trust:

  • The contract ships a versioned conformance suite. A published battery that, given a live binding, exercises each declared verb and each declared spectrum position and checks that observed behaviour matches the claim (a write round-trips; a diff is real; notify actually fires; an "encrypted/opaque" shard genuinely refuses plaintext query; an implication-rule position, §6.5(b), holds). The suite is versioned with the contract, so an adapter proves conformance against a known contract version.
  • Passing conformance is an admissibility precondition. A binding that fails (declares a capability it does not honour) is rejected at registration, not run in production with a lying profile. Capability discovery (§6.4) therefore yields a verified profile.
  • Self-reported, then verified. Adapters still declare their profile (discovery stays cheap); conformance verifies the declaration. The two together are what make I-3 and §6.5 sound rather than aspirational — degradation logic acts on verified data.
  • Mismatch is data, not a crash. A conformance gap is reported as a precise capability-by-capability diff (what was claimed vs observed), so an adapter author fixes the profile or the code; degraded-but-honest registration (drop the unsupported claim) is allowed.

This is the same discipline a versioned store interface needs in general (the Foswiki::Store lineage that inspired the contract): a backend may only participate behind the interface if it demonstrably behaves as the interface says.
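The conformance idea can be sketched as a probe per claimed capability that yields the capability-by-capability diff described above. Everything here is hypothetical: `FakeAdapter` is an in-memory binding whose writes round-trip but which never fires notify. The probes stand in for a real versioned conformance suite.

```python
from typing import Callable, Optional


class FakeAdapter:
    """Hypothetical in-memory binding: writes round-trip, but notify never fires."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}
        self.subscribers: list[Callable[[str], None]] = []

    def write(self, page_id: str, text: str) -> None:
        self.pages[page_id] = text          # the defect under test: subscribers are never called

    def read(self, page_id: str) -> Optional[str]:
        return self.pages.get(page_id)

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)


def probe_write(a: FakeAdapter) -> bool:
    a.write("probe", "round-trip")
    return a.read("probe") == "round-trip"


def probe_notify(a: FakeAdapter) -> bool:
    seen: list[str] = []
    a.subscribe(seen.append)
    a.write("probe-notify", "x")
    return bool(seen)


PROBES: dict[str, Callable[[FakeAdapter], bool]] = {"write": probe_write, "notify": probe_notify}


def conformance_diff(adapter: FakeAdapter, claimed: dict[str, bool]) -> dict[str, tuple[bool, bool]]:
    """Mismatch as data: {capability: (claimed, observed)} for every disagreement."""
    diff = {}
    for capability, claim in claimed.items():
        observed = PROBES[capability](adapter)
        if observed != claim:
            diff[capability] = (claim, observed)
    return diff


def honest_profile(claimed: dict[str, bool], diff: dict[str, tuple[bool, bool]]) -> dict[str, bool]:
    """Degraded-but-honest registration: drop every claim that was not observed."""
    return {c: claim and c not in diff for c, claim in claimed.items()}
```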


7. Top waist — the Wiki Page Model (L2)

Backend-neutral, Markdown-first but stretchable many ways at once. The page model is the lingua franca every consumer sees; an adapter's job is to project its backend into this model (read) and accept overlays back (write), within its capabilities.

7.1 Page shapes the model must carry

  • Prose Markdown — the baseline.
  • Typed / computed records — frontmatter/%META%/XObjects/Notion DB rows; computed metadata (Trilium inherited+templated) represented as effective-vs-own with per-attribute provenance.
  • Typed-graph statements — Wikibase claim + qualifiers + references + rank (structure far-end).
  • Inline-embedded objects — Quip/Notion spreadsheets & live apps inside prose.
  • Non-Markdown assets — drawings, canvases, images: typed asset / opaque blob / pluggable content-type registry, never silent-flattened.
  • The four computational shapes (§8): one-source-many-projections, notebook (embedded computed output), program-as-page, live/temporal.

All shapes reduce to a common skeleton: (content | source, structure, provenance envelope, [derivation rule]). The page model stores the richest faithful form as canonical and treats any Markdown rendering of a non-Markdown shape as a lossy projection (I-4 + fidelity report).

7.2 Identity, placement, addressing — three distinct concepts

The earlier draft used "identity" for two different things and (worse) suggested deriving page identity from a content fingerprint — which would make editing a page change its identity and break every reference to it (review bug B-1). They are pulled apart here:

  • Page identity — a stable handle. A shard-scoped, durable key that survives edits: the backend's native page/note id where one exists (Roam/Notion/Trilium uid, a git path treated as a name, a wiki page name), wrapped in a shard scope so it survives projection and never collides across shards. Identity is assigned/minted, not computed from content. References, placement, transclusion targets, and overlays all key on identity.
  • Placement — where an identity sits. One identity → N placements (paths/shards) = a DAG; no single canonical path (I-9). Placement can change without changing identity.
  • Content equivalence — detecting sameness, never identity. A content fingerprint (or span-set overlap) identifies a version / a piece of content, used to detect that two distinct identities hold the same or derived content (the equivalence/chorus mechanism, §8.4). A fingerprint is never a page's identity: same page, edited → new fingerprint, same identity; two pages, identical content → same fingerprint, different identities.
  • Span addressing — a sub-page address within an identity: adopt native span IDs where minted (Roam :block/uid, Logseq id::, Notion/CRDT UUID); else a position address (path+range) or a content-fingerprint address for equivalence/transclusion. The Xanadu tumbler is the portable ideal the scheme aims at without requiring.
  • Provenance envelope rides on pages and spans (see §7.3 for its layered, low-cost form).

So the chain is: identity (stable) → placements (N, mutable) → equivalence (cross-identity sameness, fingerprint-based) — three concepts, three mechanisms, never conflated.
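The separation can be checked with a small model in which identity is a minted, shard-scoped key and the fingerprint is a content hash. The `Page` class is a hypothetical illustration, and SHA-256 stands in for whatever fingerprint the equivalence mechanism actually uses. Editing or re-placing a page changes its fingerprint or placements but not its identity. Two distinct pages with identical content share a fingerprint.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Page:
    shard: str
    native_id: str                        # backend's own id: a page name, uid, or git path
    content: str
    placements: set[str] = field(default_factory=set)

    @property
    def identity(self) -> tuple[str, str]:
        # Minted and shard-scoped; never computed from content.
        return (self.shard, self.native_id)

    @property
    def fingerprint(self) -> str:
        # Content equivalence only; changes whenever the content changes.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()
```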

7.3 Provenance is layered, not per-span-duplicated

A provenance envelope on every span (source shard, freshness, liveness, overlay status, authz context, divergence, lineage) would, at block granularity, mean ~10k near-identical envelopes for a 10k-block page — provenance dwarfing content (review D-2). The fix is the exact pattern the page model already uses for Trilium's computed metadata: effective-vs-own.

  • Page-level envelope holds the values that are uniform across the page (almost always: source shard, observed-at, liveness, authz context).
  • Span-level deltas record only where a span differs from its page envelope — a transcluded span from another shard, an overlaid span, a span that diverges. A span with no delta inherits the page envelope at zero storage cost.
  • Effective provenance for any span = page envelope ⊕ span delta, computed on read.

Per-span cost is therefore near-zero in the common (uniform) case and pays only for genuine heterogeneity — the same "carry only the difference" principle, applied to shard-wiki's own metadata. Provenance remains complete (I-4); it is just not redundantly materialised.
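The page ⊕ delta rule maps directly onto a layered lookup. In the sketch below, `collections.ChainMap` gives the span delta priority over the page envelope. Only spans that differ have an entry in `span_deltas`. The envelope keys and span ids are hypothetical.

```python
from collections import ChainMap

Envelope = dict[str, str]


def effective_provenance(page_envelope: Envelope,
                         span_deltas: dict[str, Envelope],
                         span_id: str) -> Envelope:
    """page envelope (+) span delta: keys in the delta win. A span without a delta
    inherits the page envelope unchanged and needs no stored entry."""
    return dict(ChainMap(span_deltas.get(span_id, {}), page_envelope))


page_envelope = {"source_shard": "git-a", "liveness": "static", "authz_context": "tenant-1"}
span_deltas = {"b42": {"source_shard": "notion", "overlay_status": "draft"}}  # one transcluded, overlaid span
```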


8. Coordination, federation & projection

8.1 Coordination journal (L3) — Git as the spine

Every information space has a Git-backed coordination journal (I-6). It records cross-shard operations (fork, import, reconcile, overlay-apply, space-branch) and is the history floor (I-10). For git-IS-store shards the shard's own git log is this journal; for non-git shards the journal supplements (begins-now / mirrors-forward / snapshots-replica) or imports (backfill open file history). History portability is a spectrum, handled per profile (axis 5).

The journal is an append-only decision log; current coordination state is a derived fold (review B-3). The first draft said coordination-canonical state "lives in the journal" without saying how Git — excellent for history, poor for mutable structured state — represents an alias table or an equivalence graph. Resolution: event sourcing. The journal stores decisions as events (overlay-created, binding-made, alias-set, merge-decided, page-forked), append-only and git-addressable (so history/patch/review/backup over coordination state come for free — I-6 is strengthened, not bypassed). The queryable current state (the effective alias table, the live equivalence set) is a derived fold of the log — tier-3 disposable, indexed like any other derived structure (§8.7), rebuilt by replaying the log. So "all equivalences touching X" is an index lookup, not an O(scan) of Git. This is the clean form of the §1 three-state model: the log is canonical; its folded current state is derived.
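Folding the log can be sketched as a replay that produces the current alias table and an "equivalences touching X" index. The `Event` shape and its `args` layout are hypothetical. Only alias-set and binding-made are handled, and the other event kinds are skipped in this sketch. Replaying the same log always yields the same state, which is what makes the folded form disposable.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "alias-set", "binding-made", "overlay-created"
    args: tuple   # hypothetical payload layout per kind


def fold(log: list[Event]) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Replay the append-only log into queryable current state."""
    aliases: dict[str, str] = {}
    touching: dict[str, set[str]] = defaultdict(set)
    for event in log:
        if event.kind == "alias-set":
            alias, target = event.args
            aliases[alias] = target        # a later decision supersedes an earlier one
        elif event.kind == "binding-made":
            a, b = event.args
            touching[a].add(b)             # index both directions for O(1) lookup
            touching[b].add(a)
        # overlay-created, merge-decided, page-forked: not folded in this sketch
    return aliases, dict(touching)
```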

Concurrency: who may append (review B-1). A multi-tenant L4 deployment runs several orchestrator instances, so "the journal is local Git, single writer" is not given. The model:

  • One append authority per information space. Appends to a space's log are serialized through a single logical writer (a per-space lease/leader; instances without the lease forward their append intents to it). This makes the log a totally-ordered event sequence per space — the ordering authority §8.6 relies on — without a distributed transaction. Spaces are independent, so this scales horizontally across spaces (the unit of partition is the space / root entity, matching the tenant partition, I-13); it is a per-space serialization point, not a global one.
  • Git is the durable, addressable form; appends are commits (or fast objects batched into commits) under the lease — no concurrent-writer merge races because there is one writer at a time per space.
  • Read-your-writes holds within a space because every reader resolves current state from the same ordered log (or its fold); across spaces there is no shared state to be inconsistent.
  • HA / failover: the lease is time-bounded and re-grantable; a failed append-authority is replaced and resumes from the log's head (the log is the recovery point). A partition that splits the authority degrades that space to read-only until a single writer is re-elected — it never forks the log (availability yields to log integrity; an explicit, stated trade).
  • Open residual (→ §12, O-3-adjacent): whether very high append rates need per-space log sharding (sub-logs merged by a deterministic order) is an implementation spike, not an architectural change.
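The single-writer rule can be illustrated in one process with a time-bounded lease per space. `SpaceLease` and `SpaceLog` are hypothetical. Times are passed explicitly to keep the sketch deterministic. A real multi-instance deployment would also need an external lease service and fencing, which this sketch does not model.

```python
from typing import Optional


class SpaceLease:
    """Hypothetical per-space lease: one append authority at a time, time-bounded, re-grantable."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def acquire(self, instance: str, now: float) -> bool:
        expired = now >= self.expires_at
        if self.holder is None or expired or self.holder == instance:
            self.holder, self.expires_at = instance, now + self.ttl
            return True
        return False

    def held_by(self, instance: str, now: float) -> bool:
        return self.holder == instance and now < self.expires_at


class SpaceLog:
    """Append-only event log for one space; only the current lease holder may append."""

    def __init__(self, lease: SpaceLease) -> None:
        self.lease = lease
        self.events: list[str] = []

    def append(self, instance: str, event: str, now: float) -> int:
        if not self.lease.held_by(instance, now):
            raise PermissionError(f"{instance} is not the append authority; forward the intent")
        self.events.append(event)
        return len(self.events) - 1       # position in the space's total order
```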

History must stay recoverable and bounded (review C-3). "Every write is a commit" + open L0 means an unbounded, bot-/vandalism-amplified journal that eventually degrades Git itself. Recoverability (I-10) is non-negotiable, so the answer is compaction, not deletion:

  • Routine git maintenance — background gc/repack, commit-graph, and (for very large spaces) partial-clone / sparse strategies; operational, no semantic change.
  • Squash-compaction of low-value churn (policy, §10) — long runs of rapid same-author edits or revert-pairs can be folded into checkpoint commits while preserving the recoverable endpoints; what is squashed is configurable and always leaves the content recoverable (it compacts the path, not the reachable states).
  • Per-shard history offload — a git-IS-store shard keeps its own history in its own repo; the coordination journal references it rather than duplicating it (the journal records coordination events, not a second copy of every shard commit).
  • Anti-abuse hooks (policy) — rate-limiting / quarantine for anonymous L0 writers feed the authz/policy layer; they throttle abuse, never legitimate history. Recoverability is the floor; bounding is how it survives at scale.
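Squash-compaction can be sketched as collapsing runs of consecutive same-author commits down to the first and last snapshot of each run, so the run's endpoints stay recoverable. The `(author, snapshot)` commit shape and the `min_run` threshold are hypothetical policy knobs. Real policy would also look at edit timing and revert pairs, which this sketch ignores.

```python
from itertools import groupby


def squash_runs(commits: list[tuple[str, str]], min_run: int = 3) -> list[tuple[str, str]]:
    """commits: (author, snapshot) in order. Runs of >= min_run consecutive
    same-author commits collapse to their first and last snapshot; shorter runs are kept."""
    out: list[tuple[str, str]] = []
    for _author, group in groupby(commits, key=lambda c: c[0]):
        run = list(group)
        if len(run) >= min_run:
            out.extend([run[0], run[-1]])   # keep the recoverable endpoints of the run
        else:
            out.extend(run)
    return out
```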

8.2 Overlay / patch engine (L3)

The default write path for anything below write-through capability (I-5): an edit becomes a draft → patch/commit → MR, applied destructively only on explicit intent and only where the profile + policy both permit. This is what lets a read-only or rate-limited or lossy backend still be edited safely.
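The overlay lifecycle reads naturally as a small state machine: promotion moves an edit from draft to patch to MR, and destructive apply succeeds only when explicit intent, the capability profile, and policy all agree. The `Overlay` class and its method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    DRAFT = 1
    PATCH = 2      # patch/commit
    MR = 3
    APPLIED = 4


_PROMOTE = {State.DRAFT: State.PATCH, State.PATCH: State.MR}


class Overlay:
    """Hypothetical overlay lifecycle: draft -> patch/commit -> MR -> (apply)."""

    def __init__(self, page_id: str, body: str) -> None:
        self.page_id = page_id
        self.body = body
        self.state = State.DRAFT

    def promote(self) -> State:
        if self.state not in _PROMOTE:
            raise ValueError(f"cannot promote from {self.state.name}")
        self.state = _PROMOTE[self.state]
        return self.state

    def apply(self, explicit_intent: bool, profile_permits: bool, policy_permits: bool) -> bool:
        """Destructive apply only when all three gates pass; otherwise the overlay stays
        unapplied and remains the local truth in the journal."""
        if self.state is State.APPLIED:
            return True
        if explicit_intent and profile_permits and policy_permits:
            self.state = State.APPLIED
            return True
        return False
```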

8.3 Federation is plural & composable (L3) — the model taxonomy

Federation is not one mechanism. shard-wiki selects a federation model per space and composes per shard (mechanism over policy, I-7):

| Model | Anchor | Coordination shape |
|---|---|---|
| Fork + journal (default home case) | Federated Wiki | copy-with-provenance + per-page action journal (story = replay) |
| VCS-replication + ping | ikiwiki | git clone/pull/push + change-ping |
| Query-time graph-join | Wikibase SPARQL SERVICE | join remote graphs at query time, no copy |
| Feed aggregation | RSS/Atom | inbound feed → pages |
| Activity streams | ActivityPub | Create/Update events, notify or content-bearing |
| Engine-mirror | Wiki.js DB↔Git | engine syncs its own store to a git mirror |

8.4 Union & projection (L4) — the derived cache

This whole layer is derived-disposable: recomputable from canonical state — sharded content + the coordination-canonical inputs in the journal (I-2). Crucially, the automatic equivalence results are derived, but the human/curatorial inputs they consume — alias tables and curator equivalence bindings — are coordination-canonical (they live in the journal), not derived; recompute reads them, never regenerates them. It comprises:

  • Identity resolution & equivalence — detect "same topic / derived content" path-independently from derived signals (content fingerprint, span-set overlap) plus the coordination-canonical inputs (alias table, curator binding); present as chorus-of-voices or designated-canonical (a policy preset). (Scaling: §8.7.)
  • Union graph — the navigable join of pages, links, and dimensions (namespace, genealogy, version, shard, equivalence). A derived lens over canonical files+journal, never a new store (the ZigZag boundary).
  • Transclusion — one reference-not-copy primitive unifying Xanadu transclusion, ZigZag clone, Roam/Obsidian/Logseq embed, Notion synced block, Trilium note-cloning, and literate named-chunk assembly, over the addressable union.
  • Projection — trivial by default, extensible for the tail. The 95% case (Markdown in a shard) must cost nothing conceptually, so:
    • Default = plain lazy replication-projection — a freshness-stamped cache of remote content (§8.8). This is the projection for ordinary pages; it needs no taxonomy, no liveness reasoning, no registry. Most shards never touch anything below.
    • Extension point — derivation-projection — invoked only for content that is a source needing transform/compile/weave/evaluate (computational/typed content, §8.5). It adds the liveness axis (static → captured → live-over-files → view-time → irreducibly-live) and facets (materialization timing, multiplicity, continuity); the irreducibly-live far end has no faithful static form (source + a marked recording). A binding that never serves such content never instantiates any of this.
    • Both kinds stamp freshness + provenance; only derivation carries the liveness machinery.
  • Moldable view registry — also an extension point, not a tax on every page. Where a content type offers multiple co-equal views (typed/computed/dimensional content), they are registered as an open, type-keyed set, none canonical-by-fact (display-canonical is policy; GT prior art, answers the "pluggable content-type registry" question). An ordinary Markdown page has exactly one view and never consults the registry — the registry is queried only when a type declares >1 view.
  • Derived query index — delegate to a shard's native query engine where present (Roam/Logseq Datalog, Notion DB query, XWiki XWQL, Wikibase SPARQL); else build a derived index over the projection (the Logseq DataScript-over-files pattern). The index is disposable (I-2).
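The moldable view registry can be sketched as an open, type-keyed map of co-equal views. A type with no registered views (ordinary Markdown) short-circuits to its single identity view, so it never pays for the machinery. Which view is display-canonical arrives as a policy input rather than being fixed in code. `ViewRegistry`, `render`, and the `preferred` parameter are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

View = Callable[[str], str]


class ViewRegistry:
    """Hypothetical open, type-keyed set of co-equal views; none is canonical by fact."""

    def __init__(self) -> None:
        self._views: Dict[str, Dict[str, View]] = {}

    def register(self, content_type: str, name: str, view: View) -> None:
        self._views.setdefault(content_type, {})[name] = view

    def views_for(self, content_type: str) -> List[str]:
        return sorted(self._views.get(content_type, {}))

    def render(self, content_type: str, source: str, preferred: Optional[str] = None) -> str:
        views = self._views.get(content_type)
        if not views:
            # Common case (e.g. a Markdown page): one view, no registry machinery.
            return source
        # Display-canonical is a policy input (`preferred`); without one,
        # fall back to a deterministic alphabetical choice.
        name = preferred if preferred in views else sorted(views)[0]
        return views[name](source)
```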

8.5 Computational / executable content — the scope decision

In scope as a page-model + projection concern; out of scope as an execution platform. shard-wiki recognises computational types, attaches the canonical source, and presents derived forms as provenance- and liveness-marked projections. Driving a derivation (tangle/weave, re-execute a notebook, render a sketch, evaluate a pattern) is a gated capability, off by default, with a trust/sandbox concern, degrading to a captured snapshot. One snapshot-provenance record (run id, source rev, timestamp, environment "unguaranteed") serves notebooks, renders, and recordings alike. No INTENT amendment is required — this lives inside the existing page model (L2) and projection model (L4).
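The single snapshot-provenance record and the execution gate can be sketched as follows. The record carries run id, source revision, timestamp, and an environment that defaults to "unguaranteed". Projection returns the captured snapshot unless execution is explicitly allowed. The type names and the `execution_allowed` flag are hypothetical.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Optional


@dataclass(frozen=True)
class SnapshotProvenance:
    """One record shape for notebook outputs, renders, and recordings alike."""
    run_id: str
    source_rev: str
    timestamp: str                     # ISO-8601 time the derivation ran
    environment: str = "unguaranteed"  # the captured environment is never promised reproducible


@dataclass(frozen=True)
class Snapshot:
    output: str
    provenance: SnapshotProvenance


def project(captured: Snapshot,
            execute: Optional[Callable[[], Snapshot]] = None,
            execution_allowed: bool = False) -> Snapshot:
    # Execution is a gated capability, off by default: degrade to the captured snapshot.
    if execution_allowed and execute is not None:
        return execute()
    return captured
```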

8.6 Consistency, concurrency & conflict model

INTENT makes real-time cross-shard consistency a non-goal — but "no strong consistency" is not the same as "no defined consistency." This is the guarantee shard-wiki does offer, and the mechanism (not policy) that makes concurrent editing safe (review bug B-2).

The consistency guarantee — causal, anchored on the journal:

  • Read-your-writes for coordination-canonical state. Once an overlay/binding/merge is appended to the space's decision log, every reader of that space sees it — because the log is a single totally-ordered sequence per space (one append authority, §8.1), and all readers resolve current state from that one order. The guarantee holds across orchestrator instances, not just within one process; it is cheap because ordering is per-space, never global.
  • Causal consistency across the derived tier. The union/index/projections reflect a causal cut of (sharded inputs seen so far, journal). Effects never appear before their causes; a projection that has seen journal commit C has seen everything C depends on.
  • Eventual convergence for sharded-canonical inputs. Remote shard content is pulled asynchronously (lazily or by notify/poll, §8.7); the union converges to each shard's latest as observed, bounded by the shard's operational envelope. Freshness is always shown (provenance envelope), never faked — a stale projection is labelled stale, not wrong.

So: strong + read-your-writes for what shard-wiki owns (the journal); causal for what it derives; eventual + freshness-labelled for what shards own. No global clock, no distributed transaction, no two-phase commit across shards — none is needed, because shard-wiki coordinates rather than controls.

Conflict detection & representation is core mechanism; only resolution is policy (I-7). The split the earlier draft elided:

  • Detection (core). Divergence is detected structurally: two identities resolve as equivalent (§8.4) but their content fingerprints differ, or an overlay's base revision no longer matches the shard's current revision. Detection is always on; it is never optional.
  • Representation (core). A detected conflict is first-class data in the union, not an error: equivalent-but-divergent pages are presented as a coexisting set (the chorus/keep-both representation), each fully attributed, with the divergence recorded in the provenance envelope (union without erasure — a conflict is information, not a failure).
  • Resolution (policy). Which version wins, or whether they stay coexisting, is a configurable preset (§10): chorus / designated-canonical / git-merge / vote-to-merge / overlay-only. Core never hard-codes one.

Overlay-apply under source drift (the concurrent-write case). An overlay carries the base revision of the shard content it was authored against. On apply, core compares base to the shard's current revision:

  • unchanged → apply (fast-forward), commit to journal, propagate if the profile permits;
  • changed, non-overlapping → three-way merge where the merge capability allows (axis 6), else keep-both;
  • changed, overlapping → refuse + re-present as a conflict (above); never silently clobber (I-5, no silent remote mutation). The unapplied overlay remains coordination-canonical and valid against its base.
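The three-way decision above can be sketched as follows. To stay self-contained, the sketch compares page text instead of revision ids and uses line-level overlap computed with `difflib.SequenceMatcher`. Both are simplifying assumptions: a real adapter would compare native revisions. Also unlike git, adjacent but non-overlapping edits are not treated as conflicting here. `Outcome`, `changed_lines`, and `apply_overlay` are hypothetical names.

```python
import difflib
from enum import Enum
from typing import Set


class Outcome(Enum):
    FAST_FORWARD = "apply"
    THREE_WAY_MERGE = "merge"
    KEEP_BOTH = "keep-both"
    CONFLICT = "conflict"


def changed_lines(base: str, new: str) -> Set[int]:
    """Indices of base lines touched by the edit base -> new (an insert counts at its position)."""
    sm = difflib.SequenceMatcher(a=base.splitlines(), b=new.splitlines())
    touched: Set[int] = set()
    for tag, i1, i2, _j1, _j2 in sm.get_opcodes():
        if tag != "equal":
            touched.update(range(i1, max(i2, i1 + 1)))
    return touched


def apply_overlay(base: str, overlay: str, current: str, can_merge: bool) -> Outcome:
    if current == base:
        return Outcome.FAST_FORWARD  # source unchanged since the overlay's base
    if changed_lines(base, overlay) & changed_lines(base, current):
        return Outcome.CONFLICT  # overlapping: refuse and re-present, never clobber
    # Non-overlapping drift: merge only if the shard's merge capability allows it.
    return Outcome.THREE_WAY_MERGE if can_merge else Outcome.KEEP_BOTH
```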

Ordering. The journal commit is the ordering authority for coordination-canonical effects; a shard-native write is only acknowledged in the journal after the adapter confirms it, so a crash between journal-intent and shard-write is recoverable (the intent is replayable, the write is idempotent-keyed on identity+base-rev). Cross-shard operations are ordered by their journal commits, giving the causal cut above.
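The intent/ack ordering and idempotent replay can be sketched with an in-memory stand-in. The journal records the intent, the adapter writes, and only then is the ack recorded. After a crash between write and ack, recovery replays the intent, and the shard recognises the (identity, base-rev) key instead of writing twice. `FakeShard`, `Journal`, `write_through`, and `recover` are hypothetical. A real shard may not store such keys itself, in which case the adapter would have to keep them.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (page identity, base revision)


@dataclass
class FakeShard:
    """Stand-in shard whose write is idempotent on (identity, base_rev)."""
    pages: Dict[str, str] = field(default_factory=dict)
    applied: Dict[Key, str] = field(default_factory=dict)
    writes: int = 0

    def write(self, identity: str, base_rev: str, body: str) -> str:
        key = (identity, base_rev)
        if key in self.applied:
            return self.applied[key]  # replayed intent: same result, no second write
        self.writes += 1
        self.pages[identity] = body
        rev = hashlib.sha256(body.encode()).hexdigest()[:12]
        self.applied[key] = rev
        return rev


@dataclass
class Journal:
    intents: List[Tuple[str, str, str]] = field(default_factory=list)
    acked: Dict[Key, str] = field(default_factory=dict)  # key -> confirmed shard revision


def write_through(j: Journal, shard: FakeShard, identity: str, base_rev: str,
                  body: str, crash_before_ack: bool = False) -> Optional[str]:
    j.intents.append((identity, base_rev, body))  # 1. journal the intent
    rev = shard.write(identity, base_rev, body)    # 2. the adapter performs the write
    if crash_before_ack:
        return None                                # simulated crash: no ack recorded
    j.acked[(identity, base_rev)] = rev            # 3. ack only after the adapter confirms
    return rev


def recover(j: Journal, shard: FakeShard) -> None:
    """Replay every journaled intent that was never acknowledged."""
    for identity, base_rev, body in j.intents:
        if (identity, base_rev) not in j.acked:
            j.acked[(identity, base_rev)] = shard.write(identity, base_rev, body)
```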

Residual open items (tracked in Known scaling risks & open problems, §12, not pretended solved): the exact convergence bound for high-write CRDT shards under partition, and whether per-equivalence-set divergence needs a vector clock vs. a simple base-rev comparison, are deferred to implementation spikes.

8.7 Scaling the union — incremental-first, rebuild as fallback

The derived tier is recomputable (I-2) but recompute must never be the operational mechanism. A from-scratch rebuild reads every page of every shard — including rate-limited, paginated external APIs (Notion) and irreducibly-live sources — which can take hours to days and directly fights the operational-envelope axis (review C-2). So:

Incremental, change-driven maintenance is the primary mechanism. Each shard's notify capability (or a poll/ETag fallback where it has none, §8.8) emits change events; an event drives a delta update to exactly the affected union nodes, equivalence candidates, indexes, and projections. The derived tier is a continuously-maintained materialised view, not a periodically-recomputed one. Steady-state cost is O(changes), not O(corpus).

Full rebuild is a rare, bounded fallback — for cold start, schema/algorithm change, or suspected corruption — and it is explicitly not required to be cheap. It respects each shard's envelope (it may be slow, throttled, or resumable for a rate-limited shard) and runs concurrently with serving the existing derived tier; it swaps in atomically on completion. I-2 guarantees rebuild is possible and correct, not instant.

Equivalence detection is indexed, not pairwise (review C-1). Naive fingerprint/span-set comparison across all pages of all shards is O(N²) and is forbidden. Instead:

  1. Blocking / candidate generation — cheap keys bucket pages that could be equivalent: normalised title, normalised path tail, explicit alias-table entries (coordination-canonical), and MinHash/LSH bands over content shingles for near-duplicate and derived-content detection. Only within-bucket pairs are considered — turning O(N²) into ≈O(N) candidates, provided bucket sizes stay small.
  2. Verification — candidate pairs are confirmed by full fingerprint / span-set overlap and any curator binding. Confirmed equivalences become union edges.
  3. Incremental maintenance — the delta is not additive (review B-4). A changed page may leave buckets as well as enter them, and leaving a bucket can break an existing equivalence edge another page relied on. So a change is processed as: (i) recompute the page's bucket membership; (ii) for buckets it left, re-verify the pairs that depended on the shared bucket and retract edges no longer supported; (iii) for buckets it entered, verify the new candidate pairs and add edges; (iv) propagate to the equivalence neighbours of any retracted/added edge (equivalence is transitive-ish via chorus sets, so a retraction can split a set). Maintenance is per-change and bounded by the page's neighbourhood, but it covers retraction and propagation — not just additions.
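MinHash/LSH blocking with a retraction-aware incremental update can be sketched as follows. Signatures come from seeded BLAKE2b hashes over word 3-shingles, and each band of the signature is a bucket key. An update returns both the peers the page gained and the peers it lost: lost peers no longer share any bucket, so their equivalence edges must be re-verified and possibly retracted (step ii). The verification and propagation steps are not shown. The parameters (`NUM_PERM`, `BANDS`, shingle size) and all names are hypothetical tuning and illustration choices.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

NUM_PERM, BANDS = 32, 8  # hypothetical tuning: 8 bands x 4 rows
ROWS = NUM_PERM // BANDS
BandKey = Tuple[int, Tuple[int, ...]]


def shingles(text: str, k: int = 3) -> Set[str]:
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed: int, s: str) -> int:
    digest = hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def band_keys(text: str) -> Set[BandKey]:
    sh = shingles(text)
    sig = [min(_hash(seed, s) for s in sh) for seed in range(NUM_PERM)]  # MinHash signature
    return {(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)}


class BlockingIndex:
    """Buckets pages by LSH band; only pages sharing a bucket become candidates."""

    def __init__(self) -> None:
        self.buckets: Dict[BandKey, Set[str]] = defaultdict(set)
        self.keys: Dict[str, Set[BandKey]] = {}

    def _peers(self, page: str, keys: Iterable[BandKey]) -> Set[str]:
        return {p for k in keys for p in self.buckets.get(k, ()) if p != page}

    def upsert(self, page: str, text: str) -> Tuple[Set[str], Set[str]]:
        """Update one page; return (lost, gained) candidate peers.

        Lost peers share no bucket any more: re-verify and possibly retract
        their edges. Gained peers are new candidate pairs to verify.
        The work is bounded by the page's own buckets.
        """
        old = self.keys.get(page, set())
        new = band_keys(text)
        before = self._peers(page, old)
        for key in old - new:
            self.buckets[key].discard(page)
        for key in new - old:
            self.buckets[key].add(page)
        self.keys[page] = new
        after = self._peers(page, new)
        return before - after, after - before
```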

The index is itself derived (disposable, recomputable) and per-tenant-partitioned (§9). Its parameters (LSH band/row counts, shingle size, precision/recall) are tunable; the accepted false-negative rate of blocking is a known, tracked limitation (§12) — blocking trades a small miss rate for tractability, and curator bindings are the escape hatch for misses.

Verifying I-2 (derived = f(canonical)) — eventually, not on faith (review B-4). Incremental maintenance can drift from a from-scratch fold over time (a missed retraction, a dropped event, a bug). I-2 is therefore an eventually-verified property, not a free one, and the architecture names the mechanism that verifies it:

  • A digest of the derived tier. Each partition's derived tier carries a rolling content digest (a Merkle-style hash over union nodes/edges/index entries) maintained alongside the incremental updates.
  • A background consistency-checker periodically recomputes the digest over a sampled (or, on a slow cadence, full) fold of canonical state and compares. A mismatch localises the drift to a partition/region and triggers a scoped recompute of just that region — cheap relative to a global rebuild, and self-healing.
  • So I-2 holds eventually and verifiably: the incremental engine is the fast path, the checker is the guarantee, and divergence is detected and repaired rather than silently accumulating. The exact sampling rate / digest granularity is an implementation spike (§12).
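The digest and checker can be sketched as follows. This sketch swaps the Merkle-style tree for a simpler per-region XOR set-hash, which keeps each add or retract at O(1) alongside incremental updates. The checker compares sampled regions against a from-scratch fold and reports which regions need a scoped recompute. The region names, the `RollingDigest` type, and the entry encoding are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Set


def _leaf(entry: str) -> int:
    return int.from_bytes(hashlib.sha256(entry.encode()).digest(), "big")


def digest_of(entries: Iterable[str]) -> int:
    d = 0
    for e in entries:
        d ^= _leaf(e)  # XOR is order-independent and is its own inverse
    return d


class RollingDigest:
    """Per-region digest maintained alongside incremental updates."""

    def __init__(self) -> None:
        self.regions: Dict[str, int] = {}

    def add(self, region: str, entry: str) -> None:
        self.regions[region] = self.regions.get(region, 0) ^ _leaf(entry)

    def remove(self, region: str, entry: str) -> None:
        self.add(region, entry)  # XOR-ing the same leaf again retracts it


def drifted(maintained: RollingDigest, folded: Dict[str, Set[str]],
            sample: Iterable[str]) -> Set[str]:
    """Regions whose maintained digest disagrees with a fresh fold of canonical state."""
    return {r for r in sample
            if maintained.regions.get(r, 0) != digest_of(folded.get(r, set()))}
```

In the scenario below, a retraction of `p2~p3` was missed. The checker localises the drift to the `t1/eq` region only.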

8.8 Cache freshness & invalidation

Replication-projection caches remote shard content; cache invalidation is the actual hard part and was missing from the first draft (review C-2). The protocol is per-binding, driven by the capability profile, with one rule: freshness is always represented, never assumed — every cached page's provenance envelope carries (observed-at, source-rev-if-known, staleness-state), so a consumer can always tell live from stale.

Three invalidation modes, chosen by capability, not hard-coded:

| Mode | When | Mechanism |
|---|---|---|
| Event-driven (push) | shard has notify | a change event invalidates exactly the affected entries and enqueues a delta refresh (§8.7); the preferred mode |
| Validator poll | shard exposes ETag / Last-Modified / rev | conditional fetch (If-None-Match); cheap "still fresh?" checks without transferring bodies |
| TTL | shard offers neither | time-bounded staleness; the floor mode (Oddmuse-class shards) |

Most real bindings are hybrid: event-driven for invalidation + a long TTL as a safety net for missed events + validator polls on read when an entry is past a soft age.

Operational-envelope coupling. The mode is constrained by axis-10: a rate-limited shard (Notion) must favour event-driven + long TTL and must not poll per-read — the freshness policy is capability-gated like everything else. A local file shard can watch the filesystem (near-instant invalidation, effectively event-driven for free).
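Capability-gated mode selection can be sketched as a pure function from a slice of the capability profile to a set of invalidation modes. TTL is always present, both as the floor mode and as the safety net for missed events. Notify or a filesystem watch enables event-driven invalidation. Validator polls are added only when the shard is not rate-limited, which is a conservative reading of "must not poll per-read". `FreshnessCaps` and the mode strings are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FreshnessCaps:
    """Hypothetical slice of a capability profile relevant to invalidation."""
    notify: bool            # shard can push change events
    validators: bool        # ETag / Last-Modified / rev available
    rate_limited: bool      # operational envelope (axis 10) forbids chatty reads
    fs_watch: bool = False  # local file shard: filesystem events come for free


def invalidation_plan(c: FreshnessCaps) -> FrozenSet[str]:
    plan = {"ttl"}  # floor mode, and safety net for missed events
    if c.notify or c.fs_watch:
        plan.add("event-driven")
    if c.validators and not c.rate_limited:
        plan.add("validator-poll")  # conditional fetch once an entry passes a soft age
    return frozenset(plan)
```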

Thundering-herd / coalescing. Concurrent reads of the same stale entry trigger a single in-flight refresh (single-flight); other readers await it or are served the stale-but-labelled value per policy. Bulk invalidations (a shard-wide event) are batched and rate-shaped to the shard's envelope rather than fired as N concurrent fetches.
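Single-flight coalescing can be sketched with `asyncio`. The first reader of a stale key starts the fetch, and concurrent readers await the same task. `asyncio.shield` keeps one cancelled reader from cancelling the shared fetch for everyone else. The entry is dropped once the fetch completes, so a later staleness triggers a fresh fetch. `SingleFlight` and `refresh` are hypothetical names. Serving stale-but-labelled content instead of awaiting is a policy branch this sketch omits.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Coalesce concurrent refreshes of the same cache key into one in-flight fetch."""

    def __init__(self) -> None:
        self._inflight: Dict[str, "asyncio.Future[str]"] = {}

    async def refresh(self, key: str, fetch: Callable[[], Awaitable[str]]) -> str:
        fut = self._inflight.get(key)
        if fut is None:
            fut = asyncio.ensure_future(fetch())
            self._inflight[key] = fut
            # Forget the entry once done so the next staleness triggers a new fetch.
            fut.add_done_callback(lambda _f, k=key: self._inflight.pop(k, None))
        # Shield: one cancelled reader must not cancel the shared fetch for the others.
        return await asyncio.shield(fut)


async def _demo() -> None:
    flight, calls = SingleFlight(), []

    async def fetch() -> str:
        calls.append(1)
        await asyncio.sleep(0.01)
        return "fresh body"

    results = await asyncio.gather(*(flight.refresh("page:home", fetch) for _ in range(5)))
    print(results.count("fresh body"), len(calls))  # prints "5 1": five readers, one fetch


if __name__ == "__main__":
    asyncio.run(_demo())
```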

Staleness is a policy knob, not a correctness bug. Whether a reader gets stale-but-fast or blocks-for-fresh is a §10 preset (per space or per request); either way the envelope tells the truth about what was served. This is union-without-erasure applied to time.


9. Cross-cut — Authorization (L5)

Fully specified in ArchitectureBlueprint.md (the access & history sub-blueprint); summarised here for completeness:

  • One core, a ladder of modes L0 (open/c2, zero deps) → L1 (attributed) → L2 (authenticated) → L3 (role/group) → L4 (multi-tenant enterprise). Climbing is configuration, not re-architecture.
  • PEP wraps every adapter op; PDP decides (principal, action, target) over actions read/write/patch/merge/administer, layered on the adapter's capability profile (a shard that can't write can't be written regardless of policy — L5 consults the L1 rail).
  • Authentication delegated to a pluggable IdentityProvider (null provider = L0 default); real identity from user-engine over net-kingdom IAM.
  • Fail open only at L0, fail closed at L2+. Authorization is pure/offline once a Principal is resolved. Provenance carries authz context so the union never leaks unreadable content (the L5↔provenance-rail interaction).
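The PDP decision order can be sketched as follows. The capability profile is checked first, so no policy can grant a verb the shard cannot perform. Open mode L0 then fails open, and from L2 up an unresolved principal is denied. Only modes L0, L2, and L3 are modelled: L1 semantics are still an open question (§14), and L4 tenant scoping is omitted. `Principal`, `Target`, `decide`, and the grants table are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Grants = Dict[Tuple[str, str], FrozenSet[str]]  # (shard, action) -> roles allowed


@dataclass(frozen=True)
class Principal:
    id: Optional[str]  # None = anonymous (the null provider at L0)
    roles: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Target:
    shard: str
    capable: FrozenSet[str]  # actions the verified capability profile supports


def decide(mode: int, who: Principal, action: str, target: Target, grants: Grants) -> bool:
    """Hypothetical PDP for modes L0, L2 and L3 only."""
    if mode not in (0, 2, 3):
        raise ValueError(f"mode L{mode} is not modelled in this sketch")
    if action not in target.capable:
        return False  # capability profile first: policy cannot grant what the shard can't do
    if mode == 0:
        return True  # open wiki: fail open
    if who.id is None:
        return False  # L2+: unresolved principal, fail closed
    if mode == 2:
        return True  # any authenticated principal
    return bool(who.roles & grants.get((target.shard, action), frozenset()))  # L3 role/group
```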

9.1 Tenant isolation of the derived tier (review B-3)

Read-time authz filtering is necessary but not sufficient when the derived tier is persisted: a single cross-tenant union/index cache guarded only by a filter on read is a standing leak surface (one filtering bug exposes another tenant's content). So isolation is structural, not just procedural:

  • The derived tier is partitioned per tenant / root entity. A tenant maps to a root entity (§4); its union graph, equivalence index, projections, and caches live in a separate partition keyed by that tenant. There is no shared cross-tenant derived store to leak from.
  • No cross-tenant equivalence by default. Blocking/LSH (§8.7) operates within a partition; cross-tenant equivalence is an explicit, authorised, opt-in federation between roots, never an accident of a shared index.
  • Read-time filtering remains, as defence-in-depth — the provenance envelope's authz context is still checked, so even within a partition a principal sees only what it may; partitioning removes the blast radius, filtering removes the fine-grained leak.
  • This reconciles I-2 with L5: recomputability (a persisted-but-disposable derived tier) is preserved per partition — each tenant's derived tier is independently rebuildable from that tenant's canonical state — so isolation costs nothing in the rebuild model. At L0/L1 (single tenant) there is one partition and the machinery is invisible.

Isolation invariant (add to §2 as I-13): derived state is partitioned by tenant; no derived artifact spans tenants except through an explicit, authorised cross-root federation.
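Structural isolation can be sketched as a store with one partition per tenant and no shared keyspace. An ordinary lookup never leaves its partition. A cross-tenant read succeeds only through an explicitly recorded federation. Rebuild replaces a single partition from that tenant's fold alone. `PartitionedDerivedStore` and its methods are hypothetical, and the one-way (reader, owner) federation grant is an illustrative assumption.

```python
from typing import Callable, Dict, Optional, Set, Tuple


class PartitionedDerivedStore:
    """Hypothetical derived-tier store: one partition per tenant/root, no shared space."""

    def __init__(self) -> None:
        self._parts: Dict[str, Dict[str, str]] = {}
        self._federations: Set[Tuple[str, str]] = set()  # (reader, owner), explicit opt-in

    def put(self, tenant: str, key: str, value: str) -> None:
        self._parts.setdefault(tenant, {})[key] = value

    def get(self, tenant: str, key: str) -> Optional[str]:
        return self._parts.get(tenant, {}).get(key)  # lookups never leave one partition

    def federate(self, reader: str, owner: str) -> None:
        self._federations.add((reader, owner))  # an authorised cross-root federation

    def get_cross(self, reader: str, owner: str, key: str) -> Optional[str]:
        if reader != owner and (reader, owner) not in self._federations:
            raise PermissionError(f"{reader} has no federation with {owner}")
        return self.get(owner, key)

    def rebuild(self, tenant: str, fold: Callable[[], Dict[str, str]]) -> None:
        # Each partition is rebuilt from that tenant's canonical state only.
        self._parts[tenant] = dict(fold())
```

Read-time authz filtering would still sit in front of `get` as defence-in-depth. The partitioning only removes the blast radius.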


10. The policy surface (mechanism over policy, made concrete)

I-7 only means something if the policy knobs are enumerated and kept out of core algorithms. The configurable presets are:

  • Canonical-source policy — chorus / designated-canonical / git-merge / overlay-only / vote-to-merge (per space or per equivalence set).
  • Federation model — the §8.3 taxonomy, per space, composable per shard.
  • Shard mode — read-only / write-through / mirrored / projected / cached / canonical (constrained by the capability profile).
  • Reconciliation cadence & conflict exposure — push/poll/manual; show-conflicts vs auto-merge-when-supported.
  • Conflict-resolution preset — chorus / designated-canonical / git-merge / vote-to-merge / overlay-only (the resolution policy over §8.6's core detection; per space or equivalence set).
  • Freshness / invalidation mode — event-driven / validator-poll / TTL / hybrid, and stale-but-fast vs block-for-fresh on read (§8.8; constrained by the operational envelope).
  • History compaction — squash policy for low-value churn, gc/repack cadence, per-shard offload (§8.1), always preserving recoverable endpoints.
  • Tenant partition mapping — tenant ↔ root-entity, and any explicit cross-root federation (§9.1, I-13).
  • Execution policy — derive/execute off (default) / sandboxed / per-shard-allowed.
  • Authorization mode — the L0–L4 ladder.
  • Projection materialization — lazy/eager; snapshot vs view-time; recording retention.

Core ships sane defaults (L0 open; fork+journal; lazy replication-projection; event-driven+TTL freshness; overlay-before-mutation; execution off; one tenant = one root) and never hard-codes any of the above. (Preset bundles that package coherent knob-sets per persona are tracked as O-8, §12 — flexibility without bundles is operator burden.)
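The policy rail can be sketched as pure data plus a single `resolve(question) → choice` lookup, matching the leaf discipline in §11. A per-space override wins over the shipped default. Every choice is validated against its allowed options. A knob with no stated default (here the canonical-source policy) must be configured explicitly rather than silently hard-coded. The knob keys and option spellings are hypothetical; the federation names are borrowed from the federation/ strategies listed in §11.

```python
from types import MappingProxyType
from typing import Mapping, Tuple

# Hypothetical knob names and option spellings for a slice of the policy surface.
OPTIONS: Mapping[str, Tuple[str, ...]] = MappingProxyType({
    "canonical_source": ("chorus", "designated-canonical", "git-merge", "overlay-only", "vote-to-merge"),
    "federation_model": ("fork_journal", "vcs_ping", "graph_join", "feed", "activitypub", "engine_mirror"),
    "freshness": ("event-driven", "validator-poll", "ttl", "hybrid"),
    "execution": ("off", "sandboxed", "per-shard-allowed"),
    "authz_mode": ("L0", "L1", "L2", "L3", "L4"),
})

# Shipped defaults; knobs without a stated default stay unset.
DEFAULTS: Mapping[str, str] = MappingProxyType({
    "federation_model": "fork_journal",
    "freshness": "hybrid",  # event-driven invalidation + TTL safety net
    "execution": "off",
    "authz_mode": "L0",
})

_NO_OVERRIDES: Mapping[str, str] = MappingProxyType({})


def resolve(question: str, overrides: Mapping[str, str] = _NO_OVERRIDES) -> str:
    """Pure lookup: per-space override, else shipped default; validated, no mechanism."""
    choice = overrides.get(question, DEFAULTS.get(question))
    if choice is None:
        raise LookupError(f"policy knob {question!r} has no default; configure it per space")
    if choice not in OPTIONS[question]:
        raise ValueError(f"{choice!r} is not a valid choice for {question!r}")
    return choice
```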


11. Concrete module structure (bridge to implementation)

A proposed package layout for src/shard_wiki/, mapping 1:1 to the layers so the dependency rule (downward only; the derived tier is incrementally maintained, rebuild = fallback) is enforceable by import lint:

src/shard_wiki/
  model/          # L2 top waist: Page, Identity, Placement, ProvenanceEnvelope,
                  #   Span, the page-shape types; capability-spectrum value types
  adapters/       # L1 bottom waist: AdapterContract (versioned iface), CapabilityProfile,
                  #   attachment-mode binding; concrete adapters:
    git/  folder/  gitea/  obsidian/  webdav/  notion/  …   # each: profile + verbs
  coordination/   # L3: DecisionLog (append-only, git-backed, per-space append authority/
                  #   lease), OverlayEngine (draft→patch→MR), reconcile
                  #   (current coordination state = a derived fold → lives in union/)
  federation/     # L3: FederationModel strategies (fork_journal, vcs_ping,
                  #   graph_join, feed, activitypub, engine_mirror)
  union/          # L4 (derived): IdentityResolver, EquivalenceGraph, UnionGraph,
                  #   Transclusion (reference-not-copy)
  projection/     # L4 (derived): ReplicationProjection, DerivationProjection,
                  #   ViewRegistry (moldable), QueryIndex (delegate|derive)
  authz/          # L5 cross-cut: PDP, PEP, IdentityProvider iface, NullProvider
  provenance/     # cross-cut LEAF: ProvenanceEnvelope type + ⊕ (effective) only — pure data
  policy/         # cross-cut LEAF: the §10 policy surface (presets + a resolve() read by
                  #   coordination/federation/projection/authz); owns NO mechanism
  api/            # L6: orchestrator API (server-side union for agents/CLI)

The cross-cutting rails are leaves, not god-modules (review D-4). provenance/ and policy/ are imported widely, so they are the highest coupling risk; the discipline that caps it is: they may import nothing in the tree and contain only stable data types + pure functions (the envelope and its ⊕; the policy presets and a resolve(question) → choice). Mechanism never lives in a rail — policy/ says what the preset is, coordination/ and projection/ decide how to honour it. A change to a rail is then a change to a small, stable, dependency-free leaf, not a ripple through every layer. Capability-spectrum value types live in model/ (also leaf-like) for the same reason.

Hard import rules (enforced by import lint):

  • union/ and projection/ may import model/, adapters/, coordination/, policy/, provenance/ — but nothing below them may import them (they are the disposable derived tier; only the L6 api/ consumes them).
  • model/, adapters/, provenance/, policy/ import nothing outside this group (the waists and rails stay thin); provenance/ and policy/ import nothing at all.
  • coordination/ and federation/ may import the waists + rails, never the derived tier.
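The import rules can be enforced with a small lint over the standard-library `ast` module. The sketch encodes only the stated rules, as deny-lists. It reads "import nothing else in the tree" as "nothing outside the model/adapters/provenance/policy group". It does not resolve relative imports. The table names and the `lint` entry point are hypothetical.

```python
import ast
from typing import Dict, List, Set, Tuple

PKGS = {"model", "adapters", "coordination", "federation", "union",
        "projection", "authz", "provenance", "policy", "api"}
DERIVED = {"union", "projection"}
WAIST_GROUP = {"model", "adapters", "provenance", "policy"}

# Forbidden imports per package, encoding only the rules stated above.
DENY: Dict[str, Set[str]] = {
    "provenance": PKGS - {"provenance"},  # rails import nothing at all
    "policy": PKGS - {"policy"},
    "model": PKGS - WAIST_GROUP,          # waists stay inside their group
    "adapters": PKGS - WAIST_GROUP,
    "coordination": set(DERIVED),         # never the derived tier
    "federation": set(DERIVED),
}


def imported_packages(source: str) -> Set[str]:
    """Top-level shard_wiki packages imported by absolute imports in `source`."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == "shard_wiki":
                names = [f"shard_wiki.{alias.name}" for alias in node.names]
            else:
                names = [node.module]
        else:
            continue  # relative imports are not handled in this sketch
        for name in names:
            parts = name.split(".")
            if len(parts) > 1 and parts[0] == "shard_wiki":
                found.add(parts[1])
    return found


def lint(modules: Dict[str, str]) -> List[Tuple[str, str]]:
    """Map 'shard_wiki.<pkg>.<module>' -> source; return (module, forbidden package) pairs."""
    violations: List[Tuple[str, str]] = []
    for module, source in sorted(modules.items()):
        pkg = module.split(".")[1]
        for dep in sorted(imported_packages(source) - {pkg}):
            if dep in DENY.get(pkg, set()):
                violations.append((module, dep))
    return violations
```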

12. Known scaling risks & open problems

Tracked honestly rather than pretend-solved (review disposition F). Each has a chosen direction and a revisit trigger — the thing that, if observed, forces a redesign.

| # | Risk / open problem | Chosen direction | Revisit trigger |
|---|---|---|---|
| O-1 | Equivalence blocking misses true matches (LSH false negatives, §8.7) | accept a small miss rate; curator bindings are the escape hatch | measured recall below an agreed threshold on real corpora |
| O-2 | Convergence bound for high-write CRDT shards under partition (§8.6) | causal via journal + CRDT-native merge at the shard; no global bound promised | user-visible divergence that outlives a partition |
| O-3 | Per-equivalence-set divergence tracking (§8.6) | start with base-rev comparison; add vector clocks only if needed | 3-way concurrent divergence that base-rev mis-orders |
| O-4 | Persisted derived-tier cost ceiling (§8.7/§9.1) | per-tenant partition, incremental-maintained, rebuild is fallback | a tenant whose incremental cost still exceeds budget |
| O-5 | Axis-interaction completeness (§6.5) | the named interaction table is the contract; extend deliberately | a real adapter needing an interaction not in the table |
| O-6 | Span-address portability across projection (§7.2) | shard-scoped native-id wrapping now; tumbler later | cross-shard transclusion that native ids can't satisfy |
| O-7 | Squash-compaction vs. perfect auditability (§8.1) | compact the path, preserve reachable states; configurable | a compliance need for every intermediate keystroke |
| O-8 | Policy-knob proliferation → operator burden (§10) | ship named preset bundles ("personal vault" / "team wiki" / "enterprise federation") over the policy surface | operators mis-configuring interacting knobs |
| O-9 | Shard sharing across roots vs tenant partition (§9.1, I-13) | shard exclusive to one root by default; explicit shared-read binding otherwise (avoids double-caching a rate-limited shard) | a shard legitimately needed live in two tenants |
| O-10 | Span-level authz under transclusion (aggregation/inference leak; ⊕ across boundaries, §7.3/§9) | a transcluded span inherits the stricter of source & host authz; provenance ⊕ composes the source-page envelope under the host | a real cross-authz transclusion |
| O-11 | Union under shard unavailability (§8.8 covers stale, not down) | partial union + per-shard "unavailable" provenance + last-known-projection where policy allows | an SLA need on partial reads |
| O-12 | Per-space append-log throughput ceiling (§8.1 append authority) | single writer per space scales across spaces; per-space log sharding if needed | a single space exceeding one writer's append rate |

These are the spec-writing inputs for SHARD-WP-0002; none blocks the architecture, each scopes an implementation spike.


13. Canonical data flows (the architecture exercised)

A. Attach a shard. Adapter binds (chosen attachment mode) → probes/declares a capability profile → core registers the shard under a root entity → if not git-native, the coordination journal is seeded (begin-now/mirror/import per axis 5). No union rebuild yet (lazy).

B. Read a page through the union. Consumer asks the union for an identity → Identity resolver maps it to placements across shards → equivalence yields chorus or canonical → replication-projection lazily fetches from each shard (cache + freshness) → page returned wrapped in its provenance envelope → L5 filters anything the principal can't see at source.

C. Edit a read-only / limited shard. Write request → L5 PDP allows → capability profile says < write-through → OverlayEngine records a draft → renders a patch/MR in the shard's native syntax (lossless) or Markdown (lossy-with-report) → on explicit apply, commit to the journal and (if the profile permits) propagate; otherwise the overlay stands as the local truth, fully attributed.

D. Attach a computational notebook. Adapter declares profile (attachment=file-store, opacity=mixed, computational=captured-output). Core attaches the .ipynb source as canonical; presents cells + embedded outputs as derivation-projection snapshots marked "run N, env unguaranteed"; offers a static render via the view registry; re-execution stays gated off. History uses paired-text/nbdime per axis 5.


14. Key tradeoffs & decisions

Decided:

  • Capability spectra over a verb checklist — richer contract for precise, uniform degradation; tamed by an orthogonal core + implied positions + a named interaction table (§6.5). (Decided.)
  • Three states; derived = f(canonical) — sharded + coordination canonical, derived disposable (§1). (Decided; supersedes the earlier "edges vs middle" framing.)
  • Event-sourced coordination, one append authority per space — coordination-canonical state is an append-only decision log in the git journal; current state is a derived fold; a per-space append lease gives a totally-ordered log and read-your-writes across orchestrator instances (§8.1). (Decided — resolves the single-vs-multi-writer keystone.)
  • Profiles are verified, not asserted — a versioned conformance suite gates adapter admission; capability-as-data acts on verified data (§6.6). (Decided.)
  • I-2 is eventually-verified — incremental maintenance is the fast path; a digest + background consistency-checker detects and self-heals drift (§8.7). (Decided.)
  • Incremental-first, rebuild-as-fallback — the derived tier is continuously maintained from change events; full rebuild is rare and need not be cheap (§8.7). (Decided — resolves the earlier "union graph persistence" open item: persisted, per-tenant, incrementally maintained, rebuildable, §9.1.)
  • Identity ≠ fingerprint — page identity is a stable handle; fingerprints are for equivalence (§7.2). (Decided.)
  • Consistency = read-your-writes (journal) + causal (derived) + eventual/freshness-labelled (shards); conflict detection/representation is core, resolution is policy (§8.6). (Decided.)
  • Address scheme — shard-scoped native-id wrapping now; portable tumbler later (§7.2, O-6). (Decided.)
  • Default federation = fork+journal over Git; other models opt-in (§8.3). (Decided.)
  • Execution off by default — recognise+project always; execute only when gated (§8.5). (Decided.)
  • Derived tier is tenant-partitioned (I-13, §9.1). (Decided.)

Still open (carried to §12 / policy):

  1. L1 "attributed-but-open" mode — ship it or jump L0→L2? (Carried from ArchitectureBlueprint.)
  2. Per-page ACL default — off (per-shard/namespace) confirmed; revisit only if demand appears.
  3. The implementation spikes in §12 (O-1…O-12).

15. What this architecture is not

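  • Not a wiki engine, UI, or rendering pipeline (those are consumers at L6; shard-wiki's native engine, WikiEngineCoreArchitecture.md, is a separate canonical-mode shard behind the adapter contract, not part of the orchestrator).
  • Not a canonical source of truth for shard content: shards keep sovereignty; shard-wiki owns only coordination-canonical state (the journal), and everything else in the middle is derived.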
  • Not a generic file-sync daemon — synchronisation is wiki-page-semantic.
  • Not an execution platform — computation is recognised and projected, not hosted.
  • Not a universal ontology — no single schema is imposed on all shards.
  • Not an authentication/identity store — that is delegated (authorization is owned).

16. Traceability

  • INTENT — every invariant in §2 (I-1…I-13) cites an INTENT principle or boundary; no invariant contradicts the Stability Note.
  • Review & hardening — this revision folds in history/260615-core-architecture-blueprint-review.md via SHARD-WP-0005: A-1→§1/§3/§4 (three-state re-frame), B-1→§7.2 (identity vs equivalence), B-2→§8.6 (consistency/conflict), B-3→§9.1+I-13 (tenant isolation), C-1/C-2→§8.7/§8.8 (incremental + indexed + invalidation), C-3→§8.1 (history scaling), D-1→§6.5 (orthogonal core), D-2→§7.3 (layered provenance), D-3→§8.4 (common-case projection), D-4→§11 (policy module + rail discipline); open items→§12.
  • Round-2 review & hardening II — folds in history/260615-core-architecture-blueprint-review-2.md via SHARD-WP-0006: A-1…A-4→§3/§4/§10/§11 (overview reconciled to the body), B-1+B-3→§8.1 (event-sourced coordination + per-space append authority), B-2→§6.6 (adapter conformance suite), B-4→§8.7 (incremental retraction/propagation + I-2 digest/checker); C-1…C-4→§12 (O-8…O-11).
  • Research — §6 (spectra) ← 260614-shard-spectrum-synthesis v3; §8.3 (federation taxonomy) ← v3 §2.5; §8.4–§8.5 (two-axis projection, view registry, computational scope) ← 260614-computational-page-model-synthesis; §7 page shapes ← the engine + modern-tool + computational dives; §1 thesis ← the files-canonical/index-derived through-line across Logseq/ikiwiki/GT/Pharo/Jupyter.
  • Use cases — the architecture is sized to UC-01–UC-84: federation/coordination (UC-01–07, 26–33, 56, 71–72, 79) → §8; attachment/adapter (UC-34–43, 50, 53, 57, 60–62, 64–66, 68–70, 76–82) → §6; page model & fidelity (UC-34, 39, 42, 55, 58–59, 67, 73, 80, 83–84) → §7/§8.5; addressing/identity/query (UC-32, 44–49, 51–52, 54, 63, 74) → §7.2/§8.4; provenance & metadata (UC-24–25, 75) → the provenance rail; collaboration & discovery (UC-08–23) → L6 consumers over the union.
  • Workplans — §6–§8 are the design target of SHARD-WP-0002 (T11–T18); §9 is owned by ArchitectureBlueprint.md; §1 (yawex-derived resolution/overlay) aligns with SHARD-WP-0001.

17. Stability note

This document defines shard-wiki's internal architecture; it may evolve as the spec workplans land. But the thesis (§1), the invariants (§2), and the dual narrow waist (§1, §6, §7) are load-bearing — changing any of them is an architectural change in the sense of INTENT's Stability Note and should be rare and deliberate.