shard-wiki/spec/CoreArchitectureBlueprint.md
tegwick 04be66161e spec(SHARD-WP-0005 T3): consistency, concurrency & conflict model (§8.6)
Fixes bug B-2. States the guarantee (read-your-writes for journal-owned
coordination-canonical state; causal across the derived tier; eventual+
freshness-labelled for sharded inputs). Conflict detection+representation =
core mechanism, resolution = policy. Overlay-apply-under-drift semantics
(fast-forward / three-way / refuse+re-present) and journal ordering.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:35:00 +02:00


CoreArchitectureBlueprint — shard-wiki

Status: draft for review · Date: 2026-06-15 · Owner: tegwick

The whole-system architecture for shard-wiki, synthesised from INTENT.md, the 84-entry UseCaseCatalog.md, and the full research arc (research/260608-*, research/260613-*, research/260614-* — ~23 wiki/knowledge systems plus two cross-dive syntheses). This is the core blueprint: it defines the layers, the abstractions, and the load-bearing decisions that everything else implements.

Scope relationship to the other specs:

  • ArchitectureBlueprint.md (existing) is the authorization & history sub-blueprint (the L0–L4 ladder). This document references it as the design of the cross-cutting authorization layer (§9) and does not restate it.
  • SHARD-WP-0002 is the workplan that turns §6–§8 into spec/FederationArchitecture.md + the adapter-contract section of spec/TechnicalSpecificationDocument.md.
  • UseCaseCatalog.md is the demand this architecture must satisfy; UC references below are load tests, not decoration.

1. The thesis: canonical vs derived (three states)

Everything in shard-wiki follows from one organising decision — that state comes in exactly three kinds, and only one of them is disposable:

  1. Sharded-canonical — content owned by each shard (shard sovereignty).
  2. Coordination-canonical — durable state born inside shard-wiki that encodes human or cross-shard decisions and exists nowhere else: overlays (the local truth against a read-only shard), curator equivalence bindings, alias tables, merge/reconciliation decisions. It lives in the Git coordination journal.
  3. Derived-disposable — everything shard-wiki computes from (1)+(2): the union graph, equivalence index, query indexes, projections, views. It can be deleted and recomputed.

Canonical = sharded ∪ coordination. Derived = a pure function of canonical: derived = f(sharded, coordination).

This is the architectural form of "orchestrator, not engine." shard-wiki never becomes the source of truth; it composes sources and records the decisions it makes about them. The research earned the files-canonical half empirically — every serious system externalises its durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams (nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk image — is exactly the one we record as a boundary, not a backend (research/260614-squeak-pharo-deep-dive).

The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges, derived in the middle"). That was wrong by omission: it had no home for coordination-canonical state, and so contradicted itself by listing curator bindings and alias tables as "derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model fixes that crack (review finding A-1) and makes derived = f(canonical) literally true.

Three consequences fall straight out, and they are the spine of the rest of this document:

  1. Graceful degradation is free. If the derived tier is always recomputable, a backend that can only be read is still a first-class participant — you just derive less from it.
  2. Provenance is tractable. Because shard-wiki never claims to be the source, every derived artifact can always point back to the canonical input it came from (union without erasure is a structural property, not a feature bolted on).
  3. The derived tier is a pure function of canonical state. derived = f(sharded, coordination). Bugs in the derived tier are recoverable by recompute; only the two canonical tiers must be durably protected — sharded by each shard, coordination by the Git journal (history). Recomputability is a correctness property of the derived tier, not a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the operational-envelope axis.
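The recomputability claim can be made concrete with a minimal Python sketch. It assumes a toy model in which sharded-canonical state is a mapping of (shard, page) to text and coordination-canonical state is an alias table; `Canonical` and `derive_union` are hypothetical names, not part of the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Canonical:
    # Sharded-canonical: page text owned by each shard, keyed by (shard, page).
    sharded: dict
    # Coordination-canonical: alias bindings recorded in the journal.
    aliases: dict


def derive_union(canonical: Canonical) -> dict:
    """Pure function: group pages under their alias target (or their own name)."""
    union: dict = {}
    for (shard, page), text in sorted(canonical.sharded.items()):
        topic = canonical.aliases.get(page, page)
        union.setdefault(topic, []).append((shard, page, text))
    return union


state = Canonical(
    sharded={("notes", "Home"): "hello", ("team", "Start"): "hi"},
    aliases={"Start": "Home"},  # a human binding: read by recompute, never regenerated
)
cache = derive_union(state)
cache.clear()                   # deleting the derived tier loses nothing canonical
rebuilt = derive_union(state)   # the same inputs always give the same union
```

Clearing `cache` stands in for a bug or loss in the derived tier: a recompute restores it, whereas losing `aliases` would not be recoverable this way.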

The dual narrow waist

Heterogeneity is mediated at exactly two interfaces, and nowhere else:

  • Bottom waist — the Shard Adapter Contract (§6). Every backend, however weird, enters through one versioned, capability-described interface.
  • Top waist — the Wiki Page Model (§7). Every consumer, however demanding, sees one backend-neutral, Markdown-first-but-stretchable page model.

Between the waists, core logic is written once against capabilities and the page model — never against a specific backend. Adding TiddlyWiki or Notion or a git forge is writing an adapter and declaring a capability profile, not editing core algorithms.


2. Architectural invariants

These are non-negotiable. Violating one is a design bug, not a tradeoff. They are INTENT's principles fused with the research through-lines.

| # | Invariant | Source |
|---|---|---|
| I-1 | Orchestrator, not engine. Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
| I-2 | Three states; derived = f(canonical). State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
| I-3 | Capability-awareness is data. A binding's abilities are a profile (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
| I-4 | Union without erasure. Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
| I-5 | Overlay before mutation. Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
| I-6 | Git-addressable coordination. Every information space has a Git-backed journal even when its shards are not git-native. | INTENT |
| I-7 | Mechanism over policy. Canonical-source, conflict, editorial, sync cadence are configurable presets, never hard-coded. | INTENT |
| I-8 | Graceful degradation. A limited backend is still usable as read-only / cache / projection / backup / patch target. | INTENT |
| I-9 | Identity ≠ placement. A page is an entity that may occupy N locations; address by identity, not by path. | Trilium note/branch; ZigZag |
| I-10 | History is the floor. Every write is a recoverable commit; recoverability, not gatekeeping, is the baseline protection. | ArchitectureBlueprint §2 |
| I-11 | Authorization in core, authentication delegated. Core decides who-may; an external provider says who-is. | INTENT; ArchitectureBlueprint |
| I-12 | Not a file-sync daemon; not an execution platform. Sync is wiki-page-semantic; computation is recognised+projected, not hosted. | INTENT; computational-page-model synthesis |

3. The layered architecture

        ┌───────────────────────────────────────────────────────────────┐
        │  L6  Consumers — Orchestrator API · CLI/agents · Web/Obsidian   │
        ├───────────────────────────────────────────────────────────────┤
 X-cut  │  L5  Authorization (PEP/PDP, identity-provider iface) →         │ X-cut
 Prove- │      see ArchitectureBlueprint.md (L0–L4 ladder)                │ Capa-
 nance  ├───────────────────────────────────────────────────────────────┤ bility
  ▲     │  L4  Union & Projection  (DERIVED, rebuildable cache)           │   ▲
  │     │      identity resolution · equivalence/chorus · union graph ·   │   │
  │     │      replication+derivation projections · moldable view registry│   │
  │     │      · derived query index                                      │   │
  │     ├───────────────────────────────────────────────────────────────┤   │
  │     │  L3  Coordination  (Git journal · overlay/patch engine ·        │   │
  │     │      federation-model strategies · reconciliation)              │   │
  │     ├───────────────────────────────────────────────────────────────┤   │
  │     │  L2  Wiki Page Model  ── TOP WAIST ──                           │   │
  │     │      backend-neutral pages · identity≠placement · span address ·│   │
  │     │      provenance envelope · the page shapes                      │   │
  │     ├───────────────────────────────────────────────────────────────┤   │
  │     │  L1  Shard Adapter Contract  ── BOTTOM WAIST ──                 │   │
  │     │      versioned iface · capability profile (15 spectra) ·        │   │
  │     │      attachment-mode binding · operation verbs                  │   │
  └──── ├───────────────────────────────────────────────────────────────┤ ──┘
        │  L0  Backends (not ours): git repos, wiki/ subdirs, Gitea/      │
        │      GitLab/GitHub wikis, folders, Obsidian, WebDAV, Notion,    │
        │      Coulomb spaces, notebooks, …                               │
        └───────────────────────────────────────────────────────────────┘

Provenance and Capability are drawn as vertical rails because they are not layers — they are present at every layer. A page object at L2 carries provenance; a projection at L4 carries provenance; an authz decision at L5 records the context under which content was read. Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4 (can we delegate a query?), and L5 (can this principal even reach the op?).

The dependency rule is strict and downward, and it tracks the three states (§1), not the layer numbers: the derived-disposable tier (the whole of L4) may be deleted and recomputed from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal). Nothing canonical may depend on derived state. Note the journal at L3 is canonical (it holds overlays, bindings, aliases, merges); only L4 is disposable.


4. Core abstractions (the vocabulary code must use)

Straight from INTENT, sharpened by research. New code maps onto these; it does not invent parallel terms.

  • Shard — an independently meaningful page store attached to a root entity, with sovereignty: its own backend, capability profile, history, identity model, limits.
  • Root entity / information space — the joined space shards attach to; the unit of Git coordination and of multi-tenancy (a tenant maps to a root entity, ArchitectureBlueprint).
  • Shard adapter contract — the versioned L1 interface; the bottom waist.
  • Capability profile — a shard binding's position on each of the 15 spectra (§6) plus its supported verbs. The data structure that drives degradation.
  • Wiki page model — the L2 backend-neutral page; the top waist.
  • Page identity vs placement — a page is an entity (identity); it may have N placements (paths/shards). Addressing, equivalence, and transclusion key on identity (I-9).
  • Provenance envelope — the metadata wrapper every artifact carries: source shard, freshness, liveness, authorization context, overlay status, divergence, derivation lineage.
  • Coordination journal — the L3 Git-backed record of change flows for a space, and the durable home of all coordination-canonical state (§1): overlays, curator equivalence bindings, alias tables, merge/reconciliation decisions. This state is born inside shard-wiki, exists nowhere else, and is not derived — it must be committed, never recomputed.
  • Overlay — a non-destructive local edit against a remote/read-only/limited shard, representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an unapplied overlay is the local truth and lives in the journal.
  • Projection — a derived view of shard content, typed on two axes (§8): kind (replication | derivation) × liveness (static … irreducibly-live).
  • Federation model — the selected coordination strategy for a space (§8.3 taxonomy, T17).
  • Shard mode — read-only · write-through · mirrored · projected · cached · canonical (a policy selection constrained by the capability profile).

5. Why "layered" and not "pipeline" or "plugin-bus"

Two rejected alternatives, recorded so the choice is legible:

  • A sync pipeline (source → transform → sink) was rejected: it implies a privileged direction and a canonical sink, which violates shard sovereignty (I-1) and union-without-erasure (I-4). shard-wiki is a star (many shards ↔ one space), not a pipe.
  • A flat plugin bus (every backend a peer plugin emitting events) was rejected as the top-level shape: it has no narrow waist, so heterogeneity leaks into every consumer. We keep the plugin idea but confine it to L1 (adapters) and L3 (federation strategies), behind the waists.

The layered-with-rails shape is what makes I-2/I-3/I-4 hold simultaneously.


6. Bottom waist — the Shard Adapter Contract (L1)

The single most important design decision in the project: the adapter contract models positions on capability spectra, not a flat checklist of boolean verbs. A backend is not "can/can't merge"; it sits somewhere on the merge spectrum, and federation operations degrade by position. This is the lesson of putting ~23 systems in one matrix (research/260614-shard-spectrum-synthesis, v3).

6.1 The fifteen capability spectra

Each binding declares a position on each axis. Core algorithms read these positions; there is no per-backend code in core (I-3).

  1. Addressing granularity — none → path → page-level store-id → in-file span → in-file block id (Logseq id::) → store-UUID → portable tumbler (Xanadu, the unreached ideal)
  2. Content identity — none → path/title → fingerprint → span-set
  3. Identity vs placement — path=identity → identity separated from placement (Trilium note/branch = a DAG)
  4. Structure — flat MD → frontmatter/key::%META% → typed objects → DB schema+relations → object-graph/ontology → computed (inherited+templated) → typed-graph statements
  5. History — none → internal-only / CRDT-log → open-file → git-native
  6. Merge model — none → git/text → conflict-notes/keep-both → native-CRDT → coexist-with-rank
  7. Native query — none → text → build-your-own derived index → datalog/graph → DB query → SPARQL
  8. Translation — native → lossless → lossy-with-fidelity-report (incl. HTML)
  9. Attachment mode — file-store (native | interchange-mirror) → git-IS-store → in-engine-host → local-REST → external-API → direct-DB → CRDT-replica → P2P/no-central-endpoint
  10. Operational envelope — local/unbounded → realtime CRDT/WebSocket → rate-limited/eventually-consistent/paginated
  11. Access grant — open → token → OAuth scoped+revocable → P2P key/invite → enterprise ACL
  12. Content opacity — plaintext → structured re-evaluable value → encrypted whole-shard → per-item → proprietary-lossy-exportable
  13. Write granularity — whole-file (TiddlyWiki) → per-page → section/anchor → per-block → story-item
  14. Provenance granularity — per-shard → per-page → per-edit → per-statement/value (Wikibase rank+refs)
  15. Computational / liveness — static source → captured-output snapshot → live-over-files → view-time render → irreducibly-live/temporal
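As an illustration of "positions on spectra" read by generic core logic, here is a minimal Python sketch. It assumes each axis can be modelled as an ordered enum and shows only axis 5 (History). The names `History`, `CapabilityProfile` and `journal_seeding` are hypothetical, and mapping internal-only history to "mirror-forward" is an assumption made for the example, not something the spec states.

```python
from dataclasses import dataclass
from enum import IntEnum


class History(IntEnum):
    # Axis 5, ordered as listed above (weakest first).
    NONE = 0
    INTERNAL_ONLY = 1   # internal-only / CRDT-log
    OPEN_FILE = 2
    GIT_NATIVE = 3


@dataclass(frozen=True)
class CapabilityProfile:
    history: History    # the other fourteen axes are omitted in this sketch


def journal_seeding(profile: CapabilityProfile) -> str:
    """Pick a journal-seeding strategy from the position alone, with no backend name."""
    if profile.history >= History.GIT_NATIVE:
        return "shard-log-is-journal"
    if profile.history >= History.OPEN_FILE:
        return "import-backfill"
    if profile.history >= History.INTERNAL_ONLY:
        return "mirror-forward"   # assumed mapping
    return "begin-now"
```

The function never asks which backend it is talking to; two different backends with the same position get the same behaviour, which is the point of I-3.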

6.2 Operation verbs

read, write, diff, merge, lock, version, publish, notify, transclude-source, translate-syntax, structured-payload, derive-projection, execute/evaluate. The last two are gated, off by default (§8.5, computational content). Verb support is part of the profile and must reconcile with the federation-ops capability matrix (SHARD-WP-0002 T10).
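The verb list and its two gated entries could be represented roughly as follows. This is a sketch only: `Verb`, `GATED` and `effective_verbs` are hypothetical names, and the real profile format is left to SHARD-WP-0002.

```python
from enum import Enum


class Verb(Enum):
    READ = "read"
    WRITE = "write"
    DIFF = "diff"
    MERGE = "merge"
    LOCK = "lock"
    VERSION = "version"
    PUBLISH = "publish"
    NOTIFY = "notify"
    TRANSCLUDE_SOURCE = "transclude-source"
    TRANSLATE_SYNTAX = "translate-syntax"
    STRUCTURED_PAYLOAD = "structured-payload"
    DERIVE_PROJECTION = "derive-projection"
    EXECUTE_EVALUATE = "execute/evaluate"


# The last two verbs are gated and off unless explicitly enabled.
GATED = frozenset({Verb.DERIVE_PROJECTION, Verb.EXECUTE_EVALUATE})


def effective_verbs(declared, enabled_gated=frozenset()):
    """Verbs a binding may actually use: declared, minus gated ones not enabled."""
    return {v for v in declared if v not in GATED or v in enabled_gated}
```

A binding that declares `execute/evaluate` still cannot use it until the gate is opened, so declaring a capability and enabling it stay separate decisions.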

6.3 Attachment-mode taxonomy (axis 9, expanded)

A backend may offer several modes; attach mode is a per-binding, capability-gated choice, with one declared authoritative. Modes: file-store (native vault/folder or an interchange/sync mirror), git-IS-store (the home case — forge wikis & ikiwiki: git is the store and the journal at once, resolving the engine-mirror write-race), in-engine hosted adapter (XWiki component, Obsidian/Logseq/Roam plugin, Trilium script), local-REST (Joplin Data API, Trilium ETAPI), external-API-only (Notion), direct-DB (MojoMojo schema→model), CRDT-replica (Anytype/AFFiNE/AppFlowy), P2P/no-central-endpoint. Boundary: a monolithic live-memory blob (Smalltalk image, a kernel) is never an attach target — it participates only via export→files (I-12).

6.4 Contract rules

  • Versioned interface (Foswiki::Store + Foswiki::Meta is the proof that a stable store-interface-with-swappable-backends works). Capability discovery is a static profile with optional runtime negotiation.
  • Backend-swap tolerance — shard identity/provenance survives a substrate change (RCS↔PlainFile, folder→Git, Logseq file→SQLite): bind to capabilities, not to "it's files."
  • Absence is first-class — the profile must express can't cleanly (Oddmuse floor), so degradation paths are explicit, never guessed.
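The "absence is first-class" rule can be sketched as an adapter that returns an explicit value for an unsupported verb instead of raising or silently doing nothing. This is a hedged illustration: `ShardAdapter`, `Unsupported`, `ReadOnlyFolderAdapter` and the `contract_version` string are hypothetical, not the contract the spec will define.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Unsupported:
    verb: str
    reason: str


class ShardAdapter(Protocol):
    contract_version: str

    def read(self, page_id: str) -> str: ...

    def write(self, page_id: str, text: str) -> Optional[Unsupported]: ...


class ReadOnlyFolderAdapter:
    """Toy adapter over an in-memory mapping; it can read but not write."""

    contract_version = "1.0"  # hypothetical version label

    def __init__(self, pages: dict):
        self._pages = dict(pages)

    def read(self, page_id: str) -> str:
        return self._pages[page_id]

    def write(self, page_id: str, text: str) -> Optional[Unsupported]:
        # Express "can't" as data so the caller can pick a degradation path.
        return Unsupported(verb="write", reason="binding is read-only")


adapter: ShardAdapter = ReadOnlyFolderAdapter({"Home": "hello"})
result = adapter.write("Home", "changed")
```

Because the refusal is a value, core logic can route the edit to the overlay path rather than guessing from an exception type.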

7. Top waist — the Wiki Page Model (L2)

Backend-neutral, Markdown-first but stretchable many ways at once. The page model is the lingua franca every consumer sees; an adapter's job is to project its backend into this model (read) and accept overlays back (write), within its capabilities.

7.1 Page shapes the model must carry

  • Prose Markdown — the baseline.
  • Typed / computed records — frontmatter/%META%/XObjects/Notion DB rows; computed metadata (Trilium inherited+templated) represented as effective-vs-own with per-attribute provenance.
  • Typed-graph statements — Wikibase claim + qualifiers + references + rank (structure far-end).
  • Inline-embedded objects — Quip/Notion spreadsheets & live apps inside prose.
  • Non-Markdown assets — drawings, canvases, images: typed asset / opaque blob / pluggable content-type registry, never silent-flattened.
  • The four computational shapes (§8): one-source-many-projections, notebook (embedded computed output), program-as-page, live/temporal.

All shapes reduce to a common skeleton: (content | source, structure, provenance envelope, [derivation rule]). The page model stores the richest faithful form as canonical and treats any Markdown rendering of a non-Markdown shape as a lossy projection (I-4 + fidelity report).

7.2 Identity, placement, addressing — three distinct concepts

The earlier draft used "identity" for two different things and (worse) suggested deriving page identity from a content fingerprint — which would make editing a page change its identity and break every reference to it (review bug B-1). They are pulled apart here:

  • Page identity — a stable handle. A shard-scoped, durable key that survives edits: the backend's native page/note id where one exists (Roam/Notion/Trilium uid, a git path treated as a name, a wiki page name), wrapped in a shard scope so it survives projection and never collides across shards. Identity is assigned/minted, not computed from content. References, placement, transclusion targets, and overlays all key on identity.
  • Placement — where an identity sits. One identity → N placements (paths/shards) = a DAG; no single canonical path (I-9). Placement can change without changing identity.
  • Content equivalence — detecting sameness, never identity. A content fingerprint (or span-set overlap) identifies a version / a piece of content, used to detect that two distinct identities hold the same or derived content (the equivalence/chorus mechanism, §8.4). A fingerprint is never a page's identity: same page, edited → new fingerprint, same identity; two pages, identical content → same fingerprint, different identities.
  • Span addressing — a sub-page address within an identity: adopt native span IDs where minted (Roam :block/uid, Logseq id::, Notion/CRDT UUID); else a position address (path+range) or a content-fingerprint address for equivalence/transclusion. The Xanadu tumbler is the portable ideal the scheme aims at without requiring.
  • Provenance envelope rides on pages and spans (see §7.3 for its layered, low-cost form).

So the chain is: identity (stable) → placements (N, mutable) → equivalence (cross-identity sameness, fingerprint-based) — three concepts, three mechanisms, never conflated.
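The separation between identity and content equivalence can be shown in a few lines of Python. This is a sketch under assumptions: `PageIdentity` and `fingerprint` are hypothetical names, and SHA-256 over the raw text is chosen only for illustration (the spec does not name a fingerprint function).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PageIdentity:
    shard: str       # shard scope, so ids never collide across shards
    native_id: str   # the backend's own page id, minted rather than computed


def fingerprint(text: str) -> str:
    """Content fingerprint: identifies a version of content, never a page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


home = PageIdentity("notes", "Home")
start = PageIdentity("team", "Start")

before = fingerprint("hello")
after = fingerprint("hello, world")        # same page, edited
same_content = fingerprint("hello") == before
```

Editing `home` changes its fingerprint but not its identity, while `home` and `start` holding identical text share a fingerprint and remain two identities.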


8. Coordination, federation & projection

8.1 Coordination journal (L3) — Git as the spine

Every information space has a Git-backed coordination journal (I-6). It records cross-shard operations (fork, import, reconcile, overlay-apply, space-branch) and is the history floor (I-10). For git-IS-store shards the shard's own git log is this journal; for non-git shards the journal supplements (begins-now / mirrors-forward / snapshots-replica) or imports (backfill open file history). History portability is a spectrum, handled per profile (axis 5).

8.2 Overlay / patch engine (L3)

The default write path for anything below write-through capability (I-5): an edit becomes a draft → patch/commit → MR, applied destructively only on explicit intent and only where the profile + policy both permit. This is what lets a read-only or rate-limited or lossy backend still be edited safely.
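The draft → patch → MR → apply progression can be sketched as a small state machine in which the destructive step needs explicit intent plus both permissions. `OverlayState` and `advance` are hypothetical names for this illustration.

```python
from enum import Enum


class OverlayState(Enum):
    DRAFT = "draft"
    PATCH = "patch"
    MERGE_REQUEST = "mr"
    APPLIED = "applied"


_NEXT = {
    OverlayState.DRAFT: OverlayState.PATCH,
    OverlayState.PATCH: OverlayState.MERGE_REQUEST,
    OverlayState.MERGE_REQUEST: OverlayState.APPLIED,
}


def advance(state, explicit_apply=False, profile_permits=False, policy_permits=False):
    """Move an overlay one step; the destructive step needs all three conditions."""
    nxt = _NEXT.get(state)
    if nxt is None:
        raise ValueError("overlay is already applied")
    if nxt is OverlayState.APPLIED and not (
        explicit_apply and profile_permits and policy_permits
    ):
        return state  # stay put: the overlay remains the local truth
    return nxt
```

An overlay against a read-only shard therefore parks at the MR stage indefinitely, which is a valid resting state rather than a failure.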

8.3 Federation is plural & composable (L3) — the model taxonomy

Federation is not one mechanism. shard-wiki selects a federation model per space and composes per shard (mechanism over policy, I-7):

| Model | Anchor | Coordination shape |
|---|---|---|
| Fork + journal (default home case) | Federated Wiki | copy-with-provenance + per-page action journal (story = replay) |
| VCS-replication + ping | ikiwiki | git clone/pull/push + change-ping |
| Query-time graph-join | Wikibase SPARQL SERVICE | join remote graphs at query time, no copy |
| Feed aggregation | RSS/Atom | inbound feed → pages |
| Activity streams | ActivityPub | Create/Update events, notify or content-bearing |
| Engine-mirror | Wiki.js DB↔Git | engine syncs its own store to a git mirror |

8.4 Union & projection (L4) — the derived cache

This whole layer is derived-disposable: recomputable from canonical state — sharded content + the coordination-canonical inputs in the journal (I-2). Crucially, the automatic equivalence results are derived, but the human/curatorial inputs they consume — alias tables and curator equivalence bindings — are coordination-canonical (they live in the journal), not derived; recompute reads them, never regenerates them. It comprises:

  • Identity resolution & equivalence — detect "same topic / derived content" path-independently from derived signals (content fingerprint, span-set overlap) plus the coordination-canonical inputs (alias table, curator binding); present as chorus-of-voices or designated-canonical (a policy preset). (Scaling: §8.7.)
  • Union graph — the navigable join of pages, links, and dimensions (namespace, genealogy, version, shard, equivalence). A derived lens over canonical files+journal, never a new store (the ZigZag boundary).
  • Transclusion — one reference-not-copy primitive unifying Xanadu transclusion, ZigZag clone, Roam/Obsidian/Logseq embed, Notion synced block, Trilium note-cloning, and literate named-chunk assembly, over the addressable union.
  • Projection — the two-axis model:
    • Kind: replication-projection (lazy cache of remote content — the default) vs derivation-projection (transform/compile/weave/evaluate a source).
    • Liveness: static → captured snapshot → live-over-files → view-time → irreducibly-live.
    • Derivation facets: materialization timing (ahead-of-time vs view-time), multiplicity (one output vs N co-equal), continuity (one-shot vs continuous). Every projection declares its liveness + freshness + provenance; the irreducibly-live far end has no faithful static form (source + a marked recording).
  • Moldable view registry — projection generalises to an open, type-keyed set of co-equal, possibly-computed views, none canonical-by-fact (display-canonical is policy). This unifies replication/derivation/dimensional/query projection and answers the "pluggable content-type registry" question (GT prior art).
  • Derived query index — delegate to a shard's native query engine where present (Roam/Logseq Datalog, Notion DB query, XWiki XWQL, Wikibase SPARQL); else build a derived index over the projection (the Logseq DataScript-over-files pattern). The index is disposable (I-2).
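The two-axis projection typing above can be sketched as a small value type. The names `Kind`, `Liveness`, `Projection`, `has_faithful_static_form` and `freshness_label` are hypothetical, and the age threshold passed to `freshness_label` is an assumed parameter; the spec only requires that freshness is declared.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Kind(Enum):
    REPLICATION = "replication"
    DERIVATION = "derivation"


class Liveness(IntEnum):
    STATIC = 0
    CAPTURED_SNAPSHOT = 1
    LIVE_OVER_FILES = 2
    VIEW_TIME = 3
    IRREDUCIBLY_LIVE = 4


@dataclass(frozen=True)
class Projection:
    source_identity: str   # provenance: which canonical input this came from
    kind: Kind
    liveness: Liveness
    fetched_at: float      # seconds since epoch, when the source was last observed


def has_faithful_static_form(p: Projection) -> bool:
    # Only the irreducibly-live far end lacks a faithful static form.
    return p.liveness < Liveness.IRREDUCIBLY_LIVE


def freshness_label(p: Projection, now: float, max_age_s: float) -> str:
    # A stale projection is labelled stale, not treated as wrong.
    return "fresh" if now - p.fetched_at <= max_age_s else "stale"
```

For example, a lazily cached remote page would be `Projection("notes/Home", Kind.REPLICATION, Liveness.STATIC, fetched_at)`, and its label flips to "stale" once the assumed threshold passes.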

8.5 Computational / executable content — the scope decision

In scope as a page-model + projection concern; out of scope as an execution platform. shard-wiki recognises computational types, attaches the canonical source, and presents derived forms as provenance- and liveness-marked projections. Driving a derivation (tangle/weave, re-execute a notebook, render a sketch, evaluate a pattern) is a gated capability, off by default, with a trust/sandbox concern, degrading to a captured snapshot. One snapshot-provenance record (run id, source rev, timestamp, environment "unguaranteed") serves notebooks, renders, and recordings alike. No INTENT amendment is required — this lives inside the existing page model (L2) and projection model (L4).
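The single snapshot-provenance record described above might look like the following sketch. The field names mirror the four items listed (run id, source rev, timestamp, environment); `SnapshotProvenance`, its `label` method and the ISO timestamp format are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotProvenance:
    run_id: str
    source_rev: str
    timestamp: str                      # assumed ISO 8601 string
    environment: str = "unguaranteed"   # the environment is never claimed reproducible

    def label(self) -> str:
        return f"run {self.run_id}, env {self.environment}"


record = SnapshotProvenance(run_id="7", source_rev="abc123", timestamp="2026-06-15T01:35:00+02:00")
```

The same record type would be attached to a notebook output, a render, or a recording, so consumers read one provenance shape for all three.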

8.6 Consistency, concurrency & conflict model

INTENT makes real-time cross-shard consistency a non-goal — but "no strong consistency" is not the same as "no defined consistency." This is the guarantee shard-wiki does offer, and the mechanism (not policy) that makes concurrent editing safe (review bug B-2).

The consistency guarantee — causal, anchored on the journal:

  • Read-your-writes for coordination-canonical state. Once an overlay/binding/merge is committed to the journal, this client always sees it (the journal is the client's own causal spine). This is a strong local guarantee, cheap because the journal is local Git.
  • Causal consistency across the derived tier. The union/index/projections reflect a causal cut of (sharded inputs seen so far, journal). Effects never appear before their causes; a projection that has seen journal commit C has seen everything C depends on.
  • Eventual convergence for sharded-canonical inputs. Remote shard content is pulled asynchronously (lazily or by notify/poll, §8.7); the union converges to each shard's latest as observed, bounded by the shard's operational envelope. Freshness is always shown (provenance envelope), never faked — a stale projection is labelled stale, not wrong.

So: strong + read-your-writes for what shard-wiki owns (the journal); causal for what it derives; eventual + freshness-labelled for what shards own. No global clock, no distributed transaction, no two-phase commit across shards — none is needed, because shard-wiki coordinates rather than controls.
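The "causal cut" property can be checked mechanically: a set of observed journal commits is a causal cut if it contains every ancestor of each commit in it. The sketch below assumes the journal is given as a mapping from commit id to parent ids; `ancestors` and `is_causal_cut` are hypothetical helper names.

```python
def ancestors(parents: dict, commit: str) -> set:
    """All transitive parents of a commit in the journal DAG."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_causal_cut(parents: dict, seen_commits: set) -> bool:
    """True if no effect is visible without everything it depends on."""
    return all(ancestors(parents, c) <= seen_commits for c in seen_commits)


journal = {"c1": [], "c2": ["c1"], "c3": ["c2"]}
ok = is_causal_cut(journal, {"c1", "c2"})        # c2 seen together with its cause c1
broken = is_causal_cut(journal, {"c1", "c3"})    # c3 seen without its parent c2
```

A projection built from `{"c1", "c3"}` would show an effect before its cause, which is exactly what the causal guarantee rules out.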

Conflict detection & representation is core mechanism; only resolution is policy (I-7). The split the earlier draft elided:

  • Detection (core). Divergence is detected structurally: two identities resolve as equivalent (§8.4) but their content fingerprints differ, or an overlay's base revision no longer matches the shard's current revision. Detection is always on; it is never optional.
  • Representation (core). A detected conflict is first-class data in the union, not an error: equivalent-but-divergent pages are presented as a coexisting set (the chorus/keep-both representation), each fully attributed, with the divergence recorded in the provenance envelope (union without erasure — a conflict is information, not a failure).
  • Resolution (policy). Which version wins, or whether they stay coexisting, is a configurable preset (§10): chorus / designated-canonical / git-merge / vote-to-merge / overlay-only. Core never hard-codes one.
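The detection and representation steps can be sketched together: given identities already resolved as equivalent, compare their fingerprints and, on divergence, return the coexisting set as data rather than raising. `detect_divergence` and the record layout are hypothetical.

```python
from typing import Optional


def detect_divergence(voices: dict) -> Optional[dict]:
    """voices maps identity -> content fingerprint for one equivalence set."""
    if len(set(voices.values())) <= 1:
        return None  # equivalent and identical: nothing to report
    # Representation: every voice is kept and attributed; none is chosen here.
    return {"kind": "divergent-equivalents", "voices": sorted(voices.items())}


agree = detect_divergence({"notes/Home": "f1", "team/Start": "f1"})
differ = detect_divergence({"notes/Home": "f1", "team/Start": "f2"})
```

Choosing a winner from `differ["voices"]`, or leaving them coexisting, is left to whichever resolution preset the space has configured.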

Overlay-apply under source drift (the concurrent-write case). An overlay carries the base revision of the shard content it was authored against. On apply, core compares base to the shard's current revision:

  • unchanged → apply (fast-forward), commit to journal, propagate if the profile permits;
  • changed, non-overlapping → three-way merge where the merge capability allows (axis 6), else keep-both;
  • changed, overlapping → refuse + re-present as a conflict (above); never silently clobber (I-5, no silent remote mutation). The unapplied overlay remains coordination-canonical and valid against its base.
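The three cases above reduce to a small decision function. This sketch assumes the caller has already computed whether the overlay's changed region overlaps the shard's changes and whether the binding's merge position (axis 6) allows a three-way merge; `Outcome` and `decide_apply` are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    FAST_FORWARD = "fast-forward"
    THREE_WAY_MERGE = "three-way-merge"
    KEEP_BOTH = "keep-both"
    REFUSE_AND_REPRESENT = "refuse+re-present"


def decide_apply(base_rev: str, current_rev: str,
                 overlaps: bool, can_three_way: bool) -> Outcome:
    if current_rev == base_rev:
        return Outcome.FAST_FORWARD           # source unchanged since authoring
    if overlaps:
        return Outcome.REFUSE_AND_REPRESENT   # never silently clobber
    return Outcome.THREE_WAY_MERGE if can_three_way else Outcome.KEEP_BOTH
```

Note that `overlaps` is ignored when the revisions match, since an unchanged source cannot conflict with the overlay.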

Ordering. The journal commit is the ordering authority for coordination-canonical effects; a shard-native write is only acknowledged in the journal after the adapter confirms it, so a crash between journal-intent and shard-write is recoverable (the intent is replayable, the write is idempotent-keyed on identity+base-rev). Cross-shard operations are ordered by their journal commits, giving the causal cut above.
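The crash-recovery argument rests on replaying journal intents against writes that are idempotent on (identity, base-rev). A minimal sketch, assuming an in-memory stand-in for a shard; `FakeShard` and `replay_intents` are hypothetical and only illustrate the keying.

```python
class FakeShard:
    """Stand-in for a shard whose writes are idempotent per (identity, base_rev)."""

    def __init__(self):
        self.applied = {}   # idempotency key -> text
        self.writes = 0     # number of writes that actually changed the shard

    def write(self, key, text):
        if key in self.applied:
            return "already-applied"
        self.applied[key] = text
        self.writes += 1
        return "applied"


def replay_intents(intents, shard):
    """Re-issue every recorded intent; return the keys the shard has confirmed."""
    acknowledged = []
    for intent in intents:
        key = (intent["identity"], intent["base_rev"])
        shard.write(key, intent["text"])
        acknowledged.append(key)   # acknowledge only after the shard confirms
    return acknowledged


shard = FakeShard()
intents = [{"identity": "notes/Home", "base_rev": "r1", "text": "hello"}]
replay_intents(intents, shard)
replay_intents(intents, shard)     # simulated replay after a crash
```

Replaying the same intent twice leaves `shard.writes` at 1, so recovery can safely re-run everything recorded since the last acknowledged commit.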

Residual open items (tracked in Known scaling risks & open problems, §12, not pretended solved): the exact convergence bound for high-write CRDT shards under partition, and whether per-equivalence-set divergence needs a vector clock vs. a simple base-rev comparison, are deferred to implementation spikes.


9. Cross-cut — Authorization (L5)

Fully specified in ArchitectureBlueprint.md (the access & history sub-blueprint); summarised here for completeness:

  • One core, a ladder of modes L0 (open/c2, zero deps) → L1 (attributed) → L2 (authenticated) → L3 (role/group) → L4 (multi-tenant enterprise). Climbing is configuration, not re-architecture.
  • PEP wraps every adapter op; PDP decides (principal, action, target) over actions read/write/patch/merge/administer, layered on the adapter's capability profile (a shard that can't write can't be written regardless of policy — L5 consults the L1 rail).
  • Authentication delegated to a pluggable IdentityProvider (null provider = L0 default); real identity from user-engine over net-kingdom IAM.
  • Fail open only at L0, fail closed at L2+. Authorization is pure/offline once a Principal is resolved. Provenance carries authz context so the union never leaks unreadable content (the L5↔provenance-rail interaction).
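How the PDP layers on the capability rail and fails open or closed can be sketched as one function. This is an assumption-laden illustration: `decide` and the `policy` callable are hypothetical, the ladder level is passed as a plain integer, and treating L1 as fail-closed is a choice made for the sketch (the text above only fixes L0 and L2+).

```python
from typing import Callable


def decide(level: int, principal: str, action: str,
           shard_can_write: bool, policy: Callable[[str, str], bool]) -> bool:
    # Capability rail first: a shard that can't write can't be written,
    # whatever the policy says. Patches (overlays) are not blocked here.
    if action == "write" and not shard_can_write:
        return False
    try:
        return bool(policy(principal, action))
    except Exception:
        # Policy evaluation failed: fail open only at L0, closed otherwise.
        return level == 0


def broken_policy(principal: str, action: str) -> bool:
    raise RuntimeError("policy store unavailable")


def allow_all(principal: str, action: str) -> bool:
    return True
```

With `allow_all`, a direct write to a read-only shard is still denied while a patch is allowed, which matches the overlay-before-mutation path.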

10. The policy surface (mechanism over policy, made concrete)

I-7 only means something if the policy knobs are enumerated and kept out of core algorithms. The configurable presets are:

  • Canonical-source policy — chorus / designated-canonical / git-merge / overlay-only / vote-to-merge (per space or per equivalence set).
  • Federation model — the §8.3 taxonomy, per space, composable per shard.
  • Shard mode — read-only / write-through / mirrored / projected / cached / canonical (constrained by the capability profile).
  • Reconciliation cadence & conflict exposure — push/poll/manual; show-conflicts vs auto-merge-when-supported.
  • Execution policy — derive/execute off (default) / sandboxed / per-shard-allowed.
  • Authorization mode — the L0–L4 ladder.
  • Projection materialization — lazy/eager; snapshot vs view-time; recording retention.

Core ships sane defaults (L0 open; fork+journal; lazy replication-projection; overlay-before-mutation; execution off) and never hard-codes any of the above.
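Keeping the knobs out of core algorithms suggests a plain, validated configuration object. The sketch below covers only the four presets whose defaults are stated above; `SpacePolicy`, `ALLOWED` and the string spellings of the preset values are hypothetical.

```python
from dataclasses import dataclass

ALLOWED = {
    "authorization_mode": {"L0", "L1", "L2", "L3", "L4"},
    "federation_model": {
        "fork+journal", "vcs-replication+ping", "query-time-graph-join",
        "feed-aggregation", "activity-streams", "engine-mirror",
    },
    "projection_materialization": {"lazy", "eager"},
    "execution": {"off", "sandboxed", "per-shard-allowed"},
}


@dataclass(frozen=True)
class SpacePolicy:
    authorization_mode: str = "L0"
    federation_model: str = "fork+journal"
    projection_materialization: str = "lazy"
    execution: str = "off"

    def __post_init__(self):
        # Reject unknown presets early so core code never branches on typos.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not a known preset")


defaults = SpacePolicy()
stricter = SpacePolicy(authorization_mode="L2", execution="sandboxed")
```

Core algorithms would receive a `SpacePolicy` as an argument and read from it, rather than importing constants of their own.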


11. Concrete module structure (bridge to implementation)

A proposed package layout for src/shard_wiki/, mapping 1:1 to the layers so the dependency rule (downward only; L4 rebuildable) is enforceable by import lint:

src/shard_wiki/
  model/          # L2 top waist: Page, Identity, Placement, ProvenanceEnvelope,
                  #   Span, the page-shape types; capability-spectrum value types
  adapters/       # L1 bottom waist: AdapterContract (versioned iface), CapabilityProfile,
                  #   attachment-mode binding; concrete adapters:
    git/  folder/  gitea/  obsidian/  webdav/  notion/  …   # each: profile + verbs
  coordination/   # L3: GitJournal, OverlayEngine (draft→patch→MR), reconcile
  federation/     # L3: FederationModel strategies (fork_journal, vcs_ping,
                  #   graph_join, feed, activitypub, engine_mirror)
  union/          # L4 (derived): IdentityResolver, EquivalenceGraph, UnionGraph,
                  #   Transclusion (reference-not-copy)
  projection/     # L4 (derived): ReplicationProjection, DerivationProjection,
                  #   ViewRegistry (moldable), QueryIndex (delegate|derive)
  authz/          # L5 cross-cut: PDP, PEP, IdentityProvider iface, NullProvider
  provenance/     # cross-cut: the envelope plumbing used by every layer
  api/            # L6: orchestrator API (server-side union for agents/CLI)

Hard import rules: union/ and projection/ may import model/, adapters/, coordination/ but nothing may import them (they are the disposable middle). model/ and adapters/ import nothing else in the tree except provenance/ (the waists stay thin).
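An import lint for these rules could be as small as a walk over each module's syntax tree. The sketch below uses only the standard-library `ast` module; the names `DERIVED`, `CANONICAL_TIER`, `imported_packages`, and `violations` are hypothetical. It also makes one loud assumption: it checks only that the canonical-tier packages never import the derived ones, and leaves `api/`, `authz/`, and relative imports unchecked.

```python
import ast

# Assumption: the derived (disposable) packages and the packages that must
# never depend on them. Both sets are this sketch's reading of the rule.
DERIVED = {"union", "projection"}
CANONICAL_TIER = {"model", "adapters", "coordination", "federation"}


def imported_packages(source: str) -> set[str]:
    """Return the shard_wiki sub-packages that `source` imports absolutely."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            if node.level or not node.module:
                continue  # relative imports are out of scope for this sketch
            if node.module == "shard_wiki":
                names = [f"shard_wiki.{alias.name}" for alias in node.names]
            else:
                names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            if parts[0] == "shard_wiki" and len(parts) > 1:
                found.add(parts[1])
    return found


def violations(package: str, source: str) -> list[str]:
    """List derived packages that a canonical-tier module wrongly imports."""
    if package not in CANONICAL_TIER:
        return []
    return sorted(imported_packages(source) & DERIVED)
```

Running `violations("coordination", text)` over every file under `coordination/` and failing the build on a non-empty result would be one way to make the rule enforceable rather than advisory.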


12. Canonical data flows (the architecture exercised)

A. Attach a shard. Adapter binds (chosen attachment mode) → probes/declares a capability profile → core registers the shard under a root entity → if not git-native, the coordination journal is seeded (begin-now/mirror/import per axis 5). No union rebuild yet (lazy).

B. Read a page through the union. Consumer asks the union for an identity → Identity resolver maps it to placements across shards → equivalence yields chorus or canonical → replication-projection lazily fetches from each shard (cache + freshness) → page returned wrapped in its provenance envelope → L5 filters anything the principal can't see at source.

C. Edit a read-only / limited shard. Write request → L5 PDP allows → the capability profile reports less than write-through → OverlayEngine records a draft → renders a patch/MR in the shard's native syntax (lossless) or Markdown (lossy-with-report) → on explicit apply, commit to the journal and (if the profile permits) propagate; otherwise the overlay stands as the local truth, fully attributed.
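Flow C can be read as a small decision procedure. The following is a minimal sketch, not the OverlayEngine design: `WriteLevel`, its ordering, the step labels, and `plan_edit` are all hypothetical names chosen for illustration, and the write-through branch is included only to show where flow C stops applying.

```python
from enum import IntEnum


class WriteLevel(IntEnum):
    """Hypothetical ordering of write capability taken from a shard's profile."""
    READ_ONLY = 0
    LIMITED = 1
    WRITE_THROUGH = 2


def plan_edit(
    pdp_allows: bool,
    level: WriteLevel,
    native_render_lossless: bool,
    profile_permits_propagation: bool,
) -> tuple[str, ...]:
    """Return the ordered steps of flow C for one write request."""
    if not pdp_allows:
        return ("deny",)                      # L5 PDP refused the write
    if level >= WriteLevel.WRITE_THROUGH:
        return ("direct-write",)              # outside flow C: no overlay needed
    steps = ["record-draft"]
    if native_render_lossless:
        steps.append("render-native-patch")
    else:
        steps.append("render-markdown-with-loss-report")
    steps += ["await-explicit-apply", "commit-to-journal"]
    if profile_permits_propagation:
        steps.append("propagate-to-shard")
    else:
        steps.append("overlay-stands-as-local-truth")
    return tuple(steps)
```

For a read-only shard whose native syntax can be rendered losslessly, `plan_edit(True, WriteLevel.READ_ONLY, True, False)` ends in `"overlay-stands-as-local-truth"`, which matches the final clause of the flow.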

D. Attach a computational notebook. Adapter declares profile (attachment=file-store, opacity=mixed, computational=captured-output). Core attaches the .ipynb source as canonical; presents cells + embedded outputs as derivation-projection snapshots marked "run N, env unguaranteed"; offers a static render via the view registry; re-execution stays gated off. History uses paired-text/nbdime per axis 5.
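To make flow D concrete, the sketch below reads the embedded outputs of a notebook without executing anything. It relies only on the standard-library `json` module and on the nbformat v4 layout (`cells`, `cell_type`, `execution_count`, `outputs`); the function name `output_snapshots` and the shape of the returned records are hypothetical, not part of any adapter contract.

```python
import json


def output_snapshots(ipynb_text: str) -> list[dict]:
    """Collect embedded outputs of code cells as read-only snapshots.

    Nothing is executed: the outputs are whatever the notebook file
    already carries from an earlier run in an unknown environment.
    """
    notebook = json.loads(ipynb_text)
    snapshots = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code" or not cell.get("outputs"):
            continue  # markdown cells and never-run code cells carry no outputs
        run = cell.get("execution_count")
        run_label = "?" if run is None else str(run)
        snapshots.append(
            {
                "cell": index,
                "label": f"run {run_label}, env unguaranteed",
                "outputs": cell["outputs"],
                "re_executed": False,  # execution stays gated off
            }
        )
    return snapshots
```

The `.ipynb` text itself stays canonical; the returned records are derived and can be dropped and recomputed from the file at any time.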


13. Key tradeoffs & decisions to confirm

Resolved here:

  • Capability spectra over a verb checklist — accept richer contract complexity for precise, uniform degradation. (Decided: spectra.)
  • Derived middle is a cache, not a store — accept recompute cost for rebuildability, provenance, and graceful degradation. (Decided: cache.)
  • Default federation = fork+journal over Git — the home case; other models opt-in. (Decided.)
  • Execution off by default — recognise+project always; execute only when gated on. (Decided.)

Open — to confirm before SHARD-WP-0002 spec-writing finalises:

  1. Union graph persistence. Pure-recompute (simplest, and the strictest reading of I-2) vs a persisted-but-disposable cache (faster, must guarantee rebuild equivalence). Recommendation: persisted-but-disposable with a rebuild that must reproduce it byte-for-byte.
  2. Address scheme. Ship shard-scoped native-id wrapping now and treat a portable tumbler as a later capability, or design the tumbler up front? Recommendation: wrap native ids now.
  3. L1 "attributed-but-open" mode — ship it or jump L0→L2? (Carried from ArchitectureBlueprint.)
  4. Per-page ACL default — off (per-shard/namespace) confirmed; revisit only if demand appears.
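The byte-for-byte requirement in open item 1 only holds if the cache is written in a deterministic form. The sketch below shows one possible check, assuming the union graph can be expressed as plain JSON-compatible data; `canonical_bytes`, `digest`, and `matches_rebuild` are hypothetical names, and a real rebuild would also need to emit lists (for example edges) in a stable order.

```python
import hashlib
import json
from typing import Callable


def canonical_bytes(graph: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal graphs give equal bytes."""
    text = json.dumps(graph, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(graph: dict) -> str:
    """Short fingerprint of the canonical form, handy for logs and comparisons."""
    return hashlib.sha256(canonical_bytes(graph)).hexdigest()


def matches_rebuild(persisted: bytes, rebuild: Callable[[], dict]) -> bool:
    """True when a fresh rebuild reproduces the persisted cache exactly."""
    return persisted == canonical_bytes(rebuild())
```

If `matches_rebuild` ever returns `False`, the persisted copy is the one to discard, since the derived tier is defined as recomputable from the canonical states.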

14. What this architecture is not

  • Not a wiki engine, UI, or rendering pipeline (those are consumers at L6).
  • Not a canonical-source-of-truth — shards keep sovereignty; the middle is derived.
  • Not a generic file-sync daemon — synchronisation is wiki-page-semantic.
  • Not an execution platform — computation is recognised and projected, not hosted.
  • Not a universal ontology — no single schema is imposed on all shards.
  • Not an authentication/identity store — that is delegated (authorization is owned).

15. Traceability

  • INTENT — every invariant in §2 cites an INTENT principle or boundary; no invariant contradicts the Stability Note.
  • Research — §6 (spectra) ← 260614-shard-spectrum-synthesis v3; §8.3 (federation taxonomy) ← v3 §2.5; §8.4–§8.5 (two-axis projection, view registry, computational scope) ← 260614-computational-page-model-synthesis; §7 page shapes ← the engine + modern-tool + computational dives; §1 thesis ← the files-canonical/index-derived through-line across Logseq/ikiwiki/GT/Pharo/Jupyter.
  • Use cases — the architecture is sized to UC-01–UC-84: federation/coordination (UC-01–07, 26–33, 56, 71–72, 79) → §8; attachment/adapter (UC-34–43, 50, 53, 57, 60–62, 64–66, 68–70, 76–82) → §6; page model & fidelity (UC-34, 39, 42, 55, 58–59, 67, 73, 80, 83–84) → §7/§8.5; addressing/identity/query (UC-32, 44–49, 51–52, 54, 63, 74) → §7.2/§8.4; provenance & metadata (UC-24–25, 75) → the provenance rail; collaboration & discovery (UC-08–23) → L6 consumers over the union.
  • Workplans — §6–§8 are the design target of SHARD-WP-0002 (T11–T18); §9 is owned by ArchitectureBlueprint.md; §1 (yawex-derived resolution/overlay) aligns with SHARD-WP-0001.

16. Stability note

This document defines shard-wiki's internal architecture; it may evolve as the spec workplans land. But the thesis (§1), the invariants (§2), and the dual narrow waist (§1, §6, §7) are load-bearing — changing any of them is an architectural change in the sense of INTENT's Stability Note and should be rare and deliberate.