generated from coulomb/repo-seed
Adds O-8 (preset bundles), O-9 (shard sharing vs tenant partition), O-10 (span authz + transclusion ⊕), O-11 (union under unavailability), O-12 (append-log throughput) to §12. Refreshes §14 decisions (event-sourced coordination, conformance, I-2 verification) and §16 traceability (round-2 review + WP-0006). Flips SHARD-WP-0006 done. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
973 lines
66 KiB
Markdown
973 lines
66 KiB
Markdown
# CoreArchitectureBlueprint — shard-wiki
|
||
|
||
Status: **draft for review** · Date: 2026-06-15 · Owner: tegwick
|
||
|
||
The whole-system architecture for shard-wiki, synthesised from `INTENT.md`, the 84-entry
|
||
`UseCaseCatalog.md`, and the full research arc (`research/260608-*`, `research/260613-*`,
|
||
`research/260614-*` — ~23 wiki/knowledge systems plus two cross-dive syntheses). This is the
|
||
**core** blueprint: it defines the layers, the abstractions, and the load-bearing decisions
|
||
that everything else implements.
|
||
|
||
Scope relationship to the other specs:
|
||
|
||
- **`ArchitectureBlueprint.md`** (existing) is the **authorization & history sub-blueprint**
|
||
(the L0–L4 ladder). This document references it as the design of the cross-cutting
|
||
authorization layer (§9) and does not restate it.
|
||
- **`SHARD-WP-0002`** is the workplan that turns §6–§8 into
|
||
`spec/FederationArchitecture.md` + the adapter-contract section of
|
||
`spec/TechnicalSpecificationDocument.md`.
|
||
- **`UseCaseCatalog.md`** is the demand this architecture must satisfy; UC references below
|
||
are load tests, not decoration.
|
||
|
||
---
|
||
|
||
## 1. The thesis: *canonical vs derived* (three states)
|
||
|
||
Everything in shard-wiki follows from one organising decision — that state comes in exactly
|
||
**three kinds**, and only one of them is disposable:
|
||
|
||
> **1. Sharded-canonical** — content owned by each shard (shard sovereignty).
|
||
> **2. Coordination-canonical** — durable state *born inside shard-wiki* that encodes human
|
||
> or cross-shard decisions and exists nowhere else: overlays (the local truth against a
|
||
> read-only shard), curator equivalence bindings, alias tables, merge/reconciliation
|
||
> decisions. It is recorded as an **append-only decision log in the Git coordination
|
||
> journal** (event-sourced, §8.1); the *queryable current form* of that state (the effective
|
||
> alias table, the equivalence set) is a **derived fold** of the log — i.e. tier 3, not tier 2.
|
||
> What is canonical is the **log of decisions**, not any mutable snapshot of them.
|
||
> **3. Derived-disposable** — everything shard-wiki *computes* from (1)+(2): the union graph,
|
||
> equivalence index, query indexes, projections, views. It can be deleted and recomputed.
|
||
>
|
||
> **Canonical = sharded ∪ coordination. Derived = a pure function of canonical:**
|
||
> `derived = f(sharded, coordination)`.
|
||
|
||
This is the architectural form of "orchestrator, not engine." shard-wiki never *becomes* the
|
||
source of truth; it composes sources and records the decisions it makes about them. The
|
||
research earned the *files-canonical* half empirically — every serious system externalises its
|
||
durable truth to files+VCS and treats the rest as derived: Logseq (DataScript index over plain
|
||
files), ikiwiki (static HTML compiled from a git repo), Glamorous Toolkit / Lepiter (live
|
||
views over git-versioned JSON), Pharo (Tonel/Iceberg code as git text), Jupyter teams
|
||
(nbstripout — outputs are derived noise). The one tradition that refuses this — the Smalltalk
|
||
**image** — is exactly the one we record as a *boundary, not a backend*
|
||
(`research/260614-squeak-pharo-deep-dive`).
|
||
|
||
The earlier draft of this blueprint used a two-bucket framing ("canonical at the edges,
|
||
derived in the middle"). That was wrong by omission: it had no home for **coordination-
|
||
canonical** state, and so contradicted itself by listing curator bindings and alias tables as
|
||
"derived/rebuildable" when a human binding manifestly cannot be rebuilt. The three-state model
|
||
fixes that crack (review finding A-1) and makes `derived = f(canonical)` *literally* true.
|
||
|
||
Three consequences fall straight out, and they are the spine of the rest of this document:
|
||
|
||
1. **Graceful degradation is free.** If the derived tier is always recomputable, a backend
|
||
that can only be read is still a first-class participant — you just derive less from it.
|
||
2. **Provenance is tractable.** Because shard-wiki never claims to *be* the source, every
|
||
derived artifact can always point back to the canonical input it came from (union without
|
||
erasure is a structural property, not a feature bolted on).
|
||
3. **The derived tier is a pure function of canonical state.** `derived = f(sharded,
|
||
coordination)`. Bugs in the derived tier are recoverable by recompute; only the two
|
||
canonical tiers must be durably protected — sharded by each shard, coordination by the
|
||
Git journal (history). *Recomputability is a correctness property of the derived tier, not
|
||
a promise that a from-scratch rebuild is operationally cheap — see §8.4 and the
|
||
operational-envelope axis.*
|
||
|
||
### The dual narrow waist
|
||
|
||
Heterogeneity is mediated at exactly two interfaces, and nowhere else:
|
||
|
||
- **Bottom waist — the Shard Adapter Contract (§6).** Every backend, however weird, enters
|
||
through one versioned, capability-described interface.
|
||
- **Top waist — the Wiki Page Model (§7).** Every consumer, however demanding, sees one
|
||
backend-neutral, Markdown-first-but-stretchable page model.
|
||
|
||
Between the waists, core logic is written **once** against capabilities and the page model —
|
||
never against a specific backend. Adding TiddlyWiki or Notion or a git forge is writing an
|
||
adapter and declaring a capability profile, not editing core algorithms.
|
||
|
||
---
|
||
|
||
## 2. Architectural invariants
|
||
|
||
These are non-negotiable. Violating one is a design bug, not a tradeoff. They are INTENT's
|
||
principles fused with the research through-lines.
|
||
|
||
| # | Invariant | Source |
|
||
|---|-----------|--------|
|
||
| I-1 | **Orchestrator, not engine.** Core composes shards; it never replaces or homogenises them. | INTENT Stability Note |
|
||
| I-2 | **Three states; derived = f(canonical).** State is sharded-canonical, coordination-canonical (journal), or derived-disposable. The derived tier (union/index/projection) is a pure, recomputable function of the two canonical tiers; only canonical state is durably protected. | §1; Logseq/ikiwiki/GT through-line |
|
||
| I-3 | **Capability-awareness is data.** A binding's abilities are a *profile* (positions on spectra), read by generic core logic — not per-backend branches. | synthesis v3 §2; INTENT capability-aware adapters |
|
||
| I-4 | **Union without erasure.** Every page/revision/projection/overlay/view carries its provenance, freshness, liveness, and divergence. | INTENT; provenance-granularity spectrum (Wikibase) |
|
||
| I-5 | **Overlay before mutation.** Writes to anything below write-through land as drafts/patches/MRs first; no silent remote mutation. | INTENT |
|
||
| I-6 | **Git-addressable coordination.** Every information space has a Git-backed journal even when its shards are not git-native. | INTENT |
|
||
| I-7 | **Mechanism over policy.** Canonical-source, conflict, editorial, sync cadence are configurable presets, never hard-coded. | INTENT |
|
||
| I-8 | **Graceful degradation.** A limited backend is still usable as read-only / cache / projection / backup / patch target. | INTENT |
|
||
| I-9 | **Identity ≠ placement.** A page is an entity that may occupy N locations; address by identity, not by path. | Trilium note/branch; ZigZag |
|
||
| I-10 | **History is the floor.** Every write is a recoverable commit; recoverability, not gatekeeping, is the baseline protection. | ArchitectureBlueprint §2 |
|
||
| I-11 | **Authorization in core, authentication delegated.** Core decides who-may; an external provider says who-is. | INTENT; ArchitectureBlueprint |
|
||
| I-12 | **Not a file-sync daemon; not an execution platform.** Sync is wiki-page-semantic; computation is recognised+projected, not hosted. | INTENT; computational-page-model synthesis |
|
||
| I-13 | **Tenant-partitioned derived state.** Derived state is partitioned by tenant/root entity; no derived artifact spans tenants except via explicit, authorised cross-root federation. | §9.1; review B-3 |
|
||
|
||
---
|
||
|
||
## 3. The layered architecture
|
||
|
||
```
|
||
┌───────────────────────────────────────────────────────────────┐
|
||
│ L6 Consumers — Orchestrator API · CLI/agents · Web/Obsidian │
|
||
├───────────────────────────────────────────────────────────────┤
|
||
X-cut │ L5 Authorization (PEP/PDP, identity-provider iface) → │ X-cut
|
||
Prove- │ see ArchitectureBlueprint.md (L0–L4 ladder) │ Capa-
|
||
nance ├───────────────────────────────────────────────────────────────┤ bility
|
||
▲ │ L4 Union & Projection (DERIVED · rebuild=fallback) │ ▲
|
||
│ │ identity resolution · equivalence/chorus · union graph · │ │
|
||
│ │ replication+derivation projections · moldable view registry│ │
|
||
│ │ · derived query index │ │
|
||
│ ├───────────────────────────────────────────────────────────────┤ │
|
||
│ │ L3 Coordination (Git journal · overlay/patch engine · │ │
|
||
│ │ federation-model strategies · reconciliation) │ │
|
||
│ ├───────────────────────────────────────────────────────────────┤ │
|
||
│ │ L2 Wiki Page Model ── TOP WAIST ── │ │
|
||
│ │ backend-neutral pages · identity≠placement · span address ·│ │
|
||
│ │ provenance envelope · the page shapes │ │
|
||
│ ├───────────────────────────────────────────────────────────────┤ │
|
||
│ │ L1 Shard Adapter Contract ── BOTTOM WAIST ── │ │
|
||
│ │ versioned iface · capability profile (orthogonal) · │ │
|
||
│ │ attachment-mode binding · operation verbs │ │
|
||
└──── ├───────────────────────────────────────────────────────────────┤ ──┘
|
||
│ L0 Backends (not ours): git repos, wiki/ subdirs, Gitea/ │
|
||
│ GitLab/GitHub wikis, folders, Obsidian, WebDAV, Notion, │
|
||
│ Coulomb spaces, notebooks, … │
|
||
└───────────────────────────────────────────────────────────────┘
|
||
```
|
||
|
||
**Provenance** and **Capability** are drawn as vertical rails because they are not layers —
|
||
they are present at every layer. A page object at L2 carries provenance; a projection at L4
|
||
carries provenance; an authz decision at L5 records the context under which content was read.
|
||
Likewise a capability profile declared at L1 is consulted at L3 (can we write-through?), L4
|
||
(can we delegate a query?), and L5 (can this principal even reach the op?).
|
||
|
||
The dependency rule is strict and downward, and it tracks the **three states (§1)**, not the
|
||
layer numbers: **the derived-disposable tier (the whole of L4) may be deleted and recomputed
|
||
from canonical state (sharded content at L1 + coordination-canonical state in the L3 journal).**
|
||
Nothing canonical may depend on derived state. Note the journal at L3 is *canonical* (it holds
|
||
overlays, bindings, aliases, merges); only L4 is disposable.
|
||
|
||
---
|
||
|
||
## 4. Core abstractions (the vocabulary code must use)
|
||
|
||
Straight from INTENT, sharpened by research. New code maps onto these; it does not invent
|
||
parallel terms.
|
||
|
||
- **Shard** — an independently meaningful page store attached to a root entity, with
|
||
*sovereignty*: its own backend, capability profile, history, identity model, limits.
|
||
- **Root entity / information space** — the joined space shards attach to; the unit of
|
||
Git coordination and of multi-tenancy (a tenant maps to a root entity, ArchitectureBlueprint).
|
||
- **Shard adapter contract** — the versioned L1 interface; the bottom waist.
|
||
- **Capability profile** — a shard binding's position on each of the 15 spectra (§6) plus its
|
||
supported verbs. *The* data structure that drives degradation.
|
||
- **Wiki page model** — the L2 backend-neutral page; the top waist.
|
||
- **Page identity vs placement vs equivalence** — a page is an entity with a *stable handle*
|
||
(identity); it may have N placements (paths/shards); **addressing and transclusion key on
|
||
identity, but equivalence keys on content fingerprint *across* identities** (§7.2, I-9). The
|
||
three are distinct mechanisms — never conflate identity with a fingerprint.
|
||
- **Provenance envelope** — the metadata each artifact carries (source shard, freshness,
|
||
liveness, authz context, overlay status, divergence, lineage), stored **layered**: a
|
||
page-level envelope + span-level *deltas*, so per-span cost is near-zero when uniform (§7.3).
|
||
- **Coordination journal** — the L3 Git-backed, **append-only decision log** for a space: the
|
||
durable home of all **coordination-canonical** state (§1, §8.1) as *events* (overlay-created,
|
||
binding-made, alias-set, merge-decided), plus the content change-flow record. It is event-
|
||
sourced — committed, never overwritten; the queryable current coordination state is a derived
|
||
fold of it (§8.1).
|
||
- **Overlay** — a non-destructive local edit against a remote/read-only/limited shard,
|
||
representable as draft/patch/commit/MR before destructive apply. Coordination-canonical: an
|
||
unapplied overlay is the local truth and lives in the journal.
|
||
- **Projection** — a derived view of shard content. The default is a **plain lazy
|
||
replication-projection** (a freshness-stamped cache); only *source* content needing
|
||
transform/evaluate uses the **derivation-projection** extension point with its two-axis
|
||
typing (kind × liveness) and the moldable view registry (§8.4–§8.5).
|
||
- **Federation model** — the selected coordination strategy for a space (§ taxonomy, T17).
|
||
- **Shard mode** — read-only · write-through · mirrored · projected · cached · canonical
|
||
(a *policy* selection constrained by the capability profile).
|
||
|
||
---
|
||
|
||
## 5. Why "layered" and not "pipeline" or "plugin-bus"
|
||
|
||
Two rejected alternatives, recorded so the choice is legible:
|
||
|
||
- **A sync pipeline** (source → transform → sink) was rejected: it implies a privileged
|
||
direction and a canonical sink, which violates shard sovereignty (I-1) and union-without-
|
||
erasure (I-4). shard-wiki is a *star* (many shards ↔ one space), not a pipe.
|
||
- **A flat plugin bus** (every backend a peer plugin emitting events) was rejected as the
|
||
*top-level* shape: it has no narrow waist, so heterogeneity leaks into every consumer.
|
||
We keep the plugin idea but confine it to L1 (adapters) and L3 (federation strategies),
|
||
behind the waists.
|
||
|
||
The layered-with-rails shape is what makes I-2/I-3/I-4 hold simultaneously.
|
||
|
||
---
|
||
|
||
## 6. Bottom waist — the Shard Adapter Contract (L1)
|
||
|
||
The single most important design decision in the project: **the adapter contract models
|
||
positions on capability spectra, not a flat checklist of boolean verbs.** A backend is not
|
||
"can/can't merge"; it sits *somewhere* on the merge spectrum, and federation operations
|
||
degrade by position. This is the lesson of putting ~23 systems in one matrix
|
||
(`research/260614-shard-spectrum-synthesis`, v3).
|
||
|
||
### 6.1 The fifteen capability spectra
|
||
|
||
Each binding declares a position on each axis. Core algorithms read these positions; there is
|
||
no per-backend code in core (I-3).
|
||
|
||
1. **Addressing granularity** — none → path → page-level store-id → in-file span → in-file
|
||
block id (Logseq `id::`) → store-UUID → portable tumbler (Xanadu, the unreached ideal)
|
||
2. **Content identity** — none → path/title → fingerprint → span-set
|
||
3. **Identity vs placement** — path=identity → identity separated from placement (Trilium
|
||
note/branch = a DAG)
|
||
4. **Structure** — flat MD → frontmatter/`key::` → `%META%` → typed objects → DB schema+
|
||
relations → object-graph/ontology → computed (inherited+templated) → typed-graph statements
|
||
5. **History** — none → internal-only / CRDT-log → open-file → git-native
|
||
6. **Merge model** — none → git/text → conflict-notes/keep-both → native-CRDT → coexist-with-rank
|
||
7. **Native query** — none → text → build-your-own derived index → datalog/graph → DB query → SPARQL
|
||
8. **Translation** — native → lossless → lossy-with-fidelity-report (incl. HTML)
|
||
9. **Attachment mode** — file-store (native | interchange-mirror) → git-IS-store → in-engine-host
|
||
→ local-REST → external-API → direct-DB → CRDT-replica → P2P/no-central-endpoint
|
||
10. **Operational envelope** — local/unbounded → realtime CRDT/WebSocket → rate-limited/
|
||
eventually-consistent/paginated
|
||
11. **Access grant** — open → token → OAuth scoped+revocable → P2P key/invite → enterprise ACL
|
||
12. **Content opacity** — plaintext → structured re-evaluable value → encrypted whole-shard →
|
||
per-item → proprietary-lossy-exportable
|
||
13. **Write granularity** — whole-file (TiddlyWiki) → per-page → section/anchor → per-block → story-item
|
||
14. **Provenance granularity** — per-shard → per-page → per-edit → per-statement/value (Wikibase rank+refs)
|
||
15. **Computational / liveness** — static source → captured-output snapshot → live-over-files →
|
||
view-time render → irreducibly-live/temporal
|
||
|
||
### 6.2 Operation verbs
|
||
|
||
`read, write, diff, merge, lock, version, publish, notify, transclude-source,
|
||
translate-syntax, structured-payload, derive-projection, execute/evaluate`. The last two are
|
||
**gated, off by default** (§8, computational content). Verb support is part of the profile and
|
||
must reconcile with the federation-ops capability matrix (SHARD-WP-0002 T10).
|
||
|
||
### 6.3 Attachment-mode taxonomy (axis 9, expanded)
|
||
|
||
A backend may offer **several** modes; attach mode is a **per-binding, capability-gated
|
||
choice**, with one declared authoritative. Modes: file-store (native vault/folder *or* an
|
||
interchange/sync mirror), **git-IS-store** (the home case — forge wikis & ikiwiki: git is the
|
||
store *and* the journal at once, resolving the engine-mirror write-race), in-engine hosted
|
||
adapter (XWiki component, Obsidian/Logseq/Roam plugin, Trilium script), local-REST (Joplin
|
||
Data API, Trilium ETAPI), external-API-only (Notion), direct-DB (MojoMojo schema→model),
|
||
CRDT-replica (Anytype/AFFiNE/AppFlowy), P2P/no-central-endpoint. **Boundary:** a monolithic
|
||
live-memory blob (Smalltalk image, a kernel) is **never** an attach target — it participates
|
||
only via export→files (I-12).
|
||
|
||
### 6.4 Contract rules
|
||
|
||
- **Versioned interface** (Foswiki::Store + Foswiki::Meta is the proof that a stable
|
||
store-interface-with-swappable-backends works). Capability discovery is a static profile
|
||
with optional runtime negotiation.
|
||
- **Backend-swap tolerance** — shard identity/provenance survives a substrate change
|
||
(RCS↔PlainFile, folder→Git, Logseq file→SQLite): bind to *capabilities*, not to "it's files."
|
||
- **Absence is first-class** — the profile must express *can't* cleanly (Oddmuse floor), so
|
||
degradation paths are explicit, never guessed.
|
||
|
||
### 6.5 Orthogonal core, implied positions, and the interaction subset
|
||
|
||
Fifteen independent ordinal axes is *descriptively* right but would be *operationally* a mess
|
||
if treated as fifteen free dimensions: the axes are **not orthogonal**, and a degradation
|
||
function over all 15 jointly is the flat-checklist problem returning in higher dimensions
|
||
(review D-1). Three rules tame it.
|
||
|
||
**(a) A small orthogonal core; the rest are implied.** Most axes are *correlated* and collapse
|
||
to a few independent choices. The **core axes** an adapter must independently declare:
|
||
|
||
1. **Substrate** → drives attachment-mode, history, merge, and native-query positions together
|
||
(git-IS-store ⟹ history=git-native ⟹ merge=git/text ⟹ query=build-your-own-index;
|
||
relational-DB ⟹ direct-DB attach ⟹ DB-version-row history ⟹ DB query).
|
||
2. **Write granularity** → drives addressing granularity and the overlay/patch shape.
|
||
3. **Content opacity** → drives translation and (where encrypted) collapses native-query.
|
||
4. **Operational envelope** → drives freshness mode (§8.8) and rebuild expectations (§8.7).
|
||
5. **Access grant** → independent (authz, L5).
|
||
6. **Computational/liveness** → independent (projection kind, §8.5).
|
||
|
||
The remaining axes are **implied/derived** from these via published implication rules; an
|
||
adapter *may* override an implied position, but the default is computed, not hand-set. This
|
||
turns ~15 free dimensions into ~6 independent ones plus derivations — fewer things to get
|
||
wrong, and impossible combinations become unrepresentable.
|
||
|
||
**(b) Implication rules forbid impossible profiles.** E.g. `attachment=git-IS-store ⟹
|
||
history≥git-native`; `opacity=encrypted-whole-shard ⟹ native-query=none ∧ translation≤opaque`;
|
||
`merge=native-CRDT ⟹ history=CRDT-log ∧ envelope=realtime`. A profile that violates an
|
||
implication is rejected at registration — capability-as-data (I-3) with integrity constraints.
|
||
|
||
**(c) The degradation function reads a *named, small* interaction subset — not all pairs.**
|
||
"No per-backend code" is only credible if we say *which* axis interactions the generic logic
|
||
actually consults. They are:
|
||
|
||
| Operation | Axes consulted (jointly) |
|
||
|-----------|--------------------------|
|
||
| **write / overlay-apply** | write-granularity × merge-model × history × access-grant |
|
||
| **transclude / address a span** | addressing-granularity × write-granularity × identity-vs-placement |
|
||
| **project / cache** | operational-envelope × computational-liveness × content-opacity |
|
||
| **query** | native-query × content-opacity (encrypted ⇒ derive-index-or-none) |
|
||
| **translate** | translation × content-opacity × structure |
|
||
| **federate** | substrate × history × merge (per the §8.3 model) |
|
||
|
||
Everything else is a single-axis check. This table *is* the degradation contract: it is small,
|
||
enumerated, and testable — the proof obligation behind "core logic written once."
|
||
|
||
### 6.6 Conformance — profiles are verified, never self-asserted
|
||
|
||
Capability-as-data (I-3) and the entire degradation contract (§6.5) rest on one assumption:
|
||
**the profile tells the truth.** If an adapter declares `merge=git/text` but corrupts merges,
|
||
or claims `notify` and never emits, it silently poisons every degradation decision in core —
|
||
the failure is invisible because core *believed the data* (review B-2). So the profile is not
|
||
taken on trust:
|
||
|
||
- **The contract ships a versioned conformance suite.** A published battery that, given a live
|
||
binding, **exercises each declared verb and each declared spectrum position and checks that
|
||
observed behaviour matches the claim** (a `write` round-trips; a `diff` is real; `notify`
|
||
actually fires; an "encrypted/opaque" shard genuinely refuses plaintext query; an
|
||
implication-rule position, §6.5(b), holds). The suite is versioned *with* the contract, so an
|
||
adapter proves conformance against a known contract version.
|
||
- **Passing conformance is an admissibility precondition.** A binding that fails (declares a
|
||
capability it does not honour) is **rejected at registration**, not run in production with a
|
||
lying profile. Capability discovery (§6.4) therefore yields a *verified* profile.
|
||
- **Self-reported, then verified.** Adapters still *declare* their profile (discovery stays
|
||
cheap); conformance *verifies* the declaration. The two together are what make I-3 and §6.5
|
||
sound rather than aspirational — degradation logic acts on verified data.
|
||
- **Mismatch is data, not a crash.** A conformance gap is reported as a precise
|
||
capability-by-capability diff (what was claimed vs observed), so an adapter author fixes the
|
||
profile or the code; degraded-but-honest registration (drop the unsupported claim) is allowed.
|
||
|
||
This is the same discipline a versioned store interface needs in general (the `Foswiki::Store`
|
||
lineage that inspired the contract): a backend may only participate behind the interface if it
|
||
*demonstrably* behaves as the interface says.
|
||
|
||
---
|
||
|
||
## 7. Top waist — the Wiki Page Model (L2)
|
||
|
||
Backend-neutral, **Markdown-first but stretchable many ways at once**. The page model is the
|
||
lingua franca every consumer sees; an adapter's job is to project its backend into this model
|
||
(read) and accept overlays back (write), within its capabilities.
|
||
|
||
### 7.1 Page shapes the model must carry
|
||
|
||
- **Prose Markdown** — the baseline.
|
||
- **Typed / computed records** — frontmatter/`%META%`/XObjects/Notion DB rows; **computed
|
||
metadata** (Trilium inherited+templated) represented as *effective-vs-own with per-attribute
|
||
provenance*.
|
||
- **Typed-graph statements** — Wikibase claim + qualifiers + references + rank (structure
|
||
far-end).
|
||
- **Inline-embedded objects** — Quip/Notion spreadsheets & live apps inside prose.
|
||
- **Non-Markdown assets** — drawings, canvases, images: typed asset / opaque blob / pluggable
|
||
content-type registry, never silent-flattened.
|
||
- **The four computational shapes** (§8): one-source-many-projections, notebook (embedded
|
||
computed output), program-as-page, live/temporal.
|
||
|
||
All shapes reduce to a common skeleton: **`(content | source, structure, provenance envelope,
|
||
[derivation rule])`**. The page model stores the richest faithful form as canonical and treats
|
||
any Markdown rendering of a non-Markdown shape as a *lossy projection* (I-4 + fidelity report).
|
||
|
||
### 7.2 Identity, placement, addressing — three distinct concepts
|
||
|
||
The earlier draft used "identity" for two different things and (worse) suggested deriving page
|
||
identity from a content fingerprint — which would make *editing a page change its identity* and
|
||
break every reference to it (review bug B-1). They are pulled apart here:
|
||
|
||
- **Page identity — a *stable handle*.** A shard-scoped, durable key that **survives edits**:
|
||
the backend's native page/note id where one exists (Roam/Notion/Trilium uid, a git path
|
||
treated as a name, a wiki page name), wrapped in a shard scope so it survives projection and
|
||
never collides across shards. Identity is *assigned/minted, not computed from content*.
|
||
References, placement, transclusion targets, and overlays all key on identity.
|
||
- **Placement — *where* an identity sits.** One identity → N placements (paths/shards) = a DAG;
|
||
no single canonical path (I-9). Placement can change without changing identity.
|
||
- **Content equivalence — *detecting sameness*, never identity.** A **content fingerprint** (or
|
||
span-set overlap) identifies a *version / a piece of content*, used to detect that two
|
||
*distinct identities* hold the same or derived content (the equivalence/chorus mechanism,
|
||
§8.4). A fingerprint is never a page's identity: same page, edited → new fingerprint, **same
|
||
identity**; two pages, identical content → same fingerprint, **different identities**.
|
||
- **Span addressing** — a sub-page address within an identity: adopt native span IDs where
|
||
minted (Roam `:block/uid`, Logseq `id::`, Notion/CRDT UUID); else a *position* address
|
||
(path+range) or a *content-fingerprint* address for equivalence/transclusion. The Xanadu
|
||
tumbler is the portable ideal the scheme aims at without requiring.
|
||
- **Provenance envelope** rides on pages and spans (see §7.3 for its layered, low-cost form).
|
||
|
||
So the chain is: **identity (stable) → placements (N, mutable) → equivalence (cross-identity
|
||
sameness, fingerprint-based)** — three concepts, three mechanisms, never conflated.
|
||
|
||
### 7.3 Provenance is layered, not per-span-duplicated
|
||
|
||
A provenance envelope on *every span* (source shard, freshness, liveness, overlay status,
|
||
authz context, divergence, lineage) would, at block granularity, mean ~10k near-identical
|
||
envelopes for a 10k-block page — provenance dwarfing content (review D-2). The fix is the exact
|
||
pattern the page model already uses for Trilium's computed metadata: **effective-vs-own**.
|
||
|
||
- **Page-level envelope** holds the values that are uniform across the page (almost always:
|
||
source shard, observed-at, liveness, authz context).
|
||
- **Span-level deltas** record *only where a span differs* from its page envelope — a
|
||
transcluded span from another shard, an overlaid span, a span that diverges. A span with no
|
||
delta inherits the page envelope at zero storage cost.
|
||
- **Effective provenance** for any span = page envelope ⊕ span delta, computed on read.
|
||
|
||
Per-span cost is therefore **near-zero in the common (uniform) case** and pays only for genuine
|
||
heterogeneity — the same "carry only the difference" principle, applied to shard-wiki's own
|
||
metadata. Provenance remains complete (I-4); it is just not redundantly materialised.
|
||
|
||
---
|
||
|
||
## 8. Coordination, federation & projection
|
||
|
||
### 8.1 Coordination journal (L3) — Git as the spine
|
||
|
||
Every information space has a Git-backed coordination journal (I-6). It records cross-shard
|
||
operations (fork, import, reconcile, overlay-apply, space-branch) and **is** the history floor
|
||
(I-10). For git-IS-store shards the shard's own git log *is* this journal; for non-git shards
|
||
the journal supplements (begins-now / mirrors-forward / snapshots-replica) or imports
|
||
(backfill open file history). History portability is a spectrum, handled per profile (axis 5).
|
||
|
||
**The journal is an append-only decision log; current coordination state is a derived fold
|
||
(review B-3).** The first draft said coordination-canonical state "lives in the journal"
|
||
without saying how Git — excellent for history, poor for mutable structured state — represents
|
||
an alias table or an equivalence graph. Resolution: **event sourcing.** The journal stores
|
||
*decisions as events* (`overlay-created`, `binding-made`, `alias-set`, `merge-decided`,
|
||
`page-forked`), append-only and git-addressable (so history/patch/review/backup over
|
||
coordination state come for free — I-6 is *strengthened*, not bypassed). The **queryable
|
||
current state** (the effective alias table, the live equivalence set) is a **derived fold** of
|
||
the log — tier-3 disposable, indexed like any other derived structure (§8.7), rebuilt by
|
||
replaying the log. So "all equivalences touching X" is an index lookup, not an O(scan) of Git.
|
||
This is the clean form of the §1 three-state model: **the log is canonical; its folded current
|
||
state is derived.**
|
||
|
||
**Concurrency: who may append (review B-1).** A multi-tenant L4 deployment runs several
|
||
orchestrator instances, so "the journal is local Git, single writer" is not given. The model:
|
||
|
||
- **One *append authority* per information space.** Appends to a space's log are serialized
|
||
through a single logical writer (a per-space lease/leader; instances without the lease forward
|
||
their append intents to it). This makes the log a **totally-ordered event sequence** per space
|
||
— the ordering authority §8.6 relies on — without a distributed transaction. Spaces are
|
||
independent, so this scales horizontally *across* spaces (the unit of partition is the space /
|
||
root entity, matching the tenant partition, I-13); it is a per-space serialization point, not
|
||
a global one.
|
||
- **Git is the durable, addressable form; appends are commits** (or fast objects batched into
|
||
commits) under the lease — no concurrent-writer merge races because there is one writer at a
|
||
time per space.
|
||
- **Read-your-writes** holds within a space because every reader resolves current state from
|
||
the same ordered log (or its fold); across spaces there is no shared state to be inconsistent.
|
||
- **HA / failover:** the lease is time-bounded and re-grantable; a failed append-authority is
|
||
replaced and resumes from the log's head (the log is the recovery point). A partition that
|
||
splits the authority degrades that *space* to read-only until a single writer is re-elected —
|
||
it never forks the log (availability yields to log integrity; an explicit, stated trade).
|
||
- **Open residual (→ §12, O-3-adjacent):** whether very high append rates need per-space log
|
||
*sharding* (sub-logs merged by a deterministic order) is an implementation spike, not an
|
||
architectural change.
|
||
|
||
**History must stay recoverable *and* bounded (review C-3).** "Every write is a commit" + open
|
||
L0 means an unbounded, bot-/vandalism-amplified journal that eventually degrades Git itself.
|
||
Recoverability (I-10) is non-negotiable, so the answer is *compaction, not deletion*:
|
||
|
||
- **Routine git maintenance** — background `gc`/repack, commit-graph, and (for very large
|
||
spaces) partial-clone / sparse strategies; operational, no semantic change.
|
||
- **Squash-compaction of low-value churn (policy, §10)** — long runs of rapid same-author
|
||
edits or revert-pairs can be folded into checkpoint commits *while preserving the recoverable
|
||
endpoints*; what is squashed is configurable and always leaves the content recoverable (it
|
||
compacts the *path*, not the *reachable states*).
|
||
- **Per-shard history offload** — a git-IS-store shard keeps its own history in its own repo;
|
||
the coordination journal references it rather than duplicating it (the journal records
|
||
*coordination* events, not a second copy of every shard commit).
|
||
- **Anti-abuse hooks (policy)** — rate-limiting / quarantine for anonymous L0 writers feed the
|
||
authz/policy layer; they throttle *abuse*, never legitimate history. Recoverability is the
|
||
floor; bounding is how it survives at scale.
|
||
|
||
### 8.2 Overlay / patch engine (L3)
|
||
|
||
The default write path for anything below write-through capability (I-5): an edit becomes a
|
||
draft → patch/commit → MR, applied destructively only on explicit intent and only where the
|
||
profile + policy both permit. This is what lets a read-only or rate-limited or lossy backend
|
||
still be *edited* safely.
|
||
|
||
### 8.3 Federation is plural & composable (L3) — the model taxonomy
|
||
|
||
Federation is not one mechanism. shard-wiki selects a **federation model per space and
|
||
composes per shard** (mechanism over policy, I-7):
|
||
|
||
| Model | Anchor | Coordination shape |
|
||
|-------|--------|--------------------|
|
||
| **Fork + journal** (default home case) | Federated Wiki | copy-with-provenance + per-page action journal (story = replay) |
|
||
| **VCS-replication + ping** | ikiwiki | git clone/pull/push + change-ping |
|
||
| **Query-time graph-join** | Wikibase SPARQL `SERVICE` | join remote graphs at query time, no copy |
|
||
| **Feed aggregation** | RSS/Atom | inbound feed → pages |
|
||
| **Activity streams** | ActivityPub | Create/Update events, notify or content-bearing |
|
||
| **Engine-mirror** | Wiki.js DB↔Git | engine syncs its own store to a git mirror |
|
||
|
||
### 8.4 Union & projection (L4) — the derived cache
|
||
|
||
This whole layer is **derived-disposable**: recomputable from canonical state — sharded
|
||
content + the **coordination-canonical** inputs in the journal (I-2). Crucially, the *automatic*
|
||
equivalence results are derived, but the **human/curatorial inputs they consume — alias tables
|
||
and curator equivalence bindings — are coordination-canonical (they live in the journal), not
|
||
derived**; recompute reads them, never regenerates them. It comprises:
|
||
|
||
- **Identity resolution & equivalence** — detect "same topic / derived content" path-
|
||
independently from *derived* signals (content fingerprint, span-set overlap) **plus** the
|
||
*coordination-canonical* inputs (alias table, curator binding); present as
|
||
**chorus-of-voices** or designated-canonical (a *policy* preset). (Scaling: §8.7.)
|
||
- **Union graph** — the navigable join of pages, links, and dimensions (namespace, genealogy,
|
||
version, shard, equivalence). A *derived lens over canonical files+journal, never a new
|
||
store* (the ZigZag boundary).
|
||
- **Transclusion** — one **reference-not-copy** primitive unifying Xanadu transclusion, ZigZag
|
||
clone, Roam/Obsidian/Logseq embed, Notion synced block, Trilium note-cloning, and literate
|
||
named-chunk assembly, over the addressable union.
|
||
- **Projection — trivial by default, extensible for the tail.** The 95% case (Markdown in a
|
||
shard) must cost nothing conceptually, so:
|
||
- **Default = plain lazy replication-projection** — a freshness-stamped cache of remote
|
||
content (§8.8). This is *the* projection for ordinary pages; it needs no taxonomy, no
|
||
liveness reasoning, no registry. Most shards never touch anything below.
|
||
- **Extension point — derivation-projection** — invoked *only* for content that is a
|
||
*source* needing transform/compile/weave/evaluate (computational/typed content, §8.5). It
|
||
adds the liveness axis (static → captured → live-over-files → view-time → irreducibly-live)
|
||
and facets (materialization timing, multiplicity, continuity); the irreducibly-live far end
|
||
has no faithful static form (source + a marked recording). A binding that never serves such
|
||
content never instantiates any of this.
|
||
- Both kinds stamp freshness + provenance; only derivation carries the liveness machinery.
|
||
- **Moldable view registry — also an extension point, not a tax on every page.** Where a content
|
||
type offers multiple co-equal views (typed/computed/dimensional content), they are registered
|
||
as an **open, type-keyed set, none canonical-by-fact** (display-canonical is policy; GT prior
|
||
art, answers the "pluggable content-type registry" question). An ordinary Markdown page has
|
||
exactly one view and never consults the registry — the registry is queried only when a type
|
||
declares >1 view.
|
||
- **Derived query index** — delegate to a shard's native query engine where present
|
||
(Roam/Logseq Datalog, Notion DB query, XWiki XWQL, Wikibase SPARQL); else build a derived
|
||
index over the projection (the Logseq DataScript-over-files pattern). The index is
|
||
disposable (I-2).
|
||
|
||
### 8.5 Computational / executable content — the scope decision
|
||
|
||
**In scope as a page-model + projection concern; out of scope as an execution platform.**
|
||
shard-wiki *recognises* computational types, attaches the **canonical source**, and presents
|
||
derived forms as **provenance- and liveness-marked projections**. Driving a derivation
|
||
(tangle/weave, re-execute a notebook, render a sketch, evaluate a pattern) is a **gated
|
||
capability, off by default, with a trust/sandbox concern, degrading to a captured snapshot**.
|
||
One snapshot-provenance record (run id, source rev, timestamp, environment "unguaranteed")
|
||
serves notebooks, renders, and recordings alike. **No INTENT amendment is required** — this
|
||
lives inside the existing page model (L2) and projection model (L4).
|
||
|
||
### 8.6 Consistency, concurrency & conflict model
|
||
|
||
INTENT makes real-time cross-shard consistency a non-goal — but "no strong consistency" is not
|
||
the same as "no defined consistency." This is the guarantee shard-wiki *does* offer, and the
|
||
mechanism (not policy) that makes concurrent editing safe (review bug B-2).
|
||
|
||
**The consistency guarantee — causal, anchored on the journal:**
|
||
|
||
- **Read-your-writes for coordination-canonical state.** Once an overlay/binding/merge is
|
||
appended to the space's decision log, every reader of that space sees it — because the log is
|
||
a **single totally-ordered sequence per space** (one append authority, §8.1), and all readers
|
||
resolve current state from that one order. The guarantee holds across orchestrator instances,
|
||
not just within one process; it is cheap because ordering is per-space, never global.
|
||
- **Causal consistency across the derived tier.** The union/index/projections reflect a causal
|
||
cut of `(sharded inputs seen so far, journal)`. Effects never appear before their causes; a
|
||
projection that has seen journal commit *C* has seen everything *C* depends on.
|
||
- **Eventual convergence for sharded-canonical inputs.** Remote shard content is pulled
|
||
asynchronously (lazily or by notify/poll, §8.7); the union converges to each shard's latest
|
||
*as observed*, bounded by the shard's operational envelope. Freshness is always *shown*
|
||
(provenance envelope), never faked — a stale projection is labelled stale, not wrong.
|
||
|
||
So: **strong + read-your-writes for what shard-wiki owns (the journal); causal for what it
|
||
derives; eventual + freshness-labelled for what shards own.** No global clock, no distributed
|
||
transaction, no two-phase commit across shards — none is needed, because shard-wiki coordinates
|
||
rather than controls.
|
||
|
||
**Conflict detection & representation is core mechanism; only resolution is policy (I-7).**
|
||
The split the earlier draft elided:
|
||
|
||
- **Detection (core).** Divergence is detected structurally: two identities resolve as
|
||
equivalent (§8.4) but their content fingerprints differ, or an overlay's base revision no
|
||
longer matches the shard's current revision. Detection is always on; it is never optional.
|
||
- **Representation (core).** A detected conflict is **first-class data in the union**, not an
|
||
error: equivalent-but-divergent pages are presented as a **coexisting set** (the
|
||
chorus/keep-both representation), each fully attributed, with the divergence recorded in the
|
||
provenance envelope (union without erasure — a conflict is information, not a failure).
|
||
- **Resolution (policy).** *Which* version wins, or whether they stay coexisting, is a
|
||
configurable preset (§10): chorus / designated-canonical / git-merge / vote-to-merge /
|
||
overlay-only. Core never hard-codes one.
|
||
|
||
**Overlay-apply under source drift (the concurrent-write case).** An overlay carries the
|
||
**base revision** of the shard content it was authored against. On apply, core compares base to
|
||
the shard's current revision:
|
||
|
||
- *unchanged* → apply (fast-forward), commit to journal, propagate if the profile permits;
|
||
- *changed, non-overlapping* → three-way merge where the merge capability allows (axis 6),
|
||
else keep-both;
|
||
- *changed, overlapping* → **refuse + re-present** as a conflict (above); never silently
|
||
clobber (I-5, no silent remote mutation). The unapplied overlay remains coordination-
|
||
canonical and valid against its base.
|
||
|
||
**Ordering.** The journal commit is the ordering authority for coordination-canonical effects;
|
||
a shard-native write is only *acknowledged* in the journal after the adapter confirms it, so a
|
||
crash between journal-intent and shard-write is recoverable (the intent is replayable, the
|
||
write is idempotent-keyed on identity+base-rev). Cross-shard operations are ordered by their
|
||
journal commits, giving the causal cut above.
|
||
|
||
**Residual open items** (tracked in *Known scaling risks & open problems*, §12, not pretended
|
||
solved): the exact convergence bound for
|
||
high-write CRDT shards under partition, and whether per-equivalence-set divergence needs a
|
||
vector clock vs. a simple base-rev comparison, are deferred to implementation spikes.
|
||
|
||
### 8.7 Scaling the union — incremental-first, rebuild as fallback
|
||
|
||
The derived tier is *recomputable* (I-2) but recompute must never be the **operational**
|
||
mechanism. A from-scratch rebuild reads every page of every shard — including rate-limited,
|
||
paginated external APIs (Notion) and irreducibly-live sources — which can take hours to days
|
||
and directly fights the operational-envelope axis (review C-2). So:
|
||
|
||
**Incremental, change-driven maintenance is the primary mechanism.** Each shard's `notify`
|
||
capability (or a poll/ETag fallback where it has none, §8.8) emits **change events**; an event
|
||
drives a **delta update** to exactly the affected union nodes, equivalence candidates, indexes,
|
||
and projections. The derived tier is a continuously-maintained materialised view, not a
|
||
periodically-recomputed one. Steady-state cost is O(changes), not O(corpus).
|
||
|
||
**Full rebuild is a rare, bounded fallback** — for cold start, schema/algorithm change, or
|
||
suspected corruption — and it is **explicitly not required to be cheap**. It respects each
|
||
shard's envelope (it may be slow, throttled, or resumable for a rate-limited shard) and runs
|
||
*concurrently with serving the existing derived tier*; it swaps in atomically on completion.
|
||
I-2 guarantees rebuild is *possible and correct*, not instant.
|
||
|
||
**Equivalence detection is indexed, not pairwise (review C-1).** Naive fingerprint/span-set
|
||
comparison across all pages of all shards is O(N²) and is forbidden. Instead:
|
||
|
||
1. **Blocking / candidate generation** — cheap keys bucket pages that *could* be equivalent:
|
||
normalised title, normalised path tail, explicit alias-table entries (coordination-
|
||
canonical), and **MinHash/LSH bands over content shingles** for near-duplicate and
|
||
derived-content detection. Only within-bucket pairs are considered — turning O(N²) into
|
||
≈O(N) candidates.
|
||
2. **Verification** — candidate pairs are confirmed by full fingerprint / span-set overlap and
|
||
any curator binding. Confirmed equivalences become union edges.
|
||
3. **Incremental maintenance — the delta is *not* additive (review B-4).** A changed page may
|
||
*leave* buckets as well as *enter* them, and leaving a bucket can **break an existing
|
||
equivalence edge** another page relied on. So a change is processed as: (i) recompute the
|
||
page's bucket membership; (ii) for buckets it **left**, re-verify the pairs that depended on
|
||
the shared bucket and **retract** edges no longer supported; (iii) for buckets it **entered**,
|
||
verify the new candidate pairs and **add** edges; (iv) **propagate** to the equivalence
|
||
neighbours of any retracted/added edge (equivalence is transitive-ish via chorus sets, so a
|
||
retraction can split a set). Maintenance is per-change and bounded by the page's
|
||
neighbourhood, but it covers retraction and propagation — not just additions.
|
||
|
||
**The index is itself derived** (disposable, recomputable) and per-tenant-partitioned (§9).
|
||
Its parameters (LSH band/row counts, shingle size, precision/recall) are tunable; the accepted
|
||
**false-negative rate of blocking** is a known, tracked limitation (§12) — blocking trades a
|
||
small miss rate for tractability, and curator bindings are the escape hatch for misses.
|
||
|
||
**Verifying I-2 (`derived = f(canonical)`) — eventually, not on faith (review B-4).**
|
||
Incremental maintenance can drift from a from-scratch fold over time (a missed retraction, a
|
||
dropped event, a bug). I-2 is therefore an **eventually-verified** property, not a free one,
|
||
and the architecture names the mechanism that verifies it:
|
||
|
||
- **A digest of the derived tier.** Each partition's derived tier carries a rolling content
|
||
digest (a Merkle-style hash over union nodes/edges/index entries) maintained alongside the
|
||
incremental updates.
|
||
- **A background consistency-checker** periodically recomputes the digest over a *sampled* (or,
|
||
on a slow cadence, full) fold of canonical state and compares. A mismatch localises the drift
|
||
to a partition/region and triggers a **scoped recompute** of just that region — cheap relative
|
||
to a global rebuild, and self-healing.
|
||
- **So I-2 holds *eventually and verifiably*:** the incremental engine is the fast path, the
|
||
checker is the guarantee, and divergence is detected and repaired rather than silently
|
||
accumulating. The exact sampling rate / digest granularity is an implementation spike (§12).
|
||
|
||
### 8.8 Cache freshness & invalidation
|
||
|
||
Replication-projection caches remote shard content; cache invalidation is the actual hard part
|
||
and was missing from the first draft (review C-2). The protocol is **per-binding, driven by the
|
||
capability profile**, with one rule: **freshness is always represented, never assumed** — every
|
||
cached page's provenance envelope carries `(observed-at, source-rev-if-known, staleness-state)`,
|
||
so a consumer can always tell live from stale.
|
||
|
||
**Three invalidation modes, chosen by capability, not hard-coded:**
|
||
|
||
| Mode | When | Mechanism |
|
||
|------|------|-----------|
|
||
| **Event-driven (push)** | shard has `notify` | a change event invalidates exactly the affected entries and enqueues a delta refresh (§8.7); the preferred mode |
|
||
| **Validator poll** | shard exposes ETag / Last-Modified / rev | conditional fetch (`If-None-Match`); cheap "still fresh?" checks without transferring bodies |
|
||
| **TTL** | shard offers neither | time-bounded staleness; the floor mode (Oddmuse-class shards) |
|
||
|
||
Most real bindings are **hybrid**: event-driven for invalidation + a long TTL as a safety net
|
||
for missed events + validator polls on read when an entry is past a soft age.
|
||
|
||
**Operational-envelope coupling.** The mode is constrained by axis-10: a **rate-limited** shard
|
||
(Notion) *must* favour event-driven + long TTL and *must not* poll per-read — the freshness
|
||
policy is capability-gated like everything else. A local file shard can watch the filesystem
|
||
(near-instant invalidation, effectively event-driven for free).
|
||
|
||
**Thundering-herd / coalescing.** Concurrent reads of the same stale entry trigger a **single
|
||
in-flight refresh** (single-flight); other readers await it or are served the stale-but-labelled
|
||
value per policy. Bulk invalidations (a shard-wide event) are **batched and rate-shaped** to the
|
||
shard's envelope rather than fired as N concurrent fetches.
|
||
|
||
**Staleness is a policy knob, not a correctness bug.** Whether a reader gets *stale-but-fast* or
|
||
*blocks-for-fresh* is a §10 preset (per space or per request); either way the envelope tells the
|
||
truth about what was served. This is union-without-erasure applied to time.
|
||
|
||
---
|
||
|
||
## 9. Cross-cut — Authorization (L5)
|
||
|
||
Fully specified in **`ArchitectureBlueprint.md`** (the access & history sub-blueprint);
|
||
summarised here for completeness:
|
||
|
||
- **One core, a ladder of modes** L0 (open/c2, zero deps) → L1 (attributed) → L2
|
||
(authenticated) → L3 (role/group) → L4 (multi-tenant enterprise). Climbing is configuration,
|
||
not re-architecture.
|
||
- **PEP** wraps every adapter op; **PDP** decides `(principal, action, target)` over actions
|
||
`read/write/patch/merge/administer`, layered on the adapter's capability profile (a shard
|
||
that can't write can't be written regardless of policy — L5 consults the L1 rail).
|
||
- **Authentication delegated** to a pluggable IdentityProvider (null provider = L0 default);
|
||
real identity from `user-engine` over `net-kingdom` IAM.
|
||
- **Fail open only at L0, fail closed at L2+.** Authorization is pure/offline once a Principal
|
||
is resolved. Provenance carries authz context so the union never leaks unreadable content
|
||
(the L5↔provenance-rail interaction).
|
||
|
||
### 9.1 Tenant isolation of the derived tier (review B-3)
|
||
|
||
Read-time authz filtering is necessary but **not sufficient** when the derived tier is
|
||
*persisted*: a single cross-tenant union/index cache guarded only by a filter on read is a
|
||
standing leak surface (one filtering bug exposes another tenant's content). So isolation is
|
||
**structural, not just procedural**:
|
||
|
||
- **The derived tier is partitioned per tenant / root entity.** A tenant maps to a root entity
|
||
(§4); its union graph, equivalence index, projections, and caches live in a **separate
|
||
partition** keyed by that tenant. There is no shared cross-tenant derived store to leak from.
|
||
- **No cross-tenant equivalence by default.** Blocking/LSH (§8.7) operates *within* a partition;
|
||
cross-tenant equivalence is an explicit, authorised, opt-in federation between roots, never an
|
||
accident of a shared index.
|
||
- **Read-time filtering remains, as defence-in-depth** — the provenance envelope's authz context
|
||
is still checked, so even within a partition a principal sees only what it may; partitioning
|
||
removes the *blast radius*, filtering removes the *fine-grained* leak.
|
||
- **This reconciles I-2 with L5:** recomputability (a persisted-but-disposable derived tier) is
|
||
preserved *per partition* — each tenant's derived tier is independently rebuildable from that
|
||
tenant's canonical state — so isolation costs nothing in the rebuild model. At L0/L1 (single
|
||
tenant) there is one partition and the machinery is invisible.
|
||
|
||
**Isolation invariant (add to §2 as I-13):** *derived state is partitioned by tenant; no
|
||
derived artifact spans tenants except through an explicit, authorised cross-root federation.*
|
||
|
||
---
|
||
|
||
## 10. The policy surface (mechanism over policy, made concrete)
|
||
|
||
I-7 only means something if the policy knobs are enumerated and kept *out* of core algorithms.
|
||
The configurable presets are:
|
||
|
||
- **Canonical-source policy** — chorus / designated-canonical / git-merge / overlay-only /
|
||
vote-to-merge (per space or per equivalence set).
|
||
- **Federation model** — the §8.3 taxonomy, per space, composable per shard.
|
||
- **Shard mode** — read-only / write-through / mirrored / projected / cached / canonical
|
||
(constrained by the capability profile).
|
||
- **Reconciliation cadence & conflict exposure** — push/poll/manual; show-conflicts vs
|
||
auto-merge-when-supported.
|
||
- **Conflict-resolution preset** — chorus / designated-canonical / git-merge / vote-to-merge /
|
||
overlay-only (the *resolution* policy over §8.6's core detection; per space or equivalence set).
|
||
- **Freshness / invalidation mode** — event-driven / validator-poll / TTL / hybrid, and
|
||
stale-but-fast vs block-for-fresh on read (§8.8; constrained by the operational envelope).
|
||
- **History compaction** — squash policy for low-value churn, gc/repack cadence, per-shard
|
||
offload (§8.1), always preserving recoverable endpoints.
|
||
- **Tenant partition mapping** — tenant ↔ root-entity, and any explicit cross-root federation
|
||
(§9.1, I-13).
|
||
- **Execution policy** — derive/execute off (default) / sandboxed / per-shard-allowed.
|
||
- **Authorization mode** — the L0–L4 ladder.
|
||
- **Projection materialization** — lazy/eager; snapshot vs view-time; recording retention.
|
||
|
||
Core ships sane defaults (L0 open; fork+journal; lazy replication-projection; event-driven+TTL
|
||
freshness; overlay-before-mutation; execution off; one tenant = one root) and never hard-codes
|
||
any of the above. (**Preset bundles** that package coherent knob-sets per persona are tracked
|
||
as O-8, §12 — flexibility without bundles is operator burden.)
|
||
|
||
---
|
||
|
||
## 11. Concrete module structure (bridge to implementation)
|
||
|
||
A proposed package layout for `src/shard_wiki/`, mapping 1:1 to the layers so the dependency
|
||
rule (downward only; the derived tier is incrementally maintained, rebuild = fallback) is
|
||
enforceable by import lint:
|
||
|
||
```
|
||
src/shard_wiki/
|
||
model/ # L2 top waist: Page, Identity, Placement, ProvenanceEnvelope,
|
||
# Span, the page-shape types; capability-spectrum value types
|
||
adapters/ # L1 bottom waist: AdapterContract (versioned iface), CapabilityProfile,
|
||
# attachment-mode binding; concrete adapters:
|
||
git/ folder/ gitea/ obsidian/ webdav/ notion/ … # each: profile + verbs
|
||
coordination/ # L3: DecisionLog (append-only, git-backed, per-space append authority/
|
||
# lease), OverlayEngine (draft→patch→MR), reconcile
|
||
# (current coordination state = a derived fold → lives in union/)
|
||
federation/ # L3: FederationModel strategies (fork_journal, vcs_ping,
|
||
# graph_join, feed, activitypub, engine_mirror)
|
||
union/ # L4 (derived): IdentityResolver, EquivalenceGraph, UnionGraph,
|
||
# Transclusion (reference-not-copy)
|
||
projection/ # L4 (derived): ReplicationProjection, DerivationProjection,
|
||
# ViewRegistry (moldable), QueryIndex (delegate|derive)
|
||
authz/ # L5 cross-cut: PDP, PEP, IdentityProvider iface, NullProvider
|
||
provenance/ # cross-cut LEAF: ProvenanceEnvelope type + ⊕ (effective) only — pure data
|
||
policy/ # cross-cut LEAF: the §10 policy surface (presets + a resolve() read by
|
||
# coordination/federation/projection/authz); owns NO mechanism
|
||
api/ # L6: orchestrator API (server-side union for agents/CLI)
|
||
```
|
||
|
||
**The cross-cutting rails are leaves, not god-modules (review D-4).** `provenance/` and
|
||
`policy/` are imported widely, so they are the highest coupling risk; the discipline that caps
|
||
it is: **they may import *nothing* in the tree and contain *only* stable data types + pure
|
||
functions** (the envelope and its `⊕`; the policy presets and a `resolve(question) → choice`).
|
||
Mechanism never lives in a rail — `policy/` says *what* the preset is, `coordination/`/
|
||
`projection/` decide *how* to honour it. A change to a rail is then a change to a small, stable,
|
||
dependency-free leaf, not a ripple through every layer. Capability-spectrum value types live in
|
||
`model/` (also leaf-like) for the same reason.
|
||
|
||
Hard import rules (enforced by import lint):
|
||
- `union/` and `projection/` may import `model/`, `adapters/`, `coordination/`, `policy/`,
|
||
`provenance/` — but **nothing may import them** (they are the disposable derived tier).
|
||
- `model/`, `adapters/`, `provenance/`, `policy/` import nothing else in the tree (the waists
|
||
and rails stay thin); `provenance/` and `policy/` import nothing at all.
|
||
- `coordination/` and `federation/` may import the waists + rails, never the derived tier.
|
||
|
||
---
|
||
|
||
## 12. Known scaling risks & open problems
|
||
|
||
Tracked honestly rather than pretend-solved (review disposition F). Each has a **chosen
|
||
direction** and a **revisit trigger** — the thing that, if observed, forces a redesign.
|
||
|
||
| # | Risk / open problem | Chosen direction | Revisit trigger |
|
||
|---|---------------------|------------------|-----------------|
|
||
| O-1 | **Equivalence blocking misses true matches** (LSH false negatives, §8.7) | accept a small miss rate; curator bindings are the escape hatch | measured recall below an agreed threshold on real corpora |
|
||
| O-2 | **Convergence bound for high-write CRDT shards under partition** (§8.6) | causal via journal + CRDT-native merge at the shard; no global bound promised | user-visible divergence that outlives a partition |
|
||
| O-3 | **Per-equivalence-set divergence tracking** (§8.6) | start with base-rev comparison; add vector clocks only if needed | 3-way concurrent divergence that base-rev mis-orders |
|
||
| O-4 | **Persisted derived-tier cost ceiling** (§8.7/§9.1) | per-tenant partition, incremental-maintained, rebuild is fallback | a tenant whose incremental cost still exceeds budget |
|
||
| O-5 | **Axis-interaction completeness** (§6.5) | the named interaction table is the contract; extend deliberately | a real adapter needing an interaction not in the table |
|
||
| O-6 | **Span-address portability across projection** (§7.2) | shard-scoped native-id wrapping now; tumbler later | cross-shard transclusion that native ids can't satisfy |
|
||
| O-7 | **Squash-compaction vs. perfect auditability** (§8.1) | compact the *path*, preserve reachable states; configurable | a compliance need for every intermediate keystroke |
|
||
| O-8 | **Policy-knob proliferation → operator burden** (§10) | ship named **preset bundles** ("personal vault" / "team wiki" / "enterprise federation") over the policy surface | operators mis-configuring interacting knobs |
|
||
| O-9 | **Shard sharing across roots vs tenant partition** (§9.1, I-13) | shard exclusive to one root by default; explicit shared-read binding otherwise (avoids double-caching a rate-limited shard) | a shard legitimately needed live in two tenants |
|
||
| O-10 | **Span-level authz under transclusion** (aggregation/inference leak; ⊕ across boundaries, §7.3/§9) | a transcluded span inherits the **stricter** of source & host authz; provenance ⊕ composes the source-page envelope under the host | a real cross-authz transclusion |
|
||
| O-11 | **Union under shard unavailability** (§8.8 covers stale, not down) | **partial union** + per-shard "unavailable" provenance + last-known-projection where policy allows | an SLA need on partial reads |
|
||
| O-12 | **Per-space append-log throughput ceiling** (§8.1 append authority) | single writer per space scales across spaces; per-space log sharding if needed | a single space exceeding one writer's append rate |
|
||
|
||
These are the spec-writing inputs for `SHARD-WP-0002`; none blocks the architecture, each
|
||
scopes an implementation spike.
|
||
|
||
---
|
||
|
||
## 13. Canonical data flows (the architecture exercised)
|
||
|
||
**A. Attach a shard.** Adapter binds (chosen attachment mode) → probes/declares a capability
|
||
profile → core registers the shard under a root entity → if not git-native, the coordination
|
||
journal is seeded (begin-now/mirror/import per axis 5). No union rebuild yet (lazy).
|
||
|
||
**B. Read a page through the union.** Consumer asks the union for an identity → Identity
|
||
resolver maps it to placements across shards → equivalence yields chorus or canonical →
|
||
replication-projection lazily fetches from each shard (cache + freshness) → page returned
|
||
wrapped in its provenance envelope → L5 filters anything the principal can't see at source.
|
||
|
||
**C. Edit a read-only / limited shard.** Write request → L5 PDP allows → capability profile
|
||
says < write-through → OverlayEngine records a draft → renders a patch/MR in the shard's native
|
||
syntax (lossless) or Markdown (lossy-with-report) → on explicit apply, commit to the journal
|
||
and (if the profile permits) propagate; otherwise the overlay stands as the local truth, fully
|
||
attributed.
|
||
|
||
**D. Attach a computational notebook.** Adapter declares profile (attachment=file-store,
|
||
opacity=mixed, computational=captured-output). Core attaches the `.ipynb` **source** as
|
||
canonical; presents cells + embedded outputs as **derivation-projection snapshots** marked
|
||
"run N, env unguaranteed"; offers a static render via the view registry; re-execution stays
|
||
gated off. History uses paired-text/nbdime per axis 5.
|
||
|
||
---
|
||
|
||
## 14. Key tradeoffs & decisions
|
||
|
||
Decided:
|
||
|
||
- **Capability spectra over a verb checklist** — richer contract for precise, uniform
|
||
degradation; tamed by an orthogonal core + implied positions + a named interaction table
|
||
(§6.5). (Decided.)
|
||
- **Three states; derived = f(canonical)** — sharded + coordination canonical, derived
|
||
disposable (§1). (Decided; supersedes the earlier "edges vs middle" framing.)
|
||
- **Event-sourced coordination, one append authority per space** — coordination-canonical state
|
||
is an append-only **decision log** in the git journal; current state is a derived fold; a
|
||
per-space append lease gives a totally-ordered log and read-your-writes across orchestrator
|
||
instances (§8.1). (Decided — resolves the single-vs-multi-writer keystone.)
|
||
- **Profiles are verified, not asserted** — a versioned **conformance suite** gates adapter
|
||
admission; capability-as-data acts on verified data (§6.6). (Decided.)
|
||
- **I-2 is eventually-verified** — incremental maintenance is the fast path; a digest +
|
||
background consistency-checker detects and self-heals drift (§8.7). (Decided.)
|
||
- **Incremental-first, rebuild-as-fallback** — the derived tier is continuously maintained from
|
||
change events; full rebuild is rare and need not be cheap (§8.7). (Decided — resolves the
|
||
earlier "union graph persistence" open item: **persisted, per-tenant, incrementally
|
||
maintained, rebuildable**, §9.1.)
|
||
- **Identity ≠ fingerprint** — page identity is a stable handle; fingerprints are for
|
||
equivalence (§7.2). (Decided.)
|
||
- **Consistency = read-your-writes (journal) + causal (derived) + eventual/freshness-labelled
|
||
(shards)**; conflict detection/representation is core, resolution is policy (§8.6). (Decided.)
|
||
- **Address scheme** — shard-scoped native-id wrapping now; portable tumbler later (§7.2, O-6).
|
||
(Decided.)
|
||
- **Default federation = fork+journal over Git**; other models opt-in (§8.3). (Decided.)
|
||
- **Execution off by default** — recognise+project always; execute only when gated (§8.5). (Decided.)
|
||
- **Derived tier is tenant-partitioned** (I-13, §9.1). (Decided.)
|
||
|
||
Still open (carried to §12 / policy):
|
||
|
||
1. **L1 "attributed-but-open" mode** — ship it or jump L0→L2? (Carried from ArchitectureBlueprint.)
|
||
2. **Per-page ACL default** — off (per-shard/namespace) confirmed; revisit only if demand appears.
|
||
3. The implementation spikes in **§12** (O-1…O-7).
|
||
|
||
---
|
||
|
||
## 15. What this architecture is *not*
|
||
|
||
- Not a wiki engine, UI, or rendering pipeline (those are consumers at L6).
|
||
- Not a canonical-source-of-truth — shards keep sovereignty; the middle is derived.
|
||
- Not a generic file-sync daemon — synchronisation is wiki-page-semantic.
|
||
- Not an execution platform — computation is recognised and projected, not hosted.
|
||
- Not a universal ontology — no single schema is imposed on all shards.
|
||
- Not an authentication/identity store — that is delegated (authorization is owned).
|
||
|
||
---
|
||
|
||
## 16. Traceability
|
||
|
||
- **INTENT** — every invariant in §2 (I-1…I-13) cites an INTENT principle or boundary; no
|
||
invariant contradicts the Stability Note.
|
||
- **Review & hardening** — this revision folds in
|
||
`history/260615-core-architecture-blueprint-review.md` via **`SHARD-WP-0005`**: A-1→§1/§3/§4
|
||
(three-state re-frame), B-1→§7.2 (identity vs equivalence), B-2→§8.6 (consistency/conflict),
|
||
B-3→§9.1+I-13 (tenant isolation), C-1/C-2→§8.7/§8.8 (incremental + indexed + invalidation),
|
||
C-3→§8.1 (history scaling), D-1→§6.5 (orthogonal core), D-2→§7.3 (layered provenance),
|
||
D-3→§8.4 (common-case projection), D-4→§11 (policy module + rail discipline); open items→§12.
|
||
- **Round-2 review & hardening II** — folds in
|
||
`history/260615-core-architecture-blueprint-review-2.md` via **`SHARD-WP-0006`**:
|
||
A-1…A-4→§3/§4/§10/§11 (overview reconciled to the body), B-1+B-3→§8.1 (event-sourced
|
||
coordination + per-space append authority), B-2→§6.6 (adapter conformance suite),
|
||
B-4→§8.7 (incremental retraction/propagation + I-2 digest/checker); C-1…C-4→§12 (O-8…O-11).
|
||
- **Research** — §6 (spectra) ← `260614-shard-spectrum-synthesis` v3; §8.3 (federation
|
||
taxonomy) ← v3 §2.5; §8.4–§8.5 (two-axis projection, view registry, computational scope) ←
|
||
`260614-computational-page-model-synthesis`; §7 page shapes ← the engine + modern-tool +
|
||
computational dives; §1 thesis ← the files-canonical/index-derived through-line across
|
||
Logseq/ikiwiki/GT/Pharo/Jupyter.
|
||
- **Use cases** — the architecture is sized to UC-01–UC-84: federation/coordination (UC-01–07,
|
||
26–33, 56, 71–72, 79) → §8; attachment/adapter (UC-34–43, 50, 53, 57, 60–62, 64–66, 68–70,
|
||
76–82) → §6; page model & fidelity (UC-34, 39, 42, 55, 58–59, 67, 73, 80, 83–84) → §7/§8.5;
|
||
addressing/identity/query (UC-32, 44–49, 51–52, 54, 63, 74) → §7.2/§8.4; provenance &
|
||
metadata (UC-24–25, 75) → the provenance rail; collaboration & discovery (UC-08–23) → L6
|
||
consumers over the union.
|
||
- **Workplans** — §6–§8 are the design target of `SHARD-WP-0002` (T11–T18); §9 is owned by
|
||
`ArchitectureBlueprint.md`; §1 (yawex-derived resolution/overlay) aligns with
|
||
`SHARD-WP-0001`.
|
||
|
||
---
|
||
|
||
## 17. Stability note
|
||
|
||
This document defines shard-wiki's **internal** architecture; it may evolve as the spec
|
||
workplans land. But the **thesis (§1)**, the **invariants (§2)**, and the **dual narrow waist
|
||
(§1, §6, §7)** are load-bearing — changing any of them is an architectural change in the sense
|
||
of INTENT's Stability Note and should be rare and deliberate.
|