shard-wiki/research/260614-federated-wiki-deep-dive/findings.md
tegwick 036dbad816 research: Federated Wiki deep dive (journal/fork/neighborhood); UC-70-72
SHARD-WP-0003 T1. Federation model (not a shard candidate): per-page
append-only semantic-action journal with story as derived replay,
fork-with-site-provenance, neighborhood/roster discovery + chorus of forks.
Prior art for shard-wiki's own pillars: coordination journal (UC-71),
overlay-before-mutation (UC-26 fork), union-without-erasure (UC-72).
Attach as REST/file-store hybrid (page JSON + CORS, UC-70). Feeds
SHARD-WP-0002 T1-T5, T11, T13, T16.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 19:01:13 +02:00

# Federated Wiki — deep dive (findings)
**Date:** 2026-06-14 · **Source:** SHARD-WP-0003 T1 · **Subject:** Ward Cunningham's
Smallest Federated Wiki (SFW) / Federated Wiki (fedwiki ecosystem).
## Why this dive
Every prior dive has been a *shard candidate* — a store we might attach. Federated Wiki
is different: it is a **federation model**, the one piece of public prior art whose core
job is the same as shard-wiki's coordination layer — *present a union of pages from many
independent sites while preserving where each came from, and let people copy and edit
non-destructively*. Ward Cunningham (inventor of the wiki) built SFW in 2011 precisely to
fix the original wiki's single-canonical-page weakness with **fork + provenance**. We go
past the surface (`260608-federation-concepts/` §3) into the data model and protocol, then
ask what shard-wiki should adopt.
**Framing:** fedwiki is not just "a shard we attach" — it is a *worked example of the
coordination journal, overlay-before-mutation, and union-without-erasure*, three of our
own design pillars, shipped and running.

---
## 1. The data model — page = title + story + journal
A fedwiki page is a small JSON object with three core fields (plus optional decoration):
```json
{
  "title": "Welcome Visitors",
  "story": [
    { "type": "paragraph", "id": "7b56f22a4b9ee974",
      "text": "Welcome to this [[Federated Wiki]] site." },
    { "type": "image", "id": "a1c0e3...", "url": "...", "caption": "..." }
  ],
  "journal": [
    { "type": "create", "id": "7b56f22a4b9ee974", "item": {...}, "date": 1310000000000 },
    { "type": "add", "id": "a1c0e3...", "item": {...}, "after": "7b56f22a4b9ee974",
      "date": 1310000100000 },
    { "type": "edit", "id": "7b56f22a4b9ee974", "item": {...}, "date": 1310000200000 },
    { "type": "fork", "site": "ward.fed.wiki.org", "date": 1310000300000 }
  ]
}
```
- **story** — an *ordered array of typed items* ("paragraph-like" items). Each item is
`{ type, id, text, ...type-specific }`. The **`id`** is a random 16-hex string,
**stable across edits** (it is the unit of identity within a page). The **`type`** names
the **plugin** that renders/edits the item (`paragraph`, `image`, `html`, `markdown`,
`code`, `method`, `pagefold`, chart plugins, …). *Data lives in the item; behavior lives
in the plugin* — the item is portable JSON; the plugin is the renderer.
- **journal** — an *ordered, append-only array of action objects* that, when replayed,
**reconstructs the story**. The story is a materialized view of the journal. This is the
key architectural choice: **the journal is the source of truth, the story is derived.**
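As a minimal sketch of "story as materialized view", the replay below folds a journal into a story. It assumes the action vocabulary this document describes (`create`, `add`, `edit`, `move`, `remove`, `fork`) and that `move` carries an `order` list of item ids. The function name `replay` and the fallback behaviors (appending when `after` is unknown, ignoring edits to unknown ids) are illustrative choices, not fedwiki's reference implementation.

```python
from typing import Any

Item = dict[str, Any]
Action = dict[str, Any]


def replay(journal: list[Action]) -> list[Item]:
    """Fold an append-only journal into the story it describes."""
    story: list[Item] = []

    def index_of(item_id: str):
        for i, item in enumerate(story):
            if item.get("id") == item_id:
                return i
        return None

    for action in journal:
        kind = action.get("type")
        if kind in ("create", "add") and "item" in action:
            item = dict(action["item"], id=action["id"])
            pos = index_of(action.get("after"))
            # Assumption: with no usable `after`, the item goes at the end.
            story.insert(len(story) if pos is None else pos + 1, item)
        elif kind == "edit":
            pos = index_of(action["id"])
            if pos is not None:  # assumption: edits to unknown ids are ignored
                story[pos] = dict(action["item"], id=action["id"])
        elif kind == "move":
            by_id = {item["id"]: item for item in story}
            listed = [by_id[i] for i in action["order"] if i in by_id]
            unlisted = [item for item in story if item["id"] not in action["order"]]
            story = listed + unlisted
        elif kind == "remove":
            story = [item for item in story if item["id"] != action["id"]]
        # `fork` records provenance only; it does not change the story.
    return story


journal = [
    {"type": "create", "id": "a1", "item": {"type": "paragraph", "text": "Hello"}, "date": 1},
    {"type": "add", "id": "b2", "item": {"type": "paragraph", "text": "World"},
     "after": "a1", "date": 2},
    {"type": "edit", "id": "a1", "item": {"type": "paragraph", "text": "Hello!"}, "date": 3},
    {"type": "fork", "site": "ward.fed.wiki.org", "date": 4},
    {"type": "move", "order": ["b2", "a1"], "date": 5},
]
story = replay(journal)
print([item["text"] for item in story])  # ['World', 'Hello!']
```

Because the story is only ever produced by `replay`, a stored `story` field can be treated as a cache that is safe to discard and rebuild.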
## 2. Journal action types — a semantic op-log
Each journal entry is an action with `{ type, ... , date }` (epoch-ms). The action types:
| action | fields | meaning |
|---------|--------|---------|
| `create`| `id, item, date` | first item — page born |
| `add` | `id, item, after, date` | insert an item after another |
| `edit` | `id, item, date` | replace an item's content (id preserved) |
| `move` | `order, date` | reorder items |
| `remove`| `id, date` | delete an item |
| `fork` | `site, date` | **mark that the page was copied from `site` at this point** |

Two things matter for us:
1. **These are *semantic* operations** (add/move/edit/remove a paragraph), not text diffs
and not character-level CRDT ops. The write granularity is the **story item
(paragraph)** — a *middle* granularity between whole-file (TiddlyWiki) and
block/character (Logseq/CRDT). It is an **op-log** like a CRDT, but the ops are
coarse-grained and **applied by humans via fork**, not auto-merged.
2. **`fork` is the provenance primitive.** When you copy a remote page to your own site,
a `fork` entry is appended recording the **source site** and time. The journal of a
forked page therefore **serializes a directed acyclic graph (DAG)** of where content
came from — "the journal of a forked page is detailed enough to recognize where in the
journal of the original the fork took place" (CouchDB-style per-entry sequence numbers
make the cut-point identifiable). History visualization highlights the forked entry.
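A sketch of how `fork` could act as the provenance primitive: copying a page appends a `fork` action naming the source site, and two journals can then be compared to find where they diverge. The helper names `fork_page`, `divergence_point`, and `fork_sources` are hypothetical, the site names are made up, and the comparison is a plain longest-common-prefix over action dicts (an assumption; fedwiki's own history view may match entries differently).

```python
import copy
import time
from typing import Any, Optional

Page = dict[str, Any]


def fork_page(page: Page, source_site: str, now_ms: Optional[int] = None) -> Page:
    """Return a local copy of `page` whose journal records where it came from."""
    forked = copy.deepcopy(page)  # never mutate the remote page object
    stamp = now_ms if now_ms is not None else int(time.time() * 1000)
    forked.setdefault("journal", []).append(
        {"type": "fork", "site": source_site, "date": stamp}
    )
    return forked


def divergence_point(ours: list, theirs: list) -> int:
    """Number of leading journal entries the two journals share."""
    shared = 0
    for a, b in zip(ours, theirs):
        if a != b:
            break
        shared += 1
    return shared


def fork_sources(journal: list) -> list[str]:
    """Sites named by `fork` entries, oldest first (the provenance chain)."""
    return [a["site"] for a in journal if a.get("type") == "fork"]


origin = {"title": "Welcome Visitors", "story": [], "journal": [
    {"type": "create", "id": "a1", "item": {"type": "paragraph", "text": "Hi"}, "date": 1},
]}
mine = fork_page(origin, "origin.example", now_ms=2)
mine["journal"].append(
    {"type": "edit", "id": "a1", "item": {"type": "paragraph", "text": "Hi all"}, "date": 3})
origin["journal"].append(
    {"type": "add", "id": "b2", "item": {"type": "paragraph", "text": "More"},
     "after": "a1", "date": 4})

print(divergence_point(mine["journal"], origin["journal"]))  # 1
print(fork_sources(mine["journal"]))  # ['origin.example']
```

Everything before the divergence index is shared history; everything after it is what a human would review when deciding which version to fork.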
## 3. The federation protocol — sites, neighborhood, roster
- **Site** = an independent server (originally Node.js; also static-file and serverless
variants). A site owns a set of pages, each served as **page JSON over HTTP** at
`/<slug>.json`, with **CORS headers** so a *browser-side* client can fetch pages from
**any** site. Page identity within a site is the **slug** (a title-derived kebab name).
- **The client assembles the union, not the server.** The fedwiki client ("the lineup")
renders pages **side by side**: clicking a link opens that page *from whatever site it
resolves against*, appended to the right. Browsing literally builds a left-to-right
trail across sites.
- **Neighborhood** = the dynamic set of sites encountered in the current session (from the
sites of pages you've opened, links, and forks). **Search runs across the neighborhood**
— a federated search over exactly the sites you've touched.
- **Roster** = an explicit, authored list of sites to include (a curated neighborhood);
"sister sites" are peers you watch. There is **no central registry** — discovery is by
link, fork, and roster.
- **Happenings** = time-bounded collaborative events where many participants fork around a
topic for a period, producing a burst of related forks (a bounded collaboration that
leaves a durable forked record on each participant's own site).
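The addressing and discovery rules above can be sketched offline. The slug rule here (whitespace to hyphens, other punctuation dropped, lowercased) and the `http://` scheme are assumptions, and `Neighborhood`, `load_site`, and the `.example` sites are hypothetical; a real client would fetch page JSON over HTTP with CORS rather than read an in-memory dict.

```python
import re
from typing import Any, Callable, Iterable

Page = dict[str, Any]


def as_slug(title: str) -> str:
    """Assumed slug rule: spaces to hyphens, drop other punctuation, lowercase."""
    return re.sub(r"[^A-Za-z0-9-]", "", re.sub(r"\s", "-", title)).lower()


def page_url(site: str, title: str) -> str:
    """Page JSON location, following the `/<slug>.json` convention."""
    return f"http://{site}/{as_slug(title)}.json"


class Neighborhood:
    """Sites encountered in one session; grows as pages are opened."""

    def __init__(self, load_site: Callable[[str], Iterable[Page]]):
        self.sites: list[str] = []     # insertion-ordered, no duplicates
        self._load_site = load_site    # hypothetical: returns all pages of one site

    def _touch(self, site: str) -> None:
        if site not in self.sites:
            self.sites.append(site)

    def open(self, site: str, page: Page) -> None:
        """Opening a page adds its site and every site named in its fork entries."""
        self._touch(site)
        for action in page.get("journal", []):
            if action.get("type") == "fork" and "site" in action:
                self._touch(action["site"])

    def search(self, query: str) -> list[tuple[str, str]]:
        """Case-insensitive text search over every touched site: (site, title) hits."""
        needle = query.lower()
        hits = []
        for site in self.sites:
            for page in self._load_site(site):
                title = page.get("title", "")
                text = " ".join(str(i.get("text", "")) for i in page.get("story", []))
                if needle in text.lower() or needle in title.lower():
                    hits.append((site, title))
        return hits


SITES = {
    "alice.example": [{"title": "Welcome Visitors",
                       "story": [{"type": "paragraph", "id": "a1", "text": "About forks"}],
                       "journal": [{"type": "fork", "site": "bob.example", "date": 1}]}],
    "bob.example": [{"title": "Fork Notes",
                     "story": [{"type": "paragraph", "id": "b1",
                                "text": "Forking keeps provenance"}],
                     "journal": []}],
}
hood = Neighborhood(lambda site: SITES.get(site, []))
hood.open("alice.example", SITES["alice.example"][0])
print(page_url("alice.example", "Welcome Visitors"))
print(hood.sites)
print(hood.search("fork"))
```

Note that the search scope is exactly `hood.sites`: a site that was never opened, linked, or forked from is invisible to the query, which is the "no central registry" property in miniature.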
## 4. The editorial model — fork, don't edit-in-place
You can only write to **your own** site. To change someone else's page you **fork** it
(copy into your site, journal records the source), then edit your copy. Many forks of the
same page coexist across sites — Cunningham's **"chorus of voices"**: *no canonical
version*, divergence is normal and visible, and you choose whose changes to pull by forking
them. There is **no automatic merge** — reconciliation is human: compare journals, fork the
version you prefer, optionally re-fork upstream changes.

---
## 5. Capability profile
| Dimension (synthesis spectrum) | Federated Wiki |
|--------------------------------|----------------|
| Attachment mode | **REST/file-store hybrid** — page JSON over HTTP+CORS; also static files |
| Addressing granularity | **story item (paragraph)** via stable 16-hex `id` |
| Content identity | item `id` random+stable; page id = site + slug |
| Identity vs placement | **placement-bound**: identity = `site` + `slug`; forks are *new* identities linked by journal provenance |
| Structure | ordered array of **typed items** (plugin-typed) |
| History | **per-page append-only journal** of semantic actions (op-log) |
| Merge model | **fork + manual journal compare** — a *third model* beside git 3-way and CRDT auto-merge |
| Native query | none built-in; **neighborhood search** (federated full-text across touched sites) |
| Translation | item `text` is wiki/Markdown-ish; plugins own their formats |
| Attachment/write granularity | **story-item level** (add/edit/move/remove one item) |
| Operational envelope | tiny servers, browser-driven; CORS is the whole API surface |
| Access grant | **own-site-only writes**; reads open via CORS |
| Content opacity | transparent JSON (no E2EE); plugin-typed but inspectable |
| Provenance | **first-class**: `fork` records source site; journal = provenance DAG |
## 6. INTENT mapping
### Reinforcements (fedwiki validates our pillars)
- **Coordination journal** (INTENT) ≈ fedwiki **journal**. Our journal idea is *exactly*
fedwiki's per-page append-only action log — and fedwiki proves the story-as-derived-view
pattern works. Strong reinforcement; adopt the **semantic-op + provenance-entry** shape.
- **Overlay before mutation** ≈ **fork**. Fork *is* the canonical overlay: a
non-destructive copy onto a writable surface, recording provenance, before any change.
- **Union without erasure** ≈ **neighborhood + chorus**. The union is assembled from many
sovereign sites; provenance (which site, forked-from) is never hidden; divergence is
surfaced, not resolved away.
- **No silent remote mutation** ≈ **own-site-only writes**. You structurally *cannot*
mutate a remote; you fork to your own site. This is our rule, enforced by architecture.
- **Mechanism over policy** ≈ **no canonical source**. Fedwiki ships the mechanism (fork,
journal, neighborhood) and leaves "which version wins" entirely to people.
- **Graceful degradation** ≈ static-file sites — a fedwiki site can be a read-only pile of
JSON files; still forkable, still in the neighborhood.
### Divergences (boundaries / design notes, not bugs)
- **Identity = placement.** Fedwiki page identity is `site` + `slug`; a fork is a *new*
page whose only tie to the origin is a journal `fork` entry. shard-wiki wants
**identity ≠ placement** (the "same" page across shards under a stable identity, T16) —
so we treat fedwiki's journal-linked forks as *provenance edges*, and layer our own
cross-shard identity over them rather than adopting slug-as-identity.
- **No query / no typed-record model.** Fedwiki is paragraphs+plugins, not a typed DB
(contrast Notion/Wikibase). Fine — it sits at the *coordination* end, not the structure
end. We don't ask fedwiki to provide query; the neighborhood search is the model for
*federated* search across shards (T-federation), not in-shard query.
- **Browser-assembles-union.** Fedwiki pushes union assembly to the client. shard-wiki
assembles server/orchestrator-side. Adopt the *model* (union from sovereign sources +
provenance), not the client-only locus.
### What to keep
1. **Journal = append-only semantic-op log with provenance entries**, story = derived
replay view. This is the concrete shape for our coordination journal (T13).
2. **Fork-with-source-attribution** as the overlay/adopt primitive across shards.
3. **Neighborhood** as the model for a *dynamic, link-and-fork-discovered* federated set +
search, with **roster** as the curated/explicit variant.
4. **Chorus of forks** — represent divergent versions across shards as co-equal, linked by
provenance, with reconciliation as an explicit human/policy step (mechanism over policy).
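One way to picture "chorus plus policy" is to keep every version with its placement and let a caller-supplied policy pick a preferred one only when asked. `Version`, `Chorus`, `newest`, and `prefer_site` are hypothetical names for illustration, not existing shard-wiki or fedwiki APIs, and the sites are made up. The store never discards a version; only the policy expresses a preference.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Version:
    site: str                           # placement: which sovereign site holds it
    slug: str
    last_date: int                      # date of its latest journal entry (epoch-ms)
    forked_from: Optional[str] = None   # provenance edge, if any


Policy = Callable[[list[Version]], Optional[Version]]


@dataclass
class Chorus:
    """All known versions of a page, co-equal; nothing is discarded."""
    versions: list[Version] = field(default_factory=list)

    def add(self, version: Version) -> None:
        self.versions.append(version)

    def preferred(self, policy: Policy) -> Optional[Version]:
        # The mechanism stores every voice; the policy alone names a winner.
        return policy(list(self.versions))


def newest(versions: list[Version]) -> Optional[Version]:
    """Example policy: the version with the most recent journal entry."""
    return max(versions, key=lambda v: v.last_date, default=None)


def prefer_site(site: str) -> Policy:
    """Example policy: treat one site as canonical when it has a version."""
    return lambda versions: next((v for v in versions if v.site == site), None)


chorus = Chorus()
chorus.add(Version("ward.example", "welcome-visitors", last_date=100))
chorus.add(Version("alice.example", "welcome-visitors", last_date=300,
                   forked_from="ward.example"))

print(chorus.preferred(newest).site)                       # alice.example
print(chorus.preferred(prefer_site("ward.example")).site)  # ward.example
print(len(chorus.versions))                                # 2
```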
---
## 7. UC seeds
| # | Seed | Disposition |
|---|------|-------------|
| UC-70 | Attach a Federated Wiki site as a shard via its **page JSON + CORS** (REST/file-store hybrid); project pages, fork to overlay | **new** |
| UC-71 | Adopt a **per-page append-only semantic-action journal with provenance entries** (fork=source site) as the coordination-journal model — replay to materialize, compare to locate divergence | **new** |
| UC-72 | **Fork-with-site-provenance federation across a neighborhood** of peer shards — assemble a union from links/forks, search across it, preserve the chorus without forcing a canonical | **new** |
| — | fork-with-provenance as overlay/adopt | enrich **UC-26** (fork) |
| — | carry-forward of forked content + upstream re-fork | enrich **UC-28** (carry-forward) |
| — | happenings = time-bounded collaboration leaving durable forks | enrich **UC-30** (time-bounded space) |
| — | union/chorus of co-equal versions, provenance-linked | enrich **UC-05 / UC-27** |
## 8. Architecture notes for SHARD-WP-0002
- **T1-T5 (federation):** fedwiki is the reference design. The **journal** (append-only,
semantic ops, fork-provenance) is the concrete coordination-journal shape; **neighborhood
+ roster** is the discovery/membership model (dynamic vs curated); **fork** is the
overlay/adopt op. Model the union as an assembly over sovereign sources with provenance
edges, reconciliation left to policy.
- **T11 (capability/write-granularity):** add **story-item / paragraph** as a named
write-granularity tier between whole-file and block/character.
- **T13 (history portability / merge model):** record fedwiki's **journal-replay op-log**
as a *third merge model* beside git 3-way and CRDT auto-merge — a **coarse semantic
op-log applied manually via fork**. A shard whose history *is* such a journal can supply
our coordination journal almost directly (vs git-commit import or CRDT-update import).
- **T16 (identity ≠ placement):** fedwiki's `fork` journal entries are **provenance edges**
between same-named pages on different sites — exactly the cross-shard "same page,
different placement" relation we must model. Use them as edges; keep our own identity
layer above slug.
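A sketch of the T16 note: `fork` entries are read as provenance edges between placements, while identity lives in a separate map assigned above them. The assumption that a fork keeps the same slug, the names `Placement`, `provenance_edges`, and `identity_of`, and the identifier `page:welcome` are all illustrative, not part of fedwiki or of any existing shard-wiki code.

```python
from typing import Any, Iterator, NamedTuple


class Placement(NamedTuple):
    site: str
    slug: str


def provenance_edges(
    placement: Placement, journal: list[dict[str, Any]]
) -> Iterator[tuple[Placement, Placement]]:
    """Yield (origin, copy) edges implied by `fork` entries in one page's journal.

    Assumption: a fork keeps the slug, so the origin is (fork.site, same slug).
    """
    for action in journal:
        if action.get("type") == "fork" and "site" in action:
            yield Placement(action["site"], placement.slug), placement


# Identity layer kept apart from placement: assigned explicitly, not derived from slug.
identity_of: dict[Placement, str] = {}

mine = Placement("alice.example", "welcome-visitors")
journal = [
    {"type": "create", "id": "a1", "item": {"type": "paragraph", "text": "Hi"}, "date": 1},
    {"type": "fork", "site": "ward.example", "date": 2},
]
edges = list(provenance_edges(mine, journal))

# A policy step (here: trivially accept every edge) promotes edges into identity.
for origin, copied in edges:
    identity_of.setdefault(origin, "page:welcome")
    identity_of[copied] = identity_of[origin]

print(edges)
print(identity_of[mine])  # page:welcome
```

Keeping `identity_of` separate means an edge can be recorded as provenance without committing to "these are the same page"; whether to promote it is the policy decision raised in the open questions.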
## 9. Open questions
1. Should shard-wiki's coordination journal adopt fedwiki's **exact action vocabulary**
(create/add/edit/move/remove/fork) at the page-item level, or a more granular/abstract
op set that other shards can also emit?
2. Is **neighborhood** (dynamic, link/fork-discovered) a first-class membership mode for an
information space, or only a *view* over an explicitly-configured shard set (roster)?
3. How do we reconcile fedwiki's **slug-as-identity + fork-DAG** with our intended
**stable cross-shard identity** (T16) — promote fork edges into the identity graph, or
keep them as provenance-only annotations?
4. Does the **chorus / no-canonical** stance compose with shards that *do* assert a
canonical (Notion, an upstream git main)? (policy-selectable canonical over a
mechanism that permits chorus.)
## 10. Sources
- Smallest Federated Wiki wiki: **Story JSON**, **Federation Details**, at
github.com/WardCunningham/Smallest-Federated-Wiki/wiki
- JSON Schema notes — song.fed.wiki.org/json-schema.html
- "Smallest Federated Wiki" — home.c2.com/smallest-federated-wiki.html
- Federated Wiki — federated.wiki (Visualizing Page History)
- Mike Caulfield, "The OER Case for Federated Wiki" — hapgood.us (2015)
- Jon Udell, "A federated Wikipedia" — blog.jonudell.net (2015)
- Wikipedia: *Federated Wiki*; IndieWeb: *Smallest Federated Wiki*
- fedwiki/wiki-plugin-transport (plugin/transport reference)
- prior: `research/260608-federation-concepts/` §3
## 11. Traceability
New UCs **UC-70 to UC-72** carry the marker **⊞** in the wikiengines column of
`spec/UseCaseCatalog.md` (true lineage = this dive; placed in the nearest existing column).
Enriched: UC-26, UC-28, UC-30, UC-05, UC-27. Architecture cross-refs: SHARD-WP-0002
T1-T5, T11, T13, T16.