shard-wiki/history/260615-core-architecture-blueprint-review.md
tegwick dd812abb81 history+workplan: CoreArchitectureBlueprint review; SHARD-WP-0005 hardening
Records the critical review (history/260615-...) and establishes SHARD-WP-0005
to fold its findings (A-1, B-1..B-3, C-1..C-3, D-1..D-4) into the blueprint:
correctness (state re-frame, identity/equivalence split, consistency model),
scale (incremental-first union, equivalence indexing, cache invalidation),
elegance (orthogonal spectra, layered provenance, common-case projection,
policy module), security/history scaling, and a known-open-problems section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:01:19 +02:00

# Critical review — CoreArchitectureBlueprint.md
Date: 2026-06-15 · Reviewer: tegwick (with Claude) · Subject:
`spec/CoreArchitectureBlueprint.md` @ commit **9b5b393** · Feeds: **SHARD-WP-0005**
A deliberately hostile review of the first whole-system architecture, to find where it
**breaks (correctness)**, **fails to scale**, and **could be more elegant/efficient** before
any implementation. Findings are prioritised; each is the input to a SHARD-WP-0005 task.
## Verdict in one line
The **layering and the dual narrow waist are sound and stay**. The **thesis is ~90% right**;
the missing 10% (curatorial / coordination-canonical state) breaks its clean story. There are
**two genuine bugs**, **two large unaddressed scaling risks**, and several **elegance/efficiency
debts** — all fixable without touching INTENT.
---
## A. The framing crack (fix resolves three issues)
**A-1 — Two buckets hide a third.** The thesis "canonical at the edges, derived in the middle"
omits **born-in-the-middle-but-canonical** state: overlays that are the local truth against a
read-only shard (Flow C), manual **curator equivalence bindings**, alias tables, merge
decisions. These encode human judgment or local-only content and **cannot be rebuilt** from
shards+journal.
**Contradiction:** I-2 declares L4 rebuildable, yet §8.4 puts "alias table, curator binding"
in L4. You cannot rebuild a curator's manual binding.
**Fix:** three states — **sharded-canonical**, **coordination-canonical** (journal: overlays,
bindings, aliases, merges — durable, born in the middle), **derived-disposable** (union graph,
indexes, projections). Re-frame §1 as **canonical (sharded + coordination) vs derived
(disposable)**; `derived = f(canonical)` then becomes actually true. → **T1**
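
The three-state split can be sketched as below. This is a minimal illustration under stated assumptions, not the blueprint's design: `StateClass`, `Canonical`, and `derive_union` are hypothetical names, and the union is reduced to a dictionary merge in which an overlay wins over the shard copy.

```python
from dataclasses import dataclass, field
from enum import Enum


class StateClass(Enum):
    SHARDED_CANONICAL = "sharded-canonical"            # lives in the shards
    COORDINATION_CANONICAL = "coordination-canonical"  # lives in the journal
    DERIVED_DISPOSABLE = "derived-disposable"          # safe to delete and rebuild


@dataclass(frozen=True)
class Canonical:
    """Everything that must survive a cache wipe (hypothetical shape)."""
    shard_pages: dict = field(default_factory=dict)  # page uid -> shard content
    overlays: dict = field(default_factory=dict)     # page uid -> local override
    bindings: frozenset = frozenset()                # curator equivalence pairs


def derive_union(canon: Canonical) -> dict:
    """derived = f(canonical): a pure function of both canonical halves."""
    pages = {**canon.shard_pages, **canon.overlays}  # overlay wins over shard copy
    return {"pages": pages, "equivalent": sorted(canon.bindings)}
```

Because overlays and bindings are inputs to `derive_union` rather than residents of the derived layer, deleting the output loses nothing that cannot be recomputed.
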
---
## B. Where it breaks (correctness)
**B-1 — Identity conflated with content-fingerprint (BUG).** §7.2 derives page identity from
content fingerprint. That makes **editing a page change its identity**, breaking every
reference. Fingerprints identify *versions/equivalence*, not *identity*. Page identity must be
a **stable handle (uid)** surviving edits; fingerprints belong to the **equivalence** mechanism
(§8.4). One word, two concepts, wrong implementation for the stable one. → **T2**
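
The split between the two concepts can be shown in a few lines. A minimal sketch, assuming a random UUID as the stable handle and SHA-256 as the fingerprint; the blueprint specifies neither, and `Page` and `fingerprint` are illustrative names.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


def fingerprint(content: str) -> str:
    """Content fingerprint: changes whenever the content changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Page:
    content: str
    # Stable handle, minted once at creation and never recomputed from content.
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def fp(self) -> str:
        """Version/equivalence key, recomputed from the current content."""
        return fingerprint(self.content)

    def edit(self, new_content: str) -> None:
        self.content = new_content  # uid is untouched; only the fingerprint moves
```

References hold `uid`; the equivalence mechanism compares `fp`. Two pages with identical content share a fingerprint but keep distinct identities.
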
**B-2 — No concurrency/consistency model.** Concurrent overlays on one page, overlay applied
after source drift, journal-commit vs shard-native-write ordering — all undefined. Conflict
handling is deferred to "policy presets," but **conflict *detection + representation* is core
mechanism**; only *resolution* is policy. The union's consistency guarantee is unstated
(eventually-consistent? read-your-writes? causal-via-journal?). → **T3**
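
The detection/resolution split can be made concrete. A minimal sketch, assuming each overlay records the fingerprint of the source it was written against; `Overlay`, `Conflict`, and `apply_overlay` are hypothetical names, not blueprint types.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


def fingerprint(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Overlay:
    page_uid: str
    base_fp: str      # fingerprint of the source the overlay was written against
    new_content: str


@dataclass(frozen=True)
class Conflict:
    page_uid: str
    expected_fp: str
    actual_fp: str
    overlay: Overlay  # kept intact so a policy can resolve it later


def apply_overlay(source: str, overlay: Overlay) -> Union[str, Conflict]:
    """Core mechanism: detect source drift and represent it; never resolve it."""
    actual = fingerprint(source)
    if actual != overlay.base_fp:
        return Conflict(overlay.page_uid, overlay.base_fp, actual, overlay)
    return overlay.new_content
```

The function returns a `Conflict` value instead of picking a winner; choosing one would be the job of a policy preset.
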
**B-3 — Persisted union cache + multi-tenant = leak surface.** §13 recommends a persisted L4
cache; §9 protects content by *read-time* filtering on the provenance envelope. A persisted
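cross-tenant union cache guarded only by read-time filtering is an L4 attack surface: a single
filter bug, or any read path that bypasses the filter, exposes other tenants' content. The
tension between I-2 (persisted rebuildable cache), scale, and L5 isolation is unacknowledged. → **T8**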
---
## C. Where it fails to scale
**C-1 — Equivalence detection is O(N²), no indexing/incremental story.** Fingerprint /
span-set-overlap comparison across all pages of all shards is quadratic: 10 shards × 100k pages
is 10⁶ pages, so naive all-pairs matching is about 5 × 10¹¹ comparisons (order 10¹²). The
blueprint has no blocking/LSH/indexing and no incremental maintenance. Biggest scaling
hazard in the document. → **T4**
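
One way to avoid all-pairs comparison is blocking through an inverted index. A minimal sketch, assuming word 3-shingles as block keys; the blueprint does not choose a strategy, a production version would more likely use MinHash/LSH bands, and `shingles` and `candidate_pairs` are illustrative names.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Overlapping k-word windows; a text shorter than k words yields one shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def candidate_pairs(pages: dict, k: int = 3) -> set:
    """Return page-id pairs that share at least one k-word shingle."""
    index = defaultdict(set)  # shingle -> ids of pages containing it
    for page_id, text in pages.items():
        for sh in shingles(text, k):
            index[sh].add(page_id)
    pairs = set()
    for ids in index.values():
        # Only pages inside the same bucket are ever compared.
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Only pages that land in a common bucket become candidates for the expensive overlap check, and adding a page touches only the buckets of its own shingles, which is what makes incremental maintenance possible.
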
**C-2 — "Rebuildable cache" collides with the operational-envelope axis.** A byte-exact
rebuild requires reading *every page of every shard*, including rate-limited/paginated
external APIs (Notion) and irreducibly-live sources — hours-to-days. I-2 contradicts axis-10.
**Incremental, change-driven maintenance must be primary** (notify→delta), rebuild a rare
fallback. Cache invalidation — the actual hard problem — is named once and never designed. →
**T4, T5**
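
The difference between the two maintenance modes can be sketched as below. This assumes shards can emit per-page change notifications; `Delta` and `UnionCache` are hypothetical names, and the `reads` counter exists only to make the cost visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delta:
    shard: str
    page_uid: str
    content: Optional[str]  # None means the page was deleted


class UnionCache:
    def __init__(self) -> None:
        self.pages: dict = {}  # (shard, page_uid) -> content
        self.reads = 0         # pages processed so far, for illustration only

    def apply(self, delta: Delta) -> None:
        """Primary path (notify -> delta): touch only the page that changed."""
        key = (delta.shard, delta.page_uid)
        if delta.content is None:
            self.pages.pop(key, None)
        else:
            self.pages[key] = delta.content
        self.reads += 1

    def rebuild(self, shards: dict) -> None:
        """Rare fallback: re-read every page of every shard."""
        self.pages = {}
        for shard, pages in shards.items():
            for uid, content in pages.items():
                self.pages[(shard, uid)] = content
                self.reads += 1
```

`apply` costs one page per change regardless of corpus size, while `rebuild` scales with the total page count and inherits every backend's rate limits.
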
**C-3 — Unbounded history at open L0 = DoS/perf.** "Every write a commit" + "open for all" ⇒
the git journal grows without bound under bots/vandalism and git degrades on huge histories.
"History is the floor" has an unacknowledged cost: packing, compaction, per-shard offload. →
**T8**
---
## D. Elegance / efficiency debts
**D-1 — The 15 spectra assert a clean degradation function never demonstrated.** Either most
axes are irrelevant to most ops (then the 15-D profile is ceremony), or behavior depends on
several axes *jointly* (then "no per-backend code" becomes a sprawling axis-interaction matrix
— the flat-checklist problem in higher dimensions). And the axes **aren't orthogonal**
(git-native history ⟺ git-IS-store ⟺ git/text merge; encrypted opacity ⟹ query/translation
collapse). Model a **smaller orthogonal core** + **derived/implied** positions, and state the
**axis-interaction subset** the degradation logic truly uses. → **T6**
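
The "smaller core plus implied positions" idea can be sketched using only the two dependencies named above. Everything else here is an assumption: `CoreProfile`, `derived_positions`, and the position labels (`"adapter-provided"`, `"available"`, `"collapsed"`) are placeholders, not blueprint vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreProfile:
    """Hypothetical orthogonal core: only axes that vary independently."""
    git_is_store: bool
    encrypted_opaque: bool


def derived_positions(core: CoreProfile) -> dict:
    """Positions implied by the core, so backends never declare them."""
    git = core.git_is_store
    opaque = core.encrypted_opaque
    return {
        # git-native history <=> git-IS-store <=> git/text merge
        "history": "git-native" if git else "adapter-provided",
        "merge": "git/text" if git else "adapter-provided",
        # encrypted opacity => query/translation collapse
        "query": "collapsed" if opaque else "available",
        "translation": "collapsed" if opaque else "available",
    }
```

Four declared axes shrink to two, and an inconsistent profile (git-native history without git as the store) becomes unrepresentable.
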
**D-2 — Provenance envelope isn't inherited; it'll dwarf the content.** Per-span envelopes at
block granularity = 10k near-identical envelopes for a 10k-block graph. The doc already
invented the right pattern for Trilium ("effective-vs-own with per-attribute provenance") and
failed to apply it to its own envelope. Make provenance **layered (page envelope + span
deltas)**. → **T7**
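
Layering can be sketched as an effective-vs-own lookup. A minimal illustration, assuming provenance is a flat mapping of fields; `PageProvenance` and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageProvenance:
    envelope: dict                                   # shared by every span on the page
    span_deltas: dict = field(default_factory=dict)  # span id -> only the differing fields

    def own(self, span_id: str) -> dict:
        """Fields the span itself overrides (empty for most spans)."""
        return dict(self.span_deltas.get(span_id, {}))

    def effective(self, span_id: str) -> dict:
        """Page envelope with the span's own delta layered on top."""
        return {**self.envelope, **self.span_deltas.get(span_id, {})}
```

A 10k-block page then stores one envelope plus a delta for each span that actually differs, rather than 10k near-identical copies.
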
**D-3 — Projection machinery over-fit to the exotic tail.** Two-axis model + three facets +
view registry exist mostly for UC-83/84 (2 of 84 UCs); the 95% case (markdown in git) pays the
weight. Make the **common case trivial** (default = plain lazy replication) and
derivation/liveness an **extension point**, not a taxonomy every projection instantiates. →
**T7**
**D-4 — Cross-cutting rails are the highest-coupling components, presented as clean.**
`provenance/` and capability types are imported by every layer (god-modules); an envelope
change ripples everywhere. And **policy has no module** (§10 enumerates it; §11 omits it)
despite being consulted by L3/L4/L5. Give policy a home; pin the rails behind stable narrow
interfaces. → **T7**
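
A policy home behind a narrow interface might look like the sketch below. It is an assumption-heavy illustration: `ConflictPolicy`, `PreferOverlay`, `PreferSource`, and `merge` are invented names, and the blueprint's real policy surface is certainly wider than one method.

```python
from typing import Protocol


class ConflictPolicy(Protocol):
    """Narrow interface: the only thing a layer may ask of policy here."""

    def resolve(self, overlay: str, source: str) -> str: ...


class PreferOverlay:
    def resolve(self, overlay: str, source: str) -> str:
        return overlay


class PreferSource:
    def resolve(self, overlay: str, source: str) -> str:
        return source


def merge(overlay: str, source: str, policy: ConflictPolicy) -> str:
    """Layer code depends on the interface, never on a concrete preset."""
    if overlay == source:
        return overlay  # no conflict, so policy is not consulted
    return policy.resolve(overlay, source)
```

Swapping a preset changes behaviour without touching the calling layer, which is the coupling property the rails currently lack.
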
---
## E. What explicitly stays
- The 6-layer model + the dual narrow waist (adapter contract / page model).
- Capability-as-data (I-3), union-without-erasure (I-4), overlay-before-mutation (I-5),
Git-addressable coordination (I-6), mechanism-over-policy (I-7), graceful degradation (I-8).
- The federation-model taxonomy and the auth ladder (ArchitectureBlueprint.md).
## F. Disposition
Some findings are **solvable now** (A-1, B-1, D-2, D-3, D-4, C-3); some are **partially open**
and should be tracked honestly rather than pretend-solved (B-2 consistency model: pick a
guarantee; C-1 equivalence-at-scale: pick a blocking strategy; D-1 axis interactions: enumerate
the real subset). SHARD-WP-0005 closes the solvable ones and records the open ones in a new
"Known scaling risks & open problems" section of the blueprint. → **T9**