shard-wiki/history/260615-core-architecture-blueprint-review.md
tegwick dd812abb81 history+workplan: CoreArchitectureBlueprint review; SHARD-WP-0005 hardening
Records the critical review (history/260615-...) and establishes SHARD-WP-0005
to fold its findings (A-1, B-1..B-3, C-1..C-3, D-1..D-4) into the blueprint:
correctness (state re-frame, identity/equivalence split, consistency model),
scale (incremental-first union, equivalence indexing, cache invalidation),
elegance (orthogonal spectra, layered provenance, common-case projection,
policy module), security/history scaling, and a known-open-problems section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:01:19 +02:00


Critical review — CoreArchitectureBlueprint.md

Date: 2026-06-15 · Reviewer: tegwick (with Claude) · Subject: spec/CoreArchitectureBlueprint.md @ commit 9b5b393 · Feeds: SHARD-WP-0005

A deliberately hostile review of the first whole-system architecture, to find where it breaks (correctness), fails to scale, and could be more elegant/efficient before any implementation. Findings are prioritised; each is the input to a SHARD-WP-0005 task.

Verdict in one line

The layering and the dual narrow waist are sound and stay. The thesis is ~90% right; the missing 10% (curatorial / coordination-canonical state) breaks its clean story. There are two genuine bugs, two large unaddressed scaling risks, and several elegance/efficiency debts — all fixable without touching INTENT.


A. The framing crack (fix resolves three issues)

A-1 — Two buckets hide a third. The thesis "canonical at the edges, derived in the middle" omits born-in-the-middle-but-canonical state: overlays that are the local truth against a read-only shard (Flow C), manual curator equivalence bindings, alias tables, merge decisions. These encode human judgment or local-only content and cannot be rebuilt from shards+journal.

Contradiction: I-2 declares L4 rebuildable, yet §8.4 puts "alias table, curator binding" in L4. You cannot rebuild a curator's manual binding.

Fix: three states — sharded-canonical, coordination-canonical (journal: overlays, bindings, aliases, merges — durable, born in the middle), derived-disposable (union graph, indexes, projections). Re-frame §1 as canonical (sharded + coordination) vs derived (disposable); derived = f(canonical) then becomes actually true. → T1
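
A minimal sketch of the three-state split, assuming a simple in-memory model. The artifact kind names and the `build_union` helper are hypothetical, chosen only to mirror the examples listed in A-1; the blueprint does not define them. The point is that once bindings are classed as canonical input, the union really is a pure function of canonical state.

```python
from enum import Enum


class StateClass(Enum):
    SHARDED_CANONICAL = "sharded-canonical"
    COORDINATION_CANONICAL = "coordination-canonical"
    DERIVED_DISPOSABLE = "derived-disposable"


# Hypothetical artifact kinds, taken from the examples named in A-1.
STATE_OF = {
    "shard_page": StateClass.SHARDED_CANONICAL,
    "overlay": StateClass.COORDINATION_CANONICAL,
    "curator_binding": StateClass.COORDINATION_CANONICAL,
    "alias_table": StateClass.COORDINATION_CANONICAL,
    "merge_decision": StateClass.COORDINATION_CANONICAL,
    "union_graph": StateClass.DERIVED_DISPOSABLE,
    "index": StateClass.DERIVED_DISPOSABLE,
    "projection": StateClass.DERIVED_DISPOSABLE,
}


def is_rebuildable(kind: str) -> bool:
    """Only derived-disposable state may be dropped and recomputed."""
    return STATE_OF[kind] is StateClass.DERIVED_DISPOSABLE


def build_union(shard_pages: dict[str, str], bindings: dict[str, str]) -> dict[str, list[str]]:
    """derived = f(canonical): group page ids under the handle a binding maps them to.

    shard_pages is sharded-canonical input, bindings is coordination-canonical
    input; the returned grouping is derived and can be thrown away at any time.
    """
    union: dict[str, list[str]] = {}
    for page_id in sorted(shard_pages):
        target = bindings.get(page_id, page_id)
        union.setdefault(target, []).append(page_id)
    return union
```

Under this split the I-2 contradiction disappears: a curator binding is an input to the rebuild, never an output of it.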


B. Where it breaks (correctness)

B-1 — Identity conflated with content-fingerprint (BUG). §7.2 derives page identity from content fingerprint. That makes editing a page change its identity, breaking every reference. Fingerprints identify versions/equivalence, not identity. Page identity must be a stable handle (uid) surviving edits; fingerprints belong to the equivalence mechanism (§8.4). One word, two concepts, wrong implementation for the stable one. → T2
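
The distinction can be sketched as follows, assuming a random UUID for the stable handle and SHA-256 for the fingerprint. Both choices, and the `Page` class itself, are illustrative assumptions; the blueprint specifies neither.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


def content_fingerprint(content: str) -> str:
    """Identifies a version of the content, not the page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class Page:
    content: str
    # Stable handle: assigned once at creation, never derived from content.
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def fingerprint(self) -> str:
        # Recomputed from the current content, so it changes on every edit.
        return content_fingerprint(self.content)

    def edit(self, new_content: str) -> None:
        # Editing replaces the content; the uid is deliberately untouched.
        self.content = new_content
```

References would store `uid`; the equivalence mechanism of §8.4 would compare `fingerprint`. Two pages with identical content then share a fingerprint but remain distinct pages.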

B-2 — No concurrency/consistency model. Concurrent overlays on one page, overlay applied after source drift, journal-commit vs shard-native-write ordering — all undefined. Conflict handling is deferred to "policy presets," but conflict detection + representation is core mechanism; only resolution is policy. The union's consistency guarantee is unstated (eventually-consistent? read-your-writes? causal-via-journal?). → T3
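
To illustrate the mechanism/policy split, here is a hedged sketch of conflict detection and representation only, with no resolution. It assumes each overlay records the fingerprint of the source version it was authored against; the `Overlay`, `Conflict`, and `detect_conflicts` names and the two conflict kinds are hypothetical and cover just two of the undefined cases named above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlay:
    page_uid: str
    base_fingerprint: str  # source version the overlay was written against
    patch: str


@dataclass(frozen=True)
class Conflict:
    page_uid: str
    kind: str  # "source-drift" or "concurrent-overlay"
    overlays: tuple


def detect_conflicts(current: dict[str, str], overlays: list[Overlay]) -> list[Conflict]:
    """Report conflicts as data; deciding what to do with them is left to policy.

    current maps page uid -> fingerprint of the source as it is now.
    """
    by_page = defaultdict(list)
    for ov in overlays:
        by_page[ov.page_uid].append(ov)

    conflicts = []
    for uid, group in by_page.items():
        now = current.get(uid)
        # The source changed after the overlay was authored.
        drifted = tuple(o for o in group if o.base_fingerprint != now)
        if drifted:
            conflicts.append(Conflict(uid, "source-drift", drifted))
        # Several overlays target the same, still-current source version.
        fresh = tuple(o for o in group if o.base_fingerprint == now)
        if len(fresh) > 1:
            conflicts.append(Conflict(uid, "concurrent-overlay", fresh))
    return conflicts
```

A policy preset would consume the returned `Conflict` records; nothing in the detector picks a winner.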

B-3 — Persisted union cache + multi-tenant = leak surface. §13 recommends a persisted L4 cache; §9 protects content by read-time filtering on the provenance envelope. A persisted cross-tenant union cache guarded only by read-time filtering is an L4 attack surface. Tension between I-2 (persisted rebuildable cache), scale, and L5 isolation is unacknowledged. → T8


C. Where it fails to scale

C-1 — Equivalence detection is O(N²), no indexing/incremental story. Fingerprint / span-set-overlap comparison across all pages of all shards is quadratic: 10 shards × 100k pages = 10⁶ pages, so all-pairs comparison is on the order of 10¹² comparisons. No blocking/LSH/indexing, no incremental maintenance. Biggest scaling hazard in the document. → T4
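
One possible blocking strategy, sketched under assumptions: each page is reduced to a few "bottom-k" shingle hashes, and only pages sharing a key are compared. This is a swapped-in illustration of blocking in general, not a strategy the blueprint or this review has chosen; the function names, shingle size, and keys-per-page value are all hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word windows; short texts yield a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def block_keys(text: str, keys_per_page: int = 2) -> list[str]:
    """The smallest shingle hashes act as blocking keys (a bottom-k sketch)."""
    hashes = sorted(hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles(text))
    return hashes[:keys_per_page]


def candidate_pairs(pages: dict[str, str]) -> set[tuple[str, str]]:
    """Only pages that share at least one blocking key become candidates."""
    buckets = defaultdict(set)
    for page_id, text in pages.items():
        for key in block_keys(text):
            buckets[key].add(page_id)

    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

The expensive span-set-overlap comparison would then run only on `candidate_pairs`. Adding a page touches only its own buckets, which also gives a starting point for incremental maintenance. The trade-off is recall: near-duplicates that share no key are missed, so the key count has to be tuned.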

C-2 — "Rebuildable cache" collides with the operational-envelope axis. A byte-exact rebuild requires reading every page of every shard, including rate-limited/paginated external APIs (Notion) and irreducibly-live sources — hours-to-days. I-2 contradicts axis-10. Incremental, change-driven maintenance must be primary (notify→delta), rebuild a rare fallback. Cache invalidation — the actual hard problem — is named once and never designed. → T4, T5
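
The notify→delta shape can be sketched like this, assuming shards can emit per-page change notifications. `Delta`, `UnionIndex`, and the `pages_read` counter are hypothetical names used only to contrast the cost of the two paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Delta:
    shard: str
    page_uid: str
    new_fingerprint: Optional[str]  # None means the page was deleted


class UnionIndex:
    def __init__(self) -> None:
        self.entries: dict[tuple[str, str], str] = {}
        self.pages_read = 0  # rough stand-in for backend read cost

    def apply(self, delta: Delta) -> None:
        """Primary path: touch exactly one entry per notification."""
        key = (delta.shard, delta.page_uid)
        self.pages_read += 1
        if delta.new_fingerprint is None:
            self.entries.pop(key, None)
        else:
            self.entries[key] = delta.new_fingerprint

    def rebuild(self, shards: dict[str, dict[str, str]]) -> None:
        """Rare fallback: reads every page of every shard."""
        self.entries.clear()
        for shard, pages in shards.items():
            for uid, fp in pages.items():
                self.entries[(shard, uid)] = fp
                self.pages_read += 1
```

`apply` costs one read per change, while `rebuild` costs one read per page in the whole union, which is the hours-to-days case against rate-limited backends. The sketch does not address how missed notifications are detected, which is part of the undesigned invalidation problem.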

C-3 — Unbounded history at open L0 = DoS/perf. "Every write a commit" + "open for all" ⇒ the git journal grows without bound under bots/vandalism and git degrades on huge histories. "History is the floor" has an unacknowledged cost: packing, compaction, per-shard offload. → T8


D. Elegance / efficiency debts

D-1 — The 15 spectra assert a clean degradation function never demonstrated. Either most axes are irrelevant to most ops (then the 15-D profile is ceremony), or behavior depends on several axes jointly (then "no per-backend code" becomes a sprawling axis-interaction matrix — the flat-checklist problem in higher dimensions). And the axes aren't orthogonal (git-native history ⟺ git-IS-store ⟺ git/text merge; encrypted opacity ⟹ query/translation collapse). Model a smaller orthogonal core + derived/implied positions, and state the axis-interaction subset the degradation logic truly uses. → T6

D-2 — Provenance envelope isn't inherited; it'll dwarf the content. Per-span envelopes at block granularity = 10k near-identical envelopes for a 10k-block graph. The doc already invented the right pattern for Trilium ("effective-vs-own with per-attribute provenance") and failed to apply it to its own envelope. Make provenance layered (page envelope + span deltas). → T7
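
A minimal sketch of layered provenance, assuming envelopes can be modelled as flat key/value maps. The field names (`shard`, `author`, `license`) and the `Span`/`Page` classes are hypothetical; the real envelope schema is whatever the blueprint defines.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    text: str
    # "Own" provenance: only the fields that differ from the page envelope.
    own: dict = field(default_factory=dict)


@dataclass
class Page:
    envelope: dict  # page-level provenance, stored once
    spans: list = field(default_factory=list)

    def effective(self, index: int) -> dict:
        """Effective provenance = page envelope overridden by the span's delta."""
        return {**self.envelope, **self.spans[index].own}

    def stored_fields(self) -> int:
        """Fields actually persisted under the layered scheme."""
        return len(self.envelope) + sum(len(s.own) for s in self.spans)
```

With a three-field page envelope and two spans, one of which overrides `author`, the layered form stores four fields where a flat per-span envelope would store six. The gap widens with every span that inherits unchanged.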

D-3 — Projection machinery over-fit to the exotic tail. Two-axis model + three facets + view registry exist mostly for UC-83/84 (2 of 84 UCs); the 95% case (markdown in git) pays the weight. Make the common case trivial (default = plain lazy replication) and derivation/liveness an extension point, not a taxonomy every projection instantiates. → T7

D-4 — Cross-cutting rails are the highest-coupling components, presented as clean. provenance/ and capability types are imported by every layer (god-modules); an envelope change ripples everywhere. And policy has no module (§10 enumerates it; §11 omits it) despite being consulted by L3/L4/L5. Give policy a home; pin the rails behind stable narrow interfaces. → T7


E. What explicitly stays

  • The 6-layer model + the dual narrow waist (adapter contract / page model).
  • Capability-as-data (I-3), union-without-erasure (I-4), overlay-before-mutation (I-5), Git-addressable coordination (I-6), mechanism-over-policy (I-7), graceful degradation (I-8).
  • The federation-model taxonomy and the auth ladder (ArchitectureBlueprint.md).

F. Disposition

Some findings are solvable now (A-1, B-1, D-2, D-3, D-4, C-3); some are partially open and should be tracked honestly rather than pretend-solved (B-2 consistency model: pick a guarantee; C-1 equivalence-at-scale: pick a blocking strategy; D-1 axis interactions: enumerate the real subset). SHARD-WP-0005 closes the solvable ones and records the open ones in a new "Known scaling risks & open problems" section of the blueprint. → T9