citation-evidence/docs/decisions/ADR-0005-persistence.md
tegwick d54daf2e61 Implement CE-WP-0002 T03-T09: ingest, anchor resolution, engine, UI, persistence, e2e
Completes the PDF review slice end-to-end. After this commit a user can
open a fixture, select text, save an evidence item with commentary, see
it in the sidebar, reload the page, click the item, and the viewer
scrolls to the passage.

- T03 src/source/pdf/{fingerprint,extract,ingest}.ts + 39 fixture tests
  - SHA-256 fingerprint over a fresh ArrayBuffer (TS BufferSource-safe)
  - PDF.js text extract; per-page normalize then join with "\n\n"
  - PageMap + OffsetMap (gap-free coverage); pageLength = end - start
  - Updated manifest's Betriebskosten quote to one PDF.js extracts cleanly
- T04 src/anchor/selectors/{create,resolve}.ts + 25 unit + 7 fixture tests
  - createSelectors emits the maximal redundant set (TextQuote +
    TextPosition + PdfRect + PdfPageText when available)
  - resolveSelectors implements the SharedContracts §7 ladder; confidence
    1.0 (pos+quote) → 0.7 (rect-only) → 0 (unresolved)
  - Cross-module integration test moved to tests/integration/ to honor
    the anchor↛source boundary lint rule
- T05 engine: sync event bus over the closed §4 vocabulary, Map-backed
  repos, services, createEngine() composition root, 12 tests
- T06 work + app: three-pane shell (CollectionList | ViewerShell |
  EvidenceSidebar) wired through EngineProvider; EngineContext lives in
  src/work/ to respect the work↛app boundary; SpikeApp deleted
- T07 AnnotationToolbar: pendingSelection in context; Save runs
  createSelectors → engine.annotations.create → engine.evidence.create
- T08 click-to-reopen + localStorage persistence
  - scrollToAnnotation state in context with a version counter so a
    second click on the same item re-fires the viewer scroll
  - captureSnapshot/restoreSnapshot/attachPersister/restoreFromStorage;
    restore bypasses services to avoid event-loops
  - active-document id persisted alongside the snapshot so reload lands
    on the same fixture; ADR-0005 written
  - 9 persistence tests
- T09 tests/integration/app-prd-scenario.dom.test.tsx
  - end-to-end happy-dom test of PRD scenario steps 1-8 through the real
    React tree; viewer + ingest mocked per ADR-0004's headless-Chromium
    limitation. Fixed memo-deps bug in EvidenceSidebar/ViewerShell where
    useEngineEventTick values were not included in the useMemo deps,
    leaving stale memoization across event-driven re-renders
- vitest.config.ts: happy-dom for *.dom.test.{ts,tsx} files
- noEmit added to tsconfig so tsc -b doesn't litter src/ with .js outputs

Gates: typecheck ✓ lint ✓ test 109/109 across 11 files ✓ build ✓

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 10:58:11 +02:00


ADR-0005 — Persistence for the MVP slice

  • Status: accepted (provisional — durable storage owned by a later workplan)
  • Date: 2026-05-25
  • Workplan: CE-WP-0002-T08 (click-to-reopen requires reload-survival)

Context

CE-WP-0002 needs the click-to-reopen flow to survive a page reload (PRD scenario step 4: "even after a full page reload"). The full persistence design (SQLite local-first vs Postgres server-first) is too large to land inside this slice. wiki/ArchitectureOverview.md §10 lays out the bigger picture, but the workplan explicitly defers the decision.

The engine already runs Map-backed in-memory repositories (src/engine/repos/in-memory.ts). To survive reloads we need some persistence boundary now, without committing to the long-term store.

Options

  • A. localStorage snapshot (this ADR). The SPA serializes the entire engine state into a single JSON blob on every mutation and restores it on mount. No new dependencies; no schema migrations; no networking. The stored blob is shared per origin, but each tab keeps its own in-memory state, so there is no live sync between tabs.
  • B. IndexedDB-backed store. More headroom, more API surface, async reads. Needed eventually for binary blobs (PDF bytes) but overkill for the few hundred annotations the MVP produces.
  • C. SQLite via sql.js or wa-sqlite. Brings query semantics into the browser. Heavy for the MVP and entangles us with a database we may not keep.
  • D. Server-backed persistence from day one. Requires shipping a backend. Premature.

Decision

Adopt A: localStorage snapshot, deliberately temporary.

Implementation lives in src/engine/persistence.ts:

  • captureSnapshot(engine) returns { documents, representations, annotations, evidenceItems }.
  • attachPersister(engine, { key }) subscribes to every mutating engine event and writes a fresh snapshot to localStorage after each.
  • restoreFromStorage(engine, { key }) reads the snapshot on app mount and hydrates the repos directly, bypassing service create() calls, so no spurious *Created events fire. Otherwise the persister would be re-triggered by the restore itself, and other UI listeners would see "the same annotation was created again" on every reload.
  • Snapshot is versioned (SNAPSHOT_VERSION = 1); a version mismatch throws on restore so a future schema bump is loud.
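
To make the three functions above concrete, here is a minimal sketch of how they could fit together. It is not the contents of src/engine/persistence.ts: the EngineLike shape, the onMutation subscription, the { version, data } envelope, and the injected storage option are all assumptions made for illustration.

```typescript
// Simplified, hypothetical shapes for illustration only.
type Entity = { id: string };
type RepoName = "documents" | "representations" | "annotations" | "evidenceItems";
type Snapshot = Record<RepoName, Entity[]>;
// Assumed envelope: the ADR says the snapshot is versioned, not where the version lives.
type StoredSnapshot = { version: number; data: Snapshot };

interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface EngineLike {
  repos: Record<RepoName, Map<string, Entity>>;
  // Assumed subscription API: the listener runs after every mutating event;
  // the return value unsubscribes.
  onMutation(listener: () => void): () => void;
}

type PersistOptions = { key: string; storage: KeyValueStore };

const SNAPSHOT_VERSION = 1;
const REPO_NAMES: RepoName[] = ["documents", "representations", "annotations", "evidenceItems"];

function captureSnapshot(engine: EngineLike): Snapshot {
  const snapshot = {} as Snapshot;
  for (const name of REPO_NAMES) {
    snapshot[name] = Array.from(engine.repos[name].values());
  }
  return snapshot;
}

function attachPersister(engine: EngineLike, { key, storage }: PersistOptions): () => void {
  // Rewrite the whole blob after each mutation.
  return engine.onMutation(() => {
    const stored: StoredSnapshot = { version: SNAPSHOT_VERSION, data: captureSnapshot(engine) };
    storage.setItem(key, JSON.stringify(stored));
  });
}

function restoreFromStorage(engine: EngineLike, { key, storage }: PersistOptions): boolean {
  const raw = storage.getItem(key);
  if (raw === null) return false; // nothing saved yet
  const stored = JSON.parse(raw) as StoredSnapshot;
  if (stored.version !== SNAPSHOT_VERSION) {
    throw new Error(`Snapshot version ${stored.version} does not match ${SNAPSHOT_VERSION}`);
  }
  for (const name of REPO_NAMES) {
    // Write straight into the Map: no service call, so no *Created event fires.
    for (const entity of stored.data[name]) {
      engine.repos[name].set(entity.id, entity);
    }
  }
  return true;
}
```

In this sketch the restore never goes through onMutation, which is what keeps an attached persister (and any other listener) silent during hydration.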

src/work/EngineContext.tsx's EngineProvider wires this on first mount. A sibling localStorage key holds the last-active documentId so reload lands the user back on the same fixture.
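
A sibling key of this kind could be handled as in the sketch below. The key name, the helper names, and the fallback for an id that no longer matches a restored document are hypothetical; the ADR only states that such a key exists.

```typescript
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key name; the real one is not given in this ADR.
const ACTIVE_DOCUMENT_KEY = "ce.activeDocumentId";

function saveActiveDocumentId(storage: KeyValueStore, documentId: string | null): void {
  if (documentId === null) {
    storage.removeItem(ACTIVE_DOCUMENT_KEY);
  } else {
    storage.setItem(ACTIVE_DOCUMENT_KEY, documentId);
  }
}

function loadActiveDocumentId(
  storage: KeyValueStore,
  knownIds: ReadonlySet<string>,
): string | null {
  const saved = storage.getItem(ACTIVE_DOCUMENT_KEY);
  // Assumed fallback: ignore an id that does not match any restored document.
  return saved !== null && knownIds.has(saved) ? saved : null;
}
```

Keeping the id outside the snapshot means a UI-only change (switching documents) does not have to rewrite the full engine blob.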

Why this is acceptable for the MVP

  • The engine never holds PDF bytes, only metadata, selectors, and commentary. A typical session is well under 1 MB even with hundreds of annotations, comfortably within the roughly 5 MB per-origin localStorage quota that browsers typically enforce.
  • The repositories' create() signatures already match the shape an eventual durable repo would expose; swapping the implementation is a localised change.
  • "Survives reload" is the only persistence requirement of CE-WP-0002. Cross-device sync, multi-user access, query-by-tag, history — none are in scope yet.
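
The size argument above can be sanity-checked with a rough estimate like the one below. The two-bytes-per-code-unit figure is a conservative planning assumption, and the sample annotations are made up purely to exercise the helper; they are not measurements from the app.

```typescript
// Assumption: count 2 bytes per UTF-16 code unit as a conservative upper bound.
const BYTES_PER_CODE_UNIT = 2;
// The ~5 MB budget cited above.
const LOCAL_STORAGE_BUDGET_BYTES = 5 * 1024 * 1024;

function estimateSerializedBytes(value: unknown): number {
  // JSON.stringify returns undefined for some inputs at runtime; treat that as empty.
  const json = JSON.stringify(value) ?? "";
  return json.length * BYTES_PER_CODE_UNIT;
}

function fitsBudget(value: unknown, budgetBytes: number = LOCAL_STORAGE_BUDGET_BYTES): boolean {
  return estimateSerializedBytes(value) <= budgetBytes;
}

// Made-up sample: 300 annotations, each with a 200-character quote
// and a 300-character commentary.
const sampleAnnotations = Array.from({ length: 300 }, (_, i) => ({
  id: `annotation-${i}`,
  quote: "q".repeat(200),
  commentary: "c".repeat(300),
}));

const sampleBytes = estimateSerializedBytes({ annotations: sampleAnnotations });
```

Under these assumptions the sample stays well under 1 MB, which is consistent with the budget argument; real sessions would need measuring against actual selector payloads.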

What this defers

  • A real persistence ADR (SQLite local-first vs Postgres server-first vs IndexedDB) for CE-WP-0005+ work.
  • PDF byte persistence. Today the SPA re-fetches /fixtures/pdfs/* on load; bytes do not enter the snapshot.
  • Multi-tab consistency. Tabs see each other's writes only on reload, and because every write replaces the whole snapshot, the last tab to write wins.
  • Migrations beyond the version check.

Consequences

  • src/engine/persistence.ts is the single point of contact for storage. When the real durable-store ADR lands, that module is what changes.
  • Tests inject a memory-Storage shim into attachPersister / restoreFromStorage so they don't depend on a browser environment (see src/engine/persistence.test.ts).
  • Clearing the user's browser storage destroys all annotations — call this out in the README once the MVP ships.
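
The memory-Storage shim mentioned in the consequences above could look like the following sketch. It mirrors the method names of the Web Storage API but is an illustration, not the shim used in src/engine/persistence.test.ts; the class name is hypothetical.

```typescript
// In-memory stand-in exposing the same method names as the browser's Storage object.
class MemoryStorage {
  private readonly items = new Map<string, string>();

  get length(): number {
    return this.items.size;
  }

  key(index: number): string | null {
    // Map preserves insertion order, so index lookups are stable.
    return Array.from(this.items.keys())[index] ?? null;
  }

  getItem(key: string): string | null {
    const value = this.items.get(key);
    return value === undefined ? null : value;
  }

  setItem(key: string, value: string): void {
    // The real Storage coerces values to strings; do the same here.
    this.items.set(key, String(value));
  }

  removeItem(key: string): void {
    this.items.delete(key);
  }

  clear(): void {
    this.items.clear();
  }
}
```

A test would construct a fresh MemoryStorage per case and pass it wherever the persistence functions accept a storage object, so no browser or happy-dom environment is needed for those tests.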