Implement CE-WP-0002 T03-T09: ingest, anchor resolution, engine, UI, persistence, e2e

Completes the PDF review slice end-to-end. After this commit a user can
open a fixture, select text, save an evidence item with commentary, see
it in the sidebar, reload the page, click the item, and the viewer
scrolls to the passage.

- T03 src/source/pdf/{fingerprint,extract,ingest}.ts + 39 fixture tests
  - SHA-256 fingerprint over a fresh ArrayBuffer (TS BufferSource-safe)
  - PDF.js text extract; per-page normalize then join with "\n\n"
  - PageMap + OffsetMap (gap-free coverage); pageLength = end - start
  - Updated manifest's Betriebskosten quote to one PDF.js extracts cleanly
- T04 src/anchor/selectors/{create,resolve}.ts + 25 unit + 7 fixture tests
  - createSelectors emits the maximal redundant set (TextQuote +
    TextPosition + PdfRect + PdfPageText when available)
  - resolveSelectors implements the SharedContracts §7 ladder; confidence
    1.0 (pos+quote) → 0.7 (rect-only) → 0 (unresolved)
  - Cross-module integration test moved to tests/integration/ to honor
    the anchor↛source boundary lint rule
- T05 engine: sync event bus over the closed §4 vocabulary, Map-backed
  repos, services, createEngine() composition root, 12 tests
- T06 work + app: three-pane shell (CollectionList | ViewerShell |
  EvidenceSidebar) wired through EngineProvider; EngineContext lives in
  src/work/ to respect the work↛app boundary; SpikeApp deleted
- T07 AnnotationToolbar: pendingSelection in context; Save runs
  createSelectors → engine.annotations.create → engine.evidence.create
- T08 click-to-reopen + localStorage persistence
  - scrollToAnnotation state in context with a version counter so a
    second click on the same item re-fires the viewer scroll
  - captureSnapshot/restoreSnapshot/attachPersister/restoreFromStorage;
    restore bypasses services to avoid event-loops
  - active-document id persisted alongside the snapshot so reload lands
    on the same fixture; ADR-0005 written
  - 9 persistence tests
- T09 tests/integration/app-prd-scenario.dom.test.tsx
  - end-to-end happy-dom test of PRD scenario steps 1-8 through the real
    React tree; viewer + ingest mocked per ADR-0004's headless-Chromium
    limitation. Fixed memo-deps bug in EvidenceSidebar/ViewerShell where
    useEngineEventTick values were not included in the useMemo deps,
    leaving stale memoization across event-driven re-renders
- vitest.config.ts: happy-dom for *.dom.test.{ts,tsx} files
- noEmit added to tsconfig so tsc -b doesn't litter src/ with .js outputs

Gates: typecheck ✓ lint ✓ test 109/109 across 11 files ✓ build ✓

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-25 10:58:11 +02:00
parent 2a7b05c190
commit d54daf2e61
45 changed files with 3655 additions and 277 deletions

View File

@@ -1,38 +1,85 @@
# ADR-0005 — Persistence layer (MVP and beyond)
# ADR-0005 — Persistence for the MVP slice
- Status: proposed
- Date: 2026-05-24
- Workplan: CE-WP-0001-T07 (stub); MVP placeholder in CE-WP-0002-T08
- Status: accepted (provisional — durable storage owned by a later workplan)
- Date: 2026-05-25
- Workplan: CE-WP-0002-T08 (click-to-reopen requires reload-survival)
## Context
The MVP needs persistence so that "click an evidence item and have the PDF
jump to and highlight the passage — even after a full page reload" works
(PRD §20 step 4). The acceptable MVP shortcut is `localStorage` (decided
explicitly in CE-WP-0002-T08).
CE-WP-0002 needs the click-to-reopen flow to survive a page reload (PRD
scenario step 4 → "even after a full page reload"). The full persistence
design (SQLite local-first vs Postgres server-first) is too large to land
inside this slice — `wiki/ArchitectureOverview.md` §10 lays out the bigger
picture but the workplan explicitly defers the decision.
This ADR is the durable home for the real persistence decision: where do
documents, annotations, evidence items, links, and sets live in v1.0?
The engine already runs `Map`-backed in-memory repositories
(`src/engine/repos/in-memory.ts`). To survive reloads we need *some*
persistence boundary now, without committing to the long-term store.
## Options
- **A. Browser-local only (IndexedDB via `idb` or `dexie`)**
- Pros: zero infra; great for a single-user reference workspace.
- Cons: no cross-device sync; export/import only via files.
- **B. Local-first + sync server (e.g. CRDT-backed)**
- Pros: matches the long-term vision of a workspace tool; conflict-free
multi-device.
- Cons: significant infra and CRDT design cost; out of MVP scope.
- **C. Traditional client/server with a REST or GraphQL API**
- Pros: familiar; easy team-sharing story.
- Cons: requires hosting; loses the local-first character.
- **A. localStorage snapshot (this ADR).** The SPA serializes the entire
engine state into a single JSON blob on every mutation and restores it
on mount. No new dependencies; no schema migrations; no networking.
Per-tab only.
- **B. IndexedDB-backed store.** More headroom, more API surface, async
reads. Needed eventually for binary blobs (PDF bytes) but overkill for
the few hundred annotations the MVP produces.
- **C. SQLite via `sql.js` or `wa-sqlite`.** Brings query semantics into
the browser. Heavy for the MVP and entangles us with a database we may
not keep.
- **D. Server-backed persistence from day one.** Requires shipping a
backend. Premature.
## Decision
(blank — to be answered before the second product slice past MVP.)
Adopt **A: localStorage snapshot**, deliberately temporary.
Implementation lives in `src/engine/persistence.ts`:
- `captureSnapshot(engine)` returns
`{ documents, representations, annotations, evidenceItems }`.
- `attachPersister(engine, { key })` subscribes to every mutating engine
event and writes a fresh snapshot to `localStorage` after each.
- `restoreFromStorage(engine, { key })` reads the snapshot on app mount
and hydrates the repos *directly* (bypassing service `create()` calls)
so no spurious `*Created` events fire — the persister would otherwise
loop on its own writes, and other UI listeners would see "the same
annotation was created again" on every reload.
- Snapshot is versioned (`SNAPSHOT_VERSION = 1`); a version mismatch
throws on restore so a future schema bump is loud.
`src/work/EngineContext.tsx`'s `EngineProvider` wires this on first mount.
A sibling localStorage key holds the last-active `documentId` so reload
lands the user back on the same fixture.
## Why this is acceptable for the MVP
- The engine never holds PDF bytes — only metadata + selectors + commentary.
A typical session is well under 1 MB even with hundreds of annotations,
comfortably within the ~5 MB localStorage budget.
- The repositories' `create()` signatures already match the shape an
eventual durable repo would expose; swapping the implementation is a
localised change.
- "Survives reload" is the only persistence requirement of CE-WP-0002.
Cross-device sync, multi-user access, query-by-tag, history — none are
in scope yet.
## What this defers
- A real persistence ADR (SQLite local-first vs Postgres server-first vs
IndexedDB) for CE-WP-0005+ work.
- PDF byte persistence. Today the SPA re-fetches `/fixtures/pdfs/*` on
load; bytes do not enter the snapshot.
- Multi-tab consistency. Tabs see each other's writes only on reload.
- Migrations beyond the version check.
## Consequences
(blank)
- `src/engine/persistence.ts` is the single point of contact for storage.
When the real durable-store ADR lands, that module is what changes.
- Tests inject a memory-Storage shim into `attachPersister` /
`restoreFromStorage` so they don't depend on a browser environment
(see `src/engine/persistence.test.ts`).
- Clearing the user's browser storage destroys all annotations — call
this out in the README once the MVP ships.

View File

@@ -1,14 +1,14 @@
{
"_schema_version": 1,
"_description": "PDF fixture corpus for citation-evidence selector tests. Each entry binds a stable id (used by test code) to a file path, page count, and a verbatim known-good quote with its 1-indexed physical PDF page number. The quote is short, unique within the document, and chosen to round-trip cleanly through the canonical text normalizer.",
"_provenance": "Page counts and quotes extracted on 2026-05-24 by reading each PDF directly. The Betriebskosten file is a scanned/handwritten form with noisy OCR text — its quote is taken from the reliably-extracted printed boilerplate, not from the handwritten fields.",
"_provenance": "Page counts and quotes extracted on 2026-05-24 by reading each PDF directly, then re-verified on 2026-05-25 against the PDF.js v4 text extractor used by src/source/pdf/extract.ts. The Betriebskosten file is a scanned/handwritten form with noisy OCR text — its known-good quote was updated 2026-05-25 from 'Ich bitte um Überweisung auf das Konto bei' to 'Auf der Rückseite finden Sie Ihre Abrechnung' because PDF.js drops the capital-Ü in the original (the lowercase-ü in 'Rückseite' survives, so the new quote still exercises the umlaut code path).",
"fixtures": [
{
"id": "betriebskosten-2024",
"filename": "031-Kemal Güldag Betriebskosten 2024.pdf",
"description": "German Betriebskostenabrechnung (utility-cost statement) for a Seeheim apartment — scanned cover letter + filled-in Abrechnung form. OCR-noisy text and handwritten field values. Useful for stress-testing canonical normalization and selector resolution on imperfect extraction.",
"page_count": 2,
"known_good_quote": "Ich bitte um Überweisung auf das Konto bei",
"known_good_quote": "Auf der Rückseite finden Sie Ihre Abrechnung",
"known_good_quote_page": 1,
"characteristics": ["german", "umlauts", "scanned", "ocr-noisy", "form", "handwritten"]
},

View File

@@ -3,7 +3,7 @@
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>citation-evidence · spike</title>
<title>citation-evidence</title>
</head>
<body style="margin:0">
<div id="root"></div>

View File

@@ -25,6 +25,9 @@
"react-pdf-highlighter-plus": "^1.1.4"
},
"devDependencies": {
"@testing-library/dom": "^10.4.1",
"@testing-library/react": "^16.3.2",
"@testing-library/user-event": "^14.6.1",
"@types/node": "^20.14.0",
"@types/react": "^18.3.3",
"@types/react-dom": "^18.3.0",
@@ -34,6 +37,7 @@
"eslint-plugin-boundaries": "^4.2.2",
"eslint-plugin-import": "^2.30.0",
"globals": "^15.9.0",
"happy-dom": "^20.9.0",
"typescript": "^5.5.4",
"typescript-eslint": "^8.0.0",
"vite": "^5.4.0",

183
pnpm-lock.yaml generated
View File

@@ -21,6 +21,15 @@ importers:
specifier: ^1.1.4
version: 1.1.4(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(pdfjs-dist@4.10.38)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
devDependencies:
'@testing-library/dom':
specifier: ^10.4.1
version: 10.4.1
'@testing-library/react':
specifier: ^16.3.2
version: 16.3.2(@testing-library/dom@10.4.1)(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
'@testing-library/user-event':
specifier: ^14.6.1
version: 14.6.1(@testing-library/dom@10.4.1)
'@types/node':
specifier: ^20.14.0
version: 20.19.41
@@ -48,6 +57,9 @@ importers:
globals:
specifier: ^15.9.0
version: 15.15.0
happy-dom:
specifier: ^20.9.0
version: 20.9.0
typescript:
specifier: ^5.5.4
version: 5.9.3
@@ -59,7 +71,7 @@ importers:
version: 5.4.21(@types/node@20.19.41)
vitest:
specifier: ^2.0.5
version: 2.1.9(@types/node@20.19.41)
version: 2.1.9(@types/node@20.19.41)(happy-dom@20.9.0)
packages:
@@ -134,6 +146,10 @@ packages:
peerDependencies:
'@babel/core': ^7.0.0-0
'@babel/runtime@7.29.2':
resolution: {integrity: sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==}
engines: {node: '>=6.9.0'}
'@babel/template@7.28.6':
resolution: {integrity: sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==}
engines: {node: '>=6.9.0'}
@@ -1051,9 +1067,37 @@ packages:
'@tanstack/virtual-core@3.15.0':
resolution: {integrity: sha512-0AwPGx0I8QxPYjAxShT/+z+ZOe9u8mW5rsXvivCTjRfRmz9a43+3mRyi4wwlyoUqOC56q/jatKa0Bh9M99BEHQ==}
'@testing-library/dom@10.4.1':
resolution: {integrity: sha512-o4PXJQidqJl82ckFaXUeoAW+XysPLauYI43Abki5hABd853iMhitooc6znOnczgbTYmEP6U6/y1ZyKAIsvMKGg==}
engines: {node: '>=18'}
'@testing-library/react@16.3.2':
resolution: {integrity: sha512-XU5/SytQM+ykqMnAnvB2umaJNIOsLF3PVv//1Ew4CTcpz0/BRyy/af40qqrt7SjKpDdT1saBMc42CUok5gaw+g==}
engines: {node: '>=18'}
peerDependencies:
'@testing-library/dom': ^10.0.0
'@types/react': ^18.0.0 || ^19.0.0
'@types/react-dom': ^18.0.0 || ^19.0.0
react: ^18.0.0 || ^19.0.0
react-dom: ^18.0.0 || ^19.0.0
peerDependenciesMeta:
'@types/react':
optional: true
'@types/react-dom':
optional: true
'@testing-library/user-event@14.6.1':
resolution: {integrity: sha512-vq7fv0rnt+QTXgPxr5Hjc210p6YKq2kmdziLgnsZGgLJ9e6VAShx1pACLuRjd/AS/sr7phAR58OIIpf0LlmQNw==}
engines: {node: '>=12', npm: '>=6'}
peerDependencies:
'@testing-library/dom': '>=7.21.4'
'@tybys/wasm-util@0.10.2':
resolution: {integrity: sha512-RoBvJ2X0wuKlWFIjrwffGw1IqZHKQqzIchKaadZZfnNpsAYp2mM0h36JtPCjNDAHGgYez/15uMBpfGwchhiMgg==}
'@types/aria-query@5.0.4':
resolution: {integrity: sha512-rfT93uj5s0PRL7EzccGMs3brplhcrghnDoV26NqKhCAS1hVo+WdNsPvE/yb6ilfr5hi2MEk6d5EWJTKdxg8jVw==}
'@types/babel__core@7.20.5':
resolution: {integrity: sha512-qoQprZvz5wQFJwMDqeseRXWv3rqMvhgpbXFfVyWhbx9X47POIA6i/+dXefEmZKoAgOaTdaIgNSMqMIU61yRyzA==}
@@ -1092,6 +1136,12 @@ packages:
'@types/react@18.3.29':
resolution: {integrity: sha512-ch0qJdr2JY0r04NXSprbK6TXOgnaJ1Tz23fm5W+z0/CBah6BSBc3n96h7K9GOtwh0HrilNWHIBzE1Ko4Dcw/Wg==}
'@types/whatwg-mimetype@3.0.2':
resolution: {integrity: sha512-c2AKvDT8ToxLIOUlN51gTiHXflsfIFisS4pO7pDPoKouJCESkhZnEy623gwP9laCy5lnLDAw1vAzu2vM2YLOrA==}
'@types/ws@8.18.1':
resolution: {integrity: sha512-ThVF6DCVhA8kUGy+aazFQ4kXQ7E1Ty7A3ypFOe0IcJV8O/M511G99AW24irKrW56Wt44yG9+ij8FaqoBGkuBXg==}
'@typescript-eslint/eslint-plugin@8.59.4':
resolution: {integrity: sha512-PegsU+XfyJJNjd4+u/k6f9yTyp0lEXXiPopUNobZcIAUJFGICFLN+sP0Rb3JehVmiij1Ph0dFGYqODoRo/2+6A==}
engines: {node: ^18.18.0 || ^20.9.0 || >=21.1.0}
@@ -1309,10 +1359,18 @@ packages:
ajv@6.15.0:
resolution: {integrity: sha512-fgFx7Hfoq60ytK2c7DhnF8jIvzYgOMxfugjLOSMHjLIPgenqa7S7oaagATUq99mV6IYvN2tRmC0wnTYX6iPbMw==}
ansi-regex@5.0.1:
resolution: {integrity: sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==}
engines: {node: '>=8'}
ansi-styles@4.3.0:
resolution: {integrity: sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==}
engines: {node: '>=8'}
ansi-styles@5.2.0:
resolution: {integrity: sha512-Cxwpt2SfTzTtXcfOlzGEee8O+c+MmUgGrNiBcXnuWxuFJHe6a5Hz7qwhwe5OgaSYI0IJvkLqWX1ASG+cJOkEiA==}
engines: {node: '>=10'}
argparse@2.0.1:
resolution: {integrity: sha512-8+9WqebbFzpX9OR+Wa6O29asIogeRMzcGtAINdpMHHyAg10f05aSFVBbcEqGf/PXw1EjAZ+q2/bEBg3DvurK3Q==}
@@ -1320,6 +1378,9 @@ packages:
resolution: {integrity: sha512-ik3ZgC9dY/lYVVM++OISsaYDeg1tb0VtP5uL3ouh1koGOaUMDPpbFIei4JkFimWUFPn90sbMNMXQAIVOlnYKJA==}
engines: {node: '>=10'}
aria-query@5.3.0:
resolution: {integrity: sha512-b0P0sZPKtyu8HkeRAfCq0IfURZK+SuwMjY1UXGBU27wpAiTwQAIlq56IbIO+ytk/JjS1fMR14ee5WBBfKi5J6A==}
array-buffer-byte-length@1.0.2:
resolution: {integrity: sha512-LHE+8BuR7RYGDKvnrmcuSq3tDcKv9OFEXQt/HpbZhY7V6h0zlUXutnAD82GiFx9rdieCMjkvtcsPqBwgUl1Iiw==}
engines: {node: '>= 0.4'}
@@ -1490,6 +1551,10 @@ packages:
resolution: {integrity: sha512-8QmQKqEASLd5nx0U1B1okLElbUuuttJ/AnYmRXbbbGDWh6uS208EjD4Xqq/I9wK7u0v6O08XhTWnt5XtEbR6Dg==}
engines: {node: '>= 0.4'}
dequal@2.0.3:
resolution: {integrity: sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA==}
engines: {node: '>=6'}
detect-node-es@1.1.0:
resolution: {integrity: sha512-ypdmJU/TbBby2Dxibuv7ZLW3Bs1QEmM7nHjEANfohJLvE0XVujisn1qPJcZxg+qDucsr+bP6fLD1rPS3AhJ7EQ==}
@@ -1497,6 +1562,9 @@ packages:
resolution: {integrity: sha512-35mSku4ZXK0vfCuHEDAwt55dg2jNajHZ1odvF+8SSr82EsZY4QmXfuWso8oEd8zRhVObSN18aM0CjSdoBX7zIw==}
engines: {node: '>=0.10.0'}
dom-accessibility-api@0.5.16:
resolution: {integrity: sha512-X7BJ2yElsnOJ30pZF4uIIDfBEVgF4XEBxL9Bxhy6dnrm5hkzqmsWHGTiHqRiITNhMyFLyAiWndIJP7Z1NTteDg==}
dunder-proto@1.0.1:
resolution: {integrity: sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==}
engines: {node: '>= 0.4'}
@@ -1504,6 +1572,10 @@ packages:
electron-to-chromium@1.5.361:
resolution: {integrity: sha512-Q6Hts7N9FnJc5LeGRINFvLhCI9xZmNtTDe5ZbcVezQz7cU4a8Aua3GH1b8J2XY8Al9PF+OCwYqhgsOOheMdvkA==}
entities@7.0.1:
resolution: {integrity: sha512-TWrgLOFUQTH994YUyl1yT4uyavY5nNB5muff+RtWaqNVCAK408b5ZnnbNAUEWLTCpum9w6arT70i1XdQ4UeOPA==}
engines: {node: '>=0.12'}
es-abstract@1.24.2:
resolution: {integrity: sha512-2FpH9Q5i2RRwyEP1AylXe6nYLR5OhaJTZwmlcP0dL/+JCbgg7yyEo/sEK6HeGZRf3dFpWwThaRHVApXSkW3xeg==}
engines: {node: '>= 0.4'}
@@ -1781,6 +1853,10 @@ packages:
resolution: {integrity: sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==}
engines: {node: '>= 0.4'}
happy-dom@20.9.0:
resolution: {integrity: sha512-GZZ9mKe8r646NUAf/zemnGbjYh4Bt8/MqASJY+pSm5ZDtc3YQox+4gsLI7yi1hba6o+eCsGxpHn5+iEVn31/FQ==}
engines: {node: '>=20.0.0'}
has-bigints@1.1.0:
resolution: {integrity: sha512-R3pbpkcIqv2Pm3dUwgjclDRVmWpTJW2DcMzcIhEXEx1oh/CEMObMm3KLmRJOdvhM7o4uQBnwr8pzRK2sJWIqfg==}
engines: {node: '>= 0.4'}
@@ -1999,6 +2075,10 @@ packages:
peerDependencies:
react: ^16.5.1 || ^17.0.0 || ^18.0.0 || ^19.0.0
lz-string@1.5.0:
resolution: {integrity: sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ==}
hasBin: true
magic-string@0.30.21:
resolution: {integrity: sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==}
@@ -2147,6 +2227,10 @@ packages:
resolution: {integrity: sha512-vkcDPrRZo1QZLbn5RLGPpg/WmIQ65qoWWhcGKf/b5eplkkarX0m9z8ppCat4mlOqUsWpyNuYgO3VRyrYHSzX5g==}
engines: {node: '>= 0.8.0'}
pretty-format@27.5.1:
resolution: {integrity: sha512-Qb1gy5OrP5+zDf2Bvnzdl3jsTf1qXVMazbvCoKhtKqVs4/YK4ozX4gKQJJVyNe+cajNPn0KoC0MC3FUmaHWEmQ==}
engines: {node: ^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0}
prop-types@15.8.1:
resolution: {integrity: sha512-oj87CgZICdulUohogVAR7AjlC0327U4el4L6eAvOqCeudMDVU0NThNaV+b9Df4dXgSP1gXMTnPdhfe/2qDH5cg==}
@@ -2174,6 +2258,9 @@ packages:
react-is@16.13.1:
resolution: {integrity: sha512-24e6ynE2H+OKt4kqsOvNd8kBpV65zoxbA4BVsEOB3ARVWQki/DHzaUoC5KuON/BiccDaCCTZBuOcfZs70kR8bQ==}
react-is@17.0.2:
resolution: {integrity: sha512-w2GsyukL62IJnlaff/nRegPQR94C/XXamvMWmSHRJ4y7Ts/4ocGRmTHvOs8PSE6pB3dWOrD/nueuU5sduBsQ4w==}
react-pdf-highlighter-plus@1.1.4:
resolution: {integrity: sha512-cJPFZnKjp4mmPjnamh11eC2I0W4waFAwLLG1E3mTg4TQRyMyUY+C6SyUm8MAcQnogbaXIAvCXP9B4hsnTSflnA==}
peerDependencies:
@@ -2542,6 +2629,10 @@ packages:
jsdom:
optional: true
whatwg-mimetype@3.0.0:
resolution: {integrity: sha512-nt+N2dzIutVRxARx1nghPKGv1xHikU7HKdfafKkLNLindmPU/ch3U31NOCGGA/dmPcmb1VlofO0vnKAcsm0o/Q==}
engines: {node: '>=12'}
which-boxed-primitive@1.1.1:
resolution: {integrity: sha512-TbX3mj8n0odCBFVlY8AxkqcHASw3L60jIuF8jFP78az3C2YhmGvqbHBpAjTRH2/xqYunrJ9g1jSyjCjpoWzIAA==}
engines: {node: '>= 0.4'}
@@ -2572,6 +2663,18 @@ packages:
resolution: {integrity: sha512-BN22B5eaMMI9UMtjrGd5g5eCYPpCPDUy0FJXbYsaT5zYxjFOckS53SQDE3pWkVoWpHXVb3BrYcEN4Twa55B5cA==}
engines: {node: '>=0.10.0'}
ws@8.21.0:
resolution: {integrity: sha512-Vsp28b7DRcimFQvrqu2Wek3z1iYxDCWqHYB8Qsnk/S4RfaCQzPGPyBNuVjJV3cd6UiKtUtp6sNM77gWvzcCH+g==}
engines: {node: '>=10.0.0'}
peerDependencies:
bufferutil: ^4.0.1
utf-8-validate: '>=5.0.2'
peerDependenciesMeta:
bufferutil:
optional: true
utf-8-validate:
optional: true
yallist@3.1.1:
resolution: {integrity: sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==}
@@ -2670,6 +2773,8 @@ snapshots:
'@babel/core': 7.29.0
'@babel/helper-plugin-utils': 7.28.6
'@babel/runtime@7.29.2': {}
'@babel/template@7.28.6':
dependencies:
'@babel/code-frame': 7.29.0
@@ -3482,11 +3587,38 @@ snapshots:
'@tanstack/virtual-core@3.15.0': {}
'@testing-library/dom@10.4.1':
dependencies:
'@babel/code-frame': 7.29.0
'@babel/runtime': 7.29.2
'@types/aria-query': 5.0.4
aria-query: 5.3.0
dom-accessibility-api: 0.5.16
lz-string: 1.5.0
picocolors: 1.1.1
pretty-format: 27.5.1
'@testing-library/react@16.3.2(@testing-library/dom@10.4.1)(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
dependencies:
'@babel/runtime': 7.29.2
'@testing-library/dom': 10.4.1
react: 18.3.1
react-dom: 18.3.1(react@18.3.1)
optionalDependencies:
'@types/react': 18.3.29
'@types/react-dom': 18.3.7(@types/react@18.3.29)
'@testing-library/user-event@14.6.1(@testing-library/dom@10.4.1)':
dependencies:
'@testing-library/dom': 10.4.1
'@tybys/wasm-util@0.10.2':
dependencies:
tslib: 2.8.1
optional: true
'@types/aria-query@5.0.4': {}
'@types/babel__core@7.20.5':
dependencies:
'@babel/parser': 7.29.3
@@ -3531,6 +3663,12 @@ snapshots:
'@types/prop-types': 15.7.15
csstype: 3.2.3
'@types/whatwg-mimetype@3.0.2': {}
'@types/ws@8.18.1':
dependencies:
'@types/node': 20.19.41
'@typescript-eslint/eslint-plugin@8.59.4(@typescript-eslint/parser@8.59.4(eslint@9.39.4)(typescript@5.9.3))(eslint@9.39.4)(typescript@5.9.3)':
dependencies:
'@eslint-community/regexpp': 4.12.2
@@ -3757,16 +3895,24 @@ snapshots:
json-schema-traverse: 0.4.1
uri-js: 4.4.1
ansi-regex@5.0.1: {}
ansi-styles@4.3.0:
dependencies:
color-convert: 2.0.1
ansi-styles@5.2.0: {}
argparse@2.0.1: {}
aria-hidden@1.2.6:
dependencies:
tslib: 2.8.1
aria-query@5.3.0:
dependencies:
dequal: 2.0.3
array-buffer-byte-length@1.0.2:
dependencies:
call-bound: 1.0.4
@@ -3956,12 +4102,16 @@ snapshots:
has-property-descriptors: 1.0.2
object-keys: 1.1.1
dequal@2.0.3: {}
detect-node-es@1.1.0: {}
doctrine@2.1.0:
dependencies:
esutils: 2.0.3
dom-accessibility-api@0.5.16: {}
dunder-proto@1.0.1:
dependencies:
call-bind-apply-helpers: 1.0.2
@@ -3970,6 +4120,8 @@ snapshots:
electron-to-chromium@1.5.361: {}
entities@7.0.1: {}
es-abstract@1.24.2:
dependencies:
array-buffer-byte-length: 1.0.2
@@ -4352,6 +4504,18 @@ snapshots:
gopd@1.2.0: {}
happy-dom@20.9.0:
dependencies:
'@types/node': 20.19.41
'@types/whatwg-mimetype': 3.0.2
'@types/ws': 8.18.1
entities: 7.0.1
whatwg-mimetype: 3.0.0
ws: 8.21.0
transitivePeerDependencies:
- bufferutil
- utf-8-validate
has-bigints@1.1.0: {}
has-flag@4.0.0: {}
@@ -4558,6 +4722,8 @@ snapshots:
dependencies:
react: 18.3.1
lz-string@1.5.0: {}
magic-string@0.30.21:
dependencies:
'@jridgewell/sourcemap-codec': 1.5.5
@@ -4704,6 +4870,12 @@ snapshots:
prelude-ls@1.2.1: {}
pretty-format@27.5.1:
dependencies:
ansi-regex: 5.0.1
ansi-styles: 5.2.0
react-is: 17.0.2
prop-types@15.8.1:
dependencies:
loose-envify: 1.4.0
@@ -4732,6 +4904,8 @@ snapshots:
react-is@16.13.1: {}
react-is@17.0.2: {}
react-pdf-highlighter-plus@1.1.4(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(pdfjs-dist@4.10.38)(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
dependencies:
'@radix-ui/react-collapsible': 1.1.12(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
@@ -5179,7 +5353,7 @@ snapshots:
'@types/node': 20.19.41
fsevents: 2.3.3
vitest@2.1.9(@types/node@20.19.41):
vitest@2.1.9(@types/node@20.19.41)(happy-dom@20.9.0):
dependencies:
'@vitest/expect': 2.1.9
'@vitest/mocker': 2.1.9(vite@5.4.21(@types/node@20.19.41))
@@ -5203,6 +5377,7 @@ snapshots:
why-is-node-running: 2.3.0
optionalDependencies:
'@types/node': 20.19.41
happy-dom: 20.9.0
transitivePeerDependencies:
- less
- lightningcss
@@ -5214,6 +5389,8 @@ snapshots:
- supports-color
- terser
whatwg-mimetype@3.0.0: {}
which-boxed-primitive@1.1.1:
dependencies:
is-bigint: 1.1.0
@@ -5266,6 +5443,8 @@ snapshots:
word-wrap@1.2.5: {}
ws@8.21.0: {}
yallist@3.1.1: {}
yocto-queue@0.1.0: {}

View File

@@ -5,3 +5,9 @@ export {
type PdfSpikeViewerProps,
type StoredAnnotation,
} from "./pdf-viewer-adapter-spike";
export {
createSelectors,
resolveSelectors,
DEFAULT_CONTEXT_CHARS,
type CreateSelectorsOptions,
} from "./selectors";

View File

@@ -0,0 +1,136 @@
import { describe, expect, it } from "vitest";
import type { DocumentRepresentation } from "@shared/document";
import type { DocumentId, RepresentationId } from "@shared/ids";
import type {
PdfPageTextSelector,
PdfRectSelector,
TextPositionSelector,
TextQuoteSelector,
} from "@shared/selector";
import { createSelectors } from "./create";
import type { PdfSelectionCapture } from "../types";
function repr(canonicalText: string): DocumentRepresentation {
const pageLength = canonicalText.length;
return {
id: "rep_test" as RepresentationId,
documentId: "doc_test" as DocumentId,
representationType: "pdf-text",
contentHash: "test",
canonicalText,
pageMap: [{ page: 1, width: 595, height: 842 }],
offsetMap: [
{ page: 1, globalStart: 0, globalEnd: pageLength, pageLength },
],
generatedAt: "2026-05-25T00:00:00.000Z",
};
}
function capture(text: string, page = 1, rectsCount = 1): PdfSelectionCapture {
return {
kind: "pdf",
text,
page,
rects: Array.from({ length: rectsCount }, (_, i) => ({
x: 0.1,
y: 0.2 + i * 0.05,
width: 0.5,
height: 0.04,
})),
boundingRect: { x: 0.1, y: 0.2, width: 0.5, height: 0.04 * rectsCount },
};
}
describe("createSelectors", () => {
const text = "The quick brown fox jumps over the lazy dog near the river bank.";
const representation = repr(text);
it("always includes a TextQuoteSelector with prefix and suffix from canonical text", () => {
const sels = createSelectors(capture("brown fox"), representation);
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector");
expect(quote).toBeDefined();
expect(quote!.exact).toBe("brown fox");
expect(quote!.prefix).toBe("The quick ");
expect(quote!.suffix).toBe(" jumps over the lazy dog near th");
});
it("includes a TextPositionSelector pointing at the matched offset", () => {
const sels = createSelectors(capture("brown fox"), representation);
const pos = sels.find((s): s is TextPositionSelector => s.type === "TextPositionSelector");
expect(pos).toBeDefined();
expect(pos!.start).toBe(text.indexOf("brown fox"));
expect(pos!.end).toBe(text.indexOf("brown fox") + "brown fox".length);
});
it("includes a PdfRectSelector mirroring the capture's page and rects", () => {
const c = capture("brown fox", 1, 2);
const sels = createSelectors(c, representation);
const rect = sels.find((s): s is PdfRectSelector => s.type === "PdfRectSelector");
expect(rect).toBeDefined();
expect(rect!.page).toBe(1);
expect(rect!.rects).toEqual(c.rects);
});
it("includes a PdfPageTextSelector when the match falls inside the capture's page range", () => {
const sels = createSelectors(capture("brown fox"), representation);
const pageText = sels.find((s): s is PdfPageTextSelector => s.type === "PdfPageTextSelector");
expect(pageText).toBeDefined();
expect(pageText!.page).toBe(1);
expect(pageText!.start).toBe(text.indexOf("brown fox"));
});
it("omits the TextPositionSelector when the quote cannot be found in canonical text", () => {
const sels = createSelectors(capture("nonexistent phrase"), representation);
const pos = sels.find((s) => s.type === "TextPositionSelector");
expect(pos).toBeUndefined();
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector");
expect(quote!.exact).toBe("nonexistent phrase");
expect(quote!.prefix).toBeUndefined();
expect(quote!.suffix).toBeUndefined();
});
it("clamps prefix at the start of the canonical text", () => {
const sels = createSelectors(capture("The quick"), representation);
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector")!;
expect(quote.prefix).toBeUndefined();
expect(quote.suffix).toBe(" brown fox jumps over the lazy d");
});
it("clamps suffix at the end of the canonical text", () => {
const sels = createSelectors(capture("river bank."), representation);
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector")!;
expect(quote.prefix).toBe("umps over the lazy dog near the ");
expect(quote.suffix).toBeUndefined();
});
it("honors a custom contextChars option", () => {
const sels = createSelectors(capture("brown fox"), representation, { contextChars: 4 });
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector")!;
expect(quote.prefix).toBe("ick ");
expect(quote.suffix).toBe(" jum");
});
it("prefers the on-page match when the quote appears on multiple pages", () => {
// Two-page representation where the quote appears once per page.
const canonical = "alpha echo bravo" + "\n\n" + "charlie echo delta";
const rep: DocumentRepresentation = {
id: "rep_multi" as RepresentationId,
documentId: "doc_multi" as DocumentId,
representationType: "pdf-text",
contentHash: "h",
canonicalText: canonical,
pageMap: [
{ page: 1, width: 100, height: 100 },
{ page: 2, width: 100, height: 100 },
],
offsetMap: [
{ page: 1, globalStart: 0, globalEnd: 18, pageLength: 18 },
{ page: 2, globalStart: 18, globalEnd: canonical.length, pageLength: canonical.length - 18 },
],
generatedAt: "2026-05-25T00:00:00.000Z",
};
const sels = createSelectors(capture("echo", 2), rep);
const pos = sels.find((s): s is TextPositionSelector => s.type === "TextPositionSelector")!;
expect(pos.start).toBe(canonical.indexOf("echo", 18));
});
});

View File

@@ -0,0 +1,157 @@
/**
* Build the maximal `Selector[]` from a viewer's `SelectionCapture`.
*
* Implements the "always store all selector types that are available" rule
* from `wiki/SharedContracts.md` §3 (selector redundancy) and the create
* half of the `AnchorAdapter` contract in
* `wiki/ArchitectureOverview.md` §3.3.
*
* Output guarantee: every returned `Selector[]` includes a
* `TextQuoteSelector` (always) and adds `TextPositionSelector`,
* `PdfRectSelector`, `PdfPageTextSelector` only when the underlying data
* actually supports them. Resolvers can rely on the union being trimmed —
* a missing selector means "not available", not "skipped".
*/
import type { DocumentRepresentation } from "@shared/document";
import { normalize } from "@shared/text/normalize";
import type {
PdfPageTextSelector,
PdfRectSelector,
Selector,
TextPositionSelector,
TextQuoteSelector,
} from "@shared/selector";
import type { PdfSelectionCapture, SelectionCapture } from "../types";
/** Default characters of prefix/suffix context stored on TextQuoteSelector. */
export const DEFAULT_CONTEXT_CHARS = 32;
export interface CreateSelectorsOptions {
readonly contextChars?: number;
}
export function createSelectors(
capture: SelectionCapture,
representation: DocumentRepresentation,
options: CreateSelectorsOptions = {},
): Selector[] {
// `SelectionCapture` is a discriminated union. The DOM branch is `never`
// in MVP, so the only runtime shape is `PdfSelectionCapture`.
return createSelectorsFromPdfCapture(capture, representation, options);
}
function createSelectorsFromPdfCapture(
capture: PdfSelectionCapture,
representation: DocumentRepresentation,
options: CreateSelectorsOptions,
): Selector[] {
const contextChars = options.contextChars ?? DEFAULT_CONTEXT_CHARS;
const normalizedQuote = normalize(capture.text).text;
const out: Selector[] = [];
const canonicalText = representation.canonicalText ?? "";
const positions = canonicalText.length > 0 && normalizedQuote.length > 0
? findAllOccurrences(canonicalText, normalizedQuote)
: [];
// Locate the match that falls on the capture's page (when offsetMap is
// known); otherwise fall back to the first match. If there is no match,
// we still emit a quote-only TextQuoteSelector so the annotation is
// recoverable later if the representation is rebuilt.
const pageRange = representation.offsetMap?.find((r) => r.page === capture.page);
const matchOffset = pickMatch(positions, pageRange);
// 1. TextQuoteSelector — always included.
if (normalizedQuote.length > 0) {
const quote = matchOffset !== null
? buildQuoteSelectorWithContext(canonicalText, matchOffset, normalizedQuote, contextChars)
: ({ type: "TextQuoteSelector", exact: normalizedQuote } satisfies TextQuoteSelector);
out.push(quote);
}
// 2. TextPositionSelector — only when we have a unique-enough match.
if (matchOffset !== null) {
const pos: TextPositionSelector = {
type: "TextPositionSelector",
start: matchOffset,
end: matchOffset + normalizedQuote.length,
};
out.push(pos);
}
// 3. PdfRectSelector — straight from the capture; viewer-coordinate truth.
if (capture.rects.length > 0) {
const rect: PdfRectSelector = {
type: "PdfRectSelector",
page: capture.page,
rects: capture.rects,
};
out.push(rect);
}
// 4. PdfPageTextSelector — when we have offsetMap and a unique-enough match
// that falls inside the capture's page range.
if (matchOffset !== null && pageRange) {
if (matchOffset >= pageRange.globalStart && matchOffset + normalizedQuote.length <= pageRange.globalEnd) {
const pageText: PdfPageTextSelector = {
type: "PdfPageTextSelector",
page: capture.page,
start: matchOffset - pageRange.globalStart,
end: matchOffset - pageRange.globalStart + normalizedQuote.length,
};
out.push(pageText);
}
}
return out;
}
function findAllOccurrences(haystack: string, needle: string): number[] {
if (needle.length === 0) return [];
const out: number[] = [];
let from = 0;
for (;;) {
const idx = haystack.indexOf(needle, from);
if (idx === -1) break;
out.push(idx);
from = idx + 1;
}
return out;
}
function pickMatch(
positions: readonly number[],
pageRange: { globalStart: number; globalEnd: number } | undefined,
): number | null {
if (positions.length === 0) return null;
if (positions.length === 1) return positions[0]!;
if (pageRange) {
const onPage = positions.find(
(p) => p >= pageRange.globalStart && p < pageRange.globalEnd,
);
if (onPage !== undefined) return onPage;
}
// Multiple matches and no page hint — return the first; resolve.ts will
// need prefix/suffix to disambiguate.
return positions[0]!;
}
function buildQuoteSelectorWithContext(
canonicalText: string,
matchOffset: number,
exact: string,
contextChars: number,
): TextQuoteSelector {
const prefixStart = Math.max(0, matchOffset - contextChars);
const suffixEnd = Math.min(canonicalText.length, matchOffset + exact.length + contextChars);
const prefix = canonicalText.slice(prefixStart, matchOffset);
const suffix = canonicalText.slice(matchOffset + exact.length, suffixEnd);
return {
type: "TextQuoteSelector",
exact,
...(prefix.length > 0 ? { prefix } : {}),
...(suffix.length > 0 ? { suffix } : {}),
};
}

View File

@@ -0,0 +1,6 @@
export {
createSelectors,
DEFAULT_CONTEXT_CHARS,
type CreateSelectorsOptions,
} from "./create";
export { resolveSelectors } from "./resolve";

View File

@@ -0,0 +1,137 @@
import { describe, expect, it } from "vitest";
import type { DocumentRepresentation } from "@shared/document";
import type { DocumentId, RepresentationId } from "@shared/ids";
import type { Selector } from "@shared/selector";
import { resolveSelectors } from "./resolve";
function repr(canonicalText: string, pages = 1): DocumentRepresentation {
const segmentLen = pages === 1
? canonicalText.length
: Math.floor(canonicalText.length / pages);
const offsetMap = [];
for (let i = 0; i < pages; i++) {
const start = i * segmentLen;
const end = i === pages - 1 ? canonicalText.length : start + segmentLen;
offsetMap.push({ page: i + 1, globalStart: start, globalEnd: end, pageLength: end - start });
}
return {
id: "rep_test" as RepresentationId,
documentId: "doc_test" as DocumentId,
representationType: "pdf-text",
contentHash: "test",
canonicalText,
pageMap: Array.from({ length: pages }, (_, i) => ({ page: i + 1, width: 595, height: 842 })),
offsetMap,
generatedAt: "2026-05-25T00:00:00.000Z",
};
}
describe("resolveSelectors", () => {
const text = "The quick brown fox jumps over the lazy dog.";
const representation = repr(text);
const brownFoxStart = text.indexOf("brown fox");
const brownFoxEnd = brownFoxStart + "brown fox".length;
it("returns 1.0 confidence when position and quote agree exactly", () => {
const selectors: Selector[] = [
{ type: "TextPositionSelector", start: brownFoxStart, end: brownFoxEnd },
{ type: "TextQuoteSelector", exact: "brown fox" },
];
const r = resolveSelectors(selectors, representation);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(1.0);
expect(r.candidates[0]?.textPosition).toEqual({ start: brownFoxStart, end: brownFoxEnd });
expect(r.candidates[0]?.page).toBe(1);
expect(r.usedSelectorTypes).toEqual(["TextPositionSelector", "TextQuoteSelector"]);
});
it("falls back to quote search when position is stale, and records a warning", () => {
const selectors: Selector[] = [
{ type: "TextPositionSelector", start: 0, end: 9 }, // "The quick"
{ type: "TextQuoteSelector", exact: "brown fox" },
];
const r = resolveSelectors(selectors, representation);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(0.95);
expect(r.candidates[0]?.textPosition).toEqual({ start: brownFoxStart, end: brownFoxEnd });
expect(r.warnings?.[0]).toMatch(/did not match/);
expect(r.usedSelectorTypes).toEqual(["TextQuoteSelector"]);
});
it("returns 0.85 for a position-only selector with no quote to verify", () => {
const selectors: Selector[] = [
{ type: "TextPositionSelector", start: brownFoxStart, end: brownFoxEnd },
];
const r = resolveSelectors(selectors, representation);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(0.85);
});
it("returns 0.95 when only TextQuoteSelector is present and the quote is unique", () => {
const r = resolveSelectors(
[{ type: "TextQuoteSelector", exact: "brown fox" }],
representation,
);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(0.95);
});
it("returns 0.9 when a duplicated quote is disambiguated by prefix/suffix", () => {
const dup = "alpha echo bravo charlie echo delta";
const r = resolveSelectors(
[{ type: "TextQuoteSelector", exact: "echo", prefix: "charlie ", suffix: " delta" }],
repr(dup),
);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(0.9);
expect(r.candidates[0]?.textPosition?.start).toBe(dup.indexOf("echo", 10));
});
it("returns ambiguous when a duplicated quote cannot be disambiguated", () => {
const dup = "echo and echo";
const r = resolveSelectors(
[{ type: "TextQuoteSelector", exact: "echo" }],
repr(dup),
);
expect(r.status).toBe("ambiguous");
expect(r.confidence).toBe(0.5);
});
it("falls back to PdfPageTextSelector via the OffsetMap", () => {
// Single page, "brown fox" at offset 10..19.
const r = resolveSelectors(
[{ type: "PdfPageTextSelector", page: 1, start: brownFoxStart, end: brownFoxEnd }],
representation,
);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(0.8);
expect(r.candidates[0]?.textPosition).toEqual({ start: brownFoxStart, end: brownFoxEnd });
expect(r.candidates[0]?.page).toBe(1);
});
it("falls back to PdfRectSelector with page+rects only at 0.7 confidence", () => {
const r = resolveSelectors(
[{
type: "PdfRectSelector",
page: 2,
rects: [{ x: 0.1, y: 0.2, width: 0.3, height: 0.04 }],
}],
repr(text, 1),
);
expect(r.status).toBe("resolved");
expect(r.confidence).toBe(0.7);
expect(r.candidates[0]?.page).toBe(2);
expect(r.candidates[0]?.textPosition).toBeUndefined();
expect(r.candidates[0]?.rects).toHaveLength(1);
});
it("returns unresolved when nothing matches", () => {
const r = resolveSelectors(
[{ type: "TextQuoteSelector", exact: "missing string" }],
representation,
);
expect(r.status).toBe("unresolved");
expect(r.confidence).toBe(0);
expect(r.candidates).toEqual([]);
});
});

View File

@@ -0,0 +1,260 @@
/**
* Resolve a `Selector[]` against a `DocumentRepresentation`.
*
* Implements the resolution strategy from `wiki/ArchitectureOverview.md` §7,
* MVP-trimmed:
*
* 1. Try `TextPositionSelector` (cheapest — direct slice).
* 2. Verify with `TextQuoteSelector` at that position.
* 3. Try `TextQuoteSelector` on its own. If multiple matches, disambiguate
* by prefix/suffix.
* 4. Try `PdfPageTextSelector` (page-local offsets through the OffsetMap).
* 5. Fall back to `PdfRectSelector` for a page+rects-only target.
* 6. Return `unresolved` if nothing above succeeds.
*
* Fuzzy matching is out of scope here; a later workplan owns it.
*
* Confidence ladder (0..1):
* 1.00 — TextPosition + TextQuote agree exactly
* 0.95 — TextQuote unique match (no position to cross-check)
* 0.90 — TextQuote disambiguated by prefix/suffix
* 0.85 — TextPosition only (no quote to cross-check)
* 0.80 — PdfPageTextSelector resolved via OffsetMap
* 0.70 — PdfRectSelector only (page+rects, no text verification)
*/
import type { DocumentRepresentation } from "@shared/document";
import type {
PdfPageTextSelector,
PdfRectSelector,
Selector,
SelectorType,
TextPositionSelector,
TextQuoteSelector,
} from "@shared/selector";
import type { AnchorResolution, ResolvedAnchorTarget } from "../types";
export function resolveSelectors(
selectors: readonly Selector[],
representation: DocumentRepresentation,
): AnchorResolution {
const canonicalText = representation.canonicalText ?? "";
const offsetMap = representation.offsetMap ?? [];
const representationId = representation.id;
const byType = indexByType(selectors);
const used: SelectorType[] = [];
const warnings: string[] = [];
// 1 & 2. Try TextPositionSelector, verify with TextQuoteSelector.
if (byType.TextPositionSelector && canonicalText.length > 0) {
const pos = byType.TextPositionSelector;
const slice = sliceSafely(canonicalText, pos.start, pos.end);
if (slice !== null) {
const quote = byType.TextQuoteSelector;
if (quote) {
if (slice === quote.exact) {
used.push("TextPositionSelector", "TextQuoteSelector");
return resolved(
{ representationId, textPosition: { start: pos.start, end: pos.end }, ...pageFor(pos, offsetMap) },
1.0,
used,
warnings,
);
}
warnings.push(
"TextPositionSelector slice did not match TextQuoteSelector.exact; falling back to quote search.",
);
} else {
// Position with no quote to verify — accept at lower confidence.
used.push("TextPositionSelector");
return resolved(
{ representationId, textPosition: { start: pos.start, end: pos.end }, ...pageFor(pos, offsetMap) },
0.85,
used,
warnings,
);
}
}
}
// 3. TextQuoteSelector on its own (or after the position fallback above).
if (byType.TextQuoteSelector && canonicalText.length > 0) {
const quoteResult = resolveByQuote(canonicalText, byType.TextQuoteSelector);
if (quoteResult) {
used.push("TextQuoteSelector");
return resolved(
{
representationId,
textPosition: { start: quoteResult.offset, end: quoteResult.offset + byType.TextQuoteSelector.exact.length },
...pageFor({ start: quoteResult.offset, end: quoteResult.offset + byType.TextQuoteSelector.exact.length }, offsetMap),
},
quoteResult.confidence,
used,
warnings,
quoteResult.status,
);
}
}
// 4. PdfPageTextSelector through OffsetMap.
if (byType.PdfPageTextSelector && offsetMap.length > 0) {
const pageText = byType.PdfPageTextSelector;
const range = offsetMap.find((r) => r.page === pageText.page);
if (range && pageText.start >= 0 && pageText.end <= range.pageLength && pageText.start < pageText.end) {
const globalStart = range.globalStart + pageText.start;
const globalEnd = range.globalStart + pageText.end;
used.push("PdfPageTextSelector");
return resolved(
{
representationId,
page: pageText.page,
textPosition: { start: globalStart, end: globalEnd },
},
0.8,
used,
warnings,
);
}
}
// 5. PdfRectSelector fallback (no text verification possible).
if (byType.PdfRectSelector) {
const rect = byType.PdfRectSelector;
used.push("PdfRectSelector");
return resolved(
{ representationId, page: rect.page, rects: rect.rects },
0.7,
used,
warnings,
);
}
return unresolved(warnings);
}
interface QuoteResolutionResult {
readonly offset: number;
readonly confidence: number;
readonly status: "resolved" | "ambiguous";
}
function resolveByQuote(canonicalText: string, quote: TextQuoteSelector): QuoteResolutionResult | null {
const positions = findAllOccurrences(canonicalText, quote.exact);
if (positions.length === 0) return null;
if (positions.length === 1) {
return { offset: positions[0]!, confidence: 0.95, status: "resolved" };
}
// Multiple matches — try to disambiguate by prefix/suffix.
const filtered = positions.filter((p) => prefixSuffixMatches(canonicalText, p, quote));
if (filtered.length === 1) {
return { offset: filtered[0]!, confidence: 0.9, status: "resolved" };
}
if (filtered.length > 1) {
return { offset: filtered[0]!, confidence: 0.5, status: "ambiguous" };
}
// No prefix/suffix info or no matches with context — return ambiguous on first.
return { offset: positions[0]!, confidence: 0.5, status: "ambiguous" };
}
function prefixSuffixMatches(
canonicalText: string,
offset: number,
quote: TextQuoteSelector,
): boolean {
if (quote.prefix !== undefined) {
const prefixEnd = offset;
const prefixStart = Math.max(0, prefixEnd - quote.prefix.length);
const actualPrefix = canonicalText.slice(prefixStart, prefixEnd);
if (!actualPrefix.endsWith(quote.prefix)) return false;
}
if (quote.suffix !== undefined) {
const suffixStart = offset + quote.exact.length;
const suffixEnd = Math.min(canonicalText.length, suffixStart + quote.suffix.length);
const actualSuffix = canonicalText.slice(suffixStart, suffixEnd);
if (!actualSuffix.startsWith(quote.suffix)) return false;
}
return true;
}
interface SelectorIndex {
TextQuoteSelector?: TextQuoteSelector;
TextPositionSelector?: TextPositionSelector;
PdfRectSelector?: PdfRectSelector;
PdfPageTextSelector?: PdfPageTextSelector;
}
function indexByType(selectors: readonly Selector[]): SelectorIndex {
const idx: SelectorIndex = {};
for (const s of selectors) {
switch (s.type) {
case "TextQuoteSelector":
idx.TextQuoteSelector = s;
break;
case "TextPositionSelector":
idx.TextPositionSelector = s;
break;
case "PdfRectSelector":
idx.PdfRectSelector = s;
break;
case "PdfPageTextSelector":
idx.PdfPageTextSelector = s;
break;
}
}
return idx;
}
function sliceSafely(text: string, start: number, end: number): string | null {
if (start < 0 || end > text.length || start >= end) return null;
return text.slice(start, end);
}
function pageFor(
span: { start: number; end: number },
offsetMap: readonly { page: number; globalStart: number; globalEnd: number }[],
): { page?: number } {
if (offsetMap.length === 0) return {};
const range = offsetMap.find((r) => span.start >= r.globalStart && span.end <= r.globalEnd);
return range ? { page: range.page } : {};
}
function findAllOccurrences(haystack: string, needle: string): number[] {
if (needle.length === 0) return [];
const out: number[] = [];
let from = 0;
for (;;) {
const idx = haystack.indexOf(needle, from);
if (idx === -1) break;
out.push(idx);
from = idx + 1;
}
return out;
}
function resolved(
target: ResolvedAnchorTarget,
confidence: number,
used: readonly SelectorType[],
warnings: readonly string[],
status: "resolved" | "ambiguous" = "resolved",
): AnchorResolution {
return {
status,
confidence,
candidates: [target],
usedSelectorTypes: used,
...(warnings.length > 0 ? { warnings } : {}),
};
}
function unresolved(warnings: readonly string[]): AnchorResolution {
return {
status: "unresolved",
confidence: 0,
candidates: [],
usedSelectorTypes: [],
...(warnings.length > 0 ? { warnings } : {}),
};
}

40
src/app/App.tsx Normal file
View File

@@ -0,0 +1,40 @@
/**
* App — the citation-evidence MVP shell.
*
* Three-pane layout per `wiki/ArchitectureOverview.md` §12.1:
*
* ┌────────────┬──────────────────┬────────────┐
* │ Collection │ Document Viewer │ Evidence │
* │ List │ │ Sidebar │
* └────────────┴──────────────────┴────────────┘
*
* CE-WP-0002-T06 stops at "viewer shell is rendered, evidence list is
* displayed". T07 wires the selection → annotation → evidence flow; T08
* wires the sidebar-click → scroll-to-passage round-trip.
*/
import {
CollectionList,
EngineProvider,
EvidenceSidebar,
ViewerShell,
} from "@work/index";
export function App() {
return (
<EngineProvider>
<div
style={{
display: "flex",
height: "100vh",
fontFamily: "system-ui, sans-serif",
color: "#222",
}}
>
<CollectionList />
<ViewerShell />
<EvidenceSidebar />
</div>
</EngineProvider>
);
}

View File

@@ -1,233 +0,0 @@
/**
* CE-WP-0002-T02 spike host page.
*
* Lists the fixtures from `fixtures/pdfs/manifest.json`, lets the user load
* one in the spike PDF viewer, capture a selection (the viewer's
* `onSelection` fires when text is selected), persist the resulting
* selectors to `localStorage`, and on reload restore + scroll to them.
*
* Success looks like: select a quote → click "save" → reload the tab →
* the highlight is rendered on the same passage and the page is scrolled
* to it.
*/
import { useEffect, useMemo, useState } from "react";
import {
PdfSpikeViewer,
type PdfSelectionCapture,
type StoredAnnotation,
} from "@anchor/index";
import type { Selector } from "@shared/selector";
import { newId } from "@shared/ids";
import manifest from "../../fixtures/pdfs/manifest.json";
interface FixtureEntry {
id: string;
filename: string;
description: string;
page_count: number;
known_good_quote: string;
known_good_quote_page: number;
}
const FIXTURES: FixtureEntry[] = (manifest as { fixtures: FixtureEntry[] }).fixtures;
const STORAGE_KEY = "ce-wp-0002-spike-annotations-v1";
interface StoredEntry {
id: string;
fixtureId: string;
text: string;
selectors: Selector[];
createdAt: string;
}
function loadStore(): StoredEntry[] {
try {
const raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return [];
const parsed = JSON.parse(raw) as unknown;
if (!Array.isArray(parsed)) return [];
return parsed as StoredEntry[];
} catch {
return [];
}
}
function saveStore(entries: StoredEntry[]) {
localStorage.setItem(STORAGE_KEY, JSON.stringify(entries));
}
export function SpikeApp() {
const [activeFixtureId, setActiveFixtureId] = useState<string | null>(null);
const [entries, setEntries] = useState<StoredEntry[]>(() => loadStore());
const [pending, setPending] = useState<
| { capture: PdfSelectionCapture; selectors: Selector[] }
| null
>(null);
const [scrollTo, setScrollTo] = useState<string | null>(null);
useEffect(() => {
saveStore(entries);
}, [entries]);
const activeFixture = useMemo(
() => FIXTURES.find((f) => f.id === activeFixtureId) ?? null,
[activeFixtureId],
);
const annotationsForActive = useMemo<StoredAnnotation[]>(() => {
if (!activeFixtureId) return [];
return entries
.filter((e) => e.fixtureId === activeFixtureId)
.map((e) => ({ id: e.id, text: e.text, selectors: e.selectors }));
}, [activeFixtureId, entries]);
function handleSave() {
if (!pending || !activeFixtureId) return;
const entry: StoredEntry = {
id: newId("annotation"),
fixtureId: activeFixtureId,
text: pending.capture.text,
selectors: pending.selectors,
createdAt: new Date().toISOString(),
};
setEntries((prev) => [...prev, entry]);
setPending(null);
}
function handleClear() {
if (!activeFixtureId) return;
setEntries((prev) => prev.filter((e) => e.fixtureId !== activeFixtureId));
}
return (
<div style={{ display: "flex", height: "100vh", fontFamily: "system-ui, sans-serif" }}>
<aside
style={{
width: 320,
borderRight: "1px solid #ddd",
padding: 12,
overflow: "auto",
flex: "0 0 320px",
}}
>
<h2 style={{ marginTop: 0 }}>CE-WP-0002-T02 Spike</h2>
<p style={{ fontSize: 12, color: "#555" }}>
Pick a fixture, select text in the viewer, save, then reload the page
to verify the highlight is restored.
</p>
<h3 style={{ fontSize: 14 }}>Fixtures</h3>
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
{FIXTURES.map((f) => (
<li key={f.id} style={{ marginBottom: 6 }}>
<button
onClick={() => {
setActiveFixtureId(f.id);
setPending(null);
setScrollTo(null);
}}
style={{
display: "block",
width: "100%",
textAlign: "left",
background: f.id === activeFixtureId ? "#e8f0ff" : "white",
border: "1px solid #ccc",
padding: 6,
cursor: "pointer",
}}
>
<div style={{ fontWeight: 600, fontSize: 13 }}>{f.id}</div>
<div style={{ fontSize: 11, color: "#666" }}>
{f.page_count} page{f.page_count === 1 ? "" : "s"} ·
known-good p{f.known_good_quote_page}
</div>
<div style={{ fontSize: 11, color: "#888", marginTop: 2 }}>
&ldquo;{f.known_good_quote}&rdquo;
</div>
</button>
</li>
))}
</ul>
{activeFixture && (
<>
<h3 style={{ fontSize: 14, marginTop: 16 }}>Saved annotations</h3>
{annotationsForActive.length === 0 && (
<p style={{ fontSize: 12, color: "#888" }}>(none)</p>
)}
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
{annotationsForActive.map((a) => (
<li key={a.id} style={{ marginBottom: 4 }}>
<button
onClick={() => setScrollTo(a.id)}
style={{
display: "block",
width: "100%",
textAlign: "left",
background: "#fff8d6",
border: "1px solid #ccc",
padding: 4,
cursor: "pointer",
fontSize: 11,
}}
>
{a.text.slice(0, 80)}
{a.text.length > 80 ? "…" : ""}
</button>
</li>
))}
</ul>
{annotationsForActive.length > 0 && (
<button
onClick={handleClear}
style={{ marginTop: 8, fontSize: 11 }}
>
Clear all for this fixture
</button>
)}
</>
)}
{pending && (
<div
style={{
marginTop: 16,
padding: 8,
border: "1px solid #f0c040",
background: "#fff8d6",
}}
>
<div style={{ fontSize: 12 }}>
Pending selection ({pending.selectors.length} selector
{pending.selectors.length === 1 ? "" : "s"}):
</div>
<div style={{ fontSize: 11, color: "#666", margin: "4px 0" }}>
&ldquo;{pending.capture.text.slice(0, 120)}&rdquo;
</div>
<button onClick={handleSave}>Save</button>{" "}
<button onClick={() => setPending(null)}>Discard</button>
</div>
)}
</aside>
<main style={{ flex: 1, overflow: "hidden", position: "relative" }}>
{activeFixture ? (
<PdfSpikeViewer
key={activeFixture.id}
pdfUrl={`/fixtures/pdfs/${encodeURIComponent(activeFixture.filename)}`}
storedAnnotations={annotationsForActive}
{...(scrollTo ? { scrollToAnnotationId: scrollTo } : {})}
onSelectionCaptured={(capture, selectors) =>
setPending({ capture, selectors })
}
/>
) : (
<div style={{ padding: 24, color: "#666" }}>
Pick a fixture on the left to begin.
</div>
)}
</main>
</div>
);
}

View File

@@ -1 +1 @@
export { SpikeApp } from "./SpikeApp";
export { App } from "./App";

View File

@@ -1,12 +1,12 @@
import { StrictMode } from "react";
import { createRoot } from "react-dom/client";
import { SpikeApp } from "./SpikeApp";
import { App } from "./App";
const container = document.getElementById("root");
if (!container) throw new Error("#root not found");
createRoot(container).render(
<StrictMode>
<SpikeApp />
<App />
</StrictMode>,
);

168
src/engine/engine.test.ts Normal file
View File

@@ -0,0 +1,168 @@
import { beforeEach, describe, expect, it } from "vitest";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { DocumentId, RepresentationId } from "@shared/ids";
import type { Selector } from "@shared/selector";
import { createEngine, type Engine, type EngineEvent } from "./index";
function fakeDocAndRep(): { document: Document; representation: DocumentRepresentation } {
const docId = "doc_fake" as DocumentId;
const repId = "rep_fake" as RepresentationId;
return {
document: {
id: docId,
mediaType: "application/pdf",
createdAt: "2026-05-25T00:00:00.000Z",
updatedAt: "2026-05-25T00:00:00.000Z",
},
representation: {
id: repId,
documentId: docId,
representationType: "pdf-text",
contentHash: "h",
canonicalText: "The quick brown fox.",
pageMap: [{ page: 1, width: 100, height: 100 }],
offsetMap: [{ page: 1, globalStart: 0, globalEnd: 20, pageLength: 20 }],
generatedAt: "2026-05-25T00:00:00.000Z",
},
};
}
describe("Engine integration", () => {
let engine: Engine;
let events: EngineEvent[];
beforeEach(() => {
engine = createEngine();
events = [];
engine.bus.onAny((e) => events.push(e));
});
it("documentService.register stores both and emits DocumentImported + DocumentRepresentationGenerated", () => {
const { document, representation } = fakeDocAndRep();
const result = engine.documents.register({ document, representation });
expect(result.document).toBe(document);
expect(result.representation).toBe(representation);
expect(engine.documents.get(document.id)).toBe(document);
expect(engine.documents.getRepresentation(representation.id)).toBe(representation);
expect(events.map((e) => e.type)).toEqual(["DocumentImported", "DocumentRepresentationGenerated"]);
});
it("annotationService.create stamps an ID + normalize version + timestamps, then emits AnnotationCreated", () => {
const { document, representation } = fakeDocAndRep();
engine.documents.register({ document, representation });
const selectors: Selector[] = [{ type: "TextQuoteSelector", exact: "brown fox" }];
const ann = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors,
quote: "brown fox",
note: "a quick mark",
});
expect(ann.id).toMatch(/^ann_/);
expect(ann.normalizeVersion).toBeGreaterThan(0);
expect(ann.createdAt).toBe(ann.updatedAt);
expect(engine.annotations.get(ann.id)).toBe(ann);
const created = events.find((e) => e.type === "AnnotationCreated");
expect(created?.type).toBe("AnnotationCreated");
});
it("setResolutionStatus emits AnnotationResolved for resolved/ambiguous and AnnotationResolutionFailed for unresolved/stale", () => {
const { document, representation } = fakeDocAndRep();
engine.documents.register({ document, representation });
const ann = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors: [{ type: "TextQuoteSelector", exact: "x" }],
});
events.length = 0;
engine.annotations.setResolutionStatus(ann.id, "resolved", { confidence: 0.95 });
expect(events.map((e) => e.type)).toEqual(["AnnotationResolved"]);
engine.annotations.setResolutionStatus(ann.id, "unresolved", { confidence: 0, reason: "no quote match" });
expect(events.map((e) => e.type)).toEqual(["AnnotationResolved", "AnnotationResolutionFailed"]);
});
it("evidenceService.create requires at least one annotation and emits EvidenceItemCreated", () => {
const { document, representation } = fakeDocAndRep();
engine.documents.register({ document, representation });
const ann = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
});
expect(() => engine.evidence.create({ annotationIds: [] })).toThrow();
const item = engine.evidence.create({
annotationIds: [ann.id],
commentary: "good quote",
});
expect(item.status).toBe("candidate");
expect(item.annotationIds).toEqual([ann.id]);
expect(events.find((e) => e.type === "EvidenceItemCreated")).toBeDefined();
});
it("setStatus emits EvidenceItemUpdated only on real change and carries previousStatus", () => {
const { document, representation } = fakeDocAndRep();
engine.documents.register({ document, representation });
const ann = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
});
const item = engine.evidence.create({ annotationIds: [ann.id] });
events.length = 0;
const same = engine.evidence.setStatus(item.id, "candidate");
expect(same).toBe(item);
expect(events).toEqual([]);
engine.evidence.setStatus(item.id, "confirmed");
const updated = events.find((e) => e.type === "EvidenceItemUpdated");
expect(updated).toBeDefined();
if (updated?.type === "EvidenceItemUpdated") {
expect(updated.previousStatus).toBe("candidate");
}
});
it("listByDocument scopes evidence items to a single document via annotation lookup", () => {
const a = fakeDocAndRep();
engine.documents.register(a);
const annA = engine.annotations.create({
documentId: a.document.id,
representationId: a.representation.id,
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
});
engine.evidence.create({ annotationIds: [annA.id], commentary: "a" });
// Second, distinct document.
const otherDocId = "doc_other" as DocumentId;
const otherRepId = "rep_other" as RepresentationId;
engine.documents.register({
document: { ...a.document, id: otherDocId },
representation: { ...a.representation, id: otherRepId, documentId: otherDocId },
});
const annB = engine.annotations.create({
documentId: otherDocId,
representationId: otherRepId,
selectors: [{ type: "TextQuoteSelector", exact: "z" }],
});
engine.evidence.create({ annotationIds: [annB.id], commentary: "b" });
expect(engine.evidence.listByDocument(a.document.id)).toHaveLength(1);
expect(engine.evidence.listByDocument(otherDocId)).toHaveLength(1);
});
it("activate emits EvidenceItemActivated without mutating the item", () => {
const { document, representation } = fakeDocAndRep();
engine.documents.register({ document, representation });
const ann = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors: [{ type: "TextQuoteSelector", exact: "x" }],
});
const item = engine.evidence.create({ annotationIds: [ann.id] });
events.length = 0;
engine.evidence.activate(item.id, "sidebar");
const activated = events.find((e) => e.type === "EvidenceItemActivated");
expect(activated).toBeDefined();
if (activated?.type === "EvidenceItemActivated") {
expect(activated.source).toBe("sidebar");
}
});
});

View File

@@ -0,0 +1,64 @@
import { describe, expect, it, vi } from "vitest";
import type { DocumentId } from "@shared/ids";
import { createEventBus } from "./bus";
const docId = "doc_test" as DocumentId;
const minimalDoc = {
id: docId,
mediaType: "application/pdf",
createdAt: "2026-05-25T00:00:00.000Z",
updatedAt: "2026-05-25T00:00:00.000Z",
};
describe("EventBus", () => {
it("delivers typed events to the registered listener", () => {
const bus = createEventBus();
const spy = vi.fn();
bus.on("DocumentImported", spy);
const result = bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
expect(spy).toHaveBeenCalledOnce();
expect(spy.mock.calls[0]![0]).toMatchObject({ type: "DocumentImported", documentId: docId });
expect(result.listenerCount).toBe(1);
expect(result.errors).toEqual([]);
});
it("does not deliver an event to listeners of a different type", () => {
const bus = createEventBus();
const spy = vi.fn();
bus.on("AnnotationCreated", spy);
bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
expect(spy).not.toHaveBeenCalled();
});
it("delivers every event to onAny listeners", () => {
const bus = createEventBus();
const spy = vi.fn();
bus.onAny(spy);
bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
bus.emit({ type: "EvidenceItemActivated", evidenceItemId: "ev_x" as never });
expect(spy).toHaveBeenCalledTimes(2);
});
it("returns an unsubscribe function from on()", () => {
const bus = createEventBus();
const spy = vi.fn();
const off = bus.on("DocumentImported", spy);
off();
bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
expect(spy).not.toHaveBeenCalled();
});
it("captures listener errors and still calls subsequent listeners", () => {
const bus = createEventBus();
const boom = new Error("listener exploded");
const a = vi.fn(() => { throw boom; });
const b = vi.fn();
bus.on("DocumentImported", a);
bus.on("DocumentImported", b);
const result = bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
expect(a).toHaveBeenCalledOnce();
expect(b).toHaveBeenCalledOnce();
expect(result.errors).toEqual([boom]);
expect(result.listenerCount).toBe(2);
});
});

79
src/engine/events/bus.ts Normal file
View File

@@ -0,0 +1,79 @@
/**
* Synchronous in-process event bus.
*
* Listeners fire in registration order on the calling stack; `emit` returns
* after every listener has run. A listener throwing does not stop later
* listeners — its error surfaces through the returned `errors` array so
* callers can decide whether to log, rethrow, or ignore.
*
* MVP-sufficient. ADR-0005 (persistence) will decide whether to upgrade to
* an async/queued bus when storage becomes durable.
*/
import type { EngineEvent, EngineEventOf, EngineEventType } from "./types";
export type EngineEventListener<T extends EngineEventType = EngineEventType> = (
event: EngineEventOf<T>,
) => void;
export type AnyEngineEventListener = (event: EngineEvent) => void;
export interface EmitResult {
readonly listenerCount: number;
readonly errors: readonly unknown[];
}
export interface EventBus {
on<T extends EngineEventType>(type: T, listener: EngineEventListener<T>): () => void;
onAny(listener: AnyEngineEventListener): () => void;
emit<T extends EngineEventType>(event: EngineEventOf<T>): EmitResult;
}
export function createEventBus(): EventBus {
const typedListeners = new Map<EngineEventType, Set<EngineEventListener>>();
const anyListeners = new Set<AnyEngineEventListener>();
return {
on(type, listener) {
let set = typedListeners.get(type);
if (!set) {
set = new Set();
typedListeners.set(type, set);
}
set.add(listener as unknown as EngineEventListener);
return () => {
set!.delete(listener as unknown as EngineEventListener);
};
},
onAny(listener) {
anyListeners.add(listener);
return () => {
anyListeners.delete(listener);
};
},
emit(event) {
const errors: unknown[] = [];
let count = 0;
const typedSet = typedListeners.get(event.type);
if (typedSet) {
for (const l of typedSet) {
count++;
try {
(l as AnyEngineEventListener)(event);
} catch (err) {
errors.push(err);
}
}
}
for (const l of anyListeners) {
count++;
try {
l(event);
} catch (err) {
errors.push(err);
}
}
return { listenerCount: count, errors };
},
};
}

View File

@@ -0,0 +1,8 @@
export * from "./types";
export {
createEventBus,
type EventBus,
type EngineEventListener,
type AnyEngineEventListener,
type EmitResult,
} from "./bus";

View File

@@ -0,0 +1,84 @@
/**
* Engine event vocabulary.
*
* Implements `wiki/SharedContracts.md` §4 (closed event list). Each event
* carries the *minimum* identifying payload needed by downstream listeners;
* services hand back the full domain object to the caller separately.
*
* Adding an event requires updating SharedContracts.md first.
*/
import type { Annotation, AnnotationResolutionStatus } from "@shared/annotation";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { EvidenceItem, EvidenceItemStatus } from "@shared/evidence";
import type {
AnnotationId,
DocumentId,
EvidenceItemId,
RepresentationId,
} from "@shared/ids";
export interface DocumentImportedEvent {
readonly type: "DocumentImported";
readonly documentId: DocumentId;
readonly document: Document;
}
export interface DocumentRepresentationGeneratedEvent {
readonly type: "DocumentRepresentationGenerated";
readonly documentId: DocumentId;
readonly representationId: RepresentationId;
readonly representation: DocumentRepresentation;
}
export interface AnnotationCreatedEvent {
readonly type: "AnnotationCreated";
readonly annotationId: AnnotationId;
readonly annotation: Annotation;
}
export interface AnnotationResolvedEvent {
readonly type: "AnnotationResolved";
readonly annotationId: AnnotationId;
readonly status: AnnotationResolutionStatus;
readonly confidence: number;
}
export interface AnnotationResolutionFailedEvent {
readonly type: "AnnotationResolutionFailed";
readonly annotationId: AnnotationId;
readonly reason: string;
}
export interface EvidenceItemCreatedEvent {
readonly type: "EvidenceItemCreated";
readonly evidenceItemId: EvidenceItemId;
readonly evidenceItem: EvidenceItem;
}
export interface EvidenceItemUpdatedEvent {
readonly type: "EvidenceItemUpdated";
readonly evidenceItemId: EvidenceItemId;
readonly evidenceItem: EvidenceItem;
readonly previousStatus: EvidenceItemStatus;
}
export interface EvidenceItemActivatedEvent {
readonly type: "EvidenceItemActivated";
readonly evidenceItemId: EvidenceItemId;
readonly source?: "sidebar" | "form-field" | "citation-card";
}
export type EngineEvent =
| DocumentImportedEvent
| DocumentRepresentationGeneratedEvent
| AnnotationCreatedEvent
| AnnotationResolvedEvent
| AnnotationResolutionFailedEvent
| EvidenceItemCreatedEvent
| EvidenceItemUpdatedEvent
| EvidenceItemActivatedEvent;
export type EngineEventType = EngineEvent["type"];
export type EngineEventOf<T extends EngineEventType> = Extract<EngineEvent, { type: T }>;

View File

@@ -1 +1,60 @@
export {};
/**
* Engine composition root.
*
* `createEngine()` wires in-memory repos to the services and shares a single
* event bus. The app layer holds the returned `Engine` instance and passes
* its services into the UI.
*
* Swapping the repository implementation later (ADR-0005) is a matter of
* replacing `createInMemoryRepos()` here. The service signatures don't
* change.
*/
import { createEventBus, type EventBus } from "./events";
import {
createInMemoryRepos,
type InMemoryRepos,
} from "./repos";
import {
createAnnotationService,
createDocumentService,
createEvidenceService,
type AnnotationService,
type DocumentService,
type EvidenceService,
} from "./services";
export * from "./events";
export * from "./repos";
export * from "./services";
export {
SNAPSHOT_VERSION,
attachPersister,
captureSnapshot,
documentIdsIn,
restoreFromStorage,
restoreSnapshot,
type EngineSnapshot,
type PersisterOptions,
} from "./persistence";
export interface Engine {
readonly bus: EventBus;
readonly repos: InMemoryRepos;
readonly documents: DocumentService;
readonly annotations: AnnotationService;
readonly evidence: EvidenceService;
}
export function createEngine(): Engine {
const bus = createEventBus();
const repos = createInMemoryRepos();
const documents = createDocumentService(repos.documents, repos.representations, bus);
const annotations = createAnnotationService(repos.annotations, bus);
const evidence = createEvidenceService(
repos.evidenceItems,
(id) => repos.annotations.get(id),
bus,
);
return { bus, repos, documents, annotations, evidence };
}

View File

@@ -0,0 +1,183 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { DocumentId, RepresentationId } from "@shared/ids";
import {
attachPersister,
captureSnapshot,
createEngine,
restoreFromStorage,
restoreSnapshot,
type Engine,
type EngineEvent,
type EngineSnapshot,
} from "./index";
function fakeDocAndRep(suffix: string): {
document: Document;
representation: DocumentRepresentation;
} {
const docId = `doc_${suffix}` as DocumentId;
const repId = `rep_${suffix}` as RepresentationId;
return {
document: {
id: docId,
mediaType: "application/pdf",
title: `Doc ${suffix}`,
createdAt: "2026-05-25T00:00:00.000Z",
updatedAt: "2026-05-25T00:00:00.000Z",
},
representation: {
id: repId,
documentId: docId,
representationType: "pdf-text",
contentHash: `hash-${suffix}`,
canonicalText: "The quick brown fox.",
pageMap: [{ page: 1, width: 100, height: 100 }],
offsetMap: [{ page: 1, globalStart: 0, globalEnd: 20, pageLength: 20 }],
generatedAt: "2026-05-25T00:00:00.000Z",
},
};
}
function memoryStorage(): Pick<Storage, "getItem" | "setItem" | "removeItem"> {
const map = new Map<string, string>();
return {
getItem: (k) => map.get(k) ?? null,
setItem: (k, v) => void map.set(k, v),
removeItem: (k) => void map.delete(k),
};
}
function seed(engine: Engine, suffix: string) {
const { document, representation } = fakeDocAndRep(suffix);
engine.documents.register({ document, representation });
const ann = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
quote: "brown fox",
});
const item = engine.evidence.create({
annotationIds: [ann.id],
commentary: `commentary-${suffix}`,
});
return { document, representation, ann, item };
}
describe("captureSnapshot + restoreSnapshot", () => {
it("round-trips documents, representations, annotations and evidence items", () => {
const src = createEngine();
const a = seed(src, "a");
const b = seed(src, "b");
const snap = captureSnapshot(src);
expect(snap.documents).toHaveLength(2);
expect(snap.representations).toHaveLength(2);
expect(snap.annotations).toHaveLength(2);
expect(snap.evidenceItems).toHaveLength(2);
const dst = createEngine();
restoreSnapshot(dst, snap);
expect(dst.documents.get(a.document.id)?.title).toBe("Doc a");
expect(dst.documents.get(b.document.id)?.title).toBe("Doc b");
expect(dst.annotations.get(a.ann.id)?.quote).toBe("brown fox");
expect(dst.evidence.get(a.item.id)?.commentary).toBe("commentary-a");
});
it("restoreSnapshot does NOT emit *Created events (events would loop the persister)", () => {
const src = createEngine();
seed(src, "x");
const snap = captureSnapshot(src);
const dst = createEngine();
const seen: EngineEvent["type"][] = [];
dst.bus.onAny((e) => seen.push(e.type));
restoreSnapshot(dst, snap);
expect(seen).toEqual([]);
});
it("rejects a snapshot with a mismatching version", () => {
const dst = createEngine();
expect(() =>
restoreSnapshot(dst, {
version: 999,
documents: [],
representations: [],
annotations: [],
evidenceItems: [],
} as EngineSnapshot),
).toThrow(/version/);
});
});
describe("attachPersister", () => {
let storage: ReturnType<typeof memoryStorage>;
let engine: Engine;
const KEY = "ce-test-snap";
beforeEach(() => {
storage = memoryStorage();
engine = createEngine();
});
it("writes a snapshot to storage on every mutating event", () => {
const off = attachPersister(engine, { key: KEY, storage });
expect(storage.getItem(KEY)).toBeNull();
seed(engine, "z");
const raw = storage.getItem(KEY);
expect(raw).not.toBeNull();
const snap = JSON.parse(raw!) as EngineSnapshot;
expect(snap.documents).toHaveLength(1);
expect(snap.evidenceItems).toHaveLength(1);
off();
});
it("stops writing after the unsubscribe is called", () => {
const off = attachPersister(engine, { key: KEY, storage });
seed(engine, "q");
const after = storage.getItem(KEY);
off();
seed(engine, "r");
expect(storage.getItem(KEY)).toBe(after);
});
it("survives a JSON.stringify failure without throwing into the caller", () => {
const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
const failing = { ...memoryStorage(), setItem: () => { throw new Error("boom"); } };
attachPersister(engine, { key: KEY, storage: failing });
expect(() => seed(engine, "k")).not.toThrow();
expect(warn).toHaveBeenCalled();
warn.mockRestore();
});
});
describe("restoreFromStorage", () => {
it("returns {restored: false} when the key is empty", () => {
const storage = memoryStorage();
const engine = createEngine();
const result = restoreFromStorage(engine, { key: "missing", storage });
expect(result.restored).toBe(false);
});
it("hydrates the engine when storage holds a valid snapshot", () => {
const src = createEngine();
seed(src, "rs");
const storage = memoryStorage();
storage.setItem("snap", JSON.stringify(captureSnapshot(src)));
const dst = createEngine();
const result = restoreFromStorage(dst, { key: "snap", storage });
expect(result.restored).toBe(true);
expect(dst.documents.list()).toHaveLength(1);
});
it("ignores malformed JSON without throwing", () => {
const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
const storage = memoryStorage();
storage.setItem("snap", "not-json");
const engine = createEngine();
const result = restoreFromStorage(engine, { key: "snap", storage });
expect(result.restored).toBe(false);
expect(warn).toHaveBeenCalled();
warn.mockRestore();
});
});

138
src/engine/persistence.ts Normal file
View File

@@ -0,0 +1,138 @@
/**
* Engine snapshot + restore.
*
* MVP "persistence" — capture the engine's in-memory state into a JSON blob
* and restore it later. Used by the SPA to survive page reloads via
* `localStorage` until ADR-0005 lands a real store.
*
* Restore deliberately bypasses the service layer: it writes directly to
* the repos so no `*Created` events fire. Without that, restoring would
* trigger the persister to re-write the same snapshot — and if the user
* has another tab open, it would also broadcast spurious "this annotation
* just appeared" events to UI listeners.
*/
import type { Annotation } from "@shared/annotation";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { EvidenceItem } from "@shared/evidence";
import type { DocumentId } from "@shared/ids";
import type { Engine } from "./index";
export const SNAPSHOT_VERSION = 1;
export interface EngineSnapshot {
readonly version: number;
readonly documents: readonly Document[];
readonly representations: readonly DocumentRepresentation[];
readonly annotations: readonly Annotation[];
readonly evidenceItems: readonly EvidenceItem[];
}
export function captureSnapshot(engine: Engine): EngineSnapshot {
const documents = engine.documents.list();
// Gather representations per known document.
const representations: DocumentRepresentation[] = [];
const annotations: Annotation[] = [];
const evidenceItems: EvidenceItem[] = [];
const seenItemIds = new Set<string>();
for (const doc of documents) {
representations.push(...engine.documents.listRepresentations(doc.id));
annotations.push(...engine.annotations.listByDocument(doc.id));
for (const item of engine.evidence.listByDocument(doc.id)) {
// listByDocument keys off annotation lookup; an item that shares
// annotations across two documents would surface twice. De-dupe.
if (!seenItemIds.has(item.id)) {
seenItemIds.add(item.id);
evidenceItems.push(item);
}
}
}
return {
version: SNAPSHOT_VERSION,
documents: [...documents],
representations,
annotations,
evidenceItems,
};
}
export function restoreSnapshot(engine: Engine, snapshot: EngineSnapshot): void {
if (snapshot.version !== SNAPSHOT_VERSION) {
throw new Error(
`restoreSnapshot: snapshot version ${snapshot.version} does not match current ${SNAPSHOT_VERSION}`,
);
}
for (const d of snapshot.documents) engine.repos.documents.create(d);
for (const r of snapshot.representations) engine.repos.representations.create(r);
for (const a of snapshot.annotations) engine.repos.annotations.create(a);
for (const i of snapshot.evidenceItems) engine.repos.evidenceItems.create(i);
}
export interface PersisterOptions {
/** Storage key. */
readonly key: string;
/** Storage shim — defaults to globalThis.localStorage. */
readonly storage?: Pick<Storage, "getItem" | "setItem" | "removeItem">;
}
/**
* Subscribe to engine events and write a fresh snapshot on every mutation.
* Returns the unsubscribe function.
*
* Initial snapshot is NOT written — call `captureSnapshot` + `storage.setItem`
* yourself if you want a baseline.
*/
export function attachPersister(engine: Engine, options: PersisterOptions): () => void {
const storage = options.storage ?? globalThis.localStorage;
const write = () => {
const snap = captureSnapshot(engine);
try {
storage.setItem(options.key, JSON.stringify(snap));
} catch (err) {
// localStorage quota / serialization errors shouldn't crash the app.
// Surface to the console; ADR-0005 owns the durable fix.
console.warn("attachPersister: write failed", err);
}
};
const offs = [
engine.bus.on("DocumentImported", write),
engine.bus.on("DocumentRepresentationGenerated", write),
engine.bus.on("AnnotationCreated", write),
engine.bus.on("AnnotationResolved", write),
engine.bus.on("AnnotationResolutionFailed", write),
engine.bus.on("EvidenceItemCreated", write),
engine.bus.on("EvidenceItemUpdated", write),
];
return () => {
for (const off of offs) off();
};
}
export type RestoreFromStorageOptions = PersisterOptions;
export function restoreFromStorage(
engine: Engine,
options: RestoreFromStorageOptions,
): { readonly restored: boolean; readonly snapshot?: EngineSnapshot } {
const storage = options.storage ?? globalThis.localStorage;
const raw = storage.getItem(options.key);
if (!raw) return { restored: false };
try {
const parsed = JSON.parse(raw) as EngineSnapshot;
if (typeof parsed !== "object" || parsed === null) return { restored: false };
restoreSnapshot(engine, parsed);
return { restored: true, snapshot: parsed };
} catch (err) {
console.warn("restoreFromStorage: parse failed, ignoring stored snapshot", err);
return { restored: false };
}
}
/**
* Narrow helper: get the set of document ids restored from a snapshot.
* Useful for the SPA's "show me what was open last time" logic.
*/
export function documentIdsIn(snapshot: EngineSnapshot): readonly DocumentId[] {
return snapshot.documents.map((d) => d.id);
}

View File

@@ -0,0 +1,151 @@
/**
* In-memory `Map`-backed repositories.
*
* Implements the MVP storage layer. The repository interfaces match the
* shape that ADR-0005's eventual persistence implementation will satisfy,
* so swapping `createInMemoryRepos()` for a SQLite/Postgres factory later
* is a localised change.
*
* All mutating methods return the *stored* object so callers can pick up
* server-assigned fields (none in MVP, but the contract anticipates it).
*/
import type { Annotation } from "@shared/annotation";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { EvidenceItem } from "@shared/evidence";
import type {
AnnotationId,
DocumentId,
EvidenceItemId,
RepresentationId,
} from "@shared/ids";
export interface DocumentRepository {
create(document: Document): Document;
get(id: DocumentId): Document | null;
list(): readonly Document[];
update(document: Document): Document;
}
export interface RepresentationRepository {
create(representation: DocumentRepresentation): DocumentRepresentation;
get(id: RepresentationId): DocumentRepresentation | null;
listByDocument(documentId: DocumentId): readonly DocumentRepresentation[];
}
export interface AnnotationRepository {
create(annotation: Annotation): Annotation;
get(id: AnnotationId): Annotation | null;
listByDocument(documentId: DocumentId): readonly Annotation[];
update(annotation: Annotation): Annotation;
}
export interface EvidenceItemRepository {
create(item: EvidenceItem): EvidenceItem;
get(id: EvidenceItemId): EvidenceItem | null;
listByDocument(
documentId: DocumentId,
annotationLookup: (id: AnnotationId) => Annotation | null,
): readonly EvidenceItem[];
update(item: EvidenceItem): EvidenceItem;
}
export interface InMemoryRepos {
readonly documents: DocumentRepository;
readonly representations: RepresentationRepository;
readonly annotations: AnnotationRepository;
readonly evidenceItems: EvidenceItemRepository;
}
export function createInMemoryRepos(): InMemoryRepos {
const documents = new Map<DocumentId, Document>();
const representations = new Map<RepresentationId, DocumentRepresentation>();
const annotations = new Map<AnnotationId, Annotation>();
const evidenceItems = new Map<EvidenceItemId, EvidenceItem>();
return {
documents: {
create(document) {
documents.set(document.id, document);
return document;
},
get(id) {
return documents.get(id) ?? null;
},
list() {
return [...documents.values()];
},
update(document) {
if (!documents.has(document.id)) {
throw new Error(`DocumentRepository.update: unknown id ${document.id}`);
}
documents.set(document.id, document);
return document;
},
},
representations: {
create(representation) {
representations.set(representation.id, representation);
return representation;
},
get(id) {
return representations.get(id) ?? null;
},
listByDocument(documentId) {
const out: DocumentRepresentation[] = [];
for (const rep of representations.values()) {
if (rep.documentId === documentId) out.push(rep);
}
return out;
},
},
annotations: {
create(annotation) {
annotations.set(annotation.id, annotation);
return annotation;
},
get(id) {
return annotations.get(id) ?? null;
},
listByDocument(documentId) {
const out: Annotation[] = [];
for (const ann of annotations.values()) {
if (ann.documentId === documentId) out.push(ann);
}
return out;
},
update(annotation) {
if (!annotations.has(annotation.id)) {
throw new Error(`AnnotationRepository.update: unknown id ${annotation.id}`);
}
annotations.set(annotation.id, annotation);
return annotation;
},
},
evidenceItems: {
create(item) {
evidenceItems.set(item.id, item);
return item;
},
get(id) {
return evidenceItems.get(id) ?? null;
},
listByDocument(documentId, annotationLookup) {
const out: EvidenceItem[] = [];
for (const item of evidenceItems.values()) {
if (item.annotationIds.some((aid) => annotationLookup(aid)?.documentId === documentId)) {
out.push(item);
}
}
return out;
},
update(item) {
if (!evidenceItems.has(item.id)) {
throw new Error(`EvidenceItemRepository.update: unknown id ${item.id}`);
}
evidenceItems.set(item.id, item);
return item;
},
},
};
}

View File

@@ -0,0 +1,8 @@
export {
createInMemoryRepos,
type InMemoryRepos,
type DocumentRepository,
type RepresentationRepository,
type AnnotationRepository,
type EvidenceItemRepository,
} from "./in-memory";

View File

@@ -0,0 +1,102 @@
/**
* Annotation service — creates technical marks on document ranges and
* emits `AnnotationCreated`. Resolution-status updates emit
* `AnnotationResolved` / `AnnotationResolutionFailed`.
*
* Annotation creation is the engine's response to a user action in the
* viewer (T07). The viewer adapter has already turned the selection into
* `Selector[]`; this service stamps an ID, normalize-version, timestamps,
* persists, and broadcasts.
*/
import type {
Annotation,
AnnotationResolutionStatus,
} from "@shared/annotation";
import type { DocumentId, RepresentationId, AnnotationId } from "@shared/ids";
import type { Selector } from "@shared/selector";
import { newId } from "@shared/ids";
import { NORMALIZE_VERSION } from "@shared/text/normalize";
import type { EventBus } from "../events";
import type { AnnotationRepository } from "../repos";
export interface CreateAnnotationInput {
readonly documentId: DocumentId;
readonly representationId?: RepresentationId;
readonly selectors: readonly Selector[];
readonly quote?: string;
readonly note?: string;
readonly createdBy?: string;
}
export interface AnnotationService {
create(input: CreateAnnotationInput): Annotation;
get(id: AnnotationId): Annotation | null;
listByDocument(documentId: DocumentId): readonly Annotation[];
setResolutionStatus(
id: AnnotationId,
status: AnnotationResolutionStatus,
opts: { readonly confidence: number; readonly reason?: string },
): Annotation;
}
export function createAnnotationService(
annotations: AnnotationRepository,
bus: EventBus,
now: () => string = () => new Date().toISOString(),
): AnnotationService {
return {
create(input) {
const ts = now();
const annotation: Annotation = {
id: newId("annotation"),
documentId: input.documentId,
...(input.representationId !== undefined ? { representationId: input.representationId } : {}),
selectors: input.selectors,
...(input.quote !== undefined ? { quote: input.quote } : {}),
...(input.note !== undefined ? { note: input.note } : {}),
normalizeVersion: NORMALIZE_VERSION,
...(input.createdBy !== undefined ? { createdBy: input.createdBy } : {}),
createdAt: ts,
updatedAt: ts,
};
const stored = annotations.create(annotation);
bus.emit({ type: "AnnotationCreated", annotationId: stored.id, annotation: stored });
return stored;
},
get(id) {
return annotations.get(id);
},
listByDocument(documentId) {
return annotations.listByDocument(documentId);
},
setResolutionStatus(id, status, opts) {
const existing = annotations.get(id);
if (!existing) {
throw new Error(`AnnotationService.setResolutionStatus: unknown id ${id}`);
}
const updated: Annotation = {
...existing,
resolutionStatus: status,
updatedAt: now(),
};
const stored = annotations.update(updated);
if (status === "unresolved" || status === "stale") {
bus.emit({
type: "AnnotationResolutionFailed",
annotationId: stored.id,
reason: opts.reason ?? status,
});
} else {
bus.emit({
type: "AnnotationResolved",
annotationId: stored.id,
status,
confidence: opts.confidence,
});
}
return stored;
},
};
}

View File

@@ -0,0 +1,63 @@
/**
* Document service — registers ingested documents and emits the §4 events.
*
* The ingest pipeline (`src/source/pdf/ingest.ts`) is a pure function over
* bytes — it does not touch the engine. The app composition root calls
* `ingestPdf` then hands the result to `documentService.register()`, which
* is where the engine takes over: persist into the repos, emit
* `DocumentImported` + `DocumentRepresentationGenerated`.
*/
import type { Document, DocumentRepresentation } from "@shared/document";
import type { DocumentId, RepresentationId } from "@shared/ids";
import type { EventBus } from "../events";
import type { DocumentRepository, RepresentationRepository } from "../repos";
export interface DocumentService {
register(input: {
readonly document: Document;
readonly representation: DocumentRepresentation;
}): { readonly document: Document; readonly representation: DocumentRepresentation };
get(id: DocumentId): Document | null;
list(): readonly Document[];
getRepresentation(id: RepresentationId): DocumentRepresentation | null;
listRepresentations(documentId: DocumentId): readonly DocumentRepresentation[];
}
export function createDocumentService(
documents: DocumentRepository,
representations: RepresentationRepository,
bus: EventBus,
): DocumentService {
return {
register({ document, representation }) {
const storedDocument = documents.create(document);
const storedRepresentation = representations.create(representation);
bus.emit({
type: "DocumentImported",
documentId: storedDocument.id,
document: storedDocument,
});
bus.emit({
type: "DocumentRepresentationGenerated",
documentId: storedDocument.id,
representationId: storedRepresentation.id,
representation: storedRepresentation,
});
return { document: storedDocument, representation: storedRepresentation };
},
get(id) {
return documents.get(id);
},
list() {
return documents.list();
},
getRepresentation(id) {
return representations.get(id);
},
listRepresentations(documentId) {
return representations.listByDocument(documentId);
},
};
}

View File

@@ -0,0 +1,127 @@
/**
* Evidence service — creates EvidenceItems on top of annotations and
* tracks their lifecycle. Emits §4 events: `EvidenceItemCreated`,
* `EvidenceItemUpdated`, `EvidenceItemActivated`.
*
* MVP item shape per `wiki/SharedContracts.md` §2.2: status starts at
* `candidate`, may transition to `confirmed | rejected | needs-check`.
* Item-level relation/strength (supports/contradicts/...) lives on the
* link, not the item — that's CE-WP-0003.
*/
import type { Annotation } from "@shared/annotation";
import type {
EvidenceItem,
EvidenceItemStatus,
} from "@shared/evidence";
import type {
AnnotationId,
DocumentId,
EvidenceItemId,
} from "@shared/ids";
import { newId } from "@shared/ids";
import type { EventBus, EvidenceItemActivatedEvent } from "../events";
import type { EvidenceItemRepository } from "../repos";
export interface CreateEvidenceItemInput {
readonly annotationIds: readonly AnnotationId[];
readonly title?: string;
readonly commentary?: string;
readonly status?: EvidenceItemStatus;
readonly confidence?: number;
readonly tags?: readonly string[];
readonly createdBy?: string;
}
export interface EvidenceService {
create(input: CreateEvidenceItemInput): EvidenceItem;
get(id: EvidenceItemId): EvidenceItem | null;
listByDocument(documentId: DocumentId): readonly EvidenceItem[];
setStatus(id: EvidenceItemId, status: EvidenceItemStatus): EvidenceItem;
updateCommentary(id: EvidenceItemId, commentary: string): EvidenceItem;
activate(
id: EvidenceItemId,
source?: EvidenceItemActivatedEvent["source"],
): EvidenceItem;
}
export function createEvidenceService(
items: EvidenceItemRepository,
annotationLookup: (id: AnnotationId) => Annotation | null,
bus: EventBus,
now: () => string = () => new Date().toISOString(),
): EvidenceService {
return {
create(input) {
if (input.annotationIds.length === 0) {
throw new Error("EvidenceService.create: at least one annotationId is required");
}
const ts = now();
const item: EvidenceItem = {
id: newId("evidence"),
annotationIds: input.annotationIds,
...(input.title !== undefined ? { title: input.title } : {}),
...(input.commentary !== undefined ? { commentary: input.commentary } : {}),
status: input.status ?? "candidate",
...(input.confidence !== undefined ? { confidence: input.confidence } : {}),
...(input.tags !== undefined ? { tags: input.tags } : {}),
...(input.createdBy !== undefined ? { createdBy: input.createdBy } : {}),
createdAt: ts,
updatedAt: ts,
};
const stored = items.create(item);
bus.emit({ type: "EvidenceItemCreated", evidenceItemId: stored.id, evidenceItem: stored });
return stored;
},
get(id) {
return items.get(id);
},
listByDocument(documentId) {
return items.listByDocument(documentId, annotationLookup);
},
setStatus(id, status) {
const existing = items.get(id);
if (!existing) {
throw new Error(`EvidenceService.setStatus: unknown id ${id}`);
}
if (existing.status === status) return existing;
const updated: EvidenceItem = { ...existing, status, updatedAt: now() };
const stored = items.update(updated);
bus.emit({
type: "EvidenceItemUpdated",
evidenceItemId: stored.id,
evidenceItem: stored,
previousStatus: existing.status,
});
return stored;
},
updateCommentary(id, commentary) {
const existing = items.get(id);
if (!existing) {
throw new Error(`EvidenceService.updateCommentary: unknown id ${id}`);
}
const updated: EvidenceItem = { ...existing, commentary, updatedAt: now() };
const stored = items.update(updated);
bus.emit({
type: "EvidenceItemUpdated",
evidenceItemId: stored.id,
evidenceItem: stored,
previousStatus: existing.status,
});
return stored;
},
activate(id, source) {
const existing = items.get(id);
if (!existing) {
throw new Error(`EvidenceService.activate: unknown id ${id}`);
}
bus.emit({
type: "EvidenceItemActivated",
evidenceItemId: existing.id,
...(source !== undefined ? { source } : {}),
});
return existing;
},
};
}

View File

@@ -0,0 +1,14 @@
export {
createDocumentService,
type DocumentService,
} from "./documents";
export {
createAnnotationService,
type AnnotationService,
type CreateAnnotationInput,
} from "./annotations";
export {
createEvidenceService,
type EvidenceService,
type CreateEvidenceItemInput,
} from "./evidence";

View File

@@ -1 +1,8 @@
export {};
export {
ingestPdf,
type IngestPdfInput,
type IngestPdfOptions,
type IngestPdfResult,
} from "./pdf/ingest";
export { extractPdf, type PdfExtractionResult } from "./pdf/extract";
export { fingerprintBytes } from "./pdf/fingerprint";

122
src/source/pdf/extract.ts Normal file
View File

@@ -0,0 +1,122 @@
/**
* PDF text extraction → canonical text + PageMap + OffsetMap.
*
* Implements `wiki/ArchitectureOverview.md` §3.4 ("extract canonical text /
* build format-specific maps") for the `pdf-text` representation
* (`wiki/SharedContracts.md` §1, §3) and §6 (canonical normalization).
*
* Runtime independence: the PDF.js worker must be configured by the host
* application (`GlobalWorkerOptions.workerSrc`) before this module is
* called. In Vite/browser code the worker is bundled via the viewer; in
* Node tests the test setup file points it at
* `pdfjs-dist/legacy/build/pdf.worker.mjs`. No worker setup happens here
* so the same module loads cleanly in both runtimes.
*
* Page boundary semantics: canonical text concatenates per-page normalized
* text with a single "\n\n" paragraph separator. The separator is treated
* as belonging to the *preceding* page in `OffsetMap`, so the map covers
* `[0, canonicalText.length)` with no gaps. The last page has no trailing
* separator. This means `pageLength = globalEnd - globalStart` for
* every page; for non-last pages it equals (normalized page text length +
* 2). See `PageOffsetRange` in `@shared/document.ts`.
*/
import { getDocument } from "pdfjs-dist";
import type { PDFPageProxy } from "pdfjs-dist";
import type {
OffsetMap,
PageInfo,
PageMap,
PageOffsetRange,
} from "@shared/document";
import { normalize } from "@shared/text/normalize";
const PAGE_SEPARATOR = "\n\n";
export interface PdfExtractionResult {
readonly canonicalText: string;
readonly pageMap: PageMap;
readonly offsetMap: OffsetMap;
readonly pageCount: number;
}
export async function extractPdf(bytes: Uint8Array): Promise<PdfExtractionResult> {
// PDF.js mutates the bytes buffer (transfers ownership). Pass a fresh copy
// so the caller's Uint8Array stays usable for fingerprinting after extract.
const data = new Uint8Array(bytes);
const loadingTask = getDocument({ data });
const doc = await loadingTask.promise;
try {
const pageCount = doc.numPages;
const pageInfos: PageInfo[] = [];
const pageNormalizedTexts: string[] = [];
for (let pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
const page = await doc.getPage(pageNumber);
try {
const viewport = page.getViewport({ scale: 1 });
pageInfos.push({
page: pageNumber,
width: viewport.width,
height: viewport.height,
});
const rawText = await extractPageText(page);
pageNormalizedTexts.push(normalize(rawText).text);
} finally {
page.cleanup();
}
}
const { canonicalText, offsetMap } = buildOffsetMap(pageNormalizedTexts);
return {
canonicalText,
pageMap: pageInfos,
offsetMap,
pageCount,
};
} finally {
await doc.destroy();
}
}
async function extractPageText(page: PDFPageProxy): Promise<string> {
const content = await page.getTextContent();
// textContent.items are TextItem | TextMarkedContent. We want only the
// TextItem strings (those have a `str` field); marked-content entries are
// structural anchors and have no visible text.
const parts: string[] = [];
for (const item of content.items) {
if ("str" in item) {
parts.push(item.str);
if (item.hasEOL) parts.push("\n");
}
}
return parts.join("");
}
function buildOffsetMap(pageTexts: readonly string[]): {
canonicalText: string;
offsetMap: OffsetMap;
} {
const ranges: PageOffsetRange[] = [];
let offset = 0;
for (let i = 0; i < pageTexts.length; i++) {
const text = pageTexts[i]!;
const isLast = i === pageTexts.length - 1;
const segmentLength = text.length + (isLast ? 0 : PAGE_SEPARATOR.length);
const globalStart = offset;
const globalEnd = offset + segmentLength;
ranges.push({
page: i + 1,
globalStart,
globalEnd,
pageLength: segmentLength,
});
offset = globalEnd;
}
const canonicalText = pageTexts.join(PAGE_SEPARATOR);
return { canonicalText, offsetMap: ranges };
}

View File

@@ -0,0 +1,31 @@
/**
* SHA-256 fingerprint of raw document bytes.
*
* Implements the fingerprint half of `wiki/ArchitectureOverview.md` §3.4
* (the "compute fingerprint" pipeline step) and populates
* `Document.fingerprint` (`wiki/SharedContracts.md` §1).
*
* Uses Web Crypto's `crypto.subtle.digest`, which is available in browsers
* and in Node ≥ 20 (where it is exposed on `globalThis.crypto`). No
* platform branching — the API is the same in both environments.
*/
export async function fingerprintBytes(bytes: Uint8Array): Promise<string> {
// Copy into a fresh ArrayBuffer (not SharedArrayBuffer) so the digest call
// satisfies TS's updated `BufferSource` type, which excludes
// `SharedArrayBuffer`. The copy is O(n) — fine even for large PDFs since
// SHA-256 itself is already O(n).
const ab = new ArrayBuffer(bytes.byteLength);
new Uint8Array(ab).set(bytes);
const digest = await crypto.subtle.digest("SHA-256", ab);
return bytesToHex(new Uint8Array(digest));
}
function bytesToHex(bytes: Uint8Array): string {
let hex = "";
for (let i = 0; i < bytes.length; i++) {
const b = bytes[i]!;
hex += (b < 0x10 ? "0" : "") + b.toString(16);
}
return hex;
}

View File

@@ -0,0 +1,142 @@
/**
* Fixture-driven contract tests for the PDF ingest pipeline.
*
* For each fixture in `fixtures/pdfs/manifest.json`:
* 1. Read the PDF bytes from disk.
* 2. Run `ingestPdf` end-to-end.
* 3. Assert the resulting Document + DocumentRepresentation honour the
* manifest contract: media type is application/pdf, fingerprint is a
* 64-hex SHA-256, pageMap matches `page_count`, canonicalText
* contains `known_good_quote`, and the offsetMap covers
* `[0, canonicalText.length)` with no gaps.
*
* This is the verification gate for CE-WP-0002-T03.
*/
import { readFileSync } from "node:fs";
import { dirname, resolve } from "node:path";
import { createRequire } from "node:module";
import { fileURLToPath } from "node:url";
import { beforeAll, describe, expect, it } from "vitest";
import { ingestPdf } from "./ingest";
import { fingerprintBytes } from "./fingerprint";
import manifest from "../../../fixtures/pdfs/manifest.json" with { type: "json" };
const __dirname = dirname(fileURLToPath(import.meta.url));
const FIXTURE_DIR = resolve(__dirname, "../../../fixtures/pdfs");
interface Fixture {
id: string;
filename: string;
page_count: number;
known_good_quote: string;
known_good_quote_page: number;
}
const FIXTURES: readonly Fixture[] = manifest.fixtures;
beforeAll(async () => {
// PDF.js needs a workerSrc set. In Node tests we point it at the legacy
// worker bundle — the modern bundle uses APIs that aren't present in
// Node. The legacy worker is bundled as plain JS and runs through the
// fake-worker fallback that PDF.js spins up when no real Worker is
// available.
const pdfjs = await import("pdfjs-dist");
const require = createRequire(import.meta.url);
pdfjs.GlobalWorkerOptions.workerSrc = require.resolve(
"pdfjs-dist/legacy/build/pdf.worker.mjs",
);
});
describe("ingestPdf — fixture corpus", () => {
for (const fixture of FIXTURES) {
describe(fixture.id, () => {
const path = resolve(FIXTURE_DIR, fixture.filename);
const bytes = new Uint8Array(readFileSync(path));
it("produces a Document with PDF media type and SHA-256 fingerprint", async () => {
const { document } = await ingestPdf(bytes, { filename: fixture.filename });
expect(document.mediaType).toBe("application/pdf");
expect(document.fingerprint).toMatch(/^[0-9a-f]{64}$/);
expect(document.title).toBe(fixture.filename);
// Fingerprint must be deterministic across runs.
const expected = await fingerprintBytes(bytes);
expect(document.fingerprint).toBe(expected);
});
it("produces a pdf-text representation with the expected page count", async () => {
const { representation } = await ingestPdf(bytes);
expect(representation.representationType).toBe("pdf-text");
expect(representation.pageMap?.length).toBe(fixture.page_count);
expect(representation.offsetMap?.length).toBe(fixture.page_count);
});
it("canonical text contains the manifest's known-good quote", async () => {
const { representation } = await ingestPdf(bytes);
const text = representation.canonicalText ?? "";
expect(text).toContain(fixture.known_good_quote);
});
it("offsetMap is gap-free and covers [0, canonicalText.length)", async () => {
const { representation } = await ingestPdf(bytes);
const text = representation.canonicalText ?? "";
const offsets = representation.offsetMap ?? [];
expect(offsets.length).toBeGreaterThan(0);
expect(offsets[0]!.globalStart).toBe(0);
expect(offsets.at(-1)!.globalEnd).toBe(text.length);
for (let i = 0; i < offsets.length; i++) {
const r = offsets[i]!;
expect(r.page).toBe(i + 1);
expect(r.globalEnd - r.globalStart).toBe(r.pageLength);
if (i > 0) expect(r.globalStart).toBe(offsets[i - 1]!.globalEnd);
}
});
it("pageMap entries have positive width and height in user-space points", async () => {
const { representation } = await ingestPdf(bytes);
const pages = representation.pageMap ?? [];
for (let i = 0; i < pages.length; i++) {
const p = pages[i]!;
expect(p.page).toBe(i + 1);
expect(p.width).toBeGreaterThan(0);
expect(p.height).toBeGreaterThan(0);
}
});
});
}
});
describe("ingestPdf — option handling", () => {
const fixture = FIXTURES[0]!;
const path = resolve(FIXTURE_DIR, fixture.filename);
const bytes = new Uint8Array(readFileSync(path));
it("uses explicit title over filename", async () => {
const { document } = await ingestPdf(bytes, {
filename: fixture.filename,
title: "Custom Title",
});
expect(document.title).toBe("Custom Title");
});
it("omits title entirely when neither filename nor title is supplied", async () => {
const { document } = await ingestPdf(bytes);
expect(document.title).toBeUndefined();
});
it("propagates uri and metadata when supplied", async () => {
const { document } = await ingestPdf(bytes, {
uri: "file:///example.pdf",
metadata: { source: "test" },
});
expect(document.uri).toBe("file:///example.pdf");
expect(document.metadata).toEqual({ source: "test" });
});
it("accepts ArrayBuffer input", async () => {
const ab = bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
const { document } = await ingestPdf(ab);
expect(document.fingerprint).toMatch(/^[0-9a-f]{64}$/);
});
});

88
src/source/pdf/ingest.ts Normal file
View File

@@ -0,0 +1,88 @@
/**
* PDF ingest pipeline → `{ document, representation }`.
*
* Implements `wiki/ArchitectureOverview.md` §3.4 ("Raw Source → identify
* media type → compute fingerprint → extract metadata → extract canonical
* text → build format-specific maps → persist Document +
* DocumentRepresentation") for the PDF source format.
*
* Ingest is a pure function over bytes: it does not persist anything. The
* caller (engine repositories in T05, app layer in T06) writes the returned
* Document + DocumentRepresentation into the chosen store.
*/
import {
type Document,
type DocumentRepresentation,
} from "@shared/document";
import { newId } from "@shared/ids";
import { extractPdf } from "./extract";
import { fingerprintBytes } from "./fingerprint";
const PDF_MEDIA_TYPE = "application/pdf";
export interface IngestPdfOptions {
/** Original filename, used as the default title when no title is given. */
readonly filename?: string;
/** Optional pre-existing title (overrides filename). */
readonly title?: string;
/** Optional source URI (e.g. file:// or https://). */
readonly uri?: string;
/** Free-form metadata persisted on the Document record. */
readonly metadata?: Readonly<Record<string, unknown>>;
}
export interface IngestPdfResult {
readonly document: Document;
readonly representation: DocumentRepresentation;
}
export type IngestPdfInput = Uint8Array | ArrayBuffer | Blob;
export async function ingestPdf(
input: IngestPdfInput,
options: IngestPdfOptions = {},
): Promise<IngestPdfResult> {
const bytes = await toBytes(input);
const [fingerprint, extraction] = await Promise.all([
fingerprintBytes(bytes),
extractPdf(bytes),
]);
const now = new Date().toISOString();
const documentId = newId("document");
const representationId = newId("representation");
const title = options.title ?? options.filename;
const document: Document = {
id: documentId,
mediaType: PDF_MEDIA_TYPE,
fingerprint,
createdAt: now,
updatedAt: now,
...(title !== undefined ? { title } : {}),
...(options.uri !== undefined ? { uri: options.uri } : {}),
...(options.metadata !== undefined ? { metadata: options.metadata } : {}),
};
const representation: DocumentRepresentation = {
id: representationId,
documentId,
representationType: "pdf-text",
contentHash: fingerprint,
canonicalText: extraction.canonicalText,
pageMap: extraction.pageMap,
offsetMap: extraction.offsetMap,
generatedAt: now,
};
return { document, representation };
}
async function toBytes(input: IngestPdfInput): Promise<Uint8Array> {
if (input instanceof Uint8Array) return input;
if (input instanceof ArrayBuffer) return new Uint8Array(input);
// Blob (covers `File` in browsers — File extends Blob).
const buf = await input.arrayBuffer();
return new Uint8Array(buf);
}

View File

@@ -0,0 +1,100 @@
/**
* AnnotationToolbar — wires "I selected text" into "evidence appears in
* the sidebar".
*
* Visible only when a `pendingSelection` is set (the viewer publishes
* captures into context, then this toolbar lets the user attach commentary
* and commit). On Save it runs the full pipeline:
*
* 1. `createSelectors(capture, representation)` — anchor builds the
* maximal selector set against the active representation.
* 2. `engine.annotations.create(...)` — engine mints an Annotation +
* emits AnnotationCreated.
* 3. `engine.evidence.create(...)` — engine mints the EvidenceItem with
* the user's commentary, emits EvidenceItemCreated.
*
* The sidebar re-renders via the engine event bus, so no other glue is
* needed.
*/
import { useEffect, useState } from "react";
import { createSelectors } from "@anchor/index";
import {
useActiveDocument,
useEngine,
usePendingSelection,
} from "./EngineContext";
export function AnnotationToolbar() {
const engine = useEngine();
const { document, representation } = useActiveDocument();
const { pending, set } = usePendingSelection();
const [commentary, setCommentary] = useState("");
// Reset the commentary box whenever a fresh selection arrives.
useEffect(() => {
setCommentary("");
}, [pending]);
if (!pending || !document || !representation) return null;
const handleSave = () => {
const selectors = createSelectors(pending.capture, representation);
const annotation = engine.annotations.create({
documentId: document.id,
representationId: representation.id,
selectors,
quote: pending.capture.text,
});
engine.evidence.create({
annotationIds: [annotation.id],
...(commentary.trim().length > 0 ? { commentary: commentary.trim() } : {}),
});
set(null);
};
const handleDiscard = () => set(null);
const quote = pending.capture.text;
const shortQuote = quote.length > 200 ? `${quote.slice(0, 200)}` : quote;
return (
<div
style={{
borderBottom: "1px solid #f0c040",
background: "#fff8d6",
padding: 8,
fontFamily: "system-ui, sans-serif",
fontSize: 12,
}}
>
<div style={{ marginBottom: 6, fontWeight: 600 }}>
New annotation ({pending.selectors.length} selector{pending.selectors.length === 1 ? "" : "s"})
</div>
<div style={{ marginBottom: 6, fontStyle: "italic", color: "#444" }}>
&ldquo;{shortQuote}&rdquo;
</div>
<textarea
value={commentary}
onChange={(e) => setCommentary(e.target.value)}
placeholder="Add a one-line comment (optional)…"
rows={2}
style={{
width: "100%",
boxSizing: "border-box",
fontSize: 12,
padding: 4,
marginBottom: 6,
}}
/>
<div style={{ display: "flex", gap: 6 }}>
<button onClick={handleSave} style={{ fontSize: 12, padding: "4px 10px" }}>
Save evidence
</button>
<button onClick={handleDiscard} style={{ fontSize: 12, padding: "4px 10px" }}>
Discard
</button>
</div>
</div>
);
}

125
src/work/CollectionList.tsx Normal file
View File

@@ -0,0 +1,125 @@
/**
* CollectionList — the left pane.
*
* Lists the fixture corpus (the MVP stand-in for a real document collection).
* Clicking a fixture fetches the bytes, runs `ingestPdf` (PDF.js extraction
* + fingerprint + canonical text), registers the result with the engine
* (emitting §4 events), and activates it as the current document.
*
* Per CE-WP-0002-T06, the loaded fixture set is hard-wired to
* `fixtures/pdfs/manifest.json`. Real collections arrive in a later
* workplan.
*/
import { useCallback, useState } from "react";
import { ingestPdf } from "@source/index";
import { useEngine, useActiveDocumentId } from "./EngineContext";
import type { DocumentId } from "@shared/ids";
import manifest from "../../fixtures/pdfs/manifest.json";
interface Fixture {
id: string;
filename: string;
description: string;
page_count: number;
}
const FIXTURES: readonly Fixture[] = (manifest as { fixtures: Fixture[] }).fixtures;
export function CollectionList() {
const engine = useEngine();
const { id: activeId, setId } = useActiveDocumentId();
const [loadingFixtureId, setLoadingFixtureId] = useState<string | null>(null);
const [error, setError] = useState<string | null>(null);
// Remember which fixture-id maps to which loaded documentId so re-clicking
// a fixture activates the existing engine record rather than re-ingesting.
const [byFixture, setByFixture] = useState<Record<string, DocumentId>>({});
const handleLoad = useCallback(
async (fixture: Fixture) => {
setError(null);
const existing = byFixture[fixture.id];
if (existing) {
setId(existing);
return;
}
setLoadingFixtureId(fixture.id);
try {
const url = `/fixtures/pdfs/${encodeURIComponent(fixture.filename)}`;
const response = await fetch(url);
if (!response.ok) {
throw new Error(`fetch ${url}${response.status}`);
}
const buffer = await response.arrayBuffer();
const { document, representation } = await ingestPdf(new Uint8Array(buffer), {
filename: fixture.filename,
});
engine.documents.register({ document, representation });
setByFixture((prev) => ({ ...prev, [fixture.id]: document.id }));
setId(document.id);
} catch (err) {
setError(err instanceof Error ? err.message : String(err));
} finally {
setLoadingFixtureId(null);
}
},
[byFixture, engine, setId],
);
return (
<aside
style={{
width: 280,
borderRight: "1px solid #ddd",
padding: 12,
overflow: "auto",
flex: "0 0 280px",
}}
>
<h2 style={{ marginTop: 0, fontSize: 16 }}>Collection</h2>
<p style={{ fontSize: 12, color: "#555", marginTop: 0 }}>
{FIXTURES.length} fixture PDF{FIXTURES.length === 1 ? "" : "s"}
</p>
{error && (
<p style={{ fontSize: 12, color: "#b00020", background: "#fff4f4", padding: 6 }}>
{error}
</p>
)}
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
{FIXTURES.map((f) => {
const isLoading = loadingFixtureId === f.id;
const documentId = byFixture[f.id];
const isActive = documentId !== undefined && documentId === activeId;
return (
<li key={f.id} style={{ marginBottom: 6 }}>
<button
onClick={() => {
void handleLoad(f);
}}
disabled={isLoading}
style={{
display: "block",
width: "100%",
textAlign: "left",
background: isActive ? "#e8f0ff" : "white",
border: "1px solid #ccc",
padding: 6,
cursor: isLoading ? "wait" : "pointer",
fontSize: 12,
}}
>
<div style={{ fontWeight: 600 }}>{f.id}</div>
<div style={{ color: "#666", fontSize: 11 }}>
{f.page_count} page{f.page_count === 1 ? "" : "s"}
{isLoading ? " · loading…" : isActive ? " · open" : ""}
</div>
</button>
</li>
);
})}
</ul>
</aside>
);
}

219
src/work/EngineContext.tsx Normal file
View File

@@ -0,0 +1,219 @@
/**
* Engine + active-document React context.
*
* MVP composition root for the UI: one `Engine` instance for the lifetime of
* the SPA, plus the "what's open in the viewer right now" pointer.
* `useEngine()` returns the engine; `useActiveDocument()` returns the
* currently-loaded `{document, representation}` pair, refreshed when the
* engine emits `DocumentImported` / `DocumentRepresentationGenerated`.
*
* Replaces ad-hoc engine wiring inside each component. Per the workplan
* (T07 note), state lives in a single React context; no Zustand or Redux.
*/
import {
createContext,
useCallback,
useContext,
useEffect,
useMemo,
useState,
type ReactNode,
} from "react";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { AnnotationId, DocumentId } from "@shared/ids";
import type { Selector } from "@shared/selector";
import {
attachPersister,
createEngine,
restoreFromStorage,
type Engine,
} from "@engine/index";
import type { PdfSelectionCapture } from "@anchor/index";
/**
* localStorage keys for the engine snapshot and the UI's "what was open"
* pointer. ADR-0005 frames both as deliberately temporary — real
* persistence later.
*/
const STORAGE_KEY = "citation-evidence:engine-snapshot:v1";
const ACTIVE_KEY = "citation-evidence:active-document-id:v1";
/**
* The pending selection lives in context (not local component state) because
* the toolbar that consumes it is rendered above the viewer, not inside it.
* `null` means "no selection waiting for a comment".
*/
export interface PendingSelection {
readonly capture: PdfSelectionCapture;
readonly selectors: readonly Selector[];
}
interface EngineContextValue {
readonly engine: Engine;
readonly activeDocumentId: DocumentId | null;
setActiveDocumentId(id: DocumentId | null): void;
readonly pendingSelection: PendingSelection | null;
setPendingSelection(pending: PendingSelection | null): void;
readonly scrollToAnnotationId: AnnotationId | null;
/** The version counter bumps even when the same id is set twice in a row,
* so a second click on the same evidence item still triggers a scroll. */
readonly scrollVersion: number;
scrollToAnnotation(id: AnnotationId | null): void;
}
const EngineContext = createContext<EngineContextValue | null>(null);
interface EngineProviderProps {
readonly children: ReactNode;
/** Inject a pre-built engine for tests; production uses the default. */
readonly engine?: Engine;
}
export function EngineProvider({ children, engine: injected }: EngineProviderProps) {
const engine = useMemo(() => injected ?? createEngine(), [injected]);
const [activeDocumentId, setActiveDocumentIdState] = useState<DocumentId | null>(null);
const [pendingSelection, setPendingSelection] = useState<PendingSelection | null>(null);
const [scrollState, setScrollState] = useState<{ id: AnnotationId | null; version: number }>({
id: null,
version: 0,
});
// Restore from localStorage on first mount, then attach the persister.
// The injected-engine path skips persistence (tests own their lifecycle).
useEffect(() => {
if (injected) return;
if (typeof globalThis.localStorage === "undefined") return;
const result = restoreFromStorage(engine, { key: STORAGE_KEY });
if (result.restored) {
const saved = globalThis.localStorage.getItem(ACTIVE_KEY);
if (saved && engine.documents.get(saved as DocumentId)) {
setActiveDocumentIdState(saved as DocumentId);
}
}
return attachPersister(engine, { key: STORAGE_KEY });
}, [engine, injected]);
// Persist the active-document pointer alongside the engine snapshot so a
// reload lands the user back where they were.
useEffect(() => {
if (injected) return;
if (typeof globalThis.localStorage === "undefined") return;
if (activeDocumentId) {
globalThis.localStorage.setItem(ACTIVE_KEY, activeDocumentId);
} else {
globalThis.localStorage.removeItem(ACTIVE_KEY);
}
}, [activeDocumentId, injected]);
// Switching the active document discards any pending selection — it
// belongs to the previous document's viewer state.
const setActiveDocumentId = useCallback((id: DocumentId | null) => {
setActiveDocumentIdState(id);
setPendingSelection(null);
setScrollState((prev) => ({ id: null, version: prev.version + 1 }));
}, []);
const scrollToAnnotation = useCallback((id: AnnotationId | null) => {
setScrollState((prev) => ({ id, version: prev.version + 1 }));
}, []);
const value = useMemo<EngineContextValue>(
() => ({
engine,
activeDocumentId,
setActiveDocumentId,
pendingSelection,
setPendingSelection,
scrollToAnnotationId: scrollState.id,
scrollVersion: scrollState.version,
scrollToAnnotation,
}),
[engine, activeDocumentId, setActiveDocumentId, pendingSelection, scrollState, scrollToAnnotation],
);
return <EngineContext.Provider value={value}>{children}</EngineContext.Provider>;
}
export function useEngine(): Engine {
const ctx = useContext(EngineContext);
if (!ctx) throw new Error("useEngine: missing EngineProvider");
return ctx.engine;
}
export function useActiveDocumentId(): {
readonly id: DocumentId | null;
setId(id: DocumentId | null): void;
} {
const ctx = useContext(EngineContext);
if (!ctx) throw new Error("useActiveDocumentId: missing EngineProvider");
return { id: ctx.activeDocumentId, setId: ctx.setActiveDocumentId };
}
export function useActiveDocument(): {
readonly document: Document | null;
readonly representation: DocumentRepresentation | null;
} {
const engine = useEngine();
const { id } = useActiveDocumentId();
const [tick, setTick] = useState(0);
// Re-render when documents come and go so list views stay fresh.
useEffect(() => {
const off1 = engine.bus.on("DocumentImported", () => setTick((t) => t + 1));
const off2 = engine.bus.on("DocumentRepresentationGenerated", () => setTick((t) => t + 1));
return () => {
off1();
off2();
};
}, [engine]);
const document = id ? engine.documents.get(id) : null;
const representation = id
? engine.documents.listRepresentations(id).at(-1) ?? null
: null;
// `tick` is intentionally read to silence unused-var warnings; the dep
// chain is via useState so React handles the re-render. We don't actually
// need to consume the value.
void tick;
return { document, representation };
}
/**
* Subscribe to a single engine event type and trigger a re-render each time
* it fires. Returns the current monotonic counter — pure state-marker.
*/
export function useEngineEventTick<T extends Parameters<Engine["bus"]["on"]>[0]>(
type: T,
): number {
const engine = useEngine();
const [tick, setTick] = useState(0);
const bump = useCallback(() => setTick((t) => t + 1), []);
useEffect(() => engine.bus.on(type, bump), [engine, type, bump]);
return tick;
}
export function usePendingSelection(): {
readonly pending: PendingSelection | null;
set(pending: PendingSelection | null): void;
} {
const ctx = useContext(EngineContext);
if (!ctx) throw new Error("usePendingSelection: missing EngineProvider");
return { pending: ctx.pendingSelection, set: ctx.setPendingSelection };
}
export function useScrollToAnnotation(): {
readonly id: AnnotationId | null;
readonly version: number;
scrollTo(id: AnnotationId | null): void;
} {
const ctx = useContext(EngineContext);
if (!ctx) throw new Error("useScrollToAnnotation: missing EngineProvider");
return {
id: ctx.scrollToAnnotationId,
version: ctx.scrollVersion,
scrollTo: ctx.scrollToAnnotation,
};
}

View File

@@ -0,0 +1,101 @@
/**
* EvidenceSidebar — the right pane.
*
* Lists `EvidenceItem`s scoped to the currently-active document. Each row
* shows quote + commentary + status. Clicking a row emits
* `EvidenceItemActivated` via the engine, which T08 will translate into a
* scroll-to-passage in the viewer.
*
* T06 scope: read-only display + activation event. Item creation lives in
* T07; the click-to-reopen integration lives in T08.
*/
import { useMemo } from "react";
import type { EvidenceItem } from "@shared/evidence";
import {
useActiveDocument,
useEngine,
useEngineEventTick,
useScrollToAnnotation,
} from "./EngineContext";
export interface EvidenceSidebarProps {
onActivate?(item: EvidenceItem): void;
}
export function EvidenceSidebar(props: EvidenceSidebarProps) {
const engine = useEngine();
const { document } = useActiveDocument();
const { scrollTo } = useScrollToAnnotation();
// Refresh the list when items are created or updated. The tick values are
// included in the memo deps below so the list re-resolves on each event.
const createTick = useEngineEventTick("EvidenceItemCreated");
const updateTick = useEngineEventTick("EvidenceItemUpdated");
const items = useMemo<readonly EvidenceItem[]>(() => {
if (!document) return [];
return engine.evidence.listByDocument(document.id);
// createTick / updateTick are read here purely as memo invalidators.
}, [document, engine, createTick, updateTick]);
return (
<aside
style={{
width: 320,
borderLeft: "1px solid #ddd",
padding: 12,
overflow: "auto",
flex: "0 0 320px",
fontFamily: "system-ui, sans-serif",
}}
>
<h2 style={{ marginTop: 0, fontSize: 16 }}>Evidence</h2>
{!document && (
<p style={{ fontSize: 12, color: "#888" }}>No document open.</p>
)}
{document && items.length === 0 && (
<p style={{ fontSize: 12, color: "#888" }}>
No evidence yet. Select a passage in the viewer to create one.
</p>
)}
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
{items.map((item) => {
const firstAnnotationId = item.annotationIds[0];
const annotation = firstAnnotationId ? engine.annotations.get(firstAnnotationId) : null;
const quote = annotation?.quote ?? "(no quote)";
return (
<li key={item.id} style={{ marginBottom: 8 }}>
<button
onClick={() => {
engine.evidence.activate(item.id, "sidebar");
if (firstAnnotationId) scrollTo(firstAnnotationId);
props.onActivate?.(item);
}}
style={{
display: "block",
width: "100%",
textAlign: "left",
background: "#fff8d6",
border: "1px solid #e0c050",
padding: 8,
cursor: "pointer",
fontSize: 12,
}}
>
<div style={{ fontStyle: "italic", marginBottom: 4 }}>
&ldquo;{quote.slice(0, 140)}
{quote.length > 140 ? "…" : ""}&rdquo;
</div>
{item.commentary && (
<div style={{ color: "#333", marginBottom: 4 }}>{item.commentary}</div>
)}
<div style={{ color: "#666", fontSize: 11 }}>status: {item.status}</div>
</button>
</li>
);
})}
</ul>
</aside>
);
}

94
src/work/ViewerShell.tsx Normal file
View File

@@ -0,0 +1,94 @@
/**
* ViewerShell — the centre pane.
*
* Hosts the viewer adapter (currently the T02 PDF spike) and shows whatever
* is active. `work/` consumes only the adapter's public surface
* (`PdfSpikeViewer`) — it never touches PDF.js or react-pdf-highlighter-plus
* directly. When the PDF library is swapped (or the spike is replaced),
* only the adapter module changes; this shell stays the same.
*
* T06 scope: load + render the active PDF + show stored annotations. The
* selection-capture → annotation pipeline is wired in T07; the
* click-to-reopen pipeline is wired in T08.
*/
import { useMemo } from "react";
import { PdfSpikeViewer, type StoredAnnotation } from "@anchor/index";
import {
useActiveDocument,
useEngine,
useEngineEventTick,
usePendingSelection,
useScrollToAnnotation,
} from "./EngineContext";
import { AnnotationToolbar } from "./AnnotationToolbar";
export function ViewerShell() {
const engine = useEngine();
const { document, representation } = useActiveDocument();
const { set: setPending } = usePendingSelection();
const { id: scrollToId, version: scrollVersion } = useScrollToAnnotation();
// The viewer needs to re-fetch its highlight list whenever annotations
// change. The tick is included in the memo deps so the list re-resolves.
const annotationTick = useEngineEventTick("AnnotationCreated");
const annotations = useMemo<StoredAnnotation[]>(() => {
if (!document) return [];
return engine.annotations.listByDocument(document.id).map((a) => ({
id: a.id,
text: a.quote ?? "",
selectors: a.selectors,
}));
}, [document, engine, annotationTick]);
const fileUrl = useMemo(() => {
if (!document) return null;
const titleOrId = document.title ?? document.id;
return `/fixtures/pdfs/${encodeURIComponent(titleOrId)}`;
}, [document]);
if (!document || !representation || !fileUrl) {
return (
<main
style={{
flex: 1,
display: "flex",
alignItems: "center",
justifyContent: "center",
color: "#666",
fontFamily: "system-ui, sans-serif",
}}
>
Pick a fixture on the left to begin.
</main>
);
}
return (
<main
style={{
flex: 1,
display: "flex",
flexDirection: "column",
overflow: "hidden",
position: "relative",
}}
>
<AnnotationToolbar />
<div style={{ flex: 1, overflow: "hidden", position: "relative" }}>
<PdfSpikeViewer
// Re-key on scrollVersion so clicking the same item twice still
// triggers the viewer's mount-time scroll effect.
key={`${document.id}#${scrollVersion}`}
pdfUrl={fileUrl}
storedAnnotations={annotations}
{...(scrollToId ? { scrollToAnnotationId: scrollToId } : {})}
onSelectionCaptured={(capture, selectors) => {
setPending({ capture, selectors });
}}
/>
</div>
</main>
);
}

View File

@@ -1 +1,13 @@
export {};
export { CollectionList } from "./CollectionList";
export { ViewerShell } from "./ViewerShell";
export { EvidenceSidebar, type EvidenceSidebarProps } from "./EvidenceSidebar";
export { AnnotationToolbar } from "./AnnotationToolbar";
export {
EngineProvider,
useEngine,
useActiveDocument,
useActiveDocumentId,
useEngineEventTick,
usePendingSelection,
type PendingSelection,
} from "./EngineContext";

View File

@@ -0,0 +1,72 @@
/**
* Integration round-trip: ingest → createSelectors → resolveSelectors.
*
* This test crosses the source ↔ anchor boundary, which the
* `boundaries/element-types` lint rule (correctly) forbids inside `src/`.
* It lives under `tests/integration/` so it can verify the end-to-end
* MVP contract without weakening the production-code boundary.
*
* CE-WP-0002-T04 contract:
* For each fixture+known-quote pair, create selectors then immediately
* resolve them; resolution must succeed with confidence ≥ 0.9.
*/
import { readFileSync } from "node:fs";
import { dirname, resolve } from "node:path";
import { createRequire } from "node:module";
import { fileURLToPath } from "node:url";
import { beforeAll, describe, expect, it } from "vitest";
import { ingestPdf } from "@source/pdf/ingest";
import { createSelectors, resolveSelectors } from "@anchor/selectors";
import type { PdfSelectionCapture } from "@anchor/types";
import manifest from "../../fixtures/pdfs/manifest.json" with { type: "json" };
const __dirname = dirname(fileURLToPath(import.meta.url));
const FIXTURE_DIR = resolve(__dirname, "../../fixtures/pdfs");
interface Fixture {
id: string;
filename: string;
known_good_quote: string;
known_good_quote_page: number;
}
const FIXTURES: readonly Fixture[] = manifest.fixtures;
beforeAll(async () => {
const pdfjs = await import("pdfjs-dist");
const require = createRequire(import.meta.url);
pdfjs.GlobalWorkerOptions.workerSrc = require.resolve(
"pdfjs-dist/legacy/build/pdf.worker.mjs",
);
});
describe("create + resolve round-trip — fixture corpus", () => {
for (const fixture of FIXTURES) {
it(`${fixture.id}: known-good quote round-trips with confidence ≥ 0.9`, async () => {
const bytes = new Uint8Array(readFileSync(resolve(FIXTURE_DIR, fixture.filename)));
const { representation } = await ingestPdf(bytes, { filename: fixture.filename });
const capture: PdfSelectionCapture = {
kind: "pdf",
text: fixture.known_good_quote,
page: fixture.known_good_quote_page,
rects: [{ x: 0.1, y: 0.2, width: 0.5, height: 0.04 }],
};
const selectors = createSelectors(capture, representation);
const resolution = resolveSelectors(selectors, representation);
expect(resolution.status).toBe("resolved");
expect(resolution.confidence).toBeGreaterThanOrEqual(0.9);
const span = resolution.candidates[0]?.textPosition;
expect(span).toBeDefined();
const text = representation.canonicalText ?? "";
expect(text.slice(span!.start, span!.end)).toBe(fixture.known_good_quote);
expect(resolution.candidates[0]?.page).toBe(fixture.known_good_quote_page);
});
}
});

View File

@@ -0,0 +1,262 @@
/**
* CE-WP-0002-T09 — end-to-end test of the PRD slice-1 scenario.
*
* Steps verified (per the workplan):
* 1. Open the app.
* 2. Pick a fixture PDF.
* 3. Programmatically inject a selection for the manifest's known-good
* quote (the actual drag-select interaction is verified manually —
* see ADR-0004 for the headless-Chromium limitation).
* 4. Save an evidence item with a comment.
* 5. The item appears in the sidebar.
* 6. Reload the page (unmount + remount with the same localStorage).
* 7. Click the evidence item.
* 8. The viewer is asked to scroll to the correct annotation
* (the visual highlight render is not exercised here — see ADR-0004).
*
* The PDF.js viewer + ingest pipeline are mocked. Both have dedicated
* coverage elsewhere (`src/source/pdf/ingest.test.ts`,
* `tests/integration/anchor-source-roundtrip.test.ts`,
* `src/anchor/pdf-selector-math.test.ts`). T09 owns the *wiring* of the
* full UX flow; the PDF-rendering correctness lives in those layers.
*/
// @vitest-environment happy-dom
import { act, render, screen, waitFor } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import type { Selector } from "@shared/selector";
import type { DocumentId, RepresentationId } from "@shared/ids";
import type { Document, DocumentRepresentation } from "@shared/document";
import type { PdfSelectionCapture } from "@anchor/index";
import manifest from "../../fixtures/pdfs/manifest.json" with { type: "json" };
// ---------------------------------------------------------------------------
// Mocks
// ---------------------------------------------------------------------------
interface ViewerProps {
pdfUrl: string;
storedAnnotations: readonly { id: string; text: string; selectors: readonly Selector[] }[];
scrollToAnnotationId?: string;
onSelectionCaptured(capture: PdfSelectionCapture, selectors: Selector[]): void;
}
interface ViewerSnapshot {
pdfUrl: string | null;
storedAnnotationIds: string[];
scrollToAnnotationId: string | null;
onSelectionCaptured: ViewerProps["onSelectionCaptured"] | null;
}
const viewerSnapshot: ViewerSnapshot = {
pdfUrl: null,
storedAnnotationIds: [],
scrollToAnnotationId: null,
onSelectionCaptured: null,
};
vi.mock("@anchor/index", async (importOriginal) => {
const original = await importOriginal<typeof import("@anchor/index")>();
const MockPdfSpikeViewer = (props: ViewerProps) => {
viewerSnapshot.pdfUrl = props.pdfUrl;
viewerSnapshot.storedAnnotationIds = props.storedAnnotations.map((a) => a.id);
viewerSnapshot.scrollToAnnotationId = props.scrollToAnnotationId ?? null;
viewerSnapshot.onSelectionCaptured = props.onSelectionCaptured;
return (
<div
data-testid="mock-pdf-viewer"
data-pdf-url={props.pdfUrl}
data-scroll-to={props.scrollToAnnotationId ?? ""}
data-stored-count={String(props.storedAnnotations.length)}
/>
);
};
return {
...original,
PdfSpikeViewer: MockPdfSpikeViewer,
};
});
// Mock ingestPdf so the click-to-load flow doesn't pull PDF.js into jsdom.
// The synthetic representation includes the manifest's known-good quote in
// its canonical text so createSelectors/resolveSelectors run for real.
const FIXTURE = manifest.fixtures.find((f) => f.id === "fristsetzung-bezifferung")!;
const SYNTHETIC_CANONICAL = [
"Header boilerplate that comes before the quote.",
FIXTURE.known_good_quote,
"Trailing prose that comes after the quote.",
].join(" ");
vi.mock("@source/index", async (importOriginal) => {
const original = await importOriginal<typeof import("@source/index")>();
return {
...original,
ingestPdf: vi.fn(async (_input: unknown, options?: { filename?: string }) => {
const documentId = ("doc_test_" + Math.random().toString(36).slice(2, 10)) as DocumentId;
const representationId = ("rep_test_" + Math.random().toString(36).slice(2, 10)) as RepresentationId;
const document: Document = {
id: documentId,
mediaType: "application/pdf",
...(options?.filename ? { title: options.filename } : {}),
fingerprint: "synthetic-fingerprint-for-test",
createdAt: "2026-05-25T00:00:00.000Z",
updatedAt: "2026-05-25T00:00:00.000Z",
};
const representation: DocumentRepresentation = {
id: representationId,
documentId,
representationType: "pdf-text",
contentHash: "synthetic-fingerprint-for-test",
canonicalText: SYNTHETIC_CANONICAL,
pageMap: [{ page: 1, width: 595, height: 842 }],
offsetMap: [
{
page: 1,
globalStart: 0,
globalEnd: SYNTHETIC_CANONICAL.length,
pageLength: SYNTHETIC_CANONICAL.length,
},
],
generatedAt: "2026-05-25T00:00:00.000Z",
};
return { document, representation };
}),
};
});
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
function resetViewerSnapshot() {
viewerSnapshot.pdfUrl = null;
viewerSnapshot.storedAnnotationIds = [];
viewerSnapshot.scrollToAnnotationId = null;
viewerSnapshot.onSelectionCaptured = null;
}
function syntheticCaptureFor(fixturePage: number, text: string): PdfSelectionCapture {
return {
kind: "pdf",
text,
page: fixturePage,
rects: [{ x: 0.1, y: 0.2, width: 0.4, height: 0.04 }],
boundingRect: { x: 0.1, y: 0.2, width: 0.4, height: 0.04 },
};
}
async function loadApp() {
// Late import so the vi.mock calls take effect before the module graph
// pulls in @anchor / @source.
const { App } = await import("@app/App");
return render(<App />);
}
// ---------------------------------------------------------------------------
// Test
// ---------------------------------------------------------------------------
describe("App — PRD scenario steps 1-8 (CE-WP-0002-T09)", () => {
beforeEach(() => {
resetViewerSnapshot();
// Each test starts with empty localStorage.
globalThis.localStorage?.clear();
// The fetch isn't reached (ingestPdf is mocked) — but stub it so that
// any accidental call returns gracefully instead of TypeError.
globalThis.fetch = vi.fn(async () =>
new Response(new Uint8Array([0x25, 0x50, 0x44, 0x46]).buffer, {
status: 200,
headers: { "Content-Type": "application/pdf" },
}),
);
});
afterEach(() => {
vi.restoreAllMocks();
});
it("walks the full slice-1 scenario: load → select → save → reload → click → scroll", async () => {
const user = userEvent.setup();
// Step 1: open the app.
const { unmount } = await loadApp();
expect(screen.getByText("Collection")).toBeTruthy();
// Step 2: pick a fixture.
const fixtureButton = screen.getByRole("button", { name: new RegExp(FIXTURE.id) });
await user.click(fixtureButton);
// The mock viewer should have mounted with our test URL.
await waitFor(() => {
expect(viewerSnapshot.pdfUrl).toBeTruthy();
});
expect(decodeURIComponent(viewerSnapshot.pdfUrl!)).toContain(FIXTURE.filename);
// Step 3: programmatically inject a selection for the known-good quote.
expect(viewerSnapshot.onSelectionCaptured).not.toBeNull();
await act(async () => {
viewerSnapshot.onSelectionCaptured!(
syntheticCaptureFor(FIXTURE.known_good_quote_page, FIXTURE.known_good_quote),
[{ type: "TextQuoteSelector", exact: FIXTURE.known_good_quote }],
);
});
// The toolbar should appear with the quoted text preview.
const toolbar = await screen.findByText(/New annotation/);
expect(toolbar).toBeTruthy();
// Step 4: add a comment and save.
const textarea = screen.getByPlaceholderText(/Add a one-line comment/);
await user.type(textarea, "Important deadline clause");
await user.click(screen.getByRole("button", { name: /Save evidence/ }));
// Step 5: the item appears in the sidebar. The commentary text is
// unique to the right pane (the collection list never echoes it back).
await screen.findByText(/Important deadline clause/);
// The mock viewer should now know about the stored annotation.
await waitFor(() => {
expect(viewerSnapshot.storedAnnotationIds.length).toBe(1);
});
const savedAnnotationId = viewerSnapshot.storedAnnotationIds[0]!;
// Snapshot key from EngineContext.STORAGE_KEY — implementation detail
// but worth asserting once at the integration layer.
const stored = globalThis.localStorage?.getItem("citation-evidence:engine-snapshot:v1");
expect(stored).toBeTruthy();
// Step 6: reload — unmount and remount the App. The same localStorage is
// still in place because we did not clear it.
unmount();
resetViewerSnapshot();
await loadApp();
// The viewer should re-mount automatically because the active document
// was persisted.
await waitFor(() => {
expect(viewerSnapshot.pdfUrl).toBeTruthy();
});
expect(decodeURIComponent(viewerSnapshot.pdfUrl!)).toContain(FIXTURE.filename);
// The sidebar should show the restored item.
const restoredItem = await screen.findByText(/Important deadline clause/);
// The viewer must already know about the restored annotation.
expect(viewerSnapshot.storedAnnotationIds).toContain(savedAnnotationId);
// Step 7: click the evidence item in the sidebar. The commentary text
// is rendered inside the sidebar's <button>; walking up to its closest
// button ancestor and clicking it triggers the activate flow.
const sidebarButton = restoredItem.closest("button");
expect(sidebarButton).not.toBeNull();
await user.click(sidebarButton!);
// Step 8: the viewer was asked to scroll to the right annotation.
await waitFor(() => {
expect(viewerSnapshot.scrollToAnnotationId).toBe(savedAnnotationId);
});
});
});

View File

@@ -34,6 +34,6 @@
"@app/*": ["src/app/*"]
}
},
"include": ["src", "vite.config.ts", "vitest.config.ts"],
"include": ["src", "tests", "vite.config.ts", "vitest.config.ts"],
"exclude": ["node_modules", "dist"]
}

16
vitest.config.ts Normal file
View File

@@ -0,0 +1,16 @@
import { defineConfig } from "vitest/config";
import viteConfig from "./vite.config";
// Use the same resolve aliases as the app; pick a DOM environment for tests
// whose filename ends in `.dom.test.{ts,tsx}` so we can mount React via
// @testing-library/react. All other tests run in Node, which is faster and
// has full pdfjs-dist legacy-worker support.
export default defineConfig({
...viteConfig,
test: {
environmentMatchGlobs: [
["**/*.dom.test.{ts,tsx}", "happy-dom"],
],
globals: false,
},
});

View File

@@ -8,7 +8,7 @@ repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
topic_slug: citation_evidence_mvp
topic_id: 96fa8e80-9f74-40f2-84cd-644e9747b9ec
state_hub_workstream_id: 19cb420b-c262-4c0e-afab-e85946b2cfce
status: todo
status: done
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
@@ -129,7 +129,7 @@ and proposed alternative. Do not proceed with T03+.
id: CE-WP-0002-T03
state_hub_task_id: 01dad096-3521-42b9-aed9-ce0b2f5d3450
priority: high
status: todo
status: done
depends_on: [T02]
```
@@ -152,7 +152,7 @@ extracted canonical text must contain the manifest's known-good quote.
id: CE-WP-0002-T04
state_hub_task_id: 62e4839a-8026-4e15-b4cc-6685e56b3584
priority: high
status: todo
status: done
depends_on: [T01, T03]
```
@@ -183,7 +183,7 @@ confidence ≥ 0.9.
id: CE-WP-0002-T05
state_hub_task_id: b339a73a-6b58-471c-a01d-e769ea414ee7
priority: high
status: todo
status: done
depends_on: [T01]
```
@@ -206,7 +206,7 @@ No persistence to disk yet. ADR-0005 (persistence) is still pending.
id: CE-WP-0002-T06
state_hub_task_id: f400e133-6ec6-4d5a-98a0-a6408ca4125e
priority: high
status: todo
status: done
depends_on: [T02, T05]
```
@@ -232,7 +232,7 @@ note in ADR-0001 if it wasn't already.
id: CE-WP-0002-T07
state_hub_task_id: 26346a07-bf98-4d43-8b30-de2038ab72f8
priority: high
status: todo
status: done
depends_on: [T04, T05, T06]
```
@@ -254,7 +254,7 @@ Active state lives in a single React context for now; no Redux/Zustand.
id: CE-WP-0002-T08
state_hub_task_id: 469e3fb4-1b42-49a7-88dc-29a6d5055ef5
priority: critical
status: todo
status: done
depends_on: [T04, T06, T07]
```
@@ -276,7 +276,7 @@ Critically, this must also work **after a page reload**. Persistence to
id: CE-WP-0002-T09
state_hub_task_id: 77423e57-f2c5-42e1-9e6c-c9b6fa35dfcf
priority: high
status: todo
status: done
depends_on: [T07, T08]
```