generated from coulomb/repo-seed
Implement CE-WP-0002 T03-T09: ingest, anchor resolution, engine, UI, persistence, e2e
Completes the PDF review slice end-to-end. After this commit a user can
open a fixture, select text, save an evidence item with commentary, see
it in the sidebar, reload the page, click the item, and the viewer
scrolls to the passage.
- T03 src/source/pdf/{fingerprint,extract,ingest}.ts + 39 fixture tests
- SHA-256 fingerprint over a fresh ArrayBuffer (TS BufferSource-safe)
- PDF.js text extract; per-page normalize then join with "\n\n"
- PageMap + OffsetMap (gap-free coverage); pageLength = end - start
- Updated manifest's Betriebskosten quote to one PDF.js extracts cleanly
- T04 src/anchor/selectors/{create,resolve}.ts + 25 unit + 7 fixture tests
- createSelectors emits the maximal redundant set (TextQuote +
TextPosition + PdfRect + PdfPageText when available)
- resolveSelectors implements the SharedContracts §7 ladder; confidence
1.0 (pos+quote) → 0.7 (rect-only) → 0 (unresolved)
- Cross-module integration test moved to tests/integration/ to honor
the anchor↛source boundary lint rule
- T05 engine: sync event bus over the closed §4 vocabulary, Map-backed
repos, services, createEngine() composition root, 12 tests
- T06 work + app: three-pane shell (CollectionList | ViewerShell |
EvidenceSidebar) wired through EngineProvider; EngineContext lives in
src/work/ to respect the work↛app boundary; SpikeApp deleted
- T07 AnnotationToolbar: pendingSelection in context; Save runs
createSelectors → engine.annotations.create → engine.evidence.create
- T08 click-to-reopen + localStorage persistence
- scrollToAnnotation state in context with a version counter so a
second click on the same item re-fires the viewer scroll
- captureSnapshot/restoreSnapshot/attachPersister/restoreFromStorage;
restore bypasses services to avoid event-loops
- active-document id persisted alongside the snapshot so reload lands
on the same fixture; ADR-0005 written
- 9 persistence tests
- T09 tests/integration/app-prd-scenario.dom.test.tsx
- end-to-end happy-dom test of PRD scenario steps 1-8 through the real
React tree; viewer + ingest mocked per ADR-0004's headless-Chromium
limitation. Fixed memo-deps bug in EvidenceSidebar/ViewerShell where
useEngineEventTick values were not included in the useMemo deps,
leaving stale memoization across event-driven re-renders
- vitest.config.ts: happy-dom for *.dom.test.{ts,tsx} files
- noEmit added to tsconfig so tsc -b doesn't litter src/ with .js outputs
Gates: typecheck ✓ lint ✓ test 109/109 across 11 files ✓ build ✓
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -1,38 +1,85 @@
|
||||
# ADR-0005 — Persistence layer (MVP and beyond)
|
||||
# ADR-0005 — Persistence for the MVP slice
|
||||
|
||||
- Status: proposed
|
||||
- Date: 2026-05-24
|
||||
- Workplan: CE-WP-0001-T07 (stub); MVP placeholder in CE-WP-0002-T08
|
||||
- Status: accepted (provisional — durable storage owned by a later workplan)
|
||||
- Date: 2026-05-25
|
||||
- Workplan: CE-WP-0002-T08 (click-to-reopen requires reload-survival)
|
||||
|
||||
## Context
|
||||
|
||||
The MVP needs persistence so that "click an evidence item and have the PDF
|
||||
jump to and highlight the passage — even after a full page reload" works
|
||||
(PRD §20 step 4). The acceptable MVP shortcut is `localStorage` (decided
|
||||
explicitly in CE-WP-0002-T08).
|
||||
CE-WP-0002 needs the click-to-reopen flow to survive a page reload (PRD
|
||||
scenario step 4 → "even after a full page reload"). The full persistence
|
||||
design (SQLite local-first vs Postgres server-first) is too large to land
|
||||
inside this slice — `wiki/ArchitectureOverview.md` §10 lays out the bigger
|
||||
picture but the workplan explicitly defers the decision.
|
||||
|
||||
This ADR is the durable home for the real persistence decision: where do
|
||||
documents, annotations, evidence items, links, and sets live in v1.0?
|
||||
The engine already runs `Map`-backed in-memory repositories
|
||||
(`src/engine/repos/in-memory.ts`). To survive reloads we need *some*
|
||||
persistence boundary now, without committing to the long-term store.
|
||||
|
||||
## Options
|
||||
|
||||
- **A. Browser-local only (IndexedDB via `idb` or `dexie`)**
|
||||
- Pros: zero infra; great for a single-user reference workspace.
|
||||
- Cons: no cross-device sync; export/import only via files.
|
||||
|
||||
- **B. Local-first + sync server (e.g. CRDT-backed)**
|
||||
- Pros: matches the long-term vision of a workspace tool; conflict-free
|
||||
multi-device.
|
||||
- Cons: significant infra and CRDT design cost; out of MVP scope.
|
||||
|
||||
- **C. Traditional client/server with a REST or GraphQL API**
|
||||
- Pros: familiar; easy team-sharing story.
|
||||
- Cons: requires hosting; loses the local-first character.
|
||||
- **A. localStorage snapshot (this ADR).** The SPA serializes the entire
|
||||
engine state into a single JSON blob on every mutation and restores it
|
||||
on mount. No new dependencies; no schema migrations; no networking.
|
||||
Per-tab only.
|
||||
- **B. IndexedDB-backed store.** More headroom, more API surface, async
|
||||
reads. Needed eventually for binary blobs (PDF bytes) but overkill for
|
||||
the few hundred annotations the MVP produces.
|
||||
- **C. SQLite via `sql.js` or `wa-sqlite`.** Brings query semantics into
|
||||
the browser. Heavy for the MVP and entangles us with a database we may
|
||||
not keep.
|
||||
- **D. Server-backed persistence from day one.** Requires shipping a
|
||||
backend. Premature.
|
||||
|
||||
## Decision
|
||||
|
||||
(blank — to be answered before the second product slice past MVP.)
|
||||
Adopt **A: localStorage snapshot**, deliberately temporary.
|
||||
|
||||
Implementation lives in `src/engine/persistence.ts`:
|
||||
|
||||
- `captureSnapshot(engine)` returns
|
||||
`{ documents, representations, annotations, evidenceItems }`.
|
||||
- `attachPersister(engine, { key })` subscribes to every mutating engine
|
||||
event and writes a fresh snapshot to `localStorage` after each.
|
||||
- `restoreFromStorage(engine, { key })` reads the snapshot on app mount
|
||||
and hydrates the repos *directly* (bypassing service `create()` calls)
|
||||
so no spurious `*Created` events fire — the persister would otherwise
|
||||
loop on its own writes, and other UI listeners would see "the same
|
||||
annotation was created again" on every reload.
|
||||
- Snapshot is versioned (`SNAPSHOT_VERSION = 1`); a version mismatch
|
||||
throws on restore so a future schema bump is loud.
|
||||
|
||||
`src/work/EngineContext.tsx`'s `EngineProvider` wires this on first mount.
|
||||
A sibling localStorage key holds the last-active `documentId` so reload
|
||||
lands the user back on the same fixture.
|
||||
|
||||
## Why this is acceptable for the MVP
|
||||
|
||||
- The engine never holds PDF bytes — only metadata + selectors + commentary.
|
||||
A typical session is well under 1 MB even with hundreds of annotations,
|
||||
comfortably within the ~5 MB localStorage budget.
|
||||
- The repositories' `create()` signatures already match the shape an
|
||||
eventual durable repo would expose; swapping the implementation is a
|
||||
localised change.
|
||||
- "Survives reload" is the only persistence requirement of CE-WP-0002.
|
||||
Cross-device sync, multi-user access, query-by-tag, history — none are
|
||||
in scope yet.
|
||||
|
||||
## What this defers
|
||||
|
||||
- A real persistence ADR (SQLite local-first vs Postgres server-first vs
|
||||
IndexedDB) for CE-WP-0005+ work.
|
||||
- PDF byte persistence. Today the SPA re-fetches `/fixtures/pdfs/*` on
|
||||
load; bytes do not enter the snapshot.
|
||||
- Multi-tab consistency. Tabs see each other's writes only on reload.
|
||||
- Migrations beyond the version check.
|
||||
|
||||
## Consequences
|
||||
|
||||
(blank)
|
||||
- `src/engine/persistence.ts` is the single point of contact for storage.
|
||||
When the real durable-store ADR lands, that module is what changes.
|
||||
- Tests inject a memory-Storage shim into `attachPersister` /
|
||||
`restoreFromStorage` so they don't depend on a browser environment
|
||||
(see `src/engine/persistence.test.ts`).
|
||||
- Clearing the user's browser storage destroys all annotations — call
|
||||
this out in the README once the MVP ships.
|
||||
|
||||
@@ -1,14 +1,14 @@
|
||||
{
|
||||
"_schema_version": 1,
|
||||
"_description": "PDF fixture corpus for citation-evidence selector tests. Each entry binds a stable id (used by test code) to a file path, page count, and a verbatim known-good quote with its 1-indexed physical PDF page number. The quote is short, unique within the document, and chosen to round-trip cleanly through the canonical text normalizer.",
|
||||
"_provenance": "Page counts and quotes extracted on 2026-05-24 by reading each PDF directly. The Betriebskosten file is a scanned/handwritten form with noisy OCR text — its quote is taken from the reliably-extracted printed boilerplate, not from the handwritten fields.",
|
||||
"_provenance": "Page counts and quotes extracted on 2026-05-24 by reading each PDF directly, then re-verified on 2026-05-25 against the PDF.js v4 text extractor used by src/source/pdf/extract.ts. The Betriebskosten file is a scanned/handwritten form with noisy OCR text — its known-good quote was updated 2026-05-25 from 'Ich bitte um Überweisung auf das Konto bei' to 'Auf der Rückseite finden Sie Ihre Abrechnung' because PDF.js drops the capital-Ü in the original (the lowercase-ü in 'Rückseite' survives, so the new quote still exercises the umlaut code path).",
|
||||
"fixtures": [
|
||||
{
|
||||
"id": "betriebskosten-2024",
|
||||
"filename": "031-Kemal Güldag Betriebskosten 2024.pdf",
|
||||
"description": "German Betriebskostenabrechnung (utility-cost statement) for a Seeheim apartment — scanned cover letter + filled-in Abrechnung form. OCR-noisy text and handwritten field values. Useful for stress-testing canonical normalization and selector resolution on imperfect extraction.",
|
||||
"page_count": 2,
|
||||
"known_good_quote": "Ich bitte um Überweisung auf das Konto bei",
|
||||
"known_good_quote": "Auf der Rückseite finden Sie Ihre Abrechnung",
|
||||
"known_good_quote_page": 1,
|
||||
"characteristics": ["german", "umlauts", "scanned", "ocr-noisy", "form", "handwritten"]
|
||||
},
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>citation-evidence · spike</title>
|
||||
<title>citation-evidence</title>
|
||||
</head>
|
||||
<body style="margin:0">
|
||||
<div id="root"></div>
|
||||
|
||||
@@ -25,6 +25,9 @@
|
||||
"react-pdf-highlighter-plus": "^1.1.4"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@testing-library/dom": "^10.4.1",
|
||||
"@testing-library/react": "^16.3.2",
|
||||
"@testing-library/user-event": "^14.6.1",
|
||||
"@types/node": "^20.14.0",
|
||||
"@types/react": "^18.3.3",
|
||||
"@types/react-dom": "^18.3.0",
|
||||
@@ -34,6 +37,7 @@
|
||||
"eslint-plugin-boundaries": "^4.2.2",
|
||||
"eslint-plugin-import": "^2.30.0",
|
||||
"globals": "^15.9.0",
|
||||
"happy-dom": "^20.9.0",
|
||||
"typescript": "^5.5.4",
|
||||
"typescript-eslint": "^8.0.0",
|
||||
"vite": "^5.4.0",
|
||||
|
||||
183
pnpm-lock.yaml
generated
183
pnpm-lock.yaml
generated
@@ -21,6 +21,15 @@ importers:
|
||||
specifier: ^1.1.4
|
||||
version: 1.1.4(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(pdfjs-dist@4.10.38)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
|
||||
devDependencies:
|
||||
'@testing-library/dom':
|
||||
specifier: ^10.4.1
|
||||
version: 10.4.1
|
||||
'@testing-library/react':
|
||||
specifier: ^16.3.2
|
||||
version: 16.3.2(@testing-library/dom@10.4.1)(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
|
||||
'@testing-library/user-event':
|
||||
specifier: ^14.6.1
|
||||
version: 14.6.1(@testing-library/dom@10.4.1)
|
||||
'@types/node':
|
||||
specifier: ^20.14.0
|
||||
version: 20.19.41
|
||||
@@ -48,6 +57,9 @@ importers:
|
||||
globals:
|
||||
specifier: ^15.9.0
|
||||
version: 15.15.0
|
||||
happy-dom:
|
||||
specifier: ^20.9.0
|
||||
version: 20.9.0
|
||||
typescript:
|
||||
specifier: ^5.5.4
|
||||
version: 5.9.3
|
||||
@@ -59,7 +71,7 @@ importers:
|
||||
version: 5.4.21(@types/node@20.19.41)
|
||||
vitest:
|
||||
specifier: ^2.0.5
|
||||
version: 2.1.9(@types/node@20.19.41)
|
||||
version: 2.1.9(@types/node@20.19.41)(happy-dom@20.9.0)
|
||||
|
||||
packages:
|
||||
|
||||
@@ -134,6 +146,10 @@ packages:
|
||||
peerDependencies:
|
||||
'@babel/core': ^7.0.0-0
|
||||
|
||||
'@babel/runtime@7.29.2':
|
||||
resolution: {integrity: sha512-JiDShH45zKHWyGe4ZNVRrCjBz8Nh9TMmZG1kh4QTK8hCBTWBi8Da+i7s1fJw7/lYpM4ccepSNfqzZ/QvABBi5g==}
|
||||
engines: {node: '>=6.9.0'}
|
||||
|
||||
'@babel/template@7.28.6':
|
||||
resolution: {integrity: sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==}
|
||||
engines: {node: '>=6.9.0'}
|
||||
@@ -1051,9 +1067,37 @@ packages:
|
||||
'@tanstack/virtual-core@3.15.0':
|
||||
resolution: {integrity: sha512-0AwPGx0I8QxPYjAxShT/+z+ZOe9u8mW5rsXvivCTjRfRmz9a43+3mRyi4wwlyoUqOC56q/jatKa0Bh9M99BEHQ==}
|
||||
|
||||
'@testing-library/dom@10.4.1':
|
||||
resolution: {integrity: sha512-o4PXJQidqJl82ckFaXUeoAW+XysPLauYI43Abki5hABd853iMhitooc6znOnczgbTYmEP6U6/y1ZyKAIsvMKGg==}
|
||||
engines: {node: '>=18'}
|
||||
|
||||
'@testing-library/react@16.3.2':
|
||||
resolution: {integrity: sha512-XU5/SytQM+ykqMnAnvB2umaJNIOsLF3PVv//1Ew4CTcpz0/BRyy/af40qqrt7SjKpDdT1saBMc42CUok5gaw+g==}
|
||||
engines: {node: '>=18'}
|
||||
peerDependencies:
|
||||
'@testing-library/dom': ^10.0.0
|
||||
'@types/react': ^18.0.0 || ^19.0.0
|
||||
'@types/react-dom': ^18.0.0 || ^19.0.0
|
||||
react: ^18.0.0 || ^19.0.0
|
||||
react-dom: ^18.0.0 || ^19.0.0
|
||||
peerDependenciesMeta:
|
||||
'@types/react':
|
||||
optional: true
|
||||
'@types/react-dom':
|
||||
optional: true
|
||||
|
||||
'@testing-library/user-event@14.6.1':
|
||||
resolution: {integrity: sha512-vq7fv0rnt+QTXgPxr5Hjc210p6YKq2kmdziLgnsZGgLJ9e6VAShx1pACLuRjd/AS/sr7phAR58OIIpf0LlmQNw==}
|
||||
engines: {node: '>=12', npm: '>=6'}
|
||||
peerDependencies:
|
||||
'@testing-library/dom': '>=7.21.4'
|
||||
|
||||
'@tybys/wasm-util@0.10.2':
|
||||
resolution: {integrity: sha512-RoBvJ2X0wuKlWFIjrwffGw1IqZHKQqzIchKaadZZfnNpsAYp2mM0h36JtPCjNDAHGgYez/15uMBpfGwchhiMgg==}
|
||||
|
||||
'@types/aria-query@5.0.4':
|
||||
resolution: {integrity: sha512-rfT93uj5s0PRL7EzccGMs3brplhcrghnDoV26NqKhCAS1hVo+WdNsPvE/yb6ilfr5hi2MEk6d5EWJTKdxg8jVw==}
|
||||
|
||||
'@types/babel__core@7.20.5':
|
||||
resolution: {integrity: sha512-qoQprZvz5wQFJwMDqeseRXWv3rqMvhgpbXFfVyWhbx9X47POIA6i/+dXefEmZKoAgOaTdaIgNSMqMIU61yRyzA==}
|
||||
|
||||
@@ -1092,6 +1136,12 @@ packages:
|
||||
'@types/react@18.3.29':
|
||||
resolution: {integrity: sha512-ch0qJdr2JY0r04NXSprbK6TXOgnaJ1Tz23fm5W+z0/CBah6BSBc3n96h7K9GOtwh0HrilNWHIBzE1Ko4Dcw/Wg==}
|
||||
|
||||
'@types/whatwg-mimetype@3.0.2':
|
||||
resolution: {integrity: sha512-c2AKvDT8ToxLIOUlN51gTiHXflsfIFisS4pO7pDPoKouJCESkhZnEy623gwP9laCy5lnLDAw1vAzu2vM2YLOrA==}
|
||||
|
||||
'@types/ws@8.18.1':
|
||||
resolution: {integrity: sha512-ThVF6DCVhA8kUGy+aazFQ4kXQ7E1Ty7A3ypFOe0IcJV8O/M511G99AW24irKrW56Wt44yG9+ij8FaqoBGkuBXg==}
|
||||
|
||||
'@typescript-eslint/eslint-plugin@8.59.4':
|
||||
resolution: {integrity: sha512-PegsU+XfyJJNjd4+u/k6f9yTyp0lEXXiPopUNobZcIAUJFGICFLN+sP0Rb3JehVmiij1Ph0dFGYqODoRo/2+6A==}
|
||||
engines: {node: ^18.18.0 || ^20.9.0 || >=21.1.0}
|
||||
@@ -1309,10 +1359,18 @@ packages:
|
||||
ajv@6.15.0:
|
||||
resolution: {integrity: sha512-fgFx7Hfoq60ytK2c7DhnF8jIvzYgOMxfugjLOSMHjLIPgenqa7S7oaagATUq99mV6IYvN2tRmC0wnTYX6iPbMw==}
|
||||
|
||||
ansi-regex@5.0.1:
|
||||
resolution: {integrity: sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==}
|
||||
engines: {node: '>=8'}
|
||||
|
||||
ansi-styles@4.3.0:
|
||||
resolution: {integrity: sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==}
|
||||
engines: {node: '>=8'}
|
||||
|
||||
ansi-styles@5.2.0:
|
||||
resolution: {integrity: sha512-Cxwpt2SfTzTtXcfOlzGEee8O+c+MmUgGrNiBcXnuWxuFJHe6a5Hz7qwhwe5OgaSYI0IJvkLqWX1ASG+cJOkEiA==}
|
||||
engines: {node: '>=10'}
|
||||
|
||||
argparse@2.0.1:
|
||||
resolution: {integrity: sha512-8+9WqebbFzpX9OR+Wa6O29asIogeRMzcGtAINdpMHHyAg10f05aSFVBbcEqGf/PXw1EjAZ+q2/bEBg3DvurK3Q==}
|
||||
|
||||
@@ -1320,6 +1378,9 @@ packages:
|
||||
resolution: {integrity: sha512-ik3ZgC9dY/lYVVM++OISsaYDeg1tb0VtP5uL3ouh1koGOaUMDPpbFIei4JkFimWUFPn90sbMNMXQAIVOlnYKJA==}
|
||||
engines: {node: '>=10'}
|
||||
|
||||
aria-query@5.3.0:
|
||||
resolution: {integrity: sha512-b0P0sZPKtyu8HkeRAfCq0IfURZK+SuwMjY1UXGBU27wpAiTwQAIlq56IbIO+ytk/JjS1fMR14ee5WBBfKi5J6A==}
|
||||
|
||||
array-buffer-byte-length@1.0.2:
|
||||
resolution: {integrity: sha512-LHE+8BuR7RYGDKvnrmcuSq3tDcKv9OFEXQt/HpbZhY7V6h0zlUXutnAD82GiFx9rdieCMjkvtcsPqBwgUl1Iiw==}
|
||||
engines: {node: '>= 0.4'}
|
||||
@@ -1490,6 +1551,10 @@ packages:
|
||||
resolution: {integrity: sha512-8QmQKqEASLd5nx0U1B1okLElbUuuttJ/AnYmRXbbbGDWh6uS208EjD4Xqq/I9wK7u0v6O08XhTWnt5XtEbR6Dg==}
|
||||
engines: {node: '>= 0.4'}
|
||||
|
||||
dequal@2.0.3:
|
||||
resolution: {integrity: sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA==}
|
||||
engines: {node: '>=6'}
|
||||
|
||||
detect-node-es@1.1.0:
|
||||
resolution: {integrity: sha512-ypdmJU/TbBby2Dxibuv7ZLW3Bs1QEmM7nHjEANfohJLvE0XVujisn1qPJcZxg+qDucsr+bP6fLD1rPS3AhJ7EQ==}
|
||||
|
||||
@@ -1497,6 +1562,9 @@ packages:
|
||||
resolution: {integrity: sha512-35mSku4ZXK0vfCuHEDAwt55dg2jNajHZ1odvF+8SSr82EsZY4QmXfuWso8oEd8zRhVObSN18aM0CjSdoBX7zIw==}
|
||||
engines: {node: '>=0.10.0'}
|
||||
|
||||
dom-accessibility-api@0.5.16:
|
||||
resolution: {integrity: sha512-X7BJ2yElsnOJ30pZF4uIIDfBEVgF4XEBxL9Bxhy6dnrm5hkzqmsWHGTiHqRiITNhMyFLyAiWndIJP7Z1NTteDg==}
|
||||
|
||||
dunder-proto@1.0.1:
|
||||
resolution: {integrity: sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==}
|
||||
engines: {node: '>= 0.4'}
|
||||
@@ -1504,6 +1572,10 @@ packages:
|
||||
electron-to-chromium@1.5.361:
|
||||
resolution: {integrity: sha512-Q6Hts7N9FnJc5LeGRINFvLhCI9xZmNtTDe5ZbcVezQz7cU4a8Aua3GH1b8J2XY8Al9PF+OCwYqhgsOOheMdvkA==}
|
||||
|
||||
entities@7.0.1:
|
||||
resolution: {integrity: sha512-TWrgLOFUQTH994YUyl1yT4uyavY5nNB5muff+RtWaqNVCAK408b5ZnnbNAUEWLTCpum9w6arT70i1XdQ4UeOPA==}
|
||||
engines: {node: '>=0.12'}
|
||||
|
||||
es-abstract@1.24.2:
|
||||
resolution: {integrity: sha512-2FpH9Q5i2RRwyEP1AylXe6nYLR5OhaJTZwmlcP0dL/+JCbgg7yyEo/sEK6HeGZRf3dFpWwThaRHVApXSkW3xeg==}
|
||||
engines: {node: '>= 0.4'}
|
||||
@@ -1781,6 +1853,10 @@ packages:
|
||||
resolution: {integrity: sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==}
|
||||
engines: {node: '>= 0.4'}
|
||||
|
||||
happy-dom@20.9.0:
|
||||
resolution: {integrity: sha512-GZZ9mKe8r646NUAf/zemnGbjYh4Bt8/MqASJY+pSm5ZDtc3YQox+4gsLI7yi1hba6o+eCsGxpHn5+iEVn31/FQ==}
|
||||
engines: {node: '>=20.0.0'}
|
||||
|
||||
has-bigints@1.1.0:
|
||||
resolution: {integrity: sha512-R3pbpkcIqv2Pm3dUwgjclDRVmWpTJW2DcMzcIhEXEx1oh/CEMObMm3KLmRJOdvhM7o4uQBnwr8pzRK2sJWIqfg==}
|
||||
engines: {node: '>= 0.4'}
|
||||
@@ -1999,6 +2075,10 @@ packages:
|
||||
peerDependencies:
|
||||
react: ^16.5.1 || ^17.0.0 || ^18.0.0 || ^19.0.0
|
||||
|
||||
lz-string@1.5.0:
|
||||
resolution: {integrity: sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ==}
|
||||
hasBin: true
|
||||
|
||||
magic-string@0.30.21:
|
||||
resolution: {integrity: sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==}
|
||||
|
||||
@@ -2147,6 +2227,10 @@ packages:
|
||||
resolution: {integrity: sha512-vkcDPrRZo1QZLbn5RLGPpg/WmIQ65qoWWhcGKf/b5eplkkarX0m9z8ppCat4mlOqUsWpyNuYgO3VRyrYHSzX5g==}
|
||||
engines: {node: '>= 0.8.0'}
|
||||
|
||||
pretty-format@27.5.1:
|
||||
resolution: {integrity: sha512-Qb1gy5OrP5+zDf2Bvnzdl3jsTf1qXVMazbvCoKhtKqVs4/YK4ozX4gKQJJVyNe+cajNPn0KoC0MC3FUmaHWEmQ==}
|
||||
engines: {node: ^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0}
|
||||
|
||||
prop-types@15.8.1:
|
||||
resolution: {integrity: sha512-oj87CgZICdulUohogVAR7AjlC0327U4el4L6eAvOqCeudMDVU0NThNaV+b9Df4dXgSP1gXMTnPdhfe/2qDH5cg==}
|
||||
|
||||
@@ -2174,6 +2258,9 @@ packages:
|
||||
react-is@16.13.1:
|
||||
resolution: {integrity: sha512-24e6ynE2H+OKt4kqsOvNd8kBpV65zoxbA4BVsEOB3ARVWQki/DHzaUoC5KuON/BiccDaCCTZBuOcfZs70kR8bQ==}
|
||||
|
||||
react-is@17.0.2:
|
||||
resolution: {integrity: sha512-w2GsyukL62IJnlaff/nRegPQR94C/XXamvMWmSHRJ4y7Ts/4ocGRmTHvOs8PSE6pB3dWOrD/nueuU5sduBsQ4w==}
|
||||
|
||||
react-pdf-highlighter-plus@1.1.4:
|
||||
resolution: {integrity: sha512-cJPFZnKjp4mmPjnamh11eC2I0W4waFAwLLG1E3mTg4TQRyMyUY+C6SyUm8MAcQnogbaXIAvCXP9B4hsnTSflnA==}
|
||||
peerDependencies:
|
||||
@@ -2542,6 +2629,10 @@ packages:
|
||||
jsdom:
|
||||
optional: true
|
||||
|
||||
whatwg-mimetype@3.0.0:
|
||||
resolution: {integrity: sha512-nt+N2dzIutVRxARx1nghPKGv1xHikU7HKdfafKkLNLindmPU/ch3U31NOCGGA/dmPcmb1VlofO0vnKAcsm0o/Q==}
|
||||
engines: {node: '>=12'}
|
||||
|
||||
which-boxed-primitive@1.1.1:
|
||||
resolution: {integrity: sha512-TbX3mj8n0odCBFVlY8AxkqcHASw3L60jIuF8jFP78az3C2YhmGvqbHBpAjTRH2/xqYunrJ9g1jSyjCjpoWzIAA==}
|
||||
engines: {node: '>= 0.4'}
|
||||
@@ -2572,6 +2663,18 @@ packages:
|
||||
resolution: {integrity: sha512-BN22B5eaMMI9UMtjrGd5g5eCYPpCPDUy0FJXbYsaT5zYxjFOckS53SQDE3pWkVoWpHXVb3BrYcEN4Twa55B5cA==}
|
||||
engines: {node: '>=0.10.0'}
|
||||
|
||||
ws@8.21.0:
|
||||
resolution: {integrity: sha512-Vsp28b7DRcimFQvrqu2Wek3z1iYxDCWqHYB8Qsnk/S4RfaCQzPGPyBNuVjJV3cd6UiKtUtp6sNM77gWvzcCH+g==}
|
||||
engines: {node: '>=10.0.0'}
|
||||
peerDependencies:
|
||||
bufferutil: ^4.0.1
|
||||
utf-8-validate: '>=5.0.2'
|
||||
peerDependenciesMeta:
|
||||
bufferutil:
|
||||
optional: true
|
||||
utf-8-validate:
|
||||
optional: true
|
||||
|
||||
yallist@3.1.1:
|
||||
resolution: {integrity: sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==}
|
||||
|
||||
@@ -2670,6 +2773,8 @@ snapshots:
|
||||
'@babel/core': 7.29.0
|
||||
'@babel/helper-plugin-utils': 7.28.6
|
||||
|
||||
'@babel/runtime@7.29.2': {}
|
||||
|
||||
'@babel/template@7.28.6':
|
||||
dependencies:
|
||||
'@babel/code-frame': 7.29.0
|
||||
@@ -3482,11 +3587,38 @@ snapshots:
|
||||
|
||||
'@tanstack/virtual-core@3.15.0': {}
|
||||
|
||||
'@testing-library/dom@10.4.1':
|
||||
dependencies:
|
||||
'@babel/code-frame': 7.29.0
|
||||
'@babel/runtime': 7.29.2
|
||||
'@types/aria-query': 5.0.4
|
||||
aria-query: 5.3.0
|
||||
dom-accessibility-api: 0.5.16
|
||||
lz-string: 1.5.0
|
||||
picocolors: 1.1.1
|
||||
pretty-format: 27.5.1
|
||||
|
||||
'@testing-library/react@16.3.2(@testing-library/dom@10.4.1)(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)':
|
||||
dependencies:
|
||||
'@babel/runtime': 7.29.2
|
||||
'@testing-library/dom': 10.4.1
|
||||
react: 18.3.1
|
||||
react-dom: 18.3.1(react@18.3.1)
|
||||
optionalDependencies:
|
||||
'@types/react': 18.3.29
|
||||
'@types/react-dom': 18.3.7(@types/react@18.3.29)
|
||||
|
||||
'@testing-library/user-event@14.6.1(@testing-library/dom@10.4.1)':
|
||||
dependencies:
|
||||
'@testing-library/dom': 10.4.1
|
||||
|
||||
'@tybys/wasm-util@0.10.2':
|
||||
dependencies:
|
||||
tslib: 2.8.1
|
||||
optional: true
|
||||
|
||||
'@types/aria-query@5.0.4': {}
|
||||
|
||||
'@types/babel__core@7.20.5':
|
||||
dependencies:
|
||||
'@babel/parser': 7.29.3
|
||||
@@ -3531,6 +3663,12 @@ snapshots:
|
||||
'@types/prop-types': 15.7.15
|
||||
csstype: 3.2.3
|
||||
|
||||
'@types/whatwg-mimetype@3.0.2': {}
|
||||
|
||||
'@types/ws@8.18.1':
|
||||
dependencies:
|
||||
'@types/node': 20.19.41
|
||||
|
||||
'@typescript-eslint/eslint-plugin@8.59.4(@typescript-eslint/parser@8.59.4(eslint@9.39.4)(typescript@5.9.3))(eslint@9.39.4)(typescript@5.9.3)':
|
||||
dependencies:
|
||||
'@eslint-community/regexpp': 4.12.2
|
||||
@@ -3757,16 +3895,24 @@ snapshots:
|
||||
json-schema-traverse: 0.4.1
|
||||
uri-js: 4.4.1
|
||||
|
||||
ansi-regex@5.0.1: {}
|
||||
|
||||
ansi-styles@4.3.0:
|
||||
dependencies:
|
||||
color-convert: 2.0.1
|
||||
|
||||
ansi-styles@5.2.0: {}
|
||||
|
||||
argparse@2.0.1: {}
|
||||
|
||||
aria-hidden@1.2.6:
|
||||
dependencies:
|
||||
tslib: 2.8.1
|
||||
|
||||
aria-query@5.3.0:
|
||||
dependencies:
|
||||
dequal: 2.0.3
|
||||
|
||||
array-buffer-byte-length@1.0.2:
|
||||
dependencies:
|
||||
call-bound: 1.0.4
|
||||
@@ -3956,12 +4102,16 @@ snapshots:
|
||||
has-property-descriptors: 1.0.2
|
||||
object-keys: 1.1.1
|
||||
|
||||
dequal@2.0.3: {}
|
||||
|
||||
detect-node-es@1.1.0: {}
|
||||
|
||||
doctrine@2.1.0:
|
||||
dependencies:
|
||||
esutils: 2.0.3
|
||||
|
||||
dom-accessibility-api@0.5.16: {}
|
||||
|
||||
dunder-proto@1.0.1:
|
||||
dependencies:
|
||||
call-bind-apply-helpers: 1.0.2
|
||||
@@ -3970,6 +4120,8 @@ snapshots:
|
||||
|
||||
electron-to-chromium@1.5.361: {}
|
||||
|
||||
entities@7.0.1: {}
|
||||
|
||||
es-abstract@1.24.2:
|
||||
dependencies:
|
||||
array-buffer-byte-length: 1.0.2
|
||||
@@ -4352,6 +4504,18 @@ snapshots:
|
||||
|
||||
gopd@1.2.0: {}
|
||||
|
||||
happy-dom@20.9.0:
|
||||
dependencies:
|
||||
'@types/node': 20.19.41
|
||||
'@types/whatwg-mimetype': 3.0.2
|
||||
'@types/ws': 8.18.1
|
||||
entities: 7.0.1
|
||||
whatwg-mimetype: 3.0.0
|
||||
ws: 8.21.0
|
||||
transitivePeerDependencies:
|
||||
- bufferutil
|
||||
- utf-8-validate
|
||||
|
||||
has-bigints@1.1.0: {}
|
||||
|
||||
has-flag@4.0.0: {}
|
||||
@@ -4558,6 +4722,8 @@ snapshots:
|
||||
dependencies:
|
||||
react: 18.3.1
|
||||
|
||||
lz-string@1.5.0: {}
|
||||
|
||||
magic-string@0.30.21:
|
||||
dependencies:
|
||||
'@jridgewell/sourcemap-codec': 1.5.5
|
||||
@@ -4704,6 +4870,12 @@ snapshots:
|
||||
|
||||
prelude-ls@1.2.1: {}
|
||||
|
||||
pretty-format@27.5.1:
|
||||
dependencies:
|
||||
ansi-regex: 5.0.1
|
||||
ansi-styles: 5.2.0
|
||||
react-is: 17.0.2
|
||||
|
||||
prop-types@15.8.1:
|
||||
dependencies:
|
||||
loose-envify: 1.4.0
|
||||
@@ -4732,6 +4904,8 @@ snapshots:
|
||||
|
||||
react-is@16.13.1: {}
|
||||
|
||||
react-is@17.0.2: {}
|
||||
|
||||
react-pdf-highlighter-plus@1.1.4(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(pdfjs-dist@4.10.38)(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
|
||||
dependencies:
|
||||
'@radix-ui/react-collapsible': 1.1.12(@types/react-dom@18.3.7(@types/react@18.3.29))(@types/react@18.3.29)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
|
||||
@@ -5179,7 +5353,7 @@ snapshots:
|
||||
'@types/node': 20.19.41
|
||||
fsevents: 2.3.3
|
||||
|
||||
vitest@2.1.9(@types/node@20.19.41):
|
||||
vitest@2.1.9(@types/node@20.19.41)(happy-dom@20.9.0):
|
||||
dependencies:
|
||||
'@vitest/expect': 2.1.9
|
||||
'@vitest/mocker': 2.1.9(vite@5.4.21(@types/node@20.19.41))
|
||||
@@ -5203,6 +5377,7 @@ snapshots:
|
||||
why-is-node-running: 2.3.0
|
||||
optionalDependencies:
|
||||
'@types/node': 20.19.41
|
||||
happy-dom: 20.9.0
|
||||
transitivePeerDependencies:
|
||||
- less
|
||||
- lightningcss
|
||||
@@ -5214,6 +5389,8 @@ snapshots:
|
||||
- supports-color
|
||||
- terser
|
||||
|
||||
whatwg-mimetype@3.0.0: {}
|
||||
|
||||
which-boxed-primitive@1.1.1:
|
||||
dependencies:
|
||||
is-bigint: 1.1.0
|
||||
@@ -5266,6 +5443,8 @@ snapshots:
|
||||
|
||||
word-wrap@1.2.5: {}
|
||||
|
||||
ws@8.21.0: {}
|
||||
|
||||
yallist@3.1.1: {}
|
||||
|
||||
yocto-queue@0.1.0: {}
|
||||
|
||||
@@ -5,3 +5,9 @@ export {
|
||||
type PdfSpikeViewerProps,
|
||||
type StoredAnnotation,
|
||||
} from "./pdf-viewer-adapter-spike";
|
||||
export {
|
||||
createSelectors,
|
||||
resolveSelectors,
|
||||
DEFAULT_CONTEXT_CHARS,
|
||||
type CreateSelectorsOptions,
|
||||
} from "./selectors";
|
||||
|
||||
136
src/anchor/selectors/create.test.ts
Normal file
136
src/anchor/selectors/create.test.ts
Normal file
@@ -0,0 +1,136 @@
|
||||
import { describe, expect, it } from "vitest";
|
||||
import type { DocumentRepresentation } from "@shared/document";
|
||||
import type { DocumentId, RepresentationId } from "@shared/ids";
|
||||
import type {
|
||||
PdfPageTextSelector,
|
||||
PdfRectSelector,
|
||||
TextPositionSelector,
|
||||
TextQuoteSelector,
|
||||
} from "@shared/selector";
|
||||
import { createSelectors } from "./create";
|
||||
import type { PdfSelectionCapture } from "../types";
|
||||
|
||||
function repr(canonicalText: string): DocumentRepresentation {
|
||||
const pageLength = canonicalText.length;
|
||||
return {
|
||||
id: "rep_test" as RepresentationId,
|
||||
documentId: "doc_test" as DocumentId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: "test",
|
||||
canonicalText,
|
||||
pageMap: [{ page: 1, width: 595, height: 842 }],
|
||||
offsetMap: [
|
||||
{ page: 1, globalStart: 0, globalEnd: pageLength, pageLength },
|
||||
],
|
||||
generatedAt: "2026-05-25T00:00:00.000Z",
|
||||
};
|
||||
}
|
||||
|
||||
function capture(text: string, page = 1, rectsCount = 1): PdfSelectionCapture {
|
||||
return {
|
||||
kind: "pdf",
|
||||
text,
|
||||
page,
|
||||
rects: Array.from({ length: rectsCount }, (_, i) => ({
|
||||
x: 0.1,
|
||||
y: 0.2 + i * 0.05,
|
||||
width: 0.5,
|
||||
height: 0.04,
|
||||
})),
|
||||
boundingRect: { x: 0.1, y: 0.2, width: 0.5, height: 0.04 * rectsCount },
|
||||
};
|
||||
}
|
||||
|
||||
describe("createSelectors", () => {
|
||||
const text = "The quick brown fox jumps over the lazy dog near the river bank.";
|
||||
const representation = repr(text);
|
||||
|
||||
it("always includes a TextQuoteSelector with prefix and suffix from canonical text", () => {
|
||||
const sels = createSelectors(capture("brown fox"), representation);
|
||||
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector");
|
||||
expect(quote).toBeDefined();
|
||||
expect(quote!.exact).toBe("brown fox");
|
||||
expect(quote!.prefix).toBe("The quick ");
|
||||
expect(quote!.suffix).toBe(" jumps over the lazy dog near th");
|
||||
});
|
||||
|
||||
it("includes a TextPositionSelector pointing at the matched offset", () => {
|
||||
const sels = createSelectors(capture("brown fox"), representation);
|
||||
const pos = sels.find((s): s is TextPositionSelector => s.type === "TextPositionSelector");
|
||||
expect(pos).toBeDefined();
|
||||
expect(pos!.start).toBe(text.indexOf("brown fox"));
|
||||
expect(pos!.end).toBe(text.indexOf("brown fox") + "brown fox".length);
|
||||
});
|
||||
|
||||
it("includes a PdfRectSelector mirroring the capture's page and rects", () => {
|
||||
const c = capture("brown fox", 1, 2);
|
||||
const sels = createSelectors(c, representation);
|
||||
const rect = sels.find((s): s is PdfRectSelector => s.type === "PdfRectSelector");
|
||||
expect(rect).toBeDefined();
|
||||
expect(rect!.page).toBe(1);
|
||||
expect(rect!.rects).toEqual(c.rects);
|
||||
});
|
||||
|
||||
it("includes a PdfPageTextSelector when the match falls inside the capture's page range", () => {
|
||||
const sels = createSelectors(capture("brown fox"), representation);
|
||||
const pageText = sels.find((s): s is PdfPageTextSelector => s.type === "PdfPageTextSelector");
|
||||
expect(pageText).toBeDefined();
|
||||
expect(pageText!.page).toBe(1);
|
||||
expect(pageText!.start).toBe(text.indexOf("brown fox"));
|
||||
});
|
||||
|
||||
it("omits the TextPositionSelector when the quote cannot be found in canonical text", () => {
|
||||
const sels = createSelectors(capture("nonexistent phrase"), representation);
|
||||
const pos = sels.find((s) => s.type === "TextPositionSelector");
|
||||
expect(pos).toBeUndefined();
|
||||
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector");
|
||||
expect(quote!.exact).toBe("nonexistent phrase");
|
||||
expect(quote!.prefix).toBeUndefined();
|
||||
expect(quote!.suffix).toBeUndefined();
|
||||
});
|
||||
|
||||
it("clamps prefix at the start of the canonical text", () => {
|
||||
const sels = createSelectors(capture("The quick"), representation);
|
||||
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector")!;
|
||||
expect(quote.prefix).toBeUndefined();
|
||||
expect(quote.suffix).toBe(" brown fox jumps over the lazy d");
|
||||
});
|
||||
|
||||
it("clamps suffix at the end of the canonical text", () => {
|
||||
const sels = createSelectors(capture("river bank."), representation);
|
||||
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector")!;
|
||||
expect(quote.prefix).toBe("umps over the lazy dog near the ");
|
||||
expect(quote.suffix).toBeUndefined();
|
||||
});
|
||||
|
||||
it("honors a custom contextChars option", () => {
|
||||
const sels = createSelectors(capture("brown fox"), representation, { contextChars: 4 });
|
||||
const quote = sels.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector")!;
|
||||
expect(quote.prefix).toBe("ick ");
|
||||
expect(quote.suffix).toBe(" jum");
|
||||
});
|
||||
|
||||
it("prefers the on-page match when the quote appears on multiple pages", () => {
|
||||
// Two-page representation where the quote appears once per page.
|
||||
const canonical = "alpha echo bravo" + "\n\n" + "charlie echo delta";
|
||||
const rep: DocumentRepresentation = {
|
||||
id: "rep_multi" as RepresentationId,
|
||||
documentId: "doc_multi" as DocumentId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: "h",
|
||||
canonicalText: canonical,
|
||||
pageMap: [
|
||||
{ page: 1, width: 100, height: 100 },
|
||||
{ page: 2, width: 100, height: 100 },
|
||||
],
|
||||
offsetMap: [
|
||||
{ page: 1, globalStart: 0, globalEnd: 18, pageLength: 18 },
|
||||
{ page: 2, globalStart: 18, globalEnd: canonical.length, pageLength: canonical.length - 18 },
|
||||
],
|
||||
generatedAt: "2026-05-25T00:00:00.000Z",
|
||||
};
|
||||
const sels = createSelectors(capture("echo", 2), rep);
|
||||
const pos = sels.find((s): s is TextPositionSelector => s.type === "TextPositionSelector")!;
|
||||
expect(pos.start).toBe(canonical.indexOf("echo", 18));
|
||||
});
|
||||
});
|
||||
157
src/anchor/selectors/create.ts
Normal file
157
src/anchor/selectors/create.ts
Normal file
@@ -0,0 +1,157 @@
|
||||
/**
|
||||
* Build the maximal `Selector[]` from a viewer's `SelectionCapture`.
|
||||
*
|
||||
* Implements the "always store all selector types that are available" rule
|
||||
* from `wiki/SharedContracts.md` §3 (selector redundancy) and the create
|
||||
* half of the `AnchorAdapter` contract in
|
||||
* `wiki/ArchitectureOverview.md` §3.3.
|
||||
*
|
||||
* Output guarantee: every returned `Selector[]` includes a
|
||||
* `TextQuoteSelector` (always) and adds `TextPositionSelector`,
|
||||
* `PdfRectSelector`, `PdfPageTextSelector` only when the underlying data
|
||||
* actually supports them. Resolvers can rely on the union being trimmed —
|
||||
* a missing selector means "not available", not "skipped".
|
||||
*/
|
||||
|
||||
import type { DocumentRepresentation } from "@shared/document";
|
||||
import { normalize } from "@shared/text/normalize";
|
||||
import type {
|
||||
PdfPageTextSelector,
|
||||
PdfRectSelector,
|
||||
Selector,
|
||||
TextPositionSelector,
|
||||
TextQuoteSelector,
|
||||
} from "@shared/selector";
|
||||
|
||||
import type { PdfSelectionCapture, SelectionCapture } from "../types";
|
||||
|
||||
/** Default characters of prefix/suffix context stored on TextQuoteSelector. */
|
||||
export const DEFAULT_CONTEXT_CHARS = 32;
|
||||
|
||||
export interface CreateSelectorsOptions {
|
||||
readonly contextChars?: number;
|
||||
}
|
||||
|
||||
export function createSelectors(
|
||||
capture: SelectionCapture,
|
||||
representation: DocumentRepresentation,
|
||||
options: CreateSelectorsOptions = {},
|
||||
): Selector[] {
|
||||
// `SelectionCapture` is a discriminated union. The DOM branch is `never`
|
||||
// in MVP, so the only runtime shape is `PdfSelectionCapture`.
|
||||
return createSelectorsFromPdfCapture(capture, representation, options);
|
||||
}
|
||||
|
||||
function createSelectorsFromPdfCapture(
|
||||
capture: PdfSelectionCapture,
|
||||
representation: DocumentRepresentation,
|
||||
options: CreateSelectorsOptions,
|
||||
): Selector[] {
|
||||
const contextChars = options.contextChars ?? DEFAULT_CONTEXT_CHARS;
|
||||
const normalizedQuote = normalize(capture.text).text;
|
||||
const out: Selector[] = [];
|
||||
|
||||
const canonicalText = representation.canonicalText ?? "";
|
||||
const positions = canonicalText.length > 0 && normalizedQuote.length > 0
|
||||
? findAllOccurrences(canonicalText, normalizedQuote)
|
||||
: [];
|
||||
|
||||
// Locate the match that falls on the capture's page (when offsetMap is
|
||||
// known); otherwise fall back to the first match. If there is no match,
|
||||
// we still emit a quote-only TextQuoteSelector so the annotation is
|
||||
// recoverable later if the representation is rebuilt.
|
||||
const pageRange = representation.offsetMap?.find((r) => r.page === capture.page);
|
||||
const matchOffset = pickMatch(positions, pageRange);
|
||||
|
||||
// 1. TextQuoteSelector — always included.
|
||||
if (normalizedQuote.length > 0) {
|
||||
const quote = matchOffset !== null
|
||||
? buildQuoteSelectorWithContext(canonicalText, matchOffset, normalizedQuote, contextChars)
|
||||
: ({ type: "TextQuoteSelector", exact: normalizedQuote } satisfies TextQuoteSelector);
|
||||
out.push(quote);
|
||||
}
|
||||
|
||||
// 2. TextPositionSelector — only when we have a unique-enough match.
|
||||
if (matchOffset !== null) {
|
||||
const pos: TextPositionSelector = {
|
||||
type: "TextPositionSelector",
|
||||
start: matchOffset,
|
||||
end: matchOffset + normalizedQuote.length,
|
||||
};
|
||||
out.push(pos);
|
||||
}
|
||||
|
||||
// 3. PdfRectSelector — straight from the capture; viewer-coordinate truth.
|
||||
if (capture.rects.length > 0) {
|
||||
const rect: PdfRectSelector = {
|
||||
type: "PdfRectSelector",
|
||||
page: capture.page,
|
||||
rects: capture.rects,
|
||||
};
|
||||
out.push(rect);
|
||||
}
|
||||
|
||||
// 4. PdfPageTextSelector — when we have offsetMap and a unique-enough match
|
||||
// that falls inside the capture's page range.
|
||||
if (matchOffset !== null && pageRange) {
|
||||
if (matchOffset >= pageRange.globalStart && matchOffset + normalizedQuote.length <= pageRange.globalEnd) {
|
||||
const pageText: PdfPageTextSelector = {
|
||||
type: "PdfPageTextSelector",
|
||||
page: capture.page,
|
||||
start: matchOffset - pageRange.globalStart,
|
||||
end: matchOffset - pageRange.globalStart + normalizedQuote.length,
|
||||
};
|
||||
out.push(pageText);
|
||||
}
|
||||
}
|
||||
|
||||
return out;
|
||||
}
|
||||
|
||||
function findAllOccurrences(haystack: string, needle: string): number[] {
|
||||
if (needle.length === 0) return [];
|
||||
const out: number[] = [];
|
||||
let from = 0;
|
||||
for (;;) {
|
||||
const idx = haystack.indexOf(needle, from);
|
||||
if (idx === -1) break;
|
||||
out.push(idx);
|
||||
from = idx + 1;
|
||||
}
|
||||
return out;
|
||||
}
|
||||
|
||||
function pickMatch(
|
||||
positions: readonly number[],
|
||||
pageRange: { globalStart: number; globalEnd: number } | undefined,
|
||||
): number | null {
|
||||
if (positions.length === 0) return null;
|
||||
if (positions.length === 1) return positions[0]!;
|
||||
if (pageRange) {
|
||||
const onPage = positions.find(
|
||||
(p) => p >= pageRange.globalStart && p < pageRange.globalEnd,
|
||||
);
|
||||
if (onPage !== undefined) return onPage;
|
||||
}
|
||||
// Multiple matches and no page hint — return the first; resolve.ts will
|
||||
// need prefix/suffix to disambiguate.
|
||||
return positions[0]!;
|
||||
}
|
||||
|
||||
function buildQuoteSelectorWithContext(
|
||||
canonicalText: string,
|
||||
matchOffset: number,
|
||||
exact: string,
|
||||
contextChars: number,
|
||||
): TextQuoteSelector {
|
||||
const prefixStart = Math.max(0, matchOffset - contextChars);
|
||||
const suffixEnd = Math.min(canonicalText.length, matchOffset + exact.length + contextChars);
|
||||
const prefix = canonicalText.slice(prefixStart, matchOffset);
|
||||
const suffix = canonicalText.slice(matchOffset + exact.length, suffixEnd);
|
||||
return {
|
||||
type: "TextQuoteSelector",
|
||||
exact,
|
||||
...(prefix.length > 0 ? { prefix } : {}),
|
||||
...(suffix.length > 0 ? { suffix } : {}),
|
||||
};
|
||||
}
|
||||
6
src/anchor/selectors/index.ts
Normal file
6
src/anchor/selectors/index.ts
Normal file
@@ -0,0 +1,6 @@
|
||||
export {
|
||||
createSelectors,
|
||||
DEFAULT_CONTEXT_CHARS,
|
||||
type CreateSelectorsOptions,
|
||||
} from "./create";
|
||||
export { resolveSelectors } from "./resolve";
|
||||
137
src/anchor/selectors/resolve.test.ts
Normal file
137
src/anchor/selectors/resolve.test.ts
Normal file
@@ -0,0 +1,137 @@
|
||||
import { describe, expect, it } from "vitest";
|
||||
import type { DocumentRepresentation } from "@shared/document";
|
||||
import type { DocumentId, RepresentationId } from "@shared/ids";
|
||||
import type { Selector } from "@shared/selector";
|
||||
import { resolveSelectors } from "./resolve";
|
||||
|
||||
function repr(canonicalText: string, pages = 1): DocumentRepresentation {
|
||||
const segmentLen = pages === 1
|
||||
? canonicalText.length
|
||||
: Math.floor(canonicalText.length / pages);
|
||||
const offsetMap = [];
|
||||
for (let i = 0; i < pages; i++) {
|
||||
const start = i * segmentLen;
|
||||
const end = i === pages - 1 ? canonicalText.length : start + segmentLen;
|
||||
offsetMap.push({ page: i + 1, globalStart: start, globalEnd: end, pageLength: end - start });
|
||||
}
|
||||
return {
|
||||
id: "rep_test" as RepresentationId,
|
||||
documentId: "doc_test" as DocumentId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: "test",
|
||||
canonicalText,
|
||||
pageMap: Array.from({ length: pages }, (_, i) => ({ page: i + 1, width: 595, height: 842 })),
|
||||
offsetMap,
|
||||
generatedAt: "2026-05-25T00:00:00.000Z",
|
||||
};
|
||||
}
|
||||
|
||||
describe("resolveSelectors", () => {
|
||||
const text = "The quick brown fox jumps over the lazy dog.";
|
||||
const representation = repr(text);
|
||||
const brownFoxStart = text.indexOf("brown fox");
|
||||
const brownFoxEnd = brownFoxStart + "brown fox".length;
|
||||
|
||||
it("returns 1.0 confidence when position and quote agree exactly", () => {
|
||||
const selectors: Selector[] = [
|
||||
{ type: "TextPositionSelector", start: brownFoxStart, end: brownFoxEnd },
|
||||
{ type: "TextQuoteSelector", exact: "brown fox" },
|
||||
];
|
||||
const r = resolveSelectors(selectors, representation);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(1.0);
|
||||
expect(r.candidates[0]?.textPosition).toEqual({ start: brownFoxStart, end: brownFoxEnd });
|
||||
expect(r.candidates[0]?.page).toBe(1);
|
||||
expect(r.usedSelectorTypes).toEqual(["TextPositionSelector", "TextQuoteSelector"]);
|
||||
});
|
||||
|
||||
it("falls back to quote search when position is stale, and records a warning", () => {
|
||||
const selectors: Selector[] = [
|
||||
{ type: "TextPositionSelector", start: 0, end: 9 }, // "The quick"
|
||||
{ type: "TextQuoteSelector", exact: "brown fox" },
|
||||
];
|
||||
const r = resolveSelectors(selectors, representation);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(0.95);
|
||||
expect(r.candidates[0]?.textPosition).toEqual({ start: brownFoxStart, end: brownFoxEnd });
|
||||
expect(r.warnings?.[0]).toMatch(/did not match/);
|
||||
expect(r.usedSelectorTypes).toEqual(["TextQuoteSelector"]);
|
||||
});
|
||||
|
||||
it("returns 0.85 for a position-only selector with no quote to verify", () => {
|
||||
const selectors: Selector[] = [
|
||||
{ type: "TextPositionSelector", start: brownFoxStart, end: brownFoxEnd },
|
||||
];
|
||||
const r = resolveSelectors(selectors, representation);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(0.85);
|
||||
});
|
||||
|
||||
it("returns 0.95 when only TextQuoteSelector is present and the quote is unique", () => {
|
||||
const r = resolveSelectors(
|
||||
[{ type: "TextQuoteSelector", exact: "brown fox" }],
|
||||
representation,
|
||||
);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(0.95);
|
||||
});
|
||||
|
||||
it("returns 0.9 when a duplicated quote is disambiguated by prefix/suffix", () => {
|
||||
const dup = "alpha echo bravo charlie echo delta";
|
||||
const r = resolveSelectors(
|
||||
[{ type: "TextQuoteSelector", exact: "echo", prefix: "charlie ", suffix: " delta" }],
|
||||
repr(dup),
|
||||
);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(0.9);
|
||||
expect(r.candidates[0]?.textPosition?.start).toBe(dup.indexOf("echo", 10));
|
||||
});
|
||||
|
||||
it("returns ambiguous when a duplicated quote cannot be disambiguated", () => {
|
||||
const dup = "echo and echo";
|
||||
const r = resolveSelectors(
|
||||
[{ type: "TextQuoteSelector", exact: "echo" }],
|
||||
repr(dup),
|
||||
);
|
||||
expect(r.status).toBe("ambiguous");
|
||||
expect(r.confidence).toBe(0.5);
|
||||
});
|
||||
|
||||
it("falls back to PdfPageTextSelector via the OffsetMap", () => {
|
||||
// Single page, "brown fox" at offset 10..19.
|
||||
const r = resolveSelectors(
|
||||
[{ type: "PdfPageTextSelector", page: 1, start: brownFoxStart, end: brownFoxEnd }],
|
||||
representation,
|
||||
);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(0.8);
|
||||
expect(r.candidates[0]?.textPosition).toEqual({ start: brownFoxStart, end: brownFoxEnd });
|
||||
expect(r.candidates[0]?.page).toBe(1);
|
||||
});
|
||||
|
||||
it("falls back to PdfRectSelector with page+rects only at 0.7 confidence", () => {
|
||||
const r = resolveSelectors(
|
||||
[{
|
||||
type: "PdfRectSelector",
|
||||
page: 2,
|
||||
rects: [{ x: 0.1, y: 0.2, width: 0.3, height: 0.04 }],
|
||||
}],
|
||||
repr(text, 1),
|
||||
);
|
||||
expect(r.status).toBe("resolved");
|
||||
expect(r.confidence).toBe(0.7);
|
||||
expect(r.candidates[0]?.page).toBe(2);
|
||||
expect(r.candidates[0]?.textPosition).toBeUndefined();
|
||||
expect(r.candidates[0]?.rects).toHaveLength(1);
|
||||
});
|
||||
|
||||
it("returns unresolved when nothing matches", () => {
|
||||
const r = resolveSelectors(
|
||||
[{ type: "TextQuoteSelector", exact: "missing string" }],
|
||||
representation,
|
||||
);
|
||||
expect(r.status).toBe("unresolved");
|
||||
expect(r.confidence).toBe(0);
|
||||
expect(r.candidates).toEqual([]);
|
||||
});
|
||||
});
|
||||
260
src/anchor/selectors/resolve.ts
Normal file
260
src/anchor/selectors/resolve.ts
Normal file
@@ -0,0 +1,260 @@
|
||||
/**
|
||||
* Resolve a `Selector[]` against a `DocumentRepresentation`.
|
||||
*
|
||||
* Implements the resolution strategy from `wiki/ArchitectureOverview.md` §7,
|
||||
* MVP-trimmed:
|
||||
*
|
||||
* 1. Try `TextPositionSelector` (cheapest — direct slice).
|
||||
* 2. Verify with `TextQuoteSelector` at that position.
|
||||
* 3. Try `TextQuoteSelector` on its own. If multiple matches, disambiguate
|
||||
* by prefix/suffix.
|
||||
* 4. Try `PdfPageTextSelector` (page-local offsets through the OffsetMap).
|
||||
* 5. Fall back to `PdfRectSelector` for a page+rects-only target.
|
||||
* 6. Return `unresolved` if nothing above succeeds.
|
||||
*
|
||||
* Fuzzy matching is out of scope here; a later workplan owns it.
|
||||
*
|
||||
* Confidence ladder (0..1):
|
||||
* 1.00 — TextPosition + TextQuote agree exactly
|
||||
* 0.95 — TextQuote unique match (no position to cross-check)
|
||||
* 0.90 — TextQuote disambiguated by prefix/suffix
|
||||
* 0.85 — TextPosition only (no quote to cross-check)
|
||||
* 0.80 — PdfPageTextSelector resolved via OffsetMap
|
||||
* 0.70 — PdfRectSelector only (page+rects, no text verification)
|
||||
*/
|
||||
|
||||
import type { DocumentRepresentation } from "@shared/document";
|
||||
import type {
|
||||
PdfPageTextSelector,
|
||||
PdfRectSelector,
|
||||
Selector,
|
||||
SelectorType,
|
||||
TextPositionSelector,
|
||||
TextQuoteSelector,
|
||||
} from "@shared/selector";
|
||||
|
||||
import type { AnchorResolution, ResolvedAnchorTarget } from "../types";
|
||||
|
||||
export function resolveSelectors(
|
||||
selectors: readonly Selector[],
|
||||
representation: DocumentRepresentation,
|
||||
): AnchorResolution {
|
||||
const canonicalText = representation.canonicalText ?? "";
|
||||
const offsetMap = representation.offsetMap ?? [];
|
||||
const representationId = representation.id;
|
||||
|
||||
const byType = indexByType(selectors);
|
||||
const used: SelectorType[] = [];
|
||||
const warnings: string[] = [];
|
||||
|
||||
// 1 & 2. Try TextPositionSelector, verify with TextQuoteSelector.
|
||||
if (byType.TextPositionSelector && canonicalText.length > 0) {
|
||||
const pos = byType.TextPositionSelector;
|
||||
const slice = sliceSafely(canonicalText, pos.start, pos.end);
|
||||
if (slice !== null) {
|
||||
const quote = byType.TextQuoteSelector;
|
||||
if (quote) {
|
||||
if (slice === quote.exact) {
|
||||
used.push("TextPositionSelector", "TextQuoteSelector");
|
||||
return resolved(
|
||||
{ representationId, textPosition: { start: pos.start, end: pos.end }, ...pageFor(pos, offsetMap) },
|
||||
1.0,
|
||||
used,
|
||||
warnings,
|
||||
);
|
||||
}
|
||||
warnings.push(
|
||||
"TextPositionSelector slice did not match TextQuoteSelector.exact; falling back to quote search.",
|
||||
);
|
||||
} else {
|
||||
// Position with no quote to verify — accept at lower confidence.
|
||||
used.push("TextPositionSelector");
|
||||
return resolved(
|
||||
{ representationId, textPosition: { start: pos.start, end: pos.end }, ...pageFor(pos, offsetMap) },
|
||||
0.85,
|
||||
used,
|
||||
warnings,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 3. TextQuoteSelector on its own (or after the position fallback above).
|
||||
if (byType.TextQuoteSelector && canonicalText.length > 0) {
|
||||
const quoteResult = resolveByQuote(canonicalText, byType.TextQuoteSelector);
|
||||
if (quoteResult) {
|
||||
used.push("TextQuoteSelector");
|
||||
return resolved(
|
||||
{
|
||||
representationId,
|
||||
textPosition: { start: quoteResult.offset, end: quoteResult.offset + byType.TextQuoteSelector.exact.length },
|
||||
...pageFor({ start: quoteResult.offset, end: quoteResult.offset + byType.TextQuoteSelector.exact.length }, offsetMap),
|
||||
},
|
||||
quoteResult.confidence,
|
||||
used,
|
||||
warnings,
|
||||
quoteResult.status,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// 4. PdfPageTextSelector through OffsetMap.
|
||||
if (byType.PdfPageTextSelector && offsetMap.length > 0) {
|
||||
const pageText = byType.PdfPageTextSelector;
|
||||
const range = offsetMap.find((r) => r.page === pageText.page);
|
||||
if (range && pageText.start >= 0 && pageText.end <= range.pageLength && pageText.start < pageText.end) {
|
||||
const globalStart = range.globalStart + pageText.start;
|
||||
const globalEnd = range.globalStart + pageText.end;
|
||||
used.push("PdfPageTextSelector");
|
||||
return resolved(
|
||||
{
|
||||
representationId,
|
||||
page: pageText.page,
|
||||
textPosition: { start: globalStart, end: globalEnd },
|
||||
},
|
||||
0.8,
|
||||
used,
|
||||
warnings,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// 5. PdfRectSelector fallback (no text verification possible).
|
||||
if (byType.PdfRectSelector) {
|
||||
const rect = byType.PdfRectSelector;
|
||||
used.push("PdfRectSelector");
|
||||
return resolved(
|
||||
{ representationId, page: rect.page, rects: rect.rects },
|
||||
0.7,
|
||||
used,
|
||||
warnings,
|
||||
);
|
||||
}
|
||||
|
||||
return unresolved(warnings);
|
||||
}
|
||||
|
||||
interface QuoteResolutionResult {
|
||||
readonly offset: number;
|
||||
readonly confidence: number;
|
||||
readonly status: "resolved" | "ambiguous";
|
||||
}
|
||||
|
||||
function resolveByQuote(canonicalText: string, quote: TextQuoteSelector): QuoteResolutionResult | null {
|
||||
const positions = findAllOccurrences(canonicalText, quote.exact);
|
||||
if (positions.length === 0) return null;
|
||||
if (positions.length === 1) {
|
||||
return { offset: positions[0]!, confidence: 0.95, status: "resolved" };
|
||||
}
|
||||
// Multiple matches — try to disambiguate by prefix/suffix.
|
||||
const filtered = positions.filter((p) => prefixSuffixMatches(canonicalText, p, quote));
|
||||
if (filtered.length === 1) {
|
||||
return { offset: filtered[0]!, confidence: 0.9, status: "resolved" };
|
||||
}
|
||||
if (filtered.length > 1) {
|
||||
return { offset: filtered[0]!, confidence: 0.5, status: "ambiguous" };
|
||||
}
|
||||
// No prefix/suffix info or no matches with context — return ambiguous on first.
|
||||
return { offset: positions[0]!, confidence: 0.5, status: "ambiguous" };
|
||||
}
|
||||
|
||||
function prefixSuffixMatches(
|
||||
canonicalText: string,
|
||||
offset: number,
|
||||
quote: TextQuoteSelector,
|
||||
): boolean {
|
||||
if (quote.prefix !== undefined) {
|
||||
const prefixEnd = offset;
|
||||
const prefixStart = Math.max(0, prefixEnd - quote.prefix.length);
|
||||
const actualPrefix = canonicalText.slice(prefixStart, prefixEnd);
|
||||
if (!actualPrefix.endsWith(quote.prefix)) return false;
|
||||
}
|
||||
if (quote.suffix !== undefined) {
|
||||
const suffixStart = offset + quote.exact.length;
|
||||
const suffixEnd = Math.min(canonicalText.length, suffixStart + quote.suffix.length);
|
||||
const actualSuffix = canonicalText.slice(suffixStart, suffixEnd);
|
||||
if (!actualSuffix.startsWith(quote.suffix)) return false;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
interface SelectorIndex {
|
||||
TextQuoteSelector?: TextQuoteSelector;
|
||||
TextPositionSelector?: TextPositionSelector;
|
||||
PdfRectSelector?: PdfRectSelector;
|
||||
PdfPageTextSelector?: PdfPageTextSelector;
|
||||
}
|
||||
|
||||
function indexByType(selectors: readonly Selector[]): SelectorIndex {
|
||||
const idx: SelectorIndex = {};
|
||||
for (const s of selectors) {
|
||||
switch (s.type) {
|
||||
case "TextQuoteSelector":
|
||||
idx.TextQuoteSelector = s;
|
||||
break;
|
||||
case "TextPositionSelector":
|
||||
idx.TextPositionSelector = s;
|
||||
break;
|
||||
case "PdfRectSelector":
|
||||
idx.PdfRectSelector = s;
|
||||
break;
|
||||
case "PdfPageTextSelector":
|
||||
idx.PdfPageTextSelector = s;
|
||||
break;
|
||||
}
|
||||
}
|
||||
return idx;
|
||||
}
|
||||
|
||||
function sliceSafely(text: string, start: number, end: number): string | null {
|
||||
if (start < 0 || end > text.length || start >= end) return null;
|
||||
return text.slice(start, end);
|
||||
}
|
||||
|
||||
function pageFor(
|
||||
span: { start: number; end: number },
|
||||
offsetMap: readonly { page: number; globalStart: number; globalEnd: number }[],
|
||||
): { page?: number } {
|
||||
if (offsetMap.length === 0) return {};
|
||||
const range = offsetMap.find((r) => span.start >= r.globalStart && span.end <= r.globalEnd);
|
||||
return range ? { page: range.page } : {};
|
||||
}
|
||||
|
||||
function findAllOccurrences(haystack: string, needle: string): number[] {
|
||||
if (needle.length === 0) return [];
|
||||
const out: number[] = [];
|
||||
let from = 0;
|
||||
for (;;) {
|
||||
const idx = haystack.indexOf(needle, from);
|
||||
if (idx === -1) break;
|
||||
out.push(idx);
|
||||
from = idx + 1;
|
||||
}
|
||||
return out;
|
||||
}
|
||||
|
||||
function resolved(
|
||||
target: ResolvedAnchorTarget,
|
||||
confidence: number,
|
||||
used: readonly SelectorType[],
|
||||
warnings: readonly string[],
|
||||
status: "resolved" | "ambiguous" = "resolved",
|
||||
): AnchorResolution {
|
||||
return {
|
||||
status,
|
||||
confidence,
|
||||
candidates: [target],
|
||||
usedSelectorTypes: used,
|
||||
...(warnings.length > 0 ? { warnings } : {}),
|
||||
};
|
||||
}
|
||||
|
||||
function unresolved(warnings: readonly string[]): AnchorResolution {
|
||||
return {
|
||||
status: "unresolved",
|
||||
confidence: 0,
|
||||
candidates: [],
|
||||
usedSelectorTypes: [],
|
||||
...(warnings.length > 0 ? { warnings } : {}),
|
||||
};
|
||||
}
|
||||
40
src/app/App.tsx
Normal file
40
src/app/App.tsx
Normal file
@@ -0,0 +1,40 @@
|
||||
/**
|
||||
* App — the citation-evidence MVP shell.
|
||||
*
|
||||
* Three-pane layout per `wiki/ArchitectureOverview.md` §12.1:
|
||||
*
|
||||
* ┌────────────┬──────────────────┬────────────┐
|
||||
* │ Collection │ Document Viewer │ Evidence │
|
||||
* │ List │ │ Sidebar │
|
||||
* └────────────┴──────────────────┴────────────┘
|
||||
*
|
||||
* CE-WP-0002-T06 stops at "viewer shell is rendered, evidence list is
|
||||
* displayed". T07 wires the selection → annotation → evidence flow; T08
|
||||
* wires the sidebar-click → scroll-to-passage round-trip.
|
||||
*/
|
||||
|
||||
import {
|
||||
CollectionList,
|
||||
EngineProvider,
|
||||
EvidenceSidebar,
|
||||
ViewerShell,
|
||||
} from "@work/index";
|
||||
|
||||
export function App() {
|
||||
return (
|
||||
<EngineProvider>
|
||||
<div
|
||||
style={{
|
||||
display: "flex",
|
||||
height: "100vh",
|
||||
fontFamily: "system-ui, sans-serif",
|
||||
color: "#222",
|
||||
}}
|
||||
>
|
||||
<CollectionList />
|
||||
<ViewerShell />
|
||||
<EvidenceSidebar />
|
||||
</div>
|
||||
</EngineProvider>
|
||||
);
|
||||
}
|
||||
@@ -1,233 +0,0 @@
|
||||
/**
|
||||
* CE-WP-0002-T02 spike host page.
|
||||
*
|
||||
* Lists the fixtures from `fixtures/pdfs/manifest.json`, lets the user load
|
||||
* one in the spike PDF viewer, capture a selection (the viewer's
|
||||
* `onSelection` fires when text is selected), persist the resulting
|
||||
* selectors to `localStorage`, and on reload restore + scroll to them.
|
||||
*
|
||||
* Success looks like: select a quote → click "save" → reload the tab →
|
||||
* the highlight is rendered on the same passage and the page is scrolled
|
||||
* to it.
|
||||
*/
|
||||
|
||||
import { useEffect, useMemo, useState } from "react";
|
||||
import {
|
||||
PdfSpikeViewer,
|
||||
type PdfSelectionCapture,
|
||||
type StoredAnnotation,
|
||||
} from "@anchor/index";
|
||||
import type { Selector } from "@shared/selector";
|
||||
import { newId } from "@shared/ids";
|
||||
import manifest from "../../fixtures/pdfs/manifest.json";
|
||||
|
||||
interface FixtureEntry {
|
||||
id: string;
|
||||
filename: string;
|
||||
description: string;
|
||||
page_count: number;
|
||||
known_good_quote: string;
|
||||
known_good_quote_page: number;
|
||||
}
|
||||
|
||||
const FIXTURES: FixtureEntry[] = (manifest as { fixtures: FixtureEntry[] }).fixtures;
|
||||
|
||||
const STORAGE_KEY = "ce-wp-0002-spike-annotations-v1";
|
||||
|
||||
interface StoredEntry {
|
||||
id: string;
|
||||
fixtureId: string;
|
||||
text: string;
|
||||
selectors: Selector[];
|
||||
createdAt: string;
|
||||
}
|
||||
|
||||
function loadStore(): StoredEntry[] {
|
||||
try {
|
||||
const raw = localStorage.getItem(STORAGE_KEY);
|
||||
if (!raw) return [];
|
||||
const parsed = JSON.parse(raw) as unknown;
|
||||
if (!Array.isArray(parsed)) return [];
|
||||
return parsed as StoredEntry[];
|
||||
} catch {
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
function saveStore(entries: StoredEntry[]) {
|
||||
localStorage.setItem(STORAGE_KEY, JSON.stringify(entries));
|
||||
}
|
||||
|
||||
export function SpikeApp() {
|
||||
const [activeFixtureId, setActiveFixtureId] = useState<string | null>(null);
|
||||
const [entries, setEntries] = useState<StoredEntry[]>(() => loadStore());
|
||||
const [pending, setPending] = useState<
|
||||
| { capture: PdfSelectionCapture; selectors: Selector[] }
|
||||
| null
|
||||
>(null);
|
||||
const [scrollTo, setScrollTo] = useState<string | null>(null);
|
||||
|
||||
useEffect(() => {
|
||||
saveStore(entries);
|
||||
}, [entries]);
|
||||
|
||||
const activeFixture = useMemo(
|
||||
() => FIXTURES.find((f) => f.id === activeFixtureId) ?? null,
|
||||
[activeFixtureId],
|
||||
);
|
||||
|
||||
const annotationsForActive = useMemo<StoredAnnotation[]>(() => {
|
||||
if (!activeFixtureId) return [];
|
||||
return entries
|
||||
.filter((e) => e.fixtureId === activeFixtureId)
|
||||
.map((e) => ({ id: e.id, text: e.text, selectors: e.selectors }));
|
||||
}, [activeFixtureId, entries]);
|
||||
|
||||
function handleSave() {
|
||||
if (!pending || !activeFixtureId) return;
|
||||
const entry: StoredEntry = {
|
||||
id: newId("annotation"),
|
||||
fixtureId: activeFixtureId,
|
||||
text: pending.capture.text,
|
||||
selectors: pending.selectors,
|
||||
createdAt: new Date().toISOString(),
|
||||
};
|
||||
setEntries((prev) => [...prev, entry]);
|
||||
setPending(null);
|
||||
}
|
||||
|
||||
function handleClear() {
|
||||
if (!activeFixtureId) return;
|
||||
setEntries((prev) => prev.filter((e) => e.fixtureId !== activeFixtureId));
|
||||
}
|
||||
|
||||
return (
|
||||
<div style={{ display: "flex", height: "100vh", fontFamily: "system-ui, sans-serif" }}>
|
||||
<aside
|
||||
style={{
|
||||
width: 320,
|
||||
borderRight: "1px solid #ddd",
|
||||
padding: 12,
|
||||
overflow: "auto",
|
||||
flex: "0 0 320px",
|
||||
}}
|
||||
>
|
||||
<h2 style={{ marginTop: 0 }}>CE-WP-0002-T02 Spike</h2>
|
||||
<p style={{ fontSize: 12, color: "#555" }}>
|
||||
Pick a fixture, select text in the viewer, save, then reload the page
|
||||
to verify the highlight is restored.
|
||||
</p>
|
||||
<h3 style={{ fontSize: 14 }}>Fixtures</h3>
|
||||
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
|
||||
{FIXTURES.map((f) => (
|
||||
<li key={f.id} style={{ marginBottom: 6 }}>
|
||||
<button
|
||||
onClick={() => {
|
||||
setActiveFixtureId(f.id);
|
||||
setPending(null);
|
||||
setScrollTo(null);
|
||||
}}
|
||||
style={{
|
||||
display: "block",
|
||||
width: "100%",
|
||||
textAlign: "left",
|
||||
background: f.id === activeFixtureId ? "#e8f0ff" : "white",
|
||||
border: "1px solid #ccc",
|
||||
padding: 6,
|
||||
cursor: "pointer",
|
||||
}}
|
||||
>
|
||||
<div style={{ fontWeight: 600, fontSize: 13 }}>{f.id}</div>
|
||||
<div style={{ fontSize: 11, color: "#666" }}>
|
||||
{f.page_count} page{f.page_count === 1 ? "" : "s"} ·
|
||||
known-good p{f.known_good_quote_page}
|
||||
</div>
|
||||
<div style={{ fontSize: 11, color: "#888", marginTop: 2 }}>
|
||||
“{f.known_good_quote}”
|
||||
</div>
|
||||
</button>
|
||||
</li>
|
||||
))}
|
||||
</ul>
|
||||
|
||||
{activeFixture && (
|
||||
<>
|
||||
<h3 style={{ fontSize: 14, marginTop: 16 }}>Saved annotations</h3>
|
||||
{annotationsForActive.length === 0 && (
|
||||
<p style={{ fontSize: 12, color: "#888" }}>(none)</p>
|
||||
)}
|
||||
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
|
||||
{annotationsForActive.map((a) => (
|
||||
<li key={a.id} style={{ marginBottom: 4 }}>
|
||||
<button
|
||||
onClick={() => setScrollTo(a.id)}
|
||||
style={{
|
||||
display: "block",
|
||||
width: "100%",
|
||||
textAlign: "left",
|
||||
background: "#fff8d6",
|
||||
border: "1px solid #ccc",
|
||||
padding: 4,
|
||||
cursor: "pointer",
|
||||
fontSize: 11,
|
||||
}}
|
||||
>
|
||||
{a.text.slice(0, 80)}
|
||||
{a.text.length > 80 ? "…" : ""}
|
||||
</button>
|
||||
</li>
|
||||
))}
|
||||
</ul>
|
||||
{annotationsForActive.length > 0 && (
|
||||
<button
|
||||
onClick={handleClear}
|
||||
style={{ marginTop: 8, fontSize: 11 }}
|
||||
>
|
||||
Clear all for this fixture
|
||||
</button>
|
||||
)}
|
||||
</>
|
||||
)}
|
||||
|
||||
{pending && (
|
||||
<div
|
||||
style={{
|
||||
marginTop: 16,
|
||||
padding: 8,
|
||||
border: "1px solid #f0c040",
|
||||
background: "#fff8d6",
|
||||
}}
|
||||
>
|
||||
<div style={{ fontSize: 12 }}>
|
||||
Pending selection ({pending.selectors.length} selector
|
||||
{pending.selectors.length === 1 ? "" : "s"}):
|
||||
</div>
|
||||
<div style={{ fontSize: 11, color: "#666", margin: "4px 0" }}>
|
||||
“{pending.capture.text.slice(0, 120)}”
|
||||
</div>
|
||||
<button onClick={handleSave}>Save</button>{" "}
|
||||
<button onClick={() => setPending(null)}>Discard</button>
|
||||
</div>
|
||||
)}
|
||||
</aside>
|
||||
|
||||
<main style={{ flex: 1, overflow: "hidden", position: "relative" }}>
|
||||
{activeFixture ? (
|
||||
<PdfSpikeViewer
|
||||
key={activeFixture.id}
|
||||
pdfUrl={`/fixtures/pdfs/${encodeURIComponent(activeFixture.filename)}`}
|
||||
storedAnnotations={annotationsForActive}
|
||||
{...(scrollTo ? { scrollToAnnotationId: scrollTo } : {})}
|
||||
onSelectionCaptured={(capture, selectors) =>
|
||||
setPending({ capture, selectors })
|
||||
}
|
||||
/>
|
||||
) : (
|
||||
<div style={{ padding: 24, color: "#666" }}>
|
||||
Pick a fixture on the left to begin.
|
||||
</div>
|
||||
)}
|
||||
</main>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
@@ -1 +1 @@
|
||||
export { SpikeApp } from "./SpikeApp";
|
||||
export { App } from "./App";
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
import { StrictMode } from "react";
|
||||
import { createRoot } from "react-dom/client";
|
||||
import { SpikeApp } from "./SpikeApp";
|
||||
import { App } from "./App";
|
||||
|
||||
const container = document.getElementById("root");
|
||||
if (!container) throw new Error("#root not found");
|
||||
|
||||
createRoot(container).render(
|
||||
<StrictMode>
|
||||
<SpikeApp />
|
||||
<App />
|
||||
</StrictMode>,
|
||||
);
|
||||
|
||||
168
src/engine/engine.test.ts
Normal file
168
src/engine/engine.test.ts
Normal file
@@ -0,0 +1,168 @@
|
||||
import { beforeEach, describe, expect, it } from "vitest";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { DocumentId, RepresentationId } from "@shared/ids";
|
||||
import type { Selector } from "@shared/selector";
|
||||
import { createEngine, type Engine, type EngineEvent } from "./index";
|
||||
|
||||
function fakeDocAndRep(): { document: Document; representation: DocumentRepresentation } {
|
||||
const docId = "doc_fake" as DocumentId;
|
||||
const repId = "rep_fake" as RepresentationId;
|
||||
return {
|
||||
document: {
|
||||
id: docId,
|
||||
mediaType: "application/pdf",
|
||||
createdAt: "2026-05-25T00:00:00.000Z",
|
||||
updatedAt: "2026-05-25T00:00:00.000Z",
|
||||
},
|
||||
representation: {
|
||||
id: repId,
|
||||
documentId: docId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: "h",
|
||||
canonicalText: "The quick brown fox.",
|
||||
pageMap: [{ page: 1, width: 100, height: 100 }],
|
||||
offsetMap: [{ page: 1, globalStart: 0, globalEnd: 20, pageLength: 20 }],
|
||||
generatedAt: "2026-05-25T00:00:00.000Z",
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
describe("Engine integration", () => {
|
||||
let engine: Engine;
|
||||
let events: EngineEvent[];
|
||||
|
||||
beforeEach(() => {
|
||||
engine = createEngine();
|
||||
events = [];
|
||||
engine.bus.onAny((e) => events.push(e));
|
||||
});
|
||||
|
||||
it("documentService.register stores both and emits DocumentImported + DocumentRepresentationGenerated", () => {
|
||||
const { document, representation } = fakeDocAndRep();
|
||||
const result = engine.documents.register({ document, representation });
|
||||
expect(result.document).toBe(document);
|
||||
expect(result.representation).toBe(representation);
|
||||
expect(engine.documents.get(document.id)).toBe(document);
|
||||
expect(engine.documents.getRepresentation(representation.id)).toBe(representation);
|
||||
expect(events.map((e) => e.type)).toEqual(["DocumentImported", "DocumentRepresentationGenerated"]);
|
||||
});
|
||||
|
||||
it("annotationService.create stamps an ID + normalize version + timestamps, then emits AnnotationCreated", () => {
|
||||
const { document, representation } = fakeDocAndRep();
|
||||
engine.documents.register({ document, representation });
|
||||
const selectors: Selector[] = [{ type: "TextQuoteSelector", exact: "brown fox" }];
|
||||
const ann = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors,
|
||||
quote: "brown fox",
|
||||
note: "a quick mark",
|
||||
});
|
||||
expect(ann.id).toMatch(/^ann_/);
|
||||
expect(ann.normalizeVersion).toBeGreaterThan(0);
|
||||
expect(ann.createdAt).toBe(ann.updatedAt);
|
||||
expect(engine.annotations.get(ann.id)).toBe(ann);
|
||||
const created = events.find((e) => e.type === "AnnotationCreated");
|
||||
expect(created?.type).toBe("AnnotationCreated");
|
||||
});
|
||||
|
||||
it("setResolutionStatus emits AnnotationResolved for resolved/ambiguous and AnnotationResolutionFailed for unresolved/stale", () => {
|
||||
const { document, representation } = fakeDocAndRep();
|
||||
engine.documents.register({ document, representation });
|
||||
const ann = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "x" }],
|
||||
});
|
||||
events.length = 0;
|
||||
engine.annotations.setResolutionStatus(ann.id, "resolved", { confidence: 0.95 });
|
||||
expect(events.map((e) => e.type)).toEqual(["AnnotationResolved"]);
|
||||
engine.annotations.setResolutionStatus(ann.id, "unresolved", { confidence: 0, reason: "no quote match" });
|
||||
expect(events.map((e) => e.type)).toEqual(["AnnotationResolved", "AnnotationResolutionFailed"]);
|
||||
});
|
||||
|
||||
it("evidenceService.create requires at least one annotation and emits EvidenceItemCreated", () => {
|
||||
const { document, representation } = fakeDocAndRep();
|
||||
engine.documents.register({ document, representation });
|
||||
const ann = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
|
||||
});
|
||||
expect(() => engine.evidence.create({ annotationIds: [] })).toThrow();
|
||||
const item = engine.evidence.create({
|
||||
annotationIds: [ann.id],
|
||||
commentary: "good quote",
|
||||
});
|
||||
expect(item.status).toBe("candidate");
|
||||
expect(item.annotationIds).toEqual([ann.id]);
|
||||
expect(events.find((e) => e.type === "EvidenceItemCreated")).toBeDefined();
|
||||
});
|
||||
|
||||
it("setStatus emits EvidenceItemUpdated only on real change and carries previousStatus", () => {
|
||||
const { document, representation } = fakeDocAndRep();
|
||||
engine.documents.register({ document, representation });
|
||||
const ann = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
|
||||
});
|
||||
const item = engine.evidence.create({ annotationIds: [ann.id] });
|
||||
events.length = 0;
|
||||
const same = engine.evidence.setStatus(item.id, "candidate");
|
||||
expect(same).toBe(item);
|
||||
expect(events).toEqual([]);
|
||||
engine.evidence.setStatus(item.id, "confirmed");
|
||||
const updated = events.find((e) => e.type === "EvidenceItemUpdated");
|
||||
expect(updated).toBeDefined();
|
||||
if (updated?.type === "EvidenceItemUpdated") {
|
||||
expect(updated.previousStatus).toBe("candidate");
|
||||
}
|
||||
});
|
||||
|
||||
it("listByDocument scopes evidence items to a single document via annotation lookup", () => {
|
||||
const a = fakeDocAndRep();
|
||||
engine.documents.register(a);
|
||||
const annA = engine.annotations.create({
|
||||
documentId: a.document.id,
|
||||
representationId: a.representation.id,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
|
||||
});
|
||||
engine.evidence.create({ annotationIds: [annA.id], commentary: "a" });
|
||||
|
||||
// Second, distinct document.
|
||||
const otherDocId = "doc_other" as DocumentId;
|
||||
const otherRepId = "rep_other" as RepresentationId;
|
||||
engine.documents.register({
|
||||
document: { ...a.document, id: otherDocId },
|
||||
representation: { ...a.representation, id: otherRepId, documentId: otherDocId },
|
||||
});
|
||||
const annB = engine.annotations.create({
|
||||
documentId: otherDocId,
|
||||
representationId: otherRepId,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "z" }],
|
||||
});
|
||||
engine.evidence.create({ annotationIds: [annB.id], commentary: "b" });
|
||||
|
||||
expect(engine.evidence.listByDocument(a.document.id)).toHaveLength(1);
|
||||
expect(engine.evidence.listByDocument(otherDocId)).toHaveLength(1);
|
||||
});
|
||||
|
||||
it("activate emits EvidenceItemActivated without mutating the item", () => {
|
||||
const { document, representation } = fakeDocAndRep();
|
||||
engine.documents.register({ document, representation });
|
||||
const ann = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "x" }],
|
||||
});
|
||||
const item = engine.evidence.create({ annotationIds: [ann.id] });
|
||||
events.length = 0;
|
||||
engine.evidence.activate(item.id, "sidebar");
|
||||
const activated = events.find((e) => e.type === "EvidenceItemActivated");
|
||||
expect(activated).toBeDefined();
|
||||
if (activated?.type === "EvidenceItemActivated") {
|
||||
expect(activated.source).toBe("sidebar");
|
||||
}
|
||||
});
|
||||
});
|
||||
64
src/engine/events/bus.test.ts
Normal file
64
src/engine/events/bus.test.ts
Normal file
@@ -0,0 +1,64 @@
|
||||
import { describe, expect, it, vi } from "vitest";
|
||||
import type { DocumentId } from "@shared/ids";
|
||||
import { createEventBus } from "./bus";
|
||||
|
||||
const docId = "doc_test" as DocumentId;
|
||||
const minimalDoc = {
|
||||
id: docId,
|
||||
mediaType: "application/pdf",
|
||||
createdAt: "2026-05-25T00:00:00.000Z",
|
||||
updatedAt: "2026-05-25T00:00:00.000Z",
|
||||
};
|
||||
|
||||
describe("EventBus", () => {
|
||||
it("delivers typed events to the registered listener", () => {
|
||||
const bus = createEventBus();
|
||||
const spy = vi.fn();
|
||||
bus.on("DocumentImported", spy);
|
||||
const result = bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
|
||||
expect(spy).toHaveBeenCalledOnce();
|
||||
expect(spy.mock.calls[0]![0]).toMatchObject({ type: "DocumentImported", documentId: docId });
|
||||
expect(result.listenerCount).toBe(1);
|
||||
expect(result.errors).toEqual([]);
|
||||
});
|
||||
|
||||
it("does not deliver an event to listeners of a different type", () => {
|
||||
const bus = createEventBus();
|
||||
const spy = vi.fn();
|
||||
bus.on("AnnotationCreated", spy);
|
||||
bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
|
||||
expect(spy).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it("delivers every event to onAny listeners", () => {
|
||||
const bus = createEventBus();
|
||||
const spy = vi.fn();
|
||||
bus.onAny(spy);
|
||||
bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
|
||||
bus.emit({ type: "EvidenceItemActivated", evidenceItemId: "ev_x" as never });
|
||||
expect(spy).toHaveBeenCalledTimes(2);
|
||||
});
|
||||
|
||||
it("returns an unsubscribe function from on()", () => {
|
||||
const bus = createEventBus();
|
||||
const spy = vi.fn();
|
||||
const off = bus.on("DocumentImported", spy);
|
||||
off();
|
||||
bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
|
||||
expect(spy).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it("captures listener errors and still calls subsequent listeners", () => {
|
||||
const bus = createEventBus();
|
||||
const boom = new Error("listener exploded");
|
||||
const a = vi.fn(() => { throw boom; });
|
||||
const b = vi.fn();
|
||||
bus.on("DocumentImported", a);
|
||||
bus.on("DocumentImported", b);
|
||||
const result = bus.emit({ type: "DocumentImported", documentId: docId, document: minimalDoc });
|
||||
expect(a).toHaveBeenCalledOnce();
|
||||
expect(b).toHaveBeenCalledOnce();
|
||||
expect(result.errors).toEqual([boom]);
|
||||
expect(result.listenerCount).toBe(2);
|
||||
});
|
||||
});
|
||||
79
src/engine/events/bus.ts
Normal file
79
src/engine/events/bus.ts
Normal file
@@ -0,0 +1,79 @@
|
||||
/**
|
||||
* Synchronous in-process event bus.
|
||||
*
|
||||
* Listeners fire in registration order on the calling stack; `emit` returns
|
||||
* after every listener has run. A listener throwing does not stop later
|
||||
* listeners — its error surfaces through the returned `errors` array so
|
||||
* callers can decide whether to log, rethrow, or ignore.
|
||||
*
|
||||
* MVP-sufficient. ADR-0005 (persistence) will decide whether to upgrade to
|
||||
* an async/queued bus when storage becomes durable.
|
||||
*/
|
||||
|
||||
import type { EngineEvent, EngineEventOf, EngineEventType } from "./types";
|
||||
|
||||
export type EngineEventListener<T extends EngineEventType = EngineEventType> = (
|
||||
event: EngineEventOf<T>,
|
||||
) => void;
|
||||
|
||||
export type AnyEngineEventListener = (event: EngineEvent) => void;
|
||||
|
||||
export interface EmitResult {
|
||||
readonly listenerCount: number;
|
||||
readonly errors: readonly unknown[];
|
||||
}
|
||||
|
||||
export interface EventBus {
|
||||
on<T extends EngineEventType>(type: T, listener: EngineEventListener<T>): () => void;
|
||||
onAny(listener: AnyEngineEventListener): () => void;
|
||||
emit<T extends EngineEventType>(event: EngineEventOf<T>): EmitResult;
|
||||
}
|
||||
|
||||
export function createEventBus(): EventBus {
|
||||
const typedListeners = new Map<EngineEventType, Set<EngineEventListener>>();
|
||||
const anyListeners = new Set<AnyEngineEventListener>();
|
||||
|
||||
return {
|
||||
on(type, listener) {
|
||||
let set = typedListeners.get(type);
|
||||
if (!set) {
|
||||
set = new Set();
|
||||
typedListeners.set(type, set);
|
||||
}
|
||||
set.add(listener as unknown as EngineEventListener);
|
||||
return () => {
|
||||
set!.delete(listener as unknown as EngineEventListener);
|
||||
};
|
||||
},
|
||||
onAny(listener) {
|
||||
anyListeners.add(listener);
|
||||
return () => {
|
||||
anyListeners.delete(listener);
|
||||
};
|
||||
},
|
||||
emit(event) {
|
||||
const errors: unknown[] = [];
|
||||
let count = 0;
|
||||
const typedSet = typedListeners.get(event.type);
|
||||
if (typedSet) {
|
||||
for (const l of typedSet) {
|
||||
count++;
|
||||
try {
|
||||
(l as AnyEngineEventListener)(event);
|
||||
} catch (err) {
|
||||
errors.push(err);
|
||||
}
|
||||
}
|
||||
}
|
||||
for (const l of anyListeners) {
|
||||
count++;
|
||||
try {
|
||||
l(event);
|
||||
} catch (err) {
|
||||
errors.push(err);
|
||||
}
|
||||
}
|
||||
return { listenerCount: count, errors };
|
||||
},
|
||||
};
|
||||
}
|
||||
8
src/engine/events/index.ts
Normal file
8
src/engine/events/index.ts
Normal file
@@ -0,0 +1,8 @@
|
||||
export * from "./types";
|
||||
export {
|
||||
createEventBus,
|
||||
type EventBus,
|
||||
type EngineEventListener,
|
||||
type AnyEngineEventListener,
|
||||
type EmitResult,
|
||||
} from "./bus";
|
||||
84
src/engine/events/types.ts
Normal file
84
src/engine/events/types.ts
Normal file
@@ -0,0 +1,84 @@
|
||||
/**
|
||||
* Engine event vocabulary.
|
||||
*
|
||||
* Implements `wiki/SharedContracts.md` §4 (closed event list). Each event
|
||||
* carries the *minimum* identifying payload needed by downstream listeners;
|
||||
* services hand back the full domain object to the caller separately.
|
||||
*
|
||||
* Adding an event requires updating SharedContracts.md first.
|
||||
*/
|
||||
|
||||
import type { Annotation, AnnotationResolutionStatus } from "@shared/annotation";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { EvidenceItem, EvidenceItemStatus } from "@shared/evidence";
|
||||
import type {
|
||||
AnnotationId,
|
||||
DocumentId,
|
||||
EvidenceItemId,
|
||||
RepresentationId,
|
||||
} from "@shared/ids";
|
||||
|
||||
export interface DocumentImportedEvent {
|
||||
readonly type: "DocumentImported";
|
||||
readonly documentId: DocumentId;
|
||||
readonly document: Document;
|
||||
}
|
||||
|
||||
export interface DocumentRepresentationGeneratedEvent {
|
||||
readonly type: "DocumentRepresentationGenerated";
|
||||
readonly documentId: DocumentId;
|
||||
readonly representationId: RepresentationId;
|
||||
readonly representation: DocumentRepresentation;
|
||||
}
|
||||
|
||||
export interface AnnotationCreatedEvent {
|
||||
readonly type: "AnnotationCreated";
|
||||
readonly annotationId: AnnotationId;
|
||||
readonly annotation: Annotation;
|
||||
}
|
||||
|
||||
export interface AnnotationResolvedEvent {
|
||||
readonly type: "AnnotationResolved";
|
||||
readonly annotationId: AnnotationId;
|
||||
readonly status: AnnotationResolutionStatus;
|
||||
readonly confidence: number;
|
||||
}
|
||||
|
||||
export interface AnnotationResolutionFailedEvent {
|
||||
readonly type: "AnnotationResolutionFailed";
|
||||
readonly annotationId: AnnotationId;
|
||||
readonly reason: string;
|
||||
}
|
||||
|
||||
export interface EvidenceItemCreatedEvent {
|
||||
readonly type: "EvidenceItemCreated";
|
||||
readonly evidenceItemId: EvidenceItemId;
|
||||
readonly evidenceItem: EvidenceItem;
|
||||
}
|
||||
|
||||
export interface EvidenceItemUpdatedEvent {
|
||||
readonly type: "EvidenceItemUpdated";
|
||||
readonly evidenceItemId: EvidenceItemId;
|
||||
readonly evidenceItem: EvidenceItem;
|
||||
readonly previousStatus: EvidenceItemStatus;
|
||||
}
|
||||
|
||||
export interface EvidenceItemActivatedEvent {
|
||||
readonly type: "EvidenceItemActivated";
|
||||
readonly evidenceItemId: EvidenceItemId;
|
||||
readonly source?: "sidebar" | "form-field" | "citation-card";
|
||||
}
|
||||
|
||||
export type EngineEvent =
|
||||
| DocumentImportedEvent
|
||||
| DocumentRepresentationGeneratedEvent
|
||||
| AnnotationCreatedEvent
|
||||
| AnnotationResolvedEvent
|
||||
| AnnotationResolutionFailedEvent
|
||||
| EvidenceItemCreatedEvent
|
||||
| EvidenceItemUpdatedEvent
|
||||
| EvidenceItemActivatedEvent;
|
||||
|
||||
export type EngineEventType = EngineEvent["type"];
|
||||
|
||||
export type EngineEventOf<T extends EngineEventType> = Extract<EngineEvent, { type: T }>;
|
||||
@@ -1 +1,60 @@
|
||||
export {};
|
||||
/**
|
||||
* Engine composition root.
|
||||
*
|
||||
* `createEngine()` wires in-memory repos to the services and shares a single
|
||||
* event bus. The app layer holds the returned `Engine` instance and passes
|
||||
* its services into the UI.
|
||||
*
|
||||
* Swapping the repository implementation later (ADR-0005) is a matter of
|
||||
* replacing `createInMemoryRepos()` here. The service signatures don't
|
||||
* change.
|
||||
*/
|
||||
|
||||
import { createEventBus, type EventBus } from "./events";
|
||||
import {
|
||||
createInMemoryRepos,
|
||||
type InMemoryRepos,
|
||||
} from "./repos";
|
||||
import {
|
||||
createAnnotationService,
|
||||
createDocumentService,
|
||||
createEvidenceService,
|
||||
type AnnotationService,
|
||||
type DocumentService,
|
||||
type EvidenceService,
|
||||
} from "./services";
|
||||
|
||||
export * from "./events";
|
||||
export * from "./repos";
|
||||
export * from "./services";
|
||||
export {
|
||||
SNAPSHOT_VERSION,
|
||||
attachPersister,
|
||||
captureSnapshot,
|
||||
documentIdsIn,
|
||||
restoreFromStorage,
|
||||
restoreSnapshot,
|
||||
type EngineSnapshot,
|
||||
type PersisterOptions,
|
||||
} from "./persistence";
|
||||
|
||||
export interface Engine {
|
||||
readonly bus: EventBus;
|
||||
readonly repos: InMemoryRepos;
|
||||
readonly documents: DocumentService;
|
||||
readonly annotations: AnnotationService;
|
||||
readonly evidence: EvidenceService;
|
||||
}
|
||||
|
||||
export function createEngine(): Engine {
|
||||
const bus = createEventBus();
|
||||
const repos = createInMemoryRepos();
|
||||
const documents = createDocumentService(repos.documents, repos.representations, bus);
|
||||
const annotations = createAnnotationService(repos.annotations, bus);
|
||||
const evidence = createEvidenceService(
|
||||
repos.evidenceItems,
|
||||
(id) => repos.annotations.get(id),
|
||||
bus,
|
||||
);
|
||||
return { bus, repos, documents, annotations, evidence };
|
||||
}
|
||||
|
||||
183
src/engine/persistence.test.ts
Normal file
183
src/engine/persistence.test.ts
Normal file
@@ -0,0 +1,183 @@
|
||||
import { beforeEach, describe, expect, it, vi } from "vitest";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { DocumentId, RepresentationId } from "@shared/ids";
|
||||
import {
|
||||
attachPersister,
|
||||
captureSnapshot,
|
||||
createEngine,
|
||||
restoreFromStorage,
|
||||
restoreSnapshot,
|
||||
type Engine,
|
||||
type EngineEvent,
|
||||
type EngineSnapshot,
|
||||
} from "./index";
|
||||
|
||||
function fakeDocAndRep(suffix: string): {
|
||||
document: Document;
|
||||
representation: DocumentRepresentation;
|
||||
} {
|
||||
const docId = `doc_${suffix}` as DocumentId;
|
||||
const repId = `rep_${suffix}` as RepresentationId;
|
||||
return {
|
||||
document: {
|
||||
id: docId,
|
||||
mediaType: "application/pdf",
|
||||
title: `Doc ${suffix}`,
|
||||
createdAt: "2026-05-25T00:00:00.000Z",
|
||||
updatedAt: "2026-05-25T00:00:00.000Z",
|
||||
},
|
||||
representation: {
|
||||
id: repId,
|
||||
documentId: docId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: `hash-${suffix}`,
|
||||
canonicalText: "The quick brown fox.",
|
||||
pageMap: [{ page: 1, width: 100, height: 100 }],
|
||||
offsetMap: [{ page: 1, globalStart: 0, globalEnd: 20, pageLength: 20 }],
|
||||
generatedAt: "2026-05-25T00:00:00.000Z",
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
function memoryStorage(): Pick<Storage, "getItem" | "setItem" | "removeItem"> {
|
||||
const map = new Map<string, string>();
|
||||
return {
|
||||
getItem: (k) => map.get(k) ?? null,
|
||||
setItem: (k, v) => void map.set(k, v),
|
||||
removeItem: (k) => void map.delete(k),
|
||||
};
|
||||
}
|
||||
|
||||
function seed(engine: Engine, suffix: string) {
|
||||
const { document, representation } = fakeDocAndRep(suffix);
|
||||
engine.documents.register({ document, representation });
|
||||
const ann = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors: [{ type: "TextQuoteSelector", exact: "brown fox" }],
|
||||
quote: "brown fox",
|
||||
});
|
||||
const item = engine.evidence.create({
|
||||
annotationIds: [ann.id],
|
||||
commentary: `commentary-${suffix}`,
|
||||
});
|
||||
return { document, representation, ann, item };
|
||||
}
|
||||
|
||||
describe("captureSnapshot + restoreSnapshot", () => {
|
||||
it("round-trips documents, representations, annotations and evidence items", () => {
|
||||
const src = createEngine();
|
||||
const a = seed(src, "a");
|
||||
const b = seed(src, "b");
|
||||
const snap = captureSnapshot(src);
|
||||
expect(snap.documents).toHaveLength(2);
|
||||
expect(snap.representations).toHaveLength(2);
|
||||
expect(snap.annotations).toHaveLength(2);
|
||||
expect(snap.evidenceItems).toHaveLength(2);
|
||||
|
||||
const dst = createEngine();
|
||||
restoreSnapshot(dst, snap);
|
||||
expect(dst.documents.get(a.document.id)?.title).toBe("Doc a");
|
||||
expect(dst.documents.get(b.document.id)?.title).toBe("Doc b");
|
||||
expect(dst.annotations.get(a.ann.id)?.quote).toBe("brown fox");
|
||||
expect(dst.evidence.get(a.item.id)?.commentary).toBe("commentary-a");
|
||||
});
|
||||
|
||||
it("restoreSnapshot does NOT emit *Created events (events would loop the persister)", () => {
|
||||
const src = createEngine();
|
||||
seed(src, "x");
|
||||
const snap = captureSnapshot(src);
|
||||
|
||||
const dst = createEngine();
|
||||
const seen: EngineEvent["type"][] = [];
|
||||
dst.bus.onAny((e) => seen.push(e.type));
|
||||
restoreSnapshot(dst, snap);
|
||||
expect(seen).toEqual([]);
|
||||
});
|
||||
|
||||
it("rejects a snapshot with a mismatching version", () => {
|
||||
const dst = createEngine();
|
||||
expect(() =>
|
||||
restoreSnapshot(dst, {
|
||||
version: 999,
|
||||
documents: [],
|
||||
representations: [],
|
||||
annotations: [],
|
||||
evidenceItems: [],
|
||||
} as EngineSnapshot),
|
||||
).toThrow(/version/);
|
||||
});
|
||||
});
|
||||
|
||||
describe("attachPersister", () => {
|
||||
let storage: ReturnType<typeof memoryStorage>;
|
||||
let engine: Engine;
|
||||
const KEY = "ce-test-snap";
|
||||
|
||||
beforeEach(() => {
|
||||
storage = memoryStorage();
|
||||
engine = createEngine();
|
||||
});
|
||||
|
||||
it("writes a snapshot to storage on every mutating event", () => {
|
||||
const off = attachPersister(engine, { key: KEY, storage });
|
||||
expect(storage.getItem(KEY)).toBeNull();
|
||||
seed(engine, "z");
|
||||
const raw = storage.getItem(KEY);
|
||||
expect(raw).not.toBeNull();
|
||||
const snap = JSON.parse(raw!) as EngineSnapshot;
|
||||
expect(snap.documents).toHaveLength(1);
|
||||
expect(snap.evidenceItems).toHaveLength(1);
|
||||
off();
|
||||
});
|
||||
|
||||
it("stops writing after the unsubscribe is called", () => {
|
||||
const off = attachPersister(engine, { key: KEY, storage });
|
||||
seed(engine, "q");
|
||||
const after = storage.getItem(KEY);
|
||||
off();
|
||||
seed(engine, "r");
|
||||
expect(storage.getItem(KEY)).toBe(after);
|
||||
});
|
||||
|
||||
it("survives a JSON.stringify failure without throwing into the caller", () => {
|
||||
const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
|
||||
const failing = { ...memoryStorage(), setItem: () => { throw new Error("boom"); } };
|
||||
attachPersister(engine, { key: KEY, storage: failing });
|
||||
expect(() => seed(engine, "k")).not.toThrow();
|
||||
expect(warn).toHaveBeenCalled();
|
||||
warn.mockRestore();
|
||||
});
|
||||
});
|
||||
|
||||
describe("restoreFromStorage", () => {
|
||||
it("returns {restored: false} when the key is empty", () => {
|
||||
const storage = memoryStorage();
|
||||
const engine = createEngine();
|
||||
const result = restoreFromStorage(engine, { key: "missing", storage });
|
||||
expect(result.restored).toBe(false);
|
||||
});
|
||||
|
||||
it("hydrates the engine when storage holds a valid snapshot", () => {
|
||||
const src = createEngine();
|
||||
seed(src, "rs");
|
||||
const storage = memoryStorage();
|
||||
storage.setItem("snap", JSON.stringify(captureSnapshot(src)));
|
||||
|
||||
const dst = createEngine();
|
||||
const result = restoreFromStorage(dst, { key: "snap", storage });
|
||||
expect(result.restored).toBe(true);
|
||||
expect(dst.documents.list()).toHaveLength(1);
|
||||
});
|
||||
|
||||
it("ignores malformed JSON without throwing", () => {
|
||||
const warn = vi.spyOn(console, "warn").mockImplementation(() => {});
|
||||
const storage = memoryStorage();
|
||||
storage.setItem("snap", "not-json");
|
||||
const engine = createEngine();
|
||||
const result = restoreFromStorage(engine, { key: "snap", storage });
|
||||
expect(result.restored).toBe(false);
|
||||
expect(warn).toHaveBeenCalled();
|
||||
warn.mockRestore();
|
||||
});
|
||||
});
|
||||
138
src/engine/persistence.ts
Normal file
138
src/engine/persistence.ts
Normal file
@@ -0,0 +1,138 @@
|
||||
/**
|
||||
* Engine snapshot + restore.
|
||||
*
|
||||
* MVP "persistence" — capture the engine's in-memory state into a JSON blob
|
||||
* and restore it later. Used by the SPA to survive page reloads via
|
||||
* `localStorage` until ADR-0005 lands a real store.
|
||||
*
|
||||
* Restore deliberately bypasses the service layer: it writes directly to
|
||||
* the repos so no `*Created` events fire. Without that, restoring would
|
||||
* trigger the persister to re-write the same snapshot — and if the user
|
||||
* has another tab open, it would also broadcast spurious "this annotation
|
||||
* just appeared" events to UI listeners.
|
||||
*/
|
||||
|
||||
import type { Annotation } from "@shared/annotation";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { EvidenceItem } from "@shared/evidence";
|
||||
import type { DocumentId } from "@shared/ids";
|
||||
|
||||
import type { Engine } from "./index";
|
||||
|
||||
export const SNAPSHOT_VERSION = 1;
|
||||
|
||||
export interface EngineSnapshot {
|
||||
readonly version: number;
|
||||
readonly documents: readonly Document[];
|
||||
readonly representations: readonly DocumentRepresentation[];
|
||||
readonly annotations: readonly Annotation[];
|
||||
readonly evidenceItems: readonly EvidenceItem[];
|
||||
}
|
||||
|
||||
export function captureSnapshot(engine: Engine): EngineSnapshot {
|
||||
const documents = engine.documents.list();
|
||||
// Gather representations per known document.
|
||||
const representations: DocumentRepresentation[] = [];
|
||||
const annotations: Annotation[] = [];
|
||||
const evidenceItems: EvidenceItem[] = [];
|
||||
const seenItemIds = new Set<string>();
|
||||
for (const doc of documents) {
|
||||
representations.push(...engine.documents.listRepresentations(doc.id));
|
||||
annotations.push(...engine.annotations.listByDocument(doc.id));
|
||||
for (const item of engine.evidence.listByDocument(doc.id)) {
|
||||
// listByDocument keys off annotation lookup; an item that shares
|
||||
// annotations across two documents would surface twice. De-dupe.
|
||||
if (!seenItemIds.has(item.id)) {
|
||||
seenItemIds.add(item.id);
|
||||
evidenceItems.push(item);
|
||||
}
|
||||
}
|
||||
}
|
||||
return {
|
||||
version: SNAPSHOT_VERSION,
|
||||
documents: [...documents],
|
||||
representations,
|
||||
annotations,
|
||||
evidenceItems,
|
||||
};
|
||||
}
|
||||
|
||||
export function restoreSnapshot(engine: Engine, snapshot: EngineSnapshot): void {
|
||||
if (snapshot.version !== SNAPSHOT_VERSION) {
|
||||
throw new Error(
|
||||
`restoreSnapshot: snapshot version ${snapshot.version} does not match current ${SNAPSHOT_VERSION}`,
|
||||
);
|
||||
}
|
||||
for (const d of snapshot.documents) engine.repos.documents.create(d);
|
||||
for (const r of snapshot.representations) engine.repos.representations.create(r);
|
||||
for (const a of snapshot.annotations) engine.repos.annotations.create(a);
|
||||
for (const i of snapshot.evidenceItems) engine.repos.evidenceItems.create(i);
|
||||
}
|
||||
|
||||
export interface PersisterOptions {
|
||||
/** Storage key. */
|
||||
readonly key: string;
|
||||
/** Storage shim — defaults to globalThis.localStorage. */
|
||||
readonly storage?: Pick<Storage, "getItem" | "setItem" | "removeItem">;
|
||||
}
|
||||
|
||||
/**
|
||||
* Subscribe to engine events and write a fresh snapshot on every mutation.
|
||||
* Returns the unsubscribe function.
|
||||
*
|
||||
* Initial snapshot is NOT written — call `captureSnapshot` + `storage.setItem`
|
||||
* yourself if you want a baseline.
|
||||
*/
|
||||
export function attachPersister(engine: Engine, options: PersisterOptions): () => void {
|
||||
const storage = options.storage ?? globalThis.localStorage;
|
||||
const write = () => {
|
||||
const snap = captureSnapshot(engine);
|
||||
try {
|
||||
storage.setItem(options.key, JSON.stringify(snap));
|
||||
} catch (err) {
|
||||
// localStorage quota / serialization errors shouldn't crash the app.
|
||||
// Surface to the console; ADR-0005 owns the durable fix.
|
||||
console.warn("attachPersister: write failed", err);
|
||||
}
|
||||
};
|
||||
const offs = [
|
||||
engine.bus.on("DocumentImported", write),
|
||||
engine.bus.on("DocumentRepresentationGenerated", write),
|
||||
engine.bus.on("AnnotationCreated", write),
|
||||
engine.bus.on("AnnotationResolved", write),
|
||||
engine.bus.on("AnnotationResolutionFailed", write),
|
||||
engine.bus.on("EvidenceItemCreated", write),
|
||||
engine.bus.on("EvidenceItemUpdated", write),
|
||||
];
|
||||
return () => {
|
||||
for (const off of offs) off();
|
||||
};
|
||||
}
|
||||
|
||||
export type RestoreFromStorageOptions = PersisterOptions;
|
||||
|
||||
export function restoreFromStorage(
|
||||
engine: Engine,
|
||||
options: RestoreFromStorageOptions,
|
||||
): { readonly restored: boolean; readonly snapshot?: EngineSnapshot } {
|
||||
const storage = options.storage ?? globalThis.localStorage;
|
||||
const raw = storage.getItem(options.key);
|
||||
if (!raw) return { restored: false };
|
||||
try {
|
||||
const parsed = JSON.parse(raw) as EngineSnapshot;
|
||||
if (typeof parsed !== "object" || parsed === null) return { restored: false };
|
||||
restoreSnapshot(engine, parsed);
|
||||
return { restored: true, snapshot: parsed };
|
||||
} catch (err) {
|
||||
console.warn("restoreFromStorage: parse failed, ignoring stored snapshot", err);
|
||||
return { restored: false };
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Narrow helper: get the set of document ids restored from a snapshot.
|
||||
* Useful for the SPA's "show me what was open last time" logic.
|
||||
*/
|
||||
export function documentIdsIn(snapshot: EngineSnapshot): readonly DocumentId[] {
|
||||
return snapshot.documents.map((d) => d.id);
|
||||
}
|
||||
151
src/engine/repos/in-memory.ts
Normal file
151
src/engine/repos/in-memory.ts
Normal file
@@ -0,0 +1,151 @@
|
||||
/**
|
||||
* In-memory `Map`-backed repositories.
|
||||
*
|
||||
* Implements the MVP storage layer. The repository interfaces match the
|
||||
* shape that ADR-0005's eventual persistence implementation will satisfy,
|
||||
* so swapping `createInMemoryRepos()` for a SQLite/Postgres factory later
|
||||
* is a localised change.
|
||||
*
|
||||
* All mutating methods return the *stored* object so callers can pick up
|
||||
* server-assigned fields (none in MVP, but the contract anticipates it).
|
||||
*/
|
||||
|
||||
import type { Annotation } from "@shared/annotation";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { EvidenceItem } from "@shared/evidence";
|
||||
import type {
|
||||
AnnotationId,
|
||||
DocumentId,
|
||||
EvidenceItemId,
|
||||
RepresentationId,
|
||||
} from "@shared/ids";
|
||||
|
||||
export interface DocumentRepository {
|
||||
create(document: Document): Document;
|
||||
get(id: DocumentId): Document | null;
|
||||
list(): readonly Document[];
|
||||
update(document: Document): Document;
|
||||
}
|
||||
|
||||
export interface RepresentationRepository {
|
||||
create(representation: DocumentRepresentation): DocumentRepresentation;
|
||||
get(id: RepresentationId): DocumentRepresentation | null;
|
||||
listByDocument(documentId: DocumentId): readonly DocumentRepresentation[];
|
||||
}
|
||||
|
||||
export interface AnnotationRepository {
|
||||
create(annotation: Annotation): Annotation;
|
||||
get(id: AnnotationId): Annotation | null;
|
||||
listByDocument(documentId: DocumentId): readonly Annotation[];
|
||||
update(annotation: Annotation): Annotation;
|
||||
}
|
||||
|
||||
export interface EvidenceItemRepository {
|
||||
create(item: EvidenceItem): EvidenceItem;
|
||||
get(id: EvidenceItemId): EvidenceItem | null;
|
||||
listByDocument(
|
||||
documentId: DocumentId,
|
||||
annotationLookup: (id: AnnotationId) => Annotation | null,
|
||||
): readonly EvidenceItem[];
|
||||
update(item: EvidenceItem): EvidenceItem;
|
||||
}
|
||||
|
||||
export interface InMemoryRepos {
|
||||
readonly documents: DocumentRepository;
|
||||
readonly representations: RepresentationRepository;
|
||||
readonly annotations: AnnotationRepository;
|
||||
readonly evidenceItems: EvidenceItemRepository;
|
||||
}
|
||||
|
||||
export function createInMemoryRepos(): InMemoryRepos {
|
||||
const documents = new Map<DocumentId, Document>();
|
||||
const representations = new Map<RepresentationId, DocumentRepresentation>();
|
||||
const annotations = new Map<AnnotationId, Annotation>();
|
||||
const evidenceItems = new Map<EvidenceItemId, EvidenceItem>();
|
||||
|
||||
return {
|
||||
documents: {
|
||||
create(document) {
|
||||
documents.set(document.id, document);
|
||||
return document;
|
||||
},
|
||||
get(id) {
|
||||
return documents.get(id) ?? null;
|
||||
},
|
||||
list() {
|
||||
return [...documents.values()];
|
||||
},
|
||||
update(document) {
|
||||
if (!documents.has(document.id)) {
|
||||
throw new Error(`DocumentRepository.update: unknown id ${document.id}`);
|
||||
}
|
||||
documents.set(document.id, document);
|
||||
return document;
|
||||
},
|
||||
},
|
||||
representations: {
|
||||
create(representation) {
|
||||
representations.set(representation.id, representation);
|
||||
return representation;
|
||||
},
|
||||
get(id) {
|
||||
return representations.get(id) ?? null;
|
||||
},
|
||||
listByDocument(documentId) {
|
||||
const out: DocumentRepresentation[] = [];
|
||||
for (const rep of representations.values()) {
|
||||
if (rep.documentId === documentId) out.push(rep);
|
||||
}
|
||||
return out;
|
||||
},
|
||||
},
|
||||
annotations: {
|
||||
create(annotation) {
|
||||
annotations.set(annotation.id, annotation);
|
||||
return annotation;
|
||||
},
|
||||
get(id) {
|
||||
return annotations.get(id) ?? null;
|
||||
},
|
||||
listByDocument(documentId) {
|
||||
const out: Annotation[] = [];
|
||||
for (const ann of annotations.values()) {
|
||||
if (ann.documentId === documentId) out.push(ann);
|
||||
}
|
||||
return out;
|
||||
},
|
||||
update(annotation) {
|
||||
if (!annotations.has(annotation.id)) {
|
||||
throw new Error(`AnnotationRepository.update: unknown id ${annotation.id}`);
|
||||
}
|
||||
annotations.set(annotation.id, annotation);
|
||||
return annotation;
|
||||
},
|
||||
},
|
||||
evidenceItems: {
|
||||
create(item) {
|
||||
evidenceItems.set(item.id, item);
|
||||
return item;
|
||||
},
|
||||
get(id) {
|
||||
return evidenceItems.get(id) ?? null;
|
||||
},
|
||||
listByDocument(documentId, annotationLookup) {
|
||||
const out: EvidenceItem[] = [];
|
||||
for (const item of evidenceItems.values()) {
|
||||
if (item.annotationIds.some((aid) => annotationLookup(aid)?.documentId === documentId)) {
|
||||
out.push(item);
|
||||
}
|
||||
}
|
||||
return out;
|
||||
},
|
||||
update(item) {
|
||||
if (!evidenceItems.has(item.id)) {
|
||||
throw new Error(`EvidenceItemRepository.update: unknown id ${item.id}`);
|
||||
}
|
||||
evidenceItems.set(item.id, item);
|
||||
return item;
|
||||
},
|
||||
},
|
||||
};
|
||||
}
|
||||
8
src/engine/repos/index.ts
Normal file
8
src/engine/repos/index.ts
Normal file
@@ -0,0 +1,8 @@
|
||||
export {
|
||||
createInMemoryRepos,
|
||||
type InMemoryRepos,
|
||||
type DocumentRepository,
|
||||
type RepresentationRepository,
|
||||
type AnnotationRepository,
|
||||
type EvidenceItemRepository,
|
||||
} from "./in-memory";
|
||||
102
src/engine/services/annotations.ts
Normal file
102
src/engine/services/annotations.ts
Normal file
@@ -0,0 +1,102 @@
|
||||
/**
|
||||
* Annotation service — creates technical marks on document ranges and
|
||||
* emits `AnnotationCreated`. Resolution-status updates emit
|
||||
* `AnnotationResolved` / `AnnotationResolutionFailed`.
|
||||
*
|
||||
* Annotation creation is the engine's response to a user action in the
|
||||
* viewer (T07). The viewer adapter has already turned the selection into
|
||||
* `Selector[]`; this service stamps an ID, normalize-version, timestamps,
|
||||
* persists, and broadcasts.
|
||||
*/
|
||||
|
||||
import type {
|
||||
Annotation,
|
||||
AnnotationResolutionStatus,
|
||||
} from "@shared/annotation";
|
||||
import type { DocumentId, RepresentationId, AnnotationId } from "@shared/ids";
|
||||
import type { Selector } from "@shared/selector";
|
||||
import { newId } from "@shared/ids";
|
||||
import { NORMALIZE_VERSION } from "@shared/text/normalize";
|
||||
|
||||
import type { EventBus } from "../events";
|
||||
import type { AnnotationRepository } from "../repos";
|
||||
|
||||
export interface CreateAnnotationInput {
|
||||
readonly documentId: DocumentId;
|
||||
readonly representationId?: RepresentationId;
|
||||
readonly selectors: readonly Selector[];
|
||||
readonly quote?: string;
|
||||
readonly note?: string;
|
||||
readonly createdBy?: string;
|
||||
}
|
||||
|
||||
export interface AnnotationService {
|
||||
create(input: CreateAnnotationInput): Annotation;
|
||||
get(id: AnnotationId): Annotation | null;
|
||||
listByDocument(documentId: DocumentId): readonly Annotation[];
|
||||
setResolutionStatus(
|
||||
id: AnnotationId,
|
||||
status: AnnotationResolutionStatus,
|
||||
opts: { readonly confidence: number; readonly reason?: string },
|
||||
): Annotation;
|
||||
}
|
||||
|
||||
export function createAnnotationService(
|
||||
annotations: AnnotationRepository,
|
||||
bus: EventBus,
|
||||
now: () => string = () => new Date().toISOString(),
|
||||
): AnnotationService {
|
||||
return {
|
||||
create(input) {
|
||||
const ts = now();
|
||||
const annotation: Annotation = {
|
||||
id: newId("annotation"),
|
||||
documentId: input.documentId,
|
||||
...(input.representationId !== undefined ? { representationId: input.representationId } : {}),
|
||||
selectors: input.selectors,
|
||||
...(input.quote !== undefined ? { quote: input.quote } : {}),
|
||||
...(input.note !== undefined ? { note: input.note } : {}),
|
||||
normalizeVersion: NORMALIZE_VERSION,
|
||||
...(input.createdBy !== undefined ? { createdBy: input.createdBy } : {}),
|
||||
createdAt: ts,
|
||||
updatedAt: ts,
|
||||
};
|
||||
const stored = annotations.create(annotation);
|
||||
bus.emit({ type: "AnnotationCreated", annotationId: stored.id, annotation: stored });
|
||||
return stored;
|
||||
},
|
||||
get(id) {
|
||||
return annotations.get(id);
|
||||
},
|
||||
listByDocument(documentId) {
|
||||
return annotations.listByDocument(documentId);
|
||||
},
|
||||
setResolutionStatus(id, status, opts) {
|
||||
const existing = annotations.get(id);
|
||||
if (!existing) {
|
||||
throw new Error(`AnnotationService.setResolutionStatus: unknown id ${id}`);
|
||||
}
|
||||
const updated: Annotation = {
|
||||
...existing,
|
||||
resolutionStatus: status,
|
||||
updatedAt: now(),
|
||||
};
|
||||
const stored = annotations.update(updated);
|
||||
if (status === "unresolved" || status === "stale") {
|
||||
bus.emit({
|
||||
type: "AnnotationResolutionFailed",
|
||||
annotationId: stored.id,
|
||||
reason: opts.reason ?? status,
|
||||
});
|
||||
} else {
|
||||
bus.emit({
|
||||
type: "AnnotationResolved",
|
||||
annotationId: stored.id,
|
||||
status,
|
||||
confidence: opts.confidence,
|
||||
});
|
||||
}
|
||||
return stored;
|
||||
},
|
||||
};
|
||||
}
|
||||
63
src/engine/services/documents.ts
Normal file
63
src/engine/services/documents.ts
Normal file
@@ -0,0 +1,63 @@
|
||||
/**
|
||||
* Document service — registers ingested documents and emits the §4 events.
|
||||
*
|
||||
* The ingest pipeline (`src/source/pdf/ingest.ts`) is a pure function over
|
||||
* bytes — it does not touch the engine. The app composition root calls
|
||||
* `ingestPdf` then hands the result to `documentService.register()`, which
|
||||
* is where the engine takes over: persist into the repos, emit
|
||||
* `DocumentImported` + `DocumentRepresentationGenerated`.
|
||||
*/
|
||||
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { DocumentId, RepresentationId } from "@shared/ids";
|
||||
|
||||
import type { EventBus } from "../events";
|
||||
import type { DocumentRepository, RepresentationRepository } from "../repos";
|
||||
|
||||
export interface DocumentService {
|
||||
register(input: {
|
||||
readonly document: Document;
|
||||
readonly representation: DocumentRepresentation;
|
||||
}): { readonly document: Document; readonly representation: DocumentRepresentation };
|
||||
get(id: DocumentId): Document | null;
|
||||
list(): readonly Document[];
|
||||
getRepresentation(id: RepresentationId): DocumentRepresentation | null;
|
||||
listRepresentations(documentId: DocumentId): readonly DocumentRepresentation[];
|
||||
}
|
||||
|
||||
export function createDocumentService(
|
||||
documents: DocumentRepository,
|
||||
representations: RepresentationRepository,
|
||||
bus: EventBus,
|
||||
): DocumentService {
|
||||
return {
|
||||
register({ document, representation }) {
|
||||
const storedDocument = documents.create(document);
|
||||
const storedRepresentation = representations.create(representation);
|
||||
bus.emit({
|
||||
type: "DocumentImported",
|
||||
documentId: storedDocument.id,
|
||||
document: storedDocument,
|
||||
});
|
||||
bus.emit({
|
||||
type: "DocumentRepresentationGenerated",
|
||||
documentId: storedDocument.id,
|
||||
representationId: storedRepresentation.id,
|
||||
representation: storedRepresentation,
|
||||
});
|
||||
return { document: storedDocument, representation: storedRepresentation };
|
||||
},
|
||||
get(id) {
|
||||
return documents.get(id);
|
||||
},
|
||||
list() {
|
||||
return documents.list();
|
||||
},
|
||||
getRepresentation(id) {
|
||||
return representations.get(id);
|
||||
},
|
||||
listRepresentations(documentId) {
|
||||
return representations.listByDocument(documentId);
|
||||
},
|
||||
};
|
||||
}
|
||||
127
src/engine/services/evidence.ts
Normal file
127
src/engine/services/evidence.ts
Normal file
@@ -0,0 +1,127 @@
|
||||
/**
|
||||
* Evidence service — creates EvidenceItems on top of annotations and
|
||||
* tracks their lifecycle. Emits §4 events: `EvidenceItemCreated`,
|
||||
* `EvidenceItemUpdated`, `EvidenceItemActivated`.
|
||||
*
|
||||
* MVP item shape per `wiki/SharedContracts.md` §2.2: status starts at
|
||||
* `candidate`, may transition to `confirmed | rejected | needs-check`.
|
||||
* Item-level relation/strength (supports/contradicts/...) lives on the
|
||||
* link, not the item — that's CE-WP-0003.
|
||||
*/
|
||||
|
||||
import type { Annotation } from "@shared/annotation";
|
||||
import type {
|
||||
EvidenceItem,
|
||||
EvidenceItemStatus,
|
||||
} from "@shared/evidence";
|
||||
import type {
|
||||
AnnotationId,
|
||||
DocumentId,
|
||||
EvidenceItemId,
|
||||
} from "@shared/ids";
|
||||
import { newId } from "@shared/ids";
|
||||
|
||||
import type { EventBus, EvidenceItemActivatedEvent } from "../events";
|
||||
import type { EvidenceItemRepository } from "../repos";
|
||||
|
||||
export interface CreateEvidenceItemInput {
|
||||
readonly annotationIds: readonly AnnotationId[];
|
||||
readonly title?: string;
|
||||
readonly commentary?: string;
|
||||
readonly status?: EvidenceItemStatus;
|
||||
readonly confidence?: number;
|
||||
readonly tags?: readonly string[];
|
||||
readonly createdBy?: string;
|
||||
}
|
||||
|
||||
export interface EvidenceService {
|
||||
create(input: CreateEvidenceItemInput): EvidenceItem;
|
||||
get(id: EvidenceItemId): EvidenceItem | null;
|
||||
listByDocument(documentId: DocumentId): readonly EvidenceItem[];
|
||||
setStatus(id: EvidenceItemId, status: EvidenceItemStatus): EvidenceItem;
|
||||
updateCommentary(id: EvidenceItemId, commentary: string): EvidenceItem;
|
||||
activate(
|
||||
id: EvidenceItemId,
|
||||
source?: EvidenceItemActivatedEvent["source"],
|
||||
): EvidenceItem;
|
||||
}
|
||||
|
||||
export function createEvidenceService(
|
||||
items: EvidenceItemRepository,
|
||||
annotationLookup: (id: AnnotationId) => Annotation | null,
|
||||
bus: EventBus,
|
||||
now: () => string = () => new Date().toISOString(),
|
||||
): EvidenceService {
|
||||
return {
|
||||
create(input) {
|
||||
if (input.annotationIds.length === 0) {
|
||||
throw new Error("EvidenceService.create: at least one annotationId is required");
|
||||
}
|
||||
const ts = now();
|
||||
const item: EvidenceItem = {
|
||||
id: newId("evidence"),
|
||||
annotationIds: input.annotationIds,
|
||||
...(input.title !== undefined ? { title: input.title } : {}),
|
||||
...(input.commentary !== undefined ? { commentary: input.commentary } : {}),
|
||||
status: input.status ?? "candidate",
|
||||
...(input.confidence !== undefined ? { confidence: input.confidence } : {}),
|
||||
...(input.tags !== undefined ? { tags: input.tags } : {}),
|
||||
...(input.createdBy !== undefined ? { createdBy: input.createdBy } : {}),
|
||||
createdAt: ts,
|
||||
updatedAt: ts,
|
||||
};
|
||||
const stored = items.create(item);
|
||||
bus.emit({ type: "EvidenceItemCreated", evidenceItemId: stored.id, evidenceItem: stored });
|
||||
return stored;
|
||||
},
|
||||
get(id) {
|
||||
return items.get(id);
|
||||
},
|
||||
listByDocument(documentId) {
|
||||
return items.listByDocument(documentId, annotationLookup);
|
||||
},
|
||||
setStatus(id, status) {
|
||||
const existing = items.get(id);
|
||||
if (!existing) {
|
||||
throw new Error(`EvidenceService.setStatus: unknown id ${id}`);
|
||||
}
|
||||
if (existing.status === status) return existing;
|
||||
const updated: EvidenceItem = { ...existing, status, updatedAt: now() };
|
||||
const stored = items.update(updated);
|
||||
bus.emit({
|
||||
type: "EvidenceItemUpdated",
|
||||
evidenceItemId: stored.id,
|
||||
evidenceItem: stored,
|
||||
previousStatus: existing.status,
|
||||
});
|
||||
return stored;
|
||||
},
|
||||
updateCommentary(id, commentary) {
|
||||
const existing = items.get(id);
|
||||
if (!existing) {
|
||||
throw new Error(`EvidenceService.updateCommentary: unknown id ${id}`);
|
||||
}
|
||||
const updated: EvidenceItem = { ...existing, commentary, updatedAt: now() };
|
||||
const stored = items.update(updated);
|
||||
bus.emit({
|
||||
type: "EvidenceItemUpdated",
|
||||
evidenceItemId: stored.id,
|
||||
evidenceItem: stored,
|
||||
previousStatus: existing.status,
|
||||
});
|
||||
return stored;
|
||||
},
|
||||
activate(id, source) {
|
||||
const existing = items.get(id);
|
||||
if (!existing) {
|
||||
throw new Error(`EvidenceService.activate: unknown id ${id}`);
|
||||
}
|
||||
bus.emit({
|
||||
type: "EvidenceItemActivated",
|
||||
evidenceItemId: existing.id,
|
||||
...(source !== undefined ? { source } : {}),
|
||||
});
|
||||
return existing;
|
||||
},
|
||||
};
|
||||
}
|
||||
14
src/engine/services/index.ts
Normal file
14
src/engine/services/index.ts
Normal file
@@ -0,0 +1,14 @@
|
||||
export {
|
||||
createDocumentService,
|
||||
type DocumentService,
|
||||
} from "./documents";
|
||||
export {
|
||||
createAnnotationService,
|
||||
type AnnotationService,
|
||||
type CreateAnnotationInput,
|
||||
} from "./annotations";
|
||||
export {
|
||||
createEvidenceService,
|
||||
type EvidenceService,
|
||||
type CreateEvidenceItemInput,
|
||||
} from "./evidence";
|
||||
@@ -1 +1,8 @@
|
||||
export {};
|
||||
export {
|
||||
ingestPdf,
|
||||
type IngestPdfInput,
|
||||
type IngestPdfOptions,
|
||||
type IngestPdfResult,
|
||||
} from "./pdf/ingest";
|
||||
export { extractPdf, type PdfExtractionResult } from "./pdf/extract";
|
||||
export { fingerprintBytes } from "./pdf/fingerprint";
|
||||
|
||||
122
src/source/pdf/extract.ts
Normal file
122
src/source/pdf/extract.ts
Normal file
@@ -0,0 +1,122 @@
|
||||
/**
|
||||
* PDF text extraction → canonical text + PageMap + OffsetMap.
|
||||
*
|
||||
* Implements `wiki/ArchitectureOverview.md` §3.4 ("extract canonical text /
|
||||
* build format-specific maps") for the `pdf-text` representation
|
||||
* (`wiki/SharedContracts.md` §1, §3) and §6 (canonical normalization).
|
||||
*
|
||||
* Runtime independence: the PDF.js worker must be configured by the host
|
||||
* application (`GlobalWorkerOptions.workerSrc`) before this module is
|
||||
* called. In Vite/browser code the worker is bundled via the viewer; in
|
||||
* Node tests the test setup file points it at
|
||||
* `pdfjs-dist/legacy/build/pdf.worker.mjs`. No worker setup happens here
|
||||
* so the same module loads cleanly in both runtimes.
|
||||
*
|
||||
* Page boundary semantics: canonical text concatenates per-page normalized
|
||||
* text with a single "\n\n" paragraph separator. The separator is treated
|
||||
* as belonging to the *preceding* page in `OffsetMap`, so the map covers
|
||||
* `[0, canonicalText.length)` with no gaps. The last page has no trailing
|
||||
* separator. This means `pageLength = globalEnd - globalStart` for
|
||||
* every page; for non-last pages it equals (normalized page text length +
|
||||
* 2). See `PageOffsetRange` in `@shared/document.ts`.
|
||||
*/
|
||||
|
||||
import { getDocument } from "pdfjs-dist";
|
||||
import type { PDFPageProxy } from "pdfjs-dist";
|
||||
import type {
|
||||
OffsetMap,
|
||||
PageInfo,
|
||||
PageMap,
|
||||
PageOffsetRange,
|
||||
} from "@shared/document";
|
||||
import { normalize } from "@shared/text/normalize";
|
||||
|
||||
const PAGE_SEPARATOR = "\n\n";
|
||||
|
||||
export interface PdfExtractionResult {
|
||||
readonly canonicalText: string;
|
||||
readonly pageMap: PageMap;
|
||||
readonly offsetMap: OffsetMap;
|
||||
readonly pageCount: number;
|
||||
}
|
||||
|
||||
export async function extractPdf(bytes: Uint8Array): Promise<PdfExtractionResult> {
|
||||
// PDF.js mutates the bytes buffer (transfers ownership). Pass a fresh copy
|
||||
// so the caller's Uint8Array stays usable for fingerprinting after extract.
|
||||
const data = new Uint8Array(bytes);
|
||||
const loadingTask = getDocument({ data });
|
||||
const doc = await loadingTask.promise;
|
||||
|
||||
try {
|
||||
const pageCount = doc.numPages;
|
||||
const pageInfos: PageInfo[] = [];
|
||||
const pageNormalizedTexts: string[] = [];
|
||||
|
||||
for (let pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
|
||||
const page = await doc.getPage(pageNumber);
|
||||
try {
|
||||
const viewport = page.getViewport({ scale: 1 });
|
||||
pageInfos.push({
|
||||
page: pageNumber,
|
||||
width: viewport.width,
|
||||
height: viewport.height,
|
||||
});
|
||||
|
||||
const rawText = await extractPageText(page);
|
||||
pageNormalizedTexts.push(normalize(rawText).text);
|
||||
} finally {
|
||||
page.cleanup();
|
||||
}
|
||||
}
|
||||
|
||||
const { canonicalText, offsetMap } = buildOffsetMap(pageNormalizedTexts);
|
||||
|
||||
return {
|
||||
canonicalText,
|
||||
pageMap: pageInfos,
|
||||
offsetMap,
|
||||
pageCount,
|
||||
};
|
||||
} finally {
|
||||
await doc.destroy();
|
||||
}
|
||||
}
|
||||
|
||||
async function extractPageText(page: PDFPageProxy): Promise<string> {
|
||||
const content = await page.getTextContent();
|
||||
// textContent.items are TextItem | TextMarkedContent. We want only the
|
||||
// TextItem strings (those have a `str` field); marked-content entries are
|
||||
// structural anchors and have no visible text.
|
||||
const parts: string[] = [];
|
||||
for (const item of content.items) {
|
||||
if ("str" in item) {
|
||||
parts.push(item.str);
|
||||
if (item.hasEOL) parts.push("\n");
|
||||
}
|
||||
}
|
||||
return parts.join("");
|
||||
}
|
||||
|
||||
function buildOffsetMap(pageTexts: readonly string[]): {
|
||||
canonicalText: string;
|
||||
offsetMap: OffsetMap;
|
||||
} {
|
||||
const ranges: PageOffsetRange[] = [];
|
||||
let offset = 0;
|
||||
for (let i = 0; i < pageTexts.length; i++) {
|
||||
const text = pageTexts[i]!;
|
||||
const isLast = i === pageTexts.length - 1;
|
||||
const segmentLength = text.length + (isLast ? 0 : PAGE_SEPARATOR.length);
|
||||
const globalStart = offset;
|
||||
const globalEnd = offset + segmentLength;
|
||||
ranges.push({
|
||||
page: i + 1,
|
||||
globalStart,
|
||||
globalEnd,
|
||||
pageLength: segmentLength,
|
||||
});
|
||||
offset = globalEnd;
|
||||
}
|
||||
const canonicalText = pageTexts.join(PAGE_SEPARATOR);
|
||||
return { canonicalText, offsetMap: ranges };
|
||||
}
|
||||
31
src/source/pdf/fingerprint.ts
Normal file
31
src/source/pdf/fingerprint.ts
Normal file
@@ -0,0 +1,31 @@
|
||||
/**
|
||||
* SHA-256 fingerprint of raw document bytes.
|
||||
*
|
||||
* Implements the fingerprint half of `wiki/ArchitectureOverview.md` §3.4
|
||||
* (the "compute fingerprint" pipeline step) and populates
|
||||
* `Document.fingerprint` (`wiki/SharedContracts.md` §1).
|
||||
*
|
||||
* Uses Web Crypto's `crypto.subtle.digest`, which is available in browsers
|
||||
* and in Node ≥ 20 (where it is exposed on `globalThis.crypto`). No
|
||||
* platform branching — the API is the same in both environments.
|
||||
*/
|
||||
|
||||
export async function fingerprintBytes(bytes: Uint8Array): Promise<string> {
|
||||
// Copy into a fresh ArrayBuffer (not SharedArrayBuffer) so the digest call
|
||||
// satisfies TS's updated `BufferSource` type, which excludes
|
||||
// `SharedArrayBuffer`. The copy is O(n) — fine even for large PDFs since
|
||||
// SHA-256 itself is already O(n).
|
||||
const ab = new ArrayBuffer(bytes.byteLength);
|
||||
new Uint8Array(ab).set(bytes);
|
||||
const digest = await crypto.subtle.digest("SHA-256", ab);
|
||||
return bytesToHex(new Uint8Array(digest));
|
||||
}
|
||||
|
||||
function bytesToHex(bytes: Uint8Array): string {
|
||||
let hex = "";
|
||||
for (let i = 0; i < bytes.length; i++) {
|
||||
const b = bytes[i]!;
|
||||
hex += (b < 0x10 ? "0" : "") + b.toString(16);
|
||||
}
|
||||
return hex;
|
||||
}
|
||||
142
src/source/pdf/ingest.test.ts
Normal file
142
src/source/pdf/ingest.test.ts
Normal file
@@ -0,0 +1,142 @@
|
||||
/**
|
||||
* Fixture-driven contract tests for the PDF ingest pipeline.
|
||||
*
|
||||
* For each fixture in `fixtures/pdfs/manifest.json`:
|
||||
* 1. Read the PDF bytes from disk.
|
||||
* 2. Run `ingestPdf` end-to-end.
|
||||
* 3. Assert the resulting Document + DocumentRepresentation honour the
|
||||
* manifest contract: media type is application/pdf, fingerprint is a
|
||||
* 64-hex SHA-256, pageMap matches `page_count`, canonicalText
|
||||
* contains `known_good_quote`, and the offsetMap covers
|
||||
* `[0, canonicalText.length)` with no gaps.
|
||||
*
|
||||
* This is the verification gate for CE-WP-0002-T03.
|
||||
*/
|
||||
|
||||
import { readFileSync } from "node:fs";
|
||||
import { dirname, resolve } from "node:path";
|
||||
import { createRequire } from "node:module";
|
||||
import { fileURLToPath } from "node:url";
|
||||
import { beforeAll, describe, expect, it } from "vitest";
|
||||
|
||||
import { ingestPdf } from "./ingest";
|
||||
import { fingerprintBytes } from "./fingerprint";
|
||||
import manifest from "../../../fixtures/pdfs/manifest.json" with { type: "json" };
|
||||
|
||||
const __dirname = dirname(fileURLToPath(import.meta.url));
|
||||
const FIXTURE_DIR = resolve(__dirname, "../../../fixtures/pdfs");
|
||||
|
||||
interface Fixture {
|
||||
id: string;
|
||||
filename: string;
|
||||
page_count: number;
|
||||
known_good_quote: string;
|
||||
known_good_quote_page: number;
|
||||
}
|
||||
|
||||
const FIXTURES: readonly Fixture[] = manifest.fixtures;
|
||||
|
||||
beforeAll(async () => {
|
||||
// PDF.js needs a workerSrc set. In Node tests we point it at the legacy
|
||||
// worker bundle — the modern bundle uses APIs that aren't present in
|
||||
// Node. The legacy worker is bundled as plain JS and runs through the
|
||||
// fake-worker fallback that PDF.js spins up when no real Worker is
|
||||
// available.
|
||||
const pdfjs = await import("pdfjs-dist");
|
||||
const require = createRequire(import.meta.url);
|
||||
pdfjs.GlobalWorkerOptions.workerSrc = require.resolve(
|
||||
"pdfjs-dist/legacy/build/pdf.worker.mjs",
|
||||
);
|
||||
});
|
||||
|
||||
describe("ingestPdf — fixture corpus", () => {
|
||||
for (const fixture of FIXTURES) {
|
||||
describe(fixture.id, () => {
|
||||
const path = resolve(FIXTURE_DIR, fixture.filename);
|
||||
const bytes = new Uint8Array(readFileSync(path));
|
||||
|
||||
it("produces a Document with PDF media type and SHA-256 fingerprint", async () => {
|
||||
const { document } = await ingestPdf(bytes, { filename: fixture.filename });
|
||||
expect(document.mediaType).toBe("application/pdf");
|
||||
expect(document.fingerprint).toMatch(/^[0-9a-f]{64}$/);
|
||||
expect(document.title).toBe(fixture.filename);
|
||||
// Fingerprint must be deterministic across runs.
|
||||
const expected = await fingerprintBytes(bytes);
|
||||
expect(document.fingerprint).toBe(expected);
|
||||
});
|
||||
|
||||
it("produces a pdf-text representation with the expected page count", async () => {
|
||||
const { representation } = await ingestPdf(bytes);
|
||||
expect(representation.representationType).toBe("pdf-text");
|
||||
expect(representation.pageMap?.length).toBe(fixture.page_count);
|
||||
expect(representation.offsetMap?.length).toBe(fixture.page_count);
|
||||
});
|
||||
|
||||
it("canonical text contains the manifest's known-good quote", async () => {
|
||||
const { representation } = await ingestPdf(bytes);
|
||||
const text = representation.canonicalText ?? "";
|
||||
expect(text).toContain(fixture.known_good_quote);
|
||||
});
|
||||
|
||||
it("offsetMap is gap-free and covers [0, canonicalText.length)", async () => {
|
||||
const { representation } = await ingestPdf(bytes);
|
||||
const text = representation.canonicalText ?? "";
|
||||
const offsets = representation.offsetMap ?? [];
|
||||
expect(offsets.length).toBeGreaterThan(0);
|
||||
expect(offsets[0]!.globalStart).toBe(0);
|
||||
expect(offsets.at(-1)!.globalEnd).toBe(text.length);
|
||||
for (let i = 0; i < offsets.length; i++) {
|
||||
const r = offsets[i]!;
|
||||
expect(r.page).toBe(i + 1);
|
||||
expect(r.globalEnd - r.globalStart).toBe(r.pageLength);
|
||||
if (i > 0) expect(r.globalStart).toBe(offsets[i - 1]!.globalEnd);
|
||||
}
|
||||
});
|
||||
|
||||
it("pageMap entries have positive width and height in user-space points", async () => {
|
||||
const { representation } = await ingestPdf(bytes);
|
||||
const pages = representation.pageMap ?? [];
|
||||
for (let i = 0; i < pages.length; i++) {
|
||||
const p = pages[i]!;
|
||||
expect(p.page).toBe(i + 1);
|
||||
expect(p.width).toBeGreaterThan(0);
|
||||
expect(p.height).toBeGreaterThan(0);
|
||||
}
|
||||
});
|
||||
});
|
||||
}
|
||||
});
|
||||
|
||||
describe("ingestPdf — option handling", () => {
|
||||
const fixture = FIXTURES[0]!;
|
||||
const path = resolve(FIXTURE_DIR, fixture.filename);
|
||||
const bytes = new Uint8Array(readFileSync(path));
|
||||
|
||||
it("uses explicit title over filename", async () => {
|
||||
const { document } = await ingestPdf(bytes, {
|
||||
filename: fixture.filename,
|
||||
title: "Custom Title",
|
||||
});
|
||||
expect(document.title).toBe("Custom Title");
|
||||
});
|
||||
|
||||
it("omits title entirely when neither filename nor title is supplied", async () => {
|
||||
const { document } = await ingestPdf(bytes);
|
||||
expect(document.title).toBeUndefined();
|
||||
});
|
||||
|
||||
it("propagates uri and metadata when supplied", async () => {
|
||||
const { document } = await ingestPdf(bytes, {
|
||||
uri: "file:///example.pdf",
|
||||
metadata: { source: "test" },
|
||||
});
|
||||
expect(document.uri).toBe("file:///example.pdf");
|
||||
expect(document.metadata).toEqual({ source: "test" });
|
||||
});
|
||||
|
||||
it("accepts ArrayBuffer input", async () => {
|
||||
const ab = bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
|
||||
const { document } = await ingestPdf(ab);
|
||||
expect(document.fingerprint).toMatch(/^[0-9a-f]{64}$/);
|
||||
});
|
||||
});
|
||||
88
src/source/pdf/ingest.ts
Normal file
88
src/source/pdf/ingest.ts
Normal file
@@ -0,0 +1,88 @@
|
||||
/**
|
||||
* PDF ingest pipeline → `{ document, representation }`.
|
||||
*
|
||||
* Implements `wiki/ArchitectureOverview.md` §3.4 ("Raw Source → identify
|
||||
* media type → compute fingerprint → extract metadata → extract canonical
|
||||
* text → build format-specific maps → persist Document +
|
||||
* DocumentRepresentation") for the PDF source format.
|
||||
*
|
||||
* Ingest is a pure function over bytes: it does not persist anything. The
|
||||
* caller (engine repositories in T05, app layer in T06) writes the returned
|
||||
* Document + DocumentRepresentation into the chosen store.
|
||||
*/
|
||||
|
||||
import {
|
||||
type Document,
|
||||
type DocumentRepresentation,
|
||||
} from "@shared/document";
|
||||
import { newId } from "@shared/ids";
|
||||
import { extractPdf } from "./extract";
|
||||
import { fingerprintBytes } from "./fingerprint";
|
||||
|
||||
const PDF_MEDIA_TYPE = "application/pdf";
|
||||
|
||||
export interface IngestPdfOptions {
|
||||
/** Original filename, used as the default title when no title is given. */
|
||||
readonly filename?: string;
|
||||
/** Optional pre-existing title (overrides filename). */
|
||||
readonly title?: string;
|
||||
/** Optional source URI (e.g. file:// or https://). */
|
||||
readonly uri?: string;
|
||||
/** Free-form metadata persisted on the Document record. */
|
||||
readonly metadata?: Readonly<Record<string, unknown>>;
|
||||
}
|
||||
|
||||
export interface IngestPdfResult {
|
||||
readonly document: Document;
|
||||
readonly representation: DocumentRepresentation;
|
||||
}
|
||||
|
||||
export type IngestPdfInput = Uint8Array | ArrayBuffer | Blob;
|
||||
|
||||
export async function ingestPdf(
|
||||
input: IngestPdfInput,
|
||||
options: IngestPdfOptions = {},
|
||||
): Promise<IngestPdfResult> {
|
||||
const bytes = await toBytes(input);
|
||||
const [fingerprint, extraction] = await Promise.all([
|
||||
fingerprintBytes(bytes),
|
||||
extractPdf(bytes),
|
||||
]);
|
||||
|
||||
const now = new Date().toISOString();
|
||||
const documentId = newId("document");
|
||||
const representationId = newId("representation");
|
||||
const title = options.title ?? options.filename;
|
||||
|
||||
const document: Document = {
|
||||
id: documentId,
|
||||
mediaType: PDF_MEDIA_TYPE,
|
||||
fingerprint,
|
||||
createdAt: now,
|
||||
updatedAt: now,
|
||||
...(title !== undefined ? { title } : {}),
|
||||
...(options.uri !== undefined ? { uri: options.uri } : {}),
|
||||
...(options.metadata !== undefined ? { metadata: options.metadata } : {}),
|
||||
};
|
||||
|
||||
const representation: DocumentRepresentation = {
|
||||
id: representationId,
|
||||
documentId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: fingerprint,
|
||||
canonicalText: extraction.canonicalText,
|
||||
pageMap: extraction.pageMap,
|
||||
offsetMap: extraction.offsetMap,
|
||||
generatedAt: now,
|
||||
};
|
||||
|
||||
return { document, representation };
|
||||
}
|
||||
|
||||
async function toBytes(input: IngestPdfInput): Promise<Uint8Array> {
|
||||
if (input instanceof Uint8Array) return input;
|
||||
if (input instanceof ArrayBuffer) return new Uint8Array(input);
|
||||
// Blob (covers `File` in browsers — File extends Blob).
|
||||
const buf = await input.arrayBuffer();
|
||||
return new Uint8Array(buf);
|
||||
}
|
||||
100
src/work/AnnotationToolbar.tsx
Normal file
100
src/work/AnnotationToolbar.tsx
Normal file
@@ -0,0 +1,100 @@
|
||||
/**
|
||||
* AnnotationToolbar — wires "I selected text" into "evidence appears in
|
||||
* the sidebar".
|
||||
*
|
||||
* Visible only when a `pendingSelection` is set (the viewer publishes
|
||||
* captures into context, then this toolbar lets the user attach commentary
|
||||
* and commit). On Save it runs the full pipeline:
|
||||
*
|
||||
* 1. `createSelectors(capture, representation)` — anchor builds the
|
||||
* maximal selector set against the active representation.
|
||||
* 2. `engine.annotations.create(...)` — engine mints an Annotation +
|
||||
* emits AnnotationCreated.
|
||||
* 3. `engine.evidence.create(...)` — engine mints the EvidenceItem with
|
||||
* the user's commentary, emits EvidenceItemCreated.
|
||||
*
|
||||
* The sidebar re-renders via the engine event bus, so no other glue is
|
||||
* needed.
|
||||
*/
|
||||
|
||||
import { useEffect, useState } from "react";
|
||||
import { createSelectors } from "@anchor/index";
|
||||
import {
|
||||
useActiveDocument,
|
||||
useEngine,
|
||||
usePendingSelection,
|
||||
} from "./EngineContext";
|
||||
|
||||
export function AnnotationToolbar() {
|
||||
const engine = useEngine();
|
||||
const { document, representation } = useActiveDocument();
|
||||
const { pending, set } = usePendingSelection();
|
||||
const [commentary, setCommentary] = useState("");
|
||||
|
||||
// Reset the commentary box whenever a fresh selection arrives.
|
||||
useEffect(() => {
|
||||
setCommentary("");
|
||||
}, [pending]);
|
||||
|
||||
if (!pending || !document || !representation) return null;
|
||||
|
||||
const handleSave = () => {
|
||||
const selectors = createSelectors(pending.capture, representation);
|
||||
const annotation = engine.annotations.create({
|
||||
documentId: document.id,
|
||||
representationId: representation.id,
|
||||
selectors,
|
||||
quote: pending.capture.text,
|
||||
});
|
||||
engine.evidence.create({
|
||||
annotationIds: [annotation.id],
|
||||
...(commentary.trim().length > 0 ? { commentary: commentary.trim() } : {}),
|
||||
});
|
||||
set(null);
|
||||
};
|
||||
|
||||
const handleDiscard = () => set(null);
|
||||
|
||||
const quote = pending.capture.text;
|
||||
const shortQuote = quote.length > 200 ? `${quote.slice(0, 200)}…` : quote;
|
||||
|
||||
return (
|
||||
<div
|
||||
style={{
|
||||
borderBottom: "1px solid #f0c040",
|
||||
background: "#fff8d6",
|
||||
padding: 8,
|
||||
fontFamily: "system-ui, sans-serif",
|
||||
fontSize: 12,
|
||||
}}
|
||||
>
|
||||
<div style={{ marginBottom: 6, fontWeight: 600 }}>
|
||||
New annotation ({pending.selectors.length} selector{pending.selectors.length === 1 ? "" : "s"})
|
||||
</div>
|
||||
<div style={{ marginBottom: 6, fontStyle: "italic", color: "#444" }}>
|
||||
“{shortQuote}”
|
||||
</div>
|
||||
<textarea
|
||||
value={commentary}
|
||||
onChange={(e) => setCommentary(e.target.value)}
|
||||
placeholder="Add a one-line comment (optional)…"
|
||||
rows={2}
|
||||
style={{
|
||||
width: "100%",
|
||||
boxSizing: "border-box",
|
||||
fontSize: 12,
|
||||
padding: 4,
|
||||
marginBottom: 6,
|
||||
}}
|
||||
/>
|
||||
<div style={{ display: "flex", gap: 6 }}>
|
||||
<button onClick={handleSave} style={{ fontSize: 12, padding: "4px 10px" }}>
|
||||
Save evidence
|
||||
</button>
|
||||
<button onClick={handleDiscard} style={{ fontSize: 12, padding: "4px 10px" }}>
|
||||
Discard
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
125
src/work/CollectionList.tsx
Normal file
125
src/work/CollectionList.tsx
Normal file
@@ -0,0 +1,125 @@
|
||||
/**
|
||||
* CollectionList — the left pane.
|
||||
*
|
||||
* Lists the fixture corpus (the MVP stand-in for a real document collection).
|
||||
* Clicking a fixture fetches the bytes, runs `ingestPdf` (PDF.js extraction
|
||||
* + fingerprint + canonical text), registers the result with the engine
|
||||
* (emitting §4 events), and activates it as the current document.
|
||||
*
|
||||
* Per CE-WP-0002-T06, the loaded fixture set is hard-wired to
|
||||
* `fixtures/pdfs/manifest.json`. Real collections arrive in a later
|
||||
* workplan.
|
||||
*/
|
||||
|
||||
import { useCallback, useState } from "react";
|
||||
import { ingestPdf } from "@source/index";
|
||||
import { useEngine, useActiveDocumentId } from "./EngineContext";
|
||||
import type { DocumentId } from "@shared/ids";
|
||||
import manifest from "../../fixtures/pdfs/manifest.json";
|
||||
|
||||
interface Fixture {
|
||||
id: string;
|
||||
filename: string;
|
||||
description: string;
|
||||
page_count: number;
|
||||
}
|
||||
|
||||
const FIXTURES: readonly Fixture[] = (manifest as { fixtures: Fixture[] }).fixtures;
|
||||
|
||||
export function CollectionList() {
|
||||
const engine = useEngine();
|
||||
const { id: activeId, setId } = useActiveDocumentId();
|
||||
const [loadingFixtureId, setLoadingFixtureId] = useState<string | null>(null);
|
||||
const [error, setError] = useState<string | null>(null);
|
||||
// Remember which fixture-id maps to which loaded documentId so re-clicking
|
||||
// a fixture activates the existing engine record rather than re-ingesting.
|
||||
const [byFixture, setByFixture] = useState<Record<string, DocumentId>>({});
|
||||
|
||||
const handleLoad = useCallback(
|
||||
async (fixture: Fixture) => {
|
||||
setError(null);
|
||||
|
||||
const existing = byFixture[fixture.id];
|
||||
if (existing) {
|
||||
setId(existing);
|
||||
return;
|
||||
}
|
||||
|
||||
setLoadingFixtureId(fixture.id);
|
||||
try {
|
||||
const url = `/fixtures/pdfs/${encodeURIComponent(fixture.filename)}`;
|
||||
const response = await fetch(url);
|
||||
if (!response.ok) {
|
||||
throw new Error(`fetch ${url} → ${response.status}`);
|
||||
}
|
||||
const buffer = await response.arrayBuffer();
|
||||
const { document, representation } = await ingestPdf(new Uint8Array(buffer), {
|
||||
filename: fixture.filename,
|
||||
});
|
||||
engine.documents.register({ document, representation });
|
||||
setByFixture((prev) => ({ ...prev, [fixture.id]: document.id }));
|
||||
setId(document.id);
|
||||
} catch (err) {
|
||||
setError(err instanceof Error ? err.message : String(err));
|
||||
} finally {
|
||||
setLoadingFixtureId(null);
|
||||
}
|
||||
},
|
||||
[byFixture, engine, setId],
|
||||
);
|
||||
|
||||
return (
|
||||
<aside
|
||||
style={{
|
||||
width: 280,
|
||||
borderRight: "1px solid #ddd",
|
||||
padding: 12,
|
||||
overflow: "auto",
|
||||
flex: "0 0 280px",
|
||||
}}
|
||||
>
|
||||
<h2 style={{ marginTop: 0, fontSize: 16 }}>Collection</h2>
|
||||
<p style={{ fontSize: 12, color: "#555", marginTop: 0 }}>
|
||||
{FIXTURES.length} fixture PDF{FIXTURES.length === 1 ? "" : "s"}
|
||||
</p>
|
||||
{error && (
|
||||
<p style={{ fontSize: 12, color: "#b00020", background: "#fff4f4", padding: 6 }}>
|
||||
{error}
|
||||
</p>
|
||||
)}
|
||||
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
|
||||
{FIXTURES.map((f) => {
|
||||
const isLoading = loadingFixtureId === f.id;
|
||||
const documentId = byFixture[f.id];
|
||||
const isActive = documentId !== undefined && documentId === activeId;
|
||||
return (
|
||||
<li key={f.id} style={{ marginBottom: 6 }}>
|
||||
<button
|
||||
onClick={() => {
|
||||
void handleLoad(f);
|
||||
}}
|
||||
disabled={isLoading}
|
||||
style={{
|
||||
display: "block",
|
||||
width: "100%",
|
||||
textAlign: "left",
|
||||
background: isActive ? "#e8f0ff" : "white",
|
||||
border: "1px solid #ccc",
|
||||
padding: 6,
|
||||
cursor: isLoading ? "wait" : "pointer",
|
||||
fontSize: 12,
|
||||
}}
|
||||
>
|
||||
<div style={{ fontWeight: 600 }}>{f.id}</div>
|
||||
<div style={{ color: "#666", fontSize: 11 }}>
|
||||
{f.page_count} page{f.page_count === 1 ? "" : "s"}
|
||||
{isLoading ? " · loading…" : isActive ? " · open" : ""}
|
||||
</div>
|
||||
</button>
|
||||
</li>
|
||||
);
|
||||
})}
|
||||
</ul>
|
||||
</aside>
|
||||
);
|
||||
}
|
||||
219
src/work/EngineContext.tsx
Normal file
219
src/work/EngineContext.tsx
Normal file
@@ -0,0 +1,219 @@
|
||||
/**
|
||||
* Engine + active-document React context.
|
||||
*
|
||||
* MVP composition root for the UI: one `Engine` instance for the lifetime of
|
||||
* the SPA, plus the "what's open in the viewer right now" pointer.
|
||||
* `useEngine()` returns the engine; `useActiveDocument()` returns the
|
||||
* currently-loaded `{document, representation}` pair, refreshed when the
|
||||
* engine emits `DocumentImported` / `DocumentRepresentationGenerated`.
|
||||
*
|
||||
* Replaces ad-hoc engine wiring inside each component. Per the workplan
|
||||
* (T07 note), state lives in a single React context; no Zustand or Redux.
|
||||
*/
|
||||
|
||||
import {
|
||||
createContext,
|
||||
useCallback,
|
||||
useContext,
|
||||
useEffect,
|
||||
useMemo,
|
||||
useState,
|
||||
type ReactNode,
|
||||
} from "react";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { AnnotationId, DocumentId } from "@shared/ids";
|
||||
import type { Selector } from "@shared/selector";
|
||||
import {
|
||||
attachPersister,
|
||||
createEngine,
|
||||
restoreFromStorage,
|
||||
type Engine,
|
||||
} from "@engine/index";
|
||||
import type { PdfSelectionCapture } from "@anchor/index";
|
||||
|
||||
/**
|
||||
* localStorage keys for the engine snapshot and the UI's "what was open"
|
||||
* pointer. ADR-0005 frames both as deliberately temporary — real
|
||||
* persistence later.
|
||||
*/
|
||||
const STORAGE_KEY = "citation-evidence:engine-snapshot:v1";
|
||||
const ACTIVE_KEY = "citation-evidence:active-document-id:v1";
|
||||
|
||||
/**
|
||||
* The pending selection lives in context (not local component state) because
|
||||
* the toolbar that consumes it is rendered above the viewer, not inside it.
|
||||
* `null` means "no selection waiting for a comment".
|
||||
*/
|
||||
export interface PendingSelection {
|
||||
readonly capture: PdfSelectionCapture;
|
||||
readonly selectors: readonly Selector[];
|
||||
}
|
||||
|
||||
interface EngineContextValue {
|
||||
readonly engine: Engine;
|
||||
readonly activeDocumentId: DocumentId | null;
|
||||
setActiveDocumentId(id: DocumentId | null): void;
|
||||
readonly pendingSelection: PendingSelection | null;
|
||||
setPendingSelection(pending: PendingSelection | null): void;
|
||||
readonly scrollToAnnotationId: AnnotationId | null;
|
||||
/** The version counter bumps even when the same id is set twice in a row,
|
||||
* so a second click on the same evidence item still triggers a scroll. */
|
||||
readonly scrollVersion: number;
|
||||
scrollToAnnotation(id: AnnotationId | null): void;
|
||||
}
|
||||
|
||||
const EngineContext = createContext<EngineContextValue | null>(null);
|
||||
|
||||
interface EngineProviderProps {
|
||||
readonly children: ReactNode;
|
||||
/** Inject a pre-built engine for tests; production uses the default. */
|
||||
readonly engine?: Engine;
|
||||
}
|
||||
|
||||
export function EngineProvider({ children, engine: injected }: EngineProviderProps) {
|
||||
const engine = useMemo(() => injected ?? createEngine(), [injected]);
|
||||
const [activeDocumentId, setActiveDocumentIdState] = useState<DocumentId | null>(null);
|
||||
const [pendingSelection, setPendingSelection] = useState<PendingSelection | null>(null);
|
||||
const [scrollState, setScrollState] = useState<{ id: AnnotationId | null; version: number }>({
|
||||
id: null,
|
||||
version: 0,
|
||||
});
|
||||
|
||||
// Restore from localStorage on first mount, then attach the persister.
|
||||
// The injected-engine path skips persistence (tests own their lifecycle).
|
||||
useEffect(() => {
|
||||
if (injected) return;
|
||||
if (typeof globalThis.localStorage === "undefined") return;
|
||||
const result = restoreFromStorage(engine, { key: STORAGE_KEY });
|
||||
if (result.restored) {
|
||||
const saved = globalThis.localStorage.getItem(ACTIVE_KEY);
|
||||
if (saved && engine.documents.get(saved as DocumentId)) {
|
||||
setActiveDocumentIdState(saved as DocumentId);
|
||||
}
|
||||
}
|
||||
return attachPersister(engine, { key: STORAGE_KEY });
|
||||
}, [engine, injected]);
|
||||
|
||||
// Persist the active-document pointer alongside the engine snapshot so a
|
||||
// reload lands the user back where they were.
|
||||
useEffect(() => {
|
||||
if (injected) return;
|
||||
if (typeof globalThis.localStorage === "undefined") return;
|
||||
if (activeDocumentId) {
|
||||
globalThis.localStorage.setItem(ACTIVE_KEY, activeDocumentId);
|
||||
} else {
|
||||
globalThis.localStorage.removeItem(ACTIVE_KEY);
|
||||
}
|
||||
}, [activeDocumentId, injected]);
|
||||
|
||||
// Switching the active document discards any pending selection — it
|
||||
// belongs to the previous document's viewer state.
|
||||
const setActiveDocumentId = useCallback((id: DocumentId | null) => {
|
||||
setActiveDocumentIdState(id);
|
||||
setPendingSelection(null);
|
||||
setScrollState((prev) => ({ id: null, version: prev.version + 1 }));
|
||||
}, []);
|
||||
|
||||
const scrollToAnnotation = useCallback((id: AnnotationId | null) => {
|
||||
setScrollState((prev) => ({ id, version: prev.version + 1 }));
|
||||
}, []);
|
||||
|
||||
const value = useMemo<EngineContextValue>(
|
||||
() => ({
|
||||
engine,
|
||||
activeDocumentId,
|
||||
setActiveDocumentId,
|
||||
pendingSelection,
|
||||
setPendingSelection,
|
||||
scrollToAnnotationId: scrollState.id,
|
||||
scrollVersion: scrollState.version,
|
||||
scrollToAnnotation,
|
||||
}),
|
||||
[engine, activeDocumentId, setActiveDocumentId, pendingSelection, scrollState, scrollToAnnotation],
|
||||
);
|
||||
|
||||
return <EngineContext.Provider value={value}>{children}</EngineContext.Provider>;
|
||||
}
|
||||
|
||||
export function useEngine(): Engine {
|
||||
const ctx = useContext(EngineContext);
|
||||
if (!ctx) throw new Error("useEngine: missing EngineProvider");
|
||||
return ctx.engine;
|
||||
}
|
||||
|
||||
export function useActiveDocumentId(): {
|
||||
readonly id: DocumentId | null;
|
||||
setId(id: DocumentId | null): void;
|
||||
} {
|
||||
const ctx = useContext(EngineContext);
|
||||
if (!ctx) throw new Error("useActiveDocumentId: missing EngineProvider");
|
||||
return { id: ctx.activeDocumentId, setId: ctx.setActiveDocumentId };
|
||||
}
|
||||
|
||||
export function useActiveDocument(): {
|
||||
readonly document: Document | null;
|
||||
readonly representation: DocumentRepresentation | null;
|
||||
} {
|
||||
const engine = useEngine();
|
||||
const { id } = useActiveDocumentId();
|
||||
const [tick, setTick] = useState(0);
|
||||
|
||||
// Re-render when documents come and go so list views stay fresh.
|
||||
useEffect(() => {
|
||||
const off1 = engine.bus.on("DocumentImported", () => setTick((t) => t + 1));
|
||||
const off2 = engine.bus.on("DocumentRepresentationGenerated", () => setTick((t) => t + 1));
|
||||
return () => {
|
||||
off1();
|
||||
off2();
|
||||
};
|
||||
}, [engine]);
|
||||
|
||||
const document = id ? engine.documents.get(id) : null;
|
||||
const representation = id
|
||||
? engine.documents.listRepresentations(id).at(-1) ?? null
|
||||
: null;
|
||||
|
||||
// `tick` is intentionally read to silence unused-var warnings; the dep
|
||||
// chain is via useState so React handles the re-render. We don't actually
|
||||
// need to consume the value.
|
||||
void tick;
|
||||
|
||||
return { document, representation };
|
||||
}
|
||||
|
||||
/**
|
||||
* Subscribe to a single engine event type and trigger a re-render each time
|
||||
* it fires. Returns the current monotonic counter — pure state-marker.
|
||||
*/
|
||||
export function useEngineEventTick<T extends Parameters<Engine["bus"]["on"]>[0]>(
|
||||
type: T,
|
||||
): number {
|
||||
const engine = useEngine();
|
||||
const [tick, setTick] = useState(0);
|
||||
const bump = useCallback(() => setTick((t) => t + 1), []);
|
||||
useEffect(() => engine.bus.on(type, bump), [engine, type, bump]);
|
||||
return tick;
|
||||
}
|
||||
|
||||
export function usePendingSelection(): {
|
||||
readonly pending: PendingSelection | null;
|
||||
set(pending: PendingSelection | null): void;
|
||||
} {
|
||||
const ctx = useContext(EngineContext);
|
||||
if (!ctx) throw new Error("usePendingSelection: missing EngineProvider");
|
||||
return { pending: ctx.pendingSelection, set: ctx.setPendingSelection };
|
||||
}
|
||||
|
||||
export function useScrollToAnnotation(): {
|
||||
readonly id: AnnotationId | null;
|
||||
readonly version: number;
|
||||
scrollTo(id: AnnotationId | null): void;
|
||||
} {
|
||||
const ctx = useContext(EngineContext);
|
||||
if (!ctx) throw new Error("useScrollToAnnotation: missing EngineProvider");
|
||||
return {
|
||||
id: ctx.scrollToAnnotationId,
|
||||
version: ctx.scrollVersion,
|
||||
scrollTo: ctx.scrollToAnnotation,
|
||||
};
|
||||
}
|
||||
101
src/work/EvidenceSidebar.tsx
Normal file
101
src/work/EvidenceSidebar.tsx
Normal file
@@ -0,0 +1,101 @@
|
||||
/**
|
||||
* EvidenceSidebar — the right pane.
|
||||
*
|
||||
* Lists `EvidenceItem`s scoped to the currently-active document. Each row
|
||||
* shows quote + commentary + status. Clicking a row emits
|
||||
* `EvidenceItemActivated` via the engine, which T08 will translate into a
|
||||
* scroll-to-passage in the viewer.
|
||||
*
|
||||
* T06 scope: read-only display + activation event. Item creation lives in
|
||||
* T07; the click-to-reopen integration lives in T08.
|
||||
*/
|
||||
|
||||
import { useMemo } from "react";
|
||||
import type { EvidenceItem } from "@shared/evidence";
|
||||
import {
|
||||
useActiveDocument,
|
||||
useEngine,
|
||||
useEngineEventTick,
|
||||
useScrollToAnnotation,
|
||||
} from "./EngineContext";
|
||||
|
||||
export interface EvidenceSidebarProps {
|
||||
onActivate?(item: EvidenceItem): void;
|
||||
}
|
||||
|
||||
export function EvidenceSidebar(props: EvidenceSidebarProps) {
|
||||
const engine = useEngine();
|
||||
const { document } = useActiveDocument();
|
||||
const { scrollTo } = useScrollToAnnotation();
|
||||
|
||||
// Refresh the list when items are created or updated. The tick values are
|
||||
// included in the memo deps below so the list re-resolves on each event.
|
||||
const createTick = useEngineEventTick("EvidenceItemCreated");
|
||||
const updateTick = useEngineEventTick("EvidenceItemUpdated");
|
||||
|
||||
const items = useMemo<readonly EvidenceItem[]>(() => {
|
||||
if (!document) return [];
|
||||
return engine.evidence.listByDocument(document.id);
|
||||
// createTick / updateTick are read here purely as memo invalidators.
|
||||
}, [document, engine, createTick, updateTick]);
|
||||
|
||||
return (
|
||||
<aside
|
||||
style={{
|
||||
width: 320,
|
||||
borderLeft: "1px solid #ddd",
|
||||
padding: 12,
|
||||
overflow: "auto",
|
||||
flex: "0 0 320px",
|
||||
fontFamily: "system-ui, sans-serif",
|
||||
}}
|
||||
>
|
||||
<h2 style={{ marginTop: 0, fontSize: 16 }}>Evidence</h2>
|
||||
{!document && (
|
||||
<p style={{ fontSize: 12, color: "#888" }}>No document open.</p>
|
||||
)}
|
||||
{document && items.length === 0 && (
|
||||
<p style={{ fontSize: 12, color: "#888" }}>
|
||||
No evidence yet. Select a passage in the viewer to create one.
|
||||
</p>
|
||||
)}
|
||||
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
|
||||
{items.map((item) => {
|
||||
const firstAnnotationId = item.annotationIds[0];
|
||||
const annotation = firstAnnotationId ? engine.annotations.get(firstAnnotationId) : null;
|
||||
const quote = annotation?.quote ?? "(no quote)";
|
||||
return (
|
||||
<li key={item.id} style={{ marginBottom: 8 }}>
|
||||
<button
|
||||
onClick={() => {
|
||||
engine.evidence.activate(item.id, "sidebar");
|
||||
if (firstAnnotationId) scrollTo(firstAnnotationId);
|
||||
props.onActivate?.(item);
|
||||
}}
|
||||
style={{
|
||||
display: "block",
|
||||
width: "100%",
|
||||
textAlign: "left",
|
||||
background: "#fff8d6",
|
||||
border: "1px solid #e0c050",
|
||||
padding: 8,
|
||||
cursor: "pointer",
|
||||
fontSize: 12,
|
||||
}}
|
||||
>
|
||||
<div style={{ fontStyle: "italic", marginBottom: 4 }}>
|
||||
“{quote.slice(0, 140)}
|
||||
{quote.length > 140 ? "…" : ""}”
|
||||
</div>
|
||||
{item.commentary && (
|
||||
<div style={{ color: "#333", marginBottom: 4 }}>{item.commentary}</div>
|
||||
)}
|
||||
<div style={{ color: "#666", fontSize: 11 }}>status: {item.status}</div>
|
||||
</button>
|
||||
</li>
|
||||
);
|
||||
})}
|
||||
</ul>
|
||||
</aside>
|
||||
);
|
||||
}
|
||||
94
src/work/ViewerShell.tsx
Normal file
94
src/work/ViewerShell.tsx
Normal file
@@ -0,0 +1,94 @@
|
||||
/**
|
||||
* ViewerShell — the centre pane.
|
||||
*
|
||||
* Hosts the viewer adapter (currently the T02 PDF spike) and shows whatever
|
||||
* is active. `work/` consumes only the adapter's public surface
|
||||
* (`PdfSpikeViewer`) — it never touches PDF.js or react-pdf-highlighter-plus
|
||||
* directly. When the PDF library is swapped (or the spike is replaced),
|
||||
* only the adapter module changes; this shell stays the same.
|
||||
*
|
||||
* T06 scope: load + render the active PDF + show stored annotations. The
|
||||
* selection-capture → annotation pipeline is wired in T07; the
|
||||
* click-to-reopen pipeline is wired in T08.
|
||||
*/
|
||||
|
||||
import { useMemo } from "react";
|
||||
import { PdfSpikeViewer, type StoredAnnotation } from "@anchor/index";
|
||||
import {
|
||||
useActiveDocument,
|
||||
useEngine,
|
||||
useEngineEventTick,
|
||||
usePendingSelection,
|
||||
useScrollToAnnotation,
|
||||
} from "./EngineContext";
|
||||
import { AnnotationToolbar } from "./AnnotationToolbar";
|
||||
|
||||
export function ViewerShell() {
|
||||
const engine = useEngine();
|
||||
const { document, representation } = useActiveDocument();
|
||||
const { set: setPending } = usePendingSelection();
|
||||
const { id: scrollToId, version: scrollVersion } = useScrollToAnnotation();
|
||||
|
||||
// The viewer needs to re-fetch its highlight list whenever annotations
|
||||
// change. The tick is included in the memo deps so the list re-resolves.
|
||||
const annotationTick = useEngineEventTick("AnnotationCreated");
|
||||
|
||||
const annotations = useMemo<StoredAnnotation[]>(() => {
|
||||
if (!document) return [];
|
||||
return engine.annotations.listByDocument(document.id).map((a) => ({
|
||||
id: a.id,
|
||||
text: a.quote ?? "",
|
||||
selectors: a.selectors,
|
||||
}));
|
||||
}, [document, engine, annotationTick]);
|
||||
|
||||
const fileUrl = useMemo(() => {
|
||||
if (!document) return null;
|
||||
const titleOrId = document.title ?? document.id;
|
||||
return `/fixtures/pdfs/${encodeURIComponent(titleOrId)}`;
|
||||
}, [document]);
|
||||
|
||||
if (!document || !representation || !fileUrl) {
|
||||
return (
|
||||
<main
|
||||
style={{
|
||||
flex: 1,
|
||||
display: "flex",
|
||||
alignItems: "center",
|
||||
justifyContent: "center",
|
||||
color: "#666",
|
||||
fontFamily: "system-ui, sans-serif",
|
||||
}}
|
||||
>
|
||||
Pick a fixture on the left to begin.
|
||||
</main>
|
||||
);
|
||||
}
|
||||
|
||||
return (
|
||||
<main
|
||||
style={{
|
||||
flex: 1,
|
||||
display: "flex",
|
||||
flexDirection: "column",
|
||||
overflow: "hidden",
|
||||
position: "relative",
|
||||
}}
|
||||
>
|
||||
<AnnotationToolbar />
|
||||
<div style={{ flex: 1, overflow: "hidden", position: "relative" }}>
|
||||
<PdfSpikeViewer
|
||||
// Re-key on scrollVersion so clicking the same item twice still
|
||||
// triggers the viewer's mount-time scroll effect.
|
||||
key={`${document.id}#${scrollVersion}`}
|
||||
pdfUrl={fileUrl}
|
||||
storedAnnotations={annotations}
|
||||
{...(scrollToId ? { scrollToAnnotationId: scrollToId } : {})}
|
||||
onSelectionCaptured={(capture, selectors) => {
|
||||
setPending({ capture, selectors });
|
||||
}}
|
||||
/>
|
||||
</div>
|
||||
</main>
|
||||
);
|
||||
}
|
||||
@@ -1 +1,13 @@
|
||||
export {};
|
||||
export { CollectionList } from "./CollectionList";
|
||||
export { ViewerShell } from "./ViewerShell";
|
||||
export { EvidenceSidebar, type EvidenceSidebarProps } from "./EvidenceSidebar";
|
||||
export { AnnotationToolbar } from "./AnnotationToolbar";
|
||||
export {
|
||||
EngineProvider,
|
||||
useEngine,
|
||||
useActiveDocument,
|
||||
useActiveDocumentId,
|
||||
useEngineEventTick,
|
||||
usePendingSelection,
|
||||
type PendingSelection,
|
||||
} from "./EngineContext";
|
||||
|
||||
72
tests/integration/anchor-source-roundtrip.test.ts
Normal file
72
tests/integration/anchor-source-roundtrip.test.ts
Normal file
@@ -0,0 +1,72 @@
|
||||
/**
|
||||
* Integration round-trip: ingest → createSelectors → resolveSelectors.
|
||||
*
|
||||
* This test crosses the source ↔ anchor boundary, which the
|
||||
* `boundaries/element-types` lint rule (correctly) forbids inside `src/`.
|
||||
* It lives under `tests/integration/` so it can verify the end-to-end
|
||||
* MVP contract without weakening the production-code boundary.
|
||||
*
|
||||
* CE-WP-0002-T04 contract:
|
||||
* For each fixture+known-quote pair, create selectors then immediately
|
||||
* resolve them; resolution must succeed with confidence ≥ 0.9.
|
||||
*/
|
||||
|
||||
import { readFileSync } from "node:fs";
|
||||
import { dirname, resolve } from "node:path";
|
||||
import { createRequire } from "node:module";
|
||||
import { fileURLToPath } from "node:url";
|
||||
import { beforeAll, describe, expect, it } from "vitest";
|
||||
|
||||
import { ingestPdf } from "@source/pdf/ingest";
|
||||
import { createSelectors, resolveSelectors } from "@anchor/selectors";
|
||||
import type { PdfSelectionCapture } from "@anchor/types";
|
||||
import manifest from "../../fixtures/pdfs/manifest.json" with { type: "json" };
|
||||
|
||||
const __dirname = dirname(fileURLToPath(import.meta.url));
|
||||
const FIXTURE_DIR = resolve(__dirname, "../../fixtures/pdfs");
|
||||
|
||||
interface Fixture {
|
||||
id: string;
|
||||
filename: string;
|
||||
known_good_quote: string;
|
||||
known_good_quote_page: number;
|
||||
}
|
||||
|
||||
const FIXTURES: readonly Fixture[] = manifest.fixtures;
|
||||
|
||||
beforeAll(async () => {
|
||||
const pdfjs = await import("pdfjs-dist");
|
||||
const require = createRequire(import.meta.url);
|
||||
pdfjs.GlobalWorkerOptions.workerSrc = require.resolve(
|
||||
"pdfjs-dist/legacy/build/pdf.worker.mjs",
|
||||
);
|
||||
});
|
||||
|
||||
describe("create + resolve round-trip — fixture corpus", () => {
|
||||
for (const fixture of FIXTURES) {
|
||||
it(`${fixture.id}: known-good quote round-trips with confidence ≥ 0.9`, async () => {
|
||||
const bytes = new Uint8Array(readFileSync(resolve(FIXTURE_DIR, fixture.filename)));
|
||||
const { representation } = await ingestPdf(bytes, { filename: fixture.filename });
|
||||
|
||||
const capture: PdfSelectionCapture = {
|
||||
kind: "pdf",
|
||||
text: fixture.known_good_quote,
|
||||
page: fixture.known_good_quote_page,
|
||||
rects: [{ x: 0.1, y: 0.2, width: 0.5, height: 0.04 }],
|
||||
};
|
||||
|
||||
const selectors = createSelectors(capture, representation);
|
||||
const resolution = resolveSelectors(selectors, representation);
|
||||
|
||||
expect(resolution.status).toBe("resolved");
|
||||
expect(resolution.confidence).toBeGreaterThanOrEqual(0.9);
|
||||
|
||||
const span = resolution.candidates[0]?.textPosition;
|
||||
expect(span).toBeDefined();
|
||||
const text = representation.canonicalText ?? "";
|
||||
expect(text.slice(span!.start, span!.end)).toBe(fixture.known_good_quote);
|
||||
|
||||
expect(resolution.candidates[0]?.page).toBe(fixture.known_good_quote_page);
|
||||
});
|
||||
}
|
||||
});
|
||||
262
tests/integration/app-prd-scenario.dom.test.tsx
Normal file
262
tests/integration/app-prd-scenario.dom.test.tsx
Normal file
@@ -0,0 +1,262 @@
|
||||
/**
|
||||
* CE-WP-0002-T09 — end-to-end test of the PRD slice-1 scenario.
|
||||
*
|
||||
* Steps verified (per the workplan):
|
||||
* 1. Open the app.
|
||||
* 2. Pick a fixture PDF.
|
||||
* 3. Programmatically inject a selection for the manifest's known-good
|
||||
* quote (the actual drag-select interaction is verified manually —
|
||||
* see ADR-0004 for the headless-Chromium limitation).
|
||||
* 4. Save an evidence item with a comment.
|
||||
* 5. The item appears in the sidebar.
|
||||
* 6. Reload the page (unmount + remount with the same localStorage).
|
||||
* 7. Click the evidence item.
|
||||
* 8. The viewer is asked to scroll to the correct annotation
|
||||
* (the visual highlight render is not exercised here — see ADR-0004).
|
||||
*
|
||||
* The PDF.js viewer + ingest pipeline are mocked. Both have dedicated
|
||||
* coverage elsewhere (`src/source/pdf/ingest.test.ts`,
|
||||
* `tests/integration/anchor-source-roundtrip.test.ts`,
|
||||
* `src/anchor/pdf-selector-math.test.ts`). T09 owns the *wiring* of the
|
||||
* full UX flow; the PDF-rendering correctness lives in those layers.
|
||||
*/
|
||||
|
||||
// @vitest-environment happy-dom
|
||||
|
||||
import { act, render, screen, waitFor } from "@testing-library/react";
|
||||
import userEvent from "@testing-library/user-event";
|
||||
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
|
||||
|
||||
import type { Selector } from "@shared/selector";
|
||||
import type { DocumentId, RepresentationId } from "@shared/ids";
|
||||
import type { Document, DocumentRepresentation } from "@shared/document";
|
||||
import type { PdfSelectionCapture } from "@anchor/index";
|
||||
import manifest from "../../fixtures/pdfs/manifest.json" with { type: "json" };
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Mocks
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
interface ViewerProps {
|
||||
pdfUrl: string;
|
||||
storedAnnotations: readonly { id: string; text: string; selectors: readonly Selector[] }[];
|
||||
scrollToAnnotationId?: string;
|
||||
onSelectionCaptured(capture: PdfSelectionCapture, selectors: Selector[]): void;
|
||||
}
|
||||
|
||||
interface ViewerSnapshot {
|
||||
pdfUrl: string | null;
|
||||
storedAnnotationIds: string[];
|
||||
scrollToAnnotationId: string | null;
|
||||
onSelectionCaptured: ViewerProps["onSelectionCaptured"] | null;
|
||||
}
|
||||
|
||||
const viewerSnapshot: ViewerSnapshot = {
|
||||
pdfUrl: null,
|
||||
storedAnnotationIds: [],
|
||||
scrollToAnnotationId: null,
|
||||
onSelectionCaptured: null,
|
||||
};
|
||||
|
||||
vi.mock("@anchor/index", async (importOriginal) => {
|
||||
const original = await importOriginal<typeof import("@anchor/index")>();
|
||||
const MockPdfSpikeViewer = (props: ViewerProps) => {
|
||||
viewerSnapshot.pdfUrl = props.pdfUrl;
|
||||
viewerSnapshot.storedAnnotationIds = props.storedAnnotations.map((a) => a.id);
|
||||
viewerSnapshot.scrollToAnnotationId = props.scrollToAnnotationId ?? null;
|
||||
viewerSnapshot.onSelectionCaptured = props.onSelectionCaptured;
|
||||
return (
|
||||
<div
|
||||
data-testid="mock-pdf-viewer"
|
||||
data-pdf-url={props.pdfUrl}
|
||||
data-scroll-to={props.scrollToAnnotationId ?? ""}
|
||||
data-stored-count={String(props.storedAnnotations.length)}
|
||||
/>
|
||||
);
|
||||
};
|
||||
return {
|
||||
...original,
|
||||
PdfSpikeViewer: MockPdfSpikeViewer,
|
||||
};
|
||||
});
|
||||
|
||||
// Mock ingestPdf so the click-to-load flow doesn't pull PDF.js into jsdom.
|
||||
// The synthetic representation includes the manifest's known-good quote in
|
||||
// its canonical text so createSelectors/resolveSelectors run for real.
|
||||
const FIXTURE = manifest.fixtures.find((f) => f.id === "fristsetzung-bezifferung")!;
|
||||
const SYNTHETIC_CANONICAL = [
|
||||
"Header boilerplate that comes before the quote.",
|
||||
FIXTURE.known_good_quote,
|
||||
"Trailing prose that comes after the quote.",
|
||||
].join(" ");
|
||||
|
||||
vi.mock("@source/index", async (importOriginal) => {
|
||||
const original = await importOriginal<typeof import("@source/index")>();
|
||||
return {
|
||||
...original,
|
||||
ingestPdf: vi.fn(async (_input: unknown, options?: { filename?: string }) => {
|
||||
const documentId = ("doc_test_" + Math.random().toString(36).slice(2, 10)) as DocumentId;
|
||||
const representationId = ("rep_test_" + Math.random().toString(36).slice(2, 10)) as RepresentationId;
|
||||
const document: Document = {
|
||||
id: documentId,
|
||||
mediaType: "application/pdf",
|
||||
...(options?.filename ? { title: options.filename } : {}),
|
||||
fingerprint: "synthetic-fingerprint-for-test",
|
||||
createdAt: "2026-05-25T00:00:00.000Z",
|
||||
updatedAt: "2026-05-25T00:00:00.000Z",
|
||||
};
|
||||
const representation: DocumentRepresentation = {
|
||||
id: representationId,
|
||||
documentId,
|
||||
representationType: "pdf-text",
|
||||
contentHash: "synthetic-fingerprint-for-test",
|
||||
canonicalText: SYNTHETIC_CANONICAL,
|
||||
pageMap: [{ page: 1, width: 595, height: 842 }],
|
||||
offsetMap: [
|
||||
{
|
||||
page: 1,
|
||||
globalStart: 0,
|
||||
globalEnd: SYNTHETIC_CANONICAL.length,
|
||||
pageLength: SYNTHETIC_CANONICAL.length,
|
||||
},
|
||||
],
|
||||
generatedAt: "2026-05-25T00:00:00.000Z",
|
||||
};
|
||||
return { document, representation };
|
||||
}),
|
||||
};
|
||||
});
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
function resetViewerSnapshot() {
|
||||
viewerSnapshot.pdfUrl = null;
|
||||
viewerSnapshot.storedAnnotationIds = [];
|
||||
viewerSnapshot.scrollToAnnotationId = null;
|
||||
viewerSnapshot.onSelectionCaptured = null;
|
||||
}
|
||||
|
||||
function syntheticCaptureFor(fixturePage: number, text: string): PdfSelectionCapture {
|
||||
return {
|
||||
kind: "pdf",
|
||||
text,
|
||||
page: fixturePage,
|
||||
rects: [{ x: 0.1, y: 0.2, width: 0.4, height: 0.04 }],
|
||||
boundingRect: { x: 0.1, y: 0.2, width: 0.4, height: 0.04 },
|
||||
};
|
||||
}
|
||||
|
||||
async function loadApp() {
|
||||
// Late import so the vi.mock calls take effect before the module graph
|
||||
// pulls in @anchor / @source.
|
||||
const { App } = await import("@app/App");
|
||||
return render(<App />);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Test
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
describe("App — PRD scenario steps 1-8 (CE-WP-0002-T09)", () => {
|
||||
beforeEach(() => {
|
||||
resetViewerSnapshot();
|
||||
// Each test starts with empty localStorage.
|
||||
globalThis.localStorage?.clear();
|
||||
// The fetch isn't reached (ingestPdf is mocked) — but stub it so that
|
||||
// any accidental call returns gracefully instead of TypeError.
|
||||
globalThis.fetch = vi.fn(async () =>
|
||||
new Response(new Uint8Array([0x25, 0x50, 0x44, 0x46]).buffer, {
|
||||
status: 200,
|
||||
headers: { "Content-Type": "application/pdf" },
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
afterEach(() => {
|
||||
vi.restoreAllMocks();
|
||||
});
|
||||
|
||||
it("walks the full slice-1 scenario: load → select → save → reload → click → scroll", async () => {
|
||||
const user = userEvent.setup();
|
||||
|
||||
// Step 1: open the app.
|
||||
const { unmount } = await loadApp();
|
||||
expect(screen.getByText("Collection")).toBeTruthy();
|
||||
|
||||
// Step 2: pick a fixture.
|
||||
const fixtureButton = screen.getByRole("button", { name: new RegExp(FIXTURE.id) });
|
||||
await user.click(fixtureButton);
|
||||
|
||||
// The mock viewer should have mounted with our test URL.
|
||||
await waitFor(() => {
|
||||
expect(viewerSnapshot.pdfUrl).toBeTruthy();
|
||||
});
|
||||
expect(decodeURIComponent(viewerSnapshot.pdfUrl!)).toContain(FIXTURE.filename);
|
||||
|
||||
// Step 3: programmatically inject a selection for the known-good quote.
|
||||
expect(viewerSnapshot.onSelectionCaptured).not.toBeNull();
|
||||
await act(async () => {
|
||||
viewerSnapshot.onSelectionCaptured!(
|
||||
syntheticCaptureFor(FIXTURE.known_good_quote_page, FIXTURE.known_good_quote),
|
||||
[{ type: "TextQuoteSelector", exact: FIXTURE.known_good_quote }],
|
||||
);
|
||||
});
|
||||
|
||||
// The toolbar should appear with the quoted text preview.
|
||||
const toolbar = await screen.findByText(/New annotation/);
|
||||
expect(toolbar).toBeTruthy();
|
||||
|
||||
// Step 4: add a comment and save.
|
||||
const textarea = screen.getByPlaceholderText(/Add a one-line comment/);
|
||||
await user.type(textarea, "Important deadline clause");
|
||||
await user.click(screen.getByRole("button", { name: /Save evidence/ }));
|
||||
|
||||
// Step 5: the item appears in the sidebar. The commentary text is
|
||||
// unique to the right pane (the collection list never echoes it back).
|
||||
await screen.findByText(/Important deadline clause/);
|
||||
|
||||
// The mock viewer should now know about the stored annotation.
|
||||
await waitFor(() => {
|
||||
expect(viewerSnapshot.storedAnnotationIds.length).toBe(1);
|
||||
});
|
||||
const savedAnnotationId = viewerSnapshot.storedAnnotationIds[0]!;
|
||||
|
||||
// Snapshot key from EngineContext.STORAGE_KEY — implementation detail
|
||||
// but worth asserting once at the integration layer.
|
||||
const stored = globalThis.localStorage?.getItem("citation-evidence:engine-snapshot:v1");
|
||||
expect(stored).toBeTruthy();
|
||||
|
||||
// Step 6: reload — unmount and remount the App. The same localStorage is
|
||||
// still in place because we did not clear it.
|
||||
unmount();
|
||||
resetViewerSnapshot();
|
||||
await loadApp();
|
||||
|
||||
// The viewer should re-mount automatically because the active document
|
||||
// was persisted.
|
||||
await waitFor(() => {
|
||||
expect(viewerSnapshot.pdfUrl).toBeTruthy();
|
||||
});
|
||||
expect(decodeURIComponent(viewerSnapshot.pdfUrl!)).toContain(FIXTURE.filename);
|
||||
|
||||
// The sidebar should show the restored item.
|
||||
const restoredItem = await screen.findByText(/Important deadline clause/);
|
||||
|
||||
// The viewer must already know about the restored annotation.
|
||||
expect(viewerSnapshot.storedAnnotationIds).toContain(savedAnnotationId);
|
||||
|
||||
// Step 7: click the evidence item in the sidebar. The commentary text
|
||||
// is rendered inside the sidebar's <button>; walking up to its closest
|
||||
// button ancestor and clicking it triggers the activate flow.
|
||||
const sidebarButton = restoredItem.closest("button");
|
||||
expect(sidebarButton).not.toBeNull();
|
||||
await user.click(sidebarButton!);
|
||||
|
||||
// Step 8: the viewer was asked to scroll to the right annotation.
|
||||
await waitFor(() => {
|
||||
expect(viewerSnapshot.scrollToAnnotationId).toBe(savedAnnotationId);
|
||||
});
|
||||
});
|
||||
});
|
||||
@@ -34,6 +34,6 @@
|
||||
"@app/*": ["src/app/*"]
|
||||
}
|
||||
},
|
||||
"include": ["src", "vite.config.ts", "vitest.config.ts"],
|
||||
"include": ["src", "tests", "vite.config.ts", "vitest.config.ts"],
|
||||
"exclude": ["node_modules", "dist"]
|
||||
}
|
||||
|
||||
16
vitest.config.ts
Normal file
16
vitest.config.ts
Normal file
@@ -0,0 +1,16 @@
|
||||
import { defineConfig } from "vitest/config";
|
||||
import viteConfig from "./vite.config";
|
||||
|
||||
// Use the same resolve aliases as the app; pick a DOM environment for tests
|
||||
// whose filename ends in `.dom.test.{ts,tsx}` so we can mount React via
|
||||
// @testing-library/react. All other tests run in Node, which is faster and
|
||||
// has full pdfjs-dist legacy-worker support.
|
||||
export default defineConfig({
|
||||
...viteConfig,
|
||||
test: {
|
||||
environmentMatchGlobs: [
|
||||
["**/*.dom.test.{ts,tsx}", "happy-dom"],
|
||||
],
|
||||
globals: false,
|
||||
},
|
||||
});
|
||||
@@ -8,7 +8,7 @@ repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
|
||||
topic_slug: citation_evidence_mvp
|
||||
topic_id: 96fa8e80-9f74-40f2-84cd-644e9747b9ec
|
||||
state_hub_workstream_id: 19cb420b-c262-4c0e-afab-e85946b2cfce
|
||||
status: todo
|
||||
status: done
|
||||
owner: Bernd
|
||||
created: 2026-05-24
|
||||
updated: 2026-05-24
|
||||
@@ -129,7 +129,7 @@ and proposed alternative. Do not proceed with T03+.
|
||||
id: CE-WP-0002-T03
|
||||
state_hub_task_id: 01dad096-3521-42b9-aed9-ce0b2f5d3450
|
||||
priority: high
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T02]
|
||||
```
|
||||
|
||||
@@ -152,7 +152,7 @@ extracted canonical text must contain the manifest's known-good quote.
|
||||
id: CE-WP-0002-T04
|
||||
state_hub_task_id: 62e4839a-8026-4e15-b4cc-6685e56b3584
|
||||
priority: high
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T01, T03]
|
||||
```
|
||||
|
||||
@@ -183,7 +183,7 @@ confidence ≥ 0.9.
|
||||
id: CE-WP-0002-T05
|
||||
state_hub_task_id: b339a73a-6b58-471c-a01d-e769ea414ee7
|
||||
priority: high
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T01]
|
||||
```
|
||||
|
||||
@@ -206,7 +206,7 @@ No persistence to disk yet. ADR-0005 (persistence) is still pending.
|
||||
id: CE-WP-0002-T06
|
||||
state_hub_task_id: f400e133-6ec6-4d5a-98a0-a6408ca4125e
|
||||
priority: high
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T02, T05]
|
||||
```
|
||||
|
||||
@@ -232,7 +232,7 @@ note in ADR-0001 if it wasn't already.
|
||||
id: CE-WP-0002-T07
|
||||
state_hub_task_id: 26346a07-bf98-4d43-8b30-de2038ab72f8
|
||||
priority: high
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T04, T05, T06]
|
||||
```
|
||||
|
||||
@@ -254,7 +254,7 @@ Active state lives in a single React context for now; no Redux/Zustand.
|
||||
id: CE-WP-0002-T08
|
||||
state_hub_task_id: 469e3fb4-1b42-49a7-88dc-29a6d5055ef5
|
||||
priority: critical
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T04, T06, T07]
|
||||
```
|
||||
|
||||
@@ -276,7 +276,7 @@ Critically, this must also work **after a page reload**. Persistence to
|
||||
id: CE-WP-0002-T09
|
||||
state_hub_task_id: 77423e57-f2c5-42e1-9e6c-c9b6fa35dfcf
|
||||
priority: high
|
||||
status: todo
|
||||
status: done
|
||||
depends_on: [T07, T08]
|
||||
```
|
||||
|
||||
|
||||
Reference in New Issue
Block a user