citation-evidence/workplans/CE-WP-0002-pdf-review-slice.md
tegwick d54daf2e61 Implement CE-WP-0002 T03-T09: ingest, anchor resolution, engine, UI, persistence, e2e
Completes the PDF review slice end-to-end. After this commit a user can
open a fixture, select text, save an evidence item with commentary, see
it in the sidebar, reload the page, click the item, and the viewer
scrolls to the passage.

- T03 src/source/pdf/{fingerprint,extract,ingest}.ts + 39 fixture tests
  - SHA-256 fingerprint over a fresh ArrayBuffer (TS BufferSource-safe)
  - PDF.js text extract; per-page normalize then join with "\n\n"
  - PageMap + OffsetMap (gap-free coverage); pageLength = end - start
  - Updated manifest's Betriebskosten quote to one PDF.js extracts cleanly
- T04 src/anchor/selectors/{create,resolve}.ts + 25 unit + 7 fixture tests
  - createSelectors emits the maximal redundant set (TextQuote +
    TextPosition + PdfRect + PdfPageText when available)
  - resolveSelectors implements the SharedContracts §7 ladder; confidence
    1.0 (pos+quote) → 0.7 (rect-only) → 0 (unresolved)
  - Cross-module integration test moved to tests/integration/ to honor
    the anchor↛source boundary lint rule
- T05 engine: sync event bus over the closed §4 vocabulary, Map-backed
  repos, services, createEngine() composition root, 12 tests
- T06 work + app: three-pane shell (CollectionList | ViewerShell |
  EvidenceSidebar) wired through EngineProvider; EngineContext lives in
  src/work/ to respect the work↛app boundary; SpikeApp deleted
- T07 AnnotationToolbar: pendingSelection in context; Save runs
  createSelectors → engine.annotations.create → engine.evidence.create
- T08 click-to-reopen + localStorage persistence
  - scrollToAnnotation state in context with a version counter so a
    second click on the same item re-fires the viewer scroll
  - captureSnapshot/restoreSnapshot/attachPersister/restoreFromStorage;
    restore bypasses services to avoid event-loops
  - active-document id persisted alongside the snapshot so reload lands
    on the same fixture; ADR-0005 written
  - 9 persistence tests
- T09 tests/integration/app-prd-scenario.dom.test.tsx
  - end-to-end happy-dom test of PRD scenario steps 1-8 through the real
    React tree; viewer + ingest mocked per ADR-0004's headless-Chromium
    limitation. Fixed memo-deps bug in EvidenceSidebar/ViewerShell where
    useEngineEventTick values were not included in the useMemo deps,
    leaving stale memoization across event-driven re-renders
- vitest.config.ts: happy-dom for *.dom.test.{ts,tsx} files
- noEmit added to tsconfig so tsc -b doesn't litter src/ with .js outputs

Gates: typecheck ✓ lint ✓ test 109/109 across 11 files ✓ build ✓

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 10:58:11 +02:00


---
id: CE-WP-0002
type: workplan
title: "PDF review slice — engine types, anchor, source, viewer, sidebar, click-to-reopen"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
topic_slug: citation_evidence_mvp
topic_id: 96fa8e80-9f74-40f2-84cd-644e9747b9ec
state_hub_workstream_id: 19cb420b-c262-4c0e-afab-e85946b2cfce
status: done
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
depends_on_workplan: CE-WP-0001
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
---
# CE-WP-0002 — PDF Review Slice
The first vertical product slice. After this workplan, a user can:

1. Open the app and see a collection of fixture PDFs.
2. Open one PDF in a viewer.
3. Select text, add a one-line comment, and save it as an evidence item.
4. See the evidence item appear in a sidebar.
5. Click the evidence item and have the PDF jump to and highlight the
   passage, even after a full page reload.

No forms, no Markdown/HTML, no recovery, no export. Those come later.

This workplan exercises the riskiest architectural assumption (PDF selector
round-trip with viewer independence) on the simplest possible feature set.
## Risk-driven order
T01 and T02 are the spike from the assessment: prove that the
`react-pdf-highlighter-plus` integration can store and reload selectors
without leaking viewer types into engine code. If that fails, the rest of
the workplan stops and a new ADR superseding ADR-0004 (PDF viewer choice)
is required.
## Dependency Order
```
T01 (engine types: Document, Representation, Annotation, Selector, EvidenceItem)
└─ T02 (PDF viewer adapter spike: store + reload selectors as JSON)
   └─ T03 (evidence-source: PDF ingest, fingerprint, canonical text)
      └─ T04 (evidence-anchor: TextQuote + TextPosition resolution against representation)
         └─ T05 (in-memory repositories + engine services)
            └─ T06 (citation-work UI: collection list + viewer shell + sidebar)
               └─ T07 (annotation create flow)
                  └─ T08 (click-to-reopen flow)
                     └─ T09 (end-to-end test of PRD scenario steps 1-4)
```
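
The dependency order above lists T01 through T09 in sequence. A quick way to confirm that sequence respects every `depends_on` field in the task blocks is a position check like this TypeScript sketch (`firstViolation` is a hypothetical helper, not part of the repo). It passes for the listed order, and the data also shows the sequence is stricter than the declarations require: T05 depends only on T01, so it could proceed alongside T02 to T04.

```typescript
// depends_on lists copied from the task blocks below (T01 declares none).
const dependsOn: Record<string, readonly string[]> = {
  T01: [],
  T02: ["T01"],
  T03: ["T02"],
  T04: ["T01", "T03"],
  T05: ["T01"],
  T06: ["T02", "T05"],
  T07: ["T04", "T05", "T06"],
  T08: ["T04", "T06", "T07"],
  T09: ["T07", "T08"],
};

/**
 * Returns the first task (in declaration order of `deps`) that is missing from
 * `order` or scheduled before one of its prerequisites; null if `order` is valid.
 */
function firstViolation(
  order: readonly string[],
  deps: Record<string, readonly string[]>,
): string | null {
  const position = new Map<string, number>();
  order.forEach((id, index) => position.set(id, index));
  for (const [task, prerequisites] of Object.entries(deps)) {
    const at = position.get(task);
    if (at === undefined) return task;
    for (const prerequisite of prerequisites) {
      const prerequisiteAt = position.get(prerequisite);
      if (prerequisiteAt === undefined || prerequisiteAt > at) return task;
    }
  }
  return null;
}

const linearOrder = ["T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08", "T09"];
console.log(firstViolation(linearOrder, dependsOn)); // null
```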
---
## T01 — Engine types in `src/shared/`
```task
id: CE-WP-0002-T01
state_hub_task_id: b015c082-4272-407d-b6e4-9e1bd97f0193
priority: critical
status: done
```
Translate the type definitions in `wiki/SharedContracts.md` §1 and §3 into
TypeScript under `src/shared/`:

- `src/shared/document.ts`: `Document`, `DocumentRepresentation`, `PageMap`,
  `OffsetMap`
- `src/shared/selector.ts`: `Selector` discriminated union with at minimum
  `TextQuoteSelector`, `TextPositionSelector`, `PdfRectSelector`, and
  `PdfPageTextSelector`. Other selector kinds are defined as `never`-typed
  stubs for now.
- `src/shared/annotation.ts`: `Annotation` with `selectors`, `quote`,
  `note`, `normalizeVersion`
- `src/shared/evidence.ts`: `EvidenceItem`, plus the `EvidenceItem.status`
  enum per §2.2
- `src/shared/ids.ts`: branded ID types and a `newId(prefix)` helper

No services, no behavior: pure data shapes plus the ID helper.

Add JSDoc on each type pointing at the §-reference in
`wiki/SharedContracts.md` it implements.

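
A minimal sketch of the T01 shapes, assuming field names modelled on the W3C Web Annotation `TextQuoteSelector`/`TextPositionSelector` (`exact`/`prefix`/`suffix`, `start`/`end`); the PDF selector fields, the `XPathSelector` stub, the `Brand` helper, and `newAnnotationId` are hypothetical. It shows the two ideas in the task list: a discriminated union whose `type` tag lets TypeScript check exhaustiveness, and branded IDs that are plain strings at runtime but not interchangeable at compile time. This `newId` uses `Math.random` only to stay dependency-free; a real helper would more likely use `crypto.randomUUID()`.

```typescript
// Hypothetical sketch of the src/shared/ shapes. Field names are assumptions;
// wiki/SharedContracts.md §1 and §3 remain the authoritative definitions.

/** A string that is not assignable across ID kinds at compile time. */
type Brand<B extends string> = string & { readonly __brand: B };
type AnnotationId = Brand<"AnnotationId">;

/** Prefixed, reasonably unique ID (illustrative only; not collision-proof). */
function newId(prefix: string): string {
  const random = Math.random().toString(36).slice(2, 10);
  return `${prefix}_${Date.now().toString(36)}${random}`;
}
const newAnnotationId = (): AnnotationId => newId("ann") as AnnotationId;

interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive offset into the canonical text
  end: number; // exclusive
}
interface PdfRectSelector {
  type: "PdfRectSelector";
  page: number; // 1-based
  rects: { x: number; y: number; width: number; height: number }[];
}
interface PdfPageTextSelector {
  type: "PdfPageTextSelector";
  page: number;
  start: number; // offsets within the page's text
  end: number;
}
/** Stub for a selector kind that is not implemented yet (hypothetical name). */
type XPathSelector = never;

type Selector =
  | TextQuoteSelector
  | TextPositionSelector
  | PdfRectSelector
  | PdfPageTextSelector
  | XPathSelector; // `| never` adds nothing to the union

interface Annotation {
  id: AnnotationId;
  selectors: Selector[];
  quote: string;
  note: string;
  normalizeVersion: number;
}

function describeSelector(s: Selector): string {
  switch (s.type) {
    case "TextQuoteSelector":
      return `quote "${s.exact}"`;
    case "TextPositionSelector":
      return `chars ${s.start}-${s.end}`;
    case "PdfRectSelector":
      return `page ${s.page}, ${s.rects.length} rect(s)`;
    case "PdfPageTextSelector":
      return `page ${s.page}, chars ${s.start}-${s.end}`;
    default: {
      // Compile error here if a new selector kind is added without a case.
      const unreachable: never = s;
      return unreachable;
    }
  }
}
```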
---
## T02 — PDF viewer adapter spike
```task
id: CE-WP-0002-T02
state_hub_task_id: 59846d9e-7ac1-4306-b02e-0980a52f44c8
priority: critical
status: done
depends_on: [T01]
```
**This is the architectural spike.** Build a throwaway
`src/anchor/pdf-viewer-adapter-spike.tsx` that:

1. Loads `fixtures/pdfs/simple.pdf` using `react-pdf-highlighter-plus`
   (assumed; if a better library appears, document it in ADR-0004 before
   committing).
2. Lets the user select text and produces selectors in the `T01` shapes.
3. Serializes the selectors to a JSON blob in `localStorage`.
4. On reload, reads the blob, asks the adapter to resolve the selectors,
   scrolls to the passage, and renders a highlight.

Success criteria:

- Reload-and-resolve works for all fixture PDFs.
- No PDF.js or `react-pdf-highlighter-plus` types appear in any file under
  `src/shared/` or `src/engine/`.
- The adapter's public surface matches the contract in
  `wiki/SharedContracts.md` §5.

If any success criterion fails, stop. Write a short note in
`docs/decisions/ADR-0004-pdf-viewer-library.md` describing the failure mode
and the proposed alternative. Do not proceed with T03+.

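
The storage step of the spike needs nothing viewer-specific: only T01-shaped JSON crosses `localStorage`, which is exactly what the independence criterion checks. A minimal sketch, assuming a trimmed two-kind `Selector` union and a hypothetical storage key; `KeyValueStore` is the subset of the DOM `Storage` interface in use, so `window.localStorage` satisfies it, while the in-memory store stands in for tests. On reload, unknown or malformed entries are dropped rather than trusted.

```typescript
type Selector =
  | { type: "TextQuoteSelector"; exact: string; prefix: string; suffix: string }
  | { type: "TextPositionSelector"; start: number; end: number };

/** Subset of the DOM Storage interface; window.localStorage fits it. */
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

/** In-memory store for tests and non-browser runs. */
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const STORAGE_KEY = "ce-spike-selectors"; // hypothetical key

function saveSelectors(store: KeyValueStore, selectors: readonly Selector[]): void {
  store.setItem(STORAGE_KEY, JSON.stringify(selectors));
}

/** Runtime check: parsed JSON is untyped, so validate before trusting it. */
function isSelector(value: unknown): value is Selector {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  switch (v.type) {
    case "TextQuoteSelector":
      return typeof v.exact === "string" && typeof v.prefix === "string" && typeof v.suffix === "string";
    case "TextPositionSelector":
      return typeof v.start === "number" && typeof v.end === "number";
    default:
      return false;
  }
}

/** Returns the stored selectors, or [] if nothing valid is stored. */
function loadSelectors(store: KeyValueStore): Selector[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter(isSelector) : [];
  } catch {
    return []; // corrupt blob: treat as empty
  }
}
```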
---
## T03 — `src/source/`: PDF ingest, fingerprint, canonical text
```task
id: CE-WP-0002-T03
state_hub_task_id: 01dad096-3521-42b9-aed9-ce0b2f5d3450
priority: high
status: done
depends_on: [T02]
```
Implement under `src/source/pdf/`:

- `ingest.ts`: `ingestPdf(file: File | Buffer): Promise<{ document: Document; representation: DocumentRepresentation }>`
- `fingerprint.ts`: stable SHA-256 of the file bytes
- `extract.ts`: uses PDF.js to extract page text, runs `normalize()` from
  `CE-WP-0001-T04` over the canonical text, and builds the `PageMap` and
  `OffsetMap` defined for `DocumentRepresentation` in `src/shared/document.ts`

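
The `extract.ts` bullet can be sketched as follows, assuming the per-page strings have already been pulled from PDF.js (for example by joining the `str` fields of the items returned by `page.getTextContent()`). `normalize` here is only a stand-in for the CE-WP-0001-T04 function, and attributing each `\n\n` separator to the preceding page, so that the page ranges tile the text without gaps, is an assumption of this sketch rather than the documented `PageMap` layout.

```typescript
/** One page's half-open range [start, end) in the canonical text. */
interface PageRange {
  page: number; // 1-based
  start: number;
  end: number;
}
interface CanonicalText {
  text: string;
  pages: PageRange[];
}

const PAGE_SEPARATOR = "\n\n";

// Stand-in for normalize() from CE-WP-0001-T04 (assumed: NFC plus collapsed
// whitespace); the real rules may differ.
function normalize(s: string): string {
  return s.normalize("NFC").replace(/\s+/g, " ").trim();
}

/**
 * Normalizes each page, joins pages with PAGE_SEPARATOR, and records page
 * ranges. Each separator is attributed to the page before it, so the ranges
 * tile [0, text.length) with no gaps.
 */
function buildCanonicalText(pageTexts: readonly string[]): CanonicalText {
  const parts = pageTexts.map(normalize);
  const pages: PageRange[] = [];
  let offset = 0;
  parts.forEach((part, i) => {
    const isLast = i === parts.length - 1;
    const length = part.length + (isLast ? 0 : PAGE_SEPARATOR.length);
    pages.push({ page: i + 1, start: offset, end: offset + length });
    offset += length;
  });
  return { text: parts.join(PAGE_SEPARATOR), pages };
}

/** Maps a canonical-text offset back to its 1-based page number. */
function pageForOffset(canonical: CanonicalText, offset: number): number | undefined {
  return canonical.pages.find((p) => offset >= p.start && offset < p.end)?.page;
}
```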
Tests use the fixture corpus from `CE-WP-0001-T05`. For each fixture, the
extracted canonical text must contain the manifest's known-good quote.

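
For `fingerprint.ts`, the Web Crypto API already provides SHA-256. A minimal sketch, assuming the `crypto.subtle` global (browsers; Node 19+) and a hypothetical `fingerprint` name. The bytes are copied into a fresh `ArrayBuffer` first, because a `Uint8Array`'s underlying `.buffer` can be larger than the view (or be typed as possibly shared), and the digest must cover exactly the file bytes. A Node `Buffer` is a `Uint8Array` subclass, and a browser `File` can be read with `new Uint8Array(await file.arrayBuffer())`, so both `ingestPdf` inputs fit this signature.

```typescript
/** Lowercase hex SHA-256 of exactly the given bytes. */
async function fingerprint(bytes: Uint8Array): Promise<string> {
  // Copy the view's bytes into a fresh, exactly-sized ArrayBuffer so the
  // digest never sees bytes outside the view and the type is a plain BufferSource.
  const copy = new ArrayBuffer(bytes.byteLength);
  new Uint8Array(copy).set(bytes);
  const digest = await crypto.subtle.digest("SHA-256", copy);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}
```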
---
## T04 — `src/anchor/`: TextQuote and TextPosition resolution
```task
id: CE-WP-0002-T04
state_hub_task_id: 62e4839a-8026-4e15-b4cc-6685e56b3584
priority: high
status: done
depends_on: [T01, T03]
```
Implement under `src/anchor/`:

- `selectors/create.ts`: given a `SelectionCapture` from the adapter, build
  the maximal set of available selectors (always a `TextQuoteSelector` with
  prefix/suffix; a `TextPositionSelector` when the representation provides
  offsets; PDF rect/text selectors when on a PDF)
- `selectors/resolve.ts`: implements the resolution strategy from
  `wiki/ArchitectureOverview.md` §7 (try the position, verify the quote,
  fall back to quote + prefix/suffix, return an `AnchorResolution`)
- `selectors/types.ts`: `AnchorResolution`, `SelectionCapture`,
  `ResolvedAnchorTarget`

Fuzzy matching is out of scope here: return `unresolved` if both exact and
prefix/suffix matching fail. Fuzzy matching belongs to a later workplan.

Unit tests use the fixtures: for each fixture + known-quote pair, create
selectors and immediately resolve them; resolution must succeed with
confidence ≥ 0.9.

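
The create-then-resolve round trip that the unit tests describe can be sketched for the text selectors alone. Assumptions: the function names, the `AnchorResolution` shape, the 32-character context window, and the 0.9 confidence for the quote fallback are hypothetical (only the 1.0 position-plus-quote case matches the commit notes), and PDF rect and page-text selectors are omitted. The ladder is: trust the stored position only if it still contains the quote; otherwise accept a single exact match whose neighbours equal the stored prefix and suffix; otherwise report `unresolved`. An ambiguous match is treated as unresolved rather than guessed.

```typescript
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}
type Selector = TextQuoteSelector | TextPositionSelector;

type AnchorResolution =
  | { status: "resolved"; start: number; end: number; confidence: number; via: "position" | "quote+context" }
  | { status: "unresolved"; confidence: 0 };

const QUOTE_CONFIDENCE = 0.9; // assumed value for the quote fallback, not from the source

/** Builds a quote selector (with up to `context` chars of prefix/suffix) and a position selector. */
function createTextSelectors(text: string, start: number, end: number, context = 32): Selector[] {
  return [
    {
      type: "TextQuoteSelector",
      exact: text.slice(start, end),
      prefix: text.slice(Math.max(0, start - context), start),
      suffix: text.slice(end, end + context),
    },
    { type: "TextPositionSelector", start, end },
  ];
}

function resolveTextSelectors(text: string, selectors: readonly Selector[]): AnchorResolution {
  const quote = selectors.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector");
  const position = selectors.find((s): s is TextPositionSelector => s.type === "TextPositionSelector");
  if (!quote || quote.exact.length === 0) return { status: "unresolved", confidence: 0 };

  // 1. Try the stored position and verify it still holds the quoted text.
  if (position && text.slice(position.start, position.end) === quote.exact) {
    return { status: "resolved", start: position.start, end: position.end, confidence: 1, via: "position" };
  }

  // 2. Fall back to exact-quote matches whose surroundings equal prefix/suffix.
  const candidates: number[] = [];
  for (let i = text.indexOf(quote.exact); i !== -1; i = text.indexOf(quote.exact, i + 1)) {
    if (text.endsWith(quote.prefix, i) && text.startsWith(quote.suffix, i + quote.exact.length)) {
      candidates.push(i);
    }
  }
  const start = candidates.length === 1 ? candidates[0] : undefined;
  if (start !== undefined) {
    const end = start + quote.exact.length;
    return { status: "resolved", start, end, confidence: QUOTE_CONFIDENCE, via: "quote+context" };
  }

  // 3. No match or an ambiguous one; fuzzy matching is out of scope.
  return { status: "unresolved", confidence: 0 };
}
```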
---
## T05 — In-memory repositories + engine services
```task
id: CE-WP-0002-T05
state_hub_task_id: b339a73a-6b58-471c-a01d-e769ea414ee7
priority: high
status: done
depends_on: [T01]
```
Under `src/engine/`:

- `repos/in-memory.ts`: `Map`-backed implementations of
  `DocumentRepository`, `AnnotationRepository`, and `EvidenceItemRepository`
- `services/documents.ts`, `services/annotations.ts`, `services/evidence.ts`:
  a thin orchestration layer that creates IDs, calls the repos, and emits
  the events from `wiki/SharedContracts.md` §4
- `events/bus.ts`: minimal pub/sub, synchronous for MVP

No persistence to disk yet; ADR-0005 (persistence) is still pending.

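
A minimal sketch of the T05 layering, with hypothetical event names standing in for the §4 vocabulary and only the evidence repository shown: the repository only stores, while the service creates the ID, writes through the repository, and then emits. Because the bus is synchronous, every subscriber has run before `create` returns, and a subscriber that re-reads the repository already sees the new item.

```typescript
// Hypothetical event names; the real closed vocabulary is SharedContracts.md §4.
type EngineEvent =
  | { type: "evidence.created"; id: string }
  | { type: "evidence.deleted"; id: string };

type Listener = (event: EngineEvent) => void;

/** Minimal synchronous pub/sub: every listener has run before emit() returns. */
class EventBus {
  private readonly listeners = new Set<Listener>();

  /** Registers a listener and returns its unsubscribe function. */
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(event: EngineEvent): void {
    // Iterate over a copy so listeners added during emit() don't see this event.
    for (const listener of Array.from(this.listeners)) listener(event);
  }
}

interface EvidenceItem {
  id: string;
  annotationId: string;
  commentary: string;
}

/** Map-backed repository: plain storage, no IDs and no events. */
class InMemoryEvidenceRepository {
  private readonly items = new Map<string, EvidenceItem>();
  get(id: string): EvidenceItem | undefined {
    return this.items.get(id);
  }
  put(item: EvidenceItem): void {
    this.items.set(item.id, item);
  }
  list(): EvidenceItem[] {
    return Array.from(this.items.values());
  }
}

/** Thin service: creates the ID, writes through the repo, then emits. */
class EvidenceService {
  private readonly repo: InMemoryEvidenceRepository;
  private readonly bus: EventBus;
  private readonly newId: () => string;

  constructor(repo: InMemoryEvidenceRepository, bus: EventBus, newId: () => string) {
    this.repo = repo;
    this.bus = bus;
    this.newId = newId;
  }

  create(annotationId: string, commentary: string): EvidenceItem {
    const item: EvidenceItem = { id: this.newId(), annotationId, commentary };
    this.repo.put(item);
    this.bus.emit({ type: "evidence.created", id: item.id });
    return item;
  }
}
```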
---
## T06 — `src/work/`: collection list + viewer shell + sidebar
```task
id: CE-WP-0002-T06
state_hub_task_id: f400e133-6ec6-4d5a-98a0-a6408ca4125e
priority: high
status: done
depends_on: [T02, T05]
```
Under `src/work/` and `src/app/`:

- `src/app/App.tsx`: three-pane layout per `wiki/ArchitectureOverview.md`
  §12.1, with the collection list (left), viewer (centre), and evidence
  sidebar (right)
- `src/work/CollectionList.tsx`: lists the `fixtures/pdfs/manifest.json`
  entries; click one to load it
- `src/work/ViewerShell.tsx`: hosts the viewer adapter from T02 behind a thin
  wrapper; the adapter API is the only viewer surface `work/` uses
- `src/work/EvidenceSidebar.tsx`: lists the evidence items for the current
  document, showing quote, commentary, and status

No styling beyond minimum legibility. Use Tailwind or vanilla CSS; pick one
and record the choice in ADR-0001 if it is not already there.

---
## T07 — Annotation create flow
```task
id: CE-WP-0002-T07
state_hub_task_id: 26346a07-bf98-4d43-8b30-de2038ab72f8
priority: high
status: done
depends_on: [T04, T05, T06]
```
Wire selection → annotation → evidence item:

1. User selects text in the viewer.
2. A small toolbar appears with a comment input and a Save button.
3. On Save: the adapter produces a `SelectionCapture` → anchor creates
   `Selector[]` → engine creates an `Annotation` → engine creates an
   `EvidenceItem` with the commentary → the sidebar updates.

Active state lives in a single React context for now; no Redux/Zustand.

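
The Save handler reduces to three calls in a fixed order. In this sketch `SelectionCapture`'s fields, `EngineFacade`, and `saveSelection` are hypothetical shapes; selector creation is injected so the handler stays independent of the anchor implementation, and the handler never touches the sidebar directly, leaving that to the engine's events.

```typescript
// All shapes below are hypothetical; only the call order comes from T07.
interface SelectionCapture {
  documentId: string;
  text: string; // the selected text
  start: number; // offsets into the canonical text
  end: number;
}
type Selector =
  | { type: "TextQuoteSelector"; exact: string; prefix: string; suffix: string }
  | { type: "TextPositionSelector"; start: number; end: number };

interface EngineFacade {
  annotations: {
    create(input: { documentId: string; selectors: Selector[]; quote: string }): { id: string };
  };
  evidence: {
    create(input: { annotationId: string; commentary: string }): { id: string };
  };
}

/**
 * Save handler: selectors first, then the annotation, then the evidence item.
 * The sidebar is not updated here; it re-renders from the engine's events.
 */
function saveSelection(
  engine: EngineFacade,
  capture: SelectionCapture,
  commentary: string,
  createSelectors: (capture: SelectionCapture) => Selector[],
): { annotationId: string; evidenceId: string } {
  const selectors = createSelectors(capture);
  const annotation = engine.annotations.create({
    documentId: capture.documentId,
    selectors,
    quote: capture.text,
  });
  const evidence = engine.evidence.create({ annotationId: annotation.id, commentary });
  return { annotationId: annotation.id, evidenceId: evidence.id };
}
```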
---
## T08 — Click-to-reopen flow
```task
id: CE-WP-0002-T08
state_hub_task_id: 469e3fb4-1b42-49a7-88dc-29a6d5055ef5
priority: critical
status: done
depends_on: [T04, T06, T07]
```
Implement the round trip:

1. User clicks an evidence item in the sidebar.
2. Engine loads the annotation → anchor resolves selectors against the
   current representation → adapter scrolls to and highlights the target.

Critically, this must also work **after a page reload**. Persistence to
`localStorage` is acceptable for MVP (record explicitly in
`ADR-0005-persistence.md` that real persistence is deferred).

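
A sketch of the reload path, assuming hypothetical snapshot fields and storage key (the function names echo the commit notes, but their signatures here are assumptions). The snapshot carries annotations, evidence items, and the active document id; restore writes straight into the repository maps instead of going through services, so no events fire and a save-on-change persister cannot loop. The scroll request carries a version counter so that clicking the same evidence item twice still produces a new value for the viewer to react to.

```typescript
// Hypothetical shapes and key names, for illustration only.
interface Annotation {
  id: string;
  documentId: string;
  selectors: unknown[];
}
interface EvidenceItem {
  id: string;
  annotationId: string;
  commentary: string;
}
interface Snapshot {
  schemaVersion: 1; // hypothetical; lets a future format be rejected cleanly
  activeDocumentId: string | null;
  annotations: Annotation[];
  evidence: EvidenceItem[];
}
/** Subset of the DOM Storage interface; window.localStorage fits it. */
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SNAPSHOT_KEY = "citation-evidence:snapshot"; // hypothetical key

function captureSnapshot(
  annotations: ReadonlyMap<string, Annotation>,
  evidence: ReadonlyMap<string, EvidenceItem>,
  activeDocumentId: string | null,
): Snapshot {
  return {
    schemaVersion: 1,
    activeDocumentId,
    annotations: Array.from(annotations.values()),
    evidence: Array.from(evidence.values()),
  };
}

function saveSnapshot(store: KeyValueStore, snapshot: Snapshot): void {
  store.setItem(SNAPSHOT_KEY, JSON.stringify(snapshot));
}

/**
 * Writes straight into the repository maps instead of calling services, so
 * restoring emits no events. Returns the persisted active-document id, or
 * undefined if nothing usable was stored.
 */
function restoreSnapshot(
  store: KeyValueStore,
  annotations: Map<string, Annotation>,
  evidence: Map<string, EvidenceItem>,
): { activeDocumentId: string | null } | undefined {
  const raw = store.getItem(SNAPSHOT_KEY);
  if (raw === null) return undefined;
  let parsed: Partial<Snapshot> | null;
  try {
    parsed = JSON.parse(raw) as Partial<Snapshot> | null;
  } catch {
    return undefined;
  }
  if (!parsed || parsed.schemaVersion !== 1 || !Array.isArray(parsed.annotations) || !Array.isArray(parsed.evidence)) {
    return undefined;
  }
  for (const a of parsed.annotations) annotations.set(a.id, a);
  for (const e of parsed.evidence) evidence.set(e.id, e);
  return { activeDocumentId: parsed.activeDocumentId ?? null };
}

/** Scroll request with a version counter: a repeat click still changes state. */
interface ScrollRequest {
  annotationId: string;
  version: number;
}
function nextScrollRequest(previous: ScrollRequest | null, annotationId: string): ScrollRequest {
  return { annotationId, version: (previous?.version ?? 0) + 1 };
}
```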
---
## T09 — End-to-end test of PRD scenario steps 1-4
```task
id: CE-WP-0002-T09
state_hub_task_id: 77423e57-f2c5-42e1-9e6c-c9b6fa35dfcf
priority: high
status: done
depends_on: [T07, T08]
```
Write a Playwright (or similar) E2E test that:
1. Opens the app.
2. Picks `simple.pdf`.
3. Programmatically selects the known-good quote from the manifest.
4. Saves an evidence item with a comment.
5. Verifies the item appears in the sidebar.
6. Reloads the page.
7. Clicks the evidence item.
8. Verifies the highlight is rendered on the expected page.

This is the contract for "MVP slice 1 works". If it passes, CE-WP-0003 may
begin.