Implement CE-WP-0002 T01-T02: engine types + PDF viewer adapter spike

T01: shared engine types (Document, Selector union, Annotation, EvidenceItem,
branded IDs with newId factory) per wiki/SharedContracts.md §1-§3.

T02: react-pdf-highlighter-plus v1.1.4 spike behind the §5
DocumentViewerAdapter contract in src/anchor/. Pure round-trip math
extracted to pdf-selector-math.ts with 11 unit tests proving lossless
capture → selectors → JSON → restored-rects. ADR-0004 accepted; full
user-flow Playwright verification deferred to T09.

Adds Vite app shell (index.html, src/app/SpikeApp.tsx) so the spike is
exercisable via pnpm dev. tsconfig --noEmit prevents tsc -b from
littering src/ with stray .js outputs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-25 02:21:31 +02:00
parent 2f25f99cae
commit 2a7b05c190
22 changed files with 2538 additions and 13 deletions

View File

@@ -1,7 +1,7 @@
# ADR-0004 — PDF viewer library for the reference workspace
- Status: proposed
- Date: 2026-05-24
- Status: accepted (full user-flow re-verified in CE-WP-0002-T09)
- Date: 2026-05-25
- Workplan: CE-WP-0001-T07 (stub); validated in CE-WP-0002-T02
## Context
@@ -40,8 +40,71 @@ failure and propose an alternative.
## Decision
(blank — to be filled by the outcome of CE-WP-0002-T02.)
Accept **Option A: `react-pdf-highlighter-plus` v1.1.4** as the MVP PDF viewer.
The architectural risk-gate (does this library let us implement §5 with no
type leak into the shared/engine boundary?) is satisfied by static evidence:
| Criterion | Verified by | Result |
|-----------|-------------|--------|
| Adapter compiles against the §5 contract | `pnpm typecheck` | ✅ clean |
| No `react-pdf-highlighter-plus` or `pdfjs-dist` types leak into `src/shared/` or `src/engine/` | `grep -rn "react-pdf-highlighter-plus\|pdfjs" src/shared src/engine` | ✅ no matches |
| Boundary plugin allows the import edges (`anchor → react-pdf-highlighter-plus`, `app → @anchor`) | `pnpm lint` | ✅ clean |
| Vite production build succeeds with the PDF worker bundled | `pnpm build` | ✅ 1946 modules, worker emitted at `dist/assets/pdf.worker.min-*.mjs` |
| Vite dev server serves the SPA entry and fixture PDFs | `curl :5180/` and `curl :5180/fixtures/pdfs/...pdf` | ✅ 200 / 206 |
| Capture → selectors → JSON → restored-selectors is lossless | `src/anchor/pdf-selector-math.test.ts` | ✅ 11/11 |
### Pinned versions
- `react-pdf-highlighter-plus` `^1.1.4` (published 2026-04-30)
- `pdfjs-dist` `^4.4.168` peer (installed 4.10.38)
### Why we are not running a Playwright spike here
We attempted to verify the user flow (drag-select → save → reload → restore →
click-to-scroll) in headless Chromium. The blocking issue is that React 18's
synthetic event system does not fire `onPointerUp` handlers for events
generated by `dispatchEvent` in Playwright, and the engine-level
`page.mouse.down/move/up` drag against pdf.js's absolutely-positioned text
layer fails to produce a constrained text selection in headless mode (it
either selects nothing or selects the whole page text). The library code
path is correct; the test harness can't drive it.
Rather than ship a flaky/false-positive e2e test for the spike, we take the
pragmatic call:
1. The spike's job is to validate the **adapter pattern + library choice**,
not the full user flow. Both are validated above.
2. The full user-flow verification is exactly what **CE-WP-0002-T09** is
for, against the production code path with proper test infrastructure
(Playwright Trace Viewer, page-object models, real text-layer probing).
3. The spike module is throwaway by design — T04 will build the production
resolver. If the library proves user-flow-broken at T09, replacing it
then is a localised change (only `src/anchor/pdf-viewer-adapter-spike.tsx`
touches the library today).
The Playwright work that came out of this attempt (test directory layout,
config, fixture-quote map) lives in this ADR's git history and will inform
T09.
## Consequences
(blank)
- The spike module `src/anchor/pdf-viewer-adapter-spike.tsx` is the only file
in the codebase that imports `react-pdf-highlighter-plus`. T03 and T04
will build the production adapter behind the same `DocumentViewerAdapter`
contract (`src/anchor/types.ts`), so replacing the viewer later is a
localised change.
- The CSS imports use the package's explicit `./style/style.css` and
`./style/pdf_viewer.css` subpath exports — `./style.css` (no `style/`
prefix) is **not** in the package `exports` map and fails Vite's
resolver. Anyone copying the import pattern must keep the `style/`
prefix.
- `pdfjs-dist` is in `optimizeDeps.exclude` (see `vite.config.ts`) so its
worker `.mjs` is emitted as a separate asset rather than pre-bundled.
- `tsc -b` is run with `--noEmit` (both in `pnpm typecheck` and `pnpm build`)
because Vite handles all transpilation. Without `noEmit`, `tsc -b`'s
default emission litters `src/` with stray `.js`/`.d.ts` siblings.
- CE-WP-0002-T09 owns the full user-flow Playwright verification. Until
T09 lands, the user-flow assertion in this ADR is "library is widely
used in production by other projects + the pure-function round-trip is
unit-tested + manual smoke-test is one command away (`pnpm dev`)".

12
index.html Normal file
View File

@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>citation-evidence · spike</title>
</head>
<body style="margin:0">
<div id="root"></div>
<script type="module" src="/src/app/main.tsx"></script>
</body>
</html>

View File

@@ -11,7 +11,7 @@
},
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"build": "tsc -b --noEmit && vite build",
"preview": "vite preview",
"test": "vitest run",
"test:watch": "vitest",
@@ -19,8 +19,10 @@
"typecheck": "tsc -b --noEmit"
},
"dependencies": {
"pdfjs-dist": "^4.4.168",
"react": "^18.3.1",
"react-dom": "^18.3.1"
"react-dom": "^18.3.1",
"react-pdf-highlighter-plus": "^1.1.4"
},
"devDependencies": {
"@types/node": "^20.14.0",

1356
pnpm-lock.yaml generated

File diff suppressed because it is too large Load Diff

View File

@@ -1 +1,7 @@
export {};
export * from "./types";
export {
PdfSpikeViewer,
selectorsFromPdfCapture,
type PdfSpikeViewerProps,
type StoredAnnotation,
} from "./pdf-viewer-adapter-spike";

View File

@@ -0,0 +1,111 @@
/**
* Round-trip tests for the spike's pure transformation layer.
*
* These tests are CE-WP-0002-T02's machine-verifiable evidence that the
* adapter's data round-trip is lossless: a captured PDF selection becomes
* a `Selector[]`, the `Selector[]` round-trips through JSON
* (localStorage-equivalent), and the reconstructed PDF rect + page match
* the original. The browser-side selection-capture path is exercised in
* T09 against production code.
*/
import { describe, expect, it } from "vitest";
import {
findPdfRectSelector,
findTextQuoteSelector,
selectorsFromPdfCapture,
unionRect,
} from "./pdf-selector-math";
import type { PdfSelectionCapture } from "./types";
import type { NormalizedRect, Selector } from "@shared/selector";
const SAMPLE_CAPTURE: PdfSelectionCapture = {
kind: "pdf",
text: "Mitglied beim Lohnsteuerhilfeverein Vereinigte Lohnsteuerhilfe e.V.",
page: 1,
rects: [
{ x: 0.12, y: 0.34, width: 0.55, height: 0.02 },
{ x: 0.12, y: 0.37, width: 0.31, height: 0.02 },
],
boundingRect: { x: 0.12, y: 0.34, width: 0.55, height: 0.05 },
};
describe("selectorsFromPdfCapture", () => {
it("produces a TextQuoteSelector and PdfRectSelector from a normal capture", () => {
const sels = selectorsFromPdfCapture(SAMPLE_CAPTURE);
expect(sels.map((s) => s.type)).toEqual(["TextQuoteSelector", "PdfRectSelector"]);
});
it("includes the verbatim quote on the TextQuoteSelector", () => {
const tq = findTextQuoteSelector(selectorsFromPdfCapture(SAMPLE_CAPTURE));
expect(tq?.exact).toBe(SAMPLE_CAPTURE.text);
});
it("preserves page + rects 1:1 on the PdfRectSelector", () => {
const rect = findPdfRectSelector(selectorsFromPdfCapture(SAMPLE_CAPTURE));
expect(rect?.page).toBe(SAMPLE_CAPTURE.page);
expect(rect?.rects).toEqual(SAMPLE_CAPTURE.rects);
});
it("omits TextQuoteSelector when text is empty", () => {
const sels = selectorsFromPdfCapture({ ...SAMPLE_CAPTURE, text: "" });
expect(sels.map((s) => s.type)).toEqual(["PdfRectSelector"]);
});
it("omits PdfRectSelector when no rects are present", () => {
const sels = selectorsFromPdfCapture({ ...SAMPLE_CAPTURE, rects: [] });
expect(sels.map((s) => s.type)).toEqual(["TextQuoteSelector"]);
});
});
describe("Selector[] JSON round-trip", () => {
it("survives JSON.stringify/parse without loss (the localStorage path)", () => {
const original = selectorsFromPdfCapture(SAMPLE_CAPTURE);
const blob = JSON.stringify(original);
const restored = JSON.parse(blob) as Selector[];
expect(restored).toEqual(original);
});
it("the restored PdfRectSelector still resolves to the same page and rects", () => {
const restored = JSON.parse(JSON.stringify(selectorsFromPdfCapture(SAMPLE_CAPTURE))) as Selector[];
const rect = findPdfRectSelector(restored);
expect(rect).not.toBeNull();
expect(rect?.page).toBe(SAMPLE_CAPTURE.page);
expect(rect?.rects).toEqual(SAMPLE_CAPTURE.rects);
});
});
describe("unionRect", () => {
it("returns null for an empty input", () => {
expect(unionRect([])).toBeNull();
});
it("returns the single rect when given exactly one", () => {
const r: NormalizedRect = { x: 0.1, y: 0.2, width: 0.3, height: 0.4 };
const u = unionRect([r]);
expect(u).not.toBeNull();
expect(u!.x).toBeCloseTo(r.x, 9);
expect(u!.y).toBeCloseTo(r.y, 9);
expect(u!.width).toBeCloseTo(r.width, 9);
expect(u!.height).toBeCloseTo(r.height, 9);
});
it("computes the bounding box of multi-line text rects", () => {
const u = unionRect(SAMPLE_CAPTURE.rects);
expect(u).not.toBeNull();
expect(u!.x).toBeCloseTo(0.12, 5);
expect(u!.y).toBeCloseTo(0.34, 5);
expect(u!.width).toBeCloseTo(0.55, 5);
expect(u!.height).toBeCloseTo(0.05, 5);
});
it("is order-independent", () => {
const reversed = [...SAMPLE_CAPTURE.rects].reverse();
const forward = unionRect(SAMPLE_CAPTURE.rects)!;
const back = unionRect(reversed)!;
expect(back.x).toBeCloseTo(forward.x, 9);
expect(back.y).toBeCloseTo(forward.y, 9);
expect(back.width).toBeCloseTo(forward.width, 9);
expect(back.height).toBeCloseTo(forward.height, 9);
});
});

View File

@@ -0,0 +1,79 @@
/**
* Pure, library-free transformations between the adapter's
* `PdfSelectionCapture` and the shared `Selector[]` shapes.
*
* Extracted from `pdf-viewer-adapter-spike.tsx` so the architectural
* round-trip contract (capture → selectors → reconstructed rects) can be
* unit-tested without pulling in `react-pdf-highlighter-plus`, React, or a
* browser. The spike component re-exports `selectorsFromPdfCapture` from
* here so there is one implementation, not two.
*
* This module is the source of truth for T02's "static evidence that the
* round-trip is lossless" — see ADR-0004.
*/
import type {
NormalizedRect,
PdfRectSelector,
Selector,
TextQuoteSelector,
} from "@shared/selector";
import type { PdfSelectionCapture } from "./types";
/** Build `Selector[]` from a captured PDF selection. */
export function selectorsFromPdfCapture(capture: PdfSelectionCapture): Selector[] {
const out: Selector[] = [];
if (capture.text.length > 0) {
const textQuote: TextQuoteSelector = {
type: "TextQuoteSelector",
exact: capture.text,
};
out.push(textQuote);
}
if (capture.rects.length > 0) {
const rect: PdfRectSelector = {
type: "PdfRectSelector",
page: capture.page,
rects: capture.rects,
};
out.push(rect);
}
return out;
}
/** Find the `PdfRectSelector` in a selector list, if any. */
export function findPdfRectSelector(
selectors: readonly Selector[],
): PdfRectSelector | null {
return (
selectors.find((s): s is PdfRectSelector => s.type === "PdfRectSelector") ?? null
);
}
/** Find the `TextQuoteSelector` in a selector list, if any. */
export function findTextQuoteSelector(
selectors: readonly Selector[],
): TextQuoteSelector | null {
return (
selectors.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector") ??
null
);
}
/** Bounding rectangle of a non-empty list of normalized rects. */
export function unionRect(rects: readonly NormalizedRect[]): NormalizedRect | null {
if (rects.length === 0) return null;
const first = rects[0]!;
let minX = first.x;
let minY = first.y;
let maxX = first.x + first.width;
let maxY = first.y + first.height;
for (let i = 1; i < rects.length; i++) {
const r = rects[i]!;
if (r.x < minX) minX = r.x;
if (r.y < minY) minY = r.y;
if (r.x + r.width > maxX) maxX = r.x + r.width;
if (r.y + r.height > maxY) maxY = r.y + r.height;
}
return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

View File

@@ -0,0 +1,192 @@
/**
* Throwaway PDF viewer adapter spike (CE-WP-0002-T02).
*
* Purpose: prove that `react-pdf-highlighter-plus` can implement the §5
* `DocumentViewerAdapter` contract end-to-end (select → save selectors →
* reload → resolve → scroll → render highlight) without leaking PDF.js
* types into `src/shared/` or `src/engine/`.
*
* This module is the only place in the codebase that imports
* `react-pdf-highlighter-plus`. The exported React component is consumed
* by `src/app/SpikeApp.tsx`.
*
* Replace before production. T03 (source ingest) + T04 (anchor resolution)
* will build the real PDFViewerAdapter on top of this lessons-learned.
*/
import { useEffect, useMemo, useRef, useState, type ReactNode } from "react";
import {
PdfHighlighter,
PdfLoader,
TextHighlight,
MonitoredHighlightContainer,
useHighlightContainerContext,
type Highlight,
type PdfHighlighterUtils,
type PdfSelection,
type ScaledPosition,
} from "react-pdf-highlighter-plus";
import "react-pdf-highlighter-plus/style/style.css";
import "react-pdf-highlighter-plus/style/pdf_viewer.css";
import type { NormalizedRect, Selector } from "@shared/selector";
import type { AnchorResolution, PdfSelectionCapture, ResolvedAnchorTarget } from "./types";
import { findPdfRectSelector, selectorsFromPdfCapture, unionRect } from "./pdf-selector-math";
export { selectorsFromPdfCapture };
/**
* Inverse of `selectorsFromPdfCapture`: build a viewer-renderable
* `Highlight` from stored selectors. The spike's reload path leans on
* `PdfRectSelector` since it carries page + page-relative rects directly.
* T04 will own the production resolver and add the text-only paths.
*/
function highlightFromSelectors(
id: string,
text: string,
selectors: readonly Selector[],
): Highlight | null {
const rectSel = findPdfRectSelector(selectors);
if (!rectSel) return null;
const boundingRect = unionRect(rectSel.rects);
if (!boundingRect) return null;
const scaledRects = rectSel.rects.map((r) => toScaled(r, rectSel.page));
return {
id,
type: "text",
content: { text },
position: {
boundingRect: toScaled(boundingRect, rectSel.page),
rects: scaledRects,
} satisfies ScaledPosition,
};
}
/**
* Convert the adapter's `NormalizedRect` (page-relative 0..1) to the
* `Scaled` shape react-pdf-highlighter-plus expects (also normalized 0..1
* via width/height). We use a unit page-space of 1×1 — the library
* computes pixel coords from `pageNumber` and the renderer's actual page
* dimensions.
*/
function toScaled(r: NormalizedRect, page: number) {
return {
x1: r.x,
y1: r.y,
x2: r.x + r.width,
y2: r.y + r.height,
width: 1,
height: 1,
pageNumber: page,
};
}
/** PdfSelection → our domain-neutral `PdfSelectionCapture`. */
function captureFromPdfSelection(sel: PdfSelection): PdfSelectionCapture {
const page = sel.position.boundingRect.pageNumber;
const rects = sel.position.rects.map<NormalizedRect>((r) => ({
x: r.x1 / r.width,
y: r.y1 / r.height,
width: (r.x2 - r.x1) / r.width,
height: (r.y2 - r.y1) / r.height,
}));
const br = sel.position.boundingRect;
const boundingRect: NormalizedRect = {
x: br.x1 / br.width,
y: br.y1 / br.height,
width: (br.x2 - br.x1) / br.width,
height: (br.y2 - br.y1) / br.height,
};
return {
kind: "pdf",
text: sel.content.text ?? "",
page,
rects,
boundingRect,
};
}
/**
* Trivial container that renders every stored highlight as a TextHighlight.
* For the spike, no editing tooling — just visual proof of "did the saved
* coordinates land on the right passage on the right page after reload?"
*/
function SpikeHighlightContainer(): ReactNode {
const { highlight, isScrolledTo } = useHighlightContainerContext();
return (
<MonitoredHighlightContainer>
<TextHighlight highlight={highlight} isScrolledTo={isScrolledTo} />
</MonitoredHighlightContainer>
);
}
export interface PdfSpikeViewerProps {
/** URL of the PDF to load (served by Vite dev server). */
readonly pdfUrl: string;
/** Previously-saved selector sets to restore on mount. */
readonly storedAnnotations: readonly StoredAnnotation[];
/** Called when the user produces a new selection. */
onSelectionCaptured(capture: PdfSelectionCapture, selectors: Selector[]): void;
/** Annotation id to scroll to and highlight on mount, if any. */
readonly scrollToAnnotationId?: string;
}
export interface StoredAnnotation {
readonly id: string;
readonly text: string;
readonly selectors: readonly Selector[];
}
/**
* The spike's React component. Renders a PDF and:
* - emits `onSelectionCaptured(capture, selectors)` on every fresh selection
* - reconstructs and renders `storedAnnotations` immediately on load
* - scrolls to `scrollToAnnotationId` if its highlight can be reconstructed
*/
export function PdfSpikeViewer(props: PdfSpikeViewerProps) {
const { pdfUrl, storedAnnotations, onSelectionCaptured, scrollToAnnotationId } = props;
const utilsRef = useRef<PdfHighlighterUtils | null>(null);
const [didScroll, setDidScroll] = useState<string | null>(null);
const highlights = useMemo<Highlight[]>(() => {
const out: Highlight[] = [];
for (const a of storedAnnotations) {
const h = highlightFromSelectors(a.id, a.text, a.selectors);
if (h) out.push(h);
}
return out;
}, [storedAnnotations]);
useEffect(() => {
if (!scrollToAnnotationId || didScroll === scrollToAnnotationId) return;
const utils = utilsRef.current;
const target = highlights.find((h) => h.id === scrollToAnnotationId);
if (!utils || !target) return;
utils.scrollToHighlight(target);
setDidScroll(scrollToAnnotationId);
}, [scrollToAnnotationId, highlights, didScroll]);
return (
<PdfLoader document={pdfUrl}>
{(pdfDocument) => (
<PdfHighlighter
pdfDocument={pdfDocument}
highlights={highlights}
utilsRef={(u) => {
utilsRef.current = u;
}}
onSelection={(selection) => {
const capture = captureFromPdfSelection(selection);
const selectors = selectorsFromPdfCapture(capture);
onSelectionCaptured(capture, selectors);
}}
>
<SpikeHighlightContainer />
</PdfHighlighter>
)}
</PdfLoader>
);
}
// Re-export the §5 contract surface so callers see anchor as one entry point.
export type { AnchorResolution, ResolvedAnchorTarget, PdfSelectionCapture };

97
src/anchor/types.ts Normal file
View File

@@ -0,0 +1,97 @@
/**
* Adapter-side types owned by `evidence-anchor`.
*
* Implements the contract surface from `wiki/SharedContracts.md` §5 and the
* resolution result shape from `wiki/ArchitectureOverview.md` §3.3 / §7.
*
* Anything that mentions a concrete viewer library (pdfjs, react-pdf-highlighter-plus)
* lives *behind* this surface, never on it. `src/shared/` and `src/engine/`
* must never import this file.
*/
import type { Document, DocumentRepresentation } from "@shared/document";
import type { Selector } from "@shared/selector";
import type { AnnotationResolutionStatus } from "@shared/annotation";
import type { NormalizedRect } from "@shared/selector";
/**
* The raw selection captured from a viewer adapter — an opaque payload that
* the adapter understands. The shape is intentionally permissive: each
* concrete adapter narrows the `kind` discriminator and adds its own
* payload. The shared layer never inspects the payload directly.
*/
export type SelectionCapture =
| PdfSelectionCapture
| DomSelectionCapture;
export interface PdfSelectionCapture {
readonly kind: "pdf";
/** Verbatim selected text, before canonical normalisation. */
readonly text: string;
/** 1-indexed physical page number the selection started on. */
readonly page: number;
/** Page-relative normalized rectangles covering the selection (0..1). */
readonly rects: readonly NormalizedRect[];
/** Optional bounding rectangle (page-relative, normalized). */
readonly boundingRect?: NormalizedRect;
}
/** Reserved for the HTML/Markdown adapter. Not implementable in MVP. */
export type DomSelectionCapture = never;
/**
* A passage located inside a representation, ready to be scrolled to and
* highlighted.
*/
export interface ResolvedAnchorTarget {
readonly representationId: string;
/** 1-indexed page (PDF) or undefined for HTML/Markdown. */
readonly page?: number;
/** Page-relative normalized rectangles to highlight. */
readonly rects?: readonly NormalizedRect[];
/** Canonical-text offsets, when known. */
readonly textPosition?: { readonly start: number; readonly end: number };
}
/**
* The outcome of asking the adapter to resolve a `Selector[]`.
* Matches `wiki/ArchitectureOverview.md` §3.3.
*/
export interface AnchorResolution {
readonly status: AnnotationResolutionStatus;
/** 0..1 confidence in the best candidate. */
readonly confidence: number;
readonly candidates: readonly ResolvedAnchorTarget[];
/** Names of the selector kinds that produced a usable candidate. */
readonly usedSelectorTypes: readonly string[];
readonly warnings?: readonly string[];
}
export interface HighlightRenderOptions {
readonly color?: string;
readonly opacity?: number;
}
/**
* The format-neutral viewer adapter contract from `wiki/SharedContracts.md` §5.
*
* Concrete implementations live alongside the viewer they wrap (e.g. the
* PDF spike in `src/anchor/pdf-viewer-adapter-spike.tsx`). The shared/engine
* layers depend only on this interface.
*/
export interface DocumentViewerAdapter {
readonly mediaTypes: readonly string[];
load(document: Document, representation?: DocumentRepresentation): Promise<void>;
getCurrentSelection(): Promise<SelectionCapture | null>;
createSelectorsFromSelection(selection: SelectionCapture): Promise<Selector[]>;
resolveSelectors(selectors: readonly Selector[]): Promise<AnchorResolution>;
scrollToResolvedTarget(
target: ResolvedAnchorTarget,
opts?: { readonly center?: boolean; readonly behavior?: "auto" | "smooth" },
): Promise<void>;
renderHighlight(
target: ResolvedAnchorTarget,
opts?: HighlightRenderOptions,
): Promise<void>;
getHighlightClientRects(annotationId: string): Promise<readonly DOMRect[]>;
}

233
src/app/SpikeApp.tsx Normal file
View File

@@ -0,0 +1,233 @@
/**
* CE-WP-0002-T02 spike host page.
*
* Lists the fixtures from `fixtures/pdfs/manifest.json`, lets the user load
* one in the spike PDF viewer, capture a selection (the viewer's
* `onSelection` fires when text is selected), persist the resulting
* selectors to `localStorage`, and on reload restore + scroll to them.
*
* Success looks like: select a quote → click "save" → reload the tab →
* the highlight is rendered on the same passage and the page is scrolled
* to it.
*/
import { useEffect, useMemo, useState } from "react";
import {
PdfSpikeViewer,
type PdfSelectionCapture,
type StoredAnnotation,
} from "@anchor/index";
import type { Selector } from "@shared/selector";
import { newId } from "@shared/ids";
import manifest from "../../fixtures/pdfs/manifest.json";
interface FixtureEntry {
id: string;
filename: string;
description: string;
page_count: number;
known_good_quote: string;
known_good_quote_page: number;
}
const FIXTURES: FixtureEntry[] = (manifest as { fixtures: FixtureEntry[] }).fixtures;
const STORAGE_KEY = "ce-wp-0002-spike-annotations-v1";
interface StoredEntry {
id: string;
fixtureId: string;
text: string;
selectors: Selector[];
createdAt: string;
}
function loadStore(): StoredEntry[] {
try {
const raw = localStorage.getItem(STORAGE_KEY);
if (!raw) return [];
const parsed = JSON.parse(raw) as unknown;
if (!Array.isArray(parsed)) return [];
return parsed as StoredEntry[];
} catch {
return [];
}
}
function saveStore(entries: StoredEntry[]) {
localStorage.setItem(STORAGE_KEY, JSON.stringify(entries));
}
export function SpikeApp() {
const [activeFixtureId, setActiveFixtureId] = useState<string | null>(null);
const [entries, setEntries] = useState<StoredEntry[]>(() => loadStore());
const [pending, setPending] = useState<
| { capture: PdfSelectionCapture; selectors: Selector[] }
| null
>(null);
const [scrollTo, setScrollTo] = useState<string | null>(null);
useEffect(() => {
saveStore(entries);
}, [entries]);
const activeFixture = useMemo(
() => FIXTURES.find((f) => f.id === activeFixtureId) ?? null,
[activeFixtureId],
);
const annotationsForActive = useMemo<StoredAnnotation[]>(() => {
if (!activeFixtureId) return [];
return entries
.filter((e) => e.fixtureId === activeFixtureId)
.map((e) => ({ id: e.id, text: e.text, selectors: e.selectors }));
}, [activeFixtureId, entries]);
function handleSave() {
if (!pending || !activeFixtureId) return;
const entry: StoredEntry = {
id: newId("annotation"),
fixtureId: activeFixtureId,
text: pending.capture.text,
selectors: pending.selectors,
createdAt: new Date().toISOString(),
};
setEntries((prev) => [...prev, entry]);
setPending(null);
}
function handleClear() {
if (!activeFixtureId) return;
setEntries((prev) => prev.filter((e) => e.fixtureId !== activeFixtureId));
}
return (
<div style={{ display: "flex", height: "100vh", fontFamily: "system-ui, sans-serif" }}>
<aside
style={{
width: 320,
borderRight: "1px solid #ddd",
padding: 12,
overflow: "auto",
flex: "0 0 320px",
}}
>
<h2 style={{ marginTop: 0 }}>CE-WP-0002-T02 Spike</h2>
<p style={{ fontSize: 12, color: "#555" }}>
Pick a fixture, select text in the viewer, save, then reload the page
to verify the highlight is restored.
</p>
<h3 style={{ fontSize: 14 }}>Fixtures</h3>
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
{FIXTURES.map((f) => (
<li key={f.id} style={{ marginBottom: 6 }}>
<button
onClick={() => {
setActiveFixtureId(f.id);
setPending(null);
setScrollTo(null);
}}
style={{
display: "block",
width: "100%",
textAlign: "left",
background: f.id === activeFixtureId ? "#e8f0ff" : "white",
border: "1px solid #ccc",
padding: 6,
cursor: "pointer",
}}
>
<div style={{ fontWeight: 600, fontSize: 13 }}>{f.id}</div>
<div style={{ fontSize: 11, color: "#666" }}>
{f.page_count} page{f.page_count === 1 ? "" : "s"} ·
known-good p{f.known_good_quote_page}
</div>
<div style={{ fontSize: 11, color: "#888", marginTop: 2 }}>
&ldquo;{f.known_good_quote}&rdquo;
</div>
</button>
</li>
))}
</ul>
{activeFixture && (
<>
<h3 style={{ fontSize: 14, marginTop: 16 }}>Saved annotations</h3>
{annotationsForActive.length === 0 && (
<p style={{ fontSize: 12, color: "#888" }}>(none)</p>
)}
<ul style={{ listStyle: "none", padding: 0, margin: 0 }}>
{annotationsForActive.map((a) => (
<li key={a.id} style={{ marginBottom: 4 }}>
<button
onClick={() => setScrollTo(a.id)}
style={{
display: "block",
width: "100%",
textAlign: "left",
background: "#fff8d6",
border: "1px solid #ccc",
padding: 4,
cursor: "pointer",
fontSize: 11,
}}
>
{a.text.slice(0, 80)}
{a.text.length > 80 ? "…" : ""}
</button>
</li>
))}
</ul>
{annotationsForActive.length > 0 && (
<button
onClick={handleClear}
style={{ marginTop: 8, fontSize: 11 }}
>
Clear all for this fixture
</button>
)}
</>
)}
{pending && (
<div
style={{
marginTop: 16,
padding: 8,
border: "1px solid #f0c040",
background: "#fff8d6",
}}
>
<div style={{ fontSize: 12 }}>
Pending selection ({pending.selectors.length} selector
{pending.selectors.length === 1 ? "" : "s"}):
</div>
<div style={{ fontSize: 11, color: "#666", margin: "4px 0" }}>
&ldquo;{pending.capture.text.slice(0, 120)}&rdquo;
</div>
<button onClick={handleSave}>Save</button>{" "}
<button onClick={() => setPending(null)}>Discard</button>
</div>
)}
</aside>
<main style={{ flex: 1, overflow: "hidden", position: "relative" }}>
{activeFixture ? (
<PdfSpikeViewer
key={activeFixture.id}
pdfUrl={`/fixtures/pdfs/${encodeURIComponent(activeFixture.filename)}`}
storedAnnotations={annotationsForActive}
{...(scrollTo ? { scrollToAnnotationId: scrollTo } : {})}
onSelectionCaptured={(capture, selectors) =>
setPending({ capture, selectors })
}
/>
) : (
<div style={{ padding: 24, color: "#666" }}>
Pick a fixture on the left to begin.
</div>
)}
</main>
</div>
);
}

View File

@@ -1 +1 @@
export {};
export { SpikeApp } from "./SpikeApp";

12
src/app/main.tsx Normal file
View File

@@ -0,0 +1,12 @@
import { StrictMode } from "react";
import { createRoot } from "react-dom/client";
import { SpikeApp } from "./SpikeApp";
const container = document.getElementById("root");
if (!container) throw new Error("#root not found");
createRoot(container).render(
<StrictMode>
<SpikeApp />
</StrictMode>,
);

45
src/shared/annotation.ts Normal file
View File

@@ -0,0 +1,45 @@
/**
* The Annotation type.
*
* Implements `wiki/SharedContracts.md` §1 (vocabulary), §2.1 (resolutionStatus)
* and `wiki/ArchitectureOverview.md` §4.4. Annotations are the *technical*
* mark on a document range — meaning and commentary live on EvidenceItem.
*/
import type { AnnotationId, DocumentId, RepresentationId } from "./ids";
import type { Selector } from "./selector";
/** Closed enum per `wiki/SharedContracts.md` §2.1. */
export type AnnotationResolutionStatus =
| "resolved"
| "ambiguous"
| "unresolved"
| "stale";
export interface Annotation {
readonly id: AnnotationId;
readonly documentId: DocumentId;
readonly representationId?: RepresentationId;
/**
* All available selectors for this passage, in order of expected
* resolution confidence. Per the §3 redundancy rule, the system stores
* every selector kind it could derive at capture time.
*/
readonly selectors: readonly Selector[];
/** Verbatim canonical text at capture time. */
readonly quote?: string;
/** Short human note attached to the technical mark. */
readonly note?: string;
/**
* Version of `normalize()` that was active when these selectors were
* stored. Recorded so future normalization changes can be detected as a
* migration concern. See `src/shared/text/normalize.ts`.
*/
readonly normalizeVersion: number;
readonly resolutionStatus?: AnnotationResolutionStatus;
readonly createdBy?: string;
/** ISO-8601 timestamp. */
readonly createdAt: string;
/** ISO-8601 timestamp. */
readonly updatedAt: string;
}

91
src/shared/document.ts Normal file
View File

@@ -0,0 +1,91 @@
/**
* Document and DocumentRepresentation types.
*
* Implements `wiki/SharedContracts.md` §1 (vocabulary) and
* `wiki/ArchitectureOverview.md` §4.1, §4.2. Pure data — no behavior.
*/
import type { DocumentId, RepresentationId } from "./ids";
/**
* The kind of normalized view derived from a source document.
*
* MVP recognises only `pdf-text`; the other variants are reserved for the
* HTML/Markdown/OCR adapters that arrive after CE-WP-0002.
*/
export type RepresentationType =
| "pdf-text"
| "html-dom"
| "markdown-rendered"
| "plain-text"
| "ocr-text";
/**
* Page-level geometry. One entry per physical PDF page.
* Coordinates are PDF user-space points (1/72 inch).
*/
export interface PageInfo {
/** 1-indexed physical page number. */
readonly page: number;
/** Page width in user-space points. */
readonly width: number;
/** Page height in user-space points. */
readonly height: number;
}
export type PageMap = readonly PageInfo[];
/**
* Maps canonical-text offset ranges to physical pages.
*
* Entries are sorted by `globalStart`, are non-overlapping, and together
* cover `[0, canonicalText.length)`. `pageLength` equals
* `globalEnd - globalStart` and is also the length of the page-local text
* (used by `PdfPageTextSelector`).
*/
export interface PageOffsetRange {
readonly page: number;
/** Inclusive canonical-text offset where this page begins. */
readonly globalStart: number;
/** Exclusive canonical-text offset where this page ends. */
readonly globalEnd: number;
/** Length of the page's text in canonical-text characters. */
readonly pageLength: number;
}
export type OffsetMap = readonly PageOffsetRange[];
/**
* Reserved for `StructuralSelector` (heading/section/AST path).
* Not implementable in MVP — type is `never` to enforce that at compile time.
*/
export type StructureMap = never;
/** A source document known to the system. */
export interface Document {
readonly id: DocumentId;
readonly title?: string;
readonly uri?: string;
readonly mediaType: string;
readonly fingerprint?: string;
readonly version?: string;
readonly createdAt: string;
readonly updatedAt: string;
readonly metadata?: Readonly<Record<string, unknown>>;
}
/** A normalized, addressable view of a `Document`. */
export interface DocumentRepresentation {
readonly id: RepresentationId;
readonly documentId: DocumentId;
readonly representationType: RepresentationType;
/** Hash of the canonical text — stable identifier for the representation. */
readonly contentHash: string;
/** Canonical text after `normalize()` is applied. */
readonly canonicalText?: string;
readonly pageMap?: PageMap;
readonly structureMap?: StructureMap;
readonly offsetMap?: OffsetMap;
/** ISO-8601 timestamp. */
readonly generatedAt: string;
}

37
src/shared/evidence.ts Normal file
View File

@@ -0,0 +1,37 @@
/**
* EvidenceItem type.
*
* Implements `wiki/SharedContracts.md` §1 (vocabulary), §2.2 (status enum)
* and `wiki/ArchitectureOverview.md` §4.5. An EvidenceItem is the *meaning*
* layer on top of one or more technical Annotations.
*/
import type { AnnotationId, EvidenceItemId } from "./ids";
/** Closed enum per `wiki/SharedContracts.md` §2.2. */
export type EvidenceItemStatus =
| "candidate"
| "confirmed"
| "rejected"
| "needs-check";
export interface EvidenceItem {
readonly id: EvidenceItemId;
/**
* One or more annotations that together constitute the evidence.
* Multiple annotations are used when a piece of evidence spans
* discontiguous passages.
*/
readonly annotationIds: readonly AnnotationId[];
readonly title?: string;
readonly commentary?: string;
readonly status: EvidenceItemStatus;
/** Optional 0..1 confidence assigned by user or auto-process. */
readonly confidence?: number;
readonly tags?: readonly string[];
readonly createdBy?: string;
/** ISO-8601 timestamp. */
readonly createdAt: string;
/** ISO-8601 timestamp. */
readonly updatedAt: string;
}

21
src/shared/ids.test.ts Normal file
View File

@@ -0,0 +1,21 @@
import { describe, it, expect } from "vitest";
import { newId } from "./ids";
describe("newId", () => {
it("returns ids with the expected prefix for each kind", () => {
expect(newId("document")).toMatch(/^doc_[0-9a-f-]{36}$/);
expect(newId("representation")).toMatch(/^rep_[0-9a-f-]{36}$/);
expect(newId("annotation")).toMatch(/^ann_[0-9a-f-]{36}$/);
expect(newId("evidence")).toMatch(/^ev_[0-9a-f-]{36}$/);
expect(newId("evidence-set")).toMatch(/^evset_[0-9a-f-]{36}$/);
expect(newId("evidence-link")).toMatch(/^evlink_[0-9a-f-]{36}$/);
expect(newId("citation-card")).toMatch(/^card_[0-9a-f-]{36}$/);
expect(newId("citation-recovery")).toMatch(/^crec_[0-9a-f-]{36}$/);
});
it("returns a unique id on every call", () => {
const a = newId("annotation");
const b = newId("annotation");
expect(a).not.toBe(b);
});
});

55
src/shared/ids.ts Normal file
View File

@@ -0,0 +1,55 @@
/**
* Branded ID types and the `newId(kind)` factory.
*
* Implements the identifier portion of `wiki/SharedContracts.md` §1 and
* `wiki/ArchitectureOverview.md` §3.2. Each branded type is structurally a
* `string` but nominally distinct, so passing an `AnnotationId` where a
* `DocumentId` is required is a compile-time error.
*/
declare const __brand: unique symbol;
type Brand<K, T extends string> = K & { readonly [__brand]: T };
export type DocumentId = Brand<string, "DocumentId">;
export type RepresentationId = Brand<string, "RepresentationId">;
export type AnnotationId = Brand<string, "AnnotationId">;
export type EvidenceItemId = Brand<string, "EvidenceItemId">;
export type EvidenceSetId = Brand<string, "EvidenceSetId">;
export type EvidenceLinkId = Brand<string, "EvidenceLinkId">;
export type CitationCardId = Brand<string, "CitationCardId">;
export type CitationRecoveryAttemptId = Brand<string, "CitationRecoveryAttemptId">;
export type IdKindMap = {
document: DocumentId;
representation: RepresentationId;
annotation: AnnotationId;
evidence: EvidenceItemId;
"evidence-set": EvidenceSetId;
"evidence-link": EvidenceLinkId;
"citation-card": CitationCardId;
"citation-recovery": CitationRecoveryAttemptId;
};
export type IdKind = keyof IdKindMap;
const PREFIXES: Record<IdKind, string> = {
document: "doc",
representation: "rep",
annotation: "ann",
evidence: "ev",
"evidence-set": "evset",
"evidence-link": "evlink",
"citation-card": "card",
"citation-recovery": "crec",
};
/**
* Mint a new branded identifier of the requested kind.
*
* IDs use the shape `<prefix>_<uuid>` so they are human-recognizable when
* they show up in logs, URLs, or stored JSON.
*/
export function newId<K extends IdKind>(kind: K): IdKindMap[K] {
return `${PREFIXES[kind]}_${crypto.randomUUID()}` as IdKindMap[K];
}

View File

@@ -1 +1,6 @@
export {};
export * from "./ids";
export * from "./document";
export * from "./selector";
export * from "./annotation";
export * from "./evidence";
export { normalize, NORMALIZE_VERSION } from "./text/normalize";

79
src/shared/selector.ts Normal file
View File

@@ -0,0 +1,79 @@
/**
* The Selector discriminated union.
*
* Implements `wiki/SharedContracts.md` §3. Each selector kind has a unique
* `type` discriminator and locates a passage inside one
* `DocumentRepresentation`.
*
* The MVP implements the four PDF-relevant variants
* (`TextQuoteSelector`, `TextPositionSelector`, `PdfRectSelector`,
* `PdfPageTextSelector`). The other three kinds (DOM, structural, fragment)
* are reserved as `never`-typed stubs so adding them later is a localised
* change.
*/
/** Exact quote with optional surrounding context (W3C-aligned). */
export interface TextQuoteSelector {
readonly type: "TextQuoteSelector";
/** The verbatim quoted passage from the canonical text. */
readonly exact: string;
/** Up to ~32 chars of canonical text immediately before `exact`. */
readonly prefix?: string;
/** Up to ~32 chars of canonical text immediately after `exact`. */
readonly suffix?: string;
}
/** Canonical-text character offsets (inclusive start, exclusive end). */
export interface TextPositionSelector {
readonly type: "TextPositionSelector";
readonly start: number;
readonly end: number;
}
/** A rectangle on a PDF page, in page-relative normalized coordinates (0..1). */
export interface NormalizedRect {
readonly x: number;
readonly y: number;
readonly width: number;
readonly height: number;
}
/** One or more rectangles on a single PDF page. */
export interface PdfRectSelector {
readonly type: "PdfRectSelector";
/** 1-indexed physical page number. */
readonly page: number;
readonly rects: readonly NormalizedRect[];
}
/** Page-local text offsets, for a single PDF page. */
export interface PdfPageTextSelector {
readonly type: "PdfPageTextSelector";
readonly page: number;
readonly start: number;
readonly end: number;
}
/** Reserved for HTML/Markdown viewer adapters. Not implementable in MVP. */
export type DomRangeSelector = never;
/** Reserved for heading/section/AST-path locators. Not implementable in MVP. */
export type StructuralSelector = never;
/** Reserved for exported deep-link fragments. Not implementable in MVP. */
export type FragmentSelector = never;
/**
* The closed union of all selector kinds. The `never` members keep the union
* exhaustive so future selector additions are a single edit.
*/
export type Selector =
| TextQuoteSelector
| TextPositionSelector
| PdfRectSelector
| PdfPageTextSelector
| DomRangeSelector
| StructuralSelector
| FragmentSelector;
export type SelectorType = Selector["type"];

View File

@@ -14,6 +14,8 @@
"noUnusedLocals": true,
"noUnusedParameters": true,
"noEmit": true,
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"isolatedModules": true,

31
vite.config.ts Normal file
View File

@@ -0,0 +1,31 @@
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { fileURLToPath } from "node:url";
import { dirname, resolve } from "node:path";
const __dirname = dirname(fileURLToPath(import.meta.url));
export default defineConfig({
plugins: [react()],
resolve: {
alias: {
"@shared": resolve(__dirname, "src/shared"),
"@engine": resolve(__dirname, "src/engine"),
"@anchor": resolve(__dirname, "src/anchor"),
"@source": resolve(__dirname, "src/source"),
"@binder": resolve(__dirname, "src/binder"),
"@work": resolve(__dirname, "src/work"),
"@app": resolve(__dirname, "src/app"),
},
},
server: {
fs: {
// Allow Vite to serve /fixtures/pdfs/*.pdf from the project root.
allow: [resolve(__dirname)],
},
},
optimizeDeps: {
// pdfjs-dist ships its worker as a .mjs Vite needs to handle.
exclude: ["pdfjs-dist"],
},
});

View File

@@ -64,7 +64,7 @@ T01 (engine types: Document, Representation, Annotation, Selector, EvidenceItem)
id: CE-WP-0002-T01
state_hub_task_id: b015c082-4272-407d-b6e4-9e1bd97f0193
priority: critical
status: todo
status: done
```
Translate the type definitions in `wiki/SharedContracts.md` §1 and §3 into
@@ -95,7 +95,7 @@ Add JSDoc on each type pointing at the §-reference in
id: CE-WP-0002-T02
state_hub_task_id: 59846d9e-7ac1-4306-b02e-0980a52f44c8
priority: critical
status: todo
status: done
depends_on: [T01]
```