Implement CE-WP-0001 Foundations: TS scaffold, lint boundaries, normalize v1, fixtures

T01 Toolchain — vite + pnpm 9.15 + React 18 + strict TS (ADR-0001).
T02 Folder layout — src/{shared,engine,anchor,source,binder,work,app}/
    mirroring the future subsystem split, with path aliases.
T03 Boundary lint — eslint-plugin-boundaries enforcing the dependency
    edges from wiki/DependencyMap.md §4; verified by a violating fixture.
T04 Canonical normalization v1 — src/shared/text/normalize.ts with
    NORMALIZE_VERSION=1; 10/10 vitest tests passing, covering ligatures,
    CRLF, soft hyphens (including line-break reassembly), mixed whitespace.
T05 PDF fixture corpus — 7 user-supplied German PDFs in fixtures/pdfs/
    (gitignored binaries) plus a manifest with verbatim known-good
    quotes and page counts, ready for CE-WP-0002 selector tests.
T06 README upgrade — umbrella README points at wiki/docs/workplans
    and documents the dev workflow.
T07 ADR-0002..0006 stubs in docs/decisions/.

Toolchain end-to-end: pnpm install + lint + typecheck + test all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 00:13:03 +02:00
parent 707620adfb
commit 2f25f99cae
32 changed files with 4756 additions and 9 deletions

@@ -0,0 +1,68 @@
# ADR-0001 — Toolchain (Vite + pnpm + React 18 + strict TypeScript)
- Status: accepted
- Date: 2026-05-24
- Workplan: CE-WP-0001-T01
## Context
`citation-evidence` is the umbrella repo for an MVP that will eventually be
segmented into six packages (`shared/engine`, `anchor`, `source`, `binder`,
`work`, `app` per `wiki/DependencyMap.md`). We need a single toolchain that:
1. Gives a fast inner dev loop for a React-based reference workspace.
2. Plays well with a future pnpm workspace split (so each `src/<name>/` folder
can become a workspace package with a `git mv` and a `package.json` cut).
3. Provides first-class TypeScript with the strictest practical settings — the
shared contracts in `wiki/SharedContracts.md` only pay off if the type
system actually enforces them.
4. Has a credible unit-test story for the engine/anchor/source pure-logic code
and an integration path for the UI later.
## Options considered
- **Vite + pnpm + React + Vitest** *(chosen)*
  - Fast HMR; well-supported React plugin; Vitest shares the Vite pipeline so
    tests use the same module resolution as the app.
  - pnpm workspaces are the most ergonomic path to the eventual multi-package
    split.
  - React 18 because the PRD's reference workspace is a desktop-class web app
    and the ecosystem (PDF viewer libraries, drag-and-drop, etc.) targets it.
- **Next.js (App Router)**
  - Heavier than needed for a local-first reference workspace; SSR/route
    handlers add complexity the MVP doesn't use.
  - Harder to split into independent packages later.
- **tsc-only + custom runner**
  - Simplest, but no HMR and we'd hand-roll the React + bundler integration.
    Pointless overhead for a UI-centric project.
- **Bun / Deno**
  - Toolchain bets that would add risk to the PDF/viewer integration spike,
    which is already the highest-risk part of the project (see
    `CE-WP-0002-T02`).
## Decision
Use **Vite 5** + **pnpm 9** + **React 18** + **TypeScript 5 with `strict`,
`noUncheckedIndexedAccess`, `exactOptionalPropertyTypes`, `noImplicitOverride`,
`noFallthroughCasesInSwitch`, `verbatimModuleSyntax`** turned on. Use
**Vitest 2** as the test runner. Node version pinned to **20.10.0 LTS** via
`.nvmrc`. Path aliases (`@shared/*`, `@engine/*`, etc.) map to `src/<name>/*`
so import sites read the same whether or not the folder is later extracted.
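
To make the effect of two of these flags concrete, here is a minimal sketch. `firstQuote`, `PageRef`, and `makePageRef` are hypothetical names invented for illustration; they are not code from this repository.

```typescript
// Hypothetical illustration; none of these names exist in the repo.

// With `noUncheckedIndexedAccess`, `quotes[0]` has type `string | undefined`,
// so the empty-array case must be handled before the value is used.
function firstQuote(quotes: readonly string[]): string {
  const first = quotes[0];
  if (first === undefined) {
    throw new Error("expected at least one quote");
  }
  return first;
}

interface PageRef {
  page: number;
  label?: string;
}

// With `exactOptionalPropertyTypes`, `{ page, label: undefined }` is rejected
// for `PageRef`; an absent label has to be expressed by omitting the key.
function makePageRef(page: number, label?: string): PageRef {
  return label === undefined ? { page } : { page, label };
}
```

Both functions also compile without the flags; the flags only turn the unsafe variants (using `quotes[0]` unchecked, or writing `label: undefined`) into type errors.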
## Consequences
- Bumping React or Node is a deliberate, ADR-worthy change.
- The eventual pnpm workspace split keeps the same import names — each
package's `name` becomes `@citation-evidence/<folder>` and the path aliases
are replaced by package resolution. No source-code churn required.
- Vitest's Vite-aware resolution means a contract test that imports across
partitions will fail at the same boundary that production code would —
there is no test-only loophole.
- ESLint rules enforcing the dependency map (CE-WP-0001-T03) layer on top
cleanly: `eslint-plugin-boundaries` reads the same `tsconfig` paths.
- No application dependencies are installed in this task — only the toolchain.
Subsequent workplans install PDF, drag-and-drop, etc. on demand and record
them in their own ADRs where the choice is non-obvious.
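
As a rough model of what the boundary lint checks, the sketch below decides whether an aliased import is allowed between partitions. It is a plain-function illustration, not the `eslint-plugin-boundaries` configuration. The `ALLOWED` table is an assumption: only "anchor may import shared" and "shared has no internal imports" come from these ADRs, and the real edges live in `wiki/DependencyMap.md` §4.

```typescript
const PARTITIONS = ["shared", "engine", "anchor", "source", "binder", "work", "app"] as const;
type Partition = (typeof PARTITIONS)[number];

// ASSUMPTION: illustrative edges only. Apart from `anchor -> shared` and the
// empty list for `shared`, these entries are placeholders, not the real map.
const ALLOWED: Record<Partition, readonly Partition[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared"],
  source: ["shared"],
  binder: ["shared"],
  work: ["shared"],
  app: ["shared", "engine", "anchor", "source", "binder", "work"],
};

// Maps an alias specifier such as "@shared/selector" to its partition.
// Returns undefined for anything that is not a known partition alias.
function partitionOfAlias(specifier: string): Partition | undefined {
  const match = /^@([a-z]+)\//.exec(specifier);
  const name = match?.[1];
  return PARTITIONS.find((p) => p === name);
}

// True when a file in `from` may import `specifier`. Imports that are not
// partition aliases (npm packages, relative paths) are not judged here.
function isAllowedImport(from: Partition, specifier: string): boolean {
  const target = partitionOfAlias(specifier);
  if (target === undefined || target === from) {
    return true;
  }
  return ALLOWED[from].includes(target);
}
```

For example, `isAllowedImport("anchor", "@shared/selector")` is accepted, while `isAllowedImport("shared", "@anchor/resolve")` is rejected because `shared` lists no internal dependencies.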

@@ -0,0 +1,50 @@
# ADR-0002 — Monorepo vs polyrepo for the six subsystems
- Status: proposed
- Date: 2026-05-24
- Workplan: CE-WP-0001-T07 (stub)
## Context
The umbrella-first MVP lives entirely in `citation-evidence/` under
`src/{shared,engine,anchor,source,binder,work,app}/`. Each folder is named
after its eventual extracted package. At some point — driven by an external
consumer needing one subsystem, or by independent release cadence — code
will move out into its sister repo.

We need a written answer to: when that moment comes, do we (a) keep one
repository with pnpm workspaces, (b) split into six independent repos with
published packages, or (c) something in between?

The decision affects: dependency management, release cadence, CI surface
area, contributor friction, and how `wiki/SharedContracts.md` is enforced
across the boundary.
## Options
- **A. Single repo, pnpm workspaces**
  - Pros: one CI, one version of every dep, atomic cross-package PRs, easy
    refactors. Shared contracts enforced by the type checker.
  - Cons: any consumer outside this repo needs a private registry or
    git-tag-based installs. Release cadence is shared.
- **B. Six independent repos, published packages**
  - Pros: clean external publish story, independent versioning. Forces the
    contract to be a real package boundary.
  - Cons: dependency upgrades require coordinated PR trains. Refactors that
    span subsystems become multi-repo dances. Hard to keep
    `SharedContracts.md` in sync across repos.
- **C. Hybrid: monorepo with publishable workspaces**
  - Pros: one repo for development, with `pnpm publish` available from any
    workspace package. Candidate tools: changesets / nx / turbo.
  - Cons: more tooling to learn; per-workspace `package.json` cuts to
    maintain.
## Decision
(blank — to be answered before the first subsystem extraction lands.)
## Consequences
(blank)

@@ -0,0 +1,44 @@
# ADR-0003 — W3C Web Annotation mapping: native model or import/export?
- Status: proposed
- Date: 2026-05-24
- Workplan: CE-WP-0001-T07 (stub)
## Context
The PRD mandates compatibility with the W3C Web Annotation Data Model
(FR-009 of `wiki/ProductRequirementsDocument.md`). `Selector` shapes already
mirror the W3C taxonomy. Open question: do we serialize our internal types
*as* JSON-LD Web Annotations natively, or maintain our own JSON shape with
an import/export mapping?

The choice affects: storage format, the public API of `evidence-source`'s
ingest/export paths, what "compatible" means when a user imports an existing
W3C annotation collection, and how much our internal model can diverge from
the spec (e.g. our `EvidenceItem` has no W3C analogue).
## Options
- **A. Native JSON-LD as the canonical store**
  - Pros: maximally interoperable; no mapping layer to keep in sync.
  - Cons: JSON-LD adds verbosity and context resolution; our extensions
    (EvidenceItem, EvidenceLink, EvidenceSet) need custom JSON-LD contexts.
    Bad fit for an in-memory MVP.
- **B. Internal model + import/export mapping** *(currently assumed)*
  - Pros: terse internal types; clean fit for `wiki/SharedContracts.md`.
    Mapping only runs at the system boundary.
  - Cons: two shapes to maintain; subtle divergence risk.
- **C. Hybrid: internal model that is a strict superset of W3C JSON shape**
  - Pros: serializes losslessly to W3C without a full JSON-LD context.
  - Cons: ties internal naming to W3C naming forever, which constrains
    future extensions.
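
To illustrate what option B's boundary mapping could look like, here is a minimal sketch for a single text-quote selector. The internal shapes (`InternalAnnotation`, `InternalTextQuote`) and the function names are assumptions made up for this example; the W3C side uses the `TextQuoteSelector` fields (`exact`, `prefix`, `suffix`) and the `anno.jsonld` context from the Web Annotation Data Model. The sketch assumes the internal `id` is already an IRI.

```typescript
// Hypothetical internal shapes; the real ones belong to wiki/SharedContracts.md.
interface InternalTextQuote {
  kind: "text-quote";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface InternalAnnotation {
  id: string; // assumed to already be an IRI
  sourceUri: string;
  selector: InternalTextQuote;
}

// Subset of the W3C Web Annotation JSON shape needed for this example.
interface W3CTextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface W3CAnnotation {
  "@context": "http://www.w3.org/ns/anno.jsonld";
  id: string;
  type: "Annotation";
  target: { source: string; selector: W3CTextQuoteSelector };
}

// Export: runs only at the system boundary.
function toW3C(a: InternalAnnotation): W3CAnnotation {
  const { exact, prefix, suffix } = a.selector;
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    id: a.id,
    type: "Annotation",
    target: {
      source: a.sourceUri,
      selector: {
        type: "TextQuoteSelector",
        exact,
        // Optional keys are omitted rather than set to undefined.
        ...(prefix !== undefined ? { prefix } : {}),
        ...(suffix !== undefined ? { suffix } : {}),
      },
    },
  };
}

// Import: the inverse mapping for the same subset.
function fromW3C(w: W3CAnnotation): InternalAnnotation {
  const { exact, prefix, suffix } = w.target.selector;
  return {
    id: w.id,
    sourceUri: w.target.source,
    selector: {
      kind: "text-quote",
      exact,
      ...(prefix !== undefined ? { prefix } : {}),
      ...(suffix !== undefined ? { suffix } : {}),
    },
  };
}
```

The "two shapes to maintain" cost is visible here: every field added to one side needs a matching line in both mapping functions.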
## Decision
(blank — required before evidence-source ships its first export path.)
## Consequences
(blank)

@@ -0,0 +1,47 @@
# ADR-0004 — PDF viewer library for the reference workspace
- Status: proposed
- Date: 2026-05-24
- Workplan: CE-WP-0001-T07 (stub); validated in CE-WP-0002-T02
## Context
The PDF round-trip (select text → store selectors → reload → resolve →
scroll → highlight) is the riskiest architectural assumption in the MVP
(see `history/2026-05-24-initial-assessment.md`). The viewer library must:
- Render PDF.js-backed pages in a React shell.
- Expose stable APIs for programmatic text selection and highlight overlay.
- Not leak its types into `src/shared/` or `src/engine/` (enforced by the
T03 boundary lint rules).
- Track new PDF.js releases without pinning us to an old version.

CE-WP-0002-T02 is the spike that validates whichever library we pick. If
the spike fails the success criteria, this ADR is the place to record the
failure and propose an alternative.
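
The data half of that round trip (store selectors, reload, resolve) can be sketched without any viewer library. The code below is a simplified illustration, not the anchor algorithm: `createSelector`, `resolveSelector`, and `CONTEXT_CHARS` are hypothetical names, page text is assumed to be already normalized, and the scroll and highlight steps are left to the viewer adapter.

```typescript
interface TextQuoteSelector {
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextRange {
  start: number;
  end: number;
}

// ASSUMPTION: amount of surrounding context to capture; value is arbitrary.
const CONTEXT_CHARS = 16;

// Builds a selector from a selected range of the page text.
function createSelector(pageText: string, range: TextRange): TextQuoteSelector {
  return {
    exact: pageText.slice(range.start, range.end),
    prefix: pageText.slice(Math.max(0, range.start - CONTEXT_CHARS), range.start),
    suffix: pageText.slice(range.end, range.end + CONTEXT_CHARS),
  };
}

// Finds the occurrence of `exact` whose surrounding text matches prefix and
// suffix. Falls back to the first occurrence of `exact` if no context matches,
// and returns null if `exact` is empty or does not occur at all.
function resolveSelector(pageText: string, sel: TextQuoteSelector): TextRange | null {
  if (sel.exact === "") {
    return null;
  }
  let fallback: TextRange | null = null;
  let from = 0;
  for (;;) {
    const start = pageText.indexOf(sel.exact, from);
    if (start === -1) {
      return fallback;
    }
    const end = start + sel.exact.length;
    const prefixOk = pageText.slice(0, start).endsWith(sel.prefix);
    const suffixOk = pageText.slice(end).startsWith(sel.suffix);
    if (prefixOk && suffixOk) {
      return { start, end };
    }
    if (fallback === null) {
      fallback = { start, end };
    }
    from = start + 1;
  }
}
```

Storing and reloading is then a `JSON.stringify` / `JSON.parse` pair on the selector; the prefix and suffix are what let a repeated quote resolve to the occurrence the user originally selected.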
## Options
- **A. `react-pdf-highlighter-plus`** *(current assumption)*
  - Pros: React-first, opinionated overlay layer, well-tested fixture
    coverage in the community.
  - Cons: bundles a particular PDF.js version; risk of needing to fork to
    get clean adapter boundaries.
- **B. `react-pdf` (a community-maintained React wrapper around PDF.js) + custom overlay**
  - Pros: thinnest abstraction; we own the overlay layer.
  - Cons: significantly more code to write and maintain for selection/
    highlight; reinventing PDF.js text-layer interaction.
- **C. PDF.js directly (no React wrapper)**
  - Pros: maximum control.
  - Cons: highest implementation cost; harder to integrate into the React
    composition root.
## Decision
(blank — to be filled by the outcome of CE-WP-0002-T02.)
## Consequences
(blank)

@@ -0,0 +1,38 @@
# ADR-0005 — Persistence layer (MVP and beyond)
- Status: proposed
- Date: 2026-05-24
- Workplan: CE-WP-0001-T07 (stub); MVP placeholder in CE-WP-0002-T08
## Context
The MVP needs persistence so that "click an evidence item and have the PDF
jump to and highlight the passage — even after a full page reload" works
(PRD §20 step 4). The acceptable MVP shortcut is `localStorage` (decided
explicitly in CE-WP-0002-T08).

This ADR is the durable home for the real persistence decision: where do
documents, annotations, evidence items, links, and sets live in v1.0?
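
One way to keep the `localStorage` shortcut replaceable is to code against a narrow storage interface. The sketch below is an assumption about how that could look, not the CE-WP-0002-T08 implementation: `StorageLike`, `EvidenceRecord`, `STORAGE_KEY`, and the function names are all hypothetical. The browser's `localStorage` object satisfies `StorageLike` structurally, and `MemoryStorage` stands in for it in tests.

```typescript
// The subset of the Web Storage API this sketch relies on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical record shape, for illustration only.
interface EvidenceRecord {
  id: string;
  quote: string;
  page: number;
}

// ASSUMPTION: key name and envelope version are invented for this example.
const STORAGE_KEY = "citation-evidence/mvp";

function saveEvidence(store: StorageLike, items: readonly EvidenceRecord[]): void {
  store.setItem(STORAGE_KEY, JSON.stringify({ version: 1, items }));
}

function isEvidenceRecord(value: unknown): value is EvidenceRecord {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const r = value as Record<string, unknown>;
  return (
    typeof r["id"] === "string" &&
    typeof r["quote"] === "string" &&
    typeof r["page"] === "number"
  );
}

// Returns an empty list when nothing is stored or the payload is unreadable;
// entries that do not look like EvidenceRecord are dropped.
function loadEvidence(store: StorageLike): EvidenceRecord[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) {
    return [];
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      return [];
    }
    const items = (parsed as { items?: unknown }).items;
    return Array.isArray(items) ? items.filter(isEvidenceRecord) : [];
  } catch {
    return [];
  }
}

// In-memory stand-in for localStorage, usable in unit tests.
class MemoryStorage implements StorageLike {
  private readonly map = new Map<string, string>();
  getItem(key: string): string | null {
    return this.map.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.map.set(key, value);
  }
}
```

Swapping in IndexedDB (option A) or a sync-backed store (option B) later would then mean replacing the `StorageLike` implementation, though an asynchronous backend would also force the interface itself to become promise-based.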
## Options
- **A. Browser-local only (IndexedDB via `idb` or `dexie`)**
  - Pros: zero infra; great for a single-user reference workspace.
  - Cons: no cross-device sync; export/import only via files.
- **B. Local-first + sync server (e.g. CRDT-backed)**
  - Pros: matches the long-term vision of a workspace tool; conflict-free
    multi-device.
  - Cons: significant infra and CRDT design cost; out of MVP scope.
- **C. Traditional client/server with a REST or GraphQL API**
  - Pros: familiar; easy team-sharing story.
  - Cons: requires hosting; loses the local-first character.
## Decision
(blank — to be answered before the second product slice past MVP.)
## Consequences
(blank)

@@ -0,0 +1,46 @@
# ADR-0006 — Selector ownership split (types in engine, algorithms in anchor)
- Status: proposed
- Date: 2026-05-24
- Workplan: CE-WP-0001-T07 (stub); echoes `wiki/SharedContracts.md` §8
## Context
The original sister-repo INTENT files had overlapping ownership claims for
`Selector`: `citation-engine` listed it as an owned domain type, while
`evidence-anchor`'s scope claimed "selector type definitions related to
anchoring". This was resolved on 2026-05-24 in `wiki/SharedContracts.md` §8:
type *interfaces* live in engine (`src/shared/selector.ts`), creation and
resolution *algorithms* live in anchor.

This ADR makes the split formal so that future code reviews have a written
answer when somebody proposes moving the types into anchor or moving the
algorithms into shared.
## Options
- **A. Status quo: types in `shared/`, algorithms in `anchor/`** *(default)*
  - Pros: anchor depends on shared (allowed by DependencyMap §4); type
    consumers (binder, work) never have to import anchor.
  - Cons: tiny risk of types drifting out of sync with what anchor can
    actually produce.
- **B. Co-locate types and algorithms in `anchor/`**
  - Pros: one home for everything selector-related.
  - Cons: any partition that mentions a `Selector` type (which is most of
    them) would have to import from `anchor/`. Breaks the
    "shared has no internal imports" invariant of DependencyMap §4.
- **C. Split selector kinds: text-quote in shared, PDF-rect in anchor**
  - Pros: only adapter-specific selectors leave shared.
  - Cons: forces a discriminated union spanning two packages, so type
    narrowing becomes painful for consumers.
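
The narrowing concern in option C is easier to see with the union written out. The sketch below shows what a single-module `Selector` union (as in option A) could look like; the field names and `describeSelector` are assumptions for illustration and are not taken from `src/shared/selector.ts`.

```typescript
// Hypothetical selector shapes; only the two kinds named above are modeled.
interface TextQuoteSelector {
  type: "text-quote";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface PdfRectSelector {
  type: "pdf-rect";
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// With both members declared in one module, consumers import a single type.
type Selector = TextQuoteSelector | PdfRectSelector;

// Consumers narrow on the `type` discriminant. The `never` assignment makes
// the compiler flag this switch if a new selector kind is added to the union.
function describeSelector(selector: Selector): string {
  switch (selector.type) {
    case "text-quote":
      return `quote "${selector.exact}"`;
    case "pdf-rect":
      return `rect on page ${selector.page}`;
    default: {
      const unreachable: never = selector;
      throw new Error(`unknown selector: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Under option C, `PdfRectSelector` would be declared in `anchor/`, so any consumer that wants the full union and this exhaustiveness check would have to import from `anchor/` as well.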
## Decision
(blank — option A is the working assumption codified in SharedContracts.md;
fill this in if a future use case challenges it.)
## Consequences
(blank)