citation-evidence/workplans/CE-WP-0001-foundations.md
tegwick 2f25f99cae Implement CE-WP-0001 Foundations: TS scaffold, lint boundaries, normalize v1, fixtures
T01 Toolchain — vite + pnpm 9.15 + React 18 + strict TS (ADR-0001).
T02 Folder layout — src/{shared,engine,anchor,source,binder,work,app}/
    mirroring the future subsystem split, with path aliases.
T03 Boundary lint — eslint-plugin-boundaries enforcing the dependency
    edges from wiki/DependencyMap.md §4; verified by a violating fixture.
T04 Canonical normalization v1 — src/shared/text/normalize.ts with
    NORMALIZE_VERSION=1; 10/10 vitest covering ligatures, CRLF, soft
    hyphens (including line-break reassembly), mixed whitespace.
T05 PDF fixture corpus — 7 user-supplied German PDFs in fixtures/pdfs/
    (gitignored binaries) plus a manifest with verbatim known-good
    quotes and page counts, ready for CE-WP-0002 selector tests.
T06 README upgrade — umbrella README points at wiki/docs/workplans
    and documents the dev workflow.
T07 ADR-0002..0006 stubs in docs/decisions/.

Toolchain end-to-end: pnpm install + lint + typecheck + test all green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 00:13:03 +02:00

---
id: CE-WP-0001
type: workplan
title: "Foundations — TS scaffold, folder layout, lint boundaries, normalization, fixtures"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
topic_slug: citation_evidence_mvp
topic_id: 96fa8e80-9f74-40f2-84cd-644e9747b9ec
state_hub_workstream_id: 1737bf6e-a3cb-413e-81b8-932f6f85791c
status: done
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
- wiki/DependencyMap.md
---
# CE-WP-0001 — Foundations
Establish the skeleton of the umbrella-first MVP: a TypeScript project with
a folder layout that mirrors the future subsystem split (so that extracting
to sister repos later is a `git mv` plus a `package.json` cut), lint rules
that enforce the dependency map at the folder level, the versioned
canonical-text normalization function, and a small but representative PDF
fixtures corpus.

No product features yet. This workplan exists so that everything from
`CE-WP-0002` onward has somewhere to land.
## Decisions captured here
Each task below corresponds to a Phase-0 ADR. The ADR lives at
`docs/decisions/ADR-NNNN-<slug>.md`. If a task involves a choice that wasn't
already decided, the agent stops and asks Bernd before writing code.
## Dependency Order
```
T01 (toolchain decision + package.json)
├─ T02 (folder layout per DependencyMap §4)
│  ├─ T03 (lint rules enforcing dep edges)
│  ├─ T04 (canonical text normalization v1, versioned)
│  └─ T06 (README upgrade + dev workflow doc)
├─ T05 (fixtures: 5+ representative PDFs + a manifest)
└─ T07 (write the six pending ADRs as stubs)
```
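
The ordering can also be derived mechanically from the `depends_on` fields of the task blocks in this workplan. The following is a minimal sketch, not part of the deliverables; `DEPENDS_ON` and `executionOrder` are hypothetical names introduced here for illustration.

```typescript
// depends_on edges copied from the task blocks in this workplan.
const DEPENDS_ON: Record<string, string[]> = {
  T01: [],
  T02: ["T01"],
  T03: ["T02"],
  T04: ["T02"],
  T05: ["T01"],
  T06: ["T01", "T02"],
  T07: ["T01"],
};

// Returns the tasks in an order where every task comes after its dependencies.
// Throws if the graph contains a cycle or references an unknown task id.
function executionOrder(deps: Record<string, string[]>): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  const tasks = Object.keys(deps);
  while (order.length < tasks.length) {
    // A task is ready once all of its dependencies have been scheduled.
    const ready = tasks.filter(
      (t) => !done.has(t) && deps[t].every((d) => done.has(d)),
    );
    if (ready.length === 0) throw new Error("cycle or unknown dependency");
    for (const t of ready) {
      done.add(t);
      order.push(t);
    }
  }
  return order;
}

console.log(executionOrder(DEPENDS_ON).join(" -> "));
```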
---
## T01 — Toolchain + package.json + tsconfig
```task
id: CE-WP-0001-T01
state_hub_task_id: 4de816d0-34de-4bdf-a802-da1b0feefc19
priority: critical
status: done
```
Decide the TS toolchain (vite vs tsc-only vs Next.js) and write a single
`package.json` at the repo root. Decisions to lock in this task as an ADR
(`docs/decisions/ADR-0001-toolchain.md`):
- Bundler: vite (recommended, fastest dev loop for a React MVP)
- Package manager: pnpm (recommended, plays well with future workspace split)
- React 18+
- Strict TS
Deliverables:
- `package.json` with `dev`, `build`, `test`, `lint`, `typecheck` scripts
- `tsconfig.json` with strict mode, paths for the `src/` partitions
- `.nvmrc` pinning Node version
- `docs/decisions/ADR-0001-toolchain.md` written and committed
Do not install application dependencies yet — just the toolchain.
---
## T02 — Folder layout matching DependencyMap §4
```task
id: CE-WP-0001-T02
state_hub_task_id: 448d2d93-9517-4649-8aac-e00907a12a0a
priority: critical
status: done
depends_on: [T01]
```
Create the source folder layout:
```
src/
  shared/   # will become @citation-evidence/engine (types + contracts)
  engine/   # will become @citation-evidence/engine (services)
  anchor/   # will become @citation-evidence/anchor
  source/   # will become @citation-evidence/source
  work/     # will become @citation-evidence/work (UI)
  binder/   # will become @citation-evidence/binder
  app/      # the reference workspace shell
```
Each folder gets:
- A one-line `README.md` stating its future home
- An `index.ts` that re-exports its public API (empty for now)
Add path aliases in `tsconfig.json`: `@shared/*`, `@engine/*`, etc.
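
To illustrate the alias scheme, here is a small sketch that generates the `compilerOptions.paths` entries from the partition names. The helper name `buildTsconfigPaths` and the `baseUrl` value are assumptions; the workplan only fixes the `@<partition>/*` pattern. The bundler and test runner may need matching alias configuration of their own.

```typescript
// Partition names taken from the folder layout above.
const PARTITIONS = [
  "shared", "engine", "anchor", "source", "binder", "work", "app",
] as const;

// Builds the `paths` object for tsconfig.json,
// e.g. "@shared/*" -> ["src/shared/*"].
// `srcRoot` defaults to "src"; any other value is the caller's choice.
function buildTsconfigPaths(srcRoot = "src"): Record<string, string[]> {
  const paths: Record<string, string[]> = {};
  for (const name of PARTITIONS) {
    paths[`@${name}/*`] = [`${srcRoot}/${name}/*`];
  }
  return paths;
}

// Prints a fragment that could be merged into tsconfig.json by hand.
console.log(
  JSON.stringify(
    { compilerOptions: { baseUrl: ".", paths: buildTsconfigPaths() } },
    null,
    2,
  ),
);
```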
---
## T03 — Lint rules enforcing dependency edges
```task
id: CE-WP-0001-T03
state_hub_task_id: abd08afb-78e5-4b41-b956-53e5605c1113
priority: high
status: done
depends_on: [T02]
```
Install `eslint-plugin-boundaries` (or equivalent) and configure rules per
`wiki/DependencyMap.md` §4:

| Folder       | May import from                                  |
|--------------|--------------------------------------------------|
| `shared/` | (nothing internal) |
| `engine/` | `shared/` |
| `anchor/` | `shared/`, `engine/` |
| `source/` | `shared/`, `engine/` |
| `binder/` | `shared/`, `engine/`, `anchor/` |
| `work/` | `shared/`, `engine/`, `anchor/`, `source/` |
| `app/`       | any                                              |

Add a failing test fixture that imports `source/` from `binder/` and confirm
lint catches it; remove the fixture afterward.

`pnpm lint` must pass on a clean tree.
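
The table can be read as a plain lookup from importing folder to allowed targets. The sketch below expresses that lookup directly in TypeScript; it is not the `eslint-plugin-boundaries` configuration, and `MAY_IMPORT` and `mayImport` are hypothetical names. Treating imports within the same folder as allowed is an assumption.

```typescript
type Partition =
  | "shared" | "engine" | "anchor" | "source" | "binder" | "work" | "app";

// Allowed import edges, transcribed from the table above.
const MAY_IMPORT: Record<Partition, readonly Partition[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared", "engine"],
  source: ["shared", "engine"],
  binder: ["shared", "engine", "anchor"],
  work: ["shared", "engine", "anchor", "source"],
  app: ["shared", "engine", "anchor", "source", "binder", "work"], // "any"
};

// True when a file in `from` is allowed to import from `to`.
// Assumption: imports inside the same partition are always allowed.
function mayImport(from: Partition, to: Partition): boolean {
  return from === to || MAY_IMPORT[from].includes(to);
}

console.log(mayImport("binder", "source")); // false: the edge the T03 fixture violates
console.log(mayImport("work", "source")); // true
```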
---
## T04 — Canonical text normalization v1
```task
id: CE-WP-0001-T04
state_hub_task_id: 0ca4f848-20c5-425e-8996-a73569c9be16
priority: critical
status: done
depends_on: [T02]
```
Implement `src/shared/text/normalize.ts` per `wiki/SharedContracts.md` §6:
1. Unicode NFC
2. Normalize line endings to `\n`
3. Collapse horizontal whitespace runs to a single space
4. Strip soft hyphens (U+00AD)
5. Preserve paragraph boundaries (`\n\n`)
Public API:
```ts
export const NORMALIZE_VERSION = 1;
export function normalize(input: string): { text: string; version: number };
```
Include unit tests covering: ligatures, CRLF input, soft-hyphenated German,
mixed whitespace, paragraph preservation.
Stored selectors will record this version number so that future normalization
changes can be detected as a migration concern.
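
A minimal sketch of the five steps, assuming that "horizontal whitespace" means space, tab, and no-break space. Soft hyphens are stripped before whitespace is collapsed (the reverse of the list order) so that removing a soft hyphen between two spaces cannot leave a double space behind. NFC alone leaves ligature code points such as U+FB01 unchanged, so any ligature handling would need an extra step that is not shown here.

```typescript
export const NORMALIZE_VERSION = 1;

export function normalize(input: string): { text: string; version: number } {
  const text = input
    .normalize("NFC") // step 1: Unicode NFC
    .replace(/\r\n?/g, "\n") // step 2: CRLF and lone CR become LF
    .replace(/\u00AD/g, "") // step 4: strip soft hyphens
    .replace(/[ \t\u00A0]+/g, " "); // step 3: collapse horizontal runs (assumed character set)
  // Step 5: newlines are never touched above, so "\n\n" paragraph
  // boundaries pass through unchanged.
  return { text, version: NORMALIZE_VERSION };
}

// Soft-hyphenated German with CRLF and a double space:
console.log(JSON.stringify(normalize("Zu\u00ADsam\u00ADmen\r\nfas  sung").text));
// prints "Zusammen\nfas sung"
```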
---
## T05 — PDF fixtures corpus + manifest
```task
id: CE-WP-0001-T05
state_hub_task_id: 0b686530-ef89-4172-b5c8-de97fa7b7ef0
priority: high
status: done
depends_on: [T01]
```
Assemble `fixtures/pdfs/` with at least 5 representative PDFs:
- A simple single-column text PDF
- A two-column academic PDF (e.g. ACM-style)
- A German PDF with umlauts and soft hyphens
- A form PDF (e.g. a public-sector application form)
- A PDF with a heading hierarchy
Write `fixtures/pdfs/manifest.json` recording for each:
- filename
- short description
- expected page count
- one short "known-good quote" with the page number it appears on (used by
  CE-WP-0002 selector tests)
Keep each PDF small (< 1 MB) and check sources/licenses into
`fixtures/pdfs/SOURCES.md`. Public-domain or Bernd-authored only.
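
One possible shape for a manifest entry, with a small sanity check. The workplan lists the fields but not their JSON keys, so the key names, the 1-based page numbering, and the `validateEntry` helper are all assumptions made for this sketch; the example entry is a placeholder, not a real fixture.

```typescript
// Key names are illustrative only.
interface FixtureEntry {
  filename: string;
  description: string;
  pageCount: number;
  knownGoodQuote: { text: string; page: number }; // page assumed 1-based
}

// Returns human-readable problems; an empty list means the entry looks sane.
function validateEntry(e: FixtureEntry): string[] {
  const problems: string[] = [];
  if (!e.filename.toLowerCase().endsWith(".pdf")) {
    problems.push("filename is not a .pdf");
  }
  if (!Number.isInteger(e.pageCount) || e.pageCount < 1) {
    problems.push("pageCount must be a positive integer");
  }
  const { text, page } = e.knownGoodQuote;
  if (text.trim() === "") problems.push("quote is empty");
  if (!Number.isInteger(page) || page < 1 || page > e.pageCount) {
    problems.push("quote page is out of range");
  }
  return problems;
}

// Placeholder entry for illustration.
const example: FixtureEntry = {
  filename: "example.pdf",
  description: "placeholder description",
  pageCount: 3,
  knownGoodQuote: { text: "placeholder quote", page: 2 },
};

console.log(validateEntry(example)); // []
```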
---
## T06 — README upgrade + dev workflow doc
```task
id: CE-WP-0001-T06
state_hub_task_id: b0a5b5a4-81f0-4359-a6e1-67bc6c77e52b
priority: medium
status: done
depends_on: [T01, T02]
```
Replace the one-line `README.md` with a real one:
- What citation-evidence is (one paragraph from INTENT)
- Repository layout (point at `src/` partitions and what each becomes)
- Where to find docs (`wiki/`, `docs/decisions/`, `history/`, `workplans/`)
- Dev workflow: `pnpm install`, `pnpm dev`, `pnpm test`, `pnpm lint`
- Pointer to `~/ralph-workplan/` for how workplans are driven
Add a one-paragraph `README.md` to each of the five sister repos that points
back at this umbrella and reminds readers that code lives upstream during
the MVP phase.
---
## T07 — Stub the six pending ADRs
```task
id: CE-WP-0001-T07
state_hub_task_id: 15456374-73e0-403e-b805-2e259247e615
priority: medium
status: done
depends_on: [T01]
```
Create stub files in `docs/decisions/` for each ADR mentioned in
`wiki/SharedContracts.md` §10:
- `ADR-0001-toolchain.md` (filled in by T01)
- `ADR-0002-monorepo-vs-polyrepo.md`
- `ADR-0003-w3c-mapping-scope.md`
- `ADR-0004-pdf-viewer-library.md`
- `ADR-0005-persistence.md`
- `ADR-0006-selector-ownership-split.md`
Each stub contains: title, status (`proposed` for ADR-0002 through ADR-0006),
context (one paragraph explaining what the decision is about and why it
matters), options (bullet list with pros/cons), decision (blank),
consequences (blank).

These are not decisions yet; they are *the questions that must be answered
before the relevant code lands*. The MVP can proceed without ADR-0002 through
ADR-0006 being resolved because no extraction or persistence happens until
later workplans.
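
A sketch of how a stub with the sections listed above could be rendered. The Markdown heading names, the `AdrStub` type, and both helper functions are assumptions; only the file-name pattern `ADR-NNNN-<slug>.md` and the list of sections come from this workplan. The example values are placeholders.

```typescript
interface AdrStub {
  number: number;
  slug: string;
  title: string;
  status: string; // e.g. "proposed"
  context: string;
  options: string[];
}

// ADR-NNNN-<slug>.md, zero-padded to four digits.
function adrFilename(adr: Pick<AdrStub, "number" | "slug">): string {
  return `ADR-${String(adr.number).padStart(4, "0")}-${adr.slug}.md`;
}

// Renders the stub body; Decision and Consequences are left blank on purpose.
function renderAdrStub(adr: AdrStub): string {
  return [
    `# ADR-${String(adr.number).padStart(4, "0")}: ${adr.title}`,
    "",
    `Status: ${adr.status}`,
    "",
    "## Context",
    adr.context,
    "",
    "## Options",
    ...adr.options.map((o) => `- ${o}`),
    "",
    "## Decision",
    "",
    "## Consequences",
    "",
  ].join("\n");
}

const stub: AdrStub = {
  number: 5,
  slug: "persistence",
  title: "Persistence", // placeholder title
  status: "proposed",
  context: "TODO: one paragraph on what is being decided and why it matters.",
  options: ["TODO: option A (pros/cons)", "TODO: option B (pros/cons)"],
};

console.log(adrFilename(stub));
console.log(renderAdrStub(stub));
```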