citation-evidence/workplans/CE-WP-0001-foundations.md
tegwick d06a456c2a Establish shared-contracts home, dependency map, MVP workplans, and umbrella-first strategy
- INTENT.md: declare umbrella as the home for shared contracts; document
  umbrella-first MVP decision (code lives here until subsystems stabilize)
- wiki/SharedContracts.md: vocabulary, state enums, relation types,
  selector taxonomy, event vocabulary, viewer adapter contract,
  canonical text normalization, rect-registry contract
- wiki/DependencyMap.md: allowed dependency edges; folder layout +
  lint-rule strategy during umbrella-first phase
- history/2026-05-24-initial-assessment.md: alignment review, technical
  risks, and the umbrella-first pivot rationale
- workplans/CE-WP-0001..0004: four ralph-compatible workplans covering
  foundations, PDF review slice, form binding + visual guide, and
  citation card export — implementing PRD §20 end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 16:42:25 +02:00


---
id: CE-WP-0001
type: workplan
title: "Foundations — TS scaffold, folder layout, lint boundaries, normalization, fixtures"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
status: todo
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
- wiki/DependencyMap.md
---
# CE-WP-0001 — Foundations
Establish the skeleton of the umbrella-first MVP: a TypeScript project with
a folder layout that mirrors the future subsystem split (so that extracting
to sister repos later is a `git mv` plus a `package.json` cut), lint rules
that enforce the dependency map at the folder level, the versioned
canonical-text normalization function, and a small but representative PDF
fixtures corpus.
No product features yet. This workplan exists so that everything from
`CE-WP-0002` onward has somewhere to land.
## Decisions captured here
Each task below corresponds to a Phase-0 ADR. The ADR lives at
`docs/decisions/ADR-NNNN-<slug>.md`. If a task involves a choice that wasn't
already decided, the agent stops and asks Bernd before writing code.
## Dependency Order
```
T01 (toolchain decision + package.json)
  └─ T02 (folder layout per DependencyMap §4)
       └─ T03 (lint rules enforcing dep edges)
       └─ T04 (canonical text normalization v1, versioned)
       └─ T06 (README upgrade + dev workflow doc)
  └─ T05 (fixtures: 5+ representative PDFs + a manifest)
  └─ T07 (write the six pending ADRs as stubs)
```
---
## T01 — Toolchain + package.json + tsconfig
```task
id: CE-WP-0001-T01
priority: critical
status: todo
```
Decide the TS toolchain (vite vs tsc-only vs Next.js) and write a single
`package.json` at the repo root. Decisions to lock in this task as an ADR
(`docs/decisions/ADR-0001-toolchain.md`):
- Bundler: vite (recommended, fastest dev loop for a React MVP)
- Package manager: pnpm (recommended, plays well with future workspace split)
- React 18+
- Strict TS
Deliverables:
- `package.json` with `dev`, `build`, `test`, `lint`, `typecheck` scripts
- `tsconfig.json` with strict mode, paths for the `src/` partitions
- `.nvmrc` pinning Node version
- `docs/decisions/ADR-0001-toolchain.md` written and committed
Do not install application dependencies yet — just the toolchain.
---
## T02 — Folder layout matching DependencyMap §4
```task
id: CE-WP-0001-T02
priority: critical
status: todo
depends_on: [T01]
```
Create the source folder layout:
```
src/
  shared/   # will become @citation-evidence/engine (types + contracts)
  engine/   # will become @citation-evidence/engine (services)
  anchor/   # will become @citation-evidence/anchor
  source/   # will become @citation-evidence/source
  work/     # will become @citation-evidence/work (UI)
  binder/   # will become @citation-evidence/binder
  app/      # the reference workspace shell
```
Each folder gets:
- A one-line `README.md` stating its future home
- An `index.ts` that re-exports its public API (empty for now)
Add path aliases in `tsconfig.json`: `@shared/*`, `@engine/*`, etc.
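The alias scheme can be derived mechanically from the partition list. The sketch below is illustrative only: `PARTITIONS` and `buildTsconfigPaths` are hypothetical names, and it assumes `baseUrl` is the repo root and that each alias maps to `src/<partition>/*`.

```ts
// Hypothetical helper: derives the `compilerOptions.paths` entries for
// tsconfig.json from the list of src/ partitions. Not part of the repo.
const PARTITIONS = ["shared", "engine", "anchor", "source", "work", "binder", "app"] as const;

type Partition = (typeof PARTITIONS)[number];

// Assumes `baseUrl` is the repo root, so targets are written relative to it.
function buildTsconfigPaths(partitions: readonly Partition[]): Record<string, string[]> {
  const paths: Record<string, string[]> = {};
  for (const p of partitions) {
    paths[`@${p}/*`] = [`src/${p}/*`];
  }
  return paths;
}

// Print a tsconfig fragment that could be pasted into tsconfig.json.
const paths = buildTsconfigPaths(PARTITIONS);
console.log(JSON.stringify({ compilerOptions: { baseUrl: ".", paths } }, null, 2));
```

Note that a wildcard pattern such as `@shared/*` does not match the bare specifier `@shared`. If imports of a partition's `index.ts` should read `from "@shared"`, an additional non-wildcard entry would be needed.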
---
## T03 — Lint rules enforcing dependency edges
```task
id: CE-WP-0001-T03
priority: high
status: todo
depends_on: [T02]
```
Install `eslint-plugin-boundaries` (or equivalent) and configure rules per
`wiki/DependencyMap.md` §4:
| Folder | May import from |
|--------------|--------------------------------------------------|
| `shared/` | (nothing internal) |
| `engine/` | `shared/` |
| `anchor/` | `shared/`, `engine/` |
| `source/` | `shared/`, `engine/` |
| `binder/` | `shared/`, `engine/`, `anchor/` |
| `work/` | `shared/`, `engine/`, `anchor/`, `source/` |
| `app/` | any |
Add a failing test fixture that imports `source/` from `binder/` and confirm
lint catches it; remove the fixture afterward.
`pnpm lint` must pass on a clean tree.
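For reference, the import table above can be written down as plain data. This is a sketch of the rule set itself, not `eslint-plugin-boundaries` configuration; `ALLOWED_IMPORTS` and `isImportAllowed` are hypothetical names, and treating same-folder imports as always allowed is an assumption.

```ts
// Plain-TypeScript encoding of the import table; NOT plugin configuration.
type Folder = "shared" | "engine" | "anchor" | "source" | "binder" | "work" | "app";

const ALLOWED_IMPORTS: Record<Folder, readonly Folder[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared", "engine"],
  source: ["shared", "engine"],
  binder: ["shared", "engine", "anchor"],
  work: ["shared", "engine", "anchor", "source"],
  // "any" in the table: app may import from every other partition.
  app: ["shared", "engine", "anchor", "source", "binder", "work"],
};

// Assumption: imports within the same folder are always allowed.
function isImportAllowed(from: Folder, to: Folder): boolean {
  return from === to || ALLOWED_IMPORTS[from].includes(to);
}

console.log(isImportAllowed("binder", "source")); // false: the case the lint fixture exercises
console.log(isImportAllowed("work", "source")); // true
```

A table like this can also double as a unit-test oracle for whatever lint configuration is eventually chosen.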
---
## T04 — Canonical text normalization v1
```task
id: CE-WP-0001-T04
priority: critical
status: todo
depends_on: [T02]
```
Implement `src/shared/text/normalize.ts` per `wiki/SharedContracts.md` §6:
1. Unicode NFC
2. Normalize line endings to `\n`
3. Collapse horizontal whitespace runs to a single space
4. Strip soft hyphens (U+00AD)
5. Preserve paragraph boundaries (`\n\n`)
Public API:
```ts
export const NORMALIZE_VERSION = 1;
export function normalize(input: string): { text: string; version: number };
```
Include unit tests covering: ligatures (NFC leaves compatibility ligatures
such as `ﬁ` unchanged; only NFKC would expand them), CRLF input,
soft-hyphenated German, mixed whitespace, paragraph preservation.
Stored selectors will record this version number so that future normalization
changes can be detected as a migration concern.
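A minimal sketch of the five steps, applied in the order listed. `wiki/SharedContracts.md` §6 remains the authority; two readings here are assumptions: "horizontal whitespace" is taken to mean any whitespace character other than `\n`, and "preserve paragraph boundaries" is taken to mean that runs of three or more newlines collapse to `\n\n` while single newlines stay as they are.

```ts
export const NORMALIZE_VERSION = 1;

export function normalize(input: string): { text: string; version: number } {
  const text = input
    // 1. Unicode NFC (canonical composition; compatibility ligatures are left as-is)
    .normalize("NFC")
    // 2. CRLF and lone CR both become LF
    .replace(/\r\n?/g, "\n")
    // 3. Assumption: any run of whitespace other than "\n" collapses to one space
    .replace(/[^\S\n]+/g, " ")
    // 4. Strip soft hyphens (U+00AD)
    .replace(/\u00AD/g, "")
    // 5. Assumption: three or more newlines collapse to one paragraph break
    .replace(/\n{3,}/g, "\n\n");
  return { text, version: NORMALIZE_VERSION };
}

// Soft-hyphenated German with CRLF paragraph break.
console.log(JSON.stringify(normalize("Über\u00ADsicht\r\n\r\nText")));
```

This sketch does not trim spaces that sit next to a newline, so a line such as `"a \n b"` keeps its spaces. Whether that is desired is a question for the contract, not something decided here.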
---
## T05 — PDF fixtures corpus + manifest
```task
id: CE-WP-0001-T05
priority: high
status: todo
depends_on: [T01]
```
Assemble `fixtures/pdfs/` with at least 5 representative PDFs:
- A simple single-column text PDF
- A two-column academic PDF (e.g. ACM-style)
- A German PDF with umlauts and soft hyphens
- A form PDF (e.g. a public-sector application form)
- A PDF with a heading hierarchy
Write `fixtures/pdfs/manifest.json` recording for each:
- filename
- short description
- expected page count
- one short "known-good quote" with the page number it appears on (used by
CE-WP-0002 selector tests)
Keep each PDF small (< 1 MB) and check sources/licenses into
`fixtures/pdfs/SOURCES.md`. Public-domain or Bernd-authored only.
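One possible shape for a manifest entry, with a small sanity check that a test could run over `manifest.json`. The field names (`expectedPageCount`, `knownGoodQuote`, and so on) and the 1-based page numbering are assumptions; the workplan only specifies what must be recorded, not how it is named.

```ts
// Hypothetical manifest entry shape; field names are illustrative.
interface FixtureEntry {
  filename: string;
  description: string;
  expectedPageCount: number;
  knownGoodQuote: { text: string; page: number }; // page assumed to be 1-based
}

// Returns human-readable problems; an empty array means the manifest looks sane.
function validateManifest(entries: readonly FixtureEntry[]): string[] {
  const problems: string[] = [];
  if (entries.length < 5) {
    problems.push(`expected at least 5 fixtures, got ${entries.length}`);
  }
  for (const e of entries) {
    if (!e.filename.toLowerCase().endsWith(".pdf")) {
      problems.push(`${e.filename}: not a .pdf file`);
    }
    if (!Number.isInteger(e.expectedPageCount) || e.expectedPageCount < 1) {
      problems.push(`${e.filename}: invalid expected page count`);
    }
    const { text, page } = e.knownGoodQuote;
    if (text.trim() === "") {
      problems.push(`${e.filename}: empty known-good quote`);
    }
    if (!Number.isInteger(page) || page < 1 || page > e.expectedPageCount) {
      problems.push(`${e.filename}: quote page ${page} is outside 1..${e.expectedPageCount}`);
    }
  }
  return problems;
}
```

The check deliberately does not open the PDFs; verifying that the quote really appears on the stated page belongs to the selector tests in CE-WP-0002.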
---
## T06 — README upgrade + dev workflow doc
```task
id: CE-WP-0001-T06
priority: medium
status: todo
depends_on: [T01, T02]
```
Replace the one-line `README.md` with a real one:
- What citation-evidence is (one paragraph from INTENT)
- Repository layout (point at `src/` partitions and what each becomes)
- Where to find docs (`wiki/`, `docs/decisions/`, `history/`, `workplans/`)
- Dev workflow: `pnpm install`, `pnpm dev`, `pnpm test`, `pnpm lint`
- Pointer to `~/ralph-workplan/` for how workplans are driven
Add a one-paragraph `README.md` in each of the five sister repos pointing
back at this umbrella + reminding readers that code lives upstream during
the MVP phase.
---
## T07 — Stub the six pending ADRs
```task
id: CE-WP-0001-T07
priority: medium
status: todo
depends_on: [T01]
```
Create stub files in `docs/decisions/` for each ADR mentioned in
`wiki/SharedContracts.md` §10:
- `ADR-0001-toolchain.md` (filled in by T01)
- `ADR-0002-monorepo-vs-polyrepo.md`
- `ADR-0003-w3c-mapping-scope.md`
- `ADR-0004-pdf-viewer-library.md`
- `ADR-0005-persistence.md`
- `ADR-0006-selector-ownership-split.md`
Each stub: title, status (`proposed` for ADR-0002 through ADR-0006), context
(one paragraph explaining what the decision is about and why it matters),
options (bullet list with pros/cons), decision (blank), consequences (blank).
These are not decisions yet: they are *the questions that must be answered
before the relevant code lands*. The MVP can proceed without ADR-0002 through
ADR-0006 being resolved because no extraction or persistence happens until
later workplans.
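The stub structure could be generated rather than typed by hand. The following is a sketch under assumptions: `adrFilename` and `renderAdrStub` are hypothetical helpers, and the Markdown heading layout is one plausible rendering of the fields listed above, not a prescribed template.

```ts
// Hypothetical input for one ADR stub; context and options are supplied by the author.
interface AdrStubInput {
  number: number;
  slug: string;
  title: string;
  context: string;
  options: string[];
}

// Builds the file name in the ADR-NNNN-<slug>.md pattern used under docs/decisions/.
function adrFilename(number: number, slug: string): string {
  return `ADR-${String(number).padStart(4, "0")}-${slug}.md`;
}

// Renders a stub with status "proposed" and empty Decision/Consequences sections.
function renderAdrStub(adr: AdrStubInput): string {
  return [
    `# ADR-${String(adr.number).padStart(4, "0")}: ${adr.title}`,
    "",
    "## Status",
    "proposed",
    "",
    "## Context",
    adr.context,
    "",
    "## Options",
    ...adr.options.map((o) => `- ${o}`),
    "",
    "## Decision",
    "",
    "## Consequences",
    "",
  ].join("\n");
}

console.log(adrFilename(2, "monorepo-vs-polyrepo")); // ADR-0002-monorepo-vs-polyrepo.md
```

The context paragraph and the options with their pros and cons still have to be written by a person; the helper only fixes the skeleton so that all six files look alike.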