Establish shared-contracts home, dependency map, MVP workplans, and umbrella-first strategy

- INTENT.md: declare umbrella as the home for shared contracts; document
  umbrella-first MVP decision (code lives here until subsystems stabilize)
- wiki/SharedContracts.md: vocabulary, state enums, relation types,
  selector taxonomy, event vocabulary, viewer adapter contract,
  canonical text normalization, rect-registry contract
- wiki/DependencyMap.md: allowed dependency edges; folder layout +
  lint-rule strategy during umbrella-first phase
- history/2026-05-24-initial-assessment.md: alignment review, technical
  risks, and the umbrella-first pivot rationale
- workplans/CE-WP-0001..0004: four ralph-compatible workplans covering
  foundations, PDF review slice, form binding + visual guide, and
  citation card export — implementing PRD §20 end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 16:42:25 +02:00
parent bc95737e6a
commit d06a456c2a
9 changed files with 1597 additions and 0 deletions


@@ -189,6 +189,59 @@ This repository should be:
---
## Home for Shared Contracts
This repository is the **single home for everything the sister repos must
agree on**. The canonical documents live in `wiki/`:
* `wiki/ProductRequirementsDocument.md` — what the product does
* `wiki/ArchitectureOverview.md` — how the subsystems compose
* `wiki/SharedContracts.md` — vocabulary, state enums, relation types, selector taxonomy, event types, viewer adapter contract, canonical text normalization
* `wiki/DependencyMap.md` — which subsystem may depend on which
* `docs/decisions/` — ADRs that resolve ambiguities and bind the contract
Sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`,
`citation-work`, `evidence-binder`) defer to these documents. When their
own `INTENT.md` files mention "shared contracts", they mean the documents
listed above.
Changes to shared contracts happen here, not in the sister repos.
---
## MVP Strategy — Umbrella-First (decided 2026-05-24)
**The MVP lives entirely in this repository before being segmented into the
sister repos.** This is a deliberate trade-off: fewer interface decisions up
front, more refactoring later when extraction happens.
The reasoning:
1. The architectural boundaries documented in the sister INTENT files are
hypotheses. We do not yet know which ones will hold up under real product
pressure.
2. Coordinating six repos with no working code is expensive. Coordinating one
repo with working code is cheap.
3. Interfaces designed in advance of implementation tend to be wrong.
4. Extracting working code into a new repo is a known, bounded refactor.
Reshaping a premature interface while implementing against it is not.
Concretely:
* All MVP source code lives under `citation-evidence/src/`, partitioned by
future-repo names (`shared/`, `engine/`, `anchor/`, `source/`, `work/`,
`binder/`, `app/`).
* The `DependencyMap.md` rules are enforced by lint rules on these folders.
* The five sister repos remain INTENT-only during MVP — they document the
intended boundary, not current code.
* When a subsystem's interface stabilizes (typically after the MVP scenario
has run end-to-end at least once), its `src/<repo-name>/` slice extracts
to the sister repo.
This INTENT will be updated when extraction happens.
---
## Success Criteria
The repository is successful when it allows a developer or agent to understand, run, and extend the citation-evidence system as an integrated product.


@@ -0,0 +1,113 @@
# Initial Assessment — citation-evidence ecosystem
**Date:** 2026-05-24
**Author:** Claude (Opus 4.7), commissioned by Bernd
**Scope:** Review of `citation-evidence` umbrella PRD and Architecture overview, plus all five sister-repo `INTENT.md` files, for alignment, risk, and recommended approach.
---
## 1. Overall alignment across the six INTENT.md files
The vocabulary is impressively coherent: every repo speaks of
`Document → DocumentRepresentation → Annotation → Selector → EvidenceItem → EvidenceLink → CitationCard`.
Each `INTENT.md` follows the same Purpose / Scope / Out-of-Scope / Architectural Position / First-Useful-Version / Success Criteria shape.
Out-of-scope sections show the authors deliberately *pushing* responsibilities into other repos — a healthy signal.
The PRD and Architecture overview in `citation-evidence/wiki/` are also internally consistent: the PRD's functional requirements map cleanly to the architecture's data flows and to subsystem scopes.
But the documents were authored in quick succession (all on 2026-05-24, within ~30 minutes of each other based on file timestamps) and **never reconciled against each other**, which created the issues below.
## 2. What should be improved
### 2.1 Concrete ownership ambiguities to resolve in short ADRs
| Concept | Conflict |
|---|---|
| **`Selector` types** | `citation-engine` claims it as a "key concept owned"; `evidence-anchor`'s scope lists "selector type definitions". Likely fix: *interfaces* in engine, *creation/resolution/algorithms* in anchor. |
| **`EvidenceLink` / `EvidenceSet`** | Engine claims both as owned domain types; `evidence-binder` lists "evidence-to-target binding model" and "evidence sets" in scope. Same engine-defines-type / binder-owns-behavior split needed. |
| **Status enums** | Architecture's `EvidenceItem.status` is `candidate\|confirmed\|rejected\|needs-check`. `citation-work` adds `strong-support\|weak-support\|contradicts`. `evidence-binder` adds *target-specific* states (`conflicting-evidence`, `insufficient-evidence`, `verified`) plus extra relations (`context-for`, `derived-from`, `needs-check`). Three repos inventing overlapping enums. |
| **Viewer adapters** | Architecture diagram shows them as a separate box, no owner. Adapter methods (`load`, `createSelectorsFromSelection`, `resolveSelectors`, `scrollToResolvedTarget`, `renderHighlight`) straddle `evidence-source` and `evidence-anchor`. Pick one home (likely `evidence-anchor`, with `evidence-source` providing the representation). |
| **`CitationRecoveryAttempt`** | Type in engine, behavior in `evidence-source` — semantic ownership split that will rot. |
| **Document review status (FR-006)** | No repo claims it; `citation-work` hints "may later be moved into a shared model". |
### 2.2 Repository scaffolding gaps
- The umbrella architecture (§3.1) promises `apps/workspace-demo/`, `docs/decisions/`, `integration-tests/`, `docker-compose.yml` — none of this exists yet.
- All six READMEs are essentially empty (1 line). New contributors and agents won't know where to start.
- `citation-evidence` is **not registered in the state-hub**. For a project that splits across six repos, you lose central memory of decisions/dependencies/progress without it.
### 2.3 Architectural decisions still pending
ADR-001 through ADR-005 in the architecture doc are framed as "recommendations" rather than commitments. Each blocks code:
- React-first vs web-component-first (drives repo packaging)
- Local-first vs server-first storage (drives persistence interface shape)
- W3C internal model vs mapping (drives every type definition)
- `react-pdf-highlighter-plus` vs PDF.js direct (drives MVP timeline by weeks)
- Recovery scope local-only vs external
### 2.4 Missing cross-repo contract artefacts
There is no central dependency map. Each repo says "I expect to depend on X" but nothing names which repo *publishes* the shared types package(s). Pick monorepo (pnpm workspace) vs polyrepo with published `@citation-evidence/engine` npm packages before the first commit of code lands — switching later is painful.
## 3. Technical risks to inspect first
In rough order of "if this is broken, the architecture doesn't work":
1. **PDF canonical-text stability** — the entire selector/anchor model assumes a given PDF + extraction pipeline produces *the same* canonical text each time. PDF.js text extraction has known issues with multi-column layouts, custom-glyph fonts, ligatures, soft hyphens, and reading order. Build a corpus of 15-20 representative PDFs (governmental forms, two-column papers, scanned-then-OCR'd, German umlauts) and confirm round-trip selector resolution before committing to the model.
2. **`react-pdf-highlighter-plus` abstraction leakage** — this library is opinionated; wrapping it cleanly while keeping the engine viewer-independent is the central architectural test. Do a focused spike: load PDF → select → store selectors as JSON → reload page → resolve from JSON → highlight. If this leaks PDF.js types into the engine API, the boundary fails on day one.
3. **Canonical-text normalization is a silent migration** — every stored annotation's `TextQuoteSelector` / `TextPositionSelector` depends on the *exact* normalization rules used at creation time. Treat normalization as a versioned, deterministic function from day one. If you change Unicode normalization or whitespace handling later, every stored annotation breaks silently.
4. **Visual guide overlay coupling**: `evidence-binder` owns the visual-guide *model*, but rendering needs DOM rects from three sources: the form (binder's UI?), the evidence sidebar (`citation-work`), and the document highlight (viewer adapter). Three subsystems contributing rects to one overlay is the highest-coupling part of the system. Define an explicit *rect registry* contract before any of them ships UI.
5. **CSS Custom Highlight API support** — architecture mentions it for HTML/Markdown with fallback. Browser support is uneven; the fallback (usually DOM range-based span wrapping) is what will actually run on most users' machines. Verify the fallback path is acceptable, not the optimistic primary.
6. **W3C Web Annotation mapping is not free** — JSON-LD selectors can express things your internal model can't (and vice versa). Round-tripping is a research task, not a one-day mapping. Decide whether mapping is "lossy but useful" or "MUST round-trip" before stabilizing types.
7. **Multi-repo dependency cycle risk** — engine ↔ anchor (`Selector` ownership), engine ↔ source (`RecoveryAttempt`), engine ↔ binder (`Link`/`Set`) all currently look bidirectional in the INTENT files. Without a strict "types-only flow downward, behavior flows upward" rule, you will hit `npm install` cycles.
## 4. Rough approach (original phased plan)
**Phase 0 — Foundations (1-2 weeks, no production code)**
- Register `citation-evidence` as a state-hub domain + register all six repos
- Write 5-7 micro-ADRs in `citation-evidence/docs/decisions/` resolving the ownership ambiguities above
- Pick monorepo-vs-polyrepo and pin Node/TS toolchain
- Assemble a 15-20 PDF test corpus and check it into a fixtures location
- Write a real README for each repo pointing at INTENT + architecture
**Phase 1 — Vertical slice on the easiest format (4-6 weeks)**
- Engine: TS types + in-memory repos only
- Anchor: text-quote + text-position selectors, fuzzy match deferred
- Source: PDF text extraction + fingerprint only
- Work: one-document UI, sidebar, create annotation, click-to-reopen
- Umbrella: wire it into a reference app
- Goal: prove viewer-independence on PDFs end-to-end. No forms, no recovery, no Markdown.
**Phase 2 — Evidence binding & form mode (4 weeks)**
- Binder + visual-guide rect registry
- One form-schema example with side-by-side viewer
- This is where the active-state coordination claim gets stress-tested
**Phase 3 — Format expansion (4 weeks)**
- HTML adapter (sanitization + DOM range selectors)
- Markdown adapter
- Confirms the format-neutral claim
**Phase 4 — Local citation recovery (4 weeks)**
- Local-library search, exact + fuzzy quote match, confirmation UI
- Defer external source lookup until local pipeline is reliable
## 5. Pivot — umbrella-first MVP (decided 2026-05-24)
The user has chosen to **build the MVP entirely inside `citation-evidence`** before segmenting code into the sister repos. The reasoning: get the product working end-to-end with minimal coordination cost, then extract subsystems once the contracts have been validated by actual use.
This means:
- All MVP source code lives under `citation-evidence/` (likely `src/` partitioned by future-repo names: `engine/`, `anchor/`, `source/`, `work/`, `binder/`).
- The five sister repos remain as INTENT-only placeholders during MVP — they document the intended boundaries, but code will move in only when a subsystem's contract has stabilized.
- Interface design is explicitly deferred. Phase-0 ADRs become Phase-N extractions, informed by real friction points.
- Shared contracts live in `citation-evidence/wiki/SharedContracts.md` and `citation-evidence/wiki/DependencyMap.md`.
This trade-off accepts more rework later (when subsystems extract) in exchange for faster MVP velocity now and better-informed boundaries when extraction happens.

wiki/DependencyMap.md Normal file

@@ -0,0 +1,155 @@
# Dependency Map — citation-evidence
This document describes the **allowed dependency edges** between the
subsystems of the citation-evidence ecosystem. It is the cycle-prevention
contract.
It complements `SharedContracts.md` (which says *what* is shared) by saying
*who is allowed to depend on whom*.
---
## 1. The rule
> Types flow downward from `citation-engine`. Behavior flows upward into
> specialised repos. No subsystem may import another subsystem's behavior
> unless this map shows an edge.
The umbrella repo `citation-evidence` is allowed to depend on every
subsystem; nothing depends on the umbrella.
---
## 2. Allowed edges
```
                      ┌───────────────────────┐
                      │   citation-evidence   │  (umbrella)
                      └───────────┬───────────┘
                                  │ depends on
       ┌──────────────────────────┼────────────────────────────┐
       ▼                          ▼                            ▼
┌───────────────┐        ┌────────────────┐           ┌────────────────┐
│ citation-     │        │ evidence-      │           │ citation-      │
│ work          │        │ binder         │           │ engine         │
└──────┬────────┘        └────────┬───────┘           └────────┬───────┘
       │                          │                            │
       │ depends on               │ depends on                 │ depends on
       │                          │                            │ (nothing;
       ▼                          ▼                            │  leaf node)
┌────────────────┐       ┌────────────────┐                    │
│ evidence-      │       │ evidence-      │                    │
│ anchor         │       │ anchor         │                    │
└──────┬─────────┘       └────────┬───────┘                    │
       │                          │                            │
       │ depends on               │ depends on                 │
       ▼                          ▼                            ▼
┌────────────────┐       ┌────────────────┐            (citation-engine)
│ evidence-      │       │ citation-      │
│ source         │       │ engine         │
└────────┬───────┘       └────────────────┘
         │ depends on
┌────────────────┐
│ citation-      │
│ engine         │
└────────────────┘
```
In tabular form:
| Repo | May depend on | Must not depend on |
|--------------------|--------------------------------------------------------|-----------------------------------------|
| `citation-engine` | (nothing — it is the leaf) | every other subsystem |
| `evidence-anchor` | `citation-engine` | `evidence-source`, `citation-work`, `evidence-binder`, `citation-evidence` |
| `evidence-source` | `citation-engine` | `evidence-anchor`, `citation-work`, `evidence-binder`, `citation-evidence` |
| `evidence-binder` | `citation-engine`, `evidence-anchor` | `evidence-source`, `citation-work`, `citation-evidence` |
| `citation-work` | `citation-engine`, `evidence-anchor`, `evidence-source`| `evidence-binder`, `citation-evidence` |
| `citation-evidence`| all five subsystems | (nothing else in the ecosystem) |
Notes:
- `evidence-source` does NOT depend on `evidence-anchor`. When an ingestion
pipeline needs to know "could a selector resolve here?", the answer comes
through events, not direct calls.
- `citation-work` does NOT depend on `evidence-binder`. Linking evidence to
form fields is a separate workflow; the review workspace should function
without it. A separate "evidence-backed form" application composes work +
binder + engine.
- `evidence-binder` does NOT depend on `evidence-source`. When a binder needs
source context, it asks `evidence-anchor` to resolve the annotation, which
in turn knows nothing about how the document was ingested.
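Because this map is the cycle-prevention contract, the table can be encoded as data and checked mechanically. The following is a minimal sketch, not part of the commit: `allowedDeps` transcribes the "May depend on" column, and `findCycle` is a hypothetical helper name for a plain depth-first search.

```typescript
// Allowed edges transcribed from the table: repo -> repos it may depend on.
const allowedDeps: Record<string, string[]> = {
  "citation-engine": [],
  "evidence-anchor": ["citation-engine"],
  "evidence-source": ["citation-engine"],
  "evidence-binder": ["citation-engine", "evidence-anchor"],
  "citation-work": ["citation-engine", "evidence-anchor", "evidence-source"],
  "citation-evidence": [
    "citation-engine",
    "evidence-anchor",
    "evidence-source",
    "evidence-binder",
    "citation-work",
  ],
};

// Hypothetical helper: depth-first search that returns the first dependency
// cycle it finds (as a path that ends where it started), or null if none exists.
function findCycle(deps: Record<string, string[]>): string[] | null {
  const done = new Set<string>(); // nodes whose subtree is known to be cycle-free
  const stack: string[] = []; // current DFS path

  const visit = (node: string): string[] | null => {
    const at = stack.indexOf(node);
    if (at !== -1) return stack.slice(at).concat(node); // back edge: cycle
    if (done.has(node)) return null;
    stack.push(node);
    for (const next of deps[node] ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    stack.pop();
    done.add(node);
    return null;
  };

  for (const node of Object.keys(deps)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

Running `findCycle(allowedDeps)` on the table as written finds no cycle. Adding a forbidden edge, for example from `citation-engine` to `evidence-anchor`, makes it return the offending path.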
---
## 3. Communication channels
Direct imports are allowed only along the edges above. Where two subsystems
need to coordinate without being allowed to import each other, they use one
of these indirect channels:
| Channel | Owner | Notes |
|---------------------------------|------------------|---------------------------------------------------------|
| Shared event bus | `citation-engine`| Vocabulary frozen in `SharedContracts.md` §4 |
| Shared types package | `citation-engine`| Re-exported through `@citation-evidence/engine` (post-extraction) |
| Rect registry | `evidence-binder`| Used by form UI, evidence sidebar, viewer adapter |
| Persistence interfaces | `citation-engine`| Concrete adapters in subsystems but interfaces in engine|
---
## 4. During umbrella-first MVP
While all code lives in `citation-evidence/src/`, the rule is enforced by
**folder structure** and **lint rules**:
```
citation-evidence/src/
  shared/   ← what will become citation-engine (types + contracts)
  engine/   ← what will become citation-engine (services)
  anchor/   ← what will become evidence-anchor
  source/   ← what will become evidence-source
  work/     ← what will become citation-work (UI)
  binder/   ← what will become evidence-binder
  app/      ← the umbrella reference app
```
Lint rules (to be added in CE-WP-0001):
- `shared/` may not import from any other partition.
- `engine/` may import only from `shared/`.
- `anchor/` may import only from `shared/`, `engine/`.
- `source/` may import only from `shared/`, `engine/`.
- `binder/` may import only from `shared/`, `engine/`, `anchor/`.
- `work/` may import only from `shared/`, `engine/`, `anchor/`, `source/`.
- `app/` may import from any.
Violating these rules in MVP is a lint error, not a runtime error. When
subsystems extract into their own repos, the lint rule disappears and the
package boundary enforces the same constraint.
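The folder rules above reduce to a lookup table plus a path check. This is a minimal sketch of that logic, independent of any particular lint plugin. The names `mayImportFrom`, `folderOf`, and `isImportAllowed` are hypothetical, imports within the same folder are assumed to be allowed, and the `shared/` row follows the table in CE-WP-0001 T03.

```typescript
type Folder = "shared" | "engine" | "anchor" | "source" | "binder" | "work" | "app";

const ALL_FOLDERS: Folder[] = ["shared", "engine", "anchor", "source", "binder", "work", "app"];

// Allowed import targets per source folder, per the rules in this section.
const mayImportFrom: Record<Folder, Folder[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared", "engine"],
  source: ["shared", "engine"],
  binder: ["shared", "engine", "anchor"],
  work: ["shared", "engine", "anchor", "source"],
  app: ALL_FOLDERS,
};

// Extract the partition name from a path such as "src/binder/link.ts".
function folderOf(path: string): Folder | null {
  const match = /(?:^|\/)src\/([^/]+)\//.exec(path);
  const name = match ? match[1] : "";
  return (ALL_FOLDERS as string[]).indexOf(name) !== -1 ? (name as Folder) : null;
}

// True when `fromFile` may import `toFile` under the dependency map.
function isImportAllowed(fromFile: string, toFile: string): boolean {
  const from = folderOf(fromFile);
  const to = folderOf(toFile);
  if (from === null || to === null) return true; // outside the partitions: not covered here
  return from === to || mayImportFrom[from].indexOf(to) !== -1;
}
```

A real setup would express the same table in the lint plugin's configuration. The sketch is only meant to make the intended outcome testable, for instance that `binder/` importing from `source/` is rejected.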
---
## 5. Why these rules
1. **`citation-engine` as the leaf** prevents the most common monorepo pathology:
the "core" repo accumulating UI/IO dependencies because it was easier than
inverting a dependency.
2. **`citation-work` does not depend on `evidence-binder`**, which keeps the
review workspace usable even when there is no form context (e.g. just
collecting evidence for a report).
3. **`evidence-binder` does not depend on `evidence-source`**, which keeps
binding logic from accidentally caring about ingestion details.
4. **No subsystem depends on `citation-evidence`** — the umbrella is a
composition point, not a library.
---
## 6. Change process
Adding an edge to this map is a change to the contract.
- New edges require a short ADR in `docs/decisions/`.
- Removing an edge requires a refactoring plan (where do consumers go?).
- The MVP itself is an exception: edges that turn out to be wrong during
umbrella-first development are recorded as "deferred reshape" items in the
relevant workplan, not as ADRs.

wiki/SharedContracts.md Normal file

@@ -0,0 +1,296 @@
# Shared Contracts — citation-evidence
This document is the **single source of truth** for everything that more than one
subsystem in the citation-evidence ecosystem must agree on:
- the **vocabulary** (entity names and what they mean),
- the **canonical state enums** for entities that flow across repo boundaries,
- the **relation type** vocabulary,
- the **selector type** taxonomy,
- the **event type** vocabulary,
- the **ownership rules** for shared types versus shared behavior.
The five sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`,
`citation-work`, `evidence-binder`) defer to this document. When their
`INTENT.md` files refer to "shared contracts", they mean this file.
During the umbrella-first MVP phase, the **TypeScript implementations** of
these contracts live in `citation-evidence/src/shared/` and are imported by
the per-subsystem code under `citation-evidence/src/{engine,anchor,source,work,binder}/`.
When a subsystem extracts to its own repo, it takes its slice of the shared
types with it — but this document remains the canonical vocabulary.
---
## 1. Vocabulary
These nine entities are the vocabulary every subsystem uses.
| Entity | One-line definition | Owner (post-extraction) |
|---------------------------|----------------------------------------------------------------------------------------------------|-------------------------|
| `Document` | An identified source object: PDF, Markdown, HTML, scan, etc. | `citation-engine` |
| `DocumentRepresentation` | A normalized, addressable view of a document (canonical text, page map, structure). | `citation-engine` |
| `Selector` | A technical locator for a passage inside a representation. | `citation-engine` (types) / `evidence-anchor` (behavior) |
| `Annotation` | A technical mark on a document range, expressed as one or more selectors plus quote text. | `citation-engine` |
| `EvidenceItem` | A meaningful evidence object built from one or more annotations, with commentary and status. | `citation-engine` |
| `EvidenceSet` | An ordered group of evidence items associated with a target or topic. | `citation-engine` (type) / `evidence-binder` (behavior) |
| `EvidenceLink` | A relation between an `EvidenceItem` and a structured target (form field, claim, requirement, …). | `citation-engine` (type) / `evidence-binder` (behavior) |
| `CitationCard` | A renderable, exportable presentation of an evidence item. | `citation-engine` |
| `CitationRecoveryAttempt` | A traceable attempt to locate a cited passage from an external clue. | `citation-engine` (type) / `evidence-source` (behavior) |
**Ownership rule:** *types and interfaces flow downward from `citation-engine`;
behavior flows upward into the specialised repos*. Where the table shows a
split, the engine repo holds the data shape and the other repo holds the
algorithms and lifecycle.
---
## 2. Canonical state enums
These enums are the authoritative values. Subsystems must not invent local
variants without updating this document first.
### 2.1 `Annotation.resolutionStatus`
```
resolved — selectors located the passage with high confidence
ambiguous — multiple plausible candidates found
unresolved — no plausible candidate found
stale — representation has changed since selectors were stored
```
### 2.2 `EvidenceItem.status`
```
candidate — captured but not yet vetted
confirmed — verified by a user as useful evidence
rejected — explicitly discarded
needs-check — flagged for review
```
> **Note:** earlier subsystem drafts introduced `strong-support`, `weak-support`,
> and `contradicts` on the item. Those concepts now live on the **link**, not
> the item — see §2.4.
### 2.3 `Document.reviewStatus` (when used by `citation-work`)
```
unreviewed
in-review
relevant
rejected
needs-follow-up
cited
verified
```
`citation-work` may treat any of these as the active state; the canonical
storage lives on the Document record in `citation-engine`.
### 2.4 `EvidenceLink.status` (per target)
```
no-evidence
candidate
confirmed
conflicting
insufficient
verified
```
`no-evidence` is a *derived* state computed when a target has zero links;
it is not stored on a link itself.
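One way to keep the derived state out of storage is to split the type. The sketch below assumes a hypothetical `LinkRecord` shape and helper name `statusesForTarget`. How several link statuses combine into a single per-target status is not specified here, so the helper only reports the distinct stored statuses and falls back to `no-evidence` when there are no links.

```typescript
// Statuses that may be persisted on a link.
type StoredLinkStatus = "candidate" | "confirmed" | "conflicting" | "insufficient" | "verified";

// Full vocabulary, including the derived "no-evidence" state.
type EvidenceLinkStatus = "no-evidence" | StoredLinkStatus;

// Hypothetical minimal link record; the real EvidenceLink carries more fields.
interface LinkRecord {
  targetId: string;
  status: StoredLinkStatus;
}

// Distinct statuses of the links pointing at a target, in first-seen order,
// or ["no-evidence"] when the target has zero links.
function statusesForTarget(targetId: string, links: LinkRecord[]): EvidenceLinkStatus[] {
  const statuses: EvidenceLinkStatus[] = [];
  for (const link of links) {
    if (link.targetId === targetId && statuses.indexOf(link.status) === -1) {
      statuses.push(link.status);
    }
  }
  return statuses.length === 0 ? ["no-evidence"] : statuses;
}
```

Because `LinkRecord.status` is typed as `StoredLinkStatus`, storing `no-evidence` on a link becomes a compile-time error rather than a convention.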
### 2.5 `EvidenceLink.relation`
```
supports
contradicts
explains
qualifies
source-for
context-for
```
This is the closed vocabulary for the MVP. Adding a relation requires updating
this document and the `EvidenceLink` schema together.
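A closed vocabulary like this can be held as one constant list from which both the type and a runtime guard are derived. This is a sketch under that assumption; `EVIDENCE_LINK_RELATIONS` and `isEvidenceLinkRelation` are hypothetical names.

```typescript
// The MVP relation vocabulary, kept in one place so the type and the runtime
// check cannot drift apart.
const EVIDENCE_LINK_RELATIONS = [
  "supports",
  "contradicts",
  "explains",
  "qualifies",
  "source-for",
  "context-for",
] as const;

type EvidenceLinkRelation = (typeof EVIDENCE_LINK_RELATIONS)[number];

// Runtime guard for values arriving from storage or imports.
function isEvidenceLinkRelation(value: unknown): value is EvidenceLinkRelation {
  return (
    typeof value === "string" &&
    (EVIDENCE_LINK_RELATIONS as readonly string[]).indexOf(value) !== -1
  );
}
```

With this shape, a relation that is not in the list (for example `derived-from`, which an earlier subsystem draft proposed) is rejected until the list and this document are updated together.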
### 2.6 `CitationRecoveryAttempt.state`
```
created
source-found-fulltext
source-found-preview-only
source-found-metadata-only
source-not-found
quote-found
quote-not-found
candidate-passages-found
manual-confirmation-needed
confirmed
annotation-created
failed
```
---
## 3. Selector taxonomy
A `Selector` is a discriminated union of:
```
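TextQuoteSelector      exact quote + prefix/suffix context
TextPositionSelector   canonical text start/end offsets
PdfRectSelector        page number + normalized page rectangles
PdfPageTextSelector    page number + page-local text offsets
DomRangeSelector       DOM path + range offsets (HTML/Markdown)
StructuralSelector     heading/section/AST path
FragmentSelector       exported fragment / deep link (export-only)
```
**Selector redundancy rule:** when an annotation is created, the system stores
*all selector types that are available* for that document representation, not
just one. Resolution tries them in order of expected confidence and stops at
the first high-confidence match.

The redundancy rule can be illustrated with two of the selector kinds. The following is a rough sketch only: the field names, the priority order, the confidence numbers, and the `0.9` threshold are all assumptions chosen for illustration, and the real resolution logic belongs to `evidence-anchor`.

```typescript
// Two members of the Selector union; field names are assumed, not canonical.
type Selector =
  | { type: "TextQuoteSelector"; exact: string; prefix?: string; suffix?: string }
  | { type: "TextPositionSelector"; start: number; end: number };

interface Candidate {
  start: number;
  end: number;
  confidence: number; // 0..1, illustrative scale
}

// Assumed order of expected confidence, highest first.
const PRIORITY: Selector["type"][] = ["TextQuoteSelector", "TextPositionSelector"];

// Toy per-selector resolution against canonical text.
function resolveOne(selector: Selector, text: string): Candidate | null {
  if (selector.type === "TextPositionSelector") {
    const inBounds =
      selector.start >= 0 && selector.start < selector.end && selector.end <= text.length;
    // Offsets alone cannot confirm the text is unchanged, hence a low score.
    return inBounds ? { start: selector.start, end: selector.end, confidence: 0.5 } : null;
  }
  const first = text.indexOf(selector.exact);
  if (first === -1) return null;
  const unique = text.indexOf(selector.exact, first + 1) === -1;
  return { start: first, end: first + selector.exact.length, confidence: unique ? 1 : 0.4 };
}

// Try selectors in priority order; stop at the first high-confidence match,
// otherwise fall back to the best lower-confidence candidate (or null).
function resolveSelectors(selectors: Selector[], text: string, threshold = 0.9): Candidate | null {
  const ordered = selectors
    .slice()
    .sort((a, b) => PRIORITY.indexOf(a.type) - PRIORITY.indexOf(b.type));
  let best: Candidate | null = null;
  for (const selector of ordered) {
    const candidate = resolveOne(selector, text);
    if (candidate === null) continue;
    if (candidate.confidence >= threshold) return candidate;
    if (best === null || candidate.confidence > best.confidence) best = candidate;
  }
  return best;
}
```

Storing both selectors pays off when one of them degrades: a repeated quote makes the quote selector ambiguous, and the position selector then supplies the better candidate.
<imports>
</imports>
<test>
const stored: Selector[] = [
  { type: "TextPositionSelector", start: 0, end: 5 },
  { type: "TextQuoteSelector", exact: "beta" },
];
const hit = resolveSelectors(stored, "alpha beta gamma");
if (!hit || hit.start !== 6 || hit.end !== 10 || hit.confidence !== 1) throw new Error("unique quote should win");
const ambiguous = resolveSelectors(
  [{ type: "TextQuoteSelector", exact: "beta" }, { type: "TextPositionSelector", start: 0, end: 4 }],
  "beta beta"
);
if (!ambiguous || ambiguous.start !== 0 || ambiguous.end !== 4 || ambiguous.confidence !== 0.5) throw new Error("position fallback expected");
if (resolveSelectors([{ type: "TextQuoteSelector", exact: "delta" }], "alpha beta") !== null) throw new Error("missing quote should not resolve");
</test>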
```
**Selector redundancy rule:** when an annotation is created, the system stores
*all selector types that are available* for that document representation, not
just one. Resolution tries them in order of expected confidence and stops at
the first high-confidence match.
W3C Web Annotation mapping uses these same concepts but as JSON-LD; the mapping
is documented separately (see ADR-0003 — pending).
---
## 4. Event vocabulary
Events are the primary integration mechanism between subsystems. The closed
event vocabulary for the MVP is:
```
DocumentImported
DocumentRepresentationGenerated
AnnotationCreated
AnnotationResolved
AnnotationResolutionFailed
EvidenceItemCreated
EvidenceItemUpdated
EvidenceLinkCreated
EvidenceLinkUpdated
EvidenceItemActivated
FormFieldActivated
CitationCardRendered
CitationRecoveryStarted
CitationRecoveryCandidateFound
CitationRecoveryConfirmed
```
Subsystems must emit these events through a shared event bus owned by
`citation-engine`. Subsystems may listen to any event but must not invent
event types without updating this document.
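Typing the bus against the closed vocabulary makes an unlisted event a compile error. Below is a minimal in-memory sketch of that idea. The payload shape, `createEventBus`, and the method names are assumptions, since this document fixes only the event names.

```typescript
// The closed MVP event vocabulary from this section.
type EventType =
  | "DocumentImported"
  | "DocumentRepresentationGenerated"
  | "AnnotationCreated"
  | "AnnotationResolved"
  | "AnnotationResolutionFailed"
  | "EvidenceItemCreated"
  | "EvidenceItemUpdated"
  | "EvidenceLinkCreated"
  | "EvidenceLinkUpdated"
  | "EvidenceItemActivated"
  | "FormFieldActivated"
  | "CitationCardRendered"
  | "CitationRecoveryStarted"
  | "CitationRecoveryCandidateFound"
  | "CitationRecoveryConfirmed";

// Payload shapes are not defined in this document, so they stay opaque here.
interface BusEvent {
  type: EventType;
  payload: unknown;
}

type Listener = (event: BusEvent) => void;

// Hypothetical in-memory bus: synchronous delivery, one listener list per type.
function createEventBus() {
  const listeners = new Map<EventType, Listener[]>();
  return {
    // Returns an unsubscribe function.
    subscribe(type: EventType, listener: Listener): () => void {
      listeners.set(type, (listeners.get(type) ?? []).concat(listener));
      return () => {
        listeners.set(
          type,
          (listeners.get(type) ?? []).filter((l) => l !== listener)
        );
      };
    },
    emit(event: BusEvent): void {
      (listeners.get(event.type) ?? []).forEach((l) => l(event));
    },
  };
}
```

Subsystems that cannot import each other can still coordinate this way: each side depends only on the engine-owned bus and the event names, not on the other side's code.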
---
## 5. Viewer adapter contract
Viewer adapters are the bridge between a document format and the rest of the
system. They are **owned by `evidence-anchor`** as far as the contract goes;
concrete adapters may live in either `evidence-anchor` or `evidence-source`
depending on whether the heavy lifting is selector logic or document
representation logic.
```ts
interface DocumentViewerAdapter {
  mediaTypes: string[];
  load(document: Document, representation?: DocumentRepresentation): Promise<void>;
  getCurrentSelection(): Promise<SelectionCapture | null>;
  createSelectorsFromSelection(selection: SelectionCapture): Promise<Selector[]>;
  resolveSelectors(selectors: Selector[]): Promise<AnchorResolution>;
  scrollToResolvedTarget(target: ResolvedAnchorTarget, opts?: { center?: boolean; behavior?: "auto"|"smooth" }): Promise<void>;
  renderHighlight(target: ResolvedAnchorTarget, opts?: HighlightRenderOptions): Promise<void>;
  getHighlightClientRects(annotationId: string): Promise<DOMRect[]>;
}
```
MVP delivers a single `PDFViewerAdapter`. HTML and Markdown adapters are
deferred.
---
## 6. Canonical text normalization
All text-based selectors and quote matching depend on a deterministic
normalization function. The MVP normalization is:
1. Unicode NFC normalization.
2. Replace all line-ending sequences with `\n`.
3. Collapse runs of horizontal whitespace into a single space.
4. Strip soft hyphens (U+00AD).
5. Preserve paragraph boundaries (double `\n`).
**This function is versioned.** Stored selectors record the normalization
version they were created against. Changing the function later requires either
backwards-compatible behavior or a re-anchoring migration.
The reference implementation lives in `citation-evidence/src/shared/text/normalize.ts`.
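The five steps translate almost directly into a chain of string operations. This is a sketch of what such a function could look like, not the reference implementation itself. It assumes "line-ending sequences" means CRLF and lone CR, and "horizontal whitespace" means spaces and tabs; other Unicode separators are left untouched here.

```typescript
const NORMALIZE_VERSION = 1;

function normalize(input: string): { text: string; version: number } {
  const text = input
    .normalize("NFC") // 1. Unicode NFC normalization
    .replace(/\r\n?/g, "\n") // 2. CRLF and lone CR become "\n"
    .replace(/[ \t]+/g, " ") // 3. collapse runs of spaces and tabs
    .replace(/\u00AD/g, ""); // 4. strip soft hyphens
  // 5. Paragraph boundaries survive because "\n" is never collapsed.
  return { text, version: NORMALIZE_VERSION };
}
```

Two details are worth knowing before writing tests against it. NFC composes sequences such as `e` plus a combining acute accent into a single code point, but it does not expand ligatures such as U+FB01: that would require NFKC. Applying the steps in the listed order can also leave two adjacent spaces where a soft hyphen sat between them, so the step order is part of the versioned behavior.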
---
## 7. Visual guide rect registry
The visual-guide overlay (form field → evidence card → source highlight)
requires DOM rects from three independently-rendered subsystems. The contract
is a **rect registry** owned by `evidence-binder`:
```ts
interface RectRegistry {
  register(kind: "field" | "evidence-card" | "highlight", id: string, getRect: () => DOMRect | null): () => void;
  getRect(kind: "field" | "evidence-card" | "highlight", id: string): DOMRect | null;
  subscribe(listener: (event: RectRegistryEvent) => void): () => void;
}
```
Each renderer (form, evidence sidebar, viewer adapter) registers a
`getRect` callback. The overlay queries on-demand and re-renders on scroll,
resize, focus, and active-evidence change.
This contract MUST be defined and stable before any of the three renderers
hardens, or the overlay becomes the system's coupling bottleneck.
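To make the contract concrete, here is a small in-memory sketch of a registry with the same three methods. It is illustrative only: `Rect` is a plain stand-in for `DOMRect` so the code runs outside a browser, the `RectRegistryEvent` shape is an assumption (this document does not define it), and `createRectRegistry` is a hypothetical factory name.

```typescript
// Plain stand-in for DOMRect so the sketch does not need a DOM.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

type RectKind = "field" | "evidence-card" | "highlight";

// Assumed event shape; the contract above leaves RectRegistryEvent undefined.
interface RectRegistryEvent {
  type: "registered" | "unregistered";
  kind: RectKind;
  id: string;
}

function createRectRegistry() {
  const getters = new Map<string, () => Rect | null>();
  const listeners: Array<(event: RectRegistryEvent) => void> = [];
  const keyOf = (kind: RectKind, id: string): string => kind + ":" + id;
  const notify = (event: RectRegistryEvent): void => {
    listeners.slice().forEach((listener) => listener(event));
  };

  return {
    // Store the callback, not the rect: positions are read lazily on demand.
    register(kind: RectKind, id: string, getRect: () => Rect | null): () => void {
      const key = keyOf(kind, id);
      getters.set(key, getRect);
      notify({ type: "registered", kind, id });
      return () => {
        // Only remove the entry if it has not been replaced by a newer one.
        if (getters.get(key) === getRect) {
          getters.delete(key);
          notify({ type: "unregistered", kind, id });
        }
      };
    },
    getRect(kind: RectKind, id: string): Rect | null {
      const getter = getters.get(keyOf(kind, id));
      return getter ? getter() : null;
    },
    subscribe(listener: (event: RectRegistryEvent) => void): () => void {
      listeners.push(listener);
      return () => {
        const index = listeners.indexOf(listener);
        if (index !== -1) listeners.splice(index, 1);
      };
    },
  };
}
```

Registering callbacks rather than rect values is what lets the overlay query on demand after scroll or resize without each renderer having to push updates.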
---
## 8. Ownership rules (the short version)
1. **Types and interfaces** flow downward from `citation-engine`.
2. **Behavior and algorithms** live in the specialised repos.
3. Where a concept appears in both a type and a behavior context (e.g.
`Selector`, `EvidenceLink`, `EvidenceSet`, `CitationRecoveryAttempt`),
the engine owns the shape and the specialised repo owns the lifecycle.
4. **The shared event bus is engine-owned**; subsystems publish and subscribe
but do not extend the event vocabulary unilaterally.
5. **No new enum values, relation types, event types, or selector kinds**
land in code without first appearing in this document.
6. During umbrella-first MVP: rules 1-5 are aspirational. We will tolerate
small violations in `citation-evidence/src/` and reconcile during extraction.
---
## 9. Change process
Changes to this document are changes to the contract.
- Small additions (a new enum value, a new event type) can be made in a single
PR that updates this doc + the type definitions + at least one consumer.
- Breaking changes (renaming an entity, removing a state, changing an
ownership split) require a short ADR in `docs/decisions/` and a heads-up
progress event on the state-hub.
---
## 10. Pending ADRs that will affect this document
These will be filed in `docs/decisions/` as they are written. Until then this
document reflects the current best understanding from the architecture overview.
- **ADR-0001** — Umbrella-first MVP strategy (decided 2026-05-24, this session).
- **ADR-0002** — Monorepo vs polyrepo packaging (pending).
- **ADR-0003** — W3C Web Annotation: lossy mapping vs round-trip guarantee (pending).
- **ADR-0004** — PDF viewer library choice: `react-pdf-highlighter-plus` vs PDF.js direct (pending).
- **ADR-0005** — Persistence: local-first SQLite vs Postgres from day one (pending).
- **ADR-0006** — Selector ownership split (types in engine, algorithms in anchor) (pending — implied here).


@@ -0,0 +1,246 @@
---
id: CE-WP-0001
type: workplan
title: "Foundations — TS scaffold, folder layout, lint boundaries, normalization, fixtures"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
status: todo
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
- wiki/DependencyMap.md
---
# CE-WP-0001 — Foundations
Establish the skeleton of the umbrella-first MVP: a TypeScript project with
a folder layout that mirrors the future subsystem split (so that extracting
to sister repos later is a `git mv` plus a `package.json` cut), lint rules
that enforce the dependency map at the folder level, the versioned
canonical-text normalization function, and a small but representative PDF
fixtures corpus.
No product features yet. This workplan exists so that everything from
`CE-WP-0002` onward has somewhere to land.
## Decisions captured here
Each task below corresponds to a Phase-0 ADR. The ADR lives at
`docs/decisions/ADR-NNNN-<slug>.md`. If a task involves a choice that wasn't
already decided, the agent stops and asks Bernd before writing code.
## Dependency Order
```
T01 (toolchain decision + package.json)
└─ T02 (folder layout per DependencyMap §4)
└─ T03 (lint rules enforcing dep edges)
└─ T04 (canonical text normalization v1, versioned)
└─ T05 (fixtures: 5+ representative PDFs + a manifest)
└─ T06 (README upgrade + dev workflow doc)
└─ T07 (write the six pending ADRs as stubs)
```
---
## T01 — Toolchain + package.json + tsconfig
```task
id: CE-WP-0001-T01
priority: critical
status: todo
```
Decide the TS toolchain (vite vs tsc-only vs Next.js) and write a single
`package.json` at the repo root. Decisions to lock in this task as an ADR
(`docs/decisions/ADR-0001-toolchain.md`):
- Bundler: vite (recommended, fastest dev loop for a React MVP)
- Package manager: pnpm (recommended, plays well with future workspace split)
- React 18+
- Strict TS
Deliverables:
- `package.json` with `dev`, `build`, `test`, `lint`, `typecheck` scripts
- `tsconfig.json` with strict mode, paths for the `src/` partitions
- `.nvmrc` pinning Node version
- `docs/decisions/ADR-0001-toolchain.md` written and committed
Do not install application dependencies yet — just the toolchain.
---
## T02 — Folder layout matching DependencyMap §4
```task
id: CE-WP-0001-T02
priority: critical
status: todo
depends_on: [T01]
```
Create the source folder layout:
```
src/
  shared/   # will become @citation-evidence/engine (types + contracts)
  engine/   # will become @citation-evidence/engine (services)
  anchor/   # will become @citation-evidence/anchor
  source/   # will become @citation-evidence/source
  work/     # will become @citation-evidence/work (UI)
  binder/   # will become @citation-evidence/binder
  app/      # the reference workspace shell
```
Each folder gets:
- A one-line `README.md` stating its future home
- An `index.ts` that re-exports its public API (empty for now)
Add path aliases in `tsconfig.json`: `@shared/*`, `@engine/*`, etc.
---
## T03 — Lint rules enforcing dependency edges
```task
id: CE-WP-0001-T03
priority: high
status: todo
depends_on: [T02]
```
Install `eslint-plugin-boundaries` (or equivalent) and configure rules per
`wiki/DependencyMap.md` §4:
| Folder | May import from |
|--------------|--------------------------------------------------|
| `shared/` | (nothing internal) |
| `engine/` | `shared/` |
| `anchor/` | `shared/`, `engine/` |
| `source/` | `shared/`, `engine/` |
| `binder/` | `shared/`, `engine/`, `anchor/` |
| `work/` | `shared/`, `engine/`, `anchor/`, `source/` |
| `app/` | any |
Add a failing test fixture that imports `source/` from `binder/` and confirm
lint catches it; remove the fixture afterward.
`pnpm lint` must pass on a clean tree.
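The table above can also be written down as data, which makes a handy cross-check for whatever lint plugin is chosen. The sketch below is not the `eslint-plugin-boundaries` configuration; `Partition`, `ALLOWED_IMPORTS`, and `mayImport` are hypothetical names, and allowing imports within the same partition is an assumption the table does not state.

```ts
// Partitions under src/, as listed in the table above.
export type Partition =
  | "shared" | "engine" | "anchor" | "source" | "binder" | "work" | "app";

// Allowed import targets per partition. "app" may import from any partition.
export const ALLOWED_IMPORTS: Record<Partition, readonly Partition[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared", "engine"],
  source: ["shared", "engine"],
  binder: ["shared", "engine", "anchor"],
  work: ["shared", "engine", "anchor", "source"],
  app: ["shared", "engine", "anchor", "source", "binder", "work"],
};

// Assumption: imports inside one partition are always fine; the table
// only describes cross-partition edges.
export function mayImport(from: Partition, to: Partition): boolean {
  return from === to || ALLOWED_IMPORTS[from].indexOf(to) !== -1;
}
```

With this map, `mayImport("binder", "source")` is `false`, which is the edge the failing fixture is meant to exercise.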
---
## T04 — Canonical text normalization v1
```task
id: CE-WP-0001-T04
priority: critical
status: todo
depends_on: [T02]
```
Implement `src/shared/text/normalize.ts` per `wiki/SharedContracts.md` §6:
1. Unicode NFC
2. Normalize line endings to `\n`
3. Collapse horizontal whitespace runs to a single space
4. Strip soft hyphens (U+00AD)
5. Preserve paragraph boundaries (`\n\n`)
Public API:
```ts
export const NORMALIZE_VERSION = 1;
export function normalize(input: string): { text: string; version: number };
```
Include unit tests covering: ligatures, CRLF input, soft-hyphenated German,
mixed whitespace, paragraph preservation.
Stored selectors will record this version number so that future normalization
changes can be detected as a migration concern.
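A minimal sketch of the five steps, assuming that "horizontal whitespace" means spaces and tabs only and that soft hyphens are dropped without rejoining lines; `wiki/SharedContracts.md` §6 remains the authority. NFC does not fold compatibility ligatures such as U+FB01 (only NFKC does), so a ligature test written against this sketch would assert that they pass through unchanged.

```ts
export const NORMALIZE_VERSION = 1;

export function normalize(input: string): { text: string; version: number } {
  const text = input
    .normalize("NFC")            // 1. Unicode NFC
    .replace(/\r\n?/g, "\n")     // 2. CRLF and lone CR become \n
    .replace(/[ \t]+/g, " ")     // 3. collapse runs of spaces/tabs (assumed scope)
    .replace(/\u00AD/g, "");     // 4. strip soft hyphens
  // 5. Newlines are never touched, so "\n\n" paragraph boundaries survive.
  return { text, version: NORMALIZE_VERSION };
}
```

Returning the version next to the text lets callers store both without importing the constant separately.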
---
## T05 — PDF fixtures corpus + manifest
```task
id: CE-WP-0001-T05
priority: high
status: todo
depends_on: [T01]
```
Assemble `fixtures/pdfs/` with at least 5 representative PDFs:
- A simple single-column text PDF
- A two-column academic PDF (e.g. ACM-style)
- A German PDF with umlauts and soft hyphens
- A form PDF (e.g. a public-sector application form)
- A PDF with a heading hierarchy
Write `fixtures/pdfs/manifest.json` recording for each:
- filename
- short description
- expected page count
- one short "known-good quote" with the page number it appears on (used by
CE-WP-0002 selector tests)
Keep each PDF small (< 1 MB) and check sources/licenses into
`fixtures/pdfs/SOURCES.md`. Public-domain or Bernd-authored only.
---
## T06 — README upgrade + dev workflow doc
```task
id: CE-WP-0001-T06
priority: medium
status: todo
depends_on: [T01, T02]
```
Replace the one-line `README.md` with a real one:
- What citation-evidence is (one paragraph from INTENT)
- Repository layout (point at `src/` partitions and what each becomes)
- Where to find docs (`wiki/`, `docs/decisions/`, `history/`, `workplans/`)
- Dev workflow: `pnpm install`, `pnpm dev`, `pnpm test`, `pnpm lint`
- Pointer to `~/ralph-workplan/` for how workplans are driven
Add a one-paragraph `README.md` in each of the five sister repos pointing
back at this umbrella + reminding readers that code lives upstream during
the MVP phase.
---
## T07 — Stub the six pending ADRs
```task
id: CE-WP-0001-T07
priority: medium
status: todo
depends_on: [T01]
```
Create stub files in `docs/decisions/` for each ADR mentioned in
`wiki/SharedContracts.md` §10:
- `ADR-0001-toolchain.md` (filled in by T01)
- `ADR-0002-monorepo-vs-polyrepo.md`
- `ADR-0003-w3c-mapping-scope.md`
- `ADR-0004-pdf-viewer-library.md`
- `ADR-0005-persistence.md`
- `ADR-0006-selector-ownership-split.md`
Each stub: title, status (`proposed` for 2-6), context (one paragraph
explaining what the decision is about and why it matters), options (bullet
list with pros/cons), decision (blank), consequences (blank).
These are not decisions yet — they are *the questions that must be answered
before the relevant code lands*. CE-WP-0001 can proceed without ADRs 2-6 being
resolved because no viewer integration, extraction, or persistence happens
until later workplans.


@@ -0,0 +1,283 @@
---
id: CE-WP-0002
type: workplan
title: "PDF review slice — engine types, anchor, source, viewer, sidebar, click-to-reopen"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
status: todo
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
depends_on_workplan: CE-WP-0001
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
---
# CE-WP-0002 — PDF Review Slice
The first vertical product slice. After this workplan, a user can:
1. Open the app, see a collection of fixture PDFs.
2. Open one PDF in a viewer.
3. Select text, add a one-line comment, save as an evidence item.
4. See the evidence item appear in a sidebar.
5. Click the evidence item and have the PDF jump to and highlight the
passage — even after a full page reload.
No forms, no Markdown/HTML, no recovery, no export. Those come later.
This workplan exercises the riskiest architectural assumption (PDF selector
round-trip with viewer independence) on the simplest possible feature set.
## Risk-driven order
T01 and T02 are the spike from the assessment: prove the
`react-pdf-highlighter-plus` integration can store and reload selectors
without leaking viewer types into engine code. If that breaks, the rest of
the workplan stops until ADR-0004 (PDF viewer choice) is revised.
## Dependency Order
```
T01 (engine types: Document, Representation, Annotation, Selector, EvidenceItem)
└─ T02 (PDF viewer adapter spike — store + reload selectors as JSON)
└─ T03 (evidence-source: PDF ingest, fingerprint, canonical text)
└─ T04 (evidence-anchor: TextQuote + TextPosition resolution against representation)
└─ T05 (in-memory repositories + engine services)
└─ T06 (citation-work UI: collection list + viewer shell + sidebar)
└─ T07 (annotation create flow)
└─ T08 (click-to-reopen flow)
└─ T09 (end-to-end test of PRD scenario steps 1-4)
```
---
## T01 — Engine types in `src/shared/`
```task
id: CE-WP-0002-T01
priority: critical
status: todo
```
Translate the type definitions in `wiki/SharedContracts.md` §1 and §3 into
TypeScript under `src/shared/`:
- `src/shared/document.ts`: `Document`, `DocumentRepresentation`, `PageMap`,
  `OffsetMap`
- `src/shared/selector.ts`: `Selector` discriminated union with at minimum
  `TextQuoteSelector`, `TextPositionSelector`, `PdfRectSelector`,
  `PdfPageTextSelector`. Other selector kinds are defined as `never`-typed
  stubs for now.
- `src/shared/annotation.ts`: `Annotation` with `selectors`, `quote`,
  `note`, `normalizeVersion`
- `src/shared/evidence.ts`: `EvidenceItem`, `EvidenceItem.status` enum per
  §2.2
- `src/shared/ids.ts`: branded ID types and a `newId(prefix)` helper
No services, no behavior. Pure data shapes + the ID helper.
Add JSDoc on each type pointing at the §-reference in
`wiki/SharedContracts.md` it implements.
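To illustrate the two patterns named above (branded IDs and a discriminated `Selector` union), here is a small sketch. The field names (`exact`, `prefix`, `start`, `page`, `rects`), the `type` discriminant, and the ID format are assumptions for illustration; the real shapes come from `wiki/SharedContracts.md`.

```ts
// Branded IDs: plain strings at runtime, distinct types at compile time.
export type Brand<T, B extends string> = T & { readonly __brand: B };
export type DocumentId = Brand<string, "DocumentId">;
export type AnnotationId = Brand<string, "AnnotationId">;

let counter = 0;
// Hypothetical ID format: "<prefix>_<timestamp>_<counter>", both in base 36.
export function newId<T extends string>(prefix: string): T {
  counter += 1;
  return (prefix + "_" + Date.now().toString(36) + "_" + counter.toString(36)) as T;
}

// Hypothetical selector fields; only the kind names come from the workplan.
export interface TextQuoteSelector {
  type: "TextQuoteSelector"; exact: string; prefix?: string; suffix?: string;
}
export interface TextPositionSelector {
  type: "TextPositionSelector"; start: number; end: number;
}
export interface PdfRectSelector {
  type: "PdfRectSelector"; page: number;
  rects: { x: number; y: number; width: number; height: number }[];
}
export interface PdfPageTextSelector {
  type: "PdfPageTextSelector"; page: number; start: number; end: number;
}
export type Selector =
  | TextQuoteSelector | TextPositionSelector | PdfRectSelector | PdfPageTextSelector;

// The `never` assignment makes the compiler flag any unhandled selector kind.
export function describeSelector(selector: Selector): string {
  switch (selector.type) {
    case "TextQuoteSelector": return "quote \"" + selector.exact + "\"";
    case "TextPositionSelector": return "position " + selector.start + ".." + selector.end;
    case "PdfRectSelector": return selector.rects.length + " rect(s) on page " + selector.page;
    case "PdfPageTextSelector": return "page " + selector.page + " text " + selector.start + ".." + selector.end;
    default: {
      const unreachable: never = selector;
      return unreachable;
    }
  }
}
```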
---
## T02 — PDF viewer adapter spike
```task
id: CE-WP-0002-T02
priority: critical
status: todo
depends_on: [T01]
```
**This is the architectural spike.** Build a throwaway
`src/anchor/pdf-viewer-adapter-spike.tsx` that:
1. Loads `fixtures/pdfs/simple.pdf` using `react-pdf-highlighter-plus`
(assumed; if a better library appears, document it in ADR-0004 before
committing).
2. Lets the user select text and produces selectors per `T01` shapes.
3. Serializes the selectors to a JSON blob in `localStorage`.
4. On reload, reads the blob, asks the adapter to resolve, scrolls to the
passage, and renders a highlight.
Success criteria:
- Reload-and-resolve works for all fixture PDFs.
- No PDF.js or `react-pdf-highlighter-plus` types appear in any file under
`src/shared/` or `src/engine/`.
- The adapter's public surface matches the contract in
`wiki/SharedContracts.md` §5.
If success criteria fail: stop. Write a short note in
`docs/decisions/ADR-0004-pdf-viewer-library.md` describing the failure mode
and proposed alternative. Do not proceed with T03+.
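The store-and-reload half of the spike can be sketched without any viewer code. The example assumes a hypothetical stored shape (`StoredAnnotation`) and codes against a two-method slice of the Web Storage API so it also runs outside a browser; in the spike itself `window.localStorage` would be passed in.

```ts
// Minimal slice of the Web Storage API (localStorage satisfies it).
export interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical stored shapes; the real ones follow the T01 types.
export interface StoredSelector { type: string; [field: string]: unknown }
export interface StoredAnnotation { normalizeVersion: number; selectors: StoredSelector[] }

export function saveAnnotation(store: KeyValueStore, key: string, value: StoredAnnotation): void {
  store.setItem(key, JSON.stringify(value));
}

// Returns null for missing, corrupt, or wrongly shaped blobs instead of throwing.
export function loadAnnotation(store: KeyValueStore, key: string): StoredAnnotation | null {
  const raw = store.getItem(key);
  if (raw === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (error) {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Partial<StoredAnnotation>;
  if (typeof candidate.normalizeVersion !== "number") return null;
  if (!Array.isArray(candidate.selectors)) return null;
  return { normalizeVersion: candidate.normalizeVersion, selectors: candidate.selectors };
}

// In-memory stand-in for tests and non-browser runs.
export function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => {
      const value = data.get(key);
      return value === undefined ? null : value;
    },
    setItem: (key, value) => { data.set(key, value); },
  };
}
```

Because only plain JSON crosses the storage boundary, no viewer type can leak into the stored blob, which is the property the success criteria ask for.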
---
## T03 — `src/source/`: PDF ingest, fingerprint, canonical text
```task
id: CE-WP-0002-T03
priority: high
status: todo
depends_on: [T02]
```
Implement under `src/source/pdf/`:
- `ingest.ts`: `ingestPdf(file: File | Buffer): Promise<{ document: Document; representation: DocumentRepresentation }>`
- `fingerprint.ts`: stable SHA-256 of bytes
- `extract.ts`: uses PDF.js to extract page text; runs `normalize()` from
  `CE-WP-0001-T04` over the extracted text to produce the canonical text;
  builds the `PageMap` and `OffsetMap` of the `DocumentRepresentation`
Tests use the fixture corpus from `CE-WP-0001-T05`. For each fixture,
extracted canonical text must contain the manifest's known-good quote.
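One way to picture the page map is as a list of offset ranges into the canonical text. The sketch below assumes pages are joined with `\n\n` and that a page map entry is `{ page, start, end }` with an exclusive `end`; both are illustrative choices, not the contract from `wiki/SharedContracts.md`.

```ts
// Hypothetical shape: one entry per page, offsets into the canonical text.
export interface PageRange { page: number; start: number; end: number }

export function buildPageMap(pageTexts: string[]): { text: string; pages: PageRange[] } {
  const separator = "\n\n"; // assumption: page breaks become paragraph breaks
  const pages: PageRange[] = [];
  let text = "";
  pageTexts.forEach((pageText, index) => {
    if (index > 0) text += separator;
    const start = text.length;
    text += pageText;
    pages.push({ page: index + 1, start, end: text.length });
  });
  return { text, pages };
}

// Returns the 1-based page containing `offset`, or null if the offset falls
// in a separator or outside the text.
export function pageForOffset(pages: PageRange[], offset: number): number | null {
  for (const range of pages) {
    if (offset >= range.start && offset < range.end) return range.page;
  }
  return null;
}
```

A fixture test could then look up the known-good quote with `text.indexOf(quote)` and compare `pageForOffset` against the page number recorded in the manifest.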
---
## T04 — `src/anchor/`: TextQuote and TextPosition resolution
```task
id: CE-WP-0002-T04
priority: high
status: todo
depends_on: [T01, T03]
```
Implement under `src/anchor/`:
- `selectors/create.ts`: given a `SelectionCapture` from the adapter, build
  the maximal set of available selectors (always `TextQuoteSelector` with
  prefix/suffix; `TextPositionSelector` when the representation provides
  offsets; PDF rect/text selectors when on PDF)
- `selectors/resolve.ts`: implements the resolution strategy from
  `wiki/ArchitectureOverview.md` §7 (try position, verify quote, fall back
  through quote+prefix/suffix, return `AnchorResolution`)
- `selectors/types.ts`: `AnchorResolution`, `SelectionCapture`,
  `ResolvedAnchorTarget`
Fuzzy matching is out of scope here — return `unresolved` if exact+prefix/suffix
fails. Fuzzy is a later workplan.
Unit tests using fixtures: for each fixture+known-quote pair, create
selectors then immediately resolve them; resolution must succeed with
confidence ≥ 0.9.
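The resolution order described above (position first, verified against the quote, then exact quote with prefix/suffix) might look like the sketch below. The result shape, the confidence values, and the rule that an ambiguous quote counts as unresolved are assumptions; `wiki/ArchitectureOverview.md` §7 defines the real strategy.

```ts
export interface QuoteSelector { exact: string; prefix?: string; suffix?: string }
export interface PositionSelector { start: number; end: number }

// Hypothetical result shape for illustration.
export type Resolution =
  | { status: "resolved"; start: number; end: number; confidence: number; via: "position" | "quote" }
  | { status: "unresolved" };

export function resolveAnchor(text: string, quote: QuoteSelector, position?: PositionSelector): Resolution {
  if (quote.exact.length === 0) return { status: "unresolved" };

  // 1. Try the stored position and verify it still yields the quote.
  if (position && text.slice(position.start, position.end) === quote.exact) {
    return { status: "resolved", start: position.start, end: position.end, confidence: 1, via: "position" };
  }

  // 2. Fall back to exact quote search, filtered by prefix and suffix.
  const prefix = quote.prefix || "";
  const suffix = quote.suffix || "";
  const matches: number[] = [];
  let from = text.indexOf(quote.exact);
  while (from !== -1) {
    const end = from + quote.exact.length;
    const prefixOk = text.slice(Math.max(0, from - prefix.length), from) === prefix;
    const suffixOk = text.slice(end, end + suffix.length) === suffix;
    if (prefixOk && suffixOk) matches.push(from);
    from = text.indexOf(quote.exact, from + 1);
  }

  // 3. Exactly one surviving match resolves; anything else is unresolved
  //    (no fuzzy matching in this workplan).
  if (matches.length === 1) {
    const start = matches[0];
    return { status: "resolved", start, end: start + quote.exact.length, confidence: 0.95, via: "quote" };
  }
  return { status: "unresolved" };
}
```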
---
## T05 — In-memory repositories + engine services
```task
id: CE-WP-0002-T05
priority: high
status: todo
depends_on: [T01]
```
Under `src/engine/`:
- `repos/in-memory.ts`: `Map`-backed implementations of
  `DocumentRepository`, `AnnotationRepository`, `EvidenceItemRepository`
- `services/documents.ts`, `services/annotations.ts`, `services/evidence.ts`:
  thin orchestration layer that creates IDs, calls repos, and emits the
  events from `wiki/SharedContracts.md` §4
- `events/bus.ts`: minimal pub/sub. Synchronous for MVP.
No persistence to disk yet. ADR-0005 (persistence) is still pending.
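A compact sketch of how the three pieces could fit together: a synchronous bus, a `Map`-backed repository, and a thin service that saves and then emits. The `AnnotationCreated` event name, the `Annotation` shape, and the generic `InMemoryRepository` class are hypothetical stand-ins for the contracts in `wiki/SharedContracts.md`.

```ts
export type Listener<E> = (event: E) => void;

// Minimal synchronous pub/sub: every listener has run when emit() returns.
export class EventBus<E extends { type: string }> {
  private listeners: Listener<E>[] = [];

  subscribe(listener: Listener<E>): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(event: E): void {
    // Copy first so a listener may unsubscribe while being notified.
    this.listeners.slice().forEach((listener) => listener(event));
  }
}

export class InMemoryRepository<T extends { id: string }> {
  private items = new Map<string, T>();
  save(item: T): void { this.items.set(item.id, item); }
  get(id: string): T | undefined { return this.items.get(id); }
  list(): T[] {
    const all: T[] = [];
    this.items.forEach((item) => all.push(item));
    return all;
  }
}

// Hypothetical shapes and event name, for illustration only.
export interface Annotation { id: string; quote: string }
export type EngineEvent = { type: "AnnotationCreated"; annotationId: string };

export function createAnnotationService(
  repo: InMemoryRepository<Annotation>,
  bus: EventBus<EngineEvent>,
) {
  let next = 0;
  return {
    create(quote: string): Annotation {
      next += 1;
      const annotation: Annotation = { id: "ann_" + next, quote };
      repo.save(annotation); // persist first, then announce
      bus.emit({ type: "AnnotationCreated", annotationId: annotation.id });
      return annotation;
    },
  };
}
```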
---
## T06 — `src/work/`: collection list + viewer shell + sidebar
```task
id: CE-WP-0002-T06
priority: high
status: todo
depends_on: [T02, T05]
```
Under `src/work/` and `src/app/`:
- `src/app/App.tsx` — three-pane layout per Architecture §12.1: collection
list (left), viewer (centre), evidence sidebar (right)
- `src/work/CollectionList.tsx` — lists `fixtures/pdfs/manifest.json`
entries; click to load
- `src/work/ViewerShell.tsx` — hosts the viewer adapter from T02 wrapped
cleanly; viewer adapter API is the only surface `work/` uses
- `src/work/EvidenceSidebar.tsx` — lists evidence items for the current
document, shows quote + commentary + status
No styling beyond minimum legibility. CSS in Tailwind or vanilla — pick one,
note in ADR-0001 if it wasn't already.
---
## T07 — Annotation create flow
```task
id: CE-WP-0002-T07
priority: high
status: todo
depends_on: [T04, T05, T06]
```
Wire selection → annotation → evidence item:
1. User selects text in the viewer.
2. A small toolbar appears with a comment input + Save button.
3. On Save: adapter produces `SelectionCapture` → anchor creates `Selector[]`
→ engine creates `Annotation` → engine creates `EvidenceItem` with the
commentary → sidebar updates.
Active state lives in a single React context for now; no Redux/Zustand.
---
## T08 — Click-to-reopen flow
```task
id: CE-WP-0002-T08
priority: critical
status: todo
depends_on: [T04, T06, T07]
```
Implement the round trip:
1. User clicks an evidence item in the sidebar.
2. Engine loads the annotation → anchor resolves selectors against the
current representation → adapter scrolls to and highlights the target.
Critically, this must also work **after a page reload**. Persistence to
`localStorage` is acceptable for MVP (decide explicitly in
`ADR-0005-persistence.md` that we are deferring real persistence).
---
## T09 — End-to-end test of PRD scenario steps 1-4
```task
id: CE-WP-0002-T09
priority: high
status: todo
depends_on: [T07, T08]
```
Write a Playwright (or similar) E2E test that:
1. Opens the app.
2. Picks `simple.pdf`.
3. Programmatically selects the known-good quote from the manifest.
4. Saves an evidence item with a comment.
5. Verifies the item appears in the sidebar.
6. Reloads the page.
7. Clicks the evidence item.
8. Verifies the highlight is rendered on the expected page.
This is the contract for "MVP slice 1 works". If it passes, CE-WP-0003 may
begin.


@@ -0,0 +1,246 @@
---
id: CE-WP-0003
type: workplan
title: "Form binding + visual guide — EvidenceLink, rect registry, SVG overlay"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
status: todo
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
depends_on_workplan: CE-WP-0002
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
---
# CE-WP-0003 — Form Binding + Visual Guide
Build the evidence-backed form mode and the SVG visual guide overlay.
After this workplan, a user can:
1. Open a form next to the document viewer.
2. Drag (or click-to-link) an evidence item from the sidebar onto a form
field.
3. Click a form field → its linked evidence items appear → the active
evidence's source passage is scrolled into view and highlighted → an SVG
guide visually connects the field, the evidence card, and the highlight.
4. Cycle through multiple evidence items on the same field.
This is the workplan that stress-tests the rect-registry contract from
`wiki/SharedContracts.md` §7. The form, the evidence card, and the viewer's
highlight all need to publish rects to a single overlay that re-renders on
scroll/resize/focus.
## Dependency Order
```
T01 (EvidenceLink + EvidenceSet types + relation/status enums)
└─ T02 (binding service + in-memory link repo + active-state machine)
└─ T03 (rect registry — the contract from SharedContracts.md §7)
└─ T04 (form schema + simple field renderer)
└─ T05 (side-by-side layout + drag-or-click to link)
└─ T06 (active-evidence cycling on a field)
└─ T07 (SVG visual guide overlay)
└─ T08 (E2E test of PRD scenario steps 5-9)
```
---
## T01 — `EvidenceLink` + `EvidenceSet` types
```task
id: CE-WP-0003-T01
priority: critical
status: todo
```
Add under `src/shared/`:
- `src/shared/evidence-link.ts`: `EvidenceLink`, `EvidenceLink.status`
  enum per SharedContracts §2.4, `EvidenceLink.relation` enum per §2.5,
  `EvidenceTarget` generic shape
- `src/shared/evidence-set.ts`: `EvidenceSet` with `activeEvidenceItemId`
No services. Pure shapes.
Add a unit test asserting that the union of all enum values matches the
`SharedContracts.md` lists exactly — if someone adds a value without
updating the doc, the test fails.
---
## T02 — Binding service + in-memory link repo + active-state machine
```task
id: CE-WP-0003-T02
priority: high
status: todo
depends_on: [T01]
```
Under `src/binder/`:
- `repos/in-memory-links.ts`: Map-backed `EvidenceLinkRepository`
- `services/bindings.ts`: `linkEvidenceToTarget`, `unlinkEvidence`,
  `listEvidenceForTarget`, `setActiveEvidence`
- `state/active.ts`: a small machine tracking
  `(activeTarget, activeEvidenceItem, activeAnnotation)`. Exposed as a React
  context.
Emit the events from SharedContracts §4 (`EvidenceLinkCreated`,
`EvidenceItemActivated`, `FormFieldActivated`).
---
## T03 — Rect registry (the SharedContracts §7 contract)
```task
id: CE-WP-0003-T03
priority: critical
status: todo
depends_on: [T02]
```
Implement under `src/binder/visual-guide/`:
- `rect-registry.ts`: `RectRegistry` with `register`, `getRect`,
  `subscribe` per SharedContracts §7
- `react-hooks.ts`: `useRegisterRect(kind, id, ref)` for components to
  register a ref-derived rect
- `events.ts`: registry emits `rect-changed` events on
  scroll/resize/focus/active-evidence-change (use ResizeObserver +
  IntersectionObserver + window resize + window scroll listeners)
Unit tests: register a fake field, evidence card, and highlight; mutate
their bounding rects; assert subscribers fire with the new rects.
**This contract must not change after T03.** Three subsystems will depend on
it in T05/T06/T07.
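A DOM-free sketch of the registry core, useful for the unit tests described above. The method names `register`, `getRect`, and `subscribe` come from the task; their signatures, the `Rect` shape, the `"highlight"` kind name, and the choice to notify with `undefined` on unregister are assumptions pending SharedContracts §7.

```ts
export interface Rect { x: number; y: number; width: number; height: number }
export type RectKind = "field" | "evidence-card" | "highlight";
export type RectListener = (kind: RectKind, id: string, rect: Rect | undefined) => void;

export class RectRegistry {
  private rects = new Map<string, Rect>();
  private listeners: RectListener[] = [];

  private static key(kind: RectKind, id: string): string {
    return kind + ":" + id;
  }

  // Registers or updates a rect and returns an unregister function.
  register(kind: RectKind, id: string, rect: Rect): () => void {
    this.rects.set(RectRegistry.key(kind, id), rect);
    this.notify(kind, id, rect);
    return () => {
      this.rects.delete(RectRegistry.key(kind, id));
      this.notify(kind, id, undefined);
    };
  }

  getRect(kind: RectKind, id: string): Rect | undefined {
    return this.rects.get(RectRegistry.key(kind, id));
  }

  subscribe(listener: RectListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  private notify(kind: RectKind, id: string, rect: Rect | undefined): void {
    this.listeners.slice().forEach((listener) => listener(kind, id, rect));
  }
}
```

Keeping observers (ResizeObserver, scroll listeners) outside this class would let the React hook translate DOM changes into plain `register` calls, so the core stays testable without a browser.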
---
## T04 — Form schema + simple field renderer
```task
id: CE-WP-0003-T04
priority: medium
status: todo
depends_on: [T01]
```
A deliberately minimal form schema lives in `src/app/forms/demo-schema.ts`:
```ts
type FormFieldSchema =
| { type: "text"; id: string; label: string }
| { type: "textarea"; id: string; label: string }
| { type: "date"; id: string; label: string };
```
JSON Schema is **not** used yet — defer that to a later ADR. The MVP form
just needs to render 3-4 fields and accept evidence links.
- `src/work/FormRenderer.tsx` renders the schema as a basic form
- Each field registers itself with the rect registry as kind `"field"` with
the field's `id`
---
## T05 — Side-by-side layout + link evidence to field
```task
id: CE-WP-0003-T05
priority: high
status: todo
depends_on: [T02, T04]
```
A new app route `/forms/demo` shows the side-by-side layout from Architecture
§12.2:
- Left: `FormRenderer` with a demo schema (3 fields)
- Right: viewer (reusing `ViewerShell` from CE-WP-0002)
- Bottom strip or popover: evidence list
Linking interaction: click an evidence item, then click a field → link
created. (Drag-and-drop is a polish item, not MVP.) Visual indication on
linked fields (e.g. a chip showing the count of linked evidence items).
---
## T06 — Active-evidence cycling on a field
```task
id: CE-WP-0003-T06
priority: high
status: todo
depends_on: [T05]
```
When a field is focused:
1. Binder loads the field's evidence set.
2. The first evidence item becomes active.
3. The viewer scrolls to and highlights its annotation.
4. Keyboard `Tab`/`Shift-Tab` within the field's evidence chips cycles
active evidence; viewer scrolls accordingly.
5. The evidence sidebar highlights the active evidence card.
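The cycling step can be isolated as a pure function, which keeps the keyboard handler trivial. This sketch assumes wrap-around at both ends and that an unknown or missing active ID starts from the first (or last) item; `cycleActive` is a hypothetical helper name.

```ts
// direction: 1 for Tab (next), -1 for Shift-Tab (previous).
export function cycleActive(
  ids: readonly string[],
  activeId: string | null,
  direction: 1 | -1,
): string | null {
  if (ids.length === 0) return null;
  const current = activeId === null ? -1 : ids.indexOf(activeId);
  // No valid active item yet: start at the first or last item.
  const nextIndex = current === -1
    ? (direction === 1 ? 0 : ids.length - 1)
    : (current + direction + ids.length) % ids.length;
  const next = ids[nextIndex];
  return next === undefined ? null : next;
}
```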
Each evidence card registers itself with the rect registry as
`"evidence-card"`.
---
## T07 — SVG visual guide overlay
```task
id: CE-WP-0003-T07
priority: high
status: todo
depends_on: [T03, T06]
```
Implement `src/binder/visual-guide/Overlay.tsx`:
- Single absolutely-positioned SVG covering the viewport
- Subscribes to the rect registry
- On every change, redraws two curves: `field → evidence-card` and
`evidence-card → highlight`
- Active-only — only the currently active triple gets drawn
- Throttled to animation frames
Acceptance: scroll the viewer, resize the window, change active evidence —
the guide tracks every change without visible lag.
The viewer adapter from CE-WP-0002 must expose
`getHighlightClientRects(annotationId)` so the highlight's rect can be
registered.
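Each of the two curves reduces to computing one SVG path string from a pair of rects. The sketch below assumes a left-to-right layout (source rect to the left of the target) and a simple horizontal cubic Bézier; `connectorPath` and the `Rect` shape are illustrative, not part of the contract.

```ts
export interface Rect { x: number; y: number; width: number; height: number }

// Cubic Bezier from the right-edge midpoint of `from` to the left-edge
// midpoint of `to`, with both control points on the vertical midline.
export function connectorPath(from: Rect, to: Rect): string {
  const x1 = from.x + from.width;
  const y1 = from.y + from.height / 2;
  const x2 = to.x;
  const y2 = to.y + to.height / 2;
  const mid = (x1 + x2) / 2;
  return `M ${x1} ${y1} C ${mid} ${y1}, ${mid} ${y2}, ${x2} ${y2}`;
}
```

The overlay would set this string as the `d` attribute of a `<path>` and recompute it inside a `requestAnimationFrame` callback whenever the registry reports a change.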
---
## T08 — E2E test of PRD scenario steps 5-9
```task
id: CE-WP-0003-T08
priority: high
status: todo
depends_on: [T05, T07]
```
Extend the Playwright E2E from CE-WP-0002-T09:
5. Navigate to `/forms/demo`.
6. Link the previously-created evidence item to the "summary" field.
7. Click the "summary" field.
8. Assert the field, the evidence card, and the highlight all have an
`aria-current="true"` (or equivalent active marker).
9. Assert the SVG overlay contains exactly two `<path>` elements (one
field→card, one card→highlight).
10. Scroll the viewer; assert the SVG paths' endpoints update within the
next animation frame.
If this passes, the form-binding slice is complete and CE-WP-0004 may run
in parallel with any deferred polish work.


@@ -0,0 +1,164 @@
---
id: CE-WP-0004
type: workplan
title: "Citation card export — Markdown and HTML renderers, sidebar export"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
status: todo
owner: Bernd
created: 2026-05-24
updated: 2026-05-24
depends_on_workplan: CE-WP-0002
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
---
# CE-WP-0004 — Citation Card Export
The final step of the MVP scenario: turn an evidence item into a portable
Markdown or HTML citation card.
After this workplan, a user can:
1. Click "Export" on an evidence item in the sidebar.
2. Choose Markdown or HTML.
3. Get a clipboard-ready citation card with quote, source label,
commentary, and a link back to source context.
This workplan can run in parallel with CE-WP-0003 once CE-WP-0002 is done —
it touches different code paths.
## Dependency Order
```
T01 (CitationCard type + open-context URL convention)
└─ T02 (Markdown renderer)
└─ T03 (HTML renderer)
└─ T04 (sidebar Export button + copy-to-clipboard)
└─ T05 (E2E test of PRD scenario step 10)
```
---
## T01 — `CitationCard` type + open-context URL convention
```task
id: CE-WP-0004-T01
priority: high
status: todo
```
Under `src/shared/`:
- `src/shared/citation-card.ts`: `CitationCard` per Architecture §4.7
- `src/shared/open-context-url.ts`: function `openContextUrl(annotationId)`
  returning a URL of the form
  `/viewer?document=<docId>&annotation=<annId>` (per Architecture §14.3)
The URL is the deep link that an exported card uses to reopen the source
context in this MVP. When persistence becomes real (post-MVP), the URL
scheme stays the same.
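A minimal sketch of the URL builder. The task names only `annotationId` as a parameter, but the URL also carries a document ID; since `src/shared/` has no services to look one up, this sketch assumes both IDs are passed in. Treat the two-argument signature as an assumption, not the decided API.

```ts
// Assumption: both IDs are supplied by the caller.
export function openContextUrl(documentId: string, annotationId: string): string {
  return (
    "/viewer?document=" + encodeURIComponent(documentId) +
    "&annotation=" + encodeURIComponent(annotationId)
  );
}
```

Encoding both values keeps the link valid even if a future ID scheme introduces characters such as `&` or spaces.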
---
## T02 — Markdown citation card renderer
```task
id: CE-WP-0004-T02
priority: high
status: todo
depends_on: [T01]
```
Under `src/engine/rendering/`:
- `markdown.ts`: `renderCitationCardMarkdown(evidenceItem, document, annotation): string`
Output format (lock this in `docs/decisions/ADR-0007-citation-card-format.md`):
```markdown
> {quote}
*{sourceLabel}* · [Open source]({openContextUrl})
{commentary}
```
Where `sourceLabel` is `document.title` if present, else the filename, else
the document URI.
Unit tests: snapshot a few rendered cards against fixtures.
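As a sketch of the renderer, the version below takes a flattened, hypothetical `CardInput` rather than the `(evidenceItem, document, annotation)` triple, because the real type shapes live in `wiki/SharedContracts.md`. It assumes the three parts are separated by blank lines and that multi-line quotes are prefixed line by line; it does not escape Markdown characters in the label or commentary.

```ts
// Hypothetical flattened input for illustration.
export interface CardInput {
  quote: string;
  commentary: string;
  openContextUrl: string;
  document: { title?: string; filename?: string; uri: string };
}

// Title if present, else filename, else URI.
export function sourceLabel(doc: CardInput["document"]): string {
  return doc.title || doc.filename || doc.uri;
}

export function renderCitationCardMarkdown(card: CardInput): string {
  // Prefix every line so a multi-line quote stays inside the blockquote.
  const quoted = card.quote.split("\n").map((line) => "> " + line).join("\n");
  // "\u00b7" is the middle dot used in the output format above.
  const attribution =
    "*" + sourceLabel(card.document) + "* \u00b7 [Open source](" + card.openContextUrl + ")";
  return [quoted, attribution, card.commentary].join("\n\n");
}
```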
---
## T03 — HTML citation card renderer
```task
id: CE-WP-0004-T03
priority: high
status: todo
depends_on: [T01]
```
Under `src/engine/rendering/`:
- `html.ts`: `renderCitationCardHtml(evidenceItem, document, annotation): string`
Output: a single `<aside class="citation-card">` element with `<blockquote>`,
`<cite>`, `<a>` (open context), and `<div class="commentary">`. Inline
styles are avoided; the host page provides CSS. Render commentary as escaped
plain text (no raw HTML pass-through).
Web component `<citation-card>` from Architecture §14.2 is *not* in scope
here — it ships in a later workplan.
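The "no raw HTML pass-through" rule comes down to escaping every interpolated value. The sketch below uses a hypothetical flattened `HtmlCardInput` instead of the real `(evidenceItem, document, annotation)` arguments, and it escapes the quote, label, and URL as well as the commentary, which goes slightly beyond what the task text requires.

```ts
// Escapes the five characters that are significant in HTML text and attributes.
export function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical flattened input for illustration.
export interface HtmlCardInput {
  quote: string;
  sourceLabel: string;
  openContextUrl: string;
  commentary: string;
}

export function renderCitationCardHtml(card: HtmlCardInput): string {
  return (
    '<aside class="citation-card">' +
    "<blockquote>" + escapeHtml(card.quote) + "</blockquote>" +
    "<cite>" + escapeHtml(card.sourceLabel) + "</cite>" +
    '<a href="' + escapeHtml(card.openContextUrl) + '">Open source</a>' +
    '<div class="commentary">' + escapeHtml(card.commentary) + "</div>" +
    "</aside>"
  );
}
```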
---
## T04 — Sidebar Export button + copy-to-clipboard
```task
id: CE-WP-0004-T04
priority: medium
status: todo
depends_on: [T02, T03]
```
Add to `src/work/EvidenceSidebar.tsx`:
- Per evidence item: an "Export" affordance (icon button or menu)
- On click: small popover with two buttons, "Copy as Markdown" and
"Copy as HTML"
- On click: render via T02/T03 and write to clipboard with the standard
`navigator.clipboard` API; show a transient confirmation toast
Keyboard shortcut `Cmd/Ctrl+Shift+C` exports the active evidence item as
Markdown (the most common action).
---
## T05 — E2E test of PRD scenario step 10
```task
id: CE-WP-0004-T05
priority: medium
status: todo
depends_on: [T04]
```
Extend the Playwright E2E:
10. After the earlier steps, click Export → Copy as Markdown on the saved
evidence item.
11. Read the clipboard; assert it contains the quote text, the document
title, the commentary, and a URL matching the
`/viewer?document=...&annotation=...` shape.
If this passes, MVP scenario steps 1-10 are all green and the
umbrella-first MVP is *done* for the first reference scenario from PRD §20.
The next workplan (post-MVP) would be `CE-WP-0005` to either extract the
first stable subsystem (likely `citation-engine`) into its own repo or to
add Markdown/HTML document support.

workplans/README.md

@@ -0,0 +1,41 @@
# MVP Workplans
These four workplans implement the **first reference scenario** from
`wiki/ProductRequirementsDocument.md` §20 — end-to-end PDF evidence
capture → form binding → citation card export — entirely inside the
`citation-evidence` repository.
| Workplan | Title | Status |
|----------|----------------------------------------|--------|
| `CE-WP-0001` | Foundations — scaffold, folders, lint rules, normalize, fixtures | todo |
| `CE-WP-0002` | PDF review slice — engine types, anchor, source, viewer, sidebar | todo |
| `CE-WP-0003` | Form binding + visual guide — EvidenceLink, rect registry, overlay | todo |
| `CE-WP-0004` | Citation card export — Markdown + HTML renderers, sidebar export | todo |
## Order
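`CE-WP-0001` and `CE-WP-0002` are strictly sequential: `CE-WP-0002` depends
on the folder/lint scaffolding from `CE-WP-0001`. `CE-WP-0003` and
`CE-WP-0004` both depend on the engine types, viewer adapter, and sidebar
from `CE-WP-0002`; once `CE-WP-0002` is done, `CE-WP-0004` may run in
parallel with `CE-WP-0003`.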
## How to run a workplan
```
/ralph-workplan workplans/CE-WP-0001-foundations.md
```
Ralph drives the loop and retires automatically when all tasks in the
workplan are marked `done`. See `~/.claude/plugins/ralph-workplan/ralph-workplan.md`.
## Acceptance for MVP
The first reference scenario from PRD §20 runs end-to-end:
1. Create a collection
2. Upload a PDF
3. Select a passage, add commentary, create an evidence item
4. Open a side-by-side form
5. Link the evidence item to a form field
6. Focus the field → field, evidence card, and PDF passage all highlighted
7. SVG guide visible between field → card → highlight
8. Export evidence as a Markdown citation card