diff --git a/INTENT.md b/INTENT.md index 5a05461..bd827ca 100644 --- a/INTENT.md +++ b/INTENT.md @@ -189,6 +189,59 @@ This repository should be: --- +## Home for Shared Contracts + +This repository is the **single home for everything the sister repos must +agree on**. The canonical documents live in `wiki/`: + +* `wiki/ProductRequirementsDocument.md` — what the product does +* `wiki/ArchitectureOverview.md` — how the subsystems compose +* `wiki/SharedContracts.md` — vocabulary, state enums, relation types, selector taxonomy, event types, viewer adapter contract, canonical text normalization +* `wiki/DependencyMap.md` — which subsystem may depend on which +* `docs/decisions/` — ADRs that resolve ambiguities and bind the contract + +Sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`, +`citation-work`, `evidence-binder`) defer to these documents. When their +own `INTENT.md` files mention "shared contracts", they mean the documents +listed above. + +Changes to shared contracts happen here, not in the sister repos. + +--- + +## MVP Strategy — Umbrella-First (decided 2026-05-24) + +**The MVP lives entirely in this repository before being segmented into the +sister repos.** This is a deliberate trade-off: fewer interface decisions up +front, more refactoring later when extraction happens. + +The reasoning: + +1. The architectural boundaries documented in the sister INTENT files are + hypotheses. We do not yet know which ones will hold up under real product + pressure. +2. Coordinating six repos with no working code is expensive. Coordinating one + repo with working code is cheap. +3. Interfaces designed in advance of implementation tend to be wrong. +4. Extracting working code into a new repo is a known, bounded refactor. + Reshaping a premature interface while implementing against it is not. 
+ +Concretely: + +* All MVP source code lives under `citation-evidence/src/`, partitioned by + future-repo names (`shared/`, `engine/`, `anchor/`, `source/`, `work/`, + `binder/`, `app/`). +* The `DependencyMap.md` rules are enforced by lint rules on these folders. +* The five sister repos remain INTENT-only during MVP — they document the + intended boundary, not current code. +* When a subsystem's interface stabilizes (typically after the MVP scenario + has run end-to-end at least once), its `src//` slice extracts + to the sister repo. + +This INTENT will be updated when extraction happens. + +--- + ## Success Criteria The repository is successful when it allows a developer or agent to understand, run, and extend the citation-evidence system as an integrated product. diff --git a/history/2026-05-24-initial-assessment.md b/history/2026-05-24-initial-assessment.md new file mode 100644 index 0000000..1c80bd8 --- /dev/null +++ b/history/2026-05-24-initial-assessment.md @@ -0,0 +1,113 @@ +# Initial Assessment — citation-evidence ecosystem + +**Date:** 2026-05-24 +**Author:** Claude (Opus 4.7), commissioned by Bernd +**Scope:** Review of `citation-evidence` umbrella PRD and Architecture overview, plus all five sister-repo `INTENT.md` files, for alignment, risk, and recommended approach. + +--- + +## 1. Overall alignment across the six INTENT.md files + +The vocabulary is impressively coherent: every repo speaks of +`Document → DocumentRepresentation → Annotation → Selector → EvidenceItem → EvidenceLink → CitationCard`. +Each `INTENT.md` follows the same Purpose / Scope / Out-of-Scope / Architectural Position / First-Useful-Version / Success Criteria shape. +Out-of-scope sections show the authors deliberately *pushing* responsibilities into other repos — a healthy signal. + +The PRD and Architecture overview in `citation-evidence/wiki/` are also internally consistent: the PRD's functional requirements map cleanly to the architecture's data flows and to subsystem scopes. 
+ +But the documents were authored in quick succession (all on 2026-05-24, within ~30 minutes of each other based on file timestamps) and **never reconciled against each other**, which created the issues below. + +## 2. What should be improved + +### 2.1 Concrete ownership ambiguities to resolve in short ADRs + +| Concept | Conflict | +|---|---| +| **`Selector` types** | `citation-engine` claims it as a "key concept owned"; `evidence-anchor`'s scope lists "selector type definitions". Likely fix: *interfaces* in engine, *creation/resolution/algorithms* in anchor. | +| **`EvidenceLink` / `EvidenceSet`** | Engine claims both as owned domain types; `evidence-binder` lists "evidence-to-target binding model" and "evidence sets" in scope. Same engine-defines-type / binder-owns-behavior split needed. | +| **Status enums** | Architecture's `EvidenceItem.status` is `candidate\|confirmed\|rejected\|needs-check`. `citation-work` adds `strong-support\|weak-support\|contradicts`. `evidence-binder` adds *target-specific* states (`conflicting-evidence`, `insufficient-evidence`, `verified`) plus extra relations (`context-for`, `derived-from`, `needs-check`). Three repos inventing overlapping enums. | +| **Viewer adapters** | Architecture diagram shows them as a separate box, no owner. Adapter methods (`load`, `createSelectorsFromSelection`, `resolveSelectors`, `scrollToResolvedTarget`, `renderHighlight`) straddle `evidence-source` and `evidence-anchor`. Pick one home (likely `evidence-anchor`, with `evidence-source` providing the representation). | +| **`CitationRecoveryAttempt`** | Type in engine, behavior in `evidence-source` — semantic ownership split that will rot. | +| **Document review status (FR-006)** | No repo claims it; `citation-work` hints "may later be moved into a shared model". 
| + +### 2.2 Repository scaffolding gaps + +- The umbrella architecture (§3.1) promises `apps/workspace-demo/`, `docs/decisions/`, `integration-tests/`, `docker-compose.yml` — none of this exists yet. +- All six READMEs are essentially empty (1 line). New contributors and agents won't know where to start. +- `citation-evidence` is **not registered in the state-hub**. For a project that splits across six repos, you lose central memory of decisions/dependencies/progress without it. + +### 2.3 Architectural decisions still pending + +ADR-001 through ADR-005 in the architecture doc are framed as "recommendations" rather than commitments. Each blocks code: + +- React-first vs web-component-first (drives repo packaging) +- Local-first vs server-first storage (drives persistence interface shape) +- W3C internal model vs mapping (drives every type definition) +- `react-pdf-highlighter-plus` vs PDF.js direct (drives MVP timeline by weeks) +- Recovery scope local-only vs external + +### 2.4 Missing cross-repo contract artefacts + +There is no central dependency map. Each repo says "I expect to depend on X" but nothing names which repo *publishes* the shared types package(s). Pick monorepo (pnpm workspace) vs polyrepo with published `@citation-evidence/engine` npm packages before the first commit of code lands — switching later is painful. + +## 3. Technical risks to inspect first + +In rough order of "if this is broken, the architecture doesn't work": + +1. **PDF canonical-text stability** — the entire selector/anchor model assumes a given PDF + extraction pipeline produces *the same* canonical text each time. PDF.js text extraction has known issues with multi-column layouts, custom-glyph fonts, ligatures, soft hyphens, and reading order. Build a corpus of 15-20 representative PDFs (governmental forms, two-column papers, scanned-then-OCR'd, German umlauts) and confirm round-trip selector resolution before committing to the model. + +2. 
**`react-pdf-highlighter-plus` abstraction leakage** — this library is opinionated; wrapping it cleanly while keeping the engine viewer-independent is the central architectural test. Do a focused spike: load PDF → select → store selectors as JSON → reload page → resolve from JSON → highlight. If this leaks PDF.js types into the engine API, the boundary fails on day one. + +3. **Canonical-text normalization is a silent migration** — every stored annotation's `TextQuoteSelector` / `TextPositionSelector` depends on the *exact* normalization rules used at creation time. Treat normalization as a versioned, deterministic function from day one. If you change Unicode normalization or whitespace handling later, every stored annotation breaks silently. + +4. **Visual guide overlay coupling** — `evidence-binder` owns the visual-guide *model*, but rendering needs DOM rects from three sources: the form (binder's UI?), the evidence sidebar (`citation-work`), and the document highlight (viewer adapter). Three subsystems contributing rects to one overlay is the highest-coupling part of the system. Define an explicit *rect registry* contract before any of them ships UI. + +5. **CSS Custom Highlight API support** — architecture mentions it for HTML/Markdown with fallback. Browser support is uneven; the fallback (usually DOM range-based span wrapping) is what will actually run on most users' machines. Verify the fallback path is acceptable, not the optimistic primary. + +6. **W3C Web Annotation mapping is not free** — JSON-LD selectors can express things your internal model can't (and vice versa). Round-tripping is a research task, not a one-day mapping. Decide whether mapping is "lossy but useful" or "MUST round-trip" before stabilizing types. + +7. **Multi-repo dependency cycle risk** — engine ↔ anchor (`Selector` ownership), engine ↔ source (`RecoveryAttempt`), engine ↔ binder (`Link`/`Set`) all currently look bidirectional in the INTENT files. 
Without a strict "types-only flow downward, behavior flows upward" rule, you will hit `npm install` cycles. + +## 4. Rough approach (original phased plan) + +**Phase 0 — Foundations (1-2 weeks, no production code)** +- Register `citation-evidence` as a state-hub domain + register all six repos +- Write 5-7 micro-ADRs in `citation-evidence/docs/decisions/` resolving the ownership ambiguities above +- Pick monorepo-vs-polyrepo and pin Node/TS toolchain +- Assemble a 15-20 PDF test corpus and check it into a fixtures location +- Write a real README for each repo pointing at INTENT + architecture + +**Phase 1 — Vertical slice on the easiest format (4-6 weeks)** +- Engine: TS types + in-memory repos only +- Anchor: text-quote + text-position selectors, fuzzy match deferred +- Source: PDF text extraction + fingerprint only +- Work: one-document UI, sidebar, create annotation, click-to-reopen +- Umbrella: wire it into a reference app +- Goal: prove viewer-independence on PDFs end-to-end. No forms, no recovery, no Markdown. + +**Phase 2 — Evidence binding & form mode (4 weeks)** +- Binder + visual-guide rect registry +- One form-schema example with side-by-side viewer +- This is where the active-state coordination claim gets stress-tested + +**Phase 3 — Format expansion (4 weeks)** +- HTML adapter (sanitization + DOM range selectors) +- Markdown adapter +- Confirms the format-neutral claim + +**Phase 4 — Local citation recovery (4 weeks)** +- Local-library search, exact + fuzzy quote match, confirmation UI +- Defer external source lookup until local pipeline is reliable + +## 5. Pivot — umbrella-first MVP (decided 2026-05-24) + +The user has chosen to **build the MVP entirely inside `citation-evidence`** before segmenting code into the sister repos. The reasoning: get the product working end-to-end with minimal coordination cost, then extract subsystems once the contracts have been validated by actual use. 
+ +This means: + +- All MVP source code lives under `citation-evidence/` (likely `src/` partitioned by future-repo names: `engine/`, `anchor/`, `source/`, `work/`, `binder/`). +- The five sister repos remain as INTENT-only placeholders during MVP — they document the intended boundaries, but code will move in only when a subsystem's contract has stabilized. +- Interface design is explicitly deferred. Phase-0 ADRs become Phase-N extractions, informed by real friction points. +- Shared contracts live in `citation-evidence/wiki/SharedContracts.md` and `citation-evidence/wiki/DependencyMap.md`. + +This trade-off accepts more rework later (when subsystems extract) in exchange for faster MVP velocity now and better-informed boundaries when extraction happens. diff --git a/wiki/DependencyMap.md b/wiki/DependencyMap.md new file mode 100644 index 0000000..d0e8feb --- /dev/null +++ b/wiki/DependencyMap.md @@ -0,0 +1,155 @@ +# Dependency Map — citation-evidence + +This document describes the **allowed dependency edges** between the +subsystems of the citation-evidence ecosystem. It is the cycle-prevention +contract. + +It complements `SharedContracts.md` (which says *what* is shared) by saying +*who is allowed to depend on whom*. + +--- + +## 1. The rule + +> Types flow downward from `citation-engine`. Behavior flows upward into +> specialised repos. No subsystem may import another subsystem's behavior +> unless this map shows an edge. + +The umbrella repo `citation-evidence` is allowed to depend on every +subsystem; nothing depends on the umbrella. + +--- + +## 2. 
Allowed edges + +``` + ┌───────────────────────┐ + │ citation-evidence │ (umbrella) + └───────────┬───────────┘ + │ depends on + ┌──────────────────────────┼────────────────────────────┐ + ▼ ▼ ▼ +┌───────────────┐ ┌────────────────┐ ┌────────────────┐ +│ citation- │ │ evidence- │ │ citation- │ +│ work │ │ binder │ │ engine │ +└──────┬────────┘ └────────┬───────┘ └────────┬───────┘ + │ │ │ + │ depends on │ depends on │ depends on + │ │ │ (nothing — + ▼ ▼ │ leaf node) +┌────────────────┐ ┌────────────────┐ │ +│ evidence- │ │ evidence- │ │ +│ anchor │ │ anchor │ │ +└──────┬─────────┘ └────────┬───────┘ │ + │ │ │ + │ depends on │ depends on │ + ▼ ▼ ▼ +┌────────────────┐ ┌────────────────┐ (citation-engine) +│ evidence- │ │ citation- │ +│ source │ │ engine │ +└────────┬───────┘ └────────────────┘ + │ + │ depends on + ▼ +┌────────────────┐ +│ citation- │ +│ engine │ +└────────────────┘ +``` + +In tabular form: + +| Repo | May depend on | Must not depend on | +|--------------------|--------------------------------------------------------|-----------------------------------------| +| `citation-engine` | (nothing — it is the leaf) | every other subsystem | +| `evidence-anchor` | `citation-engine` | `evidence-source`, `citation-work`, `evidence-binder`, `citation-evidence` | +| `evidence-source` | `citation-engine` | `evidence-anchor`, `citation-work`, `evidence-binder`, `citation-evidence` | +| `evidence-binder` | `citation-engine`, `evidence-anchor` | `evidence-source`, `citation-work`, `citation-evidence` | +| `citation-work` | `citation-engine`, `evidence-anchor`, `evidence-source`| `evidence-binder`, `citation-evidence` | +| `citation-evidence`| all five subsystems | (nothing else in the ecosystem) | + +Notes: + +- `evidence-source` does NOT depend on `evidence-anchor`. When an ingestion + pipeline needs to know "could a selector resolve here?", the answer comes + through events, not direct calls. +- `citation-work` does NOT depend on `evidence-binder`. 
Linking evidence to + form fields is a separate workflow; the review workspace should function + without it. A separate "evidence-backed form" application composes work + + binder + engine. +- `evidence-binder` does NOT depend on `evidence-source`. When a binder needs + source context, it asks `evidence-anchor` to resolve the annotation, which + in turn knows nothing about how the document was ingested. + +--- + +## 3. Communication channels + +Direct imports are allowed only along the edges above. Where two subsystems +need to coordinate without being allowed to import each other, they use one +of these indirect channels: + +| Channel | Owner | Notes | +|---------------------------------|------------------|---------------------------------------------------------| +| Shared event bus | `citation-engine`| Vocabulary frozen in `SharedContracts.md` §4 | +| Shared types package | `citation-engine`| Re-exported through `@citation-evidence/engine` (post-extraction) | +| Rect registry | `evidence-binder`| Used by form UI, evidence sidebar, viewer adapter | +| Persistence interfaces | `citation-engine`| Concrete adapters in subsystems but interfaces in engine| + +--- + +## 4. During umbrella-first MVP + +While all code lives in `citation-evidence/src/`, the rule is enforced by +**folder structure** and **lint rules**: + +``` +citation-evidence/src/ + shared/ ← what will become citation-engine (types + contracts) + engine/ ← what will become citation-engine (services) + anchor/ ← what will become evidence-anchor + source/ ← what will become evidence-source + work/ ← what will become citation-work (UI) + binder/ ← what will become evidence-binder + app/ ← the umbrella reference app +``` + +Lint rule (to be added in WP-0001): + +- `engine/` may import only from `shared/`. +- `anchor/` may import only from `shared/`, `engine/`. +- `source/` may import only from `shared/`, `engine/`. +- `binder/` may import only from `shared/`, `engine/`, `anchor/`. 
+- `work/` may import only from `shared/`, `engine/`, `anchor/`, `source/`. +- `app/` may import from any. + +Violating these rules in MVP is a lint error, not a runtime error. When +subsystems extract into their own repos, the lint rule disappears and the +package boundary enforces the same constraint. + +--- + +## 5. Why these rules + +1. **`citation-engine` as the leaf** prevents the most common monorepo pathology: + the "core" repo accumulating UI/IO dependencies because it was easier than + inverting a dependency. +2. **`citation-work` ⊄ `evidence-binder`** keeps the review workspace usable + even when there is no form context (e.g. just collecting evidence for a + report). +3. **`evidence-binder` ⊄ `evidence-source`** keeps binding logic from + accidentally caring about ingestion details. +4. **No subsystem depends on `citation-evidence`** — the umbrella is a + composition point, not a library. + +--- + +## 6. Change process + +Adding an edge to this map is a change to the contract. + +- New edges require a short ADR in `docs/decisions/`. +- Removing an edge requires a refactoring plan (where do consumers go?). +- The MVP itself is an exception: edges that turn out to be wrong during + umbrella-first development are recorded as "deferred reshape" items in the + relevant workplan, not as ADRs. diff --git a/wiki/SharedContracts.md b/wiki/SharedContracts.md new file mode 100644 index 0000000..26b7f32 --- /dev/null +++ b/wiki/SharedContracts.md @@ -0,0 +1,296 @@ +# Shared Contracts — citation-evidence + +This document is the **single source of truth** for everything that more than one +subsystem in the citation-evidence ecosystem must agree on: + +- the **vocabulary** (entity names and what they mean), +- the **canonical state enums** for entities that flow across repo boundaries, +- the **relation type** vocabulary, +- the **selector type** taxonomy, +- the **event type** vocabulary, +- the **ownership rules** for shared types versus shared behavior. 
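A minimal sketch of how the closed vocabularies in the list above might be carried into `src/shared/` as TypeScript. The values are copied from §2.2 and §2.5 of this document; the identifier names (`EVIDENCE_ITEM_STATUSES`, `EVIDENCE_LINK_RELATIONS`, `isEvidenceItemStatus`) are assumptions made for illustration, not existing code.

```typescript
// Values copied from SharedContracts §2.2; the constant name is hypothetical.
export const EVIDENCE_ITEM_STATUSES = [
  "candidate",
  "confirmed",
  "rejected",
  "needs-check",
] as const;

// Union type derived from the tuple, so type and runtime list cannot drift apart.
export type EvidenceItemStatus = (typeof EVIDENCE_ITEM_STATUSES)[number];

// Values copied from SharedContracts §2.5; the constant name is hypothetical.
export const EVIDENCE_LINK_RELATIONS = [
  "supports",
  "contradicts",
  "explains",
  "qualifies",
  "source-for",
  "context-for",
] as const;

export type EvidenceLinkRelation = (typeof EVIDENCE_LINK_RELATIONS)[number];

// Runtime guard for untrusted input such as persisted JSON (hypothetical helper).
export function isEvidenceItemStatus(value: unknown): value is EvidenceItemStatus {
  return (
    typeof value === "string" &&
    (EVIDENCE_ITEM_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}
```

Deriving the union from the tuple keeps the compile-time type and the runtime check in one place, which fits rule 5 in §8: a value that is not in this document should not be accepted by code either.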
+ +The five sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`, +`citation-work`, `evidence-binder`) defer to this document. When their +`INTENT.md` files refer to "shared contracts", they mean this file. + +During the umbrella-first MVP phase, the **TypeScript implementations** of +these contracts live in `citation-evidence/src/shared/` and are imported by +the per-subsystem code under `citation-evidence/src/{engine,anchor,source,work,binder}/`. +When a subsystem extracts to its own repo, it takes its slice of the shared +types with it — but this document remains the canonical vocabulary. + +--- + +## 1. Vocabulary + +These nine entities are the vocabulary every subsystem uses. + +| Entity | One-line definition | Owner (post-extraction) | +|---------------------------|----------------------------------------------------------------------------------------------------|-------------------------| +| `Document` | An identified source object: PDF, Markdown, HTML, scan, etc. | `citation-engine` | +| `DocumentRepresentation` | A normalized, addressable view of a document (canonical text, page map, structure). | `citation-engine` | +| `Selector` | A technical locator for a passage inside a representation. | `citation-engine` (types) / `evidence-anchor` (behavior) | +| `Annotation` | A technical mark on a document range, expressed as one or more selectors plus quote text. | `citation-engine` | +| `EvidenceItem` | A meaningful evidence object built from one or more annotations, with commentary and status. | `citation-engine` | +| `EvidenceSet` | An ordered group of evidence items associated with a target or topic. | `citation-engine` (type) / `evidence-binder` (behavior) | +| `EvidenceLink` | A relation between an `EvidenceItem` and a structured target (form field, claim, requirement, …). | `citation-engine` (type) / `evidence-binder` (behavior) | +| `CitationCard` | A renderable, exportable presentation of an evidence item. 
| `citation-engine` | +| `CitationRecoveryAttempt` | A traceable attempt to locate a cited passage from an external clue. | `citation-engine` (type) / `evidence-source` (behavior) | + +**Ownership rule:** *types and interfaces flow downward from `citation-engine`; +behavior flows upward into the specialised repos*. Where the table shows a +split, the engine repo holds the data shape and the other repo holds the +algorithms and lifecycle. + +--- + +## 2. Canonical state enums + +These enums are the authoritative values. Subsystems must not invent local +variants without updating this document first. + +### 2.1 `Annotation.resolutionStatus` + +``` +resolved — selectors located the passage with high confidence +ambiguous — multiple plausible candidates found +unresolved — no plausible candidate found +stale — representation has changed since selectors were stored +``` + +### 2.2 `EvidenceItem.status` + +``` +candidate — captured but not yet vetted +confirmed — verified by a user as useful evidence +rejected — explicitly discarded +needs-check — flagged for review +``` + +> **Note:** earlier subsystem drafts introduced `strong-support`, `weak-support`, +> and `contradicts` on the item. Those concepts now live on the **link**, not +> the item — see §2.4. + +### 2.3 `Document.reviewStatus` (when used by `citation-work`) + +``` +unreviewed +in-review +relevant +rejected +needs-follow-up +cited +verified +``` + +`citation-work` may treat any of these as the active state; the canonical +storage lives on the Document record in `citation-engine`. + +### 2.4 `EvidenceLink.status` (per target) + +``` +no-evidence +candidate +confirmed +conflicting +insufficient +verified +``` + +`no-evidence` is a *derived* state computed when a target has zero links; +it is not stored on a link itself. + +### 2.5 `EvidenceLink.relation` + +``` +supports +contradicts +explains +qualifies +source-for +context-for +``` + +This is the closed vocabulary for the MVP. 
Adding a relation requires updating +this document and the `EvidenceLink` schema together. + +### 2.6 `CitationRecoveryAttempt.state` + +``` +created +source-found-fulltext +source-found-preview-only +source-found-metadata-only +source-not-found +quote-found +quote-not-found +candidate-passages-found +manual-confirmation-needed +confirmed +annotation-created +failed +``` + +--- + +## 3. Selector taxonomy + +A `Selector` is a discriminated union of: + +``` +TextQuoteSelector exact quote + prefix/suffix context +TextPositionSelector canonical text start/end offsets +PdfRectSelector page number + normalized page rectangles +PdfPageTextSelector page number + page-local text offsets +DomRangeSelector DOM path + range offsets (HTML/Markdown) +StructuralSelector heading/section/AST path +FragmentSelector exported fragment / deep link (export-only) +``` + +**Selector redundancy rule:** when an annotation is created, the system stores +*all selector types that are available* for that document representation, not +just one. Resolution tries them in order of expected confidence and stops at +the first high-confidence match. + +W3C Web Annotation mapping uses these same concepts but as JSON-LD; the mapping +is documented separately (see ADR-0003 — pending). + +--- + +## 4. Event vocabulary + +Events are the primary integration mechanism between subsystems. The closed +event vocabulary for the MVP is: + +``` +DocumentImported +DocumentRepresentationGenerated +AnnotationCreated +AnnotationResolved +AnnotationResolutionFailed +EvidenceItemCreated +EvidenceItemUpdated +EvidenceLinkCreated +EvidenceLinkUpdated +EvidenceItemActivated +FormFieldActivated +CitationCardRendered +CitationRecoveryStarted +CitationRecoveryCandidateFound +CitationRecoveryConfirmed +``` + +Subsystems must emit these events through a shared event bus owned by +`citation-engine`. Subsystems may listen to any event but must not invent +event types without updating this document. + +--- + +## 5. 
Viewer adapter contract + +Viewer adapters are the bridge between a document format and the rest of the +system. They are **owned by `evidence-anchor`** as far as the contract goes; +concrete adapters may live in either `evidence-anchor` or `evidence-source` +depending on whether the heavy lifting is selector logic or document +representation logic. + +```ts +interface DocumentViewerAdapter { + mediaTypes: string[]; + load(document: Document, representation?: DocumentRepresentation): Promise; + getCurrentSelection(): Promise; + createSelectorsFromSelection(selection: SelectionCapture): Promise; + resolveSelectors(selectors: Selector[]): Promise; + scrollToResolvedTarget(target: ResolvedAnchorTarget, opts?: { center?: boolean; behavior?: "auto"|"smooth" }): Promise; + renderHighlight(target: ResolvedAnchorTarget, opts?: HighlightRenderOptions): Promise; + getHighlightClientRects(annotationId: string): Promise; +} +``` + +MVP delivers a single `PDFViewerAdapter`. HTML and Markdown adapters are +deferred. + +--- + +## 6. Canonical text normalization + +All text-based selectors and quote matching depend on a deterministic +normalization function. The MVP normalization is: + +1. Unicode NFC normalization. +2. Replace all line-ending sequences with `\n`. +3. Collapse runs of horizontal whitespace into a single space. +4. Strip soft hyphens (U+00AD). +5. Preserve paragraph boundaries (double `\n`). + +**This function is versioned.** Stored selectors record the normalization +version they were created against. Changing the function later requires either +backwards-compatible behavior or a re-anchoring migration. + +The reference implementation lives in `citation-evidence/src/shared/text/normalize.ts`. + +--- + +## 7. Visual guide rect registry + +The visual-guide overlay (form field → evidence card → source highlight) +requires DOM rects from three independently-rendered subsystems. 
The contract +is a **rect registry** owned by `evidence-binder`: + +```ts +interface RectRegistry { + register(kind: "field" | "evidence-card" | "highlight", id: string, getRect: () => DOMRect | null): () => void; + getRect(kind: "field" | "evidence-card" | "highlight", id: string): DOMRect | null; + subscribe(listener: (event: RectRegistryEvent) => void): () => void; +} +``` + +Each renderer (form, evidence sidebar, viewer adapter) registers a +`getRect` callback. The overlay queries on demand and re-renders on scroll, +resize, focus, and active-evidence change. + +This contract MUST be defined and stable before any of the three renderers +hardens, or the overlay becomes the system's coupling bottleneck. + +--- + +## 8. Ownership rules (the short version) + +1. **Types and interfaces** flow downward from `citation-engine`. +2. **Behavior and algorithms** live in the specialised repos. +3. Where a concept appears in both a type and a behavior context (e.g. + `Selector`, `EvidenceLink`, `EvidenceSet`, `CitationRecoveryAttempt`), + the engine owns the shape and the specialised repo owns the lifecycle. +4. **The shared event bus is engine-owned**; subsystems publish and subscribe + but do not extend the event vocabulary unilaterally. +5. **No new enum values, relation types, event types, or selector kinds** + land in code without first appearing in this document. +6. During umbrella-first MVP: rules 1-5 are aspirational. We will tolerate + small violations in `citation-evidence/src/` and reconcile during extraction. + +--- + +## 9. Change process + +Changes to this document are changes to the contract. + +- Small additions (a new enum value, a new event type) can be made in a single + PR that updates this doc + the type definitions + at least one consumer. +- Breaking changes (renaming an entity, removing a state, changing an + ownership split) require a short ADR in `docs/decisions/` and a heads-up + progress event on the state-hub. + +--- + +## 10.
Pending ADRs that will affect this document + +These are listed in `docs/decisions/` once written. Until then the document +reflects the current best understanding from the architecture overview. + +- **ADR-0001** — Umbrella-first MVP strategy (decided 2026-05-24, this session). +- **ADR-0002** — Monorepo vs polyrepo packaging (pending). +- **ADR-0003** — W3C Web Annotation: lossy mapping vs round-trip guarantee (pending). +- **ADR-0004** — PDF viewer library choice: `react-pdf-highlighter-plus` vs PDF.js direct (pending). +- **ADR-0005** — Persistence: local-first SQLite vs Postgres from day one (pending). +- **ADR-0006** — Selector ownership split (types in engine, algorithms in anchor) (pending — implied here). diff --git a/workplans/CE-WP-0001-foundations.md b/workplans/CE-WP-0001-foundations.md new file mode 100644 index 0000000..5c7b46d --- /dev/null +++ b/workplans/CE-WP-0001-foundations.md @@ -0,0 +1,246 @@ +--- +id: CE-WP-0001 +type: workplan +title: "Foundations — TS scaffold, folder layout, lint boundaries, normalization, fixtures" +domain: citation_evidence +repo: citation-evidence +repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6 +status: todo +owner: Bernd +created: 2026-05-24 +updated: 2026-05-24 +spec_refs: + - wiki/ProductRequirementsDocument.md + - wiki/ArchitectureOverview.md + - wiki/SharedContracts.md + - wiki/DependencyMap.md +--- + +# CE-WP-0001 — Foundations + +Establish the skeleton of the umbrella-first MVP: a TypeScript project with +a folder layout that mirrors the future subsystem split (so that extracting +to sister repos later is a `git mv` plus a `package.json` cut), lint rules +that enforce the dependency map at the folder level, the versioned +canonical-text normalization function, and a small but representative PDF +fixtures corpus. + +No product features yet. This workplan exists so that everything from +`CE-WP-0002` onward has somewhere to land. + +## Decisions captured here + +Each task below corresponds to a Phase-0 ADR. 
The ADR lives at +`docs/decisions/ADR-NNNN-.md`. If a task involves a choice that wasn't +already decided, the agent stops and asks Bernd before writing code. + +## Dependency Order + +``` +T01 (toolchain decision + package.json) + └─ T02 (folder layout per DependencyMap §4) + └─ T03 (lint rules enforcing dep edges) + └─ T04 (canonical text normalization v1, versioned) + └─ T05 (fixtures: 5+ representative PDFs + a manifest) + └─ T06 (README upgrade + dev workflow doc) + └─ T07 (write the six pending ADRs as stubs) +``` + +--- + +## T01 — Toolchain + package.json + tsconfig + +```task +id: CE-WP-0001-T01 +priority: critical +status: todo +``` + +Decide the TS toolchain (vite vs tsc-only vs Next.js) and write a single +`package.json` at the repo root. Decisions to lock in this task as an ADR +(`docs/decisions/ADR-0001-toolchain.md`): + +- Bundler: vite (recommended, fastest dev loop for a React MVP) +- Package manager: pnpm (recommended, plays well with future workspace split) +- React 18+ +- Strict TS + +Deliverables: +- `package.json` with `dev`, `build`, `test`, `lint`, `typecheck` scripts +- `tsconfig.json` with strict mode, paths for the `src/` partitions +- `.nvmrc` pinning Node version +- `docs/decisions/ADR-0001-toolchain.md` written and committed + +Do not install application dependencies yet — just the toolchain. 
+ +--- + +## T02 — Folder layout matching DependencyMap §4 + +```task +id: CE-WP-0001-T02 +priority: critical +status: todo +depends_on: [T01] +``` + +Create the source folder layout: + +``` +src/ + shared/ # will become @citation-evidence/engine (types + contracts) + engine/ # will become @citation-evidence/engine (services) + anchor/ # will become @citation-evidence/anchor + source/ # will become @citation-evidence/source + work/ # will become @citation-evidence/work (UI) + binder/ # will become @citation-evidence/binder + app/ # the reference workspace shell +``` + +Each folder gets: +- A one-line `README.md` stating its future home +- An `index.ts` that re-exports its public API (empty for now) + +Add path aliases in `tsconfig.json`: `@shared/*`, `@engine/*`, etc. + +--- + +## T03 — Lint rules enforcing dependency edges + +```task +id: CE-WP-0001-T03 +priority: high +status: todo +depends_on: [T02] +``` + +Install `eslint-plugin-boundaries` (or equivalent) and configure rules per +`wiki/DependencyMap.md` §4: + +| Folder | May import from | +|--------------|--------------------------------------------------| +| `shared/` | (nothing internal) | +| `engine/` | `shared/` | +| `anchor/` | `shared/`, `engine/` | +| `source/` | `shared/`, `engine/` | +| `binder/` | `shared/`, `engine/`, `anchor/` | +| `work/` | `shared/`, `engine/`, `anchor/`, `source/` | +| `app/` | any | + +Add a failing test fixture that imports `source/` from `binder/` and confirm +lint catches it; remove the fixture afterward. + +`pnpm lint` must pass on a clean tree. + +--- + +## T04 — Canonical text normalization v1 + +```task +id: CE-WP-0001-T04 +priority: critical +status: todo +depends_on: [T02] +``` + +Implement `src/shared/text/normalize.ts` per `wiki/SharedContracts.md` §6: + +1. Unicode NFC +2. Normalize line endings to `\n` +3. Collapse horizontal whitespace runs to a single space +4. Strip soft hyphens (U+00AD) +5.
Preserve paragraph boundaries (`\n\n`) + +Public API: + +```ts +export const NORMALIZE_VERSION = 1; +export function normalize(input: string): { text: string; version: number }; +``` + +Include unit tests covering: ligatures, CRLF input, soft-hyphenated German, +mixed whitespace, paragraph preservation. + +Stored selectors will record this version number so that future normalization +changes can be detected as a migration concern. + +--- + +## T05 — PDF fixtures corpus + manifest + +```task +id: CE-WP-0001-T05 +priority: high +status: todo +depends_on: [T01] +``` + +Assemble `fixtures/pdfs/` with at least 5 representative PDFs: + +- A simple single-column text PDF +- A two-column academic PDF (e.g. ACM-style) +- A German PDF with umlauts and soft hyphens +- A form PDF (e.g. a public-sector application form) +- A PDF with a heading hierarchy + +Write `fixtures/pdfs/manifest.json` recording for each: +- filename +- short description +- expected page count +- one short "known-good quote" with the page number it appears on (used by + CE-WP-0002 selector tests) + +Keep each PDF small (< 1 MB) and check sources/licenses into +`fixtures/pdfs/SOURCES.md`. Public-domain or Bernd-authored only. + +--- + +## T06 — README upgrade + dev workflow doc + +```task +id: CE-WP-0001-T06 +priority: medium +status: todo +depends_on: [T01, T02] +``` + +Replace the one-line `README.md` with a real one: + +- What citation-evidence is (one paragraph from INTENT) +- Repository layout (point at `src/` partitions and what each becomes) +- Where to find docs (`wiki/`, `docs/decisions/`, `history/`, `workplans/`) +- Dev workflow: `pnpm install`, `pnpm dev`, `pnpm test`, `pnpm lint` +- Pointer to `~/ralph-workplan/` for how workplans are driven + +Add a one-paragraph `README.md` in each of the five sister repos pointing +back at this umbrella + reminding readers that code lives upstream during +the MVP phase. 
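
Returning to T04 above, the five normalization steps could be sketched roughly as follows. This is a minimal sketch, not the contract in `wiki/SharedContracts.md` §6: it assumes "horizontal whitespace" means space, tab, and no-break space, and it strips soft hyphens before collapsing whitespace so that a soft hyphen between two spaces does not leave a double space behind.

```ts
export const NORMALIZE_VERSION = 1;

// Sketch only: the whitespace class and the step ordering are assumptions.
export function normalize(input: string): { text: string; version: number } {
  const text = input
    .normalize("NFC") // 1. Unicode NFC (compatibility ligatures such as U+FB01 are left intact by NFC)
    .replace(/\r\n?/g, "\n") // 2. CRLF and lone CR become \n
    .replace(/\u00AD/g, "") // 4. strip soft hyphens (done before step 3, see lead-in)
    .replace(/[ \t\u00A0]+/g, " "); // 3. collapse horizontal whitespace runs
  // 5. newlines are not touched after step 2, so "\n\n" paragraph breaks survive
  return { text, version: NORMALIZE_VERSION };
}
```

Because NFC does not decompose ligatures, a ligature unit test under this sketch would assert that the ligature is preserved rather than expanded.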
+ +--- + +## T07 — Stub the six pending ADRs + +```task +id: CE-WP-0001-T07 +priority: medium +status: todo +depends_on: [T01] +``` + +Create stub files in `docs/decisions/` for each ADR mentioned in +`wiki/SharedContracts.md` §10: + +- `ADR-0001-toolchain.md` (filled in by T01) +- `ADR-0002-monorepo-vs-polyrepo.md` +- `ADR-0003-w3c-mapping-scope.md` +- `ADR-0004-pdf-viewer-library.md` +- `ADR-0005-persistence.md` +- `ADR-0006-selector-ownership-split.md` + +Each stub: title, status (`proposed` for 2-6), context (one paragraph +explaining what the decision is about and why it matters), options (bullet +list with pros/cons), decision (blank), consequences (blank). + +These are not decisions yet — they are *the questions that must be answered +before the relevant code lands*. The MVP can proceed without 2-6 being +resolved because no extraction or persistence happens until later workplans. diff --git a/workplans/CE-WP-0002-pdf-review-slice.md b/workplans/CE-WP-0002-pdf-review-slice.md new file mode 100644 index 0000000..409c498 --- /dev/null +++ b/workplans/CE-WP-0002-pdf-review-slice.md @@ -0,0 +1,283 @@ +--- +id: CE-WP-0002 +type: workplan +title: "PDF review slice — engine types, anchor, source, viewer, sidebar, click-to-reopen" +domain: citation_evidence +repo: citation-evidence +repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6 +status: todo +owner: Bernd +created: 2026-05-24 +updated: 2026-05-24 +depends_on_workplan: CE-WP-0001 +spec_refs: + - wiki/ProductRequirementsDocument.md + - wiki/ArchitectureOverview.md + - wiki/SharedContracts.md +--- + +# CE-WP-0002 — PDF Review Slice + +The first vertical product slice. After this workplan, a user can: + +1. Open the app, see a collection of fixture PDFs. +2. Open one PDF in a viewer. +3. Select text, add a one-line comment, save as an evidence item. +4. See the evidence item appear in a sidebar. +5. Click the evidence item and have the PDF jump to and highlight the + passage — even after a full page reload. 
+ +No forms, no Markdown/HTML, no recovery, no export. Those come later. + +This workplan exercises the riskiest architectural assumption (PDF selector +round-trip with viewer independence) on the simplest possible feature set. + +## Risk-driven order + +T01 and T02 are the spike from the assessment: prove the +`react-pdf-highlighter-plus` integration can store and reload selectors +without leaking viewer types into engine code. If that breaks, the rest of +the workplan stops and a new ADR is required for ADR-0004 (PDF viewer choice). + +## Dependency Order + +``` +T01 (engine types: Document, Representation, Annotation, Selector, EvidenceItem) + └─ T02 (PDF viewer adapter spike — store + reload selectors as JSON) + └─ T03 (evidence-source: PDF ingest, fingerprint, canonical text) + └─ T04 (evidence-anchor: TextQuote + TextPosition resolution against representation) + └─ T05 (in-memory repositories + engine services) + └─ T06 (citation-work UI: collection list + viewer shell + sidebar) + └─ T07 (annotation create flow) + └─ T08 (click-to-reopen flow) + └─ T09 (end-to-end test of PRD scenario steps 1-4) +``` + +--- + +## T01 — Engine types in `src/shared/` + +```task +id: CE-WP-0002-T01 +priority: critical +status: todo +``` + +Translate the type definitions in `wiki/SharedContracts.md` §1 and §3 into +TypeScript under `src/shared/`: + +- `src/shared/document.ts` — `Document`, `DocumentRepresentation`, `PageMap`, + `OffsetMap` +- `src/shared/selector.ts` — `Selector` discriminated union with at minimum + `TextQuoteSelector`, `TextPositionSelector`, `PdfRectSelector`, + `PdfPageTextSelector`. Other selector kinds defined as `never`-typed stubs + for now. +- `src/shared/annotation.ts` — `Annotation` with `selectors`, `quote`, + `note`, `normalizeVersion` +- `src/shared/evidence.ts` — `EvidenceItem`, `EvidenceItem.status` enum per + §2.2 +- `src/shared/ids.ts` — branded ID types and a `newId(prefix)` helper + +No services, no behavior. Pure data shapes + the ID helper. 
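
As an illustration of the "pure data shapes + the ID helper" idea, a sketch along these lines may help. The field names (`kind`, `exact`, `prefix`, `suffix`, `start`, `end`), the `Brand` helper, and the random ID scheme are all assumptions for illustration; the authoritative shapes are whatever `wiki/SharedContracts.md` §1 and §3 specify.

```ts
// Branded ID: a plain string at runtime, a distinct type at compile time.
type Brand<T, B extends string> = T & { readonly __brand: B };
export type DocumentId = Brand<string, "DocumentId">;
export type AnnotationId = Brand<string, "AnnotationId">;

// Hypothetical ID scheme: prefix + timestamp + random suffix.
export function newId<T extends string>(prefix: string): T {
  const random = Math.random().toString(36).slice(2, 10);
  return `${prefix}_${Date.now().toString(36)}${random}` as T;
}

// Two members of the Selector discriminated union (field names assumed).
export interface TextQuoteSelector {
  kind: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}
export interface TextPositionSelector {
  kind: "TextPositionSelector";
  start: number;
  end: number;
}
export type Selector = TextQuoteSelector | TextPositionSelector;

// Exhaustive switch: adding a new selector kind becomes a compile error here.
export function describeSelector(s: Selector): string {
  switch (s.kind) {
    case "TextQuoteSelector":
      return `quote:${s.exact}`;
    case "TextPositionSelector":
      return `pos:${s.start}-${s.end}`;
    default: {
      const unreachable: never = s;
      return unreachable;
    }
  }
}
```

Usage would look like `const id = newId<DocumentId>("doc")`; passing that value where an `AnnotationId` is expected then fails type-checking.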
+ +Add JSDoc on each type pointing at the §-reference in +`wiki/SharedContracts.md` it implements. + +--- + +## T02 — PDF viewer adapter spike + +```task +id: CE-WP-0002-T02 +priority: critical +status: todo +depends_on: [T01] +``` + +**This is the architectural spike.** Build a throwaway +`src/anchor/pdf-viewer-adapter-spike.tsx` that: + +1. Loads `fixtures/pdfs/simple.pdf` using `react-pdf-highlighter-plus` + (assumed; if a better library appears, document it in ADR-0004 before + committing). +2. Lets the user select text and produces selectors per `T01` shapes. +3. Serializes the selectors to a JSON blob in `localStorage`. +4. On reload, reads the blob, asks the adapter to resolve, scrolls to the + passage, and renders a highlight. + +Success criteria: +- Reload-and-resolve works for all fixture PDFs. +- No PDF.js or `react-pdf-highlighter-plus` types appear in any file under + `src/shared/` or `src/engine/`. +- The adapter's public surface matches the contract in + `wiki/SharedContracts.md` §5. + +If success criteria fail: stop. Write a short note in +`docs/decisions/ADR-0004-pdf-viewer-library.md` describing the failure mode +and proposed alternative. Do not proceed with T03+. + +--- + +## T03 — `src/source/`: PDF ingest, fingerprint, canonical text + +```task +id: CE-WP-0002-T03 +priority: high +status: todo +depends_on: [T02] +``` + +Implement under `src/source/pdf/`: + +- `ingest.ts` — `ingestPdf(file: File | Buffer): Promise<{ document: Document; representation: DocumentRepresentation }>` +- `fingerprint.ts` — stable SHA-256 of bytes +- `extract.ts` — uses PDF.js to extract page text; runs `normalize()` from + T04 of WP-0001 over the canonical text; builds the `PageMap` and + `OffsetMap` per `Document.DocumentRepresentation` + +Tests use the fixture corpus from `CE-WP-0001-T05`. For each fixture, +extracted canonical text must contain the manifest's known-good quote. 
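
The "stable SHA-256 of bytes" in `fingerprint.ts` can be sketched in a few lines. This version assumes a Node runtime and uses `createHash` from `node:crypto`; a browser build would more likely go through `crypto.subtle.digest`, which is asynchronous. The function name `fingerprint` and the hex encoding are assumptions, not part of the workplan.

```ts
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 over the raw PDF bytes; identical bytes give an identical fingerprint.
export function fingerprint(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}
```

Hashing the raw bytes rather than the extracted text keeps the fingerprint independent of the normalization version.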
+ +--- + +## T04 — `src/anchor/`: TextQuote and TextPosition resolution + +```task +id: CE-WP-0002-T04 +priority: high +status: todo +depends_on: [T01, T03] +``` + +Implement under `src/anchor/`: + +- `selectors/create.ts` — given a `SelectionCapture` from the adapter, build + the maximal set of available selectors (always `TextQuoteSelector` with + prefix/suffix; `TextPositionSelector` when the representation provides + offsets; PDF rect/text selectors when on PDF) +- `selectors/resolve.ts` — implements the resolution strategy from + `wiki/ArchitectureOverview.md` §7 (try position, verify quote, fall back + through quote+prefix/suffix, return `AnchorResolution`) +- `selectors/types.ts` — `AnchorResolution`, `SelectionCapture`, + `ResolvedAnchorTarget` + +Fuzzy matching is out of scope here — return `unresolved` if exact+prefix/suffix +fails. Fuzzy is a later workplan. + +Unit tests using fixtures: for each fixture+known-quote pair, create +selectors then immediately resolve them; resolution must succeed with +confidence ≥ 0.9. + +--- + +## T05 — In-memory repositories + engine services + +```task +id: CE-WP-0002-T05 +priority: high +status: todo +depends_on: [T01] +``` + +Under `src/engine/`: + +- `repos/in-memory.ts` — `Map`-backed implementations of + `DocumentRepository`, `AnnotationRepository`, `EvidenceItemRepository` +- `services/documents.ts`, `services/annotations.ts`, `services/evidence.ts` + — thin orchestration layer that creates IDs, calls repos, and emits the + events from `wiki/SharedContracts.md` §4 +- `events/bus.ts` — minimal pub/sub. Synchronous for MVP. + +No persistence to disk yet. ADR-0005 (persistence) is still pending. 
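
For the "minimal pub/sub, synchronous for MVP" bus in `events/bus.ts`, something like the following would probably be enough. It is a sketch under assumptions: the `EventBus` class name, the `subscribe`/`publish` method names, and the `AnnotationCreated` event used in the usage note are hypothetical, since the real event list lives in `wiki/SharedContracts.md` §4.

```ts
type Handler<E> = (event: E) => void;

// Synchronous pub/sub keyed by the event's `type` field.
export class EventBus<E extends { type: string }> {
  private handlers = new Map<string, Set<Handler<E>>>();

  // Returns an unsubscribe function.
  subscribe(type: E["type"], handler: Handler<E>): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler<E>>();
    set.add(handler);
    this.handlers.set(type, set);
    return () => {
      set.delete(handler);
    };
  }

  publish(event: E): void {
    // Copy first so a handler that unsubscribes mid-dispatch does not disturb iteration.
    for (const handler of [...(this.handlers.get(event.type) ?? [])]) {
      handler(event);
    }
  }
}
```

A service would call `bus.publish({ type: "AnnotationCreated", annotationId })` right after the repository write, so subscribers such as the sidebar see the change within the same tick.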
+ +--- + +## T06 — `src/work/`: collection list + viewer shell + sidebar + +```task +id: CE-WP-0002-T06 +priority: high +status: todo +depends_on: [T02, T05] +``` + +Under `src/work/` and `src/app/`: + +- `src/app/App.tsx` — three-pane layout per Architecture §12.1: collection + list (left), viewer (centre), evidence sidebar (right) +- `src/work/CollectionList.tsx` — lists `fixtures/pdfs/manifest.json` + entries; click to load +- `src/work/ViewerShell.tsx` — hosts the viewer adapter from T02 wrapped + cleanly; viewer adapter API is the only surface `work/` uses +- `src/work/EvidenceSidebar.tsx` — lists evidence items for the current + document, shows quote + commentary + status + +No styling beyond minimum legibility. CSS in Tailwind or vanilla — pick one, +note in ADR-0001 if it wasn't already. + +--- + +## T07 — Annotation create flow + +```task +id: CE-WP-0002-T07 +priority: high +status: todo +depends_on: [T04, T05, T06] +``` + +Wire selection → annotation → evidence item: + +1. User selects text in the viewer. +2. A small toolbar appears with a comment input + Save button. +3. On Save: adapter produces `SelectionCapture` → anchor creates `Selector[]` + → engine creates `Annotation` → engine creates `EvidenceItem` with the + commentary → sidebar updates. + +Active state lives in a single React context for now; no Redux/Zustand. + +--- + +## T08 — Click-to-reopen flow + +```task +id: CE-WP-0002-T08 +priority: critical +status: todo +depends_on: [T04, T06, T07] +``` + +Implement the round trip: + +1. User clicks an evidence item in the sidebar. +2. Engine loads the annotation → anchor resolves selectors against the + current representation → adapter scrolls to and highlights the target. + +Critically, this must also work **after a page reload**. Persistence to +`localStorage` is acceptable for MVP (decide explicitly in +`ADR-0005-persistence.md` that we are deferring real persistence). 
+ +--- + +## T09 — End-to-end test of PRD scenario steps 1-4 + +```task +id: CE-WP-0002-T09 +priority: high +status: todo +depends_on: [T07, T08] +``` + +Write a Playwright (or similar) E2E test that: + +1. Opens the app. +2. Picks `simple.pdf`. +3. Programmatically selects the known-good quote from the manifest. +4. Saves an evidence item with a comment. +5. Verifies the item appears in the sidebar. +6. Reloads the page. +7. Clicks the evidence item. +8. Verifies the highlight is rendered on the expected page. + +This is the contract for "MVP slice 1 works". If it passes, CE-WP-0003 may +begin. diff --git a/workplans/CE-WP-0003-form-binding-visual-guide.md b/workplans/CE-WP-0003-form-binding-visual-guide.md new file mode 100644 index 0000000..1000f1b --- /dev/null +++ b/workplans/CE-WP-0003-form-binding-visual-guide.md @@ -0,0 +1,246 @@ +--- +id: CE-WP-0003 +type: workplan +title: "Form binding + visual guide — EvidenceLink, rect registry, SVG overlay" +domain: citation_evidence +repo: citation-evidence +repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6 +status: todo +owner: Bernd +created: 2026-05-24 +updated: 2026-05-24 +depends_on_workplan: CE-WP-0002 +spec_refs: + - wiki/ProductRequirementsDocument.md + - wiki/ArchitectureOverview.md + - wiki/SharedContracts.md +--- + +# CE-WP-0003 — Form Binding + Visual Guide + +Build the evidence-backed form mode and the SVG visual guide overlay. +After this workplan, a user can: + +1. Open a form next to the document viewer. +2. Drag (or click-to-link) an evidence item from the sidebar onto a form + field. +3. Click a form field → its linked evidence items appear → the active + evidence's source passage is scrolled into view and highlighted → an SVG + guide visually connects the field, the evidence card, and the highlight. +4. Cycle through multiple evidence items on the same field. + +This is the workplan that stress-tests the rect-registry contract from +`wiki/SharedContracts.md` §7. 
The form, the evidence card, and the viewer's +highlight all need to publish rects to a single overlay that re-renders on +scroll/resize/focus. + +## Dependency Order + +``` +T01 (EvidenceLink + EvidenceSet types + relation/status enums) + └─ T02 (binding service + in-memory link repo + active-state machine) + └─ T03 (rect registry — the contract from SharedContracts.md §7) + └─ T04 (form schema + simple field renderer) + └─ T05 (side-by-side layout + drag-or-click to link) + └─ T06 (active-evidence cycling on a field) + └─ T07 (SVG visual guide overlay) + └─ T08 (E2E test of PRD scenario steps 5-9) +``` + +--- + +## T01 — `EvidenceLink` + `EvidenceSet` types + +```task +id: CE-WP-0003-T01 +priority: critical +status: todo +``` + +Add under `src/shared/`: + +- `src/shared/evidence-link.ts` — `EvidenceLink`, `EvidenceLink.status` + enum per SharedContracts §2.4, `EvidenceLink.relation` enum per §2.5, + `EvidenceTarget` generic shape +- `src/shared/evidence-set.ts` — `EvidenceSet` with `activeEvidenceItemId` + +No services. Pure shapes. + +Add a unit test asserting that the union of all enum values matches the +`SharedContracts.md` lists exactly — if someone adds a value without +updating the doc, the test fails. + +--- + +## T02 — Binding service + in-memory link repo + active-state machine + +```task +id: CE-WP-0003-T02 +priority: high +status: todo +depends_on: [T01] +``` + +Under `src/binder/`: + +- `repos/in-memory-links.ts` — Map-backed `EvidenceLinkRepository` +- `services/bindings.ts` — `linkEvidenceToTarget`, `unlinkEvidence`, + `listEvidenceForTarget`, `setActiveEvidence` +- `state/active.ts` — a small machine tracking + `(activeTarget, activeEvidenceItem, activeAnnotation)`. Exposed as a React + context. + +Emit the events from SharedContracts §4 (`EvidenceLinkCreated`, +`EvidenceItemActivated`, `FormFieldActivated`). 
+ +--- + +## T03 — Rect registry (the SharedContracts §7 contract) + +```task +id: CE-WP-0003-T03 +priority: critical +status: todo +depends_on: [T02] +``` + +Implement under `src/binder/visual-guide/`: + +- `rect-registry.ts` — `RectRegistry` with `register`, `getRect`, + `subscribe` per SharedContracts §7 +- `react-hooks.ts` — `useRegisterRect(kind, id, ref)` for components to + register a ref-derived rect +- `events.ts` — registry emits `rect-changed` events on + scroll/resize/focus/active-evidence-change (use ResizeObserver + + IntersectionObserver + window resize + window scroll listeners) + +Unit tests: register a fake field, evidence card, and highlight; mutate +their bounding rects; assert subscribers fire with the new rects. + +**This contract must not change after T03.** Three subsystems will depend on +it in T05/T06/T07. + +--- + +## T04 — Form schema + simple field renderer + +```task +id: CE-WP-0003-T04 +priority: medium +status: todo +depends_on: [T01] +``` + +A deliberately minimal form schema lives in `src/app/forms/demo-schema.ts`: + +```ts +type FormFieldSchema = + | { type: "text"; id: string; label: string } + | { type: "textarea"; id: string; label: string } + | { type: "date"; id: string; label: string }; +``` + +JSON Schema is **not** used yet — defer that to a later ADR. The MVP form +just needs to render 3-4 fields and accept evidence links. 
+ +- `src/work/FormRenderer.tsx` renders the schema as a basic form +- Each field registers itself with the rect registry as kind `"field"` with + the field's `id` + +--- + +## T05 — Side-by-side layout + link evidence to field + +```task +id: CE-WP-0003-T05 +priority: high +status: todo +depends_on: [T02, T04] +``` + +A new app route `/forms/demo` shows the side-by-side layout from Architecture +§12.2: + +- Left: `FormRenderer` with a demo schema (3 fields) +- Right: viewer (reusing `ViewerShell` from CE-WP-0002) +- Bottom strip or popover: evidence list + +Linking interaction: click an evidence item, then click a field → link +created. (Drag-and-drop is a polish item, not MVP.) Visual indication on +linked fields (e.g. a chip showing the count of linked evidence items). + +--- + +## T06 — Active-evidence cycling on a field + +```task +id: CE-WP-0003-T06 +priority: high +status: todo +depends_on: [T05] +``` + +When a field is focused: + +1. Binder loads the field's evidence set. +2. The first evidence item becomes active. +3. The viewer scrolls to and highlights its annotation. +4. Keyboard `Tab`/`Shift-Tab` within the field's evidence chips cycles + active evidence; viewer scrolls accordingly. +5. The evidence sidebar highlights the active evidence card. + +Each evidence card registers itself with the rect registry as +`"evidence-card"`. + +--- + +## T07 — SVG visual guide overlay + +```task +id: CE-WP-0003-T07 +priority: high +status: todo +depends_on: [T03, T06] +``` + +Implement `src/binder/visual-guide/Overlay.tsx`: + +- Single absolutely-positioned SVG covering the viewport +- Subscribes to the rect registry +- On every change, redraws two curves: `field → evidence-card` and + `evidence-card → highlight` +- Active-only — only the currently active triple gets drawn +- Throttled to animation frames + +Acceptance: scroll the viewer, resize the window, change active evidence — +the guide tracks every change without visible lag. 
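
The curve drawing in the overlay can be kept as a pure function of two rects, which makes it easy to unit-test separately from the registry. The sketch below is one possible shape, assuming rects in viewport coordinates and a cubic Bézier from the right edge of the source to the left edge of the target; `Rect` and `guidePath` are illustrative names, not part of the SharedContracts §7 contract.

```ts
export interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Builds the SVG path `d` attribute for a curve from `from` to `to`.
export function guidePath(from: Rect, to: Rect): string {
  const x1 = from.x + from.width; // right edge of the source rect
  const y1 = from.y + from.height / 2; // vertical midpoint of the source
  const x2 = to.x; // left edge of the target rect
  const y2 = to.y + to.height / 2; // vertical midpoint of the target
  const mx = (x1 + x2) / 2; // control points sit halfway between the two edges
  return `M ${x1} ${y1} C ${mx} ${y1}, ${mx} ${y2}, ${x2} ${y2}`;
}
```

The overlay would call this twice per redraw, once for `field → evidence-card` and once for `evidence-card → highlight`, inside a `requestAnimationFrame` callback.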
+ +The viewer adapter from CE-WP-0002 must expose +`getHighlightClientRects(annotationId)` so the highlight's rect can be +registered. + +--- + +## T08 — E2E test of PRD scenario steps 5-9 + +```task +id: CE-WP-0003-T08 +priority: high +status: todo +depends_on: [T05, T07] +``` + +Extend the Playwright E2E from CE-WP-0002-T09: + +5. Navigate to `/forms/demo`. +6. Link the previously-created evidence item to the "summary" field. +7. Click the "summary" field. +8. Assert the field, the evidence card, and the highlight all have an + `aria-current="true"` (or equivalent active marker). +9. Assert the SVG overlay contains exactly two `` elements (one + field→card, one card→highlight). +10. Scroll the viewer; assert the SVG paths' endpoints update within the + next animation frame. + +If this passes, the form-binding slice is complete and CE-WP-0004 may run +in parallel with any deferred polish work. diff --git a/workplans/CE-WP-0004-citation-card-export.md b/workplans/CE-WP-0004-citation-card-export.md new file mode 100644 index 0000000..7399466 --- /dev/null +++ b/workplans/CE-WP-0004-citation-card-export.md @@ -0,0 +1,164 @@ +--- +id: CE-WP-0004 +type: workplan +title: "Citation card export — Markdown and HTML renderers, sidebar export" +domain: citation_evidence +repo: citation-evidence +repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6 +status: todo +owner: Bernd +created: 2026-05-24 +updated: 2026-05-24 +depends_on_workplan: CE-WP-0002 +spec_refs: + - wiki/ProductRequirementsDocument.md + - wiki/ArchitectureOverview.md + - wiki/SharedContracts.md +--- + +# CE-WP-0004 — Citation Card Export + +The final step of the MVP scenario: turn an evidence item into a portable +Markdown or HTML citation card. + +After this workplan, a user can: + +1. Click "Export" on an evidence item in the sidebar. +2. Choose Markdown or HTML. +3. Get a clipboard-ready citation card with quote, source label, + commentary, and a link back to source context. 
+ +This workplan can run in parallel with CE-WP-0003 once CE-WP-0002 is done — +it touches different code paths. + +## Dependency Order + +``` +T01 (CitationCard type + open-context URL convention) + └─ T02 (Markdown renderer) + └─ T03 (HTML renderer) + └─ T04 (sidebar Export button + copy-to-clipboard) + └─ T05 (E2E test of PRD scenario step 10) +``` + +--- + +## T01 — `CitationCard` type + open-context URL convention + +```task +id: CE-WP-0004-T01 +priority: high +status: todo +``` + +Under `src/shared/`: + +- `src/shared/citation-card.ts` — `CitationCard` per Architecture §4.7 +- `src/shared/open-context-url.ts` — function `openContextUrl(annotationId)` + returning a URL of the form + `/viewer?document=&annotation=` (per Architecture §14.3) + +The URL is the deep link that an exported card uses to reopen the source +context in this MVP. When persistence becomes real (post-MVP), the URL +scheme stays the same. + +--- + +## T02 — Markdown citation card renderer + +```task +id: CE-WP-0004-T02 +priority: high +status: todo +depends_on: [T01] +``` + +Under `src/engine/rendering/`: + +- `markdown.ts` — `renderCitationCardMarkdown(evidenceItem, document, annotation): string` + +Output format (lock this in `docs/decisions/ADR-0007-citation-card-format.md`): + +```markdown +> {quote} + +— *{sourceLabel}* · [Open source]({openContextUrl}) + +{commentary} +``` + +Where `sourceLabel` is `document.title` if present, else the filename, else +the document URI. + +Unit tests: snapshot a few rendered cards against fixtures. + +--- + +## T03 — HTML citation card renderer + +```task +id: CE-WP-0004-T03 +priority: high +status: todo +depends_on: [T01] +``` + +Under `src/engine/rendering/`: + +- `html.ts` — `renderCitationCardHtml(evidenceItem, document, annotation): string` + +Output: a single `