# Shared Contracts — citation-evidence

This document is the **single source of truth** for everything that more than
one subsystem in the citation-evidence ecosystem must agree on:

- the **vocabulary** (entity names and what they mean),
- the **canonical state enums** for entities that flow across repo boundaries,
- the **relation type** vocabulary,
- the **selector type** taxonomy,
- the **event type** vocabulary,
- the **ownership rules** for shared types versus shared behavior.

The five sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`,
`citation-work`, `evidence-binder`) defer to this document. When their
`INTENT.md` files refer to "shared contracts", they mean this file.

During the umbrella-first MVP phase, the **TypeScript implementations** of these
contracts live in `citation-evidence/src/shared/` and are imported by the
per-subsystem code under `citation-evidence/src/{engine,anchor,source,work,binder}/`.
When a subsystem extracts to its own repo, it takes its slice of the shared
types with it — but this document remains the canonical vocabulary.

---

## 1. Vocabulary

These nine entities are the vocabulary every subsystem uses.

| Entity                    | One-line definition                                                                                | Owner (post-extraction)                                  |
|---------------------------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| `Document`                | An identified source object: PDF, Markdown, HTML, scan, etc.                                       | `citation-engine`                                        |
| `DocumentRepresentation`  | A normalized, addressable view of a document (canonical text, page map, structure).                | `citation-engine`                                        |
| `Selector`                | A technical locator for a passage inside a representation.                                         | `citation-engine` (types) / `evidence-anchor` (behavior) |
| `Annotation`              | A technical mark on a document range, expressed as one or more selectors plus quote text.          | `citation-engine`                                        |
| `EvidenceItem`            | A meaningful evidence object built from one or more annotations, with commentary and status.       | `citation-engine`                                        |
| `EvidenceSet`             | An ordered group of evidence items associated with a target or topic.                              | `citation-engine` (type) / `evidence-binder` (behavior)  |
| `EvidenceLink`            | A relation between an `EvidenceItem` and a structured target (form field, claim, requirement, …).  | `citation-engine` (type) / `evidence-binder` (behavior)  |
| `CitationCard`            | A renderable, exportable presentation of an evidence item.                                         | `citation-engine`                                        |
| `CitationRecoveryAttempt` | A traceable attempt to locate a cited passage from an external clue.                               | `citation-engine` (type) / `evidence-source` (behavior)  |

**Ownership rule:** *types and interfaces flow downward from `citation-engine`;
behavior flows upward into the specialised repos*. Where the table shows a
split, the engine repo holds the data shape and the other repo holds the
algorithms and lifecycle.

---

## 2. Canonical state enums

These enums are the authoritative values. Subsystems must not invent local
variants without updating this document first.

### 2.1 `Annotation.resolutionStatus`

```
resolved    — selectors located the passage with high confidence
ambiguous   — multiple plausible candidates found
unresolved  — no plausible candidate found
stale       — representation has changed since selectors were stored
```

### 2.2 `EvidenceItem.status`

```
candidate    — captured but not yet vetted
confirmed    — verified by a user as useful evidence
rejected     — explicitly discarded
needs-check  — flagged for review
```

> **Note:** earlier subsystem drafts introduced `strong-support`, `weak-support`,
> and `contradicts` on the item. Those concepts now live on the **link**, not
> the item — see §2.4 and §2.5.

### 2.3 `Document.reviewStatus` (when used by `citation-work`)

```
unreviewed
in-review
relevant
rejected
needs-follow-up
cited
verified
```

`citation-work` may treat any of these as the active state; the canonical
storage lives on the Document record in `citation-engine`.
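As a rough illustration only, the three enums in §2.1 to §2.3 could be expressed in TypeScript as `const` arrays with derived union types, so that the listed values exist both at compile time and at runtime. Every identifier below (`ANNOTATION_RESOLUTION_STATUSES`, `isEvidenceItemStatus`, and so on) is hypothetical and is not taken from `citation-evidence/src/shared/`.

```typescript
// Hypothetical sketch: these names are illustrative, not the real shared exports.
const ANNOTATION_RESOLUTION_STATUSES = [
  "resolved", "ambiguous", "unresolved", "stale",
] as const;
type AnnotationResolutionStatus = (typeof ANNOTATION_RESOLUTION_STATUSES)[number];

const EVIDENCE_ITEM_STATUSES = [
  "candidate", "confirmed", "rejected", "needs-check",
] as const;
type EvidenceItemStatus = (typeof EVIDENCE_ITEM_STATUSES)[number];

const DOCUMENT_REVIEW_STATUSES = [
  "unreviewed", "in-review", "relevant", "rejected",
  "needs-follow-up", "cited", "verified",
] as const;
type DocumentReviewStatus = (typeof DOCUMENT_REVIEW_STATUSES)[number];

// Runtime guard for values arriving from storage or from another subsystem.
function isEvidenceItemStatus(value: unknown): value is EvidenceItemStatus {
  return (
    typeof value === "string" &&
    (EVIDENCE_ITEM_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}
```

With a guard like this, a retired value such as `strong-support` would be rejected at the boundary rather than silently stored on an item.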
### 2.4 `EvidenceLink.status` (per target)

```
no-evidence
candidate
confirmed
conflicting
insufficient
verified
```

`no-evidence` is a *derived* state computed when a target has zero links; it is
not stored on a link itself.

### 2.5 `EvidenceLink.relation`

```
supports
contradicts
explains
qualifies
source-for
context-for
```

This is the closed vocabulary for the MVP. Adding a relation requires updating
this document and the `EvidenceLink` schema together.

### 2.6 `CitationRecoveryAttempt.state`

```
created
source-found-fulltext
source-found-preview-only
source-found-metadata-only
source-not-found
quote-found
quote-not-found
candidate-passages-found
manual-confirmation-needed
confirmed
annotation-created
failed
```

---

## 3. Selector taxonomy

A `Selector` is a discriminated union of:

```
TextQuoteSelector      exact quote + prefix/suffix context
TextPositionSelector   canonical text start/end offsets
PdfRectSelector        page number + normalized page rectangles
PdfPageTextSelector    page number + page-local text offsets
DomRangeSelector       DOM path + range offsets (HTML/Markdown)
StructuralSelector     heading/section/AST path
FragmentSelector       exported fragment / deep link (export-only)
```

**Selector redundancy rule:** when an annotation is created, the system stores
*all selector types that are available* for that document representation, not
just one. Resolution tries them in order of expected confidence and stops at
the first high-confidence match.

W3C Web Annotation mapping uses these same concepts but as JSON-LD; the mapping
is documented separately (see ADR-0003 — pending).

---

## 4. Event vocabulary

Events are the primary integration mechanism between subsystems.
The closed event vocabulary for the MVP is:

```
DocumentImported
DocumentRepresentationGenerated
DocumentRemoved
AnnotationCreated
AnnotationResolved
AnnotationResolutionFailed
EvidenceItemCreated
EvidenceItemUpdated
EvidenceLinkCreated
EvidenceLinkUpdated
EvidenceItemActivated
FormFieldActivated
CitationCardRendered
CitationRecoveryStarted
CitationRecoveryCandidateFound
CitationRecoveryConfirmed
SessionCreated
SessionRenamed
SessionDeleted
SessionActivated
```

The `Session*` events live on the cross-session session bus (the
SessionService's own EventBus instance — see CE-WP-0005). The remaining events
live on the per-session engine bus and are scoped to whatever session is
currently active.

Subsystems must emit these events through a shared event bus owned by
`citation-engine`. Subsystems may listen to any event but must not invent event
types without updating this document.

---

## 5. Viewer adapter contract

Viewer adapters are the bridge between a document format and the rest of the
system. They are **owned by `evidence-anchor`** as far as the contract goes;
concrete adapters may live in either `evidence-anchor` or `evidence-source`
depending on whether the heavy lifting is selector logic or document
representation logic.

```ts
interface DocumentViewerAdapter {
  mediaTypes: string[];
  // NOTE: Promise type arguments are inferred from usage and are not yet normative.
  load(document: Document, representation?: DocumentRepresentation): Promise<void>;
  getCurrentSelection(): Promise<SelectionCapture | null>;
  createSelectorsFromSelection(selection: SelectionCapture): Promise<Selector[]>;
  resolveSelectors(selectors: Selector[]): Promise<ResolvedAnchorTarget | null>;
  scrollToResolvedTarget(
    target: ResolvedAnchorTarget,
    opts?: { center?: boolean; behavior?: "auto" | "smooth" },
  ): Promise<void>;
  renderHighlight(target: ResolvedAnchorTarget, opts?: HighlightRenderOptions): Promise<void>;
  getHighlightClientRects(annotationId: string): Promise<DOMRect[]>;
}
```

MVP delivers a single `PDFViewerAdapter`. HTML and Markdown adapters are
deferred.

---

## 6. Canonical text normalization

All text-based selectors and quote matching depend on a deterministic
normalization function. The MVP normalization is:

1. Unicode NFC normalization.
2. Replace all line-ending sequences with `\n`.
3. Collapse runs of horizontal whitespace into a single space.
4. Strip soft hyphens (U+00AD).
5. Preserve paragraph boundaries (double `\n`).

**This function is versioned.** Stored selectors record the normalization
version they were created against. Changing the function later requires either
backwards-compatible behavior or a re-anchoring migration.

The reference implementation lives in
`citation-evidence/src/shared/text/normalize.ts`.

---

## 7. Visual guide rect registry

The visual-guide overlay (form field → evidence card → source highlight)
requires DOM rects from three independently-rendered subsystems. The contract
is a **rect registry** owned by `evidence-binder`:

```ts
interface RectRegistry {
  register(
    kind: "field" | "evidence-card" | "highlight",
    id: string,
    getRect: () => DOMRect | null,
  ): () => void;
  getRect(kind: "field" | "evidence-card" | "highlight", id: string): DOMRect | null;
  subscribe(listener: (event: RectRegistryEvent) => void): () => void;
}
```

Each renderer (form, evidence sidebar, viewer adapter) registers a `getRect`
callback. The overlay queries on-demand and re-renders on scroll, resize,
focus, and active-evidence change.

This contract MUST be defined and stable before any of the three renderers
hardens, or the overlay becomes the system's coupling bottleneck.

---

## 8. Ownership rules (the short version)

1. **Types and interfaces** flow downward from `citation-engine`.
2. **Behavior and algorithms** live in the specialised repos.
3. Where a concept appears in both a type and a behavior context (e.g.
   `Selector`, `EvidenceLink`, `EvidenceSet`, `CitationRecoveryAttempt`), the
   engine owns the shape and the specialised repo owns the lifecycle.
4. **The shared event bus is engine-owned**; subsystems publish and subscribe
   but do not extend the event vocabulary unilaterally.
5. **No new enum values, relation types, event types, or selector kinds** land
   in code without first appearing in this document.
6. During umbrella-first MVP: rules 1-5 are aspirational. We will tolerate
   small violations in `citation-evidence/src/` and reconcile during
   extraction.

---

## 9. Change process

Changes to this document are changes to the contract.

- Small additions (a new enum value, a new event type) can be made in a single
  PR that updates this doc + the type definitions + at least one consumer.
- Breaking changes (renaming an entity, removing a state, changing an ownership
  split) require a short ADR in `docs/decisions/` and a heads-up progress event
  on the state-hub.

---

## 10. Pending ADRs that will affect this document

These are listed in `docs/decisions/` once written. Until then the document
reflects the current best understanding from the architecture overview.

- **ADR-0001** — Umbrella-first MVP strategy (decided 2026-05-24, this session).
- **ADR-0002** — Monorepo vs polyrepo packaging (pending).
- **ADR-0003** — W3C Web Annotation: lossy mapping vs round-trip guarantee (pending).
- **ADR-0004** — PDF viewer library choice: `react-pdf-highlighter-plus` vs PDF.js direct (pending).
- **ADR-0005** — Persistence: local-first SQLite vs Postgres from day one (pending).
- **ADR-0006** — Selector ownership split (types in engine, algorithms in anchor) (pending — implied here).
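
Returning to the normalization steps listed in §6, the following is a minimal sketch of how they might compose, and is not the reference implementation in `citation-evidence/src/shared/text/normalize.ts`. The names `normalizeText` and `NORMALIZATION_VERSION` are hypothetical, and two details are assumptions: "line-ending sequences" is read as `\r\n` and lone `\r`, and "horizontal whitespace" is read as spaces and tabs.

```typescript
// Hypothetical sketch of the §6 steps; names and version number are assumed.
const NORMALIZATION_VERSION = 1;

function normalizeText(input: string): string {
  return input
    .normalize("NFC")            // 1. Unicode NFC normalization
    .replace(/\r\n|\r/g, "\n")   // 2. line endings -> "\n" (assumes CRLF and CR)
    .replace(/[ \t]+/g, " ")     // 3. collapse runs of spaces/tabs (assumed set)
    .replace(/\u00AD/g, "");     // 4. strip soft hyphens
  // 5. Paragraph boundaries ("\n\n") survive because no step touches "\n" runs.
}
```

A stored selector could then carry `NORMALIZATION_VERSION` next to its offsets, so that a later change to the function can be detected and handled by a re-anchoring migration.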