# Shared Contracts — citation-evidence

This document is the **single source of truth** for everything that more than one
subsystem in the citation-evidence ecosystem must agree on:

- the **vocabulary** (entity names and what they mean),
- the **canonical state enums** for entities that flow across repo boundaries,
- the **relation type** vocabulary,
- the **selector type** taxonomy,
- the **event type** vocabulary,
- the **ownership rules** for shared types versus shared behavior.

The five sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`,
`citation-work`, `evidence-binder`) defer to this document. When their
`INTENT.md` files refer to "shared contracts", they mean this file.

During the umbrella-first MVP phase, the **TypeScript implementations** of
these contracts live in `citation-evidence/src/shared/` and are imported by
the per-subsystem code under `citation-evidence/src/{engine,anchor,source,work,binder}/`.
When a subsystem extracts to its own repo, it takes its slice of the shared
types with it — but this document remains the canonical vocabulary.

---

## 1. Vocabulary

These nine entities are the vocabulary every subsystem uses.

| Entity | One-line definition | Owner (post-extraction) |
|---------------------------|----------------------------------------------------------------------------------------------------|-------------------------|
| `Document` | An identified source object: PDF, Markdown, HTML, scan, etc. | `citation-engine` |
| `DocumentRepresentation` | A normalized, addressable view of a document (canonical text, page map, structure). | `citation-engine` |
| `Selector` | A technical locator for a passage inside a representation. | `citation-engine` (types) / `evidence-anchor` (behavior) |
| `Annotation` | A technical mark on a document range, expressed as one or more selectors plus quote text. | `citation-engine` |
| `EvidenceItem` | A meaningful evidence object built from one or more annotations, with commentary and status. | `citation-engine` |
| `EvidenceSet` | An ordered group of evidence items associated with a target or topic. | `citation-engine` (type) / `evidence-binder` (behavior) |
| `EvidenceLink` | A relation between an `EvidenceItem` and a structured target (form field, claim, requirement, …). | `citation-engine` (type) / `evidence-binder` (behavior) |
| `CitationCard` | A renderable, exportable presentation of an evidence item. | `citation-engine` |
| `CitationRecoveryAttempt` | A traceable attempt to locate a cited passage from an external clue. | `citation-engine` (type) / `evidence-source` (behavior) |

**Ownership rule:** *types and interfaces flow downward from `citation-engine`;
behavior flows upward into the specialised repos*. Where the table shows a
split, the engine repo holds the data shape and the other repo holds the
algorithms and lifecycle.

---

## 2. Canonical state enums

These enums are the authoritative values. Subsystems must not invent local
variants without updating this document first.

### 2.1 `Annotation.resolutionStatus`

```
resolved    — selectors located the passage with high confidence
ambiguous   — multiple plausible candidates found
unresolved  — no plausible candidate found
stale       — representation has changed since selectors were stored
```

### 2.2 `EvidenceItem.status`

```
candidate    — captured but not yet vetted
confirmed    — verified by a user as useful evidence
rejected     — explicitly discarded
needs-check  — flagged for review
```

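
One way to keep local variants out of code is to derive both the TypeScript type and a runtime guard from a single list of the values above. The sketch below does this for `EvidenceItem.status`. The names `EVIDENCE_ITEM_STATUSES`, `isEvidenceItemStatus`, and `describeStatus` are illustrative assumptions, not identifiers from the shared package.

```ts
// Hypothetical names; only the four status values come from this document.
const EVIDENCE_ITEM_STATUSES = ["candidate", "confirmed", "rejected", "needs-check"] as const;

type EvidenceItemStatus = (typeof EVIDENCE_ITEM_STATUSES)[number];

// Runtime guard for values arriving from storage, events, or other repos.
function isEvidenceItemStatus(value: unknown): value is EvidenceItemStatus {
  return (
    typeof value === "string" &&
    (EVIDENCE_ITEM_STATUSES as readonly string[]).indexOf(value) !== -1
  );
}

// Exhaustive switch: a status added to the list but not handled here
// fails to compile at the `never` assignment.
function describeStatus(status: EvidenceItemStatus): string {
  switch (status) {
    case "candidate":
      return "captured but not yet vetted";
    case "confirmed":
      return "verified by a user as useful evidence";
    case "rejected":
      return "explicitly discarded";
    case "needs-check":
      return "flagged for review";
    default: {
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```

The same pattern applies to the other enums in this section: one `as const` list per enum, with the type and the guard both derived from it.
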
> **Note:** earlier subsystem drafts introduced `strong-support`, `weak-support`,
> and `contradicts` on the item. Those concepts now live on the **link**, not
> the item — see §2.4.

### 2.3 `Document.reviewStatus` (when used by `citation-work`)

```
unreviewed
in-review
relevant
rejected
needs-follow-up
cited
verified
```

`citation-work` may treat any of these as the active state; the canonical
storage lives on the Document record in `citation-engine`.

### 2.4 `EvidenceLink.status` (per target)

```
no-evidence
candidate
confirmed
conflicting
insufficient
verified
```

`no-evidence` is a *derived* state computed when a target has zero links;
it is not stored on a link itself.

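
Because `no-evidence` is derived rather than stored, the stored status type can exclude it and the value can be produced at read time. A minimal sketch, assuming a hypothetical `EvidenceLink` shape with `id`, `targetId`, and `status` fields (this document does not fix the field names). How several stored statuses on one target should be combined into a single value is left open here, so the function simply returns all of them.

```ts
type EvidenceLinkStatus =
  | "no-evidence"
  | "candidate"
  | "confirmed"
  | "conflicting"
  | "insufficient"
  | "verified";

// `no-evidence` never appears on a stored link.
type StoredLinkStatus = Exclude<EvidenceLinkStatus, "no-evidence">;

// Hypothetical shape, for illustration only.
interface EvidenceLink {
  id: string;
  targetId: string;
  status: StoredLinkStatus;
}

// Returns the status of every link pointing at the target,
// or the derived `no-evidence` marker when there are none.
function statusesForTarget(
  targetId: string,
  links: readonly EvidenceLink[],
): EvidenceLinkStatus[] {
  const stored = links
    .filter((link) => link.targetId === targetId)
    .map((link) => link.status);
  return stored.length === 0 ? ["no-evidence"] : stored;
}
```

With this typing, an attempt to persist a link whose status is `no-evidence` is rejected by the compiler rather than at review time.
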
### 2.5 `EvidenceLink.relation`

```
supports
contradicts
explains
qualifies
source-for
context-for
```

This is the closed vocabulary for the MVP. Adding a relation requires updating
this document and the `EvidenceLink` schema together.

### 2.6 `CitationRecoveryAttempt.state`

```
created
source-found-fulltext
source-found-preview-only
source-found-metadata-only
source-not-found
quote-found
quote-not-found
candidate-passages-found
manual-confirmation-needed
confirmed
annotation-created
failed
```

---

## 3. Selector taxonomy

A `Selector` is a discriminated union of:

```
TextQuoteSelector      exact quote + prefix/suffix context
TextPositionSelector   canonical text start/end offsets
PdfRectSelector        page number + normalized page rectangles
PdfPageTextSelector    page number + page-local text offsets
DomRangeSelector       DOM path + range offsets (HTML/Markdown)
StructuralSelector     heading/section/AST path
FragmentSelector       exported fragment / deep link (export-only)
```

**Selector redundancy rule:** when an annotation is created, the system stores
*all selector types that are available* for that document representation, not
just one. Resolution tries them in order of expected confidence and stops at
the first high-confidence match.

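
The redundancy rule can be sketched as a loop over the stored selectors in preference order. The order itself, the numeric confidence score, and the threshold are assumptions passed in by the caller, since this document does not fix them. `tryResolve` is a hypothetical stand-in for whatever per-selector resolution `evidence-anchor` provides.

```ts
type SelectorType =
  | "TextQuoteSelector"
  | "TextPositionSelector"
  | "PdfRectSelector"
  | "PdfPageTextSelector"
  | "DomRangeSelector"
  | "StructuralSelector"
  | "FragmentSelector";

// Only the discriminant is modelled; real selectors carry type-specific fields.
interface StoredSelector {
  type: SelectorType;
}

interface SelectorMatch {
  selector: StoredSelector;
  confidence: number; // hypothetical score, higher is better
}

function resolveInOrder(
  selectors: readonly StoredSelector[],
  order: readonly SelectorType[],
  tryResolve: (selector: StoredSelector) => number | null,
  threshold: number,
): SelectorMatch | null {
  // Selector types missing from `order` are tried last.
  const rank = (type: SelectorType): number => {
    const index = order.indexOf(type);
    return index === -1 ? order.length : index;
  };
  const sorted = [...selectors].sort((a, b) => rank(a.type) - rank(b.type));

  for (const selector of sorted) {
    const confidence = tryResolve(selector);
    if (confidence !== null && confidence >= threshold) {
      return { selector, confidence }; // stop at the first high-confidence match
    }
  }
  return null; // no selector reached the threshold
}
```
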
W3C Web Annotation mapping uses these same concepts but as JSON-LD; the mapping
is documented separately (see ADR-0003 — pending).

---

## 4. Event vocabulary

Events are the primary integration mechanism between subsystems. The closed
event vocabulary for the MVP is:

```
DocumentImported
DocumentRepresentationGenerated
DocumentRemoved
AnnotationCreated
AnnotationResolved
AnnotationResolutionFailed
EvidenceItemCreated
EvidenceItemUpdated
EvidenceLinkCreated
EvidenceLinkUpdated
EvidenceItemActivated
FormFieldActivated
CitationCardRendered
CitationRecoveryStarted
CitationRecoveryCandidateFound
CitationRecoveryConfirmed
SessionCreated
SessionRenamed
SessionDeleted
SessionActivated
```

The `Session*` events live on the cross-session session bus (the
SessionService's own EventBus instance — see CE-WP-0005). The remaining
events live on the per-session engine bus and are scoped to whatever
session is currently active.

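
The split between the two buses can be expressed directly in the types. The sketch below lists the closed vocabulary once and routes each event type to a bus. `busFor`, `isEventType`, and the `"session"` / `"engine"` labels are illustrative assumptions rather than names from the engine package.

```ts
const SESSION_EVENT_TYPES = [
  "SessionCreated",
  "SessionRenamed",
  "SessionDeleted",
  "SessionActivated",
] as const;

const ENGINE_EVENT_TYPES = [
  "DocumentImported",
  "DocumentRepresentationGenerated",
  "DocumentRemoved",
  "AnnotationCreated",
  "AnnotationResolved",
  "AnnotationResolutionFailed",
  "EvidenceItemCreated",
  "EvidenceItemUpdated",
  "EvidenceLinkCreated",
  "EvidenceLinkUpdated",
  "EvidenceItemActivated",
  "FormFieldActivated",
  "CitationCardRendered",
  "CitationRecoveryStarted",
  "CitationRecoveryCandidateFound",
  "CitationRecoveryConfirmed",
] as const;

type SessionEventType = (typeof SESSION_EVENT_TYPES)[number];
type EngineEventType = (typeof ENGINE_EVENT_TYPES)[number];
type EventType = SessionEventType | EngineEventType;

// Hypothetical labels for the cross-session bus and the per-session engine bus.
type BusName = "session" | "engine";

function busFor(type: EventType): BusName {
  return (SESSION_EVENT_TYPES as readonly string[]).indexOf(type) !== -1
    ? "session"
    : "engine";
}

// Guard for event names arriving as plain strings; anything outside the
// closed vocabulary is rejected.
function isEventType(value: string): value is EventType {
  return (
    (SESSION_EVENT_TYPES as readonly string[]).indexOf(value) !== -1 ||
    (ENGINE_EVENT_TYPES as readonly string[]).indexOf(value) !== -1
  );
}
```
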
Subsystems must emit these events through a shared event bus owned by
`citation-engine`. Subsystems may listen to any event but must not invent
event types without updating this document.

---

## 5. Viewer adapter contract

Viewer adapters are the bridge between a document format and the rest of the
system. They are **owned by `evidence-anchor`** as far as the contract goes;
concrete adapters may live in either `evidence-anchor` or `evidence-source`
depending on whether the heavy lifting is selector logic or document
representation logic.

```ts
interface DocumentViewerAdapter {
  mediaTypes: string[];
  load(document: Document, representation?: DocumentRepresentation): Promise<void>;
  getCurrentSelection(): Promise<SelectionCapture | null>;
  createSelectorsFromSelection(selection: SelectionCapture): Promise<Selector[]>;
  resolveSelectors(selectors: Selector[]): Promise<AnchorResolution>;
  scrollToResolvedTarget(target: ResolvedAnchorTarget, opts?: { center?: boolean; behavior?: "auto" | "smooth" }): Promise<void>;
  renderHighlight(target: ResolvedAnchorTarget, opts?: HighlightRenderOptions): Promise<void>;
  getHighlightClientRects(annotationId: string): Promise<DOMRect[]>;
}
```

MVP delivers a single `PDFViewerAdapter`. HTML and Markdown adapters are
deferred.

---

## 6. Canonical text normalization

All text-based selectors and quote matching depend on a deterministic
normalization function. The MVP normalization is:

1. Unicode NFC normalization.
2. Replace all line-ending sequences with `\n`.
3. Collapse runs of horizontal whitespace into a single space.
4. Strip soft hyphens (U+00AD).
5. Preserve paragraph boundaries (double `\n`).

**This function is versioned.** Stored selectors record the normalization
version they were created against. Changing the function later requires either
backwards-compatible behavior or a re-anchoring migration.

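
The five steps can be sketched as a short pipeline. This is an illustration under stated assumptions, not the reference implementation: line endings are taken to mean CRLF and lone CR, horizontal whitespace is taken to mean spaces and tabs, soft hyphens are stripped before whitespace is collapsed so that removing one cannot leave a double space, and the names `NORMALIZATION_VERSION` and `normalizeText` are hypothetical.

```ts
// Hypothetical version tag that stored selectors would record.
const NORMALIZATION_VERSION = 1;

function normalizeText(input: string): string {
  return input
    .normalize("NFC") // step 1: Unicode NFC
    .replace(/\r\n?/g, "\n") // step 2: CRLF and lone CR become \n
    .replace(/\u00AD/g, "") // step 4: strip soft hyphens (done early, by assumption)
    .replace(/[ \t]+/g, " "); // step 3: collapse spaces and tabs only
  // Step 5: no replacement touches "\n", so paragraph boundaries survive.
}

// Pairing the text with the version it was produced under lets a later
// change to the function be detected when selectors are resolved.
function normalizeWithVersion(input: string): { text: string; version: number } {
  return { text: normalizeText(input), version: NORMALIZATION_VERSION };
}
```
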
The reference implementation lives in `citation-evidence/src/shared/text/normalize.ts`.

---

## 7. Visual guide rect registry

The visual-guide overlay (form field → evidence card → source highlight)
requires DOM rects from three independently-rendered subsystems. The contract
is a **rect registry** owned by `evidence-binder`:

```ts
interface RectRegistry {
  register(kind: "field" | "evidence-card" | "highlight", id: string, getRect: () => DOMRect | null): () => void;
  getRect(kind: "field" | "evidence-card" | "highlight", id: string): DOMRect | null;
  subscribe(listener: (event: RectRegistryEvent) => void): () => void;
}
```

Each renderer (form, evidence sidebar, viewer adapter) registers a
`getRect` callback. The overlay queries on-demand and re-renders on scroll,
resize, focus, and active-evidence change.

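
A minimal in-memory sketch of such a registry follows. `RectLike` is a stand-in for `DOMRect` so the example runs outside a browser, and the `RectRegistryEvent` shape is an assumption, since this document does not define it. The class name `InMemoryRectRegistry` is likewise hypothetical.

```ts
type RectKind = "field" | "evidence-card" | "highlight";

// Stand-in for DOMRect so the sketch runs outside a browser.
interface RectLike {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed event shape; the contract leaves RectRegistryEvent undefined.
interface RectRegistryEvent {
  type: "registered" | "unregistered";
  kind: RectKind;
  id: string;
}

class InMemoryRectRegistry {
  private readonly getters = new Map<string, () => RectLike | null>();
  private readonly listeners = new Set<(event: RectRegistryEvent) => void>();

  register(kind: RectKind, id: string, getRect: () => RectLike | null): () => void {
    const key = `${kind}:${id}`;
    this.getters.set(key, getRect);
    this.notify({ type: "registered", kind, id });
    // The returned function unregisters, but only while this callback
    // is still the one stored under the key.
    return () => {
      if (this.getters.get(key) === getRect) {
        this.getters.delete(key);
        this.notify({ type: "unregistered", kind, id });
      }
    };
  }

  // Rects are computed on demand, so callers see the layout at call time.
  getRect(kind: RectKind, id: string): RectLike | null {
    const getter = this.getters.get(`${kind}:${id}`);
    return getter ? getter() : null;
  }

  subscribe(listener: (event: RectRegistryEvent) => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  private notify(event: RectRegistryEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}
```
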
This contract MUST be defined and stable before any of the three renderers
hardens, or the overlay becomes the system's coupling bottleneck.

---

## 8. Ownership rules (the short version)

1. **Types and interfaces** flow downward from `citation-engine`.
2. **Behavior and algorithms** live in the specialised repos.
3. Where a concept appears in both a type and a behavior context (e.g.
   `Selector`, `EvidenceLink`, `EvidenceSet`, `CitationRecoveryAttempt`),
   the engine owns the shape and the specialised repo owns the lifecycle.
4. **The shared event bus is engine-owned**; subsystems publish and subscribe
   but do not extend the event vocabulary unilaterally.
5. **No new enum values, relation types, event types, or selector kinds**
   land in code without first appearing in this document.
6. During the umbrella-first MVP, rules 1-5 are aspirational. We will tolerate
   small violations in `citation-evidence/src/` and reconcile during extraction.

---

## 9. Change process

Changes to this document are changes to the contract.

- Small additions (a new enum value, a new event type) can be made in a single
  PR that updates this doc + the type definitions + at least one consumer.
- Breaking changes (renaming an entity, removing a state, changing an
  ownership split) require a short ADR in `docs/decisions/` and a heads-up
  progress event on the state-hub.

---

## 10. Pending ADRs that will affect this document

These are listed in `docs/decisions/` once written. Until then the document
reflects the current best understanding from the architecture overview.

- **ADR-0001** — Umbrella-first MVP strategy (decided 2026-05-24, this session).
- **ADR-0002** — Monorepo vs polyrepo packaging (pending).
- **ADR-0003** — W3C Web Annotation: lossy mapping vs round-trip guarantee (pending).
- **ADR-0004** — PDF viewer library choice: `react-pdf-highlighter-plus` vs PDF.js direct (pending).
- **ADR-0005** — Persistence: local-first SQLite vs Postgres from day one (pending).
- **ADR-0006** — Selector ownership split (types in engine, algorithms in anchor) (pending — implied here).