Turn the MVP into a self-contained demo. Users now:
1. Land on an empty-state and create a named session.
2. Drag-drop or pick arbitrary PDFs into that session.
3. Annotate, build evidence, link to form fields — all session-scoped.
4. Export the whole session as a single .zip archive (manifest +
per-document PDFs).
5. Import a .zip back — into a new session, or merged into an
existing one (documents deduped by SHA-256 fingerprint;
annotations/evidence/links added additively).
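A minimal sketch of the merge policy in step 5, assuming a simplified snapshot shape. Every type, field, and function name here (SessionSnapshot, mergeSnapshot, newId) is hypothetical; only the policy itself (documents deduped by SHA-256 fingerprint, annotations added additively) comes from this message.

```typescript
// Hypothetical shapes: the real session types are not shown in this commit message.
interface ArchivedDocument { id: string; sha256: string; name: string }
interface ArchivedAnnotation { id: string; documentId: string; quote: string }
interface SessionSnapshot {
  documents: ArchivedDocument[];
  annotations: ArchivedAnnotation[];
}

// Merge `incoming` into `target` without mutating either input.
function mergeSnapshot(
  target: SessionSnapshot,
  incoming: SessionSnapshot,
  newId: () => string,
): SessionSnapshot {
  const bySha = new Map<string, ArchivedDocument>();
  target.documents.forEach((doc) => bySha.set(doc.sha256, doc));

  const documents = [...target.documents];
  const idMap = new Map<string, string>(); // incoming document id -> id in the merged session
  incoming.documents.forEach((doc) => {
    const existing = bySha.get(doc.sha256);
    if (existing) {
      idMap.set(doc.id, existing.id); // same bytes: reuse the existing document
    } else {
      documents.push(doc);
      bySha.set(doc.sha256, doc);
      idMap.set(doc.id, doc.id);
    }
  });

  // Additive: every incoming annotation is appended under a fresh id,
  // so re-importing the same archive doubles the annotations.
  const annotations = [...target.annotations];
  incoming.annotations.forEach((ann) => {
    annotations.push({
      ...ann,
      id: newId(),
      documentId: idMap.get(ann.documentId) ?? ann.documentId,
    });
  });

  return { documents, annotations };
}
```

Because nothing identifies an annotation as already imported, this sketch reproduces the duplicate-annotation behavior described under the known limitation.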
Architecture:
- New shared types: SessionId, Session, SessionArchiveManifest +
parseSessionArchiveManifest with schema-version validation.
- SessionService (engine/services/sessions.ts) handles lifecycle
(create/rename/delete/setActive) + emits 4 new events through its
own bus; SharedContracts.md §4 lists the additions.
- SessionProvider (work/SessionContext.tsx) owns the cross-session
state: service, per-session PdfByteStore registry, per-session
version counter that drives EngineProvider remounts after imports.
- EngineProvider becomes session-aware (sessionId prop drives
per-session localStorage keys). Bumping engineRevision after
restoreFromStorage forces consumers to re-render so restored repos
show up immediately.
- PdfByteStore (source/pdf/byte-store.ts) holds Uint8Array bytes per
document and mints blob URLs; ingestPdfFromFile is the upload
entry-point that wraps the existing ingestPdf pipeline.
- ADR-0008 locks the ZIP layout (manifest.json + documents/<id>.pdf),
the manifest schema (schemaVersion 1), and the merge-on-collision
policy. JSZip is the only new dependency.
- App.tsx restructured: SessionProvider at the root, EngineProvider
keyed by ${sessionId}:${version}, hash routing #/s/<id>[/forms/demo],
SessionMenu top-bar, CreateFirstSession empty state.
- New DocumentRemoved event for per-document delete cleanup in
CollectionList; engine.documents.remove() is the new service method.
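A rough sketch of what schema-version validation in parseSessionArchiveManifest might look like. Only the function name, schemaVersion 1, and the documents/<id>.pdf layout come from this message; the signature, the sessionName and documents fields, and the documentPath helper are assumptions for illustration.

```typescript
const SUPPORTED_SCHEMA_VERSION = 1;

// Hypothetical manifest shape: only `schemaVersion` is named in the source.
interface ManifestDocumentEntry { id: string; sha256: string }
interface SessionArchiveManifest {
  schemaVersion: 1;
  sessionName: string;
  documents: ManifestDocumentEntry[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Parse manifest.json text and reject anything that is not schema version 1.
function parseSessionArchiveManifest(json: string): SessionArchiveManifest {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    throw new Error("manifest.json is not valid JSON");
  }
  if (!isRecord(raw)) throw new Error("manifest must be a JSON object");
  if (raw["schemaVersion"] !== SUPPORTED_SCHEMA_VERSION) {
    throw new Error(`unsupported schemaVersion: ${String(raw["schemaVersion"])}`);
  }
  const sessionName = raw["sessionName"];
  const entries = raw["documents"];
  if (typeof sessionName !== "string") throw new Error("sessionName must be a string");
  if (!Array.isArray(entries)) throw new Error("documents must be an array");

  const documents = entries.map((entry: unknown, index: number) => {
    if (!isRecord(entry)) throw new Error(`documents[${index}] is malformed`);
    const id = entry["id"];
    const sha256 = entry["sha256"];
    if (typeof id !== "string" || typeof sha256 !== "string") {
      throw new Error(`documents[${index}] is malformed`);
    }
    return { id, sha256 };
  });
  return { schemaVersion: 1, sessionName, documents };
}

// Path of a document inside the archive, following the documents/<id>.pdf layout.
function documentPath(id: string): string {
  return `documents/${id}.pdf`;
}
```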
Tests:
- Unit: 16 SessionService lifecycle + persistence tests;
per-session snapshot round-trip; PdfByteStore + ingestPdfFromFile;
SessionArchive parser; exportSessionZip + importSessionZip with
create + merge + corrupt-archive paths.
- DOM: UploadDropzone, session-scoped CollectionList delete,
SessionMenu create/switch/rename, routing parser.
- E2E: tests/integration/session-export-reimport.dom.test.tsx walks
the full create → annotate → export → reimport flow and asserts
the additive merge (deduped doc + doubled evidence rows).
- Legacy E2Es updated to use a seed-session helper instead of the
removed fixture-button flow.
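For orientation, the hash routes named under Architecture (#/s/<id>[/forms/demo]) could be parsed roughly as follows. The SessionRoute type, the parseSessionRoute name, and the view labels are hypothetical and are not taken from the routing parser under test.

```typescript
// Hypothetical route model for the two hash shapes mentioned in this message.
type SessionRoute =
  | { kind: "empty" }
  | { kind: "session"; sessionId: string; view: "workspace" | "form-demo" };

// Accepts "#/s/<id>" and "#/s/<id>/forms/demo"; anything else is treated as the empty state.
function parseSessionRoute(hash: string): SessionRoute {
  const match = /^#\/s\/([^/]+)(\/forms\/demo)?\/?$/.exec(hash);
  if (!match || match[1] === undefined) return { kind: "empty" };
  return {
    kind: "session",
    sessionId: match[1],
    view: match[2] ? "form-demo" : "workspace",
  };
}
```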
Known limitation (documented in ADR-0008): re-importing your own
freshly-exported ZIP creates duplicate annotations. Forward pointer
left for an importBundleId follow-up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Shared Contracts — citation-evidence
This document is the single source of truth for everything that more than one subsystem in the citation-evidence ecosystem must agree on:
- the vocabulary (entity names and what they mean),
- the canonical state enums for entities that flow across repo boundaries,
- the relation type vocabulary,
- the selector type taxonomy,
- the event type vocabulary,
- the ownership rules for shared types versus shared behavior.
The five sister repos (citation-engine, evidence-anchor, evidence-source,
citation-work, evidence-binder) defer to this document. When their
INTENT.md files refer to "shared contracts", they mean this file.
During the umbrella-first MVP phase, the TypeScript implementations of
these contracts live in citation-evidence/src/shared/ and are imported by
the per-subsystem code under citation-evidence/src/{engine,anchor,source,work,binder}/.
When a subsystem extracts to its own repo, it takes its slice of the shared
types with it — but this document remains the canonical vocabulary.
1. Vocabulary
These nine entities are the vocabulary every subsystem uses.
| Entity | One-line definition | Owner (post-extraction) |
|---|---|---|
| Document | An identified source object: PDF, Markdown, HTML, scan, etc. | citation-engine |
| DocumentRepresentation | A normalized, addressable view of a document (canonical text, page map, structure). | citation-engine |
| Selector | A technical locator for a passage inside a representation. | citation-engine (types) / evidence-anchor (behavior) |
| Annotation | A technical mark on a document range, expressed as one or more selectors plus quote text. | citation-engine |
| EvidenceItem | A meaningful evidence object built from one or more annotations, with commentary and status. | citation-engine |
| EvidenceSet | An ordered group of evidence items associated with a target or topic. | citation-engine (type) / evidence-binder (behavior) |
| EvidenceLink | A relation between an EvidenceItem and a structured target (form field, claim, requirement, …). | citation-engine (type) / evidence-binder (behavior) |
| CitationCard | A renderable, exportable presentation of an evidence item. | citation-engine |
| CitationRecoveryAttempt | A traceable attempt to locate a cited passage from an external clue. | citation-engine (type) / evidence-source (behavior) |
Ownership rule: types and interfaces flow downward from citation-engine;
behavior flows upward into the specialised repos. Where the table shows a
split, the engine repo holds the data shape and the other repo holds the
algorithms and lifecycle.
2. Canonical state enums
These enums are the authoritative values. Subsystems must not invent local variants without updating this document first.
2.1 Annotation.resolutionStatus
resolved — selectors located the passage with high confidence
ambiguous — multiple plausible candidates found
unresolved — no plausible candidate found
stale — representation has changed since selectors were stored
2.2 EvidenceItem.status
candidate — captured but not yet vetted
confirmed — verified by a user as useful evidence
rejected — explicitly discarded
needs-check — flagged for review
Note: earlier subsystem drafts introduced strong-support, weak-support, and contradicts on the item. Those concepts now live on the link, not the item; see §2.4.
2.3 Document.reviewStatus (when used by citation-work)
unreviewed
in-review
relevant
rejected
needs-follow-up
cited
verified
citation-work may treat any of these as the active state; the canonical
storage lives on the Document record in citation-engine.
2.4 EvidenceLink.status (per target)
no-evidence
candidate
confirmed
conflicting
insufficient
verified
no-evidence is a derived state computed when a target has zero links;
it is not stored on a link itself.
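One way to keep no-evidence out of storage is to exclude it from the stored type and compute it per target. The sketch below assumes a hypothetical EvidenceLinkRecord shape and leaves the rule for combining several stored statuses to the caller, since this document does not define one.

```typescript
type EvidenceLinkStatus =
  | "no-evidence"
  | "candidate"
  | "confirmed"
  | "conflicting"
  | "insufficient"
  | "verified";

// "no-evidence" is derived, so a stored link can never carry it.
type StoredLinkStatus = Exclude<EvidenceLinkStatus, "no-evidence">;

// Hypothetical record shape, for illustration only.
interface EvidenceLinkRecord {
  targetId: string;
  status: StoredLinkStatus;
}

// Returns "no-evidence" when the target has zero links; otherwise defers to
// a caller-supplied `combine` rule (an assumption, not part of the contract).
function targetStatus(
  targetId: string,
  links: readonly EvidenceLinkRecord[],
  combine: (statuses: StoredLinkStatus[]) => StoredLinkStatus,
): EvidenceLinkStatus {
  const statuses = links
    .filter((link) => link.targetId === targetId)
    .map((link) => link.status);
  return statuses.length === 0 ? "no-evidence" : combine(statuses);
}
```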
2.5 EvidenceLink.relation
supports
contradicts
explains
qualifies
source-for
context-for
This is the closed vocabulary for the MVP. Adding a relation requires updating
this document and the EvidenceLink schema together.
2.6 CitationRecoveryAttempt.state
created
source-found-fulltext
source-found-preview-only
source-found-metadata-only
source-not-found
quote-found
quote-not-found
candidate-passages-found
manual-confirmation-needed
confirmed
annotation-created
failed
3. Selector taxonomy
A Selector is a discriminated union of:
TextQuoteSelector exact quote + prefix/suffix context
TextPositionSelector canonical text start/end offsets
PdfRectSelector page number + normalized page rectangles
PdfPageTextSelector page number + page-local text offsets
DomRangeSelector DOM path + range offsets (HTML/Markdown)
StructuralSelector heading/section/AST path
FragmentSelector exported fragment / deep link (export-only)
Selector redundancy rule: when an annotation is created, the system stores all selector types that are available for that document representation, not just one. Resolution tries them in order of expected confidence and stops at the first high-confidence match.
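The resolution half of that rule can be sketched as a loop over the stored selectors. The try order, the 0.9 threshold, and the tryResolve callback are all assumptions made for illustration; this document fixes only the behavior of stopping at the first high-confidence match. FragmentSelector is skipped because it is export-only.

```typescript
type SelectorKind =
  | "TextQuoteSelector"
  | "TextPositionSelector"
  | "PdfRectSelector"
  | "PdfPageTextSelector"
  | "DomRangeSelector"
  | "StructuralSelector"
  | "FragmentSelector";

interface Selector { type: SelectorKind }
interface Resolution { selector: Selector; confidence: number }

// Hypothetical values: neither the order nor the threshold is specified here.
const HIGH_CONFIDENCE = 0.9;
const TRY_ORDER: SelectorKind[] = [
  "TextQuoteSelector",
  "TextPositionSelector",
  "PdfPageTextSelector",
  "PdfRectSelector",
  "DomRangeSelector",
  "StructuralSelector",
];

// Tries selectors in TRY_ORDER and returns the first match at or above the threshold.
function resolveFirstConfident(
  selectors: readonly Selector[],
  tryResolve: (selector: Selector) => number | null,
  threshold: number = HIGH_CONFIDENCE,
): Resolution | null {
  const ordered = selectors
    .filter((s) => TRY_ORDER.indexOf(s.type) !== -1) // drops export-only kinds
    .sort((a, b) => TRY_ORDER.indexOf(a.type) - TRY_ORDER.indexOf(b.type));
  for (const selector of ordered) {
    const confidence = tryResolve(selector);
    if (confidence !== null && confidence >= threshold) {
      return { selector, confidence };
    }
  }
  return null; // nothing reached the threshold
}
```

How a null result maps onto ambiguous versus unresolved in §2.1 is left open in this sketch.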
W3C Web Annotation mapping uses these same concepts but as JSON-LD; the mapping is documented separately (see ADR-0003 — pending).
4. Event vocabulary
Events are the primary integration mechanism between subsystems. The closed event vocabulary for the MVP is:
DocumentImported
DocumentRepresentationGenerated
DocumentRemoved
AnnotationCreated
AnnotationResolved
AnnotationResolutionFailed
EvidenceItemCreated
EvidenceItemUpdated
EvidenceLinkCreated
EvidenceLinkUpdated
EvidenceItemActivated
FormFieldActivated
CitationCardRendered
CitationRecoveryStarted
CitationRecoveryCandidateFound
CitationRecoveryConfirmed
SessionCreated
SessionRenamed
SessionDeleted
SessionActivated
The Session* events live on the cross-session session bus (the
SessionService's own EventBus instance — see CE-WP-0005). The remaining
events live on the per-session engine bus and are scoped to whatever
session is currently active.
Subsystems must emit these events through a shared event bus owned by
citation-engine. Subsystems may listen to any event but must not invent
event types without updating this document.
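A closed vocabulary lends itself to string-literal union types, so that an unknown event name fails at compile time. The sketch below is one possible shape, not the engine's actual EventBus API; the on/emit methods and the untyped payload field are assumptions.

```typescript
// Events on the per-session engine bus.
type EngineEventType =
  | "DocumentImported"
  | "DocumentRepresentationGenerated"
  | "DocumentRemoved"
  | "AnnotationCreated"
  | "AnnotationResolved"
  | "AnnotationResolutionFailed"
  | "EvidenceItemCreated"
  | "EvidenceItemUpdated"
  | "EvidenceLinkCreated"
  | "EvidenceLinkUpdated"
  | "EvidenceItemActivated"
  | "FormFieldActivated"
  | "CitationCardRendered"
  | "CitationRecoveryStarted"
  | "CitationRecoveryCandidateFound"
  | "CitationRecoveryConfirmed";

// Events on the cross-session session bus.
type SessionEventType =
  | "SessionCreated"
  | "SessionRenamed"
  | "SessionDeleted"
  | "SessionActivated";

// Hypothetical envelope: payload typing is left open in this sketch.
interface BusEvent<T extends string> { type: T; payload?: unknown }
type Listener<T extends string> = (event: BusEvent<T>) => void;

class EventBus<T extends string> {
  private readonly listeners = new Map<T, Set<Listener<T>>>();

  // Subscribes to one event type and returns an unsubscribe function.
  on(type: T, listener: Listener<T>): () => void {
    let bucket = this.listeners.get(type);
    if (!bucket) {
      bucket = new Set<Listener<T>>();
      this.listeners.set(type, bucket);
    }
    bucket.add(listener);
    const owned = bucket;
    return () => {
      owned.delete(listener);
    };
  }

  // Delivers the event synchronously to listeners of that exact type.
  emit(event: BusEvent<T>): void {
    const bucket = this.listeners.get(event.type);
    if (bucket) Array.from(bucket).forEach((listener) => listener(event));
  }
}

// Two separate instances mirror the session-bus / engine-bus split described above.
const sessionBus = new EventBus<SessionEventType>();
const engineBus = new EventBus<EngineEventType>();
```

With this typing, engineBus.emit({ type: "SessionCreated" }) is a compile-time error, which keeps the two vocabularies from leaking across buses.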
5. Viewer adapter contract
Viewer adapters are the bridge between a document format and the rest of the
system. They are owned by evidence-anchor as far as the contract goes;
concrete adapters may live in either evidence-anchor or evidence-source
depending on whether the heavy lifting is selector logic or document
representation logic.
interface DocumentViewerAdapter {
  mediaTypes: string[];
  load(document: Document, representation?: DocumentRepresentation): Promise<void>;
  getCurrentSelection(): Promise<SelectionCapture | null>;
  createSelectorsFromSelection(selection: SelectionCapture): Promise<Selector[]>;
  resolveSelectors(selectors: Selector[]): Promise<AnchorResolution>;
  scrollToResolvedTarget(target: ResolvedAnchorTarget, opts?: { center?: boolean; behavior?: "auto" | "smooth" }): Promise<void>;
  renderHighlight(target: ResolvedAnchorTarget, opts?: HighlightRenderOptions): Promise<void>;
  getHighlightClientRects(annotationId: string): Promise<DOMRect[]>;
}
MVP delivers a single PDFViewerAdapter. HTML and Markdown adapters are
deferred.
6. Canonical text normalization
All text-based selectors and quote matching depend on a deterministic normalization function. The MVP normalization is:
- Unicode NFC normalization.
- Replace all line-ending sequences with \n.
- Collapse runs of horizontal whitespace into a single space.
- Strip soft hyphens (U+00AD).
- Preserve paragraph boundaries (double \n).
This function is versioned. Stored selectors record the normalization version they were created against. Changing the function later requires either backwards-compatible behavior or a re-anchoring migration.
The reference implementation lives in citation-evidence/src/shared/text/normalize.ts.
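As an illustration of the rules above (not a copy of the reference implementation), the steps might look like this. The normalizeText name, the NORMALIZATION_VERSION constant, and the reading of "horizontal whitespace" as space, tab, and no-break space are assumptions.

```typescript
// Hypothetical version tag that stored selectors would record.
const NORMALIZATION_VERSION = 1;

function normalizeText(input: string): string {
  return input
    .normalize("NFC") // Unicode NFC normalization
    .replace(/\r\n?/g, "\n") // CRLF and lone CR become \n
    .replace(/\u00AD/g, "") // strip soft hyphens
    .replace(/[ \t\u00A0]+/g, " "); // collapse horizontal whitespace runs
  // Newlines are never touched by the whitespace step,
  // so paragraph boundaries (double \n) survive.
}
```

For example, normalizeText("a\u00ADb  c\r\n\r\nd\te") yields "ab c\n\nd e": the soft hyphen is dropped, the double space and the tab each collapse to one space, and the blank line between paragraphs is preserved.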
7. Visual guide rect registry
The visual-guide overlay (form field → evidence card → source highlight)
requires DOM rects from three independently-rendered subsystems. The contract
is a rect registry owned by evidence-binder:
interface RectRegistry {
  register(kind: "field" | "evidence-card" | "highlight", id: string, getRect: () => DOMRect | null): () => void;
  getRect(kind: "field" | "evidence-card" | "highlight", id: string): DOMRect | null;
  subscribe(listener: (event: RectRegistryEvent) => void): () => void;
}
Each renderer (form, evidence sidebar, viewer adapter) registers a
getRect callback. The overlay queries on-demand and re-renders on scroll,
resize, focus, and active-evidence change.
This contract MUST be defined and stable before any of the three renderers hardens, or the overlay becomes the system's coupling bottleneck.
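To make the contract concrete, here is a minimal in-memory sketch of such a registry. It substitutes a plain Rect type for DOMRect so it runs outside a browser, and the RectRegistryEvent shape is invented for illustration because this document does not define it.

```typescript
type RectKind = "field" | "evidence-card" | "highlight";

// Stand-in for DOMRect so the sketch has no DOM dependency.
interface Rect { x: number; y: number; width: number; height: number }

// Hypothetical event shape: the contract above leaves RectRegistryEvent undefined.
interface RectRegistryEvent {
  type: "registered" | "unregistered";
  kind: RectKind;
  id: string;
}

class InMemoryRectRegistry {
  private readonly getters = new Map<string, () => Rect | null>();
  private readonly listeners = new Set<(event: RectRegistryEvent) => void>();

  // Stores a lazy getter; the returned function unregisters it.
  register(kind: RectKind, id: string, getRect: () => Rect | null): () => void {
    const key = `${kind}:${id}`;
    this.getters.set(key, getRect);
    this.notify({ type: "registered", kind, id });
    return () => {
      // Only remove the entry if it is still the one this call registered.
      if (this.getters.get(key) === getRect) {
        this.getters.delete(key);
        this.notify({ type: "unregistered", kind, id });
      }
    };
  }

  // Evaluates the getter on demand, so rects are always measured fresh.
  getRect(kind: RectKind, id: string): Rect | null {
    const getter = this.getters.get(`${kind}:${id}`);
    return getter ? getter() : null;
  }

  subscribe(listener: (event: RectRegistryEvent) => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  private notify(event: RectRegistryEvent): void {
    Array.from(this.listeners).forEach((listener) => listener(event));
  }
}
```

Storing callbacks rather than rects matches the on-demand querying described above: the overlay re-reads positions on scroll or resize without any renderer having to push updates.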
8. Ownership rules (the short version)
1. Types and interfaces flow downward from citation-engine.
2. Behavior and algorithms live in the specialised repos.
3. Where a concept appears in both a type and a behavior context (e.g. Selector, EvidenceLink, EvidenceSet, CitationRecoveryAttempt), the engine owns the shape and the specialised repo owns the lifecycle.
4. The shared event bus is engine-owned; subsystems publish and subscribe but do not extend the event vocabulary unilaterally.
5. No new enum values, relation types, event types, or selector kinds land in code without first appearing in this document.
6. During umbrella-first MVP: rules 1-5 are aspirational. We will tolerate small violations in citation-evidence/src/ and reconcile during extraction.
9. Change process
Changes to this document are changes to the contract.
- Small additions (a new enum value, a new event type) can be made in a single PR that updates this doc + the type definitions + at least one consumer.
- Breaking changes (renaming an entity, removing a state, changing an ownership split) require a short ADR in docs/decisions/ and a heads-up progress event on the state-hub.
10. Pending ADRs that will affect this document
These are listed in docs/decisions/ once written. Until then the document
reflects the current best understanding from the architecture overview.
- ADR-0001 — Umbrella-first MVP strategy (decided 2026-05-24, this session).
- ADR-0002 — Monorepo vs polyrepo packaging (pending).
- ADR-0003 — W3C Web Annotation: lossy mapping vs round-trip guarantee (pending).
- ADR-0004 — PDF viewer library choice: react-pdf-highlighter-plus vs PDF.js direct (pending).
- ADR-0005 — Persistence: local-first SQLite vs Postgres from day one (pending).
- ADR-0006 — Selector ownership split (types in engine, algorithms in anchor) (pending — implied here).