Documents the umbrella-first MVP decision (2026-05-24). This repo remains INTENT-only until the anchoring interfaces stabilize through real product use (in particular, after PDF selector round-trip is validated end-to-end). Reaffirms: anchor depends only on engine; selector types in engine, selector algorithms here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
INTENT
Purpose
This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the citation-evidence ecosystem.
evidence-anchor makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.
It is responsible for answering the central technical question:
Given a citation or evidence item, how do we find and highlight the cited source passage again?
Primary Utility
The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.
It should make it possible to:
- capture a selected passage from a document viewer,
- create redundant selectors for that passage,
- resolve selectors back into document context,
- scroll the cited passage into view,
- highlight the cited passage,
- detect unresolved or stale citations,
- support fallback and fuzzy re-anchoring strategies,
- work across PDFs, Markdown, HTML, and later other document formats.
This repository turns annotations from static marks into navigable source references.
Intended Users
Primary users of this repository are developers and agents implementing citation-evidence subsystems.
They include:
- developers building document viewers,
- developers building the review workspace,
- developers implementing evidence-backed forms,
- developers implementing citation recovery,
- developers building source ingestion and document representation pipelines,
- coding agents that need to understand how citations remain connected to source passages.
End users should experience this repository indirectly whenever they click a citation and the correct source context opens.
Strategic Role
The strategic role of evidence-anchor is to protect the citation-evidence system from fragile, viewer-specific annotations.
Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:
- a PDF is zoomed or resized,
- a document is rendered differently,
- an HTML page changes layout,
- a Markdown document is re-rendered,
- a source document is lightly revised,
- a citation is exported and later reopened.
evidence-anchor provides the durable technical foundation that makes source-backed evidence trustworthy over time.
Core Concept
The core concept of this repository is the anchor.
An anchor is a resolvable reference to a passage or region in a document.
An anchor is not a single selector. It should usually be represented by several complementary selectors that describe the same passage in different ways.
For example, a PDF anchor may include:
- exact quote,
- prefix and suffix context,
- canonical text offsets,
- page number,
- page-local offsets,
- normalized page rectangles,
- text item references.
An HTML or Markdown anchor may include:
- exact quote,
- prefix and suffix context,
- canonical text offsets,
- DOM range,
- structural path,
- heading or section context.
The system should use the strongest available selectors first and fall back to more flexible selectors when necessary.
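As a rough sketch of "strongest first", the selectors of an anchor could be ordered before resolution. The selector kinds, the `STRENGTH` ranking, and the `orderByStrength` name are assumptions made for illustration; they are not part of any shared contract.

```typescript
// Hypothetical selector kinds; the real taxonomy is defined elsewhere.
type SelectorKind =
  | "textPosition"
  | "pdfPageText"
  | "pdfRect"
  | "textQuote"
  | "structural";

interface Selector {
  kind: SelectorKind;
}

// Assumed ranking: a lower number means a stricter selector that is tried earlier.
const STRENGTH: Record<SelectorKind, number> = {
  textPosition: 0,
  pdfPageText: 1,
  pdfRect: 2,
  textQuote: 3,
  structural: 4,
};

// Returns a new array ordered from strictest to most flexible selector.
function orderByStrength(selectors: Selector[]): Selector[] {
  return selectors.slice().sort((a, b) => STRENGTH[a.kind] - STRENGTH[b.kind]);
}

const ordered = orderByStrength([
  { kind: "textQuote" },
  { kind: "pdfRect" },
  { kind: "textPosition" },
]);
console.log(ordered.map((s) => s.kind).join(" > "));
// prints "textPosition > pdfRect > textQuote"
```

A resolver could walk this ordered list and stop at the first selector whose result can be verified against the stored quote.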
Scope
This repository should own:
- selector type definitions related to anchoring,
- selector creation from captured selections,
- text quote selector support,
- text position selector support,
- PDF page and rectangle selector support,
- DOM range and structural selector support,
- selector resolution strategies,
- fuzzy re-anchoring strategies,
- anchor confidence scoring,
- ambiguous match handling,
- orphaned annotation detection,
- highlight rendering contracts,
- scroll-to-anchor contracts,
- format-neutral anchor adapter interfaces.
It should provide the technical layer used by:
- citation-work when a user selects text and creates an annotation,
- evidence-binder when a field focuses linked evidence,
- evidence-source when citation recovery finds a passage,
- citation-engine when annotations need selector and resolution semantics,
- citation-evidence when the integrated workspace reopens source context.
Out of Scope
This repository should not own the broader product or domain concerns.
Specifically, it should not own:
- the canonical evidence domain model,
- persistence policy,
- citation card rendering,
- document ingestion pipelines,
- metadata extraction,
- OCR processing,
- external source lookup,
- document collection review UI,
- form-field binding semantics,
- visual guide overlay UI,
- application shell and deployment.
Those responsibilities belong to the appropriate citation-evidence subsystem repositories.
Architectural Position
- citation-evidence: integrated product shell
- citation-engine: core domain model, services, persistence contracts
- evidence-anchor: selectors, anchor creation, resolution, re-anchoring, highlighting contracts
- evidence-source: document ingestion and document representations
- citation-work: review workspace and annotation UX
- evidence-binder: evidence-to-target binding and active evidence state
evidence-anchor should operate close to document representations and viewers, but it should not become a viewer implementation itself.
It should define the anchoring contract and provide reusable anchoring logic.
Design Principles
Selector Redundancy
A durable citation should not rely on a single selector.
Where possible, the system should store multiple selectors that can support each other:
- visual selectors,
- text selectors,
- structural selectors,
- context selectors.
Viewer Independence
Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.
Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.
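One way to keep a visual selector portable is to store rectangles relative to the page rather than in viewer pixels. The sketch below assumes a top-left origin, no page rotation, and rectangles measured in the rendered page's own pixel units; the `Rect`, `PageSize`, `normalizeRect`, and `denormalizeRect` names are hypothetical.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PageSize {
  width: number;
  height: number;
}

// Convert a rectangle in rendered-page pixels into page-relative units (0..1).
function normalizeRect(viewRect: Rect, renderedPage: PageSize): Rect {
  return {
    x: viewRect.x / renderedPage.width,
    y: viewRect.y / renderedPage.height,
    width: viewRect.width / renderedPage.width,
    height: viewRect.height / renderedPage.height,
  };
}

// Convert a stored page-relative rectangle back into pixels for the current render size.
function denormalizeRect(normRect: Rect, renderedPage: PageSize): Rect {
  return {
    x: normRect.x * renderedPage.width,
    y: normRect.y * renderedPage.height,
    width: normRect.width * renderedPage.width,
    height: normRect.height * renderedPage.height,
  };
}

// Captured while the page was rendered at 800 x 1000 pixels.
const stored = normalizeRect(
  { x: 100, y: 200, width: 300, height: 20 },
  { width: 800, height: 1000 }
);

// Reopened later at twice the zoom level.
const zoomed = denormalizeRect(stored, { width: 1600, height: 2000 });
console.log(stored, zoomed);
```

A viewer adapter that works in PDF user space, where the origin is at the bottom-left, would need an additional flip of the y axis.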
Format Neutrality
The anchoring model should work across paginated and non-paginated documents.
PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.
Confidence Over Certainty
Anchor resolution should return a confidence level, not just success or failure.
Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.
The API should make this explicit.
Human Confirmation for Ambiguity
When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.
Preserve the Quote
Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.
No Silent Misleading Match
It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.
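A fuzzy matcher can honour this principle by refusing to return weak matches. The following is a minimal sketch, not the intended algorithm: it slides a window of the quote's length over the text, scores each window by Levenshtein edit distance, and returns nothing when the best score falls below a floor. The `fuzzyFind` name and the `minSimilarity` default of 0.8 are assumptions.

```typescript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

interface FuzzyHit {
  start: number;
  end: number;
  similarity: number; // 1 means identical, 0 means nothing in common
}

// Returns the best-scoring window, or null when no window is similar enough.
function fuzzyFind(text: string, quote: string, minSimilarity = 0.8): FuzzyHit | null {
  if (quote.length === 0 || text.length < quote.length) return null;
  let best: FuzzyHit | null = null;
  for (let start = 0; start + quote.length <= text.length; start++) {
    const candidate = text.slice(start, start + quote.length);
    const similarity = 1 - editDistance(candidate, quote) / quote.length;
    if (best === null || similarity > best.similarity) {
      best = { start, end: start + quote.length, similarity };
    }
  }
  return best !== null && best.similarity >= minSimilarity ? best : null;
}

// The source was lightly revised: "120" became "125".
const hit = fuzzyFind("The study enrolled 125 patients in total.", "enrolled 120 patients");
console.log(hit);
```

The sliding window is quadratic in the quote length for every text position, so it only suits short documents or a pre-narrowed search region. A caller should still treat a fuzzy hit as probable rather than exact.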
Initial Selector Types
The first version should support or prepare for these selector concepts:
- TextQuoteSelector: exact selected text plus prefix and suffix context
- TextPositionSelector: canonical start and end offsets in normalized text
- PdfRectSelector: page number and normalized page rectangles
- PdfPageTextSelector: page number plus page-local text offsets
- DomRangeSelector: DOM path and range offsets for rendered HTML
- StructuralSelector: heading, section, block, or AST path information
- FragmentSelector: optional exported fragment or deep-link representation
The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.
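A minimal sketch of how two of these selector concepts might look as TypeScript types, and how redundant selectors could be created from one captured range. The `exact`, `prefix`, `suffix`, `start`, and `end` field names follow the W3C Web Annotation selectors; the `type` discriminant layout, the `selectorsFromRange` function, and the `contextLength` parameter are assumptions for illustration. The canonical selector types are expected to live in citation-engine, not in this sketch.

```typescript
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive offset into the normalized text
  end: number; // exclusive offset into the normalized text
}

type Selector = TextQuoteSelector | TextPositionSelector;

// Build two complementary selectors for the same range of normalized text.
function selectorsFromRange(
  text: string,
  start: number,
  end: number,
  contextLength = 32
): Selector[] {
  return [
    { type: "TextPositionSelector", start, end },
    {
      type: "TextQuoteSelector",
      exact: text.slice(start, end),
      prefix: text.slice(Math.max(0, start - contextLength), start),
      suffix: text.slice(end, end + contextLength),
    },
  ];
}

const doc = "Anchors keep citations durable. They resolve quotes back to passages.";
const quote = "citations durable";
const from = doc.indexOf(quote);
const selectors = selectorsFromRange(doc, from, from + quote.length, 8);
console.log(JSON.stringify(selectors, null, 2));
```

Here the position selector records offsets 13 to 30, and the quote selector keeps "rs keep " and ". They r" as surrounding context, so either selector can be used to cross-check the other.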
Anchor Resolution Strategy
A typical resolution pipeline should be:
1. Check document and representation identity.
2. Try exact position-based selectors.
3. Verify resolved text against the stored quote.
4. Try PDF page and rectangle selectors where applicable.
5. Try text quote matching with prefix and suffix.
6. Try structural or heading context.
7. Try fuzzy quote matching.
8. Rank candidate matches by confidence.
9. Return resolved, ambiguous, stale, or unresolved status.
The caller should receive enough information to decide whether to:
- highlight the passage,
- ask the user to confirm,
- mark the citation as stale,
- manually reattach the annotation.
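Part of this pipeline can be sketched for plain text as follows. It covers only the position check, quote verification, and quote matching with context; PDF, structural, and fuzzy steps are left out. The `StoredAnchor` shape, the `resolveAnchor` name, and every confidence value are assumptions chosen for illustration.

```typescript
interface StoredAnchor {
  exact: string;
  prefix: string;
  suffix: string;
  start: number;
  end: number;
}

interface TextRange {
  start: number;
  end: number;
}

interface Resolution {
  status: "resolved" | "ambiguous" | "unresolved";
  confidence: number;
  candidates: TextRange[];
}

// Find every occurrence of the quote in the text.
function findAll(text: string, needle: string): TextRange[] {
  const hits: TextRange[] = [];
  if (needle.length === 0) return hits;
  let i = text.indexOf(needle);
  while (i !== -1) {
    hits.push({ start: i, end: i + needle.length });
    i = text.indexOf(needle, i + 1);
  }
  return hits;
}

function resolveAnchor(text: string, a: StoredAnchor): Resolution {
  // Position selector, verified against the stored quote.
  if (text.slice(a.start, a.end) === a.exact) {
    return { status: "resolved", confidence: 1, candidates: [{ start: a.start, end: a.end }] };
  }
  // Quote matching, narrowed by prefix and suffix context.
  const hits = findAll(text, a.exact);
  const inContext = hits.filter(
    (h) =>
      text.slice(Math.max(0, h.start - a.prefix.length), h.start) === a.prefix &&
      text.slice(h.end, h.end + a.suffix.length) === a.suffix
  );
  if (inContext.length === 1) return { status: "resolved", confidence: 0.9, candidates: inContext };
  if (inContext.length > 1) return { status: "ambiguous", confidence: 0.5, candidates: inContext };
  if (hits.length === 1) return { status: "resolved", confidence: 0.7, candidates: hits };
  if (hits.length > 1) return { status: "ambiguous", confidence: 0.4, candidates: hits };
  return { status: "unresolved", confidence: 0, candidates: [] };
}

const anchor: StoredAnchor = {
  exact: "120 patients",
  prefix: "enrolled ",
  suffix: ". Results",
  start: 19,
  end: 31,
};
const original = "The trial enrolled 120 patients. Results were positive.";
const revised = "In 2021, the trial enrolled 120 patients. Results were positive.";

console.log(resolveAnchor(original, anchor).confidence); // prints 1
console.log(resolveAnchor(revised, anchor).candidates); // prints [ { start: 28, end: 40 } ]
```

In the revised text the stored offsets no longer point at the quote, so the sketch falls back to the quote and its context and reports a slightly lower confidence.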
Core Interfaces
The repository should eventually provide interfaces similar to:
interface AnchorAdapter {
  createSelectors(selection: SelectionCapture): Promise<Selector[]>;

  resolveSelectors(
    representation: DocumentRepresentation,
    selectors: Selector[]
  ): Promise<AnchorResolution>;

  renderHighlight(
    target: ResolvedAnchorTarget,
    options?: HighlightRenderOptions
  ): Promise<void>;

  scrollToTarget(
    target: ResolvedAnchorTarget,
    options?: ScrollToTargetOptions
  ): Promise<void>;
}
Resolution should return structured results:
type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};
These interfaces should remain implementation-neutral enough to work with different viewers and document formats.
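A caller might map such a result onto one of the follow-up actions as in the sketch below. The `AnchorResolution` type is copied from above; the placeholder `ResolvedAnchorTarget` shape, the `CallerAction` names, the `decideAction` function, and the 0.8 threshold are assumptions for illustration.

```typescript
// Placeholder target shape; the real type would come from the shared contracts.
type ResolvedAnchorTarget = { start: number; end: number };

type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};

type CallerAction = "highlight" | "confirm" | "markStale" | "reattach";

// Decide what the calling subsystem should do with a resolution result.
function decideAction(r: AnchorResolution, minConfidence = 0.8): CallerAction {
  if (r.status === "stale") return "markStale";
  if (r.status === "unresolved") return "reattach";
  if (r.status === "ambiguous") return "confirm";
  // Resolved: highlight only a single, sufficiently confident candidate.
  const confident = r.candidates.length === 1 && r.confidence >= minConfidence;
  return confident ? "highlight" : "confirm";
}

const probable: AnchorResolution = {
  status: "resolved",
  confidence: 0.6,
  candidates: [{ start: 28, end: 40 }],
  usedSelectorTypes: ["TextQuoteSelector"],
};
console.log(decideAction(probable)); // prints "confirm"
```

Routing a low-confidence "resolved" result to confirmation rather than highlighting keeps the caller on the cautious side of a wrong match.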
Expected Dependencies
This repository is expected to depend on shared types from:
- citation-engine: annotation, selector, document representation, and evidence-related contracts
It may be consumed by:
- citation-work: to create anchors from user selections and reopen evidence
- evidence-source: to create anchors from citation recovery results
- evidence-binder: to resolve active evidence linked to form fields or claims
- citation-evidence: to integrate the complete user experience
It should avoid depending on application-level UI repositories.
First Useful Version
A first useful version of evidence-anchor should provide:
- TypeScript selector types,
- a text quote selector model,
- a text position selector model,
- a PDF rectangle selector model,
- an anchor resolution result model,
- a simple text quote matching function,
- a simple text position resolver,
- a viewer adapter contract,
- enough mock/test data to prove that an annotation can be resolved back into a document representation.
The first version does not need to support every document format, but it should establish the selector redundancy pattern.
Success Criteria
The repository is successful when another subsystem can use it to:
- capture a document selection,
- create one or more selectors,
- store those selectors through citation-engine,
- later resolve them against a document representation,
- receive a confidence-scored target,
- scroll to and highlight that target through a viewer adapter,
- detect when the target is ambiguous or unresolved.
A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.
Repository Character
This repository should be:
- technically focused,
- algorithm-friendly,
- format-neutral,
- viewer-independent,
- strongly typed,
- explicit about uncertainty,
- careful about false positives,
- reusable across multiple user interfaces,
- small enough to remain a precise anchoring layer.
MVP Coordination — Code Lives Upstream
During the umbrella-first MVP phase (decided 2026-05-24), the source code
for this subsystem does not live in this repository yet. It lives in the
umbrella repo at citation-evidence/src/anchor/.
This INTENT.md documents the intended responsibilities and boundaries. When the anchoring interfaces have stabilized through actual MVP use — in particular, after PDF selector round-trip has been validated end-to-end — the corresponding code extracts into this repository.
Shared contracts (selector taxonomy, viewer adapter contract, canonical text normalization, resolution result shape, allowed dependency edges) are maintained in the umbrella repo:
- citation-evidence/wiki/SharedContracts.md
- citation-evidence/wiki/DependencyMap.md
- citation-evidence/docs/decisions/ (ADRs)
This subsystem's eventual code must not contradict those documents. Changes to shared contracts happen in the umbrella, not here.
Under the dependency map, evidence-anchor may depend only on
citation-engine. Selector types live in the engine; selector creation,
resolution, fuzzy matching, and highlight rendering live here.
Guiding Statement
evidence-anchor exists to keep cited evidence attached to its source context.