# INTENT

## Purpose

This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the **citation-evidence** ecosystem.

**evidence-anchor** makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.

It is responsible for answering the central technical question:

> Given a citation or evidence item, how do we find and highlight the cited source passage again?

---

## Primary Utility

The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.

It should make it possible to:

- capture a selected passage from a document viewer,
- create redundant selectors for that passage,
- resolve selectors back into document context,
- scroll the cited passage into view,
- highlight the cited passage,
- detect unresolved or stale citations,
- support fallback and fuzzy re-anchoring strategies,
- work across PDFs, Markdown, HTML, and later other document formats.

This repository turns annotations from static marks into navigable source references.

---

## Intended Users

Primary users of this repository are developers and agents implementing citation-evidence subsystems.

They include:

- developers building document viewers,
- developers building the review workspace,
- developers implementing evidence-backed forms,
- developers implementing citation recovery,
- developers building source ingestion and document representation pipelines,
- coding agents that need to understand how citations remain connected to source passages.

End users should experience this repository indirectly whenever they click a citation and the correct source context opens.

---

## Strategic Role

The strategic role of **evidence-anchor** is to protect the citation-evidence system from fragile, viewer-specific annotations.

Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:

- a PDF is zoomed or resized,
- a document is rendered differently,
- an HTML page changes layout,
- a Markdown document is re-rendered,
- a source document is lightly revised,
- a citation is exported and later reopened.

**evidence-anchor** provides the durable technical foundation that makes source-backed evidence trustworthy over time.

---

## Core Concept

The core concept of this repository is the **anchor**.

An anchor is a resolvable reference to a passage or region in a document.

An anchor is not a single selector. It should usually be represented by several complementary selectors.

For example, a PDF anchor may include:

```text
exact quote
prefix and suffix context
canonical text offsets
page number
page-local offsets
normalized page rectangles
text item references
```

An HTML or Markdown anchor may include:

```text
exact quote
prefix and suffix context
canonical text offsets
DOM range
structural path
heading or section context
```

The system should use the strongest available selectors first and fall back to more flexible selectors when necessary.

---

## Scope

This repository should own:

* selector type definitions related to anchoring,
* selector creation from captured selections,
* text quote selector support,
* text position selector support,
* PDF page and rectangle selector support,
* DOM range and structural selector support,
* selector resolution strategies,
* fuzzy re-anchoring strategies,
* anchor confidence scoring,
* ambiguous match handling,
* orphaned annotation detection,
* highlight rendering contracts,
* scroll-to-anchor contracts,
* format-neutral anchor adapter interfaces.

It should provide the technical layer used by:

* **citation-work** when a user selects text and creates an annotation,
* **evidence-binder** when a field focuses linked evidence,
* **evidence-source** when citation recovery finds a passage,
* **citation-engine** when annotations need selector and resolution semantics,
* **citation-evidence** when the integrated workspace reopens source context.

---

## Out of Scope

This repository should not own the broader product or domain concerns.

Specifically, it should not own:

* the canonical evidence domain model,
* persistence policy,
* citation card rendering,
* document ingestion pipelines,
* metadata extraction,
* OCR processing,
* external source lookup,
* document collection review UI,
* form-field binding semantics,
* visual guide overlay UI,
* application shell and deployment.

Those responsibilities belong to the appropriate citation-evidence subsystem repositories.

---

## Architectural Position

```text
citation-evidence   integrated product shell
citation-engine     core domain model, services, persistence contracts
evidence-anchor     selectors, anchor creation, resolution, re-anchoring, highlighting contracts
evidence-source     document ingestion and document representations
citation-work       review workspace and annotation UX
evidence-binder     evidence-to-target binding and active evidence state
```

**evidence-anchor** should operate close to document representations and viewers, but it should not become a viewer implementation itself.

It should define the anchoring contract and provide reusable anchoring logic.

---

## Design Principles

### Selector Redundancy

A durable citation should not rely on a single selector.

Where possible, the system should store multiple selectors that can support each other:

```text
visual selectors
text selectors
structural selectors
context selectors
```

### Viewer Independence

Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.
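
As a rough illustration of selector redundancy and viewer independence, an anchor can be kept as plain, serializable data that carries several complementary selectors and no viewer objects. This is a hypothetical sketch, not the repository's model: the `Anchor` type, `representationId`, `createTextSelectors`, and the `contextLength` default of 32 are illustrative names and values, while the `exact`/`prefix`/`suffix` and `start`/`end` fields follow the W3C Web Annotation `TextQuoteSelector` and `TextPositionSelector`. It assumes the viewer selection has already been mapped to offsets in canonical normalized text.

```typescript
// Hypothetical selector shapes for illustration. Field names for the text
// selectors follow the W3C Web Annotation model; the canonical types are
// expected to live in citation-engine.
type TextQuoteSelector = {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
};

type TextPositionSelector = {
  type: "TextPositionSelector";
  start: number; // inclusive offset into canonical normalized text
  end: number; // exclusive offset
};

type PdfRectSelector = {
  type: "PdfRectSelector";
  page: number; // 1-based page number (assumption)
  // Rectangles as fractions of page width and height, so they stay valid
  // across zoom levels and viewport sizes.
  rects: { x: number; y: number; width: number; height: number }[];
};

type Selector = TextQuoteSelector | TextPositionSelector | PdfRectSelector;

// A portable anchor: plain data only, no viewer or DOM references.
type Anchor = {
  representationId: string; // hypothetical document-representation identity
  selectors: Selector[];
};

// Create redundant text selectors for the canonical-text range [start, end).
function createTextSelectors(
  canonicalText: string,
  start: number,
  end: number,
  contextLength = 32,
): Selector[] {
  if (start < 0 || start >= end || end > canonicalText.length) {
    throw new RangeError(`invalid selection range [${start}, ${end})`);
  }
  return [
    { type: "TextPositionSelector", start, end },
    {
      type: "TextQuoteSelector",
      exact: canonicalText.slice(start, end),
      prefix: canonicalText.slice(Math.max(0, start - contextLength), start),
      suffix: canonicalText.slice(end, end + contextLength),
    },
  ];
}
```

Because the result contains only strings and numbers, it survives a `JSON.stringify`/`JSON.parse` round-trip unchanged, which is what lets a stored citation be reopened in a different viewer than the one that created it.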

Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.

### Format Neutrality

The anchoring model should work across paginated and non-paginated documents.

PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.

### Confidence Over Certainty

Anchor resolution should return a confidence level, not just success or failure.

Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.

The API should make this explicit.

### Human Confirmation for Ambiguity

When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.

### Preserve the Quote

Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.

### No Silent Misleading Match

It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.

---

## Initial Selector Types

The first version should support or prepare for these selector concepts:

```text
TextQuoteSelector      exact selected text plus prefix and suffix context
TextPositionSelector   canonical start and end offsets in normalized text
PdfRectSelector        page number and normalized page rectangles
PdfPageTextSelector    page number plus page-local text offsets
DomRangeSelector       DOM path and range offsets for rendered HTML
StructuralSelector     heading, section, block, or AST path information
FragmentSelector       optional exported fragment or deep-link representation
```

The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.

---

## Anchor Resolution Strategy

A typical resolution pipeline should be:

```text
1. Check document and representation identity.
2. Try exact position-based selectors.
3. Verify resolved text against the stored quote.
4. Try PDF page and rectangle selectors where applicable.
5. Try text quote matching with prefix and suffix.
6. Try structural or heading context.
7. Try fuzzy quote matching.
8. Rank candidate matches by confidence.
9. Return resolved, ambiguous, stale, or unresolved status.
```

The caller should receive enough information to decide whether to:

* highlight the passage,
* ask the user to confirm,
* mark the citation as stale,
* manually reattach the annotation.

---

## Core Interfaces

The repository should eventually provide interfaces similar to:

```ts
interface AnchorAdapter {
  // Build redundant selectors from a viewer's captured selection.
  createSelectors(selection: SelectionCapture): Promise<Selector[]>;

  // Resolve stored selectors against a document representation.
  resolveSelectors(
    representation: DocumentRepresentation,
    selectors: Selector[]
  ): Promise<AnchorResolution>;

  // Draw a highlight for a resolved target (void return type assumed).
  renderHighlight(
    target: ResolvedAnchorTarget,
    options?: HighlightRenderOptions
  ): Promise<void>;

  // Scroll the viewer so the target is visible (void return type assumed).
  scrollToTarget(
    target: ResolvedAnchorTarget,
    options?: ScrollToTargetOptions
  ): Promise<void>;
}
```

Resolution should return structured results:

```ts
type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};
```

These interfaces should remain implementation-neutral enough to work with different viewers and document formats.

---

## Expected Dependencies

This repository is expected to depend on shared types from:

```text
citation-engine   annotation, selector, document representation, and evidence-related contracts
```

It may be consumed by:

```text
citation-work       to create anchors from user selections and reopen evidence
evidence-source     to create anchors from citation recovery results
evidence-binder     to resolve active evidence linked to form fields or claims
citation-evidence   to integrate the complete user experience
```

It should avoid depending on application-level UI repositories.
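
The Anchor Resolution Strategy steps can be sketched for plain canonical text, reusing the `AnchorResolution` shape from Core Interfaces. This is a minimal, hypothetical implementation, not the repository's resolver: `resolveTextAnchor`, `contextScore`, the `TextTarget` stand-in for `ResolvedAnchorTarget`, and the confidence values (1 for a verified position, 0.7 to 0.9 for a context-ranked quote match, 0.5 for a tie) are all illustrative assumptions. It covers only steps 2, 3, 5, and 8; identity checks, PDF and structural selectors, fuzzy matching, and the `stale` status are left out.

```typescript
type TextQuoteSelector = {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
};
type TextPositionSelector = { type: "TextPositionSelector"; start: number; end: number };
type Selector = TextQuoteSelector | TextPositionSelector;

// Hypothetical stand-in for ResolvedAnchorTarget: a range in canonical text.
type TextTarget = { start: number; end: number };

type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: TextTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};

// Fraction of stored prefix/suffix characters that agree with the text,
// read outward from a quote match starting at `at`.
function contextScore(text: string, at: number, q: TextQuoteSelector): number {
  let p = 0;
  while (p < q.prefix.length && at - 1 - p >= 0 &&
         text[at - 1 - p] === q.prefix[q.prefix.length - 1 - p]) {
    p++;
  }
  const after = at + q.exact.length;
  let s = 0;
  while (s < q.suffix.length && after + s < text.length &&
         text[after + s] === q.suffix[s]) {
    s++;
  }
  const total = q.prefix.length + q.suffix.length;
  return total === 0 ? 0 : (p + s) / total;
}

function resolveTextAnchor(text: string, selectors: Selector[]): AnchorResolution {
  const pos = selectors.find((s): s is TextPositionSelector => s.type === "TextPositionSelector");
  const quote = selectors.find((s): s is TextQuoteSelector => s.type === "TextQuoteSelector");
  const warnings: string[] = [];

  // Steps 2-3: try the stored position, then verify it against the quote.
  if (quote && pos && pos.start >= 0 && pos.start < pos.end && pos.end <= text.length) {
    if (text.slice(pos.start, pos.end) === quote.exact) {
      return {
        status: "resolved",
        confidence: 1,
        candidates: [{ start: pos.start, end: pos.end }],
        usedSelectorTypes: ["TextPositionSelector", "TextQuoteSelector"],
      };
    }
    warnings.push("text at stored position no longer matches the quote");
  }
  if (!quote || quote.exact.length === 0) {
    return { status: "unresolved", confidence: 0, candidates: [], usedSelectorTypes: [], warnings };
  }

  // Step 5: find every exact occurrence and rank by prefix/suffix agreement.
  const exact = quote.exact;
  const hits: { start: number; score: number }[] = [];
  for (let i = text.indexOf(exact); i !== -1; i = text.indexOf(exact, i + 1)) {
    hits.push({ start: i, score: contextScore(text, i, quote) });
  }
  if (hits.length === 0) {
    // Steps 6-7 (structural context, fuzzy matching) are omitted here.
    return { status: "unresolved", confidence: 0, candidates: [], usedSelectorTypes: ["TextQuoteSelector"], warnings };
  }
  hits.sort((a, b) => b.score - a.score);
  const toTarget = (h: { start: number }): TextTarget => ({ start: h.start, end: h.start + exact.length });
  const best = hits.filter((h) => h.score === hits[0].score);
  if (best.length > 1) {
    // Step 8: tied candidates are reported rather than guessed.
    return { status: "ambiguous", confidence: 0.5, candidates: best.map(toTarget), usedSelectorTypes: ["TextQuoteSelector"], warnings };
  }
  return {
    status: "resolved",
    confidence: 0.7 + 0.2 * hits[0].score,
    candidates: [toTarget(hits[0])],
    usedSelectorTypes: ["TextQuoteSelector"],
    warnings,
  };
}
```

Tied candidates come back as `ambiguous` with every tied target instead of the first match, in line with the "No Silent Misleading Match" principle, so the caller can ask the user to confirm. A stale position selector is not fatal either: the quote search recovers the passage and the mismatch is reported in `warnings`.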

---

## First Useful Version

A first useful version of **evidence-anchor** should provide:

* TypeScript selector types,
* a text quote selector model,
* a text position selector model,
* a PDF rectangle selector model,
* an anchor resolution result model,
* a simple text quote matching function,
* a simple text position resolver,
* a viewer adapter contract,
* enough mock/test data to prove that an annotation can be resolved back into a document representation.

The first version does not need to support every document format, but it should establish the selector redundancy pattern.

---

## Success Criteria

The repository is successful when another subsystem can use it to:

1. capture a document selection,
2. create one or more selectors,
3. store those selectors through **citation-engine**,
4. later resolve them against a document representation,
5. receive a confidence-scored target,
6. scroll to and highlight that target through a viewer adapter,
7. detect when the target is ambiguous or unresolved.

A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.

---

## Repository Character

This repository should be:

* technically focused,
* algorithm-friendly,
* format-neutral,
* viewer-independent,
* strongly typed,
* explicit about uncertainty,
* careful about false positives,
* reusable across multiple user interfaces,
* small enough to remain a precise anchoring layer.

---

## MVP Coordination: Code Lives Upstream

During the umbrella-first MVP phase (decided 2026-05-24), **the source code for this subsystem does not live in this repository yet**. It lives in the umbrella repo at `citation-evidence/src/anchor/`.

This INTENT.md documents the *intended* responsibilities and boundaries. When the anchoring interfaces have stabilized through actual MVP use, in particular after the PDF selector round-trip has been validated end-to-end, the corresponding code will be extracted into this repository.

**Shared contracts** (selector taxonomy, viewer adapter contract, canonical text normalization, resolution result shape, allowed dependency edges) are maintained in the umbrella repo:

* `citation-evidence/wiki/SharedContracts.md`
* `citation-evidence/wiki/DependencyMap.md`
* `citation-evidence/docs/decisions/` (ADRs)

This subsystem's eventual code must not contradict those documents. Changes to shared contracts happen in the umbrella, not here.

Under the dependency map, **`evidence-anchor` may depend only on `citation-engine`**. Selector *types* live in the engine; selector *creation, resolution, fuzzy matching, and highlight rendering* live here.

---

## Guiding Statement

**evidence-anchor exists to keep cited evidence attached to its source context.**