
tegwick 605a3c032f Add MVP Coordination section: code lives in citation-evidence umbrella during MVP

Documents the umbrella-first MVP decision (2026-05-24). This repo remains
INTENT-only until the anchoring interfaces stabilize through real product
use (in particular, after PDF selector round-trip is validated end-to-end).
Reaffirms: anchor depends only on engine; selector types in engine, selector
algorithms here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-24 16:51:04 +02:00

12 KiB


INTENT

Purpose

This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the citation-evidence ecosystem.

evidence-anchor makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.

It is responsible for answering the central technical question:

Given a citation or evidence item, how do we find and highlight the cited source passage again?

Primary Utility

The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.

It should make it possible to:

capture a selected passage from a document viewer,
create redundant selectors for that passage,
resolve selectors back into document context,
scroll the cited passage into view,
highlight the cited passage,
detect unresolved or stale citations,
support fallback and fuzzy re-anchoring strategies,
work across PDFs, Markdown, HTML, and later other document formats.

This repository turns annotations from static marks into navigable source references.

Intended Users

Primary users of this repository are developers and agents implementing citation-evidence subsystems.

They include:

developers building document viewers,
developers building the review workspace,
developers implementing evidence-backed forms,
developers implementing citation recovery,
developers building source ingestion and document representation pipelines,
coding agents that need to understand how citations remain connected to source passages.

End users should experience this repository indirectly whenever they click a citation and the correct source context opens.

Strategic Role

The strategic role of evidence-anchor is to protect the citation-evidence system from fragile, viewer-specific annotations.

Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:

a PDF is zoomed or resized,
a document is rendered differently,
an HTML page changes layout,
a Markdown document is re-rendered,
a source document is lightly revised,
a citation is exported and later reopened.

evidence-anchor provides the durable technical foundation that makes source-backed evidence trustworthy over time.

Core Concept

The core concept of this repository is the anchor.

An anchor is a resolvable reference to a passage or region in a document.

An anchor is not a single value. It should usually be represented by several complementary selectors.

For example, a PDF anchor may include:

exact quote
prefix and suffix context
canonical text offsets
page number
page-local offsets
normalized page rectangles
text item references

An HTML or Markdown anchor may include:

exact quote
prefix and suffix context
canonical text offsets
DOM range
structural path
heading or section context
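Here is a minimal sketch of capturing two complementary text selectors for one passage. It assumes the viewer has already mapped the user's selection to start and end offsets in the document's canonical text. The type shapes, the function name `createTextSelectors`, and the default 32-character context window are illustrative assumptions. They are not the selector contract maintained in citation-engine.

```typescript
// Hypothetical selector shapes; the real types are defined in citation-engine.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive offset into canonical text
  end: number; // exclusive offset into canonical text
}

// Builds a quote selector and a position selector for the same passage,
// so either can recover the passage if the other one fails later.
function createTextSelectors(
  canonicalText: string,
  start: number,
  end: number,
  contextLength = 32 // assumed context window, in characters
): [TextQuoteSelector, TextPositionSelector] {
  if (start < 0 || end > canonicalText.length || start >= end) {
    throw new RangeError(`invalid selection [${start}, ${end})`);
  }
  return [
    {
      type: "TextQuoteSelector",
      exact: canonicalText.slice(start, end),
      prefix: canonicalText.slice(Math.max(0, start - contextLength), start),
      suffix: canonicalText.slice(end, end + contextLength),
    },
    { type: "TextPositionSelector", start, end },
  ];
}
```

If a later revision shifts the offsets, the position selector alone would point at the wrong text. The quote selector still carries enough context to find the passage again.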

The system should use the strongest available selectors first and fall back to more flexible selectors when necessary.
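One way to express "strongest first" is a fixed precedence over selector types. The ordering below is an assumption for illustration. It loosely follows the resolution pipeline described later in this document, and a real implementation might also take the document format into account.

```typescript
type SelectorType =
  | "TextPositionSelector"
  | "DomRangeSelector"
  | "PdfPageTextSelector"
  | "PdfRectSelector"
  | "TextQuoteSelector"
  | "StructuralSelector"
  | "FragmentSelector";

// Lower rank = tried earlier. Hypothetical precedence: exact positions first,
// then page-level selectors, then quote and structural context.
const SELECTOR_RANK: Record<SelectorType, number> = {
  TextPositionSelector: 0,
  DomRangeSelector: 1,
  PdfPageTextSelector: 2,
  PdfRectSelector: 3,
  TextQuoteSelector: 4,
  StructuralSelector: 5,
  FragmentSelector: 6,
};

interface SelectorLike {
  type: SelectorType;
}

// Returns a new array ordered strongest-first; the input is not mutated.
function orderByStrength<S extends SelectorLike>(selectors: readonly S[]): S[] {
  return [...selectors].sort((a, b) => SELECTOR_RANK[a.type] - SELECTOR_RANK[b.type]);
}
```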

Scope

This repository should own:

selector type definitions related to anchoring,
selector creation from captured selections,
text quote selector support,
text position selector support,
PDF page and rectangle selector support,
DOM range and structural selector support,
selector resolution strategies,
fuzzy re-anchoring strategies,
anchor confidence scoring,
ambiguous match handling,
orphaned annotation detection,
highlight rendering contracts,
scroll-to-anchor contracts,
format-neutral anchor adapter interfaces.

It should provide the technical layer used by:

citation-work when a user selects text and creates an annotation,
evidence-binder when a field focuses linked evidence,
evidence-source when citation recovery finds a passage,
citation-engine when annotations need selector and resolution semantics,
citation-evidence when the integrated workspace reopens source context.

Out of Scope

This repository should not own the broader product or domain concerns.

Specifically, it should not own:

the canonical evidence domain model,
persistence policy,
citation card rendering,
document ingestion pipelines,
metadata extraction,
OCR processing,
external source lookup,
document collection review UI,
form-field binding semantics,
visual guide overlay UI,
application shell and deployment.

Those responsibilities belong to the appropriate citation-evidence subsystem repositories.

Architectural Position

citation-evidence
  integrated product shell

citation-engine
  core domain model, services, persistence contracts

evidence-anchor
  selectors, anchor creation, resolution, re-anchoring, highlighting contracts

evidence-source
  document ingestion and document representations

citation-work
  review workspace and annotation UX

evidence-binder
  evidence-to-target binding and active evidence state

evidence-anchor should operate close to document representations and viewers, but it should not become a viewer implementation itself.

It should define the anchoring contract and provide reusable anchoring logic.

Design Principles

Selector Redundancy

A durable citation should not rely on a single selector.

Where possible, the system should store multiple selectors that can support each other:

visual selectors
text selectors
structural selectors
context selectors
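The following sketch shows how redundant selectors can support each other. Each selector is resolved independently. Hits that land on the same span are grouped, and each group is scored by the share of selector weight that agrees with it, so any disagreement stays visible. The per-type weights and the fallback weight of 0.25 are hypothetical.

```typescript
// A span proposed by one selector after it was resolved on its own.
interface SelectorHit {
  selectorType: string;
  start: number;
  end: number;
}

interface CorroboratedSpan {
  start: number;
  end: number;
  supportedBy: string[];
  agreement: number; // share of total selector weight backing this span, 0..1
}

// Hypothetical weights: visual and structural hits count less than text hits.
const WEIGHTS: Record<string, number> = {
  TextPositionSelector: 1,
  TextQuoteSelector: 1,
  PdfRectSelector: 0.5,
  StructuralSelector: 0.5,
};

// Groups hits pointing at the identical span, best-supported span first.
function corroborate(hits: SelectorHit[]): CorroboratedSpan[] {
  const weightOf = (t: string): number => WEIGHTS[t] ?? 0.25;
  const total = hits.reduce((sum, h) => sum + weightOf(h.selectorType), 0);
  const groups = new Map<string, CorroboratedSpan>();
  for (const h of hits) {
    const key = `${h.start}:${h.end}`;
    const g = groups.get(key) ?? { start: h.start, end: h.end, supportedBy: [], agreement: 0 };
    g.supportedBy.push(h.selectorType);
    g.agreement += total > 0 ? weightOf(h.selectorType) / total : 0;
    groups.set(key, g);
  }
  return [...groups.values()].sort((a, b) => b.agreement - a.agreement);
}
```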

Viewer Independence

Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.

Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.

Format Neutrality

The anchoring model should work across paginated and non-paginated documents.

PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.
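As an example of shared concepts across paginated and non-paginated formats, a PDF can keep one canonical text stream and derive page-local offsets from it. This sketch makes two assumptions. First, the canonical text is the page texts concatenated in order with no separators. Second, per-page text lengths are known. Neither is the actual normalization contract, which is maintained in the umbrella repo.

```typescript
interface PageLocalPosition {
  page: number; // 1-based page number
  offset: number; // offset within that page's text
}

// Converts a canonical (document-wide) offset into a page-local position.
// pageLengths[i] is assumed to be the text length of page i + 1.
function toPageLocal(globalOffset: number, pageLengths: readonly number[]): PageLocalPosition {
  if (globalOffset < 0) throw new RangeError("negative offset");
  let remaining = globalOffset;
  for (let i = 0; i < pageLengths.length; i++) {
    if (remaining < pageLengths[i]) return { page: i + 1, offset: remaining };
    remaining -= pageLengths[i];
  }
  throw new RangeError(`offset ${globalOffset} is past the end of the document`);
}

// Inverse mapping: page-local position back to a canonical offset.
function toGlobal(pos: PageLocalPosition, pageLengths: readonly number[]): number {
  if (pos.page < 1 || pos.page > pageLengths.length) throw new RangeError(`no page ${pos.page}`);
  let offset = 0;
  for (let i = 0; i < pos.page - 1; i++) offset += pageLengths[i];
  return offset + pos.offset;
}
```

This maps start offsets only. An exclusive end offset that falls exactly on a page boundary needs its own rule, for example attributing it to the end of the previous page.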

Confidence Over Certainty

Anchor resolution should return a confidence level, not just success or failure.

Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.

The API should make this explicit.
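Here is a sketch of making uncertainty explicit. Given scored candidates, it returns one of the statuses used by `AnchorResolution` instead of a boolean. All thresholds are hypothetical: 0.5 as the floor for a viable candidate, 0.9 for a confident match, and a 0.1 margin for ambiguity. The `sourceChanged` flag stands in for the result of a document identity check.

```typescript
type ResolutionStatus = "resolved" | "ambiguous" | "unresolved" | "stale";

interface ScoredCandidate {
  start: number;
  end: number;
  score: number; // 0..1, higher is better
}

// Hypothetical thresholds for illustration only.
const CANDIDATE_MIN = 0.5;
const RESOLVED_MIN = 0.9;
const AMBIGUITY_MARGIN = 0.1;

function classify(
  candidates: ScoredCandidate[],
  sourceChanged: boolean
): { status: ResolutionStatus; confidence: number; candidates: ScoredCandidate[] } {
  const viable = candidates
    .filter((c) => c.score >= CANDIDATE_MIN)
    .sort((a, b) => b.score - a.score);
  if (viable.length === 0) return { status: "unresolved", confidence: 0, candidates: [] };
  const [best, second] = viable;
  // Two close-scoring candidates, or one uncertain match: ask for confirmation.
  if (second !== undefined && best.score - second.score < AMBIGUITY_MARGIN) {
    return { status: "ambiguous", confidence: best.score, candidates: viable };
  }
  if (best.score < RESOLVED_MIN) {
    return { status: "ambiguous", confidence: best.score, candidates: [best] };
  }
  // A confident match in a revised source is reported as stale for review.
  return { status: sourceChanged ? "stale" : "resolved", confidence: best.score, candidates: [best] };
}
```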

Human Confirmation for Ambiguity

When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.

Preserve the Quote

Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.

No Silent Misleading Match

It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.
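Fuzzy matching is where a wrong highlight is most likely, so a fuzzy matcher should refuse weak matches rather than return the best of a bad lot. This sketch slides a window the length of the quote across the text and scores each window by normalized Levenshtein similarity. It reports nothing below a hypothetical 0.8 threshold. The approach is illustrative and runs in O(n·m²); a production matcher would likely use a faster technique such as a bitap-style search.

```typescript
// Classic dynamic-programming edit distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

interface FuzzyMatch {
  start: number;
  end: number;
  similarity: number; // 1 - editDistance / quote.length
}

// Returns the most similar same-length window, or null when nothing reaches
// minSimilarity, so the caller reports "unresolved" instead of guessing.
function fuzzyFind(text: string, quote: string, minSimilarity = 0.8): FuzzyMatch | null {
  if (quote.length === 0 || text.length < quote.length) return null;
  let best: FuzzyMatch | null = null;
  for (let start = 0; start + quote.length <= text.length; start++) {
    const candidate = text.slice(start, start + quote.length);
    const similarity = 1 - levenshtein(candidate, quote) / quote.length;
    if (similarity >= minSimilarity && (best === null || similarity > best.similarity)) {
      best = { start, end: start + quote.length, similarity };
    }
  }
  return best;
}
```

Returning only the best window hides near-ties. A fuller version would also return close runners-up so the caller can flag the result as ambiguous.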

Initial Selector Types

The first version should support or prepare for these selector concepts:

TextQuoteSelector
  exact selected text plus prefix and suffix context

TextPositionSelector
  canonical start and end offsets in normalized text

PdfRectSelector
  page number and normalized page rectangles

PdfPageTextSelector
  page number plus page-local text offsets

DomRangeSelector
  DOM path and range offsets for rendered HTML

StructuralSelector
  heading, section, block, or AST path information

FragmentSelector
  optional exported fragment or deep-link representation
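TextPositionSelector offsets only stay meaningful if every subsystem normalizes text the same way. The real canonical normalization rules are a shared contract maintained in the umbrella repo. This sketch assumes a simple stand-in rule: collapse whitespace runs to one space and trim both ends. It also keeps a map from normalized offsets back to raw offsets, so a range resolved in normalized text can still be highlighted in the original.

```typescript
interface NormalizedText {
  text: string;
  rawIndex: number[]; // rawIndex[i] = raw offset of normalized character i
}

// Illustrative rule only, not the shared canonical normalization contract.
function normalizeWhitespace(raw: string): NormalizedText {
  let text = "";
  const rawIndex: number[] = [];
  let pendingSpaceAt = -1; // raw offset where the current whitespace run began
  for (let i = 0; i < raw.length; i++) {
    if (/\s/.test(raw[i])) {
      // Leading whitespace is dropped; later runs become one pending space.
      if (text.length > 0 && pendingSpaceAt < 0) pendingSpaceAt = i;
      continue;
    }
    if (pendingSpaceAt >= 0) {
      text += " ";
      rawIndex.push(pendingSpaceAt);
      pendingSpaceAt = -1;
    }
    text += raw[i];
    rawIndex.push(i);
  }
  // A pending space at the end is never emitted, which trims trailing whitespace.
  return { text, rawIndex };
}

// Maps a normalized [start, end) range back to a raw [start, end) range.
function toRawRange(n: NormalizedText, start: number, end: number): [number, number] {
  if (start < 0 || end > n.text.length || start >= end) throw new RangeError("bad range");
  return [n.rawIndex[start], n.rawIndex[end - 1] + 1];
}
```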

The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.
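Alignment with W3C concepts can be confined to an export boundary, so internal logic never has to be expressed in JSON-LD. This sketch maps hypothetical internal selector shapes to the W3C Web Annotation `TextQuoteSelector` (`exact`, optional `prefix` and `suffix`) and `TextPositionSelector` (`start`, `end`). It simply skips an internal PDF rectangle selector, on the assumption that it has no direct W3C counterpart here.

```typescript
// Hypothetical internal selector shapes; the real types live in citation-engine.
type InternalSelector =
  | { kind: "quote"; exact: string; prefix: string; suffix: string }
  | { kind: "position"; start: number; end: number }
  | { kind: "pdfRect"; page: number; rects: [number, number, number, number][] };

// Subset of W3C Web Annotation selector JSON used for export.
interface W3CTextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}
interface W3CTextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}
type W3CSelector = W3CTextQuoteSelector | W3CTextPositionSelector;

function toW3CSelectors(selectors: InternalSelector[]): W3CSelector[] {
  const out: W3CSelector[] = [];
  for (const s of selectors) {
    switch (s.kind) {
      case "quote": {
        const q: W3CTextQuoteSelector = { type: "TextQuoteSelector", exact: s.exact };
        // prefix and suffix are optional in the W3C model, so empty context is omitted.
        if (s.prefix) q.prefix = s.prefix;
        if (s.suffix) q.suffix = s.suffix;
        out.push(q);
        break;
      }
      case "position":
        out.push({ type: "TextPositionSelector", start: s.start, end: s.end });
        break;
      case "pdfRect":
        break; // kept internal in this sketch
    }
  }
  return out;
}
```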

Anchor Resolution Strategy

A typical resolution pipeline should be:

1. Check document and representation identity.
2. Try exact position-based selectors.
3. Verify resolved text against the stored quote.
4. Try PDF page and rectangle selectors where applicable.
5. Try text quote matching with prefix and suffix.
6. Try structural or heading context.
7. Try fuzzy quote matching.
8. Rank candidate matches by confidence.
9. Return resolved, ambiguous, stale, or unresolved status.
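The pipeline can be sketched as a list of strategies tried in order, where the first strategy that yields any candidate ends the search. This simplified version covers step 1 (with a revision string as an assumed identity check), steps 2 and 3 (offsets trusted only if the text still matches the quote), and a bare form of step 5 (exact quote occurrences). Steps 4, 6, and 7 and the prefix/suffix disambiguation are omitted. The names `StoredAnchor`, `Strategy`, `byPosition`, and `byQuote` and the fixed scores are hypothetical.

```typescript
interface Candidate {
  start: number;
  end: number;
  score: number;
  via: string;
}

interface StoredAnchor {
  revision: string; // assumed identifier of the source revision at capture time
  quote: string;
  start?: number; // optional TextPositionSelector offsets
  end?: number;
}

type Strategy = (text: string, anchor: StoredAnchor) => Candidate[];

// Steps 2-3: use stored offsets only if they still cover the stored quote.
const byPosition: Strategy = (text, a) =>
  a.start !== undefined && a.end !== undefined && text.slice(a.start, a.end) === a.quote
    ? [{ start: a.start, end: a.end, score: 1, via: "position" }]
    : [];

// Step 5 (simplified): every exact occurrence of the quote is a candidate.
const byQuote: Strategy = (text, a) => {
  const hits: Candidate[] = [];
  if (a.quote.length === 0) return hits;
  for (let i = text.indexOf(a.quote); i >= 0; i = text.indexOf(a.quote, i + 1)) {
    hits.push({ start: i, end: i + a.quote.length, score: 0.9, via: "quote" });
  }
  return hits;
};

function resolve(
  currentRevision: string,
  text: string,
  anchor: StoredAnchor,
  strategies: Strategy[] = [byPosition, byQuote]
): { status: "resolved" | "ambiguous" | "unresolved" | "stale"; candidates: Candidate[] } {
  // Step 1: a changed revision means a match must be reviewed as stale.
  const sameRevision = currentRevision === anchor.revision;
  for (const strategy of strategies) {
    const found = strategy(text, anchor).sort((x, y) => y.score - x.score);
    if (found.length === 1) return { status: sameRevision ? "resolved" : "stale", candidates: found };
    if (found.length > 1) return { status: "ambiguous", candidates: found };
  }
  return { status: "unresolved", candidates: [] };
}
```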

The caller should receive enough information to decide whether to:

highlight the passage,
ask the user to confirm,
mark the citation as stale,
manually reattach the annotation.
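On the calling side, the status can drive these decisions directly. This sketch restates a reduced form of the `AnchorResolution` shape from Core Interfaces so that it stands alone. The action names are hypothetical. Only a resolved result with exactly one candidate leads to a highlight.

```typescript
type Status = "resolved" | "ambiguous" | "unresolved" | "stale";

interface ResolutionSummary {
  status: Status;
  confidence: number;
  candidateCount: number;
}

type CallerAction = "highlight" | "confirm-with-user" | "mark-stale" | "reattach-manually";

// Anything short of a single resolved target avoids highlighting.
function nextAction(r: ResolutionSummary): CallerAction {
  switch (r.status) {
    case "resolved":
      return r.candidateCount === 1 ? "highlight" : "confirm-with-user";
    case "ambiguous":
      return "confirm-with-user";
    case "stale":
      return "mark-stale";
    case "unresolved":
      return "reattach-manually";
  }
}
```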

Core Interfaces

The repository should eventually provide interfaces similar to:

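interface AnchorAdapter {
  // Turns a captured viewer selection into redundant selectors.
  createSelectors(selection: SelectionCapture): Promise<Selector[]>;

  // Resolves stored selectors against a document representation and
  // reports a status and confidence rather than plain success or failure.
  resolveSelectors(
    representation: DocumentRepresentation,
    selectors: Selector[]
  ): Promise<AnchorResolution>;

  // Viewer-specific rendering of a resolved target.
  renderHighlight(
    target: ResolvedAnchorTarget,
    options?: HighlightRenderOptions
  ): Promise<void>;

  // Brings a resolved target into the viewport.
  scrollToTarget(
    target: ResolvedAnchorTarget,
    options?: ScrollToTargetOptions
  ): Promise<void>;
}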

Resolution should return structured results:

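type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  // Plausible targets; more than one is expected when status is "ambiguous".
  candidates: ResolvedAnchorTarget[];
  // Selector types that produced or verified the result.
  usedSelectorTypes: string[];
  warnings?: string[];
};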

These interfaces should remain implementation-neutral enough to work with different viewers and document formats.

Expected Dependencies

This repository is expected to depend on shared types from:

citation-engine
  annotation, selector, document representation, and evidence-related contracts

It may be consumed by:

citation-work
  to create anchors from user selections and reopen evidence

evidence-source
  to create anchors from citation recovery results

evidence-binder
  to resolve active evidence linked to form fields or claims

citation-evidence
  to integrate the complete user experience

It should avoid depending on application-level UI repositories.

First Useful Version

A first useful version of evidence-anchor should provide:

TypeScript selector types,
a text quote selector model,
a text position selector model,
a PDF rectangle selector model,
an anchor resolution result model,
a simple text quote matching function,
a simple text position resolver,
a viewer adapter contract,
enough mock/test data to prove that an annotation can be resolved back into a document representation.
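A simple text quote matching function of this kind might find every exact occurrence of the quote and rank each one by how much of the stored prefix and suffix surrounds it. The scoring formula is an assumption: a bare exact match scores 0.5, and fully matching context raises it to 1. It illustrates how context separates repeated quotes.

```typescript
interface QuoteSelector {
  exact: string;
  prefix: string;
  suffix: string;
}

interface QuoteMatch {
  start: number;
  end: number;
  score: number;
}

// Length of the longest common ending of a and b.
function commonSuffixLength(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a[a.length - 1 - n] === b[b.length - 1 - n]) n++;
  return n;
}

// Length of the longest common beginning of a and b.
function commonPrefixLength(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

// Finds all exact occurrences of sel.exact, best-supported by context first.
function matchTextQuote(text: string, sel: QuoteSelector): QuoteMatch[] {
  const matches: QuoteMatch[] = [];
  if (sel.exact.length === 0) return matches;
  const contextLength = sel.prefix.length + sel.suffix.length;
  for (let i = text.indexOf(sel.exact); i >= 0; i = text.indexOf(sel.exact, i + 1)) {
    const end = i + sel.exact.length;
    const before = text.slice(Math.max(0, i - sel.prefix.length), i);
    const after = text.slice(end, end + sel.suffix.length);
    const agreed = commonSuffixLength(before, sel.prefix) + commonPrefixLength(after, sel.suffix);
    const score = contextLength === 0 ? 0.5 : 0.5 + 0.5 * (agreed / contextLength);
    matches.push({ start: i, end, score });
  }
  return matches.sort((a, b) => b.score - a.score);
}
```

For test data, the repeated word in "the cat sat on the mat" is a compact fixture. With prefix "on " and suffix " mat", the second "the" scores 1 and the first scores lower.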

The first version does not need to support every document format, but it should establish the selector redundancy pattern.

Success Criteria

The repository is successful when another subsystem can use it to:

capture a document selection,
create one or more selectors,
store those selectors through citation-engine,
later resolve them against a document representation,
receive a confidence-scored target,
scroll to and highlight that target through a viewer adapter,
detect when the target is ambiguous or unresolved.

A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.

Repository Character

This repository should be:

technically focused,
algorithm-friendly,
format-neutral,
viewer-independent,
strongly typed,
explicit about uncertainty,
careful about false positives,
reusable across multiple user interfaces,
small enough to remain a precise anchoring layer.

MVP Coordination — Code Lives Upstream

During the umbrella-first MVP phase (decided 2026-05-24), the source code for this subsystem does not live in this repository yet. It lives in the umbrella repo at citation-evidence/src/anchor/.

This INTENT.md documents the intended responsibilities and boundaries. Once the anchoring interfaces have stabilized through actual MVP use, in particular after the PDF selector round-trip has been validated end-to-end, the corresponding code will be extracted into this repository.

Shared contracts (selector taxonomy, viewer adapter contract, canonical text normalization, resolution result shape, allowed dependency edges) are maintained in the umbrella repo:

citation-evidence/wiki/SharedContracts.md
citation-evidence/wiki/DependencyMap.md
citation-evidence/docs/decisions/ (ADRs)

This subsystem's eventual code must not contradict those documents. Changes to shared contracts happen in the umbrella, not here.

Under the dependency map, evidence-anchor may depend only on citation-engine. Selector types live in the engine; selector creation, resolution, fuzzy matching, and highlight rendering live here.

Guiding Statement

evidence-anchor exists to keep cited evidence attached to its source context.
