generated from coulomb/repo-seed
Documents the umbrella-first MVP decision (2026-05-24). This repo remains INTENT-only until the anchoring interfaces stabilize through real product use (in particular, after PDF selector round-trip is validated end-to-end). Reaffirms: anchor depends only on engine; selector types in engine, selector algorithms here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# INTENT

## Purpose

This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the **citation-evidence** ecosystem.

**evidence-anchor** makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.

It is responsible for answering the central technical question:

> Given a citation or evidence item, how do we find and highlight the cited source passage again?

---

## Primary Utility

The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.

It should make it possible to:

- capture a selected passage from a document viewer,
- create redundant selectors for that passage,
- resolve selectors back into document context,
- scroll the cited passage into view,
- highlight the cited passage,
- detect unresolved or stale citations,
- support fallback and fuzzy re-anchoring strategies,
- work across PDFs, Markdown, HTML, and later other document formats.

This repository turns annotations from static marks into navigable source references.

---

## Intended Users

Primary users of this repository are developers and agents implementing citation-evidence subsystems.

They include:

- developers building document viewers,
- developers building the review workspace,
- developers implementing evidence-backed forms,
- developers implementing citation recovery,
- developers building source ingestion and document representation pipelines,
- coding agents that need to understand how citations remain connected to source passages.

End users should experience this repository indirectly whenever they click a citation and the correct source context opens.

---

## Strategic Role

The strategic role of **evidence-anchor** is to protect the citation-evidence system from fragile, viewer-specific annotations.

Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:

- a PDF is zoomed or resized,
- a document is rendered differently,
- an HTML page changes layout,
- a Markdown document is re-rendered,
- a source document is lightly revised,
- a citation is exported and later reopened.

**evidence-anchor** provides the durable technical foundation that makes source-backed evidence trustworthy over time.

---

## Core Concept

The core concept of this repository is the **anchor**.

An anchor is a resolvable reference to a passage or region in a document.

An anchor is not one thing. It should usually be represented by several complementary selectors.

For example, a PDF anchor may include:

```text
exact quote
prefix and suffix context
canonical text offsets
page number
page-local offsets
normalized page rectangles
text item references
```

An HTML or Markdown anchor may include:

```text
exact quote
prefix and suffix context
canonical text offsets
DOM range
structural path
heading or section context
```

The system should use the strongest available selectors first and fall back to more flexible selectors when necessary.

---

## Scope

This repository should own:

* selector type definitions related to anchoring,
* selector creation from captured selections,
* text quote selector support,
* text position selector support,
* PDF page and rectangle selector support,
* DOM range and structural selector support,
* selector resolution strategies,
* fuzzy re-anchoring strategies,
* anchor confidence scoring,
* ambiguous match handling,
* orphaned annotation detection,
* highlight rendering contracts,
* scroll-to-anchor contracts,
* format-neutral anchor adapter interfaces.

It should provide the technical layer used by:

* **citation-work** when a user selects text and creates an annotation,
* **evidence-binder** when a field focuses linked evidence,
* **evidence-source** when citation recovery finds a passage,
* **citation-engine** when annotations need selector and resolution semantics,
* **citation-evidence** when the integrated workspace reopens source context.

---

## Out of Scope

This repository should not own the broader product or domain concerns.

Specifically, it should not own:

* the canonical evidence domain model,
* persistence policy,
* citation card rendering,
* document ingestion pipelines,
* metadata extraction,
* OCR processing,
* external source lookup,
* document collection review UI,
* form-field binding semantics,
* visual guide overlay UI,
* application shell and deployment.

Those responsibilities belong to the appropriate citation-evidence subsystem repositories.

---

## Architectural Position

```text
citation-evidence
  integrated product shell

citation-engine
  core domain model, services, persistence contracts

evidence-anchor
  selectors, anchor creation, resolution, re-anchoring, highlighting contracts

evidence-source
  document ingestion and document representations

citation-work
  review workspace and annotation UX

evidence-binder
  evidence-to-target binding and active evidence state
```

**evidence-anchor** should operate close to document representations and viewers, but it should not become a viewer implementation itself.

It should define the anchoring contract and provide reusable anchoring logic.

---

## Design Principles

### Selector Redundancy

A durable citation should not rely on a single selector.

Where possible, the system should store multiple selectors that can support each other:

```text
visual selectors
text selectors
structural selectors
context selectors
```

### Viewer Independence

Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.

Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.

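
One way to keep a rectangle selector portable across zoom levels and viewport sizes is to store coordinates as fractions of the page box and convert them to pixels only at render time. The sketch below illustrates that idea. The `NormalizedRect` and `PixelRect` shapes, the function names, and the top-left origin are assumptions made for this example, not part of the shared contracts.

```ts
/** Hypothetical shape: every value is a fraction of the page box, origin at top-left. */
interface NormalizedRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

/** Pixel rectangle in the coordinate space of one rendered page. */
interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

/** Converts a rectangle captured at some render size into page fractions. */
function toNormalizedRect(rect: PixelRect, pageWidthPx: number, pageHeightPx: number): NormalizedRect {
  return {
    x: rect.left / pageWidthPx,
    y: rect.top / pageHeightPx,
    width: rect.width / pageWidthPx,
    height: rect.height / pageHeightPx,
  };
}

/** Projects a stored rectangle onto a page rendered at any size. */
function toPixelRect(rect: NormalizedRect, pageWidthPx: number, pageHeightPx: number): PixelRect {
  return {
    left: rect.x * pageWidthPx,
    top: rect.y * pageHeightPx,
    width: rect.width * pageWidthPx,
    height: rect.height * pageHeightPx,
  };
}

// Captured on a page rendered at 600x800 px, reopened later at twice that size.
const storedRect = toNormalizedRect({ left: 60, top: 200, width: 300, height: 20 }, 600, 800);
const zoomedRect = toPixelRect(storedRect, 1200, 1600);
console.log(storedRect, zoomedRect);
```

Only the viewer adapter needs to know the current render size. The stored selector stays the same regardless of how the page is displayed.
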
### Format Neutrality

The anchoring model should work across paginated and non-paginated documents.

PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.

### Confidence Over Certainty

Anchor resolution should return a confidence level, not just success or failure.

Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.

The API should make this explicit.

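
As a rough illustration, ranked candidate scores could be mapped to one of the four outcomes named above instead of a boolean. Everything in this sketch is an assumption: the `classifyMatch` helper, the `ScoredCandidate` shape, and the threshold values are illustrative and would need tuning against real documents.

```ts
type MatchQuality = "exact" | "probable" | "ambiguous" | "unresolved";

interface ScoredCandidate {
  start: number; // offset in canonical text
  end: number;
  confidence: number; // 0..1, higher is better
}

// Illustrative thresholds only; none of these values come from the shared contracts.
const EXACT_THRESHOLD = 0.99;
const PROBABLE_THRESHOLD = 0.75;
const AMBIGUITY_MARGIN = 0.05;

/** Maps a list of scored candidates to an explicit match quality. */
function classifyMatch(candidates: ScoredCandidate[]): MatchQuality {
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const best = ranked[0];
  if (best === undefined || best.confidence < PROBABLE_THRESHOLD) {
    return "unresolved";
  }
  const runnerUp = ranked[1];
  // Two candidates that score almost the same cannot be told apart safely.
  if (runnerUp !== undefined && best.confidence - runnerUp.confidence < AMBIGUITY_MARGIN) {
    return "ambiguous";
  }
  return best.confidence >= EXACT_THRESHOLD ? "exact" : "probable";
}

console.log(classifyMatch([{ start: 10, end: 30, confidence: 0.9 }, { start: 80, end: 100, confidence: 0.88 }]));
```
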
### Human Confirmation for Ambiguity

When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.

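
To make the fuzzy case concrete, here is a minimal sketch of a matcher that returns every plausible location with a similarity score, so a caller can ask for confirmation when more than one survives. It slides a window of the quote's length over the text and scores each window by Levenshtein distance. The function names, the fixed window width, and the 0.8 threshold are assumptions for this example, and a production matcher would likely need something faster and tolerant of length changes.

```ts
/** Classic dynamic-programming Levenshtein distance between two strings. */
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

interface FuzzyCandidate {
  start: number;
  end: number;
  similarity: number; // 1 means identical to the stored quote
}

/** Returns non-overlapping windows that resemble the quote, best first. */
function findFuzzyCandidates(docText: string, quote: string, minSimilarity = 0.8): FuzzyCandidate[] {
  const width = quote.length;
  const best: FuzzyCandidate[] = [];
  if (width === 0) return best;
  for (let start = 0; start + width <= docText.length; start++) {
    const similarity = 1 - levenshtein(quote, docText.slice(start, start + width)) / width;
    if (similarity < minSimilarity) continue;
    const hit: FuzzyCandidate = { start, end: start + width, similarity };
    const last = best[best.length - 1];
    if (last !== undefined && hit.start < last.end) {
      // Overlapping windows describe the same passage: keep only the better one.
      if (hit.similarity > last.similarity) best[best.length - 1] = hit;
    } else {
      best.push(hit);
    }
  }
  return best.sort((a, b) => b.similarity - a.similarity);
}

// The source was lightly revised ("12" became "13"), so no exact match exists.
const revisedCandidates = findFuzzyCandidates("Revenue grew by 13 percent in 2023.", "grew by 12 percent");
// The same sentence appears twice, so two equally good candidates come back.
const repeatedCandidates = findFuzzyCandidates("the cat sat. the cat sat.", "the cat sat");
console.log(revisedCandidates, repeatedCandidates);
```

When two or more candidates come back with near-identical scores, as in the second call, the caller has no basis for choosing and should hand the decision to the user.
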
### Preserve the Quote

Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.

### No Silent Misleading Match

It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.

---

## Initial Selector Types

The first version should support or prepare for these selector concepts:

```text
TextQuoteSelector
  exact selected text plus prefix and suffix context

TextPositionSelector
  canonical start and end offsets in normalized text

PdfRectSelector
  page number and normalized page rectangles

PdfPageTextSelector
  page number plus page-local text offsets

DomRangeSelector
  DOM path and range offsets for rendered HTML

StructuralSelector
  heading, section, block, or AST path information

FragmentSelector
  optional exported fragment or deep-link representation
```

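
Several of these concepts could be expressed as a TypeScript discriminated union, as sketched below for four of the seven types. This is an assumption about shape only. The field names `exact`, `prefix`, `suffix`, `start`, and `end` follow the W3C Web Annotation Data Model, while `pageNumber`, `rects`, the PDF `type` tags, and the `AnchorSelector` and `describeSelector` names are hypothetical. The sample values are made up.

```ts
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive offset in canonical text
  end: number; // exclusive offset in canonical text
}

interface PdfRectSelector {
  type: "PdfRectSelector";
  pageNumber: number;
  rects: { x: number; y: number; width: number; height: number }[]; // page fractions
}

interface PdfPageTextSelector {
  type: "PdfPageTextSelector";
  pageNumber: number;
  start: number; // page-local offsets
  end: number;
}

type AnchorSelector =
  | TextQuoteSelector
  | TextPositionSelector
  | PdfRectSelector
  | PdfPageTextSelector;

/** Exhaustive switch: the compiler flags this function when a new selector type is added. */
function describeSelector(selector: AnchorSelector): string {
  switch (selector.type) {
    case "TextQuoteSelector":
      return `quote "${selector.exact}"`;
    case "TextPositionSelector":
      return `offsets ${selector.start}-${selector.end}`;
    case "PdfRectSelector":
      return `page ${selector.pageNumber}, ${selector.rects.length} rect(s)`;
    case "PdfPageTextSelector":
      return `page ${selector.pageNumber}, offsets ${selector.start}-${selector.end}`;
    default: {
      const unreachable: never = selector;
      return unreachable;
    }
  }
}

// One PDF anchor carried by three redundant selectors (illustrative values).
const pdfAnchor: AnchorSelector[] = [
  { type: "TextQuoteSelector", exact: "net revenue increased", prefix: "In 2023, ", suffix: " by 12%" },
  { type: "TextPositionSelector", start: 1042, end: 1063 },
  { type: "PdfRectSelector", pageNumber: 3, rects: [{ x: 0.12, y: 0.4, width: 0.5, height: 0.02 }] },
];
console.log(pdfAnchor.map(describeSelector));
```
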
The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.

---

## Anchor Resolution Strategy

A typical resolution pipeline should be:

```text
1. Check document and representation identity.
2. Try exact position-based selectors.
3. Verify resolved text against the stored quote.
4. Try PDF page and rectangle selectors where applicable.
5. Try text quote matching with prefix and suffix.
6. Try structural or heading context.
7. Try fuzzy quote matching.
8. Rank candidate matches by confidence.
9. Return resolved, ambiguous, stale, or unresolved status.
```

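
The ordering above can be sketched as a list of strategies tried from strongest to most flexible. The example below covers only steps 2, 3, 5, 8, and 9, and only on plain text. The `Strategy` type, the `runPipeline`, `byPosition`, and `byExactQuote` names, and the confidence values are all assumptions for illustration. The `stale` status is left out because this sketch has no notion of document versions.

```ts
interface Candidate {
  start: number;
  end: number;
  confidence: number;
  via: string; // which strategy produced the candidate
}

/** A strategy inspects the text and returns zero or more scored candidates. */
type Strategy = (docText: string) => Candidate[];

interface PipelineResult {
  status: "resolved" | "ambiguous" | "unresolved";
  candidates: Candidate[];
}

/** Steps 2-3: use stored offsets, but only if the text there still equals the quote. */
const byPosition = (start: number, end: number, quote: string): Strategy => (docText) =>
  docText.slice(start, end) === quote ? [{ start, end, confidence: 1, via: "position" }] : [];

/** Step 5, simplified: every exact occurrence of the quote, without context scoring. */
const byExactQuote = (quote: string): Strategy => (docText) => {
  const found: Candidate[] = [];
  if (quote.length === 0) return found;
  for (let i = docText.indexOf(quote); i !== -1; i = docText.indexOf(quote, i + 1)) {
    found.push({ start: i, end: i + quote.length, confidence: 0.9, via: "quote" });
  }
  return found;
};

/** Tries strategies in order; the first one that yields candidates decides the result. */
function runPipeline(docText: string, strategies: Strategy[]): PipelineResult {
  for (const strategy of strategies) {
    // Step 8: rank candidates by confidence.
    const candidates = strategy(docText).sort((a, b) => b.confidence - a.confidence);
    if (candidates.length === 0) continue;
    // Step 9: equally good candidates are reported as ambiguous, never silently resolved.
    const tie = candidates.length > 1 && candidates[0].confidence === candidates[1].confidence;
    return { status: tie ? "ambiguous" : "resolved", candidates };
  }
  return { status: "unresolved", candidates: [] };
}

// The stored offsets (0-15) predate an inserted "Intro. ", so the position check fails
// and the quote strategy takes over.
const revisedText = "Intro. Fees are waived. Later: fees are waived.";
const outcome = runPipeline(revisedText, [
  byPosition(0, 15, "Fees are waived"),
  byExactQuote("Fees are waived"),
]);
console.log(outcome);
```
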
The caller should receive enough information to decide whether to:

* highlight the passage,
* ask the user to confirm,
* mark the citation as stale,
* manually reattach the annotation.

---

## Core Interfaces

The repository should eventually provide interfaces similar to:

```ts
interface AnchorAdapter {
  createSelectors(selection: SelectionCapture): Promise<Selector[]>;

  resolveSelectors(
    representation: DocumentRepresentation,
    selectors: Selector[]
  ): Promise<AnchorResolution>;

  renderHighlight(
    target: ResolvedAnchorTarget,
    options?: HighlightRenderOptions
  ): Promise<void>;

  scrollToTarget(
    target: ResolvedAnchorTarget,
    options?: ScrollToTargetOptions
  ): Promise<void>;
}
```

Resolution should return structured results:

```ts
type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};
```

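
On the caller side, the `status` field can drive the next step directly. The sketch below shows one possible mapping. It rests on several assumptions: `ResolvedAnchorTarget` is reduced to a placeholder `{ id }` shape, the `CallerAction` variants and the `decideAction` helper are invented for illustration, and pairing each status with exactly one action is a simplification.

```ts
// Placeholder for the real target type, which this document does not define.
type ResolvedAnchorTarget = { id: string };

type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};

// Hypothetical caller-side actions, one per decision a caller may need to take.
type CallerAction =
  | { kind: "highlight"; target: ResolvedAnchorTarget }
  | { kind: "confirm"; options: ResolvedAnchorTarget[] }
  | { kind: "markStale" }
  | { kind: "reattach" };

/** Picks the next step for a caller based on the resolution status. */
function decideAction(resolution: AnchorResolution): CallerAction {
  switch (resolution.status) {
    case "resolved": {
      const target = resolution.candidates[0];
      // A "resolved" result without a candidate is not trusted; fall back to manual reattachment.
      return target ? { kind: "highlight", target } : { kind: "reattach" };
    }
    case "ambiguous":
      return { kind: "confirm", options: resolution.candidates };
    case "stale":
      return { kind: "markStale" };
    case "unresolved":
      return { kind: "reattach" };
  }
}

const nextStep = decideAction({
  status: "ambiguous",
  confidence: 0.7,
  candidates: [{ id: "candidate-a" }, { id: "candidate-b" }],
  usedSelectorTypes: ["TextQuoteSelector"],
});
console.log(nextStep.kind);
```
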
These interfaces should remain implementation-neutral enough to work with different viewers and document formats.

---

## Expected Dependencies

This repository is expected to depend on shared types from:

```text
citation-engine
  annotation, selector, document representation, and evidence-related contracts
```

It may be consumed by:

```text
citation-work
  to create anchors from user selections and reopen evidence

evidence-source
  to create anchors from citation recovery results

evidence-binder
  to resolve active evidence linked to form fields or claims

citation-evidence
  to integrate the complete user experience
```

It should avoid depending on application-level UI repositories.

---

## First Useful Version

A first useful version of **evidence-anchor** should provide:

* TypeScript selector types,
* a text quote selector model,
* a text position selector model,
* a PDF rectangle selector model,
* an anchor resolution result model,
* a simple text quote matching function,
* a simple text position resolver,
* a viewer adapter contract,
* enough mock/test data to prove that an annotation can be resolved back into a document representation.

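
To give a sense of scale, a simple text quote matching function can be quite small. The sketch below finds every exact occurrence of the quote and ranks the occurrences by how much of the stored prefix and suffix still surrounds them. The `QuoteSelector` and `QuoteMatch` shapes, the `matchTextQuote` name, and the scoring formula are assumptions for this example.

```ts
interface QuoteSelector {
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface QuoteMatch {
  start: number;
  end: number;
  contextScore: number; // share of stored context characters that still match, 0..1
}

/** Number of characters shared at the end of both strings. */
function commonSuffixLength(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a[a.length - 1 - n] === b[b.length - 1 - n]) n++;
  return n;
}

/** Number of characters shared at the start of both strings. */
function commonPrefixLength(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

/** Finds all exact occurrences of the quote and ranks them by surrounding context. */
function matchTextQuote(docText: string, selector: QuoteSelector): QuoteMatch[] {
  const { exact, prefix = "", suffix = "" } = selector;
  const matches: QuoteMatch[] = [];
  if (exact.length === 0) return matches;
  const contextLength = prefix.length + suffix.length;
  for (let start = docText.indexOf(exact); start !== -1; start = docText.indexOf(exact, start + 1)) {
    const end = start + exact.length;
    const before = docText.slice(Math.max(0, start - prefix.length), start);
    const after = docText.slice(end, end + suffix.length);
    const matched = commonSuffixLength(before, prefix) + commonPrefixLength(after, suffix);
    // Without stored context every occurrence scores the same, which leaves the tie visible.
    matches.push({ start, end, contextScore: contextLength === 0 ? 1 : matched / contextLength });
  }
  return matches.sort((a, b) => b.contextScore - a.contextScore);
}

// The quote occurs twice; the stored context identifies the second occurrence.
const quoteMatches = matchTextQuote(
  "Section A: the limit is 10 mg. Section B: the limit is 20 mg.",
  { exact: "the limit is", prefix: "Section B: ", suffix: " 20 mg" }
);
console.log(quoteMatches);
```
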
The first version does not need to support every document format, but it should establish the selector redundancy pattern.

---

## Success Criteria

The repository is successful when another subsystem can use it to:

1. capture a document selection,
2. create one or more selectors,
3. store those selectors through **citation-engine**,
4. later resolve them against a document representation,
5. receive a confidence-scored target,
6. scroll to and highlight that target through a viewer adapter,
7. detect when the target is ambiguous or unresolved.

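
Steps 1, 2, 4, and 7 can be exercised on plain text without any viewer, which may be a useful first test fixture. The sketch below creates redundant selectors from a selection and shows the position selector failing safely once the source changes. The `createTextSelectors` and `resolveByPosition` names, the `CreatedSelectors` shape, and the 32-character context window are assumptions. Storage through **citation-engine** (step 3) and viewer interaction (step 6) are left out.

```ts
interface TextSelection {
  start: number;
  end: number;
}

interface CreatedSelectors {
  quote: { exact: string; prefix: string; suffix: string };
  position: { start: number; end: number };
}

// Assumed amount of surrounding context to store on each side of the quote.
const CONTEXT_CHARS = 32;

/** Step 2: derive redundant selectors (quote plus position) from one selection. */
function createTextSelectors(docText: string, selection: TextSelection): CreatedSelectors {
  const { start, end } = selection;
  if (start < 0 || end > docText.length || start >= end) {
    throw new RangeError("selection is empty or outside the document");
  }
  return {
    quote: {
      exact: docText.slice(start, end),
      prefix: docText.slice(Math.max(0, start - CONTEXT_CHARS), start),
      suffix: docText.slice(end, end + CONTEXT_CHARS),
    },
    position: { start, end },
  };
}

/** Step 4: resolve by offsets, accepting the result only if the text still equals the quote. */
function resolveByPosition(docText: string, selectors: CreatedSelectors): TextSelection | null {
  const { start, end } = selectors.position;
  return docText.slice(start, end) === selectors.quote.exact ? { start, end } : null;
}

const originalText = "Payment is due within 30 days of the invoice date.";
const createdSelectors = createTextSelectors(originalText, { start: 22, end: 29 });

// Unchanged document: the position selector resolves directly.
const sameDocument = resolveByPosition(originalText, createdSelectors);

// Revised document: the offsets now point at different text, so the position selector
// reports failure (step 7) instead of highlighting the wrong passage.
const revisedDocument = "Note: " + originalText;
const afterRevision = resolveByPosition(revisedDocument, createdSelectors);

// The stored quote is still present, so a quote-based fallback could recover it.
const quoteStillAt = revisedDocument.indexOf(createdSelectors.quote.exact);
console.log(sameDocument, afterRevision, quoteStillAt);
```
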
A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.

---

## Repository Character

This repository should be:

* technically focused,
* algorithm-friendly,
* format-neutral,
* viewer-independent,
* strongly typed,
* explicit about uncertainty,
* careful about false positives,
* reusable across multiple user interfaces,
* small enough to remain a precise anchoring layer.

---

## MVP Coordination: Code Lives Upstream

During the umbrella-first MVP phase (decided 2026-05-24), **the source code
for this subsystem does not live in this repository yet**. It lives in the
umbrella repo at `citation-evidence/src/anchor/`.

This INTENT.md documents the *intended* responsibilities and boundaries.
When the anchoring interfaces have stabilized through actual MVP use (in
particular, after PDF selector round-trip has been validated end-to-end),
the corresponding code will be extracted into this repository.

**Shared contracts** (selector taxonomy, viewer adapter contract, canonical
text normalization, resolution result shape, allowed dependency edges) are
maintained in the umbrella repo:

* `citation-evidence/wiki/SharedContracts.md`
* `citation-evidence/wiki/DependencyMap.md`
* `citation-evidence/docs/decisions/` (ADRs)

This subsystem's eventual code must not contradict those documents. Changes
to shared contracts happen in the umbrella, not here.

Under the dependency map, **`evidence-anchor` may depend only on
`citation-engine`**. Selector *types* live in the engine; selector *creation,
resolution, fuzzy matching, and highlight rendering* live here.

---

## Guiding Statement

**evidence-anchor exists to keep cited evidence attached to its source context.**