generated from coulomb/repo-seed
Seeded with INTENT.md
INTENT.md
@@ -0,0 +1,409 @@
# INTENT

## Purpose

This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the **citation-evidence** ecosystem.

**evidence-anchor** makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.

It is responsible for answering the central technical question:

> Given a citation or evidence item, how do we find and highlight the cited source passage again?

---

## Primary Utility

The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.

It should make it possible to:

- capture a selected passage from a document viewer,
- create redundant selectors for that passage,
- resolve selectors back into document context,
- scroll the cited passage into view,
- highlight the cited passage,
- detect unresolved or stale citations,
- support fallback and fuzzy re-anchoring strategies,
- work across PDFs, Markdown, HTML, and later other document formats.

This repository turns annotations from static marks into navigable source references.

---

## Intended Users

Primary users of this repository are developers and agents implementing citation-evidence subsystems.

They include:

- developers building document viewers,
- developers building the review workspace,
- developers implementing evidence-backed forms,
- developers implementing citation recovery,
- developers building source ingestion and document representation pipelines,
- coding agents that need to understand how citations remain connected to source passages.

End users should experience this repository indirectly whenever they click a citation and the correct source context opens.

---

## Strategic Role

The strategic role of **evidence-anchor** is to protect the citation-evidence system from fragile, viewer-specific annotations.

Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:

- a PDF is zoomed or resized,
- a document is rendered differently,
- an HTML page changes layout,
- a Markdown document is re-rendered,
- a source document is lightly revised,
- a citation is exported and later reopened.

**evidence-anchor** provides the durable technical foundation that makes source-backed evidence trustworthy over time.

---

## Core Concept

The core concept of this repository is the **anchor**.

An anchor is a resolvable reference to a passage or region in a document.

An anchor is not one thing. It should usually be represented by several complementary selectors.

For example, a PDF anchor may include:

```text
exact quote
prefix and suffix context
canonical text offsets
page number
page-local offsets
normalized page rectangles
text item references
```

An HTML or Markdown anchor may include:

```text
exact quote
prefix and suffix context
canonical text offsets
DOM range
structural path
heading or section context
```

The system should use the strongest available selectors first and fall back to more flexible selectors when necessary.
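A minimal sketch of this strongest-first fallback, under stated assumptions: the selector kinds, their ordering in `STRENGTH_ORDER`, and the names `orderByStrength` and `resolveWithFallback` are all hypothetical and only illustrate the idea. Each stored selector is tried in order of precision, and the first one that yields a result wins, so the caller also learns which kind of selector succeeded.

```typescript
// Hypothetical selector kinds, listed from strongest (most precise) to most flexible.
const STRENGTH_ORDER = ["textPosition", "pdfRect", "textQuote", "structural"] as const;
type SelectorKind = (typeof STRENGTH_ORDER)[number];

interface StoredSelector {
  kind: SelectorKind;
}

// Return a copy of the selectors sorted strongest-first; the input is not mutated.
function orderByStrength<T extends StoredSelector>(selectors: readonly T[]): T[] {
  return [...selectors].sort(
    (a, b) => STRENGTH_ORDER.indexOf(a.kind) - STRENGTH_ORDER.indexOf(b.kind)
  );
}

// Try each selector in turn; the first attempt that produces a result wins.
// Returns null when every selector fails, so the caller can report "unresolved".
function resolveWithFallback<T extends StoredSelector, R>(
  selectors: readonly T[],
  attempt: (selector: T) => R | null
): { result: R; usedKind: SelectorKind } | null {
  for (const selector of orderByStrength(selectors)) {
    const result = attempt(selector);
    if (result !== null) {
      return { result, usedKind: selector.kind };
    }
  }
  return null;
}
```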

---

## Scope

This repository should own:

* selector type definitions related to anchoring,
* selector creation from captured selections,
* text quote selector support,
* text position selector support,
* PDF page and rectangle selector support,
* DOM range and structural selector support,
* selector resolution strategies,
* fuzzy re-anchoring strategies,
* anchor confidence scoring,
* ambiguous match handling,
* orphaned annotation detection,
* highlight rendering contracts,
* scroll-to-anchor contracts,
* format-neutral anchor adapter interfaces.

It should provide the technical layer used by:

* **citation-work** when a user selects text and creates an annotation,
* **evidence-binder** when a field focuses linked evidence,
* **evidence-source** when citation recovery finds a passage,
* **citation-engine** when annotations need selector and resolution semantics,
* **citation-evidence** when the integrated workspace reopens source context.

---

## Out of Scope

This repository should not own the broader product or domain concerns.

Specifically, it should not own:

* the canonical evidence domain model,
* persistence policy,
* citation card rendering,
* document ingestion pipelines,
* metadata extraction,
* OCR processing,
* external source lookup,
* document collection review UI,
* form-field binding semantics,
* visual guide overlay UI,
* application shell and deployment.

Those responsibilities belong to the appropriate citation-evidence subsystem repositories.

---

## Architectural Position

```text
citation-evidence
  integrated product shell

citation-engine
  core domain model, services, persistence contracts

evidence-anchor
  selectors, anchor creation, resolution, re-anchoring, highlighting contracts

evidence-source
  document ingestion and document representations

citation-work
  review workspace and annotation UX

evidence-binder
  evidence-to-target binding and active evidence state
```

**evidence-anchor** should operate close to document representations and viewers, but it should not become a viewer implementation itself.

It should define the anchoring contract and provide reusable anchoring logic.

---

## Design Principles

### Selector Redundancy

A durable citation should not rely on a single selector.

Where possible, the system should store multiple selectors that can support each other:

```text
visual selectors
text selectors
structural selectors
context selectors
```

### Viewer Independence

Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.

Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.

### Format Neutrality

The anchoring model should work across paginated and non-paginated documents.

PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.

### Confidence Over Certainty

Anchor resolution should return a confidence level, not just success or failure.

Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.

The API should make this explicit.
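One way to make this explicit is to derive the status from the candidate confidences rather than from a boolean. The sketch below is illustrative only: the thresholds `MIN_CONFIDENCE` and `MIN_MARGIN` and the function `classifyCandidates` are hypothetical, and the `stale` status is left out because it depends on document identity rather than match quality.

```typescript
type ResolutionStatus = "resolved" | "ambiguous" | "unresolved";

// Hypothetical thresholds; real values would need tuning against test documents.
const MIN_CONFIDENCE = 0.6; // below this, no candidate is trusted
const MIN_MARGIN = 0.15; // required lead of the best candidate over the runner-up

// Classify a list of candidate confidences, each expected to lie in [0, 1].
function classifyCandidates(confidences: readonly number[]): {
  status: ResolutionStatus;
  confidence: number;
} {
  // Sort a copy in descending order so the best candidate comes first.
  const sorted = [...confidences].sort((a, b) => b - a);
  const best = sorted[0];
  if (best === undefined || best < MIN_CONFIDENCE) {
    return { status: "unresolved", confidence: best ?? 0 };
  }
  // A close runner-up means the match cannot be trusted without confirmation.
  const runnerUp = sorted[1] ?? 0;
  const status = best - runnerUp >= MIN_MARGIN ? "resolved" : "ambiguous";
  return { status, confidence: best };
}
```

Under these assumed thresholds, a single strong candidate is reported as resolved, two near-equal candidates as ambiguous, and an empty or weak candidate list as unresolved.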

### Human Confirmation for Ambiguity

When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.

### Preserve the Quote

Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.

### No Silent Misleading Match

It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.

---

## Initial Selector Types

The first version should support or prepare for these selector concepts:

```text
TextQuoteSelector
  exact selected text plus prefix and suffix context

TextPositionSelector
  canonical start and end offsets in normalized text

PdfRectSelector
  page number and normalized page rectangles

PdfPageTextSelector
  page number plus page-local text offsets

DomRangeSelector
  DOM path and range offsets for rendered HTML

StructuralSelector
  heading, section, block, or AST path information

FragmentSelector
  optional exported fragment or deep-link representation
```

The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.
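As a sketch of how these concepts might look as TypeScript types, the selectors can be modeled as a discriminated union on a `type` field. The `exact`, `prefix`, `suffix`, `start`, `end`, and `value` fields follow the W3C Web Annotation selectors of the same names; every other field name (`page`, `rects`, `startPath`, `path`, and so on), the `Anchor` wrapper, and `describeSelector` are assumptions made for illustration, not a defined schema.

```typescript
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive offset into normalized text
  end: number; // exclusive offset into normalized text
}

// Rectangle expressed as fractions of the page size (assumed convention).
interface NormalizedRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface PdfRectSelector {
  type: "PdfRectSelector";
  page: number;
  rects: NormalizedRect[];
}

interface PdfPageTextSelector {
  type: "PdfPageTextSelector";
  page: number;
  start: number;
  end: number;
}

interface DomRangeSelector {
  type: "DomRangeSelector";
  startPath: string;
  startOffset: number;
  endPath: string;
  endOffset: number;
}

interface StructuralSelector {
  type: "StructuralSelector";
  path: string[]; // e.g. heading or AST path segments
}

interface FragmentSelector {
  type: "FragmentSelector";
  value: string;
}

type Selector =
  | TextQuoteSelector
  | TextPositionSelector
  | PdfRectSelector
  | PdfPageTextSelector
  | DomRangeSelector
  | StructuralSelector
  | FragmentSelector;

// An anchor bundles several redundant selectors for the same passage.
interface Anchor {
  selectors: Selector[];
}

// The `type` discriminant lets the compiler narrow each case.
function describeSelector(selector: Selector): string {
  switch (selector.type) {
    case "TextQuoteSelector":
      return `quote "${selector.exact}"`;
    case "TextPositionSelector":
      return `offsets ${selector.start}-${selector.end}`;
    case "PdfRectSelector":
      return `page ${selector.page}, ${selector.rects.length} rect(s)`;
    case "PdfPageTextSelector":
      return `page ${selector.page}, offsets ${selector.start}-${selector.end}`;
    case "DomRangeSelector":
      return `DOM range from ${selector.startPath}`;
    case "StructuralSelector":
      return `path ${selector.path.join(" > ")}`;
    case "FragmentSelector":
      return `fragment ${selector.value}`;
  }
}
```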

---

## Anchor Resolution Strategy

A typical resolution pipeline should be:

```text
1. Check document and representation identity.
2. Try exact position-based selectors.
3. Verify resolved text against the stored quote.
4. Try PDF page and rectangle selectors where applicable.
5. Try text quote matching with prefix and suffix.
6. Try structural or heading context.
7. Try fuzzy quote matching.
8. Rank candidate matches by confidence.
9. Return resolved, ambiguous, stale, or unresolved status.
```

The caller should receive enough information to decide whether to:

* highlight the passage,
* ask the user to confirm,
* mark the citation as stale,
* manually reattach the annotation.
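Steps 2, 3, 5, and 9 of the pipeline in this section can be sketched for plain text as follows. This is a simplified illustration, not the repository's resolver: `StoredTextAnchor`, `TextResolution`, `findAll`, and `resolveTextAnchor` are hypothetical names, identity checks, PDF selectors, structural context, fuzzy matching, and the `stale` status are omitted, and falling back to context-free hits when no occurrence matches the stored prefix and suffix is one possible policy among several.

```typescript
interface StoredTextAnchor {
  exact: string; // the stored quote
  prefix: string; // text immediately before the quote
  suffix: string; // text immediately after the quote
  start: number; // canonical start offset at capture time
  end: number; // canonical end offset at capture time
}

type TextResolution =
  | { status: "resolved"; start: number; end: number; via: "position" | "quote" }
  | { status: "ambiguous"; candidates: number[] }
  | { status: "unresolved" };

// Collect every start offset at which `needle` occurs in `text`.
function findAll(text: string, needle: string): number[] {
  const hits: number[] = [];
  if (needle.length === 0) return hits;
  let from = text.indexOf(needle);
  while (from !== -1) {
    hits.push(from);
    from = text.indexOf(needle, from + 1);
  }
  return hits;
}

function resolveTextAnchor(text: string, anchor: StoredTextAnchor): TextResolution {
  // Steps 2-3: try the stored offsets, then verify them against the stored quote.
  if (text.slice(anchor.start, anchor.end) === anchor.exact) {
    return { status: "resolved", start: anchor.start, end: anchor.end, via: "position" };
  }

  // Step 5: exact quote search, narrowed by prefix and suffix context.
  const hits = findAll(text, anchor.exact);
  const inContext = hits.filter(
    (i) =>
      text.slice(0, i).endsWith(anchor.prefix) &&
      text.slice(i + anchor.exact.length).startsWith(anchor.suffix)
  );
  // If no hit matches the context, keep the raw hits as candidates.
  const candidates = inContext.length > 0 ? inContext : hits;

  // Step 9: report the outcome instead of guessing between candidates.
  const first = candidates[0];
  if (candidates.length === 1 && first !== undefined) {
    return { status: "resolved", start: first, end: first + anchor.exact.length, via: "quote" };
  }
  if (candidates.length > 1) {
    return { status: "ambiguous", candidates };
  }
  return { status: "unresolved" };
}
```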

---

## Core Interfaces

The repository should eventually provide interfaces similar to:

```ts
interface AnchorAdapter {
  createSelectors(selection: SelectionCapture): Promise<Selector[]>;

  resolveSelectors(
    representation: DocumentRepresentation,
    selectors: Selector[]
  ): Promise<AnchorResolution>;

  renderHighlight(
    target: ResolvedAnchorTarget,
    options?: HighlightRenderOptions
  ): Promise<void>;

  scrollToTarget(
    target: ResolvedAnchorTarget,
    options?: ScrollToTargetOptions
  ): Promise<void>;
}
```

Resolution should return structured results:

```ts
type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};
```

These interfaces should remain implementation-neutral enough to work with different viewers and document formats.
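To illustrate how a caller might consume an `AnchorResolution`, the sketch below maps a result to one of the four caller decisions named under Anchor Resolution Strategy. It rests on several assumptions: `ResolvedAnchorTarget` is reduced to a stand-in shape, `candidates` is taken to be sorted best-first, and `CallerAction`, `HIGHLIGHT_THRESHOLD`, and `decideAction` are hypothetical names. The low-confidence branch reflects the "No Silent Misleading Match" principle: a weak match is offered for confirmation, never highlighted outright.

```typescript
// Stand-in for the real target type, whose shape is not specified here.
type ResolvedAnchorTarget = { start: number; end: number };

type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};

type CallerAction =
  | { kind: "highlight"; target: ResolvedAnchorTarget }
  | { kind: "confirm"; options: ResolvedAnchorTarget[] }
  | { kind: "markStale" }
  | { kind: "reattach" };

// Hypothetical cut-off below which even a "resolved" result is shown for confirmation.
const HIGHLIGHT_THRESHOLD = 0.8;

function decideAction(resolution: AnchorResolution): CallerAction {
  // Assumes candidates are ordered best-first.
  const best = resolution.candidates[0];
  switch (resolution.status) {
    case "resolved":
      // A "resolved" result without a target cannot be highlighted.
      if (best === undefined) return { kind: "reattach" };
      return resolution.confidence >= HIGHLIGHT_THRESHOLD
        ? { kind: "highlight", target: best }
        : { kind: "confirm", options: resolution.candidates };
    case "ambiguous":
      return { kind: "confirm", options: resolution.candidates };
    case "stale":
      return { kind: "markStale" };
    case "unresolved":
      return { kind: "reattach" };
  }
}
```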

---

## Expected Dependencies

This repository is expected to depend on shared types from:

```text
citation-engine
  annotation, selector, document representation, and evidence-related contracts
```

It may be consumed by:

```text
citation-work
  to create anchors from user selections and reopen evidence

evidence-source
  to create anchors from citation recovery results

evidence-binder
  to resolve active evidence linked to form fields or claims

citation-evidence
  to integrate the complete user experience
```

It should avoid depending on application-level UI repositories.

---

## First Useful Version

A first useful version of **evidence-anchor** should provide:

* TypeScript selector types,
* a text quote selector model,
* a text position selector model,
* a PDF rectangle selector model,
* an anchor resolution result model,
* a simple text quote matching function,
* a simple text position resolver,
* a viewer adapter contract,
* enough mock/test data to prove that an annotation can be resolved back into a document representation.

The first version does not need to support every document format, but it should establish the selector redundancy pattern.
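Of the items listed for the first version, the PDF rectangle selector model is the one most exposed to zoom and resize. A minimal sketch, assuming rectangles are stored as fractions of the rendered page size: the names `PixelRect`, `PageSize`, `NormalizedRect`, `normalizeRect`, and `denormalizeRect` are hypothetical, and the top-left origin is an assumed convention that a real PDF adapter would have to reconcile with the viewer's coordinate system.

```typescript
// Rectangle in viewer pixels, relative to the top-left corner of the rendered page.
interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Rendered page size in viewer pixels at the current zoom level.
interface PageSize {
  width: number;
  height: number;
}

// Rectangle as fractions of the page size, so it is independent of zoom.
interface NormalizedRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a captured pixel rectangle into page-relative fractions for storage.
function normalizeRect(rect: PixelRect, page: PageSize): NormalizedRect {
  if (page.width <= 0 || page.height <= 0) {
    throw new RangeError("page size must be positive");
  }
  return {
    x: rect.x / page.width,
    y: rect.y / page.height,
    width: rect.width / page.width,
    height: rect.height / page.height,
  };
}

// Convert a stored rectangle back into pixels for the page as currently rendered.
function denormalizeRect(rect: NormalizedRect, page: PageSize): PixelRect {
  return {
    x: rect.x * page.width,
    y: rect.y * page.height,
    width: rect.width * page.width,
    height: rect.height * page.height,
  };
}
```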

---

## Success Criteria

The repository is successful when another subsystem can use it to:

1. capture a document selection,
2. create one or more selectors,
3. store those selectors through **citation-engine**,
4. later resolve them against a document representation,
5. receive a confidence-scored target,
6. scroll to and highlight that target through a viewer adapter,
7. detect when the target is ambiguous or unresolved.

A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.

---

## Repository Character

This repository should be:

* technically focused,
* algorithm-friendly,
* format-neutral,
* viewer-independent,
* strongly typed,
* explicit about uncertainty,
* careful about false positives,
* reusable across multiple user interfaces,
* small enough to remain a precise anchoring layer.

---

## Guiding Statement

**evidence-anchor exists to keep cited evidence attached to its source context.**