Seeded with INTENT.md

2026-05-24 15:48:02 +02:00
parent a5cea5bfc5
commit 0adb4b9bfd
2 changed files with 410 additions and 3 deletions
--- a/INTENT.md
+++ b/INTENT.md
@@ -0,0 +1,409 @@
+# INTENT
+
+## Purpose
+
+This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the **citation-evidence** ecosystem.
+
+**evidence-anchor** makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.
+
+It is responsible for answering the central technical question:
+
+> Given a citation or evidence item, how do we find and highlight the cited source passage again?
+
+---
+
+## Primary Utility
+
+The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.
+
+It should make it possible to:
+
+- capture a selected passage from a document viewer,
+- create redundant selectors for that passage,
+- resolve selectors back into document context,
+- scroll the cited passage into view,
+- highlight the cited passage,
+- detect unresolved or stale citations,
+- support fallback and fuzzy re-anchoring strategies,
+- work across PDFs, Markdown, HTML, and later other document formats.
+
+This repository turns annotations from static marks into navigable source references.
+
+---
+
+## Intended Users
+
+Primary users of this repository are developers and agents implementing citation-evidence subsystems.
+
+They include:
+
+- developers building document viewers,
+- developers building the review workspace,
+- developers implementing evidence-backed forms,
+- developers implementing citation recovery,
+- developers building source ingestion and document representation pipelines,
+- coding agents that need to understand how citations remain connected to source passages.
+
+End users should experience this repository indirectly whenever they click a citation and the correct source context opens.
+
+---
+
+## Strategic Role
+
+The strategic role of **evidence-anchor** is to protect the citation-evidence system from fragile, viewer-specific annotations.
+
+Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:
+
+- a PDF is zoomed or resized,
+- a document is rendered differently,
+- an HTML page changes layout,
+- a Markdown document is re-rendered,
+- a source document is lightly revised,
+- a citation is exported and later reopened.
+
+**evidence-anchor** provides the durable technical foundation that makes source-backed evidence trustworthy over time.
+
+---
+
+## Core Concept
+
+The core concept of this repository is the **anchor**.
+
+An anchor is a resolvable reference to a passage or region in a document.
+
+An anchor is not a single value: it should usually combine several complementary selectors.
+
+For example, a PDF anchor may include:
+
+```text
+exact quote
+prefix and suffix context
+canonical text offsets
+page number
+page-local offsets
+normalized page rectangles
+text item references
+```
+
+An HTML or Markdown anchor may include:
+
+```text
+exact quote
+prefix and suffix context
+canonical text offsets
+DOM range
+structural path
+heading or section context
+```
+
+The system should try the most precise selectors first and fall back to more tolerant selectors when the precise ones fail or no longer match the stored quote.
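
As a rough sketch of strongest-first fallback, the TypeScript below orders stored selectors by a precision rank and returns the first one that resolves. The `SelectorKind` names come from the selector types listed later in this document. `STRENGTH`, `StoredSelector`, and `resolveWithFallback` are hypothetical names, and the ranks are assumptions that loosely follow the order of the resolution pipeline.

```typescript
// Selector kinds from the "Initial Selector Types" list (FragmentSelector omitted).
type SelectorKind =
  | "TextPositionSelector"
  | "PdfPageTextSelector"
  | "DomRangeSelector"
  | "PdfRectSelector"
  | "TextQuoteSelector"
  | "StructuralSelector";

// Hypothetical ranking: lower = more precise, tried earlier.
// The numbers are illustrative, not a defined contract.
const STRENGTH: Record<SelectorKind, number> = {
  TextPositionSelector: 0,
  PdfPageTextSelector: 1,
  DomRangeSelector: 1,
  PdfRectSelector: 2,
  TextQuoteSelector: 3,
  StructuralSelector: 4,
};

interface StoredSelector {
  kind: SelectorKind;
}

// Try each selector from most to least precise. The first one that
// resolves wins, and the caller learns which kind was used.
function resolveWithFallback<S extends StoredSelector, R>(
  selectors: S[],
  tryResolve: (selector: S) => R | undefined
): { result: R; usedKind: SelectorKind } | undefined {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...selectors].sort(
    (a, b) => STRENGTH[a.kind] - STRENGTH[b.kind]
  );
  for (const selector of ordered) {
    const result = tryResolve(selector);
    if (result !== undefined) {
      return { result, usedKind: selector.kind };
    }
  }
  return undefined;
}

// Example: the stored position selector no longer resolves (say, the
// document was edited), so resolution falls back to the quote selector.
const stored: StoredSelector[] = [
  { kind: "TextQuoteSelector" },
  { kind: "TextPositionSelector" },
];
const outcome = resolveWithFallback(stored, (s) =>
  s.kind === "TextQuoteSelector" ? "match at offset 120" : undefined
);
```

Returning `usedKind` with the result lets a caller tell a precise hit from a fallback. A fuller resolver would also verify position-based hits against the stored quote instead of accepting the first hit blindly.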
+
+---
+
+## Scope
+
+This repository should own:
+
+* selector type definitions related to anchoring,
+* selector creation from captured selections,
+* text quote selector support,
+* text position selector support,
+* PDF page and rectangle selector support,
+* DOM range and structural selector support,
+* selector resolution strategies,
+* fuzzy re-anchoring strategies,
+* anchor confidence scoring,
+* ambiguous match handling,
+* orphaned annotation detection,
+* highlight rendering contracts,
+* scroll-to-anchor contracts,
+* format-neutral anchor adapter interfaces.
+
+It should provide the technical layer used by:
+
+* **citation-work** when a user selects text and creates an annotation,
+* **evidence-binder** when a field focuses linked evidence,
+* **evidence-source** when citation recovery finds a passage,
+* **citation-engine** when annotations need selector and resolution semantics,
+* **citation-evidence** when the integrated workspace reopens source context.
+
+---
+
+## Out of Scope
+
+This repository should not own the broader product or domain concerns.
+
+Specifically, it should not own:
+
+* the canonical evidence domain model,
+* persistence policy,
+* citation card rendering,
+* document ingestion pipelines,
+* metadata extraction,
+* OCR processing,
+* external source lookup,
+* document collection review UI,
+* form-field binding semantics,
+* visual guide overlay UI,
+* application shell and deployment.
+
+Those responsibilities belong to the appropriate citation-evidence subsystem repositories.
+
+---
+
+## Architectural Position
+
+```text
+citation-evidence
+  integrated product shell
+
+citation-engine
+  core domain model, services, persistence contracts
+
+evidence-anchor
+  selectors, anchor creation, resolution, re-anchoring, highlighting contracts
+
+evidence-source
+  document ingestion and document representations
+
+citation-work
+  review workspace and annotation UX
+
+evidence-binder
+  evidence-to-target binding and active evidence state
+```
+
+**evidence-anchor** should operate close to document representations and viewers, but it should not become a viewer implementation itself.
+
+It should define the anchoring contract and provide reusable anchoring logic.
+
+---
+
+## Design Principles
+
+### Selector Redundancy
+
+A durable citation should not rely on a single selector.
+
+Where possible, the system should store multiple selectors that can support each other:
+
+```text
+visual selectors
+text selectors
+structural selectors
+context selectors
+```
+
+### Viewer Independence
+
+Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.
+
+Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.
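
One way to keep a visual selector viewer-independent is to store rectangles as fractions of the page size rather than viewer pixels. The sketch below assumes top-left-origin coordinates and an unrotated page. `PageRect`, `normalizeRect`, and `denormalizeRect` are hypothetical names, not an existing API.

```typescript
// A rectangle in either viewer pixels or page-relative fractions,
// measured from the top-left corner of the page (assumed convention).
interface PageRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Viewer pixels -> fractions (0..1) of the rendered page size.
function normalizeRect(px: PageRect, pageWidthPx: number, pageHeightPx: number): PageRect {
  return {
    x: px.x / pageWidthPx,
    y: px.y / pageHeightPx,
    width: px.width / pageWidthPx,
    height: px.height / pageHeightPx,
  };
}

// Fractions -> pixels at whatever size the page is rendered now.
function denormalizeRect(n: PageRect, pageWidthPx: number, pageHeightPx: number): PageRect {
  return {
    x: n.x * pageWidthPx,
    y: n.y * pageHeightPx,
    width: n.width * pageWidthPx,
    height: n.height * pageHeightPx,
  };
}

// Captured at 100% zoom on a page rendered at 600x800 px...
const storedRect = normalizeRect({ x: 60, y: 80, width: 300, height: 40 }, 600, 800);
// ...and redrawn at 200% zoom, where the same page is 1200x1600 px.
const atDoubleZoom = denormalizeRect(storedRect, 1200, 1600);
// atDoubleZoom is approximately { x: 120, y: 160, width: 600, height: 80 }
```

Because the stored values are page-relative, the same selector redraws in the right place after zoom or resize. Rotated pages or cropped page boxes would need extra handling that this sketch ignores.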
+
+### Format Neutrality
+
+The anchoring model should work across paginated and non-paginated documents.
+
+PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.
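
Canonical text offsets only stay portable if every adapter derives the same normalized text and can map offsets back to its own rendering. The sketch below assumes whitespace collapsing as the canonicalization rule, which this document does not specify. `CanonicalText`, `canonicalize`, and `toRawRange` are hypothetical helpers.

```typescript
interface CanonicalText {
  // Normalized text that canonical offsets refer to.
  text: string;
  // rawOffsets[i] is the raw-text index of canonical character i;
  // rawOffsets[text.length] is raw.length, an end sentinel.
  rawOffsets: number[];
}

// Assumed canonicalization: every run of whitespace becomes one space.
function canonicalize(raw: string): CanonicalText {
  let text = "";
  const rawOffsets: number[] = [];
  let inWhitespace = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (/\s/.test(ch)) {
      if (!inWhitespace) {
        text += " ";
        rawOffsets.push(i); // the whole run maps to its first raw character
      }
      inWhitespace = true;
    } else {
      text += ch;
      rawOffsets.push(i);
      inWhitespace = false;
    }
  }
  rawOffsets.push(raw.length);
  return { text, rawOffsets };
}

// Map a canonical [start, end) range back to a raw [start, end) range,
// e.g. to build a DOM range or page-local offsets for highlighting.
function toRawRange(c: CanonicalText, start: number, end: number): [number, number] {
  if (start < 0 || start > end || end > c.text.length) {
    throw new RangeError("canonical range out of bounds");
  }
  const rawStart = c.rawOffsets[start];
  const rawEnd = end > start ? c.rawOffsets[end - 1] + 1 : rawStart;
  return [rawStart, rawEnd];
}

const rawSource = "Evidence  stays\n attached.";
const canonical = canonicalize(rawSource);
const quoteStart = canonical.text.indexOf("stays attached");
const [rawStart, rawEnd] = toRawRange(canonical, quoteStart, quoteStart + "stays attached".length);
// canonical.text === "Evidence stays attached."
// rawSource.slice(rawStart, rawEnd) === "stays\n attached"
```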
+
+### Confidence Over Certainty
+
+Anchor resolution should return a confidence level, not just success or failure.
+
+Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.
+
+The API should make this explicit.
+
+### Human Confirmation for Ambiguity
+
+When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.
+
+### Preserve the Quote
+
+Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.
+
+### No Silent Misleading Match
+
+It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.
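
The confidence and ambiguity principles above could translate into a classifier like the following sketch. The thresholds and the `classifyCandidates` name are assumptions. The `stale` status is left out because it depends on the document identity check rather than on candidate scores.

```typescript
type Status = "resolved" | "ambiguous" | "unresolved";

interface ScoredCandidate {
  start: number;
  end: number;
  score: number; // assumed to be in [0, 1]
}

// Hypothetical thresholds; real values would need tuning on test documents.
const ACCEPT_THRESHOLD = 0.8;
const AMBIGUITY_MARGIN = 0.05;

function classifyCandidates(candidates: ScoredCandidate[]): {
  status: Status;
  confidence: number;
  candidates: ScoredCandidate[];
} {
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  const best: ScoredCandidate | undefined = ranked[0];
  if (best === undefined || best.score < ACCEPT_THRESHOLD) {
    // A weak best match is reported as unresolved, never highlighted.
    return { status: "unresolved", confidence: best ? best.score : 0, candidates: ranked };
  }
  // Every candidate within the margin of the best one is a plausible target.
  const contenders = ranked.filter((c) => best.score - c.score < AMBIGUITY_MARGIN);
  if (contenders.length > 1) {
    // Near-equal matches: hand the choice to the user or calling system.
    return { status: "ambiguous", confidence: best.score, candidates: contenders };
  }
  return { status: "resolved", confidence: best.score, candidates: [best] };
}
```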
+
+---
+
+## Initial Selector Types
+
+The first version should support or prepare for these selector concepts:
+
+```text
+TextQuoteSelector
+  exact selected text plus prefix and suffix context
+
+TextPositionSelector
+  canonical start and end offsets in normalized text
+
+PdfRectSelector
+  page number and normalized page rectangles
+
+PdfPageTextSelector
+  page number plus page-local text offsets
+
+DomRangeSelector
+  DOM path and range offsets for rendered HTML
+
+StructuralSelector
+  heading, section, block, or AST path information
+
+FragmentSelector
+  optional exported fragment or deep-link representation
+```
+
+The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.
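
A TypeScript discriminated union is one natural encoding of these selector types. In this sketch, `TextQuoteSelector`, `TextPositionSelector`, and `FragmentSelector` use the field names of the W3C Web Annotation Data Model. The fields of the PDF, DOM, and structural selectors and the `describeSelector` helper are illustrative assumptions.

```typescript
// Field names for these three follow the W3C Web Annotation Data Model.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number; // inclusive
  end: number; // exclusive
}
interface FragmentSelector {
  type: "FragmentSelector";
  value: string;
  conformsTo?: string;
}

// Internal selectors; these field names are assumptions.
interface PdfRectSelector {
  type: "PdfRectSelector";
  page: number;
  rects: { x: number; y: number; width: number; height: number }[];
}
interface PdfPageTextSelector {
  type: "PdfPageTextSelector";
  page: number;
  start: number;
  end: number;
}
interface DomRangeSelector {
  type: "DomRangeSelector";
  startPath: string;
  startOffset: number;
  endPath: string;
  endOffset: number;
}
interface StructuralSelector {
  type: "StructuralSelector";
  path: string[];
}

type Selector =
  | TextQuoteSelector
  | TextPositionSelector
  | FragmentSelector
  | PdfRectSelector
  | PdfPageTextSelector
  | DomRangeSelector
  | StructuralSelector;

// The `type` tag lets TypeScript narrow each case and check exhaustiveness.
function describeSelector(s: Selector): string {
  switch (s.type) {
    case "TextQuoteSelector":
      return `quote "${s.exact}"`;
    case "TextPositionSelector":
      return `chars ${s.start}-${s.end}`;
    case "FragmentSelector":
      return `fragment ${s.value}`;
    case "PdfRectSelector":
      return `page ${s.page}, ${s.rects.length} rect(s)`;
    case "PdfPageTextSelector":
      return `page ${s.page}, chars ${s.start}-${s.end}`;
    case "DomRangeSelector":
      return `DOM ${s.startPath}:${s.startOffset} to ${s.endPath}:${s.endOffset}`;
    case "StructuralSelector":
      return `section ${s.path.join(" > ")}`;
  }
}
```

Keeping W3C field names for the shared selectors means they map directly onto Web Annotation JSON, while the internal selectors remain free to evolve.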
+
+---
+
+## Anchor Resolution Strategy
+
+A typical resolution pipeline should be:
+
+```text
+1. Check document and representation identity.
+2. Try exact position-based selectors.
+3. Verify resolved text against the stored quote.
+4. Try PDF page and rectangle selectors where applicable.
+5. Try text quote matching with prefix and suffix.
+6. Try structural or heading context.
+7. Try fuzzy quote matching.
+8. Rank candidate matches by confidence.
+9. Return resolved, ambiguous, stale, or unresolved status.
+```
+
+The caller should receive enough information to decide whether to:
+
+* highlight the passage,
+* ask the user to confirm,
+* mark the citation as stale,
+* manually reattach the annotation.
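
Step 5 of the pipeline (text quote matching with prefix and suffix) can be sketched with plain substring search: find every exact occurrence of the quote, then rank occurrences by how much stored context agrees. This covers exact matches only; fuzzy matching (step 7) would need an approximate string-matching algorithm, which is not shown. `QuoteSelector`, `QuoteMatch`, and `matchQuote` are hypothetical names.

```typescript
interface QuoteSelector {
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface QuoteMatch {
  start: number;
  end: number;
  // How many of the stored prefix/suffix contexts agree (0, 1, or 2).
  contextMatches: number;
}

function matchQuote(text: string, sel: QuoteSelector): QuoteMatch[] {
  const matches: QuoteMatch[] = [];
  if (sel.exact.length === 0) return matches;
  let from = 0;
  while (true) {
    const start = text.indexOf(sel.exact, from);
    if (start === -1) break;
    const end = start + sel.exact.length;
    let contextMatches = 0;
    // endsWith(s, pos) tests text.slice(0, pos); startsWith(s, pos) tests text.slice(pos).
    if (sel.prefix !== undefined && text.endsWith(sel.prefix, start)) contextMatches++;
    if (sel.suffix !== undefined && text.startsWith(sel.suffix, end)) contextMatches++;
    matches.push({ start, end, contextMatches });
    from = start + 1; // allow overlapping occurrences
  }
  // Stable sort: matches with equal context keep document order.
  return matches.sort((a, b) => b.contextMatches - a.contextMatches);
}

const sample = "the cat sat. the cat ran.";
const ranked = matchQuote(sample, { exact: "cat", prefix: "the ", suffix: " ran" });
// ranked[0] is the second "cat" (start 17), the only one whose suffix also agrees.
```

Both occurrences contain the exact quote, so without stored context they would be indistinguishable; the suffix breaks the tie. If several matches still shared the top context count, the result should be reported as ambiguous rather than picking one.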
+
+---
+
+## Core Interfaces
+
+The repository should eventually provide interfaces similar to:
+
+```ts
+interface AnchorAdapter {
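+  // Build redundant selectors from a selection captured in a viewer.
+  createSelectors(selection: SelectionCapture): Promise<Selector[]>;
+
+  // Resolve stored selectors against a document representation.
+  resolveSelectors(
+    representation: DocumentRepresentation,
+    selectors: Selector[]
+  ): Promise<AnchorResolution>;
+
+  // Draw a highlight for a resolved target in the viewer.
+  renderHighlight(
+    target: ResolvedAnchorTarget,
+    options?: HighlightRenderOptions
+  ): Promise<void>;
+
+  // Scroll the viewer so the resolved target is visible.
+  scrollToTarget(
+    target: ResolvedAnchorTarget,
+    options?: ScrollToTargetOptions
+  ): Promise<void>;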
+}
+```
+
+Resolution should return structured results:
+
+```ts
+type AnchorResolution = {
+  status: "resolved" | "ambiguous" | "unresolved" | "stale";
+  confidence: number;
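+  // Candidate targets, ranked by confidence.
+  candidates: ResolvedAnchorTarget[];
+  // Selector types that contributed to the result.
+  usedSelectorTypes: string[];
+  // Optional diagnostics about the resolution.
+  warnings?: string[];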
+};
+```
+
+These interfaces should remain implementation-neutral enough to work with different viewers and document formats.
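
On the caller side, the structured result makes the four decisions listed under the resolution strategy straightforward to branch on. The mapping below, the placeholder `ResolvedAnchorTarget` shape, and the `minConfidence` floor are assumptions for illustration, not part of the contract.

```typescript
// Placeholder target shape; the real ResolvedAnchorTarget is not defined here.
type ResolvedAnchorTarget = { start: number; end: number };

// The result type as sketched in this document.
type AnchorResolution = {
  status: "resolved" | "ambiguous" | "unresolved" | "stale";
  confidence: number;
  candidates: ResolvedAnchorTarget[];
  usedSelectorTypes: string[];
  warnings?: string[];
};

type CallerAction =
  | { kind: "highlight"; target: ResolvedAnchorTarget }
  | { kind: "confirm"; options: ResolvedAnchorTarget[] }
  | { kind: "markStale" }
  | { kind: "reattach" };

// minConfidence is a hypothetical caller-side floor, not part of the contract.
function decideAction(r: AnchorResolution, minConfidence = 0.8): CallerAction {
  switch (r.status) {
    case "resolved": {
      const target: ResolvedAnchorTarget | undefined = r.candidates[0];
      // Below the floor, ask for confirmation rather than risk a wrong highlight.
      if (target !== undefined && r.confidence >= minConfidence) {
        return { kind: "highlight", target };
      }
      return { kind: "confirm", options: r.candidates };
    }
    case "ambiguous":
      return { kind: "confirm", options: r.candidates };
    case "stale":
      return { kind: "markStale" };
    case "unresolved":
      return { kind: "reattach" };
  }
}
```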
+
+---
+
+## Expected Dependencies
+
+This repository is expected to depend on shared types from:
+
+```text
+citation-engine
+  annotation, selector, document representation, and evidence-related contracts
+```
+
+It may be consumed by:
+
+```text
+citation-work
+  to create anchors from user selections and reopen evidence
+
+evidence-source
+  to create anchors from citation recovery results
+
+evidence-binder
+  to resolve active evidence linked to form fields or claims
+
+citation-evidence
+  to integrate the complete user experience
+```
+
+It should avoid depending on application-level UI repositories.
+
+---
+
+## First Useful Version
+
+A first useful version of **evidence-anchor** should provide:
+
+* TypeScript selector types,
+* a text quote selector model,
+* a text position selector model,
+* a PDF rectangle selector model,
+* an anchor resolution result model,
+* a simple text quote matching function,
+* a simple text position resolver,
+* a viewer adapter contract,
+* enough mock/test data to demonstrate that an annotation can be resolved back into a document representation.
+
+The first version does not need to support every document format, but it should establish the selector redundancy pattern.
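
A simple text position resolver, as listed above, might apply stored offsets and verify the result against the stored quote, roughly as in steps 2 and 3 of the resolution pipeline. The `resolvePosition` function and its result shape are hypothetical and simpler than `AnchorResolution`; offsets are assumed to index the same canonical text the selector was created from.

```typescript
interface PositionSelector {
  start: number; // inclusive offset into canonical text
  end: number; // exclusive
}

type PositionResult =
  | { status: "resolved"; start: number; end: number; verified: boolean }
  | { status: "mismatch"; found: string }
  | { status: "out-of-range" };

function resolvePosition(text: string, pos: PositionSelector, exactQuote?: string): PositionResult {
  if (pos.start < 0 || pos.start > pos.end || pos.end > text.length) {
    return { status: "out-of-range" };
  }
  const found = text.slice(pos.start, pos.end);
  if (exactQuote !== undefined && found !== exactQuote) {
    // The offsets now point at different text, so the document likely changed;
    // the caller should fall back to quote matching rather than highlight this.
    return { status: "mismatch", found };
  }
  // Without a stored quote the range is returned but marked unverified.
  return { status: "resolved", start: pos.start, end: pos.end, verified: exactQuote !== undefined };
}

const original = "Anchors keep evidence attached.";
const revised = "Anchors now keep evidence attached.";
const sel = { start: 13, end: 21 };
const onOriginal = resolvePosition(original, sel, "evidence"); // resolved, verified
const onRevised = resolvePosition(revised, sel, "evidence"); // mismatch, found "eep evid"
```

A `mismatch` is the signal to continue with quote-based matching; highlighting the stored range anyway would risk exactly the silent misleading match this document warns against.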
+
+---
+
+## Success Criteria
+
+The repository is successful when another subsystem can use it to:
+
+1. capture a document selection,
+2. create one or more selectors,
+3. store those selectors through **citation-engine**,
+4. later resolve them against a document representation,
+5. receive a confidence-scored target,
+6. scroll to and highlight that target through a viewer adapter,
+7. detect when the target is ambiguous or unresolved.
+
+A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.
+
+---
+
+## Repository Character
+
+This repository should be:
+
+* technically focused,
+* algorithm-friendly,
+* format-neutral,
+* viewer-independent,
+* strongly typed,
+* explicit about uncertainty,
+* careful about false positives,
+* reusable across multiple user interfaces,
+* small enough to remain a precise anchoring layer.
+
+---
+
+## Guiding Statement
+
+**evidence-anchor exists to keep cited evidence attached to its source context.**
--- a/README.md
+++ b/README.md
@@ -1,3 +1 @@
-# repo-seed
-
-A git repository template to bootstrap coulomb projects from.
+Evidence anchoring, selectors, and resolution for citations.