From 0adb4b9bfdeb06339d66ebc3525b4234034a0203 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Sun, 24 May 2026 15:48:02 +0200
Subject: [PATCH] Seeded with INTENT.md

---
 INTENT.md | 409 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md |   4 +-
 2 files changed, 410 insertions(+), 3 deletions(-)
 create mode 100644 INTENT.md

diff --git a/INTENT.md b/INTENT.md
new file mode 100644
index 0000000..8fde821
--- /dev/null
+++ b/INTENT.md
@@ -0,0 +1,409 @@
+# INTENT
+
+## Purpose
+
+This repository exists to provide the anchoring, selector, resolution, and highlighting layer for the **citation-evidence** ecosystem.
+
+**evidence-anchor** makes citations durable and reopenable across document formats, viewer implementations, viewport changes, zoom levels, layout changes, and moderate source changes.
+
+It is responsible for answering the central technical question:
+
+> Given a citation or evidence item, how do we find and highlight the cited source passage again?
+
+---
+
+## Primary Utility
+
+The repository provides the mechanisms needed to create, store, resolve, and render anchors for document passages.
+
+It should make it possible to:
+
+- capture a selected passage from a document viewer,
+- create redundant selectors for that passage,
+- resolve selectors back into document context,
+- scroll the cited passage into view,
+- highlight the cited passage,
+- detect unresolved or stale citations,
+- support fallback and fuzzy re-anchoring strategies,
+- work across PDFs, Markdown, HTML, and later other document formats.
+
+This repository turns annotations from static marks into navigable source references.
+
+---
+
+## Intended Users
+
+Primary users of this repository are developers and agents implementing citation-evidence subsystems.
+
+They include:
+
+- developers building document viewers,
+- developers building the review workspace,
+- developers implementing evidence-backed forms,
+- developers implementing citation recovery,
+- developers building source ingestion and document representation pipelines,
+- coding agents that need to understand how citations remain connected to source passages.
+
+End users should experience this repository indirectly whenever they click a citation and the correct source context opens.
+
+---
+
+## Strategic Role
+
+The strategic role of **evidence-anchor** is to protect the citation-evidence system from fragile, viewer-specific annotations.
+
+Without this repository, citations risk becoming simple coordinates, highlights, or comments that break when:
+
+- a PDF is zoomed or resized,
+- a document is rendered differently,
+- an HTML page changes layout,
+- a Markdown document is re-rendered,
+- a source document is lightly revised,
+- a citation is exported and later reopened.
+
+**evidence-anchor** provides the durable technical foundation that makes source-backed evidence trustworthy over time.
+
+---
+
+## Core Concept
+
+The core concept of this repository is the **anchor**.
+
+An anchor is a resolvable reference to a passage or region in a document.
+
+An anchor is not one thing. It should usually be represented by several complementary selectors.
+
+For example, a PDF anchor may include:
+
+```text
+exact quote
+prefix and suffix context
+canonical text offsets
+page number
+page-local offsets
+normalized page rectangles
+text item references
+```
+
+An HTML or Markdown anchor may include:
+
+```text
+exact quote
+prefix and suffix context
+canonical text offsets
+DOM range
+structural path
+heading or section context
+```
+
+The system should use the strongest available selectors first and fall back to more flexible selectors when necessary.
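As a minimal sketch of the selector redundancy described in the Core Concept section, the TypeScript below builds a position selector and a quote selector from a single selection over canonical text. The `createTextSelectors` helper, the field names, and the `contextLength` parameter are illustrative assumptions, not an API defined by this repository.

```typescript
// Hypothetical selector shapes, loosely following the selector names used in
// this document. Field names are assumptions for illustration only.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

type Selector = TextQuoteSelector | TextPositionSelector;

// Build two complementary selectors for the half-open range [start, end) of a
// canonical text. contextLength is an assumed tuning parameter that controls
// how much surrounding text is stored as prefix and suffix.
function createTextSelectors(
  text: string,
  start: number,
  end: number,
  contextLength = 32
): Selector[] {
  if (start < 0 || end > text.length || start >= end) {
    throw new RangeError("selection must be a non-empty range inside the text");
  }
  return [
    { type: "TextPositionSelector", start, end },
    {
      type: "TextQuoteSelector",
      exact: text.slice(start, end),
      prefix: text.slice(Math.max(0, start - contextLength), start),
      suffix: text.slice(end, end + contextLength),
    },
  ];
}

// Usage: select the word "fox" and keep six characters of context per side.
const sample = "The quick brown fox jumps over the lazy dog.";
const selectors = createTextSelectors(sample, 16, 19, 6);
```

Storing both selectors means the offsets can be tried first and the quote with its context can serve as a fallback if the offsets no longer match.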
+
+---
+
+## Scope
+
+This repository should own:
+
+* selector type definitions related to anchoring,
+* selector creation from captured selections,
+* text quote selector support,
+* text position selector support,
+* PDF page and rectangle selector support,
+* DOM range and structural selector support,
+* selector resolution strategies,
+* fuzzy re-anchoring strategies,
+* anchor confidence scoring,
+* ambiguous match handling,
+* orphaned annotation detection,
+* highlight rendering contracts,
+* scroll-to-anchor contracts,
+* format-neutral anchor adapter interfaces.
+
+It should provide the technical layer used by:
+
+* **citation-work** when a user selects text and creates an annotation,
+* **evidence-binder** when a field focuses linked evidence,
+* **evidence-source** when citation recovery finds a passage,
+* **citation-engine** when annotations need selector and resolution semantics,
+* **citation-evidence** when the integrated workspace reopens source context.
+
+---
+
+## Out of Scope
+
+This repository should not own the broader product or domain concerns.
+
+Specifically, it should not own:
+
+* the canonical evidence domain model,
+* persistence policy,
+* citation card rendering,
+* document ingestion pipelines,
+* metadata extraction,
+* OCR processing,
+* external source lookup,
+* document collection review UI,
+* form-field binding semantics,
+* visual guide overlay UI,
+* application shell and deployment.
+
+Those responsibilities belong to the appropriate citation-evidence subsystem repositories.
+
+---
+
+## Architectural Position
+
+```text
+citation-evidence
+  integrated product shell
+
+citation-engine
+  core domain model, services, persistence contracts
+
+evidence-anchor
+  selectors, anchor creation, resolution, re-anchoring, highlighting contracts
+
+evidence-source
+  document ingestion and document representations
+
+citation-work
+  review workspace and annotation UX
+
+evidence-binder
+  evidence-to-target binding and active evidence state
+```
+
+**evidence-anchor** should operate close to document representations and viewers, but it should not become a viewer implementation itself.
+
+It should define the anchoring contract and provide reusable anchoring logic.
+
+---
+
+## Design Principles
+
+### Selector Redundancy
+
+A durable citation should not rely on a single selector.
+
+Where possible, the system should store multiple selectors that can support each other:
+
+```text
+visual selectors
+text selectors
+structural selectors
+context selectors
+```
+
+### Viewer Independence
+
+Anchors should not depend on one specific PDF viewer, Markdown renderer, HTML renderer, or frontend framework.
+
+Viewer adapters may provide selection and rendering details, but the anchor model should remain portable.
+
+### Format Neutrality
+
+The anchoring model should work across paginated and non-paginated documents.
+
+PDFs, Markdown, HTML, plain text, and future formats should share the same resolution concepts.
+
+### Confidence Over Certainty
+
+Anchor resolution should return a confidence level, not just success or failure.
+
+Some matches are exact. Some are probable. Some are ambiguous. Some are unresolved.
+
+The API should make this explicit.
+
+### Human Confirmation for Ambiguity
+
+When multiple possible matches exist or fuzzy matching is uncertain, the repository should support workflows that ask the user or calling system to confirm the correct target.
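One possible way to make the confidence principle concrete is to classify ranked candidates by the best score and by its margin over the runner-up, so that close calls are routed to confirmation rather than highlighted. This is a sketch under stated assumptions: `ScoredCandidate`, `classifyCandidates`, and the thresholds `MIN_CONFIDENCE` and `MIN_MARGIN` are hypothetical names and values chosen for illustration.

```typescript
type CandidateStatus = "resolved" | "ambiguous" | "unresolved";

// Hypothetical candidate shape: where the match starts and how confident the
// matcher is, on an assumed 0..1 scale where higher is better.
interface ScoredCandidate {
  start: number;
  confidence: number;
}

// Illustrative thresholds, not values taken from this repository.
const MIN_CONFIDENCE = 0.6; // below this, nothing is reported as a match
const MIN_MARGIN = 0.15; // required lead of the best candidate over the next

function classifyCandidates(candidates: ScoredCandidate[]): {
  status: CandidateStatus;
  confidence: number;
  ranked: ScoredCandidate[];
} {
  // Copy before sorting so the caller's array is left untouched.
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const best = ranked[0];
  if (best === undefined || best.confidence < MIN_CONFIDENCE) {
    return { status: "unresolved", confidence: best?.confidence ?? 0, ranked };
  }
  const runnerUp = ranked[1];
  if (runnerUp !== undefined && best.confidence - runnerUp.confidence < MIN_MARGIN) {
    // Two plausible targets: report ambiguity instead of picking one silently.
    return { status: "ambiguous", confidence: best.confidence, ranked };
  }
  return { status: "resolved", confidence: best.confidence, ranked };
}
```

Returning the full ranked list alongside the status lets a calling system present the alternatives when confirmation is needed.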
+
+### Preserve the Quote
+
+Even if an anchor can no longer be resolved, the original quote and context should remain available through the annotation or evidence item.
+
+### No Silent Misleading Match
+
+It is better to report an unresolved or ambiguous citation than to confidently highlight the wrong passage.
+
+---
+
+## Initial Selector Types
+
+The first version should support or prepare for these selector concepts:
+
+```text
+TextQuoteSelector
+  exact selected text plus prefix and suffix context
+
+TextPositionSelector
+  canonical start and end offsets in normalized text
+
+PdfRectSelector
+  page number and normalized page rectangles
+
+PdfPageTextSelector
+  page number plus page-local text offsets
+
+DomRangeSelector
+  DOM path and range offsets for rendered HTML
+
+StructuralSelector
+  heading, section, block, or AST path information
+
+FragmentSelector
+  optional exported fragment or deep-link representation
+```
+
+The repository should align with W3C Web Annotation selector concepts where practical, without forcing all internal logic into JSON-LD.
+
+---
+
+## Anchor Resolution Strategy
+
+A typical resolution pipeline should be:
+
+```text
+1. Check document and representation identity.
+2. Try exact position-based selectors.
+3. Verify resolved text against the stored quote.
+4. Try PDF page and rectangle selectors where applicable.
+5. Try text quote matching with prefix and suffix.
+6. Try structural or heading context.
+7. Try fuzzy quote matching.
+8. Rank candidate matches by confidence.
+9. Return resolved, ambiguous, stale, or unresolved status.
+```
+
+The caller should receive enough information to decide whether to:
+
+* highlight the passage,
+* ask the user to confirm,
+* mark the citation as stale,
+* manually reattach the annotation.
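The pipeline above can be sketched for plain text as follows. This covers only steps 2, 3, 5, 8, and 9, and it omits identity checks, PDF selectors, structural context, fuzzy matching, and the `stale` status. The `StoredTextAnchor` and `SketchResolution` shapes, the `resolveTextAnchor` function, and the context-based scoring weights are assumptions made for illustration.

```typescript
// Hypothetical stored anchor: a position selector and a quote selector merged
// into one record for brevity.
interface StoredTextAnchor {
  start: number;
  end: number;
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextSpan {
  start: number;
  end: number;
}

interface SketchResolution {
  status: "resolved" | "ambiguous" | "unresolved";
  confidence: number;
  candidates: TextSpan[];
  usedSelectorTypes: string[];
}

function resolveTextAnchor(text: string, anchor: StoredTextAnchor): SketchResolution {
  // Steps 2-3: try the stored offsets and verify them against the stored quote.
  if (anchor.exact.length > 0 && text.slice(anchor.start, anchor.end) === anchor.exact) {
    return {
      status: "resolved",
      confidence: 1,
      candidates: [{ start: anchor.start, end: anchor.end }],
      usedSelectorTypes: ["TextPositionSelector", "TextQuoteSelector"],
    };
  }

  // Step 5: collect every exact occurrence of the quote and score it by how
  // much of the stored context still surrounds it (assumed weights).
  const scored: { span: TextSpan; score: number }[] = [];
  let from = anchor.exact.length > 0 ? text.indexOf(anchor.exact) : -1;
  while (from !== -1) {
    const end = from + anchor.exact.length;
    const prefixOk = anchor.prefix.length > 0 && text.slice(0, from).endsWith(anchor.prefix);
    const suffixOk = anchor.suffix.length > 0 && text.slice(end).startsWith(anchor.suffix);
    const score = (1 + (prefixOk ? 1 : 0) + (suffixOk ? 1 : 0)) / 4;
    scored.push({ span: { start: from, end }, score });
    from = text.indexOf(anchor.exact, from + 1);
  }

  // Steps 8-9: rank candidates and report a status instead of guessing.
  scored.sort((a, b) => b.score - a.score);
  const candidates = scored.map((c) => c.span);
  const usedSelectorTypes = ["TextQuoteSelector"];
  const best = scored[0];
  if (best === undefined) {
    return { status: "unresolved", confidence: 0, candidates, usedSelectorTypes };
  }
  const runnerUp = scored[1];
  const tied = runnerUp !== undefined && runnerUp.score === best.score;
  return {
    status: tied ? "ambiguous" : "resolved",
    confidence: best.score,
    candidates,
    usedSelectorTypes,
  };
}
```

In this sketch a relocated quote never reaches full confidence, because the position selector has already failed; a tie between equally scored occurrences is reported as ambiguous rather than resolved to the first hit.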
+
+---
+
+## Core Interfaces
+
+The repository should eventually provide interfaces similar to:
+
+```ts
+interface AnchorAdapter {
+  createSelectors(selection: SelectionCapture): Promise<Selector[]>;
+
+  resolveSelectors(
+    representation: DocumentRepresentation,
+    selectors: Selector[]
+  ): Promise<AnchorResolution>;
+
+  renderHighlight(
+    target: ResolvedAnchorTarget,
+    options?: HighlightRenderOptions
+  ): Promise<void>;
+
+  scrollToTarget(
+    target: ResolvedAnchorTarget,
+    options?: ScrollToTargetOptions
+  ): Promise<void>;
+}
+```
+
+Resolution should return structured results:
+
+```ts
+type AnchorResolution = {
+  status: "resolved" | "ambiguous" | "unresolved" | "stale";
+  confidence: number;
+  candidates: ResolvedAnchorTarget[];
+  usedSelectorTypes: string[];
+  warnings?: string[];
+};
+```
+
+These interfaces should remain implementation-neutral enough to work with different viewers and document formats.
+
+---
+
+## Expected Dependencies
+
+This repository is expected to depend on shared types from:
+
+```text
+citation-engine
+  annotation, selector, document representation, and evidence-related contracts
+```
+
+It may be consumed by:
+
+```text
+citation-work
+  to create anchors from user selections and reopen evidence
+
+evidence-source
+  to create anchors from citation recovery results
+
+evidence-binder
+  to resolve active evidence linked to form fields or claims
+
+citation-evidence
+  to integrate the complete user experience
+```
+
+It should avoid depending on application-level UI repositories.
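For a consuming repository, the four statuses in the `AnchorResolution` shape under Core Interfaces might map onto caller actions roughly as sketched below. The `ResolutionSummary` type, the `CallerAction` names, the `chooseAction` function, and the `highlightThreshold` default are hypothetical and only illustrate one possible policy.

```typescript
type AnchorStatus = "resolved" | "ambiguous" | "unresolved" | "stale";

// Hypothetical reduced view of a resolution result: just the fields this
// decision needs.
interface ResolutionSummary {
  status: AnchorStatus;
  confidence: number;
  candidateCount: number;
}

type CallerAction = "highlight" | "confirm" | "mark-stale" | "reattach";

// Decide what a caller does next. highlightThreshold is an assumed policy
// value; a resolved match below it is still shown to the user for confirmation.
function chooseAction(r: ResolutionSummary, highlightThreshold = 0.8): CallerAction {
  switch (r.status) {
    case "resolved":
      return r.confidence >= highlightThreshold ? "highlight" : "confirm";
    case "ambiguous":
      // With no candidates to choose from, confirmation is not possible.
      return r.candidateCount > 0 ? "confirm" : "reattach";
    case "stale":
      return "mark-stale";
    case "unresolved":
      return "reattach";
  }
}
```

Keeping this policy in the consumer rather than in the anchoring layer is one way to let different user interfaces apply different thresholds to the same resolution result.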
+
+---
+
+## First Useful Version
+
+A first useful version of **evidence-anchor** should provide:
+
+* TypeScript selector types,
+* a text quote selector model,
+* a text position selector model,
+* a PDF rectangle selector model,
+* an anchor resolution result model,
+* a simple text quote matching function,
+* a simple text position resolver,
+* a viewer adapter contract,
+* enough mock/test data to prove that an annotation can be resolved back into a document representation.
+
+The first version does not need to support every document format, but it should establish the selector redundancy pattern.
+
+---
+
+## Success Criteria
+
+The repository is successful when another subsystem can use it to:
+
+1. capture a document selection,
+2. create one or more selectors,
+3. store those selectors through **citation-engine**,
+4. later resolve them against a document representation,
+5. receive a confidence-scored target,
+6. scroll to and highlight that target through a viewer adapter,
+7. detect when the target is ambiguous or unresolved.
+
+A developer or coding agent should be able to understand from this repository how citation-evidence keeps evidence connected to source context.
+
+---
+
+## Repository Character
+
+This repository should be:
+
+* technically focused,
+* algorithm-friendly,
+* format-neutral,
+* viewer-independent,
+* strongly typed,
+* explicit about uncertainty,
+* careful about false positives,
+* reusable across multiple user interfaces,
+* small enough to remain a precise anchoring layer.
+
+---
+
+## Guiding Statement
+
+**evidence-anchor exists to keep cited evidence attached to its source context.**
diff --git a/README.md b/README.md
index fcd7b8f..444ff87 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1 @@
-# repo-seed
-
-A git repository template to bootstrap coulomb projects from.
\ No newline at end of file
+Evidence anchoring, selector, resolution for citations.