From bc95737e6a13cc92f997502b05bca01ed1a4558c Mon Sep 17 00:00:00 2001 From: tegwick Date: Sun, 24 May 2026 15:26:34 +0200 Subject: [PATCH] Added intent prd and architecture --- INTENT.md | 211 +++++ wiki/ArchitectureOverview.md | 1360 +++++++++++++++++++++++++++ wiki/ProductRequirementsDocument.md | 527 +++++++++++ 3 files changed, 2098 insertions(+) create mode 100644 INTENT.md create mode 100644 wiki/ArchitectureOverview.md create mode 100644 wiki/ProductRequirementsDocument.md diff --git a/INTENT.md b/INTENT.md new file mode 100644 index 0000000..5a05461 --- /dev/null +++ b/INTENT.md @@ -0,0 +1,211 @@ +# INTENT + +## Purpose + +This repository exists to provide the umbrella product, integration shell, and reference implementation for **citation-evidence**. + +**citation-evidence** is a document-centered evidence workspace for capturing, managing, presenting, and reopening citations with contextual commentary across PDFs, Markdown, HTML, and other document formats. + +The project enables users to turn source passages into reusable evidence objects that can support form fields, claims, requirements, decisions, reports, and web publications. + +A citation should not be a dead reference. It should be an actionable bridge back to the source context. + +--- + +## Primary Utility + +The repository provides the integrated workspace and coordination layer for the citation-evidence system. 
+ +It brings together the subsystem projects: + +- **citation-engine** — core domain model, APIs, persistence contracts, citation rendering, and orchestration +- **evidence-anchor** — durable selectors, anchoring, re-anchoring, and highlight resolution +- **evidence-source** — document ingestion, text extraction, source metadata, and citation recovery +- **citation-work** — document collection review, annotation workflow, and evidence sidebar UX +- **evidence-binder** — linking evidence to form fields, claims, requirements, decisions, and other structured targets + +The umbrella repository exists to demonstrate and validate how these subsystems work together as one coherent product. + +--- + +## Intended Users + +Primary users are people and systems that need evidence-backed information work: + +- researchers and analysts reviewing document collections +- form workers and case processors who need source-backed field entries +- consultants and knowledge workers producing evidence-backed reports +- compliance, audit, procurement, and legal-adjacent workers who need traceable justification +- product and requirements workers linking source material to structured decisions +- developers integrating citation-evidence capabilities into other applications +- agentic assistants helping users search, extract, bind, and present evidence + +--- + +## Strategic Role + +The strategic role of **citation-evidence** is to establish a reusable infrastructure layer for **evidence-backed information spaces**. + +It connects three activities that are often handled separately: + +1. reading and annotating documents, +2. extracting reusable citations and commentary, +3. binding evidence to structured outputs such as forms, claims, requirements, reports, and web pages. + +The project should become a foundation for workflows where information must remain traceable to its source context. 
+ +--- + +## Core Concept + +The central flow of the system is: + +```text +Source Document + → Document Representation + → Durable Annotation Anchor + → Evidence Item with Commentary + → Evidence Link to Field / Claim / Requirement + → Portable Citation Card + → Reopenable Source Context +```` + +The system treats an evidence item as more than a highlight. + +An evidence item is a reusable object that can: + +* quote a source passage, +* preserve commentary, +* reopen the source context, +* support or contradict a structured target, +* be exported into another document or webpage, +* be reused by humans and software agents. + +--- + +## Scope + +This repository owns the integrated product scope. + +It should contain: + +* product documentation +* architecture documentation +* integration scenarios +* reference workspace application +* cross-subsystem examples +* demo data and test workflows +* deployment sketches +* system-level acceptance tests +* onboarding material for developers and agents + +It should coordinate the subsystem repositories without absorbing their responsibilities. + +--- + +## Out of Scope + +This repository should not become the implementation home for all subsystem internals. + +Specifically, it should not own: + +* low-level selector and re-anchoring algorithms +* full document ingestion and extraction pipelines +* the complete persistence implementation +* all viewer-specific internals +* all form-binding logic +* all citation rendering logic + +Those responsibilities belong in the focused subsystem repositories. + +The umbrella repository should integrate, validate, and demonstrate them. + +--- + +## Initial Product Modes + +The integrated product should support three primary modes. + +### 1. Document Review + +Users add documents to a collection, review them, highlight relevant passages, add commentary, and create reusable evidence items. + +### 2. Evidence-Backed Forms + +Users display source documents next to structured forms. 
Form fields can be linked to evidence items. Activating a field focuses the corresponding source citation and visually connects field, evidence item, and document highlight. + +### 3. Citation Recovery + +Users provide a citation, quote, or source clue. The system searches local and eventually configured external sources, locates candidate passages, and allows the user to confirm and turn the passage into a navigable annotation. + +--- + +## Architectural Direction + +The project should be built around a headless, format-neutral evidence model with viewer-specific adapters. + +Key principles: + +* citations must not depend on one specific viewer implementation +* multiple selector types should be stored for durable re-anchoring +* evidence items should be first-class domain objects +* PDFs, Markdown, HTML, and future formats should share the same evidence model +* uncertain source recovery should require human confirmation +* citation cards should be portable across web, Markdown, and later report outputs +* APIs and data structures should be suitable for agentic workflows + +--- + +## Initial Reference Scenario + +The first end-to-end scenario should be: + +1. A user creates a document collection. +2. The user adds a PDF. +3. The user selects a passage and adds commentary. +4. The system creates an annotation and evidence item. +5. The user opens a form next to the document. +6. The user links the evidence item to a form field. +7. The user focuses the field. +8. The system highlights the field, evidence item, and source passage. +9. The system draws a visual guide between them. +10. The user exports the evidence as a Markdown or HTML citation card. + +This scenario validates the core product promise without requiring advanced collaboration or external source discovery. 
+ +--- + +## Repository Character + +This repository should be: + +* integrative rather than monolithic +* product-oriented rather than library-only +* documentation-rich +* testable through reference scenarios +* friendly to human developers and coding agents +* explicit about subsystem boundaries +* suitable as the entry point for the overall citation-evidence ecosystem + +--- + +## Success Criteria + +The repository is successful when it allows a developer or agent to understand, run, and extend the citation-evidence system as an integrated product. + +A first useful version should make it possible to: + +* load a document collection, +* review a PDF, +* create an evidence item from selected text, +* link that evidence item to a structured form field, +* reopen the cited source context from the field, +* render the evidence as a citation card, +* understand which subsystem owns which part of the implementation. + +--- + +## Guiding Statement + +**citation-evidence exists to make source-backed information work navigable, reusable, and trustworthy.** + diff --git a/wiki/ArchitectureOverview.md b/wiki/ArchitectureOverview.md new file mode 100644 index 0000000..7edc98b --- /dev/null +++ b/wiki/ArchitectureOverview.md @@ -0,0 +1,1360 @@ +# Architecture Overview: citation-evidence + +## 1. Purpose + +This document describes the initial architecture for **citation-evidence**, a modular evidence workspace for capturing, managing, presenting, and reopening citations across PDFs and other document formats. + +The architecture is designed to support three primary product modes: + +1. **Document Review** — add documents to a collection, mark passages, comment on them, and create reusable evidence items. +2. **Evidence-Backed Forms** — display documents next to forms, bind evidence to fields, and navigate from field to cited source context. +3. 
**Citation Recovery** — start from an external citation, quote, or source clue, find the digital source if available, locate the cited passage, and create an annotation. + +The system should remain viewer-independent, format-neutral, and suitable for future agentic workflows. + +--- + +## 2. Architectural Summary + +At its core, **citation-evidence** separates five concerns: + +```text +Document Source + The original PDF, Markdown, HTML, web page, scan, or other document. + +Document Representation + A normalized, searchable, addressable representation derived from the source. + +Annotation Anchor + A durable technical reference to a passage inside a representation. + +Evidence Item + A meaningful evidence object built from one or more annotations and commentary. + +Evidence Binding + A connection between evidence and a structured target such as a form field, claim, requirement, or decision. +``` + +The high-level architecture is: + +```text +┌─────────────────────────────────────────────────────────────────────┐ +│ citation-evidence │ +│ Umbrella app, workspace shell, integration, demos, docs │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ citation-engine │ +│ Core domain model, APIs, persistence contracts, citation rendering │ +└─────────────────────────────────────────────────────────────────────┘ + │ + ┌───────────┼───────────────────────┬───────────────────────┐ + ▼ ▼ ▼ ▼ +┌───────────────┐ ┌────────────────┐ ┌────────────────┐ +│ evidence- │ │ evidence- │ │ citation- │ +│ source │ │ anchor │ │ work │ +│ ingestion, │ │ selectors, │ │ review UI, │ +│ extraction, │ │ resolving, │ │ collections, │ +│ recovery │ │ highlighting │ │ annotation UX │ +└───────────────┘ └────────────────┘ └────────────────┘ + │ │ │ + └───────────────┬───────┴───────────────┬───────┘ + ▼ ▼ + ┌────────────────┐ ┌────────────────┐ + │ evidence- │ │ viewer adapters │ + │ binder │ 
│ PDF / HTML / MD │ + │ field/claim │ │ and later more │ + │ evidence links │ │ │ + └────────────────┘ └────────────────┘ +``` + +--- + +## 3. Repository and Subsystem Boundaries + +### 3.1 citation-evidence + +**Role:** Umbrella product and integration repository. + +This repository ties the subsystem implementations together and provides the reference product experience. + +Responsibilities: + +* Workspace shell. +* Cross-subsystem integration. +* Reference web application. +* Demo scenarios. +* Product documentation. +* System-level tests. +* Example deployments. +* Developer onboarding. + +Should contain: + +```text +citation-evidence/ + README.md + INTENT.md + ARCHITECTURE.md + PRODUCT_REQUIREMENTS.md + apps/ + workspace-demo/ + docs/ + concepts/ + decisions/ + examples/ + integration-tests/ + docker-compose.yml +``` + +Should not contain: + +* The low-level anchoring algorithms. +* The complete document ingestion implementation. +* The full domain engine implementation. +* Viewer-specific internals except as integration examples. + +--- + +### 3.2 citation-engine + +**Role:** Core domain engine and service layer. + +This is the conceptual center of the system. It owns the stable domain model and the API contracts used by the other subsystems. + +Responsibilities: + +* Core domain model. +* Document, annotation, evidence, and binding APIs. +* Persistence interfaces. +* Citation card rendering contracts. +* Markdown and HTML export logic. +* W3C Web Annotation-compatible mapping. +* Event model. +* Orchestration between source, anchor, work, and binder subsystems. 
+ +Key concepts owned: + +```text +Document +DocumentRepresentation +Annotation +Selector +EvidenceItem +EvidenceLink +EvidenceSet +CitationCard +CitationRecoveryAttempt +``` + +Suggested package structure: + +```text +citation-engine/ + packages/ + model/ + api-contracts/ + persistence/ + citation-rendering/ + events/ + w3c-mapping/ + docs/ + tests/ +``` + +Primary interfaces: + +```ts +type DocumentId = string; +type AnnotationId = string; +type EvidenceItemId = string; +type EvidenceLinkId = string; + +interface CitationEngine { + documents: DocumentService; + annotations: AnnotationService; + evidence: EvidenceService; + bindings: EvidenceBindingService; + rendering: CitationRenderingService; +} +``` + +--- + +### 3.3 evidence-anchor + +**Role:** Format-neutral anchoring, selector resolution, and highlighting contract. + +This repository is responsible for making citations durable and reopenable. + +Responsibilities: + +* Selector model. +* Text quote selectors. +* Text position selectors. +* PDF page/rectangle selectors. +* DOM/structural selectors. +* Selector creation from user selections. +* Selector resolution against document representations. +* Fuzzy re-anchoring. +* Highlight rendering contract. +* Orphaned annotation detection. 
+ +Key architectural rule: + +**No citation should depend solely on a single visual coordinate system.** + +The subsystem should store redundant selectors where possible: + +```text +PDF citation: + - exact quote + - prefix/suffix + - page number + - normalized page rectangles + - page-local text offsets + - global canonical text offsets + +HTML/Markdown citation: + - exact quote + - prefix/suffix + - canonical text offsets + - DOM range or structural path + - heading/section context +``` + +Suggested package structure: + +```text +evidence-anchor/ + packages/ + selectors/ + resolver/ + fuzzy-match/ + highlight-contract/ + pdf-selectors/ + dom-selectors/ + docs/ + tests/ +``` + +Core interface: + +```ts +interface AnchorAdapter { // return types below are assumed from context + createSelectors(selection: SelectionCapture): Promise<Selector[]>; + resolveSelectors( + representation: DocumentRepresentation, + selectors: Selector[] + ): Promise<AnchorResolution>; + renderHighlight( + target: ResolvedAnchorTarget, + options?: HighlightRenderOptions + ): Promise<void>; + scrollToTarget( + target: ResolvedAnchorTarget, + options?: ScrollToTargetOptions + ): Promise<void>; +} +``` + +Resolution result: + +```ts +type AnchorResolution = { + status: "resolved" | "ambiguous" | "unresolved" | "stale"; + confidence: number; + candidates: ResolvedAnchorTarget[]; + usedSelectorTypes: string[]; + warnings?: string[]; +}; +``` + +--- + +### 3.4 evidence-source + +**Role:** Document ingestion, source metadata, full-text extraction, and citation recovery. + +This repository turns raw sources into usable document representations and supports the process of recovering cited passages from external references. + +Responsibilities: + +* Document import. +* Source URI handling. +* Metadata extraction. +* Fingerprinting. +* Text extraction. +* PDF text extraction pipeline. +* Markdown normalization. +* HTML normalization and sanitization. +* Optional OCR integration later. +* Local source matching. +* External source discovery hooks. +* Citation recovery attempts. 
+ +Suggested package structure: + +```text +evidence-source/ + packages/ + ingest-core/ + fingerprinting/ + metadata/ + extract-pdf/ + extract-markdown/ + extract-html/ + source-lookup/ + citation-recovery/ + docs/ + tests/ +``` + +Core ingestion pipeline: + +```text +Raw Source + → identify media type + → compute fingerprint + → extract metadata + → extract canonical text + → build format-specific maps + → persist Document + DocumentRepresentation +``` + +PDF representation should include: + +```text +page count +page text +global canonical text +page-local offset map +text item map +page dimensions +optional normalized rectangles for selections +``` + +Markdown/HTML representation should include: + +```text +canonical text +DOM or AST structure +heading map +offset-to-node map +source line map where available +sanitized render output +``` + +Citation recovery pipeline: + +```text +Citation clue / quote / reference + → parse clue + → search local library + → search configured external sources if allowed + → identify candidate documents + → extract/index candidate text + → exact quote search + → fuzzy quote search + → present candidates + → user confirms + → create annotation + evidence item +``` + +--- + +### 3.5 citation-work + +**Role:** Review workspace and annotation user experience. + +This repository provides the user-facing workflows for reviewing document collections and creating evidence from selected passages. + +Responsibilities: + +* Document collection UI. +* Review queue. +* Document viewer composition. +* Annotation creation UX. +* Evidence sidebar. +* Review state management. +* Tagging and filtering. +* Navigation between evidence items and source context. 
+ +Suggested package structure: + +```text +citation-work/ + packages/ + review-ui/ + collection-ui/ + evidence-sidebar/ + annotation-toolbar/ + viewer-shell/ + review-state/ + docs/ + tests/ +``` + +Core UI layout: + +```text +┌─────────────────────────────────────────────────────────────┐ +│ Collection / Review Header │ +├───────────────┬───────────────────────────────┬─────────────┤ +│ Document List │ Document Viewer │ Evidence │ +│ / Filters │ PDF / HTML / Markdown │ Sidebar │ +└───────────────┴───────────────────────────────┴─────────────┘ +``` + +Review states: + +```text +unreviewed +in-review +marked +relevant +rejected +needs-follow-up +cited +verified +``` + +Evidence states: + +```text +candidate +confirmed +rejected +needs-check +strong-support +weak-support +contradicts +``` + +--- + +### 3.6 evidence-binder + +**Role:** Binding evidence to structured targets such as form fields, claims, requirements, decisions, or document sections. + +This repository provides the graph-like layer between evidence and the things it supports. + +Responsibilities: + +* Evidence-to-field links. +* Evidence-to-claim links. +* Evidence-to-requirement links. +* Evidence sets. +* Relation types. +* Form synchronization state. +* Active field/evidence/annotation state. +* Visual guide model. +* Evidence completeness indicators. 
+ +Suggested package structure: + +```text +evidence-binder/ + packages/ + binding-model/ + form-evidence-state/ + evidence-switcher/ + visual-guide-overlay/ + target-adapters/ + docs/ + tests/ +``` + +Core model: + +```ts +type EvidenceTargetType = + | "form-field" + | "claim" + | "requirement" + | "decision" + | "document-section"; + +type EvidenceRelation = + | "supports" + | "contradicts" + | "explains" + | "source-for" + | "qualifies"; + +interface EvidenceLink { + id: string; + evidenceItemId: string; + targetType: EvidenceTargetType; + targetId: string; + relation: EvidenceRelation; + confidence?: number; + status?: "candidate" | "confirmed" | "rejected" | "needs-check"; +} +``` + +Evidence form UI model: + +```text +Form Field Activated + → evidence-binder loads linked EvidenceSet + → citation-engine resolves active evidence + → evidence-anchor scrolls document viewer to annotation + → visual-guide-overlay connects field, evidence card, and highlight +``` + +Visual guide architecture: + +```text +Element Registry + field target rect + evidence card rect + annotation highlight rect + +Guide Overlay + SVG line or curve from field to evidence card + SVG line or curve from evidence card to annotation + active state updates on scroll, resize, focus, and evidence switch +``` + +--- + +## 4. Core Domain Model + +### 4.1 Document + +A source object known to the system. + +```ts +interface Document { + id: string; + title?: string; + uri?: string; + mediaType: string; + fingerprint?: string; + version?: string; + createdAt: string; + updatedAt: string; + metadata?: Record<string, unknown>; // key/value types are assumed +} +``` + +### 4.2 DocumentRepresentation + +A normalized representation generated from a document source. 
+ +```ts +interface DocumentRepresentation { + id: string; + documentId: string; + representationType: + | "pdf-text" + | "html-dom" + | "markdown-rendered" + | "plain-text" + | "ocr-text"; + contentHash: string; + canonicalText?: string; + pageMap?: PageMap; + structureMap?: StructureMap; + offsetMap?: OffsetMap; + generatedAt: string; +} +``` + +### 4.3 Selector + +A technical locator for a document passage. + +```ts +type Selector = + | TextQuoteSelector + | TextPositionSelector + | PdfRectSelector + | DomRangeSelector + | StructuralSelector; +``` + +Recommended selector redundancy: + +```text +Always capture: + - exact quote + - prefix/suffix context + +Capture when available: + - canonical text offsets + - PDF page/rectangles + - DOM range + - structural path + - heading context +``` + +### 4.4 Annotation + +A technical mark on a document range. + +```ts +interface Annotation { + id: string; + documentId: string; + representationId?: string; + selectors: Selector[]; + quote?: string; + note?: string; + createdBy?: string; + createdAt: string; + updatedAt: string; +} +``` + +### 4.5 EvidenceItem + +A meaningful evidence object built from one or more annotations. + +```ts +interface EvidenceItem { + id: string; + annotationIds: string[]; + title?: string; + commentary?: string; + status: "candidate" | "confirmed" | "rejected" | "needs-check"; + confidence?: number; + tags?: string[]; + createdBy?: string; + createdAt: string; + updatedAt: string; +} +``` + +### 4.6 EvidenceSet + +A group of evidence items connected to a target or topic. + +```ts +interface EvidenceSet { + id: string; + label?: string; + targetType?: string; + targetId?: string; + evidenceItemIds: string[]; + activeEvidenceItemId?: string; +} +``` + +### 4.7 CitationCard + +A presentable rendering of an evidence item. 
+ +```ts +interface CitationCard { + id: string; + evidenceItemId: string; + quote: string; + sourceLabel: string; + commentary?: string; + openContextUrl?: string; + format: "html" | "markdown" | "web-component"; +} +``` + +--- + +## 5. Viewer Adapter Architecture + +The system must not hard-code one viewer implementation into the citation model. + +Each document format should be supported through a viewer adapter. + +```ts +interface DocumentViewerAdapter { // return types below are assumed from context + mediaTypes: string[]; + + load(document: Document, representation?: DocumentRepresentation): Promise<void>; + + getCurrentSelection(): Promise<SelectionCapture | null>; + + createSelectorsFromSelection( + selection: SelectionCapture + ): Promise<Selector[]>; + + resolveSelectors( + selectors: Selector[] + ): Promise<AnchorResolution>; + + scrollToResolvedTarget( + target: ResolvedAnchorTarget, + options?: { + center?: boolean; + behavior?: "auto" | "smooth"; + } + ): Promise<void>; + + renderHighlight( + target: ResolvedAnchorTarget, + options?: HighlightRenderOptions + ): Promise<void>; + + getHighlightClientRects( + annotationId: string + ): Promise<DOMRect[]>; +} +``` + +Initial adapters: + +```text +PDFViewerAdapter + PDF.js / react-pdf-highlighter-plus based + +HtmlViewerAdapter + sanitized HTML, DOM selection, DOM ranges + +MarkdownViewerAdapter + markdown → HTML rendering, DOM selection, optional source-map support +``` + +Future adapters: + +```text +DocxViewerAdapter +EpubViewerAdapter +ImageOcrViewerAdapter +PlainTextViewerAdapter +``` + +--- + +## 6. 
Data Flow: Document Review + +```text +User adds document + → evidence-source imports source + → document fingerprint is computed + → document metadata is extracted + → document representation is generated + → citation-engine stores Document and DocumentRepresentation + → citation-work displays document + → user selects passage + → viewer adapter captures selection + → evidence-anchor creates selectors + → citation-engine creates Annotation + → user adds commentary + → citation-engine creates EvidenceItem + → citation-work shows item in evidence sidebar +``` + +Result: + +```text +Document + Representation + Annotation + EvidenceItem +``` + +--- + +## 7. Data Flow: Reopen Citation Context + +```text +User clicks citation or evidence item + → citation-engine loads EvidenceItem + → citation-engine loads Annotation + → citation-engine loads Document and Representation + → viewer adapter opens document if needed + → evidence-anchor resolves selectors + → viewer adapter scrolls target into center + → viewer adapter renders highlight + → citation-work/evidence-binder shows active state +``` + +Resolution strategy: + +```text +1. Try exact representation/version match. +2. Try position selector. +3. Verify exact quote. +4. Try PDF page/rectangle selector if PDF. +5. Try text quote selector with prefix/suffix. +6. Try fuzzy quote matching. +7. If multiple matches, rank by structural/page context. +8. If unresolved, mark annotation as orphaned. +``` + +--- + +## 8. 
Data Flow: Evidence-Backed Form Field + +```text +User focuses form field + → evidence-binder identifies EvidenceSet for field + → evidence-binder selects active EvidenceItem + → citation-engine loads annotation and source context + → viewer adapter resolves and scrolls to annotation + → evidence sidebar highlights active evidence item + → form field shows active evidence state + → visual guide overlay connects field, evidence, and highlight +``` + +Evidence switch: + +```text +User selects next evidence item + → activeEvidenceItemId changes + → annotation is resolved + → viewer scrolls to new passage + → guide overlay updates +``` + +--- + +## 9. Data Flow: Citation Recovery + +```text +User enters citation clue / quote / source reference + → evidence-source parses clue + → search local document library + → rank local candidates + → if allowed, search configured external sources + → fetch/load candidate representation where permitted + → exact quote search + → fuzzy quote search + → show candidate passages + → user confirms passage + → evidence-anchor creates selectors + → citation-engine creates Annotation + → citation-engine creates EvidenceItem + → optional: evidence-binder links item to target +``` + +Recovery states: + +```text +source-found-fulltext +source-found-preview-only +source-found-metadata-only +source-not-found +quote-found +quote-not-found +manual-confirmation-needed +annotation-created +``` + +--- + +## 10. Persistence Architecture + +The architecture should support multiple persistence modes. + +### 10.1 Local-First Development Mode + +Suitable for early MVPs and personal use. + +```text +SQLite / DuckDB / local filesystem + documents stored as files + metadata stored in SQLite + extracted text cached locally + annotations stored as JSON or relational rows +``` + +Advantages: + +* Simple setup. +* Good for CLI and desktop-like workflows. +* Agent-friendly. +* Easy to version and inspect. 
+ +### 10.2 Web Application Mode + +Suitable for team or server deployment. + +```text +Object storage + original documents + +PostgreSQL + documents + representations + annotations + evidence items + evidence links + +Search index + full-text and quote search +``` + +Recommended baseline: + +```text +PostgreSQL + canonical metadata and relationships + +Object storage / filesystem + document blobs and generated representations + +Meilisearch / Typesense / OpenSearch + full-text document and evidence search +``` + +### 10.3 Persistence Boundaries + +`citation-engine` should define persistence interfaces. + +Concrete storage implementations should be replaceable. + +```ts +interface AnnotationRepository { // return types below are assumed from context + create(annotation: Annotation): Promise<Annotation>; + get(id: string): Promise<Annotation | null>; + listByDocument(documentId: string): Promise<Annotation[]>; + update(annotation: Annotation): Promise<Annotation>; +} +``` + +--- + +## 11. Search and Indexing Architecture + +Search is needed for: + +* Finding documents. +* Finding evidence items. +* Searching within a document. +* Citation recovery. +* Fuzzy re-anchoring. + +Index types: + +```text +Document metadata index + title, author, source URI, document type, collection + +Full-text document index + canonical text, page text, section text + +Evidence index + quote, commentary, tags, target links + +Anchor recovery index + n-grams, quote fragments, prefix/suffix context +``` + +For the MVP, local full-text search may be enough. + +Later, source recovery and large document collections will benefit from a dedicated search service. + +--- + +## 12. UI Architecture + +### 12.1 Review Workspace + +```text +┌─────────────────────────────────────────────────────────────┐ +│ Workspace Header │ +├───────────────┬───────────────────────────────┬─────────────┤ +│ Collection │ Document Viewer │ Evidence │ +│ Navigation │ │ Sidebar │ +└───────────────┴───────────────────────────────┴─────────────┘ +``` + +Primary interactions: + +* Select text. +* Create annotation. 
+* Add commentary. +* Tag evidence. +* Click evidence to reopen context. +* Filter by status/tag/document. + +### 12.2 Evidence Form Workspace + +```text +┌───────────────────────────────┬─────────────────────────────┐ +│ Structured Form │ Document Viewer │ +│ │ │ +│ Field A │ Active citation highlight │ +│ evidence chips │ │ +│ Field B │ │ +│ evidence chips │ │ +├───────────────────────────────┴─────────────────────────────┤ +│ Optional Evidence Tray / Active Citation Details │ +└───────────────────────────────────────────────────────────────┘ +``` + +Visual guide overlay: + +```text +field element → evidence chip/card → document highlight +``` + +The overlay should be independent from both the form renderer and document viewer. + +### 12.3 Citation Recovery Workspace + +```text +┌─────────────────────────────────────────────────────────────┐ +│ Citation / Quote Input │ +├──────────────────────┬──────────────────────────────────────┤ +│ Candidate Sources │ Candidate Passages │ +├──────────────────────┴──────────────────────────────────────┤ +│ Confirm / Create Annotation │ +└─────────────────────────────────────────────────────────────┘ +``` + +--- + +## 13. Event Model + +Subsystems should communicate through explicit domain events where useful. + +Examples: + +```text +DocumentImported +DocumentRepresentationGenerated +AnnotationCreated +AnnotationResolved +AnnotationResolutionFailed +EvidenceItemCreated +EvidenceItemLinked +EvidenceItemActivated +FormFieldActivated +CitationCardRendered +CitationRecoveryStarted +CitationRecoveryCandidateFound +CitationRecoveryConfirmed +``` + +Example event: + +```ts +interface EvidenceItemActivatedEvent { + type: "EvidenceItemActivated"; + evidenceItemId: string; + source?: "sidebar" | "form-field" | "citation-card"; + targetContext?: { + type: "form-field" | "claim" | "requirement"; + id: string; + }; +} +``` + +Events should be useful both for UI synchronization and later automation/agent workflows. + +--- + +## 14. 
External Standards and Compatibility + +The architecture should align with existing standards where practical. + +### 14.1 W3C Web Annotation + +Use W3C Web Annotation concepts for: + +* Annotation. +* Body. +* Target. +* Selector. +* TextQuoteSelector. +* TextPositionSelector. + +Recommended approach: + +```text +Internal model: + optimized for citation-evidence workflows + +Import/export mapping: + W3C Web Annotation-compatible JSON where practical +``` + +This avoids forcing JSON-LD complexity into every internal operation while preserving standards compatibility. + +### 14.2 Web Components + +Citation presentation should be embeddable through web components where possible: + +```html + + + +``` + +### 14.3 URL Deep Links + +The system should provide stable internal URLs such as: + +```text +/viewer?document=doc_123&annotation=ann_456 +/workspace/collections/col_123/documents/doc_123?evidence=ev_456 +``` + +For public HTML documents, optional browser text fragments may be generated as export aids, but should not be the only internal anchoring mechanism. + +--- + +## 15. Security Architecture + +Security principles: + +1. Treat imported documents as untrusted input. +2. Sanitize imported HTML. +3. Avoid executing document scripts. +4. Isolate document rendering where needed. +5. Do not send private document text to external services without explicit user permission. +6. Make external source lookup configurable. +7. Preserve access control boundaries around collections and documents. + +Important areas: + +```text +HTML sanitization +PDF processing safety +external URL fetching +object storage access +annotation visibility +collection permissions +agent/tool permissions +``` + +For MVP, single-user/local security is sufficient, but the model should not block later multi-user permissions. + +--- + +## 16. 
Suggested Initial Technical Stack + +### 16.1 Frontend + +```text +TypeScript +React for first application shell +PDF.js or react-pdf-highlighter-plus for PDF MVP +unified / remark / rehype for Markdown rendering +DOMPurify for HTML sanitization +SVG overlay for visual guides +CSS Custom Highlight API with fallback for HTML/Markdown highlighting +``` + +### 16.2 Backend / Local Service + +```text +Node.js or Python service for initial ingestion +PostgreSQL for server mode +SQLite for local-first mode +Filesystem or object storage for document blobs +Meilisearch or Typesense for search if needed early +``` + +### 16.3 Document Processing + +```text +PDF.js text extraction for browser-side PDF workflows +Apache Tika or similar for broader server-side extraction later +Tesseract OCR for scanned documents later +``` + +### 16.4 Packaging Direction + +Prefer TypeScript-first packages for the core web-facing model and UI integration. + +A later backend may be polyglot, but the browser-facing contracts should remain TypeScript-native. + +--- + +## 17. MVP Implementation Plan + +### Phase 1: Core Model and PDF Review + +Deliverables: + +* Basic `Document`, `Annotation`, `EvidenceItem` model. +* PDF viewer integration. +* Text selection capture. +* Highlight creation. +* Commentary entry. +* Evidence sidebar. +* Click evidence to reopen context. +* Markdown/HTML citation card export. + +Subsystems involved: + +```text +citation-engine +citation-work +evidence-anchor +evidence-source +citation-evidence +``` + +### Phase 2: Evidence Binding and Form Mode + +Deliverables: + +* Simple form definition model. +* Evidence links to form fields. +* Evidence chips on fields. +* Activate field to focus evidence. +* Evidence switcher. +* Active state synchronization. +* Initial SVG visual guide overlay. 
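The Phase 2 deliverables above (evidence links to form fields, field activation, evidence switcher, active state synchronization) can be sketched as a small piece of framework-neutral state logic. This is a minimal sketch under stated assumptions, not the project's model: the names `EvidenceLink`, `FieldEvidenceState`, `evidenceForField`, `activateField`, and `switchEvidence` are hypothetical and do not appear elsewhere in this document.

```typescript
// Hypothetical shapes for illustration only; not the project's actual model.
interface EvidenceLink {
  evidenceItemId: string;
  fieldId: string;
}

interface FieldEvidenceState {
  activeFieldId: string | null;
  activeEvidenceItemId: string | null;
}

// Evidence items linked to one field, in link order.
function evidenceForField(links: EvidenceLink[], fieldId: string): string[] {
  return links.filter((l) => l.fieldId === fieldId).map((l) => l.evidenceItemId);
}

// Activating a field focuses its first linked evidence item, if any.
function activateField(links: EvidenceLink[], fieldId: string): FieldEvidenceState {
  const items = evidenceForField(links, fieldId);
  return {
    activeFieldId: fieldId,
    activeEvidenceItemId: items.length > 0 ? items[0] : null,
  };
}

// The evidence switcher steps through the active field's items, wrapping at both ends.
function switchEvidence(
  links: EvidenceLink[],
  state: FieldEvidenceState,
  step: 1 | -1
): FieldEvidenceState {
  if (state.activeFieldId === null) return state;
  const items = evidenceForField(links, state.activeFieldId);
  if (items.length === 0) return { ...state, activeEvidenceItemId: null };
  const current =
    state.activeEvidenceItemId === null ? -1 : items.indexOf(state.activeEvidenceItemId);
  // If nothing valid is active yet, "next" lands on the first item and "previous" on the last.
  const base = current === -1 ? (step === 1 ? -1 : 0) : current;
  const next = (base + step + items.length) % items.length;
  return { ...state, activeEvidenceItemId: items[next] };
}
```

Keeping this state outside any UI framework would let the form renderer, the evidence chips, and the document viewer all subscribe to one source of truth for what is active.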
+ +Subsystems involved: + +```text +evidence-binder +citation-engine +citation-work +evidence-anchor +citation-evidence +``` + +### Phase 3: Markdown and HTML Documents + +Deliverables: + +* Markdown rendering adapter. +* HTML rendering adapter. +* DOM text selection capture. +* Text quote and text position selectors. +* Highlighting in non-paginated documents. +* Reuse evidence sidebar and binding workflows. + +Subsystems involved: + +```text +evidence-source +evidence-anchor +citation-work +citation-engine +``` + +### Phase 4: Local Citation Recovery + +Deliverables: + +* Recovery input UI. +* Local document search. +* Exact quote match. +* Fuzzy quote match. +* Candidate passage confirmation. +* Create annotation from confirmed match. + +Subsystems involved: + +```text +evidence-source +evidence-anchor +citation-engine +citation-work +``` + +--- + +## 18. Architectural Decisions to Make Early + +### ADR-001: Internal model vs. native W3C Web Annotation + +Recommendation: + +Use an internal model optimized for citation-evidence, with W3C-compatible import/export mapping. + +Reason: + +The product needs evidence binding, form synchronization, recovery states, and citation cards, which go beyond the basic web annotation model. + +### ADR-002: React-first vs. Web-component-first + +Recommendation: + +Build the first application in React, but keep core model and viewer adapter contracts framework-neutral. Add web components for citation cards and context links early. + +Reason: + +React accelerates MVP UI development, while framework-neutral contracts protect reuse. + +### ADR-003: Local-first vs. server-first storage + +Recommendation: + +Design persistence interfaces from the beginning. Implement local-first storage first if the target is personal/agentic workflows; implement PostgreSQL-backed storage when collaboration or server deployment becomes necessary. + +### ADR-004: PDF.js direct vs. 
react-pdf-highlighter-plus + +Recommendation: + +Use react-pdf-highlighter-plus for initial speed if it satisfies selector and rendering needs. Keep an abstraction boundary so the PDF viewer can be replaced with direct PDF.js integration later. + +### ADR-005: Citation recovery scope + +Recommendation: + +Start with local document library recovery. Add external source lookup only after the local anchoring and quote matching pipeline is reliable. + +--- + +## 19. Risks and Mitigations + +| Risk | Impact | Mitigation | +| ---------------------------------------------------- | -----: | --------------------------------------------------------------- | +| PDF text extraction is inconsistent across documents | High | Store both visual and text selectors; support manual correction | +| Highlight coordinates break with zoom/layout | High | Use normalized coordinates and viewer-independent selectors | +| Imported HTML executes unsafe content | High | Sanitize and sandbox HTML rendering | +| Citation recovery finds wrong passage | Medium | Require user confirmation for fuzzy or external matches | +| Too many repos create coordination overhead | Medium | Keep domain model in citation-engine and define clear contracts | +| Viewer library constraints leak into domain model | High | Enforce adapter boundary and selector abstraction | +| Form binding becomes too domain-specific | Medium | Model generic EvidenceTargets and target adapters | +| Search/indexing becomes heavy too early | Medium | Begin local/simple; add dedicated search service later | + +--- + +## 20. First Reference Scenario + +The first end-to-end reference scenario should be: + +```text +1. User creates a collection named “Application Evidence”. +2. User uploads a PDF. +3. User selects a passage and adds commentary. +4. System creates an annotation and evidence item. +5. User opens a form next to the PDF. +6. User links the evidence item to a form field. +7. User focuses the field. +8. 
System highlights the field, evidence item, and source passage. +9. System draws a guide from field to evidence to source passage. +10. User exports the evidence as a Markdown citation card. +``` + +This scenario exercises the essential product value without requiring external source lookup or advanced collaboration. + +--- + +## 21. Summary + +The architecture of **citation-evidence** should be organized around reusable evidence objects, not only document annotations. + +The core design is: + +```text +Source Document + → Document Representation + → Durable Annotation Anchor + → Evidence Item with Commentary + → Evidence Link to Field / Claim / Requirement + → Portable Citation Card + → Reopenable Source Context +``` + +The subsystem repositories provide a clean separation of responsibilities: + +```text +citation-engine owns the domain and APIs +evidence-anchor owns selector creation, resolution, and highlighting +evidence-source owns ingestion, extraction, and recovery +citation-work owns review workflows +evidence-binder owns evidence-to-target binding +citation-evidence owns the integrated product shell +``` + +This gives the project a practical MVP path while preserving enough architectural clarity to grow into a reusable infrastructure layer for evidence-backed information work. + diff --git a/wiki/ProductRequirementsDocument.md b/wiki/ProductRequirementsDocument.md new file mode 100644 index 0000000..ce00489 --- /dev/null +++ b/wiki/ProductRequirementsDocument.md @@ -0,0 +1,527 @@ +# Product Requirements Document: citation-evidence + +## 1. Definition + +**citation-evidence** is a document-centered evidence workspace for capturing, managing, presenting, and re-opening citations with contextual commentary across PDFs and other document formats. 
+ +The product enables users to review collections of documents, mark passages, attach commentary, bind evidence to structured targets such as form fields or claims, and later reopen the cited document context with the cited passage highlighted and centered in the viewport. + +It is designed as an umbrella project coordinating a set of focused subsystem repositories: + +| Repository | Role | +| --------------------- | ------------------------------------------------------------------------------------------------ | +| **citation-evidence** | Umbrella product, integration layer, workspace shell, documentation, reference deployment | +| **evidence-anchor** | Format-neutral anchoring, selector, highlighting, and re-anchoring mechanisms | +| **citation-work** | Review workspace for document collections, annotation workflows, and citation creation | +| **evidence-source** | Document ingestion, source discovery, metadata, full-text extraction, and citation recovery | +| **evidence-binder** | Binding of evidence items to form fields, claims, requirements, decisions, and document sections | +| **citation-engine** | Core domain model, APIs, storage model, citation rendering, export, and orchestration logic | + +--- + +## 2. Context + +Many workflows require information to be extracted from documents and justified with precise evidence. Typical examples include legal review, compliance documentation, procurement processes, academic research, product documentation, requirements engineering, grant applications, audits, and structured form submission. + +Current document viewers, PDF annotators, and citation tools often treat these needs separately: + +* PDF viewers display and annotate documents but do not provide durable, reusable evidence objects. +* Citation managers track bibliographic references but often do not preserve exact document context and commentary. +* Form systems collect structured information but rarely maintain traceable evidence links to source passages. 
+* Web annotation tools can mark documents but are not usually optimized for evidence-backed form filling or structured claim support. + +**citation-evidence** addresses this gap by treating citations as reusable evidence objects that connect document passages to structured targets and can be rendered in other contexts. + +--- + +## 3. Product Vision + +**citation-evidence enables evidence-backed information work by making cited document context reusable, navigable, and structurally linkable.** + +The product should allow a user to move smoothly from reading and marking documents to using those marked passages as evidence for forms, claims, reports, and web pages. + +A citation should not be a dead reference. It should be an actionable bridge back to the source context. + +--- + +## 4. Goals + +### 4.1 Primary Goals + +1. Allow users to add documents to a review collection and capture highlighted citations with commentary. +2. Allow citations to be stored as durable evidence objects independent of one specific viewer implementation. +3. Allow a citation to reopen the source document with the cited passage highlighted and centered. +4. Support both paginated documents such as PDFs and non-paginated documents such as Markdown and HTML. +5. Allow evidence citations to be linked to form fields and other structured targets. +6. Allow users to switch between multiple evidence items connected to a field, claim, or requirement. +7. Provide reusable citation presentation components for webpages, reports, and other documents. +8. Provide a path to citation recovery from bibliographic references, quotes, or partial source descriptions. + +### 4.2 Secondary Goals + +1. Support a modular repository architecture with clear subsystem responsibilities. +2. Use open standards where practical, especially W3C-style web annotation concepts. +3. Reuse mature open-source document viewing, parsing, and annotation components where appropriate. +4. 
Support future collaboration features such as review status, shared collections, and evidence validation. +5. Support future agentic workflows for document review, quote matching, source discovery, and form assistance. + +--- + +## 5. Non-Goals + +The first product version shall not attempt to solve all document management problems. + +The following are explicitly out of scope for the initial version: + +1. Full enterprise document management. +2. Complete bibliographic reference management comparable to Zotero or Mendeley. +3. Legal-grade digital signature workflows. +4. General-purpose PDF editing. +5. Full OCR correction workflow for scanned documents. +6. Automated truth verification of evidence. +7. Fully automatic citation recovery without human confirmation. +8. Real-time multi-user collaborative editing in the first iteration. + +These may become future capabilities but should not burden the MVP. + +--- + +## 6. Target Users + +### 6.1 Primary Users + +| User Type | Description | Core Need | +| ----------------------------- | ------------------------------------------------------------ | -------------------------------------- | +| Researcher / Analyst | Reviews many documents and extracts relevant evidence | Capture, organize, and reuse citations | +| Form Worker / Case Processor | Fills structured forms based on document evidence | Link form fields to source passages | +| Consultant / Knowledge Worker | Produces reports, memos, or structured recommendations | Export evidence-backed citation cards | +| Compliance / Audit Worker | Needs traceable evidence for claims or submitted information | Maintain source-backed audit trail | +| Product / Requirements Worker | Maps source material to requirements or decisions | Bind evidence to claims and artifacts | + +### 6.2 Secondary Users + +| User Type | Description | Core Need | +| ----------------- | ----------------------------------------------------- | --------------------------------------------- | +| 
Developer | Integrates citation-evidence into another application | APIs, web components, stable data model | +| Reviewer | Checks whether evidence supports a field or claim | Efficient navigation between claim and source | +| Agentic Assistant | Helps search, suggest, or classify evidence | Machine-readable domain model and APIs | + +--- + +## 7. Primary Use Cases + +### 7.1 Document Collection Review + +A user creates a collection of documents, reviews them, highlights passages, adds commentary, and stores the marked passages as evidence items for later use. + +#### User Story + +As a user, I want to add documents to a review collection and mark relevant passages with commentary, so that I can later find, reuse, and cite those passages. + +#### Functional Expectations + +* The user can create or open a document collection. +* The user can add PDFs, Markdown pages, HTML documents, and later other formats. +* The system displays the document in an appropriate viewer. +* The user can select text and create an annotation. +* The user can add commentary to the annotation. +* The system stores quote text, source metadata, selectors, and commentary. +* The user can browse evidence items collected from the documents. +* The user can click an evidence item and return to the exact document context. + +--- + +### 7.2 Evidence-Backed Form Filling + +A user displays a document next to a form. Form fields can be linked to evidence items. Activating a field opens or focuses the relevant citation context in the document viewer. + +#### User Story + +As a user, I want to fill a form while viewing source documents, so that each important field can be backed by precise document evidence. + +#### Functional Expectations + +* The user can display a structured form next to a document viewer. +* The user can link an annotation or evidence item to a form field. +* A form field can have zero, one, or multiple linked evidence items. +* Activating a field displays the linked evidence list. 
+* Activating a field focuses the document viewer on the currently selected evidence item. +* The cited text is highlighted and centered in the viewport where possible. +* The UI provides a visual guide from form field to evidence item to source highlight. +* The user can switch between multiple evidence items connected to the same field. +* The system can indicate whether a required field has sufficient evidence. + +--- + +### 7.3 Citation Recovery + +A user provides a citation, quote, or bibliographic clue. The system searches local and possibly online sources for the cited work, locates the passage, and allows the user to create an annotation from the recovered context. + +#### User Story + +As a user, I want to provide an external citation or quote and have the system find the source passage when available, so that I can turn a dead reference into a navigable citation annotation. + +#### Functional Expectations + +* The user can enter a citation, quote, bibliographic reference, DOI, URL, title, author, page reference, or partial source description. +* The system searches the local document library first. +* The system may search configured external sources where allowed. +* The system identifies candidate documents. +* The system searches for exact and fuzzy quote matches. +* The system presents candidate passages for confirmation. +* The user confirms the correct passage. +* The system creates a document reference, annotation, and evidence item. +* The system records unresolved or partially resolved citation recovery attempts. + +--- + +## 8. Functional Requirements + +### 8.1 Document Library and Collection Management + +| ID | Requirement | Priority | +| ------ | --------------------------------------------------------------------------------------------------------------------------- | -------- | +| FR-001 | The system shall allow users to create document collections. | Must | +| FR-002 | The system shall allow users to add documents to a collection. 
| Must | +| FR-003 | The system shall store document metadata including title, source URI, media type, fingerprint, and version where available. | Must | +| FR-004 | The system shall distinguish between original document source and generated document representations. | Must | +| FR-005 | The system shall support filtering and searching documents within a collection. | Should | +| FR-006 | The system shall support review status per document. | Should | + +### 8.2 Document Viewing + +| ID | Requirement | Priority | +| ------ | ---------------------------------------------------------------------------------------------------- | -------- | +| FR-010 | The system shall display PDF documents in a browser-based viewer. | Must | +| FR-011 | The system shall display Markdown documents as rendered HTML. | Must | +| FR-012 | The system shall display HTML documents in a normalized/sandboxed view. | Must | +| FR-013 | The system shall provide a common viewer adapter interface across document formats. | Must | +| FR-014 | The system shall support scrolling a document to a resolved annotation target. | Must | +| FR-015 | The system shall support centering the annotation target in the viewport where technically possible. | Must | +| FR-016 | The system shall support virtualized rendering for large documents where appropriate. | Should | + +### 8.3 Annotation and Anchoring + +| ID | Requirement | Priority | +| ------ | ----------------------------------------------------------------------------------------------------------- | -------- | +| FR-020 | The system shall allow users to select text and create an annotation. | Must | +| FR-021 | The system shall capture the exact selected text. | Must | +| FR-022 | The system shall capture prefix and suffix context for robust re-anchoring. | Must | +| FR-023 | The system shall capture text position selectors where available. 
| Must | +| FR-024 | The system shall capture PDF page and normalized rectangle selectors for PDF documents where available. | Must | +| FR-025 | The system shall support DOM or structural selectors for HTML and Markdown representations where available. | Should | +| FR-026 | The system shall support fuzzy re-anchoring when exact selectors fail. | Should | +| FR-027 | The system shall identify unresolved or orphaned annotations. | Should | + +### 8.4 Commentary and Evidence Items + +| ID | Requirement | Priority | +| ------ | ---------------------------------------------------------------------------------------------- | -------- | +| FR-030 | The system shall allow users to add commentary to an annotation. | Must | +| FR-031 | The system shall create evidence items based on one or more annotations. | Must | +| FR-032 | The system shall allow evidence items to have status, tags, confidence, and commentary. | Should | +| FR-033 | The system shall show evidence items in a sidebar or evidence panel. | Must | +| FR-034 | The system shall allow users to navigate from an evidence item to the source document context. | Must | +| FR-035 | The system shall support evidence items that support, contradict, explain, or source a target. | Should | + +### 8.5 Evidence Binding + +| ID | Requirement | Priority | +| ------ | ------------------------------------------------------------------------------------------------------------ | -------- | +| FR-040 | The system shall allow evidence items to be linked to form fields. | Must | +| FR-041 | The system shall support multiple evidence items per form field. | Must | +| FR-042 | The system shall allow users to switch between evidence items linked to a field. | Must | +| FR-043 | The system shall allow evidence items to be linked to claims, requirements, decisions, or document sections. | Should | +| FR-044 | The system shall indicate whether a field has no evidence, candidate evidence, or confirmed evidence. 
| Should | +| FR-045 | The system shall support relation types such as supports, contradicts, explains, and source-for. | Should | + +### 8.6 Evidence Form UI + +| ID | Requirement | Priority | +| ------ | ------------------------------------------------------------------------------------------------------- | -------- | +| FR-050 | The system shall display a form next to a document viewer. | Must | +| FR-051 | The system shall focus linked evidence when a form field is activated. | Must | +| FR-052 | The system shall visually identify the active form field, evidence item, and document annotation. | Must | +| FR-053 | The system shall provide a visual guide connecting form field, evidence item, and annotation highlight. | Should | +| FR-054 | The system shall support keyboard navigation between evidence items. | Should | +| FR-055 | The system shall support evidence chips or indicators near form fields. | Should | + +### 8.7 Citation Presentation and Export + +| ID | Requirement | Priority | +| ------ | -------------------------------------------------------------------------------------- | -------- | +| FR-060 | The system shall render evidence items as citation cards. | Must | +| FR-061 | Citation cards shall include quote, source label, commentary, and open-context action. | Must | +| FR-062 | The system shall export citation cards as HTML. | Must | +| FR-063 | The system shall export citation cards as Markdown. | Must | +| FR-064 | The system shall support configurable citation display styles. | Should | +| FR-065 | The system shall support embedding citation cards as web components. | Should | + +### 8.8 Citation Recovery + +| ID | Requirement | Priority | +| ------ | ------------------------------------------------------------------------------------- | -------- | +| FR-070 | The system shall allow users to enter a citation, quote, or source clue for recovery. | Should | +| FR-071 | The system shall search local documents for matching sources and quotes. 
| Should | +| FR-072 | The system shall support exact quote matching. | Should | +| FR-073 | The system shall support fuzzy quote matching. | Should | +| FR-074 | The system shall present candidate matches for user confirmation. | Should | +| FR-075 | The system may search configured external sources for digitally available documents. | Could | +| FR-076 | The system shall record unsuccessful recovery attempts. | Could | + +### 8.9 APIs and Integration + +| ID | Requirement | Priority | +| ------ | --------------------------------------------------------------------------------------------- | -------- | +| FR-080 | The system shall expose APIs for documents, annotations, evidence items, and evidence links. | Must | +| FR-081 | The system shall support a reusable web component or frontend component model. | Must | +| FR-082 | The system shall allow external systems to open a document viewer at a specific citation. | Must | +| FR-083 | The system shall support import/export of W3C Web Annotation-compatible data where practical. | Should | +| FR-084 | The system shall expose machine-readable structures suitable for agentic workflows. | Should | + +--- + +## 9. Non-Functional Requirements + +### 9.1 Performance + +| ID | Requirement | Priority | +| ------- | --------------------------------------------------------------------------------------------- | -------- | +| NFR-001 | The viewer should open common documents with acceptable latency for interactive review. | Must | +| NFR-002 | Large PDFs should be rendered lazily or virtually where possible. | Should | +| NFR-003 | Citation navigation should feel immediate after the document representation has been indexed. | Should | +| NFR-004 | Text extraction and indexing should be cacheable by document fingerprint. 
| Must | + +### 9.2 Reliability + +| ID | Requirement | Priority | +| ------- | -------------------------------------------------------------------------------- | -------- | +| NFR-010 | Citations shall remain stable across zoom, resize, and viewport changes. | Must | +| NFR-011 | The system shall detect when a citation can no longer be resolved. | Should | +| NFR-012 | The system shall provide fallback resolution strategies. | Should | +| NFR-013 | The system shall preserve original quote text even when source resolution fails. | Must | + +### 9.3 Security and Privacy + +| ID | Requirement | Priority | +| ------- | ---------------------------------------------------------------------------------------------------- | -------- | +| NFR-020 | The system shall avoid executing unsafe HTML content from imported documents. | Must | +| NFR-021 | The system shall support access control boundaries around document collections. | Should | +| NFR-022 | The system shall make external source lookup configurable and explicit. | Should | +| NFR-023 | The system shall avoid leaking private document text to external services unless explicitly allowed. | Must | + +### 9.4 Extensibility + +| ID | Requirement | Priority | +| ------- | --------------------------------------------------------------------------- | -------- | +| NFR-030 | The system shall allow additional document formats through viewer adapters. | Must | +| NFR-031 | The system shall allow additional selector types. | Should | +| NFR-032 | The system shall allow custom evidence target types beyond form fields. | Should | +| NFR-033 | The system shall allow custom citation card renderers. | Should | + +### 9.5 Usability + +| ID | Requirement | Priority | +| ------- | ---------------------------------------------------------------------------------------- | -------- | +| NFR-040 | Users should be able to create a citation with minimal interaction after selecting text. 
| Must | +| NFR-041 | Users should be able to understand which source passage supports which form field. | Must | +| NFR-042 | Users should be able to switch between evidence items without losing form context. | Must | +| NFR-043 | Users should be warned when source matching is uncertain. | Should | + +--- + +## 10. Subsystem Responsibilities + +### 10.1 citation-evidence + +Umbrella product and integration repository. + +Responsibilities: + +* Product documentation. +* Reference workspace application. +* Integration of subsystem packages. +* Demo deployments. +* Cross-subsystem test scenarios. +* Overall product shell and navigation. + +### 10.2 evidence-anchor + +Format-neutral anchoring and highlight resolution. + +Responsibilities: + +* Selector model. +* Text quote selectors. +* Text position selectors. +* PDF rectangle selectors. +* DOM/structural selectors. +* Anchor resolution. +* Re-anchoring strategies. +* Highlight rendering contract. + +### 10.3 citation-work + +Document review workspace. + +Responsibilities: + +* Document collection UI. +* Review workflow. +* Annotation capture UX. +* Evidence sidebar. +* Review status. +* Collection navigation. + +### 10.4 evidence-source + +Document source ingestion and recovery. + +Responsibilities: + +* Document import. +* Metadata extraction. +* Fingerprinting. +* Text extraction. +* Source lookup. +* Local source matching. +* External source discovery hooks. +* Citation recovery workflows. + +### 10.5 evidence-binder + +Binding evidence to structured targets. + +Responsibilities: + +* Evidence-to-field links. +* Evidence-to-claim links. +* Evidence sets. +* Relation types. +* Evidence status and confidence. +* Form synchronization state. +* Visual guide model. + +### 10.6 citation-engine + +Core domain engine and service layer. + +Responsibilities: + +* Domain model. +* API contracts. +* Persistence interfaces. +* Citation card rendering. +* Export to Markdown/HTML. +* W3C-compatible annotation mapping. 
+* Cross-subsystem orchestration. + +--- + +## 11. Suggested MVP Scope + +### MVP A: PDF Review and Citation Cards + +Must include: + +* Add PDF to collection. +* Display PDF. +* Select text. +* Create annotation with commentary. +* Store selectors and quote. +* Show evidence sidebar. +* Click evidence to reopen context. +* Export citation card as Markdown or HTML. + +### MVP B: Evidence-Backed Form Mode + +Must include: + +* Simple form definition. +* Side-by-side form and document viewer. +* Link evidence to form field. +* Activate field to focus evidence. +* Switch evidence for field. +* Show visual state for active field/evidence/highlight. + +### MVP C: Markdown/HTML Document Support + +Must include: + +* Render Markdown and HTML sources. +* Select text in rendered document. +* Create annotation using text quote and position selectors. +* Reopen and highlight selected passage. +* Reuse same evidence sidebar and citation card logic. + +### MVP D: Local Citation Recovery + +Should include: + +* Paste quote or citation clue. +* Search local indexed documents. +* Show candidate matches. +* Confirm passage. +* Create annotation and evidence item. + +--- + +## 12. Acceptance Criteria + +The first usable version is acceptable when: + +1. A user can create a document collection and add at least one PDF. +2. A user can select text in the PDF and create an evidence item with commentary. +3. A user can leave the citation and later reopen the document with the cited passage highlighted and centered. +4. A user can display a form next to the document viewer. +5. A user can link an evidence item to a form field. +6. Activating the form field focuses the relevant evidence and document context. +7. A user can export the citation as a reusable Markdown or HTML citation card. +8. The internal model does not depend on one specific viewer library. + +--- + +## 13. Open Questions + +1. 
Should the first implementation be React-first, web-component-first, or headless-core-first with adapters? +2. Should the storage model initially use local files, SQLite, PostgreSQL, or browser storage? +3. Should W3C Web Annotation JSON-LD be the native internal model or an import/export mapping? +4. Should form definitions be JSON Schema-based, custom, or adapter-based? +5. Should citation recovery start with local library search only, or include external web/source lookup from the beginning? +6. How much of the document text index should be persisted versus regenerated from source? +7. Should the system support multi-user collaboration early or remain single-user/local-first initially? +8. What is the minimum viable visual guide for field-to-evidence-to-highlight navigation? + +--- + +## 14. Initial Architecture Direction + +The system should be built around a headless citation and evidence core with viewer-specific adapters. + +Key architectural principles: + +1. **Viewer independence**: citations must not depend on one viewer implementation. +2. **Selector redundancy**: store multiple selector types for durable resolution. +3. **Evidence as first-class object**: evidence is more than an annotation; it can support fields, claims, and decisions. +4. **Format neutrality**: PDFs, Markdown, and HTML should share the same evidence model. +5. **Human confirmation**: uncertain source recovery and fuzzy matching should require user confirmation. +6. **Portable presentation**: citation cards should render in web pages, Markdown, and later reports. +7. **Agent readiness**: document, annotation, evidence, and binding structures should be machine-readable and API-accessible. + +--- + +## 15. Glossary + +| Term | Definition | +| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------- | +| Annotation | A technical mark or comment attached to a specific document range. 
| +| Citation | A reusable reference to source context, usually including quote, source, and link. | +| Evidence Item | A meaningful evidence object based on one or more annotations and usable in support of a field, claim, requirement, or decision. | +| Evidence Link | A relationship between an evidence item and a structured target. | +| Selector | A technical description of how to locate a passage within a document. | +| Re-anchoring | The process of resolving a citation again after layout or document changes. | +| Citation Card | A presentable rendering of a citation and commentary. | +| Document Representation | A normalized text, page, DOM, or structural representation generated from a document source. | +| Evidence Set | A group of evidence items connected to the same target or topic. | +| Citation Recovery | The process of finding and anchoring a cited passage from a quote or bibliographic clue. | +
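The glossary terms Selector and Re-anchoring, together with the selector-redundancy principle in section 14, can be illustrated with a small resolution routine. This is a minimal sketch, assuming a plain-text document representation and hypothetical names (`ResolvedRange`, `reanchor`); the selector shapes loosely follow W3C Web Annotation naming but are not the project's actual model. It trusts stored offsets only when they still point at the exact quote, then falls back to a quote search disambiguated by prefix and suffix.

```typescript
// Hypothetical selector shapes, loosely following W3C Web Annotation naming.
interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface ResolvedRange {
  start: number;
  end: number;
  via: "position" | "quote";
  // False when the quote was found but its stored context did not match.
  contextMatched: boolean;
}

// Re-anchor a passage against the current document text.
function reanchor(
  text: string,
  quote: TextQuoteSelector,
  position?: TextPositionSelector
): ResolvedRange | null {
  // 1. Use the stored offsets only if they still yield the exact quote.
  if (position && text.slice(position.start, position.end) === quote.exact) {
    return { start: position.start, end: position.end, via: "position", contextMatched: true };
  }
  if (quote.exact.length === 0) return null;

  // 2. Scan every occurrence of the quote; prefer one whose context matches.
  let firstHit: ResolvedRange | null = null;
  let from = text.indexOf(quote.exact);
  while (from !== -1) {
    const end = from + quote.exact.length;
    const prefixOk = quote.prefix === undefined || text.slice(0, from).endsWith(quote.prefix);
    const suffixOk = quote.suffix === undefined || text.slice(end).startsWith(quote.suffix);
    if (prefixOk && suffixOk) {
      return { start: from, end, via: "quote", contextMatched: true };
    }
    if (firstHit === null) {
      firstHit = { start: from, end, via: "quote", contextMatched: false };
    }
    from = text.indexOf(quote.exact, from + 1);
  }

  // 3. Uncertain match (or null for an orphaned annotation); callers should ask for confirmation.
  return firstHit;
}
```

A result with `contextMatched: false` corresponds to the uncertain case that the architecture principles say should be confirmed by the user; fuzzy matching would be a further fallback beyond this sketch.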