# INTENT

## Purpose

This repository exists to provide the umbrella product, integration shell, and reference implementation for **citation-evidence**.

**citation-evidence** is a document-centered evidence workspace for capturing, managing, presenting, and reopening citations with contextual commentary across PDFs, Markdown, HTML, and other document formats.

The project enables users to turn source passages into reusable evidence objects that can support form fields, claims, requirements, decisions, reports, and web publications.

A citation should not be a dead reference. It should be an actionable bridge back to the source context.

---

## Primary Utility

The repository provides the integrated workspace and coordination layer for the citation-evidence system.

It brings together the subsystem projects:

- **citation-engine** — core domain model, APIs, persistence contracts, citation rendering, and orchestration
- **evidence-anchor** — durable selectors, anchoring, re-anchoring, and highlight resolution
- **evidence-source** — document ingestion, text extraction, source metadata, and citation recovery
- **citation-work** — document collection review, annotation workflow, and evidence sidebar UX
- **evidence-binder** — linking evidence to form fields, claims, requirements, decisions, and other structured targets

The umbrella repository exists to demonstrate and validate how these subsystems work together as one coherent product.

---

## Intended Users

Primary users are people and systems that do evidence-backed information work:

- researchers and analysts reviewing document collections
- form workers and case processors who need source-backed field entries
- consultants and knowledge workers producing evidence-backed reports
- compliance, audit, procurement, and legal-adjacent workers who need traceable justification
- product and requirements workers linking source material to structured decisions
- developers integrating citation-evidence capabilities into other applications
- agentic assistants helping users search, extract, bind, and present evidence

---

## Strategic Role

The strategic role of **citation-evidence** is to establish a reusable infrastructure layer for **evidence-backed information spaces**.

It connects three activities that are often handled separately:

1. reading and annotating documents,
2. extracting reusable citations and commentary,
3. binding evidence to structured outputs such as forms, claims, requirements, reports, and web pages.

The project should become a foundation for workflows where information must remain traceable to its source context.

---

## Core Concept

The central flow of the system is:

```text
Source Document
→ Document Representation
→ Durable Annotation Anchor
→ Evidence Item with Commentary
→ Evidence Link to Field / Claim / Requirement
→ Portable Citation Card
→ Reopenable Source Context
```

The system treats an evidence item as more than a highlight. An evidence item is a reusable object that can:

* quote a source passage,
* preserve commentary,
* reopen the source context,
* support or contradict a structured target,
* be exported into another document or webpage,
* be reused by humans and software agents.

---

## Scope

This repository owns the integrated product scope.

It should contain:

* product documentation
* architecture documentation
* integration scenarios
* reference workspace application
* cross-subsystem examples
* demo data and test workflows
* deployment sketches
* system-level acceptance tests
* onboarding material for developers and agents

It should coordinate the subsystem repositories without absorbing their responsibilities.

---

## Out of Scope

This repository should not become the implementation home for all subsystem internals.

Specifically, it should not own:

* low-level selector and re-anchoring algorithms
* full document ingestion and extraction pipelines
* the complete persistence implementation
* all viewer-specific internals
* all form-binding logic
* all citation rendering logic

Those responsibilities belong in the focused subsystem repositories. The umbrella repository should integrate, validate, and demonstrate them.

---

## Initial Product Modes

The integrated product should support three primary modes.

### 1. Document Review

Users add documents to a collection, review them, highlight relevant passages, add commentary, and create reusable evidence items.

### 2. Evidence-Backed Forms

Users display source documents next to structured forms. Form fields can be linked to evidence items. Activating a field focuses the corresponding source citation and visually connects the field, the evidence item, and the document highlight.

### 3. Citation Recovery

Users provide a citation, quote, or source clue. The system searches local sources and, eventually, configured external sources, locates candidate passages, and lets the user confirm a candidate and turn it into a navigable annotation.

---

## Architectural Direction

The project should be built around a headless, format-neutral evidence model with viewer-specific adapters.
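
To make "a headless, format-neutral evidence model with viewer-specific adapters" concrete, here is a minimal TypeScript sketch under stated assumptions. Every name in it (`EvidenceItem`, `Selector`, `TextSpan`, `ViewerAdapter`, `PlainTextAdapter`, `renderMarkdownCard`) is hypothetical and is not taken from `wiki/SharedContracts.md`; the selector shapes borrow the TextQuote/TextPosition idea from the W3C Web Annotation Data Model as an assumption. It shows two things: an evidence item can carry several selectors so an adapter can fall back from stale offsets to a quote search, and a citation card can be rendered from the model alone, with no viewer involved.

```typescript
// Hypothetical sketch; not the project's shared contract.
// Selector shapes loosely follow the W3C Web Annotation Data Model.
type TextQuoteSelector = { type: "TextQuote"; exact: string; prefix?: string; suffix?: string };
type TextPositionSelector = { type: "TextPosition"; start: number; end: number };
type Selector = TextQuoteSelector | TextPositionSelector;

// Half-open character span [start, end) in a viewer's text coordinates.
type TextSpan = { start: number; end: number };

// Format-neutral evidence item: no viewer types appear here.
interface EvidenceItem {
  id: string;
  sourceTitle: string;
  sourceUri: string;
  selectors: Selector[]; // several selectors, so re-anchoring has fallbacks
  commentary: string;
}

// Viewer-specific behavior lives behind this (hypothetical) adapter contract.
interface ViewerAdapter {
  readonly format: "pdf" | "markdown" | "html";
  resolve(item: EvidenceItem): TextSpan | null; // null when re-anchoring fails
  highlight(span: TextSpan): void;
}

const isQuote = (s: Selector): s is TextQuoteSelector => s.type === "TextQuote";
const isPosition = (s: Selector): s is TextPositionSelector => s.type === "TextPosition";

// Minimal adapter over plain or Markdown text held as one string.
class PlainTextAdapter implements ViewerAdapter {
  readonly format = "markdown" as const;
  readonly highlights: TextSpan[] = [];
  private readonly text: string;

  constructor(text: string) {
    this.text = text;
  }

  resolve(item: EvidenceItem): TextSpan | null {
    const quote = item.selectors.find(isQuote);
    const pos = item.selectors.find(isPosition);
    // Trust stored offsets only if they still cover the quoted text.
    if (pos && quote && this.text.slice(pos.start, pos.end) === quote.exact) {
      return { start: pos.start, end: pos.end };
    }
    // Naive fallback: first exact occurrence of the quote.
    if (quote) {
      const start = this.text.indexOf(quote.exact);
      if (start >= 0) return { start, end: start + quote.exact.length };
    }
    return null;
  }

  highlight(span: TextSpan): void {
    this.highlights.push(span);
  }
}

// Citation cards are rendered from the model alone; no viewer is involved.
function renderMarkdownCard(item: EvidenceItem): string {
  const quote = item.selectors.find(isQuote);
  const lines: string[] = [
    `> ${quote ? quote.exact : "(no quoted text stored)"}`,
    ">",
    `> Source: [${item.sourceTitle}](${item.sourceUri})`,
  ];
  if (item.commentary) lines.push("", item.commentary);
  return lines.join("\n");
}

// Usage: the quote was captured at offsets 4..27, but the document has since changed.
const item: EvidenceItem = {
  id: "ev-1",
  sourceTitle: "Example report",
  sourceUri: "https://example.org/report",
  selectors: [
    { type: "TextQuote", exact: "the budget was approved" },
    { type: "TextPosition", start: 4, end: 27 },
  ],
  commentary: "Supports the 'approved' form field.",
};

const adapter = new PlainTextAdapter("Update: the budget was approved in March.");
const span = adapter.resolve(item);
if (span) adapter.highlight(span);
console.log(span); // { start: 8, end: 31 }
console.log(renderMarkdownCard(item));
```

In this sketch the adapter, not the model, decides how stored selectors map onto its own coordinates, which is what keeps the evidence item viewer-independent. The exact-match fallback is deliberately naive; a real PDF adapter would likely resolve against page-level text layers, and uncertain matches would go to the user for confirmation instead of being accepted silently.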

Key principles:

* citations must not depend on one specific viewer implementation
* multiple selector types should be stored for durable re-anchoring
* evidence items should be first-class domain objects
* PDFs, Markdown, HTML, and future formats should share the same evidence model
* uncertain source recovery should require human confirmation
* citation cards should be portable across web, Markdown, and later report outputs
* APIs and data structures should be suitable for agentic workflows

---

## Initial Reference Scenario

The first end-to-end scenario should be:

1. A user creates a document collection.
2. The user adds a PDF.
3. The user selects a passage and adds commentary.
4. The system creates an annotation and evidence item.
5. The user opens a form next to the document.
6. The user links the evidence item to a form field.
7. The user focuses the field.
8. The system highlights the field, evidence item, and source passage.
9. The system draws a visual guide between them.
10. The user exports the evidence as a Markdown or HTML citation card.

This scenario validates the core product promise without requiring advanced collaboration or external source discovery.

---

## Repository Character

This repository should be:

* integrative rather than monolithic
* product-oriented rather than library-only
* documentation-rich
* testable through reference scenarios
* friendly to human developers and coding agents
* explicit about subsystem boundaries
* suitable as the entry point for the overall citation-evidence ecosystem

---

## Home for Shared Contracts

This repository is the **single home for everything the sister repos must agree on**.

The canonical documents live in `wiki/` and `docs/decisions/`:

* `wiki/ProductRequirementsDocument.md` — what the product does
* `wiki/ArchitectureOverview.md` — how the subsystems compose
* `wiki/SharedContracts.md` — vocabulary, state enums, relation types, selector taxonomy, event types, viewer adapter contract, canonical text normalization
* `wiki/DependencyMap.md` — which subsystem may depend on which
* `docs/decisions/` — ADRs that resolve ambiguities and bind the contract

Sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`, `citation-work`, `evidence-binder`) defer to these documents. When their own `INTENT.md` files mention "shared contracts", they mean the documents listed above.

Changes to shared contracts happen here, not in the sister repos.

---

## MVP Strategy — Umbrella-First (decided 2026-05-24)

**The MVP lives entirely in this repository before being segmented into the sister repos.** This is a deliberate trade-off: fewer interface decisions up front, more refactoring later when extraction happens.

The reasoning:

1. The architectural boundaries documented in the sister INTENT files are hypotheses. We do not yet know which ones will hold up under real product pressure.
2. Coordinating six repos with no working code is expensive. Coordinating one repo with working code is cheap.
3. Interfaces designed in advance of implementation tend to be wrong.
4. Extracting working code into a new repo is a known, bounded refactor. Reshaping a premature interface while implementing against it is not.

Concretely:

* All MVP source code lives under `citation-evidence/src/`, partitioned by future-repo names (`shared/`, `engine/`, `anchor/`, `source/`, `work/`, `binder/`, `app/`).
* The `DependencyMap.md` rules are enforced by lint rules on these folders.
* The five sister repos remain INTENT-only during MVP — they document the intended boundary, not current code.
* When a subsystem's interface stabilizes (typically after the MVP scenario has run end-to-end at least once), its `src//` slice is extracted into the sister repo.

This INTENT will be updated when extraction happens.

---

## Success Criteria

The repository is successful when it allows a developer or agent to understand, run, and extend the citation-evidence system as an integrated product.

A first useful version should make it possible to:

* load a document collection,
* review a PDF,
* create an evidence item from selected text,
* link that evidence item to a structured form field,
* reopen the cited source context from the field,
* render the evidence as a citation card,
* understand which subsystem owns which part of the implementation.

---

## Guiding Statement

**citation-evidence exists to make source-backed information work navigable, reusable, and trustworthy.**