- INTENT.md: declare umbrella as the home for shared contracts; document umbrella-first MVP decision (code lives here until subsystems stabilize)
- wiki/SharedContracts.md: vocabulary, state enums, relation types, selector taxonomy, event vocabulary, viewer adapter contract, canonical text normalization, rect-registry contract
- wiki/DependencyMap.md: allowed dependency edges; folder layout + lint-rule strategy during umbrella-first phase
- history/2026-05-24-initial-assessment.md: alignment review, technical risks, and the umbrella-first pivot rationale
- workplans/CE-WP-0001..0004: four ralph-compatible workplans covering foundations, PDF review slice, form binding + visual guide, and citation card export, implementing PRD §20 end-to-end
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
INTENT
Purpose
This repository exists to provide the umbrella product, integration shell, and reference implementation for citation-evidence.
citation-evidence is a document-centered evidence workspace for capturing, managing, presenting, and reopening citations with contextual commentary across PDFs, Markdown, HTML, and other document formats.
The project enables users to turn source passages into reusable evidence objects that can support form fields, claims, requirements, decisions, reports, and web publications.
A citation should not be a dead reference. It should be an actionable bridge back to the source context.
Primary Utility
The repository provides the integrated workspace and coordination layer for the citation-evidence system.
It brings together the subsystem projects:
- citation-engine — core domain model, APIs, persistence contracts, citation rendering, and orchestration
- evidence-anchor — durable selectors, anchoring, re-anchoring, and highlight resolution
- evidence-source — document ingestion, text extraction, source metadata, and citation recovery
- citation-work — document collection review, annotation workflow, and evidence sidebar UX
- evidence-binder — linking evidence to form fields, claims, requirements, decisions, and other structured targets
The umbrella repository exists to demonstrate and validate how these subsystems work together as one coherent product.
Intended Users
Primary users are people and systems that need evidence-backed information work:
- researchers and analysts reviewing document collections
- form workers and case processors who need source-backed field entries
- consultants and knowledge workers producing evidence-backed reports
- compliance, audit, procurement, and legal-adjacent workers who need traceable justification
- product and requirements workers linking source material to structured decisions
- developers integrating citation-evidence capabilities into other applications
- agentic assistants helping users search, extract, bind, and present evidence
Strategic Role
The strategic role of citation-evidence is to establish a reusable infrastructure layer for evidence-backed information spaces.
It connects three activities that are often handled separately:
- reading and annotating documents,
- extracting reusable citations and commentary,
- binding evidence to structured outputs such as forms, claims, requirements, reports, and web pages.
The project should become a foundation for workflows where information must remain traceable to its source context.
Core Concept
The central flow of the system is:
Source Document
→ Document Representation
→ Durable Annotation Anchor
→ Evidence Item with Commentary
→ Evidence Link to Field / Claim / Requirement
→ Portable Citation Card
→ Reopenable Source Context
The system treats an evidence item as more than a highlight.
An evidence item is a reusable object that can:
- quote a source passage,
- preserve commentary,
- reopen the source context,
- support or contradict a structured target,
- be exported into another document or webpage,
- be reused by humans and software agents.
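To make the "more than a highlight" idea concrete, here is a minimal sketch of an evidence item as a data object. Every name in it (EvidenceItem, EvidenceLink, Relation, the field names, and the two relation values) is an assumption made for illustration; the authoritative vocabulary and relation types are defined in wiki/SharedContracts.md, not here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Relation(Enum):
    # Hypothetical relation types; the real list lives in wiki/SharedContracts.md.
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"


@dataclass(frozen=True)
class EvidenceLink:
    target_id: str      # id of a form field, claim, requirement, ...
    relation: Relation


@dataclass
class EvidenceItem:
    item_id: str
    document_id: str    # which source document the quote comes from
    quote: str          # the quoted source passage
    commentary: str = ""
    links: List[EvidenceLink] = field(default_factory=list)

    def link_to(self, target_id: str, relation: Relation) -> EvidenceLink:
        """Attach this evidence item to a structured target."""
        link = EvidenceLink(target_id, relation)
        self.links.append(link)
        return link

    def targets(self, relation: Relation) -> List[str]:
        """Return the ids of all targets linked with the given relation."""
        return [l.target_id for l in self.links if l.relation is relation]
```

One item can carry several links, which is what lets the same quoted passage support a form field and contradict a claim at the same time.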
Scope
This repository owns the integrated product scope.
It should contain:
- product documentation
- architecture documentation
- integration scenarios
- reference workspace application
- cross-subsystem examples
- demo data and test workflows
- deployment sketches
- system-level acceptance tests
- onboarding material for developers and agents
It should coordinate the subsystem repositories without absorbing their responsibilities.
Out of Scope
This repository should not become the permanent implementation home for all subsystem internals.
Specifically, it should not own:
- low-level selector and re-anchoring algorithms
- full document ingestion and extraction pipelines
- the complete persistence implementation
- all viewer-specific internals
- all form-binding logic
- all citation rendering logic
Those responsibilities belong in the focused subsystem repositories.
The umbrella repository should integrate, validate, and demonstrate them.
Initial Product Modes
The integrated product should support three primary modes.
1. Document Review
Users add documents to a collection, review them, highlight relevant passages, add commentary, and create reusable evidence items.
2. Evidence-Backed Forms
Users display source documents next to structured forms. Form fields can be linked to evidence items. Activating a field focuses the corresponding source citation and visually connects field, evidence item, and document highlight.
3. Citation Recovery
Users provide a citation, quote, or source clue. The system searches local sources and, eventually, configured external sources, locates candidate passages, and lets the user confirm a passage and turn it into a navigable annotation.
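As an illustration of the citation recovery mode, the sketch below scores word windows of local documents against a possibly misquoted passage and returns candidates for a human to confirm. It is not the project's recovery algorithm: the Candidate type, the find_candidates function, the 0.8 threshold, and the use of Python's difflib for fuzzy matching are all assumptions chosen to keep the example small.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Candidate:
    document_id: str
    start: int      # character offsets into the document text
    end: int
    score: float    # similarity in [0, 1]; a proposal, not a confirmed match


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so layout differences do not matter.
    return " ".join(text.lower().split())


def find_candidates(quote: str, documents: Dict[str, str],
                    threshold: float = 0.8) -> List[Candidate]:
    """Return the best-scoring window per document, if it passes the threshold."""
    needle = _normalize(quote)
    width = len(needle.split())
    results: List[Candidate] = []
    for doc_id, text in documents.items():
        words = list(re.finditer(r"\S+", text))
        best: Optional[Candidate] = None
        for i in range(max(1, len(words) - width + 1)):
            window = words[i:i + width]
            if not window:
                continue
            start, end = window[0].start(), window[-1].end()
            score = SequenceMatcher(None, needle, _normalize(text[start:end])).ratio()
            if best is None or score > best.score:
                best = Candidate(doc_id, start, end, score)
        if best is not None and best.score >= threshold:
            results.append(best)
    return sorted(results, key=lambda c: c.score, reverse=True)
```

The function only proposes passages. Turning a candidate into an annotation would remain a separate, user-confirmed step.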
Architectural Direction
The project should be built around a headless, format-neutral evidence model with viewer-specific adapters.
Key principles:
- citations must not depend on one specific viewer implementation
- multiple selector types should be stored for durable re-anchoring
- evidence items should be first-class domain objects
- PDFs, Markdown, HTML, and future formats should share the same evidence model
- uncertain source recovery should require human confirmation
- citation cards should be portable across web, Markdown, and later report outputs
- APIs and data structures should be suitable for agentic workflows
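The principle of storing multiple selector types can be sketched as follows. The selector names are borrowed from the W3C Web Annotation Data Model as an assumption; this document does not say which selector taxonomy the project uses (that is defined in wiki/SharedContracts.md), and the resolve function is a hypothetical helper, not the evidence-anchor algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TextPositionSelector:
    start: int   # character offsets into the document's extracted text
    end: int


@dataclass(frozen=True)
class TextQuoteSelector:
    exact: str         # the quoted passage itself
    prefix: str = ""   # text immediately before the passage
    suffix: str = ""   # text immediately after the passage


def resolve(text: str, position: TextPositionSelector,
            quote: TextQuoteSelector) -> Optional[Tuple[int, int]]:
    """Re-anchor a passage, preferring cheap selectors over fuzzy ones."""
    # 1. Fast path: the stored offsets still point at the quoted text.
    if text[position.start:position.end] == quote.exact:
        return (position.start, position.end)
    # 2. The text shifted: look for the quote together with its context.
    if quote.prefix or quote.suffix:
        idx = text.find(quote.prefix + quote.exact + quote.suffix)
        if idx != -1:
            start = idx + len(quote.prefix)
            return (start, start + len(quote.exact))
    # 3. Last resort: accept the bare quote only if it occurs exactly once.
    idx = text.find(quote.exact)
    if idx != -1 and text.find(quote.exact, idx + 1) == -1:
        return (idx, idx + len(quote.exact))
    # Missing or ambiguous: return nothing and leave it to human confirmation.
    return None
```

Returning None for an ambiguous match, rather than guessing the first occurrence, follows the principle that uncertain source recovery should require human confirmation.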
Initial Reference Scenario
The first end-to-end scenario should be:
- A user creates a document collection.
- The user adds a PDF.
- The user selects a passage and adds commentary.
- The system creates an annotation and evidence item.
- The user opens a form next to the document.
- The user links the evidence item to a form field.
- The user focuses the field.
- The system highlights the field, evidence item, and source passage.
- The system draws a visual guide between them.
- The user exports the evidence as a Markdown or HTML citation card.
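The final export step of the scenario above could look roughly like the sketch below, which renders an evidence item as a Markdown citation card. The CitationCard fields, the to_markdown function, the card layout, and the doc:// link in the usage example are all hypothetical; the actual card format is not specified in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationCard:
    quote: str
    commentary: str
    source_title: str
    source_uri: str   # hypothetical deep link that reopens the source context


def to_markdown(card: CitationCard) -> str:
    """Render the card as a blockquote, optional commentary, and a source link."""
    quoted = "\n".join("> " + line for line in card.quote.splitlines())
    lines = [quoted, ""]
    if card.commentary:
        lines += [card.commentary, ""]
    # Note: this sketch does not escape Markdown special characters in the title.
    lines.append(f"Source: [{card.source_title}]({card.source_uri})")
    return "\n".join(lines)


if __name__ == "__main__":
    card = CitationCard(
        quote="Payment is due within 30 days.",
        commentary="Sets the deadline.",
        source_title="Contract",
        source_uri="doc://contract#ev-1",
    )
    print(to_markdown(card))
```

Keeping the renderer a pure function of the card keeps the export independent of any viewer, and an HTML renderer could sit beside it with the same input.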
This scenario validates the core product promise without requiring advanced collaboration or external source discovery.
Repository Character
This repository should be:
- integrative rather than monolithic
- product-oriented rather than library-only
- documentation-rich
- testable through reference scenarios
- friendly to human developers and coding agents
- explicit about subsystem boundaries
- suitable as the entry point for the overall citation-evidence ecosystem
Home for Shared Contracts
This repository is the single home for everything the sister repos must
agree on. The canonical documents live in wiki/:
- wiki/ProductRequirementsDocument.md: what the product does
- wiki/ArchitectureOverview.md: how the subsystems compose
- wiki/SharedContracts.md: vocabulary, state enums, relation types, selector taxonomy, event types, viewer adapter contract, canonical text normalization
- wiki/DependencyMap.md: which subsystem may depend on which
- docs/decisions/: ADRs that resolve ambiguities and bind the contract
Sister repos (citation-engine, evidence-anchor, evidence-source,
citation-work, evidence-binder) defer to these documents. When their
own INTENT.md files mention "shared contracts", they mean the documents
listed above.
Changes to shared contracts happen here, not in the sister repos.
MVP Strategy — Umbrella-First (decided 2026-05-24)
The MVP lives entirely in this repository before being segmented into the sister repos. This is a deliberate trade-off: fewer interface decisions up front, more refactoring later when extraction happens.
The reasoning:
- The architectural boundaries documented in the sister INTENT files are hypotheses. We do not yet know which ones will hold up under real product pressure.
- Coordinating six repos with no working code is expensive. Coordinating one repo with working code is cheap.
- Interfaces designed in advance of implementation tend to be wrong.
- Extracting working code into a new repo is a known, bounded refactor. Reshaping a premature interface while implementing against it is not.
Concretely:
- All MVP source code lives under citation-evidence/src/, partitioned by future-repo names (shared/, engine/, anchor/, source/, work/, binder/, app/).
- The DependencyMap.md rules are enforced by lint rules on these folders.
- The five sister repos remain INTENT-only during MVP: they document the intended boundary, not current code.
- When a subsystem's interface stabilizes (typically after the MVP scenario has run end-to-end at least once), its src/<repo-name>/ slice extracts to the sister repo.
This INTENT will be updated when extraction happens.
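The folder-level dependency rule can be pictured as a simple check over import edges between the src/ folders. The sketch below is language-agnostic about how edges are collected and is not the project's lint configuration. The folder names come from this document, but the allowed edges in EXAMPLE_ALLOWED are invented for illustration; the authoritative edges are in wiki/DependencyMap.md.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical edges for illustration only: folder -> folders it may import from.
EXAMPLE_ALLOWED: Dict[str, Set[str]] = {
    "shared": set(),
    "anchor": {"shared"},
    "source": {"shared"},
    "engine": {"shared", "anchor", "source"},
    "work": {"shared", "engine"},
    "binder": {"shared", "engine"},
    "app": {"shared", "engine", "anchor", "source", "work", "binder"},
}


def violations(edges: Iterable[Tuple[str, str]],
               allowed: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return every (importer, imported) folder pair the map does not allow.

    Imports within the same folder are always permitted.
    """
    return [
        (src, dst)
        for src, dst in edges
        if src != dst and dst not in allowed.get(src, set())
    ]


if __name__ == "__main__":
    observed = [("engine", "shared"), ("anchor", "engine"), ("app", "binder")]
    for src, dst in violations(observed, EXAMPLE_ALLOWED):
        print(f"src/{src}/ must not import from src/{dst}/")
```

A check of this shape keeps the future repository boundaries visible while all the code still lives in one tree, which is what should make the later extraction a bounded refactor.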
Success Criteria
The repository is successful when it allows a developer or agent to understand, run, and extend the citation-evidence system as an integrated product.
A first useful version should make it possible to:
- load a document collection,
- review a PDF,
- create an evidence item from selected text,
- link that evidence item to a structured form field,
- reopen the cited source context from the field,
- render the evidence as a citation card,
- understand which subsystem owns which part of the implementation.
Guiding Statement
citation-evidence exists to make source-backed information work navigable, reusable, and trustworthy.