Product Requirements Document: citation-evidence

1. Definition

citation-evidence is a document-centered evidence workspace for capturing, managing, presenting, and re-opening citations with contextual commentary across PDFs and other document formats.

The product enables users to review collections of documents, mark passages, attach commentary, bind evidence to structured targets such as form fields or claims, and later re-open the cited document context with the cited passage highlighted and centered in the viewport.

It is designed as an umbrella project coordinating a set of focused subsystem repositories:

| Repository | Role |
| --- | --- |
| citation-evidence | Umbrella product, integration layer, workspace shell, documentation, reference deployment |
| evidence-anchor | Format-neutral anchoring, selector, highlighting, and re-anchoring mechanisms |
| citation-work | Review workspace for document collections, annotation workflows, and citation creation |
| evidence-source | Document ingestion, source discovery, metadata, full-text extraction, and citation recovery |
| evidence-binder | Binding of evidence items to form fields, claims, requirements, decisions, and document sections |
| citation-engine | Core domain model, APIs, storage model, citation rendering, export, and orchestration logic |

2. Context

Many workflows require information to be extracted from documents and justified with precise evidence. Typical examples include legal review, compliance documentation, procurement processes, academic research, product documentation, requirements engineering, grant applications, audits, and structured form submission.

Current document viewers, PDF annotators, and citation tools often treat these needs separately:

  • PDF viewers display and annotate documents but do not provide durable, reusable evidence objects.
  • Citation managers track bibliographic references but often do not preserve exact document context and commentary.
  • Form systems collect structured information but rarely maintain traceable evidence links to source passages.
  • Web annotation tools can mark documents but are not usually optimized for evidence-backed form filling or structured claim support.

citation-evidence addresses this gap by treating citations as reusable evidence objects that connect document passages to structured targets and can be rendered in other contexts.


3. Product Vision

citation-evidence enables evidence-backed information work by making cited document context reusable, navigable, and structurally linkable.

The product should allow a user to move smoothly from reading and marking documents to using those marked passages as evidence for forms, claims, reports, and web pages.

A citation should not be a dead reference. It should be an actionable bridge back to the source context.


4. Goals

4.1 Primary Goals

  1. Allow users to add documents to a review collection and capture highlighted citations with commentary.
  2. Allow citations to be stored as durable evidence objects independent of one specific viewer implementation.
  3. Allow a citation to reopen the source document with the cited passage highlighted and centered.
  4. Support both paginated documents such as PDFs and non-paginated documents such as Markdown and HTML.
  5. Allow evidence citations to be linked to form fields and other structured targets.
  6. Allow users to switch between multiple evidence items connected to a field, claim, or requirement.
  7. Provide reusable citation presentation components for webpages, reports, and other documents.
  8. Provide a path to citation recovery from bibliographic references, quotes, or partial source descriptions.

4.2 Secondary Goals

  1. Support a modular repository architecture with clear subsystem responsibilities.
  2. Use open standards where practical, especially W3C-style web annotation concepts.
  3. Reuse mature open-source document viewing, parsing, and annotation components where appropriate.
  4. Support future collaboration features such as review status, shared collections, and evidence validation.
  5. Support future agentic workflows for document review, quote matching, source discovery, and form assistance.

5. Non-Goals

The first product version shall not attempt to solve all document management problems.

The following are explicitly out of scope for the initial version:

  1. Full enterprise document management.
  2. Complete bibliographic reference management comparable to Zotero or Mendeley.
  3. Legal-grade digital signature workflows.
  4. General-purpose PDF editing.
  5. Full OCR correction workflow for scanned documents.
  6. Automated truth verification of evidence.
  7. Fully automatic citation recovery without human confirmation.
  8. Real-time multi-user collaborative editing.

These may become future capabilities but should not burden the MVP.


6. Target Users

6.1 Primary Users

| User Type | Description | Core Need |
| --- | --- | --- |
| Researcher / Analyst | Reviews many documents and extracts relevant evidence | Capture, organize, and reuse citations |
| Form Worker / Case Processor | Fills structured forms based on document evidence | Link form fields to source passages |
| Consultant / Knowledge Worker | Produces reports, memos, or structured recommendations | Export evidence-backed citation cards |
| Compliance / Audit Worker | Needs traceable evidence for claims or submitted information | Maintain source-backed audit trail |
| Product / Requirements Worker | Maps source material to requirements or decisions | Bind evidence to claims and artifacts |

6.2 Secondary Users

| User Type | Description | Core Need |
| --- | --- | --- |
| Developer | Integrates citation-evidence into another application | APIs, web components, stable data model |
| Reviewer | Checks whether evidence supports a field or claim | Efficient navigation between claim and source |
| Agentic Assistant | Helps search, suggest, or classify evidence | Machine-readable domain model and APIs |

7. Primary Use Cases

7.1 Document Collection Review

A user creates a collection of documents, reviews them, highlights passages, adds commentary, and stores the marked passages as evidence items for later use.

User Story

As a user, I want to add documents to a review collection and mark relevant passages with commentary, so that I can later find, reuse, and cite those passages.

Functional Expectations

  • The user can create or open a document collection.
  • The user can add PDFs, Markdown pages, and HTML documents; support for other formats may follow later.
  • The system displays the document in an appropriate viewer.
  • The user can select text and create an annotation.
  • The user can add commentary to the annotation.
  • The system stores quote text, source metadata, selectors, and commentary.
  • The user can browse evidence items collected from the documents.
  • The user can click an evidence item and return to the exact document context.

7.2 Evidence-Backed Form Filling

A user displays a document next to a form. Form fields can be linked to evidence items. Activating a field opens or focuses the relevant citation context in the document viewer.

User Story

As a user, I want to fill a form while viewing source documents, so that each important field can be backed by precise document evidence.

Functional Expectations

  • The user can display a structured form next to a document viewer.
  • The user can link an annotation or evidence item to a form field.
  • A form field can have zero, one, or multiple linked evidence items.
  • Activating a field displays the linked evidence list.
  • Activating a field focuses the document viewer on the currently selected evidence item.
  • The cited text is highlighted and centered in the viewport where possible.
  • The UI provides a visual guide from form field to evidence item to source highlight.
  • The user can switch between multiple evidence items connected to the same field.
  • The system can indicate whether a required field has sufficient evidence.
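
The "centered in the viewport where possible" expectation comes down to a small piece of arithmetic. The sketch below is illustrative only: Python is used for readability rather than as a proposed implementation language, and the function name and parameters are assumptions. It computes the scroll offset that centers a highlight and clamps it to the scrollable range, which is why passages near the start or end of a document cannot always be centered.

```python
def centered_scroll_top(target_top: float, target_height: float,
                        viewport_height: float, content_height: float) -> float:
    """Return the scroll offset that centers the target, clamped to the scrollable range."""
    # Ideal offset: the middle of the target aligned with the middle of the viewport.
    ideal = target_top + target_height / 2 - viewport_height / 2
    # The viewer cannot scroll above the start or past the end of the content.
    max_scroll = max(0.0, content_height - viewport_height)
    return min(max(0.0, ideal), max_scroll)


print(centered_scroll_top(1000, 20, 600, 5000))  # mid-document: fully centered
print(centered_scroll_top(50, 20, 600, 5000))    # near the top: clamped to 0
```

All values are assumed to be in the same unit (for example CSS pixels at the current zoom level), so the offset has to be recomputed after zoom or resize.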

7.3 Citation Recovery

A user provides a citation, quote, or bibliographic clue. The system searches local and possibly online sources for the cited work, locates the passage, and allows the user to create an annotation from the recovered context.

User Story

As a user, I want to provide an external citation or quote and have the system find the source passage when available, so that I can turn a dead reference into a navigable citation annotation.

Functional Expectations

  • The user can enter a citation, quote, bibliographic reference, DOI, URL, title, author, page reference, or partial source description.
  • The system searches the local document library first.
  • The system may search configured external sources where allowed.
  • The system identifies candidate documents.
  • The system searches for exact and fuzzy quote matches.
  • The system presents candidate passages for confirmation.
  • The user confirms the correct passage.
  • The system creates a document reference, annotation, and evidence item.
  • The system records unresolved or partially resolved citation recovery attempts.

8. Functional Requirements

8.1 Document Library and Collection Management

ID Requirement Priority
FR-001 The system shall allow users to create document collections. Must
FR-002 The system shall allow users to add documents to a collection. Must
FR-003 The system shall store document metadata including title, source URI, media type, fingerprint, and version where available. Must
FR-004 The system shall distinguish between original document source and generated document representations. Must
FR-005 The system shall support filtering and searching documents within a collection. Should
FR-006 The system shall support review status per document. Should

8.2 Document Viewing

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-010 | The system shall display PDF documents in a browser-based viewer. | Must |
| FR-011 | The system shall display Markdown documents as rendered HTML. | Must |
| FR-012 | The system shall display HTML documents in a normalized/sandboxed view. | Must |
| FR-013 | The system shall provide a common viewer adapter interface across document formats. | Must |
| FR-014 | The system shall support scrolling a document to a resolved annotation target. | Must |
| FR-015 | The system shall support centering the annotation target in the viewport where technically possible. | Must |
| FR-016 | The system shall support virtualized rendering for large documents where appropriate. | Should |
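
One possible shape for the common viewer adapter interface (FR-013) is sketched below. This is a minimal illustration, not a proposed API: Python is used only for readability, and `ResolvedTarget`, `ViewerAdapter`, `RecordingAdapter`, and `open_citation` are hypothetical names. The core calls only the adapter contract, so a PDF viewer and an HTML viewer could be swapped without touching citation logic.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass(frozen=True)
class ResolvedTarget:
    """Location of a resolved annotation inside one viewer (hypothetical shape)."""
    top: float                  # offset from the start of the content, in viewer units
    height: float
    page: Optional[int] = None  # None for non-paginated formats such as Markdown or HTML


class ViewerAdapter(Protocol):
    """Operations the core needs from any viewer, regardless of document format."""

    def load(self, source_uri: str) -> None: ...
    def highlight(self, target: ResolvedTarget) -> None: ...
    def scroll_to(self, target: ResolvedTarget, center: bool = True) -> None: ...


@dataclass
class RecordingAdapter:
    """Stand-in adapter that records calls; a real one would wrap a PDF or HTML viewer."""
    calls: list = field(default_factory=list)

    def load(self, source_uri: str) -> None:
        self.calls.append(("load", source_uri))

    def highlight(self, target: ResolvedTarget) -> None:
        self.calls.append(("highlight", target))

    def scroll_to(self, target: ResolvedTarget, center: bool = True) -> None:
        self.calls.append(("scroll_to", target, center))


def open_citation(adapter: ViewerAdapter, source_uri: str, target: ResolvedTarget) -> None:
    """Open a document at a citation using only the adapter contract."""
    adapter.load(source_uri)
    adapter.highlight(target)
    adapter.scroll_to(target, center=True)


adapter = RecordingAdapter()
open_citation(adapter, "file:///contracts/supply.pdf",
              ResolvedTarget(top=120.0, height=14.0, page=3))
print([call[0] for call in adapter.calls])  # ['load', 'highlight', 'scroll_to']
```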

8.3 Annotation and Anchoring

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-020 | The system shall allow users to select text and create an annotation. | Must |
| FR-021 | The system shall capture the exact selected text. | Must |
| FR-022 | The system shall capture prefix and suffix context for robust re-anchoring. | Must |
| FR-023 | The system shall capture text position selectors where available. | Must |
| FR-024 | The system shall capture PDF page and normalized rectangle selectors for PDF documents where available. | Must |
| FR-025 | The system shall support DOM or structural selectors for HTML and Markdown representations where available. | Should |
| FR-026 | The system shall support fuzzy re-anchoring when exact selectors fail. | Should |
| FR-027 | The system shall identify unresolved or orphaned annotations. | Should |
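
To make FR-021 through FR-024 concrete, the following sketch captures selectors from a text selection. It is an illustration under stated assumptions: Python is used only for readability, the field names of `TextQuoteSelector` and `TextPositionSelector` follow the W3C Web Annotation selectors of the same names, and `PdfRectSelector`, `capture_selectors`, `normalize_rect`, and the 32-character context window are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TextQuoteSelector:
    exact: str   # the selected text (FR-021)
    prefix: str  # text immediately before the selection (FR-022)
    suffix: str  # text immediately after the selection (FR-022)


@dataclass(frozen=True)
class TextPositionSelector:
    start: int   # character offset into the document representation (FR-023)
    end: int     # exclusive


@dataclass(frozen=True)
class PdfRectSelector:
    """Hypothetical PDF selector; coordinates are fractions of the page size (FR-024)."""
    page: int
    x: float
    y: float
    width: float
    height: float


def capture_selectors(text: str, start: int, end: int,
                      context: int = 32) -> Tuple[TextQuoteSelector, TextPositionSelector]:
    """Build quote and position selectors for the selection text[start:end]."""
    if not 0 <= start < end <= len(text):
        raise ValueError("selection out of range")
    quote = TextQuoteSelector(
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )
    return quote, TextPositionSelector(start, end)


def normalize_rect(page: int, rect: Tuple[float, float, float, float],
                   page_width: float, page_height: float) -> PdfRectSelector:
    """Convert an (x, y, width, height) rectangle in page units to zoom-independent fractions."""
    x, y, w, h = rect
    return PdfRectSelector(page, x / page_width, y / page_height,
                           w / page_width, h / page_height)
```

Storing rectangles as fractions of the page size is one way to keep highlights stable across zoom and resize; whether the product uses that convention is an open design decision.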

8.4 Commentary and Evidence Items

ID Requirement Priority
FR-030 The system shall allow users to add commentary to an annotation. Must
FR-031 The system shall create evidence items based on one or more annotations. Must
FR-032 The system shall allow evidence items to have status, tags, confidence, and commentary. Should
FR-033 The system shall show evidence items in a sidebar or evidence panel. Must
FR-034 The system shall allow users to navigate from an evidence item to the source document context. Must
FR-035 The system shall support evidence items that support, contradict, explain, or source a target. Should

8.5 Evidence Binding

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-040 | The system shall allow evidence items to be linked to form fields. | Must |
| FR-041 | The system shall support multiple evidence items per form field. | Must |
| FR-042 | The system shall allow users to switch between evidence items linked to a field. | Must |
| FR-043 | The system shall allow evidence items to be linked to claims, requirements, decisions, or document sections. | Should |
| FR-044 | The system shall indicate whether a field has no evidence, candidate evidence, or confirmed evidence. | Should |
| FR-045 | The system shall support relation types such as supports, contradicts, explains, and source-for. | Should |
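
The binding requirements above can be illustrated with a small data model. This is a sketch under assumptions, not the product's schema: Python is used only for readability, and `EvidenceLink`, `LinkStatus`, `field_evidence_state`, and the state strings are hypothetical. The relation values mirror FR-045, and the state function mirrors the three states named in FR-044.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Relation(Enum):
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    EXPLAINS = "explains"
    SOURCE_FOR = "source-for"


class LinkStatus(Enum):
    CANDIDATE = "candidate"
    CONFIRMED = "confirmed"


@dataclass(frozen=True)
class EvidenceLink:
    evidence_id: str
    target_id: str       # a form field id here; could also identify a claim or requirement
    relation: Relation
    status: LinkStatus


def field_evidence_state(links: Iterable[EvidenceLink], target_id: str) -> str:
    """Summarize a target as 'no-evidence', 'candidate', or 'confirmed' (FR-044)."""
    own = [link for link in links if link.target_id == target_id]
    if not own:
        return "no-evidence"
    # Assumption: one confirmed link is enough; the relation type is not considered here.
    if any(link.status is LinkStatus.CONFIRMED for link in own):
        return "confirmed"
    return "candidate"


links = [
    EvidenceLink("ev-1", "field:delivery_days", Relation.SUPPORTS, LinkStatus.CONFIRMED),
    EvidenceLink("ev-2", "field:delivery_days", Relation.EXPLAINS, LinkStatus.CANDIDATE),
    EvidenceLink("ev-3", "field:penalty", Relation.SUPPORTS, LinkStatus.CANDIDATE),
]
print(field_evidence_state(links, "field:delivery_days"))  # confirmed
print(field_evidence_state(links, "field:penalty"))        # candidate
print(field_evidence_state(links, "field:supplier"))       # no-evidence
```

Whether a confirmed link with the contradicts relation should count toward "confirmed evidence" is a policy question this sketch deliberately leaves open.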

8.6 Evidence Form UI

ID Requirement Priority
FR-050 The system shall display a form next to a document viewer. Must
FR-051 The system shall focus linked evidence when a form field is activated. Must
FR-052 The system shall visually identify the active form field, evidence item, and document annotation. Must
FR-053 The system shall provide a visual guide connecting form field, evidence item, and annotation highlight. Should
FR-054 The system shall support keyboard navigation between evidence items. Should
FR-055 The system shall support evidence chips or indicators near form fields. Should

8.7 Citation Presentation and Export

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-060 | The system shall render evidence items as citation cards. | Must |
| FR-061 | Citation cards shall include quote, source label, commentary, and open-context action. | Must |
| FR-062 | The system shall export citation cards as HTML. | Must |
| FR-063 | The system shall export citation cards as Markdown. | Must |
| FR-064 | The system shall support configurable citation display styles. | Should |
| FR-065 | The system shall support embedding citation cards as web components. | Should |
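
As an illustration of FR-061 and FR-063, the sketch below renders a citation card as Markdown with the four required parts: quote, source label, commentary, and an open-context link. Python is used only for readability; the function name, the card layout, and the example URL are assumptions, and the sketch does not escape Markdown special characters in its inputs.

```python
def render_citation_card_markdown(quote: str, source_label: str,
                                  commentary: str, open_url: str) -> str:
    """Render a citation card as Markdown: blockquote, source, commentary, context link."""
    # Prefix every line of the quote so multi-line quotes stay inside one blockquote.
    quoted = "\n".join(f"> {line}" if line else ">" for line in quote.splitlines())
    parts = [quoted, "", f"Source: {source_label}"]
    if commentary:
        parts += ["", commentary]
    parts += ["", f"[Open in context]({open_url})"]
    return "\n".join(parts)


card = render_citation_card_markdown(
    quote="Delivery occurs within 30 days.\nLate delivery incurs a penalty.",
    source_label="Supply Agreement, p. 4",
    commentary="Confirms the delivery deadline.",
    open_url="https://example.org/view?citation=ev-1",  # hypothetical deep link
)
print(card)
```

The open-context link is what keeps the exported card navigable: it assumes the deployment exposes some stable URL that opens the viewer at a given citation, as FR-082 requires.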

8.8 Citation Recovery

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-070 | The system shall allow users to enter a citation, quote, or source clue for recovery. | Should |
| FR-071 | The system shall search local documents for matching sources and quotes. | Should |
| FR-072 | The system shall support exact quote matching. | Should |
| FR-073 | The system shall support fuzzy quote matching. | Should |
| FR-074 | The system shall present candidate matches for user confirmation. | Should |
| FR-075 | The system may search configured external sources for digitally available documents. | Could |
| FR-076 | The system shall record unsuccessful recovery attempts. | Could |
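
Exact and fuzzy quote matching (FR-072, FR-073) could be approached as in the sketch below. It is one possible technique, not a prescribed algorithm: it tries an exact match on whitespace-normalized text, then slides a word window over the text and scores each window with `difflib.SequenceMatcher` from the Python standard library. The names `QuoteMatch`, `normalize`, and `find_quote`, and the 0.8 threshold, are assumptions.

```python
import difflib
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuoteMatch:
    start: int    # offsets into the normalized text, not the original
    end: int
    score: float  # 1.0 for exact matches
    exact: bool


def normalize(s: str) -> str:
    """Collapse whitespace and lowercase so line breaks and casing do not block a match."""
    return re.sub(r"\s+", " ", s).strip().lower()


def find_quote(text: str, quote: str, threshold: float = 0.8) -> Optional[QuoteMatch]:
    """Return the best match for quote in text, or None if nothing reaches the threshold."""
    haystack, needle = normalize(text), normalize(quote)
    if not haystack or not needle:
        return None
    pos = haystack.find(needle)
    if pos != -1:
        return QuoteMatch(pos, pos + len(needle), 1.0, True)
    # Fuzzy fallback: compare the quote with every window of the same word count.
    words = [(m.start(), m.end()) for m in re.finditer(r"\S+", haystack)]
    size = len(needle.split())
    best = None
    for i in range(max(1, len(words) - size + 1)):
        start = words[i][0]
        end = words[min(i + size, len(words)) - 1][1]
        score = difflib.SequenceMatcher(None, haystack[start:end], needle,
                                        autojunk=False).ratio()
        if best is None or score > best.score:
            best = QuoteMatch(start, end, score, False)
    return best if best is not None and best.score >= threshold else None


text = "The supplier shall deliver within 30 days. Late delivery incurs a penalty."
match = find_quote(text, "Late delivrey incurs a penalty.")  # quote contains a typo
print(match is not None and not match.exact)  # True: found by the fuzzy fallback
```

A fuzzy result (`exact` is false) is a candidate only; per FR-074 it would be presented to the user for confirmation rather than anchored automatically. The window scan is quadratic and suits small local libraries; a production index would likely need a faster candidate filter.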

8.9 APIs and Integration

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-080 | The system shall expose APIs for documents, annotations, evidence items, and evidence links. | Must |
| FR-081 | The system shall support a reusable web component or frontend component model. | Must |
| FR-082 | The system shall allow external systems to open a document viewer at a specific citation. | Must |
| FR-083 | The system shall support import/export of W3C Web Annotation-compatible data where practical. | Should |
| FR-084 | The system shall expose machine-readable structures suitable for agentic workflows. | Should |
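
For FR-083, an export mapping to the W3C Web Annotation Data Model might look like the sketch below. The JSON keys (`@context`, `type`, `body`, `target`, `selector`, `TextQuoteSelector`, `TextPositionSelector`) come from that specification; the function name, its parameters, and the example identifiers are assumptions, and Python is used only for readability.

```python
import json


def to_web_annotation(annotation_id: str, source_uri: str, exact: str, prefix: str,
                      suffix: str, start: int, end: int, commentary: str) -> dict:
    """Map an internal annotation to a W3C Web Annotation structure (export direction only)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_id,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": commentary, "purpose": "commenting"},
        "target": {
            "source": source_uri,
            # Two selectors describing the same passage, stored redundantly.
            "selector": [
                {"type": "TextQuoteSelector", "exact": exact,
                 "prefix": prefix, "suffix": suffix},
                {"type": "TextPositionSelector", "start": start, "end": end},
            ],
        },
    }


annotation = to_web_annotation(
    annotation_id="https://example.org/annotations/ev-1",  # hypothetical IRI
    source_uri="https://example.org/docs/supply-agreement.html",
    exact="within 30 days", prefix="shall deliver ", suffix=" of the order date",
    start=27, end=41, commentary="Confirms the delivery deadline.",
)
print(json.dumps(annotation, indent=2))
```

Evidence-specific concepts such as relation types, status, and bindings to form fields have no direct counterpart in this mapping, which is one reason the document leaves open whether Web Annotation should be the native model or an import/export format.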

9. Non-Functional Requirements

9.1 Performance

| ID | Requirement | Priority |
| --- | --- | --- |
| NFR-001 | The viewer should open common documents with acceptable latency for interactive review. | Must |
| NFR-002 | Large PDFs should be rendered lazily or virtually where possible. | Should |
| NFR-003 | Citation navigation should feel immediate after the document representation has been indexed. | Should |
| NFR-004 | Text extraction and indexing should be cacheable by document fingerprint. | Must |
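
NFR-004 can be illustrated with a content-addressed cache. The sketch below is a minimal in-memory version under stated assumptions: the fingerprint is taken to be a SHA-256 digest of the document bytes (the document does not specify a fingerprint algorithm), and `fingerprint`, `ExtractionCache`, and the extractor callback are hypothetical names. Python is used only for readability.

```python
import hashlib
from typing import Callable, Dict


def fingerprint(data: bytes) -> str:
    """Content fingerprint; SHA-256 is an assumption, not a documented choice."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class ExtractionCache:
    """Cache extracted text by fingerprint so identical bytes are processed only once."""

    def __init__(self, extract: Callable[[bytes], str]) -> None:
        self._extract = extract
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the extractor actually ran

    def get_text(self, data: bytes) -> str:
        key = fingerprint(data)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._extract(data)
        return self._store[key]


# Stand-in extractor; a real one would run PDF or HTML text extraction.
cache = ExtractionCache(lambda data: data.decode("utf-8"))
cache.get_text(b"Delivery occurs within 30 days.")
cache.get_text(b"Delivery occurs within 30 days.")  # same bytes: served from the cache
print(cache.misses)  # 1
```

Because the key depends only on content, the same file added to two collections or under two URIs shares one extraction result, and a changed file gets a new key automatically.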

9.2 Reliability

ID Requirement Priority
NFR-010 Citations shall remain stable across zoom, resize, and viewport changes. Must
NFR-011 The system shall detect when a citation can no longer be resolved. Should
NFR-012 The system shall provide fallback resolution strategies. Should
NFR-013 The system shall preserve original quote text even when source resolution fails. Must

9.3 Security and Privacy

ID Requirement Priority
NFR-020 The system shall avoid executing unsafe HTML content from imported documents. Must
NFR-021 The system shall support access control boundaries around document collections. Should
NFR-022 The system shall make external source lookup configurable and explicit. Should
NFR-023 The system shall avoid leaking private document text to external services unless explicitly allowed. Must

9.4 Extensibility

ID Requirement Priority
NFR-030 The system shall allow additional document formats through viewer adapters. Must
NFR-031 The system shall allow additional selector types. Should
NFR-032 The system shall allow custom evidence target types beyond form fields. Should
NFR-033 The system shall allow custom citation card renderers. Should

9.5 Usability

ID Requirement Priority
NFR-040 Users should be able to create a citation with minimal interaction after selecting text. Must
NFR-041 Users should be able to understand which source passage supports which form field. Must
NFR-042 Users should be able to switch between evidence items without losing form context. Must
NFR-043 Users should be warned when source matching is uncertain. Should

10. Subsystem Responsibilities

10.1 citation-evidence

Umbrella product and integration repository.

Responsibilities:

  • Product documentation.
  • Reference workspace application.
  • Integration of subsystem packages.
  • Demo deployments.
  • Cross-subsystem test scenarios.
  • Overall product shell and navigation.

10.2 evidence-anchor

Format-neutral anchoring and highlight resolution.

Responsibilities:

  • Selector model.
  • Text quote selectors.
  • Text position selectors.
  • PDF rectangle selectors.
  • DOM/structural selectors.
  • Anchor resolution.
  • Re-anchoring strategies.
  • Highlight rendering contract.

10.3 citation-work

Document review workspace.

Responsibilities:

  • Document collection UI.
  • Review workflow.
  • Annotation capture UX.
  • Evidence sidebar.
  • Review status.
  • Collection navigation.

10.4 evidence-source

Document source ingestion and recovery.

Responsibilities:

  • Document import.
  • Metadata extraction.
  • Fingerprinting.
  • Text extraction.
  • Source lookup.
  • Local source matching.
  • External source discovery hooks.
  • Citation recovery workflows.

10.5 evidence-binder

Binding evidence to structured targets.

Responsibilities:

  • Evidence-to-field links.
  • Evidence-to-claim links.
  • Evidence sets.
  • Relation types.
  • Evidence status and confidence.
  • Form synchronization state.
  • Visual guide model.

10.6 citation-engine

Core domain engine and service layer.

Responsibilities:

  • Domain model.
  • API contracts.
  • Persistence interfaces.
  • Citation card rendering.
  • Export to Markdown/HTML.
  • W3C-compatible annotation mapping.
  • Cross-subsystem orchestration.

11. Suggested MVP Scope

MVP A: PDF Review and Citation Cards

Must include:

  • Add PDF to collection.
  • Display PDF.
  • Select text.
  • Create annotation with commentary.
  • Store selectors and quote.
  • Show evidence sidebar.
  • Click evidence to reopen context.
  • Export citation card as Markdown or HTML.

MVP B: Evidence-Backed Form Mode

Must include:

  • Simple form definition.
  • Side-by-side form and document viewer.
  • Link evidence to form field.
  • Activate field to focus evidence.
  • Switch evidence for field.
  • Show visual state for active field/evidence/highlight.

MVP C: Markdown/HTML Document Support

Must include:

  • Render Markdown and HTML sources.
  • Select text in rendered document.
  • Create annotation using text quote and position selectors.
  • Reopen and highlight selected passage.
  • Reuse same evidence sidebar and citation card logic.

MVP D: Local Citation Recovery

Should include:

  • Paste quote or citation clue.
  • Search local indexed documents.
  • Show candidate matches.
  • Confirm passage.
  • Create annotation and evidence item.

12. Acceptance Criteria

The first usable version is acceptable when:

  1. A user can create a document collection and add at least one PDF.
  2. A user can select text in the PDF and create an evidence item with commentary.
  3. A user can navigate away and later reopen the document from the citation, with the cited passage highlighted and centered.
  4. A user can display a form next to the document viewer.
  5. A user can link an evidence item to a form field.
  6. Activating the form field focuses the relevant evidence and document context.
  7. A user can export the citation as a reusable Markdown or HTML citation card.
  8. The internal model does not depend on one specific viewer library.

13. Open Questions

  1. Should the first implementation be React-first, web-component-first, or headless-core-first with adapters?
  2. Should the storage model initially use local files, SQLite, PostgreSQL, or browser storage?
  3. Should W3C Web Annotation JSON-LD be the native internal model or an import/export mapping?
  4. Should form definitions be JSON Schema-based, custom, or adapter-based?
  5. Should citation recovery start with local library search only, or include external web/source lookup from the beginning?
  6. How much of the document text index should be persisted versus regenerated from source?
  7. Should the system support multi-user collaboration early or remain single-user/local-first initially?
  8. What is the minimum viable visual guide for field-to-evidence-to-highlight navigation?

14. Initial Architecture Direction

The system should be built around a headless citation and evidence core with viewer-specific adapters.

Key architectural principles:

  1. Viewer independence: citations must not depend on one viewer implementation.
  2. Selector redundancy: store multiple selector types for durable resolution.
  3. Evidence as first-class object: evidence is more than an annotation; it can support fields, claims, and decisions.
  4. Format neutrality: PDFs, Markdown, and HTML should share the same evidence model.
  5. Human confirmation: uncertain source recovery and fuzzy matching should require user confirmation.
  6. Portable presentation: citation cards should render in web pages, Markdown, and later reports.
  7. Agent readiness: document, annotation, evidence, and binding structures should be machine-readable and API-accessible.

15. Glossary

| Term | Definition |
| --- | --- |
| Annotation | A technical mark or comment attached to a specific document range. |
| Citation | A reusable reference to source context, usually including quote, source, and link. |
| Evidence Item | A meaningful evidence object based on one or more annotations and usable in support of a field, claim, requirement, or decision. |
| Evidence Link | A relationship between an evidence item and a structured target. |
| Selector | A technical description of how to locate a passage within a document. |
| Re-anchoring | The process of resolving a citation again after layout or document changes. |
| Citation Card | A presentable rendering of a citation and commentary. |
| Document Representation | A normalized text, page, DOM, or structural representation generated from a document source. |
| Evidence Set | A group of evidence items connected to the same target or topic. |
| Citation Recovery | The process of finding and anchoring a cited passage from a quote or bibliographic clue. |