citation-engine/INTENT.md
tegwick 6ba8f23b1f Add MVP Coordination section: code lives in citation-evidence umbrella during MVP
Documents the umbrella-first MVP decision (2026-05-24). This repo remains
INTENT-only until the engine's interfaces stabilize through real product
use. Points at the umbrella's wiki/SharedContracts.md, wiki/DependencyMap.md,
and docs/decisions/ as the canonical homes for cross-repo agreements.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 16:50:59 +02:00


# INTENT
## Purpose
This repository exists to provide the core domain engine for the **citation-evidence** ecosystem.
**citation-engine** defines the stable conceptual model, service contracts, API boundaries, persistence interfaces, and citation rendering logic needed to manage documents, annotations, evidence items, evidence links, and reusable citation presentations.
It is the domain center of the system.
---
## Primary Utility
The repository provides the shared engine that allows the other citation-evidence subsystem repositories to work against a common model.
It should define and coordinate the core concepts:
- **Document**
- **DocumentRepresentation**
- **Annotation**
- **Selector**
- **EvidenceItem**
- **EvidenceSet**
- **EvidenceLink**
- **CitationCard**
- **CitationRecoveryAttempt**
The engine should make it possible to create, store, retrieve, relate, render, and export evidence-backed citations without depending on one specific viewer, frontend, storage backend, or document format.
---
## Intended Users
Primary users of this repository are developers and agents building the citation-evidence system.
They include:
- developers implementing the review workspace
- developers implementing evidence-backed form workflows
- developers implementing document ingestion and citation recovery
- developers implementing anchoring and re-anchoring logic
- developers integrating citation-evidence into other applications
- coding agents that need a stable model and API boundary for implementation work
End users should usually experience this repository indirectly through applications built on top of it.
---
## Strategic Role
The strategic role of **citation-engine** is to prevent the citation-evidence ecosystem from becoming a loose collection of viewer-specific annotation tools.
It provides the shared domain language that keeps the system coherent.
The repository should ensure that:
- annotations are not tied to one viewer implementation,
- evidence is treated as a first-class object,
- source passages can be reused across forms, claims, requirements, reports, and webpages,
- citation presentation is portable,
- storage and rendering implementations remain replaceable,
- subsystem repositories can evolve without breaking the core conceptual model.
---
## Core Concept
The engine models the central flow of evidence-backed citation work:
```text
Document
→ DocumentRepresentation
→ Annotation
→ EvidenceItem
→ EvidenceLink
→ CitationCard
```
A **Document** is the known source.
A **DocumentRepresentation** is a normalized, searchable, addressable representation of that source.
An **Annotation** is a technical mark on a source range.
An **EvidenceItem** is the meaningful evidence object created from one or more annotations.
An **EvidenceLink** connects evidence to a structured target such as a form field, claim, requirement, decision, or document section.
A **CitationCard** is a portable rendering of evidence for use in webpages, Markdown, reports, or other documents.
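The chain above could be typed roughly as follows. This is a minimal sketch, not the engine's schema: only the entity names come from this document, while every field name, the `buildCard` helper, and the sample values are assumptions made for illustration.

```typescript
// Hypothetical shapes for the chain above. Only the entity names come from
// this document; every field name is an assumption made for illustration.
type Id = string;

// `SourceDocument` stands in for Document so that a standalone snippet does
// not clash with the DOM's global `Document` type.
interface SourceDocument { id: Id; title: string }

interface DocumentRepresentation { id: Id; documentId: Id; text: string }

// A technical mark on a source range, reduced here to a quoted passage.
interface Annotation { id: Id; representationId: Id; exact: string }

// The meaningful evidence object built from one or more annotations.
interface EvidenceItem { id: Id; annotationIds: Id[]; commentary?: string }

// Connects evidence to a structured target such as a form field or claim.
interface EvidenceLink {
  id: Id;
  evidenceItemId: Id;
  targetKind: "form-field" | "claim" | "requirement" | "decision" | "document-section";
  targetRef: string;
}

// Portable rendering input for webpages, Markdown, or reports.
interface CitationCard { evidenceItemId: Id; quote: string; sourceTitle: string }

// Derive a card by following the evidence item's references back to the source.
function buildCard(doc: SourceDocument, annotations: Annotation[], item: EvidenceItem): CitationCard {
  const quotes = annotations
    .filter((a) => item.annotationIds.indexOf(a.id) >= 0)
    .map((a) => a.exact);
  return { evidenceItemId: item.id, quote: quotes.join(" [...] "), sourceTitle: doc.title };
}

// Walk the chain once with literal sample values.
const sampleDoc: SourceDocument = { id: "doc-1", title: "Sample Report" };
const sampleRep: DocumentRepresentation = { id: "rep-1", documentId: sampleDoc.id, text: "The pilot reduced costs." };
const sampleAnn: Annotation = { id: "ann-1", representationId: sampleRep.id, exact: "reduced costs" };
const sampleItem: EvidenceItem = { id: "ev-1", annotationIds: [sampleAnn.id] };
const sampleLink: EvidenceLink = { id: "link-1", evidenceItemId: sampleItem.id, targetKind: "claim", targetRef: "claim-42" };
const sampleCard = buildCard(sampleDoc, [sampleAnn], sampleItem);
```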
---
## Scope
This repository should own:
* the core domain model
* TypeScript interfaces and schemas for citation-evidence entities
* service contracts for documents, annotations, evidence, bindings, and citation rendering
* persistence interfaces
* event definitions
* citation card rendering contracts
* import/export contracts
* W3C Web Annotation mapping where practical
* orchestration-level use cases that combine domain objects
It should provide the stable contracts consumed by:
* **evidence-anchor**
* **evidence-source**
* **citation-work**
* **evidence-binder**
* **citation-evidence**
---
## Out of Scope
This repository should not own implementation details that belong to more focused subsystems.
Specifically, it should not own:
* PDF rendering internals
* viewer-specific selection capture
* low-level selector resolution algorithms
* fuzzy text matching implementations
* document parsing and ingestion pipelines
* OCR processing
* external source discovery implementations
* full review workspace UI
* form UI rendering
* visual guide overlay rendering
* application shell and deployment configuration
Those responsibilities belong to the appropriate subsystem repositories.
---
## Architectural Position
The repository sits between the product shell and the specialized subsystems.
```text
citation-evidence
    integrated product shell
citation-engine
    core model, APIs, persistence contracts, citation rendering
evidence-anchor
    selector creation, resolution, re-anchoring, highlight contracts
evidence-source
    ingestion, extraction, metadata, source discovery, recovery
evidence-binder
    evidence-to-field / claim / requirement links
citation-work
    review workspace and annotation UX
```
The engine should define the contract, not dictate every implementation.
---
## Design Principles
### Viewer Independence
The engine must not depend on one PDF viewer, Markdown renderer, HTML renderer, or frontend framework.
Viewer-specific logic should be hidden behind adapters owned by other subsystems.
### Evidence as First-Class Object
Evidence must not be reduced to a highlight.
An evidence item may include commentary, confidence, status, tags, and links to structured targets.
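As an illustration of the difference between a highlight and an evidence item, the attributes named above could be carried on a single record. The field names, the status values, the 0 to 1 confidence range, and the `setStatus` helper are all assumptions for this sketch, not the engine's actual model.

```typescript
// All field names and status values are assumptions, not the engine's schema.
type EvidenceStatus = "draft" | "accepted" | "rejected";

interface EvidenceItem {
  id: string;
  annotationIds: string[]; // one or more annotations back the item
  commentary?: string;
  confidence?: number;     // assumed range: 0 to 1
  status: EvidenceStatus;
  tags: string[];
  targetRefs: string[];    // identifiers of linked structured targets
}

// Return an updated copy instead of mutating, so earlier states stay intact.
function setStatus(item: EvidenceItem, status: EvidenceStatus): EvidenceItem {
  return { ...item, status };
}

const draftItem: EvidenceItem = {
  id: "ev-1",
  annotationIds: ["ann-1", "ann-2"],
  commentary: "Both passages support the cost claim.",
  confidence: 0.8,
  status: "draft",
  tags: ["costs"],
  targetRefs: ["claim-42"],
};
const acceptedItem = setStatus(draftItem, "accepted");
```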
### Selector Neutrality
The engine should understand selectors as domain objects but should not own all selector resolution logic.
Selector creation and resolution belong primarily to **evidence-anchor**.
### Storage Replaceability
The engine should define persistence interfaces that can be implemented by local files, SQLite, PostgreSQL, browser storage, or other storage backends.
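A persistence interface of this kind might look like the sketch below, paired with an in-memory implementation. The `Repository` name and its methods are hypothetical. The contract is kept synchronous for brevity; one backed by SQLite or PostgreSQL would more likely return Promises.

```typescript
// Hypothetical persistence contract; the method names are assumptions.
interface Repository<T extends { id: string }> {
  get(id: string): T | undefined;
  save(entity: T): void;
  list(): T[];
}

// In-memory implementation for development and tests.
class InMemoryRepository<T extends { id: string }> implements Repository<T> {
  private items: T[] = [];

  get(id: string): T | undefined {
    return this.items.filter((e) => e.id === id)[0];
  }

  save(entity: T): void {
    // Drop any existing entity with the same id, then append the new one.
    this.items = this.items.filter((e) => e.id !== entity.id).concat([entity]);
  }

  list(): T[] {
    // Return a copy so callers cannot modify the internal array.
    return this.items.slice();
  }
}

// Callers depend only on the interface, so the backend can be swapped.
function countStored<T extends { id: string }>(repo: Repository<T>): number {
  return repo.list().length;
}

interface StoredDocument { id: string; title: string }

const documentRepo: Repository<StoredDocument> = new InMemoryRepository<StoredDocument>();
documentRepo.save({ id: "doc-1", title: "Sample Report" });
documentRepo.save({ id: "doc-1", title: "Sample Report (revised)" });
```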
### Portable Presentation
Citation rendering should support multiple output targets, especially:
* internal web UI
* web components
* Markdown
* HTML
* report and document exports, at a later stage
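A sketch of how one card could be rendered to two of these targets. The `CitationCard` fields and both function names are assumptions; the Markdown renderer does not escape Markdown syntax inside the quote, which a real renderer would need to consider.

```typescript
// Hypothetical card shape; the field names are assumptions.
interface CitationCard {
  quote: string;
  sourceTitle: string;
  sourceUrl?: string;
}

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render as a Markdown blockquote followed by a source line.
function renderMarkdown(card: CitationCard): string {
  const source = card.sourceUrl ? `[${card.sourceTitle}](${card.sourceUrl})` : card.sourceTitle;
  const quoted = card.quote
    .split("\n")
    .map((line) => `> ${line}`)
    .join("\n");
  return `${quoted}\n>\n> Source: ${source}`;
}

// Render the same card as an HTML blockquote with a cited source.
function renderHtml(card: CitationCard): string {
  const title = escapeHtml(card.sourceTitle);
  const source = card.sourceUrl ? `<a href="${escapeHtml(card.sourceUrl)}">${title}</a>` : title;
  return `<blockquote><p>${escapeHtml(card.quote)}</p><footer><cite>${source}</cite></footer></blockquote>`;
}
```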
### Standards Compatibility
The engine should support mapping to W3C Web Annotation concepts where practical, but it does not need to use JSON-LD as the only internal representation.
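One possible mapping, sketched under assumptions: an internal annotation with a quoted passage is exported as a W3C annotation whose target carries a `TextQuoteSelector`. The `EngineAnnotation` shape, the `toW3CAnnotation` function, and the `baseIri` parameter are hypothetical; the `@context`, `type`, `target`, `selector`, and `TextualBody` names follow the W3C Web Annotation Data Model.

```typescript
// Hypothetical internal annotation shape; the field names are assumptions.
interface EngineAnnotation {
  id: string;
  sourceIri: string; // IRI of the annotated document
  exact: string;
  prefix?: string;
  suffix?: string;
  comment?: string;
}

// Subset of the W3C Web Annotation Data Model used by this sketch.
interface W3CTextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface W3CAnnotation {
  "@context": "http://www.w3.org/ns/anno.jsonld";
  id: string;
  type: "Annotation";
  body?: { type: "TextualBody"; value: string };
  target: { source: string; selector: W3CTextQuoteSelector };
}

// `baseIri` is a hypothetical parameter: W3C annotation ids are IRIs, so the
// internal id is appended to a caller-supplied base.
function toW3CAnnotation(a: EngineAnnotation, baseIri: string): W3CAnnotation {
  const selector: W3CTextQuoteSelector = { type: "TextQuoteSelector", exact: a.exact };
  // Only emit optional keys that are actually present.
  if (a.prefix !== undefined) selector.prefix = a.prefix;
  if (a.suffix !== undefined) selector.suffix = a.suffix;

  const result: W3CAnnotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    id: `${baseIri}/${encodeURIComponent(a.id)}`,
    type: "Annotation",
    target: { source: a.sourceIri, selector: selector },
  };
  if (a.comment !== undefined) result.body = { type: "TextualBody", value: a.comment };
  return result;
}

const exportedAnnotation = toW3CAnnotation(
  { id: "ann-1", sourceIri: "https://example.org/report.pdf", exact: "reduced costs", prefix: "The pilot " },
  "https://example.org/annotations"
);
```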
### Agent Readiness
The model and APIs should be explicit, machine-readable, and suitable for agentic workflows.
---
## Initial Domain Services
The first implementation should likely define service contracts for:
```text
DocumentService
    create, get, update, list, attach representation
AnnotationService
    create, get, list by document, resolve status, update
EvidenceService
    create evidence item, attach annotation, update commentary, set status
EvidenceBindingService
    link evidence to target, list evidence for target, switch active evidence
CitationRenderingService
    render citation card as HTML, Markdown, or structured object
ImportExportService
    import/export internal model and W3C-compatible annotation data
```
These services may initially be interfaces and in-memory implementations only.
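To show what "interface plus in-memory implementation" could mean for one of these contracts, here is a sketch of `EvidenceBindingService`. The method signatures, the `active` flag, and the rule that the first link for a target starts out active are assumptions, not decisions recorded in this document.

```typescript
// Hypothetical link shape; `active` marks the evidence currently shown for a target.
interface EvidenceLink {
  id: string;
  evidenceItemId: string;
  targetRef: string;
  active: boolean;
}

// Contract sketch for the three operations listed above; names are assumptions.
interface EvidenceBindingService {
  linkEvidenceToTarget(evidenceItemId: string, targetRef: string): EvidenceLink;
  listEvidenceForTarget(targetRef: string): EvidenceLink[];
  switchActiveEvidence(targetRef: string, evidenceItemId: string): void;
}

class InMemoryEvidenceBindingService implements EvidenceBindingService {
  private links: EvidenceLink[] = [];
  private nextId = 1;

  linkEvidenceToTarget(evidenceItemId: string, targetRef: string): EvidenceLink {
    // Assumption: the first link created for a target becomes the active one.
    const active = !this.links.some((l) => l.targetRef === targetRef);
    const link: EvidenceLink = { id: `link-${this.nextId}`, evidenceItemId, targetRef, active };
    this.nextId += 1;
    this.links.push(link);
    return link;
  }

  listEvidenceForTarget(targetRef: string): EvidenceLink[] {
    return this.links.filter((l) => l.targetRef === targetRef);
  }

  switchActiveEvidence(targetRef: string, evidenceItemId: string): void {
    const candidates = this.listEvidenceForTarget(targetRef);
    if (!candidates.some((l) => l.evidenceItemId === evidenceItemId)) {
      throw new Error(`Evidence ${evidenceItemId} is not linked to ${targetRef}`);
    }
    // Exactly one link per target stays active after the switch.
    candidates.forEach((l) => {
      l.active = l.evidenceItemId === evidenceItemId;
    });
  }
}

const bindingService = new InMemoryEvidenceBindingService();
bindingService.linkEvidenceToTarget("ev-1", "field:revenue");
bindingService.linkEvidenceToTarget("ev-2", "field:revenue");
bindingService.switchActiveEvidence("field:revenue", "ev-2");
```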
---
## Initial Entity Set
The first model version should include:
```text
Document
DocumentRepresentation
Annotation
Selector
EvidenceItem
EvidenceSet
EvidenceLink
CitationCard
CitationTarget
CitationRecoveryAttempt
```
The first implementation does not need to be complete, but the naming should stabilize early to guide the other repositories.
---
## Integration Expectations
This repository should be easy to consume from other subsystem projects.
Expected consumers:
* **evidence-anchor** uses the selector and annotation types.
* **evidence-source** creates documents and document representations.
* **citation-work** creates annotations and evidence items through engine services.
* **evidence-binder** creates and manages evidence links.
* **citation-evidence** integrates the services into the reference workspace.
The engine should avoid circular dependencies with these repositories.
---
## First Useful Version
A first useful version of **citation-engine** should provide:
* core TypeScript types for the main domain objects,
* in-memory repositories for development and tests,
* basic services for creating documents, annotations, evidence items, and evidence links,
* a simple citation card renderer for Markdown and HTML,
* basic event types,
* initial W3C Web Annotation mapping notes or stubs,
* examples showing how the other subsystem repos should interact with the engine.
---
## Success Criteria
The repository is successful when another developer or coding agent can use it to understand and implement the core citation-evidence domain without guessing the central concepts.
A first successful implementation should make it possible to:
1. create a document,
2. attach a document representation,
3. create an annotation with selectors,
4. create an evidence item from the annotation,
5. link the evidence item to a target,
6. render the evidence as a citation card,
7. leave viewer-specific and ingestion-specific work to other subsystems.
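Steps 1 to 6 can be walked through with a small in-memory sketch. Only the entity names come from this document; the `InMemoryEngine` class, its methods, the fields, and the sample data are hypothetical, and `SourceDocument` stands in for Document to avoid the DOM's global `Document` type in a standalone snippet.

```typescript
// Hypothetical entity shapes; all field names are assumptions.
interface SourceDocument { id: string; title: string }
interface DocumentRepresentation { id: string; documentId: string; text: string }
interface Annotation { id: string; representationId: string; exact: string }
interface EvidenceItem { id: string; annotationId: string; commentary: string }
interface EvidenceLink { evidenceItemId: string; targetRef: string }

// Look up an entity by id, failing loudly on dangling references.
function byId<T extends { id: string }>(items: T[], id: string): T {
  const found = items.filter((item) => item.id === id)[0];
  if (found === undefined) throw new Error(`Unknown id: ${id}`);
  return found;
}

class InMemoryEngine {
  private seq = 0;
  private documents: SourceDocument[] = [];
  private representations: DocumentRepresentation[] = [];
  private annotations: Annotation[] = [];
  private evidence: EvidenceItem[] = [];
  readonly links: EvidenceLink[] = [];

  private nextId(prefix: string): string {
    this.seq += 1;
    return `${prefix}-${this.seq}`;
  }

  // Step 1: create a document.
  createDocument(title: string): SourceDocument {
    const doc: SourceDocument = { id: this.nextId("doc"), title };
    this.documents.push(doc);
    return doc;
  }

  // Step 2: attach a representation to an existing document.
  attachRepresentation(documentId: string, text: string): DocumentRepresentation {
    byId(this.documents, documentId);
    const rep: DocumentRepresentation = { id: this.nextId("rep"), documentId, text };
    this.representations.push(rep);
    return rep;
  }

  // Step 3: create an annotation; the quoted text must occur in the representation.
  createAnnotation(representationId: string, exact: string): Annotation {
    const rep = byId(this.representations, representationId);
    if (rep.text.indexOf(exact) < 0) throw new Error("Quoted text not found in representation");
    const ann: Annotation = { id: this.nextId("ann"), representationId, exact };
    this.annotations.push(ann);
    return ann;
  }

  // Step 4: create an evidence item from the annotation.
  createEvidenceItem(annotationId: string, commentary: string): EvidenceItem {
    byId(this.annotations, annotationId);
    const item: EvidenceItem = { id: this.nextId("ev"), annotationId, commentary };
    this.evidence.push(item);
    return item;
  }

  // Step 5: link the evidence item to a target.
  linkEvidence(evidenceItemId: string, targetRef: string): EvidenceLink {
    byId(this.evidence, evidenceItemId);
    const link: EvidenceLink = { evidenceItemId, targetRef };
    this.links.push(link);
    return link;
  }

  // Step 6: render the evidence as a Markdown citation card.
  renderCard(evidenceItemId: string): string {
    const item = byId(this.evidence, evidenceItemId);
    const ann = byId(this.annotations, item.annotationId);
    const rep = byId(this.representations, ann.representationId);
    const doc = byId(this.documents, rep.documentId);
    return `> ${ann.exact}\n>\n> Source: ${doc.title}`;
  }
}

const engine = new InMemoryEngine();
const annualReport = engine.createDocument("Annual Report");
const reportText = engine.attachRepresentation(annualReport.id, "Revenue grew by 12 percent in 2023.");
const passage = engine.createAnnotation(reportText.id, "Revenue grew by 12 percent");
const finding = engine.createEvidenceItem(passage.id, "Supports the growth claim.");
engine.linkEvidence(finding.id, "form-field:revenue-growth");
const cardMarkdown = engine.renderCard(finding.id);
```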
---
## Repository Character
This repository should be:
* domain-centered,
* stable but evolvable,
* implementation-light at first,
* strongly typed,
* explicit about boundaries,
* friendly to both humans and coding agents,
* suitable as the conceptual backbone of the citation-evidence ecosystem.
---
## MVP Coordination — Code Lives Upstream
During the umbrella-first MVP phase (decided 2026-05-24), **the source code
for this subsystem does not live in this repository yet**. It lives in the
umbrella repo at `citation-evidence/src/engine/` and `citation-evidence/src/shared/`.
This INTENT.md documents the *intended* responsibilities and boundaries.
When the engine's interfaces have stabilized through actual MVP use, the
corresponding code will be extracted into this repository.
**Shared contracts** (vocabulary, state enums, relation types, selector
taxonomy, event types, viewer adapter, canonical text normalization, allowed
dependency edges) are maintained in the umbrella repo:
* `citation-evidence/wiki/SharedContracts.md`
* `citation-evidence/wiki/DependencyMap.md`
* `citation-evidence/docs/decisions/` (ADRs)
This subsystem's eventual code must not contradict those documents. Changes
to shared contracts happen in the umbrella, not here.
Under the dependency map, **`citation-engine` is the leaf node** — it depends
on none of the other subsystems. Every other subsystem depends on it.
---
## Guiding Statement
**citation-engine exists to make evidence-backed citations a stable, reusable, and portable domain model.**