generated from coulomb/repo-seed
Establish shared-contracts home, dependency map, MVP workplans, and umbrella-first strategy
- INTENT.md: declare umbrella as the home for shared contracts; document umbrella-first MVP decision (code lives here until subsystems stabilize)
- wiki/SharedContracts.md: vocabulary, state enums, relation types, selector taxonomy, event vocabulary, viewer adapter contract, canonical text normalization, rect-registry contract
- wiki/DependencyMap.md: allowed dependency edges; folder layout + lint-rule strategy during umbrella-first phase
- history/2026-05-24-initial-assessment.md: alignment review, technical risks, and the umbrella-first pivot rationale
- workplans/CE-WP-0001..0004: four ralph-compatible workplans covering foundations, PDF review slice, form binding + visual guide, and citation card export — implementing PRD §20 end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
155
wiki/DependencyMap.md
Normal file
@@ -0,0 +1,155 @@
# Dependency Map — citation-evidence

This document describes the **allowed dependency edges** between the
subsystems of the citation-evidence ecosystem. It is the cycle-prevention
contract.

It complements `SharedContracts.md` (which says *what* is shared) by saying
*who is allowed to depend on whom*.

---

## 1. The rule

> Types flow downward from `citation-engine`. Behavior flows upward into
> specialised repos. No subsystem may import another subsystem's behavior
> unless this map shows an edge.

The umbrella repo `citation-evidence` is allowed to depend on every
subsystem; nothing depends on the umbrella.

---

## 2. Allowed edges

```
                      ┌───────────────────────┐
                      │   citation-evidence   │  (umbrella)
                      └───────────┬───────────┘
                                  │ depends on
       ┌──────────────────────────┼────────────────────────────┐
       ▼                          ▼                            ▼
┌───────────────┐        ┌────────────────┐           ┌────────────────┐
│ citation-     │        │ evidence-      │           │ citation-      │
│ work          │        │ binder         │           │ engine         │
└──────┬────────┘        └────────┬───────┘           └────────┬───────┘
       │                          │                            │
       │ depends on               │ depends on                 │ depends on
       │                          │                            │ (nothing —
       ▼                          ▼                            │  leaf node)
┌────────────────┐       ┌────────────────┐                    │
│ evidence-      │       │ evidence-      │                    │
│ anchor         │       │ anchor         │                    │
└──────┬─────────┘       └────────┬───────┘                    │
       │                          │                            │
       │ depends on               │ depends on                 │
       ▼                          ▼                            ▼
┌────────────────┐       ┌────────────────┐            (citation-engine)
│ evidence-      │       │ citation-      │
│ source         │       │ engine         │
└────────┬───────┘       └────────────────┘
         │
         │ depends on
         ▼
┌────────────────┐
│ citation-      │
│ engine         │
└────────────────┘
```

In tabular form:

| Repo                | May depend on                                           | Must not depend on                                                         |
|---------------------|---------------------------------------------------------|----------------------------------------------------------------------------|
| `citation-engine`   | (nothing — it is the leaf)                              | every other subsystem                                                      |
| `evidence-anchor`   | `citation-engine`                                       | `evidence-source`, `citation-work`, `evidence-binder`, `citation-evidence` |
| `evidence-source`   | `citation-engine`                                       | `evidence-anchor`, `citation-work`, `evidence-binder`, `citation-evidence` |
| `evidence-binder`   | `citation-engine`, `evidence-anchor`                    | `evidence-source`, `citation-work`, `citation-evidence`                    |
| `citation-work`     | `citation-engine`, `evidence-anchor`, `evidence-source` | `evidence-binder`, `citation-evidence`                                     |
| `citation-evidence` | all five subsystems                                     | (nothing else in the ecosystem)                                            |

Notes:

- `evidence-source` does NOT depend on `evidence-anchor`. When an ingestion
  pipeline needs to know "could a selector resolve here?", the answer comes
  through events, not direct calls.
- `citation-work` does NOT depend on `evidence-binder`. Linking evidence to
  form fields is a separate workflow; the review workspace should function
  without it. A separate "evidence-backed form" application composes work +
  binder + engine.
- `evidence-binder` does NOT depend on `evidence-source`. When a binder needs
  source context, it asks `evidence-anchor` to resolve the annotation, which
  in turn knows nothing about how the document was ingested.
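
As a minimal sketch (not part of the contract), the "May depend on" column of the table above can be transcribed into data, so that a test can check a proposed edge and confirm the map stays acyclic. The names `Repo`, `ALLOWED_EDGES`, `isAllowedEdge`, and `hasCycle` are hypothetical; only the repo names and the edges come from the table.

```typescript
// Repo names from the table above; the data shape and helper names are hypothetical.
type Repo =
  | "citation-engine"
  | "evidence-anchor"
  | "evidence-source"
  | "evidence-binder"
  | "citation-work"
  | "citation-evidence";

// "May depend on" column of the table, transcribed as data.
const ALLOWED_EDGES: Record<Repo, readonly Repo[]> = {
  "citation-engine": [],
  "evidence-anchor": ["citation-engine"],
  "evidence-source": ["citation-engine"],
  "evidence-binder": ["citation-engine", "evidence-anchor"],
  "citation-work": ["citation-engine", "evidence-anchor", "evidence-source"],
  "citation-evidence": [
    "citation-engine",
    "evidence-anchor",
    "evidence-source",
    "evidence-binder",
    "citation-work",
  ],
};

// True when `from` may import `to` directly.
function isAllowedEdge(from: Repo, to: Repo): boolean {
  return ALLOWED_EDGES[from].indexOf(to) !== -1;
}

// Depth-first search that reports whether an edge map contains a cycle.
function hasCycle(edges: Record<Repo, readonly Repo[]>): boolean {
  const state = new Map<Repo, "visiting" | "done">();
  const visit = (node: Repo): boolean => {
    if (state.get(node) === "done") return false;
    if (state.get(node) === "visiting") return true; // back edge found
    state.set(node, "visiting");
    const cyclic = edges[node].some((next) => visit(next));
    state.set(node, "done");
    return cyclic;
  };
  return (Object.keys(edges) as Repo[]).some((node) => visit(node));
}
```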

---

## 3. Communication channels

Direct imports are allowed only along the edges above. Where two subsystems
need to coordinate without being allowed to import each other, they use one
of these indirect channels:

| Channel                | Owner             | Notes                                                             |
|------------------------|-------------------|-------------------------------------------------------------------|
| Shared event bus       | `citation-engine` | Vocabulary frozen in `SharedContracts.md` §4                      |
| Shared types package   | `citation-engine` | Re-exported through `@citation-evidence/engine` (post-extraction) |
| Rect registry          | `evidence-binder` | Used by form UI, evidence sidebar, viewer adapter                 |
| Persistence interfaces | `citation-engine` | Concrete adapters in subsystems but interfaces in engine          |

---

## 4. During umbrella-first MVP

While all code lives in `citation-evidence/src/`, the rule is enforced by
**folder structure** and **lint rules**:

```
citation-evidence/src/
  shared/   ← what will become citation-engine (types + contracts)
  engine/   ← what will become citation-engine (services)
  anchor/   ← what will become evidence-anchor
  source/   ← what will become evidence-source
  work/     ← what will become citation-work (UI)
  binder/   ← what will become evidence-binder
  app/      ← the umbrella reference app
```

Lint rules (to be added in WP-0001):

- `engine/` may import only from `shared/`.
- `anchor/` may import only from `shared/`, `engine/`.
- `source/` may import only from `shared/`, `engine/`.
- `binder/` may import only from `shared/`, `engine/`, `anchor/`.
- `work/` may import only from `shared/`, `engine/`, `anchor/`, `source/`.
- `app/` may import from any folder.

Violating these rules in MVP is a lint error, not a runtime error. When
subsystems extract into their own repos, the lint rule disappears and the
package boundary enforces the same constraint.
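
The folder rules above could be checked by something like the following sketch, which a lint plugin or a standalone script might wrap. It assumes both paths are already resolved to locations under `src/`; the names `MAY_IMPORT`, `folderOf`, and `checkImport` are hypothetical, and the empty rule for `shared/` is an assumption because the list above does not mention it.

```typescript
// Folder names come from the layout above; all identifiers here are hypothetical.
type Folder = "shared" | "engine" | "anchor" | "source" | "work" | "binder" | "app";

const MAY_IMPORT: Record<Folder, readonly Folder[]> = {
  shared: [], // assumption: the rule list does not mention shared/
  engine: ["shared"],
  anchor: ["shared", "engine"],
  source: ["shared", "engine"],
  binder: ["shared", "engine", "anchor"],
  work: ["shared", "engine", "anchor", "source"],
  app: ["shared", "engine", "anchor", "source", "work", "binder"],
};

// Extracts the top-level folder from a path such as "citation-evidence/src/binder/links.ts".
function folderOf(path: string): Folder | null {
  const match = /(?:^|\/)src\/([^/]+)\//.exec(path);
  if (!match) return null;
  const name = match[1];
  return Object.prototype.hasOwnProperty.call(MAY_IMPORT, name) ? (name as Folder) : null;
}

// Returns an error message for a forbidden import, or null when the import is fine.
function checkImport(importerPath: string, importedPath: string): string | null {
  const from = folderOf(importerPath);
  const to = folderOf(importedPath);
  // Paths outside src/ and imports within one folder are not covered by the rules.
  if (from === null || to === null || from === to) return null;
  if (MAY_IMPORT[from].indexOf(to) !== -1) return null;
  return `${from}/ must not import from ${to}/`;
}
```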

---

## 5. Why these rules

1. **`citation-engine` as the leaf** prevents the most common monorepo pathology:
   the "core" repo accumulating UI/IO dependencies because it was easier than
   inverting a dependency.
2. **`citation-work` ⊄ `evidence-binder`** keeps the review workspace usable
   even when there is no form context (e.g. just collecting evidence for a
   report).
3. **`evidence-binder` ⊄ `evidence-source`** keeps binding logic from
   accidentally caring about ingestion details.
4. **No subsystem depends on `citation-evidence`** — the umbrella is a
   composition point, not a library.

---

## 6. Change process

Adding an edge to this map is a change to the contract.

- New edges require a short ADR in `docs/decisions/`.
- Removing an edge requires a refactoring plan (where do consumers go?).
- The MVP itself is an exception: edges that turn out to be wrong during
  umbrella-first development are recorded as "deferred reshape" items in the
  relevant workplan, not as ADRs.

296
wiki/SharedContracts.md
Normal file
@@ -0,0 +1,296 @@
# Shared Contracts — citation-evidence

This document is the **single source of truth** for everything that more than one
subsystem in the citation-evidence ecosystem must agree on:

- the **vocabulary** (entity names and what they mean),
- the **canonical state enums** for entities that flow across repo boundaries,
- the **relation type** vocabulary,
- the **selector type** taxonomy,
- the **event type** vocabulary,
- the **ownership rules** for shared types versus shared behavior.

The five sister repos (`citation-engine`, `evidence-anchor`, `evidence-source`,
`citation-work`, `evidence-binder`) defer to this document. When their
`INTENT.md` files refer to "shared contracts", they mean this file.

During the umbrella-first MVP phase, the **TypeScript implementations** of
these contracts live in `citation-evidence/src/shared/` and are imported by
the per-subsystem code under `citation-evidence/src/{engine,anchor,source,work,binder}/`.
When a subsystem extracts to its own repo, it takes its slice of the shared
types with it — but this document remains the canonical vocabulary.

---

## 1. Vocabulary

These nine entities are the vocabulary every subsystem uses.

| Entity                    | One-line definition                                                                               | Owner (post-extraction)                                  |
|---------------------------|---------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| `Document`                | An identified source object: PDF, Markdown, HTML, scan, etc.                                      | `citation-engine`                                        |
| `DocumentRepresentation`  | A normalized, addressable view of a document (canonical text, page map, structure).               | `citation-engine`                                        |
| `Selector`                | A technical locator for a passage inside a representation.                                        | `citation-engine` (types) / `evidence-anchor` (behavior) |
| `Annotation`              | A technical mark on a document range, expressed as one or more selectors plus quote text.         | `citation-engine`                                        |
| `EvidenceItem`            | A meaningful evidence object built from one or more annotations, with commentary and status.      | `citation-engine`                                        |
| `EvidenceSet`             | An ordered group of evidence items associated with a target or topic.                             | `citation-engine` (type) / `evidence-binder` (behavior)  |
| `EvidenceLink`            | A relation between an `EvidenceItem` and a structured target (form field, claim, requirement, …). | `citation-engine` (type) / `evidence-binder` (behavior)  |
| `CitationCard`            | A renderable, exportable presentation of an evidence item.                                        | `citation-engine`                                        |
| `CitationRecoveryAttempt` | A traceable attempt to locate a cited passage from an external clue.                              | `citation-engine` (type) / `evidence-source` (behavior)  |

**Ownership rule:** *types and interfaces flow downward from `citation-engine`;
behavior flows upward into the specialised repos*. Where the table shows a
split, the engine repo holds the data shape and the other repo holds the
algorithms and lifecycle.

---

## 2. Canonical state enums

These enums are the authoritative values. Subsystems must not invent local
variants without updating this document first.

### 2.1 `Annotation.resolutionStatus`

```
resolved      — selectors located the passage with high confidence
ambiguous     — multiple plausible candidates found
unresolved    — no plausible candidate found
stale         — representation has changed since selectors were stored
```

### 2.2 `EvidenceItem.status`

```
candidate     — captured but not yet vetted
confirmed     — verified by a user as useful evidence
rejected      — explicitly discarded
needs-check   — flagged for review
```

> **Note:** earlier subsystem drafts introduced `strong-support`, `weak-support`,
> and `contradicts` on the item. Those concepts now live on the **link**, not
> the item: see §2.4 for link status and §2.5 for link relations such as
> `contradicts`.

### 2.3 `Document.reviewStatus` (when used by `citation-work`)

```
unreviewed
in-review
relevant
rejected
needs-follow-up
cited
verified
```

`citation-work` may treat any of these as the active state; the canonical
storage lives on the Document record in `citation-engine`.

### 2.4 `EvidenceLink.status` (per target)

```
no-evidence
candidate
confirmed
conflicting
insufficient
verified
```

`no-evidence` is a *derived* state computed when a target has zero links;
it is not stored on a link itself.
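
One way to keep `no-evidence` out of storage is to split the type into stored and derived values, as in this sketch. The names `StoredLinkStatus`, `EvidenceLinkLike`, and `statusesForTarget` are hypothetical, and the minimal link shape is an assumption. How several links on one target combine into a single status is not specified here, so the sketch returns the stored statuses as they are.

```typescript
// Values that may be persisted on a link (everything in §2.4 except "no-evidence").
type StoredLinkStatus = "candidate" | "confirmed" | "conflicting" | "insufficient" | "verified";

// Full §2.4 vocabulary, including the derived value.
type EvidenceLinkStatus = StoredLinkStatus | "no-evidence";

// Hypothetical minimal link shape, just enough for the example.
interface EvidenceLinkLike {
  targetId: string;
  status: StoredLinkStatus;
}

// Returns "no-evidence" when no link points at the target, otherwise the stored statuses.
function statusesForTarget(
  targetId: string,
  links: readonly EvidenceLinkLike[],
): EvidenceLinkStatus[] {
  const own = links.filter((link) => link.targetId === targetId).map((link) => link.status);
  return own.length === 0 ? ["no-evidence"] : own;
}
```

Because `EvidenceLinkLike.status` is typed as `StoredLinkStatus`, an attempt to persist `"no-evidence"` on a link fails at compile time.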

### 2.5 `EvidenceLink.relation`

```
supports
contradicts
explains
qualifies
source-for
context-for
```

This is the closed vocabulary for the MVP. Adding a relation requires updating
this document and the `EvidenceLink` schema together.

### 2.6 `CitationRecoveryAttempt.state`

```
created
source-found-fulltext
source-found-preview-only
source-found-metadata-only
source-not-found
quote-found
quote-not-found
candidate-passages-found
manual-confirmation-needed
confirmed
annotation-created
failed
```

---

## 3. Selector taxonomy

A `Selector` is a discriminated union of:

```
TextQuoteSelector       exact quote + prefix/suffix context
TextPositionSelector    canonical text start/end offsets
PdfRectSelector         page number + normalized page rectangles
PdfPageTextSelector     page number + page-local text offsets
DomRangeSelector        DOM path + range offsets (HTML/Markdown)
StructuralSelector      heading/section/AST path
FragmentSelector        exported fragment / deep link (export-only)
```

**Selector redundancy rule:** when an annotation is created, the system stores
*all selector types that are available* for that document representation, not
just one. Resolution tries them in order of expected confidence and stops at
the first high-confidence match.

W3C Web Annotation mapping uses these same concepts but as JSON-LD; the mapping
is documented separately (see ADR-0003 — pending).
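
The selector redundancy rule in this section can be illustrated with a toy resolver. This is a sketch under several assumptions: only two of the seven selector kinds are modelled, the field names are hypothetical, and the try order, the `0.9` threshold, and the confidence values are invented for the example rather than taken from the contract.

```typescript
// Two of the seven selector kinds; field names are hypothetical.
type Selector =
  | { type: "TextPositionSelector"; start: number; end: number }
  | { type: "TextQuoteSelector"; exact: string };

interface AnnotationLike { quote: string; selectors: Selector[] }
interface Match { start: number; end: number; confidence: number }

const HIGH_CONFIDENCE = 0.9; // assumed threshold
// Assumed order of expected confidence, tried first to last.
const TRY_ORDER: Array<Selector["type"]> = ["TextPositionSelector", "TextQuoteSelector"];

// Toy resolver for a single selector against canonical text.
function resolveOne(selector: Selector, quote: string, text: string): Match | null {
  if (selector.type === "TextPositionSelector") {
    // Offsets only count if the stored quote is still at that position.
    return text.slice(selector.start, selector.end) === quote
      ? { start: selector.start, end: selector.end, confidence: 1 }
      : null;
  }
  const start = text.indexOf(selector.exact);
  if (start === -1) return null;
  // A quote that occurs more than once is only a candidate, so it scores low.
  const unique = text.indexOf(selector.exact, start + 1) === -1;
  return { start, end: start + selector.exact.length, confidence: unique ? 1 : 0.5 };
}

// Tries selectors in TRY_ORDER and stops at the first high-confidence match.
function resolveAnnotation(annotation: AnnotationLike, text: string): Match | null {
  const ordered = annotation.selectors
    .slice()
    .sort((a, b) => TRY_ORDER.indexOf(a.type) - TRY_ORDER.indexOf(b.type));
  for (const selector of ordered) {
    const match = resolveOne(selector, annotation.quote, text);
    if (match !== null && match.confidence >= HIGH_CONFIDENCE) return match;
  }
  return null;
}
```

In this sketch a `null` result covers both the "no candidate" and the "only low-confidence candidates" cases; a fuller version would report which `Annotation.resolutionStatus` from §2.1 applies.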

---

## 4. Event vocabulary

Events are the primary integration mechanism between subsystems. The closed
event vocabulary for the MVP is:

```
DocumentImported
DocumentRepresentationGenerated
AnnotationCreated
AnnotationResolved
AnnotationResolutionFailed
EvidenceItemCreated
EvidenceItemUpdated
EvidenceLinkCreated
EvidenceLinkUpdated
EvidenceItemActivated
FormFieldActivated
CitationCardRendered
CitationRecoveryStarted
CitationRecoveryCandidateFound
CitationRecoveryConfirmed
```

Subsystems must emit these events through a shared event bus owned by
`citation-engine`. Subsystems may listen to any event but must not invent
event types without updating this document.
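
A typed bus is one way to make the closed vocabulary enforceable at compile time. The sketch below covers four of the event names above; the payload shapes and the `EventBus` class are hypothetical, since this document fixes only the event names.

```typescript
// Four of the event names above; the payload shapes are hypothetical.
interface EventPayloads {
  DocumentImported: { documentId: string };
  AnnotationCreated: { annotationId: string; documentId: string };
  AnnotationResolved: { annotationId: string };
  AnnotationResolutionFailed: { annotationId: string; reason: string };
}

type EventType = keyof EventPayloads;
type Listener<T extends EventType> = (payload: EventPayloads[T]) => void;

class EventBus {
  private listeners = new Map<EventType, Array<(payload: any) => void>>();

  // Registers a listener and returns a function that removes it again.
  subscribe<T extends EventType>(type: T, listener: Listener<T>): () => void {
    const list = this.listeners.get(type) || [];
    list.push(listener);
    this.listeners.set(type, list);
    return () => {
      const index = list.indexOf(listener);
      if (index !== -1) list.splice(index, 1);
    };
  }

  // Delivers the payload to every listener currently registered for the type.
  publish<T extends EventType>(type: T, payload: EventPayloads[T]): void {
    const list = (this.listeners.get(type) || []).slice();
    for (const listener of list) listener(payload);
  }
}
```

With this shape, publishing an event name that is missing from `EventPayloads` is a type error, which mirrors the rule that new event types must first appear in this document.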

---

## 5. Viewer adapter contract

Viewer adapters are the bridge between a document format and the rest of the
system. They are **owned by `evidence-anchor`** as far as the contract goes;
concrete adapters may live in either `evidence-anchor` or `evidence-source`
depending on whether the heavy lifting is selector logic or document
representation logic.

```ts
interface DocumentViewerAdapter {
  mediaTypes: string[];
  load(document: Document, representation?: DocumentRepresentation): Promise<void>;
  getCurrentSelection(): Promise<SelectionCapture | null>;
  createSelectorsFromSelection(selection: SelectionCapture): Promise<Selector[]>;
  resolveSelectors(selectors: Selector[]): Promise<AnchorResolution>;
  scrollToResolvedTarget(target: ResolvedAnchorTarget, opts?: { center?: boolean; behavior?: "auto"|"smooth" }): Promise<void>;
  renderHighlight(target: ResolvedAnchorTarget, opts?: HighlightRenderOptions): Promise<void>;
  getHighlightClientRects(annotationId: string): Promise<DOMRect[]>;
}
```

MVP delivers a single `PDFViewerAdapter`. HTML and Markdown adapters are
deferred.

---

## 6. Canonical text normalization

All text-based selectors and quote matching depend on a deterministic
normalization function. The MVP normalization is:

1. Unicode NFC normalization.
2. Replace all line-ending sequences with `\n`.
3. Collapse runs of horizontal whitespace into a single space.
4. Strip soft hyphens (U+00AD).
5. Preserve paragraph boundaries (double `\n`).

**This function is versioned.** Stored selectors record the normalization
version they were created against. Changing the function later requires either
backwards-compatible behavior or a re-anchoring migration.

The reference implementation lives in `citation-evidence/src/shared/text/normalize.ts`.
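
The five steps above might look like the following in TypeScript. This is a sketch, not the contents of `normalize.ts`: the names `NORMALIZATION_VERSION` and `normalizeText` are hypothetical, "line-ending sequences" is assumed to mean CRLF and lone CR, and "horizontal whitespace" is assumed to mean spaces and tabs. The steps run in the order listed.

```typescript
// Hypothetical version tag; the contract only says that the function is versioned.
const NORMALIZATION_VERSION = 1;

function normalizeText(input: string): string {
  // Step 5 needs no code: newlines are never merged, so "\n\n" boundaries survive.
  return input
    .normalize("NFC")           // 1. Unicode NFC
    .replace(/\r\n?/g, "\n")    // 2. CRLF and lone CR become \n (assumed scope)
    .replace(/[ \t]+/g, " ")    // 3. runs of spaces and tabs become one space (assumed scope)
    .replace(/\u00AD/g, "");    // 4. strip soft hyphens
}

// Stored alongside selectors so a later change can trigger re-anchoring.
function normalizeWithVersion(input: string): { text: string; version: number } {
  return { text: normalizeText(input), version: NORMALIZATION_VERSION };
}
```

Because step 4 runs after step 3, a soft hyphen that sat between two spaces leaves two adjacent spaces behind. Whether that matters is a decision for the reference implementation.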

---

## 7. Visual guide rect registry

The visual-guide overlay (form field → evidence card → source highlight)
requires DOM rects from three independently-rendered subsystems. The contract
is a **rect registry** owned by `evidence-binder`:

```ts
interface RectRegistry {
  register(kind: "field" | "evidence-card" | "highlight", id: string, getRect: () => DOMRect | null): () => void;
  getRect(kind: "field" | "evidence-card" | "highlight", id: string): DOMRect | null;
  subscribe(listener: (event: RectRegistryEvent) => void): () => void;
}
```

Each renderer (form, evidence sidebar, viewer adapter) registers a
`getRect` callback. The overlay queries on-demand and re-renders on scroll,
resize, focus, and active-evidence change.

This contract MUST be defined and stable before any of the three renderers
hardens, or the overlay becomes the system's coupling bottleneck.
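
A minimal in-memory registry that follows the interface above could look like this sketch. To stay runnable outside a browser it uses a structural `RectLike` in place of `DOMRect`. The `RectRegistryEvent` shape is hypothetical because this document names the type without defining it, and `InMemoryRectRegistry` is an illustrative name.

```typescript
// Stand-in for DOMRect so the sketch runs outside a browser (assumption).
interface RectLike { x: number; y: number; width: number; height: number }
type RectKind = "field" | "evidence-card" | "highlight";

// Hypothetical event shape; the contract does not define RectRegistryEvent here.
interface RectRegistryEvent { type: "registered" | "unregistered"; kind: RectKind; id: string }

class InMemoryRectRegistry {
  private getters = new Map<string, () => RectLike | null>();
  private listeners: Array<(event: RectRegistryEvent) => void> = [];

  // Stores the callback and returns a function that unregisters it.
  register(kind: RectKind, id: string, getRect: () => RectLike | null): () => void {
    const key = `${kind}:${id}`;
    this.getters.set(key, getRect);
    this.emit({ type: "registered", kind, id });
    return () => {
      // Only remove the entry if it still belongs to this registration.
      if (this.getters.get(key) === getRect && this.getters.delete(key)) {
        this.emit({ type: "unregistered", kind, id });
      }
    };
  }

  // Calls the callback on demand, so the rect is never cached.
  getRect(kind: RectKind, id: string): RectLike | null {
    const getter = this.getters.get(`${kind}:${id}`);
    return getter ? getter() : null;
  }

  subscribe(listener: (event: RectRegistryEvent) => void): () => void {
    this.listeners.push(listener);
    return () => {
      const index = this.listeners.indexOf(listener);
      if (index !== -1) this.listeners.splice(index, 1);
    };
  }

  private emit(event: RectRegistryEvent): void {
    for (const listener of this.listeners.slice()) listener(event);
  }
}
```

Calling the registered callback on every `getRect` means the overlay always sees the renderer's current geometry, which fits the "queries on-demand" wording above.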

---

## 8. Ownership rules (the short version)

1. **Types and interfaces** flow downward from `citation-engine`.
2. **Behavior and algorithms** live in the specialised repos.
3. Where a concept appears in both a type and a behavior context (e.g.
   `Selector`, `EvidenceLink`, `EvidenceSet`, `CitationRecoveryAttempt`),
   the engine owns the shape and the specialised repo owns the lifecycle.
4. **The shared event bus is engine-owned**; subsystems publish and subscribe
   but do not extend the event vocabulary unilaterally.
5. **No new enum values, relation types, event types, or selector kinds**
   land in code without first appearing in this document.
6. During umbrella-first MVP: rules 1-5 are aspirational. We will tolerate
   small violations in `citation-evidence/src/` and reconcile during extraction.

---

## 9. Change process

Changes to this document are changes to the contract.

- Small additions (a new enum value, a new event type) can be made in a single
  PR that updates this doc + the type definitions + at least one consumer.
- Breaking changes (renaming an entity, removing a state, changing an
  ownership split) require a short ADR in `docs/decisions/` and a heads-up
  progress event on the state-hub.

---

## 10. Pending ADRs that will affect this document

These will be listed in `docs/decisions/` once written. Until then the document
reflects the current best understanding from the architecture overview.

- **ADR-0001** — Umbrella-first MVP strategy (decided 2026-05-24, this session).
- **ADR-0002** — Monorepo vs polyrepo packaging (pending).
- **ADR-0003** — W3C Web Annotation: lossy mapping vs round-trip guarantee (pending).
- **ADR-0004** — PDF viewer library choice: `react-pdf-highlighter-plus` vs PDF.js direct (pending).
- **ADR-0005** — Persistence: local-first SQLite vs Postgres from day one (pending).
- **ADR-0006** — Selector ownership split (types in engine, algorithms in anchor) (pending — implied here).