# Ecosystem State Assessment — citation-evidence family
**Date:** 2026-06-07
**Author:** Grok (Cursor), commissioned by Bernd
**Scope:** Review of all six `INTENT.md` files in the citation-evidence family, plus the
umbrella repo's code, workplans, wiki contracts, and test coverage — to assess current
state and recommend next steps.
---
## 1. Family topology
The citation-evidence ecosystem comprises **one umbrella repo and five subsystem repos**:
```text
citation-evidence (umbrella — all MVP code lives here)
├── citation-engine (domain model, services, persistence, rendering)
├── evidence-anchor (selectors, resolution, viewer adapter contract)
├── evidence-source (ingest, extraction, citation recovery)
├── citation-work (review workspace UX)
└── evidence-binder (evidence-to-target binding, visual guide)
```
| Repo | Declared role | Actual state (2026-06-07) |
|------|---------------|---------------------------|
| **citation-evidence** | Umbrella product, contracts, reference app | **Active** — ~118 TS/TSX files, tests, workplans, wiki, ADRs |
| **citation-engine** | Domain model, services, persistence, rendering | **INTENT + README only** — code in `src/{shared,engine}/` |
| **evidence-anchor** | Selectors, resolution, viewer adapter | **INTENT + README only** — code in `src/anchor/` |
| **evidence-source** | Ingest, extraction, recovery | **INTENT + README only** — code in `src/source/` (PDF only) |
| **citation-work** | Review workspace UX | **INTENT + README only** — code in `src/work/` |
| **evidence-binder** | Evidence-to-target binding, visual guide | **INTENT + README only** — code in `src/binder/` |
The INTENT-only state of the five sister repos is **intentional**, not neglect. On 2026-05-24 the family adopted an
**umbrella-first MVP** (ADR-0002 context, `INTENT.md` §MVP Strategy): prove the product
in one repo, then extract subsystems once boundaries are validated by real use.
---
## 2. INTENT.md quality — design maturity is high
All six `INTENT.md` files are coherent and mutually reinforcing. They share:
- The same core flow:
`Document → DocumentRepresentation → Annotation → EvidenceItem → EvidenceLink → CitationCard`
- Explicit **in-scope / out-of-scope** boundaries (each repo pushes responsibilities outward)
- A consistent document shape (Purpose, Scope, Workflows, Success Criteria, Guiding Statement)
- A shared **"MVP Coordination — Code Lives Upstream"** section pointing at
`citation-evidence/wiki/`
The umbrella `INTENT.md` is the strategic anchor: it owns shared contracts, integration,
and the reference scenario. Sister repos document *future* homes, not current code.
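
The core flow listed above can be pictured as a chain of records that each point one step back toward the source document. The following is a minimal sketch under stated assumptions: every field name, the `Store` shape, and the helpers `byId` and `traceCardToDocuments` are hypothetical and are not taken from `wiki/SharedContracts.md`.

```typescript
// Hypothetical record shapes: field names are illustrative only.
// SourceDocument stands in for Document to avoid clashing with the DOM's global Document type.
interface SourceDocument { id: string; title: string }
interface DocumentRepresentation { id: string; documentId: string; format: "pdf" | "markdown" | "html" }
interface Annotation { id: string; representationId: string; quote: string }
interface EvidenceItem { id: string; annotationId: string }
interface EvidenceLink { id: string; evidenceItemId: string; targetId: string }
interface CitationCard { id: string; evidenceLinkIds: string[] }

interface Store {
  documents: Map<string, SourceDocument>;
  representations: Map<string, DocumentRepresentation>;
  annotations: Map<string, Annotation>;
  evidenceItems: Map<string, EvidenceItem>;
  evidenceLinks: Map<string, EvidenceLink>;
}

// Index a list of records by their id.
function byId<T extends { id: string }>(rows: T[]): Map<string, T> {
  const index = new Map<string, T>();
  for (const row of rows) index.set(row.id, row);
  return index;
}

// Walk CitationCard -> EvidenceLink -> EvidenceItem -> Annotation
// -> DocumentRepresentation -> SourceDocument and collect document titles.
function traceCardToDocuments(card: CitationCard, store: Store): string[] {
  const titles = new Set<string>();
  for (const linkId of card.evidenceLinkIds) {
    const link = store.evidenceLinks.get(linkId);
    const evidence = link && store.evidenceItems.get(link.evidenceItemId);
    const annotation = evidence && store.annotations.get(evidence.annotationId);
    const rep = annotation && store.representations.get(annotation.representationId);
    const doc = rep && store.documents.get(rep.documentId);
    if (doc) titles.add(doc.title);
  }
  return Array.from(titles);
}

const sampleStore: Store = {
  documents: byId<SourceDocument>([{ id: "doc-1", title: "Sample paper" }]),
  representations: byId<DocumentRepresentation>([{ id: "rep-1", documentId: "doc-1", format: "pdf" }]),
  annotations: byId<Annotation>([{ id: "ann-1", representationId: "rep-1", quote: "A quoted passage." }]),
  evidenceItems: byId<EvidenceItem>([{ id: "ev-1", annotationId: "ann-1" }]),
  evidenceLinks: byId<EvidenceLink>([{ id: "link-1", evidenceItemId: "ev-1", targetId: "field-1" }]),
};

const backingTitles = traceCardToDocuments({ id: "card-1", evidenceLinkIds: ["link-1"] }, sampleStore);
```

The point of the sketch is only that a citation card stays traceable to its source as long as each hop keeps a stable id.
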
### 2.1 Ambiguities from the original INTENTs — largely resolved
The initial assessment (`history/2026-05-24-initial-assessment.md`) flagged overlapping
ownership (selectors, evidence states, viewer adapters, recovery). Those have since been
codified in:
- `wiki/SharedContracts.md` — canonical enums, vocabulary, type/behavior split
- `wiki/DependencyMap.md` — allowed import edges, cycle prevention
- `docs/decisions/` — ADR-0004 (PDF viewer), ADR-0006 (selector ownership),
ADR-0005 (persistence), ADR-0007 (citation card format), ADR-0008 (session archive), etc.
Notable reconciliations baked into sister INTENTs:
- `strong-support` / `weak-support` / `contradicts` moved from `EvidenceItem.status`
to `EvidenceLink.relation`
- Selector **types** → engine; selector **algorithms** → anchor
- `citation-work` must not depend on `evidence-binder` (review works standalone;
forms compose both)
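
To illustrate the first reconciliation, here is a minimal sketch of why the support relation fits on the link rather than the item: one passage can support one target and contradict another. The three relation values come from the text above; the `EvidenceStatus` values, the field names, and `relationsFor` are assumptions made for illustration.

```typescript
// Relation values are from the reconciliation above.
type EvidenceRelation = "strong-support" | "weak-support" | "contradicts";
// Hypothetical lifecycle states: the real enum lives in the shared contracts.
type EvidenceStatus = "draft" | "accepted" | "rejected";

interface EvidenceItem { id: string; quote: string; status: EvidenceStatus }
interface EvidenceLink { evidenceItemId: string; targetFieldId: string; relation: EvidenceRelation }

const sampleEvidence: EvidenceItem = {
  id: "ev-1",
  quote: "The trial enrolled 120 patients at one site.",
  status: "accepted", // describes the item itself, not how it bears on a target
};

// The same evidence item carries a different relation per target.
const sampleLinks: EvidenceLink[] = [
  { evidenceItemId: "ev-1", targetFieldId: "sample-size", relation: "strong-support" },
  { evidenceItemId: "ev-1", targetFieldId: "multi-centre", relation: "contradicts" },
];

// Collect the relation an evidence item has to each target it is linked to.
function relationsFor(evidenceItemId: string, links: EvidenceLink[]): Record<string, EvidenceRelation> {
  const byTarget: Record<string, EvidenceRelation> = {};
  for (const link of links) {
    if (link.evidenceItemId === evidenceItemId) byTarget[link.targetFieldId] = link.relation;
  }
  return byTarget;
}

const relationsOfSample = relationsFor(sampleEvidence.id, sampleLinks);
```
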
---
## 3. Implementation state — MVP reference scenario is done
Workplans **CE-WP-0001 through CE-WP-0005** are all `status: done`:
| Workplan | Delivers |
|----------|----------|
| CE-WP-0001 | Scaffold, folder partitions, ESLint boundary rules, normalization, fixtures |
| CE-WP-0002 | PDF review slice — engine types, anchor, source ingest, viewer, sidebar |
| CE-WP-0003 | Form binding + visual guide (rect registry, SVG overlay) |
| CE-WP-0004 | Citation card export (Markdown + HTML) |
| CE-WP-0005 | Named sessions, arbitrary PDF upload, ZIP export/import |
The PRD §20 reference scenario is covered end-to-end for **PDF**:
1. Create collection/session
2. Upload PDF
3. Select passage → annotation → evidence item
4. Open side-by-side form
5. Link evidence to field
6. Focus field → coordinated highlight + visual guide
7. Export citation card
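
Step 6 relies on knowing where the highlighted passage and the focused field sit on screen. A rough sketch of a rect registry that derives one guide line between the two is given below. It assumes the viewer sits to the left of the form and that both rects share one coordinate space; `RectRegistry`, `guideLine`, and the ids are hypothetical names, not the umbrella repo's actual API.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface Line { x1: number; y1: number; x2: number; y2: number }

// Hypothetical registry: components register their on-screen rect under an id.
class RectRegistry {
  private rects = new Map<string, Rect>();

  register(id: string, rect: Rect): void {
    this.rects.set(id, rect);
  }

  unregister(id: string): void {
    this.rects.delete(id);
  }

  // Line from the right-edge midpoint of the evidence rect to the
  // left-edge midpoint of the field rect; null if either is not registered.
  guideLine(evidenceId: string, fieldId: string): Line | null {
    const from = this.rects.get(evidenceId);
    const to = this.rects.get(fieldId);
    if (!from || !to) return null;
    return {
      x1: from.x + from.width,
      y1: from.y + from.height / 2,
      x2: to.x,
      y2: to.y + to.height / 2,
    };
  }
}

const guideRegistry = new RectRegistry();
guideRegistry.register("highlight:ev-1", { x: 100, y: 200, width: 150, height: 20 });
guideRegistry.register("field:sample-size", { x: 600, y: 300, width: 200, height: 40 });
const sampleGuide = guideRegistry.guideLine("highlight:ev-1", "field:sample-size");
```

The resulting coordinates could feed an SVG `<line>` element in an overlay; unregistering either rect makes the guide disappear.
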
Test coverage includes 7 integration tests (PRD scenario, forms flows, overlay, citation
export, session ZIP round-trip, anchor/source round-trip) plus extensive unit tests per
subsystem folder. Recent git activity (June 2026) shows active polish on PDF text-layer
positioning and session UX.
Boundary enforcement is real: `eslint-plugin-boundaries` guards the
`src/{shared,engine,anchor,source,binder,work,app}/` dependency graph described in
`DependencyMap.md`.
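
The kind of rule such a boundary check enforces can be sketched as a lookup from a file's folder to the folders it may import. This is not the plugin's configuration and not the real `DependencyMap.md`: the edge table is an assumption, apart from two facts stated in this assessment (`work` must not import `binder`, and `shared`/`engine` sit at the bottom). The file paths are hypothetical.

```typescript
type Layer = "shared" | "engine" | "anchor" | "source" | "binder" | "work" | "app";

// Hypothetical edge table, for illustration only.
const allowedImports: Record<Layer, Layer[]> = {
  shared: [],
  engine: ["shared"],
  anchor: ["shared", "engine"],
  source: ["shared", "engine", "anchor"],
  binder: ["shared", "engine", "anchor"],
  work: ["shared", "engine", "anchor", "source"], // deliberately no "binder"
  app: ["shared", "engine", "anchor", "source", "binder", "work"],
};

// Map a path such as "src/work/Sidebar.tsx" to its layer, or null if outside src/<layer>/.
function layerOf(path: string): Layer | null {
  const match = /^src\/([^/]+)\//.exec(path);
  if (!match) return null;
  const folder = match[1];
  return Object.prototype.hasOwnProperty.call(allowedImports, folder) ? (folder as Layer) : null;
}

// An import is allowed within a layer, or along an edge listed in the table.
function isAllowedImport(fromFile: string, toFile: string): boolean {
  const from = layerOf(fromFile);
  const to = layerOf(toFile);
  if (!from || !to) return false;
  return from === to || allowedImports[from].indexOf(to) !== -1;
}

const workToBinder = isAllowedImport("src/work/Sidebar.tsx", "src/binder/guide.ts");
const anchorToEngine = isAllowedImport("src/anchor/resolve.ts", "src/engine/types.ts");
```
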
---
## 4. Gap analysis — vision vs. current code
Against the full product vision in the PRD and subsystem INTENTs, significant pieces
remain **designed but not built**:
| Capability | PRD / INTENT status | Code status |
|------------|---------------------|-------------|
| **PDF review & evidence capture** | Primary MVP | **Implemented** |
| **Evidence-backed forms + visual guide** | Primary MVP | **Implemented** |
| **Citation card export** | Primary MVP | **Implemented** |
| **Session portability (ZIP)** | Demo enhancement | **Implemented** (CE-WP-0005) |
| **Markdown / HTML documents** | Primary goal (FR) | **Not started**: `src/source/` is PDF-only |
| **Citation recovery mode** | Third product mode | **Not started**: `CitationRecoveryAttempt` in contracts/ids only |
| **Document review status workflow** | `citation-work` INTENT | **Not wired**: `reviewStatus` enum in contracts, no UI usage |
| **External source discovery** | Future / privacy-sensitive | **Deferred** (correct per PRD non-goals) |
| **Sister repo extraction** | Post-MVP | **Not started** — all code still in umbrella |
| **Monorepo vs. polyrepo decision** | ADR-0002 | **Still blank** — blocks clean extraction |
**Housekeeping debt:** `workplans/README.md` is stale (still lists CE-WP-0001..0004 as
`todo`); the individual workplan files correctly show `done`.
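
Drift of this kind could be caught mechanically. The sketch below compares the status in each workplan file's front matter against the status listed in the README. It assumes the README lists workplans in a pipe table whose first cell is the id and whose last cell is the status; that layout, the sample contents, and the function names are all hypothetical.

```typescript
// Read `status: <value>` from a workplan file's front matter.
function parseWorkplanStatus(markdown: string): string | null {
  const match = /^status:\s*(\S+)\s*$/m.exec(markdown);
  return match ? match[1] : null;
}

// Read "| CE-WP-0001 | ... | todo |" rows: first cell is the id, last cell the status.
function parseReadmeStatuses(readme: string): Map<string, string> {
  const statuses = new Map<string, string>();
  for (const line of readme.split("\n")) {
    const cells = line.split("|").map((cell) => cell.trim()).filter((cell) => cell.length > 0);
    if (cells.length >= 2 && /^CE-WP-\d{4}$/.test(cells[0])) {
      statuses.set(cells[0], cells[cells.length - 1]);
    }
  }
  return statuses;
}

// Ids whose README status disagrees with the workplan file itself.
function findStaleEntries(readme: string, workplans: Record<string, string>): string[] {
  const listed = parseReadmeStatuses(readme);
  return Object.keys(workplans)
    .filter((id) => listed.get(id) !== parseWorkplanStatus(workplans[id]))
    .sort();
}

// Hypothetical sample contents.
const sampleReadme = [
  "| Workplan | Title | Status |",
  "|----------|-------|--------|",
  "| CE-WP-0001 | Scaffold | todo |",
  "| CE-WP-0005 | Sessions | done |",
].join("\n");

const sampleWorkplans: Record<string, string> = {
  "CE-WP-0001": "---\nid: CE-WP-0001\nstatus: done\n---\n# Scaffold",
  "CE-WP-0005": "---\nid: CE-WP-0005\nstatus: done\n---\n# Sessions",
};

const staleIds = findStaleEntries(sampleReadme, sampleWorkplans);
```
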
---
## 5. Per-repo assessment
### 5.1 citation-evidence — healthy, past MVP baseline
**Strengths:** Working reference app, enforced architecture, rich documentation, completed
Ralph workplans, contracts that sister repos can defer to.
**Risks:** Umbrella carries all complexity; extraction strategy undecided; the PDF-only
implementation leaves the format-neutral claims untested until HTML/Markdown adapters land;
citation recovery is a large remaining vertical with no code yet.
**Verdict:** The **center of gravity** of the family. This is where all meaningful
engineering lives today.
### 5.2 Sister repos (engine, anchor, source, work, binder) — scaffolded placeholders
**Strengths:** Excellent `INTENT.md` + `README.md` that correctly point upstream; LICENSE
and git remotes in place; boundaries pre-negotiated via umbrella wiki.
**Gaps:** No `package.json`, no source, no CI, no published packages. They are **boundary
documents**, not runnable libraries.
**Verdict:** Ready as **extraction targets**, not as independent products. Extraction should
follow ADR-0002 resolution and a deliberate `git mv` + package cut per README.
---
## 6. Strategic read
The family is in a **deliberate transitional architecture**:
```text
Phase A (complete): Design six-repo boundaries + build MVP in umbrella
Phase B (current): Harden PDF path, demo UX, contracts via real use
Phase C (next): Format expansion (MD/HTML) and/or citation recovery
Phase D (later): Extract subsystems to sister repos
```
Compared to the original phased plan in `history/2026-05-24-initial-assessment.md`, the
project has **skipped ahead**: Phase 1 (PDF vertical slice) and Phase 2 (form binding)
are done, plus demo/session portability. Phase 3 (format expansion) and Phase 4 (local
citation recovery) have **not** started.
The INTENT documents describe a mature, agent-friendly architecture. The code validates the
**hardest integration path** (PDF selection → durable selectors → form binding → visual
guide → export). What remains is mostly **breadth** (more formats, recovery mode) and
**structural** (extraction, packaging).
---
## 7. Recommended priorities
1. **Update `workplans/README.md`** to reflect CE-WP-0001..0005 as done; add CE-WP-0006
for the next vertical (Markdown adapter or local citation recovery — pick one).
2. **Resolve ADR-0002** before any extraction — monorepo workspaces vs. published
packages affects everything downstream.
3. **Either** expand formats (validates "format-neutral" claim) **or** build citation
recovery (validates third product mode) — doing both in parallel would split focus.
4. **Extract `citation-engine` first** when ready — it is the leaf node every other repo
depends on; `shared/` + `engine/` are the most stable slices.
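
The "leaf first" reasoning in item 4 amounts to a topological ordering of the dependency graph: a repo can be extracted once everything it depends on has been. A minimal sketch follows. The edges are assumed for illustration (only "every other repo depends on `citation-engine`" and "`citation-work` does not depend on `evidence-binder`" come from this assessment), and `extractionOrder` is a hypothetical helper.

```typescript
// Hypothetical dependency edges between the sister repos.
const dependsOn: Record<string, string[]> = {
  "citation-engine": [],
  "evidence-anchor": ["citation-engine"],
  "evidence-source": ["citation-engine", "evidence-anchor"],
  "evidence-binder": ["citation-engine", "evidence-anchor"],
  "citation-work": ["citation-engine", "evidence-anchor", "evidence-source"],
};

// Depth-first topological sort: dependencies are emitted before their dependents.
function extractionOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  const visit = (repo: string): void => {
    if (state[repo] === "done") return;
    if (state[repo] === "visiting") throw new Error(`dependency cycle at ${repo}`);
    state[repo] = "visiting";
    for (const dep of graph[repo] ?? []) visit(dep);
    state[repo] = "done";
    order.push(repo);
  };

  // Sort keys so the result is deterministic for a given graph.
  Object.keys(graph).sort().forEach(visit);
  return order;
}

const plannedOrder = extractionOrder(dependsOn);
```

Under these assumed edges `citation-engine` comes out first, and a cycle in the map would surface as an error before any extraction work starts.
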
---
## 8. Bottom line
The citation-evidence family is **well-architected on paper and materially implemented in one
place**. The six `INTENT.md` files form a consistent, boundary-aware design; the umbrella
repo has delivered a working PDF-centric MVP with tests and enforced dependency rules. The
five sister repos are **correctly empty** during umbrella-first MVP — they are extraction
targets, not lagging implementations.
**Overall state:** design maturity high, implementation maturity solid for PDF MVP,
extraction maturity low, product breadth roughly half of the full PRD vision (four of the
ten capabilities listed in §4 are implemented).
The main open question is what comes next — format expansion, citation recovery, or
subsystem extraction.