citation-evidence/workplans/CE-WP-0005-demo-sessions-zip-archive.md
tegwick 779ae0d317 Implement CE-WP-0005 T01-T08: demo app — sessions, uploads, ZIP archive
Turn the MVP into a self-contained demo. Users now:
  1. Land on an empty-state and create a named session.
  2. Drag-drop or pick arbitrary PDFs into that session.
  3. Annotate, build evidence, link to form fields — all session-scoped.
  4. Export the whole session as a single .zip archive (manifest +
     per-document PDFs).
  5. Import a .zip back — into a new session, or merged into an
     existing one (documents deduped by SHA-256 fingerprint;
     annotations/evidence/links added additively).

Architecture:
- New shared types: SessionId, Session, SessionArchiveManifest +
  parseSessionArchiveManifest with schema-version validation.
- SessionService (engine/services/sessions.ts) handles lifecycle
  (create/rename/delete/setActive) + emits 4 new events through its
  own bus; SharedContracts.md §4 lists the additions.
- SessionProvider (work/SessionContext.tsx) owns the cross-session
  state: service, per-session PdfByteStore registry, per-session
  version counter that drives EngineProvider remounts after imports.
- EngineProvider becomes session-aware (sessionId prop drives per-
  session localStorage keys). Bumping engineRevision after
  restoreFromStorage forces consumers to re-render so restored repos
  show up immediately.
- PdfByteStore (source/pdf/byte-store.ts) holds Uint8Array bytes per
  document and mints blob URLs; ingestPdfFromFile is the upload
  entry-point that wraps the existing ingestPdf pipeline.
- ADR-0008 locks the ZIP layout (manifest.json + documents/<id>.pdf),
  the manifest schema (schemaVersion 1), and the merge-on-collision
  policy. JSZip is the only new dependency.
- App.tsx restructured: SessionProvider at the root, EngineProvider
  keyed by ${sessionId}:${version}, hash routing #/s/<id>[/forms/demo],
  SessionMenu top-bar, CreateFirstSession empty state.
- New DocumentRemoved event for per-document delete cleanup in
  CollectionList; engine.documents.remove() is the new service method.

Tests:
- Unit: 16 SessionService lifecycle + persistence tests;
  per-session snapshot round-trip; PdfByteStore + ingestPdfFromFile;
  SessionArchive parser; exportSessionZip + importSessionZip with
  create + merge + corrupt-archive paths.
- DOM: UploadDropzone, session-scoped CollectionList delete,
  SessionMenu create/switch/rename, routing parser.
- E2E: tests/integration/session-export-reimport.dom.test.tsx walks
  the full create → annotate → export → reimport flow and asserts
  the additive merge (deduped doc + doubled evidence rows).
- Legacy E2Es updated to use a seed-session helper instead of the
  removed fixture-button flow.

Known limitation (documented in ADR-0008): re-importing your own
freshly-exported ZIP creates duplicate annotations. Forward pointer
left for an importBundleId follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 14:57:28 +02:00


---
id: CE-WP-0005
type: workplan
title: "Demo app — Named sessions, document uploads, ZIP export/import"
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
topic_slug: citation_evidence_mvp
topic_id: 96fa8e80-9f74-40f2-84cd-644e9747b9ec
state_hub_workstream_id: ec88caf3-85ad-413c-8ddd-ef7278f6ce57
status: done
owner: Bernd
created: 2026-05-25
updated: 2026-05-26
depends_on_workplan: CE-WP-0004
spec_refs:
- wiki/ProductRequirementsDocument.md
- wiki/ArchitectureOverview.md
- wiki/SharedContracts.md
---
# CE-WP-0005 — Demo App: Sessions + Uploads + ZIP Archive
Turn the MVP into a self-contained demo that a stranger can pick up and use.
After this workplan, a user can:
1. Land on the app and create a **named session** ("Lease 2024", "Klage
Müller", …).
2. Drag-drop or pick **arbitrary PDFs** into that session (no fixtures
required).
3. Annotate, build evidence, link to form fields, and export citation
cards — same flows as CE-WP-0002..0004, now scoped to the active
session.
4. **Export the whole session** as a single `.zip` archive: every PDF
plus a manifest with the engine snapshot.
5. **Import a `.zip` back** — into a new session, or merged into an
existing one (documents deduped by fingerprint; annotations and
evidence added additively).
The demo replaces the current single-bucket app: Review and Forms modes
both become **session-scoped**. The previous fixture-driven workflow
survives as an optional "Sample sessions" quick-start.
## Scoping decisions (locked before drafting)
- **Demo placement:** the demo *replaces* the main app. The MVP Review
and Forms layouts continue to work, but now under a session.
- **PDF byte storage:** in-memory only. PDFs survive within a tab
session; reloading the page loses uploaded bytes unless the ZIP was
exported. Re-importing the ZIP restores them. No IndexedDB tier in
this workplan.
- **Import conflict policy:** if a ZIP carries a session name that
already exists, **merge**. Documents are deduped by SHA-256
fingerprint (incoming references rebound to the existing
`documentId`). Annotations, evidence, and links are added as fresh
ids — additive, never overwriting. Locked in
[`ADR-0008`](../docs/decisions/ADR-0008-session-archive-format.md)
(created in T05).
## Dependency Order
```
T01 (Session model + service + per-session snapshots)
├─ T02 (PdfByteStore + uploaded-document ingest path)
│  └─ T03 (Upload UI + session-scoped CollectionList)
└─ T04 (Session management UI — top-bar menu + hash routing)
T05 (ADR-0008 + SessionArchive schema)
├─ T06 (Export session as ZIP) ───┐
└─ T07 (Import ZIP with merge) ───┤
                                  T08 (E2E test of full flow)
```
T03 and T04 can land in either order once T01 and T02 are done (T04
itself depends only on T01). T06 and T07 can be parallelised once T05
is done.

---
## T01 — Session model + service + per-session engine snapshots
```task
id: CE-WP-0005-T01
state_hub_task_id: 5b479bf5-b54a-4fc8-b500-ec49f5d68f6a
priority: high
status: done
```
Under `src/shared/`:
- `src/shared/session.ts`: `SessionId` branded type added to
`ids.ts`; `Session` interface with `{ id, name, createdAt,
updatedAt, lastOpenedAt? }`. No `documentIds` field — membership is
implicit (a session "owns" the documents in its engine snapshot).
Under `src/engine/`:
- Extend `events/types.ts` with the four new events:
`SessionCreated`, `SessionRenamed`, `SessionDeleted`,
`SessionActivated`. Add to the `EngineEvent` union.
- `src/engine/services/sessions.ts`: `SessionService` with
`create(name)`, `rename(id, name)`, `delete(id)`, `list()`,
`get(id)`, `setActive(id | null)`, `getActive()`. Backed by a
repo + the event bus.
- `src/engine/repos/in-memory-sessions.ts` — Map-backed
`SessionRepository`.
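
A minimal sketch of how the service surface above could fit together. The event payload shapes, the `ses_` id format, the `createSessionService` factory name, and the error messages are assumptions made for illustration; the workplan only fixes the method names, the four event names, and the rename conflict.

```ts
// Hypothetical shapes: the workplan names the events, not their payloads.
type SessionId = string & { readonly __brand: "SessionId" };

interface Session {
  id: SessionId;
  name: string;
  createdAt: string;
  updatedAt: string;
  lastOpenedAt?: string;
}

type SessionEvent =
  | { type: "SessionCreated"; session: Session }
  | { type: "SessionRenamed"; id: SessionId; name: string }
  | { type: "SessionDeleted"; id: SessionId }
  | { type: "SessionActivated"; id: SessionId | null };

function createSessionService(emit: (event: SessionEvent) => void) {
  const repo = new Map<SessionId, Session>(); // stands in for SessionRepository
  let activeId: SessionId | null = null;
  let counter = 0; // assumption: real ids are minted elsewhere

  return {
    create(name: string): Session {
      const now = new Date().toISOString();
      counter += 1;
      const id = `ses_${counter}` as SessionId;
      const session: Session = { id, name, createdAt: now, updatedAt: now };
      repo.set(id, session);
      emit({ type: "SessionCreated", session });
      return session;
    },
    rename(id: SessionId, name: string): Session {
      const current = repo.get(id);
      if (!current) throw new Error(`Unknown session: ${id}`);
      // Renaming onto another session's name is a conflict.
      const taken = Array.from(repo.values()).some(
        (s) => s.name === name && s.id !== id,
      );
      if (taken) throw new Error(`Session name already in use: ${name}`);
      const next: Session = { ...current, name, updatedAt: new Date().toISOString() };
      repo.set(id, next);
      emit({ type: "SessionRenamed", id, name });
      return next;
    },
    delete(id: SessionId): void {
      if (!repo.delete(id)) return; // unknown id: nothing to do
      if (activeId === id) activeId = null;
      emit({ type: "SessionDeleted", id });
    },
    list: (): Session[] => Array.from(repo.values()),
    get: (id: SessionId): Session | undefined => repo.get(id),
    setActive(id: SessionId | null): void {
      if (id !== null && !repo.has(id)) throw new Error(`Unknown session: ${id}`);
      activeId = id;
      emit({ type: "SessionActivated", id });
    },
    getActive: (): SessionId | null => activeId,
  };
}
```

Passing `emit` in as a plain callback keeps the sketch independent of the real event bus; the actual service is described as being backed by a repo plus the bus.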
Per-session **engine snapshot persistence**:
- `STORAGE_KEY` becomes a function of `sessionId`:
`citation-evidence:session:<sessionId>:engine-snapshot:v1`.
- A separate index key
`citation-evidence:sessions:v1` stores the list of all known
sessions.
- The active session id is held in
`citation-evidence:active-session-id:v1`.
- When the active session changes, the old engine snapshot's
persister stops; a new persister is attached against the new
session's key. The engine itself is **recreated** (the cleanest
way to reset every in-memory repo); `EngineProvider` is keyed by
`sessionId` so React unmounts/remounts on switch.
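
The key scheme above can be sketched as a small set of helpers. The three key strings come from this workplan; the `STORAGE_KEYS` object, the `KeyValueStore` interface (a subset of the browser `Storage` API), and the `readActiveSessionId` / `forgetSession` helpers are hypothetical names used only for illustration.

```ts
type SessionId = string & { readonly __brand: "SessionId" };

const STORAGE_KEYS = {
  sessionsIndex: "citation-evidence:sessions:v1",
  activeSessionId: "citation-evidence:active-session-id:v1",
  engineSnapshot: (sessionId: SessionId): string =>
    `citation-evidence:session:${sessionId}:engine-snapshot:v1`,
} as const;

// Subset of the Storage interface, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

function readActiveSessionId(store: KeyValueStore): SessionId | null {
  const raw = store.getItem(STORAGE_KEYS.activeSessionId);
  return raw === null || raw === "" ? null : (raw as SessionId);
}

// Assumption: deleting a session should also drop its snapshot and, if it
// was active, the active-session pointer.
function forgetSession(store: KeyValueStore, sessionId: SessionId): void {
  store.removeItem(STORAGE_KEYS.engineSnapshot(sessionId));
  if (readActiveSessionId(store) === sessionId) {
    store.removeItem(STORAGE_KEYS.activeSessionId);
  }
}
```

In the browser, `window.localStorage` satisfies `KeyValueStore` directly.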
Tests:
- Unit: `SessionService` lifecycle (create/rename/delete/setActive),
event emission, conflict on rename to a duplicate name.
- `restoreFromStorage` round-trip with the new per-session key
scheme — drop in a fixture set of two sessions, restore each,
assert engines hold the right documents.
---
## T02 — PdfByteStore + uploaded-document ingest path
```task
id: CE-WP-0005-T02
state_hub_task_id: 25626309-4cad-44b5-ac44-7e0dc7ea48fa
priority: high
status: done
depends_on: [T01]
```
Under `src/source/pdf/`:
- `byte-store.ts`: `createPdfByteStore()` returns a
`Map<DocumentId, Uint8Array>` wrapper with `put`, `get`, `delete`,
`list`, `clear`. Scoped per session (one instance per active
session; replaced on switch).
- `upload.ts` — `ingestPdfFromFile(file: File | Blob, store):
Promise<{ document, representation }>`:
1. Read bytes via `file.arrayBuffer()`.
2. Call existing `ingestPdf(bytes, { filename: file.name })`.
3. `store.put(document.id, bytes)`.
4. Mint a `blob:` URL from a fresh `Blob([bytes], { type: "application/pdf" })`
and stash it on `document.uri` so the viewer adapter can mount it.
5. Return the engine inputs ready for
`engine.documents.register(...)`.
- Blob URL **revocation**: when a document is deleted from a session,
`URL.revokeObjectURL(document.uri)` runs before the engine drops the
record. A small helper inside the byte store handles this so the
app layer doesn't have to remember.
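
One way the store and its revoke-once behaviour could look, as a sketch under assumptions: here the blob URL is passed to `put` and the revoke function is injected, so the example runs without a DOM. The workplan's `createPdfByteStore()` takes no arguments, so both of those choices are illustrative only.

```ts
type DocumentId = string & { readonly __brand: "DocumentId" };

interface ByteEntry {
  bytes: Uint8Array;
  blobUrl: string | undefined;
}

function createPdfByteStore(revokeUrl: (url: string) => void) {
  const entries = new Map<DocumentId, ByteEntry>();

  const release = (entry: ByteEntry): void => {
    if (entry.blobUrl !== undefined) revokeUrl(entry.blobUrl);
  };

  return {
    put(id: DocumentId, bytes: Uint8Array, blobUrl?: string): void {
      const previous = entries.get(id);
      // Replacing an entry must not leak the URL it held before.
      if (previous && previous.blobUrl !== blobUrl) release(previous);
      entries.set(id, { bytes, blobUrl });
    },
    get(id: DocumentId): Uint8Array | undefined {
      const entry = entries.get(id);
      return entry ? entry.bytes : undefined;
    },
    delete(id: DocumentId): boolean {
      const entry = entries.get(id);
      if (!entry) return false; // a second delete is a no-op: no double revoke
      release(entry);
      return entries.delete(id);
    },
    list: (): DocumentId[] => Array.from(entries.keys()),
    clear(): void {
      entries.forEach((entry) => release(entry));
      entries.clear();
    },
  };
}
```

In the app, the injected function would presumably be `(url) => URL.revokeObjectURL(url)`; a test can pass a recorder instead and count the calls.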
The fixture-loading path (current `App.tsx` fetch + `ingestPdf`)
remains as-is for the optional "Sample sessions" quick-start; the
upload path is a parallel branch that ends at the same engine call.
Tests:
- Unit: round-trip a known-bytes PDF through `ingestPdfFromFile`,
assert `store.get(documentId)` returns the same bytes.
- Unit: delete revokes the blob URL exactly once even if called
twice.
---
## T03 — Upload UI + session-scoped Collection list
```task
id: CE-WP-0005-T03
state_hub_task_id: 55275918-e610-4513-ba2d-c05018ecd42d
priority: high
status: done
depends_on: [T02]
```
Under `src/app/sessions/`:
- `UploadDropzone.tsx` — drag-drop region and a file picker that
accepts `application/pdf` (multi-select). On drop:
1. For each `File`: call `ingestPdfFromFile(file, byteStore)` then
`engine.documents.register(...)`. Show a per-file progress chip;
surface a toast on failure.
2. Make the most-recently-uploaded document the active document.
Under `src/work/CollectionList.tsx`:
- Rework to list the **active session's** documents (read from
`engine.documents.list()`), not the fixture manifest.
- Header bar with the session name + an inline "Upload PDF" button
that opens the dropzone.
- Per-item: title (filename), document id, "Open" + "Delete" actions.
Delete confirms via a small inline state, then calls into the byte
store + engine repo.
Fixtures become an optional **Sample sessions** entry inside the
session menu (T04). The current `CollectionList`'s manifest-driven
fixture loader moves into
`src/app/sessions/SampleSessions.tsx`, kept for tests and
demonstration.
Tests:
- DOM: dropping a synthetic File triggers ingest and the new document
appears in the list.
- DOM: per-item delete removes the row and revokes the blob URL.
---
## T04 — Session management UI (top-bar menu, hash routing)
```task
id: CE-WP-0005-T04
state_hub_task_id: e008524c-9cef-448f-b95b-fa524c725bc3
priority: medium
status: done
depends_on: [T01]
```
Under `src/app/sessions/`:
- `SessionMenu.tsx` — top-bar dropdown showing the active session
name. Menu items:
- **Switch to…** (list of all sessions sorted by `lastOpenedAt`)
- **New session…** (opens an inline name-input modal)
- **Rename…**, **Delete…** (with confirmation) for the active
session
- **Export ZIP** (T06)
- **Import ZIP** (T07)
- **Sample sessions ▸** (T03, optional submenu)
- Empty state: if no sessions exist, the app body is replaced by a
centred "Create your first session" call-to-action with an inline
name input.
Hash routing:
- Current routes (`#/forms/demo`, default Review) become
session-scoped: `#/s/<sessionId>`,
`#/s/<sessionId>/forms/demo`. The active-session pointer is the
router's responsibility (single source of truth = the hash); the
`SessionService.setActive(...)` call is a side effect of hash
change.
- A bare `#/` (no session) renders the empty state.
- Deep links into a deleted/unknown session redirect to the empty
state with a toast.
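
The route shapes above can be sketched as a small pure parser. The `Route` union and the `parseHash` / `formatHash` names are hypothetical, and treating the segment after `forms/` as a general form id (rather than the literal `demo`) is an assumption. Checking whether the session actually exists is left to the caller.

```ts
type SessionId = string & { readonly __brand: "SessionId" };

type Route =
  | { kind: "empty" }
  | { kind: "review"; sessionId: SessionId }
  | { kind: "forms"; sessionId: SessionId; formId: string };

// Matches "#/s/<sessionId>" with an optional "/forms/<formId>" suffix.
const SESSION_ROUTE = /^#\/s\/([^/]+)(?:\/forms\/([^/]+))?\/?$/;

function parseHash(hash: string): Route {
  const match = SESSION_ROUTE.exec(hash);
  if (!match) return { kind: "empty" }; // bare "#/" or anything unrecognised
  const rawSessionId = match[1];
  if (!rawSessionId) return { kind: "empty" };
  const sessionId = rawSessionId as SessionId;
  const formId = match[2];
  return formId
    ? { kind: "forms", sessionId, formId }
    : { kind: "review", sessionId };
}

function formatHash(route: Route): string {
  switch (route.kind) {
    case "empty":
      return "#/";
    case "review":
      return `#/s/${route.sessionId}`;
    case "forms":
      return `#/s/${route.sessionId}/forms/${route.formId}`;
  }
}
```

Keeping the parser free of side effects fits the single-source-of-truth rule: a `hashchange` listener can call `parseHash` and only then invoke `SessionService.setActive(...)`.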
Tests:
- DOM: switching sessions in the menu updates the hash and unmounts +
remounts the engine (verified by checking that the previous
session's documents disappear from the CollectionList).
- DOM: deep-link to a known session loads that session's documents.
---
## T05 — ADR-0008 + SessionArchive manifest schema
```task
id: CE-WP-0005-T05
state_hub_task_id: 50d525b1-ba7d-454e-91b4-34d96bc5ab7b
priority: high
status: done
```
Add `docs/decisions/ADR-0008-session-archive-format.md`. Locks:
- **ZIP layout**:
```
manifest.json
documents/
  <documentId>.pdf
```
`<documentId>` is the engine's branded id (`doc_…`), used as the
filename. Future variants (per-representation files,
per-attachment) are intentionally deferred.
- **manifest.json shape** (top-level fields):
- `schemaVersion: 1`
- `exportedAt: string` (ISO-8601)
- `session: { id, name, createdAt, updatedAt }`
- `engine: EngineSnapshot` (the same shape produced by
`captureSnapshot()`, reused verbatim so export and import share a
single snapshot format).
- `documentBindings: Array<{ documentId, filename, fingerprint }>` —
pairs every engine document with its file inside `documents/`.
- **Merge-on-name-collision policy** (T07 spec): documents are
deduped by fingerprint; annotations/evidence/links are imported
with fresh ids and rebound to the deduped `documentId`. Re-importing
*your own* freshly-exported ZIP into the same session therefore
duplicates annotations — documented as a known limitation. A
later workplan can add idempotent imports via an `importBundleId`
field.
Under `src/shared/`:
- `session-archive.ts` — TypeScript interfaces for
`SessionArchiveManifest` matching the ADR. Pure types + a
`parseSessionArchiveManifest(json: unknown):
SessionArchiveManifest` that throws on schema mismatch (used by
the importer in T07).
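
A minimal sketch of such a parser, assuming plain `string` ids and treating `EngineSnapshot` as opaque (`unknown`), since its shape is defined elsewhere. The helper names `isRecord` and `requireString` and the error messages are illustrative, not taken from the ADR.

```ts
interface SessionArchiveManifest {
  schemaVersion: 1;
  exportedAt: string;
  session: { id: string; name: string; createdAt: string; updatedAt: string };
  engine: unknown; // EngineSnapshot in the real code; opaque in this sketch
  documentBindings: Array<{ documentId: string; filename: string; fingerprint: string }>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function requireString(obj: Record<string, unknown>, key: string, where: string): string {
  const value = obj[key];
  if (typeof value !== "string") throw new Error(`${where}.${key} must be a string`);
  return value;
}

function parseSessionArchiveManifest(json: unknown): SessionArchiveManifest {
  if (!isRecord(json)) throw new Error("manifest must be a JSON object");
  if (json["schemaVersion"] !== 1) {
    throw new Error(`unsupported schemaVersion: ${String(json["schemaVersion"])}`);
  }
  const session = json["session"];
  if (!isRecord(session)) throw new Error("manifest.session must be an object");
  if (!isRecord(json["engine"])) throw new Error("manifest.engine must be an object");
  const bindings = json["documentBindings"];
  if (!Array.isArray(bindings)) throw new Error("manifest.documentBindings must be an array");

  return {
    schemaVersion: 1,
    exportedAt: requireString(json, "exportedAt", "manifest"),
    session: {
      id: requireString(session, "id", "manifest.session"),
      name: requireString(session, "name", "manifest.session"),
      createdAt: requireString(session, "createdAt", "manifest.session"),
      updatedAt: requireString(session, "updatedAt", "manifest.session"),
    },
    engine: json["engine"],
    documentBindings: bindings.map((binding: unknown, i: number) => {
      const where = `manifest.documentBindings[${i}]`;
      if (!isRecord(binding)) throw new Error(`${where} must be an object`);
      return {
        documentId: requireString(binding, "documentId", where),
        filename: requireString(binding, "filename", where),
        fingerprint: requireString(binding, "fingerprint", where),
      };
    }),
  };
}
```

Each error names the offending path, which is one way to meet the "useful message" requirement in the tests below.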
Tests:
- Unit conformance: round-trip a synthetic manifest object →
`JSON.stringify` → parse → deep-equal.
- Unit failure: a manifest missing required fields, or with the
wrong `schemaVersion`, throws with a useful message.
---
## T06 — Export session as ZIP archive
```task
id: CE-WP-0005-T06
state_hub_task_id: 07546a24-90d8-4b5d-9833-2648d2936ea2
priority: high
status: done
depends_on: [T05]
```
Dependency add: `jszip` (small, MIT, battle-tested). Use the ESM
build to keep the bundle clean.
Under `src/app/sessions/`:
- `exportSessionZip.ts`:
```ts
export async function exportSessionZip(
  sessionId: SessionId,
  engine: Engine,
  byteStore: PdfByteStore,
  session: Session,
): Promise<Blob>;
```
Steps:
1. Build the manifest from `captureSnapshot(engine)` + session
metadata + per-document `{ filename, fingerprint }` derived
from `engine.documents`.
2. For each `documentBindings[i]`, push `bytes` into
`documents/<documentId>.pdf`.
3. Push `manifest.json` (stringified, pretty-printed).
4. `zip.generateAsync({ type: "blob" })`.
- `triggerSessionDownload(blob, filename)` — creates an `<a download>`
element, clicks it, revokes the URL. Filename:
`<slugified session name>-<ISO date>.zip`.
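
The download filename could be derived roughly as follows. This is a sketch: the `slugify` rules (strip diacritics, lowercase, collapse everything else to hyphens), the `"session"` fallback, and reading "ISO date" as the UTC `YYYY-MM-DD` part are all assumptions.

```ts
function slugify(name: string): string {
  const slug = name
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any other run of characters
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return slug === "" ? "session" : slug; // hypothetical fallback for empty slugs
}

function sessionZipFilename(sessionName: string, now: Date = new Date()): string {
  const isoDate = now.toISOString().slice(0, 10); // UTC date, YYYY-MM-DD
  return `${slugify(sessionName)}-${isoDate}.zip`;
}
```

With these rules, a session named "Klage Müller" exported on 2026-05-26 (UTC) yields `klage-muller-2026-05-26.zip`.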
UI:
- **Export ZIP** menu item in `SessionMenu` calls the above. Show a
brief spinner state on the menu item while the zip generates.
- Surface a success/error toast (re-use the toast pattern from
CE-WP-0004 sidebar, lifted into `src/app/sessions/Toast.tsx`).
Tests:
- DOM: synthesise a one-document session, click Export, capture the
generated Blob, unzip in the test (via JSZip), assert the manifest
matches the engine snapshot and `documents/<id>.pdf` contains the
original bytes.
---
## T07 — Import ZIP with merge-on-name-collision + fingerprint dedup
```task
id: CE-WP-0005-T07
state_hub_task_id: 2fedab8d-6af7-458a-90d3-383241978f4e
priority: high
status: done
depends_on: [T05]
```
Under `src/app/sessions/`:
- `importSessionZip.ts`:
```ts
export interface ImportSessionResult {
  readonly sessionId: SessionId;
  readonly outcome: "created" | "merged-into";
  readonly stats: {
    readonly documentsAdded: number;
    readonly documentsDeduped: number;
    readonly annotationsAdded: number;
    readonly evidenceAdded: number;
    readonly linksAdded: number;
  };
}
export async function importSessionZip(
  file: File | Blob,
  services: SessionImportServices,
): Promise<ImportSessionResult>;
```
Steps:
1. Read with JSZip; parse `manifest.json` via
`parseSessionArchiveManifest`. Reject on schema mismatch.
2. Find target session: if a session with the manifest's
`session.name` exists → that one (`merged-into`). Else: create a
fresh session (`created`) preserving the imported name.
3. For each `documentBindings[i]`:
- If a document with the same `fingerprint` already lives in the
target session's engine: reuse its `documentId`; record a
remap `incoming.documentId → existing.documentId`. Skip the
bytes (we already have them).
- Else: register a new engine document with a freshly minted
`documentId` (the manifest's id is not preserved — it could
collide with future imports). Push bytes into the byte store.
Remap `incoming.documentId → new.documentId`.
4. Apply remaps:
- Annotations: mint new `annotationId`, rebind `documentId` to
the remapped value, call `engine.annotations.create(...)`.
- Evidence: mint new `evidenceItemId`, remap
`annotationIds`, call `engine.evidence.create(...)`.
- EvidenceLinks: mint new `evidenceLinkId`, remap
`evidenceItemId`, call `bindings.linkEvidenceToTarget(...)`.
5. Switch the active session to the target.
- Errors surface as toasts (corrupt zip, version mismatch, missing
binary file referenced by the manifest, generic IO).
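
Steps 3 and 4 can be sketched as two pure functions: one that plans the document remap, and one that rebinds annotations through it. All names here (`planDocumentRemap`, `rebindAnnotations`, `RemapPlan`) and the plain `string` ids are hypothetical; the real importer works against the engine services rather than arrays.

```ts
type DocumentId = string;

interface Binding {
  documentId: DocumentId;
  filename: string;
  fingerprint: string;
}

interface RemapPlan {
  remap: Map<DocumentId, DocumentId>; // incoming id -> id in the target session
  toRegister: Array<{ binding: Binding; newId: DocumentId }>;
  documentsAdded: number;
  documentsDeduped: number;
}

function planDocumentRemap(
  incoming: readonly Binding[],
  existing: ReadonlyArray<{ id: DocumentId; fingerprint: string }>,
  mintDocumentId: () => DocumentId,
): RemapPlan {
  const byFingerprint = new Map<string, DocumentId>();
  for (const doc of existing) byFingerprint.set(doc.fingerprint, doc.id);

  const plan: RemapPlan = {
    remap: new Map(),
    toRegister: [],
    documentsAdded: 0,
    documentsDeduped: 0,
  };
  for (const binding of incoming) {
    const known = byFingerprint.get(binding.fingerprint);
    if (known !== undefined) {
      plan.remap.set(binding.documentId, known); // reuse; the bytes are skipped
      plan.documentsDeduped += 1;
      continue;
    }
    const newId = mintDocumentId(); // the manifest's id is never preserved
    plan.remap.set(binding.documentId, newId);
    plan.toRegister.push({ binding, newId });
    plan.documentsAdded += 1;
  }
  return plan;
}

// Every imported annotation gets a fresh id and the remapped documentId.
function rebindAnnotations<A extends { id: string; documentId: DocumentId }>(
  annotations: readonly A[],
  remap: ReadonlyMap<DocumentId, DocumentId>,
  mintAnnotationId: () => string,
): A[] {
  return annotations.map((annotation) => {
    const target = remap.get(annotation.documentId);
    if (target === undefined) {
      throw new Error(`Annotation ${annotation.id} references unknown document`);
    }
    return { ...annotation, id: mintAnnotationId(), documentId: target };
  });
}
```

Evidence and links would follow the same pattern with their own remap tables (`annotationIds`, then `evidenceItemId`), which is why the order in step 4 matters.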
UI:
- **Import ZIP** menu item in `SessionMenu` opens a file picker
(`accept=".zip,application/zip"`).
- After success: show a toast like *"Imported 'Demo' — 1 new
document, 2 annotations, 1 evidence item"*.
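
The success toast can be built straight from `ImportSessionResult.stats`. A small sketch, assuming simple English pluralisation; the `formatImportToast` and `countLabel` names and the exact punctuation are illustrative.

```ts
interface ImportStats {
  documentsAdded: number;
  documentsDeduped: number;
  annotationsAdded: number;
  evidenceAdded: number;
  linksAdded: number;
}

function countLabel(n: number, singular: string): string {
  return `${n} ${n === 1 ? singular : singular + "s"}`;
}

function formatImportToast(sessionName: string, stats: ImportStats): string {
  const parts = [
    countLabel(stats.documentsAdded, "new document"),
    countLabel(stats.annotationsAdded, "annotation"),
    countLabel(stats.evidenceAdded, "evidence item"),
  ];
  return `Imported '${sessionName}': ${parts.join(", ")}`;
}
```

For a merge, the message could additionally mention `documentsDeduped` so the user sees why no new document appeared.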
Tests:
- Unit: in-process round-trip. Export a synthetic session, import it
into an empty engine, and assert `outcome="created"` and that the
document, annotation, and evidence counts match.
- Unit: same export, but import into a session of the same name that
already holds the document → assert `outcome="merged-into"`,
`documentsDeduped=1`, annotation/evidence counts double (additive
behaviour, per ADR-0008).
- Unit: corrupt manifest (drop a required field) → import rejects
with the parser error.
---
## T08 — E2E test of full create → annotate → export → reimport flow
```task
id: CE-WP-0005-T08
state_hub_task_id: 72d92828-8814-4034-96cc-7da5b6a5e281
priority: high
status: done
depends_on: [T03, T04, T06, T07]
```
`tests/integration/session-export-reimport.dom.test.tsx`. Mocks: the
PDF viewer (same pattern as
`tests/integration/citation-card-export-e2e.dom.test.tsx`); the
ingest path can use the real `ingestPdf` because we hand it real
fixture bytes.
Walk:
1. Load the app — empty state appears.
2. Create session "Demo" via the inline name input.
3. Upload a fixture PDF (read fixture bytes via
`node:fs` and wrap in a `File`).
4. Inject a selection for the manifest's known-good quote → save
evidence with a commentary.
5. (Sanity) Click Export → Copy as Markdown; assert the clipboard
payload contains the quote + commentary + openContextUrl.
6. Click **Export ZIP** in the SessionMenu. Intercept the
`<a download>` invocation; capture the Blob.
7. Click **Import ZIP** with the captured Blob. Assert merge:
- `outcome="merged-into"`, `documentsDeduped=1`,
`annotationsAdded=1`, `evidenceAdded=1`.
- The CollectionList still shows one document (deduped).
- The EvidenceSidebar now shows **two** evidence rows for the
same passage (one original + one from the merge — the known
additive behaviour per ADR-0008).
8. Click Export → Copy as Markdown on the *merged* evidence item;
assert the citation card output matches the original (proves the
round-trip preserves quote + commentary + URL shape).
If T08 passes, the MVP demo loop is complete and the project can ship
as a usable single-page demo.

---
## Out of scope (deferred to later workplans)
- **IndexedDB persistence** of PDF bytes between page reloads.
Currently only the ZIP path persists binaries.
- **Idempotent re-imports** (avoiding annotation duplication when
re-importing your own export). Requires an `importBundleId` field
in the manifest and a dedupe pass during T07. Track as a future
improvement; ADR-0008 already calls it out.
- **Session sharing via URL** (one-click "open this session in a
read-only viewer"). Adjacent to the deep-link URL scheme from
`wiki/ArchitectureOverview.md` §14.3 but not in scope here.
- **Per-document download** (export a single PDF + its annotations
as a `.zip`). The session-level export covers the demo loop; a
per-document variant is a small follow-up if asked for.
- **Polish**: branding, theme, first-run tutorial. Once the loop
works end-to-end, a separate workplan can tackle the look-and-feel.