citation-evidence/workplans/CE-WP-0005-demo-sessions-zip-archive.md
tegwick 779ae0d317 Implement CE-WP-0005 T01-T08: demo app — sessions, uploads, ZIP archive
Turn the MVP into a self-contained demo. Users now:
  1. Land on an empty-state and create a named session.
  2. Drag-drop or pick arbitrary PDFs into that session.
  3. Annotate, build evidence, link to form fields — all session-scoped.
  4. Export the whole session as a single .zip archive (manifest +
     per-document PDFs).
  5. Import a .zip back — into a new session, or merged into an
     existing one (documents deduped by SHA-256 fingerprint;
     annotations/evidence/links added additively).

Architecture:
- New shared types: SessionId, Session, SessionArchiveManifest +
  parseSessionArchiveManifest with schema-version validation.
- SessionService (engine/services/sessions.ts) handles lifecycle
  (create/rename/delete/setActive) + emits 4 new events through its
  own bus; SharedContracts.md §4 lists the additions.
- SessionProvider (work/SessionContext.tsx) owns the cross-session
  state: service, per-session PdfByteStore registry, per-session
  version counter that drives EngineProvider remounts after imports.
- EngineProvider becomes session-aware (sessionId prop drives per-
  session localStorage keys). Bumping engineRevision after
  restoreFromStorage forces consumers to re-render so restored repos
  show up immediately.
- PdfByteStore (source/pdf/byte-store.ts) holds Uint8Array bytes per
  document and mints blob URLs; ingestPdfFromFile is the upload
  entry-point that wraps the existing ingestPdf pipeline.
- ADR-0008 locks the ZIP layout (manifest.json + documents/<id>.pdf),
  the manifest schema (schemaVersion 1), and the merge-on-collision
  policy. JSZip is the only new dependency.
- App.tsx restructured: SessionProvider at the root, EngineProvider
  keyed by ${sessionId}:${version}, hash routing #/s/<id>[/forms/demo],
  SessionMenu top-bar, CreateFirstSession empty state.
- New DocumentRemoved event for per-document delete cleanup in
  CollectionList; engine.documents.remove() is the new service method.

Tests:
- Unit: 16 SessionService lifecycle + persistence tests;
  per-session snapshot round-trip; PdfByteStore + ingestPdfFromFile;
  SessionArchive parser; exportSessionZip + importSessionZip with
  create + merge + corrupt-archive paths.
- DOM: UploadDropzone, session-scoped CollectionList delete,
  SessionMenu create/switch/rename, routing parser.
- E2E: tests/integration/session-export-reimport.dom.test.tsx walks
  the full create → annotate → export → reimport flow and asserts
  the additive merge (deduped doc + doubled evidence rows).
- Legacy E2Es updated to use a seed-session helper instead of the
  removed fixture-button flow.

Known limitation (documented in ADR-0008): re-importing your own
freshly-exported ZIP creates duplicate annotations. Forward pointer
left for an importBundleId follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 14:57:28 +02:00


id: CE-WP-0005
type: workplan
title: Demo app — Named sessions, document uploads, ZIP export/import
domain: citation_evidence
repo: citation-evidence
repo_id: a677c189-b4e2-4f2a-9e48-faa482c277e6
topic_slug: citation_evidence_mvp
topic_id: 96fa8e80-9f74-40f2-84cd-644e9747b9ec
state_hub_workstream_id: ec88caf3-85ad-413c-8ddd-ef7278f6ce57
status: done
owner: Bernd
created: 2026-05-25
updated: 2026-05-26
depends_on_workplan: CE-WP-0004
spec_refs: [wiki/ProductRequirementsDocument.md, wiki/ArchitectureOverview.md, wiki/SharedContracts.md]

CE-WP-0005 — Demo App: Sessions + Uploads + ZIP Archive

Turn the MVP into a self-contained demo that a stranger can pick up and use. After this workplan, a user can:

  1. Land on the app and create a named session ("Lease 2024", "Klage Müller", …).
  2. Drag-drop or pick arbitrary PDFs into that session (no fixtures required).
  3. Annotate, build evidence, link to form fields, and export citation cards — same flows as CE-WP-0002..0004, now scoped to the active session.
  4. Export the whole session as a single .zip archive: every PDF plus a manifest with the engine snapshot.
  5. Import a .zip back — into a new session, or merged into an existing one (documents deduped by fingerprint; annotations and evidence added additively).

The demo replaces the current single-bucket app: Review and Forms modes both become session-scoped. The previous fixture-driven workflow survives as an optional "Sample sessions" quick-start.

Scoping decisions (locked before drafting)

  • Demo placement: the demo replaces the main app. The MVP Review and Forms layouts continue to work, but now under a session.
  • PDF byte storage: in-memory only. PDFs survive within a tab session; reloading the page loses uploaded bytes unless the ZIP was exported. Re-importing the ZIP restores them. No IndexedDB tier in this workplan.
  • Import conflict policy: if a ZIP carries a session name that already exists, merge. Documents are deduped by SHA-256 fingerprint (incoming references rebound to the existing documentId). Annotations, evidence, and links are added as fresh ids — additive, never overwriting. Locked in ADR-0008 (created in T05).

Dependency Order

T01 (Session model + service + per-session snapshots)
  ├─ T02 (PdfByteStore + uploaded-document ingest path)
  │    └─ T03 (Upload UI + session-scoped CollectionList)
  └─ T04 (Session management UI — top-bar menu + hash routing)
       ↓
T05 (ADR-0008 + SessionArchive schema)
  ├─ T06 (Export session as ZIP)  ───┐
  └─ T07 (Import ZIP with merge)     │
                                     ↓
                                T08 (E2E test of full flow)

T03 and T04 can land in either order once T01+T02 are done. T06 and T07 can proceed in parallel once T05 is done.


T01 — Session model + service + per-session engine snapshots

id: CE-WP-0005-T01
state_hub_task_id: 5b479bf5-b54a-4fc8-b500-ec49f5d68f6a
priority: high
status: done

Under src/shared/:

  • src/shared/session.ts: SessionId branded type added to ids.ts; Session interface with { id, name, createdAt, updatedAt, lastOpenedAt? }. No documentIds field — membership is implicit (a session "owns" the documents in its engine snapshot).

Under src/engine/:

  • Extend events/types.ts with the four new events: SessionCreated, SessionRenamed, SessionDeleted, SessionActivated. Add to the EngineEvent union.
  • src/engine/services/sessions.ts: SessionService with create(name), rename(id, name), delete(id), list(), get(id), setActive(id | null), getActive(). Backed by a repo + the event bus.
  • src/engine/repos/in-memory-sessions.ts — Map-backed SessionRepository.
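
To make the service surface above concrete, here is a minimal in-memory sketch. It is not the project's implementation: the factory name createSessionService, the event payload shapes, the ses_… id format, and the choice to silently clear the active pointer on delete are all assumptions made for illustration. Only the method names, the four event names, and the rename conflict come from this workplan.

```typescript
// Hypothetical sketch: payload shapes, id format and factory name are assumptions.
type SessionId = string & { readonly __brand: "SessionId" };

interface Session {
  readonly id: SessionId;
  readonly name: string;
  readonly createdAt: string;
  readonly updatedAt: string;
  readonly lastOpenedAt?: string;
}

type SessionEvent =
  | { type: "SessionCreated"; session: Session }
  | { type: "SessionRenamed"; session: Session }
  | { type: "SessionDeleted"; sessionId: SessionId }
  | { type: "SessionActivated"; sessionId: SessionId | null };

function createSessionService(emit: (event: SessionEvent) => void) {
  const sessions = new Map<SessionId, Session>(); // stands in for SessionRepository
  let activeId: SessionId | null = null;
  let nextId = 1;

  function mustGet(id: SessionId): Session {
    const session = sessions.get(id);
    if (session === undefined) throw new Error(`Unknown session: ${id}`);
    return session;
  }

  return {
    create(name: string): Session {
      const now = new Date().toISOString();
      const id = `ses_${nextId++}` as SessionId; // assumed id format
      const session: Session = { id, name, createdAt: now, updatedAt: now };
      sessions.set(id, session);
      emit({ type: "SessionCreated", session });
      return session;
    },
    rename(id: SessionId, name: string): Session {
      const current = mustGet(id);
      const clash = Array.from(sessions.values()).some((s) => s.id !== id && s.name === name);
      if (clash) throw new Error(`A session named "${name}" already exists`);
      const renamed: Session = { ...current, name, updatedAt: new Date().toISOString() };
      sessions.set(id, renamed);
      emit({ type: "SessionRenamed", session: renamed });
      return renamed;
    },
    delete(id: SessionId): void {
      mustGet(id);
      sessions.delete(id);
      if (activeId === id) activeId = null; // assumption: clear the pointer without an extra event
      emit({ type: "SessionDeleted", sessionId: id });
    },
    list(): Session[] {
      return Array.from(sessions.values());
    },
    get(id: SessionId): Session | undefined {
      return sessions.get(id);
    },
    setActive(id: SessionId | null): void {
      if (id !== null) {
        // Stamp lastOpenedAt so a menu can sort by recency.
        sessions.set(id, { ...mustGet(id), lastOpenedAt: new Date().toISOString() });
      }
      activeId = id;
      emit({ type: "SessionActivated", sessionId: id });
    },
    getActive(): Session | null {
      return activeId === null ? null : sessions.get(activeId) ?? null;
    },
  };
}
```

Passing the emit callback in keeps the sketch independent of any particular event bus; a real bus would be wired in at that seam.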

Per-session engine snapshot persistence:

  • STORAGE_KEY becomes a function of sessionId: citation-evidence:session:<sessionId>:engine-snapshot:v1.
  • A separate index key citation-evidence:sessions:v1 stores the list of all known sessions.
  • The active session id is held in citation-evidence:active-session-id:v1.
  • When the active session changes, the old engine snapshot's persister stops; a new persister is attached against the new session's key. The engine itself is recreated (the cleanest way to reset every in-memory repo); EngineProvider is keyed by sessionId so React unmounts/remounts on switch.
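
The key scheme above can be captured in a few helpers. This is a sketch under assumptions: the STORAGE_KEYS object, the KeyValueStorage interface (a subset of the browser's localStorage surface), and the read/write helper names are hypothetical; only the three key strings come from this workplan.

```typescript
// Hypothetical helpers; only the key strings themselves are taken from the workplan.
type SessionId = string & { readonly __brand: "SessionId" };

const STORAGE_KEYS = {
  // Index of all known sessions.
  sessionsIndex: "citation-evidence:sessions:v1",
  // Pointer to the currently active session.
  activeSessionId: "citation-evidence:active-session-id:v1",
  // One engine snapshot per session.
  engineSnapshot: (sessionId: SessionId): string =>
    `citation-evidence:session:${sessionId}:engine-snapshot:v1`,
};

// Minimal storage surface so the helpers work with localStorage or a test double.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

function writeActiveSessionId(storage: KeyValueStorage, id: SessionId | null): void {
  if (id === null) {
    storage.removeItem(STORAGE_KEYS.activeSessionId);
  } else {
    storage.setItem(STORAGE_KEYS.activeSessionId, id);
  }
}

function readActiveSessionId(storage: KeyValueStorage): SessionId | null {
  return storage.getItem(STORAGE_KEYS.activeSessionId) as SessionId | null;
}
```

Because the snapshot key is derived from the session id, switching sessions never overwrites another session's snapshot.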

Tests:

  • Unit: SessionService lifecycle (create/rename/delete/setActive), event emission, conflict on rename to a duplicate name.
  • restoreFromStorage round-trip with the new per-session key scheme — drop in a fixture set of two sessions, restore each, assert engines hold the right documents.

T02 — PdfByteStore + uploaded-document ingest path

id: CE-WP-0005-T02
state_hub_task_id: 25626309-4cad-44b5-ac44-7e0dc7ea48fa
priority: high
status: done
depends_on: [T01]

Under src/source/pdf/:

  • byte-store.ts: createPdfByteStore() returns a Map<DocumentId, Uint8Array> wrapper with put, get, delete, list, clear. Scoped per session (one instance per active session; replaced on switch).
  • upload.ts: ingestPdfFromFile(file: File | Blob, store): Promise<{ document, representation }>:
    1. Read bytes via file.arrayBuffer().
    2. Call existing ingestPdf(bytes, { filename: file.name }).
    3. store.put(document.id, bytes).
    4. Mint a blob: URL from a fresh Blob([bytes], { type: "application/pdf" }) and stash it on document.uri so the viewer adapter can mount it.
    5. Return the engine inputs ready for engine.documents.register(...).
  • Blob URL revocation: when a document is deleted from a session, URL.revokeObjectURL(document.uri) runs before the engine drops the record. A small helper inside the byte store handles this so the app layer doesn't have to remember.
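
The store and its revoke-once behaviour could look roughly like the sketch below. Several details are assumptions: the UrlMinter interface is a hypothetical seam (in the browser it would wrap URL.createObjectURL on a PDF Blob and URL.revokeObjectURL), and having put return the minted URL is one possible design, not necessarily the one used in byte-store.ts.

```typescript
// Hypothetical sketch; UrlMinter and the put() return value are assumptions.
type DocumentId = string & { readonly __brand: "DocumentId" };

interface UrlMinter {
  create(bytes: Uint8Array): string; // browser: URL.createObjectURL(new Blob([...]))
  revoke(url: string): void;         // browser: URL.revokeObjectURL(url)
}

function createPdfByteStore(urls: UrlMinter) {
  const bytesById = new Map<DocumentId, Uint8Array>();
  const urlById = new Map<DocumentId, string>();

  // Revoke and forget the URL for one document; a second call is a no-op.
  function revokeFor(id: DocumentId): void {
    const url = urlById.get(id);
    if (url !== undefined) {
      urls.revoke(url);
      urlById.delete(id);
    }
  }

  return {
    put(id: DocumentId, bytes: Uint8Array): string {
      revokeFor(id); // replacing bytes invalidates any earlier URL
      bytesById.set(id, bytes);
      const url = urls.create(bytes);
      urlById.set(id, url);
      return url;
    },
    get(id: DocumentId): Uint8Array | undefined {
      return bytesById.get(id);
    },
    delete(id: DocumentId): boolean {
      revokeFor(id);
      return bytesById.delete(id);
    },
    list(): DocumentId[] {
      return Array.from(bytesById.keys());
    },
    clear(): void {
      urlById.forEach((url) => urls.revoke(url));
      urlById.clear();
      bytesById.clear();
    },
  };
}
```

Tracking the URL next to the bytes is what makes the second delete harmless: once the URL entry is gone there is nothing left to revoke.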

The fixture-loading path (current App.tsx fetch + ingestPdf) remains as-is for the optional "Sample sessions" quick-start; the upload path is a parallel branch that ends at the same engine call.

Tests:

  • Unit: round-trip a known-bytes PDF through ingestPdfFromFile, assert store.get(documentId) returns the same bytes.
  • Unit: delete revokes the blob URL exactly once even if called twice.

T03 — Upload UI + session-scoped Collection list

id: CE-WP-0005-T03
state_hub_task_id: 55275918-e610-4513-ba2d-c05018ecd42d
priority: high
status: done
depends_on: [T02]

Under src/app/sessions/:

  • UploadDropzone.tsx — drag-drop region and a file picker that accepts application/pdf (multi-select). On drop:
    1. For each File: call ingestPdfFromFile(file, byteStore) then engine.documents.register(...). Show a per-file progress chip; surface a toast on failure.
    2. Make the most-recently-uploaded document the active document.

Under src/work/CollectionList.tsx:

  • Rework to list the active session's documents (read from engine.documents.list()), not the fixture manifest.
  • Header bar with the session name + an inline "Upload PDF" button that opens the dropzone.
  • Per-item: title (filename), document id, "Open" + "Delete" actions. Delete confirms via a small inline state, then calls into the byte store + engine repo.

Fixtures become an optional Sample sessions entry inside the session menu (T04). The current CollectionList's manifest-driven fixture loader moves into src/app/sessions/SampleSessions.tsx, kept for tests and demonstration.

Tests:

  • DOM: dropping a synthetic File triggers ingest and the new document appears in the list.
  • DOM: per-item delete removes the row and revokes the blob URL.

T04 — Session management UI (top-bar menu, hash routing)

id: CE-WP-0005-T04
state_hub_task_id: e008524c-9cef-448f-b95b-fa524c725bc3
priority: medium
status: done
depends_on: [T01]

Under src/app/sessions/:

  • SessionMenu.tsx — top-bar dropdown showing the active session name. Menu items:
    • Switch to… (list of all sessions sorted by lastOpenedAt)
    • New session… (opens an inline name-input modal)
    • Rename…, Delete… (with confirmation) for the active session
    • Export ZIP (T06)
    • Import ZIP (T07)
    • Sample sessions ▸ (T03, optional submenu)
  • Empty state: if no sessions exist, the app body is replaced by a centred "Create your first session" call-to-action with an inline name input.

Hash routing:

  • Current routes (#/forms/demo, default Review) become session-scoped: #/s/<sessionId>, #/s/<sessionId>/forms/demo. The active-session pointer is the router's responsibility (single source of truth = the hash); the SessionService.setActive(...) call is a side effect of hash change.
  • A bare #/ (no session) renders the empty state.
  • Deep links into a deleted/unknown session redirect to the empty state with a toast.
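
A pure parser keeps the hash as the single source of truth and is easy to unit-test. The following is a sketch: the Route type and the function names parseHashRoute and formatHashRoute are hypothetical, and whether a session id actually exists is deliberately left to the caller, which would issue the redirect and toast for unknown sessions.

```typescript
// Hypothetical route model; names are assumptions for illustration.
type Route =
  | { kind: "empty" }
  | { kind: "review"; sessionId: string }
  | { kind: "forms-demo"; sessionId: string };

// Matches "#/s/<sessionId>" with an optional "/forms/demo" suffix.
const SESSION_ROUTE = /^#\/s\/([^/]+)(\/forms\/demo)?$/;

function parseHashRoute(hash: string): Route {
  const match = SESSION_ROUTE.exec(hash);
  if (match === null) return { kind: "empty" }; // bare "#/" or anything unrecognised
  const sessionId = match[1];
  if (sessionId === undefined) return { kind: "empty" };
  return match[2] !== undefined
    ? { kind: "forms-demo", sessionId }
    : { kind: "review", sessionId };
}

function formatHashRoute(route: Route): string {
  switch (route.kind) {
    case "empty":
      return "#/";
    case "review":
      return `#/s/${route.sessionId}`;
    case "forms-demo":
      return `#/s/${route.sessionId}/forms/demo`;
  }
}
```

A hashchange listener would call parseHashRoute and then invoke SessionService.setActive(...) as the side effect described above.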

Tests:

  • DOM: switching sessions in the menu updates the hash and unmounts + remounts the engine (verified by checking that the previous session's documents disappear from the CollectionList).
  • DOM: deep-link to a known session loads that session's documents.

T05 — ADR-0008 + SessionArchive manifest schema

id: CE-WP-0005-T05
state_hub_task_id: 50d525b1-ba7d-454e-91b4-34d96bc5ab7b
priority: high
status: done

Add docs/decisions/ADR-0008-session-archive-format.md. Locks:

  • ZIP layout:
    manifest.json
    documents/
      <documentId>.pdf
    
    <documentId> is the engine's branded id (doc_…), used as the filename. Future variants (per-representation files, per-attachment) are intentionally deferred.
  • manifest.json shape (top-level fields):
    • schemaVersion: 1
    • exportedAt: string (ISO-8601)
    • session: { id, name, createdAt, updatedAt }
    • engine: EngineSnapshot (the same shape produced by captureSnapshot(), re-used verbatim so export and import share a single snapshot format).
    • documentBindings: Array<{ documentId, filename, fingerprint }> — pairs every engine document with its file inside documents/.
  • Merge-on-name-collision policy (T07 spec): documents are deduped by fingerprint; annotations/evidence/links are imported with fresh ids and rebound to the deduped documentId. Re-importing your own freshly-exported ZIP into the same session therefore duplicates annotations — documented as a known limitation. A later workplan can add idempotent imports via an importBundleId field.

Under src/shared/:

  • session-archive.ts — TypeScript interfaces for SessionArchiveManifest matching the ADR. Pure types + a parseSessionArchiveManifest(json: unknown): SessionArchiveManifest that throws on schema mismatch (used by the importer in T07).
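
A hand-rolled validator for the manifest might look like the sketch below. It checks only the top-level fields listed in the ADR summary above and treats the engine snapshot as an opaque object, which is a simplification: the real EngineSnapshot type is not reproduced here. The helper names isRecord and requireString, and the exact error messages, are assumptions.

```typescript
// Hypothetical sketch; `engine` is kept opaque instead of the real EngineSnapshot type.
interface SessionArchiveManifest {
  schemaVersion: 1;
  exportedAt: string;
  session: { id: string; name: string; createdAt: string; updatedAt: string };
  engine: Record<string, unknown>;
  documentBindings: Array<{ documentId: string; filename: string; fingerprint: string }>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function requireString(obj: Record<string, unknown>, key: string, path: string): string {
  const value = obj[key];
  if (typeof value !== "string") {
    throw new Error(`Invalid session archive manifest: ${path}.${key} must be a string`);
  }
  return value;
}

function parseSessionArchiveManifest(json: unknown): SessionArchiveManifest {
  if (!isRecord(json)) throw new Error("Invalid session archive manifest: expected an object");
  const version = json["schemaVersion"];
  if (version !== 1) {
    throw new Error(`Unsupported manifest schemaVersion: ${String(version)} (expected 1)`);
  }
  const session = json["session"];
  if (!isRecord(session)) throw new Error("Invalid session archive manifest: session must be an object");
  const engine = json["engine"];
  if (!isRecord(engine)) throw new Error("Invalid session archive manifest: engine must be an object");
  const bindings = json["documentBindings"];
  if (!Array.isArray(bindings)) {
    throw new Error("Invalid session archive manifest: documentBindings must be an array");
  }
  return {
    schemaVersion: 1,
    exportedAt: requireString(json, "exportedAt", "manifest"),
    session: {
      id: requireString(session, "id", "session"),
      name: requireString(session, "name", "session"),
      createdAt: requireString(session, "createdAt", "session"),
      updatedAt: requireString(session, "updatedAt", "session"),
    },
    engine,
    documentBindings: bindings.map((entry: unknown, i: number) => {
      if (!isRecord(entry)) {
        throw new Error(`Invalid session archive manifest: documentBindings[${i}] must be an object`);
      }
      const path = `documentBindings[${i}]`;
      return {
        documentId: requireString(entry, "documentId", path),
        filename: requireString(entry, "filename", path),
        fingerprint: requireString(entry, "fingerprint", path),
      };
    }),
  };
}
```

Naming the offending path in each error is what lets the importer show a useful toast instead of a generic failure.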

Tests:

  • Unit conformance: round-trip a synthetic manifest object → JSON.stringify → parse → deep-equal.
  • Unit failure: a manifest missing required fields, or with the wrong schemaVersion, throws with a useful message.

T06 — Export session as ZIP archive

id: CE-WP-0005-T06
state_hub_task_id: 07546a24-90d8-4b5d-9833-2648d2936ea2
priority: high
status: done
depends_on: [T05]

Dependency add: jszip (small, MIT, battle-tested). Use the ESM build to keep the bundle clean.

Under src/app/sessions/:

  • exportSessionZip.ts:
    export async function exportSessionZip(
      sessionId: SessionId,
      engine: Engine,
      byteStore: PdfByteStore,
      session: Session,
    ): Promise<Blob>
    
    Steps:
    1. Build the manifest from captureSnapshot(engine) + session metadata + per-document { filename, fingerprint } derived from engine.documents.
    2. For each documentBindings[i], push bytes into documents/<documentId>.pdf.
    3. Push manifest.json (stringified, pretty-printed).
    4. zip.generateAsync({ type: "blob" }).
  • triggerSessionDownload(blob, filename) — creates an <a download> element, clicks it, revokes the URL. Filename: <slugified session name>-<ISO date>.zip.
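
The download filename rule can be sketched as two small pure functions. This is an illustration under assumptions: the names slugifySessionName and sessionArchiveFilename are hypothetical, "ISO date" is read here as the UTC calendar date (YYYY-MM-DD), and the "session" fallback for names with no usable characters is an invented default.

```typescript
// Hypothetical helpers; the date format and the fallback slug are assumptions.
function slugifySessionName(name: string): string {
  const slug = name
    .normalize("NFKD")                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "")   // drop the combining marks ("ü" becomes "u")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")       // collapse everything else into single hyphens
    .replace(/^-+|-+$/g, "");          // trim leading and trailing hyphens
  return slug.length > 0 ? slug : "session";
}

function sessionArchiveFilename(sessionName: string, exportedAt: Date): string {
  const isoDate = exportedAt.toISOString().slice(0, 10); // UTC date, e.g. 2026-05-26
  return `${slugifySessionName(sessionName)}-${isoDate}.zip`;
}
```

With these assumptions, a session named "Klage Müller" exported on 26 May 2026 (UTC) would download as klage-muller-2026-05-26.zip.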

UI:

  • Export ZIP menu item in SessionMenu calls the above. Show a brief spinner state on the menu item while the zip generates.
  • Surface a success/error toast (re-use the toast pattern from CE-WP-0004 sidebar, lifted into src/app/sessions/Toast.tsx).

Tests:

  • DOM: synthesise a one-document session, click Export, capture the generated Blob, unzip in the test (via JSZip), assert the manifest matches the engine snapshot and documents/<id>.pdf contains the original bytes.

T07 — Import ZIP with merge-on-name-collision + fingerprint dedup

id: CE-WP-0005-T07
state_hub_task_id: 2fedab8d-6af7-458a-90d3-383241978f4e
priority: high
status: done
depends_on: [T05]

Under src/app/sessions/:

  • importSessionZip.ts:
    export interface ImportSessionResult {
      readonly sessionId: SessionId;
      readonly outcome: "created" | "merged-into";
      readonly stats: {
        readonly documentsAdded: number;
        readonly documentsDeduped: number;
        readonly annotationsAdded: number;
        readonly evidenceAdded: number;
        readonly linksAdded: number;
      };
    }
    export async function importSessionZip(
      file: File | Blob,
      services: SessionImportServices,
    ): Promise<ImportSessionResult>;
    
    Steps:
    1. Read with JSZip; parse manifest.json via parseSessionArchiveManifest. Reject on schema mismatch.
    2. Find target session: if a session with the manifest's session.name exists → that one (merged-into). Else: create a fresh session (created) preserving the imported name.
    3. For each documentBindings[i]:
      • If a document with the same fingerprint already lives in the target session's engine: reuse its documentId; record a remap incoming.documentId → existing.documentId. Skip the bytes (we already have them).
      • Else: register a new engine document with a freshly minted documentId (the manifest's id is not preserved — it could collide with future imports). Push bytes into the byte store. Remap incoming.documentId → new.documentId.
    4. Apply remaps:
      • Annotations: mint new annotationId, rebind documentId to the remapped value, call engine.annotations.create(...).
      • Evidence: mint new evidenceItemId, remap annotationIds, call engine.evidence.create(...).
      • EvidenceLinks: mint new evidenceLinkId, remap evidenceItemId, call bindings.linkEvidenceToTarget(...).
    5. Switch the active session to the target.
  • Errors surface as toasts (corrupt zip, version mismatch, missing binary file referenced by the manifest, generic IO).
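
Steps 3 and 4 hinge on a remap table from incoming document ids to target ids. The sketch below shows that planning step as a pure function plus the annotation rebinding. All names (planDocumentRemap, rebindAnnotations, the plan shape, the reduced annotation type) are hypothetical, and treating two incoming bindings with the same fingerprint as one document is an assumption the workplan does not spell out.

```typescript
// Hypothetical sketch of the dedup + remap step; names and shapes are assumptions.
interface DocumentBinding { documentId: string; filename: string; fingerprint: string }
interface ExistingDocument { id: string; fingerprint: string }

interface DocumentRemapPlan {
  remap: Map<string, string>;                                // incoming id -> target id
  toRegister: Array<{ binding: DocumentBinding; newId: string }>; // bytes still to ingest
  documentsAdded: number;
  documentsDeduped: number;
}

function planDocumentRemap(
  incoming: DocumentBinding[],
  existing: ExistingDocument[],
  mintDocumentId: () => string,
): DocumentRemapPlan {
  const idByFingerprint = new Map<string, string>();
  existing.forEach((doc) => idByFingerprint.set(doc.fingerprint, doc.id));

  const plan: DocumentRemapPlan = { remap: new Map(), toRegister: [], documentsAdded: 0, documentsDeduped: 0 };
  for (const binding of incoming) {
    const known = idByFingerprint.get(binding.fingerprint);
    if (known !== undefined) {
      plan.remap.set(binding.documentId, known); // reuse the document, skip its bytes
      plan.documentsDeduped += 1;
    } else {
      const newId = mintDocumentId(); // the manifest's id is not preserved
      idByFingerprint.set(binding.fingerprint, newId); // assumption: dedupe within the archive too
      plan.remap.set(binding.documentId, newId);
      plan.toRegister.push({ binding, newId });
      plan.documentsAdded += 1;
    }
  }
  return plan;
}

interface AnnotationLike { id: string; documentId: string }

// Give every annotation a fresh id and point it at the remapped document.
function rebindAnnotations<T extends AnnotationLike>(
  annotations: T[],
  remap: Map<string, string>,
  mintAnnotationId: () => string,
): T[] {
  return annotations.map((annotation) => {
    const target = remap.get(annotation.documentId);
    if (target === undefined) {
      throw new Error(`Annotation ${annotation.id} references unknown document ${annotation.documentId}`);
    }
    return { ...annotation, id: mintAnnotationId(), documentId: target };
  });
}
```

Evidence items and evidence links would follow the same pattern with their own remap tables (annotation ids, then evidence item ids), applied in that order so each layer can resolve the ids minted by the previous one.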

UI:

  • Import ZIP menu item in SessionMenu opens a file picker (accept=".zip,application/zip").
  • After success: show a toast like "Imported 'Demo' — 1 new document, 2 annotations, 1 evidence item".

Tests:

  • Unit: in-process round-trip — export a synthetic session, import into an empty engine, assert outcome="created" and that the document, annotation, and evidence counts match.
  • Unit: same export, but import into a session of the same name that already holds the document → assert outcome="merged-into", documentsDeduped=1, annotation/evidence counts double (additive behaviour, per ADR-0008).
  • Unit: corrupt manifest (drop a required field) → import rejects with the parser error.

T08 — E2E test of full create → annotate → export → reimport flow

id: CE-WP-0005-T08
state_hub_task_id: 72d92828-8814-4034-96cc-7da5b6a5e281
priority: high
status: done
depends_on: [T03, T04, T06, T07]

tests/integration/session-export-reimport.dom.test.tsx. Mocks: the PDF viewer (same pattern as tests/integration/citation-card-export-e2e.dom.test.tsx); the ingest path can use the real ingestPdf because we hand it real fixture bytes.

Walk:

  1. Load the app — empty state appears.
  2. Create session "Demo" via the inline name input.
  3. Upload a fixture PDF (read fixture bytes via node:fs and wrap in a File).
  4. Inject a selection for the manifest's known-good quote → save evidence with a commentary.
  5. (Sanity) Click Export → Copy as Markdown; assert the clipboard payload contains the quote + commentary + openContextUrl.
  6. Click Export ZIP in the SessionMenu. Intercept the <a download> invocation; capture the Blob.
  7. Click Import ZIP with the captured Blob. Assert merge:
    • outcome="merged-into", documentsDeduped=1, annotationsAdded=1, evidenceAdded=1.
    • The CollectionList still shows one document (deduped).
    • The EvidenceSidebar now shows two evidence rows for the same passage (one original + one from the merge — the known additive behaviour per ADR-0008).
  8. Click Export → Copy as Markdown on the merged evidence item; assert the citation card output matches the original (proves the round-trip preserves quote + commentary + URL shape).

If T08 passes, the MVP demo loop is complete and the project can ship as a usable single-page demo.


Out of scope (deferred to later workplans)

  • IndexedDB persistence of PDF bytes between page reloads. Currently only the ZIP path persists binaries.
  • Idempotent re-imports (avoiding annotation duplication when re-importing your own export). Requires an importBundleId field in the manifest and a dedupe pass during T07. Track as a future improvement; ADR-0008 already calls it out.
  • Session sharing via URL (one-click "open this session in a read-only viewer"). Adjacent to the deep-link URL scheme from wiki/ArchitectureOverview.md §14.3 but not in scope here.
  • Per-document download (export a single PDF + its annotations as a .zip). The session-level export covers the demo loop; a per-document variant is a small follow-up if asked for.
  • Polish: branding, theme, first-run tutorial. Once the loop works end-to-end, a separate workplan can tackle the look-and-feel.