Turn the MVP into a self-contained demo. Users now:
1. Land on an empty state and create a named session.
2. Drag-drop or pick arbitrary PDFs into that session.
3. Annotate, build evidence, link to form fields — all session-scoped.
4. Export the whole session as a single .zip archive (manifest +
per-document PDFs).
5. Import a .zip back — into a new session, or merged into an
existing one (documents deduped by SHA-256 fingerprint;
annotations/evidence/links added additively).
Architecture:
- New shared types: SessionId, Session, SessionArchiveManifest +
parseSessionArchiveManifest with schema-version validation.
- SessionService (engine/services/sessions.ts) handles lifecycle
(create/rename/delete/setActive) + emits 4 new events through its
own bus; SharedContracts.md §4 lists the additions.
- SessionProvider (work/SessionContext.tsx) owns the cross-session
state: service, per-session PdfByteStore registry, per-session
version counter that drives EngineProvider remounts after imports.
- EngineProvider becomes session-aware (sessionId prop drives per-
session localStorage keys). Bumping engineRevision after
restoreFromStorage forces consumers to re-render so restored repos
show up immediately.
- PdfByteStore (source/pdf/byte-store.ts) holds Uint8Array bytes per
document and mints blob URLs; ingestPdfFromFile is the upload
entry-point that wraps the existing ingestPdf pipeline.
- ADR-0008 locks the ZIP layout (manifest.json + documents/<id>.pdf),
the manifest schema (schemaVersion 1), and the merge-on-collision
policy. JSZip is the only new dependency.
- App.tsx restructured: SessionProvider at the root, EngineProvider
keyed by ${sessionId}:${version}, hash routing #/s/<id>[/forms/demo],
SessionMenu top-bar, CreateFirstSession empty state.
- New DocumentRemoved event for per-document delete cleanup in
CollectionList; engine.documents.remove() is the new service method.
Tests:
- Unit: 16 SessionService lifecycle + persistence tests;
per-session snapshot round-trip; PdfByteStore + ingestPdfFromFile;
SessionArchive parser; exportSessionZip + importSessionZip with
create + merge + corrupt-archive paths.
- DOM: UploadDropzone, session-scoped CollectionList delete,
SessionMenu create/switch/rename, routing parser.
- E2E: tests/integration/session-export-reimport.dom.test.tsx walks
the full create → annotate → export → reimport flow and asserts
the additive merge (deduped doc + doubled evidence rows).
- Legacy E2Es updated to use a seed-session helper instead of the
removed fixture-button flow.
Known limitation (documented in ADR-0008): re-importing your own
freshly-exported ZIP creates duplicate annotations. Forward pointer
left for an importBundleId follow-up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ADR-0008 — Session archive format (ZIP layout, manifest schema, merge policy)
- Status: accepted
- Date: 2026-05-25
- Workplan: CE-WP-0005-T05 (schema), CE-WP-0005-T06 (export), CE-WP-0005-T07 (import)
- Spec refs:
  wiki/ProductRequirementsDocument.md §20, wiki/ArchitectureOverview.md §3.4, §14.3
Context
The CE-WP-0005 demo loop ends with a user exporting an entire session
(documents, annotations, evidence, links) into a single .zip
archive and importing it back later. The archive needs to be the
only persistence mechanism the demo provides beyond a tab close —
no IndexedDB in this workplan — so its shape needs to be locked
before two parallel tasks (T06, T07) and the integration test (T08)
land on top of it.
Three things need a written contract:
- ZIP layout — what files live in the archive, named how.
- manifest.json shape — versioned JSON schema, validated on import.
- Conflict policy — what happens when an imported session's name already exists in the receiving repository.
Decision
ZIP layout
    manifest.json
    documents/
      <documentId>.pdf
- <documentId> is the engine-minted branded id (doc_<uuid>). Using it as the filename means the manifest's documentBindings[i] can cross-reference the binary file without an additional lookup table.
- Per-representation files (e.g. an extracted-text JSON alongside each PDF) are intentionally deferred. The canonical text + selectors are embedded in the engine snapshot inside manifest.json, so a re-import can regenerate everything from the binary.
- Future archive variants (multi-attachment documents, Markdown documents) extend by adding subdirectories under the archive root. Importers must ignore unknown top-level entries so older clients remain compatible with newer archives that add new file types.
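The layout rules above can be sketched as a small classifier over ZIP entry paths. This is an illustration only: classifyArchiveEntries and the ArchiveEntries shape are hypothetical names, not part of the codebase, and treating non-PDF files under documents/ as ignorable is an assumption that goes slightly beyond the "unknown top-level entries" rule.

```typescript
// Hypothetical helper: sorts ZIP entry paths into the parts of the layout above.
interface ArchiveEntries {
  hasManifest: boolean;
  documentIds: string[]; // ids taken from documents/<documentId>.pdf
  ignored: string[]; // unknown entries, skipped rather than rejected
}

function classifyArchiveEntries(paths: string[]): ArchiveEntries {
  const result: ArchiveEntries = { hasManifest: false, documentIds: [], ignored: [] };
  const documentPattern = /^documents\/([^/]+)\.pdf$/;

  for (const path of paths) {
    if (path === "manifest.json") {
      result.hasManifest = true;
      continue;
    }
    if (path === "documents/") continue; // directory entry, carries no data

    const id = documentPattern.exec(path)?.[1];
    if (id !== undefined) {
      result.documentIds.push(id);
    } else {
      // Assumption: anything unrecognised is skipped so that newer
      // archives with extra subdirectories still import on older clients.
      result.ignored.push(path);
    }
  }
  return result;
}
```

An importer built this way would fail only when hasManifest is false, and would otherwise proceed with whatever documents it recognises.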
manifest.json shape (schemaVersion 1)
    interface SessionArchiveManifest {
      schemaVersion: 1;
      exportedAt: string;        // ISO-8601 UTC timestamp
      session: {
        id: SessionId;           // sess_<uuid>
        name: string;            // trimmed display name
        createdAt: string;       // ISO-8601
        updatedAt: string;       // ISO-8601
      };
      engine: EngineSnapshot;    // shape from src/engine/persistence.ts
      documentBindings: Array<{
        documentId: DocumentId;  // matches the engine's record
        filename: string;        // original filename from upload
        fingerprint: string;     // SHA-256, used by the importer for dedup
      }>;
    }
The engine field is the same shape that captureSnapshot() produces
in src/engine/persistence.ts. Re-using it verbatim keeps the
in-memory ↔ archive round-trip a single conversion step (snapshot ↔
JSON) instead of growing a parallel schema that would drift.
Unknown fields at the top level must be preserved on import (a
future client can write them) but unknown fields inside session or
documentBindings[i] are dropped — the import constructs typed
domain objects from the validated subset.
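The validation and unknown-field rules above might look roughly like the following. This is a minimal sketch and not the real parseSessionArchiveManifest: the names parseManifestSketch, ParsedManifest, and the extras field are hypothetical, ids are plain strings rather than branded types, and the engine snapshot is left opaque.

```typescript
type Json = Record<string, unknown>;

interface ParsedManifest {
  schemaVersion: 1;
  exportedAt: string;
  session: { id: string; name: string; createdAt: string; updatedAt: string };
  engine: unknown; // EngineSnapshot in the real types; left opaque in this sketch
  documentBindings: Array<{ documentId: string; filename: string; fingerprint: string }>;
  extras: Json; // unknown top-level fields, carried along untouched
}

function isRecord(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function requireString(source: Json, key: string): string {
  const value = source[key];
  if (typeof value !== "string") throw new Error(`manifest: "${key}" must be a string`);
  return value;
}

function parseManifestSketch(raw: unknown): ParsedManifest {
  if (!isRecord(raw)) throw new Error("manifest: expected a JSON object");
  if (raw["schemaVersion"] !== 1) {
    throw new Error(`manifest: unsupported schemaVersion ${String(raw["schemaVersion"])}`);
  }

  const session = raw["session"];
  if (!isRecord(session)) throw new Error('manifest: "session" must be an object');
  const bindings = raw["documentBindings"];
  if (!Array.isArray(bindings)) throw new Error('manifest: "documentBindings" must be an array');

  // Preserve unknown top-level fields so a future client's data survives.
  const known = new Set(["schemaVersion", "exportedAt", "session", "engine", "documentBindings"]);
  const extras: Json = {};
  for (const [key, value] of Object.entries(raw)) {
    if (!known.has(key)) extras[key] = value;
  }

  return {
    schemaVersion: 1,
    exportedAt: requireString(raw, "exportedAt"),
    // Rebuilding from named keys drops unknown nested fields.
    session: {
      id: requireString(session, "id"),
      name: requireString(session, "name"),
      createdAt: requireString(session, "createdAt"),
      updatedAt: requireString(session, "updatedAt"),
    },
    engine: raw["engine"],
    documentBindings: bindings.map((entry: unknown) => {
      if (!isRecord(entry)) throw new Error("manifest: documentBindings entries must be objects");
      return {
        documentId: requireString(entry, "documentId"),
        filename: requireString(entry, "filename"),
        fingerprint: requireString(entry, "fingerprint"),
      };
    }),
    extras,
  };
}
```

Constructing session and each binding key by key is what makes the nested drop automatic; only the top level needs an explicit pass to keep what it does not recognise.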
Merge-on-name-collision policy (T07)
When an imported manifest's session.name matches an existing
session, the existing session is the target (outcome: "merged-into"). Otherwise a fresh session is created with the
imported name (outcome: "created").
Within the target session:
- Documents are deduped by fingerprint (SHA-256 over the PDF bytes). If a document with the same fingerprint already exists, the import keeps the existing documentId and records a remap from the incoming id. The binary file is skipped (we already have the bytes). Otherwise a fresh documentId is minted and the bytes go into the per-session byte store.
- Annotations, evidence items, and evidence links are imported additively: each gets a freshly minted id, with any documentId / annotationId / evidenceItemId references rewritten via the remap. No update-in-place, no overwrite-by-id.
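The two merge rules above can be sketched over plain in-memory arrays. Everything here is an assumption made for illustration: mergeIntoSession, the SessionState shape, and the mintId callback are hypothetical, only annotations are shown (evidence items and links would follow the same remap pattern), and byte storage is reduced to a list of incoming ids whose PDFs still need storing.

```typescript
interface DocRecord { documentId: string; fingerprint: string }
interface AnnotationRecord { id: string; documentId: string; text: string }
interface SessionState { documents: DocRecord[]; annotations: AnnotationRecord[] }

interface MergeResult {
  idRemap: Map<string, string>; // incoming documentId -> documentId in the target
  bytesToStore: string[]; // incoming ids whose PDF bytes are new to the target
}

function mergeIntoSession(
  target: SessionState,
  incomingDocs: DocRecord[],
  incomingAnnotations: AnnotationRecord[],
  mintId: (prefix: string) => string,
): MergeResult {
  const idRemap = new Map<string, string>();
  const bytesToStore: string[] = [];
  const byFingerprint = new Map<string, string>(
    target.documents.map((d): [string, string] => [d.fingerprint, d.documentId]),
  );

  for (const doc of incomingDocs) {
    const existingId = byFingerprint.get(doc.fingerprint);
    if (existingId !== undefined) {
      // Same bytes already present: keep the existing id, skip the binary.
      idRemap.set(doc.documentId, existingId);
      continue;
    }
    const freshId = mintId("doc");
    target.documents.push({ documentId: freshId, fingerprint: doc.fingerprint });
    byFingerprint.set(doc.fingerprint, freshId);
    idRemap.set(doc.documentId, freshId);
    bytesToStore.push(doc.documentId);
  }

  for (const annotation of incomingAnnotations) {
    const documentId = idRemap.get(annotation.documentId);
    if (documentId === undefined) {
      throw new Error(`annotation ${annotation.id} references unknown document`);
    }
    // Additive import: always a fresh id, never an overwrite by id.
    target.annotations.push({ id: mintId("ann"), documentId, text: annotation.text });
  }

  return { idRemap, bytesToStore };
}
```

Calling this twice with the same input leaves the document count unchanged but adds the annotations a second time, which is the behaviour the next section describes.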
Known limitation: re-importing your own export duplicates annotations
Because annotations/evidence/links are always added with fresh ids, re-importing a ZIP you just exported into the same session creates a second copy of every annotation (the existing PDF bytes dedupe correctly via fingerprint, but the annotations have nothing to de-dupe against).
This is intentional for the demo loop and documented here so it's not
mistaken for a bug. A future workplan can introduce an
importBundleId field (a UUID minted at export time, stamped onto
the manifest and on every annotation/evidence-link the import
creates) plus a dedupe pass that skips entities already imported
under the same bundle id.
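One possible shape for that follow-up is sketched below. It is speculative: the ADR only names importBundleId, so the sourceId field (the entity's id inside the archive), the importWithBundleDedupe function, and the key format are all assumptions introduced here.

```typescript
interface StoredAnnotation {
  id: string;
  text: string;
  importBundleId?: string; // stamped by the import that created this entity
  sourceId?: string; // assumption: the id the entity had inside the archive
}

// Returns how many annotations were actually added.
function importWithBundleDedupe(
  existing: StoredAnnotation[],
  incoming: Array<{ id: string; text: string }>,
  bundleId: string,
  mintId: () => string,
): number {
  // Index everything previously imported as "<bundleId>:<sourceId>".
  const seen = new Set<string>();
  for (const annotation of existing) {
    if (annotation.importBundleId !== undefined && annotation.sourceId !== undefined) {
      seen.add(`${annotation.importBundleId}:${annotation.sourceId}`);
    }
  }

  let added = 0;
  for (const item of incoming) {
    const key = `${bundleId}:${item.id}`;
    if (seen.has(key)) continue; // already imported under this bundle, skip
    existing.push({ id: mintId(), text: item.text, importBundleId: bundleId, sourceId: item.id });
    seen.add(key);
    added += 1;
  }
  return added;
}
```

Under these assumptions a second import of the same bundle adds nothing. The sketch does not by itself cover annotations that were created locally before the export, since those carry no bundle stamp; a real design would need to decide how to handle them.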
Consequences
- One source of truth for the engine snapshot. Same shape on disk and in memory; the persistence helpers stay re-usable.
- Fingerprint-based dedup is byte-stable. Two users who upload byte-identical copies of the same PDF end up with identical fingerprints; merging their archives works as expected.
- Idempotency is opt-in, not the default. A user who wants exact round-trips must use a future importBundleId flow, not the basic T07 import.
- Forward-compatible additions are cheap. New top-level keys land by adding fields; old importers preserve them and new importers consume them.
Status
Accepted. The TypeScript types + parseSessionArchiveManifest in
src/shared/session-archive.ts are the executable contract for
schemaVersion 1.