# ADR-0008: Session archive format (ZIP layout, manifest schema, merge policy)

- Status: accepted
- Date: 2026-05-25
- Workplan: CE-WP-0005-T05 (schema), CE-WP-0005-T06 (export), CE-WP-0005-T07 (import)
- Spec refs: `wiki/ProductRequirementsDocument.md` §20, `wiki/ArchitectureOverview.md` §3.4, §14.3

## Context

The CE-WP-0005 demo loop ends with a user exporting an entire session (documents, annotations, evidence, links) into a single `.zip` archive and importing it back later. The archive is the **only** persistence the demo offers that survives a tab close (there is no IndexedDB in this workplan), so its shape must be locked before the two parallel tasks (T06, T07) and the integration test (T08) land on top of it.

Three things need a written contract:

1. **ZIP layout**: which files live in the archive and how they are named.
2. **manifest.json shape**: a versioned JSON schema, validated on import.
3. **Conflict policy**: what happens when an imported session's name already exists in the receiving repository.

## Decision

### ZIP layout

```
manifest.json
documents/
  <documentId>.pdf
```

- `<documentId>` is the engine-minted branded id (prefixed `doc_`). Using it as the filename means each `documentBindings[i]` entry in the manifest can cross-reference its binary file without an additional lookup table.
- Per-representation files (e.g. an extracted-text JSON alongside each PDF) are intentionally deferred. The canonical text and selectors are embedded in the engine snapshot inside `manifest.json`, so a re-import can regenerate everything from the binary.
- Future archive variants (multi-attachment documents, Markdown documents) extend the layout by adding subdirectories under the archive root. Importers must ignore unknown top-level entries so that older clients remain compatible with newer archives that add new file types.

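
The layout rules above can be sketched as a small path classifier. This is a minimal illustration, not the project's importer: `classifyArchiveEntry` and the `ArchiveEntry` type are hypothetical names, and the sketch assumes document files are named after the `doc_`-prefixed document id with a `.pdf` extension. It also assumes that unrecognised files inside `documents/` are skipped in the same way as unknown top-level entries, which the ADR does not state explicitly.

```typescript
// Hypothetical helper for illustration; not part of src/shared/session-archive.ts.
type ArchiveEntry =
  | { kind: "manifest" }
  | { kind: "document"; documentId: string }
  | { kind: "ignored"; path: string };

function classifyArchiveEntry(path: string): ArchiveEntry {
  if (path === "manifest.json") {
    return { kind: "manifest" };
  }
  // Assumed naming: documents/<documentId>.pdf, where the id starts with "doc_".
  const match = /^documents\/(doc_[^/]+)\.pdf$/.exec(path);
  const documentId = match?.[1];
  if (documentId !== undefined) {
    return { kind: "document", documentId };
  }
  // Anything else is ignored rather than rejected, so archives written by
  // newer clients with extra entries still import on older clients.
  return { kind: "ignored", path };
}

// Usage: one entry of each kind.
const kinds = [
  "manifest.json",
  "documents/doc_abc123.pdf",
  "thumbnails/doc_abc123.png",
].map((path) => classifyArchiveEntry(path).kind);
```

Here `kinds` evaluates to `["manifest", "document", "ignored"]`. Returning an explicit `ignored` variant rather than throwing keeps the forward-compatibility rule visible at the call site.
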
### `manifest.json` shape (schemaVersion 1)

```ts
interface SessionArchiveManifest {
  schemaVersion: 1;
  exportedAt: string;        // ISO-8601 UTC timestamp
  session: {
    id: SessionId;           // prefixed sess_
    name: string;            // trimmed display name
    createdAt: string;       // ISO-8601
    updatedAt: string;       // ISO-8601
  };
  engine: EngineSnapshot;    // shape from src/engine/persistence.ts
  documentBindings: Array<{
    documentId: DocumentId;  // matches the engine's record
    filename: string;        // original filename from upload
    fingerprint: string;     // SHA-256; used by the importer for dedup
  }>;
}
```

The `engine` field has the same shape that `captureSnapshot()` produces in `src/engine/persistence.ts`. Re-using it verbatim keeps the in-memory ↔ archive round-trip a single conversion (snapshot ↔ JSON) instead of growing a parallel schema that would drift.

Unknown fields at the top level **must be preserved** on import (a future client may write them), but unknown fields inside `session` or `documentBindings[i]` are dropped, because the import constructs typed domain objects from the validated subset.

### Merge-on-name-collision policy (T07)

When an imported manifest's `session.name` matches an existing session, the existing session is the **target** (`outcome: "merged-into"`). Otherwise a fresh session is created with the imported name (`outcome: "created"`).

Within the target session:

- **Documents** are deduped by `fingerprint` (SHA-256 over the PDF bytes). If a document with the same fingerprint already exists, the import keeps the existing `documentId` and records a remap from the incoming id. The binary file is **skipped** (we already have the bytes). Otherwise a fresh `documentId` is minted and the bytes go into the per-session byte store.
- **Annotations**, **evidence items**, and **evidence links** are imported **additively**: each gets a freshly minted id, and any `documentId`/`annotationId`/`evidenceItemId` references are rewritten via the remap. There is no update-in-place and no overwrite-by-id.

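
The merge rules above can be sketched as two small functions: one that builds the document remap from fingerprints, and one that imports annotations additively through it. This is a minimal sketch under assumptions, not the T07 implementation. All names (`planDocumentImport`, `importAnnotations`, `ExistingDocument`, the id-minting callbacks) are hypothetical, ids are plain strings rather than branded types, and the sketch also dedupes two identical PDFs within the same archive, which the ADR does not specify.

```typescript
// Hypothetical types and helpers for illustration only.
interface DocumentBinding { documentId: string; filename: string; fingerprint: string }
interface ExistingDocument { documentId: string; fingerprint: string }
interface Annotation { id: string; documentId: string }

interface DocumentImportPlan {
  remap: Map<string, string>; // incoming documentId -> id used in the target session
  bytesToStore: string[];     // incoming ids whose PDF bytes must be copied
}

function planDocumentImport(
  existing: ExistingDocument[],
  incoming: DocumentBinding[],
  mintDocumentId: () => string,
): DocumentImportPlan {
  const byFingerprint = new Map<string, string>();
  for (const doc of existing) byFingerprint.set(doc.fingerprint, doc.documentId);

  const remap = new Map<string, string>();
  const bytesToStore: string[] = [];
  for (const binding of incoming) {
    const known = byFingerprint.get(binding.fingerprint);
    if (known !== undefined) {
      // Same bytes already present: keep the existing id and skip the binary.
      remap.set(binding.documentId, known);
    } else {
      const fresh = mintDocumentId();
      byFingerprint.set(binding.fingerprint, fresh); // assumption: dedupe within the archive too
      remap.set(binding.documentId, fresh);
      bytesToStore.push(binding.documentId);
    }
  }
  return { remap, bytesToStore };
}

function importAnnotations(
  incoming: Annotation[],
  remap: Map<string, string>,
  mintAnnotationId: () => string,
): Annotation[] {
  return incoming.map((annotation) => {
    const documentId = remap.get(annotation.documentId);
    if (documentId === undefined) {
      throw new Error(`No remap entry for ${annotation.documentId}`);
    }
    // Additive import: always a fresh id, never an overwrite-by-id.
    return { ...annotation, id: mintAnnotationId(), documentId };
  });
}

// Usage: one incoming document is already present, one is new.
let docCounter = 0;
let annCounter = 0;
const plan = planDocumentImport(
  [{ documentId: "doc_local1", fingerprint: "aaa" }],
  [
    { documentId: "doc_in1", filename: "a.pdf", fingerprint: "aaa" },
    { documentId: "doc_in2", filename: "b.pdf", fingerprint: "bbb" },
  ],
  () => `doc_new${++docCounter}`,
);
const imported = importAnnotations(
  [{ id: "ann_old", documentId: "doc_in1" }],
  plan.remap,
  () => `ann_new${++annCounter}`,
);
```

In this run `doc_in1` maps to the existing `doc_local1`, `doc_in2` maps to the freshly minted `doc_new1`, only `doc_in2` appears in `bytesToStore`, and the imported annotation carries the id `ann_new1` pointing at `doc_local1`. Evidence items and evidence links would follow the same pattern as `importAnnotations`, with additional remaps for annotation and evidence-item ids.
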
#### Known limitation: re-importing your own export duplicates annotations

Because annotations, evidence items, and links are always added with fresh ids, re-importing a ZIP you just exported into the same session creates a second copy of every annotation. The existing PDF bytes dedupe correctly via fingerprint, but the annotations have nothing to dedupe against. This is intentional for the demo loop and documented here so it is not mistaken for a bug.

A future workplan can introduce an `importBundleId` field (a UUID minted at export time, stamped onto the manifest and onto every annotation and evidence link the import creates) plus a dedupe pass that skips entities already imported under the same bundle id.

## Consequences

- **One source of truth for the engine snapshot.** The shape is the same on disk and in memory, so the persistence helpers stay re-usable.
- **Fingerprint-based dedup is byte-stable.** Two users who upload byte-identical copies of the same PDF end up with identical fingerprints, so merging their archives dedupes that document.
- **Idempotency is opt-in, not the default.** A user who wants exact round-trips must use a future `importBundleId` flow, not the basic T07 import.
- **Forward-compatible additions are cheap.** New top-level keys land by adding fields; old importers preserve them and new importers consume them.

## Status

Accepted. The TypeScript types and `parseSessionArchiveManifest` in `src/shared/session-archive.ts` are the executable contract for schemaVersion 1.