Turn the MVP into a self-contained demo. Users now:
1. Land on an empty-state and create a named session.
2. Drag-drop or pick arbitrary PDFs into that session.
3. Annotate, build evidence, link to form fields — all session-scoped.
4. Export the whole session as a single .zip archive (manifest +
per-document PDFs).
5. Import a .zip back — into a new session, or merged into an
existing one (documents deduped by SHA-256 fingerprint;
annotations/evidence/links added additively).
Architecture:
- New shared types: SessionId, Session, SessionArchiveManifest +
parseSessionArchiveManifest with schema-version validation.
- SessionService (engine/services/sessions.ts) handles lifecycle
(create/rename/delete/setActive) + emits 4 new events through its
own bus; SharedContracts.md §4 lists the additions.
- SessionProvider (work/SessionContext.tsx) owns the cross-session
state: service, per-session PdfByteStore registry, per-session
version counter that drives EngineProvider remounts after imports.
- EngineProvider becomes session-aware (sessionId prop drives per-
session localStorage keys). Bumping engineRevision after
restoreFromStorage forces consumers to re-render so restored repos
show up immediately.
- PdfByteStore (source/pdf/byte-store.ts) holds Uint8Array bytes per
document and mints blob URLs; ingestPdfFromFile is the upload
entry-point that wraps the existing ingestPdf pipeline.
- ADR-0008 locks the ZIP layout (manifest.json + documents/<id>.pdf),
the manifest schema (schemaVersion 1), and the merge-on-collision
policy. JSZip is the only new dependency.
- App.tsx restructured: SessionProvider at the root, EngineProvider
keyed by ${sessionId}:${version}, hash routing #/s/<id>[/forms/demo],
SessionMenu top-bar, CreateFirstSession empty state.
- New DocumentRemoved event for per-document delete cleanup in
CollectionList; engine.documents.remove() is the new service method.
Tests:
- Unit: 16 SessionService lifecycle + persistence tests;
per-session snapshot round-trip; PdfByteStore + ingestPdfFromFile;
SessionArchive parser; exportSessionZip + importSessionZip with
create + merge + corrupt-archive paths.
- DOM: UploadDropzone, session-scoped CollectionList delete,
SessionMenu create/switch/rename, routing parser.
- E2E: tests/integration/session-export-reimport.dom.test.tsx walks
the full create → annotate → export → reimport flow and asserts
the additive merge (deduped doc + doubled evidence rows).
- Legacy E2Es updated to use a seed-session helper instead of the
removed fixture-button flow.
Known limitation (documented in ADR-0008): re-importing your own
freshly exported ZIP creates duplicate annotations. Forward pointer
left for an importBundleId follow-up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

# ADR-0008 — Session archive format (ZIP layout, manifest schema, merge policy)

- Status: accepted
- Date: 2026-05-25
- Workplan: CE-WP-0005-T05 (schema), CE-WP-0005-T06 (export),
  CE-WP-0005-T07 (import)
- Spec refs: `wiki/ProductRequirementsDocument.md` §20,
  `wiki/ArchitectureOverview.md` §3.4, §14.3

## Context

The CE-WP-0005 demo loop ends with a user exporting an entire session
(documents, annotations, evidence, links) into a single `.zip`
archive and importing it back later. The archive is the **only**
persistence mechanism the demo provides beyond a tab close (there is
no IndexedDB in this workplan), so its shape needs to be locked
before two parallel tasks (T06, T07) and the integration test (T08)
build on top of it.

Three things need a written contract:

1. **ZIP layout**: which files live in the archive and how they are named.
2. **manifest.json shape**: a versioned JSON schema, validated on
   import.
3. **Conflict policy**: what happens when an imported session's name
   already exists in the receiving repository.

## Decision

### ZIP layout

```
manifest.json
documents/
  <documentId>.pdf
```

- `<documentId>` is the engine-minted branded id (`doc_<uuid>`). Using
  it as the filename lets the manifest's `documentBindings[i]`
  cross-reference the binary file without an additional lookup table.
- Per-representation files (e.g. an extracted-text JSON alongside each
  PDF) are intentionally deferred. The canonical text and selectors are
  already embedded in the engine snapshot inside `manifest.json`, so a
  re-import can restore everything from the manifest plus the PDF binary.
- Future archive variants (multi-attachment documents, Markdown
  documents) extend the format by adding subdirectories under the
  archive root. Importers must ignore unknown top-level entries so that
  older clients remain compatible with newer archives that add new
  file types.

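A minimal sketch of how an importer might sort archive entries under this layout. It assumes the ZIP library returns a flat list of entry paths; JSZip, for instance, exposes entries through its `files` map. `documentPath` and `planArchiveEntries` are hypothetical helpers, not the code in `exportSessionZip`/`importSessionZip`. The sketch shows that anything other than `manifest.json` and `documents/<documentId>.pdf` goes into an `ignored` list and never fails the import.

```typescript
// Hypothetical helpers sketching the ADR-0008 layout; not the shipped code.
const MANIFEST_PATH = "manifest.json";
const DOCUMENT_ENTRY = /^documents\/([^/]+)\.pdf$/;

/** Archive path for a document's PDF bytes, named by its branded id. */
function documentPath(documentId: string): string {
  return `documents/${documentId}.pdf`;
}

interface ArchiveEntryPlan {
  hasManifest: boolean;
  /** documentId -> archive path, one per documents/<documentId>.pdf entry. */
  documents: Map<string, string>;
  /** Entries a schemaVersion 1 importer does not understand and skips. */
  ignored: string[];
}

/** Sorts raw entry paths into what a schemaVersion 1 importer reads or skips. */
function planArchiveEntries(paths: readonly string[]): ArchiveEntryPlan {
  const documents = new Map<string, string>();
  const ignored: string[] = [];
  let hasManifest = false;

  for (const path of paths) {
    // Directory markers (e.g. "documents/") carry no bytes.
    if (path.endsWith("/")) continue;

    if (path === MANIFEST_PATH) {
      hasManifest = true;
      continue;
    }

    const documentId = DOCUMENT_ENTRY.exec(path)?.[1];
    if (documentId !== undefined) {
      documents.set(documentId, path);
      continue;
    }

    // Unknown entries (e.g. a future top-level subdirectory) are skipped,
    // never treated as errors, so newer archives still import.
    ignored.push(path);
  }

  return { hasManifest, documents, ignored };
}
```

For example, `["manifest.json", "documents/doc_1.pdf", "previews/doc_1.png"]` produces one document mapping (`doc_1`) and one ignored entry.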
### `manifest.json` shape (schemaVersion 1)

```ts
interface SessionArchiveManifest {
  schemaVersion: 1;
  exportedAt: string; // ISO-8601 UTC timestamp
  session: {
    id: SessionId; // sess_<uuid>
    name: string; // trimmed display name
    createdAt: string; // ISO-8601
    updatedAt: string; // ISO-8601
  };
  engine: EngineSnapshot; // shape from src/engine/persistence.ts
  documentBindings: Array<{
    documentId: DocumentId; // matches the engine's record
    filename: string; // original filename from upload
    fingerprint: string; // SHA-256 — used by the importer for dedup
  }>;
}
```

The `engine` field has the same shape that `captureSnapshot()` produces
in `src/engine/persistence.ts`. Reusing it verbatim reduces the
in-memory ↔ archive round-trip to a single conversion (snapshot ↔
JSON) and avoids growing a parallel schema that would drift.

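On the export side, the manifest can be built around a snapshot that has already been captured. This is a sketch with placeholder types (`SessionMeta`, `DocumentBinding`, a loose `EngineSnapshot`) standing in for the real shared types, and `buildManifest` is hypothetical. It shows that the `engine` field passes through unchanged, so the archive format never needs its own copy of the snapshot schema.

```typescript
// Placeholder types for this sketch; the real ones are the shared
// SessionId/DocumentId types and EngineSnapshot from src/engine/persistence.ts.
type EngineSnapshot = { readonly [key: string]: unknown };

interface SessionMeta {
  id: string; // sess_<uuid>
  name: string;
  createdAt: string;
  updatedAt: string;
}

interface DocumentBinding {
  documentId: string; // doc_<uuid>
  filename: string;
  fingerprint: string;
}

interface ManifestV1 {
  schemaVersion: 1;
  exportedAt: string;
  session: SessionMeta;
  engine: EngineSnapshot;
  documentBindings: DocumentBinding[];
}

/** Hypothetical export helper: wraps a captured snapshot without reshaping it. */
function buildManifest(
  session: SessionMeta,
  engine: EngineSnapshot,
  documentBindings: readonly DocumentBinding[],
  now: Date = new Date(),
): ManifestV1 {
  return {
    schemaVersion: 1,
    exportedAt: now.toISOString(), // always UTC, e.g. "2026-05-25T00:00:00.000Z"
    session: { ...session },
    engine, // passed through as-is: the same object the snapshot capture returned
    documentBindings: documentBindings.map((binding) => ({ ...binding })),
  };
}

/** What would be written to manifest.json inside the archive. */
function serializeManifest(manifest: ManifestV1): string {
  return JSON.stringify(manifest, null, 2);
}
```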
Unknown fields at the top level **must be preserved** on import (a
future client may write them), but unknown fields inside `session` or
`documentBindings[i]` are dropped: the importer constructs typed
domain objects from the validated subset only.

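The rule "preserve unknown top-level keys, drop unknown nested keys" can be sketched as a small validator. `parseManifestSketch` is a hypothetical stand-in for `parseSessionArchiveManifest`, not its actual code. The sketch makes three choices of its own: it keeps unknown top-level keys in a separate `extra` record so they cannot collide with known fields, it leaves deep `engine` checks to the snapshot layer, and it returns a result object rather than throwing.

```typescript
// Hypothetical stand-in for parseSessionArchiveManifest; not its real code.
interface ParsedSession { id: string; name: string; createdAt: string; updatedAt: string; }
interface ParsedBinding { documentId: string; filename: string; fingerprint: string; }

interface ParsedManifest {
  schemaVersion: 1;
  exportedAt: string;
  session: ParsedSession;
  engine: Record<string, unknown>; // deeper checks left to the snapshot layer
  documentBindings: ParsedBinding[];
  /** Unknown top-level keys, preserved verbatim for newer clients. */
  extra: Record<string, unknown>;
}

type ParseResult = { ok: true; manifest: ParsedManifest } | { ok: false; error: string };

const KNOWN_TOP_LEVEL = new Set(["schemaVersion", "exportedAt", "session", "engine", "documentBindings"]);

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function requireString(obj: Record<string, unknown>, key: string, where: string): string {
  const value = obj[key];
  if (typeof value !== "string") throw new Error(`${where}.${key} must be a string`);
  return value;
}

function parseManifestSketch(raw: string): ParseResult {
  try {
    const data: unknown = JSON.parse(raw);
    if (!isRecord(data)) return { ok: false, error: "manifest must be a JSON object" };
    if (data["schemaVersion"] !== 1) {
      return { ok: false, error: `unsupported schemaVersion: ${String(data["schemaVersion"])}` };
    }

    const session = data["session"];
    const engine = data["engine"];
    const bindings = data["documentBindings"];
    if (!isRecord(session)) return { ok: false, error: "session must be an object" };
    if (!isRecord(engine)) return { ok: false, error: "engine must be an object" };
    if (!Array.isArray(bindings)) return { ok: false, error: "documentBindings must be an array" };

    // Nested objects: copy only the known fields, so unknown ones are dropped.
    const parsedSession: ParsedSession = {
      id: requireString(session, "id", "session"),
      name: requireString(session, "name", "session"),
      createdAt: requireString(session, "createdAt", "session"),
      updatedAt: requireString(session, "updatedAt", "session"),
    };
    const documentBindings = bindings.map((b: unknown, i: number): ParsedBinding => {
      const where = `documentBindings[${i}]`;
      if (!isRecord(b)) throw new Error(`${where} must be an object`);
      return {
        documentId: requireString(b, "documentId", where),
        filename: requireString(b, "filename", where),
        fingerprint: requireString(b, "fingerprint", where),
      };
    });

    // Top level: keep every key this version does not know about.
    const extra: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(data)) {
      if (!KNOWN_TOP_LEVEL.has(key)) extra[key] = value;
    }

    return {
      ok: true,
      manifest: {
        schemaVersion: 1,
        exportedAt: requireString(data, "exportedAt", "manifest"),
        session: parsedSession,
        engine,
        documentBindings,
        extra,
      },
    };
  } catch (err) {
    // Covers malformed JSON and the requireString failures above.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```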
### Merge-on-name-collision policy (T07)

When an imported manifest's `session.name` matches an existing
session, that existing session is the **target**
(`outcome: "merged-into"`). Otherwise a fresh session is created with
the imported name (`outcome: "created"`).

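Choosing the target reduces to a lookup by name. In this sketch, `resolveImportTarget` and `SessionSummary` are hypothetical. The comparison is an exact, case-sensitive match on the already-trimmed name, which is an assumption because the ADR does not say how case or duplicate names are handled. If several sessions shared a name, this version would take the first one.

```typescript
// Hypothetical shapes for this sketch.
interface SessionSummary {
  id: string;
  name: string;
}

type ImportTarget =
  | { outcome: "merged-into"; sessionId: string }
  | { outcome: "created"; name: string };

/**
 * Chooses where an imported manifest lands. Exact, case-sensitive name
 * matching is an assumption; the first matching session wins.
 */
function resolveImportTarget(existing: readonly SessionSummary[], importedName: string): ImportTarget {
  const match = existing.find((session) => session.name === importedName);
  return match
    ? { outcome: "merged-into", sessionId: match.id }
    : { outcome: "created", name: importedName };
}
```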
Within the target session:

- **Documents** are deduped by `fingerprint` (SHA-256 over the PDF
  bytes). If a document with the same fingerprint already exists,
  the import keeps the existing `documentId` and records a remap
  from the incoming id; the binary file is **skipped** (we already
  have the bytes). Otherwise a fresh `documentId` is minted, recorded
  in the same remap, and the bytes go into the per-session byte store.
- **Annotations**, **evidence items**, and **evidence links** are
  imported **additively**: each gets a freshly minted id, and any
  `documentId`/`annotationId`/`evidenceItemId` references are rewritten
  via the remap. No update-in-place, no overwrite-by-id.

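The two rules combine into a single planning pass. The pass first builds the document remap. It then mints new ids for each entity layer and rewrites that layer's references through the remap of the layer below it. The record shapes (`EvidenceItem.annotationId`, `EvidenceLink.evidenceItemId`, `fieldId`) and the injected `mintId` are assumptions for illustration. The real engine records carry more fields, and the real import may be ordered or structured differently. In this sketch, a reference with no entry in its remap is treated as a corrupt archive.

```typescript
// Hypothetical record shapes; the real engine records carry more fields.
interface ExistingDocument { documentId: string; fingerprint: string; }
interface IncomingBinding { documentId: string; fingerprint: string; }
interface Annotation { id: string; documentId: string; }
interface EvidenceItem { id: string; annotationId: string; }
interface EvidenceLink { id: string; evidenceItemId: string; fieldId: string; }

interface IncomingArchive {
  bindings: readonly IncomingBinding[];
  annotations: readonly Annotation[];
  evidence: readonly EvidenceItem[];
  links: readonly EvidenceLink[];
}

interface MergePlan {
  /** Incoming documentId -> documentId in the target session. */
  docRemap: Map<string, string>;
  /** Documents whose PDF bytes must still be read from the archive. */
  docsToStore: Array<{ incomingId: string; newId: string }>;
  annotations: Annotation[];
  evidence: EvidenceItem[];
  links: EvidenceLink[];
}

function lookup(remap: ReadonlyMap<string, string>, id: string, field: string): string {
  const mapped = remap.get(id);
  if (mapped === undefined) throw new Error(`corrupt archive: dangling ${field} ${id}`);
  return mapped;
}

function planMerge(
  existingDocs: readonly ExistingDocument[],
  incoming: IncomingArchive,
  mintId: (prefix: string) => string,
): MergePlan {
  // 1. Documents: dedupe by fingerprint, otherwise mint a fresh id.
  const byFingerprint = new Map<string, string>(
    existingDocs.map((doc): [string, string] => [doc.fingerprint, doc.documentId]),
  );
  const docRemap = new Map<string, string>();
  const docsToStore: MergePlan["docsToStore"] = [];
  for (const binding of incoming.bindings) {
    const existingId = byFingerprint.get(binding.fingerprint);
    if (existingId !== undefined) {
      docRemap.set(binding.documentId, existingId); // same bytes: skip the PDF entry
      continue;
    }
    const newId = mintId("doc");
    docRemap.set(binding.documentId, newId);
    byFingerprint.set(binding.fingerprint, newId); // also dedupes repeats within one archive
    docsToStore.push({ incomingId: binding.documentId, newId });
  }

  // 2. Annotations, evidence, links: always additive, references rewritten.
  const annotationRemap = new Map<string, string>();
  const annotations = incoming.annotations.map((a) => {
    const id = mintId("ann");
    annotationRemap.set(a.id, id);
    return { ...a, id, documentId: lookup(docRemap, a.documentId, "documentId") };
  });

  const evidenceRemap = new Map<string, string>();
  const evidence = incoming.evidence.map((e) => {
    const id = mintId("ev");
    evidenceRemap.set(e.id, id);
    return { ...e, id, annotationId: lookup(annotationRemap, e.annotationId, "annotationId") };
  });

  const links = incoming.links.map((l) => ({
    ...l,
    id: mintId("link"),
    evidenceItemId: lookup(evidenceRemap, l.evidenceItemId, "evidenceItemId"),
  }));

  return { docRemap, docsToStore, annotations, evidence, links };
}
```

Injecting `mintId` keeps the pass deterministic under test. Production code would pass in the engine's own id minting.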
#### Known limitation: re-importing your own export duplicates annotations

Because annotations, evidence items, and evidence links are always added
with fresh ids, re-importing a ZIP you just exported into the same
session creates a second copy of each of them. The PDF bytes dedupe
correctly via fingerprint, but the annotations have nothing to dedupe
against.

This is intentional for the demo loop and documented here so it's not
mistaken for a bug. A future workplan can introduce an
`importBundleId` field (a UUID minted at export time, stamped onto
the manifest and onto every annotation and evidence link the import
creates) plus a dedupe pass that skips entities already imported
under the same bundle id.

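Here is one possible shape for that follow-up dedupe pass. It is entirely hypothetical: neither `importBundleId` nor this function exists in schemaVersion 1. The sketch assumes each imported entity is stamped with the bundle id and with its id inside the archive (`importSourceId`, also hypothetical). A second import of the same bundle can then skip exactly the entities it already created.

```typescript
// Hypothetical: importBundleId / importSourceId do not exist in schemaVersion 1.
interface ImportStamp {
  importBundleId: string; // UUID minted at export time, copied from the manifest
  importSourceId: string; // the entity's id inside that archive
}

interface TargetEntity {
  id: string;
  stamp?: ImportStamp; // set only on entities created by an import
}

interface IncomingEntity {
  id: string;
}

function stampKey(bundleId: string, sourceId: string): string {
  return JSON.stringify([bundleId, sourceId]); // unambiguous composite key
}

/** Keeps only incoming entities not already imported from this bundle. */
function skipAlreadyImported<T extends IncomingEntity>(
  target: readonly TargetEntity[],
  bundleId: string,
  incoming: readonly T[],
): T[] {
  const seen = new Set<string>();
  for (const entity of target) {
    if (entity.stamp) seen.add(stampKey(entity.stamp.importBundleId, entity.stamp.importSourceId));
  }
  return incoming.filter((entity) => !seen.has(stampKey(bundleId, entity.id)));
}
```

The surviving entities would then go through the same additive id minting and reference rewriting as the basic T07 import.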
## Consequences

- **One source of truth for the engine snapshot.** Same shape on disk
  and in memory; the persistence helpers stay reusable.
- **Fingerprint-based dedup is byte-stable.** Two users who upload
  the same PDF file (byte for byte) end up with identical fingerprints,
  so merging their archives dedupes that document as expected.
- **Idempotency is opt-in, not the default.** A user who wants exact
  round-trips must use a future `importBundleId` flow, not the basic
  T07 import.
- **Forward-compatible additions are cheap.** New top-level keys land
  by adding fields; old importers preserve them and new importers
  consume them.

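Byte stability follows from hashing nothing but the file bytes. Below is a minimal sketch under two assumptions: the fingerprint is lowercase hex (the ADR fixes the algorithm, not the encoding), and Node's `node:crypto` is used for brevity. In the browser, `crypto.subtle.digest("SHA-256", bytes)` computes the same digest asynchronously. `fingerprintPdf` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

/**
 * Hypothetical fingerprint helper: SHA-256 over the raw PDF bytes,
 * hex-encoded (lowercase hex is this sketch's assumption).
 */
function fingerprintPdf(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Two independently loaded copies of the same bytes hash identically;
// any change to the bytes yields an unrelated fingerprint.
const original = new TextEncoder().encode("%PDF-1.7 demo");
const copy = Uint8Array.from(original);
const edited = new TextEncoder().encode("%PDF-1.7 demO");

console.log(fingerprintPdf(original) === fingerprintPdf(copy)); // true
console.log(fingerprintPdf(original) === fingerprintPdf(edited)); // false
```

The flip side of byte stability is that a re-saved PDF whose bytes differ in any way gets a new fingerprint and will not dedupe against the original.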
## Status

Accepted. The TypeScript types and `parseSessionArchiveManifest` in
`src/shared/session-archive.ts` are the executable contract for
schemaVersion 1.