infospace-bench/docs/archive-integration.md
tegwick ddefd69f71 IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05)
Round out IB-WP-0014 with the remaining archive operations and docs.

- restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip
  a finalized package's bytes back to disk. Refuses to overwrite a non-empty
  target unless --force. --from <infospace-root> resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens the
  per-infospace registry and joins each record with its current retention
  state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done. Full pytest suite: 72 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:46:23 +02:00

# Archive Integration With artifact-store

`infospace-bench` is an application workspace for *live* infospaces. The
working state lives in a local folder and is read and rewritten across many
sessions. Durable, content-addressed preservation of finalized snapshots is
delegated to [`artifact-store`](file:///home/worsch/artifact-store), which
owns identity, manifests, retention policy, audit, and pluggable storage
backends (local FS today, S3-compatible / Ceph RGW in artifact-store WP-0004).

This document is the operator-facing companion to workplan
[`IB-WP-0014`](../workplans/IB-WP-0014-infospace-backend-abstraction.md).

## When to archive

Archive an infospace when:

- A milestone has been reached (pilot complete, evaluations stable).
- The infospace will be referenced from another system (StateHub linkage,
  release notes, audit evidence).
- You want a recoverable point-in-time snapshot before a destructive change.
- You need to share an exact, hash-verifiable copy of the state with someone
  else.

Do **not** archive as a substitute for normal save / commit. Each archive
creates a new immutable package, so archiving routinely without a specific
purpose inflates the local store. Use git for in-flight working state.

## What gets archived

By default, the archive includes:

- `infospace.yaml`
- `artifacts/`
- `workflows/`
- `output/` (metrics, evaluations, run records, memory traces, ...)
- `reports/`
- `exports/`

Always excluded:

- `output/archives/.store/` (the artifact-store data dir; including it would
  cause recursive capture)
- `output/archives/index.yaml` (the archive record index is a local pointer,
  not part of the preserved snapshot)

Override the include / exclude sets with `--include` and `--exclude`
(repeatable). Both accept relative paths or globs.
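
The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the actual `infospace-bench` implementation: the names `select_paths`, `DEFAULT_INCLUDE`, and `ALWAYS_EXCLUDED` are hypothetical, and the assumption that a plain path also covers everything beneath it is ours.

```python
from fnmatch import fnmatch

# Default include set, as listed in this document.
DEFAULT_INCLUDE = ("infospace.yaml", "artifacts", "workflows", "output", "reports", "exports")

# Paths this document says are excluded regardless of user options.
ALWAYS_EXCLUDED = ("output/archives/.store", "output/archives/index.yaml")


def _matches(path: str, pattern: str) -> bool:
    """True if path equals the pattern, lies beneath it, or matches it as a glob."""
    pattern = pattern.rstrip("/")
    return path == pattern or path.startswith(pattern + "/") or fnmatch(path, pattern)


def select_paths(files, include=DEFAULT_INCLUDE, exclude=()):
    """Return the sorted relative paths that would be captured (hypothetical helper)."""
    blocked = tuple(exclude) + ALWAYS_EXCLUDED
    return sorted(
        f
        for f in files
        if any(_matches(f, p) for p in include)
        and not any(_matches(f, p) for p in blocked)
    )
```

In this sketch the always-excluded entries are appended to the user's exclude list, so a broad include such as `output` can never pull the store back into its own snapshot.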

## Retention class

`artifact-store` ships these retention classes:

| Class              | Typical use                              |
|--------------------|------------------------------------------|
| `transient`        | Scratch outputs you only need briefly    |
| `raw-evidence`     | Untriaged raw run output                 |
| `summary-evidence` | Aggregated metrics / reports             |
| `release-evidence` | Snapshots tied to a release or milestone |
| `permanent-record` | Never expires                            |

The infospace-bench default is `release-evidence`. Override with
`--retention-class`. Run `artifactstore retention sweep` from the
`artifact-store` repo to mark expired packages eligible for deletion.
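
For callers that pass a retention class programmatically, a small validation step catches typos before a package is created. The sketch below is illustrative only: `RetentionClass` and `resolve_retention_class` are hypothetical names, not part of `artifact-store` or `infospace-bench`; only the five class strings and the default come from this document.

```python
from enum import Enum


class RetentionClass(str, Enum):
    """The five class names listed in the table above."""

    TRANSIENT = "transient"
    RAW_EVIDENCE = "raw-evidence"
    SUMMARY_EVIDENCE = "summary-evidence"
    RELEASE_EVIDENCE = "release-evidence"
    PERMANENT_RECORD = "permanent-record"


# The infospace-bench default named in this document.
DEFAULT_RETENTION_CLASS = RetentionClass.RELEASE_EVIDENCE


def resolve_retention_class(value=None):
    """Map an optional --retention-class string to a known class (hypothetical helper)."""
    if value is None:
        return DEFAULT_RETENTION_CLASS
    try:
        return RetentionClass(value)
    except ValueError:
        known = ", ".join(c.value for c in RetentionClass)
        raise ValueError(
            f"unknown retention class {value!r}; expected one of: {known}"
        ) from None
```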

## CLI usage

```bash
# Archive the current infospace (default include set)
infospace-bench archive infospaces/agentic-memory-profile-pilot \
  --note "Memory profile pilot v1 frozen"

# Custom include set
infospace-bench archive infospaces/lefevre \
  --include reports --include exports --include infospace.yaml \
  --retention-class summary-evidence

# List recorded archives
infospace-bench archive-list infospaces/agentic-memory-profile-pilot

# List with current retention state (eligibility, holds, expiry)
infospace-bench archive-list infospaces/agentic-memory-profile-pilot \
  --with-retention

# Restore an archive into a new directory
infospace-bench restore <package-id> \
  --target /tmp/restored-infospace \
  --from infospaces/agentic-memory-profile-pilot
```

## Storage location

By default, each infospace gets its own self-contained artifact-store under
`<infospace>/output/archives/.store/`:

```
output/archives/
  index.yaml              # human-readable archive record list
  .store/
    registry.sqlite       # artifact-store event log + materialised views
    storage/
      blake3/
        ab/
          cd/
            abcdef...     # content-addressed bytes
```
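
The layout suggests how a digest maps to a file on disk. The sketch below assumes two levels of two-character shard directories with the full hex digest as the file name, inferred only from the `ab/cd/abcdef...` example; the real `artifact-store` sharding scheme may differ, and `object_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

_HEX = set("0123456789abcdef")


def object_path(digest: str, storage_root: str = "storage") -> PurePosixPath:
    """Map '<algo>:<hex>' to '<root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>'.

    Assumed layout, inferred from the directory tree in this document.
    """
    algo, sep, hex_digest = digest.partition(":")
    if not sep or not algo or len(hex_digest) < 4 or not set(hex_digest) <= _HEX:
        raise ValueError(f"expected '<algo>:<lowercase hex>', got {digest!r}")
    return PurePosixPath(storage_root, algo, hex_digest[:2], hex_digest[2:4], hex_digest)
```

Sharding by digest prefix keeps any single directory from accumulating an unbounded number of entries, which is the usual reason content-addressed stores use this shape.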

To point at a different artifact-store deployment (shared host, separate
volume), pass `--store-root`, or run a shared artifact-store service and pass
its CLI / library handle in code. Future improvement: respect the standard
`ARTIFACTSTORE_*` environment variables so an operator can point any
infospace at a shared deployment without code changes. Today the in-process
helper builds a self-contained store; an `artifactstore.app.build_registry()`
adapter for that env-driven path is a small follow-up.

## Credentials policy

- Never write secrets (API keys, S3 access keys) into `infospace.yaml` or
  archive metadata. Archive metadata is part of the immutable manifest, so a
  secret written there cannot be removed afterwards.
- Backend secrets live with the artifact-store deployment
  (`ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:NAME` or
  `file:/run/secrets/...`), never inside the infospace.
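
The `env:NAME` and `file:/path` forms are indirect references: configuration stores where the secret lives, not the secret itself. A minimal sketch of how such a reference could be resolved follows. `resolve_secret_ref` is a hypothetical function written for illustration and is not the `artifact-store` API; stripping trailing whitespace from file contents is also an assumption.

```python
import os
from pathlib import Path


def resolve_secret_ref(ref: str, environ=os.environ) -> str:
    """Resolve 'env:NAME' or 'file:/path' to the secret value (hypothetical helper)."""
    scheme, sep, rest = ref.partition(":")
    if not sep or not rest:
        raise ValueError("secret reference must look like 'env:NAME' or 'file:/path'")
    if scheme == "env":
        try:
            return environ[rest]
        except KeyError:
            # Name the missing variable, never a secret value.
            raise LookupError(f"environment variable {rest} is not set") from None
    if scheme == "file":
        # Assumption: secret files may end with a newline, which is not part of the secret.
        return Path(rest).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported secret reference scheme: {scheme!r}")
```

Because only the reference string is ever configured, it can be logged or shown in diagnostics without exposing the credential.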

## Round-trip guarantees

- `restore_archive` re-materializes every file recorded in the package's
  manifest into the target directory, byte-equivalent to the originals.
- The manifest digest (`blake3:<hex>`) returned by `archive` is the stable
  external identifier; it survives store relocations.
- Restoration refuses to overwrite a non-empty target unless `--force` is
  passed. Pre-existing files not in the manifest are left in place.
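
The overwrite rule in the last bullet can be illustrated with a small guard. This is a sketch of the described behaviour under our own assumptions, not the code inside `restore_archive`; `prepare_restore_target` is a hypothetical name.

```python
from pathlib import Path


def prepare_restore_target(target, force: bool = False) -> Path:
    """Create the target directory, refusing a non-empty one unless force is set."""
    target = Path(target)
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"restore target is not a directory: {target}")
    if target.is_dir() and any(target.iterdir()) and not force:
        raise FileExistsError(
            f"restore target is not empty: {target} (pass force=True to proceed)"
        )
    # Existing files are deliberately left alone; only manifest files would be written.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

The guard never deletes anything, which matches the statement that pre-existing files outside the manifest stay in place.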

## What this is not

- Not a replacement for the local working folder during active work.
- Not a sync / replication channel between hosts. Use git or
  artifact-store's S3 backend (artifact-store WP-0004) for that.
- Not a backup strategy. Backups are an operations concern at the
  artifact-store deployment level.
- Not an S3 or git client inside `infospace-bench`. Those backends live in
  `artifact-store`.

## Related workplans

- [`IB-WP-0014`](../workplans/IB-WP-0014-infospace-backend-abstraction.md):
  this integration.
- [`IB-WP-0013`](../workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md):
  generation parity on the local working folder (archives capture its
  outputs).
- `artifact-store` WP-0004: S3-compatible / Ceph RGW backend; pointing
  infospace-bench archives at S3 will be an artifact-store configuration
  change only.