IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05)
Round out IB-WP-0014 with the remaining archive operations and docs.

- restore_archive() and `infospace-bench restore <pkg> --target <dir>`
  round-trip a finalized package's bytes back to disk. Refuses to
  overwrite a non-empty target unless --force. --from <infospace-root>
  resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens
  the per-infospace registry and joins each record with its current
  retention state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done.

Full pytest suite: 72 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
151
docs/archive-integration.md
Normal file
@@ -0,0 +1,151 @@
# Archive Integration With artifact-store

`infospace-bench` is an application workspace for *live* infospaces. The
working state lives in a local folder and is read and rewritten across many
sessions. Durable, content-addressed preservation of finalized snapshots is
delegated to [`artifact-store`](file:///home/worsch/artifact-store), which
owns identity, manifests, retention policy, audit, and pluggable storage
backends (local FS today, S3-compatible / Ceph RGW in artifact-store WP-0004).

This document is the operator-facing companion to workplan
[`IB-WP-0014`](../workplans/IB-WP-0014-infospace-backend-abstraction.md).

## When to archive

Archive an infospace when:

- A milestone has been reached (pilot complete, evaluations stable).
- The infospace will be referenced from another system (StateHub linkage,
  release notes, audit evidence).
- You want a recoverable point-in-time snapshot before a destructive change.
- You need to share an exact, hash-verifiable copy of the state with someone
  else.

Do **not** archive as a substitute for normal save / commit. Each archive
creates a new immutable package, so archiving habitually with no specific
purpose inflates the local store. Use git for in-flight working state.

## What gets archived

By default, the archive includes:

- `infospace.yaml`
- `artifacts/`
- `workflows/`
- `output/` (metrics, evaluations, run records, memory traces, ...)
- `reports/`
- `exports/`

Always excluded:

- `output/archives/.store/` (the artifact-store data dir; including it would
  cause recursive capture)
- `output/archives/index.yaml` (the archive record index itself is a local
  pointer, not part of the preserved snapshot)

Override the include / exclude sets with `--include` and `--exclude`
(repeatable). Both accept relative paths or globs.
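
To illustrate how the include and exclude sets could interact, here is a minimal Python sketch. `select_files`, `DEFAULT_INCLUDE`, and `ALWAYS_EXCLUDE` are hypothetical names, not the infospace-bench API. The matching rule is also an assumption: a pattern matches a path if it equals it, names a parent directory of it, or matches it as an `fnmatch` glob. The real CLI may resolve globs differently.

```python
from fnmatch import fnmatch
from pathlib import Path

# Defaults copied from the lists in this section.
DEFAULT_INCLUDE = ["infospace.yaml", "artifacts", "workflows", "output", "reports", "exports"]
# Assumption: these exclusions still apply when the caller overrides the sets.
ALWAYS_EXCLUDE = ["output/archives/.store", "output/archives/index.yaml"]


def _matches(rel: str, pattern: str) -> bool:
    """True if rel equals the pattern, lies beneath it, or matches it as a glob."""
    pattern = pattern.rstrip("/")
    return rel == pattern or rel.startswith(pattern + "/") or fnmatch(rel, pattern)


def select_files(root: Path, include=None, exclude=None) -> list[str]:
    """Return sorted POSIX-style relative paths of the files to archive."""
    include = list(include or DEFAULT_INCLUDE)
    exclude = list(exclude or []) + ALWAYS_EXCLUDE
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if any(_matches(rel, p) for p in exclude):
            continue  # exclusions win over inclusions
        if any(_matches(rel, p) for p in include):
            selected.append(rel)
    return sorted(selected)
```

In this sketch exclusions are checked first, so the store directory can never be captured even if a caller passes `--include output`.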

## Retention class

`artifact-store` ships these retention classes:

| Class              | Typical use                              |
|--------------------|------------------------------------------|
| `transient`        | Scratch outputs you only need briefly    |
| `raw-evidence`     | Untriaged raw run output                 |
| `summary-evidence` | Aggregated metrics / reports             |
| `release-evidence` | Snapshots tied to a release or milestone |
| `permanent-record` | Never expires                            |

The infospace-bench default is `release-evidence`. Override with
`--retention-class`. Run `artifactstore retention sweep` from the
`artifact-store` repo to mark expired packages eligible for deletion.
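
The retention state reported per archive (effective class, expiry, holds, eligibility) can be pictured with the sketch below. The `RetentionState` dataclass and the rule in `is_eligible` are assumptions made for illustration: the sketch treats a package as eligible for deletion only when its expiry has passed and no hold is active. artifact-store's actual policy engine and field names may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RetentionState:
    """Hypothetical shape of one package's retention state."""
    effective_class: str
    expires: Optional[datetime]  # None models "never expires"
    holds: list[str] = field(default_factory=list)


def is_eligible(state: RetentionState, now: datetime) -> bool:
    """Assumed rule: the package has expired and is not under any hold."""
    if state.expires is None or state.holds:
        return False
    return now >= state.expires


if __name__ == "__main__":
    expired = RetentionState("transient", datetime(2024, 1, 1, tzinfo=timezone.utc))
    print(is_eligible(expired, datetime.now(timezone.utc)))
```

Under this assumed rule, a `permanent-record` package would carry `expires=None` and therefore never become eligible.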

## CLI usage

```bash
# Archive the current infospace (default include set)
infospace-bench archive infospaces/agentic-memory-profile-pilot \
  --note "Memory profile pilot v1 frozen"

# Custom include set
infospace-bench archive infospaces/lefevre \
  --include reports --include exports --include infospace.yaml \
  --retention-class summary-evidence

# List recorded archives
infospace-bench archive-list infospaces/agentic-memory-profile-pilot

# List with current retention state (eligibility, holds, expiry)
infospace-bench archive-list infospaces/agentic-memory-profile-pilot \
  --with-retention

# Restore an archive into a new directory
infospace-bench restore <package-id> \
  --target /tmp/restored-infospace \
  --from infospaces/agentic-memory-profile-pilot
```

## Storage location

By default, each infospace gets its own self-contained artifact-store under
`<infospace>/output/archives/.store/`:

```
output/archives/
  index.yaml                # human-readable archive record list
  .store/
    registry.sqlite         # artifact-store event log + materialised views
    storage/
      blake3/
        ab/
          cd/
            abcdef...       # content-addressed bytes
```
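
The sharded layout in the tree above can be sketched as a small path-mapping function. This is an illustration only: `object_path` is a hypothetical helper, and it assumes that the two directory levels are the first two byte pairs of the hex digest and that the file is named by the full digest, which is how the tree reads but is not confirmed by artifact-store itself.

```python
from pathlib import PurePosixPath

_HEX = set("0123456789abcdef")


def object_path(digest: str) -> PurePosixPath:
    """Map '<algo>:<hex>' to storage/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    algo, sep, hex_digest = digest.partition(":")
    if not sep or len(hex_digest) < 4 or not set(hex_digest) <= _HEX:
        raise ValueError(f"expected '<algo>:<lowercase hex>', got {digest!r}")
    # Two levels of sharding keep any single directory from growing too large.
    return PurePosixPath("storage", algo, hex_digest[:2], hex_digest[2:4], hex_digest)


if __name__ == "__main__":
    print(object_path("blake3:abcdef0123"))
```

Because the path is derived purely from the digest, moving the store root does not change how an object is located within it.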

To point at a different artifact-store deployment (shared host, separate
volume), pass `--store-root`, or run a shared artifact-store service and pass
its CLI / library handle in code.

Future improvement: respect the standard `ARTIFACTSTORE_*` environment
variables so an operator can point any infospace at a shared deployment
without code changes. Today the in-process helper builds a self-contained
store; an `artifactstore.app.build_registry()` adapter for the env-driven
path is a small follow-up.

## Credentials policy

- Never write secrets (API keys, S3 access keys) into `infospace.yaml` or
  archive metadata. Archive metadata is part of the immutable manifest, so
  anything written there cannot be redacted later.
- Backend secrets live with the artifact-store deployment
  (`ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:NAME` or
  `file:/run/secrets/...`), never inside the infospace.
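
The `env:NAME` and `file:/path` reference forms keep the secret itself out of any configuration that might be archived. A minimal sketch of how such a reference could be resolved at runtime follows. `resolve_secret_ref` is a hypothetical helper written for this document, not artifact-store's implementation, and stripping trailing whitespace from file contents is an assumption.

```python
import os
from pathlib import Path


def resolve_secret_ref(ref: str) -> str:
    """Resolve an 'env:NAME' or 'file:/path' reference to the secret value."""
    scheme, sep, target = ref.partition(":")
    if not sep or not target:
        raise ValueError("secret reference must look like 'env:NAME' or 'file:/path'")
    if scheme == "env":
        try:
            return os.environ[target]
        except KeyError:
            raise LookupError(f"environment variable {target} is not set") from None
    if scheme == "file":
        # Assumption: secret files may end with a newline that is not part of the value.
        return Path(target).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported secret reference scheme: {scheme!r}")
```

Only the reference string would ever appear in deployment configuration; the value is read at the moment it is needed and is never written back.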

## Round-trip guarantees

- `restore_archive` re-materializes every file recorded in the package's
  manifest into the target directory, byte-equivalent to the originals.
- The manifest digest (`blake3:<hex>`) returned by `archive` is the stable
  external identifier; it survives store relocations.
- Restoration refuses to overwrite a non-empty target unless `--force` is
  passed. With `--force`, pre-existing files not in the manifest are left in
  place.
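
The overwrite guard described above can be sketched as follows. This is not the real `restore_archive` signature: `restore_files` is a hypothetical stand-in that takes the manifest as an in-memory mapping of relative path to bytes, whereas the actual function reads packages from the store.

```python
from pathlib import Path


def restore_files(files: dict[str, bytes], target: Path, force: bool = False) -> None:
    """Write manifest entries under target, refusing a non-empty target by default."""
    if target.exists() and any(target.iterdir()) and not force:
        raise FileExistsError(f"{target} is not empty; pass force=True to proceed")
    for rel, payload in files.items():
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Only paths named in the manifest are written; other files are untouched.
        dest.write_bytes(payload)
```

Writing raw bytes, rather than decoded text, is what makes the byte-equivalence claim straightforward to check: a restored file can be compared directly against the original.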

## What this is not

- Not a replacement for the local working folder during active work.
- Not a sync / replication channel between hosts. Use git or
  artifact-store's S3 backend (artifact-store WP-0004) for that.
- Not a backup strategy. Backups are an operations concern at the
  artifact-store deployment level.
- Not an S3 or git client inside `infospace-bench`. Those backends live in
  `artifact-store`.

## Related workplans

- [`IB-WP-0014`](../workplans/IB-WP-0014-infospace-backend-abstraction.md):
  this integration.
- [`IB-WP-0013`](../workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md):
  generation parity on the local working folder (archives capture its
  outputs).
- `artifact-store` WP-0004: S3-compatible / Ceph RGW backend; pointing
  infospace-bench archives at S3 will be an artifact-store configuration
  change only.