generated from coulomb/repo-seed
Round out IB-WP-0014 with the remaining archive operations and docs.

- restore_archive() and `infospace-bench restore <pkg> --target <dir>`
  round-trip a finalized package's bytes back to disk. Refuses to overwrite
  a non-empty target unless --force. --from <infospace-root> resolves the
  store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens
  the per-infospace registry and joins each record with its current
  retention state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done.

Full pytest suite: 72 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
152 lines
5.8 KiB
Markdown
# Archive Integration With artifact-store

`infospace-bench` is an application workspace for *live* infospaces. The
working state lives in a local folder and is read and written repeatedly
across many sessions. Durable, content-addressed preservation of finalized
snapshots is delegated to [`artifact-store`](file:///home/worsch/artifact-store),
which owns identity, manifests, retention policy, audit, and pluggable storage
backends (local FS today, S3-compatible / Ceph RGW in artifact-store WP-0004).

This document is the operator-facing companion to workplan
[`IB-WP-0014`](../workplans/IB-WP-0014-infospace-backend-abstraction.md).

## When to archive

Archive an infospace when:

- A milestone has been reached (pilot complete, evaluations stable).
- The infospace will be referenced from another system (StateHub linkage,
  release notes, audit evidence).
- You want a recoverable point-in-time snapshot before a destructive change.
- You need to share an exact, hash-verifiable copy of the state with someone
  else.

Do **not** archive as a substitute for normal save / commit. Each archive
creates a new immutable package, so a long run of archives taken without a
specific purpose will inflate the local store. Use git for in-flight working
state.

## What gets archived

By default, the archive includes:

- `infospace.yaml`
- `artifacts/`
- `workflows/`
- `output/` (metrics, evaluations, run records, memory traces, ...)
- `reports/`
- `exports/`

Always excluded:

- `output/archives/.store/` (the artifact-store data dir; including it would
  cause recursive capture)
- `output/archives/index.yaml` (the archive record index itself is a local
  pointer, not part of the preserved snapshot)

Override the include / exclude sets with `--include` and `--exclude`
(repeatable). Both accept relative paths or globs.

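
The include / exclude rules above can be pictured with a small Python sketch. It is not the `infospace-bench` implementation: `select_files`, `_matches`, and the exact matching semantics (a pattern matches a path if it equals it, is a parent directory of it, or matches it as an `fnmatch` glob) are assumptions made for illustration only.

```python
from fnmatch import fnmatch
from pathlib import Path

# Defaults copied from the lists above; the helper functions are hypothetical.
DEFAULT_INCLUDE = ["infospace.yaml", "artifacts", "workflows", "output", "reports", "exports"]
ALWAYS_EXCLUDE = ["output/archives/.store", "output/archives/index.yaml"]


def _matches(rel: str, pattern: str) -> bool:
    """True if rel equals the pattern, lies under it, or matches it as a glob."""
    pattern = pattern.rstrip("/")
    return rel == pattern or rel.startswith(pattern + "/") or fnmatch(rel, pattern)


def select_files(root, include=None, exclude=None):
    """Return sorted POSIX-style relative paths of the files that would be captured."""
    root = Path(root)
    include = list(include or DEFAULT_INCLUDE)
    # Assumption: the always-excluded paths apply even when include is overridden.
    exclude = ALWAYS_EXCLUDE + list(exclude or [])
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if any(_matches(rel, p) for p in include) and not any(_matches(rel, p) for p in exclude):
            selected.append(rel)
    return sorted(selected)
```

Under these assumptions, `select_files(root, include=["reports", "*.yaml"])` would pick up `infospace.yaml` and everything under `reports/`, while `output/archives/index.yaml` stays out because the fixed exclusions are checked last.
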
## Retention class

`artifact-store` ships these retention classes:

| Class              | Typical use                              |
|--------------------|------------------------------------------|
| `transient`        | Scratch outputs you only need briefly    |
| `raw-evidence`     | Untriaged raw run output                 |
| `summary-evidence` | Aggregated metrics / reports             |
| `release-evidence` | Snapshots tied to a release or milestone |
| `permanent-record` | Never expires                            |

The infospace-bench default is `release-evidence`. Override with
`--retention-class`. Run `artifactstore retention sweep` from the
`artifact-store` repo to mark expired packages eligible for deletion.

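
As a rough mental model of how expiry, holds, and eligibility might relate, here is a minimal Python sketch. The `RetentionState` record and the rule in `is_eligible_for_deletion` are assumptions for illustration; the real logic and retention periods belong to `artifact-store` and are not specified in this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RetentionState:
    """Hypothetical record shape; not the artifact-store schema."""
    retention_class: str
    expires: Optional[datetime] = None  # None means no expiry is set
    holds: list = field(default_factory=list)


def is_eligible_for_deletion(state: RetentionState, now: Optional[datetime] = None) -> bool:
    """Assumed rule: expired, not held, and not a permanent record."""
    now = now or datetime.now(timezone.utc)
    if state.retention_class == "permanent-record":
        return False  # the table above says this class never expires
    if state.holds:
        return False  # assumption: any active hold blocks deletion
    return state.expires is not None and state.expires <= now
```

The point of the sketch is only that eligibility is a derived state: a package whose expiry has passed can still be protected by a hold, and a `permanent-record` package is never a deletion candidate.
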
## CLI usage

```bash
# Archive the current infospace (default include set)
infospace-bench archive infospaces/agentic-memory-profile-pilot \
  --note "Memory profile pilot v1 frozen"

# Custom include set
infospace-bench archive infospaces/lefevre \
  --include reports --include exports --include infospace.yaml \
  --retention-class summary-evidence

# List recorded archives
infospace-bench archive-list infospaces/agentic-memory-profile-pilot

# List with current retention state (eligibility, holds, expiry)
infospace-bench archive-list infospaces/agentic-memory-profile-pilot \
  --with-retention

# Restore an archive into a new directory
infospace-bench restore <package-id> \
  --target /tmp/restored-infospace \
  --from infospaces/agentic-memory-profile-pilot
```

## Storage location

By default, each infospace gets its own self-contained artifact-store under
`<infospace>/output/archives/.store/`:

```
output/archives/
  index.yaml              # human-readable archive record list
  .store/
    registry.sqlite       # artifact-store event log + materialised views
    storage/
      blake3/
        ab/
          cd/
            abcdef...     # content-addressed bytes
```

To point at a different artifact-store deployment (shared host, separate
volume), pass `--store-root`, or run a shared artifact-store service and pass
its CLI / library handle in code. Future improvement: respect the standard
`ARTIFACTSTORE_*` environment variables so an operator can point any
infospace at a shared deployment without code changes. Today the in-process
helper builds a self-contained store; an `artifactstore.app.build_registry()`
adapter for that env-driven path is a small follow-up.

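
The content-addressed layout in the tree above can be sketched as a mapping from a digest to a path. This is an illustration only: `blob_path` is a hypothetical helper, and the two-level fan-out on the first four hex characters is inferred from the `ab/cd/abcdef...` example rather than taken from the artifact-store code.

```python
from pathlib import PurePosixPath


def blob_path(digest: str, store_root: str = "output/archives/.store") -> PurePosixPath:
    """Map 'blake3:<hex>' (or bare hex) to <store_root>/storage/<algo>/<aa>/<bb>/<hex>."""
    algo, _, hexdigest = digest.rpartition(":")
    algo = algo or "blake3"  # assumption: a bare hex digest is a BLAKE3 digest
    if len(hexdigest) < 4 or any(c not in "0123456789abcdef" for c in hexdigest):
        raise ValueError(f"not a lowercase hex digest: {digest!r}")
    # Fan out on the first two byte pairs so no single directory grows too large.
    return PurePosixPath(store_root, "storage", algo, hexdigest[:2], hexdigest[2:4], hexdigest)
```

Because the path depends only on the digest, the same bytes always land at the same location, which is what makes a store relocation a plain directory move.
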
## Credentials policy

- Never write secrets (API keys, S3 access keys) into `infospace.yaml` or
  archive metadata. Archive metadata is part of the immutable manifest.
- Backend secrets live with the artifact-store deployment
  (`ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:NAME` or
  `file:/run/secrets/...`), never inside the infospace.

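
To illustrate why the `env:NAME` / `file:...` reference style keeps secrets out of the infospace, here is a minimal Python sketch of a resolver. `resolve_secret_ref` is a hypothetical helper written for this document, not an artifact-store function; only the two reference forms come from the policy above.

```python
import os
from pathlib import Path


def resolve_secret_ref(ref: str) -> str:
    """Resolve 'env:NAME' or 'file:/path' to the secret value (hypothetical helper)."""
    scheme, sep, target = ref.partition(":")
    if not sep or not target:
        raise ValueError("expected 'env:NAME' or 'file:/path'")
    if scheme == "env":
        try:
            return os.environ[target]
        except KeyError:
            raise LookupError(f"environment variable {target} is not set") from None
    if scheme == "file":
        # Strip the trailing newline that secret files commonly carry.
        return Path(target).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported reference scheme: {scheme!r}")
```

The configuration only ever stores the reference string, so nothing sensitive can leak into `infospace.yaml` or an immutable manifest; the value is looked up at the deployment at the moment it is needed.
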
## Round-trip guarantees

- `restore_archive` re-materializes every file recorded in the package's
  manifest into the target directory, byte-equivalent to the originals.
- The manifest digest (`blake3:<hex>`) returned by `archive` is the stable
  external identifier; it survives store relocations.
- Restoration refuses to overwrite a non-empty target unless `--force` is
  passed. Pre-existing files not in the manifest are left in place.

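
The overwrite guard described above can be sketched in a few lines of Python. This is not the `restore_archive` implementation: `prepare_target` and `write_manifest_files` are hypothetical names, and the manifest is modelled as a plain mapping of relative path to bytes purely for illustration.

```python
from pathlib import Path


def prepare_target(target, force=False):
    """Create the target if needed; refuse a non-empty one unless force is set."""
    target = Path(target)
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"{target} exists and is not a directory")
    if target.is_dir() and any(target.iterdir()) and not force:
        raise FileExistsError(f"{target} is not empty; pass force=True to restore into it")
    target.mkdir(parents=True, exist_ok=True)
    return target


def write_manifest_files(target, files, force=False):
    """Write {relative path: bytes} into target; other existing files are left untouched."""
    target = prepare_target(target, force=force)
    for rel, data in files.items():
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
    return target
```

In this model, a forced restore overwrites only the paths named in the manifest, which matches the statement that pre-existing files outside the manifest stay in place.
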
## What this is not

- Not a replacement for the local working folder during active work.
- Not a sync / replication channel between hosts. Use git or
  artifact-store's S3 backend (artifact-store WP-0004) for that.
- Not a backup strategy. Backups are an operations concern at the
  artifact-store deployment level.
- Not an S3 or git client inside `infospace-bench`. Those backends live in
  `artifact-store`.

## Related workplans

- [`IB-WP-0014`](../workplans/IB-WP-0014-infospace-backend-abstraction.md):
  this integration.
- [`IB-WP-0013`](../workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md):
  generation parity on the local working folder (archives capture its
  outputs).
- `artifact-store` WP-0004: S3-compatible / Ceph RGW backend. Pointing
  infospace-bench archives at S3 will be an artifact-store configuration
  change only.