infospace-bench/docs/archive-integration.md
tegwick ddefd69f71 IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05)
Round out IB-WP-0014 with the remaining archive operations and docs.

- restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip
  a finalized package's bytes back to disk. Refuses to overwrite a non-empty
  target unless --force. --from <infospace-root> resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens the
  per-infospace registry and joins each record with its current retention
  state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done. Full pytest suite: 72 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:46:23 +02:00


Archive Integration With artifact-store

infospace-bench is an application workspace for live infospaces. The working state lives in a local folder that is read and written across many sessions. Durable, content-addressed preservation of finalized snapshots is delegated to artifact-store. artifact-store owns identity, manifests, retention policy, audit, and pluggable storage backends: the local filesystem today, with S3-compatible / Ceph RGW storage covered by artifact-store WP-0004.

This document is the operator-facing companion to workplan IB-WP-0014.

When to archive

Archive an infospace when:

  • A milestone has been reached (pilot complete, evaluations stable).
  • The infospace will be referenced from another system (StateHub linkage, release notes, audit evidence).
  • You want a recoverable point-in-time snapshot before a destructive change.
  • You need to share an exact, hash-verifiable copy of the state with someone else.

Do not archive as a substitute for a normal save or commit. Each archive creates a new immutable package, so archiving repeatedly without a clear purpose inflates the local store. Use git for in-flight working state.

What gets archived

By default, the archive includes:

  • infospace.yaml
  • artifacts/
  • workflows/
  • output/ (metrics, evaluations, run records, memory traces, ...)
  • reports/
  • exports/

Always excluded:

  • output/archives/.store/ (the artifact-store data directory; including it would make each archive capture the store that holds it)
  • output/archives/index.yaml (the archive record index itself is a local pointer, not part of the preserved snapshot)

Override the include / exclude sets with --include and --exclude (repeatable). Both accept relative paths or globs.
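The selection rules above can be sketched as a small file filter. This is a minimal sketch, not infospace-bench's implementation. It makes three assumptions. First, --include replaces the default set. Second, a pattern matches a path exactly, as a directory prefix, or as an fnmatch glob (where * also crosses /). Third, the two always-excluded paths win over any include. The names select_files and matches_any are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional, Sequence

# Default include set listed above (paths relative to the infospace root).
DEFAULT_INCLUDE = ["infospace.yaml", "artifacts", "workflows", "output", "reports", "exports"]
# Always excluded, even when --include / --exclude are given.
ALWAYS_EXCLUDE = ["output/archives/.store", "output/archives/index.yaml"]


def matches_any(rel: str, patterns: Sequence[str]) -> bool:
    """True if rel equals a pattern, sits under a pattern directory, or matches it as a glob."""
    for pattern in patterns:
        pattern = pattern.rstrip("/")
        if rel == pattern or rel.startswith(pattern + "/") or fnmatch(rel, pattern):
            return True
    return False


def select_files(root: Path, include: Optional[Sequence[str]] = None,
                 exclude: Sequence[str] = ()) -> list:
    """Return the sorted, POSIX-style relative paths of the files that would be archived."""
    include = list(include) if include else DEFAULT_INCLUDE  # assumed: --include replaces the default set
    exclude = ALWAYS_EXCLUDE + list(exclude)
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if matches_any(rel, include) and not matches_any(rel, exclude):
            selected.append(rel)
    return sorted(selected)
```

Checking exclusions last means a broad include such as output can never pull the store or the index back into the package.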

Retention class

artifact-store ships these retention classes:

Class               Typical use
------------------  ----------------------------------------
transient           Scratch outputs you only need briefly
raw-evidence        Untriaged raw run output
summary-evidence    Aggregated metrics / reports
release-evidence    Snapshots tied to a release or milestone
permanent-record    Never expires

The infospace-bench default is release-evidence; override it with --retention-class. To mark expired packages as eligible for deletion, run artifactstore retention sweep from the artifact-store repo.
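This document does not spell out how eligibility is decided. The sketch below assumes a common rule: a record becomes eligible once its expiry has passed and no hold is active. A record with no expiry, such as permanent-record, never becomes eligible. RetentionState and is_eligible_for_deletion are hypothetical names. The field set only mirrors what archive-list --with-retention reports (effective class, expires, holds, eligibility). Per-class durations are deliberately not hard-coded because they belong to artifact-store's policy.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class RetentionState:
    """Hypothetical per-record retention state, mirroring the fields --with-retention reports."""
    effective_class: str
    expires: Optional[datetime]  # None: the record never expires (e.g. permanent-record)
    holds: List[str] = field(default_factory=list)


def is_eligible_for_deletion(state: RetentionState, now: datetime) -> bool:
    """Assumed rule: eligible once expired and not under any hold."""
    if state.holds:  # any active hold blocks deletion
        return False
    if state.expires is None:  # no expiry date, so never eligible
        return False
    return now >= state.expires
```

The authoritative decision is made by artifact-store's retention sweep; a helper like this would only explain the state it reports.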

CLI usage

```bash
# Archive the current infospace (default include set)
infospace-bench archive infospaces/agentic-memory-profile-pilot \
  --note "Memory profile pilot v1 frozen"

# Custom include set
infospace-bench archive infospaces/lefevre \
  --include reports --include exports --include infospace.yaml \
  --retention-class summary-evidence

# List recorded archives
infospace-bench archive-list infospaces/agentic-memory-profile-pilot

# List with current retention state (eligibility, holds, expiry)
infospace-bench archive-list infospaces/agentic-memory-profile-pilot \
  --with-retention

# Restore an archive into a new directory
infospace-bench restore <package-id> \
  --target /tmp/restored-infospace \
  --from infospaces/agentic-memory-profile-pilot
```

Storage location

By default, each infospace gets its own self-contained artifact-store under <infospace>/output/archives/.store/:

```text
output/archives/
  index.yaml                    # human-readable archive record list
  .store/
    registry.sqlite             # artifact-store event log + materialised views
    storage/
      blake3/
        ab/
          cd/
            abcdef...           # content-addressed bytes
```
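The storage/ tree above is a two-level fan-out keyed on the content digest. The following sketch shows how such a layout can be derived, assuming the two directory levels are the first and second pairs of hex characters, as the ab/cd/abcdef... example suggests. One loud substitution applies: hashlib's BLAKE2b stands in for BLAKE3, which needs the third-party blake3 package, so the sketch writes under blake2b/ rather than blake3/. blob_path and put_blob are hypothetical names, not artifact-store's API.

```python
import hashlib
from pathlib import Path


def blob_path(storage_root: Path, algo: str, hex_digest: str) -> Path:
    """Two-level fan-out on the hex digest: <algo>/ab/cd/abcd..."""
    return storage_root / algo / hex_digest[:2] / hex_digest[2:4] / hex_digest


def put_blob(storage_root: Path, data: bytes) -> str:
    """Store data under its own digest and return an '<algo>:<hex>' reference.

    Stand-in: BLAKE2b from hashlib replaces BLAKE3; only the layout is illustrated.
    """
    algo = "blake2b"
    hex_digest = hashlib.blake2b(data, digest_size=32).hexdigest()
    path = blob_path(storage_root, algo, hex_digest)
    if not path.exists():  # identical bytes map to one path, so they are stored once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return f"{algo}:{hex_digest}"
```

The fan-out keeps any single directory from accumulating every blob. Because the path is a pure function of the bytes, archiving unchanged files again adds no new storage.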

To use a different artifact-store deployment (a shared host or a separate volume), pass --store-root, or run a shared artifact-store service and pass its CLI / library handle in code.

Planned improvement: honor the standard ARTIFACTSTORE_* environment variables so that an operator can point any infospace at a shared deployment without code changes. Today the in-process helper builds a self-contained store; an artifactstore.app.build_registry() adapter for that environment-driven path is a small follow-up.
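The environment-driven path described above is still a follow-up. The following is only a sketch of one plausible precedence order: an explicit --store-root, then an environment variable, then the per-infospace default under output/archives/.store. ARTIFACTSTORE_ROOT is a hypothetical variable name, since this document only refers to ARTIFACTSTORE_* generically. resolve_store_root is likewise a hypothetical helper, not the planned build_registry() adapter.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical variable name; the document only promises "ARTIFACTSTORE_*" variables.
ENV_VAR = "ARTIFACTSTORE_ROOT"


def resolve_store_root(infospace: Path, store_root: Optional[str] = None,
                       env: Mapping[str, str] = os.environ) -> Path:
    """Explicit --store-root wins, then the environment, then the self-contained default."""
    if store_root:
        return Path(store_root)
    if env.get(ENV_VAR):
        return Path(env[ENV_VAR])
    return infospace / "output" / "archives" / ".store"
```

Passing env explicitly keeps the function testable without mutating the real process environment.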

Credentials policy

  • Never write secrets (API keys, S3 access keys) into infospace.yaml or archive metadata. Archive metadata is part of the immutable manifest.
  • Backend secrets live with the artifact-store deployment (ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:NAME or file:/run/secrets/...) — never inside the infospace.
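The env:NAME / file:/path convention means deployment configuration stores only a reference, and the secret value is read at runtime. The sketch below resolves such a reference. It assumes the scheme is everything before the first colon and that secret files may end with a trailing newline. resolve_secret_ref is a hypothetical helper; artifact-store's own resolver may behave differently.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_secret_ref(ref: str, env: Mapping[str, str] = os.environ) -> str:
    """Resolve an 'env:NAME' or 'file:/path' reference to the secret value at runtime."""
    scheme, sep, target = ref.partition(":")
    if not sep or not target:
        raise ValueError(f"malformed secret reference: {ref!r}")
    if scheme == "env":
        try:
            return env[target]
        except KeyError:
            raise KeyError(f"environment variable {target} is not set") from None
    if scheme == "file":
        # Mounted secret files often end with a newline; strip it.
        return Path(target).read_text().rstrip("\n")
    raise ValueError(f"unsupported secret scheme: {scheme!r}")
```

Error messages name the reference, never the resolved value. This keeps secrets out of logs as well as out of infospace.yaml and archive metadata.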

Round-trip guarantees

  • restore_archive re-materializes every file recorded in the package's manifest into the target directory, byte-identical to the originals.
  • The manifest digest (blake3:<hex>) returned by infospace-bench archive is the stable external identifier; it survives store relocations.
  • Restoration refuses to overwrite a non-empty target unless --force is passed. With --force, pre-existing files that are not in the manifest are left in place.
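These guarantees can be illustrated with a short sketch; it is not restore_archive itself. The manifest is modelled as a hypothetical mapping of relative path to hex content digest, and fetch stands in for reading a blob from the store. As elsewhere, BLAKE2b from hashlib is a stand-in for BLAKE3. manifest_digest shows why the identifier survives relocation: it covers paths and content digests only, never storage locations. Its serialization format is an assumption, not artifact-store's manifest format.

```python
import hashlib
from pathlib import Path
from typing import Callable, List, Mapping


def manifest_digest(manifest: Mapping[str, str]) -> str:
    """Digest over sorted (relative path, content digest) pairs; no storage paths involved."""
    h = hashlib.blake2b(digest_size=32)  # stand-in for BLAKE3
    for rel, digest in sorted(manifest.items()):
        h.update(f"{rel}\0{digest}\n".encode("utf-8"))
    return h.hexdigest()


def restore_files(manifest: Mapping[str, str], fetch: Callable[[str], bytes],
                  target: Path, force: bool = False) -> List[str]:
    """Write each manifest entry (relative path -> hex content digest) under target."""
    target.mkdir(parents=True, exist_ok=True)
    if any(target.iterdir()) and not force:
        raise FileExistsError(f"{target} is not empty; pass force=True to overwrite")
    root = target.resolve()
    written = []
    for rel, expected in sorted(manifest.items()):
        dest = (target / rel).resolve()
        if not dest.is_relative_to(root):  # reject entries such as '../x' (Python 3.9+)
            raise ValueError(f"manifest path escapes target: {rel}")
        data = fetch(expected)
        if hashlib.blake2b(data, digest_size=32).hexdigest() != expected:
            raise ValueError(f"digest mismatch for {rel}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)  # files not in the manifest are never touched
        written.append(rel)
    return written
```

Re-hashing each blob before writing is what turns "byte-identical" from a promise into a check.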

What this is not

  • Not a replacement for the local working folder during active work.
  • Not a sync / replication channel between hosts. Use git or artifact-store's S3 backend (artifact-store WP-0004) for that.
  • Not a backup strategy. Backups are an operations concern at the artifact-store deployment level.
  • Not an S3 or git client inside infospace-bench. Those backends live in artifact-store.

Related workplans

  • IB-WP-0014: this integration.
  • IB-WP-0013: generation parity on the local working folder (archives capture its outputs).
  • artifact-store WP-0004: S3-compatible / Ceph RGW backend; pointing infospace-bench archives at S3 will then require only an artifact-store configuration change.