---
id: IB-WP-0014
type: workplan
title: "Infospace Archive Integration With artifact-store"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-17"
state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
---

# IB-WP-0014 - Infospace Archive Integration With artifact-store

## Goal

Let a finalized infospace state (or a curated slice of it) be preserved as a
durable, content-addressed package through `artifact-store`, while the live
infospace continues to live in a local working folder.

## Intent

The original framing of this workplan asked for a pluggable storage backend
(local, remote folder, S3, git) *inside* `infospace-bench`. Looking at
`/home/worsch/artifact-store`, that is exactly the boundary the artifact-store
service is being built for: an immutable, content-addressed registry with
retention policy, holds, audit, manifests, and a pluggable storage adapter SPI
(local FS in v0.1, S3-compatible/Ceph RGW in WP-0004).

Re-inventing a second backend abstraction in `infospace-bench` would duplicate
that surface and tangle durable-storage concerns with the live infospace working
directory (which is read-write-read-write across many sessions and is not a fit
for content-addressed immutability).

This workplan therefore replaces "selectable backend" with "durable archive
surface":

- The working infospace continues to live in a local folder. That stays the
  only *working* storage form.
- A new `archive` capability bundles the infospace (or selected subdirs) into an
  `artifact-store` package, finalizes it, and records the returned package id
  and manifest digest inside the infospace.
- A `restore` capability re-materializes a previously archived state into a
  target directory.
- Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the configured
  artifact-store deployment, not implemented here.
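
To make "the infospace (or selected subdirs)" concrete, the following is a minimal sketch of how an include set might be expanded into a file list before anything is handed to artifact-store. The function name `iter_included_files` is hypothetical and not part of the planned contract, and the sketch makes no artifact-store calls; it only illustrates the selection step under the assumption that include entries are plain file or directory names relative to the infospace root.

```python
from pathlib import Path
from typing import Iterable, Iterator


def iter_included_files(root: Path, include: Iterable[str]) -> Iterator[str]:
    """Yield POSIX-style paths, relative to root, for every selected file.

    Hypothetical helper: each include entry names either a single file or a
    directory that is walked recursively. Entries that do not exist are
    skipped, and a file reachable through two entries is yielded once.
    """
    seen = set()
    for entry in include:
        path = root / entry
        if path.is_file():
            candidates = [path]
        elif path.is_dir():
            # Sort so the resulting order is deterministic across runs.
            candidates = sorted(p for p in path.rglob("*") if p.is_file())
        else:
            candidates = []  # entry is absent from this infospace
        for candidate in candidates:
            rel = candidate.relative_to(root).as_posix()
            if rel not in seen:
                seen.add(rel)
                yield rel
```

Called as `list(iter_included_files(Path("my-infospace"), ["infospace.yaml", "artifacts/"]))`, it would return only files under those two entries, so anything outside the include set never reaches the archive step.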

## Non-Goals

- Replace the local working folder for live infospace operations.
- Re-implement S3, git, or any other storage backend inside `infospace-bench`.
- Make the live infospace content-addressed or immutable.
- Provide multi-writer concurrency control beyond what artifact-store offers.
- Ship a remote service. Integration is library-only via the `artifactstore`
  Python package (path dep), wired in-process.

## Development setup

`artifactstore` brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3,
pydantic-settings, structlog) that are not on the system Python. Use the repo
Makefile to provision a local venv:

```bash
make install         # creates .venv and installs path deps + pytest
make test            # full suite
make test-archive    # only the archive integration tests
```

The `.venv/` directory is gitignored.

## Architecture

```text
+----------------------------+        +-------------------------+
| live infospace (folder)    |        | artifact-store          |
|  - infospace.yaml          |  ==>   | (library, in-process)   |
|  - artifacts/...           |archive |  - registry             |
|  - output/metrics/...      |        |  - manifest             |
|  - reports/...             |  <==   |  - retention policy     |
|  - exports/...             | restore|  - storage backends     |
|  - output/archives/index   |        +-------------------------+
+----------------------------+
```

- The infospace remains the working source of truth for the live state.
- artifact-store owns durable storage, content hashing, manifest, retention,
  audit, and backend selection.
- A new `output/archives/index.yaml` inside the infospace records every archive
  event (package id, manifest digest, retention class, included paths, note).

## Tasks

### T01 - Archive contract and infospace metadata

```task
id: IB-WP-0014-T01
status: done
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
```

- Add `artifactstore` as a path dependency on `/home/worsch/artifact-store`.
- Define an in-repo Python contract `infospace_bench.archive`:
  - `archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord`
  - `restore_archive(package_id, *, target, registry=None) -> RestoredArchive`
  - `list_archives(root) -> list[ArchiveRecord]`
- Map default `retention_class` to `release-evidence`; allow any class the
  registry exposes via `list_retention_classes()`.
- Default `include` set: `infospace.yaml`, `artifacts/`, `workflows/`,
  `output/`, `reports/`, `exports/`. Allow caller-supplied include patterns.
- Document credentials policy: never write secrets into `infospace.yaml` or
  archive metadata; backend secrets stay with the artifact-store deployment.
- Define `output/archives/index.yaml` schema: list of records with
  `package_id`, `manifest_digest`, `retention_class`, `created_at`,
  `included_paths`, `file_count`, `note`, `producer`, `subject`.

### T02 - Archive command and library implementation

```task
id: IB-WP-0014-T02
status: done
priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
```

- Implement `archive_infospace` against `artifactstore.registry.Registry`:
  - Create a package with producer=`infospace-bench`, subject=``, retention
    class as requested.
  - Walk the include set; stream each file via `registry.ingest_file`.
  - Finalize the package and capture the manifest digest.
  - Append the new record to `output/archives/index.yaml`.
- Wire `infospace-bench archive ` in the CLI with flags `--retention-class`,
  `--include`, `--note`, `--store-root`.
- Provide a `_build_default_registry(store_root)` helper that calls
  `artifactstore.app.build_registry()` with overridden settings so the default
  behavior is a self-contained store under `/output/archives/.store/` (SQLite +
  local FS). Honor `ARTIFACTSTORE_*` env vars when set so operators can point at
  a shared artifact-store deployment.
- Tests:
  - Archiving a small infospace returns a stable record and writes index.
  - Re-archiving the same content reuses content-addressed bytes (verifies
    artifact-store dedup at the storage layer).
  - Excluded paths are not ingested.
  - Default-include path produces a non-empty package.

### T03 - Archive index and list command

```task
id: IB-WP-0014-T03
status: done
priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
```

- Implement `list_archives(root)` reading `output/archives/index.yaml`.
- Wire `infospace-bench archive-list ` to print the records as a table and
  optionally `--json`.
- Surface retention state when a registry is available: query
  `get_retention_state(package_id)` and annotate each record with current expiry
  and hold status.
- Tests for empty index, single-record index, and registry-augmented listing.

### T04 - Restore command

```task
id: IB-WP-0014-T04
status: done
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
```

- Implement `restore_archive(package_id, *, target, registry)`:
  - Fetch the finalized manifest via
    `registry.get_manifest_bytes(..., format="json")`.
  - Iterate files, call `registry.get_file(file_id)`, stream bytes to `/`.
  - Refuse to overwrite an existing non-empty target unless `--force` is set.
- Wire `infospace-bench restore --target ` in the CLI.
- Tests:
  - Round-trip: archive an infospace, restore into a new directory, diff is
    empty (modulo `output/archives/index.yaml` which is local).
  - Restore refuses to overwrite a non-empty target.
  - Restore by manifest digest also works (lookup via `list_packages`).

### T05 - Docs and operator notes

```task
id: IB-WP-0014-T05
status: done
priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
```

- Add `docs/archive-integration.md` covering:
  - When to archive vs keep editing locally.
  - How to point at a shared artifact-store deployment via `ARTIFACTSTORE_*`
    env vars.
  - Retention class selection guidance.
  - Restore workflow.
- Cross-link from `SCOPE.md` and the relevant CLI help output.
- Note the explicit non-goal: infospace-bench does not implement S3 or git
  backends; those live in artifact-store.

## Acceptance

- `infospace-bench archive ` produces a finalized artifact-store package and
  writes a new entry to `output/archives/index.yaml`.
- `infospace-bench archive-list ` lists recorded archives, with optional
  retention annotations when the registry is reachable.
- `infospace-bench restore --target ` round-trips the archived state
  byte-for-byte through artifact-store.
- The default-included file set covers the live infospace contract
  (`infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`,
  `exports/`).
- Default tests do not require network access or external credentials; archive
  and restore round-trip against the local-FS backend.
- No S3, git, or remote-folder code exists inside `infospace-bench` after this
  workplan.

## Relationship To Other Workplans

- `IB-WP-0013` proves generation parity on the local working folder. This
  workplan adds durable preservation of those generated outputs.
- `artifact-store` WP-0004 will bring S3-compatible storage; once that lands,
  pointing infospace-bench archives at S3 is purely an artifact-store
  configuration change.
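
The byte-for-byte round-trip criterion under Acceptance (and the T04 round-trip test, which allows `output/archives/index.yaml` to differ because it is local) could be checked with a helper along the following lines. This is only a sketch: `tree_digests` and `roundtrip_differences` are hypothetical names, SHA-256 is used purely as a comparison aid and says nothing about the hash artifact-store uses internally, and the sketch assumes both directories contain only the archived file set.

```python
import hashlib
from pathlib import Path

# Paths that are expected to differ between source and restored trees.
LOCAL_ONLY = frozenset({"output/archives/index.yaml"})


def tree_digests(root: Path, ignore=LOCAL_ONLY) -> dict:
    """Map each file's POSIX relative path to the SHA-256 hex digest of its bytes."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel in ignore:
            continue
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def roundtrip_differences(source: Path, restored: Path) -> list:
    """Return sorted relative paths missing on one side or differing in content."""
    a, b = tree_digests(source), tree_digests(restored)
    return sorted(rel for rel in a.keys() | b.keys() if a.get(rel) != b.get(rel))
```

An empty list from `roundtrip_differences(source, restored)` corresponds to the "diff is empty" expectation; any returned path points at a file that was lost, added, or altered during the round trip.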