infospace-bench/workplans/IB-WP-0014-infospace-backend-abstraction.md
tegwick 7825608307 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T05: todo → done
2026-05-17 11:52:20 +02:00

---
id: IB-WP-0014
type: workplan
title: "Infospace Archive Integration With artifact-store"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-17"
state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
---
# IB-WP-0014 - Infospace Archive Integration With artifact-store
## Goal
Let a finalized infospace state (or a curated slice of it) be preserved as a
durable, content-addressed package through `artifact-store`, while the live
infospace continues to live in a local working folder.
## Intent
The original framing of this workplan asked for a pluggable storage backend
(local, remote folder, S3, git) *inside* `infospace-bench`. That is exactly
the boundary the artifact-store service in `/home/worsch/artifact-store` is
being built for: an immutable, content-addressed registry with retention
policy, holds, audit, manifests, and a pluggable storage adapter SPI (local FS
in v0.1, S3-compatible/Ceph RGW in WP-0004).

Re-inventing a second backend abstraction in `infospace-bench` would duplicate
that surface and tangle durable-storage concerns with the live infospace
working directory, which is read and rewritten across many sessions and so is
not a fit for content-addressed immutability.

This workplan therefore replaces "selectable backend" with "durable archive
surface":
- The working infospace continues to live in a local folder. That stays the
only *working* storage form.
- A new `archive` capability bundles the infospace (or selected subdirs) into
an `artifact-store` package, finalizes it, and records the returned package
id and manifest digest inside the infospace.
- A `restore` capability re-materializes a previously archived state into a
target directory.
- Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the
configured artifact-store deployment, not implemented here.
## Non-Goals
- Replace the local working folder for live infospace operations.
- Re-implement S3, git, or any other storage backend inside `infospace-bench`.
- Make the live infospace content-addressed or immutable.
- Provide multi-writer concurrency control beyond what artifact-store offers.
- Ship a remote service. Integration is library-only via the `artifactstore`
Python package (path dep), wired in-process.
## Development setup
`artifactstore` brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3,
pydantic-settings, structlog) that are not on the system Python. Use the
repo Makefile to provision a local venv:
```bash
make install # creates .venv and installs path deps + pytest
make test # full suite
make test-archive # only the archive integration tests
```
The `.venv/` directory is gitignored.
## Architecture
```text
+----------------------------+         +-------------------------+
| live infospace (folder)    |         | artifact-store          |
|  - infospace.yaml          |   ==>   | (library, in-process)   |
|  - artifacts/...           | archive |  - registry             |
|  - output/metrics/...      |         |  - manifest             |
|  - reports/...             |   <==   |  - retention policy     |
|  - exports/...             | restore |  - storage backends     |
|  - output/archives/index   |         +-------------------------+
+----------------------------+
```
- The infospace remains the working source of truth for the live state.
- artifact-store owns durable storage, content hashing, manifest, retention,
audit, and backend selection.
- A new `output/archives/index.yaml` inside the infospace records every
archive event (package id, manifest digest, retention class, included
paths, note).
## Tasks
### T01 - Archive contract and infospace metadata
```task
id: IB-WP-0014-T01
status: done
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
```
- Add `artifactstore` as a path dependency on `/home/worsch/artifact-store`.
- Define an in-repo Python contract `infospace_bench.archive`:
  - `archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord`
  - `restore_archive(package_id, *, target, registry=None) -> RestoredArchive`
  - `list_archives(root) -> list[ArchiveRecord]`
- Default `retention_class` to `release-evidence`; allow any class the
registry exposes via `list_retention_classes()`.
- Default `include` set: `infospace.yaml`, `artifacts/`, `workflows/`,
`output/`, `reports/`, `exports/`. Allow caller-supplied include patterns.
- Document credentials policy: never write secrets into `infospace.yaml` or
archive metadata; backend secrets stay with the artifact-store deployment.
- Define `output/archives/index.yaml` schema: list of records with
`package_id`, `manifest_digest`, `retention_class`, `created_at`,
`included_paths`, `file_count`, `note`, `producer`, `subject`.
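
The index schema above could be modeled roughly as follows. This is a minimal sketch, not the shipped `ArchiveRecord`: the field names come from the schema list, while the field types, the defaults, and the `to_index_entry` helper are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """ISO 8601 timestamp in UTC (assumed format for created_at)."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ArchiveRecord:
    """One entry of output/archives/index.yaml (types are assumptions)."""

    package_id: str
    manifest_digest: str
    retention_class: str = "release-evidence"  # default named in this task
    created_at: str = field(default_factory=_utc_now)
    included_paths: tuple[str, ...] = ()
    file_count: int = 0
    note: str = ""
    producer: str = "infospace-bench"
    subject: str = ""

    def to_index_entry(self) -> dict:
        """Hypothetical helper: plain dict for a YAML or JSON serializer."""
        entry = asdict(self)
        entry["included_paths"] = list(self.included_paths)
        return entry
```

Keeping the record frozen reflects that an archive event is a fact about a finalized package and should not be edited after it is appended to the index.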
### T02 - Archive command and library implementation
```task
id: IB-WP-0014-T02
status: done
priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
```
- Implement `archive_infospace` against `artifactstore.registry.Registry`:
  - Create a package with producer=`infospace-bench`,
    subject=`<infospace-slug>`, retention class as requested.
  - Walk the include set; stream each file via `registry.ingest_file`.
  - Finalize the package and capture the manifest digest.
  - Append the new record to `output/archives/index.yaml`.
- Wire `infospace-bench archive <root>` in the CLI with flags
`--retention-class`, `--include`, `--note`, `--store-root`.
- Provide a `_build_default_registry(store_root)` helper that calls
`artifactstore.app.build_registry()` with overridden settings so the
default behavior is a self-contained store under
`<infospace>/output/archives/.store/` (SQLite + local FS). Honor
`ARTIFACTSTORE_*` env vars when set so operators can point at a shared
artifact-store deployment.
- Tests:
  - Archiving a small infospace returns a stable record and writes the index.
  - Re-archiving the same content reuses content-addressed bytes (verifies
    artifact-store dedup at the storage layer).
  - Excluded paths are not ingested.
  - Default-include path produces a non-empty package.
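
The "walk the include set" step might look like the sketch below. It deliberately stops short of calling `registry.ingest_file`, whose signature is not given here. The function name `iter_included_files` is hypothetical, include entries are treated as plain files or directory prefixes rather than glob patterns, and the `exclude_prefixes` default is an assumption added so that the self-contained store under `output/archives/` is not ingested into its own package.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Iterator

# Default include set as listed in T01.
DEFAULT_INCLUDE = (
    "infospace.yaml",
    "artifacts/",
    "workflows/",
    "output/",
    "reports/",
    "exports/",
)


def iter_included_files(
    root: Path,
    include: Iterable[str] = DEFAULT_INCLUDE,
    exclude_prefixes: Iterable[str] = ("output/archives/",),  # assumption
) -> Iterator[Path]:
    """Yield each selected file once, as a path relative to root."""
    excluded = tuple(exclude_prefixes)
    seen: set[Path] = set()
    for entry in include:
        base = root / entry
        if base.is_file():
            candidates = [base]
        elif base.is_dir():
            # Sort for a deterministic ingest order across runs.
            candidates = sorted(p for p in base.rglob("*") if p.is_file())
        else:
            continue  # a missing include entry is skipped, not an error
        for path in candidates:
            rel = path.relative_to(root)
            if rel in seen or rel.as_posix().startswith(excluded):
                continue
            seen.add(rel)
            yield rel
```

A deterministic walk order is what makes the "stable record" test practical: two archives of the same tree see the same files in the same sequence.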
### T03 - Archive index and list command
```task
id: IB-WP-0014-T03
status: done
priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
```
- Implement `list_archives(root)` reading `output/archives/index.yaml`.
- Wire `infospace-bench archive-list <root>` to print the records as a table
and optionally `--json`.
- Surface retention state when a registry is available: query
`get_retention_state(package_id)` and annotate each record with current
expiry and hold status.
- Tests for empty index, single-record index, and registry-augmented listing.
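
The table and `--json` output of `archive-list` could be rendered along these lines. This is a sketch under assumptions: the records are taken to be dicts already loaded from `output/archives/index.yaml`, the function name `render_archives`, the column selection, and the empty-index message are all hypothetical, and the retention annotation via `get_retention_state` is left out because its return shape is not specified here.

```python
from __future__ import annotations

import json

# Assumed column choice; the index schema carries more fields than shown.
COLUMNS = ("package_id", "retention_class", "created_at", "file_count", "note")


def render_archives(records: list[dict], *, as_json: bool = False) -> str:
    """Format index records as JSON or as a fixed-width text table."""
    if as_json:
        return json.dumps(records, indent=2, sort_keys=True)
    if not records:
        return "no archives recorded"
    rows = [[str(record.get(col, "")) for col in COLUMNS] for record in records]
    # Each column is as wide as its header or its longest value.
    widths = [
        max(len(col), *(len(row[i]) for row in rows))
        for i, col in enumerate(COLUMNS)
    ]
    lines = ["  ".join(col.ljust(w) for col, w in zip(COLUMNS, widths))]
    for row in rows:
        lines.append("  ".join(val.ljust(w) for val, w in zip(row, widths)))
    return "\n".join(line.rstrip() for line in lines)
```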
### T04 - Restore command
```task
id: IB-WP-0014-T04
status: done
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
```
- Implement `restore_archive(package_id, *, target, registry)`:
  - Fetch the finalized manifest via `registry.get_manifest_bytes(..., format="json")`.
  - Iterate files, call `registry.get_file(file_id)`, stream bytes to
    `<target>/<relative_path>`.
  - Refuse to overwrite an existing non-empty target unless `--force` is set.
- Wire `infospace-bench restore <package-id> --target <dir>` in the CLI.
- Tests:
  - Round-trip: archive an infospace, restore into a new directory, diff is
    empty (modulo `output/archives/index.yaml` which is local).
  - Restore refuses to overwrite a non-empty target.
  - Restore by manifest digest also works (lookup via `list_packages`).
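
The target-directory handling for restore can be sketched without touching the registry. Both helper names below are hypothetical. The check that a relative path does not escape the target is an extra precaution that this workplan does not state, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def prepare_target(target: Path, *, force: bool = False) -> Path:
    """Create target, refusing a non-empty directory unless force is set."""
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"restore target is not a directory: {target}")
    if target.is_dir() and any(target.iterdir()) and not force:
        raise FileExistsError(f"restore target is not empty: {target}")
    target.mkdir(parents=True, exist_ok=True)
    return target.resolve()


def write_restored_file(
    target: Path, relative_path: str, chunks: Iterable[bytes]
) -> Path:
    """Stream chunks to <target>/<relative_path> and return the written path."""
    base = target.resolve()
    destination = (base / relative_path).resolve()
    # Assumption: manifest paths are untrusted, so reject "../" escapes.
    if not destination.is_relative_to(base):
        raise ValueError(f"path escapes restore target: {relative_path}")
    destination.parent.mkdir(parents=True, exist_ok=True)
    with destination.open("wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
    return destination
```

In the real implementation the `chunks` iterable would presumably be fed from `registry.get_file(file_id)`; how that call exposes bytes is not specified in this workplan.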
### T05 - Docs and operator notes
```task
id: IB-WP-0014-T05
status: done
priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
```
- Add `docs/archive-integration.md` covering:
  - When to archive vs keep editing locally.
  - How to point at a shared artifact-store deployment via `ARTIFACTSTORE_*`
    env vars.
  - Retention class selection guidance.
  - Restore workflow.
- Cross-link from `SCOPE.md` and the relevant CLI help output.
- Note the explicit non-goal: infospace-bench does not implement S3 or git
backends; those live in artifact-store.
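
For the operator notes, the choice between the self-contained store and a shared deployment could be summarized as in the sketch below. It is illustrative only: `resolve_store` is a hypothetical helper, the precedence (explicit `--store-root` first, then any `ARTIFACTSTORE_*` variable, then the default under the infospace) is an assumption, and no specific variable names are implied.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping, Optional

ENV_PREFIX = "ARTIFACTSTORE_"


def resolve_store(
    infospace_root: Path,
    store_root: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> dict:
    """Describe which store configuration would apply (assumed precedence)."""
    if store_root is not None:
        # An explicit --store-root wins over everything else (assumption).
        return {"mode": "explicit", "store_root": Path(store_root)}
    env_keys = sorted(key for key in environ if key.startswith(ENV_PREFIX))
    if env_keys:
        # Defer to the artifact-store settings supplied by the operator.
        return {"mode": "environment", "env_keys": env_keys}
    # Default from T02: a self-contained store inside the infospace.
    default_root = infospace_root / "output" / "archives" / ".store"
    return {"mode": "default", "store_root": default_root}
```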
## Acceptance
- `infospace-bench archive <root>` produces a finalized artifact-store package
and writes a new entry to `output/archives/index.yaml`.
- `infospace-bench archive-list <root>` lists recorded archives, with optional
retention annotations when the registry is reachable.
- `infospace-bench restore <package-id> --target <dir>` round-trips the
archived state byte-for-byte through artifact-store.
- The default-included file set covers the live infospace contract
(`infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`,
`exports/`).
- Default tests do not require network access or external credentials;
archive and restore round-trip against the local-FS backend.
- No S3, git, or remote-folder code exists inside `infospace-bench` after
this workplan.
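
The byte-for-byte round-trip criterion can be checked by comparing per-file digests of the source and restored trees. The sketch below is one way to do that. The helper names are hypothetical, SHA-256 from the standard library stands in for whatever hash artifact-store uses internally, and `output/archives/index.yaml` is ignored because it is local to each infospace.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable

# Paths that are local to an infospace and excluded from the comparison.
LOCAL_ONLY = ("output/archives/index.yaml",)


def tree_digests(root: Path, ignore_prefixes: Iterable[str] = LOCAL_ONLY) -> dict:
    """Map each relative file path under root to its SHA-256 hex digest."""
    ignored = tuple(ignore_prefixes)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel.startswith(ignored):
            continue
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(source: Path, restored: Path) -> dict:
    """Report files missing from, extra in, or changed in the restored tree."""
    left, right = tree_digests(source), tree_digests(restored)
    return {
        "missing": sorted(left.keys() - right.keys()),
        "extra": sorted(right.keys() - left.keys()),
        "changed": sorted(k for k in left.keys() & right.keys() if left[k] != right[k]),
    }
```

A round-trip passes when all three lists are empty.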
## Relationship To Other Workplans
- `IB-WP-0013` proves generation parity on the local working folder. This
workplan adds durable preservation of those generated outputs.
- `artifact-store` WP-0004 will bring S3-compatible storage; once that
lands, pointing infospace-bench archives at S3 is purely an
artifact-store configuration change.