IB-WP-0014: archive integration with artifact-store (T01+T02)

Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive
surface via artifact-store". The live infospace stays in a local working folder;
finalized snapshots are bundled into content-addressed artifact-store packages.

- New module infospace_bench.archive: archive_infospace(), list_archives(),
  ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under
  output/archives/.store/ when no Registry is passed in.
- New output/archives/index.yaml records each archive event (package id,
  manifest digest, retention class, included paths, file count, note).
- artifactstore added as a path dep; Python floor bumped to 3.12 to match.
- Makefile for venv-based dev setup; stack-and-commands.md updated.
- tests/test_archive.py covers index write, list, recursive-capture guard,
  caller-supplied include, and empty-include error. Full suite 65 passed.

Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:30:49 +02:00
parent 673ed6e274
commit 36bfa33fb9
7 changed files with 628 additions and 85 deletions


@@ -1,77 +1,124 @@
---
id: IB-WP-0014
type: workplan
title: "Infospace Backend Abstraction"
title: "Infospace Archive Integration With artifact-store"
domain: markitect
repo: infospace-bench
status: todo
status: in_progress
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
updated: "2026-05-17"
state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
---
# IB-WP-0014 - Infospace Backend Abstraction
# IB-WP-0014 - Infospace Archive Integration With artifact-store
## Goal
Allow an infospace to live behind a selectable backend instead of assuming only
a local filesystem directory.
Target backends:
- local folder
- remote or mounted folder
- S3-compatible bucket/prefix
- git repository
This is a new successor capability, not legacy parity. It should be designed so
generation, validation, evaluation, and inspection logic do not care where the
infospace is physically stored.
Let a finalized infospace state (or a curated slice of it) be preserved as a
durable, content-addressed package through `artifact-store`, while the live
infospace continues to live in a local working folder.
## Intent
The current repo is intentionally file-backed. That should remain the default.
The improvement is to formalize the storage boundary so the same lifecycle and
workflow APIs can operate on other backing stores through explicit adapters.
The original framing of this workplan asked for a pluggable storage backend
(local, remote folder, S3, git) *inside* `infospace-bench`. A look at
`/home/worsch/artifact-store` shows that this is exactly the boundary the artifact-store
service is being built for: an immutable, content-addressed registry with
retention policy, holds, audit, manifests, and a pluggable storage adapter
SPI (local FS in v0.1, S3-compatible/Ceph RGW in WP-0004).
The design should keep `infospace-bench` as an application workspace, not a
durable storage engine. Credentials, remote locking, rich audit, and runtime
orchestration should be delegated or integrated carefully rather than invented
inside core application logic.
Re-inventing a second backend abstraction in `infospace-bench` would duplicate
that surface and tangle durable-storage concerns with the live infospace
working directory (which is read and rewritten repeatedly across many sessions
and therefore does not fit content-addressed immutability).
This workplan therefore replaces "selectable backend" with "durable archive
surface":
- The working infospace continues to live in a local folder. That stays the
only *working* storage form.
- A new `archive` capability bundles the infospace (or selected subdirs) into
an `artifact-store` package, finalizes it, and records the returned package
id and manifest digest inside the infospace.
- A `restore` capability re-materializes a previously archived state into a
target directory.
- Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the
configured artifact-store deployment, not implemented here.
## Non-Goals
- Replace the existing local folder behavior.
- Require S3 or git dependencies for ordinary local use.
- Store secrets in `infospace.yaml`.
- Build a general database, sync server, or object storage service inside this
repo.
- Solve multi-writer conflict resolution beyond clear detection and reporting
in the first pass.
- Replace the local working folder for live infospace operations.
- Re-implement S3, git, or any other storage backend inside `infospace-bench`.
- Make the live infospace content-addressed or immutable.
- Provide multi-writer concurrency control beyond what artifact-store offers.
- Ship a remote service. Integration is library-only via the `artifactstore`
Python package (path dep), wired in-process.
## Development setup
`artifactstore` brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3,
pydantic-settings, structlog) that are not on the system Python. Use the
repo Makefile to provision a local venv:
```bash
make install # creates .venv and installs path deps + pytest
make test # full suite
make test-archive # only the archive integration tests
```
The `.venv/` directory is gitignored.
## Architecture
```text
+----------------------------+ +-------------------------+
| live infospace (folder) | | artifact-store |
| - infospace.yaml | ==> | (library, in-process) |
| - artifacts/... |archive | - registry |
| - output/metrics/... | | - manifest |
| - reports/... | <== | - retention policy |
| - exports/... | restore| - storage backends |
| - output/archives/index | +-------------------------+
+----------------------------+
```
- The infospace remains the working source of truth for the live state.
- artifact-store owns durable storage, content hashing, manifest, retention,
audit, and backend selection.
- A new `output/archives/index.yaml` inside the infospace records every
archive event (package id, manifest digest, retention class, included
paths, note).
## Tasks
### T01 - Backend contract and URI model
### T01 - Archive contract and infospace metadata
```task
id: IB-WP-0014-T01
status: todo
status: in_progress
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
```
- Define a backend-neutral infospace location model
- Support local paths without changing current user flows
- Define URI examples for local, mounted folder, S3-compatible, and git-backed
infospaces
- Define backend capabilities: read, write, list, exists, atomic write,
digest, version, sync, lock, and credentials-required
- Document where credentials and remote configuration are allowed to live
- Add `artifactstore` as a path dependency on `/home/worsch/artifact-store`.
- Define an in-repo Python contract `infospace_bench.archive`:
- `archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord`
- `restore_archive(package_id, *, target, registry=None) -> RestoredArchive`
- `list_archives(root) -> list[ArchiveRecord]`
- Map default `retention_class` to `release-evidence`; allow any class the
registry exposes via `list_retention_classes()`.
- Default `include` set: `infospace.yaml`, `artifacts/`, `workflows/`,
`output/`, `reports/`, `exports/`. Allow caller-supplied include patterns.
- Document credentials policy: never write secrets into `infospace.yaml` or
archive metadata; backend secrets stay with the artifact-store deployment.
- Define `output/archives/index.yaml` schema: list of records with
`package_id`, `manifest_digest`, `retention_class`, `created_at`,
`included_paths`, `file_count`, `note`, `producer`, `subject`.
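
A minimal sketch of how one index record could be modelled, assuming a frozen dataclass. Only the field names and the `release-evidence` / `infospace-bench` values come from the schema and tasks above; the field types, the `to_index_entry` helper, and the example values are illustrative assumptions, not the shipped `ArchiveRecord`.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArchiveRecord:
    """One entry of output/archives/index.yaml (types are assumed)."""

    package_id: str
    manifest_digest: str
    retention_class: str
    created_at: str  # assumed to be an ISO 8601 timestamp string
    included_paths: tuple[str, ...]
    file_count: int
    note: str = ""
    producer: str = "infospace-bench"
    subject: str = ""

    def to_index_entry(self) -> dict:
        """Return a plain dict that a YAML writer can serialize (hypothetical helper)."""
        entry = asdict(self)
        # YAML writers generally expect lists rather than tuples.
        entry["included_paths"] = list(self.included_paths)
        return entry


# Placeholder values only; real ids and digests come from artifact-store.
example = ArchiveRecord(
    package_id="pkg-example",
    manifest_digest="digest-example",
    retention_class="release-evidence",
    created_at="2026-05-17T09:30:49+00:00",
    included_paths=("infospace.yaml", "artifacts/"),
    file_count=2,
    subject="demo-infospace",
)
```

Keeping the record immutable mirrors the append-only nature of the index: an archive event is written once and never edited.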
### T02 - Local and remote folder backend baseline
### T02 - Archive command and library implementation
```task
id: IB-WP-0014-T02
@@ -80,16 +127,28 @@ priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
```
- Refactor lifecycle reads and writes behind a backend adapter while preserving
current `Path`-based behavior
- Keep local folders as the default backend
- Treat mounted or remote folders as folder backends when the OS exposes them
as paths
- Add tests proving current pilots and CLI commands still work unchanged
- Add tests for backend errors such as missing files, write failures, and
unsafe paths
- Implement `archive_infospace` against `artifactstore.registry.Registry`:
- Create a package with producer=`infospace-bench`,
subject=`<infospace-slug>`, retention class as requested.
- Walk the include set; stream each file via `registry.ingest_file`.
- Finalize the package and capture the manifest digest.
- Append the new record to `output/archives/index.yaml`.
- Wire `infospace-bench archive <root>` in the CLI with flags
`--retention-class`, `--include`, `--note`, `--store-root`.
- Provide a `_build_default_registry(store_root)` helper that calls
`artifactstore.app.build_registry()` with overridden settings so the
default behavior is a self-contained store under
`<infospace>/output/archives/.store/` (SQLite + local FS). Honor
`ARTIFACTSTORE_*` env vars when set so operators can point at a shared
artifact-store deployment.
- Tests:
- Archiving a small infospace returns a stable record and writes index.
- Re-archiving the same content reuses content-addressed bytes (verifies
artifact-store dedup at the storage layer).
- Excluded paths are not ingested.
- Default-include path produces a non-empty package.
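
The "walk the include set" step can be sketched without touching artifact-store at all. The helper below is a hypothetical illustration (the name `collect_files` and its behaviour for missing entries are assumptions): it resolves the include entries to relative file paths, rejects an empty include set, and skips the self-bootstrapped store so an archive never captures itself.

```python
from pathlib import Path

# Default include set as listed in T01.
DEFAULT_INCLUDE = (
    "infospace.yaml", "artifacts/", "workflows/",
    "output/", "reports/", "exports/",
)
# The default store lives under output/, so it must be excluded from the walk.
STORE_DIR = Path("output/archives/.store")


def collect_files(root: Path, include=DEFAULT_INCLUDE) -> list[Path]:
    """Return sorted paths, relative to root, of the files selected by include."""
    if not include:
        raise ValueError("include set must not be empty")
    selected: set[Path] = set()
    for entry in include:
        base = root / entry
        if base.is_file():
            candidates = [base]
        elif base.is_dir():
            candidates = [p for p in base.rglob("*") if p.is_file()]
        else:
            continue  # assumption: missing include entries are skipped silently
        for path in candidates:
            relative = path.relative_to(root)
            if relative.is_relative_to(STORE_DIR):
                continue  # recursive-capture guard
            selected.add(relative)
    return sorted(selected)
```

Each returned path would then be handed to the registry for ingestion; that call is left out here because its exact signature belongs to artifact-store.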
### T03 - S3 object-store backend adapter
### T03 - Archive index and list command
```task
id: IB-WP-0014-T03
@@ -98,31 +157,36 @@ priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
```
- Design an optional S3-compatible backend adapter
- Use a fake in-memory or local test double for default tests
- Keep real credentials and network calls out of the default test suite
- Define object key layout for manifests, artifacts, reports, exports, and run
records
- Decide how digests, optimistic concurrency, and partial writes are reported
- Implement `list_archives(root)` reading `output/archives/index.yaml`.
- Wire `infospace-bench archive-list <root>` to print the records as a table,
or as JSON when `--json` is passed.
- Surface retention state when a registry is available: query
`get_retention_state(package_id)` and annotate each record with current
expiry and hold status.
- Tests for empty index, single-record index, and registry-augmented listing.
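
A rough sketch of the table and `--json` output paths, assuming the records have already been loaded from `output/archives/index.yaml` as plain dicts. The function name `format_archives`, the chosen columns, and the empty-index message are assumptions for illustration; retention annotations are omitted.

```python
import json


def format_archives(records: list[dict], as_json: bool = False) -> str:
    """Render archive records as JSON or as a fixed-column text table."""
    if as_json:
        return json.dumps(records, indent=2, sort_keys=True)
    if not records:
        return "no archives recorded"
    columns = ("package_id", "retention_class", "created_at", "file_count")
    rows = [[str(record.get(column, "")) for column in columns] for record in records]
    # Each column is as wide as its header or its longest cell.
    widths = [
        max(len(column), *(len(row[i]) for row in rows))
        for i, column in enumerate(columns)
    ]

    def render(cells) -> str:
        return "  ".join(cell.ljust(width) for cell, width in zip(cells, widths)).rstrip()

    return "\n".join([render(columns), *(render(row) for row in rows)])
```

Returning a string rather than printing keeps the formatter easy to cover in the empty-index and single-record tests.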
### T04 - Git repository backend adapter
### T04 - Restore command
```task
id: IB-WP-0014-T04
status: todo
priority: high
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
```
- Support opening or initializing an infospace backed by a git repository
- Prove behavior against local test repositories before any remote network
workflow
- Define when commits are created, when they are only suggested, and how dirty
trees are reported
- Keep automatic commits opt-in
- Preserve compatibility with the existing State Hub and workplan workflow
- Implement `restore_archive(package_id, *, target, registry=None)`:
- Fetch the finalized manifest via `registry.get_manifest_bytes(..., format="json")`.
- Iterate files, call `registry.get_file(file_id)`, stream bytes to
`<target>/<relative_path>`.
- Refuse to overwrite an existing non-empty target unless `--force` is set.
- Wire `infospace-bench restore <package-id> --target <dir>` in the CLI.
- Tests:
- Round-trip: archive an infospace, restore into a new directory, diff is
empty (modulo `output/archives/index.yaml` which is local).
- Restore refuses to overwrite a non-empty target.
- Restore by manifest digest also works (lookup via `list_packages`).
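
The overwrite guard and the file-writing step of restore can be sketched independently of the registry. Both helpers below are hypothetical: `prepare_target` models the `--force` rule, and `write_restored_file` streams byte chunks (however the registry yields them) to `<target>/<relative_path>`. The path-escape check is an added safety assumption, not something the task list requires.

```python
from pathlib import Path
from typing import Iterable


def prepare_target(target: Path, force: bool = False) -> Path:
    """Create the restore target, refusing a non-empty directory unless forced."""
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"restore target {target} is not a directory")
    if target.is_dir() and any(target.iterdir()) and not force:
        raise FileExistsError(f"restore target {target} is not empty; use force")
    target.mkdir(parents=True, exist_ok=True)
    return target.resolve()


def write_restored_file(target: Path, relative_path: str, chunks: Iterable[bytes]) -> Path:
    """Stream chunks into target/relative_path, rejecting paths that escape target."""
    destination = (target / relative_path).resolve()
    if not destination.is_relative_to(target.resolve()):
        raise ValueError(f"manifest path {relative_path!r} escapes the restore target")
    destination.parent.mkdir(parents=True, exist_ok=True)
    with destination.open("wb") as handle:
        for chunk in chunks:
            handle.write(chunk)
    return destination
```

Writing chunk by chunk avoids holding a whole archived file in memory, which matters once packages contain large exports.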
### T05 - Backend CLI docs and migration path
### T05 - Docs and operator notes
```task
id: IB-WP-0014-T05
@@ -131,26 +195,36 @@ priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
```
- Expose backend selection in CLI/API docs
- Add examples for local, mounted folder, S3-compatible, and git-backed
infospaces
- Document backend capabilities and limitations
- Add a migration guide for moving a local infospace to another backend
- Update acceptance docs so backend support is distinct from Wealth/VSM
generation parity
- Add `docs/archive-integration.md` covering:
- When to archive vs keep editing locally.
- How to point at a shared artifact-store deployment via `ARTIFACTSTORE_*`
env vars.
- Retention class selection guidance.
- Restore workflow.
- Cross-link from `SCOPE.md` and the relevant CLI help output.
- Note the explicit non-goal: infospace-bench does not implement S3 or git
backends; those live in artifact-store.
## Acceptance
- Existing local-folder behavior remains backward compatible
- Lifecycle, validation, inspection, workflow, metrics, history, and graph
commands can operate through the backend contract
- Default tests remain deterministic and do not require network credentials
- Backend-specific capabilities and failure modes are visible to callers
- S3 and git support are optional and clearly documented
- Storage backend concerns stay separate from generation workflow semantics
- `infospace-bench archive <root>` produces a finalized artifact-store package
and writes a new entry to `output/archives/index.yaml`.
- `infospace-bench archive-list <root>` lists recorded archives, with optional
retention annotations when the registry is reachable.
- `infospace-bench restore <package-id> --target <dir>` round-trips the
archived state byte-for-byte through artifact-store.
- The default-included file set covers the live infospace contract
(`infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`,
`exports/`).
- Default tests do not require network access or external credentials;
archive and restore round-trip against the local-FS backend.
- No S3, git, or remote-folder code exists inside `infospace-bench` after
this workplan.
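
One way to check the byte-for-byte round-trip criterion is to compare per-file digests of the original and restored trees. This is a sketch under assumptions: `tree_digests` is a hypothetical test helper, SHA-256 is used only for the comparison (it is not claimed to be the digest artifact-store uses), and the ignored prefixes reflect the local-only index and the default store location.

```python
import hashlib
from pathlib import Path

# Paths that exist only in the live infospace and are not expected after restore.
IGNORED_PREFIXES = ("output/archives/index.yaml", "output/archives/.store/")


def tree_digests(root: Path, ignored=IGNORED_PREFIXES) -> dict[str, str]:
    """Map each relative file path under root to the SHA-256 of its bytes."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        if relative.startswith(ignored):
            continue
        digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

A round-trip test would then assert `tree_digests(original) == tree_digests(restored)`, assuming the archive used the default include set.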
## Relationship To IB-WP-0013
## Relationship To Other Workplans
`IB-WP-0013` should prove generation parity on the default local backend first.
This workplan then makes the same infospace operations portable across storage
backends.
- `IB-WP-0013` proves generation parity on the local working folder. This
workplan adds durable preservation of those generated outputs.
- `artifact-store` WP-0004 will bring S3-compatible storage; once that
lands, pointing infospace-bench archives at S3 is purely an
artifact-store configuration change.