---
id: IB-WP-0014
type: workplan
title: "Infospace Archive Integration With artifact-store"
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-17"
state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
---

# IB-WP-0014 - Infospace Archive Integration With artifact-store

## Goal

Let a finalized infospace state (or a curated slice of it) be preserved as a
durable, content-addressed package through `artifact-store`, while the live
infospace continues to live in a local working folder.

## Intent

The original framing of this workplan asked for a pluggable storage backend
(local, remote folder, S3, git) *inside* `infospace-bench`. Looking at
`/home/worsch/artifact-store`, that is exactly the boundary the artifact-store
service is being built for: an immutable, content-addressed registry with
retention policy, holds, audit, manifests, and a pluggable storage adapter
SPI (local FS in v0.1, S3-compatible/Ceph RGW in WP-0004).

Re-inventing a second backend abstraction in `infospace-bench` would duplicate
that surface and tangle durable-storage concerns with the live infospace
working directory, which is read and rewritten across many sessions and is
therefore a poor fit for content-addressed immutability.

This workplan therefore replaces "selectable backend" with "durable archive
surface":

- The working infospace continues to live in a local folder. That stays the
  only *working* storage form.
- A new `archive` capability bundles the infospace (or selected subdirs) into
  an `artifact-store` package, finalizes it, and records the returned package
  id and manifest digest inside the infospace.
- A `restore` capability re-materializes a previously archived state into a
  target directory.
- Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the
  configured artifact-store deployment, not implemented here.

## Non-Goals

- Replace the local working folder for live infospace operations.
- Re-implement S3, git, or any other storage backend inside `infospace-bench`.
- Make the live infospace content-addressed or immutable.
- Provide multi-writer concurrency control beyond what artifact-store offers.
- Ship a remote service. Integration is library-only via the `artifactstore`
  Python package (path dep), wired in-process.

## Development setup

`artifactstore` brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3,
pydantic-settings, structlog) that are not on the system Python. Use the
repo Makefile to provision a local venv:

```bash
make install       # creates .venv and installs path deps + pytest
make test          # full suite
make test-archive  # only the archive integration tests
```

The `.venv/` directory is gitignored.

## Architecture

```text
+----------------------------+         +-------------------------+
| live infospace (folder)    |         | artifact-store          |
|  - infospace.yaml          |   ==>   | (library, in-process)   |
|  - artifacts/...           | archive |  - registry             |
|  - output/metrics/...      |         |  - manifest             |
|  - reports/...             |   <==   |  - retention policy     |
|  - exports/...             | restore |  - storage backends     |
|  - output/archives/index   |         +-------------------------+
+----------------------------+
```

- The infospace remains the working source of truth for the live state.
- artifact-store owns durable storage, content hashing, manifest, retention,
  audit, and backend selection.
- A new `output/archives/index.yaml` inside the infospace records every
  archive event (package id, manifest digest, retention class, included
  paths, note).

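
To illustrate why content hashing sits on the artifact-store side of this
boundary, the following is a minimal sketch of content addressing, not
artifact-store's actual storage layer. `TinyContentStore` is a hypothetical
in-memory class, and SHA-256 from the standard library stands in for whatever
digest artifact-store really uses.

```python
import hashlib


class TinyContentStore:
    """Toy in-memory store keyed by content digest (illustration only)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 is a stdlib stand-in for the real digest function.
        digest = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so a second put stores nothing new.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = TinyContentStore()
first = store.put(b"metrics: 42\n")
second = store.put(b"metrics: 42\n")  # same content, e.g. a re-archive
third = store.put(b"metrics: 43\n")   # edited content gets a new address
```

Because an address is derived from the bytes, a stored blob can never be
edited in place. That suits finalized archives and is the reason the live
working folder stays outside this model.
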
## Tasks

### T01 - Archive contract and infospace metadata

```task
id: IB-WP-0014-T01
status: done
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
```

- Add `artifactstore` as a path dependency on `/home/worsch/artifact-store`.
- Define an in-repo Python contract `infospace_bench.archive`:
  - `archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord`
  - `restore_archive(package_id, *, target, registry=None) -> RestoredArchive`
  - `list_archives(root) -> list[ArchiveRecord]`
- Map default `retention_class` to `release-evidence`; allow any class the
  registry exposes via `list_retention_classes()`.
- Default `include` set: `infospace.yaml`, `artifacts/`, `workflows/`,
  `output/`, `reports/`, `exports/`. Allow caller-supplied include patterns.
- Document credentials policy: never write secrets into `infospace.yaml` or
  archive metadata; backend secrets stay with the artifact-store deployment.
- Define `output/archives/index.yaml` schema: list of records with
  `package_id`, `manifest_digest`, `retention_class`, `created_at`,
  `included_paths`, `file_count`, `note`, `producer`, `subject`.

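
The index schema above could map onto a record type along these lines. This
is a sketch only: the field names come from the schema, but the types, the
defaults, and the helper `record_to_index_entry` are assumptions and may not
match the real `infospace_bench.archive` module.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArchiveRecord:
    """One entry of output/archives/index.yaml (types and defaults assumed)."""

    package_id: str
    manifest_digest: str
    retention_class: str = "release-evidence"  # default class named in T01
    created_at: str = ""                       # assumed to be an ISO-8601 string
    included_paths: tuple[str, ...] = ()
    file_count: int = 0
    note: str = ""
    producer: str = "infospace-bench"
    subject: str = ""


def record_to_index_entry(record: ArchiveRecord) -> dict:
    """Hypothetical helper: turn a record into a plain mapping for the index."""
    entry = asdict(record)
    # Tuples are converted to lists so the entry serializes as a plain sequence.
    entry["included_paths"] = list(record.included_paths)
    return entry


example = ArchiveRecord(
    package_id="pkg-1",
    manifest_digest="abc123",
    included_paths=("infospace.yaml", "artifacts/"),
    file_count=2,
)
entry = record_to_index_entry(example)
```

A frozen dataclass matches the append-only nature of the index: an archive
event, once recorded, describes an immutable package.
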
### T02 - Archive command and library implementation

```task
id: IB-WP-0014-T02
status: done
priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
```

- Implement `archive_infospace` against `artifactstore.registry.Registry`:
  - Create a package with producer=`infospace-bench`,
    subject=`<infospace-slug>`, retention class as requested.
  - Walk the include set; stream each file via `registry.ingest_file`.
  - Finalize the package and capture the manifest digest.
  - Append the new record to `output/archives/index.yaml`.
- Wire `infospace-bench archive <root>` in the CLI with flags
  `--retention-class`, `--include`, `--note`, `--store-root`.
- Provide a `_build_default_registry(store_root)` helper that calls
  `artifactstore.app.build_registry()` with overridden settings so the
  default behavior is a self-contained store under
  `<infospace>/output/archives/.store/` (SQLite + local FS). Honor
  `ARTIFACTSTORE_*` env vars when set so operators can point at a shared
  artifact-store deployment.
- Tests:
  - Archiving a small infospace returns a stable record and writes index.
  - Re-archiving the same content reuses content-addressed bytes (verifies
    artifact-store dedup at the storage layer).
  - Excluded paths are not ingested.
  - Default-include path produces a non-empty package.

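
The "walk the include set" step might look roughly like the sketch below. It
covers only file discovery and makes no calls into `artifactstore`. The helper
name `collect_files`, the treatment of include entries as plain relative paths
(rather than glob patterns), and the decision to skip the embedded `.store/`
directory are all assumptions made for illustration.

```python
from pathlib import Path

DEFAULT_INCLUDE = (
    "infospace.yaml", "artifacts/", "workflows/",
    "output/", "reports/", "exports/",
)
# Assumption: the self-contained store lives inside output/ and is skipped
# so that an archive does not ingest its own storage.
STORE_DIR = Path("output/archives/.store")


def collect_files(root, include=DEFAULT_INCLUDE):
    """Return sorted root-relative paths of every file under the include set."""
    root = Path(root)
    found = set()
    for entry in include:
        path = root / entry
        if path.is_file():
            found.add(path.relative_to(root))
        elif path.is_dir():
            for child in path.rglob("*"):
                if not child.is_file():
                    continue
                rel = child.relative_to(root)
                if STORE_DIR not in rel.parents:
                    found.add(rel)
        # Entries that do not exist are silently skipped.
    return sorted(found)
```

Returning a sorted list keeps the ingest order deterministic, which helps the
"stable record" test: the same tree always yields the same file sequence.
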
### T03 - Archive index and list command

```task
id: IB-WP-0014-T03
status: done
priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
```

- Implement `list_archives(root)` reading `output/archives/index.yaml`.
- Wire `infospace-bench archive-list <root>` to print the records as a table
  and optionally `--json`.
- Surface retention state when a registry is available: query
  `get_retention_state(package_id)` and annotate each record with current
  expiry and hold status.
- Tests for empty index, single-record index, and registry-augmented listing.

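
As a rough illustration of the two output modes of `archive-list`, the sketch
below renders already-loaded index records as a plain-text table or as JSON.
The function names, the column selection, and the empty-index message are
hypothetical; the real command may format its output differently.

```python
import json

# Assumption: a subset of the index fields is enough for the table view.
COLUMNS = ("package_id", "retention_class", "created_at", "file_count", "note")


def format_archive_table(records: list[dict]) -> str:
    """Render index records as a left-aligned, space-padded table."""
    if not records:
        return "no archives recorded"
    rows = [[str(rec.get(col, "")) for col in COLUMNS] for rec in records]
    # Each column is as wide as its header or its longest cell.
    widths = [
        max(len(col), *(len(row[i]) for row in rows))
        for i, col in enumerate(COLUMNS)
    ]
    lines = ["  ".join(col.ljust(w) for col, w in zip(COLUMNS, widths))]
    lines += ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
    return "\n".join(line.rstrip() for line in lines)


def format_archive_json(records: list[dict]) -> str:
    """Render the same records for a --json style output."""
    return json.dumps(records, indent=2, sort_keys=True)
```

Keeping formatting separate from index loading means the empty-index and
single-record cases can be tested without touching the filesystem or a
registry.
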
### T04 - Restore command

```task
id: IB-WP-0014-T04
status: done
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
```

- Implement `restore_archive(package_id, *, target, registry)`:
  - Fetch the finalized manifest via `registry.get_manifest_bytes(..., format="json")`.
  - Iterate files, call `registry.get_file(file_id)`, stream bytes to
    `<target>/<relative_path>`.
  - Refuse to overwrite an existing non-empty target unless `--force` is set.
- Wire `infospace-bench restore <package-id> --target <dir>` in the CLI.
- Tests:
  - Round-trip: archive an infospace, restore into a new directory, diff is
    empty (modulo `output/archives/index.yaml` which is local).
  - Restore refuses to overwrite a non-empty target.
  - Restore by manifest digest also works (lookup via `list_packages`).

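
The target-directory handling in restore could be sketched as follows. Both
helper names are hypothetical, the bytes are written in one call rather than
streamed, and the check that a manifest path stays inside the target is an
added precaution that the task list does not mention.

```python
from pathlib import Path


def prepare_target(target, *, force: bool = False) -> Path:
    """Create the target directory, refusing a non-empty one unless forced."""
    target = Path(target)
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"{target} exists and is not a directory")
    if target.is_dir() and any(target.iterdir()) and not force:
        raise FileExistsError(f"{target} is not empty; pass force=True to overwrite")
    target.mkdir(parents=True, exist_ok=True)
    return target


def write_restored_file(target: Path, relative_path: str, data: bytes) -> Path:
    """Write one restored file to <target>/<relative_path>."""
    base = Path(target).resolve()
    destination = (base / relative_path).resolve()
    # Assumption: paths that would escape the target directory are rejected.
    if not destination.is_relative_to(base):
        raise ValueError(f"{relative_path!r} escapes the restore target")
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(data)
    return destination
```

`Path.is_relative_to` needs Python 3.9 or newer. A real implementation would
copy in chunks from the file object the registry returns instead of holding
each file in memory.
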
### T05 - Docs and operator notes

```task
id: IB-WP-0014-T05
status: done
priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
```

- Add `docs/archive-integration.md` covering:
  - When to archive vs keep editing locally.
  - How to point at a shared artifact-store deployment via `ARTIFACTSTORE_*`
    env vars.
  - Retention class selection guidance.
  - Restore workflow.
- Cross-link from `SCOPE.md` and the relevant CLI help output.
- Note the explicit non-goal: infospace-bench does not implement S3 or git
  backends; those live in artifact-store.

## Acceptance

- `infospace-bench archive <root>` produces a finalized artifact-store package
  and writes a new entry to `output/archives/index.yaml`.
- `infospace-bench archive-list <root>` lists recorded archives, with optional
  retention annotations when the registry is reachable.
- `infospace-bench restore <package-id> --target <dir>` round-trips the
  archived state byte-for-byte through artifact-store.
- The default-included file set covers the live infospace contract
  (`infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`,
  `exports/`).
- Default tests do not require network access or external credentials;
  archive and restore round-trip against the local-FS backend.
- No S3, git, or remote-folder code exists inside `infospace-bench` after
  this workplan.

## Relationship To Other Workplans

- `IB-WP-0013` proves generation parity on the local working folder. This
  workplan adds durable preservation of those generated outputs.
- `artifact-store` WP-0004 will bring S3-compatible storage; once that
  lands, pointing infospace-bench archives at S3 is purely an
  artifact-store configuration change.