generated from coulomb/repo-seed
IB-WP-0014: archive integration with artifact-store (T01+T02)
Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive surface via artifact-store". The live infospace stays in a local working folder; finalized snapshots are bundled into content-addressed artifact-store packages.

- New module infospace_bench.archive: archive_infospace(), list_archives(), ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under output/archives/.store/ when no Registry is passed in.
- New output/archives/index.yaml records each archive event (package id, manifest digest, retention class, included paths, file count, note).
- artifactstore added as a path dep; Python floor bumped to 3.12 to match.
- Makefile for venv-based dev setup; stack-and-commands.md updated.
- tests/test_archive.py covers index write, list, recursive-capture guard, caller-supplied include, and empty-include error. Full suite 65 passed.

Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -1,77 +1,124 @@
---
id: IB-WP-0014
type: workplan
title: "Infospace Backend Abstraction"
title: "Infospace Archive Integration With artifact-store"
domain: markitect
repo: infospace-bench
status: todo
status: in_progress
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
updated: "2026-05-17"
state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
---

# IB-WP-0014 - Infospace Backend Abstraction
# IB-WP-0014 - Infospace Archive Integration With artifact-store

## Goal

Allow an infospace to live behind a selectable backend instead of assuming only
a local filesystem directory.

Target backends:

- local folder
- remote or mounted folder
- S3-compatible bucket/prefix
- git repository

This is a new successor capability, not legacy parity. It should be designed so
generation, validation, evaluation, and inspection logic do not care where the
infospace is physically stored.
Let a finalized infospace state (or a curated slice of it) be preserved as a
durable, content-addressed package through `artifact-store`, while the live
infospace continues to live in a local working folder.

## Intent

The current repo is intentionally file-backed. That should remain the default.
The improvement is to formalize the storage boundary so the same lifecycle and
workflow APIs can operate on other backing stores through explicit adapters.
The original framing of this workplan asked for a pluggable storage backend
(local, remote folder, S3, git) *inside* `infospace-bench`. Looking at
`/home/worsch/artifact-store`, that is exactly the boundary the artifact-store
service is being built for: an immutable, content-addressed registry with
retention policy, holds, audit, manifests, and a pluggable storage adapter
SPI (local FS in v0.1, S3-compatible/Ceph RGW in WP-0004).

The design should keep `infospace-bench` as an application workspace, not a
durable storage engine. Credentials, remote locking, rich audit, and runtime
orchestration should be delegated or integrated carefully rather than invented
inside core application logic.
Re-inventing a second backend abstraction in `infospace-bench` would duplicate
that surface and tangle durable-storage concerns with the live infospace
working directory (which is read-write-read-write across many sessions and is
not a fit for content-addressed immutability).

This workplan therefore replaces "selectable backend" with "durable archive
surface":

- The working infospace continues to live in a local folder. That stays the
  only *working* storage form.
- A new `archive` capability bundles the infospace (or selected subdirs) into
  an `artifact-store` package, finalizes it, and records the returned package
  id and manifest digest inside the infospace.
- A `restore` capability re-materializes a previously archived state into a
  target directory.
- Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the
  configured artifact-store deployment, not implemented here.

## Non-Goals

- Replace the existing local folder behavior.
- Require S3 or git dependencies for ordinary local use.
- Store secrets in `infospace.yaml`.
- Build a general database, sync server, or object storage service inside this
  repo.
- Solve multi-writer conflict resolution beyond clear detection and reporting
  in the first pass.
- Replace the local working folder for live infospace operations.
- Re-implement S3, git, or any other storage backend inside `infospace-bench`.
- Make the live infospace content-addressed or immutable.
- Provide multi-writer concurrency control beyond what artifact-store offers.
- Ship a remote service. Integration is library-only via the `artifactstore`
  Python package (path dep), wired in-process.

## Development setup

`artifactstore` brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3,
pydantic-settings, structlog) that are not on the system Python. Use the
repo Makefile to provision a local venv:

```bash
make install         # creates .venv and installs path deps + pytest
make test            # full suite
make test-archive    # only the archive integration tests
```

The `.venv/` directory is gitignored.

## Architecture

```text
+----------------------------+        +-------------------------+
| live infospace (folder)    |        | artifact-store          |
| - infospace.yaml           |  ==>   | (library, in-process)   |
| - artifacts/...            |archive | - registry              |
| - output/metrics/...       |        | - manifest              |
| - reports/...              |  <==   | - retention policy      |
| - exports/...              | restore| - storage backends      |
| - output/archives/index    |        +-------------------------+
+----------------------------+
```

- The infospace remains the working source of truth for the live state.
- artifact-store owns durable storage, content hashing, manifest, retention,
  audit, and backend selection.
- A new `output/archives/index.yaml` inside the infospace records every
  archive event (package id, manifest digest, retention class, included
  paths, note).

## Tasks

### T01 - Backend contract and URI model
### T01 - Archive contract and infospace metadata

```task
id: IB-WP-0014-T01
status: todo
status: in_progress
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
```

- Define a backend-neutral infospace location model
- Support local paths without changing current user flows
- Define URI examples for local, mounted folder, S3-compatible, and git-backed
  infospaces
- Define backend capabilities: read, write, list, exists, atomic write,
  digest, version, sync, lock, and credentials-required
- Document where credentials and remote configuration are allowed to live
- Add `artifactstore` as a path dependency on `/home/worsch/artifact-store`.
- Define an in-repo Python contract `infospace_bench.archive`:
  - `archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord`
  - `restore_archive(package_id, *, target, registry=None) -> RestoredArchive`
  - `list_archives(root) -> list[ArchiveRecord]`
- Map default `retention_class` to `release-evidence`; allow any class the
  registry exposes via `list_retention_classes()`.
- Default `include` set: `infospace.yaml`, `artifacts/`, `workflows/`,
  `output/`, `reports/`, `exports/`. Allow caller-supplied include patterns.
- Document credentials policy: never write secrets into `infospace.yaml` or
  archive metadata; backend secrets stay with the artifact-store deployment.
- Define `output/archives/index.yaml` schema: list of records with
  `package_id`, `manifest_digest`, `retention_class`, `created_at`,
  `included_paths`, `file_count`, `note`, `producer`, `subject`.
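
The index schema listed in T01 could be modelled roughly as follows. This is a minimal sketch, not the shipped `ArchiveRecord`: the field names come from the task list, while the field types, the default values, the timestamp format, and the `to_index_entry` and `utc_now_iso` helpers are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchiveRecord:
    """One entry in output/archives/index.yaml (field names from T01)."""

    package_id: str
    manifest_digest: str
    retention_class: str
    created_at: str  # assumed to be an ISO 8601 string
    included_paths: list[str]
    file_count: int
    note: str = ""
    producer: str = "infospace-bench"  # producer value named in T02
    subject: str = ""  # the infospace slug

    def to_index_entry(self) -> dict:
        """Hypothetical helper: plain dict for a YAML or JSON serializer."""
        return asdict(self)


def utc_now_iso() -> str:
    """Hypothetical helper: current UTC time, second precision."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


record = ArchiveRecord(
    package_id="pkg-example",  # placeholder, not a real package id
    manifest_digest="digest-placeholder",  # placeholder, real format unknown
    retention_class="release-evidence",
    created_at=utc_now_iso(),
    included_paths=["infospace.yaml", "artifacts/"],
    file_count=2,
    note="first snapshot",
    subject="demo-infospace",
)
entry = record.to_index_entry()
```

Keeping the record a frozen dataclass of plain types means the same object can be appended to the index and returned to the caller without a second representation.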

### T02 - Local and remote folder backend baseline
### T02 - Archive command and library implementation

```task
id: IB-WP-0014-T02
@@ -80,16 +127,28 @@ priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
```

- Refactor lifecycle reads and writes behind a backend adapter while preserving
  current `Path`-based behavior
- Keep local folders as the default backend
- Treat mounted or remote folders as folder backends when the OS exposes them
  as paths
- Add tests proving current pilots and CLI commands still work unchanged
- Add tests for backend errors such as missing files, write failures, and
  unsafe paths
- Implement `archive_infospace` against `artifactstore.registry.Registry`:
  - Create a package with producer=`infospace-bench`,
    subject=`<infospace-slug>`, retention class as requested.
  - Walk the include set; stream each file via `registry.ingest_file`.
  - Finalize the package and capture the manifest digest.
  - Append the new record to `output/archives/index.yaml`.
- Wire `infospace-bench archive <root>` in the CLI with flags
  `--retention-class`, `--include`, `--note`, `--store-root`.
- Provide a `_build_default_registry(store_root)` helper that calls
  `artifactstore.app.build_registry()` with overridden settings so the
  default behavior is a self-contained store under
  `<infospace>/output/archives/.store/` (SQLite + local FS). Honor
  `ARTIFACTSTORE_*` env vars when set so operators can point at a shared
  artifact-store deployment.
- Tests:
  - Archiving a small infospace returns a stable record and writes index.
  - Re-archiving the same content reuses content-addressed bytes (verifies
    artifact-store dedup at the storage layer).
  - Excluded paths are not ingested.
  - Default-include path produces a non-empty package.
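
The "walk the include set" step, together with the recursive-capture guard and the empty-include error mentioned in the commit message, might look something like the sketch below. It stops before any artifact-store call. The `collect_files` name and its parameters are hypothetical, and include entries are treated as plain relative paths rather than patterns, which is a simplification.

```python
from pathlib import Path

# Default include set from T01, written without trailing slashes.
DEFAULT_INCLUDE = ("infospace.yaml", "artifacts", "workflows", "output", "reports", "exports")
# Self-contained store location from T02; it must never be archived into itself.
STORE_DIR = Path("output/archives/.store")


def collect_files(root: Path, include=DEFAULT_INCLUDE, excluded=(STORE_DIR,)) -> list[Path]:
    """Return sorted root-relative file paths under the include set."""
    root = root.resolve()
    found: set[Path] = set()
    for entry in include:
        base = root / entry
        if base.is_file():
            candidates = [base]
        elif base.is_dir():
            candidates = [p for p in base.rglob("*") if p.is_file()]
        else:
            continue  # include entries that do not exist are skipped
        for path in candidates:
            rel = path.relative_to(root)
            # Recursive-capture guard: skip anything inside the store itself.
            if any(rel.is_relative_to(skip) for skip in excluded):
                continue
            found.add(rel)
    if not found:
        raise ValueError("include set matched no files")
    return sorted(found)
```

Each returned path would then be handed to the registry's ingest call. Sorting the result keeps the ingest order stable across runs, which is an assumption about what "stable record" should mean here.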

### T03 - S3 object-store backend adapter
### T03 - Archive index and list command

```task
id: IB-WP-0014-T03
@@ -98,31 +157,36 @@ priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
```

- Design an optional S3-compatible backend adapter
- Use a fake in-memory or local test double for default tests
- Keep real credentials and network calls out of the default test suite
- Define object key layout for manifests, artifacts, reports, exports, and run
  records
- Decide how digests, optimistic concurrency, and partial writes are reported
- Implement `list_archives(root)` reading `output/archives/index.yaml`.
- Wire `infospace-bench archive-list <root>` to print the records as a table
  and optionally `--json`.
- Surface retention state when a registry is available: query
  `get_retention_state(package_id)` and annotate each record with current
  expiry and hold status.
- Tests for empty index, single-record index, and registry-augmented listing.
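
The table and `--json` output for `archive-list` could be rendered along these lines. The sketch assumes the index has already been parsed into a list of dicts; the `render_archives` name, the column selection, and the empty-index message are all hypothetical choices, and retention annotations are left out.

```python
import json

# Columns shown in table mode; a subset of the T01 index fields (assumed choice).
COLUMNS = ("package_id", "retention_class", "created_at", "file_count", "note")


def render_archives(records: list[dict], as_json: bool = False) -> str:
    """Render index records as a fixed-width table or as JSON text."""
    if as_json:
        return json.dumps(records, indent=2, sort_keys=True)
    if not records:
        return "no archives recorded"
    rows = [[str(rec.get(col, "")) for col in COLUMNS] for rec in records]
    # Each column is as wide as its header or its longest cell.
    widths = [
        max(len(col), *(len(row[i]) for row in rows))
        for i, col in enumerate(COLUMNS)
    ]

    def fmt(cells) -> str:
        return "  ".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()

    return "\n".join([fmt(COLUMNS), *(fmt(row) for row in rows)])
```

Returning a string instead of printing keeps the function easy to test for the empty and single-record cases that T03 calls out.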

### T04 - Git repository backend adapter
### T04 - Restore command

```task
id: IB-WP-0014-T04
status: todo
priority: high
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
```

- Support opening or initializing an infospace backed by a git repository
- Prove behavior against local test repositories before any remote network
  workflow
- Define when commits are created, when they are only suggested, and how dirty
  trees are reported
- Keep automatic commits opt-in
- Preserve compatibility with the existing State Hub and workplan workflow
- Implement `restore_archive(package_id, *, target, registry)`:
  - Fetch the finalized manifest via `registry.get_manifest_bytes(..., format="json")`.
  - Iterate files, call `registry.get_file(file_id)`, stream bytes to
    `<target>/<relative_path>`.
  - Refuse to overwrite an existing non-empty target unless `--force` is set.
- Wire `infospace-bench restore <package-id> --target <dir>` in the CLI.
- Tests:
  - Round-trip: archive an infospace, restore into a new directory, diff is
    empty (modulo `output/archives/index.yaml` which is local).
  - Restore refuses to overwrite a non-empty target.
  - Restore by manifest digest also works (lookup via `list_packages`).
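
The overwrite refusal in T04 can be isolated from the registry calls, as in the sketch below. Both function names are hypothetical. The second helper adds a path-escape check on `<target>/<relative_path>`, which the workplan does not ask for; it is included here only as a suggested safeguard.

```python
from pathlib import Path


def prepare_target(target: Path, force: bool = False) -> Path:
    """Create the restore target, refusing a non-empty one unless force is set."""
    if target.exists():
        if not target.is_dir():
            raise NotADirectoryError(f"restore target is not a directory: {target}")
        if any(target.iterdir()) and not force:
            raise FileExistsError(f"restore target is not empty: {target}")
    target.mkdir(parents=True, exist_ok=True)
    return target.resolve()


def safe_destination(target: Path, relative_path: str) -> Path:
    """Resolve <target>/<relative_path>, rejecting paths that leave the target.

    Assumption: this guard is not part of the workplan; it is an extra check.
    """
    base = target.resolve()
    dest = (base / relative_path).resolve()
    if not dest.is_relative_to(base):
        raise ValueError(f"manifest path escapes restore target: {relative_path}")
    return dest
```

A restore loop would call `prepare_target` once, then `safe_destination` for every manifest entry before streaming bytes to disk.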

### T05 - Backend CLI docs and migration path
### T05 - Docs and operator notes

```task
id: IB-WP-0014-T05
@@ -131,26 +195,36 @@ priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
```

- Expose backend selection in CLI/API docs
- Add examples for local, mounted folder, S3-compatible, and git-backed
  infospaces
- Document backend capabilities and limitations
- Add a migration guide for moving a local infospace to another backend
- Update acceptance docs so backend support is distinct from Wealth/VSM
  generation parity
- Add `docs/archive-integration.md` covering:
  - When to archive vs keep editing locally.
  - How to point at a shared artifact-store deployment via `ARTIFACTSTORE_*`
    env vars.
  - Retention class selection guidance.
  - Restore workflow.
- Cross-link from `SCOPE.md` and the relevant CLI help output.
- Note the explicit non-goal: infospace-bench does not implement S3 or git
  backends; those live in artifact-store.
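
For the operator notes, it may help to show how a tool could report which store is in effect. The sketch below only checks whether any variable with the `ARTIFACTSTORE_` prefix is set. It names no concrete variables, because the workplan does not list them, and both function names are hypothetical.

```python
import os
from pathlib import Path

ENV_PREFIX = "ARTIFACTSTORE_"  # prefix named in the workplan; variable names are not


def artifactstore_overrides(environ=None) -> dict[str, str]:
    """Return the environment entries that carry the artifact-store prefix."""
    env = os.environ if environ is None else environ
    return {key: value for key, value in env.items() if key.startswith(ENV_PREFIX)}


def describe_store(infospace_root: Path, environ=None) -> str:
    """Say whether the default local store or an operator override applies."""
    overrides = artifactstore_overrides(environ)
    if overrides:
        # Only variable names are reported; values may be sensitive.
        return "operator-configured store (" + ", ".join(sorted(overrides)) + ")"
    default = infospace_root / "output" / "archives" / ".store"
    return f"self-contained store at {default}"
```

Reporting variable names but never their values follows the credentials policy in T01, which keeps secrets out of infospace metadata.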

## Acceptance

- Existing local-folder behavior remains backward compatible
- Lifecycle, validation, inspection, workflow, metrics, history, and graph
  commands can operate through the backend contract
- Default tests remain deterministic and do not require network credentials
- Backend-specific capabilities and failure modes are visible to callers
- S3 and git support are optional and clearly documented
- Storage backend concerns stay separate from generation workflow semantics
- `infospace-bench archive <root>` produces a finalized artifact-store package
  and writes a new entry to `output/archives/index.yaml`.
- `infospace-bench archive-list <root>` lists recorded archives, with optional
  retention annotations when the registry is reachable.
- `infospace-bench restore <package-id> --target <dir>` round-trips the
  archived state byte-for-byte through artifact-store.
- The default-included file set covers the live infospace contract
  (`infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`,
  `exports/`).
- Default tests do not require network access or external credentials;
  archive and restore round-trip against the local-FS backend.
- No S3, git, or remote-folder code exists inside `infospace-bench` after
  this workplan.
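
The byte-for-byte round-trip criterion could be checked by comparing per-file digests of the original and restored trees. This is a sketch under stated assumptions: SHA-256 from the standard library stands in for whatever hash artifact-store uses internally, the local index file is skipped as the T04 test describes, and skipping the `.store/` directory is an assumption that follows from it not being archived.

```python
import hashlib
from pathlib import Path

# Local-only index, excluded from the comparison per the T04 round-trip test.
LOCAL_ONLY_FILES = frozenset({"output/archives/index.yaml"})
# Assumption: the self-contained store is never archived, so it is skipped too.
SKIP_PREFIXES = ("output/archives/.store/",)


def tree_digests(root: Path) -> dict[str, str]:
    """Map each root-relative file path to the SHA-256 of its bytes."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel in LOCAL_ONLY_FILES or rel.startswith(SKIP_PREFIXES):
            continue
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def trees_match(original: Path, restored: Path) -> bool:
    """True when both trees hold the same files with identical contents."""
    return tree_digests(original) == tree_digests(restored)
```

Comparing digest maps catches missing files, extra files, and changed bytes in one equality check. Reading whole files into memory is fine for small test fixtures but would need chunked hashing for large artifacts.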

## Relationship To IB-WP-0013
## Relationship To Other Workplans

`IB-WP-0013` should prove generation parity on the default local backend first.
This workplan then makes the same infospace operations portable across storage
backends.
- `IB-WP-0013` proves generation parity on the local working folder. This
  workplan adds durable preservation of those generated outputs.
- `artifact-store` WP-0004 will bring S3-compatible storage; once that
  lands, pointing infospace-bench archives at S3 is purely an
  artifact-store configuration change.