| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_slug | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IB-WP-0014 | workplan | Infospace Archive Integration With artifact-store | markitect | infospace-bench | done | markitect | markitect | 2026-05-14 | 2026-05-17 | ib-wp-0014-infospace-backend-abstraction | c2d23ee7-6b2b-4db0-b660-a9e295c94956 |
IB-WP-0014 - Infospace Archive Integration With artifact-store
Goal
Let a finalized infospace state (or a curated slice of it) be preserved as a
durable, content-addressed package through artifact-store, while the live
infospace continues to live in a local working folder.
Intent
The original framing of this workplan asked for a pluggable storage backend
(local, remote folder, S3, git) inside infospace-bench. A look at
`/home/worsch/artifact-store` shows that this is exactly the boundary the artifact-store
service is being built for: an immutable, content-addressed registry with
retention policy, holds, audit, manifests, and a pluggable storage adapter
SPI (local FS in v0.1, S3-compatible/Ceph RGW in WP-0004).
Re-inventing a second backend abstraction in infospace-bench would duplicate
that surface and tangle durable-storage concerns with the live infospace
working directory, which is read and rewritten repeatedly across many sessions and is
not a fit for content-addressed immutability.
This workplan therefore replaces "selectable backend" with "durable archive surface":
- The working infospace continues to live in a local folder. That stays the only working storage form.
- A new `archive` capability bundles the infospace (or selected subdirs) into an `artifact-store` package, finalizes it, and records the returned package id and manifest digest inside the infospace.
- A `restore` capability re-materializes a previously archived state into a target directory.
- Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the configured artifact-store deployment, not implemented here.
Non-Goals
- Replace the local working folder for live infospace operations.
- Re-implement S3, git, or any other storage backend inside `infospace-bench`.
- Make the live infospace content-addressed or immutable.
- Provide multi-writer concurrency control beyond what artifact-store offers.
- Ship a remote service. Integration is library-only via the `artifactstore` Python package (path dep), wired in-process.
Development setup
`artifactstore` brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3,
pydantic-settings, structlog) that are not on the system Python. Use the
repo Makefile to provision a local venv:

    make install        # creates .venv and installs path deps + pytest
    make test           # full suite
    make test-archive   # only the archive integration tests

The `.venv/` directory is gitignored.
Architecture

    +----------------------------+          +-------------------------+
    | live infospace (folder)    |          | artifact-store          |
    |  - infospace.yaml          |   ==>    |  (library, in-process)  |
    |  - artifacts/...           | archive  |  - registry             |
    |  - output/metrics/...      |          |  - manifest             |
    |  - reports/...             |   <==    |  - retention policy     |
    |  - exports/...             | restore  |  - storage backends     |
    |  - output/archives/index   |          +-------------------------+
    +----------------------------+

- The infospace remains the working source of truth for the live state.
- artifact-store owns durable storage, content hashing, manifest, retention, audit, and backend selection.
- A new `output/archives/index.yaml` inside the infospace records every archive event (package id, manifest digest, retention class, included paths, note).
Tasks
T01 - Archive contract and infospace metadata
id: IB-WP-0014-T01
status: done
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
- Add `artifactstore` as a path dependency on `/home/worsch/artifact-store`.
- Define an in-repo Python contract in `infospace_bench.archive`:
  - `archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord`
  - `restore_archive(package_id, *, target, registry=None) -> RestoredArchive`
  - `list_archives(root) -> list[ArchiveRecord]`
- Map the default `retention_class` to `release-evidence`; allow any class the registry exposes via `list_retention_classes()`.
- Default `include` set: `infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`, `exports/`. Allow caller-supplied include patterns.
- Document the credentials policy: never write secrets into `infospace.yaml` or archive metadata; backend secrets stay with the artifact-store deployment.
- Define the `output/archives/index.yaml` schema: a list of records with `package_id`, `manifest_digest`, `retention_class`, `created_at`, `included_paths`, `file_count`, `note`, `producer`, `subject`.
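Here is a minimal sketch of the T01 contract types. It assumes a frozen dataclass for `ArchiveRecord` and assumes that whatever `list_retention_classes()` returns can be reduced to a list of class names. `DEFAULT_RETENTION_CLASS`, `DEFAULT_INCLUDE`, `resolve_retention_class`, and `utc_now_iso` are illustrative names and are not part of the defined contract.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Defaults taken from T01; the constant names are illustrative.
DEFAULT_RETENTION_CLASS = "release-evidence"
DEFAULT_INCLUDE = (
    "infospace.yaml",
    "artifacts/",
    "workflows/",
    "output/",
    "reports/",
    "exports/",
)


@dataclass(frozen=True)
class ArchiveRecord:
    """One entry of output/archives/index.yaml, field-for-field with the T01 schema."""

    package_id: str
    manifest_digest: str
    retention_class: str
    created_at: str  # ISO-8601 timestamp in UTC
    included_paths: list[str]
    file_count: int
    note: str
    producer: str
    subject: str

    def to_index_entry(self) -> dict:
        # Plain dict, ready to be dumped into the YAML index.
        return asdict(self)


def resolve_retention_class(requested: str | None, available: list[str]) -> str:
    """Use the default class unless one is requested; reject classes the registry lacks.

    `available` stands in for whatever list_retention_classes() returns,
    reduced to class names (an assumption about its shape).
    """
    chosen = requested or DEFAULT_RETENTION_CLASS
    if chosen not in available:
        raise ValueError(f"unknown retention class {chosen!r}; registry offers {available}")
    return chosen


def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="seconds")
```

The record is frozen, so an index entry is written once and never edited in place. This mirrors the immutability of the package it points to.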
T02 - Archive command and library implementation
id: IB-WP-0014-T02
status: done
priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
- Implement `archive_infospace` against `artifactstore.registry.Registry`:
  - Create a package with producer=`infospace-bench`, subject=`<infospace-slug>`, and the requested retention class.
  - Walk the include set; stream each file via `registry.ingest_file`.
  - Finalize the package and capture the manifest digest.
  - Append the new record to `output/archives/index.yaml`.
- Wire `infospace-bench archive <root>` in the CLI with flags `--retention-class`, `--include`, `--note`, `--store-root`.
- Provide a `_build_default_registry(store_root)` helper that calls `artifactstore.app.build_registry()` with overridden settings, so that by default archives go to a self-contained store under `<infospace>/output/archives/.store/` (SQLite + local FS). Honor `ARTIFACTSTORE_*` env vars when set so operators can point at a shared artifact-store deployment.
- Tests:
  - Archiving a small infospace returns a stable record and writes the index.
  - Re-archiving the same content reuses content-addressed bytes (verifies artifact-store dedup at the storage layer).
  - Excluded paths are not ingested.
  - The default include set produces a non-empty package.
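The T02 flow (create, ingest, finalize, append to index) can be sketched against a hypothetical in-memory registry. Everything about `InMemoryRegistry` is invented for illustration; the sketch does not reproduce the real `artifactstore.registry.Registry` API. It also makes these assumptions:

- Include entries are plain paths rather than glob patterns.
- The directory name serves as the infospace slug.
- The index is written as JSON text, which YAML 1.2 loaders accept.
- `output/archives/` is skipped, so the local index and the default store are never archived into themselves.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

DEFAULT_INCLUDE = ("infospace.yaml", "artifacts/", "workflows/", "output/", "reports/", "exports/")
ARCHIVE_PREFIX = "output/archives/"  # local bookkeeping (index, default store); never archived here


class InMemoryRegistry:
    """Hypothetical stand-in for artifactstore.registry.Registry.

    Method names and signatures are invented for this sketch. SHA-256 replaces
    blake3 so only the standard library is needed.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content digest -> bytes, stored once
        self.packages: dict[str, dict] = {}
        self._counter = 0

    def create_package(self, *, producer: str, subject: str, retention_class: str) -> str:
        self._counter += 1
        package_id = f"pkg-{self._counter}"
        self.packages[package_id] = {"producer": producer, "subject": subject,
                                     "retention_class": retention_class, "files": {}}
        return package_id

    def ingest_file(self, package_id: str, relative_path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # dedup: identical bytes share one blob
        self.packages[package_id]["files"][relative_path] = digest
        return digest

    def finalize(self, package_id: str) -> str:
        # Manifest digest = hash of the sorted path -> content-digest mapping.
        files = self.packages[package_id]["files"]
        manifest = json.dumps(files, sort_keys=True).encode()
        return hashlib.sha256(manifest).hexdigest()


def iter_included(root: Path, include):
    """Yield (posix relative path, absolute path); include entries are plain paths, not globs."""
    for entry in include:
        target = root / entry
        if target.is_file():
            yield target.relative_to(root).as_posix(), target
        elif target.is_dir():
            for path in sorted(p for p in target.rglob("*") if p.is_file()):
                yield path.relative_to(root).as_posix(), path


def archive_infospace(root, registry, *, retention_class="release-evidence",
                      include=DEFAULT_INCLUDE, note="") -> dict:
    root = Path(root)
    package_id = registry.create_package(producer="infospace-bench", subject=root.name,
                                         retention_class=retention_class)
    file_count = 0
    for rel, path in iter_included(root, include):
        if rel.startswith(ARCHIVE_PREFIX):
            continue
        # Whole-file read for brevity; the plan streams each file instead.
        registry.ingest_file(package_id, rel, path.read_bytes())
        file_count += 1
    record = {
        "package_id": package_id,
        "manifest_digest": registry.finalize(package_id),
        "retention_class": retention_class,
        "created_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "included_paths": list(include),
        "file_count": file_count,
        "note": note,
        "producer": "infospace-bench",
        "subject": root.name,
    }
    # Append to the index. JSON is valid YAML 1.2, so a YAML loader can read it.
    index = root / ARCHIVE_PREFIX / "index.yaml"
    index.parent.mkdir(parents=True, exist_ok=True)
    records = json.loads(index.read_text()) if index.exists() else []
    records.append(record)
    index.write_text(json.dumps(records, indent=2))
    return record
```

Archiving the same tree twice yields two package ids with one shared manifest digest and no new blobs. This is the dedup behavior the second T02 test checks.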
T03 - Archive index and list command
id: IB-WP-0014-T03
status: done
priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
- Implement `list_archives(root)` reading `output/archives/index.yaml`.
- Wire `infospace-bench archive-list <root>` to print the records as a table, or as JSON with `--json`.
- Surface retention state when a registry is available: query `get_retention_state(package_id)` and annotate each record with its current expiry and hold status.
- Tests for an empty index, a single-record index, and registry-augmented listing.
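Here is a sketch of `list_archives` with table and JSON rendering, assuming the index holds JSON text (a YAML 1.2-compatible dump). `retention_lookup` is a hypothetical callable standing in for `get_retention_state(package_id)`. The `expires_at` and `on_hold` keys are assumed names for the expiry and hold status.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Optional

INDEX_RELATIVE = Path("output") / "archives" / "index.yaml"

# Hypothetical hook standing in for get_retention_state(package_id); the keys
# "expires_at" and "on_hold" are assumed, not artifact-store's real field names.
RetentionLookup = Callable[[str], Optional[dict]]


def list_archives(root, retention_lookup: Optional[RetentionLookup] = None) -> list[dict]:
    index = Path(root) / INDEX_RELATIVE
    if not index.exists():
        return []  # no archive yet is a normal state, not an error
    # Assumes JSON text in the index; a real reader would use a YAML loader.
    records = json.loads(index.read_text() or "[]")
    if retention_lookup is None:
        return records
    annotated = []
    for record in records:
        try:
            state = retention_lookup(record["package_id"]) or {}
        except Exception:
            state = {}  # registry unreachable: keep the record, leave state unknown
        annotated.append({**record, "expires_at": state.get("expires_at"),
                          "on_hold": state.get("on_hold")})
    return annotated


def render(records: list[dict], as_json: bool = False) -> str:
    """Format records as a fixed-width table, or as JSON for --json."""
    if as_json:
        return json.dumps(records, indent=2, sort_keys=True)
    columns = ["package_id", "retention_class", "created_at", "file_count", "note"]
    if records and "on_hold" in records[0]:
        columns += ["expires_at", "on_hold"]
    rows = [[str(r.get(c, "")) for c in columns] for r in records]
    widths = [max([len(c)] + [len(row[i]) for row in rows]) for i, c in enumerate(columns)]
    lines = ["  ".join(c.ljust(w) for c, w in zip(columns, widths))]
    lines += ["  ".join(v.ljust(w) for v, w in zip(row, widths)) for row in rows]
    return "\n".join(lines)
```

An unreachable registry is treated as "state unknown" rather than an error. This keeps the listing usable without a reachable registry, which fits the requirement that default tests need no network access.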
T04 - Restore command
id: IB-WP-0014-T04
status: done
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
- Implement `restore_archive(package_id, *, target, registry)`:
  - Fetch the finalized manifest via `registry.get_manifest_bytes(..., format="json")`.
  - Iterate over the files, call `registry.get_file(file_id)`, and stream the bytes to `<target>/<relative_path>`.
  - Refuse to overwrite an existing non-empty target unless `--force` is set.
- Wire `infospace-bench restore <package-id> --target <dir>` in the CLI.
- Tests:
  - Round-trip: archive an infospace, restore it into a new directory, and confirm the diff is empty (modulo `output/archives/index.yaml`, which is local).
  - Restore refuses to overwrite a non-empty target.
  - Restore by manifest digest also works (lookup via `list_packages`).
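Here is a sketch of the T04 restore loop. It takes a parsed manifest and a file-fetch callable instead of a live registry, so it runs standalone. The following names and shapes are hypothetical:

- the manifest shape (`files` entries with `file_id` and `relative_path`);
- `restore_from_manifest`, `resolve_package_ref`, and `TargetNotEmpty`.

`get_file` plays the role of `registry.get_file(file_id)`. The path check refuses entries that would land outside the target; it is an added safeguard, not a stated requirement.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


class TargetNotEmpty(Exception):
    """Raised when the restore target already has content and force is off."""


def restore_from_manifest(manifest_json: bytes, get_file: Callable[[str], bytes],
                          *, target, force: bool = False) -> list[str]:
    """Write every manifest entry under `target`; return the relative paths written.

    Hypothetical manifest shape: {"files": [{"file_id": ..., "relative_path": ...}]}.
    `get_file` plays the role of registry.get_file(file_id).
    """
    target = Path(target)
    if target.exists() and (not target.is_dir() or any(target.iterdir())) and not force:
        raise TargetNotEmpty(f"{target} is not empty; use --force to overwrite")
    root = target.resolve()
    written = []
    for entry in json.loads(manifest_json)["files"]:
        dest = (root / entry["relative_path"]).resolve()
        if not dest.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
            raise ValueError(f"refusing to write outside target: {entry['relative_path']}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(get_file(entry["file_id"]))
        written.append(entry["relative_path"])
    return written


def resolve_package_ref(ref: str, packages: list[dict]) -> str:
    """Map a package id or a manifest digest to a package id (cf. list_packages)."""
    for pkg in packages:
        if ref in (pkg["package_id"], pkg.get("manifest_digest")):
            return pkg["package_id"]
    raise KeyError(f"no archived package matches {ref!r}")
```

Accepting either a package id or a manifest digest in `resolve_package_ref` covers the "restore by manifest digest" test without a separate code path.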
T05 - Docs and operator notes
id: IB-WP-0014-T05
status: done
priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
- Add `docs/archive-integration.md` covering:
  - When to archive vs. keep editing locally.
  - How to point at a shared artifact-store deployment via `ARTIFACTSTORE_*` env vars.
  - Retention class selection guidance.
  - Restore workflow.
- Cross-link from `SCOPE.md` and the relevant CLI help output.
- Note the explicit non-goal: infospace-bench does not implement S3 or git backends; those live in artifact-store.
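For the operator notes on `ARTIFACTSTORE_*` env vars, here is a sketch of how the store settings might be resolved. The following are all assumptions:

- the precedence order: `--store-root`, then env vars, then the in-infospace default;
- the `database_url` and `storage_root` keys;
- the secret-key filter.

The actual settings are whatever `artifactstore.app.build_registry()` accepts.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping, Optional

ENV_PREFIX = "ARTIFACTSTORE_"
SECRET_MARKERS = ("secret", "password", "token", "key")


def _local_store(base: Path, source: str) -> dict:
    # Self-contained store: SQLite registry plus local-FS blobs (illustrative keys).
    return {"source": source,
            "database_url": f"sqlite:///{base / 'registry.db'}",
            "storage_root": str(base / "blobs")}


def resolve_store_settings(infospace_root, store_root=None,
                           environ: Optional[Mapping[str, str]] = None) -> dict:
    """Pick where archives go.

    Assumed precedence: --store-root, then any ARTIFACTSTORE_* variables
    (a shared deployment), then a self-contained store inside the infospace.
    Setting keys are illustrative, not artifactstore's real settings fields.
    """
    env = os.environ if environ is None else environ
    if store_root is not None:
        return _local_store(Path(store_root), "flag")
    overrides = {k[len(ENV_PREFIX):].lower(): v for k, v in env.items() if k.startswith(ENV_PREFIX)}
    if overrides:
        return {"source": "env", **overrides}
    return _local_store(Path(infospace_root) / "output" / "archives" / ".store", "default")


def public_view(settings: dict) -> dict:
    """Drop secret-looking keys before anything is logged or written to archive metadata."""
    return {k: v for k, v in settings.items() if not any(m in k.lower() for m in SECRET_MARKERS)}
```

`public_view` reflects the T01 credentials policy. Settings pulled from the environment may carry secrets, so only the filtered view should reach logs or archive metadata.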
Acceptance
- `infospace-bench archive <root>` produces a finalized artifact-store package and writes a new entry to `output/archives/index.yaml`.
- `infospace-bench archive-list <root>` lists recorded archives, with optional retention annotations when the registry is reachable.
- `infospace-bench restore <package-id> --target <dir>` round-trips the archived state byte-for-byte through artifact-store.
- The default-included file set covers the live infospace contract (`infospace.yaml`, `artifacts/`, `workflows/`, `output/`, `reports/`, `exports/`).
- Default tests do not require network access or external credentials; archive and restore round-trip against the local-FS backend.
- No S3, git, or remote-folder code exists inside `infospace-bench` after this workplan.
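The byte-for-byte round-trip criterion can be checked by hashing both trees. This minimal sketch assumes SHA-256 over file bytes. It treats `output/archives/index.yaml` as local-only, as in the T04 round-trip test. The helper names are hypothetical. If the default store under `output/archives/.store/` sits inside the source infospace, pass `output/archives/` as an extra ignored prefix.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Entries ending in "/" are prefixes; others are exact relative paths.
DEFAULT_IGNORED = ("output/archives/index.yaml",)


def _ignored(rel: str, ignored) -> bool:
    return any(rel == i or (i.endswith("/") and rel.startswith(i)) for i in ignored)


def tree_digests(root, ignored=DEFAULT_IGNORED) -> dict[str, str]:
    """Map each file's posix relative path to the SHA-256 of its bytes."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if not _ignored(rel, ignored):
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def tree_diff(a, b, ignored=DEFAULT_IGNORED) -> dict[str, list[str]]:
    da, db = tree_digests(a, ignored), tree_digests(b, ignored)
    return {
        "only_in_a": sorted(da.keys() - db.keys()),
        "only_in_b": sorted(db.keys() - da.keys()),
        "changed": sorted(k for k in da.keys() & db.keys() if da[k] != db[k]),
    }


def is_round_trip_clean(a, b, ignored=DEFAULT_IGNORED) -> bool:
    return not any(tree_diff(a, b, ignored).values())
```

Reporting missing, extra, and changed paths separately makes a failed round-trip test point at the offending file instead of returning a single boolean.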
Relationship To Other Workplans
- `IB-WP-0013` proves generation parity on the local working folder. This workplan adds durable preservation of those generated outputs.
- `artifact-store` WP-0004 will bring S3-compatible storage; once that lands, pointing infospace-bench archives at S3 is purely an artifact-store configuration change.