infospace-bench/workplans/IB-WP-0014-infospace-backend-abstraction.md
tegwick 7825608307 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T05: todo → done
2026-05-17 11:52:20 +02:00

id: IB-WP-0014
type: workplan
title: Infospace Archive Integration With artifact-store
domain: markitect
repo: infospace-bench
status: done
owner: markitect
topic_slug: markitect
created: 2026-05-14
updated: 2026-05-17
state_hub_workstream_slug: ib-wp-0014-infospace-backend-abstraction
state_hub_workstream_id: c2d23ee7-6b2b-4db0-b660-a9e295c94956

IB-WP-0014 - Infospace Archive Integration With artifact-store

Goal

Let a finalized infospace state (or a curated slice of it) be preserved as a durable, content-addressed package through artifact-store, while the live infospace continues to live in a local working folder.

Intent

The original framing of this workplan asked for a pluggable storage backend (local, remote folder, S3, git) inside infospace-bench. A look at /home/worsch/artifact-store shows that this is exactly the boundary the artifact-store service is being built for. That service is an immutable, content-addressed registry with retention policy, holds, audit, manifests, and a pluggable storage adapter SPI (local FS in v0.1, S3-compatible/Ceph RGW in WP-0004).

Reinventing a second backend abstraction in infospace-bench would duplicate that surface. It would also entangle durable-storage concerns with the live infospace working directory. That directory is read and rewritten repeatedly across many sessions, so it is a poor fit for content-addressed immutability.

This workplan therefore replaces "selectable backend" with "durable archive surface":

  • The working infospace continues to live in a local folder. That stays the only working storage form.
  • A new archive capability bundles the infospace (or selected subdirs) into an artifact-store package, finalizes it, and records the returned package id and manifest digest inside the infospace.
  • A restore capability re-materializes a previously archived state into a target directory.
  • Multi-backend storage (S3-compatible, Ceph RGW) is delegated to the configured artifact-store deployment, not implemented here.

Non-Goals

  • Replace the local working folder for live infospace operations.
  • Re-implement S3, git, or any other storage backend inside infospace-bench.
  • Make the live infospace content-addressed or immutable.
  • Provide multi-writer concurrency control beyond what artifact-store offers.
  • Ship a remote service. Integration is library-only, via the artifactstore Python package (path dependency) wired in-process.

Development setup

artifactstore brings runtime deps (SQLAlchemy, FastAPI, cbor2, blake3, pydantic-settings, structlog) that are not on the system Python. Use the repo Makefile to provision a local venv:

make install       # creates .venv and installs path deps + pytest
make test          # full suite
make test-archive  # only the archive integration tests

The .venv/ directory is gitignored.

Architecture

+----------------------------+        +-------------------------+
|   live infospace (folder)  |        |     artifact-store      |
|   - infospace.yaml         |   ==>  |  (library, in-process)  |
|   - artifacts/...          |archive |  - registry             |
|   - output/metrics/...     |        |  - manifest             |
|   - reports/...            |  <==   |  - retention policy     |
|   - exports/...            | restore|  - storage backends     |
|   - output/archives/index  |        +-------------------------+
+----------------------------+
  • The infospace remains the working source of truth for the live state.
  • artifact-store owns durable storage, content hashing, manifest, retention, audit, and backend selection.
  • A new output/archives/index.yaml inside the infospace records every archive event (package id, manifest digest, retention class, included paths, note).

Tasks

T01 - Archive contract and infospace metadata

id: IB-WP-0014-T01
status: done
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
  • Add artifactstore as a path dependency on /home/worsch/artifact-store.
  • Define an in-repo Python contract infospace_bench.archive:
    • archive_infospace(root, *, retention_class, include, note, registry=None) -> ArchiveRecord
    • restore_archive(package_id, *, target, registry=None) -> RestoredArchive
    • list_archives(root) -> list[ArchiveRecord]
  • Default retention_class to release-evidence; accept any class the registry exposes via list_retention_classes().
  • Default include set: infospace.yaml, artifacts/, workflows/, output/, reports/, exports/. Allow caller-supplied include patterns.
  • Document credentials policy: never write secrets into infospace.yaml or archive metadata; backend secrets stay with the artifact-store deployment.
  • Define output/archives/index.yaml schema: list of records with package_id, manifest_digest, retention_class, created_at, included_paths, file_count, note, producer, subject.
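A minimal sketch of the T01 contract pieces follows. It assumes a plain dataclass for ArchiveRecord whose fields mirror the index.yaml schema above. `pick_retention_class` and `resolve_include_set` are hypothetical helpers, not artifactstore API. The `available` list stands in for whatever list_retention_classes() returns.

Skipping output/archives/ is an assumption on my part. Without it, the default self-contained store under output/archives/.store/ would be ingested into its own packages.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from pathlib import Path

# Default include set from T01; entries ending in "/" are directories.
DEFAULT_INCLUDE = ("infospace.yaml", "artifacts/", "workflows/", "output/", "reports/", "exports/")
DEFAULT_RETENTION_CLASS = "release-evidence"
# Assumption: never ingest the archive area itself, so the default store under
# output/archives/.store/ (and the local index) cannot end up inside a package.
ARCHIVE_AREA = ("output", "archives")


@dataclass(frozen=True)
class ArchiveRecord:
    """One entry of output/archives/index.yaml; field names follow the T01 schema."""

    package_id: str
    manifest_digest: str
    retention_class: str
    created_at: str  # ISO-8601 timestamp
    included_paths: list[str] = field(default_factory=list)
    file_count: int = 0
    note: str = ""
    producer: str = "infospace-bench"
    subject: str = ""

    def to_dict(self) -> dict:
        return asdict(self)


def pick_retention_class(requested: str | None, available: list[str]) -> str:
    """Default to release-evidence and reject classes the registry does not expose."""
    chosen = requested or DEFAULT_RETENTION_CLASS
    if chosen not in available:
        raise ValueError(f"unknown retention class {chosen!r}; available: {available}")
    return chosen


def resolve_include_set(root: Path, include: tuple[str, ...] = DEFAULT_INCLUDE) -> list[Path]:
    """Expand include entries into a sorted list of file paths relative to root."""
    found: set[Path] = set()
    for entry in include:
        if entry.endswith("/"):
            base = root / entry.rstrip("/")
            if base.is_dir():
                found.update(p for p in base.rglob("*") if p.is_file())
        else:
            # Plain entries are treated as glob patterns relative to root.
            found.update(p for p in root.glob(entry) if p.is_file())
    relative = sorted(p.relative_to(root) for p in found)
    return [p for p in relative if p.parts[:2] != ARCHIVE_AREA]
```

Caller-supplied patterns such as `("reports/", "*.yaml")` go through the same function, so the default set and custom sets share one code path.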

T02 - Archive command and library implementation

id: IB-WP-0014-T02
status: done
priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
  • Implement archive_infospace against artifactstore.registry.Registry:
    • Create a package with producer=infospace-bench, subject=<infospace-slug>, retention class as requested.
    • Walk the include set; stream each file via registry.ingest_file.
    • Finalize the package and capture the manifest digest.
    • Append the new record to output/archives/index.yaml.
  • Wire infospace-bench archive <root> in the CLI with flags --retention-class, --include, --note, --store-root.
  • Provide a _build_default_registry(store_root) helper that calls artifactstore.app.build_registry() with overridden settings so the default behavior is a self-contained store under <infospace>/output/archives/.store/ (SQLite + local FS). Honor ARTIFACTSTORE_* env vars when set so operators can point at a shared artifact-store deployment.
  • Tests:
    • Archiving a small infospace returns a stable record and writes an entry to the index.
    • Re-archiving the same content reuses content-addressed bytes (verifies artifact-store dedup at the storage layer).
    • Excluded paths are not ingested.
    • The default include set produces a non-empty package.
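The T02 flow (create, ingest, finalize, append to index) can be sketched against an in-memory stand-in for the registry. This workplan only names `ingest_file`. The `create_package` and `finalize_package` methods and all signatures in `FakeRegistry` are hypothetical. SHA-256 from hashlib stands in for artifact-store's own content hashing.

The function takes an already resolved file list instead of `include`, and uses the directory name as a stand-in for the infospace slug. The index is written in JSON syntax, which YAML 1.2 parsers also accept, to keep the sketch dependency-free.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


class FakeRegistry:
    """In-memory stand-in for artifactstore.registry.Registry (method names are hypothetical)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content digest -> bytes; dedup happens here
        self.packages: dict[str, dict] = {}

    def create_package(self, *, producer: str, subject: str, retention_class: str) -> str:
        package_id = f"pkg-{len(self.packages) + 1}"
        self.packages[package_id] = {"producer": producer, "subject": subject,
                                     "retention_class": retention_class, "files": []}
        return package_id

    def ingest_file(self, package_id: str, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical bytes are stored only once
        self.packages[package_id]["files"].append({"path": path, "digest": digest})
        return digest

    def finalize_package(self, package_id: str) -> str:
        manifest = json.dumps(self.packages[package_id]["files"], sort_keys=True).encode()
        return "sha256:" + hashlib.sha256(manifest).hexdigest()


def archive_infospace(root: Path, files: list[Path], *, registry: FakeRegistry,
                      retention_class: str = "release-evidence", note: str = "") -> dict:
    package_id = registry.create_package(producer="infospace-bench", subject=root.name,
                                         retention_class=retention_class)
    for rel in files:
        # The real flow streams each file; reading whole files keeps the sketch short.
        registry.ingest_file(package_id, rel.as_posix(), (root / rel).read_bytes())
    record = {
        "package_id": package_id,
        "manifest_digest": registry.finalize_package(package_id),
        "retention_class": retention_class,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "included_paths": [rel.as_posix() for rel in files],
        "file_count": len(files),
        "note": note,
        "producer": "infospace-bench",
        "subject": root.name,
    }
    index = root / "output" / "archives" / "index.yaml"
    index.parent.mkdir(parents=True, exist_ok=True)
    records = json.loads(index.read_text()) if index.exists() else []
    records.append(record)
    index.write_text(json.dumps(records, indent=2))  # JSON syntax is valid YAML 1.2
    return record
```

Re-archiving the same tree with this fake creates a second package and a second index entry but adds nothing to `registry.blobs`. That is the property the T02 dedup test checks against the real storage layer.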

T03 - Archive index and list command

id: IB-WP-0014-T03
status: done
priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
  • Implement list_archives(root) reading output/archives/index.yaml.
  • Wire infospace-bench archive-list <root> to print the records as a table, or as JSON with --json.
  • Surface retention state when a registry is available: query get_retention_state(package_id) and annotate each record with current expiry and hold status.
  • Tests for empty index, single-record index, and registry-augmented listing.
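The listing and optional retention annotation can be sketched as follows. The `lookup` callable stands in for a wrapper around the registry's get_retention_state(package_id). The returned keys `expires_at` and `on_hold` are hypothetical, as is the JSON-syntax index (a YAML 1.2 subset), which avoids a YAML dependency here.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical lookup shape: package_id -> {"expires_at": ..., "on_hold": ...}.
RetentionLookup = Callable[[str], dict]


def list_archives(root: Path) -> list[dict]:
    """Read output/archives/index.yaml; a missing or empty index means no archives yet."""
    index = root / "output" / "archives" / "index.yaml"
    if not index.exists():
        return []
    text = index.read_text().strip()
    # Assumption: the index is written in JSON syntax, which YAML 1.2 also accepts.
    return json.loads(text) if text else []


def annotate(records: list[dict], lookup: Optional[RetentionLookup]) -> list[dict]:
    """Attach expiry and hold status when a registry is reachable; otherwise pass through."""
    if lookup is None:
        return records
    out = []
    for rec in records:
        state = lookup(rec["package_id"])
        out.append({**rec, "expires_at": state.get("expires_at"), "on_hold": state.get("on_hold")})
    return out


def format_table(records: list[dict]) -> str:
    """Render records as left-aligned columns; retention columns only when annotated."""
    cols = ["package_id", "retention_class", "created_at", "file_count"]
    if records and "on_hold" in records[0]:
        cols += ["expires_at", "on_hold"]
    rows = [cols] + [[str(rec.get(c, "")) for c in cols] for rec in records]
    widths = [max(len(row[i]) for row in rows) for i in range(len(cols))]
    return "\n".join("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
                     for row in rows)
```

Keeping annotation separate from reading means archive-list still works offline. It simply prints the index without retention columns when no registry is configured.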

T04 - Restore command

id: IB-WP-0014-T04
status: done
priority: medium
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
  • Implement restore_archive(package_id, *, target, registry):
    • Fetch the finalized manifest via registry.get_manifest_bytes(..., format="json").
    • Iterate files, call registry.get_file(file_id), stream bytes to <target>/<relative_path>.
    • Refuse to overwrite an existing non-empty target unless --force is set.
  • Wire infospace-bench restore <package-id> --target <dir> in the CLI.
  • Tests:
    • Round-trip: archive an infospace, restore it into a new directory, and confirm the diff is empty (modulo output/archives/index.yaml, which is local).
    • Restore refuses to overwrite a non-empty target.
    • Restore by manifest digest also works (lookup via list_packages).
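The T04 restore loop can be sketched against a minimal protocol. Its method signatures are assumptions modeled on the get_manifest_bytes(..., format="json") and get_file(file_id) calls named above. The manifest shape `{"files": [{"file_id", "path"}]}` is also hypothetical, and `force` mirrors the --force flag.

```python
from __future__ import annotations

import json
from pathlib import Path, PurePosixPath
from typing import Protocol


class ReadableRegistry(Protocol):
    # Hypothetical signatures modeled on the calls named in T04.
    def get_manifest_bytes(self, package_id: str, format: str = "json") -> bytes: ...
    def get_file(self, file_id: str) -> bytes: ...


def restore_archive(package_id: str, *, target: Path, registry: ReadableRegistry,
                    force: bool = False) -> list[Path]:
    """Write every manifest entry to target/<relative_path>; return the written paths."""
    if target.exists() and any(target.iterdir()) and not force:
        raise FileExistsError(f"{target} is not empty; pass force=True to overwrite")
    manifest = json.loads(registry.get_manifest_bytes(package_id, format="json"))
    written = []
    for entry in manifest["files"]:  # assumed shape: {"files": [{"file_id", "path"}]}
        rel = PurePosixPath(entry["path"])
        # Guard against manifest paths escaping the target directory.
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"unsafe path in manifest: {rel}")
        dest = target.joinpath(*rel.parts)
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(registry.get_file(entry["file_id"]))
        written.append(dest)
    return written
```

The path check goes beyond the workplan text. It keeps a corrupted manifest from writing outside --target. The real get_file may return a stream rather than bytes, in which case the write becomes a chunked copy.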

T05 - Docs and operator notes

id: IB-WP-0014-T05
status: done
priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
  • Add docs/archive-integration.md covering:
    • When to archive vs keep editing locally.
    • How to point at a shared artifact-store deployment via ARTIFACTSTORE_* env vars.
    • Retention class selection guidance.
    • Restore workflow.
  • Cross-link from SCOPE.md and the relevant CLI help output.
  • Note the explicit non-goal: infospace-bench does not implement S3 or git backends; those live in artifact-store.
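The operator-facing switch between the self-contained default store and a shared deployment can be sketched as a settings merge. Only the ARTIFACTSTORE_ prefix comes from this workplan. The settings keys (`database_url`, `storage_root`), the file layout under .store/, and the mapping of --store-root to `store_root` are hypothetical. The real override goes through artifactstore.app.build_registry() settings.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping

PREFIX = "ARTIFACTSTORE_"


def resolve_store_settings(infospace_root: Path, store_root: Path | None = None,
                           env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Local self-contained defaults, overridden by any ARTIFACTSTORE_* variables."""
    env = os.environ if env is None else env
    store = store_root or infospace_root / "output" / "archives" / ".store"
    defaults = {
        # Hypothetical keys: SQLite registry plus local-FS blob storage.
        "database_url": f"sqlite:///{store / 'registry.db'}",
        "storage_root": str(store / "blobs"),
    }
    # ARTIFACTSTORE_STORAGE_ROOT -> "storage_root", and so on.
    overrides = {k[len(PREFIX):].lower(): v for k, v in env.items() if k.startswith(PREFIX)}
    # Overrides may carry credentials: hand them to the registry, never to index.yaml.
    return {**defaults, **overrides}
```

Because the merge only replaces keys that are explicitly set, an operator can redirect storage to a shared deployment while unrelated defaults stay local.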

Acceptance

  • infospace-bench archive <root> produces a finalized artifact-store package and writes a new entry to output/archives/index.yaml.
  • infospace-bench archive-list <root> lists recorded archives, with optional retention annotations when the registry is reachable.
  • infospace-bench restore <package-id> --target <dir> round-trips the archived state byte-for-byte through artifact-store.
  • The default-included file set covers the live infospace contract (infospace.yaml, artifacts/, workflows/, output/, reports/, exports/).
  • Default tests do not require network access or external credentials; archive and restore round-trip against the local-FS backend.
  • No S3, git, or remote-folder code exists inside infospace-bench after this workplan.
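The byte-for-byte round-trip criterion can be checked in a network-free test by hashing every file on both sides. Here hashlib's SHA-256 is an independent check, not artifact-store's own digest. The helper names are hypothetical. Skipping the whole output/archives/ area, not just index.yaml, is an assumption that also covers the default .store/.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Local bookkeeping (index.yaml and the default .store/) is not expected to round-trip.
LOCAL_PREFIX = "output/archives/"


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's POSIX-relative path to the SHA-256 of its bytes."""
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and not rel.startswith(LOCAL_PREFIX):
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def round_trip_diff(original: Path, restored: Path) -> dict[str, list[str]]:
    """Empty lists in all three buckets mean a byte-for-byte round trip."""
    a, b = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Files outside the include set show up under "missing". A test that archives with custom include patterns should therefore compare against a tree containing only those paths.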

Relationship To Other Workplans

  • IB-WP-0013 proves generation parity on the local working folder. This workplan adds durable preservation of those generated outputs.
  • artifact-store WP-0004 will bring S3-compatible storage; once that lands, pointing infospace-bench archives at S3 is purely an artifact-store configuration change.