# Operator Guide

Status: v0.1 (WP-0003 baseline)

Updated: 2026-05-16

This guide is the user manual for running `artifact-store` v0.1: the library, CLI, HTTP ingestion API, manifest surface, retention lifecycle, storage checks, and the guide-board pilot path. For architectural background see [ARCHITECTURE-BLUEPRINT.md](ARCHITECTURE-BLUEPRINT.md), the ADRs under [adr/](adr/), and the [ROADMAP](ROADMAP.md).

## Prerequisites

- Python 3.12 or 3.13
- [`uv`](https://docs.astral.sh/uv/) on the PATH (one static binary)
- A POSIX-ish shell (Linux, macOS, WSL2)

The pinned tech stack is documented in [ADR-0005](adr/0005-v1-tech-stack.md).

## Quick start

```sh
uv sync --all-extras      # install deps; produces .venv/ and uv.lock
cp .env.example .env      # optional; the defaults work out of the box
make migrate-fresh        # creates ./var/artifactstore.db and applies migrations
make dev                  # uvicorn on 127.0.0.1:8000
```

In another terminal:

```sh
curl -s http://127.0.0.1:8000/health | python3 -m json.tool
artifactstore health
```

Both should report `status: ok`.

## Environment variables

All `artifact-store` settings are prefixed with `ARTIFACTSTORE_` and read by `pydantic-settings` from the environment and (optionally) `./.env`. The unprefixed `STATE_HUB_*` variables at the end of the table configure the State Hub linkage helpers.

| Variable | Default | Purpose |
|----------|---------|---------|
| `ARTIFACTSTORE_DATABASE_URL` | `sqlite+aiosqlite:///./var/artifactstore.db` | SQLAlchemy async URL. Alembic translates `+aiosqlite` and `+asyncpg` to their sync drivers at migrate-time. |
| `ARTIFACTSTORE_STORAGE_LOCAL_ROOT` | `./var/storage` | Root directory for the local filesystem storage backend. Created on first use. |
| `ARTIFACTSTORE_LOG_LEVEL` | `INFO` | Python logging level (`DEBUG` / `INFO` / `WARNING` / `ERROR`). |
| `ARTIFACTSTORE_AUTH_TOKENS` | empty | Comma- or newline-separated shared-secret bearer tokens for the HTTP API. |
| `ARTIFACTSTORE_ANON_READ` | `false` | Set `true` only for local demos where read endpoints may be anonymous. |
| `ARTIFACTSTORE_API_URL` | `http://127.0.0.1:8000` | Default API base URL used by HTTP-backed CLI commands. |
| `ARTIFACTSTORE_API_TOKEN` | empty | Default bearer token used by HTTP-backed CLI commands. |
| `ARTIFACTSTORE_GUIDE_BOARD_SCHEMA` | `schemas/guide-board.run.v1.json` | Schema path used by guide-board pilot bootstrap helpers. |
| `ARTIFACTSTORE_RETENTION_CONFIG_PATH` | empty | Optional TOML file overriding retention-class default durations. |
| `ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS` | `3600` | Default interval for external schedulers that invoke the retention sweeper. |
| `ARTIFACTSTORE_STORAGE_BACKENDS` | `local` | Comma-separated backend IDs to configure (`local`, `s3`). |
| `ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND` | `local` | Backend used when no routing rule matches. |
| `ARTIFACTSTORE_STORAGE_BACKEND_ROUTES` | empty | Comma-separated `producer:retention_class=backend_id` rules; `*` is a wildcard. |
| `ARTIFACTSTORE_S3_ENDPOINT_URL` | empty | S3-compatible endpoint URL for Ceph RGW / MinIO / AWS S3. |
| `ARTIFACTSTORE_S3_REGION` | `us-east-1` | S3 signing region. |
| `ARTIFACTSTORE_S3_BUCKET` | empty | Bucket/container for artifact objects. |
| `ARTIFACTSTORE_S3_KEY_PREFIX` | empty | Optional object-key prefix before `/`. |
| `ARTIFACTSTORE_S3_ACCESS_KEY_REF` | empty | Access key reference, `env:NAME` or `file:/mounted/path`. |
| `ARTIFACTSTORE_S3_SECRET_KEY_REF` | empty | Secret key reference, `env:NAME` or `file:/mounted/path`. |
| `ARTIFACTSTORE_S3_STORAGE_CLASS` | empty | Optional storage class sent on writes. |
| `ARTIFACTSTORE_S3_SSE` | empty | Optional server-side encryption value, e.g. `AES256`. |
| `ARTIFACTSTORE_S3_MULTIPART_THRESHOLD_BYTES` | `67108864` | Multipart threshold for the S3 backend (64 MiB). |
| `ARTIFACTSTORE_S3_MULTIPART_CHUNK_BYTES` | `8388608` | Multipart part size for the S3 backend (8 MiB). |
| `STATE_HUB_URL` | `http://127.0.0.1:8000` | State Hub base URL used by guide-board linkage helpers. |
| `STATE_HUB_WORKSTREAM_ID` | empty | Optional workstream id for State Hub linkage events. |
| `STATE_HUB_TASK_ID` | empty | Optional task id for State Hub linkage events. |

See [`.env.example`](../.env.example) for the canonical template.

### Retention policy TOML

By default, retention durations come from the seeded `retention_classes` rows. Operators can override the default duration per class with `ARTIFACTSTORE_RETENTION_CONFIG_PATH`:

```toml
[retention_classes.transient]
default_duration_seconds = 86400

[retention_classes."raw-evidence"]
default_duration_seconds = 7776000

[retention_classes."summary-evidence"]
default_duration_seconds = 31536000

[retention_classes."release-evidence"]
default_duration_seconds = 220752000

[retention_classes."permanent-record"]
# Omit default_duration_seconds for no expiry.
```

Run `artifactstore retention sweep` from cron or another scheduler to mark expired, unheld packages eligible for deletion. Then run `artifactstore retention gc` to release the eligible packages' storage locations and delete physical objects whose final reference has been released:

```sh
artifactstore retention sweep
artifactstore retention gc
```

GC is reference-counted by `(backend_id, content_address)`: shared bytes stay in the backend until every non-deleted storage location has been released. Each released location emits a `v1.storage.location_deleted` event. A package becomes `garbage_collected` only after all of its storage locations are released.

## Database backends

### SQLite (development default)

Zero-config. The database file lives at `./var/artifactstore.db` by default and is gitignored.
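The default URL names the async `aiosqlite` driver, whereas Alembic runs migrations through a sync driver. The translation mentioned in the environment-variable table could look roughly like the sketch below. `to_sync_url` and `SYNC_DRIVERS` are hypothetical names chosen for illustration, and mapping `sqlite+aiosqlite` to plain `sqlite` is an assumption; the project's actual migration code may differ.

```python
# Illustrative sketch only: not the project's real migration helper.
SYNC_DRIVERS = {
    # Assumption: the sync SQLite driver is SQLAlchemy's default (stdlib sqlite3).
    "sqlite+aiosqlite": "sqlite",
    # This guide states that Alembic switches asyncpg URLs to +psycopg.
    "postgresql+asyncpg": "postgresql+psycopg",
}


def to_sync_url(url: str) -> str:
    """Swap an async driver prefix for its sync counterpart."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a database URL: {url!r}")
    # Unknown schemes pass through unchanged.
    return f"{SYNC_DRIVERS.get(scheme, scheme)}://{rest}"


if __name__ == "__main__":
    print(to_sync_url("sqlite+aiosqlite:///./var/artifactstore.db"))
```

Only the scheme is rewritten, so credentials, host, and path in `ARTIFACTSTORE_DATABASE_URL` carry over untouched.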
```sh
make migrate-fresh   # drop and re-create
make migrate         # idempotent: apply pending migrations
```

### PostgreSQL 16+ (shared deployments)

Install the optional `postgres` extra (pulls in `psycopg[binary]` for Alembic's sync driver):

```sh
uv sync --all-extras --extra postgres
```

Set the URL with the async driver; Alembic switches to `+psycopg` for migrations automatically:

```sh
export ARTIFACTSTORE_DATABASE_URL=postgresql+asyncpg://artifactstore:secret@db.internal:5432/artifactstore
make migrate
```

The schema is identical to SQLite (per [ADR-0002](adr/0002-event-log-source-of-truth.md) the events table drives all materialised views).

## Storage backends

The storage adapter SPI is documented in [ADR-0001](adr/0001-content-addressed-storage.md) and [ADR-0004](adr/0004-control-plane-data-plane-contract.md).

### Local filesystem (default)

Objects are addressed by content (`blake3:`) and laid out as

```
////
```

with atomic writes (tmpfile + fsync + rename).

### S3-compatible backend

The `s3` backend targets Ceph RGW first, with MinIO as the development stand-in and AWS S3 as an interoperability check.
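When more than one backend is configured, `ARTIFACTSTORE_STORAGE_BACKEND_ROUTES` decides which backend receives a package. The sketch below shows one way such `producer:retention_class=backend_id` rules could be parsed and matched. `Route`, `parse_routes`, and `pick_backend` are hypothetical names, and first-match-wins ordering is an assumption that this guide does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    producer: str
    retention_class: str
    backend_id: str


def parse_routes(raw: str) -> list[Route]:
    """Parse comma-separated 'producer:retention_class=backend_id' rules."""
    routes: list[Route] = []
    for rule in (part.strip() for part in raw.split(",")):
        if not rule:
            continue  # tolerate empty input and trailing commas
        selector, eq, backend_id = rule.partition("=")
        producer, colon, retention_class = selector.partition(":")
        if not (eq and colon and producer and retention_class and backend_id):
            raise ValueError(f"malformed route rule: {rule!r}")
        routes.append(Route(producer, retention_class, backend_id))
    return routes


def pick_backend(
    routes: list[Route],
    producer: str,
    retention_class: str,
    default: str = "local",
) -> str:
    """Return the backend of the first matching rule ('*' is a wildcard)."""
    for route in routes:
        if route.producer in ("*", producer) and route.retention_class in (
            "*",
            retention_class,
        ):
            return route.backend_id
    # Mirrors ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND: used when nothing matches.
    return default


if __name__ == "__main__":
    rules = parse_routes("guide-board:release-evidence=s3,*:*=local")
    print(pick_backend(rules, "guide-board", "release-evidence"))
    print(pick_backend(rules, "ops", "raw-evidence"))
```

Under the first-match assumption, specific rules belong before a catch-all `*:*` rule, because a leading catch-all would shadow everything after it.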
Install the optional S3 dependency before enabling it:

```sh
uv sync --all-extras --extra s3
```

Ceph RGW example:

```sh
export ARTIFACTSTORE_STORAGE_BACKENDS=local,s3
export ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND=s3
export ARTIFACTSTORE_STORAGE_BACKEND_ROUTES='guide-board:release-evidence=s3,*:*=local'
export ARTIFACTSTORE_S3_ENDPOINT_URL=https://rgw.example.internal
export ARTIFACTSTORE_S3_REGION=us-east-1
export ARTIFACTSTORE_S3_BUCKET=artifact-store
export ARTIFACTSTORE_S3_KEY_PREFIX=prod/artifact-store
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:ARTIFACTSTORE_RGW_ACCESS_KEY
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore-rgw-secret
export ARTIFACTSTORE_S3_STORAGE_CLASS=STANDARD
export ARTIFACTSTORE_S3_SSE=AES256
```

Manual smoke test against Ceph RGW:

```sh
artifactstore health
artifactstore push ./fixtures/smoke \
  --producer guide-board \
  --subject rgw-smoke \
  --retention-class release-evidence
artifactstore storage verify --backend s3
```

The verification command re-reads stored objects, recomputes the primary digest, emits `v1.storage.location_verified`, and marks locations that fail verification as `failed`. A nonzero failed-location count degrades `/health`.

## CLI reference

`artifactstore --help` lists every subcommand. The v0.1 set:

| Command | Purpose |
|---------|---------|
| `artifactstore version` | Print the package version and exit. |
| `artifactstore migrate` | Run `alembic upgrade head` against the configured database. |
| `artifactstore replay` | Truncate every materialised view and rebuild it from the event log; prints the highest sequence applied. |
| `artifactstore health` | JSON liveness summary (db, backend, status). Same payload as the HTTP `/health` endpoint. |
| `artifactstore push ` | Push a directory through the HTTP API and finalize the package. |
| `artifactstore manifest ` | Fetch the JSON manifest projection through the HTTP API. |
| `artifactstore retention sweep` | Run one deletion-eligibility sweep against the configured DB. |
| `artifactstore retention gc` | Run one reference-counted garbage-collection pass. |
| `artifactstore storage verify --backend ` | Re-read stored objects for a backend and record verification events. |
| `artifactstore guide-board ingest ` | Ingest one guide-board run directory as an artifact package. |

The CLI is a thin client over `artifactstore.registry.Registry` (see [ADR-0005](adr/0005-v1-tech-stack.md)).

## HTTP reference (v0.1)

| Route family | Purpose |
|--------------|---------|
| `GET /`, `GET /health` | Anonymous service banner and liveness summary. |
| `GET /docs`, `GET /openapi.json` | FastAPI's interactive OpenAPI docs and generated schema. |
| `/packages...` | Create, list, inspect, upload files to, finalize, and retrieve manifests for packages. |
| `/files...` | File metadata and byte downloads, including single-range reads. |
| `/uploads...` | Upload-session wire shape for whole-body v1 uploads. |
| `/packages/{id}/retention...` | Extend retention, apply/release holds, and read retention history. |
| `POST /metadata-schemas` | Register package metadata schemas by slug. |
| `GET /events` | Long-poll event feed, CBOR by default or JSON with `Accept: application/json`. |

All non-health routes require a bearer token unless `ARTIFACTSTORE_ANON_READ=true` is set for read endpoints.

## End-to-end smoke test (Python library)

This exercises every layer (identity, manifest, events, dataplane, storage, registry, replay) end-to-end against the default SQLite + local FS configuration.
```python
import asyncio
from collections.abc import AsyncIterator

from artifactstore.app import build_registry
from artifactstore.manifest import decode as manifest_decode


async def chunks(data: bytes) -> AsyncIterator[bytes]:
    yield data


async def main() -> None:
    registry = build_registry()
    try:
        pkg = await registry.create_package(
            name="smoke-test",
            producer="ops",
            subject="example.org",
            retention_class="raw-evidence",
            actor="ops",
            metadata={"smoke": True},
        )
        await registry.ingest_file(
            pkg,
            relative_path="hello.txt",
            media_type="text/plain",
            stream=chunks(b"hello world"),
            actor="ops",
        )
        manifest_addr = await registry.finalize_package(pkg, actor="ops")
        cbor = await registry.get_manifest_bytes(pkg, format="cbor")
        manifest = manifest_decode(cbor)
        print("package:", pkg)
        print("manifest digest:", manifest_addr)
        print("files in manifest:", [f.relative_path for f in manifest.files])
    finally:
        await registry.dispose()


asyncio.run(main())
```

Prerequisite: `make migrate-fresh` has been run so the schema and the retention class seeds exist.

## Guide-board pilot

The guide-board pilot stores a run directory as one artifact package and records only package identifiers in State Hub. See [docs/pilots/guide-board.md](pilots/guide-board.md) for schema registration, the real `~/guide-board` plus `~/open-cmis-tck` smoke procedure, and the exact `POST /progress/` linkage payload.

## Replay / disaster recovery

Every state-changing operation writes one row to `events` and updates the materialised views in the same transaction ([ADR-0002](adr/0002-event-log-source-of-truth.md)). If the materialised views are lost or corrupted, rebuild them from the event log:

```sh
artifactstore replay
```

The command drops every row from `artifact_packages`, `artifact_files`, `storage_locations`, and `retention_state`, then replays the events in sequence order through the canonical view writer.
The result is **byte-identical** to the materialised state before the replay (verified by the WP-0001-T013 integration test).

## Failure modes operators should expect

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `/health` returns `status: degraded`, `db.healthy: false` | DB unreachable or migrations not applied | Check `ARTIFACTSTORE_DATABASE_URL`; run `make migrate`. |
| `/health` returns `status: degraded`, `backend.healthy: false` | Storage root missing or unreadable | Recreate `ARTIFACTSTORE_STORAGE_LOCAL_ROOT` or fix permissions. |
| `ObjectNotFoundError` from `get_file` | Underlying bytes deleted but the file row remains | Investigate; v1 does not garbage-collect orphaned rows (WP-0006). |
| `DuplicateRelativePathError` from `ingest_file` | Same package + path ingested twice | Use a distinct `relative_path` per file within one package. |

## References

- [INTENT.md](../INTENT.md): purpose and scope.
- [SCOPE.md](../SCOPE.md): what this repo does and does not own.
- [ARCHITECTURE-BLUEPRINT.md](ARCHITECTURE-BLUEPRINT.md): module layout, data model, API shape.
- [PLATFORM-AMBITION.md](PLATFORM-AMBITION.md): longer-horizon thesis and the v1 schema commitments.
- [ROADMAP.md](ROADMAP.md): workplan sequencing.
- [ASSEMBLY-EXPERIMENT.md](ASSEMBLY-EXPERIMENT.md): opt-in asm research line.
- [pilots/guide-board.md](pilots/guide-board.md): guide-board pilot ingestion and State Hub linkage.
### Architecture Decision Records

- [ADR-0001: Content-Addressed Storage with Dual Digest](adr/0001-content-addressed-storage.md)
- [ADR-0002: Append-Only Event Log as Source of Truth](adr/0002-event-log-source-of-truth.md)
- [ADR-0003: Manifest Canonicalisation = Canonical CBOR](adr/0003-manifest-canonical-cbor.md)
- [ADR-0004: Control Plane / Data Plane Contract](adr/0004-control-plane-data-plane-contract.md)
- [ADR-0005: V1 Technology Stack](adr/0005-v1-tech-stack.md)
- [ADR-0006: OCI Artifact Compatibility Kept Reachable](adr/0006-oci-compatibility-reachable.md)