Operator Guide
Status: v0.1 (WP-0003 baseline) Updated: 2026-05-16
This guide is the user manual for running artifact-store v0.1: the library,
CLI, HTTP ingestion API, manifest surface, retention lifecycle, storage checks,
and the guide-board pilot path.
For architectural background, see ARCHITECTURE-BLUEPRINT.md, the ADRs under adr/, and ROADMAP.md.
Prerequisites
- Python 3.12 or 3.13
- uv on the PATH (one static binary)
- A POSIX-ish shell (Linux, macOS, WSL2)
The pinned tech stack is documented in ADR-0005.
Quick start
uv sync --all-extras # install deps; produces .venv/ and uv.lock
cp .env.example .env # optional — the defaults work out of the box
make migrate-fresh # creates ./var/artifactstore.db and applies migrations
make dev # uvicorn on 127.0.0.1:8000
In another terminal:
curl -s http://127.0.0.1:8000/health | python3 -m json.tool
artifactstore health
Both should report status: ok.
Environment variables
Settings are read by pydantic-settings from the environment and (optionally)
./.env. All of them are prefixed with ARTIFACTSTORE_, except the three
STATE_HUB_* variables used for State Hub linkage.
| Variable | Default | Purpose |
|---|---|---|
| ARTIFACTSTORE_DATABASE_URL | sqlite+aiosqlite:///./var/artifactstore.db | SQLAlchemy async URL. Alembic translates +aiosqlite and +asyncpg to their sync drivers at migrate-time. |
| ARTIFACTSTORE_STORAGE_LOCAL_ROOT | ./var/storage | Root directory for the local filesystem storage backend. Created on first use. |
| ARTIFACTSTORE_LOG_LEVEL | INFO | Python logging level (DEBUG / INFO / WARNING / ERROR). |
| ARTIFACTSTORE_AUTH_TOKENS | empty | Comma- or newline-separated shared-secret bearer tokens for the HTTP API. |
| ARTIFACTSTORE_ANON_READ | false | Set true only for local demos where read endpoints may be anonymous. |
| ARTIFACTSTORE_API_URL | http://127.0.0.1:8000 | Default API base URL used by HTTP-backed CLI commands. |
| ARTIFACTSTORE_API_TOKEN | empty | Default bearer token used by HTTP-backed CLI commands. |
| ARTIFACTSTORE_GUIDE_BOARD_SCHEMA | schemas/guide-board.run.v1.json | Schema path used by guide-board pilot bootstrap helpers. |
| ARTIFACTSTORE_RETENTION_CONFIG_PATH | empty | Optional TOML file overriding retention-class default durations. |
| ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS | 3600 | Default interval for external schedulers that invoke the retention sweeper. |
| ARTIFACTSTORE_STORAGE_BACKENDS | local | Comma-separated backend IDs to configure (local, s3). |
| ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND | local | Backend used when no routing rule matches. |
| ARTIFACTSTORE_STORAGE_BACKEND_ROUTES | empty | Comma-separated producer:retention_class=backend_id rules; * is a wildcard. |
| ARTIFACTSTORE_S3_ENDPOINT_URL | empty | S3-compatible endpoint URL for Ceph RGW / MinIO / AWS S3. |
| ARTIFACTSTORE_S3_REGION | us-east-1 | S3 signing region. |
| ARTIFACTSTORE_S3_BUCKET | empty | Bucket/container for artifact objects. |
| ARTIFACTSTORE_S3_KEY_PREFIX | empty | Optional object-key prefix before <algorithm>/<hex...>. |
| ARTIFACTSTORE_S3_ACCESS_KEY_REF | empty | Access key reference, env:NAME or file:/mounted/path. |
| ARTIFACTSTORE_S3_SECRET_KEY_REF | empty | Secret key reference, env:NAME or file:/mounted/path. |
| ARTIFACTSTORE_S3_STORAGE_CLASS | empty | Optional storage class sent on writes. |
| ARTIFACTSTORE_S3_SSE | empty | Optional server-side encryption value, e.g. AES256. |
| ARTIFACTSTORE_S3_MULTIPART_THRESHOLD_BYTES | 67108864 | Multipart threshold for the S3 backend. |
| ARTIFACTSTORE_S3_MULTIPART_CHUNK_BYTES | 8388608 | Multipart part size for the S3 backend. |
| STATE_HUB_URL | http://127.0.0.1:8000 | State Hub base URL used by guide-board linkage helpers. |
| STATE_HUB_WORKSTREAM_ID | empty | Optional workstream id for State Hub linkage events. |
| STATE_HUB_TASK_ID | empty | Optional task id for State Hub linkage events. |
See .env.example for the canonical template.
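The rule format of ARTIFACTSTORE_STORAGE_BACKEND_ROUTES (producer:retention_class=backend_id, with * as a wildcard) can be illustrated with a small resolver. This is a minimal sketch, not the service's own routing code: the function names parse_routes and pick_backend are hypothetical, and evaluating rules in listed order with the first match winning is an assumption.

```python
Rule = tuple[str, str, str]  # (producer, retention_class, backend_id)


def parse_routes(raw: str) -> list[Rule]:
    """Parse 'producer:retention_class=backend_id' rules separated by commas."""
    rules: list[Rule] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty setting or a trailing comma
        selector, _, backend = item.partition("=")
        producer, _, retention_class = selector.partition(":")
        rules.append((producer, retention_class, backend))
    return rules


def pick_backend(rules: list[Rule], producer: str, retention_class: str, default: str) -> str:
    """Return the backend of the first matching rule (assumed order), else the default."""
    for rule_producer, rule_class, backend in rules:
        if rule_producer in ("*", producer) and rule_class in ("*", retention_class):
            return backend
    return default


rules = parse_routes("guide-board:release-evidence=s3,*:*=local")
print(pick_backend(rules, "guide-board", "release-evidence", default="local"))
print(pick_backend(rules, "ops", "raw-evidence", default="local"))
```

Under the first-match assumption, a trailing `*:*` rule catches everything the earlier rules miss, so the default backend is only consulted when no such catch-all rule is present.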
Retention policy TOML
By default, retention durations come from the seeded retention_classes
rows. Operators can override the default duration per class with
ARTIFACTSTORE_RETENTION_CONFIG_PATH:
[retention_classes.transient]
default_duration_seconds = 86400
[retention_classes."raw-evidence"]
default_duration_seconds = 7776000
[retention_classes."summary-evidence"]
default_duration_seconds = 31536000
[retention_classes."release-evidence"]
default_duration_seconds = 220752000
[retention_classes."permanent-record"]
# Omit default_duration_seconds for no expiry.
Run artifactstore retention sweep from cron or another scheduler to mark
expired, unheld packages eligible for deletion. Then run
artifactstore retention gc to release the eligible packages' storage
locations and delete physical objects whose final reference has been
released:
artifactstore retention sweep
artifactstore retention gc
GC is reference-counted by (backend_id, content_address): shared bytes stay in
the backend until every non-deleted storage location has been released. Each
released location emits a v1.storage.location_deleted event. A package becomes
garbage_collected only after all of its storage locations are released.
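The reference-counting rule can be modelled in a few lines. This is a simplified in-memory sketch under stated assumptions, not the actual GC implementation: the Location dataclass and gc_pass function are hypothetical, and event emission and package state changes are left out.

```python
from dataclasses import dataclass


@dataclass
class Location:
    package_id: str
    backend_id: str
    content_address: str
    released: bool = False


def gc_pass(locations: list[Location], eligible: set[str]) -> list[tuple[str, str]]:
    """Release locations of eligible packages and return deletable object keys.

    An object key is (backend_id, content_address). It is deletable only
    once no unreleased location references it any more.
    """
    touched: set[tuple[str, str]] = set()
    for loc in locations:
        if loc.package_id in eligible and not loc.released:
            loc.released = True
            touched.add((loc.backend_id, loc.content_address))
    live = {(l.backend_id, l.content_address) for l in locations if not l.released}
    return sorted(touched - live)


demo = [
    Location("pkg-1", "local", "blake3:aa"),
    Location("pkg-2", "local", "blake3:aa"),  # same bytes, second reference
    Location("pkg-1", "local", "blake3:bb"),
]
print(gc_pass(demo, {"pkg-1"}))  # shared bytes survive; only pkg-1's unique object goes
print(gc_pass(demo, {"pkg-2"}))  # final reference released, shared object goes
```

The first pass keeps blake3:aa because pkg-2 still references it; the second pass releases the last reference, so the object becomes deletable.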
Database backends
SQLite (development default)
Zero-config. The database file lives at ./var/artifactstore.db by default
and is gitignored.
make migrate-fresh # drop and re-create
make migrate # idempotent: apply pending migrations
PostgreSQL 16+ (shared deployments)
Install the optional postgres extra (pulls in psycopg[binary] for
Alembic's sync driver):
uv sync --all-extras --extra postgres
Set the URL with the async driver; Alembic switches to +psycopg for
migrations automatically:
export ARTIFACTSTORE_DATABASE_URL=postgresql+asyncpg://artifactstore:secret@db.internal:5432/artifactstore
make migrate
The schema is identical to the SQLite schema; per ADR-0002, the events table drives all materialised views.
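The driver switch that happens at migrate-time can be pictured as a scheme rewrite on the database URL. The sketch below is an assumption about the mapping, not the project's Alembic code: to_sync_url is a hypothetical name, and mapping +aiosqlite to SQLAlchemy's default sqlite driver is a guess consistent with the text above.

```python
def to_sync_url(async_url: str) -> str:
    """Swap an async SQLAlchemy driver for a sync one, leaving the rest intact.

    Assumed mapping: +aiosqlite -> default sqlite driver, +asyncpg -> +psycopg.
    """
    scheme, sep, rest = async_url.partition("://")
    replacements = {
        "sqlite+aiosqlite": "sqlite",
        "postgresql+asyncpg": "postgresql+psycopg",
    }
    return replacements.get(scheme, scheme) + sep + rest


print(to_sync_url("sqlite+aiosqlite:///./var/artifactstore.db"))
print(to_sync_url("postgresql+asyncpg://artifactstore:secret@db.internal:5432/artifactstore"))
```

URLs with any other scheme pass through unchanged, so the configured value always keeps the async driver and only the migration step sees the sync form.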
Storage backends
The storage adapter SPI is documented in ADR-0001 and ADR-0004.
Local filesystem (default)
Objects are addressed by content (blake3:<hex>) and laid out as
<root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>
with atomic writes (tmpfile + fsync + rename).
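The layout and the tmpfile + fsync + rename pattern can be sketched with the standard library alone. This is an illustration, not the backend's source: object_path and atomic_write are hypothetical names, and creating the temporary file next to the destination is an assumption made so that the rename stays on one filesystem.

```python
import os
import tempfile
from pathlib import Path


def object_path(root: str, content_address: str) -> Path:
    """Map 'algorithm:hex' to <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    algorithm, _, hex_digest = content_address.partition(":")
    if not algorithm or len(hex_digest) < 4:
        raise ValueError(f"malformed content address: {content_address!r}")
    return Path(root) / algorithm / hex_digest[0:2] / hex_digest[2:4] / hex_digest


def atomic_write(dest: Path, data: bytes) -> None:
    """Write to a temp file, fsync it, then rename it over the destination."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_name, dest)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave a partial temp file behind
        raise


print(object_path("./var/storage", "blake3:abcdef0123").as_posix())
```

Readers either see the complete object or no object at all, because the final path only appears once the rename succeeds.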
S3-compatible backend
The s3 backend targets Ceph RGW first, with MinIO as the development
stand-in and AWS S3 as an interoperability check. Install the optional S3
dependency before enabling it:
uv sync --all-extras --extra s3
Ceph RGW example:
export ARTIFACTSTORE_STORAGE_BACKENDS=local,s3
export ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND=s3
export ARTIFACTSTORE_STORAGE_BACKEND_ROUTES='guide-board:release-evidence=s3,*:*=local'
export ARTIFACTSTORE_S3_ENDPOINT_URL=https://rgw.example.internal
export ARTIFACTSTORE_S3_REGION=us-east-1
export ARTIFACTSTORE_S3_BUCKET=artifact-store
export ARTIFACTSTORE_S3_KEY_PREFIX=prod/artifact-store
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:ARTIFACTSTORE_RGW_ACCESS_KEY
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore-rgw-secret
export ARTIFACTSTORE_S3_STORAGE_CLASS=STANDARD
export ARTIFACTSTORE_S3_SSE=AES256
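The env:NAME and file:/mounted/path reference forms used for the access and secret keys keep the credentials themselves out of the configuration. A resolver for them might look like the sketch below. It is an assumption-laden illustration: resolve_secret_ref is a hypothetical name, and stripping trailing whitespace from file contents is a guess, not documented behaviour.

```python
import os
from pathlib import Path


def resolve_secret_ref(ref: str) -> str:
    """Resolve 'env:NAME' from the environment or 'file:/path' from disk."""
    kind, sep, target = ref.partition(":")
    if not sep or not target:
        raise ValueError(f"malformed secret reference: {ref!r}")
    if kind == "env":
        return os.environ[target]  # KeyError if the variable is unset
    if kind == "file":
        # Assumption: mounted secret files may end with a newline, so strip it.
        return Path(target).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported reference scheme: {kind!r}")
```

With the example configuration above, the access key would be read from the ARTIFACTSTORE_RGW_ACCESS_KEY environment variable and the secret key from the mounted file /run/secrets/artifactstore-rgw-secret.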
Manual smoke against Ceph RGW:
artifactstore health
artifactstore push ./fixtures/smoke \
--producer guide-board \
--subject rgw-smoke \
--retention-class release-evidence
artifactstore storage verify --backend s3
The verification command re-reads stored objects, recomputes the primary
digest, emits v1.storage.location_verified, and marks every location that
fails verification as failed. A nonzero failed-location count degrades /health.
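The re-read and recompute step can be sketched as follows. Because blake3 is not in the Python standard library, the example substitutes sha256 from hashlib purely for illustration; verify_object and verify_all are hypothetical names, and the returned summary shape is an assumption rather than the real /health payload.

```python
import hashlib


def verify_object(data: bytes, content_address: str) -> bool:
    """Recompute the digest of the stored bytes and compare it to the address."""
    algorithm, _, expected_hex = content_address.partition(":")
    # Stand-in: any hashlib algorithm works here; the store itself uses blake3.
    return hashlib.new(algorithm, data).hexdigest() == expected_hex


def verify_all(objects: dict[str, bytes]) -> dict:
    """Verify every object and summarise; any failure degrades the status."""
    failed = [addr for addr, data in objects.items() if not verify_object(data, addr)]
    return {"failed_locations": failed, "status": "degraded" if failed else "ok"}


good = "sha256:" + hashlib.sha256(b"hello world").hexdigest()
print(verify_all({good: b"hello world"})["status"])
print(verify_all({good: b"hello w0rld"})["status"])
```

The second call simulates silent corruption: the bytes no longer hash to their address, so the location is counted as failed and the summary status becomes degraded.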
CLI reference
artifactstore --help lists every subcommand. The v0.1 set:
| Command | Purpose |
|---|---|
| artifactstore version | Print the package version and exit. |
| artifactstore migrate | Run alembic upgrade head against the configured database. |
| artifactstore replay | Truncate every materialised view and rebuild it from the event log; prints the highest sequence applied. |
| artifactstore health | JSON liveness summary (db, backend, status). Same payload as the HTTP /health endpoint. |
| artifactstore push <dir> | Push a directory through the HTTP API and finalize the package. |
| artifactstore manifest <package_id> | Fetch the JSON manifest projection through the HTTP API. |
| artifactstore retention sweep | Run one deletion-eligibility sweep against the configured DB. |
| artifactstore retention gc | Run one reference-counted garbage-collection pass. |
| artifactstore storage verify --backend <id> | Re-read stored objects for a backend and record verification events. |
| artifactstore guide-board ingest <run-dir> | Ingest one guide-board run directory as an artifact package. |
The CLI is a thin client over artifactstore.registry.Registry
(see ADR-0005).
HTTP reference (v0.1)
| Route family | Purpose |
|---|---|
| GET /, GET /health | Anonymous service banner and liveness summary. |
| GET /docs, GET /openapi.json | FastAPI's interactive OpenAPI docs and generated schema. |
| /packages... | Create, list, inspect, upload files to, finalize, and retrieve manifests for packages. |
| /files... | File metadata and byte downloads, including single-range reads. |
| /uploads... | Upload-session wire shape for whole-body v1 uploads. |
| /packages/{id}/retention... | Extend retention, apply/release holds, and read retention history. |
| POST /metadata-schemas | Register package metadata schemas by slug. |
| GET /events | Long-poll event feed, CBOR by default or JSON with Accept: application/json. |
All routes other than the anonymous GET / and GET /health require a bearer
token. Setting ARTIFACTSTORE_ANON_READ=true lifts that requirement for read
endpoints only.
End-to-end smoke test (Python library)
This exercises every layer (identity, manifest, events, dataplane, storage, registry, replay) end-to-end against the default SQLite + local FS configuration.
import asyncio
from collections.abc import AsyncIterator

from artifactstore.app import build_registry
from artifactstore.manifest import decode as manifest_decode


async def chunks(data: bytes) -> AsyncIterator[bytes]:
    # Wrap the payload in a single-chunk async stream for ingest_file.
    yield data


async def main() -> None:
    registry = build_registry()
    try:
        # Create an open package in the raw-evidence retention class.
        pkg = await registry.create_package(
            name="smoke-test",
            producer="ops",
            subject="example.org",
            retention_class="raw-evidence",
            actor="ops",
            metadata={"smoke": True},
        )
        await registry.ingest_file(
            pkg, relative_path="hello.txt", media_type="text/plain",
            stream=chunks(b"hello world"), actor="ops",
        )
        # Finalizing seals the package and returns the manifest address.
        manifest_addr = await registry.finalize_package(pkg, actor="ops")
        cbor = await registry.get_manifest_bytes(pkg, format="cbor")
        manifest = manifest_decode(cbor)
        print("package:", pkg)
        print("manifest digest:", manifest_addr)
        print("files in manifest:", [f.relative_path for f in manifest.files])
    finally:
        # Always release database and storage resources.
        await registry.dispose()


asyncio.run(main())
Prerequisite: run make migrate-fresh first so that the schema and the
retention class seeds exist.
Guide-board pilot
The guide-board pilot stores a run directory as one artifact package and records
only package identifiers in State Hub. See
docs/pilots/guide-board.md for schema registration,
the real ~/guide-board plus ~/open-cmis-tck smoke procedure, and the exact
POST /progress/ linkage payload.
Replay / disaster recovery
Every state-changing operation writes one row to events and updates the
materialised views in the same transaction (ADR-0002). If the materialised
views are lost or corrupted, rebuild them from the event log:
artifactstore replay
The command drops every row from artifact_packages, artifact_files,
storage_locations, and retention_state, then replays the events in
sequence order through the canonical view writer. The result is
byte-identical to the materialised state before the replay
(verified by the WP-0001-T013 integration test).
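The rebuild is a plain fold over the event log: start from empty views and apply each event in sequence order. The toy model below shows why the result does not depend on anything except the log. The event type names and view shape are hypothetical and far simpler than the real schema.

```python
def replay(events: list[dict]) -> tuple[dict[str, dict], int]:
    """Rebuild views from scratch; return them with the highest sequence applied."""
    views: dict[str, dict] = {}  # truncate: every replay starts from empty views
    highest = 0
    for event in sorted(events, key=lambda e: e["sequence"]):
        kind, pkg = event["type"], event["package_id"]
        if kind == "package_created":
            views[pkg] = {"state": "open", "files": []}
        elif kind == "file_ingested":
            views[pkg]["files"].append(event["relative_path"])
        elif kind == "package_finalized":
            views[pkg]["state"] = "finalized"
        highest = event["sequence"]
    return views, highest


log = [
    {"sequence": 1, "type": "package_created", "package_id": "pkg-1"},
    {"sequence": 2, "type": "file_ingested", "package_id": "pkg-1", "relative_path": "hello.txt"},
    {"sequence": 3, "type": "package_finalized", "package_id": "pkg-1"},
]
views, highest = replay(log)
print(views, highest)
```

Because the function reads nothing but the log, running it twice, or on a log stored in a different order, yields the same views.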
Failure modes operators should expect
| Symptom | Likely cause | Fix |
|---|---|---|
| /health returns status: degraded, db.healthy: false | DB unreachable or migrations not applied | Check ARTIFACTSTORE_DATABASE_URL; run make migrate. |
| /health returns status: degraded, backend.healthy: false | Storage root missing or unreadable | Recreate ARTIFACTSTORE_STORAGE_LOCAL_ROOT or fix permissions. |
| ObjectNotFoundError from get_file | Underlying bytes deleted but the file row remains | Investigate; v1 does not garbage-collect orphaned rows (WP-0006). |
| DuplicateRelativePathError from ingest_file | Same package + path ingested twice | Use a distinct relative_path per file within one package. |
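The two /health rows of the table lend themselves to a small triage helper for monitoring scripts. The sketch assumes the payload nests the flags as db.healthy and backend.healthy next to a top-level status, which is inferred from the symptoms listed above rather than from a published schema; triage is a hypothetical name.

```python
def triage(health: dict) -> list[str]:
    """Translate a /health payload into the fixes suggested in the table."""
    hints: list[str] = []
    if health.get("db", {}).get("healthy") is False:
        hints.append("Check ARTIFACTSTORE_DATABASE_URL; run make migrate.")
    if health.get("backend", {}).get("healthy") is False:
        hints.append("Recreate ARTIFACTSTORE_STORAGE_LOCAL_ROOT or fix permissions.")
    if health.get("status") != "ok" and not hints:
        # Degraded for another reason, e.g. failed storage verification.
        hints.append("Inspect the full /health payload for further detail.")
    return hints


payload = {"status": "degraded", "db": {"healthy": False}, "backend": {"healthy": True}}
for hint in triage(payload):
    print(hint)
```

An empty list means the payload reported status ok with no unhealthy component.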
References
- INTENT.md — purpose and scope.
- SCOPE.md — what this repo does and does not own.
- ARCHITECTURE-BLUEPRINT.md — module layout, data model, API shape.
- PLATFORM-AMBITION.md — longer-horizon thesis and the v1 schema commitments.
- ROADMAP.md — workplan sequencing.
- ASSEMBLY-EXPERIMENT.md — opt-in asm research line.
- pilots/guide-board.md — guide-board pilot ingestion and State Hub linkage.