Operator Guide
Status: v0.1 (WP-0003 baseline). Updated: 2026-05-16.
This guide is the user manual for running artifact-store v0.1: the library,
CLI, HTTP ingestion API, manifest surface, retention lifecycle, storage checks,
and the guide-board pilot path.
For architectural background see ARCHITECTURE-BLUEPRINT.md, the ADRs under adr/, and the ROADMAP.
Prerequisites
- Python 3.12 or 3.13
- uv on the PATH (one static binary)
- A POSIX-ish shell (Linux, macOS, WSL2)
The pinned tech stack is documented in ADR-0005.
Quick start
uv sync --all-extras # install deps; produces .venv/ and uv.lock
cp .env.example .env # optional — the defaults work out of the box
make migrate-fresh # creates ./var/artifactstore.db and applies migrations
make dev # uvicorn on 127.0.0.1:8000
In another terminal:
curl -s http://127.0.0.1:8000/health | python3 -m json.tool
artifactstore health
Both should report status: ok.
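For a scripted readiness check (a deploy hook, say), the sketch below fetches /health with the standard library and lists anything that is not healthy. The helper names summarize_health and fetch_health are hypothetical, and the payload shape (a top-level status plus db and backend objects carrying a healthy flag) is inferred from the failure-mode table later in this guide rather than from a published schema.

```python
import json
import urllib.request


def summarize_health(payload: dict) -> list[str]:
    """Return the problems found in a /health payload; an empty list means healthy."""
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status is {payload.get('status')!r}")
    # Assumed shape: {"db": {"healthy": bool}, "backend": {"healthy": bool}}
    for component in ("db", "backend"):
        if not (payload.get(component) or {}).get("healthy", False):
            problems.append(f"{component} is not healthy")
    return problems


def fetch_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> dict:
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return json.load(resp)
```

Usage: summarize_health(fetch_health()) returns an empty list when the service reports status ok with both components healthy.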
Environment variables
All settings are prefixed with ARTIFACTSTORE_ and read by
pydantic-settings from the environment and (optionally) ./.env.
| Variable | Default | Purpose |
|---|---|---|
| ARTIFACTSTORE_DATABASE_URL | sqlite+aiosqlite:///./var/artifactstore.db | SQLAlchemy async URL. Alembic translates +aiosqlite and +asyncpg to their sync drivers at migrate-time. |
| ARTIFACTSTORE_STORAGE_LOCAL_ROOT | ./var/storage | Root directory for the local filesystem storage backend. Created on first use. |
| ARTIFACTSTORE_LOG_LEVEL | INFO | Python logging level (DEBUG / INFO / WARNING / ERROR). |
| ARTIFACTSTORE_AUTH_TOKENS | empty | Comma- or newline-separated shared-secret bearer tokens for the HTTP API. |
| ARTIFACTSTORE_ANON_READ | false | Set true only for local demos where read endpoints may be anonymous. |
| ARTIFACTSTORE_API_URL | http://127.0.0.1:8000 | Default API base URL used by HTTP-backed CLI commands. |
| ARTIFACTSTORE_API_TOKEN | empty | Default bearer token used by HTTP-backed CLI commands. |
| ARTIFACTSTORE_GUIDE_BOARD_SCHEMA | schemas/guide-board.run.v1.json | Schema path used by guide-board pilot bootstrap helpers. |
| ARTIFACTSTORE_RETENTION_CONFIG_PATH | empty | Optional TOML file overriding retention-class default durations. |
| ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS | 3600 (1 hour) | Default interval for external schedulers that invoke the retention sweeper. |
| ARTIFACTSTORE_STORAGE_BACKENDS | local | Comma-separated backend IDs to configure (local, s3). |
| ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND | local | Backend used when no routing rule matches. |
| ARTIFACTSTORE_STORAGE_BACKEND_ROUTES | empty | Comma-separated producer:retention_class=backend_id rules; * is a wildcard. |
| ARTIFACTSTORE_S3_ENDPOINT_URL | empty | S3-compatible endpoint URL for Ceph RGW / MinIO / AWS S3. |
| ARTIFACTSTORE_S3_REGION | us-east-1 | S3 signing region. |
| ARTIFACTSTORE_S3_BUCKET | empty | Bucket/container for artifact objects. |
| ARTIFACTSTORE_S3_KEY_PREFIX | empty | Optional object-key prefix before <algorithm>/<hex...>. |
| ARTIFACTSTORE_S3_ACCESS_KEY_REF | empty | Access key reference, env:NAME or file:/mounted/path. |
| ARTIFACTSTORE_S3_SECRET_KEY_REF | empty | Secret key reference, env:NAME or file:/mounted/path. |
| ARTIFACTSTORE_S3_STORAGE_CLASS | empty | Optional storage class sent on writes. |
| ARTIFACTSTORE_S3_SSE | empty | Optional server-side encryption value, e.g. AES256. |
| ARTIFACTSTORE_S3_MULTIPART_THRESHOLD_BYTES | 67108864 (64 MiB) | Multipart threshold for the S3 backend. |
| ARTIFACTSTORE_S3_MULTIPART_CHUNK_BYTES | 8388608 (8 MiB) | Multipart part size for the S3 backend. |
| STATE_HUB_URL | http://127.0.0.1:8000 | State Hub base URL used by guide-board linkage helpers. |
| STATE_HUB_WORKSTREAM_ID | empty | Optional workstream id for State Hub linkage events. |
| STATE_HUB_TASK_ID | empty | Optional task id for State Hub linkage events. |
See .env.example for the canonical template.
Retention policy TOML
By default, retention durations come from the seeded retention_classes
rows. Operators can override the default duration per class with
ARTIFACTSTORE_RETENTION_CONFIG_PATH:
[retention_classes.transient]
default_duration_seconds = 86400
[retention_classes."raw-evidence"]
default_duration_seconds = 7776000
[retention_classes."summary-evidence"]
default_duration_seconds = 31536000
[retention_classes."release-evidence"]
default_duration_seconds = 220752000
[retention_classes."permanent-record"]
# Omit default_duration_seconds for no expiry.
Run artifactstore retention sweep from cron or another scheduler (at the cadence set by ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS, default 3600) to mark
expired, unheld packages eligible for deletion. The sweep only records
eligibility; it never deletes bytes.
Database backends
SQLite (development default)
Zero-config. The database file lives at ./var/artifactstore.db by default
and is gitignored.
make migrate-fresh # drop and re-create
make migrate # idempotent: apply pending migrations
PostgreSQL 16+ (shared deployments)
Install the optional postgres extra (pulls in psycopg[binary] for
Alembic's sync driver):
uv sync --all-extras # --all-extras already installs the postgres extra
Set the URL with the async driver; Alembic switches to +psycopg for
migrations automatically:
export ARTIFACTSTORE_DATABASE_URL=postgresql+asyncpg://artifactstore:secret@db.internal:5432/artifactstore
make migrate
The schema is identical to the SQLite one (per ADR-0002, the events table drives all materialised views).
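The driver swap that Alembic performs can be pictured as a scheme rewrite. The mapping below (+aiosqlite to plain sqlite, +asyncpg to +psycopg) follows this guide's description; to_sync_url is a hypothetical sketch, not the project's Alembic environment code.

```python
# Async driver -> sync driver used for migrations (mapping as described in this guide).
_SYNC_DRIVERS = {
    "sqlite+aiosqlite": "sqlite",
    "postgresql+asyncpg": "postgresql+psycopg",
}


def to_sync_url(async_url: str) -> str:
    """Swap the async driver in a SQLAlchemy URL for its sync counterpart."""
    scheme, sep, rest = async_url.partition("://")
    if not sep:
        raise ValueError(f"not a SQLAlchemy URL: {async_url!r}")
    # Unknown schemes (already-sync URLs, for instance) pass through unchanged.
    return f"{_SYNC_DRIVERS.get(scheme, scheme)}://{rest}"
```

Where SQLAlchemy is importable, make_url(url).set(drivername=...) does the same job without string handling.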
Storage backends
The storage adapter SPI is documented in ADR-0001 and ADR-0004.
Local filesystem (default)
Objects are addressed by content (blake3:<hex>) and laid out as
<root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>
with atomic writes (tmpfile + fsync + rename).
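The layout and write discipline above fit in a few lines of standard-library Python. This is a sketch, not the backend's implementation: object_path and atomic_write are hypothetical names, the digest is taken as an already-computed hex string (hashlib has no BLAKE3; the third-party blake3 package does), and the final directory fsync is a common POSIX durability step that this guide does not spell out.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def object_path(root: Path, address: str) -> Path:
    """Map '<algorithm>:<hex>' to <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    algorithm, sep, hexdigest = address.partition(":")
    if not sep or len(hexdigest) < 4:
        raise ValueError(f"malformed content address: {address!r}")
    return root / algorithm / hexdigest[0:2] / hexdigest[2:4] / hexdigest


def atomic_write(path: Path, data: bytes) -> None:
    """tmpfile + fsync + rename: readers see either no object or the whole object."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the rename never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # bytes reach disk before the name is published
        os.replace(tmp_name, path)  # atomic rename on POSIX
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    # Persist the directory entry too, so the rename survives a crash (POSIX).
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```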
S3-compatible backend
The s3 backend targets Ceph RGW first, with MinIO as the development
stand-in and AWS S3 as an interoperability check. Install the optional S3
dependency before enabling it:
uv sync --all-extras # --all-extras already installs the s3 extra
Ceph RGW example:
export ARTIFACTSTORE_STORAGE_BACKENDS=local,s3
export ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND=s3
export ARTIFACTSTORE_STORAGE_BACKEND_ROUTES='guide-board:release-evidence=s3,*:*=local'
export ARTIFACTSTORE_S3_ENDPOINT_URL=https://rgw.example.internal
export ARTIFACTSTORE_S3_REGION=us-east-1
export ARTIFACTSTORE_S3_BUCKET=artifact-store
export ARTIFACTSTORE_S3_KEY_PREFIX=prod/artifact-store
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:ARTIFACTSTORE_RGW_ACCESS_KEY
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore-rgw-secret
export ARTIFACTSTORE_S3_STORAGE_CLASS=STANDARD
export ARTIFACTSTORE_S3_SSE=AES256
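A sketch of how ARTIFACTSTORE_STORAGE_BACKEND_ROUTES could be parsed and matched. It assumes rules are tried in the order written and the first match wins, falling back to ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND otherwise; Route, parse_routes, and choose_backend are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    producer: str         # exact producer or "*"
    retention_class: str  # exact retention class or "*"
    backend_id: str


def parse_routes(raw: str) -> list[Route]:
    """Parse comma-separated 'producer:retention_class=backend_id' rules."""
    routes = []
    for rule in filter(None, (r.strip() for r in raw.split(","))):
        selector, eq, backend_id = rule.partition("=")
        producer, colon, retention_class = selector.partition(":")
        if not (eq and colon and producer and retention_class and backend_id):
            raise ValueError(f"malformed route rule: {rule!r}")
        routes.append(Route(producer.strip(), retention_class.strip(), backend_id.strip()))
    return routes


def choose_backend(routes: list[Route], producer: str, retention_class: str, default: str) -> str:
    """Return the first matching rule's backend (assumed order), else the default."""
    for route in routes:
        if route.producer in ("*", producer) and route.retention_class in ("*", retention_class):
            return route.backend_id
    return default
```

Under that first-match assumption, the example's trailing *:*=local rule catches every package that is not guide-board release evidence, so with this route list the s3 default backend would never be used as a fallback.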
Manual smoke against Ceph RGW:
artifactstore health
artifactstore push ./fixtures/smoke \
--producer guide-board \
--subject rgw-smoke \
--retention-class release-evidence
artifactstore storage verify --backend s3
The verification command re-reads stored objects, recomputes the primary
digest, emits v1.storage.location_verified, and marks locations whose
recomputed digest does not match as failed. A nonzero failed-location count degrades /health.
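The verification pass can be sketched as a loop over a backend's recorded locations. Assumptions: Location and the reader callback are hypothetical, SHA-256 stands in for the primary digest because hashlib has no BLAKE3, unreadable bytes are counted as failures, and emitting v1.storage.location_verified events is reduced to setting a status field.

```python
import hashlib
from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class Location:
    object_key: str            # hypothetical: where the backend stored the bytes
    expected_hex: str          # primary digest recorded at ingest time
    status: str = "unverified"


def sha256_hex(data: bytes) -> str:
    # Stand-in digest: artifactstore addresses objects by BLAKE3, which hashlib lacks.
    return hashlib.sha256(data).hexdigest()


def verify_locations(
    locations: Iterable[Location],
    read_bytes: Callable[[str], bytes],
    digest_hex: Callable[[bytes], str] = sha256_hex,
) -> int:
    """Re-read each location, recompute its digest, mark it, and return the failure count."""
    failed = 0
    for loc in locations:
        try:
            ok = digest_hex(read_bytes(loc.object_key)) == loc.expected_hex
        except OSError:
            ok = False  # assumption: missing or unreadable bytes count as failed
        loc.status = "verified" if ok else "failed"
        if not ok:
            failed += 1
    return failed


def health_status(failed_locations: int) -> str:
    """Any failed location degrades /health."""
    return "degraded" if failed_locations else "ok"
```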
CLI reference
artifactstore --help lists every subcommand. The v0.1 set:
| Command | Purpose |
|---|---|
| artifactstore version | Print the package version and exit. |
| artifactstore migrate | Run alembic upgrade head against the configured database. |
| artifactstore replay | Truncate every materialised view and rebuild it from the event log; prints the highest sequence applied. |
| artifactstore health | JSON liveness summary (db, backend, status). Same payload as the HTTP /health endpoint. |
| artifactstore push <dir> | Push a directory through the HTTP API and finalize the package. |
| artifactstore manifest <package_id> | Fetch the JSON manifest projection through the HTTP API. |
| artifactstore retention sweep | Run one deletion-eligibility sweep against the configured DB. |
| artifactstore storage verify --backend <id> | Re-read stored objects for a backend and record verification events. |
| artifactstore guide-board ingest <run-dir> | Ingest one guide-board run directory as an artifact package. |
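The CLI is a thin client over artifactstore.registry.Registry
(see ADR-0005); HTTP-backed commands such as push and manifest reach it through
the HTTP API at ARTIFACTSTORE_API_URL, authenticating with ARTIFACTSTORE_API_TOKEN.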
HTTP reference (v0.1)
| Route family | Purpose |
|---|---|
| GET /, GET /health | Anonymous service banner and liveness summary. |
| GET /docs, GET /openapi.json | FastAPI's interactive OpenAPI docs and generated schema. |
| /packages... | Create, list, inspect, upload files to, finalize, and retrieve manifests for packages. |
| /files... | File metadata and byte downloads, including single-range reads. |
| /uploads... | Upload-session wire shape for whole-body v1 uploads. |
| /packages/{id}/retention... | Extend retention, apply/release holds, and read retention history. |
| POST /metadata-schemas | Register package metadata schemas by slug. |
| GET /events | Long-poll event feed, CBOR by default or JSON with Accept: application/json. |
All routes other than GET / and GET /health require a bearer token; setting
ARTIFACTSTORE_ANON_READ=true additionally allows anonymous access to read endpoints.
End-to-end smoke test (Python library)
This exercises every layer (identity, manifest, events, dataplane, storage, registry, replay) end-to-end against the default SQLite + local FS configuration.
import asyncio
from collections.abc import AsyncIterator

from artifactstore.app import build_registry
from artifactstore.manifest import decode as manifest_decode


async def chunks(data: bytes) -> AsyncIterator[bytes]:
    # ingest_file consumes an async byte stream; one chunk is enough here.
    yield data


async def main() -> None:
    # Build a registry from the current configuration (default: SQLite + local FS).
    registry = build_registry()
    try:
        # 1. Create the package.
        pkg = await registry.create_package(
            name="smoke-test",
            producer="ops",
            subject="example.org",
            retention_class="raw-evidence",
            actor="ops",
            metadata={"smoke": True},
        )
        # 2. Stream one file into it.
        await registry.ingest_file(
            pkg,
            relative_path="hello.txt",
            media_type="text/plain",
            stream=chunks(b"hello world"),
            actor="ops",
        )
        # 3. Finalize; the returned address is printed as the manifest digest.
        manifest_addr = await registry.finalize_package(pkg, actor="ops")
        # 4. Read the manifest back as CBOR and decode it.
        cbor = await registry.get_manifest_bytes(pkg, format="cbor")
        manifest = manifest_decode(cbor)
        print("package:", pkg)
        print("manifest digest:", manifest_addr)
        print("files in manifest:", [f.relative_path for f in manifest.files])
    finally:
        # Release database connections even if a step above fails.
        await registry.dispose()


asyncio.run(main())
Prerequisites: make migrate-fresh has been run so the schema and the
retention class seeds exist.
Guide-board pilot
The guide-board pilot stores a run directory as one artifact package and records
only package identifiers in State Hub. See
docs/pilots/guide-board.md for schema registration,
the real ~/guide-board plus ~/open-cmis-tck smoke procedure, and the exact
POST /progress/ linkage payload.
Replay / disaster recovery
Every state-changing operation writes one row to events and updates the
materialised views in the same transaction
(ADR-0002). If the materialised
views are lost or corrupted, rebuild them from the event log:
artifactstore replay
The command drops every row from artifact_packages, artifact_files,
storage_locations, and retention_state, then replays the events in
sequence order through the canonical view writer. The result is
byte-identical to the materialised state before the replay
(verified by the WP-0001-T013 integration test).
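The replay contract (one view writer, events applied in sequence order, result equal to the live state) can be demonstrated in memory. Everything here is hypothetical: the event kinds, payload fields, and two-view model are illustrations, not artifactstore's schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    sequence: int
    kind: str      # hypothetical kinds: package.created, file.ingested, package.finalized
    payload: dict


@dataclass
class Views:
    packages: dict[str, dict] = field(default_factory=dict)
    files: dict[tuple[str, str], dict] = field(default_factory=dict)


def apply(views: Views, event: Event) -> None:
    """The single view writer, shared by live writes and replay."""
    p = event.payload
    if event.kind == "package.created":
        views.packages[p["package_id"]] = {"name": p["name"], "finalized": False}
    elif event.kind == "file.ingested":
        views.files[(p["package_id"], p["relative_path"])] = {"digest": p["digest"]}
    elif event.kind == "package.finalized":
        views.packages[p["package_id"]]["finalized"] = True


def replay(views: Views, events: list[Event]) -> int:
    """Truncate every view, re-apply events in sequence order, return the highest sequence."""
    views.packages.clear()
    views.files.clear()
    highest = 0
    for event in sorted(events, key=lambda e: e.sequence):
        apply(views, event)
        highest = event.sequence
    return highest
```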
Failure modes operators should expect
| Symptom | Likely cause | Fix |
|---|---|---|
| /health returns status: degraded, db.healthy: false | DB unreachable or migrations not applied | Check ARTIFACTSTORE_DATABASE_URL; run make migrate. |
| /health returns status: degraded, backend.healthy: false | Storage root missing or unreadable | Recreate ARTIFACTSTORE_STORAGE_LOCAL_ROOT or fix permissions. |
| ObjectNotFoundError from get_file | Underlying bytes deleted but the file row remains | Investigate; v1 does not garbage-collect orphaned rows (WP-0006). |
| DuplicateRelativePathError from ingest_file | Same package + path ingested twice | Use a distinct relative_path per file within one package. |
References
- INTENT.md — purpose and scope.
- SCOPE.md — what this repo does and does not own.
- ARCHITECTURE-BLUEPRINT.md — module layout, data model, API shape.
- PLATFORM-AMBITION.md — longer-horizon thesis and the v1 schema commitments.
- ROADMAP.md — workplan sequencing.
- ASSEMBLY-EXPERIMENT.md — opt-in asm research line.
- pilots/guide-board.md — guide-board pilot ingestion and State Hub linkage.