# Operator Guide
Status: v0.1 (WP-0003 baseline)
Updated: 2026-05-16

This guide is the user manual for running `artifact-store` v0.1: the library,
CLI, HTTP ingestion API, manifest surface, retention lifecycle, storage checks,
and the guide-board pilot path.
For architectural background see
[ARCHITECTURE-BLUEPRINT.md](ARCHITECTURE-BLUEPRINT.md), the ADRs under
[adr/](adr/), and the [ROADMAP](ROADMAP.md).
## Prerequisites
- Python 3.12 or 3.13
- [`uv`](https://docs.astral.sh/uv/) on the PATH (one static binary)
- A POSIX-ish shell (Linux, macOS, WSL2)
The pinned tech stack is documented in
[ADR-0005](adr/0005-v1-tech-stack.md).
## Quick start
```sh
uv sync --all-extras # install deps; produces .venv/ and uv.lock
cp .env.example .env # optional — the defaults work out of the box
make migrate-fresh # creates ./var/artifactstore.db and applies migrations
make dev # uvicorn on 127.0.0.1:8000
```
In another terminal:
```sh
curl -s http://127.0.0.1:8000/health | python3 -m json.tool
artifactstore health
```
Both should report `status: ok`.
## Environment variables
Settings prefixed with `ARTIFACTSTORE_` are read by `pydantic-settings` from
the environment and, optionally, from `./.env`. The `STATE_HUB_*` variables at
the end of the table are unprefixed.
| Variable | Default | Purpose |
|-----------------------------------|-----------------------------------------------|---------|
| `ARTIFACTSTORE_DATABASE_URL` | `sqlite+aiosqlite:///./var/artifactstore.db` | SQLAlchemy async URL. Alembic translates `+aiosqlite` and `+asyncpg` to their sync drivers at migration time. |
| `ARTIFACTSTORE_STORAGE_LOCAL_ROOT`| `./var/storage` | Root directory for the local filesystem storage backend. Created on first use. |
| `ARTIFACTSTORE_LOG_LEVEL` | `INFO` | Python logging level (`DEBUG` / `INFO` / `WARNING` / `ERROR`). |
| `ARTIFACTSTORE_AUTH_TOKENS` | empty | Comma- or newline-separated shared-secret bearer tokens for the HTTP API. |
| `ARTIFACTSTORE_ANON_READ` | `false` | Set `true` only for local demos where read endpoints may be anonymous. |
| `ARTIFACTSTORE_API_URL` | `http://127.0.0.1:8000` | Default API base URL used by HTTP-backed CLI commands. |
| `ARTIFACTSTORE_API_TOKEN` | empty | Default bearer token used by HTTP-backed CLI commands. |
| `ARTIFACTSTORE_GUIDE_BOARD_SCHEMA` | `schemas/guide-board.run.v1.json` | Schema path used by guide-board pilot bootstrap helpers. |
| `ARTIFACTSTORE_RETENTION_CONFIG_PATH` | empty | Optional TOML file overriding retention-class default durations. |
| `ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS` | `3600` | Default interval for external schedulers that invoke the retention sweeper. |
| `ARTIFACTSTORE_STORAGE_BACKENDS` | `local` | Comma-separated backend IDs to configure (`local`, `s3`). |
| `ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND` | `local` | Backend used when no routing rule matches. |
| `ARTIFACTSTORE_STORAGE_BACKEND_ROUTES` | empty | Comma-separated `producer:retention_class=backend_id` rules; `*` is a wildcard. |
| `ARTIFACTSTORE_S3_ENDPOINT_URL` | empty | S3-compatible endpoint URL for Ceph RGW / MinIO / AWS S3. |
| `ARTIFACTSTORE_S3_REGION` | `us-east-1` | S3 signing region. |
| `ARTIFACTSTORE_S3_BUCKET` | empty | Bucket/container for artifact objects. |
| `ARTIFACTSTORE_S3_KEY_PREFIX` | empty | Optional object-key prefix before `<algorithm>/<hex...>`. |
| `ARTIFACTSTORE_S3_ACCESS_KEY_REF` | empty | Access key reference, `env:NAME` or `file:/mounted/path`. |
| `ARTIFACTSTORE_S3_SECRET_KEY_REF` | empty | Secret key reference, `env:NAME` or `file:/mounted/path`. |
| `ARTIFACTSTORE_S3_STORAGE_CLASS` | empty | Optional storage class sent on writes. |
| `ARTIFACTSTORE_S3_SSE` | empty | Optional server-side encryption value, e.g. `AES256`. |
| `ARTIFACTSTORE_S3_MULTIPART_THRESHOLD_BYTES` | `67108864` | Multipart threshold for the S3 backend. |
| `ARTIFACTSTORE_S3_MULTIPART_CHUNK_BYTES` | `8388608` | Multipart part size for the S3 backend. |
| `STATE_HUB_URL` | `http://127.0.0.1:8000` | State Hub base URL used by guide-board linkage helpers. |
| `STATE_HUB_WORKSTREAM_ID` | empty | Optional workstream id for State Hub linkage events. |
| `STATE_HUB_TASK_ID` | empty | Optional task id for State Hub linkage events. |
See [`.env.example`](../.env.example) for the canonical template.
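
The routing variables in the table can be read as a small rule table. The sketch below is one plausible interpretation of `ARTIFACTSTORE_STORAGE_BACKEND_ROUTES`, assuming rules are evaluated left to right and the first match wins; that evaluation order and the helper names `parse_routes` and `pick_backend` are illustrative assumptions, not the service's confirmed behaviour.

```python
def parse_routes(spec: str) -> list[tuple[str, str, str]]:
    """Parse comma-separated 'producer:retention_class=backend_id' rules."""
    rules = []
    for raw in spec.split(","):
        raw = raw.strip()
        if not raw:
            continue
        selector, _, backend_id = raw.partition("=")
        producer, _, retention_class = selector.partition(":")
        rules.append((producer.strip(), retention_class.strip(), backend_id.strip()))
    return rules


def pick_backend(rules, producer: str, retention_class: str, default: str) -> str:
    """Return the backend of the first matching rule ('*' matches anything)."""
    # ASSUMPTION: first match wins; the real router may order rules differently.
    for rule_producer, rule_class, backend_id in rules:
        if rule_producer in ("*", producer) and rule_class in ("*", retention_class):
            return backend_id
    # Mirrors ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND: used when no rule matches.
    return default


rules = parse_routes("guide-board:release-evidence=s3,*:*=local")
print(pick_backend(rules, "guide-board", "release-evidence", default="local"))
print(pick_backend(rules, "ops", "raw-evidence", default="local"))
```

With a trailing `*:*` rule the default backend is never consulted, so the catch-all effectively overrides `ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND` under this reading.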
### Retention policy TOML
By default, retention durations come from the seeded `retention_classes`
rows. Operators can override the default duration per class with
`ARTIFACTSTORE_RETENTION_CONFIG_PATH`:
```toml
[retention_classes.transient]
default_duration_seconds = 86400
[retention_classes."raw-evidence"]
default_duration_seconds = 7776000
[retention_classes."summary-evidence"]
default_duration_seconds = 31536000
[retention_classes."release-evidence"]
default_duration_seconds = 220752000
[retention_classes."permanent-record"]
# Omit default_duration_seconds for no expiry.
```
Run `artifactstore retention sweep` from cron or another scheduler to mark
expired, unheld packages eligible for deletion. Then run
`artifactstore retention gc` to release the eligible packages' storage
locations and delete physical objects whose final reference has been
released:
```sh
artifactstore retention sweep
artifactstore retention gc
```
GC is reference-counted by `(backend_id, content_address)`: shared bytes stay in
the backend until every non-deleted storage location has been released. Each
released location emits a `v1.storage.location_deleted` event. A package becomes
`garbage_collected` only after all of its storage locations are released.
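
The reference-counting rule can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the actual GC code: the `Location` class and `release_package` function are hypothetical, and a real pass would work against the `storage_locations` view and emit events instead of mutating objects.

```python
from dataclasses import dataclass


@dataclass
class Location:
    package_id: str
    backend_id: str
    content_address: str
    deleted: bool = False


def release_package(locations: list[Location], package_id: str) -> list[tuple[str, str]]:
    """Release one package's locations; return the object keys that are now unreferenced."""
    deletable: list[tuple[str, str]] = []
    for loc in locations:
        if loc.package_id != package_id or loc.deleted:
            continue
        loc.deleted = True  # stands in for one v1.storage.location_deleted event
        key = (loc.backend_id, loc.content_address)
        still_referenced = any(
            not other.deleted and (other.backend_id, other.content_address) == key
            for other in locations
        )
        if not still_referenced and key not in deletable:
            deletable.append(key)
    return deletable


locations = [
    Location("pkg-a", "local", "blake3:aa"),
    Location("pkg-a", "local", "blake3:bb"),
    Location("pkg-b", "local", "blake3:aa"),  # shares bytes with pkg-a
]
first = release_package(locations, "pkg-a")
second = release_package(locations, "pkg-b")
print(first)   # only the object pkg-b does not share
print(second)  # the shared object, once its last reference is gone
```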
## Database backends
### SQLite (development default)
Zero-config. The database file lives at `./var/artifactstore.db` by default
and is gitignored.
```sh
make migrate-fresh # drop and re-create
make migrate # idempotent: apply pending migrations
```
### PostgreSQL 16+ (shared deployments)
Install the optional `postgres` extra (pulls in `psycopg[binary]` for
Alembic's sync driver):
```sh
uv sync --extra postgres
```
Set the URL with the async driver; Alembic switches to `+psycopg` for
migrations automatically:
```sh
export ARTIFACTSTORE_DATABASE_URL=postgresql+asyncpg://artifactstore:secret@db.internal:5432/artifactstore
make migrate
```
The schema is identical to the SQLite one. Per
[ADR-0002](adr/0002-event-log-source-of-truth.md), the events table drives
all materialised views.
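
The driver switch that happens at migration time can be pictured as a scheme rewrite on the URL. The sketch below is an assumption about the shape of that translation, not a copy of the project's Alembic `env.py`: the function name `to_sync_url` is hypothetical, and mapping `sqlite+aiosqlite` to plain `sqlite` is a guess at the sync counterpart (only `+psycopg` is named in this guide).

```python
# ASSUMPTION: illustrative mapping; only the asyncpg -> psycopg pair is documented above.
SYNC_DRIVERS = {
    "sqlite+aiosqlite": "sqlite",
    "postgresql+asyncpg": "postgresql+psycopg",
}


def to_sync_url(url: str) -> str:
    """Swap an async SQLAlchemy driver for its sync counterpart, leaving the rest intact."""
    scheme, sep, rest = url.partition("://")
    return SYNC_DRIVERS.get(scheme, scheme) + sep + rest


print(to_sync_url("sqlite+aiosqlite:///./var/artifactstore.db"))
print(to_sync_url("postgresql+asyncpg://artifactstore:secret@db.internal:5432/artifactstore"))
```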
## Storage backends
The storage adapter SPI is documented in
[ADR-0001](adr/0001-content-addressed-storage.md) and
[ADR-0004](adr/0004-control-plane-data-plane-contract.md).
### Local filesystem (default)
Objects are addressed by content (`blake3:<hex>`) and laid out as
```
<root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>
```
with atomic writes (tmpfile + fsync + rename).
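
The layout and the write discipline can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the backend's source: `object_path` and `atomic_write` are hypothetical names, and the hex digest in the demo is a placeholder, not a real BLAKE3 hash.

```python
import os
import tempfile
from pathlib import Path


def object_path(root: Path, content_address: str) -> Path:
    """Map 'algorithm:hex' to <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    algorithm, _, hex_digest = content_address.partition(":")
    return root / algorithm / hex_digest[0:2] / hex_digest[2:4] / hex_digest


def atomic_write(path: Path, data: bytes) -> None:
    """Write via a temp file in the same directory, fsync it, then rename into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as root:
    address = "blake3:" + "ab12" + "0" * 60  # placeholder digest for illustration
    target = object_path(Path(root), address)
    atomic_write(target, b"hello world")
    print(target.relative_to(root).as_posix())
    print(target.read_bytes())
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes it atomic; readers see either the old state or the complete new object.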
### S3-compatible backend
The `s3` backend targets Ceph RGW first, with MinIO as the development
stand-in and AWS S3 as an interoperability check. Install the optional S3
dependency before enabling it:
```sh
uv sync --extra s3
```
Ceph RGW example:
```sh
export ARTIFACTSTORE_STORAGE_BACKENDS=local,s3
export ARTIFACTSTORE_STORAGE_DEFAULT_BACKEND=s3
export ARTIFACTSTORE_STORAGE_BACKEND_ROUTES='guide-board:release-evidence=s3,*:*=local'
export ARTIFACTSTORE_S3_ENDPOINT_URL=https://rgw.example.internal
export ARTIFACTSTORE_S3_REGION=us-east-1
export ARTIFACTSTORE_S3_BUCKET=artifact-store
export ARTIFACTSTORE_S3_KEY_PREFIX=prod/artifact-store
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=env:ARTIFACTSTORE_RGW_ACCESS_KEY
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore-rgw-secret
export ARTIFACTSTORE_S3_STORAGE_CLASS=STANDARD
export ARTIFACTSTORE_S3_SSE=AES256
```
Manual smoke test against Ceph RGW:
```sh
artifactstore health
artifactstore push ./fixtures/smoke \
--producer guide-board \
--subject rgw-smoke \
--retention-class release-evidence
artifactstore storage verify --backend s3
```
The verification command re-reads stored objects, recomputes the primary
digest, emits `v1.storage.location_verified`, and marks failed locations as
`failed`. A nonzero failed-location count degrades `/health`.
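
Conceptually, verification is a re-hash and compare. The sketch below shows that idea only; it uses SHA-256 from `hashlib` as a stand-in because BLAKE3 is not in the standard library, and the function name `verify_location` and the `"verified"` status string are assumptions (only `failed` is named in this guide).

```python
import hashlib


def verify_location(stored: bytes, expected_address: str) -> str:
    """Recompute the digest of the stored bytes and compare it to the content address."""
    algorithm, _, expected_hex = expected_address.partition(":")
    actual_hex = hashlib.new(algorithm, stored).hexdigest()
    return "verified" if actual_hex == expected_hex else "failed"


# Stand-in address: sha256 instead of the store's blake3 primary digest.
address = "sha256:" + hashlib.sha256(b"hello world").hexdigest()

results = [
    verify_location(b"hello world", address),  # intact object
    verify_location(b"hello w0rld", address),  # simulated bit rot
]
failed = results.count("failed")
print(results, "health:", "degraded" if failed else "ok")
```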
## CLI reference
`artifactstore --help` lists every subcommand. The v0.1 set:
| Command | Purpose |
|--------------------------|---------|
| `artifactstore version` | Print the package version and exit. |
| `artifactstore migrate` | Run `alembic upgrade head` against the configured database. |
| `artifactstore replay` | Truncate every materialised view and rebuild it from the event log; prints the highest sequence applied. |
| `artifactstore health` | JSON liveness summary (db, backend, status). Same payload as the HTTP `/health` endpoint. |
| `artifactstore push <dir>` | Push a directory through the HTTP API and finalize the package. |
| `artifactstore manifest <package_id>` | Fetch the JSON manifest projection through the HTTP API. |
| `artifactstore retention sweep` | Run one deletion-eligibility sweep against the configured DB. |
| `artifactstore retention gc` | Run one reference-counted garbage-collection pass. |
| `artifactstore storage verify --backend <id>` | Re-read stored objects for a backend and record verification events. |
| `artifactstore guide-board ingest <run-dir>` | Ingest one guide-board run directory as an artifact package. |
The CLI is a thin client over `artifactstore.registry.Registry`
(see [ADR-0005](adr/0005-v1-tech-stack.md)).
## HTTP reference (v0.1)
| Route family | Purpose |
|-----------------------|---------|
| `GET /`, `GET /health` | Anonymous service banner and liveness summary. |
| `GET /docs`, `GET /openapi.json` | FastAPI's interactive OpenAPI docs and generated schema. |
| `/packages...` | Create, list, inspect, upload files to, finalize, and retrieve manifests for packages. |
| `/files...` | File metadata and byte downloads, including single-range reads. |
| `/uploads...` | Upload-session wire shape for whole-body v1 uploads. |
| `/packages/{id}/retention...` | Extend retention, apply/release holds, and read retention history. |
| `POST /metadata-schemas` | Register package metadata schemas by slug. |
| `GET /events` | Long-poll event feed, CBOR by default or JSON with `Accept: application/json`. |
All non-health routes require a bearer token unless
`ARTIFACTSTORE_ANON_READ=true` is set for read endpoints.
## End-to-end smoke test (Python library)
This exercises every layer (identity, manifest, events, dataplane, storage,
registry, replay) end-to-end against the default SQLite + local FS configuration.
```python
import asyncio
from collections.abc import AsyncIterator

from artifactstore.app import build_registry
from artifactstore.manifest import decode as manifest_decode


async def chunks(data: bytes) -> AsyncIterator[bytes]:
    # Single-chunk async stream standing in for a real file reader.
    yield data


async def main() -> None:
    registry = build_registry()
    try:
        pkg = await registry.create_package(
            name="smoke-test",
            producer="ops",
            subject="example.org",
            retention_class="raw-evidence",
            actor="ops",
            metadata={"smoke": True},
        )
        await registry.ingest_file(
            pkg, relative_path="hello.txt", media_type="text/plain",
            stream=chunks(b"hello world"), actor="ops",
        )
        manifest_addr = await registry.finalize_package(pkg, actor="ops")
        # Read the manifest back as CBOR and decode it to check the round trip.
        cbor = await registry.get_manifest_bytes(pkg, format="cbor")
        manifest = manifest_decode(cbor)
        print("package:", pkg)
        print("manifest digest:", manifest_addr)
        print("files in manifest:", [f.relative_path for f in manifest.files])
    finally:
        # Always release database and storage resources.
        await registry.dispose()


asyncio.run(main())
```
Prerequisite: run `make migrate-fresh` first so that the schema and the
retention-class seeds exist.
## Guide-board pilot
The guide-board pilot stores a run directory as one artifact package and records
only package identifiers in State Hub. See
[docs/pilots/guide-board.md](pilots/guide-board.md) for schema registration,
the real `~/guide-board` plus `~/open-cmis-tck` smoke procedure, and the exact
`POST /progress/` linkage payload.
## Replay / disaster recovery
Every state-changing operation writes one row to `events` and updates the
materialised views in the same transaction
([ADR-0002](adr/0002-event-log-source-of-truth.md)). If the materialised
views are lost or corrupted, rebuild them from the event log:
```sh
artifactstore replay
```
The command drops every row from `artifact_packages`, `artifact_files`,
`storage_locations`, and `retention_state`, then replays the events in
sequence order through the canonical view writer. The result is
**byte-identical** to the materialised state before the replay
(verified by the WP-0001-T013 integration test).
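
The replay guarantee rests on the view writer being a deterministic function of the event sequence. The toy model below illustrates that property; it is a sketch with hypothetical event types and a dictionary standing in for the materialised tables, not the real event schema or view writer.

```python
import json


def apply_event(view: dict, event: dict) -> None:
    """Toy view writer: fold one event into the materialised view."""
    # ASSUMPTION: event type names here are invented for illustration.
    kind = event["type"]
    if kind == "package_created":
        view[event["package_id"]] = {"state": "open", "files": []}
    elif kind == "file_ingested":
        view[event["package_id"]]["files"].append(event["relative_path"])
    elif kind == "package_finalized":
        view[event["package_id"]]["state"] = "finalized"


def replay(events) -> dict:
    """Start from an empty view and re-apply every event in sequence order."""
    view: dict = {}
    for event in sorted(events, key=lambda e: e["sequence"]):
        apply_event(view, event)
    return view


events = [
    {"sequence": 1, "type": "package_created", "package_id": "pkg-1"},
    {"sequence": 2, "type": "file_ingested", "package_id": "pkg-1", "relative_path": "hello.txt"},
    {"sequence": 3, "type": "package_finalized", "package_id": "pkg-1"},
]

# "Live" view, updated as each event was written.
live_view: dict = {}
for event in events:
    apply_event(live_view, event)

# Rebuilt view after the materialised state was lost.
rebuilt = replay(events)
print(json.dumps(rebuilt, sort_keys=True) == json.dumps(live_view, sort_keys=True))
```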
## Failure modes operators should expect
| Symptom | Likely cause | Fix |
|--------------------------------------------------|----------------------------------------------|-----|
| `/health` returns `status: degraded`, `db.healthy: false` | DB unreachable or migrations not applied | Check `ARTIFACTSTORE_DATABASE_URL`; run `make migrate`. |
| `/health` returns `status: degraded`, `backend.healthy: false` | Storage root missing or unreadable | Recreate `ARTIFACTSTORE_STORAGE_LOCAL_ROOT` or fix permissions. |
| `ObjectNotFoundError` from `get_file` | Underlying bytes deleted but the file row remains | Investigate; v1 does not garbage-collect orphaned rows (WP-0006). |
| `DuplicateRelativePathError` from `ingest_file` | Same package + path ingested twice | Use a distinct `relative_path` per file within one package. |
## References
- [INTENT.md](../INTENT.md) — purpose and scope.
- [SCOPE.md](../SCOPE.md) — what this repo does and does not own.
- [ARCHITECTURE-BLUEPRINT.md](ARCHITECTURE-BLUEPRINT.md) — module layout,
data model, API shape.
- [PLATFORM-AMBITION.md](PLATFORM-AMBITION.md) — longer-horizon thesis
and the v1 schema commitments.
- [ROADMAP.md](ROADMAP.md) — workplan sequencing.
- [ASSEMBLY-EXPERIMENT.md](ASSEMBLY-EXPERIMENT.md) — opt-in asm research line.
- [pilots/guide-board.md](pilots/guide-board.md) — guide-board pilot ingestion
and State Hub linkage.
### Architecture Decision Records
- [ADR-0001 — Content-Addressed Storage with Dual Digest](adr/0001-content-addressed-storage.md)
- [ADR-0002 — Append-Only Event Log as Source of Truth](adr/0002-event-log-source-of-truth.md)
- [ADR-0003 — Manifest Canonicalisation = Canonical CBOR](adr/0003-manifest-canonical-cbor.md)
- [ADR-0004 — Control Plane / Data Plane Contract](adr/0004-control-plane-data-plane-contract.md)
- [ADR-0005 — V1 Technology Stack](adr/0005-v1-tech-stack.md)
- [ADR-0006 — OCI Artifact Compatibility Kept Reachable](adr/0006-oci-compatibility-reachable.md)