artifact-store/workplans/ARTIFACT-STORE-WP-0004-s3-compatible-backend.md
tegwick 747afc27a6 docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans
Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00


---
id: ARTIFACT-STORE-WP-0004
type: workplan
title: "S3-Compatible Backend (Ceph RGW Target)"
repo: artifact-store
domain: stack
status: planned
owner: codex
topic_slug: stack
planning_priority: medium
planning_order: 4
created: "2026-05-15"
updated: "2026-05-15"
---
# ARTIFACT-STORE-WP-0004: S3-Compatible Backend
## Purpose
Add a second concrete storage backend that speaks the S3 protocol.
Validated targets: Ceph RGW (primary self-hosted production target),
MinIO (dev / CI), AWS S3 (interop check). The backend must satisfy
the storage SPI without any leaks of S3-specific concepts into the
registry.
## Constraints
- `storage.spi.StorageBackend` Protocol from WP-0001 is the contract.
- No S3 vocabulary leaks into `registry.*` or `api.*`.
- The storage-backend section of `docs/ARCHITECTURE-BLUEPRINT.md` applies.
## Prerequisites
- WP-0001 done (SPI exists, local backend exists as a reference).
## D4.1 - Configuration Surface
```task
id: ARTIFACT-STORE-WP-0004-T001
status: todo
priority: high
state_hub_task_id: "7b980a55-2364-48c3-98ac-081629a8d2b7"
```
Acceptance:
- `s3` backend configuration accepts: `endpoint_url`, `region`,
`bucket`, `key_prefix`, `access_key_ref`, `secret_key_ref`,
`storage_class`, `sse` (optional), `multipart_threshold_bytes`,
`multipart_chunk_bytes`.
- Credential references resolve from env vars or mounted files; never
from request bodies.
- A Ceph RGW configuration example is documented and checked in to
`docs/OPERATOR.md`.
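
A minimal sketch of the configuration surface above, assuming a frozen dataclass and a hypothetical `env:NAME` / `file:/path` syntax for the credential references. The class name `S3BackendConfig`, the helper `resolve_credential_ref`, the reference syntax, and every default value are illustrative assumptions, not decisions recorded in this workplan. The only external fact relied on is S3's 5 MiB minimum size for non-final multipart parts.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# S3 rejects non-final multipart parts smaller than 5 MiB.
S3_MIN_PART_BYTES = 5 * 1024 * 1024


def resolve_credential_ref(ref: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve a reference to a secret value.

    Assumed (hypothetical) syntax: 'env:NAME' reads an environment
    variable, 'file:/path' reads a mounted file. Request bodies are
    never consulted.
    """
    env = os.environ if env is None else env
    scheme, sep, target = ref.partition(":")
    if not sep or not target:
        raise ValueError(f"expected 'env:NAME' or 'file:/path', got {ref!r}")
    if scheme == "env":
        if target not in env:
            raise ValueError(f"environment variable {target!r} is not set")
        return env[target]
    if scheme == "file":
        # Mounted secrets usually end with a newline; strip it.
        return Path(target).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported credential reference scheme: {scheme!r}")


@dataclass(frozen=True)
class S3BackendConfig:
    """Field names follow the acceptance list; defaults are assumptions."""

    endpoint_url: str
    region: str
    bucket: str
    key_prefix: str
    access_key_ref: str
    secret_key_ref: str
    storage_class: str = "STANDARD"
    sse: Optional[str] = None  # optional per the acceptance list
    multipart_threshold_bytes: int = 64 * 1024 * 1024
    multipart_chunk_bytes: int = 16 * 1024 * 1024

    def __post_init__(self) -> None:
        if self.multipart_chunk_bytes < S3_MIN_PART_BYTES:
            raise ValueError("multipart_chunk_bytes must be at least 5 MiB")
        if self.multipart_threshold_bytes <= 0:
            raise ValueError("multipart_threshold_bytes must be positive")
```

Storing references rather than secret values keeps credentials out of the serialized configuration; the backend would call `resolve_credential_ref` only when it builds its client.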
## D4.2 - S3 Backend Implementation
```task
id: ARTIFACT-STORE-WP-0004-T002
status: todo
priority: high
```
Acceptance:
- `storage.backends.s3.S3Backend` implements the SPI using `aioboto3`
or `aiobotocore` (choose whichever is better maintained at
implementation time and record the decision in this workplan).
- Object key layout
`<key_prefix>/<digest_algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>`.
- `put` uses multipart for objects above the configured threshold.
- `get` supports `Range`.
- `head`, `delete`, `health` implemented.
- `delete` is idempotent (delete-of-missing returns success).
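
The key layout, the multipart threshold rule, and the `Range` request can be sketched as pure helpers, independent of whichever S3 client library is chosen. The function names `object_key`, `use_multipart`, and `range_header` are hypothetical, and the handling of an empty `key_prefix` (omit the leading segment) is an assumption the workplan does not settle.

```python
from __future__ import annotations

from typing import Optional

_HEX_DIGITS = set("0123456789abcdef")


def object_key(key_prefix: str, digest_algorithm: str, hex_digest: str) -> str:
    """Build <key_prefix>/<digest_algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    hex_digest = hex_digest.lower()
    if len(hex_digest) < 4 or not set(hex_digest) <= _HEX_DIGITS:
        raise ValueError(f"not a usable hex digest: {hex_digest!r}")
    parts = [digest_algorithm, hex_digest[0:2], hex_digest[2:4], hex_digest]
    prefix = key_prefix.strip("/")
    if prefix:  # assumption: an empty prefix contributes no path segment
        parts.insert(0, prefix)
    return "/".join(parts)


def use_multipart(size_bytes: int, threshold_bytes: int) -> bool:
    """Multipart applies to objects strictly above the configured threshold."""
    return size_bytes > threshold_bytes


def range_header(start: int, end: Optional[int] = None) -> str:
    """Format an HTTP Range header value; `end` is inclusive, None means to EOF."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    return f"bytes={start}-" if end is None else f"bytes={start}-{end}"
```

The two-level fan-out on the first four hex characters keeps any single key prefix from accumulating every object, which matters mostly for listing and for operators browsing the bucket.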
## D4.3 - Backend Selection And Routing
```task
id: ARTIFACT-STORE-WP-0004-T003
status: todo
priority: medium
```
Acceptance:
- A registry can have multiple backends configured; package creation
records which backend a file is stored in.
- Per-package backend selection rule: configurable function of
`retention_class` + producer; default routes everything to a single
backend.
- `storage_locations.backend_id` reflects the actual storage.
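
One way to model the selection rule is as a plain callable from `(retention_class, producer)` to a backend id, with the single-backend default as the trivial case. This is a sketch under assumptions: `SelectionRule`, `single_backend_rule`, `table_rule`, and `select_backend` are hypothetical names, and the example retention classes and producer strings in the usage note are invented for illustration.

```python
from __future__ import annotations

from typing import Callable, Collection, Mapping, Tuple

# (retention_class, producer) -> backend_id
SelectionRule = Callable[[str, str], str]


def single_backend_rule(backend_id: str) -> SelectionRule:
    """Default behaviour: route every package to one backend."""
    def rule(retention_class: str, producer: str) -> str:
        return backend_id
    return rule


def table_rule(table: Mapping[Tuple[str, str], str], default: str) -> SelectionRule:
    """Look up an exact (retention_class, producer) pair, else fall back."""
    def rule(retention_class: str, producer: str) -> str:
        return table.get((retention_class, producer), default)
    return rule


def select_backend(
    rule: SelectionRule,
    configured: Collection[str],
    retention_class: str,
    producer: str,
) -> str:
    """Apply the rule and refuse ids that are not configured.

    The returned id is what package creation would record, so that
    `storage_locations.backend_id` matches where the bytes actually live.
    """
    backend_id = rule(retention_class, producer)
    if backend_id not in configured:
        raise LookupError(f"rule selected unconfigured backend {backend_id!r}")
    return backend_id
```

For example, `table_rule({("long-term", "ci"): "s3"}, default="local")` would send only that pair to `s3` and everything else to `local`. Validating the result against the configured set catches a stale rule before any bytes are written.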
## D4.4 - Test Strategy: MinIO In CI, RGW As Documented Manual Smoke
```task
id: ARTIFACT-STORE-WP-0004-T004
status: todo
priority: high
```
Acceptance:
- Integration tests run against MinIO via `testcontainers-python`
(or a docker-compose fixture if testcontainers fights the WSL2
environment).
- A documented manual procedure tests against a real Ceph RGW
endpoint; results recorded in `docs/OPERATOR.md`.
- No CI dependency on a live Ceph or AWS account.
## D4.5 - Verification Pass
```task
id: ARTIFACT-STORE-WP-0004-T005
status: todo
priority: medium
```
Acceptance:
- `artifactstore storage verify --backend s3` re-reads every object in
the backend, recomputes its primary digest, and emits
`v1.storage.location_verified` events.
- Mismatches are reported as `failed` locations and surfaced via the
health endpoint.
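
The core of the verification pass is a streaming recompute-and-compare loop. The sketch below assumes the caller supplies `(key, expected_hex, chunks)` triples and uses `hashlib` SHA-256 as a stand-in, because this workplan does not name the primary digest algorithm. The event dictionary shape and the `verified` status string are also assumptions; only the event name `v1.storage.location_verified` and the `failed` status come from the acceptance criteria.

```python
from __future__ import annotations

import hashlib
from typing import Dict, Iterable, Iterator, Tuple

EVENT_TYPE = "v1.storage.location_verified"


def recompute_digest(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Hash an object incrementally so large objects never sit fully in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def verify_locations(
    objects: Iterable[Tuple[str, str, Iterable[bytes]]],
    backend_id: str = "s3",
    algorithm: str = "sha256",  # stand-in; the real primary digest may differ
) -> Iterator[Dict[str, str]]:
    """Yield one event per object; a digest mismatch marks the location failed."""
    for key, expected_hex, chunks in objects:
        actual_hex = recompute_digest(chunks, algorithm)
        matches = actual_hex == expected_hex.lower()
        yield {
            "type": EVENT_TYPE,
            "backend_id": backend_id,
            "key": key,
            "status": "verified" if matches else "failed",
            "expected": expected_hex.lower(),
            "actual": actual_hex,
        }
```

Emitting an event for every object, not only for mismatches, gives the event log a record of when each location was last checked; the health endpoint would then only need to look for `failed` entries.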
## Success criteria
- The same package ingestion flow that worked against `local` in
WP-0001 works unchanged against `s3`.
- Switching backends by config alone, with no code changes in the
registry or API layers, is the smoke test.