---
id: ARTIFACT-STORE-WP-0001
type: workplan
title: "Foundation: Scaffold, Core Kernels, Local FS Backend"
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 1
created: "2026-05-15"
updated: "2026-05-16"
state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb"
---

# ARTIFACT-STORE-WP-0001: Foundation - Scaffold, Core Kernels, Local FS Backend

## Purpose

Stand up the smallest credible `artifact-store` core. By the end of this workplan, the library can ingest a directory of files into a package, compute dual digests, write canonical-CBOR manifests, persist state through the append-only event log, store bytes on local filesystem, and replay materialised views from the event log. No HTTP API yet (that lands in WP-0002); a `/health` endpoint exists so that the dev loop has something to hit.

The shape is **library-first** (ffmpeg-style). HTTP server and CLI are explicitly thin consumers of `artifactstore.registry`.

## Constraints (must satisfy)

- ADR-0001: content-addressed storage with dual digest.
- ADR-0002: append-only event log as source of truth.
- ADR-0003: manifest canonicalisation = canonical CBOR.
- ADR-0004: control plane / data plane SPI named.
- ADR-0005: v1 technology stack pinned (Python 3.12, uv, FastAPI, SQLAlchemy Core, asyncpg, alembic, cbor2, blake3, ruff, mypy, pytest).
- ADR-0006: OCI compatibility kept reachable.
- `docs/ARCHITECTURE-BLUEPRINT.md` data model and module layout.

## Boundary

This workplan builds the library and a minimal `/health` endpoint. It does NOT implement: package CRUD HTTP API (WP-0002), retention rules beyond the seed (WP-0003), S3-compatible backend (WP-0004), guide-board producer wiring (WP-0005), GC of unreferenced bytes (WP-0006).

## Target architecture (this workplan)

```text
artifactstore (library)
  identity ──┐
  manifest ──┼──> registry (orchestrator) ──> events (WAL + views)
  events  ───┘          │
  retention (seed only) └──> dataplane.spi ──> dataplane.inproc ──> storage.spi ──> storage.backends.local
  audit (view)                                                                          └──> filesystem
  storage.spi
  dataplane.spi + inproc
  api.http (just /health)
  cli (just `artifactstore version`, `artifactstore migrate`, `artifactstore replay`)
```

## D1.1 - Service Scaffold And Repository Identity

```task
id: ARTIFACT-STORE-WP-0001-T001
status: done
priority: high
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"
```

Acceptance:

- `pyproject.toml` with `hatchling` build backend, pinned dependencies per ADR-0005.
- `uv.lock` committed.
- `Makefile` exposes: `make dev`, `make test`, `make lint`, `make type`, `make migrate`. Each target is a thin shim, no logic inline.
- `src/artifactstore/` package skeleton matches ADR-0005's layout (empty `__init__.py` and one placeholder module per top-level concern: `identity`, `manifest`, `events`, `retention`, `audit`, `storage`, `dataplane`, `registry`, `api/http`, `cli`, `config`).
- `tests/{unit,integration}/conftest.py` in place.
- `.env.example` documents required environment variables: `ARTIFACTSTORE_DATABASE_URL`, `ARTIFACTSTORE_STORAGE_LOCAL_ROOT`, `ARTIFACTSTORE_LOG_LEVEL`.
- CI-equivalent local commands: `make lint && make type && make test` pass on a clean checkout.
- `README.md` replaces the seed README: install with `uv sync`, run with `make dev`, test with `make test`, links to ADRs and blueprint.

## D1.2 - Digest Abstraction And Content Address

```task
id: ARTIFACT-STORE-WP-0001-T009
status: done
priority: high
state_hub_task_id: "4dc465c5-5c14-412d-b8c0-aa84076e4560"
```

Acceptance:

- `identity.Digest` value type with `algorithm: str` and `hex: str`, immutable, hashable.
- `identity.ContentAddress`: string-form `:` with validating parser and emitter.
- `identity.digest_stream(reader) -> {primary: Digest, sha256: Digest}`: single-pass dual-hash over an `AsyncIterator[bytes]`. Default primary algorithm: `blake3`.
- Algorithm registry with `blake3` and `sha256` registered at import.
- Property test: digest over random byte sequences round-trips through serialisation; `sha256` matches `hashlib.sha256(...).hexdigest()`; `blake3` matches `blake3.blake3(...).hexdigest()`.

## D1.3 - Manifest Codec (Canonical CBOR + JCS Projection)

```task
id: ARTIFACT-STORE-WP-0001-T010
status: done
priority: high
state_hub_task_id: "8b45a3d9-aa19-4ae8-afe0-687417bf12d0"
```

Acceptance:

- `manifest.Manifest` dataclass with the v1 fields enumerated in the blueprint (`manifest_version=1`, package, files, storage_receipts, retention_summary, provenance).
- `manifest.codec.encode(m) -> bytes` produces canonical CBOR (RFC 8949 §4.2.2): definite-length, shortest-form integers, sorted map keys.
- `manifest.codec.decode(b) -> Manifest`.
- `manifest.projection.jcs(m) -> bytes` produces RFC 8785 canonical JSON.
- Property test: `decode(encode(m)) == m` for randomly-generated manifests; `encode(decode(jcs_to_cbor(jcs(m)))) == encode(m)`.
- Manifest digest helper: `manifest_digest(m) -> ContentAddress` using BLAKE3 over the canonical CBOR bytes.

## D1.4 - Registry Data Model And Migrations

```task
id: ARTIFACT-STORE-WP-0001-T002
status: done
priority: high
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"
```

Acceptance:

- Alembic configured with `migrations/` directory; `alembic upgrade head` works against both SQLite (dev) and PostgreSQL (prod).
- `events`, `artifact_packages`, `artifact_files`, `storage_locations`, `retention_classes`, `retention_state`, `metadata_schemas` tables match the blueprint schema.
- Seed migration populates `retention_classes` with the five v1 entries.
- The `make migrate` and `make migrate-fresh` targets work end-to-end on a clean DB.
- All schema columns required by ADR-0001 (`digest_algorithm`, `digest_primary`, `digest_sha256`, `content_address`), ADR-0002 (full `events` table), and the blueprint's `retrieval_tier` and `restore_status` are present.

## D1.5 - Event Log Persistence And Replay

```task
id: ARTIFACT-STORE-WP-0001-T011
status: done
priority: high
state_hub_task_id: "90fce17d-cce5-4687-ae9e-02abd7d92622"
```

Acceptance:

- `events.write(transaction, Event)` writes one row in the given DB transaction. Sequence numbers are assigned by the DB (`BIGSERIAL`) and are guaranteed monotonic and gapless within a registry instance.
- `events.tail(since_sequence) -> AsyncIterator[Event]` long-polls the table (notify-style on PostgreSQL via `LISTEN/NOTIFY`, poll-style on SQLite).
- `events.replay(into=ViewWriter)` rebuilds all materialised view tables from `events` deterministically.
- Test: ingesting a fixed sequence of events, then rebuilding the views from scratch, yields byte-identical materialised state.
- Event payloads use canonical CBOR (`manifest.codec`) so the same bytes flow through registry → DB → tail consumer without re-encoding.

## D1.6 - Storage Adapter SPI And Local Filesystem Backend

```task
id: ARTIFACT-STORE-WP-0001-T003
status: done
priority: high
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"
```

Acceptance:

- `storage.spi.StorageBackend` Protocol matches the blueprint.
- `storage.backends.local.LocalBackend` implements the SPI:
  - Object key layout `////`.
  - Atomic write via `fsync(tmpfile) + rename`.
  - Path traversal rejected at the SPI boundary.
  - `health()` returns disk usage and root accessibility.
- Backend registry resolves by `backend_id` string (per ADR-0004).
- Unit tests cover: put, get, head, delete, double-put idempotency, delete-of-missing, range read.

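
The atomic-write and path-traversal requirements in D1.6 could be approached roughly as follows. This is a minimal standard-library sketch, not the `LocalBackend` implementation: the helper names `resolve_key` and `atomic_put` are hypothetical, the key used in the usage note is a placeholder rather than the real object key layout, and the real SPI is asynchronous and defined by the blueprint.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def resolve_key(root: Path, key: str) -> Path:
    """Map an object key to a path under root (hypothetical helper)."""
    root = root.resolve()
    candidate = (root / key).resolve()
    # Reject empty keys, absolute keys, and keys that climb out via "..".
    if candidate == root or not candidate.is_relative_to(root):
        raise ValueError(f"key escapes storage root: {key!r}")
    return candidate


def atomic_put(root: Path, key: str, data: bytes) -> Path:
    """Write data so readers see either no object or the complete object."""
    target = resolve_key(root, key)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # flush bytes to disk before publishing
        os.replace(tmp_name, target)  # atomic rename; overwrites on re-put
    except BaseException:
        # Clean up the temp file if the write or rename failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return target
```

Calling `atomic_put(root, "aa/bb/object", b"...")` twice with the same bytes leaves a single file behind, which is one way to satisfy the double-put idempotency test. On POSIX systems a stricter variant might also `fsync` the parent directory after the rename; whether that is required here is not stated in this workplan.
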
## D1.7 - Data Plane SPI And In-Process Implementation

```task
id: ARTIFACT-STORE-WP-0001-T012
status: done
priority: high
state_hub_task_id: "8cb8a245-beb5-4713-8d1d-8a623431ad81"
```

Acceptance:

- `dataplane.spi.DataPlane` Protocol matches ADR-0004.
- `dataplane.inproc.InProcessDataPlane` implements all five operations on top of a configured `StorageBackend`.
- `ingest_stream` computes both digests in a single pass, writes to the backend keyed by the primary content address, and returns an `IngestResult` containing both digests, size, and the `StorageReceipt`.
- `serve_object` and `verify_object` re-read bytes through the backend; `verify_object` re-digests and returns mismatches if any.
- Lint rule (or test): no code outside `dataplane.*` imports `storage.backends.*` directly.

## D1.8 - Registry Orchestrator (Library Surface)

```task
id: ARTIFACT-STORE-WP-0001-T013
status: done
priority: high
state_hub_task_id: "f4967308-4613-4def-8c09-41caaeb631f7"
```

Acceptance:

- `registry.Registry` exposes: `create_package`, `ingest_file`, `finalize_package`, `get_manifest_bytes` (CBOR + JCS), `get_file`, `tail_events`, plus stubs for the retention operations that land in WP-0003.
- Each mutating operation is one DB transaction that writes events AND updates materialised views.
- Finalisation writes one `v1.package.finalized` event whose payload *is* the canonical CBOR manifest, and stamps `manifest_digest` on `artifact_packages`.
- Duplicate `relative_path` within one not-yet-finalised package is rejected unless an explicit replace is requested.
- Integration test: end-to-end ingest of a 3-file package against local backend → finalize → read manifest → verify digests → tail events → replay rebuilds identical state.

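
The "one DB transaction that writes events AND updates materialised views" rule, together with replay, can be illustrated with a small sketch. It makes several simplifying assumptions: it uses the standard-library `sqlite3` driver instead of SQLAlchemy Core, JSON payloads instead of canonical CBOR, a reduced two-table schema, and a hypothetical event name `v1.file.ingested`. None of these are taken from the blueprint.

```python
import json
import sqlite3

# Reduced, illustrative schema; the real tables are defined by the blueprint.
SCHEMA = """
CREATE TABLE events (
    sequence INTEGER PRIMARY KEY AUTOINCREMENT,
    type     TEXT NOT NULL,
    payload  TEXT NOT NULL
);
CREATE TABLE artifact_files (
    package_id      TEXT NOT NULL,
    relative_path   TEXT NOT NULL,
    content_address TEXT NOT NULL,
    PRIMARY KEY (package_id, relative_path)
);
"""


def apply_to_view(conn: sqlite3.Connection, event_type: str, payload: dict) -> None:
    """Project one event onto the view; shared by live writes and replay."""
    if event_type == "v1.file.ingested":  # hypothetical event name
        conn.execute(
            "INSERT INTO artifact_files "
            "VALUES (:package_id, :relative_path, :content_address)",
            payload,
        )


def ingest_file(conn, package_id: str, relative_path: str, content_address: str) -> None:
    payload = {
        "package_id": package_id,
        "relative_path": relative_path,
        "content_address": content_address,
    }
    # One transaction: the event row and the view row commit together, or a
    # failure (such as a duplicate relative_path) rolls both back.
    with conn:
        conn.execute(
            "INSERT INTO events (type, payload) VALUES (?, ?)",
            ("v1.file.ingested", json.dumps(payload, sort_keys=True)),
        )
        apply_to_view(conn, "v1.file.ingested", payload)


def replay(conn: sqlite3.Connection) -> None:
    """Rebuild the view from the event log in sequence order."""
    with conn:
        conn.execute("DELETE FROM artifact_files")
        rows = conn.execute(
            "SELECT type, payload FROM events ORDER BY sequence"
        ).fetchall()
        for event_type, payload in rows:
            apply_to_view(conn, event_type, json.loads(payload))
```

Because the live write path and `replay` both go through `apply_to_view`, the view cannot drift from what the log would reproduce. In this sketch the composite primary key is what rejects a duplicate `relative_path`, and the rollback also discards the corresponding event row.
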
## D1.9 - Minimal HTTP App And CLI

```task
id: ARTIFACT-STORE-WP-0001-T014
status: done
priority: medium
state_hub_task_id: "a43628ab-8b53-45fa-852a-ff0118dd12e7"
```

Acceptance:

- `api.http.app` is a FastAPI app with one route: `GET /health` reporting registry liveness, DB connectivity, and backend health.
- `cli` exposes `artifactstore version`, `artifactstore migrate`, `artifactstore replay`, `artifactstore health`.
- `make dev` starts the API on `127.0.0.1:8000` with SQLite + local FS backend by default.

## D1.10 - Operator Documentation And ADR Cross-Linking

```task
id: ARTIFACT-STORE-WP-0001-T008
status: done
priority: medium
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"
```

Acceptance:

- `README.md` updated with current run / test / migrate commands.
- `AGENTS.md` "Current Repo Shape" section reflects the scaffold.
- A `docs/OPERATOR.md` page documents environment variables, local vs PostgreSQL setup, replay command, and a smoke-test recipe.
- Every ADR is cross-linked from at least one of: blueprint, this workplan, or `OPERATOR.md`.

## Suggested implementation order

1. T001: scaffold and tooling (no other task can start without this).
2. T009: digest abstraction (unblocks T010, T012).
3. T010: manifest codec (unblocks T013).
4. T002: schema and migrations (unblocks T011, T013).
5. T011: event log + replay.
6. T003: storage SPI + local backend.
7. T012: data plane SPI + in-process impl.
8. T013: registry orchestrator.
9. T014: minimal HTTP app and CLI.
10. T008: docs.

## Success criteria

- `make dev && make test` round-trips on a clean checkout.
- A scripted integration test ingests a directory of fixture files, finalises the package, reads the manifest, downloads each file, and verifies digests end-to-end against the local backend.
- Replaying events from sequence 1 reproduces the materialised view state byte-for-byte.
- The library can be imported and exercised without an HTTP server running (embedding test).