tegwick f90c761ef6 WP-0001-T008: operator docs + ADR cross-linking; mark WP-0001 done
docs/OPERATOR.md (new): runbook with prerequisites, quick start,
environment variables, SQLite + PostgreSQL setup, storage layout,
CLI reference, HTTP /health reference, an end-to-end Python smoke
test (create_package -> ingest_file -> finalize -> manifest), the
replay / disaster-recovery procedure, common failure modes, and a
References section that links every ADR (0001..0006), the
blueprint, platform ambition, roadmap, and assembly experiment.

README.md: refreshed to v0.1 baseline status. Quick-start uses the
real flow (uv sync, migrate-fresh, dev, /health, artifactstore health).
Make targets and CLI commands tabulated. Links docs/OPERATOR.md.

AGENTS.md: Current Repo Shape now reflects the landed scaffold +
library + CLI + HTTP app rather than "no runnable scaffold yet";
links OPERATOR.md and lists the canonical local commands.

workplans/ARTIFACT-STORE-WP-0001-service-baseline.md:
- T008 marked done.
- frontmatter status: active -> done; updated: 2026-05-16.

All ten WP-0001 tasks are now done (T001/T002/T003/T008/T009/T010/
T011/T012/T013/T014). Foundation workplan retires.

Gates: ruff clean, mypy --strict clean, 83 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:02:36 +02:00


---
id: ARTIFACT-STORE-WP-0001
type: workplan
title: "Foundation: Scaffold, Core Kernels, Local FS Backend"
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 1
created: "2026-05-15"
updated: "2026-05-16"
state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb"
---
# ARTIFACT-STORE-WP-0001: Foundation — Scaffold, Core Kernels, Local FS Backend
## Purpose
Stand up the smallest credible `artifact-store` core. By the end of
this workplan, the library can ingest a directory of files into a
package, compute dual digests, write canonical-CBOR manifests, persist
state through the append-only event log, store bytes on the local
filesystem, and replay materialised views from the event log. There is
no HTTP API yet (that lands in WP-0002); a `/health` endpoint exists so
that the dev loop has something to hit.

The shape is **library-first** (ffmpeg-style). The HTTP server and the
CLI are explicitly thin consumers of `artifactstore.registry`.
## Constraints (must satisfy)
- ADR-0001 — content-addressed storage with dual digest.
- ADR-0002 — append-only event log as source of truth.
- ADR-0003 — manifest canonicalisation = canonical CBOR.
- ADR-0004 — control plane / data plane SPI named.
- ADR-0005 — v1 technology stack pinned (Python 3.12, uv, FastAPI,
SQLAlchemy Core, asyncpg, alembic, cbor2, blake3, ruff, mypy, pytest).
- ADR-0006 — OCI compatibility kept reachable.
- `docs/ARCHITECTURE-BLUEPRINT.md` data model and module layout.
## Boundary
This workplan builds the library and a minimal `/health` endpoint. It
does NOT implement: package CRUD HTTP API (WP-0002), retention rules
beyond the seed (WP-0003), S3-compatible backend (WP-0004), guide-board
producer wiring (WP-0005), GC of unreferenced bytes (WP-0006).
## Target architecture (this workplan)
```text
artifactstore (library)
  identity ──┐
  manifest ──┼──> registry (orchestrator) ──> events (WAL + views)
  events ────┘    │
                  └──> dataplane.spi ──> dataplane.inproc ──> storage.spi ──> storage.backends.local
                                                                              └──> filesystem
  retention (seed only)
  audit (view)
  storage.spi
  dataplane.spi + inproc
  api.http (just /health)
  cli (just `artifactstore version`, `artifactstore migrate`, `artifactstore replay`)
```
## D1.1 - Service Scaffold And Repository Identity
```task
id: ARTIFACT-STORE-WP-0001-T001
status: done
priority: high
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"
```
Acceptance:
- `pyproject.toml` with the `hatchling` build backend and dependencies
pinned per ADR-0005.
- `uv.lock` committed.
- `Makefile` exposes `make dev`, `make test`, `make lint`, `make type`,
and `make migrate`. Each target is a thin shim with no inline logic.
- `src/artifactstore/` package skeleton matches ADR-0005's layout
(empty `__init__.py` and one placeholder module per top-level
concern: `identity`, `manifest`, `events`, `retention`, `audit`,
`storage`, `dataplane`, `registry`, `api/http`, `cli`, `config`).
- `tests/{unit,integration}/conftest.py` in place.
- `.env.example` documents required environment variables:
`ARTIFACTSTORE_DATABASE_URL`, `ARTIFACTSTORE_STORAGE_LOCAL_ROOT`,
`ARTIFACTSTORE_LOG_LEVEL`.
- CI-equivalent local commands: `make lint && make type && make test`
pass on a clean checkout.
- `README.md` replaces the seed README: install with `uv sync`, run
with `make dev`, test with `make test`, links to ADRs and blueprint.
## D1.2 - Digest Abstraction And Content Address
```task
id: ARTIFACT-STORE-WP-0001-T009
status: done
priority: high
state_hub_task_id: "4dc465c5-5c14-412d-b8c0-aa84076e4560"
```
Acceptance:
- `identity.Digest` value type with `algorithm: str` and `hex: str`,
immutable, hashable.
- `identity.ContentAddress` — string-form `<algorithm>:<hex>` with
validating parser and emitter.
- `identity.digest_stream(reader) -> {primary: Digest, sha256: Digest}` —
single-pass dual-hash over an `AsyncIterator[bytes]`. Default primary
algorithm: `blake3`.
- Algorithm registry with `blake3` and `sha256` registered at import.
- Property test: digest over random byte sequences round-trips through
serialisation; `sha256` matches `hashlib.sha256(...).hexdigest()`;
`blake3` matches `blake3.blake3(...).hexdigest()`.
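
Here is a minimal sketch of the D1.2 contract, assuming Python 3.12 as pinned by ADR-0005. The helper names `parse_address` and `format_address`, the `ALGORITHMS` dict, and the address regex are hypothetical illustrations, not the module's actual API. BLAKE3 comes from the third-party `blake3` package and is registered only when that package is installed.

```python
from __future__ import annotations

import hashlib
import re
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass
from typing import Any

# Hypothetical registry: algorithm name -> factory for a hasher exposing
# .update(bytes) and .hexdigest().
ALGORITHMS: dict[str, Callable[[], Any]] = {"sha256": hashlib.sha256}
try:
    import blake3

    ALGORITHMS["blake3"] = blake3.blake3
except ImportError:  # keep the sketch importable without the dependency
    pass

# Hypothetical validation rule: lowercase algorithm name, lowercase hex.
_ADDRESS = re.compile(r"(?P<algorithm>[a-z0-9]+):(?P<hex>[0-9a-f]+)")


@dataclass(frozen=True)
class Digest:
    """Immutable value type; frozen=True plus default eq=True generates __hash__."""

    algorithm: str
    hex: str


def format_address(d: Digest) -> str:
    """Emit the string form `<algorithm>:<hex>`."""
    return f"{d.algorithm}:{d.hex}"


def parse_address(text: str) -> Digest:
    """Parse `<algorithm>:<hex>`, rejecting malformed text and unknown algorithms."""
    m = _ADDRESS.fullmatch(text)
    if m is None or m["algorithm"] not in ALGORITHMS:
        raise ValueError(f"invalid content address: {text!r}")
    return Digest(m["algorithm"], m["hex"])


async def digest_stream(
    chunks: AsyncIterator[bytes], primary: str = "blake3"
) -> dict[str, Digest]:
    """Feed each chunk to both hashers so the stream is consumed exactly once."""
    primary_h = ALGORITHMS[primary]()
    sha256_h = hashlib.sha256()
    async for chunk in chunks:
        primary_h.update(chunk)
        sha256_h.update(chunk)
    return {
        "primary": Digest(primary, primary_h.hexdigest()),
        "sha256": Digest("sha256", sha256_h.hexdigest()),
    }
```

Both hashers see each chunk before the next one is pulled, so a large upload is read only once. With the default `primary="blake3"`, the lookup raises `KeyError` when `blake3` is not installed.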
## D1.3 - Manifest Codec (Canonical CBOR + JCS Projection)
```task
id: ARTIFACT-STORE-WP-0001-T010
status: done
priority: high
state_hub_task_id: "8b45a3d9-aa19-4ae8-afe0-687417bf12d0"
```
Acceptance:
- `manifest.Manifest` dataclass with the v1 fields enumerated in the
blueprint (`manifest_version=1`, package, files, storage_receipts,
retention_summary, provenance).
- `manifest.codec.encode(m) -> bytes` produces canonical CBOR
(RFC 8949 §4.2.1, core deterministic encoding): definite-length,
shortest-form integers, sorted map keys.
- `manifest.codec.decode(b) -> Manifest`.
- `manifest.projection.jcs(m) -> bytes` produces RFC 8785 canonical
JSON.
- Property test: `decode(encode(m)) == m` for randomly-generated
manifests; `encode(decode(jcs_to_cbor(jcs(m)))) == encode(m)`.
- Manifest digest helper: `manifest_digest(m) -> ContentAddress` using
BLAKE3 over the canonical CBOR bytes.
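
The three encoding rules are easier to see in code. Below is a hand-rolled encoder for the plain Python values a manifest could be lowered to: ints, text, bytes, bools, `None`, lists, and dicts. It is an illustration only. ADR-0005 pins `cbor2` for the real codec, and floats, tags, and integers wider than 64 bits are out of scope here. The `jcs` helper only approximates RFC 8785, and its docstring states the limits.

```python
import json
from typing import Any


def _head(major: int, arg: int) -> bytes:
    """Initial byte plus argument, always in the shortest possible form."""
    initial = major << 5
    if arg < 24:
        return bytes([initial | arg])
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([initial | additional]) + arg.to_bytes(size, "big")
    raise OverflowError("integer needs more than 64 bits (bignum tags not handled)")


def encode(obj: Any) -> bytes:
    """Deterministic CBOR: definite lengths, shortest heads, sorted map keys."""
    if obj is False:
        return b"\xf4"
    if obj is True:  # checked before int, because bool subclasses int
        return b"\xf5"
    if obj is None:
        return b"\xf6"
    if isinstance(obj, int):
        # Major type 0 holds n >= 0; major type 1 holds -1 - n for n < 0.
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, (list, tuple)):
        # The length is written up front; the indefinite-length form is never used.
        return _head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        # Entries ordered by the bytewise lexicographic order of their encoded keys.
        pairs = sorted((encode(k), encode(v)) for k, v in obj.items())
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def jcs(obj: Any) -> bytes:
    """Approximate RFC 8785 JSON: sorted keys, no whitespace, UTF-8 output.

    Matches JCS only for bools, None, strings, lists, ints with |n| <= 2**53,
    and dicts whose keys are BMP-only (JCS sorts keys by UTF-16 code units).
    Floats would need JCS's ECMAScript number formatting, which json.dumps
    does not implement.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
```

When every map key is a text string, this bytewise ordering matches the older length-first ordering in RFC 8949 §4.2.3. The two diverge once keys of different major types are mixed.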
## D1.4 - Registry Data Model And Migrations
```task
id: ARTIFACT-STORE-WP-0001-T002
status: done
priority: high
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"
```
Acceptance:
- Alembic configured with `migrations/` directory; `alembic upgrade
head` works against both SQLite (dev) and PostgreSQL (prod).
- `events`, `artifact_packages`, `artifact_files`, `storage_locations`,
`retention_classes`, `retention_state`, `metadata_schemas` tables
match the blueprint schema.
- Seed migration populates `retention_classes` with the five v1 entries.
- `make migrate` and `make migrate-fresh` targets work end-to-end on
a clean DB.
- All schema columns required by ADR-0001 (`digest_algorithm`,
`digest_primary`, `digest_sha256`, `content_address`), ADR-0002
(full `events` table), and the blueprint's `retrieval_tier` and
`restore_status` are present.
## D1.5 - Event Log Persistence And Replay
```task
id: ARTIFACT-STORE-WP-0001-T011
status: done
priority: high
state_hub_task_id: "90fce17d-cce5-4687-ae9e-02abd7d92622"
```
Acceptance:
- `events.write(transaction, Event)` writes one row in the given DB
transaction. Sequence numbers are assigned by the DB (`BIGSERIAL`)
and are monotonic within a registry instance. `BIGSERIAL` alone does
not make them gapless, because a rolled-back transaction still
consumes its sequence value.
- `events.tail(since_sequence) -> AsyncIterator[Event]` long-polls
the table (notify-style on PostgreSQL via `LISTEN/NOTIFY`,
poll-style on SQLite).
- `events.replay(into=ViewWriter)` rebuilds all materialised view
tables from `events` deterministically.
- Test: ingesting a fixed sequence of events, then rebuilding the
views from scratch, yields byte-identical materialised state.
- Event payloads use canonical CBOR (`manifest.codec`) so the same
bytes flow through registry → DB → tail consumer without re-encoding.
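
Here is a minimal sketch of the write/replay contract on stdlib `sqlite3`, the dev database. Several details are hypothetical: the table layout, the `v1.file.ingested` event type, and the single `files_view` table. The JSON payloads also stand in for canonical CBOR so that the example needs no third-party package.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE events (
    sequence   INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite stand-in for BIGSERIAL
    event_type TEXT NOT NULL,
    payload    BLOB NOT NULL                       -- canonical CBOR in the real design
);
CREATE TABLE files_view (
    package_id      TEXT NOT NULL,
    relative_path   TEXT NOT NULL,
    content_address TEXT NOT NULL,
    PRIMARY KEY (package_id, relative_path)
);
"""


def _apply(conn: sqlite3.Connection, event_type: str, payload: dict) -> None:
    """Fold one event into the view; the result depends only on the event."""
    if event_type == "v1.file.ingested":
        conn.execute(
            "INSERT OR REPLACE INTO files_view VALUES (?, ?, ?)",
            (payload["package_id"], payload["relative_path"], payload["content_address"]),
        )


def write(conn: sqlite3.Connection, event_type: str, payload: dict) -> int:
    """Append one event and fold it into the view inside the caller's transaction."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    cur = conn.execute(
        "INSERT INTO events (event_type, payload) VALUES (?, ?)", (event_type, body)
    )
    _apply(conn, event_type, payload)
    return cur.lastrowid


def replay(conn: sqlite3.Connection) -> None:
    """Discard the view and rebuild it from the log in sequence order."""
    with conn:
        conn.execute("DELETE FROM files_view")
        rows = conn.execute(
            "SELECT event_type, payload FROM events ORDER BY sequence"
        ).fetchall()
        for event_type, body in rows:
            _apply(conn, event_type, json.loads(body))


def view_fingerprint(conn: sqlite3.Connection) -> str:
    """SHA-256 over the view rows in a fixed order, for byte-for-byte comparison."""
    h = hashlib.sha256()
    for row in conn.execute("SELECT * FROM files_view ORDER BY package_id, relative_path"):
        h.update(json.dumps(row).encode())
    return h.hexdigest()
```

`_apply` is the only code that writes `files_view`, so live writes and replay cannot drift apart. `view_fingerprint` supplies the byte-for-byte comparison that the replay test needs.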
## D1.6 - Storage Adapter SPI And Local Filesystem Backend
```task
id: ARTIFACT-STORE-WP-0001-T003
status: done
priority: high
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"
```
Acceptance:
- `storage.spi.StorageBackend` Protocol matches the blueprint.
- `storage.backends.local.LocalBackend` implements the SPI:
  - Object key layout `<root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>`.
  - Atomic write via `fsync(tmpfile) + rename`.
  - Path traversal rejected at the SPI boundary.
  - `health()` returns disk usage and root accessibility.
- Backend registry resolves by `backend_id` string (per ADR-0004).
- Unit tests cover: put, get, head, delete, double-put idempotency,
delete-of-missing, range read.
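
The key layout, traversal rejection, and atomic write can be sketched with the standard library, assuming POSIX rename semantics. `object_path` and `atomic_put` are hypothetical helper names, not `LocalBackend`'s actual methods. The final directory `fsync` is an extra durability step that the acceptance list does not spell out.

```python
import contextlib
import os
import re
import tempfile
from pathlib import Path

_ALGO = re.compile(r"[a-z0-9]+")
_HEX = re.compile(r"[0-9a-f]{4,}")  # at least 4 chars so both shard levels exist


def object_path(root: Path, algo: str, hex_digest: str) -> Path:
    """Map a content address to <root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    # A strict character whitelist rejects '..', '/', and absolute paths
    # before any filesystem call is made.
    if not (_ALGO.fullmatch(algo) and _HEX.fullmatch(hex_digest)):
        raise ValueError(f"invalid content address {algo}:{hex_digest}")
    base = root.resolve()
    path = base / algo / hex_digest[0:2] / hex_digest[2:4] / hex_digest
    # Defence in depth: a symlinked shard directory must not escape the root.
    if not path.resolve().is_relative_to(base):
        raise ValueError("object path escapes the storage root")
    return path


def atomic_put(root: Path, algo: str, hex_digest: str, data: bytes) -> Path:
    """Write data so readers see either no object or the complete object."""
    final = object_path(root, algo, hex_digest)
    if final.exists():
        # Content-addressed: an existing key already holds these bytes,
        # so a second put is a no-op (double-put idempotency).
        return final
    final.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the destination directory, so os.replace is a
    # same-filesystem rename, which POSIX makes atomic.
    fd, tmp = tempfile.mkstemp(dir=final.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # bytes reach disk before the name appears
        os.replace(tmp, final)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    if hasattr(os, "O_DIRECTORY"):  # POSIX only: persist the new directory entry
        dir_fd = os.open(final.parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return final
```

The strict character set does most of the traversal defence. The `is_relative_to` check only matters if someone has replaced a shard directory with a symlink.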
## D1.7 - Data Plane SPI And In-Process Implementation
```task
id: ARTIFACT-STORE-WP-0001-T012
status: done
priority: high
state_hub_task_id: "8cb8a245-beb5-4713-8d1d-8a623431ad81"
```
Acceptance:
- `dataplane.spi.DataPlane` Protocol matches ADR-0004.
- `dataplane.inproc.InProcessDataPlane` implements all five operations
on top of a configured `StorageBackend`.
- `ingest_stream` computes both digests in a single pass, writes to
the backend keyed by the primary content address, and returns an
`IngestResult` containing both digests, size, and the
`StorageReceipt`.
- `serve_object` and `verify_object` re-read bytes through the
backend; `verify_object` re-digests and returns mismatches if any.
- Lint rule (or test): no code outside `dataplane.*` imports
`storage.backends.*` directly.
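
The acceptance item allows either a lint rule or a test. One possible test-style sketch walks the source tree with the stdlib `ast` module and resolves relative imports before checking them. It makes two assumptions: a `src/artifactstore` layout, and an exemption for `storage.backends` itself so that backend modules can import their siblings.

```python
import ast
from pathlib import Path

FORBIDDEN = "artifactstore.storage.backends"
# Assumption: backend modules may import their own siblings, so the
# backends package itself is exempt alongside dataplane.
ALLOWED = ("artifactstore.dataplane", FORBIDDEN)


def _within(name: str, prefix: str) -> bool:
    """True if name is prefix or a submodule of it (not merely a string prefix)."""
    return name == prefix or name.startswith(prefix + ".")


def _absolute(node: ast.ImportFrom, package: str) -> str:
    """Resolve `from ..x import y` against the importing module's package."""
    if node.level == 0:
        return node.module or ""
    parts = package.split(".")
    base = parts[: len(parts) - (node.level - 1)]
    return ".".join(base + ([node.module] if node.module else []))


def find_violations(src_root: Path) -> list[str]:
    """Return `path:line: module imports target` for each forbidden import."""
    violations: list[str] = []
    for path in sorted(src_root.rglob("*.py")):
        parts = list(path.relative_to(src_root).with_suffix("").parts)
        is_package = parts[-1] == "__init__"
        if is_package:
            parts.pop()
        module = ".".join(parts)
        if any(_within(module, allowed) for allowed in ALLOWED):
            continue
        package = module if is_package else module.rpartition(".")[0]
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                base = _absolute(node, package)
                # `from artifactstore.storage import backends` names the package too.
                targets = [base] + [f"{base}.{alias.name}" for alias in node.names]
            else:
                continue
            hit = next((t for t in targets if _within(t, FORBIDDEN)), None)
            if hit is not None:
                violations.append(f"{path}:{node.lineno}: {module} imports {hit}")
    return violations
```

In the suite this would reduce to `assert find_violations(Path("src")) == []`. Inspecting the syntax tree also catches `from artifactstore.storage import backends`, which never contains the string `storage.backends` that a plain text search would look for.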
## D1.8 - Registry Orchestrator (Library Surface)
```task
id: ARTIFACT-STORE-WP-0001-T013
status: done
priority: high
state_hub_task_id: "f4967308-4613-4def-8c09-41caaeb631f7"
```
Acceptance:
- `registry.Registry` exposes `create_package`, `ingest_file`,
`finalize_package`, `get_manifest_bytes` (CBOR + JCS), `get_file`, and
`tail_events`, plus stubs for the retention operations so that WP-0003
has less to build.
- Each mutating operation is one DB transaction that writes events
AND updates materialised views.
- Finalisation writes one `v1.package.finalized` event whose payload
*is* the canonical CBOR manifest, and stamps `manifest_digest` on
`artifact_packages`.
- Duplicate `relative_path` within one not-yet-finalised package is
rejected unless an explicit replace is requested.
- Integration test: end-to-end ingest of a 3-file package against
local backend → finalize → read manifest → verify digests
→ tail events → replay rebuilds identical state.
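
The "one transaction per mutating call" rule and the duplicate-path rule can share one mechanism: a `UNIQUE` constraint that fails inside the same transaction as the event insert. What follows is a minimal `sqlite3` sketch. The schema subset, the `v1.file.ingested` event type, `DuplicatePathError`, and the JSON payloads are hypothetical, and the check that the package is not yet finalised is omitted.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE events (
    sequence   INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload    BLOB NOT NULL
);
CREATE TABLE artifact_files (
    package_id      TEXT NOT NULL,
    relative_path   TEXT NOT NULL,
    content_address TEXT NOT NULL,
    UNIQUE (package_id, relative_path)  -- the database enforces the duplicate rule
);
"""

_UPSERT = (
    " ON CONFLICT (package_id, relative_path)"
    " DO UPDATE SET content_address = excluded.content_address"
)


class DuplicatePathError(Exception):
    """relative_path is already present in the package."""


def ingest_file(
    conn: sqlite3.Connection,
    package_id: str,
    relative_path: str,
    content_address: str,
    *,
    replace: bool = False,
) -> int:
    """Record one file: the event row and the view row commit together or not at all."""
    payload = json.dumps(
        {"package_id": package_id, "relative_path": relative_path,
         "content_address": content_address, "replace": replace},
        sort_keys=True,
    ).encode()
    try:
        with conn:  # commits on success; rolls back both inserts on any error
            cur = conn.execute(
                "INSERT INTO events (event_type, payload) VALUES (?, ?)",
                ("v1.file.ingested", payload),
            )
            conn.execute(
                "INSERT INTO artifact_files (package_id, relative_path, content_address)"
                " VALUES (?, ?, ?)" + (_UPSERT if replace else ""),
                (package_id, relative_path, content_address),
            )
    except sqlite3.IntegrityError as exc:
        raise DuplicatePathError(relative_path) from exc
    return cur.lastrowid
```

A failed view insert rolls back the event insert with it, so a rejected duplicate leaves no orphan event in the log.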
## D1.9 - Minimal HTTP App And CLI
```task
id: ARTIFACT-STORE-WP-0001-T014
status: done
priority: medium
state_hub_task_id: "a43628ab-8b53-45fa-852a-ff0118dd12e7"
```
Acceptance:
- `api.http.app` is a FastAPI app with one route: `GET /health`
reporting registry liveness, DB connectivity, and backend health.
- `cli` exposes `artifactstore version`, `artifactstore migrate`,
`artifactstore replay`, `artifactstore health`.
- `make dev` starts the API on `127.0.0.1:8000` with SQLite +
local FS backend by default.
## D1.10 - Operator Documentation And ADR Cross-Linking
```task
id: ARTIFACT-STORE-WP-0001-T008
status: done
priority: medium
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"
```
Acceptance:
- `README.md` updated with current run / test / migrate commands.
- `AGENTS.md` "Current Repo Shape" section reflects the scaffold.
- A `docs/OPERATOR.md` page documents environment variables, local
(SQLite) vs PostgreSQL setup, the replay command, and a smoke-test recipe.
- Every ADR is cross-linked from at least one of: blueprint, this
workplan, or `OPERATOR.md`.
## Suggested implementation order
1. T001 — scaffold and tooling (no other task can start without this).
2. T009 — digest abstraction (unblocks T010, T012).
3. T010 — manifest codec (unblocks T013).
4. T002 — schema and migrations (unblocks T011, T013).
5. T011 — event log + replay.
6. T003 — storage SPI + local backend.
7. T012 — data plane SPI + in-process impl.
8. T013 — registry orchestrator.
9. T014 — minimal HTTP app and CLI.
10. T008 — docs.
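
The dependencies named in the list can be checked mechanically with the stdlib `graphlib` module. The sketch encodes only the edges the list states: T001 before everything, T009 before T010 and T012, T010 before T013, and T002 before T011 and T013. Any other ordering constraint is not captured.

```python
from graphlib import TopologicalSorter

# task -> tasks it depends on (only the edges stated in the list)
DEPENDS_ON: dict[str, set[str]] = {
    "T009": {"T001"},
    "T010": {"T001", "T009"},
    "T002": {"T001"},
    "T011": {"T001", "T002"},
    "T003": {"T001"},
    "T012": {"T001", "T009"},
    "T013": {"T001", "T010", "T002"},
    "T014": {"T001"},
    "T008": {"T001"},
}
PLANNED = ["T001", "T009", "T010", "T002", "T011", "T003", "T012", "T013", "T014", "T008"]


def respects_dependencies(order: list[str], deps: dict[str, set[str]]) -> bool:
    """True if every task appears after all of the tasks it depends on."""
    position = {task: i for i, task in enumerate(order)}
    return all(position[d] < position[t] for t, ds in deps.items() for d in ds)


# static_order() raises graphlib.CycleError if the stated edges ever form a cycle.
one_valid_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```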
## Success criteria
- On a clean checkout, `make dev` starts the service and `make test` passes.
- A scripted integration test ingests a directory of fixture files,
finalises the package, reads the manifest, downloads each file, and
verifies digests end-to-end against the local backend.
- Replaying events from sequence 1 reproduces the materialised view
state byte-for-byte.
- The library can be imported and exercised without an HTTP server
running (embedding test).