---
id: ARTIFACT-STORE-WP-0001
type: workplan
title: "Foundation: Scaffold, Core Kernels, Local FS Backend"
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 1
created: "2026-05-15"
updated: "2026-05-16"
state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb"
---

# ARTIFACT-STORE-WP-0001: Foundation — Scaffold, Core Kernels, Local FS Backend

## Purpose

Stand up the smallest credible `artifact-store` core. By the end of
this workplan, the library can ingest a directory of files into a
package, compute dual digests, write canonical-CBOR manifests, persist
state through the append-only event log, store bytes on the local
filesystem, and replay materialised views from the event log. No
package HTTP API yet (that lands in WP-0002); only a `/health` endpoint
exists so that the dev loop has something to hit.

The shape is **library-first** (ffmpeg-style). HTTP server and CLI are
explicitly thin consumers of `artifactstore.registry`.

## Constraints (must satisfy)

- ADR-0001 — content-addressed storage with dual digest.
- ADR-0002 — append-only event log as source of truth.
- ADR-0003 — manifest canonicalisation = canonical CBOR.
- ADR-0004 — control plane / data plane SPI named.
- ADR-0005 — v1 technology stack pinned (Python 3.12, uv, FastAPI,
  SQLAlchemy Core, asyncpg, alembic, cbor2, blake3, ruff, mypy, pytest).
- ADR-0006 — OCI compatibility kept reachable.
- `docs/ARCHITECTURE-BLUEPRINT.md` data model and module layout.

## Boundary

This workplan builds the library and a minimal `/health` endpoint. It
does NOT implement: package CRUD HTTP API (WP-0002), retention rules
beyond the seed (WP-0003), S3-compatible backend (WP-0004), guide-board
producer wiring (WP-0005), GC of unreferenced bytes (WP-0006).

## Target architecture (this workplan)

```text
artifactstore (library)
  identity ──┐
  manifest ──┼──> registry (orchestrator) ──> events (WAL + views)
  events ────┘          │
  retention (seed only) └──> dataplane.spi ──> dataplane.inproc ──> storage.spi ──> storage.backends.local
  audit (view)                                                                        └──> filesystem
  storage.spi
  dataplane.spi + inproc
  api.http (just /health)
  cli (just `artifactstore version`, `artifactstore migrate`, `artifactstore replay`, `artifactstore health`)
```

## D1.1 - Service Scaffold And Repository Identity

```task
id: ARTIFACT-STORE-WP-0001-T001
status: done
priority: high
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"
```

Acceptance:

- `pyproject.toml` with `hatchling` build backend, pinned dependencies
  per ADR-0005.
- `uv.lock` committed.
- `Makefile` exposes: `make dev`, `make test`, `make lint`,
  `make type`, `make migrate`. Each target is a thin shim, no logic inline.
- `src/artifactstore/` package skeleton matches ADR-0005's layout
  (empty `__init__.py` and one placeholder module per top-level
  concern: `identity`, `manifest`, `events`, `retention`, `audit`,
  `storage`, `dataplane`, `registry`, `api/http`, `cli`, `config`).
- `tests/{unit,integration}/conftest.py` in place.
- `.env.example` documents required environment variables:
  `ARTIFACTSTORE_DATABASE_URL`, `ARTIFACTSTORE_STORAGE_LOCAL_ROOT`,
  `ARTIFACTSTORE_LOG_LEVEL`.
- CI-equivalent local commands: `make lint && make type && make test`
  pass on a clean checkout.
- `README.md` replaces the seed README: install with `uv sync`, run
  with `make dev`, test with `make test`, links to ADRs and blueprint.

## D1.2 - Digest Abstraction And Content Address

```task
id: ARTIFACT-STORE-WP-0001-T009
status: done
priority: high
state_hub_task_id: "4dc465c5-5c14-412d-b8c0-aa84076e4560"
```

Acceptance:

- `identity.Digest` value type with `algorithm: str` and `hex: str`,
  immutable, hashable.
- `identity.ContentAddress` — string-form `<algorithm>:<hex>` with
  validating parser and emitter.
- `identity.digest_stream(reader) -> {primary: Digest, sha256: Digest}` —
  single-pass dual-hash over an `AsyncIterator[bytes]`. Default primary
  algorithm: `blake3`.
- Algorithm registry with `blake3` and `sha256` registered at import.
- Property test: digest over random byte sequences round-trips through
  serialisation; `sha256` matches `hashlib.sha256(...).hexdigest()`;
  `blake3` matches `blake3.blake3(...).hexdigest()`.

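The single-pass dual hash and the `<algorithm>:<hex>` address form can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's code: `hashlib.blake2b` stands in for BLAKE3 because the `blake3` package is third-party, and `ALGORITHMS`, `parse_address`, and the `address()` method are hypothetical names.

```python
import asyncio
import hashlib
import re
from dataclasses import dataclass
from typing import AsyncIterator, Callable

# Hypothetical algorithm registry. blake2b is only a stdlib stand-in for blake3.
ALGORITHMS: dict[str, Callable] = {
    "blake2b": hashlib.blake2b,
    "sha256": hashlib.sha256,
}

_ADDRESS = re.compile(r"([a-z0-9]+):([0-9a-f]+)")


@dataclass(frozen=True)  # frozen dataclasses are immutable and hashable
class Digest:
    algorithm: str
    hex: str

    def address(self) -> str:
        """Emit the string form <algorithm>:<hex>."""
        return f"{self.algorithm}:{self.hex}"


def parse_address(text: str) -> Digest:
    """Validating parser for the <algorithm>:<hex> string form."""
    match = _ADDRESS.fullmatch(text)
    if match is None or match.group(1) not in ALGORITHMS:
        raise ValueError(f"invalid content address: {text!r}")
    return Digest(match.group(1), match.group(2))


async def digest_stream(
    reader: AsyncIterator[bytes], primary: str = "blake2b"
) -> dict[str, Digest]:
    """Feed every chunk to both hashers so the stream is read exactly once."""
    primary_hash = ALGORITHMS[primary]()
    sha256_hash = hashlib.sha256()
    async for chunk in reader:
        primary_hash.update(chunk)
        sha256_hash.update(chunk)
    return {
        "primary": Digest(primary, primary_hash.hexdigest()),
        "sha256": Digest("sha256", sha256_hash.hexdigest()),
    }


async def _chunks(data: bytes, size: int = 4) -> AsyncIterator[bytes]:
    """Tiny async source used only for the demonstration below."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


result = asyncio.run(digest_stream(_chunks(b"hello artifact-store")))
```

Because both hashers consume the same chunks, the result is independent of chunk boundaries, which is the property the acceptance test checks against `hashlib`.
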
## D1.3 - Manifest Codec (Canonical CBOR + JCS Projection)

```task
id: ARTIFACT-STORE-WP-0001-T010
status: done
priority: high
state_hub_task_id: "8b45a3d9-aa19-4ae8-afe0-687417bf12d0"
```

Acceptance:

- `manifest.Manifest` dataclass with the v1 fields enumerated in the
  blueprint (`manifest_version=1`, package, files, storage_receipts,
  retention_summary, provenance).
- `manifest.codec.encode(m) -> bytes` produces canonical CBOR
  (RFC 8949 §4.2.1, core deterministic encoding): definite-length,
  shortest-form integers, sorted map keys.
- `manifest.codec.decode(b) -> Manifest`.
- `manifest.projection.jcs(m) -> bytes` produces RFC 8785 canonical
  JSON.
- Property test: `decode(encode(m)) == m` for randomly-generated
  manifests; `encode(decode(jcs_to_cbor(jcs(m)))) == encode(m)`.
- Manifest digest helper: `manifest_digest(m) -> ContentAddress` using
  BLAKE3 over the canonical CBOR bytes.

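To make the three canonical-CBOR rules concrete, here is a minimal hand-rolled encoder for a small subset of types. It is only a sketch: the project pins `cbor2` for the real codec, and `encode_canonical` is a hypothetical name that handles integers, byte strings, text, lists, maps, booleans, and `None`, but not floats or tags.

```python
import struct


def _head(major: int, argument: int) -> bytes:
    """Encode a CBOR head using the shortest possible argument width."""
    if argument < 24:
        return bytes([(major << 5) | argument])
    widths = ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
              (26, ">I", 1 << 32), (27, ">Q", 1 << 64))
    for additional, fmt, limit in widths:
        if argument < limit:
            return bytes([(major << 5) | additional]) + struct.pack(fmt, argument)
    raise ValueError("integer does not fit in a 64-bit CBOR argument")


def encode_canonical(value) -> bytes:
    """Deterministic encoding: definite lengths, shortest ints, sorted keys."""
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode_canonical(v) for v in value)
    if isinstance(value, dict):
        # Keys are ordered by the bytewise order of their own encodings.
        pairs = sorted(
            ((encode_canonical(k), encode_canonical(v)) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Insertion order of the dict does not influence the encoded bytes.
first = encode_canonical({"b": [2, 3], "a": 1})
second = encode_canonical({"a": 1, "b": [2, 3]})
```

Since equal manifests always produce equal bytes, a digest over those bytes is a stable identifier, which is what `manifest_digest` relies on.
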
## D1.4 - Registry Data Model And Migrations

```task
id: ARTIFACT-STORE-WP-0001-T002
status: done
priority: high
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"
```

Acceptance:

- Alembic configured with `migrations/` directory;
  `alembic upgrade head` works against both SQLite (dev) and
  PostgreSQL (prod).
- `events`, `artifact_packages`, `artifact_files`, `storage_locations`,
  `retention_classes`, `retention_state`, `metadata_schemas` tables
  match the blueprint schema.
- Seed migration populates `retention_classes` with the five v1 entries.
- The `make migrate` and `make migrate-fresh` targets work end-to-end
  on a clean DB.
- All schema columns required by ADR-0001 (`digest_algorithm`,
  `digest_primary`, `digest_sha256`, `content_address`), ADR-0002
  (full `events` table), and the blueprint's `retrieval_tier` and
  `restore_status` are present.

## D1.5 - Event Log Persistence And Replay

```task
id: ARTIFACT-STORE-WP-0001-T011
status: done
priority: high
state_hub_task_id: "90fce17d-cce5-4687-ae9e-02abd7d92622"
```

Acceptance:

- `events.write(transaction, Event)` writes one row in the given DB
  transaction. Sequence numbers are assigned by the DB
  (`BIGSERIAL`) and are guaranteed monotonic and gapless within a
  registry instance.
- `events.tail(since_sequence) -> AsyncIterator[Event]` long-polls
  the table (notify-style on PostgreSQL via `LISTEN/NOTIFY`,
  poll-style on SQLite).
- `events.replay(into=ViewWriter)` rebuilds all materialised view
  tables from `events` deterministically.
- Test: ingesting a fixed sequence of events, then rebuilding the
  views from scratch, yields byte-identical materialised state.
- Event payloads use canonical CBOR (`manifest.codec`) so the same
  bytes flow through registry → DB → tail consumer without re-encoding.

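The write-then-replay contract can be illustrated with an in-memory SQLite log. This is a sketch under assumptions: the table shape is simplified, sorted-key JSON stands in for canonical CBOR payloads, and the event types other than `v1.package.finalized`, along with the helper names, are hypothetical.

```python
import json
import sqlite3


def open_log() -> sqlite3.Connection:
    """Create a simplified events table; the DB assigns sequence numbers."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        " sequence INTEGER PRIMARY KEY AUTOINCREMENT,"
        " type TEXT NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    return db


def write(db: sqlite3.Connection, event_type: str, payload: dict) -> int:
    """Append one event row inside the caller's transaction."""
    # Sorted-key JSON is a stand-in for canonical CBOR in this sketch.
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    cursor = db.execute(
        "INSERT INTO events (type, payload) VALUES (?, ?)", (event_type, encoded)
    )
    return cursor.lastrowid


def replay(db: sqlite3.Connection, since_sequence: int = 0) -> dict:
    """Fold events in sequence order into a fresh view; no other input is read."""
    view: dict[str, dict] = {}
    rows = db.execute(
        "SELECT type, payload FROM events WHERE sequence > ? ORDER BY sequence",
        (since_sequence,),
    )
    for event_type, payload in rows:
        data = json.loads(payload)
        if event_type == "v1.package.created":
            view[data["package_id"]] = {"status": "open", "files": []}
        elif event_type == "v1.file.ingested":
            view[data["package_id"]]["files"].append(data["relative_path"])
        elif event_type == "v1.package.finalized":
            view[data["package_id"]]["status"] = "finalized"
    return view


db = open_log()
with db:  # one transaction: committed on success, rolled back on error
    write(db, "v1.package.created", {"package_id": "pkg-1"})
    write(db, "v1.file.ingested", {"package_id": "pkg-1", "relative_path": "a.txt"})
    last_sequence = write(db, "v1.package.finalized", {"package_id": "pkg-1"})

first_view = replay(db)
second_view = replay(db)
```

Replay is deterministic here because the fold depends only on the ordered rows, so rebuilding from scratch always yields the same view.
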
## D1.6 - Storage Adapter SPI And Local Filesystem Backend

```task
id: ARTIFACT-STORE-WP-0001-T003
status: done
priority: high
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"
```

Acceptance:

- `storage.spi.StorageBackend` Protocol matches the blueprint.
- `storage.backends.local.LocalBackend` implements the SPI:
  - Object key layout `<root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>`.
  - Atomic write via `fsync(tmpfile) + rename`.
  - Path traversal rejected at the SPI boundary.
  - `health()` returns disk usage and root accessibility.
- Backend registry resolves by `backend_id` string (per ADR-0004).
- Unit tests cover: put, get, head, delete, double-put idempotency,
  delete-of-missing, range read.

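The key layout, atomic write, and traversal check listed above could look roughly like the following. It is a sketch rather than the actual `LocalBackend`: `object_path` and `put_atomic` are hypothetical helpers, and validating the address with regular expressions is one assumed way to reject traversal.

```python
import hashlib
import os
import re
import tempfile
from pathlib import Path

_ALGORITHM = re.compile(r"[a-z0-9]+")
_HEX = re.compile(r"[0-9a-f]{4,}")


def object_path(root: Path, algorithm: str, hex_digest: str) -> Path:
    """Map a content address to <root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    # Strict validation means "..", "/" and similar can never reach the path.
    if not _ALGORITHM.fullmatch(algorithm) or not _HEX.fullmatch(hex_digest):
        raise ValueError("invalid content address")
    return root / algorithm / hex_digest[0:2] / hex_digest[2:4] / hex_digest


def put_atomic(root: Path, algorithm: str, hex_digest: str, data: bytes) -> Path:
    """Write to a temp file, fsync it, then rename it into place."""
    target = object_path(root, algorithm, hex_digest)
    if target.exists():  # double-put of the same address is a no-op
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic within one filesystem
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    payload = b"hello"
    digest = hashlib.sha256(payload).hexdigest()
    stored = put_atomic(root, "sha256", digest, payload)
    relative = stored.relative_to(root).as_posix()
    round_trip = stored.read_bytes()
    again = put_atomic(root, "sha256", digest, payload)
```

Creating the temp file in the target's own directory keeps the rename on a single filesystem, which is what makes the final step atomic.
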
## D1.7 - Data Plane SPI And In-Process Implementation

```task
id: ARTIFACT-STORE-WP-0001-T012
status: done
priority: high
state_hub_task_id: "8cb8a245-beb5-4713-8d1d-8a623431ad81"
```

Acceptance:

- `dataplane.spi.DataPlane` Protocol matches ADR-0004.
- `dataplane.inproc.InProcessDataPlane` implements all five operations
  on top of a configured `StorageBackend`.
- `ingest_stream` computes both digests in a single pass, writes to
  the backend keyed by the primary content address, and returns an
  `IngestResult` containing both digests, size, and the
  `StorageReceipt`.
- `serve_object` and `verify_object` re-read bytes through the
  backend; `verify_object` re-digests and returns mismatches if any.
- Lint rule (or test): no code outside `dataplane.*` imports
  `storage.backends.*` directly.

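The import-boundary rule in the last bullet can be enforced with a small AST check run from a test. The sketch below is one possible approach, not the project's actual lint rule: `forbidden_imports` is a hypothetical helper, it assumes absolute imports, and it also lets the backend package import itself.

```python
import ast

FORBIDDEN = "artifactstore.storage.backends"
# Packages allowed to import backends directly (the second entry is an assumption).
ALLOWED = ("artifactstore.dataplane", "artifactstore.storage.backends")


def _within(name: str, package: str) -> bool:
    """True when name is the package itself or one of its submodules."""
    return name == package or name.startswith(package + ".")


def forbidden_imports(source: str, module_name: str) -> list[str]:
    """Return the backend modules that module_name imports but should not."""
    if any(_within(module_name, package) for package in ALLOWED):
        return []
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        hits.extend(name for name in names if _within(name, FORBIDDEN))
    return hits


snippet = "from artifactstore.storage.backends.local import LocalBackend\n"
from_registry = forbidden_imports(snippet, "artifactstore.registry")
from_dataplane = forbidden_imports(snippet, "artifactstore.dataplane.inproc")
```

A real check would walk every file under `src/artifactstore/` and would also need to catch `from artifactstore.storage import backends` and relative imports, which this sketch deliberately skips.
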
## D1.8 - Registry Orchestrator (Library Surface)

```task
id: ARTIFACT-STORE-WP-0001-T013
status: done
priority: high
state_hub_task_id: "f4967308-4613-4def-8c09-41caaeb631f7"
```

Acceptance:

- `registry.Registry` exposes: `create_package`, `ingest_file`,
  `finalize_package`, `get_manifest_bytes` (CBOR + JCS), `get_file`,
  `tail_events`, plus stubs for the retention operations that land in
  WP-0003.
- Each mutating operation is one DB transaction that writes events
  AND updates materialised views.
- Finalisation writes one `v1.package.finalized` event whose payload
  *is* the canonical CBOR manifest, and stamps `manifest_digest` on
  `artifact_packages`.
- Duplicate `relative_path` within one not-yet-finalised package is
  rejected unless an explicit replace is requested.
- Integration test: end-to-end ingest of a 3-file package against
  local backend → finalize → read manifest → verify digests
  → tail events → replay rebuilds identical state.

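The duplicate-path rule is easy to get subtly wrong, so a tiny in-memory model may help. Everything in this sketch is hypothetical (`DraftPackage`, `DuplicatePathError`, the `replace` keyword); the real `Registry.ingest_file` signature is not shown in this workplan.

```python
from dataclasses import dataclass, field


class DuplicatePathError(ValueError):
    """Raised when a relative_path is ingested twice without replace."""


@dataclass
class DraftPackage:
    # Maps relative_path to the content address of the ingested bytes.
    files: dict[str, str] = field(default_factory=dict)

    def ingest_file(
        self, relative_path: str, content_address: str, *, replace: bool = False
    ) -> None:
        """Reject a repeated path unless the caller explicitly asks to replace."""
        if relative_path in self.files and not replace:
            raise DuplicatePathError(relative_path)
        self.files[relative_path] = content_address


draft = DraftPackage()
draft.ingest_file("a.txt", "sha256:aa")

rejected = False
try:
    draft.ingest_file("a.txt", "sha256:bb")  # same path, no replace flag
except DuplicatePathError:
    rejected = True

unchanged = draft.files["a.txt"]  # the rejected call left the draft as it was
draft.ingest_file("a.txt", "sha256:bb", replace=True)
replaced = draft.files["a.txt"]
```

Making replacement opt-in keeps an accidental second upload from silently changing which bytes a finalised manifest points at.
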
## D1.9 - Minimal HTTP App And CLI

```task
id: ARTIFACT-STORE-WP-0001-T014
status: done
priority: medium
state_hub_task_id: "a43628ab-8b53-45fa-852a-ff0118dd12e7"
```

Acceptance:

- `api.http.app` is a FastAPI app with one route: `GET /health`
  reporting registry liveness, DB connectivity, and backend health.
- `cli` exposes `artifactstore version`, `artifactstore migrate`,
  `artifactstore replay`, `artifactstore health`.
- `make dev` starts the API on `127.0.0.1:8000` with SQLite +
  local FS backend by default.

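The data behind a health report (DB connectivity plus backend disk usage and root accessibility) can be gathered with the standard library. This is a sketch only: the field names, the `ok`/`degraded` status values, and the helper functions are assumptions, and the real route is served by FastAPI rather than called directly.

```python
import os
import shutil
import sqlite3
import tempfile


def backend_health(root: str) -> dict:
    """Report root accessibility and, when reachable, disk usage."""
    accessible = os.path.isdir(root) and os.access(root, os.R_OK | os.W_OK)
    report: dict = {"root": root, "accessible": accessible}
    if accessible:
        usage = shutil.disk_usage(root)
        report.update(total_bytes=usage.total, free_bytes=usage.free)
    return report


def database_health(db: sqlite3.Connection) -> dict:
    """Run a trivial query to confirm the connection is usable."""
    try:
        db.execute("SELECT 1").fetchone()
        return {"connected": True}
    except sqlite3.Error as exc:
        return {"connected": False, "error": str(exc)}


def health_report(db: sqlite3.Connection, root: str) -> dict:
    """Combine both probes into one payload (hypothetical shape)."""
    database = database_health(db)
    backend = backend_health(root)
    healthy = database["connected"] and backend["accessible"]
    return {
        "status": "ok" if healthy else "degraded",
        "database": database,
        "backend": backend,
    }


with tempfile.TemporaryDirectory() as storage_root:
    healthy_report = health_report(sqlite3.connect(":memory:"), storage_root)

# The temporary directory no longer exists here, so the backend probe fails.
missing_root_report = health_report(sqlite3.connect(":memory:"), storage_root)

closed = sqlite3.connect(":memory:")
closed.close()
closed_db_report = database_health(closed)
```

Keeping the probes as plain functions would let the CLI's `artifactstore health` command and the HTTP route share one code path, in line with the library-first shape.
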
## D1.10 - Operator Documentation And ADR Cross-Linking

```task
id: ARTIFACT-STORE-WP-0001-T008
status: done
priority: medium
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"
```

Acceptance:

- `README.md` updated with current run / test / migrate commands.
- `AGENTS.md` "Current Repo Shape" section reflects the scaffold.
- A `docs/OPERATOR.md` page documents environment variables, local
  vs PostgreSQL setup, replay command, and a smoke-test recipe.
- Every ADR is cross-linked from at least one of: blueprint, this
  workplan, or `OPERATOR.md`.

## Suggested implementation order

1. T001 — scaffold and tooling (no other task can start without this).
2. T009 — digest abstraction (unblocks T010, T012).
3. T010 — manifest codec (unblocks T013).
4. T002 — schema and migrations (unblocks T011, T013).
5. T011 — event log + replay.
6. T003 — storage SPI + local backend.
7. T012 — data plane SPI + in-process impl.
8. T013 — registry orchestrator.
9. T014 — minimal HTTP app and CLI.
10. T008 — docs.

## Success criteria

- `make dev && make test` round-trips on a clean checkout.
- A scripted integration test ingests a directory of fixture files,
  finalises the package, reads the manifest, downloads each file, and
  verifies digests end-to-end against the local backend.
- Replaying events from sequence 1 reproduces the materialised view
  state byte-for-byte.
- The library can be imported and exercised without an HTTP server
  running (embedding test).