artifact-store/workplans/ARTIFACT-STORE-WP-0001-service-baseline.md
tegwick f90c761ef6 WP-0001-T008: operator docs + ADR cross-linking; mark WP-0001 done
docs/OPERATOR.md (new): runbook with prerequisites, quick start,
environment variables, SQLite + PostgreSQL setup, storage layout,
CLI reference, HTTP /health reference, an end-to-end Python smoke
test (create_package -> ingest_file -> finalize -> manifest), the
replay / disaster-recovery procedure, common failure modes, and a
References section that links every ADR (0001..0006), the
blueprint, platform ambition, roadmap, and assembly experiment.

README.md: refreshed to v0.1 baseline status. Quick-start uses the
real flow (uv sync, migrate-fresh, dev, /health, artifactstore health).
Make targets and CLI commands tabulated. Links docs/OPERATOR.md.

AGENTS.md: Current Repo Shape now reflects the landed scaffold +
library + CLI + HTTP app rather than "no runnable scaffold yet";
links OPERATOR.md and lists the canonical local commands.

workplans/ARTIFACT-STORE-WP-0001-service-baseline.md:
- T008 marked done.
- frontmatter status: active -> done; updated: 2026-05-16.

All ten WP-0001 tasks are now done (T001/T002/T003/T008/T009/T010/
T011/T012/T013/T014). Foundation workplan retires.

Gates: ruff clean, mypy --strict clean, 83 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 09:02:36 +02:00

id: ARTIFACT-STORE-WP-0001
type: workplan
title: "Foundation: Scaffold, Core Kernels, Local FS Backend"
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 1
created: 2026-05-15
updated: 2026-05-16
state_hub_workstream_id: aebf996c-8721-4e8c-9e56-61d5e4bf8dcb

ARTIFACT-STORE-WP-0001: Foundation — Scaffold, Core Kernels, Local FS Backend

Purpose

Stand up the smallest credible artifact-store core. By the end of this workplan, the library can ingest a directory of files into a package, compute dual digests, write canonical-CBOR manifests, persist state through the append-only event log, store bytes on the local filesystem, and replay materialised views from the event log. There is no package HTTP API yet (that lands in WP-0002); only a /health endpoint exists, so that the dev loop has something to hit.

The shape is library-first (ffmpeg-style). The HTTP server and the CLI are explicitly thin consumers of artifactstore.registry.

Constraints (must satisfy)

  • ADR-0001 — content-addressed storage with dual digest.
  • ADR-0002 — append-only event log as source of truth.
  • ADR-0003 — manifest canonicalisation = canonical CBOR.
  • ADR-0004 — control plane / data plane SPI named.
  • ADR-0005 — v1 technology stack pinned (Python 3.12, uv, FastAPI, SQLAlchemy Core, asyncpg, alembic, cbor2, blake3, ruff, mypy, pytest).
  • ADR-0006 — OCI compatibility kept reachable.
  • docs/ARCHITECTURE-BLUEPRINT.md data model and module layout.

Boundary

This workplan builds the library and a minimal /health endpoint. It does NOT implement: package CRUD HTTP API (WP-0002), retention rules beyond the seed (WP-0003), S3-compatible backend (WP-0004), guide-board producer wiring (WP-0005), GC of unreferenced bytes (WP-0006).

Target architecture (this workplan)

artifactstore (library)
  identity ──┐
  manifest ──┼──> registry (orchestrator) ──> events (WAL + views)
  events  ───┘                       │
  retention (seed only)              └──> dataplane.spi ──> dataplane.inproc ──> storage.spi ──> storage.backends.local
  audit (view)                                                                                  └──> filesystem
  storage.spi
  dataplane.spi + inproc
api.http (just /health)
cli (just `artifactstore version`, `artifactstore migrate`, `artifactstore replay`, `artifactstore health`)

D1.1 - Service Scaffold And Repository Identity

id: ARTIFACT-STORE-WP-0001-T001
status: done
priority: high
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"

Acceptance:

  • pyproject.toml with hatchling build backend, pinned dependencies per ADR-0005.
  • uv.lock committed.
  • Makefile exposes: make dev, make test, make lint, make type, make migrate. Each target is a thin shim, no logic inline.
  • src/artifactstore/ package skeleton matches ADR-0005's layout (empty __init__.py and one placeholder module per top-level concern: identity, manifest, events, retention, audit, storage, dataplane, registry, api/http, cli, config).
  • tests/{unit,integration}/conftest.py in place.
  • .env.example documents required environment variables: ARTIFACTSTORE_DATABASE_URL, ARTIFACTSTORE_STORAGE_LOCAL_ROOT, ARTIFACTSTORE_LOG_LEVEL.
  • CI-equivalent local commands: make lint && make type && make test pass on a clean checkout.
  • README.md replaces the seed README: install with uv sync, run with make dev, test with make test, links to ADRs and blueprint.

D1.2 - Digest Abstraction And Content Address

id: ARTIFACT-STORE-WP-0001-T009
status: done
priority: high
state_hub_task_id: "4dc465c5-5c14-412d-b8c0-aa84076e4560"

Acceptance:

  • identity.Digest value type with algorithm: str and hex: str, immutable, hashable.
  • identity.ContentAddress — string-form <algorithm>:<hex> with validating parser and emitter.
  • identity.digest_stream(reader) -> {primary: Digest, sha256: Digest} — single-pass dual-hash over an AsyncIterator[bytes]. Default primary algorithm: blake3.
  • Algorithm registry with blake3 and sha256 registered at import.
  • Property test: digest over random byte sequences round-trips through serialisation; sha256 matches hashlib.sha256(...).hexdigest(); blake3 matches blake3.blake3(...).hexdigest().
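
The single-pass dual hash described above can be sketched as follows. This is a minimal illustration, not the project's identity module: blake3 is not in the Python standard library, so the sketch substitutes hashlib's blake2b as the primary algorithm, and the helper names `parse_content_address` and `chunks` are hypothetical.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import AsyncIterator

HEX_DIGITS = set("0123456789abcdef")


@dataclass(frozen=True)  # frozen makes the value immutable and hashable
class Digest:
    algorithm: str
    hex: str

    def content_address(self) -> str:
        """Emit the string form <algorithm>:<hex>."""
        return f"{self.algorithm}:{self.hex}"


def parse_content_address(text: str) -> Digest:
    """Validate and parse <algorithm>:<hex> (hypothetical helper)."""
    algorithm, sep, hex_part = text.partition(":")
    if not sep or not algorithm or not hex_part:
        raise ValueError(f"not an <algorithm>:<hex> address: {text!r}")
    if not set(hex_part) <= HEX_DIGITS:
        raise ValueError(f"digest is not lowercase hex: {hex_part!r}")
    return Digest(algorithm, hex_part)


async def digest_stream(
    reader: AsyncIterator[bytes], primary: str = "blake2b"
) -> dict[str, Digest]:
    """Feed every chunk to both hashers so the stream is read only once.

    Assumption: blake2b stands in for blake3 to keep the sketch stdlib-only.
    """
    primary_hash = hashlib.new(primary)
    sha256_hash = hashlib.sha256()
    async for chunk in reader:
        primary_hash.update(chunk)
        sha256_hash.update(chunk)
    return {
        "primary": Digest(primary, primary_hash.hexdigest()),
        "sha256": Digest("sha256", sha256_hash.hexdigest()),
    }


async def chunks(data: bytes, size: int = 4) -> AsyncIterator[bytes]:
    """Hypothetical reader that yields fixed-size chunks of an in-memory buffer."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


if __name__ == "__main__":
    result = asyncio.run(digest_stream(chunks(b"hello world")))
    print(result["primary"].content_address())
    print(result["sha256"].content_address())
```

Updating both hashers inside one loop is what keeps ingest at a single read of the input, which matters once files no longer fit in memory.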

D1.3 - Manifest Codec (Canonical CBOR + JCS Projection)

id: ARTIFACT-STORE-WP-0001-T010
status: done
priority: high
state_hub_task_id: "8b45a3d9-aa19-4ae8-afe0-687417bf12d0"

Acceptance:

  • manifest.Manifest dataclass with the v1 fields enumerated in the blueprint (manifest_version=1, package, files, storage_receipts, retention_summary, provenance).
  • manifest.codec.encode(m) -> bytes produces canonical CBOR (RFC 8949 §4.2.1, core deterministic encoding): definite-length, shortest-form integers, sorted map keys.
  • manifest.codec.decode(b) -> Manifest.
  • manifest.projection.jcs(m) -> bytes produces RFC 8785 canonical JSON.
  • Property test: decode(encode(m)) == m for randomly-generated manifests; encode(decode(jcs_to_cbor(jcs(m)))) == encode(m).
  • Manifest digest helper: manifest_digest(m) -> ContentAddress using BLAKE3 over the canonical CBOR bytes.
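
To make the three canonical-encoding rules concrete, here is a hand-rolled encoder for a small subset of CBOR. It is only a sketch: the workplan pins cbor2 for the real codec, and the function names `_head` and `encode_canonical` are hypothetical. Floats, tags, and indefinite-length items are deliberately out of scope.

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a CBOR head using the shortest form that fits the argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise ValueError("integer does not fit in 64 bits")


def encode_canonical(obj: object) -> bytes:
    """Deterministically encode ints, bytes, str, bool, None, lists and dicts."""
    if obj is None:
        return b"\xf6"
    if isinstance(obj, bool):  # must come before int: bool is an int subclass
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, (list, tuple)):
        # Definite length: the item count goes in the head.
        return _head(4, len(obj)) + b"".join(encode_canonical(x) for x in obj)
    if isinstance(obj, dict):
        # Sort entries by the bytewise order of their encoded keys.
        pairs = sorted(
            ((encode_canonical(k), encode_canonical(v)) for k, v in obj.items()),
            key=lambda pair: pair[0],
        )
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


if __name__ == "__main__":
    # Insertion order does not affect the output bytes.
    print(encode_canonical({"b": 1, "a": 2}).hex())
    print(encode_canonical({"a": 2, "b": 1}).hex())
```

Because the output depends only on the logical content, hashing these bytes gives a stable manifest digest regardless of how the in-memory mapping was built.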

D1.4 - Registry Data Model And Migrations

id: ARTIFACT-STORE-WP-0001-T002
status: done
priority: high
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"

Acceptance:

  • Alembic configured with migrations/ directory; alembic upgrade head works against both SQLite (dev) and PostgreSQL (prod).
  • events, artifact_packages, artifact_files, storage_locations, retention_classes, retention_state, metadata_schemas tables match the blueprint schema.
  • Seed migration populates retention_classes with the five v1 entries.
  • A make migrate and make migrate-fresh target work end-to-end on a clean DB.
  • All schema columns required by ADR-0001 (digest_algorithm, digest_primary, digest_sha256, content_address), ADR-0002 (full events table), and the blueprint's retrieval_tier and restore_status are present.

D1.5 - Event Log Persistence And Replay

id: ARTIFACT-STORE-WP-0001-T011
status: done
priority: high
state_hub_task_id: "90fce17d-cce5-4687-ae9e-02abd7d92622"

Acceptance:

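  • events.write(transaction, Event) writes one row in the given DB transaction. Sequence numbers are assigned by the DB (BIGSERIAL) and are guaranteed monotonic and gapless within a registry instance. (A PostgreSQL sequence on its own can skip values when a transaction rolls back, so the gapless guarantee depends on how writes are serialised, not on BIGSERIAL alone.)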
  • events.tail(since_sequence) -> AsyncIterator[Event] long-polls the table (notify-style on PostgreSQL via LISTEN/NOTIFY, poll-style on SQLite).
  • events.replay(into=ViewWriter) rebuilds all materialised view tables from events deterministically.
  • Test: ingesting a fixed sequence of events, then rebuilding the views from scratch, yields byte-identical materialised state.
  • Event payloads use canonical CBOR (manifest.codec) so the same bytes flow through registry → DB → tail consumer without re-encoding.
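
The replay criterion can be illustrated with an in-memory SQLite database. This is a reduced sketch under stated assumptions: the two-table schema is far smaller than the blueprint's, the event kinds `v1.package.created` and `v1.file.ingested` are hypothetical (only `v1.package.finalized` appears in this workplan), and the function names are illustrative rather than the library's API.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (
    sequence   INTEGER PRIMARY KEY AUTOINCREMENT,
    kind       TEXT NOT NULL,
    package_id TEXT NOT NULL
);
CREATE TABLE artifact_packages (
    package_id TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    file_count INTEGER NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def apply_event(conn: sqlite3.Connection, kind: str, package_id: str) -> None:
    """Project a single event onto the materialised view table."""
    if kind == "v1.package.created":
        conn.execute(
            "INSERT INTO artifact_packages VALUES (?, 'open', 0)", (package_id,)
        )
    elif kind == "v1.file.ingested":
        conn.execute(
            "UPDATE artifact_packages SET file_count = file_count + 1 "
            "WHERE package_id = ?",
            (package_id,),
        )
    elif kind == "v1.package.finalized":
        conn.execute(
            "UPDATE artifact_packages SET status = 'finalized' WHERE package_id = ?",
            (package_id,),
        )
    else:
        raise ValueError(f"unknown event kind: {kind}")


def write_event(conn: sqlite3.Connection, kind: str, package_id: str) -> None:
    """Append the event and update the view in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO events (kind, package_id) VALUES (?, ?)", (kind, package_id)
        )
        apply_event(conn, kind, package_id)


def replay(conn: sqlite3.Connection) -> None:
    """Drop the view contents and rebuild them from the log in sequence order."""
    with conn:
        conn.execute("DELETE FROM artifact_packages")
        rows = conn.execute(
            "SELECT kind, package_id FROM events ORDER BY sequence"
        ).fetchall()
        for kind, package_id in rows:
            apply_event(conn, kind, package_id)


def snapshot(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        "SELECT package_id, status, file_count FROM artifact_packages "
        "ORDER BY package_id"
    ).fetchall()
```

Routing both the live write path and the replay path through the same `apply_event` projection is one way to keep the rebuilt state identical to the original.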

D1.6 - Storage Adapter SPI And Local Filesystem Backend

id: ARTIFACT-STORE-WP-0001-T003
status: done
priority: high
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"

Acceptance:

  • storage.spi.StorageBackend Protocol matches the blueprint.
  • storage.backends.local.LocalBackend implements the SPI:
    • Object key layout <root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>.
    • Atomic write via fsync(tmpfile) + rename.
    • Path traversal rejected at the SPI boundary.
    • health() returns disk usage and root accessibility.
  • Backend registry resolves by backend_id string (per ADR-0004).
  • Unit tests cover: put, get, head, delete, double-put idempotency, delete-of-missing, range read.
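
The key layout and the atomic-write rule can be sketched together. This is an illustration under assumptions, not the LocalBackend implementation: `object_path` and `atomic_put` are hypothetical names, and the validation shown is one possible way to reject path traversal before any filesystem call.

```python
import os
import tempfile
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def object_path(root: Path, algorithm: str, hex_digest: str) -> Path:
    """Map a content address to <root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    # Rejecting anything but plain tokens keeps "..", "/" and "\\" out of the path.
    if not algorithm.isalnum():
        raise ValueError(f"invalid algorithm name: {algorithm!r}")
    if len(hex_digest) < 4 or not set(hex_digest) <= HEX_DIGITS:
        raise ValueError(f"invalid hex digest: {hex_digest!r}")
    return root / algorithm / hex_digest[0:2] / hex_digest[2:4] / hex_digest


def atomic_put(root: Path, algorithm: str, hex_digest: str, data: bytes) -> Path:
    """Write to a temp file in the target directory, fsync it, then rename."""
    target = object_path(root, algorithm, hex_digest)
    if target.exists():
        return target  # double-put of the same content address is a no-op
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)  # atomic replacement of the destination
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Readers never observe a partially written object, because the final name only appears once the bytes are flushed. A stricter variant would also fsync the parent directory after the rename so the new directory entry itself survives a crash.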

D1.7 - Data Plane SPI And In-Process Implementation

id: ARTIFACT-STORE-WP-0001-T012
status: done
priority: high
state_hub_task_id: "8cb8a245-beb5-4713-8d1d-8a623431ad81"

Acceptance:

  • dataplane.spi.DataPlane Protocol matches ADR-0004.
  • dataplane.inproc.InProcessDataPlane implements all five operations on top of a configured StorageBackend.
  • ingest_stream computes both digests in a single pass, writes to the backend keyed by the primary content address, and returns an IngestResult containing both digests, size, and the StorageReceipt.
  • serve_object and verify_object re-read bytes through the backend; verify_object re-digests and returns mismatches if any.
  • Lint rule (or test): no code outside dataplane.* imports storage.backends.* directly.
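
The import-boundary rule in the last bullet could be enforced with a small AST check run from the test suite. The sketch below is one possible approach, not the project's actual lint rule: the function name `forbidden_imports` is hypothetical, the fully qualified package names are assumed from the module layout above, and modules inside storage.backends itself are exempted as an assumption. Relative imports are not resolved.

```python
import ast

FORBIDDEN = "artifactstore.storage.backends"
# Assumption: the data plane and the backends package itself may import backends.
ALLOWED_IMPORTERS = ("artifactstore.dataplane", FORBIDDEN)


def _within(name: str, package: str) -> bool:
    """True if name is the package itself or one of its submodules."""
    return name == package or name.startswith(package + ".")


def forbidden_imports(module_name: str, source: str) -> list[str]:
    """Return the storage.backends names imported by a module that may not do so."""
    if any(_within(module_name, allowed) for allowed in ALLOWED_IMPORTERS):
        return []
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Cover both "from a.b import c" (a.b.c) and the base module a.b.
            names = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue
        found.update(name for name in names if _within(name, FORBIDDEN))
    return sorted(found)


if __name__ == "__main__":
    offending = "from artifactstore.storage.backends.local import LocalBackend\n"
    print(forbidden_imports("artifactstore.registry", offending))
    print(forbidden_imports("artifactstore.dataplane.inproc", offending))
```

A test would walk every file under src/artifactstore/, derive the module name from the path, and assert that this function returns an empty list for each.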

D1.8 - Registry Orchestrator (Library Surface)

id: ARTIFACT-STORE-WP-0001-T013
status: done
priority: high
state_hub_task_id: "f4967308-4613-4def-8c09-41caaeb631f7"

Acceptance:

  • registry.Registry exposes: create_package, ingest_file, finalize_package, get_manifest_bytes (CBOR + JCS), get_file, tail_events, plus stubs for the retention operations, which reduces the work left for WP-0003.
  • Each mutating operation is one DB transaction that writes events AND updates materialised views.
  • Finalisation writes one v1.package.finalized event whose payload is the canonical CBOR manifest, and stamps manifest_digest on artifact_packages.
  • Duplicate relative_path within one not-yet-finalised package is rejected unless an explicit replace is requested.
  • Integration test: end-to-end ingest of a 3-file package against local backend → finalize → read manifest → verify digests → tail events → replay rebuilds identical state.
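
The duplicate-path rule can be shown with an in-memory stand-in for a package that has not been finalised. This is a sketch only: `DraftPackage`, `DuplicatePathError`, and the `replace` keyword are hypothetical names, and the real registry would enforce the rule inside a DB transaction rather than on a Python dict.

```python
from dataclasses import dataclass, field


class DuplicatePathError(ValueError):
    """Raised when a relative_path is ingested twice without replace=True."""


@dataclass
class DraftPackage:
    # Maps relative_path to the content address of the ingested file.
    files: dict[str, str] = field(default_factory=dict)
    finalized: bool = False

    def add_file(
        self, relative_path: str, content_address: str, *, replace: bool = False
    ) -> None:
        if self.finalized:
            raise RuntimeError("package is already finalized")
        if relative_path in self.files and not replace:
            raise DuplicatePathError(relative_path)
        self.files[relative_path] = content_address

    def finalize(self) -> None:
        self.finalized = True


if __name__ == "__main__":
    draft = DraftPackage()
    draft.add_file("report.txt", "sha256:aa")
    draft.add_file("report.txt", "sha256:bb", replace=True)  # explicit replace
    print(draft.files)
```

Making `replace` keyword-only forces callers to state the intent at the call site, so an accidental overwrite cannot happen through a positional argument.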

D1.9 - Minimal HTTP App And CLI

id: ARTIFACT-STORE-WP-0001-T014
status: done
priority: medium
state_hub_task_id: "a43628ab-8b53-45fa-852a-ff0118dd12e7"

Acceptance:

  • api.http.app is a FastAPI app with one route: GET /health reporting registry liveness, DB connectivity, and backend health.
  • cli exposes artifactstore version, artifactstore migrate, artifactstore replay, artifactstore health.
  • make dev starts the API on 127.0.0.1:8000 with SQLite + local FS backend by default.

D1.10 - Operator Documentation And ADR Cross-Linking

id: ARTIFACT-STORE-WP-0001-T008
status: done
priority: medium
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"

Acceptance:

  • README.md updated with current run / test / migrate commands.
  • AGENTS.md "Current Repo Shape" section reflects the scaffold.
  • A docs/OPERATOR.md page documents environment variables, local vs PostgreSQL setup, replay command, and a smoke-test recipe.
  • Every ADR is cross-linked from at least one of: blueprint, this workplan, or OPERATOR.md.

Suggested implementation order

  1. T001 — scaffold and tooling (no other task can start without this).
  2. T009 — digest abstraction (unblocks T010, T012).
  3. T010 — manifest codec (unblocks T013).
  4. T002 — schema and migrations (unblocks T011, T013).
  5. T011 — event log + replay.
  6. T003 — storage SPI + local backend.
  7. T012 — data plane SPI + in-process impl.
  8. T013 — registry orchestrator.
  9. T014 — minimal HTTP app and CLI.
  10. T008 — docs.

Success criteria

  • make dev && make test round-trips on a clean checkout.
  • A scripted integration test ingests a directory of fixture files, finalises the package, reads the manifest, downloads each file, and verifies digests end-to-end against the local backend.
  • Replaying events from sequence 1 reproduces the materialised view state byte-for-byte.
  • The library can be imported and exercised without an HTTP server running (embedding test).