artifact-store/docs/adr/0004-control-plane-data-plane-contract.md
tegwick 747afc27a6 docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans
Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00


ADR-0004 — Control Plane / Data Plane Contract

Status: accepted
Date: 2026-05-15
Related: ADR-0005, docs/PLATFORM-AMBITION.md commitment A5, docs/ASSEMBLY-EXPERIMENT.md

Context

The platform ambition expects a Rust (eventually asm-tuned) data plane to handle hot ingest paths — hashing, chunking, optional compression and encryption, storage backend I/O. The v1 service is written entirely in Python (ADR-0005). The cost of conflating control and data planes at the code level is that extracting the data plane later requires API churn, test rework, and producer migrations.

The cost of separating them now is one named module boundary and one in-process protocol shape. That cost is negligible if it is paid before any consumer exists.

Decision

  1. The Python package is organised so that every byte-handling operation lives behind a named contract:
    • artifactstore.dataplane.spi — the abstract surface (typed dataclasses, async iterator protocols).
    • artifactstore.dataplane.inproc — the v1 implementation, running in the same process as the control plane.
  2. The control plane (artifactstore.registry, artifactstore.api.http, artifactstore.retention, artifactstore.audit) interacts with bytes only through the SPI. No HTTP handler, no DB writer, no retention rule ever reads or writes file bytes directly.
  3. The SPI exposes exactly these operations:
    • ingest_stream(stream, hints) -> IngestResult — consumes an upload, returns content addresses, sizes, and storage receipts.
    • serve_object(content_address, range?) -> AsyncIterator[bytes] — produces bytes for a download.
    • verify_object(content_address) -> VerifyResult — re-reads bytes, re-digests, returns mismatches.
    • delete_object(content_address) -> DeletionResult — best-effort, idempotent.
    • backend_health() -> BackendStatus — readiness, latency, free capacity.
  4. The SPI surface is the contract a future Rust daemon must satisfy. When that daemon ships, artifactstore.dataplane.inproc is replaced by artifactstore.dataplane.remote (a thin gRPC or framed-bincode-over-Unix-socket client). The control plane sees no change.
  5. SPI parameter and return types are CBOR-serialisable from day one, even though nothing serialises them in v1. This lets us switch to RPC without rewriting types.
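The five operations above could be written down as a typing.Protocol along the following lines. This is a minimal sketch, not the contents of dataplane/spi.py: the class name DataPlane, the IngestHints type, and every dataclass field are assumptions made for illustration, since the ADR names only the operations and their result types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import AsyncIterator, Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class IngestHints:
    # Hypothetical fields; the ADR only says "hints".
    expected_size: Optional[int] = None
    media_type: Optional[str] = None


@dataclass(frozen=True)
class IngestResult:
    # Assumed shape: the ADR lists content addresses, sizes, storage receipts.
    content_addresses: tuple[str, ...]
    size: int
    storage_receipt: str


@dataclass(frozen=True)
class VerifyResult:
    content_address: str
    mismatches: tuple[str, ...] = ()  # empty means the re-digest matched


@dataclass(frozen=True)
class DeletionResult:
    content_address: str
    deleted: bool


@dataclass(frozen=True)
class BackendStatus:
    ready: bool
    latency_ms: float
    free_bytes: int


@runtime_checkable
class DataPlane(Protocol):
    """Structural contract; implementations need not inherit from it."""

    async def ingest_stream(
        self, stream: AsyncIterator[bytes], hints: IngestHints
    ) -> IngestResult: ...

    # Plain `def`: an async generator is called without `await`
    # and consumed with `async for`.
    def serve_object(
        self, content_address: str, byte_range: Optional[tuple[int, int]] = None
    ) -> AsyncIterator[bytes]: ...

    async def verify_object(self, content_address: str) -> VerifyResult: ...

    async def delete_object(self, content_address: str) -> DeletionResult: ...

    async def backend_health(self) -> BackendStatus: ...
```

Because the check is structural, an in-process class and a future remote client can both satisfy DataPlane without sharing a base class. Note that runtime_checkable only verifies that the method names exist, not their signatures, so a static type checker remains the real enforcement.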

Consequences

Positive:

  • The data plane can be rewritten in Rust later without changing the API that control-plane code calls.
  • Tests can fake the SPI cheaply; integration tests pin the contract.
  • The CLI in artifactstore.cli is a second consumer of the SPI on equal footing with the HTTP server.
  • Operators with strong embedding requirements can use the in-process data plane forever; nothing forces the RPC hop.
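To illustrate how cheaply the SPI can be faked in tests, here is a sketch of an in-memory stand-in covering three of the five operations. Everything in it is hypothetical: the class name FakeDataPlane, the helper upload_and_fetch, the use of plain dicts instead of the SPI dataclasses, and SHA-256 standing in for the project's real digests so the example needs only the standard library.

```python
import asyncio
import hashlib
from typing import AsyncIterator, Dict, Optional, Tuple


class FakeDataPlane:
    """In-memory stand-in for the data plane (hypothetical test helper)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    async def ingest_stream(
        self, stream: AsyncIterator[bytes], hints: Optional[dict] = None
    ) -> dict:
        # Buffering the whole upload is acceptable in a fake, not in production.
        data = b"".join([chunk async for chunk in stream])
        address = "sha256:" + hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return {"content_address": address, "size": len(data)}

    async def serve_object(
        self, content_address: str, byte_range: Optional[Tuple[int, int]] = None
    ) -> AsyncIterator[bytes]:
        data = self._objects[content_address]
        if byte_range is not None:
            start, end = byte_range  # assumed half-open: [start, end)
            data = data[start:end]
        yield data

    async def delete_object(self, content_address: str) -> dict:
        # Idempotent: deleting a missing object reports False, never raises.
        existed = self._objects.pop(content_address, None) is not None
        return {"content_address": content_address, "deleted": existed}


async def upload_and_fetch(plane: FakeDataPlane, payload: bytes) -> bytes:
    """Round-trip a payload the way a control-plane test might."""

    async def chunks() -> AsyncIterator[bytes]:
        half = len(payload) // 2
        yield payload[:half]
        yield payload[half:]

    result = await plane.ingest_stream(chunks())
    served = plane.serve_object(result["content_address"])
    return b"".join([chunk async for chunk in served])
```

A control-plane unit test would receive such an object in place of the in-process implementation and never touch a filesystem.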

Negative:

  • One extra abstraction layer in v1. Mitigated by the contract being narrow (five operations).
  • Discipline required: PRs that bypass the SPI are rejected. A linter rule (forbidden import: artifactstore.api.* -> filesystem) makes this mechanical.
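One way the forbidden-import rule could be made mechanical is a small check built on the standard ast module. This is only a sketch: the set of modules treated as "filesystem" and the function name forbidden_imports are assumptions, and a real setup would more likely configure an existing lint tool than maintain a script like this.

```python
import ast
from typing import List

# Assumed stand-in for "filesystem": modules that give direct file access.
FORBIDDEN_MODULES = {"os", "pathlib", "shutil", "io", "tempfile"}


def forbidden_imports(source: str) -> List[str]:
    """Return one message per import of a forbidden top-level module."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports ("from . import x") have module=None.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                hits.append(f"line {node.lineno}: {name}")
    return sorted(hits)
```

Run over every file under the control-plane packages, a non-empty result would fail the build. The sketch catches imports only; a call to the built-in open() would need a separate check.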

Implementation notes

  • The SPI is a Protocol (typing.Protocol) in dataplane/spi.py so the in-process and future remote impls don't share an inheritance tree.
  • Streaming returns AsyncIterator[bytes] so neither full-file buffering nor sendfile() zero-copy is foreclosed.
  • The IngestResult payload is the canonical CBOR-able value used in events (ADR-0002). The same byte sequence flows API → SPI → event.
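The "CBOR-able" property can be guarded with a plain-data check, sketched below using only the standard library. The IngestResult fields and the helper is_cbor_able are hypothetical, and restricting map keys to strings is a deliberately conservative assumption (CBOR itself allows other key types). The actual encoding would go through cbor2, the library chosen in ADR-0005, which is left out here to keep the example dependency-free.

```python
from dataclasses import asdict, dataclass
from typing import Any

# Python types that map directly onto CBOR major types and simple values.
CBOR_SCALARS = (type(None), bool, int, float, str, bytes)


@dataclass(frozen=True)
class IngestResult:
    # Assumed fields; the ADR lists content addresses, sizes, receipts.
    content_address: str
    size: int
    storage_receipt: str


def is_cbor_able(value: Any) -> bool:
    """True if value consists only of scalars, lists/tuples, and str-keyed dicts."""
    if isinstance(value, CBOR_SCALARS):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_cbor_able(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_cbor_able(item)
            for key, item in value.items()
        )
    return False


def to_event_payload(result: IngestResult) -> dict:
    """Convert a result to the plain value an event would carry."""
    payload = asdict(result)
    if not is_cbor_able(payload):
        raise TypeError("IngestResult contains a non-CBOR-able field")
    return payload
```

A contract test could apply is_cbor_able to every SPI return type, so a field that cannot cross an RPC boundary is caught while the data plane is still in-process.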