ADR-0004 — Control Plane / Data Plane Contract
Status: accepted
Date: 2026-05-15
Related: ADR-0005, docs/PLATFORM-AMBITION.md commitment A5,
docs/ASSEMBLY-EXPERIMENT.md
Context
The platform ambition expects a Rust (eventually asm-tuned) data plane to handle hot ingest paths — hashing, chunking, optional compression and encryption, storage backend I/O. The v1 service is written entirely in Python (ADR-0005). The cost of conflating control and data planes at the code level is that extracting the data plane later requires API churn, test rework, and producer migrations.
The cost of separating them now is one named module boundary and one in-process protocol shape. That cost is essentially free if taken before any consumer exists.
Decision
- The Python package is organised so that every byte-handling operation lives behind a named contract:
  - artifactstore.dataplane.spi: the abstract surface (typed dataclasses, async iterator protocols).
  - artifactstore.dataplane.inproc: the v1 implementation, running in the same process as the control plane.
- The control plane (artifactstore.registry, artifactstore.api.http, artifactstore.retention, artifactstore.audit) interacts with bytes only through the SPI. No HTTP handler, no DB writer, no retention rule ever reads or writes file bytes directly.
- The SPI exposes exactly these operations:
  - ingest_stream(stream, hints) -> IngestResult: consumes an upload, returns content addresses, sizes, and storage receipts.
  - serve_object(content_address, range?) -> AsyncIterator[bytes]: produces bytes for a download.
  - verify_object(content_address) -> VerifyResult: re-reads bytes, re-digests, returns mismatches.
  - delete_object(content_address) -> DeletionResult: best-effort, idempotent.
  - backend_health() -> BackendStatus: readiness, latency, free capacity.
- The SPI surface is the contract a future Rust daemon must satisfy. When that daemon ships, artifactstore.dataplane.inproc is replaced by artifactstore.dataplane.remote (a thin gRPC or framed-bincode-over-Unix-socket client). The control plane sees no change.
- SPI parameter and return types are CBOR-serialisable today, even when nothing serialises them. This lets us toggle to RPC without rewriting types.
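The five operations above could be written down as a typing.Protocol roughly as follows. This is a minimal sketch, not the project's actual spi.py: the dataclass field names, the choice of async methods, the dict type for hints, and the parameter name byte_range (standing in for the optional range argument) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import AsyncIterator, Optional, Protocol, runtime_checkable


# Field names are hypothetical; the ADR names the result types but not their fields.
@dataclass(frozen=True)
class IngestResult:
    content_address: str
    size_bytes: int


@dataclass(frozen=True)
class VerifyResult:
    content_address: str
    ok: bool


@dataclass(frozen=True)
class DeletionResult:
    content_address: str
    deleted: bool


@dataclass(frozen=True)
class BackendStatus:
    ready: bool


@runtime_checkable
class DataPlane(Protocol):
    """The only surface through which the control plane may touch bytes."""

    async def ingest_stream(
        self, stream: AsyncIterator[bytes], hints: dict
    ) -> IngestResult: ...

    def serve_object(
        self, content_address: str, byte_range: Optional[tuple[int, int]] = None
    ) -> AsyncIterator[bytes]: ...

    async def verify_object(self, content_address: str) -> VerifyResult: ...

    async def delete_object(self, content_address: str) -> DeletionResult: ...

    async def backend_health(self) -> BackendStatus: ...
```

Because the result types are plain frozen dataclasses of strings, integers, and booleans, they would map directly onto CBOR maps if an RPC transport were introduced later.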
Consequences
Positive:
- The data plane can be rewritten in Rust later with zero API churn.
- Tests can fake the SPI cheaply; integration tests pin the contract.
- The CLI in artifactstore.cli is a second consumer of the SPI on equal footing with the HTTP server.
- Operators with strong embedding requirements can use the in-process data plane forever; nothing forces the RPC hop.
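As an illustration of how cheaply the SPI can be faked in tests, the sketch below keeps objects in a dictionary and implements only two of the five operations. FakeDataPlane, the simplified IngestResult, and the bool return from delete_object are hypothetical names and shapes for this example, and sha256 is a standard-library stand-in for whatever digests the real data plane computes.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass(frozen=True)
class IngestResult:
    content_address: str
    size_bytes: int


class FakeDataPlane:
    """In-memory stand-in for the SPI; shares no base class with a real impl."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    async def ingest_stream(
        self, stream: AsyncIterator[bytes], hints: dict
    ) -> IngestResult:
        data = b"".join([chunk async for chunk in stream])
        # sha256 is a stdlib stand-in, not the project's digest scheme.
        address = "sha256:" + hashlib.sha256(data).hexdigest()
        self.objects[address] = data
        return IngestResult(address, len(data))

    async def delete_object(self, content_address: str) -> bool:
        # Idempotent: deleting a missing object is not an error.
        return self.objects.pop(content_address, None) is not None


async def upload(chunks: list[bytes]) -> AsyncIterator[bytes]:
    """Simulate an upload body arriving in pieces."""
    for chunk in chunks:
        yield chunk


async def demo() -> tuple[IngestResult, bool, bool]:
    plane = FakeDataPlane()
    result = await plane.ingest_stream(upload([b"hello ", b"world"]), hints={})
    first = await plane.delete_object(result.content_address)
    second = await plane.delete_object(result.content_address)
    return result, first, second


result, first_delete, second_delete = asyncio.run(demo())
```

A control-plane test would pass such a fake wherever the real data plane is expected; the second delete returning False without raising mirrors the idempotent contract.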
Negative:
- One extra abstraction layer in v1. Mitigated by the contract being narrow (five operations).
- Discipline required: PRs that bypass the SPI are rejected. A linter rule (forbidden import: artifactstore.api.* -> filesystem) makes this mechanical.
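One way such a forbidden-import rule could be made mechanical is a small AST scan over the source of each module under artifactstore.api. The sketch below is an assumption about how the check might look, not the project's linter configuration: the set of modules treated as "filesystem" and the function name forbidden_imports are hypothetical.

```python
import ast

# Assumption: "filesystem" is approximated by these stdlib modules.
FORBIDDEN_ROOTS = {"os", "pathlib", "shutil", "io", "tempfile"}


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden top-level modules imported by `source`, sorted."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in FORBIDDEN_ROOTS:
                found.add(root)
    return sorted(found)


# Two hypothetical handler modules: one clean, one that bypasses the SPI.
handler_ok = "from artifactstore.dataplane import spi\n"
handler_bad = (
    "import os.path\n"
    "from pathlib import Path\n"
    "from artifactstore.dataplane import spi\n"
)
```

Here forbidden_imports(handler_ok) returns an empty list and forbidden_imports(handler_bad) returns ["os", "pathlib"]. A scan like this only sees import statements; a direct call to the built-in open() would need a separate check.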
Implementation notes
- The SPI is a Protocol (typing.Protocol) in dataplane/spi.py so the in-process and future remote impls don't share an inheritance tree.
- Streaming returns AsyncIterator[bytes] so neither full-file buffering nor sendfile() zero-copy is foreclosed.
- The IngestResult payload is the canonical CBOR-able value used in events (ADR-0002). The same byte sequence flows API → SPI → event.
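The streaming note can be illustrated with an async generator that yields an object in fixed-size chunks, optionally restricted to a byte range. This is a sketch under stated assumptions: serve_bytes, CHUNK_SIZE, and the half-open (start, end) range convention are hypothetical, and an in-memory bytes value stands in for a storage backend.

```python
import asyncio
from typing import AsyncIterator, Optional

CHUNK_SIZE = 4  # tiny on purpose so the demo yields several chunks


async def serve_bytes(
    data: bytes, byte_range: Optional[tuple[int, int]] = None
) -> AsyncIterator[bytes]:
    """Yield `data`, or its half-open slice [start, end), in fixed-size chunks."""
    start, end = byte_range if byte_range is not None else (0, len(data))
    end = min(end, len(data))
    for offset in range(start, end, CHUNK_SIZE):
        yield data[offset:min(offset + CHUNK_SIZE, end)]
        await asyncio.sleep(0)  # hand control back to the event loop between chunks


async def collect(stream: AsyncIterator[bytes]) -> list[bytes]:
    """Drain a stream into a list; a real consumer would forward each chunk."""
    return [chunk async for chunk in stream]


full = asyncio.run(collect(serve_bytes(b"0123456789")))
partial = asyncio.run(collect(serve_bytes(b"0123456789", (2, 7))))
```

The consumer decides whether to buffer (as collect does) or to forward each chunk as it arrives, which is the flexibility the AsyncIterator[bytes] return type is meant to preserve.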