generated from coulomb/repo-seed
Aligns the v1 architecture with the longer-horizon platform thesis so we can start implementation without the schema-level inconsistencies the prior review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only event log as source of truth, canonical CBOR manifests, control/data-plane contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core + asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module layout, materialised-view data model over the event log, upload-session and event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases.

WP-0001 rewritten as the Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005 created carrying the existing state_hub_task_ids forward semantically: ingestion API (T004), retention lifecycle (T005), S3-compatible backend (T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001 with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
80 lines
3.4 KiB
Markdown
# ADR-0004 — Control Plane / Data Plane Contract

Status: accepted
Date: 2026-05-15
Related: ADR-0005, `docs/PLATFORM-AMBITION.md` commitment A5,
`docs/ASSEMBLY-EXPERIMENT.md`

## Context

The platform ambition expects a Rust (eventually asm-tuned) data plane
to handle hot ingest paths: hashing, chunking, optional compression and
encryption, storage backend I/O. The v1 service is written entirely in
Python (ADR-0005). The cost of conflating control and data planes at the
code level is that extracting the data plane later requires API churn,
test rework, and producer migrations.

The cost of separating them now is one named module boundary and one
in-process protocol shape. That cost is essentially free if taken
before any consumer exists.

## Decision

1. The Python package is organised so that *every byte-handling
   operation* lives behind a named contract:
   - `artifactstore.dataplane.spi`: the abstract surface (typed
     dataclasses, async iterator protocols).
   - `artifactstore.dataplane.inproc`: the v1 implementation, running
     in the same process as the control plane.
2. The control plane (`artifactstore.registry`, `artifactstore.api.http`,
   `artifactstore.retention`, `artifactstore.audit`) interacts with
   bytes *only* through the SPI. No HTTP handler, no DB writer, no
   retention rule ever reads or writes file bytes directly.
3. The SPI exposes exactly these operations:
   - `ingest_stream(stream, hints) -> IngestResult`: consumes an
     upload, returns content addresses, sizes, and storage receipts.
   - `serve_object(content_address, range?) -> AsyncIterator[bytes]`:
     produces bytes for a download.
   - `verify_object(content_address) -> VerifyResult`: re-reads bytes,
     re-digests, returns mismatches.
   - `delete_object(content_address) -> DeletionResult`: best-effort,
     idempotent.
   - `backend_health() -> BackendStatus`: readiness, latency, free
     capacity.
4. The SPI surface is the contract a future Rust daemon must satisfy.
   When that daemon ships, `artifactstore.dataplane.inproc` is replaced
   by `artifactstore.dataplane.remote` (a thin gRPC or
   framed-bincode-over-Unix-socket client). The control plane sees no
   change.
5. SPI parameter and return types are CBOR-serialisable today, even when
   nothing serialises them. This lets us toggle to RPC without rewriting
   types.

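The five operations listed in the decision can be sketched as a `typing.Protocol` with a toy in-memory implementation. This is a minimal illustration, not the contents of `dataplane/spi.py`: the dataclass fields, the `byte_range` parameter (a half-open `(start, end)` tuple standing in for `range?`), the `InMemoryDataPlane` class, and the use of a single SHA-256 digest as the content address are all assumptions made for the example.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import AsyncIterator, Optional, Protocol


@dataclass(frozen=True)
class IngestResult:
    content_address: str  # assumption: hex SHA-256 stands in for the real address scheme
    size: int


@dataclass(frozen=True)
class VerifyResult:
    content_address: str
    ok: bool


@dataclass(frozen=True)
class DeletionResult:
    content_address: str
    deleted: bool  # False when the object was already absent (idempotent delete)


@dataclass(frozen=True)
class BackendStatus:
    ready: bool
    object_count: int


class DataPlane(Protocol):
    """The five SPI operations; field and parameter shapes are illustrative."""

    async def ingest_stream(self, stream: AsyncIterator[bytes], hints: dict) -> IngestResult: ...
    def serve_object(
        self, content_address: str, byte_range: Optional[tuple[int, int]] = None
    ) -> AsyncIterator[bytes]: ...
    async def verify_object(self, content_address: str) -> VerifyResult: ...
    async def delete_object(self, content_address: str) -> DeletionResult: ...
    async def backend_health(self) -> BackendStatus: ...


class InMemoryDataPlane:
    """Toy in-process implementation; it satisfies DataPlane structurally."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    async def ingest_stream(self, stream: AsyncIterator[bytes], hints: dict) -> IngestResult:
        digest, chunks = hashlib.sha256(), []
        async for chunk in stream:  # hash while consuming the upload
            digest.update(chunk)
            chunks.append(chunk)
        data = b"".join(chunks)
        self._objects[digest.hexdigest()] = data
        return IngestResult(digest.hexdigest(), len(data))

    async def serve_object(self, content_address, byte_range=None):
        data = self._objects[content_address]
        start, end = byte_range if byte_range else (0, len(data))
        for offset in range(start, end, 4):  # tiny chunks to show streaming
            yield data[offset:min(offset + 4, end)]

    async def verify_object(self, content_address: str) -> VerifyResult:
        data = self._objects.get(content_address, b"")
        present = content_address in self._objects
        return VerifyResult(content_address, present and hashlib.sha256(data).hexdigest() == content_address)

    async def delete_object(self, content_address: str) -> DeletionResult:
        return DeletionResult(content_address, self._objects.pop(content_address, None) is not None)

    async def backend_health(self) -> BackendStatus:
        return BackendStatus(ready=True, object_count=len(self._objects))


async def _chunks(*parts: bytes) -> AsyncIterator[bytes]:
    for part in parts:
        yield part


async def demo():
    plane: DataPlane = InMemoryDataPlane()
    result = await plane.ingest_stream(_chunks(b"hello ", b"world"), {})
    head = b"".join([c async for c in plane.serve_object(result.content_address, (0, 5))])
    verified = await plane.verify_object(result.content_address)
    first = await plane.delete_object(result.content_address)
    second = await plane.delete_object(result.content_address)  # idempotent repeat
    return result, head, verified, first, second
```

Because the implementation class does not inherit from `DataPlane`, a remote client could later satisfy the same annotation without sharing any base class, which is the property the decision relies on.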
## Consequences

Positive:

- The data plane can be rewritten in Rust later with zero API churn.
- Tests can fake the SPI cheaply; integration tests pin the contract.
- The CLI in `artifactstore.cli` is a second consumer of the SPI on
  equal footing with the HTTP server.
- Operators with strong embedding requirements can use the in-process
  data plane forever; nothing forces the RPC hop.

Negative:

- One extra abstraction layer in v1. Mitigated by the contract being
  narrow (five operations).
- Discipline required: PRs that bypass the SPI are rejected. A linter
  rule (forbidden import: `artifactstore.api.* -> filesystem`) makes
  this mechanical.

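One way such a forbidden-import rule could be made mechanical is a small AST check run over the control-plane modules. This is a stdlib sketch rather than the project's actual linter configuration: the `FORBIDDEN_MODULES` set and the treatment of a bare `open()` call as a violation are assumptions about what "filesystem" covers.

```python
import ast

# Assumption: these stdlib modules are what "filesystem" access means here.
FORBIDDEN_MODULES = {"os", "pathlib", "shutil", "io"}


def forbidden_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, name) pairs for filesystem access found in `source`."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            hits.append((node.lineno, "open()"))  # direct builtin file access
            continue
        else:
            continue
        for name in names:
            # Compare only the top-level package, so "os.path" matches "os".
            if name.split(".")[0] in FORBIDDEN_MODULES:
                hits.append((node.lineno, name))
    return sorted(hits)


SAMPLE = (
    "import os\n"
    "from pathlib import Path\n"
    "from artifactstore.dataplane import spi\n"
    "\n"
    "def handler():\n"
    "    return open('x')\n"
)
```

Run against every file under `artifactstore/api/` in CI, a non-empty result would fail the build; the SPI import on line 3 of `SAMPLE` passes because it is not in the forbidden set.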
## Implementation notes

- The SPI is a `Protocol` (`typing.Protocol`) in `dataplane/spi.py` so the
  in-process and future remote impls don't share an inheritance tree.
- Streaming returns `AsyncIterator[bytes]` so neither full-file buffering
  nor `sendfile()` zero-copy is foreclosed.
- The `IngestResult` payload is the canonical CBOR-able value used in
  events (ADR-0002). The same byte sequence flows API → SPI → event.
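
As a rough illustration of what "CBOR-able" could mean for `IngestResult`, the sketch below converts a result to a plain value built only from types CBOR can represent directly (text, integers, lists, string-keyed maps) and back. The field names, the `StorageReceipt` type, and the `to_wire` / `from_wire` / `is_cbor_plain` helpers are assumptions for the example; the actual encoding step (for instance with `cbor2`) is deliberately left out.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StorageReceipt:
    backend: str  # hypothetical field: which backend stored the bytes
    key: str      # hypothetical field: backend-specific object key


@dataclass(frozen=True)
class IngestResult:
    content_address: str
    size: int
    receipts: tuple[StorageReceipt, ...]


def to_wire(result: IngestResult) -> dict:
    """Flatten to maps, lists, and scalars that a CBOR encoder accepts as-is."""
    return {
        "content_address": result.content_address,
        "size": result.size,
        "receipts": [asdict(receipt) for receipt in result.receipts],
    }


def from_wire(payload: dict) -> IngestResult:
    """Rebuild the typed value from its decoded wire form."""
    return IngestResult(
        content_address=payload["content_address"],
        size=payload["size"],
        receipts=tuple(StorageReceipt(**item) for item in payload["receipts"]),
    )


def is_cbor_plain(value) -> bool:
    """True when `value` uses only string-keyed maps, lists, and scalars."""
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_cbor_plain(v) for k, v in value.items())
    if isinstance(value, list):
        return all(is_cbor_plain(item) for item in value)
    return value is None or isinstance(value, (str, int, float, bytes, bool))
```

Keeping the wire form this plain means the same value can be handed to an event writer today and to an RPC client later without introducing a second set of types.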