artifact-store/docs/adr/0005-v1-tech-stack.md
tegwick 747afc27a6 docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans
Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00


ADR-0005 — V1 Technology Stack

Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0003, ADR-0004

Context

WP-0001 ("Foundation") cannot start without a pinned stack. The decision needs to balance:

  • ffmpeg / VLC philosophy: minimal dependency budget, sharp boundaries, native code at the hot edges, plain tools.
  • Python is already implied by .gitignore and ecosystem fit (StateHub, guide-board, open-cmis-tck are all Python-leaning).
  • The data plane will eventually be Rust (ADR-0004); the control plane stays in Python and must stay approachable.

Decision

| Concern | Choice | Rationale |
|---|---|---|
| Language (control plane) | Python 3.12+ | Async ecosystem, type hints, matches sibling repos. 3.12 specifically: PEP 695 generics, faster CPython, sys.monitoring. |
| Package / project manager | uv | Single static binary, fast resolver, lockfile-first, replaces pip + pip-tools + venv + pipx in one tool. |
| Build backend | hatchling (via pyproject.toml) | Standards-track PEP 517 backend. No magic. |
| HTTP framework | FastAPI (Starlette + Pydantic v2) | OpenAPI generation, async-native, broad community. |
| ASGI server | uvicorn (dev), gunicorn + uvicorn workers (prod) | Plain, well-understood. |
| Database (prod) | PostgreSQL 16+ | Source-of-truth event log (ADR-0002) wants BIGSERIAL, BYTEA, advisory locks, logical replication. |
| Database (dev/embedded) | SQLite (WAL mode) | Zero-dependency local. Schema is portable when we use SQLAlchemy Core. |
| DB access | SQLAlchemy 2.0 Core + asyncpg (prod) / aiosqlite (dev) | Core, not ORM: explicit SQL, async drivers. Migrations live below the API surface. |
| Migrations | Alembic | Standard, integrates with SQLAlchemy Core, supports both pg and sqlite. |
| Hashing | stdlib hashlib for SHA-256, blake3 PyPI wheel for BLAKE3 | blake3 wheel embeds the SIMD-tuned Rust impl with no build-time toolchain. |
| Serialisation | cbor2 for canonical CBOR (ADR-0003); stdlib json for JCS or jcs PyPI | Smallest deps that satisfy ADR-0003. |
| CLI | typer (atop click) | Sits on FastAPI's Pydantic types cleanly; type-driven CLI surface. |
| Tests | pytest + httpx + trio-asyncio-free pytest-asyncio | Standard. |
| Lint / format | ruff (lint + format) | One tool replaces black + isort + flake8 + pyupgrade. |
| Type checker | mypy in --strict | Pyright is acceptable for editor support; CI gate is mypy. |
| Logging | stdlib logging + structlog for structured output | No exotic deps. |
| Metrics / tracing | OpenTelemetry SDK (deferred to its own workplan) | Listed for forward-compatibility; not a v1 dep. |
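
The Hashing row pairs stdlib SHA-256 with the blake3 wheel. A minimal sketch of how a single pass over a stream could feed both hashers follows. The names `dual_digest` and `Digests` are hypothetical and not taken from ADR-0001, and the blake3 import is treated as optional here only so the sketch runs on a bare interpreter.

```python
import hashlib
import io
from typing import BinaryIO, NamedTuple, Optional

try:
    # Third-party wheel named in the table; optional in this sketch only.
    from blake3 import blake3 as _blake3
except ImportError:
    _blake3 = None


class Digests(NamedTuple):
    """Hypothetical result type: hex digests of one byte stream."""

    sha256: str
    blake3: Optional[str]  # None when the blake3 wheel is not installed


def dual_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> Digests:
    """Read the stream once, feeding every chunk to both hashers."""
    sha = hashlib.sha256()
    b3 = _blake3() if _blake3 is not None else None
    while chunk := stream.read(chunk_size):
        sha.update(chunk)
        if b3 is not None:
            b3.update(chunk)
    return Digests(sha.hexdigest(), b3.hexdigest() if b3 is not None else None)


if __name__ == "__main__":
    print(dual_digest(io.BytesIO(b"abc")))
```

Hashing in one pass means large uploads are read only once; whether the real `identity/` module works this way is an assumption.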

Project layout

artifact-store/
├── pyproject.toml
├── uv.lock
├── Makefile                              # thin shim: make dev / test / lint / type / migrate
├── alembic.ini
├── src/
│   └── artifactstore/
│       ├── __init__.py
│       ├── identity/                     # content address, digest abstraction (ADR-0001)
│       ├── manifest/                     # canonical CBOR, JCS projection (ADR-0003)
│       ├── events/                       # append-only log + replayer (ADR-0002)
│       ├── retention/                    # policy engine
│       ├── audit/                        # audit emission as event subset
│       ├── storage/                      # adapter SPI + backend registry
│       │   ├── spi.py
│       │   └── backends/
│       │       ├── local.py              # filesystem backend
│       │       └── s3.py                 # placeholder, WP-0004
│       ├── dataplane/                    # SPI + in-process impl (ADR-0004)
│       │   ├── spi.py
│       │   └── inproc.py
│       ├── registry/                     # high-level orchestrator
│       ├── api/
│       │   └── http/                     # FastAPI app
│       ├── cli/                          # typer CLI (thin)
│       └── config.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── conftest.py
├── migrations/                           # alembic
└── docs/
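
To make the `storage/` split concrete, here is a rough sketch of what `spi.py`, `backends/local.py`, and the backend registry might look like. Everything in it is an assumption for illustration: the `StorageBackend` protocol, its three methods, the two-level directory fan-out, and the `BACKENDS` mapping are hypothetical, and it keys blobs by SHA-256 alone rather than the dual digest of ADR-0001.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, Protocol


class StorageBackend(Protocol):
    """Hypothetical SPI: the smallest surface a backend has to offer."""

    def put(self, data: bytes) -> str: ...
    def get(self, digest: str) -> bytes: ...
    def exists(self, digest: str) -> bool: ...


class LocalBackend:
    """Filesystem backend storing each blob under its SHA-256 hex digest."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, digest: str) -> Path:
        # Two-level fan-out keeps any single directory small.
        return self.root / digest[:2] / digest[2:]

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():  # content-addressed: same bytes, same path
            path.parent.mkdir(parents=True, exist_ok=True)
            tmp = path.with_suffix(".tmp")
            tmp.write_bytes(data)
            tmp.replace(path)  # rename is atomic on the same filesystem
        return digest

    def get(self, digest: str) -> bytes:
        return self._path(digest).read_bytes()

    def exists(self, digest: str) -> bool:
        return self._path(digest).exists()


# Registry maps a backend name to a factory; "s3" would be added in WP-0004.
BACKENDS: Dict[str, Callable[[Path], StorageBackend]] = {"local": LocalBackend}

if __name__ == "__main__":
    store = BACKENDS["local"](Path(tempfile.mkdtemp()))
    digest = store.put(b"hello")
    print(digest, store.exists(digest))
```

A structural `Protocol` lets backends satisfy the SPI without inheriting from a base class, which keeps `mypy --strict` useful at the adapter boundary.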

Commands (T001 acceptance)

make dev        # uvicorn with reload, sqlite backend, local FS storage
make test       # pytest -q
make lint       # ruff check + ruff format --check
make type       # mypy --strict src tests
make migrate    # alembic upgrade head
artifactstore   # CLI entry point installed by uv
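
`make dev` runs against the SQLite backend. As a rough illustration of what that involves, the sketch below opens a file-backed database in WAL mode and appends to an event table. It uses the stdlib `sqlite3` module directly instead of SQLAlchemy Core + aiosqlite, and the `events` schema, the function names, and the event kinds are hypothetical; the real schema would come from the Alembic migrations.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_dev_db(path: Path) -> sqlite3.Connection:
    """Open a file-backed SQLite database and switch it to WAL mode."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"WAL not available, journal mode is {mode!r}")
    # Hypothetical minimal event table; seq plays the role BIGSERIAL has in PostgreSQL.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            seq     INTEGER PRIMARY KEY AUTOINCREMENT,
            kind    TEXT NOT NULL,
            payload BLOB NOT NULL
        )
        """
    )
    return conn


def append_event(conn: sqlite3.Connection, kind: str, payload: bytes) -> int:
    """Append one event and return its sequence number."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO events (kind, payload) VALUES (?, ?)", (kind, payload)
        )
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_dev_db(Path(tempfile.mkdtemp()) / "dev.sqlite3")
    print(append_event(conn, "artifact.created", b"\x01"))
```

WAL mode needs a file-backed database, so an in-memory connection would not exercise the same journal path.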

Consequences

Positive:

  • Dependency budget is small and each dep is best-in-class for its slot.
  • The same toolchain works on Linux, macOS, and CI without special cases.
  • uv.lock is checked in; builds are reproducible.
  • Every layer maps one-to-one to a docs concept (identity, manifest, events, dataplane, etc.), so the codebase remains navigable.

Negative:

  • Pydantic v2 is the heaviest non-DB dep; acceptable for the OpenAPI win.
  • Choosing SQLAlchemy Core over ORM costs some convenience; we accept it because explicit SQL is easier to migrate to Rust later (ADR-0004).
  • mypy --strict is a per-PR tax; bounded by keeping the codebase small.

Revision policy

This ADR is the most likely candidate for revision once we have profile data from real ingestion. Candidates we are already watching:

  • Replace cbor2 with a Rust-backed CBOR codec if profiling shows it on the hot path.
  • Replace uvicorn with granian (Rust ASGI server) if performance demands it.
  • Replace SQLAlchemy Core with raw asyncpg + a tiny query builder if Core's abstractions show up in flame graphs.

Each replacement is its own ADR. None of them are v1 work.