artifact-store/docs/adr/0005-v1-tech-stack.md
tegwick 747afc27a6 docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans
Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00

# ADR-0005 — V1 Technology Stack
Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0003, ADR-0004
## Context
WP-0001 ("Foundation") cannot start without a pinned stack. The decision
needs to balance:
- ffmpeg / VLC philosophy: minimal dependency budget, sharp boundaries,
native code at the hot edges, plain tools.
- Python is already implied by `.gitignore` and ecosystem fit (StateHub,
guide-board, open-cmis-tck are all Python-leaning).
- The data plane will eventually be Rust (ADR-0004); the control plane
stays in Python and must stay approachable.
## Decision
| Concern | Choice | Rationale |
|---|---|---|
| Language (control plane) | **Python 3.12+** | Async ecosystem, type hints, matches sibling repos. 3.12 specifically: PEP 695 generics, faster CPython, `sys.monitoring`. |
| Package / project manager | **uv** | Single static binary, fast resolver, lockfile-first, replaces `pip + pip-tools + venv + pipx` in one tool. |
| Build backend | **hatchling** (via `pyproject.toml`) | Standards-track PEP 517 backend. No magic. |
| HTTP framework | **FastAPI** (Starlette + Pydantic v2) | OpenAPI generation, async-native, broad community. |
| ASGI server | **uvicorn** (dev), **gunicorn + uvicorn workers** (prod) | Plain, well-understood. |
| Database (prod) | **PostgreSQL 16+** | Source-of-truth event log (ADR-0002) wants `BIGSERIAL`, `BYTEA`, advisory locks, logical replication. |
| Database (dev/embedded) | **SQLite (WAL mode)** | Zero-dependency local. Schema is portable when we use SQLAlchemy Core. |
| DB access | **SQLAlchemy 2.0 Core** + **asyncpg** (prod) / **aiosqlite** (dev) | Core, not ORM — explicit SQL, async drivers. Migrations live below the API surface. |
| Migrations | **Alembic** | Standard, integrates with SQLAlchemy Core, supports both pg and sqlite. |
| Hashing | stdlib **`hashlib`** for SHA-256, **`blake3`** PyPI wheel for BLAKE3 | `blake3` wheel embeds the SIMD-tuned Rust impl with no build-time toolchain. |
| Serialisation | **`cbor2`** for canonical CBOR (ADR-0003); stdlib `json` or the `jcs` PyPI package for the JCS projection | Smallest deps that satisfy ADR-0003. |
| CLI | **typer** (atop click) | Type-hint-driven CLI surface, in the same style as FastAPI. |
| Tests | **pytest** + **httpx** + **pytest-asyncio** (asyncio only, no trio or trio-asyncio) | Standard. |
| Lint / format | **ruff** (lint + format) | One tool replaces black + isort + flake8 + pyupgrade. |
| Type checker | **mypy** with `--strict` | Pyright is acceptable for editor support; the CI gate is mypy. |
| Logging | stdlib `logging` + `structlog` for structured output | No exotic deps. |
| Metrics / tracing | OpenTelemetry SDK (deferred to its own workplan) | Listed for forward-compatibility; not a v1 dep. |
### Project layout
```
artifact-store/
├── pyproject.toml
├── uv.lock
├── Makefile                  # thin shim: make dev / test / lint / type / migrate
├── alembic.ini
├── src/
│   └── artifactstore/
│       ├── __init__.py
│       ├── identity/         # content address, digest abstraction (ADR-0001)
│       ├── manifest/         # canonical CBOR, JCS projection (ADR-0003)
│       ├── events/           # append-only log + replayer (ADR-0002)
│       ├── retention/        # policy engine
│       ├── audit/            # audit emission as event subset
│       ├── storage/          # adapter SPI + backend registry
│       │   ├── spi.py
│       │   └── backends/
│       │       ├── local.py  # filesystem backend
│       │       └── s3.py     # placeholder, WP-0004
│       ├── dataplane/        # SPI + in-process impl (ADR-0004)
│       │   ├── spi.py
│       │   └── inproc.py
│       ├── registry/         # high-level orchestrator
│       ├── api/
│       │   └── http/         # FastAPI app
│       ├── cli/              # typer CLI (thin)
│       └── config.py
├── tests/
│   ├── unit/
│   ├── integration/
│   └── conftest.py
├── migrations/               # alembic
└── docs/
```
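
To make the `storage/spi.py` and `storage/backends/local.py` split concrete, here is one possible shape for the adapter SPI and a filesystem backend. Everything in it is hypothetical: the `StorageBackend` protocol, its `put` / `get` / `exists` methods, the `LocalBackend` class, and the two-character fan-out directory scheme are assumptions for illustration, and it keys blobs by SHA-256 only rather than by the dual digest of ADR-0001.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical adapter SPI: the minimum a backend would need to offer."""

    def put(self, data: bytes) -> str: ...
    def get(self, digest: str) -> bytes: ...
    def exists(self, digest: str) -> bool: ...


class LocalBackend:
    """Filesystem backend keyed by SHA-256 hex digest (illustrative only)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, digest: str) -> Path:
        # Fan out by the first two hex characters to keep directories small.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():  # content-addressed, so rewriting is a no-op
            path.parent.mkdir(parents=True, exist_ok=True)
            tmp = path.with_suffix(".tmp")
            tmp.write_bytes(data)
            tmp.replace(path)  # rename into place on the same filesystem
        return digest

    def get(self, digest: str) -> bytes:
        return self._path(digest).read_bytes()

    def exists(self, digest: str) -> bool:
        return self._path(digest).exists()


def roundtrip(backend: StorageBackend, data: bytes) -> bytes:
    """Caller code written against the SPI, not a concrete backend."""
    return backend.get(backend.put(data))


with tempfile.TemporaryDirectory() as tmp_dir:
    store = LocalBackend(Path(tmp_dir))
    key = store.put(b"hello")
    print(key[:12], store.exists(key), roundtrip(store, b"hello"))
```

Using a structural `Protocol` rather than a base class means a later backend (for example the `s3.py` placeholder) would only need to match the method signatures, which keeps the registry free of inheritance coupling.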
### Commands (T001 acceptance)
```
make dev        # uvicorn with reload, sqlite backend, local FS storage
make test       # pytest -q
make lint       # ruff check + ruff format --check
make type       # mypy --strict src tests
make migrate    # alembic upgrade head
artifactstore   # CLI entry point installed by uv
```
## Consequences
Positive:
- Dependency budget is small and each dep is best-in-class for its slot.
- The same toolchain works on Linux, macOS, and CI without special cases.
- `uv.lock` is checked in; builds are reproducible.
- Every layer maps one-to-one to a docs concept (identity, manifest,
events, dataplane, etc.), so the codebase remains navigable.
Negative:
- Pydantic v2 is the heaviest non-DB dep; acceptable for the OpenAPI win.
- Choosing SQLAlchemy Core over ORM costs some convenience; we accept
it because explicit SQL is easier to migrate to Rust later (ADR-0004).
- mypy `--strict` is a per-PR tax; bounded by keeping the codebase small.
## Revision policy
This ADR is the most likely candidate for revision once we have profiling
data from real ingestion. Candidates we are already watching:
- Replace `cbor2` with a Rust-backed CBOR codec if profiling shows it on
the hot path.
- Replace `uvicorn` with `granian` (Rust ASGI server) if performance
demands it.
- Replace `SQLAlchemy Core` with raw `asyncpg` + a tiny query builder
if Core's abstractions show up in flame graphs.
Each replacement would get its own ADR. None of them is v1 work.