# ADR-0002 — Append-Only Event Log as Source of Truth
Status: accepted
Date: 2026-05-15
Related: `docs/PLATFORM-AMBITION.md` commitment A3
## Context

The original blueprint defines `audit_events` and `retention_events` as
separate tables. Both are useful, but neither is a complete authoritative
record of how registry state was produced. Several downstream needs share
one underlying primitive:

- audit (who did what when, with what result),
- change-data-capture feed for downstream consumers (Statehub, search),
- replication and federation between instances,
- point-in-time replay and disaster recovery,
- materialised view rebuilds when schemas evolve.

Each can be served by an append-only log of registry events with a
monotonic sequence number. Two separate tables cannot.
## Decision

1. The registry persists an append-only `events` table. Every
   state-changing operation writes one row in the same database
   transaction as the operation. Once written, rows are immutable.
2. Each row has a strictly monotonic, gapless sequence number scoped to
   the registry instance, and a UTC ingest timestamp.
3. The current `artifact_packages`, `artifact_files`, `storage_locations`,
   and `retention_state` tables are materialised views over `events`.
   They are rebuildable by replay.
4. Event payloads are stored as canonical CBOR (ADR-0003), keyed by
   `event_type` (string slug). The `event_type` namespace is versioned
   (`v1.package.created`, `v1.file.ingested`, `v1.retention.extended`,
   etc.).
5. `audit_events` and `retention_events` cease to exist as standalone
   tables; their semantics are subsets of `events` filtered by
   `event_type`.
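As a rough illustration of points 1 and 2, the sketch below writes the event row and the materialised-view row inside one transaction, so a failed operation leaves no event behind. It is not the registry's implementation. It makes several assumptions: SQLite (stdlib) stands in for PostgreSQL, sorted-key JSON stands in for canonical CBOR, BLAKE2b stands in for BLAKE3, and the names `open_registry`, `create_package`, and `canonical_bytes` are hypothetical.

```python
import hashlib
import json
import sqlite3


def canonical_bytes(payload: dict) -> bytes:
    # Assumption: sorted-key, whitespace-free JSON as a stand-in for canonical CBOR.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def open_registry() -> sqlite3.Connection:
    # Assumption: in-memory SQLite stands in for the PostgreSQL database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (
            sequence       INTEGER PRIMARY KEY,
            event_type     TEXT NOT NULL,
            subject_id     TEXT,
            payload        BLOB NOT NULL,
            payload_digest BLOB NOT NULL
        );
        -- Rows are immutable once written: reject UPDATE and DELETE.
        CREATE TRIGGER events_no_update BEFORE UPDATE ON events
        BEGIN SELECT RAISE(ABORT, 'events are append-only'); END;
        CREATE TRIGGER events_no_delete BEFORE DELETE ON events
        BEGIN SELECT RAISE(ABORT, 'events are append-only'); END;
        -- Materialised view over the event log (heavily simplified).
        CREATE TABLE artifact_packages (
            package_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL
        );
        """
    )
    return conn


def create_package(conn: sqlite3.Connection, package_id: str, name: str) -> int:
    payload = canonical_bytes({"name": name, "package_id": package_id})
    # Assumption: BLAKE2b from the stdlib stands in for BLAKE3.
    digest = hashlib.blake2b(payload, digest_size=32).digest()
    with conn:  # commits on success, rolls back both inserts on any exception
        sequence = conn.execute(
            "SELECT COALESCE(MAX(sequence), 0) + 1 FROM events"
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
            (sequence, "v1.package.created", package_id, payload, digest),
        )
        conn.execute(
            "INSERT INTO artifact_packages VALUES (?, ?)", (package_id, name)
        )
    return sequence
```

Deriving the next sequence from `MAX(sequence) + 1` inside the transaction keeps the numbering gapless after a rollback, but only with a single writer. A multi-writer deployment would need some form of serialisation, which this sketch does not attempt.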
## Consequences

Positive:

- One primitive serves audit, CDC, replication, replay, and rebuild.
- A consumer can tail by `sequence > N` and never miss an event.
- Forward-compatibility: new view columns can be derived from existing
  events by adding a replay path; no migration of the event log itself
  is required.
- Signed event chains are reachable later by adding a signature column.
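The tailing property can be sketched from the consumer's side. The example below is a minimal sketch, not part of the registry: `TailingConsumer` and its fields are hypothetical names, and the log is an in-memory list. The consumer keeps a checkpoint, asks only for events above it, and uses the gapless guarantee to detect a missed event.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    sequence: int
    event_type: str


@dataclass
class TailingConsumer:
    checkpoint: int = 0  # highest sequence already processed
    seen: list[int] = field(default_factory=list)

    def poll(self, log: list[Event], limit: int = 2) -> int:
        """Process up to `limit` events with sequence > checkpoint."""
        pending = sorted(
            (e for e in log if e.sequence > self.checkpoint),
            key=lambda e: e.sequence,
        )
        batch = pending[:limit]
        for event in batch:
            # Gapless sequences make a skipped event detectable.
            if event.sequence != self.checkpoint + 1:
                raise RuntimeError(f"gap before sequence {event.sequence}")
            self.seen.append(event.sequence)
            self.checkpoint = event.sequence
        return len(batch)
```

Because the checkpoint only advances after an event is handled, a consumer that restarts from its stored checkpoint resumes exactly where it stopped.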
Negative:

- Replays cost wall-clock time on large datasets. Snapshots of
  materialised views (with the highest applied sequence stamped on them)
  are used to bound replay cost.
- Schema migrations on materialised views still happen; they just no
  longer touch the source of truth.
- Discipline required: any write that bypasses the event log is a bug.
  Enforced by code review and a runtime invariant check on the
  materialised tables.
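The sketch below illustrates snapshot-bounded replay and one possible invariant check. It rests on assumptions: the payload fields `retention_days` and `extra_days`, the `ViewState` shape, and the function names are all hypothetical. A snapshot carries the highest applied sequence, replay skips everything at or below it, and the invariant check compares a live view against a fresh replay.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    sequence: int
    event_type: str
    subject_id: str
    payload: dict


@dataclass
class ViewState:
    applied_sequence: int            # highest event sequence folded into the view
    retention_days: dict[str, int]   # package id -> retention in days


def replay(events: list[Event], snapshot: Optional[ViewState] = None) -> ViewState:
    """Fold events into a view, starting from a snapshot when one is given."""
    if snapshot is None:
        state = ViewState(0, {})
    else:
        state = ViewState(snapshot.applied_sequence, dict(snapshot.retention_days))
    for event in sorted(events, key=lambda e: e.sequence):
        if event.sequence <= state.applied_sequence:
            continue  # already reflected in the snapshot
        if event.event_type == "v1.package.created":
            state.retention_days[event.subject_id] = event.payload["retention_days"]
        elif event.event_type == "v1.retention.extended":
            state.retention_days[event.subject_id] += event.payload["extra_days"]
        state.applied_sequence = event.sequence
    return state


def view_matches_log(live: ViewState, events: list[Event]) -> bool:
    """Invariant check: the live view must equal a full replay of the log."""
    return replay(events) == live
```

A write that bypassed the log would make `view_matches_log` return `False`, which is the kind of signal a runtime invariant check could raise on.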
## Implementation notes

- `events` schema (v1):
  - `sequence BIGSERIAL PRIMARY KEY`
  - `created_at TIMESTAMPTZ NOT NULL DEFAULT now()`
  - `event_type TEXT NOT NULL`
  - `subject_kind TEXT NOT NULL`: `package` | `file` | `retention` | `storage` | `system`
  - `subject_id UUID`: nullable for system-level events
  - `actor TEXT NOT NULL`: producer or operator identity
  - `payload BYTEA NOT NULL`: canonical CBOR
  - `payload_digest BYTEA NOT NULL`: BLAKE3 of `payload`
- Indexes: `(subject_kind, subject_id)`, `(event_type, sequence)`.
- Replay tool ships in v1 as a CLI subcommand (`artifactstore replay`).
- Outbound CDC stream (NATS / Kafka) is its own workplan; v1 only exposes
  long-poll over `GET /events?since=<sequence>`.
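To make the schema and the `since` query concrete, the sketch below translates the v1 columns into SQLite DDL and shows the kind of query a `GET /events?since=<sequence>` handler might run. It is an approximation under stated assumptions: SQLite types replace the PostgreSQL ones, stdlib BLAKE2b replaces BLAKE3, and `append_event` and `events_since` are hypothetical helper names, not part of any existing API.

```python
import hashlib
import sqlite3
from typing import Optional

# Assumption: SQLite approximations of the PostgreSQL column types listed above.
SCHEMA = """
CREATE TABLE events (
    sequence       INTEGER PRIMARY KEY,                      -- BIGSERIAL in PostgreSQL
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- TIMESTAMPTZ in PostgreSQL
    event_type     TEXT NOT NULL,
    subject_kind   TEXT NOT NULL CHECK (subject_kind IN
                       ('package', 'file', 'retention', 'storage', 'system')),
    subject_id     TEXT,                                     -- UUID; NULL for system events
    actor          TEXT NOT NULL,
    payload        BLOB NOT NULL,
    payload_digest BLOB NOT NULL
);
CREATE INDEX events_subject  ON events (subject_kind, subject_id);
CREATE INDEX events_type_seq ON events (event_type, sequence);
"""


def payload_digest(payload: bytes) -> bytes:
    # Assumption: BLAKE2b stands in for BLAKE3, which is not in the stdlib.
    return hashlib.blake2b(payload, digest_size=32).digest()


def append_event(conn: sqlite3.Connection, event_type: str, subject_kind: str,
                 subject_id: Optional[str], actor: str, payload: bytes) -> int:
    with conn:  # one transaction per append
        cur = conn.execute(
            "INSERT INTO events (event_type, subject_kind, subject_id, actor,"
            " payload, payload_digest) VALUES (?, ?, ?, ?, ?, ?)",
            (event_type, subject_kind, subject_id, actor, payload,
             payload_digest(payload)),
        )
    return cur.lastrowid


def events_since(conn: sqlite3.Connection, since: int, limit: int = 100) -> list[dict]:
    """Return events with sequence > since, oldest first, verifying each digest."""
    rows = conn.execute(
        "SELECT sequence, event_type, payload, payload_digest FROM events"
        " WHERE sequence > ? ORDER BY sequence LIMIT ?",
        (since, limit),
    ).fetchall()
    result = []
    for sequence, event_type, payload, stored in rows:
        if payload_digest(payload) != stored:
            raise ValueError(f"digest mismatch at sequence {sequence}")
        result.append({"sequence": sequence, "event_type": event_type,
                       "payload": payload})
    return result
```

One caveat when carrying this over to PostgreSQL: a sequence-backed column such as `BIGSERIAL` can skip values when an inserting transaction rolls back, and concurrent transactions can commit out of sequence order. The gapless, never-miss guarantees above would therefore likely need extra coordination, for example serialising writers, which this sketch does not cover.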