docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans

Aligns the v1 architecture with the longer-horizon platform thesis so we can start implementation without the schema-level inconsistencies the prior review surfaced.

- ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only event log as source of truth, canonical CBOR manifests, control/data-plane contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core + asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI compatibility kept reachable.
- Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module layout, materialised-view data model over the event log, upload-session and event-stream endpoints pinned, retrieval tiering promoted into the schema.
- Roadmap added (docs/ROADMAP.md) with three phases.
- WP-0001 rewritten as the Foundation plan (scaffold + kernels + local FS + minimal app).
- WP-0002..0005 created, carrying the existing state_hub_task_ids forward semantically: ingestion API (T004), retention lifecycle (T005), S3-compatible backend (T006), guide-board pilot (T007).
- T001/T002/T003/T008 remain in WP-0001 with refined acceptance.
- README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00
parent 403d903585
commit 747afc27a6
16 changed files with 1761 additions and 404 deletions
--- a/docs/adr/0001-content-addressed-storage.md
+++ b/docs/adr/0001-content-addressed-storage.md
@@ -0,0 +1,80 @@
+# ADR-0001 — Content-Addressed Storage with Dual Digest
+
+Status: accepted
+Date: 2026-05-15
+Supersedes: —
+Related: ADR-0003, ADR-0006, `docs/PLATFORM-AMBITION.md` commitments A1, A2, A9
+
+## Context
+
+The architecture blueprint as originally drafted addresses stored bytes by
+logical `(package, relative_path)`. That is sufficient for v1 ingestion but
+forecloses global deduplication, Merkle integrity proofs, partial
+replication, federation, and OCI artifact compatibility — all of which the
+platform ambition requires to remain reachable.
+
+Independently, the original blueprint pins SHA-256 as the only file digest.
+SHA-256 with SHA-NI on modern x86 reaches ~1.5–2 GB/s/core. BLAKE3 on the
+same hardware reaches 6–10+ GB/s/core, parallelises across cores, and its
+construction *is* a Merkle tree — package-level integrity becomes free.
+SHA-256 remains the lingua franca of SLSA, in-toto, cosign, and OCI; we
+cannot drop it.
+
+## Decision
+
+1. The canonical storage key for any byte sequence is its content address
+   in the form `<algorithm>:<lowercase-hex-digest>`. Storage backends store
+   and retrieve by this key. `relative_path` is logical metadata recorded
+   in the manifest, not a storage-layer concept.
+2. Every `artifact_files` row carries two digest columns:
+   - `digest_primary` — the native digest; default algorithm `blake3`.
+   - `digest_sha256` — always populated for interop, even when `blake3`
+     is the primary.
+   Both are computed in a single ingest pass (one read of the input).
+3. The schema also carries a `digest_algorithm` column naming the primary
+   algorithm. Additional algorithms are added by new columns or a side
+   table, never by overloading `digest_primary`.
+4. Storage backend object keys are derived from `digest_primary` only.
+   Migrations between primary algorithms are explicit and audited; they
+   are not silent.
+
+## Consequences
+
+Positive:
+
+- Global deduplication is automatic — two identical files in two packages
+  share one backend object.
+- Merkle integrity over a package is cheap with BLAKE3: its digest is already a Merkle root.
+- Federation, partial mirrors, and OCI compatibility (ADR-0006) become
+  reachable without schema migration.
+- Verification of a single file does not require fetching its package.
+
+Negative:
+
+- Two digests must be computed per ingest. Mitigated by streaming both
+  through one buffer; the bottleneck is usually I/O, not hashing.
+- Reference counting: deletion of an `artifact_files` row cannot
+  unconditionally delete the backend object. A garbage-collector pass
+  reconciles references before deleting bytes. This is correct anyway
+  (deletion should be deliberate, per the blueprint).
+- Producers requesting "store these N bytes at path P" must understand
+  that their P is logical. This is a documentation problem, not a
+  technical one.
+
+## Implementation notes
+
+- v1 ships BLAKE3 via the `blake3` PyPI wheel (Rust core, SIMD-accelerated;
+  no asm we maintain).
+- v1 ships SHA-256 via stdlib `hashlib` (SHA-NI used when the CPython
+  build links against OpenSSL with SHA-NI support).
+- A `Digest` value object wraps `(algorithm, hex)`; serialised forms
+  always include the algorithm prefix.
+- A garbage-collector workplan is filed at WP-0006 (TBD); v1 does not
+  delete bytes automatically — it marks them eligible.
+
+## Status of the original blueprint pin
+
+The pre-cleanup blueprint's `artifact_files.sha256` column is replaced by
+`digest_algorithm`, `digest_primary`, `digest_sha256`. The pre-cleanup
+blueprint's implicit path-keyed storage is replaced by content-keyed
+storage. These changes are absorbed into `docs/ARCHITECTURE-BLUEPRINT.md`.
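A minimal sketch of the single-pass dual digest described in ADR-0001 (Decision 2), offered as an illustration rather than the repository's implementation. `Digest` is named in the ADR; its fields, the `dual_digest` helper, and the chunk size are assumptions. The sketch uses the `blake3` wheel when it is installed and otherwise falls back to stdlib `hashlib.blake2b` purely as a stand-in so that it runs anywhere.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO

try:  # ADR-0001 names the `blake3` PyPI wheel as the v1 primary hasher.
    from blake3 import blake3 as _primary_hasher
    PRIMARY_ALGORITHM = "blake3"
except ImportError:  # Stand-in only, so the sketch runs without the wheel.
    _primary_hasher = hashlib.blake2b
    PRIMARY_ALGORITHM = "blake2b"


@dataclass(frozen=True)
class Digest:
    """(algorithm, hex) pair; the serialised form always carries the prefix."""
    algorithm: str
    hex: str

    def __str__(self) -> str:
        return f"{self.algorithm}:{self.hex}"


def dual_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> tuple[Digest, Digest]:
    """Read the stream once and feed every chunk to both hashers."""
    primary = _primary_hasher()
    sha256 = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        primary.update(chunk)
        sha256.update(chunk)
    return (
        Digest(PRIMARY_ALGORITHM, primary.hexdigest()),
        Digest("sha256", sha256.hexdigest()),
    )


if __name__ == "__main__":
    digest_primary, digest_sha256 = dual_digest(io.BytesIO(b"hello world"))
    print(digest_primary)  # would be the storage key under ADR-0001
    print(digest_sha256)   # always populated for interop
```

Both hashers see each chunk exactly once, so the input is read a single time regardless of how many digest columns are populated.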
--- a/docs/adr/0002-event-log-source-of-truth.md
+++ b/docs/adr/0002-event-log-source-of-truth.md
@@ -0,0 +1,76 @@
+# ADR-0002 — Append-Only Event Log as Source of Truth
+
+Status: accepted
+Date: 2026-05-15
+Related: `docs/PLATFORM-AMBITION.md` commitment A3
+
+## Context
+
+The original blueprint defines `audit_events` and `retention_events` as
+separate tables. Both are useful, but neither is a complete authoritative
+record of how registry state was produced. Several downstream needs share
+one underlying primitive:
+
+- audit (who did what when, with what result),
+- change-data-capture feed for downstream consumers (StateHub, search),
+- replication and federation between instances,
+- point-in-time replay and disaster recovery,
+- materialised view rebuilds when schemas evolve.
+
+Each can be served by an append-only log of registry events with a
+monotonic sequence number. Two separate tables cannot.
+
+## Decision
+
+1. The registry persists an append-only `events` table. Every state-
+   changing operation writes one row in the same database transaction as
+   the operation. Once written, rows are immutable.
+2. Each row has a strictly monotonic, gapless sequence number scoped to
+   the registry instance, and a UTC ingest timestamp.
+3. The current `artifact_packages`, `artifact_files`, `storage_locations`,
+   and `retention_state` tables are materialised views over `events`.
+   They are rebuildable by replay.
+4. Event payloads are stored as canonical CBOR (ADR-0003), keyed by
+   `event_type` (string slug). The `event_type` namespace is versioned
+   (`v1.package.created`, `v1.file.ingested`, `v1.retention.extended`,
+   etc.).
+5. `audit_events` and `retention_events` cease to exist as standalone
+   tables; their semantics are subsets of `events` filtered by
+   `event_type`.
+
+## Consequences
+
+Positive:
+
+- One primitive serves audit, CDC, replication, replay, and rebuild.
+- A consumer can tail by `sequence > N` and never miss an event.
+- Forward-compatibility: new view columns can be derived from existing
+  events by adding a replay path; the event log itself needs no migration.
+- Signed event chains are reachable later by adding a signature column.
+
+Negative:
+
+- Replays cost wall-clock time on large datasets. Snapshots of
+  materialised views (with the highest applied sequence stamped on them)
+  are used to bound replay cost.
+- Schema migrations on materialised views still happen; they just no
+  longer touch the source of truth.
+- Discipline required: any write that bypasses the event log is a bug.
+  Enforced by code review and a runtime invariant check on the
+  materialised tables.
+
+## Implementation notes
+
+- `events` schema (v1):
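+  - `sequence BIGSERIAL PRIMARY KEY` (a bare sequence can skip values on rollback, so the gapless guarantee in Decision 2 also needs serialised allocation, e.g. under an advisory lock)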
+  - `created_at TIMESTAMPTZ NOT NULL DEFAULT now()`
+  - `event_type TEXT NOT NULL`
+  - `subject_kind TEXT NOT NULL` — `package` | `file` | `retention` | `storage` | `system`
+  - `subject_id UUID` — nullable for system-level events
+  - `actor TEXT NOT NULL` — producer or operator identity
+  - `payload BYTEA NOT NULL` — canonical CBOR
+  - `payload_digest BYTEA NOT NULL` — BLAKE3 of `payload`
+- Indexes: `(subject_kind, subject_id)`, `(event_type, sequence)`.
+- Replay tool ships in v1 as a CLI subcommand (`artifactstore replay`).
+- Outbound CDC stream (NATS / Kafka) is its own workplan; v1 only exposes
+  long-poll over `GET /events?since=<sequence>`.
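A minimal sketch of the write path, tailing, and replay described in ADR-0002, using stdlib `sqlite3` (the dev database named in ADR-0005) so that it runs without PostgreSQL. Several things here are assumptions for illustration only: the helper names `record`, `tail`, `rebuild`, and `apply_event`, the one-column `artifact_packages` view, text subject ids instead of UUIDs, opaque UTF-8 payloads instead of canonical CBOR, and SHA-256 standing in for BLAKE3 in `payload_digest`.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE events (
    sequence       INTEGER PRIMARY KEY AUTOINCREMENT,  -- BIGSERIAL in PostgreSQL
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    event_type     TEXT NOT NULL,
    subject_kind   TEXT NOT NULL,
    subject_id     TEXT,
    actor          TEXT NOT NULL,
    payload        BLOB NOT NULL,
    payload_digest BLOB NOT NULL
);
CREATE TABLE artifact_packages (package_id TEXT PRIMARY KEY, name TEXT NOT NULL);
"""


def apply_event(conn, event_type, subject_id, payload):
    """Fold one event into the materialised view (hypothetical projection)."""
    if event_type == "v1.package.created":
        conn.execute(
            "INSERT INTO artifact_packages (package_id, name) VALUES (?, ?)",
            (subject_id, payload.decode("utf-8")),
        )


def record(conn, event_type, subject_kind, subject_id, actor, payload):
    """Append the event and update the view in one transaction."""
    with conn:  # commits on success, rolls back both writes on error
        cur = conn.execute(
            "INSERT INTO events (event_type, subject_kind, subject_id, actor,"
            " payload, payload_digest) VALUES (?, ?, ?, ?, ?, ?)",
            (event_type, subject_kind, subject_id, actor, payload,
             hashlib.sha256(payload).digest()),  # stand-in for BLAKE3
        )
        apply_event(conn, event_type, subject_id, payload)
        return cur.lastrowid


def tail(conn, since):
    """Return events with sequence > since, oldest first."""
    return conn.execute(
        "SELECT sequence, event_type, subject_id, payload FROM events"
        " WHERE sequence > ? ORDER BY sequence",
        (since,),
    ).fetchall()


def rebuild(conn):
    """Drop the view's rows and re-derive them by replaying the whole log."""
    with conn:
        conn.execute("DELETE FROM artifact_packages")
        for _seq, event_type, subject_id, payload in tail(conn, 0):
            apply_event(conn, event_type, subject_id, payload)
```

A consumer that remembers the last `sequence` it processed can call `tail(conn, last_seen)` repeatedly, which is the same shape as the `GET /events?since=<sequence>` long-poll.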
--- a/docs/adr/0003-manifest-canonical-cbor.md
+++ b/docs/adr/0003-manifest-canonical-cbor.md
@@ -0,0 +1,78 @@
+# ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.2)
+
+Status: accepted
+Date: 2026-05-15
+Related: ADR-0001, ADR-0002, ADR-0006, `docs/PLATFORM-AMBITION.md` commitment A4
+
+## Context
+
+Manifests describe a package's identity, contents, retention, and
+provenance. They are the durable, portable, signable summary of a package.
+Three downstream features depend on byte-identical manifest serialisation:
+
+1. Manifest digest (used as the package's content address — ADR-0001).
+2. Signatures (cosign, Sigstore, in-toto, SLSA).
+3. Cross-language / cross-version reproducibility (any client must be
+   able to verify a manifest produced by any other client).
+
+JSON does not guarantee byte-identical output without an explicit
+canonicalisation profile. The candidates are:
+
+- **JCS** (JSON Canonicalization Scheme, RFC 8785) — JSON-shaped, widely
+  available, text-format, signs cleanly.
+- **Canonical CBOR** (RFC 8949 §4.2.2) — binary, smaller, lower overhead
+  to canonicalise, native in cosign / Sigstore tooling, used by COSE.
+- **DAG-CBOR** (IPLD profile) — canonical CBOR plus content-addressing
+  conventions; useful if we later integrate with IPLD/IPFS, but pulls in
+  ecosystem assumptions we don't yet need.
+
+Canonical CBOR wins on size, parser surface, and direct compatibility
+with the tooling we will adopt for signing (`docs/PLATFORM-AMBITION.md`
+commitments A4, A9). JCS is a reasonable alternative; we keep an emit-JCS
+path for human-readable display, but the signed form is CBOR.
+
+## Decision
+
+1. Manifests are serialised as **canonical CBOR** per RFC 8949 §4.2.2:
+   - definite-length encoding throughout,
+   - shortest-form integer encoding,
+   - map keys sorted bytewise lexicographically,
+   - no floating-point unless explicitly required (we do not require it),
+   - no semantic tags except those we explicitly enumerate.
+2. The manifest's content address is `blake3:<hex>` of its canonical
+   CBOR bytes. This is the package's primary identifier in storage.
+3. A canonical JSON projection (JCS) of the same manifest is available
+   for display, signing-tool interop, and human inspection. The
+   projection is deterministic: round-tripping through it must yield
+   byte-identical CBOR.
+4. The manifest schema is itself versioned (`manifest_version: 1`).
+   Unknown fields are preserved on read and re-emitted on write (forward
+   compatibility); breaking schema changes bump the version.
+
+## Consequences
+
+Positive:
+
+- Manifests are signable today by any tool that signs opaque bytes or
+  CBOR (`cosign sign-blob`, `ssh-keygen -Y sign`, COSE libraries).
+- The manifest digest is stable across languages, OS, and compiler.
+- Smaller on disk and on the wire than JSON.
+- Replay (ADR-0002) is unambiguous because event payloads are also CBOR.
+
+Negative:
+
+- Less human-readable in raw form; the CLI must offer a `pretty` projection.
+- One more dependency (a CBOR library). We pin one in ADR-0005.
+- Future schema evolution requires the same canonicalisation discipline.
+  Enforced by a property-based test: any manifest must round-trip
+  CBOR → JCS → CBOR with byte equality.
+
+## Implementation notes
+
+- v1 library: `cbor2` (PyPI; pure-Python with optional C extension).
+  Wrapped behind `artifactstore.manifest.codec` so swapping to a faster
+  impl is transparent.
+- JCS projection: `jcs` (PyPI) or hand-rolled — decision deferred to
+  WP-0001-T003.
+- A `Manifest` value class enforces field order on emit, not just on
+  encode. This catches non-canonical producers at the API boundary.
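The encoding rules listed in ADR-0003 (Decision 1) can be illustrated with a small stdlib-only encoder. This is a sketch for reading alongside the ADR, not a replacement for the `cbor2` codec the ADR selects: it covers only integers, byte and text strings, lists, maps, booleans, and null, and the names `encode` and `_head` are hypothetical.

```python
def _head(major: int, n: int) -> bytes:
    """Encode a major type and its argument in the shortest possible form."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * width):
            return bytes([(major << 5) | info]) + n.to_bytes(width, "big")
    raise OverflowError("integer does not fit in 64 bits")


def encode(value: object) -> bytes:
    """Deterministically encode a restricted set of Python values as CBOR."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # must precede int: bool is an int subclass
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):  # definite-length array
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # Sort entries by the bytewise order of their encoded keys.
        pairs = sorted((encode(k), encode(v)) for k, v in value.items())
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    # Floats and tags are rejected, matching the ADR's "no floating-point".
    raise TypeError(f"unsupported type for canonical encoding: {type(value).__name__}")


if __name__ == "__main__":
    a = encode({"manifest_version": 1, "layers": []})
    b = encode({"layers": [], "manifest_version": 1})
    print(a == b, a.hex())  # insertion order does not affect the bytes
```

Because the map entries are sorted on their encoded keys, two producers that build the same manifest in different field orders emit the same bytes, which is what makes the manifest digest stable.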
--- a/docs/adr/0004-control-plane-data-plane-contract.md
+++ b/docs/adr/0004-control-plane-data-plane-contract.md
@@ -0,0 +1,79 @@
+# ADR-0004 — Control Plane / Data Plane Contract
+
+Status: accepted
+Date: 2026-05-15
+Related: ADR-0005, `docs/PLATFORM-AMBITION.md` commitment A5,
+`docs/ASSEMBLY-EXPERIMENT.md`
+
+## Context
+
+The platform ambition expects a Rust (eventually asm-tuned) data plane
+to handle hot ingest paths — hashing, chunking, optional compression and
+encryption, storage backend I/O. The v1 service is written entirely in
+Python (ADR-0005). The cost of conflating control and data planes at the
+code level is that extracting the data plane later requires API churn,
+test rework, and producer migrations.
+
+The cost of separating them now is one named module boundary and one
+in-process protocol shape. That cost is essentially free if taken
+before any consumer exists.
+
+## Decision
+
+1. The Python package is organised so that *every byte-handling
+   operation* lives behind a named contract:
+   - `artifactstore.dataplane.spi` — the abstract surface (typed
+     dataclasses, async iterator protocols).
+   - `artifactstore.dataplane.inproc` — the v1 implementation, running
+     in the same process as the control plane.
+2. The control plane (`artifactstore.registry`, `artifactstore.api.http`,
+   `artifactstore.retention`, `artifactstore.audit`) interacts with
+   bytes *only* through the SPI. No HTTP handler, no DB writer, no
+   retention rule ever reads or writes file bytes directly.
+3. The SPI exposes exactly these operations:
+   - `ingest_stream(stream, hints) -> IngestResult` — consumes an
+     upload, returns content addresses, sizes, and storage receipts.
+   - `serve_object(content_address, range?) -> AsyncIterator[bytes]` —
+     produces bytes for a download.
+   - `verify_object(content_address) -> VerifyResult` — re-reads bytes,
+     re-digests, returns mismatches.
+   - `delete_object(content_address) -> DeletionResult` — best-effort,
+     idempotent.
+   - `backend_health() -> BackendStatus` — readiness, latency, free
+     capacity.
+4. The SPI surface is the contract a future Rust daemon must satisfy.
+   When that daemon ships, `artifactstore.dataplane.inproc` is replaced
+   by `artifactstore.dataplane.remote` (a thin gRPC or
+   framed-bincode-over-Unix-socket client). The control plane sees no
+   change.
+5. SPI parameter and return types are CBOR-serialisable today, even when
+   nothing serialises them. This lets us toggle to RPC without rewriting
+   types.
+
+## Consequences
+
+Positive:
+
+- The data plane can be rewritten in Rust later with zero API churn.
+- Tests can fake the SPI cheaply; integration tests pin the contract.
+- The CLI in `artifactstore.cli` is a second consumer of the SPI on
+  equal footing with the HTTP server.
+- Operators with strong embedding requirements can use the in-process
+  data plane forever; nothing forces the RPC hop.
+
+Negative:
+
+- One extra abstraction layer in v1. Mitigated by the contract being
+  narrow (five operations).
+- Discipline required: PRs that bypass the SPI are rejected. A linter
+  rule (forbidden import: `artifactstore.api.* -> filesystem`) makes
+  this mechanical.
+
+## Implementation notes
+
+- The SPI is a `Protocol` (typing.Protocol) in `dataplane/spi.py` so the
+  in-process and future remote impls don't share an inheritance tree.
+- Streaming returns `AsyncIterator[bytes]` so neither full-file buffering
+  nor `sendfile()` zero-copy is foreclosed.
+- The `IngestResult` payload is the canonical CBOR-able value used in
+  events (ADR-0002). The same byte sequence flows API → SPI → event.
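A minimal sketch of the five-operation SPI from ADR-0004 as a `typing.Protocol`, together with an in-memory fake of the kind the ADR expects tests to use. The operation names and result type names come from the ADR; every field on the result types, the `byte_range` parameter name, the `InMemoryDataPlane` class, the `roundtrip` helper, and the use of SHA-256 as the content address are assumptions made for illustration.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import AsyncIterator, Optional, Protocol


@dataclass(frozen=True)
class IngestResult:
    content_address: str
    size: int

@dataclass(frozen=True)
class VerifyResult:
    content_address: str
    ok: bool

@dataclass(frozen=True)
class DeletionResult:
    content_address: str
    deleted: bool

@dataclass(frozen=True)
class BackendStatus:
    ready: bool


class DataPlane(Protocol):
    """Structural contract; implementations need no common base class."""
    async def ingest_stream(self, stream: AsyncIterator[bytes], hints: dict) -> IngestResult: ...
    def serve_object(self, content_address: str, byte_range: Optional[tuple] = None) -> AsyncIterator[bytes]: ...
    async def verify_object(self, content_address: str) -> VerifyResult: ...
    async def delete_object(self, content_address: str) -> DeletionResult: ...
    async def backend_health(self) -> BackendStatus: ...


class InMemoryDataPlane:
    """Test fake: keeps objects in a dict keyed by content address."""
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    async def ingest_stream(self, stream, hints):
        hasher, chunks = hashlib.sha256(), []
        async for chunk in stream:
            hasher.update(chunk)
            chunks.append(chunk)
        data = b"".join(chunks)
        address = f"sha256:{hasher.hexdigest()}"
        self._objects[address] = data
        return IngestResult(address, len(data))

    async def serve_object(self, content_address, byte_range=None):
        data = self._objects[content_address]
        if byte_range is not None:
            data = data[byte_range[0]:byte_range[1]]
        for offset in range(0, len(data), 65536):
            yield data[offset:offset + 65536]

    async def verify_object(self, content_address):
        actual = "sha256:" + hashlib.sha256(self._objects[content_address]).hexdigest()
        return VerifyResult(content_address, actual == content_address)

    async def delete_object(self, content_address):
        existed = self._objects.pop(content_address, None) is not None
        return DeletionResult(content_address, existed)  # safe to repeat

    async def backend_health(self):
        return BackendStatus(ready=True)


async def _chunks(parts):
    for part in parts:
        yield part


async def roundtrip(plane: DataPlane, parts: list):
    """Control-plane style caller: touches bytes only through the SPI."""
    result = await plane.ingest_stream(_chunks(parts), {})
    body = b"".join([c async for c in plane.serve_object(result.content_address)])
    verdict = await plane.verify_object(result.content_address)
    return result, body, verdict
```

`roundtrip` is typed against `DataPlane` only, so swapping the fake for an in-process or remote implementation would not change the caller.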
--- a/docs/adr/0005-v1-tech-stack.md
+++ b/docs/adr/0005-v1-tech-stack.md
@@ -0,0 +1,117 @@
+# ADR-0005 — V1 Technology Stack
+
+Status: accepted
+Date: 2026-05-15
+Related: ADR-0001, ADR-0002, ADR-0003, ADR-0004
+
+## Context
+
+WP-0001 ("Foundation") cannot start without a pinned stack. The decision
+needs to balance:
+
+- ffmpeg / VLC philosophy: minimal dependency budget, sharp boundaries,
+  native code at the hot edges, plain tools.
+- Python is already implied by `.gitignore` and ecosystem fit (StateHub,
+  guide-board, open-cmis-tck are all Python-leaning).
+- The data plane will eventually be Rust (ADR-0004); the control plane
+  stays in Python and must stay approachable.
+
+## Decision
+
+| Concern | Choice | Rationale |
+|---|---|---|
+| Language (control plane) | **Python 3.12+** | Async ecosystem, type hints, matches sibling repos. 3.12 specifically: PEP 695 generics, faster CPython, `sys.monitoring`. |
+| Package / project manager | **uv** | Single static binary, fast resolver, lockfile-first, replaces `pip + pip-tools + venv + pipx` in one tool. |
+| Build backend | **hatchling** (via `pyproject.toml`) | Standards-track PEP 517 backend. No magic. |
+| HTTP framework | **FastAPI** (Starlette + Pydantic v2) | OpenAPI generation, async-native, broad community. |
+| ASGI server | **uvicorn** (dev), **gunicorn + uvicorn workers** (prod) | Plain, well-understood. |
+| Database (prod) | **PostgreSQL 16+** | Source-of-truth event log (ADR-0002) wants `BIGSERIAL`, `BYTEA`, advisory locks, logical replication. |
+| Database (dev/embedded) | **SQLite (WAL mode)** | Zero-dependency local. Schema is portable when we use SQLAlchemy Core. |
+| DB access | **SQLAlchemy 2.0 Core** + **asyncpg** (prod) / **aiosqlite** (dev) | Core, not ORM — explicit SQL, async drivers. Migrations live below the API surface. |
+| Migrations | **Alembic** | Standard, integrates with SQLAlchemy Core, supports both pg and sqlite. |
+| Hashing | stdlib **`hashlib`** for SHA-256, **`blake3`** PyPI wheel for BLAKE3 | `blake3` wheel embeds the SIMD-tuned Rust impl with no build-time toolchain. |
+| Serialisation | **`cbor2`** for canonical CBOR (ADR-0003); JCS via stdlib `json` or the `jcs` PyPI package | Smallest deps that satisfy ADR-0003. |
+| CLI | **typer** (atop click) | Shares FastAPI's type-hint-driven style; type-driven CLI surface. |
+| Tests | **pytest** + **httpx** + **pytest-asyncio** (asyncio only, no trio) | Standard. |
+| Lint / format | **ruff** (lint + format) | One tool replaces black + isort + flake8 + pyupgrade. |
+| Type checker | **mypy** in `--strict` | Pyright is acceptable for editor support; CI gate is mypy. |
+| Logging | stdlib `logging` + `structlog` for structured output | No exotic deps. |
+| Metrics / tracing | OpenTelemetry SDK (deferred to its own workplan) | Listed for forward-compatibility; not a v1 dep. |
+
+### Project layout
+
+```
+artifact-store/
+├── pyproject.toml
+├── uv.lock
+├── Makefile                              # thin shim: make dev / test / lint / type / migrate
+├── alembic.ini
+├── src/
+│   └── artifactstore/
+│       ├── __init__.py
+│       ├── identity/                     # content address, digest abstraction (ADR-0001)
+│       ├── manifest/                     # canonical CBOR, JCS projection (ADR-0003)
+│       ├── events/                       # append-only log + replayer (ADR-0002)
+│       ├── retention/                    # policy engine
+│       ├── audit/                        # audit emission as event subset
+│       ├── storage/                      # adapter SPI + backend registry
+│       │   ├── spi.py
+│       │   └── backends/
+│       │       ├── local.py              # filesystem backend
+│       │       └── s3.py                 # placeholder, WP-0004
+│       ├── dataplane/                    # SPI + in-process impl (ADR-0004)
+│       │   ├── spi.py
+│       │   └── inproc.py
+│       ├── registry/                     # high-level orchestrator
+│       ├── api/
+│       │   └── http/                     # FastAPI app
+│       ├── cli/                          # typer CLI (thin)
+│       └── config.py
+├── tests/
+│   ├── unit/
+│   ├── integration/
+│   └── conftest.py
+├── migrations/                           # alembic
+└── docs/
+```
+
+### Commands (T001 acceptance)
+
+```
+make dev        # uvicorn with reload, sqlite backend, local FS storage
+make test       # pytest -q
+make lint       # ruff check + ruff format --check
+make type       # mypy --strict src tests
+make migrate    # alembic upgrade head
+artifactstore   # CLI entry point installed by uv
+```
+
+## Consequences
+
+Positive:
+
+- Dependency budget is small and each dep is best-in-class for its slot.
+- The same toolchain works on Linux, macOS, and CI without special cases.
+- `uv.lock` is checked in; builds are reproducible.
+- Every layer maps one-to-one to a docs concept (identity, manifest,
+  events, dataplane, etc.), so the codebase remains navigable.
+
+Negative:
+
+- Pydantic v2 is the heaviest non-DB dep; acceptable for the OpenAPI win.
+- Choosing SQLAlchemy Core over ORM costs some convenience; we accept
+  it because explicit SQL is easier to migrate to Rust later (ADR-0004).
+- mypy `--strict` is a per-PR tax; bounded by keeping the codebase small.
+
+## Revision policy
+
+This ADR is the most likely candidate for revision once we have profile
+data from real ingestion. Candidates we are already watching:
+
+- Replace `cbor2` with a Rust-backed CBOR codec if profile shows it on
+  the hot path.
+- Replace `uvicorn` with `granian` (Rust ASGI server) if perf demands.
+- Replace `SQLAlchemy Core` with raw `asyncpg` + a tiny query builder
+  if Core's abstractions show up in flame graphs.
+
+Each replacement is its own ADR. None of them are v1 work.
--- a/docs/adr/0006-oci-compatibility-reachable.md
+++ b/docs/adr/0006-oci-compatibility-reachable.md
@@ -0,0 +1,69 @@
+# ADR-0006 — OCI Artifact Compatibility Kept Reachable
+
+Status: accepted
+Date: 2026-05-15
+Related: ADR-0001, ADR-0003, `docs/PLATFORM-AMBITION.md` commitment A9
+
+## Context
+
+The OCI Distribution Specification and the OCI Artifact Manifest define
+a widely deployed wire format for content-addressed artifact exchange.
+The ecosystem includes `oras`, `cosign`, `crane`, Helm, ChartMuseum,
+ML-model packaging tools, and most container registries. Compatibility
+with this ecosystem is the single highest-leverage opportunity in
+`docs/PLATFORM-AMBITION.md`.
+
+We do not implement OCI compatibility in v1. We do refuse to take any
+v1 decision that prevents it.
+
+## Decision
+
+1. The internal data model is structurally compatible with an OCI
+   artifact manifest. Concretely:
+   - Storage addresses content as `<algorithm>:<lowercase-hex>`
+     (ADR-0001). OCI requires exactly this shape.
+   - Manifests have a `config` blob plus an ordered list of `layers`,
+     each with `mediaType`, `digest`, `size`, and optional
+     `annotations`. Our `Manifest` value class includes all of these
+     fields, even when v1 has no use for `mediaType` or `annotations`.
+   - Manifest serialisation produces byte-identical output across
+     callers (ADR-0003). OCI requires this for the manifest digest.
+2. The native API may be richer than OCI, but v1 reviews every schema
+   change against the OCI spec and rejects changes that would block
+   later OCI compatibility.
+3. A future `/v2/` namespace will speak the OCI Distribution Spec on
+   top of the same storage. This is its own workplan; it does not
+   modify v1 endpoints, only add new ones.
+
+## Consequences
+
+Positive:
+
+- `oras push`, `cosign sign`, `crane copy`, and `helm pull` become
+  reachable additions, not rewrites.
+- Customers who already speak OCI can adopt incrementally.
+- The `mediaType` discipline forces v1 producers to label their files,
+  which improves the manifest's value as a portable record.
+
+Negative:
+
+- v1 carries some otherwise-unnecessary manifest fields. Acceptable;
+  the cost is bytes, not complexity.
+- The OCI manifest model uses SHA-256 as the canonical digest in
+  practice. ADR-0001's `digest_sha256` column satisfies this; the
+  native primary digest can still be BLAKE3.
+
+## What this ADR does NOT commit to
+
+- It does not commit to implementing OCI Distribution in v1.
+- It does not commit to OCI as the *only* wire format. The native API
+  remains the richer interface.
+- It does not commit to specific OCI media types for evidence packages.
+  Media-type assignment is the subject of a later workplan.
+
+## Review trigger
+
+Every schema-affecting workplan (anything that touches the data model
+or the manifest shape) must include an explicit one-paragraph review
+against this ADR. Reject changes that introduce OCI-incompatible
+invariants without superseding this ADR.
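A minimal sketch of how the `Manifest` shape described in ADR-0006 (Decision 1) could project onto an OCI-style manifest document. The field names `mediaType`, `digest`, `size`, `annotations`, `config`, and `layers` come from the ADR; the `Descriptor` class, the `to_oci` methods, the digest regular expression, and the `application/vnd.example.*` media types are hypothetical, since the ADR explicitly defers media-type assignment.

```python
import hashlib
import re
from dataclasses import dataclass, field

# <algorithm>:<lowercase-hex>, the content-address shape from ADR-0001.
DIGEST_RE = re.compile(r"[a-z0-9]+:[0-9a-f]+")


@dataclass(frozen=True)
class Descriptor:
    media_type: str
    digest: str
    size: int
    annotations: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not DIGEST_RE.fullmatch(self.digest):
            raise ValueError(f"not a content address: {self.digest!r}")

    def to_oci(self) -> dict:
        out = {"mediaType": self.media_type, "digest": self.digest, "size": self.size}
        if self.annotations:  # optional field: omitted when empty
            out["annotations"] = dict(self.annotations)
        return out


@dataclass(frozen=True)
class Manifest:
    config: Descriptor
    layers: tuple  # ordered Descriptor entries

    def to_oci(self) -> dict:
        return {
            "schemaVersion": 2,
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "config": self.config.to_oci(),
            "layers": [layer.to_oci() for layer in self.layers],
        }


def sha256_address(data: bytes) -> str:
    """Interop digest (the digest_sha256 column), formatted as an address."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    report = b"evidence bytes"
    manifest = Manifest(
        config=Descriptor("application/vnd.example.evidence.config.v1+json",
                          sha256_address(b"{}"), 2),
        layers=(Descriptor("application/vnd.example.evidence.file.v1",
                           sha256_address(report), len(report),
                           {"org.opencontainers.image.title": "report.bin"}),),
    )
    print(manifest.to_oci())
```

The projection uses the SHA-256 digest because, as the ADR notes, that is what OCI tooling expects in practice; the native primary digest can remain BLAKE3 without affecting this view.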
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -0,0 +1,32 @@
+# Architecture Decision Records
+
+This directory holds the architectural decisions that govern `artifact-store`.
+Each ADR is a small Markdown file with a status (`proposed`, `accepted`,
+`superseded`, `deprecated`), a concise statement of the decision, the
+forces that pushed it, and the consequences.
+
+ADRs are the canonical home for "we are doing X" statements that survive
+multiple workplans. `INTENT.md` says what we build; `SCOPE.md` says where
+the boundary is; `docs/PLATFORM-AMBITION.md` says where we are pointed;
+ADRs say how — and they are the only document that records a *changeable*
+decision in a form that can be superseded cleanly.
+
+Workplans cite the ADRs they depend on. The architecture blueprint cites
+the ADRs it operationalises.
+
+## Index
+
+- [ADR-0001 — Content-Addressed Storage with Dual Digest](0001-content-addressed-storage.md) — accepted
+- [ADR-0002 — Append-Only Event Log as Source of Truth](0002-event-log-source-of-truth.md) — accepted
+- [ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.2)](0003-manifest-canonical-cbor.md) — accepted
+- [ADR-0004 — Control Plane / Data Plane Contract](0004-control-plane-data-plane-contract.md) — accepted
+- [ADR-0005 — V1 Technology Stack](0005-v1-tech-stack.md) — accepted
+- [ADR-0006 — OCI Artifact Compatibility Kept Reachable](0006-oci-compatibility-reachable.md) — accepted
+
+## Conventions
+
+- Filenames: `NNNN-kebab-case-slug.md`, numbered in acceptance order.
+- Status transitions: `proposed → accepted → (superseded | deprecated)`.
+- Supersession is explicit: the new ADR links the old; the old ADR links
+  forward and changes status. Never delete an ADR.
+- Each ADR is short. If it is long, it is wrong: split it.