ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.2)
Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0006, docs/PLATFORM-AMBITION.md commitment A4
Context
Manifests describe a package's identity, contents, retention, and provenance. They are the durable, portable, signable summary of a package. Three downstream features depend on byte-identical manifest serialisation:
- Manifest digest (used as the package's content address — ADR-0001).
- Signatures (cosign, Sigstore, in-toto, SLSA).
- Cross-language / cross-version reproducibility (any client must be able to verify a manifest produced by any other client).
JSON does not guarantee byte-identical output without an explicit canonicalisation profile. The candidates are:
- JCS (JSON Canonicalization Scheme, RFC 8785) — JSON-shaped, widely available, text-format, signs cleanly.
- Canonical CBOR (RFC 8949 §4.2.2) — binary, smaller, lower overhead to canonicalise, native in cosign / Sigstore tooling, used by COSE.
- DAG-CBOR (IPLD profile) — canonical CBOR plus content-addressing conventions; useful if we later integrate with IPLD/IPFS, but pulls in ecosystem assumptions we don't yet need.
Canonical CBOR wins on size, parser surface, and direct compatibility with the tooling we will adopt for signing (platform commitments A4, A9). JCS is a reasonable alternative; we keep an emit-JCS path for human-readable display, but the signed form is CBOR.
Decision
- Manifests are serialised as canonical CBOR per RFC 8949 §4.2.2:
  - definite-length encoding throughout,
  - shortest-form integer encoding,
  - map keys sorted bytewise lexicographically,
  - no floating-point unless explicitly required (we do not require it),
  - no semantic tags except those we explicitly enumerate.
- The manifest's content address is `blake3:<hex>` of its canonical CBOR bytes. This is the package's primary identifier in storage.
- A canonical JSON projection (JCS) of the same manifest is available for display, signing-tool interop, and human inspection. The projection is deterministic: round-tripping through it must yield byte-identical CBOR.
- The manifest schema is itself versioned (`manifest_version: 1`). Unknown fields are preserved on read and re-emitted on write (forward compatibility); breaking schema changes bump the version.
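The encoding rules in this decision can be illustrated with a small hand-written encoder. This is a sketch for reading, not the project's codec (v1 wraps `cbor2`). The names `encode_canonical` and `_head`, and the sample fields other than `manifest_version`, are made up for illustration. It covers only integers, byte and text strings, arrays, maps, booleans, and null, and rejects everything else, including floats.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR head: major type plus the shortest-form argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("integer does not fit in 64 bits")


def encode_canonical(value) -> bytes:
    """Hypothetical helper: deterministic CBOR for a small subset of types."""
    # Simple values first: bool is a subclass of int, so test identity early.
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for non-negative, major type 1 encodes -1 - n.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        # Definite-length array: the element count goes in the head.
        return _head(4, len(value)) + b"".join(encode_canonical(v) for v in value)
    if isinstance(value, dict):
        # Sort entries by the bytewise order of their encoded keys.
        items = sorted((encode_canonical(k), encode_canonical(v))
                       for k, v in value.items())
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    # Floats, tags, and anything else are rejected rather than guessed at.
    raise TypeError(f"type not allowed in this sketch: {type(value).__name__}")


# Insertion order of the source dict must not affect the bytes.
manifest = {"manifest_version": 1, "name": "demo", "contents": []}
shuffled = {"contents": [], "name": "demo", "manifest_version": 1}
assert encode_canonical(manifest) == encode_canonical(shuffled)
```

Hashing these bytes is what would produce the `blake3:<hex>` address; the hash step is left out here to keep the sketch free of third-party dependencies.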
Consequences
Positive:
- Manifests are signable today by any tool that signs opaque bytes or consumes CBOR (cosign, `ssh-keygen -Y sign`, COSE libraries).
- The manifest digest is stable across languages, operating systems, and compilers.
- Smaller on disk and on the wire than JSON.
- Replay (ADR-0002) is unambiguous because event payloads are also CBOR.
Negative:
- Less human-readable in raw form; the CLI must offer a `pretty` projection.
- One more dependency (a CBOR library). We pin one in ADR-0005.
- Future schema evolution requires the same canonicalisation discipline. This is enforced by a property-based test: any manifest must round-trip CBOR → JCS → CBOR with byte equality.
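A rough, dependency-free sketch of that round-trip property follows. Several things here are assumptions rather than the project's test: the real suite would presumably use a property-testing library and the pinned codec, `to_jcs` only approximates RFC 8785 (it is adequate for ASCII keys and 53-bit-safe integers, not a full implementation), and the check starts from the in-memory value because the sketch has no CBOR decoder. All function names are hypothetical, and the cut-down encoder is included only so the block runs standalone.

```python
import json
import random
import string
import struct


def _head(major: int, n: int) -> bytes:
    """Shortest-form CBOR head for a major type and argument."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("integer does not fit in 64 bits")


def encode_cbor(v) -> bytes:
    """Cut-down deterministic encoder for JSON-compatible values only."""
    if v is None or isinstance(v, bool):
        return {None: b"\xf6", True: b"\xf5", False: b"\xf4"}[v]
    if isinstance(v, int):
        return _head(0, v) if v >= 0 else _head(1, -1 - v)
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(v, list):
        return _head(4, len(v)) + b"".join(map(encode_cbor, v))
    if isinstance(v, dict):
        items = sorted((encode_cbor(k), encode_cbor(x)) for k, x in v.items())
        return _head(5, len(items)) + b"".join(k + x for k, x in items)
    raise TypeError(f"not representable in this sketch: {type(v).__name__}")


def to_jcs(v) -> str:
    """JCS-like projection: sorted keys, no insignificant whitespace."""
    return json.dumps(v, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def random_value(rng: random.Random, depth: int = 0):
    """Random JSON-compatible value with ASCII text and 53-bit-safe ints."""
    def text() -> str:
        return "".join(rng.choices(string.ascii_lowercase, k=rng.randint(0, 30)))

    leaves = [lambda: None, lambda: rng.random() < 0.5,
              lambda: rng.randint(-2**53 + 1, 2**53 - 1), text]
    if depth >= 3 or rng.random() < 0.4:
        return rng.choice(leaves)()
    if rng.random() < 0.5:
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {text(): random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))}


def round_trips(manifest) -> bool:
    """Re-encoding the parsed JSON projection must give identical CBOR bytes."""
    original = encode_cbor(manifest)
    reparsed = json.loads(to_jcs(manifest))
    return encode_cbor(reparsed) == original


rng = random.Random(0)  # fixed seed so failures are reproducible
samples = [{"manifest_version": 1, "body": random_value(rng)} for _ in range(200)]
assert all(round_trips(m) for m in samples)
```

A value that the JSON projection cannot carry losslessly, such as a float, fails at the encoder instead of silently changing the bytes, which is the behaviour the "no floating-point" rule relies on.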
Implementation notes
- v1 library: `cbor2` (PyPI; pure-Python with optional C extension). Wrapped behind `artifactstore.manifest.codec` so swapping to a faster implementation is transparent.
- JCS projection: `jcs` (PyPI) or hand-rolled; the decision is deferred to WP-0001-T003.
- A `Manifest` value class enforces field order on emit, not just on encode. This catches non-canonical producers at the API boundary.
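One possible shape for such a value class is sketched below, combining the emit-time ordering with the unknown-field preservation from the Decision section. Everything beyond `manifest_version` is an assumption: the `name` and `contents` fields, the `extra` bucket, and the method names are hypothetical, and the real schema is not defined in this ADR. The sort key relies on the fact that, for text-string keys, bytewise order of the encoded key is the same as ordering by UTF-8 length and then by bytes.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping

SUPPORTED_VERSION = 1
_KNOWN = ("manifest_version", "name", "contents")  # hypothetical v1 fields


def _canonical_key(key: str) -> tuple[int, bytes]:
    """Sort key matching the bytewise order of CBOR-encoded text keys."""
    raw = key.encode("utf-8")
    return len(raw), raw


@dataclass(frozen=True)
class Manifest:
    name: str                       # hypothetical field
    contents: tuple[str, ...]       # hypothetical: content addresses
    manifest_version: int = SUPPORTED_VERSION
    extra: Mapping[str, Any] = field(default_factory=dict)  # unknown fields

    def __post_init__(self) -> None:
        # Reject producers that smuggle a known field in through `extra`.
        clash = set(self.extra) & set(_KNOWN)
        if clash:
            raise ValueError(f"known fields passed as extra: {sorted(clash)}")

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "Manifest":
        version = data.get("manifest_version")
        if type(version) is not int or version != SUPPORTED_VERSION:
            raise ValueError(f"unsupported manifest_version: {version!r}")
        # Anything this version does not know about is kept, not dropped.
        extra = {k: v for k, v in data.items() if k not in _KNOWN}
        return cls(name=data["name"], contents=tuple(data["contents"]), extra=extra)

    def to_mapping(self) -> dict[str, Any]:
        """Emit all fields, known and unknown, in canonical key order."""
        merged = {"manifest_version": self.manifest_version, "name": self.name,
                  "contents": list(self.contents), **self.extra}
        return {k: merged[k] for k in sorted(merged, key=_canonical_key)}


# A field from a hypothetical future schema survives a read/write cycle.
source = {"name": "demo", "x_future": {"k": 1},
          "contents": ["blake3:<hex>"], "manifest_version": 1}
emitted = Manifest.from_mapping(source).to_mapping()
assert emitted == source
```

Because `to_mapping` already yields keys in canonical order, a codec that writes map entries in iteration order would still produce canonical bytes, which is one reading of "on emit, not just on encode".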