# ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.2)

Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0006, `docs/PLATFORM-AMBITION.md` commitment A4

## Context

Manifests describe a package's identity, contents, retention, and
provenance. They are the durable, portable, signable summary of a package.
Three downstream features depend on byte-identical manifest serialisation:

1. Manifest digest (used as the package's content address; see ADR-0001).
2. Signatures (cosign, Sigstore, in-toto, SLSA).
3. Cross-language / cross-version reproducibility (any client must be
   able to verify a manifest produced by any other client).

JSON does not guarantee byte-identical output without an explicit
canonicalisation profile. The candidates are:

- **JCS** (JSON Canonicalization Scheme, RFC 8785): JSON-shaped, widely
  available, text-format, signs cleanly.
- **Canonical CBOR** (RFC 8949 §4.2.2): binary, smaller, lower overhead
  to canonicalise, signable as opaque bytes by cosign / Sigstore tooling,
  and the encoding that COSE (RFC 9052) is built on.
- **DAG-CBOR** (IPLD profile): canonical CBOR plus content-addressing
  conventions; useful if we later integrate with IPLD/IPFS, but pulls in
  ecosystem assumptions we don't yet need.

Canonical CBOR wins on size, parser surface, and direct compatibility
with the tooling we will adopt for signing (ADR commitments A4, A9). JCS
is a reasonable alternative; we keep an emit-JCS path for human-readable
display, but the signed form is CBOR.

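To make the JSON problem concrete, here is a minimal Python sketch using only the standard library. The field names are illustrative assumptions, not the manifest schema, and the `stable` helper is only in the spirit of JCS, not an RFC 8785 implementation.

```python
import json

# Two logically identical manifests built in a different key order.
# Field names are hypothetical, chosen only for this illustration.
a = {"name": "pkg", "manifest_version": 1}
b = {"manifest_version": 1, "name": "pkg"}

assert a == b  # same data model

# Default serialisation follows insertion order, so the bytes differ
# and any digest or signature over them would differ too.
raw_a = json.dumps(a).encode("utf-8")
raw_b = json.dumps(b).encode("utf-8")


def stable(obj) -> bytes:
    """Serialise with an explicit profile: sorted keys, fixed separators."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


print(raw_a == raw_b)          # the unprofiled forms disagree
print(stable(a) == stable(b))  # the profiled forms agree
```

The same reasoning applies to any serialisation format: byte equality comes from the profile, not from the format itself.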
## Decision

1. Manifests are serialised as **canonical CBOR** (deterministic encoding
   in RFC 8949 terms). The first three rules below are the core
   requirements of §4.2.1; the last two are the application-level choices
   that §4.2.2 leaves to us:
   - definite-length encoding throughout,
   - shortest-form encoding of integers and length prefixes,
   - map keys sorted bytewise lexicographically by their encoded form,
   - no floating-point unless explicitly required (we do not require it),
   - no semantic tags except those we explicitly enumerate.
2. The manifest's content address is `blake3:<hex>` of its canonical
   CBOR bytes. This is the package's primary identifier in storage.
3. A canonical JSON projection (JCS) of the same manifest is available
   for display, signing-tool interop, and human inspection. The
   projection is deterministic: round-tripping through it must yield
   byte-identical CBOR.
4. The manifest schema is itself versioned (`manifest_version: 1`).
   Unknown fields are preserved on read and re-emitted on write (forward
   compatibility); breaking schema changes bump the version.

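The encoding rules in item 1 and the content address in item 2 can be sketched in a few lines of Python. This is a hand-rolled illustration covering only a subset of CBOR (null, booleans, integers, byte and text strings, arrays, maps), not the project's codec, which is planned to wrap `cbor2`. The names `encode` and `content_address` are hypothetical, and SHA-256 stands in for BLAKE3 only so the sketch runs on the standard library.

```python
import hashlib
import struct


def _head(major: int, arg: int) -> bytes:
    """Encode a CBOR head using the shortest form for its argument."""
    mt = major << 5
    if arg < 24:
        return bytes([mt | arg])
    if arg < 1 << 8:
        return bytes([mt | 24, arg])
    if arg < 1 << 16:
        return bytes([mt | 25]) + struct.pack(">H", arg)
    if arg < 1 << 32:
        return bytes([mt | 26]) + struct.pack(">I", arg)
    if arg < 1 << 64:
        return bytes([mt | 27]) + struct.pack(">Q", arg)
    raise ValueError("integer out of range for this sketch")


def encode(value) -> bytes:
    """Deterministically encode a small subset of the CBOR data model."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # must precede int: bool subclasses int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, float):
        raise TypeError("floats are excluded from manifests")
    if isinstance(value, int):
        # Major type 0 for non-negative, 1 for negative (argument is -1 - n).
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, (list, tuple)):
        # Definite-length array: the count goes in the head.
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        # Sort entries by the encoded bytes of their keys.
        items = sorted(
            ((encode(k), encode(v)) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return _head(5, len(items)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def content_address(manifest, hasher=hashlib.sha256, label="sha256") -> str:
    """Return '<label>:<hex>' over the canonical bytes.

    ASSUMPTION: sha256 is a stdlib stand-in; the ADR specifies blake3.
    """
    return f"{label}:{hasher(encode(manifest)).hexdigest()}"


# Insertion order does not affect the bytes, so the address is stable.
print(encode({"b": 1, "a": [1, 2]}).hex())  # a26161820102616201
print(content_address({"a": 1, "b": 2}) == content_address({"b": 2, "a": 1}))
```

If the `blake3` package is installed, passing `hasher=blake3.blake3, label="blake3"` should yield the `blake3:<hex>` form the decision calls for.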
## Consequences

Positive:

- Manifests are signable today by any tool that signs opaque bytes
  (cosign `sign-blob`, `ssh-keygen -Y sign`) and by COSE libraries.
- The manifest digest is stable across languages, operating systems, and
  compilers.
- Smaller on disk and on the wire than JSON.
- Replay (ADR-0002) is unambiguous because event payloads are also CBOR.

Negative:

- Less human-readable in raw form; the CLI must offer a `pretty` projection.
- One more dependency (a CBOR library). We pin one in ADR-0005.
- Future schema evolution requires the same canonicalisation discipline.
  Enforced by a property-based test: any manifest must round-trip
  CBOR → JCS → CBOR with byte equality.

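The round-trip property named above could be checked along these lines. This is a hand-rolled randomised check on the standard library; a real suite would more likely use a property-testing library and the project's own codec. Everything here is an assumption made for illustration: the `enc`/`dec` pair is a compact stand-in codec for a JSON-compatible subset of CBOR, `to_json` only approximates JCS (sorted keys, compact separators, integers kept within ±2^53), and the generated manifests use invented fields.

```python
import json
import random
import struct


def head(major: int, n: int) -> bytes:
    """CBOR head with the shortest argument encoding."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("argument too large")


def enc(v) -> bytes:
    """Deterministic CBOR for None, bool, int, str, list, str-keyed dict."""
    if v is None or isinstance(v, bool):
        return {None: b"\xf6", False: b"\xf4", True: b"\xf5"}[v]
    if isinstance(v, int):
        return head(0, v) if v >= 0 else head(1, -1 - v)
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(v, list):
        return head(4, len(v)) + b"".join(map(enc, v))
    if isinstance(v, dict):
        pairs = sorted((enc(k), enc(x)) for k, x in v.items())
        return head(5, len(pairs)) + b"".join(k + x for k, x in pairs)
    raise TypeError(type(v).__name__)


def dec(buf: bytes, i: int = 0):
    """Decode one item at offset i; return (value, next_offset)."""
    major, info = buf[i] >> 5, buf[i] & 0x1F
    i += 1
    if major == 7:
        return {20: False, 21: True, 22: None}[info], i
    if info < 24:
        n = info
    else:
        size = 1 << (info - 24)  # 24, 25, 26, 27 mean 1, 2, 4, 8 bytes
        n = int.from_bytes(buf[i:i + size], "big")
        i += size
    if major == 0:
        return n, i
    if major == 1:
        return -1 - n, i
    if major == 3:
        return buf[i:i + n].decode("utf-8"), i + n
    if major == 4:
        items = []
        for _ in range(n):
            item, i = dec(buf, i)
            items.append(item)
        return items, i
    if major == 5:
        out = {}
        for _ in range(n):
            k, i = dec(buf, i)
            out[k], i = dec(buf, i)
        return out, i
    raise ValueError(f"major type {major} is outside this sketch")


def to_json(value) -> str:
    """Approximate JCS: sorted keys, compact separators, raw UTF-8."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def gen(rng: random.Random, depth: int = 0):
    """Generate a random JSON-compatible value, nesting at most 3 levels."""
    kind = rng.choice("nbis" if depth >= 3 else "nbisld")
    if kind == "n":
        return None
    if kind == "b":
        return rng.random() < 0.5
    if kind == "i":
        return rng.randint(-(2**53) + 1, 2**53 - 1)
    if kind == "s":
        return "".join(rng.choice("abé✓ ") for _ in range(rng.randint(0, 30)))
    if kind == "l":
        return [gen(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {f"k{rng.randint(0, 99)}": gen(rng, depth + 1) for _ in range(rng.randint(0, 4))}


def round_trip_holds(trials: int = 200, seed: int = 0) -> bool:
    """Check CBOR -> JSON -> CBOR byte equality on random manifests."""
    rng = random.Random(seed)
    for _ in range(trials):
        first = enc({"manifest_version": 1, "body": gen(rng)})
        value, end = dec(first)
        second = enc(json.loads(to_json(value)))
        if end != len(first) or first != second:
            return False
    return True


print(round_trip_holds())
```

Byte strings are deliberately left out of the generator: they have no native JSON form, so the real projection would need an agreed convention for them before this property can hold.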
## Implementation notes

- v1 library: `cbor2` (PyPI; pure-Python with optional C extension).
  Wrapped behind `artifactstore.manifest.codec` so swapping to a faster
  implementation is transparent.
- JCS projection: `jcs` (PyPI) or hand-rolled; decision deferred to
  WP-0001-T003.
- A `Manifest` value class enforces field order on emit, not just on
  encode. This catches non-canonical producers at the API boundary.
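
One possible reading of the `Manifest` value class, combined with the unknown-field rule from the decision, is sketched below. Apart from `manifest_version`, every name here (`name`, `extra`, `from_mapping`, `to_mapping`) is a hypothetical choice, and refusing versions other than 1 is an assumption about how a v1 reader would behave. For text-string keys, ordering by UTF-8 length and then by bytes gives the same result as ordering by encoded CBOR key, because the length prefix grows with the string length.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping

# Fields this reader understands. "name" is a hypothetical example field.
KNOWN = ("manifest_version", "name")


def _canonical_key(key: str) -> tuple[int, bytes]:
    """Sort key matching canonical CBOR order for text-string map keys."""
    raw = key.encode("utf-8")
    return (len(raw), raw)


@dataclass(frozen=True)
class Manifest:
    manifest_version: int
    name: str
    # Unknown fields are carried along untouched so they survive a rewrite.
    extra: Mapping[str, Any] = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "Manifest":
        version = data["manifest_version"]
        if version != 1:
            # ASSUMPTION: a v1 reader refuses other schema versions.
            raise ValueError(f"unsupported manifest_version: {version!r}")
        extra = {k: v for k, v in data.items() if k not in KNOWN}
        return cls(version, data["name"], MappingProxyType(extra))

    def to_mapping(self) -> dict[str, Any]:
        """Emit all fields, known and unknown, already in canonical order."""
        merged = {
            "manifest_version": self.manifest_version,
            "name": self.name,
            **self.extra,
        }
        return {k: merged[k] for k in sorted(merged, key=_canonical_key)}


m = Manifest.from_mapping({"zeta": [1], "name": "pkg", "manifest_version": 1, "x": 2})
print(list(m.to_mapping()))  # ['x', 'name', 'zeta', 'manifest_version']
```

Because `to_mapping` already yields keys in canonical order, a codec that merely preserves insertion order would still produce canonical bytes, which is one way to catch a non-canonical producer before encoding.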