artifact-store/docs/adr/0003-manifest-canonical-cbor.md
tegwick 747afc27a6 docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans
Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00


# ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.1)
Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0006, `docs/PLATFORM-AMBITION.md` commitment A4
## Context
Manifests describe a package's identity, contents, retention, and
provenance. They are the durable, portable, signable summary of a package.
Three downstream features depend on byte-identical manifest serialisation:
1. Manifest digest (used as the package's content address — ADR-0001).
2. Signatures (cosign, Sigstore, in-toto, SLSA).
3. Cross-language / cross-version reproducibility (any client must be
able to verify a manifest produced by any other client).
JSON does not guarantee byte-identical output without an explicit
canonicalisation profile. The candidates are:
- **JCS** (JSON Canonicalization Scheme, RFC 8785) — JSON-shaped, widely
available, text-format, signs cleanly.
- **Canonical CBOR** (RFC 8949 §4.2.1) — binary, smaller, lower overhead
to canonicalise, the native encoding of COSE, and signable as an opaque
blob by cosign / Sigstore tooling.
- **DAG-CBOR** (IPLD profile) — canonical CBOR plus content-addressing
conventions; useful if we later integrate with IPLD/IPFS, but pulls in
ecosystem assumptions we don't yet need.
Canonical CBOR wins on size, parser surface, and direct compatibility
with the tooling we will adopt for signing (ADR commitments A4, A9). JCS
is a reasonable alternative; we keep an emit-JCS path for human-readable
display but the signed form is CBOR.
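
The point about JSON can be made concrete with a short sketch. It uses only the Python standard library; the field names are hypothetical and not taken from the manifest schema. Note that `sort_keys` plus compact separators is only an ad-hoc profile for illustration, not a JCS (RFC 8785) implementation, which also fixes number formatting and key-ordering rules.

```python
import json

# Two logically equal manifests whose keys were inserted in a different order.
# The field names are hypothetical.
manifest_a = {"name": "pkg", "size": 10}
manifest_b = {"size": 10, "name": "pkg"}

# Default serialisation follows insertion order, so the bytes differ
# even though the two values compare equal.
default_a = json.dumps(manifest_a).encode("utf-8")
default_b = json.dumps(manifest_b).encode("utf-8")

# Pinning key order and separators makes the bytes agree. This is an
# ad-hoc profile for illustration only, not JCS.
pinned_a = json.dumps(manifest_a, sort_keys=True, separators=(",", ":")).encode("utf-8")
pinned_b = json.dumps(manifest_b, sort_keys=True, separators=(",", ":")).encode("utf-8")
```

Any digest or signature computed over `default_a` would fail to verify against `default_b`, which is why a canonicalisation profile has to be chosen explicitly.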
## Decision
1. Manifests are serialised as **canonical CBOR** per the core
   deterministic encoding requirements of RFC 8949 §4.2.1, plus the
   application-level choices that §4.2.2 leaves open:
   - definite-length encoding throughout,
   - shortest-form encoding of integers and lengths,
   - map keys sorted in bytewise lexicographic order of their encoded form,
   - no floating-point unless explicitly required (we do not require it),
   - no semantic tags except those we explicitly enumerate.
2. The manifest's content address is `blake3:<hex>` of its canonical
CBOR bytes. This is the package's primary identifier in storage.
3. A canonical JSON projection (JCS) of the same manifest is available
for display, signing-tool interop, and human inspection. The
projection is deterministic: round-tripping through it must yield
byte-identical CBOR.
4. The manifest schema is itself versioned (`manifest_version: 1`).
Unknown fields are preserved on read and re-emitted on write (forward
compatibility); breaking schema changes bump the version.
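
The encoding rules in item 1 can be sketched as a small standard-library encoder. This is an illustration of the rules only, not the project's codec: the ADR pins `cbor2` for v1, and the function names `_head` and `encode` are hypothetical. The sketch covers a subset of types and rejects floats and tags outright.

```python
import struct


def _head(major: int, value: int) -> bytes:
    """Encode a CBOR head using the shortest argument that fits."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", value)
    if value < 1 << 64:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", value)
    raise ValueError("integer outside the 64-bit range")


def encode(obj) -> bytes:
    """Deterministically encode a subset of Python values as CBOR."""
    # bool is a subclass of int, so test the singletons first.
    if obj is False:
        return b"\xf4"
    if obj is True:
        return b"\xf5"
    if obj is None:
        return b"\xf6"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, (list, tuple)):
        # Definite-length array: the item count goes in the head.
        return _head(4, len(obj)) + b"".join(encode(item) for item in obj)
    if isinstance(obj, dict):
        # Sort entries by the bytes of the encoded key, not by the Python key.
        pairs = sorted(
            ((encode(k), encode(v)) for k, v in obj.items()),
            key=lambda pair: pair[0],
        )
        return _head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    # Floats, tags, and anything else are rejected in this sketch.
    raise TypeError(f"unsupported type for canonical encoding: {type(obj).__name__}")
```

With an encoder like this, `encode({"b": 1, "a": 2})` and `encode({"a": 2, "b": 1})` produce the same bytes. The content address in item 2 would then be the string `blake3:` followed by the hex BLAKE3 digest of those bytes.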
## Consequences
Positive:
- Manifests are signable today by any tool that signs opaque bytes (cosign
`sign-blob`, ssh-keygen `-Y sign`) or consumes CBOR directly (COSE libraries).
- The manifest digest is stable across languages, operating systems, and compilers.
- Smaller on disk and on the wire than JSON.
- Replay (ADR-0002) is unambiguous because event payloads are also CBOR.
Negative:
- Less human-readable in raw form; the CLI must offer a `pretty` projection.
- One more dependency (a CBOR library). We pin one in ADR-0005.
- Future schema evolution requires the same canonicalisation discipline.
This is enforced by a property-based test: any manifest must round-trip
CBOR → JCS → CBOR with byte equality.
## Implementation notes
- v1 library: `cbor2` (PyPI; pure-Python with optional C extension).
Wrapped behind `artifactstore.manifest.codec` so swapping to a faster
impl is transparent.
- JCS projection: `jcs` (PyPI) or hand-rolled — decision deferred to
WP-0001-T003.
- A `Manifest` value class enforces field order on emit, not just on
encode. This catches non-canonical producers at the API boundary.
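
As a rough sketch of how a `Manifest` value class might sit at the API boundary, the example below shows the forward-compatibility rule from Decision item 4: unknown fields are kept on read and re-emitted on write. Apart from `manifest_version`, the field names (`package`, `contents`, `extra`) and the methods `from_mapping` / `to_mapping` are assumptions for illustration, not the schema or API defined by this ADR.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping

# Hypothetical set of fields this version of the code understands.
KNOWN_FIELDS = ("manifest_version", "package", "contents")


@dataclass(frozen=True)
class Manifest:
    manifest_version: int
    package: str
    contents: tuple[str, ...]
    # Fields written by a newer producer that this code does not know about.
    extra: Mapping[str, Any] = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, data: Mapping[str, Any]) -> "Manifest":
        """Build a Manifest from decoded data, keeping unknown fields."""
        if data.get("manifest_version") != 1:
            raise ValueError("unsupported manifest_version")
        extra = {k: v for k, v in data.items() if k not in KNOWN_FIELDS}
        return cls(
            manifest_version=1,
            package=data["package"],
            contents=tuple(data["contents"]),
            extra=extra,
        )

    def to_mapping(self) -> dict[str, Any]:
        """Return the data to hand to the codec, unknown fields included."""
        out: dict[str, Any] = {
            "manifest_version": self.manifest_version,
            "package": self.package,
            "contents": list(self.contents),
        }
        out.update(self.extra)
        return out
```

Because the canonical encoder sorts map keys itself, the order in which `to_mapping` assembles the dictionary does not affect the emitted bytes.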