# Architecture Blueprint

Status: accepted (v2; supersedes 2026-05-15 draft)

Updated: 2026-05-15

This document operationalises `INTENT.md`, the `docs/PLATFORM-AMBITION.md` thesis, and the decisions recorded in `docs/adr/`. Where a tension exists between this blueprint and an ADR, the ADR wins; raise an issue or supersede the ADR.

## Architecture in one paragraph

`artifact-store` is a **library-first** artifact registry and storage gateway. A small core library (`artifactstore`) implements identity, manifests, retention, the storage adapter SPI, the data plane SPI, and the registry orchestrator. The HTTP server and the CLI are thin consumers of that library. Bytes are addressed by content (`blake3:`) and stored through a pluggable adapter SPI. State is authoritative in an append-only event log; queryable tables are materialised views.

## Design lineage

The shape is deliberately borrowed from `ffmpeg` and `VLC`: a tight core of well-named modules with stable contracts, runtime-pluggable backends, a thin orchestration binary, and an explicit hot-path boundary that can be rewritten in faster code without changing the consumer API. See `docs/PLATFORM-AMBITION.md` for the reference table.

## Top-level shape

```text
   producers / operators / agents
                  |
                  v
     +------------------------+
     |   HTTP API   |   CLI   |   <-- thin consumers
     +------------------------+
                  |
                  v
     +------------------------+
     | registry orchestrator  |
     +------------------------+
      |           |           |
      v           v           v
+----------+ +---------+ +---------+
| identity | | events  | |retention|
|/manifest | | (log +  | | policy  |
|          | | views)  | | engine  |
+----------+ +---------+ +---------+
                  |
                  v
      +-----------------------+
      |    data plane SPI     |   <-- ADR-0004 contract
      +-----------------------+
                  |
                  v
      +-----------------------+
      |  storage adapter SPI  |
      +-----------------------+
        |        |        |
        v        v        v
     +-----+ +------+ +-------+
     |local| |  S3  | | Ceph  |   ... future backends
     | FS  | | RGW  | |  RGW  |
     +-----+ +------+ +-------+
```

## Core modules

Mapped one-to-one to ADR-0005's project layout. Each module has a stable public surface; internals are free to evolve.

### `identity`

- `Digest(algorithm, hex)`: value object.
- `ContentAddress`: `:` (ADR-0001).
- `digest_stream(reader) -> {primary, sha256}`: single-pass dual digest.
- Algorithm registry: `blake3` (default primary), `sha256` (always computed).

### `manifest`

- `Manifest`: versioned dataclass (package metadata + ordered file list + retention summary + provenance + storage receipts).
- `manifest.codec.encode(manifest) -> bytes`: canonical CBOR (ADR-0003).
- `manifest.codec.decode(bytes) -> Manifest`.
- `manifest.projection.jcs(manifest) -> bytes`: canonical-JSON projection for display and signing-tool interop.
- Round-trip invariant: `decode(encode(m)) == m` and `encode(decode(jcs_to_cbor(jcs(m)))) == encode(m)`.

### `events`

- `events.write(transaction, event)`: appends one row with monotonic sequence (ADR-0002).
- `events.tail(since_sequence) -> AsyncIterator[Event]`: long-poll.
- `events.replay(into=ViewWriter)`: rebuild materialised views.
- Event types (v1): `v1.package.created`, `v1.file.ingested`, `v1.package.finalized`, `v1.retention.default_applied`, `v1.retention.extended`, `v1.retention.hold_applied`, `v1.retention.hold_released`, `v1.retention.deletion_eligible`, `v1.storage.location_recorded`, `v1.storage.location_verified`, `v1.audit.access`, `v1.system.note`.

### `retention`

- `retention.classes`: `transient`, `raw-evidence`, `summary-evidence`, `release-evidence`, `permanent-record`. Defined as data, not code.
- `retention.policy.apply(package, class) -> RetentionDecision`: computes `expires_at` and the deletion eligibility rule.
- `retention.extend(package, until, reason, actor)`: emits an event; the materialised view updates on commit.
- `retention.hold(package, reason, actor)` / `retention.release_hold(hold_id, actor)`.
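
The `expires_at` computation described for `retention.policy.apply` could look roughly like the sketch below. It is an illustration under stated assumptions, not the library's implementation: the `RetentionClass` and `RetentionDecision` field names and the `apply_policy` helper are hypothetical, the 30-day duration is a placeholder (default durations are not yet decided), and measuring the duration from package creation is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionClass:
    """Retention class expressed as data (hypothetical field names)."""
    class_id: str
    default_duration: Optional[timedelta]  # None means no expiry
    deletion_strategy: str = "mark_eligible"


@dataclass(frozen=True)
class RetentionDecision:
    """Hypothetical result shape for `retention.policy.apply`."""
    effective_class: str
    expires_at: Optional[datetime]
    deletion_strategy: str


def apply_policy(created_at: datetime, cls: RetentionClass) -> RetentionDecision:
    """Derive the expiry from the creation time and the class default."""
    expires_at = None
    if cls.default_duration is not None:
        expires_at = created_at + cls.default_duration
    return RetentionDecision(cls.class_id, expires_at, cls.deletion_strategy)


# Placeholder durations, chosen only for illustration.
CLASSES = {
    "transient": RetentionClass("transient", timedelta(days=30)),
    "permanent-record": RetentionClass("permanent-record", None),
}

created = datetime(2026, 1, 1, tzinfo=timezone.utc)
short_lived = apply_policy(created, CLASSES["transient"])
permanent = apply_policy(created, CLASSES["permanent-record"])
print(short_lived.expires_at)  # 2026-01-31 00:00:00+00:00
print(permanent.expires_at)    # None
```

Keeping the classes in a plain mapping mirrors the "defined as data, not code" rule: changing a duration touches data only, and the policy function stays untouched.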
### `audit`

- A view over `events` filtered to access and lifecycle events. No separate write path; auditing happens by event emission elsewhere.

### `storage` (adapter SPI)

```python
from typing import AsyncIterator, Protocol


class StorageBackend(Protocol):
    backend_id: str

    async def put(self, content_address: ContentAddress, stream: AsyncIterator[bytes], size_hint: int | None) -> StorageReceipt: ...
    async def get(self, content_address: ContentAddress, byte_range: tuple[int, int] | None = None) -> AsyncIterator[bytes]: ...
    async def head(self, content_address: ContentAddress) -> StorageObjectMetadata: ...
    async def delete(self, content_address: ContentAddress) -> DeletionResult: ...
    async def health(self) -> BackendStatus: ...
```

- Backend registry: backends register at import time; selection is per-package by configuration.
- v1 ships `local` (filesystem); `s3` ships in WP-0004.

### `dataplane` (SPI per ADR-0004)

```python
from typing import AsyncIterator, Protocol


class DataPlane(Protocol):
    async def ingest_stream(self, stream: AsyncIterator[bytes], hints: IngestHints) -> IngestResult: ...
    async def serve_object(self, content_address: ContentAddress, byte_range: tuple[int, int] | None = None) -> AsyncIterator[bytes]: ...
    async def verify_object(self, content_address: ContentAddress) -> VerifyResult: ...
    async def delete_object(self, content_address: ContentAddress) -> DeletionResult: ...
    async def backend_health(self) -> BackendStatus: ...
```

- v1 implementation: `dataplane.inproc`, which wraps a `StorageBackend` and computes digests during streaming.
- Future implementation: `dataplane.remote`, a gRPC or framed-bincode-over-Unix-socket client to a Rust daemon.

### `registry`

The orchestrator. Combines `identity + manifest + events + retention + dataplane` into the operations the HTTP API and CLI consume: `create_package`, `ingest_file`, `finalize_package`, `get_manifest`, `download_file`, `extend_retention`, `apply_hold`, `release_hold`, `mark_deletion_eligible`, `tail_events`.
Each operation is one DB transaction that writes one or more events and updates materialised views.

### `api.http` and `cli`

Thin. Their job is to translate transport (HTTP / argv) into calls on `registry`. No business logic.

## Data model

All tables exist as **materialised views over `events`** (ADR-0002), except `events` itself, `retention_classes` (seed data), and `metadata_schemas` (config).

### `events` (source of truth)

| Column | Type | Notes |
|---|---|---|
| `sequence` | `BIGSERIAL PRIMARY KEY` | monotonic, gapless |
| `created_at` | `TIMESTAMPTZ NOT NULL` | UTC, set by DB default |
| `event_type` | `TEXT NOT NULL` | versioned slug (`v1.…`) |
| `subject_kind` | `TEXT NOT NULL` | `package` / `file` / `retention` / `storage` / `system` |
| `subject_id` | `UUID NULL` | |
| `actor` | `TEXT NOT NULL` | producer or operator identity |
| `payload` | `BYTEA NOT NULL` | canonical CBOR |
| `payload_digest` | `BYTEA NOT NULL` | BLAKE3 of `payload` |

Indexes: `(subject_kind, subject_id)`, `(event_type, sequence)`.
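
To illustrate how a materialised view derives from the log, the sketch below folds an in-memory list of events into a package view keyed by `subject_id`. It is a simplification under stated assumptions: the `Event` dataclass, the `replay_packages` helper, and the dict payloads are hypothetical stand-ins (the real log stores canonical CBOR in the `events` table), and only two event types are projected.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass(frozen=True)
class Event:
    """Subset of the `events` columns; a dict stands in for the CBOR payload."""
    sequence: int
    event_type: str
    subject_id: str
    payload: Dict[str, Any] = field(default_factory=dict)


def replay_packages(events: Iterable[Event]) -> Dict[str, Dict[str, Any]]:
    """Rebuild a package view by applying events in sequence order."""
    view: Dict[str, Dict[str, Any]] = {}
    for ev in sorted(events, key=lambda e: e.sequence):
        if ev.event_type == "v1.package.created":
            view[ev.subject_id] = {"name": ev.payload["name"], "status": "created"}
        elif ev.event_type == "v1.package.finalized":
            view[ev.subject_id]["status"] = "finalized"
        else:
            continue  # event types this view does not project
        # Replay bookkeeping: remember the last event applied to the row.
        view[ev.subject_id]["last_event_sequence"] = ev.sequence
    return view


log = [
    Event(1, "v1.package.created", "pkg-a", {"name": "nightly-run"}),
    Event(2, "v1.system.note", "pkg-a"),
    Event(3, "v1.package.finalized", "pkg-a"),
]
view = replay_packages(log)
print(view["pkg-a"])
# {'name': 'nightly-run', 'status': 'finalized', 'last_event_sequence': 3}
```

Because the view is a pure function of the log, dropping it and replaying from sequence 1 yields the same rows, which is what makes the event log the single source of truth.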
### `artifact_packages` (materialised view)

| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `name` | `TEXT NOT NULL` | |
| `producer` | `TEXT NOT NULL` | |
| `subject` | `TEXT NOT NULL` | |
| `retention_class` | `TEXT NOT NULL` | FK to `retention_classes` |
| `metadata_schema_id` | `UUID NULL` | FK to `metadata_schemas` |
| `metadata` | `JSONB NOT NULL` | validated against schema if present |
| `status` | `TEXT NOT NULL` | `created` / `uploading` / `finalized` / `deletion_eligible` / `deleted` / `failed` |
| `manifest_digest` | `BYTEA NULL` | populated on finalize |
| `created_at`, `finalized_at`, `expires_at` | `TIMESTAMPTZ` | |
| `last_event_sequence` | `BIGINT NOT NULL` | for replay bookkeeping |

### `artifact_files` (materialised view)

| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `package_id` | `UUID NOT NULL` | FK |
| `relative_path` | `TEXT NOT NULL` | logical path; unique within package |
| `media_type` | `TEXT NOT NULL` | required (ADR-0006) |
| `size_bytes` | `BIGINT NOT NULL` | |
| `digest_algorithm` | `TEXT NOT NULL` | `blake3` by default (ADR-0001) |
| `digest_primary` | `BYTEA NOT NULL` | bytes of the primary digest |
| `digest_sha256` | `BYTEA NOT NULL` | always populated for interop |
| `created_at` | `TIMESTAMPTZ NOT NULL` | |

### `storage_locations` (materialised view)

| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `artifact_file_id` | `UUID NOT NULL` | FK |
| `backend_id` | `TEXT NOT NULL` | |
| `content_address` | `TEXT NOT NULL` | `:` |
| `object_key` | `TEXT NOT NULL` | backend-specific, usually derived from `content_address` |
| `storage_class` | `TEXT NULL` | backend-specific label |
| `retrieval_tier` | `TEXT NOT NULL DEFAULT 'hot'` | `hot` / `warm` / `cold` / `archive` |
| `restore_status` | `TEXT NULL` | `available` / `restore_requested` / `restoring` / `restored` / `expired` |
| `status` | `TEXT NOT NULL` | `recorded` / `verified` / `failed` / `deleted` |
| `created_at`, `last_verified_at` | `TIMESTAMPTZ` | |

### `retention_state` (materialised view)

| Column | Type | Notes |
|---|---|---|
| `package_id` | `UUID PRIMARY KEY` | |
| `current_expires_at` | `TIMESTAMPTZ NULL` | NULL = no expiry (permanent or held) |
| `effective_class` | `TEXT NOT NULL` | |
| `active_hold_id` | `UUID NULL` | |
| `eligible_for_deletion` | `BOOLEAN NOT NULL` | |

### `retention_classes` (seed data, not derived)

| Column | Type | Notes |
|---|---|---|
| `class_id` | `TEXT PRIMARY KEY` | `transient` / `raw-evidence` / `summary-evidence` / `release-evidence` / `permanent-record` |
| `default_duration` | `INTERVAL NULL` | NULL for `permanent-record` |
| `deletion_strategy` | `TEXT NOT NULL` | `mark_eligible` / `auto_delete_after_grace` (v1 only uses the former) |

### `metadata_schemas` (config table)

| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `slug` | `TEXT NOT NULL UNIQUE` | e.g. `guide-board.run.v1` |
| `json_schema` | `JSONB NOT NULL` | |
| `created_at` | `TIMESTAMPTZ NOT NULL` | |

## API shape

### Native v1 surface

```text
GET    /health
GET    /backends
GET    /retention-classes

POST   /packages                                  # create
GET    /packages                                  # list, query by metadata
GET    /packages/{package_id}                     # metadata
POST   /packages/{package_id}/files               # single-shot file upload
POST   /packages/{package_id}/finalize            # produce manifest
GET    /packages/{package_id}/manifest            # canonical CBOR (Accept: application/cbor)
GET    /packages/{package_id}/manifest.json       # JCS projection (Accept: application/json)

GET    /files/{file_id}                           # metadata
GET    /files/{file_id}/download                  # bytes

POST   /uploads                                   # open an upload session (resource shape pinned now)
PATCH  /uploads/{upload_id}                       # range body
POST   /uploads/{upload_id}/complete              # promote to /packages/.../files

POST   /packages/{package_id}/retention/extensions
POST   /packages/{package_id}/retention/holds
POST   /packages/{package_id}/retention/holds/{hold_id}/release

GET    /events?since={sequence}                   # long-poll registry change feed
```

The `POST /uploads/...` resource shape is committed now even if v1 implements it as single-shot internally; ADR per `PLATFORM-AMBITION` A6.

### Deferred / not v1

- `/v2/…` OCI Distribution endpoints (ADR-0006).
- gRPC API.
- Streaming CDC topic (NATS / Kafka).
- Multi-tenant namespacing in URLs.

## Package manifest content (v1)

A finalised manifest carries:

- `manifest_version: 1`
- `package`: id, name, producer, subject, retention class, created_at, finalized_at, expires_at, metadata, metadata_schema_id (nullable).
- `files`: ordered list of `{id, relative_path, media_type, size_bytes, digest_algorithm, digest_primary_hex, digest_sha256_hex}`.
- `storage_receipts`: ordered list of `{file_id, backend_id, content_address, retrieval_tier, status}` per stored copy.
- `retention_summary`: current class, expires_at, holds, last retention event.
- `provenance`: `{source_commits, tool_versions, environment, ingest_actor, ingest_timestamps}`. Schema-driven; freeform under a registered schema or empty if none.

The manifest digest (`blake3:`) is the package's canonical external identifier.

## Storage backends

### Local filesystem (v1)

- Root: configured directory.
- Object key layout: `////`.
- Atomic write via `fsync(tmpfile) + rename`. No partial states visible.
- Path traversal prevented at the SPI boundary; the local backend rejects any key that does not match the expected layout.

### S3-compatible / Ceph RGW (WP-0004)

- Endpoint, bucket, region, access key ref, secret key ref, key prefix, storage class label, optional SSE config.
- Object key: `////`.
- Multipart upload for objects above a configurable threshold.

## Security boundary (v1)

- Internal service. No anonymous public access.
- Authenticated producer / operator API. v1 ships shared-secret bearer tokens; OIDC integration is its own workplan.
- No secret values in artifact metadata.
- Upload paths are logical; never trusted filesystem paths. The `/uploads/...` path-ingest endpoint is *not* offered in v1.
- Download authorisation is checked at the registry layer, never at the backend.

## Resolved open questions

- **Deduplication scope.** Global by content address (ADR-0001). Reference-counted deletion via a GC pass (WP-0006, TBD).
- **Deletion ordering.** Mark records `deletion_eligible` first via an event. Byte deletion is a separate, audited operation that emits a second event. Reverse order is forbidden.
- **Metadata schemas.** Open JSON with optional producer-registered JSON Schema; validation at ingest (ADR-0005, `metadata_schemas`).
- **Statehub integration scope.** Statehub keeps package IDs and summary; never bytes. The `/events` long-poll is the integration point.

## Outstanding open questions (not blocking v1)

- Identity provider for shared deployments.
- Default retention durations per class (operator-configurable; needs one round of stakeholder input).
- WASM plugin host design (deferred to its own workplan; see `PLATFORM-AMBITION`).
- Federation / mirroring protocol (post-OCI-endpoint workplan).

## Roadmap pointer

The implementation sequence is in `docs/ROADMAP.md`. The first workplan is `workplans/ARTIFACT-STORE-WP-0001-foundation.md`.