docs+plans: reconcile blueprint with ambition, add ADRs, sequence workplans

Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.

ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.

Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.

Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.

README and AGENTS.md refreshed to reflect the new repo shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 21:16:17 +02:00
parent 403d903585
commit 747afc27a6
16 changed files with 1761 additions and 404 deletions


@@ -1,330 +1,378 @@
# Artifact Store Architecture Blueprint
# Architecture Blueprint
Status: draft
Created: 2026-05-15
Status: accepted (v2 — supersedes 2026-05-15 draft)
Updated: 2026-05-15
## Purpose
This document operationalises `INTENT.md`, the `docs/PLATFORM-AMBITION.md`
thesis, and the decisions recorded in `docs/adr/`. Where a tension exists
between this blueprint and an ADR, the ADR wins; raise an issue or
supersede the ADR.
`artifact-store` provides a generic registry and storage gateway for durable
generated artifacts. Producers register packages and files with metadata;
storage adapters persist the bytes; retention policy decides how long artifacts
remain eligible for retrieval.
## Architecture in one paragraph
The design keeps artifact identity and lifecycle separate from storage
implementation. This allows the first version to run against local filesystem
storage while the production path can use S3-compatible object storage such as
Ceph RGW.
`artifact-store` is a **library-first** artifact registry and storage
gateway. A small core library (`artifactstore`) implements identity,
manifests, retention, the storage adapter SPI, the data plane SPI, and
the registry orchestrator. The HTTP server and the CLI are thin
consumers of that library. Bytes are addressed by content
(`blake3:<hex>`) and stored through a pluggable adapter SPI. State is
authoritative in an append-only event log; queryable tables are
materialised views.
## Architecture Summary
## Design lineage
The shape is deliberately borrowed from `ffmpeg` and `VLC`: a tight
core of well-named modules with stable contracts, runtime-pluggable
backends, a thin orchestration binary, and an explicit hot-path
boundary that can be rewritten in faster code without changing the
consumer API. See `docs/PLATFORM-AMBITION.md` for the reference table.
## Top-level shape
```text
producer
-> Artifact Registry API
-> metadata database
-> retention policy engine
-> audit event log
-> storage adapter interface
-> local filesystem backend
-> S3-compatible backend
-> Ceph RGW deployment
-> future cloud/blob/archive backends
producers / operators / agents
|
v
+------------------------+
| HTTP API | CLI | <-- thin consumers
+------------------------+
|
v
+------------------------+
| registry orchestrator |
+------------------------+
| | |
v v v
+----------+ +---------+ +---------+
| identity | | events | |retention|
|/manifest | | (log + | | policy |
| | | views) | | engine |
+----------+ +---------+ +---------+
|
v
+-----------------------+
| data plane SPI | <-- ADR-0004 contract
+-----------------------+
|
v
+-----------------------+
| storage adapter SPI |
+-----------------------+
| | |
v v v
+-----+ +------+ +-------+
|local| | S3 | | Ceph | ... future backends
| FS | | RGW | | RGW |
+-----+ +------+ +-------+
```
The registry is the authority for artifact metadata and lifecycle. Backends are
responsible for byte storage and retrieval.
## Core modules
## Design Principles
Mapped one-to-one to ADR-0005's project layout. Each module has a
stable public surface; internals are free to evolve.
- Backend-neutral registry: no producer should know whether bytes live in Ceph,
local disk, or a cloud bucket.
- Content-addressable confidence: every stored file has a digest and size.
- Retention by default: every package receives an expiry decision at ingestion.
- Extensions are explicit: retention extensions and holds are audit events, not
silent metadata edits.
- Packages remain portable: a manifest should be enough to understand a package
without calling the producer.
- Statehub links, it does not store bytes: Statehub records artifact IDs and
outcomes; artifact-store owns file persistence.
- Deletion is deliberate: expiry makes artifacts eligible for deletion; deletion
jobs must be auditable and reversible only when the backend still has data.
### `identity`
## Components
- `Digest(algorithm, hex)` — value object.
- `ContentAddress`: `<algorithm>:<hex>` (ADR-0001).
- `digest_stream(reader) -> {primary, sha256}` — single-pass dual digest.
- Algorithm registry: `blake3` (default primary), `sha256` (always
computed).
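Here is a minimal sketch of the single-pass dual digest. For brevity it reads synchronous byte chunks, although the SPI itself is async. It uses the `blake3` PyPI package when that package is installed. Otherwise it falls back to stdlib `hashlib.blake2b` and records the algorithm honestly as `blake2b`. That fallback exists only so the sketch runs without the wheel and is not part of ADR-0001.

```python
import hashlib
from collections.abc import Iterable
from dataclasses import dataclass

try:
    from blake3 import blake3 as _primary_factory  # Rust-backed, SIMD-accelerated wheel
    PRIMARY_ALGORITHM = "blake3"
except ImportError:  # stand-in only, so the sketch runs without the wheel
    _primary_factory = hashlib.blake2b
    PRIMARY_ALGORITHM = "blake2b"


@dataclass(frozen=True)
class Digest:
    algorithm: str
    hex: str

    def __str__(self) -> str:
        return f"{self.algorithm}:{self.hex}"


def digest_stream(chunks: Iterable[bytes]) -> dict[str, Digest]:
    """Feed every chunk to both hashers so the input is read exactly once."""
    primary = _primary_factory()
    sha256 = hashlib.sha256()
    for chunk in chunks:
        primary.update(chunk)
        sha256.update(chunk)
    return {
        "primary": Digest(PRIMARY_ALGORITHM, primary.hexdigest()),
        "sha256": Digest("sha256", sha256.hexdigest()),
    }
```

Chunk boundaries do not affect the result. `digest_stream([b"hello ", b"world"])` and `digest_stream([b"hello world"])` produce the same pair of digests.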
### Registry API
### `manifest`
HTTP API for producers and operators.
- `Manifest` — versioned dataclass: package metadata + ordered file list
+ retention summary + provenance + storage receipts.
- `manifest.codec.encode(manifest) -> bytes` — canonical CBOR
(ADR-0003).
- `manifest.codec.decode(bytes) -> Manifest`.
- `manifest.projection.jcs(manifest) -> bytes` — canonical-JSON
projection for display and signing-tool interop.
- Round-trip invariant: `decode(encode(m)) == m` and
`encode(decode(jcs_to_cbor(jcs(m)))) == encode(m)`.
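For the simple types a manifest holds (strings, integers, lists and maps), the standard library can approximate the canonical-JSON projection. This sketch is not a conforming RFC 8785 (JCS) implementation. JCS also fixes number formatting and sorts keys by UTF-16 code units, and `sort_keys` only matches that ordering for keys without characters outside the BMP. The names `jcs` and `from_jcs` are illustrative and take plain dicts rather than `Manifest` objects.

```python
import json
from typing import Any


def jcs(value: dict[str, Any]) -> bytes:
    """Deterministic JSON bytes: sorted keys at every level, no whitespace, UTF-8."""
    return json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # NaN/Infinity have no JSON representation
    ).encode("utf-8")


def from_jcs(data: bytes) -> dict[str, Any]:
    return json.loads(data.decode("utf-8"))
```

Key insertion order does not matter: `jcs({"b": 1, "a": [1, 2]})` yields `b'{"a":[1,2],"b":1}'`. That independence is the property the byte-level round-trip invariant relies on.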
Initial responsibilities:
### `events`
- create artifact packages,
- upload or ingest files,
- finalize packages,
- retrieve package metadata,
- list/search packages by subject and producer metadata,
- create retention extensions and holds,
- expose download metadata or redirect/download endpoints,
- expose health and backend status.
- `events.write(transaction, event)` — appends one row with monotonic
sequence (ADR-0002).
- `events.tail(since_sequence) -> AsyncIterator[Event]` — long-poll.
- `events.replay(into=ViewWriter)` — rebuild materialised views.
- Event types (v1):
`v1.package.created`, `v1.file.ingested`, `v1.package.finalized`,
`v1.retention.default_applied`, `v1.retention.extended`,
`v1.retention.hold_applied`, `v1.retention.hold_released`,
`v1.retention.deletion_eligible`, `v1.storage.location_recorded`,
`v1.storage.location_verified`, `v1.audit.access`,
`v1.system.note`.
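Here is an in-memory sketch of `events.replay`. A view is a pure fold over the ordered log, so rebuilding it means starting from empty state and reapplying every event. Several simplifications are assumptions, not the v1 design: the `Event` shape is reduced, payloads are dicts rather than CBOR bytes, every event's `subject_id` is taken to be the package id, and only three v1 event types are handled.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any
from uuid import UUID


@dataclass(frozen=True)
class Event:
    sequence: int
    event_type: str
    subject_id: UUID
    payload: dict[str, Any]


@dataclass
class PackageView:
    status: str
    file_count: int = 0
    last_event_sequence: int = 0


def replay(events: Iterable[Event]) -> dict[UUID, PackageView]:
    """Rebuild package views from scratch by folding events in sequence order."""
    views: dict[UUID, PackageView] = {}
    for ev in sorted(events, key=lambda e: e.sequence):
        if ev.event_type == "v1.package.created":
            views[ev.subject_id] = PackageView(status="created")
        view = views.get(ev.subject_id)
        if view is None:
            continue  # event for a package this sketch does not track
        if ev.event_type == "v1.file.ingested":
            view.status = "uploading"
            view.file_count += 1
        elif ev.event_type == "v1.package.finalized":
            view.status = "finalized"
        view.last_event_sequence = ev.sequence  # replay bookkeeping
    return views
```

Because the fold is deterministic, replaying the same log twice yields equal views. That property is what makes views safe to drop and rebuild after a schema change.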
### Metadata Store
### `retention`
Persistent database for registry state.
- `retention.classes`: `transient`, `raw-evidence`, `summary-evidence`,
`release-evidence`, `permanent-record`. Defined as data, not code.
- `retention.policy.apply(package, class) -> RetentionDecision`
computes `expires_at` and the deletion eligibility rule.
- `retention.extend(package, until, reason, actor)` — emits an event;
the materialised view updates on commit.
- `retention.hold(package, reason, actor)` /
`retention.release_hold(hold_id, actor)`.
Initial implementation can use SQLite for local development and PostgreSQL for
shared service deployments if that matches the surrounding service stack.
### `audit`
Core tables:
- A view over `events` filtered to access and lifecycle events. No
separate write path; auditing happens by event emission elsewhere.
- `artifact_packages`
- `artifact_files`
- `storage_locations`
- `retention_rules`
- `retention_events`
- `audit_events`
### `storage` (adapter SPI)
### Storage Adapter Interface
```python
from __future__ import annotations  # value types below live elsewhere in artifactstore

from collections.abc import AsyncIterator
from typing import Protocol


class StorageBackend(Protocol):
    backend_id: str
    async def put(self, content_address: ContentAddress, stream: AsyncIterator[bytes], size_hint: int | None) -> StorageReceipt: ...
    # Plain `def`: async-generator implementations return an AsyncIterator, not a coroutine.
    def get(self, content_address: ContentAddress, byte_range: tuple[int, int] | None = None) -> AsyncIterator[bytes]: ...
    async def head(self, content_address: ContentAddress) -> StorageObjectMetadata: ...
    async def delete(self, content_address: ContentAddress) -> DeletionResult: ...
    async def health(self) -> BackendStatus: ...
```
Small backend contract used by the API service.
- Backend registry: backends register at import time; selection is
per-package by configuration.
- v1 ships `local` (filesystem); `s3` ships in WP-0004.
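Import-time registration can be as small as a module-level dict populated by a class decorator. This is a sketch. The names `register_backend` and `create_backend`, and the convention that configuration maps to constructor keyword arguments, are hypothetical and not a settled API.

```python
from typing import Callable, TypeVar

T = TypeVar("T", bound=type)

_BACKENDS: dict[str, type] = {}


def register_backend(backend_id: str) -> Callable[[T], T]:
    """Class decorator: record the class under backend_id when its module is imported."""
    def decorate(cls: T) -> T:
        if backend_id in _BACKENDS:
            raise ValueError(f"backend {backend_id!r} registered twice")
        cls.backend_id = backend_id
        _BACKENDS[backend_id] = cls
        return cls
    return decorate


def create_backend(backend_id: str, **config: object) -> object:
    """Instantiate the backend that a package's configuration selects."""
    try:
        cls = _BACKENDS[backend_id]
    except KeyError:
        raise LookupError(f"unknown backend {backend_id!r}; known: {sorted(_BACKENDS)}") from None
    return cls(**config)


@register_backend("local")
class LocalBackend:
    def __init__(self, root: str) -> None:
        self.root = root
```

Until the WP-0004 module is imported, selecting `s3` fails loudly with `LookupError` instead of falling back silently to another backend.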
Required operations:
### `dataplane` (SPI per ADR-0004)
- `put(object_key, stream, metadata) -> storage_location`
- `get(object_key) -> stream or signed_url`
- `head(object_key) -> object_metadata`
- `delete(object_key) -> deletion_result`
- `health() -> backend_status`
```python
from __future__ import annotations  # value types below live elsewhere in artifactstore

from collections.abc import AsyncIterator
from typing import Protocol


class DataPlane(Protocol):
    async def ingest_stream(self, stream: AsyncIterator[bytes], hints: IngestHints) -> IngestResult: ...
    # Plain `def`: async-generator implementations return an AsyncIterator, not a coroutine.
    def serve_object(self, content_address: ContentAddress, byte_range: tuple[int, int] | None = None) -> AsyncIterator[bytes]: ...
    async def verify_object(self, content_address: ContentAddress) -> VerifyResult: ...
    async def delete_object(self, content_address: ContentAddress) -> DeletionResult: ...
    async def backend_health(self) -> BackendStatus: ...
```
Initial backends:
- v1 implementation: `dataplane.inproc` — wraps a `StorageBackend`,
computes digests during streaming.
- Future implementation: `dataplane.remote` — gRPC or
framed-bincode-over-Unix-socket client to a Rust daemon.
- local filesystem backend for tests and development,
- S3-compatible backend for Ceph RGW and cloud object stores.
### `registry`
### Retention Policy Engine
The orchestrator. Combines `identity + manifest + events + retention +
dataplane` into the operations the HTTP API and CLI consume:
`create_package`, `ingest_file`, `finalize_package`, `get_manifest`,
`download_file`, `extend_retention`, `apply_hold`, `release_hold`,
`mark_deletion_eligible`, `tail_events`. Each operation is one DB
transaction that writes one or more events and updates materialised
views.
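This minimal sketch shows "one DB transaction per operation". It uses stdlib `sqlite3` only to stay self-contained, since the v1 stack uses PostgreSQL through SQLAlchemy Core and asyncpg. The event append and the view update commit together or not at all. Several parts are simplifications: the tables are trimmed versions of `events` and `artifact_packages`, the payload is JSON text rather than canonical CBOR, and the `create_package` and `tail` signatures are hypothetical.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE events (
    sequence INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    subject_id TEXT,
    actor TEXT NOT NULL,
    payload TEXT NOT NULL
);
CREATE TABLE artifact_packages (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    status TEXT NOT NULL,
    last_event_sequence INTEGER NOT NULL
);
"""


def create_package(conn: sqlite3.Connection, name: str, actor: str) -> str:
    package_id = str(uuid.uuid4())
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute(
            "INSERT INTO events (event_type, subject_id, actor, payload) VALUES (?, ?, ?, ?)",
            ("v1.package.created", package_id, actor, json.dumps({"name": name})),
        )
        conn.execute(
            "INSERT INTO artifact_packages (id, name, status, last_event_sequence)"
            " VALUES (?, ?, 'created', ?)",
            (package_id, name, cur.lastrowid),
        )
    return package_id


def tail(conn: sqlite3.Connection, since_sequence: int) -> list[tuple[int, str]]:
    """Events strictly after since_sequence, in log order."""
    return conn.execute(
        "SELECT sequence, event_type FROM events WHERE sequence > ? ORDER BY sequence",
        (since_sequence,),
    ).fetchall()
```

A consumer stores the last `sequence` it processed and passes it back as `since_sequence` on the next poll.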
Applies default rules at ingestion and records later changes.
### `api.http` and `cli`
Initial retention classes:
Thin. Their job is to translate transport (HTTP / argv) into calls on
`registry`. No business logic.
- `transient`: short-lived scratch artifacts,
- `raw-evidence`: raw logs and run output,
- `summary-evidence`: compact reports and summaries,
- `release-evidence`: release or customer-facing evidence packages,
- `permanent-record`: manually held records with no automatic expiry.
## Data model
Each package stores:
All tables exist as **materialised views over `events`** (ADR-0002),
except `events` itself, `retention_classes` (seed data), and
`metadata_schemas` (config).
- selected retention class,
- default retention rule,
- computed `expires_at`,
- extension records,
- hold records,
- deletion eligibility state.
### `events` (source of truth)
### Audit Log
| Column | Type | Notes |
|---|---|---|
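| `sequence` | `BIGSERIAL PRIMARY KEY` | monotonic; gapless (ADR-0002) needs serialised allocation, because a plain `BIGSERIAL` skips values consumed by rolled-back transactions |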
| `created_at` | `TIMESTAMPTZ NOT NULL` | UTC, set by DB default |
| `event_type` | `TEXT NOT NULL` | versioned slug (`v1.…`) |
| `subject_kind` | `TEXT NOT NULL` | `package` / `file` / `retention` / `storage` / `system` |
| `subject_id` | `UUID NULL` | |
| `actor` | `TEXT NOT NULL` | producer or operator identity |
| `payload` | `BYTEA NOT NULL` | canonical CBOR |
| `payload_digest` | `BYTEA NOT NULL` | BLAKE3 of `payload` |
Append-only record of important events:
Indexes: `(subject_kind, subject_id)`, `(event_type, sequence)`.
- package created,
- file uploaded,
- package finalized,
- retrieval requested,
- retention extended,
- hold applied or released,
- deletion requested,
- deletion completed or failed.
### `artifact_packages` (materialised view)
The audit log does not need to be cryptographic in the first release, but the
schema should leave room for signed events or external write-once storage later.
| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `name` | `TEXT NOT NULL` | |
| `producer` | `TEXT NOT NULL` | |
| `subject` | `TEXT NOT NULL` | |
| `retention_class` | `TEXT NOT NULL` | FK to `retention_classes` |
| `metadata_schema_id` | `UUID NULL` | FK to `metadata_schemas` |
| `metadata` | `JSONB NOT NULL` | validated against schema if present |
| `status` | `TEXT NOT NULL` | `created` / `uploading` / `finalized` / `deletion_eligible` / `deleted` / `failed` |
| `manifest_digest` | `BYTEA NULL` | populated on finalize |
| `created_at`, `finalized_at`, `expires_at` | `TIMESTAMPTZ` | |
| `last_event_sequence` | `BIGINT NOT NULL` | for replay bookkeeping |
## Data Model
### `artifact_files` (materialised view)
### Artifact Package
| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `package_id` | `UUID NOT NULL` | FK |
| `relative_path` | `TEXT NOT NULL` | logical path; unique within package |
| `media_type` | `TEXT NOT NULL` | required (ADR-0006) |
| `size_bytes` | `BIGINT NOT NULL` | |
| `digest_algorithm` | `TEXT NOT NULL` | `blake3` by default (ADR-0001) |
| `digest_primary` | `BYTEA NOT NULL` | bytes of the primary digest |
| `digest_sha256` | `BYTEA NOT NULL` | always populated for interop |
| `created_at` | `TIMESTAMPTZ NOT NULL` | |
Required fields:
### `storage_locations` (materialised view)
- `id`
- `name`
- `producer`
- `subject`
- `retention_class`
- `status`
- `created_at`
- `finalized_at`
- `expires_at`
- `metadata`
| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `artifact_file_id` | `UUID NOT NULL` | FK |
| `backend_id` | `TEXT NOT NULL` | |
| `content_address` | `TEXT NOT NULL` | `<algo>:<hex>` |
| `object_key` | `TEXT NOT NULL` | backend-specific, usually derived from `content_address` |
| `storage_class` | `TEXT NULL` | backend-specific label |
| `retrieval_tier` | `TEXT NOT NULL DEFAULT 'hot'` | `hot` / `warm` / `cold` / `archive` |
| `restore_status` | `TEXT NULL` | `available` / `restore_requested` / `restoring` / `restored` / `expired` |
| `status` | `TEXT NOT NULL` | `recorded` / `verified` / `failed` / `deleted` |
| `created_at`, `last_verified_at` | `TIMESTAMPTZ` | |
Recommended metadata keys:
### `retention_state` (materialised view)
- `repo_slug`
- `run_id`
- `assessment_id`
- `target_profile_ref`
- `assessment_profile_ref`
- `source_commits`
- `tool_versions`
- `environment`
| Column | Type | Notes |
|---|---|---|
| `package_id` | `UUID PRIMARY KEY` | |
| `current_expires_at` | `TIMESTAMPTZ NULL` | NULL = no expiry (permanent or held) |
| `effective_class` | `TEXT NOT NULL` | |
| `active_hold_id` | `UUID NULL` | |
| `eligible_for_deletion` | `BOOLEAN NOT NULL` | |
### Artifact File
### `retention_classes` (seed data, not derived)
Required fields:
| Column | Type | Notes |
|---|---|---|
| `class_id` | `TEXT PRIMARY KEY` | `transient` / `raw-evidence` / `summary-evidence` / `release-evidence` / `permanent-record` |
| `default_duration` | `INTERVAL NULL` | NULL for `permanent-record` |
| `deletion_strategy` | `TEXT NOT NULL` | `mark_eligible` / `auto_delete_after_grace` (v1 only uses the former) |
- `id`
- `package_id`
- `relative_path`
- `media_type`
- `size_bytes`
- `sha256`
- `created_at`
### `metadata_schemas` (config table)
### Storage Location
| Column | Type | Notes |
|---|---|---|
| `id` | `UUID PRIMARY KEY` | |
| `slug` | `TEXT NOT NULL UNIQUE` | e.g. `guide-board.run.v1` |
| `json_schema` | `JSONB NOT NULL` | |
| `created_at` | `TIMESTAMPTZ NOT NULL` | |
Required fields:
## API shape
- `id`
- `artifact_file_id`
- `backend_id`
- `object_key`
- `storage_class`
- `status`
- `created_at`
- `last_verified_at`
### Retention Event
Required fields:
- `id`
- `package_id`
- `event_type`
- `reason`
- `created_by`
- `created_at`
- `previous_expires_at`
- `new_expires_at`
Event types:
- `default_rule_applied`
- `extended`
- `hold_applied`
- `hold_released`
- `deletion_eligible`
- `deleted`
## API Shape
Initial endpoints:
### Native v1 surface
```text
GET /health
GET /backends
POST /packages
GET /packages
GET /packages/{package_id}
POST /packages/{package_id}/files
POST /packages/{package_id}/finalize
GET /packages/{package_id}/manifest
GET /files/{file_id}/download
POST /packages/{package_id}/retention/extensions
POST /packages/{package_id}/retention/holds
POST /packages/{package_id}/retention/holds/{hold_id}/release
GET /health
GET /backends
GET /retention-classes
POST /packages # create
GET /packages # list, query by metadata
GET /packages/{package_id} # metadata
POST /packages/{package_id}/files # single-shot file upload
POST /packages/{package_id}/finalize # produce manifest
GET /packages/{package_id}/manifest # canonical CBOR (Accept: application/cbor)
GET /packages/{package_id}/manifest.json # JCS projection (Accept: application/json)
GET /files/{file_id} # metadata
GET /files/{file_id}/download # bytes
POST /uploads # open an upload session (resource shape pinned now)
PATCH /uploads/{upload_id} # range body
POST /uploads/{upload_id}/complete # promote to /packages/.../files
POST /packages/{package_id}/retention/extensions
POST /packages/{package_id}/retention/holds
POST /packages/{package_id}/retention/holds/{hold_id}/release
GET /events?since={sequence} # long-poll registry change feed
```
The first ingestion path can accept multipart file uploads. A later trusted-local
operator endpoint may ingest from server-local paths, but it should be disabled
by default because path ingestion changes the security boundary.
The `POST /uploads/...` resource shape is committed now, even if v1
implements it as single-shot internally; see `docs/PLATFORM-AMBITION.md`
commitment A6.
## Package Manifest
### Deferred / not v1
Every finalized package should expose a JSON manifest containing:
- `/v2/…` OCI Distribution endpoints (ADR-0006).
- gRPC API.
- Streaming CDC topic (NATS / Kafka).
- Multi-tenant namespacing in URLs.
- package metadata,
- retention summary,
- file list,
- file digests and sizes,
- storage backend references,
- source metadata,
- created/finalized timestamps.
## Package manifest content (v1)
For guide-board runs, the manifest should preserve links to:
A finalised manifest carries:
- `run.json`
- `retention-summary.json`
- `reports/assessment-package.json`
- `reports/report.md`
- extension-generated scorecards or log reviews,
- raw artifact files captured by the assessment package manifest.
- `manifest_version: 1`
- `package`: id, name, producer, subject, retention class, created_at,
finalized_at, expires_at, metadata, metadata_schema_id (nullable).
- `files`: ordered list of `{id, relative_path, media_type, size_bytes,
digest_algorithm, digest_primary_hex, digest_sha256_hex}`.
- `storage_receipts`: ordered list of `{file_id, backend_id,
content_address, retrieval_tier, status}` per stored copy.
- `retention_summary`: current class, expires_at, holds, last
retention event.
- `provenance`: `{source_commits, tool_versions, environment,
ingest_actor, ingest_timestamps}`. Schema-driven; freeform under a
registered schema or empty if none.
## Guide-Board Pilot Flow
The manifest digest (`blake3:<hex>`) is the package's canonical
external identifier.
```text
guide-board run directory
-> open-cmis-tck scorecard/log review
-> artifact-store package create
-> upload run files
-> finalize manifest
-> Statehub record links package id and summary
```
## Storage backends
The artifact package should carry:
### Local filesystem (v1)
- run id,
- target profile reference,
- assessment profile reference,
- result status,
- source commits for guide-board, open-cmis-tck, and the assessed repository,
- important report paths,
- retention class `raw-evidence` or `release-evidence`.
- Root: configured directory.
- Object key layout: `<root>/<digest_algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>`.
- Atomic write via `fsync(tmpfile) + rename`. No partial states visible.
- Path traversal prevented at the SPI boundary; the local backend
rejects any key that does not match the expected layout.
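The next sketch covers the local backend's key derivation, layout validation and atomic write, in synchronous form. The names `object_path` and `atomic_put` are hypothetical. The sketch assumes 256-bit digests, 64 lowercase hex characters for both BLAKE3's default output and SHA-256, and a POSIX filesystem. `os.replace` is atomic when source and target share a filesystem. Fsyncing the parent directory makes the rename itself durable. Digest verification is assumed to happen upstream in the data plane.

```python
import os
import re
import tempfile
from pathlib import Path

_ADDRESS = re.compile(r"(?P<algo>blake3|sha256):(?P<hex>[0-9a-f]{64})")


def object_path(root: Path, content_address: str) -> Path:
    """Map '<algo>:<hex>' to <root>/<algo>/<hex[0:2]>/<hex[2:4]>/<hex>; reject anything else."""
    m = _ADDRESS.fullmatch(content_address)
    if m is None:  # blocks '..', separators, upper-case hex and unknown algorithms
        raise ValueError(f"not a valid content address: {content_address!r}")
    algo, hex_ = m["algo"], m["hex"]
    return root / algo / hex_[0:2] / hex_[2:4] / hex_


def atomic_put(root: Path, content_address: str, data: bytes) -> Path:
    final = object_path(root, content_address)
    final.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the destination directory, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=final.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, final)  # readers see either nothing or the complete object
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    dir_fd = os.open(final.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry created by the rename
    finally:
        os.close(dir_fd)
    return final
```

Rewriting an existing key is harmless. Content addressing guarantees the bytes are identical, so the rename only replaces an object with itself.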
## Ceph And S3-Compatible Storage
### S3-compatible / Ceph RGW (WP-0004)
Ceph should be introduced through the S3-compatible adapter, not as a special
case in producer logic.
- Endpoint, bucket, region, access key ref, secret key ref, key
prefix, storage class label, optional SSE config.
- Object key: `<prefix>/<digest_algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>`.
- Multipart upload for objects above a configurable threshold.
Configuration should support:
## Security boundary (v1)
- endpoint URL,
- bucket,
- region,
- access key reference,
- secret key reference,
- optional server-side encryption settings,
- object key prefix,
- storage class label.
- Internal service. No anonymous public access.
- Authenticated producer / operator API. v1 ships shared-secret bearer
tokens; OIDC integration is its own workplan.
- No secret values in artifact metadata.
- Upload paths are logical, never trusted filesystem paths. The
server-local path-ingest endpoint is *not* offered in v1.
- Download authorisation is checked at the registry layer, never at
the backend.
The service should never require credentials in producer request bodies. Use
environment variables, mounted secret files, or a local secret provider.
## Resolved open questions
## Future Retrieval Tiers
- **Deduplication scope.** Global by content address (ADR-0001).
Reference-counted deletion via a GC pass (WP-0006, TBD).
- **Deletion ordering.** Mark records `deletion_eligible` first via an
event. Byte deletion is a separate, audited operation that emits a
second event. Reverse order is forbidden.
- **Metadata schemas.** Open JSON with optional producer-registered
JSON Schema; validation at ingest (ADR-0005, `metadata_schemas`).
- **Statehub integration scope.** Statehub keeps package IDs and
summary; never bytes. The `/events` long-poll is the integration
point.
The initial API can treat all stored files as immediately retrievable. Later,
storage locations can include:
## Outstanding open questions (not blocking v1)
- `retrieval_tier`: hot, warm, cold, archive,
- `restore_status`: available, restore_requested, restoring, restored, expired,
- `restore_requested_at`,
- `restore_expires_at`.
- Identity provider for shared deployments.
- Default retention durations per class (operator-configurable; needs
one round of stakeholder input).
- WASM plugin host design (deferred to its own workplan; see
`PLATFORM-AMBITION`).
- Federation / mirroring protocol (post-OCI-endpoint workplan).
The registry API should be able to return "not immediately available" without
changing artifact identity.
## Roadmap pointer
## Security Boundary
Initial service assumptions:
- internal service, not public internet exposed,
- authenticated producer/operator API before shared deployment,
- no secret values stored in artifact metadata,
- package paths are logical paths, not trusted filesystem paths,
- download authorization should be checked at the registry layer.
Files may contain sensitive evidence. The service must treat metadata and bytes
as confidential by default.
## Open Questions
- Which identity provider should guard shared deployments?
- Should package metadata schemas be open-ended JSON or typed by producer?
- Should deduplication be package-local only or global by content hash?
- Should deletion first mark records deleted, then delete bytes, or reverse that
order with compensating events?
- How much Statehub integration belongs in this repo versus in Statehub clients?
The implementation sequence is in `docs/ROADMAP.md`. The first
workplan is `workplans/ARTIFACT-STORE-WP-0001-foundation.md`.

docs/ROADMAP.md

@@ -0,0 +1,93 @@
# Roadmap
Status: living document
Updated: 2026-05-15
The roadmap sequences `artifact-store` from "no code" to a credible
production v1 to the longer-horizon platform shape recorded in
`docs/PLATFORM-AMBITION.md`. Each row is a self-contained workplan with
its own acceptance criteria; nothing here is a binding milestone.
The sequencing principle is **library-first** (ffmpeg-shaped):
foundational kernels and contracts before any consumer code. The HTTP
server and CLI exist only after the core library can be exercised
end-to-end against a local filesystem backend.
## Phase 0 — Cleanup (done 2026-05-15)
- ADR-0001 through ADR-0006 accepted.
- Architecture blueprint rewritten to v2.
- Platform ambition and assembly experiment documented.
- Workplans re-sequenced.
## Phase 1 — Foundation and pilot (v0.1)
Goal: ingest a real guide-board run end-to-end, against a local
filesystem backend, with retention applied and events logged.
| ID | Title | Carries existing task IDs | Notes |
|---|---|---|---|
| WP-0001 | Foundation: scaffold, core kernels, local FS backend | T001, T002, T003, T008 | All of the library-shaped modules; no HTTP API yet beyond `/health`. |
| WP-0002 | Ingestion API + manifest surface | T004 | The HTTP API. Builds on WP-0001's library. |
| WP-0003 | Retention lifecycle | T005 | Retention engine, extensions, holds, deletion eligibility. |
| WP-0004 | S3-compatible backend (Ceph RGW target) | T006 | Second concrete adapter. |
| WP-0005 | Guide-board pilot ingestion | T007 | First real producer wired up. |
Exit criteria for v0.1: WP-0001 through WP-0005 done; a guide-board
CMIS run round-trips through artifact-store with manifest, retention,
and Statehub linkage; backend swappable between local FS and an
S3-compatible store.
## Phase 2 — Production hardening (v0.2–v0.3)
| ID | Title | Notes |
|---|---|---|
| WP-0006 | Garbage collection + reference counting | Required by ADR-0001 global dedup. Mark-eligible already lands in WP-0003; this workplan does the byte-deletion pass. |
| WP-0007 | Resumable / chunked upload implementation | The wire shape lands in WP-0002; this workplan makes the implementation actually streaming. |
| WP-0008 | Auth, multi-tenancy, quota | OIDC integration; tenant namespacing; per-tenant rate limit and storage quota. |
| WP-0009 | Observability: metrics, tracing, structured logs | OpenTelemetry SDK; latency / throughput SLOs published. |
| WP-0010 | Event stream out (CDC) | NATS or Kafka topic of registry events; long-poll `/events` becomes a fallback. |
| WP-0011 | Signed manifests | Sigstore / cosign integration; signature recorded alongside manifest digest. |
Exit criteria for v0.3: a deployment is operable by humans without
internal knowledge; SLOs are measurable; access is authenticated;
artifacts can be signed and verified.
## Phase 3 — Platform features (v0.4–v1.0)
| ID | Title | Notes |
|---|---|---|
| WP-0012 | OCI artifact `/v2/` endpoint | Implements OCI Distribution Spec on top of the same storage (ADR-0006). |
| WP-0013 | Content-defined chunking + global dedup at chunk level | FastCDC; chunked storage. Builds toward `docs/ASSEMBLY-EXPERIMENT.md`. |
| WP-0014 | Rust data plane extraction | Move `dataplane.inproc` to `dataplane.remote` (ADR-0004). |
| WP-0015 | WASM plugin host | Extension surface for indexers, redactors, scorecard generators. |
| WP-0016 | Cold-tier adapters | Glacier / Tape / IA classes; restore flow. |
| WP-0017 | Federation and replication | Signed manifest exchange between artifact-store instances. |
Exit criteria for v1.0: artifact-store is embeddable as a library, runs
as a single-binary CLI, runs as a server, speaks OCI, federates between
instances, and is fast enough to be a credible commercial substrate.
## What this roadmap deliberately does NOT promise
- Specific calendar dates. Cadence is set by sessions, not quarters.
- A UI. UIs are out-of-tree (see `docs/PLATFORM-AMBITION.md`).
- ML-specific or container-specific features. Use OCI compatibility.
- A storage backend for every cloud. Adapters are community surface.
## How to add a workplan
1. Pick the next free `ARTIFACT-STORE-WP-NNNN` number.
2. Create `workplans/ARTIFACT-STORE-WP-NNNN-<slug>.md` with the
frontmatter and task block format in `AGENTS.md`.
3. Cite the ADRs the workplan depends on in its `## Constraints`
section.
4. Append a row to the appropriate phase table in this file.
5. Notify the custodian operator to run
`make fix-consistency REPO=artifact-store`.
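Step 1 can be scripted. This sketch assumes the naming conventions above: `workplans/ARTIFACT-STORE-WP-NNNN-<slug>.md`, with archived copies prefixed `YYMMDD-`. Archived files count as used so that a retired number is never reused. IDs that appear in `docs/ROADMAP.md` also count as used, because the phase tables reserve WP-0006 through WP-0017 before their files exist. Both helper names are hypothetical and not part of the repo's tooling.

```python
import re
from collections.abc import Iterable
from pathlib import Path

_WP_ID = re.compile(r"WP-(\d{4})")


def next_workplan_id(texts: Iterable[str]) -> str:
    """Next ARTIFACT-STORE-WP-NNNN after every WP number mentioned in the given texts."""
    used = [int(n) for text in texts for n in _WP_ID.findall(text)]
    return f"ARTIFACT-STORE-WP-{max(used, default=0) + 1:04d}"


def next_workplan_id_for_repo(repo: Path) -> str:
    """Count workplan filenames (rglob includes workplans/archived/) and roadmap reservations."""
    names = [p.name for p in (repo / "workplans").rglob("*.md")]
    roadmap = (repo / "docs" / "ROADMAP.md").read_text(encoding="utf-8")
    return next_workplan_id([*names, roadmap])
```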
## How to retire a workplan
1. Set `status: done` in the frontmatter when all tasks are `done`.
2. Move the file to `workplans/archived/YYMMDD-ARTIFACT-STORE-WP-NNNN-<slug>.md`.
3. Update this roadmap to reflect the new state.


@@ -0,0 +1,80 @@
# ADR-0001 — Content-Addressed Storage with Dual Digest
Status: accepted
Date: 2026-05-15
Supersedes: —
Related: ADR-0003, ADR-0006, `docs/PLATFORM-AMBITION.md` commitments A1, A2, A9
## Context
The architecture blueprint as originally drafted addresses stored bytes by
logical `(package, relative_path)`. That is sufficient for v1 ingestion but
forecloses global deduplication, Merkle integrity proofs, partial
replication, federation, and OCI artifact compatibility — all of which the
platform ambition requires to remain reachable.
Independently, the original blueprint pins SHA-256 as the only file digest.
SHA-256 with SHA-NI on modern x86 reaches ~1.5–2 GB/s/core. BLAKE3 on the
same hardware reaches 6–10+ GB/s/core, parallelises across cores, and its
construction *is* a Merkle tree — package-level integrity becomes free.
SHA-256 remains the lingua franca of SLSA, in-toto, cosign, and OCI; we
cannot drop it.
## Decision
1. The canonical storage key for any byte sequence is its content address
in the form `<algorithm>:<lowercase-hex-digest>`. Storage backends store
and retrieve by this key. `relative_path` is logical metadata recorded
in the manifest, not a storage-layer concept.
2. Every `artifact_files` row carries two digest columns:
- `digest_primary` — the native digest; default algorithm `blake3`.
- `digest_sha256` — always populated for interop, even when `blake3`
is the primary.
Both are computed in a single ingest pass (one read of the input).
3. The schema also carries a `digest_algorithm` column naming the primary
algorithm. Additional algorithms are added by new columns or a side
table, never by overloading `digest_primary`.
4. Storage backend object keys are derived from `digest_primary` only.
Migrations between primary algorithms are explicit and audited; they
are not silent.
## Consequences
Positive:
- Global deduplication is automatic — two identical files in two packages
share one backend object.
- Merkle integrity over a package is free with BLAKE3 (use the tree mode).
- Federation, partial mirrors, and OCI compatibility (ADR-0006) become
reachable without schema migration.
- Verification of a single file does not require fetching its package.
Negative:
- Two digests must be computed per ingest. Mitigated by streaming both
through one buffer; the bottleneck is I/O, not hashing.
- Reference counting: deletion of an `artifact_file` row cannot
unconditionally delete the backend object. A garbage-collector pass
reconciles references before deleting bytes. This is correct anyway
(deletion should be deliberate, per the blueprint).
- Producers requesting "store these N bytes at path P" must understand
that their P is logical. This is a documentation problem, not a
technical one.
## Implementation notes
- v1 ships BLAKE3 via the `blake3` PyPI wheel (Rust core, SIMD-accelerated;
no asm we maintain).
- v1 ships SHA-256 via stdlib `hashlib` (SHA-NI used when the CPython
build links against OpenSSL with SHA-NI support).
- A `Digest` value object wraps `(algorithm, hex)`; serialised forms
always include the algorithm prefix.
- A garbage-collector workplan is filed at WP-0006 (TBD); v1 does not
delete bytes automatically — it marks them eligible.
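Here is a sketch of the `Digest` value object described above. It assumes that only the two algorithms named in this ADR are accepted and that both produce 256-bit outputs. Parsing rejects upper-case hex and a missing prefix, so the serialised form is always `<algorithm>:<lowercase-hex>`. The `parse` classmethod name is illustrative.

```python
from dataclasses import dataclass

_HEX_LENGTH = {"blake3": 64, "sha256": 64}  # 32-byte digests as lowercase hex


@dataclass(frozen=True)
class Digest:
    algorithm: str
    hex: str

    def __post_init__(self) -> None:
        expected = _HEX_LENGTH.get(self.algorithm)
        if expected is None:
            raise ValueError(f"unsupported algorithm {self.algorithm!r}")
        if len(self.hex) != expected or not all(c in "0123456789abcdef" for c in self.hex):
            raise ValueError("digest must be lowercase hex of the expected length")

    @classmethod
    def parse(cls, text: str) -> "Digest":
        algorithm, sep, hex_ = text.partition(":")
        if not sep:
            raise ValueError("serialised digest must include an algorithm prefix")
        return cls(algorithm, hex_)

    def __str__(self) -> str:
        return f"{self.algorithm}:{self.hex}"
```

Validation in `__post_init__` means an invalid `Digest` cannot be constructed at all, whether through `parse` or directly.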
## Status of the original blueprint pin
The pre-cleanup blueprint's `artifact_files.sha256` column is replaced by
`digest_algorithm`, `digest_primary`, `digest_sha256`. The pre-cleanup
blueprint's implicit path-keyed storage is replaced by content-keyed
storage. These changes are absorbed into `docs/ARCHITECTURE-BLUEPRINT.md`.

@@ -0,0 +1,76 @@
# ADR-0002 — Append-Only Event Log as Source of Truth
Status: accepted
Date: 2026-05-15
Related: `docs/PLATFORM-AMBITION.md` commitment A3
## Context
The original blueprint defines `audit_events` and `retention_events` as
separate tables. Both are useful, but neither is a complete authoritative
record of how registry state was produced. Several downstream needs share
one underlying primitive:
- audit (who did what when, with what result),
- change-data-capture feed for downstream consumers (Statehub, search),
- replication and federation between instances,
- point-in-time replay and disaster recovery,
- materialised view rebuilds when schemas evolve.
Each can be served by an append-only log of registry events with a
monotonic sequence number. Two separate tables cannot.
## Decision
1. The registry persists an append-only `events` table. Every
   state-changing operation writes one row in the same database
   transaction as the operation. Once written, rows are immutable.
2. Each row has a strictly monotonic, gapless sequence number scoped to
the registry instance, and a UTC ingest timestamp.
3. The current `artifact_packages`, `artifact_files`, `storage_locations`,
and `retention_state` tables are materialised views over `events`.
They are rebuildable by replay.
4. Event payloads are stored as canonical CBOR (ADR-0003), keyed by
`event_type` (string slug). The `event_type` namespace is versioned
(`v1.package.created`, `v1.file.ingested`, `v1.retention.extended`,
etc.).
5. `audit_events` and `retention_events` cease to exist as standalone
tables; their semantics are subsets of `events` filtered by
`event_type`.
## Consequences
Positive:
- One primitive serves audit, CDC, replication, replay, and rebuild.
- A consumer can tail by `sequence > N` and never miss an event.
- Forward-compatibility: new view columns can be derived from existing
  events by adding a replay path; the `events` table itself needs no
  migration.
- Signed event chains are reachable later by adding a signature column.
Negative:
- Replays cost wall-clock time on large datasets. Snapshots of
materialised views (with the highest applied sequence stamped on them)
are used to bound replay cost.
- Schema migrations on materialised views still happen; they just no
longer touch the source of truth.
- Discipline required: any write that bypasses the event log is a bug.
Enforced by code review and a runtime invariant check on the
materialised tables.
## Implementation notes
- `events` schema (v1):
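  - `sequence BIGSERIAL PRIMARY KEY` (a bare sequence is neither gapless
    nor commit-ordered: rolled-back transactions consume values and
    concurrent commits can land out of order, so event appends must be
    serialised, e.g. with a transaction-scoped advisory lock, to satisfy
    point 2 of the Decision)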
- `created_at TIMESTAMPTZ NOT NULL DEFAULT now()`
- `event_type TEXT NOT NULL`
  - `subject_kind TEXT NOT NULL` (one of `package`, `file`, `retention`,
    `storage`, `system`)
- `subject_id UUID` — nullable for system-level events
- `actor TEXT NOT NULL` — producer or operator identity
- `payload BYTEA NOT NULL` — canonical CBOR
- `payload_digest BYTEA NOT NULL` — BLAKE3 of `payload`
- Indexes: `(subject_kind, subject_id)`, `(event_type, sequence)`.
- Replay tool ships in v1 as a CLI subcommand (`artifactstore replay`).
- Outbound CDC stream (NATS / Kafka) is its own workplan; v1 only exposes
long-poll over `GET /events?since=<sequence>`.

@@ -0,0 +1,78 @@
# ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.2)
Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0006, `docs/PLATFORM-AMBITION.md` commitment A4
## Context
Manifests describe a package's identity, contents, retention, and
provenance. They are the durable, portable, signable summary of a package.
Three downstream features depend on byte-identical manifest serialisation:
1. Manifest digest (used as the package's content address — ADR-0001).
2. Signatures (cosign, Sigstore, in-toto, SLSA).
3. Cross-language / cross-version reproducibility (any client must be
able to verify a manifest produced by any other client).
JSON does not guarantee byte-identical output without an explicit
canonicalisation profile. The candidates are:
- **JCS** (JSON Canonicalization Scheme, RFC 8785) — JSON-shaped, widely
available, text-format, signs cleanly.
- **Canonical CBOR** (RFC 8949 §4.2.2) — binary, smaller, lower overhead
  to canonicalise, and the native encoding of COSE (RFC 9052); cosign /
  Sigstore can sign its bytes as opaque blobs.
- **DAG-CBOR** (IPLD profile) — canonical CBOR plus content-addressing
conventions; useful if we later integrate with IPLD/IPFS, but pulls in
ecosystem assumptions we don't yet need.
Canonical CBOR wins on size, parser surface, and direct compatibility
with the tooling we will adopt for signing (`docs/PLATFORM-AMBITION.md`
commitments A4, A9). JCS is a reasonable alternative; we keep an emit-JCS
path for human-readable display, but the signed form is CBOR.
## Decision
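1. Manifests are serialised as **canonical CBOR**, i.e. deterministically
   encoded CBOR per RFC 8949 §4.2 (the core requirements of §4.2.1 plus
   the choices §4.2.2 leaves to the application):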
- definite-length encoding throughout,
- shortest-form integer encoding,
- map keys sorted bytewise lexicographically,
- no floating-point unless explicitly required (we do not require it),
- no semantic tags except those we explicitly enumerate.
2. The manifest's content address is `blake3:<hex>` of its canonical
CBOR bytes. This is the package's primary identifier in storage.
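3. A canonical JSON projection (JCS, RFC 8785) of the same manifest is
   available for display, signing-tool interop, and human inspection. The
   projection is deterministic: round-tripping through it must yield
   byte-identical CBOR. This constrains the manifest schema to
   JSON-representable values: text map keys, integers of magnitude below
   2^53, booleans, text, arrays, and maps, with digests carried as text
   rather than CBOR byte strings.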
4. The manifest schema is itself versioned (`manifest_version: 1`).
Unknown fields are preserved on read and re-emitted on write (forward
compatibility); breaking schema changes bump the version.
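
The encoding rules in point 1 are small enough to sketch directly. The encoder below is a stdlib-only illustration, not `cbor2` and not a complete codec. It handles integers, byte strings, text, arrays, maps, booleans and null; it uses shortest-form heads and definite lengths, sorts map entries by the bytewise order of their encoded keys, and rejects floats. `json_projection` only approximates JCS (RFC 8785): Python sorts keys by code point while JCS sorts by UTF-16 code units (the two agree for ASCII keys), and byte strings cannot pass through it at all.

```python
import json
import struct
from typing import Any


def _head(major: int, n: int) -> bytes:
    """Initial byte plus argument, always in the shortest form."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x1_0000_0000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 0x1_0000_0000_0000_0000:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise ValueError("integer too large for a CBOR head (bignums are out of scope)")


def encode(value: Any) -> bytes:
    """Deterministic CBOR for a small, float-free subset of Python values."""
    if value is False:
        return b"\xf4"
    if value is True:
        return b"\xf5"
    if value is None:
        return b"\xf6"
    if isinstance(value, int):
        # Major type 0 for n >= 0; major type 1 encodes -1 - n for negatives.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, (list, tuple)):
        return _head(4, len(value)) + b"".join(encode(item) for item in value)
    if isinstance(value, dict):
        # Sort entries by the bytewise order of each key's own deterministic encoding.
        entries = sorted((encode(k), encode(v)) for k, v in value.items())
        return _head(5, len(entries)) + b"".join(k + v for k, v in entries)
    raise TypeError(f"unsupported type in manifest: {type(value).__name__}")


def json_projection(value: Any) -> str:
    # Approximates JCS (RFC 8785): key order matches JCS only for ASCII keys.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def roundtrips(value: Any) -> bool:
    """Property from this ADR: CBOR -> JSON projection -> CBOR is byte-identical."""
    return encode(json.loads(json_projection(value))) == encode(value)
```

The hex values in the test below match examples from RFC 8949 Appendix A (for instance, `500` encodes as `0x1901f4`).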
## Consequences
Positive:
- Manifests are signable today as opaque bytes by general-purpose tools
  (cosign `sign-blob`, ssh-keygen `-Y sign`) and natively by COSE
  libraries.
- The manifest digest is stable across languages, OS, and compiler.
- Smaller on disk and on the wire than JSON.
- Replay (ADR-0002) is unambiguous because event payloads are also CBOR.
Negative:
- Less human-readable in raw form; the CLI must offer a `pretty` projection.
- One more dependency (a CBOR library). We pin one in ADR-0005.
- Future schema evolution requires the same canonicalisation discipline.
Enforced by a property-based test: any manifest must round-trip
CBOR → JCS → CBOR with byte equality.
## Implementation notes
- v1 library: `cbor2` (PyPI; pure-Python with optional C extension).
Wrapped behind `artifactstore.manifest.codec` so swapping to a faster
impl is transparent.
- JCS projection: `jcs` (PyPI) or hand-rolled — decision deferred to
WP-0001-T003.
- A `Manifest` value class enforces field order on emit, not just on
encode. This catches non-canonical producers at the API boundary.

@@ -0,0 +1,79 @@
# ADR-0004 — Control Plane / Data Plane Contract
Status: accepted
Date: 2026-05-15
Related: ADR-0005, `docs/PLATFORM-AMBITION.md` commitment A5,
`docs/ASSEMBLY-EXPERIMENT.md`
## Context
The platform ambition expects a Rust (eventually asm-tuned) data plane
to handle hot ingest paths — hashing, chunking, optional compression and
encryption, storage backend I/O. The v1 service is written entirely in
Python (ADR-0005). The cost of conflating control and data planes at the
code level is that extracting the data plane later requires API churn,
test rework, and producer migrations.
The cost of separating them now is one named module boundary and one
in-process protocol shape. That cost is essentially free if taken
before any consumer exists.
## Decision
1. The Python package is organised so that *every byte-handling
operation* lives behind a named contract:
- `artifactstore.dataplane.spi` — the abstract surface (typed
dataclasses, async iterator protocols).
- `artifactstore.dataplane.inproc` — the v1 implementation, running
in the same process as the control plane.
2. The control plane (`artifactstore.registry`, `artifactstore.api.http`,
`artifactstore.retention`, `artifactstore.audit`) interacts with
bytes *only* through the SPI. No HTTP handler, no DB writer, no
retention rule ever reads or writes file bytes directly.
3. The SPI exposes exactly these operations:
- `ingest_stream(stream, hints) -> IngestResult` — consumes an
upload, returns content addresses, sizes, and storage receipts.
   - `serve_object(content_address, range?) -> AsyncIterator[bytes]`:
     produces bytes for a download.
- `verify_object(content_address) -> VerifyResult` — re-reads bytes,
re-digests, returns mismatches.
- `delete_object(content_address) -> DeletionResult` — best-effort,
idempotent.
- `backend_health() -> BackendStatus` — readiness, latency, free
capacity.
4. The SPI surface is the contract a future Rust daemon must satisfy.
When that daemon ships, `artifactstore.dataplane.inproc` is replaced
by `artifactstore.dataplane.remote` (a thin gRPC or
framed-bincode-over-Unix-socket client). The control plane sees no
change.
5. SPI parameter and return types, other than the byte streams
   themselves, are CBOR-serialisable today, even when nothing serialises
   them. This lets us toggle to RPC without rewriting types.
## Consequences
Positive:
- The data plane can be rewritten in Rust later with zero API churn.
- Tests can fake the SPI cheaply; integration tests pin the contract.
- The CLI in `artifactstore.cli` is a second consumer of the SPI on
equal footing with the HTTP server.
- Operators with strong embedding requirements can use the in-process
data plane forever; nothing forces the RPC hop.
Negative:
- One extra abstraction layer in v1. Mitigated by the contract being
narrow (five operations).
- Discipline required: PRs that bypass the SPI are rejected. A linter
rule (forbidden import: `artifactstore.api.* -> filesystem`) makes
this mechanical.
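
The forbidden-import rule can be prototyped with the stdlib `ast` module before committing to a particular linter configuration. This sketch assumes a hypothetical ban list of filesystem modules plus the `open()` builtin; a CI step would run it over the files under `src/artifactstore/api/` and fail on any hit. It is a syntactic check only and will not catch filesystem access reached through helper modules.

```python
import ast

# Hypothetical ban list for modules under artifactstore/api/: direct
# filesystem access that should go through the data-plane SPI instead.
FORBIDDEN_MODULES = frozenset({"os", "pathlib", "shutil", "aiofiles"})
FORBIDDEN_CALLS = frozenset({"open"})


def find_violations(source: str, filename: str = "<module>") -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs for banned imports and calls."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # relative imports (level > 0) stay inside the package
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id in FORBIDDEN_CALLS):
            hits.append((node.lineno, node.func.id))
            continue
        else:
            continue
        hits.extend((node.lineno, name) for name in names
                    if name.split(".")[0] in FORBIDDEN_MODULES)
    return sorted(hits)
```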
## Implementation notes
- The SPI is a `Protocol` (typing.Protocol) in `dataplane/spi.py` so the
in-process and future remote impls don't share an inheritance tree.
- Streaming returns `AsyncIterator[bytes]`, so downloads never require
  full-file buffering. Zero-copy `sendfile()` cannot pass through this
  signature; if profiling demands zero-copy, it needs its own SPI
  operation.
- The `IngestResult` payload is the canonical CBOR-able value used in
events (ADR-0002). The same byte sequence flows API → SPI → event.

@@ -0,0 +1,117 @@
# ADR-0005 — V1 Technology Stack
Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0002, ADR-0003, ADR-0004
## Context
WP-0001 ("Foundation") cannot start without a pinned stack. The decision
needs to balance:
- ffmpeg / VLC philosophy: minimal dependency budget, sharp boundaries,
native code at the hot edges, plain tools.
- Python is already implied by `.gitignore` and ecosystem fit (StateHub,
guide-board, open-cmis-tck are all Python-leaning).
- The data plane will eventually be Rust (ADR-0004); the control plane
stays in Python and must stay approachable.
## Decision
| Concern | Choice | Rationale |
|---|---|---|
| Language (control plane) | **Python 3.12+** | Async ecosystem, type hints, matches sibling repos. 3.12 specifically: PEP 695 generics, faster CPython, `sys.monitoring`. |
| Package / project manager | **uv** | Single static binary, fast resolver, lockfile-first, replaces `pip + pip-tools + venv + pipx` in one tool. |
| Build backend | **hatchling** (via `pyproject.toml`) | Standards-track PEP 517 backend. No magic. |
| HTTP framework | **FastAPI** (Starlette + Pydantic v2) | OpenAPI generation, async-native, broad community. |
| ASGI server | **uvicorn** (dev), **gunicorn + uvicorn workers** (prod) | Plain, well-understood. |
| Database (prod) | **PostgreSQL 16+** | Source-of-truth event log (ADR-0002) wants `BIGSERIAL`, `BYTEA`, advisory locks, logical replication. |
| Database (dev/embedded) | **SQLite (WAL mode)** | Zero-dependency local. Schema is portable when we use SQLAlchemy Core. |
| DB access | **SQLAlchemy 2.0 Core** + **asyncpg** (prod) / **aiosqlite** (dev) | Core, not ORM — explicit SQL, async drivers. Migrations live below the API surface. |
| Migrations | **Alembic** | Standard, integrates with SQLAlchemy Core, supports both pg and sqlite. |
| Hashing | stdlib **`hashlib`** for SHA-256, **`blake3`** PyPI wheel for BLAKE3 | `blake3` wheel embeds the SIMD-tuned Rust impl with no build-time toolchain. |
| Serialisation | **`cbor2`** for canonical CBOR (ADR-0003); `jcs` PyPI or a hand-rolled encoder for the JCS projection | Smallest deps that satisfy ADR-0003. Stdlib `json` alone is not JCS-conformant (key ordering, number formatting). |
| CLI | **typer** (atop click) | Same author as FastAPI; derives the CLI surface from type hints, matching the FastAPI style. |
| Tests | **pytest** + **httpx** + **pytest-asyncio** (asyncio only, no trio) | Standard. |
| Lint / format | **ruff** (lint + format) | One tool replaces black + isort + flake8 + pyupgrade. |
| Type checker | **mypy** in `--strict` | Pyright is acceptable for editor support; CI gate is mypy. |
| Logging | stdlib `logging` + `structlog` for structured output | No exotic deps. |
| Metrics / tracing | OpenTelemetry SDK (deferred to its own workplan) | Listed for forward-compatibility; not a v1 dep. |
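
For the dev/embedded database row, a sketch of opening SQLite in WAL mode with stdlib `sqlite3`. The pragmas are standard SQLite ones; the function name, the 5-second busy timeout, and `synchronous=NORMAL` are illustrative choices, not settled configuration. In the service itself, connections would go through SQLAlchemy Core with `aiosqlite`. WAL needs a file-backed database: `PRAGMA journal_mode=WAL` on `:memory:` reports `memory`, which the sketch treats as an error.

```python
import sqlite3


def open_dev_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # journal_mode returns the mode actually in effect, so check it.
    (mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"WAL not enabled (journal_mode={mode!r}); use a file-backed database")
    conn.execute("PRAGMA foreign_keys=ON")     # off by default in SQLite
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s for a lock instead of failing
    conn.execute("PRAGMA synchronous=NORMAL")  # a common pairing with WAL
    return conn
```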
### Project layout
```
artifact-store/
├── pyproject.toml
├── uv.lock
├── Makefile # thin shim: make dev / test / lint / type / migrate
├── alembic.ini
├── src/
│ └── artifactstore/
│ ├── __init__.py
│ ├── identity/ # content address, digest abstraction (ADR-0001)
│ ├── manifest/ # canonical CBOR, JCS projection (ADR-0003)
│ ├── events/ # append-only log + replayer (ADR-0002)
│ ├── retention/ # policy engine
│ ├── audit/ # audit emission as event subset
│ ├── storage/ # adapter SPI + backend registry
│ │ ├── spi.py
│ │ └── backends/
│ │ ├── local.py # filesystem backend
│ │ └── s3.py # placeholder, WP-0004
│ ├── dataplane/ # SPI + in-process impl (ADR-0004)
│ │ ├── spi.py
│ │ └── inproc.py
│ ├── registry/ # high-level orchestrator
│ ├── api/
│ │ └── http/ # FastAPI app
│ ├── cli/ # typer CLI (thin)
│ └── config.py
├── tests/
│ ├── unit/
│ ├── integration/
│ └── conftest.py
├── migrations/ # alembic
└── docs/
```
### Commands (T001 acceptance)
```
make dev # uvicorn with reload, sqlite backend, local FS storage
make test # pytest -q
make lint # ruff check + ruff format --check
make type # mypy --strict src tests
make migrate # alembic upgrade head
artifactstore # CLI entry point installed by uv
```
## Consequences
Positive:
- Dependency budget is small and each dep is best-in-class for its slot.
- The same toolchain works on Linux, macOS, and CI without special cases.
- `uv.lock` is checked in; builds are reproducible.
- Every layer maps one-to-one to a docs concept (identity, manifest,
events, dataplane, etc.), so the codebase remains navigable.
Negative:
- Pydantic v2 is the heaviest non-DB dep; acceptable for the OpenAPI win.
- Choosing SQLAlchemy Core over ORM costs some convenience; we accept
it because explicit SQL is easier to migrate to Rust later (ADR-0004).
- mypy `--strict` is a per-PR tax; bounded by keeping the codebase small.
## Revision policy
This ADR is the most likely candidate for revision once we have profile
data from real ingestion. Candidates we are already watching:
- Replace `cbor2` with a Rust-backed CBOR codec if profile shows it on
the hot path.
- Replace `uvicorn` with `granian` (Rust ASGI server) if perf demands.
- Replace `SQLAlchemy Core` with raw `asyncpg` + a tiny query builder
if Core's abstractions show up in flame graphs.
Each replacement is its own ADR. None of them are v1 work.

@@ -0,0 +1,69 @@
# ADR-0006 — OCI Artifact Compatibility Kept Reachable
Status: accepted
Date: 2026-05-15
Related: ADR-0001, ADR-0003, `docs/PLATFORM-AMBITION.md` commitment A9
## Context
The OCI Distribution Specification and the OCI Image Manifest (which,
since image-spec v1.1, carries non-image artifacts via `artifactType`)
define a widely-deployed wire format for content-addressed artifact
exchange.
The ecosystem includes `oras`, `cosign`, `crane`, Helm, ChartMuseum,
ML-model packaging tools, and most container registries. Compatibility
with this ecosystem is the single highest-leverage opportunity in
`docs/PLATFORM-AMBITION.md`.
We do not implement OCI compatibility in v1. We do refuse to take any
v1 decision that prevents it.
## Decision
1. The internal data model is structurally compatible with an OCI
artifact manifest. Concretely:
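   - Storage addresses content as `<algorithm>:<lowercase-hex>`
     (ADR-0001). OCI's digest grammar is `<algorithm>:<encoded>`, and its
     registered algorithms (sha256, sha512) require lowercase hex, so this
     shape fits.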
- Manifests have a `config` blob plus an ordered list of `layers`,
each with `mediaType`, `digest`, `size`, and optional
`annotations`. Our `Manifest` value class includes all of these
fields, even when v1 has no use for `mediaType` or `annotations`.
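   - Manifest serialisation produces byte-identical output across
     callers (ADR-0003), so manifest digests are stable. OCI addresses a
     manifest by the digest of its exact (JSON) bytes, so the future OCI
     surface needs a deterministic JSON rendering such as the JCS
     projection.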
2. The native API may be richer than OCI, but v1 reviews every schema
change against the OCI spec and rejects changes that would block
later OCI compatibility.
3. A future `/v2/` namespace will speak the OCI Distribution Spec on
top of the same storage. This is its own workplan; it does not
modify v1 endpoints, only add new ones.
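
A sketch of the OCI-shaped descriptor and manifest fields named in point 1, with a digest check following the OCI image-spec grammar (`algorithm:encoded`). The two registered algorithms the sketch knows about (sha256, sha512) are checked strictly as fixed-length lowercase hex; other algorithm names are accepted on the general grammar. Class and method names are hypothetical, and `to_oci()` returns a plain dict rather than serialised JSON. `schemaVersion: 2` and the `application/vnd.oci.image.manifest.v1+json` media type are the values the OCI image manifest uses.

```python
import re
from dataclasses import dataclass, field
from typing import Any

# Digest grammar from the OCI image-spec: algorithm ":" encoded.
_DIGEST = re.compile(r"[a-z0-9]+(?:[+._-][a-z0-9]+)*:[a-zA-Z0-9=_-]+")
# Registered algorithms are stricter: fixed-length lowercase hex.
_REGISTERED = {
    "sha256": re.compile(r"[a-f0-9]{64}"),
    "sha512": re.compile(r"[a-f0-9]{128}"),
}


def is_oci_digest(value: str) -> bool:
    if _DIGEST.fullmatch(value) is None:
        return False
    algorithm, encoded = value.split(":", 1)
    rule = _REGISTERED.get(algorithm)
    return rule is None or rule.fullmatch(encoded) is not None


@dataclass(frozen=True)
class Descriptor:
    media_type: str
    digest: str
    size: int
    annotations: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not is_oci_digest(self.digest):
            raise ValueError(f"not an OCI-shaped digest: {self.digest!r}")
        if self.size < 0:
            raise ValueError("size must be non-negative")

    def to_oci(self) -> dict[str, Any]:
        out: dict[str, Any] = {"mediaType": self.media_type, "digest": self.digest, "size": self.size}
        if self.annotations:  # optional in OCI; omitted when empty
            out["annotations"] = dict(self.annotations)
        return out


@dataclass(frozen=True)
class Manifest:
    config: Descriptor
    layers: tuple[Descriptor, ...]  # ordered, as in OCI

    def to_oci(self) -> dict[str, Any]:
        return {
            "schemaVersion": 2,
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "config": self.config.to_oci(),
            "layers": [layer.to_oci() for layer in self.layers],
        }
```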
## Consequences
Positive:
- `oras push`, `cosign sign`, `crane copy`, and `helm pull` of OCI-hosted
  charts become reachable additions, not rewrites.
- Customers who already speak OCI can adopt incrementally.
- The `mediaType` discipline forces v1 producers to label their files,
which improves the manifest's value as a portable record.
Negative:
- v1 carries some otherwise-unnecessary manifest fields. Acceptable;
the cost is bytes, not complexity.
- The OCI manifest model uses SHA-256 as the canonical digest in
practice. ADR-0001's `digest_sha256` column satisfies this; the
native primary digest can still be BLAKE3.
## What this ADR does NOT commit to
- It does not commit to implementing OCI Distribution in v1.
- It does not commit to OCI as the *only* wire format. The native API
remains the richer interface.
- It does not commit to specific OCI media types for evidence packages.
Media-type assignment is the subject of a later workplan.
## Review trigger
Every schema-affecting workplan (anything that touches the data model
or the manifest shape) must include an explicit one-paragraph review
against this ADR. Reject changes that introduce OCI-incompatible
invariants without superseding this ADR.

32
docs/adr/README.md Normal file
@@ -0,0 +1,32 @@
# Architecture Decision Records
This directory holds the architectural decisions that govern `artifact-store`.
Each ADR is a small Markdown file with a status (`proposed`, `accepted`,
`superseded`, `deprecated`), a concise statement of the decision, the
forces that pushed it, and the consequences.
ADRs are the canonical home for "we are doing X" statements that survive
multiple workplans. `INTENT.md` says what we build; `SCOPE.md` says where
the boundary is; `docs/PLATFORM-AMBITION.md` says where we are pointed;
ADRs say how — and they are the only document that records a *changeable*
decision in a form that can be superseded cleanly.
Workplans cite the ADRs they depend on. The architecture blueprint cites
the ADRs it operationalises.
## Index
- [ADR-0001 — Content-Addressed Storage with Dual Digest](0001-content-addressed-storage.md) — accepted
- [ADR-0002 — Append-Only Event Log as Source of Truth](0002-event-log-source-of-truth.md) — accepted
- [ADR-0003 — Manifest Canonicalisation = Canonical CBOR (RFC 8949 §4.2.2)](0003-manifest-canonical-cbor.md) — accepted
- [ADR-0004 — Control Plane / Data Plane Contract](0004-control-plane-data-plane-contract.md) — accepted
- [ADR-0005 — V1 Technology Stack](0005-v1-tech-stack.md) — accepted
- [ADR-0006 — OCI Artifact Compatibility Kept Reachable](0006-oci-compatibility-reachable.md) — accepted
## Conventions
- Filenames: `NNNN-kebab-case-slug.md`, numbered in acceptance order.
- Status transitions: `proposed → accepted → (superseded | deprecated)`.
- Supersession is explicit: the new ADR links the old; the old ADR links
forward and changes status. Never delete an ADR.
- Each ADR is short. If it is long, it is wrong: split it.