docs/OPERATOR.md (new): runbook with prerequisites, quick start,
environment variables, SQLite + PostgreSQL setup, storage layout,
CLI reference, HTTP /health reference, an end-to-end Python smoke
test (create_package -> ingest_file -> finalize -> manifest), the
replay / disaster-recovery procedure, common failure modes, and a
References section that links every ADR (0001..0006), the
blueprint, platform ambition, roadmap, and assembly experiment.
README.md: refreshed to v0.1 baseline status. Quick-start uses the
real flow (uv sync, migrate-fresh, dev, /health, artifactstore health).
Make targets and CLI commands tabulated. Links docs/OPERATOR.md.
AGENTS.md: Current Repo Shape now reflects the landed scaffold +
library + CLI + HTTP app rather than "no runnable scaffold yet";
links OPERATOR.md and lists the canonical local commands.
workplans/ARTIFACT-STORE-WP-0001-service-baseline.md:
- T008 marked done.
- frontmatter status: active -> done; updated: 2026-05-16.
All ten WP-0001 tasks are now done (T001/T002/T003/T008/T009/T010/
T011/T012/T013/T014). Foundation workplan retires.
Gates: ruff clean, mypy --strict clean, 83 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/artifactstore/app.py (new): composition root. build_registry(settings)
wires AsyncEngine + LocalBackend + InProcessDataPlane + RegistryViewWriter
into a Registry. Used by both the HTTP app and the CLI.
src/artifactstore/registry/__init__.py: adds db_health() (SELECT 1 probe),
backend_health() (pass-through to dataplane), and dispose() (engine
shutdown) helpers so the HTTP /health endpoint and CLI commands can talk
to the registry without reaching for private state.
src/artifactstore/api/http/__init__.py:
- create_app(settings=None) factory; lifespan owns the registry instance
and disposes it on shutdown.
- GET / returns the scaffold banner.
- GET /health reports overall status + db {healthy, detail} + backend
{backend_id, healthy, detail, free_bytes, total_bytes}. Uses
FastAPI Depends() with a helper that reads request.state.registry rather
than reaching into app.state directly.
- Module-level `app = create_app()` so `uvicorn artifactstore.api.http:app`
keeps working.
src/artifactstore/cli/__init__.py:
- migrate: `alembic upgrade head` via the alembic command API.
- replay: drops + rebuilds materialised views from the event log; prints
the highest applied sequence.
- health: prints the same payload as the HTTP /health endpoint, as JSON.
- version unchanged.
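
A minimal sketch of the shared health payload, assuming the field names listed for GET /health above. The dataclass names, the health_payload() helper and the "degraded" label are hypothetical; only status=ok and the field names come from this commit.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DbHealth:
    healthy: bool
    detail: str


@dataclass(frozen=True)
class BackendHealth:
    backend_id: str
    healthy: bool
    detail: str
    free_bytes: int | None
    total_bytes: int | None


def health_payload(db: DbHealth, backend: BackendHealth) -> dict[str, object]:
    """Combine both probes into one payload (hypothetical helper)."""
    # "ok" is the status named in the commit; "degraded" is an assumed label.
    status = "ok" if db.healthy and backend.healthy else "degraded"
    return {"status": status, "db": asdict(db), "backend": asdict(backend)}


payload = health_payload(
    DbHealth(healthy=True, detail="SELECT 1 succeeded"),
    BackendHealth("local", True, "root present", 10_000, 50_000),
)
# The CLI `health` command would print the same dict as JSON.
print(json.dumps(payload, indent=2))
```

Building the dict in one place is what lets the HTTP endpoint and the CLI command stay in sync.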
Tests:
- tests/integration/test_http_health.py (TestClient-based): GET / returns
the scaffold banner; GET /health reports ok with db.healthy +
backend.healthy + free_bytes populated.
- tests/integration/test_cli_commands.py (typer CliRunner): version
prints; migrate creates the schema (events + retention_classes +
alembic_version); replay against an empty log exits ok with
"replayed up to sequence 0"; health prints a status=ok JSON payload.
Gates: ruff clean, mypy --strict clean on 48 files, 83 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/artifactstore/registry/__init__.py implements the Registry class with
six operations the HTTP API and CLI both consume:
* create_package(name, producer, subject, retention_class, actor, metadata?)
-> UUID. Validates retention_class against the seed table; emits
v1.package.created with CBOR payload; applies view in same transaction.
* ingest_file(package_id, relative_path, media_type, stream, actor) -> UUID.
Validates the package is in 'created' status and rejects duplicate
relative_path. Calls dataplane.ingest_stream (which dual-hashes and writes
to the backend). Emits v1.file.ingested whose payload carries the file
metadata + storage receipt + deterministic storage_location_id so replay
reproduces UUIDs. View handler in events/views.py inserts artifact_files
+ storage_locations and bumps last_event_sequence on the package.
* finalize_package(package_id, actor) -> ContentAddress. Queries the views
to build a Manifest dataclass, encodes it as canonical CBOR, computes the
BLAKE3 content address, and writes v1.package.finalized whose payload IS
the canonical CBOR manifest. The view handler now records
manifest_digest = event.payload_digest (BLAKE3 of the manifest), not a
separate field parsed from the payload.
* get_manifest_bytes(package_id, format='cbor'|'json') -> bytes. Reads the
finalize event payload (CBOR) and optionally projects to JCS.
* get_file(file_id) -> AsyncIterator[bytes]. Looks up the storage location
and serves bytes via the data plane.
* tail_events(since_sequence, poll_interval_seconds) -> AsyncIterator[Event].
Pass-through to events.tail.
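
One way to get a deterministic storage_location_id, as a sketch only: derive it with UUIDv5 from values already carried in the event payload. The namespace string, the function signature and the choice of inputs are assumptions, not the repository's actual derivation.

```python
import uuid

# Hypothetical namespace; the real project may derive IDs differently.
STORAGE_LOCATION_NAMESPACE = uuid.uuid5(
    uuid.NAMESPACE_URL, "artifactstore:storage-location"
)


def storage_location_id(
    file_id: uuid.UUID, backend_id: str, content_address: str
) -> uuid.UUID:
    """Return the same UUID for the same inputs, on every run."""
    name = f"{file_id}:{backend_id}:{content_address}"
    return uuid.uuid5(STORAGE_LOCATION_NAMESPACE, name)


file_id = uuid.UUID("00000000-0000-0000-0000-000000000001")
first = storage_location_id(file_id, "local", "blake3:abcd")
second = storage_location_id(file_id, "local", "blake3:abcd")
```

Because the ID is a pure function of payload fields, a replay that re-applies v1.file.ingested would recompute the same value instead of minting a fresh random UUID.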
src/artifactstore/events/views.py:
- New v1.file.ingested handler.
- v1.package.finalized handler updated: manifest_digest now derived from
event.payload_digest (= BLAKE3 of the canonical CBOR manifest payload).
- All inserts now pass created_at=event.created_at explicitly so replay
produces byte-identical materialised state (server_default=now() was
firing fresh on each replay insert).
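
The created_at fix can be illustrated with a small sqlite3 sketch. The table shape, the materialise() helper and the `now` argument (standing in for a server-side now() default) are hypothetical.

```python
import sqlite3

# (sequence, package name, event.created_at) tuples; illustrative data only.
EVENTS = [
    (1, "pkg-a", "2026-05-16T10:00:00+00:00"),
    (2, "pkg-b", "2026-05-16T10:00:05+00:00"),
]


def materialise(events, now: str, explicit: bool) -> list[tuple[str, str]]:
    """Build the view; `now` simulates what a server default would stamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artifact_packages "
        "(name TEXT PRIMARY KEY, created_at TEXT NOT NULL)"
    )
    for _sequence, name, event_created_at in events:
        # explicit=True copies the timestamp from the event itself.
        created_at = event_created_at if explicit else now
        conn.execute(
            "INSERT INTO artifact_packages VALUES (?, ?)", (name, created_at)
        )
    rows = conn.execute(
        "SELECT name, created_at FROM artifact_packages ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


original = materialise(EVENTS, now="2026-05-16T10:00:05+00:00", explicit=True)
replayed = materialise(EVENTS, now="2026-06-01T00:00:00+00:00", explicit=True)
```

With explicit=True the two runs agree regardless of when the replay happens; with explicit=False every row would carry the replay time instead.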
tests/integration/test_registry.py (7 cases):
- Rejects unknown retention class.
- create_package writes the event and the package row.
- ingest_file writes file + storage_location, populates content_address
with blake3 prefix.
- Duplicate relative_path raises DuplicateRelativePathError.
- ingest into unknown package raises PackageNotFoundError.
- Finalising twice raises IllegalPackageStateError.
- End-to-end: create + ingest 3 files + finalize + read manifest in CBOR
and JSON + download each file with byte equality + tail 5 events + replay
+ assert byte-identical materialised state across pre and post snapshots.
tests/integration/test_event_log.py updated: the v1.package.finalized
replay test now uses the new payload semantics (payload is the canonical
CBOR manifest; manifest_digest = BLAKE3 of payload).
Gates: ruff clean, mypy --strict clean on 45 files, 77 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/artifactstore/dataplane/:
- spi.py: DataPlane Protocol with the five operations ingest_stream,
serve_object, verify_object, delete_object, backend_health
(ADR-0004). Dataclasses: IngestHints (size_hint, primary_algorithm,
backend_id overrides), IngestResult (primary_digest + sha256_digest +
size_bytes + StorageReceipt), VerifyResult (verified bool, mismatch
reason, actual digests + size).
- inproc.py: InProcessDataPlane wraps one StorageBackend. ingest_stream
is two-pass against a tempfile (drain stream while dual-hashing into
BLAKE3+SHA-256, then forward the tempfile to backend.put under the
primary content address); the tempfile is fsynced and is removed on exception. serve_object
passes byte ranges through; verify_object re-reads bytes via backend.get,
re-digests with the stored algorithm, and reports mismatches. delete
and health are thin pass-throughs.
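
The first pass of ingest_stream can be sketched as below, under loud assumptions: hashlib.blake2b stands in for BLAKE3 (which needs the third-party blake3 package), the stream is a synchronous iterable, and drain_and_hash() and DualDigest are hypothetical names.

```python
import hashlib
import os
import tempfile
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DualDigest:
    primary_hex: str  # BLAKE2b here, standing in for BLAKE3
    sha256_hex: str
    size_bytes: int


def drain_and_hash(stream: Iterable[bytes], tmp_dir: str) -> tuple[str, DualDigest]:
    """Write the stream to a tempfile while feeding both hashers."""
    primary = hashlib.blake2b(digest_size=32)  # stand-in, NOT real BLAKE3
    sha256 = hashlib.sha256()
    size = 0
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in stream:
                primary.update(chunk)
                sha256.update(chunk)
                tmp.write(chunk)
                size += len(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the bytes durable before handing off
    except BaseException:
        os.unlink(tmp_path)  # never leave a partial tempfile behind
        raise
    return tmp_path, DualDigest(primary.hexdigest(), sha256.hexdigest(), size)
```

The second pass would hand the tempfile to the backend's put under the primary content address; that step is omitted because it depends on the StorageBackend in use.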
tests/unit/test_dataplane_inproc.py (11 cases):
- ingest_stream computes correct dual digests, returns receipt, stores
bytes at the content-addressed path.
- empty-input ingest returns the BLAKE3/SHA-256 of empty.
- serve_object round-trips ingested bytes; supports byte_range.
- verify_object verifies intact bytes; detects on-disk corruption.
- delete_object passes through (True then False).
- backend_health passes through.
- IngestHints override of primary_algorithm (sha256-as-primary path).
- Missing-object serve raises ObjectNotFoundError.
- Architectural test (ADR-0004 invariant): no control-plane module
(api / registry / retention / audit) imports
artifactstore.storage.backends.* or artifactstore.dataplane.inproc
directly. Enforced via AST scan of every .py file in those packages.
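
A sketch of how such an AST scan might look for a single source string. The forbidden_imports() helper is hypothetical, and relative imports are ignored here for brevity; the real test walks every .py file in the control-plane packages.

```python
import ast

# Module prefixes the control plane must not import (from the ADR-0004 rule).
FORBIDDEN_PREFIXES = (
    "artifactstore.storage.backends",
    "artifactstore.dataplane.inproc",
)


def forbidden_imports(source: str) -> list[str]:
    """Return the sorted forbidden module names imported by `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Cover both `from pkg.mod import x` and `from pkg import mod`.
            names = [node.module]
            names += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                found.add(name)
    return sorted(found)


violations = forbidden_imports("from artifactstore.dataplane import inproc\n")
```

A test built on this would read each file under api / registry / retention / audit and assert that the returned list is empty.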
Gates: ruff clean, mypy --strict clean on 44 files, 70 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/artifactstore/storage/:
- spi.py: StorageBackend Protocol (backend_id, put, get, head, delete,
health) and result dataclasses (StorageReceipt, StorageObjectMetadata,
DeletionResult, BackendStatus). ObjectNotFoundError exception type.
- registry.py: backend lookup by string ID (register/get/list_backends/
clear) per ADR-0004.
- backends/local.py: LocalBackend implementation.
* Object layout <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>.
* Atomic writes: tmpfile + fsync + rename (idempotent re-puts drain the
stream without rewriting).
* Defence in depth: resolves the final path and asserts it remains under
the configured root.
* Range reads honour HTTP-style inclusive (start, end) tuples.
* health() returns disk usage via shutil.disk_usage and surfaces an
unhealthy status when the root has disappeared.
* delete() cleans up emptied shard directories opportunistically.
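
The layout, containment check and atomic write described above can be sketched as follows. object_path() and atomic_put() are hypothetical helpers that take bytes rather than a stream; the real LocalBackend may differ in detail.

```python
import os
import tempfile
from pathlib import Path


def object_path(root: Path, algorithm: str, hex_digest: str) -> Path:
    """Map a digest to <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>."""
    candidate = root / algorithm / hex_digest[0:2] / hex_digest[2:4] / hex_digest
    resolved = candidate.resolve()
    # Defence in depth: refuse anything that resolves outside the root.
    if not resolved.is_relative_to(root.resolve()):
        raise ValueError("object path escapes the storage root")
    return resolved


def atomic_put(root: Path, algorithm: str, hex_digest: str, data: bytes) -> Path:
    """Write via tmpfile + fsync + rename; an existing object is left as-is."""
    final = object_path(root, algorithm, hex_digest)
    if final.exists():
        return final  # idempotent re-put: do not rewrite
    final.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=final.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, final)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final
```

Creating the tempfile in the destination directory keeps the rename on one filesystem, which is what makes os.replace atomic.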
tests/unit/test_storage_local.py (14 cases): put/get round-trip; object
key layout matches blueprint; head returns metadata; head/get missing
raise ObjectNotFoundError; put is idempotent; delete returns True then
False; range read returns subrange; range read rejects invalid range;
health reports disk usage; health reports unhealthy when root vanished;
ContentAddress validation blocks path-traversal-flavoured inputs;
registry register/get/list/clear round-trip; idempotent re-put leaves
bytes intact.
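
For the range-read cases, a sketch of HTTP-style inclusive (start, end) semantics on an in-memory buffer. slice_inclusive() is a hypothetical helper, and the validation rules (negative start, end before start, start past the end of the data) are assumptions about what counts as an invalid range.

```python
def slice_inclusive(data: bytes, byte_range: tuple[int, int]) -> bytes:
    """Return data[start..end] with BOTH endpoints included."""
    start, end = byte_range
    if start < 0 or end < start or start >= len(data):
        raise ValueError(f"invalid byte range {byte_range!r} for {len(data)} bytes")
    # Python slices exclude the stop index, hence end + 1.
    return data[start : end + 1]


subrange = slice_inclusive(b"0123456789", (2, 5))
```

The off-by-one is the usual trap here: an inclusive (2, 5) range covers four bytes, not three.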
Gates: ruff clean, mypy --strict clean on 41 files, 59 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/artifactstore/events/:
- model.py: Event frozen dataclass (event_type, subject_kind, subject_id,
actor, payload, payload_digest; sequence + created_at populated by the
DB on write). make_event() helper computes payload_digest as raw BLAKE3
(32 bytes) of payload. ViewWriter Protocol with reset() + apply().
- log.py:
* write(connection, event) — inserts one row in the caller's transaction
and returns Event with sequence + created_at populated via RETURNING.
* fetch_since(connection, since_sequence, limit) — read events after a
cursor in order.
* tail(engine, since_sequence) — async-iterator long-poll over the log;
SQLite uses interval polling, PG LISTEN/NOTIFY is a future workplan.
* replay(engine, view_writer, reset=True) — drains the event log through
a ViewWriter inside one transaction; returns the highest sequence
applied.
- views.py: RegistryViewWriter — canonical event handlers shared by direct
write and replay paths. Ships handlers for v1.package.created (inserts
artifact_packages + retention_state) and v1.package.finalized (updates
status, finalized_at, manifest_digest). Unknown event types tolerated;
additional handlers register here as later tasks land.
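
The reset() + apply() contract and the replay loop can be sketched in memory. The Event fields are trimmed, DictViewWriter is a hypothetical stand-in for RegistryViewWriter, and the 'finalized' status string is an assumption.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Event:
    sequence: int
    event_type: str
    subject_id: str


class ViewWriter(Protocol):
    def reset(self) -> None: ...
    def apply(self, event: Event) -> None: ...


class DictViewWriter:
    """Keeps package status in a dict instead of database tables."""

    def __init__(self) -> None:
        self.packages: dict[str, str] = {}

    def reset(self) -> None:
        self.packages.clear()

    def apply(self, event: Event) -> None:
        if event.event_type == "v1.package.created":
            self.packages[event.subject_id] = "created"
        elif event.event_type == "v1.package.finalized":
            self.packages[event.subject_id] = "finalized"  # assumed status value
        # Any other event type is tolerated as a no-op.


def replay(log: Iterable[Event], view_writer: ViewWriter, reset: bool = True) -> int:
    """Apply events in sequence order; return the highest sequence applied."""
    if reset:
        view_writer.reset()
    highest = 0
    for event in sorted(log, key=lambda e: e.sequence):
        view_writer.apply(event)
        highest = event.sequence
    return highest
```

Routing both the direct write path and replay through the same apply() is what makes the two paths comparable state for state.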
src/artifactstore/db/schema.py: events.sequence type is now
BigInteger().with_variant(Integer(), 'sqlite') so SQLite's autoincrement
(INTEGER PRIMARY KEY rowid alias) works while PostgreSQL keeps BIGSERIAL.
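
The SQLite behaviour behind this change can be reproduced with raw DDL. This sketch uses plain sqlite3 rather than SQLAlchemy, and the two table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY aliases the rowid, so SQLite assigns values itself.
conn.execute(
    "CREATE TABLE events_integer (sequence INTEGER PRIMARY KEY, event_type TEXT)"
)
# BIGINT PRIMARY KEY is an ordinary column: nothing fills it in.
conn.execute(
    "CREATE TABLE events_bigint (sequence BIGINT PRIMARY KEY, event_type TEXT)"
)

for table in ("events_integer", "events_bigint"):
    for event_type in ("v1.package.created", "v1.package.finalized"):
        conn.execute(f"INSERT INTO {table} (event_type) VALUES (?)", (event_type,))

integer_seqs = [
    row[0] for row in conn.execute("SELECT sequence FROM events_integer ORDER BY rowid")
]
bigint_seqs = [
    row[0] for row in conn.execute("SELECT sequence FROM events_bigint ORDER BY rowid")
]
conn.close()
# integer_seqs is [1, 2]; bigint_seqs is [None, None]
```

Only the exact type name INTEGER gets the rowid alias, which is why the schema needs a SQLite-specific variant while PostgreSQL keeps its 64-bit serial column.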
tests/integration/test_event_log.py (7 cases):
- write() assigns monotonic sequence numbers (1, 2, ...) and a created_at.
- fetch_since(since_sequence=2) returns the ordered tail.
- tail() yields events and exits cleanly on consumer break.
- Direct write path (write + apply) and replay path produce byte-identical
materialised state — the key ADR-0002 invariant.
- Replay handles multiple event types (package.created -> finalized).
- Unknown event types are tolerated (no-op apply).
- payload_digest equals BLAKE3 of payload.
Gates: ruff clean, mypy --strict clean on 36 files, 45 tests pass.
make migrate-fresh end-to-end ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Schema (src/artifactstore/db/schema.py):
- events table (ADR-0002 source of truth): sequence BIGSERIAL PK, created_at,
event_type, subject_kind, subject_id, actor, payload (CBOR bytes),
payload_digest. Indexes on (subject_kind, subject_id) and
(event_type, sequence).
- artifact_packages, artifact_files, storage_locations, retention_state
(materialised views over events).
- retention_classes (seed table) and metadata_schemas (config table).
- ADR-0001 columns present: digest_algorithm, digest_primary, digest_sha256,
content_address. Blueprint tiering columns present: retrieval_tier
(default 'hot'), restore_status.
- Types portable: SQLAlchemy 2.0 Core with JSON().with_variant(JSONB, 'postgresql'),
Uuid, LargeBinary, DateTime(timezone=True), Boolean false() default.
Seed (src/artifactstore/db/seed.py): five v1 retention classes (transient,
raw-evidence, summary-evidence, release-evidence, permanent-record) with
default durations in seconds; permanent-record has no expiry.
Alembic:
- alembic.ini with sync sqlite URL default; path_separator=os to silence the
1.16 deprecation warning.
- migrations/env.py: translates async URLs (+aiosqlite, +asyncpg) to sync
counterparts at migrate-time so a single ARTIFACTSTORE_DATABASE_URL works
for both runtime (async) and Alembic (sync).
- migrations/script.py.mako template.
- migrations/versions/20260516_0001_initial.py: metadata.create_all + bulk
insert of retention class seeds.
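
The URL translation in migrations/env.py might look like the sketch below. to_sync_url() is a hypothetical name, and mapping +asyncpg to +psycopg is an assumption based on the psycopg[binary] extra added in this commit.

```python
# Async driver scheme -> sync counterpart (the psycopg target is assumed).
SYNC_SCHEMES = {
    "sqlite+aiosqlite": "sqlite",
    "postgresql+asyncpg": "postgresql+psycopg",
}


def to_sync_url(url: str) -> str:
    """Swap an async driver scheme for its sync one; pass other URLs through."""
    scheme, separator, rest = url.partition("://")
    return f"{SYNC_SCHEMES.get(scheme, scheme)}{separator}{rest}"


runtime_url = "sqlite+aiosqlite:///var/artifactstore.db"  # illustrative path
migrate_url = to_sync_url(runtime_url)
```

Translating at migrate time means operators set ARTIFACTSTORE_DATABASE_URL once, in its async form, and Alembic still gets a driver it can use synchronously.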
Make:
- make migrate: alembic upgrade head (ensures var/ exists).
- make migrate-fresh: drop local SQLite + re-run.
Deps: psycopg[binary] added as optional `postgres` extra (PostgreSQL prod
path; SQLite default for dev needs no extra).
Tests:
- tests/unit/test_db_schema.py: every expected table present; ADR-0001 and
tiering columns present; seed has the five v1 classes; permanent-record
has no default_duration; create_all + FK insert + Boolean default
round-trip on in-memory SQLite.
- tests/integration/test_migrations.py: alembic upgrade head against a
tempfile SQLite produces all tables (+ alembic_version) and the seed rows.
Gates: ruff clean, mypy --strict clean on 32 files, 38 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix-consistency assigned state_hub_task_id and state_hub_workstream_id UUIDs
to the tasks and workplans added in 747afc2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Aligns the v1 architecture with the longer-horizon platform thesis so we can
start implementation without the schema-level inconsistencies the prior
review surfaced.
ADRs (docs/adr/0001..0006): content-addressed dual-digest storage, append-only
event log as source of truth, canonical CBOR manifests, control/data-plane
contract, v1 tech stack (Python 3.12 / uv / FastAPI / SQLAlchemy Core +
asyncpg / Alembic / cbor2 / blake3 / ruff / mypy / pytest / typer), OCI
compatibility kept reachable.
Architecture blueprint rewritten to v2: library-first (ffmpeg-shaped) module
layout, materialised-view data model over the event log, upload-session and
event-stream endpoints pinned, retrieval tiering promoted into the schema.
Roadmap added (docs/ROADMAP.md) with three phases. WP-0001 rewritten as the
Foundation plan (scaffold + kernels + local FS + minimal app). WP-0002..0005
created carrying the existing state_hub_task_ids forward semantically:
ingestion API (T004), retention lifecycle (T005), S3-compatible backend
(T006), guide-board pilot (T007). T001/T002/T003/T008 remain in WP-0001
with refined acceptance.
README and AGENTS.md refreshed to reflect the new repo shape.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the longer-horizon thesis (sovereign-cloud artifact substrate)
alongside the carefully-scoped v1 INTENT. PLATFORM-AMBITION records nine
schema/contract commitments the v1 must preserve to keep that horizon
reachable. ASSEMBLY-EXPERIMENT frames an opt-in research line on
ffmpeg-grade hand-tuned asm with an MIT-0 vs LGPL-aware reuse map.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>