src/artifactstore/dataplane/:
- spi.py: DataPlane Protocol with the five operations ingest_stream,
serve_object, verify_object, delete_object, backend_health
(ADR-0004). Dataclasses: IngestHints (size_hint, primary_algorithm,
backend_id overrides), IngestResult (primary_digest + sha256_digest +
size_bytes + StorageReceipt), VerifyResult (verified bool, mismatch
reason, actual digests + size).
- inproc.py: InProcessDataPlane wraps one StorageBackend. ingest_stream
makes two passes via a tempfile: it drains the stream while dual-hashing
with BLAKE3 and SHA-256, then forwards the tempfile to backend.put under
the primary content address. The tempfile is fsynced, and it is cleaned
up on exception. serve_object passes byte ranges through; verify_object
re-reads bytes via backend.get, re-digests with the stored algorithm, and
reports mismatches. delete_object and backend_health are thin
pass-throughs.
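A minimal sketch of the two-pass ingest described above, not the code in
inproc.py. Assumptions: ingest_two_pass and the put callback are
hypothetical names, and hashlib.blake2b stands in for BLAKE3 because BLAKE3
is not in the standard library.

```python
import hashlib
import os
import tempfile
from typing import BinaryIO, Callable, Iterable, Tuple


def ingest_two_pass(
    chunks: Iterable[bytes],
    put: Callable[[str, BinaryIO], None],
) -> Tuple[str, str, int]:
    """Spool chunks to a tempfile while hashing, then hand the file to put()."""
    # ASSUMPTION: blake2b is a stdlib stand-in for BLAKE3 (third-party package).
    primary = hashlib.blake2b(digest_size=32)
    secondary = hashlib.sha256()
    size = 0
    fd, tmp_path = tempfile.mkstemp()
    try:
        # Pass 1: drain the stream once, feeding both hashers and the tempfile.
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                primary.update(chunk)
                secondary.update(chunk)
                tmp.write(chunk)
                size += len(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Pass 2: the primary digest is now known, so store under that key.
        with open(tmp_path, "rb") as spooled:
            put(primary.hexdigest(), spooled)
        return primary.hexdigest(), secondary.hexdigest(), size
    finally:
        # Remove the tempfile on success and on exception alike.
        os.unlink(tmp_path)
```

The second pass exists because the content address is only known once the
whole stream has been hashed, so the bytes have to be spooled somewhere
before they can be stored under their final key.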
tests/unit/test_dataplane_inproc.py (11 cases):
- ingest_stream computes correct dual digests, returns receipt, stores
bytes at the content-addressed path.
- empty-input ingest returns the BLAKE3/SHA-256 digests of the empty input.
- serve_object round-trips ingested bytes; supports byte_range.
- verify_object verifies intact bytes; detects on-disk corruption.
- delete_object passes through (True then False).
- backend_health passes through.
- IngestHints override of primary_algorithm (sha256-as-primary path).
- Missing-object serve raises ObjectNotFoundError.
- Architectural test (ADR-0004 invariant): no control-plane module
(api / registry / retention / audit) imports
artifactstore.storage.backends.* or artifactstore.dataplane.inproc
directly. Enforced via AST scan of every .py file in those packages.
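The AST scan behind the architectural test could look roughly like the
sketch below. It is an illustration, not the test in the repository:
forbidden_imports is a hypothetical helper, and relative imports are
ignored for brevity.

```python
import ast
from typing import List

# Module prefixes taken from the invariant stated above.
FORBIDDEN_PREFIXES = (
    "artifactstore.storage.backends",
    "artifactstore.dataplane.inproc",
)


def forbidden_imports(source: str) -> List[str]:
    """Return the forbidden module names imported by the given source text."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Also consider "from pkg import sub" as pkg.sub, since sub may be a module.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES):
                hits.append(name)
    return hits
```

A test would read every .py file under the control-plane packages, call
the helper on its contents, and assert that the result is empty.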
Gates: ruff clean, mypy --strict clean on 44 files, 70 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/artifactstore/storage/:
- spi.py: StorageBackend Protocol (backend_id, put, get, head, delete,
health) and result dataclasses (StorageReceipt, StorageObjectMetadata,
DeletionResult, BackendStatus). ObjectNotFoundError exception type.
- registry.py: backend lookup by string ID (register/get/list_backends/
clear) per ADR-0004.
- backends/local.py: LocalBackend implementation.
* Object layout <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex>.
* Atomic writes: tmpfile + fsync + rename (idempotent re-puts drain the
stream without rewriting).
* Defence in depth: resolves the final path and asserts it remains under
the configured root.
* Range reads honour HTTP-style inclusive (start, end) tuples.
* health() returns disk usage via shutil.disk_usage and surfaces an
unhealthy status when the root has disappeared.
* delete() cleans up emptied shard directories opportunistically.
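The layout, containment check, and atomic write listed above can be
sketched as follows. This is an illustration under assumptions, not the
code in backends/local.py: object_path and atomic_put are hypothetical
names, the sketch takes bytes rather than a stream, and
Path.is_relative_to needs Python 3.9 or later.

```python
import os
import tempfile
from pathlib import Path


def object_path(root: Path, algorithm: str, hex_digest: str) -> Path:
    """Build <root>/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex> and check containment."""
    candidate = (root / algorithm / hex_digest[0:2] / hex_digest[2:4] / hex_digest).resolve()
    # Defence in depth: the resolved path must still sit under the root.
    if not candidate.is_relative_to(root.resolve()):
        raise ValueError(f"path escapes storage root: {candidate}")
    return candidate


def atomic_put(root: Path, algorithm: str, hex_digest: str, data: bytes) -> bool:
    """Write data at its content address; return False if it already existed."""
    final = object_path(root, algorithm, hex_digest)
    if final.exists():
        return False  # idempotent re-put: leave the stored bytes alone
    final.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=final.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, final)  # atomic rename onto the final name
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```

Because the rename is the last step, a reader sees either no object or
the complete object at the final path, never a partially written file.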
tests/unit/test_storage_local.py (14 cases): put/get round-trip; object
key layout matches blueprint; head returns metadata; head/get missing
raise ObjectNotFoundError; put is idempotent; delete returns True then
False; range read returns subrange; range read rejects invalid range;
health reports disk usage; health reports unhealthy when root vanished;
ContentAddress validation blocks path-traversal-flavoured inputs;
registry register/get/list/clear round-trip; idempotent re-put leaves
bytes intact.
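The inclusive range semantics exercised by the range-read cases can be
illustrated on an in-memory buffer. read_range is a hypothetical helper,
and clamping an end offset past the end of the data follows HTTP practice;
whether LocalBackend clamps or rejects such a range is an assumption here.

```python
from typing import Optional, Tuple


def read_range(data: bytes, byte_range: Optional[Tuple[int, int]] = None) -> bytes:
    """Return data[start..end] with both ends inclusive, as in HTTP Range."""
    if byte_range is None:
        return data
    start, end = byte_range
    if start < 0 or end < start or start >= len(data):
        raise ValueError(f"invalid byte range: {byte_range!r}")
    # Inclusive end, so the slice stops at end + 1; an end past EOF is clamped.
    return data[start : end + 1]
```

For example, read_range(b"0123456789", (2, 5)) returns b"2345": four
bytes, because both offsets are included.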
Gates: ruff clean, mypy --strict clean on 41 files, 59 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Schema (src/artifactstore/db/schema.py):
- events table (ADR-0002 source of truth): sequence BIGSERIAL PK, created_at,
event_type, subject_kind, subject_id, actor, payload (CBOR bytes),
payload_digest. Indexes on (subject_kind, subject_id) and
(event_type, sequence).
- artifact_packages, artifact_files, storage_locations, retention_state
(materialised views over events).
- retention_classes (seed table) and metadata_schemas (config table).
- ADR-0001 columns present: digest_algorithm, digest_primary, digest_sha256,
content_address. Blueprint tiering columns present: retrieval_tier
(default 'hot'), restore_status.
- Portable types: SQLAlchemy 2.0 Core with JSON().with_variant(JSONB, 'postgresql'),
Uuid, LargeBinary, DateTime(timezone=True), and Boolean with a false() default.
Seed (src/artifactstore/db/seed.py): five v1 retention classes (transient,
raw-evidence, summary-evidence, release-evidence, permanent-record) with
default durations in seconds; permanent-record has no expiry.
Alembic:
- alembic.ini with sync sqlite URL default; path_separator=os to silence the
1.13 deprecation warning.
- migrations/env.py: translates async URLs (+aiosqlite, +asyncpg) to sync
counterparts at migrate-time so a single ARTIFACTSTORE_DATABASE_URL works
for both runtime (async) and Alembic (sync).
- migrations/script.py.mako template.
- migrations/versions/20260516_0001_initial.py: metadata.create_all + bulk
insert of retention class seeds.
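The URL translation in migrations/env.py might look like the sketch below.
to_sync_url is a hypothetical name, and mapping +asyncpg to +psycopg is an
assumption based on the psycopg extra; the actual env.py may pick a
different sync driver.

```python
def to_sync_url(url: str) -> str:
    """Map an async SQLAlchemy URL to a sync counterpart for Alembic."""
    scheme, sep, rest = url.partition("://")
    # ASSUMPTION: the sync driver chosen for PostgreSQL is psycopg.
    replacements = {
        "sqlite+aiosqlite": "sqlite",
        "postgresql+asyncpg": "postgresql+psycopg",
    }
    # Unknown schemes (including already-sync URLs) pass through unchanged.
    return replacements.get(scheme, scheme) + sep + rest
```

Only the scheme is rewritten; credentials, host, path, and query string
are passed through untouched.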
Make:
- make migrate: alembic upgrade head (ensures var/ exists).
- make migrate-fresh: delete the local SQLite file, then re-run the migration.
Deps: psycopg[binary] added as optional `postgres` extra (PostgreSQL prod
path; SQLite default for dev needs no extra).
Tests:
- tests/unit/test_db_schema.py: every expected table present; ADR-0001 and
tiering columns present; seed has the five v1 classes; permanent-record
has no default_duration; create_all + FK insert + Boolean default
round-trip on in-memory SQLite.
- tests/integration/test_migrations.py: alembic upgrade head against a
tempfile SQLite produces all tables (+ alembic_version) and the seed rows.
Gates: ruff clean, mypy --strict clean on 32 files, 38 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>