coulomb/artifact-store

generated from coulomb/repo-seed

tegwick 793c0c7ba5 Bootstraping the repo

2026-05-15 20:08:32 +02:00

8.2 KiB

Artifact Store Architecture Blueprint

Status: draft
Created: 2026-05-15

Purpose

artifact-store provides a generic registry and storage gateway for durable generated artifacts. Producers register packages and files with metadata; storage adapters persist the bytes; retention policy decides how long artifacts remain eligible for retrieval.

The design keeps artifact identity and lifecycle separate from storage implementation. This allows the first version to run against local filesystem storage while the production path can use S3-compatible object storage such as Ceph RGW.

Architecture Summary

producer
  -> Artifact Registry API
    -> metadata database
    -> retention policy engine
    -> audit event log
    -> storage adapter interface
      -> local filesystem backend
      -> S3-compatible backend
        -> Ceph RGW deployment
      -> future cloud/blob/archive backends

The registry is the authority for artifact metadata and lifecycle. Backends are responsible for byte storage and retrieval.

Design Principles

Backend-neutral registry: no producer should know whether bytes live in Ceph, local disk, or a cloud bucket.
Content-addressable confidence: every stored file has a digest and size.
Retention by default: every package receives an expiry decision at ingestion.
Extensions are explicit: retention extensions and holds are audit events, not silent metadata edits.
Packages remain portable: a manifest should be enough to understand a package without calling the producer.
Statehub links but does not store bytes: Statehub records artifact IDs and outcomes; artifact-store owns file persistence.
Deletion is deliberate: expiry only makes artifacts eligible for deletion; deletion jobs must be auditable, and a deletion can be reversed only while the backend still holds the data.

Components

Registry API

HTTP API for producers and operators.

Initial responsibilities:

create artifact packages,
upload or ingest files,
finalize packages,
retrieve package metadata,
list/search packages by subject and producer metadata,
create retention extensions and holds,
expose download metadata or redirect/download endpoints,
expose health and backend status.

Metadata Store

Persistent database for registry state.

The initial implementation can use SQLite for local development and PostgreSQL for shared service deployments, if PostgreSQL matches the surrounding service stack.

Core tables:

artifact_packages
artifact_files
storage_locations
retention_rules
retention_events
audit_events
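A minimal SQLite sketch of four of these tables, for local development. Column names come from the Data Model section of this document; the types, `NOT NULL` choices, the `UNIQUE (package_id, relative_path)` constraint, and the helper name `open_metadata_store` are assumptions. `retention_rules` and `audit_events` are left out for brevity.

```python
import sqlite3

# Hypothetical DDL: column names follow the data model; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE artifact_packages (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    producer TEXT NOT NULL,
    subject TEXT NOT NULL,
    retention_class TEXT NOT NULL,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    finalized_at TEXT,
    expires_at TEXT,                       -- NULL means no automatic expiry
    metadata TEXT NOT NULL DEFAULT '{}'    -- JSON object
);
CREATE TABLE artifact_files (
    id TEXT PRIMARY KEY,
    package_id TEXT NOT NULL REFERENCES artifact_packages(id),
    relative_path TEXT NOT NULL,
    media_type TEXT NOT NULL,
    size_bytes INTEGER NOT NULL,
    sha256 TEXT NOT NULL,
    created_at TEXT NOT NULL,
    UNIQUE (package_id, relative_path)
);
CREATE TABLE storage_locations (
    id TEXT PRIMARY KEY,
    artifact_file_id TEXT NOT NULL REFERENCES artifact_files(id),
    backend_id TEXT NOT NULL,
    object_key TEXT NOT NULL,
    storage_class TEXT,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    last_verified_at TEXT
);
CREATE TABLE retention_events (
    id TEXT PRIMARY KEY,
    package_id TEXT NOT NULL REFERENCES artifact_packages(id),
    event_type TEXT NOT NULL,
    reason TEXT,
    created_by TEXT NOT NULL,
    created_at TEXT NOT NULL,
    previous_expires_at TEXT,
    new_expires_at TEXT
);
"""


def open_metadata_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database and create the registry tables."""
    conn = sqlite3.connect(path)
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Keeping foreign keys on means a file row cannot point at a package the registry does not know about.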

Storage Adapter Interface

Small backend contract used by the API service.

Required operations:

put(object_key, stream, metadata) -> storage_location
get(object_key) -> stream or signed_url
head(object_key) -> object_metadata
delete(object_key) -> deletion_result
health() -> backend_status

Initial backends:

local filesystem backend for tests and development,
S3-compatible backend for Ceph RGW and cloud object stores.
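The adapter contract can be sketched as a Python `Protocol`, with the local filesystem backend as the first implementation. This is a sketch under stated assumptions: the dataclass names are hypothetical, `delete` returns a plain `bool` instead of a richer `deletion_result`, and `get` always returns a stream (a signed URL only makes sense for object stores).

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import BinaryIO, Protocol


@dataclass(frozen=True)
class StorageLocation:
    backend_id: str
    object_key: str
    size_bytes: int


@dataclass(frozen=True)
class ObjectMetadata:
    object_key: str
    size_bytes: int


class StorageAdapter(Protocol):
    """Backend contract mirroring the required operations listed above."""

    def put(self, object_key: str, stream: BinaryIO, metadata: dict) -> StorageLocation: ...
    def get(self, object_key: str) -> BinaryIO: ...
    def head(self, object_key: str) -> ObjectMetadata: ...
    def delete(self, object_key: str) -> bool: ...
    def health(self) -> dict: ...


class LocalFilesystemBackend:
    """Keeps each object as a file under `root`; intended for tests and development."""

    def __init__(self, root: Path, backend_id: str = "local-fs") -> None:
        self.root = Path(root).resolve()
        self.backend_id = backend_id
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, object_key: str) -> Path:
        # Object keys are untrusted: refuse anything that resolves outside root.
        path = (self.root / object_key).resolve()
        if not path.is_relative_to(self.root):
            raise ValueError(f"object key escapes backend root: {object_key!r}")
        return path

    def put(self, object_key: str, stream: BinaryIO, metadata: dict) -> StorageLocation:
        path = self._path(object_key)
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out)
        # This sketch ignores `metadata`; a real backend might persist it alongside the bytes.
        return StorageLocation(self.backend_id, object_key, path.stat().st_size)

    def get(self, object_key: str) -> BinaryIO:
        return open(self._path(object_key), "rb")

    def head(self, object_key: str) -> ObjectMetadata:
        return ObjectMetadata(object_key, self._path(object_key).stat().st_size)

    def delete(self, object_key: str) -> bool:
        path = self._path(object_key)
        if not path.exists():
            return False
        path.unlink()
        return True

    def health(self) -> dict:
        return {"backend_id": self.backend_id, "ok": self.root.is_dir()}
```

Because the API service only depends on `StorageAdapter`, an S3-compatible class with the same five methods can replace the local backend without touching producer code. (`Path.is_relative_to` needs Python 3.9 or later.)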

Retention Policy Engine

Applies default rules at ingestion and records later changes.

Initial retention classes:

transient: short-lived scratch artifacts,
raw-evidence: raw logs and run output,
summary-evidence: compact reports and summaries,
release-evidence: release or customer-facing evidence packages,
permanent-record: manually held records with no automatic expiry.

Each package stores:

selected retention class,
default retention rule,
computed expires_at,
extension records,
hold records,
deletion eligibility state.
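Applying the default rule at ingestion amounts to a lookup from retention class to duration. The class names below come from this document; the durations are placeholder assumptions, since the blueprint does not fix them. `permanent-record` maps to `None`, meaning no automatic expiry.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class RetentionClass(str, Enum):
    TRANSIENT = "transient"
    RAW_EVIDENCE = "raw-evidence"
    SUMMARY_EVIDENCE = "summary-evidence"
    RELEASE_EVIDENCE = "release-evidence"
    PERMANENT_RECORD = "permanent-record"


# Hypothetical default durations; the real values are a policy decision.
DEFAULT_RETENTION: dict[RetentionClass, Optional[timedelta]] = {
    RetentionClass.TRANSIENT: timedelta(days=7),
    RetentionClass.RAW_EVIDENCE: timedelta(days=90),
    RetentionClass.SUMMARY_EVIDENCE: timedelta(days=365),
    RetentionClass.RELEASE_EVIDENCE: timedelta(days=3 * 365),
    RetentionClass.PERMANENT_RECORD: None,  # manually held, no automatic expiry
}


def compute_expires_at(retention_class: str, ingested_at: datetime) -> Optional[datetime]:
    """Return the default expires_at, or None when the class never expires automatically.

    Raises ValueError for an unknown class, so every package gets an explicit decision.
    """
    duration = DEFAULT_RETENTION[RetentionClass(retention_class)]
    return None if duration is None else ingested_at + duration
```

The returned value would be stored as the package's computed `expires_at` and recorded as a `default_rule_applied` retention event.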

Audit Log

Append-only record of important events:

package created,
file uploaded,
package finalized,
retrieval requested,
retention extended,
hold applied or released,
deletion requested,
deletion completed or failed.

The audit log does not need to be cryptographic in the first release, but the schema should leave room for signed events or external write-once storage later.
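A minimal append-only audit log can be a JSON Lines file that is only ever opened in append mode. The snake_case event names, the `actor`/`details` fields, and the unused `signature` field (reserved for the signed events mentioned above) are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical names for the events listed above.
AUDIT_EVENT_TYPES = {
    "package_created", "file_uploaded", "package_finalized", "retrieval_requested",
    "retention_extended", "hold_applied", "hold_released",
    "deletion_requested", "deletion_completed", "deletion_failed",
}


@dataclass(frozen=True)
class AuditEvent:
    event_type: str
    package_id: str
    actor: str
    details: dict = field(default_factory=dict)
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    signature: Optional[str] = None  # reserved for signed events; unused in the first release


class JsonLinesAuditLog:
    """Append-only log: events are appended one JSON object per line, never rewritten."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def append(self, event: AuditEvent) -> None:
        if event.event_type not in AUDIT_EVENT_TYPES:
            raise ValueError(f"unknown audit event type: {event.event_type!r}")
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(event), sort_keys=True) + "\n")

    def read_all(self) -> list[AuditEvent]:
        if not self.path.exists():
            return []
        with open(self.path, encoding="utf-8") as f:
            return [AuditEvent(**json.loads(line)) for line in f if line.strip()]
```

A later release could fill `signature` or ship each line to write-once storage without changing the event shape.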

Data Model

Artifact Package

Required fields:

id
name
producer
subject
retention_class
status
created_at
finalized_at
expires_at
metadata

Recommended metadata keys:

repo_slug
run_id
assessment_id
target_profile_ref
assessment_profile_ref
source_commits
tool_versions
environment

Artifact File

Required fields:

id
package_id
relative_path
media_type
size_bytes
sha256
created_at
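The `sha256` and `size_bytes` fields can be computed while the upload streams into a backend, without buffering the whole file. A minimal sketch, assuming a hypothetical `HashingReader` wrapper that is handed to an adapter's `put(object_key, stream, metadata)` in place of the raw upload stream:

```python
import hashlib
from typing import BinaryIO


class HashingReader:
    """Wraps a binary stream and updates a SHA-256 digest and byte count as it is read."""

    def __init__(self, raw: BinaryIO) -> None:
        self._raw = raw
        self._sha = hashlib.sha256()
        self.size_bytes = 0

    def read(self, n: int = -1) -> bytes:
        data = self._raw.read(n)
        self._sha.update(data)
        self.size_bytes += len(data)
        return data

    @property
    def sha256(self) -> str:
        """Hex digest of everything read so far."""
        return self._sha.hexdigest()
```

After `put` returns, the registry can record `reader.sha256` and `reader.size_bytes` on the artifact file and compare them with a later `head` result when verifying the storage location.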

Storage Location

Required fields:

id
artifact_file_id
backend_id
object_key
storage_class
status
created_at
last_verified_at

Retention Event

Required fields:

id
package_id
event_type
reason
created_by
created_at
previous_expires_at
new_expires_at

Event types:

default_rule_applied
extended
hold_applied
hold_released
deletion_eligible
deleted
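A retention extension can be modeled as building an `extended` event that records both the previous and new `expires_at`, so the change is an audit record rather than a silent metadata edit. The validation rules below (the new date must be later, a reason is required) and the function name `extend_retention` are assumptions.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionEvent:
    # Fields mirror the Retention Event model above.
    id: str
    package_id: str
    event_type: str
    reason: str
    created_by: str
    created_at: datetime
    previous_expires_at: Optional[datetime]
    new_expires_at: Optional[datetime]


def extend_retention(
    package_id: str,
    current_expires_at: Optional[datetime],
    requested_expires_at: datetime,
    reason: str,
    created_by: str,
) -> RetentionEvent:
    """Build an 'extended' event; the caller stores it together with the new expires_at."""
    if current_expires_at is None:
        # e.g. permanent-record packages have no automatic expiry to extend
        raise ValueError("package has no automatic expiry to extend")
    if requested_expires_at <= current_expires_at:
        raise ValueError("an extension must move expires_at later")
    if not reason.strip():
        raise ValueError("an extension needs a reason")
    return RetentionEvent(
        id=str(uuid.uuid4()),
        package_id=package_id,
        event_type="extended",
        reason=reason,
        created_by=created_by,
        created_at=datetime.now(timezone.utc),
        previous_expires_at=current_expires_at,
        new_expires_at=requested_expires_at,
    )
```

Writing the event row and the package's new `expires_at` in one database transaction keeps the two from drifting apart.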

API Shape

Initial endpoints:

GET  /health
GET  /backends
POST /packages
GET  /packages
GET  /packages/{package_id}
POST /packages/{package_id}/files
POST /packages/{package_id}/finalize
GET  /packages/{package_id}/manifest
GET  /files/{file_id}/download
POST /packages/{package_id}/retention/extensions
POST /packages/{package_id}/retention/holds
POST /packages/{package_id}/retention/holds/{hold_id}/release
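The create, upload, and finalize endpoints imply a small package lifecycle. A minimal sketch, assuming hypothetical `open` and `finalized` status values and the rule that an empty package cannot be finalized; in the API layer, `PackageStateError` would plausibly map to an HTTP 409 Conflict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


class PackageStateError(Exception):
    """Raised when a request does not fit the package's current status."""


@dataclass
class Package:
    id: str
    status: str = "open"  # hypothetical statuses: "open" -> "finalized"
    files: dict[str, str] = field(default_factory=dict)  # relative_path -> sha256
    finalized_at: Optional[datetime] = None

    def add_file(self, relative_path: str, sha256: str) -> None:
        # Backs POST /packages/{package_id}/files
        if self.status != "open":
            raise PackageStateError(f"package {self.id} is {self.status}; files are frozen")
        if relative_path in self.files:
            raise PackageStateError(f"duplicate path {relative_path!r}")
        self.files[relative_path] = sha256

    def finalize(self) -> None:
        # Backs POST /packages/{package_id}/finalize
        if self.status != "open":
            raise PackageStateError(f"package {self.id} is already {self.status}")
        if not self.files:
            raise PackageStateError("cannot finalize an empty package")
        self.status = "finalized"
        self.finalized_at = datetime.now(timezone.utc)
```

Freezing the file list at finalization is what makes the manifest a stable description of the package.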

The first ingestion path can accept multipart file uploads. A later trusted-local operator endpoint may ingest from server-local paths, but it should be disabled by default because path ingestion changes the security boundary.

Package Manifest

Every finalized package should expose a JSON manifest containing:

package metadata,
retention summary,
file list,
file digests and sizes,
storage backend references,
source metadata,
created/finalized timestamps.
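A manifest builder only needs registry data, which is what keeps a package understandable without calling the producer. The JSON layout, the `manifest_version` field, and the input dictionary shapes below are assumptions; the content follows the list above.

```python
import json


def build_manifest(package: dict, files: list[dict], retention: dict) -> str:
    """Render a finalized package as a JSON manifest (hypothetical layout)."""
    if package.get("status") != "finalized":
        raise ValueError("manifests are exposed only for finalized packages")
    manifest = {
        "manifest_version": 1,  # hypothetical; lets the layout evolve
        "package": {
            key: package[key]
            for key in ("id", "name", "producer", "subject", "retention_class")
        },
        "retention": retention,  # e.g. class, expires_at, extensions, holds
        "source": package.get("metadata", {}),  # e.g. repo_slug, run_id, source_commits
        "created_at": package["created_at"],
        "finalized_at": package["finalized_at"],
        "files": [
            {
                "relative_path": f["relative_path"],
                "media_type": f["media_type"],
                "size_bytes": f["size_bytes"],
                "sha256": f["sha256"],
                # Backend references only: which backend, which object key.
                "storage": [
                    {"backend_id": loc["backend_id"], "object_key": loc["object_key"]}
                    for loc in f.get("storage_locations", [])
                ],
            }
            for f in sorted(files, key=lambda f: f["relative_path"])
        ],
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Sorting files by path and keys with `sort_keys=True` makes the manifest output deterministic for the same inputs.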

For guide-board runs, the manifest should preserve links to:

run.json
retention-summary.json
reports/assessment-package.json
reports/report.md
extension-generated scorecards or log reviews,
raw artifact files captured by the assessment package manifest.

Guide-Board Pilot Flow

guide-board run directory
  -> open-cmis-tck scorecard/log review
  -> artifact-store package create
  -> upload run files
  -> finalize manifest
  -> Statehub record links package id and summary

The artifact package should carry:

run id,
target profile reference,
assessment profile reference,
result status,
source commits for guide-board, open-cmis-tck, and the assessed repository,
important report paths,
retention class raw-evidence or release-evidence.
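A guide-board client could assemble the package-create request body from run data as follows. The metadata keys reuse the recommended keys from the Data Model section where possible; the input `run` dictionary keys, the name format, `result_status`, and `report_paths` are hypothetical.

```python
ALLOWED_PILOT_CLASSES = ("raw-evidence", "release-evidence")


def guide_board_package_request(run: dict, retention_class: str = "raw-evidence") -> dict:
    """Build a hypothetical POST /packages body for one guide-board run."""
    if retention_class not in ALLOWED_PILOT_CLASSES:
        raise ValueError(f"pilot packages use {ALLOWED_PILOT_CLASSES}, got {retention_class!r}")
    return {
        "name": f"guide-board-run-{run['run_id']}",  # naming scheme is an assumption
        "producer": "guide-board",
        "subject": run["repo_slug"],  # the assessed repository
        "retention_class": retention_class,
        "metadata": {
            "repo_slug": run["repo_slug"],
            "run_id": run["run_id"],
            "target_profile_ref": run["target_profile_ref"],
            "assessment_profile_ref": run["assessment_profile_ref"],
            "result_status": run["result_status"],
            "source_commits": {
                "guide-board": run["guide_board_commit"],
                "open-cmis-tck": run["open_cmis_tck_commit"],
                "assessed-repository": run["assessed_commit"],
            },
            "report_paths": [
                "run.json",
                "retention-summary.json",
                "reports/assessment-package.json",
                "reports/report.md",
            ],
        },
    }
```

Statehub would then store only the returned package id and a summary, not the files themselves.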

Ceph and S3-Compatible Storage

Ceph should be introduced through the S3-compatible adapter, not as a special case in producer logic.

Configuration should support:

endpoint URL,
bucket,
region,
access key reference,
secret key reference,
optional server-side encryption settings,
object key prefix,
storage class label.

The service should never require credentials in producer request bodies. Credentials should come from environment variables, mounted secret files, or a local secret provider.
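One way to keep secrets out of both request bodies and stored configuration is to hold only secret references and resolve them at connection time. A minimal sketch, assuming a hypothetical `env:NAME` / `file:/path` reference syntax and a hypothetical `S3BackendConfig` class; it does not create an S3 client.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


def resolve_secret(ref: str, env: Mapping[str, str]) -> str:
    """Resolve 'env:NAME' or 'file:/path/to/secret' references (hypothetical syntax)."""
    scheme, _, value = ref.partition(":")
    if scheme == "env":
        return env[value]
    if scheme == "file":
        # Mounted secret files often end with a newline.
        return Path(value).read_text(encoding="utf-8").strip()
    raise ValueError(f"unsupported secret reference: {ref!r}")


@dataclass(frozen=True)
class S3BackendConfig:
    endpoint_url: str
    bucket: str
    region: str
    access_key_ref: str  # a reference, never the key itself
    secret_key_ref: str
    key_prefix: str = ""
    storage_class: str = "STANDARD"
    server_side_encryption: Optional[str] = None

    def credentials(self, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
        """Resolve the access and secret keys only when a client is being built."""
        return resolve_secret(self.access_key_ref, env), resolve_secret(self.secret_key_ref, env)
```

Because the dataclass holds only references, logging or printing the configuration cannot leak the secret values.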

Future Retrieval Tiers

The initial API can treat all stored files as immediately retrievable. Later, storage locations can include:

retrieval_tier: hot, warm, cold, archive,
restore_status: available, restore_requested, restoring, restored, expired,
restore_requested_at,
restore_expires_at.

The registry API should be able to return "not immediately available" without changing artifact identity.
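A download handler could map tier and restore state to a response without touching artifact identity. The tier and status values come from the list above; treating `hot` and `warm` as immediately retrievable and the specific HTTP status codes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

IMMEDIATE_TIERS = {"hot", "warm"}  # assumption: these tiers need no restore


@dataclass(frozen=True)
class DownloadDecision:
    available: bool
    http_status: int  # hypothetical mapping
    detail: str


def download_decision(retrieval_tier: str, restore_status: Optional[str]) -> DownloadDecision:
    """Decide whether a file can be served now; the file id is the same in every case."""
    if retrieval_tier in IMMEDIATE_TIERS or restore_status in ("available", "restored"):
        return DownloadDecision(True, 200, "available")
    if restore_status in ("restore_requested", "restoring"):
        return DownloadDecision(False, 202, "restore in progress")
    # cold/archive with no restore, or an expired restore copy
    return DownloadDecision(False, 409, "not immediately available; request a restore")
```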

Security Boundary

Initial service assumptions:

internal service, not public internet exposed,
authenticated producer/operator API before shared deployment,
no secret values stored in artifact metadata,
package paths are logical paths, not trusted filesystem paths,
download authorization should be checked at the registry layer.
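Treating package paths as logical paths means normalizing and validating `relative_path` before it is used in an object key. A minimal sketch using `pathlib.PurePosixPath`, which never touches the filesystem. The rules (reject absolute paths, `..` segments, backslashes, and NUL bytes) and the function name `normalize_logical_path` are assumptions.

```python
from pathlib import PurePosixPath


def normalize_logical_path(relative_path: str) -> str:
    """Return a canonical POSIX-style logical path or raise ValueError."""
    if not relative_path or "\x00" in relative_path or "\\" in relative_path:
        raise ValueError(f"invalid logical path: {relative_path!r}")
    path = PurePosixPath(relative_path)  # collapses '//' and single '.' segments
    if path.is_absolute():
        raise ValueError(f"logical paths must be relative: {relative_path!r}")
    if not path.parts or path.as_posix() == "." or ".." in path.parts:
        raise ValueError(f"logical path escapes the package: {relative_path!r}")
    return path.as_posix()
```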

Files may contain sensitive evidence. The service must treat metadata and bytes as confidential by default.

Open Questions

Which identity provider should guard shared deployments?
Should package metadata schemas be open-ended JSON or typed by producer?
Should deduplication be package-local only or global by content hash?
Should deletion first mark records deleted, then delete bytes, or reverse that order with compensating events?
How much Statehub integration belongs in this repo versus in Statehub clients?
