generated from coulomb/repo-seed
Bootstrapping the repo
180
AGENTS.md
Normal file
@@ -0,0 +1,180 @@
|
||||
# artifact-store — Agent Instructions
|
||||
|
||||
## Repo Identity
|
||||
|
||||
**Purpose:** Generic artifact registry and storage gateway for generated outputs,
|
||||
evidence packages, reports, logs, snapshots, exports, and release artifacts.
|
||||
|
||||
**Domain:** stack
|
||||
**Repo slug:** artifact-store
|
||||
**Topic ID:** `595afc64-bd28-47bf-aafb-ba230b28371b`
|
||||
**Workplan prefix:** `ARTIFACT-STORE-WP-`
|
||||
|
||||
---
|
||||
|
||||
## State Hub Integration
|
||||
|
||||
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
|
||||
there is no MCP server for Codex agents.
|
||||
|
||||
| Context | URL |
|
||||
|---------|-----|
|
||||
| Local workstation | `http://127.0.0.1:8000` |
|
||||
| Remote via tunnel | `http://127.0.0.1:18000` |
|
||||
|
||||
### Orient at session start
|
||||
|
||||
```bash
# Offline brief — works without hub connection
cat .custodian-brief.md

# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=595afc64-bd28-47bf-aafb-ba230b28371b&status=active" \
  | python3 -m json.tool

# Check inbox
curl -s "http://127.0.0.1:8000/messages/?to_agent=artifact-store&unread_only=true" \
  | python3 -m json.tool
```
|
||||
|
||||
Mark a message read:
|
||||
```bash
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
```
|
||||
|
||||
### Log progress (required at session close)
|
||||
|
||||
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{
    "summary": "what was done",
    "event_type": "note",
    "author": "codex",
    "workstream_id": "<uuid>",
    "task_id": "<uuid>"
  }'
```
|
||||
|
||||
Omit `workstream_id` / `task_id` when not applicable.
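
The optional-field rule can be sketched in Python as follows. This is a minimal illustration, not part of the hub's tooling: the function names are hypothetical, and the field names and URL are taken from the curl example above. The request is only constructed here, not sent.

```python
import json
import urllib.request

HUB_URL = "http://127.0.0.1:8000"  # local workstation URL from the table above


def build_progress_payload(summary, workstream_id=None, task_id=None,
                           event_type="note", author="codex"):
    """Build the POST /progress/ body, leaving out IDs that do not apply."""
    payload = {"summary": summary, "event_type": event_type, "author": author}
    if workstream_id:
        payload["workstream_id"] = workstream_id
    if task_id:
        payload["task_id"] = task_id
    return payload


def build_progress_request(payload, base_url=HUB_URL):
    """Prepare (but do not send) the request object."""
    return urllib.request.Request(
        f"{base_url}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch can be exercised without a running hub.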
|
||||
|
||||
### Update task status
|
||||
|
||||
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"status": "in_progress"}'
# values: todo | in_progress | done | blocked
```
|
||||
|
||||
### Flag a task for human review
|
||||
|
||||
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"needs_human": true, "intervention_note": "reason"}'
```
|
||||
|
||||
---
|
||||
|
||||
## Session Protocol
|
||||
|
||||
**Start:**
|
||||
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
|
||||
2. Check inbox: `GET /messages/?to_agent=artifact-store&unread_only=true`; mark read
|
||||
3. Scan workplans: `ls workplans/` — note `status: active` files and open tasks
|
||||
4. Check blocked tasks: `GET /tasks/?needs_human=true`
|
||||
|
||||
**During work:**
|
||||
- Update task statuses in workplan files as tasks progress
|
||||
- Record significant decisions via `POST /decisions/`
|
||||
|
||||
**Close:**
|
||||
1. Update workplan file task statuses to reflect progress
|
||||
2. Log: `POST /progress/` with a summary of what changed
|
||||
3. Note for the custodian operator: after workplan file changes, run from
   `~/the-custodian/state-hub`:
   ```bash
   make fix-consistency REPO=artifact-store
   ```
   This syncs task status from files into the hub DB.
|
||||
|
||||
---
|
||||
|
||||
## Workplan Convention (ADR-001)
|
||||
|
||||
Work items originate as files in this repo — not in the hub. The hub is a
|
||||
read/cache/index layer that rebuilds from files.
|
||||
|
||||
**File location:** `workplans/ARTIFACT-STORE-WP-NNNN-<slug>.md`
|
||||
|
||||
**Archived location:** completed workplans may move to
|
||||
`workplans/archived/YYMMDD-ARTIFACT-STORE-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
|
||||
the completion/archive date; the frontmatter `id` does not change.
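
As a small illustration of the archive naming rule, the sketch below derives the archived path from an active workplan path and a completion date. The helper name is hypothetical; only the `workplans/archived/YYMMDD-...` pattern comes from the convention above.

```python
from datetime import date
from pathlib import PurePosixPath


def archived_workplan_path(workplan_path: str, archived_on: date) -> str:
    """Return the archive location for a completed workplan file."""
    path = PurePosixPath(workplan_path)
    # %y%m%d yields the two-digit year, month, and day prefix (YYMMDD).
    return str(path.parent / "archived" / f"{archived_on:%y%m%d}-{path.name}")
```

Only the file name changes; the frontmatter `id` inside the file stays as it was.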
|
||||
|
||||
**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
|
||||
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
|
||||
this only for low-risk work completed directly; create a normal workplan for
|
||||
anything needing analysis, design, approval, dependencies, or multiple phases.
|
||||
|
||||
**Frontmatter:**
|
||||
|
||||
```yaml
---
id: ARTIFACT-STORE-WP-NNNN
type: workplan
title: "..."
domain: stack
repo: artifact-store
status: active | done
owner: codex
topic_slug: ...
created: "YYYY-MM-DD"
updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>"   # written by fix-consistency — do not edit
---
```
|
||||
|
||||
**Task block format** (one per `##` section):
|
||||
|
||||
```
## Task Title

` ` `task
id: ARTIFACT-STORE-WP-NNNN-T01
status: todo | in_progress | done | blocked
priority: high | medium | low
state_hub_task_id: "<uuid>"   # written by fix-consistency — do not edit
` ` `

Task description text.
```
|
||||
|
||||
Status progression: `todo` → `in_progress` → `done` (or `blocked`)
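
A reader of workplan files might extract task blocks along these lines. This is a sketch under assumptions: the parser and its names are hypothetical, it treats each block as simple `key: value` lines, and it is not the logic `fix-consistency` actually uses.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
TASK_BLOCK = re.compile(FENCE + r"task\n(.*?)\n" + FENCE, re.DOTALL)
ALLOWED_STATUSES = {"todo", "in_progress", "done", "blocked"}


def parse_task_blocks(markdown_text):
    """Return one dict per task block, keyed by the block's field names."""
    tasks = []
    for match in TASK_BLOCK.finditer(markdown_text):
        fields = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue  # skip lines that are not `key: value`
            # Drop a trailing "# ..." comment and surrounding quotes.
            value = value.split("#", 1)[0].strip().strip('"')
            fields[key.strip()] = value
        if fields.get("status") not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status in task {fields.get('id')!r}")
        tasks.append(fields)
    return tasks
```

Rejecting unknown statuses early keeps a typo in a workplan file from silently reaching the hub.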
|
||||
|
||||
To create a new workplan:
|
||||
1. Write the file following the format above
|
||||
2. Notify the custodian operator to run `make fix-consistency REPO=artifact-store`
|
||||
(or send a message to the hub agent via `POST /messages/`)
|
||||
|
||||
---
|
||||
|
||||
## Current Repo Shape
|
||||
|
||||
This repository is in service-baseline planning. The current source of truth is:
|
||||
|
||||
- `INTENT.md` for purpose, product thesis, scope, and service boundary
|
||||
- `docs/ARCHITECTURE-BLUEPRINT.md` for the draft architecture
|
||||
- `workplans/ARTIFACT-STORE-WP-0001-service-baseline.md` for implementation tasks
|
||||
|
||||
No runnable service scaffold exists yet. Add install, dev-server, and test
|
||||
commands here when `ARTIFACT-STORE-WP-0001-T001` lands.
|
||||
|
||||
## Repo Boundary
|
||||
|
||||
This repo owns artifact identity, package/file metadata, storage backend
|
||||
abstraction, retention policy, retrieval metadata, and audit trails.
|
||||
|
||||
It does not own StateHub work records, guide-board assessment semantics, formal
|
||||
records-management certification, or producer-specific business logic.
|
||||
121
INTENT.md
Normal file
@@ -0,0 +1,121 @@
|
||||
# INTENT
|
||||
|
||||
## Project Name
|
||||
|
||||
`artifact-store`
|
||||
|
||||
## Purpose
|
||||
|
||||
`artifact-store` is a generic artifact registry and storage gateway. It gives
|
||||
projects a stable place to register files, evidence packages, logs, reports,
|
||||
snapshots, exports, and other generated outputs without forcing every producer
|
||||
to invent its own retention, indexing, and storage rules.
|
||||
|
||||
The service owns artifact identity, metadata, provenance, retention decisions,
|
||||
lookup, and audit trails. Actual bytes are delegated to one or more configured
|
||||
storage backends such as a local filesystem, S3-compatible object storage, Ceph
|
||||
RGW, AWS S3, Azure Blob Storage, Google Cloud Storage, or future archival tiers.
|
||||
|
||||
## Product Thesis
|
||||
|
||||
Generated artifacts become valuable when they are findable, attributable,
|
||||
retained for the right amount of time, and safely discardable when they are no
|
||||
longer needed. Teams should be able to preserve a run result, point Statehub or
|
||||
another system at its durable registry record, and later prove which files were
|
||||
stored, which hashes they had, where they lived, and when retention was extended
|
||||
or released.
|
||||
|
||||
`artifact-store` exists to make artifact preservation a shared platform concern
|
||||
instead of an ad hoc directory convention.
|
||||
|
||||
## Primary Use Case
|
||||
|
||||
Given a producer such as `guide-board`, a completed assessment run, and an
|
||||
artifact package directory, `artifact-store` should:
|
||||
|
||||
1. register the package and its files,
|
||||
2. compute and store content hashes and sizes,
|
||||
3. capture producer, subject, run, repository, commit, and environment metadata,
|
||||
4. select the applicable retention rule,
|
||||
5. write files through a configured storage backend,
|
||||
6. record all storage locations and backend object keys,
|
||||
7. provide stable retrieval metadata and download links,
|
||||
8. allow retention extension or hold decisions,
|
||||
9. expose enough index data for Statehub, release records, and future UIs,
|
||||
10. make expired artifacts eligible for deletion through an auditable process.
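
Steps 1 and 2 of the list above could look roughly like the following sketch, which walks a package directory and records a logical path, size, and SHA-256 digest per file. The function name is hypothetical, and the choice of SHA-256 follows the `sha256` field named in the architecture blueprint.

```python
import hashlib
from pathlib import Path


def describe_package_files(package_dir):
    """Return relative path, size, and SHA-256 digest for each file in a package."""
    root = Path(package_dir)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Hash in 1 MiB chunks so large logs are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        entries.append({
            "relative_path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": digest.hexdigest(),
        })
    return entries
```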
|
||||
|
||||
The first concrete pilot is preserving `guide-board` / `open-cmis-tck`
|
||||
assessment output for `kontextual-engine`.
|
||||
|
||||
## Intended Users
|
||||
|
||||
- Assessment and compliance tools that produce evidence packages.
|
||||
- Build, release, and quality systems that need durable generated outputs.
|
||||
- Statehub and repository automation that need to link work records to
|
||||
preserved evidence.
|
||||
- Operators who need retention visibility and controlled deletion.
|
||||
- Future UI and agent workflows that need artifact search, download, or restore
|
||||
status.
|
||||
|
||||
## Core Concepts
|
||||
|
||||
- Artifact package: a logical collection of files registered together, such as a
|
||||
guide-board assessment run directory.
|
||||
- Artifact file: one stored file with a path, media type, size, digest, and
|
||||
storage location.
|
||||
- Registry record: metadata and lifecycle state for an artifact package or file.
|
||||
- Storage backend: a configured adapter that stores and retrieves bytes.
|
||||
- Storage location: a backend-specific pointer such as a bucket/key, filesystem
|
||||
path, or future archive locator.
|
||||
- Retention class: a named policy category such as transient, raw-evidence,
|
||||
release-evidence, audit-prep, or permanent-record.
|
||||
- Retention rule: the default storage duration and deletion behavior for a class.
|
||||
- Retention extension: a time-bounded extension of an artifact's expiry date.
|
||||
- Hold: a stronger instruction that prevents deletion until explicitly released.
|
||||
- Retrieval tier: a future storage or access class such as hot, warm, cold, or
|
||||
archived.
|
||||
|
||||
## Scope
|
||||
|
||||
In scope:
|
||||
|
||||
- metadata registry for artifact packages and files,
|
||||
- content hashing and manifest generation,
|
||||
- pluggable storage backend interface,
|
||||
- local filesystem backend for development,
|
||||
- S3-compatible backend suitable for Ceph RGW,
|
||||
- default retention classes and expiry calculation,
|
||||
- retention extension and hold records,
|
||||
- retrieval metadata and download path generation,
|
||||
- audit events for ingestion, retrieval, retention changes, and deletion,
|
||||
- API-first service suitable for automation,
|
||||
- pilot integration with guide-board assessment runs.
|
||||
|
||||
Out of scope for the initial service:
|
||||
|
||||
- replacing Statehub as the work, repository, or decision system of record,
|
||||
- embedding guide-board-specific assessment semantics in the registry core,
|
||||
- full compliance certification or legal-record guarantees,
|
||||
- cloud-provider-specific lifecycle automation beyond backend adapter hooks,
|
||||
- asynchronous cold-archive restore flows,
|
||||
- user-facing UI beyond API contracts and minimal operator docs.
|
||||
|
||||
## Relationship To Other Services
|
||||
|
||||
`artifact-store` should remain a shared infrastructure service.
|
||||
|
||||
- `guide-board` produces assessment packages and asks `artifact-store` to
|
||||
preserve them.
|
||||
- `open-cmis-tck` can add CMIS-specific scorecards and log reviews before a
|
||||
guide-board run is ingested.
|
||||
- `Statehub` records work, decisions, repository state, and links to artifact
|
||||
registry identifiers.
|
||||
- Ceph is a strong self-hosted storage backend candidate because its RGW layer is
|
||||
S3-compatible, but the registry must not be Ceph-only.
|
||||
|
||||
## Boundary
|
||||
|
||||
The registry can prove what it stored, where it stored it, which hashes it
|
||||
computed, and which retention decisions were applied. It does not prove the
|
||||
truth of the artifact contents, certify a system, or replace formal records
|
||||
management without additional governance.
|
||||
15
README.md
@@ -1,3 +1,14 @@
|
||||
# repo-seed
|
||||
# artifact-store
|
||||
|
||||
A git repository template to bootstrap coulomb projects from.
|
||||
Generic artifact registry and storage gateway for generated outputs, evidence
|
||||
packages, reports, logs, and release artifacts.
|
||||
|
||||
The registry owns artifact identity, metadata, provenance, retention policy, and
|
||||
retrieval records. Actual bytes are delegated to configured storage backends such
|
||||
as a local filesystem, S3-compatible object storage, or Ceph RGW.
|
||||
|
||||
Start here:
|
||||
|
||||
- [INTENT.md](INTENT.md)
|
||||
- [docs/ARCHITECTURE-BLUEPRINT.md](docs/ARCHITECTURE-BLUEPRINT.md)
|
||||
- [workplans/ARTIFACT-STORE-WP-0001-service-baseline.md](workplans/ARTIFACT-STORE-WP-0001-service-baseline.md)
|
||||
|
||||
133
SCOPE.md
Normal file
@@ -0,0 +1,133 @@
|
||||
# SCOPE
|
||||
|
||||
> This file helps you quickly understand what this repository is about,
|
||||
> when it is relevant, and when it is not.
|
||||
> It is intentionally lightweight and may be incomplete.
|
||||
|
||||
---
|
||||
|
||||
## One-liner
|
||||
|
||||
`artifact-store` is a generic artifact registry and storage gateway for durable
|
||||
generated outputs, evidence packages, reports, logs, snapshots, exports, and
|
||||
release artifacts.
|
||||
|
||||
---
|
||||
|
||||
## Core Idea
|
||||
|
||||
Generated artifacts become valuable when they are findable, attributable,
|
||||
retained for the right amount of time, and safely discardable when they expire.
|
||||
This repository makes artifact preservation a shared platform concern: producers
|
||||
register packages and files, the registry owns metadata and lifecycle, and
|
||||
storage backends own bytes.
|
||||
|
||||
---
|
||||
|
||||
## In Scope
|
||||
|
||||
- Artifact package and artifact file metadata.
|
||||
- Content hashing, manifests, provenance, and audit events.
|
||||
- Pluggable storage backend interface with local filesystem and S3-compatible
|
||||
backends as the first targets.
|
||||
- Retention classes, expiry calculation, retention extension, and holds.
|
||||
- Retrieval metadata and download/link surfaces for automation.
|
||||
- Pilot ingestion flow for guide-board / OpenCMIS TCK assessment output.
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- Replacing StateHub as the work, repository, or decision system of record.
|
||||
- Encoding guide-board-specific assessment semantics in the registry core.
|
||||
- Formal compliance certification or legal-record guarantees by itself.
|
||||
- Cloud-provider-specific lifecycle automation beyond backend adapter hooks.
|
||||
- User-facing UI beyond API contracts and minimal operator documentation.
|
||||
|
||||
---
|
||||
|
||||
## Relevant When
|
||||
|
||||
- A tool has generated files that need durable storage and stable identifiers.
|
||||
- StateHub, release records, or operators need to link to preserved evidence.
|
||||
- A producer needs backend-neutral storage across local disk, Ceph RGW, or other
|
||||
S3-compatible object storage.
|
||||
- Retention, hold, or deletion eligibility needs to be explicit and auditable.
|
||||
|
||||
---
|
||||
|
||||
## Not Relevant When
|
||||
|
||||
- The artifact is purely temporary scratch output with no retention need.
|
||||
- The work is about StateHub tasks, decisions, or repository catalog data rather
|
||||
than artifact bytes and metadata.
|
||||
- A producer needs domain-specific scoring, validation, or assessment semantics.
|
||||
- The requirement is a human-facing artifact browser rather than API-first
|
||||
preservation.
|
||||
|
||||
---
|
||||
|
||||
## Current State
|
||||
|
||||
- Status: concept / service-baseline planning
|
||||
- Implementation: documentation and initial workplan only
|
||||
- Stability: evolving
|
||||
- Usage: none yet; first pilot target is guide-board assessment output
|
||||
|
||||
---
|
||||
|
||||
## How It Fits
|
||||
|
||||
- Upstream dependencies: producer repositories such as `guide-board` and
|
||||
`open-cmis-tck`; future storage backends such as local filesystem and Ceph RGW.
|
||||
- Downstream consumers: StateHub records, release/evidence workflows, operators,
|
||||
and future artifact search or retrieval UIs.
|
||||
- Often used with: `guide-board`, `open-cmis-tck`, `kontextual-engine`,
|
||||
StateHub, and S3-compatible object storage.
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
- Preferred terms: artifact package, artifact file, registry record, storage
|
||||
backend, storage location, retention class, retention rule, hold.
|
||||
- Also known as: artifact registry, evidence store, storage gateway.
|
||||
- Potentially confusing terms: StateHub links to artifact identifiers but does
|
||||
not store artifact bytes.
|
||||
|
||||
---
|
||||
|
||||
## Related / Overlapping
|
||||
|
||||
- `guide-board` - produces assessment packages and evidence output.
|
||||
- `open-cmis-tck` - contributes CMIS-specific assessment artifacts for the pilot.
|
||||
- `kontextual-engine` - first likely subject of preserved guide-board evidence.
|
||||
- `the-custodian/state-hub` - records work, decisions, repository state, and
|
||||
links to artifact registry identifiers.
|
||||
|
||||
---
|
||||
|
||||
## Getting Oriented
|
||||
|
||||
- Start with: `INTENT.md`.
|
||||
- Key files / directories: `docs/ARCHITECTURE-BLUEPRINT.md`, `workplans/`.
|
||||
- Entry points: no service entry point yet; see
|
||||
`workplans/ARTIFACT-STORE-WP-0001-service-baseline.md`.
|
||||
|
||||
---
|
||||
|
||||
## Provided Capabilities
|
||||
|
||||
```capability
type: infrastructure
title: Artifact package preservation
description: Register generated artifact packages and files, store bytes through a configured backend, compute hashes, apply retention policy, and return stable package identifiers for StateHub or producer records.
keywords: [artifacts, evidence, retention, storage, registry, provenance]
```
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
The first concrete pilot is preserving `guide-board` / `open-cmis-tck`
|
||||
assessment output for `kontextual-engine`.
|
||||
330
docs/ARCHITECTURE-BLUEPRINT.md
Normal file
@@ -0,0 +1,330 @@
|
||||
# Artifact Store Architecture Blueprint
|
||||
|
||||
Status: draft
|
||||
Created: 2026-05-15
|
||||
|
||||
## Purpose
|
||||
|
||||
`artifact-store` provides a generic registry and storage gateway for durable
|
||||
generated artifacts. Producers register packages and files with metadata;
|
||||
storage adapters persist the bytes; retention policy decides how long artifacts
|
||||
remain eligible for retrieval.
|
||||
|
||||
The design keeps artifact identity and lifecycle separate from storage
|
||||
implementation. This allows the first version to run against local filesystem
|
||||
storage while the production path can use S3-compatible object storage such as
|
||||
Ceph RGW.
|
||||
|
||||
## Architecture Summary
|
||||
|
||||
```text
producer
  -> Artifact Registry API
      -> metadata database
      -> retention policy engine
      -> audit event log
      -> storage adapter interface
          -> local filesystem backend
          -> S3-compatible backend
              -> Ceph RGW deployment
          -> future cloud/blob/archive backends
```
|
||||
|
||||
The registry is the authority for artifact metadata and lifecycle. Backends are
|
||||
responsible for byte storage and retrieval.
|
||||
|
||||
## Design Principles
|
||||
|
||||
- Backend-neutral registry: no producer should know whether bytes live in Ceph,
|
||||
local disk, or a cloud bucket.
|
||||
- Content-addressable confidence: every stored file has a digest and size.
|
||||
- Retention by default: every package receives an expiry decision at ingestion.
|
||||
- Extensions are explicit: retention extensions and holds are audit events, not
|
||||
silent metadata edits.
|
||||
- Packages remain portable: a manifest should be enough to understand a package
|
||||
without calling the producer.
|
||||
- Statehub links, it does not store bytes: Statehub records artifact IDs and
|
||||
outcomes; artifact-store owns file persistence.
|
||||
- Deletion is deliberate: expiry only makes artifacts eligible for deletion;
  deletion jobs must be auditable, and a deletion is reversible only while the
  backend still holds the data.
|
||||
|
||||
## Components
|
||||
|
||||
### Registry API
|
||||
|
||||
HTTP API for producers and operators.
|
||||
|
||||
Initial responsibilities:
|
||||
|
||||
- create artifact packages,
|
||||
- upload or ingest files,
|
||||
- finalize packages,
|
||||
- retrieve package metadata,
|
||||
- list/search packages by subject and producer metadata,
|
||||
- create retention extensions and holds,
|
||||
- expose download metadata or redirect/download endpoints,
|
||||
- expose health and backend status.
|
||||
|
||||
### Metadata Store
|
||||
|
||||
Persistent database for registry state.
|
||||
|
||||
Initial implementation can use SQLite for local development and PostgreSQL for
|
||||
shared service deployments if that matches the surrounding service stack.
|
||||
|
||||
Core tables:
|
||||
|
||||
- `artifact_packages`
|
||||
- `artifact_files`
|
||||
- `storage_locations`
|
||||
- `retention_rules`
|
||||
- `retention_events`
|
||||
- `audit_events`
|
||||
|
||||
### Storage Adapter Interface
|
||||
|
||||
Small backend contract used by the API service.
|
||||
|
||||
Required operations:
|
||||
|
||||
- `put(object_key, stream, metadata) -> storage_location`
|
||||
- `get(object_key) -> stream or signed_url`
|
||||
- `head(object_key) -> object_metadata`
|
||||
- `delete(object_key) -> deletion_result`
|
||||
- `health() -> backend_status`
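
One way to express this contract in Python is a `typing.Protocol` plus a toy in-memory backend for tests. This is a sketch, not the planned implementation: the class names, return types, and dictionary shapes are assumptions, and the `signed_url` variant of `get` is left out.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Protocol


@dataclass(frozen=True)
class StorageLocation:
    backend_id: str
    object_key: str


class StorageBackend(Protocol):
    def put(self, object_key: str, stream: BinaryIO, metadata: dict) -> StorageLocation: ...
    def get(self, object_key: str) -> BinaryIO: ...
    def head(self, object_key: str) -> dict: ...
    def delete(self, object_key: str) -> bool: ...
    def health(self) -> dict: ...


class InMemoryBackend:
    """Toy backend that keeps bytes in a dict; useful only for tests."""

    def __init__(self, backend_id="memory"):
        self.backend_id = backend_id
        self._objects = {}

    def put(self, object_key, stream, metadata):
        self._objects[object_key] = (stream.read(), dict(metadata))
        return StorageLocation(self.backend_id, object_key)

    def get(self, object_key):
        return io.BytesIO(self._objects[object_key][0])

    def head(self, object_key):
        data, metadata = self._objects[object_key]
        return {"size_bytes": len(data), **metadata}

    def delete(self, object_key):
        # True when something was removed, False when the key was unknown.
        return self._objects.pop(object_key, None) is not None

    def health(self):
        return {"backend_id": self.backend_id, "status": "ok"}
```

Because `StorageBackend` is structural, `InMemoryBackend` satisfies it without inheriting from it, which keeps real backends free of a shared base class.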
|
||||
|
||||
Initial backends:
|
||||
|
||||
- local filesystem backend for tests and development,
|
||||
- S3-compatible backend for Ceph RGW and cloud object stores.
|
||||
|
||||
### Retention Policy Engine
|
||||
|
||||
Applies default rules at ingestion and records later changes.
|
||||
|
||||
Initial retention classes:
|
||||
|
||||
- `transient`: short-lived scratch artifacts,
|
||||
- `raw-evidence`: raw logs and run output,
|
||||
- `summary-evidence`: compact reports and summaries,
|
||||
- `release-evidence`: release or customer-facing evidence packages,
|
||||
- `permanent-record`: manually held records with no automatic expiry.
|
||||
|
||||
Each package stores:
|
||||
|
||||
- selected retention class,
|
||||
- default retention rule,
|
||||
- computed `expires_at`,
|
||||
- extension records,
|
||||
- hold records,
|
||||
- deletion eligibility state.
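
The expiry calculation and eligibility check might be sketched as below. The durations are hypothetical placeholders, since the blueprint does not fix them, and the function names are illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical defaults for illustration only; real durations are not yet decided.
DEFAULT_RETENTION = {
    "transient": timedelta(days=7),
    "raw-evidence": timedelta(days=90),
    "summary-evidence": timedelta(days=365),
    "release-evidence": timedelta(days=3 * 365),
    "permanent-record": None,  # no automatic expiry
}


def compute_expires_at(retention_class: str, ingested_at: datetime):
    """Apply the default rule for a class; None means the package never expires."""
    duration = DEFAULT_RETENTION[retention_class]
    return None if duration is None else ingested_at + duration


def is_deletion_eligible(expires_at, active_holds: int, now: datetime) -> bool:
    """A package is eligible only once expired and with no hold in force."""
    if active_holds > 0 or expires_at is None:
        return False
    return now >= expires_at
```

Keeping holds as a separate input reflects the distinction drawn elsewhere in these documents: an extension moves `expires_at`, while a hold blocks deletion regardless of the date.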
|
||||
|
||||
### Audit Log
|
||||
|
||||
Append-only record of important events:
|
||||
|
||||
- package created,
|
||||
- file uploaded,
|
||||
- package finalized,
|
||||
- retrieval requested,
|
||||
- retention extended,
|
||||
- hold applied or released,
|
||||
- deletion requested,
|
||||
- deletion completed or failed.
|
||||
|
||||
The audit log does not need to be cryptographic in the first release, but the
|
||||
schema should leave room for signed events or external write-once storage later.
|
||||
|
||||
## Data Model
|
||||
|
||||
### Artifact Package
|
||||
|
||||
Required fields:
|
||||
|
||||
- `id`
|
||||
- `name`
|
||||
- `producer`
|
||||
- `subject`
|
||||
- `retention_class`
|
||||
- `status`
|
||||
- `created_at`
|
||||
- `finalized_at`
|
||||
- `expires_at`
|
||||
- `metadata`
|
||||
|
||||
Recommended metadata keys:
|
||||
|
||||
- `repo_slug`
|
||||
- `run_id`
|
||||
- `assessment_id`
|
||||
- `target_profile_ref`
|
||||
- `assessment_profile_ref`
|
||||
- `source_commits`
|
||||
- `tool_versions`
|
||||
- `environment`
|
||||
|
||||
### Artifact File
|
||||
|
||||
Required fields:
|
||||
|
||||
- `id`
|
||||
- `package_id`
|
||||
- `relative_path`
|
||||
- `media_type`
|
||||
- `size_bytes`
|
||||
- `sha256`
|
||||
- `created_at`
|
||||
|
||||
### Storage Location
|
||||
|
||||
Required fields:
|
||||
|
||||
- `id`
|
||||
- `artifact_file_id`
|
||||
- `backend_id`
|
||||
- `object_key`
|
||||
- `storage_class`
|
||||
- `status`
|
||||
- `created_at`
|
||||
- `last_verified_at`
|
||||
|
||||
### Retention Event
|
||||
|
||||
Required fields:
|
||||
|
||||
- `id`
|
||||
- `package_id`
|
||||
- `event_type`
|
||||
- `reason`
|
||||
- `created_by`
|
||||
- `created_at`
|
||||
- `previous_expires_at`
|
||||
- `new_expires_at`
|
||||
|
||||
Event types:
|
||||
|
||||
- `default_rule_applied`
|
||||
- `extended`
|
||||
- `hold_applied`
|
||||
- `hold_released`
|
||||
- `deletion_eligible`
|
||||
- `deleted`
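
As an illustration of the fields above, an `extended` event could be built like this. The helper name is hypothetical, and the rule that an extension must move `expires_at` later is an assumption rather than something the blueprint states.

```python
import uuid
from datetime import datetime, timezone


def make_extension_event(package_id, previous_expires_at, new_expires_at,
                         reason, created_by, now=None):
    """Build an `extended` retention event record as a plain dict."""
    if new_expires_at <= previous_expires_at:
        # Assumed rule: an extension never shortens retention.
        raise ValueError("an extension must move expires_at later")
    return {
        "id": str(uuid.uuid4()),
        "package_id": package_id,
        "event_type": "extended",
        "reason": reason,
        "created_by": created_by,
        "created_at": now or datetime.now(timezone.utc),
        "previous_expires_at": previous_expires_at,
        "new_expires_at": new_expires_at,
    }
```

Recording both the previous and new expiry in the event keeps the change auditable even after the package row itself is updated.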
|
||||
|
||||
## API Shape
|
||||
|
||||
Initial endpoints:
|
||||
|
||||
```text
GET /health
GET /backends
POST /packages
GET /packages
GET /packages/{package_id}
POST /packages/{package_id}/files
POST /packages/{package_id}/finalize
GET /packages/{package_id}/manifest
GET /files/{file_id}/download
POST /packages/{package_id}/retention/extensions
POST /packages/{package_id}/retention/holds
POST /packages/{package_id}/retention/holds/{hold_id}/release
```
|
||||
|
||||
The first ingestion path can accept multipart file uploads. A later trusted-local
|
||||
operator endpoint may ingest from server-local paths, but it should be disabled
|
||||
by default because path ingestion changes the security boundary.
|
||||
|
||||
## Package Manifest
|
||||
|
||||
Every finalized package should expose a JSON manifest containing:
|
||||
|
||||
- package metadata,
|
||||
- retention summary,
|
||||
- file list,
|
||||
- file digests and sizes,
|
||||
- storage backend references,
|
||||
- source metadata,
|
||||
- created/finalized timestamps.
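
A manifest with these parts might be assembled as in the sketch below. The layout of the resulting JSON is an assumption for illustration, not a settled schema; timestamps are assumed to be ISO 8601 strings, and `locations` is a hypothetical mapping from each file's `relative_path` to its storage reference.

```python
import json


def build_manifest(package, files, locations):
    """Assemble a manifest dict from a package record and its file entries."""
    return {
        "package": {key: package[key] for key in ("id", "name", "producer", "subject")},
        "retention": {
            "retention_class": package["retention_class"],
            "expires_at": package["expires_at"],
        },
        "source": package.get("metadata", {}),
        "created_at": package["created_at"],
        "finalized_at": package["finalized_at"],
        # Sort by path so the same package always yields the same file order.
        "files": [
            {**entry, "storage": locations[entry["relative_path"]]}
            for entry in sorted(files, key=lambda e: e["relative_path"])
        ],
    }


def manifest_json(manifest):
    """Serialize with stable key order so manifests diff cleanly."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```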
|
||||
|
||||
For guide-board runs, the manifest should preserve links to:
|
||||
|
||||
- `run.json`
|
||||
- `retention-summary.json`
|
||||
- `reports/assessment-package.json`
|
||||
- `reports/report.md`
|
||||
- extension-generated scorecards or log reviews,
|
||||
- raw artifact files captured by the assessment package manifest.
|
||||
|
||||
## Guide-Board Pilot Flow
|
||||
|
||||
```text
guide-board run directory
  -> open-cmis-tck scorecard/log review
  -> artifact-store package create
  -> upload run files
  -> finalize manifest
  -> Statehub record links package id and summary
```
|
||||
|
||||
The artifact package should carry:
|
||||
|
||||
- run id,
|
||||
- target profile reference,
|
||||
- assessment profile reference,
|
||||
- result status,
|
||||
- source commits for guide-board, open-cmis-tck, and the assessed repository,
|
||||
- important report paths,
|
||||
- retention class `raw-evidence` or `release-evidence`.
|
||||
|
||||
## Ceph And S3-Compatible Storage
|
||||
|
||||
Ceph should be introduced through the S3-compatible adapter, not as a special
|
||||
case in producer logic.
|
||||
|
||||
Configuration should support:
|
||||
|
||||
- endpoint URL,
|
||||
- bucket,
|
||||
- region,
|
||||
- access key reference,
|
||||
- secret key reference,
|
||||
- optional server-side encryption settings,
|
||||
- object key prefix,
|
||||
- storage class label.
|
||||
|
||||
The service should never require credentials in producer request bodies. Use
|
||||
environment variables, mounted secret files, or a local secret provider.
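
Loading backend settings from the environment could be sketched as follows. Every `ARTIFACT_STORE_S3_*` variable name is hypothetical, as is the fallback region; server-side encryption settings are omitted for brevity.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3BackendConfig:
    endpoint_url: str
    bucket: str
    region: str
    access_key: str = field(repr=False)  # kept out of repr() and log output
    secret_key: str = field(repr=False)
    key_prefix: str = ""
    storage_class: Optional[str] = None


def load_s3_config(env: Mapping[str, str] = os.environ) -> S3BackendConfig:
    """Read S3-compatible backend settings; variable names are illustrative."""
    def required(name: str) -> str:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"missing required setting: {name}")
        return value

    return S3BackendConfig(
        endpoint_url=required("ARTIFACT_STORE_S3_ENDPOINT_URL"),
        bucket=required("ARTIFACT_STORE_S3_BUCKET"),
        region=env.get("ARTIFACT_STORE_S3_REGION", "us-east-1"),  # assumed default
        access_key=required("ARTIFACT_STORE_S3_ACCESS_KEY"),
        secret_key=required("ARTIFACT_STORE_S3_SECRET_KEY"),
        key_prefix=env.get("ARTIFACT_STORE_S3_KEY_PREFIX", ""),
        storage_class=env.get("ARTIFACT_STORE_S3_STORAGE_CLASS"),
    )
```

Accepting any mapping for `env` lets tests pass a plain dict instead of mutating the process environment.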
|
||||
|
||||
## Future Retrieval Tiers
|
||||
|
||||
The initial API can treat all stored files as immediately retrievable. Later,
|
||||
storage locations can include:
|
||||
|
||||
- `retrieval_tier`: hot, warm, cold, archive,
|
||||
- `restore_status`: available, restore_requested, restoring, restored, expired,
|
||||
- `restore_requested_at`,
|
||||
- `restore_expires_at`.
|
||||
|
||||
The registry API should be able to return "not immediately available" without
|
||||
changing artifact identity.
|
||||
|
||||
## Security Boundary
|
||||
|
||||
Initial service assumptions:
|
||||
|
||||
- internal service, not public internet exposed,
|
||||
- authenticated producer/operator API before shared deployment,
|
||||
- no secret values stored in artifact metadata,
|
||||
- package paths are logical paths, not trusted filesystem paths,
|
||||
- download authorization should be checked at the registry layer.
|
||||
|
||||
Files may contain sensitive evidence. The service must treat metadata and bytes
|
||||
as confidential by default.
|
||||
|
||||
## Open Questions
|
||||
|
||||
- Which identity provider should guard shared deployments?
|
||||
- Should package metadata schemas be open-ended JSON or typed by producer?
|
||||
- Should deduplication be package-local only or global by content hash?
|
||||
- Should deletion first mark records deleted, then delete bytes, or reverse that
|
||||
order with compensating events?
|
||||
- How much Statehub integration belongs in this repo versus in Statehub clients?
|
||||
229
workplans/ARTIFACT-STORE-WP-0001-service-baseline.md
Normal file
@@ -0,0 +1,229 @@
|
||||
---
id: ARTIFACT-STORE-WP-0001
type: workplan
title: "Artifact Store Service Baseline"
repo: artifact-store
domain: stack
status: active
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 1
created: "2026-05-15"
updated: "2026-05-15"
state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb"
---
|
||||
|
||||
# ARTIFACT-STORE-WP-0001: Artifact Store Service Baseline
|
||||
|
||||
## Purpose
|
||||
|
||||
Implement the first usable artifact registry and storage gateway. The service
|
||||
should preserve artifact packages, index their metadata, delegate bytes to a
|
||||
configured storage backend, apply default retention rules, and expose stable
|
||||
package identifiers that Statehub and producer repositories can link to.
|
||||
|
||||
The first producer target is a guide-board assessment run, including OpenCMIS TCK
|
||||
reports and raw assessment artifacts.
|
||||
|
||||
## Background
|
||||
|
||||
Guide-board can already produce self-contained run directories with retention
|
||||
summaries, assessment packages, raw artifacts, scorecards, and log reviews. Those
|
||||
directories should not live only in `/tmp`, and committing raw evidence into
|
||||
producer repositories is the wrong long-term shape.
|
||||
|
||||
`artifact-store` becomes the shared preservation layer:
|
||||
|
||||
- producers generate files,
|
||||
- artifact-store registers and stores them,
|
||||
- Statehub records the work outcome and links to the registry package,
|
||||
- storage backends handle durable bytes.
|
||||
|
||||
Ceph is the likely self-hosted production backend through its S3-compatible RGW
|
||||
interface, but the service must keep the backend interface generic.
|
||||
|
||||
## Target Architecture
|
||||
|
||||
```text
producer package
  -> registry API
      -> metadata database
      -> retention policy engine
      -> storage adapter
          -> local filesystem or S3-compatible object storage
```
|
||||
|
||||
## Boundary
|
||||
|
||||
This workplan owns the first service implementation and API contract. It does
|
||||
not need to build a UI, implement cold-storage restore tiers, replace Statehub,
|
||||
or provide formal records-management certification.
|
||||
|
||||
## D1.1 - Service Scaffold And Repository Identity
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T001
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Replace the seed README with artifact-store service instructions.
|
||||
- Add a Python service scaffold with a clear package/module layout.
|
||||
- Provide a local development command.
|
||||
- Provide a test command.
|
||||
- Keep generated artifact bytes and local databases ignored by git.
|
||||
- Document required environment variables.
|
||||
|
||||
## D1.2 - Registry Data Model
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T002
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Define persistent models for artifact packages, files, storage locations,
|
||||
retention rules, retention events, and audit events.
|
||||
- Store package metadata as structured JSON while keeping core query fields
|
||||
explicit.
|
||||
- Record package lifecycle status: created, uploading, finalized, deleted, and
|
||||
failed.
|
||||
- Record file `sha256`, size, media type, and logical relative path.
|
||||
- Add migrations or a reproducible schema initialization path.
|
||||
|
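The model described above could be sketched as follows. This is a minimal illustration, not the service's actual schema: the class names, the `producer` and `retention_class` query fields, and the `describe_file` helper are all assumptions made for the example. Only the lifecycle states and the per-file fields come from the acceptance list.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class PackageStatus(str, Enum):
    """Lifecycle states named in the acceptance list."""

    CREATED = "created"
    UPLOADING = "uploading"
    FINALIZED = "finalized"
    DELETED = "deleted"
    FAILED = "failed"


@dataclass(frozen=True)
class ArtifactFile:
    """One stored file: logical path plus integrity and type information."""

    relative_path: str
    sha256: str
    size_bytes: int
    media_type: str


@dataclass
class ArtifactPackage:
    """Package record: explicit query fields plus free-form JSON metadata."""

    package_id: str
    producer: str  # hypothetical explicit query field
    retention_class: str  # hypothetical explicit query field
    status: PackageStatus = PackageStatus.CREATED
    metadata: dict[str, Any] = field(default_factory=dict)
    files: list[ArtifactFile] = field(default_factory=list)


def describe_file(relative_path: str, data: bytes,
                  media_type: str = "application/octet-stream") -> ArtifactFile:
    """Build a file record from raw bytes (hypothetical helper)."""
    return ArtifactFile(
        relative_path=relative_path,
        sha256=hashlib.sha256(data).hexdigest(),
        size_bytes=len(data),
        media_type=media_type,
    )
```

Keeping `metadata` as a plain dictionary while promoting a few fields to typed attributes mirrors the "structured JSON plus explicit query fields" requirement. Which fields deserve promotion is a design decision this sketch does not settle.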
||||
## D1.3 - Local Filesystem Storage Backend
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T003
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Implement a storage adapter interface.
|
||||
- Implement a local filesystem backend for development and tests.
|
||||
- Store objects under deterministic package/file keys.
|
||||
- Prevent path traversal and accidental writes outside the configured storage
|
||||
root.
|
||||
- Add backend health reporting.
|
||||
- Add tests for put, get, head, and delete operations.
|
||||
|
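One possible shape for the local backend, including the path traversal guard, is sketched below. The class name, the `packages/<id>/files/<path>` key layout, and the contents of the health report are assumptions made for illustration. The sketch needs Python 3.9 or later for `Path.is_relative_to`.

```python
import os
from pathlib import Path, PurePosixPath


class UnsafeKeyError(ValueError):
    """Raised when an object key would resolve outside the storage root."""


class LocalFilesystemBackend:
    """Development/test backend that stores objects under a single root."""

    def __init__(self, root):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def object_key(package_id: str, relative_path: str) -> str:
        # Deterministic key layout (hypothetical): same inputs, same key.
        return f"packages/{package_id}/files/{relative_path}"

    def _resolve(self, key: str) -> Path:
        posix = PurePosixPath(key)
        # Reject empty keys, absolute keys, and any ".." segment up front.
        if not posix.parts or posix.is_absolute() or ".." in posix.parts:
            raise UnsafeKeyError(key)
        target = self.root.joinpath(*posix.parts).resolve()
        # Resolving first also catches symlinks that point outside the root.
        if not target.is_relative_to(self.root):
            raise UnsafeKeyError(key)
        return target

    def put(self, key: str, data: bytes) -> None:
        target = self._resolve(key)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._resolve(key).read_bytes()

    def head(self, key: str):
        target = self._resolve(key)
        return {"size": target.stat().st_size} if target.is_file() else None

    def delete(self, key: str) -> None:
        self._resolve(key).unlink(missing_ok=True)

    def health(self) -> dict:
        return {
            "backend": "local",
            "root": str(self.root),
            "writable": os.access(self.root, os.W_OK),
        }
```

The guard checks the key twice on purpose: the lexical check rejects obvious traversal attempts cheaply, and the resolved-path check covers symlinks inside the root. An S3-compatible adapter would expose the same `put`, `get`, `head`, `delete`, and `health` methods.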
||||
## D1.4 - Package Ingestion API
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T004
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "e3879111-4be9-4731-8aea-15abb874f960"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Add endpoints to create a package, upload files, finalize a package, retrieve
|
||||
package metadata, list packages, and download files.
|
||||
- Compute file hashes server-side during ingestion.
|
||||
- Reject duplicate logical paths within one package unless explicitly replacing
|
||||
a non-finalized file.
|
||||
- Produce a package manifest after finalization.
|
||||
- Add API tests covering successful ingestion and validation failures.
|
||||
|
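The ingestion rules above (server-side hashing, duplicate path rejection, and a manifest on finalization) can be illustrated independently of any web framework. In the sketch below, `PackageDraft`, `hash_stream`, the exception names, and the manifest layout are hypothetical. A real endpoint would also write the bytes to the storage adapter while hashing.

```python
import hashlib
import io


class DuplicatePathError(ValueError):
    """A logical path was uploaded twice without an explicit replace."""


class PackageFinalizedError(RuntimeError):
    """An upload was attempted after the package was finalized."""


def hash_stream(stream, chunk_size: int = 64 * 1024):
    """Hash an upload in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    size = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


class PackageDraft:
    """In-memory stand-in for a package that is still accepting uploads."""

    def __init__(self, package_id: str):
        self.package_id = package_id
        self.finalized = False
        self.files = {}

    def add_file(self, relative_path: str, stream, replace: bool = False) -> dict:
        if self.finalized:
            raise PackageFinalizedError(self.package_id)
        if relative_path in self.files and not replace:
            raise DuplicatePathError(relative_path)
        sha256, size = hash_stream(stream)
        entry = {"path": relative_path, "sha256": sha256, "size": size}
        self.files[relative_path] = entry
        return entry

    def finalize(self) -> dict:
        """Freeze the package and return a manifest sorted by logical path."""
        self.finalized = True
        return {
            "package_id": self.package_id,
            "files": [self.files[path] for path in sorted(self.files)],
        }


# Usage: upload two files, then finalize to obtain the manifest.
draft = PackageDraft("pkg-demo")
draft.add_file("reports/scorecard.json", io.BytesIO(b"{}"))
draft.add_file("raw/run.log", io.BytesIO(b"line\n"))
manifest = draft.finalize()
```

Because the hash is computed from the bytes the server actually received, the manifest does not depend on a checksum the client reports.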
||||
## D1.5 - Retention Baseline
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T005
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "2d6cbd83-c348-45ad-a223-7870a3412225"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Seed default retention classes for transient, raw-evidence, summary-evidence,
|
||||
release-evidence, and permanent-record.
|
||||
- Apply a default `expires_at` when a package is created or finalized.
|
||||
- Add endpoints to extend retention and apply or release holds.
|
||||
- Record retention changes as retention events and audit events.
|
||||
- Expose deletion eligibility without deleting bytes automatically in the first
|
||||
implementation.
|
||||
|
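A rough sketch of the retention baseline follows. The class names come from the acceptance list, but the durations are placeholder values chosen only for the example, and `RetentionState` with its event dictionaries is a hypothetical stand-in for the persisted retention and audit events.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Placeholder durations in days; the workplan names the classes but not
# their lengths. None means the package never expires.
RETENTION_DAYS = {
    "transient": 7,
    "raw-evidence": 90,
    "summary-evidence": 365,
    "release-evidence": 1825,
    "permanent-record": None,
}


def default_expires_at(retention_class: str, now: datetime) -> Optional[datetime]:
    """Compute the default expiry for a package created or finalized at `now`."""
    days = RETENTION_DAYS[retention_class]
    return None if days is None else now + timedelta(days=days)


@dataclass
class RetentionState:
    """Expiry, holds, and an append-only event trail for one package."""

    expires_at: Optional[datetime]
    holds: set = field(default_factory=set)
    events: list = field(default_factory=list)

    def _record(self, kind: str, actor: str, detail: str) -> None:
        self.events.append({"kind": kind, "actor": actor, "detail": detail})

    def extend(self, new_expires_at: datetime, actor: str) -> None:
        if self.expires_at is None:
            raise ValueError("package never expires; nothing to extend")
        if new_expires_at <= self.expires_at:
            raise ValueError("extension must move expires_at later")
        self.expires_at = new_expires_at
        self._record("extended", actor, new_expires_at.isoformat())

    def apply_hold(self, name: str, actor: str) -> None:
        self.holds.add(name)
        self._record("hold_applied", actor, name)

    def release_hold(self, name: str, actor: str) -> None:
        self.holds.discard(name)
        self._record("hold_released", actor, name)

    def deletion_eligible(self, now: datetime) -> bool:
        """Report eligibility only; nothing is deleted here."""
        return (
            not self.holds
            and self.expires_at is not None
            and now >= self.expires_at
        )
```

`deletion_eligible` only answers a question and has no side effects, which matches the requirement that the first implementation expose eligibility without removing bytes.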
||||
## D1.6 - S3-Compatible Backend Design Hook
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T006
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "7b980a55-2364-48c3-98ac-081629a8d2b7"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Define configuration fields for an S3-compatible backend.
|
||||
- Keep the adapter contract compatible with Ceph RGW.
|
||||
- Add an implementation stub or feature-flagged backend if dependencies are not
|
||||
ready.
|
||||
- Document expected Ceph/S3 configuration without requiring a live Ceph service
|
||||
for baseline tests.
|
||||
|
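The configuration fields might look like the sketch below. Every environment variable name and default here is an assumption for illustration; the workplan does not define them. Path-style addressing is shown as the default because Ceph RGW deployments without wildcard DNS commonly rely on it, and the region default is only a placeholder for clients that require some value.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the real service may choose different ones.
REQUIRED_VARS = (
    "ARTIFACT_STORE_S3_ENDPOINT_URL",
    "ARTIFACT_STORE_S3_BUCKET",
    "ARTIFACT_STORE_S3_ACCESS_KEY_ID",
    "ARTIFACT_STORE_S3_SECRET_ACCESS_KEY",
)


@dataclass(frozen=True)
class S3BackendConfig:
    """Settings an S3-compatible adapter would need to reach its bucket."""

    endpoint_url: str
    bucket: str
    access_key_id: str
    secret_access_key: str
    region: str
    path_style: bool
    key_prefix: str


def load_s3_config(env: Mapping[str, str]) -> Optional[S3BackendConfig]:
    """Return S3 settings, or None when the S3 backend is not selected."""
    if env.get("ARTIFACT_STORE_BACKEND", "local") != "s3":
        return None  # feature flag off: baseline tests never touch S3
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    return S3BackendConfig(
        endpoint_url=env["ARTIFACT_STORE_S3_ENDPOINT_URL"],
        bucket=env["ARTIFACT_STORE_S3_BUCKET"],
        access_key_id=env["ARTIFACT_STORE_S3_ACCESS_KEY_ID"],
        secret_access_key=env["ARTIFACT_STORE_S3_SECRET_ACCESS_KEY"],
        region=env.get("ARTIFACT_STORE_S3_REGION", "us-east-1"),
        path_style=env.get("ARTIFACT_STORE_S3_PATH_STYLE", "true").lower() == "true",
        key_prefix=env.get("ARTIFACT_STORE_S3_KEY_PREFIX", ""),
    )
```

Passing the environment in as a mapping, instead of reading `os.environ` inside the function, keeps the loader testable without a live Ceph service. In production the call would be `load_s3_config(os.environ)`.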
||||
## D1.7 - Guide-Board Pilot Ingestion
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T007
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "eb822821-353c-4cd2-95bf-acb2f084b7ea"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Provide a CLI helper or documented curl flow to register a guide-board run
|
||||
directory as one package.
|
||||
- Preserve guide-board run metadata: run id, target profile, assessment profile,
|
||||
evidence result counts, finding counts, source commits, and report paths.
|
||||
- Ingest the CMIS pilot run shape, including scorecard and log-review reports.
|
||||
- Return a package id suitable for recording in Statehub.
|
||||
- Add a fixture-based test that does not require the real OpenCMIS TCK.
|
||||
|
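A CLI helper for the pilot could start from something like the sketch below, which walks a run directory and builds the payload for one package. The `run-metadata.json` filename, the payload keys, and the `raw-evidence` default are assumptions; the actual guide-board run layout may differ.

```python
import hashlib
import json
import mimetypes
from pathlib import Path


def build_package_request(run_dir, metadata_name: str = "run-metadata.json") -> dict:
    """Describe every file under run_dir as one package (hypothetical shape)."""
    run_dir = Path(run_dir)
    files = []
    for path in run_dir.rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        media_type, _ = mimetypes.guess_type(path.name)
        files.append({
            # Logical path is relative to the run directory, POSIX style.
            "path": path.relative_to(run_dir).as_posix(),
            "sha256": hashlib.sha256(data).hexdigest(),
            "size": len(data),
            "media_type": media_type or "application/octet-stream",
        })
    files.sort(key=lambda entry: entry["path"])

    # Run metadata (run id, profiles, counts, commits, report paths) is
    # assumed to live in a JSON file at the top of the run directory.
    metadata_path = run_dir / metadata_name
    metadata = {}
    if metadata_path.is_file():
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

    return {
        "producer": "guide-board",
        "retention_class": "raw-evidence",  # assumed default for pilot runs
        "metadata": metadata,
        "files": files,
    }
```

A fixture-based test can build a small temporary directory with a scorecard, a log file, and a metadata file, then assert on the returned paths and hashes. That requires no OpenCMIS TCK run.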
||||
## D1.8 - Operator Documentation And Handoff
|
||||
|
||||
```task
|
||||
id: ARTIFACT-STORE-WP-0001-T008
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"
|
||||
```
|
||||
|
||||
Acceptance:
|
||||
|
||||
- Document local run, test, and package ingestion commands.
|
||||
- Document retention behavior and extension flow.
|
||||
- Document the boundary between artifact-store and Statehub.
|
||||
- Include a dev-agent handoff section listing the first implementation order.
|
||||
- Keep architecture docs aligned with the implemented API.
|
||||
|
||||
## Suggested Implementation Order
|
||||
|
||||
1. Service scaffold, test harness, and README.
|
||||
2. Metadata models and local database setup.
|
||||
3. Local filesystem storage adapter.
|
||||
4. Package create/upload/finalize/download API.
|
||||
5. Retention defaults, extension, hold, and audit events.
|
||||
6. Guide-board run ingestion helper.
|
||||
7. S3-compatible backend configuration and Ceph notes.
|
||||
|
||||
## First Pilot Success Criteria
|
||||
|
||||
- A completed guide-board CMIS run can be ingested from a local directory.
|
||||
- The package manifest lists every stored file with SHA-256 and size.
|
||||
- The registry returns a stable package id.
|
||||
- Files can be downloaded through the service.
|
||||
- Default retention is visible and can be extended.
|
||||
- Statehub can record the package id and summary without storing artifact bytes.
|
||||