generated from coulomb/repo-seed
Bootstrapping the repo
180
AGENTS.md
Normal file
@@ -0,0 +1,180 @@
# artifact-store — Agent Instructions

## Repo Identity

**Purpose:** Generic artifact registry and storage gateway for generated outputs,
evidence packages, reports, logs, snapshots, exports, and release artifacts.

**Domain:** stack
**Repo slug:** artifact-store
**Topic ID:** `595afc64-bd28-47bf-aafb-ba230b28371b`
**Workplan prefix:** `ARTIFACT-STORE-WP-`

---

## State Hub Integration

The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
there is no MCP server for Codex agents.

| Context | URL |
|---------|-----|
| Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` |

### Orient at session start

```bash
# Offline brief — works without hub connection
cat .custodian-brief.md

# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=595afc64-bd28-47bf-aafb-ba230b28371b&status=active" \
  | python3 -m json.tool

# Check inbox
curl -s "http://127.0.0.1:8000/messages/?to_agent=artifact-store&unread_only=true" \
  | python3 -m json.tool
```

Mark a message read:
```bash
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
```

### Log progress (required at session close)

```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{
    "summary": "what was done",
    "event_type": "note",
    "author": "codex",
    "workstream_id": "<uuid>",
    "task_id": "<uuid>"
  }'
```

Omit `workstream_id` / `task_id` when not applicable.
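
The progress call can also be sketched in Python for sessions where building JSON in a shell is awkward. This is a minimal sketch, not part of the hub: `build_progress_request` is a hypothetical helper, it uses only the fields shown in the curl example, and it leaves out `workstream_id` and `task_id` when they are not supplied. It builds the request without sending it.

```python
import json
import urllib.request

HUB_URL = "http://127.0.0.1:8000"  # local workstation URL from the table above


def build_progress_request(summary, workstream_id=None, task_id=None,
                           event_type="note", author="codex"):
    """Build (but do not send) a POST /progress/ request.

    Hypothetical helper: optional ids are omitted rather than sent as null.
    """
    payload = {"summary": summary, "event_type": event_type, "author": author}
    if workstream_id is not None:
        payload["workstream_id"] = workstream_id
    if task_id is not None:
        payload["task_id"] = task_id
    return urllib.request.Request(
        f"{HUB_URL}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_progress_request("what was done")
# To actually send it: urllib.request.urlopen(req)
```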

### Update task status

```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"status": "in_progress"}'
# values: todo | in_progress | done | blocked
```

### Flag a task for human review

```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"needs_human": true, "intervention_note": "reason"}'
```

---

## Session Protocol

**Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
2. Check inbox: `GET /messages/?to_agent=artifact-store&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: active` files and open tasks
4. Check blocked tasks: `GET /tasks/?needs_human=true`

**During work:**
- Update task statuses in workplan files as tasks progress
- Record significant decisions via `POST /decisions/`

**Close:**
1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from
   `~/the-custodian/state-hub`:
   ```bash
   make fix-consistency REPO=artifact-store
   ```
   This syncs task status from files into the hub DB.

---

## Workplan Convention (ADR-001)

Work items originate as files in this repo — not in the hub. The hub is a
read/cache/index layer that rebuilds from files.

**File location:** `workplans/ARTIFACT-STORE-WP-NNNN-<slug>.md`

**Archived location:** completed workplans may move to
`workplans/archived/YYMMDD-ARTIFACT-STORE-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
the completion/archive date; the frontmatter `id` does not change.
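
The archive naming rule can be sketched as a small function. `archived_workplan_path` is a hypothetical helper name, and the date in the usage line is only an illustration, not a real archive date.

```python
from datetime import date


def archived_workplan_path(workplan_id: str, slug: str, archived_on: date) -> str:
    """Return the archive path for a completed workplan.

    Hypothetical helper; the YYMMDD prefix is the completion/archive date.
    """
    prefix = archived_on.strftime("%y%m%d")
    return f"workplans/archived/{prefix}-{workplan_id}-{slug}.md"


example = archived_workplan_path(
    "ARTIFACT-STORE-WP-0001", "service-baseline", date(2026, 5, 15)
)
# example == "workplans/archived/260515-ARTIFACT-STORE-WP-0001-service-baseline.md"
```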

**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
this only for low-risk work completed directly; create a normal workplan for
anything needing analysis, design, approval, dependencies, or multiple phases.

**Frontmatter:**

```yaml
---
id: ARTIFACT-STORE-WP-NNNN
type: workplan
title: "..."
domain: stack
repo: artifact-store
status: active | done
owner: codex
topic_slug: ...
created: "YYYY-MM-DD"
updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>"   # written by fix-consistency — do not edit
---
```

**Task block format** (one per `##` section):

```
## Task Title

` ` `task
id: ARTIFACT-STORE-WP-NNNN-T01
status: todo | in_progress | done | blocked
priority: high | medium | low
state_hub_task_id: "<uuid>"   # written by fix-consistency — do not edit
` ` `

Task description text.
```

Status progression: `todo` → `in_progress` → `done` (or `blocked`)
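
A reader for the task block format above might look like the following. This is a sketch under assumptions: `parse_task_blocks` is a hypothetical name, the real `fix-consistency` parser is not shown in this repo, and the sketch assumes the fence in a real workplan file is three backticks followed by `task`.

```python
import re

VALID_STATUSES = {"todo", "in_progress", "done", "blocked"}
FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example


def parse_task_blocks(markdown: str) -> list[dict]:
    """Extract key/value pairs from each task fence in a workplan file."""
    pattern = re.compile(FENCE + r"task\n(.*?)\n" + FENCE, re.DOTALL)
    tasks = []
    for body in pattern.findall(markdown):
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                # Drop trailing "# ..." comments and surrounding quotes.
                fields[key.strip()] = value.split("#", 1)[0].strip().strip('"')
        if fields.get("status") not in VALID_STATUSES:
            raise ValueError(f"unknown status in task {fields.get('id')!r}")
        tasks.append(fields)
    return tasks


sample = (
    "## Task Title\n\n"
    f"{FENCE}task\n"
    "id: ARTIFACT-STORE-WP-0001-T01\n"
    "status: todo\n"
    "priority: high\n"
    f"{FENCE}\n\n"
    "Task description text.\n"
)
tasks = parse_task_blocks(sample)
```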

To create a new workplan:
1. Write the file following the format above
2. Notify the custodian operator to run `make fix-consistency REPO=artifact-store`
   (or send a message to the hub agent via `POST /messages/`)

---

## Current Repo Shape

This repository is in service-baseline planning. The current source of truth is:

- `INTENT.md` for purpose, product thesis, scope, and service boundary
- `docs/ARCHITECTURE-BLUEPRINT.md` for the draft architecture
- `workplans/ARTIFACT-STORE-WP-0001-service-baseline.md` for implementation tasks

No runnable service scaffold exists yet. Add install, dev-server, and test
commands here when `ARTIFACT-STORE-WP-0001-T001` lands.

## Repo Boundary

This repo owns artifact identity, package/file metadata, storage backend
abstraction, retention policy, retrieval metadata, and audit trails.

It does not own StateHub work records, guide-board assessment semantics, formal
records-management certification, or producer-specific business logic.
121
INTENT.md
Normal file
@@ -0,0 +1,121 @@
# INTENT

## Project Name

`artifact-store`

## Purpose

`artifact-store` is a generic artifact registry and storage gateway. It gives
projects a stable place to register files, evidence packages, logs, reports,
snapshots, exports, and other generated outputs without forcing every producer
to invent its own retention, indexing, and storage rules.

The service owns artifact identity, metadata, provenance, retention decisions,
lookup, and audit trails. Actual bytes are delegated to one or more configured
storage backends such as a local filesystem, S3-compatible object storage, Ceph
RGW, AWS S3, Azure Blob Storage, Google Cloud Storage, or future archival tiers.

## Product Thesis

Generated artifacts become valuable when they are findable, attributable,
retained for the right amount of time, and safely discardable when they are no
longer needed. Teams should be able to preserve a run result, point Statehub or
another system at its durable registry record, and later prove which files were
stored, which hashes they had, where they lived, and when retention was extended
or released.

`artifact-store` exists to make artifact preservation a shared platform concern
instead of an ad hoc directory convention.

## Primary Use Case

Given a producer such as `guide-board`, a completed assessment run, and an
artifact package directory, `artifact-store` should:

1. register the package and its files,
2. compute and store content hashes and sizes,
3. capture producer, subject, run, repository, commit, and environment metadata,
4. select the applicable retention rule,
5. write files through a configured storage backend,
6. record all storage locations and backend object keys,
7. provide stable retrieval metadata and download links,
8. allow retention extension or hold decisions,
9. expose enough index data for Statehub, release records, and future UIs,
10. make expired artifacts eligible for deletion through an auditable process.
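
Steps 1 and 2 of the list above can be sketched with the standard library alone. This is an illustration under assumptions, not the service's implementation: `describe_package` is a hypothetical name, SHA-256 is assumed as the digest, and the record keys are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_package(package_dir: Path) -> list[dict]:
    """Return one record per file: relative path, size, and SHA-256 digest."""
    records = []
    for path in sorted(p for p in package_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in chunks so large logs do not have to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        records.append({
            "relative_path": path.relative_to(package_dir).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": digest.hexdigest(),
        })
    return records


# Usage against a throwaway directory standing in for a run directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports").mkdir()
    (root / "reports" / "report.md").write_bytes(b"ok\n")
    records = describe_package(root)
```

Sorting the paths keeps the record order stable between runs, which makes two listings of the same directory directly comparable.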

The first concrete pilot is preserving `guide-board` / `open-cmis-tck`
assessment output for `kontextual-engine`.

## Intended Users

- Assessment and compliance tools that produce evidence packages.
- Build, release, and quality systems that need durable generated outputs.
- Statehub and repository automation that need to link work records to
  preserved evidence.
- Operators who need retention visibility and controlled deletion.
- Future UI and agent workflows that need artifact search, download, or restore
  status.

## Core Concepts

- Artifact package: a logical collection of files registered together, such as a
  guide-board assessment run directory.
- Artifact file: one stored file with a path, media type, size, digest, and
  storage location.
- Registry record: metadata and lifecycle state for an artifact package or file.
- Storage backend: a configured adapter that stores and retrieves bytes.
- Storage location: a backend-specific pointer such as a bucket/key, filesystem
  path, or future archive locator.
- Retention class: a named policy category such as transient, raw-evidence,
  release-evidence, audit-prep, or permanent-record.
- Retention rule: the default storage duration and deletion behavior for a class.
- Retention extension: a time-bounded extension of an artifact's expiry date.
- Hold: a stronger instruction that prevents deletion until explicitly released.
- Retrieval tier: a future storage or access class such as hot, warm, cold, or
  archived.

## Scope

In scope:

- metadata registry for artifact packages and files,
- content hashing and manifest generation,
- pluggable storage backend interface,
- local filesystem backend for development,
- S3-compatible backend suitable for Ceph RGW,
- default retention classes and expiry calculation,
- retention extension and hold records,
- retrieval metadata and download path generation,
- audit events for ingestion, retrieval, retention changes, and deletion,
- API-first service suitable for automation,
- pilot integration with guide-board assessment runs.

Out of scope for the initial service:

- replacing Statehub as the work, repository, or decision system of record,
- embedding guide-board-specific assessment semantics in the registry core,
- full compliance certification or legal-record guarantees,
- cloud-provider-specific lifecycle automation beyond backend adapter hooks,
- asynchronous cold-archive restore flows,
- user-facing UI beyond API contracts and minimal operator docs.

## Relationship To Other Services

`artifact-store` should remain a shared infrastructure service.

- `guide-board` produces assessment packages and asks `artifact-store` to
  preserve them.
- `open-cmis-tck` can add CMIS-specific scorecards and log reviews before a
  guide-board run is ingested.
- `Statehub` records work, decisions, repository state, and links to artifact
  registry identifiers.
- Ceph is a strong self-hosted storage backend candidate because its RGW layer is
  S3-compatible, but the registry must not be Ceph-only.

## Boundary

The registry can prove what it stored, where it stored it, which hashes it
computed, and which retention decisions were applied. It does not prove the
truth of the artifact contents, certify a system, or replace formal records
management without additional governance.
15
README.md
@@ -1,3 +1,14 @@
-# repo-seed
+# artifact-store

-A git repository template to bootstrap coulomb projects from.
+Generic artifact registry and storage gateway for generated outputs, evidence
+packages, reports, logs, and release artifacts.
+
+The registry owns artifact identity, metadata, provenance, retention policy, and
+retrieval records. Actual bytes are delegated to configured storage backends such
+as a local filesystem, S3-compatible object storage, or Ceph RGW.
+
+Start here:
+
+- [INTENT.md](INTENT.md)
+- [docs/ARCHITECTURE-BLUEPRINT.md](docs/ARCHITECTURE-BLUEPRINT.md)
+- [workplans/ARTIFACT-STORE-WP-0001-service-baseline.md](workplans/ARTIFACT-STORE-WP-0001-service-baseline.md)
133
SCOPE.md
Normal file
@@ -0,0 +1,133 @@
# SCOPE

> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete.

---

## One-liner

`artifact-store` is a generic artifact registry and storage gateway for durable
generated outputs, evidence packages, reports, logs, snapshots, exports, and
release artifacts.

---

## Core Idea

Generated artifacts become valuable when they are findable, attributable,
retained for the right amount of time, and safely discardable when they expire.
This repository makes artifact preservation a shared platform concern: producers
register packages and files, the registry owns metadata and lifecycle, and
storage backends own bytes.

---

## In Scope

- Artifact package and artifact file metadata.
- Content hashing, manifests, provenance, and audit events.
- Pluggable storage backend interface with local filesystem and S3-compatible
  backends as the first targets.
- Retention classes, expiry calculation, retention extension, and holds.
- Retrieval metadata and download/link surfaces for automation.
- Pilot ingestion flow for guide-board / OpenCMIS TCK assessment output.

---

## Out of Scope

- Replacing StateHub as the work, repository, or decision system of record.
- Encoding guide-board-specific assessment semantics in the registry core.
- Formal compliance certification or legal-record guarantees by itself.
- Cloud-provider-specific lifecycle automation beyond backend adapter hooks.
- User-facing UI beyond API contracts and minimal operator documentation.

---

## Relevant When

- A tool has generated files that need durable storage and stable identifiers.
- StateHub, release records, or operators need to link to preserved evidence.
- A producer needs backend-neutral storage across local disk, Ceph RGW, or other
  S3-compatible object storage.
- Retention, hold, or deletion eligibility needs to be explicit and auditable.

---

## Not Relevant When

- The artifact is purely temporary scratch output with no retention need.
- The work is about StateHub tasks, decisions, or repository catalog data rather
  than artifact bytes and metadata.
- A producer needs domain-specific scoring, validation, or assessment semantics.
- The requirement is a human-facing artifact browser rather than API-first
  preservation.

---

## Current State

- Status: concept / service-baseline planning
- Implementation: documentation and initial workplan only
- Stability: evolving
- Usage: none yet; first pilot target is guide-board assessment output

---

## How It Fits

- Upstream dependencies: producer repositories such as `guide-board` and
  `open-cmis-tck`; future storage backends such as local filesystem and Ceph RGW.
- Downstream consumers: StateHub records, release/evidence workflows, operators,
  and future artifact search or retrieval UIs.
- Often used with: `guide-board`, `open-cmis-tck`, `kontextual-engine`,
  StateHub, and S3-compatible object storage.

---

## Terminology

- Preferred terms: artifact package, artifact file, registry record, storage
  backend, storage location, retention class, retention rule, hold.
- Also known as: artifact registry, evidence store, storage gateway.
- Potentially confusing terms: StateHub links to artifact identifiers but does
  not store artifact bytes.

---

## Related / Overlapping

- `guide-board` - produces assessment packages and evidence output.
- `open-cmis-tck` - contributes CMIS-specific assessment artifacts for the pilot.
- `kontextual-engine` - first likely subject of preserved guide-board evidence.
- `the-custodian/state-hub` - records work, decisions, repository state, and
  links to artifact registry identifiers.

---

## Getting Oriented

- Start with: `INTENT.md`.
- Key files / directories: `docs/ARCHITECTURE-BLUEPRINT.md`, `workplans/`.
- Entry points: no service entry point yet; see
  `workplans/ARTIFACT-STORE-WP-0001-service-baseline.md`.

---

## Provided Capabilities

```capability
type: infrastructure
title: Artifact package preservation
description: Register generated artifact packages and files, store bytes through a configured backend, compute hashes, apply retention policy, and return stable package identifiers for StateHub or producer records.
keywords: [artifacts, evidence, retention, storage, registry, provenance]
```

---

## Notes

The first concrete pilot is preserving `guide-board` / `open-cmis-tck`
assessment output for `kontextual-engine`.
330
docs/ARCHITECTURE-BLUEPRINT.md
Normal file
@@ -0,0 +1,330 @@
# Artifact Store Architecture Blueprint

Status: draft
Created: 2026-05-15

## Purpose

`artifact-store` provides a generic registry and storage gateway for durable
generated artifacts. Producers register packages and files with metadata;
storage adapters persist the bytes; retention policy decides how long artifacts
remain eligible for retrieval.

The design keeps artifact identity and lifecycle separate from storage
implementation. This allows the first version to run against local filesystem
storage while the production path can use S3-compatible object storage such as
Ceph RGW.

## Architecture Summary

```text
producer
  -> Artifact Registry API
      -> metadata database
      -> retention policy engine
      -> audit event log
      -> storage adapter interface
          -> local filesystem backend
          -> S3-compatible backend
              -> Ceph RGW deployment
          -> future cloud/blob/archive backends
```

The registry is the authority for artifact metadata and lifecycle. Backends are
responsible for byte storage and retrieval.

## Design Principles

- Backend-neutral registry: no producer should know whether bytes live in Ceph,
  local disk, or a cloud bucket.
- Content-addressable confidence: every stored file has a digest and size.
- Retention by default: every package receives an expiry decision at ingestion.
- Extensions are explicit: retention extensions and holds are audit events, not
  silent metadata edits.
- Packages remain portable: a manifest should be enough to understand a package
  without calling the producer.
- Statehub links, it does not store bytes: Statehub records artifact IDs and
  outcomes; artifact-store owns file persistence.
- Deletion is deliberate: expiry makes artifacts eligible for deletion; deletion
  jobs must be auditable and reversible only when the backend still has data.

## Components

### Registry API

HTTP API for producers and operators.

Initial responsibilities:

- create artifact packages,
- upload or ingest files,
- finalize packages,
- retrieve package metadata,
- list/search packages by subject and producer metadata,
- create retention extensions and holds,
- expose download metadata or redirect/download endpoints,
- expose health and backend status.

### Metadata Store

Persistent database for registry state.

Initial implementation can use SQLite for local development and PostgreSQL for
shared service deployments if that matches the surrounding service stack.

Core tables:

- `artifact_packages`
- `artifact_files`
- `storage_locations`
- `retention_rules`
- `retention_events`
- `audit_events`

### Storage Adapter Interface

Small backend contract used by the API service.

Required operations:

- `put(object_key, stream, metadata) -> storage_location`
- `get(object_key) -> stream or signed_url`
- `head(object_key) -> object_metadata`
- `delete(object_key) -> deletion_result`
- `health() -> backend_status`
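
The five operations above could be expressed as a Python protocol with a small filesystem implementation. Everything here is an assumption for illustration: no implementation language has been chosen for the service, the class names and return shapes are hypothetical, `get` returns only a stream (no signed URL), and `put` accepts `metadata` but ignores it.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO, Protocol


class StorageBackend(Protocol):
    def put(self, object_key: str, stream: BinaryIO, metadata: dict) -> dict: ...
    def get(self, object_key: str) -> BinaryIO: ...
    def head(self, object_key: str) -> dict: ...
    def delete(self, object_key: str) -> dict: ...
    def health(self) -> dict: ...


class LocalFilesystemBackend:
    """Development backend that stores each object as a file under root."""

    def __init__(self, root: Path, backend_id: str = "local-dev"):
        self.root, self.backend_id = root, backend_id

    def _path(self, object_key: str) -> Path:
        # Refuse keys that would resolve outside the backend root.
        path = (self.root / object_key).resolve()
        if not path.is_relative_to(self.root.resolve()):
            raise ValueError("object key escapes the backend root")
        return path

    def put(self, object_key, stream, metadata):
        path = self._path(object_key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(stream.read())
        return {"backend_id": self.backend_id, "object_key": object_key}

    def get(self, object_key):
        return self._path(object_key).open("rb")

    def head(self, object_key):
        return {"size_bytes": self._path(object_key).stat().st_size}

    def delete(self, object_key):
        self._path(object_key).unlink()
        return {"deleted": True}

    def health(self):
        return {"ok": self.root.is_dir()}


with tempfile.TemporaryDirectory() as tmp:
    backend = LocalFilesystemBackend(Path(tmp))
    location = backend.put("pkg-1/run.json", io.BytesIO(b"{}"), {})
    size = backend.head("pkg-1/run.json")["size_bytes"]
```

Because the registry would depend only on the protocol, an S3-compatible class with the same five methods could be swapped in without touching producer-facing code.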

Initial backends:

- local filesystem backend for tests and development,
- S3-compatible backend for Ceph RGW and cloud object stores.

### Retention Policy Engine

Applies default rules at ingestion and records later changes.

Initial retention classes:

- `transient`: short-lived scratch artifacts,
- `raw-evidence`: raw logs and run output,
- `summary-evidence`: compact reports and summaries,
- `release-evidence`: release or customer-facing evidence packages,
- `permanent-record`: manually held records with no automatic expiry.

Each package stores:

- selected retention class,
- default retention rule,
- computed `expires_at`,
- extension records,
- hold records,
- deletion eligibility state.
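
A minimal sketch of the expiry calculation follows. The durations are placeholders invented for the example, since the blueprint names the classes but not their lengths; `compute_expires_at` and `is_deletion_eligible` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder durations: assumptions for illustration, not decided policy.
DEFAULT_RETENTION = {
    "transient": timedelta(days=7),
    "raw-evidence": timedelta(days=90),
    "summary-evidence": timedelta(days=365),
    "release-evidence": timedelta(days=3 * 365),
    "permanent-record": None,  # no automatic expiry
}


def compute_expires_at(retention_class: str, ingested_at: datetime) -> Optional[datetime]:
    """Apply the default rule for a class; None means the package never expires."""
    if retention_class not in DEFAULT_RETENTION:
        raise ValueError(f"unknown retention class: {retention_class}")
    duration = DEFAULT_RETENTION[retention_class]
    return None if duration is None else ingested_at + duration


def is_deletion_eligible(expires_at: Optional[datetime], active_holds: int,
                         now: datetime) -> bool:
    """Expired packages are eligible only when no hold is active."""
    return expires_at is not None and now >= expires_at and active_holds == 0


ingested = datetime(2026, 5, 15, tzinfo=timezone.utc)
expires = compute_expires_at("raw-evidence", ingested)
```

Keeping eligibility as a separate check reflects the principle that expiry only makes a package eligible for deletion; it does not delete anything.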

### Audit Log

Append-only record of important events:

- package created,
- file uploaded,
- package finalized,
- retrieval requested,
- retention extended,
- hold applied or released,
- deletion requested,
- deletion completed or failed.

The audit log does not need to be cryptographic in the first release, but the
schema should leave room for signed events or external write-once storage later.

## Data Model

### Artifact Package

Required fields:

- `id`
- `name`
- `producer`
- `subject`
- `retention_class`
- `status`
- `created_at`
- `finalized_at`
- `expires_at`
- `metadata`

Recommended metadata keys:

- `repo_slug`
- `run_id`
- `assessment_id`
- `target_profile_ref`
- `assessment_profile_ref`
- `source_commits`
- `tool_versions`
- `environment`

### Artifact File

Required fields:

- `id`
- `package_id`
- `relative_path`
- `media_type`
- `size_bytes`
- `sha256`
- `created_at`

### Storage Location

Required fields:

- `id`
- `artifact_file_id`
- `backend_id`
- `object_key`
- `storage_class`
- `status`
- `created_at`
- `last_verified_at`

### Retention Event

Required fields:

- `id`
- `package_id`
- `event_type`
- `reason`
- `created_by`
- `created_at`
- `previous_expires_at`
- `new_expires_at`

Event types:

- `default_rule_applied`
- `extended`
- `hold_applied`
- `hold_released`
- `deletion_eligible`
- `deleted`
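
The retention event fields map naturally onto an immutable record. The sketch below shows one way an `extended` event could be produced; the dataclass and the `extend_retention` helper are hypothetical, and using a UUID for `id` is an assumption.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionEvent:
    id: str
    package_id: str
    event_type: str
    reason: str
    created_by: str
    created_at: datetime
    previous_expires_at: Optional[datetime]
    new_expires_at: Optional[datetime]


def extend_retention(package_id: str, current_expires_at: datetime,
                     extra: timedelta, reason: str, created_by: str,
                     now: datetime) -> RetentionEvent:
    """Return an `extended` event instead of editing expires_at in place."""
    if extra <= timedelta(0):
        raise ValueError("an extension must move the expiry forward")
    return RetentionEvent(
        id=str(uuid.uuid4()),
        package_id=package_id,
        event_type="extended",
        reason=reason,
        created_by=created_by,
        created_at=now,
        previous_expires_at=current_expires_at,
        new_expires_at=current_expires_at + extra,
    )


now = datetime(2026, 5, 15, tzinfo=timezone.utc)
event = extend_retention("pkg-1", now + timedelta(days=90), timedelta(days=30),
                         "audit follow-up", "operator", now)
```

Recording both the previous and the new expiry in one event keeps the change auditable without having to replay earlier events.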

## API Shape

Initial endpoints:

```text
GET  /health
GET  /backends
POST /packages
GET  /packages
GET  /packages/{package_id}
POST /packages/{package_id}/files
POST /packages/{package_id}/finalize
GET  /packages/{package_id}/manifest
GET  /files/{file_id}/download
POST /packages/{package_id}/retention/extensions
POST /packages/{package_id}/retention/holds
POST /packages/{package_id}/retention/holds/{hold_id}/release
```

The first ingestion path can accept multipart file uploads. A later trusted-local
operator endpoint may ingest from server-local paths, but it should be disabled
by default because path ingestion changes the security boundary.

## Package Manifest

Every finalized package should expose a JSON manifest containing:

- package metadata,
- retention summary,
- file list,
- file digests and sizes,
- storage backend references,
- source metadata,
- created/finalized timestamps.
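
One possible manifest layout is sketched below. The key names follow the data model fields listed earlier in this blueprint, but the nesting is an assumption, `build_manifest` is a hypothetical helper, and the sample values (including the all-zero digest) are placeholders.

```python
import json


def build_manifest(package: dict, files: list[dict]) -> dict:
    """Assemble a JSON-serialisable manifest for a finalized package."""
    return {
        "package": {k: package[k] for k in ("id", "name", "producer", "subject")},
        "retention": {
            "retention_class": package["retention_class"],
            "expires_at": package["expires_at"],
        },
        "files": [
            {
                "relative_path": f["relative_path"],
                "size_bytes": f["size_bytes"],
                "sha256": f["sha256"],
                "storage": {"backend_id": f["backend_id"],
                            "object_key": f["object_key"]},
            }
            for f in files
        ],
        "source": package.get("metadata", {}),
        "created_at": package["created_at"],
        "finalized_at": package["finalized_at"],
    }


sample_package = {
    "id": "pkg-1", "name": "example run", "producer": "guide-board",
    "subject": "kontextual-engine", "retention_class": "raw-evidence",
    "expires_at": "2026-08-13T00:00:00Z", "created_at": "2026-05-15T00:00:00Z",
    "finalized_at": "2026-05-15T00:05:00Z", "metadata": {"run_id": "run-1"},
}
sample_files = [{
    "relative_path": "run.json", "size_bytes": 2, "sha256": "0" * 64,
    "backend_id": "local-dev", "object_key": "pkg-1/run.json",
}]
manifest_json = json.dumps(build_manifest(sample_package, sample_files), indent=2)
```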

For guide-board runs, the manifest should preserve links to:

- `run.json`
- `retention-summary.json`
- `reports/assessment-package.json`
- `reports/report.md`
- extension-generated scorecards or log reviews,
- raw artifact files captured by the assessment package manifest.

## Guide-Board Pilot Flow

```text
guide-board run directory
  -> open-cmis-tck scorecard/log review
  -> artifact-store package create
  -> upload run files
  -> finalize manifest
  -> Statehub record links package id and summary
```

The artifact package should carry:

- run id,
- target profile reference,
- assessment profile reference,
- result status,
- source commits for guide-board, open-cmis-tck, and the assessed repository,
- important report paths,
- retention class `raw-evidence` or `release-evidence`.

## Ceph And S3-Compatible Storage

Ceph should be introduced through the S3-compatible adapter, not as a special
case in producer logic.

Configuration should support:

- endpoint URL,
- bucket,
- region,
- access key reference,
- secret key reference,
- optional server-side encryption settings,
- object key prefix,
- storage class label.

The service should never require credentials in producer request bodies. Use
environment variables, mounted secret files, or a local secret provider.
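
The configuration list above can be sketched as a dataclass that holds references to credentials rather than the credentials themselves. All names here are hypothetical, including the environment variable names and the endpoint URL; only the environment-variable option is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3BackendConfig:
    endpoint_url: str
    bucket: str
    region: str
    access_key_env: str   # name of the env var holding the access key
    secret_key_env: str   # name of the env var holding the secret key
    key_prefix: str = ""
    storage_class: str = "standard"
    sse: Optional[str] = None  # optional server-side encryption setting

    def resolve_credentials(self, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
        """Look up the referenced secrets at use time; never store them here."""
        try:
            return env[self.access_key_env], env[self.secret_key_env]
        except KeyError as missing:
            raise RuntimeError(f"missing credential variable {missing}") from None


config = S3BackendConfig(
    endpoint_url="http://ceph-rgw.internal:7480",  # hypothetical endpoint
    bucket="artifacts",
    region="us-east-1",
    access_key_env="ARTIFACT_STORE_S3_ACCESS_KEY",
    secret_key_env="ARTIFACT_STORE_S3_SECRET_KEY",
)
# A plain dict stands in for the process environment in this example.
access, secret = config.resolve_credentials({
    "ARTIFACT_STORE_S3_ACCESS_KEY": "example-access",
    "ARTIFACT_STORE_S3_SECRET_KEY": "example-secret",
})
```

Because the config object carries only variable names, it can be logged or stored in registry metadata without leaking secret values.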

## Future Retrieval Tiers

The initial API can treat all stored files as immediately retrievable. Later,
storage locations can include:

- `retrieval_tier`: hot, warm, cold, archive,
- `restore_status`: available, restore_requested, restoring, restored, expired,
- `restore_requested_at`,
- `restore_expires_at`.

The registry API should be able to return "not immediately available" without
changing artifact identity.

## Security Boundary

Initial service assumptions:

- internal service, not public internet exposed,
- authenticated producer/operator API before shared deployment,
- no secret values stored in artifact metadata,
- package paths are logical paths, not trusted filesystem paths,
- download authorization should be checked at the registry layer.

Files may contain sensitive evidence. The service must treat metadata and bytes
as confidential by default.
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
- Which identity provider should guard shared deployments?
|
||||||
|
- Should package metadata schemas be open-ended JSON or typed by producer?
|
||||||
|
- Should deduplication be package-local only or global by content hash?
|
||||||
|
- Should deletion first mark records deleted, then delete bytes, or reverse that
|
||||||
|
order with compensating events?
|
||||||
|
- How much Statehub integration belongs in this repo versus in Statehub clients?
|
||||||
229
workplans/ARTIFACT-STORE-WP-0001-service-baseline.md
Normal file
@@ -0,0 +1,229 @@
|
|||||||
|
---
|
||||||
|
id: ARTIFACT-STORE-WP-0001
|
||||||
|
type: workplan
|
||||||
|
title: "Artifact Store Service Baseline"
|
||||||
|
repo: artifact-store
|
||||||
|
domain: stack
|
||||||
|
status: active
|
||||||
|
owner: codex
|
||||||
|
topic_slug: stack
|
||||||
|
planning_priority: high
|
||||||
|
planning_order: 1
|
||||||
|
created: "2026-05-15"
|
||||||
|
updated: "2026-05-15"
|
||||||
|
state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb"
|
||||||
|
---
|
||||||
|
|
||||||
|
# ARTIFACT-STORE-WP-0001: Artifact Store Service Baseline
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Implement the first usable artifact registry and storage gateway. The service
|
||||||
|
should preserve artifact packages, index their metadata, delegate bytes to a
|
||||||
|
configured storage backend, apply default retention rules, and expose stable
|
||||||
|
package identifiers that Statehub and producer repositories can link to.
|
||||||
|
|
||||||
|
The first producer target is a guide-board assessment run, including OpenCMIS TCK
|
||||||
|
reports and raw assessment artifacts.
|
||||||
|
|
||||||
|
## Background
|
||||||
|
|
||||||
|
Guide-board can already produce self-contained run directories with retention
|
||||||
|
summaries, assessment packages, raw artifacts, scorecards, and log reviews. Those
|
||||||
|
directories should not live only in `/tmp`, and committing raw evidence into
|
||||||
|
producer repositories is the wrong long-term shape.
|
||||||
|
|
||||||
|
`artifact-store` becomes the shared preservation layer:
|
||||||
|
|
||||||
|
- producers generate files,
|
||||||
|
- artifact-store registers and stores them,
|
||||||
|
- Statehub records the work outcome and links to the registry package,
|
||||||
|
- storage backends handle durable bytes.
|
||||||
|
|
||||||
|
Ceph is the likely self-hosted production backend through its S3-compatible RGW
|
||||||
|
interface, but the service must keep the backend interface generic.
|
||||||
|
|
||||||
|
## Target Architecture
|
||||||
|
|
||||||
|
```text
|
||||||
|
producer package
|
||||||
|
-> registry API
|
||||||
|
-> metadata database
|
||||||
|
-> retention policy engine
|
||||||
|
-> storage adapter
|
||||||
|
-> local filesystem or S3-compatible object storage
|
||||||
|
```
|
||||||
|
|
||||||
|
## Boundary
|
||||||
|
|
||||||
|
This workplan owns the first service implementation and API contract. It does
|
||||||
|
not need to build a UI, implement cold-storage restore tiers, replace Statehub,
|
||||||
|
or provide formal records-management certification.
|
||||||
|
|
||||||
|
## D1.1 - Service Scaffold And Repository Identity
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T001
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Replace the seed README with artifact-store service instructions.
|
||||||
|
- Add a Python service scaffold with a clear package/module layout.
|
||||||
|
- Provide a local development command.
|
||||||
|
- Provide a test command.
|
||||||
|
- Keep generated artifact bytes and local databases ignored by git.
|
||||||
|
- Document required environment variables.
|
||||||
|
|
||||||
|
## D1.2 - Registry Data Model
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T002
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Define persistent models for artifact packages, files, storage locations,
|
||||||
|
retention rules, retention events, and audit events.
|
||||||
|
- Store package metadata as structured JSON while keeping core query fields
|
||||||
|
explicit.
|
||||||
|
- Record package lifecycle status: created, uploading, finalized, deleted, and
|
||||||
|
failed.
|
||||||
|
- Record file `sha256`, size, media type, and logical relative path.
|
||||||
|
- Add migrations or a reproducible schema initialization path.
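
One possible in-memory shape for the package and file records, before any database layer is chosen. The lifecycle status names come from the list above; the allowed transitions, the class names, and the `producer` query field are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Set


class PackageStatus(str, Enum):
    CREATED = "created"
    UPLOADING = "uploading"
    FINALIZED = "finalized"
    DELETED = "deleted"
    FAILED = "failed"


# Assumed transition table; the workplan names the states but not the edges.
ALLOWED_TRANSITIONS: Dict[PackageStatus, Set[PackageStatus]] = {
    PackageStatus.CREATED: {PackageStatus.UPLOADING, PackageStatus.FAILED},
    PackageStatus.UPLOADING: {PackageStatus.FINALIZED, PackageStatus.FAILED},
    PackageStatus.FINALIZED: {PackageStatus.DELETED},
    PackageStatus.FAILED: {PackageStatus.DELETED},
    PackageStatus.DELETED: set(),
}


@dataclass(frozen=True)
class FileRecord:
    relative_path: str  # logical path inside the package
    sha256: str
    size_bytes: int
    media_type: str


@dataclass
class PackageRecord:
    package_id: str
    producer: str  # explicit query field (hypothetical)
    retention_class: str  # explicit query field
    status: PackageStatus = PackageStatus.CREATED
    metadata: Dict[str, Any] = field(default_factory=dict)  # structured JSON

    def transition(self, new_status: PackageStatus) -> None:
        """Move to a new lifecycle status or raise if the edge is not allowed."""
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot move package from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
```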
|
||||||
|
|
||||||
|
## D1.3 - Local Filesystem Storage Backend
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T003
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Implement a storage adapter interface.
|
||||||
|
- Implement a local filesystem backend for development and tests.
|
||||||
|
- Store objects under deterministic package/file keys.
|
||||||
|
- Prevent path traversal and accidental writes outside the configured storage
|
||||||
|
root.
|
||||||
|
- Add backend health reporting.
|
||||||
|
- Add tests for put, get, head, and delete operations.
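
The acceptance items above could be met by something like the following sketch. The `StorageAdapter` protocol, the `LocalFilesystemBackend` class, and the return types of `head` and `delete` are assumptions; the traversal check relies on resolving the candidate path and confirming it still sits under the configured root.

```python
import os
from pathlib import Path
from typing import Dict, Optional, Protocol


class StorageAdapter(Protocol):
    """Hypothetical adapter contract shared by local and S3-compatible backends."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def head(self, key: str) -> Optional[int]: ...
    def delete(self, key: str) -> bool: ...


class LocalFilesystemBackend:
    def __init__(self, root: Path) -> None:
        self._root = Path(root).resolve()
        self._root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, key: str) -> Path:
        # Keys are logical paths: resolve them and refuse anything that
        # lands on or outside the storage root ("..", absolute paths, symlinks).
        candidate = (self._root / key).resolve()
        if self._root not in candidate.parents:
            raise ValueError(f"key escapes storage root: {key!r}")
        return candidate

    def put(self, key: str, data: bytes) -> None:
        path = self._resolve(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._resolve(key).read_bytes()

    def head(self, key: str) -> Optional[int]:
        """Return the object size in bytes, or None if it does not exist."""
        path = self._resolve(key)
        return path.stat().st_size if path.is_file() else None

    def delete(self, key: str) -> bool:
        """Delete the object; return False if there was nothing to delete."""
        path = self._resolve(key)
        if not path.is_file():
            return False
        path.unlink()
        return True

    def health(self) -> Dict[str, object]:
        return {"backend": "local", "root_writable": os.access(self._root, os.W_OK)}
```

A deterministic key such as `<package_id>/<relative_path>` keeps the layout predictable while the check in `_resolve` ensures that a crafted relative path cannot write outside the root.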
|
||||||
|
|
||||||
|
## D1.4 - Package Ingestion API
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T004
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "e3879111-4be9-4731-8aea-15abb874f960"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Add endpoints to create a package, upload files, finalize a package, retrieve
|
||||||
|
package metadata, list packages, and download files.
|
||||||
|
- Compute file hashes server-side during ingestion.
|
||||||
|
- Reject duplicate logical paths within one package unless explicitly replacing
|
||||||
|
a non-finalized file.
|
||||||
|
- Produce a package manifest after finalization.
|
||||||
|
- Add API tests covering successful ingestion and validation failures.
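
Server-side hashing and the duplicate-path rule could look roughly like this. The function names `ingest_stream` and `check_logical_path`, the chunk size, and the `allow_replace` flag are illustrative assumptions, independent of whichever web framework the service ends up using.

```python
import hashlib
from typing import BinaryIO, Set, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB; arbitrary choice for this sketch


def ingest_stream(source: BinaryIO, sink: BinaryIO) -> Tuple[str, int]:
    """Copy an upload to storage while computing its SHA-256 and size.

    The hash is derived from the bytes the server actually received, so a
    client-supplied digest is never trusted on its own.
    """
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
        sink.write(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def check_logical_path(
    existing: Set[str], path: str, finalized: bool, allow_replace: bool = False
) -> None:
    """Reject duplicate logical paths unless replacing in a non-finalized package."""
    if finalized:
        raise ValueError("package is finalized; no further uploads accepted")
    if path in existing and not allow_replace:
        raise ValueError(f"duplicate logical path: {path!r}")
```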
|
||||||
|
|
||||||
|
## D1.5 - Retention Baseline
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T005
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "2d6cbd83-c348-45ad-a223-7870a3412225"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Seed default retention classes for transient, raw-evidence, summary-evidence,
|
||||||
|
release-evidence, and permanent-record.
|
||||||
|
- Apply a default `expires_at` when a package is created or finalized.
|
||||||
|
- Add endpoints to extend retention and apply or release holds.
|
||||||
|
- Record retention changes as retention events and audit events.
|
||||||
|
- Expose deletion eligibility without deleting bytes automatically in the first
|
||||||
|
implementation.
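
A sketch of the retention defaults and the deletion-eligibility check. The class names are taken from the list above, but every retention period shown here is a placeholder assumption; the real durations are not defined in this workplan. Treating `permanent-record` as never expiring is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Placeholder periods only; actual values still need to be decided.
DEFAULT_RETENTION: Dict[str, Optional[timedelta]] = {
    "transient": timedelta(days=7),
    "raw-evidence": timedelta(days=90),
    "summary-evidence": timedelta(days=365),
    "release-evidence": timedelta(days=5 * 365),
    "permanent-record": None,  # assumed to never expire
}


def default_expires_at(retention_class: str, start: datetime) -> Optional[datetime]:
    """Return the default expiry for a package, or None if it never expires."""
    if retention_class not in DEFAULT_RETENTION:
        raise ValueError(f"unknown retention class: {retention_class!r}")
    period = DEFAULT_RETENTION[retention_class]
    return None if period is None else start + period


def is_deletion_eligible(
    expires_at: Optional[datetime], on_hold: bool, now: datetime
) -> bool:
    """Report eligibility only; this sketch never deletes bytes itself."""
    if on_hold or expires_at is None:
        return False
    return now >= expires_at
```

Keeping eligibility as a pure function makes it easy to expose through the API while leaving actual byte deletion to a later, explicit operator step.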
|
||||||
|
|
||||||
|
## D1.6 - S3-Compatible Backend Design Hook
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T006
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "7b980a55-2364-48c3-98ac-081629a8d2b7"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Define configuration fields for an S3-compatible backend.
|
||||||
|
- Keep the adapter contract compatible with Ceph RGW.
|
||||||
|
- Add an implementation stub or feature-flagged backend if dependencies are not
|
||||||
|
ready.
|
||||||
|
- Document expected Ceph/S3 configuration without requiring a live Ceph service
|
||||||
|
for baseline tests.
|
||||||
|
|
||||||
|
## D1.7 - Guide-Board Pilot Ingestion
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T007
|
||||||
|
status: todo
|
||||||
|
priority: high
|
||||||
|
state_hub_task_id: "eb822821-353c-4cd2-95bf-acb2f084b7ea"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Provide a CLI helper or documented curl flow to register a guide-board run
|
||||||
|
directory as one package.
|
||||||
|
- Preserve guide-board run metadata: run id, target profile, assessment profile,
|
||||||
|
evidence result counts, finding counts, source commits, and report paths.
|
||||||
|
- Ingest the CMIS pilot run shape, including scorecard and log-review reports.
|
||||||
|
- Return a package id suitable for recording in Statehub.
|
||||||
|
- Add a fixture-based test that does not require the real OpenCMIS TCK.
|
||||||
|
|
||||||
|
## D1.8 - Operator Documentation And Handoff
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: ARTIFACT-STORE-WP-0001-T008
|
||||||
|
status: todo
|
||||||
|
priority: medium
|
||||||
|
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance:
|
||||||
|
|
||||||
|
- Document local run, test, and package ingestion commands.
|
||||||
|
- Document retention behavior and extension flow.
|
||||||
|
- Document the boundary between artifact-store and Statehub.
|
||||||
|
- Include a dev-agent handoff section listing the first implementation order.
|
||||||
|
- Keep architecture docs aligned with the implemented API.
|
||||||
|
|
||||||
|
## Suggested Implementation Order
|
||||||
|
|
||||||
|
1. Service scaffold, test harness, and README.
|
||||||
|
2. Metadata models and local database setup.
|
||||||
|
3. Local filesystem storage adapter.
|
||||||
|
4. Package create/upload/finalize/download API.
|
||||||
|
5. Retention defaults, extension, hold, and audit events.
|
||||||
|
6. Guide-board run ingestion helper.
|
||||||
|
7. S3-compatible backend configuration and Ceph notes.
|
||||||
|
|
||||||
|
## First Pilot Success Criteria
|
||||||
|
|
||||||
|
- A completed guide-board CMIS run can be ingested from a local directory.
|
||||||
|
- The package manifest lists every stored file with SHA-256 and size.
|
||||||
|
- The registry returns a stable package id.
|
||||||
|
- Files can be downloaded through the service.
|
||||||
|
- Default retention is visible and can be extended.
|
||||||
|
- Statehub can record the package id and summary without storing artifact bytes.
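
To make the manifest criterion concrete, here is a minimal sketch that walks a local run directory and lists every file with its SHA-256 and size. The function name `build_manifest` and the manifest keys are assumptions; the real manifest schema is not defined here. Files are read whole for brevity, which a production path would replace with streaming.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(run_dir: Path) -> Dict[str, object]:
    """List every file under run_dir with a logical path, SHA-256, and size."""
    files: List[Dict[str, object]] = []
    for path in sorted(p for p in run_dir.rglob("*") if p.is_file()):
        data = path.read_bytes()
        files.append(
            {
                # POSIX-style relative path, used as the logical path
                "path": path.relative_to(run_dir).as_posix(),
                "sha256": hashlib.sha256(data).hexdigest(),
                "size": len(data),
            }
        )
    return {"file_count": len(files), "files": files}
```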
|
||||||