Bootstrapping the repo

2026-05-15 20:08:32 +02:00
parent c99ffe429f
commit 793c0c7ba5
6 changed files with 1006 additions and 2 deletions

AGENTS.md

@@ -0,0 +1,180 @@
# artifact-store — Agent Instructions
## Repo Identity
**Purpose:** Generic artifact registry and storage gateway for generated outputs,
evidence packages, reports, logs, snapshots, exports, and release artifacts.
**Domain:** stack
**Repo slug:** artifact-store
**Topic ID:** `595afc64-bd28-47bf-aafb-ba230b28371b`
**Workplan prefix:** `ARTIFACT-STORE-WP-`
---
## State Hub Integration
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
there is no MCP server for Codex agents.
| Context | URL |
|---------|-----|
| Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` |
### Orient at session start
```bash
# Offline brief — works without hub connection
cat .custodian-brief.md
# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=595afc64-bd28-47bf-aafb-ba230b28371b&status=active" \
| python3 -m json.tool
# Check inbox
curl -s "http://127.0.0.1:8000/messages/?to_agent=artifact-store&unread_only=true" \
| python3 -m json.tool
```
Mark a message read:
```bash
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
### Log progress (required at session close)
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{
"summary": "what was done",
"event_type": "note",
"author": "codex",
"workstream_id": "<uuid>",
"task_id": "<uuid>"
}'
```
Omit `workstream_id` / `task_id` when not applicable.
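The same progress call can be sketched in Python using only the standard library. This is a minimal illustration, not part of the hub tooling: `build_progress_request` is a hypothetical helper name, and the payload keys are taken from the curl example above. Optional IDs are left out of the body when not supplied.

```python
import json
import urllib.request
from typing import Optional

HUB_URL = "http://127.0.0.1:8000"  # local workstation URL from the table above


def build_progress_request(
    summary: str,
    workstream_id: Optional[str] = None,
    task_id: Optional[str] = None,
    author: str = "codex",
    event_type: str = "note",
) -> urllib.request.Request:
    """Build (but do not send) a POST /progress/ request."""
    payload = {"summary": summary, "event_type": event_type, "author": author}
    # Omit the optional IDs entirely when they do not apply.
    if workstream_id:
        payload["workstream_id"] = workstream_id
    if task_id:
        payload["task_id"] = task_id
    return urllib.request.Request(
        f"{HUB_URL}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_progress_request("what was done")
```

Sending it would be `urllib.request.urlopen(request)`, which needs a reachable hub.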
### Update task status
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
-H "Content-Type: application/json" \
-d '{"status": "in_progress"}'
# values: todo | in_progress | done | blocked
```
### Flag a task for human review
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
-H "Content-Type: application/json" \
-d '{"needs_human": true, "intervention_note": "reason"}'
```
---
## Session Protocol
**Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
2. Check inbox: `GET /messages/?to_agent=artifact-store&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: active` files and open tasks
4. Check blocked tasks: `GET /tasks/?needs_human=true`
**During work:**
- Update task statuses in workplan files as tasks progress
- Record significant decisions via `POST /decisions/`
**Close:**
1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from
`~/the-custodian/state-hub`:
```bash
make fix-consistency REPO=artifact-store
```
This syncs task status from files into the hub DB.
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
read/cache/index layer that rebuilds from files.
**File location:** `workplans/ARTIFACT-STORE-WP-NNNN-<slug>.md`
**Archived location:** completed workplans may move to
`workplans/archived/YYMMDD-ARTIFACT-STORE-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
the completion/archive date; the frontmatter `id` does not change.
**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
this only for low-risk work completed directly; create a normal workplan for
anything needing analysis, design, approval, dependencies, or multiple phases.
**Frontmatter:**
```yaml
---
id: ARTIFACT-STORE-WP-NNNN
type: workplan
title: "..."
domain: stack
repo: artifact-store
status: active | done
owner: codex
topic_slug: ...
created: "YYYY-MM-DD"
updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit
---
```
**Task block format** (one per `##` section):
```
## Task Title
` ` `task
id: ARTIFACT-STORE-WP-NNNN-T01
status: todo | in_progress | done | blocked
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
` ` `
Task description text.
```
Status progression: `todo` → `in_progress` → `done` (or `blocked`)
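As an illustration of how the task blocks above could be read by a script, here is a minimal sketch. It is not the parser used by `fix-consistency`; `parse_task_blocks` is a hypothetical helper, and it assumes each field sits on one `key: value` line with optional trailing `#` comments.

```python
import re

# Build the fence marker at runtime so this sketch can itself sit in a fenced block.
FENCE = "`" * 3
TASK_BLOCK = re.compile(
    rf"^{FENCE}task\n(.*?)^{FENCE}\s*$", re.MULTILINE | re.DOTALL
)


def parse_task_blocks(text: str) -> list[dict]:
    """Return one dict of fields per task block found in a workplan file."""
    tasks = []
    for match in TASK_BLOCK.finditer(text):
        fields = {}
        for line in match.group(1).splitlines():
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip().strip('"')
        tasks.append(fields)
    return tasks
```

A caller would pass the full text of a workplan file and filter the result by `status`.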
To create a new workplan:
1. Write the file following the format above
2. Notify the custodian operator to run `make fix-consistency REPO=artifact-store`
(or send a message to the hub agent via `POST /messages/`)
---
## Current Repo Shape
This repository is in service-baseline planning. The current source of truth is:
- `INTENT.md` for purpose, product thesis, scope, and service boundary
- `docs/ARCHITECTURE-BLUEPRINT.md` for the draft architecture
- `workplans/ARTIFACT-STORE-WP-0001-service-baseline.md` for implementation tasks
No runnable service scaffold exists yet. Add install, dev-server, and test
commands here when `ARTIFACT-STORE-WP-0001-T001` lands.
## Repo Boundary
This repo owns artifact identity, package/file metadata, storage backend
abstraction, retention policy, retrieval metadata, and audit trails.
It does not own StateHub work records, guide-board assessment semantics, formal
records-management certification, or producer-specific business logic.

INTENT.md

@@ -0,0 +1,121 @@
# INTENT
## Project Name
`artifact-store`
## Purpose
`artifact-store` is a generic artifact registry and storage gateway. It gives
projects a stable place to register files, evidence packages, logs, reports,
snapshots, exports, and other generated outputs without forcing every producer
to invent its own retention, indexing, and storage rules.
The service owns artifact identity, metadata, provenance, retention decisions,
lookup, and audit trails. Actual bytes are delegated to one or more configured
storage backends such as a local filesystem, S3-compatible object storage, Ceph
RGW, AWS S3, Azure Blob Storage, Google Cloud Storage, or future archival tiers.
## Product Thesis
Generated artifacts become valuable when they are findable, attributable,
retained for the right amount of time, and safely discardable when they are no
longer needed. Teams should be able to preserve a run result, point Statehub or
another system at its durable registry record, and later prove which files were
stored, which hashes they had, where they lived, and when retention was extended
or released.
`artifact-store` exists to make artifact preservation a shared platform concern
instead of an ad hoc directory convention.
## Primary Use Case
Given a producer such as `guide-board`, a completed assessment run, and an
artifact package directory, `artifact-store` should:
1. register the package and its files,
2. compute and store content hashes and sizes,
3. capture producer, subject, run, repository, commit, and environment metadata,
4. select the applicable retention rule,
5. write files through a configured storage backend,
6. record all storage locations and backend object keys,
7. provide stable retrieval metadata and download links,
8. allow retention extension or hold decisions,
9. expose enough index data for Statehub, release records, and future UIs,
10. make expired artifacts eligible for deletion through an auditable process.
The first concrete pilot is preserving `guide-board` / `open-cmis-tck`
assessment output for `kontextual-engine`.
## Intended Users
- Assessment and compliance tools that produce evidence packages.
- Build, release, and quality systems that need durable generated outputs.
- Statehub and repository automation that need to link work records to
preserved evidence.
- Operators who need retention visibility and controlled deletion.
- Future UI and agent workflows that need artifact search, download, or restore
status.
## Core Concepts
- Artifact package: a logical collection of files registered together, such as a
guide-board assessment run directory.
- Artifact file: one stored file with a path, media type, size, digest, and
storage location.
- Registry record: metadata and lifecycle state for an artifact package or file.
- Storage backend: a configured adapter that stores and retrieves bytes.
- Storage location: a backend-specific pointer such as a bucket/key, filesystem
path, or future archive locator.
- Retention class: a named policy category such as transient, raw-evidence,
release-evidence, audit-prep, or permanent-record.
- Retention rule: the default storage duration and deletion behavior for a class.
- Retention extension: a time-bounded extension of an artifact's expiry date.
- Hold: a stronger instruction that prevents deletion until explicitly released.
- Retrieval tier: a future storage or access class such as hot, warm, cold, or
archived.
## Scope
In scope:
- metadata registry for artifact packages and files,
- content hashing and manifest generation,
- pluggable storage backend interface,
- local filesystem backend for development,
- S3-compatible backend suitable for Ceph RGW,
- default retention classes and expiry calculation,
- retention extension and hold records,
- retrieval metadata and download path generation,
- audit events for ingestion, retrieval, retention changes, and deletion,
- API-first service suitable for automation,
- pilot integration with guide-board assessment runs.
Out of scope for the initial service:
- replacing Statehub as the work, repository, or decision system of record,
- embedding guide-board-specific assessment semantics in the registry core,
- full compliance certification or legal-record guarantees,
- cloud-provider-specific lifecycle automation beyond backend adapter hooks,
- asynchronous cold-archive restore flows,
- user-facing UI beyond API contracts and minimal operator docs.
## Relationship To Other Services
`artifact-store` should remain a shared infrastructure service.
- `guide-board` produces assessment packages and asks `artifact-store` to
preserve them.
- `open-cmis-tck` can add CMIS-specific scorecards and log reviews before a
guide-board run is ingested.
- `Statehub` records work, decisions, repository state, and links to artifact
registry identifiers.
- Ceph is a strong self-hosted storage backend candidate because its RGW layer is
S3-compatible, but the registry must not be Ceph-only.
## Boundary
The registry can prove what it stored, where it stored it, which hashes it
computed, and which retention decisions were applied. It does not prove the
truth of the artifact contents, certify a system, or replace formal records
management without additional governance.


@@ -1,3 +1,14 @@
-# repo-seed
+# artifact-store
-A git repository template to bootstrap coulomb projects from.
+Generic artifact registry and storage gateway for generated outputs, evidence
+packages, reports, logs, and release artifacts.
+The registry owns artifact identity, metadata, provenance, retention policy, and
+retrieval records. Actual bytes are delegated to configured storage backends such
+as a local filesystem, S3-compatible object storage, or Ceph RGW.
+Start here:
+- [INTENT.md](INTENT.md)
+- [docs/ARCHITECTURE-BLUEPRINT.md](docs/ARCHITECTURE-BLUEPRINT.md)
+- [workplans/ARTIFACT-STORE-WP-0001-service-baseline.md](workplans/ARTIFACT-STORE-WP-0001-service-baseline.md)

SCOPE.md

@@ -0,0 +1,133 @@
# SCOPE
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete.
---
## One-liner
`artifact-store` is a generic artifact registry and storage gateway for durable
generated outputs, evidence packages, reports, logs, snapshots, exports, and
release artifacts.
---
## Core Idea
Generated artifacts become valuable when they are findable, attributable,
retained for the right amount of time, and safely discardable when they expire.
This repository makes artifact preservation a shared platform concern: producers
register packages and files, the registry owns metadata and lifecycle, and
storage backends own bytes.
---
## In Scope
- Artifact package and artifact file metadata.
- Content hashing, manifests, provenance, and audit events.
- Pluggable storage backend interface with local filesystem and S3-compatible
backends as the first targets.
- Retention classes, expiry calculation, retention extension, and holds.
- Retrieval metadata and download/link surfaces for automation.
- Pilot ingestion flow for guide-board / OpenCMIS TCK assessment output.
---
## Out of Scope
- Replacing StateHub as the work, repository, or decision system of record.
- Encoding guide-board-specific assessment semantics in the registry core.
- Providing formal compliance certification or legal-record guarantees on its own.
- Cloud-provider-specific lifecycle automation beyond backend adapter hooks.
- User-facing UI beyond API contracts and minimal operator documentation.
---
## Relevant When
- A tool has generated files that need durable storage and stable identifiers.
- StateHub, release records, or operators need to link to preserved evidence.
- A producer needs backend-neutral storage across local disk, Ceph RGW, or other
S3-compatible object storage.
- Retention, hold, or deletion eligibility needs to be explicit and auditable.
---
## Not Relevant When
- The artifact is purely temporary scratch output with no retention need.
- The work is about StateHub tasks, decisions, or repository catalog data rather
than artifact bytes and metadata.
- A producer needs domain-specific scoring, validation, or assessment semantics.
- The requirement is a human-facing artifact browser rather than API-first
preservation.
---
## Current State
- Status: concept / service-baseline planning
- Implementation: documentation and initial workplan only
- Stability: evolving
- Usage: none yet; first pilot target is guide-board assessment output
---
## How It Fits
- Upstream dependencies: producer repositories such as `guide-board` and
`open-cmis-tck`; future storage backends such as local filesystem and Ceph RGW.
- Downstream consumers: StateHub records, release/evidence workflows, operators,
and future artifact search or retrieval UIs.
- Often used with: `guide-board`, `open-cmis-tck`, `kontextual-engine`,
StateHub, and S3-compatible object storage.
---
## Terminology
- Preferred terms: artifact package, artifact file, registry record, storage
backend, storage location, retention class, retention rule, hold.
- Also known as: artifact registry, evidence store, storage gateway.
- Potentially confusing terms: StateHub links to artifact identifiers but does
not store artifact bytes.
---
## Related / Overlapping
- `guide-board` - produces assessment packages and evidence output.
- `open-cmis-tck` - contributes CMIS-specific assessment artifacts for the pilot.
- `kontextual-engine` - first likely subject of preserved guide-board evidence.
- `the-custodian/state-hub` - records work, decisions, repository state, and
links to artifact registry identifiers.
---
## Getting Oriented
- Start with: `INTENT.md`.
- Key files / directories: `docs/ARCHITECTURE-BLUEPRINT.md`, `workplans/`.
- Entry points: no service entry point yet; see
`workplans/ARTIFACT-STORE-WP-0001-service-baseline.md`.
---
## Provided Capabilities
```capability
type: infrastructure
title: Artifact package preservation
description: Register generated artifact packages and files, store bytes through a configured backend, compute hashes, apply retention policy, and return stable package identifiers for StateHub or producer records.
keywords: [artifacts, evidence, retention, storage, registry, provenance]
```
---
## Notes
The first concrete pilot is preserving `guide-board` / `open-cmis-tck`
assessment output for `kontextual-engine`.


@@ -0,0 +1,330 @@
# Artifact Store Architecture Blueprint
Status: draft
Created: 2026-05-15
## Purpose
`artifact-store` provides a generic registry and storage gateway for durable
generated artifacts. Producers register packages and files with metadata;
storage adapters persist the bytes; retention policy decides how long artifacts
remain eligible for retrieval.
The design keeps artifact identity and lifecycle separate from storage
implementation. This allows the first version to run against local filesystem
storage while the production path can use S3-compatible object storage such as
Ceph RGW.
## Architecture Summary
```text
producer
-> Artifact Registry API
-> metadata database
-> retention policy engine
-> audit event log
-> storage adapter interface
-> local filesystem backend
-> S3-compatible backend
-> Ceph RGW deployment
-> future cloud/blob/archive backends
```
The registry is the authority for artifact metadata and lifecycle. Backends are
responsible for byte storage and retrieval.
## Design Principles
- Backend-neutral registry: no producer should know whether bytes live in Ceph,
local disk, or a cloud bucket.
- Content-addressable confidence: every stored file has a digest and size.
- Retention by default: every package receives an expiry decision at ingestion.
- Extensions are explicit: retention extensions and holds are audit events, not
silent metadata edits.
- Packages remain portable: a manifest should be enough to understand a package
without calling the producer.
- Statehub links but does not store bytes: Statehub records artifact IDs and
outcomes; artifact-store owns file persistence.
- Deletion is deliberate: expiry only makes artifacts eligible for deletion.
Deletion jobs must be auditable, and a deletion can be reversed only while the
backend still holds the bytes.
## Components
### Registry API
HTTP API for producers and operators.
Initial responsibilities:
- create artifact packages,
- upload or ingest files,
- finalize packages,
- retrieve package metadata,
- list/search packages by subject and producer metadata,
- create retention extensions and holds,
- expose download metadata or redirect/download endpoints,
- expose health and backend status.
### Metadata Store
Persistent database for registry state.
The initial implementation can use SQLite for local development and PostgreSQL
for shared service deployments, if that matches the surrounding service stack.
Core tables:
- `artifact_packages`
- `artifact_files`
- `storage_locations`
- `retention_rules`
- `retention_events`
- `audit_events`
### Storage Adapter Interface
Small backend contract used by the API service.
Required operations:
- `put(object_key, stream, metadata) -> storage_location`
- `get(object_key) -> stream or signed_url`
- `head(object_key) -> object_metadata`
- `delete(object_key) -> deletion_result`
- `health() -> backend_status`
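One way the contract above might look in Python is a structural `Protocol`. This is a sketch under assumptions: `StorageLocation`, `StorageBackend`, and `InMemoryBackend` are hypothetical names, the return shapes are guesses, and `get` returns only a stream rather than a signed URL. The in-memory backend exists purely to exercise the interface and is not one of the planned backends.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Protocol


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical pointer returned by put(); field names are assumptions."""
    backend_id: str
    object_key: str
    size_bytes: int
    sha256: str


class StorageBackend(Protocol):
    """The five required operations, expressed as a structural type."""

    def put(self, object_key: str, stream: BinaryIO, metadata: dict) -> StorageLocation: ...
    def get(self, object_key: str) -> BinaryIO: ...
    def head(self, object_key: str) -> dict: ...
    def delete(self, object_key: str) -> bool: ...
    def health(self) -> dict: ...


class InMemoryBackend:
    """Illustrative backend that keeps bytes in a dict (not for real use)."""

    backend_id = "memory"

    def __init__(self) -> None:
        self._objects: dict[str, tuple[bytes, dict]] = {}

    def put(self, object_key: str, stream: BinaryIO, metadata: dict) -> StorageLocation:
        data = stream.read()
        self._objects[object_key] = (data, dict(metadata))
        return StorageLocation(
            backend_id=self.backend_id,
            object_key=object_key,
            size_bytes=len(data),
            sha256=hashlib.sha256(data).hexdigest(),
        )

    def get(self, object_key: str) -> BinaryIO:
        data, _ = self._objects[object_key]
        return io.BytesIO(data)

    def head(self, object_key: str) -> dict:
        data, metadata = self._objects[object_key]
        return {"size_bytes": len(data), "metadata": metadata}

    def delete(self, object_key: str) -> bool:
        # True when something was removed, False when the key was absent.
        return self._objects.pop(object_key, None) is not None

    def health(self) -> dict:
        return {"status": "ok", "objects": len(self._objects)}
```

Because `StorageBackend` is a `Protocol`, a filesystem or S3-compatible adapter would satisfy it by implementing the same five methods, with no shared base class.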
Initial backends:
- local filesystem backend for tests and development,
- S3-compatible backend for Ceph RGW and cloud object stores.
### Retention Policy Engine
Applies default rules at ingestion and records later changes.
Initial retention classes:
- `transient`: short-lived scratch artifacts,
- `raw-evidence`: raw logs and run output,
- `summary-evidence`: compact reports and summaries,
- `release-evidence`: release or customer-facing evidence packages,
- `permanent-record`: manually held records with no automatic expiry.
Each package stores:
- selected retention class,
- default retention rule,
- computed `expires_at`,
- extension records,
- hold records,
- deletion eligibility state.
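The expiry calculation could be sketched as below. The blueprint names the retention classes but not their durations, so every number of days here is a placeholder assumption, and `compute_expires_at` and `is_deletion_eligible` are hypothetical function names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder durations: these values are assumptions for illustration only.
DEFAULT_RETENTION_DAYS: dict[str, Optional[int]] = {
    "transient": 7,
    "raw-evidence": 90,
    "summary-evidence": 365,
    "release-evidence": 730,
    "permanent-record": None,  # no automatic expiry
}


def compute_expires_at(retention_class: str, ingested_at: datetime) -> Optional[datetime]:
    """Return the default expiry for a package, or None if it never expires."""
    if retention_class not in DEFAULT_RETENTION_DAYS:
        raise ValueError(f"unknown retention class: {retention_class}")
    days = DEFAULT_RETENTION_DAYS[retention_class]
    return None if days is None else ingested_at + timedelta(days=days)


def is_deletion_eligible(
    expires_at: Optional[datetime], active_holds: int, now: datetime
) -> bool:
    """A package is eligible only when it has expired and no hold is active."""
    if expires_at is None or active_holds > 0:
        return False
    return now >= expires_at


ingested = datetime(2026, 5, 15, tzinfo=timezone.utc)
expires = compute_expires_at("transient", ingested)
```

Keeping eligibility as a pure function of `expires_at`, hold count, and the current time makes it easy to expose the state without deleting anything.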
### Audit Log
Append-only record of important events:
- package created,
- file uploaded,
- package finalized,
- retrieval requested,
- retention extended,
- hold applied or released,
- deletion requested,
- deletion completed or failed.
The audit log does not need to be cryptographic in the first release, but the
schema should leave room for signed events or external write-once storage later.
## Data Model
### Artifact Package
Required fields:
- `id`
- `name`
- `producer`
- `subject`
- `retention_class`
- `status`
- `created_at`
- `finalized_at`
- `expires_at`
- `metadata`
Recommended metadata keys:
- `repo_slug`
- `run_id`
- `assessment_id`
- `target_profile_ref`
- `assessment_profile_ref`
- `source_commits`
- `tool_versions`
- `environment`
### Artifact File
Required fields:
- `id`
- `package_id`
- `relative_path`
- `media_type`
- `size_bytes`
- `sha256`
- `created_at`
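The `sha256` and `size_bytes` fields can be computed in a single pass over an upload stream. A minimal sketch, assuming a hypothetical helper named `digest_stream` and an arbitrary chunk size:

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary choice for this sketch


def digest_stream(stream: BinaryIO) -> tuple[str, int]:
    """Read a stream in chunks and return (sha256 hex digest, size in bytes)."""
    hasher = hashlib.sha256()
    size = 0
    while chunk := stream.read(CHUNK_SIZE):
        hasher.update(chunk)
        size += len(chunk)
    return hasher.hexdigest(), size


digest, size = digest_stream(io.BytesIO(b"example bytes"))
```

Reading in chunks keeps memory use bounded for large artifacts.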
### Storage Location
Required fields:
- `id`
- `artifact_file_id`
- `backend_id`
- `object_key`
- `storage_class`
- `status`
- `created_at`
- `last_verified_at`
### Retention Event
Required fields:
- `id`
- `package_id`
- `event_type`
- `reason`
- `created_by`
- `created_at`
- `previous_expires_at`
- `new_expires_at`
Event types:
- `default_rule_applied`
- `extended`
- `hold_applied`
- `hold_released`
- `deletion_eligible`
- `deleted`
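To show how an `extended` event might carry both expiry values, here is a small sketch. `RetentionEvent` mirrors the required fields above; `extend_retention` is a hypothetical helper, and refusing to shorten an expiry is an assumed policy rather than something the blueprint states.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class RetentionEvent:
    """One immutable retention record, using the required fields listed above."""
    id: str
    package_id: str
    event_type: str
    reason: str
    created_by: str
    created_at: datetime
    previous_expires_at: Optional[datetime]
    new_expires_at: Optional[datetime]


def extend_retention(
    package_id: str,
    current_expires_at: datetime,
    new_expires_at: datetime,
    reason: str,
    created_by: str,
) -> RetentionEvent:
    """Build an `extended` event; rejects requests that would shorten expiry."""
    if new_expires_at <= current_expires_at:
        raise ValueError("an extension must move expires_at later")
    return RetentionEvent(
        id=str(uuid.uuid4()),
        package_id=package_id,
        event_type="extended",
        reason=reason,
        created_by=created_by,
        created_at=datetime.now(timezone.utc),
        previous_expires_at=current_expires_at,
        new_expires_at=new_expires_at,
    )


# Usage: push an expiry out by 30 days.
old_expiry = datetime(2026, 8, 13, tzinfo=timezone.utc)
event = extend_retention(
    "pkg-123", old_expiry, old_expiry + timedelta(days=30), "audit pending", "operator"
)
```

Recording both the previous and the new expiry in the event keeps the change explicit instead of a silent metadata edit.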
## API Shape
Initial endpoints:
```text
GET /health
GET /backends
POST /packages
GET /packages
GET /packages/{package_id}
POST /packages/{package_id}/files
POST /packages/{package_id}/finalize
GET /packages/{package_id}/manifest
GET /files/{file_id}/download
POST /packages/{package_id}/retention/extensions
POST /packages/{package_id}/retention/holds
POST /packages/{package_id}/retention/holds/{hold_id}/release
```
The first ingestion path can accept multipart file uploads. A later trusted-local
operator endpoint may ingest from server-local paths, but it should be disabled
by default because path ingestion changes the security boundary.
## Package Manifest
Every finalized package should expose a JSON manifest containing:
- package metadata,
- retention summary,
- file list,
- file digests and sizes,
- storage backend references,
- source metadata,
- created/finalized timestamps.
For guide-board runs, the manifest should preserve links to:
- `run.json`
- `retention-summary.json`
- `reports/assessment-package.json`
- `reports/report.md`
- extension-generated scorecards or log reviews,
- raw artifact files captured by the assessment package manifest.
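A manifest with the contents listed above could be assembled roughly as follows. The JSON key names and the `build_manifest` helper are assumptions for illustration; the blueprint does not fix a manifest schema. Timestamps are assumed to arrive as ISO 8601 strings so that they serialize directly.

```python
import json


def build_manifest(package: dict, files: list[dict]) -> str:
    """Serialize a finalized package and its file records as a JSON manifest."""
    if package.get("status") != "finalized":
        raise ValueError("manifests are only produced for finalized packages")
    # Sort by logical path so the manifest is stable across runs.
    entries = sorted(files, key=lambda f: f["relative_path"])
    manifest = {
        "package": {
            "id": package["id"],
            "name": package["name"],
            "producer": package["producer"],
            "subject": package["subject"],
            "created_at": package["created_at"],
            "finalized_at": package["finalized_at"],
            "metadata": package.get("metadata", {}),
        },
        "retention": {
            "retention_class": package["retention_class"],
            "expires_at": package["expires_at"],
        },
        "files": [
            {
                "relative_path": f["relative_path"],
                "media_type": f["media_type"],
                "size_bytes": f["size_bytes"],
                "sha256": f["sha256"],
                "storage": {
                    "backend_id": f["backend_id"],
                    "object_key": f["object_key"],
                },
            }
            for f in entries
        ],
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Sorting files and keys makes two manifests for the same package byte-identical, which helps when comparing or hashing them.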
## Guide-Board Pilot Flow
```text
guide-board run directory
-> open-cmis-tck scorecard/log review
-> artifact-store package create
-> upload run files
-> finalize manifest
-> Statehub record links package id and summary
```
The artifact package should carry:
- run id,
- target profile reference,
- assessment profile reference,
- result status,
- source commits for guide-board, open-cmis-tck, and the assessed repository,
- important report paths,
- retention class `raw-evidence` or `release-evidence`.
## Ceph And S3-Compatible Storage
Ceph should be introduced through the S3-compatible adapter, not as a special
case in producer logic.
Configuration should support:
- endpoint URL,
- bucket,
- region,
- access key reference,
- secret key reference,
- optional server-side encryption settings,
- object key prefix,
- storage class label.
The service should never require credentials in producer request bodies. Use
environment variables, mounted secret files, or a local secret provider.
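A sketch of the environment-variable option is shown below. The `ARTIFACT_STORE_S3_` prefix, the variable names, and the `S3BackendConfig` and `load_s3_config` names are all hypothetical. Server-side encryption settings are left out, and mounted secret files or a secret provider would need their own loaders.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

ENV_PREFIX = "ARTIFACT_STORE_S3_"  # hypothetical prefix, not defined by the blueprint
REQUIRED = ["ENDPOINT_URL", "BUCKET", "REGION", "ACCESS_KEY", "SECRET_KEY"]


@dataclass(frozen=True)
class S3BackendConfig:
    endpoint_url: str
    bucket: str
    region: str
    # repr=False keeps credentials out of logs and tracebacks.
    access_key: str = field(repr=False)
    secret_key: str = field(repr=False)
    key_prefix: str = ""
    storage_class: Optional[str] = None


def load_s3_config(env: Mapping[str, str] = os.environ) -> S3BackendConfig:
    """Read backend settings from the environment, failing fast on gaps."""
    missing = [ENV_PREFIX + name for name in REQUIRED if not env.get(ENV_PREFIX + name)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    return S3BackendConfig(
        endpoint_url=env[ENV_PREFIX + "ENDPOINT_URL"],
        bucket=env[ENV_PREFIX + "BUCKET"],
        region=env[ENV_PREFIX + "REGION"],
        access_key=env[ENV_PREFIX + "ACCESS_KEY"],
        secret_key=env[ENV_PREFIX + "SECRET_KEY"],
        key_prefix=env.get(ENV_PREFIX + "KEY_PREFIX", ""),
        storage_class=env.get(ENV_PREFIX + "STORAGE_CLASS"),
    )
```

Passing the mapping as a parameter lets tests supply a plain dict instead of mutating the process environment.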
## Future Retrieval Tiers
The initial API can treat all stored files as immediately retrievable. Later,
storage locations can include:
- `retrieval_tier`: hot, warm, cold, archive,
- `restore_status`: available, restore_requested, restoring, restored, expired,
- `restore_requested_at`,
- `restore_expires_at`.
The registry API should be able to return "not immediately available" without
changing artifact identity.
## Security Boundary
Initial service assumptions:
- internal service, not public internet exposed,
- authenticated producer/operator API before shared deployment,
- no secret values stored in artifact metadata,
- package paths are logical paths, not trusted filesystem paths,
- download authorization should be checked at the registry layer.
Files may contain sensitive evidence. The service must treat metadata and bytes
as confidential by default.
## Open Questions
- Which identity provider should guard shared deployments?
- Should package metadata schemas be open-ended JSON or typed by producer?
- Should deduplication be package-local only or global by content hash?
- Should deletion first mark records deleted, then delete bytes, or reverse that
order with compensating events?
- How much Statehub integration belongs in this repo versus in Statehub clients?


@@ -0,0 +1,229 @@
---
id: ARTIFACT-STORE-WP-0001
type: workplan
title: "Artifact Store Service Baseline"
repo: artifact-store
domain: stack
status: active
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 1
created: "2026-05-15"
updated: "2026-05-15"
state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb"
---
# ARTIFACT-STORE-WP-0001: Artifact Store Service Baseline
## Purpose
Implement the first usable artifact registry and storage gateway. The service
should preserve artifact packages, index their metadata, delegate bytes to a
configured storage backend, apply default retention rules, and expose stable
package identifiers that Statehub and producer repositories can link to.
The first producer target is a guide-board assessment run, including OpenCMIS TCK
reports and raw assessment artifacts.
## Background
Guide-board can already produce self-contained run directories with retention
summaries, assessment packages, raw artifacts, scorecards, and log reviews. Those
directories should not live only in `/tmp`, and committing raw evidence into
producer repositories is the wrong long-term shape.
`artifact-store` becomes the shared preservation layer:
- producers generate files,
- artifact-store registers and stores them,
- Statehub records the work outcome and links to the registry package,
- storage backends handle durable bytes.
Ceph is the likely self-hosted production backend through its S3-compatible RGW
interface, but the service must keep the backend interface generic.
## Target Architecture
```text
producer package
-> registry API
-> metadata database
-> retention policy engine
-> storage adapter
-> local filesystem or S3-compatible object storage
```
## Boundary
This workplan owns the first service implementation and API contract. It does
not need to build a UI, implement cold-storage restore tiers, replace Statehub,
or provide formal records-management certification.
## D1.1 - Service Scaffold And Repository Identity
```task
id: ARTIFACT-STORE-WP-0001-T001
status: todo
priority: high
state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230"
```
Acceptance:
- Replace the seed README with artifact-store service instructions.
- Add a Python service scaffold with a clear package/module layout.
- Provide a local development command.
- Provide a test command.
- Keep generated artifact bytes and local databases ignored by git.
- Document required environment variables.
## D1.2 - Registry Data Model
```task
id: ARTIFACT-STORE-WP-0001-T002
status: todo
priority: high
state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e"
```
Acceptance:
- Define persistent models for artifact packages, files, storage locations,
retention rules, retention events, and audit events.
- Store package metadata as structured JSON while keeping core query fields
explicit.
- Record package lifecycle status: created, uploading, finalized, deleted, and
failed.
- Record file `sha256`, size, media type, and logical relative path.
- Add migrations or a reproducible schema initialization path.
## D1.3 - Local Filesystem Storage Backend
```task
id: ARTIFACT-STORE-WP-0001-T003
status: todo
priority: high
state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a"
```
Acceptance:
- Implement a storage adapter interface.
- Implement a local filesystem backend for development and tests.
- Store objects under deterministic package/file keys.
- Prevent path traversal and accidental writes outside the configured storage
root.
- Add backend health reporting.
- Add tests for put, get, head, and delete operations.
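The path-traversal requirement could be met with a check along these lines. This is a sketch, not the planned implementation: `resolve_object_path` is a hypothetical helper, it treats object keys as POSIX-style logical paths, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path, PurePosixPath


def resolve_object_path(root: Path, object_key: str) -> Path:
    """Map a logical object key to a path that is guaranteed to sit under root."""
    key = PurePosixPath(object_key)
    # Reject empty keys, absolute keys, and any parent-directory segment.
    if not key.parts or key.is_absolute() or ".." in key.parts:
        raise ValueError(f"unsafe object key: {object_key!r}")
    root = root.resolve()
    candidate = (root / Path(*key.parts)).resolve()
    # Second line of defence: resolve() follows symlinks, so re-check containment.
    if not candidate.is_relative_to(root):
        raise ValueError(f"object key escapes storage root: {object_key!r}")
    return candidate
```

Checking both the key segments and the resolved path covers plain `..` tricks as well as symlinks that point outside the storage root.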
## D1.4 - Package Ingestion API
```task
id: ARTIFACT-STORE-WP-0001-T004
status: todo
priority: high
state_hub_task_id: "e3879111-4be9-4731-8aea-15abb874f960"
```
Acceptance:
- Add endpoints to create a package, upload files, finalize a package, retrieve
package metadata, list packages, and download files.
- Compute file hashes server-side during ingestion.
- Reject duplicate logical paths within one package unless explicitly replacing
a non-finalized file.
- Produce a package manifest after finalization.
- Add API tests covering successful ingestion and validation failures.
## D1.5 - Retention Baseline
```task
id: ARTIFACT-STORE-WP-0001-T005
status: todo
priority: high
state_hub_task_id: "2d6cbd83-c348-45ad-a223-7870a3412225"
```
Acceptance:
- Seed default retention classes for transient, raw-evidence, summary-evidence,
release-evidence, and permanent-record.
- Apply a default `expires_at` when a package is created or finalized.
- Add endpoints to extend retention and apply or release holds.
- Record retention changes as retention events and audit events.
- Expose deletion eligibility without deleting bytes automatically in the first
implementation.
## D1.6 - S3-Compatible Backend Design Hook
```task
id: ARTIFACT-STORE-WP-0001-T006
status: todo
priority: medium
state_hub_task_id: "7b980a55-2364-48c3-98ac-081629a8d2b7"
```
Acceptance:
- Define configuration fields for an S3-compatible backend.
- Keep the adapter contract compatible with Ceph RGW.
- Add an implementation stub or feature-flagged backend if dependencies are not
ready.
- Document expected Ceph/S3 configuration without requiring a live Ceph service
for baseline tests.
## D1.7 - Guide-Board Pilot Ingestion
```task
id: ARTIFACT-STORE-WP-0001-T007
status: todo
priority: high
state_hub_task_id: "eb822821-353c-4cd2-95bf-acb2f084b7ea"
```
Acceptance:
- Provide a CLI helper or documented curl flow to register a guide-board run
directory as one package.
- Preserve guide-board run metadata: run id, target profile, assessment profile,
evidence result counts, finding counts, source commits, and report paths.
- Ingest the CMIS pilot run shape, including scorecard and log-review reports.
- Return a package id suitable for recording in Statehub.
- Add a fixture-based test that does not require the real OpenCMIS TCK.
## D1.8 - Operator Documentation And Handoff
```task
id: ARTIFACT-STORE-WP-0001-T008
status: todo
priority: medium
state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0"
```
Acceptance:
- Document local run, test, and package ingestion commands.
- Document retention behavior and extension flow.
- Document the boundary between artifact-store and Statehub.
- Include a dev-agent handoff section listing the first implementation order.
- Keep architecture docs aligned with the implemented API.
## Suggested Implementation Order
1. Service scaffold, test harness, and README.
2. Metadata models and local database setup.
3. Local filesystem storage adapter.
4. Package create/upload/finalize/download API.
5. Retention defaults, extension, hold, and audit events.
6. Guide-board run ingestion helper.
7. S3-compatible backend configuration and Ceph notes.
## First Pilot Success Criteria
- A completed guide-board CMIS run can be ingested from a local directory.
- The package manifest lists every stored file with SHA-256 and size.
- The registry returns a stable package id.
- Files can be downloaded through the service.
- Default retention is visible and can be extended.
- Statehub can record the package id and summary without storing artifact bytes.