diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..d11aa0e --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,180 @@ +# artifact-store — Agent Instructions + +## Repo Identity + +**Purpose:** Generic artifact registry and storage gateway for generated outputs, +evidence packages, reports, logs, snapshots, exports, and release artifacts. + +**Domain:** stack +**Repo slug:** artifact-store +**Topic ID:** `595afc64-bd28-47bf-aafb-ba230b28371b` +**Workplan prefix:** `ARTIFACT-STORE-WP-` + +--- + +## State Hub Integration + +The Custodian State Hub tracks work across all domains. Interact via HTTP REST — +there is no MCP server for Codex agents. + +| Context | URL | +|---------|-----| +| Local workstation | `http://127.0.0.1:8000` | +| Remote via tunnel | `http://127.0.0.1:18000` | + +### Orient at session start + +```bash +# Offline brief — works without hub connection +cat .custodian-brief.md + +# Active workstreams for this domain +curl -s "http://127.0.0.1:8000/workstreams/?topic_id=595afc64-bd28-47bf-aafb-ba230b28371b&status=active" \ + | python3 -m json.tool + +# Check inbox +curl -s "http://127.0.0.1:8000/messages/?to_agent=artifact-store&unread_only=true" \ + | python3 -m json.tool +``` + +Mark a message read: +```bash +curl -s -X PATCH "http://127.0.0.1:8000/messages//read" \ + -H "Content-Type: application/json" -d '{}' +``` + +### Log progress (required at session close) + +```bash +curl -s -X POST http://127.0.0.1:8000/progress/ \ + -H "Content-Type: application/json" \ + -d '{ + "summary": "what was done", + "event_type": "note", + "author": "codex", + "workstream_id": "", + "task_id": "" + }' +``` + +Omit `workstream_id` / `task_id` when not applicable. 
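For agents that prefer a script over curl, the same progress call could be prepared in Python roughly as follows. This is a minimal sketch: the helper names `build_progress_payload` and `build_progress_request` are illustrative assumptions, and only the endpoint and field names come from the curl example above. Empty ids are dropped, matching the note about omitting `workstream_id` / `task_id`.

```python
import json
import urllib.request

HUB_URL = "http://127.0.0.1:8000"  # local workstation URL from the table above


def build_progress_payload(summary, event_type="note", author="codex",
                           workstream_id=None, task_id=None):
    """Build the POST /progress/ body, omitting ids that do not apply."""
    payload = {"summary": summary, "event_type": event_type, "author": author}
    if workstream_id:
        payload["workstream_id"] = workstream_id
    if task_id:
        payload["task_id"] = task_id
    return payload


def build_progress_request(payload, base_url=HUB_URL):
    """Prepare (but do not send) the HTTP request for the progress log."""
    return urllib.request.Request(
        f"{base_url}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the prepared request to `urllib.request.urlopen`, which requires a reachable hub.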
+ +### Update task status + +```bash +curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ + -H "Content-Type: application/json" \ + -d '{"status": "in_progress"}' +# values: todo | in_progress | done | blocked +``` + +### Flag a task for human review + +```bash +curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ + -H "Content-Type: application/json" \ + -d '{"needs_human": true, "intervention_note": "reason"}' +``` + +--- + +## Session Protocol + +**Start:** +1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe) +2. Check inbox: `GET /messages/?to_agent=artifact-store&unread_only=true`; mark read +3. Scan workplans: `ls workplans/` — note `status: active` files and open tasks +4. Check blocked tasks: `GET /tasks/?needs_human=true` + +**During work:** +- Update task statuses in workplan files as tasks progress +- Record significant decisions via `POST /decisions/` + +**Close:** +1. Update workplan file task statuses to reflect progress +2. Log: `POST /progress/` with a summary of what changed +3. Note for the custodian operator: after workplan file changes, run from + `~/the-custodian/state-hub`: + ```bash + make fix-consistency REPO=artifact-store + ``` + This syncs task status from files into the hub DB. + +--- + +## Workplan Convention (ADR-001) + +Work items originate as files in this repo — not in the hub. The hub is a +read/cache/index layer that rebuilds from files. + +**File location:** `workplans/ARTIFACT-STORE-WP-NNNN-.md` + +**Archived location:** completed workplans may move to +`workplans/archived/YYMMDD-ARTIFACT-STORE-WP-NNNN-.md`. The `YYMMDD` prefix is +the completion/archive date; the frontmatter `id` does not change. + +**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use +`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. 
Use +this only for low-risk work completed directly; create a normal workplan for +anything needing analysis, design, approval, dependencies, or multiple phases. + +**Frontmatter:** + +```yaml +--- +id: ARTIFACT-STORE-WP-NNNN +type: workplan +title: "..." +domain: stack +repo: artifact-store +status: active | done +owner: codex +topic_slug: ... +created: "YYYY-MM-DD" +updated: "YYYY-MM-DD" +state_hub_workstream_id: "" # written by fix-consistency — do not edit +--- +``` + +**Task block format** (one per `##` section): + +``` +## Task Title + +` ` `task +id: ARTIFACT-STORE-WP-NNNN-T01 +status: todo | in_progress | done | blocked +priority: high | medium | low +state_hub_task_id: "" # written by fix-consistency — do not edit +` ` ` + +Task description text. +``` + +Status progression: `todo` → `in_progress` → `done` (or `blocked`) + +To create a new workplan: +1. Write the file following the format above +2. Notify the custodian operator to run `make fix-consistency REPO=artifact-store` + (or send a message to the hub agent via `POST /messages/`) + +--- + +## Current Repo Shape + +This repository is in service-baseline planning. The current source of truth is: + +- `INTENT.md` for purpose, product thesis, scope, and service boundary +- `docs/ARCHITECTURE-BLUEPRINT.md` for the draft architecture +- `workplans/ARTIFACT-STORE-WP-0001-service-baseline.md` for implementation tasks + +No runnable service scaffold exists yet. Add install, dev-server, and test +commands here when `ARTIFACT-STORE-WP-0001-T001` lands. + +## Repo Boundary + +This repo owns artifact identity, package/file metadata, storage backend +abstraction, retention policy, retrieval metadata, and audit trails. + +It does not own StateHub work records, guide-board assessment semantics, formal +records-management certification, or producer-specific business logic. 
diff --git a/INTENT.md b/INTENT.md new file mode 100644 index 0000000..7c4daa1 --- /dev/null +++ b/INTENT.md @@ -0,0 +1,121 @@ +# INTENT + +## Project Name + +`artifact-store` + +## Purpose + +`artifact-store` is a generic artifact registry and storage gateway. It gives +projects a stable place to register files, evidence packages, logs, reports, +snapshots, exports, and other generated outputs without forcing every producer +to invent its own retention, indexing, and storage rules. + +The service owns artifact identity, metadata, provenance, retention decisions, +lookup, and audit trails. Actual bytes are delegated to one or more configured +storage backends such as a local filesystem, S3-compatible object storage, Ceph +RGW, AWS S3, Azure Blob Storage, Google Cloud Storage, or future archival tiers. + +## Product Thesis + +Generated artifacts become valuable when they are findable, attributable, +retained for the right amount of time, and safely discardable when they are no +longer needed. Teams should be able to preserve a run result, point Statehub or +another system at its durable registry record, and later prove which files were +stored, which hashes they had, where they lived, and when retention was extended +or released. + +`artifact-store` exists to make artifact preservation a shared platform concern +instead of an ad hoc directory convention. + +## Primary Use Case + +Given a producer such as `guide-board`, a completed assessment run, and an +artifact package directory, `artifact-store` should: + +1. register the package and its files, +2. compute and store content hashes and sizes, +3. capture producer, subject, run, repository, commit, and environment metadata, +4. select the applicable retention rule, +5. write files through a configured storage backend, +6. record all storage locations and backend object keys, +7. provide stable retrieval metadata and download links, +8. allow retention extension or hold decisions, +9. 
expose enough index data for Statehub, release records, and future UIs, +10. make expired artifacts eligible for deletion through an auditable process. + +The first concrete pilot is preserving `guide-board` / `open-cmis-tck` +assessment output for `kontextual-engine`. + +## Intended Users + +- Assessment and compliance tools that produce evidence packages. +- Build, release, and quality systems that need durable generated outputs. +- Statehub and repository automation that need to link work records to + preserved evidence. +- Operators who need retention visibility and controlled deletion. +- Future UI and agent workflows that need artifact search, download, or restore + status. + +## Core Concepts + +- Artifact package: a logical collection of files registered together, such as a + guide-board assessment run directory. +- Artifact file: one stored file with a path, media type, size, digest, and + storage location. +- Registry record: metadata and lifecycle state for an artifact package or file. +- Storage backend: a configured adapter that stores and retrieves bytes. +- Storage location: a backend-specific pointer such as a bucket/key, filesystem + path, or future archive locator. +- Retention class: a named policy category such as transient, raw-evidence, + release-evidence, audit-prep, or permanent-record. +- Retention rule: the default storage duration and deletion behavior for a class. +- Retention extension: a time-bounded extension of an artifact's expiry date. +- Hold: a stronger instruction that prevents deletion until explicitly released. +- Retrieval tier: a future storage or access class such as hot, warm, cold, or + archived. 
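The difference between a retention extension and a hold can be pictured with a small Python sketch. The names `PackageRetention` and `is_deletion_eligible` are hypothetical, and the rule that an extension only moves the expiry later is an assumption; the concepts above do not prescribe a data layout.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PackageRetention:
    """Hypothetical retention state for one artifact package."""
    expires_at: Optional[datetime]  # None means no automatic expiry
    active_holds: set = field(default_factory=set)

    def extend(self, new_expires_at: datetime) -> None:
        # Assumption: an extension may only move the expiry date later.
        if self.expires_at is not None and new_expires_at > self.expires_at:
            self.expires_at = new_expires_at

    def apply_hold(self, hold_id: str) -> None:
        self.active_holds.add(hold_id)

    def release_hold(self, hold_id: str) -> None:
        self.active_holds.discard(hold_id)


def is_deletion_eligible(state: PackageRetention, now: datetime) -> bool:
    """Eligible only when the package has expired and no hold is active."""
    if state.active_holds or state.expires_at is None:
        return False
    return now >= state.expires_at
```

In this sketch a hold overrides any expiry date, while an extension merely shifts the date at which eligibility begins.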
+ +## Scope + +In scope: + +- metadata registry for artifact packages and files, +- content hashing and manifest generation, +- pluggable storage backend interface, +- local filesystem backend for development, +- S3-compatible backend suitable for Ceph RGW, +- default retention classes and expiry calculation, +- retention extension and hold records, +- retrieval metadata and download path generation, +- audit events for ingestion, retrieval, retention changes, and deletion, +- API-first service suitable for automation, +- pilot integration with guide-board assessment runs. + +Out of scope for the initial service: + +- replacing Statehub as the work, repository, or decision system of record, +- embedding guide-board-specific assessment semantics in the registry core, +- full compliance certification or legal-record guarantees, +- cloud-provider-specific lifecycle automation beyond backend adapter hooks, +- asynchronous cold-archive restore flows, +- user-facing UI beyond API contracts and minimal operator docs. + +## Relationship To Other Services + +`artifact-store` should remain a shared infrastructure service. + +- `guide-board` produces assessment packages and asks `artifact-store` to + preserve them. +- `open-cmis-tck` can add CMIS-specific scorecards and log reviews before a + guide-board run is ingested. +- `Statehub` records work, decisions, repository state, and links to artifact + registry identifiers. +- Ceph is a strong self-hosted storage backend candidate because its RGW layer is + S3-compatible, but the registry must not be Ceph-only. + +## Boundary + +The registry can prove what it stored, where it stored it, which hashes it +computed, and which retention decisions were applied. It does not prove the +truth of the artifact contents, certify a system, or replace formal records +management without additional governance. 
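As a rough illustration of what "prove what it stored" could look like in practice, the Python sketch below re-reads a stored file and compares it with a recorded SHA-256 digest and size. The function names and the chunk size are assumptions for illustration; this document does not define a verification procedure.

```python
import hashlib
from pathlib import Path


def digest_file(path: Path, chunk_size: int = 1024 * 1024) -> tuple[str, int]:
    """Stream a file and return (sha256 hex digest, size in bytes)."""
    hasher = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            size += len(chunk)
    return hasher.hexdigest(), size


def matches_record(path: Path, recorded_sha256: str, recorded_size: int) -> bool:
    """True when the bytes on disk still match the registry record."""
    actual_sha256, actual_size = digest_file(path)
    return actual_sha256 == recorded_sha256 and actual_size == recorded_size
```

A match shows only that the bytes are unchanged since registration, which is consistent with the boundary stated above: it says nothing about whether the contents are true.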
diff --git a/README.md b/README.md index fcd7b8f..59e55c4 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,14 @@ -# repo-seed +# artifact-store -A git repository template to bootstrap coulomb projects from. \ No newline at end of file +Generic artifact registry and storage gateway for generated outputs, evidence +packages, reports, logs, and release artifacts. + +The registry owns artifact identity, metadata, provenance, retention policy, and +retrieval records. Actual bytes are delegated to configured storage backends such +as a local filesystem, S3-compatible object storage, or Ceph RGW. + +Start here: + +- [INTENT.md](INTENT.md) +- [docs/ARCHITECTURE-BLUEPRINT.md](docs/ARCHITECTURE-BLUEPRINT.md) +- [workplans/ARTIFACT-STORE-WP-0001-service-baseline.md](workplans/ARTIFACT-STORE-WP-0001-service-baseline.md) diff --git a/SCOPE.md b/SCOPE.md new file mode 100644 index 0000000..435273a --- /dev/null +++ b/SCOPE.md @@ -0,0 +1,133 @@ +# SCOPE + +> This file helps you quickly understand what this repository is about, +> when it is relevant, and when it is not. +> It is intentionally lightweight and may be incomplete. + +--- + +## One-liner + +`artifact-store` is a generic artifact registry and storage gateway for durable +generated outputs, evidence packages, reports, logs, snapshots, exports, and +release artifacts. + +--- + +## Core Idea + +Generated artifacts become valuable when they are findable, attributable, +retained for the right amount of time, and safely discardable when they expire. +This repository makes artifact preservation a shared platform concern: producers +register packages and files, the registry owns metadata and lifecycle, and +storage backends own bytes. + +--- + +## In Scope + +- Artifact package and artifact file metadata. +- Content hashing, manifests, provenance, and audit events. +- Pluggable storage backend interface with local filesystem and S3-compatible + backends as the first targets. 
+- Retention classes, expiry calculation, retention extension, and holds. +- Retrieval metadata and download/link surfaces for automation. +- Pilot ingestion flow for guide-board / OpenCMIS TCK assessment output. + +--- + +## Out of Scope + +- Replacing StateHub as the work, repository, or decision system of record. +- Encoding guide-board-specific assessment semantics in the registry core. +- Formal compliance certification or legal-record guarantees by itself. +- Cloud-provider-specific lifecycle automation beyond backend adapter hooks. +- User-facing UI beyond API contracts and minimal operator documentation. + +--- + +## Relevant When + +- A tool has generated files that need durable storage and stable identifiers. +- StateHub, release records, or operators need to link to preserved evidence. +- A producer needs backend-neutral storage across local disk, Ceph RGW, or other + S3-compatible object storage. +- Retention, hold, or deletion eligibility needs to be explicit and auditable. + +--- + +## Not Relevant When + +- The artifact is purely temporary scratch output with no retention need. +- The work is about StateHub tasks, decisions, or repository catalog data rather + than artifact bytes and metadata. +- A producer needs domain-specific scoring, validation, or assessment semantics. +- The requirement is a human-facing artifact browser rather than API-first + preservation. + +--- + +## Current State + +- Status: concept / service-baseline planning +- Implementation: documentation and initial workplan only +- Stability: evolving +- Usage: none yet; first pilot target is guide-board assessment output + +--- + +## How It Fits + +- Upstream dependencies: producer repositories such as `guide-board` and + `open-cmis-tck`; future storage backends such as local filesystem and Ceph RGW. +- Downstream consumers: StateHub records, release/evidence workflows, operators, + and future artifact search or retrieval UIs. 
+- Often used with: `guide-board`, `open-cmis-tck`, `kontextual-engine`, + StateHub, and S3-compatible object storage. + +--- + +## Terminology + +- Preferred terms: artifact package, artifact file, registry record, storage + backend, storage location, retention class, retention rule, hold. +- Also known as: artifact registry, evidence store, storage gateway. +- Potentially confusing terms: StateHub links to artifact identifiers but does + not store artifact bytes. + +--- + +## Related / Overlapping + +- `guide-board` - produces assessment packages and evidence output. +- `open-cmis-tck` - contributes CMIS-specific assessment artifacts for the pilot. +- `kontextual-engine` - first likely subject of preserved guide-board evidence. +- `the-custodian/state-hub` - records work, decisions, repository state, and + links to artifact registry identifiers. + +--- + +## Getting Oriented + +- Start with: `INTENT.md`. +- Key files / directories: `docs/ARCHITECTURE-BLUEPRINT.md`, `workplans/`. +- Entry points: no service entry point yet; see + `workplans/ARTIFACT-STORE-WP-0001-service-baseline.md`. + +--- + +## Provided Capabilities + +```capability +type: infrastructure +title: Artifact package preservation +description: Register generated artifact packages and files, store bytes through a configured backend, compute hashes, apply retention policy, and return stable package identifiers for StateHub or producer records. +keywords: [artifacts, evidence, retention, storage, registry, provenance] +``` + +--- + +## Notes + +The first concrete pilot is preserving `guide-board` / `open-cmis-tck` +assessment output for `kontextual-engine`. 
diff --git a/docs/ARCHITECTURE-BLUEPRINT.md b/docs/ARCHITECTURE-BLUEPRINT.md new file mode 100644 index 0000000..d374d88 --- /dev/null +++ b/docs/ARCHITECTURE-BLUEPRINT.md @@ -0,0 +1,330 @@ +# Artifact Store Architecture Blueprint + +Status: draft +Created: 2026-05-15 + +## Purpose + +`artifact-store` provides a generic registry and storage gateway for durable +generated artifacts. Producers register packages and files with metadata; +storage adapters persist the bytes; retention policy decides how long artifacts +remain eligible for retrieval. + +The design keeps artifact identity and lifecycle separate from storage +implementation. This allows the first version to run against local filesystem +storage while the production path can use S3-compatible object storage such as +Ceph RGW. + +## Architecture Summary + +```text +producer + -> Artifact Registry API + -> metadata database + -> retention policy engine + -> audit event log + -> storage adapter interface + -> local filesystem backend + -> S3-compatible backend + -> Ceph RGW deployment + -> future cloud/blob/archive backends +``` + +The registry is the authority for artifact metadata and lifecycle. Backends are +responsible for byte storage and retrieval. + +## Design Principles + +- Backend-neutral registry: no producer should know whether bytes live in Ceph, + local disk, or a cloud bucket. +- Content-addressable confidence: every stored file has a digest and size. +- Retention by default: every package receives an expiry decision at ingestion. +- Extensions are explicit: retention extensions and holds are audit events, not + silent metadata edits. +- Packages remain portable: a manifest should be enough to understand a package + without calling the producer. +- Statehub links, it does not store bytes: Statehub records artifact IDs and + outcomes; artifact-store owns file persistence. 
+- Deletion is deliberate: expiry makes artifacts eligible for deletion; deletion + jobs must be auditable and reversible only when the backend still has data. + +## Components + +### Registry API + +HTTP API for producers and operators. + +Initial responsibilities: + +- create artifact packages, +- upload or ingest files, +- finalize packages, +- retrieve package metadata, +- list/search packages by subject and producer metadata, +- create retention extensions and holds, +- expose download metadata or redirect/download endpoints, +- expose health and backend status. + +### Metadata Store + +Persistent database for registry state. + +Initial implementation can use SQLite for local development and PostgreSQL for +shared service deployments if that matches the surrounding service stack. + +Core tables: + +- `artifact_packages` +- `artifact_files` +- `storage_locations` +- `retention_rules` +- `retention_events` +- `audit_events` + +### Storage Adapter Interface + +Small backend contract used by the API service. + +Required operations: + +- `put(object_key, stream, metadata) -> storage_location` +- `get(object_key) -> stream or signed_url` +- `head(object_key) -> object_metadata` +- `delete(object_key) -> deletion_result` +- `health() -> backend_status` + +Initial backends: + +- local filesystem backend for tests and development, +- S3-compatible backend for Ceph RGW and cloud object stores. + +### Retention Policy Engine + +Applies default rules at ingestion and records later changes. + +Initial retention classes: + +- `transient`: short-lived scratch artifacts, +- `raw-evidence`: raw logs and run output, +- `summary-evidence`: compact reports and summaries, +- `release-evidence`: release or customer-facing evidence packages, +- `permanent-record`: manually held records with no automatic expiry. 
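One way to picture how a retention class maps to an expiry decision is the Python sketch below. The durations are placeholders chosen purely for illustration (the blueprint does not specify them), and `default_expires_at` is a hypothetical helper name. The only behavior taken from the list above is that `permanent-record` has no automatic expiry.

```python
from datetime import datetime, timedelta
from typing import Optional

# Placeholder durations: assumptions for illustration, not blueprint values.
DEFAULT_RETENTION = {
    "transient": timedelta(days=7),
    "raw-evidence": timedelta(days=90),
    "summary-evidence": timedelta(days=365),
    "release-evidence": timedelta(days=730),
    "permanent-record": None,  # no automatic expiry
}


def default_expires_at(retention_class: str, ingested_at: datetime) -> Optional[datetime]:
    """Return the default expiry for a package, or None for no expiry."""
    if retention_class not in DEFAULT_RETENTION:
        raise ValueError(f"unknown retention class: {retention_class}")
    duration = DEFAULT_RETENTION[retention_class]
    return None if duration is None else ingested_at + duration
```

Rejecting unknown classes outright, rather than falling back to a default, is a design choice in this sketch so that every package receives an explicit expiry decision at ingestion.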
+ +Each package stores: + +- selected retention class, +- default retention rule, +- computed `expires_at`, +- extension records, +- hold records, +- deletion eligibility state. + +### Audit Log + +Append-only record of important events: + +- package created, +- file uploaded, +- package finalized, +- retrieval requested, +- retention extended, +- hold applied or released, +- deletion requested, +- deletion completed or failed. + +The audit log does not need to be cryptographic in the first release, but the +schema should leave room for signed events or external write-once storage later. + +## Data Model + +### Artifact Package + +Required fields: + +- `id` +- `name` +- `producer` +- `subject` +- `retention_class` +- `status` +- `created_at` +- `finalized_at` +- `expires_at` +- `metadata` + +Recommended metadata keys: + +- `repo_slug` +- `run_id` +- `assessment_id` +- `target_profile_ref` +- `assessment_profile_ref` +- `source_commits` +- `tool_versions` +- `environment` + +### Artifact File + +Required fields: + +- `id` +- `package_id` +- `relative_path` +- `media_type` +- `size_bytes` +- `sha256` +- `created_at` + +### Storage Location + +Required fields: + +- `id` +- `artifact_file_id` +- `backend_id` +- `object_key` +- `storage_class` +- `status` +- `created_at` +- `last_verified_at` + +### Retention Event + +Required fields: + +- `id` +- `package_id` +- `event_type` +- `reason` +- `created_by` +- `created_at` +- `previous_expires_at` +- `new_expires_at` + +Event types: + +- `default_rule_applied` +- `extended` +- `hold_applied` +- `hold_released` +- `deletion_eligible` +- `deleted` + +## API Shape + +Initial endpoints: + +```text +GET /health +GET /backends +POST /packages +GET /packages +GET /packages/{package_id} +POST /packages/{package_id}/files +POST /packages/{package_id}/finalize +GET /packages/{package_id}/manifest +GET /files/{file_id}/download +POST /packages/{package_id}/retention/extensions +POST /packages/{package_id}/retention/holds +POST 
/packages/{package_id}/retention/holds/{hold_id}/release +``` + +The first ingestion path can accept multipart file uploads. A later trusted-local +operator endpoint may ingest from server-local paths, but it should be disabled +by default because path ingestion changes the security boundary. + +## Package Manifest + +Every finalized package should expose a JSON manifest containing: + +- package metadata, +- retention summary, +- file list, +- file digests and sizes, +- storage backend references, +- source metadata, +- created/finalized timestamps. + +For guide-board runs, the manifest should preserve links to: + +- `run.json` +- `retention-summary.json` +- `reports/assessment-package.json` +- `reports/report.md` +- extension-generated scorecards or log reviews, +- raw artifact files captured by the assessment package manifest. + +## Guide-Board Pilot Flow + +```text +guide-board run directory + -> open-cmis-tck scorecard/log review + -> artifact-store package create + -> upload run files + -> finalize manifest + -> Statehub record links package id and summary +``` + +The artifact package should carry: + +- run id, +- target profile reference, +- assessment profile reference, +- result status, +- source commits for guide-board, open-cmis-tck, and the assessed repository, +- important report paths, +- retention class `raw-evidence` or `release-evidence`. + +## Ceph And S3-Compatible Storage + +Ceph should be introduced through the S3-compatible adapter, not as a special +case in producer logic. + +Configuration should support: + +- endpoint URL, +- bucket, +- region, +- access key reference, +- secret key reference, +- optional server-side encryption settings, +- object key prefix, +- storage class label. + +The service should never require credentials in producer request bodies. Use +environment variables, mounted secret files, or a local secret provider. + +## Future Retrieval Tiers + +The initial API can treat all stored files as immediately retrievable. 
Later, +storage locations can include: + +- `retrieval_tier`: hot, warm, cold, archive, +- `restore_status`: available, restore_requested, restoring, restored, expired, +- `restore_requested_at`, +- `restore_expires_at`. + +The registry API should be able to return "not immediately available" without +changing artifact identity. + +## Security Boundary + +Initial service assumptions: + +- internal service, not public internet exposed, +- authenticated producer/operator API before shared deployment, +- no secret values stored in artifact metadata, +- package paths are logical paths, not trusted filesystem paths, +- download authorization should be checked at the registry layer. + +Files may contain sensitive evidence. The service must treat metadata and bytes +as confidential by default. + +## Open Questions + +- Which identity provider should guard shared deployments? +- Should package metadata schemas be open-ended JSON or typed by producer? +- Should deduplication be package-local only or global by content hash? +- Should deletion first mark records deleted, then delete bytes, or reverse that + order with compensating events? +- How much Statehub integration belongs in this repo versus in Statehub clients? diff --git a/workplans/ARTIFACT-STORE-WP-0001-service-baseline.md b/workplans/ARTIFACT-STORE-WP-0001-service-baseline.md new file mode 100644 index 0000000..51116f2 --- /dev/null +++ b/workplans/ARTIFACT-STORE-WP-0001-service-baseline.md @@ -0,0 +1,229 @@ +--- +id: ARTIFACT-STORE-WP-0001 +type: workplan +title: "Artifact Store Service Baseline" +repo: artifact-store +domain: stack +status: active +owner: codex +topic_slug: stack +planning_priority: high +planning_order: 1 +created: "2026-05-15" +updated: "2026-05-15" +state_hub_workstream_id: "aebf996c-8721-4e8c-9e56-61d5e4bf8dcb" +--- + +# ARTIFACT-STORE-WP-0001: Artifact Store Service Baseline + +## Purpose + +Implement the first usable artifact registry and storage gateway. 
The service +should preserve artifact packages, index their metadata, delegate bytes to a +configured storage backend, apply default retention rules, and expose stable +package identifiers that Statehub and producer repositories can link to. + +The first producer target is a guide-board assessment run, including OpenCMIS TCK +reports and raw assessment artifacts. + +## Background + +Guide-board can already produce self-contained run directories with retention +summaries, assessment packages, raw artifacts, scorecards, and log reviews. Those +directories should not live only in `/tmp`, and committing raw evidence into +producer repositories is the wrong long-term shape. + +`artifact-store` becomes the shared preservation layer: + +- producers generate files, +- artifact-store registers and stores them, +- Statehub records the work outcome and links to the registry package, +- storage backends handle durable bytes. + +Ceph is the likely self-hosted production backend through its S3-compatible RGW +interface, but the service must keep the backend interface generic. + +## Target Architecture + +```text +producer package + -> registry API + -> metadata database + -> retention policy engine + -> storage adapter + -> local filesystem or S3-compatible object storage +``` + +## Boundary + +This workplan owns the first service implementation and API contract. It does +not need to build a UI, implement cold-storage restore tiers, replace Statehub, +or provide formal records-management certification. + +## D1.1 - Service Scaffold And Repository Identity + +```task +id: ARTIFACT-STORE-WP-0001-T001 +status: todo +priority: high +state_hub_task_id: "84209430-ec3b-4c5e-924e-019c25434230" +``` + +Acceptance: + +- Replace the seed README with artifact-store service instructions. +- Add a Python service scaffold with a clear package/module layout. +- Provide a local development command. +- Provide a test command. +- Keep generated artifact bytes and local databases ignored by git. 
+- Document required environment variables. + +## D1.2 - Registry Data Model + +```task +id: ARTIFACT-STORE-WP-0001-T002 +status: todo +priority: high +state_hub_task_id: "e5249a39-46a2-4b56-813e-0339c52cd14e" +``` + +Acceptance: + +- Define persistent models for artifact packages, files, storage locations, + retention rules, retention events, and audit events. +- Store package metadata as structured JSON while keeping core query fields + explicit. +- Record package lifecycle status: created, uploading, finalized, deleted, and + failed. +- Record file `sha256`, size, media type, and logical relative path. +- Add migrations or a reproducible schema initialization path. + +## D1.3 - Local Filesystem Storage Backend + +```task +id: ARTIFACT-STORE-WP-0001-T003 +status: todo +priority: high +state_hub_task_id: "68f9a752-0012-4cc1-8768-ec3f75295e7a" +``` + +Acceptance: + +- Implement a storage adapter interface. +- Implement a local filesystem backend for development and tests. +- Store objects under deterministic package/file keys. +- Prevent path traversal and accidental writes outside the configured storage + root. +- Add backend health reporting. +- Add tests for put, get, head, and delete operations. + +## D1.4 - Package Ingestion API + +```task +id: ARTIFACT-STORE-WP-0001-T004 +status: todo +priority: high +state_hub_task_id: "e3879111-4be9-4731-8aea-15abb874f960" +``` + +Acceptance: + +- Add endpoints to create a package, upload files, finalize a package, retrieve + package metadata, list packages, and download files. +- Compute file hashes server-side during ingestion. +- Reject duplicate logical paths within one package unless explicitly replacing + a non-finalized file. +- Produce a package manifest after finalization. +- Add API tests covering successful ingestion and validation failures. 
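The path-safety rule in D1.3 and the duplicate-path rule in D1.4 could be checked along the following lines. This is a sketch with assumed names (`normalize_logical_path`, `PackageDraft`), not the planned API; a real ingestion path would also hash and store the bytes.

```python
from pathlib import PurePosixPath


def normalize_logical_path(raw: str) -> str:
    """Return a clean relative POSIX path or raise ValueError."""
    if not raw or "\\" in raw or "\x00" in raw:
        raise ValueError("invalid logical path")
    path = PurePosixPath(raw)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("logical path must stay inside the package")
    normalized = str(path)
    if normalized == ".":
        raise ValueError("logical path must name a file")
    return normalized


class PackageDraft:
    """Hypothetical in-memory view of a package during upload."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.finalized = False

    def add_file(self, raw_path: str, data: bytes, replace: bool = False) -> str:
        """Add a file, rejecting duplicates unless replace is requested."""
        if self.finalized:
            raise ValueError("package is finalized")
        path = normalize_logical_path(raw_path)
        if path in self.files and not replace:
            raise ValueError(f"duplicate logical path: {path}")
        self.files[path] = data
        return path
```

Normalizing before the duplicate check means `reports/./report.md` and `reports/report.md` count as the same logical path.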
+ +## D1.5 - Retention Baseline + +```task +id: ARTIFACT-STORE-WP-0001-T005 +status: todo +priority: high +state_hub_task_id: "2d6cbd83-c348-45ad-a223-7870a3412225" +``` + +Acceptance: + +- Seed default retention classes for transient, raw-evidence, summary-evidence, + release-evidence, and permanent-record. +- Apply a default `expires_at` when a package is created or finalized. +- Add endpoints to extend retention and apply or release holds. +- Record retention changes as retention events and audit events. +- Expose deletion eligibility without deleting bytes automatically in the first + implementation. + +## D1.6 - S3-Compatible Backend Design Hook + +```task +id: ARTIFACT-STORE-WP-0001-T006 +status: todo +priority: medium +state_hub_task_id: "7b980a55-2364-48c3-98ac-081629a8d2b7" +``` + +Acceptance: + +- Define configuration fields for an S3-compatible backend. +- Keep the adapter contract compatible with Ceph RGW. +- Add an implementation stub or feature-flagged backend if dependencies are not + ready. +- Document expected Ceph/S3 configuration without requiring a live Ceph service + for baseline tests. + +## D1.7 - Guide-Board Pilot Ingestion + +```task +id: ARTIFACT-STORE-WP-0001-T007 +status: todo +priority: high +state_hub_task_id: "eb822821-353c-4cd2-95bf-acb2f084b7ea" +``` + +Acceptance: + +- Provide a CLI helper or documented curl flow to register a guide-board run + directory as one package. +- Preserve guide-board run metadata: run id, target profile, assessment profile, + evidence result counts, finding counts, source commits, and report paths. +- Ingest the CMIS pilot run shape, including scorecard and log-review reports. +- Return a package id suitable for recording in Statehub. +- Add a fixture-based test that does not require the real OpenCMIS TCK. 
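A CLI helper for the D1.7 flow would likely begin by enumerating the run directory. The Python sketch below covers only that step: it lists files with logical relative paths, sizes, and guessed media types. `list_run_files` is a hypothetical name, skipping symlinks is an assumed safety choice, and the upload calls are left out because the API does not exist yet.

```python
import mimetypes
from pathlib import Path


def list_run_files(run_dir: Path) -> list[dict]:
    """List regular files under run_dir with package-relative POSIX paths."""
    entries = []
    for path in sorted(run_dir.rglob("*")):
        if not path.is_file() or path.is_symlink():
            continue  # skip directories and symlinks (assumed safety choice)
        media_type, _ = mimetypes.guess_type(path.name)
        entries.append({
            "relative_path": path.relative_to(run_dir).as_posix(),
            "media_type": media_type or "application/octet-stream",
            "size_bytes": path.stat().st_size,
        })
    return entries
```

Sorting the paths keeps the listing deterministic, which makes a fixture-based test straightforward to write.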
+ +## D1.8 - Operator Documentation And Handoff + +```task +id: ARTIFACT-STORE-WP-0001-T008 +status: todo +priority: medium +state_hub_task_id: "9b60036c-61f2-4c22-ad31-7213473d42d0" +``` + +Acceptance: + +- Document local run, test, and package ingestion commands. +- Document retention behavior and extension flow. +- Document the boundary between artifact-store and Statehub. +- Include a dev-agent handoff section listing the first implementation order. +- Keep architecture docs aligned with the implemented API. + +## Suggested Implementation Order + +1. Service scaffold, test harness, and README. +2. Metadata models and local database setup. +3. Local filesystem storage adapter. +4. Package create/upload/finalize/download API. +5. Retention defaults, extension, hold, and audit events. +6. Guide-board run ingestion helper. +7. S3-compatible backend configuration and Ceph notes. + +## First Pilot Success Criteria + +- A completed guide-board CMIS run can be ingested from a local directory. +- The package manifest lists every stored file with SHA-256 and size. +- The registry returns a stable package id. +- Files can be downloaded through the service. +- Default retention is visible and can be extended. +- Statehub can record the package id and summary without storing artifact bytes.
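
To make the manifest criterion concrete, here is a small Python sketch of a manifest that lists each file with its SHA-256 and size. The per-file field names follow the architecture blueprint's Artifact File fields (`relative_path`, `size_bytes`, `sha256`); `build_manifest`, the example package id, and the overall JSON layout are assumptions, since the manifest schema is not yet defined.

```python
import hashlib
import json


def build_manifest(package_id: str, retention_class: str, files: dict[str, bytes]) -> dict:
    """Build a manifest dict from logical paths mapped to file bytes."""
    return {
        "package_id": package_id,
        "retention_class": retention_class,
        "files": [
            {
                "relative_path": path,
                "size_bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
            for path, data in sorted(files.items())
        ],
    }


manifest = build_manifest(
    "pkg-example",  # hypothetical id format
    "raw-evidence",
    {"run.json": b"{}", "reports/report.md": b"# Report\n"},
)
print(json.dumps(manifest, indent=2))
```

A record like this carries identifiers, digests, and sizes but no file contents, so Statehub could store the package id and a summary without holding artifact bytes.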