diff --git a/.env.example b/.env.example index 6cd7f90..d3ae3da 100644 --- a/.env.example +++ b/.env.example @@ -26,6 +26,15 @@ ARTIFACTSTORE_ANON_READ=false ARTIFACTSTORE_API_URL=http://127.0.0.1:8000 ARTIFACTSTORE_API_TOKEN=dev-token +# Guide-board pilot helper defaults. +ARTIFACTSTORE_GUIDE_BOARD_SCHEMA=schemas/guide-board.run.v1.json +STATE_HUB_URL=http://127.0.0.1:8000 +STATE_HUB_AUTHOR=artifact-store +STATE_HUB_WORKSTREAM_ID= +STATE_HUB_TASK_ID= +GUIDE_BOARD_RUN_DIR= +ARTIFACTSTORE_INGEST_RESULT_PATH= + # Optional TOML file overriding retention class default durations. ARTIFACTSTORE_RETENTION_CONFIG_PATH= diff --git a/docs/OPERATOR.md b/docs/OPERATOR.md index 73e451c..472f291 100644 --- a/docs/OPERATOR.md +++ b/docs/OPERATOR.md @@ -3,8 +3,9 @@ Status: v0.1 (WP-0003 baseline) Updated: 2026-05-16 -This guide is the user manual for running `artifact-store` v0.1 — the -library, CLI, HTTP ingestion API, manifest surface, and retention lifecycle. +This guide is the user manual for running `artifact-store` v0.1: the library, +CLI, HTTP ingestion API, manifest surface, retention lifecycle, storage checks, +and the guide-board pilot path. For architectural background see [ARCHITECTURE-BLUEPRINT.md](ARCHITECTURE-BLUEPRINT.md), the ADRs under @@ -52,6 +53,7 @@ All settings are prefixed with ``ARTIFACTSTORE_`` and read by | `ARTIFACTSTORE_ANON_READ` | `false` | Set `true` only for local demos where read endpoints may be anonymous. | | `ARTIFACTSTORE_API_URL` | `http://127.0.0.1:8000` | Default API base URL used by HTTP-backed CLI commands. | | `ARTIFACTSTORE_API_TOKEN` | empty | Default bearer token used by HTTP-backed CLI commands. | +| `ARTIFACTSTORE_GUIDE_BOARD_SCHEMA` | `schemas/guide-board.run.v1.json` | Schema path used by guide-board pilot bootstrap helpers. | | `ARTIFACTSTORE_RETENTION_CONFIG_PATH` | empty | Optional TOML file overriding retention-class default durations. 
| | `ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS` | `3600` | Default interval for external schedulers that invoke the retention sweeper. | | `ARTIFACTSTORE_STORAGE_BACKENDS` | `local` | Comma-separated backend IDs to configure (`local`, `s3`). | @@ -67,6 +69,9 @@ All settings are prefixed with ``ARTIFACTSTORE_`` and read by | `ARTIFACTSTORE_S3_SSE` | empty | Optional server-side encryption value, e.g. `AES256`. | | `ARTIFACTSTORE_S3_MULTIPART_THRESHOLD_BYTES` | `67108864` | Multipart threshold for the S3 backend. | | `ARTIFACTSTORE_S3_MULTIPART_CHUNK_BYTES` | `8388608` | Multipart part size for the S3 backend. | +| `STATE_HUB_URL` | `http://127.0.0.1:8000` | State Hub base URL used by guide-board linkage helpers. | +| `STATE_HUB_WORKSTREAM_ID` | empty | Optional workstream id for State Hub linkage events. | +| `STATE_HUB_TASK_ID` | empty | Optional task id for State Hub linkage events. | See [`.env.example`](../.env.example) for the canonical template. @@ -201,6 +206,7 @@ digest, emits `v1.storage.location_verified`, and marks failed locations as | `artifactstore manifest ` | Fetch the JSON manifest projection through the HTTP API. | | `artifactstore retention sweep` | Run one deletion-eligibility sweep against the configured DB. | | `artifactstore storage verify --backend ` | Re-read stored objects for a backend and record verification events. | +| `artifactstore guide-board ingest ` | Ingest one guide-board run directory as an artifact package. | The CLI is a thin client over `artifactstore.registry.Registry` (see [ADR-0005](adr/0005-v1-tech-stack.md)). @@ -215,6 +221,7 @@ The CLI is a thin client over `artifactstore.registry.Registry` | `/files...` | File metadata and byte downloads, including single-range reads. | | `/uploads...` | Upload-session wire shape for whole-body v1 uploads. | | `/packages/{id}/retention...` | Extend retention, apply/release holds, and read retention history. | +| `POST /metadata-schemas` | Register package metadata schemas by slug. 
| | `GET /events` | Long-poll event feed, CBOR by default or JSON with `Accept: application/json`. | All non-health routes require a bearer token unless @@ -267,6 +274,14 @@ asyncio.run(main()) Prerequisites: `make migrate-fresh` has been run so the schema and the retention class seeds exist. +## Guide-board pilot + +The guide-board pilot stores a run directory as one artifact package and records +only package identifiers in State Hub. See +[docs/pilots/guide-board.md](pilots/guide-board.md) for schema registration, +the real `~/guide-board` plus `~/open-cmis-tck` smoke procedure, and the exact +`POST /progress/` linkage payload. + ## Replay / disaster recovery Every state-changing operation writes one row to `events` and updates the @@ -303,6 +318,8 @@ sequence order through the canonical view writer. The result is and the v1 schema commitments. - [ROADMAP.md](ROADMAP.md) — workplan sequencing. - [ASSEMBLY-EXPERIMENT.md](ASSEMBLY-EXPERIMENT.md) — opt-in asm research line. +- [pilots/guide-board.md](pilots/guide-board.md) — guide-board pilot ingestion + and State Hub linkage. ### Architecture Decision Records diff --git a/docs/pilots/guide-board.md b/docs/pilots/guide-board.md new file mode 100644 index 0000000..d73c706 --- /dev/null +++ b/docs/pilots/guide-board.md @@ -0,0 +1,162 @@ +# Guide-Board Pilot + +Status: active pilot +Updated: 2026-05-16 + +This guide wires the first real producer into artifact-store. A guide-board run +directory becomes one artifact package; State Hub records the package identity +and manifest digest, but never stores artifact bytes. 
+ +## One-Time Schema Registration + +Start artifact-store and register the pilot metadata schema: + +```sh +cd /home/worsch/artifact-store +export ARTIFACTSTORE_API_URL=http://127.0.0.1:8000 +export ARTIFACTSTORE_API_TOKEN=dev-token +python3 scripts/register-guide-board-schema.py +``` + +The script posts this payload shape to `POST /metadata-schemas`: + +```json +{ + "slug": "guide-board.run.v1", + "json_schema": { + "$id": "artifactstore:schemas:guide-board.run.v1" + } +} +``` + +## Ingest A Run + +The local CLI path opens the configured database and storage backend directly: + +```sh +artifactstore guide-board ingest /tmp/guide-board-run \ + --schema schemas/guide-board.run.v1.json +``` + +Output is JSON: + +```json +{ + "package_id": "00000000-0000-0000-0000-000000000000", + "manifest_digest": "blake3:...", + "file_count": 8, + "reused_existing": false +} +``` + +The helper is idempotent by guide-board `run_id`. Re-ingesting the same +finalized run returns the existing package id and manifest digest with +`reused_existing: true`. + +## State Hub Linkage + +After ingest, record a progress event with structured `detail`. 
This is the +canonical linkage shape: + +```sh +curl -s -X POST "$STATE_HUB_URL/progress/" \ + -H "Content-Type: application/json" \ + -d '{ + "event_type": "artifact_link", + "author": "artifact-store", + "workstream_id": "701c4d8c-5cf4-4a4a-ab60-1dcae53fe771", + "task_id": "bffa3573-4a1f-4c12-8c73-6d55bd8f6297", + "summary": "guide-board run artifacts stored in artifact-store package ", + "detail": { + "producer": "guide-board", + "artifact_store_api_url": "http://127.0.0.1:8000", + "run_dir": "/tmp/guide-board-run", + "run_id": "", + "target_profile_ref": "", + "assessment_profile_ref": "", + "result_status": "", + "package_id": "", + "manifest_digest": "", + "file_count": 8, + "retention_class": "release-evidence" + } + }' +``` + +Use the checked-in helper to build the same event from environment variables: + +```sh +export STATE_HUB_URL=http://127.0.0.1:8000 +export STATE_HUB_WORKSTREAM_ID=701c4d8c-5cf4-4a4a-ab60-1dcae53fe771 +export STATE_HUB_TASK_ID=bffa3573-4a1f-4c12-8c73-6d55bd8f6297 +export GUIDE_BOARD_RUN_DIR=/tmp/guide-board-run +export ARTIFACTSTORE_INGEST_RESULT_PATH=/tmp/artifactstore-guide-board-ingest.json +python3 scripts/link-guide-board-package.py +``` + +The helper posts only identifiers, summary metadata, and links. Artifact bytes +remain in artifact-store storage backends. + +## Real Producer Smoke + +This path uses the real guide-board core and the external `open-cmis-tck` +extension. It is expected to complete under five minutes on a developer +workstation once Python dependencies and local candidate prerequisites are in +place. + +1. 
Produce a guide-board run: + +```sh +cd /home/worsch/guide-board +mkdir -p /tmp/guide-board-artifact-store-smoke +PYTHONPATH=src python3 -m guide_board \ + --extension-dir ../open-cmis-tck \ + run \ + --target ../open-cmis-tck/profiles/targets/kontextual-cmis-compat.json \ + --assessment ../open-cmis-tck/profiles/assessments/cmis-browser-baseline.json \ + --output-dir /tmp/guide-board-artifact-store-smoke/open-cmis-tck-baseline +``` + +2. Start artifact-store: + +```sh +cd /home/worsch/artifact-store +cp .env.example .env +make migrate-fresh +make dev +``` + +3. Register the schema and ingest the run: + +```sh +export ARTIFACTSTORE_API_TOKEN=dev-token +python3 scripts/register-guide-board-schema.py +artifactstore guide-board ingest \ + /tmp/guide-board-artifact-store-smoke/open-cmis-tck-baseline \ + --schema schemas/guide-board.run.v1.json \ + > /tmp/artifactstore-guide-board-ingest.json +cat /tmp/artifactstore-guide-board-ingest.json +``` + +4. Verify the manifest: + +```sh +PACKAGE_ID=$(python3 -c 'import json; print(json.load(open("/tmp/artifactstore-guide-board-ingest.json"))["package_id"])') +artifactstore manifest "$PACKAGE_ID" +``` + +5. Record State Hub linkage: + +```sh +export STATE_HUB_URL=http://127.0.0.1:8000 +export STATE_HUB_WORKSTREAM_ID=701c4d8c-5cf4-4a4a-ab60-1dcae53fe771 +export STATE_HUB_TASK_ID=bffa3573-4a1f-4c12-8c73-6d55bd8f6297 +export GUIDE_BOARD_RUN_DIR=/tmp/guide-board-artifact-store-smoke/open-cmis-tck-baseline +export ARTIFACTSTORE_INGEST_RESULT_PATH=/tmp/artifactstore-guide-board-ingest.json +python3 scripts/link-guide-board-package.py +``` + +To smoke the storage swap after enabling WP-0004 S3 settings, keep the same +guide-board ingest command and set +`ARTIFACTSTORE_STORAGE_BACKEND_ROUTES='guide-board:release-evidence=s3,*:*=local'` +before starting artifact-store. 
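Reviewer sketch (not part of this change set): in the smoke procedure above, the file written in step 3 and read through `ARTIFACTSTORE_INGEST_RESULT_PATH` in step 5 is the only hand-off between ingest and State Hub linkage. A small pre-flight check on that JSON could catch an empty or truncated result before anything is posted. The function name `check_ingest_result` is hypothetical. The key names and the `blake3:` prefix are taken from the documented ingest output, and treating `file_count` as optional mirrors how the link helper handles it.

```python
from typing import Any


def check_ingest_result(payload: Any) -> list[str]:
    """Return a list of problems; an empty list means the result looks usable.

    Hypothetical helper for illustration only. It checks the shape shown in
    the "Ingest A Run" output example, nothing more.
    """
    if not isinstance(payload, dict):
        return ["ingest result must be a JSON object"]

    problems: list[str] = []

    # The link helper needs both identifiers as non-empty strings.
    for key in ("package_id", "manifest_digest"):
        value = payload.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty {key!r}")

    # The documented digest format carries an algorithm prefix.
    digest = payload.get("manifest_digest")
    if isinstance(digest, str) and digest and not digest.startswith("blake3:"):
        problems.append("manifest_digest lacks the 'blake3:' prefix")

    # file_count is optional; zero is accepted because a reused package
    # reports file_count 0 in this diff. bool is rejected explicitly since
    # it is a subclass of int in Python.
    if "file_count" in payload:
        count = payload["file_count"]
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            problems.append("file_count must be a non-negative integer")

    return problems
```

One way to use it would be to load `/tmp/artifactstore-guide-board-ingest.json` with `json.load`, pass the result to `check_ingest_result`, and stop before step 5 if the returned list is not empty.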
diff --git a/schemas/guide-board.run.v1.json b/schemas/guide-board.run.v1.json new file mode 100644 index 0000000..5a589db --- /dev/null +++ b/schemas/guide-board.run.v1.json @@ -0,0 +1,42 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "artifactstore:schemas:guide-board.run.v1", + "title": "Guide-board run metadata", + "type": "object", + "additionalProperties": false, + "required": [ + "run_id", + "target_profile_ref", + "assessment_profile_ref", + "result_status", + "source_commits", + "report_paths", + "evidence_counts", + "finding_counts" + ], + "properties": { + "run_id": { "type": "string", "minLength": 1 }, + "target_profile_ref": { "type": "string", "minLength": 1 }, + "assessment_profile_ref": { "type": "string", "minLength": 1 }, + "result_status": { "type": "string", "minLength": 1 }, + "source_commits": { + "type": "object", + "additionalProperties": { + "type": "string", + "minLength": 7 + } + }, + "report_paths": { + "type": "array", + "items": { "type": "string", "minLength": 1 } + }, + "evidence_counts": { + "type": "object", + "additionalProperties": { "type": "integer", "minimum": 0 } + }, + "finding_counts": { + "type": "object", + "additionalProperties": { "type": "integer", "minimum": 0 } + } + } +} diff --git a/scripts/link-guide-board-package.py b/scripts/link-guide-board-package.py new file mode 100644 index 0000000..a2bec5d --- /dev/null +++ b/scripts/link-guide-board-package.py @@ -0,0 +1,133 @@ +#!/usr/bin/env python3 +"""Record guide-board artifact package linkage in State Hub.""" + +from __future__ import annotations + +import json +import os +import urllib.error +import urllib.request +from pathlib import Path +from typing import Any + + +def main() -> None: + state_hub_url = _env("STATE_HUB_URL", "http://127.0.0.1:8000").rstrip("/") + artifact_api_url = _env("ARTIFACTSTORE_API_URL", "http://127.0.0.1:8000").rstrip("/") + run_dir = Path(_required("GUIDE_BOARD_RUN_DIR")) + run_json = _read_json(run_dir / 
"run.json") + retention_summary = _read_json(run_dir / "retention-summary.json") + ingest_result = _ingest_result() + + package_id = _env("ARTIFACTSTORE_PACKAGE_ID") or _required_from( + ingest_result, + "package_id", + "ARTIFACTSTORE_PACKAGE_ID", + ) + manifest_digest = _env("ARTIFACTSTORE_MANIFEST_DIGEST") or _required_from( + ingest_result, + "manifest_digest", + "ARTIFACTSTORE_MANIFEST_DIGEST", + ) + run_id = _env("GUIDE_BOARD_RUN_ID") or str( + run_json.get("run_id") or run_json.get("id") or retention_summary.get("run_id") + ) + summary = retention_summary.get("summary", {}) + if not isinstance(summary, dict): + summary = {} + result_status = _env("GUIDE_BOARD_RESULT_STATUS") or str( + run_json.get("result_status") or run_json.get("status") or summary.get("status") + ) + + detail: dict[str, Any] = { + "producer": "guide-board", + "artifact_store_api_url": artifact_api_url, + "run_dir": str(run_dir), + "run_id": run_id, + "target_profile_ref": str(run_json["target_profile_ref"]), + "assessment_profile_ref": str(run_json["assessment_profile_ref"]), + "result_status": result_status, + "package_id": package_id, + "manifest_digest": manifest_digest, + } + if "file_count" in ingest_result: + detail["file_count"] = ingest_result["file_count"] + retention_class = _env("ARTIFACTSTORE_RETENTION_CLASS") + if retention_class: + detail["retention_class"] = retention_class + + payload: dict[str, Any] = { + "event_type": _env("STATE_HUB_EVENT_TYPE", "artifact_link"), + "author": _env("STATE_HUB_AUTHOR", "artifact-store"), + "summary": _env( + "STATE_HUB_SUMMARY", + f"guide-board run {run_id} artifacts stored in artifact-store package {package_id}", + ), + "detail": detail, + } + for field, env_name in ( + ("topic_id", "STATE_HUB_TOPIC_ID"), + ("workstream_id", "STATE_HUB_WORKSTREAM_ID"), + ("task_id", "STATE_HUB_TASK_ID"), + ("session_id", "STATE_HUB_SESSION_ID"), + ): + value = _env(env_name) + if value: + payload[field] = value + + request = urllib.request.Request( + 
f"{state_hub_url}/progress/", + data=json.dumps(payload).encode("utf-8"), + headers={"Content-Type": "application/json", "Accept": "application/json"}, + method="POST", + ) + try: + with urllib.request.urlopen(request, timeout=30) as response: + print(response.read().decode("utf-8")) + except urllib.error.HTTPError as exc: + detail_text = exc.read().decode("utf-8", errors="replace") + raise SystemExit(f"HTTP {exc.code}: {detail_text}") from exc + + +def _env(name: str, default: str = "") -> str: + return os.environ.get(name, default) + + +def _required(name: str) -> str: + value = _env(name) + if not value: + raise SystemExit(f"missing required environment variable: {name}") + return value + + +def _required_from(payload: dict[str, Any], key: str, env_name: str) -> str: + value = payload.get(key) + if isinstance(value, str) and value: + return value + raise SystemExit(f"missing {key!r}; set {env_name} or ARTIFACTSTORE_INGEST_RESULT_PATH") + + +def _ingest_result() -> dict[str, Any]: + raw_json = _env("ARTIFACTSTORE_INGEST_RESULT_JSON") + if raw_json: + payload = json.loads(raw_json) + if not isinstance(payload, dict): + raise SystemExit("ARTIFACTSTORE_INGEST_RESULT_JSON must be a JSON object") + return payload + + result_path = _env("ARTIFACTSTORE_INGEST_RESULT_PATH") + if result_path: + return _read_json(Path(result_path)) + return {} + + +def _read_json(path: Path) -> dict[str, Any]: + with path.open("r", encoding="utf-8") as fh: + payload = json.load(fh) + if not isinstance(payload, dict): + raise SystemExit(f"{path} must contain a JSON object") + return payload + + +if __name__ == "__main__": + main() diff --git a/scripts/register-guide-board-schema.py b/scripts/register-guide-board-schema.py new file mode 100644 index 0000000..e165f18 --- /dev/null +++ b/scripts/register-guide-board-schema.py @@ -0,0 +1,44 @@ +#!/usr/bin/env python3 +"""Register the guide-board pilot metadata schema through the HTTP API.""" + +from __future__ import annotations + +import json 
+import os +import urllib.error +import urllib.request +from pathlib import Path + +SCHEMA_SLUG = "guide-board.run.v1" + + +def main() -> None: + api_url = os.environ.get("ARTIFACTSTORE_API_URL", "http://127.0.0.1:8000").rstrip("/") + token = os.environ["ARTIFACTSTORE_API_TOKEN"] + schema_path = Path( + os.environ.get("ARTIFACTSTORE_GUIDE_BOARD_SCHEMA", "schemas/guide-board.run.v1.json") + ) + payload = { + "slug": SCHEMA_SLUG, + "json_schema": json.loads(schema_path.read_text(encoding="utf-8")), + } + request = urllib.request.Request( + f"{api_url}/metadata-schemas", + data=json.dumps(payload).encode(), + headers={ + "Authorization": f"Bearer {token}", + "Content-Type": "application/json", + "Accept": "application/json", + }, + method="POST", + ) + try: + with urllib.request.urlopen(request, timeout=30) as response: + print(response.read().decode("utf-8")) + except urllib.error.HTTPError as exc: + detail = exc.read().decode("utf-8", errors="replace") + raise SystemExit(f"HTTP {exc.code}: {detail}") from exc + + +if __name__ == "__main__": + main() diff --git a/src/artifactstore/api/http/__init__.py b/src/artifactstore/api/http/__init__.py index da8f26a..5790207 100644 --- a/src/artifactstore/api/http/__init__.py +++ b/src/artifactstore/api/http/__init__.py @@ -52,6 +52,12 @@ class PackageCreate(BaseModel): subject: str = Field(min_length=1) retention_class: str = Field(min_length=1) metadata: dict[str, Any] = Field(default_factory=dict) + metadata_schema_slug: str | None = None + + +class MetadataSchemaCreate(BaseModel): + slug: str = Field(min_length=1) + json_schema: dict[str, Any] class UploadCreate(BaseModel): @@ -224,6 +230,24 @@ def create_app(settings: Settings | None = None) -> FastAPI: classes = await registry.list_retention_classes() return {"retention_classes": [_retention_class_dict(c) for c in classes]} + @application.post("/metadata-schemas", status_code=status.HTTP_201_CREATED) + async def register_metadata_schema( + body: MetadataSchemaCreate, + 
_actor: str = Depends(require_write_auth), + registry: Registry = Depends(get_registry), + ) -> dict[str, Any]: + schema_id = await registry.register_metadata_schema( + slug=body.slug, + json_schema=body.json_schema, + ) + schema = await registry.get_metadata_schema(body.slug) + return { + "id": str(schema_id), + "slug": schema.slug, + "json_schema": schema.json_schema, + "created_at": _iso(schema.created_at), + } + @application.post("/packages", status_code=status.HTTP_201_CREATED) async def create_package( body: PackageCreate, @@ -238,6 +262,7 @@ def create_app(settings: Settings | None = None) -> FastAPI: retention_class=body.retention_class, actor=actor, metadata=body.metadata, + metadata_schema_slug=body.metadata_schema_slug, ) return _package_dict(await registry.get_package(package_id)) except ValueError as exc: diff --git a/src/artifactstore/cli/__init__.py b/src/artifactstore/cli/__init__.py index b1a33fe..649beb4 100644 --- a/src/artifactstore/cli/__init__.py +++ b/src/artifactstore/cli/__init__.py @@ -35,8 +35,10 @@ app = typer.Typer( ) retention_app = typer.Typer(help="Retention lifecycle commands", no_args_is_help=True) storage_app = typer.Typer(help="Storage backend commands", no_args_is_help=True) +guide_board_app = typer.Typer(help="Guide-board pilot commands", no_args_is_help=True) app.add_typer(retention_app, name="retention") app.add_typer(storage_app, name="storage") +app.add_typer(guide_board_app, name="guide-board") @app.callback() @@ -208,6 +210,28 @@ def storage_verify( ) +@guide_board_app.command("ingest") +def guide_board_ingest( + run_dir: Path = typer.Argument( + ..., + exists=True, + file_okay=False, + dir_okay=True, + readable=True, + help="Guide-board run directory.", + ), + schema_path: Path = typer.Option( + Path("schemas/guide-board.run.v1.json"), + "--schema", + help="Path to the guide-board metadata schema JSON.", + ), +) -> None: + """Ingest a guide-board run directory through the local registry.""" + settings = get_settings() + 
result = asyncio.run(_guide_board_ingest_async(settings, run_dir, schema_path)) + typer.echo(json.dumps(result, indent=2)) + + # ---- internals ------------------------------------------------------------- @@ -286,6 +310,34 @@ async def _storage_verify_async( ] +async def _guide_board_ingest_async( + settings: Settings, + run_dir: Path, + schema_path: Path, +) -> dict[str, Any]: + from artifactstore.app import build_registry + from artifactstore.pilots.guide_board import GUIDE_BOARD_SCHEMA_SLUG, ingest_run + + registry: Registry = build_registry(settings) + try: + schema = json.loads(schema_path.read_text(encoding="utf-8")) + if not isinstance(schema, dict): + raise click.BadParameter(f"schema must be a JSON object: {schema_path}") + await registry.register_metadata_schema( + slug=GUIDE_BOARD_SCHEMA_SLUG, + json_schema=schema, + ) + result = await ingest_run(run_dir, registry=registry) + finally: + await registry.dispose() + return { + "package_id": result.package_id, + "manifest_digest": result.manifest_digest, + "file_count": result.file_count, + "reused_existing": result.reused_existing, + } + + def _http_json( method: str, base_url: str, diff --git a/src/artifactstore/events/views.py b/src/artifactstore/events/views.py index 40c9511..080da00 100644 --- a/src/artifactstore/events/views.py +++ b/src/artifactstore/events/views.py @@ -68,7 +68,9 @@ async def _apply_package_created(connection: AsyncConnection, event: Event) -> N producer=payload["producer"], subject=payload["subject"], retention_class=payload["retention_class"], - metadata_schema_id=None, + metadata_schema_id=UUID(payload["metadata_schema_id"]) + if payload.get("metadata_schema_id") + else None, metadata=payload.get("metadata", {}), status="created", manifest_digest=None, diff --git a/src/artifactstore/pilots/__init__.py b/src/artifactstore/pilots/__init__.py new file mode 100644 index 0000000..a315581 --- /dev/null +++ b/src/artifactstore/pilots/__init__.py @@ -0,0 +1 @@ +"""Pilot producer 
integrations.""" diff --git a/src/artifactstore/pilots/guide_board.py b/src/artifactstore/pilots/guide_board.py new file mode 100644 index 0000000..32eb38e --- /dev/null +++ b/src/artifactstore/pilots/guide_board.py @@ -0,0 +1,308 @@ +"""Guide-board pilot ingestion helper.""" + +from __future__ import annotations + +import json +import mimetypes +import subprocess +from collections.abc import AsyncIterator +from dataclasses import dataclass +from pathlib import Path +from typing import Any + +from artifactstore.registry import Registry + +__all__ = ["GUIDE_BOARD_SCHEMA_SLUG", "GuideBoardIngestResult", "ingest_run"] + +GUIDE_BOARD_SCHEMA_SLUG = "guide-board.run.v1" +CORE_RUN_PATHS = ( + "run.json", + "retention-summary.json", + "plan.json", + "sources.lock.json", + "target-profile.snapshot.json", + "assessment-profile.snapshot.json", + "normalized/evidence.json", + "normalized/findings.json", + "normalized/mappings.json", + "reports/fragments.json", + "reports/submission-package.json", + "exports/export-manifest.json", +) + + +@dataclass(frozen=True, slots=True) +class GuideBoardIngestResult: + package_id: str + manifest_digest: str + file_count: int + reused_existing: bool = False + + +async def ingest_run( + run_dir: str | Path, + *, + registry: Registry, + actor: str = "guide-board", + metadata_schema_slug: str = GUIDE_BOARD_SCHEMA_SLUG, +) -> GuideBoardIngestResult: + """Ingest one guide-board run directory into artifact-store.""" + root = Path(run_dir) + run_json = _read_json(root / "run.json") + retention_summary = _read_json(root / "retention-summary.json") + source_lock = _read_json_if_exists(root / "sources.lock.json") + package_manifest_path = root / "reports" / "assessment-package.json" + package_manifest = _read_json(package_manifest_path) + + metadata = _metadata(run_json, retention_summary, source_lock) + run_id = str(metadata["run_id"]) + existing = await registry.list_packages( + producer="guide-board", + metadata_key="run_id", + 
metadata_value=run_id, + ) + for package in existing: + if package.status == "finalized" and package.manifest_digest_hex: + return GuideBoardIngestResult( + package_id=str(package.id), + manifest_digest=f"blake3:{package.manifest_digest_hex}", + file_count=0, + reused_existing=True, + ) + + package_id = await registry.create_package( + name=f"guide-board run {run_id}", + producer="guide-board", + subject=str(metadata["target_profile_ref"]), + retention_class=str(retention_summary.get("retention_class", "release-evidence")), + actor=actor, + metadata=metadata, + metadata_schema_slug=metadata_schema_slug, + ) + + paths = _declared_paths(package_manifest) + paths.update(_retained_report_paths(retention_summary)) + paths.add("reports/assessment-package.json") + for rel_path in CORE_RUN_PATHS: + if (root / rel_path).is_file(): + paths.add(rel_path) + for rel_path in sorted(paths): + source = root / rel_path + await registry.ingest_file( + package_id, + relative_path=rel_path, + media_type=mimetypes.guess_type(source.name)[0] or "application/octet-stream", + stream=_file_chunks(source), + actor=actor, + ) + + await registry.finalize_package(package_id, actor=actor) + package = await registry.get_package(package_id) + if package.manifest_digest_hex is None: + raise RuntimeError(f"package {package_id} finalized without manifest digest") + return GuideBoardIngestResult( + package_id=str(package_id), + manifest_digest=f"blake3:{package.manifest_digest_hex}", + file_count=len(paths), + ) + + +def _metadata( + run_json: dict[str, Any], + retention_summary: dict[str, Any], + source_lock: dict[str, Any] | None, +) -> dict[str, Any]: + summary = retention_summary.get("summary", {}) + if not isinstance(summary, dict): + summary = {} + return { + "run_id": str(run_json.get("run_id") or run_json.get("id") or retention_summary["run_id"]), + "target_profile_ref": str(run_json["target_profile_ref"]), + "assessment_profile_ref": str(run_json["assessment_profile_ref"]), + 
"result_status": str( + run_json.get("result_status") or run_json.get("status") or summary.get("status") + ), + "source_commits": _source_commits(run_json, source_lock), + "report_paths": sorted(_retained_report_paths(retention_summary)), + "evidence_counts": _evidence_counts(retention_summary, summary), + "finding_counts": _finding_counts(retention_summary, summary), + } + + +def _declared_paths(package_manifest: dict[str, Any]) -> set[str]: + paths: set[str] = set() + raw_files = package_manifest.get("files", []) + if raw_files is not None and not isinstance(raw_files, list): + raise ValueError("assessment-package.json 'files' must be a list") + for entry in raw_files or []: + if isinstance(entry, str): + paths.add(entry) + elif isinstance(entry, dict) and isinstance(entry.get("path"), str): + paths.add(entry["path"]) + else: + raise ValueError(f"invalid assessment package file entry: {entry!r}") + + raw_artifacts = package_manifest.get("artifact_manifest", []) + if raw_artifacts is not None and not isinstance(raw_artifacts, list): + raise ValueError("assessment-package.json 'artifact_manifest' must be a list") + for entry in raw_artifacts or []: + if isinstance(entry, dict) and isinstance(entry.get("path"), str): + paths.add(entry["path"]) + else: + raise ValueError(f"invalid assessment package artifact entry: {entry!r}") + return paths + + +def _retained_report_paths(retention_summary: dict[str, Any]) -> set[str]: + paths: set[str] = set() + for key in ("report_paths", "report_refs", "export_refs"): + raw_paths = retention_summary.get(key, []) + if not isinstance(raw_paths, list): + continue + paths.update(path for path in raw_paths if isinstance(path, str) and path) + return paths + + +def _source_commits( + run_json: dict[str, Any], + source_lock: dict[str, Any] | None, +) -> dict[str, str]: + raw = run_json.get("source_commits") + if isinstance(raw, dict): + return {str(key): str(value) for key, value in raw.items()} + + commits: dict[str, str] = {} + if 
source_lock is not None: + for label, path in _source_paths(source_lock).items(): + commit = _git_head(path) + if commit is not None: + commits[label] = commit + if commits: + return commits + + fingerprints = _source_fingerprints(source_lock) + if fingerprints: + return fingerprints + + return {"unknown": "unrecorded-source"} + + +def _source_paths(source_lock: dict[str, Any]) -> dict[str, Path]: + paths: dict[str, Path] = {} + profiles = source_lock.get("profiles", {}) + if isinstance(profiles, dict): + for key, value in profiles.items(): + if isinstance(value, dict) and isinstance(value.get("path"), str): + paths[f"profile:{key}"] = Path(value["path"]) + + extensions = source_lock.get("extensions", []) + if isinstance(extensions, list): + for entry in extensions: + if not isinstance(entry, dict): + continue + extension_id = str(entry.get("id") or "unknown-extension") + raw_path = entry.get("path") + if isinstance(raw_path, str) and Path(raw_path).is_absolute(): + paths[f"extension:{extension_id}"] = Path(raw_path) + return paths + + +def _git_head(path: Path) -> str | None: + try: + completed = subprocess.run( + ["git", "-C", str(path.parent if path.is_file() else path), "rev-parse", "HEAD"], + check=True, + capture_output=True, + text=True, + timeout=5, + ) + except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired): + return None + commit = completed.stdout.strip() + return commit or None + + +def _source_fingerprints(source_lock: dict[str, Any]) -> dict[str, str]: + fingerprints: dict[str, str] = {} + for key, value in source_lock.items(): + if key == "id" and isinstance(value, str): + fingerprints["source_lock"] = value + + profiles = source_lock.get("profiles", {}) + if isinstance(profiles, dict): + for key, value in profiles.items(): + if isinstance(value, dict) and isinstance(value.get("checksum"), str): + fingerprints[f"profile:{key}"] = value["checksum"] + + extensions = source_lock.get("extensions", []) + if isinstance(extensions, 
list): + for entry in extensions: + if isinstance(entry, dict) and isinstance(entry.get("manifest_checksum"), str): + fingerprints[f"extension:{entry.get('id', 'unknown-extension')}"] = entry[ + "manifest_checksum" + ] + return fingerprints + + +def _evidence_counts( + retention_summary: dict[str, Any], + summary: dict[str, Any], +) -> dict[str, int]: + raw = retention_summary.get("evidence_counts") + if isinstance(raw, dict): + return _int_mapping(raw) + raw_evidence = summary.get("evidence_results") + if isinstance(raw_evidence, dict): + return _int_mapping(raw_evidence) + return {} + + +def _finding_counts( + retention_summary: dict[str, Any], + summary: dict[str, Any], +) -> dict[str, int]: + raw = retention_summary.get("finding_counts") + if isinstance(raw, dict): + return _int_mapping(raw) + keys = ( + "finding_count", + "unexpected_findings", + "expected_findings", + "waived_findings", + "challenged_findings", + "authority_exclusions", + "unresolved_defects", + "unresolved_review_items", + ) + return _int_mapping({key: summary[key] for key in keys if key in summary}) + + +def _int_mapping(raw: dict[str, Any]) -> dict[str, int]: + return { + str(key): int(value) + for key, value in raw.items() + if isinstance(value, int) and not isinstance(value, bool) + } + + +def _read_json(path: Path) -> dict[str, Any]: + with path.open("r", encoding="utf-8") as fh: + payload = json.load(fh) + if not isinstance(payload, dict): + raise ValueError(f"{path} must contain a JSON object") + return payload + + +def _read_json_if_exists(path: Path) -> dict[str, Any] | None: + if not path.exists(): + return None + return _read_json(path) + + +async def _file_chunks(path: Path, chunk_size: int = 64 * 1024) -> AsyncIterator[bytes]: + with path.open("rb") as fh: + while True: + chunk = fh.read(chunk_size) + if not chunk: + break + yield chunk diff --git a/src/artifactstore/registry/__init__.py b/src/artifactstore/registry/__init__.py index 83d8df3..54a5ba3 100644 --- 
a/src/artifactstore/registry/__init__.py +++ b/src/artifactstore/registry/__init__.py @@ -33,6 +33,7 @@ from artifactstore.dataplane.spi import DataPlane, IngestHints from artifactstore.db.schema import ( artifact_files, artifact_packages, + metadata_schemas, retention_classes, retention_state, storage_locations, @@ -70,6 +71,7 @@ __all__ = [ "FileNotFoundError", "FileRecord", "IllegalPackageStateError", + "MetadataSchemaRecord", "PackageNotFoundError", "PackageRecord", "Registry", @@ -100,6 +102,16 @@ class RetentionStateError(ValueError): """Raised when a retention lifecycle operation is invalid.""" +@dataclass(frozen=True, slots=True) +class MetadataSchemaRecord: + """Registered package metadata schema.""" + + id: UUID + slug: str + json_schema: dict[str, Any] + created_at: datetime | None + + @dataclass(frozen=True, slots=True) class PackageRecord: """Materialised package row projected into the registry API.""" @@ -208,9 +220,15 @@ class Registry: retention_class: str, actor: str, metadata: dict[str, Any] | None = None, + metadata_schema_slug: str | None = None, ) -> UUID: """Create a new package; returns its ``UUID``.""" retention_class_row = await self._get_retention_class(retention_class) + package_metadata = metadata or {} + metadata_schema_id = await self._validate_metadata_schema( + metadata_schema_slug, + package_metadata, + ) package_id = uuid.uuid4() payload = cbor2.dumps( { @@ -218,7 +236,8 @@ class Registry: "producer": producer, "subject": subject, "retention_class": retention_class, - "metadata": metadata or {}, + "metadata": package_metadata, + "metadata_schema_id": str(metadata_schema_id) if metadata_schema_id else None, }, canonical=True, ) @@ -513,6 +532,48 @@ class Registry: for r in rows ] + async def register_metadata_schema( + self, + *, + slug: str, + json_schema: dict[str, Any], + ) -> UUID: + """Register a package metadata JSON Schema, idempotent by slug.""" + schema_id = uuid.uuid4() + async with self._engine.begin() as conn: + existing 
= ( + await conn.execute( + select(metadata_schemas.c.id).where(metadata_schemas.c.slug == slug) + ) + ).first() + if existing is not None: + return UUID(str(existing.id)) + await conn.execute( + metadata_schemas.insert().values( + id=schema_id, + slug=slug, + json_schema=json_schema, + ) + ) + return schema_id + + async def get_metadata_schema(self, slug: str) -> MetadataSchemaRecord: + """Return one registered metadata schema by slug.""" + async with self._engine.connect() as conn: + row = ( + await conn.execute( + select(metadata_schemas).where(metadata_schemas.c.slug == slug) + ) + ).first() + if row is None: + raise KeyError(f"metadata schema not found: {slug}") + return MetadataSchemaRecord( + id=row.id, + slug=row.slug, + json_schema=dict(row.json_schema), + created_at=row.created_at, + ) + async def get_retention_state(self, package_id: UUID) -> RetentionStateRecord: """Return the retention materialised view for one package.""" async with self._engine.connect() as conn: @@ -902,6 +963,25 @@ class Registry: deletion_strategy=row.deletion_strategy, ) + async def _validate_metadata_schema( + self, + slug: str | None, + metadata: dict[str, Any], + ) -> UUID | None: + if slug is None: + return None + try: + schema = await self.get_metadata_schema(slug) + except KeyError as exc: + raise ValueError(str(exc)) from exc + required = schema.json_schema.get("required", []) + if not isinstance(required, list): + raise ValueError(f"metadata schema {slug!r} has invalid required list") + missing = [key for key in required if isinstance(key, str) and key not in metadata] + if missing: + raise ValueError(f"metadata missing required schema keys: {', '.join(missing)}") + return schema.id + def _iso(value: datetime | None) -> str | None: if value is None: diff --git a/tests/fixtures/guide-board/logs/log-review-summary.json b/tests/fixtures/guide-board/logs/log-review-summary.json new file mode 100644 index 0000000..aa4d31b --- /dev/null +++ 
b/tests/fixtures/guide-board/logs/log-review-summary.json @@ -0,0 +1,8 @@ +{ + "reviewed_logs": [ + "raw/session/transcript.txt" + ], + "warnings": [ + "Repository returned one optional capability warning." + ] +} diff --git a/tests/fixtures/guide-board/raw/session/browser-response.json b/tests/fixtures/guide-board/raw/session/browser-response.json new file mode 100644 index 0000000..4afc753 --- /dev/null +++ b/tests/fixtures/guide-board/raw/session/browser-response.json @@ -0,0 +1,6 @@ +{ + "repositoryId": "fixture-repo", + "capabilities": { + "capabilityQuery": "metadataonly" + } +} diff --git a/tests/fixtures/guide-board/raw/session/transcript.txt b/tests/fixtures/guide-board/raw/session/transcript.txt new file mode 100644 index 0000000..0c57d49 --- /dev/null +++ b/tests/fixtures/guide-board/raw/session/transcript.txt @@ -0,0 +1,3 @@ +GET /cmis/browser +200 OK +Repository info collected for fixture. diff --git a/tests/fixtures/guide-board/reports/assessment-package.json b/tests/fixtures/guide-board/reports/assessment-package.json new file mode 100644 index 0000000..6a68d9e --- /dev/null +++ b/tests/fixtures/guide-board/reports/assessment-package.json @@ -0,0 +1,12 @@ +{ + "package_version": 1, + "files": [ + { "path": "run.json", "kind": "run-metadata" }, + { "path": "retention-summary.json", "kind": "retention-summary" }, + { "path": "reports/report.md", "kind": "report" }, + { "path": "scorecards/cmis-scorecard.json", "kind": "scorecard" }, + { "path": "logs/log-review-summary.json", "kind": "log-review" }, + { "path": "raw/session/transcript.txt", "kind": "raw-artifact" }, + { "path": "raw/session/browser-response.json", "kind": "raw-artifact" } + ] +} diff --git a/tests/fixtures/guide-board/reports/report.md b/tests/fixtures/guide-board/reports/report.md new file mode 100644 index 0000000..9ef0b47 --- /dev/null +++ b/tests/fixtures/guide-board/reports/report.md @@ -0,0 +1,3 @@ +# Guide-board CMIS Assessment + +Fixture run `gb-fixture-001` completed with one 
warning and no failed checks. diff --git a/tests/fixtures/guide-board/retention-summary.json b/tests/fixtures/guide-board/retention-summary.json new file mode 100644 index 0000000..c6a0201 --- /dev/null +++ b/tests/fixtures/guide-board/retention-summary.json @@ -0,0 +1,17 @@ +{ + "retention_class": "release-evidence", + "report_paths": [ + "reports/report.md", + "scorecards/cmis-scorecard.json", + "logs/log-review-summary.json" + ], + "evidence_counts": { + "raw_artifacts": 2, + "reports": 3 + }, + "finding_counts": { + "pass": 17, + "warning": 1, + "fail": 0 + } +} diff --git a/tests/fixtures/guide-board/run.json b/tests/fixtures/guide-board/run.json new file mode 100644 index 0000000..eee7951 --- /dev/null +++ b/tests/fixtures/guide-board/run.json @@ -0,0 +1,10 @@ +{ + "run_id": "gb-fixture-001", + "target_profile_ref": "open-cmis-tck:browser-binding", + "assessment_profile_ref": "guide-board:cmis-assessment:v1", + "result_status": "passed-with-findings", + "source_commits": { + "guide-board": "1234567890abcdef", + "open-cmis-tck": "abcdef1234567890" + } +} diff --git a/tests/fixtures/guide-board/scorecards/cmis-scorecard.json b/tests/fixtures/guide-board/scorecards/cmis-scorecard.json new file mode 100644 index 0000000..416621c --- /dev/null +++ b/tests/fixtures/guide-board/scorecards/cmis-scorecard.json @@ -0,0 +1,7 @@ +{ + "scorecard": "cmis-browser-binding", + "checks": 18, + "passed": 17, + "warnings": 1, + "failed": 0 +} diff --git a/tests/integration/test_guide_board_pilot.py b/tests/integration/test_guide_board_pilot.py new file mode 100644 index 0000000..004032a --- /dev/null +++ b/tests/integration/test_guide_board_pilot.py @@ -0,0 +1,110 @@ +"""Guide-board pilot ingestion tests (ARTIFACT-STORE-WP-0005).""" + +from __future__ import annotations + +import json +from collections.abc import AsyncIterator +from pathlib import Path +from uuid import UUID + +import pytest +import pytest_asyncio +from sqlalchemy import create_engine, insert +from 
sqlalchemy.ext.asyncio import create_async_engine +from typer.testing import CliRunner + +from artifactstore.cli import app as cli_app +from artifactstore.dataplane import InProcessDataPlane +from artifactstore.db.schema import metadata, retention_classes +from artifactstore.db.seed import RETENTION_CLASS_SEEDS +from artifactstore.events import RegistryViewWriter +from artifactstore.manifest import decode as manifest_decode +from artifactstore.pilots.guide_board import GUIDE_BOARD_SCHEMA_SLUG, ingest_run +from artifactstore.registry import Registry +from artifactstore.storage import LocalBackend + +REPO_ROOT = Path(__file__).resolve().parents[2] +FIXTURE = REPO_ROOT / "tests" / "fixtures" / "guide-board" +SCHEMA = REPO_ROOT / "schemas" / "guide-board.run.v1.json" + + +@pytest_asyncio.fixture +async def registry(tmp_path: Path) -> AsyncIterator[Registry]: + db_path = tmp_path / "guide-board.db" + engine = create_async_engine(f"sqlite+aiosqlite:///{db_path}") + async with engine.begin() as conn: + await conn.run_sync(metadata.create_all) + for seed in RETENTION_CLASS_SEEDS: + await conn.execute(insert(retention_classes).values(**seed)) + backend = LocalBackend(tmp_path / "storage", backend_id="local") + reg = Registry(engine, InProcessDataPlane(backend), RegistryViewWriter()) + try: + yield reg + finally: + await reg.dispose() + + +async def _consume(stream: AsyncIterator[bytes]) -> bytes: + out = bytearray() + async for chunk in stream: + out.extend(chunk) + return bytes(out) + + +async def test_guide_board_library_ingest_is_idempotent_and_downloadable( + registry: Registry, +) -> None: + schema = json.loads(SCHEMA.read_text(encoding="utf-8")) + await registry.register_metadata_schema(slug=GUIDE_BOARD_SCHEMA_SLUG, json_schema=schema) + + first = await ingest_run(FIXTURE, registry=registry) + second = await ingest_run(FIXTURE, registry=registry) + + assert first.package_id + assert first.manifest_digest.startswith("blake3:") + assert first.manifest_digest == 
second.manifest_digest + assert second.reused_existing is True + + manifest = manifest_decode( + await registry.get_manifest_bytes(UUID(first.package_id), format="cbor") + ) + assert manifest.package.producer == "guide-board" + assert manifest.package.metadata_schema_id is not None + assert manifest.retention_summary.retention_class == "release-evidence" + assert len(manifest.files) == 8 + + for file_entry in manifest.files: + stream = await registry.get_file(UUID(file_entry.id)) + assert await _consume(stream) == (FIXTURE / file_entry.relative_path).read_bytes() + + state = await registry.get_retention_state(UUID(first.package_id)) + assert state.effective_class == "release-evidence" + + +def test_guide_board_cli_ingest_outputs_package_and_digest( + tmp_path: Path, + monkeypatch: pytest.MonkeyPatch, +) -> None: + db_path = tmp_path / "guide-board-cli.db" + storage_root = tmp_path / "storage" + storage_root.mkdir() + sync_engine = create_engine(f"sqlite:///{db_path}", future=True) + metadata.create_all(sync_engine) + with sync_engine.begin() as conn: + conn.execute(insert(retention_classes), [dict(s) for s in RETENTION_CLASS_SEEDS]) + sync_engine.dispose() + + monkeypatch.setenv("ARTIFACTSTORE_DATABASE_URL", f"sqlite+aiosqlite:///{db_path}") + monkeypatch.setenv("ARTIFACTSTORE_STORAGE_LOCAL_ROOT", str(storage_root)) + + result = CliRunner().invoke( + cli_app, + ["guide-board", "ingest", str(FIXTURE), "--schema", str(SCHEMA)], + ) + + assert result.exit_code == 0, result.output + payload = json.loads(result.output) + assert payload["package_id"] + assert payload["manifest_digest"].startswith("blake3:") + assert payload["file_count"] == 8 + assert payload["reused_existing"] is False diff --git a/workplans/ARTIFACT-STORE-WP-0005-guide-board-pilot.md b/workplans/ARTIFACT-STORE-WP-0005-guide-board-pilot.md index 64c3226..45beee5 100644 --- a/workplans/ARTIFACT-STORE-WP-0005-guide-board-pilot.md +++ b/workplans/ARTIFACT-STORE-WP-0005-guide-board-pilot.md @@ -4,13 +4,13 
@@ type: workplan title: "Guide-Board Pilot Ingestion" repo: artifact-store domain: stack -status: planned +status: active owner: codex topic_slug: stack planning_priority: high planning_order: 5 created: "2026-05-15" -updated: "2026-05-15" +updated: "2026-05-16" state_hub_workstream_id: "701c4d8c-5cf4-4a4a-ab60-1dcae53fe771" --- @@ -41,9 +41,9 @@ bytes itself. This is the pilot success criterion in INTENT.md. ```task id: ARTIFACT-STORE-WP-0005-T001 -status: cancelled +status: done priority: high -state_hub_task_id: "eb822821-353c-4cd2-95bf-acb2f084b7ea" +state_hub_task_id: "830f6822-1cfe-4955-a4e0-5b9a42fb5db1" ``` Acceptance: @@ -61,7 +61,7 @@ Acceptance: ```task id: ARTIFACT-STORE-WP-0005-T002 -status: todo +status: done priority: high state_hub_task_id: "ff0ba2eb-b8d3-418a-8685-a54457cea2ed" ``` @@ -82,7 +82,7 @@ Acceptance: ```task id: ARTIFACT-STORE-WP-0005-T003 -status: todo +status: done priority: high state_hub_task_id: "5c367257-2d2a-4de9-9a06-663ba2c60d77" ``` @@ -106,7 +106,7 @@ Acceptance: ```task id: ARTIFACT-STORE-WP-0005-T004 -status: todo +status: done priority: medium state_hub_task_id: "b1ca7133-ad27-4091-93a0-a4e1b7450791" ``` @@ -124,7 +124,7 @@ Acceptance: ```task id: ARTIFACT-STORE-WP-0005-T005 -status: todo +status: blocked priority: medium state_hub_task_id: "bffa3573-4a1f-4c12-8c73-6d55bd8f6297" ``` @@ -139,6 +139,17 @@ Acceptance: - Procedure runs end-to-end on a developer workstation under 5 minutes. +Blocked note: the artifact-store ingest path was verified against an +existing non-fixture OpenCMIS guide-board run at +`/home/worsch/open-cmis-tck/.local/runs/opencmis-inmemory-pilot` using +an isolated SQLite DB and local storage root. It ingested 23 files, +replayed the event log through sequence 26, and verified 23 storage +locations with zero failures. 
A fresh guide-board/OpenCMIS producer run +from `~/guide-board` currently stops before artifact-store handoff with +`cmis-summary: report fragment not found: reports/cmis-summary.md`, +which needs to be fixed in the producer/extension before the documented +fresh-run procedure can be marked complete. + ## Success criteria - A real guide-board CMIS run is ingested with one CLI invocation.
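
As a rough illustration of the "one CLI invocation" criterion, the JSON that the CLI test in this diff reads from `guide-board ingest` could be checked by a small helper like the one below. This is a minimal sketch, not part of the change: `check_ingest_payload` is a hypothetical name, the sample id and digest values are made up, and only the four field names (`package_id`, `manifest_digest`, `file_count`, `reused_existing`) and the `blake3:` prefix are taken from the test assertions.

```python
import json
from typing import Any


def check_ingest_payload(output: str, expected_file_count: int) -> dict[str, Any]:
    """Parse the ingest command's JSON output and check the fields the tests assert on.

    Hypothetical helper for illustration; it mirrors the assertions in
    test_guide_board_cli_ingest_outputs_package_and_digest.
    """
    payload = json.loads(output)
    if not payload.get("package_id"):
        raise ValueError("payload has no package_id")
    digest = payload.get("manifest_digest")
    if not isinstance(digest, str) or not digest.startswith("blake3:"):
        raise ValueError(f"unexpected manifest digest: {digest!r}")
    if payload.get("file_count") != expected_file_count:
        raise ValueError(
            f"expected {expected_file_count} files, got {payload.get('file_count')!r}"
        )
    if not isinstance(payload.get("reused_existing"), bool):
        raise ValueError("reused_existing must be a boolean")
    return payload


# Made-up sample output: the id and digest are placeholders, not real values.
sample = json.dumps(
    {
        "package_id": "00000000-0000-0000-0000-000000000000",
        "manifest_digest": "blake3:0123abcd",
        "file_count": 8,
        "reused_existing": False,
    }
)
result = check_ingest_payload(sample, expected_file_count=8)
```

In practice the `output` string would come from running the command the test invokes, `artifactstore guide-board ingest <run-dir> --schema <schema-path>`, with `ARTIFACTSTORE_DATABASE_URL` and `ARTIFACTSTORE_STORAGE_LOCAL_ROOT` set as in the test. A second run over the same directory would be expected to report `reused_existing` as true, as the library-level idempotency test suggests.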