Add guide-board pilot ingestion

2026-05-17 00:09:11 +02:00
parent 1f379ba321
commit 91bb08c8e5
22 changed files with 1074 additions and 12 deletions

@@ -3,8 +3,9 @@
Status: v0.1 (WP-0003 baseline)
Updated: 2026-05-16
This guide is the user manual for running `artifact-store` v0.1 the
library, CLI, HTTP ingestion API, manifest surface, and retention lifecycle.
This guide is the user manual for running `artifact-store` v0.1: the library,
CLI, HTTP ingestion API, manifest surface, retention lifecycle, storage checks,
and the guide-board pilot path.
For architectural background see
[ARCHITECTURE-BLUEPRINT.md](ARCHITECTURE-BLUEPRINT.md), the ADRs under
@@ -52,6 +53,7 @@ All settings are prefixed with ``ARTIFACTSTORE_`` and read by
| `ARTIFACTSTORE_ANON_READ` | `false` | Set `true` only for local demos where read endpoints may be anonymous. |
| `ARTIFACTSTORE_API_URL` | `http://127.0.0.1:8000` | Default API base URL used by HTTP-backed CLI commands. |
| `ARTIFACTSTORE_API_TOKEN` | empty | Default bearer token used by HTTP-backed CLI commands. |
| `ARTIFACTSTORE_GUIDE_BOARD_SCHEMA` | `schemas/guide-board.run.v1.json` | Schema path used by guide-board pilot bootstrap helpers. |
| `ARTIFACTSTORE_RETENTION_CONFIG_PATH` | empty | Optional TOML file overriding retention-class default durations. |
| `ARTIFACTSTORE_RETENTION_SWEEP_INTERVAL_SECONDS` | `3600` | Default interval for external schedulers that invoke the retention sweeper. |
| `ARTIFACTSTORE_STORAGE_BACKENDS` | `local` | Comma-separated backend IDs to configure (`local`, `s3`). |
@@ -67,6 +69,9 @@ All settings are prefixed with ``ARTIFACTSTORE_`` and read by
| `ARTIFACTSTORE_S3_SSE` | empty | Optional server-side encryption value, e.g. `AES256`. |
| `ARTIFACTSTORE_S3_MULTIPART_THRESHOLD_BYTES` | `67108864` | Multipart threshold for the S3 backend. |
| `ARTIFACTSTORE_S3_MULTIPART_CHUNK_BYTES` | `8388608` | Multipart part size for the S3 backend. |
| `STATE_HUB_URL` | `http://127.0.0.1:8000` | State Hub base URL used by guide-board linkage helpers. |
| `STATE_HUB_WORKSTREAM_ID` | empty | Optional workstream id for State Hub linkage events. |
| `STATE_HUB_TASK_ID` | empty | Optional task id for State Hub linkage events. |
See [`.env.example`](../.env.example) for the canonical template.
@@ -201,6 +206,7 @@ digest, emits `v1.storage.location_verified`, and marks failed locations as
| `artifactstore manifest <package_id>` | Fetch the JSON manifest projection through the HTTP API. |
| `artifactstore retention sweep` | Run one deletion-eligibility sweep against the configured DB. |
| `artifactstore storage verify --backend <id>` | Re-read stored objects for a backend and record verification events. |
| `artifactstore guide-board ingest <run-dir>` | Ingest one guide-board run directory as an artifact package. |
The CLI is a thin client over `artifactstore.registry.Registry`
(see [ADR-0005](adr/0005-v1-tech-stack.md)).
@@ -215,6 +221,7 @@ The CLI is a thin client over `artifactstore.registry.Registry`
| `/files...` | File metadata and byte downloads, including single-range reads. |
| `/uploads...` | Upload-session wire shape for whole-body v1 uploads. |
| `/packages/{id}/retention...` | Extend retention, apply/release holds, and read retention history. |
| `POST /metadata-schemas` | Register package metadata schemas by slug. |
| `GET /events` | Long-poll event feed, CBOR by default or JSON with `Accept: application/json`. |
All non-health routes require a bearer token unless
@@ -267,6 +274,14 @@ asyncio.run(main())
Prerequisites: `make migrate-fresh` has been run so the schema and the
retention class seeds exist.
## Guide-board pilot
The guide-board pilot stores a run directory as one artifact package and records
only package identifiers in State Hub. See
[docs/pilots/guide-board.md](pilots/guide-board.md) for schema registration,
the real `~/guide-board` plus `~/open-cmis-tck` smoke procedure, and the exact
`POST /progress/` linkage payload.
## Replay / disaster recovery
Every state-changing operation writes one row to `events` and updates the
@@ -303,6 +318,8 @@ sequence order through the canonical view writer. The result is
and the v1 schema commitments.
- [ROADMAP.md](ROADMAP.md) — workplan sequencing.
- [ASSEMBLY-EXPERIMENT.md](ASSEMBLY-EXPERIMENT.md) — opt-in asm research line.
- [pilots/guide-board.md](pilots/guide-board.md) — guide-board pilot ingestion
and State Hub linkage.
### Architecture Decision Records

docs/pilots/guide-board.md Normal file

@@ -0,0 +1,162 @@
# Guide-Board Pilot
Status: active pilot
Updated: 2026-05-16
This guide wires the first real producer into artifact-store. A guide-board run
directory becomes one artifact package; State Hub records the package identity
and manifest digest, but never stores artifact bytes.
## One-Time Schema Registration
Start artifact-store and register the pilot metadata schema:
```sh
cd /home/worsch/artifact-store
export ARTIFACTSTORE_API_URL=http://127.0.0.1:8000
export ARTIFACTSTORE_API_TOKEN=dev-token
python3 scripts/register-guide-board-schema.py
```
The script posts this payload shape to `POST /metadata-schemas`:
```json
{
"slug": "guide-board.run.v1",
"json_schema": {
"$id": "artifactstore:schemas:guide-board.run.v1"
}
}
```
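As a rough illustration of the request described above, the following sketch wraps a schema document in that payload shape and prepares the `POST /metadata-schemas` call with bearer authentication. It is not the contents of `scripts/register-guide-board-schema.py`; the function names `build_schema_payload` and `build_request` are hypothetical, and the bearer header assumes the token rule stated in the user guide applies to this route.

```python
import json
import urllib.request


def build_schema_payload(schema, slug):
    """Wrap a JSON Schema document in the payload shape shown above."""
    return {"slug": slug, "json_schema": schema}


def build_request(payload, api_url, token):
    """Prepare (but do not send) the POST /metadata-schemas request.

    Assumption: the route accepts a JSON body and a bearer token.
    """
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/metadata-schemas",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def register_schema(schema_path, slug, api_url, token):
    """Load the schema file and send the request; returns the HTTP status."""
    with open(schema_path, encoding="utf-8") as fh:
        schema = json.load(fh)
    request = build_request(build_schema_payload(schema, slug), api_url, token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `register_schema("schemas/guide-board.run.v1.json", "guide-board.run.v1", api_url, token)` with the values of `ARTIFACTSTORE_API_URL` and `ARTIFACTSTORE_API_TOKEN` would perform the registration under these assumptions.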
## Ingest A Run
The local CLI path opens the configured database and storage backend directly:
```sh
artifactstore guide-board ingest /tmp/guide-board-run \
--schema schemas/guide-board.run.v1.json
```
Output is JSON:
```json
{
"package_id": "00000000-0000-0000-0000-000000000000",
"manifest_digest": "blake3:...",
"file_count": 8,
"reused_existing": false
}
```
The helper is idempotent by guide-board `run_id`. Re-ingesting the same
finalized run returns the existing package id and manifest digest with
`reused_existing: true`.
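The idempotency rule can be pictured with a toy model. The sketch below is an in-memory stand-in, not the real helper (which works against the configured database and storage backend); the class `InMemoryIngestor` and its method signature are hypothetical and only mirror the output fields shown above.

```python
from dataclasses import dataclass, field, replace


@dataclass
class IngestResult:
    package_id: str
    manifest_digest: str
    file_count: int
    reused_existing: bool


@dataclass
class InMemoryIngestor:
    """Toy stand-in: remembers one package per guide-board run_id."""

    _by_run_id: dict = field(default_factory=dict)

    def ingest(self, run_id, package_id, manifest_digest, file_count):
        existing = self._by_run_id.get(run_id)
        if existing is not None:
            # Same run_id seen before: return the stored identifiers,
            # flagged as reused, and ignore the newly offered values.
            return replace(existing, reused_existing=True)
        result = IngestResult(package_id, manifest_digest, file_count, False)
        self._by_run_id[run_id] = result
        return result
```

Ingesting the same `run_id` twice yields the first package id and digest both times; only the `reused_existing` flag differs.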
## State Hub Linkage
After ingest, record a progress event with structured `detail`. This is the
canonical linkage shape:
```sh
curl -s -X POST "$STATE_HUB_URL/progress/" \
-H "Content-Type: application/json" \
-d '{
"event_type": "artifact_link",
"author": "artifact-store",
"workstream_id": "701c4d8c-5cf4-4a4a-ab60-1dcae53fe771",
"task_id": "bffa3573-4a1f-4c12-8c73-6d55bd8f6297",
"summary": "guide-board run <run_id> artifacts stored in artifact-store package <package_id>",
"detail": {
"producer": "guide-board",
"artifact_store_api_url": "http://127.0.0.1:8000",
"run_dir": "/tmp/guide-board-run",
"run_id": "<run_id>",
"target_profile_ref": "<target>",
"assessment_profile_ref": "<assessment>",
"result_status": "<status>",
"package_id": "<package_id>",
"manifest_digest": "<manifest_digest>",
"file_count": 8,
"retention_class": "release-evidence"
}
}'
```
Use the checked-in helper to build the same event from environment variables:
```sh
export STATE_HUB_URL=http://127.0.0.1:8000
export STATE_HUB_WORKSTREAM_ID=701c4d8c-5cf4-4a4a-ab60-1dcae53fe771
export STATE_HUB_TASK_ID=bffa3573-4a1f-4c12-8c73-6d55bd8f6297
export GUIDE_BOARD_RUN_DIR=/tmp/guide-board-run
export ARTIFACTSTORE_INGEST_RESULT_PATH=/tmp/artifactstore-guide-board-ingest.json
python3 scripts/link-guide-board-package.py
```
The helper posts only identifiers, summary metadata, and links. Artifact bytes
remain in artifact-store storage backends.
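A minimal sketch of how such an event could be assembled from the ingest output, assuming the field names of the canonical payload above. The function `build_link_event` and the `run` dictionary are hypothetical and do not describe the internals of `scripts/link-guide-board-package.py`. Omitting `workstream_id` and `task_id` when they are empty is also an assumption, based only on those settings being documented as optional.

```python
def build_link_event(ingest, run, run_dir, api_url, workstream_id="", task_id=""):
    """Build a progress event that carries identifiers only, never artifact bytes.

    ingest: parsed JSON output of `artifactstore guide-board ingest`.
    run:    hypothetical dict with run_id, target_profile_ref,
            assessment_profile_ref, and result_status.
    """
    event = {
        "event_type": "artifact_link",
        "author": "artifact-store",
        "summary": (
            f"guide-board run {run['run_id']} artifacts stored in "
            f"artifact-store package {ingest['package_id']}"
        ),
        "detail": {
            "producer": "guide-board",
            "artifact_store_api_url": api_url,
            "run_dir": run_dir,
            "run_id": run["run_id"],
            "target_profile_ref": run["target_profile_ref"],
            "assessment_profile_ref": run["assessment_profile_ref"],
            "result_status": run["result_status"],
            "package_id": ingest["package_id"],
            "manifest_digest": ingest["manifest_digest"],
            "file_count": ingest["file_count"],
            # Retention class copied from the example payload above.
            "retention_class": "release-evidence",
        },
    }
    # Assumption: optional ids are left out when not configured.
    if workstream_id:
        event["workstream_id"] = workstream_id
    if task_id:
        event["task_id"] = task_id
    return event
```

The resulting dictionary can be serialized with `json.dumps` and sent to `POST /progress/` in the same way as the `curl` example.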
## Real Producer Smoke
This path uses the real guide-board core and the external `open-cmis-tck`
extension. It is expected to complete in under five minutes on a developer
workstation once Python dependencies and local candidate prerequisites are in
place.
1. Produce a guide-board run:
```sh
cd /home/worsch/guide-board
mkdir -p /tmp/guide-board-artifact-store-smoke
PYTHONPATH=src python3 -m guide_board \
--extension-dir ../open-cmis-tck \
run \
--target ../open-cmis-tck/profiles/targets/kontextual-cmis-compat.json \
--assessment ../open-cmis-tck/profiles/assessments/cmis-browser-baseline.json \
--output-dir /tmp/guide-board-artifact-store-smoke/open-cmis-tck-baseline
```
2. Start artifact-store:
```sh
cd /home/worsch/artifact-store
cp .env.example .env
make migrate-fresh
make dev
```
3. Register the schema and ingest the run:
```sh
export ARTIFACTSTORE_API_TOKEN=dev-token
python3 scripts/register-guide-board-schema.py
artifactstore guide-board ingest \
/tmp/guide-board-artifact-store-smoke/open-cmis-tck-baseline \
--schema schemas/guide-board.run.v1.json \
> /tmp/artifactstore-guide-board-ingest.json
cat /tmp/artifactstore-guide-board-ingest.json
```
4. Verify the manifest:
```sh
PACKAGE_ID=$(python3 -c 'import json; print(json.load(open("/tmp/artifactstore-guide-board-ingest.json"))["package_id"])')
artifactstore manifest "$PACKAGE_ID"
```
5. Record State Hub linkage:
```sh
export STATE_HUB_URL=http://127.0.0.1:8000
export STATE_HUB_WORKSTREAM_ID=701c4d8c-5cf4-4a4a-ab60-1dcae53fe771
export STATE_HUB_TASK_ID=bffa3573-4a1f-4c12-8c73-6d55bd8f6297
export GUIDE_BOARD_RUN_DIR=/tmp/guide-board-artifact-store-smoke/open-cmis-tck-baseline
export ARTIFACTSTORE_INGEST_RESULT_PATH=/tmp/artifactstore-guide-board-ingest.json
python3 scripts/link-guide-board-package.py
```
To smoke-test the storage swap after enabling the WP-0004 S3 settings, keep the
same guide-board ingest command and set
`ARTIFACTSTORE_STORAGE_BACKEND_ROUTES='guide-board:release-evidence=s3,*:*=local'`
before starting artifact-store.
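One plausible reading of that route string is sketched below. The semantics are an assumption inferred from the example value alone: comma-separated `producer:retention_class=backend` rules, `*` as a wildcard, first match wins. The functions `parse_routes` and `resolve_backend` are hypothetical and are not artifact-store's actual resolver.

```python
def parse_routes(spec):
    """Split a route spec into (producer, retention_class, backend) tuples.

    Assumed grammar: "producer:retention_class=backend", comma-separated.
    """
    routes = []
    for rule in spec.split(","):
        rule = rule.strip()
        if not rule:
            continue
        selector, backend = rule.split("=", 1)
        producer, retention_class = selector.split(":", 1)
        routes.append((producer.strip(), retention_class.strip(), backend.strip()))
    return routes


def resolve_backend(routes, producer, retention_class):
    """Return the backend of the first matching rule, or None if none match.

    Assumption: "*" matches any value and rules are tried in order.
    """
    for rule_producer, rule_class, backend in routes:
        if rule_producer in ("*", producer) and rule_class in ("*", retention_class):
            return backend
    return None
```

Under this reading, guide-board packages in the `release-evidence` retention class would go to `s3`, and everything else would fall through to `local`.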