Add activity-core cluster verifier
Some checks failed
railiance-tests / smoke (push) Has been cancelled
@@ -0,0 +1,110 @@
---
id: RAILIANCE-WP-0012
type: workplan
title: "activity-core cluster-owned deploy/verify"
domain: railiance
repo: railiance-cluster
status: finished
owner: codex
topic_slug: railiance
created: "2026-06-15"
updated: "2026-06-16"
state_hub_workstream_id: "6434f7cb-e13c-4c05-839b-197bb239d5cd"
---

# activity-core cluster-owned deploy/verify

## Context

activity-core `ACTIVITY-WP-0007-T06` needs live Railiance cluster evidence for
the disabled ops inventory probe. That live verification should be owned by the
cluster/operator layer, not by arbitrary activity-core sessions with local
`kubectl` assumptions.

This workplan creates a cluster-owned path that keeps credentials in
operator-owned locations while returning only non-secret evidence to State Hub.

## Implement cluster-owned verifier

```task
id: RAILIANCE-WP-0012-T01
status: done
priority: high
state_hub_task_id: "3769fdfb-b4f1-431b-a55a-672d93b3ea55"
```

Add a repeatable command that:

- reconciles the activity-core Railiance runtime bundle;
- reruns `actcore-sync`;
- checks the `ops-service-inventory-probes` ActivityDefinition exists and is
  still disabled;
- triggers the disabled definition manually through the in-cluster API path;
- verifies a fresh `ops_inventory_probe` progress event exists in State Hub;
- posts a non-secret State Hub evidence note for activity-core to cite.

Implemented as `tools/cmd/railiance-verify-activity-core` with Makefile target
`verify-activity-core`. The script defaults to the `railiance01` SSH executor;
use `ACTIVITY_CORE_CLUSTER_HOST=local` only for an explicitly selected local
`kubectl` context.
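
The gate sequence above can be pictured as an ordered pipeline that stops at the first failing gate while remembering what already completed. The following is a minimal Python sketch of that idea, not the contents of `tools/cmd/railiance-verify-activity-core`; `run_gates`, `VerifyResult`, and the gate names used with it are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a gate returns non-secret evidence fields on success
# and raises an exception on failure.
Gate = Callable[[], Dict[str, str]]


@dataclass
class VerifyResult:
    """Outcome of one verifier run (hypothetical structure)."""
    completed: List[str] = field(default_factory=list)
    evidence: Dict[str, str] = field(default_factory=dict)
    failed_gate: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed_gate is None


def run_gates(gates: List[Tuple[str, Gate]]) -> VerifyResult:
    """Run gates in order and stop at the first failure."""
    result = VerifyResult()
    for name, gate in gates:
        try:
            fields = gate()
        except Exception as exc:
            result.failed_gate = name
            # Keep only the exception class name; the message could
            # contain hostnames, tokens, or other sensitive detail.
            result.error = type(exc).__name__
            break
        result.completed.append(name)
        result.evidence.update(fields)
    return result
```

Keeping `completed` and `evidence` on the result even after a failure is what would let a later step report the failing gate together with the last completed evidence fields.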

## Run live verification and publish evidence

```task
id: RAILIANCE-WP-0012-T02
status: done
priority: high
state_hub_task_id: "6d7f87c3-a533-4de1-84de-9ca65f2e2779"
```

Run `make verify-activity-core` against the Railiance cluster. On success, cite
the State Hub evidence note id in this task and in activity-core
`ACTIVITY-WP-0007-T06`.

If a gate fails, the verifier must still post a non-secret State Hub note with
the failing gate and last completed evidence fields.
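
One way to satisfy the failure requirement is to build the note body from an explicit allowlist, so that only fields known to be non-secret can ever reach State Hub. The sketch below assumes a JSON note body; the field names in `ALLOWED_FIELDS` and the `build_evidence_note` helper are hypothetical, and the real State Hub note schema is not shown in this workplan.

```python
import json
from typing import Dict, Iterable, Optional

# Hypothetical allowlist: only these evidence keys may appear in a note.
ALLOWED_FIELDS = frozenset(
    {"definition", "enabled", "progress_event_id", "image_digest"}
)


def build_evidence_note(
    completed_gates: Iterable[str],
    evidence: Dict[str, str],
    failed_gate: Optional[str] = None,
) -> str:
    """Serialize a note for both the success and the failure case."""
    # Drop anything that is not explicitly allowlisted, rather than
    # trying to recognize secrets after the fact.
    safe = {k: v for k, v in evidence.items() if k in ALLOWED_FIELDS}
    note = {
        "status": "passed" if failed_gate is None else "failed",
        "failed_gate": failed_gate,
        "completed_gates": list(completed_gates),
        "evidence": safe,
    }
    return json.dumps(note, sort_keys=True)
```

An allowlist is chosen here over a denylist because a newly added evidence field then stays out of the note until someone deliberately marks it as safe.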

2026-06-15: Completed against Railiance01 after refreshing the same-tag
`activity-core:railiance01-prod` image from activity-core commit `ab17378`,
importing digest `sha256:cff43c72455b9fc4fc11a0a997b4671a38987bb4583a600245dd961965af0e40`
into k3s containerd, syncing the current runtime bundle to
`/home/tegwick/activity-core/k8s/railiance`, and restarting the activity-core
runtime deployments. The verifier reconciled the runtime bundle, completed
`actcore-sync`, confirmed `ops-service-inventory-probes` exists and remains
disabled, triggered it manually, verified State Hub progress
`4c82360d-33e7-455b-8ab4-33facd4a3f8e`, and posted evidence note
`baeeaeac-aa6d-4406-ae64-e54577f21386`.

An intermediate verifier invocation accidentally targeted the local
CoulombCore `kubectl` context. It created only `actcore-*` runtime resources in
the existing `activity-core` namespace; those resources were removed with the
runtime manifest cleanup, and the pre-existing `llm-connect` deployment remains
running.
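
A guard of the following shape could make an accidental local run like the one above harder to repeat. It is a sketch under stated assumptions, not the script's actual logic: `ACTIVITY_CORE_CLUSTER_HOST` and the `railiance01` default come from this workplan, while `ACTIVITY_CORE_EXPECTED_CONTEXT` and `wrap_command` are hypothetical names, and the caller is assumed to obtain the active context from `kubectl config current-context`.

```python
import shlex
from typing import List, Mapping, Optional, Sequence

DEFAULT_HOST = "railiance01"  # default SSH executor named in the workplan


def wrap_command(
    argv: Sequence[str],
    env: Mapping[str, str],
    current_context: Optional[str] = None,
) -> List[str]:
    """Return the argv to execute, either over SSH or locally."""
    host = env.get("ACTIVITY_CORE_CLUSTER_HOST", DEFAULT_HOST)
    if host != "local":
        # Remote path: run the quoted command line on the SSH executor.
        return ["ssh", host, shlex.join(argv)]

    # Local path: require an explicit, matching context (hypothetical
    # variable) so an ambient kubectl context is never used silently.
    expected = env.get("ACTIVITY_CORE_EXPECTED_CONTEXT")
    if not expected:
        raise RuntimeError("local executor requires an expected context")
    if current_context != expected:
        raise RuntimeError("active kubectl context does not match")
    return list(argv)
```

For example, `wrap_command(["kubectl", "get", "pods"], {})` yields an `ssh railiance01 ...` invocation, while a local run fails unless the expected context is named and matches.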

Operational cleanup note: the successful Railiance01 verifier run used
`ACTIVITY_CORE_RESTART_DEPLOYMENTS=1` after importing the same-tag image. The
script was corrected afterward to restart only `actcore-api`,
`actcore-worker`, and `actcore-event-router`, because
`actcore-state-hub-bridge` uses host networking and a rolling restart leaves a
new bridge pod pending behind the host-bound running pod. A 2026-06-16 cleanup
check showed the bridge rollout had settled on Railiance01: the host-bound
bridge pod was running and the replacement ReplicaSet was scaled to zero, so no
manual live cleanup was needed.
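
The corrected restart behavior can be illustrated by deriving the restart list from a per-deployment host-networking flag instead of restarting everything. This is a hedged sketch: the deployment names are taken from the note above, but the `DEPLOYMENTS` table, the `restart_commands` helper, and the use of the `activity-core` namespace on Railiance01 are assumptions made for the example.

```python
from typing import Dict, List

# Deployment name -> whether it uses host networking (per the note above).
DEPLOYMENTS: Dict[str, bool] = {
    "actcore-api": False,
    "actcore-worker": False,
    "actcore-event-router": False,
    "actcore-state-hub-bridge": True,
}


def restart_commands(
    deployments: Dict[str, bool],
    namespace: str = "activity-core",  # assumed namespace
) -> List[List[str]]:
    """Build rollout-restart commands, skipping host-networked ones.

    A host-networked deployment is skipped because its replacement pod
    cannot schedule while the running pod still holds the host ports.
    """
    return [
        ["kubectl", "-n", namespace, "rollout", "restart", f"deployment/{name}"]
        for name, host_network in deployments.items()
        if not host_network
    ]
```

Calling `restart_commands(DEPLOYMENTS)` produces three commands and none for `actcore-state-hub-bridge`.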

## Handoff closure to activity-core

```task
id: RAILIANCE-WP-0012-T03
status: done
priority: medium
state_hub_task_id: "43f652c6-fcc4-49fa-90cc-4122eb6d5321"
```

After live evidence exists, update activity-core `ACTIVITY-WP-0007-T06` to cite
the Railiance evidence and close it if Inter-Hub submission is active or
explicitly deferred with the clean State Hub fallback result.

2026-06-15: Updated activity-core `ACTIVITY-WP-0007-T06` to cite Railiance
evidence note `baeeaeac-aa6d-4406-ae64-e54577f21386` and close the task with
Inter-Hub submission explicitly deferred while the State Hub fallback evidence
path is verified.