---
id: ACTIVITY-WP-0010
type: workplan
title: "Daily Triage LLM Reconciliation And Evidence"
domain: custodian
repo: activity-core
status: blocked
owner: codex
topic_slug: custodian
created: "2026-06-18"
updated: "2026-06-19"
state_hub_workstream_id: "f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9"
---

# ACTIVITY-WP-0010 - Daily Triage LLM Reconciliation And Evidence

## Context

This workplan implements the in-scope portion of the latest activity-core suggestion review against `INTENT.md` and `SCOPE.md`.

Relevant accepted suggestion:

- State Hub message `6a098e1e-65de-4309-ab4a-446aba2f3587` from `llm-connect` says `LLM-WP-0006` is complete on the llm-connect side. The stable Service URL is `http://llm-connect.activity-core.svc.cluster.local:8080`, timeout remains `300`, the provider Secret reports populated key count, and the in-namespace fixture smoke passed with schema-valid endpoint behavior.

Why this belongs in activity-core:

- `INTENT.md` says activity-core owns the **when/what/where** loop for scheduled coordination work.
- `SCOPE.md` keeps LLM instruction execution in scope through the llm-connect boundary, while keeping provider credentials and cluster reconciliation out of scope.
- `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` remain open because daily State Hub WSJF triage has not yet produced three clean scheduled runs after the June 7 runtime projection failure.

Suggestions reviewed but not accepted as product/runtime implementation work:

- `coding_retro` activity-core suggestions for Bash tool thrash, schema thrash, and read-before-edit hygiene are agent workflow advice. They are useful for Codex operating style, but they do not change activity-core's Event Bridge product surface and should not become runtime code.
- The earlier local-kubectl / cluster-owned evidence suggestion for `ACTIVITY-WP-0007` has already been handled by moving live evidence ownership to Railiance and closing the workplan from cluster-owned proof.

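
The llm-connect contract quoted in the accepted suggestion (one stable Service URL and a 300-second timeout) is small enough to check mechanically. The following is a minimal sketch, assuming the runtime receives both values as plain environment variables; `check_llm_connect_env` and its problem strings are hypothetical helpers for illustration, not existing activity-core code.

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Values quoted from the llm-connect State Hub message in this workplan.
EXPECTED_URL = "http://llm-connect.activity-core.svc.cluster.local:8080"
EXPECTED_TIMEOUT_SECONDS = 300


def check_llm_connect_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the contract holds.

    Hypothetical helper: the variable names come from this workplan, the
    function itself is an illustration only.
    """
    problems: List[str] = []

    url = env.get("LLM_CONNECT_URL", "").strip()
    if not url:
        problems.append("LLM_CONNECT_URL is not configured")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"LLM_CONNECT_URL is not a valid HTTP URL: {url!r}")
        elif url.rstrip("/") != EXPECTED_URL:
            problems.append(f"LLM_CONNECT_URL differs from the verified URL: {url!r}")

    raw_timeout = env.get("LLM_CONNECT_TIMEOUT_SECONDS", "").strip()
    try:
        timeout = int(raw_timeout)
    except ValueError:
        problems.append(f"LLM_CONNECT_TIMEOUT_SECONDS is not an integer: {raw_timeout!r}")
    else:
        if timeout != EXPECTED_TIMEOUT_SECONDS:
            problems.append(f"LLM_CONNECT_TIMEOUT_SECONDS is {timeout}, expected 300")

    return problems
```

Calling it with `os.environ` inside the worker, or with the key/value data of a rendered ConfigMap, would give the same non-secret answer in both places, since neither value is a credential.
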
Latest evidence before this workplan:

- State Hub `daily_triage` progress on 2026-06-18 still shows `LLM_CONNECT_URL is not configured`, which means the live activity-core runtime has not yet consumed the repo-side URL update.
- `k8s/railiance/20-runtime.yaml` now sets the verified llm-connect Service URL and `LLM_CONNECT_TIMEOUT_SECONDS=300`.

## Confirm Repo-Side Runtime Contract

```task
id: ACTIVITY-WP-0010-T01
status: done
priority: high
state_hub_task_id: "dd52ce21-23b8-4e46-b3af-cb7bf486e40f"
```

Update activity-core's Railiance runtime projection so the daily triage worker consumes the verified llm-connect Service URL by default.

Done when:

- `k8s/railiance/20-runtime.yaml` sets `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
- `LLM_CONNECT_TIMEOUT_SECONDS=300` remains configured.
- Wiring tests assert the URL and timeout.
- The Railiance README states that provider credentials remain operator-owned and outside Git / State Hub.

2026-06-18: Completed. Updated the runtime ConfigMap, README, and `tests/test_railiance_ops_inventory_wiring.py`. Focused tests passed: `tests/test_railiance_ops_inventory_wiring.py tests/test_llm_client.py` reported 9 passed.

## Reconcile Live Railiance Runtime

```task
id: ACTIVITY-WP-0010-T02
status: wait
priority: high
state_hub_task_id: "23545ddc-926b-485a-8535-5cc11e01134a"
```

Apply or reconcile the updated activity-core Railiance runtime through the cluster-owned deployment path, not through ad hoc local kubectl from this repo.

Done when non-secret evidence shows:

- live `actcore-runtime-config` has the verified `LLM_CONNECT_URL` and timeout;
- the activity-core worker has restarted or otherwise consumed the new config;
- `activity-core/llm-connect-provider-secrets` remains present with a populated key count only, without printing or storing secret values;
- the State Hub bridge remains reachable from the activity-core runtime.

Current wait reason: this is Railiance/operator-owned live cluster work.
State Hub handoff message `9a074b7c-4b87-4e3c-a6bf-e1fe5580daa8` asks `railiance-cluster` to reconcile the updated config and smoke it.

2026-06-19 recheck:

- Deployed `llm-connect` into the `activity-core` namespace on `railiance01` (the cluster that runs `actcore-worker`). `coulombcore` had llm-connect only; the in-cluster Service URL is cluster-local.
- `actcore-runtime-config` already exposed the verified URL and timeout; `deployment/actcore-worker` was restarted and now reports `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
- `llm-connect-provider-secrets` reports `DATA 1`; no Secret values were inspected.
- Worker health probe to llm-connect `/health` returns `{"status": "ok"}`.
- `actcore-state-hub-bridge` remains `0/1` Ready with upstream timeouts, so T02 is not fully closed until the node-local State Hub tunnel is restored.

## Run Daily Triage Fixture Smoke

```task
id: ACTIVITY-WP-0010-T03
status: wait
priority: high
state_hub_task_id: "10e0df77-c230-4a82-b720-23c66bd17c0a"
```

After T02, run a manual or smoke execution of `daily-statehub-wsjf-triage` against the live activity-core runtime.

Done when:

- the run calls llm-connect through the configured Service URL;
- llm-connect returns content accepted as schema-valid daily-triage JSON;
- State Hub receives a `daily_triage` progress item with `output_validated=true`;
- the working-memory daily-triage note exists at the path recorded in State Hub detail;
- `scripts/verify_daily_triage.py` reports the smoke/manual run as present.

2026-06-19 recheck:

- In-namespace llm-connect fixture smoke on `railiance01` passed: `smoke: pass health=ok latency_seconds=1.681 recommendations=1`.
- Manual `POST /activity-definitions/6fca51fa-387a-4fd0-bc4e-d62c29eb859a/trigger` reached llm-connect, but the workflow failed at `persist_instruction_reports` with `state-hub-progress` sink `Connection refused` while `actcore-state-hub-bridge` is unhealthy.
- T03 therefore remains open until State Hub bridge reachability is restored and a run emits non-secret `daily_triage` progress with `output_validated=true`.

## Collect Three Clean Scheduled Runs

```task
id: ACTIVITY-WP-0010-T04
status: wait
priority: high
state_hub_task_id: "dc6b9482-cf43-4fc5-994b-dcd7dea47db7"
```

Let the normal 07:20 Europe/Berlin schedule produce three consecutive clean daily triage runs after the live config reconciliation.

Done when:

- three consecutive scheduled runs have Temporal workflow evidence, `activity_runs` rows, State Hub `daily_triage` progress, and working-memory notes;
- none of the three runs are merely manual smoke tests or `execution_failed` diagnostics;
- calibration feedback is recorded in State Hub;
- `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` can move from `wait` to `done`.

## Close Handoff State

```task
id: ACTIVITY-WP-0010-T05
status: wait
priority: medium
state_hub_task_id: "ecc57e21-1716-4daa-aba6-d8a6d824e4ed"
```

Update the surrounding workplans and State Hub once the live daily triage gate passes.

Done when:

- `ACTIVITY-WP-0006` records the three-run calibration evidence;
- `ACTIVITY-WP-0009` records the scheduled-run trust gap closure;
- any temporary `needs_human` flags created for the llm-connect provider/config handoff are cleared or replaced by a narrower follow-up;
- this workplan is marked `finished`.
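
The three-run gate in `ACTIVITY-WP-0010-T04` can be expressed as a small streak check. This is a sketch under stated assumptions: `RunEvidence` and its field names are hypothetical, and this is not how `scripts/verify_daily_triage.py` is known to work. It further assumes at most one scheduled run per calendar day, so "consecutive" is read as consecutive days, and it assumes manual runs are ignored rather than treated as breaking a streak.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass(frozen=True)
class RunEvidence:
    """Hypothetical per-run evidence record; all field names are illustrative."""
    run_date: date
    trigger: str                # assumed values: "schedule" or "manual"
    status: str                 # assumed values: "completed", "execution_failed", ...
    output_validated: bool      # mirrors output_validated=true on the progress item
    has_workflow: bool          # Temporal workflow evidence found
    has_activity_run_row: bool  # activity_runs row found
    has_progress_item: bool     # State Hub daily_triage progress found
    has_memory_note: bool       # working-memory note found


def is_clean(run: RunEvidence) -> bool:
    """A run is clean only if it completed and every evidence source is present."""
    return (
        run.status == "completed"
        and run.output_validated
        and run.has_workflow
        and run.has_activity_run_row
        and run.has_progress_item
        and run.has_memory_note
    )


def has_consecutive_clean_runs(runs: Iterable[RunEvidence], required: int = 3) -> bool:
    """True if `required` scheduled runs on consecutive days are all clean."""
    # Manual smoke runs never count toward the gate, so drop them up front.
    scheduled = sorted(
        (r for r in runs if r.trigger == "schedule"), key=lambda r: r.run_date
    )
    streak = 0
    previous_day = None
    for run in scheduled:
        if not is_clean(run):
            streak, previous_day = 0, None  # a failed scheduled run resets the streak
            continue
        if previous_day is not None and run.run_date - previous_day == timedelta(days=1):
            streak += 1
        else:
            streak = 1  # first clean run, or a skipped day broke the sequence
        previous_day = run.run_date
        if streak >= required:
            return True
    return False
```

Filtering manual runs before counting keeps a same-day smoke test from either helping or hurting the gate, which matches the requirement that none of the three runs be merely manual smoke tests or `execution_failed` diagnostics.
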