Note the 2026-06-19 live reconciliation on railiance01: llm-connect deployed, worker restarted with LLM_CONNECT_URL, fixture smoke passed. Manual daily triage still blocked on actcore-state-hub-bridge reachability.
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| ACTIVITY-WP-0010 | workplan | Daily Triage LLM Reconciliation And Evidence | custodian | activity-core | blocked | codex | custodian | 2026-06-18 | 2026-06-19 | f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9 |
ACTIVITY-WP-0010 - Daily Triage LLM Reconciliation And Evidence
Context
This workplan implements the in-scope portion of the latest activity-core
suggestion review against INTENT.md and SCOPE.md.
Relevant accepted suggestion:
- State Hub message `6a098e1e-65de-4309-ab4a-446aba2f3587` from `llm-connect` says `LLM-WP-0006` is complete on the llm-connect side. The stable Service URL is `http://llm-connect.activity-core.svc.cluster.local:8080`, timeout remains `300`, the provider Secret reports populated key count, and the in-namespace fixture smoke passed with schema-valid endpoint behavior.
Why this belongs in activity-core:
- `INTENT.md` says activity-core owns the when/what/where loop for scheduled coordination work.
- `SCOPE.md` keeps LLM instruction execution in scope through the llm-connect boundary, while keeping provider credentials and cluster reconciliation out of scope.
- `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` remain open because daily State Hub WSJF triage has not yet produced three clean scheduled runs after the June 7 runtime projection failure.
Suggestions reviewed but not accepted as product/runtime implementation work:
- `coding_retro` activity-core suggestions for Bash tool thrash, schema thrash, and read-before-edit hygiene are agent workflow advice. They are useful for Codex operating style, but they do not change activity-core's Event Bridge product surface and should not become runtime code.
- The earlier local-kubectl / cluster-owned evidence suggestion for `ACTIVITY-WP-0007` has already been handled by moving live evidence ownership to Railiance and closing the workplan from cluster-owned proof.
Latest evidence before this workplan:
- State Hub `daily_triage` progress on 2026-06-18 still shows `LLM_CONNECT_URL is not configured`, which means the live activity-core runtime has not yet consumed the repo-side URL update.
- `k8s/railiance/20-runtime.yaml` now sets the verified llm-connect Service URL and `LLM_CONNECT_TIMEOUT_SECONDS=300`.
Confirm Repo-Side Runtime Contract
id: ACTIVITY-WP-0010-T01
status: done
priority: high
state_hub_task_id: "dd52ce21-23b8-4e46-b3af-cb7bf486e40f"
Update activity-core's Railiance runtime projection so the daily triage worker consumes the verified llm-connect Service URL by default.
Done when:
- `k8s/railiance/20-runtime.yaml` sets `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
- `LLM_CONNECT_TIMEOUT_SECONDS=300` remains configured.
- Wiring tests assert the URL and timeout.
- The Railiance README states that provider credentials remain operator-owned and outside Git / State Hub.
2026-06-18: Completed. Updated the runtime ConfigMap, README, and
tests/test_railiance_ops_inventory_wiring.py. Focused tests passed:
tests/test_railiance_ops_inventory_wiring.py tests/test_llm_client.py
reported 9 passed.
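The wiring assertion described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `tests/test_railiance_ops_inventory_wiring.py`: the embedded manifest excerpt and the `configmap_data` helper are assumptions made so the example is self-contained, and the real `k8s/railiance/20-runtime.yaml` may be laid out differently.

```python
import re

# Hypothetical excerpt; the real k8s/railiance/20-runtime.yaml may differ.
RUNTIME_YAML = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: actcore-runtime-config
  namespace: activity-core
data:
  LLM_CONNECT_URL: "http://llm-connect.activity-core.svc.cluster.local:8080"
  LLM_CONNECT_TIMEOUT_SECONDS: "300"
"""


def configmap_data(text: str) -> dict[str, str]:
    """Collect the scalar entries listed under the top-level `data:` key."""
    data: dict[str, str] = {}
    in_data = False
    for line in text.splitlines():
        if not line.startswith(" "):
            # Any unindented line opens or closes the data block.
            in_data = line.strip() == "data:"
            continue
        if in_data:
            match = re.match(r'^\s+([A-Z0-9_]+):\s*"?([^"]*)"?\s*$', line)
            if match:
                data[match.group(1)] = match.group(2)
    return data


def test_runtime_config_points_at_llm_connect() -> None:
    data = configmap_data(RUNTIME_YAML)
    assert data["LLM_CONNECT_URL"] == (
        "http://llm-connect.activity-core.svc.cluster.local:8080"
    )
    assert data["LLM_CONNECT_TIMEOUT_SECONDS"] == "300"
```

A real test would more likely load the manifest from disk with a YAML parser; the line-based parser here only keeps the sketch free of third-party dependencies.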
Reconcile Live Railiance Runtime
id: ACTIVITY-WP-0010-T02
status: wait
priority: high
state_hub_task_id: "23545ddc-926b-485a-8535-5cc11e01134a"
Apply or reconcile the updated activity-core Railiance runtime through the cluster-owned deployment path, not through ad hoc local kubectl from this repo.
Done when non-secret evidence shows:
- live `actcore-runtime-config` has the verified `LLM_CONNECT_URL` and timeout;
- the activity-core worker has restarted or otherwise consumed the new config;
- `activity-core/llm-connect-provider-secrets` remains present with a populated key count only, without printing or storing secret values;
- the State Hub bridge remains reachable from the activity-core runtime.
Current wait reason: this is Railiance/operator-owned live cluster work. State
Hub handoff message 9a074b7c-4b87-4e3c-a6bf-e1fe5580daa8 asks
railiance-cluster to reconcile the updated config and smoke it.
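One way to produce the "populated key count only" evidence is to reduce a Secret manifest to its name and a count before anything is logged or stored. The sketch below assumes the operator already holds the Secret as a JSON document in the standard Kubernetes shape; the `secret_key_count` helper, the key name, and the placeholder value are hypothetical and are not taken from the cluster.

```python
import json


def secret_key_count(secret_manifest: str) -> dict:
    """Summarise a Secret as non-secret evidence: identity and key count only."""
    secret = json.loads(secret_manifest)
    data = secret.get("data") or {}
    # Count keys that carry a non-empty value; the values are never returned.
    populated = [key for key, value in data.items() if value]
    return {
        "name": secret["metadata"]["name"],
        "namespace": secret["metadata"].get("namespace"),
        "populated_keys": len(populated),
    }


# Hypothetical manifest with a placeholder value, for illustration only.
example = json.dumps(
    {
        "kind": "Secret",
        "metadata": {
            "name": "llm-connect-provider-secrets",
            "namespace": "activity-core",
        },
        "data": {"PROVIDER_API_KEY": "cGxhY2Vob2xkZXI="},
    }
)
evidence = secret_key_count(example)
```

Only `evidence` would be recorded in State Hub or Git; the manifest itself stays with the operator.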
2026-06-19 recheck:
- Deployed `llm-connect` into the `activity-core` namespace on `railiance01` (the cluster that runs `actcore-worker`). `coulombcore` had llm-connect only; the in-cluster Service URL is cluster-local.
- `actcore-runtime-config` already exposed the verified URL and timeout; `deployment/actcore-worker` was restarted and now reports `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
- `llm-connect-provider-secrets` reports `DATA 1`; no Secret values were inspected.
- Worker health probe to llm-connect `/health` returns `{"status": "ok"}`.
- `actcore-state-hub-bridge` remains `0/1` Ready with upstream timeouts, so T02 is not fully closed until the node-local State Hub tunnel is restored.
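The T02 gate can be read as four independent checks over the collected evidence. The following sketch encodes them and feeds in the 2026-06-19 values; the `LiveEvidence` record and `unmet_conditions` function are illustrative names, not part of activity-core.

```python
from dataclasses import dataclass

VERIFIED_URL = "http://llm-connect.activity-core.svc.cluster.local:8080"


@dataclass
class LiveEvidence:
    llm_connect_url: str
    timeout_seconds: int
    worker_restarted: bool
    secret_key_count: int
    bridge_ready: str  # READY column as reported, e.g. "0/1"


def unmet_conditions(evidence: LiveEvidence) -> list[str]:
    """Return the T02 done-when conditions that the evidence does not satisfy."""
    unmet = []
    if evidence.llm_connect_url != VERIFIED_URL or evidence.timeout_seconds != 300:
        unmet.append("runtime config")
    if not evidence.worker_restarted:
        unmet.append("worker restart")
    if evidence.secret_key_count < 1:
        unmet.append("provider secret")
    ready, _, desired = evidence.bridge_ready.partition("/")
    if not desired or desired == "0" or ready != desired:
        unmet.append("state hub bridge")
    return unmet


# Values taken from the 2026-06-19 recheck.
recheck = LiveEvidence(VERIFIED_URL, 300, True, 1, "0/1")
remaining = unmet_conditions(recheck)
```

With the recheck values, only the State Hub bridge condition remains unmet, which matches the reason T02 stays in `wait`.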
Run Daily Triage Fixture Smoke
id: ACTIVITY-WP-0010-T03
status: wait
priority: high
state_hub_task_id: "10e0df77-c230-4a82-b720-23c66bd17c0a"
After T02, run a manual or smoke execution of
daily-statehub-wsjf-triage against the live activity-core runtime.
Done when:
- the run calls llm-connect through the configured Service URL;
- llm-connect returns content accepted as schema-valid daily-triage JSON;
- State Hub receives a `daily_triage` progress item with `output_validated=true`;
- the working-memory daily-triage note exists at the path recorded in State Hub detail;
- `scripts/verify_daily_triage.py` reports the smoke/manual run as present.
2026-06-19 recheck:
- In-namespace llm-connect fixture smoke on `railiance01` passed: `smoke: pass health=ok latency_seconds=1.681 recommendations=1`.
- Manual `POST /activity-definitions/6fca51fa-387a-4fd0-bc4e-d62c29eb859a/trigger` reached llm-connect, but the workflow failed at `persist_instruction_reports` with `state-hub-progress` sink `Connection refused` while `actcore-state-hub-bridge` is unhealthy.
- T03 therefore remains open until State Hub bridge reachability is restored and a run emits non-secret `daily_triage` progress with `output_validated=true`.
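If the fixture smoke result needs to be recorded as structured evidence, the one-line summary can be split into fields. This is a small sketch that assumes the line always has the `smoke: <result>` prefix followed by `key=value` pairs, as in the recheck above; `parse_smoke_line` is a hypothetical helper, not part of the smoke tooling.

```python
def parse_smoke_line(line: str) -> dict[str, str]:
    """Split a smoke summary line into its result and key=value fields."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "smoke:":
        raise ValueError(f"not a smoke summary line: {line!r}")
    fields = {"smoke": tokens[1]}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        fields[key] = value
    return fields


# Summary line reported by the 2026-06-19 fixture smoke.
result = parse_smoke_line(
    "smoke: pass health=ok latency_seconds=1.681 recommendations=1"
)
smoke_ok = result["smoke"] == "pass" and result["health"] == "ok"
```

A passing fixture smoke covers only the llm-connect side; it does not replace the `daily_triage` progress item with `output_validated=true` that T03 requires.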
Collect Three Clean Scheduled Runs
id: ACTIVITY-WP-0010-T04
status: wait
priority: high
state_hub_task_id: "dc6b9482-cf43-4fc5-994b-dcd7dea47db7"
Let the normal 07:20 Europe/Berlin schedule produce three consecutive clean daily triage runs after the live config reconciliation.
Done when:
- three consecutive scheduled runs have Temporal workflow evidence, `activity_runs` rows, State Hub `daily_triage` progress, and working-memory notes;
- none of the three runs are merely manual smoke tests or `execution_failed` diagnostics;
- calibration feedback is recorded in State Hub;
- `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` can move from `wait` to `done`.
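The three-run gate can be sketched as a streak count over run records. The example assumes that "consecutive" means back-to-back calendar days of the daily schedule, that manual runs neither count toward nor break the streak, and that a run is "clean" only when all four kinds of evidence exist. The `Run` record, its field names, and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Run:
    day: date
    trigger: str  # "scheduled" or "manual" (illustrative labels)
    clean: bool   # True only if all required evidence exists for the run


def clean_streak(runs: list[Run]) -> int:
    """Length of the most recent streak of clean scheduled runs on consecutive days."""
    scheduled = sorted(
        (run for run in runs if run.trigger == "scheduled"),
        key=lambda run: run.day,
    )
    streak = 0
    previous_day = None
    for run in scheduled:
        consecutive = (
            previous_day is not None
            and run.day - previous_day == timedelta(days=1)
        )
        if not run.clean:
            streak = 0
        elif streak and consecutive:
            streak += 1
        else:
            streak = 1
        previous_day = run.day
    return streak


# Hypothetical history: the manual smoke on the 21st is ignored.
history = [
    Run(date(2026, 6, 20), "scheduled", True),
    Run(date(2026, 6, 21), "manual", True),
    Run(date(2026, 6, 21), "scheduled", True),
    Run(date(2026, 6, 22), "scheduled", True),
]
gate_passed = clean_streak(history) >= 3
```

A failed or missing scheduled run resets the count, so the gate only passes after three uninterrupted days.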
Close Handoff State
id: ACTIVITY-WP-0010-T05
status: wait
priority: medium
state_hub_task_id: "ecc57e21-1716-4daa-aba6-d8a6d824e4ed"
Update the surrounding workplans and State Hub once the live daily triage gate passes.
Done when:
- `ACTIVITY-WP-0006` records the three-run calibration evidence;
- `ACTIVITY-WP-0009` records the scheduled-run trust gap closure;
- any temporary `needs_human` flags created for the llm-connect provider/config handoff are cleared or replaced by a narrower follow-up;
- this workplan is marked `finished`.