Note the 2026-06-19 live reconciliation on railiance01: llm-connect deployed, worker restarted with LLM_CONNECT_URL, fixture smoke passed. Manual daily triage still blocked on actcore-state-hub-bridge reachability.
---
id: ACTIVITY-WP-0010
type: workplan
title: "Daily Triage LLM Reconciliation And Evidence"
domain: custodian
repo: activity-core
status: blocked
owner: codex
topic_slug: custodian
created: "2026-06-18"
updated: "2026-06-19"
state_hub_workstream_id: "f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9"
---

# ACTIVITY-WP-0010 - Daily Triage LLM Reconciliation And Evidence

## Context

This workplan implements the in-scope portion of the latest activity-core
suggestion review against `INTENT.md` and `SCOPE.md`.

Relevant accepted suggestion:

- State Hub message `6a098e1e-65de-4309-ab4a-446aba2f3587` from
  `llm-connect` says `LLM-WP-0006` is complete on the llm-connect side. The
  stable Service URL is
  `http://llm-connect.activity-core.svc.cluster.local:8080`, timeout remains
  `300`, the provider Secret reports a populated key count, and the
  in-namespace fixture smoke passed with schema-valid endpoint behavior.

Why this belongs in activity-core:

- `INTENT.md` says activity-core owns the **when/what/where** loop for
  scheduled coordination work.
- `SCOPE.md` keeps LLM instruction execution in scope through the llm-connect
  boundary, while keeping provider credentials and cluster reconciliation out of
  scope.
- `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` remain open because daily
  State Hub WSJF triage has not yet produced three clean scheduled runs after
  the June 7 runtime projection failure.

Suggestions reviewed but not accepted as product/runtime implementation work:

- `coding_retro` activity-core suggestions for Bash tool thrash, schema thrash,
  and read-before-edit hygiene are agent workflow advice. They are useful for
  Codex operating style, but they do not change activity-core's Event Bridge
  product surface and should not become runtime code.
- The earlier local-kubectl / cluster-owned evidence suggestion for
  `ACTIVITY-WP-0007` has already been handled by moving live evidence ownership
  to Railiance and closing the workplan from cluster-owned proof.

Latest evidence before this workplan:

- State Hub `daily_triage` progress on 2026-06-18 still shows
  `LLM_CONNECT_URL is not configured`, which means the live activity-core
  runtime has not yet consumed the repo-side URL update.
- `k8s/railiance/20-runtime.yaml` now sets the verified llm-connect Service URL
  and `LLM_CONNECT_TIMEOUT_SECONDS=300`.

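The `LLM_CONNECT_URL is not configured` diagnostic above suggests the worker resolves its llm-connect settings from environment variables at run time. The following is a minimal sketch of that kind of lookup, not the actual activity-core client: the names `LlmConnectSettings` and `load_llm_connect_settings` are hypothetical, and the fallback of `300` seconds when the timeout variable is absent is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LlmConnectSettings:
    """Hypothetical container for the two settings named in this workplan."""
    url: str
    timeout_seconds: int


def load_llm_connect_settings(
    env: Optional[Mapping[str, str]] = None,
) -> LlmConnectSettings:
    """Read LLM_CONNECT_URL and LLM_CONNECT_TIMEOUT_SECONDS from a mapping.

    Falls back to os.environ when no mapping is passed. The error text mirrors
    the diagnostic quoted from State Hub; raising RuntimeError is an assumption.
    """
    source = os.environ if env is None else env
    url = source.get("LLM_CONNECT_URL", "").strip()
    if not url:
        raise RuntimeError("LLM_CONNECT_URL is not configured")
    # Assumed default: 300 seconds, matching the value this workplan configures.
    timeout = int(source.get("LLM_CONNECT_TIMEOUT_SECONDS", "300"))
    return LlmConnectSettings(url=url.rstrip("/"), timeout_seconds=timeout)
```

Passing an explicit mapping keeps the lookup testable without touching the process environment, which is why the sketch accepts `env` as a parameter.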
## Confirm Repo-Side Runtime Contract

```task
id: ACTIVITY-WP-0010-T01
status: done
priority: high
state_hub_task_id: "dd52ce21-23b8-4e46-b3af-cb7bf486e40f"
```

Update activity-core's Railiance runtime projection so the daily triage worker
consumes the verified llm-connect Service URL by default.

Done when:

- `k8s/railiance/20-runtime.yaml` sets
  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
- `LLM_CONNECT_TIMEOUT_SECONDS=300` remains configured.
- Wiring tests assert the URL and timeout.
- The Railiance README states that provider credentials remain operator-owned
  and outside Git / State Hub.

2026-06-18: Completed. Updated the runtime ConfigMap, README, and
`tests/test_railiance_ops_inventory_wiring.py`. Focused tests passed:
`tests/test_railiance_ops_inventory_wiring.py tests/test_llm_client.py`
reported 9 passed.

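A wiring assertion of the kind T01 describes could look roughly like the sketch below. It is not the content of `tests/test_railiance_ops_inventory_wiring.py`: the helper names are hypothetical, `SAMPLE_MANIFEST` is an assumed ConfigMap layout rather than a copy of `k8s/railiance/20-runtime.yaml`, and the line-based regex stands in for a real YAML parser to keep the example dependency-free.

```python
import re
from typing import Dict, Optional

# Expected values taken from the "Done when" list above.
EXPECTED = {
    "LLM_CONNECT_URL": "http://llm-connect.activity-core.svc.cluster.local:8080",
    "LLM_CONNECT_TIMEOUT_SECONDS": "300",
}

# Assumed manifest shape, for illustration only.
SAMPLE_MANIFEST = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: actcore-runtime-config
data:
  LLM_CONNECT_URL: "http://llm-connect.activity-core.svc.cluster.local:8080"
  LLM_CONNECT_TIMEOUT_SECONDS: "300"
"""

# Matches indented UPPER_SNAKE_CASE keys with an optionally quoted scalar value.
_ENV_LINE = re.compile(
    r'^[ \t]+([A-Z][A-Z0-9_]*):[ \t]*"?([^"\n]*?)"?[ \t]*$', re.MULTILINE
)


def read_env_style_keys(manifest_text: str) -> Dict[str, str]:
    """Collect env-style keys and their values from manifest text."""
    return {key: value for key, value in _ENV_LINE.findall(manifest_text)}


def missing_or_wrong(manifest_text: str) -> Dict[str, Optional[str]]:
    """Return each expected key whose value is absent (None) or different."""
    found = read_env_style_keys(manifest_text)
    return {
        key: found.get(key)
        for key, expected in EXPECTED.items()
        if found.get(key) != expected
    }
```

An empty result from `missing_or_wrong(SAMPLE_MANIFEST)` means both values are wired as expected; a non-empty result names the offending keys and the values found, which makes a failing assertion easy to read.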
## Reconcile Live Railiance Runtime

```task
id: ACTIVITY-WP-0010-T02
status: wait
priority: high
state_hub_task_id: "23545ddc-926b-485a-8535-5cc11e01134a"
```

Apply or reconcile the updated activity-core Railiance runtime through the
cluster-owned deployment path, not through ad hoc local kubectl from this repo.

Done when non-secret evidence shows:

- live `actcore-runtime-config` has the verified `LLM_CONNECT_URL` and timeout;
- the activity-core worker has restarted or otherwise consumed the new config;
- `activity-core/llm-connect-provider-secrets` remains present with a populated
  key count only, without printing or storing secret values;
- the State Hub bridge remains reachable from the activity-core runtime.

Current wait reason: this is Railiance/operator-owned live cluster work. State
Hub handoff message `9a074b7c-4b87-4e3c-a6bf-e1fe5580daa8` asks
`railiance-cluster` to reconcile the updated config and smoke it.

2026-06-19 recheck:

- Deployed `llm-connect` into the `activity-core` namespace on `railiance01`
  (the cluster that runs `actcore-worker`). `coulombcore` had llm-connect only;
  the in-cluster Service URL is cluster-local.
- `actcore-runtime-config` already exposed the verified URL and timeout;
  `deployment/actcore-worker` was restarted and now reports
  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
- `llm-connect-provider-secrets` reports `DATA 1`; no Secret values were
  inspected.
- Worker health probe to llm-connect `/health` returns `{"status": "ok"}`.
- `actcore-state-hub-bridge` remains `0/1` Ready with upstream timeouts, so T02
  is not fully closed until the node-local State Hub tunnel is restored.

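The four T02 criteria can be evaluated mechanically from non-secret evidence. The sketch below is one way to do that, assuming a hypothetical `LiveEvidence` record whose fields are filled in by whoever collects the cluster output; none of these names come from the repo. The Secret is represented only by its key count, never by its values.

```python
from dataclasses import dataclass
from typing import List

VERIFIED_URL = "http://llm-connect.activity-core.svc.cluster.local:8080"
VERIFIED_TIMEOUT = "300"


@dataclass(frozen=True)
class LiveEvidence:
    """Hypothetical non-secret evidence record for T02."""
    config_url: str        # LLM_CONNECT_URL in actcore-runtime-config
    config_timeout: str    # LLM_CONNECT_TIMEOUT_SECONDS in actcore-runtime-config
    worker_url: str        # LLM_CONNECT_URL as reported by the worker
    secret_key_count: int  # the DATA column only, no secret values
    bridge_ready: str      # READY column for the bridge, e.g. "0/1"


def replicas_ready(ready_column: str) -> bool:
    """True when a 'ready/desired' string shows all desired replicas ready."""
    ready, _, desired = ready_column.partition("/")
    if not (ready.isdigit() and desired.isdigit()):
        return False
    return int(desired) > 0 and int(ready) == int(desired)


def open_criteria(evidence: LiveEvidence) -> List[str]:
    """List the T02 criteria that the evidence does not yet satisfy."""
    gaps = []
    if (evidence.config_url != VERIFIED_URL
            or evidence.config_timeout != VERIFIED_TIMEOUT):
        gaps.append("runtime config")
    if evidence.worker_url != VERIFIED_URL:
        gaps.append("worker config")
    if evidence.secret_key_count < 1:
        gaps.append("provider secret key count")
    if not replicas_ready(evidence.bridge_ready):
        gaps.append("state hub bridge")
    return gaps


# Values as recorded in the 2026-06-19 recheck above.
recheck = LiveEvidence(
    config_url=VERIFIED_URL,
    config_timeout="300",
    worker_url=VERIFIED_URL,
    secret_key_count=1,
    bridge_ready="0/1",
)
print(open_criteria(recheck))  # ['state hub bridge']
```

With the recheck values, the only open criterion is the bridge, which matches the reason T02 stays in `wait`.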
## Run Daily Triage Fixture Smoke

```task
id: ACTIVITY-WP-0010-T03
status: wait
priority: high
state_hub_task_id: "10e0df77-c230-4a82-b720-23c66bd17c0a"
```

After T02, run a manual or smoke execution of
`daily-statehub-wsjf-triage` against the live activity-core runtime.

Done when:

- the run calls llm-connect through the configured Service URL;
- llm-connect returns content accepted as schema-valid daily-triage JSON;
- State Hub receives a `daily_triage` progress item with `output_validated=true`;
- the working-memory daily-triage note exists at the path recorded in State Hub
  detail;
- `scripts/verify_daily_triage.py` reports the smoke/manual run as present.

2026-06-19 recheck:

- In-namespace llm-connect fixture smoke on `railiance01` passed:
  `smoke: pass health=ok latency_seconds=1.681 recommendations=1`.
- Manual `POST /activity-definitions/6fca51fa-387a-4fd0-bc4e-d62c29eb859a/trigger`
  reached llm-connect, but the workflow failed at `persist_instruction_reports`
  with `state-hub-progress` sink `Connection refused` while
  `actcore-state-hub-bridge` is unhealthy.
- T03 therefore remains open until State Hub bridge reachability is restored and
  a run emits non-secret `daily_triage` progress with `output_validated=true`.

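The fixture smoke result quoted in the recheck is a single `key=value` line, so it can be recorded as structured evidence rather than free text. This is a minimal sketch that assumes the line always has the shape `smoke: <result> key=value ...`; the function names are hypothetical and this is not what `scripts/verify_daily_triage.py` does.

```python
from typing import Dict


def parse_smoke_line(line: str) -> Dict[str, str]:
    """Split a 'smoke: <result> key=value ...' line into a flat dict.

    The overall result is stored under the key 'result'. Values stay strings,
    so callers decide how to convert fields such as latency_seconds.
    """
    prefix, _, rest = line.partition(":")
    if prefix.strip() != "smoke":
        raise ValueError("not a smoke result line: %r" % line)
    tokens = rest.split()
    if not tokens:
        raise ValueError("smoke line has no result: %r" % line)
    fields = {"result": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % token)
        fields[key] = value
    return fields


def smoke_passed(line: str) -> bool:
    """True when the smoke line reports a pass with a healthy endpoint."""
    fields = parse_smoke_line(line)
    return fields["result"] == "pass" and fields.get("health") == "ok"


line = "smoke: pass health=ok latency_seconds=1.681 recommendations=1"
print(parse_smoke_line(line))
print(smoke_passed(line))  # True
```

A passing fixture smoke only covers the llm-connect leg. It says nothing about the `state-hub-progress` sink, which is why T03 stays open despite this result.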
## Collect Three Clean Scheduled Runs

```task
id: ACTIVITY-WP-0010-T04
status: wait
priority: high
state_hub_task_id: "dc6b9482-cf43-4fc5-994b-dcd7dea47db7"
```

Let the normal 07:20 Europe/Berlin schedule produce three consecutive clean
daily triage runs after the live config reconciliation.

Done when:

- three consecutive scheduled runs have Temporal workflow evidence,
  `activity_runs` rows, State Hub `daily_triage` progress, and working-memory
  notes;
- none of the three runs are merely manual smoke tests or `execution_failed`
  diagnostics;
- calibration feedback is recorded in State Hub;
- `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` can move from `wait` to
  `done`.

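The three-run gate can be expressed as a streak check over run records. The sketch below assumes a hypothetical `RunRecord` shape; the `trigger` and `status` values other than `execution_failed` are illustrative and not taken from the `activity_runs` schema. It also reads "consecutive" as "on consecutive calendar days", which is an interpretation of the daily schedule rather than something this workplan states.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical summary of one daily triage run."""
    run_date: str           # ISO date of the run, e.g. "2026-06-20"
    trigger: str            # assumed values: "schedule" or "manual"
    status: str             # assumed values: "completed" or "execution_failed"
    output_validated: bool  # mirrors output_validated on the progress item


def is_clean(run: RunRecord) -> bool:
    """A clean run completed and produced validated output."""
    return run.status == "completed" and run.output_validated


def has_clean_streak(runs: Iterable[RunRecord], required: int = 3) -> bool:
    """True if `required` scheduled runs on consecutive days were all clean.

    Manual runs are ignored: they neither count toward nor break the streak.
    """
    scheduled = sorted(
        (r for r in runs if r.trigger == "schedule"),
        key=lambda r: r.run_date,
    )
    streak = 0
    previous_day = None
    for run in scheduled:
        day = date.fromisoformat(run.run_date)
        if not is_clean(run):
            streak = 0
        elif streak > 0 and previous_day is not None and (day - previous_day).days == 1:
            streak += 1
        else:
            streak = 1  # first clean run, or a clean run after a gap or failure
        previous_day = day
        if streak >= required:
            return True
    return False
```

Filtering out manual runs before counting keeps a smoke test from either satisfying the gate or resetting it, which follows the second "Done when" item.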
## Close Handoff State

```task
id: ACTIVITY-WP-0010-T05
status: wait
priority: medium
state_hub_task_id: "ecc57e21-1716-4daa-aba6-d8a6d824e4ed"
```

Update the surrounding workplans and State Hub once the live daily triage gate
passes.

Done when:

- `ACTIVITY-WP-0006` records the three-run calibration evidence;
- `ACTIVITY-WP-0009` records the scheduled-run trust gap closure;
- any temporary `needs_human` flags created for the llm-connect provider/config
  handoff are cleared or replaced by a narrower follow-up;
- this workplan is marked `finished`.