From 44be583cdd76c758954f093b73ff40b89490d6b2 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Tue, 19 May 2026 15:49:47 +0200
Subject: [PATCH] Add activity-core daily triage runner workplan

---
 ...-0045-activity-core-daily-triage-runner.md | 324 ++++++++++++++++++
 1 file changed, 324 insertions(+)
 create mode 100644 workplans/CUST-WP-0045-activity-core-daily-triage-runner.md

diff --git a/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md b/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md
new file mode 100644
index 0000000..2a77fef
--- /dev/null
+++ b/workplans/CUST-WP-0045-activity-core-daily-triage-runner.md
@@ -0,0 +1,324 @@
+---
+id: CUST-WP-0045
+type: workplan
+title: "Activity-Core Daily Triage Runner Cutover"
+domain: custodian
+repo: the-custodian
+status: ready
+owner: custodian
+topic_slug: custodian
+planning_priority: high
+planning_order: 45
+created: "2026-05-19"
+updated: "2026-05-19"
+---
+
+# CUST-WP-0045 - Activity-Core Daily Triage Runner Cutover
+
+## Goal
+
+Move the Daily State Hub WSJF Triage runner from the Codex app automation
+substrate to owned activity-core infrastructure.
+
+The outcome should be a reliable daily run at 07:20 Europe/Berlin that produces
+the same review artifact promised by `CUST-WP-0044`: a dated working-memory
+note, a State Hub `daily_triage` progress event, and an auditable activity-core
+run record.
+
+## Context
+
+On 2026-05-19 the Codex app automation fired at the scheduled time, but did not
+complete a useful run:
+
+- two `Daily State Hub WSJF Triage` sessions were created at 07:20 Europe/Berlin
+- both session files contained only session metadata
+- no prompt execution, report, tool call, working-memory note, or final answer
+  was recorded
+- State Hub had no `daily_triage` progress event for that date
+- the recorded session cwd values used Windows-style `C:\home\worsch\...`
+  paths rather than the intended WSL paths
+
+This shows the schedule is present but the launch substrate is not trustworthy
+enough for an unattended Custodian operating habit.
+
+activity-core already provides the pieces that should own this class of work:
+
+- Temporal cron schedules with timezone and misfire-policy handling
+- `ActivityDefinition` markdown ingestion via `ACTIVITY_DEFINITION_DIRS`
+- `state-hub` context resolver hooks
+- ActivityRun logging and Temporal workflow history
+- rule/instruction model design in `ACT-ADR-003`
+- deployment/runbook paths for the Railiance environment
+
+The missing work is to connect those existing capabilities to this judgement
+report use case without building a second scheduler or a parallel priority
+database.
+
+## Scope
+
+In scope:
+
+- Extend activity-core so the existing daily triage ActivityDefinition can run
+  as the primary scheduler.
+- Reuse the existing prompt at
+  `runtime/prompts/daily_statehub_wsjf_triage.md`.
+- Reuse the existing ActivityDefinition at
+  `activity-definitions/daily-statehub-wsjf-triage.md`.
+- Extend activity-core's State Hub context resolver for the queries this
+  report already needs.
+- Add or finish the instruction/report execution path described by activity-core
+  ADR-003.
+- Write the report to Custodian working memory and log `event_type:
+  daily_triage` in State Hub.
+- Disable the Codex app automation after activity-core is validated, so there
+  is only one daily runner.
+
+Out of scope:
+
+- Rewriting the WSJF rubric or report template; that belongs to `CUST-WP-0044`.
+- Creating a new scheduler, cron daemon, or separate automation database.
+- Automatically changing workplan status, priority, canon, secrets, deployment,
+  or external commitments from the daily report.
+- Retiring the workstation fallback or deploying HA activity-core before the
+  relevant Railiance deployment work is approved.
+
+## Runner Decision
+
+Primary target runner: activity-core Temporal schedule.
+
+Temporary fallback runner: Codex app automation, only until activity-core has
+completed a manual run and at least one scheduled canary run.
+
+Cutover rule: do not enable both runners at the same time. The handoff is:
+
+1. Activity-core definition remains disabled while the Codex automation is the
+   only runner.
+2. Activity-core is validated with a manual trigger using the same definition.
+3. Codex automation is paused.
+4. Activity-core definition is enabled and schedules are synced.
+5. The next scheduled run is checked for a working-memory note, State Hub
+   progress event, and ActivityRun row.
+
+## Tasks
+
+### T01 - Capture Failure Evidence And Runner Boundary
+
+```task
+id: CUST-WP-0045-T01
+status: todo
+priority: high
+```
+
+Record the 2026-05-19 failed automation evidence in the implementation notes
+for this workplan and, if useful, in the CUST-WP-0044 calibration notes.
+
+Confirm the desired runner boundary:
+
+- activity-core owns schedule, retries, run log, and context resolution
+- State Hub remains the read model and progress sink
+- the-custodian owns the prompt, report template, and governance guardrails
+- Codex app automation is a temporary fallback only
+
+Done when the failure mode and cutover target are explicit enough that future
+agents do not try to fix this by adding another local cron path.
+
+### T02 - Extend Activity-Core State Hub Context Resolver
+
+```task
+id: CUST-WP-0045-T02
+status: todo
+priority: high
+depends_on: [CUST-WP-0045-T01]
+```
+
+Extend activity-core's existing `state-hub` context resolver instead of adding
+bespoke HTTP fetch logic to the Custodian repo.
+
+Required queries:
+
+- `state_summary` -> `GET /state/summary`
+- `next_steps` -> `GET /state/next_steps`
+- `workplan_index` -> `GET /workstreams/workplan-index`
+- `hub_inbox` -> `GET /messages/?to_agent=hub&unread_only=true`
+
+The resolver should keep the existing `STATE_HUB_URL` configuration pattern,
+use bounded timeouts, and return `{}` on resolver failure so the workflow can
+still fall back to the offline brief/prompt contract.
+
+Done when activity-core tests cover all four new query names and the existing
+`domain_summary` and `repo_sbom_status` behavior remains intact.
+
+### T03 - Implement Instruction Report Execution
+
+```task
+id: CUST-WP-0045-T03
+status: todo
+priority: high
+depends_on: [CUST-WP-0045-T02]
+```
+
+Finish the activity-core instruction/report execution path needed for judgement
+runs like daily triage.
+
+Reuse the existing rule/instruction model from `ACT-ADR-003`:
+
+- parse a fenced `instruction` block from the ActivityDefinition
+- apply any instruction condition before running the report
+- render the canonical prompt with explicit trusted context fields
+- call the approved model/agent adapter through the existing org LLM path where
+  available
+- validate the output against a small daily-triage report schema
+- record model, prompt hash, validation result, and source instruction id in
+  the activity-core audit trail
+
+This task should not introduce another scheduler or a one-off daily-triage
+script. The deliverable is a reusable instruction execution capability that
+this report can use and future judgement activities can share.
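The parse, hash, and validate steps listed for T03 could look roughly like the sketch below. It assumes a simple `key: value` body inside the fenced `instruction` block. The function names, the `summary` and `recommendations` report fields, and the audit record shape are hypothetical illustrations, not taken from activity-core or `ACT-ADR-003`. Condition evaluation, prompt rendering, and the model call are left out.

```python
import hashlib
import re
from typing import Dict, List, Optional

# Built from single backticks so this example can sit inside a fenced block.
FENCE = "`" * 3

# Assumed fence shape: an opening fence tagged "instruction", then a closing fence.
_INSTRUCTION_BLOCK = re.compile(
    re.escape(FENCE) + r"instruction\n(.*?)\n" + re.escape(FENCE), re.DOTALL
)

# Hypothetical minimal report schema: both fields must be present and non-empty.
REQUIRED_REPORT_FIELDS = ("summary", "recommendations")


def parse_instruction_block(markdown: str) -> Optional[Dict[str, str]]:
    """Return key/value pairs from the first fenced instruction block, or None."""
    match = _INSTRUCTION_BLOCK.search(markdown)
    if match is None:
        return None
    fields: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not "key: value"
            fields[key.strip()] = value.strip()
    return fields


def prompt_hash(rendered_prompt: str) -> str:
    """SHA-256 hex digest of the rendered prompt, for the audit trail."""
    return hashlib.sha256(rendered_prompt.encode("utf-8")).hexdigest()


def validate_report(report: dict) -> List[str]:
    """Return a list of schema problems; an empty list means the report is valid."""
    return [
        f"missing field: {name}"
        for name in REQUIRED_REPORT_FIELDS
        if not report.get(name)
    ]


def build_audit_record(
    instruction: Dict[str, str], rendered_prompt: str, model: str, report: dict
) -> dict:
    """Collect model, prompt hash, validation result, and instruction id."""
    problems = validate_report(report)
    return {
        "instruction_id": instruction.get("id"),
        "model": model,
        "prompt_hash": prompt_hash(rendered_prompt),
        "valid": not problems,
        "problems": problems,
    }
```

Hashing the rendered prompt rather than the template means two runs with different context fields produce different hashes, which may or may not be what the audit trail wants; that choice is an assumption of this sketch.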
+
+Done when activity-core can run a synthetic instruction ActivityDefinition and
+produce a validated report payload under test.
+
+### T04 - Add Working-Memory And State Hub Progress Sinks
+
+```task
+id: CUST-WP-0045-T04
+status: todo
+priority: high
+depends_on: [CUST-WP-0045-T03]
+```
+
+Add deterministic output sinks for report instructions.
+
+For this activity, the sink must:
+
+- write one dated note under
+  `/home/worsch/the-custodian/memory/working/`
+- post one State Hub progress event with `event_type: daily_triage`
+- include the activity id, run id, scheduled time, and report summary
+- be idempotent by activity-core run id and local date
+- refuse to edit `canon/`, `workplans/`, or other canonical files
+
+Done when a manual activity-core trigger creates exactly one working-memory
+note and one State Hub progress event, and a retry does not duplicate either.
+
+### T05 - Update And Validate The Daily Triage ActivityDefinition
+
+```task
+id: CUST-WP-0045-T05
+status: todo
+priority: medium
+depends_on: [CUST-WP-0045-T02, CUST-WP-0045-T03, CUST-WP-0045-T04]
+```
+
+Update `activity-definitions/daily-statehub-wsjf-triage.md` so it is executable
+by activity-core.
+
+Expected changes:
+
+- keep the trigger at `20 7 * * *`, timezone `Europe/Berlin`
+- keep `misfire_policy: skip`
+- add the report instruction block that references the canonical prompt
+- keep `enabled: false` until manual validation passes
+- document the single-runner cutover rule in the file
+
+Validate using activity-core's existing parser and sync commands with
+`ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian`.
+
+Done when the definition parses, syncs into activity-core, and appears as a
+paused Temporal schedule while disabled.
+
+### T06 - Canary Cutover And Disable Codex Automation
+
+```task
+id: CUST-WP-0045-T06
+status: todo
+priority: high
+depends_on: [CUST-WP-0045-T05]
+```
+
+Run the cutover safely.
+
+Sequence:
+
+1. Manually trigger the activity-core definition and verify output.
+2. Pause or delete the Codex app automation
+   `daily-state-hub-wsjf-triage`.
+3. Set the activity-core definition to `enabled: true`.
+4. Sync activity definitions and Temporal schedules.
+5. Confirm the Temporal schedule is unpaused and points at
+   `RunActivityWorkflow`.
+6. Check the next 07:20 run for a working-memory note, State Hub progress event,
+   ActivityRun row, and Temporal workflow history.
+
+Done when activity-core is the only enabled runner and the first scheduled run
+has completed successfully.
+
+### T07 - Observability And Missed-Run Handling
+
+```task
+id: CUST-WP-0045-T07
+status: todo
+priority: medium
+depends_on: [CUST-WP-0045-T06]
+```
+
+Document and, where cheap, automate how to tell whether the daily run happened.
+
+The runbook should include:
+
+- Temporal schedule and workflow checks
+- activity-core ActivityRun query
+- State Hub `daily_triage` progress-event query
+- working-memory note path check
+- expected behavior when the activity-core host is offline at 07:20
+- the chosen missed-run behavior: `skip`, not catch-up
+
+Done when the operator can answer "did it run today?" from owned telemetry
+without inspecting Codex Desktop session internals.
+
+### T08 - Three Daily Runs And CUST-WP-0044 Calibration
+
+```task
+id: CUST-WP-0045-T08
+status: todo
+priority: medium
+depends_on: [CUST-WP-0045-T06, CUST-WP-0045-T07]
+```
+
+Run three consecutive daily canaries from activity-core and compare the
+recommendations with actual follow-up work.
+
+Feed the result back into `CUST-WP-0044-T06`:
+
+- calibrate WSJF scoring weights
+- tune report length
+- adjust loose-end detection thresholds
+- confirm stale-but-intentionally-parked work is treated correctly
+- decide whether daily notes are useful enough as a standing habit
+
+Done when CUST-WP-0044 can close its calibration task using activity-core runs,
+not Codex app automation runs.
+
+## Acceptance Criteria
+
+- The daily State Hub WSJF triage runs from activity-core, not Codex app cron.
+- The Codex app automation is disabled or removed before the activity-core
+  schedule is enabled.
+- The daily run leaves all three evidence surfaces: working-memory note, State
+  Hub `daily_triage` progress event, and activity-core ActivityRun/Temporal
+  history.
+- "Did it run today?" can be answered from State Hub and activity-core
+  telemetry.
+- A powered-off workstation no longer matters once activity-core is running on
+  the chosen always-on host.
+- If the chosen activity-core host is offline at 07:20, the missed run is
+  skipped by policy and the absence is visible in the runbook checks.
+- CUST-WP-0044's three-run calibration is completed using the new runner.
+
+## Notes
+
+The immediate Codex app automation failure could be patched by chasing the
+Windows/WSL launch path issue. That is not the preferred durable fix. The
+preferred fix is to make the existing activity-core ActivityDefinition the
+primary runner and keep all scheduling, audit, context resolution, and failure
+visibility in owned infrastructure.