---
id: CUST-WP-0045
type: workplan
title: "Activity-Core Daily Triage Runner Cutover"
domain: custodian
repo: the-custodian
status: ready
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 45
created: "2026-05-19"
updated: "2026-05-19"
---

# CUST-WP-0045 - Activity-Core Daily Triage Runner Cutover

## Goal

Move the Daily State Hub WSJF Triage runner from the Codex app automation
substrate to owned activity-core infrastructure.

The outcome should be a reliable daily run at 07:20 Europe/Berlin that produces
the same review artifact promised by `CUST-WP-0044`: a dated working-memory
note, a State Hub `daily_triage` progress event, and an auditable activity-core
run record.

## Context

On 2026-05-19 the Codex app automation fired at the scheduled time, but did not
complete a useful run:

- two `Daily State Hub WSJF Triage` sessions were created at 07:20 Europe/Berlin
- both session files contained only session metadata
- no prompt execution, report, tool call, working-memory note, or final answer
  was recorded
- State Hub had no `daily_triage` progress event for that date
- the recorded session cwd values used Windows-style `C:\home\worsch\...`
  paths rather than the intended WSL paths

This shows the schedule is present but the launch substrate is not trustworthy
enough for an unattended Custodian operating habit.

activity-core already provides the pieces that should own this class of work:

- Temporal cron schedules with timezone and misfire-policy handling
- `ActivityDefinition` markdown ingestion via `ACTIVITY_DEFINITION_DIRS`
- `state-hub` context resolver hooks
- ActivityRun logging and Temporal workflow history
- rule/instruction model design in `ACT-ADR-003`
- deployment/runbook paths for the Railiance environment

The missing work is to connect those existing capabilities to this judgement
report use case without building a second scheduler or a parallel priority
database.

## Scope

In scope:

- Extend activity-core so it can run the existing daily triage ActivityDefinition
  as the primary scheduler.
- Reuse the existing prompt at
  `runtime/prompts/daily_statehub_wsjf_triage.md`.
- Reuse the existing ActivityDefinition at
  `activity-definitions/daily-statehub-wsjf-triage.md`.
- Extend activity-core's State Hub context resolver for the queries this
  report already needs.
- Add or finish the instruction/report execution path described in
  `ACT-ADR-003`.
- Write the report to Custodian working memory and log
  `event_type: daily_triage` in State Hub.
- Disable the Codex app automation after activity-core is validated, so there
  is only one daily runner.

Out of scope:

- Rewriting the WSJF rubric or report template; that belongs to `CUST-WP-0044`.
- Creating a new scheduler, cron daemon, or separate automation database.
- Automatically changing workplan status, priority, canon, secrets, deployment,
  or external commitments from the daily report.
- Retiring the workstation fallback or deploying HA activity-core before the
  relevant Railiance deployment work is approved.

## Runner Decision

Primary target runner: activity-core Temporal schedule.

Temporary fallback runner: Codex app automation, kept only until activity-core
has completed a manual run and at least one scheduled canary run.

Cutover rule: do not enable both runners at the same time. The handoff is:

1. The activity-core definition stays disabled while the Codex automation is
   the only runner.
2. Activity-core is validated with a manual trigger using the same definition.
3. The Codex automation is paused.
4. The activity-core definition is enabled and schedules are synced.
5. The next scheduled run is checked for a working-memory note, State Hub
   progress event, and ActivityRun row.
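
The handoff can be sketched as a small state check. This is a minimal Python sketch, not activity-core code: `RunnerState`, `check_single_runner`, and `cut_over` are hypothetical names, and the booleans stand in for whatever the Codex automation settings and the definition's `enabled` flag actually report. It covers steps 1 to 4 and enforces the single invariant of the cutover rule: both runners are never enabled at once.

```python
from dataclasses import dataclass


@dataclass
class RunnerState:
    # Hypothetical snapshot of both runners; field names are illustrative.
    codex_enabled: bool = True
    activity_core_enabled: bool = False
    manual_run_ok: bool = False


def check_single_runner(state: RunnerState) -> None:
    """Cutover rule: never enable both runners at the same time."""
    if state.codex_enabled and state.activity_core_enabled:
        raise RuntimeError("both daily triage runners are enabled")


def cut_over(state: RunnerState, manual_run_ok: bool) -> RunnerState:
    """Apply handoff steps 1-4 in order, checking the invariant after each change."""
    # Step 1: the starting state must not already have both runners enabled.
    check_single_runner(state)
    # Step 2: record the result of manually triggering the disabled definition.
    state.manual_run_ok = manual_run_ok
    if not manual_run_ok:
        return state  # validation failed: Codex stays the only runner
    # Step 3: pause the Codex automation first...
    state.codex_enabled = False
    check_single_runner(state)
    # Step 4: ...then enable activity-core (schedule sync would follow here).
    state.activity_core_enabled = True
    check_single_runner(state)
    return state


if __name__ == "__main__":
    print(cut_over(RunnerState(), manual_run_ok=True))
```

Pausing Codex before enabling activity-core means an interruption between steps 3 and 4 leaves zero runners rather than two, so the failure shows up as a visibly missed day instead of a duplicate report.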

## Tasks

### T01 - Capture Failure Evidence And Runner Boundary

```task
id: CUST-WP-0045-T01
status: todo
priority: high
```

Record the 2026-05-19 failed automation evidence in the implementation notes
for this workplan and, if useful, in the CUST-WP-0044 calibration notes.

Confirm the desired runner boundary:

- activity-core owns schedule, retries, run log, and context resolution
- State Hub remains the read model and progress sink
- the-custodian owns the prompt, report template, and governance guardrails
- Codex app automation is a temporary fallback only

Done when the failure mode and cutover target are explicit enough that future
agents do not try to fix this by adding another local cron path.

### T02 - Extend Activity-Core State Hub Context Resolver

```task
id: CUST-WP-0045-T02
status: todo
priority: high
depends_on: [CUST-WP-0045-T01]
```

Extend activity-core's existing `state-hub` context resolver instead of adding
bespoke HTTP fetch logic to the Custodian repo.

Required queries:

- `state_summary` -> `GET /state/summary`
- `next_steps` -> `GET /state/next_steps`
- `workplan_index` -> `GET /workstreams/workplan-index`
- `hub_inbox` -> `GET /messages/?to_agent=hub&unread_only=true`
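
A minimal sketch of the query-name mapping, assuming the resolver joins each path onto the configured `STATE_HUB_URL`. `STATE_HUB_QUERIES` and `query_url` are hypothetical names; only the paths are copied from the list above.

```python
from urllib.parse import urljoin

# Hypothetical name-to-path table; paths copied from the required queries.
STATE_HUB_QUERIES = {
    "state_summary": "/state/summary",
    "next_steps": "/state/next_steps",
    "workplan_index": "/workstreams/workplan-index",
    "hub_inbox": "/messages/?to_agent=hub&unread_only=true",
}


def query_url(base_url: str, query: str) -> str:
    """Build the GET URL for a resolver query name; base_url plays STATE_HUB_URL."""
    path = STATE_HUB_QUERIES[query]  # unknown query names raise KeyError
    # Normalize slashes so a base URL with a path prefix (e.g. /api) is kept.
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


if __name__ == "__main__":
    # Illustrative base URL only.
    for name in STATE_HUB_QUERIES:
        print(name, query_url("http://localhost:8000", name))
```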

The resolver should keep the existing `STATE_HUB_URL` configuration pattern,
use bounded timeouts, and return `{}` on resolver failure so the workflow can
still fall back to the offline brief/prompt contract.
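
The fail-soft behavior can be sketched with the standard library alone. `resolve`, `fetch_json`, and the 5-second default are illustrative, not activity-core's actual names or values, and wrapping list responses in `{"items": ...}` is an assumption made so callers always receive a dict. Injecting `fetch` keeps the failure path testable without a live State Hub.

```python
import json
import logging
import urllib.request
from typing import Any, Callable

log = logging.getLogger(__name__)

# Illustrative bound; use whatever timeout activity-core standardizes on.
DEFAULT_TIMEOUT_S = 5.0


def fetch_json(url: str, timeout: float = DEFAULT_TIMEOUT_S) -> Any:
    """GET a URL and decode JSON; urlopen's timeout bounds each blocking socket op."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def resolve(url: str, fetch: Callable[[str], Any] = fetch_json) -> dict:
    """Fail-soft resolver: any fetch or decode error yields {} instead of raising."""
    try:
        data = fetch(url)
    except Exception as exc:  # network errors, timeouts, HTTP errors, bad JSON
        log.warning("state-hub resolver failed for %s: %s", url, exc)
        return {}  # the workflow falls back to the offline brief/prompt contract
    # Assumption: wrap non-dict JSON (e.g. a list) so callers always get a dict.
    return data if isinstance(data, dict) else {"items": data}
```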

Done when activity-core tests cover all four new query names and the existing
`domain_summary` and `repo_sbom_status` behavior remains intact.

### T03 - Implement Instruction Report Execution

```task
id: CUST-WP-0045-T03
status: todo
priority: high
depends_on: [CUST-WP-0045-T02]
```

Finish the activity-core instruction/report execution path needed for judgement
runs like daily triage.

Reuse the existing rule/instruction model from `ACT-ADR-003`:

- parse a fenced `instruction` block from the ActivityDefinition
- apply any instruction condition before running the report
- render the canonical prompt with explicit trusted context fields
- call the approved model/agent adapter through the existing org LLM path where
  available
- validate the output against a small daily-triage report schema
- record model, prompt hash, validation result, and source instruction id in
  the activity-core audit trail
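
The steps above might fit together roughly as follows. This is a sketch under several assumptions: the prompt uses `$name` placeholders (Python's `string.Template`), the inner syntax of the `instruction` block is left to `ACT-ADR-003` so only its presence is checked, and `call_model` and `validate` are injected stand-ins for the org LLM adapter and the report-schema check. None of these function names come from activity-core.

```python
import hashlib
import re
from string import Template
from typing import Callable, Optional

# Matches a fenced block tagged `instruction`; its inner syntax is defined by
# ACT-ADR-003, so this sketch only extracts the raw text.
_INSTRUCTION_FENCE = re.compile(r"^```instruction[ \t]*\n(.*?)^```[ \t]*$", re.M | re.S)


def extract_instruction(definition_md: str) -> Optional[str]:
    match = _INSTRUCTION_FENCE.search(definition_md)
    return match.group(1) if match else None


def run_instruction(
    definition_md: str,
    prompt_template: str,
    context: dict,
    call_model: Callable[[str], dict],
    validate: Callable[[dict], bool],
    model_name: str,
    instruction_id: str,
    condition: Callable[[dict], bool] = lambda ctx: True,
) -> Optional[dict]:
    """Parse, check condition, render, call model, validate, build an audit record."""
    if extract_instruction(definition_md) is None:
        raise ValueError("ActivityDefinition has no instruction block")
    if not condition(context):
        return None  # condition not met: no report for this run
    # substitute() raises KeyError for any placeholder missing from context,
    # so rendered values can only come from the explicitly passed fields.
    prompt = Template(prompt_template).substitute(context)
    report = call_model(prompt)  # stand-in for the org LLM adapter
    return {
        "report": report,
        "audit": {
            "instruction_id": instruction_id,
            "model": model_name,
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "valid": validate(report),
        },
    }
```

Hashing the rendered prompt, not the template, means two runs with different State Hub context produce different audit hashes, which is what a later audit needs to tell them apart.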

This task should not introduce another scheduler or a one-off daily-triage
script. The deliverable is a reusable instruction execution capability that
this report can use and future judgement activities can share.

Done when activity-core can run a synthetic instruction ActivityDefinition and
produce a validated report payload under test.
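
For the synthetic test, the "small daily-triage report schema" could start as a plain field-and-type check. The field names below (`summary`, `recommendations`, `loose_ends`) are placeholders; the real report shape belongs to the `CUST-WP-0044` template.

```python
from typing import Any

# Placeholder schema: the real report fields come from the CUST-WP-0044 template.
REQUIRED_FIELDS: dict[str, type] = {
    "summary": str,
    "recommendations": list,
    "loose_ends": list,
}


def validate_report(payload: Any) -> list[str]:
    """Return schema errors for a report payload; an empty list means valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    summary = payload.get("summary")
    if isinstance(summary, str) and not summary.strip():
        errors.append("summary must not be empty")
    return errors


if __name__ == "__main__":
    synthetic = {"summary": "2 loose ends", "recommendations": [], "loose_ends": []}
    print(validate_report(synthetic))  # an empty list means the payload passed
```

Returning a list of errors rather than a boolean gives the audit trail something concrete to record as the validation result.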

### T04 - Add Working-Memory And State Hub Progress Sinks

```task
id: CUST-WP-0045-T04
status: todo
priority: high
depends_on: [CUST-WP-0045-T03]
```

Add deterministic output sinks for report instructions.

For this activity, the sink must:

- write one dated note under
  `/home/worsch/the-custodian/memory/working/`
- post one State Hub progress event with `event_type: daily_triage`
- include the activity id, run id, scheduled time, and report summary
- be idempotent by activity-core run id and local date
- refuse to edit `canon/`, `workplans/`, or other canonical files
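
The last bullet can be enforced by resolving the target path before writing and rejecting anything outside the working-memory directory. A minimal sketch, assuming a hypothetical `<date>-daily-triage.md` note name; `note_path` and `check_writable` are illustrative names. Resolving first means `..` segments and symlinks cannot be used to reach `canon/` or `workplans/`.

```python
from datetime import date
from pathlib import Path

REPO_ROOT = Path("/home/worsch/the-custodian")
WORKING_DIR = REPO_ROOT / "memory" / "working"
CANONICAL_DIRS = ("canon", "workplans")  # explicitly refused, per T04


def note_path(local_date: date, working_dir: Path = WORKING_DIR) -> Path:
    # Hypothetical naming scheme: one note per local date.
    return working_dir / f"{local_date.isoformat()}-daily-triage.md"


def check_writable(target: Path, repo_root: Path = REPO_ROOT,
                   working_dir: Path = WORKING_DIR) -> Path:
    """Return the resolved target, or raise if it is canonical or outside working/."""
    resolved = target.resolve()  # normalizes ".." and follows existing symlinks
    for name in CANONICAL_DIRS:
        if resolved.is_relative_to((repo_root / name).resolve()):
            raise PermissionError(f"refusing to write canonical path {resolved}")
    # Allowlist: anything not under working/ ("other canonical files") is refused.
    if not resolved.is_relative_to(working_dir.resolve()):
        raise PermissionError(f"{resolved} is outside {working_dir}")
    return resolved
```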

Done when a manual activity-core trigger creates exactly one working-memory
note and one State Hub progress event, and a retry does not duplicate either.
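
One way to make retries safe on the filesystem side is exclusive file creation, paired with an idempotency key on the progress event. This is a sketch, not the State Hub API: the event field names other than `event_type` are assumed, and deduplicating on `idempotency_key` assumes State Hub honors such a key (or that the sink queries for an existing event first).

```python
import os
import tempfile
from datetime import date
from pathlib import Path


def idempotency_key(run_id: str, local_date: date) -> str:
    """Key from T04: activity-core run id plus local date."""
    return f"{run_id}:{local_date.isoformat()}"


def write_note_once(path: Path, body: str) -> bool:
    """Create the note exactly once; return False if it already exists (a retry)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes creation fail if any earlier attempt already created the file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(body)
    return True


def progress_event(activity_id: str, run_id: str, scheduled_time: str,
                   summary: str, local_date: date) -> dict:
    # Field names other than event_type are assumptions, not the State Hub schema.
    return {
        "event_type": "daily_triage",
        "activity_id": activity_id,
        "run_id": run_id,
        "scheduled_time": scheduled_time,
        "summary": summary,
        "idempotency_key": idempotency_key(run_id, local_date),
    }


if __name__ == "__main__":
    note = Path(tempfile.mkdtemp()) / "2026-05-20-daily-triage.md"
    print(write_note_once(note, "first attempt"))  # True: note created
    print(write_note_once(note, "retry"))          # False: retry is a no-op
```

One caveat of this sketch: a crash between creating and writing the file leaves an empty note that a retry would skip, so a production sink may prefer writing to a temporary file and linking it into place.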

### T05 - Update And Validate The Daily Triage ActivityDefinition

```task
id: CUST-WP-0045-T05
status: todo
priority: medium
depends_on: [CUST-WP-0045-T02, CUST-WP-0045-T03, CUST-WP-0045-T04]
```

Update `activity-definitions/daily-statehub-wsjf-triage.md` so it is executable
by activity-core.

Expected changes:

- keep the trigger at `20 7 * * *`, timezone `Europe/Berlin`
- keep `misfire_policy: skip`
- add the report instruction block that references the canonical prompt
- keep `enabled: false` until manual validation passes
- document the single-runner cutover rule in the file
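
The expected changes can be checked mechanically once the definition is parsed. A minimal sketch, assuming the metadata has already been loaded into a dict with a `trigger` mapping and an `enabled` flag; these key names are illustrative, since the actual ActivityDefinition layout is whatever activity-core's parser defines.

```python
# Expected trigger values from the list above; the key names are assumptions
# about how the parsed ActivityDefinition metadata is laid out.
EXPECTED_TRIGGER = {
    "cron": "20 7 * * *",  # minute 20, hour 7: daily at 07:20
    "timezone": "Europe/Berlin",
    "misfire_policy": "skip",
}


def check_definition(meta: dict, body: str, manually_validated: bool) -> list[str]:
    """Return problems with a parsed daily-triage definition; empty means OK."""
    problems = []
    trigger = meta.get("trigger", {})
    for key, want in EXPECTED_TRIGGER.items():
        got = trigger.get(key)
        if got != want:
            problems.append(f"trigger.{key} is {got!r}, expected {want!r}")
    if meta.get("enabled") and not manually_validated:
        problems.append("enabled before manual validation passed")
    if "```instruction" not in body:
        problems.append("missing instruction block")
    return problems
```

After T06 sets `enabled: true`, the same check would be called with `manually_validated=True`.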

Validate using activity-core's existing parser and sync commands with
`ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian`.

Done when the definition parses, syncs into activity-core, and appears as a
paused Temporal schedule while disabled.

### T06 - Canary Cutover And Disable Codex Automation

```task
id: CUST-WP-0045-T06
status: todo
priority: high
depends_on: [CUST-WP-0045-T05]
```

Run the cutover safely.

Sequence:

1. Manually trigger the activity-core definition and verify output.
2. Pause or delete the Codex app automation
   `daily-state-hub-wsjf-triage`.
3. Set the activity-core definition to `enabled: true`.
4. Sync activity definitions and Temporal schedules.
5. Confirm the Temporal schedule is unpaused and points at
   `RunActivityWorkflow`.
6. Check the next 07:20 run for a working-memory note, State Hub progress event,
   ActivityRun row, and Temporal workflow history.

Done when activity-core is the only enabled runner and the first scheduled run
has completed successfully.

### T07 - Observability And Missed-Run Handling

```task
id: CUST-WP-0045-T07
status: todo
priority: medium
depends_on: [CUST-WP-0045-T06]
```

Document and, where cheap, automate how to tell whether the daily run happened.

The runbook should include:

- Temporal schedule and workflow checks
- activity-core ActivityRun query
- State Hub `daily_triage` progress-event query
- working-memory note path check
- expected behavior when the activity-core host is offline at 07:20
- the chosen missed-run behavior: `skip`, not catch-up

Done when the operator can answer "did it run today?" from owned telemetry
without inspecting Codex Desktop session internals.

### T08 - Three Daily Runs And CUST-WP-0044 Calibration

```task
id: CUST-WP-0045-T08
status: todo
priority: medium
depends_on: [CUST-WP-0045-T06, CUST-WP-0045-T07]
```

Run three consecutive daily canaries from activity-core and compare the
recommendations with actual follow-up work.
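
One simple way to make the comparison concrete is a per-run follow-up rate: the share of recommended workplans that actually saw work afterwards. This metric and the helper names are illustrative, not part of `CUST-WP-0044`; the `consecutive` helper only guards the "three consecutive" condition.

```python
from datetime import date, timedelta


def follow_up_rate(recommended: list[str], followed_up: set[str]) -> float:
    """Share of one run's recommended workplan ids that saw follow-up work."""
    if not recommended:
        return 0.0
    hits = sum(1 for wp in recommended if wp in followed_up)
    return hits / len(recommended)


def consecutive(days: list[date]) -> bool:
    """True if the run dates form an unbroken daily sequence."""
    ordered = sorted(days)
    return all(b - a == timedelta(days=1) for a, b in zip(ordered, ordered[1:]))
```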

Feed the result back into `CUST-WP-0044-T06`:

- calibrate WSJF scoring weights
- tune report length
- adjust loose-end detection thresholds
- confirm stale-but-intentionally-parked work is treated correctly
- decide whether daily notes are useful enough as a standing habit

Done when CUST-WP-0044 can close its calibration task using activity-core runs,
not Codex app automation runs.

## Acceptance Criteria

- The daily State Hub WSJF triage runs from activity-core, not Codex app cron.
- The Codex app automation is disabled or removed before the activity-core
  schedule is enabled.
- The daily run leaves all three evidence surfaces: working-memory note, State
  Hub `daily_triage` progress event, and activity-core ActivityRun/Temporal
  history.
- "Did it run today?" can be answered from State Hub and activity-core
  telemetry.
- A powered-off workstation no longer matters once activity-core is running on
  the chosen always-on host.
- If the chosen activity-core host is offline at 07:20, the missed run is
  skipped by policy and the absence is visible in the runbook checks.
- CUST-WP-0044's three-run calibration is completed using the new runner.

## Notes

The immediate Codex app automation failure could be patched by chasing the
Windows/WSL launch path issue. That is not the preferred durable fix. The
preferred fix is to make the existing activity-core ActivityDefinition the
primary runner and keep all scheduling, audit, context resolution, and failure
visibility in owned infrastructure.