---
id: CUST-WP-0045
type: workplan
title: "Activity-Core Daily Triage Runner Cutover"
domain: custodian
repo: the-custodian
status: blocked
owner: custodian
topic_slug: custodian
planning_priority: high
planning_order: 45
created: "2026-05-19"
updated: "2026-05-19"
state_hub_workstream_id: "d9d9a3ec-f736-4041-beac-bb92c7ad314e"
---

# CUST-WP-0045 - Activity-Core Daily Triage Runner Cutover

## Goal

Move the Daily State Hub WSJF Triage runner from the Codex app automation substrate to owned activity-core infrastructure.

The outcome should be a reliable daily run at 07:20 Europe/Berlin that produces the same review artifact promised by `CUST-WP-0044`: a dated working-memory note, a State Hub `daily_triage` progress event, and an auditable activity-core run record.

## Context

On 2026-05-19 the Codex app automation fired at the scheduled time, but did not complete a useful run:

- two `Daily State Hub WSJF Triage` sessions were created at 07:20 Europe/Berlin
- both session files contained only session metadata
- no prompt execution, report, tool call, working-memory note, or final answer was recorded
- State Hub had no `daily_triage` progress event for that date
- the recorded session cwd values used Windows-style `C:\home\worsch\...` paths rather than the intended WSL paths

This shows the schedule is present but the launch substrate is not trustworthy enough for an unattended Custodian operating habit.

activity-core already provides the pieces that should own this class of work:

- Temporal cron schedules with timezone and misfire-policy handling
- `ActivityDefinition` markdown ingestion via `ACTIVITY_DEFINITION_DIRS`
- `state-hub` context resolver hooks
- ActivityRun logging and Temporal workflow history
- rule/instruction model design in `ACT-ADR-003`
- deployment/runbook paths for the Railiance environment

The missing work is to connect those existing capabilities to this judgement report use case without building a second scheduler or a parallel priority database.

## Scope

In scope:

- Extend activity-core so the existing daily triage ActivityDefinition can run as the primary scheduler.
- Reuse the existing prompt at `runtime/prompts/daily_statehub_wsjf_triage.md`.
- Reuse the existing ActivityDefinition at `activity-definitions/daily-statehub-wsjf-triage.md`.
- Extend activity-core's State Hub context resolver for the queries this report already needs.
- Add or finish the instruction/report execution path described by activity-core ADR-003.
- Write the report to Custodian working memory and log `event_type: daily_triage` in State Hub.
- Disable the Codex app automation after activity-core is validated, so there is only one daily runner.

Out of scope:

- Rewriting the WSJF rubric or report template; that belongs to `CUST-WP-0044`.
- Creating a new scheduler, cron daemon, or separate automation database.
- Automatically changing workplan status, priority, canon, secrets, deployment, or external commitments from the daily report.
- Retiring the workstation fallback or deploying HA activity-core before the relevant Railiance deployment work is approved.

## Runner Decision

Primary target runner: activity-core Temporal schedule.

Temporary fallback runner: Codex app automation, only until activity-core has completed a manual run and at least one scheduled canary run.

Cutover rule: do not enable both runners at the same time. The handoff is:

1. Activity-core definition remains disabled while the Codex automation is the only runner.
2. Activity-core is validated with a manual trigger using the same definition.
3. Codex automation is paused.
4. Activity-core definition is enabled and schedules are synced.
5. The next scheduled run is checked for a working-memory note, State Hub progress event, and ActivityRun row.
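
The cutover rule above can be expressed as a small pre-flight check. The following is only a sketch under stated assumptions: `RunnerState` and `cutover_violations` are hypothetical names, and neither activity-core nor the Codex app is known to expose its switches in this shape. It flags the two states the handoff forbids: both runners enabled, or activity-core enabled before manual validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerState:
    """Hypothetical snapshot of the switches involved in the handoff."""

    codex_automation_enabled: bool
    activity_core_enabled: bool
    manual_validation_passed: bool


def cutover_violations(state: RunnerState) -> list[str]:
    """Return the cutover-rule violations for a snapshot; an empty list means OK."""
    problems = []
    if state.codex_automation_enabled and state.activity_core_enabled:
        problems.append("both runners enabled")
    if state.activity_core_enabled and not state.manual_validation_passed:
        problems.append("activity-core enabled before manual validation")
    return problems
```

A snapshot with neither runner enabled is deliberately not treated as a violation, because the handoff passes through that state between steps 3 and 4.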
## Tasks

### T01 - Capture Failure Evidence And Runner Boundary

```task
id: CUST-WP-0045-T01
status: done
priority: high
state_hub_task_id: "01f57ed4-0473-42bf-b61c-0491f7ac7e2c"
```

Record the 2026-05-19 failed automation evidence in the implementation notes for this workplan and, if useful, in the CUST-WP-0044 calibration notes.

Confirm the desired runner boundary:

- activity-core owns schedule, retries, run log, and context resolution
- State Hub remains the read model and progress sink
- the-custodian owns the prompt, report template, and governance guardrails
- Codex app automation is a temporary fallback only

Done when the failure mode and cutover target are explicit enough that future agents do not try to fix this by adding another local cron path.

### T02 - Extend Activity-Core State Hub Context Resolver

```task
id: CUST-WP-0045-T02
status: done
priority: high
depends_on: [CUST-WP-0045-T01]
state_hub_task_id: "c4303b24-6f6b-445e-8e2e-94441589a7f2"
```

Extend activity-core's existing `state-hub` context resolver instead of adding bespoke HTTP fetch logic to the Custodian repo.

Required queries:

- `state_summary` -> `GET /state/summary`
- `next_steps` -> `GET /state/next_steps`
- `workplan_index` -> `GET /workstreams/workplan-index`
- `hub_inbox` -> `GET /messages/?to_agent=hub&unread_only=true`

The resolver should keep the existing `STATE_HUB_URL` configuration pattern, use bounded timeouts, and return `{}` on resolver failure so the workflow can still fall back to the offline brief/prompt contract.

Done when activity-core tests cover all four new query names and the existing `domain_summary` and `repo_sbom_status` behavior remains intact.

### T03 - Implement Instruction Report Execution

```task
id: CUST-WP-0045-T03
status: done
priority: high
depends_on: [CUST-WP-0045-T02]
state_hub_task_id: "e766ff2e-1887-49e6-9c66-598bb395e76c"
```

Finish the activity-core instruction/report execution path needed for judgement runs like daily triage.

Reuse the existing rule/instruction model from `ACT-ADR-003`:

- parse a fenced `instruction` block from the ActivityDefinition
- apply any instruction condition before running the report
- render the canonical prompt with explicit trusted context fields
- call the approved model/agent adapter through the existing org LLM path where available
- validate the output against a small daily-triage report schema
- record model, prompt hash, validation result, and source instruction id in the activity-core audit trail

This task should not introduce another scheduler or a one-off daily-triage script. The deliverable is a reusable instruction execution capability that this report can use and future judgement activities can share.

Done when activity-core can run a synthetic instruction ActivityDefinition and produce a validated report payload under test.

### T04 - Add Working-Memory And State Hub Progress Sinks

```task
id: CUST-WP-0045-T04
status: done
priority: high
depends_on: [CUST-WP-0045-T03]
state_hub_task_id: "04e56428-d3a8-4aa7-a6e1-172c974ece3a"
```

Add deterministic output sinks for report instructions.

For this activity, the sink must:

- write one dated note under `/home/worsch/the-custodian/memory/working/`
- post one State Hub progress event with `event_type: daily_triage`
- include the activity id, run id, scheduled time, and report summary
- be idempotent by activity-core run id and local date
- refuse to edit `canon/`, `workplans/`, or other canonical files

Done when a manual activity-core trigger creates exactly one working-memory note and one State Hub progress event, and a retry does not duplicate either.

### T05 - Update And Validate The Daily Triage ActivityDefinition

```task
id: CUST-WP-0045-T05
status: done
priority: medium
depends_on: [CUST-WP-0045-T02, CUST-WP-0045-T03, CUST-WP-0045-T04]
state_hub_task_id: "0c6d54ec-7ed1-4e80-9cfa-ccb914e65fbf"
```

Update `activity-definitions/daily-statehub-wsjf-triage.md` so it is executable by activity-core.

Expected changes:

- keep the trigger at `20 7 * * *`, timezone `Europe/Berlin`
- keep `misfire_policy: skip`
- add the report instruction block that references the canonical prompt
- keep `enabled: false` until manual validation passes
- document the single-runner cutover rule in the file

Validate using activity-core's existing parser and sync commands with `ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian`.

Done when the definition parses, syncs into activity-core, and appears as a paused Temporal schedule while disabled.

### T06 - Canary Cutover And Disable Codex Automation

```task
id: CUST-WP-0045-T06
status: blocked
priority: high
depends_on: [CUST-WP-0045-T05]
state_hub_task_id: "545162d7-0198-4519-a30b-06e88c6db915"
blocking_reason: "Needs an approved non-external LLM path for private State Hub digest data, or explicit operator approval for the external llm-connect backend."
needs_human: true
intervention_note: "Real cutover needs an approved non-external LLM path for private State Hub digest data, or explicit human approval for the external llm-connect backend after review."
```

Run the cutover safely.

Sequence:

1. Manually trigger the activity-core definition and verify output.
2. Pause or delete the Codex app automation `daily-state-hub-wsjf-triage`.
3. Set the activity-core definition to `enabled: true`.
4. Sync activity definitions and Temporal schedules.
5. Confirm the Temporal schedule is unpaused and points at `RunActivityWorkflow`.
6. Check the next 07:20 run for a working-memory note, State Hub progress event, ActivityRun row, and Temporal workflow history.

Done when activity-core is the only enabled runner and the first scheduled run has completed successfully.

### T07 - Observability And Missed-Run Handling

```task
id: CUST-WP-0045-T07
status: todo
priority: medium
depends_on: [CUST-WP-0045-T06]
state_hub_task_id: "b977c721-cadc-461f-8ffb-715d438e4c31"
```

Document and, where cheap, automate how to tell whether the daily run happened.

The runbook should include:

- Temporal schedule and workflow checks
- activity-core ActivityRun query
- State Hub `daily_triage` progress-event query
- working-memory note path check
- expected behavior when the activity-core host is offline at 07:20
- the chosen missed-run behavior: `skip`, not catch-up

Done when the operator can answer "did it run today?" from owned telemetry without inspecting Codex Desktop session internals.

### T08 - Three Daily Runs And CUST-WP-0044 Calibration

```task
id: CUST-WP-0045-T08
status: todo
priority: medium
depends_on: [CUST-WP-0045-T06, CUST-WP-0045-T07]
state_hub_task_id: "f4a985fd-8cce-4175-983e-cf3b437e19a5"
```

Run three consecutive daily canaries from activity-core and compare the recommendations with actual follow-up work.

Feed the result back into `CUST-WP-0044-T06`:

- calibrate WSJF scoring weights
- tune report length
- adjust loose-end detection thresholds
- confirm stale-but-intentionally-parked work is treated correctly
- decide whether daily notes are useful enough as a standing habit

Done when CUST-WP-0044 can close its calibration task using activity-core runs, not Codex app automation runs.

## Implementation Notes - 2026-05-19

T01 is complete. The 2026-05-19 failed Codex automation run is captured in this workplan's context, and the runner boundary is explicit: activity-core owns the schedule, retries, context resolution, run log, and audit trail; State Hub stays the read model and progress sink; the-custodian owns the prompt and guardrails.

T02 is complete in activity-core. The existing `state-hub` context resolver now supports the daily triage queries `state_summary`, `next_steps`, `workplan_index`, and `hub_inbox` while preserving `domain_summary` and `repo_sbom_status`. Resolver failures return `{}` so the workflow can degrade to offline context instead of failing the whole run.

T03 is complete in activity-core.
`RunActivityWorkflow` now evaluates instruction blocks after rules, using the existing instruction executor and a small llm-connect HTTP client boundary. Instruction results carry task specs, optional report payloads, prompt hash, model, validation status, review flag, and condition metadata. A lightweight daily triage report schema is available at `schemas/daily-triage-report.json` so report payloads can be validated under test before T04 wires the deterministic working-memory and State Hub sinks.

T04 is complete in activity-core. Instruction definitions can now declare `report_sinks`; report payloads are persisted through deterministic sink code instead of model-authored file operations. The first two sink types are `working-memory` and `state-hub-progress`. Working-memory writes refuse canonical Custodian `canon/` and `workplans/` paths, use run-id/date based idempotency, and State Hub progress posting deduplicates by activity run id and instruction id before posting.

T05 is complete. The daily triage ActivityDefinition now uses a single trusted scalar `context.daily_triage_digest` instead of raw State Hub JSON. The digest is built in activity-core from safe identifiers, counts, statuses, priority fields, health labels, and shortened titles, while excluding task descriptions, message bodies, and other free-text command surfaces. The digest also carries a `deterministic_scoring` extension marker so a later high-criticality path can move especially high-gain/high-effort candidate scoring into code without changing the ActivityDefinition contract.

T06 is partially validated but blocked before cutover. A local activity-core dev stack was started, the Custodian ActivityDefinition directory synced into activity-core, and the paused Temporal schedule for the disabled daily triage definition was created.
The first sync exposed reusable activity-core gaps that were fixed there instead of bypassed here:

- file-authored ActivityDefinition slug ids now map to stable UUIDv5 DB ids
- schedule sync no longer uses raw `NOT IN :ids` SQL that asyncpg rejects
- ADR-style context sources without an explicit `name` validate against the domain model
- the worker now registers the existing instruction/report activities

Manual trigger canary evidence, using a local-only llm-connect mock response so no State Hub digest data left the workstation:

- workflow id: `activity-6fca51fa-387a-4fd0-bc4e-d62c29eb859a:manual-6a6e5950-2338-45c4-9054-573dda9c87cc`
- Temporal status: `COMPLETED`
- activity-core run id: `2164cb88-8415-5c96-9e31-e47a41cf4e67`
- working-memory note: `memory/working/daily-triage-2026-05-19-2164cb88.md`
- State Hub progress event: `e42c0ada-8111-4d88-9791-821252cd04a2`

The real Claude-backed llm-connect trigger was not run. The execution wrapper blocked it because private State Hub workstream/task digest data would be sent to an external LLM provider. Therefore the Codex app automation remains the only enabled runner, the ActivityDefinition remains `enabled: false`, and T06 is blocked until there is either an approved local/private LLM backend or an explicit operator decision to allow that external data flow.
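
The canary note name recorded above (`daily-triage-2026-05-19-2164cb88.md`) is consistent with a scheme of local date plus run-id prefix. The sketch below shows one way a sink could get the T04 idempotency and path-refusal behavior from that scheme. It is an illustration only, not the activity-core sink: the function name `write_note_once`, the `relative_dir` parameter, and the eight-character prefix rule are assumptions inferred from that single file name.

```python
from datetime import date
from pathlib import Path

# Canonical areas the sink must refuse to write into (from T04).
FORBIDDEN_TOP_LEVEL = {"canon", "workplans"}


def write_note_once(repo_root: Path, local_date: date, run_id: str, body: str,
                    relative_dir: str = "memory/working") -> bool:
    """Write the dated note if absent; return False when a retry finds it already there."""
    root = repo_root.resolve()
    # Assumed naming: local date plus the first 8 characters of the run id.
    name = f"daily-triage-{local_date.isoformat()}-{run_id[:8]}.md"
    target = (root / relative_dir / name).resolve()
    relative = target.relative_to(root)  # raises ValueError if the path escapes the repo
    if relative.parts[0] in FORBIDDEN_TOP_LEVEL:
        raise PermissionError(f"refusing to write under {relative.parts[0]}/")
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" creates the file exclusively, so a retry cannot overwrite or duplicate it.
        with target.open("x", encoding="utf-8") as handle:
            handle.write(body)
    except FileExistsError:
        return False
    return True
```

Because the file name is a pure function of date and run id, a retried run maps to the same path and the exclusive create turns the second write into a no-op.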

Verification:

- `uv run pytest tests/test_state_hub_context_resolver.py -q`: 6 passed
- activity-core parser validation with `ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian`: parsed the daily triage definition, cron trigger, trusted instruction, and report sinks
- `uv run pytest -q` in activity-core: 107 passed, 1 skipped
- activity-core focused T06 validation: `uv run pytest tests/test_sync_activity_definitions.py tests/test_instruction_evaluation.py tests/test_report_sinks.py -q`: 10 passed
- activity-core full suite after T06 fixes: `uv run pytest -q`: 110 passed, 1 skipped

## Acceptance Criteria

- The daily State Hub WSJF triage runs from activity-core, not Codex app cron.
- The Codex app automation is disabled or removed before the activity-core schedule is enabled.
- The daily run leaves all three evidence surfaces: working-memory note, State Hub `daily_triage` progress event, and activity-core ActivityRun/Temporal history.
- "Did it run today?" can be answered from State Hub and activity-core telemetry.
- A powered-off workstation no longer matters once activity-core is running on the chosen always-on host.
- If the chosen activity-core host is offline at 07:20, the missed run is skipped by policy and the absence is visible in the runbook checks.
- CUST-WP-0044's three-run calibration is completed using the new runner.

## Notes

The immediate Codex app automation failure could be patched by chasing the Windows/WSL launch path issue. That is not the preferred durable fix. The preferred fix is to make the existing activity-core ActivityDefinition the primary runner and keep all scheduling, audit, context resolution, and failure visibility in owned infrastructure.
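
For reference, the T02 resolver contract (four query names, bounded timeouts, `{}` on failure) can be sketched as follows. This is a minimal illustration, not the activity-core resolver: `QUERY_PATHS`, `http_get_json`, `resolve`, the injectable `fetch` parameter, and the 5-second default timeout are all assumptions made for the example. Only the query names and paths come from T02.

```python
import http.client
import json
import urllib.request

# Query names and paths as listed under T02.
QUERY_PATHS = {
    "state_summary": "/state/summary",
    "next_steps": "/state/next_steps",
    "workplan_index": "/workstreams/workplan-index",
    "hub_inbox": "/messages/?to_agent=hub&unread_only=true",
}


def http_get_json(url: str, timeout: float):
    """Fetch a URL with a bounded timeout and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def resolve(query: str, base_url: str, fetch=http_get_json, timeout: float = 5.0):
    """Resolve one named query; return {} on any failure so the run can degrade."""
    path = QUERY_PATHS.get(query)
    if path is None:
        # Choice made for this sketch: unknown names degrade like any other failure.
        return {}
    try:
        return fetch(base_url.rstrip("/") + path, timeout)
    except (OSError, ValueError, http.client.HTTPException):
        # Covers connection errors and timeouts (OSError) and bad JSON (ValueError).
        return {}
```

Passing `fetch` as a parameter keeps the failure path testable without a running State Hub; in real use `base_url` would presumably come from the existing `STATE_HUB_URL` setting.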