From 17e2e3916521292f0f33b035430a441c75f1a861 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 18 Jun 2026 15:21:59 +0200
Subject: [PATCH] Track definition schedule hot reload

---
 ...-WP-0012-definition-schedule-hot-reload.md | 145 ++++++++++++++++++
 1 file changed, 145 insertions(+)
 create mode 100644 workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md

diff --git a/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md b/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
new file mode 100644
index 0000000..d1e670a
--- /dev/null
+++ b/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
@@ -0,0 +1,145 @@
+---
+id: ACTIVITY-WP-0012
+type: workplan
+title: "Definition And Schedule Hot Reload"
+domain: custodian
+repo: activity-core
+status: ready
+owner: codex
+topic_slug: custodian
+created: "2026-06-18"
+updated: "2026-06-18"
+state_hub_workstream_id: "8887075e-21ec-451b-b82b-cd81035c9ca5"
+---
+
+# ACTIVITY-WP-0012 - Definition And Schedule Hot Reload
+
+## Context
+
+State Hub message `f4876517-f738-4571-a2d6-76f2965e9a13` from
+`coulomb-loop` reports an operational gap from the Coulomb cadence ramp: after
+renaming customer definitions from hourly to daily, operators had to run
+definition/schedule sync and restart the worker before new Temporal schedule
+state was reliable.
+
+Current behavior:
+
+- `worker.py` runs `sync_activity_definitions` and `sync_schedules` once at
+  startup.
+- `RunActivityWorkflow` loads ActivityDefinitions from the DB at activity time.
+- The event router reloads enabled event definitions per NATS message.
+- Cron schedule changes only take effect when `sync_schedules` runs.
+
+This belongs in activity-core because the repo owns ActivityDefinition sync,
+Temporal schedule projection, and the admin API. The first implementation
+should expose an operator-triggered sync path without turning activity-core into
+a repo checkout manager or CI system.
+
+## Extract Reusable Sync Service
+
+```task
+id: ACTIVITY-WP-0012-T01
+status: todo
+priority: high
+state_hub_task_id: "53a7970b-7eec-47f5-ad30-bbd7c6271952"
+```
+
+Refactor the worker-startup sync sequence into a reusable async service that can
+be called by startup and the API.
+
+Done when:
+
+- the service can run ActivityDefinition sync, event type sync, and Temporal
+  schedule sync independently based on booleans;
+- it accepts the existing DB session factory / Temporal client dependencies
+  without creating hidden global state;
+- startup behavior remains unchanged except for calling the shared service;
+- failures are collected into a bounded `errors[]` result while preserving the
+  current startup best-effort behavior.
+
+## Add Admin Sync Endpoint
+
+```task
+id: ACTIVITY-WP-0012-T02
+status: todo
+priority: high
+state_hub_task_id: "8697c761-15d1-4da0-b66b-d838218a2495"
+```
+
+Add an operator-only API endpoint:
+
+`POST /admin/sync?definitions=true&schedules=true&event_types=true`
+
+Done when:
+
+- the endpoint runs the shared sync service without requiring worker restart;
+- response JSON reports counts for definitions, event types, schedules upserted,
+  schedules paused/deleted, and errors;
+- default parameters sync definitions and schedules, with event types opt-in or
+  clearly documented;
+- endpoint tests cover definitions-only, schedules-only, all-sync, and failure
+  result behavior.
+
+## Preserve Schedule Drift Semantics
+
+```task
+id: ACTIVITY-WP-0012-T03
+status: todo
+priority: high
+state_hub_task_id: "efeac412-632c-4c90-9428-bb575ac7a624"
+```
+
+Make the sync result explicit enough for cadence changes and renames.
+
+Done when:
+
+- disabled cron definitions pause their Temporal schedules on sync;
+- renamed definitions create the new schedule and pause/delete orphaned old
+  schedules according to the existing `sync_schedules` semantics;
+- event-triggered definitions remain hot through the existing router DB reload
+  path;
+- regression tests demonstrate the Coulomb hourly-to-daily rename shape without
+  needing a worker restart.
+
+## Optional Background Sync Loop
+
+```task
+id: ACTIVITY-WP-0012-T04
+status: todo
+priority: medium
+state_hub_task_id: "d774087b-c51d-4444-8e90-bfef43765456"
+```
+
+Decide whether to add a periodic sync loop after the admin endpoint exists.
+
+Done when:
+
+- either `ACTIVITY_SYNC_INTERVAL_SECONDS` is implemented with a default disabled
+  or conservative interval, or the workplan records why manual/admin-triggered
+  sync is the safer v1 posture;
+- if implemented, logs and metrics expose the last successful sync timestamp and
+  last error summary;
+- the loop does not block worker startup or workflow task processing.
+
+## Live No-Restart Smoke
+
+```task
+id: ACTIVITY-WP-0012-T05
+status: wait
+priority: high
+state_hub_task_id: "68a0e22a-106a-4d21-9f39-c6279850cb5e"
+```
+
+Validate the hot-reload path in the cluster/operator environment.
+
+Done when non-secret State Hub evidence shows:
+
+- a customer repo definition rename or `enabled` flip is synced through
+  `/admin/sync`;
+- new Temporal schedules are active and retired schedules are paused/deleted
+  without worker SIGTERM or pod restart;
+- event-triggered definitions still fire normally;
+- rollback or repeat sync is idempotent.
+
+Current wait reason: this gate depends on the implementation tasks and a
+cluster-owned smoke path.