Track definition schedule hot reload

2026-06-18 15:21:59 +02:00
parent 6518ecefce
commit 17e2e39165


@@ -0,0 +1,145 @@
---
id: ACTIVITY-WP-0012
type: workplan
title: "Definition And Schedule Hot Reload"
domain: custodian
repo: activity-core
status: ready
owner: codex
topic_slug: custodian
created: "2026-06-18"
updated: "2026-06-18"
state_hub_workstream_id: "8887075e-21ec-451b-b82b-cd81035c9ca5"
---
# ACTIVITY-WP-0012 - Definition And Schedule Hot Reload
## Context
State Hub message `f4876517-f738-4571-a2d6-76f2965e9a13` from
`coulomb-loop` reports an operational gap found during the Coulomb cadence ramp:
after customer definitions were renamed from hourly to daily, operators had to
run the definition and schedule sync and then restart the worker before the new
Temporal schedule state was reliable.

Current behavior:
- `worker.py` runs `sync_activity_definitions` and `sync_schedules` once, at
startup.
- `RunActivityWorkflow` loads ActivityDefinitions from the DB when the activity
runs.
- The event router reloads enabled event definitions on every NATS message.
- Cron schedule changes take effect only when `sync_schedules` runs, which today
means only at worker startup.

This belongs in activity-core because the repo owns ActivityDefinition sync,
Temporal schedule projection, and the admin API. The first implementation
should expose an operator-triggered sync path without turning activity-core into
a repo checkout manager or CI system.
## Extract Reusable Sync Service
```task
id: ACTIVITY-WP-0012-T01
status: todo
priority: high
state_hub_task_id: "53a7970b-7eec-47f5-ad30-bbd7c6271952"
```
Refactor the worker-startup sync sequence into a reusable async service that
both worker startup and the admin API can call.

Done when:
- the service can run ActivityDefinition sync, event type sync, and Temporal
schedule sync independently, each selected by a boolean flag;
- it receives the existing DB session factory and Temporal client as explicit
dependencies instead of creating hidden global state;
- startup behavior is unchanged apart from calling the shared service;
- failures are collected into a bounded `errors[]` result, preserving the
current best-effort behavior at startup.
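
A minimal sketch of the shared service, assuming each sync step can be wrapped as an async callable that returns an item count. `SyncService`, `SyncResult`, `MAX_ERRORS`, and the step keys are hypothetical; the real steps would close over the existing session factory and Temporal client. The step order (definitions, event types, schedules) is also an assumption, chosen because schedules are projected from definitions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# Hypothetical step shape: an async callable returning how many items it synced.
SyncStep = Callable[[], Awaitable[int]]

MAX_ERRORS = 20  # hypothetical bound for errors[]


@dataclass
class SyncResult:
    counts: dict[str, int] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

    def add_error(self, step: str, exc: Exception) -> None:
        # Bounded so a run with many failures cannot produce an unbounded payload.
        if len(self.errors) < MAX_ERRORS:
            self.errors.append(f"{step}: {type(exc).__name__}: {exc}")


@dataclass
class SyncService:
    # Injected dependencies instead of module globals; in activity-core each step
    # would wrap an existing sync function plus its session factory / Temporal client.
    sync_definitions: SyncStep
    sync_event_types: SyncStep
    sync_schedules: SyncStep

    async def run(self, *, definitions: bool = True, event_types: bool = False,
                  schedules: bool = True) -> SyncResult:
        result = SyncResult()
        # Definitions run first because schedules are projected from them.
        steps = (
            ("definitions", definitions, self.sync_definitions),
            ("event_types", event_types, self.sync_event_types),
            ("schedules", schedules, self.sync_schedules),
        )
        for name, enabled, step in steps:
            if not enabled:
                continue
            try:
                result.counts[name] = await step()
            except Exception as exc:
                # Best effort, like startup today: record the failure and continue.
                result.add_error(name, exc)
        return result


if __name__ == "__main__":
    async def _three() -> int:
        return 3

    print(asyncio.run(SyncService(_three, _three, _three).run()))
```

Worker startup would call `run()` once and only log the result, which keeps today's best-effort behavior. The admin endpoint would return the same result as JSON.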
## Add Admin Sync Endpoint
```task
id: ACTIVITY-WP-0012-T02
status: todo
priority: high
state_hub_task_id: "8697c761-15d1-4da0-b66b-d838218a2495"
```
Add an operator-only API endpoint:

`POST /admin/sync?definitions=true&schedules=true&event_types=true`

Done when:
- the endpoint runs the shared sync service without requiring a worker restart;
- the response JSON reports counts of synced definitions, synced event types,
upserted schedules, and paused or deleted schedules, along with any errors;
- by default the endpoint syncs definitions and schedules; event type sync is
either opt-in or its default is clearly documented;
- endpoint tests cover definitions-only, schedules-only, full sync, and failure
results.
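
One way to pin down the parameter defaults and the response shape, independent of whichever web framework the admin API uses. It assumes event types are opt-in (the requirement above leaves this open), and the response field names are illustrative rather than an agreed contract. `parse_sync_flags` and `sync_response` are hypothetical helpers.

```python
from urllib.parse import parse_qs

# Assumed defaults: definitions and schedules on, event types opt-in.
DEFAULT_FLAGS = {"definitions": True, "schedules": True, "event_types": False}
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_sync_flags(query_string: str) -> dict[str, bool]:
    """Map a query string like 'definitions=true&schedules=false' to sync flags."""
    flags = dict(DEFAULT_FLAGS)
    params = parse_qs(query_string, keep_blank_values=True)
    for name in flags:
        if name not in params:
            continue  # keep the default
        raw = params[name][-1].strip().lower()  # last value wins
        if raw in _TRUE:
            flags[name] = True
        elif raw in _FALSE:
            flags[name] = False
        else:
            # Reject typos instead of silently skipping a sync step.
            raise ValueError(f"invalid boolean for {name!r}: {raw!r}")
    return flags


def sync_response(counts: dict[str, int], errors: list[str]) -> dict:
    """Build the JSON body; missing counts are reported as zero."""
    return {
        "definitions": counts.get("definitions", 0),
        "event_types": counts.get("event_types", 0),
        "schedules_upserted": counts.get("schedules_upserted", 0),
        "schedules_paused_or_deleted": counts.get("schedules_paused_or_deleted", 0),
        "errors": errors,
        "ok": not errors,
    }
```

A framework with typed boolean query parameters could replace `parse_sync_flags`. Whichever parser is used, rejecting unparseable values is worth keeping: a typo such as `schedules=ture` should fail loudly rather than skip a step.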
## Preserve Schedule Drift Semantics
```task
id: ACTIVITY-WP-0012-T03
status: todo
priority: high
state_hub_task_id: "efeac412-632c-4c90-9428-bb575ac7a624"
```
Make the sync result explicit enough to confirm cadence changes and renames.

Done when:
- disabled cron definitions pause their Temporal schedules on sync;
- a renamed definition gets its new schedule created, and the orphaned old
schedule is paused or deleted according to the existing `sync_schedules`
semantics;
- event-triggered definitions stay hot through the router's existing
per-message DB reload;
- regression tests reproduce the Coulomb hourly-to-daily rename and show it
takes effect without a worker restart.
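
A pure planning step makes these drift rules easy to regression-test before any Temporal call is made. This sketch assumes one schedule per definition with a hypothetical `activity-<name>` ID scheme, and assumes `existing_ids` contains only schedules that activity-core manages. Whether orphans are paused or deleted is a parameter, because the existing `sync_schedules` behavior should decide it. Applying the plan against Temporal is not shown.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class CronDefinition:
    name: str
    cron: str
    enabled: bool = True


@dataclass
class SchedulePlan:
    upsert: dict[str, str]  # schedule id -> cron expression
    pause: list[str]
    delete: list[str]


def schedule_id(definition_name: str) -> str:
    # Hypothetical naming convention: one Temporal schedule per definition name.
    return f"activity-{definition_name}"


def plan_schedule_sync(
    definitions: list[CronDefinition],
    existing_ids: set[str],  # assumed: only schedules activity-core manages
    orphan_action: Literal["pause", "delete"] = "pause",
) -> SchedulePlan:
    if orphan_action not in ("pause", "delete"):
        raise ValueError(f"unknown orphan_action: {orphan_action!r}")
    upsert: dict[str, str] = {}
    paused: list[str] = []
    for d in definitions:
        sid = schedule_id(d.name)
        if d.enabled:
            upsert[sid] = d.cron
        elif sid in existing_ids:
            paused.append(sid)  # disabled definitions pause their schedule
    known = {schedule_id(d.name) for d in definitions}
    # Schedules with no matching definition, e.g. the old name after a rename.
    orphans = sorted(existing_ids - known)
    if orphan_action == "pause":
        return SchedulePlan(upsert, sorted(paused) + orphans, [])
    return SchedulePlan(upsert, sorted(paused), orphans)
```

For the Coulomb shape, renaming a definition from `coulomb-hourly` to `coulomb-daily` (hypothetical names) plans an upsert of `activity-coulomb-daily` and pauses `activity-coulomb-hourly`. Because the function is pure, repeating a sync with the same inputs produces the same plan.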
## Optional Background Sync Loop
```task
id: ACTIVITY-WP-0012-T04
status: todo
priority: medium
state_hub_task_id: "d774087b-c51d-4444-8e90-bfef43765456"
```
Once the admin endpoint exists, decide whether to add a periodic sync loop.

Done when:
- either `ACTIVITY_SYNC_INTERVAL_SECONDS` is implemented, disabled by default or
set to a conservative interval, or the workplan records why manual,
admin-triggered sync is the safer v1 posture;
- if implemented, logs and metrics expose the timestamp of the last successful
sync and a summary of the last error;
- the loop blocks neither worker startup nor workflow task processing.
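
If the loop is implemented, a sketch along these lines keeps it off the startup path by running it as a background `asyncio` task. It assumes that an unset `ACTIVITY_SYNC_INTERVAL_SECONDS`, or a value `<= 0`, disables the loop. It exposes the last success timestamp and last error through a status object that logs or metrics could read; the metrics wiring is omitted. `sync_loop`, `SyncLoopStatus`, and `start_sync_loop` are hypothetical names.

```python
import asyncio
import logging
import os
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

log = logging.getLogger("activity_core.sync_loop")  # hypothetical logger name

SyncFn = Callable[[], Awaitable[None]]


def sync_interval_from_env() -> Optional[float]:
    """Read ACTIVITY_SYNC_INTERVAL_SECONDS; unset, empty, or <= 0 disables the loop."""
    raw = os.environ.get("ACTIVITY_SYNC_INTERVAL_SECONDS", "").strip()
    if not raw:
        return None
    value = float(raw)
    return value if value > 0 else None


@dataclass
class SyncLoopStatus:
    last_success_at: Optional[float] = None  # Unix timestamp of the last good sync
    last_error: Optional[str] = None  # summary of the most recent failure


async def sync_loop(run_sync: SyncFn, interval: float, status: SyncLoopStatus) -> None:
    """Call run_sync every `interval` seconds until the task is cancelled."""
    while True:
        try:
            await run_sync()
            status.last_success_at = time.time()
            log.info("periodic sync succeeded at %s", status.last_success_at)
        except Exception as exc:
            # Record a truncated summary and keep looping. CancelledError derives
            # from BaseException on Python 3.8+, so cancellation still stops the loop.
            status.last_error = f"{type(exc).__name__}: {exc}"[:500]
            log.warning("periodic sync failed: %s", status.last_error)
        await asyncio.sleep(interval)


def start_sync_loop(run_sync: SyncFn, status: SyncLoopStatus) -> Optional[asyncio.Task]:
    """Start the loop as a background task and return None when disabled.

    When enabled, this must be called from inside a running event loop.
    """
    interval = sync_interval_from_env()
    if interval is None:
        return None
    return asyncio.create_task(sync_loop(run_sync, interval, status))
```

This only avoids blocking if the sync steps are genuinely async. If the loop shares the worker's event loop, a synchronous DB call inside `run_sync` would still stall workflow task processing.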
## Live No-Restart Smoke
```task
id: ACTIVITY-WP-0012-T05
status: wait
priority: high
state_hub_task_id: "68a0e22a-106a-4d21-9f39-c6279850cb5e"
```
Validate the hot-reload path in the cluster/operator environment.

Done when non-secret State Hub evidence shows:
- a customer repo definition rename or `enabled` flip is synced through
`/admin/sync`;
- new Temporal schedules are active and retired schedules are paused or deleted
without a worker SIGTERM or pod restart;
- event-triggered definitions still fire normally;
- a rollback or a repeated sync is idempotent.

Current wait reason: this gate depends on the implementation tasks and on a
cluster-owned smoke path.