feat(ACTIVITY-WP-0014): explicit run-miss recovery policies (T02, T04)

Set Temporal catchup_window on cron schedules so a fire missed during a
worker/Temporal outage is no longer silently dropped. Redefine misfire_policy
as three explicit modes (skip, catchup_all, catchup_latest), each mapping to a
(catchup_window, overlap) pair; the legacy values catchup and compress are
kept as aliases for catchup_all and catchup_latest. Add a
catchup_window_seconds override. Remove the ad-hoc upsert-time 1h backfill in
favour of native catchup. Apply catchup_latest to daily-statehub-wsjf-triage in
the Railiance runtime manifest and document run-miss policies in the runbook.
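
A minimal sketch of how a policy name could resolve to a (catchup_window,
overlap) pair, for illustration only. The names resolve_recovery_plan,
RecoveryPlan, LEGACY_ALIASES and POLICY_DEFAULTS are hypothetical, and the
default windows and overlap choices below are assumptions, not the values
this commit ships. The overlap strings are named after members of Temporal's
ScheduleOverlapPolicy enum, but the sketch uses only the standard library.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Legacy spellings accepted by the config model and their canonical names.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}


@dataclass(frozen=True)
class RecoveryPlan:
    """Hypothetical container for the pair a policy resolves to."""

    catchup_window: timedelta
    overlap: str  # e.g. "SKIP", "BUFFER_ONE", "BUFFER_ALL"


# ASSUMPTION: illustrative defaults only; the real per-policy values are not
# shown in this commit view.
POLICY_DEFAULTS = {
    "skip": RecoveryPlan(timedelta(seconds=60), "SKIP"),
    "catchup_all": RecoveryPlan(timedelta(days=1), "BUFFER_ALL"),
    "catchup_latest": RecoveryPlan(timedelta(days=1), "BUFFER_ONE"),
}


def resolve_recovery_plan(
    misfire_policy: str, catchup_window_seconds: Optional[int] = None
) -> RecoveryPlan:
    """Normalise legacy aliases, then apply the optional window override."""
    canonical = LEGACY_ALIASES.get(misfire_policy, misfire_policy)
    if canonical not in POLICY_DEFAULTS:
        raise ValueError(f"unknown misfire_policy: {misfire_policy!r}")
    plan = POLICY_DEFAULTS[canonical]
    if catchup_window_seconds is not None:
        if catchup_window_seconds < 0:
            raise ValueError("catchup_window_seconds must be >= 0")
        # The override replaces only the window; the overlap mode is kept.
        plan = RecoveryPlan(timedelta(seconds=catchup_window_seconds), plan.overlap)
    return plan
```

With this shape, resolve_recovery_plan("compress") and
resolve_recovery_plan("catchup_latest") return the same plan, and passing
catchup_window_seconds=7200 widens or narrows the window to two hours without
changing the overlap mode.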

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:15:45 +02:00
parent ffc0ee2cb7
commit a83b117f60
6 changed files with 181 additions and 29 deletions


@@ -49,7 +49,18 @@ class CronTriggerConfig(BaseModel):
     )
     timezone: str = Field(default="UTC", description="IANA timezone name.")
     jitter_seconds: int = Field(default=0, ge=0)
-    misfire_policy: Literal["skip", "catchup", "compress"] = Field(default="skip")
+    # Run-miss recovery behaviour (ACTIVITY-WP-0014). What happens when a fire is
+    # missed because the worker / Temporal was unavailable at trigger time:
+    #   skip - run on trigger or skip; a missed fire is never recovered
+    #   catchup_all - recover every fire missed during the outage window
+    #   catchup_latest - recover only the most recent missed fire; do not accumulate
+    # Legacy aliases are accepted: catchup → catchup_all, compress → catchup_latest.
+    misfire_policy: Literal[
+        "skip", "catchup_all", "catchup_latest", "catchup", "compress"
+    ] = Field(default="skip")
+    # Override the per-policy default catchup window (how far back Temporal will
+    # recover missed fires after an outage). None uses the policy default.
+    catchup_window_seconds: int | None = Field(default=None, ge=0)
 class EventTriggerConfig(BaseModel):