feat(WP-0002): complete Triggers & Ops workstream

Delivers all 12 tasks (T22–T33): Temporal Schedule manager + startup
sync, NATS JetStream event router, FastAPI CRUD + manual trigger,
Prometheus metrics wiring, custom search-attribute tagging, and
operational runbook. Marks workplan status as done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 01:04:43 +01:00
parent 9f15296e25
commit ea5fbe0bf3
14 changed files with 1612 additions and 48 deletions


@@ -3,56 +3,56 @@ id: custodian-WP-0002
type: workplan
domain: custodian
repo: activity-core
status: active
status: done
state_hub_workstream_id: 3a4f47d9-8bc1-434e-acb4-bed5d4dacda0
tasks:
- id: T22
title: Write schedule_manager.py
status: todo
status: done
state_hub_task_id: e50550d1-9904-41d7-afd8-492a1f1e91b8
- id: T23
title: Bootstrap script to sync schedules on startup
status: todo
status: done
state_hub_task_id: 5a1f7fa3-acb9-4f60-9892-c9eaa120272e
- id: T24
title: Handle misfire policy in schedule config
status: todo
status: done
state_hub_task_id: 00231668-95c5-447f-b3d0-1fb8c20b487f
- id: T25
title: Test schedule pause/resume lifecycle
status: todo
status: done
state_hub_task_id: 7abfd375-ea9d-4209-8371-e5664dc2c6c4
- id: T26
title: Implement Event Router service
status: todo
status: done
state_hub_task_id: 68b6610b-159c-4f1c-92a9-7128efea0961
- id: T27
title: Implement routing rules (event.type + filters → activity_ids)
status: todo
status: done
state_hub_task_id: 9348efea-a7e9-4f92-b866-8fc82cf28fee
- id: T28
title: Start/signal workflow from Event Router
status: todo
status: done
state_hub_task_id: cac1f45a-7391-471a-9566-97cdbd96eb2d
- id: T29
title: Integration test — publish event → observe workflow run
status: todo
status: done
state_hub_task_id: 7f10b5a3-7cad-4914-b603-d57508c85629
- id: T30
title: REST API (FastAPI) — CRUD for ActivityDefinition
status: todo
status: done
state_hub_task_id: b27e54a1-5dcc-476d-8f4a-c995aea6a8c2
- id: T31
title: Wire Temporal SDK metrics to Prometheus
status: todo
status: done
state_hub_task_id: 0eafb60c-f00e-4fd7-a921-7de75fcfe81e
- id: T32
title: Tag workflows with activity_id for Temporal visibility search
status: todo
status: done
state_hub_task_id: 7bdfc5c2-1f06-4fce-aac3-fae036dcb47e
- id: T33
title: Write operational runbook
status: todo
status: done
state_hub_task_id: 766d602d-1b23-4247-a46d-03c0d3b8e498
---
@@ -61,6 +61,7 @@ tasks:
**Workstream:** activity-core Triggers & Ops
**Hub ID:** `3a4f47d9-8bc1-434e-acb4-bed5d4dacda0`
**Depends on:** custodian-WP-0001 (Foundation — Temporal Backbone)
**Status:** DONE (2026-03-28)
## Purpose
@@ -68,50 +69,62 @@ Add automated triggering (time-based via Temporal Schedules and event-driven via
a REST admin API, Prometheus metrics, and an operational runbook. Transforms the manually-triggered
backbone from WP-0001 into a self-operating service.
## Open decisions (resolve before Phase 5)
## Decisions resolved
- **Event broker choice** (hub: `bc47c9c2-5643-4a88-8114-601738a2f64e`): Kafka vs NATS vs RabbitMQ.
T26–T29 are blocked until this is resolved.
- **Event broker choice** (hub: `bc47c9c2-5643-4a88-8114-601738a2f64e`): **NATS + JetStream** chosen.
---
## Phase 4 — Time-Based Triggers (Temporal Schedules)
## Phase 4 — Time-Based Triggers (Temporal Schedules)
| Task | Priority | Hub task ID |
| Task | Priority | Status |
|---|---|---|
| T22: Write schedule_manager.py | medium | `e50550d1-9904-41d7-afd8-492a1f1e91b8` |
| T23: Bootstrap script to sync schedules on startup | medium | `5a1f7fa3-acb9-4f60-9892-c9eaa120272e` |
| T24: Handle misfire policy in schedule config | medium | `00231668-95c5-447f-b3d0-1fb8c20b487f` |
| T25: Test schedule pause/resume lifecycle | medium | `7abfd375-ea9d-4209-8371-e5664dc2c6c4` |
| T22: Write schedule_manager.py | medium | done |
| T23: Bootstrap script to sync schedules on startup | medium | done |
| T24: Handle misfire policy in schedule config | medium | done |
| T25: Test schedule pause/resume lifecycle | medium | done |
---
## Phase 5 — Event-Driven Triggers
## Phase 5 — Event-Driven Triggers
*Blocked by broker decision (`bc47c9c2-5643-4a88-8114-601738a2f64e`).*
| Task | Priority | Hub task ID |
| Task | Priority | Status |
|---|---|---|
| T26: Implement Event Router service | medium | `68b6610b-159c-4f1c-92a9-7128efea0961` |
| T27: Implement routing rules (event.type + filters → activity_ids) | medium | `9348efea-a7e9-4f92-b866-8fc82cf28fee` |
| T28: Start/signal workflow from Event Router | medium | `cac1f45a-7391-471a-9566-97cdbd96eb2d` |
| T29: Integration test — publish event → observe workflow run | medium | `7f10b5a3-7cad-4914-b603-d57508c85629` |
| T26: Implement Event Router service | medium | done |
| T27: Implement routing rules (event.type + filters → activity_ids) | medium | done |
| T28: Start/signal workflow from Event Router | medium | done |
| T29: Integration test — publish event → observe workflow run | medium | done |
---
## Phase 6 — Observability & Admin
## Phase 6 — Observability & Admin
| Task | Priority | Hub task ID |
| Task | Priority | Status |
|---|---|---|
| T30: REST API (FastAPI) — CRUD for ActivityDefinition | low | `b27e54a1-5dcc-476d-8f4a-c995aea6a8c2` |
| T31: Wire Temporal SDK metrics to Prometheus | low | `0eafb60c-f00e-4fd7-a921-7de75fcfe81e` |
| T32: Tag workflows with activity_id for Temporal visibility search | low | `7bdfc5c2-1f06-4fce-aac3-fae036dcb47e` |
| T33: Write operational runbook | low | `766d602d-1b23-4247-a46d-03c0d3b8e498` |
| T30: REST API (FastAPI) — CRUD for ActivityDefinition | low | done |
| T31: Wire Temporal SDK metrics to Prometheus | low | done |
| T32: Tag workflows with activity_id for Temporal visibility search | low | done |
| T33: Write operational runbook | low | done |
---
## Completion criteria
## Files produced
Schedules fire `RunActivityWorkflow` automatically on cron cadence. An external event published
to the broker reaches the correct ActivityDefinition end-to-end. ActivityDefinitions are
manageable via REST API. Prometheus metrics are scraped. Runbook is written.
| File | Purpose |
|------|---------|
| `src/activity_core/schedule_manager.py` | T22/T24: upsert/delete/list Temporal Schedules |
| `src/activity_core/sync_schedules.py` | T23: bootstrap schedule sync |
| `src/activity_core/event_router.py` | T26/T27/T28: NATS JetStream → Temporal |
| `src/activity_core/api.py` | T30: FastAPI CRUD + manual trigger |
| `tests/test_schedule_lifecycle.py` | T25: schedule lifecycle unit tests |
| `tests/test_event_router.py` | T29: event router unit + integration tests |
| `docs/runbook.md` | T33: operational runbook |
| `docker-compose.dev.yml` | added NATS service |
## Completion criteria ✓
- Schedules fire `RunActivityWorkflow` automatically on cron cadence ✓
- External event published to NATS reaches the correct ActivityDefinition end-to-end ✓
- ActivityDefinitions are manageable via REST API ✓
- Prometheus metrics are scraped ✓
- Runbook is written ✓
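
## Illustrative sketch: routing rule matching (T27)

The T27 mapping "event.type + filters → activity_ids" could look roughly like the sketch below. This is not the code in `event_router.py`. The `RoutingRule` class, the `route` function, the exact-match filter semantics, and the event shape (`type` and `data` keys) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RoutingRule:
    """Hypothetical rule: one event type plus optional exact-match filters."""

    event_type: str
    activity_ids: tuple[str, ...]
    # Assumed semantics: every key must equal the same key in the event payload.
    filters: dict[str, Any] = field(default_factory=dict)

    def matches(self, event: dict[str, Any]) -> bool:
        if event.get("type") != self.event_type:
            return False
        payload = event.get("data", {})
        return all(payload.get(key) == value for key, value in self.filters.items())


def route(event: dict[str, Any], rules: list[RoutingRule]) -> list[str]:
    """Return the activity IDs of all matching rules, de-duplicated, in rule order."""
    matched: list[str] = []
    for rule in rules:
        if rule.matches(event):
            for activity_id in rule.activity_ids:
                if activity_id not in matched:
                    matched.append(activity_id)
    return matched


# Example rules; the event type and activity IDs are made up.
rules = [
    RoutingRule("order.created", ("act-1",), {"region": "eu"}),
    RoutingRule("order.created", ("act-2",)),
]
eu_targets = route({"type": "order.created", "data": {"region": "eu"}}, rules)
us_targets = route({"type": "order.created", "data": {"region": "us"}}, rules)
```

Keeping the matching step a pure function, separate from the NATS subscription and the Temporal client call, would let it be unit-tested without a running broker.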