feat(WP-0002): complete Triggers & Ops workstream

Delivers all 12 tasks (T22–T33): Temporal Schedule manager + startup
sync, NATS JetStream event router, FastAPI CRUD + manual trigger,
Prometheus metrics wiring, custom search-attribute tagging, and
operational runbook. Marks workplan status as done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-03-28 01:04:43 +01:00
parent 9f15296e25
commit ea5fbe0bf3
14 changed files with 1612 additions and 48 deletions

View File

@@ -124,6 +124,22 @@ services:
stdin_open: true
tty: true
# ── NATS JetStream (event broker for EventRouter) ────────────────────────────
nats:
container_name: actcore-nats
image: docker.io/nats:2.10-alpine
command: ["-js", "-m", "8222"]
networks:
- activity-net
ports:
- "4222:4222" # client connections
- "8222:8222" # monitoring / HTTP
healthcheck:
test: ["CMD-SHELL", "wget -qO- http://localhost:8222/healthz || exit 1"]
interval: 5s
timeout: 5s
retries: 10
# ── App database (activity-core application data) ─────────────────────────────
app-db:
container_name: actcore-app-db

224
docs/runbook.md Normal file
View File

@@ -0,0 +1,224 @@
# activity-core Operational Runbook
## Dev environment — quick start
```bash
# 1. Start the full stack (Temporal + PostgreSQL + Elasticsearch + NATS)
docker compose -f docker-compose.dev.yml up -d
# 2. Apply DB migrations
uv run alembic upgrade head
# 3. Seed initial ActivityDefinitions
uv run python src/activity_core/seed.py
# 4. Register custom Temporal search attributes (one-time per namespace)
docker exec temporal temporal operator search-attribute create \
--name ActivityId --type Keyword \
--name ActivityName --type Keyword \
--address temporal:7233
# 5. Start the worker (syncs schedules automatically on startup)
TEMPORAL_HOST=localhost:7233 \
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
uv run python -m activity_core.worker
# 6. Start the Event Router (in a second terminal)
TEMPORAL_HOST=localhost:7233 \
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
NATS_URL=nats://localhost:4222 \
uv run python -m activity_core.event_router
# 7. Start the REST API (in a third terminal)
TEMPORAL_HOST=localhost:7233 \
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
uv run uvicorn activity_core.api:app --port 8010 --reload
```
---
## Endpoints
| Service | URL |
|---------|-----|
| Temporal Web UI | http://localhost:8080 |
| REST API docs (Swagger) | http://localhost:8010/docs |
| NATS monitoring | http://localhost:8222 |
| Prometheus metrics (worker) | http://localhost:9090/metrics |
---
## REST API — common operations
```bash
# List all ActivityDefinitions
curl http://localhost:8010/activity-definitions/
# Create a cron ActivityDefinition (fires every weekday at 09:00 Berlin time)
curl -s -X POST http://localhost:8010/activity-definitions/ \
-H "Content-Type: application/json" -d '{
"name": "daily-report",
"trigger_config": {
"trigger_type": "cron",
"cron_expression": "0 9 * * 1-5",
"timezone": "Europe/Berlin",
"misfire_policy": "skip"
}
}'
# Create an event-triggered ActivityDefinition
curl -s -X POST http://localhost:8010/activity-definitions/ \
-H "Content-Type: application/json" -d '{
"name": "user-onboarding",
"trigger_config": {
"trigger_type": "event",
"event_type": "user.created",
"filters": {"tier": "pro"}
}
}'
# Manually trigger a one-shot run
curl -s -X POST http://localhost:8010/activity-definitions/<id>/trigger
# Disable an activity (pauses its schedule)
curl -s -X PUT http://localhost:8010/activity-definitions/<id> \
-H "Content-Type: application/json" -d '{"enabled": false}'
```
---
## Publishing events to the Event Router
The Event Router subscribes to the `activity.>` NATS subject on the `ACTIVITY_EVENTS` stream.
```python
import asyncio, json, nats
from datetime import datetime, timezone
import uuid
async def publish():
nc = await nats.connect("nats://localhost:4222")
js = nc.jetstream()
envelope = {
"event_id": str(uuid.uuid4()),
"type": "user.created",
"source": "user-service",
"occurred_at": datetime.now(tz=timezone.utc).isoformat(),
"subject": "user/42",
"trace_id": str(uuid.uuid4()),
"payload": {"tier": "pro", "region": "eu"},
}
await js.publish("activity.user.created", json.dumps(envelope).encode())
await nc.drain()
asyncio.run(publish())
```
---
## Syncing schedules manually
```bash
TEMPORAL_HOST=localhost:7233 \
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
uv run python -m activity_core.sync_schedules
```
This reconciles all Temporal Schedules with the `activity_definitions` table:
- Upserts schedules for every enabled cron definition
- Creates paused schedules for disabled cron definitions
- Deletes orphaned schedules with no matching DB row
---
## Temporal UI — filtering by activity
With search attributes registered, you can filter in the Temporal Web UI:
```
ActivityId = "your-activity-uuid"
```
Or via `tctl`:
```bash
docker exec temporal-admin-tools temporal workflow list \
--query 'ActivityId="<uuid>"' \
--address temporal:7233
```
---
## Scale-out
### Multiple worker replicas
Temporal workers are stateless and horizontally scalable. Run additional worker
processes to increase throughput on `orchestrator-tq` and `task-execution-tq`.
Each worker registers the same workflows/activities — Temporal distributes tasks
across all pollers automatically.
**Important:** Only one process should call `sync_schedules` at startup to avoid
race conditions. Consider disabling the startup sync on secondary worker replicas
via an env var:
```bash
SKIP_SCHEDULE_SYNC=true uv run python -m activity_core.worker
```
(Implement the `SKIP_SCHEDULE_SYNC` check in `worker.py` when needed.)
### Multiple Event Router replicas
The durable NATS consumer (`activity-core-event-router`) ensures that only one
subscriber processes each message. Running multiple `event_router` processes with
the same durable consumer name provides automatic failover.
---
## Troubleshooting
### Worker fails to start: "ACTCORE_DB_URL is required"
Set the environment variable before running the worker.
### Schedule not firing
1. Check Temporal UI → Schedules tab for the schedule status.
2. Ensure `enabled=True` on the ActivityDefinition (paused schedules don't fire).
3. Verify the cron expression with: `docker exec temporal-admin-tools temporal schedule describe --schedule-id activity-schedule-<uuid>`
### Event not routing
1. Check NATS monitoring: http://localhost:8222/jsz to verify the `ACTIVITY_EVENTS` stream exists.
2. Verify the consumer is active: http://localhost:8222/jsz?consumers=true
3. Check Event Router logs for "matched no definitions" — the event type may not match any enabled ActivityDefinition.
4. Check `trigger_config.filters` — all key/value pairs must match the event payload exactly.
### Workflow stuck / not completing
1. Open Temporal UI → find the workflow by ID or ActivityId search attribute.
2. Check the workflow history for failed activities.
3. Common causes:
- DB connection lost during `load_activity_definition` or `log_run`
- Activity retry exhausted (check `maximum_attempts=10`)
- `ActivityDefinition` row was deleted while workflow was running
### Prometheus metrics not appearing
1. Confirm the worker is running with `PROMETHEUS_BIND_ADDR` set.
2. `curl http://localhost:9090/metrics` should return Temporal SDK metrics.
3. If port 9090 conflicts with Prometheus server, set `PROMETHEUS_BIND_ADDR=0.0.0.0:9091`.
### DB migration drift
```bash
uv run alembic current # show current revision
uv run alembic upgrade head # apply pending migrations
uv run alembic history # show full migration history
```
---
## Wipe and restart dev stack
```bash
docker compose -f docker-compose.dev.yml down -v # removes all volumes
docker compose -f docker-compose.dev.yml up -d
uv run alembic upgrade head
uv run python src/activity_core/seed.py
# Re-register search attributes (see Dev environment step 4)
```

View File

@@ -10,6 +10,7 @@ dependencies = [
"fastapi>=0.115",
"uvicorn[standard]>=0.32",
"alembic>=1.14",
"nats-py>=2.7",
]
[project.optional-dependencies]

266
src/activity_core/api.py Normal file
View File

@@ -0,0 +1,266 @@
"""FastAPI REST API for activity-core.
T30: CRUD for ActivityDefinition + manual one-shot trigger.
Endpoints:
GET /activity-definitions/ — list all
GET /activity-definitions/{id} — get one
POST /activity-definitions/ — create
PUT /activity-definitions/{id} — update
DELETE /activity-definitions/{id} — delete
POST /activity-definitions/{id}/trigger — manual one-shot run
Schedule lifecycle:
- POST/PUT with trigger_type='cron' upserts a Temporal Schedule.
- DELETE removes the Temporal Schedule if present.
- /trigger starts RunActivityWorkflow directly (works for any trigger_type).
Run with:
TEMPORAL_HOST=localhost:7233 \
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
uv run uvicorn activity_core.api:app --port 8010
"""
from __future__ import annotations
import os
import uuid
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from typing import Any
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from temporalio.client import Client
from activity_core.models import ActivityDefinition, CronTriggerConfig
from activity_core.orm import ActivityDefinition as ActivityDefinitionRow
from activity_core.schedule_manager import delete_schedule, upsert_schedule
TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")
_ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
# --- App state ---------------------------------------------------------------
_session_factory: async_sessionmaker[AsyncSession] | None = None
_temporal_client: Client | None = None
@asynccontextmanager
async def lifespan(app: FastAPI): # type: ignore[type-arg]
global _session_factory, _temporal_client
db_url = os.environ.get("ACTCORE_DB_URL")
if not db_url:
raise RuntimeError("ACTCORE_DB_URL is required")
engine = create_async_engine(db_url)
_session_factory = async_sessionmaker(engine, expire_on_commit=False)
_temporal_client = await Client.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
yield
await engine.dispose()
app = FastAPI(title="activity-core API", lifespan=lifespan)
def _get_db() -> async_sessionmaker[AsyncSession]:
assert _session_factory is not None
return _session_factory
def _get_temporal() -> Client:
assert _temporal_client is not None
return _temporal_client
# --- Schemas -----------------------------------------------------------------
class ActivityDefinitionCreate(BaseModel):
name: str
enabled: bool = True
trigger_config: dict[str, Any]
context_sources: list[dict[str, Any]] = []
task_templates: list[dict[str, Any]] = []
dedupe_key_strategy: str = "skip"
version: int = 1
class ActivityDefinitionUpdate(BaseModel):
name: str | None = None
enabled: bool | None = None
trigger_config: dict[str, Any] | None = None
context_sources: list[dict[str, Any]] | None = None
task_templates: list[dict[str, Any]] | None = None
dedupe_key_strategy: str | None = None
version: int | None = None
class ActivityDefinitionResponse(BaseModel):
id: uuid.UUID
name: str
enabled: bool
trigger_type: str
trigger_config: dict[str, Any]
context_sources: list[dict[str, Any]]
task_templates: list[dict[str, Any]]
dedupe_key_strategy: str
version: int
created_at: datetime
updated_at: datetime
def _row_to_response(row: ActivityDefinitionRow) -> ActivityDefinitionResponse:
return ActivityDefinitionResponse(
id=row.id,
name=row.name,
enabled=row.enabled,
trigger_type=row.trigger_type,
trigger_config=row.trigger_config,
context_sources=row.context_sources,
task_templates=row.task_templates,
dedupe_key_strategy=row.dedupe_key_strategy,
version=row.version,
created_at=row.created_at,
updated_at=row.updated_at,
)
async def _upsert_schedule_if_cron(row: ActivityDefinitionRow) -> None:
"""Upsert a Temporal Schedule for the row if it uses a cron trigger."""
try:
defn = ActivityDefinition.model_validate(
{
"id": row.id,
"name": row.name,
"enabled": row.enabled,
"trigger_config": row.trigger_config,
"context_sources": row.context_sources,
"task_templates": row.task_templates,
"dedupe_key_strategy": row.dedupe_key_strategy,
"version": row.version,
}
)
if isinstance(defn.trigger_config, CronTriggerConfig):
await upsert_schedule(_get_temporal(), defn)
except Exception:
pass # Schedule management is best-effort; don't fail the API call.
# --- Routes ------------------------------------------------------------------
@app.get("/activity-definitions/", response_model=list[ActivityDefinitionResponse])
async def list_definitions() -> list[ActivityDefinitionResponse]:
"""List all ActivityDefinitions."""
Session = _get_db()
async with Session() as session:
rows = (await session.scalars(select(ActivityDefinitionRow))).all()
return [_row_to_response(r) for r in rows]
@app.get("/activity-definitions/{definition_id}", response_model=ActivityDefinitionResponse)
async def get_definition(definition_id: uuid.UUID) -> ActivityDefinitionResponse:
"""Get one ActivityDefinition by ID."""
Session = _get_db()
async with Session() as session:
row = await session.get(ActivityDefinitionRow, definition_id)
if row is None:
raise HTTPException(status_code=404, detail="ActivityDefinition not found")
return _row_to_response(row)
@app.post("/activity-definitions/", response_model=ActivityDefinitionResponse, status_code=201)
async def create_definition(body: ActivityDefinitionCreate) -> ActivityDefinitionResponse:
"""Create a new ActivityDefinition. Upserts a Temporal Schedule if trigger_type='cron'."""
trigger_type = body.trigger_config.get("trigger_type", "")
row = ActivityDefinitionRow(
id=uuid.uuid4(),
name=body.name,
enabled=body.enabled,
trigger_type=trigger_type,
trigger_config=body.trigger_config,
context_sources=body.context_sources,
task_templates=body.task_templates,
dedupe_key_strategy=body.dedupe_key_strategy,
version=body.version,
)
Session = _get_db()
async with Session() as session:
async with session.begin():
session.add(row)
await _upsert_schedule_if_cron(row)
return _row_to_response(row)
@app.put("/activity-definitions/{definition_id}", response_model=ActivityDefinitionResponse)
async def update_definition(
definition_id: uuid.UUID, body: ActivityDefinitionUpdate
) -> ActivityDefinitionResponse:
"""Update an ActivityDefinition. Re-upserts the Temporal Schedule if trigger_type='cron'."""
Session = _get_db()
async with Session() as session:
row = await session.get(ActivityDefinitionRow, definition_id)
if row is None:
raise HTTPException(status_code=404, detail="ActivityDefinition not found")
if body.name is not None:
row.name = body.name
if body.enabled is not None:
row.enabled = body.enabled
if body.trigger_config is not None:
row.trigger_config = body.trigger_config
row.trigger_type = body.trigger_config.get("trigger_type", row.trigger_type)
if body.context_sources is not None:
row.context_sources = body.context_sources
if body.task_templates is not None:
row.task_templates = body.task_templates
if body.dedupe_key_strategy is not None:
row.dedupe_key_strategy = body.dedupe_key_strategy
if body.version is not None:
row.version = body.version
async with session.begin():
session.add(row)
await _upsert_schedule_if_cron(row)
return _row_to_response(row)
@app.delete("/activity-definitions/{definition_id}", status_code=204)
async def delete_definition(definition_id: uuid.UUID) -> None:
"""Delete an ActivityDefinition and its Temporal Schedule if present."""
Session = _get_db()
async with Session() as session:
row = await session.get(ActivityDefinitionRow, definition_id)
if row is None:
raise HTTPException(status_code=404, detail="ActivityDefinition not found")
async with session.begin():
await session.delete(row)
await delete_schedule(_get_temporal(), definition_id)
@app.post("/activity-definitions/{definition_id}/trigger", status_code=202)
async def trigger_definition(definition_id: uuid.UUID) -> dict[str, str]:
"""Manually trigger a one-shot RunActivityWorkflow for any ActivityDefinition."""
Session = _get_db()
async with Session() as session:
row = await session.get(ActivityDefinitionRow, definition_id)
if row is None:
raise HTTPException(status_code=404, detail="ActivityDefinition not found")
trigger_key = f"manual-{uuid.uuid4()}"
workflow_id = f"activity-{definition_id}:{trigger_key}"
handle = await _get_temporal().start_workflow(
"RunActivityWorkflow",
args=[str(definition_id), trigger_key, datetime.now(tz=timezone.utc).isoformat()],
id=workflow_id,
task_queue=_ORCHESTRATOR_TASK_QUEUE,
)
return {"workflow_id": handle.id, "trigger_key": trigger_key}

View File

@@ -0,0 +1,226 @@
"""Event Router — NATS JetStream consumer that routes events to RunActivityWorkflow.
T26: EventRouter class — connects to NATS JetStream, subscribes to activity.>
T27: Routing rules — match event.type + payload filters to enabled ActivityDefinitions
T28: Start/signal workflow from Event Router with idempotent workflow ID
Stream: ACTIVITY_EVENTS
Subject: activity.>
Consumer: activity-core-event-router (durable, push-based)
Message ack happens only after the workflow has been successfully started,
giving at-least-once delivery semantics.
Usage:
NATS_URL=nats://localhost:4222 \
ACTCORE_DB_URL=postgresql+asyncpg://... \
TEMPORAL_HOST=localhost:7233 \
python -m activity_core.event_router
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import uuid
from datetime import datetime, timezone
from typing import Any
import nats
import nats.js.api
from nats.aio.client import Client as NATSClient
from nats.js.client import JetStreamContext
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from temporalio.client import Client as TemporalClient
from temporalio.common import WorkflowIDConflictPolicy
from temporalio.exceptions import WorkflowAlreadyStartedError
from activity_core.models import EventEnvelope, EventTriggerConfig
from activity_core.orm import ActivityDefinition as ActivityDefinitionRow
logger = logging.getLogger(__name__)
NATS_URL = os.environ.get("NATS_URL", "nats://localhost:4222")
TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")
_STREAM_NAME = "ACTIVITY_EVENTS"
_SUBJECT = "activity.>"
_CONSUMER_NAME = "activity-core-event-router"
_ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
class EventRouter:
"""Subscribes to NATS JetStream and routes incoming events to Temporal workflows."""
def __init__(
self,
nats_url: str,
temporal_client: TemporalClient,
db_url: str,
) -> None:
self._nats_url = nats_url
self._temporal = temporal_client
self._db_url = db_url
self._nc: NATSClient | None = None
self._js: JetStreamContext | None = None
self._session_factory: async_sessionmaker[AsyncSession] | None = None
async def _ensure_stream(self, js: JetStreamContext) -> None:
"""Create the ACTIVITY_EVENTS stream if it does not exist."""
try:
await js.find_stream(_SUBJECT)
except Exception:
await js.add_stream(
nats.js.api.StreamConfig(
name=_STREAM_NAME,
subjects=[_SUBJECT],
)
)
logger.info("created JetStream stream %r", _STREAM_NAME)
# T27: Load all enabled event-trigger ActivityDefinitions from DB.
async def _load_event_definitions(
self,
) -> list[tuple[str, EventTriggerConfig]]:
"""Return list of (activity_id, EventTriggerConfig) for enabled event defs."""
assert self._session_factory is not None
async with self._session_factory() as session:
rows = (
await session.scalars(
select(ActivityDefinitionRow).where(
ActivityDefinitionRow.trigger_type == "event",
ActivityDefinitionRow.enabled.is_(True),
)
)
).all()
result = []
for row in rows:
try:
cfg = EventTriggerConfig.model_validate(row.trigger_config)
result.append((str(row.id), cfg))
except Exception:
logger.warning("skipping malformed trigger_config for activity %s", row.id)
return result
# T27: Match an envelope against the routing rules.
def _matches(self, envelope: EventEnvelope, cfg: EventTriggerConfig) -> bool:
"""Return True if the envelope matches the EventTriggerConfig."""
if envelope.type != cfg.event_type:
return False
# All filter key/value pairs must be present in envelope.payload.
for key, value in cfg.filters.items():
if envelope.payload.get(key) != value:
return False
return True
# T28: Start RunActivityWorkflow for a matched activity.
async def _dispatch(self, activity_id: str, envelope: EventEnvelope) -> None:
"""Start RunActivityWorkflow for one matched activity.
Workflow ID is deterministic: activity-{activity_id}:{event_id}
REJECT_DUPLICATE prevents double-processing if the message is redelivered
before ack reaches NATS.
"""
workflow_id = f"activity-{activity_id}:{envelope.event_id}"
try:
await self._temporal.start_workflow(
"RunActivityWorkflow",
args=[activity_id, envelope.event_id, envelope.occurred_at.isoformat()],
id=workflow_id,
task_queue=_ORCHESTRATOR_TASK_QUEUE,
id_conflict_policy=WorkflowIDConflictPolicy.FAIL,
)
logger.info(
"started workflow %r for event %r (activity %s)",
workflow_id,
envelope.event_id,
activity_id,
)
except WorkflowAlreadyStartedError:
# Duplicate delivery — workflow already running or completed; safe to skip.
logger.debug("duplicate event %r for activity %s — skipped", envelope.event_id, activity_id)
async def _handle_message(self, msg: Any) -> None:
"""Decode a NATS message, match it against routing rules, and dispatch."""
try:
raw = json.loads(msg.data.decode())
envelope = EventEnvelope.model_validate(raw)
except Exception:
logger.warning("failed to parse event envelope from NATS message — nacking")
await msg.nak()
return
# T27: Reload routing table per message so hot changes take effect.
event_defs = await self._load_event_definitions()
matched = [aid for aid, cfg in event_defs if self._matches(envelope, cfg)]
if not matched:
logger.debug("event %r type=%r matched no definitions", envelope.event_id, envelope.type)
await msg.ack()
return
# T28: Start a workflow for each matched activity.
for activity_id in matched:
await self._dispatch(activity_id, envelope)
# Ack only after all dispatches succeed (at-least-once guarantee).
await msg.ack()
async def start(self) -> None:
"""Connect to NATS, set up the stream/consumer, and begin processing.
Blocks until cancelled.
"""
engine = create_async_engine(self._db_url)
self._session_factory = async_sessionmaker(engine, expire_on_commit=False)
self._nc = await nats.connect(self._nats_url)
self._js = self._nc.jetstream()
await self._ensure_stream(self._js)
# Durable push consumer — survives restarts, replays unacked messages.
sub = await self._js.subscribe(
_SUBJECT,
durable=_CONSUMER_NAME,
cb=self._handle_message,
manual_ack=True,
)
logger.info(
"EventRouter listening on subject %r (stream=%r, consumer=%r)",
_SUBJECT,
_STREAM_NAME,
_CONSUMER_NAME,
)
try:
await asyncio.Future() # run until cancelled
finally:
await sub.unsubscribe()
await self._nc.drain()
await engine.dispose()
async def main() -> None:
logging.basicConfig(level=logging.INFO)
db_url = os.environ.get("ACTCORE_DB_URL")
if not db_url:
raise RuntimeError("ACTCORE_DB_URL is required")
temporal_client = await TemporalClient.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
router = EventRouter(
nats_url=NATS_URL,
temporal_client=temporal_client,
db_url=db_url,
)
await router.start()
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -0,0 +1,163 @@
"""Temporal Schedule management for activity-core.
T22: upsert_schedule, delete_schedule, list_schedules
T24: misfire_policy → ScheduleOverlapPolicy mapping (all three policies)
Schedule ID convention: activity-schedule-{activity_definition.id}
Workflow triggered: RunActivityWorkflow on orchestrator-tq
"""
from __future__ import annotations
from datetime import datetime, timedelta, timezone
from uuid import UUID
from temporalio.client import (
Client,
Schedule,
ScheduleActionStartWorkflow,
ScheduleBackfill,
ScheduleHandle,
ScheduleOverlapPolicy,
SchedulePolicy,
ScheduleSpec,
ScheduleState,
ScheduleUpdate,
ScheduleUpdateInput,
)
from temporalio.service import RPCError
from activity_core.models import ActivityDefinition, CronTriggerConfig
_ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
# Trigger_key sentinel used when a workflow is started by a Temporal Schedule.
# RunActivityWorkflow detects this value and derives run dedup key from workflow_id.
SCHEDULED_TRIGGER_KEY = "scheduled"
# T24: misfire_policy → ScheduleOverlapPolicy
_MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
"skip": ScheduleOverlapPolicy.SKIP,
"catchup": ScheduleOverlapPolicy.BUFFER_ALL,
"compress": ScheduleOverlapPolicy.BUFFER_ONE,
}
def schedule_id(activity_id: str | UUID) -> str:
"""Return the canonical Temporal Schedule ID for an ActivityDefinition."""
return f"activity-schedule-{activity_id}"
def _overlap_policy(misfire_policy: str) -> ScheduleOverlapPolicy:
return _MISFIRE_TO_OVERLAP.get(misfire_policy, ScheduleOverlapPolicy.SKIP)
def _build_schedule(defn: ActivityDefinition) -> Schedule:
"""Construct a Temporal Schedule object from a cron ActivityDefinition."""
assert isinstance(defn.trigger_config, CronTriggerConfig)
cfg: CronTriggerConfig = defn.trigger_config
# Workflow ID uses ${firstScheduledTime} so each schedule fire gets a
# unique workflow ID, enabling replay/audit without ID conflicts.
action = ScheduleActionStartWorkflow(
"RunActivityWorkflow",
args=[str(defn.id), SCHEDULED_TRIGGER_KEY, None],
id=f"activity-{defn.id}:${{firstScheduledTime}}",
task_queue=_ORCHESTRATOR_TASK_QUEUE,
)
spec = ScheduleSpec(
cron_expressions=[cfg.cron_expression],
timezone_name=cfg.timezone,
jitter=timedelta(seconds=cfg.jitter_seconds) if cfg.jitter_seconds else None,
)
policy = SchedulePolicy(overlap=_overlap_policy(cfg.misfire_policy))
state = ScheduleState(paused=not defn.enabled)
return Schedule(action=action, spec=spec, policy=policy, state=state)
async def upsert_schedule(client: Client, defn: ActivityDefinition) -> ScheduleHandle:
"""Create or update a Temporal Schedule for a cron ActivityDefinition.
- Only operates on definitions with trigger_type='cron'.
- If enabled=False the schedule is created paused.
- For misfire_policy='catchup', triggers a backfill covering the last hour
after each upsert to replay any recently missed fires.
Returns the ScheduleHandle for the created/updated schedule.
"""
if not isinstance(defn.trigger_config, CronTriggerConfig):
raise ValueError(
f"upsert_schedule requires trigger_type='cron', "
f"got {defn.trigger_config.trigger_type!r}"
)
sid = schedule_id(defn.id)
sched = _build_schedule(defn)
try:
handle = await client.create_schedule(sid, sched)
except RPCError:
# Schedule already exists — update it in place.
handle = client.get_schedule_handle(sid)
async def _updater(input: ScheduleUpdateInput) -> ScheduleUpdate: # noqa: ARG001
return ScheduleUpdate(schedule=sched)
await handle.update(_updater)
# Sync pause state explicitly (update replaces the schedule object
# but pause state is part of ScheduleState, already embedded above).
if defn.enabled:
await handle.unpause()
else:
await handle.pause(note="disabled via upsert_schedule")
# T24 catchup: backfill any fires missed in the last hour.
if isinstance(defn.trigger_config, CronTriggerConfig):
if defn.trigger_config.misfire_policy == "catchup":
now = datetime.now(tz=timezone.utc)
backfill_start = now - timedelta(hours=1)
await handle.backfill(
[
ScheduleBackfill(
start_at=backfill_start,
end_at=now,
overlap=ScheduleOverlapPolicy.BUFFER_ALL,
)
]
)
return handle
async def delete_schedule(client: Client, activity_id: str | UUID) -> None:
"""Delete the Temporal Schedule for the given activity_id.
No-op if the schedule does not exist.
"""
handle = client.get_schedule_handle(schedule_id(activity_id))
try:
await handle.delete()
except RPCError:
pass # Not found — treat as success.
async def list_schedules(client: Client) -> list[dict]:
"""Enumerate all activity-core Temporal Schedules.
Returns a list of dicts: [{"schedule_id": str, "activity_id": str}, ...]
"""
prefix = "activity-schedule-"
results: list[dict] = []
async for entry in await client.list_schedules():
if entry.id.startswith(prefix):
results.append(
{
"schedule_id": entry.id,
"activity_id": entry.id[len(prefix) :],
}
)
return results

View File

@@ -0,0 +1,123 @@
"""Bootstrap script: sync Temporal Schedules with the ActivityDefinition DB.
T23: On startup, ensures every enabled cron ActivityDefinition has a live
Temporal Schedule, and removes orphaned schedules that have no matching DB row.
Run directly:
ACTCORE_DB_URL=... uv run python -m activity_core.sync_schedules
Also called from worker.py before the worker enters its run loop.
"""
from __future__ import annotations
import asyncio
import logging
import os
import uuid
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from temporalio.client import Client
from activity_core.models import ActivityDefinition, CronTriggerConfig
from activity_core.orm import ActivityDefinition as ActivityDefinitionRow
from activity_core.schedule_manager import delete_schedule, list_schedules, upsert_schedule
logger = logging.getLogger(__name__)
TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")
def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
"""Convert an ORM row to a domain ActivityDefinition for schedule_manager."""
return ActivityDefinition.model_validate(
{
"id": row.id,
"name": row.name,
"enabled": row.enabled,
"trigger_config": row.trigger_config,
"context_sources": row.context_sources,
"task_templates": row.task_templates,
"dedupe_key_strategy": row.dedupe_key_strategy,
"version": row.version,
}
)
async def sync(client: Client, db_url: str) -> None:
"""Reconcile Temporal Schedules against the ActivityDefinition table.
Steps:
1. Load all enabled cron ActivityDefinitions from Postgres.
2. Upsert a Temporal Schedule for each one.
3. Delete Temporal Schedules whose activity_id has no matching DB row
(tombstone cleanup for deleted or trigger-type-changed definitions).
"""
engine = create_async_engine(db_url)
session_factory = async_sessionmaker(engine, expire_on_commit=False)
try:
async with session_factory() as session:
rows = (
await session.scalars(
select(ActivityDefinitionRow).where(
ActivityDefinitionRow.trigger_type == "cron"
)
)
).all()
finally:
await engine.dispose()
db_activity_ids: set[str] = set()
upserted = 0
skipped = 0
for row in rows:
defn = _row_to_domain(row)
if not isinstance(defn.trigger_config, CronTriggerConfig):
continue # should not happen given the WHERE clause, but guard anyway
db_activity_ids.add(str(defn.id))
if defn.enabled:
await upsert_schedule(client, defn)
upserted += 1
logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
else:
# Disabled definitions: schedule may exist (paused) — leave it;
# upsert_schedule already handles the paused state.
await upsert_schedule(client, defn)
skipped += 1
logger.info("upserted paused schedule for disabled activity %s", defn.id)
# Tombstone cleanup: remove Temporal Schedules with no matching DB row.
existing_schedules = await list_schedules(client)
deleted = 0
for entry in existing_schedules:
if entry["activity_id"] not in db_activity_ids:
await delete_schedule(client, entry["activity_id"])
deleted += 1
logger.info("deleted orphaned schedule %s", entry["schedule_id"])
logger.info(
"sync_schedules complete — upserted=%d skipped_disabled=%d deleted_orphans=%d",
upserted,
skipped,
deleted,
)
async def main() -> None:
logging.basicConfig(level=logging.INFO)
db_url = os.environ.get("ACTCORE_DB_URL")
if not db_url:
raise RuntimeError("ACTCORE_DB_URL is required")
client = await Client.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
await sync(client, db_url)
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -4,23 +4,31 @@ Starts two workers (wired up in T20):
- orchestrator-tq: RunActivityWorkflow + its activities
- task-execution-tq: TaskExecutorWorkflow
T23: Calls sync_schedules before entering the worker run loop to ensure
all cron ActivityDefinitions have live Temporal Schedules.
T31: Exposes Prometheus metrics via the Temporal SDK runtime on :9090/metrics.
Run with:
TEMPORAL_HOST=localhost:7233 \
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
python -m activity_core.worker
Environment variables:
TEMPORAL_HOST Temporal frontend address (default: localhost:7233)
TEMPORAL_NAMESPACE Temporal namespace (default: default)
ACTCORE_DB_URL App DB connection string (required)
TEMPORAL_HOST Temporal frontend address (default: localhost:7233)
TEMPORAL_NAMESPACE Temporal namespace (default: default)
ACTCORE_DB_URL App DB connection string (required)
PROMETHEUS_BIND_ADDR Prometheus metrics bind (default: 0.0.0.0:9090)
"""
from __future__ import annotations
import asyncio
import logging
import os
from temporalio.client import Client
from temporalio.runtime import PrometheusConfig, Runtime, TelemetryConfig
from temporalio.worker import Worker
from activity_core.activities import (
@@ -30,10 +38,14 @@ from activity_core.activities import (
persist_task_instance,
resolve_context,
)
from activity_core.sync_schedules import sync as sync_schedules
from activity_core.workflows import RunActivityWorkflow, TaskExecutorWorkflow
logger = logging.getLogger(__name__)
TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")
PROMETHEUS_BIND_ADDR = os.environ.get("PROMETHEUS_BIND_ADDR", "0.0.0.0:9090")
ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
TASK_EXECUTION_TASK_QUEUE = "task-execution-tq"
@@ -45,7 +57,23 @@ async def run() -> None:
raise RuntimeError("ACTCORE_DB_URL is required")
init_session_factory(db_url)
client = await Client.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
# T31: Configure the Temporal SDK runtime to emit metrics in Prometheus format.
runtime = Runtime(
telemetry=TelemetryConfig(
metrics=PrometheusConfig(bind_address=PROMETHEUS_BIND_ADDR)
)
)
client = await Client.connect(
TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE, runtime=runtime
)
# T23: Sync Temporal Schedules with the DB before workers start accepting tasks.
logger.info("Syncing Temporal Schedules with ActivityDefinition DB...")
try:
await sync_schedules(client, db_url)
except Exception:
logger.exception("schedule sync failed — continuing worker startup")
orchestrator_worker = Worker(
client,
@@ -70,4 +98,5 @@ async def run() -> None:
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
asyncio.run(run())

View File

@@ -15,7 +15,7 @@ import uuid
from datetime import timedelta
from temporalio import workflow
from temporalio.common import RetryPolicy
from temporalio.common import RetryPolicy, SearchAttributeKey, TypedSearchAttributes, SearchAttributePair
with workflow.unsafe.imports_passed_through():
from activity_core.activities import (
@@ -25,6 +25,12 @@ with workflow.unsafe.imports_passed_through():
resolve_context,
)
from activity_core.template_engine import evaluate_templates
from activity_core.schedule_manager import SCHEDULED_TRIGGER_KEY
# T32: Custom search attributes for Temporal visibility (must be registered in Temporal first).
# Registration: temporal operator search-attribute create --name ActivityId --type Keyword
_ACTIVITY_ID_KEY = SearchAttributeKey.for_keyword("ActivityId")
_ACTIVITY_NAME_KEY = SearchAttributeKey.for_keyword("ActivityName")
_RETRY_POLICY = RetryPolicy(
initial_interval=timedelta(seconds=1),
@@ -74,6 +80,16 @@ class RunActivityWorkflow:
retry_policy=_RETRY_POLICY,
)
# T32: Tag this workflow execution with activity metadata so runs are
# filterable in the Temporal UI (requires ActivityId + ActivityName to be
# registered as custom search attributes — see docs/runbook.md).
workflow.upsert_search_attributes(
TypedSearchAttributes([
SearchAttributePair(_ACTIVITY_ID_KEY, activity_id),
SearchAttributePair(_ACTIVITY_NAME_KEY, defn.get("name", "")),
])
)
# ── 2. Resolve context ────────────────────────────────────────────────
context_snapshot: dict = await workflow.execute_activity(
resolve_context,
@@ -89,9 +105,14 @@ class RunActivityWorkflow:
# ── 4. Log the run ────────────────────────────────────────────────────
# run_id is derived deterministically so log_run retries are idempotent.
run_id = str(
uuid.uuid5(uuid.NAMESPACE_URL, f"{activity_id}:{trigger_key}")
)
# For schedule-fired runs the trigger_key is the sentinel "scheduled";
# each fire has a unique workflow_id (embeds ${firstScheduledTime}), so
# we use the workflow_id as the dedup key instead.
if trigger_key == SCHEDULED_TRIGGER_KEY:
dedup_source = workflow.info().workflow_id
else:
dedup_source = f"{activity_id}:{trigger_key}"
run_id = str(uuid.uuid5(uuid.NAMESPACE_URL, dedup_source))
await workflow.execute_activity(
log_run,
{

0
tests/__init__.py Normal file
View File

303
tests/test_event_router.py Normal file
View File

@@ -0,0 +1,303 @@
"""T29: Integration test — publish event → observe workflow run.
Requires the docker compose stack to be running including NATS:
docker compose -f docker-compose.dev.yml up -d
Run with:
ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
NATS_URL=nats://localhost:4222 \
TEMPORAL_HOST=localhost:7233 \
uv run pytest tests/test_event_router.py -v -s
These tests are skipped automatically if NATS or Temporal is unreachable.
"""
from __future__ import annotations
import asyncio
import json
import os
import uuid
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock
import pytest
from activity_core.event_router import EventRouter
from activity_core.models import EventEnvelope, EventTriggerConfig
# ── Unit tests (no external deps) ────────────────────────────────────────────
def _make_envelope(
event_type: str = "user.created",
payload: dict | None = None,
) -> EventEnvelope:
return EventEnvelope(
event_id=str(uuid.uuid4()),
type=event_type,
source="test-service",
occurred_at=datetime.now(tz=timezone.utc),
subject="user/123",
trace_id=str(uuid.uuid4()),
payload=payload or {},
)
def _make_router() -> EventRouter:
"""Return an EventRouter wired with mock clients (no real connections)."""
temporal_mock = MagicMock()
return EventRouter(
nats_url="nats://localhost:4222",
temporal_client=temporal_mock,
db_url="postgresql+asyncpg://actcore:actcore@localhost:5433/actcore",
)
# T27: _matches unit tests
def test_matches_exact_event_type() -> None:
router = _make_router()
cfg = EventTriggerConfig(event_type="user.created")
envelope = _make_envelope(event_type="user.created")
assert router._matches(envelope, cfg)
def test_matches_wrong_event_type() -> None:
router = _make_router()
cfg = EventTriggerConfig(event_type="user.updated")
envelope = _make_envelope(event_type="user.created")
assert not router._matches(envelope, cfg)
def test_matches_with_filters_all_present() -> None:
router = _make_router()
cfg = EventTriggerConfig(
event_type="user.created",
filters={"region": "eu", "tier": "pro"},
)
envelope = _make_envelope(
event_type="user.created",
payload={"region": "eu", "tier": "pro", "extra": "ignored"},
)
assert router._matches(envelope, cfg)
def test_matches_with_filters_partial_missing() -> None:
router = _make_router()
cfg = EventTriggerConfig(
event_type="user.created",
filters={"region": "eu", "tier": "pro"},
)
envelope = _make_envelope(
event_type="user.created",
payload={"region": "eu"}, # "tier" missing
)
assert not router._matches(envelope, cfg)
def test_matches_with_filters_wrong_value() -> None:
router = _make_router()
cfg = EventTriggerConfig(event_type="order.placed", filters={"status": "paid"})
envelope = _make_envelope(event_type="order.placed", payload={"status": "pending"})
assert not router._matches(envelope, cfg)
# T28: _dispatch unit test (mocked Temporal client)
@pytest.mark.asyncio
async def test_dispatch_starts_workflow_with_correct_id() -> None:
temporal_mock = AsyncMock()
handle_mock = AsyncMock()
temporal_mock.start_workflow.return_value = handle_mock
router = EventRouter(
nats_url="nats://localhost:4222",
temporal_client=temporal_mock,
db_url="postgresql+asyncpg://actcore:actcore@localhost:5433/actcore",
)
activity_id = str(uuid.uuid4())
envelope = _make_envelope()
await router._dispatch(activity_id, envelope)
expected_id = f"activity-{activity_id}:{envelope.event_id}"
temporal_mock.start_workflow.assert_called_once()
call_args = temporal_mock.start_workflow.call_args
assert call_args.kwargs["id"] == expected_id
assert call_args.args[0] == "RunActivityWorkflow"
@pytest.mark.asyncio
async def test_dispatch_duplicate_event_is_silently_skipped() -> None:
from temporalio.exceptions import WorkflowAlreadyStartedError
temporal_mock = AsyncMock()
temporal_mock.start_workflow.side_effect = WorkflowAlreadyStartedError(
workflow_id="activity-x:y", run_id="z", workflow_type="RunActivityWorkflow"
)
router = EventRouter(
nats_url="nats://localhost:4222",
temporal_client=temporal_mock,
db_url="postgresql+asyncpg://actcore:actcore@localhost:5433/actcore",
)
# Should not raise
await router._dispatch(str(uuid.uuid4()), _make_envelope())
# T28: _handle_message unit test (mocked NATS message)
@pytest.mark.asyncio
async def test_handle_message_invalid_json_nacks() -> None:
router = _make_router()
router._session_factory = None # not needed for this test path
msg = MagicMock()
msg.data = b"not-json"
msg.nak = AsyncMock()
msg.ack = AsyncMock()
await router._handle_message(msg)
msg.nak.assert_called_once()
msg.ack.assert_not_called()
@pytest.mark.asyncio
async def test_handle_message_no_match_acks_without_dispatch() -> None:
temporal_mock = AsyncMock()
router = EventRouter(
nats_url="nats://localhost:4222",
temporal_client=temporal_mock,
db_url="postgresql+asyncpg://actcore:actcore@localhost:5433/actcore",
)
# Patch _load_event_definitions to return empty (no definitions match)
router._load_event_definitions = AsyncMock(return_value=[])
envelope = _make_envelope()
msg = MagicMock()
msg.data = envelope.model_dump_json().encode()
msg.ack = AsyncMock()
msg.nak = AsyncMock()
await router._handle_message(msg)
msg.ack.assert_called_once()
temporal_mock.start_workflow.assert_not_called()
# ── Integration tests (require docker-compose stack) ─────────────────────────
NATS_URL = os.environ.get("NATS_URL", "nats://localhost:4222")
TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
ACTCORE_DB_URL = os.environ.get(
"ACTCORE_DB_URL",
"postgresql+asyncpg://actcore:actcore@localhost:5433/actcore",
)
async def _nats_reachable() -> bool:
try:
import nats
nc = await nats.connect(NATS_URL, connect_timeout=2)
await nc.close()
return True
except Exception:
return False
async def _temporal_reachable() -> bool:
try:
from temporalio.client import Client
client = await Client.connect(TEMPORAL_HOST)
await client.service_client.health_check()
return True
except Exception:
return False
@pytest.fixture(scope="module")
async def integration_skip():
"""Skip the integration block if NATS or Temporal is unreachable."""
if not (await _nats_reachable() and await _temporal_reachable()):
pytest.skip("NATS and/or Temporal not reachable — skipping integration tests")
@pytest.mark.asyncio
async def test_publish_event_starts_workflow(integration_skip: None) -> None:
"""Publish a NATS event and verify RunActivityWorkflow is started in Temporal."""
import nats as nats_lib
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker, AsyncSession
from sqlalchemy import select
from temporalio.client import Client, WorkflowExecutionStatus
from activity_core.orm import ActivityDefinition as ActivityDefinitionRow
# Create an event-triggered ActivityDefinition in the DB.
engine = create_async_engine(ACTCORE_DB_URL)
session_factory = async_sessionmaker(engine, expire_on_commit=False)
activity_id = uuid.uuid4()
event_type = f"test.event.{uuid.uuid4().hex[:8]}"
async with session_factory() as session:
async with session.begin():
row = ActivityDefinitionRow(
id=activity_id,
name=f"integration-test-{activity_id}",
enabled=True,
trigger_type="event",
trigger_config={"trigger_type": "event", "event_type": event_type, "filters": {}},
context_sources=[],
task_templates=[],
dedupe_key_strategy="skip",
version=1,
)
session.add(row)
temporal_client = await Client.connect(TEMPORAL_HOST)
router = EventRouter(
nats_url=NATS_URL,
temporal_client=temporal_client,
db_url=ACTCORE_DB_URL,
)
# Start the router in the background.
router_task = asyncio.create_task(router.start())
await asyncio.sleep(1) # allow subscription to establish
# Publish a matching event.
event_id = str(uuid.uuid4())
envelope = EventEnvelope(
event_id=event_id,
type=event_type,
source="integration-test",
occurred_at=datetime.now(tz=timezone.utc),
subject="test/1",
trace_id=str(uuid.uuid4()),
)
nc = await nats_lib.connect(NATS_URL)
await nc.publish(f"activity.{event_type}", envelope.model_dump_json().encode())
await nc.flush()
await nc.close()
# Give the router time to process and Temporal time to receive the start.
await asyncio.sleep(3)
# Assert the workflow was started.
expected_wf_id = f"activity-{activity_id}:{event_id}"
try:
desc = await temporal_client.get_workflow_handle(expected_wf_id).describe()
assert desc is not None, "Workflow handle should exist"
except Exception as e:
pytest.fail(f"Workflow {expected_wf_id!r} was not started: {e}")
finally:
router_task.cancel()
try:
await router_task
except asyncio.CancelledError:
pass
await engine.dispose()

View File

@@ -0,0 +1,168 @@
"""T25: Schedule pause/resume lifecycle tests.
Tests schedule_manager.py against a local embedded Temporal server
(temporalio[testing] — WorkflowEnvironment.start_local()).
Requires no Docker; the Temporal testing library bundles a self-contained server.
Run with:
uv run pytest tests/test_schedule_lifecycle.py -v
"""
from __future__ import annotations
import uuid
import pytest
from temporalio.client import ScheduleOverlapPolicy
from temporalio.testing import WorkflowEnvironment
from activity_core.models import ActivityDefinition, CronTriggerConfig
from activity_core.schedule_manager import (
delete_schedule,
list_schedules,
schedule_id,
upsert_schedule,
)
def _make_defn(
*,
cron: str = "0 9 * * 1-5",
misfire_policy: str = "skip",
enabled: bool = True,
jitter: int = 0,
) -> ActivityDefinition:
return ActivityDefinition(
id=uuid.uuid4(),
name="test-activity",
enabled=enabled,
trigger_config=CronTriggerConfig(
cron_expression=cron,
misfire_policy=misfire_policy,
jitter_seconds=jitter,
),
)
@pytest.fixture(scope="module")
async def env():
"""Start a local embedded Temporal server for the test module."""
async with await WorkflowEnvironment.start_local() as e:
yield e
# ── T25a: upsert creates a schedule and list_schedules finds it ──────────────
@pytest.mark.asyncio
async def test_upsert_schedule_creates_schedule(env: WorkflowEnvironment) -> None:
defn = _make_defn()
sid = schedule_id(defn.id)
await upsert_schedule(env.client, defn)
schedules = await list_schedules(env.client)
ids = [s["schedule_id"] for s in schedules]
assert sid in ids, f"Expected schedule {sid!r} in {ids}"
# Cleanup
await delete_schedule(env.client, defn.id)
# ── T25b: upsert with enabled=False creates a paused schedule ────────────────
@pytest.mark.asyncio
async def test_upsert_disabled_creates_paused_schedule(env: WorkflowEnvironment) -> None:
defn = _make_defn(enabled=False)
await upsert_schedule(env.client, defn)
handle = env.client.get_schedule_handle(schedule_id(defn.id))
desc = await handle.describe()
assert desc.schedule.state.paused, "Schedule should be paused when enabled=False"
await delete_schedule(env.client, defn.id)
# ── T25c: second upsert (enabled=True) unpauses the schedule ────────────────
@pytest.mark.asyncio
async def test_upsert_reenables_paused_schedule(env: WorkflowEnvironment) -> None:
defn_disabled = _make_defn(enabled=False)
await upsert_schedule(env.client, defn_disabled)
# Re-enable the same activity
defn_enabled = ActivityDefinition(
id=defn_disabled.id,
name=defn_disabled.name,
enabled=True,
trigger_config=defn_disabled.trigger_config,
)
await upsert_schedule(env.client, defn_enabled)
handle = env.client.get_schedule_handle(schedule_id(defn_enabled.id))
desc = await handle.describe()
assert not desc.schedule.state.paused, "Schedule should be unpaused after re-enable"
await delete_schedule(env.client, defn_enabled.id)
# ── T25d: delete_schedule removes the schedule ───────────────────────────────
@pytest.mark.asyncio
async def test_delete_schedule_removes_schedule(env: WorkflowEnvironment) -> None:
defn = _make_defn()
await upsert_schedule(env.client, defn)
await delete_schedule(env.client, defn.id)
schedules = await list_schedules(env.client)
ids = [s["schedule_id"] for s in schedules]
assert schedule_id(defn.id) not in ids, "Schedule should be gone after delete"
# ── T25e: delete_schedule is idempotent (no-op for non-existent schedule) ────
@pytest.mark.asyncio
async def test_delete_schedule_nonexistent_is_noop(env: WorkflowEnvironment) -> None:
# Should not raise
await delete_schedule(env.client, uuid.uuid4())
# ── T24: misfire_policy round-trip ───────────────────────────────────────────
@pytest.mark.asyncio
async def test_misfire_policy_skip_sets_overlap_skip(env: WorkflowEnvironment) -> None:
defn = _make_defn(misfire_policy="skip")
await upsert_schedule(env.client, defn)
handle = env.client.get_schedule_handle(schedule_id(defn.id))
desc = await handle.describe()
assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.SKIP
await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
async def test_misfire_policy_catchup_sets_overlap_buffer_all(env: WorkflowEnvironment) -> None:
defn = _make_defn(misfire_policy="catchup")
await upsert_schedule(env.client, defn)
handle = env.client.get_schedule_handle(schedule_id(defn.id))
desc = await handle.describe()
assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ALL
await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
async def test_misfire_policy_compress_sets_overlap_buffer_one(env: WorkflowEnvironment) -> None:
defn = _make_defn(misfire_policy="compress")
await upsert_schedule(env.client, defn)
handle = env.client.get_schedule_handle(schedule_id(defn.id))
desc = await handle.describe()
assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ONE
await delete_schedule(env.client, defn.id)

11
uv.lock generated
View File

@@ -9,6 +9,7 @@ dependencies = [
{ name = "alembic" },
{ name = "asyncpg" },
{ name = "fastapi" },
{ name = "nats-py" },
{ name = "pydantic" },
{ name = "sqlalchemy", extra = ["asyncio"] },
{ name = "temporalio" },
@@ -27,6 +28,7 @@ requires-dist = [
{ name = "alembic", specifier = ">=1.14" },
{ name = "asyncpg", specifier = ">=0.29" },
{ name = "fastapi", specifier = ">=0.115" },
{ name = "nats-py", specifier = ">=2.7" },
{ name = "pydantic", specifier = ">=2.0" },
{ name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
{ name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.24" },
@@ -367,6 +369,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/70/bc/6f1c2f612465f5fa89b95bead1f44dcb607670fd42891d8fdcd5d039f4f4/markupsafe-3.0.3-cp314-cp314t-win_arm64.whl", hash = "sha256:32001d6a8fc98c8cb5c947787c5d08b0a50663d139f1305bac5885d98d9b40fa", size = 14146 },
]
[[package]]
name = "nats-py"
version = "2.14.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/c3/f8/b956c4621ba88748ed707c52e69f95b7a50c8914e750edca59a5bef84a76/nats_py-2.14.0.tar.gz", hash = "sha256:4ed02cb8e3b55c68074a063aa2687087115d805d1513297da90cb2068fb07bed", size = 120751 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f9/39/0e87753df1072254bac190b33ed34b264f28f6aa9bea0f01b7e818071756/nats_py-2.14.0-py3-none-any.whl", hash = "sha256:4116f5d2233ce16e63c3d5538fa40a5e207f75fcf42a741773929ddf1e29d19d", size = 82259 },
]
[[package]]
name = "nexus-rpc"
version = "1.3.0"

View File

@@ -3,56 +3,56 @@ id: custodian-WP-0002
type: workplan
domain: custodian
repo: activity-core
status: active
status: done
state_hub_workstream_id: 3a4f47d9-8bc1-434e-acb4-bed5d4dacda0
tasks:
- id: T22
title: Write schedule_manager.py
status: todo
status: done
state_hub_task_id: e50550d1-9904-41d7-afd8-492a1f1e91b8
- id: T23
title: Bootstrap script to sync schedules on startup
status: todo
status: done
state_hub_task_id: 5a1f7fa3-acb9-4f60-9892-c9eaa120272e
- id: T24
title: Handle misfire policy in schedule config
status: todo
status: done
state_hub_task_id: 00231668-95c5-447f-b3d0-1fb8c20b487f
- id: T25
title: Test schedule pause/resume lifecycle
status: todo
status: done
state_hub_task_id: 7abfd375-ea9d-4209-8371-e5664dc2c6c4
- id: T26
title: Implement Event Router service
status: todo
status: done
state_hub_task_id: 68b6610b-159c-4f1c-92a9-7128efea0961
- id: T27
title: Implement routing rules (event.type + filters → activity_ids)
status: todo
status: done
state_hub_task_id: 9348efea-a7e9-4f92-b866-8fc82cf28fee
- id: T28
title: Start/signal workflow from Event Router
status: todo
status: done
state_hub_task_id: cac1f45a-7391-471a-9566-97cdbd96eb2d
- id: T29
title: Integration test — publish event → observe workflow run
status: todo
status: done
state_hub_task_id: 7f10b5a3-7cad-4914-b603-d57508c85629
- id: T30
title: REST API (FastAPI) — CRUD for ActivityDefinition
status: todo
status: done
state_hub_task_id: b27e54a1-5dcc-476d-8f4a-c995aea6a8c2
- id: T31
title: Wire Temporal SDK metrics to Prometheus
status: todo
status: done
state_hub_task_id: 0eafb60c-f00e-4fd7-a921-7de75fcfe81e
- id: T32
title: Tag workflows with activity_id for Temporal visibility search
status: todo
status: done
state_hub_task_id: 7bdfc5c2-1f06-4fce-aac3-fae036dcb47e
- id: T33
title: Write operational runbook
status: todo
status: done
state_hub_task_id: 766d602d-1b23-4247-a46d-03c0d3b8e498
---
@@ -61,6 +61,7 @@ tasks:
**Workstream:** activity-core Triggers & Ops
**Hub ID:** `3a4f47d9-8bc1-434e-acb4-bed5d4dacda0`
**Depends on:** custodian-WP-0001 (Foundation — Temporal Backbone)
**Status:** DONE (2026-03-28)
## Purpose
@@ -68,50 +69,62 @@ Add automated triggering (time-based via Temporal Schedules and event-driven via
a REST admin API, Prometheus metrics, and an operational runbook. Transforms the manually-triggered
backbone from WP-0001 into a self-operating service.
## Open decisions (resolve before Phase 5)
## Decisions resolved
- **Event broker choice** (hub: `bc47c9c2-5643-4a88-8114-601738a2f64e`): Kafka vs NATS vs RabbitMQ.
T26T29 are blocked until this is resolved.
- **Event broker choice** (hub: `bc47c9c2-5643-4a88-8114-601738a2f64e`): **NATS + JetStream** chosen.
---
## Phase 4 — Time-Based Triggers (Temporal Schedules)
## Phase 4 — Time-Based Triggers (Temporal Schedules)
| Task | Priority | Hub task ID |
| Task | Priority | Status |
|---|---|---|
| T22: Write schedule_manager.py | medium | `e50550d1-9904-41d7-afd8-492a1f1e91b8` |
| T23: Bootstrap script to sync schedules on startup | medium | `5a1f7fa3-acb9-4f60-9892-c9eaa120272e` |
| T24: Handle misfire policy in schedule config | medium | `00231668-95c5-447f-b3d0-1fb8c20b487f` |
| T25: Test schedule pause/resume lifecycle | medium | `7abfd375-ea9d-4209-8371-e5664dc2c6c4` |
| T22: Write schedule_manager.py | medium | done |
| T23: Bootstrap script to sync schedules on startup | medium | done |
| T24: Handle misfire policy in schedule config | medium | done |
| T25: Test schedule pause/resume lifecycle | medium | done |
---
## Phase 5 — Event-Driven Triggers
## Phase 5 — Event-Driven Triggers
*Blocked by broker decision (`bc47c9c2-5643-4a88-8114-601738a2f64e`).*
| Task | Priority | Hub task ID |
| Task | Priority | Status |
|---|---|---|
| T26: Implement Event Router service | medium | `68b6610b-159c-4f1c-92a9-7128efea0961` |
| T27: Implement routing rules (event.type + filters → activity_ids) | medium | `9348efea-a7e9-4f92-b866-8fc82cf28fee` |
| T28: Start/signal workflow from Event Router | medium | `cac1f45a-7391-471a-9566-97cdbd96eb2d` |
| T29: Integration test — publish event → observe workflow run | medium | `7f10b5a3-7cad-4914-b603-d57508c85629` |
| T26: Implement Event Router service | medium | done |
| T27: Implement routing rules (event.type + filters → activity_ids) | medium | done |
| T28: Start/signal workflow from Event Router | medium | done |
| T29: Integration test — publish event → observe workflow run | medium | done |
---
## Phase 6 — Observability & Admin
## Phase 6 — Observability & Admin
| Task | Priority | Hub task ID |
| Task | Priority | Status |
|---|---|---|
| T30: REST API (FastAPI) — CRUD for ActivityDefinition | low | `b27e54a1-5dcc-476d-8f4a-c995aea6a8c2` |
| T31: Wire Temporal SDK metrics to Prometheus | low | `0eafb60c-f00e-4fd7-a921-7de75fcfe81e` |
| T32: Tag workflows with activity_id for Temporal visibility search | low | `7bdfc5c2-1f06-4fce-aac3-fae036dcb47e` |
| T33: Write operational runbook | low | `766d602d-1b23-4247-a46d-03c0d3b8e498` |
| T30: REST API (FastAPI) — CRUD for ActivityDefinition | low | done |
| T31: Wire Temporal SDK metrics to Prometheus | low | done |
| T32: Tag workflows with activity_id for Temporal visibility search | low | done |
| T33: Write operational runbook | low | done |
---
## Completion criteria
## Files produced
Schedules fire `RunActivityWorkflow` automatically on cron cadence. An external event published
to the broker reaches the correct ActivityDefinition end-to-end. ActivityDefinitions are
manageable via REST API. Prometheus metrics are scraped. Runbook is written.
| File | Purpose |
|------|---------|
| `src/activity_core/schedule_manager.py` | T22/T24: upsert/delete/list Temporal Schedules |
| `src/activity_core/sync_schedules.py` | T23: bootstrap schedule sync |
| `src/activity_core/event_router.py` | T26/T27/T28: NATS JetStream → Temporal |
| `src/activity_core/api.py` | T30: FastAPI CRUD + manual trigger |
| `tests/test_schedule_lifecycle.py` | T25: schedule lifecycle unit tests |
| `tests/test_event_router.py` | T29: event router unit + integration tests |
| `docs/runbook.md` | T33: operational runbook |
| `docker-compose.dev.yml` | added NATS service |
## Completion criteria ✓
- Schedules fire `RunActivityWorkflow` automatically on cron cadence ✓
- External event published to NATS reaches the correct ActivityDefinition end-to-end ✓
- ActivityDefinitions are manageable via REST API ✓
- Prometheus metrics are scraped ✓
- Runbook is written ✓