generated from coulomb/repo-seed
Phase 0 contracts (event envelope, ActivityDefinition, idempotency doc, naming conventions) and Phase 1 Temporal cluster setup (docker-compose.dev.yml, Temporal dynamic config) are complete. Includes Pydantic models, JSON schemas, wiki architecture docs, and ADR-001 workplan files for both workstreams. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
215 lines
8.9 KiB
Markdown
215 lines
8.9 KiB
Markdown
This is a protplan that should be digested and compared to form an actionable workplan.
|
||
|
||
If you’re leaning toward **self-hosted Temporal**, your “Activity = event-driven task factory” maps almost perfectly to:
|
||
|
||
* **Temporal Schedules** → create a durable, centrally managed trigger that starts a workflow on a cadence (better than “cron workflows”). ([docs.temporal.io][1])
|
||
* **Workflows** → your “activity processor” (evaluate context + decide 0..N tasks)
|
||
* **Activities** → your concrete “tasks” (side effects, API calls, jobs, human-task creation, etc.)
|
||
* **Signals / events** → external event ingestion into running workflows (or into starter workflows)
|
||
|
||
And Temporal’s server side is explicitly designed for HA scaling across its core subsystems (Frontend/History/Matching/Worker). ([docs.temporal.io][2])
|
||
|
||
Below is a consolidated, practical workplan to set up a **backbone service** for a robust event-driven architecture, using Temporal as the orchestration spine.
|
||
|
||
---
|
||
|
||
## Consolidated backbone architecture (Temporal-centered EDA)
|
||
|
||
### Backbone components
|
||
|
||
1. **Temporal Service (server)**
|
||
|
||
* Temporal Server (Frontend, History, Matching, Worker services) ([docs.temporal.io][2])
|
||
* Persistence store (SQL or Cassandra) + Visibility store (SQL and/or Elasticsearch depending on features) ([docs.temporal.io][3])
|
||
2. **Temporal Workers (your code)**
|
||
|
||
* “Activity Orchestrator Workflows” (your Activity runtime)
|
||
* Activities (task executors / integrators)
|
||
3. **Event ingress/egress**
|
||
|
||
* Ingress: broker subscriptions → “event router” → Temporal (start workflow / signal workflow)
|
||
* Egress: Temporal activities publish domain events to broker
|
||
4. **Admin + Observability**
|
||
|
||
* Temporal Web UI (ops visibility, schedules page, etc.) ([docs.temporal.io][4])
|
||
* Prometheus/Grafana + logs + tracing (OpenTelemetry if you want end-to-end)
|
||
|
||
---
|
||
|
||
## Workplan (phased, production-minded)
|
||
|
||
### Phase 0 — Decide the minimum “contract” for your EDA
|
||
|
||
**Deliverable:** a stable event & workflow contract so everything stays modular.
|
||
|
||
* **Event envelope (internal standard)**:
|
||
`event_id`, `type`, `source`, `occurred_at`, `subject`, `trace_id`, `schema_version`, `payload`
|
||
* **Idempotency standard**:
|
||
|
||
* Every inbound event has a stable `event_id`
|
||
* Every scheduled run has stable `(activity_id, scheduled_for)`
|
||
* **Naming/partitioning conventions**:
|
||
|
||
* Temporal **Namespace** strategy (e.g., `prod`, `stage`, or per-tenant)
|
||
* Task Queues per service boundary (e.g., `billing-tq`, `notifications-tq`)
|
||
|
||
---
|
||
|
||
### Phase 1 — Stand up Temporal Service on Kubernetes (self-hosted)
|
||
|
||
**Deliverable:** a working Temporal cluster with persistence + UI.
|
||
|
||
1. **Provision persistence + visibility dependencies**
|
||
|
||
* Choose **PostgreSQL/MySQL** (common) or Cassandra, plus optional Elasticsearch for advanced visibility. Temporal self-hosted deployments need you to provide these stores. ([docs.temporal.io][5])
|
||
2. **Deploy Temporal via official Helm chart**
|
||
|
||
* Temporal maintains official Helm charts for Kubernetes deployments. ([docs.temporal.io][5])
|
||
3. **Deploy Temporal Web UI**
|
||
|
||
* Enable the UI so you can inspect workflows and schedules. ([docs.temporal.io][4])
|
||
4. **Production hardening basics**
|
||
|
||
* NetworkPolicies, PodSecurity, resource limits, HPA
|
||
* Backups for DB/ES
|
||
* Separate node pools if needed for noisy workloads
|
||
|
||
*Note:* `temporalio/auto-setup` is excellent for dev or quick bootstrap (Docker), but for production you typically run server components + managed/provisioned DB/ES explicitly. ([Docker Hub][6])
|
||
|
||
---
|
||
|
||
### Phase 2 — Establish the “Activity Orchestrator” as a workflow pattern
|
||
|
||
**Deliverable:** one end-to-end ActivityDefinition that spawns tasks robustly.
|
||
|
||
Implement this canonical workflow:
|
||
|
||
**Workflow: `RunActivity(activity_id, trigger)`**
|
||
|
||
1. Load `ActivityDefinition` (versioned)
|
||
2. Resolve context snapshot (query DB/APIs)
|
||
3. Evaluate rules → decide `TaskInstances[]`
|
||
4. Execute tasks as Temporal **activities** (or create “human tasks” in your DB)
|
||
5. Emit `TaskCreated` / `TaskCompleted` events (activities publish to broker)
|
||
6. Record run audit (context hash, produced tasks, version)
|
||
|
||
**Key guardrails**
|
||
|
||
* **Idempotency:** use deterministic workflow IDs for scheduled runs:
|
||
`workflow_id = activity_id + ":" + scheduled_for`
|
||
* **Exactly-once effect:** for side effects, prefer **outbox** in your DB or make activities idempotent (store `event_id` / `task_instance_id`).
|
||
|
||
---
|
||
|
||
### Phase 3 — Replace cron with Temporal Schedules (first-class triggers)
|
||
|
||
**Deliverable:** schedules are managed in Temporal, not in random cronjobs.
|
||
|
||
* Use **Temporal Schedules** to start `RunActivity(...)` at times/intervals (and manage them centrally). ([docs.temporal.io][1])
|
||
* Store your “human editable schedule spec” in your ActivityRegistry, but **materialize** it into Temporal schedules.
|
||
* Decide “missed run” policy:
|
||
|
||
* catch up (bounded)
|
||
* skip
|
||
* compress (run once with widened context)
|
||
|
||
This is the cleanest alignment with your research draft: “timer ingress → trigger event → processor → spawn tasks”, except Temporal gives you durable state, retries, and execution history by default.
|
||
|
||
---
|
||
|
||
### Phase 4 — Add external events: broker → Temporal
|
||
|
||
**Deliverable:** event-driven triggers land reliably in Temporal.
|
||
|
||
1. Introduce an **Event Router** service:
|
||
|
||
* Subscribes to Kafka/NATS/Rabbit/etc.
|
||
* Validates schema + authN/authZ
|
||
* Applies routing rules: `(event.type, filters) -> activity_id(s)`
|
||
2. For each match, it either:
|
||
|
||
* **Starts** `RunActivity(activity_id, trigger_event)` (if no long-lived instance)
|
||
* Or **Signals** an existing workflow instance (if you have “stateful ongoing activities”)
|
||
|
||
**Rule of thumb**
|
||
|
||
* If the “activity” is inherently recurring and stateless per run → start per trigger.
|
||
* If the “activity” is an ongoing coordination process (state machine) → signal a long-lived workflow.
|
||
|
||
---
|
||
|
||
### Phase 5 — Observability and operability as first-class product
|
||
|
||
**Deliverable:** you can run this for months without fear.
|
||
|
||
* Temporal UI for inspection (workflows + schedules). ([docs.temporal.io][4])
|
||
* Metrics dashboards:
|
||
|
||
* schedule trigger rate, workflow start rate
|
||
* activity failures, retries, latency
|
||
* queue depth by task queue
|
||
* Alerts:
|
||
|
||
* sustained workflow task failures
|
||
* persistence latency spikes
|
||
* DLQ / poison message rate (in broker layer)
|
||
* Runbooks:
|
||
|
||
* how to replay/repair
|
||
* how to disable an activity safely
|
||
* how to backfill events
|
||
|
||
---
|
||
|
||
### Phase 6 — Security hardening (when you’re past MVP)
|
||
|
||
**Deliverable:** “robust backbone” in the real sense.
|
||
|
||
* mTLS between workers and Temporal (and between services)
|
||
* Namespace isolation (and per-namespace access controls if applicable)
|
||
* Event authentication (signed events / broker ACLs)
|
||
* Secrets management (KMS/Vault/SealedSecrets)
|
||
|
||
---
|
||
|
||
## How this consolidates with your draft research
|
||
|
||
Your draft blueprint is structurally correct; the consolidation with Temporal is mainly:
|
||
|
||
* **Trigger Layer** becomes:
|
||
|
||
* Temporal **Schedules** for time triggers ([docs.temporal.io][1])
|
||
* Broker + Event Router for external events
|
||
* **Activity Processor** becomes a **Workflow** (durable, replayable, stateful)
|
||
* **Task Queue & Execution** becomes Temporal **Task Queues** + workers (no separate Celery/RQ needed unless you have reasons)
|
||
* **Durability / retries / history** are native properties of Temporal’s server subsystems ([docs.temporal.io][2])
|
||
|
||
---
|
||
|
||
## Recommended “backbone MVP” scope (small but real)
|
||
|
||
If you want the minimum slice that proves the architecture:
|
||
|
||
1. Temporal on K8s via Helm (Postgres + UI) ([docs.temporal.io][5])
|
||
2. ActivityRegistry (Postgres table)
|
||
3. `RunActivity` workflow + 2 example activities:
|
||
|
||
* one “reads context”
|
||
* one “creates tasks” (writes TaskInstances into DB)
|
||
4. One Temporal Schedule that triggers every minute/hour ([docs.temporal.io][1])
|
||
5. One external event route (broker → router → start workflow)
|
||
|
||
That’s enough to call it a **robust event-driven backbone**.
|
||
|
||
---
|
||
|
||
If you tell me your preferred broker (Kafka vs NATS vs RabbitMQ) and your worker language (Go/TS/Java/Python), I’ll turn the plan into a concrete component diagram + Kubernetes deployment layout (namespaces, services, HPAs) + the exact “ActivityDefinition” schema I’d use.
|
||
|
||
[1]: https://docs.temporal.io/evaluate/development-production-features/schedules?utm_source=chatgpt.com "Schedules - Temporal feature"
|
||
[2]: https://docs.temporal.io/temporal-service/temporal-server?utm_source=chatgpt.com "Temporal Server | Temporal Platform Documentation"
|
||
[3]: https://docs.temporal.io/temporal-service?utm_source=chatgpt.com "Temporal Service | Temporal Platform Documentation"
|
||
[4]: https://docs.temporal.io/web-ui?utm_source=chatgpt.com "Temporal Web UI | Temporal Platform Documentation"
|
||
[5]: https://docs.temporal.io/self-hosted-guide/deployment?utm_source=chatgpt.com "Deploying a Temporal Service"
|
||
[6]: https://hub.docker.com/r/temporalio/auto-setup?utm_source=chatgpt.com "temporalio/auto-setup - Docker Image"
|
||
|