Files
activity-core/wiki/ActivityCorePlan_chtgpt.md
tegwick 6f9132314f Add project scaffold: contracts, schemas, docker-compose, workplans
Phase 0 contracts (event envelope, ActivityDefinition, idempotency doc,
naming conventions) and Phase 1 Temporal cluster setup (docker-compose.dev.yml,
Temporal dynamic config) are complete. Includes Pydantic models, JSON schemas,
wiki architecture docs, and ADR-001 workplan files for both workstreams.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 22:45:40 +01:00

215 lines
8.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This is a protplan that should be digested and compared to form an actionable workplan.
If youre leaning toward **self-hosted Temporal**, your “Activity = event-driven task factory” maps almost perfectly to:
* **Temporal Schedules** → create a durable, centrally managed trigger that starts a workflow on a cadence (better than “cron workflows”). ([docs.temporal.io][1])
* **Workflows** → your “activity processor” (evaluate context + decide 0..N tasks)
* **Activities** → your concrete “tasks” (side effects, API calls, jobs, human-task creation, etc.)
* **Signals / events** → external event ingestion into running workflows (or into starter workflows)
And Temporals server side is explicitly designed for HA scaling across its core subsystems (Frontend/History/Matching/Worker). ([docs.temporal.io][2])
Below is a consolidated, practical workplan to set up a **backbone service** for a robust event-driven architecture, using Temporal as the orchestration spine.
---
## Consolidated backbone architecture (Temporal-centered EDA)
### Backbone components
1. **Temporal Service (server)**
* Temporal Server (Frontend, History, Matching, Worker services) ([docs.temporal.io][2])
* Persistence store (SQL or Cassandra) + Visibility store (SQL and/or Elasticsearch depending on features) ([docs.temporal.io][3])
2. **Temporal Workers (your code)**
* “Activity Orchestrator Workflows” (your Activity runtime)
* Activities (task executors / integrators)
3. **Event ingress/egress**
* Ingress: broker subscriptions → “event router” → Temporal (start workflow / signal workflow)
* Egress: Temporal activities publish domain events to broker
4. **Admin + Observability**
* Temporal Web UI (ops visibility, schedules page, etc.) ([docs.temporal.io][4])
* Prometheus/Grafana + logs + tracing (OpenTelemetry if you want end-to-end)
---
## Workplan (phased, production-minded)
### Phase 0 — Decide the minimum “contract” for your EDA
**Deliverable:** a stable event & workflow contract so everything stays modular.
* **Event envelope (internal standard)**:
`event_id`, `type`, `source`, `occurred_at`, `subject`, `trace_id`, `schema_version`, `payload`
* **Idempotency standard**:
* Every inbound event has a stable `event_id`
* Every scheduled run has stable `(activity_id, scheduled_for)`
* **Naming/partitioning conventions**:
* Temporal **Namespace** strategy (e.g., `prod`, `stage`, or per-tenant)
* Task Queues per service boundary (e.g., `billing-tq`, `notifications-tq`)
---
### Phase 1 — Stand up Temporal Service on Kubernetes (self-hosted)
**Deliverable:** a working Temporal cluster with persistence + UI.
1. **Provision persistence + visibility dependencies**
* Choose **PostgreSQL/MySQL** (common) or Cassandra, plus optional Elasticsearch for advanced visibility. Temporal self-hosted deployments need you to provide these stores. ([docs.temporal.io][5])
2. **Deploy Temporal via official Helm chart**
* Temporal maintains official Helm charts for Kubernetes deployments. ([docs.temporal.io][5])
3. **Deploy Temporal Web UI**
* Enable the UI so you can inspect workflows and schedules. ([docs.temporal.io][4])
4. **Production hardening basics**
* NetworkPolicies, PodSecurity, resource limits, HPA
* Backups for DB/ES
* Separate node pools if needed for noisy workloads
*Note:* `temporalio/auto-setup` is excellent for dev or quick bootstrap (Docker), but for production you typically run server components + managed/provisioned DB/ES explicitly. ([Docker Hub][6])
---
### Phase 2 — Establish the “Activity Orchestrator” as a workflow pattern
**Deliverable:** one end-to-end ActivityDefinition that spawns tasks robustly.
Implement this canonical workflow:
**Workflow: `RunActivity(activity_id, trigger)`**
1. Load `ActivityDefinition` (versioned)
2. Resolve context snapshot (query DB/APIs)
3. Evaluate rules → decide `TaskInstances[]`
4. Execute tasks as Temporal **activities** (or create “human tasks” in your DB)
5. Emit `TaskCreated` / `TaskCompleted` events (activities publish to broker)
6. Record run audit (context hash, produced tasks, version)
**Key guardrails**
* **Idempotency:** use deterministic workflow IDs for scheduled runs:
`workflow_id = activity_id + ":" + scheduled_for`
* **Exactly-once effect:** for side effects, prefer **outbox** in your DB or make activities idempotent (store `event_id` / `task_instance_id`).
---
### Phase 3 — Replace cron with Temporal Schedules (first-class triggers)
**Deliverable:** schedules are managed in Temporal, not in random cronjobs.
* Use **Temporal Schedules** to start `RunActivity(...)` at times/intervals (and manage them centrally). ([docs.temporal.io][1])
* Store your “human editable schedule spec” in your ActivityRegistry, but **materialize** it into Temporal schedules.
* Decide “missed run” policy:
* catch up (bounded)
* skip
* compress (run once with widened context)
This is the cleanest alignment with your research draft: “timer ingress → trigger event → processor → spawn tasks”, except Temporal gives you durable state, retries, and execution history by default.
---
### Phase 4 — Add external events: broker → Temporal
**Deliverable:** event-driven triggers land reliably in Temporal.
1. Introduce an **Event Router** service:
* Subscribes to Kafka/NATS/Rabbit/etc.
* Validates schema + authN/authZ
* Applies routing rules: `(event.type, filters) -> activity_id(s)`
2. For each match, it either:
* **Starts** `RunActivity(activity_id, trigger_event)` (if no long-lived instance)
* Or **Signals** an existing workflow instance (if you have “stateful ongoing activities”)
**Rule of thumb**
* If the “activity” is inherently recurring and stateless per run → start per trigger.
* If the “activity” is an ongoing coordination process (state machine) → signal a long-lived workflow.
---
### Phase 5 — Observability and operability as first-class product
**Deliverable:** you can run this for months without fear.
* Temporal UI for inspection (workflows + schedules). ([docs.temporal.io][4])
* Metrics dashboards:
* schedule trigger rate, workflow start rate
* activity failures, retries, latency
* queue depth by task queue
* Alerts:
* sustained workflow task failures
* persistence latency spikes
* DLQ / poison message rate (in broker layer)
* Runbooks:
* how to replay/repair
* how to disable an activity safely
* how to backfill events
---
### Phase 6 — Security hardening (when youre past MVP)
**Deliverable:** “robust backbone” in the real sense.
* mTLS between workers and Temporal (and between services)
* Namespace isolation (and per-namespace access controls if applicable)
* Event authentication (signed events / broker ACLs)
* Secrets management (KMS/Vault/SealedSecrets)
---
## How this consolidates with your draft research
Your draft blueprint is structurally correct; the consolidation with Temporal is mainly:
* **Trigger Layer** becomes:
* Temporal **Schedules** for time triggers ([docs.temporal.io][1])
* Broker + Event Router for external events
* **Activity Processor** becomes a **Workflow** (durable, replayable, stateful)
* **Task Queue & Execution** becomes Temporal **Task Queues** + workers (no separate Celery/RQ needed unless you have reasons)
* **Durability / retries / history** are native properties of Temporals server subsystems ([docs.temporal.io][2])
---
## Recommended “backbone MVP” scope (small but real)
If you want the minimum slice that proves the architecture:
1. Temporal on K8s via Helm (Postgres + UI) ([docs.temporal.io][5])
2. ActivityRegistry (Postgres table)
3. `RunActivity` workflow + 2 example activities:
* one “reads context”
* one “creates tasks” (writes TaskInstances into DB)
4. One Temporal Schedule that triggers every minute/hour ([docs.temporal.io][1])
5. One external event route (broker → router → start workflow)
Thats enough to call it a **robust event-driven backbone**.
---
If you tell me your preferred broker (Kafka vs NATS vs RabbitMQ) and your worker language (Go/TS/Java/Python), Ill turn the plan into a concrete component diagram + Kubernetes deployment layout (namespaces, services, HPAs) + the exact “ActivityDefinition” schema Id use.
[1]: https://docs.temporal.io/evaluate/development-production-features/schedules?utm_source=chatgpt.com "Schedules - Temporal feature"
[2]: https://docs.temporal.io/temporal-service/temporal-server?utm_source=chatgpt.com "Temporal Server | Temporal Platform Documentation"
[3]: https://docs.temporal.io/temporal-service?utm_source=chatgpt.com "Temporal Service | Temporal Platform Documentation"
[4]: https://docs.temporal.io/web-ui?utm_source=chatgpt.com "Temporal Web UI | Temporal Platform Documentation"
[5]: https://docs.temporal.io/self-hosted-guide/deployment?utm_source=chatgpt.com "Deploying a Temporal Service"
[6]: https://hub.docker.com/r/temporalio/auto-setup?utm_source=chatgpt.com "temporalio/auto-setup - Docker Image"