Files
activity-core/wiki/ActivityCorePlan_chtgpt.md
tegwick 6f9132314f Add project scaffold: contracts, schemas, docker-compose, workplans
Phase 0 contracts (event envelope, ActivityDefinition, idempotency doc,
naming conventions) and Phase 1 Temporal cluster setup (docker-compose.dev.yml,
Temporal dynamic config) are complete. Includes Pydantic models, JSON schemas,
wiki architecture docs, and ADR-001 workplan files for both workstreams.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 22:45:40 +01:00

8.9 KiB
Raw Permalink Blame History

This is a protplan that should be digested and compared to form an actionable workplan.

If youre leaning toward self-hosted Temporal, your “Activity = event-driven task factory” maps almost perfectly to:

  • Temporal Schedules → create a durable, centrally managed trigger that starts a workflow on a cadence (better than “cron workflows”). (docs.temporal.io)
  • Workflows → your “activity processor” (evaluate context + decide 0..N tasks)
  • Activities → your concrete “tasks” (side effects, API calls, jobs, human-task creation, etc.)
  • Signals / events → external event ingestion into running workflows (or into starter workflows)

And Temporals server side is explicitly designed for HA scaling across its core subsystems (Frontend/History/Matching/Worker). (docs.temporal.io)

Below is a consolidated, practical workplan to set up a backbone service for a robust event-driven architecture, using Temporal as the orchestration spine.


Consolidated backbone architecture (Temporal-centered EDA)

Backbone components

  1. Temporal Service (server)

    • Temporal Server (Frontend, History, Matching, Worker services) (docs.temporal.io)
    • Persistence store (SQL or Cassandra) + Visibility store (SQL and/or Elasticsearch depending on features) (docs.temporal.io)
  2. Temporal Workers (your code)

    • “Activity Orchestrator Workflows” (your Activity runtime)
    • Activities (task executors / integrators)
  3. Event ingress/egress

    • Ingress: broker subscriptions → “event router” → Temporal (start workflow / signal workflow)
    • Egress: Temporal activities publish domain events to broker
  4. Admin + Observability

    • Temporal Web UI (ops visibility, schedules page, etc.) (docs.temporal.io)
    • Prometheus/Grafana + logs + tracing (OpenTelemetry if you want end-to-end)

Workplan (phased, production-minded)

Phase 0 — Decide the minimum “contract” for your EDA

Deliverable: a stable event & workflow contract so everything stays modular.

  • Event envelope (internal standard): event_id, type, source, occurred_at, subject, trace_id, schema_version, payload

  • Idempotency standard:

    • Every inbound event has a stable event_id
    • Every scheduled run has stable (activity_id, scheduled_for)
  • Naming/partitioning conventions:

    • Temporal Namespace strategy (e.g., prod, stage, or per-tenant)
    • Task Queues per service boundary (e.g., billing-tq, notifications-tq)

Phase 1 — Stand up Temporal Service on Kubernetes (self-hosted)

Deliverable: a working Temporal cluster with persistence + UI.

  1. Provision persistence + visibility dependencies

    • Choose PostgreSQL/MySQL (common) or Cassandra, plus optional Elasticsearch for advanced visibility. Temporal self-hosted deployments need you to provide these stores. (docs.temporal.io)
  2. Deploy Temporal via official Helm chart

    • Temporal maintains official Helm charts for Kubernetes deployments. (docs.temporal.io)
  3. Deploy Temporal Web UI

  4. Production hardening basics

    • NetworkPolicies, PodSecurity, resource limits, HPA
    • Backups for DB/ES
    • Separate node pools if needed for noisy workloads

Note: temporalio/auto-setup is excellent for dev or quick bootstrap (Docker), but for production you typically run server components + managed/provisioned DB/ES explicitly. (Docker Hub)


Phase 2 — Establish the “Activity Orchestrator” as a workflow pattern

Deliverable: one end-to-end ActivityDefinition that spawns tasks robustly.

Implement this canonical workflow:

Workflow: RunActivity(activity_id, trigger)

  1. Load ActivityDefinition (versioned)
  2. Resolve context snapshot (query DB/APIs)
  3. Evaluate rules → decide TaskInstances[]
  4. Execute tasks as Temporal activities (or create “human tasks” in your DB)
  5. Emit TaskCreated / TaskCompleted events (activities publish to broker)
  6. Record run audit (context hash, produced tasks, version)

Key guardrails

  • Idempotency: use deterministic workflow IDs for scheduled runs: workflow_id = activity_id + ":" + scheduled_for
  • Exactly-once effect: for side effects, prefer outbox in your DB or make activities idempotent (store event_id / task_instance_id).

Phase 3 — Replace cron with Temporal Schedules (first-class triggers)

Deliverable: schedules are managed in Temporal, not in random cronjobs.

  • Use Temporal Schedules to start RunActivity(...) at times/intervals (and manage them centrally). (docs.temporal.io)

  • Store your “human editable schedule spec” in your ActivityRegistry, but materialize it into Temporal schedules.

  • Decide “missed run” policy:

    • catch up (bounded)
    • skip
    • compress (run once with widened context)

This is the cleanest alignment with your research draft: “timer ingress → trigger event → processor → spawn tasks”, except Temporal gives you durable state, retries, and execution history by default.


Phase 4 — Add external events: broker → Temporal

Deliverable: event-driven triggers land reliably in Temporal.

  1. Introduce an Event Router service:

    • Subscribes to Kafka/NATS/Rabbit/etc.
    • Validates schema + authN/authZ
    • Applies routing rules: (event.type, filters) -> activity_id(s)
  2. For each match, it either:

    • Starts RunActivity(activity_id, trigger_event) (if no long-lived instance)
    • Or Signals an existing workflow instance (if you have “stateful ongoing activities”)

Rule of thumb

  • If the “activity” is inherently recurring and stateless per run → start per trigger.
  • If the “activity” is an ongoing coordination process (state machine) → signal a long-lived workflow.

Phase 5 — Observability and operability as first-class product

Deliverable: you can run this for months without fear.

  • Temporal UI for inspection (workflows + schedules). (docs.temporal.io)

  • Metrics dashboards:

    • schedule trigger rate, workflow start rate
    • activity failures, retries, latency
    • queue depth by task queue
  • Alerts:

    • sustained workflow task failures
    • persistence latency spikes
    • DLQ / poison message rate (in broker layer)
  • Runbooks:

    • how to replay/repair
    • how to disable an activity safely
    • how to backfill events

Phase 6 — Security hardening (when youre past MVP)

Deliverable: “robust backbone” in the real sense.

  • mTLS between workers and Temporal (and between services)
  • Namespace isolation (and per-namespace access controls if applicable)
  • Event authentication (signed events / broker ACLs)
  • Secrets management (KMS/Vault/SealedSecrets)

How this consolidates with your draft research

Your draft blueprint is structurally correct; the consolidation with Temporal is mainly:

  • Trigger Layer becomes:

    • Temporal Schedules for time triggers (docs.temporal.io)
    • Broker + Event Router for external events
  • Activity Processor becomes a Workflow (durable, replayable, stateful)

  • Task Queue & Execution becomes Temporal Task Queues + workers (no separate Celery/RQ needed unless you have reasons)

  • Durability / retries / history are native properties of Temporals server subsystems (docs.temporal.io)


If you want the minimum slice that proves the architecture:

  1. Temporal on K8s via Helm (Postgres + UI) (docs.temporal.io)

  2. ActivityRegistry (Postgres table)

  3. RunActivity workflow + 2 example activities:

    • one “reads context”
    • one “creates tasks” (writes TaskInstances into DB)
  4. One Temporal Schedule that triggers every minute/hour (docs.temporal.io)

  5. One external event route (broker → router → start workflow)

Thats enough to call it a robust event-driven backbone.


If you tell me your preferred broker (Kafka vs NATS vs RabbitMQ) and your worker language (Go/TS/Java/Python), Ill turn the plan into a concrete component diagram + Kubernetes deployment layout (namespaces, services, HPAs) + the exact “ActivityDefinition” schema Id use.