State Hub → activity-core Delegation Protocol

CUST-WP-0040 T05. Cross-reference: docs/nats-event-subjects.md, docs/cron-migration.md, and activity-core's docs/adr/adr-001-event-bridge-architecture.md.

TL;DR

The state hub is a read model for cross-domain state. It is not a task factory. Maintenance automations that create new work in response to state transitions belong in activity-core as ActivityDefinition files. The state hub's only job in that flow is to publish lifecycle events on NATS JetStream so activity-core can react.

                                          NATS JetStream
                                          subject: org.statehub.>
                                          stream:  ACTIVITY_EVENTS
                                          ┌──────────────────────┐
   POST /repos/                            │                      │
   PATCH /workstreams/*  ─────publish───▶  │                      │ ───consume───▶  activity-core
   POST /decisions/*/resolve               │                      │                 EventRouter
   POST /domain-goals/*/activate           │                      │                       │
   scripts/cleanup_stale_tasks.py          │                      │                       ▼
                                           └──────────────────────┘                 RunActivityWorkflow
   state-hub                                                          (creates tasks in
                                                                                     issue-core, etc.)

Why delegate?

| Concern | Living in the state hub today | Lives in activity-core after migration |
| --- | --- | --- |
| "When should this maintenance run?" | cron/systemd timers | ActivityDefinition.trigger (cron + event triggers) |
| "What rule decides whether to act?" | hard-coded in the script | ActivityDefinition.rule.when expressions |
| "What task / side-effect should we run?" | hard-coded in the script | ActivityDefinition.instruction (shell / workflow / etc.) |
| "Where do we audit what fired?" | journalctl + ad hoc logs | activity-core history + Temporal workflow runs |
| "How is it changed safely?" | edit Python + redeploy hub | edit YAML in the repo, PR-reviewable, hot-reloadable |

Concentrating maintenance logic in declarative ActivityDefinition files makes the rules auditable, testable, and modifiable without redeploying the state hub.

Published lifecycle events (v1.0)

Authoritative list and attributes live in docs/nats-event-subjects.md. At v1.0 the state hub publishes:

| Subject | Trigger site (file:fn) |
| --- | --- |
| org.statehub.repo.registered | api/routers/repos.py:register_repo |
| org.statehub.workstream.completed | api/routers/workstreams.py:update_workstream (on transition) |
| org.statehub.decision.resolved | api/routers/decisions.py:resolve_decision_action |
| org.statehub.domain.goal.activated | api/routers/domain_goals.py:activate_domain_goal |
| org.statehub.task.stale | scripts/cleanup_stale_tasks.py (per canceled task) |

All events use the shared EventEnvelope schema (api/events/envelope.py) and are published via publish_event(subject, envelope). Publishing is fire-and-forget: failures are logged but never break the API request that triggered them, and the publisher no-ops when NATS_URL is unset.
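
The fire-and-forget contract can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in api/events/envelope.py: the envelope field names are taken from the sample shape in "Verifying end-to-end", while the field types and defaults, the `send` transport parameter, and the boolean return value are hypothetical and exist only so the example runs without a NATS client.

```python
import asyncio
import json
import logging
import os
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable

log = logging.getLogger("statehub.events")

# Hypothetical transport: takes (subject, payload bytes) and sends them.
Send = Callable[[str, bytes], Awaitable[None]]


@dataclass
class EventEnvelope:
    # Field names mirror the sample envelope; types and defaults are assumptions.
    type: str
    attributes: dict[str, Any]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    version: str = "1.0"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    publisher: str = "state-hub"


async def publish_event(subject: str, envelope: EventEnvelope, send: Send) -> bool:
    """Best-effort publish; True only if the transport accepted the event."""
    if not os.environ.get("NATS_URL"):
        # No NATS configured: logged no-op, the caller carries on.
        log.info("NATS_URL unset; skipping publish of %s", subject)
        return False
    try:
        await send(subject, json.dumps(asdict(envelope)).encode())
        return True
    except Exception:
        # Log and swallow: a publish failure must not break the API request.
        log.exception("publish of %s failed", subject)
        return False
```

The caller never sees an exception from the publisher, so a router can call it after its commit without a surrounding try/except.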

What stays in the state hub

  • DB schema + Alembic migrations
  • API endpoints (CRUD + status transitions + read-model queries)
  • MCP tools (read + sanctioned writes: resolve_decision, add_progress_event, get_next_steps)
  • The consistency engine (scripts/consistency_check.py) — it owns ADR-001 reconciliation between workplan files and the DB.
  • The cleanup_stale_tasks.py script (not its schedule) — it owns the lifecycle rule for cancelling orphaned tasks.

What moves to activity-core

  • The schedule for the consistency sweep (*/15 * * * *) → the-custodian.state-hub-consistency-sweep ActivityDefinition.
  • The schedule for stale-task cleanup (0 3 * * *) → the-custodian.state-hub-stale-task-cleanup ActivityDefinition.
  • Any future "when X happens, create a task" logic. The state hub must not add such rules to its routers — it publishes the event and the rule lives in activity-core.

See docs/cron-migration.md for the ActivityDefinition drafts and cutover plan.

What must never happen

  • State hub writes directly to activity-core's DB. All communication is via NATS events.
  • State hub creates issue-core / Temporal tasks itself. That is activity-core's job.
  • Routers publish before committing. Always publish after await session.commit() succeeds. (Otherwise a transaction rollback would still leak an event.)
  • A publish failure breaks the API response. The publisher logs and swallows the error; lost events are recovered by activity-core re-reading state on its next sweep, not by the API retrying.
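
The commit-then-publish ordering can be illustrated with a stand-in session. `FakeSession`, `complete_workstream`, and the `publish` callable are hypothetical names for this sketch; only the ordering (publish after `await session.commit()` returns) comes from the rule above.

```python
import asyncio
from typing import Awaitable, Callable


class FakeSession:
    """Stand-in for an async DB session; raises on commit when asked to."""

    def __init__(self, fail_commit: bool = False) -> None:
        self.fail_commit = fail_commit
        self.committed = False

    async def commit(self) -> None:
        if self.fail_commit:
            raise RuntimeError("commit failed, transaction rolled back")
        self.committed = True


async def complete_workstream(
    session: FakeSession,
    publish: Callable[[str], Awaitable[None]],
) -> None:
    # 1. Persist the transition. If this raises, the publish is never reached.
    await session.commit()
    # 2. Only now announce it, so subscribers never see an uncommitted change.
    await publish("org.statehub.workstream.completed")
```

With the two statements swapped, a failed commit would leave an event on the stream for a transition that never happened.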

Operational checklist — migrating a cron to an ActivityDefinition

  1. Identify the cron's current side-effects. If any of them create work (a task, an issue, a ticket), it is a delegation candidate. Pure consistency reconciliation can stay as a shell-cron for now if simpler.
  2. Decide the trigger: keep it as a cron trigger, or upgrade it to an event trigger by first identifying (and, if needed, publishing) the state hub lifecycle event the cron is effectively polling for.
  3. Add a row to docs/nats-event-subjects.md if a new event type is being introduced.
  4. Wire publish_event(...) at the transition site in the appropriate router. Verify with nats sub 'org.statehub.>'.
  5. Land the ActivityDefinition in activity-core; enable it in staging.
  6. Run both old cron and new ActivityDefinition in parallel for one week. Both side-effects must be idempotent for this to be safe — if they aren't, fix that first.
  7. Disable the old cron / systemd timer, archive the unit files.
  8. Update SCOPE.md "Often used with" to mention the activity-core handoff if a new event type was added.
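
Step 6 depends on both paths being idempotent. One common way to get that property, sketched here under assumptions, is to derive a deterministic key from what the work is about and skip creation when that key already exists. `task_key`, `TaskStore`, and the in-memory set are hypothetical; a real store would need a unique constraint or an equivalent atomic check.

```python
import hashlib


def task_key(kind: str, subject_id: str) -> str:
    """Deterministic key: the same logical work always hashes to the same value."""
    return hashlib.sha256(f"{kind}:{subject_id}".encode()).hexdigest()


class TaskStore:
    """In-memory stand-in for wherever tasks are created."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def create_once(self, kind: str, subject_id: str) -> bool:
        """Create the task unless it already exists; True if it was created."""
        key = task_key(kind, subject_id)
        if key in self._keys:
            return False  # The other path (old cron or new definition) got here first.
        self._keys.add(key)
        return True
```

During the parallel week, whichever of the old cron and the new ActivityDefinition runs second finds the key present and does nothing.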

Bootstrap and partial-availability behaviour

  • No NATS configured (NATS_URL unset): publisher is a logged no-op. The state hub remains fully functional. Useful for dev environments and make test.
  • NATS reachable but stream missing: publisher creates the ACTIVITY_EVENTS stream with subject filter org.> on first publish, so the state hub can come up before activity-core. In production both should target the same NATS cluster.
  • activity-core down: events queue in JetStream and are replayed when the consumer reconnects. The state hub is unaffected.
  • State hub down: scheduled ActivityDefinitions in activity-core still fire; ones that need state-hub.health context will skip cleanly per their rule.
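
The bootstrap filter org.> is broader than the org.statehub.> subjects the hub publishes. The sketch below implements standard NATS subject wildcard matching (`*` matches exactly one token, `>` matches one or more trailing tokens) to make that relationship concrete. `subject_matches` is an illustrative helper written for this document, not part of any NATS client library.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style subject pattern matches a concrete subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without a trailing '>', token counts must agree exactly.
    return len(p_tokens) == len(s_tokens)
```

Every v1.0 subject, for example org.statehub.task.stale, matches both org.statehub.> and org.>, so a stream created by either side with the wider filter captures the hub's events.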

Verifying end-to-end

# Subscribe to lifecycle events
nats sub 'org.statehub.>'

# Trigger an event (in another terminal)
curl -X POST http://127.0.0.1:8000/repos/<slug>/sync

# Observe the envelope on the subscriber. Sample shape, shown here for a workstream.completed event:
# {"id":"...","type":"org.statehub.workstream.completed","version":"1.0",
#  "timestamp":"...","publisher":"state-hub","attributes":{...}}
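
To check an observed envelope programmatically rather than by eye, a small validator over the fields visible in the sample shape can help. This is a sketch: `EXPECTED_KEYS` and `missing_envelope_keys` are hypothetical, and the key set is inferred from the sample above, with api/events/envelope.py remaining the authoritative schema.

```python
import json

# Keys visible in the sample envelope; treat this set as an assumption,
# since the authoritative schema lives in api/events/envelope.py.
EXPECTED_KEYS = {"id", "type", "version", "timestamp", "publisher", "attributes"}


def missing_envelope_keys(raw: str) -> set[str]:
    """Return the expected keys that are absent from a JSON envelope."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("envelope must be a JSON object")
    return EXPECTED_KEYS - set(data)
```

An empty result means the message carries every field shown in the sample; it says nothing about whether the values are well-formed.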