# State Hub → activity-core Delegation Protocol

> CUST-WP-0040 T05. Cross-reference:
> [`docs/nats-event-subjects.md`](nats-event-subjects.md),
> [`docs/cron-migration.md`](cron-migration.md), and activity-core's
> `docs/adr/adr-001-event-bridge-architecture.md`.

## TL;DR

The state hub is a **read model** for cross-domain state. It is not a
task factory. Maintenance automations that *create new work in response
to state transitions* belong in activity-core as `ActivityDefinition`
files. The state hub's only job in that flow is to **publish lifecycle
events** on NATS JetStream so activity-core can react.

```
                                                  NATS JetStream
                                                  subject: org.statehub.>
                                                  stream:  ACTIVITY_EVENTS
                                                  ┌──────────────────────┐
 POST /repos/                                     │                      │
 PATCH /workstreams/*            ─────publish───▶ │                      │ ───consume───▶ activity-core
 POST /decisions/*/resolve                        │                      │                EventRouter
 POST /domain-goals/*/activate                    │                      │                     │
 scripts/cleanup_stale_tasks.py                   │                      │                     ▼
                                                  └──────────────────────┘            RunActivityWorkflow
          state-hub                                                                    (creates tasks in
                                                                                        issue-core, etc.)
```

## Why delegate?

| Concern                                  | Lives in the state hub today  | Lives in activity-core after migration                      |
| ---------------------------------------- | ----------------------------- | ----------------------------------------------------------- |
| "When should this maintenance run?"      | cron/systemd timers           | `ActivityDefinition.trigger` (cron + event triggers)        |
| "What rule decides whether to act?"      | hard-coded in the script      | `ActivityDefinition.rule.when` expressions                  |
| "What task / side-effect should we run?" | hard-coded in the script      | `ActivityDefinition.instruction` (shell / workflow / etc.)  |
| "Where do we audit what fired?"          | journalctl + ad hoc logs      | activity-core history + Temporal workflow runs              |
| "How is it changed safely?"              | edit Python + redeploy hub    | edit YAML in the repo, PR-reviewable, hot-reloadable        |

Concentrating maintenance logic in declarative `ActivityDefinition`
files makes the rules **auditable**, **testable**, and **modifiable
without redeploying the state hub**.

## Published lifecycle events (v1.0)

Authoritative list and attributes live in
[`docs/nats-event-subjects.md`](nats-event-subjects.md). At v1.0 the
state hub publishes:

| Subject                              | Trigger site (file:fn)                                          |
| ------------------------------------ | --------------------------------------------------------------- |
| `org.statehub.repo.registered`       | `api/routers/repos.py:register_repo`                            |
| `org.statehub.workstream.completed`  | `api/routers/workstreams.py:update_workstream` (on transition)  |
| `org.statehub.decision.resolved`     | `api/routers/decisions.py:resolve_decision_action`              |
| `org.statehub.domain.goal.activated` | `api/routers/domain_goals.py:activate_domain_goal`              |
| `org.statehub.task.stale`            | `scripts/cleanup_stale_tasks.py` (per canceled task)            |

All events use the shared `EventEnvelope` schema (`api/events/envelope.py`)
and are published via `publish_event(subject, envelope)`. Publishing is
fire-and-forget: failures are logged but **never break the API request
that triggered them**, and the publisher no-ops when `NATS_URL` is
unset.

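The fire-and-forget contract described above can be sketched roughly as follows. This is an illustration only, not the code in `api/events/envelope.py`: the envelope fields are taken from the sample shape shown under "Verifying end-to-end", while the `transport` parameter, the boolean return value, and the default values are assumptions made so the example runs without a NATS client.

```python
import asyncio
import json
import logging
import os
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Awaitable, Callable

logger = logging.getLogger("statehub.events")

# Hypothetical transport hook: any async callable taking (subject, payload).
Transport = Callable[[str, bytes], Awaitable[None]]


@dataclass
class EventEnvelope:
    """Illustrative envelope; field names follow the sample shape in this doc."""

    type: str
    attributes: dict[str, Any]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    version: str = "1.0"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    publisher: str = "state-hub"

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")


async def publish_event(
    subject: str, envelope: EventEnvelope, transport: Transport
) -> bool:
    """Return True if the event was handed to the transport, else False."""
    if not os.environ.get("NATS_URL"):
        # No NATS configured: behave as a logged no-op.
        logger.debug("NATS_URL unset; skipping publish of %s", subject)
        return False
    try:
        await transport(subject, envelope.to_bytes())
        return True
    except Exception:
        # Fire-and-forget: log and swallow so the caller's request survives.
        logger.exception("failed to publish %s", subject)
        return False
```

Returning a boolean instead of raising keeps the caller free of `try`/`except` noise while still letting tests observe whether a publish was attempted.
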
## What stays in the state hub

- DB schema + Alembic migrations
- API endpoints (CRUD + status transitions + read-model queries)
- MCP tools (read + sanctioned writes: `resolve_decision`,
  `add_progress_event`, `get_next_steps`)
- The consistency engine (`scripts/consistency_check.py`): it owns
  ADR-001 reconciliation between workplan files and the DB.
- The `cleanup_stale_tasks.py` *script* (not its schedule): it owns
  the lifecycle rule for canceling orphaned tasks.

## What moves to activity-core

- The *schedule* for the consistency sweep (`*/15 * * * *`) →
  `the-custodian.state-hub-consistency-sweep` ActivityDefinition.
- The *schedule* for stale-task cleanup (`0 3 * * *`) →
  `the-custodian.state-hub-stale-task-cleanup` ActivityDefinition.
- Any future "when X happens, create a task" logic. The state hub must
  **not** add such rules to its routers; it publishes the event and
  the rule lives in activity-core.

See [`docs/cron-migration.md`](cron-migration.md) for the
ActivityDefinition drafts and cutover plan.

## What must never happen

- **State hub writes directly to activity-core's DB.** All
  communication is via NATS events.
- **State hub creates issue-core / Temporal tasks itself.** That is
  activity-core's job.
- **Routers publish before committing.** Always publish after
  `await session.commit()` succeeds. (Otherwise a transaction rollback
  would still leak an event.)
- **A publish failure breaks the API response.** The publisher logs and
  swallows the error; lost events are recovered by activity-core
  re-reading state on its next sweep, not by the API retrying.

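The last two rules (publish only after the commit, and never let a publish failure reach the caller) can be sketched together. `FakeSession`, `complete_workstream`, the `publish` callable, and the `workstream_id` attribute are all hypothetical names chosen for this example; the real router in `api/routers/workstreams.py` may be structured differently.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("statehub.routers")


class FakeSession:
    """Stand-in for an async DB session; only commit() is modelled."""

    def __init__(self, fail_commit: bool = False) -> None:
        self.fail_commit = fail_commit
        self.committed = False

    async def commit(self) -> None:
        if self.fail_commit:
            raise RuntimeError("commit failed; transaction rolled back")
        self.committed = True


async def complete_workstream(
    session: FakeSession,
    workstream_id: str,
    publish: Callable[[str, dict], Awaitable[None]],
) -> dict:
    # 1. Persist the transition first. If this raises, nothing is published.
    await session.commit()

    # 2. Only now announce it. A publish failure is logged and swallowed so
    #    the response below is still returned to the client.
    try:
        await publish(
            "org.statehub.workstream.completed",
            {"workstream_id": workstream_id},
        )
    except Exception:
        logger.exception("publish failed for workstream %s", workstream_id)

    return {"id": workstream_id, "status": "completed"}
```

With this ordering a failed commit propagates as an error and emits no event, while a failed publish costs only the event, which activity-core is expected to recover on a later sweep.
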
## Operational checklist: migrating a cron to an ActivityDefinition

1. Identify the cron's current side-effects. If any of them
   *create work* (a task, an issue, a ticket), it is a delegation
   candidate. Pure consistency reconciliation can stay as a shell-cron
   for now if that is simpler.
2. Decide the trigger: keep it as `cron`, or upgrade it to `event` by
   first identifying / publishing the state hub lifecycle event the
   cron is effectively polling for.
3. Add a row to [`docs/nats-event-subjects.md`](nats-event-subjects.md)
   if a new event type is being introduced.
4. Wire `publish_event(...)` at the transition site in the appropriate
   router. Verify with `nats sub 'org.statehub.>'`.
5. Land the `ActivityDefinition` in activity-core; enable it in
   staging.
6. Run both old cron and new ActivityDefinition in parallel for one
   week. Both side-effects must be idempotent for this to be safe; if
   they aren't, fix that first.
7. Disable the old cron / systemd timer and archive the unit files.
8. Update [`SCOPE.md`](../../SCOPE.md) "Often used with" to mention the
   activity-core handoff if a new event type was added.

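Step 6 depends on idempotent side-effects. One common way to get there is a deduplication key derived from the object being acted on, so that the old cron and the new ActivityDefinition collide instead of creating two tasks. The sketch below is a generic illustration of that technique; `TaskStore`, `create_once`, and the key format are hypothetical and are not part of the state hub or activity-core.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStore:
    """In-memory stand-in for wherever follow-up tasks are created."""

    tasks: dict[str, dict] = field(default_factory=dict)

    def create_once(self, dedupe_key: str, title: str) -> bool:
        """Create the task unless one with the same key already exists.

        Returns True if a task was created, False if it was a duplicate.
        """
        if dedupe_key in self.tasks:
            return False
        self.tasks[dedupe_key] = {"title": title}
        return True


def stale_task_key(task_id: str) -> str:
    # Derive the key from the thing acted on, not from who triggered it,
    # so both the cron path and the ActivityDefinition path produce it.
    return f"stale-task-followup:{task_id}"


store = TaskStore()
# First caller (say, the old cron) creates the task.
created_by_cron = store.create_once(stale_task_key("T-17"), "Review T-17")
# Second caller (say, the new ActivityDefinition) is a harmless no-op.
created_by_activity = store.create_once(stale_task_key("T-17"), "Review T-17")
```

In a real store the existence check and the write would need to be atomic (for example a unique constraint on the key), which this in-memory version does not attempt to model.
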
## Bootstrap and partial-availability behaviour

- **No NATS configured (`NATS_URL` unset)**: publisher is a logged
  no-op. The state hub remains fully functional. Useful for dev
  environments and `make test`.
- **NATS reachable but stream missing**: publisher creates the
  `ACTIVITY_EVENTS` stream with subject filter `org.>` on first
  publish, so the state hub can come up before activity-core. In
  production both should target the same NATS cluster.
- **activity-core down**: events queue in JetStream and are replayed
  when the consumer reconnects. The state hub is unaffected.
- **State hub down**: scheduled ActivityDefinitions in activity-core
  still fire; ones that need `state-hub.health` context will skip
  cleanly per their rule.

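The "stream missing" case amounts to a check-then-create step before the first publish. A minimal sketch, assuming a JetStream client object that exposes async `stream_info(name)` and `add_stream(name=..., subjects=...)` methods (the names are modelled on the nats-py client, but treat them as assumptions here). `StreamNotFound` and `FakeJetStream` are local stand-ins so the example runs without a NATS server.

```python
import asyncio

STREAM_NAME = "ACTIVITY_EVENTS"
STREAM_SUBJECTS = ["org.>"]


class StreamNotFound(Exception):
    """Local stand-in for the client's 'stream does not exist' error."""


async def ensure_stream(js) -> bool:
    """Create the stream if it is missing. Returns True if it was created."""
    try:
        await js.stream_info(STREAM_NAME)
        return False
    except StreamNotFound:
        await js.add_stream(name=STREAM_NAME, subjects=STREAM_SUBJECTS)
        return True


class FakeJetStream:
    """In-memory fake exposing only the two calls used above."""

    def __init__(self) -> None:
        self.streams: dict[str, list[str]] = {}

    async def stream_info(self, name: str) -> dict:
        if name not in self.streams:
            raise StreamNotFound(name)
        return {"name": name, "subjects": self.streams[name]}

    async def add_stream(self, name: str, subjects: list[str]) -> None:
        self.streams[name] = list(subjects)
```

Calling `ensure_stream` again once the stream exists is a no-op, so it does not matter whether the state hub or activity-core comes up first.
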
## Verifying end-to-end

```bash
# Subscribe to lifecycle events
nats sub 'org.statehub.>'

# Trigger an event (in another terminal)
curl -X POST http://127.0.0.1:8000/repos/<slug>/sync

# Observe the envelope on the subscriber. Sample shape:
# {"id":"...","type":"org.statehub.workstream.completed","version":"1.0",
#  "timestamp":"...","publisher":"state-hub","attributes":{...}}
```
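
When checking captured messages by hand becomes tedious, a small script can check each one against the sample shape above. The sketch below tests only the six top-level keys visible in that sample and the `org.statehub.` prefix; `check_envelope` is a hypothetical helper, and the authoritative schema remains `api/events/envelope.py`.

```python
import json

REQUIRED_KEYS = {"id", "type", "version", "timestamp", "publisher", "attributes"}


def check_envelope(raw: str) -> list[str]:
    """Return a list of problems found in one JSON envelope (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["envelope is not a JSON object"]

    # Report missing keys in a stable (sorted) order.
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - data.keys())]
    if "type" in data and not str(data["type"]).startswith("org.statehub."):
        problems.append(f"unexpected type prefix: {data['type']}")
    if "attributes" in data and not isinstance(data["attributes"], dict):
        problems.append("attributes is not an object")
    return problems
```

An empty list means the message matches the documented shape; anything else names the fields to look at.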