# State Hub Cron → activity-core ActivityDefinition Migration (Design Stub)

> CUST-WP-0040 T04. **Design stub, not yet implemented.**
> Migration depends on activity-core WP-0003 reaching the
> "ActivityDefinition file ingestion + cron trigger executor" milestone.

The state hub currently runs two recurring maintenance jobs and one
per-repo event hook. Once activity-core is ready, each recurring job
becomes an ActivityDefinition file checked into the appropriate repo;
the event hook stays a git hook for now (see section 2.C). The state hub
keeps the underlying scripts; only the *scheduling* moves.

---
## 1. Inventory of current maintenance automations

| # | Source              | Trigger today                                | Script invoked                                | What it does                                                                                   |
| - | ------------------- | -------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| 1 | systemd user timer  | every 15 min                                 | `scripts/consistency_check.py --remote --all` | Pull every registered repo, reconcile workplan files ↔ DB, run C-15 writeback + C-16 pull gate |
| 2 | manual / daily cron | `make cleanup-stale` (suggested `0 3 * * *`) | `scripts/cleanup_stale_tasks.py`              | Cancel tasks still open in finished/archived workstreams; emits `org.statehub.task.stale`      |
| 3 | git post-commit     | every commit in a registered repo            | `make fix-consistency REPO=<slug>`            | Per-repo workplan ↔ DB sync immediately after a commit                                         |

Honourable mentions (not currently scheduled, on-demand only; listed for
completeness so they don't get mistakenly picked up):

- `scripts/ingest_sbom.py`: invoked via `make ingest-sbom REPO=<slug>`.
- `scripts/ingest_capabilities.py`: invoked via `make ingest-capabilities[-all]`.
- `scripts/check_doi.py`: invoked via `make check-doi[-all]`.
- `scripts/validate_repo_adr.py`: invoked manually for canon promotion.
- `scripts/ingest_tpsc.py`: invoked via `make ingest-tpsc[-all]`.

These are **not in scope** for cron migration; they remain on-demand
operator/CI commands. They become candidates only if we later decide to
run them on a schedule.

---
## 2. Target ActivityDefinitions

### A. `state-hub-consistency-sweep`

```yaml
# activity-definitions/state-hub-consistency-sweep.yaml
id: the-custodian.state-hub-consistency-sweep
description: |
  Sweep all registered repos: pull, reconcile workplan files ↔ DB,
  apply writeback (C-15), respect pull gate (C-16). Mirrors the
  existing custodian-sync systemd timer.
trigger:
  trigger_type: cron
  cron_expression: "*/15 * * * *"
  timezone: UTC
  misfire_policy: skip       # if a prior run is still active, skip
context:
  - kind: http_get           # confirm state-hub API is reachable
    url: http://127.0.0.1:8000/state/health
    bind: hub_health
rule:
  when:
    - "hub_health.status == 'ok'"
instruction:
  kind: shell
  cmd: >-
    cd /home/worsch/state-hub &&
    .venv/bin/python scripts/consistency_check.py --remote --all --max-seconds 300
  on_failure: log_and_continue   # warn-only sweeps must not page on transient failures
```
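
To make the `context` + `rule` gate in the definition above concrete, here is a minimal Python sketch of the same check. It is not activity-core code: `fetch_health` and `should_run` are hypothetical helper names, and the only details taken from this document are the health URL and the `status == 'ok'` condition. The shape of the health payload beyond `status` is an assumption.

```python
import json
import urllib.request

# URL taken from the definition above; everything else is illustrative.
HEALTH_URL = "http://127.0.0.1:8000/state/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Return the parsed health payload, or {} if the hub cannot be queried."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError and socket timeouts subclass OSError;
        # ValueError covers a body that is not valid JSON.
        return {}
    return payload if isinstance(payload, dict) else {}


def should_run(hub_health: dict) -> bool:
    """Mirror the rule `hub_health.status == 'ok'` (hypothetical helper)."""
    return isinstance(hub_health, dict) and hub_health.get("status") == "ok"
```
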

Notes:
- Replaces the `custodian-sync.service` + `custodian-sync.timer` pair.
- Lock semantics (`/tmp/custodian-consistency-remote-all.lock`) stay in
  the script; activity-core just sets the cadence.
- Once active, `infra/README.md` is updated to instruct users to delete
  the systemd timer.
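
How the script takes its lock is not described here. As one plausible illustration of "lock stays in the script", the sketch below uses a non-blocking `flock` on the lock path quoted above. The `single_instance` helper is hypothetical, and the real script may use a different mechanism (for example a PID file).

```python
import fcntl
import os
from contextlib import contextmanager

# Path quoted in the note above; the locking mechanism itself is an assumption.
LOCK_PATH = "/tmp/custodian-consistency-remote-all.lock"


@contextmanager
def single_instance(lock_path: str = LOCK_PATH):
    """Yield True if this run holds the lock, False if another run already has it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor also releases the flock if we held it.
        os.close(fd)
```

A sweep would wrap its work in `with single_instance() as ok:` and return early when `ok` is false, which gives the same effect as `misfire_policy: skip` even if the scheduler fires twice.
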

### B. `state-hub-stale-task-cleanup`

```yaml
# activity-definitions/state-hub-stale-task-cleanup.yaml
id: the-custodian.state-hub-stale-task-cleanup
description: |
  Daily sweep that cancels tasks still `wait|todo|progress` inside
  finished or archived workstreams. Each cancellation also emits
  org.statehub.task.stale on NATS for downstream reaction.
trigger:
  trigger_type: cron
  cron_expression: "0 3 * * *"
  timezone: UTC
instruction:
  kind: shell
  cmd: >-
    cd /home/worsch/state-hub &&
    .venv/bin/python scripts/cleanup_stale_tasks.py
```

Notes:
- Replaces the documented (`Cron example: 0 3 * * * …`) daily run.
- The script already emits NATS events (see CUST-WP-0040 T03), so
  downstream ActivityDefinitions can react per-task without a second pass.
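
The selection rule in the description above (open task inside a finished or archived workstream, one event per cancellation) can be sketched as a pure function. The `Task` shape, its field names, and the event payload are hypothetical; only the status values and the NATS subject come from this document, and actual publishing is left out.

```python
from dataclasses import dataclass

OPEN_STATUSES = {"wait", "todo", "progress"}          # from the description above
CLOSED_WORKSTREAM_STATES = {"finished", "archived"}   # from the description above
STALE_SUBJECT = "org.statehub.task.stale"             # NATS subject named above


@dataclass(frozen=True)
class Task:
    # Hypothetical shape; the real schema lives in the state hub DB.
    task_id: str
    status: str
    workstream_state: str


def select_stale(tasks):
    """Return tasks that are still open inside a finished or archived workstream."""
    return [
        t for t in tasks
        if t.status in OPEN_STATUSES and t.workstream_state in CLOSED_WORKSTREAM_STATES
    ]


def stale_events(tasks):
    """Build one (subject, payload) pair per stale task; payload keys are illustrative."""
    return [
        (STALE_SUBJECT, {"task_id": t.task_id, "previous_status": t.status})
        for t in select_stale(tasks)
    ]
```

Because each stale task yields its own event, a downstream ActivityDefinition can subscribe to the subject and handle tasks one at a time.
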

### C. Per-commit consistency sync (currently a git hook)

The git `post-commit` hook installed by `state-hub/scripts/install_hooks.sh`
is **event-driven, not cron-based**. Migrating it to activity-core would
require a `repo.commit.pushed` event channel that doesn't exist yet.

Recommendation: **keep the git hook as-is for now**. Revisit once an
event source (e.g. Gitea webhook fed into NATS) is available, at which
point an event-triggered ActivityDefinition can replace it cleanly:

```yaml
trigger:
  trigger_type: event
  event_type: org.repo.commit.pushed
  filters:
    repo_slug: "*"
```
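
How activity-core would evaluate `filters` is not specified yet. The sketch below shows one possible reading, under the assumption that filter values are shell-style glob patterns matched against same-named event fields, so that `repo_slug: "*"` accepts any repo. `event_matches` and the event dictionary layout are hypothetical.

```python
from fnmatch import fnmatchcase


def event_matches(trigger: dict, event: dict) -> bool:
    """Return True if `event` satisfies the trigger's event_type and glob filters."""
    if event.get("event_type") != trigger.get("event_type"):
        return False
    filters = trigger.get("filters", {})
    # Every filter key must be present on the event and match its glob pattern.
    return all(
        key in event and fnmatchcase(str(event[key]), pattern)
        for key, pattern in filters.items()
    )


# Trigger block from above, expressed as a plain dict.
COMMIT_TRIGGER = {
    "trigger_type": "event",
    "event_type": "org.repo.commit.pushed",
    "filters": {"repo_slug": "*"},
}
```
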

---

## 3. Required context queries

Both A and B should confirm the state hub is reachable before running
(the draft of B above does not declare this check yet).
A reusable context source should be added to activity-core for this:

- `state-hub.health`: `GET /state/health` → `{status, db, ...}`
- (optional) `state-hub.repos`: `GET /repos/?status=active` for the
  sweep's per-repo branching, if we later split A into one
  ActivityDefinition per repo.

These belong to the state-hub adapter referenced in the workplan's
out-of-scope note ("/sbom/status context query endpoint" etc.).

---
## 4. Blockers / sequencing

| Blocker                                                                  | Owner         | Where it lands          |
| ------------------------------------------------------------------------ | ------------- | ----------------------- |
| activity-core ActivityDefinition file ingestion + cron executor (WP-0003) | activity-core | activity-core/`src/...` |
| activity-core shell instruction kind with `on_failure` semantics         | activity-core | activity-core/`src/...` |
| state-hub adapter exposing `state-hub.health` as a context source        | activity-core | activity-core/adapters/ |

Until these land, the state hub continues to schedule jobs via systemd
timer + cron entries.

---
## 5. Cutover plan (when ready)

1. Land ActivityDefinitions A + B in activity-core.
2. Enable them in staging; verify they fire on schedule and produce the
   same DB / NATS effects as the current cron entries.
3. Run both in parallel for one week (cron + ActivityDefinition). The
   scripts are idempotent: duplicate runs are no-ops on a clean state.
4. Disable the systemd timer:
   `systemctl --user disable --now custodian-sync.timer`
5. Remove the cleanup-stale cron entry (edit with `crontab -e`).
6. Update `infra/README.md` to point at the ActivityDefinitions and
   archive the systemd unit files.
7. Per-commit hook stays until a `repo.commit.pushed` event exists.
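
For the "fire on schedule" check in step 2, it can help to compute the expected fire times for the two expressions and compare them with run timestamps observed in staging. The sketch below is a deliberately tiny matcher written for this note, not activity-core's scheduler: it supports only `*`, `*/n`, and plain integers, and it ignores cron's special day-of-month/day-of-week combination rule. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def _field_matches(field: str, value: int, low: int) -> bool:
    """Match one cron field; `low` is the smallest legal value for that field."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - low) % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five-field expression fires at the minute of `when`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(dom, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(dow, cron_dow, 0)
    )


def expected_fires(expr: str, start: datetime, end: datetime) -> list:
    """List every whole minute in [start, end) at which `expr` should fire."""
    current = start.replace(second=0, microsecond=0)
    fires = []
    while current < end:
        if current >= start and cron_matches(expr, current):
            fires.append(current)
        current += timedelta(minutes=1)
    return fires
```
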

---

## 6. Open questions

- **Locking**: should activity-core wrap shell instructions with a
  process lock (today the script self-locks via `/tmp/...`)? If yes, the
  state-hub script's lock can be removed.
- **Failure surfacing**: today systemd journals capture stderr. Where
  does an ActivityDefinition's shell stderr go (logs? activity
  history?)? This needs activity-core docs before cutover.
- **Per-repo split**: do we split A into one ActivityDefinition per
  registered repo (so failures don't poison the sweep), or keep the
  monolithic `--all` mode? The latter is simpler and matches today's
  behaviour; the former gives better observability.
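
On the failure-surfacing question, one way to picture `on_failure: log_and_continue` is a shell runner that captures stderr, logs it as a warning, and returns the exit code instead of raising. This is only a sketch of possible semantics: the function name, the logger name, and the timeout default are assumptions, and activity-core may route stderr elsewhere.

```python
import logging
import subprocess

log = logging.getLogger("activity.shell")  # hypothetical logger name


def run_log_and_continue(cmd: str, timeout: float = 600.0) -> int:
    """Run `cmd` in a shell; log failures as warnings and return the exit code."""
    try:
        proc = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        log.warning("command timed out after %ss: %s", timeout, cmd)
        return -1  # sentinel chosen for this sketch
    if proc.returncode != 0:
        # stderr is kept in the log record so it is not lost with the process.
        log.warning("command exited %d; stderr: %s", proc.returncode, proc.stderr.strip())
    return proc.returncode
```

With this shape, a failed sweep leaves a warning in whatever sink the logger is attached to, and the next scheduled run proceeds normally.
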