generated from coulomb/repo-seed
finish(STATE-WP-0064): cut over scheduler and split sweep errors from failures
STATE-WP-0064 cutover (state-hub only):
- Retire local custodian-sync.timer; archive units under infra/systemd/archived/
- Mark workplan finished; update infra/README, cron-migration, runbook, AGENTS.md
- Point activity-core-delegation at the consistency-sweep runbook

Consistency engine — automation error vs assessment failure:
- C-00 is an automation error; C-01..C-23 assessment failures are recorded for follow-up but no longer fail --remote --all scheduled sweeps (exit 0)
- Skip workplans/README.md in the workplan glob (human index, not a workplan)
- Progress events and compare script expose automation_error and assessment_failures separately from exit_code
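The error/failure split in the commit message can be sketched roughly as below. This is a minimal illustration, not the code in `consistency_check.py`: `SweepOutcome` and `summarize` are hypothetical names, the input shape (a list of failed check IDs) is an assumption, and the non-zero exit value of 1 for an automation error is also assumed. Only the check IDs and the `automation_error`, `assessment_failures`, and `exit_code` names come from the message.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# From the commit message: C-00 is the automation error; C-01..C-23 are assessments.
AUTOMATION_CHECK = "C-00"


@dataclass
class SweepOutcome:
    """Hypothetical result record for one scheduled sweep."""

    automation_error: bool = False
    assessment_failures: list[str] = field(default_factory=list)

    @property
    def exit_code(self) -> int:
        # Assumption: only an automation error makes the scheduled sweep exit non-zero.
        return 1 if self.automation_error else 0


def summarize(failed_checks: list[str]) -> SweepOutcome:
    """Sort failed check IDs into automation errors and assessment failures."""
    outcome = SweepOutcome()
    for check_id in failed_checks:
        if check_id == AUTOMATION_CHECK:
            outcome.automation_error = True
        else:
            # Recorded for follow-up, but does not affect the exit code.
            outcome.assessment_failures.append(check_id)
    return outcome
```

Under these assumptions, `summarize(["C-15", "C-16"])` records both assessment failures yet still reports `exit_code == 0`, so the scheduler would treat the run as successful.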
@@ -1,9 +1,8 @@
 # State Hub Cron → activity-core ActivityDefinition Migration

-> CUST-WP-0040 T04. **Partially implemented** as of `STATE-WP-0064`.
-> The consistency sweep API surface and ActivityDefinition are landed;
-> cluster cutover still requires manual canary, parallel week, and local
-> timer retirement.
+> CUST-WP-0040 T04. **Consistency sweep cut over** as of `STATE-WP-0064`
+> (2026-06-21). Scheduling is on activity-core (Railiance01); the local
+> `custodian-sync.timer` is retired. Stale-task cleanup (B) is still pending.

 The state hub currently runs two recurring maintenance jobs and one
 per-repo event hook. Once activity-core is ready, each becomes an
@@ -16,7 +15,7 @@ keeps the underlying scripts; only the *scheduling* moves.

 | # | Source | Trigger today | Script invoked | What it does |
 | - | ------------------- | -------------------------------------------------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
-| 1 | systemd user timer | every 15 min | `scripts/consistency_check.py --remote --all` | Pull every registered repo, reconcile workplan files ↔ DB, run C-15 writeback + C-16 pull gate |
+| 1 | activity-core cron | every 15 min (Railiance01) | `POST /consistency/sweep/remote-all` → `consistency_check.py --remote --all` | Pull every registered repo, reconcile workplan files ↔ DB, run C-15 writeback + C-16 pull gate |
 | 2 | manual / daily cron | `make cleanup-stale` (suggested `0 3 * * *`) | `scripts/cleanup_stale_tasks.py` | Cancel tasks still open in finished/archived workstreams; emits `org.statehub.task.stale` |
 | 3 | git post-commit | every commit in a registered repo | `make fix-consistency REPO=<slug>` | Per-repo workplan ↔ DB sync immediately after a commit |

@@ -40,7 +39,7 @@ run them on a schedule.
 ### A. `state-hub-consistency-sweep` (implemented)

 Landed in `the-custodian/activity-definitions/state-hub-consistency-sweep.md`
-with `enabled: false` until canary and cutover.
+with `enabled: true` on Railiance01 since 2026-06-21 cutover.

 Invocation path (matches the hourly RecentlyOnScope pattern):

@@ -56,11 +55,10 @@ checkout from the cluster.
 Operator runbook: [`docs/consistency-sweep-runbook.md`](consistency-sweep-runbook.md).

 Notes:
-- Replaces the `custodian-sync.service` + `custodian-sync.timer` pair
-  after parallel week and cutover.
+- Replaced the `custodian-sync.service` + `custodian-sync.timer` pair
+  (local timer disabled 2026-06-21; units archived under `infra/systemd/archived/`).
 - Lock semantics (`/tmp/custodian-consistency-remote-all.lock`) stay in
   the script — activity-core just sets the cadence.
-- Local timer retirement is tracked in `STATE-WP-0064-T04`.

 ### B. `state-hub-stale-task-cleanup`

@@ -130,8 +128,8 @@ Still optional for B and future splits:
 | activity-core shell instruction kind with on_failure semantics | activity-core | activity-core/`src/...` |
 | state-hub adapter exposing `state-hub.health` as a context source | activity-core | activity-core/adapters/ |

-Until B lands and A is cut over, the state hub continues to schedule the
-consistency sweep via the local systemd timer.
+A is cut over. Until B lands, stale-task cleanup remains on-demand via
+`make cleanup-stale` (or a manual daily cron).

 ---

@@ -142,11 +140,9 @@ consistency sweep via the local systemd timer.
    same DB / NATS effects as the current cron entries.
 3. Run both in parallel for one week (cron + ActivityDefinition). The
    scripts are idempotent — duplicate runs are no-ops on a clean state.
-4. Disable the systemd timer:
-   `systemctl --user disable --now custodian-sync.timer`
-5. Remove the cleanup-stale cron entry from `crontab -e`.
-6. Update `infra/README.md` to point at the ActivityDefinitions and
-   archive the systemd unit files.
+4. ~~Disable the systemd timer~~ — **done** 2026-06-21 (`STATE-WP-0064`).
+5. Remove the cleanup-stale cron entry from `crontab -e` (when B is enabled).
+6. ~~Update `infra/README.md` and archive systemd unit files~~ — **done**.
 7. Per-commit hook stays until a `repo.commit.pushed` event exists.

 ---
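
The notes in this diff say the lock at `/tmp/custodian-consistency-remote-all.lock` stays in the script, with activity-core only setting the cadence. The diff does not show how the script takes that lock. The sketch below assumes a non-blocking advisory `flock` on POSIX; `sweep_lock` is a hypothetical helper, not a function from `consistency_check.py`.

```python
import fcntl
import os
from contextlib import contextmanager

# Path quoted in the migration notes; the locking mechanism itself is assumed.
LOCK_PATH = "/tmp/custodian-consistency-remote-all.lock"


@contextmanager
def sweep_lock(path: str = LOCK_PATH):
    """Yield True if this run holds the lock, False if another run already does."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock: fail fast instead of queueing sweeps.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A caller would wrap the sweep in `with sweep_lock() as acquired:` and skip the run when `acquired` is false. With a 15-minute cadence, that would keep an overrunning sweep from overlapping the next scheduled one.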