---
id: STATE-WP-0064
type: workplan
title: "Move State Hub consistency sync to Railiance01 (activity-core)"
domain: custodian
repo: state-hub
status: finished
owner: codex
topic_slug: custodian
created: "2026-06-21"
updated: "2026-06-21"
state_hub_workstream_id: "669d810a-53f4-448b-a0c1-a6543daa7c44"
---

# STATE-WP-0064 — Move State Hub consistency sync to Railiance01

**Origin:** `history/20260621-weekend-automation-assessment.md` and the `docs/cron-migration.md` design stub (CUST-WP-0040 T04).

The 15-minute workplan↔DB reconciliation is a **State Hub read-model maintenance** job that runs across all registered repos. The legacy name **custodian-sync** reflects the owning domain, not the job's scope. Operator-facing names should use **State Hub consistency sync**. The ActivityDefinition id `the-custodian.state-hub-consistency-sweep` in `docs/cron-migration.md` already follows this convention.

This workplan moves **scheduling** to activity-core on Railiance01. `scripts/consistency_check.py` itself remains in the `state-hub` repo.

Depends on `STATE-WP-0063`, which repairs the currently broken local path so there is a known-good baseline before cutover.

## Scope

In scope:

- Land the `state-hub-consistency-sweep` ActivityDefinition in `the-custodian/activity-definitions/`.
- Run the sweep from Railiance01 against the workstation State Hub via the existing bridge/tunnel pattern (`actcore-state-hub-bridge` or equivalent).
- Parallel-run with the local `custodian-sync.timer` for validation, then disable the local timer.
- Update `infra/README.md`, `docs/cron-migration.md`, and operator runbooks.

Out of scope:

- Changing the `consistency_check.py` reconciliation rules (ADR-001 logic stays).
- Renaming `# custodian-sync-hook` in every registered repo's git hook (separate hygiene pass; hooks may keep the marker until all repos are updated).
- Migrating the per-commit hook to event-driven activity-core (see cron-migration §C).
## Naming decision (decided)

| Layer | Current | Target |
|-------|---------|--------|
| Operator docs | custodian sync / custodian-sync | **State Hub consistency sync** |
| ActivityDefinition id | (not landed) | `the-custodian.state-hub-consistency-sweep` |
| systemd unit (interim) | `custodian-sync.{service,timer}` | disabled; archived under `infra/systemd/archived/` |
| git hook marker | `# custodian-sync-hook` | unchanged in this workplan |

---

## T1 — ActivityDefinition and cluster wiring

```task
id: STATE-WP-0064-T01
status: done
priority: high
state_hub_task_id: "ecc0f846-e00f-4063-8ec1-f6ad630e9265"
```

Create `the-custodian/activity-definitions/state-hub-consistency-sweep.md` from the draft in `docs/cron-migration.md` §2A, adjusting:

- the shell command to reach the workstation repo path or a cluster-side checkout
- `STATE_HUB_URL` via the bridge service (not a hard-coded `127.0.0.1` on the cluster)
- `misfire_policy: skip` and a `--max-seconds 300` budget
- `on_failure: log_and_continue` for warn-only sweeps

Sync the definition to Railiance01 activity-core (projection manifest, following the `hourly-recently-on-scope` precedent). Enable it after a manual canary.

Done 2026-06-21:

- State Hub `POST /consistency/sweep/remote-all` plus progress event `consistency_sweep_remote_all`
- ActivityDefinition in `the-custodian/activity-definitions/`
- activity-core resolver query + k8s projection in `20-runtime.yaml`
- Uses the API invocation pattern (not a cluster shell into the laptop repo)

## T2 — Manual canary on Railiance01

```task
id: STATE-WP-0064-T02
status: done
priority: high
state_hub_task_id: "2e9b5b66-a7b1-46a5-8e1f-22e6b5caeff6"
```

Trigger one manual ActivityRun.
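A canary can also call the sweep endpoint directly through the bridge. The following sketch builds that request with the standard library. It also guards the one timing constraint this workplan surfaced: any proxy or client timeout on the path must outlast the `--max-seconds 300` budget, or healthy sweeps get aborted. Only the endpoint path and the 300s/360s values come from this workplan. The JSON body and the helper names are hypothetical.

```python
import json
import urllib.request

SWEEP_BUDGET_S = 300   # --max-seconds budget from the ActivityDefinition
PROXY_TIMEOUT_S = 360  # bridge proxy POST timeout (30s was aborting sweeps)


def timeout_covers_budget(timeout_s: float, budget_s: float) -> bool:
    """A timeout shorter than the sweep budget can cut off a healthy sweep."""
    return timeout_s > budget_s


def build_canary_request(base_url: str, source: str) -> urllib.request.Request:
    # Hypothetical body shape; the real endpoint may expect other fields.
    body = json.dumps({"source": source}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/consistency/sweep/remote-all",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_canary(base_url: str, source: str = "activity-core") -> dict:
    if not timeout_covers_budget(PROXY_TIMEOUT_S, SWEEP_BUDGET_S):
        raise ValueError("client timeout is shorter than the sweep budget")
    req = build_canary_request(base_url, source)
    # Socket timeout matches the proxy, so the client does not give up
    # before a full-budget sweep responds.
    with urllib.request.urlopen(req, timeout=PROXY_TIMEOUT_S) as resp:
        # Assumes the endpoint answers with a JSON object.
        return json.loads(resp.read().decode("utf-8"))
```

`run_canary(base_url)` with the bridge service's URL sends the POST. `build_canary_request` can be inspected on its own without a running server.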
Confirm:

- `consistency_check.py --remote --all` completes within budget
- C-15 writeback and C-16 pull gate behave as they do today
- progress or activity-core run history shows success
- no duplicate side-effects when the local timer also fires (idempotent)

Done 2026-06-21:

- Applied `20-runtime.yaml` on Railiance01; `actcore-sync` upserted definition `7c4e9a12-8f3b-4d5e-9c6a-1b2d3e4f5a6b`.
- Rebuilt/imported `activity-core:railiance01-prod` with the `consistency_sweep_remote_all` resolver.
- Bridge proxy POST timeout raised to 360s (the previous 30s limit was aborting sweeps).
- Manual canaries: cluster POST via the bridge (`exit_code 0`) and worker resolver.
- Laptop `make sync-activity-definitions` does not work against the Railiance01 DB; use the kubectl `actcore-sync` job instead.

## T3 — Parallel run and observability

```task
id: STATE-WP-0064-T03
status: done
priority: medium
state_hub_task_id: "8abb31ad-2f03-4aa7-889e-e60c3c39f1f8"
```

Run the cluster schedule (`*/15 * * * *` UTC per the design stub) alongside the local `custodian-sync.timer` for validation. Compare sweep completion rate, lock skips, and hard failures.

Done 2026-06-21 (accelerated validation; parallel week shortened):

- Enabled `state-hub-consistency-sweep` on Railiance01 (`enabled: true`).
- Unified both runners on `POST /consistency/sweep/remote-all`, tagged with `detail.source` (`local-timer` vs `activity-core`).
- `compare_consistency_sweep_parallel.py` over 72h: activity-core 5 events (3 completed, 2 lock_skipped); local-timer 6 events (5 completed, 1 lock_skipped). Both runners show the same hard-fail profile (repo-level C-06, not scheduler-related).
- Overlapping runs were lock-skipped rather than duplicated, confirming healthy idempotence.

Evidence was sufficient for cutover.

## T4 — Retire local timer

```task
id: STATE-WP-0064-T04
status: done
priority: medium
state_hub_task_id: "c8275471-5ec0-4dfb-8fec-2b3ec3894036"
```

After parallel validation passes:

```bash
systemctl --user disable --now custodian-sync.timer
```

Done 2026-06-21:

- Local timer disabled (`inactive`, `disabled`).
- Unit files archived to `infra/systemd/archived/`.
- cron-migration §5 step 4 marked complete.
- `docs/activity-core-delegation.md` cross-reference added.

## T5 — Docs and operator handoff

```task
id: STATE-WP-0064-T05
status: done
priority: low
state_hub_task_id: "270ed7dd-aa79-469d-a817-e3fa1e71be41"
```

- `infra/README.md`: primary schedule is activity-core on Railiance01; local timer retired.
- `docs/cron-migration.md`: §2A promoted to implemented; cutover complete.
- `docs/consistency-sweep-runbook.md`: steady-state ops (no parallel week).
- `AGENTS.md`: State Hub consistency sync terminology and runbook link.

Done 2026-06-21. The cluster schedule is now the sole primary runner.
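The T3 comparison grouped sweep progress events by runner. This is a minimal sketch of that tally, in the spirit of `compare_consistency_sweep_parallel.py` but not its actual code. It assumes that each event carries `detail.source` and also a `detail.status` field. The `detail.status` field is hypothetical. Its values include `completed` and `lock_skipped`, the outcomes named in T3.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable, Mapping


def tally_by_source(events: Iterable[Mapping]) -> dict[str, Counter]:
    """Count sweep outcomes per runner, keyed on detail.source.

    Hypothetical event shape: {"detail": {"source": ..., "status": ...}}.
    """
    tally: dict[str, Counter] = defaultdict(Counter)
    for event in events:
        detail = event.get("detail", {})
        tally[detail.get("source", "unknown")][detail.get("status", "unknown")] += 1
    return dict(tally)


def completion_rate(outcomes: Counter) -> float:
    """Share of a runner's events that completed (0.0 when there are none)."""
    total = sum(outcomes.values())
    return outcomes["completed"] / total if total else 0.0
```

T3 recorded three completed and two lock-skipped activity-core events, and five completed and one lock-skipped local-timer event. With these counts, the completion rates are 0.6 and about 0.83. Lock skips reflect overlap between the two runners rather than errors. For this reason, they are worth reading alongside the rate and the hard-failure count rather than as failures.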