STATE-WP-0064 cutover (state-hub only):
- Retire local custodian-sync.timer; archive units under infra/systemd/archived/
- Mark workplan finished; update infra/README, cron-migration, runbook, AGENTS.md
- Point activity-core-delegation at the consistency-sweep runbook

Consistency engine (automation error vs assessment failure):
- C-00 is an automation error; C-01..C-23 assessment failures are recorded for follow-up but no longer fail --remote --all scheduled sweeps (exit 0)
- Skip workplans/README.md in the workplan glob (human index, not a workplan)
- Progress events and compare script expose automation_error and assessment_failures separately from exit_code
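The split between an automation error and assessment failures described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the logic in consistency_check.py: the function name, the returned dictionary shape, and the non-zero exit value `1` are all assumptions; only the check ids (C-00, C-01..C-23), the field names, and the "exit 0 for scheduled sweeps" behaviour come from the text.

```python
AUTOMATION_ERROR_ID = "C-00"  # per the commit message: C-00 is the automation error


def summarize_scheduled_sweep(failed_check_ids):
    """Classify failed check ids for a scheduled --remote --all sweep.

    Hypothetical helper: the real script may structure this differently.
    """
    automation_error = AUTOMATION_ERROR_ID in failed_check_ids
    # Everything other than C-00 is treated as an assessment failure (C-01..C-23).
    assessment_failures = sorted(
        check_id for check_id in failed_check_ids if check_id != AUTOMATION_ERROR_ID
    )
    return {
        "automation_error": automation_error,
        "assessment_failures": assessment_failures,
        # Only an automation error fails the scheduled run; 1 is an assumed value.
        "exit_code": 1 if automation_error else 0,
    }
```

Under these assumptions a sweep that only finds C-06 is recorded for follow-up but still exits 0, while any sweep that hits C-00 exits non-zero.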
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| STATE-WP-0064 | workplan | Move State Hub consistency sync to Railiance01 (activity-core) | custodian | state-hub | finished | codex | custodian | 2026-06-21 | 2026-06-21 | 669d810a-53f4-448b-a0c1-a6543daa7c44 |
STATE-WP-0064 — Move State Hub consistency sync to Railiance01
Origin: history/20260621-weekend-automation-assessment.md and
docs/cron-migration.md design stub (CUST-WP-0040 T04).
The 15-minute workplan↔DB reconciliation is a State Hub read-model
maintenance job across all registered repos. The legacy name custodian-sync
reflects the owning domain, not the job's scope. Operator-facing names should
use State Hub consistency sync; the ActivityDefinition id
the-custodian.state-hub-consistency-sweep already matches this in
docs/cron-migration.md.
This workplan moves scheduling to activity-core on Railiance01 while
scripts/consistency_check.py remains in the state-hub repo.
Depends on STATE-WP-0063 repairing the current broken local path so there is
a known-good baseline before cutover.
Scope
In scope:
- Land state-hub-consistency-sweep ActivityDefinition in the-custodian/activity-definitions/.
- Run the sweep from Railiance01 against the workstation State Hub via the existing bridge/tunnel pattern (actcore-state-hub-bridge or equivalent).
- Parallel-run with local custodian-sync.timer for validation, then disable the local timer.
- Update infra/README.md, docs/cron-migration.md, and operator runbooks.
Out of scope:
- Changing consistency_check.py reconciliation rules (ADR-001 logic stays).
- Renaming # custodian-sync-hook in every registered repo's git hook (separate hygiene pass; hooks may keep the marker until all repos are updated).
- Per-commit hook migration to event-driven activity-core (see cron-migration §C).
Naming decision (decided)
| Layer | Current | Target |
|---|---|---|
| Operator docs | custodian sync / custodian-sync | State Hub consistency sync |
| ActivityDefinition id | (not landed) | the-custodian.state-hub-consistency-sweep |
| systemd unit (interim) | custodian-sync.{service,timer} | disabled; archived under infra/systemd/archived/ |
| git hook marker | # custodian-sync-hook | unchanged in this workplan |
T1 — ActivityDefinition and cluster wiring
id: STATE-WP-0064-T01
status: done
priority: high
state_hub_task_id: "ecc0f846-e00f-4063-8ec1-f6ad630e9265"
Create the-custodian/activity-definitions/state-hub-consistency-sweep.md
from the draft in docs/cron-migration.md §2A, adjusting:
- shell command to reach the workstation repo path or a cluster-side checkout
- STATE_HUB_URL via bridge service (not hard-coded 127.0.0.1 on cluster)
- misfire_policy: skip and --max-seconds 300 budget
- on_failure: log_and_continue for warn-only sweeps
Sync definition to Railiance01 activity-core (projection manifest per
hourly-recently-on-scope precedent). Enable after manual canary.
Done 2026-06-21:
- State Hub POST /consistency/sweep/remote-all + progress event consistency_sweep_remote_all
- ActivityDefinition in the-custodian/activity-definitions/
- activity-core resolver query + k8s projection in 20-runtime.yaml
- Uses API invocation pattern (not cluster shell into laptop repo)
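The API invocation pattern above could look roughly like the sketch below: the caller POSTs to the State Hub endpoint through the bridge instead of shelling into a repo checkout. Only the endpoint path comes from this workplan. The JSON payload carrying a `source` field, the helper names, and the assumption that the response body is JSON are hypothetical and only illustrate the shape of such a call.

```python
import json
import urllib.request

SWEEP_PATH = "/consistency/sweep/remote-all"  # endpoint named in this workplan


def build_sweep_request(base_url, source):
    """Build the POST request; base_url would come from STATE_HUB_URL."""
    # Hypothetical payload: how the runner identifies itself is an assumption.
    body = json.dumps({"source": source}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + SWEEP_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_sweep(base_url, source, timeout_seconds):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_sweep_request(base_url, source)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))
```

A cluster-side caller would pass the bridge service URL as `base_url` rather than 127.0.0.1, and choose `timeout_seconds` long enough for a full sweep to finish.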
T2 — Manual canary on Railiance01
id: STATE-WP-0064-T02
status: done
priority: high
state_hub_task_id: "2e9b5b66-a7b1-46a5-8e1f-22e6b5caeff6"
Trigger one manual ActivityRun. Confirm:
- consistency_check.py --remote --all completes within budget
- C-15 writeback and C-16 pull gate behave as today
- progress or activity-core run history shows success
- no duplicate side-effects when local timer also fires (idempotent)
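The idempotence check in the last bullet relies on overlapping runs not doing the work twice. A minimal sketch of that idea, assuming a non-blocking lock where the second runner skips rather than waits, is shown below. The in-process `threading.Lock` is only a stand-in: this workplan does not say how the real sweep lock is implemented, and `run_sweep_once` and its return shape are hypothetical.

```python
import threading

# Stand-in for whatever lock the real sweep uses (not specified in this workplan).
_sweep_lock = threading.Lock()


def run_sweep_once(source, sweep):
    """Run sweep() unless another run already holds the lock."""
    if not _sweep_lock.acquire(blocking=False):
        # Another runner is mid-sweep: report a skip instead of repeating the work.
        return {"source": source, "status": "lock_skipped"}
    try:
        sweep()
        return {"source": source, "status": "completed"}
    finally:
        _sweep_lock.release()
```

With this shape, a local timer firing while the cluster run is in progress produces a skip record and no second set of side effects.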
Done 2026-06-21:
- Applied 20-runtime.yaml on Railiance01; actcore-sync upserted definition 7c4e9a12-8f3b-4d5e-9c6a-1b2d3e4f5a6b.
- Rebuilt/imported activity-core:railiance01-prod with consistency_sweep_remote_all resolver.
- Bridge proxy POST timeout raised to 360s (30s was aborting sweeps).
- Manual canaries: cluster POST via bridge (exit_code 0) and worker resolver.
- Laptop make sync-activity-definitions is not valid against Railiance01 DB; use kubectl actcore-sync job instead.
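The timeout change above follows from simple arithmetic: a proxy that gives up before the sweep budget elapses will abort sweeps that are still healthy. The sketch below makes that explicit, assuming the 300-second budget from T1 also applies to the API-invoked sweep (the workplan does not state this directly). The function name is hypothetical.

```python
def timeout_headroom(proxy_timeout_s, sweep_budget_s):
    """Seconds the proxy waits beyond the sweep budget.

    A negative result means the proxy times out before the sweep is
    allowed to finish, so in-budget sweeps can be aborted.
    """
    return proxy_timeout_s - sweep_budget_s
```

With a 300-second budget, the old 30-second proxy timeout gives a headroom of -270 seconds, while the new 360-second timeout leaves 60 seconds for the response to travel back through the bridge.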
T3 — Parallel run and observability
id: STATE-WP-0064-T03
status: done
priority: medium
state_hub_task_id: "8abb31ad-2f03-4aa7-889e-e60c3c39f1f8"
Run cluster schedule (*/15 * * * * UTC per design stub) alongside local
custodian-sync.timer for validation. Compare sweep completion rate, lock
skips, and hard failures.
Done 2026-06-21 (accelerated validation — parallel week shortened):
- Enabled state-hub-consistency-sweep on Railiance01 (enabled: true).
- Unified both runners on POST /consistency/sweep/remote-all with detail.source (local-timer vs activity-core).
- compare_consistency_sweep_parallel.py over 72h: activity-core 5 events (3 completed, 2 lock_skipped), local-timer 6 events (5 completed, 1 lock_skipped). Matching hard-fail profile (repo-level C-06, not scheduler).
- Lock overlap confirmed healthy idempotence. Evidence sufficient for cutover.
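The per-runner comparison above can be sketched as a small aggregation over progress events grouped by detail.source. This is not the code of compare_consistency_sweep_parallel.py: the event shape (a `status` key plus a nested `detail.source`) and the function name are assumptions, and only the status names and source labels come from this workplan.

```python
from collections import Counter


def summarize_by_source(events):
    """Count sweep outcomes per runner and derive a completion rate.

    Each event is assumed to look like:
    {"status": "completed", "detail": {"source": "activity-core"}}
    """
    status_counts = {}
    for event in events:
        source = event["detail"]["source"]
        status_counts.setdefault(source, Counter())[event["status"]] += 1

    summary = {}
    for source, counts in status_counts.items():
        total = sum(counts.values())
        summary[source] = {
            "events": total,
            "completed": counts["completed"],
            "lock_skipped": counts["lock_skipped"],
            # Share of runs that did the sweep rather than skipping on the lock.
            "completion_rate": counts["completed"] / total,
        }
    return summary
```

Fed with the 72h counts reported above, this gives activity-core 3 of 5 completed and local-timer 5 of 6 completed; the lock-skipped events are the expected result of the two runners overlapping.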
T4 — Retire local timer
id: STATE-WP-0064-T04
status: done
priority: medium
state_hub_task_id: "c8275471-5ec0-4dfb-8fec-2b3ec3894036"
After parallel validation passes:
systemctl --user disable --now custodian-sync.timer
Done 2026-06-21:
- Local timer disabled (inactive, disabled).
- Unit files archived to infra/systemd/archived/.
- cron-migration §5 step 4 marked complete.
- docs/activity-core-delegation.md cross-reference added.
T5 — Docs and operator handoff
id: STATE-WP-0064-T05
status: done
priority: low
state_hub_task_id: "270ed7dd-aa79-469d-a817-e3fa1e71be41"
- infra/README.md: primary schedule is activity-core on Railiance01; local timer retired.
- docs/cron-migration.md: §2A promoted to implemented; cutover complete.
- docs/consistency-sweep-runbook.md: steady-state ops (no parallel week).
- AGENTS.md: State Hub consistency sync terminology and runbook link.
Done 2026-06-21. Cluster schedule is the sole primary runner.