state-hub/workplans/STATE-WP-0064-statehub-consistency-sync-railiance01.md
tegwick 3d5e354ff8 docs(state-hub): weekend automation assessment and repair workplans
Persist the Fri-evening→Sun-afternoon automation gap assessment in
history/, and add STATE-WP-0063 (repair broken paths and cluster
reachability) plus STATE-WP-0064 (move State Hub consistency sync to
Railiance01 via activity-core). Workplans registered in State Hub via
fix-consistency.
2026-06-21 17:32:44 +02:00


id: STATE-WP-0064
type: workplan
title: Move State Hub consistency sync to Railiance01 (activity-core)
domain: custodian
repo: state-hub
status: ready
owner: codex
topic_slug: custodian
created: 2026-06-21
updated: 2026-06-21
state_hub_workstream_id: 669d810a-53f4-448b-a0c1-a6543daa7c44

STATE-WP-0064 — Move State Hub consistency sync to Railiance01

Origin: history/20260621-weekend-automation-assessment.md and docs/cron-migration.md design stub (CUST-WP-0040 T04).

The 15-minute workplan↔DB reconciliation is a State Hub read-model maintenance job that runs across all registered repos. The legacy name custodian-sync reflects the owning domain, not the job's scope. Operator-facing names should use "State Hub consistency sync"; the ActivityDefinition id the-custodian.state-hub-consistency-sweep in docs/cron-migration.md already follows this naming.

This workplan moves scheduling to activity-core on Railiance01 while scripts/consistency_check.py remains in the state-hub repo.

Depends on STATE-WP-0063, which repairs the currently broken local path so that a known-good baseline exists before cutover.

Scope

In scope:

  • Land state-hub-consistency-sweep ActivityDefinition in the-custodian/activity-definitions/.
  • Run the sweep from Railiance01 against the workstation State Hub via the existing bridge/tunnel pattern (actcore-state-hub-bridge or equivalent).
  • Parallel-run with local custodian-sync.timer for one week, then disable the local timer.
  • Update infra/README.md, docs/cron-migration.md, and operator runbooks.

Out of scope:

  • Changing consistency_check.py reconciliation rules (ADR-001 logic stays).
  • Renaming # custodian-sync-hook in every registered repo's git hook (separate hygiene pass; hooks may keep the marker until all repos are updated).
  • Per-commit hook migration to event-driven activity-core (see cron-migration §C).

Naming decision (decided)

| Layer | Current | Target |
|---|---|---|
| Operator docs | custodian sync / custodian-sync | State Hub consistency sync |
| ActivityDefinition id | (not landed) | the-custodian.state-hub-consistency-sweep |
| systemd unit (interim) | custodian-sync.{service,timer} | disable after cutover; optional rename to statehub-consistency-sync.* during WP-0063 if low cost |
| git hook marker | # custodian-sync-hook | unchanged in this workplan |

T1 — ActivityDefinition and cluster wiring

id: STATE-WP-0064-T01
status: todo
priority: high
state_hub_task_id: "ecc0f846-e00f-4063-8ec1-f6ad630e9265"

Create the-custodian/activity-definitions/state-hub-consistency-sweep.md from the draft in docs/cron-migration.md §2A, adjusting:

  • shell command to reach the workstation repo path or a cluster-side checkout
  • STATE_HUB_URL via bridge service (not hard-coded 127.0.0.1 on cluster)
  • misfire_policy: skip and --max-seconds 300 budget
  • on_failure: log_and_continue for warn-only sweeps

Sync definition to Railiance01 activity-core (projection manifest per hourly-recently-on-scope precedent). Enable after manual canary.
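
The STATE_HUB_URL adjustment above could be guarded with a small pre-flight check. The following is a minimal sketch, not the actual wiring: the helper names `is_loopback_url` and `build_sweep_command`, the bridge URL and port in the usage example, and the assumption that `--max-seconds` is accepted by consistency_check.py itself are all hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

DEFAULT_MAX_SECONDS = 300  # budget named in this workplan


def is_loopback_url(url: str) -> bool:
    """Return True when the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host found in STATE_HUB_URL: {url!r}")
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. the DNS name of a bridge service.
        return False


def build_sweep_command(env: dict, max_seconds: int = DEFAULT_MAX_SECONDS) -> list:
    """Build the sweep argv, refusing a loopback STATE_HUB_URL on the cluster."""
    url = env.get("STATE_HUB_URL", "")
    if not url:
        raise ValueError("STATE_HUB_URL is not set")
    if is_loopback_url(url):
        raise ValueError("STATE_HUB_URL points at loopback; use the bridge service")
    # Assumption: --max-seconds is a consistency_check.py flag.
    return [
        "python", "scripts/consistency_check.py",
        "--remote", "--all",
        "--max-seconds", str(max_seconds),
    ]


if __name__ == "__main__":
    # Hypothetical bridge URL; the real host and port come from the cluster config.
    print(build_sweep_command({"STATE_HUB_URL": "http://actcore-state-hub-bridge:8000"}))
```

Failing fast on a loopback URL keeps a copied workstation setting from silently pointing the cluster run at nothing.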

T2 — Manual canary on Railiance01

id: STATE-WP-0064-T02
status: todo
priority: high
state_hub_task_id: "2e9b5b66-a7b1-46a5-8e1f-22e6b5caeff6"

Trigger one manual ActivityRun. Confirm:

  • consistency_check.py --remote --all completes within budget
  • C-15 writeback and C-16 pull gate behave as today
  • progress or activity-core run history shows success
  • no duplicate side-effects when local timer also fires (idempotent)
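
The "completes within budget" check can be made concrete by timing the canary command. This is a rough sketch that assumes the canary is launched as a plain subprocess; in practice activity-core's own run history may already record duration. `run_canary` is a hypothetical helper, and the usage example runs a stand-in command rather than the real sweep.

```python
import subprocess
import sys
import time


def run_canary(argv, budget_seconds=300.0):
    """Run argv once and report exit code, elapsed time, and budget status."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, timeout=budget_seconds, check=False)
        exit_code = proc.returncode
    except subprocess.TimeoutExpired:
        # The child was killed by the timeout, so the budget was exceeded.
        exit_code = None
    elapsed = time.monotonic() - start
    return {
        "exit_code": exit_code,
        "elapsed_seconds": elapsed,
        "within_budget": exit_code is not None and elapsed <= budget_seconds,
    }


if __name__ == "__main__":
    # Stand-in command; replace with the real sweep argv for an actual canary.
    print(run_canary([sys.executable, "-c", "pass"], budget_seconds=300.0))
```

A run that hits the timeout reports `exit_code` as `None`, which keeps budget overruns distinct from ordinary non-zero exits.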

T3 — Parallel run and observability

id: STATE-WP-0064-T03
status: todo
priority: medium
state_hub_task_id: "8abb31ad-2f03-4aa7-889e-e60c3c39f1f8"

Run cluster schedule (*/15 * * * * UTC per design stub) alongside local custodian-sync.timer for one week. Compare:

  • sweep completion rate
  • repos skipped due to lock or budget
  • hard failures vs warn-only exits

Document comparison in a progress event or short runbook addendum.
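
The three comparison metrics could be tallied from run records along these lines. This is a sketch under assumptions: the `SweepRun` record shape and the outcome labels "ok", "warn", and "fail" are hypothetical, and warn-only exits are counted as completed sweeps. A */15 schedule gives 4 runs per hour, so 4 × 24 × 7 = 672 scheduled runs in the parallel week.

```python
from dataclasses import dataclass

RUNS_PER_WEEK = 4 * 24 * 7  # */15 schedule: 672 scheduled runs in one week


@dataclass
class SweepRun:
    runner: str         # "cluster" or "local" (hypothetical labels)
    outcome: str        # "ok", "warn", or "fail" (hypothetical labels)
    repos_skipped: int  # repos skipped due to lock or budget in this run


def summarize(runs, runner, expected_runs=RUNS_PER_WEEK):
    """Summarize one runner's week: completion rate, skips, failure split."""
    mine = [r for r in runs if r.runner == runner]
    # Assumption: a warn-only exit still counts as a completed sweep.
    completed = sum(1 for r in mine if r.outcome in ("ok", "warn"))
    return {
        "runs_seen": len(mine),
        "completion_rate": completed / expected_runs,
        "repos_skipped": sum(r.repos_skipped for r in mine),
        "hard_failures": sum(1 for r in mine if r.outcome == "fail"),
        "warn_only": sum(1 for r in mine if r.outcome == "warn"),
    }


if __name__ == "__main__":
    sample = [
        SweepRun("cluster", "ok", 0),
        SweepRun("cluster", "warn", 2),
        SweepRun("local", "fail", 0),
    ]
    for name in ("cluster", "local"):
        print(name, summarize(sample, name, expected_runs=4))
```

Dividing by the number of scheduled runs rather than the number of runs seen means a runner that never fired lowers the completion rate instead of going unnoticed.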

T4 — Retire local timer

id: STATE-WP-0064-T04
status: todo
priority: medium
state_hub_task_id: "c8275471-5ec0-4dfb-8fec-2b3ec3894036"

After the parallel-run week passes:

systemctl --user disable --now custodian-sync.timer

Archive or update unit files under infra/. Mark cron-migration stub §5 step 4 complete. Update docs/activity-core-delegation.md cross-reference.

T5 — Docs and operator handoff

id: STATE-WP-0064-T05
status: todo
priority: low
state_hub_task_id: "270ed7dd-aa79-469d-a817-e3fa1e71be41"

Update the operator-facing documentation:

  • infra/README.md: primary schedule is activity-core on Railiance01; local timer is retired.
  • docs/cron-migration.md: promote §2A from design stub to implemented; note blockers cleared.
  • Dashboard or AGENTS snippet: "State Hub consistency sync" terminology.

Mark workplan finished when cluster schedule is the sole primary runner.