state-hub/infra/README.md
tegwick 39ed5459b9 finish(STATE-WP-0064): cut over scheduler and split sweep errors from failures
STATE-WP-0064 cutover (state-hub only):
- Retire local custodian-sync.timer; archive units under infra/systemd/archived/
- Mark workplan finished; update infra/README, cron-migration, runbook, AGENTS.md
- Point activity-core-delegation at the consistency-sweep runbook

Consistency engine — automation error vs assessment failure:
- C-00 is an automation error; C-01..C-23 assessment failures are recorded
  for follow-up but no longer fail --remote --all scheduled sweeps (exit 0)
- Skip workplans/README.md in the workplan glob (human index, not a workplan)
- Progress events and compare script expose automation_error and
  assessment_failures separately from exit_code
2026-06-22 01:20:59 +02:00

State Hub Infrastructure

Docker (PostgreSQL)

# Start postgres (required for API)
make db

# Start postgres + pgadmin
make db-tools

The compose file is infra/docker-compose.yml. Copy .env.example to .env and set POSTGRES_PASSWORD before starting.


Periodic Repo Sync — activity-core (Railiance01)

The State Hub consistency sync is scheduled every 15 minutes (*/15 * * * * UTC) on activity-core (Railiance01). Each scheduled run triggers POST /consistency/sweep/remote-all on the workstation State Hub through the actcore-state-hub-bridge tunnel.

Operator runbook: docs/consistency-sweep-runbook.md.

Prerequisites for cluster-triggered sweeps:

  • Workstation State Hub API running (make api or equivalent)
  • state-hub-railiance01 ops-bridge tunnel connected
  • Workstation awake (execution still runs locally; only scheduling moved)
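
For a manual check that the prerequisites above hold, the scheduled trigger can be approximated from the workstation. This is a minimal sketch, not the cluster's actual job: the base URL, the empty request body, the lack of auth headers, and the function names are all assumptions. Only the POST method and the /consistency/sweep/remote-all path come from this README.

```python
import urllib.request

SWEEP_PATH = "/consistency/sweep/remote-all"  # path from this README


def build_sweep_request(base_url):
    """Build (but do not send) the POST request for an all-repo sweep."""
    # Assumption: the endpoint accepts an empty body and needs no auth headers.
    url = base_url.rstrip("/") + SWEEP_PATH
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_sweep(base_url, timeout=30):
    """Send the request and return (HTTP status, raw response body)."""
    with urllib.request.urlopen(build_sweep_request(base_url), timeout=timeout) as resp:
        return resp.status, resp.read()
```

Calling trigger_sweep("http://127.0.0.1:8000") would exercise the local API directly; that address is a placeholder, so substitute whatever host and port make api actually binds.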

Per-repo git post-commit hooks remain the immediate consistency path after each commit. The 15-minute sweep is a backstop that covers all registered repos.

The all-repo remote sweep has built-in load guards:

  • A nonblocking process lock at /tmp/custodian-consistency-remote-all.lock; overlapping triggers exit cleanly with lock_skipped: true.
  • A wall-clock budget, defaulting to 300 seconds. Remaining repos are skipped once the budget is exhausted.
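
The two guards above can be sketched as follows. This is an illustration under stated assumptions, not the State Hub implementation: the lock path, the 300-second default, and the lock_skipped field come from this README, while sweep_all, sweep_one, the clock parameter, and the swept and budget_skipped keys are hypothetical names. It relies on fcntl, so it is Unix-only.

```python
import fcntl
import time

LOCK_PATH = "/tmp/custodian-consistency-remote-all.lock"  # path from this README
DEFAULT_BUDGET_SECONDS = 300  # default budget from this README


def sweep_all(repos, sweep_one, budget_seconds=DEFAULT_BUDGET_SECONDS,
              lock_path=LOCK_PATH, clock=time.monotonic):
    """Sweep repos under a nonblocking lock and a wall-clock budget.

    Hypothetical sketch: only lock_skipped mirrors a field named in the
    README; the other result keys are invented for illustration.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another sweep holds the lock: report it and exit cleanly.
            return {"lock_skipped": True, "swept": [], "budget_skipped": []}

        deadline = clock() + budget_seconds
        swept, budget_skipped = [], []
        for repo in repos:
            if clock() >= deadline:
                # Budget exhausted: skip this and every remaining repo.
                budget_skipped.append(repo)
                continue
            sweep_one(repo)
            swept.append(repo)
        # Closing the file on exit from the with-block releases the lock.
        return {"lock_skipped": False, "swept": swept,
                "budget_skipped": budget_skipped}
```

The budget is checked before each repo starts, so in this sketch a sweep already in progress is allowed to finish and the total run time can exceed the budget by the duration of one repo.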

Retired local timer

The legacy custodian-sync.{service,timer} systemd units were disabled on 2026-06-21 (STATE-WP-0064). Archived templates live in infra/systemd/archived/. Do not re-enable them unless you are debugging a cluster scheduling outage.


Post-commit hooks

Each registered repo can have a custodian sync hook installed that triggers fix-consistency automatically after every commit:

# Install into one repo
make install-hooks REPO=marki-docx

# Install into all active registered repos
make install-hooks-all

# Remove from one repo
make remove-hooks REPO=marki-docx

Installing the hook is idempotent (guarded by a # custodian-sync-hook marker), and the hook runs in the background so it does not block the commit.
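
A marker-guarded install along these lines might look like the sketch below. It is not the code behind make install-hooks: the install_hook name and the hook body are hypothetical, and the bare fix-consistency command is a placeholder because the real invocation is not given here. Only the marker string and the background behaviour come from this README.

```python
import stat
from pathlib import Path

MARKER = "# custodian-sync-hook"  # marker named in this README

# Hypothetical hook body: the real command line is not given in the README.
# The subshell plus "&" runs the command in the background so the commit
# returns immediately.
HOOK_SNIPPET = f"""{MARKER}
(fix-consistency >/dev/null 2>&1 &)
"""


def install_hook(repo_path):
    """Append the sync snippet to post-commit once; return True if written."""
    hook = Path(repo_path) / ".git" / "hooks" / "post-commit"
    existing = hook.read_text() if hook.exists() else "#!/bin/sh\n"
    if MARKER in existing:
        return False  # already installed, so leave the file untouched
    if not existing.endswith("\n"):
        existing += "\n"
    hook.parent.mkdir(parents=True, exist_ok=True)
    hook.write_text(existing + HOOK_SNIPPET)
    # git only runs hooks that are executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

Appending to an existing post-commit file, rather than overwriting it, keeps any hook logic a repo already has; the marker check is what makes repeated installs a no-op.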