Files
state-hub/infra/README.md
tegwick 39ed5459b9 finish(STATE-WP-0064): cut over scheduler and split sweep errors from failures
STATE-WP-0064 cutover (state-hub only):
- Retire local custodian-sync.timer; archive units under infra/systemd/archived/
- Mark workplan finished; update infra/README, cron-migration, runbook, AGENTS.md
- Point activity-core-delegation at the consistency-sweep runbook

Consistency engine — automation error vs assessment failure:
- C-00 is an automation error; C-01..C-23 assessment failures are recorded
  for follow-up but no longer fail --remote --all scheduled sweeps (exit 0)
- Skip workplans/README.md in the workplan glob (human index, not a workplan)
- Progress events and compare script expose automation_error and
  assessment_failures separately from exit_code
2026-06-22 01:20:59 +02:00

70 lines
2.1 KiB
Markdown

# State Hub Infrastructure
## Docker (PostgreSQL)
```bash
# Start postgres (required for API)
make db
# Start postgres + pgadmin
make db-tools
```
The compose file is `infra/docker-compose.yml`. Copy `.env.example` to `.env` and set
`POSTGRES_PASSWORD` before starting.
---
## Periodic Repo Sync — activity-core (Railiance01)
The **State Hub consistency sync** runs every 15 minutes (`*/15 * * * *` UTC)
on activity-core (Railiance01). The cluster schedule triggers
`POST /consistency/sweep/remote-all` on the workstation State Hub via the
`actcore-state-hub-bridge` tunnel.
Operator runbook: [`docs/consistency-sweep-runbook.md`](../docs/consistency-sweep-runbook.md).
**Prerequisites for cluster-triggered sweeps:**
- Workstation State Hub API running (`make api` or equivalent)
- `state-hub-railiance01` ops-bridge tunnel `connected`
- Workstation awake (execution still runs locally; only scheduling moved)
Per-repo git post-commit hooks remain the immediate consistency path after
each commit. The 15-minute sweep is belt-and-suspenders across all registered
repos.
The all-repo remote sweep has built-in load guards:
- A nonblocking process lock at `/tmp/custodian-consistency-remote-all.lock`;
overlapping triggers exit cleanly with `lock_skipped: true`.
- A wall-clock budget, defaulting to 300 seconds. Remaining repos are skipped
once the budget is exhausted.
### Retired local timer
The legacy `custodian-sync.{service,timer}` systemd units were disabled
2026-06-21 (`STATE-WP-0064`). Archived templates live in
[`infra/systemd/archived/`](systemd/archived/). Do not re-enable unless
debugging a cluster scheduling outage.
---
## Post-commit hooks
Each registered repo can have a custodian sync hook installed that triggers
`fix-consistency` automatically after every commit:
```bash
# Install into one repo
make install-hooks REPO=marki-docx
# Install into all active registered repos
make install-hooks-all
# Remove from one repo
make remove-hooks REPO=marki-docx
```
The hook is idempotent (guarded by `# custodian-sync-hook` marker) and runs
in the background so it does not block the commit.