---
id: CUST-WP-0026
type: workplan
title: "Distributed Consistency — Multi-Machine State Sync"
domain: custodian
repo: the-custodian
status: done
owner: custodian
topic_slug: custodian
created: "2026-03-21"
updated: "2026-03-21"
state_hub_workstream_id: "32de6210-ce1e-4cba-ad1f-fdeba462030d"
---

# Distributed Consistency — Multi-Machine State Sync

## Problem

The consistency checker assumes local workplan files are always the authoritative source of truth. This breaks in the primary development workflow:

1. Implementation runs on **CoulombCore** (remote)
2. Task status is written to the **state-hub DB** via ops-bridge tunnel
3. The **workstation's local repo** is not updated (no `git pull`)
4. Session close triggers `fix-consistency` on the workstation
5. Checker reads stale local files (tasks still `todo`) and **regresses** DB status, overwriting `done`/`in_progress` back to `todo`
6. The dashboard shows progress, then silently reverts

The root cause is a design assumption in ADR-001 that breaks under multi-machine workflows. ADR-001 states that the DB is rebuilt from files, but this only holds when local files are always up to date.

## Goal

Eliminate false regressions and make `fix-consistency` safe to run regardless of local repo staleness. Three layers of defence:

- **T01** (no-regress rule): Never allow fix-consistency to move a task *backwards* in status. DB-ahead wins.
- **T02** (pull gate): Detect and warn when local repo is behind its remote before applying fixes.
- **T03** (DB→file writeback): Write DB status back into workplan files and commit, so files stay truthful and the multi-machine workflow naturally converges.

## Implementation Notes

The status progression order for the no-regress rule:

`todo → in_progress → blocked → done → cancelled`

For comparison purposes, the `STATUS_ORDER` mapping in T01 collapses this into three ranks: `in_progress` and `blocked` share one rank, as do `done` and `cancelled`.

For the pull gate, `git fetch` is the only network call needed. No push and no merge, just detection. In fix mode the checker should refuse or warn; check mode should always be allowed to report.

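
The pull gate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `consistency_check.py`: the helper names (`_git`, `commits_behind`, `pull_gate`) and the injectable `run` parameter are hypothetical, and the sketch counts missing commits with `git rev-list --count HEAD..@{u}` instead of comparing two `rev-parse` hashes as T02 does, so that a repo that is merely *ahead* of its upstream is not reported as behind.

```python
import subprocess


def _git(repo_path, *args, run=subprocess.run):
    """Run one git command inside repo_path and capture its output."""
    return run(["git", "-C", repo_path, *args], capture_output=True, text=True)


def commits_behind(repo_path, run=subprocess.run):
    """Return how many upstream commits HEAD lacks, or None if unknown.

    None means the check should be skipped silently: the fetch failed
    (offline, tunnel down) or no upstream branch is configured.
    """
    fetch = _git(repo_path, "fetch", "--quiet", "origin", run=run)
    if fetch.returncode != 0:
        return None
    count = _git(repo_path, "rev-list", "--count", "HEAD..@{u}", run=run)
    if count.returncode != 0:
        return None
    return int(count.stdout.strip())


def pull_gate(repo_path, fix_mode, run=subprocess.run):
    """Return (severity, allow_writes) for one repo.

    severity is None when there is nothing to report.
    """
    behind = commits_behind(repo_path, run=run)
    if not behind:  # None (unknown) or 0 (up to date)
        return (None, True)
    # Fix mode warns and blocks writes; check mode only reports.
    return ("WARN", False) if fix_mode else ("INFO", True)
```

Passing `run` as a parameter is a design choice for testability: a test can substitute a fake that returns canned exit codes and output, so no real repository or network is needed.
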

For writeback (T03), `fix-consistency --fix` needs to:

1. Detect tasks where DB status > file status
2. Edit the workplan file (update the `status:` field in the task block)
3. Stage and commit the change with a standard commit message

Writeback must be idempotent and must not alter anything other than `status:` fields in task blocks.

## Tasks

### T01 — No-regress rule in consistency_check.py

```task
id: CUST-WP-0026-T01
status: done
priority: high
state_hub_task_id: "34a76f4c-ad3f-4780-ad62-1e788ceca224"
```

Modify `state-hub/scripts/consistency_check.py` so that `--fix` mode never regresses task status in the DB.

**Status ordering:**

```python
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1, "done": 2, "cancelled": 2}
```

In the C-11 fix path (file task found, DB task found, statuses differ):

- If `STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]`: skip the DB update, emit a new check code **C-13** WARN: `"DB task '{title}' is ahead of file (db={db_status}, file={file_status}) — skipped to prevent regression"`
- If `STATUS_ORDER[db_status] < STATUS_ORDER[file_status]`: apply the update as today (file is ahead, sync forward)

Because the comparison is `>=`, statuses that differ but share a rank (for example `in_progress` vs `blocked`) also keep the DB value.

New check code **C-13**: "DB task ahead of workplan file — regression prevented". Severity: WARN (not FAIL, because this is expected in multi-machine workflows).

Gate: `make test` must pass after this change.

---

### T02 — Git pull gate before --fix

```task
id: CUST-WP-0026-T02
status: done
priority: high
state_hub_task_id: "f9dbad4e-ba66-4e20-83ef-93b78c9e1590"
```

Add a remote-staleness check to `consistency_check.py` that runs at the start of `--fix` mode for each repo being checked.

**Detection logic:**

```bash
# REPO_PATH: local checkout of the repo being checked
git -C "$REPO_PATH" fetch --quiet origin 2>/dev/null
LOCAL=$(git -C "$REPO_PATH" rev-parse HEAD)
REMOTE=$(git -C "$REPO_PATH" rev-parse '@{u}' 2>/dev/null)
# If REMOTE resolved and LOCAL != REMOTE → repo differs from its upstream.
# It is strictly behind only if HEAD is also an ancestor of @{u}:
#   git -C "$REPO_PATH" merge-base --is-ancestor HEAD '@{u}'
```

If the repo is behind its remote tracking branch:

- In `--fix` mode: emit **C-14** WARN and skip all write operations for that repo.
  Print: `"Repo '{slug}' is behind remote — pull before fixing to avoid clobbering remote progress"`.
- In check-only mode: emit C-14 INFO (no-op, just informational).

The `git fetch` must be best-effort: if the remote is unreachable (offline, ops-bridge down), skip the check silently rather than failing.

New check code **C-14**: "Repo behind remote tracking branch". Severity: WARN in fix mode, INFO in check mode.

Gate: `make test` must pass. Add a test that simulates a behind-remote repo (mock `rev-parse` output).

---

### T03 — DB→file status writeback

```task
id: CUST-WP-0026-T03
status: done
priority: medium
state_hub_task_id: "749130f9-b397-46fd-8eb3-43c0fc127dac"
```

Extend `consistency_check.py --fix` to write DB status back into workplan files when DB is ahead of the file (the C-13 case from T01).

**Writeback logic:**

1. Locate the task block in the workplan file by matching `id: <task_id>`
2. Replace the `status: <file_status>` line within that block with `status: <db_status>`
3. Stage the file: `git -C <repo_path> add <workplan_file>`
4. Commit with message:
   ```
   chore(consistency): sync task status from DB [auto]

   Updated by fix-consistency on :
   - :
   ```

**Guard rails:**

- Only modify lines inside a ` ```task ... ``` ` block
- Only change the `status:` field. Never touch `id:`, `priority:`, `state_hub_task_id:`, or any other field
- If the workplan file has uncommitted local changes, skip writeback for that file and emit C-14 WARN ("workplan has uncommitted changes — skipping writeback")
- If git commit fails for any reason, log the error but do not abort the rest of the consistency run

**New flag:** `--no-writeback` disables T03 behaviour while keeping T01/T02 active. Default: writeback enabled when `--fix` is set.

Gate: `make test` must pass. The existing workplan parsing tests should cover the task block regex; add a writeback-specific test.

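
The in-file edit at the heart of the writeback steps above can be sketched as a pure text transformation. This is an illustrative sketch under stated assumptions, not the code in `consistency_check.py`: the function name `write_back_status` and the regex constants are hypothetical, and staging, committing, and the uncommitted-changes guard are deliberately left out.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example fence-safe
FENCE_OPEN = re.compile("^" + FENCE + r"task\s*$")
FENCE_CLOSE = re.compile("^" + FENCE + r"\s*$")
STATUS_LINE = re.compile(r"^(\s*status:\s*)(\S+)(\s*)$")


def write_back_status(text, task_id, new_status):
    """Return text with the status of the matching task block replaced.

    Only `status:` lines inside a task fence whose block contains
    `id: <task_id>` are changed; everything else is returned untouched.
    """
    lines = text.splitlines(keepends=True)
    i = 0
    while i < len(lines):
        if not FENCE_OPEN.match(lines[i]):
            i += 1
            continue
        # Find the end of this task block (or end of file if unterminated).
        start = i + 1
        end = start
        while end < len(lines) and not FENCE_CLOSE.match(lines[end]):
            end += 1
        block = range(start, end)
        if any(lines[j].strip() == "id: " + task_id for j in block):
            for j in block:
                m = STATUS_LINE.match(lines[j])
                if m:
                    # Keep the original prefix and trailing whitespace/newline.
                    lines[j] = m.group(1) + new_status + m.group(3)
        i = end + 1
    return "".join(lines)
```

Because the function maps a string to a string, idempotence is easy to check: applying it twice with the same arguments yields the same text as applying it once, and an unknown task id leaves the file unchanged.
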

---

### T04 — Session protocol update

```task
id: CUST-WP-0026-T04
status: done
priority: medium
state_hub_task_id: "59a5d09a-1e67-4749-9d84-039982edc3ef"
```

Update `the-custodian/CLAUDE.md` session close protocol (step 5) to reflect the new behaviour and add the recommended pre-fix step:

**Current step 5:**

> If any workplan files were written or modified this session, run:
> `make fix-consistency REPO=the-custodian`

**Updated step 5:**

> Before running fix-consistency on any repo that has a remote, ensure the
> local copy is up to date:
> ```bash
> git -C <repo_path> pull --ff-only
> cd state-hub && make fix-consistency REPO=<slug>
> ```
> The consistency checker will now warn (C-14) if the repo is still behind
> and refuse to regress status (C-13). A C-13 warning is normal for repos
> where work has progressed on a remote machine; it means writeback is
> keeping the files in sync.

Also update `state-hub/scripts/project_rules/session-protocol.template` so that newly registered repos get the updated guidance.


---

### T05 — Makefile: fix-consistency-remote target

```task
id: CUST-WP-0026-T05
status: done
priority: low
state_hub_task_id: "b8375cbc-9c44-48f6-a78c-b7333d409525"
```

Add a convenience target to `state-hub/Makefile` that pulls before fixing:

```makefile
## Pull repo then sync consistency: make fix-consistency-remote REPO=net-kingdom
fix-consistency-remote:
	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
	$(eval REPO_PATH := $(shell \
		curl -s $(API_BASE)/repos/?slug=$(REPO) | \
		python3 -c "import json,sys; \
		repos=json.load(sys.stdin); \
		print(next((r['local_path'] for r in repos if r['slug']=='$(REPO)'), ''))" \
	))
	@test -n "$(REPO_PATH)" || (echo "ERROR: repo '$(REPO)' not found in state-hub"; exit 1)
	git -C "$(REPO_PATH)" pull --ff-only || \
		(echo "WARN: pull failed (conflicts or no remote) — running fix-consistency anyway"; true)
	$(MAKE) fix-consistency REPO=$(REPO) REPO_PATH=$(REPO_PATH)
```

This makes the safe path the convenient path: `make fix-consistency-remote REPO=net-kingdom`

## Done Criteria

- [ ] `make fix-consistency REPO=net-kingdom` never regresses a `done` task back to `todo` when local file is stale
- [ ] C-13 warning is emitted (not error) when DB is ahead of file
- [ ] C-14 warning is emitted in fix mode when repo is behind remote; fix operations are skipped for that repo
- [ ] DB→file writeback commits corrected status to the workplan file
- [ ] `--no-writeback` flag disables writeback cleanly
- [ ] `make fix-consistency-remote REPO=<slug>` pulls then fixes in one step
- [ ] `make test` passes after all changes
- [ ] Session protocol updated in CLAUDE.md and session-protocol.template