| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| CUST-WP-0026 | workplan | Distributed Consistency — Multi-Machine State Sync | custodian | the-custodian | done | custodian | custodian | 2026-03-21 | 2026-03-21 | 32de6210-ce1e-4cba-ad1f-fdeba462030d |
Distributed Consistency — Multi-Machine State Sync
Problem
The consistency checker assumes local workplan files are always the authoritative source of truth. This breaks in the primary development workflow:
- Implementation runs on CoulombCore (remote)
- Task status is written to the state-hub DB via ops-bridge tunnel
- The workstation's local repo is not updated (no git pull)
- Session close triggers fix-consistency on the workstation
- Checker reads stale local files (tasks still todo) and regresses DB status, overwriting done/in_progress back to todo
- The dashboard shows progress, then silently reverts
This is a design assumption in ADR-001 that breaks under multi-machine workflows. ADR-001 states that the DB is rebuilt from files, but that only holds when local files are always up to date.
Goal
Eliminate false regressions and make fix-consistency safe to run regardless
of local repo staleness. Three layers of defence:
- T01 (no-regress rule): Never allow fix-consistency to move a task backwards in status. DB-ahead wins.
- T02 (pull gate): Detect and warn when local repo is behind its remote before applying fixes.
- T03 (DB→file writeback): Write DB status back into workplan files and commit, so files stay truthful and the multi-machine workflow naturally converges.
Implementation Notes
The status progression order for the no-regress rule:
todo → in_progress → blocked → done → cancelled
For the pull gate, git fetch is the only network call needed. No push, no
merge — just detection. The fix mode should refuse or warn; check mode should
always be allowed to report.
For writeback (T03), fix-consistency --fix needs to:
- Detect tasks where DB status > file status
- Edit the workplan file (update the status: field in the task block)
- Stage and commit the change with a standard commit message
Writeback must be idempotent and must not alter anything other than status:
fields in task blocks.
Tasks
T01 — No-regress rule in consistency_check.py
id: CUST-WP-0026-T01
status: done
priority: high
state_hub_task_id: "34a76f4c-ad3f-4780-ad62-1e788ceca224"
Modify state-hub/scripts/consistency_check.py so that --fix mode never
regresses task status in the DB.
Status ordering:
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1,
"done": 2, "cancelled": 2}
In the C-11 fix path (file task found, DB task found, statuses differ):
- If STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]: skip the DB update and emit a new check code C-13 WARN: "DB task '{title}' is ahead of file (db={db_status}, file={file_status}) — skipped to prevent regression"
- If STATUS_ORDER[db_status] < STATUS_ORDER[file_status]: apply the update as today (file is ahead, sync forward)
New check code C-13: "DB task ahead of workplan file — regression prevented". Severity: WARN (not FAIL — this is expected in multi-machine workflows).
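The comparison above can be sketched as a small pure function. This is a minimal illustration that assumes the STATUS_ORDER mapping from this task; the function name decide_c11_action and its return labels are hypothetical and not taken from consistency_check.py.

```python
# Rank mapping copied from the task description above.
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1,
                "done": 2, "cancelled": 2}


def decide_c11_action(db_status: str, file_status: str) -> str:
    """Return 'noop', 'skip_c13', or 'update_db' (hypothetical labels)."""
    if db_status == file_status:
        return "noop"  # statuses agree, so the C-11 fix path is not entered
    if STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]:
        return "skip_c13"  # DB is at the same rank or ahead: keep the DB value
    return "update_db"  # file is ahead: sync forward as before


print(decide_c11_action("done", "todo"))
print(decide_c11_action("todo", "in_progress"))
```

Because the rule uses >=, two different statuses that share a rank (for example in_progress and blocked) also take the skip path, so the DB value is kept in that case.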
Gate: make test must pass after this change.
T02 — Git pull gate before --fix
id: CUST-WP-0026-T02
status: done
priority: high
state_hub_task_id: "f9dbad4e-ba66-4e20-83ef-93b78c9e1590"
Add a remote-staleness check to consistency_check.py that runs at the start
of --fix mode for each repo being checked.
Detection logic:
git -C <repo_path> fetch --quiet origin 2>/dev/null
LOCAL=$(git -C <repo_path> rev-parse HEAD)
REMOTE=$(git -C <repo_path> rev-parse @{u} 2>/dev/null)
# If LOCAL != REMOTE and REMOTE is reachable → repo is behind
If the repo is behind its remote tracking branch:
- In --fix mode: emit C-14 WARN and skip all write operations for that repo. Print: "Repo '{slug}' is behind remote — pull before fixing to avoid clobbering remote progress".
- In check-only mode: emit C-14 INFO (no-op, just informational).
The git fetch must be best-effort — if the remote is unreachable (offline,
ops-bridge down), skip the check silently rather than failing.
New check code C-14: "Repo behind remote tracking branch". Severity: WARN in fix mode, INFO in check mode.
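One way the best-effort staleness check might look in Python is sketched below. It is an assumption-laden illustration: instead of comparing the two rev-parse hashes it counts commits with git rev-list --count HEAD..@{u}, which separates "behind" from "ahead"; the function name commits_behind and the injectable run parameter are hypothetical, the latter added so that tests can mock git output.

```python
import subprocess
from typing import Callable, Optional


def _run_git(args: list) -> subprocess.CompletedProcess:
    """Run git with the given arguments and capture its output."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def commits_behind(repo_path: str,
                   run: Callable[[list], subprocess.CompletedProcess] = _run_git
                   ) -> Optional[int]:
    """Return how many commits HEAD is behind its upstream, or None if unknown."""
    fetch = run(["-C", repo_path, "fetch", "--quiet", "origin"])
    if fetch.returncode != 0:
        return None  # best-effort: remote unreachable, skip the check silently
    count = run(["-C", repo_path, "rev-list", "--count", "HEAD..@{u}"])
    if count.returncode != 0:
        return None  # no upstream tracking branch configured
    return int(count.stdout.strip())
```

A caller would treat a positive result as the C-14 condition and treat None or 0 as "proceed".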
Gate: make test must pass. Add a test that simulates a behind-remote repo
(mock rev-parse output).
T03 — DB→file status writeback
id: CUST-WP-0026-T03
status: done
priority: medium
state_hub_task_id: "749130f9-b397-46fd-8eb3-43c0fc127dac"
Extend consistency_check.py --fix to write DB status back into workplan
files when DB is ahead of the file (the C-13 case from T01).
Writeback logic:
- Locate the task block in the workplan file by matching id: <task_id>
- Replace the status: <old> line within that block with status: <new>
- Stage the file: git -C <repo_path> add <workplan_file>
- Commit with message:
    chore(consistency): sync task status from DB [auto]

    Updated by fix-consistency on <ISO-date>:
    - <task_id>: <old_status> → <new_status>
Guard rails:
- Only modify lines inside a ```task ... ``` block
- Only change the status: field; never touch id:, priority:, state_hub_task_id:, or any other field
- If the workplan file has uncommitted local changes, skip writeback for that file and emit C-14 WARN ("workplan has uncommitted changes — skipping writeback")
- If git commit fails for any reason, log the error but do not abort the rest of the consistency run
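The text-editing part of the writeback can be sketched as a line scanner that only touches status: lines inside the matching task block. This is an illustration, not the script's actual code: the function name rewrite_task_status is hypothetical, and it assumes that id: appears before status: within a block, as in the task blocks of this workplan.

```python
import re

FENCE = "`" * 3  # built at runtime so the example contains no literal fence line


def rewrite_task_status(text: str, task_id: str, new_status: str) -> str:
    """Return text with status: replaced inside the task block whose id matches."""
    out = []
    in_task = False        # currently inside a task block
    block_matches = False  # the current block carries the wanted id
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if not in_task and stripped == FENCE + "task":
            in_task, block_matches = True, False
        elif in_task and stripped == FENCE:
            in_task = False
        elif in_task and stripped == f"id: {task_id}":
            block_matches = True
        elif in_task and block_matches and re.match(r"\s*status:\s", line):
            # Swap only the value; indentation and line ending are preserved.
            line = re.sub(r"(status:\s*)\S+",
                          lambda m: m.group(1) + new_status, line, count=1)
        out.append(line)
    return "".join(out)
```

Running the function a second time with the same arguments returns the text unchanged, which matches the idempotency requirement; a task id that is not present leaves the file untouched.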
New flag: --no-writeback — disables T03 behaviour while keeping T01/T02
active. Default: writeback enabled when --fix is set.
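The flag semantics can be illustrated with argparse. Whether consistency_check.py actually uses argparse is an assumption here, and build_parser and writeback_enabled are hypothetical names; only the --fix and --no-writeback flags come from this workplan.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the two flags discussed in this task."""
    parser = argparse.ArgumentParser(prog="consistency_check.py")
    parser.add_argument("--fix", action="store_true", help="apply fixes")
    parser.add_argument("--no-writeback", action="store_true",
                        help="keep T01/T02 behaviour but skip DB-to-file writeback")
    return parser


def writeback_enabled(args: argparse.Namespace) -> bool:
    """Writeback is on by default in fix mode and never runs in check mode."""
    return args.fix and not args.no_writeback
```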
Gate: make test must pass. The existing workplan parsing tests should cover
the task block regex; add a writeback-specific test.
T04 — Session protocol update
id: CUST-WP-0026-T04
status: done
priority: medium
state_hub_task_id: "59a5d09a-1e67-4749-9d84-039982edc3ef"
Update the-custodian/CLAUDE.md session close protocol (step 5) to reflect
the new behaviour and add the recommended pre-fix step:
Current step 5:
If any workplan files were written or modified this session, run:
make fix-consistency REPO=the-custodian
Updated step 5:
Before running fix-consistency on any repo that has a remote, ensure the local copy is up to date:
git -C <repo_path> pull --ff-only
cd state-hub && make fix-consistency REPO=<slug>
The consistency checker will now warn (C-14) if the repo is still behind and refuse to regress status (C-13). A C-13 warning is normal for repos where work has progressed on a remote machine; it means writeback is keeping the files in sync.
Also update the state-hub/scripts/project_rules/session-protocol.template
so newly registered repos get the updated guidance.
T05 — Makefile: fix-consistency-remote target
id: CUST-WP-0026-T05
status: done
priority: low
state_hub_task_id: "b8375cbc-9c44-48f6-a78c-b7333d409525"
Add a convenience target to state-hub/Makefile that pulls before fixing:
## Pull repo then sync consistency: make fix-consistency-remote REPO=net-kingdom
fix-consistency-remote:
	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
	# Look up the repo's local path in state-hub; the URL is quoted so the shell does not glob the '?'.
	$(eval REPO_PATH := $(shell \
		curl -s "$(API_BASE)/repos/?slug=$(REPO)" | \
		python3 -c "import json,sys; \
		repos=json.load(sys.stdin); \
		print(next((r['local_path'] for r in repos if r['slug']=='$(REPO)'), ''))" \
	))
	@test -n "$(REPO_PATH)" || (echo "ERROR: repo '$(REPO)' not found in state-hub"; exit 1)
	git -C "$(REPO_PATH)" pull --ff-only || \
		(echo "WARN: pull failed (conflicts or no remote) — running fix-consistency anyway"; true)
	$(MAKE) fix-consistency REPO=$(REPO) REPO_PATH=$(REPO_PATH)
This makes the safe path the convenient path:
make fix-consistency-remote REPO=net-kingdom
Done Criteria
- make fix-consistency REPO=net-kingdom never regresses a done task back to todo when the local file is stale
- C-13 warning is emitted (not error) when DB is ahead of file
- C-14 warning is emitted in fix mode when repo is behind remote; fix operations are skipped for that repo
- DB→file writeback commits corrected status to the workplan file
- --no-writeback flag disables writeback cleanly
- make fix-consistency-remote REPO=<slug> pulls then fixes in one step
- make test passes after all changes
- Session protocol updated in CLAUDE.md and session-protocol.template