diff --git a/workplans/CUST-WP-0026-distributed-consistency.md b/workplans/CUST-WP-0026-distributed-consistency.md
new file mode 100644
index 0000000..9cb95a6
--- /dev/null
+++ b/workplans/CUST-WP-0026-distributed-consistency.md
@@ -0,0 +1,252 @@
+---
+id: CUST-WP-0026
+type: workplan
+title: "Distributed Consistency — Multi-Machine State Sync"
+domain: custodian
+repo: the-custodian
+status: active
+owner: custodian
+topic_slug: custodian
+created: "2026-03-21"
+updated: "2026-03-21"
+state_hub_workstream_id: "32de6210-ce1e-4cba-ad1f-fdeba462030d"
+---
+
+# Distributed Consistency — Multi-Machine State Sync
+
+## Problem
+
+The consistency checker assumes local workplan files are always the authoritative
+source of truth. This breaks in the primary development workflow:
+
+1. Implementation runs on **CoulombCore** (remote)
+2. Task status is written to the **state-hub DB** via ops-bridge tunnel
+3. The **workstation's local repo** is not updated (no `git pull`)
+4. Session close triggers `fix-consistency` on the workstation
+5. Checker reads stale local files (tasks still `todo`) and **regresses** DB
+   status, overwriting `done`/`in_progress` back to `todo`
+6. The dashboard shows progress, then silently reverts
+
+This is a design assumption in ADR-001 that breaks under multi-machine workflows.
+ADR-001 states that the DB is rebuilt from files, but this only holds when local
+files are always up to date.
+
+## Goal
+
+Eliminate false regressions and make `fix-consistency` safe to run regardless
+of local repo staleness. Three layers of defence:
+
+- **T01** (no-regress rule): Never allow fix-consistency to move a task
+  *backwards* in status. DB-ahead wins.
+- **T02** (pull gate): Detect and warn when local repo is behind its remote
+  before applying fixes.
+- **T03** (DB→file writeback): Write DB status back into workplan files and
+  commit, so files stay truthful and the multi-machine workflow naturally
+  converges.
+
+## Implementation Notes
+
+The status progression order for the no-regress rule:
+`todo → in_progress → blocked → done → cancelled`
+
+For the pull gate, `git fetch` is the only network call needed. No push and no
+merge: detection only. Fix mode should refuse or warn; check mode should
+always be allowed to report.
+
+For writeback (T03), `fix-consistency --fix` needs to:
+1. Detect tasks where DB status > file status
+2. Edit the workplan file (update the `status:` field in the task block)
+3. Stage and commit the change with a standard commit message
+
+Writeback must be idempotent and must not alter anything other than `status:`
+fields in task blocks.
+
+## Tasks
+
+### T01 — No-regress rule in consistency_check.py
+
+```task
+id: CUST-WP-0026-T01
+status: todo
+priority: high
+state_hub_task_id: "34a76f4c-ad3f-4780-ad62-1e788ceca224"
+```
+
+Modify `state-hub/scripts/consistency_check.py` so that `--fix` mode never
+regresses task status in the DB.
+
+**Status ordering:**
+```python
+STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1,
+                "done": 2, "cancelled": 2}
+```
+
+In the C-11 fix path (file task found, DB task found, statuses differ):
+- If `STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]`: skip the DB
+  update, emit a new check code **C-13** WARN:
+  `"DB task '{title}' is ahead of file (db={db_status}, file={file_status}) — skipped to prevent regression"`
+- If `STATUS_ORDER[db_status] < STATUS_ORDER[file_status]`: apply the update
+  as today (file is ahead, sync forward)
+
+New check code **C-13**: "DB task ahead of workplan file — regression
+prevented". Severity: WARN (not FAIL, because this is expected in multi-machine
+workflows).
+
+Gate: `make test` must pass after this change.
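As a rough illustration of the T01 rule above, the comparison can be isolated in a small pure function. This is only a sketch: `resolve_c11` and its return shape are hypothetical names chosen for the example, not the structure of the real `consistency_check.py`; only `STATUS_ORDER` and the `>=` / `<` branches come from the workplan.

```python
# Ranks copied from the STATUS_ORDER table in T01.
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1,
                "done": 2, "cancelled": 2}


def resolve_c11(db_status: str, file_status: str):
    """Decide what the C-11 fix path does when DB and file statuses differ.

    Hypothetical helper for illustration. Returns (action, check_code):
    ("skip", "C-13") when the DB is at or ahead of the file's rank,
    ("update", None) when the file is ahead and the DB should sync forward.
    """
    if STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]:
        return ("skip", "C-13")
    return ("update", None)


if __name__ == "__main__":
    print(resolve_c11("done", "todo"))            # stale file: DB is kept
    print(resolve_c11("todo", "in_progress"))     # file ahead: DB is updated
    print(resolve_c11("blocked", "in_progress"))  # equal rank: DB is kept
```

Because `in_progress` and `blocked` share a rank, the `>=` comparison leaves the DB value in place when the two sides differ only between those statuses (and likewise for `done` versus `cancelled`).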
+
+---
+
+### T02 — Git pull gate before --fix
+
+```task
+id: CUST-WP-0026-T02
+status: todo
+priority: high
+state_hub_task_id: "f9dbad4e-ba66-4e20-83ef-93b78c9e1590"
+```
+
+Add a remote-staleness check to `consistency_check.py` that runs at the start
+of `--fix` mode for each repo being checked.
+
+**Detection logic:**
+```bash
+git -C fetch --quiet origin 2>/dev/null
+LOCAL=$(git -C rev-parse HEAD)
+REMOTE=$(git -C rev-parse @{u} 2>/dev/null)
+# If LOCAL != REMOTE and REMOTE is reachable → repo is behind
+```
+
+If the repo is behind its remote tracking branch:
+- In `--fix` mode: emit **C-14** WARN and skip all write operations for that
+  repo. Print: `"Repo '{slug}' is behind remote — pull before fixing to avoid
+  clobbering remote progress"`.
+- In check-only mode: emit C-14 INFO (no-op, just informational).
+
+The `git fetch` must be best-effort: if the remote is unreachable (offline,
+ops-bridge down), skip the check silently rather than failing.
+
+New check code **C-14**: "Repo behind remote tracking branch". Severity: WARN
+in fix mode, INFO in check mode.
+
+Gate: `make test` must pass. Add a test that simulates a behind-remote repo
+(mock `rev-parse` output).
+
+---
+
+### T03 — DB→file status writeback
+
+```task
+id: CUST-WP-0026-T03
+status: todo
+priority: medium
+state_hub_task_id: "749130f9-b397-46fd-8eb3-43c0fc127dac"
+```
+
+Extend `consistency_check.py --fix` to write DB status back into workplan
+files when DB is ahead of the file (the C-13 case from T01).
+
+**Writeback logic:**
+1. Locate the task block in the workplan file by matching `id: `
+2. Replace the `status: ` line within that block with `status: `
+3. Stage the file: `git -C add `
+4. Commit with message:
+   ```
+   chore(consistency): sync task status from DB [auto]
+
+   Updated by fix-consistency on :
+   - :
+   ```
+
+**Guard rails:**
+- Only modify lines inside a ` ```task ... ``` ` block
+- Only change the `status:` field; never touch `id:`, `priority:`,
+  `state_hub_task_id:`, or any other field
+- If the workplan file has uncommitted local changes, skip writeback for that
+  file and emit C-14 WARN ("workplan has uncommitted changes — skipping
+  writeback")
+- If git commit fails for any reason, log the error but do not abort the rest
+  of the consistency run
+
+**New flag:** `--no-writeback` disables T03 behaviour while keeping T01/T02
+active. Default: writeback enabled when `--fix` is set.
+
+Gate: `make test` must pass. The existing workplan parsing tests should cover
+the task block regex; add a writeback-specific test.
+
+---
+
+### T04 — Session protocol update
+
+```task
+id: CUST-WP-0026-T04
+status: todo
+priority: medium
+state_hub_task_id: "59a5d09a-1e67-4749-9d84-039982edc3ef"
+```
+
+Update `the-custodian/CLAUDE.md` session close protocol (step 5) to reflect
+the new behaviour and add the recommended pre-fix step:
+
+**Current step 5:**
+> If any workplan files were written or modified this session, run:
+> `make fix-consistency REPO=the-custodian`
+
+**Updated step 5:**
+> Before running fix-consistency on any repo that has a remote, ensure the
+> local copy is up to date:
+> ```bash
+> git -C pull --ff-only
+> cd state-hub && make fix-consistency REPO=
+> ```
+> The consistency checker will now warn (C-14) if the repo is still behind
+> and refuse to regress status (C-13). A C-13 warning is normal for repos
+> where work has progressed on a remote machine; it means writeback is
+> keeping the files in sync.
+
+Also update the `state-hub/scripts/project_rules/session-protocol.template`
+so newly registered repos get the updated guidance.
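The C-14 warning mentioned in the updated step 5 rests on the behind-remote detection from T02. One possible shape for it is sketched below, written so the git calls can be mocked as the T02 gate asks. Everything here is an assumption for illustration: `repo_is_behind`, `make_runner` and `fake_git` are hypothetical names, and the extra `git rev-list --count HEAD..@{u}` step goes beyond the workplan's `LOCAL != REMOTE` comparison, to tell "behind" apart from "ahead".

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes git arguments and returns stripped stdout, or None on failure.
GitRunner = Callable[[Sequence[str]], Optional[str]]


def make_runner(repo_path: str) -> GitRunner:
    """Build a runner that shells out to git inside repo_path (hypothetical)."""
    def run(args: Sequence[str]) -> Optional[str]:
        try:
            proc = subprocess.run(["git", "-C", repo_path, *args],
                                  capture_output=True, text=True, timeout=30)
        except (OSError, subprocess.TimeoutExpired):
            return None
        return proc.stdout.strip() if proc.returncode == 0 else None
    return run


def repo_is_behind(run_git: GitRunner) -> Optional[bool]:
    """True if behind upstream, False if not, None if it cannot be determined."""
    run_git(["fetch", "--quiet", "origin"])  # best-effort: result is ignored
    local = run_git(["rev-parse", "HEAD"])
    remote = run_git(["rev-parse", "@{u}"])
    if local is None or remote is None:
        return None  # offline or no upstream: skip the check silently
    if local == remote:
        return False
    # Commits reachable from upstream but not from HEAD (assumed refinement).
    behind = run_git(["rev-list", "--count", "HEAD..@{u}"])
    return None if behind is None else int(behind) > 0


def fake_git(outputs: dict) -> GitRunner:
    """Test double: answer from a dict keyed by the space-joined arguments."""
    return lambda args: outputs.get(" ".join(args))
```

A caller in fix mode would treat `True` as "emit C-14 and skip writes" and `None` as "check skipped", which keeps the best-effort behaviour described in T02.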
+
+---
+
+### T05 — Makefile: fix-consistency-remote target
+
+```task
+id: CUST-WP-0026-T05
+status: todo
+priority: low
+state_hub_task_id: "b8375cbc-9c44-48f6-a78c-b7333d409525"
+```
+
+Add a convenience target to `state-hub/Makefile` that pulls before fixing:
+
+```makefile
+## Pull repo then sync consistency: make fix-consistency-remote REPO=net-kingdom
+fix-consistency-remote:
+	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
+	$(eval REPO_PATH := $(shell \
+		curl -s $(API_BASE)/repos/?slug=$(REPO) | \
+		python3 -c "import json,sys; \
+		repos=json.load(sys.stdin); \
+		print(next((r['local_path'] for r in repos if r['slug']=='$(REPO)'), ''))" \
+	))
+	@test -n "$(REPO_PATH)" || (echo "ERROR: repo '$(REPO)' not found in state-hub"; exit 1)
+	git -C "$(REPO_PATH)" pull --ff-only || \
+		(echo "WARN: pull failed (conflicts or no remote) — running fix-consistency anyway"; true)
+	$(MAKE) fix-consistency REPO=$(REPO) REPO_PATH=$(REPO_PATH)
+```
+
+This makes the safe path the convenient path:
+`make fix-consistency-remote REPO=net-kingdom`
+
+## Done Criteria
+
+- [ ] `make fix-consistency REPO=net-kingdom` never regresses a `done` task
+      back to `todo` when local file is stale
+- [ ] C-13 warning is emitted (not error) when DB is ahead of file
+- [ ] C-14 warning is emitted in fix mode when repo is behind remote;
+      fix operations are skipped for that repo
+- [ ] DB→file writeback commits corrected status to the workplan file
+- [ ] `--no-writeback` flag disables writeback cleanly
+- [ ] `make fix-consistency-remote REPO=` pulls then fixes in one step
+- [ ] `make test` passes after all changes
+- [ ] Session protocol updated in CLAUDE.md and session-protocol.template
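For the writeback criterion above, the T03 guard rails (touch only `status:` lines, only inside the matching task block, and stay idempotent) could be sketched as a pure text transformation. This is a minimal sketch under assumptions: `write_back_status` is a hypothetical name, the block layout is taken from the task blocks in this workplan, and the staging and commit steps are left out.

```python
import re

FENCE = "`" * 3  # built at runtime so the example nests cleanly in markdown
TASK_BLOCK = re.compile(
    "(" + re.escape(FENCE) + r"task\n)(.*?)(\n" + re.escape(FENCE) + ")",
    re.DOTALL,
)


def write_back_status(text: str, task_id: str, new_status: str) -> str:
    """Return text with status updated in the task block whose id matches.

    Hypothetical helper: lines outside task blocks, other task blocks, and
    every field other than status are returned unchanged.
    """
    id_line = re.compile(rf"^id: {re.escape(task_id)}$", re.MULTILINE)

    def patch(match):
        head, body, tail = match.groups()
        if not id_line.search(body):
            return match.group(0)  # a different task: leave it untouched
        body = re.sub(r"^status: .*$", lambda _: f"status: {new_status}",
                      body, count=1, flags=re.MULTILINE)
        return head + body + tail

    return TASK_BLOCK.sub(patch, text)


if __name__ == "__main__":
    sample = "\n".join([FENCE + "task", "id: DEMO-T01", "status: todo",
                        "priority: high", FENCE])
    print(write_back_status(sample, "DEMO-T01", "done"))
```

Running the function a second time with the same arguments returns the text unchanged, which is the idempotence property the Implementation Notes call for.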