---
id: CUST-WP-0026
type: workplan
title: "Distributed Consistency — Multi-Machine State Sync"
domain: custodian
repo: the-custodian
status: done
owner: custodian
topic_slug: custodian
created: "2026-03-21"
updated: "2026-03-21"
state_hub_workstream_id: "32de6210-ce1e-4cba-ad1f-fdeba462030d"
---

# Distributed Consistency — Multi-Machine State Sync

## Problem

The consistency checker assumes local workplan files are always the authoritative
source of truth. This breaks in the primary development workflow:

1. Implementation runs on **CoulombCore** (remote)
2. Task status is written to the **state-hub DB** via ops-bridge tunnel
3. The **workstation's local repo** is not updated (no `git pull`)
4. Session close triggers `fix-consistency` on the workstation
5. Checker reads stale local files (tasks still `todo`) and **regresses** DB
   status — overwriting `done`/`in_progress` back to `todo`
6. The dashboard shows progress, then silently reverts

The root cause is a design assumption in ADR-001 that breaks under multi-machine
workflows: ADR-001 states that the DB is rebuilt from files, which is only safe
when local files are always up to date.

## Goal

Eliminate false regressions and make `fix-consistency` safe to run regardless
of local repo staleness. Three layers of defence:

- **T01** (no-regress rule): Never allow fix-consistency to move a task
  *backwards* in status. DB-ahead wins.
- **T02** (pull gate): Detect and warn when local repo is behind its remote
  before applying fixes.
- **T03** (DB→file writeback): Write DB status back into workplan files and
  commit, so files stay truthful and the multi-machine workflow naturally
  converges.

## Implementation Notes

The status progression order for the no-regress rule (in T01's `STATUS_ORDER`,
`in_progress` and `blocked` share a rank, as do `done` and `cancelled`):
`todo → in_progress → blocked → done → cancelled`

For the pull gate, `git fetch` is the only network call needed. No push, no
merge, just detection. Fix mode should warn and skip writes for a repo that is
behind its remote; check mode should always be allowed to report.

For writeback (T03), `fix-consistency --fix` needs to:
1. Detect tasks where DB status > file status
2. Edit the workplan file (update the `status:` field in the task block)
3. Stage and commit the change with a standard commit message

Writeback must be idempotent and must not alter anything other than `status:`
fields in task blocks.

## Tasks

### T01 — No-regress rule in consistency_check.py

```task
id: CUST-WP-0026-T01
status: done
priority: high
state_hub_task_id: "34a76f4c-ad3f-4780-ad62-1e788ceca224"
```

Modify `state-hub/scripts/consistency_check.py` so that `--fix` mode never
regresses task status in the DB.

**Status ordering:**
```python
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1,
                "done": 2, "cancelled": 2}
```

In the C-11 fix path (file task found, DB task found, statuses differ):
- If `STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]` (DB is ahead of
  the file, or the two statuses share a rank): skip the DB
  update, emit a new check code **C-13** WARN:
  `"DB task '{title}' is ahead of file (db={db_status}, file={file_status}) — skipped to prevent regression"`
- If `STATUS_ORDER[db_status] < STATUS_ORDER[file_status]`: apply the update
  as today (file is ahead, sync forward)

New check code **C-13**: "DB task ahead of workplan file — regression
prevented". Severity: WARN (not FAIL — this is expected in multi-machine
workflows).

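The comparison above can be sketched as a small pure function. This is a minimal illustration rather than the actual code in `consistency_check.py`: the helper name `decide_c11_fix` and its `"skip"`/`"update"` return values are hypothetical, and it assumes both statuses are keys of `STATUS_ORDER`.

```python
# Ranks copied from the STATUS_ORDER table in this workplan.
STATUS_ORDER = {"todo": 0, "in_progress": 1, "blocked": 1,
                "done": 2, "cancelled": 2}


def decide_c11_fix(db_status: str, file_status: str) -> str:
    """Decide what --fix does with a C-11 status mismatch.

    Hypothetical helper: returns "skip" (report C-13, leave the DB alone)
    or "update" (copy the file status into the DB).
    """
    if STATUS_ORDER[db_status] >= STATUS_ORDER[file_status]:
        return "skip"    # DB is ahead or level: never overwrite it from the file
    return "update"      # file is ahead: sync the DB forward


if __name__ == "__main__":
    print(decide_c11_fix("done", "todo"))          # stale file, DB must win
    print(decide_c11_fix("todo", "in_progress"))   # file is ahead, DB is updated
```

Keeping the decision free of I/O makes the no-regress rule easy to unit test separately from the DB and file access around it.
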
Gate: `make test` must pass after this change.

---

### T02 — Git pull gate before --fix

```task
id: CUST-WP-0026-T02
status: done
priority: high
state_hub_task_id: "f9dbad4e-ba66-4e20-83ef-93b78c9e1590"
```

Add a remote-staleness check to `consistency_check.py` that runs at the start
of `--fix` mode for each repo being checked.

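**Detection logic:**
```bash
git -C <repo_path> fetch --quiet origin 2>/dev/null
LOCAL=$(git -C <repo_path> rev-parse HEAD)
REMOTE=$(git -C <repo_path> rev-parse '@{u}' 2>/dev/null)
BASE=$(git -C <repo_path> merge-base HEAD '@{u}' 2>/dev/null)
# If REMOTE is resolvable and BASE != REMOTE → upstream has commits missing locally (repo is behind)
# LOCAL != REMOTE alone is not enough: it is also true when the repo is only ahead (BASE == REMOTE)
```
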
If the repo is behind its remote tracking branch:
- In `--fix` mode: emit **C-14** WARN and skip all write operations for that
  repo. Print: `"Repo '{slug}' is behind remote — pull before fixing to avoid
  clobbering remote progress"`.
- In check-only mode: emit C-14 INFO (no-op, just informational).

The `git fetch` must be best-effort — if the remote is unreachable (offline,
ops-bridge down), skip the check silently rather than failing.

New check code **C-14**: "Repo behind remote tracking branch". Severity: WARN
in fix mode, INFO in check mode.

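One possible shape for the staleness check in Python is sketched below. It is an illustration under stated assumptions, not the shipped implementation: `make_git_runner` and `repo_is_behind` are hypothetical names, the remote is assumed to be called `origin`, and "behind" is taken to mean that the upstream branch has commits the local branch lacks, which the sketch determines with `git merge-base`. The git runner is injected so a test can replace it with canned output.

```python
import subprocess
from typing import Callable, List, Optional

# A runner takes git arguments and returns stripped stdout, or None on failure.
GitRunner = Callable[[List[str]], Optional[str]]


def make_git_runner(repo_path: str, timeout: float = 15.0) -> GitRunner:
    """Build a runner that executes `git -C <repo_path> ...` and never raises."""
    def run(args: List[str]) -> Optional[str]:
        try:
            proc = subprocess.run(
                ["git", "-C", repo_path, *args],
                capture_output=True, text=True, timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            return None
        return proc.stdout.strip() if proc.returncode == 0 else None
    return run


def repo_is_behind(git: GitRunner) -> Optional[bool]:
    """True if upstream has commits missing locally, False if not, None if unknown."""
    if git(["fetch", "--quiet", "origin"]) is None:
        return None                      # remote unreachable: skip the check silently
    remote = git(["rev-parse", "@{u}"])
    base = git(["merge-base", "HEAD", "@{u}"])
    if remote is None or base is None:
        return None                      # no upstream configured: nothing to compare
    return base != remote                # upstream tip is not contained in HEAD
```

A caller might use `repo_is_behind(make_git_runner(repo_path))` and report C-14 only when the result is `True`; a `None` result corresponds to the best-effort case where the check is skipped.
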
Gate: `make test` must pass. Add a test that simulates a behind-remote repo
(mock `rev-parse` output).

---

### T03 — DB→file status writeback

```task
id: CUST-WP-0026-T03
status: done
priority: medium
state_hub_task_id: "749130f9-b397-46fd-8eb3-43c0fc127dac"
```

Extend `consistency_check.py --fix` to write DB status back into workplan
files when DB is ahead of the file (the C-13 case from T01).

**Writeback logic:**
1. Locate the task block in the workplan file by matching `id: <task_id>`
2. Replace the `status: <old>` line within that block with `status: <new>`
3. Stage the file: `git -C <repo_path> add <workplan_file>`
4. Commit with message:
   ```
   chore(consistency): sync task status from DB [auto]

   Updated by fix-consistency on <ISO-date>:
   - <task_id>: <old_status> → <new_status>
   ```

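The commit message in step 4 could be assembled along these lines. This is a sketch only: `build_commit_message` is a hypothetical helper, and rendering `<ISO-date>` as a calendar date (rather than a full timestamp) is an assumption.

```python
from datetime import date
from typing import List, Optional, Tuple

SUBJECT = "chore(consistency): sync task status from DB [auto]"


def build_commit_message(changes: List[Tuple[str, str, str]],
                         on: Optional[date] = None) -> str:
    """Render the message for a list of (task_id, old_status, new_status)."""
    stamp = (on or date.today()).isoformat()   # assumption: date only, no time
    lines = [SUBJECT, "", f"Updated by fix-consistency on {stamp}:"]
    lines += [f"- {task_id}: {old} → {new}" for task_id, old, new in changes]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_commit_message([("CUST-WP-0026-T01", "todo", "done")]))
```

Passing `on` explicitly keeps the output deterministic in tests.
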
**Guard rails:**
- Only modify lines inside a ` ```task ... ``` ` block
- Only change the `status:` field — never touch `id:`, `priority:`,
  `state_hub_task_id:`, or any other field
- If the workplan file has uncommitted local changes, skip writeback for that
  file and emit C-14 WARN ("workplan has uncommitted changes — skipping
  writeback")
- If git commit fails for any reason, log the error but do not abort the rest
  of the consistency run

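The first two guard rails can be illustrated with a line-based rewrite that only ever touches a `status:` line inside the task block whose `id:` matches. This is a minimal sketch, not the checker's real parser: `rewrite_task_status` is a hypothetical name, and it assumes task blocks are fenced exactly as in this workplan, with one `key: value` field per line.

```python
FENCE = "`" * 3               # built at runtime so this example nests in Markdown
TASK_OPEN = FENCE + "task"


def rewrite_task_status(text: str, task_id: str, new_status: str) -> str:
    """Return `text` with the status of task `task_id` set to `new_status`.

    Lines outside the matching task block, and every field other than
    `status:`, are returned unchanged. Re-running with the same arguments
    is a no-op, so the rewrite is idempotent.
    """
    lines = text.splitlines(keepends=True)
    block_start = None                       # index of first line inside a task block
    for i, line in enumerate(lines):
        stripped = line.strip()
        if block_start is None:
            if stripped == TASK_OPEN:
                block_start = i + 1
            continue
        if stripped != FENCE:
            continue
        block = range(block_start, i)        # the lines between the two fences
        block_start = None
        if not any(lines[j].strip() == f"id: {task_id}" for j in block):
            continue
        for j in block:
            body = lines[j].rstrip("\r\n")
            if body.lstrip().startswith("status:"):
                indent = body[: len(body) - len(body.lstrip())]
                ending = lines[j][len(body):]    # keep the original line ending
                lines[j] = f"{indent}status: {new_status}{ending}"
        break
    return "".join(lines)
```

Working on whole lines rather than a regex over the full file keeps the change surface small: a `status:` string in prose or in another task's block is never a candidate for replacement.
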
**New flag:** `--no-writeback` — disables T03 behaviour while keeping T01/T02
active. Default: writeback enabled when `--fix` is set.

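The flag semantics could be expressed with `argparse` roughly as follows. How `consistency_check.py` actually parses its arguments is not shown in this workplan, so `build_parser` and `writeback_enabled` are hypothetical names and the help strings are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two flags discussed in this workplan."""
    parser = argparse.ArgumentParser(prog="consistency_check.py")
    parser.add_argument("--fix", action="store_true",
                        help="apply fixes instead of only reporting")
    parser.add_argument("--no-writeback", action="store_true",
                        help="with --fix, skip DB-to-file writeback (T03)")
    return parser


def writeback_enabled(args: argparse.Namespace) -> bool:
    """Writeback runs only in fix mode and only if not explicitly disabled."""
    return args.fix and not args.no_writeback


if __name__ == "__main__":
    args = build_parser().parse_args(["--fix", "--no-writeback"])
    print(writeback_enabled(args))
```

Deriving the setting in one place means the T01 and T02 paths never need to know about `--no-writeback`.
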
Gate: `make test` must pass. The existing workplan parsing tests should cover
the task block regex; add a writeback-specific test.

---

### T04 — Session protocol update

```task
id: CUST-WP-0026-T04
status: done
priority: medium
state_hub_task_id: "59a5d09a-1e67-4749-9d84-039982edc3ef"
```

Update `the-custodian/CLAUDE.md` session close protocol (step 5) to reflect
the new behaviour and add the recommended pre-fix step:

**Current step 5:**
> If any workplan files were written or modified this session, run:
> `make fix-consistency REPO=the-custodian`

**Updated step 5:**
> Before running fix-consistency on any repo that has a remote, ensure the
> local copy is up to date:
> ```bash
> git -C <repo_path> pull --ff-only
> cd state-hub && make fix-consistency REPO=<slug>
> ```
> The consistency checker will now warn (C-14) if the repo is still behind
> and refuse to regress status (C-13). A C-13 warning is normal for repos
> where work has progressed on a remote machine — it means writeback is
> keeping the files in sync.

Also update the `state-hub/scripts/project_rules/session-protocol.template`
so newly registered repos get the updated guidance.

---

### T05 — Makefile: fix-consistency-remote target

```task
id: CUST-WP-0026-T05
status: done
priority: low
state_hub_task_id: "b8375cbc-9c44-48f6-a78c-b7333d409525"
```

Add a convenience target to `state-hub/Makefile` that pulls before fixing:

```makefile
## Pull repo then sync consistency: make fix-consistency-remote REPO=net-kingdom
fix-consistency-remote:
	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
	$(eval REPO_PATH := $(shell \
		curl -s "$(API_BASE)/repos/?slug=$(REPO)" | \
		python3 -c "import json,sys; \
			repos=json.load(sys.stdin); \
			print(next((r['local_path'] for r in repos if r['slug']=='$(REPO)'), ''))" \
	))
	@test -n "$(REPO_PATH)" || (echo "ERROR: repo '$(REPO)' not found in state-hub"; exit 1)
	git -C "$(REPO_PATH)" pull --ff-only || \
		(echo "WARN: pull failed (conflicts or no remote) — running fix-consistency anyway"; true)
	$(MAKE) fix-consistency REPO=$(REPO) REPO_PATH="$(REPO_PATH)"
```

|
|
This makes the safe path the convenient path:
|
|
`make fix-consistency-remote REPO=net-kingdom`
|
|
|
|
## Done Criteria
|
|
|
|
- [ ] `make fix-consistency REPO=net-kingdom` never regresses a `done` task
|
|
back to `todo` when local file is stale
|
|
- [ ] C-13 warning is emitted (not error) when DB is ahead of file
|
|
- [ ] C-14 warning is emitted in fix mode when repo is behind remote;
|
|
fix operations are skipped for that repo
|
|
- [ ] DB→file writeback commits corrected status to the workplan file
|
|
- [ ] `--no-writeback` flag disables writeback cleanly
|
|
- [ ] `make fix-consistency-remote REPO=<slug>` pulls then fixes in one step
|
|
- [ ] `make test` passes after all changes
|
|
- [ ] Session protocol updated in CLAUDE.md and session-protocol.template
|