state-hub/scripts/push-seal.md
tegwick 6cbf2d2c56 feat(consistency): T04 push seal — closed-loop writeback for automated commits
Root cause of the 501-commit pile-up in inter-hub: fix_repo() created
git commits (brief updates, T03 writebacks) but never pushed them, so
the 15-minute timer accumulated local commits indefinitely. Once real
development landed on remote the repos diverged with no self-healing path.

Changes
-------
repo_sync.py (new module)
  Extracts all git lifecycle primitives: pull_ff, push_ff,
  count_remote_ahead (C-16 input), count_local_ahead (C-17/T04 input).
  Module docstring documents the push-seal invariant and stable state.

consistency_check.py
  - Imports primitives from repo_sync; thin _detect_behind_remote wrapper
    preserves backward compat for existing callers and tests.
  - C-17 backlog guard: if local has unpushed commits from a prior failed
    push, retry before making more; skip all writes if push still fails.
  - T04 push seal: unconditional push_ff() at end of every fix_repo() run.
  - _report_needs_action: ahead_of_remote param so repos with unpushed
    backlogs are not silently skipped as "clean" by fix_all_remote().
  - Domain-slug fallback: brief no longer degrades to "(unknown)" when all
    workplans are completed — falls back to any workstream for domain context.
  - Service switched from --all --fix to --remote --all (pulls before
    fixing, skips already-clean repos).

push-seal.md (new)
  Capability documentation: the problem, the invariant, all three checks
  (C-16/C-17/T04), stable-state description, API reference, and test map.

test_repo_sync.py (new, 32 tests)
  Full coverage of all four primitives via real git repos (tmp_path).
  Includes C-17 scenario, push-seal invariant, and four end-to-end
  loop-stability tests.

test_consistency_check.py
  Four new _report_needs_action cases for the ahead_of_remote parameter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 01:43:40 +02:00


Push Seal (T04)

Module: repo_sync.py
Check IDs: C-16 (pull gate), C-17 (backlog guard), T04 (push seal)
Tests: tests/test_repo_sync.py


The problem it solves

The consistency engine auto-commits two kinds of changes to managed repos:

  • Writebacks (T03): task status patched into workplan files when the DB is ahead of the file
  • Brief updates: .custodian-brief.md regenerated when workstream progress changes

Before T04, these commits were created but never pushed. The custodian-sync.service timer fires every 15 minutes, and each firing that detected any change created a new commit. Over time, especially when a repo had no active workstreams and the brief kept oscillating, hundreds of auto-commits piled up locally while the remote received none. When real development commits landed on the remote, the local and remote histories diverged and could not be reconciled without manual intervention.

Root causes:

  1. Commits were made without a corresponding push — an open loop.
  2. C-16 (pull gate) only fires when remote is ahead of local, not when local is ahead. Once the first auto-commit ran, local was already ahead and C-16 never fired again.
  3. The .custodian-brief.md domain-slug lookup produced (unknown) for fully-completed repos (no active workstreams → no topic lookup → empty string), causing the brief to be considered "changed" on every run.

The invariant

Every fix_repo() run that creates commits must push them to remote before returning, so the next run starts with local ≡ remote.

When this holds, the timer loop is idempotent: a clean repo with no unpushed commits is detected as clean and skipped entirely — no git activity, no divergence.


Three interlocking checks

C-16 — pull gate (T02, pre-existing)

Fires when count_remote_ahead(repo_path) > 0.

Remote has commits local lacks. Writing more commits on top would make local and remote diverge, so all write operations are skipped; the --remote flag in the service pulls before fixing.

C-17 — backlog guard (new)

Fires when count_local_ahead(repo_path) > 0 AND push_ff() fails.

Local has unpushed commits from a prior run where the push failed. Before making more commits this run, the guard attempts to push the backlog. If push fails (e.g. the repo has diverged), all write operations for this run are skipped. This prevents the commit count from growing further during an already-broken state.

If push succeeds, the guard clears itself ("C-17 cleared: pushed N backlogged commit(s)") and the run proceeds normally.
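The guard's decision can be sketched as below. This is a minimal illustration under stated assumptions, not the code in consistency_check.py: backlog_guard is a hypothetical name, the failure message is invented for the example, and the two primitives are passed in as callables so the sketch runs without a git repo.

```python
from typing import Callable, Tuple


def backlog_guard(
    repo_path: str,
    count_local_ahead: Callable[[str], int],
    push_ff: Callable[[str], Tuple[bool, str]],
) -> Tuple[bool, str]:
    """Return (writes_allowed, message) for the current run (hypothetical helper)."""
    backlog = count_local_ahead(repo_path)
    if backlog == 0:
        # Nothing unpushed from a prior run: the guard does not fire.
        return True, ""

    # Retry the push that failed on an earlier run before committing anything new.
    ok, detail = push_ff(repo_path)
    if ok:
        return True, f"C-17 cleared: pushed {backlog} backlogged commit(s)"

    # Push still fails: block all writes so the backlog cannot grow.
    # (The wording of this message is an assumption.)
    return False, f"C-17: {backlog} unpushed commit(s), push failed: {detail}"
```

The important property is that the failure branch returns before any commit is created, so a broken repo stays at its current backlog size instead of gaining one commit per timer firing.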

T04 — push seal (new)

At the end of every fix_repo() run, after all writebacks and the brief update, push_ff() is called unconditionally.

fix_repo()
  ├── C-16 check  → skip all writes if behind remote
  ├── C-17 guard  → retry backlog push; skip all writes if push still fails
  ├── apply fixable issues (C-04, C-05, C-09, C-10, C-11, C-12, C-13, C-15)
  ├── _write_custodian_brief()  → commit if content changed
  └── push_ff()  ← T04: seal the loop

A push on a repo with no new commits is a no-op that returns success, so T04 is safe to call unconditionally and is harmless on runs that created nothing.
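The ordering in the diagram above can be expressed as a short control-flow sketch. It is an assumption-laden outline rather than the real fix_repo(): the function name fix_repo_sketch, the apply_fixes callable (standing in for the fixable checks plus the brief update), and the log strings are all hypothetical.

```python
from typing import Callable, List, Tuple


def fix_repo_sketch(
    repo_path: str,
    count_remote_ahead: Callable[[str], int],
    count_local_ahead: Callable[[str], int],
    push_ff: Callable[[str], Tuple[bool, str]],
    apply_fixes: Callable[[str], None],
) -> List[str]:
    """Run the gates in order and return a log of what happened."""
    log: List[str] = []

    # C-16: remote is ahead, so any new commit would cause divergence.
    if count_remote_ahead(repo_path) > 0:
        log.append("C-16: behind remote, writes skipped")
        return log

    # C-17: retry a backlog left by an earlier failed push.
    if count_local_ahead(repo_path) > 0:
        ok, _ = push_ff(repo_path)
        if not ok:
            log.append("C-17: backlog push failed, writes skipped")
            return log
        log.append("C-17 cleared")

    # Writebacks and brief update; may create zero or more commits.
    apply_fixes(repo_path)

    # T04: unconditional push so the run ends with local == remote.
    ok, _ = push_ff(repo_path)
    log.append("T04: pushed" if ok else "T04: push failed")
    return log
```

Both early returns happen before apply_fixes, which is what keeps a behind or backlogged repo from accumulating further commits.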


Stable state

Once T04 is in effect, the timer converges to this stable state:

repo clean  AND  local ≡ remote
→ _report_needs_action(report, behind=0, ahead=0) → False
→ repo skipped (logged as CLEAN)
→ no git activity
→ state unchanged on next fire

The only way to leave stable state is:

  • A real fix is needed (task drift, brief content change, etc.) → run completes with a push → returns to stable state
  • Remote receives new commits → C-16 fires and the --remote service path pulls → returns to stable state
  • Push fails (network error, divergence) → C-17 fires next run → retries push → returns to stable state or escalates to a visible WARN
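The skip decision behind the stable state reduces to a three-way predicate. The sketch below assumes a simplified signature that takes an issue count instead of the real report object; needs_action is a hypothetical stand-in for _report_needs_action, whose actual parameters are not reproduced here.

```python
def needs_action(issue_count: int, behind: int = 0, ahead: int = 0) -> bool:
    """True if the repo must be processed on this timer firing.

    issue_count: number of fixable issues found by the check (assumed shape)
    behind:      commits remote has that local lacks (C-16 input)
    ahead:       commits local has that remote lacks (C-17 / T04 input)
    """
    return issue_count > 0 or behind > 0 or ahead > 0
```

Including ahead in the predicate is the part that matters for T04: a repo with no issues but an unpushed backlog is no longer treated as clean, so the backlog gets another push attempt instead of being silently skipped.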

The brief regeneration was also fixed to avoid oscillating between inter_hub and (unknown) for repos with no active workstreams.

Before: domain slug was resolved only via active workstreams. A repo with all workplans completed had no active workstreams → topic lookup returned nothing → domain_slug = "" → brief rendered (unknown) → brief was considered "changed" on every run → spurious commit every 15 minutes.

After: if no active workstreams exist, the lookup falls back to any workstream (completed or archived) on the same repo. Domain context is preserved permanently.
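A minimal sketch of the fallback, assuming each workstream is a mapping with status and domain_slug keys. That data shape and the name resolve_domain_slug are illustrative assumptions, not the engine's actual model.

```python
from typing import Iterable, Mapping


def resolve_domain_slug(workstreams: Iterable[Mapping[str, str]]) -> str:
    """Prefer an active workstream's slug, else fall back to any workstream."""
    # sorted() is stable, and False sorts before True, so active
    # workstreams come first while the original order is otherwise kept.
    ordered = sorted(workstreams, key=lambda w: w.get("status") != "active")
    for workstream in ordered:
        slug = workstream.get("domain_slug", "")
        if slug:
            return slug
    return ""  # repo has no workstreams at all
```

With this ordering a fully completed repo resolves to the same slug on every run, so the rendered brief stops changing and no spurious commit is produced.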


Service configuration

# /home/worsch/.config/systemd/user/custodian-sync.service
ExecStart=… consistency_check.py --remote --all

--remote --all uses fix_all_remote(), which:

  1. Runs check_repo() + count_remote_ahead() + count_local_ahead() for each repo
  2. Skips repos that are already clean (no issues, not behind, not ahead)
  3. For repos needing action: git pull --ff-only first, then fix_repo() (which ends with T04 push)

Previously --all --fix was used, which skipped the pull step and the clean-repo skip logic.


Public API (repo_sync.py)

| Function | Returns | Purpose |
|---|---|---|
| pull_ff(repo_path) | (bool, str) | git pull --ff-only; T02 pull gate |
| push_ff(repo_path) | (bool, str) | git push; T04 push seal |
| count_remote_ahead(repo_path) | int | commits remote has that local lacks; C-16 input |
| count_local_ahead(repo_path) | int | commits local has that remote lacks; C-17 / T04 input |

All functions are best-effort: errors return 0 / (False, "…") rather than raising. The consistency engine must never be blocked by transient network issues.

Note: push_ff does not pass --ff-only to git — that flag applies to git pull, not git push. git push already rejects non-fast-forward updates by default. The function name signals the semantic guarantee (no force-push), not a CLI flag.
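One way the four primitives could be written with subprocess is shown below. This is a sketch of the best-effort contract, not the contents of repo_sync.py: the _git and _count helpers, the 60-second timeout, and the use of git rev-list --count against @{upstream} are assumptions. In particular, it compares against the last-fetched upstream ref; whether the real module fetches first is not stated here.

```python
import subprocess
from typing import List, Tuple


def _git(repo_path: str, args: List[str]) -> Tuple[bool, str]:
    """Run git in repo_path; never raise. Returns (ok, stdout or error text)."""
    try:
        proc = subprocess.run(
            ["git", "-C", repo_path, *args],
            capture_output=True, text=True, timeout=60,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        # git missing, timeout, etc.: report failure instead of raising.
        return False, str(exc)
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


def pull_ff(repo_path: str) -> Tuple[bool, str]:
    return _git(repo_path, ["pull", "--ff-only"])


def push_ff(repo_path: str) -> Tuple[bool, str]:
    # Plain push: git rejects non-fast-forward updates unless forced.
    return _git(repo_path, ["push"])


def _count(repo_path: str, rev_range: str) -> int:
    ok, out = _git(repo_path, ["rev-list", "--count", rev_range])
    if not ok:
        return 0  # safe default: non-git path, no upstream, and so on
    try:
        return int(out)
    except ValueError:
        return 0


def count_remote_ahead(repo_path: str) -> int:
    return _count(repo_path, "HEAD..@{upstream}")


def count_local_ahead(repo_path: str) -> int:
    return _count(repo_path, "@{upstream}..HEAD")
```

Every failure path collapses to 0 or (False, message), which matches the rule that callers never need a try/except around these functions.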


Compatibility aliases in consistency_check.py

The original inline git functions have been replaced by imports from repo_sync, with thin wrappers for backward compatibility:

from repo_sync import count_local_ahead, count_remote_ahead, pull_ff, push_ff

_detect_ahead_of_remote = count_local_ahead
_git_pull = pull_ff
_git_push = push_ff

def _detect_behind_remote(repo_path: str) -> bool:
    return count_remote_ahead(repo_path) > 0

Existing code and tests that import _detect_behind_remote or _git_pull from consistency_check continue to work unchanged.


Test coverage (tests/test_repo_sync.py)

All tests use real git repos via tmp_path — no mocks, no subprocess patches. A git_pair fixture provides a bare remote + one clone with an initial pushed commit. A _make_second_clone helper creates a second worker clone for divergence scenarios.
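A fixture of this kind could be built roughly as follows. The sketch is a plain function rather than the actual git_pair pytest fixture, and the helper names, the identity passed via -c, and the file name are all hypothetical; in a test suite, base would typically be pytest's tmp_path.

```python
import subprocess
from pathlib import Path
from typing import Tuple

# Throwaway identity so commits work without any global git config.
GIT_ID = [
    "-c", "user.name=sketch",
    "-c", "user.email=sketch@example.invalid",
    "-c", "commit.gpgsign=false",
]


def git(cwd: Path, *args: str) -> str:
    """Run a git command in cwd, raising on failure; return stripped stdout."""
    proc = subprocess.run(
        ["git", *GIT_ID, *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return proc.stdout.strip()


def make_git_pair(base: Path) -> Tuple[Path, Path]:
    """Create a bare remote plus one clone holding an initial pushed commit."""
    remote = base / "remote.git"
    clone = base / "clone"
    git(base, "init", "--bare", str(remote))
    git(base, "clone", str(remote), str(clone))
    (clone / "README").write_text("init\n")
    git(clone, "add", "README")
    git(clone, "commit", "-m", "initial")
    # -u records the upstream so @{upstream} resolves in later checks.
    git(clone, "push", "-u", "origin", "HEAD")
    return remote, clone
```

A second clone for divergence scenarios can be made the same way by cloning the returned remote into another directory.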

| Class | What it covers |
|---|---|
| TestErrorResilience | All four functions return safe defaults for non-git paths, non-existent paths, and repos with no upstream |
| TestCountLocalAhead | 0 when in sync; N after N local commits; 0 after push |
| TestCountRemoteAhead | 0 when in sync; 0 when local is ahead (not behind); N when remote has N new commits; nonzero when diverged |
| TestPushFf | Success when ahead; no-op success when in sync; push-seal invariant (ahead=0 after push); idempotent; fails when diverged; C-17 scenario (diverged push rejection does not grow the backlog) |
| TestPullFf | Up-to-date no-op; pulls new remote commits; resolves behind state to 0; fails when diverged; fails with no remote |
| TestPushSealLoop | Full fix-run cycle leaves repo clean; multiple writebacks sealed in one push; no-commit run needs no push; runaway prevention on failed push |