feat(maintenance): nightly stale SSH forward cleanup at 03:00

Add bridge maintenance cleanup to detect reverse tunnels whose remote
port is bound but no longer forwards (zombie sshd sessions), kill the
stale listeners on the remote host, and optionally restart the tunnel.

Includes install-cron/uninstall-cron/show-cron helpers and README notes
for the actcore-state-hub-bridge failure mode we hit on railiance01.
2026-06-19 15:59:27 +02:00
parent a6857fb8f7
commit 4e9882909f
5 changed files with 565 additions and 1 deletions

@@ -243,6 +243,31 @@ has not yet cleaned up the socket), so the next reconnect attempt hits
"remote port forwarding failed" and exits with code 255. With ClientAlive
enabled, sshd evicts stale sessions within ~90 seconds and frees the port.

NIGHTLY STALE-FORWARD CLEANUP
-----------------------------

When a bridge client dies without tearing down its SSH session, the remote
host can keep a forwarded port (for example 18000) bound to a zombie sshd
listener. The port accepts connections but never forwards them, which breaks
in-cluster proxies such as actcore-state-hub-bridge on railiance01.
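
One way to tell a zombie listener from a working forward is to look at how a
probe fails. The sketch below is illustrative only and is not taken from the
bridge tool: classify_probe is a hypothetical helper, and it assumes the
service behind the tunnel speaks HTTP so that curl exit statuses can be used.

```shell
#!/bin/sh
# Hypothetical helper: map a curl exit status to a forward state.
# Assumption: the forwarded service answers HTTP; the README does not say so.
classify_probe() {
    case "$1" in
        0)        echo "forwarding" ;;  # got a response through the tunnel
        7)        echo "unbound" ;;     # could not connect: likely nothing listening
        28|52|56) echo "zombie" ;;      # connected, then timed out or got no data
        *)        echo "unknown" ;;
    esac
}

# Example probe, run on the remote host against the port named above:
#   curl --silent --output /dev/null --max-time 5 http://127.0.0.1:18000/
#   classify_probe "$?"
```

A "zombie" result matches the failure described above: the port accepts the
connection, but no data ever comes back.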

Install a 03:00 local-time cron job that probes each reverse tunnel's remote
forward, kills stale listeners when the local service is healthy but the
remote forward is not, and restarts the tunnel:

    bridge maintenance install-cron
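
The rule the job applies to each tunnel can be sketched as a small decision
function. This is a minimal illustration, not the tool's actual code:
decide_cleanup and its return strings are hypothetical, and skipping a tunnel
whose local service is down is an assumption the README does not state.

```shell
#!/bin/sh
# Hypothetical helper: decide what to do with one reverse tunnel.
#   $1 = exit status of a local health check (0 = healthy)
#   $2 = exit status of a probe through the remote forward (0 = healthy)
decide_cleanup() {
    local_status=$1
    remote_status=$2
    if [ "$local_status" -ne 0 ]; then
        # Assumption: with the local service down, killing the remote
        # listener would not help, so the tunnel is left alone.
        echo "skip"
    elif [ "$remote_status" -ne 0 ]; then
        # Local service is fine but the remote port does not forward:
        # this is the stale-listener case described above.
        echo "kill-stale"
    else
        echo "healthy"
    fi
}
```

Only the "kill-stale" outcome would lead to removing the remote listener and
restarting the tunnel.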

Manual run:

    bridge maintenance cleanup --restart

Inspect or remove the cron entry:

    bridge maintenance show-cron
    bridge maintenance uninstall-cron

Logs append to ~/.local/state/bridge/cleanup.log

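
For orientation, a crontab entry combining the 03:00 schedule, the manual
command, and the log path could look like the line printed below. This is a
guess at the shape of the entry, not the text install-cron actually writes;
cron_line is a hypothetical helper, and show-cron remains the authoritative
way to see the installed entry.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line for the nightly cleanup.
# Fields: minute hour day-of-month month day-of-week command
cron_line() {
    printf '%s\n' '0 3 * * * bridge maintenance cleanup --restart >> "$HOME/.local/state/bridge/cleanup.log" 2>&1'
}

cron_line
```

The ">>" redirect appends rather than truncates, which matches the note that
logs append to cleanup.log.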
Apply and reload (no disconnect):
sudo sed -i 's/#ClientAliveInterval 0/ClientAliveInterval 30/' /etc/ssh/sshd_config