docs: add server prerequisites and health check gotchas

Document ClientAliveInterval/ClientAliveCountMax requirement on remote
sshd to prevent stale sessions holding ports after reconnect. Document
fail2ban ignoreip setup. Clarify that health_check.url must be a local
port (not the remote forwarded port), and that SSE endpoints block the
health checker.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 02:41:17 +01:00
parent 60c742a456
commit 6673cb0e48

@@ -229,6 +229,44 @@ Source layout:
catalog/ OpsCatalog extension
SERVER PREREQUISITES
--------------------
For reliable auto-reconnect after reboots or network drops, the remote sshd
needs two settings in /etc/ssh/sshd_config:

    ClientAliveInterval 30
    ClientAliveCountMax 3

Without these, a dead SSH session can keep its remote port forward open,
because the server has not yet noticed that the client is gone and has not
cleaned up the socket. The next reconnect attempt then hits
"remote port forwarding failed" and exits with code 255. With ClientAlive
enabled, sshd probes the client every 30 seconds and drops the session after
3 unanswered probes, so stale sessions are evicted within ~90 seconds
(30 s x 3) and the port is freed.
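The ~90 second figure is ClientAliveInterval multiplied by
ClientAliveCountMax. As a rough sketch, the helper below reads both values
from an sshd_config-style file and prints the product. The name stale_window
is hypothetical and not part of this project. It assumes plain top-level
directives with values in whole seconds, and it ignores Match blocks and
Include files.

```shell
# Hypothetical helper, not part of this project: print the worst-case time
# (in seconds) before sshd drops an unresponsive client, computed as
# ClientAliveInterval * ClientAliveCountMax.
# Assumes plain top-level directives with values in whole seconds; Match
# blocks and Include files are not handled. sshd uses the first value it
# sees for a keyword, so only the first occurrence of each is read.
stale_window() {
    awk '
        BEGIN { interval = 0; count = 3 }   # sshd defaults
        tolower($1) == "clientaliveinterval" && !seen_i { interval = $2; seen_i = 1 }
        tolower($1) == "clientalivecountmax" && !seen_c { count = $2; seen_c = 1 }
        END { print interval * count }
    ' "$1"
}
```

With the two settings above, stale_window /etc/ssh/sshd_config prints 90. A
result of 0 means ClientAlive probing is disabled. To see the values sshd
itself resolved, sudo sshd -T | grep -i clientalive prints the effective
configuration.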
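Apply and reload (existing sessions stay connected):

    sudo sed -i 's/^#ClientAliveInterval 0/ClientAliveInterval 30/' /etc/ssh/sshd_config
    sudo sed -i 's/^#ClientAliveCountMax 3/ClientAliveCountMax 3/' /etc/ssh/sshd_config
    sudo sshd -t    # validate the config before reloading
    sudo kill -HUP $(cat /run/sshd.pid)

The sed commands only match the stock commented-out defaults. If your
sshd_config differs, edit the two settings by hand.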
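If the stock commented-out lines are missing or already changed, a more
tolerant edit can help. The sketch below is one possible approach, not the
project's tooling: set_sshd_option is a hypothetical helper that rewrites an
existing line for a keyword (commented out or not) or appends one if none
exists. It assumes GNU sed and matches the keyword case-sensitively.

```shell
# Hypothetical helper, not part of this project: set KEY to VALUE in an
# sshd_config-style FILE, whether or not a (possibly commented-out) line
# for KEY already exists. Uses GNU sed (-i, -E), as on typical Linux hosts.
set_sshd_option() {
    file=$1 key=$2 value=$3
    pattern="^[[:space:]]*#?[[:space:]]*${key}[[:space:]].*"
    if grep -Eq "$pattern" "$file"; then
        # Rewrite every existing line for KEY, commented out or not.
        sed -i -E "s/${pattern}/${key} ${value}/" "$file"
    else
        # No line for KEY yet: append one at the end of the file.
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}
```

Run it from a root shell, for example
set_sshd_option /etc/ssh/sshd_config ClientAliveInterval 30, then validate
with sshd -t before reloading. Note that an appended line lands at the end of
the file; if the file ends with a Match block, the setting would apply only
inside that block, so check the result by hand.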
If fail2ban is running on the remote, whitelist the bridge host IP so that
rapid reconnect storms (e.g. after a key auth failure) do not trigger a ban.
Add the bridge host IP to ignoreip in /etc/fail2ban/jail.local:

    [DEFAULT]
    ignoreip = 127.0.0.1/8 ::1 <your-bridge-host-ip>

Then reload:

    sudo systemctl reload fail2ban
Note: health_check.url must point to a LOCAL port (the local side of the
tunnel), not the remote forwarded port. For a reverse tunnel
(remote_port=18000, local_port=8000), the correct health check URL is
http://127.0.0.1:8000/..., NOT http://127.0.0.1:18000/...

For SSE endpoints (MCP), point the health check at a non-streaming endpoint
of the same service (e.g. the state-hub /state/health). The health checker
waits for the response to complete, and an SSE stream never completes, so
the check would block.
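To check by hand whether an endpoint is suitable, a probe along these lines
may help. This is a sketch using curl, not the project's health checker:
check_health is a hypothetical name, and the 5 second default timeout is an
assumption chosen for illustration.

```shell
# Hypothetical manual probe, not the project's health checker: succeed only
# if URL returns a complete, successful HTTP response within TIMEOUT seconds.
# -f fails on HTTP errors (status >= 400), -sS hides progress output but
# keeps error messages, --max-time caps the total transfer time.
check_health() {
    url=$1
    timeout=${2:-5}   # assumed default, in seconds
    curl -fsS --max-time "$timeout" -o /dev/null "$url"
}
```

For the reverse tunnel example above, and assuming the service on local port
8000 exposes /state/health, check_health http://127.0.0.1:8000/state/health
returns 0 when the endpoint answers. Pointed at an SSE endpoint, the response
never finishes, so curl hits --max-time and exits with code 28.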
DESIGN NOTES
------------
@@ -240,6 +278,9 @@ DESIGN NOTES
audit traceability (FRS §5.7).
- SSH command invoked: ssh -N -R remote_port:127.0.0.1:local_port
-i ssh_key ssh_user@host
- ExitOnForwardFailure=yes is set, so SSH exits immediately if the remote
port forward cannot be established (e.g. the port is already in use). This
is intentional: it forces a clean reconnect rather than silently running
without the port forward active.
REPO STRUCTURE