feat(restart): route reverse tunnels through stale-forward cleanup
bridge restart now means blank-slate recovery: reverse tunnels run should_cleanup_tunnel and clear orphan remote listeners before reconnecting; healthy forwards are left running. Local-direction tunnels keep stop/start only. CLI and MCP report per-tunnel actions (healthy, cleaned_and_restarted, restarted, error) and exit non-zero on cleanup failure. Closes BRIDGE-WP-0005.

Start a bridge:

```
ob up hostA=hostB
bridge up state-hub-railiance01
```

Check active bridges:

```
ob status
bridge status
```

Investigate infrastructure targets:

```
ob targets
bridge targets
```

Stop the bridge when finished:

```
ob down hostA=hostB
bridge down state-hub-railiance01
```
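
Every `bridge up` needs a matching `bridge down`. When a bridge is only needed for one task, a small wrapper can make that teardown automatic. The following is a minimal bash sketch. It assumes only the `bridge up NAME` and `bridge down NAME` forms shown above. `with_bridge` and the `BRIDGE_CMD` override are hypothetical helpers, not part of OpsBridge.

```bash
# Hypothetical helper, not part of OpsBridge.
# BRIDGE_CMD lets the sketch be exercised without the real CLI; defaults to `bridge`.
BRIDGE_CMD="${BRIDGE_CMD:-bridge}"

# with_bridge NAME COMMAND [ARGS...]
# Brings bridge NAME up, runs COMMAND, then always brings NAME down again.
with_bridge() {
  local name="$1"
  shift
  "$BRIDGE_CMD" up "$name" || return 1
  # Run in a subshell so the EXIT trap fires as soon as COMMAND finishes
  # (or fails), rather than when the caller's own shell exits.
  (
    trap '"$BRIDGE_CMD" down "$name"' EXIT
    "$@"
  )
}
```

For example, `with_bridge state-hub-railiance01 bridge status` would bring the bridge up, show its status and stop it again. The subshell keeps the `EXIT` trap local to the wrapped command. It does not replace any trap the calling shell already has.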

OpsBridge handles the lifecycle so operators can focus on solving the problem.

---

# Tunnel lifecycle commands

| Command | Purpose |
|---------|---------|
| `bridge up` | Start tunnel(s) that are not already running |
| `bridge down` | Stop tunnel(s) that are running |
| `bridge restart` | Blank-slate recovery: get tunnel(s) operational again |
| `bridge maintenance cleanup` | Proactive hygiene sweep without implying restart |

## `bridge restart` — blank-slate recovery

`bridge restart` means *operational again*, not merely cycling the local manager
PID while a broken remote listener still holds the port.

For **reverse** tunnels (which expose the State Hub on remote hosts), restart:

1. Runs `should_cleanup_tunnel` to detect stale SSH remote forwards
2. Clears orphan listeners on the remote host when needed
3. Reconnects the tunnel (stop + start) only when cleanup was required

When the remote forward is already healthy, restart reports `healthy` and leaves
the working tunnel running, avoiding unnecessary disruption.
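
The reverse-tunnel steps above reduce to a short decision sequence. This is a minimal bash sketch of that control flow, not the OpsBridge implementation. The real stale-forward check is `should_cleanup_tunnel` inside bridge. The functions `needs_cleanup`, `clear_orphan_listener`, `tunnel_stop` and `tunnel_start` are hypothetical hooks that the caller must define. The printed action names follow the per-tunnel actions restart reports (`healthy`, `cleaned_and_restarted`, `error`).

```bash
# Sketch of restart for ONE reverse tunnel; not the OpsBridge implementation.
# Hypothetical hooks the caller must define (each returns 0 on success):
#   needs_cleanup TUNNEL          stand-in for bridge's should_cleanup_tunnel
#   clear_orphan_listener TUNNEL  removes the stale listener on the remote host
#   tunnel_stop TUNNEL / tunnel_start TUNNEL
restart_reverse() {
  local tunnel="$1"

  # Step 1: a healthy forward is reported and left running.
  if ! needs_cleanup "$tunnel"; then
    echo "$tunnel healthy"
    return 0
  fi

  # Step 2: clear the orphan remote listener; failure is reported as an error.
  if ! clear_orphan_listener "$tunnel"; then
    echo "$tunnel error"
    return 1
  fi

  # Step 3: reconnect, reached only because cleanup was required.
  if tunnel_stop "$tunnel" && tunnel_start "$tunnel"; then
    echo "$tunnel cleaned_and_restarted"
  else
    echo "$tunnel error"
    return 1
  fi
}
```

Running the health check first is what makes restart safe against a working tunnel. The stop/start path is only reached after a stale forward has been found and cleared.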

For **local-direction** tunnels (`direction: local` in `tunnels.yaml`, e.g.
`k3s-api-coulombcore`), restart performs a plain local stop/start with no remote cleanup.
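
Routing each tunnel by direction, then turning the per-tunnel results into one exit status, could look like the minimal bash sketch below. It rests on several assumptions. Input is hypothetical `NAME DIRECTION` pairs on stdin rather than a parsed `tunnels.yaml`. Any direction other than `local` is treated as reverse. `restart_reverse`, `tunnel_stop` and `tunnel_start` are hypothetical hooks supplied by the caller, not OpsBridge functions.

```bash
# Sketch only. Hypothetical hooks the caller must define:
#   restart_reverse TUNNEL   prints "TUNNEL ACTION", returns non-zero on error
#   tunnel_stop TUNNEL / tunnel_start TUNNEL
restart_tunnel() {
  local tunnel="$1" direction="$2"
  case "$direction" in
    local)
      # Local-direction tunnels: plain stop/start, no remote cleanup.
      if tunnel_stop "$tunnel" && tunnel_start "$tunnel"; then
        echo "$tunnel restarted"
      else
        echo "$tunnel error"
        return 1
      fi
      ;;
    *)
      # Assumption: every non-local tunnel is a reverse tunnel.
      restart_reverse "$tunnel"
      ;;
  esac
}

# Reads "NAME DIRECTION" lines on stdin, restarts each tunnel, and
# returns non-zero if any of them reported an error.
restart_all() {
  local rc=0 tunnel direction
  while read -r tunnel direction; do
    [ -n "$tunnel" ] || continue   # skip blank lines
    # stdin from /dev/null so hooks cannot swallow the remaining input lines
    restart_tunnel "$tunnel" "$direction" </dev/null || rc=1
  done
  return "$rc"
}
```

The loop keeps going after a failure, so every tunnel still gets a reported action. Redirecting each restart's stdin from `/dev/null` prevents hooks that call `ssh` from consuming the rest of the input, a common pitfall in `while read` loops.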

Use `bridge maintenance cleanup` for scheduled or manual hygiene without the
restart contract. The nightly cron job, installed with `bridge maintenance install-cron`,
runs `maintenance cleanup --restart` at 03:00.
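
To confirm that the nightly job is actually installed, an operator can inspect the crontab. This is a minimal sketch. It assumes `install-cron` writes a standard five-field crontab entry whose command contains `maintenance cleanup --restart`. The exact line it writes is not documented here, and `has_nightly_cleanup` is a hypothetical helper.

```bash
# Hypothetical helper: reads crontab text on stdin and succeeds if some
# uncommented entry runs `maintenance cleanup --restart` at 03:00
# (minute field 0, hour field 3).
has_nightly_cleanup() {
  awk '
    /^[ \t]*#/ { next }   # ignore comment lines
    $1 == 0 && $2 == 3 && /maintenance cleanup --restart/ { found = 1 }
    END { exit !found }
  '
}
```

Usage: `crontab -l | has_nightly_cleanup && echo 'nightly cleanup installed'`.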

**Incident context:** stale orphan `sshd` remote forwards after laptop sleep
blocked `bridge restart` until operators discovered the maintenance subcommand.
See `state-hub/history/20260621-weekend-automation-assessment.md` and
`BRIDGE-WP-0005` in this repo.

## Host roles

Tunnels in `~/.config/bridge/tunnels.yaml` connect hosts in three roles:

| Role | Hosts | Behaviour |
|------|-------|-----------|
| **Workstation origin** | WSL laptop | Shutdown, sleep, and network changes kill local bridge processes without graceful remote SSH teardown. Orphan forwards on all remotes are common after wake. |
| **VPS remotes** | coulombcore, railiance01 | Normally always-on. Maintenance reboots clear kernel state, but if the VPS has not rebooted, orphan forwards from the laptop's previous session can remain when it returns. |
| **LAN builder** | haskelseed | Intermittently offline; same orphan-forward pattern when the workstation-side tunnel dies uncleanly. |

Conditional remote cleanup before restart therefore benefits every reverse tunnel,
whatever the remote's role. Because `should_cleanup_tunnel` skips healthy forwards,
VPS tunnels with live, working forwards are left untouched.

---

# The Philosophy Behind OpsBridge

Infrastructure teams succeed or fail based on how effectively they bridge the gaps between: