ops-bridge
==========

SSH reverse tunnel lifecycle manager. Keeps remote execution environments (COULOMBCORE, Railiance nodes) connected to the local Custodian State Hub so Claude Code sessions on those machines have full MCP connectivity.

WHAT IT DOES
------------

`bridge` is a CLI tool that manages named SSH reverse tunnels. Each tunnel:

- Is identified by a human-readable name (e.g. state-hub-coulombcore)
- Runs as an SSH reverse port-forward: ssh -R remote:127.0.0.1:local host
- Auto-reconnects on drop using exponential backoff
- Optionally runs an HTTP health check to confirm the forwarded service is actually reachable (not just the SSH process alive)
- Records structured audit events (bridge_started, bridge_connected, health_check_failed, etc.) to a JSON log per tunnel

Bridge states: stopped -> starting -> connected <-> degraded -> reconnecting

INSTALL
-------

Requires Python 3.11+ and uv (https://docs.astral.sh/uv/).

    uv tool install /path/to/ops-bridge

This registers the `bridge` command globally. For development:

    cd /path/to/ops-bridge
    uv tool install -e .

Verify:

    bridge --help

CONFIGURATION
-------------

Config file: ~/.config/bridge/tunnels.yaml
Override with: BRIDGE_CONFIG=/path/to/config.yaml

Minimal example:

    tunnels:
      state-hub-coulombcore:
        host: coulombcore.local
        remote_port: 18000
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: agent.claude-coulombcore

    actors:
      agent.claude-coulombcore:
        class: automation
        description: Claude Code agent on CoulombCore

With health check and reconnect policy:

    tunnels:
      state-hub-coulombcore:
        host: coulombcore.local
        remote_port: 18000
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: agent.claude-coulombcore
        health_check:
          url: http://127.0.0.1:18000/health  # checked from the REMOTE host
          interval_seconds: 30
          timeout_seconds: 5
        reconnect:
          max_attempts: 0       # 0 = retry forever
          backoff_initial: 5
          backoff_max: 60

    actors:
      agent.claude-coulombcore:
        class: automation       # "human" or "automation"
        description: Claude Code agent on CoulombCore
      operator.bernd:
        class: human
        description: Bernd Worsch

Required tunnel fields: host, remote_port, local_port, ssh_user, ssh_key, actor
Required actor fields: class (must be "human" or "automation")

CLI COMMANDS
------------

Lifecycle:

    bridge up [TUNNEL]        Start one tunnel, or all if no name given
    bridge down [TUNNEL]      Stop one tunnel, or all
    bridge restart [TUNNEL]   Restart one tunnel, or all

Observation:

    bridge status             Show all tunnels: state, uptime, last event
    bridge status --json      Machine-readable JSON output
    bridge logs TUNNEL        Tail the audit log for a tunnel
    bridge logs TUNNEL --lines 100 --follow

Examples:

    bridge up state-hub-coulombcore
    bridge status
    bridge logs state-hub-coulombcore --follow
    bridge down state-hub-coulombcore

OPSCATALOG EXTENSION (optional)
--------------------------------

If you maintain a Git-backed YAML catalog of your infrastructure, point bridge at it in your config:

    catalog_path: ~/ops-infra/opscatalog/

Catalog layout:

    opscatalog/ domains/ / domain.yaml targets/ .yaml bridges/ .yaml

Then you can use:

    bridge targets [--domain DOMAIN]   List all targets (optionally filtered)
    bridge targets show TARGET_ID      Show full target metadata
    bridge catalog list                List domains with counts
    bridge catalog validate            Check catalog for consistency errors
    bridge catalog show BRIDGE_ID      Show a catalog bridge's full metadata

Bridges defined in the catalog are resolved the same way as inline tunnels. Inline tunnels (in tunnels.yaml) take precedence over catalog bridges when both define the same name.

STATE FILES
-----------

Runtime state is stored in ~/.local/state/bridge/:

    {name}.pid     Manager process ID
    {name}.state   Current bridge state (e.g. "connected")
    {name}.log     Audit log, one JSON object per line

Override the state directory with: BRIDGE_STATE_DIR=/path/to/dir

AUDIT LOG FORMAT
----------------

Each event is one JSON object per line:

    {
      "ts": "2026-03-12T14:23:01.456789",
      "tunnel": "state-hub-coulombcore",
      "event": "bridge_connected",
      "actor": "agent.claude-coulombcore",
      "actor_class": "automation",
      "detail": ""
    }

Event types: bridge_started, bridge_connected, bridge_disconnected, bridge_reconnecting, health_check_failed, health_check_recovered, bridge_stopped

DEVELOPMENT
-----------

    uv run pytest                         Run all tests
    uv run pytest tests/test_cli.py -v    Run a specific test file
    uv run ruff check .                   Lint

Source layout:

    src/bridge/
      cli.py       Typer CLI (entry point)
      models.py    Core dataclasses and enums
      config.py    Config loading from tunnels.yaml
      manager.py   Tunnel lifecycle (subprocess, reconnect loop)
      state.py     PID and state file management
      audit.py     Audit event logging
      health.py    HTTP health checker (async, httpx)
      catalog/     OpsCatalog extension

DESIGN NOTES
------------

- No system daemons. Tunnel processes are managed as subprocesses; PIDs are tracked in ~/.local/state/bridge/.
- Graceful shutdown: SIGTERM to the daemon allows a clean exit; SIGKILL follows after 5 seconds if unresponsive.
- Actor attribution on every log event (human vs. automation) supports audit traceability (FRS §5.7).
- SSH command invoked:
      ssh -N -R remote_port:127.0.0.1:local_port -i ssh_key ssh_user@host

REPO STRUCTURE
--------------

    src/bridge/      Main source
    tests/           Test suite
    wiki/            PRD, FRS, OpsCatalog specification
    workplans/       Custodian State Hub workplan files (BRIDGE-WP-*)
    pyproject.toml   Build config and dependencies
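
ILLUSTRATIVE SKETCH: SSH COMMAND AND RECONNECT BACKOFF
------------------------------------------------------

The SSH command in DESIGN NOTES and the reconnect settings in CONFIGURATION (backoff_initial, backoff_max, max_attempts) can be illustrated with a small Python sketch. This is not the code in manager.py. The function names `build_ssh_command` and `backoff_delays` are hypothetical, and the doubling factor is an assumption: this README only says the backoff is exponential, starts at backoff_initial, and is capped at backoff_max.

```python
import os
from itertools import islice
from typing import Iterator


def build_ssh_command(host: str, ssh_user: str, ssh_key: str,
                      remote_port: int, local_port: int) -> list[str]:
    """Build the argv for the reverse forward shown in DESIGN NOTES.

    Hypothetical helper: the field names mirror tunnels.yaml, but the
    real implementation may assemble the command differently.
    """
    return [
        "ssh", "-N",
        "-R", f"{remote_port}:127.0.0.1:{local_port}",
        # Expand "~" ourselves because no shell is involved when the
        # argv list is handed to a subprocess.
        "-i", os.path.expanduser(ssh_key),
        f"{ssh_user}@{host}",
    ]


def backoff_delays(initial: float = 5, maximum: float = 60,
                   max_attempts: int = 0) -> Iterator[float]:
    """Yield the wait (in seconds) before each reconnect attempt.

    Assumption: the delay doubles after every attempt and is capped at
    `maximum`. max_attempts == 0 means retry forever, as in the config.
    """
    delay = initial
    attempt = 0
    while max_attempts == 0 or attempt < max_attempts:
        yield min(delay, maximum)
        delay = min(delay * 2, maximum)
        attempt += 1


if __name__ == "__main__":
    cmd = build_ssh_command("coulombcore.local", "ubuntu", "~/.ssh/id_ops",
                            remote_port=18000, local_port=8000)
    print(" ".join(cmd))
    # First six delays with the example policy (5 s initial, 60 s cap):
    print(list(islice(backoff_delays(5, 60, 0), 6)))  # [5, 10, 20, 40, 60, 60]
```

With the example policy the delay reaches the 60-second cap on the fifth attempt and stays there, so a long outage is retried about once a minute rather than with ever-growing gaps.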