ops-bridge
==========

SSH reverse tunnel lifecycle manager. Keeps remote execution environments
(COULOMBCORE, Railiance nodes) connected to the local Custodian State Hub
so Claude Code sessions on those machines have full MCP connectivity.


WHAT IT DOES
------------

`bridge` is a CLI tool that manages named SSH reverse tunnels. Each tunnel:

  - Is identified by a human-readable name (e.g. state-hub-coulombcore)
  - Runs as an SSH reverse port-forward: ssh -R remote:127.0.0.1:local host
  - Auto-reconnects on drop using exponential backoff
  - Optionally runs an HTTP health check to confirm the forwarded service
    is actually reachable (not just the SSH process alive)
  - Records structured audit events (bridge_started, bridge_connected,
    health_check_failed, etc.) to a JSON log per tunnel

Bridge states: stopped -> starting -> connected <-> degraded -> reconnecting
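
The state line above can be read as a small transition table. The sketch
below encodes only the arrows written there; the names BridgeState,
STATED_TRANSITIONS and is_stated_transition are hypothetical and are not
taken from models.py. Transitions the line does not spell out (for example
reconnecting back to connected, or anything back to stopped) are left out
on purpose rather than guessed.

```python
from enum import Enum


class BridgeState(str, Enum):
    """Hypothetical enum mirroring the five state names in the README."""
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"


# Only the arrows that appear in the README's state line.
STATED_TRANSITIONS = {
    (BridgeState.STOPPED, BridgeState.STARTING),
    (BridgeState.STARTING, BridgeState.CONNECTED),
    (BridgeState.CONNECTED, BridgeState.DEGRADED),
    (BridgeState.DEGRADED, BridgeState.CONNECTED),
    (BridgeState.DEGRADED, BridgeState.RECONNECTING),
}


def is_stated_transition(src: BridgeState, dst: BridgeState) -> bool:
    """Return True if src -> dst is one of the arrows listed above."""
    return (src, dst) in STATED_TRANSITIONS
```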


INSTALL
-------

Requires Python 3.11+ and uv (https://docs.astral.sh/uv/).

    uv tool install /path/to/ops-bridge

This registers the `bridge` command globally. For development:

    cd /path/to/ops-bridge
    uv tool install -e .

Verify:

    bridge --help


CONFIGURATION
-------------

Config file: ~/.config/bridge/tunnels.yaml
Override with: BRIDGE_CONFIG=/path/to/config.yaml

Minimal example:

    tunnels:
      state-hub-coulombcore:
        host: coulombcore.local
        remote_port: 18000
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: agent.claude-coulombcore

    actors:
      agent.claude-coulombcore:
        class: automation
        description: Claude Code agent on CoulombCore

With health check and reconnect policy:

    tunnels:
      state-hub-coulombcore:
        host: coulombcore.local
        remote_port: 18000
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: agent.claude-coulombcore

        health_check:
          url: http://127.0.0.1:18000/health   # checked from the REMOTE host
          interval_seconds: 30
          timeout_seconds: 5

        reconnect:
          max_attempts: 0       # 0 = retry forever
          backoff_initial: 5
          backoff_max: 60

    actors:
      agent.claude-coulombcore:
        class: automation       # "human" or "automation"
        description: Claude Code agent on CoulombCore
      operator.bernd:
        class: human
        description: Bernd Worsch
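
To make the reconnect settings concrete, here is a minimal sketch of the
delay schedule they could produce. It assumes the delay doubles on each
attempt and that backoff_initial and backoff_max are seconds; neither the
growth factor nor the unit is stated in this README, and backoff_delays is
a hypothetical helper, not a function from manager.py.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial: float = 5, maximum: float = 60,
                   max_attempts: int = 0, factor: float = 2.0) -> Iterator[float]:
    """Yield one wait time per reconnect attempt.

    max_attempts == 0 means retry forever, as in the config comment.
    The doubling factor is an assumption made for illustration.
    """
    cap = float(maximum)
    delay = float(initial)
    attempt = 0
    while max_attempts == 0 or attempt < max_attempts:
        yield min(delay, cap)
        delay = min(delay * factor, cap)  # grow, but never past the cap
        attempt += 1


# First six delays for backoff_initial: 5, backoff_max: 60.
first_six = list(islice(backoff_delays(5, 60), 6))
```

Under these assumptions first_six is [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]:
the delay grows until it reaches backoff_max and then stays there.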

Required tunnel fields: host, remote_port, local_port, ssh_user, ssh_key, actor
Required actor fields:  class (must be "human" or "automation")
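
The two rules above are simple enough to check by hand before starting a
tunnel. The sketch below works on a config that has already been parsed
into a dict (for example by a YAML loader); validate_config is a
hypothetical helper and says nothing about how config.py actually
validates.

```python
REQUIRED_TUNNEL_FIELDS = (
    "host", "remote_port", "local_port", "ssh_user", "ssh_key", "actor",
)
ACTOR_CLASSES = ("human", "automation")


def validate_config(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means both rules hold."""
    errors = []
    for name, tunnel in (config.get("tunnels") or {}).items():
        for field in REQUIRED_TUNNEL_FIELDS:
            if field not in (tunnel or {}):
                errors.append(f"tunnel {name}: missing required field '{field}'")
    for actor_id, actor in (config.get("actors") or {}).items():
        if (actor or {}).get("class") not in ACTOR_CLASSES:
            errors.append(f"actor {actor_id}: class must be 'human' or 'automation'")
    return errors


# Example: a tunnel without ssh_key and an actor with an unknown class.
example = {
    "tunnels": {"demo": {"host": "h", "remote_port": 18000, "local_port": 8000,
                         "ssh_user": "ubuntu", "actor": "agent.demo"}},
    "actors": {"agent.demo": {"class": "robot"}},
}
example_errors = validate_config(example)
```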


CLI COMMANDS
------------

Lifecycle:

    bridge up [TUNNEL]          Start one tunnel, or all if no name given
    bridge down [TUNNEL]        Stop one tunnel, or all
    bridge restart [TUNNEL]     Restart one tunnel, or all

Observation:

    bridge status               Show all tunnels: state, uptime, last event
    bridge status --json        Machine-readable JSON output
    bridge logs TUNNEL          Tail the audit log for a tunnel
    bridge logs TUNNEL --lines 100 --follow

Examples:

    bridge up state-hub-coulombcore
    bridge status
    bridge logs state-hub-coulombcore --follow
    bridge down state-hub-coulombcore


OPSCATALOG EXTENSION (optional)
-------------------------------

If you maintain a Git-backed YAML catalog of your infrastructure, point
bridge at it in your config:

    catalog_path: ~/ops-infra/opscatalog/

Catalog layout:

    opscatalog/
      domains/
        <domain-id>/
          domain.yaml
          targets/
            <target-id>.yaml
          bridges/
            <bridge-id>.yaml

Then you can use:

    bridge targets [--domain DOMAIN]    List all targets (optionally filtered)
    bridge targets show TARGET_ID       Show full target metadata
    bridge catalog list                 List domains with counts
    bridge catalog validate             Check catalog for consistency errors
    bridge catalog show BRIDGE_ID       Show a catalog bridge's full metadata

Bridges defined in the catalog are resolved the same way as inline tunnels.
Inline tunnels (in tunnels.yaml) take precedence over catalog bridges when
both define the same name.
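
The precedence rule amounts to a dict merge in which inline entries
overwrite catalog entries of the same name. A minimal sketch, assuming both
sources have already been loaded into name -> definition dicts
(resolve_bridges is a hypothetical name, not the catalog module's API):

```python
def resolve_bridges(inline_tunnels: dict, catalog_bridges: dict) -> dict:
    """Merge both sources; inline tunnels win on a name clash."""
    resolved = dict(catalog_bridges)   # start from the catalog
    resolved.update(inline_tunnels)    # inline definitions overwrite
    return resolved


catalog = {"state-hub-coulombcore": {"host": "catalog.example"},
           "other-bridge": {"host": "other.example"}}
inline = {"state-hub-coulombcore": {"host": "coulombcore.local"}}
merged = resolve_bridges(inline, catalog)
```

Here merged keeps other-bridge from the catalog, while
state-hub-coulombcore resolves to the inline definition.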


STATE FILES
-----------

Runtime state is stored in ~/.local/state/bridge/:

    {name}.pid      Manager process ID
    {name}.state    Current bridge state (e.g. "connected")
    {name}.log      Audit log, one JSON object per line

Override the state directory with: BRIDGE_STATE_DIR=/path/to/dir
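
For scripts that want to inspect these files directly, a rough sketch
follows. It assumes the .pid file holds a decimal PID and the .state file
holds the bare state word as plain text; the README does not specify the
on-disk encoding, so treat both as assumptions. state_dir and
read_tunnel_state are hypothetical helpers, not part of state.py.

```python
import os
from pathlib import Path


def state_dir() -> Path:
    """Honour BRIDGE_STATE_DIR, else fall back to ~/.local/state/bridge."""
    override = os.environ.get("BRIDGE_STATE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".local" / "state" / "bridge"


def read_tunnel_state(name: str) -> dict:
    """Read {name}.pid and {name}.state if present (file format assumed)."""
    base = state_dir()
    pid_file = base / f"{name}.pid"
    state_file = base / f"{name}.state"
    pid = int(pid_file.read_text().strip()) if pid_file.exists() else None
    state = state_file.read_text().strip() if state_file.exists() else None
    return {"pid": pid, "state": state, "log": base / f"{name}.log"}
```

Where the CLI is available, `bridge status --json` is the safer way to get
the same information.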


AUDIT LOG FORMAT
----------------

Each event is one JSON object per line (pretty-printed here for
readability):

    {
      "ts": "2026-03-12T14:23:01.456789",
      "tunnel": "state-hub-coulombcore",
      "event": "bridge_connected",
      "actor": "agent.claude-coulombcore",
      "actor_class": "automation",
      "detail": ""
    }

Event types: bridge_started, bridge_connected, bridge_disconnected,
bridge_reconnecting, health_check_failed, health_check_recovered,
bridge_stopped
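
Because the log is line-delimited JSON, it can be processed with the
standard library alone. The sketch below parses lines and counts events by
type; parse_audit_lines and count_events are hypothetical helpers, and the
sample lines are made up to match the field names shown above.

```python
import json
from collections import Counter
from typing import Iterable


def parse_audit_lines(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    events = []
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            events.append(json.loads(line))
    return events


def count_events(events: list[dict]) -> Counter:
    """Count events by their "event" field."""
    return Counter(e["event"] for e in events)


# Illustrative input only; not real log output.
sample = [
    '{"ts": "2026-03-12T14:23:00.000000", "tunnel": "t", "event": "bridge_started", '
    '"actor": "a", "actor_class": "automation", "detail": ""}',
    '{"ts": "2026-03-12T14:23:01.456789", "tunnel": "t", "event": "bridge_connected", '
    '"actor": "a", "actor_class": "automation", "detail": ""}',
    "",
]
counts = count_events(parse_audit_lines(sample))
```

To run it against a real log, pass an open file handle for the tunnel's
.log file instead of the sample list.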


DEVELOPMENT
-----------

    uv run pytest                         Run all tests
    uv run pytest tests/test_cli.py -v    Run a specific test file
    uv run ruff check .                   Lint

Source layout:

    src/bridge/
      cli.py        Typer CLI (entry point)
      models.py     Core dataclasses and enums
      config.py     Config loading from tunnels.yaml
      manager.py    Tunnel lifecycle (subprocess, reconnect loop)
      state.py      PID and state file management
      audit.py      Audit event logging
      health.py     HTTP health checker (async, httpx)
      catalog/      OpsCatalog extension


DESIGN NOTES
------------

  - No system daemons. Tunnel processes are managed as subprocesses; PIDs
    are tracked in ~/.local/state/bridge/.
  - Graceful shutdown: SIGTERM to the manager process allows a clean exit;
    SIGKILL follows after 5 seconds if it is unresponsive.
  - Actor attribution on every log event (human vs. automation) supports
    audit traceability (FRS §5.7).
  - SSH command invoked: ssh -N -R remote_port:127.0.0.1:local_port
    -i ssh_key ssh_user@host
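
Two of the notes above translate directly into code: the SSH argument list
and the SIGTERM-then-SIGKILL escalation. The sketch below is an
illustration under assumptions, not the code in manager.py; build_ssh_argv
and stop_process are hypothetical names, and expanding "~" in the key path
is a choice made here, not something the README states.

```python
import os
import subprocess


def build_ssh_argv(host: str, remote_port: int, local_port: int,
                   ssh_user: str, ssh_key: str) -> list[str]:
    """Build the argv for the reverse port-forward described above."""
    return [
        "ssh", "-N",
        "-R", f"{remote_port}:127.0.0.1:{local_port}",
        "-i", os.path.expanduser(ssh_key),  # assumption: expand ~ ourselves
        f"{ssh_user}@{host}",
    ]


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, then force-kill it after the grace period."""
    proc.terminate()                 # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL on POSIX
        return proc.wait()


argv = build_ssh_argv("coulombcore.local", 18000, 8000, "ubuntu", "/keys/id_ops")
```

Passing the result to subprocess.Popen(argv) without a shell avoids quoting
problems with host names and key paths.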


REPO STRUCTURE
--------------

    src/bridge/       Main source
    tests/            Test suite
    wiki/             PRD, FRS, OpsCatalog specification
    workplans/        Custodian State Hub workplan files (BRIDGE-WP-*)
    pyproject.toml    Build config and dependencies