docs: add README.txt with usage guide and configuration reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 03:24:56 +01:00
parent baee28eda2
commit d248f14a9f

README.txt Normal file

@@ -0,0 +1,224 @@
ops-bridge
==========

SSH reverse tunnel lifecycle manager. Keeps remote execution environments
(COULOMBCORE, Railiance nodes) connected to the local Custodian State Hub
so Claude Code sessions on those machines have full MCP connectivity.

WHAT IT DOES
------------

`bridge` is a CLI tool that manages named SSH reverse tunnels. Each tunnel:

- Is identified by a human-readable name (e.g. state-hub-coulombcore)
- Runs as an SSH reverse port-forward: ssh -R remote:127.0.0.1:local host
- Auto-reconnects on drop using exponential backoff
- Optionally runs an HTTP health check to confirm the forwarded service
  is actually reachable (not just that the SSH process is alive)
- Records structured audit events (bridge_started, bridge_connected,
  health_check_failed, etc.) to a JSON log per tunnel

Bridge states: stopped -> starting -> connected <-> degraded -> reconnecting
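
The state line above can be read as a small transition table. The following
is a minimal sketch, not the code in models.py: the names `BridgeState`,
`ALLOWED` and `can_transition` are hypothetical, and only the arrows drawn
above are encoded. Transitions the line does not show (for example
reconnecting -> connected, or any state -> stopped) are deliberately left out
rather than guessed.

```python
from enum import Enum


class BridgeState(str, Enum):
    """Bridge states, named as in the state line above."""

    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"


# Only the arrows shown in the README are encoded here (assumption: the
# real tool may allow more transitions than the diagram lists).
ALLOWED = {
    BridgeState.STOPPED: {BridgeState.STARTING},
    BridgeState.STARTING: {BridgeState.CONNECTED},
    BridgeState.CONNECTED: {BridgeState.DEGRADED},
    BridgeState.DEGRADED: {BridgeState.CONNECTED, BridgeState.RECONNECTING},
    BridgeState.RECONNECTING: set(),
}


def can_transition(current: BridgeState, target: BridgeState) -> bool:
    """Return True if the diagram draws an arrow from current to target."""
    return target in ALLOWED[current]
```

For example, `can_transition(BridgeState.DEGRADED, BridgeState.CONNECTED)`
is true because connected <-> degraded is drawn in both directions.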

INSTALL
-------

Requires Python 3.11+ and uv (https://docs.astral.sh/uv/).

    uv tool install /path/to/ops-bridge

This registers the `bridge` command globally. For development:

    cd /path/to/ops-bridge
    uv tool install -e .

Verify:

    bridge --help

CONFIGURATION
-------------

Config file:   ~/.config/bridge/tunnels.yaml
Override with: BRIDGE_CONFIG=/path/to/config.yaml

Minimal example:

    tunnels:
      state-hub-coulombcore:
        host: coulombcore.local
        remote_port: 18000
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: agent.claude-coulombcore

    actors:
      agent.claude-coulombcore:
        class: automation
        description: Claude Code agent on CoulombCore

With health check and reconnect policy:

    tunnels:
      state-hub-coulombcore:
        host: coulombcore.local
        remote_port: 18000
        local_port: 8000
        ssh_user: ubuntu
        ssh_key: ~/.ssh/id_ops
        actor: agent.claude-coulombcore
        health_check:
          url: http://127.0.0.1:18000/health   # checked from the REMOTE host
          interval_seconds: 30
          timeout_seconds: 5
        reconnect:
          max_attempts: 0      # 0 = retry forever
          backoff_initial: 5
          backoff_max: 60

    actors:
      agent.claude-coulombcore:
        class: automation      # "human" or "automation"
        description: Claude Code agent on CoulombCore
      operator.bernd:
        class: human
        description: Bernd Worsch
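
To make the reconnect settings concrete, here is a minimal sketch of an
exponential backoff schedule driven by backoff_initial, backoff_max and
max_attempts. The doubling factor and the function name `backoff_delays`
are assumptions made for illustration; this README does not state the
multiplier or whether jitter is applied.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(
    initial: float,
    maximum: float,
    max_attempts: int = 0,
    factor: float = 2.0,  # assumption: the README does not give the multiplier
) -> Iterator[float]:
    """Yield the wait (in seconds) before each reconnect attempt.

    max_attempts == 0 means retry forever, matching the config comment.
    """
    delay = float(initial)
    attempt = 0
    while max_attempts == 0 or attempt < max_attempts:
        yield min(delay, float(maximum))
        # Grow the delay, but never past the configured ceiling.
        delay = min(delay * factor, float(maximum))
        attempt += 1


# First six delays for backoff_initial: 5, backoff_max: 60
print(list(islice(backoff_delays(5, 60), 6)))
# -> [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
```

With max_attempts set to 0 the generator never ends, so callers take delays
one at a time instead of materializing the whole sequence.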

Required tunnel fields: host, remote_port, local_port, ssh_user, ssh_key, actor
Required actor fields:  class (must be "human" or "automation")
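
The required-field rules above can be checked mechanically. The sketch below
works on an already-parsed config (a plain dict, as a YAML loader would
produce) and is not the code in config.py; `validate_config` is a
hypothetical name. The check that a tunnel's actor is declared under
`actors` is an assumption that goes beyond what is stated above.

```python
REQUIRED_TUNNEL_FIELDS = (
    "host", "remote_port", "local_port", "ssh_user", "ssh_key", "actor",
)
ACTOR_CLASSES = {"human", "automation"}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: list[str] = []
    actors = cfg.get("actors") or {}

    for name, actor in actors.items():
        if (actor or {}).get("class") not in ACTOR_CLASSES:
            errors.append(f"actor {name}: class must be 'human' or 'automation'")

    for name, tunnel in (cfg.get("tunnels") or {}).items():
        tunnel = tunnel or {}
        for field in REQUIRED_TUNNEL_FIELDS:
            if field not in tunnel:
                errors.append(f"tunnel {name}: missing required field '{field}'")
        # Assumption: the actor named by a tunnel must exist under `actors`.
        actor_id = tunnel.get("actor")
        if actor_id is not None and actor_id not in actors:
            errors.append(f"tunnel {name}: unknown actor '{actor_id}'")

    return errors
```

Collecting every problem in a list, rather than raising on the first one,
lets a user fix a config file in one pass.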

CLI COMMANDS
------------

Lifecycle:

    bridge up [TUNNEL]        Start one tunnel, or all if no name given
    bridge down [TUNNEL]      Stop one tunnel, or all
    bridge restart [TUNNEL]   Restart one tunnel, or all

Observation:

    bridge status             Show all tunnels: state, uptime, last event
    bridge status --json      Machine-readable JSON output
    bridge logs TUNNEL        Tail the audit log for a tunnel
    bridge logs TUNNEL --lines 100 --follow

Examples:

    bridge up state-hub-coulombcore
    bridge status
    bridge logs state-hub-coulombcore --follow
    bridge down state-hub-coulombcore

OPSCATALOG EXTENSION (optional)
-------------------------------

If you maintain a Git-backed YAML catalog of your infrastructure, point
bridge at it in your config:

    catalog_path: ~/ops-infra/opscatalog/

Catalog layout:

    opscatalog/
      domains/
        <domain-id>/
          domain.yaml
          targets/
            <target-id>.yaml
          bridges/
            <bridge-id>.yaml

Then you can use:

    bridge targets [--domain DOMAIN]   List all targets (optionally filtered)
    bridge targets show TARGET_ID      Show full target metadata
    bridge catalog list                List domains with counts
    bridge catalog validate            Check catalog for consistency errors
    bridge catalog show BRIDGE_ID      Show a catalog bridge's full metadata

Bridges defined in the catalog are resolved the same way as inline tunnels.
Inline tunnels (in tunnels.yaml) take precedence over catalog bridges when
both define the same name.
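
The precedence rule can be pictured as a dictionary merge in which inline
entries overwrite catalog entries of the same name. This is a sketch under
the assumption that both sources are loaded as name-to-definition mappings;
`resolve_bridges` is a hypothetical helper, not a function from the catalog
package.

```python
def resolve_bridges(inline: dict, catalog: dict) -> dict:
    """Merge bridge definitions; inline tunnels win on a name clash."""
    resolved = dict(catalog)   # start from the catalog bridges
    resolved.update(inline)    # inline tunnels overwrite same-named entries
    return resolved


catalog = {"state-hub-coulombcore": {"host": "from-catalog"},
           "other-bridge": {"host": "only-in-catalog"}}
inline = {"state-hub-coulombcore": {"host": "from-tunnels-yaml"}}

merged = resolve_bridges(inline, catalog)
print(merged["state-hub-coulombcore"]["host"])  # -> from-tunnels-yaml
print(merged["other-bridge"]["host"])           # -> only-in-catalog
```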

STATE FILES
-----------

Runtime state is stored in ~/.local/state/bridge/:

    {name}.pid     Manager process ID
    {name}.state   Current bridge state (e.g. "connected")
    {name}.log     Audit log, one JSON object per line

Override the state directory with: BRIDGE_STATE_DIR=/path/to/dir
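
As an illustration of the layout above, the following sketch resolves the
three per-tunnel files, honouring BRIDGE_STATE_DIR when it is set. The
helper names `state_dir` and `state_files` are hypothetical and do not
claim to match state.py; the optional `env` parameter exists only to make
the example easy to exercise.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def state_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the state directory, preferring BRIDGE_STATE_DIR if set."""
    env = os.environ if env is None else env
    override = env.get("BRIDGE_STATE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".local" / "state" / "bridge"


def state_files(name: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Map each suffix (pid, state, log) to its path for the named tunnel."""
    base = state_dir(env)
    return {suffix: base / f"{name}.{suffix}" for suffix in ("pid", "state", "log")}
```

Calling `state_files("state-hub-coulombcore")` with no override yields
paths such as ~/.local/state/bridge/state-hub-coulombcore.pid, expanded to
the current user's home directory.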

AUDIT LOG FORMAT
----------------

Each event is written as one JSON object per line. Pretty-printed here for
readability:

    {
      "ts": "2026-03-12T14:23:01.456789",
      "tunnel": "state-hub-coulombcore",
      "event": "bridge_connected",
      "actor": "agent.claude-coulombcore",
      "actor_class": "automation",
      "detail": ""
    }

Event types: bridge_started, bridge_connected, bridge_disconnected,
             bridge_reconnecting, health_check_failed, health_check_recovered,
             bridge_stopped
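
Because the log is line-delimited JSON, it can be processed with the
standard library alone. The sketch below counts events by type; the
function name `count_events` is hypothetical, and the sample lines use only
the fields shown above.

```python
import json
from collections import Counter
from typing import Iterable


def count_events(lines: Iterable[str]) -> Counter:
    """Count audit events by their "event" field, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line)["event"]] += 1
    return counts


sample = [
    '{"ts": "2026-03-12T14:23:00.000000", "tunnel": "state-hub-coulombcore", '
    '"event": "bridge_started", "actor": "agent.claude-coulombcore", '
    '"actor_class": "automation", "detail": ""}',
    '{"ts": "2026-03-12T14:23:01.456789", "tunnel": "state-hub-coulombcore", '
    '"event": "bridge_connected", "actor": "agent.claude-coulombcore", '
    '"actor_class": "automation", "detail": ""}',
]
print(count_events(sample)["bridge_connected"])  # -> 1
```

The same function accepts an open file object, since iterating a text file
yields one line at a time.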

DEVELOPMENT
-----------

    uv run pytest                        Run all tests
    uv run pytest tests/test_cli.py -v   Run a specific test file
    uv run ruff check .                  Lint

Source layout:

    src/bridge/
      cli.py        Typer CLI (entry point)
      models.py     Core dataclasses and enums
      config.py     Config loading from tunnels.yaml
      manager.py    Tunnel lifecycle (subprocess, reconnect loop)
      state.py      PID and state file management
      audit.py      Audit event logging
      health.py     HTTP health checker (async, httpx)
      catalog/      OpsCatalog extension

DESIGN NOTES
------------

- No system daemons. Tunnel processes are managed as subprocesses; PIDs
  are tracked in ~/.local/state/bridge/.
- Graceful shutdown: SIGTERM to the manager process allows a clean exit;
  SIGKILL follows after 5 seconds if it is unresponsive.
- Actor attribution on every log event (human vs. automation) supports
  audit traceability (FRS §5.7).
- SSH command invoked:
      ssh -N -R remote_port:127.0.0.1:local_port -i ssh_key ssh_user@host
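
Two of the notes above lend themselves to a short sketch: assembling the
SSH argument list from a tunnel's fields, and stopping a child process with
SIGTERM followed by SIGKILL after a grace period. Both helpers are
hypothetical and are not taken from manager.py. Expanding `~` in the key
path is an assumption, made because no shell is involved to expand it.

```python
import os
import subprocess


def build_ssh_command(host: str, remote_port: int, local_port: int,
                      ssh_user: str, ssh_key: str) -> list[str]:
    """Build the argv for the reverse tunnel described in the notes above."""
    return [
        "ssh", "-N",
        "-R", f"{remote_port}:127.0.0.1:{local_port}",
        "-i", os.path.expanduser(ssh_key),  # assumption: expand ~ ourselves
        f"{ssh_user}@{host}",
    ]


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask the process to exit, then force it if it ignores the request."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()   # sends SIGKILL on POSIX
        return proc.wait()
```

Passing an argument list to `subprocess.Popen` instead of a shell string
avoids quoting problems with host names and key paths.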

REPO STRUCTURE
--------------

    src/bridge/      Main source
    tests/           Test suite
    wiki/            PRD, FRS, OpsCatalog specification
    workplans/       Custodian State Hub workplan files (BRIDGE-WP-*)
    pyproject.toml   Build config and dependencies