SCOPE
This file helps you quickly understand what this repository is about, when it is relevant, and when it is not. It is intentionally lightweight and may be incomplete.
One-liner
SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards. Supports both static SSH keys (no TTL) and CA-signed short-lived certificates via a pluggable cert_command interface.
Core Idea
Claude Code sessions and the Custodian State Hub API both run locally. Remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. Ops-bridge manages named SSH reverse tunnels with auto-reconnect, health checks, and audit logging, and ships an MCP server so Claude Code can start, stop, and inspect tunnels as tools.
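For orientation, a reverse tunnel of the kind described above comes down to an `ssh -R` invocation. The sketch below only builds the argument list. The helper name, the host string, and the port numbers are hypothetical, and the keepalive options are standard OpenSSH client options that ops-bridge may or may not pass itself.

```python
from typing import List, Optional


def build_reverse_tunnel_command(
    host: str,
    remote_port: int,
    local_port: int,
    ssh_key: Optional[str] = None,
) -> List[str]:
    """Build an argv list for a reverse port-forward (hypothetical helper)."""
    cmd = [
        "ssh",
        "-N",                              # forwarding only, no remote command
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=30",    # keepalives so a dead link is noticed
        "-o", "ServerAliveCountMax=3",
        # Remote side listens on remote_port and forwards to the local hub port.
        "-R", f"{remote_port}:localhost:{local_port}",
    ]
    if ssh_key is not None:
        cmd += ["-i", ssh_key]             # static key mode: pass the key file as-is
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    # Placeholder host and ports, for illustration only.
    print(" ".join(build_reverse_tunnel_command("user@remote.example", 8000, 8000)))
```

Returning a list rather than a shell string lets a supervisor hand it to `subprocess.Popen` without quoting concerns.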
In Scope
- Named SSH reverse tunnel lifecycle (bridge up/down/restart/status/logs/cert-status)
- Auto-reconnect with exponential backoff and configurable retry policy
- Optional HTTP health checks (confirm forwarded service is actually reachable from remote)
- Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
- Actor attribution: per-tunnel actor type (adm/agt/atm) for audit traceability, with naming convention enforcement (adm-*, agt-*, atm-*)
- Static key mode (default): ssh_key passed directly to SSH; no TTL, no cert logic, works without any CA or external tooling
- cert_command mode (optional): pluggable shell command that issues a short-lived CA-signed certificate before each SSH launch; TTL-aware pre-emptive cert refresh; cert_identity recorded in audit log; satisfies AccessManagementDirective §5
- PID + state file management in ~/.local/state/bridge/
- MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
- OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
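As an illustration of the auto-reconnect item above, here is a minimal sketch of an exponential backoff schedule. The function name, the parameter names, and every default value are assumptions made for this example. The actual retry policy keys used by ops-bridge are not stated in this file.

```python
import random
from typing import Iterator


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
    max_retries: int = 8,
    jitter: float = 0.0,
) -> Iterator[float]:
    """Yield the wait in seconds before each reconnect attempt (hypothetical)."""
    for attempt in range(max_retries):
        # Grow geometrically, but never wait longer than the cap.
        delay = min(cap, base * factor ** attempt)
        # Optional jitter spreads reconnects out; 0.0 keeps the schedule deterministic.
        yield delay + random.uniform(0.0, jitter * delay)


if __name__ == "__main__":
    print(list(backoff_delays()))
```

With the defaults shown, the delays double from 1 second and are held at the 60 second cap from the seventh attempt on.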
Out of Scope
- Credential issuance and CA management (owned by ops-warden; ops-bridge consumes certs via the cert_command interface but never signs anything itself)
- SSH key generation for human admins (self-service: ssh-keygen)
- Host-side principal deployment (/etc/ssh/auth_principals/); that is railiance-infra
- Long-running application hosting on remote machines (port-forward only, not deployment)
- VPN or layer-3 connectivity
- Monitoring/alerting beyond JSON audit logs
- Replacing SSH for general interactive access
Relevant When
- Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
- Need audit trail of which actor (adm/agt/atm) started/stopped tunnels
- Setting up a new machine in the Railiance ecosystem that must phone home to the hub
- Diagnosing connectivity issues between local hub and remote services
- Checking certificate validity for active tunnels (bridge cert-status)
- Integrating with a CA (ops-warden or Vault) for short-lived tunnel credentials
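The certificate items above depend on knowing when a short-lived certificate is close to expiry. A minimal sketch of a TTL-aware check follows, assuming the expiry time has already been parsed into a timezone-aware datetime. The function name, the `valid_before` parameter, and the five-minute margin are illustrative assumptions, not the behavior of `bridge cert-status`.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(
    valid_before: datetime,
    now: datetime,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True when the cert expires within `margin` or has already expired."""
    return now >= valid_before - margin


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    expiry = now + timedelta(minutes=3)  # hypothetical cert with 3 minutes left
    print(needs_refresh(expiry, now))
```

Refreshing ahead of expiry, rather than at it, leaves room for a slow issuing command before the next SSH launch.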
Not Relevant When
- All work is local (no remote services involved)
- Manually running ssh -R is acceptable
- No need for audit tracing of tunnel state changes
Current State
- Status: active (v0.1 core complete; directive alignment in progress — BRIDGE-WP-0004)
- Implementation: ~75% — CLI tunneling fully functional, MCP integration working, health checks and audit logging complete; OpsCatalog framework present but not populated; cert_command / ActorType alignment not yet implemented
- Stability: stable tunnel lifecycle; tested under network drops and SSH failures
- Usage: running in lab for daily Railiance/Temporal connectivity
How It Fits
- Upstream dependencies: SSH (system), OpenSSH server on remote hosts
- Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach local hub MCP; activity-core Temporal server reachable via bridge tunnel
- Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)
Terminology
- Preferred terms: tunnel, bridge, actor, actor_type, reconnect policy, health check, cert_command, cert_identity
- Actor types: adm (human operator), agt (LLM agent), atm (deterministic automation)
- Also known as: "the bridge"
- Potentially confusing: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
- Legacy terms (deprecated): actor_class: human (→ adm), actor_class: automation (→ atm)
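The actor types and legacy terms listed above can be sketched as a small enum plus a prefix check. This is an illustration only: the class, mapping, and function names are hypothetical, and this file does not say whether the adm-*/agt-*/atm-* convention applies to tunnel names or to actor names.

```python
from enum import Enum


class ActorType(str, Enum):
    ADM = "adm"  # human operator
    AGT = "agt"  # LLM agent
    ATM = "atm"  # deterministic automation


# Deprecated actor_class values and their replacements, as listed above.
LEGACY_ACTOR_CLASS = {"human": ActorType.ADM, "automation": ActorType.ATM}


def actor_type_from_name(name: str) -> ActorType:
    """Derive the actor type from a name following the <type>-<rest> convention."""
    prefix, sep, rest = name.partition("-")
    if not sep or not rest:
        raise ValueError(f"{name!r} does not match adm-*, agt-* or atm-*")
    try:
        return ActorType(prefix)
    except ValueError:
        raise ValueError(f"{name!r} has unknown actor prefix {prefix!r}") from None


if __name__ == "__main__":
    print(actor_type_from_name("agt-temporal-worker").value)
    print(LEGACY_ACTOR_CLASS["human"].value)
```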
Related / Overlapping
- the-custodian: primary consumer; ops-bridge keeps remote agents connected to it
- ops-warden: optional upstream; owns CA and cert issuance; ops-bridge calls it via cert_command when short-lived certificates are required
- activity-core: Temporal server on remote reached via ops-bridge tunnel
- railiance-cluster / railiance-infra: remote hosts that need to phone home; railiance-infra owns host-side principal deployment (/etc/ssh/auth_principals/)
Provided Capabilities
type: infrastructure
title: SSH reverse tunnel connectivity
description: Named, auto-reconnecting SSH reverse tunnels with health checks and audit logging — keeps remote execution environments continuously connected to the local Custodian State Hub.
keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge]
Getting Oriented
- Start with: README.txt (architecture, config format, CLI commands, MCP integration)
- Key files / directories: ~/.config/bridge/tunnels.yaml (tunnel config), ~/.local/state/bridge/ (PID/state/cert files)
- Entry points: bridge --help; bridge up <tunnel-name>; bridge cert-status; MCP: bridge_status()
- AccessManagementDirective context: wiki/AccessManagementDirective.md
- Workplans: BRIDGE-WP-0004 (directive alignment), WARDEN-WP-0001 (ops-warden bootstrap)