SCOPE

This file helps you quickly understand what this repository is about, when it is relevant, and when it is not. It is intentionally lightweight and may be incomplete.


One-liner

SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards. Supports both static SSH keys (no TTL) and CA-signed short-lived certificates via a pluggable cert_command interface.


Core Idea

Claude Code sessions and the Custodian State Hub API both run locally, but remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. Ops-bridge manages named SSH reverse tunnels with auto-reconnect, health checks, and audit logging, and provides an MCP server so Claude Code can start, stop, and inspect tunnels as tools.
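
As a rough illustration of what one managed tunnel amounts to, the sketch below builds the argument list for a single ssh -R reverse forward. The helper name, its parameters, and the example host and ports are hypothetical and not taken from ops-bridge; the flags themselves are standard OpenSSH client options.

```python
from typing import List, Optional


def reverse_tunnel_argv(
    host: str,
    remote_port: int,
    local_port: int,
    ssh_key: Optional[str] = None,
) -> List[str]:
    """Build an argv for one SSH reverse port-forward (hypothetical helper)."""
    argv = [
        "ssh",
        "-N",  # no remote command, forwarding only
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=30",  # keepalive so a dead link is noticed
        "-R", f"{remote_port}:localhost:{local_port}",
    ]
    if ssh_key is not None:
        argv += ["-i", ssh_key]  # static key mode: pass the key straight to ssh
    argv.append(host)
    return argv


# Placeholder host and ports, for illustration only.
print(" ".join(reverse_tunnel_argv("worker@node-1", 8000, 8000)))
```

A lifecycle manager would wrap a command like this in a supervised process and restart it when it exits, which is the part ops-bridge adds on top of a manual ssh -R.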


In Scope

  • Named SSH reverse tunnel lifecycle (bridge up/down/restart/status/logs/cert-status)
  • Auto-reconnect with exponential backoff and configurable retry policy
  • Optional HTTP health checks (confirm forwarded service is actually reachable from remote)
  • Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
  • Actor attribution: per-tunnel actor type (adm / agt / atm) for audit traceability, with naming convention enforcement (adm-*, agt-*, atm-*)
  • Static key mode (default): ssh_key passed directly to SSH — no TTL, no cert logic, works without any CA or external tooling
  • cert_command mode (optional): pluggable shell command that issues a short-lived CA-signed certificate before each SSH launch; TTL-aware pre-emptive cert refresh; cert_identity recorded in audit log — satisfies AccessManagementDirective §5
  • PID + state file management in ~/.local/state/bridge/
  • MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
  • OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
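
The auto-reconnect item above can be pictured with a small delay generator. This is a minimal sketch of exponential backoff in general, not the ops-bridge implementation: the parameter names (base, factor, max_delay, max_attempts, jitter) and their defaults are assumptions chosen for the example.

```python
import random
from typing import Iterator


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 60.0,
    max_attempts: int = 6,
    jitter: float = 0.0,
) -> Iterator[float]:
    """Yield the wait in seconds before each reconnect attempt."""
    delay = base
    for _ in range(max_attempts):
        # Cap the delay, then optionally stretch it by a random fraction
        # so many tunnels do not all reconnect at the same instant.
        yield min(delay, max_delay) * (1.0 + random.uniform(0.0, jitter))
        delay *= factor


print(list(backoff_delays(max_delay=10.0)))
# [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

A supervisor loop would sleep for each yielded value between SSH launches and reset the generator once a connection succeeds.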

Out of Scope

  • Credential issuance and CA management (owned by ops-warden; ops-bridge consumes certs via the cert_command interface but never signs anything itself)
  • SSH key generation for human admins (self-service: ssh-keygen)
  • Host-side principal deployment (/etc/ssh/auth_principals/) — that is railiance-infra
  • Long-running application hosting on remote machines (port-forward only, not deployment)
  • VPN or layer-3 connectivity
  • Monitoring/alerting beyond JSON audit logs
  • Replacing SSH for general interactive access

Relevant When

  • Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
  • An audit trail is needed of which actor (adm / agt / atm) started or stopped tunnels
  • Setting up a new machine in the Railiance ecosystem that must phone home to the hub
  • Diagnosing connectivity issues between local hub and remote services
  • Checking certificate validity for active tunnels (bridge cert-status)
  • Integrating with a CA (ops-warden or Vault) for short-lived tunnel credentials
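
For the certificate-related cases above, a TTL-aware refresh decision might look like the following sketch. The function name, the five-minute margin, and the idea of comparing against a valid-before timestamp are assumptions for illustration; how ops-bridge actually reads a certificate's expiry is not described in this file.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(
    valid_before: datetime,
    now: datetime,
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if the cert has expired or will expire within `margin`."""
    return now >= valid_before - margin


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(needs_refresh(now + timedelta(minutes=30), now))  # False
print(needs_refresh(now + timedelta(minutes=3), now))   # True
```

Refreshing before expiry rather than after a failed login avoids a tunnel dropping only because its credential aged out between reconnects.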

Not Relevant When

  • All work is local (no remote services involved)
  • Manually running ssh -R is acceptable
  • No need for audit tracing of tunnel state changes

Current State

  • Status: active (v0.1 core complete; directive alignment in progress — BRIDGE-WP-0004)
  • Implementation: ~75% — CLI tunneling fully functional, MCP integration working, health checks and audit logging complete; OpsCatalog framework present but not populated; cert_command / ActorType alignment not yet implemented
  • Stability: stable tunnel lifecycle; tested under network drops and SSH failures
  • Usage: running in lab for daily Railiance/Temporal connectivity

How It Fits

  • Upstream dependencies: SSH (system), OpenSSH server on remote hosts
  • Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach the local hub MCP; the activity-core Temporal server is reachable via a bridge tunnel
  • Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)

Terminology

  • Preferred terms: tunnel, bridge, actor, actor_type, reconnect policy, health check, cert_command, cert_identity
  • Actor types: adm (human operator), agt (LLM agent), atm (deterministic automation)
  • Also known as: "the bridge"
  • Potentially confusing: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
  • Legacy terms (deprecated): actor_class: human (→ adm), actor_class: automation (→ atm)
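
The actor types and legacy mappings listed above can be expressed as a small normalization step. The sketch below is illustrative only: the function names and the example actor name are hypothetical, while the three type codes, the name prefixes, and the two legacy mappings come from this file.

```python
from typing import Dict, Tuple

ACTOR_TYPES: Tuple[str, ...] = ("adm", "agt", "atm")
# Deprecated actor_class values and their replacements.
LEGACY_ACTOR_CLASSES: Dict[str, str] = {"human": "adm", "automation": "atm"}


def normalize_actor_type(value: str) -> str:
    """Map a legacy actor_class to its actor_type and reject unknown values."""
    value = LEGACY_ACTOR_CLASSES.get(value, value)
    if value not in ACTOR_TYPES:
        raise ValueError(f"unknown actor type: {value!r}")
    return value


def actor_type_from_name(actor: str) -> str:
    """Derive the actor type from the naming convention (adm-*, agt-*, atm-*)."""
    prefix, sep, rest = actor.partition("-")
    if not sep or not rest or prefix not in ACTOR_TYPES:
        raise ValueError(f"actor name must start with adm-, agt- or atm-: {actor!r}")
    return prefix


print(normalize_actor_type("human"))       # adm
print(actor_type_from_name("agt-claude"))  # agt
```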

Related Repositories

  • the-custodian — primary consumer; ops-bridge keeps remote agents connected to it
  • ops-warden — optional upstream; owns CA and cert issuance; ops-bridge calls it via cert_command when short-lived certificates are required
  • activity-core — Temporal server on remote reached via ops-bridge tunnel
  • railiance-cluster / railiance-infra — remote hosts that need to phone home; railiance-infra owns host-side principal deployment (/etc/ssh/auth_principals/)

Provided Capabilities

type: infrastructure
title: SSH reverse tunnel connectivity
description: Named, auto-reconnecting SSH reverse tunnels with health checks and audit logging — keeps remote execution environments continuously connected to the local Custodian State Hub.
keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge]

Getting Oriented

  • Start with: README.txt (architecture, config format, CLI commands, MCP integration)
  • Key files / directories: ~/.config/bridge/tunnels.yaml (tunnel config), ~/.local/state/bridge/ (PID/state/cert files)
  • Entry points: bridge --help; bridge up <tunnel-name>; bridge cert-status; MCP: bridge_status()
  • AccessManagementDirective context: wiki/AccessManagementDirective.md
  • Workplans: BRIDGE-WP-0004 (directive alignment), WARDEN-WP-0001 (ops-warden bootstrap)