ops-bridge/SCOPE.md
tegwick bd169a07e2 feat(directive): implement BRIDGE-WP-0004 AccessManagementDirective alignment
- ActorType enum (adm/agt/atm) replaces actor_class string; config validates
  naming convention (adm-*/agt-*/atm-*) with hard ConfigError on mismatch;
  legacy 'human'/'automation' values accepted with DeprecationWarning
- cert_command: pluggable shell string run before each SSH launch; cert written
  to state dir; -i cert appended to SSH command alongside -i key
- TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart
  5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged
- CertAcquisitionError: cert failures trigger normal backoff/retry loop
- cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event
- bridge cert-status: new CLI command; exit 1 on expired cert; --json flag
- 233 tests passing, ruff clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:38:29 +02:00


SCOPE

This file helps you quickly understand what this repository is about, when it is relevant, and when it is not. It is intentionally lightweight and may be incomplete.


One-liner

SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards. Supports both static SSH keys (no TTL) and CA-signed short-lived certificates via a pluggable cert_command interface.


Core Idea

Claude Code sessions and the Custodian State Hub API both run locally, but remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. ops-bridge manages named SSH reverse tunnels with auto-reconnect, health checks, and audit logging, and exposes an MCP server so Claude Code can start, stop, and inspect tunnels as tools.
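
To make the reverse-tunnel idea concrete, here is a minimal sketch of how an ssh argument list for one tunnel could be assembled. The TunnelSpec fields, the build_ssh_argv helper, and the keep-alive values are assumptions made for illustration and are not taken from the ops-bridge code; only the ssh options themselves (-N, -R, -i, -o) are standard OpenSSH.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TunnelSpec:
    # Hypothetical field names; not the ops-bridge config schema.
    host: str                        # remote SSH destination, e.g. "agt-worker@node1"
    remote_port: int                 # port opened on the remote machine
    local_port: int                  # local service port (e.g. the hub API)
    ssh_key: str                     # path to the private key
    cert_path: Optional[str] = None  # set only when a certificate was issued


def build_ssh_argv(spec: TunnelSpec) -> List[str]:
    """Return an ssh argv for one reverse tunnel (no shell involved)."""
    argv = [
        "ssh",
        "-N",                              # forward only, run no remote command
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=30",    # assumed keep-alive settings so a dead
        "-o", "ServerAliveCountMax=3",     # link ends the process and can be retried
        "-R", f"{spec.remote_port}:localhost:{spec.local_port}",
        "-i", spec.ssh_key,
    ]
    if spec.cert_path:
        # The commit notes describe passing the cert with a second -i.
        argv += ["-i", spec.cert_path]
    argv.append(spec.host)
    return argv
```

A supervisor would typically hand this list to subprocess.Popen and treat process exit as the signal to reconnect.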


In Scope

  • Named SSH reverse tunnel lifecycle (bridge up/down/restart/status/logs/cert-status)
  • Auto-reconnect with exponential backoff and configurable retry policy
  • Optional HTTP health checks (confirm forwarded service is actually reachable from remote)
  • Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
  • Actor attribution: per-tunnel actor type (adm / agt / atm) for audit traceability, with naming convention enforcement (adm-*, agt-*, atm-*)
  • Static key mode (default): ssh_key passed directly to SSH — no TTL, no cert logic, works without any CA or external tooling
  • cert_command mode (optional): pluggable shell command that issues a short-lived CA-signed certificate before each SSH launch; TTL-aware pre-emptive cert refresh; cert_identity recorded in audit log — satisfies AccessManagementDirective §5
  • PID + state file management in ~/.local/state/bridge/
  • MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
  • OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
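
The auto-reconnect item above can be illustrated with a capped exponential backoff. This is a sketch under assumed defaults (1 s base, doubling, 60 s cap); the function name and parameters are hypothetical, and the real retry policy is whatever the tunnel configuration defines.

```python
def backoff_delay(attempt: int, base: float = 1.0, factor: float = 2.0,
                  cap: float = 60.0) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    The delay grows geometrically and is clamped to `cap`. The loop stops
    as soon as the cap is reached, so large attempt counts cannot overflow.
    """
    delay = min(base, cap)
    for _ in range(attempt):
        if delay >= cap:
            break
        delay = min(cap, delay * factor)
    return delay


# First seven delays with the assumed defaults.
schedule = [backoff_delay(n) for n in range(7)]
```

With these defaults, schedule is [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]. According to the commit notes, a pre-emptive restart for an expiring certificate skips this path entirely (no backoff, no attempt increment), while a failed certificate acquisition goes through the normal retry loop.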

Out of Scope

  • Credential issuance and CA management (owned by ops-warden; ops-bridge consumes certs via the cert_command interface but never signs anything itself)
  • SSH key generation for human admins (self-service: ssh-keygen)
  • Host-side principal deployment (/etc/ssh/auth_principals/) — that is railiance-infra
  • Long-running application hosting on remote machines (port-forward only, not deployment)
  • VPN or layer-3 connectivity
  • Monitoring/alerting beyond JSON audit logs
  • Replacing SSH for general interactive access

Relevant When

  • Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
  • Need audit trail of which actor (adm / agt / atm) started/stopped tunnels
  • Setting up a new machine in the Railiance ecosystem that must phone home to the hub
  • Diagnosing connectivity issues between local hub and remote services
  • Checking certificate validity for active tunnels (bridge cert-status)
  • Integrating with a CA (ops-warden or Vault) for short-lived tunnel credentials
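
For the certificate validity check, the commit notes say the expiry is read from ssh-keygen -L output and the tunnel is restarted 5 minutes before it. The sketch below shows one way to do that parsing on the text of such a listing. The helper names and the SAMPLE listing are made up for illustration, and the regular expressions assume the usual "Valid: from ... to ..." and 'Key ID: "..."' lines that OpenSSH prints.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches "Valid: from <start> to <end>" and the one-sided "Valid: before <end>".
_VALID_TO_RE = re.compile(r"Valid:\s+(?:from\s+\S+\s+to|before)\s+(\S+)")
_KEY_ID_RE = re.compile(r'Key ID:\s+"([^"]*)"')

# Illustrative listing only; not output captured from a real certificate.
SAMPLE = """\
/tmp/example-cert.pub:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Key ID: "agt-worker-1"
        Serial: 1
        Valid: from 2026-05-15T09:00:00 to 2026-05-15T10:00:00
"""


def parse_valid_to(listing: str) -> Optional[datetime]:
    """Return the expiry timestamp, or None if no end date is listed."""
    match = _VALID_TO_RE.search(listing)
    if match is None:
        return None  # e.g. "Valid: forever"
    return datetime.fromisoformat(match.group(1))


def parse_key_id(listing: str) -> Optional[str]:
    """Return the certificate Key ID, or None if the line is absent."""
    match = _KEY_ID_RE.search(listing)
    return match.group(1) if match else None


def seconds_until_refresh(valid_to: datetime, now: datetime,
                          margin: timedelta = timedelta(minutes=5)) -> float:
    """Seconds to wait before a pre-emptive restart; 0.0 if already due."""
    return max(0.0, (valid_to - margin - now).total_seconds())
```

In practice the listing text would come from running ssh-keygen -L -f on the certificate file, and ssh-keygen prints these timestamps in local time, so now should be a naive local datetime as well.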

Not Relevant When

  • All work is local (no remote services involved)
  • Manually running ssh -R is acceptable
  • No need for audit tracing of tunnel state changes

Current State

  • Status: active (v0.1 core complete; AccessManagementDirective alignment done — BRIDGE-WP-0004)
  • Implementation: ~80% — CLI tunneling fully functional, MCP integration working, health checks and audit logging complete; ActorType enum (adm/agt/atm) enforced; cert_command mode implemented with TTL-aware refresh and cert_identity audit logging; OpsCatalog framework present but not yet populated
  • Stability: stable tunnel lifecycle; tested under network drops and SSH failures
  • Usage: running in lab for daily Railiance/Temporal connectivity

How It Fits

  • Upstream dependencies: SSH (system), OpenSSH server on remote hosts
  • Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach the local hub MCP; the activity-core Temporal server is reachable via a bridge tunnel
  • Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)

Terminology

  • Preferred terms: tunnel, bridge, actor, actor_type, reconnect policy, health check, cert_command, cert_identity
  • Actor types: adm (human operator), agt (LLM agent), atm (deterministic automation)
  • Also known as: "the bridge"
  • Potentially confusing: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
  • Legacy terms (deprecated): actor_class: human (→ adm), actor_class: automation (→ atm)
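
The actor terms above map naturally onto an enum with a prefix check. The following is a rough sketch of that validation and not the project's actual code: the names ActorType and ConfigError appear in the commit notes, while parse_actor_type, check_actor_name, and the choice of ValueError as base class are assumptions.

```python
import warnings
from enum import Enum


class ActorType(Enum):
    ADM = "adm"  # human operator
    AGT = "agt"  # LLM agent
    ATM = "atm"  # deterministic automation


class ConfigError(ValueError):
    """Raised for invalid tunnel configuration (base class is an assumption)."""


# Deprecated actor_class values and their replacements.
_LEGACY = {"human": ActorType.ADM, "automation": ActorType.ATM}


def parse_actor_type(value: str) -> ActorType:
    """Map a config string to an ActorType, warning on legacy values."""
    if value in _LEGACY:
        replacement = _LEGACY[value]
        warnings.warn(
            f"actor_class {value!r} is deprecated; use {replacement.value!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    try:
        return ActorType(value)
    except ValueError:
        raise ConfigError(f"unknown actor type: {value!r}") from None


def check_actor_name(name: str, actor_type: ActorType) -> None:
    """Enforce the adm-*/agt-*/atm-* naming convention."""
    prefix = actor_type.value + "-"
    if not name.startswith(prefix):
        raise ConfigError(f"actor {name!r} must start with {prefix!r}")
```

For example, check_actor_name("agt-worker-1", ActorType.AGT) passes silently, while check_actor_name("worker-1", ActorType.AGT) raises ConfigError.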

Related Repositories

  • the-custodian — primary consumer; ops-bridge keeps remote agents connected to it
  • ops-warden — optional upstream; owns CA and cert issuance; ops-bridge calls it via cert_command when short-lived certificates are required
  • activity-core — Temporal server on remote reached via ops-bridge tunnel
  • railiance-cluster / railiance-infra — remote hosts that need to phone home; railiance-infra owns host-side principal deployment (/etc/ssh/auth_principals/)

Provided Capabilities

type: infrastructure
title: SSH reverse tunnel connectivity
description: Named, auto-reconnecting SSH reverse tunnels with health checks and audit logging — keeps remote execution environments continuously connected to the local Custodian State Hub.
keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge]

Getting Oriented

  • Start with: README.txt (architecture, config format, CLI commands, MCP integration)
  • Key files / directories: ~/.config/bridge/tunnels.yaml (tunnel config), ~/.local/state/bridge/ (PID/state/cert files)
  • Entry points: bridge --help; bridge up <tunnel-name>; bridge cert-status; MCP: bridge_status()
  • AccessManagementDirective context: wiki/AccessManagementDirective.md
  • Workplans: BRIDGE-WP-0004 (directive alignment), WARDEN-WP-0001 (ops-warden bootstrap)