tegwick bd169a07e2 feat(directive): implement BRIDGE-WP-0004 AccessManagementDirective alignment

- ActorType enum (adm/agt/atm) replaces actor_class string; config validates
  naming convention (adm-*/agt-*/atm-*) with hard ConfigError on mismatch;
  legacy 'human'/'automation' values accepted with DeprecationWarning
- cert_command: pluggable shell string run before each SSH launch; cert written
  to state dir; -i cert appended to SSH command alongside -i key
- TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart
  5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged
- CertAcquisitionError: cert failures trigger normal backoff/retry loop
- cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event
- bridge cert-status: new CLI command; exit 1 on expired cert; --json flag
- 233 tests passing, ruff clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-15 09:38:29 +02:00

5.9 KiB

SCOPE

This file helps you quickly understand what this repository is about, when it is relevant, and when it is not. It is intentionally lightweight and may be incomplete.

One-liner

SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards. Supports both static SSH keys (no TTL) and CA-signed short-lived certificates via a pluggable cert_command interface.

Core Idea

Claude Code sessions and the Custodian State Hub API both run locally, but remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. To close that gap, ops-bridge manages named SSH reverse tunnels with auto-reconnect, health checks, audit logging, and an MCP server so Claude Code can start, stop, and inspect tunnels as tools.
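
As a rough illustration of what a single reverse tunnel amounts to, the sketch below assembles an `ssh -R` argument list. The function name, parameters, and keepalive value are hypothetical and not taken from the ops-bridge code base. The only detail drawn from this repository is that, in cert_command mode, the certificate is passed with a second `-i` alongside the key.

```python
from typing import List, Optional


def build_reverse_tunnel_argv(
    host: str,
    remote_port: int,
    local_port: int,
    ssh_key: str,
    cert_path: Optional[str] = None,
    local_host: str = "localhost",
) -> List[str]:
    """Build an argv list for an SSH reverse port-forward.

    Hypothetical helper for illustration; not the ops-bridge API.
    """
    argv = [
        "ssh",
        "-N",  # no remote command, forward ports only
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=15",  # illustrative keepalive, an assumption
        "-R", f"{remote_port}:{local_host}:{local_port}",
        "-i", ssh_key,
    ]
    if cert_path is not None:
        # cert_command mode: certificate passed with a second -i
        argv += ["-i", cert_path]
    argv.append(host)
    return argv
```

With static keys, `cert_path` stays `None` and the command carries a single `-i`. The resulting list can be handed to `subprocess.Popen` without a shell.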

In Scope

Named SSH reverse tunnel lifecycle (bridge up/down/restart/status/logs/cert-status)
Auto-reconnect with exponential backoff and configurable retry policy
Optional HTTP health checks (confirm forwarded service is actually reachable from remote)
Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
Actor attribution: per-tunnel actor type (adm / agt / atm) for audit traceability, with naming convention enforcement (adm-*, agt-*, atm-*)
Static key mode (default): ssh_key passed directly to SSH — no TTL, no cert logic, works without any CA or external tooling
cert_command mode (optional): pluggable shell command that issues a short-lived CA-signed certificate before each SSH launch; TTL-aware pre-emptive cert refresh; cert_identity recorded in audit log — satisfies AccessManagementDirective §5
PID + state file management in ~/.local/state/bridge/
MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
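
The actor attribution rules above can be pictured with a small sketch. It assumes the naming convention applies to the actor's name string and that legacy values map as listed under Terminology. `resolve_actor_type` and `validate_actor` are hypothetical names, and the `ConfigError` class here is a stand-in for whatever the code base actually defines.

```python
import warnings
from enum import Enum


class ConfigError(ValueError):
    """Stand-in for the hard configuration error raised on a mismatch."""


class ActorType(str, Enum):
    ADM = "adm"  # human operator
    AGT = "agt"  # LLM agent
    ATM = "atm"  # deterministic automation


# Deprecated actor_class values and their replacements.
LEGACY_ACTOR_CLASS = {"human": ActorType.ADM, "automation": ActorType.ATM}


def resolve_actor_type(value: str) -> ActorType:
    """Map a config value to an ActorType, warning on legacy values."""
    if value in LEGACY_ACTOR_CLASS:
        replacement = LEGACY_ACTOR_CLASS[value]
        warnings.warn(
            f"actor_class {value!r} is deprecated; use {replacement.value!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    try:
        return ActorType(value)
    except ValueError:
        raise ConfigError(f"unknown actor type: {value!r}") from None


def validate_actor(name: str, actor_type: ActorType) -> None:
    """Raise ConfigError unless the name carries the prefix for its type."""
    prefix = f"{actor_type.value}-"
    if not name.startswith(prefix):
        raise ConfigError(f"actor {name!r} must start with {prefix!r}")
```

For example, `validate_actor("agt-worker-01", ActorType.AGT)` passes silently, while `validate_actor("worker-01", ActorType.AGT)` raises `ConfigError`.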

Out of Scope

Credential issuance and CA management (owned by ops-warden; ops-bridge consumes certs via the cert_command interface but never signs anything itself)
SSH key generation for human admins (self-service: ssh-keygen)
Host-side principal deployment (/etc/ssh/auth_principals/) — that is railiance-infra
Long-running application hosting on remote machines (port-forward only, not deployment)
VPN or layer-3 connectivity
Monitoring/alerting beyond JSON audit logs
Replacing SSH for general interactive access

Relevant When

Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
Need an audit trail of which actor (adm / agt / atm) started or stopped tunnels
Setting up a new machine in the Railiance ecosystem that must phone home to the hub
Diagnosing connectivity issues between local hub and remote services
Checking certificate validity for active tunnels (bridge cert-status)
Integrating with a CA (ops-warden or Vault) for short-lived tunnel credentials
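
For the certificate-related cases above, the following sketch shows one way to read the expiry and Key ID from `ssh-keygen -L` output and to decide when a pre-emptive refresh is due. The five-minute margin comes from the BRIDGE-WP-0004 commit description. The function names and the hand-written sample output are illustrative assumptions, not ops-bridge internals.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Refresh margin taken from the commit description (5 min before expiry).
REFRESH_MARGIN = timedelta(minutes=5)

_VALID_RE = re.compile(r"^\s*Valid:\s+from\s+(\S+)\s+to\s+(\S+)\s*$", re.MULTILINE)
_KEY_ID_RE = re.compile(r'^\s*Key ID:\s+"(.*)"\s*$', re.MULTILINE)

# Hand-written sample in the shape ssh-keygen -L prints; values are made up.
SAMPLE = """\
id_ed25519-cert.pub:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Key ID: "agt-worker-01"
        Serial: 1
        Valid: from 2026-05-15T09:00:00 to 2026-05-15T10:00:00
"""


def parse_valid_to(keygen_output: str) -> Optional[datetime]:
    """Return the expiry as a naive datetime, or None if no from/to window is found
    (for example "Valid: forever")."""
    match = _VALID_RE.search(keygen_output)
    if match is None:
        return None
    return datetime.fromisoformat(match.group(2))


def parse_key_id(keygen_output: str) -> Optional[str]:
    """Return the certificate's Key ID, or None if the line is absent."""
    match = _KEY_ID_RE.search(keygen_output)
    return match.group(1) if match else None


def needs_refresh(valid_to: Optional[datetime], now: datetime,
                  margin: timedelta = REFRESH_MARGIN) -> bool:
    """True once `now` is within `margin` of expiry; never for non-expiring certs."""
    if valid_to is None:
        return False
    return now >= valid_to - margin
```

In practice the text would come from something like `subprocess.run(["ssh-keygen", "-L", "-f", cert_path], capture_output=True, text=True).stdout`. The timestamps are printed without a zone suffix, so the sketch compares naive datetimes and assumes `now` is in the same local time.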

Not Relevant When

All work is local (no remote services involved)
Manually running ssh -R is acceptable
No need for audit tracing of tunnel state changes

Current State

Status: active (v0.1 core complete; AccessManagementDirective alignment done — BRIDGE-WP-0004)
Implementation: ~80% — CLI tunneling fully functional, MCP integration working, health checks and audit logging complete; ActorType enum (adm/agt/atm) enforced; cert_command mode implemented with TTL-aware refresh and cert_identity audit logging; OpsCatalog framework present but not yet populated
Stability: stable tunnel lifecycle; tested under network drops and SSH failures
Usage: running in lab for daily Railiance/Temporal connectivity
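
The reconnect behaviour behind that stability claim is described under In Scope as exponential backoff with a configurable retry policy. A minimal sketch of such a delay schedule follows. The parameter names and default values are assumptions chosen for illustration, not the actual ops-bridge configuration keys.

```python
from typing import Iterator, Optional


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 60.0,
    max_attempts: Optional[int] = None,
) -> Iterator[float]:
    """Yield reconnect delays in seconds: base, base*factor, ... capped at max_delay.

    Hypothetical helper; with max_attempts=None it yields forever.
    """
    delay = base
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        yield min(delay, max_delay)
        # Cap the stored value as well so it cannot grow without bound.
        delay = min(delay * factor, max_delay)
        attempt += 1
```

A supervisor loop would sleep for each yielded delay after a failed SSH launch and restart the generator once a connection succeeds. According to the commit description, a pre-emptive restart for an expiring certificate skips this schedule entirely (no backoff, no attempt increment).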

How It Fits

Upstream dependencies: SSH (system), OpenSSH server on remote hosts
Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach the local hub MCP; the activity-core Temporal server is reachable via a bridge tunnel
Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)

Terminology

Preferred terms: tunnel, bridge, actor, actor_type, reconnect policy, health check, cert_command, cert_identity
Actor types: adm (human operator), agt (LLM agent), atm (deterministic automation)
Also known as: "the bridge"
Potentially confusing: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
Legacy terms (deprecated): actor_class: human (→ adm), actor_class: automation (→ atm)
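
The "bridge state" machine mentioned above can be sketched as a small transition table. The first four entries are read directly off the arrow diagram. What follows `reconnecting`, and whether a tunnel can be stopped from any state, is not stated in this file, so those parts are marked as assumptions. `BridgeState` and `can_transition` are hypothetical names.

```python
from enum import Enum


class BridgeState(str, Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"


ALLOWED = {
    BridgeState.STOPPED: {BridgeState.STARTING},
    BridgeState.STARTING: {BridgeState.CONNECTED},
    BridgeState.CONNECTED: {BridgeState.DEGRADED},
    BridgeState.DEGRADED: {BridgeState.CONNECTED, BridgeState.RECONNECTING},
    # Assumption: a successful reconnect returns to "connected".
    BridgeState.RECONNECTING: {BridgeState.CONNECTED},
}


def can_transition(current: BridgeState, target: BridgeState) -> bool:
    """Return True if moving from `current` to `target` is allowed in this sketch."""
    # Assumption: stopping (e.g. via bridge down) is possible from any running state.
    if target is BridgeState.STOPPED:
        return current is not BridgeState.STOPPED
    return target in ALLOWED[current]
```

Keeping the table as data makes it easy to reject an illegal jump such as `stopped` to `connected` before writing the state file.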

Related Repositories

the-custodian — primary consumer; ops-bridge keeps remote agents connected to it
ops-warden — optional upstream; owns CA and cert issuance; ops-bridge calls it via cert_command when short-lived certificates are required
activity-core — Temporal server on a remote host, reached via an ops-bridge tunnel
railiance-cluster / railiance-infra — remote hosts that need to phone home; railiance-infra owns host-side principal deployment (/etc/ssh/auth_principals/)

Provided Capabilities

type: infrastructure
title: SSH reverse tunnel connectivity
description: Named, auto-reconnecting SSH reverse tunnels with health checks and audit logging — keeps remote execution environments continuously connected to the local Custodian State Hub.
keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge]

Getting Oriented

Start with: README.txt (architecture, config format, CLI commands, MCP integration)
Key files / directories: ~/.config/bridge/tunnels.yaml (tunnel config), ~/.local/state/bridge/ (PID/state/cert files)
Entry points: bridge --help; bridge up <tunnel-name>; bridge cert-status; MCP: bridge_status()
AccessManagementDirective context: wiki/AccessManagementDirective.md
Workplans: BRIDGE-WP-0004 (directive alignment), WARDEN-WP-0001 (ops-warden bootstrap)
