ops-bridge/SCOPE.md
tegwick bd169a07e2 feat(directive): implement BRIDGE-WP-0004 AccessManagementDirective alignment
- ActorType enum (adm/agt/atm) replaces actor_class string; config validates
  naming convention (adm-*/agt-*/atm-*) with hard ConfigError on mismatch;
  legacy 'human'/'automation' values accepted with DeprecationWarning
- cert_command: pluggable shell string run before each SSH launch; cert written
  to state dir; -i cert appended to SSH command alongside -i key
- TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart
  5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged
- CertAcquisitionError: cert failures trigger normal backoff/retry loop
- cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event
- bridge cert-status: new CLI command; exit 1 on expired cert; --json flag
- 233 tests passing, ruff clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:38:29 +02:00


# SCOPE
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete.
---
## One-liner
SSH reverse tunnel lifecycle manager: keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards. Supports both static SSH keys (no TTL) and CA-signed short-lived certificates via a pluggable `cert_command` interface.

---
## Core Idea
Claude Code sessions and the Custodian State Hub API both run locally, but remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. Ops-bridge manages named SSH reverse tunnels to those machines, with auto-reconnect, health checks, audit logging, and an MCP server so Claude Code can start, stop, and inspect tunnels as tools.
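
A minimal sketch of the `ssh` invocation a single reverse tunnel reduces to. It assumes a hypothetical `TunnelSpec` record whose field names are illustrative and are not the real `tunnels.yaml` schema. `-N`, `-R`, `-i` and the `-o` options are standard OpenSSH flags. The keepalive and `ExitOnForwardFailure` settings are one reasonable choice and not necessarily what ops-bridge passes. Per the BRIDGE-WP-0004 commit message, cert mode appends `-i <cert>` alongside `-i <key>`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class TunnelSpec:
    """Hypothetical subset of one tunnel definition (illustrative field names)."""
    name: str
    remote_host: str                  # SSH destination, e.g. "ops@node1"
    remote_port: int                  # port opened on the remote machine
    local_port: int                   # port of the local hub service
    ssh_key: Path                     # static key, always passed
    cert_file: Optional[Path] = None  # set only in cert_command mode
    local_host: str = "localhost"


def build_ssh_argv(spec: TunnelSpec) -> List[str]:
    """Build argv for a reverse tunnel: remote:remote_port -> local_host:local_port."""
    argv = [
        "ssh",
        "-N",                              # forward only, run no remote command
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=30",    # keepalives to detect dead links
        "-o", "ServerAliveCountMax=3",
        "-R", f"{spec.remote_port}:{spec.local_host}:{spec.local_port}",
        "-i", str(spec.ssh_key),
    ]
    if spec.cert_file is not None:
        # cert_command mode: the freshly issued certificate goes next to the key
        argv += ["-i", str(spec.cert_file)]
    argv.append(spec.remote_host)
    return argv
```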

---
## In Scope
- Named SSH reverse tunnel lifecycle (`bridge up/down/restart/status/logs/cert-status`)
- Auto-reconnect with exponential backoff and configurable retry policy
- Optional HTTP health checks that confirm the forwarded service is actually reachable from the remote side
- Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
- Actor attribution: per-tunnel actor type (`adm` / `agt` / `atm`) for audit traceability,
with naming convention enforcement (`adm-*`, `agt-*`, `atm-*`)
- **Static key mode** (default): `ssh_key` passed directly to SSH — no TTL, no cert logic,
works without any CA or external tooling
- **cert_command mode** (optional): pluggable shell command that issues a short-lived
CA-signed certificate before each SSH launch; TTL-aware pre-emptive cert refresh;
`cert_identity` recorded in audit log — satisfies AccessManagementDirective §5
- PID, state, and cert file management in `~/.local/state/bridge/`
- MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
- OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
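
A sketch of the actor-type rules listed above. It also covers the commit message's hard `ConfigError` on a naming mismatch and the `DeprecationWarning` for legacy `human`/`automation` values. `ConfigError` here is a local stand-in. `resolve_actor` and its `actor`/`actor_type` arguments are hypothetical names. Applying the prefix check to legacy values as well is an assumption.

```python
import warnings
from enum import Enum


class ConfigError(Exception):
    """Local stand-in for the project's ConfigError."""


class ActorType(Enum):
    ADM = "adm"  # human operator
    AGT = "agt"  # LLM agent
    ATM = "atm"  # deterministic automation


# Legacy actor_class values and their replacements
LEGACY_ACTOR_CLASS = {"human": ActorType.ADM, "automation": ActorType.ATM}


def resolve_actor(actor: str, actor_type: str) -> ActorType:
    """Validate one tunnel's actor name against its declared type (hypothetical helper)."""
    if actor_type in LEGACY_ACTOR_CLASS:
        kind = LEGACY_ACTOR_CLASS[actor_type]
        warnings.warn(
            f"actor_class {actor_type!r} is deprecated; use {kind.value!r}",
            DeprecationWarning,
            stacklevel=2,
        )
    else:
        try:
            kind = ActorType(actor_type)
        except ValueError:
            raise ConfigError(f"unknown actor type {actor_type!r}") from None
    # Naming convention: the actor name must carry the type prefix (adm-*, agt-*, atm-*).
    # Assumption: legacy values are checked against the prefix of their replacement.
    if not actor.startswith(kind.value + "-"):
        raise ConfigError(f"actor {actor!r} does not match the {kind.value}-* convention")
    return kind
```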
---
## Out of Scope
- Credential issuance and CA management (owned by `ops-warden`; ops-bridge consumes
certs via the `cert_command` interface but never signs anything itself)
- SSH key generation for human admins (self-service: `ssh-keygen`)
- Host-side principal deployment (`/etc/ssh/auth_principals/`) — that is `railiance-infra`
- Long-running application hosting on remote machines (port-forward only, not deployment)
- VPN or layer-3 connectivity
- Monitoring/alerting beyond JSON audit logs
- Replacing SSH for general interactive access
---
## Relevant When
- Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
- Need audit trail of which actor (`adm` / `agt` / `atm`) started/stopped tunnels
- Setting up a new machine in the Railiance ecosystem that must phone home to the hub
- Diagnosing connectivity issues between local hub and remote services
- Checking certificate validity for active tunnels (`bridge cert-status`)
- Integrating with a CA (ops-warden or Vault) for short-lived tunnel credentials
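
Certificate checks such as `bridge cert-status` depend on the certificate's validity window and Key ID. The commit message says both are read via `ssh-keygen -L`. Below is a minimal parsing sketch, assuming the usual `ssh-keygen -L` layout (`Key ID: "..."`, and `Valid: from <start> to <end>` or `Valid: forever`). It also assumes timestamps are printed as local time without an offset, so they are kept as naive `datetime`s. `parse_keygen_listing`, `CertInfo` and `cert_status_exit_code` are hypothetical names. The exit-1-on-expiry rule comes from the commit message.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CertInfo:
    key_id: Optional[str]         # becomes cert_identity in the audit log
    valid_to: Optional[datetime]  # None means no upper bound ("forever", "after ...")


_KEY_ID = re.compile(r'^\s*Key ID:\s*"([^"]*)"', re.MULTILINE)
_VALID = re.compile(r"^\s*Valid:\s*(.+)$", re.MULTILINE)
_UPPER = re.compile(r"\b(?:to|before) (\S+)")


def parse_keygen_listing(text: str) -> CertInfo:
    """Parse the text printed by `ssh-keygen -L -f <cert>`."""
    key_id_match = _KEY_ID.search(text)
    valid_to = None
    valid_match = _VALID.search(text)
    if valid_match:
        upper = _UPPER.search(valid_match.group(1))
        if upper:
            valid_to = datetime.fromisoformat(upper.group(1))
    return CertInfo(key_id_match.group(1) if key_id_match else None, valid_to)


def cert_status_exit_code(info: CertInfo, now: datetime) -> int:
    """Return 1 once the certificate has expired, else 0."""
    if info.valid_to is not None and now >= info.valid_to:
        return 1
    return 0
```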
---
## Not Relevant When
- All work is local (no remote services involved)
- Manually running `ssh -R` is acceptable
- No need for audit tracing of tunnel state changes
---
## Current State
- Status: active (v0.1 core complete; AccessManagementDirective alignment done — BRIDGE-WP-0004)
- Implementation: ~80% — CLI tunneling fully functional, MCP integration working, health
checks and audit logging complete; ActorType enum (adm/agt/atm) enforced; cert_command
mode implemented with TTL-aware refresh and cert_identity audit logging; OpsCatalog
framework present but not yet populated
- Stability: stable tunnel lifecycle; tested under network drops and SSH failures
- Usage: running in lab for daily Railiance/Temporal connectivity
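
The reconnect behaviour can be modelled as a small pure policy. This covers exponential backoff on failures, including `CertAcquisitionError`, and the TTL-aware refresh that restarts 5 minutes before cert expiry with no backoff and no attempt increment. `RetryPolicy`, its defaults, `plan_restart` and its reason strings are hypothetical. Only the 5-minute margin and the no-backoff rule for refreshes come from the BRIDGE-WP-0004 commit message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Pre-emptive refresh margin, as stated in the BRIDGE-WP-0004 commit message
CERT_REFRESH_MARGIN = timedelta(minutes=5)


@dataclass
class RetryPolicy:
    """Hypothetical retry policy; field names and defaults are illustrative."""
    base_delay: float = 1.0   # seconds before the first retry
    factor: float = 2.0       # exponential growth per attempt
    max_delay: float = 300.0  # cap on any single wait

    def delay(self, attempt: int) -> float:
        return min(self.base_delay * self.factor ** attempt, self.max_delay)


def cert_expiring(valid_to: Optional[datetime], now: datetime) -> bool:
    """True once inside the refresh margin; static key mode passes None."""
    return valid_to is not None and now >= valid_to - CERT_REFRESH_MARGIN


def plan_restart(reason: str, attempt: int, policy: RetryPolicy) -> Tuple[float, int]:
    """Return (seconds_to_wait, next_attempt) for a tunnel restart."""
    if reason == "cert_expiring":
        # Planned refresh: restart immediately and do not count it as a failure
        return 0.0, attempt
    # Disconnects and cert acquisition failures share the normal backoff loop
    return policy.delay(attempt), attempt + 1
```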
---
## How It Fits
- Upstream dependencies: SSH (system), OpenSSH server on remote hosts; optionally a CA (e.g. `ops-warden`) reached via `cert_command`
- Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach the local hub MCP; the activity-core Temporal server is reachable via a bridge tunnel
- Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)
---
## Terminology
- Preferred terms: tunnel, bridge, actor, actor_type, reconnect policy, health check,
cert_command, cert_identity
- Actor types: `adm` (human operator), `agt` (LLM agent), `atm` (deterministic automation)
- Also known as: "the bridge"
- Potentially confusing: "bridge state" is a tunnel-specific state machine
(stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
- Legacy terms (deprecated): `actor_class: human` (→ `adm`), `actor_class: automation` (→ `atm`)
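
The bridge state machine named above can be encoded as an explicit transition table. Only the edges in the documented chain are taken from this file. The `reconnecting -> starting` retry edge and the "any state -> stopped" edge (for `bridge down`) are assumptions. `BridgeState`, `transition` and `InvalidTransition` are hypothetical names.

```python
from enum import Enum


class BridgeState(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"


S = BridgeState

# Documented chain: stopped -> starting -> connected <-> degraded -> reconnecting
DOCUMENTED = {
    (S.STOPPED, S.STARTING),
    (S.STARTING, S.CONNECTED),
    (S.CONNECTED, S.DEGRADED),
    (S.DEGRADED, S.CONNECTED),
    (S.DEGRADED, S.RECONNECTING),
}
# ASSUMED edges: retry after reconnecting, and `bridge down` from any running state
ASSUMED = {(S.RECONNECTING, S.STARTING)} | {
    (s, S.STOPPED) for s in S if s is not S.STOPPED
}
ALLOWED = DOCUMENTED | ASSUMED


class InvalidTransition(ValueError):
    pass


def transition(current: BridgeState, target: BridgeState) -> BridgeState:
    """Return the new state, rejecting edges outside the table."""
    if (current, target) not in ALLOWED:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```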
---
## Related / Overlapping
- `the-custodian` — primary consumer; ops-bridge keeps remote agents connected to it
- `ops-warden` — optional upstream; owns CA and cert issuance; ops-bridge calls it via
`cert_command` when short-lived certificates are required
- `activity-core` — Temporal server on remote reached via ops-bridge tunnel
- `railiance-cluster` / `railiance-infra` — remote hosts that need to phone home; owns
host-side principal deployment (`/etc/ssh/auth_principals/`)
---
## Provided Capabilities
```capability
type: infrastructure
title: SSH reverse tunnel connectivity
description: Named, auto-reconnecting SSH reverse tunnels with health checks and audit logging — keeps remote execution environments continuously connected to the local Custodian State Hub.
keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge]
```
---
## Getting Oriented
- Start with: `README.txt` (architecture, config format, CLI commands, MCP integration)
- Key files / directories: `~/.config/bridge/tunnels.yaml` (tunnel config),
`~/.local/state/bridge/` (PID/state/cert files)
- Entry points: `bridge --help`; `bridge up <tunnel-name>`; `bridge cert-status`;
MCP: `bridge_status()`
- AccessManagementDirective context: `wiki/AccessManagementDirective.md`
- Workplans: BRIDGE-WP-0004 (directive alignment), WARDEN-WP-0001 (ops-warden bootstrap)