# SCOPE

> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete.

---

## One-liner

SSH reverse tunnel lifecycle manager: keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards.

---

## Core Idea

Claude Code sessions and the Custodian State Hub API both run locally. Remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. Ops-bridge manages named SSH reverse tunnels with auto-reconnect, health checks, and audit logging, and ships an MCP server so Claude Code can start, stop, and inspect tunnels as tools.

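To make the reverse-tunnel idea concrete, here is a minimal sketch of the kind of `ssh -R` invocation a tunnel manager could assemble. It is not taken from the ops-bridge source: the function name, the host name, the port numbers, and the chosen keepalive values are all assumptions for illustration. The flags themselves (`-N`, `-R`, `ExitOnForwardFailure`, `ServerAliveInterval`, `ServerAliveCountMax`) are standard OpenSSH options.

```python
import shlex


def reverse_tunnel_command(host: str, remote_port: int, local_port: int,
                           local_host: str = "localhost") -> list[str]:
    """Build an argv list for a reverse port-forward (hypothetical helper).

    The remote machine listens on remote_port and traffic is carried back
    to local_host:local_port on the machine that runs ssh.
    """
    return [
        "ssh",
        "-N",                              # forward only, run no remote command
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=15",    # keepalive probes to notice dead links
        "-o", "ServerAliveCountMax=3",     # give up after three missed probes
        "-R", f"{remote_port}:{local_host}:{local_port}",
        host,
    ]


# Hypothetical host and ports, for illustration only.
cmd = reverse_tunnel_command("worker-1.example", 8000, 8000)
print(shlex.join(cmd))
```

Returning an argv list rather than a shell string keeps the command safe to pass to `subprocess` without shell quoting concerns.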

---

## In Scope

- Named SSH reverse tunnel lifecycle (`bridge up/down/restart/status/logs`)
- Auto-reconnect with exponential backoff and configurable retry policy
- Optional HTTP health checks (confirm the forwarded service is actually reachable from the remote host)
- Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
- Actor attribution: per-tunnel actor class (human / automation) for audit traceability
- PID + state file management in `~/.local/state/bridge/`
- MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
- OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)

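Two of the items above, exponential backoff and JSON audit events, can be sketched in a few lines. This is an illustration under assumptions, not the ops-bridge implementation: the parameter names and defaults of `backoff_delays`, the field names in the audit record, and the tunnel name `hub-api` are all hypothetical. Only the event name `disconnected` and the actor classes come from the list above.

```python
import json
import random
from datetime import datetime, timezone


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_delay: float = 60.0, max_retries: int = 6,
                   jitter: float = 0.0):
    """Yield one delay in seconds per retry: base * factor**attempt, capped."""
    for attempt in range(max_retries):
        delay = min(max_delay, base * factor ** attempt)
        # Optional jitter adds up to jitter * delay extra seconds so that
        # many tunnels do not retry in lockstep. With jitter=0.0 nothing is added.
        yield delay + random.uniform(0.0, jitter * delay)


def audit_event(event: str, tunnel: str, actor_class: str) -> str:
    """Serialize one audit record as a single JSON line (assumed schema)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "tunnel": tunnel,
        "actor_class": actor_class,
    }
    return json.dumps(record, sort_keys=True)


print(list(backoff_delays(max_retries=8)))
print(audit_event("disconnected", "hub-api", "automation"))
```

Capping the delay keeps a long outage from pushing reconnect attempts arbitrarily far apart, and one JSON object per line keeps the log easy to filter with standard tools.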

---

## Out of Scope

- Identity/credential management (uses existing SSH keys)
- Long-running application hosting on remote machines (port-forward only, not deployment)
- VPN or layer-3 connectivity
- Monitoring/alerting beyond JSON audit logs
- Replacing SSH for general interactive access

---

## Relevant When

- Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
- You need an audit trail of which actor (human vs. automation) started or stopped tunnels
- Setting up a new machine in the Railiance ecosystem that must phone home to the hub
- Diagnosing connectivity issues between the local hub and remote services

---

## Not Relevant When

- All work is local (no remote services involved)
- Manually running `ssh -R` is acceptable
- No need for audit tracing of tunnel state changes

---

## Current State

- Status: experimental → active (v0.1 core complete; OpsCatalog planned but not yet shipped)
- Implementation: ~75%. CLI tunneling is fully functional, MCP integration works, and health checks and audit logging are complete; the OpsCatalog framework is present but not populated
- Stability: stable tunnel lifecycle; tested under network drops and SSH failures
- Usage: running in lab for daily Railiance/Temporal connectivity

---

## How It Fits

- Upstream dependencies: SSH (system), OpenSSH server on remote hosts
- Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach the local hub MCP; the activity-core Temporal server is reachable via a bridge tunnel
- Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)

---

## Terminology

- Preferred terms: tunnel, bridge, actor, actor_class, reconnect policy, health check
- Also known as: "the bridge"
- Potentially confusing terms: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge

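The "bridge state" arrows above can be written down as a small transition table. This is a sketch under assumptions rather than the code in `state.py`: the enum and function names are hypothetical, and only the transitions literally shown in the arrow diagram are encoded. The diagram does not say how a tunnel leaves `reconnecting` or how any state returns to `stopped`, so those paths are deliberately left out here.

```python
from enum import Enum


class BridgeState(str, Enum):
    """Tunnel states named in the terminology list (hypothetical enum)."""
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"


# Only the arrows from the diagram: stopped → starting → connected ↔ degraded → reconnecting.
TRANSITIONS = {
    BridgeState.STOPPED: {BridgeState.STARTING},
    BridgeState.STARTING: {BridgeState.CONNECTED},
    BridgeState.CONNECTED: {BridgeState.DEGRADED},
    BridgeState.DEGRADED: {BridgeState.CONNECTED, BridgeState.RECONNECTING},
    BridgeState.RECONNECTING: set(),  # exits are not specified in the diagram
}


def can_transition(current: BridgeState, target: BridgeState) -> bool:
    """Return True if the diagram contains an arrow from current to target."""
    return target in TRANSITIONS[current]


print(can_transition(BridgeState.CONNECTED, BridgeState.DEGRADED))
print(can_transition(BridgeState.STOPPED, BridgeState.CONNECTED))
```

Keeping the table as data makes it easy to extend once the missing return paths are confirmed against the real implementation.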

---

## Related / Overlapping Repositories

- `the-custodian`: primary consumer; ops-bridge keeps remote agents connected to it
- `activity-core`: Temporal server on a remote host, reached via an ops-bridge tunnel
- `railiance-cluster` / `railiance-infra`: remote hosts that need to phone home

---

## Provided Capabilities

```capability
type: infrastructure
title: SSH reverse tunnel connectivity
description: Named, auto-reconnecting SSH reverse tunnels with health checks and audit logging — keeps remote execution environments continuously connected to the local Custodian State Hub.
keywords: [ssh, tunnel, reverse-tunnel, connectivity, remote, bridge, ops-bridge]
```

---

## Getting Oriented

- Start with: `README.txt` (architecture, config format, CLI commands, MCP integration)
- Key files / directories: `~/.config/bridge/tunnels.yaml` (tunnel config), `~/.local/state/bridge/` (PID/state files)
- Entry points: `bridge --help`; `bridge up <tunnel-name>`; MCP: `bridge_status()`

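When poking around the PID/state directory, it helps to remember that a state file can still claim a tunnel is connected after its SSH process has died. The sketch below shows one way to check for that on POSIX systems. It is an assumption-laden illustration, not the project's code: the `<tunnel>.pid` file naming and all three function names are hypothetical, and only the directory `~/.local/state/bridge/` comes from the list above.

```python
import os
from pathlib import Path
from typing import Optional

STATE_DIR = Path.home() / ".local" / "state" / "bridge"


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(tunnel: str, state_dir: Path = STATE_DIR) -> Optional[int]:
    """Read the recorded PID, or None if the file is missing or malformed."""
    pid_file = state_dir / f"{tunnel}.pid"  # hypothetical file naming
    try:
        return int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def is_stale(tunnel: str, recorded_state: str,
             state_dir: Path = STATE_DIR) -> bool:
    """True when the state says connected but no live process backs it."""
    pid = read_pid(tunnel, state_dir)
    return recorded_state == "connected" and (pid is None or not pid_alive(pid))
```

A call such as `is_stale("hub-api", "connected")` (with a hypothetical tunnel name) would then flag a tunnel whose recorded state has outlived its process.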