diff --git a/SCOPE.md b/SCOPE.md
new file mode 100644
index 0000000..05fee18
--- /dev/null
+++ b/SCOPE.md
@@ -0,0 +1,98 @@
+# SCOPE
+
+> This file helps you quickly understand what this repository is about,
+> when it is relevant, and when it is not.
+> It is intentionally lightweight and may be incomplete.
+
+---
+
+## One-liner
+
+SSH reverse tunnel lifecycle manager — keeps remote execution environments continuously connected to the local Custodian State Hub via auto-reconnecting port-forwards.
+
+---
+
+## Core Idea
+
+Claude Code sessions run locally; the Custodian State Hub API runs locally. Remote machines (Railiance nodes, Temporal workers, Markitect services) need to reach the hub. Ops-bridge manages named SSH reverse tunnels with auto-reconnect, health checks, audit logging, and an MCP server so Claude Code can start/stop/inspect tunnels as tools.
+
+---
+
+## In Scope
+
+- Named SSH reverse tunnel lifecycle (`bridge up/down/restart/status/logs`)
+- Auto-reconnect with exponential backoff and configurable retry policy
+- Optional HTTP health checks (confirm forwarded service is actually reachable from remote)
+- Structured audit logging: JSON events (connected, disconnected, health_check_failed, etc.)
+- Actor attribution: per-tunnel actor class (human / automation) for audit traceability
+- PID + state file management in `~/.local/state/bridge/`
+- MCP server exposing tunnel lifecycle + OpsCatalog queries as Claude Code tools
+- OpsCatalog: optional Git-backed YAML catalog of infrastructure topology (domains/targets/bridges)
+
+---
+
+## Out of Scope
+
+- Identity/credential management (uses existing SSH keys)
+- Long-running application hosting on remote machines (port-forward only, not deployment)
+- VPN or layer-3 connectivity
+- Monitoring/alerting beyond JSON audit logs
+- Replacing SSH for general interactive access
+
+---
+
+## Relevant When
+
+- Remote Temporal workers or Railiance nodes need to reach the local Custodian MCP
+- Need audit trail of which actor (human vs. automation) started/stopped tunnels
+- Setting up a new machine in the Railiance ecosystem that must phone home to the hub
+- Diagnosing connectivity issues between local hub and remote services
+
+---
+
+## Not Relevant When
+
+- All work is local (no remote services involved)
+- Manually running `ssh -R` is acceptable
+- No need for audit tracing of tunnel state changes
+
+---
+
+## Current State
+
+- Status: experimental → active (v0.1 core complete; OpsCatalog planned but not yet shipped)
+- Implementation: ~75% — CLI tunneling fully functional, MCP integration working, health checks and audit logging complete; OpsCatalog framework present but not populated
+- Stability: stable tunnel lifecycle; tested under network drops and SSH failures
+- Usage: running in lab for daily Railiance/Temporal connectivity
+
+---
+
+## How It Fits
+
+- Upstream dependencies: SSH (system), OpenSSH server on remote hosts
+- Downstream consumers: all remote Claude Code agents depend on ops-bridge to reach local hub MCP; activity-core Temporal server reachable via bridge tunnel
+- Often used with: the-custodian (health checks point to hub API), activity-core (Temporal port-forwarding)
+
+---
+
+## Terminology
+
+- Preferred terms: tunnel, bridge, actor, actor_class, reconnect policy, health check
+- Also known as: "the bridge"
+- Potentially confusing terms: "bridge state" is a tunnel-specific state machine (stopped → starting → connected ↔ degraded → reconnecting), not a network bridge
+
+---
+
+## Related / Overlapping Repositories
+
+- `the-custodian` — primary consumer; ops-bridge keeps remote agents connected to it
+- `activity-core` — Temporal server on remote reached via ops-bridge tunnel
+- `railiance-cluster` / `railiance-infra` — remote hosts that need to phone home
+
+---
+
+## Getting Oriented
+
+- Start with: `README.txt` (architecture, config format, CLI commands, MCP integration)
+- Key files / directories: `~/.config/bridge/tunnels.yaml` (tunnel config), `~/.local/state/bridge/` (PID/state files)
+- Entry points: `bridge --help`; `bridge up `; MCP: `bridge_status()`
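
The Terminology entry describes "bridge state" as a small state machine, and In Scope lists auto-reconnect with exponential backoff. As a rough illustration only, the two ideas might be sketched in Python as follows. The names `BridgeState`, `ALLOWED`, `can_transition`, and `backoff_delay`, the default `base`/`factor`/`cap` values, and every transition not spelled out in the arrow diagram are assumptions made for this sketch, not taken from the ops-bridge code.

```python
from enum import Enum


class BridgeState(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"


# Edges from the SCOPE.md diagram:
#   stopped -> starting -> connected <-> degraded -> reconnecting
# Assumed (not in the diagram): reconnecting -> connected, and
# any active state -> stopped when a tunnel is brought down.
ALLOWED = {
    BridgeState.STOPPED: {BridgeState.STARTING},
    BridgeState.STARTING: {BridgeState.CONNECTED, BridgeState.STOPPED},
    BridgeState.CONNECTED: {BridgeState.DEGRADED, BridgeState.STOPPED},
    BridgeState.DEGRADED: {
        BridgeState.CONNECTED,
        BridgeState.RECONNECTING,
        BridgeState.STOPPED,
    },
    BridgeState.RECONNECTING: {BridgeState.CONNECTED, BridgeState.STOPPED},
}


def can_transition(src: BridgeState, dst: BridgeState) -> bool:
    """Return True if moving from src to dst is allowed by the table above."""
    return dst in ALLOWED[src]


def backoff_delay(attempt: int, base: float = 1.0,
                  factor: float = 2.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based), capped at `cap`."""
    return min(cap, base * factor ** attempt)
```

With these assumed defaults the retry delays grow as 1, 2, 4, 8 seconds and so on until they reach the 60-second cap. A real reconnect policy would likely add jitter and a maximum retry count; both are left out here to keep the sketch short.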