bridge restart now means blank-slate recovery: reverse tunnels run
should_cleanup_tunnel and clear orphan remote listeners before reconnecting;
healthy forwards are left running. Local-direction tunnels keep stop/start
only. CLI and MCP report per-tunnel actions (healthy, cleaned_and_restarted,
restarted, error) and exit non-zero on cleanup failure.
Closes BRIDGE-WP-0005.
Add bridge maintenance cleanup to detect reverse tunnels whose remote
port is bound but no longer forwards (zombie sshd sessions), kill the
stale listeners on the remote host, and optionally restart the tunnel.
Includes install-cron/uninstall-cron/show-cron helpers and README notes
for the actcore-state-hub-bridge failure mode we hit on railiance01.
Surfaces the actor naming rules (adm-/agt-/atm- prefixes, legacy class
aliases) so users hitting a ConfigError have an in-CLI way to read the
spec without grepping the wiki.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- ActorType enum (adm/agt/atm) replaces actor_class string; config validates
naming convention (adm-*/agt-*/atm-*) with hard ConfigError on mismatch;
legacy 'human'/'automation' values accepted with DeprecationWarning
- cert_command: pluggable shell string run before each SSH launch; cert written
to state dir; -i cert appended to SSH command alongside -i key
- TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart
5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged
- CertAcquisitionError: cert failures trigger normal backoff/retry loop
- cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event
- bridge cert-status: new CLI command; exit 1 on expired cert; --json flag
- 233 tests passing, ruff clean
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add --http flag to MCP server for SSE transport on port 8002
- Add make mcp-http / mcp-stop targets
- Pin fastmcp<3.1.0 to stabilize dependency
- Update session-protocol: Step 0 tunnel health check before orient
- Mark OPS-WP-0002 and all its tasks done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- diagnostics.py: TunnelCheckResult with SSH process liveness, port
probe, and optional API health check; check_tunnel / check_all_tunnels
- cli.py: bridge status shows LIVE column and [STALE] marker when state
says connected but PID is dead; bridge check wired to diagnostics
- state.py: read_raw_pid helper; _pid_alive exported for reuse
- capabilities.py: capabilities registry stubs
- mcp_server/server.py: expose check_tunnel and tunnel capabilities
over MCP
- SCOPE.md: rapid orientation document
- workplans/OPS-WP-0001-diagnostics.md: workplan backing this feature
- tests: 207 passing (test_cli, test_mcp, test_diagnostics)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously build_ssh_command only generated -R (reverse) tunnels.
The k3s API tunnel needs -L (local forward: workstation:16443 →
CoulombCore:6443) so kubectl can reach the cluster API directly.
- TunnelConfig.direction: "reverse" (default) | "local"
- config.py: parse direction from YAML, validate allowed values
- manager.py: choose -R or -L flag based on direction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the full BRIDGE-WP-0003 workplan: 188 tests passing, 0 lint errors.
## What's added
**Capability registry** (`src/bridge/capabilities.py`):
- 10 capabilities with required_access_modes (cli/mcp/skill)
- Single source of truth for what OpsBridge does and where
**MCP server** (`src/bridge/mcp_server/server.py`):
- 10 FastMCP tools: bridge_up/down/restart/status/logs + 5 catalog_* tools
- 3 resources: bridge://status, catalog://domains, catalog://targets
- `.mcp.json` for project-scope auto-registration
- `scripts/register_mcp.py` for user-scope machine-global registration
**Skill** (`~/.claude/plugins/ops-bridge/bridge-status.md`):
- /bridge-status: health table with emoji indicators + remediation advice
**Cross-mode test coverage enforcement**:
- `tests/conftest.py`: capability/access_mode marks + collect_capability_coverage()
- `tests/test_mcp.py`: 31 FastMCP in-process client tests (Client(mcp) pattern)
- `tests/test_skill.py`: static skill lint against capability registry
- `tests/test_coverage_completeness.py`: meta-test that fails if any required
(capability × mode) pair lacks a test; also validates CLI commands and MCP
tools are registered in the capability registry
**ADR** (`architecture/adr-001-cross-mode-capability-registry.md`):
- Documents the registry pattern and FastMCP 3.x testing approach
Key implementation note: FastMCP 3.x in-process results are in
result.content[0].text (JSON string), not result.data directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the OpsCatalog subsystem: a Git-backed YAML catalog of operations
domains, targets, bridges, and actor classes. Includes catalog loader,
cross-reference validator, bridge resolver (inline-first, catalog
fallback), and new CLI commands: `bridge targets`, `bridge targets show`,
`bridge catalog list/validate/show`. Updates `up/down/restart` to resolve
bridge names from the catalog when not defined inline. 142 tests, all green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Full TDD implementation of the `bridge` CLI tool covering all phases
from BRIDGE-WP-0001: project scaffolding, config loading, state
management, audit logging, health checks, tunnel lifecycle manager, and
all CLI commands (up/down/restart/status/logs). 77 tests, all green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>