bridge restart now means blank-slate recovery: reverse tunnels run
should_cleanup_tunnel and clear orphan remote listeners before reconnecting;
healthy forwards are left running. Local-direction tunnels keep the plain
stop/start behavior. CLI and MCP report per-tunnel actions (healthy,
cleaned_and_restarted, restarted, error) and exit non-zero on cleanup failure.
Closes BRIDGE-WP-0005.
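A minimal sketch of how the per-tunnel action could be chosen, based only on the description above. `TunnelProbe`, `plan_restart`, and `exit_code` are hypothetical names for illustration, not the project's API; the real should_cleanup_tunnel logic may differ.

```python
from dataclasses import dataclass
from enum import Enum


class RestartAction(str, Enum):
    # The four per-tunnel actions named in the commit message.
    HEALTHY = "healthy"
    CLEANED_AND_RESTARTED = "cleaned_and_restarted"
    RESTARTED = "restarted"
    ERROR = "error"


@dataclass(frozen=True)
class TunnelProbe:
    # Hypothetical snapshot of one tunnel; all field names are assumptions.
    direction: str            # "reverse" or "local"
    forwarding_ok: bool       # the forward currently works end to end
    remote_port_bound: bool   # something is listening on the remote port
    cleanup_ok: bool = True   # whether removing the orphan listener succeeded


def plan_restart(probe: TunnelProbe) -> RestartAction:
    """Pick the action a blank-slate restart would report for one tunnel."""
    if probe.direction == "local":
        # Local-direction tunnels get a plain stop/start, no cleanup step.
        return RestartAction.RESTARTED
    if probe.forwarding_ok:
        # Healthy reverse forwards are left running.
        return RestartAction.HEALTHY
    if probe.remote_port_bound:
        # Orphan listener: clear it first, then reconnect.
        if probe.cleanup_ok:
            return RestartAction.CLEANED_AND_RESTARTED
        return RestartAction.ERROR
    return RestartAction.RESTARTED


def exit_code(actions) -> int:
    """Non-zero when any tunnel ended in the error state."""
    return 1 if RestartAction.ERROR in list(actions) else 0
```

Keeping the decision in a pure function makes it easy to report the same action names from both the CLI and the MCP tool.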
Add workplan to make bridge restart perform conditional stale-forward
cleanup before start (blank-slate recovery). Refines topology for laptop
workstation origin, intermittently offline haskelseed, and stable VPS
remotes (coulombcore, railiance01). Origin: STATE-WP-0063 tunnel incident.
Registered in State Hub via fix-consistency.
Add bridge maintenance cleanup to detect reverse tunnels whose remote
port is bound but no longer forwards (zombie sshd sessions), kill the
stale listeners on the remote host, and optionally restart the tunnel.
Includes install-cron/uninstall-cron/show-cron helpers and README notes
for the actcore-state-hub-bridge failure mode we hit on railiance01.
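The "bound but no longer forwards" condition can be sketched as a two-step check. This is an illustration under assumptions: `port_is_bound` and `classify_listener` are hypothetical helpers, and because reverse-forward ports are typically bound on the remote loopback interface, a probe like this would have to run on the remote host (or through SSH) rather than from the workstation.

```python
import socket


def port_is_bound(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connect to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as not bound.
        return False


def classify_listener(bound: bool, forwards: bool) -> str:
    """Classify a reverse-tunnel port from two independent observations.

    bound:    a listener accepts connections on the remote port
    forwards: traffic through that port actually reaches the local service
    """
    if not bound:
        return "absent"
    # A listener that accepts connections but relays nothing is the
    # zombie case described above.
    return "healthy" if forwards else "zombie"
```

Only the "zombie" outcome would justify killing the remote listener; an "absent" port just needs a normal tunnel start.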
Surfaces the actor naming rules (adm-/agt-/atm- prefixes, legacy class
aliases) so users hitting a ConfigError have an in-CLI way to read the
spec without grepping the wiki.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
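The naming rules can be illustrated with a small validator. This is a sketch, not the project's code: the `ConfigError` class here is a stand-in, and the mapping of the legacy aliases ('human' to adm, 'automation' to atm) is an assumption made for the example.

```python
import warnings
from enum import Enum


class ConfigError(ValueError):
    """Stand-in for the project's ConfigError."""


class ActorType(str, Enum):
    ADM = "adm"
    AGT = "agt"
    ATM = "atm"


# ASSUMPTION: which ActorType each legacy class maps to is not stated in the log.
LEGACY_ALIASES = {"human": ActorType.ADM, "automation": ActorType.ATM}


def parse_actor_type(value: str) -> ActorType:
    """Accept adm/agt/atm, or a legacy alias with a DeprecationWarning."""
    if value in LEGACY_ALIASES:
        warnings.warn(
            f"actor class {value!r} is deprecated; use adm/agt/atm",
            DeprecationWarning,
            stacklevel=2,
        )
        return LEGACY_ALIASES[value]
    try:
        return ActorType(value)
    except ValueError:
        raise ConfigError(f"unknown actor type: {value!r}") from None


def validate_actor_name(name: str, actor_type: ActorType) -> None:
    """Raise ConfigError unless the name carries the type's prefix."""
    prefix = f"{actor_type.value}-"
    if not name.startswith(prefix) or len(name) == len(prefix):
        raise ConfigError(f"actor name {name!r} must start with {prefix!r}")
```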
- ActorType enum (adm/agt/atm) replaces actor_class string; config validates
naming convention (adm-*/agt-*/atm-*) with hard ConfigError on mismatch;
legacy 'human'/'automation' values accepted with DeprecationWarning
- cert_command: pluggable shell string run before each SSH launch; cert written
to state dir; -i cert appended to SSH command alongside -i key
- TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart
5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged
- CertAcquisitionError: cert failures trigger normal backoff/retry loop
- cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event
- bridge cert-status: new CLI command; exit 1 on expired cert; --json flag
- 233 tests passing, ruff clean
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
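A sketch of the TTL-aware refresh check described above, assuming the usual `ssh-keygen -L` text layout (`Valid: from <start> to <end>`, `Valid: before <end>`, or `Valid: forever`, plus a quoted `Key ID:` line). The function names are hypothetical; only the 5-minute margin comes from the commit message.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches "Valid: from A to B" and "Valid: before B"; captures the end time.
_VALID_TO_RE = re.compile(r"^\s*Valid: (?:from \S+ to|before) (\S+)\s*$", re.MULTILINE)
_KEY_ID_RE = re.compile(r'^\s*Key ID: "(.*)"\s*$', re.MULTILINE)

REFRESH_MARGIN = timedelta(minutes=5)


def parse_valid_to(keygen_output: str) -> Optional[datetime]:
    """Return the certificate end time, or None if it never expires or is absent."""
    match = _VALID_TO_RE.search(keygen_output)
    if match is None:
        return None
    # ssh-keygen prints local time without a zone, so this is a naive datetime.
    return datetime.fromisoformat(match.group(1))


def parse_key_id(keygen_output: str) -> Optional[str]:
    """Return the certificate Key ID, if present."""
    match = _KEY_ID_RE.search(keygen_output)
    return match.group(1) if match else None


def needs_refresh(valid_to: Optional[datetime], now: datetime) -> bool:
    """True once we are inside the pre-emptive restart window."""
    if valid_to is None:
        return False
    return now >= valid_to - REFRESH_MARGIN
```

In practice the text would come from something like `subprocess.run(["ssh-keygen", "-L", "-f", cert_path], ...)`, with `cert_path` pointing at the cert written to the state dir.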
Both workplans had been registered as active workstreams, but their tasks
were never ingested: the markdown checkbox format was invisible to the
consistency checker, which requires task code blocks. Activated both
workplans (draft→active) and added task blocks with state_hub_task_id
for all 19 tasks (9 + 10).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Expands architecture constraints and SCOPE.md to reflect the three-actor
vocabulary (adm/agt/atm), two credential modes (static key + cert_command),
and ops-warden boundary. Adds directive wiki doc and two new workplans
(BRIDGE-WP-0004 directive alignment, WARDEN-WP-0001 ops-warden bootstrap).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add --http flag to MCP server for SSE transport on port 8002
- Add make mcp-http / mcp-stop targets
- Pin fastmcp<3.1.0 to stabilize dependency
- Update session-protocol: Step 0 tunnel health check before orient
- Mark OPS-WP-0002 and all its tasks done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Plan to make ops-bridge fully usable by worker agents:
- T01: SSE transport mode + make mcp-http target
- T02: register in ~/.claude.json at user scope
- T03: /bridge global slash command skill
- T04: worker agent bridge protocol in global CLAUDE.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- diagnostics.py: TunnelCheckResult with SSH process liveness, port
probe, and optional API health check; check_tunnel / check_all_tunnels
- cli.py: bridge status shows LIVE column and [STALE] marker when state
says connected but PID is dead; bridge check wired to diagnostics
- state.py: read_raw_pid helper; _pid_alive exported for reuse
- capabilities.py: capabilities registry stubs
- mcp_server/server.py: expose check_tunnel and tunnel capabilities
over MCP
- SCOPE.md: rapid orientation document
- workplans/OPS-WP-0001-diagnostics.md: workplan backing this feature
- tests: 207 passing (test_cli, test_mcp, test_diagnostics)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
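The [STALE] rule (state says connected, but the PID is dead) can be sketched as below. This is an illustration only: `pid_alive` and `live_marker` are hypothetical stand-ins for the project's `_pid_alive` and status rendering, and the signal-0 probe is POSIX-only.

```python
import os
from typing import Optional


def pid_alive(pid: int) -> bool:
    """POSIX-only liveness probe: signal 0 checks existence without signalling."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def live_marker(state: str, pid: Optional[int]) -> str:
    """Value for a LIVE column: yes, no, or [STALE]."""
    alive = pid is not None and pid_alive(pid)
    if state == "connected" and not alive:
        # Recorded state and the actual process disagree.
        return "[STALE]"
    return "yes" if alive else "no"
```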
Previously build_ssh_command only generated -R (reverse) tunnels.
The k3s API tunnel needs -L (local forward: workstation:16443 →
CoulombCore:6443) so kubectl can reach the cluster API directly.
- TunnelConfig.direction: "reverse" (default) | "local"
- config.py: parse direction from YAML, validate allowed values
- manager.py: choose -R or -L flag based on direction
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
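A minimal sketch of the flag selection, assuming a simplified config shape. `TunnelSpec` and `forward_args` are hypothetical names, not the actual `TunnelConfig` or `build_ssh_command`; only the `-R`/`-L` choice and the "reverse" default come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass

ALLOWED_DIRECTIONS = ("reverse", "local")


@dataclass(frozen=True)
class TunnelSpec:
    listen_port: int            # port opened on the listening side
    target_host: str            # host the traffic is delivered to
    target_port: int            # port on the target host
    direction: str = "reverse"  # "reverse" (default) or "local"


def forward_args(spec: TunnelSpec) -> list[str]:
    """Return the ssh forwarding arguments for one tunnel."""
    if spec.direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"direction must be one of {ALLOWED_DIRECTIONS}")
    # -R listens on the remote host; -L listens on the local workstation.
    flag = "-R" if spec.direction == "reverse" else "-L"
    return [flag, f"{spec.listen_port}:{spec.target_host}:{spec.target_port}"]
```

For the k3s case, `forward_args(TunnelSpec(16443, "localhost", 6443, "local"))` yields `["-L", "16443:localhost:6443"]`, where `localhost` is resolved on the SSH server side.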
Each concern (identity, session protocol, workplan convention, stack,
architecture, repo boundary) now lives in its own file with a single
responsibility. CLAUDE.md becomes a thin @-import integrator. Removes
Ralph Loop duplication; the global ~/.claude/CLAUDE.md remains authoritative.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Document ClientAliveInterval/ClientAliveCountMax requirement on remote
sshd to prevent stale sessions holding ports after reconnect. Document
fail2ban ignoreip setup. Clarify that health_check.url must be a local
port (not the remote forwarded port), and that SSE endpoints block the
health checker.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the full BRIDGE-WP-0003 workplan: 188 tests passing, 0 lint errors.
## What's added
**Capability registry** (`src/bridge/capabilities.py`):
- 10 capabilities with required_access_modes (cli/mcp/skill)
- Single source of truth for what OpsBridge does and where
**MCP server** (`src/bridge/mcp_server/server.py`):
- 10 FastMCP tools: bridge_up/down/restart/status/logs + 5 catalog_* tools
- 3 resources: bridge://status, catalog://domains, catalog://targets
- `.mcp.json` for project-scope auto-registration
- `scripts/register_mcp.py` for user-scope machine-global registration
**Skill** (`~/.claude/plugins/ops-bridge/bridge-status.md`):
- /bridge-status: health table with emoji indicators + remediation advice
**Cross-mode test coverage enforcement**:
- `tests/conftest.py`: capability/access_mode marks + collect_capability_coverage()
- `tests/test_mcp.py`: 31 FastMCP in-process client tests (Client(mcp) pattern)
- `tests/test_skill.py`: static skill lint against capability registry
- `tests/test_coverage_completeness.py`: meta-test that fails if any required
(capability × mode) pair lacks a test; also validates CLI commands and MCP
tools are registered in the capability registry
**ADR** (`architecture/adr-001-cross-mode-capability-registry.md`):
- Documents the registry pattern and FastMCP 3.x testing approach
Key implementation note: FastMCP 3.x in-process results are in
result.content[0].text (JSON string), not result.data directly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
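The implementation note above suggests a small decoding helper for tests. The sketch below uses a `SimpleNamespace` stand-in instead of a real FastMCP result so it runs without the library; `tool_payload` is a hypothetical name, and the result shape is taken from the note rather than verified against FastMCP.

```python
import json
from types import SimpleNamespace


def tool_payload(result):
    """Decode the JSON string carried in the first content item of a tool result."""
    return json.loads(result.content[0].text)


# Stand-in mimicking the shape described in the note (not a real FastMCP object).
fake_result = SimpleNamespace(
    content=[SimpleNamespace(text='{"name": "demo", "status": "connected"}')]
)
payload = tool_payload(fake_result)
```

Centralizing the decode in one helper keeps individual tests short if the result shape changes in a later FastMCP release.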
Defines the FastMCP server, /bridge-status skill, capability registry,
and self-validating cross-access-mode test suite for ops-bridge.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
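The self-validating idea (fail when a required capability × mode pair has no test) reduces to a set difference. A sketch with hypothetical names and made-up registry data; the real registry and `collect_capability_coverage()` may be shaped differently.

```python
from __future__ import annotations


def missing_pairs(
    required: dict[str, set[str]],
    covered: set[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return every required (capability, mode) pair that no test covers."""
    return sorted(
        (capability, mode)
        for capability, modes in required.items()
        for mode in modes
        if (capability, mode) not in covered
    )


# Illustrative data only; capability names here are assumptions.
required = {
    "bridge_up": {"cli", "mcp"},
    "bridge_status": {"cli", "mcp", "skill"},
}
covered = {
    ("bridge_up", "cli"),
    ("bridge_up", "mcp"),
    ("bridge_status", "cli"),
}
gaps = missing_pairs(required, covered)
```

A meta-test would then assert that `gaps` is empty and print the list on failure, so adding a capability without tests breaks the build.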
All 39 tasks marked done; both workstreams updated to completed status
in the State Hub and workplan files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds the OpsCatalog subsystem: a Git-backed YAML catalog of operations
domains, targets, bridges, and actor classes. Includes catalog loader,
cross-reference validator, bridge resolver (inline-first, catalog
fallback), and new CLI commands: `bridge targets`, `bridge targets show`,
`bridge catalog list/validate/show`. Updates `up/down/restart` to resolve
bridge names from the catalog when not defined inline. 142 tests, all green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
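The inline-first, catalog-fallback lookup can be sketched as follows. `resolve_bridge` and the dict-based inputs are assumptions for illustration; the actual resolver works on the loaded config and catalog objects.

```python
from __future__ import annotations

from typing import Any, Mapping


def resolve_bridge(
    name: str,
    inline: Mapping[str, Any],
    catalog: Mapping[str, Any],
) -> tuple[Any, str]:
    """Return (definition, source), preferring inline config over the catalog."""
    if name in inline:
        return inline[name], "inline"
    if name in catalog:
        return catalog[name], "catalog"
    raise KeyError(f"bridge {name!r} not found inline or in the catalog")
```

Returning the source alongside the definition lets commands such as `up/down/restart` report where a bridge name was resolved from.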
Full TDD implementation of the `bridge` CLI tool covering all phases
from BRIDGE-WP-0001: project scaffolding, config loading, state
management, audit logging, health checks, tunnel lifecycle manager, and
all CLI commands (up/down/restart/status/logs). 77 tests, all green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Expand CLAUDE.md with dev commands, architecture overview, and required prefix
- Add workplans/BRIDGE-WP-0001-initial-implementation.md: 8-phase implementation
plan covering FRS FR-1 to FR-26 (23 tasks registered in Custodian State Hub,
workstream bridge-wp-0001)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>