ops-bridge/workplans/OPS-WP-0001-diagnostics.md
tegwick a55c685f89 feat(diagnostics): end-to-end tunnel check, stale state detection, MCP extensions
- diagnostics.py: TunnelCheckResult with SSH process liveness, port
  probe, and optional API health check; check_tunnel / check_all_tunnels
- cli.py: bridge status shows LIVE column and [STALE] marker when state
  says connected but PID is dead; bridge check wired to diagnostics
- state.py: read_raw_pid helper; _pid_alive exported for reuse
- capabilities.py: capabilities registry stubs
- mcp_server/server.py: expose check_tunnel and tunnel capabilities
  over MCP
- SCOPE.md: rapid orientation document
- workplans/OPS-WP-0001-diagnostics.md: workplan backing this feature
- tests: 207 passing (test_cli, test_mcp, test_diagnostics)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 15:07:47 +01:00

id: OPS-WP-0001
type: workplan
title: ops-bridge diagnostics and flow improvements
domain: custodian
repo: ops-bridge
status: done
owner: claude
topic_slug: custodian
created: 2026-03-20
updated: 2026-03-20
state_hub_workstream_id: 6726cea2-447a-40b2-b0a0-edf495f07942

OPS-WP-0001 — ops-bridge diagnostics and flow improvements

Scope: Add a bridge check end-to-end diagnostics command, fix bridge status to surface PID liveness and flag stale state, add a bridge_check MCP tool, and wire Makefile convenience targets in state-hub.

Context: During a session, bridge status reported "connected" although reverse port forwarding was not active. The cause was stale .state files written by the daemon: the status command does not verify that the SSH process is alive or that the remote port is actually listening.


Task: Add read_raw_pid() to StateManager

id: OPS-WP-0001-T01
status: done
priority: high
state_hub_task_id: "05e98e85-699a-4982-bb3e-8f2538cde2c7"

Add read_raw_pid(name) to src/bridge/state.py — reads PID from file without liveness check. Existing read_pid() (which also checks liveness) stays unchanged.
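
The split between the two readers can be sketched as follows. This is a minimal illustration, not the repository's code: the `<state_dir>/<name>.pid` file layout, the constructor argument, and the method bodies are all assumptions. The liveness probe uses POSIX signal 0 and is not safe on Windows.

```python
import os
from pathlib import Path
from typing import Optional


def _pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)  # signal 0 performs error checking only
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


class StateManager:
    def __init__(self, state_dir: Path) -> None:
        # Hypothetical layout: one "<name>.pid" file per tunnel.
        self.state_dir = Path(state_dir)

    def _pid_path(self, name: str) -> Path:
        return self.state_dir / f"{name}.pid"

    def read_raw_pid(self, name: str) -> Optional[int]:
        """Return the PID recorded on disk, without checking liveness."""
        try:
            pid = int(self._pid_path(name).read_text().strip())
        except (FileNotFoundError, ValueError):
            return None
        return pid if pid > 0 else None

    def read_pid(self, name: str) -> Optional[int]:
        """Return the recorded PID only if that process is still alive."""
        pid = self.read_raw_pid(name)
        if pid is not None and _pid_alive(pid):
            return pid
        return None
```

With both readers available, a caller can tell "no PID was ever recorded" apart from "a PID is recorded but the process is gone", which is the distinction stale-state detection relies on.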


Task: Create src/bridge/diagnostics.py

id: OPS-WP-0001-T02
status: done
priority: high
state_hub_task_id: "b68d7b1e-850b-469a-9de2-8b5d3d1f1c05"

New module with TunnelCheckResult dataclass (ssh_process, pid, remote_port, local_api, latency_ms, stale_state, ok property) and check_tunnel() / check_all_tunnels() functions. SSH probe via subprocess; optional httpx health check.
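
One possible shape for the result type is sketched below. The field names come from the task description; the field types, the defaults, and the exact rule behind `ok` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TunnelCheckResult:
    ssh_process: bool                   # is the SSH process alive?
    pid: Optional[int]                  # PID recorded in the state file, if any
    remote_port: Optional[bool]         # True/False if probed, None if not probed
    local_api: Optional[bool] = None    # None means the optional health check was skipped
    latency_ms: Optional[float] = None  # health-check round trip, if measured
    stale_state: bool = False           # state says connected but the PID is dead

    @property
    def ok(self) -> bool:
        # Assumed rule: process alive, remote port confirmed listening,
        # health check not failed (skipped is acceptable), state not stale.
        return (
            self.ssh_process
            and self.remote_port is True
            and self.local_api is not False
            and not self.stale_state
        )


healthy = TunnelCheckResult(ssh_process=True, pid=4242, remote_port=True)
stale = TunnelCheckResult(ssh_process=False, pid=4242, remote_port=None, stale_state=True)
```

Treating a skipped health check (`local_api is None`) as acceptable keeps the API probe genuinely optional. A stricter variant could require it to be `True`.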


Task: Fix bridge status and add bridge check to CLI

id: OPS-WP-0001-T03
status: done
priority: high
state_hub_task_id: "e87c6c5d-170c-4af3-905c-a48fae2edbe5"

Fix status to show live PID liveness (LIVE column) and flag stale state. Add check command with --json flag; exit 1 if any tunnel not ok. Add _print_check_table helper.
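
The exit-code contract of the check command can be sketched independently of the real CLI framework. The example below uses `argparse` purely for illustration. The positional `tunnel` argument, the `run_check` name, and the `{"name": {"ok": bool}}` result shape are assumptions, and the injected `checker` callable stands in for the diagnostics module.

```python
import argparse
import json
from typing import Callable, Dict, List, Optional

# Hypothetical result shape: tunnel name -> {"ok": bool, ...}
Results = Dict[str, dict]


def run_check(argv: List[str], checker: Callable[[Optional[str]], Results]) -> int:
    """Parse arguments, print results, and return the process exit code."""
    parser = argparse.ArgumentParser(prog="bridge check")
    parser.add_argument("tunnel", nargs="?", default=None,
                        help="check only this tunnel (default: all)")
    parser.add_argument("--json", action="store_true", dest="as_json",
                        help="emit machine-readable output")
    args = parser.parse_args(argv)

    results = checker(args.tunnel)
    if args.as_json:
        print(json.dumps(results, indent=2, sort_keys=True))
    else:
        for name, res in sorted(results.items()):
            print(f"{name:<20} {'ok' if res['ok'] else 'FAIL'}")

    # Exit 1 if any tunnel is not ok, otherwise 0.
    return 0 if all(r["ok"] for r in results.values()) else 1
```

Returning the exit code instead of calling `sys.exit` inside the function keeps the command easy to test. An entry point would wrap it as `sys.exit(run_check(sys.argv[1:], checker))`.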


Task: Add bridge_check MCP tool and bridge://check resource

id: OPS-WP-0001-T04
status: done
priority: high
state_hub_task_id: "7e97c112-20e2-4e2e-b853-53b10998392b"

Add bridge_check(tunnel?) tool and bridge://check resource to src/bridge/mcp_server/server.py.


Task: Register bridge_check capability

id: OPS-WP-0001-T05
status: done
priority: high
state_hub_task_id: "c69fc748-a706-46db-a4d5-30d60222452b"

Add bridge_check entry to src/bridge/capabilities.py with required_access_modes=frozenset({"cli", "mcp"}).
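
A registry entry of this kind might look like the sketch below. Only the `bridge_check` name and the `required_access_modes` value come from the task. The `Capability` class, the `CAPABILITIES` dict, the `register` helper, and the description text are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Capability:
    name: str
    description: str
    required_access_modes: FrozenSet[str]


# Hypothetical module-level registry keyed by capability name.
CAPABILITIES: Dict[str, Capability] = {}


def register(cap: Capability) -> Capability:
    """Add a capability to the registry, rejecting duplicate names."""
    if cap.name in CAPABILITIES:
        raise ValueError(f"capability already registered: {cap.name}")
    CAPABILITIES[cap.name] = cap
    return cap


register(Capability(
    name="bridge_check",
    description="End-to-end tunnel diagnostics",
    required_access_modes=frozenset({"cli", "mcp"}),
))
```

Using a `frozenset` makes the access-mode requirement immutable and hashable, so a test suite can compare it against the set of modes its tests actually cover.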


Task: Write tests/test_diagnostics.py

id: OPS-WP-0001-T06
status: done
priority: high
state_hub_task_id: "070ed088-74a6-48d3-81cf-739c2a2fd21b"

Unit tests: test_no_pid, test_pid_dead, test_pid_alive_port_listening, test_pid_alive_port_closed, test_ssh_timeout.


Task: Add TestCheckCommand to tests/test_cli.py

id: OPS-WP-0001-T07
status: done
priority: high
state_hub_task_id: "aae5ddc5-f823-4647-a536-8604ddb97946"

Tests: test_check_help, test_check_all_pass (marked capability+mode), test_check_any_fail, test_check_json_flag, test_check_specific_tunnel.


Task: Add TestMcpBridgeCheck to tests/test_mcp.py

id: OPS-WP-0001-T08
status: done
priority: high
state_hub_task_id: "ed492a3d-7a5f-465e-8cc3-d2f992f5462c"

Test: test_bridge_check_tool marked capability("bridge_check") + access_mode("mcp").


Task: Add tunnels targets to state-hub Makefile

id: OPS-WP-0001-T09
status: done
priority: medium
state_hub_task_id: "a3c77062-cff5-40e3-936c-b210b05f8839"

Add tunnels-up, tunnels-status, tunnels-check targets delegating to bridge. Add to .PHONY line.


Task: Run test suite and verify

id: OPS-WP-0001-T10
status: done
priority: high
state_hub_task_id: "e42de76c-fab7-4924-8929-38fa9eaca478"

cd /home/worsch/ops-bridge && uv run pytest tests/ -v — all tests green.