diff --git a/workplans/OPS-WP-0002-agent-usability.md b/workplans/OPS-WP-0002-agent-usability.md
new file mode 100644
index 0000000..7b0d11c
--- /dev/null
+++ b/workplans/OPS-WP-0002-agent-usability.md
@@ -0,0 +1,216 @@
+---
+id: OPS-WP-0002
+type: workplan
+title: "Agent Usability — MCP Registration, Skill, and Worker Orientation"
+domain: custodian
+repo: ops-bridge
+status: active
+owner: custodian
+topic_slug: custodian
+created: "2026-03-21"
+updated: "2026-03-21"
+depends_on: OPS-WP-0001
+---
+
+# OPS-WP-0002 — Agent Usability: MCP Registration, Skill, and Worker Orientation
+
+## Problem
+
+The ops-bridge MCP server (`src/bridge/mcp_server/server.py`) is fully
+implemented with tools for `bridge_up/down/restart/status/check/logs` and
+catalog operations. But no agent can use it because:
+
+1. **Not registered** — the server isn't in `~/.claude.json` and has no
+   persistent transport mode. It only runs on stdio today.
+2. **No slash command** — agents working ad-hoc (not via MCP) have no
+   quick way to check or restore tunnels.
+3. **No worker orientation** — agents on remote machines (CoulombCore,
+   Railiance) don't know that bridge is available or how to use it when
+   their state-hub connection drops.
+
+## Goal
+
+Any agent — on the workstation or a remote machine — can:
+- Check tunnel health in one call
+- Bring up a dropped tunnel without manual intervention
+- Recover the state-hub connection if it goes down mid-session
+
+## Design
+
+### MCP server (workstation, persistent)
+
+Run as an SSE service on port 8002 (same pattern as state-hub on 8001).
+Registered at user scope in `~/.claude.json` so it's available to all
+Claude Code sessions.
+
+The SSE transport is already supported by FastMCP — just change the
+`mcp.run()` call to accept an `--http` flag or read a `BRIDGE_MCP_PORT`
+env var.
+
+### Slash command skill (all machines)
+
+A `/bridge` skill at `~/.claude/commands/bridge.md` (global scope) that:
+- Reads `bridge status` output
+- Surfaces any tunnel that is down or stale
+- Offers to bring it up
+- Useful on machines that don't have the MCP server registered
+
+### Worker agent orientation (remote machines)
+
+Update `CLAUDE.md` (global) and `ops-bridge` session protocol to tell
+worker agents:
+- Check `bridge status` at session start when on a machine with
+  ops-bridge installed
+- If state-hub tunnel is down: run `bridge up state-hub-` to
+  restore it before making any state-hub API calls
+- If no bridge command: fall back to direct API URL if reachable
+
+---
+
+## Tasks
+
+### T01 — SSE transport mode for MCP server
+
+```task
+id: OPS-WP-0002-T01
+status: todo
+priority: high
+```
+
+Add `--http` flag and `BRIDGE_MCP_PORT` env var to `server.py` entry
+point. When `--http` is set, run `mcp.run(transport="sse", port=PORT)`
+instead of stdio.
+
+Add `make mcp-http` target to `Makefile`:
+```makefile
+mcp-http: ## Start MCP server in SSE mode (default port 8002)
+	BRIDGE_MCP_PORT=$${BRIDGE_MCP_PORT:-8002} uv run python src/bridge/mcp_server/server.py --http
+```
+
+Add `make mcp-stop` target that kills any running MCP server on port
+8002.
+
+Gate: `bridge_status()` tool callable via SSE on localhost:8002 after
+`make mcp-http`.
+
+---
+
+### T02 — Register MCP server in ~/.claude.json
+
+```task
+id: OPS-WP-0002-T02
+status: todo
+priority: high
+```
+
+Register the ops-bridge MCP server at user scope:
+```bash
+claude mcp add-json -s user ops-bridge \
+  '{"type":"sse","url":"http://127.0.0.1:8002/sse"}'
+```
+
+Document in `ops-bridge` CLAUDE.md:
+```
+To start the MCP server:
+  cd ~/ops-bridge && make mcp-http
+
+To verify registration:
+  python3 -c "import json,os; d=json.load(open(os.path.expanduser('~/.claude.json'))); print(list(d.get('mcpServers',{}).keys()))"
+```
+
+Update global `~/.claude/CLAUDE.md` to list `ops-bridge` MCP server
+alongside `state-hub`.
+
+Gate: `ops-bridge` appears in Claude Code MCP tool list after `make
+mcp-http`.
+
+---
+
+### T03 — `/bridge` slash command skill
+
+```task
+id: OPS-WP-0002-T03
+status: todo
+priority: medium
+```
+
+Create `~/.claude/commands/bridge.md` — a global Claude Code skill for
+tunnel management.
+
+**Behaviour:**
+1. Run `bridge status` and parse output
+2. Report each tunnel: name, state, LIVE column
+3. For any tunnel that is `stopped`, `reconnecting`, or `[STALE]`:
+   - Offer to run `bridge up `
+   - After `bridge up`, re-check with `bridge check `
+4. If all tunnels are `connected` and LIVE: report green and exit
+
+**Skill definition:**
+```yaml
+---
+description: >
+  Check ops-bridge tunnel health and restore any dropped tunnels.
+  Reports status of all configured tunnels and offers to bring up
+  any that are stopped or stale.
+argument-hint: "[tunnel-name]"
+allowed-tools:
+  - Bash(bridge status)
+  - Bash(bridge up*)
+  - Bash(bridge down*)
+  - Bash(bridge check*)
+  - Bash(bridge logs*)
+---
+```
+
+If an optional tunnel name is passed as `$ARGUMENTS`, scope all
+operations to that tunnel only.
+
+Gate: `/bridge` skill runs cleanly when all tunnels are up; correctly
+identifies and recovers a manually-stopped tunnel.
+
+---
+
+### T04 — Worker agent orientation in CLAUDE.md
+
+```task
+id: OPS-WP-0002-T04
+status: todo
+priority: medium
+```
+
+Update global `~/.claude/CLAUDE.md` — add a **Worker Agent — Bridge
+Protocol** section:
+
+```markdown
+## Worker Agent — Bridge Protocol
+
+When working on a remote machine (CoulombCore, Railiance nodes):
+
+1. At session start, check if `bridge` is installed:
+   `which bridge && bridge status`
+2. If state-hub tunnel is down: `bridge up state-hub-`
+   Wait for state `connected` before making state-hub API calls.
+3. If `bridge` is not installed, check if the state-hub API is directly
+   reachable: `curl -s http://127.0.0.1:8000/state/health`
+4. Only proceed without state-hub if absolutely necessary — log a
+   progress note about the outage when connectivity is restored.
+```
+
+Also add a one-liner reminder to the ops-bridge session protocol in
+`.claude/rules/session-protocol.md`:
+> At session start: `bridge status` — bring up any stopped tunnels
+> before accessing remote services.
+
+Gate: `~/.claude/CLAUDE.md` contains the Worker Agent section; ops-bridge
+session protocol references bridge status check.
+
+---
+
+## Done Criteria
+
+- [ ] `make mcp-http` starts the MCP server on port 8002 (SSE)
+- [ ] `bridge_status` and `bridge_check` callable as MCP tools from Claude Code
+- [ ] `ops-bridge` registered in `~/.claude.json` at user scope
+- [ ] `/bridge` skill surfaces tunnel states and recovers a stopped tunnel
+- [ ] Global CLAUDE.md has worker agent bridge protocol
+- [ ] All existing tests pass after T01 changes (`make test`)
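
The T01 entry-point change in the workplan could be sketched roughly as below. This is a minimal illustration, not the contents of `server.py`: the helper name `resolve_transport` and the constant `DEFAULT_PORT` are hypothetical, and only the `--http` flag, the `BRIDGE_MCP_PORT` variable, and the 8002 default come from the workplan. The sketch follows T01 in treating `--http` as the switch and the env var as the port override.

```python
import argparse

DEFAULT_PORT = 8002  # default port named in the workplan; constant name is hypothetical


def resolve_transport(argv, env):
    """Hypothetical helper: pick the transport from CLI args and environment.

    Returns ("sse", port) when --http is passed, otherwise ("stdio", None).
    """
    parser = argparse.ArgumentParser(description="ops-bridge MCP server (sketch)")
    parser.add_argument(
        "--http",
        action="store_true",
        help="serve over SSE instead of stdio",
    )
    args = parser.parse_args(argv)

    if not args.http:
        # Without --http the server keeps today's stdio behaviour.
        return "stdio", None

    # BRIDGE_MCP_PORT overrides the default; int() raises ValueError on a
    # non-numeric value, which surfaces a misconfiguration early.
    port = int(env.get("BRIDGE_MCP_PORT") or DEFAULT_PORT)
    return "sse", port


# In server.py (not run here) the result would select the run call, e.g.:
#   transport, port = resolve_transport(sys.argv[1:], os.environ)
#   if transport == "sse":
#       mcp.run(transport="sse", port=port)   # call shape as written in T01
#   else:
#       mcp.run()
```

Keeping the decision in a pure function leaves the stdio path untouched and makes the flag and env handling testable without starting a server. Whether `run()` accepts a `port` argument depends on the FastMCP package and version in use, so that call should be checked against the installed library.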