docs(workplan): OPS-WP-0002 — agent usability via MCP registration and /bridge skill

Plan to make ops-bridge fully usable by worker agents:
- T01: SSE transport mode + make mcp-http target
- T02: register in ~/.claude.json at user scope
- T03: /bridge global slash command skill
- T04: worker agent bridge protocol in global CLAUDE.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 15:15:42 +01:00
parent a55c685f89
commit d73b7be45d

---
id: OPS-WP-0002
type: workplan
title: "Agent Usability — MCP Registration, Skill, and Worker Orientation"
domain: custodian
repo: ops-bridge
status: active
owner: custodian
topic_slug: custodian
created: "2026-03-21"
updated: "2026-03-21"
depends_on: OPS-WP-0001
---
# OPS-WP-0002 — Agent Usability: MCP Registration, Skill, and Worker Orientation
## Problem
The ops-bridge MCP server (`src/bridge/mcp_server/server.py`) is fully
implemented with tools for `bridge_up/down/restart/status/check/logs` and
catalog operations. But no agent can use it because:
1. **Not registered** — the server isn't in `~/.claude.json` and has no
persistent transport mode. It only runs on stdio today.
2. **No slash command** — agents working ad-hoc (not via MCP) have no
quick way to check or restore tunnels.
3. **No worker orientation** — agents on remote machines (CoulombCore,
Railiance) don't know that bridge is available or how to use it when
their state-hub connection drops.
## Goal
Any agent — on the workstation or a remote machine — can:
- Check tunnel health in one call
- Bring up a dropped tunnel without manual intervention
- Recover the state-hub connection if it goes down mid-session
## Design
### MCP server (workstation, persistent)
Run as an SSE service on port 8002 (same pattern as state-hub on 8001).
Registered at user scope in `~/.claude.json` so it's available to all
Claude Code sessions.
FastMCP already supports the SSE transport, so only the entry point needs
to change: an `--http` flag switches `mcp.run()` from stdio to SSE, and a
`BRIDGE_MCP_PORT` env var sets the port.
### Slash command skill (all machines)
A `/bridge` skill at `~/.claude/commands/bridge.md` (global scope). It is
useful on machines that don't have the MCP server registered, and it:
- Reads `bridge status` output
- Surfaces any tunnel that is down or stale
- Offers to bring it up
### Worker agent orientation (remote machines)
Update `CLAUDE.md` (global) and `ops-bridge` session protocol to tell
worker agents:
- Check `bridge status` at session start when on a machine with
ops-bridge installed
- If state-hub tunnel is down: run `bridge up state-hub-<machine>` to
restore it before making any state-hub API calls
- If the `bridge` command is not installed: fall back to the state-hub API
  URL directly, if it is reachable
---
## Tasks
### T01 — SSE transport mode for MCP server
```task
id: OPS-WP-0002-T01
status: todo
priority: high
```
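Add `--http` flag and `BRIDGE_MCP_PORT` env var to `server.py` entry
point. When `--http` is set, serve over SSE on `BRIDGE_MCP_PORT` (default
8002) instead of stdio. How the port is passed depends on the FastMCP
flavour: standalone FastMCP 2.x accepts `mcp.run(transport="sse", port=PORT)`,
while the FastMCP bundled in the official `mcp` SDK takes the port as a
server setting (e.g. `FastMCP(..., port=PORT)`) and is then started with
`mcp.run(transport="sse")`.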
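A minimal sketch of the entry-point change, assuming `server.py` defines a module-level FastMCP instance (called `mcp` here) and uses the official `mcp` SDK's FastMCP, which reads host and port from `mcp.settings` rather than from `run()` arguments. `RunConfig`, `parse_run_config`, and `serve` are hypothetical names; only the `--http` flag, the `BRIDGE_MCP_PORT` variable, and the 8002 default come from this plan.

```python
import argparse
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence

DEFAULT_PORT = 8002  # default port chosen in this workplan


@dataclass(frozen=True)
class RunConfig:
    transport: str       # "stdio" (today's behaviour) or "sse"
    port: Optional[int]  # only meaningful for "sse"


def parse_run_config(argv: Optional[Sequence[str]] = None,
                     env: Optional[Mapping[str, str]] = None) -> RunConfig:
    """Pick the transport from --http and the port from BRIDGE_MCP_PORT."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="ops-bridge MCP server")
    parser.add_argument("--http", action="store_true",
                        help="serve over SSE instead of stdio")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    if not args.http:
        return RunConfig(transport="stdio", port=None)
    port = int(env.get("BRIDGE_MCP_PORT", DEFAULT_PORT))
    return RunConfig(transport="sse", port=port)


def serve(mcp, cfg: RunConfig) -> None:
    """Start `mcp` (hypothetically, the FastMCP instance in server.py)."""
    if cfg.transport == "stdio":
        mcp.run()  # unchanged stdio path
        return
    # Assumption: official-SDK FastMCP, where the port lives in mcp.settings.
    # With standalone FastMCP 2.x the equivalent would be
    # mcp.run(transport="sse", port=cfg.port).
    mcp.settings.port = cfg.port
    mcp.run(transport="sse")
```

In `server.py` this would be wired up as `serve(mcp, parse_run_config())` at the bottom of the module, so the stdio default stays exactly as it is today.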
Add `make mcp-http` target to `Makefile`:
```makefile
mcp-http: ## Start MCP server in SSE mode (default port 8002)
	BRIDGE_MCP_PORT=$${BRIDGE_MCP_PORT:-8002} uv run python src/bridge/mcp_server/server.py --http
```
Add `make mcp-stop` target that kills any running MCP server on port
8002.
Gate: `bridge_status()` tool callable via SSE on localhost:8002 after
`make mcp-http`.
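Before exercising the gate from Claude Code, it can help to confirm that something is actually serving an event stream on the port. A minimal stdlib sketch, assuming the server exposes SSE at `/sse` (the path used in the registration command below); `sse_endpoint_ready` is a hypothetical helper, and it does not perform an MCP handshake or call `bridge_status()` itself.

```python
import urllib.error
import urllib.request


def sse_endpoint_ready(url: str = "http://127.0.0.1:8002/sse",
                       timeout: float = 2.0) -> bool:
    """Return True if `url` answers 200 with Content-Type text/event-stream."""
    # An empty ProxyHandler ignores any http_proxy setting for this local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    try:
        with opener.open(req, timeout=timeout) as resp:
            # Only the status line and headers are inspected: an SSE response
            # stays open indefinitely, so the body is never read.
            return (resp.status == 200
                    and resp.headers.get_content_type() == "text/event-stream")
    except OSError:
        # URLError, HTTPError (e.g. 404) and socket timeouts are all
        # OSError subclasses, so any of them means "not ready".
        return False
```

A `False` result right after `make mcp-http` usually just means the server has not finished starting; retrying for a few seconds is reasonable before treating it as a failure.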
---
### T02 — Register MCP server in ~/.claude.json
```task
id: OPS-WP-0002-T02
status: todo
priority: high
```
Register the ops-bridge MCP server at user scope:
```bash
claude mcp add-json -s user ops-bridge \
'{"type":"sse","url":"http://127.0.0.1:8002/sse"}'
```
Document in `ops-bridge` CLAUDE.md:
```
To start the MCP server:
cd ~/ops-bridge && make mcp-http
To verify registration:
python3 -c "import json,os; d=json.load(open(os.path.expanduser('~/.claude.json'))); print(list(d.get('mcpServers',{}).keys()))"
```
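A slightly stricter version of the verification one-liner above, sketched as a reusable helper: besides the server name, it checks that the entry carries the `type` and `url` passed to `claude mcp add-json`. The function names are hypothetical, and the sketch keeps the one-liner's assumption that user-scope servers live under the top-level `mcpServers` key of `~/.claude.json`.

```python
import json
import os
from typing import Optional


def user_mcp_entry(name: str, path: str = "~/.claude.json") -> Optional[dict]:
    """Return the user-scope MCP server entry for `name`, or None if absent."""
    with open(os.path.expanduser(path), encoding="utf-8") as fh:
        config = json.load(fh)
    # Same assumption as the one-liner: user-scope servers sit under the
    # top-level "mcpServers" mapping.
    return config.get("mcpServers", {}).get(name)


def is_registered_as_expected(name: str = "ops-bridge",
                              url: str = "http://127.0.0.1:8002/sse",
                              path: str = "~/.claude.json") -> bool:
    """True if `name` is registered as an SSE server pointing at `url`."""
    entry = user_mcp_entry(name, path)
    return (entry is not None
            and entry.get("type") == "sse"
            and entry.get("url") == url)
```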
Update global `~/.claude/CLAUDE.md` to list `ops-bridge` MCP server
alongside `state-hub`.
Gate: `ops-bridge` appears in Claude Code MCP tool list after `make
mcp-http`.
---
### T03 — `/bridge` slash command skill
```task
id: OPS-WP-0002-T03
status: todo
priority: medium
```
Create `~/.claude/commands/bridge.md` — a global Claude Code skill for
tunnel management.
**Behaviour:**
1. Run `bridge status` and parse output
2. Report each tunnel: name, state, LIVE column
3. For any tunnel that is `stopped`, `reconnecting`, or `[STALE]`:
   - Offer to run `bridge up <tunnel-name>`
   - After `bridge up`, re-check with `bridge check <tunnel-name>`
4. If all tunnels are `connected` and LIVE: report green and exit
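Steps 3 and 4 reduce to a small classification over the rows of `bridge status`. A minimal sketch, assuming each row has already been parsed into a name, a state, a boolean LIVE value, and a `[STALE]` flag; the real output format is not specified in this plan, so parsing is left out. `TunnelRow`, `needs_recovery`, `triage`, and `all_green` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List

# States that step 3 treats as needing a `bridge up` offer.
NEEDS_RECOVERY_STATES = {"stopped", "reconnecting"}


@dataclass(frozen=True)
class TunnelRow:
    name: str
    state: str   # e.g. "connected", "stopped", "reconnecting"
    live: bool   # the LIVE column, reduced to a boolean
    stale: bool  # True when the row carries a [STALE] marker


def needs_recovery(row: TunnelRow) -> bool:
    """Step 3: stopped, reconnecting, or [STALE] tunnels get an offer."""
    return row.state in NEEDS_RECOVERY_STATES or row.stale


def triage(rows: Iterable[TunnelRow]) -> List[str]:
    """Names of tunnels to offer `bridge up` for, in status order."""
    return [r.name for r in rows if needs_recovery(r)]


def all_green(rows: Iterable[TunnelRow]) -> bool:
    """Step 4: every tunnel is connected, LIVE, and not stale."""
    return all(r.state == "connected" and r.live and not r.stale for r in rows)
```

A row that is `connected` but neither LIVE nor `[STALE]` matches neither step 3 nor step 4 as written, so the skill text should say explicitly how to treat it.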
**Skill definition:**
```yaml
---
description: >
  Check ops-bridge tunnel health and restore any dropped tunnels.
  Reports status of all configured tunnels and offers to bring up
  any that are stopped or stale.
argument-hint: "[tunnel-name]"
allowed-tools:
- Bash(bridge status)
- Bash(bridge up:*)
- Bash(bridge down:*)
- Bash(bridge check:*)
- Bash(bridge logs:*)
---
```
If an optional tunnel name is passed as `$ARGUMENTS`, scope all
operations to that tunnel only.
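The recovery half of step 3, together with the `$ARGUMENTS` scoping, can be sketched as a loop that runs `bridge up` and then `bridge check` for each affected tunnel. This is an illustration only: it assumes both commands exit 0 on success (the plan does not say), it omits the interactive offer the skill makes before acting, and `recover`, `run_cli`, and `Runner` are hypothetical names.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def recover(candidates: List[str], only: Optional[str] = None,
            run: Runner = run_cli) -> Dict[str, bool]:
    """Bring up each candidate tunnel, then re-check it.

    `candidates` are tunnel names already judged stopped/reconnecting/stale;
    `only` mirrors the optional $ARGUMENTS tunnel name.
    """
    targets = [n for n in candidates if only is None or n == only]
    results: Dict[str, bool] = {}
    for name in targets:
        if run(["bridge", "up", name]) != 0:
            results[name] = False  # skip the check if `up` already failed
            continue
        results[name] = run(["bridge", "check", name]) == 0
    return results
```

Injecting the runner keeps the loop testable without a real `bridge` binary; in the skill itself the agent issues the same two commands through the allowed `Bash(...)` tools.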
Gate: `/bridge` skill runs cleanly when all tunnels are up; correctly
identifies and recovers a manually-stopped tunnel.
---
### T04 — Worker agent orientation in CLAUDE.md
```task
id: OPS-WP-0002-T04
status: todo
priority: medium
```
Update global `~/.claude/CLAUDE.md` — add a **Worker Agent — Bridge
Protocol** section:
```markdown
## Worker Agent — Bridge Protocol
When working on a remote machine (CoulombCore, Railiance nodes):
1. At session start, check if `bridge` is installed:
   `which bridge && bridge status`
2. If state-hub tunnel is down: `bridge up state-hub-<machine-slug>`,
   then wait for state `connected` before making state-hub API calls.
3. If `bridge` is not installed, check if the state-hub API is directly
   reachable: `curl -sf http://127.0.0.1:8000/state/health`
4. Only proceed without state-hub if absolutely necessary; log a
   progress note about the outage when connectivity is restored.
```
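Step 2's "wait for state `connected`" is easiest to get right as a bounded poll rather than an open-ended loop. A minimal sketch; `wait_for_state` is a hypothetical helper, `get_state` is a hypothetical callable that would wrap however the agent reads the tunnel's state from `bridge status`, and the 30 s timeout and 1 s interval are illustrative values, not part of the protocol.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], target: str = "connected",
                   timeout: float = 30.0, interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns `target` or `timeout` seconds pass.

    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On `False`, the agent would fall through to step 3 or 4 rather than calling the state-hub API against a tunnel that never came up.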
Also add a one-liner reminder to the ops-bridge session protocol in
`.claude/rules/session-protocol.md`:
> At session start: `bridge status` — bring up any stopped tunnels
> before accessing remote services.
Gate: `~/.claude/CLAUDE.md` contains the Worker Agent section; ops-bridge
session protocol references bridge status check.
---
## Done Criteria
- [ ] `make mcp-http` starts the MCP server on port 8002 (SSE)
- [ ] `bridge_status` and `bridge_check` callable as MCP tools from Claude Code
- [ ] `ops-bridge` registered in `~/.claude.json` at user scope
- [ ] `/bridge` skill surfaces tunnel states and recovers a stopped tunnel
- [ ] Global CLAUDE.md has worker agent bridge protocol
- [ ] All existing tests pass after T01 changes (`make test`)