--- id: STATE-WP-0059 type: workplan title: "State Hub MCP write-layer reliability" domain: custodian repo: state-hub status: finished owner: codex topic_slug: custodian created: "2026-06-07" updated: "2026-06-07" state_hub_workstream_id: "0c9f9ed3-235b-4291-bfa3-cc08699b02b4" --- # State Hub MCP Write-Layer Reliability **Origin:** infrastructure-friction analysis from the `helix_forge` domain (`agentic-resources/docs/ASSESSMENT-infra-friction.md`). ## Critical Review This workplan conforms with `INTENT.md`: State Hub owns the local-first coordination service and FastMCP tools that expose topics, workstreams, tasks, decisions, and progress. Fixing a flaky MCP write wrapper keeps the hub usable as an agent coordination surface without moving canon out of workplan files or turning State Hub into a task factory. Implementation should stay narrow: - Keep MCP as a stateless HTTP client over the FastAPI API; do not add direct DB writes or duplicate API business logic in `mcp_server/server.py`. - Preserve the REST API as the source contract and make MCP responses/error handling match it. - Do not fold the bulk-write work from `STATE-WP-0058` into this workplan; this plan is about correctness and reliable error surfacing for existing writes. Repo review on 2026-06-07 found the likely fault surfaces: - `mcp_server/server.py` implements the affected write tools directly: `create_workstream`, `create_task`, `update_task_status`, `record_decision`, and `add_progress_event`. - `_get` / `_post` / `_patch` return `{"error": ...}` on HTTP/client failure. Several write tools then assume successful response fields exist; for example `update_task_status` formats `task['title']`, which can explain the observed `'title'` KeyError when the underlying PATCH returned an error object. - Existing `tests/test_mcp_smoke.py` verifies REST endpoint shapes the MCP tools depend on, but it explicitly does not exercise the MCP protocol/tool wrapper itself. This gap is where the `-32602 Invalid request parameters` family can hide. The State Hub MCP **write** tools fail while the **REST API stays healthy** (`/state/health` → 200). Observed continuously during a helix_forge work session: - `update_task_status` → `'title'` **KeyError** - `add_progress_event` / `create_workstream` / `create_task` → `-32602 Invalid request parameters` Every hub write that session had to fall back to REST `PATCH`/`POST`. The helix_forge error-mining corroborates it independently: this error family recurs in **4 captured sessions across 3 repos**. This is a **reliability defect in the MCP wrapper layer** (the API and DB are fine), and it is dangerous: an agent with no REST fallback would silently fail to record status/progress, leaving file↔hub drift unreconciled. ## Reproduce + Diagnose the MCP Write Failures ```task id: STATE-WP-0059-T01 status: done priority: high state_hub_task_id: "7fc04b00-2493-4310-ae76-101660621da6" ``` Reproduce the `update_task_status` `'title'` KeyError and the `-32602` on `add_progress_event` / `create_workstream` / `create_task` against the working REST contract. Capture the exact failing request payloads and server-side stack/response. Pin down response-serialization (KeyError `'title'`) vs request-param-validation (`-32602`) and which tools each affects. Done when a focused failing test or diagnostic script exercises the real MCP tool wrapper, not only the FastAPI endpoint, and records: - the MCP tool name and argument payload, - the corresponding REST request/response, - whether failure happens before the HTTP call, during HTTP error handling, or during MCP response serialization. ## Fix the MCP Serialization / Param-Validation Layer ```task id: STATE-WP-0059-T02 status: done priority: high state_hub_task_id: "2636e1d5-142e-463f-875e-4dc1edf12853" ``` Fix the MCP wrapper so the write tools succeed and return the same shape as their REST equivalents. Add an end-to-end regression test exercising each write tool (`create_workstream` / `create_task` / `update_task_status` / `add_progress_event` / `record_decision`) so the MCP and REST paths stay in parity. Implementation notes: - Add a small shared MCP response helper rather than open-coding success/error assumptions in each write tool. - If the API helper returns an error object, return a clear MCP-visible error payload and do not emit follow-up progress events that pretend the write succeeded. - Ensure optional arguments and structured values such as `detail` are declared and normalized in a way FastMCP can validate consistently. - Keep existing automatic progress-event side effects for successful writes. - Update `tests/test_mcp_smoke.py` or add a sibling test that invokes the MCP tool layer directly enough to catch `-32602` and response-field KeyErrors. ## Verify + Harden Error Surfacing ```task id: STATE-WP-0059-T03 status: done priority: medium state_hub_task_id: "af137d6b-07ca-4110-abb0-45a96b84b7a3" ``` Verify the fixed tools against the REST contract across a representative session flow. Ensure failures surface clearly (no silent success) so agents can fall back deterministically. Document the MCP↔REST parity guarantee in `mcp_server/TOOLS.md`, including the expected fallback behavior when the API is unreachable. Done when: - the MCP write regression tests pass, - a manual local flow can create a workstream, create a task, update task status, record a decision, and add progress through MCP while REST verification confirms matching records, - an induced API failure returns a clear error instead of a KeyError or false success, - the session closes with `make fix-consistency REPO=state-hub`. After workplan updates, run from `~/state-hub`: ```bash make fix-consistency REPO=state-hub ``` ## Verification Notes Completed 2026-06-07: - Added shared MCP response/error helpers in `mcp_server/server.py` so API error payloads and malformed REST responses return clear MCP-visible errors instead of triggering response-field KeyErrors or false success. - Kept successful MCP write responses in the same JSON shape as their REST equivalents, while preserving automatic progress events only after successful primary writes. - Added `tests/test_mcp_write_tools.py`, which invokes the registered FastMCP tools in-process and covers success/error behavior for `create_workstream`, `create_task`, `update_task_status`, `add_progress_event`, and `record_decision`. - Documented MCP/REST write parity and fallback expectations in `mcp_server/TOOLS.md`. Verification: - `.venv/bin/python -m pytest tests/test_mcp_write_tools.py -q` -> 8 passed - `.venv/bin/python -m pytest tests/test_mcp_write_tools.py tests/test_mcp_smoke.py -q` -> 21 passed - `git diff --check` -> clean