Files
state-hub/workplans/STATE-WP-0059-mcp-write-layer-reliability.md

6.6 KiB

id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_id
id type title domain repo status owner topic_slug created updated state_hub_workstream_id
STATE-WP-0059 workplan State Hub MCP write-layer reliability custodian state-hub finished codex custodian 2026-06-07 2026-06-07 0c9f9ed3-235b-4291-bfa3-cc08699b02b4

State Hub MCP Write-Layer Reliability

Origin: infrastructure-friction analysis from the helix_forge domain (agentic-resources/docs/ASSESSMENT-infra-friction.md).

Critical Review

This workplan conforms with INTENT.md: State Hub owns the local-first coordination service and FastMCP tools that expose topics, workstreams, tasks, decisions, and progress. Fixing a flaky MCP write wrapper keeps the hub usable as an agent coordination surface without moving canon out of workplan files or turning State Hub into a task factory.

Implementation should stay narrow:

  • Keep MCP as a stateless HTTP client over the FastAPI API; do not add direct DB writes or duplicate API business logic in mcp_server/server.py.
  • Preserve the REST API as the source contract and make MCP responses/error handling match it.
  • Do not fold the bulk-write work from STATE-WP-0058 into this workplan; this plan is about correctness and reliable error surfacing for existing writes.

Repo review on 2026-06-07 found the likely fault surfaces:

  • mcp_server/server.py implements the affected write tools directly: create_workstream, create_task, update_task_status, record_decision, and add_progress_event.
  • _get / _post / _patch return {"error": ...} on HTTP/client failure. Several write tools then assume successful response fields exist; for example update_task_status formats task['title'], which can explain the observed 'title' KeyError when the underlying PATCH returned an error object.
  • Existing tests/test_mcp_smoke.py verifies REST endpoint shapes the MCP tools depend on, but it explicitly does not exercise the MCP protocol/tool wrapper itself. This gap is where the -32602 Invalid request parameters family can hide.

The State Hub MCP write tools fail while the REST API stays healthy (/state/health → 200). Observed continuously during a helix_forge work session:

  • update_task_status'title' KeyError
  • add_progress_event / create_workstream / create_task-32602 Invalid request parameters

Every hub write that session had to fall back to REST PATCH/POST. The helix_forge error-mining corroborates it independently: this error family recurs in 4 captured sessions across 3 repos.

This is a reliability defect in the MCP wrapper layer (the API and DB are fine), and it is dangerous: an agent with no REST fallback would silently fail to record status/progress, leaving file↔hub drift unreconciled.

Reproduce + Diagnose the MCP Write Failures

id: STATE-WP-0059-T01
status: done
priority: high
state_hub_task_id: "7fc04b00-2493-4310-ae76-101660621da6"

Reproduce the update_task_status 'title' KeyError and the -32602 on add_progress_event / create_workstream / create_task against the working REST contract. Capture the exact failing request payloads and server-side stack/response. Pin down response-serialization (KeyError 'title') vs request-param-validation (-32602) and which tools each affects.

Done when a focused failing test or diagnostic script exercises the real MCP tool wrapper, not only the FastAPI endpoint, and records:

  • the MCP tool name and argument payload,
  • the corresponding REST request/response,
  • whether failure happens before the HTTP call, during HTTP error handling, or during MCP response serialization.

Fix the MCP Serialization / Param-Validation Layer

id: STATE-WP-0059-T02
status: done
priority: high
state_hub_task_id: "2636e1d5-142e-463f-875e-4dc1edf12853"

Fix the MCP wrapper so the write tools succeed and return the same shape as their REST equivalents. Add an end-to-end regression test exercising each write tool (create_workstream / create_task / update_task_status / add_progress_event / record_decision) so the MCP and REST paths stay in parity.

Implementation notes:

  • Add a small shared MCP response helper rather than open-coding success/error assumptions in each write tool.
  • If the API helper returns an error object, return a clear MCP-visible error payload and do not emit follow-up progress events that pretend the write succeeded.
  • Ensure optional arguments and structured values such as detail are declared and normalized in a way FastMCP can validate consistently.
  • Keep existing automatic progress-event side effects for successful writes.
  • Update tests/test_mcp_smoke.py or add a sibling test that invokes the MCP tool layer directly enough to catch -32602 and response-field KeyErrors.

Verify + Harden Error Surfacing

id: STATE-WP-0059-T03
status: done
priority: medium
state_hub_task_id: "af137d6b-07ca-4110-abb0-45a96b84b7a3"

Verify the fixed tools against the REST contract across a representative session flow. Ensure failures surface clearly (no silent success) so agents can fall back deterministically. Document the MCP↔REST parity guarantee in mcp_server/TOOLS.md, including the expected fallback behavior when the API is unreachable.

Done when:

  • the MCP write regression tests pass,
  • a manual local flow can create a workstream, create a task, update task status, record a decision, and add progress through MCP while REST verification confirms matching records,
  • an induced API failure returns a clear error instead of a KeyError or false success,
  • the session closes with make fix-consistency REPO=state-hub.

After workplan updates, run from ~/state-hub:

make fix-consistency REPO=state-hub

Verification Notes

Completed 2026-06-07:

  • Added shared MCP response/error helpers in mcp_server/server.py so API error payloads and malformed REST responses return clear MCP-visible errors instead of triggering response-field KeyErrors or false success.
  • Kept successful MCP write responses in the same JSON shape as their REST equivalents, while preserving automatic progress events only after successful primary writes.
  • Added tests/test_mcp_write_tools.py, which invokes the registered FastMCP tools in-process and covers success/error behavior for create_workstream, create_task, update_task_status, add_progress_event, and record_decision.
  • Documented MCP/REST write parity and fallback expectations in mcp_server/TOOLS.md.

Verification:

  • .venv/bin/python -m pytest tests/test_mcp_write_tools.py -q -> 8 passed
  • .venv/bin/python -m pytest tests/test_mcp_write_tools.py tests/test_mcp_smoke.py -q -> 21 passed
  • git diff --check -> clean