generated from coulomb/repo-seed
chore: close overview counts and review reliability workplans
This commit is contained in:
145
workplans/STATE-WP-0059-mcp-write-layer-reliability.md
Normal file
145
workplans/STATE-WP-0059-mcp-write-layer-reliability.md
Normal file
@@ -0,0 +1,145 @@
|
||||
---
|
||||
id: STATE-WP-0059
|
||||
type: workplan
|
||||
title: "State Hub MCP write-layer reliability"
|
||||
domain: custodian
|
||||
repo: state-hub
|
||||
status: ready
|
||||
owner: codex
|
||||
topic_slug: custodian
|
||||
created: "2026-06-07"
|
||||
updated: "2026-06-07"
|
||||
state_hub_workstream_id: "0c9f9ed3-235b-4291-bfa3-cc08699b02b4"
|
||||
---
|
||||
|
||||
# State Hub MCP Write-Layer Reliability
|
||||
|
||||
**Origin:** infrastructure-friction analysis from the `helix_forge` domain
|
||||
(`agentic-resources/docs/ASSESSMENT-infra-friction.md`).
|
||||
|
||||
## Critical Review
|
||||
|
||||
This workplan conforms with `INTENT.md`: State Hub owns the local-first
|
||||
coordination service and FastMCP tools that expose topics, workstreams, tasks,
|
||||
decisions, and progress. Fixing a flaky MCP write wrapper keeps the hub usable
|
||||
as an agent coordination surface without moving canon out of workplan files or
|
||||
turning State Hub into a task factory.
|
||||
|
||||
Implementation should stay narrow:
|
||||
|
||||
- Keep MCP as a stateless HTTP client over the FastAPI API; do not add direct DB
|
||||
writes or duplicate API business logic in `mcp_server/server.py`.
|
||||
- Preserve the REST API as the source contract and make MCP responses/error
|
||||
handling match it.
|
||||
- Do not fold the bulk-write work from `STATE-WP-0058` into this workplan; this
|
||||
plan is about correctness and reliable error surfacing for existing writes.
|
||||
|
||||
Repo review on 2026-06-07 found the likely fault surfaces:
|
||||
|
||||
- `mcp_server/server.py` implements the affected write tools directly:
|
||||
`create_workstream`, `create_task`, `update_task_status`, `record_decision`,
|
||||
and `add_progress_event`.
|
||||
- `_get` / `_post` / `_patch` return `{"error": ...}` on HTTP/client failure.
|
||||
Several write tools then assume successful response fields exist; for example
|
||||
`update_task_status` formats `task['title']`, which can explain the observed
|
||||
`'title'` KeyError when the underlying PATCH returned an error object.
|
||||
- Existing `tests/test_mcp_smoke.py` verifies REST endpoint shapes the MCP tools
|
||||
depend on, but it explicitly does not exercise the MCP protocol/tool wrapper
|
||||
itself. This gap is where the `-32602 Invalid request parameters` family can
|
||||
hide.
|
||||
|
||||
The State Hub MCP **write** tools fail while the **REST API stays healthy**
|
||||
(`/state/health` → 200). Observed continuously during a helix_forge work session:
|
||||
|
||||
- `update_task_status` → `'title'` **KeyError**
|
||||
- `add_progress_event` / `create_workstream` / `create_task` → `-32602 Invalid
|
||||
request parameters`
|
||||
|
||||
Every hub write that session had to fall back to REST `PATCH`/`POST`. The
|
||||
helix_forge error-mining corroborates it independently: this error family recurs
|
||||
in **4 captured sessions across 3 repos**.
|
||||
|
||||
This is a **reliability defect in the MCP wrapper layer** (the API and DB are
|
||||
fine), and it is dangerous: an agent with no REST fallback would silently fail to
|
||||
record status/progress, leaving file↔hub drift unreconciled.
|
||||
|
||||
## Reproduce + Diagnose the MCP Write Failures
|
||||
|
||||
```task
|
||||
id: STATE-WP-0059-T01
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "7fc04b00-2493-4310-ae76-101660621da6"
|
||||
```
|
||||
|
||||
Reproduce the `update_task_status` `'title'` KeyError and the `-32602` on
|
||||
`add_progress_event` / `create_workstream` / `create_task` against the working REST
|
||||
contract. Capture the exact failing request payloads and server-side
|
||||
stack/response. Pin down response-serialization (KeyError `'title'`) vs
|
||||
request-param-validation (`-32602`) and which tools each affects.
|
||||
|
||||
Done when a focused failing test or diagnostic script exercises the real MCP
|
||||
tool wrapper, not only the FastAPI endpoint, and records:
|
||||
|
||||
- the MCP tool name and argument payload,
|
||||
- the corresponding REST request/response,
|
||||
- whether failure happens before the HTTP call, during HTTP error handling, or
|
||||
during MCP response serialization.
|
||||
|
||||
## Fix the MCP Serialization / Param-Validation Layer
|
||||
|
||||
```task
|
||||
id: STATE-WP-0059-T02
|
||||
status: todo
|
||||
priority: high
|
||||
state_hub_task_id: "2636e1d5-142e-463f-875e-4dc1edf12853"
|
||||
```
|
||||
|
||||
Fix the MCP wrapper so the write tools succeed and return the same shape as their
|
||||
REST equivalents. Add an end-to-end regression test exercising each write tool
|
||||
(`create_workstream` / `create_task` / `update_task_status` / `add_progress_event`
|
||||
/ `record_decision`) so the MCP and REST paths stay in parity.
|
||||
|
||||
Implementation notes:
|
||||
|
||||
- Add a small shared MCP response helper rather than open-coding success/error
|
||||
assumptions in each write tool.
|
||||
- If the API helper returns an error object, return a clear MCP-visible error
|
||||
payload and do not emit follow-up progress events that pretend the write
|
||||
succeeded.
|
||||
- Ensure optional arguments and structured values such as `detail` are declared
|
||||
and normalized in a way FastMCP can validate consistently.
|
||||
- Keep existing automatic progress-event side effects for successful writes.
|
||||
- Update `tests/test_mcp_smoke.py` or add a sibling test that invokes the MCP
|
||||
tool layer directly enough to catch `-32602` and response-field KeyErrors.
|
||||
|
||||
## Verify + Harden Error Surfacing
|
||||
|
||||
```task
|
||||
id: STATE-WP-0059-T03
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "af137d6b-07ca-4110-abb0-45a96b84b7a3"
|
||||
```
|
||||
|
||||
Verify the fixed tools against the REST contract across a representative session
|
||||
flow. Ensure failures surface clearly (no silent success) so agents can fall back
|
||||
deterministically. Document the MCP↔REST parity guarantee in
|
||||
`mcp_server/TOOLS.md`, including the expected fallback behavior when the API is
|
||||
unreachable.
|
||||
|
||||
Done when:
|
||||
|
||||
- the MCP write regression tests pass,
|
||||
- a manual local flow can create a workstream, create a task, update task
|
||||
status, record a decision, and add progress through MCP while REST
|
||||
verification confirms matching records,
|
||||
- an induced API failure returns a clear error instead of a KeyError or false
|
||||
success,
|
||||
- the session closes with `make fix-consistency REPO=state-hub`.
|
||||
|
||||
After workplan updates, run from `~/state-hub`:
|
||||
|
||||
```bash
|
||||
make fix-consistency REPO=state-hub
|
||||
```
|
||||
Reference in New Issue
Block a user