generated from coulomb/repo-seed
169 lines
6.6 KiB
Markdown
169 lines
6.6 KiB
Markdown
---
|
|
id: STATE-WP-0059
|
|
type: workplan
|
|
title: "State Hub MCP write-layer reliability"
|
|
domain: custodian
|
|
repo: state-hub
|
|
status: finished
|
|
owner: codex
|
|
topic_slug: custodian
|
|
created: "2026-06-07"
|
|
updated: "2026-06-07"
|
|
state_hub_workstream_id: "0c9f9ed3-235b-4291-bfa3-cc08699b02b4"
|
|
---
|
|
|
|
# State Hub MCP Write-Layer Reliability
|
|
|
|
**Origin:** infrastructure-friction analysis from the `helix_forge` domain
|
|
(`agentic-resources/docs/ASSESSMENT-infra-friction.md`).
|
|
|
|
## Critical Review
|
|
|
|
This workplan conforms with `INTENT.md`: State Hub owns the local-first
|
|
coordination service and FastMCP tools that expose topics, workstreams, tasks,
|
|
decisions, and progress. Fixing a flaky MCP write wrapper keeps the hub usable
|
|
as an agent coordination surface without moving canon out of workplan files or
|
|
turning State Hub into a task factory.
|
|
|
|
Implementation should stay narrow:
|
|
|
|
- Keep MCP as a stateless HTTP client over the FastAPI API; do not add direct DB
|
|
writes or duplicate API business logic in `mcp_server/server.py`.
|
|
- Preserve the REST API as the source contract and make MCP responses/error
|
|
handling match it.
|
|
- Do not fold the bulk-write work from `STATE-WP-0058` into this workplan; this
|
|
plan is about correctness and reliable error surfacing for existing writes.
|
|
|
|
Repo review on 2026-06-07 found the likely fault surfaces:
|
|
|
|
- `mcp_server/server.py` implements the affected write tools directly:
|
|
`create_workstream`, `create_task`, `update_task_status`, `record_decision`,
|
|
and `add_progress_event`.
|
|
- `_get` / `_post` / `_patch` return `{"error": ...}` on HTTP/client failure.
|
|
Several write tools then assume successful response fields exist; for example
|
|
`update_task_status` formats `task['title']`, which can explain the observed
|
|
`'title'` KeyError when the underlying PATCH returned an error object.
|
|
- Existing `tests/test_mcp_smoke.py` verifies REST endpoint shapes the MCP tools
|
|
depend on, but it explicitly does not exercise the MCP protocol/tool wrapper
|
|
itself. This gap is where the `-32602 Invalid request parameters` family can
|
|
hide.
|
|
|
|
The State Hub MCP **write** tools fail while the **REST API stays healthy**
|
|
(`/state/health` → 200). Observed continuously during a helix_forge work session:
|
|
|
|
- `update_task_status` → `'title'` **KeyError**
|
|
- `add_progress_event` / `create_workstream` / `create_task` → `-32602 Invalid
|
|
request parameters`
|
|
|
|
Every hub write that session had to fall back to REST `PATCH`/`POST`. The
|
|
helix_forge error-mining corroborates it independently: this error family recurs
|
|
in **4 captured sessions across 3 repos**.
|
|
|
|
This is a **reliability defect in the MCP wrapper layer** (the API and DB are
|
|
fine), and it is dangerous: an agent with no REST fallback would silently fail to
|
|
record status/progress, leaving file↔hub drift unreconciled.
|
|
|
|
## Reproduce + Diagnose the MCP Write Failures
|
|
|
|
```task
|
|
id: STATE-WP-0059-T01
|
|
status: done
|
|
priority: high
|
|
state_hub_task_id: "7fc04b00-2493-4310-ae76-101660621da6"
|
|
```
|
|
|
|
Reproduce the `update_task_status` `'title'` KeyError and the `-32602` on
|
|
`add_progress_event` / `create_workstream` / `create_task` against the working REST
|
|
contract. Capture the exact failing request payloads and server-side
|
|
stack/response. Pin down response-serialization (KeyError `'title'`) vs
|
|
request-param-validation (`-32602`) and which tools each affects.
|
|
|
|
Done when a focused failing test or diagnostic script exercises the real MCP
|
|
tool wrapper, not only the FastAPI endpoint, and records:
|
|
|
|
- the MCP tool name and argument payload,
|
|
- the corresponding REST request/response,
|
|
- whether failure happens before the HTTP call, during HTTP error handling, or
|
|
during MCP response serialization.
|
|
|
|
## Fix the MCP Serialization / Param-Validation Layer
|
|
|
|
```task
|
|
id: STATE-WP-0059-T02
|
|
status: done
|
|
priority: high
|
|
state_hub_task_id: "2636e1d5-142e-463f-875e-4dc1edf12853"
|
|
```
|
|
|
|
Fix the MCP wrapper so the write tools succeed and return the same shape as their
|
|
REST equivalents. Add an end-to-end regression test exercising each write tool
|
|
(`create_workstream` / `create_task` / `update_task_status` / `add_progress_event`
|
|
/ `record_decision`) so the MCP and REST paths stay in parity.
|
|
|
|
Implementation notes:
|
|
|
|
- Add a small shared MCP response helper rather than open-coding success/error
|
|
assumptions in each write tool.
|
|
- If the API helper returns an error object, return a clear MCP-visible error
|
|
payload and do not emit follow-up progress events that pretend the write
|
|
succeeded.
|
|
- Ensure optional arguments and structured values such as `detail` are declared
|
|
and normalized in a way FastMCP can validate consistently.
|
|
- Keep existing automatic progress-event side effects for successful writes.
|
|
- Update `tests/test_mcp_smoke.py` or add a sibling test that invokes the MCP
|
|
tool layer directly enough to catch `-32602` and response-field KeyErrors.
|
|
|
|
## Verify + Harden Error Surfacing
|
|
|
|
```task
|
|
id: STATE-WP-0059-T03
|
|
status: done
|
|
priority: medium
|
|
state_hub_task_id: "af137d6b-07ca-4110-abb0-45a96b84b7a3"
|
|
```
|
|
|
|
Verify the fixed tools against the REST contract across a representative session
|
|
flow. Ensure failures surface clearly (no silent success) so agents can fall back
|
|
deterministically. Document the MCP↔REST parity guarantee in
|
|
`mcp_server/TOOLS.md`, including the expected fallback behavior when the API is
|
|
unreachable.
|
|
|
|
Done when:
|
|
|
|
- the MCP write regression tests pass,
|
|
- a manual local flow can create a workstream, create a task, update task
|
|
status, record a decision, and add progress through MCP while REST
|
|
verification confirms matching records,
|
|
- an induced API failure returns a clear error instead of a KeyError or false
|
|
success,
|
|
- the session closes with `make fix-consistency REPO=state-hub`.
|
|
|
|
After workplan updates, run from `~/state-hub`:
|
|
|
|
```bash
|
|
make fix-consistency REPO=state-hub
|
|
```
|
|
|
|
## Verification Notes
|
|
|
|
Completed 2026-06-07:
|
|
|
|
- Added shared MCP response/error helpers in `mcp_server/server.py` so API error
|
|
payloads and malformed REST responses return clear MCP-visible errors instead
|
|
of triggering response-field KeyErrors or false success.
|
|
- Kept successful MCP write responses in the same JSON shape as their REST
|
|
equivalents, while preserving automatic progress events only after successful
|
|
primary writes.
|
|
- Added `tests/test_mcp_write_tools.py`, which invokes the registered FastMCP
|
|
tools in-process and covers success/error behavior for `create_workstream`,
|
|
`create_task`, `update_task_status`, `add_progress_event`, and
|
|
`record_decision`.
|
|
- Documented MCP/REST write parity and fallback expectations in
|
|
`mcp_server/TOOLS.md`.
|
|
|
|
Verification:
|
|
|
|
- `.venv/bin/python -m pytest tests/test_mcp_write_tools.py -q` -> 8 passed
|
|
- `.venv/bin/python -m pytest tests/test_mcp_write_tools.py tests/test_mcp_smoke.py -q` -> 21 passed
|
|
- `git diff --check` -> clean
|