session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03)

Re-ingested under schema v2 (populates error_snippets) and re-ran detect over
27 real sessions. Added a 'content-level root causes' section to
docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read
(12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok)
make fix-consistency failure, and State Hub MCP instability. Documented a
fingerprint-noise caveat. WP-0006 finished; suite 98/98.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:09:29 +02:00
parent e022c0f9d6
commit 7cce276d32
2 changed files with 48 additions and 7 deletions


@@ -86,15 +86,56 @@ issue.** Two high-ROI moves:
share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
This is precisely what the Measure phase is for — the loop closes here.
## Content-level root causes (error-body mining)
*Added 2026-06-07 from [AGENTIC-WP-0006] — `build_digest` now mines normalized
error fingerprints into the durable digest, and `sig_recurring_error` clusters
them. This is the "why" the tool-mix view above could not see.*
**26 of 27 real sessions hit at least one error.** Top recurring error
fingerprints across the corpus (by # sessions affected):
| # sessions | occ | flavors | top sample |
|-----------:|----:|---------|------------|
| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |
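A minimal sketch of how a ranking like the table above could be produced. This is not the actual `build_digest` / `sig_recurring_error` code: the normalization rules in `fingerprint`, the `rank_fingerprints` helper, and the `(session_id, error_text)` input shape are assumptions made for illustration.

```python
import re
from collections import defaultdict


def fingerprint(error_text: str) -> str:
    """Collapse volatile details so similar errors share one key.

    Illustrative rules only (assumed, not the project's normalizer):
    keep the first line, mask multi-segment paths, mask digit runs.
    """
    stripped = error_text.strip()
    if not stripped:
        return ""
    first_line = stripped.splitlines()[0]
    masked = re.sub(r"(?:/[\w.\-]+){2,}", "<path>", first_line)  # /a/b/c.py
    masked = re.sub(r"\d+", "<n>", masked)  # line numbers, codes, counts
    return masked.lower()


def rank_fingerprints(errors):
    """Rank fingerprints by sessions affected, then by occurrences.

    `errors` is an iterable of (session_id, error_text) pairs.
    Returns a list of (n_sessions, n_occurrences, fingerprint) tuples.
    """
    sessions = defaultdict(set)
    occurrences = defaultdict(int)
    for session_id, text in errors:
        fp = fingerprint(text)
        if not fp:
            continue
        sessions[fp].add(session_id)
        occurrences[fp] += 1
    rows = [(len(sessions[fp]), occurrences[fp], fp) for fp in sessions]
    return sorted(rows, key=lambda row: (-row[0], -row[1], row[2]))
```

Ranking by sessions affected before raw occurrences keeps one noisy session from dominating the list, which matches the column order used in the table.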
Reading:
- **#1 — Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
common error is an agent trying to edit a file it has not yet read into context.
This is *workflow* friction and highly addressable: add a Read-before-Edit reflex
to the agent instructions or a skill, or provide a harness affordance. (Observed
live: the author hit this exact error twice while writing this workplan.)
- **#2 — stale-read conflicts (6 sessions):** "File has been modified since read"
belongs to the same family as #1; a re-read-before-edit discipline would address both.
- **#3 — cross-flavor `make fix-consistency` failures (claude + grok, 3 repos):**
the consistency tooling itself fails under both flavors, which points to a shared
infra issue worth a look on the state-hub side (cf. [STATE-WP-0058]).
- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
in 3 sessions each — corroborates the plumbing-overhead story and the live MCP
flakiness seen during this work (REST fallback used).
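The re-read-before-edit discipline behind #1 and #2 can be sketched as a small guard that a harness or agent wrapper might consult before writing. `ReadGuard` and its method names are hypothetical and are not part of any existing harness; staleness is approximated here by comparing file modification times, which is an assumption about how "modified since read" could be detected.

```python
import os


class ReadGuard:
    """Remembers which files were read, and their mtime at that moment."""

    def __init__(self):
        self._mtime_at_read = {}  # absolute path -> st_mtime_ns when read

    def read(self, path):
        """Read a file and record its modification time."""
        # Stat before reading so a write that lands mid-read shows up as stale.
        mtime = os.stat(path).st_mtime_ns
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
        self._mtime_at_read[os.path.abspath(path)] = mtime
        return text

    def check_edit(self, path):
        """Return 'ok', 'not-read', or 'stale-read' for a planned edit."""
        if not os.path.exists(path):
            return "ok"  # creating a new file needs no prior read
        key = os.path.abspath(path)
        if key not in self._mtime_at_read:
            return "not-read"  # would trigger the Edit/Write-before-Read error
        if os.stat(path).st_mtime_ns != self._mtime_at_read[key]:
            return "stale-read"  # file changed on disk since it was read
        return "ok"
```

A caller would invoke `check_edit` before each edit and call `read` again on any result other than `"ok"`, so a single reflex covers both error families.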
**Caveat — fingerprint noise:** the fail-hint heuristic also catches non-failures
(successful hub JSON responses, source lines containing `raise …Error`, linter
"N errors" summaries). The *top* fingerprints above are real; a future refinement
should tighten `_is_failed` (e.g. skip valid-JSON success payloads and code-read
snapshots) before trusting the long tail.
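One possible shape for that refinement, as a sketch only. It does not reproduce the real `_is_failed`; the helper names, the rule that a JSON payload counts as a success unless it carries a truthy top-level `error` key, and the regex for recognizing source lines are all assumptions. Linter "N errors" summaries are not handled here.

```python
import json
import re

# Assumed pattern for a source line from a file read, with an optional
# leading line number: e.g. "12→    raise KeyError(name)".
_SOURCE_LINE = re.compile(r"^\s*(?:\d+\W*)?(?:raise|except)\b.*Error")


def is_json_success(body: str) -> bool:
    """True if body is a JSON object/array without a truthy 'error' key."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    if isinstance(payload, dict):
        return not payload.get("error")
    return isinstance(payload, list)


def is_code_snapshot(body: str) -> bool:
    """True if every line mentioning 'Error' looks like source code."""
    hint_lines = [line for line in body.splitlines() if "Error" in line]
    return bool(hint_lines) and all(_SOURCE_LINE.match(line) for line in hint_lines)


def is_probable_failure(body: str) -> bool:
    """Second-pass filter for bodies the fail-hint heuristic already flagged."""
    return not (is_json_success(body) or is_code_snapshot(body))
```

The filter is meant to run after the existing fail-hint match, so it only removes candidates and never adds new ones.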
## What this assessment still can't see
- **Why** a session was expensive at the *content* level (specific error
messages, repeated failed approaches) — the digest captures tool histograms and
prompt/response snippets but not error-body text. Mining tool-result bodies for
recurring failure messages is the natural next extension if root-cause depth is
needed.
- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
(error-body mining, above), modulo the fingerprint-noise caveat.
- Repeated *failed approaches* (as opposed to surfaced errors) — e.g. an agent
silently retrying a wrong strategy without an error — are still invisible.
- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
friction claims are Claude-weighted for now.
[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
[STATE-WP-0058]: handed off to the state-hub repo worker
[detect/quality.py]: ../session_memory/detect/quality.py