session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03)

Re-ingested under schema v2 (populates error_snippets) and re-ran detect over 27 real sessions. Added a 'content-level root causes' section to docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read (12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok) make fix-consistency failure, and State Hub MCP instability. Documented a fingerprint-noise caveat. WP-0006 finished; suite 98/98.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:09:29 +02:00
parent e022c0f9d6
commit 7cce276d32
2 changed files with 48 additions and 7 deletions
--- a/docs/ASSESSMENT-infra-friction.md
+++ b/docs/ASSESSMENT-infra-friction.md
@@ -86,15 +86,56 @@ issue.** Two high-ROI moves:
  share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
  This is precisely what the Measure phase is for — the loop closes here.

+## Content-level root causes (error-body mining)
+
+*Added 2026-06-07 from [AGENTIC-WP-0006] — `build_digest` now mines normalized
+error fingerprints into the durable digest, and `sig_recurring_error` clusters
+them. This is the "why" the tool-mix view above could not see.*
+
+**26 of 27 real sessions hit at least one error.** Top recurring error
+fingerprints across the corpus (by # sessions affected):
+
+| # sessions | occ | flavors | top sample |
+|-----------:|----:|---------|------------|
+| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
+| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
+| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
+| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
+| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
+| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |
+
+Reading:
+
+- **#1 — Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
+  common error is agents trying to edit a file they haven't read into context.
+  This is *workflow* friction and highly addressable: add a Read-before-Edit
+  reflex to the agent instructions or a skill, or a harness affordance. (Observed
+  live: the author hit this exact error twice while writing this workplan.)
+- **#2 — stale-read conflicts (6 sessions):** "File has been modified since read"
+  belongs to the same family as #1; a re-read-before-edit discipline fixes both.
+- **#3 — cross-flavor `make fix-consistency` failures (claude + grok, 3 repos):**
+  the consistency tooling itself fails across flavors — a shared infra issue worth
+  a look on the state-hub side (cf. [STATE-WP-0058]).
+- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
+  in 3 sessions each — corroborates the plumbing-overhead story and the live MCP
+  flakiness seen during this work (REST fallback used).
+
+**Caveat — fingerprint noise:** the fail-hint heuristic also catches non-failures
+(successful hub JSON responses, source lines containing `raise …Error`, linter
+"N errors" summaries). The *top* fingerprints above are real; a future refinement
+should tighten `_is_failed` (e.g. skip valid-JSON success payloads and code-read
+snapshots) before trusting the long tail.
+
 ## What this assessment still can't see

- **Why** a session was expensive at the *content* level (specific error
-  messages, repeated failed approaches) — the digest captures tool histograms and
-  prompt/response snippets but not error-body text. Mining tool-result bodies for
-  recurring failure messages is the natural next extension if root-cause depth is
-  needed.
+- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
+  (error-body mining, above), modulo the fingerprint-noise caveat.
+- Repeated *failed approaches* (as opposed to surfaced errors) — e.g. an agent
+  silently retrying a wrong strategy without an error — are still invisible.
 - Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
  friction claims are Claude-weighted for now.

 [AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
+[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
+[STATE-WP-0058]: handed off to the state-hub repo worker
 [detect/quality.py]: ../session_memory/detect/quality.py
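
Review note on the fingerprint-noise caveat: the tightening it describes could look roughly like the sketch below. This is not the repository's `_is_failed`; the function name `is_failed_tightened`, the `FAIL_HINTS` tuple, the two regexes, and the `error` / `isError` JSON keys are all hypothetical, and the sketch assumes tool-result bodies arrive as plain strings.

```python
import json
import re

# Hypothetical fail hints; the real heuristic in the repo may use others.
FAIL_HINTS = ("error", "failed", "traceback")

# Linter-style summaries such as "0 errors" or "Found 3 errors".
_SUMMARY_RE = re.compile(r"\b\d+\s+errors?\b", re.IGNORECASE)
# Source lines that only mention an exception, e.g. "raise ValueError(...)".
_RAISE_RE = re.compile(r"\braise\s+\w*Error\b")


def _is_success_json(body: str) -> bool:
    """Return True if body is a JSON object without a truthy error field."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(payload, dict):
        return False
    # Assumed keys: a payload counts as failed only if one of these is truthy.
    return not (payload.get("error") or payload.get("isError"))


def is_failed_tightened(body: str) -> bool:
    """Fail-hint check that skips the three noise sources named in the caveat."""
    if _is_success_json(body):
        return False  # valid-JSON success payload
    for line in body.splitlines():
        if _RAISE_RE.search(line) or _SUMMARY_RE.search(line):
            continue  # code-read snapshot or linter summary line
        if any(hint in line.lower() for hint in FAIL_HINTS):
            return True
    return False
```

Under these assumptions the table's top samples (for example the `make: *** [Makefile:227: fix-consistency] Error 1` line) still count as failures, while a hub response like `{"status": "ok", "errors": []}` does not. Skipping every "N errors" line follows the caveat literally; a stricter variant might skip only "0 errors".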