generated from coulomb/repo-seed
session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03)
Re-ingested under schema v2 (populates error_snippets) and re-ran detect over 27 real sessions. Added a 'content-level root causes' section to docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read (12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok) make fix-consistency failure, and State Hub MCP instability. Documented a fingerprint-noise caveat. WP-0006 finished; suite 98/98.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -86,15 +86,56 @@ issue.** Two high-ROI moves:
share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
This is precisely what the Measure phase is for: the loop closes here.

## Content-level root causes (error-body mining)

*Added 2026-06-07 from [AGENTIC-WP-0006]: `build_digest` now mines normalized
error fingerprints into the durable digest, and `sig_recurring_error` clusters
them. This is the "why" the tool-mix view above could not see.*

**26 of 27 real sessions hit at least one error.** Top recurring error
fingerprints across the corpus (by # sessions affected):

| # sessions | occ | flavors | top sample |
|-----------:|----:|---------|------------|
| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |

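The `# sessions` and `occ` columns above are two different counts over the same fingerprints: distinct sessions affected versus total occurrences. A minimal sketch of that aggregation, assuming each session exposes a list of raw error snippets. `normalize_fingerprint` and `recurring_errors` are hypothetical names with illustrative normalization rules; they are not the actual `build_digest` or `sig_recurring_error` code.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict


def normalize_fingerprint(snippet: str) -> str:
    """Collapse volatile details so similar errors share one key (illustrative rules only)."""
    stripped = snippet.strip()
    if not stripped:
        return ""
    text = stripped.splitlines()[0]                  # keep only the first line
    text = re.sub(r"0x[0-9a-fA-F]+", "<hex>", text)  # addresses
    text = re.sub(r"\d+", "<n>", text)               # line numbers, counts, codes
    return text[:120]


def recurring_errors(sessions: dict[str, list[str]], min_sessions: int = 2):
    """Return (sessions_affected, occurrences, fingerprint) rows, most widespread first."""
    occurrences: Counter[str] = Counter()
    affected: defaultdict[str, set[str]] = defaultdict(set)
    for session_id, snippets in sessions.items():
        for snippet in snippets:
            fingerprint = normalize_fingerprint(snippet)
            if not fingerprint:
                continue
            occurrences[fingerprint] += 1
            affected[fingerprint].add(session_id)
    rows = [
        (len(ids), occurrences[fp], fp)
        for fp, ids in affected.items()
        if len(ids) >= min_sessions  # "recurring" means seen in several sessions
    ]
    rows.sort(key=lambda row: (-row[0], -row[1], row[2]))
    return rows
```

Ranking by sessions affected rather than raw occurrences keeps one noisy session from dominating the table, which is why a 21-occurrence fingerprint can sit below a 9-occurrence one.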
Reading:

- **#1: Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
common error is agents trying to edit a file they haven't read into context.
This is *workflow* friction and highly addressable: a Read-before-Edit reflex in
the agent instructions or a skill, or a harness affordance. (Observed live: the
author hit this exact error twice while writing this workplan.)
- **#2: stale-read conflicts (6 sessions).** "File has been modified since read"
is the same family; a re-read-before-edit discipline fixes both.
- **#3: cross-flavor `make fix-consistency` failures (claude + grok, 3 repos).**
The consistency tooling itself fails across flavors, a shared infra issue worth
a look on the state-hub side (cf. [STATE-WP-0058]).
- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
in 3 sessions each, which corroborates the plumbing-overhead story and the live
MCP flakiness seen during this work (REST fallback used).

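The "harness affordance" suggested for #1 and #2 could be a small guard in front of the edit tool. The following is a sketch under assumptions, not an existing harness API: `ReadGuard`, its method names, and the `"not-read"` / `"stale-read"` reason codes are all hypothetical, and it assumes the harness can observe every read and edit call.

```python
import os


class ReadGuard:
    """Tracks which files were read, and their mtime at read time (hypothetical helper)."""

    def __init__(self):
        self._seen = {}  # real path -> st_mtime_ns captured at the last read or edit

    def record_read(self, path):
        real = os.path.realpath(path)
        self._seen[real] = os.stat(real).st_mtime_ns

    def check_edit(self, path):
        """Return None if the edit may proceed, otherwise a short reason code."""
        real = os.path.realpath(path)
        if not os.path.exists(real):
            return None            # creating a new file needs no prior read
        if real not in self._seen:
            return "not-read"      # family #1: Edit/Write-before-Read
        if os.stat(real).st_mtime_ns != self._seen[real]:
            return "stale-read"    # family #2: modified since read
        return None

    def record_edit(self, path):
        # After a successful edit the agent's view is current again.
        self.record_read(path)
```

A harness using something like this could answer a `"not-read"` or `"stale-read"` result by reading the file itself and retrying, instead of surfacing the error to the agent.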
**Caveat: fingerprint noise.** The fail-hint heuristic also catches non-failures
(successful hub JSON responses, source lines containing `raise …Error`, linter
"N errors" summaries). The *top* fingerprints above are real; a future refinement
should tighten `_is_failed` (e.g. skip valid-JSON success payloads and code-read
snapshots) before trusting the long tail.

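One possible shape for the tightening suggested in the caveat. This is a sketch only: the real `_is_failed` signature and hint list are not shown in this diff, so `looks_like_failure`, the `FAIL_HINT` pattern, the `error` / `is_error` JSON keys, and the line-number prefix handling are assumptions made for illustration. Linter summaries are left out.

```python
import json
import re

# Assumed hint pattern; the actual fail-hint list may differ.
FAIL_HINT = re.compile(r"error|failed|traceback|exception", re.IGNORECASE)
# A source line such as "    raise ValueError(...)", optionally with a line-number prefix.
CODE_LINE = re.compile(r"^\s*(?:\d+\W*)?(?:raise|except)\b")


def is_success_json(body):
    """True for a valid JSON object or array that carries no truthy error field."""
    try:
        parsed = json.loads(body)
    except ValueError:
        return False
    if isinstance(parsed, dict):
        return not (parsed.get("error") or parsed.get("is_error"))
    return isinstance(parsed, list)


def is_code_snapshot(body):
    """True when every line that trips the hint is itself a raise/except source line."""
    hint_lines = [line for line in body.splitlines() if FAIL_HINT.search(line)]
    return bool(hint_lines) and all(CODE_LINE.match(line) for line in hint_lines)


def looks_like_failure(body):
    if not FAIL_HINT.search(body):
        return False
    if is_success_json(body):
        return False   # successful hub JSON response
    if is_code_snapshot(body):
        return False   # file content that merely mentions exceptions
    return True
```

The filters run only after the hint matches, so a body that never tripped the heuristic is unaffected and the top fingerprints stay as they are.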
## What this assessment still can't see

-- **Why** a session was expensive at the *content* level (specific error
-messages, repeated failed approaches): the digest captures tool histograms and
-prompt/response snippets but not error-body text. Mining tool-result bodies for
-recurring failure messages is the natural next extension if root-cause depth is
-needed.
- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
(error-body mining, above), modulo the fingerprint-noise caveat.
- Repeated *failed approaches* (as opposed to surfaced errors), e.g. an agent
silently retrying a wrong strategy without an error, are still invisible.
- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
friction claims are Claude-weighted for now.

[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
[STATE-WP-0058]: handed off to the state-hub repo worker
[detect/quality.py]: ../session_memory/detect/quality.py
@@ -4,7 +4,7 @@ type: workplan
title: "Coding Session Memory — Error-Body Mining (content-level root causes)"
domain: helix_forge
repo: agentic-resources
-status: ready
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
@@ -64,7 +64,7 @@ synthetic digests sharing a fingerprint.

```task
id: AGENTIC-WP-0006-T03
-status: todo
status: done
priority: medium
state_hub_task_id: "bed16d23-3971-4257-b066-d1e639fef150"
```
