generated from coulomb/repo-seed
session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03)

Re-ingested under schema v2 (populates error_snippets) and re-ran detect over 27 real sessions. Added a 'content-level root causes' section to docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read (12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok) make fix-consistency failure, and State Hub MCP instability. Documented a fingerprint-noise caveat. WP-0006 finished; suite 98/98.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@@ -86,15 +86,56 @@ issue.** Two high-ROI moves:
 share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
 This is precisely what the Measure phase is for — the loop closes here.
 
+## Content-level root causes (error-body mining)
+
+*Added 2026-06-07 from [AGENTIC-WP-0006] — `build_digest` now mines normalized
+error fingerprints into the durable digest, and `sig_recurring_error` clusters
+them. This is the "why" the tool-mix view above could not see.*
+
+**26 of 27 real sessions hit at least one error.** Top recurring error
+fingerprints across the corpus (by # sessions affected):
+
+| # sessions | occ | flavors | top sample |
+|-----------:|----:|---------|------------|
+| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
+| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
+| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
+| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
+| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
+| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |
+
+Reading:
+
+- **#1 — Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
+common error is agents trying to edit a file they haven't read into context.
+This is a *workflow* friction, highly addressable: a Read-before-Edit reflex in
+the agent instructions / a skill, or a harness affordance. (Observed live: the
+author hit this exact error twice while writing this workplan.)
+- **#2 — stale-read conflicts (6 sessions):** "File has been modified since read"
+— same family, a re-read-before-edit discipline fixes both.
+- **#3 — cross-flavor `make fix-consistency` failures (claude + grok, 3 repos):**
+the consistency tooling itself fails across flavors — a shared infra issue worth
+a look on the state-hub side (cf. [STATE-WP-0058]).
+- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
+in 3 sessions each — corroborates the plumbing-overhead story and the live MCP
+flakiness seen during this work (REST fallback used).
+
+**Caveat — fingerprint noise:** the fail-hint heuristic also catches non-failures
+(successful hub JSON responses, source lines containing `raise …Error`, linter
+"N errors" summaries). The *top* fingerprints above are real; a future refinement
+should tighten `_is_failed` (e.g. skip valid-JSON success payloads and code-read
+snapshots) before trusting the long tail.
+
 ## What this assessment still can't see
 
-- **Why** a session was expensive at the *content* level (specific error
-messages, repeated failed approaches) — the digest captures tool histograms and
-prompt/response snippets but not error-body text. Mining tool-result bodies for
-recurring failure messages is the natural next extension if root-cause depth is
-needed.
+- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
+(error-body mining, above), modulo the fingerprint-noise caveat.
+- Repeated *failed approaches* (as opposed to surfaced errors) — e.g. an agent
+silently retrying a wrong strategy without an error — are still invisible.
 - Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
 friction claims are Claude-weighted for now.
 
 [AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
+[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
+[STATE-WP-0058]: handed off to the state-hub repo worker
 [detect/quality.py]: ../session_memory/detect/quality.py
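Reviewer note: the recurring-error table in the hunk above (sessions affected, occurrences, flavors, top sample) could in principle be produced by an aggregation like the following. This is only a minimal sketch under stated assumptions, not the project's actual `build_digest` or `sig_recurring_error` code. The session dictionary shape (`id`, `flavor`, `error_snippets`), the helper names `normalize_fingerprint` and `recurring_errors`, and the normalization rules are all hypothetical; only the `error_snippets` field name comes from the commit message.

```python
import re
from collections import defaultdict


def normalize_fingerprint(error_text: str) -> str:
    """Collapse volatile details so similar errors share one fingerprint.

    Hypothetical rules: keep the first line, mask hex values and numbers,
    squeeze whitespace, lowercase, and cap the length.
    """
    stripped = error_text.strip()
    if not stripped:
        return ""
    first_line = stripped.splitlines()[0]
    fp = re.sub(r"0x[0-9a-fA-F]+", "<hex>", first_line)  # mask hex before digits
    fp = re.sub(r"\d+", "<n>", fp)                        # mask line numbers, counts
    fp = re.sub(r"\s+", " ", fp)
    return fp.lower()[:120]


def recurring_errors(sessions, min_sessions=2):
    """Group error snippets by fingerprint and rank by sessions affected."""
    stats = defaultdict(
        lambda: {"sessions": set(), "occ": 0, "flavors": set(), "sample": None}
    )
    for session in sessions:
        for snippet in session["error_snippets"]:
            fp = normalize_fingerprint(snippet)
            if not fp:
                continue  # ignore empty snippets
            entry = stats[fp]
            entry["sessions"].add(session["id"])
            entry["occ"] += 1
            entry["flavors"].add(session["flavor"])
            if entry["sample"] is None:
                entry["sample"] = snippet  # first raw snippet seen is the sample
    rows = [
        {
            "fingerprint": fp,
            "n_sessions": len(entry["sessions"]),
            "occ": entry["occ"],
            "flavors": sorted(entry["flavors"]),
            "sample": entry["sample"],
        }
        for fp, entry in stats.items()
        if len(entry["sessions"]) >= min_sessions
    ]
    # Most widespread first, then most frequent.
    rows.sort(key=lambda row: (-row["n_sessions"], -row["occ"]))
    return rows
```

Ranking by distinct sessions rather than raw occurrences mirrors the table's ordering, where a 21-occurrence error confined to 3 sessions sits below a 9-occurrence error seen in 4.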
@@ -4,7 +4,7 @@ type: workplan
 title: "Coding Session Memory — Error-Body Mining (content-level root causes)"
 domain: helix_forge
 repo: agentic-resources
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-07"
@@ -64,7 +64,7 @@ synthetic digests sharing a fingerprint.
 
 ```task
 id: AGENTIC-WP-0006-T03
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "bed16d23-3971-4257-b066-d1e639fef150"
 ```
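Reviewer note: the fingerprint-noise caveat in the assessment diff suggests tightening `_is_failed` so that valid-JSON success payloads and code-read snapshots are skipped. One possible shape for such a filter is sketched below. Everything here is an assumption made for illustration: the function name `looks_like_real_failure`, the `FAIL_HINT` pattern standing in for the existing fail-hint heuristic, the convention that a JSON object signals failure through a truthy `error` key, and the zero-count linter rule. The real `_is_failed` signature and logic are not shown in this commit.

```python
import json
import re

# Stand-in for the existing fail-hint heuristic (assumed, not the project's pattern).
FAIL_HINT = re.compile(r"error|failed|traceback", re.IGNORECASE)
# A source line, optionally prefixed by a line number, that only mentions an exception.
SOURCE_LINE = re.compile(r"^\s*(?:\d+\s*[:|\t]\s*)?(?:raise|except|class|def)\b")
# Linter-style summary reporting no errors.
ZERO_ERRORS = re.compile(r"\b(?:0|no) errors?\b", re.IGNORECASE)


def looks_like_real_failure(body: str) -> bool:
    """Return True only when a tool-result body plausibly reports a failure."""
    text = body.strip()
    if not FAIL_HINT.search(text):
        return False  # nothing failure-like at all

    # 1. Valid JSON: trust the structure over the wording. Assumption: an
    #    object reports failure through a truthy "error" key.
    try:
        payload = json.loads(text)
    except ValueError:
        payload = None
    if isinstance(payload, (dict, list)):
        return isinstance(payload, dict) and bool(payload.get("error"))

    hinted = [line for line in text.splitlines() if FAIL_HINT.search(line)]

    # 2. Code-read snapshot: every hinted line is a source statement such as
    #    `raise ValueError(...)`, so the file mentions errors but none occurred.
    if all(SOURCE_LINE.match(line) for line in hinted):
        return False

    # 3. Linter summary that reports zero errors.
    if all(ZERO_ERRORS.search(line) for line in hinted):
        return False

    return True
```

The checks run from most to least structured, so a success payload that happens to contain the word "error" in a field name is rejected before any line-level matching. Summaries with a nonzero error count are still treated as failures here, which is a conservative choice and may differ from what the assessment intends.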