Compare commits: 70433cda61...896fde59f0 (3 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 896fde59f0 | |
| | 48618293b0 | |
| | 21c714e286 | |
100
docs/ASSESSMENT-infra-friction.md
Normal file
@@ -0,0 +1,100 @@
# Infrastructure Friction Assessment

*Generated 2026-06-07 from captured coding-session data (Helix Forge session
memory), after the Detect-hardening pass ([AGENTIC-WP-0005]). First data-driven
assessment of where our agentic coding sessions spend effort on plumbing rather
than work.*

## Method & data quality

- **Corpus:** 72 sessions captured across Claude + Grok. A session-quality filter
  ([detect/quality.py]) drops health-checks, smoke-tests, and interrupted runs
  (mostly `llm-connect` *"Say hello in one word"*). **27 are real coding sessions.**
- **Caveat:** the 41 % that were filtered out had been mislabeled `abandoned` by
  the outcome heuristic and produced a *false-positive* "cross-flavor abandoned"
  pattern in the first catalog — now purged. Treat any pre-hardening finding with
  suspicion.
- **Key framing:** all 27 real sessions ended in `success`. So the friction here
  is **cost/efficiency, not failure** — sessions get there, but pay an avoidable
  tax to do it.

## The headline number

Across the 27 real sessions, tool-call activity breaks down as:

| Bucket | Share |
|--------|------:|
| shell (Bash / run_terminal) | 38.2 % |
| edit | 30.2 % |
| read | 12.9 % |
| **State Hub MCP** | **10.3 %** |
| **task-management plumbing** | **5.8 %** |
| **schema-loading (`ToolSearch`)** | **1.5 %** |
| other | 1.1 % |

**~17.6 % of all tool calls in real coding sessions are coordination plumbing
(hub + task + schema-loading), not touching the repo.** Per-session infra-overhead
share: median **11.7 %**, p90 **26.1 %**, max **43.3 %** — it concentrates badly.
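
As a quick check on the headline figure, the sketch below redoes the arithmetic from the bucket table: the three plumbing buckets should sum to roughly 17.6 %, and all seven buckets to 100 %. The dictionary keys are illustrative labels chosen for this example; the shares are copied from the table.

```python
# Bucket shares (percent of all tool calls), copied from the table above.
# The keys are illustrative labels for this sketch.
shares = {
    "shell": 38.2,
    "edit": 30.2,
    "read": 12.9,
    "statehub_mcp": 10.3,
    "task_mgmt": 5.8,
    "schema_load": 1.5,
    "other": 1.1,
}

# The assessment counts hub + task + schema-loading as coordination plumbing.
PLUMBING = ("statehub_mcp", "task_mgmt", "schema_load")

plumbing_share = round(sum(shares[b] for b in PLUMBING), 1)
total_share = round(sum(shares.values()), 1)

print(f"coordination plumbing: {plumbing_share} %")
print(f"all buckets:           {total_share} %")
```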

## Ranked friction

### 1. State Hub call volume — *highest cost, addressable*
State Hub MCP is 10.3 % of all tool calls and dominates the worst sessions:

| Repo (one session) | total calls | State Hub calls | overhead share |
|--------------------|------:|------:|------:|
| vergabe-teilnahme | 570 | **231** | 43 % |
| activity-core | 488 | 98 | 23 % |
| flex-auth | 236 | 35 (+27 task) | 29 % |
| net-kingdom | 129 | 25 | 22 % |

Root cause: many **fine-grained** calls — per-task status updates, per-event
progress writes, repeated `get_domain_summary`. 231 hub calls in a single session
is coordination overhead, not work.
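
Note that the "overhead share" column is higher than State Hub calls divided by total calls. That is presumably because overhead, as defined in the headline section, also counts task-management and `ToolSearch` calls. The sketch below computes the hub-only share for the four rows so the two figures can be compared; the row tuples are copied from the table, and `hub_share` is a helper name invented for this example.

```python
# Rows copied from the table above:
# (repo, total calls, State Hub calls, reported overhead share in percent).
rows = [
    ("vergabe-teilnahme", 570, 231, 43),
    ("activity-core", 488, 98, 23),
    ("flex-auth", 236, 35, 29),
    ("net-kingdom", 129, 25, 22),
]


def hub_share(total_calls: int, hub_calls: int) -> float:
    """State Hub calls as a percentage of all tool calls, to one decimal."""
    return round(100 * hub_calls / total_calls, 1)


hub_only = {repo: hub_share(total, hub) for repo, total, hub, _ in rows}

for repo, total, hub, reported in rows:
    print(f"{repo:18} hub-only {hub_only[repo]:5.1f} %   reported overhead {reported} %")
```

For flex-auth, adding the 27 task calls listed in the table gives (35 + 27) / 236, or about 26.3 %, which closes most of the gap to the reported 29 %.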

### 2. Schema-loading thrash (`ToolSearch`) — *low cost, near-zero-effort fix*
**106 `ToolSearch` calls across 22 of 27 sessions (81 %).** The State Hub MCP
tools are *deferred*, so nearly every session re-discovers and re-loads the same
tool schemas before it can call them. This is pure overhead with no work value —
and it is **exactly the CLI/MCP-interface friction hypothesized.**

### 3. Task-management plumbing — 5.8 %
`TaskUpdate` / `TaskCreate` / `todo_write` / `update_task_status`. Overlaps with
(1); much of it is redundant status churn within a session.

### 4. Tool thrash — *session-shape, watch only*
11 sessions hammer a single tool 80–230× (usually Bash or Edit). Less an infra
problem than a sign of missing higher-level tooling; low priority.

### 5. Budget overrun — 3 sessions
Token cost well above peers. Secondary; revisit once (1)–(2) are addressed.

## Recommendations

**The CLI/MCP-interface hypothesis is validated as a top-2 friction, not a minor
issue.** Two high-ROI moves:

- **A. A State Hub skill (highest ROI).** A skill (or a pre-loaded tool manifest)
  that (i) **front-loads the common hub tool schemas** so agents stop
  `ToolSearch`-ing for them — eliminates finding #2 almost entirely (81 % of
  sessions) — and (ii) **teaches batched writes** (sync N task statuses in one
  call, fewer progress events) to attack finding #1. Low effort, broad reach.
- **B. Coarser hub operations.** Add bulk endpoints / a single "sync workplan
  statuses" op so a session doesn't make 200+ individual hub calls. This is the
  structural fix behind the skill's guidance.
- **C. Measure the effect (Phase 4).** After A/B land, compare infra-overhead
  share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
  This is precisely what the Measure phase is for — the loop closes here.
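
The batched-write idea in A(ii) and B can be illustrated with a small client-side coalescer: instead of sending each status change immediately, a session buffers them and flushes once. This is a minimal sketch under stated assumptions. `StatusBatcher`, its methods, and the payload shape are all hypothetical, and `send_bulk` stands in for a bulk State Hub operation that recommendation B proposes but that does not exist yet.

```python
from typing import Callable, Dict, List


class StatusBatcher:
    """Buffer per-task status updates and send them as one bulk call.

    Hypothetical sketch: `send_bulk` stands in for a bulk hub operation
    that is only proposed in this assessment.
    """

    def __init__(self, send_bulk: Callable[[List[dict]], None]) -> None:
        self._send_bulk = send_bulk
        self._pending: Dict[str, str] = {}

    def update(self, task_id: str, status: str) -> None:
        # A later update to the same task overwrites the earlier one, so
        # intermediate status churn never reaches the hub.
        self._pending[task_id] = status

    def flush(self) -> int:
        """Send all pending updates in one call; return how many were sent."""
        if not self._pending:
            return 0
        payload = [{"task_id": t, "status": s} for t, s in self._pending.items()]
        self._send_bulk(payload)
        self._pending.clear()
        return len(payload)


calls: List[List[dict]] = []          # records each simulated hub call
batcher = StatusBatcher(calls.append)

batcher.update("T01", "in_progress")
batcher.update("T02", "in_progress")
batcher.update("T01", "done")         # supersedes the earlier T01 update
sent = batcher.flush()

print(f"{sent} statuses sent in {len(calls)} hub call(s)")
```

The trade-off is visibility: intermediate states are never recorded, and updates buffered at the moment of a crash are lost unless the session flushes at checkpoints.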

## What this assessment still can't see

- **Why** a session was expensive at the *content* level (specific error
  messages, repeated failed approaches) — the digest captures tool histograms and
  prompt/response snippets but not error-body text. Mining tool-result bodies for
  recurring failure messages is the natural next extension if root-cause depth is
  needed.
- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
  friction claims are Claude-weighted for now.

[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
[detect/quality.py]: ../session_memory/detect/quality.py
@@ -1,79 +0,0 @@
{
  "created_at": "2026-06-07T08:02:03Z",
  "distribution_ready": true,
  "id": "sp-problem-abandoned-outcome",
  "name": "cross-flavor problem: abandoned",
  "polarity": "problem",
  "problem": "cross-flavor problem: abandoned",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 13.0,
      "cross_flavor": true,
      "flavors": [
        "claude",
        "grok"
      ],
      "frequency": 13,
      "key": "problem:abandoned:outcome",
      "locus": "outcome",
      "polarity": "problem",
      "repos": [
        "can-you-assist",
        "llm-connect"
      ],
      "score": 253.5,
      "sessions": [
        "claude:0510d5f4-956d-430a-9e89-6abc54f95b6a",
        "claude:106fd234-949e-470d-a208-fe5ed8f14562",
        "claude:377aba4f-8bbf-4760-90e9-469486ab0518",
        "claude:4c606c31-beff-4a41-a325-ef63c9f8fb0e",
        "claude:5bffe081-39fb-44cd-9966-4006f9235a0e",
        "claude:60d3c947-eacf-49e9-b12c-ff8eb6b1c20b",
        "claude:8f50f5b4-fbc4-4abe-9a7c-b25b2a713671",
        "claude:95b1fe00-5d2e-482f-9618-fddf9cdbeb51",
        "claude:c3e782ad-96b9-4cf1-9eb5-defdf3578426",
        "claude:d75b2084-faec-40cf-aaf8-d7e0c026bde6",
        "claude:f282058a-0a43-4fb8-87fc-1e67eaa3533c",
        "grok:019e6103-af11-7a92-8e0b-5f40465d8223",
        "grok:019e611e-0728-77d3-bb7a-8c5983e5058a"
      ],
      "signal_type": "abandoned",
      "title": "cross-flavor problem: abandoned"
    },
    "promoted_at": "2026-06-07T08:02:03Z",
    "source_key": "problem:abandoned:outcome"
  },
  "rendering_hints": {
    "claude": {
      "note": "TODO: refine rendering",
      "target": "CLAUDE.md"
    },
    "grok": {
      "note": "TODO: refine rendering",
      "target": "instructions"
    }
  },
  "resolutions": [
    {
      "detail": "",
      "steps": [],
      "summary": "TODO: capture the recommended resolution"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude",
      "grok"
    ],
    "repos": [
      "can-you-assist",
      "llm-connect"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T08:02:03Z",
  "version": "1.0.0"
}
@@ -1,5 +1,5 @@
{
  "created_at": "2026-06-07T08:02:03Z",
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-budget_overrun-tokens",
  "name": "problem: budget overrun",
@@ -8,39 +8,30 @@
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 27.135,
      "cost_impact": 10.667,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 8,
      "frequency": 3,
      "key": "problem:budget_overrun:tokens",
      "locus": "tokens",
      "polarity": "problem",
      "repos": [
        "activity-core",
        "artifact-store",
        "citation-evidence",
        "flex-auth",
        "infospace-bench",
        "railiance-apps",
        "vergabe-teilnahme"
        "infospace-bench"
      ],
      "score": 217.08,
      "score": 32.001,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:a7b4a9b3-0942-4899-b502-e76b0013fc42",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"
      ],
      "signal_type": "budget_overrun",
      "title": "problem: budget overrun"
    },
    "promoted_at": "2026-06-07T08:02:03Z",
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:budget_overrun:tokens"
  },
  "rendering_hints": {
@@ -63,16 +54,12 @@
      "claude"
    ],
    "repos": [
      "activity-core",
      "artifact-store",
      "citation-evidence",
      "flex-auth",
      "infospace-bench",
      "railiance-apps",
      "vergabe-teilnahme"
      "infospace-bench"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T08:02:03Z",
  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
}
@@ -0,0 +1,62 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": false,
  "id": "sp-problem-infra_overhead-infra_overhead",
  "name": "problem: infra overhead",
  "polarity": "problem",
  "problem": "problem: infra overhead",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 0.801,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 2,
      "key": "problem:infra_overhead:infra_overhead",
      "locus": "infra_overhead",
      "polarity": "problem",
      "repos": [
        "markitect-main",
        "vergabe-teilnahme"
      ],
      "score": 1.602,
      "sessions": [
        "claude:135002f9-98d2-4d1b-b8fb-543b20388782",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
      ],
      "signal_type": "infra_overhead",
      "title": "problem: infra overhead"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:infra_overhead:infra_overhead"
  },
  "rendering_hints": {
    "claude": {
      "note": "TODO: refine rendering",
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "",
      "steps": [],
      "summary": "TODO: capture the recommended resolution"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "markitect-main",
      "vergabe-teilnahme"
    ]
  },
  "status": "provisional",
  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
}
@@ -0,0 +1,76 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-schema_thrash-schema_load",
  "name": "problem: schema thrash",
  "polarity": "problem",
  "problem": "problem: schema thrash",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 79.0,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 8,
      "key": "problem:schema_thrash:schema_load",
      "locus": "schema_load",
      "polarity": "problem",
      "repos": [
        "activity-core",
        "citation-evidence",
        "flex-auth",
        "infospace-bench",
        "ops-bridge",
        "vergabe-teilnahme"
      ],
      "score": 632.0,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:63fd4df2-5add-4748-af21-c1544825e006",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
      ],
      "signal_type": "schema_thrash",
      "title": "problem: schema thrash"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:schema_thrash:schema_load"
  },
  "rendering_hints": {
    "claude": {
      "note": "TODO: refine rendering",
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "",
      "steps": [],
      "summary": "TODO: capture the recommended resolution"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "activity-core",
      "citation-evidence",
      "flex-auth",
      "infospace-bench",
      "ops-bridge",
      "vergabe-teilnahme"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
}
83
session_memory/catalog/sp-problem-tool_thrash-tool-bash.json
Normal file
@@ -0,0 +1,83 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-tool_thrash-tool-bash",
  "name": "problem: tool thrash",
  "polarity": "problem",
  "problem": "problem: tool thrash",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 1990.0,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 11,
      "key": "problem:tool_thrash:tool:Bash",
      "locus": "tool:Bash",
      "polarity": "problem",
      "repos": [
        "activity-core",
        "artifact-store",
        "citation-evidence",
        "ihp-railiance-probe",
        "infospace-bench",
        "railiance-apps",
        "state-hub",
        "vergabe-teilnahme"
      ],
      "score": 21890.0,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
        "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
      ],
      "signal_type": "tool_thrash",
      "title": "problem: tool thrash"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:tool_thrash:tool:Bash"
  },
  "rendering_hints": {
    "claude": {
      "note": "TODO: refine rendering",
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "",
      "steps": [],
      "summary": "TODO: capture the recommended resolution"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "activity-core",
      "artifact-store",
      "citation-evidence",
      "ihp-railiance-probe",
      "infospace-bench",
      "railiance-apps",
      "state-hub",
      "vergabe-teilnahme"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
}
@@ -1,5 +1,5 @@
{
  "created_at": "2026-06-07T08:02:03Z",
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-success-clean_pass-outcome",
  "name": "cross-flavor success: clean pass",
@@ -8,13 +8,13 @@
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 20.0,
      "cost_impact": 17.0,
      "cross_flavor": true,
      "flavors": [
        "claude",
        "grok"
      ],
      "frequency": 20,
      "frequency": 17,
      "key": "success:clean_pass:outcome",
      "locus": "outcome",
      "polarity": "success",
@@ -32,13 +32,12 @@
        "the-custodian",
        "vergabe-teilnahme"
      ],
      "score": 600.0,
      "score": 433.5,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8",
        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
        "claude:39dd33b1-d156-4d6a-8c33-c359b6f841d8",
        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:631de76e-fdee-43b5-b091-7b7675467ad1",
@@ -46,8 +45,6 @@
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:99e9c5af-043f-4b97-8d92-14189da8716b",
        "claude:a7b4a9b3-0942-4899-b502-e76b0013fc42",
        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
        "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6",
@@ -58,7 +55,7 @@
      "signal_type": "clean_pass",
      "title": "cross-flavor success: clean pass"
    },
    "promoted_at": "2026-06-07T08:02:03Z",
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "success:clean_pass:outcome"
  },
  "rendering_hints": {
@@ -101,6 +98,6 @@
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T08:02:03Z",
  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
}
@@ -91,9 +91,75 @@ def sig_error_then_recovery(digest, ctx) -> list[Signal]:
     return []


+# --- tool-mix / infrastructure-overhead signals (WP-0005 T02) ----------------
+# These read the captured ``tool_histogram`` — friction that the outcome+marker
+# signals above are blind to (sessions still "succeed", just expensively).
+
+def tool_bucket(tool: str) -> str:
+    """Group a tool name into a coarse activity bucket (flavor-agnostic)."""
+    if tool.startswith("mcp__state-hub"):
+        return "statehub_mcp"
+    if tool in ("TaskUpdate", "TaskCreate", "TaskGet", "TaskList", "TaskOutput",
+                "TaskStop", "todo_write", "update_task_status"):
+        return "task_mgmt"
+    if tool == "ToolSearch":
+        return "schema_load"
+    if tool in ("Bash", "run_terminal_command"):
+        return "shell"
+    if tool in ("Edit", "Write", "search_replace", "write", "NotebookEdit"):
+        return "edit"
+    if tool in ("Read", "read_file", "grep", "Grep", "glob", "Glob"):
+        return "read"
+    return "other"
+
+
+def _bucketed(digest) -> tuple[dict, int]:
+    buckets: dict[str, int] = {}
+    for tool, n in (digest.get("tool_histogram") or {}).items():
+        buckets[tool_bucket(tool)] = buckets.get(tool_bucket(tool), 0) + n
+    return buckets, sum(buckets.values())
+
+
+def sig_infra_overhead(digest, ctx) -> list[Signal]:
+    """Problem: a large share of tool calls is hub/task/schema plumbing, not work."""
+    buckets, total = _bucketed(digest)
+    if total < ctx.get("infra_min_calls", 20):
+        return []
+    overhead = buckets.get("statehub_mcp", 0) + buckets.get("task_mgmt", 0) + buckets.get("schema_load", 0)
+    share = overhead / total
+    if share >= ctx.get("infra_overhead_threshold", 0.30):
+        return [_base(digest, "infra_overhead", PROBLEM, "infra_overhead", round(share, 3),
+                      overhead_calls=overhead, total_calls=total,
+                      statehub=buckets.get("statehub_mcp", 0),
+                      task_mgmt=buckets.get("task_mgmt", 0),
+                      schema_load=buckets.get("schema_load", 0))]
+    return []
+
+
+def sig_schema_thrash(digest, ctx) -> list[Signal]:
+    """Problem: repeated ToolSearch — deferred-tool schemas reloaded over and over."""
+    buckets, _ = _bucketed(digest)
+    n = buckets.get("schema_load", 0)
+    if n >= ctx.get("schema_thrash_threshold", 5):
+        return [_base(digest, "schema_thrash", PROBLEM, "schema_load", float(n), tool_searches=n)]
+    return []
+
+
+def sig_tool_thrash(digest, ctx) -> list[Signal]:
+    """Problem: a single tool is hammered far more than any other — likely churn."""
+    hist = digest.get("tool_histogram") or {}
+    if not hist:
+        return []
+    tool, n = max(hist.items(), key=lambda kv: kv[1])
+    if n >= ctx.get("tool_thrash_threshold", 80):
+        return [_base(digest, "tool_thrash", PROBLEM, f"tool:{tool}", float(n), tool=tool, calls=n)]
+    return []
+
+
 EXTRACTORS: list[Callable] = [
     sig_retry_storm, sig_repeated_errors, sig_budget_overrun, sig_abandoned,
     sig_clean_pass, sig_error_then_recovery,
+    sig_infra_overhead, sig_schema_thrash, sig_tool_thrash,
 ]


@@ -104,7 +170,12 @@ def build_context(digests: list[dict]) -> dict[str, Any]:
         for d in digests
     )
     p90 = totals[int(0.9 * (len(totals) - 1))] if totals else 0
-    return {"tokens_p90": p90, "retry_storm_threshold": 3, "error_threshold": 3}
+    return {
+        "tokens_p90": p90, "retry_storm_threshold": 3, "error_threshold": 3,
+        # tool-mix / infra-overhead thresholds (WP-0005 T02)
+        "infra_min_calls": 20, "infra_overhead_threshold": 0.30,
+        "schema_thrash_threshold": 5, "tool_thrash_threshold": 80,
+    }


 def extract_signals(digests: list[dict], ctx: Optional[dict] = None) -> list[Signal]:
80
tests/test_detect_infra_signals.py
Normal file
@@ -0,0 +1,80 @@
"""Infra-overhead + thrash signal tests (WP-0005 T02)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.signals import (  # noqa: E402
    build_context,
    extract_signals,
    sig_infra_overhead,
    sig_schema_thrash,
    sig_tool_thrash,
    tool_bucket,
)


def _digest(uid="claude:a", repo="r1", tools=None):
    return {"session_uid": uid, "flavor": "claude", "repo": repo, "outcome": "success",
            "cost": {"input_tokens": 1, "output_tokens": 1},
            "markers": {"errors": 0, "retries": 0, "test_runs": 0},
            "tool_histogram": tools or {}}


CTX = {"infra_min_calls": 20, "infra_overhead_threshold": 0.30,
       "schema_thrash_threshold": 5, "tool_thrash_threshold": 80}


def test_tool_bucket_mapping():
    assert tool_bucket("mcp__state-hub__update_task_status") == "statehub_mcp"
    assert tool_bucket("ToolSearch") == "schema_load"
    assert tool_bucket("TaskUpdate") == "task_mgmt"
    assert tool_bucket("Bash") == "shell"
    assert tool_bucket("Edit") == "edit"


def test_infra_overhead_fires_above_share():
    # 18 statehub of 30 total = 60% overhead
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4})
    sig = sig_infra_overhead(d, CTX)
    assert sig and sig[0].type == "infra_overhead"
    assert sig[0].magnitude >= 0.30
    assert sig[0].detail["statehub"] == 18


def test_infra_overhead_quiet_when_mostly_work():
    d = _digest(tools={"mcp__state-hub__create_task": 3, "Bash": 40, "Edit": 30})
    assert sig_infra_overhead(d, CTX) == []


def test_infra_overhead_ignores_tiny_sessions():
    d = _digest(tools={"mcp__state-hub__create_task": 5})  # below infra_min_calls
    assert sig_infra_overhead(d, CTX) == []


def test_schema_thrash_fires():
    d = _digest(tools={"ToolSearch": 9, "Bash": 5})
    sig = sig_schema_thrash(d, CTX)
    assert sig and sig[0].type == "schema_thrash"
    assert sig[0].detail["tool_searches"] == 9


def test_tool_thrash_fires_on_dominant_tool():
    d = _digest(tools={"Bash": 120, "Edit": 5})
    sig = sig_tool_thrash(d, CTX)
    assert sig and sig[0].locus == "tool:Bash"


def test_extract_signals_includes_infra():
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4,
                       "ToolSearch": 6})
    types = {s.type for s in extract_signals([d])}
    assert "infra_overhead" in types
    assert "schema_thrash" in types


def test_build_context_has_infra_defaults():
    ctx = build_context([])
    assert ctx["infra_overhead_threshold"] == 0.30
    assert ctx["schema_thrash_threshold"] == 5
@@ -4,7 +4,7 @@ type: workplan
 title: "Coding Session Memory — Detect Hardening (quality filter + infra signals)"
 domain: helix_forge
 repo: agentic-resources
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-07"
@@ -52,7 +52,7 @@ sessions — fixing the `abandoned` false-positive. Knobs under `[detect]` in

 ```task
 id: AGENTIC-WP-0005-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "10d57b05-a731-4ece-bf45-f6a98ac77555"
 ```
@@ -69,7 +69,7 @@ thresholds configurable. Unit-tested.

 ```task
 id: AGENTIC-WP-0005-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8b9d029a-60d0-4caf-af62-4fcc9c9a645c"
 ```
80
workplans/AGENTIC-WP-0006-error-body-mining.md
Normal file
@@ -0,0 +1,80 @@
---
id: AGENTIC-WP-0006
type: workplan
title: "Coding Session Memory — Error-Body Mining (content-level root causes)"
domain: helix_forge
repo: agentic-resources
status: ready
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "c6e44147-15fd-4cfa-ab2d-87461a6858f1"
---

# Coding Session Memory — Error-Body Mining

The friction assessment ([ASSESSMENT-infra-friction.md](../docs/ASSESSMENT-infra-friction.md))
can see *that* a session was expensive (tool-mix, cost, overhead share) but not
always *why* at the content level — the specific error messages and repeated
failed approaches. The digest captures tool histograms and prompt/response
snippets, but **not error-body text**. This workplan closes that gap so Detect can
surface recurring root-cause errors, not just coarse markers.

Approach: capture **normalized error fingerprints + samples into the durable Tier 2
digest** (raw Tier 1 blobs are evictable, so mining must persist into the digest),
then cluster recurring fingerprints across sessions into candidate problem
patterns through the existing clusterer. No new capture source — this reads the
event/blob bodies already ingested.

## Capture Error-Body Snippets into the Digest

```task
id: AGENTIC-WP-0006-T01
status: todo
priority: high
state_hub_task_id: "136a0a73-61c2-4390-876c-de3880a967e6"
```

Extend `core/digest.py` `build_digest` to extract, from failed events
(`kind=error` and `tool_result` bodies matching the existing `_FAIL_HINTS`), a
**normalized fingerprint** (strip paths, line numbers, UUIDs, hex) plus a short
sample, stored as `digest["error_snippets"] = [{fingerprint, sample, count, tool}]`.
Same error across a session collapses to one fingerprint with a count. Durable in
Tier 2 (survives Tier 1 eviction). Bump `SCHEMA_VERSION`. Unit-tested on synthetic
sessions with repeated and varied errors.
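
One way the normalization step might look is sketched below. This is not the planned `build_digest` change: `fingerprint`, `error_snippets`, the regex rules, and the choice to key on `(tool, fingerprint)` are assumptions made for illustration. Only the output shape `{fingerprint, sample, count, tool}` comes from the task description. The rules are heuristics and handle absolute POSIX paths only.

```python
import re
from collections import Counter

# Heuristic rewrite rules, applied in order. Paths go first so that IDs
# embedded in a path disappear with it; UUIDs go before the generic hex rule.
_RULES = [
    (re.compile(r"(?<![\w/])(?:/[\w.\-]+)+"), "<path>"),
    (re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"), "<uuid>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b|\b(?=[0-9a-fA-F]*\d)[0-9a-fA-F]{8,}\b"), "<hex>"),
    (re.compile(r"\bline \d+\b", re.IGNORECASE), "line <n>"),
    (re.compile(r":\d+\b"), ":<n>"),  # ":42" style line/column suffixes
]


def fingerprint(message: str) -> str:
    """Replace volatile details with placeholders and collapse whitespace."""
    text = message
    for pattern, token in _RULES:
        text = pattern.sub(token, text)
    return " ".join(text.split())


def error_snippets(errors, sample_len=120):
    """Collapse (tool, message) pairs into one entry per (tool, fingerprint)."""
    counts = Counter()
    first_sample = {}
    for tool, message in errors:
        key = (tool, fingerprint(message))
        counts[key] += 1
        first_sample.setdefault(key, message[:sample_len])
    return [
        {"fingerprint": fp, "sample": first_sample[(tool, fp)], "count": n, "tool": tool}
        for (tool, fp), n in counts.items()
    ]


msg1 = ('File "/home/a/proj/app.py", line 42, in run: KeyError: '
        'session 123e4567-e89b-12d3-a456-426614174000')
msg2 = ('File "/srv/other/app.py", line 7, in run: KeyError: '
        'session 9b2f8c1e-0d3a-4f6b-8a7c-5e4d3c2b1a00')
snippets = error_snippets([
    ("Bash", msg1),
    ("Bash", msg2),
    ("Bash", "ModuleNotFoundError: No module named 'yaml'"),
])

for s in snippets:
    print(s["count"], s["fingerprint"])
```

In this example the two `KeyError` messages differ in path, line number, and UUID but collapse to a single fingerprint with a count of 2, while the import error stays separate.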

## Recurring-Error Signal + Clustering

```task
id: AGENTIC-WP-0006-T02
status: todo
priority: high
state_hub_task_id: "1a41b6f5-48bc-4080-bd18-94f2186ef566"
```

Add `detect/signals.py` `sig_recurring_error` keyed on the error fingerprint, so
the same error recurring across sessions/repos/flavors clusters into a candidate
problem pattern (locus = fingerprint; magnitude = occurrences). Feeds the existing
clusterer + cross-flavor flagging, so a root-cause error common to multiple flavors
is flagged cross-flavor. Respects the WP-0005 quality filter. Unit-tested on
synthetic digests sharing a fingerprint.
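
The grouping this task describes can be sketched in a few lines. The sketch is standalone and collapses signal extraction and clustering into one function for brevity, so it does not use the real `Signal` type or the existing clusterer. `recurring_errors` and `min_sessions` are hypothetical names; the digest fields `session_uid`, `flavor`, `repo`, and `error_snippets` follow the shapes used elsewhere in this change.

```python
from collections import defaultdict


def recurring_errors(digests, min_sessions=2):
    """Group error fingerprints across digests (hypothetical helper).

    Returns one candidate pattern per fingerprint seen in at least
    `min_sessions` sessions, sorted by total occurrences.
    """
    groups = defaultdict(
        lambda: {"occurrences": 0, "sessions": set(), "flavors": set(), "repos": set()}
    )
    for d in digests:
        for snip in d.get("error_snippets") or []:
            g = groups[snip["fingerprint"]]
            g["occurrences"] += snip["count"]
            g["sessions"].add(d["session_uid"])
            g["flavors"].add(d["flavor"])
            g["repos"].add(d["repo"])

    patterns = [
        {
            "locus": fp,                      # locus = fingerprint
            "magnitude": g["occurrences"],    # magnitude = occurrences
            "sessions": sorted(g["sessions"]),
            "repos": sorted(g["repos"]),
            "cross_flavor": len(g["flavors"]) > 1,
        }
        for fp, g in groups.items()
        if len(g["sessions"]) >= min_sessions
    ]
    return sorted(patterns, key=lambda p: p["magnitude"], reverse=True)


digests = [
    {"session_uid": "claude:a", "flavor": "claude", "repo": "r1",
     "error_snippets": [{"fingerprint": "KeyError: session <uuid>",
                         "sample": "KeyError: session 123e...", "count": 3, "tool": "Bash"}]},
    {"session_uid": "grok:b", "flavor": "grok", "repo": "r2",
     "error_snippets": [{"fingerprint": "KeyError: session <uuid>",
                         "sample": "KeyError: session 9b2f...", "count": 1,
                         "tool": "run_terminal_command"},
                        {"fingerprint": "timeout", "sample": "timeout",
                         "count": 2, "tool": "run_terminal_command"}]},
]
patterns = recurring_errors(digests)

for p in patterns:
    print(p["magnitude"], p["locus"], "cross-flavor" if p["cross_flavor"] else "")
```

Here the shared `KeyError` fingerprint becomes one cross-flavor pattern with magnitude 4, while `timeout` is dropped because it appears in only one session.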

## Re-run Live, Extend Friction Assessment with Root Causes

```task
id: AGENTIC-WP-0006-T03
status: todo
priority: medium
state_hub_task_id: "bed16d23-3971-4257-b066-d1e639fef150"
```

Re-ingest (to populate `error_snippets` — schema bump invalidates old digests) and
re-run detect over the real local sessions. Add a **"content-level root causes"**
section to [ASSESSMENT-infra-friction.md](../docs/ASSESSMENT-infra-friction.md):
top recurring error fingerprints with counts and affected repos/flavors. Full suite
green. After workplan updates, notify the operator to run from `~/state-hub`:

```bash
make fix-consistency REPO=agentic-resources
```