From 48618293b092774ec2c51338dc97457622a5f7ed Mon Sep 17 00:00:00 2001
From: tegwick
Date: Sun, 7 Jun 2026 11:18:27 +0200
Subject: [PATCH] session-memory: friction assessment + hardened catalog (WP-0005 T03)

Re-ran ingest->detect with the quality filter + infra signals over real
local sessions (72 captured -> 27 real). Purged the false-positive
'abandoned' catalog entry and re-curated; catalog now carries
tool_thrash/schema_thrash/infra_overhead patterns.

docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real tool
activity is hub/task/schema plumbing (State Hub 10.3%, one session 231
calls; ToolSearch in 81% of sessions). Validates the CLI/MCP-skill
hypothesis as top-2; recommends a State Hub skill (front-load schemas +
batched writes) + bulk hub ops.

Workplan finished; suite 88/88.

Co-Authored-By: Claude Opus 4.8
---
 docs/ASSESSMENT-infra-friction.md             | 100 ++++++++++++++++++
 .../catalog/sp-problem-abandoned-outcome.json |  79 --------------
 .../sp-problem-budget_overrun-tokens.json     |  31 ++----
 ...problem-infra_overhead-infra_overhead.json |  62 +++++++++++
 .../sp-problem-schema_thrash-schema_load.json |  76 +++++++++++++
 .../sp-problem-tool_thrash-tool-bash.json     |  83 +++++++++++++++
 .../sp-success-clean_pass-outcome.json        |  15 ++-
 workplans/AGENTIC-WP-0005-detect-hardening.md |   4 +-
 8 files changed, 338 insertions(+), 112 deletions(-)
 create mode 100644 docs/ASSESSMENT-infra-friction.md
 delete mode 100644 session_memory/catalog/sp-problem-abandoned-outcome.json
 create mode 100644 session_memory/catalog/sp-problem-infra_overhead-infra_overhead.json
 create mode 100644 session_memory/catalog/sp-problem-schema_thrash-schema_load.json
 create mode 100644 session_memory/catalog/sp-problem-tool_thrash-tool-bash.json

diff --git a/docs/ASSESSMENT-infra-friction.md b/docs/ASSESSMENT-infra-friction.md
new file mode 100644
index 0000000..10de702
--- /dev/null
+++ b/docs/ASSESSMENT-infra-friction.md
@@ -0,0 +1,100 @@
+# Infrastructure Friction Assessment
+
+*Generated 2026-06-07 from captured coding-session data (Helix Forge session
+memory), after the Detect-hardening pass ([AGENTIC-WP-0005]). First data-driven
+assessment of where our agentic coding sessions spend effort on plumbing rather
+than work.*
+
+## Method & data quality
+
+- **Corpus:** 72 sessions captured across Claude + Grok. A session-quality filter
+  ([detect/quality.py]) drops health-checks, smoke-tests, and interrupted runs
+  (mostly `llm-connect` *"Say hello in one word"*). **27 are real coding sessions.**
+- **Caveat:** the 41 % that were filtered out had been mislabeled `abandoned` by
+  the outcome heuristic and produced a *false-positive* "cross-flavor abandoned"
+  pattern in the first catalog — now purged. Treat any pre-hardening finding with
+  suspicion.
+- **Key framing:** all 27 real sessions ended in `success`. So the friction here
+  is **cost/efficiency, not failure** — sessions get there, but pay an avoidable
+  tax to do it.
+
+## The headline number
+
+Across the 27 real sessions, tool-call activity breaks down as:
+
+| Bucket | Share |
+|--------|------:|
+| shell (Bash / run_terminal) | 38.2 % |
+| edit | 30.2 % |
+| read | 12.9 % |
+| **State Hub MCP** | **10.3 %** |
+| **task-management plumbing** | **5.8 %** |
+| **schema-loading (`ToolSearch`)** | **1.5 %** |
+| other | 1.1 % |
+
+**~17.6 % of all tool calls in real coding sessions are coordination plumbing
+(hub + task + schema-loading), not touching the repo.** Per-session infra-overhead
+share: median **11.7 %**, p90 **26.1 %**, max **43.3 %**; the overhead is concentrated in a long tail of heavy sessions.
+
+## Ranked friction
+
+### 1. State Hub call volume — *highest cost, addressable*
+State Hub MCP is 10.3 % of all tool calls and dominates the worst sessions:
+
+| Repo (one session) | total calls | State Hub calls | infra-overhead share |
+|--------------------|------:|------:|------:|
+| vergabe-teilnahme | 570 | **231** | 43 % |
+| activity-core | 488 | 98 | 23 % |
+| flex-auth | 236 | 35 (+27 task) | 29 % |
+| net-kingdom | 129 | 25 | 22 % |
+
+Root cause: many **fine-grained** calls — per-task status updates, per-event
+progress writes, repeated `get_domain_summary`. 231 hub calls in a single session
+is coordination overhead, not work.
+
+### 2. Schema-loading thrash (`ToolSearch`) — *low cost, near-zero-effort fix*
+**106 `ToolSearch` calls across 22 of 27 sessions (81 %).** The State Hub MCP
+tools are *deferred*, so nearly every session re-discovers and re-loads the same
+tool schemas before it can call them. This is pure overhead with no work value —
+and it is **exactly the CLI/MCP-interface friction we hypothesized.**
+
+### 3. Task-management plumbing — 5.8 %
+`TaskUpdate` / `TaskCreate` / `todo_write` / `update_task_status`. Overlaps with
+(1); much of it is redundant status churn within a session.
+
+### 4. Tool thrash — *session-shape, watch only*
+11 sessions hammer a single tool 80–230× (usually Bash or Edit). Less an infra
+problem than a sign of missing higher-level tooling; low priority.
+
+### 5. Budget overrun — 3 sessions
+Token cost well above peers. Secondary; revisit once (1)–(2) are addressed.
+
+## Recommendations
+
+**The CLI/MCP-interface hypothesis is validated as a top-2 friction, not a minor
+issue.** Two high-ROI moves:
+
+- **A. A State Hub skill (highest ROI).** A skill (or a pre-loaded tool manifest)
+  that (i) **front-loads the common hub tool schemas** so agents stop
+  `ToolSearch`-ing for them — eliminates finding #2 almost entirely (81 % of
+  sessions) — and (ii) **teaches batched writes** (sync N task statuses in one
+  call, fewer progress events) to attack finding #1. Low effort, broad reach.
+- **B. Coarser hub operations.** Add bulk endpoints / a single "sync workplan
+  statuses" op so a session doesn't make 200+ individual hub calls. This is the
+  structural fix behind the skill's guidance.
+- **C. Measure the effect (Phase 4).** After A/B land, compare infra-overhead
+  share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
+  This is precisely what the Measure phase is for — the loop closes here.
+
+## What this assessment still can't see
+
+- **Why** a session was expensive at the *content* level (specific error
+  messages, repeated failed approaches) — the digest captures tool histograms and
+  prompt/response snippets but not error-body text. Mining tool-result bodies for
+  recurring failure messages is the natural next extension if root-cause depth is
+  needed.
+- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
+  friction claims are Claude-weighted for now.
+
+[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
+[detect/quality.py]: ../session_memory/detect/quality.py
diff --git a/session_memory/catalog/sp-problem-abandoned-outcome.json b/session_memory/catalog/sp-problem-abandoned-outcome.json
deleted file mode 100644
index 9a6a651..0000000
--- a/session_memory/catalog/sp-problem-abandoned-outcome.json
+++ /dev/null
@@ -1,79 +0,0 @@
-{
-  "created_at": "2026-06-07T08:02:03Z",
-  "distribution_ready": true,
-  "id": "sp-problem-abandoned-outcome",
-  "name": "cross-flavor problem: abandoned",
-  "polarity": "problem",
-  "problem": "cross-flavor problem: abandoned",
-  "provenance": {
-    "detected_at": null,
-    "evidence": {
-      "cost_impact": 13.0,
-      "cross_flavor": true,
-      "flavors": [
-        "claude",
-        "grok"
-      ],
-      "frequency": 13,
-      "key": "problem:abandoned:outcome",
-      "locus": "outcome",
-      "polarity": "problem",
-      "repos": [
-        "can-you-assist",
-        "llm-connect"
-      ],
-      "score": 253.5,
-      "sessions": [
-        "claude:0510d5f4-956d-430a-9e89-6abc54f95b6a",
-        "claude:106fd234-949e-470d-a208-fe5ed8f14562",
-        "claude:377aba4f-8bbf-4760-90e9-469486ab0518",
-        "claude:4c606c31-beff-4a41-a325-ef63c9f8fb0e",
-        "claude:5bffe081-39fb-44cd-9966-4006f9235a0e",
-        "claude:60d3c947-eacf-49e9-b12c-ff8eb6b1c20b",
-        "claude:8f50f5b4-fbc4-4abe-9a7c-b25b2a713671",
-        "claude:95b1fe00-5d2e-482f-9618-fddf9cdbeb51",
-        "claude:c3e782ad-96b9-4cf1-9eb5-defdf3578426",
-        "claude:d75b2084-faec-40cf-aaf8-d7e0c026bde6",
-        "claude:f282058a-0a43-4fb8-87fc-1e67eaa3533c",
-        "grok:019e6103-af11-7a92-8e0b-5f40465d8223",
-        "grok:019e611e-0728-77d3-bb7a-8c5983e5058a"
-      ],
-      "signal_type": "abandoned",
-      "title": "cross-flavor problem: abandoned"
-    },
-    "promoted_at": "2026-06-07T08:02:03Z",
-    "source_key": "problem:abandoned:outcome"
-  },
-  "rendering_hints": {
-    "claude": {
-      "note": "TODO: refine rendering",
-      "target": "CLAUDE.md"
-    },
-    "grok": {
-      "note": "TODO: refine rendering",
-      "target": "instructions"
-    }
-  },
-  "resolutions": [
-    {
-      "detail": "",
-      "steps": [],
-      "summary": "TODO: capture the recommended resolution"
-    }
-  ],
-  "schema_version": 1,
-  "scope": {
-    "domains": [],
-    "flavors": [
-      "claude",
-      "grok"
-    ],
-    "repos": [
-      "can-you-assist",
-      "llm-connect"
-    ]
-  },
-  "status": "approved",
-  "updated_at": "2026-06-07T08:02:03Z",
-  "version": "1.0.0"
-}
diff --git a/session_memory/catalog/sp-problem-budget_overrun-tokens.json b/session_memory/catalog/sp-problem-budget_overrun-tokens.json
index a451e42..c4dddf7 100644
--- a/session_memory/catalog/sp-problem-budget_overrun-tokens.json
+++ b/session_memory/catalog/sp-problem-budget_overrun-tokens.json
@@ -1,5 +1,5 @@
 {
-  "created_at": "2026-06-07T08:02:03Z",
+  "created_at": "2026-06-07T09:13:20Z",
   "distribution_ready": true,
   "id": "sp-problem-budget_overrun-tokens",
   "name": "problem: budget overrun",
@@ -8,39 +8,30 @@
   "provenance": {
     "detected_at": null,
     "evidence": {
-      "cost_impact": 27.135,
+      "cost_impact": 10.667,
       "cross_flavor": false,
       "flavors": [
         "claude"
       ],
-      "frequency": 8,
+      "frequency": 3,
       "key": "problem:budget_overrun:tokens",
       "locus": "tokens",
       "polarity": "problem",
       "repos": [
-        "activity-core",
         "artifact-store",
         "citation-evidence",
-        "flex-auth",
-        "infospace-bench",
-        "railiance-apps",
-        "vergabe-teilnahme"
+        "infospace-bench"
       ],
-      "score": 217.08,
+      "score": 32.001,
       "sessions": [
         "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
-        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
         "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
-        "claude:8313f946-f008-4e98-9915-31950380e39e",
-        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
-        "claude:a7b4a9b3-0942-4899-b502-e76b0013fc42",
-        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
-        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
+        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"
       ],
       "signal_type": "budget_overrun",
       "title": "problem: budget overrun"
     },
-    "promoted_at": "2026-06-07T08:02:03Z",
+    "promoted_at": "2026-06-07T09:13:20Z",
     "source_key": "problem:budget_overrun:tokens"
   },
   "rendering_hints": {
@@ -63,16 +54,12 @@
       "claude"
     ],
     "repos": [
-      "activity-core",
       "artifact-store",
       "citation-evidence",
-      "flex-auth",
-      "infospace-bench",
-      "railiance-apps",
-      "vergabe-teilnahme"
+      "infospace-bench"
     ]
   },
   "status": "approved",
-  "updated_at": "2026-06-07T08:02:03Z",
+  "updated_at": "2026-06-07T09:13:20Z",
   "version": "1.0.0"
 }
diff --git a/session_memory/catalog/sp-problem-infra_overhead-infra_overhead.json b/session_memory/catalog/sp-problem-infra_overhead-infra_overhead.json
new file mode 100644
index 0000000..475a0ed
--- /dev/null
+++ b/session_memory/catalog/sp-problem-infra_overhead-infra_overhead.json
@@ -0,0 +1,62 @@
+{
+  "created_at": "2026-06-07T09:13:20Z",
+  "distribution_ready": false,
+  "id": "sp-problem-infra_overhead-infra_overhead",
+  "name": "problem: infra overhead",
+  "polarity": "problem",
+  "problem": "problem: infra overhead",
+  "provenance": {
+    "detected_at": null,
+    "evidence": {
+      "cost_impact": 0.801,
+      "cross_flavor": false,
+      "flavors": [
+        "claude"
+      ],
+      "frequency": 2,
+      "key": "problem:infra_overhead:infra_overhead",
+      "locus": "infra_overhead",
+      "polarity": "problem",
+      "repos": [
+        "markitect-main",
+        "vergabe-teilnahme"
+      ],
+      "score": 1.602,
+      "sessions": [
+        "claude:135002f9-98d2-4d1b-b8fb-543b20388782",
+        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
+      ],
+      "signal_type": "infra_overhead",
+      "title": "problem: infra overhead"
+    },
+    "promoted_at": "2026-06-07T09:13:20Z",
+    "source_key": "problem:infra_overhead:infra_overhead"
+  },
+  "rendering_hints": {
+    "claude": {
+      "note": "TODO: refine rendering",
+      "target": "CLAUDE.md"
+    }
+  },
+  "resolutions": [
+    {
+      "detail": "",
+      "steps": [],
+      "summary": "TODO: capture the recommended resolution"
+    }
+  ],
+  "schema_version": 1,
+  "scope": {
+    "domains": [],
+    "flavors": [
+      "claude"
+    ],
+    "repos": [
+      "markitect-main",
+      "vergabe-teilnahme"
+    ]
+  },
+  "status": "provisional",
+  "updated_at": "2026-06-07T09:13:20Z",
+  "version": "1.0.0"
+}
diff --git a/session_memory/catalog/sp-problem-schema_thrash-schema_load.json b/session_memory/catalog/sp-problem-schema_thrash-schema_load.json
new file mode 100644
index 0000000..fe3f3d6
--- /dev/null
+++ b/session_memory/catalog/sp-problem-schema_thrash-schema_load.json
@@ -0,0 +1,76 @@
+{
+  "created_at": "2026-06-07T09:13:20Z",
+  "distribution_ready": true,
+  "id": "sp-problem-schema_thrash-schema_load",
+  "name": "problem: schema thrash",
+  "polarity": "problem",
+  "problem": "problem: schema thrash",
+  "provenance": {
+    "detected_at": null,
+    "evidence": {
+      "cost_impact": 79.0,
+      "cross_flavor": false,
+      "flavors": [
+        "claude"
+      ],
+      "frequency": 8,
+      "key": "problem:schema_thrash:schema_load",
+      "locus": "schema_load",
+      "polarity": "problem",
+      "repos": [
+        "activity-core",
+        "citation-evidence",
+        "flex-auth",
+        "infospace-bench",
+        "ops-bridge",
+        "vergabe-teilnahme"
+      ],
+      "score": 632.0,
+      "sessions": [
+        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
+        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
+        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
+        "claude:63fd4df2-5add-4748-af21-c1544825e006",
+        "claude:8313f946-f008-4e98-9915-31950380e39e",
+        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
+        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
+        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
+      ],
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    "promoted_at": "2026-06-07T09:13:20Z",
+    "source_key": "problem:schema_thrash:schema_load"
+  },
+  "rendering_hints": {
+    "claude": {
+      "note": "TODO: refine rendering",
+      "target": "CLAUDE.md"
+    }
+  },
+  "resolutions": [
+    {
+      "detail": "",
+      "steps": [],
+      "summary": "TODO: capture the recommended resolution"
+    }
+  ],
+  "schema_version": 1,
+  "scope": {
+    "domains": [],
+    "flavors": [
+      "claude"
+    ],
+    "repos": [
+      "activity-core",
+      "citation-evidence",
+      "flex-auth",
+      "infospace-bench",
+      "ops-bridge",
+      "vergabe-teilnahme"
+    ]
+  },
+  "status": "approved",
+  "updated_at": "2026-06-07T09:13:20Z",
+  "version": "1.0.0"
+}
diff --git a/session_memory/catalog/sp-problem-tool_thrash-tool-bash.json b/session_memory/catalog/sp-problem-tool_thrash-tool-bash.json
new file mode 100644
index 0000000..23059f7
--- /dev/null
+++ b/session_memory/catalog/sp-problem-tool_thrash-tool-bash.json
@@ -0,0 +1,83 @@
+{
+  "created_at": "2026-06-07T09:13:20Z",
+  "distribution_ready": true,
+  "id": "sp-problem-tool_thrash-tool-bash",
+  "name": "problem: tool thrash",
+  "polarity": "problem",
+  "problem": "problem: tool thrash",
+  "provenance": {
+    "detected_at": null,
+    "evidence": {
+      "cost_impact": 1990.0,
+      "cross_flavor": false,
+      "flavors": [
+        "claude"
+      ],
+      "frequency": 11,
+      "key": "problem:tool_thrash:tool:Bash",
+      "locus": "tool:Bash",
+      "polarity": "problem",
+      "repos": [
+        "activity-core",
+        "artifact-store",
+        "citation-evidence",
+        "ihp-railiance-probe",
+        "infospace-bench",
+        "railiance-apps",
+        "state-hub",
+        "vergabe-teilnahme"
+      ],
+      "score": 21890.0,
+      "sessions": [
+        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
+        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
+        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
+        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
+        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
+        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
+        "claude:8313f946-f008-4e98-9915-31950380e39e",
+        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
+        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
+        "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8",
+        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
+      ],
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    "promoted_at": "2026-06-07T09:13:20Z",
+    "source_key": "problem:tool_thrash:tool:Bash"
+  },
+  "rendering_hints": {
+    "claude": {
+      "note": "TODO: refine rendering",
+      "target": "CLAUDE.md"
+    }
+  },
+  "resolutions": [
+    {
+      "detail": "",
+      "steps": [],
+      "summary": "TODO: capture the recommended resolution"
+    }
+  ],
+  "schema_version": 1,
+  "scope": {
+    "domains": [],
+    "flavors": [
+      "claude"
+    ],
+    "repos": [
+      "activity-core",
+      "artifact-store",
+      "citation-evidence",
+      "ihp-railiance-probe",
+      "infospace-bench",
+      "railiance-apps",
+      "state-hub",
+      "vergabe-teilnahme"
+    ]
+  },
+  "status": "approved",
+  "updated_at": "2026-06-07T09:13:20Z",
+  "version": "1.0.0"
+}
diff --git a/session_memory/catalog/sp-success-clean_pass-outcome.json b/session_memory/catalog/sp-success-clean_pass-outcome.json
index 6501853..9dd0208 100644
--- a/session_memory/catalog/sp-success-clean_pass-outcome.json
+++ b/session_memory/catalog/sp-success-clean_pass-outcome.json
@@ -1,5 +1,5 @@
 {
-  "created_at": "2026-06-07T08:02:03Z",
+  "created_at": "2026-06-07T09:13:20Z",
   "distribution_ready": true,
   "id": "sp-success-clean_pass-outcome",
   "name": "cross-flavor success: clean pass",
@@ -8,13 +8,13 @@
   "provenance": {
     "detected_at": null,
     "evidence": {
-      "cost_impact": 20.0,
+      "cost_impact": 17.0,
       "cross_flavor": true,
       "flavors": [
         "claude",
         "grok"
       ],
-      "frequency": 20,
+      "frequency": 17,
       "key": "success:clean_pass:outcome",
       "locus": "outcome",
       "polarity": "success",
@@ -32,13 +32,12 @@
         "the-custodian",
         "vergabe-teilnahme"
       ],
-      "score": 600.0,
+      "score": 433.5,
       "sessions": [
         "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
         "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8",
         "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
         "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
-        "claude:39dd33b1-d156-4d6a-8c33-c359b6f841d8",
         "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
         "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
         "claude:631de76e-fdee-43b5-b091-7b7675467ad1",
@@ -46,8 +45,6 @@
         "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
         "claude:8313f946-f008-4e98-9915-31950380e39e",
         "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
-        "claude:99e9c5af-043f-4b97-8d92-14189da8716b",
-        "claude:a7b4a9b3-0942-4899-b502-e76b0013fc42",
         "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
         "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
         "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6",
@@ -58,7 +55,7 @@
       "signal_type": "clean_pass",
       "title": "cross-flavor success: clean pass"
     },
-    "promoted_at": "2026-06-07T08:02:03Z",
+    "promoted_at": "2026-06-07T09:13:20Z",
     "source_key": "success:clean_pass:outcome"
   },
   "rendering_hints": {
@@ -101,6 +98,6 @@
     ]
   },
   "status": "approved",
-  "updated_at": "2026-06-07T08:02:03Z",
+  "updated_at": "2026-06-07T09:13:20Z",
   "version": "1.0.0"
 }
diff --git a/workplans/AGENTIC-WP-0005-detect-hardening.md b/workplans/AGENTIC-WP-0005-detect-hardening.md
index 7f4fd07..420a596 100644
--- a/workplans/AGENTIC-WP-0005-detect-hardening.md
+++ b/workplans/AGENTIC-WP-0005-detect-hardening.md
@@ -4,7 +4,7 @@ type: workplan
 title: "Coding Session Memory — Detect Hardening (quality filter + infra signals)"
 domain: helix_forge
 repo: agentic-resources
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-07"
@@ -69,7 +69,7 @@ thresholds configurable. Unit-tested.
 
 ```task
 id: AGENTIC-WP-0005-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8b9d029a-60d0-4caf-af62-4fcc9c9a645c"
 ```
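
The catalog `score` values in this patch, and the per-session infra-overhead share with its median/p90 baseline, can be recomputed from per-session tool histograms. The Python below is a hypothetical sketch, not the `session_memory` detect code. It rests on three assumptions:

- `PLUMBING_TOOLS` contains only tool names the assessment mentions. Real MCP tool names may carry a server prefix, and most State Hub write tools are not named there.
- `p90` uses the nearest-rank convention, because the assessment does not state which percentile method it used.
- `pattern_score` is inferred from the data: every catalog entry in this patch satisfies score = frequency × cost_impact, multiplied by 1.5 when `cross_flavor` is true.

```python
"""Hypothetical helpers that recompute the friction metrics (not session_memory code)."""
from statistics import median

# Baseline quoted in docs/ASSESSMENT-infra-friction.md (27 real sessions).
BASELINE_MEDIAN = 0.117
BASELINE_P90 = 0.261

# Assumption: only plumbing tools named in the assessment. Real MCP tool names
# may carry a server prefix, so a real classifier would have to match those.
PLUMBING_TOOLS = frozenset({
    "ToolSearch",  # schema loading
    "TaskUpdate", "TaskCreate", "todo_write", "update_task_status",  # task plumbing
    "get_domain_summary",  # State Hub read
})


def infra_overhead_share(tool_counts: dict[str, int],
                         plumbing: frozenset[str] = PLUMBING_TOOLS) -> float:
    """Fraction of one session's tool calls spent on plumbing rather than repo work."""
    total = sum(tool_counts.values())
    if total == 0:
        return 0.0
    infra = sum(n for name, n in tool_counts.items() if name in plumbing)
    return infra / total


def p90(values: list[float]) -> float:
    """Nearest-rank 90th percentile; other percentile conventions differ slightly."""
    if not values:
        raise ValueError("p90 of an empty list")
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) without float rounding
    return ordered[rank - 1]


def compare_to_baseline(shares: list[float]) -> dict[str, float]:
    """Negative deltas mean less plumbing than the assessment baseline."""
    return {
        "median_delta": median(shares) - BASELINE_MEDIAN,
        "p90_delta": p90(shares) - BASELINE_P90,
    }


def pattern_score(frequency: int, cost_impact: float, cross_flavor: bool) -> float:
    """Inferred rule: frequency * cost_impact, times 1.5 for cross-flavor patterns."""
    return frequency * cost_impact * (1.5 if cross_flavor else 1.0)


if __name__ == "__main__":
    # Toy histograms, not corpus data.
    sessions = [
        {"Bash": 50, "Edit": 30, "ToolSearch": 5, "get_domain_summary": 15},
        {"Bash": 70, "Edit": 25, "Read": 5},
        {"Bash": 40, "Edit": 20, "TaskUpdate": 30, "ToolSearch": 10},
    ]
    shares = [infra_overhead_share(s) for s in sessions]
    print([round(s, 3) for s in shares])
    print(compare_to_baseline(shares))
    # Same inputs as the schema_thrash and clean_pass catalog entries above.
    print(pattern_score(8, 79.0, False), pattern_score(17, 17.0, True))
```

Feeding post-change sessions to `compare_to_baseline` is one way to carry out recommendation C. Negative deltas on both the median and p90 would indicate less plumbing than the 11.7 % / 26.1 % baseline.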