session-memory: friction assessment + hardened catalog (WP-0005 T03)

Re-ran ingest->detect with the quality filter + infra signals over real local sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls; ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as top-2; recommends a State Hub skill (front-load schemas + batched writes) + bulk hub ops. Workplan finished; suite 88/88. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:18:27 +02:00
parent 21c714e286
commit 48618293b0
8 changed files with 338 additions and 112 deletions
--- a/docs/ASSESSMENT-infra-friction.md
+++ b/docs/ASSESSMENT-infra-friction.md
@@ -0,0 +1,100 @@
+# Infrastructure Friction Assessment
+
+*Generated 2026-06-07 from captured coding-session data (Helix Forge session
+memory), after the Detect-hardening pass ([AGENTIC-WP-0005]). First data-driven
+assessment of where our agentic coding sessions spend effort on plumbing rather
+than work.*
+
+## Method & data quality
+
+- **Corpus:** 72 sessions captured across Claude + Grok. A session-quality filter
+  ([detect/quality.py]) drops health-checks, smoke-tests, and interrupted runs
+  (mostly `llm-connect` *"Say hello in one word"*). **27 are real coding sessions.**
+- **Caveat:** many of the 45 filtered-out sessions (62.5 % of the corpus) had been
+  mislabeled `abandoned` by the outcome heuristic and produced a *false-positive*
+  "cross-flavor abandoned" pattern in the first catalog, now purged. Treat any
+  pre-hardening finding with suspicion.
+- **Key framing:** all 27 real sessions ended in `success`. So the friction here
+  is **cost/efficiency, not failure** — sessions get there, but pay an avoidable
+  tax to do it.
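A minimal sketch of the kind of rule the quality filter applies, assuming a simplified session record. The `Session` fields, the health-check prompt list, and `min_tool_calls` are hypothetical; the real rules live in `detect/quality.py` and may differ.

```python
from dataclasses import dataclass


# Hypothetical session record; field names are illustrative, not the digest schema.
@dataclass
class Session:
    session_id: str
    repo: str
    prompts: list[str]
    tool_calls: int
    interrupted: bool


# Assumed markers of health-check / smoke-test prompts (illustrative only).
HEALTH_CHECK_PROMPTS = ("say hello in one word",)


def is_real_session(s: Session, min_tool_calls: int = 1) -> bool:
    """Return True if the session looks like real coding work."""
    if s.interrupted:
        return False
    first = s.prompts[0].strip().lower() if s.prompts else ""
    if any(first.startswith(p) for p in HEALTH_CHECK_PROMPTS):
        return False
    # A session that never used a tool is treated as a smoke test.
    return s.tool_calls >= min_tool_calls


def filter_real(sessions: list[Session]) -> list[Session]:
    """Keep only sessions that pass every quality rule."""
    return [s for s in sessions if is_real_session(s)]
```

Filtering before outcome labeling is what keeps `llm-connect`-style pings from being scored as `abandoned` runs.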
+
+## The headline number
+
+Across the 27 real sessions, tool-call activity breaks down as:
+
+| Bucket | Share |
+|--------|------:|
+| shell (Bash / run_terminal) | 38.2 % |
+| edit | 30.2 % |
+| read | 12.9 % |
+| **State Hub MCP** | **10.3 %** |
+| **task-management plumbing** | **5.8 %** |
+| **schema-loading (`ToolSearch`)** | **1.5 %** |
+| other | 1.1 % |
+
+**~17.6 % of all tool calls in real coding sessions are coordination plumbing
+(hub + task + schema-loading) that never touches the repo.** Per session, the
+infra-overhead share has median **11.7 %**, p90 **26.1 %**, and max **43.3 %**:
+the overhead is concentrated in a handful of sessions.
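The pooled 17.6 % and the per-session median (11.7 %) are different statistics: pooling weights long sessions more heavily, which is consistent with the pooled figure sitting above the median when overhead concentrates in the largest sessions. A sketch of both computations, assuming a hypothetical tool-to-bucket mapping and a hypothetical `mcp__state_hub` name prefix for hub tools; the nearest-rank percentile method is also an assumption.

```python
import math
import statistics
from collections import Counter

# Hypothetical tool-name -> bucket mapping; the real mapping may differ.
BUCKETS = {
    "Bash": "shell", "run_terminal": "shell",
    "Edit": "edit", "Write": "edit",
    "Read": "read",
    "TaskUpdate": "task", "TaskCreate": "task",
    "todo_write": "task", "update_task_status": "task",
    "ToolSearch": "schema",
}
INFRA_BUCKETS = {"hub", "task", "schema"}


def bucket(tool: str) -> str:
    # Assumption: State Hub MCP tools share an "mcp__state_hub" prefix.
    if tool.startswith("mcp__state_hub"):
        return "hub"
    return BUCKETS.get(tool, "other")


def bucket_shares(calls: list[str]) -> dict[str, float]:
    """Fraction of calls falling into each bucket."""
    if not calls:
        return {}
    counts = Counter(bucket(t) for t in calls)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}


def overhead_share(calls: list[str]) -> float:
    """Share of calls that are coordination plumbing (hub + task + schema)."""
    return sum(v for b, v in bucket_shares(calls).items() if b in INFRA_BUCKETS)


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (assumed method)."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[k - 1]


def summarize(sessions: list[list[str]]) -> dict[str, float]:
    """Per-session distribution of the infra-overhead share."""
    shares = [overhead_share(calls) for calls in sessions]
    return {
        "median": statistics.median(shares),
        "p90": nearest_rank(shares, 90),
        "max": max(shares),
    }
```

Calling `overhead_share` on all calls concatenated gives the pooled figure; `summarize` gives the per-session median, p90, and max.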
+
+## Ranked friction
+
+### 1. State Hub call volume — *highest cost, addressable*
+State Hub MCP is 10.3 % of all tool calls and dominates the worst sessions:
+
+| Repo (one session) | Total calls | State Hub calls | Overhead share |
+|--------------------|------:|------:|------:|
+| vergabe-teilnahme | 570 | **231** | 43 % |
+| activity-core | 488 | 98 | 23 % |
+| flex-auth | 236 | 35 (+27 task) | 29 % |
+| net-kingdom | 129 | 25 | 22 % |
+
+Root cause: many **fine-grained** calls — per-task status updates, per-event
+progress writes, repeated `get_domain_summary`. 231 hub calls in a single session
+is coordination overhead, not work.
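Repeated read calls such as `get_domain_summary` show up as identical (tool, arguments) pairs within one session. A sketch that counts them, assuming calls are available as `(tool_name, args)` tuples and that hub tools share a hypothetical `mcp__state_hub` prefix:

```python
import json
from collections import Counter


def repeated_calls(calls, prefix="mcp__state_hub"):
    """Return {(tool, canonical_args): count} for hub calls issued more than once.

    `calls` is a list of (tool_name, args_dict) tuples; the prefix is an assumption.
    """
    keyed = Counter(
        # sort_keys gives a canonical form so argument order does not matter
        (tool, json.dumps(args, sort_keys=True))
        for tool, args in calls
        if tool.startswith(prefix)
    )
    return {key: n for key, n in keyed.items() if n > 1}
```

Each repeated read is a candidate for caching within the session; each repeated write is a candidate for batching.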
+
+### 2. Schema-loading thrash (`ToolSearch`) — *low cost, near-zero-effort fix*
+**106 `ToolSearch` calls across 22 of 27 sessions (81 %).** The State Hub MCP
+tools are *deferred*, so nearly every session re-discovers and re-loads the same
+tool schemas before it can call them. This is pure overhead with no work value,
+and it is **exactly the CLI/MCP-interface friction we hypothesized.**
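A sketch of the reach metric behind "22 of 27 sessions (81 %)": count the sessions with at least one `ToolSearch` call. It assumes each session is a list of tool names; `schema_load_stats` is a hypothetical helper.

```python
def schema_load_stats(sessions, tool="ToolSearch"):
    """Total schema-loading calls and the fraction of sessions that made any."""
    per_session = [calls.count(tool) for calls in sessions]
    affected = sum(1 for n in per_session if n > 0)
    return {
        "total_calls": sum(per_session),
        "sessions_affected": affected,
        "reach": affected / len(sessions),
    }
```

With the figures above, `reach` would be 22 / 27, about 0.815.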
+
+### 3. Task-management plumbing — 5.8 %
+`TaskUpdate` / `TaskCreate` / `todo_write` / `update_task_status`. Overlaps with
+(1); much of it is redundant status churn within a session.
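Redundant status churn can be measured by collapsing each task's updates to its final status, which is also roughly what a batched write would send. A sketch, assuming updates arrive as `(task_id, status)` pairs in call order; both helper names are hypothetical, and the sketch ignores cases where intermediate statuses matter to other observers.

```python
def collapse_status_updates(updates):
    """Keep only the last status per task, in first-seen task order."""
    latest = {}
    for task_id, status in updates:
        # Reassigning an existing key keeps its original position in the dict.
        latest[task_id] = status
    return list(latest.items())


def redundant_updates(updates):
    """Number of updates a single end-of-session sync would have made unnecessary."""
    return len(updates) - len(collapse_status_updates(updates))
```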
+
+### 4. Tool thrash — *session-shape, watch only*
+11 sessions hammer a single tool 80–230× (usually Bash or Edit). Less an infra
+problem than a sign of missing higher-level tooling; low priority.
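A sketch of a per-session tool-thrash check. The 80-call default mirrors the low end of the range above but is an assumption, as is the function name; the workplan notes that the real detector thresholds are configurable.

```python
from collections import Counter


def tool_thrash(calls, threshold=80):
    """Return (tool, count) for the most-used tool if it meets the threshold, else None."""
    if not calls:
        return None
    tool, n = Counter(calls).most_common(1)[0]
    return (tool, n) if n >= threshold else None
```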
+
+### 5. Budget overrun — 3 sessions
+Token cost well above peers. Secondary; revisit once (1)–(2) are addressed.
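"Well above peers" needs a concrete peer definition. One possible sketch flags a session when its token count exceeds a multiple of the median of all other sessions; the leave-one-out median and the 2x factor are assumptions, not the detector's actual rule.

```python
import statistics


def budget_overruns(tokens_by_session, factor=2.0):
    """Flag sessions whose tokens exceed `factor` x the median of the other sessions."""
    flagged = []
    for sid, tokens in tokens_by_session.items():
        peers = [t for other, t in tokens_by_session.items() if other != sid]
        if peers and tokens > factor * statistics.median(peers):
            flagged.append(sid)
    return flagged
```

Leaving the session out of its own peer group keeps one very expensive session from raising the bar it is measured against.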
+
+## Recommendations
+
+**The CLI/MCP-interface hypothesis is validated as a top-2 friction, not a minor
+issue.** Two high-ROI moves:
+
+- **A. A State Hub skill (highest ROI).** A skill (or a pre-loaded tool manifest)
+  that (i) **front-loads the common hub tool schemas** so agents stop
+  `ToolSearch`-ing for them, which should remove nearly all of finding #2 (seen in
+  81 % of sessions), and (ii) **teaches batched writes** (sync N task statuses in
+  one call, fewer progress events) to attack finding #1. Low effort, broad reach.
+- **B. Coarser hub operations.** Add bulk endpoints / a single "sync workplan
+  statuses" op so a session doesn't make 200+ individual hub calls. This is the
+  structural fix behind the skill's guidance.
+- **C. Measure the effect (Phase 4).** After A/B land, compare infra-overhead
+  share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
+  This is precisely what the Measure phase is for — the loop closes here.
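Step C amounts to recomputing the same per-session statistics on post-change sessions and diffing them against this baseline. A sketch, assuming per-session overhead shares expressed as fractions and a nearest-rank p90 (an assumption). With only about 27 sessions, a nearest-rank p90 is the third-highest value, so small deltas are likely noise.

```python
import math
import statistics

# Baseline from this assessment (median 11.7 %, p90 26.1 %).
BASELINE = {"median": 0.117, "p90": 0.261}


def nearest_rank(values, pct):
    """Nearest-rank percentile (assumed method)."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[k - 1]


def compare_to_baseline(shares, baseline=BASELINE):
    """Current vs. baseline per statistic; a negative delta means less overhead."""
    current = {"median": statistics.median(shares), "p90": nearest_rank(shares, 90)}
    return {
        k: {"current": current[k], "baseline": baseline[k], "delta": current[k] - baseline[k]}
        for k in baseline
    }
```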
+
+## What this assessment still can't see
+
+- **Why** a session was expensive at the *content* level (specific error
+  messages, repeated failed approaches) — the digest captures tool histograms and
+  prompt/response snippets but not error-body text. Mining tool-result bodies for
+  recurring failure messages is the natural next extension if root-cause depth is
+  needed.
+- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
+  friction claims are Claude-weighted for now.
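A sketch of the extension suggested above: normalize tool-result error bodies so messages that differ only in paths, hex ids, or numbers group together, then count recurrences. The masking rules, the first-line-only choice, and `min_count` are assumptions, and the digest does not currently capture these bodies.

```python
import re
from collections import Counter

# Assumed masking rules, applied in order: paths, long hex ids, then bare numbers.
_RULES = [
    (re.compile(r"(/[\w.\-]+)+"), "<path>"),
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message):
    """Reduce an error body to a masked version of its first line."""
    stripped = message.strip()
    text = stripped.splitlines()[0] if stripped else ""
    for pattern, token in _RULES:
        text = pattern.sub(token, text)
    return text


def recurring_errors(result_bodies, min_count=2):
    """(normalized_message, count) pairs seen at least `min_count` times, most common first."""
    counts = Counter(normalize(b) for b in result_bodies if b.strip())
    return [(msg, n) for msg, n in counts.most_common() if n >= min_count]
```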
+
+[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
+[detect/quality.py]: ../session_memory/detect/quality.py
--- a/session_memory/catalog/sp-problem-abandoned-outcome.json
+++ b/session_memory/catalog/sp-problem-abandoned-outcome.json
@@ -1,79 +0,0 @@
-{
-  "created_at": "2026-06-07T08:02:03Z",
-  "distribution_ready": true,
-  "id": "sp-problem-abandoned-outcome",
-  "name": "cross-flavor problem: abandoned",
-  "polarity": "problem",
-  "problem": "cross-flavor problem: abandoned",
-  "provenance": {
-    "detected_at": null,
-    "evidence": {
-      "cost_impact": 13.0,
-      "cross_flavor": true,
-      "flavors": [
-        "claude",
-        "grok"
-      ],
-      "frequency": 13,
-      "key": "problem:abandoned:outcome",
-      "locus": "outcome",
-      "polarity": "problem",
-      "repos": [
-        "can-you-assist",
-        "llm-connect"
-      ],
-      "score": 253.5,
-      "sessions": [
-        "claude:0510d5f4-956d-430a-9e89-6abc54f95b6a",
-        "claude:106fd234-949e-470d-a208-fe5ed8f14562",
-        "claude:377aba4f-8bbf-4760-90e9-469486ab0518",
-        "claude:4c606c31-beff-4a41-a325-ef63c9f8fb0e",
-        "claude:5bffe081-39fb-44cd-9966-4006f9235a0e",
-        "claude:60d3c947-eacf-49e9-b12c-ff8eb6b1c20b",
-        "claude:8f50f5b4-fbc4-4abe-9a7c-b25b2a713671",
-        "claude:95b1fe00-5d2e-482f-9618-fddf9cdbeb51",
-        "claude:c3e782ad-96b9-4cf1-9eb5-defdf3578426",
-        "claude:d75b2084-faec-40cf-aaf8-d7e0c026bde6",
-        "claude:f282058a-0a43-4fb8-87fc-1e67eaa3533c",
-        "grok:019e6103-af11-7a92-8e0b-5f40465d8223",
-        "grok:019e611e-0728-77d3-bb7a-8c5983e5058a"
-      ],
-      "signal_type": "abandoned",
-      "title": "cross-flavor problem: abandoned"
-    },
-    "promoted_at": "2026-06-07T08:02:03Z",
-    "source_key": "problem:abandoned:outcome"
-  },
-  "rendering_hints": {
-    "claude": {
-      "note": "TODO: refine rendering",
-      "target": "CLAUDE.md"
-    },
-    "grok": {
-      "note": "TODO: refine rendering",
-      "target": "instructions"
-    }
-  },
-  "resolutions": [
-    {
-      "detail": "",
-      "steps": [],
-      "summary": "TODO: capture the recommended resolution"
-    }
-  ],
-  "schema_version": 1,
-  "scope": {
-    "domains": [],
-    "flavors": [
-      "claude",
-      "grok"
-    ],
-    "repos": [
-      "can-you-assist",
-      "llm-connect"
-    ]
-  },
-  "status": "approved",
-  "updated_at": "2026-06-07T08:02:03Z",
-  "version": "1.0.0"
-}
--- a/session_memory/catalog/sp-problem-budget_overrun-tokens.json
+++ b/session_memory/catalog/sp-problem-budget_overrun-tokens.json
@@ -1,5 +1,5 @@
 {
-  "created_at": "2026-06-07T08:02:03Z",
+  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-budget_overrun-tokens",
  "name": "problem: budget overrun",
@@ -8,39 +8,30 @@
  "provenance": {
    "detected_at": null,
    "evidence": {
-      "cost_impact": 27.135,
+      "cost_impact": 10.667,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
-      "frequency": 8,
+      "frequency": 3,
      "key": "problem:budget_overrun:tokens",
      "locus": "tokens",
      "polarity": "problem",
      "repos": [
-        "activity-core",
        "artifact-store",
        "citation-evidence",
-        "flex-auth",
-        "infospace-bench",
-        "railiance-apps",
-        "vergabe-teilnahme"
+        "infospace-bench"
      ],
-      "score": 217.08,
+      "score": 32.001,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
-        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
-        "claude:8313f946-f008-4e98-9915-31950380e39e",
-        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
-        "claude:a7b4a9b3-0942-4899-b502-e76b0013fc42",
-        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
-        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
+        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"
      ],
      "signal_type": "budget_overrun",
      "title": "problem: budget overrun"
    },
-    "promoted_at": "2026-06-07T08:02:03Z",
+    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:budget_overrun:tokens"
  },
  "rendering_hints": {
@@ -63,16 +54,12 @@
      "claude"
    ],
    "repos": [
-      "activity-core",
      "artifact-store",
      "citation-evidence",
-      "flex-auth",
-      "infospace-bench",
-      "railiance-apps",
-      "vergabe-teilnahme"
+      "infospace-bench"
    ]
  },
  "status": "approved",
-  "updated_at": "2026-06-07T08:02:03Z",
+  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
 }
--- a/session_memory/catalog/sp-problem-infra_overhead-infra_overhead.json
+++ b/session_memory/catalog/sp-problem-infra_overhead-infra_overhead.json
@@ -0,0 +1,62 @@
+{
+  "created_at": "2026-06-07T09:13:20Z",
+  "distribution_ready": false,
+  "id": "sp-problem-infra_overhead-infra_overhead",
+  "name": "problem: infra overhead",
+  "polarity": "problem",
+  "problem": "problem: infra overhead",
+  "provenance": {
+    "detected_at": null,
+    "evidence": {
+      "cost_impact": 0.801,
+      "cross_flavor": false,
+      "flavors": [
+        "claude"
+      ],
+      "frequency": 2,
+      "key": "problem:infra_overhead:infra_overhead",
+      "locus": "infra_overhead",
+      "polarity": "problem",
+      "repos": [
+        "markitect-main",
+        "vergabe-teilnahme"
+      ],
+      "score": 1.602,
+      "sessions": [
+        "claude:135002f9-98d2-4d1b-b8fb-543b20388782",
+        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
+      ],
+      "signal_type": "infra_overhead",
+      "title": "problem: infra overhead"
+    },
+    "promoted_at": "2026-06-07T09:13:20Z",
+    "source_key": "problem:infra_overhead:infra_overhead"
+  },
+  "rendering_hints": {
+    "claude": {
+      "note": "TODO: refine rendering",
+      "target": "CLAUDE.md"
+    }
+  },
+  "resolutions": [
+    {
+      "detail": "",
+      "steps": [],
+      "summary": "TODO: capture the recommended resolution"
+    }
+  ],
+  "schema_version": 1,
+  "scope": {
+    "domains": [],
+    "flavors": [
+      "claude"
+    ],
+    "repos": [
+      "markitect-main",
+      "vergabe-teilnahme"
+    ]
+  },
+  "status": "provisional",
+  "updated_at": "2026-06-07T09:13:20Z",
+  "version": "1.0.0"
+}
--- a/session_memory/catalog/sp-problem-schema_thrash-schema_load.json
+++ b/session_memory/catalog/sp-problem-schema_thrash-schema_load.json
@@ -0,0 +1,76 @@
+{
+  "created_at": "2026-06-07T09:13:20Z",
+  "distribution_ready": true,
+  "id": "sp-problem-schema_thrash-schema_load",
+  "name": "problem: schema thrash",
+  "polarity": "problem",
+  "problem": "problem: schema thrash",
+  "provenance": {
+    "detected_at": null,
+    "evidence": {
+      "cost_impact": 79.0,
+      "cross_flavor": false,
+      "flavors": [
+        "claude"
+      ],
+      "frequency": 8,
+      "key": "problem:schema_thrash:schema_load",
+      "locus": "schema_load",
+      "polarity": "problem",
+      "repos": [
+        "activity-core",
+        "citation-evidence",
+        "flex-auth",
+        "infospace-bench",
+        "ops-bridge",
+        "vergabe-teilnahme"
+      ],
+      "score": 632.0,
+      "sessions": [
+        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
+        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
+        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
+        "claude:63fd4df2-5add-4748-af21-c1544825e006",
+        "claude:8313f946-f008-4e98-9915-31950380e39e",
+        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
+        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
+        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
+      ],
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    "promoted_at": "2026-06-07T09:13:20Z",
+    "source_key": "problem:schema_thrash:schema_load"
+  },
+  "rendering_hints": {
+    "claude": {
+      "note": "TODO: refine rendering",
+      "target": "CLAUDE.md"
+    }
+  },
+  "resolutions": [
+    {
+      "detail": "",
+      "steps": [],
+      "summary": "TODO: capture the recommended resolution"
+    }
+  ],
+  "schema_version": 1,
+  "scope": {
+    "domains": [],
+    "flavors": [
+      "claude"
+    ],
+    "repos": [
+      "activity-core",
+      "citation-evidence",
+      "flex-auth",
+      "infospace-bench",
+      "ops-bridge",
+      "vergabe-teilnahme"
+    ]
+  },
+  "status": "approved",
+  "updated_at": "2026-06-07T09:13:20Z",
+  "version": "1.0.0"
+}
--- a/session_memory/catalog/sp-problem-tool_thrash-tool-bash.json
+++ b/session_memory/catalog/sp-problem-tool_thrash-tool-bash.json
@@ -0,0 +1,83 @@
+{
+  "created_at": "2026-06-07T09:13:20Z",
+  "distribution_ready": true,
+  "id": "sp-problem-tool_thrash-tool-bash",
+  "name": "problem: tool thrash",
+  "polarity": "problem",
+  "problem": "problem: tool thrash",
+  "provenance": {
+    "detected_at": null,
+    "evidence": {
+      "cost_impact": 1990.0,
+      "cross_flavor": false,
+      "flavors": [
+        "claude"
+      ],
+      "frequency": 11,
+      "key": "problem:tool_thrash:tool:Bash",
+      "locus": "tool:Bash",
+      "polarity": "problem",
+      "repos": [
+        "activity-core",
+        "artifact-store",
+        "citation-evidence",
+        "ihp-railiance-probe",
+        "infospace-bench",
+        "railiance-apps",
+        "state-hub",
+        "vergabe-teilnahme"
+      ],
+      "score": 21890.0,
+      "sessions": [
+        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
+        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
+        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
+        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
+        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
+        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
+        "claude:8313f946-f008-4e98-9915-31950380e39e",
+        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
+        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
+        "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8",
+        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
+      ],
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    "promoted_at": "2026-06-07T09:13:20Z",
+    "source_key": "problem:tool_thrash:tool:Bash"
+  },
+  "rendering_hints": {
+    "claude": {
+      "note": "TODO: refine rendering",
+      "target": "CLAUDE.md"
+    }
+  },
+  "resolutions": [
+    {
+      "detail": "",
+      "steps": [],
+      "summary": "TODO: capture the recommended resolution"
+    }
+  ],
+  "schema_version": 1,
+  "scope": {
+    "domains": [],
+    "flavors": [
+      "claude"
+    ],
+    "repos": [
+      "activity-core",
+      "artifact-store",
+      "citation-evidence",
+      "ihp-railiance-probe",
+      "infospace-bench",
+      "railiance-apps",
+      "state-hub",
+      "vergabe-teilnahme"
+    ]
+  },
+  "status": "approved",
+  "updated_at": "2026-06-07T09:13:20Z",
+  "version": "1.0.0"
+}
--- a/session_memory/catalog/sp-success-clean_pass-outcome.json
+++ b/session_memory/catalog/sp-success-clean_pass-outcome.json
@@ -1,5 +1,5 @@
 {
-  "created_at": "2026-06-07T08:02:03Z",
+  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-success-clean_pass-outcome",
  "name": "cross-flavor success: clean pass",
@@ -8,13 +8,13 @@
  "provenance": {
    "detected_at": null,
    "evidence": {
-      "cost_impact": 20.0,
+      "cost_impact": 17.0,
      "cross_flavor": true,
      "flavors": [
        "claude",
        "grok"
      ],
-      "frequency": 20,
+      "frequency": 17,
      "key": "success:clean_pass:outcome",
      "locus": "outcome",
      "polarity": "success",
@@ -32,13 +32,12 @@
        "the-custodian",
        "vergabe-teilnahme"
      ],
-      "score": 600.0,
+      "score": 433.5,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8",
        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
-        "claude:39dd33b1-d156-4d6a-8c33-c359b6f841d8",
        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:631de76e-fdee-43b5-b091-7b7675467ad1",
@@ -46,8 +45,6 @@
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
-        "claude:99e9c5af-043f-4b97-8d92-14189da8716b",
-        "claude:a7b4a9b3-0942-4899-b502-e76b0013fc42",
        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
        "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6",
@@ -58,7 +55,7 @@
      "signal_type": "clean_pass",
      "title": "cross-flavor success: clean pass"
    },
-    "promoted_at": "2026-06-07T08:02:03Z",
+    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "success:clean_pass:outcome"
  },
  "rendering_hints": {
@@ -101,6 +98,6 @@
    ]
  },
  "status": "approved",
-  "updated_at": "2026-06-07T08:02:03Z",
+  "updated_at": "2026-06-07T09:13:20Z",
  "version": "1.0.0"
 }
--- a/workplans/AGENTIC-WP-0005-detect-hardening.md
+++ b/workplans/AGENTIC-WP-0005-detect-hardening.md
@@ -4,7 +4,7 @@ type: workplan
 title: "Coding Session Memory — Detect Hardening (quality filter + infra signals)"
 domain: helix_forge
 repo: agentic-resources
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-07"
@@ -69,7 +69,7 @@ thresholds configurable. Unit-tested.

 ```task
 id: AGENTIC-WP-0005-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8b9d029a-60d0-4caf-af62-4fcc9c9a645c"
 ```