session-memory: session-quality filter (WP-0005 T01)

detect/quality.py: is_real_coding_session drops health-checks / smoke-tests / interrupted / trivially-short sessions (event floor, repo present, substantive tool activity, non-trivial prompt). Wired into run_detect so signals only form over real sessions, which fixes the abandoned false-positive. [detect.quality] knobs; existing detect/curate fixtures made realistic. 8 new tests; suite 80/80.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:07:22 +02:00
parent 56b2f576de
commit 70433cda61
7 changed files with 241 additions and 2 deletions
--- a/workplans/AGENTIC-WP-0005-detect-hardening.md
+++ b/workplans/AGENTIC-WP-0005-detect-hardening.md
@@ -0,0 +1,88 @@
+---
+id: AGENTIC-WP-0005
+type: workplan
+title: "Coding Session Memory — Detect Hardening (quality filter + infra signals)"
+domain: helix_forge
+repo: agentic-resources
+status: ready
+owner: codex
+topic_slug: helix-forge
+created: "2026-06-07"
+updated: "2026-06-07"
+state_hub_workstream_id: "d8b7b8d1-1d85-4d2a-8ccd-7b0366a9442d"
+---
+
+# Coding Session Memory — Detect Hardening
+
+A focused hardening pass (call it Phase 1.5) so the Detect output is trustworthy
+enough to drive an **infrastructure assessment**. Triggered by ad-hoc analysis of
+the live store after Phase 2:
+
+- Of **72 captured sessions, only 31 are real coding sessions**; the rest are
+  health-checks / smoke-tests / interrupted runs (mostly `llm-connect` *"Say hello
+  in one word"*). The `abandoned` outcome heuristic mislabels these, and Phase 2
+  cataloged a **false-positive** "cross-flavor abandoned" pattern as
+  `approved`/`distribution_ready`.
+- All 31 real sessions read as `success`, so the current signal set
+  (outcome + markers + cost) surfaces almost no genuine friction.
+- The already-captured `tool_histogram` tells the real story: **~17% of tool
+  activity in real sessions is State Hub MCP + task plumbing + `ToolSearch`
+  schema-loading**, rising to 40–70% in some sessions, yet `signals.py`
+  never looks at it.
+
+No new capture is needed — this is analysis the data already supports.
+
+## Session-Quality Filter
+
+```task
+id: AGENTIC-WP-0005-T01
+status: done
+priority: high
+state_hub_task_id: "9f8b4304-0a37-4f66-ad34-d93e12fba0d8"
+```
+
+Add `detect/quality.py` with `is_real_coding_session(digest)` that filters out
+health-checks, smoke-tests, interrupted runs, and trivially-short sessions
+(event-count floor, repo present, substantive edit/tool activity, not a single
+hello/interrupt prompt). Wire it into the detect pipeline (`run_detect`) so
+signals/clusters only form over real sessions, fixing the `abandoned` false-positive.
+Knobs live under `[detect.quality]` in `config.toml`. Unit-tested on synthetic trivial-vs-real digests.
+
+## Infra-Overhead + Thrash Signals
+
+```task
+id: AGENTIC-WP-0005-T02
+status: todo
+priority: high
+state_hub_task_id: "10d57b05-a731-4ece-bf45-f6a98ac77555"
+```
+
+Add `tool_histogram`-based extractors to `detect/signals.py`: a shared tool-bucket
+helper (`shell` / `edit` / `read` / `statehub_mcp` / `task_mgmt` / `schema_load` /
+`other`); `sig_infra_overhead` (PROBLEM when the statehub+task+schema share of tool
+calls exceeds a threshold; magnitude = share; locus `infra_overhead`);
+`sig_schema_thrash` (`ToolSearch` count over threshold; locus `schema_load`);
+`sig_tool_thrash` (extreme single-tool repetition). Pure functions over digests;
+thresholds configurable. Unit-tested.
+
+## Re-run Live, Purge False Positives, Ranked Friction Report
+
+```task
+id: AGENTIC-WP-0005-T03
+status: todo
+priority: high
+state_hub_task_id: "8b9d029a-60d0-4caf-af62-4fcc9c9a645c"
+```
+
+Re-run `ingest → detect` over the real local sessions with the filter + new
+signals. Purge the false-positive catalog entries seeded in Phase 2 (the
+health-check `abandoned` pattern) and re-curate so the catalog reflects real
+friction. Produce a ranked **friction assessment** (`docs/ASSESSMENT-infra-friction.md`)
+of the major infrastructure problems — quantified per repo/flavor, infra-overhead
+share, schema-thrash — with recommendations (incl. the State Hub / MCP skill
+hypothesis). After workplan file updates, notify the operator to run from
+`~/state-hub`:
+
+```bash
+make fix-consistency REPO=agentic-resources
+```
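
For readers of this diff, the four checks that T01 names (event floor, repo present, substantive tool activity, non-trivial prompt) can be sketched roughly as below. This is an illustration only, not the contents of `detect/quality.py`: the digest keys (`event_count`, `repo`, `tool_histogram`, `first_prompt`), the `QualityConfig` class, and every threshold value are assumptions, since the actual digest schema and `[detect.quality]` defaults are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityConfig:
    # Hypothetical knobs; the real [detect.quality] keys and defaults may differ.
    min_events: int = 10
    min_tool_calls: int = 3
    min_prompt_chars: int = 40


def is_real_coding_session(digest: dict, cfg: QualityConfig = QualityConfig()) -> bool:
    """Return True only if the digest passes all four checks named in T01."""
    # 1. Event-count floor: trivially short sessions are dropped.
    if digest.get("event_count", 0) < cfg.min_events:
        return False
    # 2. Repo present: health-checks typically run outside any repository.
    if not digest.get("repo"):
        return False
    # 3. Substantive tool activity: total calls across the tool histogram.
    histogram = digest.get("tool_histogram") or {}
    if sum(histogram.values()) < cfg.min_tool_calls:
        return False
    # 4. Non-trivial prompt: a one-line hello or interrupt does not count.
    prompt = (digest.get("first_prompt") or "").strip()
    if len(prompt) < cfg.min_prompt_chars:
        return False
    return True


# Synthetic trivial-vs-real digests (assumed shape, for illustration only).
health_check = {
    "event_count": 4,
    "repo": None,
    "tool_histogram": {},
    "first_prompt": "Say hello in one word",
}
real = {
    "event_count": 120,
    "repo": "agentic-resources",
    "tool_histogram": {"Edit": 14, "Bash": 22, "Read": 9},
    "first_prompt": "Add a session-quality filter to the detect pipeline and wire it in.",
}
```

In this sketch, a caller would keep only the digests for which the predicate holds before extracting signals, for example `real_digests = [d for d in digests if is_real_coding_session(d)]`. Requiring all four checks to pass is a deliberately conservative choice: a session with a long prompt but no repo or tool activity is still excluded.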