agentic-resources/workplans/AGENTIC-WP-0005-detect-hardening.md
tegwick 48618293b0 session-memory: friction assessment + hardened catalog (WP-0005 T03)
Re-ran ingest->detect with the quality filter + infra signals over real local
sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog
entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead
patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real
tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls;
ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as top-2;
recommends a State Hub skill (front-load schemas + batched writes) + bulk hub ops.
Workplan finished; suite 88/88.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:18:27 +02:00

---
id: AGENTIC-WP-0005
type: workplan
title: "Coding Session Memory — Detect Hardening (quality filter + infra signals)"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "d8b7b8d1-1d85-4d2a-8ccd-7b0366a9442d"
---
# Coding Session Memory — Detect Hardening
A focused hardening pass (call it Phase 1.5) so the Detect output is trustworthy
enough to drive an **infrastructure assessment**. Triggered by ad-hoc analysis of
the live store after Phase 2:
- Of **72 captured sessions, only 31 are real coding sessions**; the rest are
health-checks / smoke-tests / interrupted runs (mostly `llm-connect` *"Say hello
in one word"*). The `abandoned` outcome heuristic mislabels these, and Phase 2
cataloged a **false-positive** "cross-flavor abandoned" pattern as
`approved`/`distribution_ready`.
- All 31 real sessions read as `success`, so the current signal set
(outcome + markers + cost) surfaces almost no genuine friction.
- The already-captured `tool_histogram` tells the real story: **~17% of tool
activity in real sessions is State Hub MCP + task plumbing + `ToolSearch`
schema-loading**, concentrated to 40-70% in some sessions, but `signals.py`
never looks at it.
No new capture is needed — this is analysis the data already supports.
## Session-Quality Filter
```task
id: AGENTIC-WP-0005-T01
status: done
priority: high
state_hub_task_id: "9f8b4304-0a37-4f66-ad34-d93e12fba0d8"
```
Add `detect/quality.py` with `is_real_coding_session(digest)` that filters out
health-checks, smoke-tests, interrupted, and trivially-short sessions (event-count
floor, repo present, substantive edit/tool activity, not a single hello/interrupt
prompt). Wire it into the detect pipeline so signals/clusters only form over real
sessions — fixing the `abandoned` false-positive. Knobs under `[detect]` in
`config.toml`. Unit-tested on synthetic trivial-vs-real digests.
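A minimal sketch of what such a filter could look like, assuming a digest is a plain dict. The field names (`event_count`, `repo`, `interrupted`, `tool_histogram`, `prompts`) and the `QualityConfig` defaults are hypothetical placeholders for illustration, not the actual digest schema or the real `[detect]` knobs.

```python
from dataclasses import dataclass


# Hypothetical knobs; in the workplan these live under [detect] in config.toml.
@dataclass(frozen=True)
class QualityConfig:
    min_events: int = 10          # event-count floor
    min_tool_calls: int = 3       # floor for "substantive" tool activity
    trivial_prompts: tuple = ("say hello in one word",)


def is_real_coding_session(digest: dict, cfg: QualityConfig = QualityConfig()) -> bool:
    """Return True only for sessions that look like real coding work.

    All digest keys used here are assumptions about the digest shape.
    """
    # Trivially short sessions (health-checks, smoke-tests).
    if digest.get("event_count", 0) < cfg.min_events:
        return False
    # A real coding session is attached to a repo.
    if not digest.get("repo"):
        return False
    # Interrupted runs are excluded outright.
    if digest.get("interrupted", False):
        return False
    # Require some substantive edit/tool activity.
    if sum(digest.get("tool_histogram", {}).values()) < cfg.min_tool_calls:
        return False
    # A single hello-style prompt is a connectivity check, not a session.
    prompts = [p.strip().lower().rstrip(".!") for p in digest.get("prompts", [])]
    if len(prompts) == 1 and prompts[0] in cfg.trivial_prompts:
        return False
    return True
```

Keeping the predicate a pure function of the digest and a config object makes it easy to unit-test on synthetic trivial-vs-real digests.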
## Infra-Overhead + Thrash Signals
```task
id: AGENTIC-WP-0005-T02
status: done
priority: high
state_hub_task_id: "10d57b05-a731-4ece-bf45-f6a98ac77555"
```
Add `tool_histogram`-based extractors to `detect/signals.py`:
- a shared tool-bucket helper (`shell` / `edit` / `read` / `statehub_mcp` /
`task_mgmt` / `schema_load` / `other`);
- `sig_infra_overhead`: PROBLEM when the statehub+task+schema share of tool calls
exceeds a threshold; magnitude = share; locus `infra_overhead`;
- `sig_schema_thrash`: `ToolSearch` count over threshold; locus `schema_load`;
- `sig_tool_thrash`: extreme single-tool repetition.

All are pure functions over digests with configurable thresholds. Unit-tested.
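The extractors described above can be sketched roughly as follows. This is an illustration under stated assumptions: the tool-name matching rules (for example the `mcp__state-hub` prefix), the threshold defaults, the signal dict shape, and the locus used for `sig_tool_thrash` are hypothetical and not taken from the actual `detect/signals.py`.

```python
from typing import Optional

INFRA_BUCKETS = ("statehub_mcp", "task_mgmt", "schema_load")


def bucket_for(tool: str) -> str:
    """Map a tool name to a bucket. Matching rules here are assumptions."""
    name = tool.lower()
    if name == "toolsearch":
        return "schema_load"
    if name.startswith("mcp__state-hub"):  # assumed State Hub MCP prefix
        return "statehub_mcp"
    if name.startswith("task") or name == "todowrite":
        return "task_mgmt"
    if name == "bash":
        return "shell"
    if name in ("edit", "write", "multiedit"):
        return "edit"
    if name in ("read", "grep", "glob"):
        return "read"
    return "other"


def bucket_counts(tool_histogram: dict) -> dict:
    """Aggregate per-tool call counts into per-bucket counts."""
    counts: dict = {}
    for tool, n in tool_histogram.items():
        bucket = bucket_for(tool)
        counts[bucket] = counts.get(bucket, 0) + n
    return counts


def sig_infra_overhead(digest: dict, threshold: float = 0.25) -> Optional[dict]:
    """PROBLEM when the statehub+task+schema share of tool calls exceeds threshold."""
    counts = bucket_counts(digest.get("tool_histogram", {}))
    total = sum(counts.values())
    if total == 0:
        return None
    share = sum(counts.get(b, 0) for b in INFRA_BUCKETS) / total
    if share <= threshold:
        return None
    return {"kind": "PROBLEM", "locus": "infra_overhead", "magnitude": share}


def sig_schema_thrash(digest: dict, threshold: int = 5) -> Optional[dict]:
    """PROBLEM when the ToolSearch (schema_load) call count exceeds threshold."""
    n = bucket_counts(digest.get("tool_histogram", {})).get("schema_load", 0)
    if n <= threshold:
        return None
    return {"kind": "PROBLEM", "locus": "schema_load", "magnitude": n}


def sig_tool_thrash(digest: dict, threshold: float = 0.6,
                    min_calls: int = 20) -> Optional[dict]:
    """PROBLEM when one tool dominates a sufficiently long session."""
    hist = digest.get("tool_histogram", {})
    total = sum(hist.values())
    if total < min_calls:
        return None
    tool, n = max(hist.items(), key=lambda kv: kv[1])
    if n / total <= threshold:
        return None
    # Using the dominant tool's bucket as locus is an assumption.
    return {"kind": "PROBLEM", "locus": bucket_for(tool), "magnitude": n / total}
```

Each extractor returns `None` when nothing fires, so the pipeline can collect signals by dropping the `None` results.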
## Re-run Live, Purge False Positives, Ranked Friction Report
```task
id: AGENTIC-WP-0005-T03
status: done
priority: high
state_hub_task_id: "8b9d029a-60d0-4caf-af62-4fcc9c9a645c"
```
Re-run `ingest → detect` over the real local sessions with the filter + new
signals. Purge the false-positive catalog entries seeded in Phase 2 (the
health-check `abandoned` pattern) and re-curate so the catalog reflects real
friction. Produce a ranked **friction assessment** (`docs/ASSESSMENT-infra-friction.md`)
of the major infrastructure problems — quantified per repo/flavor, infra-overhead
share, schema-thrash — with recommendations (incl. the State Hub / MCP skill
hypothesis). After workplan file updates, notify the operator to run from
`~/state-hub`:
```bash
make fix-consistency REPO=agentic-resources
```