session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009)

Closes the loop. metrics.py: fleet metrics (infra-overhead share, error rate,
schema-thrash, token percentiles, success) + persisted baseline trend. effect.py:
before/after per-pattern effectiveness with an improved verdict per metric.
measure entrypoint with trend + --since effectiveness + JSON. Recorded pre-fix
baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8.
13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
2026-06-07 15:49:22 +02:00
parent 035c7a20d3
commit 4f28cd67cf
11 changed files with 497 additions and 5 deletions

View File

@@ -4,7 +4,7 @@ type: workplan
title: "Coding Session Memory — Phase 4 (Measure: effectiveness + fleet trend)"
domain: helix_forge
repo: agentic-resources
status: ready
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
@@ -27,7 +27,7 @@ this is computation over existing digests, not new capture.
```task
id: AGENTIC-WP-0009-T01
status: todo
status: done
priority: high
state_hub_task_id: "e5c2016a-2d51-4382-a013-7153e053e8ed"
```
@@ -41,7 +41,7 @@ percentiles) and persist a **timestamped baseline snapshot**. Reuses
```task
id: AGENTIC-WP-0009-T02
status: todo
status: done
priority: high
state_hub_task_id: "aa097a00-3462-41da-a137-67e1d61d8d33"
```
@@ -55,7 +55,7 @@ retired (FR-M1/FR-M2). Unit-tested.
```task
id: AGENTIC-WP-0009-T03
status: todo
status: done
priority: medium
state_hub_task_id: "f1147d59-2fb7-4d35-baec-b8f001bb9d62"
```