session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009)

Closes the loop.

- metrics.py: fleet metrics (infra-overhead share, error rate, schema-thrash, token percentiles, success) + persisted baseline trend.
- effect.py: before/after per-pattern effectiveness with an improved verdict per metric.
- measure entrypoint with trend + --since effectiveness + JSON.

Recorded pre-fix baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8.

13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:49:22 +02:00
parent 035c7a20d3
commit 4f28cd67cf
11 changed files with 497 additions and 5 deletions
--- a/workplans/AGENTIC-WP-0009-session-memory-phase4.md
+++ b/workplans/AGENTIC-WP-0009-session-memory-phase4.md
@@ -4,7 +4,7 @@ type: workplan
 title: "Coding Session Memory — Phase 4 (Measure: effectiveness + fleet trend)"
 domain: helix_forge
 repo: agentic-resources
-status: ready
+status: finished
 owner: codex
 topic_slug: helix-forge
 created: "2026-06-07"
@@ -27,7 +27,7 @@ this is computation over existing digests, not new capture.

 ```task
 id: AGENTIC-WP-0009-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "e5c2016a-2d51-4382-a013-7153e053e8ed"
 ```
@@ -41,7 +41,7 @@ percentiles) and persist a **timestamped baseline snapshot**. Reuses

 ```task
 id: AGENTIC-WP-0009-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "aa097a00-3462-41da-a137-67e1d61d8d33"
 ```
@@ -55,7 +55,7 @@ retired (FR-M1/FR-M2). Unit-tested.

 ```task
 id: AGENTIC-WP-0009-T03
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "f1147d59-2fb7-4d35-baec-b8f001bb9d62"
 ```