agentic-resources/workplans/AGENTIC-WP-0009-session-memory-phase4.md
tegwick 9e6f8a6e08 Register WP-0007 (Distribute), WP-0008 (Read-before-Edit), WP-0009 (Measure)
Three workplans queued and registered with the State Hub (via REST — MCP write
layer is erroring this session):
- AGENTIC-WP-0007 Phase 3 Distribute: per-flavor distributor adapters render
  approved catalog patterns into proposed (HITL) artifacts, scoped by repo/domain.
- AGENTIC-WP-0008 Read-before-Edit reflex: act on the #1 friction finding.
- AGENTIC-WP-0009 Phase 4 Measure: baseline + before/after effectiveness + trend.
Proceeding in that order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:58:03 +02:00

---
id: AGENTIC-WP-0009
type: workplan
title: "Coding Session Memory — Phase 4 (Measure: effectiveness + fleet trend)"
domain: helix_forge
repo: agentic-resources
status: ready
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "99f1d836-3be0-40e5-9f17-63d3ecc5fcca"
---
# Coding Session Memory — Phase 4 (Measure)
Implements **Measure** (PRD §6.5, FR-M1 through FR-M3), the loop-closer. After patterns
are distributed (Phase 3) and changes land (e.g. the State Hub skill
[STATE-WP-0058] and the Read-before-Edit reflex
[AGENTIC-WP-0008](AGENTIC-WP-0008-read-before-edit-reflex.md)), Measure answers:
**did it actually help?**
Reuses what is already captured — WP-0005 tool buckets, WP-0006 error mining — so
this is computation over existing digests, not new capture.
## Baseline Metrics Module + Persisted Baseline
```task
id: AGENTIC-WP-0009-T01
status: todo
priority: high
state_hub_task_id: "e5c2016a-2d51-4382-a013-7153e053e8ed"
```
`session_memory/measure/metrics.py`: compute fleet metrics over real sessions
(infra-overhead share, error rate, recurring-error count, schema-thrash, cost
percentiles) and persist a **timestamped baseline snapshot**. Reuses
`detect.signals.tool_bucket` and the digest `error_snippets`. Unit-tested.
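A minimal sketch of what the baseline computation could look like, under stated assumptions. The digest shape (`tool_calls` with a pre-assigned `bucket`, `error_snippets`, `cost_usd`), the `"infra"` bucket label, and the function names are all hypothetical. In the real module the bucket would come from `detect.signals.tool_bucket`, whose signature is not shown here. Schema-thrash is left out because this workplan does not define it.

```python
import json
import math
from collections import Counter
from datetime import datetime, timezone


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def compute_baseline(digests):
    """Aggregate fleet metrics over a non-empty list of session digests.

    Each digest is assumed (hypothetically) to look like:
    {"tool_calls": [{"bucket": "infra"}, ...],
     "error_snippets": ["..."], "cost_usd": 1.23}
    """
    calls = [c for d in digests for c in d["tool_calls"]]
    infra = sum(1 for c in calls if c["bucket"] == "infra")
    # An error "recurs" here if the same snippet shows up in 2+ sessions.
    seen = Counter(s for d in digests for s in set(d["error_snippets"]))
    costs = sorted(d["cost_usd"] for d in digests)
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sessions": len(digests),
        "infra_overhead_share": infra / len(calls) if calls else 0.0,
        "error_rate": sum(1 for d in digests if d["error_snippets"]) / len(digests),
        "recurring_error_count": sum(1 for n in seen.values() if n >= 2),
        "cost_p50": percentile(costs, 50),
        "cost_p90": percentile(costs, 90),
    }


def persist_baseline(baseline, path):
    """Write the timestamped snapshot as stable, diff-friendly JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(baseline, fh, indent=2, sort_keys=True)
```

Keeping `captured_at` inside the snapshot means later comparisons can tell which baseline they were measured against.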
## Before/After Per-Pattern Effectiveness
```task
id: AGENTIC-WP-0009-T02
status: todo
priority: high
state_hub_task_id: "aa097a00-3462-41da-a137-67e1d61d8d33"
```
Given a change/pattern with an applied-at date, compare sessions **after** it
against the pre-change baseline (cost, error rate, infra-overhead, success) to
surface **per-pattern effectiveness** so ineffective patterns can be revised or
retired (FR-M1/FR-M2). Unit-tested.
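The comparison described above could be sketched roughly as follows. The session fields (`date`, `cost_usd`, `had_error`), the baseline keys, and the `pattern_effectiveness` name are assumptions for illustration only. Infra-overhead and success would follow the same shape and are omitted for brevity.

```python
from datetime import date
from statistics import mean


def pattern_effectiveness(sessions, applied_at, baseline):
    """Compare sessions on or after applied_at against a pre-change baseline.

    sessions: list of hypothetical dicts with "date", "cost_usd", "had_error".
    baseline: dict with "cost" and "error_rate" measured before the change.
    Returns None when no post-change sessions exist yet.
    """
    after = [s for s in sessions if s["date"] >= applied_at]
    if not after:
        return None
    current = {
        "cost": mean(s["cost_usd"] for s in after),
        "error_rate": mean(1.0 if s["had_error"] else 0.0 for s in after),
    }
    # Both metrics are lower-is-better, so a negative delta is an improvement.
    deltas = {name: current[name] - baseline[name] for name in current}
    return {
        "sessions_after": len(after),
        "deltas": deltas,
        "improved": all(delta < 0 for delta in deltas.values()),
    }
```

A plain before/after delta like this does not control for other changes landing in the same window, so a small `sessions_after` count is worth surfacing next to the verdict.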
## Fleet-Trend Report + Entrypoint + Tests
```task
id: AGENTIC-WP-0009-T03
status: todo
priority: medium
state_hub_task_id: "f1147d59-2fb7-4d35-baec-b8f001bb9d62"
```
`python -m session_memory.measure`: reports the fleet-level trend (is the median
session getting cheaper / more reliable over time, FR-M3) plus per-pattern
effectiveness, emitted as Markdown and JSON. Document in `session_memory/README.md`.
After updates, notify the operator to run `make fix-consistency REPO=agentic-resources`.
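One way the fleet-trend part of the report might be computed is a per-week median, sketched below. The session fields (`date`, `cost_usd`), the weekly bucketing, and the function names are assumptions. The workplan does not specify the time window or the output layout.

```python
import json
from collections import defaultdict
from datetime import date
from statistics import median


def weekly_median_cost(sessions):
    """Median session cost per ISO week, oldest week first."""
    by_week = defaultdict(list)
    for s in sessions:
        iso = s["date"].isocalendar()  # (ISO year, ISO week, weekday)
        by_week[(iso[0], iso[1])].append(s["cost_usd"])
    return [
        {"week": f"{year}-W{week:02d}", "median_cost": median(costs)}
        for (year, week), costs in sorted(by_week.items())
    ]


def render_markdown(trend):
    """Render the trend rows as a small Markdown table."""
    lines = ["| week | median cost |", "| --- | --- |"]
    lines += [f"| {row['week']} | {row['median_cost']:.2f} |" for row in trend]
    return "\n".join(lines)


def render_json(trend):
    """Render the same rows as JSON for machine consumers."""
    return json.dumps({"fleet_trend": trend}, indent=2)
```

Both renderers take the same list of rows, so the Markdown and JSON outputs cannot drift apart.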
[STATE-WP-0058]: handed off to the state-hub repo worker