---
id: AGENTIC-WP-0009
type: workplan
title: "Coding Session Memory — Phase 4 (Measure: effectiveness + fleet trend)"
domain: helix_forge
repo: agentic-resources
status: ready
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "99f1d836-3be0-40e5-9f17-63d3ecc5fcca"
---

# Coding Session Memory — Phase 4 (Measure)

Implements **Measure** (PRD §6.5, FR-M1–FR-M3), the loop-closer. After patterns
are distributed (Phase 3) and changes land (e.g. the State Hub skill
[STATE-WP-0058] and the Read-before-Edit reflex
[AGENTIC-WP-0008](AGENTIC-WP-0008-read-before-edit-reflex.md)), Measure answers:
**did it actually help?**

Reuses what is already captured (WP-0005 tool buckets, WP-0006 error mining), so
this is computation over existing digests, not new capture.

## Baseline Metrics Module + Persisted Baseline

```task
id: AGENTIC-WP-0009-T01
status: todo
priority: high
state_hub_task_id: "e5c2016a-2d51-4382-a013-7153e053e8ed"
```

`session_memory/measure/metrics.py`: compute fleet metrics over real sessions
(infra-overhead share, error rate, recurring-error count, schema-thrash, cost
percentiles) and persist a **timestamped baseline snapshot**. Reuses
`detect.signals.tool_bucket` and the digest `error_snippets`. Unit-tested.

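
As a rough illustration of what the baseline computation could look like, here is a minimal sketch. It is not the planned module: the digest fields `tool_calls` and `cost_usd`, the helper `bucket_of` (a stand-in for `detect.signals.tool_bucket`), the `INFRA_TOOLS` set, and the snapshot filename scheme are all assumptions made for the example. Schema-thrash is left out because this workplan does not define it.

```python
import json
import statistics
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical stand-in for detect.signals.tool_bucket; the real bucketing
# rules live in the repo and are not reproduced here.
INFRA_TOOLS = {"git", "make", "state_hub"}


def bucket_of(tool_name: str) -> str:
    return "infra" if tool_name in INFRA_TOOLS else "work"


def compute_baseline(digests: list[dict]) -> dict:
    """Compute fleet metrics over a non-empty list of session digests."""
    if not digests:
        raise ValueError("need at least one session digest")

    tools = [t for d in digests for t in d.get("tool_calls", [])]
    infra_calls = sum(1 for t in tools if bucket_of(t) == "infra")

    # Count each distinct snippet once per session, so "recurring" means
    # "seen in two or more sessions", not "repeated within one session".
    sessions_per_snippet = Counter(
        s for d in digests for s in set(d.get("error_snippets", []))
    )

    costs = sorted(d["cost_usd"] for d in digests)
    if len(costs) >= 2:
        deciles = statistics.quantiles(costs, n=10, method="inclusive")
        p50, p90 = deciles[4], deciles[8]
    else:
        p50 = p90 = costs[0]

    return {
        "sessions": len(digests),
        "infra_overhead_share": infra_calls / len(tools) if tools else 0.0,
        "error_rate": sum(1 for d in digests if d.get("error_snippets")) / len(digests),
        "recurring_error_count": sum(1 for n in sessions_per_snippet.values() if n >= 2),
        "cost_p50": p50,
        "cost_p90": p90,
    }


def save_baseline(metrics: dict, out_dir: Path) -> Path:
    """Write the metrics as a timestamped JSON snapshot and return its path."""
    taken_at = datetime.now(timezone.utc)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"baseline-{taken_at:%Y%m%dT%H%M%SZ}.json"
    path.write_text(json.dumps({"taken_at": taken_at.isoformat(), **metrics}, indent=2))
    return path
```

Storing the capture time inside the snapshot as well as in the filename keeps the baseline self-describing if the file is later renamed or copied.
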
## Before/After Per-Pattern Effectiveness

```task
id: AGENTIC-WP-0009-T02
status: todo
priority: high
state_hub_task_id: "aa097a00-3462-41da-a137-67e1d61d8d33"
```

Given a change/pattern with an applied-at date, compare sessions **after** it
against the pre-change baseline (cost, error rate, infra-overhead, success) to
surface **per-pattern effectiveness** so ineffective patterns can be revised or
retired (FR-M1/FR-M2). Unit-tested.

## Fleet-Trend Report + Entrypoint + Tests

```task
id: AGENTIC-WP-0009-T03
status: todo
priority: medium
state_hub_task_id: "f1147d59-2fb7-4d35-baec-b8f001bb9d62"
```

`python -m session_memory.measure`: fleet-level trend (is the median session
getting cheaper / more reliable over time, FR-M3) plus per-pattern effectiveness;
markdown + JSON. Document in `session_memory/README.md`. After updates, notify the
operator to run `make fix-consistency REPO=agentic-resources`.

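
One possible shape for the trend computation and its two output formats is sketched below. It is only an illustration: the monthly grouping, the function names, and the session fields `started_at` and `cost_usd` are assumptions, and the real entrypoint may bucket time and define "improving" differently.

```python
import json
from collections import defaultdict
from statistics import median


def monthly_median_cost(sessions: list[dict]) -> dict[str, float]:
    """Group sessions by YYYY-MM and return the median cost per month, in order."""
    by_month: dict[str, list[float]] = defaultdict(list)
    for s in sessions:
        by_month[s["started_at"][:7]].append(s["cost_usd"])
    return {month: median(costs) for month, costs in sorted(by_month.items())}


def is_improving(trend: dict[str, float]) -> bool:
    """Crude check: the latest month's median is lower than the first month's."""
    values = list(trend.values())
    return len(values) >= 2 and values[-1] < values[0]


def render_markdown(trend: dict[str, float]) -> str:
    lines = ["| Month | Median cost (USD) |", "| --- | --- |"]
    lines += [f"| {month} | {cost:.2f} |" for month, cost in trend.items()]
    return "\n".join(lines)


def render_json(trend: dict[str, float]) -> str:
    payload = {"median_cost_by_month": trend, "improving": is_improving(trend)}
    return json.dumps(payload, indent=2, sort_keys=True)
```

Both renderers read from the same `trend` mapping, so the markdown report and the JSON output cannot drift apart.
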
[STATE-WP-0058]: handed off to the state-hub repo worker