tegwick 04fdc249f5 Bridge Coach memory brief with project metrics summaries.

Add Performance Summary block to memory brief, document metrics synthesis in
agent-coach, and add e2e and CLI tests for qualitative plus quantitative briefs.

2026-06-16 01:46:51 +02:00

id	type	title	domain	repo	status	owner	topic_slug	state_hub_workstream_id	created	updated
KAIZEN-WP-0003	workplan	Measurement Loop: Metrics Convention, Collection, and Optimizer Integration	custodian	kaizen-agentic	active	kaizen-agentic	custodian	36252a45-f360-4496-bf77-17b5dfb02767	2026-06-16	2026-06-17

KAIZEN-WP-0003 — Measurement Loop: Metrics Convention, Collection, and Optimizer Integration

Status: active
Owner: kaizen-agentic
Repo: kaizen-agentic
Target version: 1.1.0 (partial; remainder in WP-0001)

Goal

Close the kaizen feedback loop defined in INTENT.md and wiki/AgentKaizenOptimizer.md: agents produce measurable, per-execution performance records stored in project-scoped .kaizen/metrics/, the existing OptimizationLoop reads that data and generates evidence-based recommendations, and the Coach/optimizer meta-agents share a single improvement path.

This workplan addresses the P0 gap from the INTENT gap analysis: the strategic vision (memory plus qualitative learning) exists, but the path from quantitative measurement to refinement does not.

Background

Layer	State
`INTENT.md`	Requires measurable-by-default agents and evidence-based refinement
`wiki/KaizenAgentTemplate.md`	Defines `metrics`, `idempotency`, `optimization` sections per agent
`wiki/AgentKaizenOptimizer.md`	Specifies `.kaizen/metrics/` storage and optimizer behaviour
`src/kaizen_agentic/optimization.py`	`OptimizationLoop` + `PerformanceMetrics` implemented, unit-tested, unwired
Agency framework (WP-0002)	`.kaizen/agents/<name>/memory.md` + Coach brief — qualitative only
WP-0001 T04	Telemetry — overlaps; WP-0003 defines the convention; WP-0001 can adopt it

Part 1 — Metrics Convention and Storage

Define the project-scoped metrics artifact alongside the existing memory convention (ADR-002).

Location convention

<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl          # append-only per-execution records
  summary.json              # rolling aggregates (regenerated on write)

Optimizer-specific aggregates (per wiki/AgentKaizenOptimizer.md):

<project-root>/.kaizen/metrics/optimizer/
  analysis.json             # last run output + fingerprint
  recommendations.jsonl     # append-only recommendation history

Execution record schema (minimum viable)

{
  "timestamp": "ISO-8601",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": { "name": "...", "value": 0.0, "target": 0.0 },
  "metadata": {}
}
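As an illustration of the schema above, the following is a minimal sketch of building and checking one record in Python. The helper names `make_record` and `validate_record` are hypothetical, and the choice of which fields are mandatory is an assumption, since ADR-004 (T01) is where that would be decided. The 0.0–1.0 range for `quality_score` is taken from the `--quality` option described in Part 2.

```python
from datetime import datetime, timezone

# Assumption: these fields are treated as required; session_id,
# primary_metric and metadata are treated as optional.
REQUIRED_FIELDS = {
    "timestamp": str,
    "agent": str,
    "execution_time_s": (int, float),
    "success": bool,
    "quality_score": (int, float),
}


def make_record(agent, execution_time_s, success, quality_score,
                primary_metric=None, session_id=None, metadata=None):
    """Build one execution record shaped like the schema above."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "execution_time_s": float(execution_time_s),
        "success": bool(success),
        "quality_score": float(quality_score),
        "metadata": metadata or {},
    }
    if session_id is not None:
        record["session_id"] = session_id
    if primary_metric is not None:
        record["primary_metric"] = primary_metric
    return record


def validate_record(record):
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    score = record.get("quality_score")
    if isinstance(score, (int, float)) and not 0.0 <= score <= 1.0:
        problems.append("quality_score outside 0.0-1.0")
    stamp = record.get("timestamp")
    if isinstance(stamp, str):
        try:
            datetime.fromisoformat(stamp)
        except ValueError:
            problems.append("timestamp is not ISO-8601")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a record in one pass.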

Tasks

T01 — Write ADR-004: project metrics convention (location, schema, lifecycle, retention, Helix Forge correlation)
T02 — Implement MetricsStore in src/kaizen_agentic/metrics.py (append, read, summarise, prune by retention)
T03 — Add memory init hook to scaffold .kaizen/metrics/<agent>/ alongside memory (optional flag --no-metrics)
T04 — Unit tests for MetricsStore (append idempotency key, summary regeneration, retention prune)
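To make T02 concrete, here is a rough sketch of the append, read, and summarise operations over the location convention above. It is not the shipped `MetricsStore`: the constructor arguments, method names, and summary keys are assumptions for illustration, and retention pruning and the idempotency key mentioned in T04 are left out.

```python
import json
from pathlib import Path


class MetricsStore:
    """Hypothetical sketch of the store described in T02."""

    def __init__(self, project_root, agent):
        # <project-root>/.kaizen/metrics/<agent-name>/
        self.dir = Path(project_root) / ".kaizen" / "metrics" / agent
        self.executions = self.dir / "executions.jsonl"
        self.summary = self.dir / "summary.json"

    def append(self, record):
        """Append one record as a JSON line, then regenerate summary.json."""
        self.dir.mkdir(parents=True, exist_ok=True)
        with self.executions.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, sort_keys=True) + "\n")
        self.summary.write_text(
            json.dumps(self.summarise(), indent=2), encoding="utf-8"
        )

    def read(self):
        """Return all execution records, oldest first."""
        if not self.executions.exists():
            return []
        with self.executions.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def summarise(self):
        """Compute simple aggregates over every stored record."""
        records = self.read()
        n = len(records)
        if n == 0:
            return {"count": 0}
        return {
            "count": n,
            "success_rate": sum(1 for r in records if r["success"]) / n,
            "avg_quality": sum(r["quality_score"] for r in records) / n,
            "avg_execution_time_s": sum(r["execution_time_s"] for r in records) / n,
        }
```

Recomputing the summary from the full log on every write keeps `summary.json` trivially consistent with `executions.jsonl`. An incremental update would be cheaper for large logs but is harder to keep correct.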

Definition of done

ADR-004 accepted and referenced from docs/agency-framework.md
MetricsStore passes unit tests
kaizen-agentic memory init <agent> creates metrics scaffold by default

Part 2 — Metrics CLI

Expose metrics collection and inspection without requiring Python imports in agent sessions.

Commands

kaizen-agentic metrics record <agent>   # Append one execution record (stdin JSON or flags)
kaizen-agentic metrics show <agent>     # Print summary + recent executions
kaizen-agentic metrics list             # List agents with metrics in current project
kaizen-agentic metrics export <agent>   # Dump executions.jsonl to stdout

Options (record)

--target / -t — project root (default: cwd)
--success / --failure — boolean outcome shorthand
--time — execution time in seconds
--quality — quality score 0.0–1.0
--json — full record on stdin
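The option set for `record` can be sketched as follows. This uses the standard-library `argparse` as a stand-in so the example runs without dependencies; T07 refers to `click.testing`, so the real command group is presumably built with click. The function names, the `dest` names, and the rule that `--json` takes precedence over the individual flags are all assumptions.

```python
import argparse
import json
import sys


def build_parser():
    """Parser mirroring the documented options of `metrics record`."""
    parser = argparse.ArgumentParser(prog="kaizen-agentic metrics record")
    parser.add_argument("agent")
    parser.add_argument("--target", "-t", default=".",
                        help="project root (default: cwd)")
    # --success and --failure write to the same destination and exclude each other.
    outcome = parser.add_mutually_exclusive_group()
    outcome.add_argument("--success", dest="success",
                         action="store_true", default=None)
    outcome.add_argument("--failure", dest="success",
                         action="store_false", default=None)
    parser.add_argument("--time", type=float, dest="execution_time_s",
                        help="execution time in seconds")
    parser.add_argument("--quality", type=float, dest="quality_score",
                        help="quality score 0.0-1.0")
    parser.add_argument("--json", action="store_true", dest="from_stdin",
                        help="read the full record from stdin")
    return parser


def record_from_args(args, stdin=sys.stdin):
    """Turn parsed arguments into a (possibly partial) execution record."""
    if args.from_stdin:
        # Assumption: with --json the stdin payload wins over the flags.
        record = json.load(stdin)
        record.setdefault("agent", args.agent)
        return record
    record = {"agent": args.agent}
    for key in ("success", "execution_time_s", "quality_score"):
        value = getattr(args, key)
        if value is not None:
            record[key] = value
    return record
```

For example, parsing `tdd-workflow --success --time 12.5 --quality 0.9` yields a record with `success`, `execution_time_s`, and `quality_score` set. A timestamp would still need to be added before the record is stored.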

Tasks

T05 — Implement metrics CLI command group (record, show, list, export)
T06 — Integrate metrics record into session-close protocol template for pilot agents
T07 — CLI tests for metrics commands (click.testing, temp project dir)
T08 — Update docs/CLI_CHEAT_SHEET.md and docs/agency-framework.md with metrics section

Definition of done

All four metrics commands work against a test project with .kaizen/metrics/
Session-close template documents the metrics record one-liner for pilot agents
CLI cheat sheet updated

Part 3 — Wire OptimizationLoop to Project Metrics

Connect the existing Python optimization infrastructure to real project data.

Tasks

T09 — Add OptimizationLoop.from_metrics_store(store) factory that loads PerformanceMetrics from executions
T10 — Implement kaizen-agentic metrics optimize [agent] — run analysis, print recommendations, write optimizer/analysis.json
T11 — Consolidate agent-optimization.md and agent-agent-optimization.md into single canonical optimization agent; update registry
T12 — Update agent-optimization.md session protocol to invoke metrics optimize and reference ADR-004
T13 — Unit + integration tests: synthetic executions → recommendations → non-empty output

Definition of done

kaizen-agentic metrics optimize produces recommendations when ≥10 execution records exist (per wiki minimum sample size)
Single canonical optimization meta-agent in registry
Tests cover insufficient-data and sufficient-data paths
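The two paths named in the definition of done can be illustrated with a small threshold check. This is a sketch only and does not reproduce the `OptimizationLoop` API: the function name `recommend` and the shape of its return value are assumptions. The minimum sample size of 10 comes from the criterion above, and the 30 s and 0.8 thresholds are the starting values listed in the Notes.

```python
MIN_SAMPLES = 10             # minimum sample size from the definition of done
MAX_EXECUTION_TIME_S = 30.0  # starting threshold from the Notes
MIN_SUCCESS_RATE = 0.8       # starting threshold from the Notes


def recommend(records):
    """Return simple threshold-based recommendations for one agent.

    Hypothetical helper: covers the insufficient-data path and the
    sufficient-data path with plain averages, no ML.
    """
    n = len(records)
    if n < MIN_SAMPLES:
        return {"status": "insufficient_data", "samples": n, "recommendations": []}

    success_rate = sum(1 for r in records if r["success"]) / n
    avg_time = sum(r["execution_time_s"] for r in records) / n

    recommendations = []
    if success_rate < MIN_SUCCESS_RATE:
        recommendations.append(
            f"success rate {success_rate:.2f} is below {MIN_SUCCESS_RATE}"
        )
    if avg_time > MAX_EXECUTION_TIME_S:
        recommendations.append(
            f"average execution time {avg_time:.1f}s exceeds {MAX_EXECUTION_TIME_S}s"
        )
    return {"status": "ok", "samples": n, "recommendations": recommendations}
```

In this sketch, a healthy agent with enough samples returns an empty recommendation list. Whether the real command should still print something in that case is a design decision this example does not settle.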

Part 4 — Bridge Coach, Memory, and Metrics

Unify qualitative memory and quantitative metrics in the orientation path.

Tasks

T14 — Extend memory brief to include metrics summary for target agent (recent success rate, avg quality, trend arrow)
T15 — Extend agent-coach.md to reference metrics context in synthesis instructions
T16 — E2e test: populate memory + metrics for two agents → memory brief includes both qualitative and quantitative sections

Definition of done

memory brief tdd-workflow output includes a ## Performance Summary block when metrics exist
E2e test passes
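One possible shape for the `## Performance Summary` block is sketched below. The window sizes, the 0.05 trend tolerance, and the exact line layout are assumptions chosen for illustration; the source only specifies that the block reports recent success rate, average quality, and a trend arrow.

```python
def trend_arrow(records, window=5, tolerance=0.05):
    """Compare mean quality of the latest window with the window before it."""
    recent = records[-window:]
    previous = records[-2 * window:-window]
    if not recent or not previous:
        return "→"  # not enough history to call a trend

    def mean_quality(rows):
        return sum(r["quality_score"] for r in rows) / len(rows)

    delta = mean_quality(recent) - mean_quality(previous)
    if delta > tolerance:
        return "↑"
    if delta < -tolerance:
        return "↓"
    return "→"


def performance_summary(records, recent=10):
    """Render a Markdown block for the brief; empty string when no metrics exist."""
    if not records:
        return ""
    window = records[-recent:]
    n = len(window)
    success_rate = sum(1 for r in window if r["success"]) / n
    avg_quality = sum(r["quality_score"] for r in window) / n
    lines = [
        "## Performance Summary",
        "",
        f"- Executions considered: {n}",
        f"- Success rate: {success_rate:.0%}",
        f"- Avg quality: {avg_quality:.2f} {trend_arrow(records)}",
    ]
    return "\n".join(lines)
```

Returning an empty string when there are no records matches the "when metrics exist" condition: the brief can append the result unconditionally and nothing appears for agents without metrics.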

Part 5 — Pilot Agent and Template Conformance

Prove the loop end-to-end on one agent before fleet-wide rollout.

Pilot agent: tdd-workflow (high usage, clear success criteria in existing prompt)

Tasks

T17 — Add metrics section to agent-tdd-workflow.md frontmatter (primary: test-pass rate; secondary: cycle time)
T18 — Add session-close step: invoke kaizen-agentic metrics record tdd-workflow with session outcome
T19 — Document pilot in wiki/AboutKaizenAgents.md as reference implementation
T20 — E2e test: two simulated tdd-workflow sessions → metrics accumulate → optimize produces recommendation

Definition of done

tdd-workflow is the documented reference for metrics-enabled agents
Full loop demonstrated in e2e test: record → show → optimize → brief

Part 6 — Packaging and Orientation

Close distribution and documentation gaps surfaced in gap analysis.

Tasks

T21 — Sync the 4 missing agents into src/kaizen_agentic/data/agents/ (coach, sys-medic, scope-analyst, optimization)
T22 — Update README.md Getting Oriented to link INTENT.md and wiki/ (SCOPE.md already updated)
T23 — Update .claude/rules/architecture.md agent table (21 agents, meta category, sys-medic, coach)
T24 — CHANGELOG.md entry for metrics convention and CLI

Definition of done

pip install / packaged data includes all 21 agents
README orientation path matches SCOPE.md
architecture.md agent count accurate

Sequencing

Part 1 (T01–T04)  ──→  Part 2 (T05–T08)  ──→  Part 3 (T09–T13)
                                                    │
                     Part 4 (T14–T16)  ←────────────┘
                            │
                     Part 5 (T17–T20)  ──→  Part 6 (T21–T24)

Parts 1–2 are blocking. Part 3 depends on both the storage layer (Part 1) and the CLI (Part 2). Parts 4–5 can overlap once the Part 3 factory (T09) exists. Part 6 can run in parallel with the other parts, except T21, which needs the final agent consolidation from T11.

Estimated effort: 4–6 sessions.

Out of Scope (this workplan)

Full wiki/KaizenAgentTemplate.md conformance for all 21 agents (future workplan)
KaizenGuidance codemod pipeline (wiki/KaizenGuidance.md)
Scheduled/automated optimizer runs (cron, activity-core integration) — convention only
WP-0001 CI/CD, PyPI publication, cross-platform testing
ML-based pattern detection (pandas/sklearn in wiki spec) — simple statistics first

Success Criteria

A reader of INTENT.md can point to this repo and say:

Agents can record measurable per-execution outcomes in a standard location.
The optimization loop reads real project data and produces recommendations.
Coach orientation includes performance context, not only qualitative memory.
At least one agent (tdd-workflow) demonstrates the full measure → analyse → orient cycle.

State Hub Task IDs

Code	UUID
T01	4e7b0fd2-38c0-46aa-84a7-bb18366b8c7c
T02	eeaa99c7-d7a7-403b-a013-364cba45a663
T03	247c097f-de89-4383-930c-35ee66de9b36
T04	3aa14026-6ee3-4384-b409-11300c1302f0
T05	6b505d29-7d2e-44a2-a4b7-1fe82884390c
T06	84f2a357-f2dd-4fc7-96b6-a4e80d5467a7
T07	8e9ee64b-b7c4-4dff-ac6e-988fd47ef95d
T08	4c41e0db-d5d8-4a1b-b346-06ad004edf4a
T09	0b374439-6eca-4754-8e15-2a7eece0cd27
T10	db87a09b-0252-495c-a771-a43b4b98f820
T11	73cb7d73-6fc6-42a9-97aa-d33cdf9ee363
T12	c127eca7-7394-42db-ba5e-721aef0ccb76
T13	f208dc9f-cdf7-47e3-9c03-09097e46eee9
T14	d01f969c-bbb1-4eca-a4f1-d79d5c867b35
T15	67f791a4-fced-4986-a331-7eb4ea47fe6e
T16	1fb89b54-8bd2-40bf-9a71-04693cb9f695
T17	1d471a7a-9a98-4805-903e-b4a2b8153717
T18	abb387f1-86ce-4b9b-a516-2d4efb6aca4c
T19	67fbc26e-a57d-4133-96e6-3d2cdbd10dc0
T20	fbdd7c8b-e122-48d9-8c8f-de9f82d025e3
T21	9662bcec-34fe-451b-b61f-5d11b9574576
T22	422aae43-5697-4a00-86e9-1569baf09422
T23	ba6b3411-d330-4a58-8cd0-62b4fbef8c5f
T24	748be9f3-f6ac-4f26-a844-6330268935b6

Hub workstream: kaizen-wp-0003-measurement-loop (36252a45-f360-4496-bf77-17b5dfb02767)

Notes

Retention default: 180 days (per wiki/AgentKaizenOptimizer.md); override via project config in a later iteration
WP-0001 T04 (telemetry) should consume ADR-004 schema rather than inventing a parallel format
OptimizationLoop threshold constants (30s execution, 0.8 success rate) are starting points; expose in config later
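The retention default in the first note could be applied with a prune step along these lines. The function name `prune`, its parameters, and the treatment of timezone-naive timestamps as UTC are assumptions; only the 180-day default comes from the note above.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION_DAYS = 180  # default from the note above


def prune(executions_path, now=None, retention_days=RETENTION_DAYS):
    """Drop records older than the retention window; return how many were dropped.

    Hypothetical helper: rewrites the file through a temporary sibling so a
    crash mid-write does not leave a truncated executions.jsonl behind.
    """
    path = Path(executions_path)
    if not path.exists():
        return 0
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)

    kept, dropped = [], 0
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        stamp = datetime.fromisoformat(json.loads(line)["timestamp"])
        if stamp.tzinfo is None:
            # Assumption: timestamps without an offset are UTC.
            stamp = stamp.replace(tzinfo=timezone.utc)
        if stamp >= cutoff:
            kept.append(line)
        else:
            dropped += 1

    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    tmp.replace(path)
    return dropped
```

Pruning is the one operation that rewrites the otherwise append-only log, so it is worth running it separately from `append` rather than on every write.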
