WP-0003 Part 5: tdd-workflow metrics pilot

Add metrics frontmatter and session-close recording to tdd-workflow, document the reference implementation in wiki/AboutKaizenAgents.md, and add an e2e test covering record → show → optimize → brief.
2026-06-16 01:48:43 +02:00
parent 04fdc249f5
commit fd2edfbe6c
4 changed files with 231 additions and 17 deletions
--- a/wiki/AboutKaizenAgents.md
+++ b/wiki/AboutKaizenAgents.md
@@ -1,24 +1,76 @@
-AboutKaizenAgents
+# About Kaizen Agents

-*Basic concepts of Kaizen Agents*
+Basic concepts of Kaizen Agents.

-All Kaizen Agents follow the KaizenAgentTemplateDefinition 
+All Kaizen Agents follow the [KaizenAgentTemplate](KaizenAgentTemplate.md) definition.
+That template provides a comprehensive structure for defining Kaizen Agent subagents.

-This template provides a comprehensive structure for defining KaizenAgent subagents. 
+Key sections:

-The key sections are:
+- **Specification** — declarative outcomes rather than implementation steps
+- **Idempotency design** — detect and handle already-completed work
+- **Metrics** — measurable success criteria from day one
+- **Testing** — scenarios that feed the optimization loop
+- **Evolution tracking** — improvement history and performance trends

-Specification: Focuses on declarative outcomes rather than implementation steps, making agents more maintainable and testable.
+The template enforces separation of concerns, testability, and measurability while
+keeping agent definitions consistent across the fleet.

-Idempotency Design: Forces you to think upfront about how the agent will detect and handle already-completed work.
+---

-Metrics: Ensures every agent has measurable success criteria from day one.
+## Metrics-enabled pilot: `tdd-workflow`

-Testing: Built-in test scenarios that can be automated as part of the optimization loop.
+`tdd-workflow` is the reference implementation for project-scoped metrics (WP-0003).
+Use it as a template when adding metrics to other agents.

-Evolution Tracking: Maintains a history of improvements and provides hooks for the KaizenAgent to analyze performance trends.
+### What is measured

-The template enforces our design principles  - separation of concerns, testability, and measurability - while providing enough structure to ensure consistency across different coding subagents.
+| Metric | Role | How |
+|--------|------|-----|
+| `test_pass_rate` | Primary | Passing tests ÷ total tests at PUBLISH (target: 1.0) |
+| `cycle_time_s` | Secondary | Session duration (`execution_time_s` in ADR-004) |

+Definitions live in the agent frontmatter (`agents/agent-tdd-workflow.md`).

-xxx
+### Where data lives
+
+```
+<project>/.kaizen/metrics/tdd-workflow/
+  executions.jsonl    # append-only per-session records
+  summary.json        # rolling aggregates (auto-generated)
+```
+
+Scaffolded by `kaizen-agentic memory init tdd-workflow` alongside
+`.kaizen/agents/tdd-workflow/memory.md`.
+
+### Session-close loop
+
+At the end of each TDD8 session:
+
+1. Update qualitative memory (`## Session Log`, findings, watch points).
+2. Record quantitative outcome:
+
+```bash
+kaizen-agentic metrics record tdd-workflow --success --time <seconds> --quality <0.0-1.0>
+```
+
+Alternatively, pass a full ADR-004 record that includes `primary_metric` via `--json` (see the agent spec).
+
+### Analysis and orientation
+
+| Command | Purpose |
+|---------|---------|
+| `kaizen-agentic metrics show tdd-workflow` | Summary + recent executions |
+| `kaizen-agentic metrics optimize tdd-workflow` | Evidence-based recommendations (requires ≥10 records) |
+| `kaizen-agentic memory brief tdd-workflow` | Qualitative memory + `## Performance Summary` |
+
+Fleet-level session analytics remain in **agentic-resources** (Helix Forge); project
+metrics stay in `.kaizen/metrics/` per [ADR-004](../docs/adr/ADR-004-project-metrics-convention.md)
+and [EcosystemIntegration](EcosystemIntegration.md).
+
+### Adopting metrics on another agent
+
+1. Add a `metrics:` block to frontmatter (primary + secondary + collection).
+2. Copy the session-close `metrics record` step from `agent-tdd-workflow.md`.
+3. Run `kaizen-agentic memory init <agent>` to scaffold storage.
+4. Verify with `metrics show` after one session.
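
Reviewer note: the storage layout and metric definitions added above can be illustrated with a small Python sketch. It is not the `kaizen-agentic` implementation. It assumes that each line of `executions.jsonl` is one JSON object, that `primary_metric` holds the pass rate as a plain float, and that the aggregate field names (`executions`, `mean_test_pass_rate`, `mean_cycle_time_s`) are acceptable stand-ins; none of these are confirmed by the diff or by ADR-004. The helper names are hypothetical, and the demo writes only to a temporary directory so it never touches real project metrics.

```python
import json
import tempfile
from pathlib import Path


def compute_pass_rate(passed: int, total: int) -> float:
    """Passing tests divided by total tests; 0.0 when no tests ran (assumed convention)."""
    return passed / total if total else 0.0


def append_record(metrics_dir: Path, record: dict) -> None:
    """Append one session record as a single JSON line (append-only log)."""
    metrics_dir.mkdir(parents=True, exist_ok=True)
    with (metrics_dir / "executions.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def summarize(metrics_dir: Path) -> dict:
    """Compute aggregates in memory; the output field names are hypothetical."""
    text = (metrics_dir / "executions.jsonl").read_text(encoding="utf-8")
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    count = len(records)
    return {
        "executions": count,
        "mean_test_pass_rate": sum(r["primary_metric"] for r in records) / count,
        "mean_cycle_time_s": sum(r["execution_time_s"] for r in records) / count,
    }


def demo() -> dict:
    """Record two made-up sessions in a temporary directory and summarize them."""
    with tempfile.TemporaryDirectory() as tmp:
        metrics_dir = Path(tmp) / ".kaizen" / "metrics" / "tdd-workflow"
        # Record shape is an assumption, not the ADR-004 schema.
        append_record(metrics_dir, {
            "success": True,
            "execution_time_s": 600,
            "primary_metric": compute_pass_rate(18, 20),
        })
        append_record(metrics_dir, {
            "success": True,
            "execution_time_s": 400,
            "primary_metric": compute_pass_rate(20, 20),
        })
        return summarize(metrics_dir)


if __name__ == "__main__":
    print(demo())
```

In a real project the CLI owns both files, so a script like this would at most read `executions.jsonl` for ad hoc analysis and leave `summary.json` to the tool.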