WP-0003 Part 5: tdd-workflow metrics pilot

Add metrics frontmatter and session-close recording to tdd-workflow,
document the reference implementation in wiki/AboutKaizenAgents.md,
and add an e2e test covering record → show → optimize → brief.
2026-06-16 01:48:43 +02:00
parent 04fdc249f5
commit fd2edfbe6c
4 changed files with 231 additions and 17 deletions


@@ -1,24 +1,76 @@
AboutKaizenAgents
# About Kaizen Agents
*Basic concepts of Kaizen Agents*
Basic concepts of Kaizen Agents.
All Kaizen Agents follow the KaizenAgentTemplateDefinition
All Kaizen Agents follow the [KaizenAgentTemplate](KaizenAgentTemplate.md) definition.
That template provides a comprehensive structure for defining Kaizen Agent subagents.
This template provides a comprehensive structure for defining KaizenAgent subagents.
Key sections:
The key sections are:
- **Specification** — declarative outcomes rather than implementation steps
- **Idempotency design** — detect and handle already-completed work
- **Metrics** — measurable success criteria from day one
- **Testing** — scenarios that feed the optimization loop
- **Evolution tracking** — improvement history and performance trends
Specification: Focuses on declarative outcomes rather than implementation steps, making agents more maintainable and testable.
The template enforces separation of concerns, testability, and measurability while
keeping agent definitions consistent across the fleet.
Idempotency Design: Forces you to think upfront about how the agent will detect and handle already-completed work.
---
Metrics: Ensures every agent has measurable success criteria from day one.
## Metrics-enabled pilot: `tdd-workflow`
Testing: Built-in test scenarios that can be automated as part of the optimization loop.
`tdd-workflow` is the reference implementation for project-scoped metrics (WP-0003).
Use it as a template when adding metrics to other agents.
Evolution Tracking: Maintains a history of improvements and provides hooks for the KaizenAgent to analyze performance trends.
### What is measured
The template enforces our design principles - separation of concerns, testability, and measurability - while providing enough structure to ensure consistency across different coding subagents.
| Metric | Role | How |
|--------|------|-----|
| `test_pass_rate` | Primary | Passing tests ÷ total tests at PUBLISH (target: 1.0) |
| `cycle_time_s` | Secondary | Session duration in seconds (recorded as `execution_time_s` in ADR-004) |
Definitions live in the agent frontmatter (`agents/agent-tdd-workflow.md`).
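As a rough illustration of the two metrics in the table, the sketch below derives both values from raw session numbers. The `SessionResult` container and both helper functions are hypothetical names introduced here, not part of `kaizen-agentic`; treating a session with zero tests as a pass rate of 0.0 is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class SessionResult:
    """Hypothetical container for the raw numbers of one TDD session."""
    passed: int         # tests passing at PUBLISH
    total: int          # total tests at PUBLISH
    started_at: float   # session start, epoch seconds
    finished_at: float  # session end, epoch seconds


def compute_pass_rate(result: SessionResult) -> float:
    """Primary metric: passing tests divided by total tests (target 1.0)."""
    if result.total == 0:
        # Assumption: a session without tests counts as 0.0, not as an error.
        return 0.0
    return result.passed / result.total


def compute_cycle_time_s(result: SessionResult) -> float:
    """Secondary metric: session duration in seconds."""
    return result.finished_at - result.started_at


session = SessionResult(passed=18, total=20, started_at=0.0, finished_at=420.0)
print(compute_pass_rate(session), compute_cycle_time_s(session))
```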
### Where data lives
```
<project>/.kaizen/metrics/tdd-workflow/
  executions.jsonl   # append-only per-session records
  summary.json       # rolling aggregates (auto-generated)
```
This directory is scaffolded by `kaizen-agentic memory init tdd-workflow`, alongside
`.kaizen/agents/tdd-workflow/memory.md`.
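To make the relationship between the two files concrete, here is a minimal sketch of an append-only log with a regenerated summary. It is not the `kaizen-agentic` implementation. The record field `success` and all summary keys are assumptions made for illustration; only `execution_time_s` is named in this document, and the real schema is defined by ADR-004.

```python
import json
from pathlib import Path


def append_execution(metrics_dir: Path, record: dict) -> None:
    """Append one session record as a single JSON line (never rewrites old lines)."""
    metrics_dir.mkdir(parents=True, exist_ok=True)
    with (metrics_dir / "executions.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def rebuild_summary(metrics_dir: Path) -> dict:
    """Recompute aggregates from the full log and overwrite summary.json."""
    log_path = metrics_dir / "executions.jsonl"
    lines = log_path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    count = len(records)
    # Hypothetical summary keys; the real file layout may differ.
    summary = {
        "count": count,
        "success_rate": (
            sum(1 for r in records if r["success"]) / count if count else 0.0
        ),
        "mean_execution_time_s": (
            sum(r["execution_time_s"] for r in records) / count if count else 0.0
        ),
    }
    (metrics_dir / "summary.json").write_text(
        json.dumps(summary, indent=2), encoding="utf-8"
    )
    return summary
```

Because the summary is derived entirely from the log, it can be deleted and regenerated at any time without losing data.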
### Session-close loop
At the end of each TDD8 session:
1. Update qualitative memory (`## Session Log`, findings, watch points).
2. Record quantitative outcome:
```bash
kaizen-agentic metrics record tdd-workflow --success --time <seconds> --quality <0.0-1.0>
```
Alternatively, pass a full ADR-004 record that includes `primary_metric` via `--json` (see the agent spec).
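If the recording step is scripted, the command line can be assembled from the session numbers. The helper below is a hypothetical sketch: it uses only the flags shown above (`--success`, `--time`, `--quality`), covers only the successful-session case, and does not run the CLI itself.

```python
import shlex
from typing import List


def build_record_command(agent: str, seconds: int, quality: float) -> List[str]:
    """Build the argv for recording a successful session (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0.0 and 1.0")
    return [
        "kaizen-agentic", "metrics", "record", agent,
        "--success",
        "--time", str(seconds),
        "--quality", f"{quality:.2f}",
    ]


argv = build_record_command("tdd-workflow", 420, 0.9)
# shlex.join produces a copy-pasteable shell line from the argv list.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it; building an argv list rather than a shell string avoids quoting problems.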
### Analysis and orientation
| Command | Purpose |
|---------|---------|
| `kaizen-agentic metrics show tdd-workflow` | Summary + recent executions |
| `kaizen-agentic metrics optimize tdd-workflow` | Evidence-based recommendations (requires ≥10 records) |
| `kaizen-agentic memory brief tdd-workflow` | Qualitative memory + `## Performance Summary` |
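Since `metrics optimize` needs at least 10 records, a script can check the log before calling it. The sketch below counts non-empty lines in `executions.jsonl`; the function names and the constant are hypothetical, and how the CLI itself counts records is not specified here.

```python
from pathlib import Path

# Threshold taken from the table above; the constant name is hypothetical.
MIN_RECORDS_FOR_OPTIMIZE = 10


def count_records(executions_path: Path) -> int:
    """Count non-empty lines in the append-only log; a missing file counts as 0."""
    if not executions_path.exists():
        return 0
    with executions_path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def ready_for_optimize(executions_path: Path) -> bool:
    """True once the log holds enough records for evidence-based recommendations."""
    return count_records(executions_path) >= MIN_RECORDS_FOR_OPTIMIZE
```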
Fleet-level session analytics remain in **agentic-resources** (Helix Forge); project
metrics stay in `.kaizen/metrics/` per [ADR-004](../docs/adr/ADR-004-project-metrics-convention.md)
and [EcosystemIntegration](EcosystemIntegration.md).
### Adopting metrics on another agent
1. Add a `metrics:` block to frontmatter (primary + secondary + collection).
2. Copy the session-close `metrics record` step from `agent-tdd-workflow.md`.
3. Run `kaizen-agentic memory init <agent>` to scaffold storage.
4. Verify with `kaizen-agentic metrics show <agent>` after one session.