Document measurement loop plan and ecosystem integration strategy.
Persist INTENT and ecosystem assessments in history/, add ADR-004 for project metrics with Helix Forge correlation, and register WP-0003 and WP-0004 workplans with State Hub. Update SCOPE, README, and agency-framework docs to reflect the two-layer measurement model.
docs/adr/ADR-004-project-metrics-convention.md
@@ -0,0 +1,190 @@
---
id: ADR-004
title: Project Metrics Convention
status: accepted
date: "2026-06-16"
---

# ADR-004: Project Metrics Convention

## Status

Accepted

## Context

`INTENT.md` requires agents to be measurable, versioned, and optimizable. The
agency framework (ADR-002) provides **qualitative** project memory; the kaizen
loop needs **quantitative** per-execution records.

`wiki/AgentKaizenOptimizer.md` specifies `.kaizen/metrics/` storage.
`OptimizationLoop` in `src/kaizen_agentic/optimization.py` exists but has no
data source.

Separately, `agentic-resources` (Helix Forge) captures **fleet-level** session
metrics from coding agent transcripts. Project metrics and fleet metrics serve
different scopes and must correlate without duplicating ingestion logic.

## Decision

Each agent deployed into a project may accumulate **project-scoped execution
metrics**. Records are append-only JSONL with rolling summaries. The optimizer
reads these files to produce evidence-based recommendations.

### File locations

Per-agent executions:

```
<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl       # append-only per-execution records
  summary.json           # rolling aggregates (regenerated on write)
```

Optimizer outputs:

```
<project-root>/.kaizen/metrics/optimizer/
  analysis.json          # last analysis run + input fingerprint
  recommendations.jsonl  # append-only recommendation history
```

The `.kaizen/metrics/` tree lives alongside `.kaizen/agents/` under the same
project-level state directory (ADR-002).
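
The layout above can be sketched as a pair of path helpers. This is a minimal illustration, not the `MetricsStore` that WP-0003 will implement: the function names `agent_metrics_dir` and `scaffold_metrics` are hypothetical, and creating an empty `executions.jsonl` up front is an assumption.

```python
from pathlib import Path

KAIZEN_DIR = ".kaizen"  # project-level state directory (ADR-002)


def agent_metrics_dir(project_root: Path, agent: str) -> Path:
    """Directory holding one agent's executions.jsonl and summary.json."""
    return project_root / KAIZEN_DIR / "metrics" / agent


def scaffold_metrics(project_root: Path, agent: str) -> Path:
    """Create the per-agent and optimizer directories if missing (idempotent)."""
    agent_dir = agent_metrics_dir(project_root, agent)
    agent_dir.mkdir(parents=True, exist_ok=True)
    optimizer_dir = project_root / KAIZEN_DIR / "metrics" / "optimizer"
    optimizer_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: start with an empty log; summary.json is written only
    # once there is at least one record to aggregate.
    (agent_dir / "executions.jsonl").touch(exist_ok=True)
    return agent_dir
```

Calling `scaffold_metrics(Path("."), "tdd-workflow")` twice is harmless, since every step tolerates existing files and directories.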

### Execution record schema (minimum viable)

```json
{
  "timestamp": "2026-06-16T12:00:00Z",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": {
    "name": "test_pass_rate",
    "value": 1.0,
    "target": 1.0
  },
  "metadata": {}
}
```

Required fields: `timestamp`, `agent`, `success`.
Recommended fields: `execution_time_s`, `quality_score`, `primary_metric`.
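
A minimal sketch of how a writer might enforce the required fields before appending, assuming one JSON object per line. The helper names `validate_record` and `append_execution` are hypothetical, and the boolean check on `success` is an assumption drawn from the example record rather than a rule stated in this ADR.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("timestamp", "agent", "success")  # per ADR-004


def validate_record(record: dict) -> None:
    """Raise ValueError if a required field is missing or `success` is not a bool."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if not isinstance(record["success"], bool):
        raise ValueError("'success' must be a boolean")


def append_execution(project_root: Path, record: dict) -> Path:
    """Validate a record, then append it as a single JSONL line."""
    validate_record(record)
    path = project_root / ".kaizen" / "metrics" / record["agent"] / "executions.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier records untouched (append-only log).
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Recommended fields are deliberately not validated here, so a record carrying only the three required fields is still accepted.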

### Summary schema

`summary.json` is derived and must never be hand-edited. It is regenerated on
each append:

```json
{
  "agent": "tdd-workflow",
  "execution_count": 12,
  "success_rate": 0.917,
  "avg_quality_score": 0.82,
  "avg_execution_time_s": 45.3,
  "last_execution": "2026-06-16T12:00:00Z",
  "trend": {
    "success_rate": "stable",
    "quality_score": "up"
  }
}
```
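
As a sketch of how the aggregates could be derived from `executions.jsonl`, the function below recomputes every field except `trend`, which is left out because this ADR does not define how trends are calculated. The name `build_summary`, the rounding to three decimals, and picking `last_execution` by string comparison (valid only if all timestamps use the same UTC ISO-8601 form) are assumptions.

```python
import json
from pathlib import Path
from statistics import fmean


def build_summary(executions_path: Path) -> dict:
    """Recompute rolling aggregates from the full execution log."""
    text = executions_path.read_text(encoding="utf-8")
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not records:
        raise ValueError("no execution records to summarize")

    def avg(field: str):
        # Only numeric values count; records lacking the field are skipped.
        values = [
            r[field] for r in records
            if isinstance(r.get(field), (int, float)) and not isinstance(r.get(field), bool)
        ]
        return round(fmean(values), 3) if values else None

    successes = sum(1 for r in records if r["success"])
    return {
        "agent": records[-1]["agent"],
        "execution_count": len(records),
        "success_rate": round(successes / len(records), 3),
        "avg_quality_score": avg("quality_score"),
        "avg_execution_time_s": avg("execution_time_s"),
        # Assumption: uniform "YYYY-MM-DDTHH:MM:SSZ" strings sort chronologically.
        "last_execution": max(r["timestamp"] for r in records),
    }


def write_summary(agent_dir: Path) -> dict:
    """Regenerate summary.json next to executions.jsonl."""
    summary = build_summary(agent_dir / "executions.jsonl")
    (agent_dir / "summary.json").write_text(json.dumps(summary, indent=2) + "\n", encoding="utf-8")
    return summary
```

With 12 records of which 11 succeeded, `success_rate` comes out as 0.917, matching the example above.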

### Retention

Default retention: **180 days** (per `wiki/AgentKaizenOptimizer.md`).
Pruning removes lines older than the retention window from `executions.jsonl`
and regenerates `summary.json`.
Project-level override via `.kaizen/metrics/config.json` is reserved for a
future iteration.
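
The pruning step can be sketched as a filter-and-rewrite over the log. This is an illustration under stated assumptions: the function name `prune_executions` is hypothetical, every record carries a timezone-aware ISO-8601 `timestamp`, a record exactly at the cutoff is kept, and the caller regenerates `summary.json` afterwards.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

DEFAULT_RETENTION_DAYS = 180  # default from this ADR


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prune_executions(path: Path, now: datetime,
                     retention_days: int = DEFAULT_RETENTION_DAYS) -> int:
    """Drop records older than the retention window; return how many were removed."""
    cutoff = now - timedelta(days=retention_days)
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    kept = [ln for ln in lines if parse_ts(json.loads(ln)["timestamp"]) >= cutoff]
    # Write to a sibling temp file, then swap it in, so a crash mid-write
    # cannot leave a truncated log behind.
    tmp = path.with_suffix(".jsonl.tmp")
    tmp.write_text("".join(ln + "\n" for ln in kept), encoding="utf-8")
    tmp.replace(path)
    return len(lines) - len(kept)
```

Passing `now` explicitly (for example `datetime.now(timezone.utc)`) keeps the function deterministic in tests.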

### Session-close protocol

Memory-enabled agents with declared metrics should append one execution record
at session close:

```bash
kaizen-agentic metrics record <agent> --success --time <seconds> --quality <0-1>
```

Or pipe a full JSON record via `--json` / stdin.

### CLI interface

```
kaizen-agentic metrics record <agent>     # Append execution record
kaizen-agentic metrics show <agent>       # Summary + recent executions
kaizen-agentic metrics list               # Agents with metrics in project
kaizen-agentic metrics export <agent>     # Dump executions.jsonl
kaizen-agentic metrics optimize [agent]   # Run OptimizationLoop (WP-0003 Part 3)
```

`kaizen-agentic memory init <agent>` scaffolds metrics directories by default
(`--no-metrics` to opt out).

### Helix Forge correlation

Kaizen-agentic **project metrics** and agentic-resources **fleet metrics**
operate at different layers:

| Layer | Scope | Owner | Typical storage |
|-------|-------|-------|-----------------|
| Project | Per-agent persona in one repo | kaizen-agentic | `.kaizen/metrics/` |
| Fleet | Cross-repo coding sessions | agentic-resources | Helix Forge digest store + `measure/baselines.jsonl` |

**Correlation fields** are optional on project execution records and are
populated when the session is also captured by Helix Forge:

```json
{
  "helix_session_uid": "claude:<native-session-uuid>",
  "repo": "kaizen-agentic",
  "flavor": "claude",
  "tokens": 12500,
  "infra_overhead_share": 0.12
}
```

Mapping from Helix Forge `session_metrics()` (agentic-resources):

| Helix field | ADR-004 field |
|-------------|---------------|
| `digest.outcome == "success"` | `success` |
| `digest.cost.wall_clock_s` | `execution_time_s` |
| `tokens` (input + output) | `tokens` in metadata / top-level |
| `infra_overhead_share` | `metadata.infra_overhead_share` |
| `Session.session_uid` | `helix_session_uid` |
| `Session.repo` | `repo` |
| `Session.flavor` | `flavor` |

Kaizen-agentic does **not** ingest Claude/Codex/Grok JSONL transcripts.
Correlation is **link-by-reference**: project metrics may cite a Helix session
UID; fleet analytics remain owned by agentic-resources.

WP-0004 defines the integration contract and optional sync tooling.
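
The field mapping in the table above can be sketched as a pure function. The input shapes here are assumptions: the real return type of `session_metrics()` and the `Session` class live in agentic-resources and are not shown in this ADR, so both are modelled as plain dicts. The function name `to_project_record` is hypothetical, and `tokens` is placed at the top level, which is one of the two placements the table allows.

```python
def to_project_record(session: dict, metrics: dict, agent: str, timestamp: str) -> dict:
    """Build an ADR-004 execution record that links to a Helix Forge session.

    `session` stands in for Session (session_uid, repo, flavor) and `metrics`
    for the session_metrics() result; both shapes are illustrative only.
    """
    digest = metrics.get("digest", {})
    record = {
        "timestamp": timestamp,
        "agent": agent,
        "success": digest.get("outcome") == "success",
        "helix_session_uid": session["session_uid"],
        "metadata": {},
    }
    optional = {
        "execution_time_s": digest.get("cost", {}).get("wall_clock_s"),
        "repo": session.get("repo"),
        "flavor": session.get("flavor"),
        "tokens": metrics.get("tokens"),
    }
    # Copy only the optional fields that Helix actually provided.
    record.update({key: value for key, value in optional.items() if value is not None})
    if metrics.get("infra_overhead_share") is not None:
        record["metadata"]["infra_overhead_share"] = metrics["infra_overhead_share"]
    return record
```

The function copies values and stores the session UID as a reference; it reads no transcripts, which keeps it within the link-by-reference rule.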

### Coach and memory integration

`kaizen-agentic memory brief <agent>` includes a `## Performance Summary`
section when `summary.json` exists (WP-0003 Part 4). Qualitative memory
(ADR-002) and quantitative metrics (this ADR) are complementary views of the
same agent's project history.
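
One possible way to render that section from `summary.json` is sketched below. The bullet layout and the function name `performance_summary_section` are assumptions for illustration; the actual brief format is left to WP-0003 Part 4.

```python
import json
from pathlib import Path
from typing import Optional


def performance_summary_section(agent_metrics_dir: Path) -> Optional[str]:
    """Return a Markdown section for the brief, or None when no summary exists."""
    summary_path = agent_metrics_dir / "summary.json"
    if not summary_path.exists():
        return None  # the brief simply omits the section
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    lines = [
        "## Performance Summary",
        "",
        f"- Executions: {summary['execution_count']}",
        f"- Success rate: {summary['success_rate']:.1%}",
    ]
    if summary.get("avg_quality_score") is not None:
        lines.append(f"- Average quality score: {summary['avg_quality_score']:.2f}")
    if summary.get("avg_execution_time_s") is not None:
        lines.append(f"- Average execution time: {summary['avg_execution_time_s']:.1f} s")
    for metric, direction in summary.get("trend", {}).items():
        lines.append(f"- Trend ({metric}): {direction}")
    return "\n".join(lines) + "\n"
```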

## Consequences

- Agents can be measured per project without a central telemetry platform.
- `OptimizationLoop` has a defined data source for recommendations.
- Fleet session analytics stay in agentic-resources; no duplicate ingestion.
- `.kaizen/metrics/` should default to `.gitignore` (same policy as memory).
- WP-0003 implements `MetricsStore` and CLI against this convention.
- WP-0004 wires ecosystem services (activity-core, artifact-store, Helix Forge).

## Related Documents

- [ADR-002: Project Memory Convention](ADR-002-project-memory-convention.md)
- [wiki/EcosystemIntegration.md](../../wiki/EcosystemIntegration.md)
- [agentic-resources session schema](https://github.com/coulomb/agentic-resources): `session_memory/core/schema.py`
- [KAIZEN-WP-0003](../../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
- [KAIZEN-WP-0004](../../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)
@@ -234,8 +234,56 @@ All agents that do session-bound project work have `memory: enabled` in their fr

---

## Project Metrics

Project-scoped **quantitative** metrics complement qualitative memory (ADR-002).
Per-execution records live under `.kaizen/metrics/<agent>/` and feed the
kaizen optimizer loop.

### Location

```
<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl
  summary.json

<project-root>/.kaizen/metrics/optimizer/
  analysis.json
  recommendations.jsonl
```

### CLI (WP-0003)

```
kaizen-agentic metrics record <agent>     # Append execution record at session close
kaizen-agentic metrics show <agent>       # Summary + recent executions
kaizen-agentic metrics list               # Agents with metrics in project
kaizen-agentic metrics export <agent>     # Dump executions.jsonl
kaizen-agentic metrics optimize [agent]   # Run optimizer on project metrics
```

`memory brief` includes a `## Performance Summary` when metrics exist (WP-0003
Part 4).

### Fleet correlation

Project metrics correlate with **Helix Forge** fleet session metrics in
`agentic-resources` via optional `helix_session_uid` (ADR-004). See
[wiki/EcosystemIntegration.md](../wiki/EcosystemIntegration.md).
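
To illustrate the link-by-reference idea, the sketch below looks up every project execution record that cites a given Helix session UID. The function name `executions_for_session` is hypothetical; it assumes the `.kaizen/metrics/<agent>/executions.jsonl` layout and reads only project-side files, never Helix Forge data.

```python
import json
from pathlib import Path
from typing import Iterator


def executions_for_session(metrics_root: Path, helix_session_uid: str) -> Iterator[dict]:
    """Yield project execution records that reference one Helix session."""
    # Each agent has its own directory; optimizer/ holds no executions.jsonl,
    # so the glob skips it naturally.
    for path in sorted(metrics_root.glob("*/executions.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("helix_session_uid") == helix_session_uid:
                yield record
```

For example, `list(executions_for_session(Path(".kaizen/metrics"), "claude:<native-session-uuid>"))` would return the matching records across all agents in the project.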

### Evidence retention

Optimizer outputs may be published to `artifact-store` (WP-0004 Part 3).

---

## Related Documents

- [ADR-001: Workplan Convention](../workplans/kaizen-agentic-WP-0001-community-engagement.md): how work items are structured
- [ADR-002: Project Memory Convention](../workplans/kaizen-agentic-WP-0002-agency-framework.md): memory file location, structure, and lifecycle
- [WP-0002: Agency Framework](../workplans/kaizen-agentic-WP-0002-agency-framework.md): full implementation workplan
- [ADR-001: Workplan Convention](adr/ADR-001-workplan-convention.md)
- [ADR-002: Project Memory Convention](adr/ADR-002-project-memory-convention.md)
- [ADR-003: Protocols Artifact Convention](adr/ADR-003-protocols-artifact-convention.md)
- [ADR-004: Project Metrics Convention](adr/ADR-004-project-metrics-convention.md)
- [wiki/EcosystemIntegration.md](../wiki/EcosystemIntegration.md): two-layer measurement model
- [WP-0002: Agency Framework](../workplans/kaizen-agentic-WP-0002-agency-framework.md)
- [WP-0003: Measurement Loop](../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
- [WP-0004: Ecosystem Integration](../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)