kaizen-agentic/docs/adr/ADR-004-project-metrics-convention.md

---
id: ADR-004
title: Project Metrics Convention
status: accepted
date: "2026-06-16"
---

# ADR-004 — Project Metrics Convention

## Status

Accepted

## Context

`INTENT.md` requires agents to be measurable, versioned, and optimizable. The
agency framework (ADR-002) provides **qualitative** project memory; the kaizen
loop needs **quantitative** per-execution records.

`wiki/AgentKaizenOptimizer.md` specifies `.kaizen/metrics/` storage.
`OptimizationLoop` in `src/kaizen_agentic/optimization.py` exists but has no
data source.

Separately, `agentic-resources` (Helix Forge) captures **fleet-level** session
metrics from coding agent transcripts. Project metrics and fleet metrics serve
different scopes and must correlate without duplicating ingestion logic.

## Decision

Each agent deployed into a project may accumulate **project-scoped execution
metrics**. Records are append-only JSONL with rolling summaries. The optimizer
reads these files to produce evidence-based recommendations.

### File locations

Per-agent executions:

```
<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl    # append-only per-execution records
  summary.json        # rolling aggregates (regenerated on write)
```

Optimizer outputs:

```
<project-root>/.kaizen/metrics/optimizer/
  analysis.json           # last analysis run + input fingerprint
  recommendations.jsonl   # append-only recommendation history
```

The `.kaizen/metrics/` tree lives alongside `.kaizen/agents/` under the same
project-level state directory (ADR-002).
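
The "input fingerprint" stored in `analysis.json` is not specified further here. One possible reading, sketched below, is a hash over every agent's `executions.jsonl`, so the optimizer can tell whether its inputs changed since the last run. The function name `input_fingerprint` and the choice of SHA-256 are assumptions for illustration, not part of this convention.

```python
import hashlib
from pathlib import Path


def input_fingerprint(metrics_root: Path) -> str:
    """Hash all per-agent executions.jsonl files in a stable order.

    Hypothetical helper: the ADR only says analysis.json records an
    "input fingerprint"; SHA-256 over file contents is an assumption.
    """
    digest = hashlib.sha256()
    for path in sorted(Path(metrics_root).glob("*/executions.jsonl")):
        if path.parent.name == "optimizer":
            continue  # optimizer outputs are not analysis inputs
        # Mix in the agent name so identical files under different
        # agents still produce a different fingerprint.
        digest.update(path.parent.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Comparing the stored fingerprint with a freshly computed one would let an optimizer run skip work when no new executions were recorded.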

### Execution record schema (minimum viable)

```json
{
  "timestamp": "2026-06-16T12:00:00Z",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": {
    "name": "test_pass_rate",
    "value": 1.0,
    "target": 1.0
  },
  "metadata": {}
}
```

Required fields: `timestamp`, `agent`, `success`.
Recommended fields: `execution_time_s`, `quality_score`, `primary_metric`.
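
As a minimal sketch of how a writer might enforce the required fields before appending, assuming the file layout described under "File locations": the names `validate_record` and `append_execution` are hypothetical and do not describe the `MetricsStore` API that WP-0003 implements.

```python
import json
from pathlib import Path

# Required fields as listed in this ADR.
REQUIRED_FIELDS = ("timestamp", "agent", "success")


def validate_record(record: dict) -> None:
    """Raise ValueError if a required field is missing or mistyped."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if not isinstance(record["success"], bool):
        raise ValueError("'success' must be a boolean")


def append_execution(project_root: Path, record: dict) -> Path:
    """Validate a record and append it as one JSON line (hypothetical helper)."""
    validate_record(record)
    agent_dir = Path(project_root) / ".kaizen" / "metrics" / record["agent"]
    agent_dir.mkdir(parents=True, exist_ok=True)
    path = agent_dir / "executions.jsonl"
    # Open in append mode so earlier records are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Writing exactly one JSON object per line keeps the file append-only and lets readers process it line by line.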

### Summary schema

`summary.json` is derived and must never be hand-edited. It is regenerated on each append:

```json
{
  "agent": "tdd-workflow",
  "execution_count": 12,
  "success_rate": 0.917,
  "avg_quality_score": 0.82,
  "avg_execution_time_s": 45.3,
  "last_execution": "2026-06-16T12:00:00Z",
  "trend": {
    "success_rate": "stable",
    "quality_score": "up"
  }
}
```
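
The aggregation could look roughly like the sketch below. This ADR does not define how `trend` is computed, so the rule used here (compare the mean of the newer half of the records with the older half, with a tolerance of 0.05) is purely an assumption, as are the `"down"` label and the function names.

```python
from statistics import mean


def _trend(values: list, tolerance: float = 0.05) -> str:
    """Assumed rule: newer-half mean versus older-half mean."""
    if len(values) < 2:
        return "stable"
    mid = len(values) // 2
    delta = mean(values[mid:]) - mean(values[:mid])
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"  # label assumed; the ADR shows only "stable" and "up"
    return "stable"


def summarize(agent: str, records: list) -> dict:
    """Build the summary.json payload from a list of execution records."""
    # ISO 8601 UTC timestamps in one format sort correctly as strings.
    ordered = sorted(records, key=lambda r: r["timestamp"])
    successes = [1.0 if r["success"] else 0.0 for r in ordered]
    # Recommended fields are optional, so average only where present.
    qualities = [r["quality_score"] for r in ordered if "quality_score" in r]
    times = [r["execution_time_s"] for r in ordered if "execution_time_s" in r]
    return {
        "agent": agent,
        "execution_count": len(ordered),
        "success_rate": round(mean(successes), 3) if successes else None,
        "avg_quality_score": round(mean(qualities), 3) if qualities else None,
        "avg_execution_time_s": round(mean(times), 3) if times else None,
        "last_execution": ordered[-1]["timestamp"] if ordered else None,
        "trend": {
            "success_rate": _trend(successes),
            "quality_score": _trend(qualities),
        },
    }
```

Because the summary is a pure function of the execution records, it can be deleted and rebuilt at any time without losing information.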

### Retention

Default retention: **180 days** (per `wiki/AgentKaizenOptimizer.md`).
Pruning removes lines older than the retention window from `executions.jsonl` and then regenerates `summary.json`.
Project-level override via `.kaizen/metrics/config.json` is reserved for a
future iteration.
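
A pruning pass might be sketched as follows, assuming every `timestamp` is ISO 8601 with a `Z` suffix or an explicit UTC offset. The function name `prune_executions` and the write-to-temp-then-rename step are illustrative choices, not part of this convention; regenerating `summary.json` would follow as a separate step.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

DEFAULT_RETENTION_DAYS = 180  # default stated in this ADR


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prune_executions(
    path: Path,
    now: Optional[datetime] = None,
    retention_days: int = DEFAULT_RETENTION_DAYS,
) -> int:
    """Drop records older than the retention window; return the count removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    kept, removed = [], 0
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        if _parse_ts(json.loads(line)["timestamp"]) >= cutoff:
            kept.append(line)
        else:
            removed += 1
    # Write to a sibling temp file, then rename, so a crash mid-write
    # does not leave a truncated executions.jsonl behind.
    tmp = path.with_suffix(".jsonl.tmp")
    tmp.write_text("".join(entry + "\n" for entry in kept), encoding="utf-8")
    tmp.replace(path)
    return removed
```

Passing `now` explicitly keeps the function deterministic in tests.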

### Session-close protocol

Memory-enabled agents with declared metrics should append one execution record
at session close:

```bash
kaizen-agentic metrics record <agent> --success --time <seconds> --quality <0-1>
```

Alternatively, supply a full JSON record via `--json` or stdin.

### CLI interface

```
kaizen-agentic metrics record <agent>   # Append execution record
kaizen-agentic metrics show <agent>     # Summary + recent executions
kaizen-agentic metrics list             # Agents with metrics in project
kaizen-agentic metrics export <agent>   # Dump executions.jsonl
kaizen-agentic metrics optimize [agent] # Run OptimizationLoop (WP-0003 Part 3)
```

`kaizen-agentic memory init <agent>` scaffolds metrics directories by default
(`--no-metrics` to opt out).

### Helix Forge correlation

Kaizen-agentic **project metrics** and agentic-resources **fleet metrics**
operate at different layers:

| Layer | Scope | Owner | Typical storage |
|-------|-------|-------|-----------------|
| Project | Per-agent persona in one repo | kaizen-agentic | `.kaizen/metrics/` |
| Fleet | Cross-repo coding sessions | agentic-resources | Helix Forge digest store + `measure/baselines.jsonl` |

**Correlation fields** are optional on project execution records and are
populated when the session is also captured by Helix Forge:

```json
{
  "helix_session_uid": "claude:<native-session-uuid>",
  "repo": "kaizen-agentic",
  "flavor": "claude",
  "tokens": 12500,
  "infra_overhead_share": 0.12
}
```

Mapping from Helix Forge `session_metrics()` (agentic-resources):

| Helix field | ADR-004 field |
|-------------|---------------|
| `digest.outcome == "success"` | `success` |
| `digest.cost.wall_clock_s` | `execution_time_s` |
| `tokens` (input + output) | `tokens` in metadata / top-level |
| `infra_overhead_share` | `metadata.infra_overhead_share` |
| `Session.session_uid` | `helix_session_uid` |
| `Session.repo` | `repo` |
| `Session.flavor` | `flavor` |

Kaizen-agentic does **not** ingest Claude/Codex/Grok JSONL transcripts.
Correlation is **link-by-reference**: project metrics may cite a Helix session
UID; fleet analytics remain owned by agentic-resources.

WP-0004 defines the integration contract and optional sync tooling.
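
To illustrate the mapping table, the sketch below builds a project execution record from Helix values. The input shapes are assumptions: `session` and `metrics` are taken to be plain dictionaries mirroring the field paths in the table, which may not match the actual agentic-resources objects. The function name `record_from_helix` is hypothetical, and placing `tokens` under `metadata` is one of the two placements the table allows.

```python
def record_from_helix(agent: str, timestamp: str, session: dict, metrics: dict) -> dict:
    """Translate Helix session data into an ADR-004 record (illustrative only).

    Assumed input shapes:
      session: {"session_uid": ..., "repo": ..., "flavor": ...}
      metrics: {"digest": {"outcome": ..., "cost": {"wall_clock_s": ...}},
                "tokens": ..., "infra_overhead_share": ...}
    """
    digest = metrics["digest"]
    return {
        "timestamp": timestamp,
        "agent": agent,
        "success": digest["outcome"] == "success",
        "execution_time_s": digest["cost"]["wall_clock_s"],
        # Link-by-reference: only the session UID is stored, no transcript data.
        "helix_session_uid": session["session_uid"],
        "repo": session["repo"],
        "flavor": session["flavor"],
        "metadata": {
            "tokens": metrics["tokens"],
            "infra_overhead_share": metrics["infra_overhead_share"],
        },
    }
```

The record carries a reference to the Helix session rather than a copy of its contents, which keeps fleet analytics in agentic-resources.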

### Coach and memory integration

`kaizen-agentic memory brief <agent>` includes a `## Performance Summary`
section when `summary.json` exists (WP-0003 Part 4). Qualitative memory
(ADR-002) and quantitative metrics (this ADR) are complementary views of the
same agent's project history.
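
The exact layout of the `## Performance Summary` section is left to WP-0003 Part 4. As a rough sketch under that caveat, a brief generator could render `summary.json` along these lines; the function name `render_performance_summary` and the bullet wording are assumptions.

```python
def render_performance_summary(summary: dict) -> str:
    """Format a summary.json payload as a Markdown section (assumed layout)."""
    lines = [
        "## Performance Summary",
        "",
        f"- Executions: {summary['execution_count']}",
        f"- Success rate: {summary['success_rate']:.1%}",
        f"- Average quality score: {summary['avg_quality_score']:.2f}",
        f"- Average execution time: {summary['avg_execution_time_s']:.1f}s",
        f"- Last execution: {summary['last_execution']}",
    ]
    # One bullet per tracked trend, in the order stored in the summary.
    for metric, direction in summary.get("trend", {}).items():
        lines.append(f"- Trend ({metric}): {direction}")
    return "\n".join(lines) + "\n"
```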

## Consequences

- Agents can be measured per project without a central telemetry platform.
- `OptimizationLoop` has a defined data source for recommendations.
- Fleet session analytics stay in agentic-resources; no duplicate ingestion.
- `.kaizen/metrics/` should be listed in `.gitignore` by default (same policy as memory).
- WP-0003 implements `MetricsStore` and CLI against this convention.
- WP-0004 wires ecosystem services (activity-core, artifact-store, Helix Forge).

## Related Documents

- [ADR-002: Project Memory Convention](ADR-002-project-memory-convention.md)
- [wiki/EcosystemIntegration.md](../../wiki/EcosystemIntegration.md)
- [agentic-resources session schema](https://github.com/coulomb/agentic-resources) — `session_memory/core/schema.py`
- [KAIZEN-WP-0003](../../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
- [KAIZEN-WP-0004](../../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)