---
id: ADR-004
title: Project Metrics Convention
status: accepted
date: "2026-06-16"
---

# ADR-004: Project Metrics Convention

## Status

Accepted

## Context

`INTENT.md` requires agents to be measurable, versioned, and optimizable. The agency framework (ADR-002) provides **qualitative** project memory; the kaizen loop needs **quantitative** per-execution records. `wiki/AgentKaizenOptimizer.md` specifies `.kaizen/metrics/` storage. `OptimizationLoop` in `src/kaizen_agentic/optimization.py` exists but has no data source.

Separately, `agentic-resources` (Helix Forge) captures **fleet-level** session metrics from coding agent transcripts. Project metrics and fleet metrics serve different scopes and must correlate without duplicating ingestion logic.

## Decision

Each agent deployed into a project may accumulate **project-scoped execution metrics**. Records are append-only JSONL with rolling summaries. The optimizer reads these files to produce evidence-based recommendations.

### File locations

Per-agent executions:

```
/.kaizen/metrics//
  executions.jsonl    # append-only per-execution records
  summary.json        # rolling aggregates (regenerated on write)
```

Optimizer outputs:

```
/.kaizen/metrics/optimizer/
  analysis.json            # last analysis run + input fingerprint
  recommendations.jsonl    # append-only recommendation history
```

The `.kaizen/metrics/` tree lives alongside `.kaizen/agents/` under the same project-level state directory (ADR-002).

### Execution record schema (minimum viable)

```json
{
  "timestamp": "2026-06-16T12:00:00Z",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": {
    "name": "test_pass_rate",
    "value": 1.0,
    "target": 1.0
  },
  "metadata": {}
}
```

Required fields: `timestamp`, `agent`, `success`.
Recommended fields: `execution_time_s`, `quality_score`, `primary_metric`.

### Summary schema

`summary.json` is derived and never hand-edited.
Regenerated on each append:

```json
{
  "agent": "tdd-workflow",
  "execution_count": 12,
  "success_rate": 0.917,
  "avg_quality_score": 0.82,
  "avg_execution_time_s": 45.3,
  "last_execution": "2026-06-16T12:00:00Z",
  "trend": {
    "success_rate": "stable",
    "quality_score": "up"
  }
}
```

### Retention

Default retention: **180 days** (per `wiki/AgentKaizenOptimizer.md`). Pruning removes aged lines from `executions.jsonl` and regenerates `summary.json`. Project-level override via `.kaizen/metrics/config.json` is reserved for a future iteration.

### Session-close protocol

Memory-enabled agents with declared metrics should append one execution record at session close:

```bash
kaizen-agentic metrics record --success --time --quality <0-1>
```

Or pipe a full JSON record via `--json` / stdin.

### CLI interface

```
kaizen-agentic metrics record             # Append execution record
kaizen-agentic metrics show               # Summary + recent executions
kaizen-agentic metrics list               # Agents with metrics in project
kaizen-agentic metrics export             # Dump executions.jsonl
kaizen-agentic metrics optimize [agent]   # Run OptimizationLoop (WP-0003 Part 3)
```

`kaizen-agentic memory init ` scaffolds metrics directories by default (`--no-metrics` to opt out).
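The record, summary, and retention rules described in this ADR can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the `MetricsStore` that WP-0003 implements: the function names `append_execution`, `summarize`, and `prune` are hypothetical, timestamps are assumed to be timezone-aware ISO 8601 strings, and the `trend` field is omitted because this ADR does not define how it is computed.

```python
from __future__ import annotations

import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

REQUIRED_FIELDS = ("timestamp", "agent", "success")
RETENTION_DAYS = 180  # default retention stated in this ADR


def _parse_ts(value: str) -> datetime:
    # Map a trailing "Z" to an explicit UTC offset so older Pythons parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def _load(log: Path) -> list[dict]:
    if not log.exists():
        return []
    with log.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def _mean(values: list[float]) -> float | None:
    return round(sum(values) / len(values), 3) if values else None


def summarize(agent: str, records: list[dict]) -> dict:
    """Derive rolling aggregates (hypothetical helper; `trend` is omitted)."""
    latest = max(records, key=lambda r: _parse_ts(r["timestamp"]), default=None)
    return {
        "agent": agent,
        "execution_count": len(records),
        "success_rate": _mean([1.0 if r["success"] else 0.0 for r in records]),
        "avg_quality_score": _mean(
            [r["quality_score"] for r in records if "quality_score" in r]
        ),
        "avg_execution_time_s": _mean(
            [r["execution_time_s"] for r in records if "execution_time_s" in r]
        ),
        "last_execution": latest["timestamp"] if latest else None,
    }


def _write_summary(agent_dir: Path, agent: str, records: list[dict]) -> dict:
    summary = summarize(agent, records)
    (agent_dir / "summary.json").write_text(
        json.dumps(summary, indent=2), encoding="utf-8"
    )
    return summary


def append_execution(agent_dir: Path, record: dict) -> dict:
    """Append one record to executions.jsonl, then regenerate summary.json."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    agent_dir.mkdir(parents=True, exist_ok=True)
    log = agent_dir / "executions.jsonl"
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return _write_summary(agent_dir, record["agent"], _load(log))


def prune(agent_dir: Path, agent: str, now: datetime,
          days: int = RETENTION_DAYS) -> dict:
    """Drop records older than the retention window and regenerate the summary."""
    log = agent_dir / "executions.jsonl"
    cutoff = now - timedelta(days=days)
    kept = [r for r in _load(log) if _parse_ts(r["timestamp"]) >= cutoff]
    log.write_text("".join(json.dumps(r) + "\n" for r in kept), encoding="utf-8")
    return _write_summary(agent_dir, agent, kept)
```

Re-reading the whole log on every append keeps `summary.json` strictly derived from `executions.jsonl`, which matches the "never hand-edited" rule at the cost of work proportional to the log size; an incremental update would be a reasonable alternative for large logs.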

### Helix Forge correlation

Kaizen-agentic **project metrics** and agentic-resources **fleet metrics** operate at different layers:

| Layer | Scope | Owner | Typical storage |
|-------|-------|-------|-----------------|
| Project | Per-agent persona in one repo | kaizen-agentic | `.kaizen/metrics/` |
| Fleet | Cross-repo coding sessions | agentic-resources | Helix Forge digest store + `measure/baselines.jsonl` |

**Correlation fields** are optional on project execution records and are populated when the session is also captured by Helix Forge:

```json
{
  "helix_session_uid": "claude:",
  "repo": "kaizen-agentic",
  "flavor": "claude",
  "tokens": 12500,
  "infra_overhead_share": 0.12
}
```

Mapping from Helix Forge `session_metrics()` (agentic-resources):

| Helix field | ADR-004 field |
|-------------|---------------|
| `digest.outcome == "success"` | `success` |
| `digest.cost.wall_clock_s` | `execution_time_s` |
| `tokens` (input + output) | `tokens` in metadata / top-level |
| `infra_overhead_share` | `metadata.infra_overhead_share` |
| `Session.session_uid` | `helix_session_uid` |
| `Session.repo` | `repo` |
| `Session.flavor` | `flavor` |

Kaizen-agentic does **not** ingest Claude/Codex/Grok JSONL transcripts. Correlation is **link-by-reference**: project metrics may cite a Helix session UID; fleet analytics remain owned by agentic-resources. WP-0004 defines the integration contract and optional sync tooling.

### Coach and memory integration

`kaizen-agentic memory brief ` includes a `## Performance Summary` section when `summary.json` exists (WP-0003 Part 4). Qualitative memory (ADR-002) and quantitative metrics (this ADR) are complementary views of the same agent's project history.

## Consequences

- Agents can be measured per project without a central telemetry platform.
- `OptimizationLoop` has a defined data source for recommendations.
- Fleet session analytics stay in agentic-resources; no duplicate ingestion.
- `.kaizen/metrics/` should be listed in `.gitignore` by default (same policy as memory).
- WP-0003 implements `MetricsStore` and CLI against this convention.
- WP-0004 wires ecosystem services (activity-core, artifact-store, Helix Forge).

## Related Documents

- [ADR-002: Project Memory Convention](ADR-002-project-memory-convention.md)
- [wiki/EcosystemIntegration.md](../../wiki/EcosystemIntegration.md)
- [agentic-resources session schema](https://github.com/coulomb/agentic-resources) (`session_memory/core/schema.py`)
- [KAIZEN-WP-0003](../../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
- [KAIZEN-WP-0004](../../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)