Persist INTENT and ecosystem assessments in history/, add ADR-004 for project metrics with Helix Forge correlation, and register WP-0003 and WP-0004 workplans with State Hub. Update SCOPE, README, and agency-framework docs to reflect the two-layer measurement model.
---
id: ADR-004
title: Project Metrics Convention
status: accepted
date: "2026-06-16"
---

# ADR-004 — Project Metrics Convention

## Status

Accepted

## Context

`INTENT.md` requires agents to be measurable, versioned, and optimizable. The
agency framework (ADR-002) provides **qualitative** project memory; the kaizen
loop needs **quantitative** per-execution records.

`wiki/AgentKaizenOptimizer.md` specifies `.kaizen/metrics/` storage.
`OptimizationLoop` in `src/kaizen_agentic/optimization.py` exists but has no
data source.

Separately, `agentic-resources` (Helix Forge) captures **fleet-level** session
metrics from coding agent transcripts. Project metrics and fleet metrics serve
different scopes and must correlate without duplicating ingestion logic.

## Decision

Each agent deployed into a project may accumulate **project-scoped execution
metrics**. Records are append-only JSONL with rolling summaries. The optimizer
reads these files to produce evidence-based recommendations.

### File locations

Per-agent executions:

```
<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl   # append-only per-execution records
  summary.json       # rolling aggregates (regenerated on write)
```

Optimizer outputs:

```
<project-root>/.kaizen/metrics/optimizer/
  analysis.json          # last analysis run + input fingerprint
  recommendations.jsonl  # append-only recommendation history
```

The `.kaizen/metrics/` tree lives alongside `.kaizen/agents/` under the same
project-level state directory (ADR-002).

### Execution record schema (minimum viable)

```json
{
  "timestamp": "2026-06-16T12:00:00Z",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": {
    "name": "test_pass_rate",
    "value": 1.0,
    "target": 1.0
  },
  "metadata": {}
}
```

Required fields: `timestamp`, `agent`, `success`.
Recommended fields: `execution_time_s`, `quality_score`, `primary_metric`.
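
The append step can be sketched as follows. This is a minimal illustration, not the `MetricsStore` that WP-0003 will implement: the function name `append_execution` and the `REQUIRED_FIELDS` constant are hypothetical, and only the file layout and the required-field list come from this ADR.

```python
import json
from pathlib import Path

# Required fields as listed in this ADR.
REQUIRED_FIELDS = ("timestamp", "agent", "success")


def append_execution(project_root, record):
    """Validate required fields, then append one JSON line (hypothetical helper)."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # <project-root>/.kaizen/metrics/<agent-name>/executions.jsonl
    agent_dir = Path(project_root) / ".kaizen" / "metrics" / record["agent"]
    agent_dir.mkdir(parents=True, exist_ok=True)
    path = agent_dir / "executions.jsonl"

    # Append-only: open in "a" mode and write exactly one line per record.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Rejecting a record before any file is touched keeps `executions.jsonl` free of partial entries, which matters because the summary is derived from it.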

### Summary schema

`summary.json` is derived and must never be hand-edited. It is regenerated on
each append:

```json
{
  "agent": "tdd-workflow",
  "execution_count": 12,
  "success_rate": 0.917,
  "avg_quality_score": 0.82,
  "avg_execution_time_s": 45.3,
  "last_execution": "2026-06-16T12:00:00Z",
  "trend": {
    "success_rate": "stable",
    "quality_score": "up"
  }
}
```
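
One way to derive the aggregates is sketched below. The helper names `build_summary` and `regenerate_summary` are hypothetical, and the rounding precision is inferred from the example values above rather than specified. The `trend` block is left out because this ADR does not define how trends are computed. Timestamps are assumed to share the UTC `...Z` format, so string comparison orders them correctly.

```python
import json
from pathlib import Path


def build_summary(agent, records):
    """Compute rolling aggregates from execution records (hypothetical helper)."""
    count = len(records)
    successes = sum(1 for r in records if r.get("success"))
    # Recommended fields are optional, so average only over records that have them.
    quality = [r["quality_score"] for r in records if "quality_score" in r]
    times = [r["execution_time_s"] for r in records if "execution_time_s" in r]
    return {
        "agent": agent,
        "execution_count": count,
        "success_rate": round(successes / count, 3) if count else None,
        "avg_quality_score": round(sum(quality) / len(quality), 2) if quality else None,
        "avg_execution_time_s": round(sum(times) / len(times), 1) if times else None,
        "last_execution": max((r["timestamp"] for r in records), default=None),
    }


def regenerate_summary(agent_dir):
    """Rebuild summary.json from executions.jsonl in one agent directory."""
    agent_dir = Path(agent_dir)
    lines = (agent_dir / "executions.jsonl").read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    summary = build_summary(agent_dir.name, records)
    (agent_dir / "summary.json").write_text(
        json.dumps(summary, indent=2) + "\n", encoding="utf-8"
    )
    return summary
```

Recomputing from the full JSONL file on every write is the simplest way to keep the summary consistent with the records; an incremental update would be an optimization, not a requirement of this convention.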

### Retention

Default retention: **180 days** (per `wiki/AgentKaizenOptimizer.md`).
Pruning removes aged lines from `executions.jsonl` and regenerates `summary.json`.
Project-level override via `.kaizen/metrics/config.json` is reserved for a
future iteration.
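
The pruning rule can be illustrated with a short sketch. The names `prune_records` and `parse_timestamp` are hypothetical, and treating a record exactly at the cutoff as retained is an assumption, since this ADR does not specify the boundary case.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 180  # default from this ADR


def parse_timestamp(value):
    """Parse the UTC 'YYYY-MM-DDTHH:MM:SSZ' format used in the record schema."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def prune_records(records, now, retention_days=RETENTION_DAYS):
    """Return only records newer than the retention cutoff (hypothetical helper)."""
    cutoff = now - timedelta(days=retention_days)
    # Assumption: a record exactly at the cutoff is kept.
    return [r for r in records if parse_timestamp(r["timestamp"]) >= cutoff]
```

After pruning, the surviving records would be written back to `executions.jsonl` and the summary regenerated from them.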

### Session-close protocol

Memory-enabled agents with declared metrics should append one execution record
at session close:

```bash
kaizen-agentic metrics record <agent> --success --time <seconds> --quality <0-1>
```

Or pipe a full JSON record via `--json` / stdin.

### CLI interface

```
kaizen-agentic metrics record <agent>      # Append execution record
kaizen-agentic metrics show <agent>        # Summary + recent executions
kaizen-agentic metrics list                # Agents with metrics in project
kaizen-agentic metrics export <agent>      # Dump executions.jsonl
kaizen-agentic metrics optimize [agent]    # Run OptimizationLoop (WP-0003 Part 3)
```

`kaizen-agentic memory init <agent>` scaffolds metrics directories by default
(`--no-metrics` to opt out).

### Helix Forge correlation

Kaizen-agentic **project metrics** and agentic-resources **fleet metrics**
operate at different layers:

| Layer | Scope | Owner | Typical storage |
|-------|-------|-------|-----------------|
| Project | Per-agent persona in one repo | kaizen-agentic | `.kaizen/metrics/` |
| Fleet | Cross-repo coding sessions | agentic-resources | Helix Forge digest store + `measure/baselines.jsonl` |

**Correlation fields** — optional on project execution records, populated when
the session is also captured by Helix Forge:

```json
{
  "helix_session_uid": "claude:<native-session-uuid>",
  "repo": "kaizen-agentic",
  "flavor": "claude",
  "tokens": 12500,
  "infra_overhead_share": 0.12
}
```

Mapping from Helix Forge `session_metrics()` (agentic-resources):

| Helix field | ADR-004 field |
|-------------|---------------|
| `digest.outcome == "success"` | `success` |
| `digest.cost.wall_clock_s` | `execution_time_s` |
| `tokens` (input + output) | `tokens` in metadata / top-level |
| `infra_overhead_share` | `metadata.infra_overhead_share` |
| `Session.session_uid` | `helix_session_uid` |
| `Session.repo` | `repo` |
| `Session.flavor` | `flavor` |

Kaizen-agentic does **not** ingest Claude/Codex/Grok JSONL transcripts.
Correlation is **link-by-reference**: project metrics may cite a Helix session
UID; fleet analytics remain owned by agentic-resources.
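
The field mapping can be sketched as a pure function. Everything about the input shape here is an assumption: the function name `helix_to_execution_record` is hypothetical, and `session`, `digest`, and `metrics` are taken to be plain dicts carrying the Helix fields named in the table, which may not match what `session_metrics()` actually returns. `tokens` is placed under `metadata`, one of the two placements the table allows. `agent` and `timestamp` are not part of the mapping, so the caller supplies them.

```python
def helix_to_execution_record(session, digest, metrics, agent, timestamp):
    """Build an ADR-004 execution record from Helix Forge data (hypothetical helper).

    Assumes plain dicts as inputs; no transcript is read or parsed here,
    in keeping with the link-by-reference rule.
    """
    return {
        "timestamp": timestamp,
        "agent": agent,
        "success": digest["outcome"] == "success",
        "execution_time_s": digest["cost"]["wall_clock_s"],
        # Correlation fields: reference the Helix session rather than copying it.
        "helix_session_uid": session["session_uid"],
        "repo": session["repo"],
        "flavor": session["flavor"],
        "metadata": {
            "tokens": metrics["tokens"],
            "infra_overhead_share": metrics["infra_overhead_share"],
        },
    }
```

Only the session UID and a few scalar values cross the boundary, so fleet-level analysis stays with agentic-resources.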

WP-0004 defines the integration contract and optional sync tooling.

### Coach and memory integration

`kaizen-agentic memory brief <agent>` includes a `## Performance Summary`
section when `summary.json` exists (WP-0003 Part 4). Qualitative memory
(ADR-002) and quantitative metrics (this ADR) are complementary views of the
same agent's project history.

## Consequences

- Agents can be measured per project without a central telemetry platform.
- `OptimizationLoop` has a defined data source for recommendations.
- Fleet session analytics stay in agentic-resources; no duplicate ingestion.
- `.kaizen/metrics/` should default to `.gitignore` (same policy as memory).
- WP-0003 implements `MetricsStore` and CLI against this convention.
- WP-0004 wires ecosystem services (activity-core, artifact-store, Helix Forge).

## Related Documents

- [ADR-002: Project Memory Convention](ADR-002-project-memory-convention.md)
- [wiki/EcosystemIntegration.md](../../wiki/EcosystemIntegration.md)
- [agentic-resources session schema](https://github.com/coulomb/agentic-resources) — `session_memory/core/schema.py`
- [KAIZEN-WP-0003](../../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
- [KAIZEN-WP-0004](../../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)