| id | title | status | date |
|---|---|---|---|
| ADR-004 | Project Metrics Convention | accepted | 2026-06-16 |
ADR-004 — Project Metrics Convention
Status
Accepted
Context
INTENT.md requires agents to be measurable, versioned, and optimizable. The
agency framework (ADR-002) provides qualitative project memory; the kaizen
loop needs quantitative per-execution records.
wiki/AgentKaizenOptimizer.md specifies .kaizen/metrics/ storage.
OptimizationLoop in src/kaizen_agentic/optimization.py exists but has no
data source.
Separately, agentic-resources (Helix Forge) captures fleet-level session
metrics from coding agent transcripts. Project metrics and fleet metrics serve
different scopes and must correlate without duplicating ingestion logic.
Decision
Each agent deployed into a project may accumulate project-scoped execution metrics. Records are append-only JSONL with rolling summaries. The optimizer reads these files to produce evidence-based recommendations.
File locations
Per-agent executions:
```
<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl   # append-only per-execution records
  summary.json       # rolling aggregates (regenerated on write)
```
Optimizer outputs:
```
<project-root>/.kaizen/metrics/optimizer/
  analysis.json           # last analysis run + input fingerprint
  recommendations.jsonl   # append-only recommendation history
```
The .kaizen/metrics/ tree lives alongside .kaizen/agents/ under the same
project-level state directory (ADR-002).
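The layout above can be captured in a small path helper. This is a minimal sketch: `metrics_paths` and `scaffold` are hypothetical names chosen for illustration, not functions known to exist in kaizen-agentic, and the dictionary keys are assumptions.

```python
from pathlib import Path


# Hypothetical helper: name and return shape are illustrative only.
def metrics_paths(project_root: Path, agent: str) -> dict[str, Path]:
    """Return the ADR-004 file locations for one agent plus the optimizer."""
    base = Path(project_root) / ".kaizen" / "metrics"
    agent_dir = base / agent
    optimizer_dir = base / "optimizer"
    return {
        "executions": agent_dir / "executions.jsonl",
        "summary": agent_dir / "summary.json",
        "analysis": optimizer_dir / "analysis.json",
        "recommendations": optimizer_dir / "recommendations.jsonl",
    }


# Hypothetical helper: creates the directories only; files appear on first write.
def scaffold(project_root: Path, agent: str) -> None:
    """Create the per-agent and optimizer directories if they are missing."""
    for path in metrics_paths(project_root, agent).values():
        path.parent.mkdir(parents=True, exist_ok=True)
```

Keeping every path derived from one function means the optimizer and the recorder cannot drift apart on where the files live.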
Execution record schema (minimum viable)
```json
{
  "timestamp": "2026-06-16T12:00:00Z",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": {
    "name": "test_pass_rate",
    "value": 1.0,
    "target": 1.0
  },
  "metadata": {}
}
```
Required fields: timestamp, agent, success.
Recommended fields: execution_time_s, quality_score, primary_metric.
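A record can be checked against the minimum viable schema before it is appended. The sketch below is illustrative: `validate_record` is a hypothetical name, and the type rules are assumptions drawn from the example record and the CLI's `--quality <0-1>` flag rather than from a published validator.

```python
from datetime import datetime

REQUIRED_FIELDS = ("timestamp", "agent", "success")


# Hypothetical validator; the checks mirror the ADR examples, not a real API.
def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in record]
    if "timestamp" in record:
        try:
            # Rewrite the trailing "Z" so fromisoformat accepts it before Python 3.11.
            datetime.fromisoformat(str(record["timestamp"]).replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    if "success" in record and not isinstance(record["success"], bool):
        problems.append("success must be a boolean")
    q = record.get("quality_score")
    # ASSUMPTION: quality_score uses the same 0-1 range as the CLI's --quality flag.
    if q is not None and not (isinstance(q, (int, float)) and 0.0 <= q <= 1.0):
        problems.append("quality_score must be a number in [0, 1]")
    return problems
```

Recommended fields are deliberately not enforced, so a record carrying only the three required fields passes.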
Summary schema
summary.json is derived — never hand-edited. Regenerated on each append:
```json
{
  "agent": "tdd-workflow",
  "execution_count": 12,
  "success_rate": 0.917,
  "avg_quality_score": 0.82,
  "avg_execution_time_s": 45.3,
  "last_execution": "2026-06-16T12:00:00Z",
  "trend": {
    "success_rate": "stable",
    "quality_score": "up"
  }
}
```
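Because summary.json is fully derived, it can be rebuilt from executions.jsonl at any time. This sketch uses hypothetical names (`summarize`, `regenerate_summary`). Rounding to three decimals is an assumption chosen to match the example (11 successes out of 12 gives 0.917), and the `trend` block is omitted because this ADR does not define how trends are computed.

```python
import json
from pathlib import Path
from statistics import mean


# Hypothetical aggregation; "trend" is left out because ADR-004 does not define it.
def summarize(agent: str, records: list[dict]) -> dict:
    """Derive summary.json fields from execution records."""
    # Averages skip records that omit the optional field.
    qualities = [r["quality_score"] for r in records if r.get("quality_score") is not None]
    times = [r["execution_time_s"] for r in records if r.get("execution_time_s") is not None]
    count = len(records)
    successes = sum(1 for r in records if r["success"])
    return {
        "agent": agent,
        "execution_count": count,
        "success_rate": round(successes / count, 3) if count else None,
        "avg_quality_score": round(mean(qualities), 3) if qualities else None,
        "avg_execution_time_s": round(mean(times), 1) if times else None,
        # ASSUMPTION: uniform UTC "Z" timestamps, so string max equals time max.
        "last_execution": max((r["timestamp"] for r in records), default=None),
    }


def regenerate_summary(agent_dir: Path, agent: str) -> dict:
    """Re-read executions.jsonl and overwrite summary.json (never hand-edited)."""
    lines = (agent_dir / "executions.jsonl").read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    summary = summarize(agent, records)
    (agent_dir / "summary.json").write_text(json.dumps(summary, indent=2) + "\n", encoding="utf-8")
    return summary
```

Recomputing from the full file on each append is simple and cheap at the volumes a 180-day window implies; an incremental update would only be worth its complexity for very busy agents.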
Retention
Default retention: 180 days (per wiki/AgentKaizenOptimizer.md).
Pruning removes aged lines from executions.jsonl and regenerates summary.json.
Project-level override via .kaizen/metrics/config.json is reserved for a
future iteration.
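Pruning amounts to filtering executions.jsonl by timestamp. A minimal sketch, assuming UTC ISO 8601 timestamps with a trailing "Z" as in the examples; `prune_executions` is a hypothetical name, and the caller is still expected to regenerate summary.json afterwards, as described above.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

RETENTION_DAYS = 180  # ADR-004 default retention


def _parse_utc(ts: str) -> datetime:
    # ASSUMPTION: UTC timestamps ending in "Z"; rewritten for Python < 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Hypothetical pruning helper; summary.json must be regenerated by the caller.
def prune_executions(executions: Path, now: Optional[datetime] = None,
                     days: int = RETENTION_DAYS) -> int:
    """Remove records older than the retention window; return the count removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    lines = [ln for ln in executions.read_text(encoding="utf-8").splitlines() if ln.strip()]
    kept = [ln for ln in lines if _parse_utc(json.loads(ln)["timestamp"]) >= cutoff]
    # Write a sibling file and rename it over the original instead of
    # truncating executions.jsonl in place.
    tmp = executions.with_name(executions.name + ".tmp")
    tmp.write_text("".join(ln + "\n" for ln in kept), encoding="utf-8")
    tmp.replace(executions)
    return len(lines) - len(kept)
```

Pruning is the one operation that rewrites executions.jsonl; every other writer only appends.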
Session-close protocol
Memory-enabled agents with declared metrics should append one execution record at session close:
```
kaizen-agentic metrics record <agent> --success --time <seconds> --quality <0-1>
```
Or pipe a full JSON record via --json / stdin.
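For agents that write metrics from Python rather than through the CLI, the session-close append can be sketched as follows. `record_execution` is a hypothetical function meant to mirror what `metrics record` is described as doing; it is not the CLI's actual implementation, and summary regeneration is omitted for brevity.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


# Hypothetical equivalent of `metrics record <agent> --success --time --quality`.
def record_execution(agent_dir: Path, agent: str, success: bool,
                     time_s: Optional[float] = None,
                     quality: Optional[float] = None) -> dict:
    """Append one execution record to executions.jsonl and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": agent,
        "success": success,
    }
    if time_s is not None:
        record["execution_time_s"] = time_s
    if quality is not None:
        if not 0.0 <= quality <= 1.0:
            raise ValueError("quality must be in [0, 1]")
        record["quality_score"] = quality
    agent_dir.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends a single line; existing records are never rewritten.
    with (agent_dir / "executions.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```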
CLI interface
```
kaizen-agentic metrics record <agent>     # Append execution record
kaizen-agentic metrics show <agent>       # Summary + recent executions
kaizen-agentic metrics list               # Agents with metrics in project
kaizen-agentic metrics export <agent>     # Dump executions.jsonl
kaizen-agentic metrics optimize [agent]   # Run OptimizationLoop (WP-0003 Part 3)
```
kaizen-agentic memory init <agent> scaffolds metrics directories by default
(--no-metrics to opt out).
Helix Forge correlation
Kaizen-agentic project metrics and agentic-resources fleet metrics operate at different layers:
| Layer | Scope | Owner | Typical storage |
|---|---|---|---|
| Project | Per-agent persona in one repo | kaizen-agentic | .kaizen/metrics/ |
| Fleet | Cross-repo coding sessions | agentic-resources | Helix Forge digest store + measure/baselines.jsonl |
Correlation fields — optional on project execution records, populated when the session is also captured by Helix Forge:
```json
{
  "helix_session_uid": "claude:<native-session-uuid>",
  "repo": "kaizen-agentic",
  "flavor": "claude",
  "tokens": 12500,
  "infra_overhead_share": 0.12
}
```
Mapping from Helix Forge session_metrics() (agentic-resources):
| Helix field | ADR-004 field |
|---|---|
| digest.outcome == "success" | success |
| digest.cost.wall_clock_s | execution_time_s |
| tokens (input + output) | tokens in metadata / top-level |
| infra_overhead_share | metadata.infra_overhead_share |
| Session.session_uid | helix_session_uid |
| Session.repo | repo |
| Session.flavor | flavor |
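The mapping table can be read as a pure function from one Helix Forge result to one ADR-004 record. This is a sketch under a loudly stated assumption: the nested input layout (`digest`, `tokens` with `input`/`output`, `session`) is invented for illustration and must be checked against the agentic-resources session schema. `helix_to_adr004` is a hypothetical name. It follows the table in placing `infra_overhead_share` under `metadata` and puts `tokens` at the top level, as in the correlation example.

```python
from typing import Any


# Hypothetical mapper. ASSUMPTION: the input dict layout below is illustrative,
# not the real return shape of session_metrics().
def helix_to_adr004(helix: dict[str, Any], agent: str, timestamp: str) -> dict[str, Any]:
    """Map one Helix Forge session result onto an ADR-004 execution record."""
    digest = helix.get("digest", {})
    session = helix.get("session", {})
    tokens = helix.get("tokens", {})
    record: dict[str, Any] = {
        "timestamp": timestamp,
        "agent": agent,
        "success": digest.get("outcome") == "success",
        "metadata": {},
    }
    wall = digest.get("cost", {}).get("wall_clock_s")
    if wall is not None:
        record["execution_time_s"] = wall
    if tokens:
        record["tokens"] = tokens.get("input", 0) + tokens.get("output", 0)
    if "infra_overhead_share" in helix:
        record["metadata"]["infra_overhead_share"] = helix["infra_overhead_share"]
    # Correlation fields only: a reference to the session, no transcript ingestion.
    for helix_key, adr_key in (("session_uid", "helix_session_uid"),
                               ("repo", "repo"), ("flavor", "flavor")):
        if helix_key in session:
            record[adr_key] = session[helix_key]
    return record
```

Keeping the mapper pure (no file I/O) leaves ingestion with agentic-resources, consistent with the link-by-reference rule below.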
Kaizen-agentic does not ingest Claude/Codex/Grok JSONL transcripts. Correlation is link-by-reference: project metrics may cite a Helix session UID; fleet analytics remain owned by agentic-resources.
WP-0004 defines the integration contract and optional sync tooling.
Coach and memory integration
kaizen-agentic memory brief <agent> includes a ## Performance Summary
section when summary.json exists (WP-0003 Part 4). Qualitative memory
(ADR-002) and quantitative metrics (this ADR) are complementary views of the
same agent's project history.
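One way the brief could render the quantitative view is sketched below. `performance_summary_section` is a hypothetical name and the bullet wording is an assumption; WP-0003 Part 4 defines the real output. The function returns None when summary.json is absent, matching the condition stated above.

```python
import json
from pathlib import Path
from typing import Optional


# Hypothetical renderer; section layout is illustrative, not the WP-0003 format.
def performance_summary_section(summary_path: Path) -> Optional[str]:
    """Render a '## Performance Summary' markdown section, or None if absent."""
    if not summary_path.exists():
        return None
    s = json.loads(summary_path.read_text(encoding="utf-8"))
    lines = ["## Performance Summary", ""]
    lines.append(f"- Executions: {s.get('execution_count', 0)}")
    if s.get("success_rate") is not None:
        lines.append(f"- Success rate: {s['success_rate']:.1%}")
    if s.get("avg_quality_score") is not None:
        lines.append(f"- Avg quality score: {s['avg_quality_score']:.2f}")
    if s.get("avg_execution_time_s") is not None:
        lines.append(f"- Avg execution time: {s['avg_execution_time_s']:.1f} s")
    if s.get("last_execution"):
        lines.append(f"- Last execution: {s['last_execution']}")
    for metric, direction in s.get("trend", {}).items():
        lines.append(f"- Trend ({metric}): {direction}")
    return "\n".join(lines)
```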
Consequences
- Agents can be measured per project without a central telemetry platform.
- OptimizationLoop has a defined data source for recommendations.
- Fleet session analytics stay in agentic-resources; no duplicate ingestion.
- .kaizen/metrics/ should be git-ignored by default (same policy as memory).
- WP-0003 implements MetricsStore and the CLI against this convention.
- WP-0004 wires in ecosystem services (activity-core, artifact-store, Helix Forge).
Related Documents
- ADR-002: Project Memory Convention
- wiki/EcosystemIntegration.md
- agentic-resources session schema: session_memory/core/schema.py
- KAIZEN-WP-0003
- KAIZEN-WP-0004