tegwick bd74d7d122 Document measurement loop plan and ecosystem integration strategy.

Persist INTENT and ecosystem assessments in history/, add ADR-004 for
project metrics with Helix Forge correlation, and register WP-0003 and
WP-0004 workplans with State Hub. Update SCOPE, README, and agency-framework
docs to reflect the two-layer measurement model.

2026-06-16 01:34:13 +02:00

id	title	status	date
ADR-004	Project Metrics Convention	accepted	2026-06-16

ADR-004 — Project Metrics Convention

Status

Accepted

Context

INTENT.md requires agents to be measurable, versioned, and optimizable. The agency framework (ADR-002) provides qualitative project memory; the kaizen loop needs quantitative per-execution records.

wiki/AgentKaizenOptimizer.md specifies .kaizen/metrics/ storage. OptimizationLoop in src/kaizen_agentic/optimization.py exists but has no data source.

Separately, agentic-resources (Helix Forge) captures fleet-level session metrics from coding agent transcripts. Project metrics and fleet metrics serve different scopes and must correlate without duplicating ingestion logic.

Decision

Each agent deployed into a project may accumulate project-scoped execution metrics. Records are append-only JSONL with rolling summaries. The optimizer reads these files to produce evidence-based recommendations.

File locations

Per-agent executions:

<project-root>/.kaizen/metrics/<agent-name>/
  executions.jsonl    # append-only per-execution records
  summary.json        # rolling aggregates (regenerated on write)

Optimizer outputs:

<project-root>/.kaizen/metrics/optimizer/
  analysis.json           # last analysis run + input fingerprint
  recommendations.jsonl   # append-only recommendation history

The .kaizen/metrics/ tree lives alongside .kaizen/agents/ under the same project-level state directory (ADR-002).

Execution record schema (minimum viable)

{
  "timestamp": "2026-06-16T12:00:00Z",
  "agent": "tdd-workflow",
  "session_id": "optional-uuid-or-hash",
  "execution_time_s": 0.0,
  "success": true,
  "quality_score": 0.0,
  "primary_metric": {
    "name": "test_pass_rate",
    "value": 1.0,
    "target": 1.0
  },
  "metadata": {}
}

Required fields: timestamp, agent, success. Recommended fields: execution_time_s, quality_score, primary_metric.
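The append step described above can be sketched in Python as follows. This is a minimal illustration, not the MetricsStore that WP-0003 will implement; the function names `append_execution` and `utc_now` are hypothetical, and only the directory layout and the required-field list come from this ADR.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Required fields as listed in this ADR.
REQUIRED_FIELDS = ("timestamp", "agent", "success")


def utc_now() -> str:
    """Return the current UTC time in the same format as the schema example."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def append_execution(project_root: Path, record: dict) -> Path:
    """Validate required fields and append one JSON line to executions.jsonl.

    Hypothetical helper: the name and signature are assumptions.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    agent_dir = project_root / ".kaizen" / "metrics" / record["agent"]
    agent_dir.mkdir(parents=True, exist_ok=True)
    path = agent_dir / "executions.jsonl"
    # Append-only: open in "a" mode and write exactly one line per record.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

A caller would pass the project root and a dict shaped like the schema above; recommended fields can simply be left out when they are unknown.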

Summary schema

summary.json is derived and must never be hand-edited. It is regenerated on each append:

{
  "agent": "tdd-workflow",
  "execution_count": 12,
  "success_rate": 0.917,
  "avg_quality_score": 0.82,
  "avg_execution_time_s": 45.3,
  "last_execution": "2026-06-16T12:00:00Z",
  "trend": {
    "success_rate": "stable",
    "quality_score": "up"
  }
}
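One way the summary could be derived from the execution records is sketched below. The ADR does not define how the trend labels are computed, so the `trend` rule here (newer half versus older half, with a 0.05 tolerance) is purely an assumption, as are the function names and the rounding.

```python
from statistics import mean


def trend(values: list[float], tolerance: float = 0.05) -> str:
    """Hypothetical trend rule: compare the newer half against the older half."""
    if len(values) < 2:
        return "stable"
    mid = len(values) // 2
    delta = mean(values[mid:]) - mean(values[:mid])
    if delta > tolerance:
        return "up"
    if delta < -tolerance:
        return "down"
    return "stable"


def summarize(agent: str, records: list[dict]) -> dict:
    """Derive rolling aggregates from execution records ordered oldest first."""
    if not records:
        return {"agent": agent, "execution_count": 0}
    success = [1.0 if r["success"] else 0.0 for r in records]
    # Recommended fields may be absent, so average only the records that have them.
    quality = [r["quality_score"] for r in records if "quality_score" in r]
    times = [r["execution_time_s"] for r in records if "execution_time_s" in r]
    return {
        "agent": agent,
        "execution_count": len(records),
        "success_rate": round(mean(success), 3),
        "avg_quality_score": round(mean(quality), 2) if quality else None,
        "avg_execution_time_s": round(mean(times), 1) if times else None,
        # Uniformly formatted UTC ISO 8601 strings sort chronologically.
        "last_execution": max(r["timestamp"] for r in records),
        "trend": {
            "success_rate": trend(success),
            "quality_score": trend(quality),
        },
    }
```

Because the summary is a pure function of the records, it can be rebuilt from executions.jsonl at any time, which is what makes hand-editing unnecessary.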

Retention

Default retention: 180 days (per wiki/AgentKaizenOptimizer.md). Pruning removes aged lines from executions.jsonl and regenerates summary.json. Project-level override via .kaizen/metrics/config.json is reserved for a future iteration.
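The pruning step might look like the sketch below. It assumes every record carries a UTC ISO 8601 timestamp as in the schema example; the function name `prune_executions` and the write-then-rename strategy are illustrative choices, not part of this ADR. Regenerating summary.json afterwards is left to the caller.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION_DAYS = 180  # default retention stated in this ADR


def prune_executions(path: Path, now: datetime,
                     retention_days: int = RETENTION_DAYS) -> int:
    """Drop records older than the retention window; return how many were removed.

    `now` must be timezone-aware so it can be compared with record timestamps.
    """
    cutoff = now - timedelta(days=retention_days)
    kept, removed = [], 0
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        stamp = json.loads(line)["timestamp"]
        # fromisoformat() before Python 3.11 rejects a trailing "Z"; normalize it.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when >= cutoff:
            kept.append(line)
        else:
            removed += 1
    # Write to a sibling temp file, then rename, so a crash cannot leave
    # a half-written executions.jsonl behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text("".join(item + "\n" for item in kept), encoding="utf-8")
    tmp.replace(path)
    return removed
```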

Session-close protocol

Memory-enabled agents with declared metrics should append one execution record at session close:

kaizen-agentic metrics record <agent> --success --time <seconds> --quality <0-1>

Alternatively, supply a full JSON record via --json / stdin.

CLI interface

kaizen-agentic metrics record <agent>   # Append execution record
kaizen-agentic metrics show <agent>     # Summary + recent executions
kaizen-agentic metrics list             # Agents with metrics in project
kaizen-agentic metrics export <agent>   # Dump executions.jsonl
kaizen-agentic metrics optimize [agent] # Run OptimizationLoop (WP-0003 Part 3)

kaizen-agentic memory init <agent> scaffolds metrics directories by default (--no-metrics to opt out).

Helix Forge correlation

Kaizen-agentic project metrics and agentic-resources fleet metrics operate at different layers:

Layer	Scope	Owner	Typical storage
Project	Per-agent persona in one repo	kaizen-agentic	`.kaizen/metrics/`
Fleet	Cross-repo coding sessions	agentic-resources	Helix Forge digest store + `measure/baselines.jsonl`

Correlation fields are optional on project execution records and are populated when the session is also captured by Helix Forge:

{
  "helix_session_uid": "claude:<native-session-uuid>",
  "repo": "kaizen-agentic",
  "flavor": "claude",
  "tokens": 12500,
  "infra_overhead_share": 0.12
}

Mapping from Helix Forge session_metrics() (agentic-resources):

Helix field	ADR-004 field
`digest.outcome == "success"`	`success`
`digest.cost.wall_clock_s`	`execution_time_s`
`tokens` (input + output)	`tokens` in metadata / top-level
`infra_overhead_share`	`metadata.infra_overhead_share`
`Session.session_uid`	`helix_session_uid`
`Session.repo`	`repo`
`Session.flavor`	`flavor`

Kaizen-agentic does not ingest Claude/Codex/Grok JSONL transcripts. Correlation is link-by-reference: project metrics may cite a Helix session UID; fleet analytics remain owned by agentic-resources.
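The mapping table can be read as a small translation function. The sketch below assumes the Helix Forge data arrives as a plain dict with `digest`, `session`, `tokens`, and `infra_overhead_share` keys, and that `tokens` holds separate `input` and `output` counts; that input shape is a guess for illustration, not the actual `session_metrics()` return type. The function name `to_project_record` is likewise hypothetical. Token counts are placed at the top level here, following the correlation example above.

```python
def to_project_record(agent: str, timestamp: str, helix: dict) -> dict:
    """Translate a Helix-style session dict into ADR-004 fields.

    The input layout is an assumption; only the field pairing follows the table.
    """
    digest = helix["digest"]
    session = helix["session"]
    tokens = helix["tokens"]
    return {
        "timestamp": timestamp,
        "agent": agent,
        "success": digest["outcome"] == "success",
        "execution_time_s": digest["cost"]["wall_clock_s"],
        "helix_session_uid": session["session_uid"],
        "repo": session["repo"],
        "flavor": session["flavor"],
        # The table gives tokens as input + output.
        "tokens": tokens["input"] + tokens["output"],
        "metadata": {"infra_overhead_share": helix["infra_overhead_share"]},
    }
```

Note that the function only copies values that Helix Forge has already computed and reads no transcripts, which is consistent with the link-by-reference approach.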

WP-0004 defines the integration contract and optional sync tooling.

Coach and memory integration

kaizen-agentic memory brief <agent> includes a ## Performance Summary section when summary.json exists (WP-0003 Part 4). Qualitative memory (ADR-002) and quantitative metrics (this ADR) are complementary views of the same agent's project history.
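As an illustration of how the brief could surface the numbers, the sketch below renders a summary dict into a `## Performance Summary` section. Only the section heading comes from this ADR; the bullet layout, the wording, and the function name `render_performance_summary` are assumptions, and the real output is defined by WP-0003 Part 4.

```python
def render_performance_summary(summary: dict) -> str:
    """Render summary.json contents as a markdown section (hypothetical layout)."""
    lines = [
        "## Performance Summary",
        "",
        f"- Executions: {summary['execution_count']}",
        f"- Success rate: {summary['success_rate']:.1%}",
        f"- Average quality score: {summary['avg_quality_score']:.2f}",
        f"- Average execution time: {summary['avg_execution_time_s']:.1f} s",
        f"- Last execution: {summary['last_execution']}",
    ]
    # Trend entries are optional, so render only what the summary contains.
    for metric, direction in summary.get("trend", {}).items():
        lines.append(f"- Trend ({metric}): {direction}")
    return "\n".join(lines) + "\n"
```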

Consequences

Agents can be measured per project without a central telemetry platform.
OptimizationLoop has a defined data source for recommendations.
Fleet session analytics stay in agentic-resources; no duplicate ingestion.
.kaizen/metrics/ should be listed in .gitignore by default (same policy as memory).
WP-0003 implements MetricsStore and CLI against this convention.
WP-0004 wires ecosystem services (activity-core, artifact-store, Helix Forge).

References

ADR-002: Project Memory Convention
wiki/EcosystemIntegration.md
agentic-resources session schema — session_memory/core/schema.py
KAIZEN-WP-0003
KAIZEN-WP-0004
