Document measurement loop plan and ecosystem integration strategy.

Persist INTENT and ecosystem assessments in history/, add ADR-004 for project metrics with Helix Forge correlation, and register WP-0003 and WP-0004 workplans with State Hub. Update SCOPE, README, and agency-framework docs to reflect the two-layer measurement model.
2026-06-16 01:34:13 +02:00
parent 71ef5f4734
commit bd74d7d122
10 changed files with 1186 additions and 33 deletions
--- a/docs/adr/ADR-004-project-metrics-convention.md
+++ b/docs/adr/ADR-004-project-metrics-convention.md
@@ -0,0 +1,190 @@
+---
+id: ADR-004
+title: Project Metrics Convention
+status: accepted
+date: "2026-06-16"
+---
+
+# ADR-004 — Project Metrics Convention
+
+## Status
+
+Accepted
+
+## Context
+
+`INTENT.md` requires agents to be measurable, versioned, and optimizable. The
+agency framework (ADR-002) provides **qualitative** project memory; the kaizen
+loop needs **quantitative** per-execution records.
+
+`wiki/AgentKaizenOptimizer.md` specifies `.kaizen/metrics/` storage.
+`OptimizationLoop` in `src/kaizen_agentic/optimization.py` exists but has no
+data source.
+
+Separately, `agentic-resources` (Helix Forge) captures **fleet-level** session
+metrics from coding agent transcripts. Project metrics and fleet metrics serve
+different scopes and must correlate without duplicating ingestion logic.
+
+## Decision
+
+Each agent deployed into a project may accumulate **project-scoped execution
+metrics**. Records are append-only JSONL with rolling summaries. The optimizer
+reads these files to produce evidence-based recommendations.
+
+### File locations
+
+Per-agent executions:
+
+```
+<project-root>/.kaizen/metrics/<agent-name>/
+  executions.jsonl    # append-only per-execution records
+  summary.json        # rolling aggregates (regenerated on write)
+```
+
+Optimizer outputs:
+
+```
+<project-root>/.kaizen/metrics/optimizer/
+  analysis.json           # last analysis run + input fingerprint
+  recommendations.jsonl   # append-only recommendation history
+```
+
+The `.kaizen/metrics/` tree lives alongside `.kaizen/agents/` under the same
+project-level state directory (ADR-002).
+
+### Execution record schema (minimum viable)
+
+```json
+{
+  "timestamp": "2026-06-16T12:00:00Z",
+  "agent": "tdd-workflow",
+  "session_id": "optional-uuid-or-hash",
+  "execution_time_s": 0.0,
+  "success": true,
+  "quality_score": 0.0,
+  "primary_metric": {
+    "name": "test_pass_rate",
+    "value": 1.0,
+    "target": 1.0
+  },
+  "metadata": {}
+}
+```
+
+Required fields: `timestamp`, `agent`, `success`.
+Recommended fields: `execution_time_s`, `quality_score`, `primary_metric`.
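The append-only rule and the required-field list can be sketched in a few lines of Python. This is a minimal illustration, not the `MetricsStore` that WP-0003 will implement: `append_execution` and `utc_timestamp` are hypothetical names, and the sketch does not regenerate `summary.json`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

REQUIRED_FIELDS = ("timestamp", "agent", "success")  # required by ADR-004


def utc_timestamp():
    """Current time in the same ISO 8601 UTC form as the schema example."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def append_execution(project_root, record):
    """Validate required fields, then append one JSON line (hypothetical helper)."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    target = (
        Path(project_root) / ".kaizen" / "metrics" / record["agent"] / "executions.jsonl"
    )
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file, so history is never rewritten.
    with target.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return target
```

Writing one compact JSON object per line keeps the file readable with a plain line-by-line loop and makes pruning a matter of dropping whole lines.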
+
+### Summary schema
+
+`summary.json` is derived and must never be hand-edited. It is regenerated on each append:
+
+```json
+{
+  "agent": "tdd-workflow",
+  "execution_count": 12,
+  "success_rate": 0.917,
+  "avg_quality_score": 0.82,
+  "avg_execution_time_s": 45.3,
+  "last_execution": "2026-06-16T12:00:00Z",
+  "trend": {
+    "success_rate": "stable",
+    "quality_score": "up"
+  }
+}
+```
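As a rough sketch of how these aggregates could be derived from the execution records, the function below recomputes every field except `trend`, since this ADR does not define how trends are calculated. The name `summarize` is hypothetical, and the sketch assumes all timestamps share the UTC `...Z` form shown above, so that string comparison orders them correctly.

```python
from statistics import mean


def summarize(agent, records):
    """Build rolling aggregates from execution records (illustrative only)."""
    if not records:
        return {"agent": agent, "execution_count": 0}
    # Recommended fields may be absent, so average only the records that have them.
    qualities = [r["quality_score"] for r in records if "quality_score" in r]
    times = [r["execution_time_s"] for r in records if "execution_time_s" in r]
    successes = sum(1 for r in records if r["success"])
    return {
        "agent": agent,
        "execution_count": len(records),
        "success_rate": round(successes / len(records), 3),
        "avg_quality_score": round(mean(qualities), 2) if qualities else None,
        "avg_execution_time_s": round(mean(times), 1) if times else None,
        # Assumes uniform UTC timestamps, which sort lexicographically.
        "last_execution": max(r["timestamp"] for r in records),
    }
```

Recomputing from the full record list on every append is the simplest way to keep the summary consistent with `executions.jsonl`. An incremental update would be faster for large files but is harder to keep correct after pruning.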
+
+### Retention
+
+Default retention: **180 days** (per `wiki/AgentKaizenOptimizer.md`).
+Pruning removes records older than the retention window from `executions.jsonl` and regenerates `summary.json`.
+Project-level override via `.kaizen/metrics/config.json` is reserved for a
+future iteration.
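The pruning step might look like the sketch below. `prune_lines` is a hypothetical helper, and it assumes every record carries a timezone-aware ISO 8601 timestamp such as `2026-06-16T12:00:00Z`. The caller would write the kept lines back to `executions.jsonl` and then regenerate `summary.json`.

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 180  # default retention from this ADR


def prune_lines(lines, now, retention_days=RETENTION_DAYS):
    """Return only the JSONL lines whose timestamp is inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than failing on them
        stamp = json.loads(line)["timestamp"]
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if when >= cutoff:
            kept.append(line)
    return kept
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the behaviour deterministic and easy to test.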
+
+### Session-close protocol
+
+Memory-enabled agents with declared metrics should append one execution record
+at session close:
+
+```bash
+kaizen-agentic metrics record <agent> --success --time <seconds> --quality <0-1>
+```
+
+Alternatively, supply a full JSON record via `--json` or stdin.
+
+### CLI interface
+
+```
+kaizen-agentic metrics record <agent>   # Append execution record
+kaizen-agentic metrics show <agent>     # Summary + recent executions
+kaizen-agentic metrics list             # Agents with metrics in project
+kaizen-agentic metrics export <agent>   # Dump executions.jsonl
+kaizen-agentic metrics optimize [agent] # Run OptimizationLoop (WP-0003 Part 3)
+```
+
+`kaizen-agentic memory init <agent>` scaffolds metrics directories by default
+(`--no-metrics` to opt out).
+
+### Helix Forge correlation
+
+Kaizen-agentic **project metrics** and agentic-resources **fleet metrics**
+operate at different layers:
+
+| Layer | Scope | Owner | Typical storage |
+|-------|-------|-------|-----------------|
+| Project | Per-agent persona in one repo | kaizen-agentic | `.kaizen/metrics/` |
+| Fleet | Cross-repo coding sessions | agentic-resources | Helix Forge digest store + `measure/baselines.jsonl` |
+
+**Correlation fields** — optional on project execution records, populated when
+the session is also captured by Helix Forge:
+
+```json
+{
+  "helix_session_uid": "claude:<native-session-uuid>",
+  "repo": "kaizen-agentic",
+  "flavor": "claude",
+  "tokens": 12500,
+  "infra_overhead_share": 0.12
+}
+```
+
+Mapping from Helix Forge `session_metrics()` (agentic-resources):
+
+| Helix field | ADR-004 field |
+|-------------|---------------|
+| `digest.outcome == "success"` | `success` |
+| `digest.cost.wall_clock_s` | `execution_time_s` |
+| `tokens` (input + output) | `tokens` (top-level or under `metadata`) |
+| `infra_overhead_share` | `metadata.infra_overhead_share` |
+| `Session.session_uid` | `helix_session_uid` |
+| `Session.repo` | `repo` |
+| `Session.flavor` | `flavor` |
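The mapping table can be read as a small pure function. The sketch below is an assumption-heavy illustration: it takes plain dictionaries shaped after the field names in the table, not the real agentic-resources objects, whose API is not shown here. The function name `to_execution_record` is hypothetical. It places `tokens` at the top level and `infra_overhead_share` under `metadata`, following the table.

```python
def to_execution_record(agent, timestamp, session, digest, metrics):
    """Map Helix-style dicts onto an ADR-004 record (input shapes are assumed)."""
    return {
        "timestamp": timestamp,
        "agent": agent,
        # digest.outcome == "success" becomes the boolean `success` field.
        "success": digest["outcome"] == "success",
        "execution_time_s": digest["cost"]["wall_clock_s"],
        "tokens": metrics["tokens"],
        # Link-by-reference: only the session UID is stored, not the transcript.
        "helix_session_uid": session["session_uid"],
        "repo": session["repo"],
        "flavor": session["flavor"],
        "metadata": {"infra_overhead_share": metrics["infra_overhead_share"]},
    }
```

Because the function copies only identifiers and scalar metrics, transcript ingestion stays entirely on the agentic-resources side.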
+
+Kaizen-agentic does **not** ingest Claude/Codex/Grok JSONL transcripts.
+Correlation is **link-by-reference**: project metrics may cite a Helix session
+UID; fleet analytics remain owned by agentic-resources.
+
+WP-0004 defines the integration contract and optional sync tooling.
+
+### Coach and memory integration
+
+`kaizen-agentic memory brief <agent>` includes a `## Performance Summary`
+section when `summary.json` exists (WP-0003 Part 4). Qualitative memory
+(ADR-002) and quantitative metrics (this ADR) are complementary views of the
+same agent's project history.
+
+## Consequences
+
+- Agents can be measured per project without a central telemetry platform.
+- `OptimizationLoop` has a defined data source for recommendations.
+- Fleet session analytics stay in agentic-resources; no duplicate ingestion.
+- `.kaizen/metrics/` should be listed in `.gitignore` by default (same policy as memory).
+- WP-0003 implements `MetricsStore` and CLI against this convention.
+- WP-0004 wires ecosystem services (activity-core, artifact-store, Helix Forge).
+
+## Related Documents
+
+- [ADR-002: Project Memory Convention](ADR-002-project-memory-convention.md)
+- [wiki/EcosystemIntegration.md](../../wiki/EcosystemIntegration.md)
+- [agentic-resources session schema](https://github.com/coulomb/agentic-resources) — `session_memory/core/schema.py`
+- [KAIZEN-WP-0003](../../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
+- [KAIZEN-WP-0004](../../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)
--- a/docs/agency-framework.md
+++ b/docs/agency-framework.md
@@ -234,8 +234,56 @@ All agents that do session-bound project work have `memory: enabled` in their fr

 ---

+## Project Metrics
+
+Project-scoped **quantitative** metrics complement qualitative memory (ADR-002).
+Per-execution records live under `.kaizen/metrics/<agent>/` and feed the
+kaizen optimizer loop.
+
+### Location
+
+```
+<project-root>/.kaizen/metrics/<agent-name>/
+  executions.jsonl
+  summary.json
+
+<project-root>/.kaizen/metrics/optimizer/
+  analysis.json
+  recommendations.jsonl
+```
+
+### CLI (WP-0003)
+
+```
+kaizen-agentic metrics record <agent>   # Append execution record at session close
+kaizen-agentic metrics show <agent>     # Summary + recent executions
+kaizen-agentic metrics list             # Agents with metrics in project
+kaizen-agentic metrics export <agent>   # Dump executions.jsonl
+kaizen-agentic metrics optimize [agent] # Run optimizer on project metrics
+```
+
+`memory brief` includes a `## Performance Summary` section when metrics exist
+(WP-0003 Part 4).
+
+### Fleet correlation
+
+Project metrics correlate with **Helix Forge** fleet session metrics in
+`agentic-resources` via the optional `helix_session_uid` field (ADR-004). See
+[wiki/EcosystemIntegration.md](../wiki/EcosystemIntegration.md).
+
+### Evidence retention
+
+Optimizer outputs may be published to `artifact-store` (WP-0004 Part 3).
+
+---
+
 ## Related Documents

- [ADR-001: Workplan Convention](../workplans/kaizen-agentic-WP-0001-community-engagement.md) — how work items are structured
- [ADR-002: Project Memory Convention](../workplans/kaizen-agentic-WP-0002-agency-framework.md) — memory file location, structure, and lifecycle
- [WP-0002: Agency Framework](../workplans/kaizen-agentic-WP-0002-agency-framework.md) — full implementation workplan
+- [ADR-001: Workplan Convention](adr/ADR-001-workplan-convention.md)
+- [ADR-002: Project Memory Convention](adr/ADR-002-project-memory-convention.md)
+- [ADR-003: Protocols Artifact Convention](adr/ADR-003-protocols-artifact-convention.md)
+- [ADR-004: Project Metrics Convention](adr/ADR-004-project-metrics-convention.md)
+- [wiki/EcosystemIntegration.md](../wiki/EcosystemIntegration.md) — two-layer measurement model
+- [WP-0002: Agency Framework](../workplans/kaizen-agentic-WP-0002-agency-framework.md)
+- [WP-0003: Measurement Loop](../workplans/kaizen-agentic-WP-0003-measurement-loop.md)
+- [WP-0004: Ecosystem Integration](../workplans/kaizen-agentic-WP-0004-ecosystem-integration.md)