Document measurement loop plan and ecosystem integration strategy.

Persist INTENT and ecosystem assessments in history/, add ADR-004 for project metrics with Helix Forge correlation, and register WP-0003 and WP-0004 workplans with State Hub. Update SCOPE, README, and agency-framework docs to reflect the two-layer measurement model.
2026-06-16 01:34:13 +02:00
parent 71ef5f4734
commit bd74d7d122
10 changed files with 1186 additions and 33 deletions
--- a/workplans/kaizen-agentic-WP-0003-measurement-loop.md
+++ b/workplans/kaizen-agentic-WP-0003-measurement-loop.md
@@ -0,0 +1,289 @@
+---
+id: KAIZEN-WP-0003
+type: workplan
+title: "Measurement Loop: Metrics Convention, Collection, and Optimizer Integration"
+domain: custodian
+repo: kaizen-agentic
+status: active
+owner: kaizen-agentic
+topic_slug: custodian
+state_hub_workstream_id: 36252a45-f360-4496-bf77-17b5dfb02767
+created: "2026-06-16"
+updated: "2026-06-17"
+---
+
+# KAIZEN-WP-0003 — Measurement Loop: Metrics Convention, Collection, and Optimizer Integration
+
+**Status:** active
+**Owner:** kaizen-agentic
+**Repo:** kaizen-agentic
+**Target version:** 1.1.0 (partial; remainder in WP-0001)
+
+## Goal
+
+Close the kaizen feedback loop defined in `INTENT.md` and `wiki/AgentKaizenOptimizer.md`:
+agents produce **measurable, per-execution performance records** stored in project-scoped
+`.kaizen/metrics/`, the existing `OptimizationLoop` reads that data and generates
+evidence-based recommendations, and the Coach/optimizer meta-agents share a single
+improvement path.
+
+This workplan addresses the P0 gap from the INTENT gap analysis: strategic vision
+(memory + qualitative learning) exists; **quantitative measurement → refinement**
+does not.
+
+---
+
+## Background
+
+| Layer | State |
+|-------|-------|
+| `INTENT.md` | Requires measurable-by-default agents and evidence-based refinement |
+| `wiki/KaizenAgentTemplate.md` | Defines `metrics`, `idempotency`, `optimization` sections per agent |
+| `wiki/AgentKaizenOptimizer.md` | Specifies `.kaizen/metrics/` storage and optimizer behaviour |
+| `src/kaizen_agentic/optimization.py` | `OptimizationLoop` + `PerformanceMetrics` implemented and unit-tested, but not yet wired to project data |
+| Agency framework (WP-0002) | `.kaizen/agents/<name>/memory.md` + Coach brief — qualitative only |
+| WP-0001 T04 | Telemetry overlaps with this workplan; WP-0003 defines the convention and WP-0001 can adopt it |
+
+---
+
+## Part 1 — Metrics Convention and Storage
+
+Define the project-scoped metrics artifact alongside the existing memory convention
+(ADR-002).
+
+### Location convention
+
+```
+<project-root>/.kaizen/metrics/<agent-name>/
+  executions.jsonl          # append-only per-execution records
+  summary.json              # rolling aggregates (regenerated on write)
+```
+
+Optimizer-specific aggregates (per `wiki/AgentKaizenOptimizer.md`):
+
+```
+<project-root>/.kaizen/metrics/optimizer/
+  analysis.json             # last run output + fingerprint
+  recommendations.jsonl     # append-only recommendation history
+```
+
+### Execution record schema (minimum viable)
+
+```json
+{
+  "timestamp": "ISO-8601",
+  "agent": "tdd-workflow",
+  "session_id": "optional-uuid-or-hash",
+  "execution_time_s": 0.0,
+  "success": true,
+  "quality_score": 0.0,
+  "primary_metric": { "name": "...", "value": 0.0, "target": 0.0 },
+  "metadata": {}
+}
+```
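+
+A minimal Python sketch of this schema, in case a typed in-memory form is wanted. The names `ExecutionRecord` and `PrimaryMetric`, the field order, and the validation rules (non-negative time, quality score between 0.0 and 1.0, matching the `--quality` option in Part 2) are illustrative assumptions, not the ADR-004 definition.
+
+```python
+from __future__ import annotations
+
+import json
+from dataclasses import asdict, dataclass, field
+from datetime import datetime, timezone
+from typing import Any, Optional
+
+
+@dataclass
+class PrimaryMetric:
+    name: str
+    value: float
+    target: float
+
+
+@dataclass
+class ExecutionRecord:
+    """Hypothetical in-memory form of one executions.jsonl line."""
+
+    agent: str
+    execution_time_s: float
+    success: bool
+    quality_score: float
+    primary_metric: Optional[PrimaryMetric] = None
+    session_id: Optional[str] = None
+    metadata: dict[str, Any] = field(default_factory=dict)
+    # Timezone-aware ISO-8601 timestamp, defaulting to "now" in UTC.
+    timestamp: str = field(
+        default_factory=lambda: datetime.now(timezone.utc).isoformat()
+    )
+
+    def __post_init__(self) -> None:
+        if self.execution_time_s < 0:
+            raise ValueError("execution_time_s must be non-negative")
+        if not 0.0 <= self.quality_score <= 1.0:
+            raise ValueError("quality_score must be between 0.0 and 1.0")
+
+    def to_json_line(self) -> str:
+        # asdict() also converts the nested PrimaryMetric into a plain dict.
+        return json.dumps(asdict(self), separators=(",", ":"))
+```
+
+Serialising one compact record per line keeps `executions.jsonl` appendable without rewriting earlier entries.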
+
+### Tasks
+
+- [x] T01 — Write ADR-004: project metrics convention (location, schema, lifecycle, retention, Helix Forge correlation)
+- [ ] T02 — Implement `MetricsStore` in `src/kaizen_agentic/metrics.py` (append, read, summarise, prune by retention)
+- [ ] T03 — Add `memory init` hook to scaffold `.kaizen/metrics/<agent>/` alongside memory (opt out with `--no-metrics`)
+- [ ] T04 — Unit tests for `MetricsStore` (append with idempotency key, summary regeneration, retention prune)
+
+### Definition of done
+
+- ADR-004 accepted and referenced from `docs/agency-framework.md`
+- `MetricsStore` passes unit tests
+- `kaizen-agentic memory init <agent>` creates metrics scaffold by default
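+
+A minimal sketch of the T02 `MetricsStore`, assuming one store per agent directory, records as plain dicts with timezone-aware ISO-8601 timestamps, and the 180-day default retention from the Notes below. The constructor signature, the private `_write_summary` helper, and the `summary.json` fields (`count`, `success_rate`, averages) are hypothetical; the append idempotency key tested in T04 is not modelled here.
+
+```python
+import json
+from datetime import datetime, timedelta, timezone
+from pathlib import Path
+from typing import Any, Iterator, Optional
+
+
+class MetricsStore:
+    """Hypothetical store for <project-root>/.kaizen/metrics/<agent>/."""
+
+    def __init__(self, project_root: Path, agent: str, retention_days: int = 180) -> None:
+        self.dir = Path(project_root) / ".kaizen" / "metrics" / agent
+        self.executions = self.dir / "executions.jsonl"
+        self.summary_path = self.dir / "summary.json"
+        self.retention = timedelta(days=retention_days)
+
+    def append(self, record: dict[str, Any]) -> None:
+        # Append-only write, then regenerate summary.json from the full log.
+        self.dir.mkdir(parents=True, exist_ok=True)
+        with self.executions.open("a", encoding="utf-8") as fh:
+            fh.write(json.dumps(record, separators=(",", ":")) + "\n")
+        self._write_summary()
+
+    def read(self) -> Iterator[dict[str, Any]]:
+        if not self.executions.exists():
+            return
+        with self.executions.open(encoding="utf-8") as fh:
+            for line in fh:
+                if line.strip():
+                    yield json.loads(line)
+
+    def summarise(self) -> dict[str, Any]:
+        records = list(self.read())
+        n = len(records)
+        if n == 0:
+            return {"count": 0}
+        return {
+            "count": n,
+            "success_rate": sum(1 for r in records if r["success"]) / n,
+            "avg_execution_time_s": sum(r["execution_time_s"] for r in records) / n,
+            "avg_quality_score": sum(r["quality_score"] for r in records) / n,
+        }
+
+    def prune(self, now: Optional[datetime] = None) -> int:
+        """Drop records older than the retention window; return how many were removed."""
+        records = list(self.read())
+        if not records:
+            return 0
+        cutoff = (now or datetime.now(timezone.utc)) - self.retention
+        kept = [
+            r for r in records
+            # fromisoformat() before Python 3.11 rejects a trailing "Z".
+            if datetime.fromisoformat(r["timestamp"].replace("Z", "+00:00")) >= cutoff
+        ]
+        # Rewrite via a temp file so a crash mid-write does not truncate the existing log.
+        tmp = self.dir / "executions.jsonl.tmp"
+        tmp.write_text(
+            "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in kept),
+            encoding="utf-8",
+        )
+        tmp.replace(self.executions)
+        self._write_summary()
+        return len(records) - len(kept)
+
+    def _write_summary(self) -> None:
+        self.summary_path.write_text(json.dumps(self.summarise(), indent=2), encoding="utf-8")
+```
+
+Regenerating `summary.json` on every append costs a full read of the log; that is acceptable for one record per session and could be made incremental later if logs grow.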
+
+---
+
+## Part 2 — Metrics CLI
+
+Expose metrics collection and inspection without requiring Python imports in agent
+sessions.
+
+### Commands
+
+```
+kaizen-agentic metrics record <agent>   # Append one execution record (stdin JSON or flags)
+kaizen-agentic metrics show <agent>     # Print summary + recent executions
+kaizen-agentic metrics list             # List agents with metrics in current project
+kaizen-agentic metrics export <agent>   # Dump executions.jsonl to stdout
+```
+
+### Options (record)
+
+- `--target / -t` — project root (default: cwd)
+- `--success / --failure` — boolean outcome shorthand
+- `--time` — execution time in seconds
+- `--quality` — quality score 0.0–1.0
+- `--json` — full record on stdin
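+
+How the `record` options might combine into one execution record, shown as a plain function that a click command could call. This is a sketch under assumptions: the helper name `build_record`, the rule that flag mode requires outcome, time and quality together, and the rule that `--json` input may omit `agent` (filled from the positional argument) but must not contradict it are all hypothetical.
+
+```python
+import json
+from datetime import datetime, timezone
+from typing import Any, Optional
+
+
+def build_record(
+    agent: str,
+    *,
+    success: Optional[bool] = None,
+    time_s: Optional[float] = None,
+    quality: Optional[float] = None,
+    stdin_json: Optional[str] = None,
+) -> dict[str, Any]:
+    """Hypothetical helper behind `kaizen-agentic metrics record <agent>`."""
+    if stdin_json is not None:
+        # --json: the full record arrives on stdin.
+        record = json.loads(stdin_json)
+        if record.setdefault("agent", agent) != agent:
+            raise ValueError("agent in JSON payload does not match <agent> argument")
+    else:
+        # Flag shorthand: --success/--failure, --time, --quality.
+        if success is None or time_s is None or quality is None:
+            raise ValueError("flag mode needs --success/--failure, --time and --quality")
+        record = {
+            "agent": agent,
+            "success": success,
+            "execution_time_s": time_s,
+            "quality_score": quality,
+        }
+    record.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
+    if not 0.0 <= record.get("quality_score", 0.0) <= 1.0:
+        raise ValueError("quality score must be between 0.0 and 1.0")
+    return record
+```
+
+Keeping this logic out of the click command body lets T07 test record construction without invoking the CLI.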
+
+### Tasks
+
+- [ ] T05 — Implement `metrics` CLI command group (record, show, list, export)
+- [ ] T06 — Integrate `metrics record` into session-close protocol template for pilot agents
+- [ ] T07 — CLI tests for metrics commands (click.testing, temp project dir)
+- [ ] T08 — Update `docs/CLI_CHEAT_SHEET.md` and `docs/agency-framework.md` with metrics section
+
+### Definition of done
+
+- All four metrics commands work against a test project with `.kaizen/metrics/`
+- Session-close template documents the `metrics record` one-liner for pilot agents
+- CLI cheat sheet updated
+
+---
+
+## Part 3 — Wire OptimizationLoop to Project Metrics
+
+Connect the existing Python optimization infrastructure to real project data.
+
+### Tasks
+
+- [ ] T09 — Add `OptimizationLoop.from_metrics_store(store)` factory that loads `PerformanceMetrics` from executions
+- [ ] T10 — Implement `kaizen-agentic metrics optimize [agent]` — run analysis, print recommendations, write `optimizer/analysis.json`
+- [ ] T11 — Consolidate `agent-optimization.md` and `agent-agent-optimization.md` into single canonical `optimization` agent; update registry
+- [ ] T12 — Update `agent-optimization.md` session protocol to invoke `metrics optimize` and reference ADR-004
+- [ ] T13 — Unit + integration tests: synthetic executions → recommendations → non-empty output
+
+### Definition of done
+
+- `kaizen-agentic metrics optimize` produces recommendations when ≥10 execution records exist (per wiki minimum sample size)
+- Single canonical optimization meta-agent in registry
+- Tests cover insufficient-data and sufficient-data paths
+
+---
+
+## Part 4 — Bridge Coach, Memory, and Metrics
+
+Unify qualitative memory and quantitative metrics in the orientation path.
+
+### Tasks
+
+- [ ] T14 — Extend `memory brief` to include metrics summary for target agent (recent success rate, avg quality, trend arrow)
+- [ ] T15 — Extend `agent-coach.md` to reference metrics context in synthesis instructions
+- [ ] T16 — E2e test: populate memory + metrics for two agents → `memory brief` includes both qualitative and quantitative sections
+
+### Definition of done
+
+- `memory brief tdd-workflow` output includes a `## Performance Summary` block when metrics exist
+- E2e test passes
+
+---
+
+## Part 5 — Pilot Agent and Template Conformance
+
+Prove the loop end-to-end on one agent before fleet-wide rollout.
+
+**Pilot agent:** `tdd-workflow` (high usage, clear success criteria in existing prompt)
+
+### Tasks
+
+- [ ] T17 — Add `metrics` section to `agent-tdd-workflow.md` frontmatter (primary: test-pass rate; secondary: cycle time)
+- [ ] T18 — Add session-close step: invoke `kaizen-agentic metrics record tdd-workflow` with session outcome
+- [ ] T19 — Document pilot in `wiki/AboutKaizenAgents.md` as reference implementation
+- [ ] T20 — E2e test: simulated tdd-workflow sessions accumulate at least 10 execution records (the Part 3 minimum sample size) → optimize produces a recommendation
+
+### Definition of done
+
+- tdd-workflow is the documented reference for metrics-enabled agents
+- Full loop demonstrated in e2e test: record → show → optimize → brief
+
+---
+
+## Part 6 — Packaging and Orientation
+
+Close distribution and documentation gaps surfaced in the gap analysis.
+
+### Tasks
+
+- [ ] T21 — Sync the 4 missing agents into `src/kaizen_agentic/data/agents/` (coach, sys-medic, scope-analyst, optimization)
+- [ ] T22 — Update `README.md` Getting Oriented to link `INTENT.md` and `wiki/` (SCOPE.md already updated)
+- [ ] T23 — Update `.claude/rules/architecture.md` agent table (21 agents, meta category, sys-medic, coach)
+- [ ] T24 — CHANGELOG.md entry for metrics convention and CLI
+
+### Definition of done
+
+- `pip install` / packaged data includes all 21 agents
+- README orientation path matches SCOPE.md
+- architecture.md agent count accurate
+
+---
+
+## Sequencing
+
+```
+Part 1 (T01–T04)  ──→  Part 2 (T05–T08)  ──→  Part 3 (T09–T13)
+                                                    │
+                     Part 4 (T14–T16)  ←────────────┘
+                            │
+                     Part 5 (T17–T20)  ──→  Part 6 (T21–T24)
+```
+
+Parts 1–2 are blocking. Part 3 depends on storage + CLI. Parts 4–5 can overlap
+once the Part 3 factory (T09) exists. Part 6 can run in parallel with the other
+parts, except T21, which needs the final agent consolidation from T11.
+
+Estimated effort: 4–6 sessions.
+
+---
+
+## Out of Scope (this workplan)
+
+- Full `wiki/KaizenAgentTemplate.md` conformance for all 21 agents (future workplan)
+- KaizenGuidance codemod pipeline (`wiki/KaizenGuidance.md`)
+- Scheduled/automated optimizer runs (cron, activity-core integration) — convention only
+- WP-0001 CI/CD, PyPI publication, cross-platform testing
+- ML-based pattern detection (pandas/sklearn in wiki spec) — simple statistics first
+
+---
+
+## Success Criteria
+
+A reader of `INTENT.md` can point to this repo and say:
+
+1. Agents **can** record measurable per-execution outcomes in a standard location.
+2. The optimization loop **does** read real project data and produce recommendations.
+3. Coach orientation **includes** performance context, not only qualitative memory.
+4. At least one agent (tdd-workflow) demonstrates the full measure → analyse → orient cycle.
+
+---
+
+## State Hub Task IDs
+
+| Code | UUID |
+|------|------|
+| T01 | 4e7b0fd2-38c0-46aa-84a7-bb18366b8c7c |
+| T02 | eeaa99c7-d7a7-403b-a013-364cba45a663 |
+| T03 | 247c097f-de89-4383-930c-35ee66de9b36 |
+| T04 | 3aa14026-6ee3-4384-b409-11300c1302f0 |
+| T05 | 6b505d29-7d2e-44a2-a4b7-1fe82884390c |
+| T06 | 84f2a357-f2dd-4fc7-96b6-a4e80d5467a7 |
+| T07 | 8e9ee64b-b7c4-4dff-ac6e-988fd47ef95d |
+| T08 | 4c41e0db-d5d8-4a1b-b346-06ad004edf4a |
+| T09 | 0b374439-6eca-4754-8e15-2a7eece0cd27 |
+| T10 | db87a09b-0252-495c-a771-a43b4b98f820 |
+| T11 | 73cb7d73-6fc6-42a9-97aa-d33cdf9ee363 |
+| T12 | c127eca7-7394-42db-ba5e-721aef0ccb76 |
+| T13 | f208dc9f-cdf7-47e3-9c03-09097e46eee9 |
+| T14 | d01f969c-bbb1-4eca-a4f1-d79d5c867b35 |
+| T15 | 67f791a4-fced-4986-a331-7eb4ea47fe6e |
+| T16 | 1fb89b54-8bd2-40bf-9a71-04693cb9f695 |
+| T17 | 1d471a7a-9a98-4805-903e-b4a2b8153717 |
+| T18 | abb387f1-86ce-4b9b-a516-2d4efb6aca4c |
+| T19 | 67fbc26e-a57d-4133-96e6-3d2cdbd10dc0 |
+| T20 | fbdd7c8b-e122-48d9-8c8f-de9f82d025e3 |
+| T21 | 9662bcec-34fe-451b-b61f-5d11b9574576 |
+| T22 | 422aae43-5697-4a00-86e9-1569baf09422 |
+| T23 | ba6b3411-d330-4a58-8cd0-62b4fbef8c5f |
+| T24 | 748be9f3-f6ac-4f26-a844-6330268935b6 |
+
+**Hub workstream:** `kaizen-wp-0003-measurement-loop` (`36252a45-f360-4496-bf77-17b5dfb02767`)
+
+---
+
+## Notes
+
+- Retention default: 180 days (per `wiki/AgentKaizenOptimizer.md`); override via project config in a later iteration
+- WP-0001 T04 (telemetry) should consume ADR-004 schema rather than inventing a parallel format
+- `OptimizationLoop` threshold constants (30s execution, 0.8 success rate) are starting points; expose in config later
--- a/workplans/kaizen-agentic-WP-0004-ecosystem-integration.md
+++ b/workplans/kaizen-agentic-WP-0004-ecosystem-integration.md
@@ -0,0 +1,190 @@
+---
+id: KAIZEN-WP-0004
+type: workplan
+title: "Ecosystem Integration: Helix Forge, activity-core, and artifact-store"
+domain: custodian
+repo: kaizen-agentic
+status: active
+owner: kaizen-agentic
+topic_slug: custodian
+state_hub_workstream_id: 76be7294-e201-4074-91c0-6421992470fe
+created: "2026-06-16"
+updated: "2026-06-17"
+---
+
+# KAIZEN-WP-0004 — Ecosystem Integration: Helix Forge, activity-core, and artifact-store
+
+**Status:** active
+**Owner:** kaizen-agentic
+**Repo:** kaizen-agentic
+**Depends on:** KAIZEN-WP-0003 Part 3 (metrics CLI + `metrics optimize` operational)
+
+## Goal
+
+Compose KaizenAgentic with adjacent ecosystem repos so INTENT's measurement and
+improvement vision spans **project** and **fleet** layers without duplicating
+capabilities or violating repo boundaries.
+
+Primary integrations: **agentic-resources** (Helix Forge), **activity-core**
+(scheduled triggers), **artifact-store** (evidence retention). Secondary
+integrations (info-tech-canon, kontextual-engine) are Part 4 stretch goals.
+
+Reference: `wiki/EcosystemIntegration.md`, `history/2026-06-16-ecosystem-assessment.md`
+
+---
+
+## Part 1 — Helix Forge Correlation (agentic-resources)
+
+Wire project metrics (ADR-004) to fleet session metrics without re-implementing
+session ingestion.
+
+### Tasks
+
+- [ ] T01 — Document correlation contract in `agentic-resources` (cross-repo PR or shared doc link from both repos)
+- [ ] T02 — Add optional `helix_session_uid` population to `metrics record` when the `HELIX_SESSION_UID` environment variable is set
+- [ ] T03 — Add `kaizen-agentic metrics correlate` — lookup Helix digest summary by UID (read-only adapter stub if Helix API not ready)
+- [ ] T04 — Integration test: synthetic project record with `helix_session_uid` round-trips through show/brief
+- [ ] T05 — Update `wiki/EcosystemIntegration.md` with worked correlation example
+
+### Definition of done
+
+- Project execution records can carry Helix correlation fields per ADR-004
+- Documentation is bidirectional (kaizen-agentic + agentic-resources reference each other)
+- No session JSONL ingestion code in kaizen-agentic
+
+---
+
+## Part 2 — activity-core Triggers
+
+Define ActivityDefinitions for recurring kaizen operations.
+
+### Tasks
+
+- [ ] T06 — Draft ActivityDefinition: weekly `metrics optimize` on repos with `.kaizen/metrics/`
+- [ ] T07 — Draft ActivityDefinition: post-install metrics scaffold validation (`memory init` check)
+- [ ] T08 — Draft ActivityDefinition: success_rate below 0.8 → issue-core review task
+- [ ] T09 — Document ActivityDefinition paths and activation contract in `docs/INTEGRATION_PATTERNS.md`
+- [ ] T10 — Smoke test: manual activation against a test repo with populated metrics
+
+### Definition of done
+
+- Three ActivityDefinition markdown files committed (location per activity-core convention)
+- kaizen-agentic docs describe how activity-core triggers map to CLI commands
+- No scheduling code in kaizen-agentic
+
+---
+
+## Part 3 — artifact-store Evidence Retention
+
+Persist optimizer outputs as registered artifact packages.
+
+### Tasks
+
+- [ ] T11 — Define artifact package manifest for optimizer run (`analysis.json` + `recommendations.jsonl`)
+- [ ] T12 — Add `kaizen-agentic metrics publish` — register optimizer output with artifact-store API (configurable endpoint)
+- [ ] T13 — Map retention class `raw-evidence` (180d) in publish manifest metadata
+- [ ] T14 — Integration test with artifact-store local backend (skip if service unavailable; mark `@pytest.mark.integration`)
+- [ ] T15 — Document publish workflow in `docs/agency-framework.md` metrics section
+
+### Definition of done
+
+- Optimizer outputs can be registered as artifact packages when artifact-store is reachable
+- Retention metadata matches ADR-004 default
+- Publish is optional — local-only workflows still work without artifact-store
+
+---
+
+## Part 4 — Canon and Knowledge (stretch)
+
+Secondary integrations for template conformance and knowledge asset lifecycle.
+
+### Tasks
+
+- [ ] T16 — Map `wiki/KaizenAgentTemplate.md` sections to info-tech-canon profile outline (design doc only)
+- [ ] T17 — Draft one InfoTechCanon-style agent brief for `tdd-workflow` pilot
+- [ ] T18 — Spike: kontextual-engine ingestion manifest for `wiki/` directory (design note, no runtime dependency)
+- [ ] T19 — Update `history/2026-06-16-ecosystem-assessment.md` with Part 4 outcomes
+
+### Definition of done
+
+- Design artifacts committed; no hard dependency on info-tech-canon or kontextual-engine services
+- tdd-workflow brief serves as reference for fleet-wide brief rollout (future WP)
+
+---
+
+## Sequencing
+
+```
+WP-0003 Part 3 complete
+        │
+        ▼
+Part 1 (T01–T05)  ──→  Part 2 (T06–T10)
+        │                      │
+        └──────────┬───────────┘
+                   ▼
+            Part 3 (T11–T15)
+                   │
+                   ▼
+            Part 4 (T16–T19)  [stretch]
+```
+
+Part 1 can start once `metrics record` and `metrics optimize` exist.
+Parts 2–3 can overlap. Part 4 is non-blocking.
+
+Estimated effort: 3–5 sessions after WP-0003 Part 3.
+
+---
+
+## Out of Scope
+
+- Cloning or implementing tele-mcp (assess separately)
+- phase-memory graph migration (future WP)
+- Full KaizenGuidance codemod pipeline
+- Owning activity-core, artifact-store, or agentic-resources code
+
+---
+
+## Success Criteria
+
+1. The two-layer measurement model is documented, implemented at the correlation layer,
+   and operable without repo merges.
+2. Recurring kaizen checks can be triggered via activity-core without custom cron.
+3. Optimizer evidence can be preserved in artifact-store when configured.
+4. Canon/knowledge integration has a clear design path for later work.
+
+---
+
+## State Hub Task IDs
+
+| Code | UUID |
+|------|------|
+| T01 | f365d19e-9619-4453-bebf-f1fd596b1bd1 |
+| T02 | e7f47683-5957-49db-bcbd-3aa47f44a073 |
+| T03 | 6ef8ba99-7d0c-44f4-835d-7a66e9d55984 |
+| T04 | 9875422c-a54b-40f1-a444-6b485a9e57d6 |
+| T05 | 0dc33d13-0e0b-4336-a7ad-371fc533b823 |
+| T06 | dbaa5f46-f66a-4a74-b4a0-97978e47d1c3 |
+| T07 | 161a264a-8f70-4e37-a854-bd5a76a0e54b |
+| T08 | 3b58ad38-839c-436a-8d97-ef5a8f9beefe |
+| T09 | a004b60f-4e8f-4881-b088-229ac9ab242f |
+| T10 | 84866bf1-5830-470d-87a5-9786222332c2 |
+| T11 | 033a19db-fbd2-411f-9d2e-779d210400d4 |
+| T12 | 54517f2b-23e3-433b-a483-c59227625dbc |
+| T13 | 3b378789-a761-4472-b072-a346541be239 |
+| T14 | a3566713-db58-4519-b9c4-5003421c1f1e |
+| T15 | 5d8255aa-fd7a-4fe6-bce2-3a176f954c7f |
+| T16 | 852c9cbf-0b0c-4f23-8594-905ca280c268 |
+| T17 | 62e05097-9033-401d-bbe0-d5d773da50fe |
+| T18 | cd6962c7-aaed-4d7d-81de-37c0e3ed715e |
+| T19 | 2c1f66f5-e6ab-4e19-88ca-818acb15a706 |
+
+**Hub workstream:** `kaizen-wp-0004-ecosystem-integration` (`76be7294-e201-4074-91c0-6421992470fe`)
+
+---
+
+## Notes
+
+- ADR-004 Helix Forge correlation section is the authoritative field mapping
+- WP-0001 T04 (telemetry) should evaluate tele-mcp as adapter candidate
+- activity-core ActivityDefinitions live in activity-core repo per ACT-ADR-002/003;
+  kaizen-agentic commits reference copies or links under `docs/integrations/`