coulomb-loop/workplans/LOOP-WP-0002-reactive-quality-escalation.md
tegwick e783dc9a2b Bootstrap coulomb-loop engagement: governance, loops, and activity definitions.
Register with state-hub, accept DEC-001–004 and ADR-004 rotation policy, scaffold
pilot roster, hourly ActivityDefinition copies, and bootstrap log after schedule
init on three custodian pilot repos.
2026-06-18 04:53:51 +02:00


---
id: LOOP-WP-0002
type: workplan
title: "Reactive Quality Escalation (signal-driven improvement)"
domain: coulomb_social
repo: coulomb-loop
status: active
owner: coulomb-loop
topic_slug: coulomb_social
supplier: kaizen-agentic
created: "2026-06-18"
updated: "2026-06-18"
depends_on:
- LOOP-WP-0001
tasks:
- id: T01
status: done
title: Define escalation signals and thresholds
- id: T02
status: done
title: Draft low-success-rate ActivityDefinition for coulomb-loop
- id: T03
status: todo
title: Specify kaizen.metrics.recorded event emitter contract
- id: T04
status: todo
title: Hourly metrics sweep fallback until event bus is live
- id: T05
status: todo
title: Wire activity-core event trigger and smoke test
- id: T06
status: todo
title: Add test-maintenance escalation path for CI-degraded repos
- id: T07
status: todo
title: Promote sweep cadence from hourly to daily after stabilization
state_hub_workstream_id: "d4d3b624-dad3-4e1d-9db3-e84548d133de"
---
# LOOP-WP-0002 — Reactive Quality Escalation
**Status:** active
**Owner:** coulomb-loop (customer)
**Supplier:** kaizen-agentic
**Depends on:** LOOP-WP-0001 (metrics scaffold on pilot repos)
## Goal
Complement calendar-based improvement (LOOP-WP-0001) with **signal-driven**
escalation: when agent performance or test health degrades, activity-core creates
a high-priority task before drift compounds.
```mermaid
flowchart LR
REC[metrics record at session close]
EVT[kaizen.metrics.recorded]
RULE[low-success-rate-review]
TASK[high-priority hub task]
AGT[optimization / test-maintenance session]
REC --> EVT --> RULE --> TASK --> AGT
```
## Escalation signals
| Signal | Threshold | Agent | Priority |
|--------|-----------|-------|----------|
| Agent success rate | < 0.8 over ≥ 5 executions | `optimization` | high |
| Agent quality trend | declining 3 consecutive records | `optimization` | medium |
| Test failure streak | ≥ 2 CI failures in 24h (pilot) | `test-maintenance` | high |
| Optimizer recommends action | `recommendations.jsonl` is non-empty and not yet acted on | `optimization` | medium |
Thresholds live in `loops/quality-escalation/thresholds.yml` (created in T01).
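The first two rows of the table can be read as simple predicates. The following is a minimal sketch, not the supplier's implementation: the `Thresholds` class and its field names are hypothetical, and "declining 3 consecutive records" is assumed to mean that the last three quality scores are strictly decreasing.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    # Values come from the signals table; the field names are hypothetical.
    min_success_rate: float = 0.8
    min_executions: int = 5
    declining_records: int = 3


def low_success_rate(success_rate: float, execution_count: int,
                     t: Thresholds = Thresholds()) -> bool:
    """True when the rate is below threshold and the sample is large enough."""
    return execution_count >= t.min_executions and success_rate < t.min_success_rate


def quality_declining(scores: Sequence[float], t: Thresholds = Thresholds()) -> bool:
    """True when the last N quality scores are strictly decreasing (assumed reading)."""
    n = t.declining_records
    if len(scores) < n:
        return False  # not enough history to call it a trend
    tail = list(scores[-n:])
    return all(earlier > later for earlier, later in zip(tail, tail[1:]))
```

Requiring a minimum execution count keeps a single failed run on a new agent from raising a high-priority task.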
## Cadence ramp
| Phase | Mechanism | Cadence |
|-------|-----------|---------|
| Bootstrap | Hourly metrics **sweep** (poll `.kaizen/metrics/`) | `45 * * * *` |
| Stabilize | Event-driven `kaizen.metrics.recorded` primary; sweep daily backup | daily `0 6 * * *` |
| Operate | Event-only; sweep weekly health check | weekly |
The hourly sweep exists because the event emitter may lag behind the LOOP-WP-0001
bootstrap. The sweep is **temporary scaffolding**, not the long-term design.
## Part 1 — Signal contract
## Define escalation signals and thresholds
```task
id: LOOP-WP-0002-T01
status: done
priority: high
state_hub_task_id: "f6f549e5-0d67-49b8-8165-346c64000696"
```
Create `loops/quality-escalation/thresholds.yml` and document each signal's
rationale. Align with the supplier's `low-success-rate-review` defaults
(success rate 0.8, minimum 5 executions).
## Draft low-success-rate ActivityDefinition
```task
id: LOOP-WP-0002-T02
status: done
priority: high
state_hub_task_id: "a19c2ade-029d-4b1c-ba08-8fd82729e649"
```
Copy `kaizen-agentic/docs/integrations/activity-definitions/low-success-rate-review.md`
to `coulomb-loop/activity-definitions/low-success-rate-review.md`.
Adjust:
- `owner: coulomb-loop`
- `trigger.type: event` with `event_type: kaizen.metrics.recorded`
- task `labels` include `coulomb-loop`, `quality-escalation`
- `enabled: false` until smoke test
## Specify event emitter contract
```task
id: LOOP-WP-0002-T03
status: todo
priority: high
state_hub_task_id: "6dbdb335-9c1d-4db0-ad7d-4342c966cd47"
```
Document the expected NATS payload in `loops/quality-escalation/event-payload.md`:
```json
{
"agent": "coach",
"project": "kaizen-agentic",
"summary": {
"success_rate": 0.75,
"execution_count": 12,
"avg_quality": 0.81
}
}
```
Supplier action item: emit the event from `kaizen-agentic metrics record` when the
`--emit-event` flag is set (or always in engagement mode). Track this in
supplier-notes; the implementation stays in kaizen-agentic.
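A consumer of this event could check the payload shape before evaluating thresholds. The sketch below is illustrative only: the field names come from the example payload above, while the function name `parse_metrics_event` and the type rules (non-empty strings, numeric rates, non-negative integer count) are assumptions rather than part of any agreed contract.

```python
import json
from typing import Any

# Field names come from the example payload; the type rules are assumptions.
NUMERIC_FIELDS = ("success_rate", "avg_quality")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_metrics_event(raw: str) -> dict[str, Any]:
    """Parse a kaizen.metrics.recorded payload; raise ValueError if malformed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"payload is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    for key in ("agent", "project"):
        if not isinstance(payload.get(key), str) or not payload[key]:
            raise ValueError(f"missing or empty string field: {key}")
    summary = payload.get("summary")
    if not isinstance(summary, dict):
        raise ValueError("missing object field: summary")
    for key in NUMERIC_FIELDS:
        if not _is_number(summary.get(key)):
            raise ValueError(f"summary.{key} must be a number")
    count = summary.get("execution_count")
    if not isinstance(count, int) or isinstance(count, bool) or count < 0:
        raise ValueError("summary.execution_count must be a non-negative integer")
    return payload
```

Rejecting malformed events at the boundary means the escalation rule only ever sees summaries it can compare against the thresholds.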
## Part 2 — Bootstrap sweep (hourly)
## Hourly metrics sweep fallback
```task
id: LOOP-WP-0002-T04
status: todo
priority: medium
state_hub_task_id: "604a9515-0f6b-47e4-8a37-6bfc374ca4f3"
```
Draft `coulomb-loop/activity-definitions/hourly-metrics-health-sweep.md`:
- Resolver: shell `discover_kaizen_projects` with marker `.kaizen/metrics`
- Filter: pilot roster from LOOP-WP-0001
- Condition: read `summary.json`; flag `success_rate < 0.8 && execution_count >= 5`
- Action: create review task with `metrics show` + `metrics optimize` commands
Cron: `45 * * * *` (offset from LOOP-WP-0001 hourly chain).
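The resolver, filter, and condition steps above can be sketched in a few lines. This is a rough illustration under stated assumptions: projects are assumed to sit directly under one root directory, `summary.json` is assumed to carry top-level `success_rate` and `execution_count` keys, and `find_flagged_projects` is a hypothetical name, not the `discover_kaizen_projects` resolver itself.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def find_flagged_projects(root: Path, roster: Iterable[str],
                          min_rate: float = 0.8,
                          min_runs: int = 5) -> Iterator[tuple[str, dict]]:
    """Yield (project, summary) for rostered projects below the threshold."""
    allowed = set(roster)
    # Resolver: any direct child of root that carries the .kaizen/metrics marker.
    for summary_path in sorted(root.glob("*/.kaizen/metrics/summary.json")):
        project = summary_path.parents[2].name
        # Filter: only projects on the pilot roster.
        if project not in allowed:
            continue
        try:
            summary = json.loads(summary_path.read_text(encoding="utf-8"))
            rate = float(summary["success_rate"])
            runs = int(summary["execution_count"])
        except (OSError, ValueError, KeyError, TypeError):
            continue  # unreadable or malformed summary: skip rather than escalate
        # Condition: success_rate < 0.8 && execution_count >= 5
        if rate < min_rate and runs >= min_runs:
            yield project, summary
```

Each yielded project would then become one review task. Task creation is left out because it depends on the activity-core API.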
## Wire activity-core and smoke test
```task
id: LOOP-WP-0002-T05
status: todo
priority: medium
state_hub_task_id: "a5d8a6e1-9908-49a5-8976-900c433cd325"
```
1. Sync sweep definition to activity-core
2. Inject test metrics below threshold on one pilot repo
3. Verify task creation within one hourly cycle
4. Run optimization session; confirm metrics improve on next sweep
When the event bus is ready, enable `low-success-rate-review` and keep the sweep
as a backup only.
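Step 2 of the smoke test needs synthetic metrics that are easy to undo. One possible approach, assuming the sweep reads `.kaizen/metrics/summary.json` with top-level `success_rate` and `execution_count` keys, is a context manager that writes below-threshold values and restores the previous file on exit. The name `injected_metrics` and its default values are hypothetical.

```python
import json
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def injected_metrics(repo: Path, success_rate: float = 0.5,
                     execution_count: int = 5) -> Iterator[Path]:
    """Temporarily write below-threshold metrics into a pilot repo."""
    target = repo / ".kaizen" / "metrics" / "summary.json"
    # Keep the real summary so it can be restored after the smoke test.
    original = target.read_bytes() if target.exists() else None
    target.parent.mkdir(parents=True, exist_ok=True)
    synthetic = {"success_rate": success_rate, "execution_count": execution_count}
    target.write_text(json.dumps(synthetic, indent=2), encoding="utf-8")
    try:
        yield target
    finally:
        if original is None:
            target.unlink(missing_ok=True)  # nothing existed before: remove the file
        else:
            target.write_bytes(original)  # put the real metrics back
```

The context would stay open until task creation has been verified. The restore step only cleans up the synthetic data and does not count as the recovery that step 4 checks for.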
## Part 3 — Test health path
## test-maintenance escalation for CI-degraded repos
```task
id: LOOP-WP-0002-T06
status: todo
priority: low
state_hub_task_id: "206d33c2-0a8a-4274-8361-f69291f11b94"
```
Draft `hourly-ci-health-escalation.md` (bootstrap) / `daily-ci-health-escalation.md`
(stabilize):
- Context: state-hub or shell resolver listing pilot repos
- Signal: open CI failure indicator (Gitea API or `make test` exit code in scheduled probe)
- Agent: `test-maintenance` via `schedule prepare test-maintenance`
- Scope: pilot repos only in bootstrap
Defer full fleet CI integration until the daily (stabilize) phase.
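The pilot threshold from the signals table (≥ 2 CI failures in 24h) reduces to a count over a time window. The sketch below assumes the probe has already collected failure timestamps from whichever source is chosen (Gitea API or scheduled `make test`); the function name `ci_failure_streak` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def ci_failure_streak(failure_times: Iterable[datetime], now: datetime,
                      window: timedelta = timedelta(hours=24),
                      min_failures: int = 2) -> bool:
    """True when at least min_failures CI failures fall inside the window."""
    cutoff = now - window
    # Count only failures between the cutoff and now, inclusive on both ends.
    recent = [t for t in failure_times if cutoff <= t <= now]
    return len(recent) >= min_failures
```

All timestamps should share one convention, either all timezone-aware or all naive, since Python refuses to compare the two kinds.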
## Promote sweep to daily
```task
id: LOOP-WP-0002-T07
status: todo
priority: low
state_hub_task_id: "fa4205f4-6058-495c-bcaa-40c20f27f9aa"
```
After LOOP-WP-0004 is approved and the event trigger is proven:
1. Disable hourly sweep
2. Enable event-driven `low-success-rate-review`
3. Retain daily backup sweep at `0 6 * * *`
## Definition of done
- Below-threshold metrics on a pilot repo create a task within one bootstrap cycle
- Optimization session closes loop (success rate recovers or recommendation filed)
- Event payload spec handed to kaizen-agentic supplier
- Cadence promotion path documented
## Out of scope
- Full fleet CI integration in bootstrap phase
- Implementing the NATS emitter in coulomb-loop (that work belongs to the supplier and activity-core)
## Supplier feedback
Note whether `metrics record --emit-event` should become standard for customer
engagements, and feed the answer into the kaizen-agentic customer bootstrap playbook.