coulomb-loop/workplans/LOOP-WP-0002-reactive-quality-escalation.md
tegwick 5fa0b59cce Complete LOOP-WP-0002: event-driven quality escalation smoke + daily backup cadence
ACTIVITY-WP-0011 event-payload resolver unblocked the reactive path. Live NATS
smoke confirmed tasks_spawned=1 for low-success-rate-review. Promoted sweep from
hourly bootstrap to daily backup (disabled); cadence.yml now in stabilize phase.
2026-06-18 13:39:11 +02:00

id: LOOP-WP-0002
type: workplan
title: Reactive Quality Escalation (signal-driven improvement)
domain: coulomb_social
repo: coulomb-loop
status: finished
owner: coulomb-loop
topic_slug: coulomb_social
supplier: kaizen-agentic
created: 2026-06-18
updated: 2026-06-18
depends_on:
  - LOOP-WP-0001
tasks:
  - id: T01
    status: done
    title: Define escalation signals and thresholds
  - id: T02
    status: done
    title: Draft low-success-rate ActivityDefinition for coulomb-loop
  - id: T03
    status: done
    title: Specify kaizen.metrics.recorded event emitter contract
  - id: T04
    status: done
    title: Hourly metrics sweep fallback until event bus is live
  - id: T05
    status: done
    title: Wire activity-core event trigger and smoke test
  - id: T06
    status: done
    title: Add test-maintenance escalation path for CI-degraded repos
  - id: T07
    status: done
    title: Promote sweep cadence from hourly to daily after stabilization
state_hub_workstream_id: d4d3b624-dad3-4e1d-9db3-e84548d133de

LOOP-WP-0002 — Reactive Quality Escalation

Status: finished
Owner: coulomb-loop (customer)
Supplier: kaizen-agentic
Depends on: LOOP-WP-0001 (metrics scaffold on pilot repos)

Goal

Complement calendar-based improvement (LOOP-WP-0001) with signal-driven escalation: when agent performance or test health degrades, activity-core creates a high-priority task before drift compounds.

flowchart LR
  REC[metrics record at session close]
  EVT[kaizen.metrics.recorded]
  RULE[low-success-rate-review]
  TASK[high-priority hub task]
  AGT[optimization / test-maintenance session]
  REC --> EVT --> RULE --> TASK --> AGT

Escalation signals

| Signal | Threshold | Agent | Priority |
|---|---|---|---|
| Agent success rate | < 0.8 over ≥ 5 executions | optimization | high |
| Agent quality trend | declining 3 consecutive records | optimization | medium |
| Test failure streak | ≥ 2 CI failures in 24h (pilot) | test-maintenance | high |
| Optimizer recommends action | recommendations.jsonl non-empty + unacted | optimization | medium |

Thresholds live in loops/quality-escalation/thresholds.yml (created in T01).
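As a rough illustration of how the first two signals in the table could be evaluated, here is a minimal Python sketch. The function names, the inline THRESHOLDS dict, and the reading of "declining" as strictly decreasing are assumptions made for illustration; the real values live in thresholds.yml, whose schema is not reproduced here.

```python
from typing import Sequence

# Hypothetical in-memory mirror of the documented thresholds (0.8, min 5, 3 records).
THRESHOLDS = {"min_success_rate": 0.8, "min_executions": 5, "declining_records": 3}


def low_success_rate(success_rate: float, execution_count: int) -> bool:
    """True when the success rate is below threshold over enough executions."""
    return (
        execution_count >= THRESHOLDS["min_executions"]
        and success_rate < THRESHOLDS["min_success_rate"]
    )


def quality_declining(quality_history: Sequence[float]) -> bool:
    """True when the last N records are each lower than the one before.

    Assumption: "declining 3 consecutive records" means the three most
    recent records form a strictly decreasing sequence.
    """
    n = THRESHOLDS["declining_records"]
    recent = list(quality_history[-n:])
    if len(recent) < n:
        return False
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

Under these assumptions, a success rate of 0.75 over 12 executions trips the first signal, while the same rate over 3 executions does not, because the sample is too small.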

Cadence ramp

| Phase | Mechanism | Cadence |
|---|---|---|
| Bootstrap | Hourly metrics sweep (poll .kaizen/metrics/) | 0 * * * * |
| Stabilize | Event-driven kaizen.metrics.recorded primary; sweep daily backup | daily 0 6 * * * |
| Operate | Event-only; sweep weekly health check | weekly |

The hourly sweep exists because the event emitter may lag behind the LOOP-WP-0001 bootstrap. The sweep is temporary scaffolding, not the long-term design.

Part 1 — Signal contract

Define escalation signals and thresholds

id: LOOP-WP-0002-T01
status: done
priority: high
state_hub_task_id: "f6f549e5-0d67-49b8-8165-346c64000696"

Create loops/quality-escalation/thresholds.yml and document each signal's rationale. Align with the supplier's low-success-rate-review defaults (success rate below 0.8, minimum 5 executions).

Draft low-success-rate ActivityDefinition

id: LOOP-WP-0002-T02
status: done
priority: high
state_hub_task_id: "a19c2ade-029d-4b1c-ba08-8fd82729e649"

Copy kaizen-agentic/docs/integrations/activity-definitions/low-success-rate-review.md to coulomb-loop/activity-definitions/low-success-rate-review.md.

Adjust:

  • owner: coulomb-loop
  • trigger.type: event with event_type: kaizen.metrics.recorded
  • task labels include coulomb-loop, quality-escalation
  • enabled: false until smoke test

Specify event emitter contract

id: LOOP-WP-0002-T03
status: done
priority: high
state_hub_task_id: "6dbdb335-9c1d-4db0-ad7d-4342c966cd47"

Completed 2026-06-18: loops/quality-escalation/event-payload.md; supplier shipped metrics record --emit-event. activity-core R1 posted to state-hub.

Document expected NATS payload in loops/quality-escalation/event-payload.md:

{
  "agent": "coach",
  "project": "kaizen-agentic",
  "summary": {
    "success_rate": 0.75,
    "execution_count": 12,
    "avg_quality": 0.81
  }
}

Supplier action item: emit the event from kaizen-agentic metrics record when the --emit-event flag is set (or always in engagement mode). Track in supplier-notes; implementation stays in kaizen-agentic.
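For consumers of this contract, a minimal Python sketch of parsing and validating the payload might look like the following. The MetricsRecorded class and parse_metrics_recorded function are hypothetical names introduced for illustration; only the field names come from the payload above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsRecorded:
    """Typed view of a kaizen.metrics.recorded payload (hypothetical helper)."""
    agent: str
    project: str
    success_rate: float
    execution_count: int
    avg_quality: float


def parse_metrics_recorded(raw: str) -> MetricsRecorded:
    """Parse the JSON payload; raises KeyError or ValueError on malformed input."""
    data = json.loads(raw)
    summary = data["summary"]
    return MetricsRecorded(
        agent=str(data["agent"]),
        project=str(data["project"]),
        success_rate=float(summary["success_rate"]),
        execution_count=int(summary["execution_count"]),
        avg_quality=float(summary["avg_quality"]),
    )


# The sample payload from the contract above.
SAMPLE = (
    '{"agent": "coach", "project": "kaizen-agentic", '
    '"summary": {"success_rate": 0.75, "execution_count": 12, "avg_quality": 0.81}}'
)
event = parse_metrics_recorded(SAMPLE)
```

Failing loudly on a missing key is a deliberate choice in this sketch: a payload that silently loses summary.success_rate would otherwise never trigger the rule.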

Part 2 — Bootstrap sweep (hourly)

Hourly metrics sweep fallback

id: LOOP-WP-0002-T04
status: done
priority: medium
state_hub_task_id: "604a9515-0f6b-47e4-8a37-6bfc374ca4f3"

Completed 2026-06-18: hourly-metrics-health-sweep.md synced (disabled). Enable after event path smoke or as fallback per DEC-002.

Draft coulomb-loop/activity-definitions/hourly-metrics-health-sweep.md:

  • Resolver: shell discover_kaizen_projects with marker .kaizen/metrics
  • Filter: pilot roster from LOOP-WP-0001
  • Condition: read summary.json; flag success_rate < 0.8 && execution_count >= 5
  • Action: create review task with metrics show + metrics optimize commands

Cron: 45 * * * * (offset from LOOP-WP-0001 hourly chain).
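The sweep logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the shipped resolver: discover_projects is a stand-in for the shell discover_kaizen_projects resolver, projects are assumed to sit directly under one root directory, and summary.json is assumed to carry success_rate and execution_count at its top level.

```python
import json
from pathlib import Path
from typing import Iterator

MARKER = ".kaizen/metrics"  # marker directory named in the resolver bullet


def discover_projects(root: Path, marker: str = MARKER) -> Iterator[Path]:
    """Yield directories directly under root that contain the marker directory."""
    for child in sorted(root.iterdir()):
        if (child / marker).is_dir():
            yield child


def flag_low_success(root: Path, roster: set[str]) -> list[str]:
    """Return names of roster projects whose summary is below threshold."""
    flagged = []
    for project in discover_projects(root):
        if project.name not in roster:
            continue  # filter: pilot roster only
        summary_path = project / MARKER / "summary.json"
        if not summary_path.is_file():
            continue  # no metrics recorded yet
        summary = json.loads(summary_path.read_text(encoding="utf-8"))
        # Assumed flat layout; missing keys are treated as healthy.
        if summary.get("success_rate", 1.0) < 0.8 and summary.get("execution_count", 0) >= 5:
            flagged.append(project.name)
    return flagged
```

Each flagged name would then feed the action step, which creates the review task.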

Wire activity-core and smoke test

id: LOOP-WP-0002-T05
status: done
priority: medium
state_hub_task_id: "a5d8a6e1-9908-49a5-8976-900c433cd325"

Completed 2026-06-18 after activity-core ACTIVITY-WP-0011 event-payload resolver:

  1. low-success-rate-review enabled and synced (da7a9af7-7bec-5677-9520-3c6ee6d01964)
  2. Pilot metrics seeded on kaizen-agentic (coach success_rate: 0.75, execution_count: 8)
  3. Event router + NATS publish activity.kaizen.metrics.recorded → tasks_spawned=1
  4. Context snapshot confirmed context.metrics.summary.success_rate=0.75

Optimization close-out deferred to next scheduled optimization cycle (LOOP-WP-0001).
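To make the smoke expectation concrete, the sketch below builds the message for the seeded pilot metrics and checks it against the rule condition. It does not publish anything; the actual publish went through the event router and NATS. The helper names are hypothetical, and the payload omits avg_quality because the smoke steps do not state a value for it.

```python
import json

SUBJECT = "activity.kaizen.metrics.recorded"  # subject named in step 3


def build_smoke_message(
    agent: str, project: str, success_rate: float, execution_count: int
) -> tuple[str, bytes]:
    """Return (subject, encoded payload) for a metrics-recorded smoke event."""
    payload = {
        "agent": agent,
        "project": project,
        "summary": {"success_rate": success_rate, "execution_count": execution_count},
    }
    return SUBJECT, json.dumps(payload).encode("utf-8")


def should_spawn_task(payload_bytes: bytes) -> bool:
    """Rule condition as documented: success_rate < 0.8 over >= 5 executions."""
    summary = json.loads(payload_bytes)["summary"]
    return summary["success_rate"] < 0.8 and summary["execution_count"] >= 5


# Seeded pilot values from step 2.
subject, data = build_smoke_message("coach", "kaizen-agentic", 0.75, 8)
```

With the seeded values, should_spawn_task(data) is true, which matches the single spawned task observed in the live smoke.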

Part 3 — Test health path

test-maintenance escalation for CI-degraded repos

id: LOOP-WP-0002-T06
status: done
priority: low
state_hub_task_id: "206d33c2-0a8a-4274-8361-f69291f11b94"

Completed 2026-06-18: hourly-ci-health-escalation.md (disabled; CI probe resolver deferred).

Draft hourly-ci-health-escalation.md (bootstrap) / daily-ci-health-escalation.md (stabilize):

  • Context: state-hub or shell resolver listing pilot repos
  • Signal: open CI failure indicator (Gitea API or make test exit code in scheduled probe)
  • Agent: test-maintenance via schedule prepare test-maintenance
  • Scope: pilot repos only in bootstrap

Defer full fleet CI integration until daily phase.
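Once a CI probe resolver exists, the test failure streak signal (≥ 2 CI failures in 24h) could be evaluated along these lines. This is a sketch under assumptions: the probe is assumed to supply failure timestamps as timezone-aware datetimes, and "streak" is read as a count of failures inside a trailing window rather than strictly consecutive runs.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def ci_failure_streak(
    failure_times: Iterable[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=24),
    min_failures: int = 2,
) -> bool:
    """True when at least min_failures CI failures fall inside the trailing window."""
    cutoff = now - window
    recent = [t for t in failure_times if cutoff <= t <= now]
    return len(recent) >= min_failures


# Example: two failures in the last day, one older failure that is ignored.
now = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
failures = [now - timedelta(hours=1), now - timedelta(hours=23), now - timedelta(hours=30)]
escalate = ci_failure_streak(failures, now)
```

A positive result would route to the test-maintenance agent for the affected pilot repo.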

Promote sweep to daily

id: LOOP-WP-0002-T07
status: done
priority: low
state_hub_task_id: "fa4205f4-6058-495c-bcaa-40c20f27f9aa"

Completed 2026-06-18:

  1. Hourly sweep retired (enabled: false, cron promoted to 0 6 * * *)
  2. Event-driven low-success-rate-review primary (enabled: true)
  3. loops/quality-escalation/cadence.yml → phase: stabilize

Definition of done

  • Below-threshold metrics on a pilot repo create a task within one bootstrap cycle
  • Optimization session closes loop (success rate recovers or recommendation filed)
  • Event payload spec handed to kaizen-agentic supplier
  • Cadence promotion path documented

Out of scope

  • Full fleet CI integration in bootstrap phase
  • Implementing NATS emitter in coulomb-loop (supplier + activity-core)

Supplier feedback

Note whether metrics record --emit-event should become standard for customer engagements. Feed the outcome into the kaizen-agentic customer bootstrap playbook.