chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-06-30:
- update .custodian-brief.md for activity-core
2026-06-30 01:50:22 +02:00 · 2026-06-29 13:45:41 +02:00 · 2026-06-29 13:33:21 +02:00 · 2026-06-29 12:57:25 +02:00 · 2026-06-27 20:34:25 +02:00 · 2026-06-27 09:58:47 +02:00
82 changed files with 7108 additions and 227 deletions
--- a/.claude/rules/credential-routing.md
+++ b/.claude/rules/credential-routing.md
@@ -0,0 +1,50 @@
 # Credential and access routing
 **Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
 for inference. Run this check **before** requesting secrets, API keys, SSH access,
 login tokens, or database passwords — in any repo, not only `ops-warden`.
 ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
 other credential need belongs to another subsystem. **Do not** message
 `ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
 ### Lookup (do this first)
 ```bash
 warden route find "<describe your need>" --json
 warden route show <catalog-id> --json
 ```
 Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=activity-core` is for coordination, not secret vending |
 | **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 ### Quick routing table
 | I need… | Owner | ops-warden executes? |
 | --- | --- | --- |
 | SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
 | API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
 | Login / OIDC / MFA | key-cape / Keycloak | No — route only |
 | Authorization decision | flex-auth | No — route only |
 | activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
 | SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
 ### Anti-patterns (do not do these)
 - `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
 - Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
 - Pasting secrets into Git, State Hub, workplans, logs, or chat
 ### Other capabilities (reuse-surface)
 Non-credential capabilities are usually discovered through **reuse-surface** federation
 (`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
 every repo's agent instructions because it is high-frequency, high-risk, and easy to
 get wrong.
 **Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
--- a/.claude/rules/first-session.md
+++ b/.claude/rules/first-session.md
@@ -1,11 +1,11 @@
 ## First Session Protocol
-Triggered when `get_domain_summary("custodian")` shows **no workstreams**.
+Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
 The project is registered but work has not yet been structured.
 **Step 1 — Read, don't write**
-- `~/the-custodian/canon/projects/custodian/project_charter_v0.1.md` — purpose, scope
-- `~/the-custodian/canon/projects/custodian/roadmap_v0.1.md` — planned phases
+- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
+- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
 - Scan repo root: README, directory structure, existing code or docs
 **Step 2 — Survey in-progress work**
@@ -17,7 +17,7 @@ roadmap phase. **Wait for approval before creating.**
 **Step 4 — Create workplan file first, then DB record (ADR-001)**
 ```
-workplans/activity-core-WP-NNNN-<slug>.md   ← write this first
+workplans/ACTIVITY-WP-NNNN-<slug>.md   ← write this first
 ```
 Then register in the hub:
 ```
@@ -28,7 +28,7 @@ create_task(workstream_id="<id>", title="...", priority="high|medium|low")
 **Step 5 — Record the setup**
 ```
 add_progress_event(
-    summary="First session: structured custodian into N workstreams, M tasks",
+    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
--- a/.claude/rules/repo-identity.md
+++ b/.claude/rules/repo-identity.md
@@ -1,5 +1,5 @@
 **Purpose:** Durable task factory built on Temporal. Manages ActivityDefinitions, schedules recurring workflows via Temporal Schedules, routes events via NATS JetStream, and exposes a FastAPI CRUD surface for the custodian domain.
-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** activity-core
 **Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a
--- a/.claude/rules/session-protocol.md
+++ b/.claude/rules/session-protocol.md
@@ -1,6 +1,7 @@
 ## Session Protocol
-State Hub: http://127.0.0.1:8000
+Dev Hub (State Hub API): http://127.0.0.1:8000
 MCP server name in `~/.claude.json`: `dev-hub`
 **Step 1 — Orient**
@@ -10,7 +11,7 @@ cat .custodian-brief.md
 ```
 Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
 ```
-get_domain_summary("custodian")
+get_domain_summary("infotech")
 ```
 If MCP tools are unavailable in the current agent session, use the REST API:
 ```bash
@@ -39,11 +40,11 @@ curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
 ls workplans/
 ```
 For each file with `status: ready`, `active`, or `blocked`, note pending
-`todo`/`in_progress` tasks.
+`wait`/`todo`/`progress` tasks.
 **Step 4 — Present brief**
-1. **Active workstreams** for `custodian` — title, task counts, blocking decisions
+1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
 2. **Pending tasks** from `workplans/` + any `[repo:activity-core]` hub tasks
 3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
--- a/.claude/rules/workplan-convention.md
+++ b/.claude/rules/workplan-convention.md
@@ -1,7 +1,7 @@
 ## Workplan Convention (ADR-001)
-File location: `workplans/activity-core-WP-NNNN-<slug>.md`
+File location: `workplans/ACTIVITY-WP-NNNN-<slug>.md`
-ID prefix: `ACTIVITY-WP`
+ID prefix: `ACTIVITY-WP-`
 Work items originate as files in this repo **before** being registered in the hub.
@@ -12,7 +12,7 @@ repo state, and `finished` when implementation is complete. `stalled` and
 `needs_review` are derived health labels, not stored statuses.
 Closed workplans may be moved to `workplans/archived/` with a completion-date
-prefix: `YYMMDD-activity-core-WP-NNNN-<slug>.md`. The frontmatter id remains
+prefix: `YYMMDD-ACTIVITY-WP-NNNN-<slug>.md`. The frontmatter id remains
 unchanged; the prefix is only for quick visual reference.
 Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
@@ -25,4 +25,16 @@ Ecosystem todos from other agents arrive as `[repo:activity-core]` hub tasks —
 visible at session start. Pick one up by creating the workplan file, then registering
 the workstream.
 Task blocks use this shape:
 ```task
 id: ACTIVITY-WP-NNNN-T01
 status: wait | todo | progress | done | cancel
 priority: high | medium | low
 state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 ```
 Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
 blocked work and `cancel` for stopped work.
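
The status vocabulary in the task block above can be checked mechanically. The following is a minimal Python sketch, not the repo's implementation: the convention states only the `todo` → `progress` → `done` path and the roles of `wait` and `cancel`, so the `ALLOWED` table and the `can_transition` helper are hypothetical, and every edge involving `wait` or `cancel` is an assumption.

```python
# Status values taken from the task block shape in the convention.
STATUSES = {"wait", "todo", "progress", "done", "cancel"}

# Hypothetical transition table. Only todo -> progress -> done is stated
# by the convention; the wait/cancel edges are assumptions for illustration.
ALLOWED = {
    "todo": {"progress", "wait", "cancel"},
    "progress": {"done", "wait", "cancel"},
    "wait": {"todo", "progress", "cancel"},
    "done": set(),
    "cancel": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving a task from `current` to `new` is allowed."""
    for status in (current, new):
        if status not in STATUSES:
            # Catches legacy values such as `in_progress`.
            raise ValueError(f"unknown task status: {status!r}")
    return new in ALLOWED[current]
```

A check like this would reject the retired `in_progress` value that the session protocol no longer lists.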
 <!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -1,18 +1,56 @@
 <!-- custodian-brief: generated by fix-consistency — do not edit manually -->
 # Custodian Brief — activity-core
-**Domain:** custodian  
+**Domain:** infotech  
-**Last synced:** 2026-06-17 21:59 UTC  
+**Last synced:** 2026-06-29 23:50 UTC  
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 ## Active Workstreams
 ### Automation schedule inventory Make targets
 Progress: 0/5 done  |  workstream_id: `21c73763-9adc-42f6-8fd2-1b8b33c2c770`
 **Open tasks:**
 - · Task: Define the automation inventory contract  `8de24590`
 - · Task: Implement a non-mutating inventory CLI  `538cb9a5`
 - · Task: Add Make targets  `f2001721`
 - · Task: Document the inventory workflow  `f687743b`
 - · Task: Verify against current repo and live/degraded sources  `5317b532`
 ### LLM Output Robustness & The Producer Trust Boundary
 Progress: 3/10 done  |  workstream_id: `4ef0d53b-1777-41ae-80c6-1b69fdb34726`
 **Open tasks:**
 - ! Reproduce & Root-Cause The Failure  `74fd16a5`
  *(wait: Local analysis complete: mechanism is the unbounded ~1-per-workstream recommendation list (16 active workstreams; break at char 5268 ~rank 8-9); both first attempt and retry failed. Exact token + finish_reason are unrecoverable from activity-core (complete() drops finish_reason; report cap 4000 < 5268; log cap 2000). Remaining: pull llm-connect producer-side logs on railiance01 (cluster/operator-owned). Does NOT block T02/T03 — mitigation is identical regardless.)*
 - ► Tests + calibration re-entry  `b7b9e07a`
 - ► Schema + Prompt Redesign For Error Locality  `ae67ca8c`
 - ► Tests + Calibration Re-Entry  `c881500b`
 - · Reproduce & root-cause the 06-26 validation failure  `2d3bba00`
 - · Schema + prompt redesign for error locality  `5da6962c`
 - · Boundary parser — verify & mitigate with quarantine lane  `4c408114`
 ### Post-triage operational hardening
-Progress: 5/6 done  |  workstream_id: `5646e13a-13af-4724-bca6-3c0d86f96733`
+Progress: 7/8 done  |  workstream_id: `5646e13a-13af-4724-bca6-3c0d86f96733`
 **Open tasks:**
 - ! Three-Run Calibration Feedback  `7cbf0a35`
 ### Adopt State Hub Beachhead Endpoint
 Progress: 0/2 done  |  workstream_id: `bbc07f9e-9323-4b2b-b556-c33b37d0b228`
 **Open tasks:**
 - ! Point STATE_HUB_URL at the beachhead  `76b6132d`
 - ! Retire the bespoke actcore-state-hub-bridge proxy  `526c2129`
 ### Daily Triage LLM Reconciliation And Evidence
 Progress: 2/5 done  |  workstream_id: `f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9`
 **Open tasks:**
 - ! Run Daily Triage Fixture Smoke  `10e0df77`
 - ! Collect Three Clean Scheduled Runs  `dc6b9482`
 - ! Close Handoff State  `ecc57e21`
 ### Intent gap closure
 Progress: 4/6 done  |  workstream_id: `d64cfbba-6da7-4737-afb9-866afa0e9cda`
@@ -30,6 +68,6 @@ Progress: 2/3 done  |  workstream_id: `7387fc50-1f2c-471a-9d85-bb085cbd0b63`
 ## MCP Orientation (when available)
 If the state-hub MCP server is reachable, call:
-`get_domain_summary("custodian")`
+`get_domain_summary("infotech")`
 This provides richer cross-domain context.
 If the MCP call fails, use this file as your orientation source.
--- a/.env.example
+++ b/.env.example
@@ -18,14 +18,17 @@ STATE_HUB_URL=http://127.0.0.1:8000
 # Repo scoping — used by the repo-scoping context adapter. Binds {} on failure.
 REPO_SCOPING_URL=http://127.0.0.1:8020
 # Issue Core — task emission backend.
-ISSUE_CORE_URL=http://127.0.0.1:8010
+ISSUE_CORE_URL=http://127.0.0.1:8765
 # Shared ingestion key — must match issue-core's ISSUE_CORE_API_KEY.
 ISSUE_CORE_API_KEY=
 # Sink type: 'rest' (POST to issue-core) or 'null' (discard, for dry-run).
 ISSUE_SINK_TYPE=rest
 # ── Activity definitions ───────────────────────────────────────────────────────
 # Colon-separated paths to additional activity-definitions/ directories.
 # The local activity-definitions/ directory is always scanned.
-ACTIVITY_DEFINITION_DIRS=
+# Coulomb-loop kaizen engagement definitions (colon-separated for more roots).
+ACTIVITY_DEFINITION_DIRS=/home/worsch/coulomb-loop
 # ── Observability ─────────────────────────────────────────────────────────────
 # Prometheus metrics bind address (Temporal SDK metrics).
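
The comments in `.env.example` describe two parsing rules: `ACTIVITY_DEFINITION_DIRS` is colon-separated and the local `activity-definitions/` directory is always scanned, and `ISSUE_SINK_TYPE` is either `rest` or `null`. A minimal Python sketch of those rules follows. The helper names `definition_dirs` and `sink_type` are hypothetical, and defaulting an unset sink type to `rest` is an assumption, not something the file states.

```python
from pathlib import Path


def definition_dirs(env: dict[str, str], local: str = "activity-definitions") -> list[Path]:
    """Return the local directory first, then each non-empty colon-separated entry."""
    raw = env.get("ACTIVITY_DEFINITION_DIRS", "")
    extra = [Path(part) for part in raw.split(":") if part.strip()]
    return [Path(local), *extra]


def sink_type(env: dict[str, str]) -> str:
    """Validate ISSUE_SINK_TYPE; the 'rest' default is an assumption."""
    value = env.get("ISSUE_SINK_TYPE", "rest")
    if value not in {"rest", "null"}:
        raise ValueError(f"ISSUE_SINK_TYPE must be 'rest' or 'null', got {value!r}")
    return value
```

Passing `os.environ` as `env` would apply the same rules to the live process environment.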
--- a/.kaizen/agents/coach/memory.md
+++ b/.kaizen/agents/coach/memory.md
@@ -0,0 +1,24 @@
 ---
 agent: coach
 project: activity-core
 last_updated: 2026-06-18
 session_count: 0
 ---
 ## Project Context
 <!-- What this agent knows about the project it works in -->
 ## Accumulated Findings
 <!-- Patterns, recurring issues, key decisions encountered -->
 ## What Worked
 <!-- Approaches that produced good results in this project -->
 ## Watch Points
 <!-- Recurring risks, traps, or areas requiring extra care -->
 ## Open Threads
 <!-- Things noticed but not yet acted on -->
 ## Session Log
 <!-- One-line entry per session: date · summary · outcome -->
--- a/.kaizen/agents/optimization/memory.md
+++ b/.kaizen/agents/optimization/memory.md
@@ -0,0 +1,24 @@
 ---
 agent: optimization
 project: activity-core
 last_updated: 2026-06-18
 session_count: 0
 ---
 ## Project Context
 <!-- What this agent knows about the project it works in -->
 ## Accumulated Findings
 <!-- Patterns, recurring issues, key decisions encountered -->
 ## What Worked
 <!-- Approaches that produced good results in this project -->
 ## Watch Points
 <!-- Recurring risks, traps, or areas requiring extra care -->
 ## Open Threads
 <!-- Things noticed but not yet acted on -->
 ## Session Log
 <!-- One-line entry per session: date · summary · outcome -->
--- a/.kaizen/metrics/coach/executions.jsonl
+++ b/.kaizen/metrics/coach/executions.jsonl
@@ -0,0 +1,2 @@
 {"agent": "coach", "execution_time_s": 120.0, "quality_score": 0.85, "success": true, "timestamp": "2026-06-18T06:10:35Z"}
 {"agent": "coach", "execution_time_s": 118.0, "quality_score": 0.86, "success": true, "timestamp": "2026-06-18T10:06:38Z"}
--- a/.kaizen/metrics/coach/summary.json
+++ b/.kaizen/metrics/coach/summary.json
@@ -0,0 +1,12 @@
 {
  "agent": "coach",
  "avg_execution_time_s": 119.0,
  "avg_quality_score": 0.855,
  "execution_count": 2,
  "last_execution": "2026-06-18T10:06:38Z",
  "success_rate": 1.0,
  "trend": {
    "quality_score": "stable",
    "success_rate": "stable"
  }
 }
--- a/.kaizen/metrics/optimization/executions.jsonl
+++ b/.kaizen/metrics/optimization/executions.jsonl
@@ -0,0 +1,2 @@
 {"agent": "optimization", "execution_time_s": 90.0, "quality_score": 0.8, "success": true, "timestamp": "2026-06-18T06:10:35Z"}
 {"agent": "optimization", "execution_time_s": 88.0, "quality_score": 0.81, "success": true, "timestamp": "2026-06-18T10:06:38Z"}
--- a/.kaizen/metrics/optimization/summary.json
+++ b/.kaizen/metrics/optimization/summary.json
@@ -0,0 +1,12 @@
 {
  "agent": "optimization",
  "avg_execution_time_s": 89.0,
  "avg_quality_score": 0.805,
  "execution_count": 2,
  "last_execution": "2026-06-18T10:06:38Z",
  "success_rate": 1.0,
  "trend": {
    "quality_score": "stable",
    "success_rate": "stable"
  }
 }
--- a/.kaizen/metrics/optimizer/analysis.json
+++ b/.kaizen/metrics/optimizer/analysis.json
@@ -0,0 +1,59 @@
 {
  "agents": [
    {
      "agent_name": "coach",
      "meets_sample_threshold": false,
      "metrics_count": 2,
      "optimization_cycles": 0,
      "performance_analysis": {
        "analysis_timestamp": "2026-06-18T12:06:39.212809",
        "avg_execution_time": 119.0,
        "avg_quality_score": 0.855,
        "avg_success_rate": 1.0,
        "execution_time_trend": -0.01680672268907563,
        "quality_score_trend": 0.01169590643274855,
        "success_rate_trend": 0.0,
        "window_size": 2
      },
      "recommendations": [
        {
          "details": "Average execution time: 119.00s",
          "message": "Consider optimizing execution time",
          "priority": "high",
          "type": "performance"
        }
      ],
      "report_timestamp": "2026-06-18T12:06:39.213012",
      "sample_threshold": 10
    },
    {
      "agent_name": "optimization",
      "meets_sample_threshold": false,
      "metrics_count": 2,
      "optimization_cycles": 0,
      "performance_analysis": {
        "analysis_timestamp": "2026-06-18T12:06:39.220252",
        "avg_execution_time": 89.0,
        "avg_quality_score": 0.805,
        "avg_success_rate": 1.0,
        "execution_time_trend": -0.02247191011235955,
        "quality_score_trend": 0.012422360248447215,
        "success_rate_trend": 0.0,
        "window_size": 2
      },
      "recommendations": [
        {
          "details": "Average execution time: 89.00s",
          "message": "Consider optimizing execution time",
          "priority": "high",
          "type": "performance"
        }
      ],
      "report_timestamp": "2026-06-18T12:06:39.220417",
      "sample_threshold": 10
    }
  ],
  "min_samples": 10,
  "optimized_at": "2026-06-18",
  "project": "activity-core"
 }
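
The kaizen metrics files above are arithmetically related: each `summary.json` averages the rows of the matching `executions.jsonl`, and the trend values in `analysis.json` agree with (last value - first value) / mean. The Python sketch below reproduces that arithmetic. The trend formula is inferred from the numbers shown and is not confirmed by the source, and the function names `summarize` and `relative_trend` are hypothetical.

```python
import json
from statistics import mean


def summarize(lines: list[str]) -> dict:
    """Aggregate JSONL execution rows into summary-style averages."""
    rows = [json.loads(line) for line in lines if line.strip()]
    return {
        "execution_count": len(rows),
        "avg_execution_time_s": mean(r["execution_time_s"] for r in rows),
        "avg_quality_score": round(mean(r["quality_score"] for r in rows), 3),
        "success_rate": sum(1 for r in rows if r["success"]) / len(rows),
    }


def relative_trend(values: list[float]) -> float:
    """Assumed formula: (last - first) / mean, or 0.0 when undefined."""
    if len(values) < 2:
        return 0.0
    avg = mean(values)
    return 0.0 if avg == 0 else (values[-1] - values[0]) / avg
```

With the two coach rows (120.0 s and 118.0 s) this gives an average of 119.0 s and an execution-time trend of about -0.0168, matching the committed files. With only two samples against a `sample_threshold` of 10, the trend is not yet meaningful.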
--- a/.kaizen/schedule.yml
+++ b/.kaizen/schedule.yml
@@ -0,0 +1,15 @@
 # Kaizen scheduled agent execution manifest (ADR-005)
 # Engagement: coulomb-loop bootstrap — weekly cadence
 # Regulator promotes cadence per customer engagement policy (ADR-003).
 # Validate with: kaizen-agentic schedule validate
 version: '1'
 timezone: Europe/Berlin
 agents:
  coach:
    cadence: weekly
    cron: 0 9 * * 1
    enabled: true
  optimization:
    cadence: weekly
    cron: 0 10 * * 1
    enabled: true
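
The two cron expressions in this manifest, `0 9 * * 1` and `0 10 * * 1`, fire on Mondays at 09:00 and 10:00 in the manifest timezone. The Python sketch below computes the next firing time for that weekly pattern only. It is not a general cron parser, the name `next_weekly_run` is hypothetical, and it assumes naive datetimes that already represent wall-clock time in `Europe/Berlin`.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, hour: int, weekday: int = 0) -> datetime:
    """Next wall-clock occurrence of `weekday` at hour:00, strictly after `now`.

    `weekday` uses Python's convention (Monday == 0); cron's day-of-week
    field writes Monday as 1.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Already past this week's slot, so move to next week.
        candidate += timedelta(days=7)
    return candidate
```

For example, `next_weekly_run(datetime(2026, 6, 30, 1, 50), 9)` starts from a Tuesday and lands on Monday 2026-07-06 at 09:00.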
--- a/.repo-classification.yaml
+++ b/.repo-classification.yaml
@@ -0,0 +1,28 @@
 # Repo classification (Repo Classification Standard v1.0).
 repo_classification:
  standard: Repo Classification Standard
  version: '1.0'
  classified_at: '2026-06-22'
  classified_by: human
  category: tooling
  domain: infotech
  secondary_domains:
  - agents
  capability_tags:
  - workflow
  - orchestration
  - automation
  - coordination
  - observability
  business_stake:
  - technology
  - operations
  - automation
  - execution
  business_mechanics:
  - coordination
  - operation
  - adaptation
  notes: Org-wide event bridge / task factory (Temporal-based). Active bounded implementation
    -> project.
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,7 +4,7 @@
 **Purpose:** Durable task factory built on Temporal. Manages ActivityDefinitions, schedules recurring workflows via Temporal Schedules, routes events via NATS JetStream, and exposes a FastAPI CRUD surface for the custodian domain.
-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** activity-core
 **Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
 **Workplan prefix:** `ACTIVITY-WP-`
@@ -83,7 +83,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
 2. Check inbox: `GET /messages/?to_agent=activity-core&unread_only=true`; mark read
 3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
-4. Check blocked tasks: `GET /tasks/?needs_human=true`
+4. Check human-needed tasks: `GET /tasks/?needs_human=true`
 **During work:**
 - Update task statuses in workplan files as tasks progress
@@ -101,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 ---
 ## Credential and access routing
 **Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
 for inference. Run this check **before** requesting secrets, API keys, SSH access,
 login tokens, or database passwords — in any repo, not only `ops-warden`.
 ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
 other credential need belongs to another subsystem. **Do not** message
 `ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
 ### Lookup (do this first)
 ```bash
 warden route find "<describe your need>" --json
 warden route show <catalog-id> --json
 ```
 Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=activity-core` is for coordination, not secret vending |
 | **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 ### Quick routing table
 | I need… | Owner | ops-warden executes? |
 | --- | --- | --- |
 | SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
 | API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
 | Login / OIDC / MFA | key-cape / Keycloak | No — route only |
 | Authorization decision | flex-auth | No — route only |
 | activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
 | SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
 ### Anti-patterns (do not do these)
 - `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
 - Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
 - Pasting secrets into Git, State Hub, workplans, logs, or chat
 ### Other capabilities (reuse-surface)
 Non-credential capabilities are usually discovered through **reuse-surface** federation
 (`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
 every repo's agent instructions because it is high-frequency, high-risk, and easy to
 get wrong.
 **Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
 <!-- REPO-AGENTS-EXTENSIONS -->
 <!-- Append repo-specific agent instructions below this marker.
     The state-hub template sync preserves content after this line. -->
 ---
 ## Workplan Convention (ADR-001)
 Work items originate as files in this repo — not in the hub. The hub is a
@@ -124,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
 id: ACTIVITY-WP-NNNN
 type: workplan
 title: "..."
-domain: custodian
+domain: infotech
 repo: activity-core
 status: proposed | ready | active | blocked | backlog | finished | archived
 owner: codex
@@ -154,10 +211,7 @@ state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 Task description text.
 ```
-Status progression: `todo` → `progress` → `done`; use `wait` for a task
+Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.
-blocked on external input and `cancel` for intentionally abandoned work.
-Workstream/workplan lifecycle status is separate; frontmatter `blocked` remains
-valid there.
 To create a new workplan:
 1. Write the file following the format above
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,4 +8,5 @@
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md
--- a/Makefile
+++ b/Makefile
@@ -1,13 +1,16 @@
 -include .env
 export
-.PHONY: sync-event-types sync-activity-definitions test migrate sync-all \
+.PHONY: sync-event-types sync-activity-definitions sync-schedules test migrate sync-all \
        dev-up dev-down railiance-up railiance-down \
        start-worker start-api start-event-router help
 sync-activity-definitions:  ## Sync ActivityDefinition files into DB
 	uv run python -m activity_core.sync_activity_definitions
 sync-schedules:  ## Reconcile Temporal schedules from activity_definitions DB
 	uv run python -m activity_core.sync_schedules
 sync-event-types:  ## Sync event type YAML files into DB
 	uv run python scripts/sync_event_types.py
@@ -52,3 +55,17 @@ help:  ## Show this help message
 	@grep -Eh '^[a-zA-Z_-]+:.*?##' $(MAKEFILE_LIST) | \
 		awk 'BEGIN {FS = ":.*?## "}; {printf "  \033[36m%-24s\033[0m %s\n", $$1, $$2}' | \
 		sort
 # Agent Management Targets
 agents-list:
 	@echo "Installed agents:"
 	@ls agents/ 2>/dev/null | grep agent- | sed 's/agent-//g' | sed 's/.md//g' \
 	|| echo "No agents installed"
 agents-update:
 	@echo "Updating agents..."
 	@kaizen-agentic update
 agents-validate:
 	@echo "Validating agents..."
 	@kaizen-agentic validate agents/
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -1,7 +1,7 @@
 ---
 domain: capabilities
 repo: activity-core
-updated: "2026-06-03"
+updated: "2026-06-16"
 ---
 # SCOPE
@@ -16,7 +16,8 @@ updated: "2026-06-03"
 activity-core is the org-wide Event Bridge for the Coulomb organization — a
 rule-governed event loop that receives time-based and domain events, evaluates
 declarative rules and LLM instructions against current org context, and emits
-structured task sets to issue-core.
+structured task, report, and evidence outputs without owning downstream task
 lifecycle.
 ---
@@ -27,8 +28,11 @@ An `ActivityDefinition` (a markdown file checked into a repo) declares a trigger
 resolve before evaluation, and a set of rules and instructions that determine
 what tasks to create. When triggered, a durable Temporal workflow loads the
 definition, resolves context, evaluates the rule/instruction set, and emits task
-creation requests to issue-core. Everything is auditable: the spawn log records
-the triggering event, matched rule, and resulting task references.
+creation requests to issue-core or configured dry-run/audit sinks. Instructions
+may also emit validated reports, and selected context resolvers may emit compact
 non-secret evidence. Everything is auditable: the spawn log records the
 triggering event, matched rule/instruction metadata, model/prompt hash where
 applicable, and resulting task references.
 The two evaluation modes:
 - **Rule** — deterministic condition (sandboxed Python-like DSL) → fixed task
@@ -48,21 +52,35 @@ The two evaluation modes:
  attribute schemas, example payloads, and intent documentation.
  Curator-gating configurable per runtime environment.
 - **Trigger types**: 5-field cron with timezone and misfire policy; one-off
-  scheduled datetime; event-type subscription via NATS.
+  scheduled datetime; event-type subscription via NATS; manual one-shot API
  trigger; one-shot schedule smoke tests for recurring definitions.
 - **Context resolution adapters**: repo-scoping (repository capability queries),
-  state hub (domain and workstream state), extensible for other sources.
+  State Hub (domain/workstream state, SBOM status, daily triage digest, coding
  retro read model), and ops inventory (bounded HTTP/HTTPS probes of a
  non-secret service inventory). The adapter registry is extensible for other
  sources.
 - **Rule evaluator**: sandboxed AST walker for Python-like boolean expressions
  over event attributes and resolved context. Rule actions support safe
  `context.*` / `event.*` interpolation and explicit `for_each` per-item
  binding. No `exec()`.
 - **Instruction executor**: trusted-field prompt rendering, LLM call via
-  llm-connect, structured output validation, optional curator review queue,
-  and deterministic report sinks.
+  llm-connect, structured output validation, item-granular recovery with a
+  quarantine lane and producer guardrails (count/length/depth caps, reference
  allow-list) at the producer trust boundary, bounded validation-failure
  artifacts for report instructions, review-required audit metadata, and
  deterministic report sinks. A real downstream review queue is not implemented
  in this repo.
 - **Task emission adapter**: abstraction over issue-core; current transport is
-  REST; designed to migrate to NATS subscription without code changes.
+  REST, with `ISSUE_SINK_TYPE=null` for dry-run/audit mode. It is designed to
  migrate to a durable issue-core-owned NATS command boundary when issue-core
  provides that contract.
 - **Report sinks**: instruction report outputs can be persisted to bounded
  local working memory and posted as State Hub progress events. These are
  reporting outputs, not task lifecycle ownership.
 - **Ops evidence sinks**: `ops-inventory` context sources can post compact
  non-secret `ops_inventory_probe` summaries to State Hub. Inter-Hub submission
  is present only as a gated/deferred sink result until operator-owned
  `OPS_HUB_KEY` custody and widget mapping are ready.
 - **Spawn audit log**: every task emission recorded with rule/instruction id,
  triggering event id, model and prompt hash (instructions), issue-core task ref.
 - **Webhook receiver**: HTTP endpoint normalising inbound Gitea/GitHub webhook
@@ -84,6 +102,14 @@ The two evaluation modes:
  coordinated changes belong to project-core (future).
 - **Execution of automatable tasks** — Temporal Activities that do real work
  (run a scan, apply a patch, call an API) live in per-repo workers, not here.
 - **General ops execution** — Kubernetes, SSH, tunnel, authenticated service
  checks, secret custody, OpenBao writes, and Inter-Hub widget/API-key
  provisioning belong to the owning operational repos and operator workflows.
  activity-core may record non-secret probe evidence; it must not become the ops
  control plane.
 - **Service inventory authority** — the Custodian inventory remains owned by
  the custodian/state-hub surface. activity-core may read a projected
  non-secret snapshot.
 - **Event broker hosting** — NATS JetStream is org infrastructure; activity-core
  consumes it but does not own its lifecycle.
 - **Temporal server hosting** — activity-core uses the Temporal SDK; the server
@@ -101,6 +127,9 @@ The two evaluation modes:
  structured tasks in the right repos."
 - You need one-off future task scheduling without a separate reminder system.
 - You want an auditable record of what triggered what and why.
 - You need a scheduled, non-secret evidence note proving that declared service
  endpoints or access paths were observed, without executing privileged ops
  commands.
 - You are replacing scattered bespoke cron jobs and manual coordination with
  a governed, observable automation layer.
@@ -117,29 +146,45 @@ The two evaluation modes:
 ## Current State
-- **Status**: active production-backed service. Foundation, triggers/ops,
-  event bridge, Railiance deployment, and the production service workplans are
-  complete. The stale March WP-0002 handoff note has been reconciled and
-  archived.
+- **Status**: active production-backed service with two visible open gates:
+  `ACTIVITY-WP-0006` still waits on three clean consecutive scheduled daily
+  triage runs and calibration feedback, and `ACTIVITY-WP-0008` is blocked until
+  Helix Forge publishes the upstream `coding_retro` read model needed to enable
  the Saturday schedule. `ACTIVITY-WP-0007` is finished: the bounded
  ops-inventory probe/evidence slice has live Railiance evidence.
 - **Implementation**: core is functional. `RunActivityWorkflow`,
-  `TaskExecutorWorkflow` (stub), PostgreSQL schema, Temporal Schedules, NATS
-  Event Router, FastAPI admin API, Prometheus metrics, event type registry,
-  markdown ActivityDefinition parser/sync, rule evaluator, instruction
-  executor, context resolvers, issue sink, report sinks, Kubernetes deployment,
-  and operational runbook are all implemented.
-- **Operational proof**: the daily State Hub WSJF triage cutover has completed
-  far enough that activity-core is now the trusted scheduled substrate for the
-  routine report. Recent hardening fixed the State Hub SBOM resolver contract,
-  made slow LLM activity timeouts configurable, and added safe rule action
-  interpolation plus explicit `for_each` binding for per-repo SBOM staleness
-  tasks.
-- **Stability**: construction risk has shifted to operational hardening risk.
-  The full test suite passed on 2026-06-03 (`125 passed, 1 skipped`). The
-  remaining work is mostly observability, status-canon adaptation, contract
-  documentation, and broader production adoption rather than first
-  implementation.
-- **Next**: `ACTIVITY-WP-0006` — post-triage operational hardening and scope
-  alignment.
+  `TaskExecutorWorkflow` (stub), PostgreSQL schema, Temporal Schedules and smoke
+  schedules, NATS Event Router, FastAPI admin API, Prometheus metrics, event
+  type registry, markdown ActivityDefinition parser/sync, rule evaluator,
+  instruction executor, context resolvers, issue sink, report sinks, ops
+  evidence sink, Kubernetes deployment, and operational runbook are all
+  implemented.
+- **Current definitions**: `weekly-sbom-staleness` is enabled and demonstrates
+  the deterministic rule/fan-out path. `weekly-coding-retro` is present and
+  tested but intentionally disabled until live `coding_retro` evidence exists.
+  Railiance projects the daily State Hub WSJF triage definition and the disabled
+  ops-service-inventory probe definition from the runtime bundle.
+- **Operational proof**: the State Hub daily WSJF triage path has produced
+  validated reports and working-memory notes, but the calibration gate is not
+  closed. A 2026-06-16 recheck found State Hub `daily_triage` progress and
+  working-memory `daily-triage-*` notes only through 2026-06-06, so there is not
+  yet evidence for three clean consecutive scheduled runs after the June 7
+  runtime projection failure. The ops inventory probe path has live fallback
+  evidence in State Hub; Inter-Hub per-entity submission remains deferred.
 - **Task emission posture**: the issue-core REST sink is implemented, but the
  Railiance runtime currently uses `ISSUE_SINK_TYPE=null` dry-run/audit mode.
  Switching to live issue-core task creation requires a verified endpoint,
  credentials, and duplicate-handling check in the target environment.
 - **Stability**: construction risk has shifted to operational hardening and
  adoption risk. The last recorded full-suite pass in the workplans was
  2026-06-04 (`128 passed, 1 skipped`), with later targeted coverage added for
  ops inventory, ops evidence sinks, Railiance projection wiring, and weekly
  coding retro parsing/rule behavior.
 - **Next**: close `ACTIVITY-WP-0006-T03` with real scheduled-run calibration
  evidence; close `ACTIVITY-WP-0008-T03` once upstream `coding_retro` publication
  exists and the dry-run/duplicate check passes; decide when to move selected
  task/report/evidence sinks from dry-run or fallback mode to their intended
  live backends.
 ---
@@ -159,9 +204,9 @@ database, the project planner, or a general execution worker. The local
 workplan explicitly rehomes execution responsibility.
 One boundary nuance is now explicit: activity-core may post State Hub progress
-events as a configured report sink. That is acceptable because it records the
+events as a configured report or evidence sink. That is acceptable because it
-result of an activity-core activation; it is not ownership of State Hub state,
+records the result of an activity-core activation; it is not ownership of State
-task lifecycle, or workstream planning.
+Hub state, task lifecycle, or workstream planning.
 The main drift risk is convenience creep: adding direct task tracking,
 project-phase state, or bespoke operational scripts because the Temporal
@@ -169,27 +214,58 @@ substrate is already nearby. Future work should prefer declarative
 ActivityDefinitions, bounded context resolvers, and outbound adapters over
 new one-off control paths.
 ## Known Gaps Against Intent
 - **Scheduled-run trust gap**: INTENT promises recurring coordination work that
  runs without Bernd as the manual coordination layer. The daily triage path is
  implemented, but its current calibration task still lacks three clean
  consecutive scheduled runs after the June 7 runtime failure. Until that closes,
  daily triage remains a production-backed capability with an evidence gap, not
  a fully proven standing substrate.
 - **Task creation gap**: INTENT says activations emit task creation requests to
  issue-core. The REST sink exists, but Railiance is still in `ISSUE_SINK_TYPE=null`
  mode. That preserves auditability and avoids accidental duplicate/live tasks,
  but it means production schedules are not yet consistently creating real
  issue-core tasks.
 - **Review queue gap**: `review_required` is explicitly metadata only in the
  current contract. No issue-core review queue integration exists here, so any
  future queue routing needs a downstream issue-core contract before high-impact
  instruction outputs rely on it.
 - **Evidence backend posture**: the State Hub fallback evidence path is the
  accepted current backend for `ops_inventory_probe`. Inter-Hub/ops-hub
  submission is deliberately deferred behind `OPS_HUB_KEY`, widget mapping, and
  operator approval, so per-entity ops evidence publication is future work.
 - **Execution-boundary residue**: `TaskExecutorWorkflow` is still registered as
  a stub that writes a done `task_instances` row. It should remain inert or be
  removed/re-homed before it attracts real execution work, because execution is
  explicitly outside activity-core's intent.
 - **API exposure posture**: the FastAPI surface stays ClusterIP-only for now.
  External ingress remains future work until an authenticated access policy is
  designed.
 ---
 ## How It Fits
 ```
-[NATS JetStream]  ←  publishers: state hub, Gitea webhooks, Temporal signals, cron
+[NATS JetStream]  ←  publishers: State Hub, Gitea webhooks, Temporal signals, cron
       ↓
 [activity-core]   ←  event type registry, rule evaluator, instruction executor
 [activity-core]   →  [issue-core]  →  [repos/services]
-[activity-core]   →  [report sinks]
+[activity-core]   →  [report/evidence sinks]  →  [State Hub / working memory / future Inter-Hub]
 ```
 - **Upstream**: NATS (event bus), Temporal (durable workflow engine), PostgreSQL
-  (definitions and audit log), repo-scoping (context adapter), state hub (context
+  (definitions and audit log), repo-scoping (context adapter), State Hub (context
  adapter and event publisher).
- **Downstream**: issue-core (task management) and configured report sinks.
+- **Downstream**: issue-core (task management) and configured report/evidence sinks.
  Agents and humans pick up tasks from issue-core and do the actual work.
  Railiance may use the null sink for dry-run/audit mode until live issue-core
  emission is approved.
 - **Coordinates with**: the State Hub delegates maintenance automations to
  activity-core by publishing lifecycle events or by being resolved as context.
-  activity-core may post progress events as report outputs, but it does not own
+  activity-core may post progress events as report/evidence outputs, but it
-  State Hub task/workstream state.
+  does not own State Hub task/workstream state.
 ---
@@ -203,6 +279,11 @@ new one-off control paths.
  by a sandboxed AST walker.
 - **Instruction** — LLM-evaluated task generation with trusted-field prompt
  interpolation and structured output schema enforcement.
 - **Report sink** — configured persistence for instruction reports, currently
  working-memory markdown notes and State Hub progress events.
 - **Evidence sink** — configured persistence for compact non-secret resolver
  evidence, currently State Hub progress for ops inventory probes; Inter-Hub is
  a deferred gated target.
 - **Event type** — a registered, schema-documented category of event (e.g.
  `org.repo.registered`). Publisher-declared; curator-gated per environment.
 - **Spawn audit trail** — activity-core's local record of what tasks were emitted,
@@ -219,8 +300,12 @@ new one-off control paths.
 - `issue-core` (formerly issue-facade) — downstream task management; receives
  all task emission from activity-core.
 - `repo-scoping` — context adapter for repository capability queries.
- `the-custodian` / state hub — context adapter for domain state; delegates
+- `the-custodian` / State Hub — context adapter for domain state; delegates
  maintenance automation to activity-core via NATS events.
 - `llm-connect` — instruction execution backend for judgement-oriented reports
  such as daily State Hub WSJF triage.
 - `inter-hub` / `ops-hub` — future richer ops evidence intake target; currently
  operator-gated and not required for the State Hub fallback evidence path.
 - `rules-core` (future extraction) — the rule evaluator and instruction executor
  module, currently in `src/activity_core/rules/`.
 - `project-core` (future) — project and initiative management; will use
@@ -237,6 +322,9 @@ new one-off control paths.
  governance model, event type schema, ActivityDefinition structure.
 - `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
  model, evaluation semantics, audit trail, testing strategy.
 - `docs/adr/adr-004-producer-trust-boundary.md` — untrusted-producer premise,
  trust-but-handle vs verify-and-mitigate postures, error-locality and
  quarantine-with-provenance, producer guardrails for LLM/agent/human output.
 ---
@@ -248,7 +336,10 @@ new one-off control paths.
  `src/activity_core/activities.py` (Temporal activities),
  `src/activity_core/event_router.py` (NATS → Temporal),
  `src/activity_core/schedule_manager.py` (Temporal Schedules),
-  `src/activity_core/api.py` (FastAPI admin).
+  `src/activity_core/api.py` (FastAPI admin),
  `src/activity_core/report_sinks.py` (instruction reports),
  `src/activity_core/ops_evidence_sinks.py` (ops evidence),
  and `src/activity_core/context_resolvers/` (external context adapters).
 - Definition files: `event-types/`, `activity-definitions/`, and `tasks/`.
 - Dev environment: `docker-compose.dev.yml` (Temporal + PostgreSQL + NATS).
 - Entry points: `uv run python -m activity_core.worker` (Temporal worker),
@@ -264,6 +355,7 @@ title: Durable event-triggered task factory
 description: >
  Org-wide Event Bridge that receives time-based and domain events, evaluates
  declarative rules and LLM instructions against current org context, and emits
-  structured task sets to issue-core with a full spawn audit trail.
+  structured task, report, and evidence outputs with a full spawn/report audit
-keywords: [temporal, workflow, event-bridge, task, cron, event, rule, instruction, org-automation]
+  trail while leaving task lifecycle ownership downstream.
 keywords: [temporal, workflow, event-bridge, task, report, evidence, cron, event, rule, instruction, org-automation]
 ```
--- a/agents/agent-coach.md
+++ b/agents/agent-coach.md
@@ -0,0 +1,184 @@
 ---
 name: coach
 description: Coaching meta-agent that reads all agent memories in a project and synthesises cross-agent briefs and new-agent orientations
 category: meta
 memory: enabled
 ---
 # Coach Agent
 ## Role
 You are the **kaizen-agentic Coach** — a meta-agent that observes, synthesises,
 and advises. You do not perform domain work (coding, testing, infrastructure).
 Your sole purpose is to read across the accumulated memories of all agents in a
 project and produce useful, targeted briefs.
 You are invoked via:
 ```
 kaizen-agentic memory brief <agent-name>
 ```
 Or directly by the operator: *"Coach, brief the sys-medic agent on this project"*
 or *"Coach, what patterns have you observed across all agents?"*
 ---
 ## What You Do
 ### 1. Cross-Agent Synthesis
 Read all `.kaizen/agents/*/memory.md` files in the current project. Identify:
 - **Shared patterns**: themes that appear across multiple agents
  (e.g. "three agents flagged missing test coverage as a risk")
 - **Cross-domain risks**: signals in one agent's memory that should inform
  another (e.g. infrastructure instability flagged by sys-medic → tdd-workflow
  should account for flaky environments)
 - **Resource or architectural signals**: recurring mentions of specific files,
  modules, services, or systems across agents
 - **Contradictions or gaps**: where agents hold conflicting assumptions or where
  no agent has coverage
 ### 2. New-Agent Orientation
 When asked to brief a specific agent about to be deployed for the first time:
 1. Read all existing agent memories in the project
 2. Filter for what is relevant to the incoming agent's domain
 3. Produce a targeted orientation brief covering:
   - **Project context**: what kind of project this is, key constraints
   - **What to know first**: the most important facts for this agent
   - **Watch points**: risks or pitfalls flagged by other agents that are relevant
   - **What has worked**: successful approaches in adjacent domains
   - **Open threads**: unresolved items from other agents that may interact with
     this agent's work
 ### 3. Fleet Health Overview
 When asked for a fleet overview:
 - Summarise the health of the agent fleet: which agents are active, stale, or
  missing from the project
 - Flag agents with high `session_count` and still-open `## Open Threads`
 - Identify agents whose memories suggest overlapping concerns
 - Recommend whether any memory files should be reviewed or reset
 ---
 ## How to Read Agent Memory Files
 Memory files live at `.kaizen/agents/<name>/memory.md` relative to the project
 root. Each follows ADR-002 structure:
 ```
 ## Project Context      ← agent's understanding of the project
 ## Accumulated Findings ← patterns and recurring issues
 ## What Worked         ← validated approaches
 ## Watch Points        ← risks and traps
 ## Open Threads        ← unresolved items
 ## Session Log         ← chronological session summaries
 ```
 When synthesising, weight `## Watch Points` and `## Open Threads` most heavily —
 these are the signals most likely to be actionable for another agent.
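
The section layout above lends itself to a simple line-based reader. The following is a minimal sketch, assuming the six `##` headings appear exactly as listed; `split_sections` and `load_fleet` are hypothetical helper names for illustration, not part of the `kaizen-agentic` CLI.

```python
import re
from pathlib import Path

# Section headings from the memory-file layout described above.
SECTIONS = (
    "Project Context", "Accumulated Findings", "What Worked",
    "Watch Points", "Open Threads", "Session Log",
)


def split_sections(text: str) -> dict[str, str]:
    """Map each known '## <heading>' to the text beneath it."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match and match.group(1) in SECTIONS:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def load_fleet(project_root: Path) -> dict[str, dict[str, str]]:
    """Read every .kaizen/agents/<name>/memory.md under project_root (read-only)."""
    fleet = {}
    for path in sorted(project_root.glob(".kaizen/agents/*/memory.md")):
        fleet[path.parent.name] = split_sections(path.read_text(encoding="utf-8"))
    return fleet
```

A synthesis pass could then compare `fleet[name]["Watch Points"]` and `fleet[name]["Open Threads"]` across agents before drafting a brief.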
 ### Project metrics (ADR-004)
 Quantitative performance data lives at `.kaizen/metrics/<agent>/summary.json`.
 `kaizen-agentic memory brief <agent>` includes a `## Performance Summary` block
 when metrics exist.
 When synthesising orientations:
 - Combine qualitative memory with quantitative trends (success rate, quality,
  execution time, trend arrows)
 - Flag agents with declining success rate or quality trends
 - Cross-reference metrics with `## Watch Points` — do metrics confirm or
  contradict qualitative findings?
 - Note when an agent has memory but no metrics (incomplete session-close protocol)
 Fleet optimizer output at `.kaizen/metrics/optimizer/analysis.json` provides
 project-wide analysis from `kaizen-agentic metrics optimize`.
 ---
 ## Output Format
 ### Cross-agent brief
 ```
 ## Cross-Agent Brief — <project name>
 Generated: <date>
 Agents with memory: <list>
 ### Shared Patterns
 <bullet list of themes appearing across ≥2 agents>
 ### Cross-Domain Risks
 <risks from one domain relevant to others>
 ### Open Threads (fleet-wide)
 <unresolved items that span or affect multiple agents>
 ### Fleet Health
 <which agents are active/stale, any concerning signals>
 ```
 ### New-agent orientation
 ```
 ## Orientation Brief for: <agent-name>
 Project: <project name>
 Generated: <date>
 Sources: <which agent memories were read>
 ### Performance Summary
 <from .kaizen/metrics/<agent>/ when available — success rate, quality, trends>
 ### What to Know First
 <3–5 most important facts for this agent>
 ### Watch Points
 <risks relevant to this agent's domain>
 ### What Has Worked
 <approaches validated by other agents that apply here>
 ### Open Threads You May Encounter
 <items from other agents that may intersect with your work>
 ```
 ---
 ## Behaviour Boundaries
 - **Do not** modify agent memory files
 - **Do not** perform any domain-specific work (coding, testing, diagnosis)
 - **Do not** make decisions — synthesise and advise only
 - **If no memories exist**: say so clearly and offer to help initialise them
 - **If asked about a specific agent not present**: note the gap
 ---
 ## Coach's Own Memory
 The coach maintains `.kaizen/agents/coach/memory.md` covering:
 - Fleet-level patterns observed over time
 - How the agent population in this project has evolved
 - Meta-observations about how well the memory convention is being followed
 - Recurring gaps or blind spots in the agent fleet
 ### Session Start
 1. Check for `.kaizen/agents/coach/memory.md`.
 2. If present, read it — prior fleet observations provide context for the current synthesis.
 3. Scan `.kaizen/agents/*/memory.md` to build the current fleet picture.
 ### Session Close
 1. Update `## Accumulated Findings` with new fleet-level patterns.
 2. Note any new agents added or memory files reset.
 3. Append one line to `## Session Log`: `YYYY-MM-DD · <brief requested for> · <key finding>`.
 4. Bump `last_updated` and `session_count`.
--- a/agents/agent-optimization.md
+++ b/agents/agent-optimization.md
@@ -0,0 +1,191 @@
 ---
 name: optimization
 description: Meta-agent that analyzes and optimizes other Claude Code subagents based on their performance data, usage patterns, and effectiveness metrics. Use PROACTIVELY for agent ecosystem improvement.
 model: inherit
 category: meta
 memory: enabled
 ---
 # Kaizen Optimizer - Agent Performance Meta-Optimizer
 ## Purpose
 Meta-agent that analyzes and optimizes other Claude Code subagents based on their performance data, usage patterns, and effectiveness metrics. Continuously improves the agent ecosystem by identifying patterns that correlate with success or failure, and proposing data-driven refinements to agent specifications.
 ## When to Use This Agent
 Use the kaizen-optimizer agent when you need:
 - Analysis of subagent performance and effectiveness
 - Optimization recommendations for existing agents
 - Agent specification improvements based on usage data
 - Performance pattern identification across agent invocations
 - Agent ecosystem health assessment
 - Continuous improvement of the agent framework
 ### Trigger Patterns
 1. **Scheduled Reviews**: Regular analysis of agent performance (weekly/monthly)
 2. **Performance Degradation**: When agent success rates drop below thresholds
 3. **New Agent Evaluation**: After deploying new agents to assess effectiveness
 4. **Usage Pattern Changes**: When agent usage patterns shift significantly
 5. **Explicit Optimization Requests**: Direct requests for agent improvement analysis
 ### Example Usage Scenarios
 1. **Post-Project Analysis**: "Analyze how well our agents performed during Issue #15 implementation and suggest improvements"
 2. **Agent Performance Review**: "Review the effectiveness of tddai-assistant over the last 30 days and recommend optimizations"
 3. **Ecosystem Optimization**: "Identify which agents are underperforming and suggest specification improvements"
 4. **Success Pattern Analysis**: "Analyze successful agent chains and recommend best practices"
 ## Agent Capabilities
 ### Performance Analysis
 - **Success Rate Analysis**: Track agent task completion and success metrics
 - **Usage Pattern Recognition**: Identify how agents are being used effectively
 - **Failure Mode Analysis**: Categorize and analyze agent failure patterns
 - **Response Quality Assessment**: Evaluate the quality of agent outputs
 ### Optimization Recommendations
 - **Specification Refinements**: Suggest improvements to agent descriptions and capabilities
 - **Trigger Pattern Optimization**: Refine when and how agents should be invoked
 - **Chain Optimization**: Recommend better agent collaboration patterns
 - **Scope Adjustments**: Identify agents that are too broad or too narrow in scope
 ### Meta-Learning
 - **Pattern Detection**: Identify successful agent behaviors and specifications
 - **Correlation Analysis**: Find relationships between agent characteristics and performance
 - **Best Practice Extraction**: Distill successful patterns into reusable guidelines
 - **Evolution Tracking**: Monitor how agent improvements affect performance over time
 ## Analysis Framework
 ### Data Collection Focus
 Since this agent operates within Claude Code's environment, analysis is based on:
 - **Conversation Context**: Agent invocation patterns and outcomes within sessions
 - **User Feedback Patterns**: Implicit success signals from user interactions
 - **Task Completion Rates**: Whether agents successfully complete their assigned tasks
 - **Agent Specification Quality**: How well specifications match actual usage
 ### Performance Metrics
 - **Invocation Success**: How often agents complete tasks as intended
 - **User Satisfaction Indicators**: Continued usage, follow-up requests, task completion
 - **Agent Utilization**: Which agents are used most/least and why
 - **Chain Effectiveness**: Success rates of multi-agent workflows
 ## Optimization Strategies
 ### Specification Enhancement
 - **Clarity Improvements**: Make agent purposes and capabilities clearer
 - **Scope Refinement**: Adjust agent boundaries for better effectiveness
 - **Example Enhancement**: Add better usage examples and scenarios
 - **Integration Guidance**: Improve agent-to-agent collaboration descriptions
 ### Performance Improvement
 - **Trigger Optimization**: Refine when agents should be automatically suggested
 - **Capability Matching**: Ensure agent capabilities match user needs
 - **Redundancy Reduction**: Identify and resolve agent overlap issues
 - **Gap Identification**: Find missing capabilities in the agent ecosystem
 ## Integration with Agent Ecosystem
 ### Analyzes All Agents
 - **general-purpose**: Assess effectiveness for research and multi-step tasks
 - **tddai-assistant**: Evaluate TDD workflow support and methodology adherence
 - **project-assistant**: Review project management and milestone tracking performance
 - **claude-expert**: Analyze documentation and feature explanation effectiveness
 - **statusline-setup**: Assess configuration task success rates
 - **output-style-setup**: Evaluate creative task completion effectiveness
 ### Collaborative Analysis
 Works with other agents to gather performance data:
 - Uses **general-purpose** for complex analysis tasks
 - Coordinates with **project-assistant** for milestone-based performance tracking
 - Leverages **claude-expert** for framework knowledge and best practices
 ## Expected Outputs
 ### Performance Analysis Reports
 - Agent effectiveness rankings with supporting evidence
 - Usage pattern analysis and trend identification
 - Success/failure correlation analysis
 - Performance bottleneck identification
 ### Optimization Recommendations
 - Specific agent specification improvements
 - Trigger pattern refinements
 - Agent chain optimization suggestions
 - New agent capability recommendations
 ### Implementation Guidance
 - Prioritized improvement roadmap
 - Specification update templates
 - A/B testing suggestions for agent improvements
 - Rollback strategies for failed optimizations
 ## Best Practices for Usage
 ### Provide Performance Context
 - Share specific agent interactions that were particularly effective or ineffective
 - Describe user experience challenges with current agents
 - Include examples of successful and unsuccessful agent chains
 - Specify performance concerns or optimization goals
 ### Be Specific About Scope
 - Focus on particular agents or agent categories for analysis
 - Define time windows for performance analysis
 - Specify success criteria for optimization efforts
 - Clarify whether the analysis should cover the whole ecosystem or a targeted subset
 ### Implementation Approach
 - Request prioritized recommendations based on impact vs. effort
 - Ask for specific specification changes rather than general advice
 - Seek rollback plans for proposed optimizations
 - Request measurable success criteria for improvements
 ## Quality Standards
 ### Analysis Rigor
 - Evidence-based recommendations supported by usage patterns
 - Consideration of trade-offs between different optimization approaches
 - Realistic improvement expectations and timelines
 - Acknowledgment of limitations in available performance data
 ### Recommendation Quality
 - Specific, actionable changes to agent specifications
 - Clear success criteria for measuring improvement effectiveness
 - Integration considerations for agent ecosystem harmony
 - Risk assessment for proposed changes
 ## Integration Notes
 This agent operates within Claude Code's conversation context and focuses on:
 - **Qualitative Analysis**: Because detailed metrics aren't available, the emphasis is on behavioral patterns and user interaction quality
 - **Specification Optimization**: Improving agent descriptions, examples, and usage guidance
 - **Ecosystem Balance**: Ensuring agents complement rather than compete with each other
 - **Practical Improvements**: Recommendations that can be implemented through specification updates
 The agent serves as the continuous improvement engine for the subagent ecosystem, ensuring agents evolve to better serve user needs and project requirements.
 ## Session Start
 1. Check for `.kaizen/agents/optimization/memory.md` in the project root.
 2. If present, read it before beginning analysis.
 3. Review `.kaizen/metrics/optimizer/analysis.json` if it exists for the latest fleet report.
 ## Session Close
 1. When analysis completes, note key findings in `## Accumulated Findings`.
 2. Append one line to `## Session Log`: `YYYY-MM-DD · <agents reviewed> · <outcome>`.
 3. Bump `last_updated` and increment `session_count`.
 4. Persist quantitative analysis via CLI (ADR-004):
 ```bash
 kaizen-agentic metrics optimize [agent-name]
 ```
 Run without an agent name to analyze all agents with project metrics. Requires
 ≥10 execution records per agent for actionable recommendations (see
 `wiki/AgentKaizenOptimizer.md`).
--- a/docs/adr/adr-003-rule-instruction-model.md
+++ b/docs/adr/adr-003-rule-instruction-model.md
@@ -216,11 +216,21 @@ it. The output schema must define `List[TaskSpec]` or a compatible envelope.
 #### `review_required: true`
-When set, the instruction's proposed task list is written to a **pending review
+When set today, the instruction's task/report output is marked with
-queue** in issue-core rather than directly created. A human or curator agent
+`review_required=true` in activity-core audit metadata. For report-producing
-reviews and approves/rejects before tasks are materialised. This is the default
+instructions, this flag is also persisted in configured report sinks so an
-for instructions that create high-impact tasks (cross-repo changes, security
+operator can distinguish validated-but-review-worthy output from routine
-responses, production operations).
+output.
 activity-core does **not** currently route proposed tasks to a pending review
 queue. That queue must be owned by issue-core, because issue-core owns task
 lifecycle state. Until issue-core exposes a review contract, `review_required`
 is metadata only; it must not be treated as evidence that live task creation was
 held for approval.
 Future issue-core review integration may use the same field, but that change
 must update the issue sink contract and tests before any ActivityDefinition
 relies on queue routing.
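
The metadata-only behaviour can be illustrated with a small sketch. Everything here is hypothetical: `InstructionOutput`, `stamp_review_metadata`, and the `held_for_approval` key are illustrative names, not the activity-core implementation. The point is only that the flag is recorded and nothing is queued or withheld.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionOutput:
    """Hypothetical envelope for an instruction's task/report output."""
    tasks: list
    review_required: bool = False
    audit: dict = field(default_factory=dict)


def stamp_review_metadata(output: InstructionOutput) -> InstructionOutput:
    """Copy the flag into audit metadata; the task list is left untouched."""
    output.audit["review_required"] = output.review_required
    # Assumed extra key: states explicitly that no approval gate was applied,
    # so the flag cannot be misread as proof of a held task.
    output.audit["held_for_approval"] = False
    return output
```

Under this reading, a sink that receives the output still sees every task; only the audit record differs.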
 #### Evaluation semantics
@@ -286,7 +296,8 @@ This boundary makes future extraction to `rules-core` a packaging exercise, not
  tasks" behaviour is replaced by explicit rule blocks.
 - A new `RuleEvaluator` class (AST walker) is added to `src/activity_core/rules/`.
 - A new `InstructionExecutor` class handles prompt rendering, LLM call, output
-  validation, and review queue routing.
+  validation, and review-required audit metadata. Pending review queue routing
  remains a future issue-core integration.
 - Integration tests for rule evaluation use fixture JSON; no running Temporal required.
 - The `task_spawn_log` table is added to the Postgres schema (new Alembic migration).
 - ActivityDefinition files that omit both `rules` and `instructions` are valid
--- a/docs/adr/adr-004-producer-trust-boundary.md
+++ b/docs/adr/adr-004-producer-trust-boundary.md
@@ -0,0 +1,156 @@
 ---
 id: ACT-ADR-004
 type: architecture-decision-record
 title: "The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output"
 status: accepted
 decided_by: Bernd Worsch
 date: "2026-06-26"
 scope: cross-repo
 affects:
  - activity-core
  - rules-core (future extraction)
 tags: ["architecture", "llm", "safety", "validation", "guardrails", "trust-boundary", "resilience"]
 ---
 # ACT-ADR-004: The Producer Trust Boundary
 ## Status
 Accepted.
 ## Context
 On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called
 llm-connect successfully, and produced a long ranked recommendation list — but
 the JSON broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because
 the report was validated and consumed as a single monolithic JSON document, one
 malformed delimiter discarded the **entire** run, including the 7 perfectly good
 recommendations the model had already emitted. The scheduling and runtime layers
 were healthy; the failure was entirely at the seam where free-form model output
 meets a strict consumer.
 This is not a one-off bug; it is a recurring class of failure. activity-core has a **trust
 boundary** wherever generative or human-authored output meets strict deterministic
 consumers: the JSON Schema validator, the task emitter, and any classic compute
 pipeline downstream. The producers on the other side of that boundary — **LLMs,
 agents, and humans** — are all *untrusted producers*. Their output may be:
 - **erroneous** — hallucination, truncation at a token limit, drift, type slips,
  typos, a missing delimiter; or
 - **malicious** — prompt injection, crafted payloads, or oversized / deeply-nested
  structures intended to exhaust or confuse the consumer.
 The pre-existing design treated producer output optimistically: parse the whole
 document, validate the whole document, and on any failure discard the whole
 document (preserving only a bounded diagnostic preview). That gives **zero error
 locality** — the blast radius of any single defect is the entire activation.
 ## Decision
 Treat the producer→consumer seam as an explicit, adversarial **trust boundary**,
 and place guardrails plus error-correction tooling *at that boundary* rather than
 letting raw producer output flow into deterministic consumers.
 ### Two non-fail-fast postures
 When hard-failing on a problem is undesirable, there are two sound strategies, and
 they **compose**:
 - **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
  as-is; on exception, catch → repair → retry, or quarantine. Cheap on the happy
  path; blast radius depends entirely on how granular the catch is. Best when
  failures are rare and locally recoverable. Risk: failures surface late, possibly
  after partial side effects.
 - **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
  and normalize the output to a known-good shape *before* it enters the pipeline —
  drop bad items, coerce types, bound sizes/depth, allow-list references — so the
  consumer only ever sees clean input. Higher upfront cost, smaller blast radius,
  no partial side effects. Best when failures are common or consequences are high.
 ### Governing principles
 1. **Push verification to the boundary; keep the interior strict.** Apply posture
   **B** at the producer→consumer boundary; keep posture **A** for residual
   exceptions inside the verified core. Never relax the interior schema to absorb
   producer sloppiness.
 2. **Make error locality match the unit of work.** One bad recommendation must
   cost one recommendation, not the whole report. Structuring the payload so each
   item is independently parseable and validatable is the highest-leverage change.
 3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
   provenance-tagged artifacts (`index`, `error`, `raw` snippet, `reason`) so they
   can be debugged or replayed. Degraded-but-usable is reported distinctly from
   total loss.
 4. **Both human and agent input get the same rigor.** Guardrails are
   producer-agnostic: the same count / length / depth caps and reference
   allow-lists apply whether the producer is an LLM, an agent, or a human.
 ### What this means concretely in activity-core
 Implemented in `src/activity_core/rules/executor.py`:
 - **Strict-structure-only schema.** The daily-triage output schema is strict on
  per-item *structure* (`required [rank, candidate, action, why]`, typed `wsjf`)
  and carries `maxItems` as a producer *hint*, never as a hard whole-document
  reject, which would reproduce the very blast-radius failure this ADR addresses.
  ACT-ADR-002 governs the schema format; see `schemas/daily-triage-report.json`.
 - **Item-granular recovery (posture B).** When whole-document parse + one retry
  fail, `_resilient_report` recovers individually-parseable recommendation objects
  via a brace/quote-aware scanner (`_extract_object_spans`) that works for both
  pretty-printed and NDJSON output, attempts a best-effort `_try_repair` on a
  truncated tail, validates each recovered object against the item schema, and
  keeps the valid ones. Survivors are emitted with `output_validated=true`,
  `partial=true`, and `review_required=true`.
 - **Producer guardrails (`_partition_items`, applied on both the recovery and the
  happy path).** Per recommendation: structural type → schema → structural caps
  (`_MAX_DEPTH`, `_MAX_STRING_LEN`) → reference allow-list → count cap (top-N by
  `maxItems`). The first failing check quarantines the item with provenance and a
  `reason` (`malformed` / `schema` / `guardrail` / `allow_list` / `over_limit`).
 - **Reference allow-list.** A recommendation whose `candidate` is not in the set of
  known ids is quarantined. The set is sourced from resolved context
  (`context["known_candidates"]`, via `_allow_list_from_context`); the check is
  inert until a context resolver populates it, so the capability ships now and
  activates with a one-line resolver change.
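
The item-granular recovery described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `executor.py`: `balanced_spans` and `recover_items` are hypothetical names, the required-key check stands in for full item-schema validation, and the `_try_repair` step for a truncated tail is omitted.

```python
import json

# Required keys taken from the item schema described above.
REQUIRED = ("rank", "candidate", "action", "why")


def balanced_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of every balanced {...} in text.

    Braces inside JSON strings are ignored, including after escaped quotes.
    """
    spans: list[tuple[int, int]] = []
    stack: list[int] = []
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append(pos)
        elif ch == "}" and stack:
            spans.append((stack.pop(), pos + 1))
    return sorted(spans)  # outer spans sort before the spans nested in them


def recover_items(text: str) -> list[dict]:
    """Keep the outermost spans that parse and carry the required keys."""
    items: list[dict] = []
    covered_until = -1
    for start, end in balanced_spans(text):
        if start < covered_until:
            continue  # nested inside an item that was already recovered
        try:
            obj = json.loads(text[start:end])
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and all(key in obj for key in REQUIRED):
            items.append(obj)
            covered_until = end
    return items


truncated = (
    '{"summary": "s", "recommendations": ['
    '{"rank": 1, "candidate": "A", "action": "do {x}", "why": "w", '
    '"wsjf": {"score": 2.5}}, '
    '{"rank": 2, "candidate": "B", "action": "go", "wh'
)
ndjson = (
    '{"rank": 1, "candidate": "A", "action": "a", "why": "w"}\n'
    '{"rank": 2, "candidate": "B", "action": "b", "why": "w"}'
)
recovered = recover_items(truncated)
recovered_ndjson = recover_items(ndjson)
```

On the truncated document the first recommendation is recovered intact, including its nested `wsjf` object, while the cut-off second one is simply not emitted; the same function handles one-object-per-line output.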
 ### Where each posture sits
 | Layer | Posture | Mechanism |
 |-------|---------|-----------|
 | Schema / contract | B | strict per-item structure; `maxItems` as hint |
 | Whole-document parse | A | tolerant parse + single retry |
 | Failed parse | B | item-granular recovery + repair + quarantine |
 | Per-item screening | B | schema + depth/length caps + allow-list + count cap |
 | Emitted report | — | `partial` / `quarantined_*` provenance; never silent |
 ## Consequences
 - A single malformed or oversized item no longer discards an entire activation;
  the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid
  recommendations and quarantine the broken tail.
 - Reports gain a `partial` / `quarantined_*` vocabulary; downstream report sinks
  and reviewers can distinguish degraded-but-usable from total loss.
 - Guardrail thresholds (`_MAX_DEPTH`, `_MAX_STRING_LEN`, `maxItems`, the
  allow-list) are policy knobs that will need tuning; they are intentionally
  conservative defaults, not a finished calibration.
 - **Known retention gap (follow-on):** `LLMConnectClient.complete()` still returns
  only `content`, discarding `finish_reason`/`usage`, and the total-loss artifact
  caps raw output below realistic break points. Capturing those signals so
  failures stay debuggable is tracked as a retention fix, not closed by this ADR.
 ## Alternatives considered
 - **Hard-enforce `maxItems` in the validator.** Rejected: a hard reject of an
  over-count document reproduces the whole-document blast radius. Mitigation (keep
  top-N, quarantine the rest) is preferred.
 - **Relax the schema to accept anything.** Rejected: violates principle 1; pushes
  malformed data into downstream consumers.
 - **Retry-until-valid only (pure posture A).** Rejected as the sole strategy: the
  2026-06-26 failure recurred across both the initial attempt and the retry, so
  retry alone does not bound the blast radius.
 ## References
 - ACT-ADR-002 — markdown-as-definition format and output schema governance.
 - ACT-ADR-003 — Rule vs. Instruction model; the Instruction prompt-injection
  surface this boundary complements on the output side.
 - `workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md` — the
  implementing workplan.
--- a/docs/conventions.md
+++ b/docs/conventions.md
@@ -18,7 +18,7 @@ extension point `af654abb`).
 | Queue name | Registered workers |
 |---|---|
 | `orchestrator-tq` | `RunActivityWorkflow` and all its activities (`load_activity_definition`, `resolve_context`, `log_run`) |
-| `task-execution-tq` | `TaskExecutorWorkflow` and all concrete task type workflows |
+| `task-execution-tq` | `TaskExecutorWorkflow` compatibility stub only; real execution belongs in per-repo workers |
 **Rule:** a workflow and its activities must be registered on the same task queue.
 Cross-queue activity calls require an explicit `task_queue` argument on
@@ -60,6 +60,12 @@ A single process may run workers for multiple task queues, but each `Worker`
 instance is bound to one task queue. Use separate `Worker` instances for
 `orchestrator-tq` and `task-execution-tq`.
 `TaskExecutorWorkflow` is not a production execution surface for activity-core.
 It exists only as a compatibility/idempotency stub that writes a synthetic
 `task_instances` row in older tests and dev flows. Do not add concrete task
 execution logic here; execution ownership belongs to per-repo workers or a
 future execution-owned repo/workplan.
 ---
 ## Search attributes
--- a/docs/issue-core-emission-boundary.md
+++ b/docs/issue-core-emission-boundary.md
@@ -11,7 +11,9 @@ The current authoritative boundary is the issue-core REST API:
 POST {ISSUE_CORE_URL}/issues/
 ```
-`IssueCoreRestSink` sends this payload:
+`IssueCoreRestSink` authenticates with the shared `ISSUE_CORE_API_KEY` env var
 (same value as the issue-core server) via `Authorization: Bearer <key>` and
 sends this payload:
 ```json
 {
@@ -52,7 +54,7 @@ task reference before it can replace `IssueCoreRestSink`.
 Weekly SBOM staleness is safe to evaluate in dry-run mode because the rule
 contract is deterministic and tested. Do not enable it against the real REST sink
-until issue-core credentials, endpoint reachability, and duplicate-handling are
+until `ISSUE_CORE_API_KEY`, endpoint reachability, and duplicate-handling are
 verified in the target environment.
 ## Verification
--- a/docs/runbook.md
+++ b/docs/runbook.md
@@ -116,7 +116,58 @@ asyncio.run(publish())
 ---
-## Syncing schedules manually
+## Syncing definitions and schedules manually
 When the API is running, prefer the admin sync endpoint for definition or
 schedule changes. It refreshes file-backed ActivityDefinitions and reconciles
 Temporal Schedules without restarting the worker:
 ```bash
 curl -s -X POST \
  'http://localhost:8010/admin/sync?definitions=true&schedules=true'
 ```
 The response reports:
 - `definitions.synced`
 - `event_types.synced`
 - `schedules.upserted`
 - `schedules.paused`
 - `schedules.deleted_orphans`
 - bounded `errors[]`
 `event_types` defaults to `false` for this endpoint because event-triggered
 definitions already reload from the DB in the event router path; opt in when
 the operator intentionally changed event type definition files:
 ```bash
 curl -s -X POST \
  'http://localhost:8010/admin/sync?definitions=true&schedules=true&event_types=true'
 ```
 The v1 posture is manual/operator-triggered sync. A periodic background loop is
 deferred until live use shows it is needed; this keeps customer definition
 changes explicit and avoids background repo scanning from the worker.
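
For operators who prefer Python over `curl`, the call can be sketched as below. The response shape is an assumption: the dotted names listed above are read as nested objects (`payload["schedules"]["upserted"]` and so on), and `sync_url`, `summarize`, and `run_sync` are hypothetical helper names rather than part of activity-core.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sync_url(
    base: str = "http://localhost:8010",
    *,
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
) -> str:
    """Build the /admin/sync URL; event_types is only sent when opted in."""
    flags = {"definitions": definitions, "schedules": schedules}
    if event_types:
        flags["event_types"] = True
    query = urlencode({name: str(value).lower() for name, value in flags.items()})
    return f"{base}/admin/sync?{query}"


def summarize(payload: dict) -> str:
    """Condense a sync response into one log line (assumed nested layout)."""
    definitions = payload.get("definitions") or {}
    schedules = payload.get("schedules") or {}
    errors = payload.get("errors") or []
    return (
        f"definitions synced={definitions.get('synced')}; "
        f"schedules upserted={schedules.get('upserted')}, "
        f"paused={schedules.get('paused')}, "
        f"deleted_orphans={schedules.get('deleted_orphans')}; "
        f"errors={len(errors)}"
    )


def run_sync(**flags: bool) -> dict:
    """POST to the admin sync endpoint and decode the JSON response."""
    request = Request(sync_url(**flags), method="POST")
    with urlopen(request) as response:
        return json.loads(response.read().decode())
```

A typical use would be `print(summarize(run_sync()))` against a running API; `run_sync(event_types=True)` mirrors the opt-in variant shown above.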
 ### Railiance01 no-restart smoke
 After changing a projected definition in `k8s/railiance/20-runtime.yaml`,
 apply the ConfigMap and wait for the API pod volume to refresh (up to ~60s),
 then reconcile without restarting `actcore-worker`:
 ```bash
 export KUBECONFIG=~/.kube/config-hosteurope
 kubectl apply -f k8s/railiance/20-runtime.yaml
 sleep 60
 kubectl -n activity-core exec deploy/actcore-api -- \
  python3 -c 'import urllib.request; req=urllib.request.Request("http://localhost:8010/admin/sync?definitions=true&schedules=true", method="POST"); print(urllib.request.urlopen(req).read().decode())'
 ```
 Automated regression for the disabled `ops-service-inventory-probes`
 projection (enable/cadence flip, idempotent repeat sync, rollback) lives in
 `scripts/smoke_admin_sync_no_restart.py`.
 If the API is unavailable, the schedule-only CLI remains available:
 ```bash
 TEMPORAL_HOST=localhost:7233 \
@@ -126,7 +177,7 @@ ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
 This reconciles all Temporal Schedules with the `activity_definitions` table:
 - Upserts schedules for every enabled cron definition
- Creates paused schedules for disabled cron definitions
+- Creates paused schedules for disabled cron or one-shot scheduled definitions
 - Deletes orphaned schedules with no matching DB row
 After adding or changing a recurring ActivityDefinition or workflow activity
@@ -159,14 +210,34 @@ repos, and emits one automated task per stale repo through explicit
 `weekly-coding-retro` follows the same cron -> context resolver -> per-repo task
 pattern for coding-session retrospection. It runs Saturdays at 19:00
 Europe/Berlin and resolves the latest State Hub `/progress/` item with
-`event_type=coding_retro` into `context.retro.suggestions`. Each positive-score
+`event_type=coding_retro` and a matching `window_days` into
-suggestion emits one task to `context.s.repo` with labels
+`context.retro.suggestions`. Each positive-score suggestion emits one task to
-`coding-retro`, `improvement`, and `automated`.
+`context.s.repo` with labels `coding-retro`, `improvement`, and `automated`.
 The weekly schedule intentionally ignores broader retro windows such as 30-day
 catch-up reports.
 Keep `weekly-coding-retro` disabled until Helix Forge publishes the
 `coding_retro` read model and a smoke run confirms the resolver returns a
 non-empty suggestion set with no duplicate target tasks on re-run.
 ## Ops inventory evidence posture
 The current accepted live backend for activity-core ops inventory probes is
 State Hub progress with `event_type=ops_inventory_probe`.
 Inter-Hub / ops-hub per-entity submission remains intentionally deferred until
 all of these are true:
 - `OPS_HUB_KEY` is provisioned through an operator-owned secret path, never Git,
  chat, or State Hub detail.
 - Widget or capability mapping is configured for the target ops-hub entities.
 - Production Inter-Hub intake is deployed and smoke-tested for the relevant
  authenticated routes.
 Until then, missing Inter-Hub configuration should produce an explicit skipped
 sink result, not a failed probe. This posture was recorded in State Hub decision
 `7c235bbb-ee6f-4c3e-b1dd-74717eac9082`.
 ---
 ## Temporal UI — filtering by activity
@@ -262,6 +333,52 @@ the same durable consumer name provides automatic failover.
 ---
 ## Run-miss recovery policies (cron triggers)
 A cron fire is **missed** when the worker or Temporal is unavailable at trigger
 time. `trigger_config.misfire_policy` selects what happens when the system
 recovers. Each policy combines a Temporal **catchup window** (how far back missed
 fires are recovered) with an **overlap policy** (what to do if a recovered fire
 would start while a prior run is still executing):
 | `misfire_policy` | Behaviour | Default catchup window | Overlap |
 | --- | --- | --- | --- |
 | `skip` | Run on trigger or skip — a missed fire is never recovered | 60s grace | `SKIP` |
 | `catchup_all` | Recover **every** fire missed during the outage | 365 days | `BUFFER_ALL` |
 | `catchup_latest` | Recover only the **most recent** missed fire; no backlog | 24h | `BUFFER_ONE` |
 Set `trigger_config.catchup_window_seconds` to override the per-policy default
 (e.g. an hourly definition using `catchup_latest` should set it to ~3600 so a
 single missed hour is recovered but older ones are not).
 Legacy values are still accepted: `catchup` → `catchup_all`,
 `compress` → `catchup_latest`.
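
The table and override rule can be read as a small lookup. The sketch below is illustrative only: `resolve_misfire` and `MisfireSettings` are hypothetical names, the overlap values are kept as plain strings rather than Temporal SDK enums, and `misfire_policy` is treated as required because the default for a missing value is not stated here.

```python
from dataclasses import dataclass

# Per-policy defaults from the table above: (catchup window in seconds, overlap).
_DEFAULTS = {
    "skip": (60, "SKIP"),
    "catchup_all": (365 * 24 * 3600, "BUFFER_ALL"),
    "catchup_latest": (24 * 3600, "BUFFER_ONE"),
}

# Legacy spellings that are still accepted.
_LEGACY = {"catchup": "catchup_all", "compress": "catchup_latest"}


@dataclass(frozen=True)
class MisfireSettings:
    policy: str
    catchup_window_seconds: int
    overlap: str


def resolve_misfire(trigger_config: dict) -> MisfireSettings:
    """Normalize the policy name and apply the optional window override."""
    raw = trigger_config["misfire_policy"]
    policy = _LEGACY.get(raw, raw)
    if policy not in _DEFAULTS:
        raise ValueError(f"unknown misfire_policy: {raw!r}")
    default_window, overlap = _DEFAULTS[policy]
    window = trigger_config.get("catchup_window_seconds", default_window)
    return MisfireSettings(policy, int(window), overlap)


hourly = resolve_misfire(
    {"misfire_policy": "compress", "catchup_window_seconds": 3600}
)
```

In this example the legacy `compress` value resolves to `catchup_latest` with a one-hour window, matching the hourly case described above.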
 > **Why this exists:** before ACTIVITY-WP-0014 no catchup window was set, so a
 > brief outage at trigger time silently dropped the fire with no recovery and no
 > log line. The `daily-statehub-wsjf-triage` definition now uses `catchup_latest`.
 ## State Hub write idempotency (ACTIVITY-WP-0014 T05)
 Every State Hub write from activity-core (report-sink progress, ops-evidence
 progress, schedule-miss alerts) carries a stable **`Idempotency-Key`** header
 derived deterministically from the write's identity
 (`run_id:instruction_id:event_type`, or `schedule_miss:activity_id:last_fired`
 for miss alerts). This makes writes safe to **buffer and replay** under the
 planned State Hub *beachhead* (per-machine read cache + write outbox): a flush —
 possibly retried after an outage — cannot create duplicate progress/triage
 events once State Hub / the beachhead honours the header.
 The guarantee lives on the write, not on a live dedup read. The read-based
 `_progress_exists` check is now best-effort only: if State Hub is unreachable it
 returns `False` (proceed to the keyed write) rather than hard-failing. The header
 passes untouched through the `actcore-state-hub-bridge` proxy and is ignored by
 State Hub versions that do not yet honour it.
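
A minimal sketch of the key derivation and of why a replayed write is harmless once the receiver honours the header. The function names and the in-memory `KeyedStore` are hypothetical; whether the real key is sent as the raw identity string or as a hash of it is not stated here, so the raw string is used as an assumption.

```python
def progress_idempotency_key(run_id: str, instruction_id: str, event_type: str) -> str:
    """Key for report-sink and ops-evidence progress writes."""
    return f"{run_id}:{instruction_id}:{event_type}"


def schedule_miss_idempotency_key(activity_id: str, last_fired: str) -> str:
    """Key for schedule-miss alerts; last_fired is assumed to be a timestamp string."""
    return f"schedule_miss:{activity_id}:{last_fired}"


def keyed_headers(key: str) -> dict[str, str]:
    """Headers to attach to the State Hub write."""
    return {"Idempotency-Key": key}


class KeyedStore:
    """Stand-in for a receiver that honours Idempotency-Key."""

    def __init__(self) -> None:
        self._seen: dict[str, dict] = {}

    def write(self, headers: dict[str, str], body: dict) -> bool:
        """Return True if the write was stored, False if it was a replay."""
        key = headers["Idempotency-Key"]
        if key in self._seen:
            return False
        self._seen[key] = body
        return True

    def __len__(self) -> int:
        return len(self._seen)


store = KeyedStore()
headers = keyed_headers(progress_idempotency_key("run-1", "instr-7", "daily_triage"))
first = store.write(headers, {"summary": "ok"})
replayed = store.write(headers, {"summary": "ok"})  # e.g. outbox flush retried
```

Because the key depends only on the write's identity, a retried flush produces the same header and the receiver stores the event once.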
 > The queue/cache itself is **not** built in activity-core — it belongs to the
 > state-hub beachhead. activity-core only emits the key. See the proposal sent to
 > the `state-hub` agent.
 ## Troubleshooting
 ### Worker fails to start: "ACTCORE_DB_URL is required"
@@ -271,6 +388,9 @@ Set the environment variable before running the worker.
 1. Check Temporal UI → Schedules tab for the schedule status.
 2. Ensure `enabled=True` on the ActivityDefinition (paused schedules don't fire).
 3. Verify the cron expression with: `docker exec temporal-admin-tools temporal schedule describe --schedule-id activity-schedule-<uuid>`
 4. If a fire was **missed entirely** (no run, no failure event) during an outage,
   check `misfire_policy` — under `skip` missed fires are dropped by design. Use
   `catchup_all` or `catchup_latest` to recover them. See *Run-miss recovery policies*.
 ### Event not routing
 1. Check NATS monitoring: http://localhost:8222/jsz to verify the `ACTIVITY_EVENTS` stream exists.
@@ -342,6 +462,14 @@ uv run alembic history    # show full migration history
 ## Railiance Deployment
 ### Production API access posture
 The FastAPI admin surface remains ClusterIP-only in production. Do not publish
 it through an external ingress until a separate access-policy work item chooses
 the hostname, authentication layer, allowed users/agents, and audit
 expectations. This posture was recorded in State Hub decision
 `9ffaf7a9-227a-4e39-92e3-cd93d8cda1f2`.
 ### Pre-requisites
 - Docker ≥ 24 with Compose v2 (`docker compose` not `docker-compose`)
 - ≥ 4 GB RAM available (Temporal server takes ~1 GB)
@@ -412,6 +540,31 @@ make railiance-up
 ---
 ## Kaizen fleet resolver (coulomb-loop)
 Dry-run scheduled agent discovery against State Hub + pilot roster:
 ```bash
 export STATE_HUB_URL=http://127.0.0.1:8000
 export KAIZEN_RUNNER_HOST=$(hostname)
 export ACTIVITY_DEFINITION_DIRS=/home/worsch/coulomb-loop
 uv run python -c "
 from activity_core.context_resolvers.kaizen import discover_kaizen_scheduled_repos
 print(discover_kaizen_scheduled_repos({
    'roster': '/home/worsch/coulomb-loop/loops/kaizen-stack/roster.yaml',
    'cadence': 'daily',
 }))
 "
 make sync-activity-definitions   # requires ACTCORE_DB_URL + stack up
 ```
 Supported source types are `kaizen`, `resolver`, and `shell` (alias). Supported
 queries are `discover_kaizen_scheduled_repos` and `discover_kaizen_projects`.
 ---
 ## Wipe and restart dev stack
 ```bash
--- a/history/2026-06-16-intent-gap-analysis.md
+++ b/history/2026-06-16-intent-gap-analysis.md
@@ -0,0 +1,118 @@
 ---
 type: history
 title: "activity-core INTENT gap analysis"
 date: "2026-06-16"
 author: codex
 repo: activity-core
 related_workplan: ACTIVITY-WP-0009
 ---
 # activity-core INTENT Gap Analysis - 2026-06-16
 ## Context
 This note preserves the findings from a repository review against `INTENT.md`.
 The review refreshed `SCOPE.md` for the current repo state and identified the
 remaining gaps between the intended Event Bridge boundary and the implemented /
 deployed surface.
 Files and surfaces reviewed:
 - `INTENT.md`
 - `SCOPE.md`
 - `src/activity_core/`
 - `activity-definitions/`
 - `docs/runbook.md`
 - `docs/issue-core-emission-boundary.md`
 - `k8s/railiance/`
 - `workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md`
 - `workplans/ACTIVITY-WP-0007-ops-inventory-probe-runner.md`
 - `workplans/ACTIVITY-WP-0008-weekly-coding-retro.md`
 ## Summary
 activity-core matches the core INTENT boundary in shape: it owns trigger
 durability, context resolution, rule/instruction evaluation, outbound
 task/report/evidence emission, and local audit records. It still must avoid
 owning task lifecycle, project state, privileged ops execution, or service
 inventory authority.
 The current implementation has grown a useful bounded report/evidence surface:
 instruction reports can write working-memory notes and State Hub progress, and
 `ops-inventory` context sources can emit compact non-secret
 `ops_inventory_probe` summaries. This is still consistent with INTENT as long as
 those outputs remain records of activity-core activations rather than an
 authoritative task, project, or ops control plane.
 ## Findings
 ### 1. Scheduled-run trust gap
 `INTENT.md` expects recurring coordination work to run without Bernd as the
 manual coordination layer. The daily State Hub WSJF triage path is implemented
 and has produced validated reports, but `ACTIVITY-WP-0006-T03` still lacks
 three clean consecutive scheduled runs after the June 7 runtime projection
 failure.
 Current evidence as of 2026-06-16:
 - State Hub `daily_triage` progress only shows activity-core entries through
  2026-06-06.
 - `/home/worsch/the-custodian/memory/working` only has `daily-triage-*` notes
  for 2026-06-02 through 2026-06-06.
 Impact: daily triage is production-backed, but not yet fully proven as a
 standing substrate.
 ### 2. Live task creation gap
 `INTENT.md` says each activation emits task creation requests to issue-core and
 records only the spawn audit trail. The REST issue sink exists, but Railiance is
 currently configured with `ISSUE_SINK_TYPE=null`, so production runs record
 synthetic audit references instead of consistently creating live issue-core
 tasks.
 Impact: the task emission boundary is implemented but not yet broadly proven in
 the production deployment.
 ### 3. Review queue gap
 The original ADR text described `review_required` as routing instruction output
 to a pending review queue. Current code records `review_required` in
 report/spawn metadata but does not integrate with an issue-core review queue.
 Impact: current behavior is safe as metadata. As of the ACTIVITY-WP-0009
 implementation pass, ADR-003 and SCOPE.md have been aligned to that behavior.
 ### 4. Evidence backend gap
 The State Hub fallback evidence path works for `ops_inventory_probe`, and
 `ACTIVITY-WP-0007` has live Railiance evidence. Inter-Hub / ops-hub submission
 is intentionally deferred behind operator-owned `OPS_HUB_KEY` custody, widget
 mapping, and approval.
 Impact: activity-core can preserve non-secret continuity evidence, but richer
 per-entity ops evidence publication is not yet live.
 ### 5. Execution-boundary residue
 `TaskExecutorWorkflow` remains registered as a stub that persists a done
 `task_instances` row. INTENT explicitly says activity-core must not execute the
 work or track lifecycle state.
 Impact: low immediate risk because the workflow is inert, but it is an attractive
 wrong hook for future execution creep.
 ### 6. API exposure gap
 The FastAPI admin surface is useful for internal CRUD and manual triggers.
 Railiance docs keep it as ClusterIP until an authenticated ingress and access
 policy are chosen.
 Impact: operationally acceptable for now, but production access posture remains
 an explicit decision.
 ## Follow-up
 `workplans/ACTIVITY-WP-0009-intent-gap-closure.md` was created to turn these
 findings into tracked closure work.
--- a/k8s/railiance/20-runtime.yaml
+++ b/k8s/railiance/20-runtime.yaml
@@ -11,7 +11,7 @@ data:
  TEMPORAL_NAMESPACE: default
  NATS_URL: nats://actcore-nats:4222
  STATE_HUB_URL: http://actcore-state-hub-bridge:8000
-  LLM_CONNECT_URL: ""
+  LLM_CONNECT_URL: http://llm-connect.activity-core.svc.cluster.local:8080
  LLM_CONNECT_TIMEOUT_SECONDS: "300"
  REPO_SCOPING_URL: http://repo-scoping.repo-scoping.svc.cluster.local:8020
  ISSUE_CORE_URL: http://issue-core.issue-core.svc.cluster.local:8010
@@ -47,7 +47,10 @@ data:
      type: cron
      cron_expression: "20 7 * * *"
      timezone: Europe/Berlin
-      misfire_policy: skip
+      # ACTIVITY-WP-0014: recover the most recent missed daily fire when the
      # worker/Temporal was unavailable at trigger time, without accumulating a
      # backlog after a multi-day outage.
      misfire_policy: catchup_latest
    context_sources:
      - type: static
        bind_to: context.prompt_path
@@ -164,6 +167,36 @@ data:
    Kubernetes projection of the Custodian-owned definition in
    `/home/worsch/the-custodian/activity-definitions/hourly-recently-on-scope.md`.
  state-hub-consistency-sweep.md: |
    ---
    id: "7c4e9a12-8f3b-4d5e-9c6a-1b2d3e4f5a6b"
    name: "State Hub Consistency Sweep"
    type: activity-definition
    version: "1.0"
    enabled: true
    owner: custodian
    governance: custodian
    status: active
    created: "2026-06-21"
    trigger:
      type: cron
      cron_expression: "*/15 * * * *"
      timezone: UTC
      misfire_policy: skip
    context_sources:
      - type: state-hub
        query: consistency_sweep_remote_all
        required: true
        params:
          max_seconds: 300
          source: activity-core
        bind_to: context.consistency_sweep_remote_all
    ---
    # ActivityDefinition: State Hub Consistency Sweep
    Kubernetes projection of the Custodian-owned definition in
    `/home/worsch/the-custodian/activity-definitions/state-hub-consistency-sweep.md`.
  ops-service-inventory-probes.md: |
    ---
    id: "40d15a87-7ff6-4d8e-992c-37df15f95110"
@@ -578,7 +611,8 @@ spec:
                          method=self.command,
                      )
                      try:
-                          with urlopen(request, timeout=30) as response:
+                          timeout = 360 if self.command == "POST" else 30
                          with urlopen(request, timeout=timeout) as response:
                              payload = response.read()
                              self.send_response(response.status)
                              for key, value in response.headers.items():
@@ -599,7 +633,7 @@ spec:
              ThreadingHTTPServer(("0.0.0.0", 18080), Proxy).serve_forever()
          readinessProbe:
            httpGet:
-              path: /state/summary
+              path: /state/health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
--- a/k8s/railiance/README.md
+++ b/k8s/railiance/README.md
@@ -32,8 +32,10 @@ Europe/Berlin schedule, verify both runtime dependencies:
 - `actcore-state-hub-bridge` can reach the State Hub API through the node-local
  tunnel expected at `127.0.0.1:18000`.
- `LLM_CONNECT_URL` is set to an operator-approved llm-connect endpoint that can
+- `LLM_CONNECT_URL` points at the verified in-namespace llm-connect Service,
-  serve the `custodian-triage-balanced` profile.
+  `http://llm-connect.activity-core.svc.cluster.local:8080`, and the
  operator-owned provider Secret lets that Service serve the
  `custodian-triage-balanced` profile.
 If `LLM_CONNECT_URL` is missing or broken, report-sink instructions write a
 visible `execution_failed` diagnostic instead of silently producing no report.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -12,6 +12,7 @@ dependencies = [
    "alembic>=1.14",
    "nats-py>=2.7",
    "httpx>=0.27",
    "pyyaml>=6.0",
 ]
 [project.optional-dependencies]
--- a/schemas/daily-triage-report.json
+++ b/schemas/daily-triage-report.json
@@ -1,4 +1,5 @@
 {
  "$comment": "ACTIVITY-WP-0016-T02. Strict, bounded contract for the daily WSJF triage report. The per-item 'recommendations' schema is intentionally strict on STRUCTURE (types + required keys) so the T03 boundary parser can validate each recommendation independently and quarantine only the malformed ones. 'maxItems' is a producer hint (honoured by llm-connect constrained decoding and by the prompt); it is deliberately NOT hard-enforced by the in-repo validator, because rejecting a whole report for having too many items would reproduce the monolithic-failure bug WP-0016 exists to remove. Over-count is mitigated in T03 (keep top-N by rank, quarantine the rest). Value-domain vocabularies (action/confidence) are documented in the prompt and enforced by T04 guardrails with mitigation, not as brittle hard-fail enums here.",
  "type": "object",
  "required": ["summary", "recommendations"],
  "properties": {
@@ -7,8 +8,28 @@
    },
    "recommendations": {
      "type": "array",
      "maxItems": 7,
      "items": {
-        "type": "object"
+        "type": "object",
        "required": ["rank", "candidate", "action", "why"],
        "properties": {
          "rank": { "type": "integer" },
          "candidate": { "type": "string" },
          "action": { "type": "string" },
          "why": { "type": "string" },
          "confidence": { "type": "string" },
          "wsjf": {
            "type": "object",
            "properties": {
              "score": { "type": "number" },
              "strategic_value": { "type": "number" },
              "time_criticality": { "type": "number" },
              "risk_reduction": { "type": "number" },
              "opportunity_enablement": { "type": "number" },
              "job_size": { "type": "number" }
            }
          }
        }
      }
    }
  }
--- a/scripts/smoke_admin_sync_no_restart.py
+++ b/scripts/smoke_admin_sync_no_restart.py
@@ -0,0 +1,212 @@
 #!/usr/bin/env python3
 """Railiance01 no-restart smoke for POST /admin/sync.
 Patches the disabled ops-service-inventory-probes projection in the cluster
 ConfigMap, waits for the API pod volume to refresh, runs /admin/sync twice,
 verifies DB + Temporal schedule drift without restarting actcore-worker, then
 rolls the ConfigMap back to the disabled baseline.
 Requires:
  - KUBECONFIG pointing at railiance01 (for example ~/.kube/config-hosteurope)
  - kubectl access to the activity-core namespace
 Example:
  export KUBECONFIG=~/.kube/config-hosteurope
  python3 scripts/smoke_admin_sync_no_restart.py
 """
 from __future__ import annotations
 import json
 import subprocess
 import sys
 import time
 ACTIVITY_ID = "40d15a87-7ff6-4d8e-992c-37df15f95110"
 CONFIGMAP = "actcore-external-activity-definitions"
 DEFINITION_KEY = "ops-service-inventory-probes.md"
 MOUNTED_FILE = (
    "/etc/activity-core/external-definitions/activity-definitions/"
    f"{DEFINITION_KEY}"
 )
 VOLUME_PROPAGATION_SECONDS = 65
 def kubectl(*args: str, input_text: str | None = None) -> str:
    cmd = ["kubectl", "-n", "activity-core", *args]
    return subprocess.check_output(
        cmd,
        input=input_text,
        text=True,
    )
 def api_json(path: str, *, method: str = "GET") -> dict:
    script = (
        "import urllib.request, json\n"
        f'req = urllib.request.Request("http://localhost:8010{path}", method="{method}")\n'
        "print(urllib.request.urlopen(req).read().decode())"
    )
    return json.loads(kubectl("exec", "deploy/actcore-api", "--", "python3", "-c", script))
 def worker_lines(script: str) -> list[str]:
    return kubectl("exec", "deploy/actcore-worker", "--", "python3", "-c", script).splitlines()
 def worker_uid() -> str:
    return kubectl(
        "get",
        "pod",
        "-l",
        "app.kubernetes.io/name=actcore-worker",
        "-o",
        "jsonpath={.items[0].metadata.uid}",
    ).strip()
 def load_configmap() -> dict:
    return json.loads(kubectl("get", "configmap", CONFIGMAP, "-o", "json"))
 def apply_configmap(cm: dict) -> None:
    kubectl("apply", "-f", "-", input_text=json.dumps(cm))
 def patch_definition(cm: dict, *, enabled: bool, cron: str) -> None:
    text = cm["data"][DEFINITION_KEY]
    for line in text.splitlines():
        if line.strip().startswith("enabled:"):
            break
    else:
        raise RuntimeError("enabled field not found in projection")
    text = _replace_once(text, 'enabled: false', f"enabled: {'true' if enabled else 'false'}")
    text = _replace_once(text, 'enabled: true', f"enabled: {'true' if enabled else 'false'}")
    text = _replace_once(
        text,
        'cron_expression: "15 * * * *"',
        f'cron_expression: "{cron}"',
    )
    text = _replace_once(
        text,
        'cron_expression: "25 * * * *"',
        f'cron_expression: "{cron}"',
    )
    cm["data"][DEFINITION_KEY] = text
    apply_configmap(cm)
 def _replace_once(text: str, old: str, new: str) -> str:
    if old not in text:
        return text
    return text.replace(old, new, 1)
 def wait_for_mount(*, enabled: bool, cron: str) -> None:
    deadline = time.time() + VOLUME_PROPAGATION_SECONDS
    want_enabled = "enabled: true" if enabled else "enabled: false"
    want_cron = f'cron_expression: "{cron}"'
    while time.time() < deadline:
        content = kubectl("exec", "deploy/actcore-api", "--", "cat", MOUNTED_FILE)
        if want_enabled in content and want_cron in content:
            return
        time.sleep(5)
    raise RuntimeError(
        f"ConfigMap projection did not refresh within {VOLUME_PROPAGATION_SECONDS}s"
    )
 def get_definition() -> dict[str, object]:
    for item in api_json("/activity-definitions/"):
        if item["id"] == ACTIVITY_ID:
            return {
                "enabled": item["enabled"],
                "cron": item["trigger_config"]["cron_expression"],
            }
    raise RuntimeError(f"ActivityDefinition {ACTIVITY_ID} not found")
 def describe_schedule() -> dict[str, object]:
    script = f"""
 import asyncio
 from temporalio.client import Client
 async def main() -> None:
    client = await Client.connect("actcore-temporal:7233")
    handle = client.get_schedule_handle("activity-schedule-{ACTIVITY_ID}")
    described = await handle.describe()
    schedule = described.schedule
    minute = schedule.spec.calendars[0].minute[0].start if schedule.spec.calendars else None
    print(schedule.state.paused)
    print(minute)
 asyncio.run(main())
 """
    paused, minute = worker_lines(script)
    return {"paused": paused == "True", "minute": int(minute)}
 def main() -> int:
    worker_before = worker_uid()
    cm = load_configmap()
    print("1) enable + cadence change via ConfigMap")
    patch_definition(cm, enabled=True, cron="25 * * * *")
    wait_for_mount(enabled=True, cron="25 * * * *")
    print("2) POST /admin/sync (first pass)")
    sync1 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
    if not sync1.get("ok"):
        print(json.dumps(sync1, indent=2), file=sys.stderr)
        return 1
    defn = get_definition()
    schedule = describe_schedule()
    print("   definition:", defn)
    print("   schedule:", schedule)
    if defn != {"enabled": True, "cron": "25 * * * *"}:
        print("definition drift after sync", file=sys.stderr)
        return 1
    if schedule["paused"] or schedule["minute"] != 25:
        print("schedule drift after enable sync", file=sys.stderr)
        return 1
    print("3) POST /admin/sync (idempotent repeat)")
    sync2 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
    if sync2.get("schedules") != sync1.get("schedules"):
        print("idempotent schedule counts changed", file=sys.stderr)
        print(json.dumps({"sync1": sync1, "sync2": sync2}, indent=2), file=sys.stderr)
        return 1
    print("4) rollback ConfigMap + sync")
    cm = load_configmap()
    patch_definition(cm, enabled=False, cron="15 * * * *")
    wait_for_mount(enabled=False, cron="15 * * * *")
    sync3 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
    if not sync3.get("ok"):
        print(json.dumps(sync3, indent=2), file=sys.stderr)
        return 1
    defn = get_definition()
    schedule = describe_schedule()
    print("   definition:", defn)
    print("   schedule:", schedule)
    if defn != {"enabled": False, "cron": "15 * * * *"}:
        print("rollback definition drift", file=sys.stderr)
        return 1
    if not schedule["paused"] or schedule["minute"] != 15:
        print("rollback schedule drift", file=sys.stderr)
        return 1
    worker_after = worker_uid()
    if worker_before != worker_after:
        print("actcore-worker pod restarted during smoke", file=sys.stderr)
        return 1
    print("smoke passed: admin sync hot-reload without worker restart")
    return 0
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/src/activity_core/activities.py
+++ b/src/activity_core/activities.py
@@ -11,8 +11,10 @@ activities that need DB access.
 from __future__ import annotations
 import json
 import uuid
 from datetime import datetime, timezone
 from typing import Any
 from sqlalchemy import select
 from sqlalchemy.dialects.postgresql import insert as pg_insert
@@ -52,6 +54,36 @@ def _get_session_factory() -> async_sessionmaker[AsyncSession]:
    return _session_factory
 def _bind_resolver_result(bind_key: str, result: Any) -> Any:
    """Unwrap single-key resolver payloads when the key matches bind_key.
    Resolvers such as ``discover_kaizen_projects`` return ``{"projects": [...]}``
    while definitions bind to ``context.projects`` and iterate ``for_each:
    context.projects``.  Multi-key summaries (e.g. repo SBOM bulk) stay intact.
    """
    if isinstance(result, dict) and len(result) == 1 and bind_key in result:
        return result[bind_key]
    return result
 def _parse_event_envelope(event_envelope_json: str | None) -> dict[str, Any] | None:
    """Parse an event envelope JSON string for context resolvers."""
    if not event_envelope_json:
        return None
    try:
        payload = json.loads(event_envelope_json)
    except (TypeError, json.JSONDecodeError) as exc:
        activity.logger.warning("Invalid event envelope JSON - %s", exc)
        return None
    if not isinstance(payload, dict):
        activity.logger.warning(
            "Invalid event envelope JSON - expected object, got %s",
            type(payload).__name__,
        )
        return None
    return payload
 # ── Activities ─────────────────────────────────────────────────────────────────
@activity.defn
@@ -111,11 +143,14 @@ async def resolve_context(
    from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY
    snapshot: dict = {}
    event_envelope = _parse_event_envelope(event_envelope_json)
    for source in context_sources:
        source_type = source.get("type", "")
        query = source.get("query", "")
        params = source.get("params") or {}
        required = bool(source.get("required") or params.get("required", False))
        resolver_params = dict(params)
        resolver_params["required"] = required
        raw_bind = source.get("bind_to") or source.get("name") or source_type
        # Strip the 'context.' namespace prefix so evaluator can find the key.
        bind_key = raw_bind.removeprefix("context.") if raw_bind.startswith("context.") else raw_bind
@@ -139,7 +174,8 @@ async def resolve_context(
            continue
        try:
-            snapshot[bind_key] = resolver_cls().resolve(query, None, params)
+            resolved = resolver_cls().resolve(query, event_envelope, resolver_params)
            snapshot[bind_key] = _bind_resolver_result(bind_key, resolved)
        except Exception as exc:
            if required:
                raise ApplicationError(
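The unwrapping rule in `_bind_resolver_result` is easiest to see with concrete payloads. The following is a standalone copy of that logic for illustration; the sample payloads (`demo`, `total`) are hypothetical and do not come from any real resolver.

```python
from typing import Any


def bind_resolver_result(bind_key: str, result: Any) -> Any:
    """Unwrap a single-key dict when its only key equals bind_key."""
    if isinstance(result, dict) and len(result) == 1 and bind_key in result:
        return result[bind_key]
    return result


# Single-key payload whose key matches the bind key: unwrapped to the list.
unwrapped = bind_resolver_result("projects", {"projects": [{"repo": "demo"}]})

# Single-key payload with a different key: returned unchanged.
mismatched = bind_resolver_result("projects", {"scheduled_runs": []})

# Multi-key payload: returned unchanged even though it contains the bind key.
multi = bind_resolver_result("projects", {"projects": [], "total": 0})
```

With this rule a definition that binds to `context.projects` can iterate the list directly, while multi-key summaries keep their full shape.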
--- a/src/activity_core/api.py
+++ b/src/activity_core/api.py
@@ -40,6 +40,7 @@ from temporalio.client import Client
 from activity_core.models import ActivityDefinition, CronTriggerConfig
 from activity_core.orm import ActivityDefinition as ActivityDefinitionRow, EventType as EventTypeRow
 from activity_core.schedule_manager import delete_schedule, upsert_schedule
 from activity_core.sync_service import run_sync
 from activity_core.webhook_receiver import router as webhook_router
 TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
@@ -275,6 +276,24 @@ async def trigger_definition(definition_id: uuid.UUID) -> dict[str, str]:
    return {"workflow_id": handle.id, "trigger_key": trigger_key}
 # --- Admin sync ---------------------------------------------------------------
@app.post("/admin/sync")
 async def admin_sync(
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
 ) -> dict[str, Any]:
    """Run operator-triggered definition/event/schedule sync without restart."""
    return await run_sync(
        session_factory=_get_db(),
        temporal_client=_get_temporal() if schedules else None,
        definitions=definitions,
        schedules=schedules,
        event_types=event_types,
    )
 # T42: Curator gate — event type approval endpoint
@app.get("/health")
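An operator can trigger the sync with a plain HTTP POST whose query string carries the three boolean flags. The sketch below only builds the request and does not send it; the base URL `http://localhost:8000` is an assumption, and `build_admin_sync_request` is a hypothetical helper, not part of `activity_core`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_admin_sync_request(
    base_url: str,
    *,
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
) -> Request:
    """Build (but do not send) a POST request for the /admin/sync endpoint."""
    query = urlencode(
        {
            # Booleans are serialized as lowercase strings, as in the smoke script.
            "definitions": str(definitions).lower(),
            "schedules": str(schedules).lower(),
            "event_types": str(event_types).lower(),
        }
    )
    return Request(f"{base_url.rstrip('/')}/admin/sync?{query}", method="POST")


# Assumed base URL for illustration only.
request = build_admin_sync_request("http://localhost:8000/")
```

Passing the result to `urllib.request.urlopen` would send it; the defaults mirror those of `admin_sync`, so omitting all flags syncs definitions and schedules but not event types.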
--- a/src/activity_core/context_resolvers/__init__.py
+++ b/src/activity_core/context_resolvers/__init__.py
@@ -1 +1,8 @@
-from activity_core.context_resolvers import ops_inventory, repo_scoping, state_hub  # noqa: F401
+from activity_core.context_resolvers import (  # noqa: F401
    event_payload,
    kaizen,
    ops_inventory,
    repo_scoping,
    state_hub,
    reuse_surface,
 )
--- a/src/activity_core/context_resolvers/event_payload.py
+++ b/src/activity_core/context_resolvers/event_payload.py
@@ -0,0 +1,51 @@
 """Event payload context adapter.
 Registered as source type ``event-payload``. It exposes the triggering
 EventEnvelope attributes to event-triggered ActivityDefinitions without
 requiring an external context service call.
 """
 from __future__ import annotations
 from typing import Any
 from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, ContextResolver
 class EventPayloadContextResolver(ContextResolver):
    """Resolve context from the triggering event envelope attributes."""
    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> Any:
        attributes = _event_attributes(event)
        if query in {"", "attributes"}:
            return attributes
        if query.startswith("attributes."):
            return _resolve_path(attributes, query.removeprefix("attributes."))
        return _resolve_path(attributes, query)
 def _event_attributes(event: Any) -> dict[str, Any]:
    if not isinstance(event, dict):
        raise RuntimeError("event-payload source requires an event envelope")
    attributes = event.get("attributes")
    if not isinstance(attributes, dict):
        raise RuntimeError("event-payload source requires envelope attributes")
    return attributes
 def _resolve_path(root: dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts; missing or null steps yield ``{}``."""
    if not path:
        return root
    current: Any = root
    for part in path.split("."):
        if not part:
            return {}
        if not isinstance(current, dict):
            return {}
        current = current.get(part)
        if current is None:
            return {}
    return current
 CONTEXT_RESOLVER_REGISTRY["event-payload"] = EventPayloadContextResolver
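The query forms accepted by the `event-payload` source can be illustrated with a small standalone sketch. It mirrors the path-walking logic of the resolver without the registry; the envelope contents below are hypothetical, and `resolve_query` is an illustrative name rather than a function in the module.

```python
from typing import Any


def resolve_path(root: dict[str, Any], path: str) -> Any:
    """Walk a dotted path through nested dicts; return {} when a step is missing."""
    if not path:
        return root
    current: Any = root
    for part in path.split("."):
        if not part or not isinstance(current, dict):
            return {}
        current = current.get(part)
        if current is None:
            return {}
    return current


def resolve_query(envelope: dict[str, Any], query: str) -> Any:
    """Mirror the query handling of the event-payload resolver."""
    attributes = envelope["attributes"]
    if query in {"", "attributes"}:
        return attributes
    # An optional "attributes." prefix is accepted and stripped.
    return resolve_path(attributes, query.removeprefix("attributes."))


# Hypothetical envelope for illustration only.
envelope = {"attributes": {"repo": {"slug": "demo"}, "count": 0}}

slug_prefixed = resolve_query(envelope, "attributes.repo.slug")
slug_bare = resolve_query(envelope, "repo.slug")
missing = resolve_query(envelope, "repo.missing")
```

Note that a missing key and an explicit `null` both resolve to `{}`, whereas falsy values such as `0` are returned as-is.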
--- a/src/activity_core/context_resolvers/kaizen.py
+++ b/src/activity_core/context_resolvers/kaizen.py
@@ -0,0 +1,305 @@
 """Kaizen-agentic fleet context adapter.
 Registered as source types ``kaizen`` and ``resolver`` (alias for ADR-005 drafts),
 plus ``shell`` until ``reuse_surface`` replaces that entry with its dispatcher.
 Supported queries:
  - discover_kaizen_scheduled_repos: hub roster ∩ valid ``.kaizen/schedule.yml``
  - discover_kaizen_projects: repos with ``.kaizen/metrics`` marker (+ optional roster)
 Contract: kaizen-agentic ``docs/integrations/discover-kaizen-scheduled-repos.md``
 """
 from __future__ import annotations
 import json
 import logging
 import os
 import socket
 from pathlib import Path
 from typing import Any
 import httpx
 import yaml
 from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, ContextResolver
 logger = logging.getLogger(__name__)
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _TIMEOUT_SECONDS = 10.0
 _SCHEDULE_VERSION = "1"
 _VALID_CADENCES = frozenset({"daily", "weekly", "monthly"})
 _PREPARE_BIN = os.environ.get("KAIZEN_AGENTIC_BIN", "kaizen-agentic")
 def _base_url() -> str:
    return os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL).rstrip("/")
 def _runner_host() -> str:
    return os.environ.get("KAIZEN_RUNNER_HOST", socket.gethostname())
 def _fetch_repos(domain: str | None) -> list[dict[str, Any]]:
    url = f"{_base_url()}/repos/"
    try:
        resp = httpx.get(url, timeout=_TIMEOUT_SECONDS)
        resp.raise_for_status()
    except httpx.HTTPError as exc:
        raise RuntimeError(f"State Hub unreachable at {url}: {exc}") from exc
    payload = resp.json()
    if not isinstance(payload, list):
        raise RuntimeError(f"State Hub /repos/ returned non-list: {type(payload)!r}")
    if domain:
        payload = [r for r in payload if r.get("domain_slug") == domain]
    return payload
 def _repo_root(repo: dict[str, Any]) -> Path | None:
    """Return the repo checkout on this runner host, or None if absent or unknown."""
    host_paths = repo.get("host_paths") or {}
    host = _runner_host()
    raw = host_paths.get(host) or repo.get("local_path")
    if not raw or raw == "(unknown)":
        return None
    path = Path(raw)
    return path if path.is_dir() else None
 def _load_roster(params: dict[str, Any]) -> dict[str, dict[str, Any]] | None:
    """Return slug -> roster entry for active repos, or None if no roster param."""
    roster_path = params.get("roster")
    if not roster_path:
        return None
    path = Path(roster_path)
    if not path.is_file():
        logger.warning("kaizen roster file not found: %s", path)
        return {}
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        logger.warning("kaizen roster invalid (not a mapping): %s", path)
        return {}
    entries: dict[str, dict[str, Any]] = {}
    for item in data.get("active") or []:
        if isinstance(item, dict) and item.get("slug"):
            slug = str(item["slug"])
            if item.get("status", "active") == "saturated":
                continue
            entries[slug] = item
    return entries
 def _validate_schedule_file(path: Path) -> list[str]:
    """Structural validation aligned with kaizen-agentic schedule validate."""
    errors: list[str] = []
    try:
        raw = yaml.safe_load(path.read_text(encoding="utf-8"))
    except yaml.YAMLError as exc:
        return [f"invalid YAML: {exc}"]
    if not isinstance(raw, dict):
        return ["schedule.yml must be a YAML mapping at the top level"]
    version = raw.get("version")
    if version is None:
        errors.append("missing required key: version")
    elif str(version) != _SCHEDULE_VERSION:
        errors.append(f"unsupported version '{version}' (expected '{_SCHEDULE_VERSION}')")
    agents = raw.get("agents", {})
    if not isinstance(agents, dict):
        errors.append("agents must be a mapping")
        return errors
    if not agents:
        errors.append("no agents declared under 'agents:'")
    seen: set[str] = set()
    for name, settings in agents.items():
        if settings is None:
            settings = {}
        if not isinstance(settings, dict):
            errors.append(f"agent '{name}' settings must be a mapping")
            continue
        if name in seen:
            errors.append(f"duplicate agent entry: {name}")
        seen.add(name)
        cadence = str(settings.get("cadence", ""))
        if cadence not in _VALID_CADENCES:
            errors.append(
                f"agent '{name}': invalid cadence '{cadence}' "
                f"(expected one of {', '.join(sorted(_VALID_CADENCES))})"
            )
        cron = settings.get("cron")
        if cron is not None and not isinstance(cron, str):
            errors.append(f"agent '{name}' cron must be a string")
    return errors
 def _parse_schedule(path: Path) -> dict[str, Any] | None:
    errors = _validate_schedule_file(path)
    if errors:
        return None
    raw = yaml.safe_load(path.read_text(encoding="utf-8"))
    return raw if isinstance(raw, dict) else None
 def _prepare_command(agent: str, root: Path) -> str:
    return f"{_PREPARE_BIN} schedule prepare {agent} --target {root}"
 def discover_kaizen_scheduled_repos(params: dict[str, Any]) -> dict[str, Any]:
    domain = params.get("domain")
    cadence_filter = params.get("cadence")
    roster = _load_roster(params)
    runs: list[dict[str, Any]] = []
    for repo in _fetch_repos(domain):
        slug = repo.get("slug", "")
        if not slug:
            continue
        if roster is not None and slug not in roster:
            continue
        root = _repo_root(repo)
        if root is None:
            logger.info("kaizen repo_unreachable slug=%s host=%s", slug, _runner_host())
            continue
        schedule_path = root / ".kaizen" / "schedule.yml"
        if not schedule_path.is_file():
            continue
        errors = _validate_schedule_file(schedule_path)
        if errors:
            logger.warning(
                "kaizen schedule_invalid slug=%s path=%s errors=%s",
                slug,
                schedule_path,
                "; ".join(errors),
            )
            continue
        schedule = _parse_schedule(schedule_path)
        if schedule is None:
            continue
        timezone = schedule.get("timezone") or "Europe/Berlin"
        roster_agents = roster.get(slug, {}).get("agents") if roster else None
        agents = schedule.get("agents") or {}
        for agent_name, settings in agents.items():
            if not isinstance(settings, dict):
                continue
            if not bool(settings.get("enabled", True)):
                continue
            cadence = str(settings.get("cadence", ""))
            if cadence_filter and cadence != cadence_filter:
                continue
            if roster_agents and agent_name not in roster_agents:
                continue
            cron = settings.get("cron")
            runs.append(
                {
                    "repo": slug,
                    "root": str(root),
                    "agent": agent_name,
                    "cadence": cadence,
                    "cron": cron,
                    "timezone": timezone,
                    "enabled": True,
                    "prepare_command": _prepare_command(agent_name, root),
                }
            )
    return {"scheduled_runs": runs}
 def _read_metrics_summary(metrics_dir: Path) -> dict[str, Any]:
    summary_path = metrics_dir / "summary.json"
    if not summary_path.is_file():
        return {}
    try:
        data = json.loads(summary_path.read_text(encoding="utf-8"))
        return data if isinstance(data, dict) else {}
    except (json.JSONDecodeError, OSError):
        return {}
 def discover_kaizen_projects(params: dict[str, Any]) -> dict[str, Any]:
    """Discover repos with ``.kaizen/metrics`` (optional per-agent summaries)."""
    domain = params.get("domain")
    marker = params.get("marker", ".kaizen/metrics")
    roster = _load_roster(params)
    in_roster_key = "in_pilot_roster"
    projects: list[dict[str, Any]] = []
    for repo in _fetch_repos(domain):
        slug = repo.get("slug", "")
        if not slug:
            continue
        in_pilot = roster is None or slug in roster
        if roster is not None and slug not in roster:
            continue
        root = _repo_root(repo)
        if root is None:
            continue
        metrics_root = root / Path(marker)
        if not metrics_root.is_dir():
            continue
        has_metrics = any(metrics_root.iterdir())
        if not has_metrics:
            continue
        roster_entry = roster.get(slug, {}) if roster else {}
        agent_filter = roster_entry.get("agents")
        for agent_dir in sorted(metrics_root.iterdir()):
            if not agent_dir.is_dir() or agent_dir.name == "optimizer":
                continue
            agent = agent_dir.name
            if agent_filter and agent not in agent_filter:
                continue
            summary = _read_metrics_summary(agent_dir)
            projects.append(
                {
                    "repo": slug,
                    "root": str(root),
                    "agent": agent,
                    "has_metrics": True,
                    in_roster_key: in_pilot,
                    "summary": summary,
                }
            )
        if not any(p["repo"] == slug for p in projects):
            projects.append(
                {
                    "repo": slug,
                    "root": str(root),
                    "agent": None,
                    "has_metrics": has_metrics,
                    in_roster_key: in_pilot,
                    "summary": {},
                }
            )
    return {"projects": projects}
 class KaizenContextResolver(ContextResolver):
    """Resolves kaizen fleet scheduling and project metrics discovery."""
    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
        if query == "discover_kaizen_scheduled_repos":
            return discover_kaizen_scheduled_repos(params)
        if query == "discover_kaizen_projects":
            return discover_kaizen_projects(params)
        return {}
 CONTEXT_RESOLVER_REGISTRY["kaizen"] = KaizenContextResolver
 CONTEXT_RESOLVER_REGISTRY["resolver"] = KaizenContextResolver
 CONTEXT_RESOLVER_REGISTRY["shell"] = KaizenContextResolver
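To make the structural rules for `.kaizen/schedule.yml` concrete, here is a condensed sketch of the same checks applied to an already-parsed mapping, so it needs no YAML library. The agent name `reviewer` and both sample schedules are hypothetical; the error wording is shortened relative to the module.

```python
from typing import Any

VALID_CADENCES = frozenset({"daily", "weekly", "monthly"})
SCHEDULE_VERSION = "1"


def validate_schedule(raw: Any) -> list[str]:
    """Return structural errors for an already-parsed schedule mapping."""
    if not isinstance(raw, dict):
        return ["schedule.yml must be a YAML mapping at the top level"]
    errors: list[str] = []
    version = raw.get("version")
    if version is None:
        errors.append("missing required key: version")
    elif str(version) != SCHEDULE_VERSION:
        errors.append(f"unsupported version '{version}' (expected '{SCHEDULE_VERSION}')")
    agents = raw.get("agents", {})
    if not isinstance(agents, dict):
        errors.append("agents must be a mapping")
        return errors
    if not agents:
        errors.append("no agents declared under 'agents:'")
    for name, settings in agents.items():
        if settings is None:
            settings = {}
        if not isinstance(settings, dict):
            errors.append(f"agent '{name}' settings must be a mapping")
            continue
        cadence = str(settings.get("cadence", ""))
        if cadence not in VALID_CADENCES:
            errors.append(f"agent '{name}': invalid cadence '{cadence}'")
        cron = settings.get("cron")
        if cron is not None and not isinstance(cron, str):
            errors.append(f"agent '{name}' cron must be a string")
    return errors


# Hypothetical schedules, written as the dicts a YAML loader would produce.
valid_schedule = {
    "version": 1,
    "timezone": "Europe/Berlin",
    "agents": {"reviewer": {"cadence": "weekly", "cron": "0 6 * * 1"}},
}
invalid_schedule = {"agents": {"reviewer": {"cadence": "hourly", "cron": 5}}}

valid_errors = validate_schedule(valid_schedule)
invalid_errors = validate_schedule(invalid_schedule)
```

A repo whose schedule produces any error is skipped by `discover_kaizen_scheduled_repos` with a warning, so the second sample would contribute no runs.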
--- a/src/activity_core/context_resolvers/reuse_surface.py
+++ b/src/activity_core/context_resolvers/reuse_surface.py
@@ -0,0 +1,516 @@
 """Reuse-surface registry hygiene context adapter.
 Registered as source type ``reuse-surface`` and as the ``shell`` resolver
 dispatcher for the ``reuse_surface_report_gaps`` query.  Other shell queries
 continue to delegate to the kaizen resolver for backward compatibility.
 """
 from __future__ import annotations
 import json
 import logging
 import os
 import socket
 import subprocess
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any
 import httpx
 import yaml
 from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, ContextResolver
 from activity_core.context_resolvers.kaizen import KaizenContextResolver
 from activity_core.context_resolvers.state_hub import StateHubContextResolver
 logger = logging.getLogger(__name__)
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _REPORT_TIMEOUT_SECONDS = 60
 _STATE_HUB_TIMEOUT_SECONDS = 10.0
 _KNOWN_SIGNALS = frozenset(
    {
        "registry_gap",
        "empty_capability_scaffold",
        "stale_scope",
        "stale_sbom",
        "publish_check_fail",
    }
 )
@dataclass(frozen=True)
 class RosterEntry:
    slug: str
    domain: str | None = None
    publish_check: str | None = None
 def _base_url() -> str:
    return os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL).rstrip("/")
 def _runner_host(params: dict[str, Any]) -> str:
    return str(
        params.get("runner_host")
        or os.environ.get("KAIZEN_RUNNER_HOST")
        or socket.gethostname()
    )
 def _as_required(params: dict[str, Any]) -> bool:
    return bool(params.get("required", False))
 def reuse_surface_report_gaps(params: dict[str, Any]) -> dict[str, Any]:
    """Resolve registry-hygiene gaps for the next rollout batch.
    Missing operational dependencies are visible failures for required sources
    and graceful empty lists for optional sources so definitions can opt into
    either behavior without changing rule logic.
    """
    try:
        return _resolve_reuse_surface_report_gaps(params)
    except Exception as exc:
        if _as_required(params):
            raise
        logger.warning("reuse_surface_report_gaps unavailable: %s", exc)
        return {"gaps": []}
 def _resolve_reuse_surface_report_gaps(params: dict[str, Any]) -> dict[str, Any]:
    roster_path = _roster_path(params)
    entries = _load_active_roster_entries(roster_path)
    if not entries:
        return {"gaps": []}
    state_path = _round_robin_state_path(params, roster_path)
    selected, next_cursor = _select_round_robin_batch(
        entries,
        _batch_size(params),
        state_path,
    )
    if not selected:
        return {"gaps": []}
    signals = _enabled_signals(_signals_path(params, roster_path))
    roots = _resolve_repo_roots(selected, _runner_host(params))
    report = _reuse_surface_report(params, signals)
    gaps = _gap_records(selected, roots, signals, report)
    _write_round_robin_state(state_path, next_cursor, selected)
    return {"gaps": gaps}
 def _roster_path(params: dict[str, Any]) -> Path:
    raw = params.get("roster")
    if not raw:
        raise ValueError("reuse_surface_report_gaps requires params.roster")
    path = Path(str(raw)).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"reuse_surface_report_gaps roster not found: {path}")
    return path
 def _batch_size(params: dict[str, Any]) -> int:
    try:
        return max(1, int(params.get("batch_size", 3)))
    except (TypeError, ValueError):
        return 3
 def _round_robin_state_path(params: dict[str, Any], roster_path: Path) -> Path:
    raw = params.get("round_robin_state")
    if raw:
        return Path(str(raw)).expanduser()
    return roster_path.with_name("round-robin-state.json")
 def _signals_path(params: dict[str, Any], roster_path: Path) -> Path:
    raw = params.get("signals")
    if raw:
        return Path(str(raw)).expanduser()
    return roster_path.with_name("signals.yml")
 def _load_active_roster_entries(path: Path) -> list[RosterEntry]:
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"reuse_surface rollout roster is not a mapping: {path}")
    entries: dict[str, RosterEntry] = {}
    for domain, block in _iter_domain_blocks(data):
        if _domain_phase(block) != "active":
            continue
        for item in _repo_items(block):
            entry = _entry_from_item(item, domain, block)
            if entry and entry.slug not in entries:
                entries[entry.slug] = entry
    return list(entries.values())
 def _iter_domain_blocks(data: dict[str, Any]) -> list[tuple[str | None, dict[str, Any]]]:
    domains = data.get("domains")
    if isinstance(domains, dict):
        return [
            (str(name), block)
            for name, block in domains.items()
            if isinstance(block, dict)
        ]
    if isinstance(domains, list):
        return [
            (str(block.get("name") or block.get("domain") or ""), block)
            for block in domains
            if isinstance(block, dict)
        ]
    if isinstance(data.get("active"), list):
        return [(None, {"phase": "active", "repos": data["active"]})]
    return [
        (str(name), block)
        for name, block in data.items()
        if isinstance(block, dict) and ("phase" in block or "repos" in block)
    ]
 def _domain_phase(block: dict[str, Any]) -> str:
    return str(block.get("phase") or block.get("status") or "").lower()
 def _repo_items(block: dict[str, Any]) -> list[Any]:
    repos = (
        block.get("repos")
        or block.get("repo_slugs")
        or block.get("repositories")
        or block.get("slugs")
        or []
    )
    if isinstance(repos, dict):
        items: list[Any] = []
        for slug, config in repos.items():
            if isinstance(config, dict):
                item = dict(config)
                item.setdefault("slug", slug)
                items.append(item)
            else:
                items.append(str(slug))
        return items
    if isinstance(repos, list):
        return repos
    return []
 def _entry_from_item(
    item: Any,
    domain: str | None,
    block: dict[str, Any],
 ) -> RosterEntry | None:
    publish_check = block.get("publish_check")
    if isinstance(item, str):
        slug = item
    elif isinstance(item, dict):
        slug = item.get("slug") or item.get("repo") or item.get("name")
        publish_check = item.get("publish_check", publish_check)
    else:
        return None
    if not slug:
        return None
    return RosterEntry(
        slug=str(slug),
        domain=domain or None,
        publish_check=str(publish_check).lower() if publish_check is not None else None,
    )
 def _select_round_robin_batch(
    entries: list[RosterEntry],
    batch_size: int,
    state_path: Path,
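 ) -> tuple[list[RosterEntry], int]:
    """Pick the next batch starting at the stored cursor, wrapping around the roster.
    Returns the selected entries and the cursor to persist for the next run.
    """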
    if not entries:
        return [], 0
    cursor = _read_round_robin_cursor(state_path) % len(entries)
    size = min(batch_size, len(entries))
    selected = [entries[(cursor + offset) % len(entries)] for offset in range(size)]
    next_cursor = (cursor + size) % len(entries)
    return selected, next_cursor
 def _read_round_robin_cursor(path: Path) -> int:
    if not path.is_file():
        return 0
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return 0
    if not isinstance(data, dict):
        return 0
    try:
        return int(data.get("cursor", 0))
    except (TypeError, ValueError):
        return 0
 def _write_round_robin_state(
    path: Path,
    cursor: int,
    selected: list[RosterEntry],
 ) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "cursor": cursor,
        "last_batch": [entry.slug for entry in selected],
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    path.write_text(
        json.dumps(payload, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
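 def _enabled_signals(path: Path) -> set[str]:
    """Return the hygiene signals enabled in the signals file.
    Falls back to every known signal when the file is missing or names none
    of the known signals.
    """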
    if not path.is_file():
        return set(_KNOWN_SIGNALS)
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    node = data.get("signals") if isinstance(data, dict) else data
    enabled: set[str] = set()
    saw_known_signal = False
    if isinstance(node, dict):
        for name, config in node.items():
            if str(name) not in _KNOWN_SIGNALS:
                continue
            saw_known_signal = True
            if isinstance(config, dict) and config.get("enabled") is False:
                continue
            if config is False:
                continue
            enabled.add(str(name))
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, str) and item in _KNOWN_SIGNALS:
                saw_known_signal = True
                enabled.add(item)
            elif isinstance(item, dict):
                name = item.get("id") or item.get("signal") or item.get("name")
                if str(name) in _KNOWN_SIGNALS and item.get("enabled", True) is not False:
                    saw_known_signal = True
                    enabled.add(str(name))
    return enabled if saw_known_signal else set(_KNOWN_SIGNALS)
 def _resolve_repo_roots(
    entries: list[RosterEntry],
    runner_host: str,
 ) -> dict[str, Path]:
    requested = {entry.slug for entry in entries}
    roots: dict[str, Path] = {}
    for repo in _fetch_repos():
        slug = str(repo.get("slug") or "")
        if slug not in requested:
            continue
        raw = _repo_path_for_host(repo, runner_host)
        if raw:
            roots[slug] = Path(raw)
    return roots
 def _fetch_repos() -> list[dict[str, Any]]:
    url = f"{_base_url()}/repos/"
    try:
        resp = httpx.get(url, timeout=_STATE_HUB_TIMEOUT_SECONDS)
        resp.raise_for_status()
    except httpx.HTTPError as exc:
        raise RuntimeError(f"State Hub unreachable at {url}: {exc}") from exc
    payload = resp.json()
    if not isinstance(payload, list):
        raise RuntimeError(f"State Hub /repos/ returned non-list: {type(payload)!r}")
    return [repo for repo in payload if isinstance(repo, dict)]
 def _repo_path_for_host(repo: dict[str, Any], runner_host: str) -> str | None:
    host_paths = repo.get("host_paths") or {}
    raw = None
    if isinstance(host_paths, dict):
        raw = host_paths.get(runner_host)
    raw = raw or repo.get("local_path")
    if not raw or raw == "(unknown)":
        return None
    return str(raw)
 def _reuse_surface_report(params: dict[str, Any], signals: set[str]) -> dict[str, Any]:
    if not (signals & {"registry_gap", "empty_capability_scaffold"}):
        return {}
    binary = str(params.get("reuse_surface_bin") or "reuse-surface")
    try:
        completed = subprocess.run(
            [binary, "report", "gaps", "--format", "json"],
            capture_output=True,
            check=False,
            text=True,
            timeout=_REPORT_TIMEOUT_SECONDS,
        )
    except FileNotFoundError as exc:
        raise RuntimeError(f"reuse-surface CLI not found: {binary}") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError("reuse-surface report gaps timed out") from exc
    if completed.returncode != 0:
        detail = completed.stderr.strip() or completed.stdout.strip()
        raise RuntimeError(f"reuse-surface report gaps failed: {detail}")
    try:
        payload = json.loads(completed.stdout or "{}")
    except json.JSONDecodeError as exc:
        raise RuntimeError("reuse-surface report gaps returned invalid JSON") from exc
    if not isinstance(payload, dict):
        raise RuntimeError("reuse-surface report gaps returned non-object JSON")
    return payload
 def _gap_records(
    entries: list[RosterEntry],
    roots: dict[str, Path],
    signals: set[str],
    report: dict[str, Any],
 ) -> list[dict[str, Any]]:
    empty_scaffolds = _repo_set(report, {"empty_scaffolds", "empty_scaffold"})
    publish_fail = _repo_set(
        report,
        {"publish_fail", "publish_fails", "publish_failures"},
    )
    gaps: list[dict[str, Any]] = []
    seen: set[tuple[str, str]] = set()
    for entry in entries:
        root = roots.get(entry.slug)
        if root is None:
            logger.info("reuse_surface repo_unreachable slug=%s", entry.slug)
            continue
        if (
            signals & {"registry_gap", "empty_capability_scaffold"}
            and entry.slug in empty_scaffolds
        ):
            _append_gap(gaps, seen, entry.slug, root, "empty_capability_scaffold")
        if "registry_gap" in signals and entry.slug in publish_fail:
            _append_gap(gaps, seen, entry.slug, root, "registry_gap")
        if "publish_check_fail" in signals and entry.publish_check == "fail":
            _append_gap(gaps, seen, entry.slug, root, "publish_check_fail")
        if "stale_scope" in signals and _scope_is_stale(root):
            _append_gap(gaps, seen, entry.slug, root, "stale_scope")
        if "stale_sbom" in signals and _sbom_is_stale(entry.slug):
            _append_gap(gaps, seen, entry.slug, root, "stale_sbom")
    return gaps
 def _append_gap(
    gaps: list[dict[str, Any]],
    seen: set[tuple[str, str]],
    slug: str,
    root: Path,
    signal: str,
 ) -> None:
    key = (slug, signal)
    if key in seen:
        return
    seen.add(key)
    gaps.append(
        {
            "repo": slug,
            "root": str(root),
            "signal": signal,
            "hygiene_signal": signal,
        }
    )
 def _scope_is_stale(root: Path) -> bool:
    scope = root / "SCOPE.md"
    if not scope.is_file():
        return True
    age_seconds = datetime.now(timezone.utc).timestamp() - scope.stat().st_mtime
    return age_seconds > 90 * 24 * 60 * 60
 def _sbom_is_stale(slug: str) -> bool:
    payload = StateHubContextResolver().resolve(
        "repo_sbom_status",
        None,
        {"repo_slug": slug},
    )
    if not isinstance(payload, dict):
        return False
    try:
        return int(payload.get("sbom_age_days", 0)) > 30
    except (TypeError, ValueError):
        return False
 def _repo_set(report: dict[str, Any], keys: set[str]) -> set[str]:
    slugs: set[str] = set()
    for value in _values_for_keys(report, keys):
        slugs.update(_slugs_from_value(value))
    return slugs
 def _values_for_keys(value: Any, keys: set[str]) -> list[Any]:
    values: list[Any] = []
    if isinstance(value, dict):
        for key, nested in value.items():
            if key in keys:
                values.append(nested)
            values.extend(_values_for_keys(nested, keys))
    elif isinstance(value, list):
        for item in value:
            values.extend(_values_for_keys(item, keys))
    return values
 def _slugs_from_value(value: Any) -> set[str]:
    if isinstance(value, str):
        return {value}
    if isinstance(value, list):
        slugs: set[str] = set()
        for item in value:
            slugs.update(_slugs_from_value(item))
        return slugs
    if isinstance(value, dict):
        for key in ("repo", "repo_slug", "slug", "name"):
            if value.get(key):
                return {str(value[key])}
        slugs: set[str] = set()
        for key, nested in value.items():
            if nested is True or isinstance(nested, (dict, list)):
                slugs.add(str(key))
            slugs.update(_slugs_from_value(nested))
        return slugs
    return set()
 class ReuseSurfaceContextResolver(ContextResolver):
    """Resolves reuse-surface registry hygiene gap reports."""
    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
        if query == "reuse_surface_report_gaps":
            return reuse_surface_report_gaps(params)
        return {}
 class ShellContextResolver(ContextResolver):
    """Dispatch shell-backed context queries without breaking kaizen aliases."""
    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
        if query == "reuse_surface_report_gaps":
            return reuse_surface_report_gaps(params)
        return KaizenContextResolver().resolve(query, event, params)
 CONTEXT_RESOLVER_REGISTRY["reuse-surface"] = ReuseSurfaceContextResolver
 CONTEXT_RESOLVER_REGISTRY["shell"] = ShellContextResolver
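Reviewer note: the slug collection in `_repo_set` is easier to follow with a concrete input. The sketch below re-implements `_values_for_keys` and `_slugs_from_value` from the hunk above under public names so it runs on its own. The report shape and the `repo-a`/`repo-b`/`repo-c` slugs are hypothetical and only illustrate the three accepted forms (bare string, object with a `slug`-like key, and a `{slug: true}` map).

```python
from typing import Any


def values_for_keys(value: Any, keys: set[str]) -> list[Any]:
    """Collect every value stored under one of `keys`, at any nesting depth."""
    values: list[Any] = []
    if isinstance(value, dict):
        for key, nested in value.items():
            if key in keys:
                values.append(nested)
            values.extend(values_for_keys(nested, keys))
    elif isinstance(value, list):
        for item in value:
            values.extend(values_for_keys(item, keys))
    return values


def slugs_from_value(value: Any) -> set[str]:
    """Normalise a string, list, or dict into a set of repo slugs."""
    if isinstance(value, str):
        return {value}
    if isinstance(value, list):
        found: set[str] = set()
        for item in value:
            found.update(slugs_from_value(item))
        return found
    if isinstance(value, dict):
        # An object that names itself wins over its nested content.
        for key in ("repo", "repo_slug", "slug", "name"):
            if value.get(key):
                return {str(value[key])}
        found = set()
        for key, nested in value.items():
            # A key mapped to True (or to a container) is treated as a slug.
            if nested is True or isinstance(nested, (dict, list)):
                found.add(str(key))
            found.update(slugs_from_value(nested))
        return found
    return set()


def repo_set(report: dict[str, Any], keys: set[str]) -> set[str]:
    slugs: set[str] = set()
    for value in values_for_keys(report, keys):
        slugs.update(slugs_from_value(value))
    return slugs


# Hypothetical report: the same key appears at two depths in three shapes.
report = {
    "registry": {"missing": ["repo-a", {"slug": "repo-b"}]},
    "extra": {"missing": {"repo-c": True}},
}
print(sorted(repo_set(report, {"missing"})))
```

One consequence worth keeping in mind when reading the hunk: a dict key whose value is a nested dict or list is itself counted as a slug, so container keys inside a matched value end up in the result set.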
--- a/src/activity_core/context_resolvers/state_hub.py
+++ b/src/activity_core/context_resolvers/state_hub.py
@@ -12,6 +12,7 @@ Supported queries:
  - coding_retro:     latest /progress/ item with event_type=coding_retro
  - daily_triage_digest: curated scalar JSON digest for daily WSJF triage
  - recently_on_scope_hourly: POST {STATE_HUB_URL}/recently-on-scope/hourly
  - consistency_sweep_remote_all: POST {STATE_HUB_URL}/consistency/sweep/remote-all
 No caching — state hub data is live operational state and must not be stale
 within a single workflow run.
@@ -31,6 +32,7 @@ from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, Cont
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _TIMEOUT_SECONDS = 10.0
 _SWEEP_TIMEOUT_SECONDS = 330.0
 _OPEN_WORKSTREAM_STATUSES = {"active", "ready", "blocked"}
 _OPEN_TASK_STATUSES = {"wait", "todo", "progress"}
 # Sentinel age for repos that have never had an SBOM ingested. Large enough
@@ -53,13 +55,26 @@ def _fetch_json(path: str, params: dict[str, Any] | None = None) -> Any:
        return {}
-def _post_json(path: str, payload: dict[str, Any]) -> Any:
+def _post_json(path: str, payload: dict[str, Any], *, timeout: float = _TIMEOUT_SECONDS) -> Any:
    url = f"{_base_url()}{path}"
-    resp = httpx.post(url, json=payload, timeout=_TIMEOUT_SECONDS)
+    resp = httpx.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
+def _validate_consistency_sweep_remote_all(result: Any) -> dict[str, Any]:
+    if not isinstance(result, dict):
+        raise RuntimeError("consistency_sweep_remote_all returned a non-object response")
+    required_keys = {"exit_code", "lock_skipped", "repos_processed"}
+    missing = required_keys - set(result)
+    if missing:
+        missing_list = ", ".join(sorted(missing))
+        raise RuntimeError(
+            f"consistency_sweep_remote_all response missing required key(s): {missing_list}"
+        )
+    return result
 def _validate_recently_on_scope_hourly(result: Any) -> dict[str, Any]:
    if not isinstance(result, dict):
        raise RuntimeError("recently_on_scope_hourly returned a non-object response")
@@ -107,6 +122,18 @@ class StateHubContextResolver(ContextResolver):
            }
            result = _post_json("/recently-on-scope/hourly", payload)
            return _validate_recently_on_scope_hourly(result)
        if query == "consistency_sweep_remote_all":
            payload = {
                key: value
                for key, value in params.items()
                if key not in {"required"}
            }
            result = _post_json(
                "/consistency/sweep/remote-all",
                payload,
                timeout=_SWEEP_TIMEOUT_SECONDS,
            )
            return _validate_consistency_sweep_remote_all(result)
        return {}
@@ -219,11 +246,13 @@ def _coding_retro(params: dict[str, Any]) -> dict[str, Any]:
    """
    event_type = str(params.get("event_type") or "coding_retro")
    limit = _bounded_int(params.get("limit", 100), default=100, minimum=1, maximum=500)
-    items = _fetch_json("/progress/", {"limit": limit})
+    query_params = {"event_type": event_type, "limit": limit}
+    items = _fetch_json("/progress/", query_params)
    if not isinstance(items, list):
        return _empty_coding_retro(event_type)
-    item = _latest_progress_item(items, event_type)
+    window_days = _optional_int(params.get("window_days"))
+    item = _latest_progress_item(items, event_type, window_days)
    if item is None:
        return _empty_coding_retro(event_type)
@@ -256,12 +285,18 @@ def _empty_coding_retro(event_type: str) -> dict[str, Any]:
 def _latest_progress_item(
    items: list[Any],
    event_type: str,
    window_days: int | None = None,
 ) -> dict[str, Any] | None:
    newest: dict[str, Any] | None = None
    newest_key: tuple[datetime, int] | None = None
    for index, item in enumerate(items):
        if not isinstance(item, dict) or item.get("event_type") != event_type:
            continue
        if window_days is not None and not _progress_matches_window_days(
            item,
            window_days,
        ):
            continue
        key = (_parse_progress_timestamp(item.get("created_at")), index)
        if newest_key is None or key > newest_key:
            newest = item
@@ -295,6 +330,56 @@ def _progress_detail(item: dict[str, Any]) -> dict[str, Any]:
    return {}
 def _progress_matches_window_days(item: dict[str, Any], window_days: int) -> bool:
    detail = _progress_detail(item)
    return _progress_window_days(detail) == window_days
 def _progress_window_days(detail: dict[str, Any]) -> int | None:
    window = detail.get("window")
    if isinstance(window, dict):
        direct = _optional_int(window.get("days") or window.get("window_days"))
        if direct is not None:
            return direct
        ranged = _window_days_from_range(
            window.get("since") or window.get("window_start"),
            window.get("until") or window.get("window_end"),
        )
        if ranged is not None:
            return ranged
    direct = _optional_int(detail.get("days") or detail.get("window_days"))
    if direct is not None:
        return direct
    return _window_days_from_range(
        detail.get("since") or detail.get("window_start"),
        detail.get("until") or detail.get("window_end"),
    )
 def _window_days_from_range(start: Any, end: Any) -> int | None:
    start_ts = _parse_optional_timestamp(start)
    end_ts = _parse_optional_timestamp(end)
    if start_ts is None or end_ts is None or end_ts < start_ts:
        return None
    seconds = (end_ts - start_ts).total_seconds()
    if seconds <= 0:
        return None
    return max(1, round(seconds / 86400))
 def _parse_optional_timestamp(value: Any) -> datetime | None:
    if not isinstance(value, str) or not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
 def _normalise_coding_retro_suggestions(value: Any) -> list[dict[str, Any]]:
    if not isinstance(value, list):
        return []
@@ -374,6 +459,13 @@ def _bounded_int(value: Any, *, default: int, minimum: int, maximum: int) -> int
    return max(minimum, min(maximum, number))
 def _optional_int(value: Any) -> int | None:
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
 def _clean_scalar(value: Any) -> str:
    return " ".join(str(value or "").split())
--- a/src/activity_core/issue_sink.py
+++ b/src/activity_core/issue_sink.py
@@ -20,7 +20,8 @@ from activity_core.rules.models import TaskRef, TaskSpec
 logger = logging.getLogger(__name__)
-ISSUE_CORE_URL = os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8010")
+ISSUE_CORE_URL = os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8765")
 ISSUE_CORE_API_KEY_ENV = "ISSUE_CORE_API_KEY"
 ISSUE_SINK_TYPE = os.environ.get("ISSUE_SINK_TYPE", "rest")
@@ -30,10 +31,30 @@ class IssueSink(ABC):
 class IssueCoreRestSink(IssueSink):
-    """POSTs to issue-core REST API. Config: ISSUE_CORE_URL env var."""
+    """POSTs to issue-core REST API.
+    Config: ISSUE_CORE_URL and ISSUE_CORE_API_KEY env vars (shared key with
+    the issue-core server).
+    """
-    def __init__(self, base_url: str = ISSUE_CORE_URL) -> None:
+    def __init__(
+        self,
+        base_url: str = ISSUE_CORE_URL,
+        api_key: str | None = None,
+    ) -> None:
        self._base_url = base_url.rstrip("/")
+        if api_key is not None:
+            self._api_key = api_key.strip()
+        else:
+            self._api_key = os.environ.get(ISSUE_CORE_API_KEY_ENV, "").strip()
+    def _auth_headers(self) -> dict[str, str]:
+        if not self._api_key:
+            raise RuntimeError(
+                f"{ISSUE_CORE_API_KEY_ENV} is not set. "
+                "Required when ISSUE_SINK_TYPE=rest."
+            )
+        return {"Authorization": f"Bearer {self._api_key}"}
    def emit(self, task_spec: TaskSpec) -> TaskRef:
        payload = {
@@ -45,10 +66,19 @@ class IssueCoreRestSink(IssueSink):
            "due_in_days": task_spec.due_in_days,
            "source_type": task_spec.source_type,
            "source_id": task_spec.source_id,
-            "triggering_event_id": task_spec.triggering_event_id,
+            "triggering_event_id": (
+                str(task_spec.triggering_event_id)
+                if task_spec.triggering_event_id is not None
+                else None
+            ),
            "activity_definition_id": task_spec.activity_definition_id,
        }
-        resp = httpx.post(f"{self._base_url}/issues/", json=payload, timeout=10.0)
+        resp = httpx.post(
+            f"{self._base_url}/issues/",
+            json=payload,
+            headers=self._auth_headers(),
+            timeout=10.0,
+        )
        resp.raise_for_status()
        data = resp.json()
        return TaskRef(
--- a/src/activity_core/models.py
+++ b/src/activity_core/models.py
@@ -49,7 +49,18 @@ class CronTriggerConfig(BaseModel):
    )
    timezone: str = Field(default="UTC", description="IANA timezone name.")
    jitter_seconds: int = Field(default=0, ge=0)
-    misfire_policy: Literal["skip", "catchup", "compress"] = Field(default="skip")
+    # Run-miss recovery behaviour (ACTIVITY-WP-0014). What happens when a fire is
+    # missed because the worker / Temporal was unavailable at trigger time:
+    #   skip           - run on trigger or skip; a missed fire is never recovered
+    #   catchup_all    - recover every fire missed during the outage window
+    #   catchup_latest - recover only the most recent missed fire; do not accumulate
+    # Legacy aliases are accepted: catchup → catchup_all, compress → catchup_latest.
+    misfire_policy: Literal[
+        "skip", "catchup_all", "catchup_latest", "catchup", "compress"
+    ] = Field(default="skip")
+    # Override the per-policy default catchup window (how far back Temporal will
+    # recover missed fires after an outage). None uses the policy default.
+    catchup_window_seconds: int | None = Field(default=None, ge=0)
 class EventTriggerConfig(BaseModel):
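Reviewer note: the new `misfire_policy` comment says the legacy values `catchup` and `compress` map onto `catchup_all` and `catchup_latest`, but this hunk does not show where that mapping is applied. A minimal sketch of such a normalisation step follows, assuming it happens before the policy is handed to the scheduler. `normalise_misfire_policy` and both module-level tables are hypothetical names, not part of the diff.

```python
# Hypothetical helper: resolve legacy misfire_policy aliases to canonical names.
_LEGACY_ALIASES = {
    "catchup": "catchup_all",
    "compress": "catchup_latest",
}
_CANONICAL_POLICIES = {"skip", "catchup_all", "catchup_latest"}


def normalise_misfire_policy(policy: str) -> str:
    """Return the canonical policy name, accepting the two legacy aliases."""
    canonical = _LEGACY_ALIASES.get(policy, policy)
    if canonical not in _CANONICAL_POLICIES:
        raise ValueError(f"unknown misfire_policy: {policy!r}")
    return canonical


for value in ("skip", "catchup", "compress", "catchup_latest"):
    print(value, "->", normalise_misfire_policy(value))
```

Resolving aliases in one place keeps downstream code free of five-way branches on the `Literal` values. Whether the real code does this in a Pydantic validator or at schedule-build time is not visible here.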
--- a/src/activity_core/ops_evidence_sinks.py
+++ b/src/activity_core/ops_evidence_sinks.py
@@ -2,12 +2,15 @@
 from __future__ import annotations
 import json
 import os
 from pathlib import Path
 from typing import Any
 import httpx
 from activity_core.context_resolvers.ops_inventory import _sanitize_url
 from activity_core.state_hub_write import idempotency_headers
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _INTER_HUB_SINK_TYPES = {
@@ -15,6 +18,10 @@ _INTER_HUB_SINK_TYPES = {
    "inter-hub-event",
    "inter-hub-interaction-event",
 }
 _CORE_HUB_SINK_TYPES = {
    "core-hub",
    "core-hub-interaction-event",
 }
 def persist_ops_inventory_evidence(payload: dict[str, Any]) -> list[dict[str, Any]]:
@@ -55,6 +62,12 @@ def persist_ops_inventory_evidence(payload: dict[str, Any]) -> list[dict[str, An
                    results.append(
                        _post_state_hub_progress(payload, bind_key, probe_result, sink)
                    )
                elif sink_type in _CORE_HUB_SINK_TYPES:
                    results.append(
                        _post_core_hub_interaction_event(
                            payload, bind_key, probe_result, sink
                        )
                    )
                elif sink_type in _INTER_HUB_SINK_TYPES:
                    results.append(_inter_hub_result(sink))
                else:
@@ -121,6 +134,7 @@ def _post_state_hub_progress(
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
        headers=idempotency_headers(run_id, context_key, event_type),
        timeout=float(sink.get("timeout_seconds", 10.0)),
    )
    resp.raise_for_status()
@@ -136,12 +150,17 @@ def _post_state_hub_progress(
 def _progress_exists(base_url: str, event_type: str, idempotency_key: str) -> bool:
-    resp = httpx.get(
-        f"{base_url}/progress/",
-        params={"limit": 100},
-        timeout=10.0,
-    )
-    resp.raise_for_status()
+    # Best-effort optimisation only; the Idempotency-Key header on the write is the
+    # real dedup guarantee. Do not hard-fail if State Hub is unreachable here.
+    try:
+        resp = httpx.get(
+            f"{base_url}/progress/",
+            params={"limit": 100},
+            timeout=10.0,
+        )
+        resp.raise_for_status()
+    except httpx.HTTPError:
+        return False
    for item in resp.json():
        detail = item.get("detail") or {}
        if (
@@ -152,6 +171,213 @@ def _progress_exists(base_url: str, event_type: str, idempotency_key: str) -> bo
    return False
 def _post_core_hub_interaction_event(
    payload: dict[str, Any],
    context_key: str,
    probe_result: dict[str, Any],
    sink: dict[str, Any],
 ) -> dict[str, Any]:
    raw_base_url = (
        sink.get("core_hub_url")
        or sink.get("base_url")
        or os.environ.get("CORE_HUB_BASE_URL")
        or ""
    )
    base_url = str(raw_base_url).rstrip("/")
    runtime_token = _core_hub_runtime_token(sink)
    widget_id = _core_hub_widget_id(sink, probe_result)
    missing: list[str] = []
    if not base_url:
        missing.append("CORE_HUB_BASE_URL")
    if not runtime_token:
        missing.append("CORE_HUB_RUNTIME_TOKEN or CORE_HUB_RUNTIME_TOKEN_FILE")
    if not widget_id:
        missing.append("widget_id or CORE_HUB_WIDGET_ID")
    if missing:
        return {
            "type": sink.get("type"),
            "status": "skipped",
            "reason": "missing_core_hub_config",
            "missing": missing,
            "context_key": context_key,
        }
    endpoint = _selected_endpoint(probe_result, sink)
    event_type = sink.get("event_type", "ops-endpoint-verified")
    timeout = float(sink.get("timeout_seconds", 10.0))
    body = {
        "widgetId": widget_id,
        "eventType": event_type,
        "viewContext": _core_hub_view_context(payload, context_key, endpoint, sink),
        "metadata": _core_hub_metadata(payload, context_key, probe_result, endpoint),
    }
    resp = httpx.post(
        f"{base_url}/api/v2/interaction-events",
        json=body,
        headers=_core_hub_headers(runtime_token),
        timeout=timeout,
    )
    resp.raise_for_status()
    data = resp.json()
    event_id = data.get("id")
    if not event_id:
        raise RuntimeError("Core Hub interaction event response did not include an id")
    if not _core_hub_event_exists(base_url, runtime_token, str(event_id), timeout):
        raise RuntimeError("Core Hub interaction event was not visible after create")
    return {
        "type": sink.get("type"),
        "status": "posted",
        "event_type": data.get("eventType", event_type),
        "event_id": event_id,
        "widget_id": data.get("widgetId", widget_id),
        "verified": True,
        "context_key": context_key,
    }
 def _core_hub_headers(runtime_token: str) -> dict[str, str]:
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {runtime_token}",
        "Content-Type": "application/json",
        "User-Agent": "activity-core-ops-evidence/0.1",
    }
 def _core_hub_runtime_token(sink: dict[str, Any]) -> str:
    token_file = (
        sink.get("runtime_token_file")
        or sink.get("token_file")
        or os.environ.get("CORE_HUB_RUNTIME_TOKEN_FILE")
    )
    if token_file:
        return Path(str(token_file)).read_text(encoding="utf-8").strip()
    env_name = (
        sink.get("runtime_token_env")
        or os.environ.get("CORE_HUB_RUNTIME_TOKEN_ENV")
        or "CORE_HUB_RUNTIME_TOKEN"
    )
    return os.environ.get(str(env_name), "").strip()
 def _core_hub_widget_id(sink: dict[str, Any], probe_result: dict[str, Any]) -> str:
    direct = sink.get("widget_id") or os.environ.get("CORE_HUB_WIDGET_ID")
    if direct:
        return str(direct)
    endpoint = _selected_endpoint(probe_result, sink)
    widget_ref = endpoint.get("widget_ref") if endpoint else None
    if not widget_ref:
        return ""
    mapping = sink.get("widget_mapping") or sink.get("capability_mapping")
    if mapping is None:
        mapping = os.environ.get("CORE_HUB_WIDGET_MAPPING")
    parsed = _parse_widget_mapping(mapping)
    return parsed.get(str(widget_ref), "")
 def _parse_widget_mapping(raw: Any) -> dict[str, str]:
    if isinstance(raw, dict):
        return {str(key): str(value) for key, value in raw.items() if value}
    if not isinstance(raw, str) or not raw.strip():
        return {}
    value = raw.strip()
    if value.startswith("{"):
        try:
            loaded = json.loads(value)
        except json.JSONDecodeError:
            return {}
        if isinstance(loaded, dict):
            return {str(key): str(item) for key, item in loaded.items() if item}
        return {}
    if "=" not in value:
        return {}
    pairs: dict[str, str] = {}
    for part in value.split(","):
        key, _, item = part.partition("=")
        if key.strip() and item.strip():
            pairs[key.strip()] = item.strip()
    return pairs
 def _selected_endpoint(probe_result: dict[str, Any], sink: dict[str, Any]) -> dict[str, Any]:
    endpoints = [
        endpoint
        for endpoint in probe_result.get("endpoints", [])
        if isinstance(endpoint, dict)
    ]
    endpoint_id = sink.get("endpoint_id")
    if endpoint_id:
        match = next(
            (endpoint for endpoint in endpoints if endpoint.get("endpoint_id") == endpoint_id),
            None,
        )
        if match:
            return match
    return next(
        (endpoint for endpoint in endpoints if endpoint.get("widget_ref")),
        endpoints[0] if endpoints else {},
    )
 def _core_hub_view_context(
    payload: dict[str, Any],
    context_key: str,
    endpoint: dict[str, Any],
    sink: dict[str, Any],
 ) -> str:
    return str(
        sink.get("view_context")
        or endpoint.get("view_context")
        or f"activity-core/ops-inventory/{payload.get('run_id', 'unknown')}/{context_key}"
    )
 def _core_hub_metadata(
    payload: dict[str, Any],
    context_key: str,
    probe_result: dict[str, Any],
    endpoint: dict[str, Any],
 ) -> dict[str, Any]:
    compact = _compact_probe_result(probe_result)
    return {
        "activity_id": payload.get("activity_id"),
        "activity_core_run_id": payload.get("run_id"),
        "scheduled_for": payload.get("scheduled_for"),
        "source_type": "ops-inventory",
        "context_key": context_key,
        "probe": {
            "generated_at": compact.get("generated_at"),
            "inventory_path": compact.get("inventory_path"),
            "status": compact.get("status"),
            "reason": compact.get("reason"),
            "summary": compact.get("summary", {}),
        },
        "endpoint": _compact_endpoint(endpoint) if endpoint else {},
    }
 def _core_hub_event_exists(
    base_url: str,
    runtime_token: str,
    event_id: str,
    timeout: float,
 ) -> bool:
    resp = httpx.get(
        f"{base_url}/api/v2/interaction-events",
        headers=_core_hub_headers(runtime_token),
        timeout=timeout,
    )
    resp.raise_for_status()
    payload = resp.json()
    data = payload.get("data") if isinstance(payload, dict) else []
    if not isinstance(data, list):
        return False
    return any(isinstance(item, dict) and item.get("id") == event_id for item in data)
 def _inter_hub_result(sink: dict[str, Any]) -> dict[str, Any]:
    missing: list[str] = []
    if not (sink.get("inter_hub_url") or os.environ.get("INTER_HUB_URL")):
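Reviewer note: `_parse_widget_mapping` in the hunk above accepts three input shapes, which is easy to miss when configuring `CORE_HUB_WIDGET_MAPPING`. The sketch below restates the same parsing rules as a standalone function so the accepted forms can be tried directly. The function name without the underscore and the sample keys (`ops.health`, `w-1`, and so on) are hypothetical.

```python
import json
from typing import Any


def parse_widget_mapping(raw: Any) -> dict[str, str]:
    """Accept a dict, a JSON object string, or 'ref=id,ref=id' pairs."""
    if isinstance(raw, dict):
        return {str(key): str(value) for key, value in raw.items() if value}
    if not isinstance(raw, str) or not raw.strip():
        return {}
    value = raw.strip()
    if value.startswith("{"):
        try:
            loaded = json.loads(value)
        except json.JSONDecodeError:
            return {}
        if isinstance(loaded, dict):
            return {str(key): str(item) for key, item in loaded.items() if item}
        return {}
    if "=" not in value:
        return {}
    pairs: dict[str, str] = {}
    for part in value.split(","):
        key, _, item = part.partition("=")
        # Pairs with an empty side are skipped rather than raising.
        if key.strip() and item.strip():
            pairs[key.strip()] = item.strip()
    return pairs


from_json = parse_widget_mapping('{"ops.health": "w-1"}')
from_pairs = parse_widget_mapping("ops.health=w-1, ops.latency=w-2, broken=")
from_dict = parse_widget_mapping({"ops.health": "w-1", "unused": None})
print(from_json, from_pairs, from_dict)
```

Malformed input always degrades to an empty mapping, which in `_post_core_hub_interaction_event` surfaces as a `skipped` result with `missing_core_hub_config` instead of an exception.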
--- a/src/activity_core/report_sinks.py
+++ b/src/activity_core/report_sinks.py
@@ -11,6 +11,8 @@ from zoneinfo import ZoneInfo
 import httpx
 from activity_core.state_hub_write import idempotency_headers
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _THE_CUSTODIAN_ROOT = Path("/home/worsch/the-custodian")
 _FORBIDDEN_CUSTODIAN_ROOTS = (
@@ -149,6 +151,7 @@ def _post_state_hub_progress(
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
        headers=idempotency_headers(run_id, instruction_id, event_type),
        timeout=float(sink.get("timeout_seconds", 10.0)),
    )
    resp.raise_for_status()
@@ -167,12 +170,18 @@ def _progress_exists(
    instruction_id: str,
    event_type: str,
 ) -> bool:
-    resp = httpx.get(
-        f"{base_url}/progress/",
-        params={"limit": 100},
-        timeout=10.0,
-    )
-    resp.raise_for_status()
+    # Best-effort read-dedup optimisation only. The Idempotency-Key header on the
+    # write is the real guarantee; if State Hub is unreachable here we must not
+    # hard-fail — proceed to the (keyed) write rather than raising.
+    try:
+        resp = httpx.get(
+            f"{base_url}/progress/",
+            params={"limit": 100},
+            timeout=10.0,
+        )
+        resp.raise_for_status()
+    except httpx.HTTPError:
+        return False
    for item in resp.json():
        detail = item.get("detail") or {}
        if (
--- a/src/activity_core/rules/executor.py
+++ b/src/activity_core/rules/executor.py
@@ -160,15 +160,20 @@ def _execute(
    prompt_hash = hashlib.sha256(rendered.encode()).hexdigest()
    llm_config = _llm_run_config(instr)
+    # Reference allow-list (WP-0016-T04): if a context resolver supplied the set
+    # of known candidate ids, recommendations pointing at anything else are
+    # quarantined. Absent (None) today → the check is inert until wired.
+    allow_list = _allow_list_from_context(context)
    # Step 3 — call LLM
    raw_output = llm_client.complete(rendered, model=instr.model, config=llm_config)
    # Step 4 — validate and optionally retry
-    task_specs, report, error = _validate_output(raw_output, instr)
+    task_specs, report, error = _validate_output(raw_output, instr, allow_list)
    if error:
        retry_prompt = rendered + f"\n\nPrevious output was invalid: {error}\nPlease fix."
        raw_output = llm_client.complete(retry_prompt, model=instr.model, config=llm_config)
-        task_specs, report, error = _validate_output(raw_output, instr)
+        task_specs, report, error = _validate_output(raw_output, instr, allow_list)
        if error:
            # Truncate to keep log volume bounded but long enough to see the
            # actual JSON shape mismatch (typical reports are <2KB).
@@ -178,6 +183,14 @@ def _execute(
                "error=%s, raw_output_preview=%r",
                instr.id, prompt_hash, error, preview,
            )
            # Posture B (WP-0016-T03): try to recover a partial-but-usable
            # report from individually-parseable items before declaring total
            # loss. One bad item should cost one item, not the whole report.
            recovered = _resilient_report(
                instr, raw_output, error, prompt_hash, allow_list,
            )
            if recovered is not None:
                return recovered
            failure_report = _invalid_output_report(instr, error, raw_output)
            if failure_report is not None:
                return InstructionResult(
@@ -279,6 +292,320 @@ def _invalid_output_report(
    return report
 # ---------------------------------------------------------------------------
 # Resilient report recovery (ACTIVITY-WP-0016-T03)
 #
 # Posture B — verify & mitigate at the producer→consumer boundary. When the
 # whole-document parse/validate fails, recover individually-parseable
 # recommendation objects, validate each against the item schema, keep the valid
 # ones, and quarantine the malformed/over-limit ones with provenance. One bad
 # item costs one item, not the whole report (error locality == unit of work).
 # ---------------------------------------------------------------------------
 _QUARANTINE_LIMIT = 20
 _SNIPPET_LIMIT = 200
 # Producer guardrails (ACTIVITY-WP-0016-T04): structural bounds applied to every
 # recommendation regardless of producer (LLM, agent, or human). These are
 # verify-and-mitigate limits — an offending item is quarantined, never allowed to
 # fail the whole report or flow unbounded into a downstream consumer.
 _MAX_STRING_LEN = 4000
 _MAX_DEPTH = 8
 _SUMMARY_RE = re.compile(r'"summary"\s*:\s*"((?:[^"\\]|\\.)*)"')
 def _snippet(value: Any) -> str:
    text = value if isinstance(value, str) else json.dumps(value, default=str)
    return text[:_SNIPPET_LIMIT]
 def _json_depth(value: Any, depth: int = 1) -> int:
    if depth > _MAX_DEPTH:
        return depth
    if isinstance(value, dict):
        return max((_json_depth(v, depth + 1) for v in value.values()), default=depth)
    if isinstance(value, list):
        return max((_json_depth(v, depth + 1) for v in value), default=depth)
    return depth
 def _has_oversized_string(value: Any) -> bool:
    if isinstance(value, str):
        return len(value) > _MAX_STRING_LEN
    if isinstance(value, dict):
        return any(_has_oversized_string(v) for v in value.values())
    if isinstance(value, list):
        return any(_has_oversized_string(v) for v in value)
    return False
 def _item_structure_error(item: Any) -> str | None:
    """Producer-agnostic structural guardrail: depth and string-length caps."""
    if _json_depth(item) > _MAX_DEPTH:
        return f"exceeds max nesting depth {_MAX_DEPTH}"
    if _has_oversized_string(item):
        return f"contains a string longer than {_MAX_STRING_LEN} chars"
    return None
 def _allow_list_from_context(context: dict | None) -> set[str] | None:
    """Build the recommendation-candidate allow-list from resolved context.
    Looks for `context["known_candidates"]` (a list/set of valid candidate ids).
    Returns None when absent so the allow-list check stays inert until a context
    resolver populates it — the guardrail capability ships now; activation is a
    one-line resolver change.
    """
    if not isinstance(context, dict):
        return None
    known = context.get("known_candidates")
    if isinstance(known, (list, set, tuple)):
        return {str(item) for item in known}
    return None
 def _report_contract(instr: Any) -> tuple[dict[str, Any] | None, int | None]:
    """Extract (item_schema, max_items) for the recommendations list, if any."""
    try:
        schema = _load_output_schema(getattr(instr, "output_schema", ""))
    except (OSError, json.JSONDecodeError, TypeError):
        return None, None
    if not isinstance(schema, dict):
        return None, None
    recs = (schema.get("properties") or {}).get("recommendations")
    if not isinstance(recs, dict):
        return None, None
    item_schema = recs.get("items") if isinstance(recs.get("items"), dict) else None
    max_items = recs.get("maxItems") if isinstance(recs.get("maxItems"), int) else None
    return item_schema, max_items
 def _extract_object_spans(raw: str) -> list[tuple[str, bool]]:
    """Return (span, complete) for each recommendation object in raw output.
    Scans the `recommendations` array brace-aware and string-aware so it recovers
    objects whether they are pretty-printed across many lines or emitted one per
    line (NDJSON). A truncated trailing object is returned with complete=False.
    """
    key = raw.find('"recommendations"')
    start_region = raw.find("[", key) if key >= 0 else -1
    if start_region < 0:
        return []
    spans: list[tuple[str, bool]] = []
    i, n = start_region + 1, len(raw)
    while i < n:
        ch = raw[i]
        if ch == "]":
            break
        if ch != "{":
            i += 1
            continue
        depth, in_str, esc, j = 0, False, False, i
        closed = False
        while j < n:
            c = raw[j]
            if in_str:
                if esc:
                    esc = False
                elif c == "\\":
                    esc = True
                elif c == '"':
                    in_str = False
            elif c == '"':
                in_str = True
            elif c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:
                    spans.append((raw[i:j + 1], True))
                    closed = True
                    break
            j += 1
        if not closed:
            spans.append((raw[i:], False))  # truncated tail
            break
        i = j + 1
    return spans
 def _try_repair(span: str) -> str:
    """Best-effort close of a truncated JSON object: balance quote, braces, brackets."""
    in_str, esc, depth_c, depth_b = False, False, 0, 0
    for c in span:
        if in_str:
            if esc:
                esc = False
            elif c == "\\":
                esc = True
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
        elif c == "{":
            depth_c += 1
        elif c == "}":
            depth_c -= 1
        elif c == "[":
            depth_b += 1
        elif c == "]":
            depth_b -= 1
    repaired = span.rstrip().rstrip(",")
    if in_str:
        repaired += '"'
    return repaired + "]" * max(depth_b, 0) + "}" * max(depth_c, 0)
 def _recover_recommendations(
    raw: str,
 ) -> tuple[str | None, list[dict[str, Any]], list[dict[str, Any]]]:
    """Recover (summary, items, quarantined) from a failed report payload."""
    summary_match = _SUMMARY_RE.search(raw)
    summary = None
    if summary_match:
        try:
            summary = json.loads(f'"{summary_match.group(1)}"')
        except json.JSONDecodeError:
            summary = summary_match.group(1)
    items: list[dict[str, Any]] = []
    quarantined: list[dict[str, Any]] = []
    for index, (span, complete) in enumerate(_extract_object_spans(raw)):
        parsed: Any = None
        try:
            parsed = json.loads(span)
        except json.JSONDecodeError as exc:
            if not complete:
                try:
                    parsed = json.loads(_try_repair(span))
                except json.JSONDecodeError:
                    parsed = None
            if parsed is None:
                quarantined.append(
                    {"index": index, "error": str(exc), "raw": _snippet(span),
                     "reason": "truncated" if not complete else "unparseable"}
                )
                continue
        if isinstance(parsed, dict):
            items.append(parsed)
        else:
            quarantined.append(
                {"index": index, "error": "item is not a JSON object",
                 "raw": _snippet(span)}
            )
    return summary, items, quarantined
 def _partition_items(
    items: list[dict[str, Any]],
    item_schema: dict[str, Any] | None,
    max_items: int | None,
    *,
    run_schema: bool = True,
    allow_list: set[str] | None = None,
 ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Screen items into (valid, quarantined).
    Applied uniformly to recovered items (run_schema=True) and to already
    schema-valid happy-path items (run_schema=False). Order of checks: structural
    type → schema → producer guardrails (depth/length) → reference allow-list →
    count cap. The first failing check quarantines the item with provenance.
    """
    valid: list[dict[str, Any]] = []
    quarantined: list[dict[str, Any]] = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            quarantined.append(
                {"index": index, "error": "item is not a JSON object",
                 "raw": _snippet(item), "reason": "malformed"}
            )
            continue
        schema_error = (
            _validate_schema_node(item, item_schema, f"recommendations[{index}]")
            if (run_schema and item_schema)
            else None
        )
        if schema_error:
            quarantined.append(
                {"index": index, "error": schema_error, "raw": _snippet(item),
                 "reason": "schema"}
            )
            continue
        structure_error = _item_structure_error(item)
        if structure_error:
            quarantined.append(
                {"index": index, "error": structure_error, "raw": _snippet(item),
                 "reason": "guardrail"}
            )
            continue
        if allow_list is not None:
            candidate = item.get("candidate")
            if not isinstance(candidate, str) or candidate not in allow_list:
                quarantined.append(
                    {"index": index, "error": f"candidate {candidate!r} not in allow-list",
                     "raw": _snippet(item), "reason": "allow_list"}
                )
                continue
        valid.append(item)
    if max_items is not None and len(valid) > max_items:
        for item in valid[max_items:]:
            quarantined.append(
                {"index": None, "error": f"exceeds maxItems={max_items}",
                 "raw": _snippet(item), "reason": "over_limit"}
            )
        valid = valid[:max_items]
    return valid, quarantined
 def _resilient_report(
    instr: Any,
    raw_output: Any,
    original_error: str,
    prompt_hash: str | None,
    allow_list: set[str] | None = None,
 ) -> InstructionResult | None:
    """Recover a partial-but-usable report from output that failed validation.
    Returns None when nothing usable can be recovered, so the caller falls back
    to the total-loss diagnostic artifact (_invalid_output_report).
    """
    if not getattr(instr, "report_sinks", None) or not isinstance(raw_output, str):
        return None
    item_schema, max_items = _report_contract(instr)
    summary, items, quarantined = _recover_recommendations(raw_output)
    if not items:
        return None
    valid, item_quarantine = _partition_items(
        items, item_schema, max_items, allow_list=allow_list,
    )
    quarantined.extend(item_quarantine)
    if not valid:
        return None
    report: dict[str, Any] = {
        "summary": summary
        or f"Partial daily triage: recovered {len(valid)} recommendation(s) "
        "after the full report failed validation.",
        "recommendations": valid,
        "status": "partial",
        "partial": True,
        "quarantined_count": len(quarantined),
        "quarantined_items": quarantined[:_QUARANTINE_LIMIT],
        "recovery_note": f"original validation error: {original_error}",
    }
    logger.warning(
        "instruction_output_recovered: instruction=%r, kept=%d, quarantined=%d",
        getattr(instr, "id", None), len(valid), len(quarantined),
    )
    return InstructionResult(
        tasks=[],
        report=report,
        prompt_hash=prompt_hash,
        model=getattr(instr, "model", None),
        output_validated=True,
        review_required=True,
        condition_matched=getattr(instr, "condition", "") or None,
        validation_error=None,
    )
 def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
    """Build a durable diagnostic report when a report instruction cannot run."""
    if not getattr(instr, "report_sinks", None):
@@ -295,6 +622,7 @@ def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
 def _validate_output(
    raw_output: Any,
    instr: Any,
    allow_list: set[str] | None = None,
 ) -> tuple[list[TaskSpec], dict[str, Any] | None, str | None]:
    """Parse raw LLM output into TaskSpecs and optional report payload.
@@ -349,6 +677,28 @@ def _validate_output(
                source_type="instruction",
                source_id=instr.id,
            ))
        # Happy-path producer guardrails (WP-0016-T04): the whole document already
        # passed schema validation, so recommendations are schema-valid; still apply
        # the count cap, structural caps, and reference allow-list, quarantining any
        # offenders rather than emitting them. Report shape only changes when an item
        # is actually quarantined.
        if isinstance(report, dict) and isinstance(report.get("recommendations"), list):
            item_schema, max_items = _report_contract(instr)
            kept, quarantined = _partition_items(
                report["recommendations"], item_schema, max_items,
                run_schema=False, allow_list=allow_list,
            )
            if quarantined:
                report = {
                    **report,
                    "recommendations": kept,
                    "status": "partial",
                    "partial": True,
                    "quarantined_count": len(quarantined),
                    "quarantined_items": quarantined[:_QUARANTINE_LIMIT],
                }
        return specs, report, None
    except (json.JSONDecodeError, AttributeError, KeyError, TypeError) as exc:
        return [], None, str(exc)
--- a/src/activity_core/schedule_health.py
+++ b/src/activity_core/schedule_health.py
@@ -0,0 +1,194 @@
 """Missed-fire detection for cron schedules (ACTIVITY-WP-0014, T03).
 Even with a catchup window configured, an operator wants to *know* when a fire
 was missed — especially under ``misfire_policy: skip`` where missed fires are
 dropped by design and leave no run and no failure event. This module turns the
 schedule's own bookkeeping into an explicit verdict and an optional State Hub
 alert so a miss is never invisible again.
 Temporal already counts fires that were dropped because they fell outside the
 catchup window in ``ScheduleInfo.num_actions_missed_catchup_window``. We surface
 that, plus a staleness check on the most recent fire, as a ``ScheduleHealth``
 verdict. The verdict logic is a pure function so it is testable without a live
 Temporal server; ``check_schedule_health`` is the thin async reader.
 """
 from __future__ import annotations
 import os
 from dataclasses import dataclass, field
 from datetime import datetime, timedelta, timezone
 from typing import Any
 from uuid import UUID
 import httpx
 from activity_core.schedule_manager import schedule_id
 from activity_core.state_hub_write import idempotency_headers
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
@dataclass(frozen=True)
 class ScheduleHealth:
    """Verdict for a single schedule's recent firing behaviour."""
    activity_id: str
    healthy: bool
    missed_catchup_window: int
    last_fired_at: datetime | None
    staleness: timedelta | None
    reasons: list[str] = field(default_factory=list)
    @property
    def missed(self) -> bool:
        return not self.healthy
 def evaluate_schedule_health(
    *,
    activity_id: str,
    missed_catchup_window: int,
    last_fired_at: datetime | None,
    now: datetime,
    expected_interval: timedelta | None = None,
    tolerance: timedelta = timedelta(minutes=10),
 ) -> ScheduleHealth:
    """Pure verdict: was a fire missed?
    A schedule is unhealthy if Temporal dropped any fire past the catchup window,
    or — when ``expected_interval`` is known — if the most recent fire is older
    than one interval plus ``tolerance`` (i.e. a fire should have happened and
    did not).
    """
    reasons: list[str] = []
    if missed_catchup_window > 0:
        reasons.append(
            f"{missed_catchup_window} fire(s) dropped outside the catchup window"
        )
    staleness: timedelta | None = None
    if last_fired_at is not None:
        staleness = now - last_fired_at
        if expected_interval is not None and staleness > expected_interval + tolerance:
            reasons.append(
                f"last fire was {staleness} ago, exceeding the expected "
                f"{expected_interval} interval"
            )
    elif expected_interval is not None:
        reasons.append("no recorded fire for a schedule that should have fired")
    return ScheduleHealth(
        activity_id=activity_id,
        healthy=not reasons,
        missed_catchup_window=missed_catchup_window,
        last_fired_at=last_fired_at,
        staleness=staleness,
        reasons=reasons,
    )
 def _extract_info(desc: Any) -> tuple[int, datetime | None]:
    """Pull (missed_catchup_window, last_fired_at) from a ScheduleDescription.
    Accesses are defensive so a Temporal SDK field rename degrades to "unknown"
    rather than raising inside an operational health check.
    """
    info = getattr(desc, "info", None)
    missed = int(getattr(info, "num_actions_missed_catchup_window", 0) or 0)
    last_fired: datetime | None = None
    recent = getattr(info, "recent_actions", None) or []
    times = [
        getattr(a, "scheduled_at", None) or getattr(a, "started_at", None)
        for a in recent
    ]
    times = [t for t in times if t is not None]
    if times:
        last_fired = max(times)
    return missed, last_fired
 async def check_schedule_health(
    client: Any,
    activity_id: str | UUID,
    *,
    now: datetime | None = None,
    expected_interval: timedelta | None = None,
    tolerance: timedelta = timedelta(minutes=10),
 ) -> ScheduleHealth:
    """Describe the schedule for ``activity_id`` and evaluate its health."""
    now = now or datetime.now(tz=timezone.utc)
    handle = client.get_schedule_handle(schedule_id(activity_id))
    desc = await handle.describe()
    missed, last_fired = _extract_info(desc)
    return evaluate_schedule_health(
        activity_id=str(activity_id),
        missed_catchup_window=missed,
        last_fired_at=last_fired,
        now=now,
        expected_interval=expected_interval,
        tolerance=tolerance,
    )
 def post_missed_fire_alert(
    health: ScheduleHealth,
    *,
    state_hub_url: str | None = None,
    author: str = "activity-core",
    topic_id: str | None = None,
    workstream_id: str | None = None,
    timeout_seconds: float = 10.0,
 ) -> dict[str, Any]:
    """Post a ``schedule_miss`` progress event to State Hub for an unhealthy schedule.
    No-op (returns ``status: ok``) when the schedule is healthy, so callers can
    invoke unconditionally.
    """
    if health.healthy:
        return {"type": "schedule-miss-alert", "status": "ok"}
    base_url = state_hub_url or os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL)
    base_url = str(base_url).rstrip("/")
    body: dict[str, Any] = {
        "event_type": "schedule_miss",
        "author": author,
        "summary": (
            f"Schedule {health.activity_id} missed a fire: "
            + "; ".join(health.reasons)
        ),
        "detail": {
            "activity_id": health.activity_id,
            "missed_catchup_window": health.missed_catchup_window,
            "last_fired_at": (
                health.last_fired_at.isoformat() if health.last_fired_at else None
            ),
            "staleness_seconds": (
                health.staleness.total_seconds() if health.staleness else None
            ),
            "reasons": health.reasons,
        },
    }
    if topic_id:
        body["topic_id"] = topic_id
    if workstream_id:
        body["workstream_id"] = workstream_id
    # Dedup repeated alerts for the same missed window (same schedule + last fire).
    last_fired = health.last_fired_at.isoformat() if health.last_fired_at else "none"
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
        headers=idempotency_headers("schedule_miss", health.activity_id, last_fired),
        timeout=timeout_seconds,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "type": "schedule-miss-alert",
        "status": "posted",
        "progress_id": data.get("id"),
    }
--- a/src/activity_core/schedule_manager.py
+++ b/src/activity_core/schedule_manager.py
@@ -17,7 +17,6 @@ from temporalio.client import (
    Schedule,
    ScheduleActionStartWorkflow,
    ScheduleAlreadyRunningError,
-    ScheduleBackfill,
    ScheduleCalendarSpec,
    ScheduleHandle,
    ScheduleOverlapPolicy,
@@ -38,13 +37,49 @@ _ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
 # RunActivityWorkflow detects this value and derives run dedup key from workflow_id.
 SCHEDULED_TRIGGER_KEY = "scheduled"
-# T24: misfire_policy → ScheduleOverlapPolicy
-_MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
-    "skip": ScheduleOverlapPolicy.SKIP,
-    "catchup": ScheduleOverlapPolicy.BUFFER_ALL,
-    "compress": ScheduleOverlapPolicy.BUFFER_ONE,
+# ACTIVITY-WP-0014: misfire_policy → run-miss recovery behaviour.
+#
+# A "missed fire" happens when the worker / Temporal is unavailable at trigger
+# time. Two Temporal levers together define the behaviour:
+#   - catchup_window: how far back the server will recover missed fires once it
+#     is healthy again. The previous code never set this, so a brief outage at
+#     trigger time silently dropped the fire with no recovery and no signal.
+#   - overlap: what to do when a (recovered) fire would start while a prior run
+#     is still executing.
+#
+# Legacy values (catchup, compress) are aliased onto the explicit names.
 _MISFIRE_ALIASES: dict[str, str] = {
    "catchup": "catchup_all",
    "compress": "catchup_latest",
 }
 # overlap policy + default catchup window (seconds) per normalised policy.
 _SKIP_WINDOW_SECONDS = 60
 _CATCHUP_ALL_WINDOW_SECONDS = 365 * 24 * 3600
 _CATCHUP_LATEST_WINDOW_SECONDS = 24 * 3600
 _MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
    # Run on trigger or skip — recover nothing past a tiny grace window.
    "skip": ScheduleOverlapPolicy.SKIP,
    # Run on trigger or recover every missed fire during the outage window.
    "catchup_all": ScheduleOverlapPolicy.BUFFER_ALL,
    # Run on trigger or recover the most recent missed fire only; BUFFER_ONE
    # buffers at most one start and drops the rest, so a backlog never accumulates.
    "catchup_latest": ScheduleOverlapPolicy.BUFFER_ONE,
 }
 _MISFIRE_DEFAULT_WINDOW: dict[str, int] = {
    "skip": _SKIP_WINDOW_SECONDS,
    "catchup_all": _CATCHUP_ALL_WINDOW_SECONDS,
    "catchup_latest": _CATCHUP_LATEST_WINDOW_SECONDS,
 }
 def _normalize_misfire_policy(misfire_policy: str) -> str:
    """Map legacy aliases onto the explicit run-miss policy names."""
    canonical = _MISFIRE_ALIASES.get(misfire_policy, misfire_policy)
    return canonical if canonical in _MISFIRE_TO_OVERLAP else "skip"
 def schedule_id(activity_id: str | UUID) -> str:
    """Return the canonical Temporal Schedule ID for an ActivityDefinition."""
@@ -57,7 +92,15 @@ def smoke_schedule_id(activity_id: str | UUID) -> str:
 def _overlap_policy(misfire_policy: str) -> ScheduleOverlapPolicy:
-    return _MISFIRE_TO_OVERLAP.get(misfire_policy, ScheduleOverlapPolicy.SKIP)
+    return _MISFIRE_TO_OVERLAP[_normalize_misfire_policy(misfire_policy)]
+def _catchup_window(cfg: CronTriggerConfig) -> timedelta:
+    """Resolve the catchup window: explicit override, else the policy default."""
+    if cfg.catchup_window_seconds is not None:
+        return timedelta(seconds=cfg.catchup_window_seconds)
+    policy = _normalize_misfire_policy(cfg.misfire_policy)
+    return timedelta(seconds=_MISFIRE_DEFAULT_WINDOW[policy])
 def _build_schedule(defn: ActivityDefinition) -> Schedule:
@@ -80,7 +123,10 @@ def _build_schedule(defn: ActivityDefinition) -> Schedule:
        jitter=timedelta(seconds=cfg.jitter_seconds) if cfg.jitter_seconds else None,
    )
-    policy = SchedulePolicy(overlap=_overlap_policy(cfg.misfire_policy))
+    policy = SchedulePolicy(
+        overlap=_overlap_policy(cfg.misfire_policy),
+        catchup_window=_catchup_window(cfg),
+    )
    state = ScheduleState(paused=not defn.enabled)
    return Schedule(action=action, spec=spec, policy=policy, state=state)
@@ -282,18 +328,10 @@ async def upsert_schedule(client: Client, defn: ActivityDefinition) -> ScheduleH
        else:
            await handle.pause(note="disabled via upsert_schedule")
-    # T24 catchup: backfill any fires missed in the last hour.
-    if isinstance(defn.trigger_config, CronTriggerConfig):
-        if defn.trigger_config.misfire_policy == "catchup":
-            now = datetime.now(tz=timezone.utc)
-            backfill_start = now - timedelta(hours=1)
-            await handle.backfill(
-                ScheduleBackfill(
-                    start_at=backfill_start,
-                    end_at=now,
-                    overlap=ScheduleOverlapPolicy.BUFFER_ALL,
-                )
-            )
+    # ACTIVITY-WP-0014: missed-fire recovery is now handled natively by the
+    # schedule's catchup_window (see _build_schedule), which the server applies
+    # continuously after any outage — not only at upsert time. The previous
+    # ad-hoc 1-hour backfill is therefore no longer needed.
    return handle
--- a/src/activity_core/state_hub_write.py
+++ b/src/activity_core/state_hub_write.py
@@ -0,0 +1,34 @@
 """Idempotency-keyed State Hub writes (ACTIVITY-WP-0014 T05).
 Under the State Hub *beachhead* model, a write may be buffered locally while
 central State Hub is unreachable and **flushed later, possibly with retries**.
 To keep that flush safe — no duplicate progress / triage events — every write
 carries a stable ``Idempotency-Key`` header derived deterministically from the
 write's identity. The guarantee lives on the write itself and does **not** depend
 on a live dedup read, so it holds even when the beachhead is serving offline.
 activity-core does not implement the queue/cache (that is state-hub's beachhead);
 it only emits the key so the beachhead / State Hub can dedup on flush. The header
 passes untouched through the existing ``actcore-state-hub-bridge`` proxy and is
 ignored by State Hub versions that do not yet honour it.
 """
 from __future__ import annotations
 IDEMPOTENCY_HEADER = "Idempotency-Key"
 def idempotency_key(*parts: str | None) -> str:
    """Build a stable, header-safe idempotency key from identity parts.
    Empty/None parts are kept as empty segments so the key shape is stable across
    calls. Whitespace and control characters are collapsed to keep the value a
    valid single-line HTTP header.
    """
    raw = ":".join((p or "") for p in parts)
    return "".join(ch if 0x20 < ord(ch) < 0x7F else "_" for ch in raw) or "_"
 def idempotency_headers(*parts: str | None) -> dict[str, str]:
    """Return the header dict to attach to a State Hub write."""
    return {IDEMPOTENCY_HEADER: idempotency_key(*parts)}
--- a/src/activity_core/sync_schedules.py
+++ b/src/activity_core/sync_schedules.py
@@ -15,6 +15,8 @@ import asyncio
 import logging
 import os
 import uuid
 from dataclasses import dataclass
 from typing import Sequence
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
@@ -30,6 +32,20 @@ TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
 TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")
@dataclass
 class ScheduleSyncResult:
    upserted: int = 0
    paused: int = 0
    deleted_orphans: int = 0
    def to_dict(self) -> dict[str, int]:
        return {
            "upserted": self.upserted,
            "paused": self.paused,
            "deleted_orphans": self.deleted_orphans,
        }
 def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
    """Convert an ORM row to a domain ActivityDefinition for schedule_manager."""
    return ActivityDefinition.model_validate(
@@ -46,12 +62,82 @@ def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
    )
-async def sync(client: Client, db_url: str) -> None:
+def _valid_schedule_activity_id(defn: ActivityDefinition) -> str:
    if isinstance(defn.trigger_config, ScheduledTriggerConfig):
        return f"{defn.id}-once"
    return str(defn.id)
 async def _load_schedule_rows(
    session_factory: async_sessionmaker[AsyncSession],
 ) -> Sequence[ActivityDefinitionRow]:
    async with session_factory() as session:
        return (
            await session.scalars(
                select(ActivityDefinitionRow).where(
                    ActivityDefinitionRow.trigger_type.in_(["cron", "scheduled"])
                )
            )
        ).all()
 async def sync_schedule_rows(
    client: Client,
    rows: Sequence[ActivityDefinitionRow],
 ) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules against already-loaded definition rows."""
    valid_schedule_activity_ids: set[str] = set()
    result = ScheduleSyncResult()
    for row in rows:
        defn = _row_to_domain(row)
        if not isinstance(
            defn.trigger_config,
            (CronTriggerConfig, ScheduledTriggerConfig),
        ):
            continue
        valid_schedule_activity_ids.add(_valid_schedule_activity_id(defn))
        await upsert_schedule(client, defn)
        if defn.enabled:
            result.upserted += 1
            logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
        else:
            result.paused += 1
            logger.info("upserted paused schedule for disabled activity %s", defn.id)
    # Tombstone cleanup: remove Temporal Schedules with no matching DB row.
    existing_schedules = await list_schedules(client)
    for entry in existing_schedules:
        if entry["activity_id"] not in valid_schedule_activity_ids:
            await delete_schedule(client, entry["activity_id"])
            result.deleted_orphans += 1
            logger.info("deleted orphaned schedule %s", entry["schedule_id"])
    logger.info(
        "sync_schedules complete — upserted=%d paused=%d deleted_orphans=%d",
        result.upserted,
        result.paused,
        result.deleted_orphans,
    )
    return result
 async def sync_with_session_factory(
    client: Client,
    session_factory: async_sessionmaker[AsyncSession],
 ) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules using an existing DB session factory."""
    return await sync_schedule_rows(client, await _load_schedule_rows(session_factory))
 async def sync(client: Client, db_url: str) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules against the ActivityDefinition table.
    Steps:
-      1. Load all enabled cron ActivityDefinitions from Postgres.
+      1. Load all cron/scheduled ActivityDefinitions from Postgres.
-      2. Upsert a Temporal Schedule for each one.
+      2. Upsert a Temporal Schedule for each one, paused when disabled.
      3. Delete Temporal Schedules whose activity_id has no matching DB row
         (tombstone cleanup for deleted or trigger-type-changed definitions).
    """
@@ -59,55 +145,10 @@ async def sync(client: Client, db_url: str) -> None:
    session_factory = async_sessionmaker(engine, expire_on_commit=False)
    try:
-        async with session_factory() as session:
-            rows = (
-                await session.scalars(
-                    select(ActivityDefinitionRow).where(
-                        ActivityDefinitionRow.trigger_type.in_(["cron", "scheduled"])
-                    )
-                )
-            ).all()
+        return await sync_with_session_factory(client, session_factory)
    finally:
        await engine.dispose()
-    db_activity_ids: set[str] = set()
-    upserted = 0
-    skipped = 0
-    for row in rows:
-        defn = _row_to_domain(row)
-        if not isinstance(defn.trigger_config, (CronTriggerConfig, ScheduledTriggerConfig)):
-            continue
-        db_activity_ids.add(str(defn.id))
-        if defn.enabled:
-            await upsert_schedule(client, defn)
-            upserted += 1
-            logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
-        else:
-            # Disabled definitions: schedule may exist (paused) — leave it;
-            # upsert_schedule already handles the paused state.
-            await upsert_schedule(client, defn)
-            skipped += 1
-            logger.info("upserted paused schedule for disabled activity %s", defn.id)
-    # Tombstone cleanup: remove Temporal Schedules with no matching DB row.
-    existing_schedules = await list_schedules(client)
-    deleted = 0
-    for entry in existing_schedules:
-        if entry["activity_id"] not in db_activity_ids:
-            await delete_schedule(client, entry["activity_id"])
-            deleted += 1
-            logger.info("deleted orphaned schedule %s", entry["schedule_id"])
-    logger.info(
-        "sync_schedules complete — upserted=%d skipped_disabled=%d deleted_orphans=%d",
-        upserted,
-        skipped,
-        deleted,
-    )
 async def main() -> None:
    logging.basicConfig(level=logging.INFO)
@@ -116,7 +157,13 @@ async def main() -> None:
        raise RuntimeError("ACTCORE_DB_URL is required")
    client = await Client.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
-    await sync(client, db_url)
+    result = await sync(client, db_url)
+    print(
+        "Synced schedules: "
+        f"upserted={result.upserted} "
+        f"paused={result.paused} "
+        f"deleted_orphans={result.deleted_orphans}"
+    )
 if __name__ == "__main__":
--- a/src/activity_core/sync_service.py
+++ b/src/activity_core/sync_service.py
@@ -0,0 +1,97 @@
 """Shared ActivityDefinition/event type/schedule sync orchestration."""
 from __future__ import annotations
 from typing import Any
 from temporalio.client import Client
 from activity_core.event_type_registry import sync_event_types
 from activity_core.sync_activity_definitions import sync as sync_activity_definitions
 from activity_core.sync_schedules import ScheduleSyncResult, sync_with_session_factory
 _MAX_ERRORS = 20
 _MAX_ERROR_MESSAGE_LENGTH = 1000
 def _empty_result(
    *,
    definitions: bool,
    schedules: bool,
    event_types: bool,
 ) -> dict[str, Any]:
    return {
        "ok": True,
        "ran": {
            "definitions": definitions,
            "schedules": schedules,
            "event_types": event_types,
        },
        "definitions": {"synced": 0},
        "event_types": {"synced": 0},
        "schedules": ScheduleSyncResult().to_dict(),
        "errors": [],
    }
 def _record_error(result: dict[str, Any], stage: str, exc: Exception) -> None:
    errors = result["errors"]
    if len(errors) >= _MAX_ERRORS:
        return
    errors.append(
        {
            "stage": stage,
            "type": type(exc).__name__,
            "message": str(exc)[:_MAX_ERROR_MESSAGE_LENGTH],
        }
    )
    result["ok"] = False
 async def run_sync(
    *,
    session_factory: Any,
    temporal_client: Client | None,
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
 ) -> dict[str, Any]:
    """Run the requested sync stages and return bounded operator-facing status.
    The orchestration deliberately accepts its database and Temporal
    dependencies as arguments so startup and the API can share the same behavior
    without creating another global runtime.
    """
    result = _empty_result(
        definitions=definitions,
        schedules=schedules,
        event_types=event_types,
    )
    if definitions:
        try:
            result["definitions"]["synced"] = await sync_activity_definitions(
                session_factory
            )
        except Exception as exc:  # pragma: no cover - exercised through tests
            _record_error(result, "definitions", exc)
    if event_types:
        try:
            result["event_types"]["synced"] = await sync_event_types(session_factory)
        except Exception as exc:  # pragma: no cover - exercised through tests
            _record_error(result, "event_types", exc)
    if schedules:
        try:
            if temporal_client is None:
                raise RuntimeError("Temporal client is required for schedule sync")
            schedule_result = await sync_with_session_factory(
                temporal_client,
                session_factory,
            )
            result["schedules"] = schedule_result.to_dict()
        except Exception as exc:  # pragma: no cover - exercised through tests
            _record_error(result, "schedules", exc)
    return result
--- a/src/activity_core/worker.py
+++ b/src/activity_core/worker.py
@@ -46,8 +46,7 @@ from activity_core.activities import (
 )
 from activity_core.db import make_engine
 from sqlalchemy.ext.asyncio import async_sessionmaker
-from activity_core.sync_activity_definitions import sync as sync_activity_defs
-from activity_core.sync_schedules import sync as sync_schedules
+from activity_core.sync_service import run_sync
 from activity_core.workflows import RunActivityWorkflow, TaskExecutorWorkflow
 logger = logging.getLogger(__name__)
@@ -77,20 +76,26 @@ async def run() -> None:
        TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE, runtime=runtime
    )
-    # T45: Sync ActivityDefinition files into DB before schedule sync.
-    logger.info("Syncing ActivityDefinition files...")
+    logger.info("Syncing ActivityDefinitions and Temporal Schedules...")
+    sync_engine = make_engine(db_url)
+    session_factory = async_sessionmaker(sync_engine, expire_on_commit=False)
    try:
-        session_factory = async_sessionmaker(make_engine(db_url), expire_on_commit=False)
-        await sync_activity_defs(session_factory)
-    except Exception:
-        logger.exception("activity definition sync failed — continuing worker startup")
-
-    # T23: Sync Temporal Schedules with the DB before workers start accepting tasks.
-    logger.info("Syncing Temporal Schedules with ActivityDefinition DB...")
-    try:
-        await sync_schedules(client, db_url)
-    except Exception:
-        logger.exception("schedule sync failed — continuing worker startup")
+        sync_result = await run_sync(
+            session_factory=session_factory,
+            temporal_client=client,
+            definitions=True,
+            schedules=True,
+            event_types=False,
+        )
+        for error in sync_result["errors"]:
+            logger.error(
+                "startup sync %s failed — %s: %s",
+                error["stage"],
+                error["type"],
+                error["message"],
+            )
+    finally:
+        await sync_engine.dispose()
    orchestrator_worker = Worker(
        client,
--- a/src/activity_core/workflows.py
+++ b/src/activity_core/workflows.py
@@ -209,11 +209,12 @@ class RunActivityWorkflow:
@workflow.defn
 class TaskExecutorWorkflow:
-    """Child workflow that executes one concrete task instance.
-    Stub behaviour: persists a task_instances row with status=done and
-    returns immediately. Real task execution logic replaces this in a
-    later workstream.
+    """Compatibility stub for legacy task-instance workflows.
+    This is not a production execution surface for activity-core. It persists a
+    task_instances row with status=done and returns immediately so legacy/dev
+    flows keep their idempotency behavior. Real task execution belongs in
+    per-repo workers or a future execution-owned repo/workplan, not here.
    task_id is derived deterministically from the workflow's own ID so
    persist_task_instance retries remain idempotent.
@@ -221,7 +222,7 @@ class TaskExecutorWorkflow:
    @workflow.run
    async def run(self, run_id: str, task_type: str, params: dict) -> dict:
-        # Derive a stable task_id from this workflow's own ID.
+        # Keep the stub idempotent without implying task lifecycle ownership.
        task_id = str(
            uuid.uuid5(uuid.NAMESPACE_URL, workflow.info().workflow_id)
        )
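Reviewer note: the stub stays idempotent because `uuid.uuid5` is name-based, so a given workflow ID always maps to the same `task_id` and a retried persist call targets the same row. A minimal sketch of that property, assuming a hypothetical `derive_task_id` helper and made-up workflow IDs (neither appears in this diff):

```python
import uuid


def derive_task_id(workflow_id: str) -> str:
    """Hypothetical helper: map a workflow ID to a stable task_id."""
    # uuid5 hashes the namespace and name (SHA-1), so the same inputs
    # yield the same UUID across processes and retries.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, workflow_id))


first = derive_task_id("run-123/task-0")  # illustrative workflow ID
retry = derive_task_id("run-123/task-0")  # same ID on a retried attempt
other = derive_task_id("run-123/task-1")  # a different workflow ID
```

Here `first` and `retry` are equal while `other` differs, which is what lets a retried write be treated as an upsert rather than a duplicate.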
--- a/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
+++ b/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
@@ -0,0 +1,5 @@
 {
  "_note": "PARTIAL 4000-char preview of the 2026-06-26 daily-triage validation failure (retry attempt). Full payload not recoverable from activity-core: complete() drops finish_reason; report sink caps raw at 4000 chars; the JSON break is at char 5268 (beyond this preview). Full response would require llm-connect producer-side logs on railiance01.",
  "validation_error": "Expecting ',' delimiter: line 136 column 22 (char 5268)",
  "raw_output_preview": "{\n  \"summary\": \"Triage report focusing on high-priority workstreams with pending human intervention or critical dependencies, and addressing recently cleared dependencies to unblock progress.\",\n  \"recommendations\": [\n    {\n      \"rank\": 1,\n      \"candidate\": \"2731fece-6c49-45b8-ab8a-4ea6c04ac603\",\n      \"action\": \"work-next\",\n      \"why\": \"A critical dependency (T03 - Configure bounded OpenBao token roles and policies) for this workstream has been cleared, unblocking significant progress on credential management. This workstream has 8 todo tasks and no waits, indicating it's ready for immediate action.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 5.0,\n        \"strategic_value\": 5,\n        \"time_criticality\": 5,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 5,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 2,\n      \"candidate\": \"bd086c41-287d-4a4e-8ac5-9ab270f14d72\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (T04 - Provision the runtime API key outside Git) and is currently blocked by 3 'wait' tasks. Human intervention is required to unblock progress.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 3,\n      \"candidate\": \"9b56414a-c71f-4e72-9b2b-d2166aaf50d0\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (Task: Execute Live Ops-Hub Bootstrap) and is currently blocked by a 'wait' task. 
Human intervention is required to proceed with the bootstrap.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 4,\n      \"candidate\": \"84e17675-0d15-4268-a8bd-540124d37018\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has 4 'needs_human' tasks, including 'T02 \u2014 Resolve Forgejo production design decisions', indicating significant human input is required to move forward with the migration.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.0,\n        \"strategic_value\": 4,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 5,\n      \"candidate\": \"5646e13a-13af-4724-bca6-3c0d86f96733\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has a 'needs_human' task ('Three-Run Calibration Feedback') and is currently in a 'wait' state. Human feedback is crucial for operational hardening.\",\n      \"confidence\": \"medium\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 6,\n      \"candidate\": \"896ace77-21b3-450b-8fb7-254aefc8c570\",\n      \"action\": \"close-out\",\n      \"why\": \"The task 'Wire activity-core to the live service' has been resolved, and the workstream shows 2 progress tasks with 0 todo/wait tasks. 
This indicates the deployment is likely complete or nearing completion and ready for close-out after verification.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 7,\n      \"candidate\": \"656e435d-3a00-4f5e-a38e-114467f9062e\",\n      \"action\": \"work-next\",\n      \"why\": \"This high-priority workstream has a single 'wait' task ('Task: Activate Ops-Hub Widgets In Inter-Hub') and no 'needs_human' tasks. It appears ready for the next step to activate the widgets.\",\n      \"confidence\": \"medium\",\n      \"wsjf"
 }
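Reviewer note: the fixture above has a valid prefix followed by a tail cut off mid-object. One way to salvage such a payload is to decode array items one at a time and stop at the first item that fails to parse. The sketch below only illustrates that idea; `recover_array_prefix` is a hypothetical helper, not the recovery code in activity-core, and it assumes the key name does not also appear inside an earlier string value.

```python
import json


def recover_array_prefix(raw: str, key: str = "recommendations") -> tuple[list, str]:
    """Hypothetical helper: return (complete items, unparsed tail) for raw[key]."""
    decoder = json.JSONDecoder()
    marker = raw.find(f'"{key}"')
    if marker == -1:
        return [], raw
    pos = raw.find("[", marker)
    if pos == -1:
        return [], raw[marker:]
    pos += 1
    items: list = []
    while True:
        # raw_decode does not skip leading whitespace, so step over
        # whitespace and the commas separating array items ourselves.
        while pos < len(raw) and raw[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(raw) or raw[pos] == "]":
            return items, ""
        try:
            item, pos = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            # First item that cannot be parsed: keep the prefix, hand back the tail.
            return items, raw[pos:]
        items.append(item)


truncated = (
    '{"summary": "s", "recommendations": '
    '[{"rank": 1}, {"rank": 2}, {"rank": 3, "why": "cut o'
)
items, tail = recover_array_prefix(truncated)
```

The returned `tail` is the span a caller could quarantine with a diagnostic error, while `items` would still need schema validation before being trusted.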
--- a/tests/rules/test_actions.py
+++ b/tests/rules/test_actions.py
@@ -88,6 +88,43 @@ def test_for_each_binds_each_list_item_before_condition_and_action_rendering() -
    ]
 def test_for_each_can_gate_registry_hygiene_gaps_on_signal() -> None:
    rules = [
        {
            "id": "flag-registry-hygiene-gap",
            "for_each": "context.gaps",
            "bind_as": "g",
            "condition": 'context.g.hygiene_signal != ""',
            "action": {
                "task_template": "Close registry hygiene gap for {context.g.repo}",
                "target_repo": "context.g.repo",
                "priority": "medium",
                "labels": ["registry-hygiene", "{context.g.hygiene_signal}"],
            },
        }
    ]
    context = {
        "gaps": [
            {
                "repo": "reuse-surface",
                "hygiene_signal": "empty_capability_scaffold",
            },
            {
                "repo": "activity-core",
                "hygiene_signal": "",
            },
        ]
    }
    specs = expand_rule_actions(rules, _Event(), context)
    assert [spec["target_repo"] for spec in specs] == ["reuse-surface"]
    assert specs[0]["labels"] == [
        "registry-hygiene",
        "empty_capability_scaffold",
    ]
 def test_for_each_rejects_non_path_expression() -> None:
    rules = [
        {
--- a/tests/rules/test_executor.py
+++ b/tests/rules/test_executor.py
@@ -12,6 +12,7 @@ Covers:
 from __future__ import annotations
 import json
 from pathlib import Path
 from types import SimpleNamespace
 from typing import Any
@@ -333,7 +334,14 @@ def test_execute_instruction_forwards_output_schema_to_llm_connect(tmp_path, mon
 def test_execute_instruction_with_audit_accepts_report_payload():
    report_data = {
        "summary": "State Hub has loose ends.",
-        "recommendations": [{"action": "revisit", "candidate": "CUST-WP-0045"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "action": "revisit",
+                "candidate": "CUST-WP-0045",
+                "why": "Loose ends need attention.",
+            }
+        ],
    }
    llm = _CountingLLM([json.dumps(report_data)])
    instr = _instr(
@@ -353,7 +361,14 @@ def test_execute_instruction_with_audit_accepts_report_payload():
 def test_execute_instruction_with_audit_accepts_fenced_report_payload():
    report_data = {
        "summary": "State Hub has loose ends.",
-        "recommendations": [{"action": "revisit", "candidate": "CUST-WP-0045"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "action": "revisit",
+                "candidate": "CUST-WP-0045",
+                "why": "Loose ends need attention.",
+            }
+        ],
    }
    llm = _CountingLLM([f"```json\n{json.dumps(report_data)}\n```"])
    instr = _instr(
@@ -389,6 +404,175 @@ def test_execute_instruction_with_audit_rejects_invalid_report_schema():
    assert llm.call_count == 2
 # ── WP-0016-T03 resilient report recovery ─────────────────────────────────────
 def _valid_rec(rank: int) -> dict[str, Any]:
    return {
        "rank": rank,
        "candidate": f"WS-{rank}",
        "action": "work-next",
        "why": f"reason {rank}",
        "wsjf": {"score": 5.0},
    }
 def _pretty_triage_with_truncated_tail(num_valid: int) -> str:
    body = ",\n".join("    " + json.dumps(_valid_rec(i)) for i in range(1, num_valid + 1))
    # Trailing object is cut off mid-string — the whole document is invalid JSON,
    # reproducing the 2026-06-26 failure shape (valid prefix, broken tail).
    return (
        '{\n  "summary": "Daily triage.",\n  "recommendations": [\n'
        + body
        + ',\n    {\n      "rank": '
        + str(num_valid + 1)
        + ',\n      "candidate": "WS-X",\n      "action": "work-'
    )
 def test_resilient_report_recovers_valid_prefix_and_quarantines_truncated_tail():
    raw = _pretty_triage_with_truncated_tail(7)
    llm = _CountingLLM([raw, raw])
    instr = _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        output_schema="schemas/daily-triage-report.json",
        report_sinks=[{"type": "working-memory"}],
    )
    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
    assert result.output_validated is True
    assert result.review_required is True
    assert result.report is not None
    assert result.report["partial"] is True
    assert len(result.report["recommendations"]) == 7
    assert result.report["summary"] == "Daily triage."
    assert result.report["quarantined_count"] >= 1
    # The broken tail is dropped — either as an unparseable/truncated span or,
    # if _try_repair salvages its structure, as a schema-invalid item. Either way
    # it carries a diagnostic error and never pollutes the surviving report.
    assert result.report["quarantined_items"][0]["error"]
 def test_resilient_report_quarantines_one_bad_item_among_valid():
    recs = [_valid_rec(1), {"candidate": "WS-2", "action": "x", "why": "no rank"}, _valid_rec(3)]
    raw = json.dumps({"summary": "Triage.", "recommendations": recs})
    llm = _CountingLLM([raw, raw])
    instr = _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        output_schema="schemas/daily-triage-report.json",
        report_sinks=[{"type": "working-memory"}],
    )
    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
    assert result.output_validated is True
    assert result.report["partial"] is True
    assert len(result.report["recommendations"]) == 2
    assert result.report["quarantined_count"] == 1
    assert "rank" in result.report["quarantined_items"][0]["error"]
 # ── WP-0016-T04 producer guardrails ───────────────────────────────────────────
 def _triage_instr() -> SimpleNamespace:
    return _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        output_schema="schemas/daily-triage-report.json",
        report_sinks=[{"type": "working-memory"}],
    )
 def test_guardrail_count_cap_on_valid_happy_path():
    # 9 fully-valid recommendations in a syntactically valid document: schema
    # validation passes, but the maxItems=7 count cap must keep 7 and quarantine 2.
    recs = [_valid_rec(i) for i in range(1, 10)]
    raw = json.dumps({"summary": "Triage.", "recommendations": recs})
    llm = _CountingLLM([raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert llm.call_count == 1  # no retry — the document was valid
    assert result.report["partial"] is True
    assert len(result.report["recommendations"]) == 7
    assert result.report["quarantined_count"] == 2
    assert all(q["reason"] == "over_limit" for q in result.report["quarantined_items"])
 def test_guardrail_oversized_string_quarantined():
    big = _valid_rec(2)
    big["why"] = "x" * 5000  # exceeds _MAX_STRING_LEN
    raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), big]})
    llm = _CountingLLM([raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert len(result.report["recommendations"]) == 1
    assert result.report["quarantined_count"] == 1
    assert result.report["quarantined_items"][0]["reason"] == "guardrail"
 def test_guardrail_allow_list_rejects_unknown_candidate():
    raw = json.dumps({
        "summary": "Triage.",
        "recommendations": [_valid_rec(1), _valid_rec(2)],  # candidates WS-1, WS-2
    })
    llm = _CountingLLM([raw])
    context = {"known_candidates": ["WS-1"]}
    result = execute_instruction_with_audit(_triage_instr(), _Event(), context, llm)
    assert len(result.report["recommendations"]) == 1
    assert result.report["recommendations"][0]["candidate"] == "WS-1"
    assert result.report["quarantined_items"][0]["reason"] == "allow_list"
 def _nested(depth: int) -> dict[str, Any]:
    node: dict[str, Any] = {"leaf": 1}
    for _ in range(depth):
        node = {"a": node}
    return node
 def test_guardrail_over_depth_quarantined():
    deep = _valid_rec(2)
    deep["extra"] = _nested(12)  # well past _MAX_DEPTH
    raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), deep]})
    llm = _CountingLLM([raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert len(result.report["recommendations"]) == 1
    assert result.report["quarantined_count"] == 1
    assert result.report["quarantined_items"][0]["reason"] == "guardrail"
    assert "depth" in result.report["quarantined_items"][0]["error"]
 def test_resilient_recovery_against_real_2026_06_26_fixture():
    # The actual captured failure payload (4000-char preview, truncated at the 7th
    # recommendation) — the run that reset the WP-0006-T03 streak. Before WP-0016
    # this discarded the whole report; now it must recover the valid prefix.
    fixture = json.loads(
        Path("tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json")
        .read_text(encoding="utf-8")
    )
    raw = fixture["raw_output_preview"]
    llm = _CountingLLM([raw, raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert result.output_validated is True
    assert result.report["partial"] is True
    # Six recommendations are fully intact before the truncation point.
    assert len(result.report["recommendations"]) >= 6
    assert all("rank" in rec and "candidate" in rec for rec in result.report["recommendations"])
 def test_execute_instruction_with_audit_preserves_invalid_report_with_sinks(
    tmp_path,
    monkeypatch,
--- a/tests/test_admin_sync_api.py
+++ b/tests/test_admin_sync_api.py
@@ -0,0 +1,114 @@
 from __future__ import annotations
 from typing import Any
 import pytest
 from activity_core import api
@pytest.mark.asyncio
 async def test_admin_sync_definitions_only_does_not_require_temporal(
    monkeypatch,
 ) -> None:
    seen: dict[str, Any] = {}
    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
        seen.update(kwargs)
        return {"ok": True, "ran": {"definitions": True}}
    monkeypatch.setattr(api, "_session_factory", object())
    monkeypatch.setattr(api, "_temporal_client", None)
    monkeypatch.setattr(api, "run_sync", fake_run_sync)
    result = await api.admin_sync(
        definitions=True,
        schedules=False,
        event_types=False,
    )
    assert result == {"ok": True, "ran": {"definitions": True}}
    assert seen["session_factory"] is api._session_factory
    assert seen["temporal_client"] is None
    assert seen["definitions"] is True
    assert seen["schedules"] is False
    assert seen["event_types"] is False
@pytest.mark.asyncio
 async def test_admin_sync_schedules_only_passes_temporal(monkeypatch) -> None:
    temporal = object()
    seen: dict[str, Any] = {}
    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
        seen.update(kwargs)
        return {
            "ok": True,
            "schedules": {
                "upserted": 1,
                "paused": 0,
                "deleted_orphans": 0,
            },
        }
    monkeypatch.setattr(api, "_session_factory", object())
    monkeypatch.setattr(api, "_temporal_client", temporal)
    monkeypatch.setattr(api, "run_sync", fake_run_sync)
    result = await api.admin_sync(
        definitions=False,
        schedules=True,
        event_types=False,
    )
    assert result["schedules"]["upserted"] == 1
    assert seen["temporal_client"] is temporal
    assert seen["definitions"] is False
    assert seen["schedules"] is True
    assert seen["event_types"] is False
@pytest.mark.asyncio
 async def test_admin_sync_all_sync_returns_failure_result(monkeypatch) -> None:
    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
        return {
            "ok": False,
            "ran": {
                "definitions": kwargs["definitions"],
                "schedules": kwargs["schedules"],
                "event_types": kwargs["event_types"],
            },
            "errors": [
                {
                    "stage": "event_types",
                    "type": "RuntimeError",
                    "message": "bad event type",
                }
            ],
        }
    monkeypatch.setattr(api, "_session_factory", object())
    monkeypatch.setattr(api, "_temporal_client", object())
    monkeypatch.setattr(api, "run_sync", fake_run_sync)
    result = await api.admin_sync(
        definitions=True,
        schedules=True,
        event_types=True,
    )
    assert result == {
        "ok": False,
        "ran": {
            "definitions": True,
            "schedules": True,
            "event_types": True,
        },
        "errors": [
            {
                "stage": "event_types",
                "type": "RuntimeError",
                "message": "bad event type",
            }
        ],
    }
--- a/tests/test_instruction_evaluation.py
+++ b/tests/test_instruction_evaluation.py
@@ -1,6 +1,7 @@
 from __future__ import annotations
 import json
 from pathlib import Path
 import pytest
@@ -70,7 +71,14 @@ async def test_evaluate_instructions_returns_task_specs_with_audit(monkeypatch)
 async def test_evaluate_instructions_returns_report_payload(monkeypatch) -> None:
    llm = FakeLLMClient(json.dumps({
        "summary": "State Hub has open loose ends.",
-        "recommendations": [{"candidate": "CUST-WP-0045", "action": "work-next"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "candidate": "CUST-WP-0045",
+                "action": "work-next",
+                "why": "Open loose ends.",
+            }
+        ],
    }))
    monkeypatch.setattr(activities, "get_llm_client", lambda: llm)
@@ -209,6 +217,12 @@ async def test_evaluate_instructions_forwards_llm_connect_depth_config(monkeypat
        "context": {},
    })
    # Read the live schema file rather than hard-coding it, so the forwarded
    # json_schema assertion tracks schemas/daily-triage-report.json as the
    # contract evolves (ACTIVITY-WP-0016-T02).
    expected_schema = json.loads(
        Path("schemas/daily-triage-report.json").read_text(encoding="utf-8")
    )
    assert llm.calls[0][2] == {
        "model_name": "custodian-triage-balanced",
        "temperature": 0.2,
@@ -216,16 +230,6 @@ async def test_evaluate_instructions_forwards_llm_connect_depth_config(monkeypat
        "max_depth": 2,
        "model_params": {
            "reasoning_effort": "medium",
-            "json_schema": {
-                "type": "object",
-                "required": ["summary", "recommendations"],
-                "properties": {
-                    "summary": {"type": "string"},
-                    "recommendations": {
-                        "type": "array",
-                        "items": {"type": "object"},
-                    },
-                },
-            },
+            "json_schema": expected_schema,
        },
    }
--- a/tests/test_issue_sink.py
+++ b/tests/test_issue_sink.py
@@ -34,7 +34,7 @@ def test_issue_core_rest_sink_posts_task_contract(monkeypatch) -> None:
    monkeypatch.setattr(httpx, "post", fake_post)
-    ref = IssueCoreRestSink("http://issue-core.test/").emit(TaskSpec(
+    ref = IssueCoreRestSink("http://issue-core.test/", api_key="test-key").emit(TaskSpec(
        title="Run SBOM rescan for activity-core",
        description="SBOM is older than 30 days.",
        target_repo="activity-core",
@@ -67,9 +67,28 @@ def test_issue_core_rest_sink_posts_task_contract(monkeypatch) -> None:
                "triggering_event_id": "scheduled",
                "activity_definition_id": "activity-1",
            },
            "headers": {"Authorization": "Bearer test-key"},
            "timeout": 10.0,
        }
    ]
    assert "review_required" not in posts[0]["json"]
 def test_issue_core_rest_sink_requires_api_key() -> None:
    sink = IssueCoreRestSink("http://issue-core.test/", api_key="")
    with pytest.raises(RuntimeError, match="ISSUE_CORE_API_KEY"):
        sink.emit(TaskSpec(
            title="t",
            description="",
            target_repo="activity-core",
            priority="low",
            labels=[],
            due_in_days=None,
            source_type="rule",
            source_id="r",
            triggering_event_id="e",
            activity_definition_id="a",
        ))
@pytest.mark.asyncio
--- a/tests/test_kaizen_context_resolver.py
+++ b/tests/test_kaizen_context_resolver.py
@@ -0,0 +1,195 @@
 from __future__ import annotations
 from pathlib import Path
 from typing import Any
 import httpx
 import pytest
 import yaml
 from activity_core.context_resolvers.kaizen import (
    KaizenContextResolver,
    discover_kaizen_scheduled_repos,
 )
 class DummyResponse:
    def __init__(self, payload: Any, status_error: Exception | None = None) -> None:
        self.payload = payload
        self.status_error = status_error
    def raise_for_status(self) -> None:
        if self.status_error is not None:
            raise self.status_error
    def json(self) -> Any:
        return self.payload
 def _write_schedule(path: Path, agents: dict[str, Any]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        yaml.safe_dump(
            {"version": "1", "timezone": "Europe/Berlin", "agents": agents},
            sort_keys=False,
        ),
        encoding="utf-8",
    )
 def test_discover_scheduled_repos_emits_enabled_coach(tmp_path, monkeypatch) -> None:
    repo_root = tmp_path / "pilot-repo"
    repo_root.mkdir()
    _write_schedule(
        repo_root / ".kaizen" / "schedule.yml",
        {"coach": {"cadence": "daily", "cron": "15 * * * *", "enabled": True}},
    )
    def fake_get(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse(
            [
                {
                    "slug": "pilot-repo",
                    "domain_slug": "custodian",
                    "host_paths": {"testhost": str(repo_root)},
                }
            ]
        )
    monkeypatch.setenv("STATE_HUB_URL", "http://hub.test")
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "testhost")
    monkeypatch.setattr(httpx, "get", fake_get)
    result = discover_kaizen_scheduled_repos({})
    assert len(result["scheduled_runs"]) == 1
    run = result["scheduled_runs"][0]
    assert run["repo"] == "pilot-repo"
    assert run["agent"] == "coach"
    assert run["enabled"] is True
    assert "schedule prepare coach" in run["prepare_command"]
 def test_discover_scheduled_repos_skips_disabled_coach(tmp_path, monkeypatch) -> None:
    repo_root = tmp_path / "pilot-repo"
    repo_root.mkdir()
    _write_schedule(
        repo_root / ".kaizen" / "schedule.yml",
        {"coach": {"cadence": "daily", "enabled": False}},
    )
    monkeypatch.setenv("STATE_HUB_URL", "http://hub.test")
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "testhost")
    monkeypatch.setattr(
        httpx,
        "get",
        lambda url, **kwargs: DummyResponse(
            [{"slug": "pilot-repo", "host_paths": {"testhost": str(repo_root)}}]
        ),
    )
    result = discover_kaizen_scheduled_repos({})
    assert result["scheduled_runs"] == []
 def test_discover_scheduled_repos_skips_missing_schedule(tmp_path, monkeypatch) -> None:
    repo_root = tmp_path / "no-schedule"
    repo_root.mkdir()
    monkeypatch.setenv("STATE_HUB_URL", "http://hub.test")
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "testhost")
    monkeypatch.setattr(
        httpx,
        "get",
        lambda url, **kwargs: DummyResponse(
            [{"slug": "no-schedule", "host_paths": {"testhost": str(repo_root)}}]
        ),
    )
    result = discover_kaizen_scheduled_repos({})
    assert result["scheduled_runs"] == []
 def test_discover_scheduled_repos_skips_invalid_schedule(tmp_path, monkeypatch) -> None:
    repo_root = tmp_path / "bad-schedule"
    schedule = repo_root / ".kaizen" / "schedule.yml"
    schedule.parent.mkdir(parents=True)
    schedule.write_text("version: '2'\nagents: {}\n", encoding="utf-8")
    monkeypatch.setenv("STATE_HUB_URL", "http://hub.test")
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "testhost")
    monkeypatch.setattr(
        httpx,
        "get",
        lambda url, **kwargs: DummyResponse(
            [{"slug": "bad-schedule", "host_paths": {"testhost": str(repo_root)}}]
        ),
    )
    result = discover_kaizen_scheduled_repos({})
    assert result["scheduled_runs"] == []
 def test_discover_scheduled_repos_filters_by_roster_and_cadence(
    tmp_path, monkeypatch
 ) -> None:
    repo_a = tmp_path / "kaizen-agentic"
    repo_b = tmp_path / "other-repo"
    for root in (repo_a, repo_b):
        _write_schedule(
            root / ".kaizen" / "schedule.yml",
            {
                "coach": {"cadence": "daily", "enabled": True},
                "optimization": {"cadence": "weekly", "enabled": True},
            },
        )
    roster = tmp_path / "roster.yaml"
    roster.write_text(
        yaml.safe_dump(
            {
                "active": [
                    {"slug": "kaizen-agentic", "agents": ["coach"], "status": "active"}
                ]
            }
        ),
        encoding="utf-8",
    )
    monkeypatch.setenv("STATE_HUB_URL", "http://hub.test")
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "testhost")
    monkeypatch.setattr(
        httpx,
        "get",
        lambda url, **kwargs: DummyResponse(
            [
                {"slug": "kaizen-agentic", "host_paths": {"testhost": str(repo_a)}},
                {"slug": "other-repo", "host_paths": {"testhost": str(repo_b)}},
            ]
        ),
    )
    result = discover_kaizen_scheduled_repos(
        {"roster": str(roster), "cadence": "daily"}
    )
    agents = {r["agent"] for r in result["scheduled_runs"]}
    repos = {r["repo"] for r in result["scheduled_runs"]}
    assert repos == {"kaizen-agentic"}
    assert agents == {"coach"}
 def test_hub_unreachable_raises(monkeypatch) -> None:
    monkeypatch.setenv("STATE_HUB_URL", "http://hub.test")
    def fail_get(url: str, **kwargs: Any) -> DummyResponse:
        raise httpx.ConnectError("down")
    monkeypatch.setattr(httpx, "get", fail_get)
    with pytest.raises(RuntimeError, match="State Hub unreachable"):
        discover_kaizen_scheduled_repos({})
 def test_resolver_registry_alias() -> None:
    resolver = KaizenContextResolver()
    assert resolver.resolve("unknown_query", None, {}) == {}
--- a/tests/test_ops_evidence_sinks.py
+++ b/tests/test_ops_evidence_sinks.py
@@ -166,6 +166,93 @@ def test_state_hub_progress_sink_is_idempotent(monkeypatch) -> None:
    assert result[0]["idempotency_key"] == idempotency_key
 def test_core_hub_interaction_event_sink_posts_and_verifies_compact_event(monkeypatch) -> None:
    posts: list[dict[str, Any]] = []
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        assert url == "http://core-hub.test/api/v2/interaction-events"
        assert kwargs["headers"]["Authorization"] == "Bearer runtime-secret"
        posts.append({"url": url, **kwargs})
        return DummyResponse(
            {
                "id": "event-1",
                "eventType": "ops-endpoint-verified",
                "widgetId": "widget-1",
            }
        )
    def fake_get(url: str, **kwargs: Any) -> DummyResponse:
        assert url == "http://core-hub.test/api/v2/interaction-events"
        assert kwargs["headers"]["Authorization"] == "Bearer runtime-secret"
        return DummyResponse({"data": [{"id": "event-1"}]})
    monkeypatch.setenv("CORE_HUB_RUNTIME_TOKEN", "runtime-secret")
    monkeypatch.setattr(httpx, "post", fake_post)
    monkeypatch.setattr(httpx, "get", fake_get)
    result = persist_ops_inventory_evidence(
        _payload([
            {
                "type": "core-hub-interaction-event",
                "core_hub_url": "http://core-hub.test",
                "widget_id": "widget-1",
                "event_type": "ops-endpoint-verified",
            }
        ])
    )
    assert result == [
        {
            "type": "core-hub-interaction-event",
            "status": "posted",
            "event_type": "ops-endpoint-verified",
            "event_id": "event-1",
            "widget_id": "widget-1",
            "verified": True,
            "context_key": "ops_probe",
        }
    ]
    body = posts[0]["json"]
    assert body["widgetId"] == "widget-1"
    assert body["eventType"] == "ops-endpoint-verified"
    assert body["metadata"]["activity_core_run_id"] == _run_id()
    assert body["metadata"]["endpoint"]["url"] == "http://state-hub.test/health"
    assert body["metadata"]["endpoint"]["widget_ref"] == "ops:endpoint:state-hub-health"
    serialized = json.dumps(body, sort_keys=True)
    assert "runtime-secret" not in serialized
    assert "secret response body" not in serialized
    assert "Authorization" not in serialized
    assert "user:pass" not in serialized
    assert "token=secret" not in serialized
 def test_core_hub_sink_skips_cleanly_when_config_missing(monkeypatch) -> None:
    monkeypatch.delenv("CORE_HUB_BASE_URL", raising=False)
    monkeypatch.delenv("CORE_HUB_RUNTIME_TOKEN", raising=False)
    monkeypatch.delenv("CORE_HUB_RUNTIME_TOKEN_FILE", raising=False)
    monkeypatch.delenv("CORE_HUB_WIDGET_ID", raising=False)
    monkeypatch.delenv("CORE_HUB_WIDGET_MAPPING", raising=False)
    result = persist_ops_inventory_evidence(
        _payload([{"type": "core-hub-interaction-event"}])
    )
    assert result == [
        {
            "type": "core-hub-interaction-event",
            "status": "skipped",
            "reason": "missing_core_hub_config",
            "missing": [
                "CORE_HUB_BASE_URL",
                "CORE_HUB_RUNTIME_TOKEN or CORE_HUB_RUNTIME_TOKEN_FILE",
                "widget_id or CORE_HUB_WIDGET_ID",
            ],
            "context_key": "ops_probe",
        }
    ]
 def test_inter_hub_sink_skips_cleanly_when_config_missing(monkeypatch) -> None:
    monkeypatch.delenv("INTER_HUB_URL", raising=False)
    monkeypatch.delenv("OPS_HUB_KEY", raising=False)
--- a/tests/test_railiance_ops_inventory_wiring.py
+++ b/tests/test_railiance_ops_inventory_wiring.py
@@ -33,7 +33,9 @@ def _by_kind_name(kind: str, name: str) -> dict[str, Any]:
 def test_runtime_config_has_ops_inventory_placeholders() -> None:
    config = _by_kind_name("ConfigMap", "actcore-runtime-config")
-    assert config["data"]["LLM_CONNECT_URL"] == ""
+    assert config["data"]["LLM_CONNECT_URL"] == (
+        "http://llm-connect.activity-core.svc.cluster.local:8080"
+    )
    assert config["data"]["LLM_CONNECT_TIMEOUT_SECONDS"] == "300"
    assert config["data"]["OPS_INVENTORY_PATH"] == (
        "/etc/activity-core/ops/service-inventory.yml"
--- a/tests/test_resolve_context_binding.py
+++ b/tests/test_resolve_context_binding.py
@@ -0,0 +1,160 @@
 from __future__ import annotations
 import json
 import pytest
 from temporalio.exceptions import ApplicationError
 from activity_core import activities
 from activity_core.activities import _bind_resolver_result, resolve_context
 def test_bind_resolver_result_unwraps_single_key_wrapper() -> None:
    projects = [{"repo": "kaizen-agentic", "has_metrics": True}]
    assert _bind_resolver_result("projects", {"projects": projects}) == projects
 def test_bind_resolver_result_keeps_multi_key_summary() -> None:
    summary = {
        "repos": [{"repo_slug": "a"}],
        "stale_count": 1,
        "total_count": 2,
    }
    assert _bind_resolver_result("repos", summary) == summary
@pytest.mark.asyncio
 async def test_resolve_context_unwraps_kaizen_projects(monkeypatch) -> None:
    class _FakeResolver:
        def resolve(self, query: str, event: object, params: dict) -> dict:
            assert query == "discover_kaizen_projects"
            return {"projects": [{"repo": "pilot", "has_metrics": True}]}
    import activity_core.context_resolvers  # noqa: F401
    from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY
    monkeypatch.setitem(CONTEXT_RESOLVER_REGISTRY, "kaizen", lambda: _FakeResolver())
    snapshot = await resolve_context(
        [
            {
                "type": "kaizen",
                "query": "discover_kaizen_projects",
                "params": {},
                "bind_to": "context.projects",
            }
        ]
    )
    assert snapshot == {"projects": [{"repo": "pilot", "has_metrics": True}]}
@pytest.mark.asyncio
 async def test_resolve_context_binds_event_payload_attributes() -> None:
    envelope = {
        "type": "kaizen.metrics.recorded",
        "attributes": {
            "agent": "coach",
            "project": "kaizen-agentic",
            "summary": {
                "success_rate": 0.75,
                "execution_count": 12,
                "avg_quality": 0.81,
            },
        },
    }
    snapshot = await resolve_context(
        [
            {
                "type": "event-payload",
                "bind_to": "context.metrics",
            }
        ],
        json.dumps(envelope),
    )
    assert snapshot == {
        "metrics": {
            "agent": "coach",
            "project": "kaizen-agentic",
            "summary": {
                "success_rate": 0.75,
                "execution_count": 12,
                "avg_quality": 0.81,
            },
        }
    }
@pytest.mark.asyncio
 async def test_event_payload_context_supports_low_success_rate_rule() -> None:
    snapshot = await resolve_context(
        [
            {
                "type": "event-payload",
                "bind_to": "context.metrics",
            }
        ],
        json.dumps({
            "type": "kaizen.metrics.recorded",
            "attributes": {
                "agent": "coach",
                "project": "kaizen-agentic",
                "summary": {"success_rate": 0.75},
            },
        }),
    )
    result = await activities.evaluate_rules({
        "rules": [
            {
                "id": "flag-low-success-rate",
                "condition": "context.metrics.summary.success_rate < 0.8",
                "action": {
                    "task_template": (
                        "Review low success rate for {context.metrics.agent}"
                    ),
                    "target_repo": "context.metrics.project",
                    "priority": "high",
                    "labels": ["kaizen", "{context.metrics.agent}"],
                },
            }
        ],
        "event": {},
        "context": snapshot,
    })
    assert len(result) == 1
    assert result[0]["source_id"] == "flag-low-success-rate"
    assert result[0]["title"] == "Review low success rate for coach"
    assert result[0]["target_repo"] == "kaizen-agentic"
    assert result[0]["labels"] == ["kaizen", "coach"]
@pytest.mark.asyncio
 async def test_event_payload_context_binds_empty_when_optional_envelope_missing() -> None:
    snapshot = await resolve_context(
        [
            {
                "type": "event-payload",
                "bind_to": "context.metrics",
            }
        ],
    )
    assert snapshot == {"metrics": {}}
@pytest.mark.asyncio
 async def test_event_payload_context_fails_when_required_envelope_missing() -> None:
    with pytest.raises(ApplicationError, match="Required context resolver"):
        await resolve_context(
            [
                {
                    "type": "event-payload",
                    "bind_to": "context.metrics",
                    "required": True,
                }
            ],
        )
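
Reviewer note: the two `_bind_resolver_result` tests above pin down an unwrap rule that is easy to misread. The sketch below is one way to satisfy them, not the code in `activity_core.activities`. The name `bind_resolver_result_sketch` is hypothetical, and the requirement that the single key match the bind target is an assumption; the tests only cover the matching case.

```python
from typing import Any


def bind_resolver_result_sketch(bind_name: str, result: Any) -> Any:
    """Hypothetical stand-in for _bind_resolver_result; not the project's code."""
    # Unwrap only when the resolver returned a one-key dict whose key matches
    # the bind target, e.g. {"projects": [...]} bound to "projects".
    # Assumption: a one-key dict with a different key is left untouched.
    if isinstance(result, dict) and len(result) == 1 and bind_name in result:
        return result[bind_name]
    # Multi-key summaries and non-dict results are bound unchanged.
    return result
```

With this rule, `{"projects": [...]}` bound to `context.projects` yields the bare list, while a summary such as `{"repos": ..., "stale_count": 1, "total_count": 2}` stays a dict.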
--- a/tests/test_reuse_surface_context_resolver.py
+++ b/tests/test_reuse_surface_context_resolver.py
@@ -0,0 +1,167 @@
 from __future__ import annotations
 import json
 from pathlib import Path
 from typing import Any
 import pytest
 from temporalio.exceptions import ApplicationError
 from activity_core.activities import resolve_context
 from activity_core.context_resolvers import reuse_surface
 from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY
 class _Response:
    def __init__(self, payload: Any) -> None:
        self._payload = payload
    def raise_for_status(self) -> None:
        return None
    def json(self) -> Any:
        return self._payload
 class _Completed:
    returncode = 0
    stderr = ""
    def __init__(self, payload: dict[str, Any]) -> None:
        self.stdout = json.dumps(payload)
 def _write_rollout(path: Path) -> None:
    path.write_text(
        """
 domains:
  reuse:
    phase: active
    repos:
      - reuse-surface
      - activity-core
  parked:
    phase: backlog
    repos:
      - ignored-repo
 """.lstrip(),
        encoding="utf-8",
    )
 def _write_cli_only_signals(path: Path) -> None:
    path.write_text(
        """
 signals:
  empty_capability_scaffold:
    enabled: true
  registry_gap:
    enabled: false
  stale_scope:
    enabled: false
  stale_sbom:
    enabled: false
  publish_check_fail:
    enabled: false
 """.lstrip(),
        encoding="utf-8",
    )
 def test_shell_resolver_emits_reuse_surface_gaps_and_advances_cursor(
    tmp_path,
    monkeypatch,
 ) -> None:
    rollout = tmp_path / "rollout.yaml"
    _write_rollout(rollout)
    _write_cli_only_signals(tmp_path / "signals.yml")
    reuse_root = tmp_path / "reuse-surface"
    reuse_root.mkdir()
    (reuse_root / "SCOPE.md").write_text("fresh\n", encoding="utf-8")
    activity_root = tmp_path / "activity-core"
    activity_root.mkdir()
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "runner")
    def fake_get(url: str, **kwargs: Any) -> _Response:
        assert url.endswith("/repos/")
        return _Response(
            [
                {
                    "slug": "reuse-surface",
                    "host_paths": {"runner": str(reuse_root)},
                },
                {
                    "slug": "activity-core",
                    "host_paths": {"runner": str(activity_root)},
                },
            ]
        )
    def fake_run(cmd: list[str], **kwargs: Any) -> _Completed:
        assert cmd == ["reuse-surface", "report", "gaps", "--format", "json"]
        return _Completed({"empty_scaffolds": ["reuse-surface"]})
    monkeypatch.setattr(reuse_surface.httpx, "get", fake_get)
    monkeypatch.setattr(reuse_surface.subprocess, "run", fake_run)
    import activity_core.context_resolvers  # noqa: F401
    result = CONTEXT_RESOLVER_REGISTRY["shell"]().resolve(
        "reuse_surface_report_gaps",
        None,
        {
            "roster": str(rollout),
            "batch_size": 1,
        },
    )
    assert result == {
        "gaps": [
            {
                "repo": "reuse-surface",
                "root": str(reuse_root),
                "signal": "empty_capability_scaffold",
                "hygiene_signal": "empty_capability_scaffold",
            }
        ]
    }
    state = json.loads((tmp_path / "round-robin-state.json").read_text(encoding="utf-8"))
    assert state["cursor"] == 1
    assert state["last_batch"] == ["reuse-surface"]
 def test_shell_resolver_keeps_kaizen_fallback_for_existing_queries() -> None:
    assert CONTEXT_RESOLVER_REGISTRY["shell"]().resolve("unknown_query", None, {}) == {}
@pytest.mark.asyncio
 async def test_optional_reuse_surface_missing_roster_binds_empty_list(tmp_path) -> None:
    snapshot = await resolve_context(
        [
            {
                "type": "shell",
                "query": "reuse_surface_report_gaps",
                "params": {"roster": str(tmp_path / "missing.yaml")},
                "bind_to": "context.gaps",
            }
        ]
    )
    assert snapshot == {"gaps": []}
@pytest.mark.asyncio
 async def test_required_reuse_surface_missing_roster_fails_visibly(tmp_path) -> None:
    with pytest.raises(ApplicationError, match="Required context resolver"):
        await resolve_context(
            [
                {
                    "type": "shell",
                    "query": "reuse_surface_report_gaps",
                    "params": {"roster": str(tmp_path / "missing.yaml")},
                    "bind_to": "context.gaps",
                    "required": True,
                }
            ]
        )
--- a/tests/test_schedule_health.py
+++ b/tests/test_schedule_health.py
@@ -0,0 +1,81 @@
 """ACTIVITY-WP-0014 T03: missed-fire detection verdict tests."""
 from __future__ import annotations
 from datetime import datetime, timedelta, timezone
 from activity_core.schedule_health import evaluate_schedule_health
 NOW = datetime(2026, 6, 23, 12, 0, tzinfo=timezone.utc)
 def test_healthy_when_recent_fire_and_no_drops() -> None:
    health = evaluate_schedule_health(
        activity_id="a1",
        missed_catchup_window=0,
        last_fired_at=NOW - timedelta(minutes=5),
        now=NOW,
        expected_interval=timedelta(hours=1),
    )
    assert health.healthy is True
    assert health.missed is False
    assert health.reasons == []
 def test_unhealthy_when_catchup_window_dropped_fires() -> None:
    health = evaluate_schedule_health(
        activity_id="a1",
        missed_catchup_window=2,
        last_fired_at=NOW - timedelta(minutes=5),
        now=NOW,
    )
    assert health.missed is True
    assert "2 fire(s) dropped" in health.reasons[0]
 def test_unhealthy_when_last_fire_too_stale() -> None:
    health = evaluate_schedule_health(
        activity_id="daily",
        missed_catchup_window=0,
        last_fired_at=NOW - timedelta(days=2),
        now=NOW,
        expected_interval=timedelta(days=1),
    )
    assert health.missed is True
    assert any("exceeding the expected" in r for r in health.reasons)
    assert health.staleness == timedelta(days=2)
 def test_within_tolerance_is_healthy() -> None:
    health = evaluate_schedule_health(
        activity_id="daily",
        missed_catchup_window=0,
        last_fired_at=NOW - (timedelta(days=1) + timedelta(minutes=5)),
        now=NOW,
        expected_interval=timedelta(days=1),
        tolerance=timedelta(minutes=10),
    )
    assert health.healthy is True
 def test_no_fire_recorded_for_due_schedule_is_unhealthy() -> None:
    health = evaluate_schedule_health(
        activity_id="daily",
        missed_catchup_window=0,
        last_fired_at=None,
        now=NOW,
        expected_interval=timedelta(days=1),
    )
    assert health.missed is True
    assert "no recorded fire" in health.reasons[0]
 def test_no_interval_and_no_fire_is_not_flagged() -> None:
    # Without an expected interval we cannot assert a miss from absence alone.
    health = evaluate_schedule_health(
        activity_id="event-ish",
        missed_catchup_window=0,
        last_fired_at=None,
        now=NOW,
    )
    assert health.healthy is True
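
Reviewer note: the verdict rules these tests encode can be sketched as below. This is a minimal reconstruction from the assertions only, not the contents of `activity_core.schedule_health`. `HealthSketch`, `evaluate_health_sketch`, the zero default for `tolerance`, and the exact reason wording are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class HealthSketch:
    """Hypothetical result type; the real one may carry more fields."""

    missed: bool
    reasons: list[str] = field(default_factory=list)
    staleness: timedelta | None = None

    @property
    def healthy(self) -> bool:
        return not self.missed


def evaluate_health_sketch(
    *,
    missed_catchup_window: int,
    last_fired_at: datetime | None,
    now: datetime,
    expected_interval: timedelta | None = None,
    tolerance: timedelta = timedelta(0),  # assumed default
) -> HealthSketch:
    reasons: list[str] = []
    staleness = None if last_fired_at is None else now - last_fired_at

    # Rule 1: any fire dropped outside the catchup window is a miss.
    if missed_catchup_window > 0:
        reasons.append(
            f"{missed_catchup_window} fire(s) dropped outside the catchup window"
        )

    # Rules 2 and 3 need an expected interval; absence alone proves nothing.
    if expected_interval is not None:
        if staleness is None:
            reasons.append("no recorded fire for a schedule that is due")
        elif staleness > expected_interval + tolerance:
            reasons.append(
                f"last fire was {staleness} ago, exceeding the expected "
                f"interval of {expected_interval}"
            )

    return HealthSketch(missed=bool(reasons), reasons=reasons, staleness=staleness)
```

Keeping the interval check behind `expected_interval is not None` is what lets event-style activities with no recorded fire stay healthy, as the last test requires.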
--- a/tests/test_schedule_lifecycle.py
+++ b/tests/test_schedule_lifecycle.py
@@ -37,6 +37,7 @@ def _make_defn(
    misfire_policy: str = "skip",
    enabled: bool = True,
    jitter: int = 0,
    catchup_window_seconds: int | None = None,
 ) -> ActivityDefinition:
    return ActivityDefinition(
        id=uuid.uuid4(),
@@ -46,6 +47,7 @@ def _make_defn(
            cron_expression=cron,
            misfire_policy=misfire_policy,
            jitter_seconds=jitter,
            catchup_window_seconds=catchup_window_seconds,
        ),
    )
@@ -186,6 +188,76 @@ async def test_misfire_policy_compress_sets_overlap_buffer_one(env: WorkflowEnvi
    await delete_schedule(env.client, defn.id)
 # ── ACTIVITY-WP-0014: explicit run-miss policies + catchup window ────────────
@pytest.mark.asyncio
 async def test_skip_sets_short_catchup_window(env: WorkflowEnvironment) -> None:
    """skip = run on trigger or skip: tiny grace window, no real recovery."""
    defn = _make_defn(misfire_policy="skip")
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.SKIP
    assert desc.schedule.policy.catchup_window == timedelta(seconds=60)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_catchup_all_recovers_full_window(env: WorkflowEnvironment) -> None:
    """catchup_all = recover every missed fire: long window, BUFFER_ALL."""
    defn = _make_defn(misfire_policy="catchup_all")
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ALL
    assert desc.schedule.policy.catchup_window == timedelta(days=365)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_catchup_latest_does_not_accumulate(env: WorkflowEnvironment) -> None:
    """catchup_latest = recover only the most recent missed fire: BUFFER_ONE."""
    defn = _make_defn(misfire_policy="catchup_latest")
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ONE
    assert desc.schedule.policy.catchup_window == timedelta(hours=24)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_legacy_aliases_map_to_explicit_policies(env: WorkflowEnvironment) -> None:
    """Legacy catchup/compress keep working and pick up the new catchup windows."""
    catchup = _make_defn(misfire_policy="catchup")
    compress = _make_defn(misfire_policy="compress")
    await upsert_schedule(env.client, catchup)
    await upsert_schedule(env.client, compress)
    d1 = await env.client.get_schedule_handle(schedule_id(catchup.id)).describe()
    d2 = await env.client.get_schedule_handle(schedule_id(compress.id)).describe()
    assert d1.schedule.policy.catchup_window == timedelta(days=365)
    assert d2.schedule.policy.catchup_window == timedelta(hours=24)
    await delete_schedule(env.client, catchup.id)
    await delete_schedule(env.client, compress.id)
@pytest.mark.asyncio
 async def test_explicit_catchup_window_override(env: WorkflowEnvironment) -> None:
    """An explicit catchup_window_seconds overrides the per-policy default."""
    defn = _make_defn(misfire_policy="skip", catchup_window_seconds=7200)
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.catchup_window == timedelta(hours=2)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_schedule_smoke_test_creates_one_shot_schedule(
    env: WorkflowEnvironment,
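
Reviewer note: the ACTIVITY-WP-0014 lifecycle tests above imply a small policy table. The sketch below restates it in plain Python without importing `temporalio`, so it is only a reading aid. `POLICY_DEFAULTS`, `LEGACY_ALIASES`, and `resolve_policy_sketch` are hypothetical names, and the overlap value for the legacy `catchup` alias is an assumption (the tests shown here assert only its catchup window).

```python
from datetime import timedelta

# (overlap policy name, default catchup window). The overlap names mirror the
# ScheduleOverlapPolicy members asserted in the tests.
POLICY_DEFAULTS = {
    "skip": ("SKIP", timedelta(seconds=60)),
    "catchup_all": ("BUFFER_ALL", timedelta(days=365)),
    "catchup_latest": ("BUFFER_ONE", timedelta(hours=24)),
}

# Legacy names keep working by mapping onto the explicit policies.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}


def resolve_policy_sketch(misfire_policy, catchup_window_seconds=None):
    """Return (overlap_name, catchup_window) for a misfire policy name."""
    name = LEGACY_ALIASES.get(misfire_policy, misfire_policy)
    overlap, window = POLICY_DEFAULTS[name]
    # An explicit catchup_window_seconds overrides the per-policy default.
    if catchup_window_seconds is not None:
        window = timedelta(seconds=catchup_window_seconds)
    return overlap, window
```

For example, `resolve_policy_sketch("skip", 7200)` keeps the `SKIP` overlap but widens the window to two hours, which matches `test_explicit_catchup_window_override`.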
--- a/tests/test_state_hub_context_resolver.py
+++ b/tests/test_state_hub_context_resolver.py
@@ -215,6 +215,29 @@ def test_coding_retro_returns_latest_progress_suggestions(monkeypatch) -> None:
                    ],
                },
            },
            {
                "id": "newer-30-day-retro",
                "event_type": "coding_retro",
                "summary": "monthly coding retro ready",
                "created_at": "2026-06-07T17:15:00Z",
                "detail": {
                    "generated_at": "2026-06-07T17:14:30Z",
                    "window": {
                        "days": 30,
                        "since": "2026-05-08T00:00:00Z",
                        "until": "2026-06-07T00:00:00Z",
                    },
                    "suggestions": [
                        {
                            "repo": "broad-retro-repo",
                            "title": "Should not displace the weekly retro",
                            "recommendation": "Keep weekly schedule bounded.",
                            "priority": "high",
                            "score": 99,
                        }
                    ],
                },
            },
        ])
    monkeypatch.setenv("STATE_HUB_URL", "http://state-hub.test/")
@@ -229,7 +252,7 @@ def test_coding_retro_returns_latest_progress_suggestions(monkeypatch) -> None:
    assert calls == [
        {
            "url": "http://state-hub.test/progress/",
-            "params": {"limit": 20},
+            "params": {"event_type": "coding_retro", "limit": 20},
            "timeout": 10.0,
        }
    ]
@@ -251,6 +274,47 @@ def test_coding_retro_returns_latest_progress_suggestions(monkeypatch) -> None:
    ]
 def test_coding_retro_returns_empty_when_window_does_not_match(monkeypatch) -> None:
    def fake_get(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse([
            {
                "id": "monthly-retro",
                "event_type": "coding_retro",
                "summary": "monthly coding retro ready",
                "created_at": "2026-06-07T17:10:00Z",
                "detail": {
                    "window": {"days": 30},
                    "suggestions": [
                        {
                            "repo": "activity-core",
                            "title": "Broad retro item",
                            "recommendation": "Do not emit from weekly schedule.",
                            "priority": "high",
                            "score": 10,
                        }
                    ],
                },
            }
        ])
    monkeypatch.setattr(httpx, "get", fake_get)
    result = StateHubContextResolver().resolve(
        "coding_retro",
        None,
        {"event_type": "coding_retro", "window_days": 7},
    )
    assert result == {
        "suggestions": [],
        "window": None,
        "generated_at": None,
        "source_progress_id": None,
        "event_type": "coding_retro",
        "summary": "",
    }
 def test_coding_retro_returns_empty_shape_when_not_published(monkeypatch) -> None:
    def fake_get(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse([
@@ -343,6 +407,70 @@ def test_recently_on_scope_hourly_failure_bubbles(monkeypatch) -> None:
        StateHubContextResolver().resolve("recently_on_scope_hourly", None, {"range": "1h"})
 def test_consistency_sweep_remote_all_posts_batch(monkeypatch) -> None:
    calls: list[dict[str, Any]] = []
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        calls.append({"url": url, **kwargs})
        return DummyResponse(
            {
                "exit_code": 0,
                "lock_skipped": False,
                "repos_processed": [{"repo_slug": "state-hub", "result": "pass"}],
                "skipped_clean": ["quiet-repo"],
                "skipped_missing": [],
                "skipped_budget": [],
            }
        )
    monkeypatch.setenv("STATE_HUB_URL", "http://state-hub.test/")
    monkeypatch.setattr(httpx, "post", fake_post)
    result = StateHubContextResolver().resolve(
        "consistency_sweep_remote_all",
        None,
        {"max_seconds": 300, "source": "activity-core", "required": True},
    )
    assert result["exit_code"] == 0
    assert result["repos_processed"][0]["repo_slug"] == "state-hub"
    assert calls == [
        {
            "url": "http://state-hub.test/consistency/sweep/remote-all",
            "json": {"max_seconds": 300, "source": "activity-core"},
            "timeout": 330.0,
        }
    ]
 def test_consistency_sweep_remote_all_failure_bubbles(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        raise httpx.ConnectError("offline")
    monkeypatch.setattr(httpx, "post", fake_post)
    with pytest.raises(httpx.ConnectError):
        StateHubContextResolver().resolve(
            "consistency_sweep_remote_all",
            None,
            {"max_seconds": 300},
        )
 def test_consistency_sweep_remote_all_rejects_empty_response(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse({})
    monkeypatch.setattr(httpx, "post", fake_post)
    with pytest.raises(RuntimeError, match="missing required key"):
        StateHubContextResolver().resolve(
            "consistency_sweep_remote_all",
            None,
            {"max_seconds": 300},
        )
 def test_recently_on_scope_hourly_rejects_empty_response(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse({})
--- a/tests/test_state_hub_write.py
+++ b/tests/test_state_hub_write.py
@@ -0,0 +1,81 @@
 """ACTIVITY-WP-0014 T05: idempotency-keyed State Hub writes."""
 from __future__ import annotations
 import httpx
 import pytest
 from activity_core import report_sinks
 from activity_core.state_hub_write import (
    IDEMPOTENCY_HEADER,
    idempotency_headers,
    idempotency_key,
 )
 def test_key_is_stable_and_deterministic() -> None:
    a = idempotency_key("run1", "daily-triage-report", "daily_triage")
    b = idempotency_key("run1", "daily-triage-report", "daily_triage")
    assert a == b == "run1:daily-triage-report:daily_triage"
 def test_key_shape_stable_with_missing_parts() -> None:
    assert idempotency_key("run1", None, "daily_triage") == "run1::daily_triage"
 def test_key_sanitizes_control_and_whitespace() -> None:
    key = idempotency_key("run 1", "a\tb", "x\n")
    assert "\t" not in key and "\n" not in key and " " not in key
 def test_headers_carry_the_key() -> None:
    headers = idempotency_headers("run1", "i", "e")
    assert headers == {IDEMPOTENCY_HEADER: "run1:i:e"}
 def test_distinct_identities_get_distinct_keys() -> None:
    assert idempotency_key("r", "i", "daily_triage") != idempotency_key(
        "r", "i", "schedule_miss"
    )
 def test_progress_exists_is_best_effort_on_connection_error(monkeypatch) -> None:
    """A down State Hub must not hard-fail the dedup read; it returns False so the
    keyed write can still proceed."""
    def _boom(*args, **kwargs):
        raise httpx.ConnectError("Connection refused")
    monkeypatch.setattr(report_sinks.httpx, "get", _boom)
    assert (
        report_sinks._progress_exists(
            "http://127.0.0.1:8000", "run1", "daily-triage-report", "daily_triage"
        )
        is False
    )
 def test_report_sink_post_sends_idempotency_header(monkeypatch) -> None:
    """The state-hub-progress write carries a stable Idempotency-Key header."""
    captured: dict[str, object] = {}
    monkeypatch.setattr(report_sinks, "_progress_exists", lambda *a, **k: False)
    class _Resp:
        def raise_for_status(self) -> None: ...
        def json(self) -> dict[str, str]:
            return {"id": "pid-1"}
    def _capture_post(url, json, headers, timeout):  # noqa: A002
        captured["headers"] = headers
        return _Resp()
    monkeypatch.setattr(report_sinks.httpx, "post", _capture_post)
    payload = {"run_id": "run1", "activity_id": "act1", "scheduled_for": None}
    report_entry = {"instruction_id": "daily-triage-report", "report": {"summary": "s"}}
    sink = {"event_type": "daily_triage"}
    result = report_sinks._post_state_hub_progress(payload, report_entry, sink)
    assert result["status"] == "posted"
    assert captured["headers"][IDEMPOTENCY_HEADER] == "run1:daily-triage-report:daily_triage"
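
Reviewer note: the key shape asserted above (`run1:daily-triage-report:daily_triage`, with an empty segment for a missing part) can be produced by something like the sketch below. It is an illustration under assumptions, not `activity_core.state_hub_write`. The `_sketch` names, the parameter names, the header value `Idempotency-Key` (taken from the test docstring), and the choice to delete rather than replace whitespace are all assumed.

```python
import re

IDEMPOTENCY_HEADER_SKETCH = "Idempotency-Key"  # assumed from the test docstring

# Whitespace plus ASCII control characters are removed from every segment.
_UNSAFE = re.compile(r"[\s\x00-\x1f\x7f]+")


def idempotency_key_sketch(run_id, instruction_id, event_type):
    """Join three identity parts into a stable, deterministic key."""
    parts = []
    for part in (run_id, instruction_id, event_type):
        text = "" if part is None else str(part)
        parts.append(_UNSAFE.sub("", text))
    # Always three segments, so a missing part leaves an empty slot ("a::c").
    return ":".join(parts)


def idempotency_headers_sketch(run_id, instruction_id, event_type):
    """Wrap the key in a header dict suitable for an HTTP client call."""
    key = idempotency_key_sketch(run_id, instruction_id, event_type)
    return {IDEMPOTENCY_HEADER_SKETCH: key}
```

Keeping the segment count fixed means two writes that differ only in a missing instruction id still map to distinct, predictable keys.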
--- a/tests/test_sync_schedules.py
+++ b/tests/test_sync_schedules.py
@@ -0,0 +1,126 @@
 from __future__ import annotations
 import uuid
 from datetime import datetime, timezone
 from types import SimpleNamespace
 from typing import Any
 import pytest
 from activity_core import sync_schedules
 def _row(
    *,
    activity_id: uuid.UUID,
    enabled: bool,
    trigger_config: dict[str, Any],
 ) -> SimpleNamespace:
    return SimpleNamespace(
        id=activity_id,
        name=f"definition-{activity_id}",
        enabled=enabled,
        trigger_config=trigger_config,
        context_sources=[],
        task_templates=[],
        dedupe_key_strategy="skip",
        version=1,
    )
@pytest.mark.asyncio
 async def test_sync_schedule_rows_reports_drift_counts_and_preserves_one_shots(
    monkeypatch,
 ) -> None:
    new_id = uuid.uuid4()
    disabled_old_id = uuid.uuid4()
    one_shot_id = uuid.uuid4()
    orphan_id = uuid.uuid4()
    upserted: list[tuple[uuid.UUID, bool, str]] = []
    deleted: list[str] = []
    async def fake_upsert_schedule(client: object, defn: object) -> None:
        upserted.append((
            defn.id,
            defn.enabled,
            defn.trigger_config.trigger_type,
        ))
    async def fake_list_schedules(client: object) -> list[dict[str, str]]:
        return [
            {
                "schedule_id": f"activity-schedule-{disabled_old_id}",
                "activity_id": str(disabled_old_id),
            },
            {
                "schedule_id": f"activity-schedule-{one_shot_id}-once",
                "activity_id": f"{one_shot_id}-once",
            },
            {
                "schedule_id": f"activity-schedule-{orphan_id}",
                "activity_id": str(orphan_id),
            },
        ]
    async def fake_delete_schedule(client: object, activity_id: str) -> None:
        deleted.append(activity_id)
    monkeypatch.setattr(sync_schedules, "upsert_schedule", fake_upsert_schedule)
    monkeypatch.setattr(sync_schedules, "list_schedules", fake_list_schedules)
    monkeypatch.setattr(sync_schedules, "delete_schedule", fake_delete_schedule)
    result = await sync_schedules.sync_schedule_rows(
        object(),
        [
            _row(
                activity_id=new_id,
                enabled=True,
                trigger_config={
                    "trigger_type": "cron",
                    "cron_expression": "20 7 * * *",
                    "timezone": "Europe/Berlin",
                    "misfire_policy": "skip",
                },
            ),
            _row(
                activity_id=disabled_old_id,
                enabled=False,
                trigger_config={
                    "trigger_type": "cron",
                    "cron_expression": "20 * * * *",
                    "timezone": "Europe/Berlin",
                    "misfire_policy": "skip",
                },
            ),
            _row(
                activity_id=one_shot_id,
                enabled=True,
                trigger_config={
                    "trigger_type": "scheduled",
                    "at": datetime(2026, 6, 19, 8, 0, tzinfo=timezone.utc),
                    "timezone": "UTC",
                },
            ),
            _row(
                activity_id=uuid.uuid4(),
                enabled=True,
                trigger_config={
                    "trigger_type": "event",
                    "event_type": "kaizen.metrics.recorded",
                    "filters": {},
                },
            ),
        ],
    )
    assert result.to_dict() == {
        "upserted": 2,
        "paused": 1,
        "deleted_orphans": 1,
    }
    assert upserted == [
        (new_id, True, "cron"),
        (disabled_old_id, False, "cron"),
        (one_shot_id, True, "scheduled"),
    ]
    assert deleted == [str(orphan_id)]
--- a/tests/test_sync_service.py
+++ b/tests/test_sync_service.py
@@ -0,0 +1,134 @@
 from __future__ import annotations
 from typing import Any
 import pytest
 from activity_core import sync_service
 from activity_core.sync_schedules import ScheduleSyncResult
@pytest.mark.asyncio
 async def test_run_sync_runs_requested_sections(monkeypatch) -> None:
    calls: list[str] = []
    async def fake_definitions(session_factory: object) -> int:
        calls.append("definitions")
        return 2
    async def fake_event_types(session_factory: object) -> int:
        calls.append("event_types")
        return 5
    async def fake_schedules(
        temporal_client: object,
        session_factory: object,
    ) -> ScheduleSyncResult:
        calls.append("schedules")
        return ScheduleSyncResult(upserted=3, paused=1, deleted_orphans=2)
    monkeypatch.setattr(sync_service, "sync_activity_definitions", fake_definitions)
    monkeypatch.setattr(sync_service, "sync_event_types", fake_event_types)
    monkeypatch.setattr(sync_service, "sync_with_session_factory", fake_schedules)
    result = await sync_service.run_sync(
        session_factory=object(),
        temporal_client=object(),
        definitions=True,
        schedules=True,
        event_types=True,
    )
    assert calls == ["definitions", "event_types", "schedules"]
    assert result["ok"] is True
    assert result["ran"] == {
        "definitions": True,
        "schedules": True,
        "event_types": True,
    }
    assert result["definitions"] == {"synced": 2}
    assert result["event_types"] == {"synced": 5}
    assert result["schedules"] == {
        "upserted": 3,
        "paused": 1,
        "deleted_orphans": 2,
    }
    assert result["errors"] == []
@pytest.mark.asyncio
 async def test_run_sync_collects_errors_and_continues(monkeypatch) -> None:
    calls: list[str] = []
    async def failing_definitions(session_factory: object) -> int:
        calls.append("definitions")
        raise RuntimeError("definition parse failed")
    async def fake_schedules(
        temporal_client: object,
        session_factory: object,
    ) -> ScheduleSyncResult:
        calls.append("schedules")
        return ScheduleSyncResult(upserted=1)
    monkeypatch.setattr(
        sync_service,
        "sync_activity_definitions",
        failing_definitions,
    )
    monkeypatch.setattr(sync_service, "sync_with_session_factory", fake_schedules)
    result = await sync_service.run_sync(
        session_factory=object(),
        temporal_client=object(),
        definitions=True,
        schedules=True,
        event_types=False,
    )
    assert calls == ["definitions", "schedules"]
    assert result["ok"] is False
    assert result["definitions"] == {"synced": 0}
    assert result["schedules"]["upserted"] == 1
    assert result["errors"] == [
        {
            "stage": "definitions",
            "type": "RuntimeError",
            "message": "definition parse failed",
        }
    ]
@pytest.mark.asyncio
 async def test_run_sync_reports_missing_temporal_client_for_schedules() -> None:
    result = await sync_service.run_sync(
        session_factory=object(),
        temporal_client=None,
        definitions=False,
        schedules=True,
        event_types=False,
    )
    assert result["ok"] is False
    assert result["errors"] == [
        {
            "stage": "schedules",
            "type": "RuntimeError",
            "message": "Temporal client is required for schedule sync",
        }
    ]
def test_record_error_bounds_error_count() -> None:
    result: dict[str, Any] = {
        "ok": True,
        "errors": [],
    }
    for i in range(25):
        sync_service._record_error(result, "stage", RuntimeError(f"boom {i}"))
    assert result["ok"] is False
    assert len(result["errors"]) == 20
    assert result["errors"][0]["message"] == "boom 0"
    assert result["errors"][-1]["message"] == "boom 19"
--- a/uv.lock
+++ b/uv.lock
@@ -12,6 +12,7 @@ dependencies = [
    { name = "httpx" },
    { name = "nats-py" },
    { name = "pydantic" },
    { name = "pyyaml" },
    { name = "sqlalchemy", extra = ["asyncio"] },
    { name = "temporalio" },
    { name = "uvicorn", extra = ["standard"] },
@@ -34,6 +35,7 @@ requires-dist = [
    { name = "pydantic", specifier = ">=2.0" },
    { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
    { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = ">=0.24" },
    { name = "pyyaml", specifier = ">=6.0" },
    { name = "sqlalchemy", extras = ["asyncio"], specifier = ">=2.0" },
    { name = "temporalio", specifier = ">=1.7" },
    { name = "temporalio", extras = ["testing"], marker = "extra == 'dev'", specifier = ">=1.7" },
--- a/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
+++ b/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
@@ -8,7 +8,7 @@ status: active
 owner: codex
 topic_slug: custodian
 created: "2026-06-03"
-updated: "2026-06-07"
+updated: "2026-06-27"
 state_hub_workstream_id: "5646e13a-13af-4724-bca6-3c0d86f96733"
 ---
@@ -150,6 +150,59 @@ State Hub to `state-hub` (`dc10704f`), `railiance-cluster` (`53e78702`),
 activity-core runner plus three clean scheduled daily runs and calibration
 feedback.
 2026-06-16: Rechecked State Hub and the configured working-memory sink. State
 Hub `/progress/?event_type=daily_triage` still only shows activity-core
 `daily_triage` progress through 2026-06-06, and
 `/home/worsch/the-custodian/memory/working` only has `daily-triage-*` notes
 for 2026-06-02 through 2026-06-06. There is still no evidence of three clean
 consecutive scheduled runs after the June 7 runtime projection failure, so
 T03 remains `wait`.
 2026-06-18: Consumed the verified in-cluster llm-connect Service URL in the
 Railiance runtime projection. `actcore-runtime-config` now sets
 `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080` and
 keeps `LLM_CONNECT_TIMEOUT_SECONDS=300`. The remaining live gate is no longer
 the URL slot itself; it is operator-owned provider credential custody for
 `activity-core/llm-connect-provider-secrets`, a schema-valid fixture smoke, and
 then three clean scheduled daily triage runs.
 2026-06-18 follow-up: `llm-connect` reported State Hub message
 `6a098e1e-65de-4309-ab4a-446aba2f3587`: the provider Secret now has a populated
 key count and the in-namespace fixture smoke passed on the llm-connect side.
 The remaining activity-core gate is to reconcile the live Railiance runtime so
 the worker consumes the configured URL, then produce schema-valid daily triage
 evidence and three clean scheduled runs. This narrower path is tracked in
 `ACTIVITY-WP-0010`.
 2026-06-25: Consecutive-run streak resumed. State Hub `daily_triage` progress
 events from author `activity-core` fired on time on **2026-06-24 05:20:56Z** and
 **2026-06-25 05:20:47Z** (07:20 Berlin), both delivered, no misfires. That is two
 clean consecutive scheduled runs. **RECHECK 2026-06-26 (after 05:20Z):** confirm
 the 06-26 scheduled `daily_triage` event delivered. If clean, that completes three
 clean consecutive scheduled runs (06-24 / 06-25 / 06-26) — record the calibration
 result in State Hub and close T03. If the 06-26 run misfires or is missing, the
 streak resets and T03 stays `wait`. Flag deliberately kept in-repo (agent-agnostic)
 rather than tied to any single coding agent's scheduler.
 2026-06-26 recheck outcome: **streak reset at two.** The 06-26 scheduled run fired
 on time (`daily_triage` event 05:20:57Z) — scheduling layer healthy, no misfire —
 but the `daily-triage-report` instruction output **failed schema validation**:
 `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The model produced a
 long ranked WSJF recommendation list (reached rank 7+ with nested `wsjf` objects)
 whose JSON broke ~char 5268; only a bounded 4000-char preview is preserved in the
 State Hub event, so the exact offending token needs the runtime llm-connect log.
 This is an LLM-output-quality failure (tracked by `ACTIVITY-WP-0010`), not a
 runtime/projection failure. T03 stays `wait`; three clean consecutive scheduled
 runs not yet achieved (06-24 ✅, 06-25 ✅, 06-26 ✗-validation).
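The gap between the reported failure offset (char 5268) and the 4000-char preview can be illustrated with the standard `json` module. This is a minimal sketch, not the runner's validation code: `describe_json_failure` and `PREVIEW_LIMIT` are hypothetical names, and the broken payload is synthetic.

```python
import json

PREVIEW_LIMIT = 4000  # assumed to match the bounded preview size noted above


def describe_json_failure(raw: str, preview_limit: int = PREVIEW_LIMIT) -> dict:
    """Report where JSON parsing failed and whether a bounded preview covers it."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return {
            "valid": False,
            "message": exc.msg,
            "line": exc.lineno,
            "column": exc.colno,
            "char": exc.pos,
            # False means the offending token lies beyond the stored preview.
            "inside_preview": exc.pos < preview_limit,
        }
    return {"valid": True}


# Synthetic payload: long enough to pass the preview window, then a missing comma.
padding = ", ".join(f'"k{i}": {i}' for i in range(600))
broken = "{" + padding + ', "rank": 7 "wsjf": {}}'
report = describe_json_failure(broken)
```

When `inside_preview` is false, the stored preview cannot show the offending token, which is why the full runtime log is needed.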
 2026-06-27 recheck outcome: streak remains reset. The scheduled run fired and
 wrote State Hub progress plus working memory, but daily-triage-report failed
 validation again with an unterminated string around char 5246. This confirms the
 runner/sink path is alive and the active blocker is live deployment of the
 ACTIVITY-WP-0016 output-robustness bundle and runtime prompt/token changes, not
 a missing schedule. T03 stays wait until a post-deployment smoke passes and three
 new clean scheduled runs are collected.
 ## Rule Action Contract Documentation
 ```task
--- a/workplans/ACTIVITY-WP-0008-weekly-coding-retro.md
+++ b/workplans/ACTIVITY-WP-0008-weekly-coding-retro.md
@@ -8,7 +8,7 @@ status: blocked
 owner: codex
 topic_slug: custodian
 created: "2026-06-07"
-updated: "2026-06-07"
+updated: "2026-06-17"
 state_hub_workstream_id: "7387fc50-1f2c-471a-9d85-bb085cbd0b63"
 ---
@@ -47,6 +47,12 @@ resolver. It reads recent `/progress/` items, selects the latest
 `event_type=coding_retro`, normalizes `suggestions[]`, and returns an empty
 suggestion list while the upstream publisher has not produced a read model yet.
 **2026-06-17:** Hardened the resolver lookup after live review found recent
 non-retro progress could hide older retro events. The resolver now queries
 State Hub with `event_type=coding_retro` and only selects a read model matching
 the requested `window_days`, so the weekly schedule cannot accidentally route a
 broader 30-day retro batch.
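The selection rule described in this entry can be sketched as follows. It is an illustration only: the item shape (`event_type`, `window.days`, `generated_at`, `suggestions`) is inferred from this workplan, and `select_retro_read_model` is a hypothetical name rather than the resolver's actual function.

```python
from typing import Any, Optional


def select_retro_read_model(
    items: list[dict[str, Any]], window_days: int
) -> Optional[dict[str, Any]]:
    """Pick the newest coding_retro read model whose window matches exactly."""
    candidates = [
        item
        for item in items
        if item.get("event_type") == "coding_retro"
        and (item.get("window") or {}).get("days") == window_days
    ]
    if not candidates:
        return None
    # ISO-8601 UTC timestamps in one format sort correctly as plain strings.
    return max(candidates, key=lambda item: item["generated_at"])


progress_items = [
    {"event_type": "daily_triage", "generated_at": "2026-06-17T05:20:00Z"},
    {"event_type": "coding_retro", "generated_at": "2026-06-14T10:00:00Z",
     "window": {"days": 30}, "suggestions": [{"title": "broad batch"}]},
    {"event_type": "coding_retro", "generated_at": "2026-06-07T19:25:19Z",
     "window": {"days": 7}, "suggestions": []},
]

weekly = select_retro_read_model(progress_items, window_days=7)
suggestions = (weekly or {}).get("suggestions") or []
```

Here the newer non-retro item and the newer 30-day batch are both ignored, so the weekly schedule sees only the 7-day read model and its (empty) suggestion list.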
 ## `weekly-coding-retro` Activity-Definition
 ```task
@@ -92,3 +98,12 @@ make fix-consistency REPO=activity-core
 Live State Hub did not yet expose a published `event_type=coding_retro` progress
 item, so the real dry-run, duplicate check, and `enabled: true` flip remain
 blocked on `AGENTIC-WP-0010`.
 **2026-06-17:** `AGENTIC-WP-0010` is finished and State Hub has
 `coding_retro` progress. A live no-write smoke now resolves the matching weekly
 read model `ec20ac1c-ef50-4db4-a5dc-364d31a259a5`
 (`generated_at=2026-06-07T19:25:19Z`, `window.days=7`) and emits zero task
 specs because that weekly read model has zero suggestions. The schedule remains
 disabled until a non-empty weekly read model, or an explicit operator decision
 that a zero-suggestion dry-run is an acceptable enablement proof, confirms
 correct routing and no duplicate target tasks on re-run.
--- a/workplans/ACTIVITY-WP-0009-intent-gap-closure.md
+++ b/workplans/ACTIVITY-WP-0009-intent-gap-closure.md
@@ -0,0 +1,250 @@
 ---
 id: ACTIVITY-WP-0009
 type: workplan
 title: "Intent gap closure"
 domain: custodian
 repo: activity-core
 status: blocked
 owner: codex
 topic_slug: custodian
 created: "2026-06-16"
 updated: "2026-06-18"
 state_hub_workstream_id: "d64cfbba-6da7-4737-afb9-866afa0e9cda"
 ---
 # ACTIVITY-WP-0009 - Intent gap closure
 ## Context
 The 2026-06-16 review of activity-core against `INTENT.md` found that the repo
 matches the intended Event Bridge shape, but several production and contract
 gaps remain before the implementation fully satisfies the operational promise:
 - recurring scheduled work must be trusted without manual coordination
 - live task creation must be proven through issue-core, not only null-sink audit
 - `review_required` semantics must either be implemented or documented as
  metadata only
 - ops evidence must either remain explicitly fallback-first or activate the
  Inter-Hub / ops-hub backend behind operator-owned secrets
 - the `TaskExecutorWorkflow` stub must not become a back door into execution
  ownership
 - the internal FastAPI surface needs an explicit production access decision
 The preserved analysis lives in:
 `history/2026-06-16-intent-gap-analysis.md`
 ## Close Daily Triage Scheduled-Run Trust Gap
 ```task
 id: ACTIVITY-WP-0009-T01
 status: wait
 priority: high
 state_hub_task_id: "7012e4fd-2530-49b7-9c2f-1d949809a144"
 ```
 Close the scheduled-run trust gap identified in `ACTIVITY-WP-0006-T03`.
 Acceptance criteria:
 - activity-core has three clean consecutive scheduled daily State Hub WSJF
  triage runs after the June 7 runtime projection failure
 - each run has matching Temporal workflow history, `activity_runs` row, State
  Hub `daily_triage` progress, and working-memory report note
 - calibration feedback is recorded in State Hub
 - `ACTIVITY-WP-0006-T03` can move from `wait` to `done`
 Current wait reason: as of 2026-06-16, State Hub `daily_triage` progress and
 working-memory `daily-triage-*` notes only show activity-core evidence through
 2026-06-06.
 2026-06-18 update: activity-core now consumes the verified in-cluster
 llm-connect Service URL in `k8s/railiance/20-runtime.yaml`:
 `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080` with
 `LLM_CONNECT_TIMEOUT_SECONDS=300`. This removes the activity-core repo-side URL
 gap. Closure still waits on the operator-owned provider Secret for llm-connect,
 a schema-valid fixture smoke, and three clean scheduled daily triage runs with
 matching State Hub and working-memory evidence.
 2026-06-18 follow-up: State Hub message
 `6a098e1e-65de-4309-ab4a-446aba2f3587` reports that the llm-connect side is now
 complete: the provider Secret has a populated key count and the in-namespace
 fixture smoke passed. The remaining work is the activity-core / Railiance
 runtime reconciliation and daily-triage evidence collection path captured in
 `ACTIVITY-WP-0010`.
 ## Promote Issue-Core Task Emission Safely
 ```task
 id: ACTIVITY-WP-0009-T02
 status: wait
 priority: high
 state_hub_task_id: "3854677b-32b4-43f8-a6ca-5a2b25a08dd9"
 ```
 Move selected production-safe definitions from `ISSUE_SINK_TYPE=null` audit mode
 toward real issue-core task creation.
 Acceptance criteria:
 - issue-core endpoint, credentials, and duplicate-handling posture are approved
  for the target environment
 - one known-safe definition is run first in null-sink mode and its task specs are
  reviewed
 - the same definition creates exactly the expected issue-core task(s) through
  `IssueCoreRestSink`
 - `task_spawn_log` records the real returned task references
 - rollback to null-sink mode is documented
 Current wait reason: production Railiance currently uses null-sink audit mode;
 live issue-core credentials/access and duplicate-handling are not yet verified
 for this repo.
 ## Resolve Review-Required Contract Drift
 ```task
 id: ACTIVITY-WP-0009-T03
 status: done
 priority: medium
 state_hub_task_id: "1eafe5e4-8412-4104-a417-933efe8e7bbd"
 ```
 Resolve the mismatch between ADR language and current code for
 `review_required`.
 Options:
 - implement an issue-core-owned pending review queue contract and route
  `review_required=true` instruction outputs there, or
 - update ADR/docs to state that `review_required` is currently audit/report
  metadata only
 Acceptance criteria:
 - `docs/adr/adr-003-rule-instruction-model.md`, `SCOPE.md`, and tests describe
  the same behavior
 - no ActivityDefinition implies a review queue exists unless that downstream
  contract is live
 - report/spawn metadata remains available for operator review either way
 2026-06-16: Completed by aligning ADR-003 with the implemented behavior:
 `review_required` is audit/report metadata only until issue-core owns a pending
 review queue contract. `SCOPE.md` already had the same boundary, and
 `tests/test_issue_sink.py` now asserts the REST issue sink does not send a
 `review_required` field as though a review queue existed.
 ## Decide And Gate Ops Evidence Backend
 ```task
 id: ACTIVITY-WP-0009-T04
 status: done
 priority: medium
 state_hub_task_id: "61300966-c119-4ebf-af89-a6c50df93ac8"
 ```
 Decide whether the `ops-inventory` evidence path should remain State Hub
 fallback-first for now or activate Inter-Hub / ops-hub submission.
 Acceptance criteria:
 - the decision is recorded in State Hub and the relevant docs/workplans
 - if fallback-first remains the chosen mode, docs explicitly say State Hub
  `ops_inventory_probe` progress is the accepted closure path
 - if Inter-Hub is activated, `OPS_HUB_KEY` is provisioned outside Git, widget /
  capability mapping is configured, and live submission is tested without
  printing or storing secrets
 2026-06-16: Completed the current posture decision. State Hub decision
 `7c235bbb-ee6f-4c3e-b1dd-74717eac9082` records that State Hub
 `ops_inventory_probe` progress is the accepted live evidence backend for now.
 Inter-Hub / ops-hub per-entity submission remains future work gated on
 operator-owned `OPS_HUB_KEY` custody, widget mapping, and production intake
 smoke tests. `docs/runbook.md` documents the fallback-first posture.
 ## Remove Or Rehome TaskExecutor Stub Risk
 ```task
 id: ACTIVITY-WP-0009-T05
 status: done
 priority: medium
 state_hub_task_id: "fbe3e822-1a7c-4fe6-8251-cc8a782b9516"
 ```
 Reduce the chance that `TaskExecutorWorkflow` attracts real execution work
 inside activity-core.
 Acceptance criteria:
 - decide whether the stub should stay registered, be removed, or be moved to an
  execution-owned repo/workplan
 - if it stays, docs and comments explicitly mark it as non-production and
  outside the activity-core ownership boundary
 - no production ActivityDefinition or workflow path depends on `task_instances`
  as task lifecycle state
 2026-06-16: Completed by deciding to keep `TaskExecutorWorkflow` registered only
 as a compatibility/idempotency stub. `src/activity_core/workflows.py` and
 `docs/conventions.md` now mark it as non-production and outside activity-core's
 execution boundary. No production ActivityDefinition uses `task_instances` for
 task lifecycle state.
 ## Decide FastAPI Production Access Posture
 ```task
 id: ACTIVITY-WP-0009-T06
 status: done
 priority: medium
 state_hub_task_id: "99e1e301-296b-4f78-8843-2a39e59ecd7d"
 ```
 Choose and document the production access posture for the FastAPI admin surface.
 Acceptance criteria:
 - operator decides whether the API remains ClusterIP-only or receives an
  authenticated ingress
 - if ingress is chosen, hostname, auth layer, allowed users/agents, and audit
  expectations are documented before exposure
 - runbook and Railiance deployment docs match the chosen posture
 2026-06-16: Completed the current access posture decision. State Hub decision
 `9ffaf7a9-227a-4e39-92e3-cd93d8cda1f2` records that the FastAPI admin surface
 remains ClusterIP-only until a separate authenticated ingress/access-policy work
 item chooses hostname, auth layer, allowed users/agents, and audit expectations.
 `docs/runbook.md` and `k8s/railiance/README.md` now agree on this posture.
 ## Completion Criteria
 - The historical findings are preserved under `history/`.
 - `SCOPE.md`, ADRs, workplans, and implementation agree on activity-core's
  boundary.
 - Daily scheduled triage has real consecutive-run calibration evidence.
 - At least one production-safe task creation path is proven against issue-core,
  or null-sink mode is explicitly accepted as the current production posture.
 - Ops evidence backend posture is explicit and tested in the chosen mode.
 - No registered workflow or API path invites activity-core to own execution,
  task lifecycle, project state, or privileged ops control.
 ## Implementation Pass - 2026-06-16
 Agent-actionable closure is complete for T03, T04, T05, and T06.
 Remaining waits:
 - T01 waits on real scheduled daily triage run evidence.
 - T02 waits on issue-core production endpoint/credentials and duplicate-handling
  approval.
 Verification:
 ```bash
 .venv/bin/pytest tests/test_issue_sink.py tests/rules/test_executor.py -k "review_required or issue_core_rest_sink"
 ```
 Result: 3 passed, 24 deselected.
 After this workplan is synced by the custodian operator, run from `~/state-hub`:
 ```bash
 make fix-consistency REPO=activity-core
 ```
--- a/workplans/ACTIVITY-WP-0010-daily-triage-llm-reconciliation.md
+++ b/workplans/ACTIVITY-WP-0010-daily-triage-llm-reconciliation.md
@@ -0,0 +1,225 @@
 ---
 id: ACTIVITY-WP-0010
 type: workplan
 title: "Daily Triage LLM Reconciliation And Evidence"
 domain: custodian
 repo: activity-core
 status: blocked
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
 updated: "2026-06-27"
 state_hub_workstream_id: "f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9"
 ---
 # ACTIVITY-WP-0010 - Daily Triage LLM Reconciliation And Evidence
 ## Context
 This workplan implements the in-scope portion of the latest activity-core
 suggestion review against `INTENT.md` and `SCOPE.md`.
 Relevant accepted suggestion:
 - State Hub message `6a098e1e-65de-4309-ab4a-446aba2f3587` from
  `llm-connect` says `LLM-WP-0006` is complete on the llm-connect side. The
  stable Service URL is
  `http://llm-connect.activity-core.svc.cluster.local:8080`, timeout remains
  `300`, the provider Secret reports populated key count, and the in-namespace
  fixture smoke passed with schema-valid endpoint behavior.
 Why this belongs in activity-core:
 - `INTENT.md` says activity-core owns the **when/what/where** loop for
  scheduled coordination work.
 - `SCOPE.md` keeps LLM instruction execution in scope through the llm-connect
  boundary, while keeping provider credentials and cluster reconciliation out of
  scope.
 - `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` remain open because daily
  State Hub WSJF triage has not yet produced three clean scheduled runs after
  the June 7 runtime projection failure.
 Suggestions reviewed but not accepted as product/runtime implementation work:
 - `coding_retro` activity-core suggestions for Bash tool thrash, schema thrash,
  and read-before-edit hygiene are agent workflow advice. They are useful for
  Codex operating style, but they do not change activity-core's Event Bridge
  product surface and should not become runtime code.
 - The earlier local-kubectl / cluster-owned evidence suggestion for
  `ACTIVITY-WP-0007` has already been handled by moving live evidence ownership
  to Railiance and closing the workplan from cluster-owned proof.
 Latest evidence before this workplan:
 - State Hub `daily_triage` progress on 2026-06-18 still shows
  `LLM_CONNECT_URL is not configured`, which means the live activity-core
  runtime has not yet consumed the repo-side URL update.
 - `k8s/railiance/20-runtime.yaml` now sets the verified llm-connect Service URL
  and `LLM_CONNECT_TIMEOUT_SECONDS=300`.
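For orientation, the way a worker might read these two settings can be sketched as below. This is an assumption-laden illustration: `load_llm_connect_settings` is a hypothetical helper, and the fallback to `300` when the timeout is unset is assumed here, not confirmed by the repo.

```python
from typing import Mapping


def load_llm_connect_settings(env: Mapping[str, str]) -> tuple[str, float]:
    """Return (url, timeout_seconds) or fail loudly when the URL is missing."""
    url = env.get("LLM_CONNECT_URL", "").strip()
    if not url:
        # Same wording as the failure seen in the 2026-06-18 progress event.
        raise RuntimeError("LLM_CONNECT_URL is not configured")
    # Assumed default; the projected ConfigMap sets this value explicitly.
    timeout = float(env.get("LLM_CONNECT_TIMEOUT_SECONDS", "300"))
    return url, timeout


projected = {
    "LLM_CONNECT_URL": "http://llm-connect.activity-core.svc.cluster.local:8080",
    "LLM_CONNECT_TIMEOUT_SECONDS": "300",
}
url, timeout = load_llm_connect_settings(projected)
```

A runtime that has not consumed the updated ConfigMap would pass an environment without the URL and hit the `RuntimeError` branch.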
 ## Confirm Repo-Side Runtime Contract
 ```task
 id: ACTIVITY-WP-0010-T01
 status: done
 priority: high
 state_hub_task_id: "dd52ce21-23b8-4e46-b3af-cb7bf486e40f"
 ```
 Update activity-core's Railiance runtime projection so the daily triage worker
 consumes the verified llm-connect Service URL by default.
 Done when:
 - `k8s/railiance/20-runtime.yaml` sets
  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
 - `LLM_CONNECT_TIMEOUT_SECONDS=300` remains configured.
 - Wiring tests assert the URL and timeout.
 - The Railiance README states that provider credentials remain operator-owned
  and outside Git / State Hub.
 2026-06-18: Completed. Updated the runtime ConfigMap, README, and
 `tests/test_railiance_ops_inventory_wiring.py`. Focused tests passed:
 `tests/test_railiance_ops_inventory_wiring.py tests/test_llm_client.py`
 reported 9 passed.
 ## Reconcile Live Railiance Runtime
 ```task
 id: ACTIVITY-WP-0010-T02
 status: done
 priority: high
 state_hub_task_id: "23545ddc-926b-485a-8535-5cc11e01134a"
 ```
 Apply or reconcile the updated activity-core Railiance runtime through the
 cluster-owned deployment path, not through ad hoc local kubectl from this repo.
 Done when non-secret evidence shows:
 - live `actcore-runtime-config` has the verified `LLM_CONNECT_URL` and timeout;
 - the activity-core worker has restarted or otherwise consumed the new config;
 - `activity-core/llm-connect-provider-secrets` remains present with a populated
  key count only, without printing or storing secret values;
 - the State Hub bridge remains reachable from the activity-core runtime.
 Current wait reason: this is Railiance/operator-owned live cluster work. State
 Hub handoff message `9a074b7c-4b87-4e3c-a6bf-e1fe5580daa8` asks
 `railiance-cluster` to reconcile the updated config and smoke it.
 2026-06-19 recheck:
 - Deployed `llm-connect` into the `activity-core` namespace on `railiance01`
  (the cluster that runs `actcore-worker`). `coulombcore` had llm-connect only;
  the in-cluster Service URL is cluster-local.
 - `actcore-runtime-config` already exposed the verified URL and timeout;
  `deployment/actcore-worker` was restarted and now reports
  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
 - `llm-connect-provider-secrets` reports `DATA 1`; no Secret values were
  inspected.
 - Worker health probe to llm-connect `/health` returns `{"status": "ok"}`.
 - `actcore-state-hub-bridge` remains `0/1` Ready with upstream timeouts, so T02
  is not fully closed until the node-local State Hub tunnel is restored.
 2026-06-27 recheck:
 - Superseded by real scheduled runner evidence: State Hub daily_triage events on
  2026-06-24, 2026-06-25, 2026-06-26, and 2026-06-27 all reached State Hub and
  wrote working-memory notes. The bridge/sink is therefore reachable for the
  live runner.
 - 2026-06-24 and 2026-06-25 were schema-valid; 2026-06-26 and 2026-06-27 failed
  output validation after calling llm-connect. That moves the active blocker out
  of T02 and into the WP-0016 live bundle/smoke lane. Marking T02 done.
 ## Run Daily Triage Fixture Smoke
 ```task
 id: ACTIVITY-WP-0010-T03
 status: wait
 priority: high
 state_hub_task_id: "10e0df77-c230-4a82-b720-23c66bd17c0a"
 ```
 After T02, run a manual or smoke execution of
 `daily-statehub-wsjf-triage` against the live activity-core runtime.
 Done when:
 - the run calls llm-connect through the configured Service URL;
 - llm-connect returns content accepted as schema-valid daily-triage JSON;
 - State Hub receives a `daily_triage` progress item with `output_validated=true`;
 - the working-memory daily-triage note exists at the path recorded in State Hub
  detail;
 - `scripts/verify_daily_triage.py` reports the smoke/manual run as present.
 2026-06-19 recheck:
 - In-namespace llm-connect fixture smoke on `railiance01` passed:
  `smoke: pass health=ok latency_seconds=1.681 recommendations=1`.
 - Manual `POST /activity-definitions/6fca51fa-387a-4fd0-bc4e-d62c29eb859a/trigger`
  reached llm-connect, but the workflow failed at `persist_instruction_reports`
  with `state-hub-progress` sink `Connection refused` while
  `actcore-state-hub-bridge` is unhealthy.
 - T03 therefore remains open until State Hub bridge reachability is restored and
  a run emits non-secret `daily_triage` progress with `output_validated=true`.
 2026-06-27 recheck:
 - Scheduled runs on 2026-06-24 and 2026-06-25 satisfy the non-secret smoke
  evidence for llm-connect call, State Hub progress with output_validated=true,
  and working-memory note creation.
 - Kept T03 at progress rather than done because the workstation did not run the
  live verifier against Temporal/activity-core DB, and the smoke must be repeated
  after the WP-0016 code/schema/runtime-prompt deployment due to the 2026-06-26 and
  2026-06-27 malformed-output failures.
 ## Collect Three Clean Scheduled Runs
 ```task
 id: ACTIVITY-WP-0010-T04
 status: wait
 priority: high
 state_hub_task_id: "dc6b9482-cf43-4fc5-994b-dcd7dea47db7"
 ```
 Let the normal 07:20 Europe/Berlin schedule produce three consecutive clean
 daily triage runs after the live config reconciliation.
 Done when:
 - three consecutive scheduled runs have Temporal workflow evidence,
  `activity_runs` rows, State Hub `daily_triage` progress, and working-memory
  notes;
 - none of the three runs are merely manual smoke tests or `execution_failed`
  diagnostics;
 - calibration feedback is recorded in State Hub;
 - `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` can move from `wait` to
  `done`.
 2026-06-27 recheck:
 - Three-clean-run streak is reset. The latest sequence is 2026-06-24 clean,
  2026-06-25 clean, 2026-06-26 validation_failed, 2026-06-27 validation_failed.
 - Current pickup is to deploy ACTIVITY-WP-0016 code/schema together with the
  Railiance runtime prompt and max_tokens changes, run a live smoke, then restart
  the three-consecutive-scheduled-run gate from zero.
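The streak rule behind this gate can be expressed in a few lines. The sketch below is illustrative only: `clean_streak` and `REQUIRED_CLEAN_RUNS` are hypothetical names, and the status labels are taken from the sequence recorded above.

```python
REQUIRED_CLEAN_RUNS = 3


def clean_streak(outcomes: list[tuple[str, str]]) -> int:
    """Count consecutive clean runs at the end of a date-ordered run list."""
    streak = 0
    for _date, status in reversed(outcomes):
        if status != "clean":
            break  # any non-clean run resets the streak
        streak += 1
    return streak


runs = [
    ("2026-06-24", "clean"),
    ("2026-06-25", "clean"),
    ("2026-06-26", "validation_failed"),
    ("2026-06-27", "validation_failed"),
]
current = clean_streak(runs)
gate_passed = current >= REQUIRED_CLEAN_RUNS
```

Counting from the most recent run backwards is what makes the gate restart from zero after a failure, regardless of how many clean runs preceded it.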
 ## Close Handoff State
 ```task
 id: ACTIVITY-WP-0010-T05
 status: wait
 priority: medium
 state_hub_task_id: "ecc57e21-1716-4daa-aba6-d8a6d824e4ed"
 ```
 Update the surrounding workplans and State Hub once the live daily triage gate
 passes.
 Done when:
 - `ACTIVITY-WP-0006` records the three-run calibration evidence;
 - `ACTIVITY-WP-0009` records the scheduled-run trust gap closure;
 - any temporary `needs_human` flags created for the llm-connect provider/config
  handoff are cleared or replaced by a narrower follow-up;
 - this workplan is marked `finished`.
--- a/workplans/ACTIVITY-WP-0011-event-payload-context-resolver.md
+++ b/workplans/ACTIVITY-WP-0011-event-payload-context-resolver.md
@@ -0,0 +1,179 @@
 ---
 id: ACTIVITY-WP-0011
 type: workplan
 title: "Event Payload Context Resolver"
 domain: custodian
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
 updated: "2026-06-18"
 state_hub_workstream_id: "4efe4bcf-2148-4489-b57c-87f6039d4ed5"
 ---
 # ACTIVITY-WP-0011 - Event Payload Context Resolver
 ## Context
 State Hub message `d561ebd7-ba01-4dc6-8ffc-fe87d45304ee` from
 `kaizen-agentic` handed off an urgent blocker for LOOP-WP-0002:
 event-triggered definitions can receive the triggering EventEnvelope JSON, but
 activity-core did not bind `source.type: event-payload` into the context
 snapshot. The immediate customer is the disabled
 `coulomb-low-success-rate-review` ActivityDefinition, whose
 `flag-low-success-rate` rule needs to evaluate
 `context.metrics.summary.success_rate`.
 This is in activity-core scope because the repo owns ActivityDefinition context
 resolution and the Event Bridge workflow boundary. The remaining event type
 registry and live NATS smoke evidence are cross-repo/operator gates and should
 wait in State Hub rather than depending on local kubectl or ad hoc live cluster
 access from this repo.
 ## Implement Event Payload Resolver
 ```task
 id: ACTIVITY-WP-0011-T01
 status: done
 priority: high
 state_hub_task_id: "5c87ce0b-3bd0-4a44-aae5-10d7586c939e"
 ```
 Register resolver type `event-payload` so event-triggered definitions can bind
 the triggering EventEnvelope attributes into `context.*`.
 Done when:
 - `activity_core.context_resolvers` imports and registers an `event-payload`
  resolver.
 - `resolve_context` parses `event_envelope_json` once and passes the parsed
  envelope to registered resolvers.
 - `source.type: event-payload` extracts envelope `attributes`.
 - `bind_to: context.metrics` strips the `context.` prefix and unwraps a
  single-key `{"metrics": ...}` attributes payload into `snapshot["metrics"]`.
 - Missing or malformed envelopes fail required sources visibly and bind `{}` for
  optional sources.
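The binding behavior listed above can be sketched roughly as follows. This is not the code in `event_payload.py`: the function names, the `RequiredSourceError` type, and the exact unwrap condition (attributes containing only the bound key) are assumptions made for illustration.

```python
import json
from typing import Any, Optional


class RequiredSourceError(RuntimeError):
    """Hypothetical error for a required source that cannot be resolved."""


def parse_envelope(event_envelope_json: Optional[str]) -> Optional[dict[str, Any]]:
    """Parse the envelope once; return None when it is missing or malformed."""
    if not event_envelope_json:
        return None
    try:
        envelope = json.loads(event_envelope_json)
    except json.JSONDecodeError:
        return None
    return envelope if isinstance(envelope, dict) else None


def bind_event_payload(
    envelope: Optional[dict[str, Any]],
    bind_to: str,
    required: bool,
    snapshot: dict[str, Any],
) -> None:
    """Bind envelope attributes into the snapshot under the stripped key."""
    key = bind_to.removeprefix("context.")  # "context.metrics" -> "metrics"
    attributes = envelope.get("attributes") if envelope else None
    if not isinstance(attributes, dict):
        if required:
            raise RequiredSourceError(f"event-payload source for {bind_to} is missing")
        snapshot[key] = {}  # optional sources bind an empty mapping
        return
    if set(attributes) == {key}:
        attributes = attributes[key]  # unwrap {"metrics": {...}} to its value
    snapshot[key] = attributes
```

Unwrapping the single-key payload avoids a doubled path such as `context.metrics.metrics.summary`, so rules can address `context.metrics.summary.success_rate` directly.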
 2026-06-18: Completed in `src/activity_core/activities.py` and
 `src/activity_core/context_resolvers/event_payload.py`.
 ## Cover Binding And Rule Evaluation
 ```task
 id: ACTIVITY-WP-0011-T02
 status: done
 priority: high
 state_hub_task_id: "c6f7dea6-9adc-4997-a22e-4bf2e94dc05a"
 ```
 Add focused tests for the handoff acceptance contract.
 Done when:
 - sample `kaizen.metrics.recorded` envelope attributes resolve to:
  `{"metrics": {"agent": "coach", "project": "kaizen-agentic", "summary": ...}}`;
 - `flag-low-success-rate` evaluates
  `context.metrics.summary.success_rate < 0.8`;
 - optional missing envelopes bind `{}`;
 - required missing envelopes raise a visible activity failure.
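A rough picture of the rule check these tests exercise is given below. The real evaluator lives in the repo's rules package and is not shown in this diff, so `lookup` and `flag_low_success_rate` are hypothetical stand-ins; only the path and the `0.8` threshold come from the criteria above.

```python
from typing import Any


def lookup(snapshot: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as context.metrics.summary.success_rate."""
    parts = path.split(".")
    if parts and parts[0] == "context":
        parts = parts[1:]  # the snapshot itself is the context root
    current: Any = snapshot
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def flag_low_success_rate(snapshot: dict[str, Any], threshold: float = 0.8) -> bool:
    """Return True when the recorded success rate is below the threshold."""
    value = lookup(snapshot, "context.metrics.summary.success_rate")
    return isinstance(value, (int, float)) and value < threshold


low = flag_low_success_rate({"metrics": {"summary": {"success_rate": 0.75}}})
healthy = flag_low_success_rate({"metrics": {"summary": {"success_rate": 0.9}}})
unbound = flag_low_success_rate({"metrics": {}})
```

In this sketch an optional source bound as `{}` never matches the rule, so a missing envelope cannot spawn a task by accident.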
 2026-06-18: Completed in `tests/test_resolve_context_binding.py`. Focused
 tests passed:
 `.venv/bin/python -m pytest tests/test_resolve_context_binding.py tests/test_rule_evaluation_activity.py`
 reported 8 passed, and adjacent rule tests
 `.venv/bin/python -m pytest tests/rules/test_evaluator.py tests/rules/test_actions.py`
 reported 55 passed.
 ## Wait For Event Type Registry
 ```task
 id: ACTIVITY-WP-0011-T03
 status: done
 priority: high
 state_hub_task_id: "a4f277de-eb83-41bc-860e-b26586c72495"
 ```
 Confirm that `kaizen.metrics.recorded` is registered in the shared event type
 catalog through the owning State Hub / producer workflow.
 Done when:
 - State Hub or the producer-owned event catalog exposes
  `kaizen.metrics.recorded` with an attributes schema covering
  `metrics.agent`, `metrics.project`, and `metrics.summary.success_rate`;
 - the registry decision names the owning repo for future schema changes;
 - activity-core has no local-only event type drift from the producer contract.
 Registry ownership: the event type is producer/catalog owned. Activity-core
 accepted State Hub-backed registry confirmation before closing the workplan.
 2026-06-18: Closed from State Hub acknowledgement
 `3efb56d8-c3d6-4308-82ea-76eaaa172255` from `kaizen-agentic`. The producer
 registered `kaizen.metrics.recorded` in `kaizen-agentic/event-types/` with
 status `active`, publisher `kaizen-agentic`, and schema fields
 `agent`, `project`, `summary.success_rate`, `summary.execution_count`, and
 `summary.avg_quality`. The sync command reported was
 `ACTIVITY_DEFINITION_DIRS=~/coulomb-loop:~/kaizen-agentic make sync-event-types`.
 ## Wait For Live Event Smoke
 ```task
 id: ACTIVITY-WP-0011-T04
 status: done
 priority: high
 state_hub_task_id: "3b636d5e-8f93-49b4-ae53-3da4f736a4d9"
 ```
 After T03, run the live event-triggered path without relying on local kubectl
 from activity-core.
 Done when State Hub records non-secret evidence that:
 - a sample `kaizen.metrics.recorded` envelope was published on the expected NATS
  subject;
 - activity-core triggered `coulomb-low-success-rate-review`;
 - the resolved context snapshot contained `context.metrics.summary.success_rate`;
 - `flag-low-success-rate` matched and produced the expected task/report output;
 - any disabled-definition or operator-controlled enablement state was recorded.
 Execution ownership: this cross-repo/live-runtime smoke was owned by the event
 producer, customer definition owner, and cluster/operator path. Activity-core
 accepted the non-secret evidence from State Hub.
 2026-06-18: Closed from State Hub acknowledgement
 `68bfcd0d-7c47-4b42-85fc-64d63f38a909` from `kaizen-agentic`.
 Supplier confirms R1 acceptance criteria met and LOOP-WP-0002 closed. Evidence:
 NATS `activity.kaizen.metrics.recorded` triggered
 `coulomb-low-success-rate-review` (`da7a9af7`), run
 `e61554c6-1e67-5fa1-b34e-478d154a188e`, `tasks_spawned=1`, with
 `metrics.summary.success_rate=0.75`.
 ## Close Handoff
 ```task
 id: ACTIVITY-WP-0011-T05
 status: done
 priority: medium
 state_hub_task_id: "5169d8c5-769f-4272-97cf-c25b31087601"
 ```
 Close the urgent R1/live-smoke handoff once State Hub has acknowledgement that
 the resolver-side blocker is removed. The broader workplan remains blocked only
 on T03 event-type registry confirmation.
 Done when:
 - State Hub message `d561ebd7-ba01-4dc6-8ffc-fe87d45304ee` is answered or
  linked to this workplan;
 - `kaizen-agentic` / LOOP-WP-0002 can proceed without an activity-core code
  blocker;
 - this workplan has no remaining activity-core code or live-smoke blocker.
 2026-06-18: Closed from State Hub acknowledgement
 `68bfcd0d-7c47-4b42-85fc-64d63f38a909`. The original handoff message
 `d561ebd7-ba01-4dc6-8ffc-fe87d45304ee` was answered, and the live smoke
 evidence in T04 unblocks LOOP-WP-0002.
 2026-06-18: Workplan finished. T03 registry confirmation, T04 live event smoke,
 and T05 handoff closure are all done in State Hub.
--- a/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
+++ b/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
@@ -0,0 +1,192 @@
 ---
 id: ACTIVITY-WP-0012
 type: workplan
 title: "Definition And Schedule Hot Reload"
 domain: custodian
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
 updated: "2026-06-22"
 state_hub_workstream_id: "8887075e-21ec-451b-b82b-cd81035c9ca5"
 ---
 # ACTIVITY-WP-0012 - Definition And Schedule Hot Reload
 ## Context
 State Hub message `f4876517-f738-4571-a2d6-76f2965e9a13` from
 `coulomb-loop` reports an operational gap from the Coulomb cadence ramp: after
 renaming customer definitions from hourly to daily, operators had to run
 definition/schedule sync and restart the worker before new Temporal schedule
 state was reliable.
 Current behavior:
 - `worker.py` runs `sync_activity_definitions` and `sync_schedules` once at
  startup.
 - `RunActivityWorkflow` loads ActivityDefinitions from the DB at activity time.
 - The event router reloads enabled event definitions per NATS message.
 - Cron schedule changes only take effect when `sync_schedules` runs.
 This belongs in activity-core because the repo owns ActivityDefinition sync,
 Temporal schedule projection, and the admin API. The first implementation
 should expose an operator-triggered sync path without turning activity-core into
 a repo checkout manager or CI system.
 ## Extract Reusable Sync Service
 ```task
 id: ACTIVITY-WP-0012-T01
 status: done
 priority: high
 state_hub_task_id: "53a7970b-7eec-47f5-ad30-bbd7c6271952"
 ```
 Refactor the worker-startup sync sequence into a reusable async service that can
 be called by startup and the API.
 Done when:
 - the service can run ActivityDefinition sync, event type sync, and Temporal
  schedule sync independently based on booleans;
 - it accepts the existing DB session factory / Temporal client dependencies
  without creating hidden global state;
 - startup behavior remains unchanged except for calling the shared service;
 - failures are collected into a bounded `errors[]` result while preserving the
  current startup best-effort behavior.
 2026-06-19: Completed. Added `activity_core.sync_service.run_sync`, which
 orchestrates ActivityDefinition, event type, and schedule sync independently
 from explicit DB session factory and Temporal client dependencies. Worker
 startup now calls the shared service for definitions+schedules and logs bounded
 stage errors while continuing startup.
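A minimal sketch of the shape described above, assuming each sync stage is exposed as an async callable. `SyncResult`, `MAX_ERRORS`, and the `stages` mapping are hypothetical names used for illustration only. The real `activity_core.sync_service.run_sync` takes a DB session factory and a Temporal client, and its actual signature is not shown in this workplan.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

MAX_ERRORS = 20  # hypothetical bound on the errors[] list


@dataclass
class SyncResult:
    stages_run: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


async def run_sync(
    stages: dict[str, Callable[[], Awaitable[None]]],
    *,
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
) -> SyncResult:
    """Run each enabled stage independently; collect failures instead of raising."""
    enabled = {
        "definitions": definitions,
        "event_types": event_types,
        "schedules": schedules,
    }
    result = SyncResult()
    for name, wanted in enabled.items():
        if not wanted:
            continue
        try:
            await stages[name]()
            result.stages_run.append(name)
        except Exception as exc:  # best-effort: record the error and continue
            if len(result.errors) < MAX_ERRORS:
                result.errors.append(f"{name}: {exc}"[:200])
    return result


async def _ok() -> None:
    return None


async def _boom() -> None:
    raise RuntimeError("temporal unavailable")


# Definitions succeed, event types are skipped by default, schedules fail.
demo = asyncio.run(
    run_sync({"definitions": _ok, "event_types": _ok, "schedules": _boom})
)
print(demo)
```

Passing the stages in explicitly keeps the sketch free of hidden global state, which is the property the acceptance criteria ask for.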
 ## Add Admin Sync Endpoint
 ```task
 id: ACTIVITY-WP-0012-T02
 status: done
 priority: high
 state_hub_task_id: "8697c761-15d1-4da0-b66b-d838218a2495"
 ```
 Add an operator-only API endpoint:
 `POST /admin/sync?definitions=true&schedules=true&event_types=true`
 Done when:
 - the endpoint runs the shared sync service without requiring worker restart;
 - response JSON reports counts for definitions, event types, schedules upserted,
  schedules paused/deleted, and errors;
 - default parameters sync definitions and schedules, with event types opt-in or
  clearly documented;
 - endpoint tests cover definitions-only, schedules-only, all-sync, and failure
  result behavior.
 2026-06-19: Completed. Added `POST /admin/sync` with defaults
 `definitions=true`, `schedules=true`, and `event_types=false`. The response
 reports definition/event counts, schedule upsert/pause/orphan-delete counts, and
 bounded `errors[]`. Tests cover definitions-only, schedules-only, all-sync, and
 failure-result behavior.
 ## Preserve Schedule Drift Semantics
 ```task
 id: ACTIVITY-WP-0012-T03
 status: done
 priority: high
 state_hub_task_id: "efeac412-632c-4c90-9428-bb575ac7a624"
 ```
 Make the sync result explicit enough for cadence changes and renames.
 Done when:
 - disabled cron definitions pause their Temporal schedules on sync;
 - renamed definitions create the new schedule and pause/delete orphaned old
  schedules according to the existing `sync_schedules` semantics;
 - event-triggered definitions remain hot through the existing router DB reload
  path;
 - regression tests demonstrate the Coulomb hourly-to-daily rename shape without
  needing a worker restart.
 2026-06-19: Completed. `sync_schedules` now returns explicit counts for enabled
 schedule upserts, disabled schedule pauses, and orphan deletes. Regression tests
 cover the hourly-to-daily rename shape: a new enabled cron schedule is upserted,
 the old disabled cron schedule is preserved as paused, unrelated orphan
 schedules are deleted, event-triggered definitions do not create schedules, and
 one-shot scheduled definitions are no longer mistaken for orphans.
 ## Optional Background Sync Loop
 ```task
 id: ACTIVITY-WP-0012-T04
 status: done
 priority: medium
 state_hub_task_id: "d774087b-c51d-4444-8e90-bfef43765456"
 ```
 Decide whether to add a periodic sync loop after the admin endpoint exists.
 Done when:
 - either `ACTIVITY_SYNC_INTERVAL_SECONDS` is implemented, disabled by default or
  set to a conservative interval, or the workplan records why manual/admin-triggered
  sync is the safer v1 posture;
 - if implemented, logs and metrics expose the last successful sync timestamp and
  last error summary;
 - the loop does not block worker startup or workflow task processing.
 2026-06-19: Completed by decision. v1 stays manual/operator-triggered through
 `POST /admin/sync`; no background loop was added. The runbook records this
 posture so customer definition changes stay explicit and the worker does not
 start background repo scanning. A periodic loop remains a future option if live
 operator use proves it is needed.
 ## Live No-Restart Smoke
 ```task
 id: ACTIVITY-WP-0012-T05
 status: done
 priority: high
 state_hub_task_id: "68a0e22a-106a-4d21-9f39-c6279850cb5e"
 ```
 Validate the hot-reload path in the cluster/operator environment.
 Done when non-secret State Hub evidence shows:
 - a customer repo definition rename or `enabled` flip is synced through
  `/admin/sync`;
 - new Temporal schedules are active and retired schedules are paused/deleted
  without worker SIGTERM or pod restart;
 - event-triggered definitions still fire normally;
 - rollback or repeat sync is idempotent.
 2026-06-22: Completed on Railiance01 (`KUBECONFIG=~/.kube/config-hosteurope`).
 Smoke target: disabled projection `ops-service-inventory-probes`
 (`40d15a87-7ff6-4d8e-992c-37df15f95110`) in
 `actcore-external-activity-definitions`.
 Evidence:
 - ConfigMap flip `enabled: false -> true` and cadence `15 * * * * -> 25 * * * *`,
  then `POST /admin/sync?definitions=true&schedules=true` from `actcore-api`.
 - DB after sync: `enabled=true`, `cron=25 * * * *`.
 - Temporal schedule after sync: `paused=false`, calendar minute `25`.
 - Repeat sync returned identical schedule counts
  (`upserted=5`, `paused=1`, `deleted_orphans=0`) — idempotent.
 - Rollback flip restored `enabled=false`, `cron=15 * * * *`, schedule
  `paused=true`, calendar minute `15`.
 - `actcore-worker` pod UID unchanged (`a68d6539-2bba-457e-a78a-39564002a980`,
  started `2026-06-21T18:46:46Z`); `actcore-event-router` pod UID unchanged.
 - Event-triggered definitions: none projected on Railiance01 today; hot DB
  reload path for event definitions remains covered by T03 unit tests and an
  unchanged event-router deployment.
 Automation: `scripts/smoke_admin_sync_no_restart.py`. Runbook section added
 under "Railiance01 no-restart smoke".
--- a/workplans/ACTIVITY-WP-0013-reuse-surface-report-gaps-resolver.md
+++ b/workplans/ACTIVITY-WP-0013-reuse-surface-report-gaps-resolver.md
@@ -0,0 +1,78 @@
 ---
 id: ACTIVITY-WP-0013
 type: workplan
 title: "Reuse Surface Report Gaps Resolver"
 domain: custodian
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: activity-core
 created: "2026-06-18"
 updated: "2026-06-18"
 state_hub_workstream_id: "01e68dfd-b146-4aef-a575-2d3b178ca5c2"
 ---
 # Reuse Surface Report Gaps Resolver
 Implement the R2 handoff from kaizen-agentic (`bffa224c`) so the
 `reuse_surface_report_gaps` shell context source populates
 `context.gaps` for the Coulomb daily registry hygiene sweep.
 ## Register Shell Resolver Query
 ```task
 id: ACTIVITY-WP-0013-T01
 status: done
 priority: high
 state_hub_task_id: "a6e1fc5c-7b42-436d-914e-4d605cb6f329"
 ```
 Add a dedicated reuse-surface context resolver module and register
 `reuse_surface_report_gaps` on the `shell` resolver path while preserving
 the existing kaizen shell query behavior.
 ## Implement Batch And Signal Semantics
 ```task
 id: ACTIVITY-WP-0013-T02
 status: done
 priority: high
 state_hub_task_id: "229cf285-8388-471d-95fd-08400db1553e"
 ```
 Load the Coulomb rollout roster, select active repos with a persisted
 round-robin cursor, resolve repo roots from State Hub host paths, run
 `reuse-surface report gaps --format json`, and emit gap records for the
 enabled registry hygiene signals.
 ## Cover Required And Optional Failure Modes
 ```task
 id: ACTIVITY-WP-0013-T03
 status: done
 priority: high
 state_hub_task_id: "85b5c7d4-40e1-4945-8ada-1dff2363c194"
 ```
 Ensure missing required dependencies fail visibly while optional resolver
 sources bind an empty `context.gaps` list. Add unit coverage for fixture
 rollout data, mocked CLI JSON, resolver binding, and `hygiene_signal`
 rule gating.
 ## Smoke Real Coulomb Rollout
 ```task
 id: ACTIVITY-WP-0013-T04
 status: done
 priority: medium
 state_hub_task_id: "6a5446ed-b4ec-4693-b508-65415571d834"
 ```
 Run a live resolver smoke against
 `/home/worsch/coulomb-loop/loops/registry-hygiene/rollout.yaml` using a
 temporary round-robin cursor. The real active rollout produced five gaps,
 including one for `reuse-surface` with `hygiene_signal: stale_sbom`.
 The smoke supplied `reuse_surface_bin:
 /home/worsch/reuse-surface/.venv/bin/reuse-surface` and
 `runner_host: bnt-lap001`; the worker environment or definition params must
 provide equivalent values before enabling the production sweep.
--- a/workplans/ACTIVITY-WP-0014-schedule-misfire-robustness.md
+++ b/workplans/ACTIVITY-WP-0014-schedule-misfire-robustness.md
@@ -0,0 +1,194 @@
 ---
 id: ACTIVITY-WP-0014
 type: workplan
 title: "Schedule Misfire Robustness & Run-Miss Recovery Options"
 domain: infotech
 repo: activity-core
 status: finished
 owner: claude
 topic_slug: activity-core
 created: "2026-06-23"
 updated: "2026-06-24"
 status_note: "T01-T05 complete; beachhead-endpoint adoption split to ACTIVITY-WP-0015"
 state_hub_workstream_id: "91b64686-5d17-4c86-bc9e-3d0ee6720cf5"
 ---
 # Schedule Misfire Robustness & Run-Miss Recovery Options
 Make cron-triggered ActivityDefinitions robust to missed fires (worker/Temporal
 unavailable at trigger time) with explicit, per-definition recovery behaviour,
 plus detection/alerting when a scheduled fire is missed.
 ## Motivation
 On 2026-06-22 and 2026-06-23 the `daily-statehub-wsjf-triage` definition
 (cron `20 7 * * *` Europe/Berlin, projected into the Railiance runtime ConfigMap
 `actcore-external-activity-definitions`) produced **no `daily_triage` progress
 event at all** — neither a success nor a `could not run; operator review
 required` failure.
 > **Corrected by T01 (2026-06-23).** The initial hypothesis below — that
 > `_build_schedule()` never set `catchup_window`, so a short-default catchup
 > window silently dropped the fire — was **disproven on the live cluster**. The
 > Temporal schedule is healthy with `CatchupWindow 365d` (the server default) and
 > `0 MissedCatchupWindow`. The real cause is that the run **fired and ran but
 > failed at the report sink** with `Connection refused` posting to State Hub,
 > because railiance01 reaches State Hub via a reverse tunnel back to the
 > workstation, which is asleep at 07:20 Berlin. See the T01 findings and T05.
 The trigger now originates entirely on **railiance01** (in-cluster Temporal
 Schedule, ConfigMap-projected definition) and is **not** laptop-dependent — but
 the triage's State Hub *data dependencies* (context resolution and report
 delivery) still route back to the workstation State Hub.
 This workplan still delivers worthwhile robustness — explicit run-miss recovery
 policies (T02) and missed-fire detection (T03) — but the fix for *this* incident
 is T05 (resilient sinks/resolvers + a workstation-independent State Hub endpoint).
 ## Desired run-miss options (from Bernd)
 Three explicit, per-definition behaviours when a fire is missed:
 1. **Run on trigger or skip** — never recover a missed fire.
 2. **Run on trigger or later if missed** — recover **all** missed fires when back up.
 3. **Run on trigger or later if missed, but skip if next trigger reached** —
   recover only the **most recent** missed fire; do not accumulate a backlog.
 Proposed mapping to a new `misfire_policy` value set (names open to review):
 | Policy | Semantics | Temporal mapping |
 | --- | --- | --- |
 | `skip` | Run on trigger or skip | `catchup_window ≈ 0`, `overlap=SKIP` |
 | `catchup_all` | Run on trigger or all missed later | `catchup_window=<long>`, `overlap=BUFFER_ALL` |
 | `catchup_latest` | Run on trigger or only the latest missed | `catchup_window ≈ 1 interval`, `overlap=BUFFER_ONE` |
 ## Confirm root cause on Railiance01
 ```task
 id: ACTIVITY-WP-0014-T01
 status: done
 priority: high
 state_hub_task_id: "c90ff214-9214-48c7-96b9-7d699528d5ab"
 ```
 Inspected via `ssh railiance01` + in-node `kubectl`/`temporal` (no k3s tunnel is
 defined for railiance01; the documented access path is SSH to the host).
 **Findings (2026-06-23) — the WP-0014 premise was wrong for this incident:**
 - All pods healthy; `actcore-worker` up 44h, 0 restarts. Not a crash.
 - The daily-triage Temporal schedule (`activity-schedule-6fca51fa-…`) is
  **healthy**: `Paused false`, `OverlapPolicy Skip`, **`CatchupWindow 365d`**
  (Temporal's *default* when unset), `ActionCounts {Total:8, MissedCatchupWindow:0}`.
  So fires were **not** silently dropped — my original "no catchup window → silent
  drop" hypothesis does not hold; the server default is already 365d.
 - The `2026-06-23T05:20:00Z` fire **did fire and ran**, then **Failed at the report
  sink**: `report sink failure: state-hub-progress … '[Errno 111] Connection
  refused'`. The run produced a report but could not deliver it to State Hub, so
  no `daily_triage` progress event (not even a "could not run" one) was posted →
  the silence. The 06-22 fire has no execution in retention (the bridge was likely
  down then too, or the fire fell in the schedule-update window; `LastUpdateAt 1d ago`).
 - Root cause is **State Hub connectivity from railiance01**, not Temporal. The
  in-cluster `actcore-state-hub-bridge` (`hostNetwork`) proxies to
  `127.0.0.1:18000` on the node — the local end of the ops-bridge **reverse tunnel
  back to the workstation's State Hub**. At 07:20 Europe/Berlin (= 05:20 UTC) the
  workstation/tunnel was unreachable → `Connection refused`. Chronic flakiness
  confirmed: 102 State Hub resolver timeouts in 24h (69 `recently_on_scope`,
  33 `consistency_sweep`).
 **Implication:** the trigger *is* independent of the laptop, but the triage's
 **data dependencies (State Hub context resolution + report delivery) still route
 back to the workstation State Hub**, which is asleep at 07:20 Berlin. WP-0014's
 misfire policies are still good robustness, but the real fix is (a) State Hub
 reachable from railiance01 independent of the workstation, and/or (b) sinks/
 resolvers resilient to transient State Hub unavailability (retry/backoff,
 store-and-forward) instead of hard-failing the workflow. Tracked as follow-up
 below. Backfill deferred: a replay only succeeds while the workstation State Hub
 is reachable.
 ## Implement explicit misfire recovery modes
 ```task
 id: ACTIVITY-WP-0014-T02
 status: done
 priority: high
 state_hub_task_id: "19615562-4cb2-4f25-872f-505d6e40dcc5"
 ```
 Add `catchup_window_seconds` to `CronTriggerConfig` and redefine `misfire_policy`
 into the three explicit modes above. In `_build_schedule()` set
 `SchedulePolicy(overlap=..., catchup_window=timedelta(...))` per mode. Remove the
 ad-hoc 1-hour `backfill` hack in favour of native catchup-window semantics. Keep
 backward compatibility for the existing `skip`/`catchup`/`compress` values via an
 alias map. Add unit tests for each mode's `(catchup_window, overlap)` mapping.
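The mode-to-policy mapping can be sketched as below. This is an illustration under stated assumptions, not the repo's `_build_schedule()`: `Overlap` is a stand-in enum for Temporal's overlap policy, the legacy alias targets (`catchup` to `catchup_all`, `compress` to `catchup_latest`) are guesses, and the concrete windows for "≈ 0" (60 seconds) and "long" (365 days) are placeholder values.

```python
from datetime import timedelta
from enum import Enum


class Overlap(Enum):
    """Stand-in for Temporal's schedule overlap policy (hypothetical)."""

    SKIP = "skip"
    BUFFER_ONE = "buffer_one"
    BUFFER_ALL = "buffer_all"


# Assumed alias targets for the legacy values; the real alias map may differ.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}


def resolve_misfire(policy: str, interval: timedelta) -> tuple[timedelta, Overlap]:
    """Return the (catchup_window, overlap) pair for a misfire policy."""
    mode = LEGACY_ALIASES.get(policy, policy)
    if mode == "skip":
        # "catchup_window ≈ 0": a short placeholder window.
        return timedelta(seconds=60), Overlap.SKIP
    if mode == "catchup_all":
        # "long" window: placeholder of one year.
        return timedelta(days=365), Overlap.BUFFER_ALL
    if mode == "catchup_latest":
        # Roughly one cron interval, so only the latest missed fire recovers.
        return interval, Overlap.BUFFER_ONE
    raise ValueError(f"unknown misfire_policy: {policy!r}")


daily = timedelta(days=1)
print(resolve_misfire("catchup_latest", daily))
print(resolve_misfire("compress", daily))
```

For a daily cron, `catchup_latest` resolves to a one-day window with a buffer-one overlap, which is the shape a per-mode unit test would assert.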
 ## Missed-fire detection & alert sink
 ```task
 id: ACTIVITY-WP-0014-T03
 status: done
 priority: medium
 state_hub_task_id: "dbedd96a-59ca-4b83-bce6-35755b076807"
 ```
 Detect when a scheduled definition has no successful run within its expected
 interval + tolerance, and emit a signal (State Hub progress event and/or
 agent-inbox message) so a miss is visible even under `skip`. This is the
 observability the current silent-drop behaviour lacks — a miss should never again
 be invisible.
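The detection rule above (no successful run within the expected interval plus tolerance) can be sketched as a pure function. The name `is_missed`, its parameters, and the sample timestamps are hypothetical; how the last successful run is looked up and how the signal is emitted are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_missed(
    last_success: Optional[datetime],
    now: datetime,
    interval: timedelta,
    tolerance: timedelta,
) -> bool:
    """True when no successful run falls within interval + tolerance of now."""
    if last_success is None:
        # Never succeeded: treated as missed in this sketch. A real detector
        # would probably grant a grace period to newly created definitions.
        return True
    return now - last_success > interval + tolerance


# Illustrative timestamps for a daily definition with a 30-minute tolerance.
now = datetime(2026, 1, 3, 6, 0, tzinfo=timezone.utc)
stale = datetime(2026, 1, 1, 5, 20, tzinfo=timezone.utc)
fresh = datetime(2026, 1, 3, 5, 21, tzinfo=timezone.utc)

print(is_missed(stale, now, timedelta(days=1), timedelta(minutes=30)))
print(is_missed(fresh, now, timedelta(days=1), timedelta(minutes=30)))
```

Because the check depends only on the last successful run, it reports a miss under every misfire policy, including `skip`.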
 ## Apply policy to runtime definitions & document
 ```task
 id: ACTIVITY-WP-0014-T04
 status: done
 priority: medium
 state_hub_task_id: "04e9d1d2-1192-4402-9402-b12c5d7d44e5"
 ```
 Set `misfire_policy: catchup_latest` for `daily-statehub-wsjf-triage` and
 documented the run-miss options in `docs/runbook.md`.
 **Deployed to and verified on railiance01 (2026-06-24):** built
 `activity-core:railiance01-prod` with the WP-0014 code (T02/T03/T05), imported it
 into k3s containerd, applied the ConfigMap, rolled `actcore-worker`/`api`/`event-router`
 onto the new image, and ran `/admin/sync` (6 defs, 4 schedules upserted, 0
 errors). The live Temporal schedule now reports `OverlapPolicy BufferOne` +
 `CatchupWindow 1d` (= `catchup_latest`); pods healthy, API `db:true temporal:true`.
 ## Keep activity-core thin under the State Hub beachhead model
 ```task
 id: ACTIVITY-WP-0014-T05
 status: done
 priority: high
 state_hub_task_id: "b7e5b877-1b09-421c-a04e-78f785dc00a1"
 ```
 **Architecture decision (Bernd, 2026-06-23):** the resilience that this incident
 needs — queuing writes and caching reads while State Hub is unreachable — must
 **not** be a burden carried by client repos. It belongs to State Hub as a
 **per-machine local "beachhead"** (transparent read cache + write outbox, possibly
 with State-Hub federation), owned by custodian/state-hub. It handles all three
 failure modes: network interruption, central State Hub crash, central machine
 down. This is handed off to state-hub (see the coordination message / proposal);
 **do not build client-side queue/cache logic in activity-core.**
 activity-core's only responsibilities under this model are thin:
 - **Idempotent writes — DONE (2026-06-23, in-repo):** added
  `activity_core/state_hub_write` (`idempotency_headers`); every State Hub write
  (report-sink, ops-evidence, schedule-miss) now sends a stable `Idempotency-Key`
  header derived from `run_id:instruction_id:event_type`. The read-based
  `_progress_exists` dedup is now best-effort (returns `False` on connection
  error instead of hard-failing), so the guarantee lives on the keyed write, not
  a live read. Tests in `tests/test_state_hub_write.py`; documented in
  `docs/runbook.md`.
 - **Adopt the beachhead endpoint — MOVED to [[ACTIVITY-WP-0015]]:** pointing
  `STATE_HUB_URL` at the local beachhead and retiring the bespoke
  `actcore-state-hub-bridge` proxy depend on the state-hub beachhead existing
  first. Split into WP-0015 (status `blocked`) so this workplan can close on its
  completed in-repo work rather than waiting on an external capability.
 T05 is done as far as activity-core can act now; the external-dependent adoption
 lives in WP-0015.
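The idempotent-write responsibility above can be sketched as follows. The workplan says only that the key is derived from `run_id:instruction_id:event_type`; hashing that string with SHA-256 and the function signature shown here are assumptions, not the actual `activity_core/state_hub_write` implementation.

```python
import hashlib


def idempotency_headers(run_id: str, instruction_id: str, event_type: str) -> dict[str, str]:
    """Build a stable Idempotency-Key header for one logical State Hub write."""
    raw = f"{run_id}:{instruction_id}:{event_type}"
    # Assumption: the key is a hash of the joined identifiers, so retries of
    # the same write always carry the same key.
    key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return {"Idempotency-Key": key}


# Hypothetical identifiers, for illustration only.
first = idempotency_headers("run-1", "instr-1", "daily_triage")
retry = idempotency_headers("run-1", "instr-1", "daily_triage")
other = idempotency_headers("run-1", "instr-1", "schedule_miss")
print(first == retry, first == other)
```

Since the key depends only on the three identifiers, a write replayed after an outage can be deduplicated by the receiver without a prior read.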
--- a/workplans/ACTIVITY-WP-0015-adopt-statehub-beachhead-endpoint.md
+++ b/workplans/ACTIVITY-WP-0015-adopt-statehub-beachhead-endpoint.md
@@ -0,0 +1,54 @@
 ---
 id: ACTIVITY-WP-0015
 type: workplan
 title: "Adopt State Hub Beachhead Endpoint"
 domain: infotech
 repo: activity-core
 status: blocked
 owner: claude
 topic_slug: activity-core
 created: "2026-06-24"
 updated: "2026-06-24"
 state_hub_workstream_id: "bbc07f9e-9323-4b2b-b556-c33b37d0b228"
 ---
 # Adopt State Hub Beachhead Endpoint
 Carries the **blocked remainder** of [[ACTIVITY-WP-0014]] T05. The in-repo half
 (idempotency-keyed State Hub writes) shipped in WP-0014; this workplan is the
 client-side adoption that depends on the state-hub-owned **beachhead** capability
 (per-machine read cache + write outbox) existing first.
 **Blocked on:** the state-hub beachhead (proposal sent to the `state-hub` agent,
 2026-06-23). Do not build queue/cache logic in activity-core — see
 [[statehub-beachhead-principle]].
 ## Point STATE_HUB_URL at the beachhead
 ```task
 id: ACTIVITY-WP-0015-T01
 status: wait
 priority: medium
 state_hub_task_id: "76b6132d-394a-4a67-bef6-73bb9d1e277e"
 ```
 Once the state-hub beachhead exposes a local endpoint, point activity-core's
 `STATE_HUB_URL` (and the railiance runtime config) at it and verify reads are
 served from cache and writes are queued/flushed correctly when central State Hub
 is unreachable. Confirm idempotency-keyed writes dedup on flush (no duplicate
 `daily_triage`/progress events).
 ## Retire the bespoke actcore-state-hub-bridge proxy
 ```task
 id: ACTIVITY-WP-0015-T02
 status: wait
 priority: medium
 state_hub_task_id: "526c2129-cbf7-4531-a319-aebfc75cc6a3"
 ```
 Remove the inline `hostNetwork` HTTP proxy `actcore-state-hub-bridge` from
 `k8s/railiance/20-runtime.yaml` — it is a primitive precursor of the beachhead
 and should be replaced by the state-hub-owned component, not extended. Re-verify
 the daily triage end-to-end after cutover, including an overnight scheduled run
 while the workstation is asleep (the original failure condition).
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -0,0 +1,379 @@
 ---
 id: ACTIVITY-WP-0016
 type: workplan
 title: "LLM Output Robustness & The Producer Trust Boundary"
 domain: custodian
 repo: activity-core
 status: active
 owner: codex
 topic_slug: custodian
 created: "2026-06-26"
 updated: "2026-06-27"
 state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726"
 ---
 # ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary
 ## Context
 On 2026-06-26 the scheduled `daily-statehub-wsjf-triage` instruction fired on
 time (`daily_triage` event 05:20:57Z) but its output **failed schema
 validation**: `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The
 model emitted a long ranked WSJF recommendation list (reached rank 7+ with
 nested `wsjf` objects) and the JSON broke deep in that list. Because the report
 is a single monolithic JSON document, one malformed delimiter discarded the
 **entire** run. This reset the three-clean-consecutive-scheduled-runs streak in
 `ACTIVITY-WP-0006-T03` (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the
 LLM-output-quality surface deferred from `ACTIVITY-WP-0010`.
 The scheduling/runtime layer is healthy — this is purely an output-robustness
 and boundary-design problem. Today's code (`src/activity_core/rules/executor.py`)
 already passes the output schema to llm-connect as a `json_schema` model param
 (`_llm_run_config`), retries once, runs a tolerant parser that handles fenced
 output and uses `raw_decode` (`_parse_json_output`), and preserves a bounded 4000-char preview on hard
 failure (`_invalid_output_report`). None of that helps when error locality is
 zero: the failure unit is the whole document, not the offending item.
 ## Design Frame — The Producer Trust Boundary
 This workplan is anchored to a deliberate architectural stance, not just a bug
 fix. Capture it in an ADR (T04) so future work inherits it.
 **Premise.** activity-core has a *trust boundary* where free-form producer
 output meets strict deterministic consumers (JSON Schema validators, the task
 emitter, classic compute pipelines). The producers are **LLMs and humans (and
 agents acting for either)**. Both are *untrusted producers*: their output may be
 - **erroneous** — hallucination, truncation (token-limit cutoff), drift,
  type slips, typos; or
 - **malicious** — prompt injection, crafted payloads, oversized/deeply-nested
  structures aimed at exhausting or confusing the consumer.
 The architecture should treat the boundary as an adversarial frontier and place
 **guardrails + error-correction tooling there**, rather than letting raw
 producer output flow into deterministic consumers and fail (or worse, partially
 succeed) downstream.
 **Two non-fail-fast postures.** When we do *not* want to hard-fail on a problem,
 there are two sensible strategies — and they compose:
 - **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
  as-is; on exception, catch → repair → retry → or quarantine. Cheap on the
  happy path. Blast radius depends entirely on how granular the catch is. Good
  when failures are rare and locally recoverable. Risk: failures surface late,
  possibly after partial side effects.
 - **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
  and normalize the output to a known-good shape *before* it enters the pipeline
  — drop bad items, coerce types, bound sizes/depth, allow-list references — so
  the consumer only ever sees clean input. Higher upfront cost, smaller blast
  radius, no partial side effects. Good when failures are common or
  consequences are high.
 **Governing principles for this repo:**
 1. **Push verification to the boundary; keep the interior strict.** Apply
   posture **B** at the producer→consumer boundary (verify+mitigate structure);
   keep posture **A** for residual exceptions inside the verified core. Never
   relax the interior schema to absorb producer sloppiness.
 2. **Make error locality match the unit of work.** One bad recommendation must
   cost one recommendation, not the whole report. Framing the payload so each
   item is independently parseable is the single highest-leverage change.
 3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
   provenance-tagged artifacts (index, error, raw snippet) so they can be
   debugged or replayed — degraded-but-usable is distinct from total loss.
 4. **Both human and agent input get the same rigor.** Guardrails are
   producer-agnostic: the same size/depth/count caps, reference allow-lists, and
   truncation detection apply whether the producer is an LLM, an agent, or a
   human form submission.
 ## Reproduce & Root-Cause The Failure
 ```task
 id: ACTIVITY-WP-0016-T01
 status: wait
 priority: high
 state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138"
 ```
 Recover the **full** raw llm-connect response for the 06-26 failure (the State
 Hub event keeps only a 4000-char preview; the break is at char 5268) and
 establish the precise cause.
 Done when:
 - the full raw response is pulled from the runtime llm-connect log / response
  store and the exact offending token at char 5268 is identified;
 - `finish_reason` is captured to confirm or rule out token-limit **truncation**
  vs a structural mid-stream glitch;
 - it is confirmed whether llm-connect actually **enforced** the `json_schema`
  constrained-decoding hint or merely accepted it as advisory (this determines
  whether the schema param is load-bearing);
 - the failing payload is captured as a regression fixture under `tests/`.
 2026-06-26 findings (local analysis on the workstation):
 - **Mechanism confirmed structurally.** There are **16 active workstreams**
  org-wide and the triage instruction emits roughly one ranked recommendation per
  candidate. The preserved preview holds 7 fully-formed recommendations; the JSON
  break is at char 5268 (~rank 8–9). The unbounded one-per-workstream list is the
  structural cause — more items = more tokens = higher odds of a mid-stream JSON
  slip and/or truncation. This directly justifies T02's bounded top-N + per-item
  framing.
 - **Both attempts failed.** `executor._execute` retries once
  (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the
  **retry** output, so the model produced invalid JSON twice — not a one-off.
 - **activity-core discards the diagnostics needed to root-cause this.** Three
  retention gaps mean the exact char-5268 token cannot be recovered from
  activity-core data at all:
  1. `LLMConnectClient.complete()` returns only `data["content"]`
     (`llm_client.py:57-60`) — it drops `finish_reason`/`usage` from the
     llm-connect HTTP response, so truncation-vs-structural cannot be
     distinguished locally.
  2. the report sink caps raw output at **4000 chars** (`_invalid_output_report`,
     `executor.py:259`) — below the 5268 break.
  3. the worker log caps the preview at **2000 chars** (`executor.py:175`).
 - **Remaining (remote, operator-owned).** Confirming the exact offending token
  and `finish_reason` requires llm-connect's producer-side logs on `railiance01`
  — cluster access, outside this repo's SCOPE for direct action. Truncation is
  the leading hypothesis given the 16-item input, but the mitigation (T02/T03) is
  identical either way, so T01 does not block the build work.
 - **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture
  `finish_reason`/`usage` and persist a larger bounded raw artifact on validation
  failure so this class of failure is never un-debuggable again.
 - Partial fixture saved:
  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
  (the 4000-char preview + validation error; full payload pending the remote pull).
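The first retention gap could be closed along these lines. This is a sketch under assumptions: the llm-connect response is taken to be a JSON object with `content`, `finish_reason`, and `usage` keys, and `"length"` is assumed to be the value that signals a token-limit cutoff (the OpenAI-style convention). `Completion`, `parse_completion`, and `looks_truncated` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Completion:
    content: str
    finish_reason: Optional[str]
    usage: dict[str, Any]


def parse_completion(data: dict[str, Any]) -> Completion:
    """Keep the diagnostics instead of returning only data["content"]."""
    return Completion(
        content=data["content"],
        finish_reason=data.get("finish_reason"),
        usage=dict(data.get("usage") or {}),
    )


def looks_truncated(completion: Completion) -> bool:
    # Assumption: "length" marks a token-limit cutoff in the response.
    return completion.finish_reason == "length"


# Illustrative payloads, not recorded llm-connect responses.
cut = parse_completion(
    {"content": '{"recommendations": [', "finish_reason": "length",
     "usage": {"completion_tokens": 1200}}
)
bare = parse_completion({"content": "{}"})
print(looks_truncated(cut), looks_truncated(bare))
```

With `finish_reason` retained, a validation failure can be labelled as truncation or structural at the point where the report is built.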
 ## Schema + Prompt Redesign For Error Locality
 ```task
 id: ACTIVITY-WP-0016-T02
 status: progress
 priority: high
 state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
 ```
 Redesign the daily-triage report contract so a single malformed item can no
 longer discard the whole report (principle #2).
 Done when:
 - the recommendation list is **bounded** (configurable top-N, default 5–7) in
  both the prompt and the output schema — long lists are where the model drifts;
 - the report uses a **per-item-framed** shape (JSON Lines / NDJSON — one
  recommendation object per line — or an equivalent delimited per-item form)
  behind a minimal stable envelope (`summary` + framed items), so each item is
  an independent parse unit;
 - the prompt explicitly states the contract, the per-item framing, the cap, and
  an "if uncertain, emit fewer well-formed items rather than more" instruction;
 - `max_tokens` is set with headroom for the bounded list so truncation cannot
  occur at the expected size;
 - the output schema file (`_load_output_schema` target) is updated to match.
 2026-06-26 progress (in-repo portion):
 - **Strict, bounded schema written** — `schemas/daily-triage-report.json` went
  from `recommendations.items: {type: object}` (accept-anything) to a strict
  per-item contract: `required [rank, candidate, action, why]` with typed
  `wsjf` sub-fields, plus `maxItems: 7`. The strict item shape is what lets the
  T03 boundary parser validate each recommendation independently.
 - **`maxItems` is a hint, not a hard reject** — the in-repo validator
  (`_validate_schema_node`) only enforces `type`/`required`/`properties`/`items`
  and ignores `maxItems`/`enum`. That is deliberate: a hard `maxItems` reject
  would discard a whole 16-item report — the exact blast-radius bug WP-0016
  removes. The bound is enforced via the prompt + the llm-connect `json_schema`
  constraint hint + T03 mitigation (keep top-N by rank, quarantine extras).
 - **DEPLOY COUPLING (important):** this schema file is consumed *both* as the
  llm-connect hint *and* by the current whole-document validator. Tightening
  per-item `required` fields makes the existing whole-doc validation hard-fail
  **more** until T03 replaces it with per-item quarantine. Therefore the schema
  change MUST ship together with T03 — do not deploy the strict schema to the
  runtime bundle ahead of the T03 parser. Four executor/instruction tests that
  asserted the old loose contract were updated to the strict contract; the
  forwarded-schema test now reads the live file instead of hard-coding it.
 - **Truncation hypothesis corroborated** — the instruction config carries
  `max_tokens` of roughly 1200 (per the wiring test fixture). 5268 chars ≈
  1300–1500 tokens, so a 1200-token cap would truncate a 16-item list right at
  the observed break. This strengthens T01's leading hypothesis and makes the
  `max_tokens` headroom change below concrete.
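To make the "hint, not a hard reject" behaviour concrete, here is a minimal sketch of a validator that checks only `type`, `required`, `properties`, and `items`, so `maxItems` in the schema is carried along but never enforced. The function `validate_node` and both schema literals are illustrative assumptions. The field types (`rank` as integer, the rest as strings) are guesses, the `wsjf` sub-fields are omitted, and the real `_validate_schema_node` and `schemas/daily-triage-report.json` may differ.

```python
from typing import Any

# Assumed item contract: required names come from the notes above, types are guesses.
ITEM_SCHEMA = {
    "type": "object",
    "required": ["rank", "candidate", "action", "why"],
    "properties": {
        "rank": {"type": "integer"},
        "candidate": {"type": "string"},
        "action": {"type": "string"},
        "why": {"type": "string"},
    },
}
REPORT_SCHEMA = {
    "type": "object",
    "required": ["summary", "recommendations"],
    "properties": {
        "summary": {"type": "string"},
        "recommendations": {"type": "array", "maxItems": 7, "items": ITEM_SCHEMA},
    },
}

_TYPES = {
    "object": dict, "array": list, "string": str,
    "integer": int, "number": (int, float), "boolean": bool,
}


def validate_node(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return error strings. Only type/required/properties/items are checked;
    keywords such as maxItems and enum are deliberately ignored."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected in _TYPES:
        type_ok = isinstance(value, _TYPES[expected])
        if expected in ("integer", "number") and isinstance(value, bool):
            type_ok = False  # bool is a subclass of int in Python
        if not type_ok:
            return [f"{path}: expected {expected}"]
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate_node(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate_node(item, schema["items"], f"{path}[{index}]"))
    return errors
```

Under these assumptions a nine-item report with well-formed items validates cleanly, which is why the count bound has to be applied elsewhere.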
 **Bundle handoff (NOT in this repo — runtime-projected definition).** The triage
 prompt and `max_tokens` live in the Railiance runtime bundle, not in repo files.
 Apply there:
 1. Instruct a **bounded top-N** (≤ 7) list of ranked recommendations, with the
   fallback "if uncertain, emit fewer well-formed items rather than more."
 2. Specify the **per-item framing** the T03 parser will consume (NDJSON: a
   leading summary object, then one recommendation JSON object per line).
 3. Raise **`max_tokens`** to give clear headroom for 7 framed items (eliminate
   truncation at the expected size).
 4. State the value vocabularies (`action`, `confidence`) the T04 guardrails will
   check.
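When sizing `max_tokens` for step 3, the truncation arithmetic from the progress notes can be reproduced with a small estimate. This is a sketch under a stated assumption: the 3.5 to 4.0 characters-per-token ratio is a rule of thumb chosen here to match the 1300 to 1500 range quoted above, not a measured property of the model's tokenizer. The function name is hypothetical.

```python
import math


def estimate_token_range(
    char_count: int,
    max_tokens: int,
    chars_per_token: tuple[float, float] = (3.5, 4.0),  # assumed ratio, not measured
) -> tuple[int, int, bool]:
    """Return (low, high, would_truncate) for an output of char_count characters."""
    low = math.ceil(char_count / max(chars_per_token))
    high = math.ceil(char_count / min(chars_per_token))
    # Truncation is expected when even the optimistic estimate exceeds the cap.
    return low, high, low > max_tokens


if __name__ == "__main__":
    # 5268 characters against a cap of about 1200 tokens, as in the notes above.
    print(estimate_token_range(5268, max_tokens=1200))  # (1317, 1506, True)
```

The same helper can be pointed at a measured seven-item output to check that the raised cap leaves clear headroom.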
 ## Boundary Parser — Verify & Mitigate (Posture B)
 ```task
 id: ACTIVITY-WP-0016-T03
 status: done
 priority: high
 state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709"
 ```
 Implement item-granular parsing with a quarantine lane in
 `src/activity_core/rules/executor.py`, applying posture **B** at the boundary
 (principles #1–#3).
 Done when:
 - the parser splits the envelope from the framed items, then parses **each item
  independently**; a malformed item is routed to a bounded `quarantined_items`
  artifact (index + validation error + raw snippet), not raised;
 - a run with some valid and some invalid items emits a report over the surviving
  valid items with `output_validated=true`, plus `partial=true` and
  `quarantined_count` / `quarantined_items` markers — degraded-but-usable is
  reported distinctly from total loss;
 - a best-effort **repair** pass (close unterminated brackets/quotes, recover the
  valid prefix) is attempted per item before quarantining it;
 - truncation detected in T01 is handled as its own signal (recover whole items
  emitted before the cutoff rather than failing the document);
 - the existing monolithic-document path remains as the fallback when framing is
  absent (backward compatible with task-only instructions).
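The best-effort repair pass in the list above could look roughly like the following sketch: track open strings and brackets, append the missing closers, and accept the result only if it then parses. `try_repair` here is an illustrative stand-in written for this note and is not claimed to match the repo's `_try_repair`.

```python
import json
from typing import Any, Optional


def try_repair(fragment: str) -> Optional[Any]:
    """Close an unterminated string and any open brackets, then re-parse.
    Returns the parsed value, or None when the fragment is not recoverable."""
    closers: list[str] = []
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    repaired = fragment
    if escaped:
        repaired = repaired[:-1]  # drop a dangling backslash at the cutoff
    if in_string:
        repaired += '"'
    repaired += "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # caller quarantines the raw fragment instead


if __name__ == "__main__":
    print(try_repair('{"rank": 8, "candidate": "repo-x", "why": "cut off mid'))
    print(try_repair('{"rank": 8, "candidate":'))
```

A repaired item may still be missing required fields, so it should go through the same per-item validation as any other item before it is kept.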
 2026-06-26 progress (implemented in `src/activity_core/rules/executor.py`):
 - **Resilient recovery wired into `_execute`.** When the whole-document parse +
  one retry still fail, report instructions (those with `report_sinks`) now run
  `_resilient_report` *before* the total-loss `_invalid_output_report`. If it
  recovers ≥1 valid item it returns a partial report; otherwise it returns None
  and the prior total-loss path is preserved unchanged.
 - **Brace/quote-aware object scanner, not line-splitting.** The real 06-26 output
  was pretty-printed (multi-line objects), so naive NDJSON line recovery would
  have failed. `_extract_object_spans` walks the `recommendations` array
  brace-depth- and string-aware, so it recovers each recommendation object
  whether pretty-printed across many lines *or* emitted one-per-line (NDJSON).
  The truncated trailing object is returned with `complete=False`.
 - **Layered mitigation per item:** `json.loads` → on failure for a truncated
  tail, a best-effort `_try_repair` (balance open string/brackets/braces) →
  then `_partition_items` validates each recovered object against the T02 item
  schema. Valid items survive; malformed or over-`maxItems` items are
  quarantined with provenance (`index`, `error`, `raw` snippet, `reason`).
 - **Report shape on degradation:** `output_validated=True` over the survivors,
  `review_required=True`, `partial=True`, `quarantined_count`, and a bounded
  `quarantined_items` list (cap 20). Degraded-but-usable is now reported
  distinctly from total loss.
 - **Verified against the real failure shape.** New tests reconstruct a
  pretty-printed report with 7 valid recommendations + a truncated tail (the
  06-26 shape) and a one-bad-item-among-valid case. The 7-item run now recovers
  all 7 and quarantines the broken tail (previously: whole run discarded);
  log line `instruction_output_recovered: kept=7, quarantined=1`. The bad-item
  run keeps 2 and quarantines the rank-less one.
 - **Deferred to T04 (clean scope boundary):** enforcing `maxItems` top-N on the
  *happy* path (valid JSON, all items schema-valid, but > N items) — the resilient
  path only runs on failure, so over-limit-on-success is a guardrail/count-cap
  concern, which is exactly T04's remit.
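The brace- and quote-aware scan described above can be sketched as follows. This is an illustrative reimplementation under assumptions, not the repo's `_extract_object_spans`: the `(start, end, complete)` tuple shape is invented for the example, and the input is assumed to be the text of the `recommendations` array body.

```python
import json


def extract_object_spans(text: str) -> list[tuple[int, int, bool]]:
    """Return (start, end, complete) for each top-level {...} object in text.
    Braces inside JSON strings are ignored; a cut-off tail is complete=False."""
    spans: list[tuple[int, int, bool]] = []
    depth = 0
    start = None
    in_string = False
    escaped = False
    for index, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = index
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, index + 1, True))
                start = None
    if start is not None:
        spans.append((start, len(text), False))  # truncated trailing object
    return spans


if __name__ == "__main__":
    body = (
        '[\n  {\n    "rank": 1,\n    "why": "has } brace"\n  },\n'
        '  {"rank": 2, "wsjf": {"value": 3}},\n'
        '  {"rank": 3, "why": "trunc'
    )
    for begin, end, complete in extract_object_spans(body):
        snippet = body[begin:end]
        print(complete, json.loads(snippet)["rank"] if complete else snippet[:20])
```

Because the scan keys on brace depth rather than newlines, pretty-printed and one-per-line output are handled by the same loop.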
 ## Producer Guardrails + ADR-004
 ```task
 id: ACTIVITY-WP-0016-T04
 status: done
 priority: medium
 state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9"
 ```
 Write the architecture decision record and add the producer-agnostic guardrails
 (principle #4).
 Done when:
 - `docs/adr/adr-004-producer-trust-boundary.md` documents the trust boundary,
  the untrusted-producer premise (erroneous **and** malicious; human and agent),
  the A vs B taxonomy and where each applies, the error-locality principle, and
  the quarantine-with-provenance rule;
 - boundary guardrails are enforced at the consumer edge: max item **count**, max
  string length, max nesting **depth**, and a **reference allow-list** (e.g. a
  recommendation `candidate` / a task `target_repo` must resolve to a known
  workstream/repo before it is acted on);
 - guardrail rejections are quarantined with provenance, consistent with T03;
 - SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance
  changes the documented contract.
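The structural guardrails named above (nesting depth and string length) can be sketched as two small walks over a parsed item. The helper names are hypothetical; the default limits mirror the `_MAX_DEPTH=8` and `_MAX_STRING_LEN=4000` values reported in the progress notes, and the depth convention (a scalar is depth 0, each container adds 1) is an assumption.

```python
from typing import Any, Optional


def nesting_depth(value: Any) -> int:
    """Scalars have depth 0; each enclosing dict or list adds 1."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return 0
    return 1 + max((nesting_depth(child) for child in children), default=0)


def longest_string(value: Any) -> int:
    """Length of the longest string value anywhere inside the item."""
    if isinstance(value, str):
        return len(value)
    if isinstance(value, dict):
        return max((longest_string(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((longest_string(v) for v in value), default=0)
    return 0


def structural_violation(
    item: Any, max_depth: int = 8, max_string_len: int = 4000
) -> Optional[str]:
    """Return a human-readable violation, or None when the item is within caps."""
    depth = nesting_depth(item)
    if depth > max_depth:
        return f"nesting depth {depth} exceeds {max_depth}"
    length = longest_string(item)
    if length > max_string_len:
        return f"string length {length} exceeds {max_string_len}"
    return None
```

The recursive walk is fine for a sketch; for deliberately hostile input an iterative traversal would avoid leaning on Python's recursion limit.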
 2026-06-26 progress:
 - **ADR-004 written** — `docs/adr/adr-004-producer-trust-boundary.md` documents
  the untrusted-producer premise (erroneous + malicious; LLM/agent/human), the
  A-vs-B posture taxonomy, the four governing principles, the concrete
  activity-core mechanisms, a posture-by-layer table, consequences, and
  alternatives considered. Accepted, scope cross-repo.
 - **Producer guardrails implemented** in `executor.py`, applied uniformly on the
  happy path *and* the recovery path via `_partition_items`: per-item order is
  structural-type → schema → structural caps (`_MAX_DEPTH=8`,
  `_MAX_STRING_LEN=4000`) → reference allow-list → count cap (`maxItems`). Each
  quarantine carries a `reason` (`malformed`/`schema`/`guardrail`/`allow_list`/
  `over_limit`).
 - **Happy-path count cap closed** (the item deferred from T03): a syntactically
  valid 9-item report now keeps 7 and quarantines 2 as `over_limit`, emitting a
  `partial` report — without a retry.
 - **Reference allow-list wired but inert.** `_allow_list_from_context` reads
  `context["known_candidates"]`; when present, recommendations with an unknown
  `candidate` are quarantined (`reason: allow_list`). The key is absent today, so
  the check is inert; activation is a one-line context-resolver change. This keeps
  the guardrail producer-agnostic (principle #4) and ready to enable.
 - **SCOPE.md updated** — instruction-executor bullet now names the quarantine
  lane + guardrails; ADR-004 added to the Architecture Decisions list. No INTENT
  drift: this hardens the existing output contract, it does not extend scope.
 - New tests: happy-path count cap, oversized-string guardrail, allow-list
  rejection (all green).
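The per-item ordering described above (structural type, then schema, then structural caps, then allow-list, then count cap) can be sketched as one partition function. This is a simplified illustration and not the repo's `_partition_items`: the schema step is reduced to a required-field and rank-type check, the cap step only looks at top-level string lengths, and the quarantine record layout is an assumption based on the fields named in the notes.

```python
from typing import Any, Iterable, Optional

REQUIRED = ("rank", "candidate", "action", "why")


def partition_items(
    items: list[Any],
    max_items: int = 7,
    known_candidates: Optional[Iterable[str]] = None,
    max_string_len: int = 4000,
) -> tuple[list[dict], list[dict]]:
    """Split items into survivors and quarantined records with a reason each."""
    allowed = set(known_candidates) if known_candidates is not None else None
    valid: list[tuple[int, dict]] = []
    quarantined: list[dict] = []

    def reject(index: int, item: Any, reason: str, error: str) -> None:
        quarantined.append(
            {"index": index, "reason": reason, "error": error, "raw": repr(item)[:120]}
        )

    for index, item in enumerate(items):
        if not isinstance(item, dict):
            reject(index, item, "malformed", "item is not an object")
            continue
        missing = [key for key in REQUIRED if key not in item]
        rank = item.get("rank")
        if missing or isinstance(rank, bool) or not isinstance(rank, int):
            reject(index, item, "schema", f"missing={missing} or rank not an integer")
            continue
        if any(isinstance(v, str) and len(v) > max_string_len for v in item.values()):
            reject(index, item, "guardrail", "string exceeds length cap")
            continue
        # Allow-list is inert when no known candidates are supplied.
        if allowed is not None and item["candidate"] not in allowed:
            reject(index, item, "allow_list", "unknown candidate")
            continue
        valid.append((index, item))

    # Count cap runs last: keep the top-N by rank, quarantine the overflow.
    valid.sort(key=lambda pair: pair[1]["rank"])
    for index, item in valid[max_items:]:
        reject(index, item, "over_limit", f"more than {max_items} items")
    return [item for _, item in valid[:max_items]], quarantined
```

Running the count cap last means a malformed or disallowed item never displaces a valid one from the top-N.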
 ## Tests + Calibration Re-Entry
 ```task
 id: ACTIVITY-WP-0016-T05
 status: progress
 priority: high
 state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f"
 ```
 Prove the new posture and hand back to the calibration gates.
 Done when:
 - regression tests cover: the captured 06-26 payload, a truncated-mid-list
  payload, a one-bad-item-among-good payload (asserts quarantine + partial), an
  oversized/over-deep payload (asserts guardrail rejection), and an
  injection-shaped reference (asserts allow-list rejection);
 - the full suite passes and the result is recorded here with the count;
 - a daily-triage smoke against the live runtime shows a previously-failing
  payload now **degrades gracefully** (valid items delivered, bad items
  quarantined) instead of discarding the run;
 - a progress note hands back to `ACTIVITY-WP-0010-T04` and `ACTIVITY-WP-0006-T03`
  that the output-robustness blocker is cleared so the three-clean-run gate can
  resume on its own.
 2026-06-26 progress (in-repo portion complete):
 - **Regression coverage complete.** Across T03/T04/T05: truncated-mid-list,
  one-bad-item-among-good (quarantine + partial), oversized-string and over-depth
  guardrail rejection, allow-list (injection-shaped) rejection, happy-path count
  cap, and a test driving the **actual captured 2026-06-26 payload**
  (`tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`)
  — it now recovers 6+ valid recommendations and quarantines the truncated tail,
  where before it discarded the whole run.
 - **Full suite green:** 218 passed, 1 skipped (recorded at T04; the T05 fixture +
  over-depth tests add to this — see the commit).
 - **Hand-back notes posted** to `ACTIVITY-WP-0006-T03` (State Hub event
  `b6b8c2b8`) and `ACTIVITY-WP-0010-T04` (`b813f0dc`).
 - **Remaining (remote, operator-owned):** the live daily-triage smoke on
  `railiance01` proving end-to-end graceful degradation. It depends on deploying
  the T02 bundle prompt/`max_tokens`/NDJSON changes together with this code, which
  is cluster/operator work outside this repo's SCOPE. T05 therefore stays
  `progress` until that live run exists; the in-repo deliverables are done.
 ## Relationships
 - **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and
  `ACTIVITY-WP-0010-T04` (collect three clean scheduled runs) — both stalled on
  the same output-quality failure this workplan removes.
 - **References:** `ACTIVITY-WP-0009` (scheduled-run trust gap).
 - **Boundary discipline:** keeps activity-core inside its SCOPE — this hardens
  the instruction-executor output contract; it does not move provider
  credentials, cluster reconciliation, or task lifecycle into this repo.
--- a/workplans/ACTIVITY-WP-0017-core-hub-ops-evidence-sink.md
+++ b/workplans/ACTIVITY-WP-0017-core-hub-ops-evidence-sink.md
@@ -0,0 +1,58 @@
 ---
 id: ACTIVITY-WP-0017
 type: workplan
 title: "Core Hub ops evidence sink"
 domain: infotech
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-27"
 updated: "2026-06-27"
 state_hub_workstream_id: "2a073bf4-febf-433e-a721-5daf71760912"
 ---
 # Core Hub ops evidence sink
 ## Goal
 Provide the activity-core side of the Core Hub replacement evidence path for
 `CORE-WP-0008-T03`, without depending on the legacy Haskell Inter-Hub sink and
 without placing secret material in activity definitions, logs, State Hub, or
 chat.
 ## Task: Add Core Hub interaction-event sink
 ```task
 id: ACTIVITY-WP-0017-T01
 status: done
 priority: high
 state_hub_task_id: "32aab1af-6be5-4b52-afa1-c11f52c65892"
 ```
 Add a `core-hub-interaction-event` ops evidence sink that posts sanitized
 ops-inventory probe evidence to Core Hub `/api/v2/interaction-events`, verifies
 the created event is visible, and reports only non-secret ids/statuses.
 Acceptance:
 - runtime token is read through `CORE_HUB_RUNTIME_TOKEN_FILE` or a named
 environment variable, never from workplan content;
 - sink configuration accepts `CORE_HUB_BASE_URL` and a widget id or widget
 mapping;
 - emitted metadata reuses the existing compact/sanitized probe evidence path;
 - missing Core Hub config skips cleanly with explicit non-secret missing keys;
 - tests prove the POST/visibility check and secret non-disclosure.
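A minimal sketch of the configuration behaviour in the acceptance list: read the token from a file first, fall back to an environment variable, and skip with a list of missing key names rather than failing. `CORE_HUB_RUNTIME_TOKEN_FILE` and `CORE_HUB_BASE_URL` come from the text above; `CORE_HUB_RUNTIME_TOKEN` and `CORE_HUB_WIDGET_ID` are hypothetical names chosen for this example, as are the function and its return shape.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_sink_config(env: Mapping[str, str]) -> dict:
    """Resolve sink settings from an environment mapping.
    On missing config, return key names only; never echo secret values."""
    missing: list[str] = []

    base_url = env.get("CORE_HUB_BASE_URL", "").strip()
    if not base_url:
        missing.append("CORE_HUB_BASE_URL")

    token: Optional[str] = None
    token_file = env.get("CORE_HUB_RUNTIME_TOKEN_FILE", "").strip()
    if token_file and Path(token_file).is_file():
        token = Path(token_file).read_text(encoding="utf-8").strip() or None
    if token is None:
        # Hypothetical fallback variable name, assumed for this sketch.
        token = env.get("CORE_HUB_RUNTIME_TOKEN", "").strip() or None
    if token is None:
        missing.append("CORE_HUB_RUNTIME_TOKEN_FILE")

    # Hypothetical variable name for the widget id.
    widget_id = env.get("CORE_HUB_WIDGET_ID", "").strip()
    if not widget_id:
        missing.append("CORE_HUB_WIDGET_ID")

    if missing:
        return {"status": "skipped", "missing": missing}
    return {"status": "ready", "base_url": base_url, "widget_id": widget_id, "token": token}
```

In practice the function would be called with `os.environ`; taking a mapping keeps it easy to exercise in tests without touching the real environment.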
 Verification 2026-06-27: `tests/test_ops_evidence_sinks.py` passed, and
 a disposable local Core Hub runtime accepted an activity-core
 `core-hub-interaction-event` sink emission, then listed the created
 `ops-endpoint-verified` event back through `/api/v2/interaction-events`.
 The verification asserted sanitized metadata did not include response body,
 authorization header, URL userinfo, or token query material.
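The non-disclosure properties asserted above (no response body, authorization header, URL userinfo, or token query material) can be illustrated with a small sanitizer built on the standard library's `urllib.parse`. The field names in `DROPPED_FIELDS`, the query keys in `SENSITIVE_QUERY_KEYS`, and both function names are assumptions for this sketch; the existing compact/sanitized probe evidence path in the repo may use different names and rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names, chosen for illustration only.
SENSITIVE_QUERY_KEYS = {"token", "access_token", "api_key", "key", "secret"}
DROPPED_FIELDS = {"response_body", "authorization"}


def sanitize_url(url: str) -> str:
    """Strip userinfo, token-like query parameters, and the fragment from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets for IPv6 literals
    if parts.port:
        host = f"{host}:{parts.port}"
    query = urlencode(
        [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SENSITIVE_QUERY_KEYS
        ]
    )
    return urlunsplit((parts.scheme, host, parts.path, query, ""))


def sanitize_metadata(metadata: dict) -> dict:
    """Drop secret-bearing fields and clean the probe URL before emitting."""
    clean = {}
    for key, value in metadata.items():
        if key.lower() in DROPPED_FIELDS:
            continue
        if key == "url" and isinstance(value, str):
            value = sanitize_url(value)
        clean[key] = value
    return clean


if __name__ == "__main__":
    print(sanitize_url(
        "https://user:pw@hub.example.test:8443/api/v2/interaction-events?token=abc&limit=5"
    ))
```

Rebuilding the netloc from `hostname` and `port` drops the userinfo by construction instead of trying to pattern-match it out of the string.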
 Completed 2026-06-27: implemented the Core Hub interaction-event sink in
 `activity_core.ops_evidence_sinks` with unit coverage for POST/visibility
 verification, missing config behavior, and secret non-disclosure. This provides
 the direct Core Hub consumer path needed by `CORE-WP-0008-T03`; deployed use
 still requires an approved Core Hub runtime token and widget id/mapping.
--- a/workplans/archived/260603-WP-0002-next-steps.md
+++ b/workplans/archived/260603-WP-0002-next-steps.md
@@ -3,6 +3,7 @@ type: session-note
 created: "2026-03-28"
 updated: "2026-06-03"
 status: archived
 state_hub_workstream_id: "b221e65a-6f97-44b0-8dae-442fffcb7f64"
 ---
 # WP-0002 Handoff Note — Continue on CoulombCore
@@ -0,0 +1,2 @@
 {"agent": "coach", "execution_time_s": 120.0, "quality_score": 0.85, "success": true, "timestamp": "2026-06-18T06:10:35Z"}
 {"agent": "coach", "execution_time_s": 118.0, "quality_score": 0.86, "success": true, "timestamp": "2026-06-18T10:06:38Z"}
@@ -0,0 +1,2 @@
 {"agent": "optimization", "execution_time_s": 90.0, "quality_score": 0.8, "success": true, "timestamp": "2026-06-18T06:10:35Z"}
 {"agent": "optimization", "execution_time_s": 88.0, "quality_score": 0.81, "success": true, "timestamp": "2026-06-18T10:06:38Z"}