ISSUE-WP-0003-T06: issue-core REST sink via actcore-issue-core-bridge (node-local tunnel 18765)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
chore(consistency): sync task status from DB [auto]
2026-07-02 14:20:12 +02:00 · 2026-07-02 11:55:47 +02:00 · 2026-07-02 11:55:07 +02:00 · 2026-07-02 11:54:43 +02:00 · 2026-07-02 11:54:04 +02:00 · 2026-07-02 10:44:00 +02:00
68 changed files with 6890 additions and 207 deletions
--- a/.claude/rules/credential-routing.md
+++ b/.claude/rules/credential-routing.md
@@ -0,0 +1,50 @@
+# Credential and access routing
+
+**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
+for inference. Run this check **before** requesting secrets, API keys, SSH access,
+login tokens, or database passwords — in any repo, not only `ops-warden`.
+
+ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
+other credential need belongs to another subsystem. **Do not** message
+`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
+
+### Lookup (do this first)
+
+```bash
+warden route find "<describe your need>" --json
+warden route show <catalog-id> --json
+```
+
+Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
+
+| Agent runtime | How to orient |
+| --- | --- |
+| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=activity-core` is for coordination, not secret vending |
+| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
+| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
+
+### Quick routing table
+
+| I need… | Owner | ops-warden executes? |
+| --- | --- | --- |
+| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
+| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
+| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
+| Authorization decision | flex-auth | No — route only |
+| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
+| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
+
+### Anti-patterns (do not do these)
+
+- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
+- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
+- Pasting secrets into Git, State Hub, workplans, logs, or chat
+
+### Other capabilities (reuse-surface)
+
+Non-credential capabilities are usually discovered through **reuse-surface** federation
+(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
+every repo's agent instructions because it is high-frequency, high-risk, and easy to
+get wrong.
+
+**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
--- a/.claude/rules/first-session.md
+++ b/.claude/rules/first-session.md
@@ -1,11 +1,11 @@
 ## First Session Protocol

-Triggered when `get_domain_summary("custodian")` shows **no workstreams**.
+Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
 The project is registered but work has not yet been structured.

 **Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/custodian/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/custodian/roadmap_v0.1.md` — planned phases
+- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
+- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
 - Scan repo root: README, directory structure, existing code or docs

 **Step 2 — Survey in-progress work**
@@ -17,7 +17,7 @@ roadmap phase. **Wait for approval before creating.**

 **Step 4 — Create workplan file first, then DB record (ADR-001)**
 ```
-workplans/activity-core-WP-NNNN-<slug>.md   ← write this first
+workplans/ACTIVITY-WP-NNNN-<slug>.md   ← write this first
 ```
 Then register in the hub:
 ```
@@ -28,7 +28,7 @@ create_task(workstream_id="<id>", title="...", priority="high|medium|low")
 **Step 5 — Record the setup**
 ```
 add_progress_event(
-    summary="First session: structured custodian into N workstreams, M tasks",
+    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
--- a/.claude/rules/repo-identity.md
+++ b/.claude/rules/repo-identity.md
@@ -1,5 +1,5 @@
 **Purpose:** Durable task factory built on Temporal. Manages ActivityDefinitions, schedules recurring workflows via Temporal Schedules, routes events via NATS JetStream, and exposes a FastAPI CRUD surface for the custodian domain.

-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** activity-core
 **Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a
--- a/.claude/rules/session-protocol.md
+++ b/.claude/rules/session-protocol.md
@@ -1,6 +1,7 @@
 ## Session Protocol

-State Hub: http://127.0.0.1:8000
+Dev Hub (State Hub API): http://127.0.0.1:8000
+MCP server name in `~/.claude.json`: `dev-hub`

 **Step 1 — Orient**

@@ -10,7 +11,7 @@ cat .custodian-brief.md
 ```
 Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
 ```
-get_domain_summary("custodian")
+get_domain_summary("infotech")
 ```
 If MCP tools are unavailable in the current agent session, use the REST API:
 ```bash
@@ -39,11 +40,11 @@ curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
 ls workplans/
 ```
 For each file with `status: ready`, `active`, or `blocked`, note pending
-`todo`/`in_progress` tasks.
+`wait`/`todo`/`progress` tasks.

 **Step 4 — Present brief**

-1. **Active workstreams** for `custodian` — title, task counts, blocking decisions
+1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
 2. **Pending tasks** from `workplans/` + any `[repo:activity-core]` hub tasks
 3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
--- a/.claude/rules/workplan-convention.md
+++ b/.claude/rules/workplan-convention.md
@@ -1,7 +1,7 @@
 ## Workplan Convention (ADR-001)

-File location: `workplans/activity-core-WP-NNNN-<slug>.md`
-ID prefix: `ACTIVITY-WP`
+File location: `workplans/ACTIVITY-WP-NNNN-<slug>.md`
+ID prefix: `ACTIVITY-WP-`

 Work items originate as files in this repo **before** being registered in the hub.

@@ -12,7 +12,7 @@ repo state, and `finished` when implementation is complete. `stalled` and
 `needs_review` are derived health labels, not stored statuses.

 Closed workplans may be moved to `workplans/archived/` with a completion-date
-prefix: `YYMMDD-activity-core-WP-NNNN-<slug>.md`. The frontmatter id remains
+prefix: `YYMMDD-ACTIVITY-WP-NNNN-<slug>.md`. The frontmatter id remains
 unchanged; the prefix is only for quick visual reference.

 Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
@@ -25,4 +25,16 @@ Ecosystem todos from other agents arrive as `[repo:activity-core]` hub tasks —
 visible at session start. Pick one up by creating the workplan file, then registering
 the workstream.

+Task blocks use this shape:
+
+```task
+id: ACTIVITY-WP-NNNN-T01
+status: wait | todo | progress | done | cancel
+priority: high | medium | low
+state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
+```
+
+Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
+blocked work and `cancel` for stopped work.
+
 <!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -1,33 +1,23 @@
 <!-- custodian-brief: generated by fix-consistency — do not edit manually -->
 # Custodian Brief — activity-core

-**Domain:** custodian  
-**Last synced:** 2026-06-18 13:20 UTC  
+**Domain:** infotech  
+**Last synced:** 2026-07-02 09:55 UTC  
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*

 ## Active Workstreams

-### Definition And Schedule Hot Reload
-Progress: 0/5 done  |  workstream_id: `8887075e-21ec-451b-b82b-cd81035c9ca5`
+### Adopt State Hub Beachhead Endpoint
+Progress: 0/2 done  |  workstream_id: `bbc07f9e-9323-4b2b-b556-c33b37d0b228`

 **Open tasks:**
- ! Live No-Restart Smoke  `68a0e22a`
- · Extract Reusable Sync Service  `53a7970b`
- · Add Admin Sync Endpoint  `8697c761`
- · Preserve Schedule Drift Semantics  `efeac412`
- · Optional Background Sync Loop  `d774087b`
-
-### Post-triage operational hardening
-Progress: 6/7 done  |  workstream_id: `5646e13a-13af-4724-bca6-3c0d86f96733`
-
-**Open tasks:**
- ! Three-Run Calibration Feedback  `7cbf0a35`
+- ! Point STATE_HUB_URL at the beachhead  `76b6132d`
+- ! Retire the bespoke actcore-state-hub-bridge proxy  `526c2129`

 ### Daily Triage LLM Reconciliation And Evidence
-Progress: 1/5 done  |  workstream_id: `f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9`
+Progress: 2/5 done  |  workstream_id: `f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9`

 **Open tasks:**
- ! Reconcile Live Railiance Runtime  `23545ddc`
 - ! Run Daily Triage Fixture Smoke  `10e0df77`
 - ! Collect Three Clean Scheduled Runs  `dc6b9482`
 - ! Close Handoff State  `ecc57e21`
@@ -49,6 +39,6 @@ Progress: 2/3 done  |  workstream_id: `7387fc50-1f2c-471a-9d85-bb085cbd0b63`
 ## MCP Orientation (when available)

 If the state-hub MCP server is reachable, call:
-`get_domain_summary("custodian")`
+`get_domain_summary("infotech")`
 This provides richer cross-domain context.
 If the MCP call fails, use this file as your orientation source.
--- a/.env.example
+++ b/.env.example
@@ -18,7 +18,9 @@ STATE_HUB_URL=http://127.0.0.1:8000
 # Repo scoping — used by the repo-scoping context adapter. Binds {} on failure.
 REPO_SCOPING_URL=http://127.0.0.1:8020
 # Issue Core — task emission backend.
-ISSUE_CORE_URL=http://127.0.0.1:8010
+ISSUE_CORE_URL=http://127.0.0.1:8765
+# Shared ingestion key — must match issue-core's ISSUE_CORE_API_KEY.
+ISSUE_CORE_API_KEY=
 # Sink type: 'rest' (POST to issue-core) or 'null' (discard, for dry-run).
 ISSUE_SINK_TYPE=rest

--- a/.kaizen/schedule.yml
+++ b/.kaizen/schedule.yml
@@ -1,17 +1,15 @@
-# Kaizen scheduled agent execution (ADR-005)
-# Engagement: coulomb-loop — stabilize phase (daily crons per ADR-003)
-# Promoted 2026-06-18 after 3/3 bootstrap E2E cycles
+# Kaizen scheduled agent execution manifest (ADR-005)
+# Engagement: coulomb-loop bootstrap — weekly cadence
+# Regulator promotes cadence per customer engagement policy (ADR-003).
+# Validate with: kaizen-agentic schedule validate
 version: '1'
 timezone: Europe/Berlin
 agents:
  coach:
-    cadence: daily
-    cron: "0 9 * * *"
+    cadence: weekly
+    cron: 0 9 * * 1
    enabled: true
  optimization:
-    cadence: daily
-    cron: "0 10 * * *"
+    cadence: weekly
+    cron: 0 10 * * 1
    enabled: true
-  tdd-workflow:
-    cadence: monthly
-    enabled: false
--- a/.repo-classification.yaml
+++ b/.repo-classification.yaml
@@ -0,0 +1,28 @@
+# Repo classification (Repo Classification Standard v1.0).
+
+repo_classification:
+  standard: Repo Classification Standard
+  version: '1.0'
+  classified_at: '2026-06-22'
+  classified_by: human
+  category: tooling
+  domain: infotech
+  secondary_domains:
+  - agents
+  capability_tags:
+  - workflow
+  - orchestration
+  - automation
+  - coordination
+  - observability
+  business_stake:
+  - technology
+  - operations
+  - automation
+  - execution
+  business_mechanics:
+  - coordination
+  - operation
+  - adaptation
+  notes: Org-wide event bridge / task factory (Temporal-based). Active bounded implementation
+    -> project.
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,7 +4,7 @@

 **Purpose:** Durable task factory built on Temporal. Manages ActivityDefinitions, schedules recurring workflows via Temporal Schedules, routes events via NATS JetStream, and exposes a FastAPI CRUD surface for the custodian domain.

-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** activity-core
 **Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
 **Workplan prefix:** `ACTIVITY-WP-`
@@ -83,7 +83,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
 2. Check inbox: `GET /messages/?to_agent=activity-core&unread_only=true`; mark read
 3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
-4. Check blocked tasks: `GET /tasks/?needs_human=true`
+4. Check human-needed tasks: `GET /tasks/?needs_human=true`

 **During work:**
 - Update task statuses in workplan files as tasks progress
@@ -101,6 +101,78 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \

 ---

+## Credential and access routing
+
+**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
+for inference. Run this check **before** requesting secrets, API keys, SSH access,
+login tokens, or database passwords — in any repo, not only `ops-warden`.
+
+ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
+other credential need belongs to another subsystem. **Do not** message
+`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
+
+### Lookup (do this first)
+
+```bash
+warden route find "<describe your need>" --json
+warden route show <catalog-id> --json
+```
+
+Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
+
+| Agent runtime | How to orient |
+| --- | --- |
+| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=activity-core` is for coordination, not secret vending |
+| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
+| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
+
+### Quick routing table
+
+| I need… | Owner | ops-warden executes? |
+| --- | --- | --- |
+| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
+| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
+| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
+| Authorization decision | flex-auth | No — route only |
+| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
+| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
+
+### Anti-patterns (do not do these)
+
+- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
+- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
+- Pasting secrets into Git, State Hub, workplans, logs, or chat
+
+### Other capabilities (reuse-surface)
+
+Non-credential capabilities are usually discovered through **reuse-surface** federation
+(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
+every repo's agent instructions because it is high-frequency, high-risk, and easy to
+get wrong.
+
+**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
+
+<!-- REPO-AGENTS-EXTENSIONS -->
+<!-- Append repo-specific agent instructions below this marker.
+     The state-hub template sync preserves content after this line. -->
+
+---
+
+## Automation Scheduling Preference
+
+Durable activity-core automations must use this repo's own infrastructure:
+Temporal Schedules, NATS JetStream, activity-core run records, State Hub
+progress, and configured report/evidence sinks. Do not use coding
+assistant-provided automation, reminder, or heartbeat tooling as the execution
+or evidence source for production or operational recurrence.
+
+Coding assistants may run repo-native inspection commands and summarize their
+outputs, but the baseline answer to questions like "How did our automations go
+since Friday?" must come from deterministic local tooling such as the
+ACTIVITY-WP-0018 automation status surface.
+
+---
+
 ## Workplan Convention (ADR-001)

 Work items originate as files in this repo — not in the hub. The hub is a
@@ -124,7 +196,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
 id: ACTIVITY-WP-NNNN
 type: workplan
 title: "..."
-domain: custodian
+domain: infotech
 repo: activity-core
 status: proposed | ready | active | blocked | backlog | finished | archived
 owner: codex
@@ -154,10 +226,7 @@ state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 Task description text.
 ```

-Status progression: `todo` → `progress` → `done`; use `wait` for a task
-blocked on external input and `cancel` for intentionally abandoned work.
-Workstream/workplan lifecycle status is separate; frontmatter `blocked` remains
-valid there.
+Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.

 To create a new workplan:
 1. Write the file following the format above
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,4 +8,5 @@
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
+@.claude/rules/credential-routing.md
@.claude/rules/agents.md
--- a/Makefile
+++ b/Makefile
@@ -1,13 +1,17 @@
 -include .env
 export

-.PHONY: sync-event-types sync-activity-definitions test migrate sync-all \
+.PHONY: sync-event-types sync-activity-definitions sync-schedules test migrate sync-all \
+        automation-status automation-status-json automation-list automation-list-json \
        dev-up dev-down railiance-up railiance-down \
        start-worker start-api start-event-router help

 sync-activity-definitions:  ## Sync ActivityDefinition files into DB
 	uv run python -m activity_core.sync_activity_definitions

+sync-schedules:  ## Reconcile Temporal schedules from activity_definitions DB
+	uv run python -m activity_core.sync_schedules
+
 sync-event-types:  ## Sync event type YAML files into DB
 	uv run python scripts/sync_event_types.py

@@ -21,6 +25,27 @@ migrate:  ## Apply all pending Alembic migrations

 sync-all: sync-event-types sync-activity-definitions  ## Sync event types and activity definitions

+# -- Automation status ---------------------------------------------------------
+
+SINCE ?= today
+FORMAT ?= human
+ENABLED ?= all
+TRIGGER ?=
+ACTIVITY_ID ?=
+ACTIVITY_NAME ?=
+
+automation-status:  ## Report recent automation status from repo-owned evidence
+	uv run python scripts/automation_status.py --since "$(SINCE)" $(if $(UNTIL),--until "$(UNTIL)",) --format "$(FORMAT)"
+
+automation-status-json:  ## Report recent automation status as JSON
+	$(MAKE) automation-status FORMAT=json
+
+automation-list:  ## List configured scheduled automations from repo-owned definitions
+	@uv run python scripts/automation_inventory.py --format "$(FORMAT)" --enabled "$(ENABLED)" $(if $(TRIGGER),--trigger-type "$(TRIGGER)",) $(if $(ACTIVITY_ID),--activity-id "$(ACTIVITY_ID)",) $(if $(ACTIVITY_NAME),--activity-name "$(ACTIVITY_NAME)",)
+
+automation-list-json:  ## List configured scheduled automations as JSON
+	@$(MAKE) --no-print-directory automation-list FORMAT=json
+
 # ── Infrastructure ─────────────────────────────────────────────────────────────

 dev-up:  ## Start full dev stack (Temporal + PG + ES + NATS)
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -64,7 +64,9 @@ The two evaluation modes:
  `context.*` / `event.*` interpolation and explicit `for_each` per-item
  binding. No `exec()`.
 - **Instruction executor**: trusted-field prompt rendering, LLM call via
-  llm-connect, structured output validation, bounded validation-failure
+  llm-connect, structured output validation, item-granular recovery with a
+  quarantine lane and producer guardrails (count/length/depth caps, reference
+  allow-list) at the producer trust boundary, bounded validation-failure
  artifacts for report instructions, review-required audit metadata, and
  deterministic report sinks. A real downstream review queue is not implemented
  in this repo.
@@ -88,6 +90,9 @@ The two evaluation modes:
 - **REST admin API** (FastAPI): CRUD for ActivityDefinitions, manual trigger,
  event type registry queries.
 - **Prometheus metrics**: Temporal SDK metrics exposed for scraping.
+- **Automation status surface**: deterministic, non-LLM status reporting via
+  `make automation-status` / `scripts/automation_status.py`, using repo-owned
+  evidence sources rather than coding assistant scheduler state.
 - **Operational runbook**: `docs/runbook.md`.

 ---
@@ -114,6 +119,10 @@ The two evaluation modes:
  runs on Railiance infrastructure (or Docker Compose for dev).
 - **End-user task UI** — tasks land in issue-core; presentation is separate.
 - **Synchronous request-response patterns** — Temporal is async-first.
+- **Coding assistant automation infrastructure** — assistant-provided reminders,
+  heartbeats, or scheduled jobs are not the execution or evidence authority for
+  activity-core automations. Assistants may run and summarize repo-native
+  commands only.

 ---

@@ -130,6 +139,8 @@ The two evaluation modes:
  commands.
 - You are replacing scattered bespoke cron jobs and manual coordination with
  a governed, observable automation layer.
+- You need to answer "how did our automations go since Friday?" from
+  deterministic repo-native evidence before any optional LLM summary.

 ---

@@ -320,6 +331,9 @@ new one-off control paths.
  governance model, event type schema, ActivityDefinition structure.
 - `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
  model, evaluation semantics, audit trail, testing strategy.
+- `docs/adr/adr-004-producer-trust-boundary.md` — untrusted-producer premise,
+  trust-but-handle vs verify-and-mitigate postures, error-locality and
+  quarantine-with-provenance, producer guardrails for LLM/agent/human output.

 ---

--- a/docs/adr/adr-004-producer-trust-boundary.md
+++ b/docs/adr/adr-004-producer-trust-boundary.md
@@ -0,0 +1,156 @@
+---
+id: ACT-ADR-004
+type: architecture-decision-record
+title: "The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output"
+status: accepted
+decided_by: Bernd Worsch
+date: "2026-06-26"
+scope: cross-repo
+affects:
+  - activity-core
+  - rules-core (future extraction)
+tags: ["architecture", "llm", "safety", "validation", "guardrails", "trust-boundary", "resilience"]
+---
+
+# ACT-ADR-004: The Producer Trust Boundary
+
+## Status
+
+Accepted.
+
+## Context
+
+On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called
+llm-connect successfully, and produced a long ranked recommendation list — but
+the JSON broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because
+the report was validated and consumed as a single monolithic JSON document, one
+malformed delimiter discarded the **entire** run, including the 7 perfectly good
+recommendations the model had already emitted. The scheduling and runtime layers
+were healthy; the failure was entirely at the seam where free-form model output
+meets a strict consumer.
+
+This is not a one-off bug; it is a recurring class of failure. activity-core has a **trust
+boundary** wherever generative or human-authored output meets strict deterministic
+consumers: the JSON Schema validator, the task emitter, and any classic compute
+pipeline downstream. The producers on the other side of that boundary — **LLMs,
+agents, and humans** — are all *untrusted producers*. Their output may be:
+
+- **erroneous** — hallucination, truncation at a token limit, drift, type slips,
+  typos, a missing delimiter; or
+- **malicious** — prompt injection, crafted payloads, or oversized / deeply-nested
+  structures intended to exhaust or confuse the consumer.
+
+The pre-existing design treated producer output optimistically: parse the whole
+document, validate the whole document, and on any failure discard the whole
+document (preserving only a bounded diagnostic preview). That gives **zero error
+locality** — the blast radius of any single defect is the entire activation.
+
+## Decision
+
+Treat the producer→consumer seam as an explicit, adversarial **trust boundary**,
+and place guardrails plus error-correction tooling *at that boundary* rather than
+letting raw producer output flow into deterministic consumers.
+
+### Two non-fail-fast postures
+
+When hard-failing on a problem is undesirable, there are two sound strategies, and
+they **compose**:
+
+- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
+  as-is; on exception, catch → repair → retry, or quarantine. Cheap on the happy
+  path; blast radius depends entirely on how granular the catch is. Best when
+  failures are rare and locally recoverable. Risk: failures surface late, possibly
+  after partial side effects.
+- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
+  and normalize the output to a known-good shape *before* it enters the pipeline —
+  drop bad items, coerce types, bound sizes/depth, allow-list references — so the
+  consumer only ever sees clean input. Higher upfront cost, smaller blast radius,
+  no partial side effects. Best when failures are common or consequences are high.
+
+### Governing principles
+
+1. **Push verification to the boundary; keep the interior strict.** Apply posture
+   **B** at the producer→consumer boundary; keep posture **A** for residual
+   exceptions inside the verified core. Never relax the interior schema to absorb
+   producer sloppiness.
+2. **Make error locality match the unit of work.** One bad recommendation must
+   cost one recommendation, not the whole report. Structuring the payload so each
+   item is independently parseable and validatable is the highest-leverage change.
+3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
+   provenance-tagged artifacts (`index`, `error`, `raw` snippet, `reason`) so they
+   can be debugged or replayed. Degraded-but-usable is reported distinctly from
+   total loss.
+4. **Both human and agent input get the same rigor.** Guardrails are
+   producer-agnostic: the same count / length / depth caps and reference
+   allow-lists apply whether the producer is an LLM, an agent, or a human.
+
+### What this means concretely in activity-core
+
+Implemented in `src/activity_core/rules/executor.py`:
+
+- **Strict-structure-only schema.** The daily-triage output schema is strict on
+  per-item *structure* (`required [rank, candidate, action, why]`, typed `wsjf`)
+  and carries `maxItems` as a producer *hint* — never as a hard whole-document
+  reject, which would reproduce the very blast-radius failure (ACT-ADR-002 governs
+  the schema format; `schemas/daily-triage-report.json`).
+- **Item-granular recovery (posture B).** When whole-document parse + one retry
+  fail, `_resilient_report` recovers individually-parseable recommendation objects
+  via a brace/quote-aware scanner (`_extract_object_spans`) that works for both
+  pretty-printed and NDJSON output, attempts a best-effort `_try_repair` on a
+  truncated tail, validates each recovered object against the item schema, and
+  keeps the valid ones. Survivors are emitted with `output_validated=true`,
+  `partial=true`, and `review_required=true`.
+- **Producer guardrails (`_partition_items`, applied on both the recovery and the
+  happy path).** Per recommendation: structural type → schema → structural caps
+  (`_MAX_DEPTH`, `_MAX_STRING_LEN`) → reference allow-list → count cap (top-N by
+  `maxItems`). The first failing check quarantines the item with provenance and a
+  `reason` (`malformed` / `schema` / `guardrail` / `allow_list` / `over_limit`).
+- **Reference allow-list.** A recommendation whose `candidate` is not in the set of
+  known ids is quarantined. The set is sourced from resolved context
+  (`context["known_candidates"]`, via `_allow_list_from_context`); the check is
+  inert until a context resolver populates it, so the capability ships now and
+  activates with a one-line resolver change.
+
+### Where each posture sits
+
+| Layer | Posture | Mechanism |
+|-------|---------|-----------|
+| Schema / contract | B | strict per-item structure; `maxItems` as hint |
+| Whole-document parse | A | tolerant parse + single retry |
+| Failed parse | B | item-granular recovery + repair + quarantine |
+| Per-item screening | B | schema + depth/length caps + allow-list + count cap |
+| Emitted report | — | `partial` / `quarantined_*` provenance; never silent |
+
+## Consequences
+
+- A single malformed or oversized item no longer discards an entire activation;
+  the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid
+  recommendations and quarantine the broken tail.
+- Reports gain a `partial` / `quarantined_*` vocabulary; downstream report sinks
+  and reviewers can distinguish degraded-but-usable from total loss.
+- Guardrail thresholds (`_MAX_DEPTH`, `_MAX_STRING_LEN`, `maxItems`, the
+  allow-list) are policy knobs that will need tuning; they are intentionally
+  conservative defaults, not a finished calibration.
+- **Known retention gap (follow-on):** `LLMConnectClient.complete()` still returns
+  only `content`, discarding `finish_reason`/`usage`, and the total-loss artifact
+  caps raw output below realistic break points. Capturing those signals so
+  failures stay debuggable is tracked as a retention fix, not closed by this ADR.
+
+## Alternatives considered
+
+- **Hard-enforce `maxItems` in the validator.** Rejected: a hard reject of an
+  over-count document reproduces the whole-document blast radius. Mitigation (keep
+  top-N, quarantine the rest) is preferred.
+- **Relax the schema to accept anything.** Rejected: violates principle 1; pushes
+  malformed data into downstream consumers.
+- **Retry-until-valid only (pure posture A).** Rejected as the sole strategy: the
+  2026-06-26 failure recurred across both the initial attempt and the retry, so
+  retry alone does not bound the blast radius.
+
+## References
+
+- ACT-ADR-002 — markdown-as-definition format and output schema governance.
+- ACT-ADR-003 — Rule vs. Instruction model; the Instruction prompt-injection
+  surface this boundary complements on the output side.
+- `workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md` — the
+  implementing workplan.
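Reviewer sketch for the ADR above: the item-granular recovery and per-item screening it describes can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code in `src/activity_core/rules/executor.py`. The function names `extract_object_spans` and `partition_items`, the `MAX_STRING_LEN` value, and the sample candidate ids are hypothetical; only the required keys and the quarantine fields (`index`, `error`, `raw`, `reason`) come from the ADR text. The sketch assumes flat recommendation objects and omits the truncated-tail repair step.

```python
import json

REQUIRED = ("rank", "candidate", "action", "why")  # per-item keys named in the ADR
MAX_STRING_LEN = 200  # hypothetical cap; the real threshold is not stated


def extract_object_spans(text):
    """Return (start, end) for each balanced {...} span, ignoring braces in strings.

    Naive on purpose: nested objects would also be reported as spans.
    """
    spans, stack = [], []
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append(i)
        elif ch == "}" and stack:
            spans.append((stack.pop(), i + 1))
    return sorted(spans)


def partition_items(text, known_candidates=None):
    """Split recovered objects into (valid, quarantined); never drop silently."""
    valid, quarantined = [], []
    for index, (start, end) in enumerate(extract_object_spans(text)):
        raw = text[start:end]
        try:
            item = json.loads(raw)
        except json.JSONDecodeError as exc:
            reason, error = "malformed", str(exc)
        else:
            missing = [key for key in REQUIRED if key not in item]
            too_long = any(
                isinstance(v, str) and len(v) > MAX_STRING_LEN for v in item.values()
            )
            if missing:
                reason, error = "schema", f"missing keys: {missing}"
            elif too_long:
                reason, error = "guardrail", "string exceeds length cap"
            elif known_candidates is not None and item["candidate"] not in known_candidates:
                reason, error = "allow_list", "unknown candidate"
            else:
                valid.append(item)
                continue
        quarantined.append(
            {"index": index, "reason": reason, "error": error, "raw": raw[:80]}
        )
    return valid, quarantined


# A document that fails whole-document parsing: one item lacks a comma,
# one references an unknown id, and the tail is truncated mid-string.
broken = (
    '{"recommendations": ['
    '{"rank": 1, "candidate": "T-1", "action": "do", "why": "a"},'
    '{"rank": 2, "candidate": "T-2", "action": "do" "why": "b"},'
    '{"rank": 3, "candidate": "T-9", "action": "do", "why": "c"},'
    '{"rank": 4, "candidate": "T-4", "act'
)
valid, quarantined = partition_items(broken, known_candidates={"T-1", "T-2", "T-4"})
print(len(valid), [q["reason"] for q in quarantined])
```

In this sketch one defect costs one item: the first recommendation survives, while the malformed and unknown-candidate items are kept as provenance-tagged quarantine records. The truncated tail never closes its brace, so the sketch does not see it at all; the ADR's repair step is what would address that case.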
--- a/docs/issue-core-emission-boundary.md
+++ b/docs/issue-core-emission-boundary.md
@@ -11,7 +11,9 @@ The current authoritative boundary is the issue-core REST API:
 POST {ISSUE_CORE_URL}/issues/
 ```

-`IssueCoreRestSink` sends this payload:
+`IssueCoreRestSink` authenticates with the shared `ISSUE_CORE_API_KEY` env var
+(same value as the issue-core server) via `Authorization: Bearer <key>` and
+sends this payload:

 ```json
 {
@@ -52,7 +54,7 @@ task reference before it can replace `IssueCoreRestSink`.

 Weekly SBOM staleness is safe to evaluate in dry-run mode because the rule
 contract is deterministic and tested. Do not enable it against the real REST sink
-until issue-core credentials, endpoint reachability, and duplicate-handling are
+until `ISSUE_CORE_API_KEY`, endpoint reachability, and duplicate-handling are
 verified in the target environment.

 ## Verification
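Reviewer sketch for the doc change above: the Bearer authentication it describes could look roughly like this when building the request. This is an illustrative sketch only, not the `IssueCoreRestSink` implementation. The helper name `build_issue_request`, the empty-key check, and the placeholder payload are assumptions; the `POST {ISSUE_CORE_URL}/issues/` endpoint, the `ISSUE_CORE_API_KEY` variable, and the `Authorization: Bearer <key>` header come from the document. The request is constructed but not sent.

```python
import json
import os
import urllib.request


def build_issue_request(base_url, api_key, payload):
    """Build (but do not send) an authenticated POST to the issue-core REST sink."""
    if not api_key:
        # Assumption: failing fast on a missing key is preferable to an
        # unauthenticated call; the real sink's behavior is not documented here.
        raise ValueError("ISSUE_CORE_API_KEY is not set")
    return urllib.request.Request(
        base_url.rstrip("/") + "/issues/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration; the payload fields are hypothetical.
request = build_issue_request(
    os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8765"),
    "example-key-not-real",
    {"title": "placeholder"},
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the header and URL easy to check in a dry run before the sink is enabled against a real endpoint.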
--- a/docs/runbook.md
+++ b/docs/runbook.md
@@ -116,7 +116,129 @@ asyncio.run(publish())

 ---

-## Syncing schedules manually
+## Syncing definitions and schedules manually
+
+When the API is running, prefer the admin sync endpoint for definition or
+schedule changes. It refreshes file-backed ActivityDefinitions and reconciles
+Temporal Schedules without restarting the worker:
+
+```bash
+curl -s -X POST \
+  'http://localhost:8010/admin/sync?definitions=true&schedules=true'
+```
+
+The response reports:
+
+- `definitions.synced`
+- `event_types.synced`
+- `schedules.upserted`
+- `schedules.paused`
+- `schedules.deleted_orphans`
+- bounded `errors[]`
+
+## Automation inventory
+
+Use the repo-native inventory command to answer "what automations are scheduled
+at all?" before checking whether a recent window succeeded. The command is
+read-only: it loads ActivityDefinition rows or files and, when `TEMPORAL_HOST`
+is configured, describes Temporal schedules for visibility. It does not sync,
+upsert, pause, delete, or enqueue schedules.
+
+```bash
+# Human-readable configured automation inventory.
+make automation-list
+
+# JSON for scripts or assistant summarization.
+make automation-list-json
+
+# Common filters.
+make automation-list ENABLED=true TRIGGER=cron
+make automation-list ACTIVITY_ID=6fca51fa-387a-4fd0-bc4e-d62c29eb859a
+```
+
+Inventory answers what is configured; `make automation-status` answers what
+happened in a time window. Missing optional live sources are warnings, not
+silent omissions, so a degraded local run still lists repo definition files.
+
+Compact human output looks like:
+
+```text
+- Daily State Hub WSJF Triage [enabled cron] schedule=activity-schedule-... trigger=20 7 * * * tz=Europe/Berlin source=files temporal=not_checked
+```
+
+## Automation status
+
+Use the repo-native status command to answer operator questions such as "how did
+our automations go since Friday?". This is the baseline evidence surface; LLMs
+or coding assistants may summarize the output, but they are not the scheduler or
+source of truth.
+
+```bash
+# Human-readable status. `friday` resolves in Europe/Berlin by default.
+make automation-status SINCE=friday
+
+# JSON for scripts or assistant summarization.
+make automation-status-json SINCE=2026-06-26
+```
+
+The command reads activity-core owned evidence only: ActivityDefinition files or
+DB rows, `activity_runs`, State Hub progress, working-memory report notes, and
+Temporal visibility when `TEMPORAL_HOST` is configured. Missing live sources are
+reported as warnings rather than hidden. It exits non-zero for real automation
+failures such as `missed`, `validation_failed`, or `sink_failed`.
+
+Useful knobs:
+
+```bash
+AUTOMATION_STATUS_TIMEOUT_SECONDS=10 make automation-status SINCE=friday
+make automation-status SINCE=2026-06-26 FORMAT=json
+make automation-status SINCE=2026-06-26 UNTIL=2026-06-27 ACTCORE_DB_URL=
+```
+
+Example from the June 2026 daily triage evidence (evidence recorded, output not validated):
+
+```text
+- Activity 6fca51fa-387a-4fd0-bc4e-d62c29eb859a [validation_failed] expected=0 runs=0 evidence=2
+  evidence state_hub_progress event_type=daily_triage run=ebec6e41... output_validated=false validation_error=Unterminated string...
+  evidence state_hub_progress event_type=daily_triage run=c7370f9c... output_validated=false validation_error=Expecting ',' delimiter...
+```
+
+That means the schedule/report path left evidence, but the report was not a
+clean validated output. Disabled schedules, such as the gated weekly coding
+retro, are reported as `disabled` and are not counted as missed runs.
+
+`event_types` defaults to `false` on `POST /admin/sync` because event-triggered
+definitions already reload from the DB in the event router path; opt in when
+the operator intentionally changed event type definition files:
+
+```bash
+curl -s -X POST \
+  'http://localhost:8010/admin/sync?definitions=true&schedules=true&event_types=true'
+```
+
+The v1 posture is manual/operator-triggered sync. A periodic background loop is
+deferred until live use shows it is needed; this keeps customer definition
+changes explicit and avoids background repo scanning from the worker.
+
+### Railiance01 no-restart smoke
+
+After changing a projected definition in `k8s/railiance/20-runtime.yaml`,
+apply the ConfigMap and wait for the API pod volume to refresh (up to ~60s),
+then reconcile without restarting `actcore-worker`:
+
+```bash
+export KUBECONFIG=~/.kube/config-hosteurope
+kubectl apply -f k8s/railiance/20-runtime.yaml
+sleep 60
+kubectl -n activity-core exec deploy/actcore-api -- \
+  python3 -c 'import urllib.request; req=urllib.request.Request("http://localhost:8010/admin/sync?definitions=true&schedules=true", method="POST"); print(urllib.request.urlopen(req).read().decode())'
+```
+
+Automated regression for the disabled `ops-service-inventory-probes`
+projection (enable/cadence flip, idempotent repeat sync, rollback) lives in
+`scripts/smoke_admin_sync_no_restart.py`.
+
+If the API is unavailable, the schedule-only CLI remains available:

 ```bash
 TEMPORAL_HOST=localhost:7233 \
@@ -126,7 +248,7 @@ ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \

 This reconciles all Temporal Schedules with the `activity_definitions` table:
 - Upserts schedules for every enabled cron definition
-- Creates paused schedules for disabled cron definitions
+- Creates paused schedules for disabled cron or one-shot scheduled definitions
 - Deletes orphaned schedules with no matching DB row

 After adding or changing a recurring ActivityDefinition or workflow activity
@@ -282,6 +404,52 @@ the same durable consumer name provides automatic failover.

 ---

+## Run-miss recovery policies (cron triggers)
+
+A cron fire is **missed** when the worker or Temporal is unavailable at trigger
+time. `trigger_config.misfire_policy` selects what happens when the system
+recovers. Each policy combines a Temporal **catchup window** (how far back missed
+fires are recovered) with an **overlap policy** (what to do if a recovered fire
+would start while a prior run is still executing):
+
+| `misfire_policy` | Behaviour | Default catchup window | Overlap |
+| --- | --- | --- | --- |
+| `skip` | Run on trigger or skip — a missed fire is never recovered | 60s grace | `SKIP` |
+| `catchup_all` | Recover **every** fire missed during the outage | 365 days | `BUFFER_ALL` |
+| `catchup_latest` | Recover only the **most recent** missed fire; no backlog | 24h | `BUFFER_ONE` |
+
+Set `trigger_config.catchup_window_seconds` to override the per-policy default
+(e.g. an hourly definition using `catchup_latest` should set it to ~3600 so a
+single missed hour is recovered but older ones are not).
+
+Legacy values are still accepted: `catchup` → `catchup_all`,
+`compress` → `catchup_latest`.
+
+> **Why this exists:** before ACTIVITY-WP-0014 no catchup window was set, so a
+> brief outage at trigger time silently dropped the fire with no recovery and no
+> log line. The `daily-statehub-wsjf-triage` definition now uses `catchup_latest`.
+
+## State Hub write idempotency (ACTIVITY-WP-0014 T05)
+
+Every State Hub write from activity-core (report-sink progress, ops-evidence
+progress, schedule-miss alerts) carries a stable **`Idempotency-Key`** header
+derived deterministically from the write's identity
+(`run_id:instruction_id:event_type`, or `schedule_miss:activity_id:last_fired`
+for miss alerts). This makes writes safe to **buffer and replay** under the
+planned State Hub *beachhead* (per-machine read cache + write outbox): a flush —
+possibly retried after an outage — cannot create duplicate progress/triage
+events once State Hub / the beachhead honours the header.
+
+The guarantee lives on the write, not on a live dedup read. The read-based
+`_progress_exists` check is now best-effort only: if State Hub is unreachable it
+returns `False` (proceed to the keyed write) rather than hard-failing. The header
+passes untouched through the `actcore-state-hub-bridge` proxy and is ignored by
+State Hub versions that do not yet honour it.
+
+> The queue/cache itself is **not** built in activity-core — it belongs to the
+> state-hub beachhead. activity-core only emits the key. See the proposal sent to
+> the `state-hub` agent.
+
 ## Troubleshooting

 ### Worker fails to start: "ACTCORE_DB_URL is required"
@@ -291,6 +459,9 @@ Set the environment variable before running the worker.
 1. Check Temporal UI → Schedules tab for the schedule status.
 2. Ensure `enabled=True` on the ActivityDefinition (paused schedules don't fire).
 3. Verify the cron expression with: `docker exec temporal-admin-tools temporal schedule describe --schedule-id activity-schedule-<uuid>`
+4. If a fire was **missed entirely** (no run, no failure event) during an outage,
+   check `misfire_policy` — under `skip` missed fires are dropped by design. Use
+   `catchup_all` or `catchup_latest` to recover them. See *Run-miss recovery policies*.

 ### Event not routing
 1. Check NATS monitoring: http://localhost:8222/jsz to verify the `ACTIVITY_EVENTS` stream exists.
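Reviewer note on the State Hub write idempotency section added to this runbook: a minimal sketch of a deterministic key built from the identities the runbook names (`run_id:instruction_id:event_type` and `schedule_miss:activity_id:last_fired`). Whether activity-core sends the joined string as-is or hashes it is not stated in the runbook, so the plain join below is an assumption, and all function names are hypothetical.

```python
def idempotency_key(*parts: str) -> str:
    """Join identity parts into a key; identical inputs always give the same key."""
    if not parts or any(not part for part in parts):
        raise ValueError("every identity part must be non-empty")
    return ":".join(parts)


def progress_key(run_id: str, instruction_id: str, event_type: str) -> str:
    # Identity of a report-sink or ops-evidence progress write.
    return idempotency_key(run_id, instruction_id, event_type)


def schedule_miss_key(activity_id: str, last_fired: str) -> str:
    # Identity of a schedule-miss alert.
    return idempotency_key("schedule_miss", activity_id, last_fired)


def write_headers(key: str) -> dict[str, str]:
    # A replayed flush reuses the same key, so a receiver that honours the
    # header can drop the duplicate.
    return {"Idempotency-Key": key, "Content-Type": "application/json"}


if __name__ == "__main__":
    print(write_headers(progress_key("run-1", "instr-1", "daily_triage")))
```

Because the key depends only on the write's identity, a retry after an outage produces the same header without a prior dedup read.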
--- a/k8s/railiance/20-runtime.yaml
+++ b/k8s/railiance/20-runtime.yaml
@@ -14,8 +14,8 @@ data:
  LLM_CONNECT_URL: http://llm-connect.activity-core.svc.cluster.local:8080
  LLM_CONNECT_TIMEOUT_SECONDS: "300"
  REPO_SCOPING_URL: http://repo-scoping.repo-scoping.svc.cluster.local:8020
-  ISSUE_CORE_URL: http://issue-core.issue-core.svc.cluster.local:8010
-  ISSUE_SINK_TYPE: "null"
+  ISSUE_CORE_URL: http://actcore-issue-core-bridge.activity-core.svc.cluster.local:8765
+  ISSUE_SINK_TYPE: "rest"
  ACTIVITY_DEFINITION_DIRS: /etc/activity-core/external-definitions
  OPS_INVENTORY_PATH: /etc/activity-core/ops/service-inventory.yml
  INTER_HUB_URL: ""
@@ -47,7 +47,10 @@ data:
      type: cron
      cron_expression: "20 7 * * *"
      timezone: Europe/Berlin
-      misfire_policy: skip
+      # ACTIVITY-WP-0014: recover the most recent missed daily fire when the
+      # worker/Temporal was unavailable at trigger time, without accumulating a
+      # backlog after a multi-day outage.
+      misfire_policy: catchup_latest
    context_sources:
      - type: static
        bind_to: context.prompt_path
@@ -91,15 +94,19 @@ data:
      Score each recommendation with the WSJF rubric from the prompt:
      (strategic_value + time_criticality + risk_reduction +
      opportunity_enablement) / job_size. Use integer factor values from 1 to 5,
-      round score to one decimal place, sort recommendations by rank, and return at
-      most 10 recommendations.
+      round score to one decimal place, sort recommendations by rank, and return
+      only the bounded top-7 (at most 7) ranked recommendations. If uncertain,
+      emit fewer well-formed recommendations rather than more.

      Curated digest:
      {context.daily_triage_digest}

      Return only JSON matching
-      `/etc/activity-core/schemas/daily-triage-report.json`. Do not wrap the JSON
-      in Markdown fences or add prose before or after it:
+      `/etc/activity-core/schemas/daily-triage-report.json`. Emit the "summary"
+      field first, then inside the "recommendations" array write one complete
+      recommendation JSON object per line (NDJSON-style per-item framing) so
+      each item can be recovered independently if the output is truncated. Do
+      not wrap the JSON in Markdown fences or add prose before or after it:
      {
        "summary": "short operator-facing summary",
        "recommendations": [
@@ -164,6 +171,36 @@ data:

    Kubernetes projection of the Custodian-owned definition in
    `/home/worsch/the-custodian/activity-definitions/hourly-recently-on-scope.md`.
+  state-hub-consistency-sweep.md: |
+    ---
+    id: "7c4e9a12-8f3b-4d5e-9c6a-1b2d3e4f5a6b"
+    name: "State Hub Consistency Sweep"
+    type: activity-definition
+    version: "1.0"
+    enabled: true
+    owner: custodian
+    governance: custodian
+    status: active
+    created: "2026-06-21"
+    trigger:
+      type: cron
+      cron_expression: "*/15 * * * *"
+      timezone: UTC
+      misfire_policy: skip
+    context_sources:
+      - type: state-hub
+        query: consistency_sweep_remote_all
+        required: true
+        params:
+          max_seconds: 300
+          source: activity-core
+        bind_to: context.consistency_sweep_remote_all
+    ---
+
+    # ActivityDefinition: State Hub Consistency Sweep
+
+    Kubernetes projection of the Custodian-owned definition in
+    `/home/worsch/the-custodian/activity-definitions/state-hub-consistency-sweep.md`.
  ops-service-inventory-probes.md: |
    ---
    id: "40d15a87-7ff6-4d8e-992c-37df15f95110"
@@ -399,7 +436,7 @@ data:
        "recommendations": {
          "type": "array",
          "minItems": 1,
-          "maxItems": 10,
+          "maxItems": 7,
          "items": {
            "type": "object",
            "required": ["rank", "candidate", "action", "why", "confidence", "wsjf"],
@@ -408,7 +445,7 @@ data:
              "rank": {
                "type": "integer",
                "minimum": 1,
-                "maximum": 10
+                "maximum": 7
              },
              "candidate": {
                "type": "string"
@@ -578,7 +615,8 @@ spec:
                          method=self.command,
                      )
                      try:
-                          with urlopen(request, timeout=30) as response:
+                          timeout = 360 if self.command == "POST" else 30
+                          with urlopen(request, timeout=timeout) as response:
                              payload = response.read()
                              self.send_response(response.status)
                              for key, value in response.headers.items():
@@ -599,12 +637,124 @@ spec:
               ThreadingHTTPServer(("0.0.0.0", 18080), Proxy).serve_forever()
           readinessProbe:
             httpGet:
-              path: /state/summary
+              path: /state/health
               port: http
             initialDelaySeconds: 5
             periodSeconds: 10
             timeoutSeconds: 5
             failureThreshold: 6
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: actcore-issue-core-bridge
+  namespace: activity-core
+  labels:
+    app.kubernetes.io/name: actcore-issue-core-bridge
+    app.kubernetes.io/part-of: activity-core
+spec:
+  selector:
+    app.kubernetes.io/name: actcore-issue-core-bridge
+  ports:
+    - name: http
+      port: 8765
+      targetPort: http
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: actcore-issue-core-bridge
+  namespace: activity-core
+  labels:
+    app.kubernetes.io/name: actcore-issue-core-bridge
+    app.kubernetes.io/part-of: activity-core
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: actcore-issue-core-bridge
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: actcore-issue-core-bridge
+        app.kubernetes.io/part-of: activity-core
+    spec:
+      hostNetwork: true
+      dnsPolicy: ClusterFirstWithHostNet
+      containers:
+        - name: proxy
+          image: activity-core:railiance01-prod
+          imagePullPolicy: Never
+          ports:
+            - name: http
+              containerPort: 18081
+          command:
+            - python
+            - -c
+            - |
+              from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+              from urllib.error import HTTPError, URLError
+              from urllib.request import Request, urlopen
+
+              TARGET = "http://127.0.0.1:18765"
+              HOP_HEADERS = {"connection", "host", "keep-alive", "proxy-authenticate",
+                             "proxy-authorization", "te", "trailers",
+                             "transfer-encoding", "upgrade"}
+
+              class Proxy(BaseHTTPRequestHandler):
+                  def do_GET(self):
+                      self._proxy()
+
+                  def do_POST(self):
+                      self._proxy()
+
+                  def do_PATCH(self):
+                      self._proxy()
+
+                  def _proxy(self):
+                      length = int(self.headers.get("content-length", "0") or "0")
+                      body = self.rfile.read(length) if length else None
+                      headers = {
+                          key: value
+                          for key, value in self.headers.items()
+                          if key.lower() not in HOP_HEADERS
+                      }
+                      request = Request(
+                          TARGET + self.path,
+                          data=body,
+                          headers=headers,
+                          method=self.command,
+                      )
+                      try:
+                          timeout = 360 if self.command == "POST" else 30
+                          with urlopen(request, timeout=timeout) as response:
+                              payload = response.read()
+                              self.send_response(response.status)
+                              for key, value in response.headers.items():
+                                  if key.lower() not in HOP_HEADERS:
+                                      self.send_header(key, value)
+                              self.end_headers()
+                              self.wfile.write(payload)
+                      except HTTPError as exc:
+                          payload = exc.read()
+                          self.send_response(exc.code)
+                          self.end_headers()
+                          self.wfile.write(payload)
+                      except URLError as exc:
+                          self.send_response(502)
+                          self.end_headers()
+                          self.wfile.write(str(exc).encode())
+
+              ThreadingHTTPServer(("0.0.0.0", 18081), Proxy).serve_forever()
+          readinessProbe:
+            httpGet:
+              path: /healthz
+              port: http
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            timeoutSeconds: 5
+            failureThreshold: 6
+---
 ---
 apiVersion: batch/v1
 kind: Job
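Reviewer note on the `misfire_policy: catchup_latest` switch in this file: a minimal sketch of how the policy table in `docs/runbook.md` could be resolved into a catchup window and an overlap policy. The defaults, the legacy aliases, and the `catchup_window_seconds` override come from the runbook; `MisfireSettings`, `resolve_misfire`, and the fallback to `skip` when no policy is set are assumptions made for illustration, not the `schedule_manager` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MisfireSettings:
    catchup_window_seconds: int
    overlap: str


# Per-policy defaults as listed in the runbook table.
_DEFAULTS = {
    "skip": MisfireSettings(60, "SKIP"),
    "catchup_all": MisfireSettings(365 * 24 * 3600, "BUFFER_ALL"),
    "catchup_latest": MisfireSettings(24 * 3600, "BUFFER_ONE"),
}
# Legacy values that are still accepted.
_LEGACY = {"catchup": "catchup_all", "compress": "catchup_latest"}


def resolve_misfire(trigger_config: dict) -> MisfireSettings:
    # Assumption: an absent policy behaves like "skip".
    policy = trigger_config.get("misfire_policy", "skip")
    policy = _LEGACY.get(policy, policy)
    if policy not in _DEFAULTS:
        raise ValueError(f"unknown misfire_policy: {policy!r}")
    default = _DEFAULTS[policy]
    window = trigger_config.get("catchup_window_seconds")
    if window is None:
        return default
    # An explicit window overrides only the window, never the overlap policy.
    return MisfireSettings(int(window), default.overlap)


if __name__ == "__main__":
    print(resolve_misfire({"misfire_policy": "catchup_latest"}))
```

For the daily triage definition this gives a 24h window with `BUFFER_ONE`, so one missed fire is recovered and a multi-day outage does not queue a backlog.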
--- a/schemas/daily-triage-report.json
+++ b/schemas/daily-triage-report.json
@@ -1,4 +1,5 @@
 {
+  "$comment": "ACTIVITY-WP-0016-T02. Strict, bounded contract for the daily WSJF triage report. The per-item 'recommendations' schema is intentionally strict on STRUCTURE (types + required keys) so the T03 boundary parser can validate each recommendation independently and quarantine only the malformed ones. 'maxItems' is a producer hint (honoured by llm-connect constrained decoding and by the prompt); it is deliberately NOT hard-enforced by the in-repo validator, because rejecting a whole report for having too many items would reproduce the monolithic-failure bug WP-0016 exists to remove. Over-count is mitigated in T03 (keep top-N by rank, quarantine the rest). Value-domain vocabularies (action/confidence) are documented in the prompt and enforced by T04 guardrails with mitigation, not as brittle hard-fail enums here.",
  "type": "object",
  "required": ["summary", "recommendations"],
  "properties": {
@@ -7,8 +8,28 @@
    },
    "recommendations": {
      "type": "array",
+      "maxItems": 7,
      "items": {
-        "type": "object"
+        "type": "object",
+        "required": ["rank", "candidate", "action", "why"],
+        "properties": {
+          "rank": { "type": "integer" },
+          "candidate": { "type": "string" },
+          "action": { "type": "string" },
+          "why": { "type": "string" },
+          "confidence": { "type": "string" },
+          "wsjf": {
+            "type": "object",
+            "properties": {
+              "score": { "type": "number" },
+              "strategic_value": { "type": "number" },
+              "time_criticality": { "type": "number" },
+              "risk_reduction": { "type": "number" },
+              "opportunity_enablement": { "type": "number" },
+              "job_size": { "type": "number" }
+            }
+          }
+        }
      }
    }
  }
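Reviewer note on the schema `$comment` above: a minimal sketch of the mitigation it describes, where each recommendation is validated on its own, malformed items are quarantined, and an over-count keeps only the top N by rank. This is an illustration under stated assumptions, not the T03 boundary parser: `item_error`, `split_recommendations`, and the quarantine record shape are hypothetical, and the type check is a simplified stand-in for full JSON Schema validation.

```python
from __future__ import annotations

# Required keys and types from the per-item schema in this diff.
REQUIRED = {"rank": int, "candidate": str, "action": str, "why": str}
MAX_ITEMS = 7  # mirrors "maxItems": 7


def item_error(item: object) -> str | None:
    """Return a reason string if the item is malformed, else None."""
    if not isinstance(item, dict):
        return "not an object"
    for key, expected in REQUIRED.items():
        if key not in item:
            return f"missing {key}"
        value = item[key]
        # bool is a subclass of int in Python but is not a JSON integer.
        if isinstance(value, bool) or not isinstance(value, expected):
            return f"{key} has wrong type"
    return None


def split_recommendations(items: list, max_items: int = MAX_ITEMS):
    """Keep well-formed items (top max_items by rank); quarantine the rest."""
    valid, quarantined = [], []
    for item in items:
        reason = item_error(item)
        if reason is None:
            valid.append(item)
        else:
            quarantined.append({"item": item, "reason": reason})
    valid.sort(key=lambda rec: rec["rank"])
    for item in valid[max_items:]:
        quarantined.append({"item": item, "reason": "over_count"})
    return valid[:max_items], quarantined


if __name__ == "__main__":
    kept, bad = split_recommendations(
        [{"rank": 1, "candidate": "a", "action": "do", "why": "x"}, {"rank": 2}]
    )
    print(len(kept), len(bad))
```

One malformed or surplus item then costs only that item, which is the blast-radius bound the ADR argues for.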
--- a/scripts/automation_inventory.py
+++ b/scripts/automation_inventory.py
@@ -0,0 +1,8 @@
+#!/usr/bin/env python3
+"""CLI wrapper for the repo-native automation inventory report."""
+
+from activity_core.automation_status import inventory_main
+
+
+if __name__ == "__main__":
+    raise SystemExit(inventory_main())
--- a/scripts/automation_status.py
+++ b/scripts/automation_status.py
@@ -0,0 +1,8 @@
+#!/usr/bin/env python3
+"""CLI wrapper for the repo-native automation status report."""
+
+from activity_core.automation_status import main
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/scripts/smoke_admin_sync_no_restart.py
+++ b/scripts/smoke_admin_sync_no_restart.py
@@ -0,0 +1,212 @@
+#!/usr/bin/env python3
+"""Railiance01 no-restart smoke for POST /admin/sync.
+
+Patches the disabled ops-service-inventory-probes projection in the cluster
+ConfigMap, waits for the API pod volume to refresh, runs /admin/sync twice,
+verifies DB + Temporal schedule drift without restarting actcore-worker, then
+rolls the ConfigMap back to the disabled baseline.
+
+Requires:
+  - KUBECONFIG pointing at railiance01 (for example ~/.kube/config-hosteurope)
+  - kubectl access to the activity-core namespace
+
+Example:
+  export KUBECONFIG=~/.kube/config-hosteurope
+  python3 scripts/smoke_admin_sync_no_restart.py
+"""
+
+from __future__ import annotations
+
+import json
+import subprocess
+import sys
+import time
+
+ACTIVITY_ID = "40d15a87-7ff6-4d8e-992c-37df15f95110"
+CONFIGMAP = "actcore-external-activity-definitions"
+DEFINITION_KEY = "ops-service-inventory-probes.md"
+MOUNTED_FILE = (
+    "/etc/activity-core/external-definitions/activity-definitions/"
+    f"{DEFINITION_KEY}"
+)
+VOLUME_PROPAGATION_SECONDS = 65
+
+
+def kubectl(*args: str, input_text: str | None = None) -> str:
+    cmd = ["kubectl", "-n", "activity-core", *args]
+    return subprocess.check_output(
+        cmd,
+        input=input_text,
+        text=True,
+    )
+
+
+def api_json(path: str, *, method: str = "GET") -> dict:
+    script = (
+        "import urllib.request, json\n"
+        f'req = urllib.request.Request("http://localhost:8010{path}", method="{method}")\n'
+        "print(urllib.request.urlopen(req).read().decode())"
+    )
+    return json.loads(kubectl("exec", "deploy/actcore-api", "--", "python3", "-c", script))
+
+
+def worker_lines(script: str) -> list[str]:
+    return kubectl("exec", "deploy/actcore-worker", "--", "python3", "-c", script).splitlines()
+
+
+def worker_uid() -> str:
+    return kubectl(
+        "get",
+        "pod",
+        "-l",
+        "app.kubernetes.io/name=actcore-worker",
+        "-o",
+        "jsonpath={.items[0].metadata.uid}",
+    ).strip()
+
+
+def load_configmap() -> dict:
+    return json.loads(kubectl("get", "configmap", CONFIGMAP, "-o", "json"))
+
+
+def apply_configmap(cm: dict) -> None:
+    kubectl("apply", "-f", "-", input_text=json.dumps(cm))
+
+
+def patch_definition(cm: dict, *, enabled: bool, cron: str) -> None:
+    text = cm["data"][DEFINITION_KEY]
+    for line in text.splitlines():
+        if line.strip().startswith("enabled:"):
+            break
+    else:
+        raise RuntimeError("enabled field not found in projection")
+
+    text = _replace_once(text, 'enabled: false', f"enabled: {'true' if enabled else 'false'}")
+    text = _replace_once(text, 'enabled: true', f"enabled: {'true' if enabled else 'false'}")
+    text = _replace_once(
+        text,
+        'cron_expression: "15 * * * *"',
+        f'cron_expression: "{cron}"',
+    )
+    text = _replace_once(
+        text,
+        'cron_expression: "25 * * * *"',
+        f'cron_expression: "{cron}"',
+    )
+    cm["data"][DEFINITION_KEY] = text
+    apply_configmap(cm)
+
+
+def _replace_once(text: str, old: str, new: str) -> str:
+    if old not in text:
+        return text
+    return text.replace(old, new, 1)
+
+
+def wait_for_mount(*, enabled: bool, cron: str) -> None:
+    deadline = time.time() + VOLUME_PROPAGATION_SECONDS
+    want_enabled = "enabled: true" if enabled else "enabled: false"
+    want_cron = f'cron_expression: "{cron}"'
+    while time.time() < deadline:
+        content = kubectl("exec", "deploy/actcore-api", "--", "cat", MOUNTED_FILE)
+        if want_enabled in content and want_cron in content:
+            return
+        time.sleep(5)
+    raise RuntimeError(
+        f"ConfigMap projection did not refresh within {VOLUME_PROPAGATION_SECONDS}s"
+    )
+
+
+def get_definition() -> dict[str, object]:
+    for item in api_json("/activity-definitions/"):
+        if item["id"] == ACTIVITY_ID:
+            return {
+                "enabled": item["enabled"],
+                "cron": item["trigger_config"]["cron_expression"],
+            }
+    raise RuntimeError(f"ActivityDefinition {ACTIVITY_ID} not found")
+
+
+def describe_schedule() -> dict[str, object]:
+    script = f"""
+import asyncio
+from temporalio.client import Client
+
+async def main() -> None:
+    client = await Client.connect("actcore-temporal:7233")
+    handle = client.get_schedule_handle("activity-schedule-{ACTIVITY_ID}")
+    described = await handle.describe()
+    schedule = described.schedule
+    minute = schedule.spec.calendars[0].minute[0].start if schedule.spec.calendars else None
+    print(schedule.state.paused)
+    print(minute)
+
+asyncio.run(main())
+"""
+    paused, minute = worker_lines(script)
+    return {"paused": paused == "True", "minute": int(minute)}
+
+
+def main() -> int:
+    worker_before = worker_uid()
+    cm = load_configmap()
+
+    print("1) enable + cadence change via ConfigMap")
+    patch_definition(cm, enabled=True, cron="25 * * * *")
+    wait_for_mount(enabled=True, cron="25 * * * *")
+
+    print("2) POST /admin/sync (first pass)")
+    sync1 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
+    if not sync1.get("ok"):
+        print(json.dumps(sync1, indent=2), file=sys.stderr)
+        return 1
+
+    defn = get_definition()
+    schedule = describe_schedule()
+    print("   definition:", defn)
+    print("   schedule:", schedule)
+    if defn != {"enabled": True, "cron": "25 * * * *"}:
+        print("definition drift after sync", file=sys.stderr)
+        return 1
+    if schedule["paused"] or schedule["minute"] != 25:
+        print("schedule drift after enable sync", file=sys.stderr)
+        return 1
+
+    print("3) POST /admin/sync (idempotent repeat)")
+    sync2 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
+    if sync2.get("schedules") != sync1.get("schedules"):
+        print("idempotent schedule counts changed", file=sys.stderr)
+        print(json.dumps({"sync1": sync1, "sync2": sync2}, indent=2), file=sys.stderr)
+        return 1
+
+    print("4) rollback ConfigMap + sync")
+    cm = load_configmap()
+    patch_definition(cm, enabled=False, cron="15 * * * *")
+    wait_for_mount(enabled=False, cron="15 * * * *")
+    sync3 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
+    if not sync3.get("ok"):
+        print(json.dumps(sync3, indent=2), file=sys.stderr)
+        return 1
+
+    defn = get_definition()
+    schedule = describe_schedule()
+    print("   definition:", defn)
+    print("   schedule:", schedule)
+    if defn != {"enabled": False, "cron": "15 * * * *"}:
+        print("rollback definition drift", file=sys.stderr)
+        return 1
+    if not schedule["paused"] or schedule["minute"] != 15:
+        print("rollback schedule drift", file=sys.stderr)
+        return 1
+
+    worker_after = worker_uid()
+    if worker_before != worker_after:
+        print("actcore-worker pod restarted during smoke", file=sys.stderr)
+        return 1
+
+    print("smoke passed: admin sync hot-reload without worker restart")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
--- a/src/activity_core/activities.py
+++ b/src/activity_core/activities.py
@@ -149,6 +149,8 @@ async def resolve_context(
        query = source.get("query", "")
        params = source.get("params") or {}
        required = bool(source.get("required") or params.get("required", False))
+        resolver_params = dict(params)
+        resolver_params["required"] = required
        raw_bind = source.get("bind_to") or source.get("name") or source_type
        # Strip the 'context.' namespace prefix so evaluator can find the key.
        bind_key = raw_bind.removeprefix("context.") if raw_bind.startswith("context.") else raw_bind
@@ -172,7 +174,7 @@ async def resolve_context(
            continue

        try:
-            resolved = resolver_cls().resolve(query, event_envelope, params)
+            resolved = resolver_cls().resolve(query, event_envelope, resolver_params)
            snapshot[bind_key] = _bind_resolver_result(bind_key, resolved)
        except Exception as exc:
            if required:
@@ -364,6 +366,7 @@ async def evaluate_instructions(payload: dict) -> dict:
                "output_validated": result.output_validated,
                "review_required": result.review_required,
                "validation_error": result.validation_error,
+                "llm_response_metadata": result.llm_response_metadata,
            })
        for spec in result.tasks:
            task_specs.append({
--- a/src/activity_core/api.py
+++ b/src/activity_core/api.py
@@ -40,6 +40,7 @@ from temporalio.client import Client
 from activity_core.models import ActivityDefinition, CronTriggerConfig
 from activity_core.orm import ActivityDefinition as ActivityDefinitionRow, EventType as EventTypeRow
 from activity_core.schedule_manager import delete_schedule, upsert_schedule
+from activity_core.sync_service import run_sync
 from activity_core.webhook_receiver import router as webhook_router

 TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
@@ -275,6 +276,24 @@ async def trigger_definition(definition_id: uuid.UUID) -> dict[str, str]:
    return {"workflow_id": handle.id, "trigger_key": trigger_key}


+# --- Admin sync ---------------------------------------------------------------
+
+@app.post("/admin/sync")
+async def admin_sync(
+    definitions: bool = True,
+    schedules: bool = True,
+    event_types: bool = False,
+) -> dict[str, Any]:
+    """Run operator-triggered definition/event/schedule sync without restart."""
+    return await run_sync(
+        session_factory=_get_db(),
+        temporal_client=_get_temporal() if schedules else None,
+        definitions=definitions,
+        schedules=schedules,
+        event_types=event_types,
+    )
+
+
 # T42: Curator gate — event type approval endpoint

 @app.get("/health")
--- a/src/activity_core/automation_status.py
+++ b/src/activity_core/automation_status.py
--- a/src/activity_core/context_resolvers/__init__.py
+++ b/src/activity_core/context_resolvers/__init__.py
@@ -4,4 +4,5 @@ from activity_core.context_resolvers import (  # noqa: F401
    ops_inventory,
    repo_scoping,
    state_hub,
+    reuse_surface,
 )
--- a/src/activity_core/context_resolvers/reuse_surface.py
+++ b/src/activity_core/context_resolvers/reuse_surface.py
@@ -0,0 +1,516 @@
+"""Reuse-surface registry hygiene context adapter.
+
+Registered as source type ``reuse-surface`` and as the ``shell`` resolver
+dispatcher for the ``reuse_surface_report_gaps`` query.  Other shell queries
+continue to delegate to the kaizen resolver for backward compatibility.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import os
+import socket
+import subprocess
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any
+
+import httpx
+import yaml
+
+from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, ContextResolver
+from activity_core.context_resolvers.kaizen import KaizenContextResolver
+from activity_core.context_resolvers.state_hub import StateHubContextResolver
+
+logger = logging.getLogger(__name__)
+
+_DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
+_REPORT_TIMEOUT_SECONDS = 60
+_STATE_HUB_TIMEOUT_SECONDS = 10.0
+_KNOWN_SIGNALS = frozenset(
+    {
+        "registry_gap",
+        "empty_capability_scaffold",
+        "stale_scope",
+        "stale_sbom",
+        "publish_check_fail",
+    }
+)
+
+
+@dataclass(frozen=True)
+class RosterEntry:
+    slug: str
+    domain: str | None = None
+    publish_check: str | None = None
+
+
+def _base_url() -> str:
+    return os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL).rstrip("/")
+
+
+def _runner_host(params: dict[str, Any]) -> str:
+    return str(
+        params.get("runner_host")
+        or os.environ.get("KAIZEN_RUNNER_HOST")
+        or socket.gethostname()
+    )
+
+
+def _as_required(params: dict[str, Any]) -> bool:
+    return bool(params.get("required", False))
+
+
+def reuse_surface_report_gaps(params: dict[str, Any]) -> dict[str, Any]:
+    """Resolve registry-hygiene gaps for the next rollout batch.
+
+    Missing operational dependencies are visible failures for required sources
+    and graceful empty lists for optional sources so definitions can opt into
+    either behavior without changing rule logic.
+    """
+    try:
+        return _resolve_reuse_surface_report_gaps(params)
+    except Exception as exc:
+        if _as_required(params):
+            raise
+        logger.warning("reuse_surface_report_gaps unavailable: %s", exc)
+        return {"gaps": []}
+
+
+def _resolve_reuse_surface_report_gaps(params: dict[str, Any]) -> dict[str, Any]:
+    roster_path = _roster_path(params)
+    entries = _load_active_roster_entries(roster_path)
+    if not entries:
+        return {"gaps": []}
+
+    state_path = _round_robin_state_path(params, roster_path)
+    selected, next_cursor = _select_round_robin_batch(
+        entries,
+        _batch_size(params),
+        state_path,
+    )
+    if not selected:
+        return {"gaps": []}
+
+    signals = _enabled_signals(_signals_path(params, roster_path))
+    roots = _resolve_repo_roots(selected, _runner_host(params))
+    report = _reuse_surface_report(params, signals)
+    gaps = _gap_records(selected, roots, signals, report)
+
+    _write_round_robin_state(state_path, next_cursor, selected)
+    return {"gaps": gaps}
+
+
+def _roster_path(params: dict[str, Any]) -> Path:
+    raw = params.get("roster")
+    if not raw:
+        raise ValueError("reuse_surface_report_gaps requires params.roster")
+    path = Path(str(raw)).expanduser()
+    if not path.is_file():
+        raise FileNotFoundError(f"reuse_surface_report_gaps roster not found: {path}")
+    return path
+
+
+def _batch_size(params: dict[str, Any]) -> int:
+    try:
+        return max(1, int(params.get("batch_size", 3)))
+    except (TypeError, ValueError):
+        return 3
+
+
+def _round_robin_state_path(params: dict[str, Any], roster_path: Path) -> Path:
+    raw = params.get("round_robin_state")
+    if raw:
+        return Path(str(raw)).expanduser()
+    return roster_path.with_name("round-robin-state.json")
+
+
+def _signals_path(params: dict[str, Any], roster_path: Path) -> Path:
+    raw = params.get("signals")
+    if raw:
+        return Path(str(raw)).expanduser()
+    return roster_path.with_name("signals.yml")
+
+
+def _load_active_roster_entries(path: Path) -> list[RosterEntry]:
+    data = yaml.safe_load(path.read_text(encoding="utf-8"))
+    if not isinstance(data, dict):
+        raise ValueError(f"reuse_surface rollout roster is not a mapping: {path}")
+
+    entries: dict[str, RosterEntry] = {}
+    for domain, block in _iter_domain_blocks(data):
+        if _domain_phase(block) != "active":
+            continue
+        for item in _repo_items(block):
+            entry = _entry_from_item(item, domain, block)
+            if entry and entry.slug not in entries:
+                entries[entry.slug] = entry
+    return list(entries.values())
+
+
+def _iter_domain_blocks(data: dict[str, Any]) -> list[tuple[str | None, dict[str, Any]]]:
+    domains = data.get("domains")
+    if isinstance(domains, dict):
+        return [
+            (str(name), block)
+            for name, block in domains.items()
+            if isinstance(block, dict)
+        ]
+    if isinstance(domains, list):
+        return [
+            (str(block.get("name") or block.get("domain") or ""), block)
+            for block in domains
+            if isinstance(block, dict)
+        ]
+    if isinstance(data.get("active"), list):
+        return [(None, {"phase": "active", "repos": data["active"]})]
+    return [
+        (str(name), block)
+        for name, block in data.items()
+        if isinstance(block, dict) and ("phase" in block or "repos" in block)
+    ]
+
+
+def _domain_phase(block: dict[str, Any]) -> str:
+    return str(block.get("phase") or block.get("status") or "").lower()
+
+
+def _repo_items(block: dict[str, Any]) -> list[Any]:
+    repos = (
+        block.get("repos")
+        or block.get("repo_slugs")
+        or block.get("repositories")
+        or block.get("slugs")
+        or []
+    )
+    if isinstance(repos, dict):
+        items: list[Any] = []
+        for slug, config in repos.items():
+            if isinstance(config, dict):
+                item = dict(config)
+                item.setdefault("slug", slug)
+                items.append(item)
+            else:
+                items.append(str(slug))
+        return items
+    if isinstance(repos, list):
+        return repos
+    return []
+
+
+def _entry_from_item(
+    item: Any,
+    domain: str | None,
+    block: dict[str, Any],
+) -> RosterEntry | None:
+    publish_check = block.get("publish_check")
+    if isinstance(item, str):
+        slug = item
+    elif isinstance(item, dict):
+        slug = item.get("slug") or item.get("repo") or item.get("name")
+        publish_check = item.get("publish_check", publish_check)
+    else:
+        return None
+    if not slug:
+        return None
+    return RosterEntry(
+        slug=str(slug),
+        domain=domain or None,
+        publish_check=str(publish_check).lower() if publish_check is not None else None,
+    )
+
+
+def _select_round_robin_batch(
+    entries: list[RosterEntry],
+    batch_size: int,
+    state_path: Path,
+) -> tuple[list[RosterEntry], int]:
+    if not entries:
+        return [], 0
+    cursor = _read_round_robin_cursor(state_path) % len(entries)
+    size = min(batch_size, len(entries))
+    selected = [entries[(cursor + offset) % len(entries)] for offset in range(size)]
+    next_cursor = (cursor + size) % len(entries)
+    return selected, next_cursor
+
+
+def _read_round_robin_cursor(path: Path) -> int:
+    if not path.is_file():
+        return 0
+    try:
+        data = json.loads(path.read_text(encoding="utf-8"))
+    except (OSError, json.JSONDecodeError):
+        return 0
+    if not isinstance(data, dict):
+        return 0
+    try:
+        return int(data.get("cursor", 0))
+    except (TypeError, ValueError):
+        return 0
+
+
+def _write_round_robin_state(
+    path: Path,
+    cursor: int,
+    selected: list[RosterEntry],
+) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    payload = {
+        "cursor": cursor,
+        "last_batch": [entry.slug for entry in selected],
+        "updated_at": datetime.now(timezone.utc).isoformat(),
+    }
+    path.write_text(
+        json.dumps(payload, indent=2, sort_keys=True) + "\n",
+        encoding="utf-8",
+    )
+
+
+def _enabled_signals(path: Path) -> set[str]:
+    if not path.is_file():
+        return set(_KNOWN_SIGNALS)
+    data = yaml.safe_load(path.read_text(encoding="utf-8"))
+    node = data.get("signals") if isinstance(data, dict) else data
+    enabled: set[str] = set()
+    saw_known_signal = False
+
+    if isinstance(node, dict):
+        for name, config in node.items():
+            if str(name) not in _KNOWN_SIGNALS:
+                continue
+            saw_known_signal = True
+            if isinstance(config, dict) and config.get("enabled") is False:
+                continue
+            if config is False:
+                continue
+            enabled.add(str(name))
+    elif isinstance(node, list):
+        for item in node:
+            if isinstance(item, str) and item in _KNOWN_SIGNALS:
+                saw_known_signal = True
+                enabled.add(item)
+            elif isinstance(item, dict):
+                name = item.get("id") or item.get("signal") or item.get("name")
+                if str(name) in _KNOWN_SIGNALS and item.get("enabled", True) is not False:
+                    saw_known_signal = True
+                    enabled.add(str(name))
+
+    return enabled if saw_known_signal else set(_KNOWN_SIGNALS)
+
+
+def _resolve_repo_roots(
+    entries: list[RosterEntry],
+    runner_host: str,
+) -> dict[str, Path]:
+    requested = {entry.slug for entry in entries}
+    roots: dict[str, Path] = {}
+    for repo in _fetch_repos():
+        slug = str(repo.get("slug") or "")
+        if slug not in requested:
+            continue
+        raw = _repo_path_for_host(repo, runner_host)
+        if raw:
+            roots[slug] = Path(raw)
+    return roots
+
+
+def _fetch_repos() -> list[dict[str, Any]]:
+    url = f"{_base_url()}/repos/"
+    try:
+        resp = httpx.get(url, timeout=_STATE_HUB_TIMEOUT_SECONDS)
+        resp.raise_for_status()
+    except httpx.HTTPError as exc:
+        raise RuntimeError(f"State Hub unreachable at {url}: {exc}") from exc
+    payload = resp.json()
+    if not isinstance(payload, list):
+        raise RuntimeError(f"State Hub /repos/ returned non-list: {type(payload)!r}")
+    return [repo for repo in payload if isinstance(repo, dict)]
+
+
+def _repo_path_for_host(repo: dict[str, Any], runner_host: str) -> str | None:
+    host_paths = repo.get("host_paths") or {}
+    raw = None
+    if isinstance(host_paths, dict):
+        raw = host_paths.get(runner_host)
+    raw = raw or repo.get("local_path")
+    if not raw or raw == "(unknown)":
+        return None
+    return str(raw)
+
+
+def _reuse_surface_report(params: dict[str, Any], signals: set[str]) -> dict[str, Any]:
+    if not (signals & {"registry_gap", "empty_capability_scaffold"}):
+        return {}
+    binary = str(params.get("reuse_surface_bin") or "reuse-surface")
+    try:
+        completed = subprocess.run(
+            [binary, "report", "gaps", "--format", "json"],
+            capture_output=True,
+            check=False,
+            text=True,
+            timeout=_REPORT_TIMEOUT_SECONDS,
+        )
+    except FileNotFoundError as exc:
+        raise RuntimeError(f"reuse-surface CLI not found: {binary}") from exc
+    except subprocess.TimeoutExpired as exc:
+        raise RuntimeError("reuse-surface report gaps timed out") from exc
+
+    if completed.returncode != 0:
+        detail = completed.stderr.strip() or completed.stdout.strip()
+        raise RuntimeError(f"reuse-surface report gaps failed: {detail}")
+    try:
+        payload = json.loads(completed.stdout or "{}")
+    except json.JSONDecodeError as exc:
+        raise RuntimeError("reuse-surface report gaps returned invalid JSON") from exc
+    if not isinstance(payload, dict):
+        raise RuntimeError("reuse-surface report gaps returned non-object JSON")
+    return payload
+
+
+def _gap_records(
+    entries: list[RosterEntry],
+    roots: dict[str, Path],
+    signals: set[str],
+    report: dict[str, Any],
+) -> list[dict[str, Any]]:
+    empty_scaffolds = _repo_set(report, {"empty_scaffolds", "empty_scaffold"})
+    publish_fail = _repo_set(
+        report,
+        {"publish_fail", "publish_fails", "publish_failures"},
+    )
+    gaps: list[dict[str, Any]] = []
+    seen: set[tuple[str, str]] = set()
+
+    for entry in entries:
+        root = roots.get(entry.slug)
+        if root is None:
+            logger.info("reuse_surface repo_unreachable slug=%s", entry.slug)
+            continue
+
+        if (
+            signals & {"registry_gap", "empty_capability_scaffold"}
+            and entry.slug in empty_scaffolds
+        ):
+            _append_gap(gaps, seen, entry.slug, root, "empty_capability_scaffold")
+
+        if "registry_gap" in signals and entry.slug in publish_fail:
+            _append_gap(gaps, seen, entry.slug, root, "registry_gap")
+
+        if "publish_check_fail" in signals and entry.publish_check == "fail":
+            _append_gap(gaps, seen, entry.slug, root, "publish_check_fail")
+
+        if "stale_scope" in signals and _scope_is_stale(root):
+            _append_gap(gaps, seen, entry.slug, root, "stale_scope")
+
+        if "stale_sbom" in signals and _sbom_is_stale(entry.slug):
+            _append_gap(gaps, seen, entry.slug, root, "stale_sbom")
+
+    return gaps
+
+
+def _append_gap(
+    gaps: list[dict[str, Any]],
+    seen: set[tuple[str, str]],
+    slug: str,
+    root: Path,
+    signal: str,
+) -> None:
+    key = (slug, signal)
+    if key in seen:
+        return
+    seen.add(key)
+    gaps.append(
+        {
+            "repo": slug,
+            "root": str(root),
+            "signal": signal,
+            "hygiene_signal": signal,
+        }
+    )
+
+
+def _scope_is_stale(root: Path) -> bool:
+    scope = root / "SCOPE.md"
+    if not scope.is_file():
+        return True
+    age_seconds = datetime.now(timezone.utc).timestamp() - scope.stat().st_mtime
+    return age_seconds > 90 * 24 * 60 * 60
+
+
+def _sbom_is_stale(slug: str) -> bool:
+    payload = StateHubContextResolver().resolve(
+        "repo_sbom_status",
+        None,
+        {"repo_slug": slug},
+    )
+    if not isinstance(payload, dict):
+        return False
+    try:
+        return int(payload.get("sbom_age_days", 0)) > 30
+    except (TypeError, ValueError):
+        return False
+
+
+def _repo_set(report: dict[str, Any], keys: set[str]) -> set[str]:
+    slugs: set[str] = set()
+    for value in _values_for_keys(report, keys):
+        slugs.update(_slugs_from_value(value))
+    return slugs
+
+
+def _values_for_keys(value: Any, keys: set[str]) -> list[Any]:
+    values: list[Any] = []
+    if isinstance(value, dict):
+        for key, nested in value.items():
+            if key in keys:
+                values.append(nested)
+            values.extend(_values_for_keys(nested, keys))
+    elif isinstance(value, list):
+        for item in value:
+            values.extend(_values_for_keys(item, keys))
+    return values
+
+
+def _slugs_from_value(value: Any) -> set[str]:
+    if isinstance(value, str):
+        return {value}
+    if isinstance(value, list):
+        slugs: set[str] = set()
+        for item in value:
+            slugs.update(_slugs_from_value(item))
+        return slugs
+    if isinstance(value, dict):
+        for key in ("repo", "repo_slug", "slug", "name"):
+            if value.get(key):
+                return {str(value[key])}
+        slugs = set()
+        for key, nested in value.items():
+            if nested is True or isinstance(nested, (dict, list)):
+                slugs.add(str(key))
+            slugs.update(_slugs_from_value(nested))
+        return slugs
+    return set()
+
+
+class ReuseSurfaceContextResolver(ContextResolver):
+    """Resolves reuse-surface registry hygiene gap reports."""
+
+    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
+        if query == "reuse_surface_report_gaps":
+            return reuse_surface_report_gaps(params)
+        return {}
+
+
+class ShellContextResolver(ContextResolver):
+    """Dispatch shell-backed context queries without breaking kaizen aliases."""
+
+    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
+        if query == "reuse_surface_report_gaps":
+            return reuse_surface_report_gaps(params)
+        return KaizenContextResolver().resolve(query, event, params)
+
+
+CONTEXT_RESOLVER_REGISTRY["reuse-surface"] = ReuseSurfaceContextResolver
+CONTEXT_RESOLVER_REGISTRY["shell"] = ShellContextResolver
--- a/src/activity_core/context_resolvers/state_hub.py
+++ b/src/activity_core/context_resolvers/state_hub.py
@@ -12,6 +12,7 @@ Supported queries:
  - coding_retro:     latest /progress/ item with event_type=coding_retro
  - daily_triage_digest: curated scalar JSON digest for daily WSJF triage
  - recently_on_scope_hourly: POST {STATE_HUB_URL}/recently-on-scope/hourly
+  - consistency_sweep_remote_all: POST {STATE_HUB_URL}/consistency/sweep/remote-all

 No caching — state hub data is live operational state and must not be stale
 within a single workflow run.
@@ -31,6 +32,7 @@ from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, Cont

 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _TIMEOUT_SECONDS = 10.0
+_SWEEP_TIMEOUT_SECONDS = 330.0
 _OPEN_WORKSTREAM_STATUSES = {"active", "ready", "blocked"}
 _OPEN_TASK_STATUSES = {"wait", "todo", "progress"}
 # Sentinel age for repos that have never had an SBOM ingested. Large enough
@@ -53,13 +55,26 @@ def _fetch_json(path: str, params: dict[str, Any] | None = None) -> Any:
        return {}


-def _post_json(path: str, payload: dict[str, Any]) -> Any:
+def _post_json(path: str, payload: dict[str, Any], *, timeout: float = _TIMEOUT_SECONDS) -> Any:
    url = f"{_base_url()}{path}"
-    resp = httpx.post(url, json=payload, timeout=_TIMEOUT_SECONDS)
+    resp = httpx.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()


+def _validate_consistency_sweep_remote_all(result: Any) -> dict[str, Any]:
+    if not isinstance(result, dict):
+        raise RuntimeError("consistency_sweep_remote_all returned a non-object response")
+    required_keys = {"exit_code", "lock_skipped", "repos_processed"}
+    missing = required_keys - set(result)
+    if missing:
+        missing_list = ", ".join(sorted(missing))
+        raise RuntimeError(
+            f"consistency_sweep_remote_all response missing required key(s): {missing_list}"
+        )
+    return result
+
+
 def _validate_recently_on_scope_hourly(result: Any) -> dict[str, Any]:
    if not isinstance(result, dict):
        raise RuntimeError("recently_on_scope_hourly returned a non-object response")
@@ -107,6 +122,18 @@ class StateHubContextResolver(ContextResolver):
            }
            result = _post_json("/recently-on-scope/hourly", payload)
            return _validate_recently_on_scope_hourly(result)
+        if query == "consistency_sweep_remote_all":
+            payload = {
+                key: value
+                for key, value in params.items()
+                if key not in {"required"}
+            }
+            result = _post_json(
+                "/consistency/sweep/remote-all",
+                payload,
+                timeout=_SWEEP_TIMEOUT_SECONDS,
+            )
+            return _validate_consistency_sweep_remote_all(result)
        return {}


--- a/src/activity_core/issue_sink.py
+++ b/src/activity_core/issue_sink.py
@@ -20,7 +20,8 @@ from activity_core.rules.models import TaskRef, TaskSpec

 logger = logging.getLogger(__name__)

-ISSUE_CORE_URL = os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8010")
+ISSUE_CORE_URL = os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8765")
+ISSUE_CORE_API_KEY_ENV = "ISSUE_CORE_API_KEY"
 ISSUE_SINK_TYPE = os.environ.get("ISSUE_SINK_TYPE", "rest")


@@ -30,10 +31,30 @@ class IssueSink(ABC):


 class IssueCoreRestSink(IssueSink):
-    """POSTs to issue-core REST API. Config: ISSUE_CORE_URL env var."""
+    """POSTs to issue-core REST API.

-    def __init__(self, base_url: str = ISSUE_CORE_URL) -> None:
+    Config: ISSUE_CORE_URL and ISSUE_CORE_API_KEY env vars (shared key with
+    the issue-core server).
+    """
+
+    def __init__(
+        self,
+        base_url: str = ISSUE_CORE_URL,
+        api_key: str | None = None,
+    ) -> None:
        self._base_url = base_url.rstrip("/")
+        if api_key is not None:
+            self._api_key = api_key.strip()
+        else:
+            self._api_key = os.environ.get(ISSUE_CORE_API_KEY_ENV, "").strip()
+
+    def _auth_headers(self) -> dict[str, str]:
+        if not self._api_key:
+            raise RuntimeError(
+                f"{ISSUE_CORE_API_KEY_ENV} is not set. "
+                "Required when ISSUE_SINK_TYPE=rest."
+            )
+        return {"Authorization": f"Bearer {self._api_key}"}

    def emit(self, task_spec: TaskSpec) -> TaskRef:
        payload = {
@@ -45,10 +66,19 @@ class IssueCoreRestSink(IssueSink):
            "due_in_days": task_spec.due_in_days,
            "source_type": task_spec.source_type,
            "source_id": task_spec.source_id,
-            "triggering_event_id": task_spec.triggering_event_id,
+            "triggering_event_id": (
+                str(task_spec.triggering_event_id)
+                if task_spec.triggering_event_id is not None
+                else None
+            ),
            "activity_definition_id": task_spec.activity_definition_id,
        }
-        resp = httpx.post(f"{self._base_url}/issues/", json=payload, timeout=10.0)
+        resp = httpx.post(
+            f"{self._base_url}/issues/",
+            json=payload,
+            headers=self._auth_headers(),
+            timeout=10.0,
+        )
        resp.raise_for_status()
        data = resp.json()
        return TaskRef(
--- a/src/activity_core/llm_client.py
+++ b/src/activity_core/llm_client.py
@@ -17,6 +17,8 @@ import httpx
 class DisabledLLMClient:
    """LLM client used when no llm-connect endpoint is configured."""

+    last_response_metadata: dict[str, Any] | None = None
+
    def complete(
        self,
        prompt: str,
@@ -32,6 +34,7 @@ class LLMConnectClient:
    def __init__(self, base_url: str, timeout_seconds: float = 300.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout_seconds = timeout_seconds
+        self.last_response_metadata: dict[str, Any] | None = None

    def complete(
        self,
@@ -54,12 +57,48 @@ class LLMConnectClient:
        )
        resp.raise_for_status()
        data = resp.json()
+        self.last_response_metadata = _extract_response_metadata(data)
        content = data.get("content")
        if not isinstance(content, str):
            raise ValueError("llm-connect response missing string content")
        return content


+_SAFE_RESPONSE_METADATA_KEYS = {
+    "finish_reason",
+    "usage",
+    "model",
+    "model_name",
+    "provider",
+    "request_id",
+    "response_id",
+    "trace_id",
+    "latency_ms",
+    "duration_ms",
+    "elapsed_ms",
+    "created",
+    "created_at",
+}
+
+
+def _extract_response_metadata(data: dict[str, Any]) -> dict[str, Any]:
+    """Keep non-secret llm-connect diagnostics alongside the returned content."""
+    return {
+        key: value for key, value in data.items()
+        if key in _SAFE_RESPONSE_METADATA_KEYS and _json_safe(value)
+    }
+
+
+def _json_safe(value: Any) -> bool:
+    try:
+        import json
+
+        json.dumps(value)
+    except (TypeError, ValueError):
+        return False
+    return True
+
+
 def get_llm_client() -> DisabledLLMClient | LLMConnectClient:
    base_url = os.environ.get("LLM_CONNECT_URL", "").strip()
    if not base_url:
--- a/src/activity_core/models.py
+++ b/src/activity_core/models.py
@@ -49,7 +49,18 @@ class CronTriggerConfig(BaseModel):
    )
    timezone: str = Field(default="UTC", description="IANA timezone name.")
    jitter_seconds: int = Field(default=0, ge=0)
-    misfire_policy: Literal["skip", "catchup", "compress"] = Field(default="skip")
+    # Run-miss recovery behaviour (ACTIVITY-WP-0014). What happens when a fire is
+    # missed because the worker / Temporal was unavailable at trigger time:
+    #   skip           - run on trigger or skip; a missed fire is never recovered
+    #   catchup_all    - recover every fire missed during the outage window
+    #   catchup_latest - recover only the most recent missed fire; do not accumulate
+    # Legacy aliases are accepted: catchup → catchup_all, compress → catchup_latest.
+    misfire_policy: Literal[
+        "skip", "catchup_all", "catchup_latest", "catchup", "compress"
+    ] = Field(default="skip")
+    # Override the per-policy default catchup window (how far back Temporal will
+    # recover missed fires after an outage). None uses the policy default.
+    catchup_window_seconds: int | None = Field(default=None, ge=0)


 class EventTriggerConfig(BaseModel):
--- a/src/activity_core/ops_evidence_sinks.py
+++ b/src/activity_core/ops_evidence_sinks.py
@@ -2,12 +2,15 @@

 from __future__ import annotations

+import json
 import os
+from pathlib import Path
 from typing import Any

 import httpx

 from activity_core.context_resolvers.ops_inventory import _sanitize_url
+from activity_core.state_hub_write import idempotency_headers

 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _INTER_HUB_SINK_TYPES = {
@@ -15,6 +18,10 @@ _INTER_HUB_SINK_TYPES = {
    "inter-hub-event",
    "inter-hub-interaction-event",
 }
+_CORE_HUB_SINK_TYPES = {
+    "core-hub",
+    "core-hub-interaction-event",
+}


 def persist_ops_inventory_evidence(payload: dict[str, Any]) -> list[dict[str, Any]]:
@@ -55,6 +62,12 @@ def persist_ops_inventory_evidence(payload: dict[str, Any]) -> list[dict[str, An
                    results.append(
                        _post_state_hub_progress(payload, bind_key, probe_result, sink)
                    )
+                elif sink_type in _CORE_HUB_SINK_TYPES:
+                    results.append(
+                        _post_core_hub_interaction_event(
+                            payload, bind_key, probe_result, sink
+                        )
+                    )
                elif sink_type in _INTER_HUB_SINK_TYPES:
                    results.append(_inter_hub_result(sink))
                else:
@@ -121,6 +134,7 @@ def _post_state_hub_progress(
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
+        headers=idempotency_headers(run_id, context_key, event_type),
        timeout=float(sink.get("timeout_seconds", 10.0)),
    )
    resp.raise_for_status()
@@ -136,12 +150,17 @@ def _post_state_hub_progress(


 def _progress_exists(base_url: str, event_type: str, idempotency_key: str) -> bool:
-    resp = httpx.get(
-        f"{base_url}/progress/",
-        params={"limit": 100},
-        timeout=10.0,
-    )
-    resp.raise_for_status()
+    # Best-effort optimisation only; the Idempotency-Key header on the write is the
+    # real dedup guarantee. Do not hard-fail if State Hub is unreachable here.
+    try:
+        resp = httpx.get(
+            f"{base_url}/progress/",
+            params={"limit": 100},
+            timeout=10.0,
+        )
+        resp.raise_for_status()
+    except httpx.HTTPError:
+        return False
    for item in resp.json():
        detail = item.get("detail") or {}
        if (
@@ -152,6 +171,213 @@ def _progress_exists(base_url: str, event_type: str, idempotency_key: str) -> bo
    return False


+def _post_core_hub_interaction_event(
+    payload: dict[str, Any],
+    context_key: str,
+    probe_result: dict[str, Any],
+    sink: dict[str, Any],
+) -> dict[str, Any]:
+    raw_base_url = (
+        sink.get("core_hub_url")
+        or sink.get("base_url")
+        or os.environ.get("CORE_HUB_BASE_URL")
+        or ""
+    )
+    base_url = str(raw_base_url).rstrip("/")
+    runtime_token = _core_hub_runtime_token(sink)
+    widget_id = _core_hub_widget_id(sink, probe_result)
+
+    missing: list[str] = []
+    if not base_url:
+        missing.append("CORE_HUB_BASE_URL")
+    if not runtime_token:
+        missing.append("CORE_HUB_RUNTIME_TOKEN or CORE_HUB_RUNTIME_TOKEN_FILE")
+    if not widget_id:
+        missing.append("widget_id or CORE_HUB_WIDGET_ID")
+    if missing:
+        return {
+            "type": sink.get("type"),
+            "status": "skipped",
+            "reason": "missing_core_hub_config",
+            "missing": missing,
+            "context_key": context_key,
+        }
+
+    endpoint = _selected_endpoint(probe_result, sink)
+    event_type = sink.get("event_type", "ops-endpoint-verified")
+    timeout = float(sink.get("timeout_seconds", 10.0))
+    body = {
+        "widgetId": widget_id,
+        "eventType": event_type,
+        "viewContext": _core_hub_view_context(payload, context_key, endpoint, sink),
+        "metadata": _core_hub_metadata(payload, context_key, probe_result, endpoint),
+    }
+    resp = httpx.post(
+        f"{base_url}/api/v2/interaction-events",
+        json=body,
+        headers=_core_hub_headers(runtime_token),
+        timeout=timeout,
+    )
+    resp.raise_for_status()
+    data = resp.json()
+    event_id = data.get("id")
+    if not event_id:
+        raise RuntimeError("Core Hub interaction event response did not include an id")
+    if not _core_hub_event_exists(base_url, runtime_token, str(event_id), timeout):
+        raise RuntimeError("Core Hub interaction event was not visible after create")
+
+    return {
+        "type": sink.get("type"),
+        "status": "posted",
+        "event_type": data.get("eventType", event_type),
+        "event_id": event_id,
+        "widget_id": data.get("widgetId", widget_id),
+        "verified": True,
+        "context_key": context_key,
+    }
+
+
+def _core_hub_headers(runtime_token: str) -> dict[str, str]:
+    return {
+        "Accept": "application/json",
+        "Authorization": f"Bearer {runtime_token}",
+        "Content-Type": "application/json",
+        "User-Agent": "activity-core-ops-evidence/0.1",
+    }
+
+
+def _core_hub_runtime_token(sink: dict[str, Any]) -> str:
+    token_file = (
+        sink.get("runtime_token_file")
+        or sink.get("token_file")
+        or os.environ.get("CORE_HUB_RUNTIME_TOKEN_FILE")
+    )
+    if token_file:
+        return Path(str(token_file)).read_text(encoding="utf-8").strip()
+    env_name = (
+        sink.get("runtime_token_env")
+        or os.environ.get("CORE_HUB_RUNTIME_TOKEN_ENV")
+        or "CORE_HUB_RUNTIME_TOKEN"
+    )
+    return os.environ.get(str(env_name), "").strip()
+
+
+def _core_hub_widget_id(sink: dict[str, Any], probe_result: dict[str, Any]) -> str:
+    direct = sink.get("widget_id") or os.environ.get("CORE_HUB_WIDGET_ID")
+    if direct:
+        return str(direct)
+
+    endpoint = _selected_endpoint(probe_result, sink)
+    widget_ref = endpoint.get("widget_ref") if endpoint else None
+    if not widget_ref:
+        return ""
+
+    mapping = sink.get("widget_mapping") or sink.get("capability_mapping")
+    if mapping is None:
+        mapping = os.environ.get("CORE_HUB_WIDGET_MAPPING")
+    parsed = _parse_widget_mapping(mapping)
+    return parsed.get(str(widget_ref), "")
+
+
+def _parse_widget_mapping(raw: Any) -> dict[str, str]:
+    if isinstance(raw, dict):
+        return {str(key): str(value) for key, value in raw.items() if value}
+    if not isinstance(raw, str) or not raw.strip():
+        return {}
+    value = raw.strip()
+    if value.startswith("{"):
+        try:
+            loaded = json.loads(value)
+        except json.JSONDecodeError:
+            return {}
+        if isinstance(loaded, dict):
+            return {str(key): str(item) for key, item in loaded.items() if item}
+        return {}
+    if "=" not in value:
+        return {}
+    pairs: dict[str, str] = {}
+    for part in value.split(","):
+        key, _, item = part.partition("=")
+        if key.strip() and item.strip():
+            pairs[key.strip()] = item.strip()
+    return pairs
+
+
+def _selected_endpoint(probe_result: dict[str, Any], sink: dict[str, Any]) -> dict[str, Any]:
+    endpoints = [
+        endpoint
+        for endpoint in probe_result.get("endpoints", [])
+        if isinstance(endpoint, dict)
+    ]
+    endpoint_id = sink.get("endpoint_id")
+    if endpoint_id:
+        match = next(
+            (endpoint for endpoint in endpoints if endpoint.get("endpoint_id") == endpoint_id),
+            None,
+        )
+        if match:
+            return match
+    return next(
+        (endpoint for endpoint in endpoints if endpoint.get("widget_ref")),
+        endpoints[0] if endpoints else {},
+    )
+
+
+def _core_hub_view_context(
+    payload: dict[str, Any],
+    context_key: str,
+    endpoint: dict[str, Any],
+    sink: dict[str, Any],
+) -> str:
+    return str(
+        sink.get("view_context")
+        or endpoint.get("view_context")
+        or f"activity-core/ops-inventory/{payload.get('run_id', 'unknown')}/{context_key}"
+    )
+
+
+def _core_hub_metadata(
+    payload: dict[str, Any],
+    context_key: str,
+    probe_result: dict[str, Any],
+    endpoint: dict[str, Any],
+) -> dict[str, Any]:
+    compact = _compact_probe_result(probe_result)
+    return {
+        "activity_id": payload.get("activity_id"),
+        "activity_core_run_id": payload.get("run_id"),
+        "scheduled_for": payload.get("scheduled_for"),
+        "source_type": "ops-inventory",
+        "context_key": context_key,
+        "probe": {
+            "generated_at": compact.get("generated_at"),
+            "inventory_path": compact.get("inventory_path"),
+            "status": compact.get("status"),
+            "reason": compact.get("reason"),
+            "summary": compact.get("summary", {}),
+        },
+        "endpoint": _compact_endpoint(endpoint) if endpoint else {},
+    }
+
+
+def _core_hub_event_exists(
+    base_url: str,
+    runtime_token: str,
+    event_id: str,
+    timeout: float,
+) -> bool:
+    resp = httpx.get(
+        f"{base_url}/api/v2/interaction-events",
+        headers=_core_hub_headers(runtime_token),
+        timeout=timeout,
+    )
+    resp.raise_for_status()
+    payload = resp.json()
+    data = payload.get("data") if isinstance(payload, dict) else []
+    if not isinstance(data, list):
+        return False
+    return any(isinstance(item, dict) and item.get("id") == event_id for item in data)
+
 def _inter_hub_result(sink: dict[str, Any]) -> dict[str, Any]:
    missing: list[str] = []
    if not (sink.get("inter_hub_url") or os.environ.get("INTER_HUB_URL")):
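
Review note on the widget-mapping hunk above: the mapping may arrive as a dict, a JSON object string, or a comma-separated `key=value` string (for example via `CORE_HUB_WIDGET_MAPPING`). The sketch below is a minimal standalone re-statement of that parsing, not the project's code; the name `parse_mapping` and the sample widget ids are hypothetical and only serve to show which inputs yield which mapping.

```python
import json
from typing import Any


def parse_mapping(raw: Any) -> dict[str, str]:
    """Illustrative stand-in for _parse_widget_mapping (hypothetical name)."""
    if isinstance(raw, dict):
        # Falsy values (empty strings, None) are dropped, as in the diff.
        return {str(k): str(v) for k, v in raw.items() if v}
    if not isinstance(raw, str) or not raw.strip():
        return {}
    text = raw.strip()
    if text.startswith("{"):
        try:
            loaded = json.loads(text)
        except json.JSONDecodeError:
            return {}
        return parse_mapping(loaded) if isinstance(loaded, dict) else {}
    pairs: dict[str, str] = {}
    for part in text.split(","):
        key, _, value = part.partition("=")
        if key.strip() and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs


# Sample ids below are made up for illustration.
from_dict = parse_mapping({"ops": "w-1", "empty": ""})
from_json = parse_mapping('{"ops": "w-1"}')
from_pairs = parse_mapping("ops=w-1, net = w-2")
from_junk = parse_mapping("not a mapping")
```

Malformed input degrades to an empty mapping rather than raising, so the caller falls through to returning an empty widget id.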
--- a/src/activity_core/report_sinks.py
+++ b/src/activity_core/report_sinks.py
@@ -11,6 +11,8 @@ from zoneinfo import ZoneInfo

 import httpx

+from activity_core.state_hub_write import idempotency_headers
+
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _THE_CUSTODIAN_ROOT = Path("/home/worsch/the-custodian")
 _FORBIDDEN_CUSTODIAN_ROOTS = (
@@ -134,6 +136,7 @@ def _post_state_hub_progress(
            "output_validated": report_entry.get("output_validated"),
            "review_required": report_entry.get("review_required"),
            "validation_error": report_entry.get("validation_error"),
+            "llm_response_metadata": report_entry.get("llm_response_metadata"),
            "report": report,
        },
    }
@@ -149,6 +152,7 @@ def _post_state_hub_progress(
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
+        headers=idempotency_headers(run_id, instruction_id, event_type),
        timeout=float(sink.get("timeout_seconds", 10.0)),
    )
    resp.raise_for_status()
@@ -167,12 +171,18 @@ def _progress_exists(
    instruction_id: str,
    event_type: str,
 ) -> bool:
-    resp = httpx.get(
-        f"{base_url}/progress/",
-        params={"limit": 100},
-        timeout=10.0,
-    )
-    resp.raise_for_status()
+    # Best-effort read-dedup optimisation only. The Idempotency-Key header on the
+    # write is the real guarantee; if State Hub is unreachable here we must not
+    # hard-fail — proceed to the (keyed) write rather than raising.
+    try:
+        resp = httpx.get(
+            f"{base_url}/progress/",
+            params={"limit": 100},
+            timeout=10.0,
+        )
+        resp.raise_for_status()
+    except httpx.HTTPError:
+        return False
    for item in resp.json():
        detail = item.get("detail") or {}
        if (
@@ -215,6 +225,16 @@ def _render_markdown(
        lines.extend([summary, ""])
    if validation_error:
        lines.extend(["Validation error:", "", f"`{validation_error}`", ""])
+    metadata = report_entry.get("llm_response_metadata")
+    if metadata:
+        lines.extend([
+            "LLM response metadata:",
+            "",
+            "```json",
+            json.dumps(metadata, indent=2, sort_keys=True),
+            "```",
+            "",
+        ])
    lines.extend([
        "```json",
        json.dumps(report, indent=2, sort_keys=True),
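
Review note on the `idempotency_headers(run_id, instruction_id, event_type)` call above: the helper lives in `activity_core.state_hub_write` and its body is not part of this diff. The sketch below only illustrates one common way such a helper could work, assuming it derives a deterministic `Idempotency-Key` from its parts; the function name `make_idempotency_headers`, the separator, and the use of SHA-256 are assumptions, not the project's implementation.

```python
import hashlib


def make_idempotency_headers(*parts: str) -> dict[str, str]:
    """Hypothetical helper: same parts always yield the same key."""
    # Join with a separator that is unlikely to occur in ids, so that
    # ("ab", "c") and ("a", "bc") do not collapse onto the same key.
    joined = "\x1f".join(str(part) for part in parts)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return {"Idempotency-Key": digest}


# Sample identifiers are invented for the example.
first = make_idempotency_headers("run-1", "instr-7", "report")
retry = make_idempotency_headers("run-1", "instr-7", "report")
other = make_idempotency_headers("run-2", "instr-7", "report")
```

With a keyed write like this, a retried POST for the same run, instruction, and event type can be recognised by the server, which is why the read-side dedup in `_progress_exists` can safely be best-effort.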
--- a/src/activity_core/rules/executor.py
+++ b/src/activity_core/rules/executor.py
@@ -41,6 +41,7 @@ class InstructionResult:
    review_required: bool = False
    condition_matched: str | None = None
    validation_error: str | None = None
+    llm_response_metadata: dict[str, Any] | None = None


 def _resolve_path(obj: Any, path: str) -> Any:
@@ -160,15 +161,22 @@ def _execute(
    prompt_hash = hashlib.sha256(rendered.encode()).hexdigest()
    llm_config = _llm_run_config(instr)

+    # Reference allow-list (WP-0016-T04): if a context resolver supplied the set
+    # of known candidate ids, recommendations pointing at anything else are
+    # quarantined. Absent (None) today → the check is inert until wired.
+    allow_list = _allow_list_from_context(context)
+
    # Step 3 — call LLM
    raw_output = llm_client.complete(rendered, model=instr.model, config=llm_config)
+    response_metadata = _llm_response_metadata(llm_client)

    # Step 4 — validate and optionally retry
-    task_specs, report, error = _validate_output(raw_output, instr)
+    task_specs, report, error = _validate_output(raw_output, instr, allow_list)
    if error:
        retry_prompt = rendered + f"\n\nPrevious output was invalid: {error}\nPlease fix."
        raw_output = llm_client.complete(retry_prompt, model=instr.model, config=llm_config)
-        task_specs, report, error = _validate_output(raw_output, instr)
+        response_metadata = _llm_response_metadata(llm_client)
+        task_specs, report, error = _validate_output(raw_output, instr, allow_list)
        if error:
            # Truncate to keep log volume bounded but long enough to see the
            # actual JSON shape mismatch (typical reports are <2KB).
@@ -178,7 +186,18 @@ def _execute(
                "error=%s, raw_output_preview=%r",
                instr.id, prompt_hash, error, preview,
            )
-            failure_report = _invalid_output_report(instr, error, raw_output)
+            # Posture B (WP-0016-T03): try to recover a partial-but-usable
+            # report from individually-parseable items before declaring total
+            # loss. One bad item should cost one item, not the whole report.
+            recovered = _resilient_report(
+                instr, raw_output, error, prompt_hash, allow_list,
+                response_metadata=response_metadata,
+            )
+            if recovered is not None:
+                return recovered
+            failure_report = _invalid_output_report(
+                instr, error, raw_output, response_metadata=response_metadata,
+            )
            if failure_report is not None:
                return InstructionResult(
                    tasks=[],
@@ -189,6 +208,7 @@ def _execute(
                    review_required=True,
                    condition_matched=instr.condition or None,
                    validation_error=error,
+                    llm_response_metadata=response_metadata,
                )
            return _empty_result(instr, prompt_hash=prompt_hash, validation_error=error)

@@ -200,6 +220,7 @@ def _execute(
        output_validated=True,
        review_required=bool(getattr(instr, "review_required", False)),
        condition_matched=instr.condition or None,
+        llm_response_metadata=response_metadata,
    )


@@ -239,6 +260,7 @@ def _invalid_output_report(
    instr: Any,
    validation_error: str,
    raw_output: Any,
+    response_metadata: dict[str, Any] | None = None,
 ) -> dict[str, Any] | None:
    """Build a durable diagnostic report for invalid report-sink output.

@@ -256,7 +278,7 @@ def _invalid_output_report(
            partial_output = _parse_json_output(raw_output)
        except json.JSONDecodeError:
            partial_output = None
-            raw_preview = raw_output[:4000]
+            raw_preview = raw_output[:_RAW_OUTPUT_PREVIEW_LIMIT]
    else:
        partial_output = raw_output

@@ -268,6 +290,8 @@ def _invalid_output_report(
        "status": "validation_failed",
        "validation_error": validation_error,
    }
+    if response_metadata:
+        report["llm_response_metadata"] = response_metadata
    if isinstance(partial_output, dict):
        if isinstance(partial_output.get("summary"), str):
            report["partial_summary"] = partial_output["summary"]
@@ -279,6 +303,358 @@ def _invalid_output_report(
    return report


+# ---------------------------------------------------------------------------
+# Resilient report recovery (ACTIVITY-WP-0016-T03)
+#
+# Posture B — verify & mitigate at the producer→consumer boundary. When the
+# whole-document parse/validate fails, recover individually-parseable
+# recommendation objects, validate each against the item schema, keep the valid
+# ones, and quarantine the malformed/over-limit ones with provenance. One bad
+# item costs one item, not the whole report (error locality == unit of work).
+# ---------------------------------------------------------------------------
+
+_QUARANTINE_LIMIT = 20
+_SNIPPET_LIMIT = 200
+# Producer guardrails (ACTIVITY-WP-0016-T04): structural bounds applied to every
+# recommendation regardless of producer (LLM, agent, or human). These are
+# verify-and-mitigate limits — an offending item is quarantined, never allowed to
+# fail the whole report or flow unbounded into a downstream consumer.
+_MAX_STRING_LEN = 4000
+_MAX_DEPTH = 8
+_RAW_OUTPUT_PREVIEW_LIMIT = 12000
+_SUMMARY_RE = re.compile(r'"summary"\s*:\s*"((?:[^"\\]|\\.)*)"')
+
+
+_SAFE_RESPONSE_METADATA_KEYS = {
+    "finish_reason",
+    "usage",
+    "model",
+    "model_name",
+    "provider",
+    "request_id",
+    "response_id",
+    "trace_id",
+    "latency_ms",
+    "duration_ms",
+    "elapsed_ms",
+    "created",
+    "created_at",
+}
+
+
+def _llm_response_metadata(llm_client: Any) -> dict[str, Any] | None:
+    metadata = getattr(llm_client, "last_response_metadata", None)
+    if not isinstance(metadata, dict) or not metadata:
+        return None
+    safe: dict[str, Any] = {}
+    for key, value in metadata.items():
+        if key not in _SAFE_RESPONSE_METADATA_KEYS:
+            continue
+        try:
+            json.dumps(value)
+        except (TypeError, ValueError):
+            continue
+        safe[str(key)] = value
+    return safe or None
+
+
+def _snippet(value: Any) -> str:
+    text = value if isinstance(value, str) else json.dumps(value, default=str)
+    return text[:_SNIPPET_LIMIT]
+
+
+def _json_depth(value: Any, depth: int = 1) -> int:
+    if depth > _MAX_DEPTH:
+        return depth
+    if isinstance(value, dict):
+        return max((_json_depth(v, depth + 1) for v in value.values()), default=depth)
+    if isinstance(value, list):
+        return max((_json_depth(v, depth + 1) for v in value), default=depth)
+    return depth
+
+
+def _has_oversized_string(value: Any) -> bool:
+    if isinstance(value, str):
+        return len(value) > _MAX_STRING_LEN
+    if isinstance(value, dict):
+        return any(_has_oversized_string(v) for v in value.values())
+    if isinstance(value, list):
+        return any(_has_oversized_string(v) for v in value)
+    return False
+
+
+def _item_structure_error(item: Any) -> str | None:
+    """Producer-agnostic structural guardrail: depth and string-length caps."""
+    if _json_depth(item) > _MAX_DEPTH:
+        return f"exceeds max nesting depth {_MAX_DEPTH}"
+    if _has_oversized_string(item):
+        return f"contains a string longer than {_MAX_STRING_LEN} chars"
+    return None
+
+
+def _allow_list_from_context(context: dict | None) -> set[str] | None:
+    """Build the recommendation-candidate allow-list from resolved context.
+
+    Looks for `context["known_candidates"]` (a list/set of valid candidate ids).
+    Returns None when absent so the allow-list check stays inert until a context
+    resolver populates it — the guardrail capability ships now; activation is a
+    one-line resolver change.
+    """
+    if not isinstance(context, dict):
+        return None
+    known = context.get("known_candidates")
+    if isinstance(known, (list, set, tuple)):
+        return {str(item) for item in known}
+    return None
+
+
+def _report_contract(instr: Any) -> tuple[dict[str, Any] | None, int | None]:
+    """Extract (item_schema, max_items) for the recommendations list, if any."""
+    try:
+        schema = _load_output_schema(getattr(instr, "output_schema", ""))
+    except (OSError, json.JSONDecodeError, TypeError):
+        return None, None
+    if not isinstance(schema, dict):
+        return None, None
+    recs = (schema.get("properties") or {}).get("recommendations")
+    if not isinstance(recs, dict):
+        return None, None
+    item_schema = recs.get("items") if isinstance(recs.get("items"), dict) else None
+    max_items = recs.get("maxItems") if isinstance(recs.get("maxItems"), int) else None
+    return item_schema, max_items
+
+
+def _extract_object_spans(raw: str) -> list[tuple[str, bool]]:
+    """Return (span, complete) for each recommendation object in raw output.
+
+    Scans the `recommendations` array brace-aware and string-aware so it recovers
+    objects whether they are pretty-printed across many lines or emitted one per
+    line (NDJSON). A truncated trailing object is returned with complete=False.
+    """
+    key = raw.find('"recommendations"')
+    start_region = raw.find("[", key) if key >= 0 else -1
+    if start_region < 0:
+        return []
+    spans: list[tuple[str, bool]] = []
+    i, n = start_region + 1, len(raw)
+    while i < n:
+        ch = raw[i]
+        if ch == "]":
+            break
+        if ch != "{":
+            i += 1
+            continue
+        depth, in_str, esc, j = 0, False, False, i
+        closed = False
+        while j < n:
+            c = raw[j]
+            if in_str:
+                if esc:
+                    esc = False
+                elif c == "\\":
+                    esc = True
+                elif c == '"':
+                    in_str = False
+            elif c == '"':
+                in_str = True
+            elif c == "{":
+                depth += 1
+            elif c == "}":
+                depth -= 1
+                if depth == 0:
+                    spans.append((raw[i:j + 1], True))
+                    closed = True
+                    break
+            j += 1
+        if not closed:
+            spans.append((raw[i:], False))  # truncated tail
+            break
+        i = j + 1
+    return spans
+
+
+def _try_repair(span: str) -> str:
+    """Best-effort close of a truncated JSON object: balance quote, braces, brackets."""
+    # Stack of pending closers so nested containers close innermost-first.
+    in_str, esc = False, False
+    closers: list[str] = []
+    for c in span:
+        if in_str:
+            if esc:
+                esc = False
+            elif c == "\\":
+                esc = True
+            elif c == '"':
+                in_str = False
+        elif c == '"':
+            in_str = True
+        elif c == "{":
+            closers.append("}")
+        elif c == "[":
+            closers.append("]")
+        elif c in "}]" and closers:
+            closers.pop()
+    repaired = span.rstrip().rstrip(",")
+    if in_str:
+        repaired += '"'
+    return repaired + "".join(reversed(closers))
+
+
+def _recover_recommendations(
+    raw: str,
+) -> tuple[str | None, list[dict[str, Any]], list[dict[str, Any]]]:
+    """Recover (summary, items, quarantined) from a failed report payload."""
+    summary_match = _SUMMARY_RE.search(raw)
+    summary = None
+    if summary_match:
+        try:
+            summary = json.loads(f'"{summary_match.group(1)}"')
+        except json.JSONDecodeError:
+            summary = summary_match.group(1)
+    items: list[dict[str, Any]] = []
+    quarantined: list[dict[str, Any]] = []
+    for index, (span, complete) in enumerate(_extract_object_spans(raw)):
+        parsed: Any = None
+        try:
+            parsed = json.loads(span)
+        except json.JSONDecodeError as exc:
+            if not complete:
+                try:
+                    parsed = json.loads(_try_repair(span))
+                except json.JSONDecodeError:
+                    parsed = None
+            if parsed is None:
+                quarantined.append(
+                    {"index": index, "error": str(exc), "raw": _snippet(span),
+                     "reason": "truncated" if not complete else "unparseable"}
+                )
+                continue
+        if isinstance(parsed, dict):
+            items.append(parsed)
+        else:
+            quarantined.append(
+                {"index": index, "error": "item is not a JSON object",
+                 "raw": _snippet(span)}
+            )
+    return summary, items, quarantined
+
+
+def _partition_items(
+    items: list[dict[str, Any]],
+    item_schema: dict[str, Any] | None,
+    max_items: int | None,
+    *,
+    run_schema: bool = True,
+    allow_list: set[str] | None = None,
+) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+    """Screen items into (valid, quarantined).
+
+    Applied uniformly to recovered items (run_schema=True) and to already
+    schema-valid happy-path items (run_schema=False). Order of checks: structural
+    type → schema → producer guardrails (depth/length) → reference allow-list →
+    count cap. The first failing check quarantines the item with provenance.
+    """
+    valid: list[dict[str, Any]] = []
+    quarantined: list[dict[str, Any]] = []
+    for index, item in enumerate(items):
+        if not isinstance(item, dict):
+            quarantined.append(
+                {"index": index, "error": "item is not a JSON object",
+                 "raw": _snippet(item), "reason": "malformed"}
+            )
+            continue
+        schema_error = (
+            _validate_schema_node(item, item_schema, f"recommendations[{index}]")
+            if (run_schema and item_schema)
+            else None
+        )
+        if schema_error:
+            quarantined.append(
+                {"index": index, "error": schema_error, "raw": _snippet(item),
+                 "reason": "schema"}
+            )
+            continue
+        structure_error = _item_structure_error(item)
+        if structure_error:
+            quarantined.append(
+                {"index": index, "error": structure_error, "raw": _snippet(item),
+                 "reason": "guardrail"}
+            )
+            continue
+        if allow_list is not None:
+            candidate = item.get("candidate")
+            if not isinstance(candidate, str) or candidate not in allow_list:
+                quarantined.append(
+                    {"index": index, "error": f"candidate {candidate!r} not in allow-list",
+                     "raw": _snippet(item), "reason": "allow_list"}
+                )
+                continue
+        valid.append(item)
+    if max_items is not None and len(valid) > max_items:
+        for item in valid[max_items:]:
+            quarantined.append(
+                {"index": None, "error": f"exceeds maxItems={max_items}",
+                 "raw": _snippet(item), "reason": "over_limit"}
+            )
+        valid = valid[:max_items]
+    return valid, quarantined
+
+
+def _resilient_report(
+    instr: Any,
+    raw_output: Any,
+    original_error: str,
+    prompt_hash: str | None,
+    allow_list: set[str] | None = None,
+    response_metadata: dict[str, Any] | None = None,
+) -> InstructionResult | None:
+    """Recover a partial-but-usable report from output that failed validation.
+
+    Returns None when nothing usable can be recovered, so the caller falls back
+    to the total-loss diagnostic artifact (_invalid_output_report).
+    """
+    if not getattr(instr, "report_sinks", None) or not isinstance(raw_output, str):
+        return None
+    item_schema, max_items = _report_contract(instr)
+    summary, items, quarantined = _recover_recommendations(raw_output)
+    if not items:
+        return None
+    valid, item_quarantine = _partition_items(
+        items, item_schema, max_items, allow_list=allow_list,
+    )
+    quarantined.extend(item_quarantine)
+    if not valid:
+        return None
+    report: dict[str, Any] = {
+        "summary": summary
+        or f"Partial daily triage: recovered {len(valid)} recommendation(s) "
+        "after the full report failed validation.",
+        "recommendations": valid,
+        "status": "partial",
+        "partial": True,
+        "quarantined_count": len(quarantined),
+        "quarantined_items": quarantined[:_QUARANTINE_LIMIT],
+        "recovery_note": f"original validation error: {original_error}",
+    }
+    if response_metadata:
+        report["llm_response_metadata"] = response_metadata
+    logger.warning(
+        "instruction_output_recovered: instruction=%r, kept=%d, quarantined=%d",
+        getattr(instr, "id", None), len(valid), len(quarantined),
+    )
+    return InstructionResult(
+        tasks=[],
+        report=report,
+        prompt_hash=prompt_hash,
+        model=getattr(instr, "model", None),
+        output_validated=True,
+        review_required=True,
+        condition_matched=getattr(instr, "condition", "") or None,
+        validation_error=None,
+        llm_response_metadata=response_metadata,
+    )
+
+
 def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
    """Build a durable diagnostic report when a report instruction cannot run."""
    if not getattr(instr, "report_sinks", None):
@@ -295,6 +671,7 @@ def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
 def _validate_output(
    raw_output: Any,
    instr: Any,
+    allow_list: set[str] | None = None,
 ) -> tuple[list[TaskSpec], dict[str, Any] | None, str | None]:
    """Parse raw LLM output into TaskSpecs and optional report payload.

@@ -349,6 +726,28 @@ def _validate_output(
                source_type="instruction",
                source_id=instr.id,
            ))
+
+        # Happy-path producer guardrails (WP-0016-T04): the whole document already
+        # passed schema validation, so recommendations are schema-valid; still apply
+        # the count cap, structural caps, and reference allow-list, quarantining any
+        # offenders rather than emitting them. Report shape only changes when an item
+        # is actually quarantined.
+        if isinstance(report, dict) and isinstance(report.get("recommendations"), list):
+            item_schema, max_items = _report_contract(instr)
+            kept, quarantined = _partition_items(
+                report["recommendations"], item_schema, max_items,
+                run_schema=False, allow_list=allow_list,
+            )
+            if quarantined:
+                report = {
+                    **report,
+                    "recommendations": kept,
+                    "status": "partial",
+                    "partial": True,
+                    "quarantined_count": len(quarantined),
+                    "quarantined_items": quarantined[:_QUARANTINE_LIMIT],
+                }
+
        return specs, report, None
    except (json.JSONDecodeError, AttributeError, KeyError, TypeError) as exc:
        return [], None, str(exc)
--- a/src/activity_core/schedule_health.py
+++ b/src/activity_core/schedule_health.py
@@ -0,0 +1,194 @@
+"""Missed-fire detection for cron schedules (ACTIVITY-WP-0014, T03).
+
+Even with a catchup window configured, an operator wants to *know* when a fire
+was missed — especially under ``misfire_policy: skip`` where missed fires are
+dropped by design and leave no run and no failure event. This module turns the
+schedule's own bookkeeping into an explicit verdict and an optional State Hub
+alert so a miss is never invisible again.
+
+Temporal already counts fires that were dropped because they fell outside the
+catchup window in ``ScheduleInfo.num_actions_missed_catchup_window``. We surface
+that, plus a staleness check on the most recent fire, as a ``ScheduleHealth``
+verdict. The verdict logic is a pure function so it is testable without a live
+Temporal server; ``check_schedule_health`` is the thin async reader.
+"""
+
+from __future__ import annotations
+
+import os
+from dataclasses import dataclass, field
+from datetime import datetime, timedelta, timezone
+from typing import Any
+from uuid import UUID
+
+import httpx
+
+from activity_core.schedule_manager import schedule_id
+from activity_core.state_hub_write import idempotency_headers
+
+_DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
+
+
+@dataclass(frozen=True)
+class ScheduleHealth:
+    """Verdict for a single schedule's recent firing behaviour."""
+
+    activity_id: str
+    healthy: bool
+    missed_catchup_window: int
+    last_fired_at: datetime | None
+    staleness: timedelta | None
+    reasons: list[str] = field(default_factory=list)
+
+    @property
+    def missed(self) -> bool:
+        return not self.healthy
+
+
+def evaluate_schedule_health(
+    *,
+    activity_id: str,
+    missed_catchup_window: int,
+    last_fired_at: datetime | None,
+    now: datetime,
+    expected_interval: timedelta | None = None,
+    tolerance: timedelta = timedelta(minutes=10),
+) -> ScheduleHealth:
+    """Pure verdict: was a fire missed?
+
+    A schedule is unhealthy if Temporal dropped any fire past the catchup window,
+    or — when ``expected_interval`` is known — if the most recent fire is older
+    than one interval plus ``tolerance`` (i.e. a fire should have happened and
+    did not).
+    """
+    reasons: list[str] = []
+
+    if missed_catchup_window > 0:
+        reasons.append(
+            f"{missed_catchup_window} fire(s) dropped outside the catchup window"
+        )
+
+    staleness: timedelta | None = None
+    if last_fired_at is not None:
+        staleness = now - last_fired_at
+        if expected_interval is not None and staleness > expected_interval + tolerance:
+            reasons.append(
+                f"last fire was {staleness} ago, exceeding the expected "
+                f"{expected_interval} interval"
+            )
+    elif expected_interval is not None:
+        reasons.append("no recorded fire for a schedule that should have fired")
+
+    return ScheduleHealth(
+        activity_id=activity_id,
+        healthy=not reasons,
+        missed_catchup_window=missed_catchup_window,
+        last_fired_at=last_fired_at,
+        staleness=staleness,
+        reasons=reasons,
+    )
+
+
+def _extract_info(desc: Any) -> tuple[int, datetime | None]:
+    """Pull (missed_catchup_window, last_fired_at) from a ScheduleDescription.
+
+    Accesses are defensive so a Temporal SDK field rename degrades to "unknown"
+    rather than raising inside an operational health check.
+    """
+    info = getattr(desc, "info", None)
+    missed = int(getattr(info, "num_actions_missed_catchup_window", 0) or 0)
+
+    last_fired: datetime | None = None
+    recent = getattr(info, "recent_actions", None) or []
+    times = [
+        getattr(a, "scheduled_at", None) or getattr(a, "started_at", None)
+        for a in recent
+    ]
+    times = [t for t in times if t is not None]
+    if times:
+        last_fired = max(times)
+    return missed, last_fired
+
+
+async def check_schedule_health(
+    client: Any,
+    activity_id: str | UUID,
+    *,
+    now: datetime | None = None,
+    expected_interval: timedelta | None = None,
+    tolerance: timedelta = timedelta(minutes=10),
+) -> ScheduleHealth:
+    """Describe the schedule for ``activity_id`` and evaluate its health."""
+    now = now or datetime.now(tz=timezone.utc)
+    handle = client.get_schedule_handle(schedule_id(activity_id))
+    desc = await handle.describe()
+    missed, last_fired = _extract_info(desc)
+    return evaluate_schedule_health(
+        activity_id=str(activity_id),
+        missed_catchup_window=missed,
+        last_fired_at=last_fired,
+        now=now,
+        expected_interval=expected_interval,
+        tolerance=tolerance,
+    )
+
+
+def post_missed_fire_alert(
+    health: ScheduleHealth,
+    *,
+    state_hub_url: str | None = None,
+    author: str = "activity-core",
+    topic_id: str | None = None,
+    workstream_id: str | None = None,
+    timeout_seconds: float = 10.0,
+) -> dict[str, Any]:
+    """Post a ``schedule_miss`` progress event to State Hub for an unhealthy schedule.
+
+    No-op (returns ``status: ok``) when the schedule is healthy, so callers can
+    invoke unconditionally.
+    """
+    if health.healthy:
+        return {"type": "schedule-miss-alert", "status": "ok"}
+
+    base_url = state_hub_url or os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL)
+    base_url = str(base_url).rstrip("/")
+
+    body: dict[str, Any] = {
+        "event_type": "schedule_miss",
+        "author": author,
+        "summary": (
+            f"Schedule {health.activity_id} missed a fire: "
+            + "; ".join(health.reasons)
+        ),
+        "detail": {
+            "activity_id": health.activity_id,
+            "missed_catchup_window": health.missed_catchup_window,
+            "last_fired_at": (
+                health.last_fired_at.isoformat() if health.last_fired_at else None
+            ),
+            "staleness_seconds": (
+                None if health.staleness is None else health.staleness.total_seconds()
+            ),
+            "reasons": health.reasons,
+        },
+    }
+    if topic_id:
+        body["topic_id"] = topic_id
+    if workstream_id:
+        body["workstream_id"] = workstream_id
+
+    # Dedup repeated alerts for the same missed window (same schedule + last fire).
+    last_fired = health.last_fired_at.isoformat() if health.last_fired_at else "none"
+    resp = httpx.post(
+        f"{base_url}/progress/",
+        json=body,
+        headers=idempotency_headers("schedule_miss", health.activity_id, last_fired),
+        timeout=timeout_seconds,
+    )
+    resp.raise_for_status()
+    data = resp.json()
+    return {
+        "type": "schedule-miss-alert",
+        "status": "posted",
+        "progress_id": data.get("id"),
+    }
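
Review note on the staleness rule in `evaluate_schedule_health`: a fire counts as missed when the time since the last fire exceeds one expected interval plus the tolerance. The sketch below isolates just that comparison with a worked example for a daily schedule; `is_stale` and the timestamps are hypothetical, and the catchup-window counter from the diff is not modelled here.

```python
from datetime import datetime, timedelta, timezone


def is_stale(
    last_fired_at: datetime,
    now: datetime,
    expected_interval: timedelta,
    tolerance: timedelta = timedelta(minutes=10),
) -> bool:
    """Hypothetical helper: True when a fire should have happened but did not."""
    return (now - last_fired_at) > expected_interval + tolerance


# Timestamps are invented for the example; both are timezone-aware (UTC).
now = datetime(2026, 7, 2, 12, 0, tzinfo=timezone.utc)
daily = timedelta(days=1)

slightly_late = now - timedelta(hours=24, minutes=5)  # inside the 10 min tolerance
clearly_late = now - timedelta(hours=25)              # 50 min past the tolerance

within_tolerance = is_stale(slightly_late, now, daily)
missed = is_stale(clearly_late, now, daily)
```

Both datetimes must be either timezone-aware or naive; subtracting a naive value from an aware one raises `TypeError` in Python, which is one reason the reader in the diff defaults `now` to an aware UTC timestamp.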
--- a/src/activity_core/schedule_manager.py
+++ b/src/activity_core/schedule_manager.py
@@ -17,7 +17,6 @@ from temporalio.client import (
    Schedule,
    ScheduleActionStartWorkflow,
    ScheduleAlreadyRunningError,
-    ScheduleBackfill,
    ScheduleCalendarSpec,
    ScheduleHandle,
    ScheduleOverlapPolicy,
@@ -38,13 +37,49 @@ _ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
 # RunActivityWorkflow detects this value and derives run dedup key from workflow_id.
 SCHEDULED_TRIGGER_KEY = "scheduled"

-# T24: misfire_policy → ScheduleOverlapPolicy
-_MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
-    "skip": ScheduleOverlapPolicy.SKIP,
-    "catchup": ScheduleOverlapPolicy.BUFFER_ALL,
-    "compress": ScheduleOverlapPolicy.BUFFER_ONE,
+# ACTIVITY-WP-0014: misfire_policy → run-miss recovery behaviour.
+#
+# A "missed fire" happens when the worker / Temporal is unavailable at trigger
+# time. Two Temporal levers together define the behaviour:
+#   - catchup_window: how far back the server will recover missed fires once it
+#     is healthy again. The previous code never set this, so a brief outage at
+#     trigger time silently dropped the fire with no recovery and no signal.
+#   - overlap: what to do when a (recovered) fire would start while a prior run
+#     is still executing.
+#
+# Legacy values (catchup, compress) are aliased onto the explicit names.
+_MISFIRE_ALIASES: dict[str, str] = {
+    "catchup": "catchup_all",
+    "compress": "catchup_latest",
 }

+# overlap policy + default catchup window (seconds) per normalised policy.
+_SKIP_WINDOW_SECONDS = 60
+_CATCHUP_ALL_WINDOW_SECONDS = 365 * 24 * 3600
+_CATCHUP_LATEST_WINDOW_SECONDS = 24 * 3600
+
+_MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
+    # Run on trigger or skip — recover nothing past a tiny grace window.
+    "skip": ScheduleOverlapPolicy.SKIP,
+    # Run on trigger or recover every missed fire during the outage window.
+    "catchup_all": ScheduleOverlapPolicy.BUFFER_ALL,
+    # Run on trigger or recover the most recent missed fire only; BUFFER_ONE
+    # buffers at most one start and drops the rest, so a backlog never accumulates.
+    "catchup_latest": ScheduleOverlapPolicy.BUFFER_ONE,
+}
+
+_MISFIRE_DEFAULT_WINDOW: dict[str, int] = {
+    "skip": _SKIP_WINDOW_SECONDS,
+    "catchup_all": _CATCHUP_ALL_WINDOW_SECONDS,
+    "catchup_latest": _CATCHUP_LATEST_WINDOW_SECONDS,
+}
+
+
+def _normalize_misfire_policy(misfire_policy: str) -> str:
+    """Map legacy aliases onto explicit policy names; unknown values fall back to "skip"."""
+    canonical = _MISFIRE_ALIASES.get(misfire_policy, misfire_policy)
+    return canonical if canonical in _MISFIRE_TO_OVERLAP else "skip"
+

 def schedule_id(activity_id: str | UUID) -> str:
    """Return the canonical Temporal Schedule ID for an ActivityDefinition."""
@@ -57,7 +92,15 @@ def smoke_schedule_id(activity_id: str | UUID) -> str:


 def _overlap_policy(misfire_policy: str) -> ScheduleOverlapPolicy:
-    return _MISFIRE_TO_OVERLAP.get(misfire_policy, ScheduleOverlapPolicy.SKIP)
+    return _MISFIRE_TO_OVERLAP[_normalize_misfire_policy(misfire_policy)]
+
+
+def _catchup_window(cfg: CronTriggerConfig) -> timedelta:
+    """Resolve the catchup window: explicit override, else the policy default."""
+    if cfg.catchup_window_seconds is not None:
+        return timedelta(seconds=cfg.catchup_window_seconds)
+    policy = _normalize_misfire_policy(cfg.misfire_policy)
+    return timedelta(seconds=_MISFIRE_DEFAULT_WINDOW[policy])


 def _build_schedule(defn: ActivityDefinition) -> Schedule:
@@ -80,7 +123,10 @@ def _build_schedule(defn: ActivityDefinition) -> Schedule:
        jitter=timedelta(seconds=cfg.jitter_seconds) if cfg.jitter_seconds else None,
    )

-    policy = SchedulePolicy(overlap=_overlap_policy(cfg.misfire_policy))
+    policy = SchedulePolicy(
+        overlap=_overlap_policy(cfg.misfire_policy),
+        catchup_window=_catchup_window(cfg),
+    )
    state = ScheduleState(paused=not defn.enabled)

    return Schedule(action=action, spec=spec, policy=policy, state=state)
@@ -282,18 +328,10 @@ async def upsert_schedule(client: Client, defn: ActivityDefinition) -> ScheduleH
        else:
            await handle.pause(note="disabled via upsert_schedule")

-    # T24 catchup: backfill any fires missed in the last hour.
-    if isinstance(defn.trigger_config, CronTriggerConfig):
-        if defn.trigger_config.misfire_policy == "catchup":
-            now = datetime.now(tz=timezone.utc)
-            backfill_start = now - timedelta(hours=1)
-            await handle.backfill(
-                ScheduleBackfill(
-                    start_at=backfill_start,
-                    end_at=now,
-                    overlap=ScheduleOverlapPolicy.BUFFER_ALL,
-                )
-            )
+    # ACTIVITY-WP-0014: missed-fire recovery is now handled natively by the
+    # schedule's catchup_window (see _build_schedule), which the server applies
+    # continuously after any outage — not only at upsert time. The previous
+    # ad-hoc 1-hour backfill is therefore no longer needed.

    return handle
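
A minimal sketch of how the misfire policy resolves to an overlap policy plus a catchup window, following the tables in the hunk above. To stay self-contained it uses plain strings in place of the `ScheduleOverlapPolicy` members, and `resolve` is a hypothetical helper that folds `_overlap_policy` and `_catchup_window` together for illustration only.

```python
from __future__ import annotations

from datetime import timedelta

# Legacy names map onto the explicit policy names, as in _MISFIRE_ALIASES.
ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}

# Canonical policy -> (overlap policy name, default catchup window).
# The strings stand in for ScheduleOverlapPolicy members (assumption for the sketch).
POLICIES = {
    "skip": ("SKIP", timedelta(seconds=60)),
    "catchup_all": ("BUFFER_ALL", timedelta(days=365)),
    "catchup_latest": ("BUFFER_ONE", timedelta(days=1)),
}


def resolve(
    misfire_policy: str, catchup_window_seconds: int | None = None
) -> tuple[str, timedelta]:
    """Return (overlap, catchup_window) for a configured misfire policy."""
    canonical = ALIASES.get(misfire_policy, misfire_policy)
    if canonical not in POLICIES:
        canonical = "skip"  # unknown values degrade to the most conservative policy
    overlap, default_window = POLICIES[canonical]
    if catchup_window_seconds is not None:
        # An explicit override wins over the per-policy default.
        return overlap, timedelta(seconds=catchup_window_seconds)
    return overlap, default_window


print(resolve("catchup"))
print(resolve("compress", catchup_window_seconds=300))
print(resolve("no-such-policy"))
```

The override only changes the window; the overlap behaviour always comes from the normalised policy name.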

--- a/src/activity_core/state_hub_write.py
+++ b/src/activity_core/state_hub_write.py
@@ -0,0 +1,34 @@
+"""Idempotency-keyed State Hub writes (ACTIVITY-WP-0014 T05).
+
+Under the State Hub *beachhead* model, a write may be buffered locally while
+central State Hub is unreachable and **flushed later, possibly with retries**.
+To keep that flush safe — no duplicate progress / triage events — every write
+carries a stable ``Idempotency-Key`` header derived deterministically from the
+write's identity. The guarantee lives on the write itself and does **not** depend
+on a live dedup read, so it holds even when the beachhead is serving offline.
+
+activity-core does not implement the queue/cache (that is state-hub's beachhead);
+it only emits the key so the beachhead / State Hub can dedup on flush. The header
+passes untouched through the existing ``actcore-state-hub-bridge`` proxy and is
+ignored by State Hub versions that do not yet honour it.
+"""
+
+from __future__ import annotations
+
+IDEMPOTENCY_HEADER = "Idempotency-Key"
+
+
+def idempotency_key(*parts: str | None) -> str:
+    """Build a stable, header-safe idempotency key from identity parts.
+
+    Empty/None parts are kept as empty segments so the key shape is stable across
+    calls. Every character outside printable ASCII 0x21-0x7E (whitespace, control,
+    non-ASCII) is replaced with "_" so the value is a valid single-line HTTP header.
+    """
+    raw = ":".join((p or "") for p in parts)
+    return "".join(ch if 0x20 < ord(ch) < 0x7F else "_" for ch in raw) or "_"
+
+
+def idempotency_headers(*parts: str | None) -> dict[str, str]:
+    """Return the header dict to attach to a State Hub write."""
+    return {IDEMPOTENCY_HEADER: idempotency_key(*parts)}
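
A short sketch of how the key behaves for a few inputs. The function body is copied from the hunk above; the sample identity values (`"act-1"`, the timestamp, and so on) are made up for illustration.

```python
from __future__ import annotations

IDEMPOTENCY_HEADER = "Idempotency-Key"


def idempotency_key(*parts: str | None) -> str:
    # Join identity parts with ":"; None becomes an empty segment.
    raw = ":".join((p or "") for p in parts)
    # Replace anything outside printable ASCII 0x21-0x7E with "_".
    return "".join(ch if 0x20 < ord(ch) < 0x7F else "_" for ch in raw) or "_"


def idempotency_headers(*parts: str | None) -> dict[str, str]:
    return {IDEMPOTENCY_HEADER: idempotency_key(*parts)}


# Same identity parts always yield the same key, so a retried flush is dedupable.
first = idempotency_headers("schedule_miss", "act-1", "2026-07-02T10:00:00+00:00")
retry = idempotency_headers("schedule_miss", "act-1", "2026-07-02T10:00:00+00:00")
print(first == retry)

# A missing part keeps its (empty) segment; spaces and newlines become "_".
print(idempotency_key("schedule_miss", "act-1", None))
print(idempotency_key("a b", "c\n"))
```

Because the replacement is one character for one character, two parts that differ only in which non-printable character they contain map to the same key. Callers that need to tell such values apart would have to encode them before passing them in.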
--- a/src/activity_core/sync_schedules.py
+++ b/src/activity_core/sync_schedules.py
@@ -15,6 +15,8 @@ import asyncio
 import logging
 import os
 import uuid
+from dataclasses import dataclass
+from typing import Sequence

 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
@@ -30,6 +32,20 @@ TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
 TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")


+@dataclass
+class ScheduleSyncResult:
+    upserted: int = 0
+    paused: int = 0
+    deleted_orphans: int = 0
+
+    def to_dict(self) -> dict[str, int]:
+        return {
+            "upserted": self.upserted,
+            "paused": self.paused,
+            "deleted_orphans": self.deleted_orphans,
+        }
+
+
 def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
    """Convert an ORM row to a domain ActivityDefinition for schedule_manager."""
    return ActivityDefinition.model_validate(
@@ -46,12 +62,82 @@ def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
    )


-async def sync(client: Client, db_url: str) -> None:
+def _valid_schedule_activity_id(defn: ActivityDefinition) -> str:
+    if isinstance(defn.trigger_config, ScheduledTriggerConfig):
+        return f"{defn.id}-once"
+    return str(defn.id)
+
+
+async def _load_schedule_rows(
+    session_factory: async_sessionmaker[AsyncSession],
+) -> Sequence[ActivityDefinitionRow]:
+    async with session_factory() as session:
+        return (
+            await session.scalars(
+                select(ActivityDefinitionRow).where(
+                    ActivityDefinitionRow.trigger_type.in_(["cron", "scheduled"])
+                )
+            )
+        ).all()
+
+
+async def sync_schedule_rows(
+    client: Client,
+    rows: Sequence[ActivityDefinitionRow],
+) -> ScheduleSyncResult:
+    """Reconcile Temporal Schedules against already-loaded definition rows."""
+    valid_schedule_activity_ids: set[str] = set()
+    result = ScheduleSyncResult()
+
+    for row in rows:
+        defn = _row_to_domain(row)
+        if not isinstance(
+            defn.trigger_config,
+            (CronTriggerConfig, ScheduledTriggerConfig),
+        ):
+            continue
+
+        valid_schedule_activity_ids.add(_valid_schedule_activity_id(defn))
+
+        await upsert_schedule(client, defn)
+        if defn.enabled:
+            result.upserted += 1
+            logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
+        else:
+            result.paused += 1
+            logger.info("upserted paused schedule for disabled activity %s", defn.id)
+
+    # Tombstone cleanup: remove Temporal Schedules with no matching DB row.
+    existing_schedules = await list_schedules(client)
+    for entry in existing_schedules:
+        if entry["activity_id"] not in valid_schedule_activity_ids:
+            await delete_schedule(client, entry["activity_id"])
+            result.deleted_orphans += 1
+            logger.info("deleted orphaned schedule %s", entry["schedule_id"])
+
+    logger.info(
+        "sync_schedules complete — upserted=%d paused=%d deleted_orphans=%d",
+        result.upserted,
+        result.paused,
+        result.deleted_orphans,
+    )
+    return result
+
+
+async def sync_with_session_factory(
+    client: Client,
+    session_factory: async_sessionmaker[AsyncSession],
+) -> ScheduleSyncResult:
+    """Reconcile Temporal Schedules using an existing DB session factory."""
+    return await sync_schedule_rows(client, await _load_schedule_rows(session_factory))
+
+
+async def sync(client: Client, db_url: str) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules against the ActivityDefinition table.

    Steps:
-      1. Load all enabled cron ActivityDefinitions from Postgres.
-      2. Upsert a Temporal Schedule for each one.
+      1. Load all cron/scheduled ActivityDefinitions from Postgres.
+      2. Upsert a Temporal Schedule for each one, paused when disabled.
      3. Delete Temporal Schedules whose activity_id has no matching DB row
         (tombstone cleanup for deleted or trigger-type-changed definitions).
    """
@@ -59,55 +145,10 @@ async def sync(client: Client, db_url: str) -> None:
    session_factory = async_sessionmaker(engine, expire_on_commit=False)

    try:
-        async with session_factory() as session:
-            rows = (
-                await session.scalars(
-                    select(ActivityDefinitionRow).where(
-                        ActivityDefinitionRow.trigger_type.in_(["cron", "scheduled"])
-                    )
-                )
-            ).all()
+        return await sync_with_session_factory(client, session_factory)
    finally:
        await engine.dispose()

-    db_activity_ids: set[str] = set()
-    upserted = 0
-    skipped = 0
-
-    for row in rows:
-        defn = _row_to_domain(row)
-        if not isinstance(defn.trigger_config, (CronTriggerConfig, ScheduledTriggerConfig)):
-            continue
-
-        db_activity_ids.add(str(defn.id))
-
-        if defn.enabled:
-            await upsert_schedule(client, defn)
-            upserted += 1
-            logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
-        else:
-            # Disabled definitions: schedule may exist (paused) — leave it;
-            # upsert_schedule already handles the paused state.
-            await upsert_schedule(client, defn)
-            skipped += 1
-            logger.info("upserted paused schedule for disabled activity %s", defn.id)
-
-    # Tombstone cleanup: remove Temporal Schedules with no matching DB row.
-    existing_schedules = await list_schedules(client)
-    deleted = 0
-    for entry in existing_schedules:
-        if entry["activity_id"] not in db_activity_ids:
-            await delete_schedule(client, entry["activity_id"])
-            deleted += 1
-            logger.info("deleted orphaned schedule %s", entry["schedule_id"])
-
-    logger.info(
-        "sync_schedules complete — upserted=%d skipped_disabled=%d deleted_orphans=%d",
-        upserted,
-        skipped,
-        deleted,
-    )
-

 async def main() -> None:
    logging.basicConfig(level=logging.INFO)
@@ -116,7 +157,13 @@ async def main() -> None:
        raise RuntimeError("ACTCORE_DB_URL is required")

    client = await Client.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
-    await sync(client, db_url)
+    result = await sync(client, db_url)
+    print(
+        "Synced schedules: "
+        f"upserted={result.upserted} "
+        f"paused={result.paused} "
+        f"deleted_orphans={result.deleted_orphans}"
+    )


 if __name__ == "__main__":
--- a/src/activity_core/sync_service.py
+++ b/src/activity_core/sync_service.py
@@ -0,0 +1,97 @@
+"""Shared ActivityDefinition/event type/schedule sync orchestration."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from temporalio.client import Client
+
+from activity_core.event_type_registry import sync_event_types
+from activity_core.sync_activity_definitions import sync as sync_activity_definitions
+from activity_core.sync_schedules import ScheduleSyncResult, sync_with_session_factory
+
+_MAX_ERRORS = 20
+_MAX_ERROR_MESSAGE_LENGTH = 1000
+
+
+def _empty_result(
+    *,
+    definitions: bool,
+    schedules: bool,
+    event_types: bool,
+) -> dict[str, Any]:
+    return {
+        "ok": True,
+        "ran": {
+            "definitions": definitions,
+            "schedules": schedules,
+            "event_types": event_types,
+        },
+        "definitions": {"synced": 0},
+        "event_types": {"synced": 0},
+        "schedules": ScheduleSyncResult().to_dict(),
+        "errors": [],
+    }
+
+
+def _record_error(result: dict[str, Any], stage: str, exc: Exception) -> None:
+    errors = result["errors"]
+    if len(errors) >= _MAX_ERRORS:
+        return
+    errors.append(
+        {
+            "stage": stage,
+            "type": type(exc).__name__,
+            "message": str(exc)[:_MAX_ERROR_MESSAGE_LENGTH],
+        }
+    )
+    result["ok"] = False
+
+
+async def run_sync(
+    *,
+    session_factory: Any,
+    temporal_client: Client | None,
+    definitions: bool = True,
+    schedules: bool = True,
+    event_types: bool = False,
+) -> dict[str, Any]:
+    """Run the requested sync stages and return bounded operator-facing status.
+
+    The orchestration deliberately accepts its database and Temporal
+    dependencies as arguments so startup and the API can share the same behavior
+    without creating another global runtime.
+    """
+    result = _empty_result(
+        definitions=definitions,
+        schedules=schedules,
+        event_types=event_types,
+    )
+
+    if definitions:
+        try:
+            result["definitions"]["synced"] = await sync_activity_definitions(
+                session_factory
+            )
+        except Exception as exc:  # pragma: no cover - exercised through tests
+            _record_error(result, "definitions", exc)
+
+    if event_types:
+        try:
+            result["event_types"]["synced"] = await sync_event_types(session_factory)
+        except Exception as exc:  # pragma: no cover - exercised through tests
+            _record_error(result, "event_types", exc)
+
+    if schedules:
+        try:
+            if temporal_client is None:
+                raise RuntimeError("Temporal client is required for schedule sync")
+            schedule_result = await sync_with_session_factory(
+                temporal_client,
+                session_factory,
+            )
+            result["schedules"] = schedule_result.to_dict()
+        except Exception as exc:  # pragma: no cover - exercised through tests
+            _record_error(result, "schedules", exc)
+
+    return result
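
A rough sketch of the stage-isolation pattern `run_sync` uses: each stage runs in its own `try`, a failure is recorded in a bounded error list, and later stages still run. The stage functions and the generic `run_stages` helper are hypothetical stand-ins, not part of this change.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

MAX_ERRORS = 20
MAX_ERROR_MESSAGE_LENGTH = 1000


def record_error(result: dict[str, Any], stage: str, exc: Exception) -> None:
    errors = result["errors"]
    if len(errors) >= MAX_ERRORS:
        return  # cap reached; "ok" is already False from an earlier error
    errors.append(
        {
            "stage": stage,
            "type": type(exc).__name__,
            "message": str(exc)[:MAX_ERROR_MESSAGE_LENGTH],
        }
    )
    result["ok"] = False


async def run_stages(stages: dict[str, Callable[[], Awaitable[int]]]) -> dict[str, Any]:
    result: dict[str, Any] = {"ok": True, "synced": {}, "errors": []}
    for name, stage in stages.items():
        try:
            result["synced"][name] = await stage()
        except Exception as exc:
            # One failing stage must not prevent the others from running.
            record_error(result, name, exc)
    return result


async def definitions_stage() -> int:
    return 3  # pretend three definitions were synced


async def schedules_stage() -> int:
    raise RuntimeError("Temporal client is required for schedule sync")


status = asyncio.run(
    run_stages({"definitions": definitions_stage, "schedules": schedules_stage})
)
print(status)
```

Returning a status dict instead of raising lets worker startup and the admin API share one code path. Each caller then decides whether a non-empty `errors` list is fatal.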
--- a/src/activity_core/worker.py
+++ b/src/activity_core/worker.py
@@ -46,8 +46,7 @@ from activity_core.activities import (
 )
 from activity_core.db import make_engine
 from sqlalchemy.ext.asyncio import async_sessionmaker
-from activity_core.sync_activity_definitions import sync as sync_activity_defs
-from activity_core.sync_schedules import sync as sync_schedules
+from activity_core.sync_service import run_sync
 from activity_core.workflows import RunActivityWorkflow, TaskExecutorWorkflow

 logger = logging.getLogger(__name__)
@@ -77,20 +76,26 @@ async def run() -> None:
        TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE, runtime=runtime
    )

-    # T45: Sync ActivityDefinition files into DB before schedule sync.
-    logger.info("Syncing ActivityDefinition files...")
+    logger.info("Syncing ActivityDefinitions and Temporal Schedules...")
+    sync_engine = make_engine(db_url)
+    session_factory = async_sessionmaker(sync_engine, expire_on_commit=False)
    try:
-        session_factory = async_sessionmaker(make_engine(db_url), expire_on_commit=False)
-        await sync_activity_defs(session_factory)
-    except Exception:
-        logger.exception("activity definition sync failed — continuing worker startup")
-
-    # T23: Sync Temporal Schedules with the DB before workers start accepting tasks.
-    logger.info("Syncing Temporal Schedules with ActivityDefinition DB...")
-    try:
-        await sync_schedules(client, db_url)
-    except Exception:
-        logger.exception("schedule sync failed — continuing worker startup")
+        sync_result = await run_sync(
+            session_factory=session_factory,
+            temporal_client=client,
+            definitions=True,
+            schedules=True,
+            event_types=False,
+        )
+        for error in sync_result["errors"]:
+            logger.error(
+                "startup sync %s failed — %s: %s",
+                error["stage"],
+                error["type"],
+                error["message"],
+            )
+    finally:
+        await sync_engine.dispose()

    orchestrator_worker = Worker(
        client,
--- a/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
+++ b/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
@@ -0,0 +1,5 @@
+{
+  "_note": "PARTIAL 4000-char preview of the 2026-06-26 daily-triage validation failure (retry attempt). Full payload not recoverable from activity-core: complete() drops finish_reason; report sink caps raw at 4000 chars; the JSON break is at char 5268 (beyond this preview). Full response would require llm-connect producer-side logs on railiance01.",
+  "validation_error": "Expecting ',' delimiter: line 136 column 22 (char 5268)",
+  "raw_output_preview": "{\n  \"summary\": \"Triage report focusing on high-priority workstreams with pending human intervention or critical dependencies, and addressing recently cleared dependencies to unblock progress.\",\n  \"recommendations\": [\n    {\n      \"rank\": 1,\n      \"candidate\": \"2731fece-6c49-45b8-ab8a-4ea6c04ac603\",\n      \"action\": \"work-next\",\n      \"why\": \"A critical dependency (T03 - Configure bounded OpenBao token roles and policies) for this workstream has been cleared, unblocking significant progress on credential management. This workstream has 8 todo tasks and no waits, indicating it's ready for immediate action.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 5.0,\n        \"strategic_value\": 5,\n        \"time_criticality\": 5,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 5,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 2,\n      \"candidate\": \"bd086c41-287d-4a4e-8ac5-9ab270f14d72\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (T04 - Provision the runtime API key outside Git) and is currently blocked by 3 'wait' tasks. Human intervention is required to unblock progress.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 3,\n      \"candidate\": \"9b56414a-c71f-4e72-9b2b-d2166aaf50d0\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (Task: Execute Live Ops-Hub Bootstrap) and is currently blocked by a 'wait' task. Human intervention is required to proceed with the bootstrap.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 4,\n      \"candidate\": \"84e17675-0d15-4268-a8bd-540124d37018\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has 4 'needs_human' tasks, including 'T02 \u2014 Resolve Forgejo production design decisions', indicating significant human input is required to move forward with the migration.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.0,\n        \"strategic_value\": 4,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 5,\n      \"candidate\": \"5646e13a-13af-4724-bca6-3c0d86f96733\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has a 'needs_human' task ('Three-Run Calibration Feedback') and is currently in a 'wait' state. Human feedback is crucial for operational hardening.\",\n      \"confidence\": \"medium\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 6,\n      \"candidate\": \"896ace77-21b3-450b-8fb7-254aefc8c570\",\n      \"action\": \"close-out\",\n      \"why\": \"The task 'Wire activity-core to the live service' has been resolved, and the workstream shows 2 progress tasks with 0 todo/wait tasks. This indicates the deployment is likely complete or nearing completion and ready for close-out after verification.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 7,\n      \"candidate\": \"656e435d-3a00-4f5e-a38e-114467f9062e\",\n      \"action\": \"work-next\",\n      \"why\": \"This high-priority workstream has a single 'wait' task ('Task: Activate Ops-Hub Widgets In Inter-Hub') and no 'needs_human' tasks. It appears ready for the next step to activate the widgets.\",\n      \"confidence\": \"medium\",\n      \"wsjf"
+}
--- a/tests/rules/test_actions.py
+++ b/tests/rules/test_actions.py
@@ -88,6 +88,43 @@ def test_for_each_binds_each_list_item_before_condition_and_action_rendering() -
    ]


+def test_for_each_can_gate_registry_hygiene_gaps_on_signal() -> None:
+    rules = [
+        {
+            "id": "flag-registry-hygiene-gap",
+            "for_each": "context.gaps",
+            "bind_as": "g",
+            "condition": 'context.g.hygiene_signal != ""',
+            "action": {
+                "task_template": "Close registry hygiene gap for {context.g.repo}",
+                "target_repo": "context.g.repo",
+                "priority": "medium",
+                "labels": ["registry-hygiene", "{context.g.hygiene_signal}"],
+            },
+        }
+    ]
+    context = {
+        "gaps": [
+            {
+                "repo": "reuse-surface",
+                "hygiene_signal": "empty_capability_scaffold",
+            },
+            {
+                "repo": "activity-core",
+                "hygiene_signal": "",
+            },
+        ]
+    }
+
+    specs = expand_rule_actions(rules, _Event(), context)
+
+    assert [spec["target_repo"] for spec in specs] == ["reuse-surface"]
+    assert specs[0]["labels"] == [
+        "registry-hygiene",
+        "empty_capability_scaffold",
+    ]
+
+
 def test_for_each_rejects_non_path_expression() -> None:
    rules = [
        {
--- a/tests/rules/test_executor.py
+++ b/tests/rules/test_executor.py
@@ -12,6 +12,7 @@ Covers:
 from __future__ import annotations

 import json
+from pathlib import Path
 from types import SimpleNamespace
 from typing import Any

@@ -333,7 +334,14 @@ def test_execute_instruction_forwards_output_schema_to_llm_connect(tmp_path, mon
 def test_execute_instruction_with_audit_accepts_report_payload():
    report_data = {
        "summary": "State Hub has loose ends.",
-        "recommendations": [{"action": "revisit", "candidate": "CUST-WP-0045"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "action": "revisit",
+                "candidate": "CUST-WP-0045",
+                "why": "Loose ends need attention.",
+            }
+        ],
    }
    llm = _CountingLLM([json.dumps(report_data)])
    instr = _instr(
@@ -353,7 +361,14 @@ def test_execute_instruction_with_audit_accepts_report_payload():
 def test_execute_instruction_with_audit_accepts_fenced_report_payload():
    report_data = {
        "summary": "State Hub has loose ends.",
-        "recommendations": [{"action": "revisit", "candidate": "CUST-WP-0045"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "action": "revisit",
+                "candidate": "CUST-WP-0045",
+                "why": "Loose ends need attention.",
+            }
+        ],
    }
    llm = _CountingLLM([f"```json\n{json.dumps(report_data)}\n```"])
    instr = _instr(
@@ -389,6 +404,216 @@ def test_execute_instruction_with_audit_rejects_invalid_report_schema():
    assert llm.call_count == 2


+# ── WP-0016-T03 resilient report recovery ─────────────────────────────────────
+
+def _valid_rec(rank: int) -> dict[str, Any]:
+    return {
+        "rank": rank,
+        "candidate": f"WS-{rank}",
+        "action": "work-next",
+        "why": f"reason {rank}",
+        "wsjf": {"score": 5.0},
+    }
+
+
+def _pretty_triage_with_truncated_tail(num_valid: int) -> str:
+    body = ",\n".join("    " + json.dumps(_valid_rec(i)) for i in range(1, num_valid + 1))
+    # Trailing object is cut off mid-string — the whole document is invalid JSON,
+    # reproducing the 2026-06-26 failure shape (valid prefix, broken tail).
+    return (
+        '{\n  "summary": "Daily triage.",\n  "recommendations": [\n'
+        + body
+        + ',\n    {\n      "rank": '
+        + str(num_valid + 1)
+        + ',\n      "candidate": "WS-X",\n      "action": "work-'
+    )
+
+
+def test_resilient_report_recovers_valid_prefix_and_quarantines_truncated_tail():
+    raw = _pretty_triage_with_truncated_tail(7)
+    llm = _CountingLLM([raw, raw])
+    instr = _instr(
+        id="daily-triage-report",
+        prompt="Report.",
+        trusted_fields=[],
+        output_schema="schemas/daily-triage-report.json",
+        report_sinks=[{"type": "working-memory"}],
+    )
+
+    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
+
+    assert result.output_validated is True
+    assert result.review_required is True
+    assert result.report is not None
+    assert result.report["partial"] is True
+    assert len(result.report["recommendations"]) == 7
+    assert result.report["summary"] == "Daily triage."
+    assert result.report["quarantined_count"] >= 1
+    # The broken tail is dropped — either as an unparseable/truncated span or,
+    # if _try_repair salvages its structure, as a schema-invalid item. Either way
+    # it carries a diagnostic error and never pollutes the surviving report.
+    assert result.report["quarantined_items"][0]["error"]
+
+
+def test_resilient_report_quarantines_one_bad_item_among_valid():
+    recs = [_valid_rec(1), {"candidate": "WS-2", "action": "x", "why": "no rank"}, _valid_rec(3)]
+    raw = json.dumps({"summary": "Triage.", "recommendations": recs})
+    llm = _CountingLLM([raw, raw])
+    instr = _instr(
+        id="daily-triage-report",
+        prompt="Report.",
+        trusted_fields=[],
+        output_schema="schemas/daily-triage-report.json",
+        report_sinks=[{"type": "working-memory"}],
+    )
+
+    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
+
+    assert result.output_validated is True
+    assert result.report["partial"] is True
+    assert len(result.report["recommendations"]) == 2
+    assert result.report["quarantined_count"] == 1
+    assert "rank" in result.report["quarantined_items"][0]["error"]
+
+
+# ── WP-0016-T04 producer guardrails ───────────────────────────────────────────
+
+def _triage_instr() -> SimpleNamespace:
+    return _instr(
+        id="daily-triage-report",
+        prompt="Report.",
+        trusted_fields=[],
+        output_schema="schemas/daily-triage-report.json",
+        report_sinks=[{"type": "working-memory"}],
+    )
+
+
+def test_guardrail_count_cap_on_valid_happy_path():
+    # 9 fully-valid recommendations in a syntactically valid document: schema
+    # validation passes, but the maxItems=7 count cap must keep 7 and quarantine 2.
+    recs = [_valid_rec(i) for i in range(1, 10)]
+    raw = json.dumps({"summary": "Triage.", "recommendations": recs})
+    llm = _CountingLLM([raw])
+
+    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
+
+    assert llm.call_count == 1  # no retry — the document was valid
+    assert result.report["partial"] is True
+    assert len(result.report["recommendations"]) == 7
+    assert result.report["quarantined_count"] == 2
+    assert all(q["reason"] == "over_limit" for q in result.report["quarantined_items"])
+
+
+def test_guardrail_oversized_string_quarantined():
+    big = _valid_rec(2)
+    big["why"] = "x" * 5000  # exceeds _MAX_STRING_LEN
+    raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), big]})
+    llm = _CountingLLM([raw])
+
+    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
+
+    assert len(result.report["recommendations"]) == 1
+    assert result.report["quarantined_count"] == 1
+    assert result.report["quarantined_items"][0]["reason"] == "guardrail"
+
+
+def test_guardrail_allow_list_rejects_unknown_candidate():
+    raw = json.dumps({
+        "summary": "Triage.",
+        "recommendations": [_valid_rec(1), _valid_rec(2)],  # candidates WS-1, WS-2
+    })
+    llm = _CountingLLM([raw])
+    context = {"known_candidates": ["WS-1"]}
+
+    result = execute_instruction_with_audit(_triage_instr(), _Event(), context, llm)
+
+    assert len(result.report["recommendations"]) == 1
+    assert result.report["recommendations"][0]["candidate"] == "WS-1"
+    assert result.report["quarantined_items"][0]["reason"] == "allow_list"
+
+
+def _nested(depth: int) -> dict[str, Any]:
+    node: dict[str, Any] = {"leaf": 1}
+    for _ in range(depth):
+        node = {"a": node}
+    return node
+
+
+def test_guardrail_over_depth_quarantined():
+    deep = _valid_rec(2)
+    deep["extra"] = _nested(12)  # well past _MAX_DEPTH
+    raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), deep]})
+    llm = _CountingLLM([raw])
+
+    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
+
+    assert len(result.report["recommendations"]) == 1
+    assert result.report["quarantined_count"] == 1
+    assert result.report["quarantined_items"][0]["reason"] == "guardrail"
+    assert "depth" in result.report["quarantined_items"][0]["error"]
+
+
+def test_resilient_recovery_against_real_2026_06_26_fixture():
+    # The actual captured failure payload (4000-char preview, truncated at the 7th
+    # recommendation) — the run that reset the WP-0006-T03 streak. Before WP-0016
+    # this discarded the whole report; now it must recover the valid prefix.
+    fixture = json.loads(
+        (Path(__file__).resolve().parents[1] / "fixtures" / "wp0016"
+         / "daily_triage_2026-06-26_validation_failure.partial.json").read_text(encoding="utf-8")
+    )
+    raw = fixture["raw_output_preview"]
+    llm = _CountingLLM([raw, raw])
+
+    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
+
+    assert result.output_validated is True
+    assert result.report["partial"] is True
+    # Six recommendations are fully intact before the truncation point.
+    assert len(result.report["recommendations"]) >= 6
+    assert all("rank" in rec and "candidate" in rec for rec in result.report["recommendations"])
+
+
+
+class _MetadataBadLLM:
+    def __init__(self) -> None:
+        self.call_count = 0
+        self.last_response_metadata: dict[str, Any] | None = None
+
+    def complete(
+        self,
+        prompt: str,
+        model: str = "",
+        config: dict | None = None,
+    ) -> str:
+        self.call_count += 1
+        self.last_response_metadata = {
+            "finish_reason": "length",
+            "usage": {"input_tokens": 1100, "output_tokens": 1200},
+        }
+        return ("x" * 9000) + "{"
+
+
+def test_invalid_report_preserves_response_metadata_and_long_preview():
+    llm = _MetadataBadLLM()
+    instr = _instr(
+        id="daily-triage-report",
+        prompt="Report.",
+        trusted_fields=[],
+        report_sinks=[{"type": "working-memory", "path": "/tmp"}],
+    )
+
+    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
+
+    assert llm.call_count == 2
+    assert result.output_validated is False
+    assert result.llm_response_metadata == {
+        "finish_reason": "length",
+        "usage": {"input_tokens": 1100, "output_tokens": 1200},
+    }
+    assert result.report["llm_response_metadata"] == result.llm_response_metadata
+    assert len(result.report["raw_output_preview"]) > 4000
+
+
 def test_execute_instruction_with_audit_preserves_invalid_report_with_sinks(
    tmp_path,
    monkeypatch,
--- a/tests/test_admin_sync_api.py
+++ b/tests/test_admin_sync_api.py
@@ -0,0 +1,114 @@
+from __future__ import annotations
+
+from typing import Any
+
+import pytest
+
+from activity_core import api
+
+
+@pytest.mark.asyncio
+async def test_admin_sync_definitions_only_does_not_require_temporal(
+    monkeypatch,
+) -> None:
+    seen: dict[str, Any] = {}
+
+    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
+        seen.update(kwargs)
+        return {"ok": True, "ran": {"definitions": True}}
+
+    monkeypatch.setattr(api, "_session_factory", object())
+    monkeypatch.setattr(api, "_temporal_client", None)
+    monkeypatch.setattr(api, "run_sync", fake_run_sync)
+
+    result = await api.admin_sync(
+        definitions=True,
+        schedules=False,
+        event_types=False,
+    )
+
+    assert result == {"ok": True, "ran": {"definitions": True}}
+    assert seen["session_factory"] is api._session_factory
+    assert seen["temporal_client"] is None
+    assert seen["definitions"] is True
+    assert seen["schedules"] is False
+    assert seen["event_types"] is False
+
+
+@pytest.mark.asyncio
+async def test_admin_sync_schedules_only_passes_temporal(monkeypatch) -> None:
+    temporal = object()
+    seen: dict[str, Any] = {}
+
+    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
+        seen.update(kwargs)
+        return {
+            "ok": True,
+            "schedules": {
+                "upserted": 1,
+                "paused": 0,
+                "deleted_orphans": 0,
+            },
+        }
+
+    monkeypatch.setattr(api, "_session_factory", object())
+    monkeypatch.setattr(api, "_temporal_client", temporal)
+    monkeypatch.setattr(api, "run_sync", fake_run_sync)
+
+    result = await api.admin_sync(
+        definitions=False,
+        schedules=True,
+        event_types=False,
+    )
+
+    assert result["schedules"]["upserted"] == 1
+    assert seen["temporal_client"] is temporal
+    assert seen["definitions"] is False
+    assert seen["schedules"] is True
+    assert seen["event_types"] is False
+
+
+@pytest.mark.asyncio
+async def test_admin_sync_all_sync_returns_failure_result(monkeypatch) -> None:
+    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
+        return {
+            "ok": False,
+            "ran": {
+                "definitions": kwargs["definitions"],
+                "schedules": kwargs["schedules"],
+                "event_types": kwargs["event_types"],
+            },
+            "errors": [
+                {
+                    "stage": "event_types",
+                    "type": "RuntimeError",
+                    "message": "bad event type",
+                }
+            ],
+        }
+
+    monkeypatch.setattr(api, "_session_factory", object())
+    monkeypatch.setattr(api, "_temporal_client", object())
+    monkeypatch.setattr(api, "run_sync", fake_run_sync)
+
+    result = await api.admin_sync(
+        definitions=True,
+        schedules=True,
+        event_types=True,
+    )
+
+    assert result == {
+        "ok": False,
+        "ran": {
+            "definitions": True,
+            "schedules": True,
+            "event_types": True,
+        },
+        "errors": [
+            {
+                "stage": "event_types",
+                "type": "RuntimeError",
+                "message": "bad event type",
+            }
+        ],
+    }
--- a/tests/test_automation_status.py
+++ b/tests/test_automation_status.py
@@ -0,0 +1,289 @@
+from __future__ import annotations
+
+import asyncio
+import json
+from datetime import datetime
+from pathlib import Path
+from zoneinfo import ZoneInfo
+
+from activity_core import automation_status as status
+
+ACTIVITY_ID = "00000000-0000-0000-0000-000000000123"
+
+
+def _window():
+    return status.resolve_window(
+        "2026-06-26",
+        "2026-06-29",
+        "Europe/Berlin",
+    )
+
+
+def _definition(enabled: bool = True):
+    return {
+        "id": ACTIVITY_ID,
+        "name": "Daily Check",
+        "enabled": enabled,
+        "trigger_type": "cron",
+        "trigger_config": {
+            "trigger_type": "cron",
+            "cron_expression": "0 9 * * *",
+            "timezone": "Europe/Berlin",
+            "misfire_policy": "skip",
+        },
+        "source": "test",
+    }
+
+
+def test_friday_shortcut_resolves_to_previous_friday_start() -> None:
+    now = datetime(2026, 6, 29, 12, 0, tzinfo=ZoneInfo("Europe/Berlin"))
+
+    window = status.resolve_window("friday", None, "Europe/Berlin", now=now)
+
+    assert window["since"].isoformat() == "2026-06-26T00:00:00+02:00"
+    assert window["until"].isoformat() == "2026-06-29T12:00:00+02:00"
+
+
+def test_expected_fires_for_simple_cron_window() -> None:
+    fires = status.expected_fires(_definition(), _window())
+
+    assert fires == [
+        "2026-06-26T09:00:00+02:00",
+        "2026-06-27T09:00:00+02:00",
+        "2026-06-28T09:00:00+02:00",
+        "2026-06-29T09:00:00+02:00",
+    ]
+
+
+def test_completed_when_expected_run_exists() -> None:
+    run = {
+        "run_id": "run-1",
+        "activity_id": ACTIVITY_ID,
+        "scheduled_for": "2026-06-26T07:00:00+00:00",
+        "fired_at": "2026-06-26T07:00:10+00:00",
+        "tasks_spawned": 1,
+    }
+
+    report = status.classify_activity(
+        _definition(),
+        _window(),
+        [run],
+        [{"source": "state_hub_progress", "run_id": "run-1", "output_validated": True}],
+        None,
+        ["2026-06-26T09:00:00+02:00"],
+        runs_available=True,
+    )
+
+    assert report["status"] == "completed"
+
+
+def test_validation_failure_wins_over_completed_run() -> None:
+    run = {"run_id": "run-1", "activity_id": ACTIVITY_ID, "scheduled_for": None, "fired_at": "2026-06-26T07:00:10+00:00"}
+
+    report = status.classify_activity(
+        _definition(),
+        _window(),
+        [run],
+        [{"source": "working_memory", "run_id": "run-1", "output_validated": False}],
+        None,
+        ["2026-06-26T09:00:00+02:00"],
+        runs_available=True,
+    )
+
+    assert report["status"] == "validation_failed"
+
+
+def test_missed_when_expected_fire_has_no_run_and_runs_available() -> None:
+    report = status.classify_activity(
+        _definition(),
+        _window(),
+        [],
+        [],
+        None,
+        ["2026-06-26T09:00:00+02:00"],
+        runs_available=True,
+    )
+
+    assert report["status"] == "missed"
+
+
+def test_disabled_schedule_is_not_counted_as_missed() -> None:
+    report = status.classify_activity(
+        _definition(enabled=False),
+        _window(),
+        [],
+        [],
+        None,
+        ["2026-06-26T09:00:00+02:00"],
+        runs_available=True,
+    )
+
+    assert report["status"] == "disabled"
+
+
+def test_scheduled_definition_reports_one_shot_schedule_id() -> None:
+    definition = {
+        "id": ACTIVITY_ID,
+        "name": "One Shot",
+        "enabled": True,
+        "trigger_type": "scheduled",
+        "trigger_config": {
+            "trigger_type": "scheduled",
+            "at": "2026-06-26T09:00:00+02:00",
+            "timezone": "Europe/Berlin",
+        },
+        "source": "test",
+    }
+
+    report = status.classify_activity(
+        definition,
+        _window(),
+        [],
+        [],
+        None,
+        ["2026-06-26T09:00:00+02:00"],
+        runs_available=False,
+    )
+
+    assert status.automation_schedule_id(_definition()) == f"activity-schedule-{ACTIVITY_ID}"
+    assert report["schedule_id"] == f"activity-schedule-{ACTIVITY_ID}-once"
+
+
+def test_partial_source_availability_is_unknown_not_missed() -> None:
+    report = status.classify_activity(
+        _definition(),
+        _window(),
+        [],
+        [],
+        None,
+        ["2026-06-26T09:00:00+02:00"],
+        runs_available=False,
+    )
+
+    assert report["status"] == "unknown"
+    assert "missed-run verdict is unknown" in report["warnings"][0]
+
+
+def test_working_memory_frontmatter_evidence(tmp_path: Path) -> None:
+    note = tmp_path / "daily-triage-2026-06-26-run.md"
+    note.write_text(
+        "---\n"
+        "source: activity-core\n"
+        f"activity_id: {ACTIVITY_ID}\n"
+        "activity_core_run_id: run-1\n"
+        "scheduled_for: 2026-06-26T07:00:00+00:00\n"
+        "output_validated: false\n"
+        "created: 2026-06-26T07:01:00+00:00\n"
+        "---\n"
+        "body\n",
+        encoding="utf-8",
+    )
+
+    evidence, source = status.load_working_memory_evidence(str(tmp_path), _window())
+
+    assert source["status"] == "ok"
+    assert evidence[0]["run_id"] == "run-1"
+    assert evidence[0]["output_validated"] is False
+
+
+def _scheduled_definition(enabled: bool = False):
+    return {
+        "id": "00000000-0000-0000-0000-000000000456",
+        "name": "One Shot",
+        "enabled": enabled,
+        "trigger_type": "scheduled",
+        "trigger_config": {
+            "trigger_type": "scheduled",
+            "at": "2026-06-26T09:00:00+02:00",
+            "timezone": "Europe/Berlin",
+        },
+        "source": "db",
+    }
+
+
+def test_inventory_report_uses_db_definition_rows(monkeypatch) -> None:
+    async def fake_load_definitions(args, warnings):
+        return [dict(_definition(), source="db"), _scheduled_definition()], {"status": "ok", "source": "db"}
+
+    async def fake_temporal(host, namespace, definitions, *, timeout_seconds):
+        return {
+            ACTIVITY_ID: {
+                "schedule_id": f"activity-schedule-{ACTIVITY_ID}",
+                "available": True,
+                "paused": False,
+                "missed_catchup_window": 0,
+                "last_fired_at": None,
+            },
+        }, {"status": "ok", "count": 1}
+
+    monkeypatch.setattr(status, "load_definitions", fake_load_definitions)
+    monkeypatch.setattr(status, "load_temporal_visibility", fake_temporal)
+    args = status.parse_inventory_args(["--format", "json"])
+
+    report, exit_code = asyncio.run(status.build_inventory_report(args))
+
+    assert exit_code == 0
+    assert report["sources"]["definitions"] == {"status": "ok", "source": "db"}
+    assert report["summary"]["automation_count"] == 2
+    assert report["automations"][0]["definition_source"] == "db"
+    assert report["automations"][0]["temporal"]["status"] == "active"
+    assert report["automations"][1]["schedule_id"].endswith("-once")
+
+
+def test_inventory_file_fallback_when_db_url_missing(monkeypatch) -> None:
+    monkeypatch.setattr(status, "file_definitions", lambda: [dict(_definition(), source="files")])
+    args = status.parse_inventory_args(["--db-url", "", "--temporal-host", ""])
+
+    report, exit_code = asyncio.run(status.build_inventory_report(args))
+
+    assert exit_code == 0
+    assert report["sources"]["definitions"]["status"] == "degraded"
+    assert report["automations"][0]["definition_source"] == "files"
+    assert "ACTCORE_DB_URL is not set" in report["warnings"][0]
+
+
+def test_inventory_filters_disabled_definitions() -> None:
+    definitions = [_definition(enabled=True), _scheduled_definition(enabled=False)]
+
+    filtered = status.filter_inventory_definitions(
+        definitions,
+        ids=[],
+        names=[],
+        enabled=False,
+        trigger_types=set(),
+    )
+
+    assert [item["name"] for item in filtered] == ["One Shot"]
+
+
+def test_inventory_temporal_unavailable_is_warning_not_failure(monkeypatch) -> None:
+    async def fake_load_definitions(args, warnings):
+        return [_definition()], {"status": "ok", "source": "db"}
+
+    async def fake_temporal(host, namespace, definitions, *, timeout_seconds):
+        return {}, {"status": "unavailable", "warning": "Temporal unavailable: nope"}
+
+    monkeypatch.setattr(status, "load_definitions", fake_load_definitions)
+    monkeypatch.setattr(status, "load_temporal_visibility", fake_temporal)
+    args = status.parse_inventory_args([])
+
+    report, exit_code = asyncio.run(status.build_inventory_report(args))
+
+    assert exit_code == 0
+    assert report["automations"][0]["temporal"]["status"] == "not_checked"
+    assert report["warnings"] == ["Temporal unavailable: nope"]
+
+
+def test_inventory_cli_emits_json(monkeypatch, capsys) -> None:
+    monkeypatch.setattr(status, "file_definitions", lambda: [dict(_definition(), source="files")])
+
+    exit_code = asyncio.run(status.async_inventory_main([
+        "--db-url", "",
+        "--temporal-host", "",
+        "--format", "json",
+    ]))
+
+    payload = json.loads(capsys.readouterr().out)
+    assert exit_code == 0
+    assert payload["mode"] == "automation-inventory"
+    assert payload["automations"][0]["name"] == "Daily Check"
--- a/tests/test_instruction_evaluation.py
+++ b/tests/test_instruction_evaluation.py
@@ -1,6 +1,7 @@
 from __future__ import annotations

 import json
+from pathlib import Path

 import pytest

@@ -70,7 +71,14 @@ async def test_evaluate_instructions_returns_task_specs_with_audit(monkeypatch)
 async def test_evaluate_instructions_returns_report_payload(monkeypatch) -> None:
    llm = FakeLLMClient(json.dumps({
        "summary": "State Hub has open loose ends.",
-        "recommendations": [{"candidate": "CUST-WP-0045", "action": "work-next"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "candidate": "CUST-WP-0045",
+                "action": "work-next",
+                "why": "Open loose ends.",
+            }
+        ],
    }))
    monkeypatch.setattr(activities, "get_llm_client", lambda: llm)

@@ -209,6 +217,12 @@ async def test_evaluate_instructions_forwards_llm_connect_depth_config(monkeypat
        "context": {},
    })

+    # Read the live schema file rather than hard-coding it, so the forwarded
+    # json_schema assertion tracks schemas/daily-triage-report.json as the
+    # contract evolves (ACTIVITY-WP-0016-T02).
+    expected_schema = json.loads(
+        Path("schemas/daily-triage-report.json").read_text(encoding="utf-8")
+    )
    assert llm.calls[0][2] == {
        "model_name": "custodian-triage-balanced",
        "temperature": 0.2,
@@ -216,16 +230,6 @@ async def test_evaluate_instructions_forwards_llm_connect_depth_config(monkeypat
        "max_depth": 2,
        "model_params": {
            "reasoning_effort": "medium",
-            "json_schema": {
-                "type": "object",
-                "required": ["summary", "recommendations"],
-                "properties": {
-                    "summary": {"type": "string"},
-                    "recommendations": {
-                        "type": "array",
-                        "items": {"type": "object"},
-                    },
-                },
-            },
+            "json_schema": expected_schema,
        },
    }
--- a/tests/test_issue_sink.py
+++ b/tests/test_issue_sink.py
@@ -34,7 +34,7 @@ def test_issue_core_rest_sink_posts_task_contract(monkeypatch) -> None:

    monkeypatch.setattr(httpx, "post", fake_post)

-    ref = IssueCoreRestSink("http://issue-core.test/").emit(TaskSpec(
+    ref = IssueCoreRestSink("http://issue-core.test/", api_key="test-key").emit(TaskSpec(
        title="Run SBOM rescan for activity-core",
        description="SBOM is older than 30 days.",
        target_repo="activity-core",
@@ -67,12 +67,30 @@ def test_issue_core_rest_sink_posts_task_contract(monkeypatch) -> None:
                "triggering_event_id": "scheduled",
                "activity_definition_id": "activity-1",
            },
+            "headers": {"Authorization": "Bearer test-key"},
            "timeout": 10.0,
        }
    ]
    assert "review_required" not in posts[0]["json"]


+def test_issue_core_rest_sink_requires_api_key() -> None:
+    sink = IssueCoreRestSink("http://issue-core.test/", api_key="")
+    with pytest.raises(RuntimeError, match="ISSUE_CORE_API_KEY"):
+        sink.emit(TaskSpec(
+            title="t",
+            description="",
+            target_repo="activity-core",
+            priority="low",
+            labels=[],
+            due_in_days=None,
+            source_type="rule",
+            source_id="r",
+            triggering_event_id="e",
+            activity_definition_id="a",
+        ))
+
+
 @pytest.mark.asyncio
 async def test_emit_tasks_raises_when_sink_fails(monkeypatch) -> None:
    class FailingSink:
--- a/tests/test_llm_client.py
+++ b/tests/test_llm_client.py
@@ -13,7 +13,12 @@ def test_llm_connect_client_forwards_run_config(monkeypatch) -> None:
            pass

        def json(self) -> dict:
-            return {"content": '{"summary":"ok","recommendations":[]}'}
+            return {
+                "content": '{"summary":"ok","recommendations":[]}',
+                "finish_reason": "stop",
+                "usage": {"input_tokens": 10, "output_tokens": 20},
+                "raw_response": {"provider_blob": "not persisted"},
+            }

    def fake_post(url: str, json: dict, timeout: float) -> Response:
        captured["url"] = url
@@ -50,3 +55,7 @@ def test_llm_connect_client_forwards_run_config(monkeypatch) -> None:
            "timeout_seconds": 42,
        },
    }
+    assert client.last_response_metadata == {
+        "finish_reason": "stop",
+        "usage": {"input_tokens": 10, "output_tokens": 20},
+    }
--- a/tests/test_ops_evidence_sinks.py
+++ b/tests/test_ops_evidence_sinks.py
@@ -166,6 +166,93 @@ def test_state_hub_progress_sink_is_idempotent(monkeypatch) -> None:
    assert result[0]["idempotency_key"] == idempotency_key


+def test_core_hub_interaction_event_sink_posts_and_verifies_compact_event(monkeypatch) -> None:
+    posts: list[dict[str, Any]] = []
+
+    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
+        assert url == "http://core-hub.test/api/v2/interaction-events"
+        assert kwargs["headers"]["Authorization"] == "Bearer runtime-secret"
+        posts.append({"url": url, **kwargs})
+        return DummyResponse(
+            {
+                "id": "event-1",
+                "eventType": "ops-endpoint-verified",
+                "widgetId": "widget-1",
+            }
+        )
+
+    def fake_get(url: str, **kwargs: Any) -> DummyResponse:
+        assert url == "http://core-hub.test/api/v2/interaction-events"
+        assert kwargs["headers"]["Authorization"] == "Bearer runtime-secret"
+        return DummyResponse({"data": [{"id": "event-1"}]})
+
+    monkeypatch.setenv("CORE_HUB_RUNTIME_TOKEN", "runtime-secret")
+    monkeypatch.setattr(httpx, "post", fake_post)
+    monkeypatch.setattr(httpx, "get", fake_get)
+
+    result = persist_ops_inventory_evidence(
+        _payload([
+            {
+                "type": "core-hub-interaction-event",
+                "core_hub_url": "http://core-hub.test",
+                "widget_id": "widget-1",
+                "event_type": "ops-endpoint-verified",
+            }
+        ])
+    )
+
+    assert result == [
+        {
+            "type": "core-hub-interaction-event",
+            "status": "posted",
+            "event_type": "ops-endpoint-verified",
+            "event_id": "event-1",
+            "widget_id": "widget-1",
+            "verified": True,
+            "context_key": "ops_probe",
+        }
+    ]
+    body = posts[0]["json"]
+    assert body["widgetId"] == "widget-1"
+    assert body["eventType"] == "ops-endpoint-verified"
+    assert body["metadata"]["activity_core_run_id"] == _run_id()
+    assert body["metadata"]["endpoint"]["url"] == "http://state-hub.test/health"
+    assert body["metadata"]["endpoint"]["widget_ref"] == "ops:endpoint:state-hub-health"
+
+    serialized = json.dumps(body, sort_keys=True)
+    assert "runtime-secret" not in serialized
+    assert "secret response body" not in serialized
+    assert "Authorization" not in serialized
+    assert "user:pass" not in serialized
+    assert "token=secret" not in serialized
+
+
+def test_core_hub_sink_skips_cleanly_when_config_missing(monkeypatch) -> None:
+    monkeypatch.delenv("CORE_HUB_BASE_URL", raising=False)
+    monkeypatch.delenv("CORE_HUB_RUNTIME_TOKEN", raising=False)
+    monkeypatch.delenv("CORE_HUB_RUNTIME_TOKEN_FILE", raising=False)
+    monkeypatch.delenv("CORE_HUB_WIDGET_ID", raising=False)
+    monkeypatch.delenv("CORE_HUB_WIDGET_MAPPING", raising=False)
+
+    result = persist_ops_inventory_evidence(
+        _payload([{"type": "core-hub-interaction-event"}])
+    )
+
+    assert result == [
+        {
+            "type": "core-hub-interaction-event",
+            "status": "skipped",
+            "reason": "missing_core_hub_config",
+            "missing": [
+                "CORE_HUB_BASE_URL",
+                "CORE_HUB_RUNTIME_TOKEN or CORE_HUB_RUNTIME_TOKEN_FILE",
+                "widget_id or CORE_HUB_WIDGET_ID",
+            ],
+            "context_key": "ops_probe",
+        }
+    ]
+
+
 def test_inter_hub_sink_skips_cleanly_when_config_missing(monkeypatch) -> None:
    monkeypatch.delenv("INTER_HUB_URL", raising=False)
    monkeypatch.delenv("OPS_HUB_KEY", raising=False)
--- a/tests/test_railiance_ops_inventory_wiring.py
+++ b/tests/test_railiance_ops_inventory_wiring.py
@@ -93,12 +93,21 @@ def test_external_configmap_projects_enabled_daily_wsjf_definition(tmp_path) ->
    assert definition.trigger_config["cron_expression"] == "20 7 * * *"
    assert definition.trigger_config["timezone"] == "Europe/Berlin"
    assert instruction["id"] == "daily-triage-report"
+    assert instruction["max_tokens"] == 1800
+    assert "most 7 recommendations" in instruction["prompt"]
+    assert "fewer well-formed" in instruction["prompt"]
    assert instruction["output_schema"] == (
        "/etc/activity-core/schemas/daily-triage-report.json"
    )
    assert instruction["report_sinks"][0]["type"] == "working-memory"
    assert instruction["report_sinks"][1]["event_type"] == "daily_triage"

+    schema = _by_kind_name("ConfigMap", "actcore-report-schemas")
+    daily_schema = yaml.safe_load(schema["data"]["daily-triage-report.json"])
+    recommendations = daily_schema["properties"]["recommendations"]
+    assert recommendations["maxItems"] == 7
+    assert recommendations["items"]["properties"]["rank"]["maximum"] == 7
+

 def test_ops_inventory_configmap_contains_probeable_inventory() -> None:
    config = _by_kind_name("ConfigMap", "actcore-ops-service-inventory")
--- a/tests/test_report_sinks.py
+++ b/tests/test_report_sinks.py
@@ -37,6 +37,10 @@ def _payload(sinks: list[dict[str, Any]]) -> dict[str, Any]:
                "output_validated": True,
                "review_required": False,
                "validation_error": None,
+                "llm_response_metadata": {
+                    "finish_reason": "stop",
+                    "usage": {"output_tokens": 50},
+                },
            }
        ],
    }
@@ -62,6 +66,8 @@ def test_working_memory_sink_writes_idempotently(tmp_path) -> None:
    assert "output_validated: true" in text
    assert "review_required: false" in text
    assert "model: test-model" in text
+    assert "LLM response metadata:" in text
+    assert '"finish_reason": "stop"' in text
    assert "State Hub has loose ends." in text


@@ -113,6 +119,10 @@ def test_state_hub_progress_sink_posts(monkeypatch) -> None:
    assert posts[0]["json"]["detail"]["activity_core_run_id"] == payload_run_id()
    assert posts[0]["json"]["detail"]["output_validated"] is True
    assert posts[0]["json"]["detail"]["review_required"] is False
+    assert posts[0]["json"]["detail"]["llm_response_metadata"] == {
+        "finish_reason": "stop",
+        "usage": {"output_tokens": 50},
+    }


 def test_state_hub_progress_includes_prior_working_memory_path(
--- a/tests/test_reuse_surface_context_resolver.py
+++ b/tests/test_reuse_surface_context_resolver.py
@@ -0,0 +1,167 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any
+
+import pytest
+from temporalio.exceptions import ApplicationError
+
+from activity_core.activities import resolve_context
+from activity_core.context_resolvers import reuse_surface
+from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY
+
+
+class _Response:
+    def __init__(self, payload: Any) -> None:
+        self._payload = payload
+
+    def raise_for_status(self) -> None:
+        return None
+
+    def json(self) -> Any:
+        return self._payload
+
+
+class _Completed:
+    returncode = 0
+    stderr = ""
+
+    def __init__(self, payload: dict[str, Any]) -> None:
+        self.stdout = json.dumps(payload)
+
+
+def _write_rollout(path: Path) -> None:
+    path.write_text(
+        """
+domains:
+  reuse:
+    phase: active
+    repos:
+      - reuse-surface
+      - activity-core
+  parked:
+    phase: backlog
+    repos:
+      - ignored-repo
+""".lstrip(),
+        encoding="utf-8",
+    )
+
+
+def _write_cli_only_signals(path: Path) -> None:
+    path.write_text(
+        """
+signals:
+  empty_capability_scaffold:
+    enabled: true
+  registry_gap:
+    enabled: false
+  stale_scope:
+    enabled: false
+  stale_sbom:
+    enabled: false
+  publish_check_fail:
+    enabled: false
+""".lstrip(),
+        encoding="utf-8",
+    )
+
+
+def test_shell_resolver_emits_reuse_surface_gaps_and_advances_cursor(
+    tmp_path,
+    monkeypatch,
+) -> None:
+    rollout = tmp_path / "rollout.yaml"
+    _write_rollout(rollout)
+    _write_cli_only_signals(tmp_path / "signals.yml")
+    reuse_root = tmp_path / "reuse-surface"
+    reuse_root.mkdir()
+    (reuse_root / "SCOPE.md").write_text("fresh\n", encoding="utf-8")
+    activity_root = tmp_path / "activity-core"
+    activity_root.mkdir()
+
+    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "runner")
+
+    def fake_get(url: str, **kwargs: Any) -> _Response:
+        assert url.endswith("/repos/")
+        return _Response(
+            [
+                {
+                    "slug": "reuse-surface",
+                    "host_paths": {"runner": str(reuse_root)},
+                },
+                {
+                    "slug": "activity-core",
+                    "host_paths": {"runner": str(activity_root)},
+                },
+            ]
+        )
+
+    def fake_run(cmd: list[str], **kwargs: Any) -> _Completed:
+        assert cmd == ["reuse-surface", "report", "gaps", "--format", "json"]
+        return _Completed({"empty_scaffolds": ["reuse-surface"]})
+
+    monkeypatch.setattr(reuse_surface.httpx, "get", fake_get)
+    monkeypatch.setattr(reuse_surface.subprocess, "run", fake_run)
+
+    import activity_core.context_resolvers  # noqa: F401
+
+    result = CONTEXT_RESOLVER_REGISTRY["shell"]().resolve(
+        "reuse_surface_report_gaps",
+        None,
+        {
+            "roster": str(rollout),
+            "batch_size": 1,
+        },
+    )
+
+    assert result == {
+        "gaps": [
+            {
+                "repo": "reuse-surface",
+                "root": str(reuse_root),
+                "signal": "empty_capability_scaffold",
+                "hygiene_signal": "empty_capability_scaffold",
+            }
+        ]
+    }
+    state = json.loads((tmp_path / "round-robin-state.json").read_text(encoding="utf-8"))
+    assert state["cursor"] == 1
+    assert state["last_batch"] == ["reuse-surface"]
+
+
+def test_shell_resolver_keeps_kaizen_fallback_for_existing_queries() -> None:
+    assert CONTEXT_RESOLVER_REGISTRY["shell"]().resolve("unknown_query", None, {}) == {}
+
+
+@pytest.mark.asyncio
+async def test_optional_reuse_surface_missing_roster_binds_empty_list(tmp_path) -> None:
+    snapshot = await resolve_context(
+        [
+            {
+                "type": "shell",
+                "query": "reuse_surface_report_gaps",
+                "params": {"roster": str(tmp_path / "missing.yaml")},
+                "bind_to": "context.gaps",
+            }
+        ]
+    )
+
+    assert snapshot == {"gaps": []}
+
+
+@pytest.mark.asyncio
+async def test_required_reuse_surface_missing_roster_fails_visibly(tmp_path) -> None:
+    with pytest.raises(ApplicationError, match="Required context resolver"):
+        await resolve_context(
+            [
+                {
+                    "type": "shell",
+                    "query": "reuse_surface_report_gaps",
+                    "params": {"roster": str(tmp_path / "missing.yaml")},
+                    "bind_to": "context.gaps",
+                    "required": True,
+                }
+            ]
+        )
--- a/tests/test_schedule_health.py
+++ b/tests/test_schedule_health.py
@@ -0,0 +1,81 @@
+"""ACTIVITY-WP-0014 T03: missed-fire detection verdict tests."""
+
+from __future__ import annotations
+
+from datetime import datetime, timedelta, timezone
+
+from activity_core.schedule_health import evaluate_schedule_health
+
+NOW = datetime(2026, 6, 23, 12, 0, tzinfo=timezone.utc)
+
+
+def test_healthy_when_recent_fire_and_no_drops() -> None:
+    health = evaluate_schedule_health(
+        activity_id="a1",
+        missed_catchup_window=0,
+        last_fired_at=NOW - timedelta(minutes=5),
+        now=NOW,
+        expected_interval=timedelta(hours=1),
+    )
+    assert health.healthy is True
+    assert health.missed is False
+    assert health.reasons == []
+
+
+def test_unhealthy_when_catchup_window_dropped_fires() -> None:
+    health = evaluate_schedule_health(
+        activity_id="a1",
+        missed_catchup_window=2,
+        last_fired_at=NOW - timedelta(minutes=5),
+        now=NOW,
+    )
+    assert health.missed is True
+    assert "2 fire(s) dropped" in health.reasons[0]
+
+
+def test_unhealthy_when_last_fire_too_stale() -> None:
+    health = evaluate_schedule_health(
+        activity_id="daily",
+        missed_catchup_window=0,
+        last_fired_at=NOW - timedelta(days=2),
+        now=NOW,
+        expected_interval=timedelta(days=1),
+    )
+    assert health.missed is True
+    assert any("exceeding the expected" in r for r in health.reasons)
+    assert health.staleness == timedelta(days=2)
+
+
+def test_within_tolerance_is_healthy() -> None:
+    health = evaluate_schedule_health(
+        activity_id="daily",
+        missed_catchup_window=0,
+        last_fired_at=NOW - (timedelta(days=1) + timedelta(minutes=5)),
+        now=NOW,
+        expected_interval=timedelta(days=1),
+        tolerance=timedelta(minutes=10),
+    )
+    assert health.healthy is True
+
+
+def test_no_fire_recorded_for_due_schedule_is_unhealthy() -> None:
+    health = evaluate_schedule_health(
+        activity_id="daily",
+        missed_catchup_window=0,
+        last_fired_at=None,
+        now=NOW,
+        expected_interval=timedelta(days=1),
+    )
+    assert health.missed is True
+    assert "no recorded fire" in health.reasons[0]
+
+
+def test_no_interval_and_no_fire_is_not_flagged() -> None:
+    # Without an expected interval we cannot assert a miss from absence alone.
+    health = evaluate_schedule_health(
+        activity_id="event-ish",
+        missed_catchup_window=0,
+        last_fired_at=None,
+        now=NOW,
+    )
+    assert health.healthy is True
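
Taken together, the tests above fix the observable contract of `evaluate_schedule_health`: dropped catchup fires, a stale last fire, and a missing fire on an interval-bound schedule each count as a miss, while a missing fire without an expected interval does not. A minimal sketch of logic that would satisfy those assertions follows. The `ScheduleHealth` dataclass, the `_sketch` suffix, the zero default tolerance, and the exact reason wording are assumptions for illustration; the real `activity_core.schedule_health` module may be structured differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ScheduleHealth:
    """Hypothetical result type; field names mirror what the tests read."""

    activity_id: str
    missed: bool
    reasons: list[str] = field(default_factory=list)
    staleness: timedelta | None = None

    @property
    def healthy(self) -> bool:
        return not self.missed


def evaluate_schedule_health_sketch(
    *,
    activity_id: str,
    missed_catchup_window: int,
    last_fired_at: datetime | None,
    now: datetime,
    expected_interval: timedelta | None = None,
    tolerance: timedelta = timedelta(0),  # assumed default
) -> ScheduleHealth:
    reasons: list[str] = []
    staleness = None if last_fired_at is None else now - last_fired_at

    # Fires dropped outside the catchup window are always a miss.
    if missed_catchup_window > 0:
        reasons.append(
            f"{missed_catchup_window} fire(s) dropped outside the catchup window"
        )

    # Absence or staleness only counts when an interval is known.
    if expected_interval is not None:
        if staleness is None:
            reasons.append("no recorded fire for a schedule with an expected interval")
        elif staleness > expected_interval + tolerance:
            reasons.append(
                f"last fire was {staleness} ago, exceeding the expected "
                f"interval of {expected_interval}"
            )

    return ScheduleHealth(activity_id, bool(reasons), reasons, staleness)
```

Keeping the interval check optional is what lets event-like activities pass through unflagged, as the last test requires.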
--- a/tests/test_schedule_lifecycle.py
+++ b/tests/test_schedule_lifecycle.py
@@ -37,6 +37,7 @@ def _make_defn(
    misfire_policy: str = "skip",
    enabled: bool = True,
    jitter: int = 0,
+    catchup_window_seconds: int | None = None,
 ) -> ActivityDefinition:
    return ActivityDefinition(
        id=uuid.uuid4(),
@@ -46,6 +47,7 @@ def _make_defn(
            cron_expression=cron,
            misfire_policy=misfire_policy,
            jitter_seconds=jitter,
+            catchup_window_seconds=catchup_window_seconds,
        ),
    )

@@ -186,6 +188,76 @@ async def test_misfire_policy_compress_sets_overlap_buffer_one(env: WorkflowEnvi
    await delete_schedule(env.client, defn.id)


+# ── ACTIVITY-WP-0014: explicit run-miss policies + catchup window ────────────
+
+@pytest.mark.asyncio
+async def test_skip_sets_short_catchup_window(env: WorkflowEnvironment) -> None:
+    """skip = run on trigger or skip: tiny grace window, no real recovery."""
+    defn = _make_defn(misfire_policy="skip")
+    await upsert_schedule(env.client, defn)
+
+    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
+    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.SKIP
+    assert desc.schedule.policy.catchup_window == timedelta(seconds=60)
+
+    await delete_schedule(env.client, defn.id)
+
+
+@pytest.mark.asyncio
+async def test_catchup_all_recovers_full_window(env: WorkflowEnvironment) -> None:
+    """catchup_all = recover every missed fire: long window, BUFFER_ALL."""
+    defn = _make_defn(misfire_policy="catchup_all")
+    await upsert_schedule(env.client, defn)
+
+    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
+    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ALL
+    assert desc.schedule.policy.catchup_window == timedelta(days=365)
+
+    await delete_schedule(env.client, defn.id)
+
+
+@pytest.mark.asyncio
+async def test_catchup_latest_does_not_accumulate(env: WorkflowEnvironment) -> None:
+    """catchup_latest = recover only the most recent missed fire: BUFFER_ONE."""
+    defn = _make_defn(misfire_policy="catchup_latest")
+    await upsert_schedule(env.client, defn)
+
+    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
+    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ONE
+    assert desc.schedule.policy.catchup_window == timedelta(hours=24)
+
+    await delete_schedule(env.client, defn.id)
+
+
+@pytest.mark.asyncio
+async def test_legacy_aliases_map_to_explicit_policies(env: WorkflowEnvironment) -> None:
+    """Legacy catchup/compress keep working and pick up the new catchup windows."""
+    catchup = _make_defn(misfire_policy="catchup")
+    compress = _make_defn(misfire_policy="compress")
+    await upsert_schedule(env.client, catchup)
+    await upsert_schedule(env.client, compress)
+
+    d1 = await env.client.get_schedule_handle(schedule_id(catchup.id)).describe()
+    d2 = await env.client.get_schedule_handle(schedule_id(compress.id)).describe()
+    assert d1.schedule.policy.catchup_window == timedelta(days=365)
+    assert d2.schedule.policy.catchup_window == timedelta(hours=24)
+
+    await delete_schedule(env.client, catchup.id)
+    await delete_schedule(env.client, compress.id)
+
+
+@pytest.mark.asyncio
+async def test_explicit_catchup_window_override(env: WorkflowEnvironment) -> None:
+    """An explicit catchup_window_seconds overrides the per-policy default."""
+    defn = _make_defn(misfire_policy="skip", catchup_window_seconds=7200)
+    await upsert_schedule(env.client, defn)
+
+    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
+    assert desc.schedule.policy.catchup_window == timedelta(hours=2)
+
+    await delete_schedule(env.client, defn.id)
+
+
 @pytest.mark.asyncio
 async def test_schedule_smoke_test_creates_one_shot_schedule(
    env: WorkflowEnvironment,
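
The ACTIVITY-WP-0014 lifecycle tests above imply a per-policy table of overlap behaviour and catchup window, with legacy names aliased onto the explicit policies and an optional override in seconds. The sketch below restates that table as plain data. Overlap values are given as strings so the example does not depend on the Temporal SDK. The function name, the alias table, and the assumption that legacy `catchup` shares the `catchup_all` overlap policy are illustrative; the tests only pin the catchup windows for the legacy names.

```python
from __future__ import annotations

from datetime import timedelta

# Legacy policy names and the explicit policies they are assumed to alias.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}

# (overlap policy name, default catchup window), as asserted in the tests.
POLICY_DEFAULTS = {
    "skip": ("SKIP", timedelta(seconds=60)),
    "catchup_all": ("BUFFER_ALL", timedelta(days=365)),
    "catchup_latest": ("BUFFER_ONE", timedelta(hours=24)),
}


def resolve_misfire_policy_sketch(
    misfire_policy: str,
    catchup_window_seconds: int | None = None,
) -> tuple[str, timedelta]:
    """Return (overlap, catchup_window) for a policy name."""
    canonical = LEGACY_ALIASES.get(misfire_policy, misfire_policy)
    overlap, window = POLICY_DEFAULTS[canonical]
    # An explicit window overrides the per-policy default.
    if catchup_window_seconds is not None:
        window = timedelta(seconds=catchup_window_seconds)
    return overlap, window
```

For example, `resolve_misfire_policy_sketch("skip", 7200)` keeps the `SKIP` overlap but widens the window to two hours, matching `test_explicit_catchup_window_override`.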
--- a/tests/test_state_hub_context_resolver.py
+++ b/tests/test_state_hub_context_resolver.py
@@ -407,6 +407,70 @@ def test_recently_on_scope_hourly_failure_bubbles(monkeypatch) -> None:
        StateHubContextResolver().resolve("recently_on_scope_hourly", None, {"range": "1h"})


+def test_consistency_sweep_remote_all_posts_batch(monkeypatch) -> None:
+    calls: list[dict[str, Any]] = []
+
+    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
+        calls.append({"url": url, **kwargs})
+        return DummyResponse(
+            {
+                "exit_code": 0,
+                "lock_skipped": False,
+                "repos_processed": [{"repo_slug": "state-hub", "result": "pass"}],
+                "skipped_clean": ["quiet-repo"],
+                "skipped_missing": [],
+                "skipped_budget": [],
+            }
+        )
+
+    monkeypatch.setenv("STATE_HUB_URL", "http://state-hub.test/")
+    monkeypatch.setattr(httpx, "post", fake_post)
+
+    result = StateHubContextResolver().resolve(
+        "consistency_sweep_remote_all",
+        None,
+        {"max_seconds": 300, "source": "activity-core", "required": True},
+    )
+
+    assert result["exit_code"] == 0
+    assert result["repos_processed"][0]["repo_slug"] == "state-hub"
+    assert calls == [
+        {
+            "url": "http://state-hub.test/consistency/sweep/remote-all",
+            "json": {"max_seconds": 300, "source": "activity-core"},
+            "timeout": 330.0,
+        }
+    ]
+
+
+def test_consistency_sweep_remote_all_failure_bubbles(monkeypatch) -> None:
+    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
+        raise httpx.ConnectError("offline")
+
+    monkeypatch.setattr(httpx, "post", fake_post)
+
+    with pytest.raises(httpx.ConnectError):
+        StateHubContextResolver().resolve(
+            "consistency_sweep_remote_all",
+            None,
+            {"max_seconds": 300},
+        )
+
+
+def test_consistency_sweep_remote_all_rejects_empty_response(monkeypatch) -> None:
+    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
+        return DummyResponse({})
+
+    monkeypatch.setattr(httpx, "post", fake_post)
+
+    with pytest.raises(RuntimeError, match="missing required key"):
+        StateHubContextResolver().resolve(
+            "consistency_sweep_remote_all",
+            None,
+            {"max_seconds": 300},
+        )
+
+
 def test_recently_on_scope_hourly_rejects_empty_response(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse({})
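
The `consistency_sweep_remote_all` tests above show three behaviours: the `required` flag is stripped from the posted body, the HTTP timeout exceeds `max_seconds`, and an empty response is rejected. A rough sketch of request building and response validation under those constraints follows. The 30-second margin is inferred from the single `300 -> 330.0` data point, and the helper names and the list of required keys are hypothetical.

```python
from __future__ import annotations

from typing import Any

TIMEOUT_MARGIN_SECONDS = 30.0  # assumption: inferred from 300 -> 330.0
REQUIRED_KEYS = ("exit_code", "repos_processed")  # assumption: illustrative only


def build_sweep_request_sketch(base_url: str, params: dict[str, Any]) -> dict[str, Any]:
    """Build the keyword arguments for the sweep POST."""
    # "required" steers the resolver itself and is not forwarded upstream.
    payload = {k: v for k, v in params.items() if k != "required"}
    return {
        "url": base_url.rstrip("/") + "/consistency/sweep/remote-all",
        "json": payload,
        "timeout": float(params["max_seconds"]) + TIMEOUT_MARGIN_SECONDS,
    }


def validate_sweep_response_sketch(body: dict[str, Any]) -> dict[str, Any]:
    """Reject responses that lack the keys downstream consumers read."""
    for key in REQUIRED_KEYS:
        if key not in body:
            raise RuntimeError(f"sweep response missing required key: {key}")
    return body
```

Giving the client a timeout slightly longer than the server-side budget lets the server report its own budget exhaustion instead of the client aborting first.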
--- a/tests/test_state_hub_write.py
+++ b/tests/test_state_hub_write.py
@@ -0,0 +1,81 @@
+"""ACTIVITY-WP-0014 T05: idempotency-keyed State Hub writes."""
+
+from __future__ import annotations
+
+import httpx
+import pytest
+
+from activity_core import report_sinks
+from activity_core.state_hub_write import (
+    IDEMPOTENCY_HEADER,
+    idempotency_headers,
+    idempotency_key,
+)
+
+
+def test_key_is_stable_and_deterministic() -> None:
+    a = idempotency_key("run1", "daily-triage-report", "daily_triage")
+    b = idempotency_key("run1", "daily-triage-report", "daily_triage")
+    assert a == b == "run1:daily-triage-report:daily_triage"
+
+
+def test_key_shape_stable_with_missing_parts() -> None:
+    assert idempotency_key("run1", None, "daily_triage") == "run1::daily_triage"
+
+
+def test_key_sanitizes_control_and_whitespace() -> None:
+    key = idempotency_key("run 1", "a\tb", "x\n")
+    assert "\t" not in key and "\n" not in key and " " not in key
+
+
+def test_headers_carry_the_key() -> None:
+    headers = idempotency_headers("run1", "i", "e")
+    assert headers == {IDEMPOTENCY_HEADER: "run1:i:e"}
+
+
+def test_distinct_identities_get_distinct_keys() -> None:
+    assert idempotency_key("r", "i", "daily_triage") != idempotency_key(
+        "r", "i", "schedule_miss"
+    )
+
+
+def test_progress_exists_is_best_effort_on_connection_error(monkeypatch) -> None:
+    """A down State Hub must not hard-fail the dedup read; it returns False so the
+    keyed write can still proceed."""
+
+    def _boom(*args, **kwargs):
+        raise httpx.ConnectError("Connection refused")
+
+    monkeypatch.setattr(report_sinks.httpx, "get", _boom)
+    assert (
+        report_sinks._progress_exists(
+            "http://127.0.0.1:8000", "run1", "daily-triage-report", "daily_triage"
+        )
+        is False
+    )
+
+
+def test_report_sink_post_sends_idempotency_header(monkeypatch) -> None:
+    """The state-hub-progress write carries a stable Idempotency-Key header."""
+    captured: dict[str, object] = {}
+
+    monkeypatch.setattr(report_sinks, "_progress_exists", lambda *a, **k: False)
+
+    class _Resp:
+        def raise_for_status(self) -> None: ...
+        def json(self) -> dict[str, str]:
+            return {"id": "pid-1"}
+
+    def _capture_post(url, json, headers, timeout):  # noqa: A002
+        captured["headers"] = headers
+        return _Resp()
+
+    monkeypatch.setattr(report_sinks.httpx, "post", _capture_post)
+
+    payload = {"run_id": "run1", "activity_id": "act1", "scheduled_for": None}
+    report_entry = {"instruction_id": "daily-triage-report", "report": {"summary": "s"}}
+    sink = {"event_type": "daily_triage"}
+
+    result = report_sinks._post_state_hub_progress(payload, report_entry, sink)
+    assert result["status"] == "posted"
+    assert captured["headers"][IDEMPOTENCY_HEADER] == "run1:daily-triage-report:daily_triage"
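
The key format exercised above is three colon-joined parts (run, instruction, event type), with missing parts left empty and whitespace or control characters removed from each part. One way to get that shape is sketched below. The header value `Idempotency-Key` follows the test docstring. The `_sketch` names and the choice of `_` as the replacement character are assumptions, since the tests only require that such characters are absent from the key.

```python
from __future__ import annotations

import re

IDEMPOTENCY_HEADER = "Idempotency-Key"

# Runs of whitespace and ASCII control characters.
_UNSAFE = re.compile(r"[\s\x00-\x1f\x7f]+")


def idempotency_key_sketch(
    run_id: str | None,
    instruction_id: str | None,
    event_type: str | None,
) -> str:
    """Join the three identity parts into a stable, header-safe key."""
    parts = (run_id, instruction_id, event_type)
    # Missing parts become empty strings so the key always has two colons.
    return ":".join(_UNSAFE.sub("_", part) if part else "" for part in parts)


def idempotency_headers_sketch(
    run_id: str | None,
    instruction_id: str | None,
    event_type: str | None,
) -> dict[str, str]:
    return {IDEMPOTENCY_HEADER: idempotency_key_sketch(run_id, instruction_id, event_type)}
```

Because the key derives only from the write's identity, a retried POST for the same run, instruction, and event type carries the same header, which is what allows the receiver to deduplicate.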
--- a/tests/test_sync_schedules.py
+++ b/tests/test_sync_schedules.py
@@ -0,0 +1,126 @@
+from __future__ import annotations
+
+import uuid
+from datetime import datetime, timezone
+from types import SimpleNamespace
+from typing import Any
+
+import pytest
+
+from activity_core import sync_schedules
+
+
+def _row(
+    *,
+    activity_id: uuid.UUID,
+    enabled: bool,
+    trigger_config: dict[str, Any],
+) -> SimpleNamespace:
+    return SimpleNamespace(
+        id=activity_id,
+        name=f"definition-{activity_id}",
+        enabled=enabled,
+        trigger_config=trigger_config,
+        context_sources=[],
+        task_templates=[],
+        dedupe_key_strategy="skip",
+        version=1,
+    )
+
+
+@pytest.mark.asyncio
+async def test_sync_schedule_rows_reports_drift_counts_and_preserves_one_shots(
+    monkeypatch,
+) -> None:
+    new_id = uuid.uuid4()
+    disabled_old_id = uuid.uuid4()
+    one_shot_id = uuid.uuid4()
+    orphan_id = uuid.uuid4()
+    upserted: list[tuple[uuid.UUID, bool, str]] = []
+    deleted: list[str] = []
+
+    async def fake_upsert_schedule(client: object, defn: object) -> None:
+        upserted.append((
+            defn.id,
+            defn.enabled,
+            defn.trigger_config.trigger_type,
+        ))
+
+    async def fake_list_schedules(client: object) -> list[dict[str, str]]:
+        return [
+            {
+                "schedule_id": f"activity-schedule-{disabled_old_id}",
+                "activity_id": str(disabled_old_id),
+            },
+            {
+                "schedule_id": f"activity-schedule-{one_shot_id}-once",
+                "activity_id": f"{one_shot_id}-once",
+            },
+            {
+                "schedule_id": f"activity-schedule-{orphan_id}",
+                "activity_id": str(orphan_id),
+            },
+        ]
+
+    async def fake_delete_schedule(client: object, activity_id: str) -> None:
+        deleted.append(activity_id)
+
+    monkeypatch.setattr(sync_schedules, "upsert_schedule", fake_upsert_schedule)
+    monkeypatch.setattr(sync_schedules, "list_schedules", fake_list_schedules)
+    monkeypatch.setattr(sync_schedules, "delete_schedule", fake_delete_schedule)
+
+    result = await sync_schedules.sync_schedule_rows(
+        object(),
+        [
+            _row(
+                activity_id=new_id,
+                enabled=True,
+                trigger_config={
+                    "trigger_type": "cron",
+                    "cron_expression": "20 7 * * *",
+                    "timezone": "Europe/Berlin",
+                    "misfire_policy": "skip",
+                },
+            ),
+            _row(
+                activity_id=disabled_old_id,
+                enabled=False,
+                trigger_config={
+                    "trigger_type": "cron",
+                    "cron_expression": "20 * * * *",
+                    "timezone": "Europe/Berlin",
+                    "misfire_policy": "skip",
+                },
+            ),
+            _row(
+                activity_id=one_shot_id,
+                enabled=True,
+                trigger_config={
+                    "trigger_type": "scheduled",
+                    "at": datetime(2026, 6, 19, 8, 0, tzinfo=timezone.utc),
+                    "timezone": "UTC",
+                },
+            ),
+            _row(
+                activity_id=uuid.uuid4(),
+                enabled=True,
+                trigger_config={
+                    "trigger_type": "event",
+                    "event_type": "kaizen.metrics.recorded",
+                    "filters": {},
+                },
+            ),
+        ],
+    )
+
+    assert result.to_dict() == {
+        "upserted": 2,
+        "paused": 1,
+        "deleted_orphans": 1,
+    }
+    assert upserted == [
+        (new_id, True, "cron"),
+        (disabled_old_id, False, "cron"),
+        (one_shot_id, True, "scheduled"),
+    ]
+    assert deleted == [str(orphan_id)]
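
The drift test above relies on one-shot schedules (`activity-schedule-<id>-once`) being matched back to their definition rather than treated as orphans. A small sketch of that matching step follows. The function name and the `-once` suffix handling are inferred from the fixture data in the test, not taken from `sync_schedules` itself.

```python
from __future__ import annotations

from typing import Iterable

ONE_SHOT_SUFFIX = "-once"  # suffix seen on one-shot activity ids in the fixture


def find_orphan_activity_ids_sketch(
    live_schedules: Iterable[dict[str, str]],
    known_activity_ids: Iterable[str],
) -> list[str]:
    """Return activity ids of live schedules that match no known definition."""
    known = set(known_activity_ids)
    orphans: list[str] = []
    for schedule in live_schedules:
        activity_id = schedule["activity_id"]
        # Strip the one-shot suffix before comparing against definition ids.
        if activity_id.endswith(ONE_SHOT_SUFFIX):
            base_id = activity_id[: -len(ONE_SHOT_SUFFIX)]
        else:
            base_id = activity_id
        if base_id not in known:
            orphans.append(activity_id)
    return orphans
```

Disabled definitions stay in the known set, so their schedules are paused rather than deleted, consistent with the `paused: 1` count asserted above.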
--- a/tests/test_sync_service.py
+++ b/tests/test_sync_service.py
@@ -0,0 +1,134 @@
+from __future__ import annotations
+
+from typing import Any
+
+import pytest
+
+from activity_core import sync_service
+from activity_core.sync_schedules import ScheduleSyncResult
+
+
+@pytest.mark.asyncio
+async def test_run_sync_runs_requested_sections(monkeypatch) -> None:
+    calls: list[str] = []
+
+    async def fake_definitions(session_factory: object) -> int:
+        calls.append("definitions")
+        return 2
+
+    async def fake_event_types(session_factory: object) -> int:
+        calls.append("event_types")
+        return 5
+
+    async def fake_schedules(
+        temporal_client: object,
+        session_factory: object,
+    ) -> ScheduleSyncResult:
+        calls.append("schedules")
+        return ScheduleSyncResult(upserted=3, paused=1, deleted_orphans=2)
+
+    monkeypatch.setattr(sync_service, "sync_activity_definitions", fake_definitions)
+    monkeypatch.setattr(sync_service, "sync_event_types", fake_event_types)
+    monkeypatch.setattr(sync_service, "sync_with_session_factory", fake_schedules)
+
+    result = await sync_service.run_sync(
+        session_factory=object(),
+        temporal_client=object(),
+        definitions=True,
+        schedules=True,
+        event_types=True,
+    )
+
+    assert calls == ["definitions", "event_types", "schedules"]
+    assert result["ok"] is True
+    assert result["ran"] == {
+        "definitions": True,
+        "schedules": True,
+        "event_types": True,
+    }
+    assert result["definitions"] == {"synced": 2}
+    assert result["event_types"] == {"synced": 5}
+    assert result["schedules"] == {
+        "upserted": 3,
+        "paused": 1,
+        "deleted_orphans": 2,
+    }
+    assert result["errors"] == []
+
+
+@pytest.mark.asyncio
+async def test_run_sync_collects_errors_and_continues(monkeypatch) -> None:
+    calls: list[str] = []
+
+    async def failing_definitions(session_factory: object) -> int:
+        calls.append("definitions")
+        raise RuntimeError("definition parse failed")
+
+    async def fake_schedules(
+        temporal_client: object,
+        session_factory: object,
+    ) -> ScheduleSyncResult:
+        calls.append("schedules")
+        return ScheduleSyncResult(upserted=1)
+
+    monkeypatch.setattr(
+        sync_service,
+        "sync_activity_definitions",
+        failing_definitions,
+    )
+    monkeypatch.setattr(sync_service, "sync_with_session_factory", fake_schedules)
+
+    result = await sync_service.run_sync(
+        session_factory=object(),
+        temporal_client=object(),
+        definitions=True,
+        schedules=True,
+        event_types=False,
+    )
+
+    assert calls == ["definitions", "schedules"]
+    assert result["ok"] is False
+    assert result["definitions"] == {"synced": 0}
+    assert result["schedules"]["upserted"] == 1
+    assert result["errors"] == [
+        {
+            "stage": "definitions",
+            "type": "RuntimeError",
+            "message": "definition parse failed",
+        }
+    ]
+
+
+@pytest.mark.asyncio
+async def test_run_sync_reports_missing_temporal_client_for_schedules() -> None:
+    result = await sync_service.run_sync(
+        session_factory=object(),
+        temporal_client=None,
+        definitions=False,
+        schedules=True,
+        event_types=False,
+    )
+
+    assert result["ok"] is False
+    assert result["errors"] == [
+        {
+            "stage": "schedules",
+            "type": "RuntimeError",
+            "message": "Temporal client is required for schedule sync",
+        }
+    ]
+
+
+def test_record_error_bounds_error_count() -> None:
+    result: dict[str, Any] = {
+        "ok": True,
+        "errors": [],
+    }
+
+    for i in range(25):
+        sync_service._record_error(result, "stage", RuntimeError(f"boom {i}"))
+
+    assert result["ok"] is False
+    assert len(result["errors"]) == 20
+    assert result["errors"][0]["message"] == "boom 0"
+    assert result["errors"][-1]["message"] == "boom 19"
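
The `run_sync` tests above describe a collect-and-continue pattern: each stage runs independently, a failure marks the result as not ok, and the error list is capped (the last test expects 20 entries after 25 failures, keeping the earliest). A simplified sketch of that pattern follows. The names `record_error_sketch` and `run_stages_sketch` are hypothetical, and unlike the real service this sketch does not fill in a default such as `{"synced": 0}` for a failed stage.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

MAX_ERRORS = 20  # cap implied by test_record_error_bounds_error_count


def record_error_sketch(result: dict[str, Any], stage: str, exc: BaseException) -> None:
    """Mark the result failed and keep at most MAX_ERRORS earliest errors."""
    result["ok"] = False
    errors = result.setdefault("errors", [])
    if len(errors) < MAX_ERRORS:
        errors.append(
            {"stage": stage, "type": type(exc).__name__, "message": str(exc)}
        )


async def run_stages_sketch(
    stages: list[tuple[str, Callable[[], Awaitable[Any]]]],
) -> dict[str, Any]:
    """Run each stage in order; a failing stage does not stop later ones."""
    result: dict[str, Any] = {"ok": True, "errors": []}
    for name, stage in stages:
        try:
            result[name] = await stage()
        except Exception as exc:  # collected and reported, not re-raised
            record_error_sketch(result, name, exc)
    return result
```

Bounding the list keeps an admin response small even when many definitions fail to parse, at the cost of dropping later errors.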
--- a/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
+++ b/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Post-triage operational hardening"
 domain: custodian
 repo: activity-core
-status: active
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-03"
-updated: "2026-06-16"
+updated: "2026-06-30"
 state_hub_workstream_id: "5646e13a-13af-4724-bca6-3c0d86f96733"
 ---

@@ -104,7 +104,7 @@ and emitted a validated `daily_triage` report plus working-memory note.

 ```task
 id: ACTIVITY-WP-0006-T03
-status: wait
+status: done
 priority: medium
 state_hub_task_id: "7cbf0a35-71a1-47ac-afc2-f51ad2180fd0"
 ```
@@ -174,6 +174,56 @@ the worker consumes the configured URL, then produce schema-valid daily triage
 evidence and three clean scheduled runs. This narrower path is tracked in
 `ACTIVITY-WP-0010`.

+2026-06-25: Consecutive-run streak resumed. State Hub `daily_triage` progress
+events from author `activity-core` fired on time on **2026-06-24 05:20:56Z** and
+**2026-06-25 05:20:47Z** (07:20 Berlin), both delivered, no misfires. That is two
+clean consecutive scheduled runs. **RECHECK 2026-06-26 (after 05:20Z):** confirm
+the 06-26 scheduled `daily_triage` event delivered. If clean, that completes three
+clean consecutive scheduled runs (06-24 / 06-25 / 06-26) — record the calibration
+result in State Hub and close T03. If the 06-26 run misfires or is missing, the
+streak resets and T03 stays `wait`. Flag deliberately kept in-repo (agent-agnostic)
+rather than tied to any single coding agent's scheduler.
+
+2026-06-26 recheck outcome: **streak reset at two.** The 06-26 scheduled run fired
+on time (`daily_triage` event 05:20:57Z) — scheduling layer healthy, no misfire —
+but the `daily-triage-report` instruction output **failed schema validation**:
+`Expecting ',' delimiter: line 136 column 22 (char 5268)`. The model produced a
+long ranked WSJF recommendation list (reached rank 7+ with nested `wsjf` objects)
+whose JSON broke ~char 5268; only a bounded 4000-char preview is preserved in the
+State Hub event, so the exact offending token needs the runtime llm-connect log.
+This is an LLM-output-quality failure (tracked by `ACTIVITY-WP-0010`), not a
+runtime/projection failure. T03 stays `wait`; three clean consecutive scheduled
+runs not yet achieved (06-24 ✅, 06-25 ✅, 06-26 ✗-validation).
+
+2026-06-27 recheck outcome: streak remains reset. The scheduled run fired and
+wrote State Hub progress plus working memory, but `daily-triage-report` failed
+validation again with an unterminated string around char 5246. This confirms the
+runner/sink path is alive and the active blocker is live deployment of the
+`ACTIVITY-WP-0016` output-robustness bundle and runtime prompt/token changes, not
+a missing schedule. T03 stays `wait` until a post-deployment smoke passes and three
+new clean scheduled runs are collected.
+
+2026-06-30 early checkpoint: two new clean scheduled runs exist after the
+validation failures. State Hub daily_triage progress shows 2026-06-28
+05:20:51Z run `6a44d6dd-3f02-53f2-a5d8-d42b76b0ef98` and 2026-06-29
+05:20:49Z run `1dfb47c9-07bf-551b-b778-1d21a40bd95c`, both with
+`output_validated=true` and working-memory notes written. The current local time
+was 2026-06-30 01:37 Europe/Berlin, before the expected 07:20 Berlin scheduled
+fire, so the three-clean-run gate cannot close yet. Recheck after 2026-06-30
+05:20Z; if that scheduled run validates, the clean streak is 06-28 / 06-29 /
+06-30 and T03 can close with calibration feedback.
+
+2026-06-30 closeout: the 07:20 Berlin scheduled run fired at 05:20:50Z as run
+`ac3d71a0-2f8f-50df-b3ce-7c60c2abb5c5` with `output_validated=true` and a
+working-memory note written. The post-failure clean streak is now complete:
+2026-06-28 (`6a44d6dd`), 2026-06-29 (`1dfb47c9`), and 2026-06-30 (`ac3d71a0`).
+Calibration feedback: the scheduler, worker, llm-connect route, State Hub sink,
+and working-memory sink are stable again; the recommendations were operationally
+useful but too dense at 10 items, repeatedly emphasizing human-dependency and
+infrastructure-unblock work. ACTIVITY-WP-0016 now owns the density/contract fix:
+Railiance runtime projection was aligned to a top-7 contract so the next live
+run can prove the bounded output posture. T03 is done.
+
 ## Rule Action Contract Documentation

 ```task
--- a/workplans/ACTIVITY-WP-0010-daily-triage-llm-reconciliation.md
+++ b/workplans/ACTIVITY-WP-0010-daily-triage-llm-reconciliation.md
@@ -8,7 +8,7 @@ status: blocked
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
-updated: "2026-06-18"
+updated: "2026-06-27"
 state_hub_workstream_id: "f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9"
 ---

@@ -87,7 +87,7 @@ reported 9 passed.

 ```task
 id: ACTIVITY-WP-0010-T02
-status: wait
+status: done
 priority: high
 state_hub_task_id: "23545ddc-926b-485a-8535-5cc11e01134a"
 ```
@@ -107,6 +107,30 @@ Current wait reason: this is Railiance/operator-owned live cluster work. State
 Hub handoff message `9a074b7c-4b87-4e3c-a6bf-e1fe5580daa8` asks
 `railiance-cluster` to reconcile the updated config and smoke it.

+2026-06-19 recheck:
+
+- Deployed `llm-connect` into the `activity-core` namespace on `railiance01`
+  (the cluster that runs `actcore-worker`). Until then only `coulombcore` ran
+  llm-connect, and its in-cluster Service URL is not reachable from `railiance01`.
+- `actcore-runtime-config` already exposed the verified URL and timeout;
+  `deployment/actcore-worker` was restarted and now reports
+  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
+- `llm-connect-provider-secrets` reports `DATA 1`; no Secret values were
+  inspected.
+- Worker health probe to llm-connect `/health` returns `{"status": "ok"}`.
+- `actcore-state-hub-bridge` remains `0/1` Ready with upstream timeouts, so T02
+  is not fully closed until the node-local State Hub tunnel is restored.
+
+2026-06-27 recheck:
+
+- Superseded by real scheduled runner evidence: State Hub daily_triage events on
+  2026-06-24, 2026-06-25, 2026-06-26, and 2026-06-27 all reached State Hub and
+  wrote working-memory notes. The bridge/sink is therefore reachable for the
+  live runner.
+- 2026-06-24 and 2026-06-25 were schema-valid; 2026-06-26 and 2026-06-27 failed
+  output validation after calling llm-connect. That moves the active blocker out
+  of T02 and into the WP-0016 live bundle/smoke lane. Marking T02 done.
+
 ## Run Daily Triage Fixture Smoke

 ```task
@@ -128,6 +152,27 @@ Done when:
  detail;
 - `scripts/verify_daily_triage.py` reports the smoke/manual run as present.

+2026-06-19 recheck:
+
+- In-namespace llm-connect fixture smoke on `railiance01` passed:
+  `smoke: pass health=ok latency_seconds=1.681 recommendations=1`.
+- Manual `POST /activity-definitions/6fca51fa-387a-4fd0-bc4e-d62c29eb859a/trigger`
+  reached llm-connect, but the workflow failed at `persist_instruction_reports`
+  with `state-hub-progress` sink `Connection refused` while
+  `actcore-state-hub-bridge` is unhealthy.
+- T03 therefore remains open until State Hub bridge reachability is restored and
+  a run emits non-secret `daily_triage` progress with `output_validated=true`.
+
+2026-06-27 recheck:
+
+- Scheduled runs on 2026-06-24 and 2026-06-25 satisfy the non-secret smoke
+  evidence for llm-connect call, State Hub progress with output_validated=true,
+  and working-memory note creation.
+- Kept T03 at progress rather than done because the workstation did not run the
+  live verifier against Temporal/activity-core DB, and the smoke must be repeated
+  after the WP-0016 code/schema/runtime-prompt deployment due to the 2026-06-26 and
+  2026-06-27 malformed-output failures.
+
 ## Collect Three Clean Scheduled Runs

 ```task
@@ -151,6 +196,14 @@ Done when:
 - `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` can move from `wait` to
  `done`.

+2026-06-27 recheck:
+
+- Three-clean-run streak is reset. The latest sequence is 2026-06-24 clean,
+  2026-06-25 clean, 2026-06-26 validation_failed, 2026-06-27 validation_failed.
+- Current pickup is to deploy ACTIVITY-WP-0016 code/schema together with the
+  Railiance runtime prompt and max_tokens changes, run a live smoke, then restart
+  the three-consecutive-scheduled-run gate from zero.
+
 ## Close Handoff State

 ```task
--- a/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
+++ b/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Definition And Schedule Hot Reload"
 domain: custodian
 repo: activity-core
-status: ready
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
-updated: "2026-06-18"
+updated: "2026-06-22"
 state_hub_workstream_id: "8887075e-21ec-451b-b82b-cd81035c9ca5"
 ---

@@ -39,7 +39,7 @@ a repo checkout manager or CI system.

 ```task
 id: ACTIVITY-WP-0012-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "53a7970b-7eec-47f5-ad30-bbd7c6271952"
 ```
@@ -57,11 +57,17 @@ Done when:
 - failures are collected into a bounded `errors[]` result while preserving the
  current startup best-effort behavior.

+2026-06-19: Completed. Added `activity_core.sync_service.run_sync`, which
+orchestrates ActivityDefinition, event type, and schedule sync independently
+from explicit DB session factory and Temporal client dependencies. Worker
+startup now calls the shared service for definitions+schedules and logs bounded
+stage errors while continuing startup.
+
 ## Add Admin Sync Endpoint

 ```task
 id: ACTIVITY-WP-0012-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8697c761-15d1-4da0-b66b-d838218a2495"
 ```
@@ -80,11 +86,17 @@ Done when:
 - endpoint tests cover definitions-only, schedules-only, all-sync, and failure
  result behavior.

+2026-06-19: Completed. Added `POST /admin/sync` with defaults
+`definitions=true`, `schedules=true`, and `event_types=false`. The response
+reports definition/event counts, schedule upsert/pause/orphan-delete counts, and
+bounded `errors[]`. Tests cover definitions-only, schedules-only, all-sync, and
+failure-result behavior.
+
 ## Preserve Schedule Drift Semantics

 ```task
 id: ACTIVITY-WP-0012-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "efeac412-632c-4c90-9428-bb575ac7a624"
 ```
@@ -101,11 +113,18 @@ Done when:
 - regression tests demonstrate the Coulomb hourly-to-daily rename shape without
  needing a worker restart.

+2026-06-19: Completed. `sync_schedules` now returns explicit counts for enabled
+schedule upserts, disabled schedule pauses, and orphan deletes. Regression tests
+cover the hourly-to-daily rename shape: a new enabled cron schedule is upserted,
+the old disabled cron schedule is preserved as paused, unrelated orphan
+schedules are deleted, event-triggered definitions do not create schedules, and
+one-shot scheduled definitions are no longer mistaken for orphans.
+
 ## Optional Background Sync Loop

 ```task
 id: ACTIVITY-WP-0012-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "d774087b-c51d-4444-8e90-bfef43765456"
 ```
@@ -121,11 +140,17 @@ Done when:
  last error summary;
 - the loop does not block worker startup or workflow task processing.

+2026-06-19: Completed by decision. v1 stays manual/operator-triggered through
+`POST /admin/sync`; no background loop was added. The runbook records this
+posture so customer definition changes stay explicit and the worker does not
+start background repo scanning. A periodic loop remains a future option if live
+operator use proves it is needed.
+
 ## Live No-Restart Smoke

 ```task
 id: ACTIVITY-WP-0012-T05
-status: wait
+status: done
 priority: high
 state_hub_task_id: "68a0e22a-106a-4d21-9f39-c6279850cb5e"
 ```
@@ -141,5 +166,27 @@ Done when non-secret State Hub evidence shows:
 - event-triggered definitions still fire normally;
 - rollback or repeat sync is idempotent.

-Current wait reason: this gate depends on the implementation tasks and a
-cluster-owned smoke path.
+2026-06-22: Completed on Railiance01 (`KUBECONFIG=~/.kube/config-hosteurope`).
+
+Smoke target: disabled projection `ops-service-inventory-probes`
+(`40d15a87-7ff6-4d8e-992c-37df15f95110`) in
+`actcore-external-activity-definitions`.
+
+Evidence:
+
+- ConfigMap flip `enabled: false -> true` and cadence `15 * * * * -> 25 * * * *`,
+  then `POST /admin/sync?definitions=true&schedules=true` from `actcore-api`.
+- DB after sync: `enabled=true`, `cron=25 * * * *`.
+- Temporal schedule after sync: `paused=false`, calendar minute `25`.
+- Repeat sync returned identical schedule counts
+  (`upserted=5`, `paused=1`, `deleted_orphans=0`) — idempotent.
+- Rollback flip restored `enabled=false`, `cron=15 * * * *`, schedule
+  `paused=true`, calendar minute `15`.
+- `actcore-worker` pod UID unchanged (`a68d6539-2bba-457e-a78a-39564002a980`,
+  started `2026-06-21T18:46:46Z`); `actcore-event-router` pod UID unchanged.
+- Event-triggered definitions: none projected on Railiance01 today; hot DB
+  reload path for event definitions remains covered by T03 unit tests and an
+  unchanged event-router deployment.
+
+Automation: `scripts/smoke_admin_sync_no_restart.py`. Runbook section added
+under "Railiance01 no-restart smoke".
--- a/workplans/ACTIVITY-WP-0013-reuse-surface-report-gaps-resolver.md
+++ b/workplans/ACTIVITY-WP-0013-reuse-surface-report-gaps-resolver.md
@@ -0,0 +1,78 @@
+---
+id: ACTIVITY-WP-0013
+type: workplan
+title: "Reuse Surface Report Gaps Resolver"
+domain: custodian
+repo: activity-core
+status: finished
+owner: codex
+topic_slug: activity-core
+created: "2026-06-18"
+updated: "2026-06-18"
+state_hub_workstream_id: "01e68dfd-b146-4aef-a575-2d3b178ca5c2"
+---
+
+# Reuse Surface Report Gaps Resolver
+
+Implement the R2 handoff from kaizen-agentic (`bffa224c`) so the
+`reuse_surface_report_gaps` shell context source populates
+`context.gaps` for the Coulomb daily registry hygiene sweep.
+
+## Register Shell Resolver Query
+
+```task
+id: ACTIVITY-WP-0013-T01
+status: done
+priority: high
+state_hub_task_id: "a6e1fc5c-7b42-436d-914e-4d605cb6f329"
+```
+
+Add a dedicated reuse-surface context resolver module and register
+`reuse_surface_report_gaps` on the `shell` resolver path while preserving
+the existing kaizen shell query behavior.
+
+## Implement Batch And Signal Semantics
+
+```task
+id: ACTIVITY-WP-0013-T02
+status: done
+priority: high
+state_hub_task_id: "229cf285-8388-471d-95fd-08400db1553e"
+```
+
+Load the Coulomb rollout roster, select active repos with a persisted
+round-robin cursor, resolve repo roots from State Hub host paths, run
+`reuse-surface report gaps --format json`, and emit gap records for the
+enabled registry hygiene signals.
+
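The persisted round-robin cursor described above can be sketched as follows. This is a minimal illustration, not the resolver's actual code: `select_batch`, its parameters, and the wrap-around rule are assumptions made for the example.

```python
from typing import Sequence


def select_batch(
    repos: Sequence[str], cursor: int, batch_size: int
) -> tuple[list[str], int]:
    """Pick up to batch_size repos starting at cursor, wrapping around.

    Returns the batch and the cursor value to persist for the next sweep.
    (Hypothetical helper; the real resolver may track its cursor differently.)
    """
    n = len(repos)
    if n == 0:
        return [], 0
    count = max(0, min(batch_size, n))  # never visit a repo twice in one batch
    start = cursor % n                  # tolerate a stale cursor after roster edits
    batch = [repos[(start + i) % n] for i in range(count)]
    return batch, (start + count) % n
```

Taking the cursor modulo the roster length keeps a stale persisted cursor usable after repos are added to or removed from the rollout.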
+## Cover Required And Optional Failure Modes
+
+```task
+id: ACTIVITY-WP-0013-T03
+status: done
+priority: high
+state_hub_task_id: "85b5c7d4-40e1-4945-8ada-1dff2363c194"
+```
+
+Ensure missing required dependencies fail visibly while optional resolver
+sources bind an empty `context.gaps` list. Add unit coverage for fixture
+rollout data, mocked CLI JSON, resolver binding, and `hygiene_signal`
+rule gating.
+
+## Smoke Real Coulomb Rollout
+
+```task
+id: ACTIVITY-WP-0013-T04
+status: done
+priority: medium
+state_hub_task_id: "6a5446ed-b4ec-4693-b508-65415571d834"
+```
+
+Run a live resolver smoke against
+`/home/worsch/coulomb-loop/loops/registry-hygiene/rollout.yaml` using a
+temporary round-robin cursor. The real active rollout produced five gaps,
+including one for `reuse-surface` with `hygiene_signal: stale_sbom`.
+The smoke supplied `reuse_surface_bin:
+/home/worsch/reuse-surface/.venv/bin/reuse-surface` and
+`runner_host: bnt-lap001`; the worker environment or definition params must
+provide equivalent values before enabling the production sweep.
--- a/workplans/ACTIVITY-WP-0014-schedule-misfire-robustness.md
+++ b/workplans/ACTIVITY-WP-0014-schedule-misfire-robustness.md
@@ -0,0 +1,194 @@
+---
+id: ACTIVITY-WP-0014
+type: workplan
+title: "Schedule Misfire Robustness & Run-Miss Recovery Options"
+domain: infotech
+repo: activity-core
+status: finished
+owner: claude
+topic_slug: activity-core
+created: "2026-06-23"
+updated: "2026-06-24"
+status_note: "T01-T05 complete; beachhead-endpoint adoption split to ACTIVITY-WP-0015"
+state_hub_workstream_id: "91b64686-5d17-4c86-bc9e-3d0ee6720cf5"
+---
+
+# Schedule Misfire Robustness & Run-Miss Recovery Options
+
+Make cron-triggered ActivityDefinitions robust to missed fires (worker/Temporal
+unavailable at trigger time) with explicit, per-definition recovery behaviour,
+plus detection/alerting when a scheduled fire is missed.
+
+## Motivation
+
+On 2026-06-22 and 2026-06-23 the `daily-statehub-wsjf-triage` definition
+(cron `20 7 * * *` Europe/Berlin, projected into the Railiance runtime ConfigMap
+`actcore-external-activity-definitions`) produced **no `daily_triage` progress
+event at all** — neither a success nor a `could not run; operator review
+required` failure.
+
+> **Corrected by T01 (2026-06-23).** The initial hypothesis below — that
+> `_build_schedule()` never set `catchup_window`, so a short-default catchup
+> window silently dropped the fire — was **disproven on the live cluster**. The
+> Temporal schedule is healthy with `CatchupWindow 365d` (the server default) and
+> `0 MissedCatchupWindow`. The real cause is that the run **fired and ran but
+> failed at the report sink** with `Connection refused` posting to State Hub,
+> because railiance01 reaches State Hub via a reverse tunnel back to the
+> workstation, which is asleep at 07:20 Berlin. See the T01 findings and T05.
+
+The trigger now originates entirely on **railiance01** (in-cluster Temporal
+Schedule, ConfigMap-projected definition) and is **not** laptop-dependent — but
+the triage's State Hub *data dependencies* (context resolution and report
+delivery) still route back to the workstation State Hub.
+
+This workplan still delivers worthwhile robustness — explicit run-miss recovery
+policies (T02) and missed-fire detection (T03) — but the fix for *this* incident
+is T05 (resilient sinks/resolvers + a workstation-independent State Hub endpoint).
+
+## Desired run-miss options (from Bernd)
+
+Three explicit, per-definition behaviours when a fire is missed:
+
+1. **Run on trigger or skip** — never recover a missed fire.
+2. **Run on trigger or later if missed** — recover **all** missed fires when back up.
+3. **Run on trigger or later if missed, but skip if next trigger reached** —
+   recover only the **most recent** missed fire; do not accumulate a backlog.
+
+Proposed mapping to a new `misfire_policy` value set (names open to review):
+
+| Policy | Semantics | Temporal mapping |
+| --- | --- | --- |
+| `skip` | Run on trigger or skip | `catchup_window ≈ 0`, `overlap=SKIP` |
+| `catchup_all` | Run on trigger or all missed later | `catchup_window=<long>`, `overlap=BUFFER_ALL` |
+| `catchup_latest` | Run on trigger or only the latest missed | `catchup_window ≈ 1 interval`, `overlap=BUFFER_ONE` |
+
+## Confirm root cause on Railiance01
+
+```task
+id: ACTIVITY-WP-0014-T01
+status: done
+priority: high
+state_hub_task_id: "c90ff214-9214-48c7-96b9-7d699528d5ab"
+```
+
+Inspected via `ssh railiance01` + in-node `kubectl`/`temporal` (no k3s tunnel is
+defined for railiance01; the documented access path is SSH to the host).
+
+**Findings (2026-06-23) — the WP-0014 premise was wrong for this incident:**
+
+- All pods healthy; `actcore-worker` up 44h, 0 restarts. Not a crash.
+- The daily-triage Temporal schedule (`activity-schedule-6fca51fa-…`) is
+  **healthy**: `Paused false`, `OverlapPolicy Skip`, **`CatchupWindow 365d`**
+  (Temporal's *default* when unset), `ActionCounts {Total:8, MissedCatchupWindow:0}`.
+  So fires were **not** silently dropped — my original "no catchup window → silent
+  drop" hypothesis does not hold; the server default is already 365d.
+- The `2026-06-23T05:20:00Z` fire **did fire and ran**, then **Failed at the report
+  sink**: `report sink failure: state-hub-progress … '[Errno 111] Connection
+  refused'`. The run produced a report but could not deliver it to State Hub, so
+  no `daily_triage` progress event (not even a "could not run" one) was posted →
+  the silence. The 06-22 fire has no execution in retention (the bridge was likely
+  down then too, or the fire fell in the schedule-update window: `LastUpdateAt 1d ago`).
+- Root cause is **State Hub connectivity from railiance01**, not Temporal. The
+  in-cluster `actcore-state-hub-bridge` (`hostNetwork`) proxies to
+  `127.0.0.1:18000` on the node — the local end of the ops-bridge **reverse tunnel
+  back to the workstation's State Hub**. At 07:20 Europe/Berlin (= 05:20 UTC) the
+  workstation/tunnel was unreachable → `Connection refused`. Chronic flakiness
+  confirmed: 102 State Hub resolver timeouts in 24h (69 `recently_on_scope`,
+  33 `consistency_sweep`).
+
+**Implication:** the trigger *is* independent of the laptop, but the triage's
+**data dependencies (State Hub context resolution + report delivery) still route
+back to the workstation State Hub**, which is asleep at 07:20 Berlin. WP-0014's
+misfire policies are still good robustness, but the real fix is (a) State Hub
+reachable from railiance01 independent of the workstation, and/or (b) sinks/
+resolvers resilient to transient State Hub unavailability (retry/backoff,
+store-and-forward) instead of hard-failing the workflow. Tracked as follow-up
+below. Backfill deferred: a replay only succeeds while the workstation State Hub
+is reachable.
+
+## Implement explicit misfire recovery modes
+
+```task
+id: ACTIVITY-WP-0014-T02
+status: done
+priority: high
+state_hub_task_id: "19615562-4cb2-4f25-872f-505d6e40dcc5"
+```
+
+Add `catchup_window_seconds` to `CronTriggerConfig` and redefine `misfire_policy`
+into the three explicit modes above. In `_build_schedule()` set
+`SchedulePolicy(overlap=..., catchup_window=timedelta(...))` per mode. Remove the
+ad-hoc 1-hour `backfill` hack in favour of native catchup-window semantics. Keep
+backward compatibility for existing `skip`/`catchup`/`compress` values (alias
+map). Unit tests for each mode's `(catchup_window, overlap)` mapping.
+
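A minimal sketch of the per-mode `(catchup_window, overlap)` mapping from the policy table, assuming nothing about the real `_build_schedule()` internals. `Overlap` is a local stand-in for Temporal's overlap policy enum, and `MisfireSettings`, `misfire_settings`, and the legacy alias pairings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Overlap(Enum):
    # Local stand-in for Temporal's schedule overlap policy.
    SKIP = "skip"
    BUFFER_ONE = "buffer_one"
    BUFFER_ALL = "buffer_all"


@dataclass(frozen=True)
class MisfireSettings:
    catchup_window: timedelta
    overlap: Overlap


# Assumed pairing of legacy values to the new modes; the workplan only says
# that an alias map exists, not which old value maps to which new one.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}


def misfire_settings(
    policy: str,
    interval: timedelta,
    long_window: timedelta = timedelta(days=365),
) -> MisfireSettings:
    """Translate a misfire_policy value into schedule policy settings."""
    policy = LEGACY_ALIASES.get(policy, policy)
    if policy == "skip":
        # "catchup_window ≈ 0" in the table; the exact floor is not specified.
        return MisfireSettings(timedelta(0), Overlap.SKIP)
    if policy == "catchup_all":
        return MisfireSettings(long_window, Overlap.BUFFER_ALL)
    if policy == "catchup_latest":
        # Roughly one interval, so only the most recent missed fire recovers.
        return MisfireSettings(interval, Overlap.BUFFER_ONE)
    raise ValueError(f"unknown misfire_policy: {policy!r}")
```

For a daily cron, `misfire_settings("catchup_latest", timedelta(days=1))` yields a one-day window with `BUFFER_ONE`, which matches the `CatchupWindow 1d` and `BufferOne` values reported after the T04 deployment.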
+## Missed-fire detection & alert sink
+
+```task
+id: ACTIVITY-WP-0014-T03
+status: done
+priority: medium
+state_hub_task_id: "dbedd96a-59ca-4b83-bce6-35755b076807"
+```
+
+Detect when a scheduled definition has no successful run within its expected
+interval + tolerance, and emit a signal (State Hub progress event and/or
+agent-inbox message) so a miss is visible even under `skip`. This is the
+observability the current silent-drop behaviour lacks — a miss should never again
+be invisible.
+
+## Apply policy to runtime definitions & document
+
+```task
+id: ACTIVITY-WP-0014-T04
+status: done
+priority: medium
+state_hub_task_id: "04e9d1d2-1192-4402-9402-b12c5d7d44e5"
+```
+
+Set `misfire_policy: catchup_latest` for `daily-statehub-wsjf-triage` and
+documented the run-miss options in `docs/runbook.md`.
+
+**Deployed & verified to railiance01 (2026-06-24):** built
+`activity-core:railiance01-prod` with the WP-0014 code (T02/T03/T05), imported into k3s
+containerd, applied the ConfigMap, rolled `actcore-worker`/`api`/`event-router`
+onto the new image, and ran `/admin/sync` (6 defs, 4 schedules upserted, 0
+errors). The live Temporal schedule now reports `OverlapPolicy BufferOne` +
+`CatchupWindow 1d` (= `catchup_latest`); pods healthy, API `db:true temporal:true`.
+
+## Keep activity-core thin under the State Hub beachhead model
+
+```task
+id: ACTIVITY-WP-0014-T05
+status: done
+priority: high
+state_hub_task_id: "b7e5b877-1b09-421c-a04e-78f785dc00a1"
+```
+
+**Architecture decision (Bernd, 2026-06-23):** the resilience that this incident
+needs — queuing writes and caching reads while State Hub is unreachable — must
+**not** be a burden carried by client repos. It belongs to State Hub as a
+**per-machine local "beachhead"** (transparent read cache + write outbox, possibly
+with State-Hub federation), owned by custodian/state-hub. It handles all three
+failure modes: network interruption, central State Hub crash, central machine
+down. This is handed off to state-hub (see the coordination message / proposal);
+**do not build client-side queue/cache logic in activity-core.**
+
+activity-core's only responsibilities under this model are thin:
+
+- **Idempotent writes — DONE (2026-06-23, in-repo):** added
+  `activity_core/state_hub_write` (`idempotency_headers`); every State Hub write
+  (report-sink, ops-evidence, schedule-miss) now sends a stable `Idempotency-Key`
+  header derived from `run_id:instruction_id:event_type`. The read-based
+  `_progress_exists` dedup is now best-effort (returns `False` on connection
+  error instead of hard-failing), so the guarantee lives on the keyed write, not
+  a live read. Tests in `tests/test_state_hub_write.py`; documented in
+  `docs/runbook.md`.
+- **Adopt the beachhead endpoint — MOVED to [[ACTIVITY-WP-0015]]:** pointing
+  `STATE_HUB_URL` at the local beachhead and retiring the bespoke
+  `actcore-state-hub-bridge` proxy depend on the state-hub beachhead existing
+  first. Split into WP-0015 (status `blocked`) so this workplan can close on its
+  completed in-repo work rather than waiting on an external capability.
+
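As an illustration of the idempotent-write bullet, a stable key can be derived from the `run_id:instruction_id:event_type` triple. This is a sketch only: `sketch_idempotency_headers` is a hypothetical name, and hashing the triple with SHA-256 is an assumption (the real `idempotency_headers` may send the joined string as-is).

```python
import hashlib


def sketch_idempotency_headers(
    run_id: str, instruction_id: str, event_type: str
) -> dict[str, str]:
    """Build a deterministic Idempotency-Key header for a State Hub write."""
    raw = f"{run_id}:{instruction_id}:{event_type}"
    # Assumption: hash the triple so the header has a fixed, opaque shape.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return {"Idempotency-Key": digest}
```

The key contains no timestamp or random component, so a retried or replayed write for the same run, instruction, and event type carries the same key and can be deduplicated on the receiving side.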
+T05 is done as far as activity-core can act now; the external-dependent adoption
+lives in WP-0015.
--- a/workplans/ACTIVITY-WP-0015-adopt-statehub-beachhead-endpoint.md
+++ b/workplans/ACTIVITY-WP-0015-adopt-statehub-beachhead-endpoint.md
@@ -0,0 +1,54 @@
+---
+id: ACTIVITY-WP-0015
+type: workplan
+title: "Adopt State Hub Beachhead Endpoint"
+domain: infotech
+repo: activity-core
+status: blocked
+owner: claude
+topic_slug: activity-core
+created: "2026-06-24"
+updated: "2026-06-24"
+state_hub_workstream_id: "bbc07f9e-9323-4b2b-b556-c33b37d0b228"
+---
+
+# Adopt State Hub Beachhead Endpoint
+
+Carries the **blocked remainder** of [[ACTIVITY-WP-0014]] T05. The in-repo half
+(idempotency-keyed State Hub writes) shipped in WP-0014; this workplan is the
+client-side adoption that depends on the state-hub-owned **beachhead** capability
+(per-machine read cache + write outbox) existing first.
+
+**Blocked on:** the state-hub beachhead (proposal sent to the `state-hub` agent,
+2026-06-23). Do not build queue/cache logic in activity-core — see
+[[statehub-beachhead-principle]].
+
+## Point STATE_HUB_URL at the beachhead
+
+```task
+id: ACTIVITY-WP-0015-T01
+status: wait
+priority: medium
+state_hub_task_id: "76b6132d-394a-4a67-bef6-73bb9d1e277e"
+```
+
+Once the state-hub beachhead exposes a local endpoint, point activity-core's
+`STATE_HUB_URL` (and the railiance runtime config) at it and verify reads are
+served from cache and writes are queued/flushed correctly when central State Hub
+is unreachable. Confirm idempotency-keyed writes dedup on flush (no duplicate
+`daily_triage`/progress events).
+
+## Retire the bespoke actcore-state-hub-bridge proxy
+
+```task
+id: ACTIVITY-WP-0015-T02
+status: wait
+priority: medium
+state_hub_task_id: "526c2129-cbf7-4531-a319-aebfc75cc6a3"
+```
+
+Remove the inline `hostNetwork` HTTP proxy `actcore-state-hub-bridge` from
+`k8s/railiance/20-runtime.yaml` — it is a primitive precursor of the beachhead
+and should be replaced by the state-hub-owned component, not extended. Re-verify
+the daily triage end-to-end after cutover, including an overnight scheduled run
+while the workstation is asleep (the original failure condition).
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -0,0 +1,434 @@
+---
+id: ACTIVITY-WP-0016
+type: workplan
+title: "LLM Output Robustness & The Producer Trust Boundary"
+domain: custodian
+repo: activity-core
+status: finished
+owner: codex
+topic_slug: custodian
+created: "2026-06-26"
+updated: "2026-06-30"
+state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726"
+---
+
+# ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary
+
+## Context
+
+On 2026-06-26 the scheduled `daily-statehub-wsjf-triage` instruction fired on
+time (`daily_triage` event 05:20:57Z) but its output **failed schema
+validation**: `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The
+model emitted a long ranked WSJF recommendation list (reached rank 7+ with
+nested `wsjf` objects) and the JSON broke deep in that list. Because the report
+is a single monolithic JSON document, one malformed delimiter discarded the
+**entire** run. This reset the three-clean-consecutive-scheduled-runs streak in
+`ACTIVITY-WP-0006-T03` (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the
+LLM-output-quality surface deferred from `ACTIVITY-WP-0010`.
+
+The scheduling/runtime layer is healthy — this is purely an output-robustness
+and boundary-design problem. Today's code (`src/activity_core/rules/executor.py`)
+already: passes the output schema to llm-connect as a `json_schema` model param
+(`_llm_run_config`), retries once, runs a fenced/`raw_decode` tolerant parser
+(`_parse_json_output`), and preserves a bounded 4000-char preview on hard
+failure (`_invalid_output_report`). None of that helps when error locality is
+zero: the failure unit is the whole document, not the offending item.
+
+## Design Frame — The Producer Trust Boundary
+
+This workplan is anchored to a deliberate architectural stance, not just a bug
+fix. Capture it in an ADR (T04) so future work inherits it.
+
+**Premise.** activity-core has a *trust boundary* where free-form producer
+output meets strict deterministic consumers (JSON Schema validators, the task
+emitter, classic compute pipelines). The producers are **LLMs and humans (and
+agents acting for either)**. Both are *untrusted producers*: their output may be
+
+- **erroneous** — hallucination, truncation (token-limit cutoff), drift,
+  type slips, typos; or
+- **malicious** — prompt injection, crafted payloads, oversized/deeply-nested
+  structures aimed at exhausting or confusing the consumer.
+
+The architecture should treat the boundary as an adversarial frontier and place
+**guardrails + error-correction tooling there**, rather than letting raw
+producer output flow into deterministic consumers and fail (or worse, partially
+succeed) downstream.
+
+**Two non-fail-fast postures.** When we do *not* want to hard-fail on a problem,
+there are two sensible strategies — and they compose:
+
+- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
+  as-is; on exception, catch → repair → retry → or quarantine. Cheap on the
+  happy path. Blast radius depends entirely on how granular the catch is. Good
+  when failures are rare and locally recoverable. Risk: failures surface late,
+  possibly after partial side effects.
+- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
+  and normalize the output to a known-good shape *before* it enters the pipeline
+  — drop bad items, coerce types, bound sizes/depth, allow-list references — so
+  the consumer only ever sees clean input. Higher upfront cost, smaller blast
+  radius, no partial side effects. Good when failures are common or
+  consequences are high.
+
+**Governing principles for this repo:**
+
+1. **Push verification to the boundary; keep the interior strict.** Apply
+   posture **B** at the producer→consumer boundary (verify+mitigate structure);
+   keep posture **A** for residual exceptions inside the verified core. Never
+   relax the interior schema to absorb producer sloppiness.
+2. **Make error locality match the unit of work.** One bad recommendation must
+   cost one recommendation, not the whole report. Framing the payload so each
+   item is independently parseable is the single highest-leverage change.
+3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
+   provenance-tagged artifacts (index, error, raw snippet) so they can be
+   debugged or replayed — degraded-but-usable is distinct from total loss.
+4. **Both human and agent input get the same rigor.** Guardrails are
+   producer-agnostic: the same size/depth/count caps, reference allow-lists, and
+   truncation detection apply whether the producer is an LLM, an agent, or a
+   human form submission.
+
+## Reproduce & Root-Cause The Failure
+
+```task
+id: ACTIVITY-WP-0016-T01
+status: cancel
+priority: high
+state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138"
+```
+
+Recover the **full** raw llm-connect response for the 06-26 failure (the State
+Hub event keeps only a 4000-char preview; the break is at char 5268) and
+establish the precise cause.
+
+Done when:
+
+- the full raw response is pulled from the runtime llm-connect log / response
+  store and the exact offending token at char 5268 is identified;
+- `finish_reason` is captured to confirm or rule out token-limit **truncation**
+  vs a structural mid-stream glitch;
+- it is confirmed whether llm-connect actually **enforced** the `json_schema`
+  constrained-decoding hint or merely accepted it as advisory (this determines
+  whether the schema param is load-bearing);
+- the failing payload is captured as a regression fixture under `tests/`.
+
+2026-06-26 findings (local analysis on the workstation):
+
+- **Mechanism confirmed structurally.** There are **16 active workstreams**
+  org-wide and the triage instruction emits ~one ranked recommendation per
+  candidate. The preserved preview holds 7 fully-formed recommendations; the JSON
+  break is at char 5268 (~rank 8–9). The unbounded one-per-workstream list is the
+  structural cause — more items = more tokens = higher odds of a mid-stream JSON
+  slip and/or truncation. This directly justifies T02's bounded top-N + per-item
+  framing.
+- **Both attempts failed.** `executor._execute` retries once
+  (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the
+  **retry** output, so the model produced invalid JSON twice — not a one-off.
+- **activity-core discards the diagnostics needed to root-cause this.** Three
+  retention gaps mean the exact char-5268 token cannot be recovered from
+  activity-core data at all:
+  1. `LLMConnectClient.complete()` returns only `data["content"]`
+     (`llm_client.py:57-60`) — it drops `finish_reason`/`usage` from the
+     llm-connect HTTP response, so truncation-vs-structural cannot be
+     distinguished locally.
+  2. the report sink caps raw output at **4000 chars** (`_invalid_output_report`,
+     `executor.py:259`) — below the 5268 break.
+  3. the worker log caps the preview at **2000 chars** (`executor.py:175`).
+- **Remaining (remote, operator-owned).** Confirming the exact offending token
+  and `finish_reason` requires llm-connect's producer-side logs on `railiance01`
+  — cluster access, outside this repo's SCOPE for direct action. Truncation is
+  the leading hypothesis given the 16-item input, but the mitigation (T02/T03) is
+  identical either way, so T01 does not block the build work.
+- **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture
+  `finish_reason`/`usage` and persist a larger bounded raw artifact on validation
+  failure so this class of failure is never un-debuggable again.
+- Partial fixture saved:
+  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
+  (the 4000-char preview + validation error; full payload pending the remote pull).
+
+2026-06-30 local retention hardening: activity-core now preserves future
+llm-connect diagnostic metadata instead of dropping it at the client boundary.
+`LLMConnectClient.complete()` still returns the content string for compatibility,
+but records safe non-secret response fields such as `finish_reason` and `usage`
+on `last_response_metadata`; the executor copies that into report artifacts,
+State Hub progress detail, and working-memory notes. Invalid report raw previews
+were raised from 4000 to 12000 chars. This does not recover the historical
+06-26 full payload or producer-side `finish_reason`, so T01 remains in `wait` on
+the remote llm-connect log pull, but the retention gap is closed for future failures.
+
+## Schema + Prompt Redesign For Error Locality
+
+```task
+id: ACTIVITY-WP-0016-T02
+status: done
+priority: high
+state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
+```
+
+Redesign the daily-triage report contract so a single malformed item can no
+longer discard the whole report (principle #2).
+
+Done when:
+
+- the recommendation list is **bounded** (configurable top-N, default 5–7) in
+  both the prompt and the output schema — long lists are where the model drifts;
+- the report uses a **per-item-framed** shape (JSON Lines / NDJSON — one
+  recommendation object per line — or an equivalent delimited per-item form)
+  behind a minimal stable envelope (`summary` + framed items), so each item is
+  an independent parse unit;
+- the prompt explicitly states the contract, the per-item framing, the cap, and
+  an "if uncertain, emit fewer well-formed items rather than more" instruction;
+- `max_tokens` is set with headroom for the bounded list so truncation cannot
+  occur at the expected size;
+- the output schema file (`_load_output_schema` target) is updated to match.
+
+2026-06-26 progress (in-repo portion):
+
+- **Strict, bounded schema written** — `schemas/daily-triage-report.json` went
+  from `recommendations.items: {type: object}` (accept-anything) to a strict
+  per-item contract: `required [rank, candidate, action, why]` with typed
+  `wsjf` sub-fields, plus `maxItems: 7`. The strict item shape is what lets the
+  T03 boundary parser validate each recommendation independently.
+- **`maxItems` is a hint, not a hard reject** — the in-repo validator
+  (`_validate_schema_node`) only enforces `type`/`required`/`properties`/`items`
+  and ignores `maxItems`/`enum`. That is deliberate: a hard `maxItems` reject
+  would discard a whole 16-item report — the exact blast-radius bug WP-0016
+  removes. The bound is enforced via the prompt + the llm-connect `json_schema`
+  constraint hint + T03 mitigation (keep top-N by rank, quarantine extras).
+- **DEPLOY COUPLING (important):** this schema file is consumed *both* as the
+  llm-connect hint *and* by the current whole-document validator. Tightening
+  per-item `required` fields makes the existing whole-doc validation hard-fail
+  **more** until T03 replaces it with per-item quarantine. Therefore the schema
+  change MUST ship together with T03 — do not deploy the strict schema to the
+  runtime bundle ahead of the T03 parser. Four executor/instruction tests that
+  asserted the old loose contract were updated to the strict contract; the
+  forwarded-schema test now reads the live file instead of hard-coding it.
+- **Truncation hypothesis corroborated** — the instruction config carries
+  `max_tokens` of roughly 1200 (per the wiring test fixture). 5268 chars is about
+  1300–1500 tokens, so a cap of about 1200 tokens would truncate a 16-item list near
+  the observed break. This strengthens T01's leading hypothesis and makes the
+  `max_tokens` headroom change below concrete.
+
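The token estimate in the truncation bullet can be reproduced with a short calculation. The characters-per-token ratios used here (3.5 to 4.0) are a common rule of thumb and an assumption of this sketch; they are not measured from the model's tokenizer.

```python
def estimate_tokens(chars: int, chars_per_token: float) -> float:
    """Rough token count from a character count, given an assumed ratio."""
    return chars / chars_per_token


BREAK_OFFSET = 5268  # character offset of the JSON break in the 06-26 output
CAP = 1200           # approximate max_tokens from the wiring test fixture

low = estimate_tokens(BREAK_OFFSET, 4.0)   # fewer tokens when tokens are longer
high = estimate_tokens(BREAK_OFFSET, 3.5)  # more tokens when tokens are shorter
cap_exceeded = low > CAP                   # True if even the low estimate passes the cap
```

Under these ratios the break offset falls above the cap even at the low end of the range, which is consistent with the truncation hypothesis but does not prove it; `finish_reason` remains the decisive signal.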
+**Bundle handoff (NOT in this repo — runtime-projected definition).** The triage
+prompt and `max_tokens` live in the Railiance runtime bundle, not in repo files.
+Apply there:
+1. Instruct the model to emit a **bounded top-N** (≤ 7) list of ranked
+   recommendations: "if uncertain, emit fewer well-formed items rather than more."
+2. Specify the **per-item framing** the T03 parser will consume (NDJSON: a
+   leading summary object, then one recommendation JSON object per line).
+3. Raise **`max_tokens`** to give clear headroom for 7 framed items (eliminate
+   truncation at the expected size).
+4. State the value vocabularies (`action`, `confidence`) the T04 guardrails will
+   check.
+
+2026-06-30 live evidence check: the 2026-06-28 and 2026-06-29 scheduled
+`daily_triage` events validated successfully, which shows the runtime is no
+longer failing every day. However, the preserved State Hub reports still contain
+10 recommendations, not the requested bounded top-N of 7 / framed item contract.
+Treat that as evidence that the runtime-projected prompt/schema/max-token bundle
+has not fully absorbed the T02 handoff yet.
+
+2026-06-30 source projection closeout: patched `k8s/railiance/20-runtime.yaml`
+so the projected `daily-statehub-wsjf-triage.md` prompt now says at most 7
+recommendations and instructs the model to emit fewer well-formed items rather
+than more. The projected `daily-triage-report.json` now has `maxItems: 7` and
+`rank.maximum: 7`, aligned with the repo schema. `max_tokens: 1800` remains as
+headroom for the bounded report. T02 is done in source; live deployment and an
+observed <=7 recommendation run remain under T05.
+
+## Boundary Parser — Verify & Mitigate (Posture B)
+
+```task
+id: ACTIVITY-WP-0016-T03
+status: done
+priority: high
+state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709"
+```
+
+Implement item-granular parsing with a quarantine lane in
+`src/activity_core/rules/executor.py`, applying posture **B** at the boundary
+(principles #1–#3).
+
+Done when:
+
+- the parser splits the envelope from the framed items, then parses **each item
+  independently**; a malformed item is routed to a bounded `quarantined_items`
+  artifact (index + validation error + raw snippet), not raised;
+- a run with some valid and some invalid items emits a report over the surviving
+  valid items with `output_validated=true`, plus `partial=true` and
+  `quarantined_count` / `quarantined_items` markers — degraded-but-usable is
+  reported distinctly from total loss;
+- a best-effort **repair** pass (close unterminated brackets/quotes, recover the
+  valid prefix) is attempted per item before quarantining it;
+- truncation detected in T01 is handled as its own signal (recover whole items
+  emitted before the cutoff rather than failing the document);
+- the existing monolithic-document path remains as the fallback when framing is
+  absent (backward compatible with task-only instructions).
+
+2026-06-26 progress (implemented in `src/activity_core/rules/executor.py`):
+
+- **Resilient recovery wired into `_execute`.** When the whole-document parse and
+  its one retry both fail, report instructions (those with `report_sinks`) now run
+  `_resilient_report` *before* the total-loss `_invalid_output_report`. If it
+  recovers ≥1 valid item it returns a partial report; otherwise it returns None
+  and the prior total-loss path is preserved unchanged.
+- **Brace/quote-aware object scanner, not line-splitting.** The real 06-26 output
+  was pretty-printed (multi-line objects), so naive NDJSON line recovery would
+  have failed. `_extract_object_spans` walks the `recommendations` array
+  brace-depth- and string-aware, so it recovers each recommendation object
+  whether pretty-printed across many lines *or* emitted one-per-line (NDJSON).
+  The truncated trailing object is returned with `complete=False`.
+- **Layered mitigation per item:** `json.loads` → on failure for a truncated
+  tail, a best-effort `_try_repair` (balance open string/brackets/braces) →
+  then `_partition_items` validates each recovered object against the T02 item
+  schema. Valid items survive; malformed or over-`maxItems` items are
+  quarantined with provenance (`index`, `error`, `raw` snippet, `reason`).
+- **Report shape on degradation:** `output_validated=True` over the survivors,
+  `review_required=True`, `partial=True`, `quarantined_count`, and a bounded
+  `quarantined_items` list (cap 20). Degraded-but-usable is now reported
+  distinctly from total loss.
+- **Verified against the real failure shape.** New tests reconstruct a
+  pretty-printed report with 7 valid recommendations + a truncated tail (the
+  06-26 shape) and a one-bad-item-among-valid case. The 7-item run now recovers
+  all 7 and quarantines the broken tail (previously: whole run discarded);
+  log line `instruction_output_recovered: kept=7, quarantined=1`. The bad-item
+  run keeps 2 and quarantines the rank-less one.
+- **Deferred to T04 (clean scope boundary):** enforcing `maxItems` top-N on the
+  *happy* path (valid JSON, all items schema-valid, but > N items) — the resilient
+  path only runs on failure, so over-limit-on-success is a guardrail/count-cap
+  concern, which is exactly T04's remit.
+
+## Producer Guardrails + ADR-004
+
+```task
+id: ACTIVITY-WP-0016-T04
+status: done
+priority: medium
+state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9"
+```
+
+Write the architecture decision record and add the producer-agnostic guardrails
+(principle #4).
+
+Done when:
+
+- `docs/adr/adr-004-producer-trust-boundary.md` documents the trust boundary,
+  the untrusted-producer premise (erroneous **and** malicious; human and agent),
+  the A vs B taxonomy and where each applies, the error-locality principle, and
+  the quarantine-with-provenance rule;
+- boundary guardrails are enforced at the consumer edge: max item **count**, max
+  string length, max nesting **depth**, and a **reference allow-list** (e.g. a
+  recommendation `candidate` / a task `target_repo` must resolve to a known
+  workstream/repo before it is acted on);
+- guardrail rejections are quarantined with provenance, consistent with T03;
+- SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance
+  changes the documented contract.
+
+2026-06-26 progress:
+
+- **ADR-004 written** — `docs/adr/adr-004-producer-trust-boundary.md` documents
+  the untrusted-producer premise (erroneous + malicious; LLM/agent/human), the
+  A-vs-B posture taxonomy, the four governing principles, the concrete
+  activity-core mechanisms, a posture-by-layer table, consequences, and
+  alternatives considered. Accepted, scope cross-repo.
+- **Producer guardrails implemented** in `executor.py`, applied uniformly on the
+  happy path *and* the recovery path via `_partition_items`: per-item order is
+  structural-type → schema → structural caps (`_MAX_DEPTH=8`,
+  `_MAX_STRING_LEN=4000`) → reference allow-list → count cap (`maxItems`). Each
+  quarantine carries a `reason` (`malformed`/`schema`/`guardrail`/`allow_list`/
+  `over_limit`).
+- **Happy-path count cap closed** (the item deferred from T03): a syntactically
+  valid 9-item report now keeps 7 and quarantines 2 as `over_limit`, emitting a
+  `partial` report — without a retry.
+- **Reference allow-list wired but inert.** `_allow_list_from_context` reads
+  `context["known_candidates"]`; when present, recommendations with an unknown
+  `candidate` are quarantined (`reason: allow_list`). The key is absent today,
+  so the check is inert; activating it is a one-line context-resolver change.
+  This keeps the guardrail producer-agnostic (principle #4) and ready to enable.
+- **SCOPE.md updated** — instruction-executor bullet now names the quarantine
+  lane + guardrails; ADR-004 added to the Architecture Decisions list. No INTENT
+  drift: this hardens the existing output contract, it does not extend scope.
+- New tests: happy-path count cap, oversized-string guardrail, allow-list
+  rejection (all green).
+
+## Tests + Calibration Re-Entry
+
+```task
+id: ACTIVITY-WP-0016-T05
+status: done
+priority: high
+state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f"
+```
+
+Prove the new posture and hand back to the calibration gates.
+
+Done when:
+
+- regression tests cover: the captured 06-26 payload, a truncated-mid-list
+  payload, a one-bad-item-among-good payload (asserts quarantine + partial), an
+  oversized/over-deep payload (asserts guardrail rejection), and an
+  injection-shaped reference (asserts allow-list rejection);
+- the full suite passes and the result is recorded here with the count;
+- a daily-triage smoke against the live runtime shows a previously-failing
+  payload now **degrades gracefully** (valid items delivered, bad items
+  quarantined) instead of discarding the run;
+- a progress note hands back to `ACTIVITY-WP-0010-T04` and `ACTIVITY-WP-0006-T03`
+  that the output-robustness blocker is cleared so the three-clean-run gate can
+  resume on its own.
+
+2026-06-26 progress (in-repo portion complete):
+
+- **Regression coverage complete.** Across T03/T04/T05: truncated-mid-list,
+  one-bad-item-among-good (quarantine + partial), oversized-string and over-depth
+  guardrail rejection, allow-list (injection-shaped) rejection, happy-path count
+  cap, and a test driving the **actual captured 2026-06-26 payload**
+  (`tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`)
+  — it now recovers 6+ valid recommendations and quarantines the truncated tail,
+  where before it discarded the whole run.
+- **Full suite green:** 218 passed, 1 skipped, as recorded at T04; the T05
+  fixture and over-depth tests add to that count (see the commit).
+- **Hand-back notes posted** to `ACTIVITY-WP-0006-T03` (State Hub event
+  `b6b8c2b8`) and `ACTIVITY-WP-0010-T04` (`b813f0dc`).
+- **Remaining (remote, operator-owned):** the live daily-triage smoke on
+  `railiance01` proving end-to-end graceful degradation. It depends on deploying
+  the T02 bundle prompt/`max_tokens`/NDJSON changes together with this code, which
+  is cluster/operator work outside this repo's SCOPE. T05 therefore stays
+  `progress` until that live run exists; the in-repo deliverables are done.
+
+2026-06-30 follow-up: added forward-looking diagnostics so future validation
+failures carry llm-connect response metadata and a larger bounded raw-output
+preview in activity-core-owned evidence. Focused verification passed:
+`uv run pytest tests/test_llm_client.py tests/rules/test_executor.py tests/test_report_sinks.py -q`
+=> 39 passed. This improves future root-cause ability but does not replace the
+required live smoke proving graceful degradation on railiance01.
+
+2026-06-30 projection follow-up: local source projection now enforces the top-7
+prompt/schema contract. Remaining T05 proof is operational: deploy or sync the
+updated `k8s/railiance/20-runtime.yaml`, run `actcore-sync`/schedule smoke or wait
+for the next 07:20 Berlin fire, then confirm State Hub `daily_triage` evidence is
+`output_validated=true` with no more than 7 recommendations.
+
+## Relationships
+
+- **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and
+  `ACTIVITY-WP-0010-T04` (collect three clean scheduled runs) — both stalled on
+  the same output-quality failure this workplan removes.
+- **References:** `ACTIVITY-WP-0009` (scheduled-run trust gap).
+- **Boundary discipline:** keeps activity-core inside its SCOPE — this hardens
+  the instruction-executor output contract; it does not move provider
+  credentials, cluster reconciliation, or task lifecycle into this repo.
+
+
+## Closure 2026-07-02 (RAIL-BS-WP-0008 live deploy)
+
+- T05 done: the robustness bundle (strict per-item schema + T03 quarantine
+  parser + bounded top-7/NDJSON runtime prompt, activity-core `7612112`) was
+  deployed to railiance01 and live-proven. A manually triggered daily-triage
+  run produced a clean schema-valid report with exactly 7 ranked
+  recommendations: State Hub event `24d2d321-c761-47f7-bf9e-7950a6253c21`,
+  `output_validated=true`, working memory written. Calibration re-entry: the
+  three-clean-run streak (WP-0006-T03 / WP-0010-T04) restarts from this run.
+- T01 cancelled: the raw 2026-06-26 llm-connect response is unrecoverable
+  (stateless pod, no response store, log stream holds only 2 startup lines
+  since 2026-06-19). Root cause stands on the retained 4000-char preview and
+  break-at-char-5268 evidence: output exceeded the old ~1200-token budget and
+  truncated mid-JSON. The deployed mitigation (1800-token headroom, bounded
+  top-7, per-item recovery) addresses exactly that failure mode.
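
Sketch of the T04 per-item guardrail order described above (structural type, then schema, then structural caps, then reference allow-list, then count cap). This is an illustration only, not the code in `executor.py`: the function names, the `REQUIRED_KEYS` stand-in for the item schema, and the quarantine record shape are assumptions. Only the cap values (`_MAX_DEPTH=8`, `_MAX_STRING_LEN=4000`), the top-7 limit, the check order, and the reason labels are taken from the workplan notes.

```python
from typing import Any, Iterable, Optional

_MAX_DEPTH = 8            # cap values quoted in the T04 progress note
_MAX_STRING_LEN = 4000
REQUIRED_KEYS = {"candidate", "rationale"}  # hypothetical stand-in for the item schema


def _depth(value: Any) -> int:
    """Nesting depth of dicts/lists; scalars have depth 0."""
    if isinstance(value, dict):
        return 1 + max((_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((_depth(v) for v in value), default=0)
    return 0


def _strings(value: Any) -> Iterable[str]:
    """Yield every string value nested anywhere inside the item."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from _strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from _strings(child)


def _reject_reason(item: Any, known_candidates: Optional[set]) -> Optional[str]:
    """Return the first failing check's reason label, or None if the item passes."""
    if not isinstance(item, dict):
        return "malformed"
    if not REQUIRED_KEYS.issubset(item) or not isinstance(item["candidate"], str):
        return "schema"
    if _depth(item) > _MAX_DEPTH or any(len(s) > _MAX_STRING_LEN for s in _strings(item)):
        return "guardrail"
    # The allow-list is inert when no known candidates are supplied.
    if known_candidates is not None and item["candidate"] not in known_candidates:
        return "allow_list"
    return None


def partition_items(items, max_items=7, known_candidates=None):
    """Split items into (kept, quarantined); the count cap is applied last."""
    kept, quarantined = [], []
    for index, item in enumerate(items):
        reason = _reject_reason(item, known_candidates)
        if reason is None and len(kept) >= max_items:
            reason = "over_limit"
        if reason is None:
            kept.append(item)
        else:
            quarantined.append({"index": index, "reason": reason, "item": item})
    return kept, quarantined
```

With nine otherwise valid items and `max_items=7`, this sketch keeps the first seven and quarantines the last two as `over_limit`, which matches the happy-path behaviour the T04 note reports.
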
--- a/workplans/ACTIVITY-WP-0017-core-hub-ops-evidence-sink.md
+++ b/workplans/ACTIVITY-WP-0017-core-hub-ops-evidence-sink.md
@@ -0,0 +1,58 @@
+---
+id: ACTIVITY-WP-0017
+type: workplan
+title: "Core Hub ops evidence sink"
+domain: infotech
+repo: activity-core
+status: finished
+owner: codex
+topic_slug: custodian
+created: "2026-06-27"
+updated: "2026-06-27"
+state_hub_workstream_id: "2a073bf4-febf-433e-a721-5daf71760912"
+---
+
+# Core Hub ops evidence sink
+
+## Goal
+
+Provide the activity-core side of the Core Hub replacement evidence path for
+`CORE-WP-0008-T03`, without depending on the legacy Haskell Inter-Hub sink and
+without placing secret material in activity definitions, logs, State Hub, or
+chat.
+
+## Task: Add Core Hub interaction-event sink
+
+```task
+id: ACTIVITY-WP-0017-T01
+status: done
+priority: high
+state_hub_task_id: "32aab1af-6be5-4b52-afa1-c11f52c65892"
+```
+
+Add a `core-hub-interaction-event` ops evidence sink that posts sanitized
+ops-inventory probe evidence to Core Hub `/api/v2/interaction-events`, verifies
+the created event is visible, and reports only non-secret ids/statuses.
+
+Acceptance:
+
+- runtime token is read through `CORE_HUB_RUNTIME_TOKEN_FILE` or a named
+  environment variable, never from workplan content;
+- sink configuration accepts `CORE_HUB_BASE_URL` and a widget id or widget
+  mapping;
+- emitted metadata reuses the existing compact/sanitized probe evidence path;
+- missing Core Hub config skips cleanly with explicit non-secret missing keys;
+- tests prove the POST/visibility check and secret non-disclosure.
+
+Verification 2026-06-27: `tests/test_ops_evidence_sinks.py` passed, and
+a disposable local Core Hub runtime accepted an activity-core
+`core-hub-interaction-event` sink emission, then listed the created
+`ops-endpoint-verified` event back through `/api/v2/interaction-events`.
+The verification asserted sanitized metadata did not include response body,
+authorization header, URL userinfo, or token query material.
+
+Completed 2026-06-27: implemented the Core Hub interaction-event sink in
+`activity_core.ops_evidence_sinks` with unit coverage for POST/visibility
+verification, missing config behavior, and secret non-disclosure. This provides
+the direct Core Hub consumer path needed by `CORE-WP-0008-T03`; deployed use
+still requires an approved Core Hub runtime token and widget id/mapping.
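
Sketch of the non-disclosure properties asserted in the 2026-06-27 verification above (no response body, authorization header, URL userinfo, or token query material in emitted metadata). This is a minimal illustration under stated assumptions, not the `activity_core.ops_evidence_sinks` implementation: the function names, the two key sets, and the flat metadata shape are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical key sets; the real sink may use different names.
SENSITIVE_QUERY_KEYS = {"token", "access_token", "api_key"}
DROPPED_METADATA_KEYS = {"response_body", "authorization"}


def sanitize_url(url: str) -> str:
    """Drop userinfo and token-like query parameters from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:                      # re-bracket IPv6 literals
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    query = urlencode(
        [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SENSITIVE_QUERY_KEYS
        ]
    )
    return urlunsplit((parts.scheme, host, parts.path, query, parts.fragment))


def sanitize_metadata(metadata: dict) -> dict:
    """Return a copy that is safe to emit: no body, no auth header, clean URL."""
    clean = {}
    for key, value in metadata.items():
        if key.lower() in DROPPED_METADATA_KEYS:
            continue
        if key == "url" and isinstance(value, str):
            value = sanitize_url(value)
        clean[key] = value
    return clean
```

Rebuilding the netloc from `hostname` and `port` rather than editing the original string means userinfo cannot leak through an unusual encoding of the `user:password@` prefix.
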
--- a/workplans/ACTIVITY-WP-0018-own-infra-automation-status.md
+++ b/workplans/ACTIVITY-WP-0018-own-infra-automation-status.md
@@ -0,0 +1,248 @@
+---
+id: ACTIVITY-WP-0018
+type: workplan
+title: "Own-infrastructure automation status surface"
+domain: infotech
+repo: activity-core
+status: finished
+owner: codex
+topic_slug: automation-observability
+created: "2026-06-29"
+updated: "2026-06-29"
+state_hub_workstream_id: "0220b38b-7c73-4601-9601-5f2c1a5b29e8"
+---
+
+# Own-infrastructure automation status surface
+
+## Goal
+
+Make activity-core's own scheduling and evidence infrastructure the explicit
+operating preference for durable automations, independent of any coding
+assistant-provided scheduler or reminder system.
+
+An operator should be able to answer a question like "How did our automations go
+since Friday?" with a repo-native command that does not require an LLM. Coding
+assistants may inspect or summarize that command's output, but they must not be
+the source of truth for scheduled execution, run history, or operational
+evidence.
+
+## Review notes
+
+The repo already owns the correct infrastructure direction:
+
+- `SCOPE.md` defines activity-core as the org-wide event bridge for cron,
+  one-off scheduled datetime, and event-triggered automation.
+- `Makefile` exposes sync and service targets, but no operator status target for
+  recent automation outcomes.
+- `docs/runbook.md` documents daily-triage verification through
+  `scripts/verify_daily_triage.py`, but that helper is activity-specific and
+  still reads like a checklist rather than the baseline answer surface for all
+  automations.
+- Existing workplan evidence shows the status question is operationally common:
+  2026-06-24 and 2026-06-25 daily triage runs were clean, while 2026-06-26 and
+  2026-06-27 fired on schedule but failed output validation. That distinction is
+  exactly what the baseline command must make obvious.
+
+## Task: Codify the own-infra scheduling preference
+
+```task
+id: ACTIVITY-WP-0018-T01
+status: done
+priority: high
+state_hub_task_id: "00127678-5ce4-4cb3-b81c-f42e04407c73"
+```
+
+Record the repository preference that durable automation scheduling, execution
+history, and run evidence belong to activity-core's own infrastructure: Temporal
+Schedules, NATS JetStream, activity-core run records, State Hub progress, and
+working-memory/report sinks.
+
+Acceptance:
+
+- `AGENTS.md` repo-specific instructions say not to use coding
+  assistant-provided automation tooling as the execution or evidence source for
+  activity-core automations.
+- `SCOPE.md` and `docs/runbook.md` describe coding assistants as callers or
+  summarizers of repo-native automation commands, not as schedulers.
+- The preference distinguishes durable automation from harmless local session
+  reminders: production/operational recurrence belongs to activity-core.
+- The text names the authoritative evidence sources and avoids tying the policy
+  to any one assistant product.
+
+2026-06-29 progress: Added the immediate repo-agent instruction in AGENTS.md
+that durable activity-core automations must use repo-owned infrastructure, not
+coding assistant automation/reminder/heartbeat tooling, as the execution or
+evidence source. Remaining T01 work is to carry the same preference into
+SCOPE.md and docs/runbook.md.
+
+## Task: Define the automation status evidence contract
+
+```task
+id: ACTIVITY-WP-0018-T02
+status: done
+priority: high
+state_hub_task_id: "17e6bb87-d4bf-4ef3-b91c-4bdfe2fe3492"
+```
+
+Define a small, deterministic report contract for answering recent automation
+status questions across all ActivityDefinitions.
+
+Acceptance:
+
+- The contract covers schedule state, expected fires in the requested window,
+  observed workflow runs, `activity_runs` rows, State Hub progress events,
+  working-memory/report sink evidence, and known validation or sink failures.
+- It defines normalized statuses such as `completed`, `running`, `retrying`,
+  `validation_failed`, `sink_failed`, `missed`, `disabled`, and `unknown`.
+- Partial data is explicit: if Temporal, Postgres, State Hub, or a sink path is
+  unavailable, the report includes warnings rather than silently passing or
+  failing the whole check.
+- The contract is safe for operator logs: no secrets, prompts, raw model output,
+  or credential-bearing URLs.
+- The contract can be emitted as JSON for scripts and rendered as concise text
+  for humans.
+
+## Task: Implement the non-LLM automation status CLI
+
+```task
+id: ACTIVITY-WP-0018-T03
+status: done
+priority: high
+state_hub_task_id: "7831f2fc-8b76-48fe-aa34-9dcc11ee84db"
+```
+
+Add a deterministic CLI, likely under `scripts/automation_status.py` or an
+`activity_core` module, that answers recent automation status questions without
+calling an LLM.
+
+Acceptance:
+
+- Supports `--since`, `--until`, activity name/id filters, JSON output, and a
+  concise human summary.
+- Accepts simple operator dates, including absolute dates and a documented
+  `friday`/`last-friday` style shortcut, resolving them to concrete dates in the
+  configured timezone.
+- Inspects all enabled scheduled ActivityDefinitions by default, not just daily
+  triage.
+- Uses live sources when configured: Postgres `activity_definitions` /
+  `activity_runs`, Temporal schedule and workflow visibility, State Hub
+  progress, and configured local report sink paths.
+- Degrades usefully when a source is unavailable and exits non-zero only for
+  real status failures or invalid input, not for optional evidence gaps that are
+  clearly reported.
+- Includes focused unit tests with fixture data for clean runs, validation
+  failures, missed runs, disabled schedules, and partial-source availability.
+
+## Task: Add the Make target baseline
+
+```task
+id: ACTIVITY-WP-0018-T04
+status: done
+priority: high
+state_hub_task_id: "451bdf62-b619-4ace-9262-46d20b912781"
+```
+
+Expose the CLI through a Make target that is easy for an operator or any coding
+assistant to run before attempting a prose summary.
+
+Acceptance:
+
+- `make automation-status SINCE=2026-06-26` prints the human-readable baseline.
+- `make automation-status SINCE=friday` is supported or documented with the
+  exact accepted shortcut.
+- A JSON form is available, either through `FORMAT=json` or a separate target
+  such as `make automation-status-json`.
+- The target does not require LLM credentials, coding assistant automation
+  tooling, or interactive prompts.
+- `make help` lists the target with a clear one-line description.
+
+## Task: Update operator docs and examples
+
+```task
+id: ACTIVITY-WP-0018-T05
+status: done
+priority: medium
+state_hub_task_id: "233659aa-e14a-4b3d-b156-d04f0fa16db6"
+```
+
+Update the runbook so "How did automations go since Friday?" has an obvious
+operator recipe.
+
+Acceptance:
+
+- `docs/runbook.md` has a short "Automation status" section near the scheduling
+  operations.
+- The docs include example output or a compact sample for the known daily
+  triage distinction: fired on time versus completed successfully versus output
+  validation failure.
+- The docs clarify that LLM summaries are optional convenience only; the Make
+  target output is the baseline evidence.
+- The daily-triage-specific helper is either kept as a lower-level diagnostic or
+  folded into the generalized status command.
+
+## Task: Verify against recent scheduled-run evidence
+
+```task
+id: ACTIVITY-WP-0018-T06
+status: done
+priority: medium
+state_hub_task_id: "24efbe9f-dfff-482f-9edc-456379c9a2aa"
+```
+
+Prove the new surface against the recent evidence that motivated this workplan.
+
+Acceptance:
+
+- Running the status command over the window starting Friday, 2026-06-26 shows
+  that the daily triage schedule fired on 2026-06-26 and 2026-06-27 but did not
+  produce clean validated reports.
+- The command distinguishes scheduling health from output/schema validation
+  failure.
+- Disabled or waiting schedules, such as the weekly coding retro gate when its
+  upstream read model is not available, are reported without being counted as
+  missed runs.
+- Verification results are recorded in this workplan and as a State Hub progress
+  note once the implementation lands.
+
+## Implementation Result
+
+Completed 2026-06-29: implemented the own-infrastructure automation status
+surface and codified the scheduling preference.
+
+Delivered:
+
+- `AGENTS.md` now states that durable activity-core automations use repo-owned
+  infrastructure, not coding assistant automation/reminder/heartbeat tooling, as
+  execution or evidence authority.
+- `SCOPE.md` and `docs/runbook.md` describe the deterministic status surface and
+  assistant boundary.
+- `src/activity_core/automation_status.py` and `scripts/automation_status.py`
+  provide the non-LLM CLI.
+- `make automation-status SINCE=...` and `make automation-status-json` expose the
+  baseline operator commands.
+- `tests/test_automation_status.py` covers date shortcuts, cron fire estimation,
+  completed runs, validation failures, missed runs, disabled schedules, partial
+  source availability, and working-memory evidence parsing.
+
+Verification:
+
+```bash
+python3 -m py_compile src/activity_core/automation_status.py scripts/automation_status.py tests/test_automation_status.py
+/home/worsch/.local/bin/uv run pytest tests/test_automation_status.py tests/test_daily_triage_verifier.py -q
+/home/worsch/.local/bin/uv run python scripts/automation_status.py \
+  --since 2026-06-26 --until 2026-06-27 --db-url '' \
+  --progress-event-type daily_triage --timeout-seconds 10 \
+  --working-memory-dir /tmp --format json
+```
+
+Results:
+
+- focused tests: `11 passed`;
+- `make help` lists `automation-status` and `automation-status-json`;
+- the 2026-06-26 through 2026-06-27 status run exited `1` as expected because
+  State Hub evidence classified daily triage activity
+  `6fca51fa-387a-4fd0-bc4e-d62c29eb859a` as `validation_failed` with two
+  non-secret evidence records: 2026-06-26 `Expecting ',' delimiter` and
+  2026-06-27 `Unterminated string`;
+- the same report classified the gated weekly coding retro as `disabled`, not
+  `missed`.
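
Sketch of the `friday`/`last-friday` shortcut resolution required by T03 above. The workplan does not spell out the exact semantics, so this assumes that `friday` means the most recent Friday on or before today and that `last-friday` means the most recent Friday strictly before today. The names `resolve_since` and `today_in`, and the `Europe/Berlin` default, are hypothetical and not taken from `automation_status.py`.

```python
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo

FRIDAY = 4  # date.weekday() counts Monday as 0


def today_in(tz_name: str = "Europe/Berlin") -> date:
    """Current calendar date in the configured timezone (default is an assumption)."""
    return datetime.now(ZoneInfo(tz_name)).date()


def resolve_since(value: str, today: date) -> date:
    """Resolve an operator date (ISO date or Friday shortcut) to a concrete date."""
    key = value.strip().lower()
    if key in {"friday", "last-friday"}:
        days_back = (today.weekday() - FRIDAY) % 7
        if key == "last-friday" and days_back == 0:
            days_back = 7            # on a Friday, step back a full week
        return today - timedelta(days=days_back)
    # Anything else must be an absolute YYYY-MM-DD date; invalid input raises ValueError.
    return date.fromisoformat(key)
```

For example, `resolve_since("friday", date(2026, 6, 29))` resolves to 2026-06-26, the Friday named in the T06 acceptance window. Passing `today` explicitly keeps the function deterministic in tests; a CLI would call `today_in()` once and hand the result in.
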
--- a/workplans/ACTIVITY-WP-0019-automation-schedule-inventory-targets.md
+++ b/workplans/ACTIVITY-WP-0019-automation-schedule-inventory-targets.md
@@ -0,0 +1,204 @@
+---
+id: ACTIVITY-WP-0019
+type: workplan
+title: "Automation schedule inventory Make targets"
+domain: infotech
+repo: activity-core
+status: finished
+owner: codex
+topic_slug: automation-inventory
+created: "2026-06-29"
+updated: "2026-07-01"
+state_hub_workstream_id: "21c73763-9adc-42f6-8fd2-1b8b33c2c770"
+---
+
+# Automation schedule inventory Make targets
+
+## Goal
+
+Provide a repo-native, non-LLM way to list every scheduled automation that
+activity-core knows about.
+
+`ACTIVITY-WP-0018` added the status surface for questions like "How did our
+automations go since Friday?". The next operator question is the inventory
+baseline: "What automations are scheduled at all?" That should be answerable
+through Make targets backed by activity-core's own ActivityDefinitions,
+database, and Temporal schedule metadata when available, independent of any
+coding assistant automation infrastructure.
+
+## Review notes
+
+- `Makefile` currently exposes `automation-status` and
+  `automation-status-json`, but no dedicated inventory/list target.
+- `scripts/automation_status.py` and `src/activity_core/automation_status.py`
+  already load scheduled ActivityDefinitions and compute their Temporal schedule
+  ids. The inventory target should reuse that parsing/loading posture where it
+  fits rather than creating a second discovery path.
+- `make sync-schedules` reconciles Temporal schedules from the
+  `activity_definitions` database, but it is an action target, not a read-only
+  operator inventory command.
+- The inventory command should remain useful in degraded local mode: file-backed
+  definitions are enough to list configured scheduled automations, while live
+  DB and Temporal visibility can enrich the output.
+
+## Task: Define the automation inventory contract
+
+```task
+id: ACTIVITY-WP-0019-T01
+status: done
+priority: high
+state_hub_task_id: "8de24590-f9ee-4d0e-8692-b7ada9f232ed"
+```
+
+Define the fields and source precedence for a deterministic scheduled
+automation inventory report.
+
+Acceptance:
+
+- The report includes every ActivityDefinition with `trigger_type` of `cron` or
+  `scheduled`, including disabled definitions.
+- Each row includes id, name, enabled/disabled state, trigger type, schedule
+  expression or one-shot datetime, timezone, overlap/catchup policy when known,
+  and the derived Temporal schedule id.
+- The report identifies its source for each row: database, repo definition file,
+  Temporal visibility, or a combination.
+- If Temporal is reachable, the report adds paused/missing/drift hints without
+  mutating schedules.
+- Missing optional sources produce warnings, not silent omissions.
+- The JSON shape is stable enough for scripts and tests.
+
+## Task: Implement a non-mutating inventory CLI
+
+```task
+id: ACTIVITY-WP-0019-T02
+status: done
+priority: high
+state_hub_task_id: "538cb9a5-48f3-470c-8518-29ee66c96678"
+```
+
+Add a deterministic CLI path for listing scheduled automations without requiring
+LLM credentials or coding assistant tooling.
+
+Acceptance:
+
+- A script or module command, likely sharing code with
+  `activity_core.automation_status`, supports human and JSON output.
+- The command is read-only: it does not call `sync-schedules`, upsert schedules,
+  delete schedules, enqueue workflows, or write State Hub evidence.
+- It supports filters by activity id, activity name, enabled state, and trigger
+  type.
+- It loads from the database when configured and falls back to repo definition
+  files when the database is unavailable or explicitly disabled.
+- It optionally enriches rows from Temporal when `TEMPORAL_HOST` is configured,
+  with bounded timeouts so an unreachable service does not hang the command.
+- Unit tests cover DB rows, file fallback, disabled definitions, Temporal
+  enrichment unavailable, and JSON output.
+
+## Task: Add Make targets
+
+```task
+id: ACTIVITY-WP-0019-T03
+status: done
+priority: high
+state_hub_task_id: "f2001721-07f3-42f5-a15e-0c7d1b0ed801"
+```
+
+Expose the inventory command through Make targets that are easy for humans,
+scripts, and coding assistants to run before asking for a prose summary.
+
+Acceptance:
+
+- `make automation-list` prints a concise human-readable inventory.
+- `make automation-list-json` emits the same inventory as JSON.
+- Optional Make variables pass through cleanly, for example `ENABLED=true`,
+  `TRIGGER=cron`, `ACTIVITY_ID=<uuid>`, or `FORMAT=json`.
+- `make help` lists both targets with clear one-line descriptions.
+- The targets do not require LLM access, coding assistant automation tooling,
+  or interactive prompts.
+
+## Task: Document the inventory workflow
+
+```task
+id: ACTIVITY-WP-0019-T04
+status: done
+priority: medium
+state_hub_task_id: "f687743b-3936-413e-ae50-d35484ae9a81"
+```
+
+Update operator documentation so the scheduled automation inventory path is
+discoverable next to the status path.
+
+Acceptance:
+
+- `docs/runbook.md` documents `make automation-list` and
+  `make automation-list-json`.
+- The docs distinguish inventory from status: inventory answers what is
+  configured; status answers what happened in a time window.
+- The docs state that the command is read-only and uses activity-core-owned
+  scheduling evidence.
+- The docs include a compact example of the expected human output.
+
+## Task: Verify against current repo and live/degraded sources
+
+```task
+id: ACTIVITY-WP-0019-T05
+status: done
+priority: medium
+state_hub_task_id: "5317b532-5cef-4eff-b6d8-3e85bbca8e8a"
+```
+
+Prove the target against the current scheduled automation definitions and
+degraded local conditions.
+
+Acceptance:
+
+- `make automation-list` shows the current scheduled automations, including
+  daily triage and weekly scheduled definitions when present in the selected
+  source.
+- JSON output is valid and includes the same rows.
+- A DB-unavailable run falls back to repo definition files or reports a clear
+  warning if no definitions are discoverable.
+- A Temporal-unavailable run exits successfully with Temporal warnings rather
+  than hanging.
+- Focused tests pass and the result is recorded in this workplan before the
+  workplan is moved to `finished`.
+
+
+## Implementation Result
+
+Completed 2026-07-01: implemented the read-only scheduled automation inventory
+surface.
+
+Delivered:
+
+- `scripts/automation_inventory.py` exposes the inventory CLI backed by
+  `activity_core.automation_status` shared definition and Temporal helpers.
+- `make automation-list` and `make automation-list-json` list configured
+  scheduled ActivityDefinitions with filters for `ENABLED`, `TRIGGER`,
+  `ACTIVITY_ID`, and `ACTIVITY_NAME`.
+- JSON output is script-safe; the Make JSON target suppresses command echo and
+  recursive make directory chatter.
+- `docs/runbook.md` now distinguishes inventory (what is configured) from status
+  (what happened in a time window).
+- Tests cover DB-backed rows, file fallback, disabled filtering, Temporal
+  unavailable warnings, and JSON CLI output.
+
+Verification:
+
+```bash
+/home/worsch/.local/bin/uv run pytest tests/test_automation_status.py tests/test_daily_triage_verifier.py -q
+bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make automation-list ACTCORE_DB_URL= TEMPORAL_HOST='
+bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make automation-list-json ACTCORE_DB_URL= TEMPORAL_HOST= > /tmp/activity-core-inventory.json && python3 -m json.tool /tmp/activity-core-inventory.json >/tmp/activity-core-inventory.pretty'
+bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make automation-list ACTCORE_DB_URL= TEMPORAL_HOST= ENABLED=true TRIGGER=cron'
+bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make help'
+```
+
+Results:
+
+- focused tests: `16 passed`;
+- degraded Make inventory run listed 9 file-backed scheduled automations, with
+  5 enabled and 4 disabled;
+- filtered Make run with `ENABLED=true TRIGGER=cron` listed 5 enabled cron
+  automations;
+- `automation-list-json` emitted parseable JSON directly;
+- `make help` lists `automation-list` and `automation-list-json`.
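
Sketch of the read-only inventory filtering that the `ENABLED` and `TRIGGER` Make variables pass through. This is a minimal illustration, not the code in `scripts/automation_inventory.py`: the `InventoryRow` fields are a subset of the T01 contract, and the names `parse_enabled` and `filter_inventory` as well as the accepted boolean spellings are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

SCHEDULED_TRIGGERS = {"cron", "scheduled"}  # trigger types named in T01


@dataclass(frozen=True)
class InventoryRow:
    id: str
    name: str
    enabled: bool
    trigger_type: str
    schedule: str   # cron expression or one-shot datetime
    source: str     # e.g. "database" or "file"


def parse_enabled(value: Optional[str]) -> Optional[bool]:
    """Map a Make-style variable to a filter; empty or unset means no filter."""
    if value is None or value.strip() == "":
        return None
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise ValueError(f"unsupported ENABLED value: {value!r}")


def filter_inventory(
    rows: Iterable[InventoryRow],
    enabled: Optional[bool] = None,
    trigger: Optional[str] = None,
) -> List[InventoryRow]:
    """Select scheduled rows without mutating anything; disabled rows stay by default."""
    selected = []
    for row in rows:
        if row.trigger_type not in SCHEDULED_TRIGGERS:
            continue
        if enabled is not None and row.enabled != enabled:
            continue
        if trigger and row.trigger_type != trigger:
            continue
        selected.append(row)
    return selected
```

Treating an empty variable as "no filter" mirrors how the verification commands above pass `ACTCORE_DB_URL=` and `TEMPORAL_HOST=` to switch optional sources off rather than to request an error.
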
--- a/workplans/archived/260603-WP-0002-next-steps.md
+++ b/workplans/archived/260603-WP-0002-next-steps.md
@@ -3,6 +3,7 @@ type: session-note
 created: "2026-03-28"
 updated: "2026-06-03"
 status: archived
+state_hub_workstream_id: "b221e65a-6f97-44b0-8dae-442fffcb7f64"
 ---

 # WP-0002 Handoff Note — Continue on CoulombCore