ISSUE-WP-0003-T06: issue-core REST sink via actcore-issue-core-bridge (node-local tunnel 18765)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
chore(consistency): sync task status from DB [auto]
2026-07-02 14:20:12 +02:00 · 2026-07-02 11:55:47 +02:00 · 2026-07-02 11:55:07 +02:00 · 2026-07-02 11:54:43 +02:00 · 2026-07-02 11:54:04 +02:00 · 2026-07-02 10:44:00 +02:00
68 changed files with 6890 additions and 208 deletions
--- a/.claude/rules/credential-routing.md
+++ b/.claude/rules/credential-routing.md
@@ -0,0 +1,50 @@
 # Credential and access routing
 **Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
 for inference. Run this check **before** requesting secrets, API keys, SSH access,
 login tokens, or database passwords — in any repo, not only `ops-warden`.
 ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
 other credential need belongs to another subsystem. **Do not** message
 `ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
 ### Lookup (do this first)
 ```bash
 warden route find "<describe your need>" --json
 warden route show <catalog-id> --json
 ```
 Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=activity-core` is for coordination, not secret vending |
 | **Claude Code** (MCP when available) | `get_domain_summary("infotech")` for workstreams; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 ### Quick routing table
 | I need… | Owner | ops-warden executes? |
 | --- | --- | --- |
 | SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
 | API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
 | Login / OIDC / MFA | key-cape / Keycloak | No — route only |
 | Authorization decision | flex-auth | No — route only |
 | activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
 | SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
 ### Anti-patterns (do not do these)
 - `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
 - Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
 - Pasting secrets into Git, State Hub, workplans, logs, or chat
 ### Other capabilities (reuse-surface)
 Non-credential capabilities are usually discovered through **reuse-surface** federation
 (`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
 every repo's agent instructions because it is high-frequency, high-risk, and easy to
 get wrong.
 **Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
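The quick routing table above can be mirrored as a small offline lookup, which may be handy for a sanity check before calling `warden route`. This is a minimal sketch and not part of ops-warden: the `ROUTES` dict, its keys, and `route_for` are hypothetical names, and the authoritative answer still comes from `warden route find`.

```python
from typing import NamedTuple


class Route(NamedTuple):
    owner: str
    warden_executes: bool  # True only for SSH certificate signing


# Hypothetical offline mirror of the quick routing table; the keys are illustrative.
ROUTES = {
    "ssh-cert": Route("ops-warden", True),
    "api-key": Route("OpenBao (railiance-platform)", False),
    "db-password": Route("OpenBao (railiance-platform)", False),
    "provider-token": Route("OpenBao (railiance-platform)", False),
    "login": Route("key-cape / Keycloak", False),
    "authorization": Route("flex-auth", False),
    "issue-core-emission": Route("activity-core + issue-core", False),
    "ssh-tunnel": Route("ops-bridge", False),
}


def route_for(need: str) -> Route:
    """Return the owning subsystem for a need, or raise if it is not in the table."""
    try:
        return ROUTES[need]
    except KeyError:
        raise LookupError(
            f"unknown need {need!r}; run `warden route find` instead"
        ) from None
```

Only the `ssh-cert` entry reports `warden_executes=True`, matching the rule that ops-warden issues SSH certificates and routes everything else.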
--- a/.claude/rules/first-session.md
+++ b/.claude/rules/first-session.md
@@ -1,11 +1,11 @@
 ## First Session Protocol
-Triggered when `get_domain_summary("custodian")` shows **no workstreams**.
+Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
 The project is registered but work has not yet been structured.
 **Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/custodian/project_charter_v0.1.md` — purpose, scope
+- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/custodian/roadmap_v0.1.md` — planned phases
+- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
 - Scan repo root: README, directory structure, existing code or docs
 **Step 2 — Survey in-progress work**
@@ -17,7 +17,7 @@ roadmap phase. **Wait for approval before creating.**
 **Step 4 — Create workplan file first, then DB record (ADR-001)**
 ```
-workplans/activity-core-WP-NNNN-<slug>.md   ← write this first
+workplans/ACTIVITY-WP-NNNN-<slug>.md   ← write this first
 ```
 Then register in the hub:
 ```
@@ -28,7 +28,7 @@ create_task(workstream_id="<id>", title="...", priority="high|medium|low")
 **Step 5 — Record the setup**
 ```
 add_progress_event(
-    summary="First session: structured custodian into N workstreams, M tasks",
+    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
--- a/.claude/rules/repo-identity.md
+++ b/.claude/rules/repo-identity.md
@@ -1,5 +1,5 @@
 **Purpose:** Durable task factory built on Temporal. Manages ActivityDefinitions, schedules recurring workflows via Temporal Schedules, routes events via NATS JetStream, and exposes a FastAPI CRUD surface for the custodian domain.
-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** activity-core
 **Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a
--- a/.claude/rules/session-protocol.md
+++ b/.claude/rules/session-protocol.md
@@ -1,6 +1,7 @@
 ## Session Protocol
-State Hub: http://127.0.0.1:8000
+Dev Hub (State Hub API): http://127.0.0.1:8000
 MCP server name in `~/.claude.json`: `dev-hub`
 **Step 1 — Orient**
@@ -10,7 +11,7 @@ cat .custodian-brief.md
 ```
 Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
 ```
-get_domain_summary("custodian")
+get_domain_summary("infotech")
 ```
 If MCP tools are unavailable in the current agent session, use the REST API:
 ```bash
@@ -39,11 +40,11 @@ curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
 ls workplans/
 ```
 For each file with `status: ready`, `active`, or `blocked`, note pending
-`todo`/`in_progress` tasks.
+`wait`/`todo`/`progress` tasks.
 **Step 4 — Present brief**
-1. **Active workstreams** for `custodian` — title, task counts, blocking decisions
+1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
 2. **Pending tasks** from `workplans/` + any `[repo:activity-core]` hub tasks
 3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
--- a/.claude/rules/workplan-convention.md
+++ b/.claude/rules/workplan-convention.md
@@ -1,7 +1,7 @@
 ## Workplan Convention (ADR-001)
-File location: `workplans/activity-core-WP-NNNN-<slug>.md`
+File location: `workplans/ACTIVITY-WP-NNNN-<slug>.md`
-ID prefix: `ACTIVITY-WP`
+ID prefix: `ACTIVITY-WP-`
 Work items originate as files in this repo **before** being registered in the hub.
@@ -12,7 +12,7 @@ repo state, and `finished` when implementation is complete. `stalled` and
 `needs_review` are derived health labels, not stored statuses.
 Closed workplans may be moved to `workplans/archived/` with a completion-date
-prefix: `YYMMDD-activity-core-WP-NNNN-<slug>.md`. The frontmatter id remains
+prefix: `YYMMDD-ACTIVITY-WP-NNNN-<slug>.md`. The frontmatter id remains
 unchanged; the prefix is only for quick visual reference.
 Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
@@ -25,4 +25,16 @@ Ecosystem todos from other agents arrive as `[repo:activity-core]` hub tasks —
 visible at session start. Pick one up by creating the workplan file, then registering
 the workstream.
 Task blocks use this shape:
 ```task
 id: ACTIVITY-WP-NNNN-T01
 status: wait | todo | progress | done | cancel
 priority: high | medium | low
 state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 ```
 Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
 blocked work and `cancel` for stopped work.
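The naming and status rules above could be checked mechanically. The following is a minimal sketch under stated assumptions: `NNNN` is taken to be exactly four digits, slugs are assumed to be lowercase with hyphens, and the `TRANSITIONS` table is an illustrative reading of the progression (the convention does not spell out every edge). `can_transition` and `parse_workplan_name` are hypothetical helpers, not repo code.

```python
import re

# Allowed task statuses, taken from the task block shape above.
STATUSES = {"wait", "todo", "progress", "done", "cancel"}

# Assumed edges: the main path is todo -> progress -> done, while `wait` and
# `cancel` are reachable from any open status. `done` and `cancel` are terminal.
TRANSITIONS = {
    "todo": {"progress", "wait", "cancel"},
    "progress": {"done", "wait", "cancel"},
    "wait": {"todo", "progress", "cancel"},
    "done": set(),
    "cancel": set(),
}

# ACTIVITY-WP-NNNN-<slug>.md, optionally prefixed with a YYMMDD completion date
# once the workplan has been moved to workplans/archived/.
WORKPLAN_RE = re.compile(
    r"^(?:(?P<archived>\d{6})-)?(?P<id>ACTIVITY-WP-\d{4})-(?P<slug>[a-z0-9-]+)\.md$"
)


def can_transition(old: str, new: str) -> bool:
    """Return True if a task may move from `old` to `new` under the assumed table."""
    if old not in STATUSES or new not in STATUSES:
        raise ValueError(f"unknown status: {old!r} -> {new!r}")
    return new in TRANSITIONS[old]


def parse_workplan_name(name: str):
    """Split a workplan file name into id, slug, and optional archive date."""
    match = WORKPLAN_RE.match(name)
    return match.groupdict() if match else None
```

A file still using the old `activity-core-WP-NNNN` prefix does not match the pattern, so the same helper can flag names that were missed in the rename.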
 <!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -1,34 +1,23 @@
 <!-- custodian-brief: generated by fix-consistency — do not edit manually -->
 # Custodian Brief — activity-core
-**Domain:** custodian  
+**Domain:** infotech  
-**Last synced:** 2026-06-18 15:52 UTC  
+**Last synced:** 2026-07-02 09:55 UTC  
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 ## Active Workstreams
-### Definition And Schedule Hot Reload
+### Adopt State Hub Beachhead Endpoint
-Progress: 0/5 done  |  workstream_id: `8887075e-21ec-451b-b82b-cd81035c9ca5`
+Progress: 0/2 done  |  workstream_id: `bbc07f9e-9323-4b2b-b556-c33b37d0b228`
 **Open tasks:**
- ! Live No-Restart Smoke  `68a0e22a`
+- ! Point STATE_HUB_URL at the beachhead  `76b6132d`
- · Extract Reusable Sync Service  `53a7970b`
+- ! Retire the bespoke actcore-state-hub-bridge proxy  `526c2129`
 - · Add Admin Sync Endpoint  `8697c761`
 - · Preserve Schedule Drift Semantics  `efeac412`
 - · Optional Background Sync Loop  `d774087b`
 ### Post-triage operational hardening
 Progress: 6/8 done  |  workstream_id: `5646e13a-13af-4724-bca6-3c0d86f96733`
 **Open tasks:**
 - ! Three-Run Calibration Feedback  `7cbf0a35`
 - · Implement reuse_surface_report_gaps shell resolver for coulomb registry hygiene  `25293d5e`
 ### Daily Triage LLM Reconciliation And Evidence
-Progress: 1/5 done  |  workstream_id: `f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9`
+Progress: 2/5 done  |  workstream_id: `f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9`
 **Open tasks:**
 - ! Reconcile Live Railiance Runtime  `23545ddc`
 - ! Run Daily Triage Fixture Smoke  `10e0df77`
 - ! Collect Three Clean Scheduled Runs  `dc6b9482`
 - ! Close Handoff State  `ecc57e21`
@@ -50,6 +39,6 @@ Progress: 2/3 done  |  workstream_id: `7387fc50-1f2c-471a-9d85-bb085cbd0b63`
 ## MCP Orientation (when available)
 If the state-hub MCP server is reachable, call:
-`get_domain_summary("custodian")`
+`get_domain_summary("infotech")`
 This provides richer cross-domain context.
 If the MCP call fails, use this file as your orientation source.
--- a/.env.example
+++ b/.env.example
@@ -18,7 +18,9 @@ STATE_HUB_URL=http://127.0.0.1:8000
 # Repo scoping — used by the repo-scoping context adapter. Binds {} on failure.
 REPO_SCOPING_URL=http://127.0.0.1:8020
 # Issue Core — task emission backend.
-ISSUE_CORE_URL=http://127.0.0.1:8010
+ISSUE_CORE_URL=http://127.0.0.1:8765
 # Shared ingestion key — must match issue-core's ISSUE_CORE_API_KEY.
 ISSUE_CORE_API_KEY=
 # Sink type: 'rest' (POST to issue-core) or 'null' (discard, for dry-run).
 ISSUE_SINK_TYPE=rest
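A loader for these three variables might look like the sketch below. It is illustrative only: `SinkConfig` and `load_sink_config` are hypothetical names, the fallback values are assumptions, and refusing to start a `rest` sink without a key is an assumed policy rather than confirmed repo behaviour.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SinkConfig:
    sink_type: str
    url: str
    api_key: str


def load_sink_config(env=None) -> SinkConfig:
    """Read the issue sink settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    sink_type = env.get("ISSUE_SINK_TYPE", "rest").strip().lower()
    if sink_type not in {"rest", "null"}:
        raise ValueError(f"ISSUE_SINK_TYPE must be 'rest' or 'null', got {sink_type!r}")
    # Assumed default: the node-local tunnel port from the example file.
    url = env.get("ISSUE_CORE_URL", "http://127.0.0.1:8765").rstrip("/")
    api_key = env.get("ISSUE_CORE_API_KEY", "")
    # Assumed policy: a REST sink without a key fails fast instead of posting unauthenticated.
    if sink_type == "rest" and not api_key:
        raise ValueError("ISSUE_CORE_API_KEY must be set when ISSUE_SINK_TYPE=rest")
    return SinkConfig(sink_type=sink_type, url=url, api_key=api_key)
```

With `ISSUE_SINK_TYPE=null` the key may stay empty, which fits the dry-run use described in the comment above.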
--- a/.kaizen/schedule.yml
+++ b/.kaizen/schedule.yml
@@ -1,17 +1,15 @@
-# Kaizen scheduled agent execution (ADR-005)
+# Kaizen scheduled agent execution manifest (ADR-005)
-# Engagement: coulomb-loop — stabilize phase (daily crons per ADR-003)
+# Engagement: coulomb-loop bootstrap — weekly cadence
-# Promoted 2026-06-18 after 3/3 bootstrap E2E cycles
+# Regulator promotes cadence per customer engagement policy (ADR-003).
 # Validate with: kaizen-agentic schedule validate
 version: '1'
 timezone: Europe/Berlin
 agents:
  coach:
-    cadence: daily
+    cadence: weekly
-    cron: "0 9 * * *"
+    cron: 0 9 * * 1
    enabled: true
  optimization:
-    cadence: daily
+    cadence: weekly
-    cron: "0 10 * * *"
+    cron: 0 10 * * 1
    enabled: true
  tdd-workflow:
    cadence: monthly
    enabled: false
--- a/.repo-classification.yaml
+++ b/.repo-classification.yaml
@@ -0,0 +1,28 @@
 # Repo classification (Repo Classification Standard v1.0).
 repo_classification:
  standard: Repo Classification Standard
  version: '1.0'
  classified_at: '2026-06-22'
  classified_by: human
  category: tooling
  domain: infotech
  secondary_domains:
  - agents
  capability_tags:
  - workflow
  - orchestration
  - automation
  - coordination
  - observability
  business_stake:
  - technology
  - operations
  - automation
  - execution
  business_mechanics:
  - coordination
  - operation
  - adaptation
  notes: Org-wide event bridge / task factory (Temporal-based). Active bounded implementation
    -> project.
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,7 +4,7 @@
 **Purpose:** Durable task factory built on Temporal. Manages ActivityDefinitions, schedules recurring workflows via Temporal Schedules, routes events via NATS JetStream, and exposes a FastAPI CRUD surface for the custodian domain.
-**Domain:** custodian
+**Domain:** infotech
 **Repo slug:** activity-core
 **Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
 **Workplan prefix:** `ACTIVITY-WP-`
@@ -83,7 +83,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
 2. Check inbox: `GET /messages/?to_agent=activity-core&unread_only=true`; mark read
 3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
-4. Check blocked tasks: `GET /tasks/?needs_human=true`
+4. Check human-needed tasks: `GET /tasks/?needs_human=true`
 **During work:**
 - Update task statuses in workplan files as tasks progress
@@ -101,6 +101,78 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
 ---
 ## Credential and access routing
 **Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
 for inference. Run this check **before** requesting secrets, API keys, SSH access,
 login tokens, or database passwords — in any repo, not only `ops-warden`.
 ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
 other credential need belongs to another subsystem. **Do not** message
 `ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
 ### Lookup (do this first)
 ```bash
 warden route find "<describe your need>" --json
 warden route show <catalog-id> --json
 ```
 Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=activity-core` is for coordination, not secret vending |
 | **Claude Code** (MCP when available) | `get_domain_summary("infotech")` for workstreams; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 ### Quick routing table
 | I need… | Owner | ops-warden executes? |
 | --- | --- | --- |
 | SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
 | API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
 | Login / OIDC / MFA | key-cape / Keycloak | No — route only |
 | Authorization decision | flex-auth | No — route only |
 | activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
 | SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
 ### Anti-patterns (do not do these)
 - `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
 - Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
 - Pasting secrets into Git, State Hub, workplans, logs, or chat
 ### Other capabilities (reuse-surface)
 Non-credential capabilities are usually discovered through **reuse-surface** federation
 (`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
 every repo's agent instructions because it is high-frequency, high-risk, and easy to
 get wrong.
 **Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
 <!-- REPO-AGENTS-EXTENSIONS -->
 <!-- Append repo-specific agent instructions below this marker.
     The state-hub template sync preserves content after this line. -->
 ---
 ## Automation Scheduling Preference
 Durable activity-core automations must use this repo's own infrastructure:
 Temporal Schedules, NATS JetStream, activity-core run records, State Hub
 progress, and configured report/evidence sinks. Do not use coding
 assistant-provided automation, reminder, or heartbeat tooling as the execution
 or evidence source for production or operational recurrence.
 Coding assistants may run repo-native inspection commands and summarize their
 outputs, but the baseline answer to questions like "How did our automations go
 since Friday?" must come from deterministic local tooling such as the
 ACTIVITY-WP-0018 automation status surface.
 ---
 ## Workplan Convention (ADR-001)
 Work items originate as files in this repo — not in the hub. The hub is a
@@ -124,7 +196,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
 id: ACTIVITY-WP-NNNN
 type: workplan
 title: "..."
-domain: custodian
+domain: infotech
 repo: activity-core
 status: proposed | ready | active | blocked | backlog | finished | archived
 owner: codex
@@ -154,10 +226,7 @@ state_hub_task_id: "<uuid>"         # written by fix-consistency — do not edit
 Task description text.
 ```
-Status progression: `todo` → `progress` → `done`; use `wait` for a task
+Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.
-blocked on external input and `cancel` for intentionally abandoned work.
-Workstream/workplan lifecycle status is separate; frontmatter `blocked` remains
-valid there.
 To create a new workplan:
 1. Write the file following the format above
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,4 +8,5 @@
 @.claude/rules/stack-and-commands.md
 @.claude/rules/architecture.md
 @.claude/rules/repo-boundary.md
+@.claude/rules/credential-routing.md
 @.claude/rules/agents.md
--- a/Makefile
+++ b/Makefile
@@ -1,13 +1,17 @@
 -include .env
 export
-.PHONY: sync-event-types sync-activity-definitions test migrate sync-all \
+.PHONY: sync-event-types sync-activity-definitions sync-schedules test migrate sync-all \
        automation-status automation-status-json automation-list automation-list-json \
        dev-up dev-down railiance-up railiance-down \
        start-worker start-api start-event-router help
 sync-activity-definitions:  ## Sync ActivityDefinition files into DB
 	uv run python -m activity_core.sync_activity_definitions
 sync-schedules:  ## Reconcile Temporal schedules from activity_definitions DB
 	uv run python -m activity_core.sync_schedules
 sync-event-types:  ## Sync event type YAML files into DB
 	uv run python scripts/sync_event_types.py
@@ -21,6 +25,27 @@ migrate:  ## Apply all pending Alembic migrations
 sync-all: sync-event-types sync-activity-definitions  ## Sync event types and activity definitions
 # -- Automation status ---------------------------------------------------------
 SINCE ?= today
 FORMAT ?= human
 ENABLED ?= all
 TRIGGER ?=
 ACTIVITY_ID ?=
 ACTIVITY_NAME ?=
 automation-status:  ## Report recent automation status from repo-owned evidence
 	uv run python scripts/automation_status.py --since "$(SINCE)" $(if $(UNTIL),--until "$(UNTIL)",) --format "$(FORMAT)"
 automation-status-json:  ## Report recent automation status as JSON
 	$(MAKE) automation-status FORMAT=json
 automation-list:  ## List configured scheduled automations from repo-owned definitions
 	@uv run python scripts/automation_inventory.py --format "$(FORMAT)" --enabled "$(ENABLED)" $(if $(TRIGGER),--trigger-type "$(TRIGGER)",) $(if $(ACTIVITY_ID),--activity-id "$(ACTIVITY_ID)",) $(if $(ACTIVITY_NAME),--activity-name "$(ACTIVITY_NAME)",)
 automation-list-json:  ## List configured scheduled automations as JSON
 	@$(MAKE) --no-print-directory automation-list FORMAT=json
 # ── Infrastructure ─────────────────────────────────────────────────────────────
 dev-up:  ## Start full dev stack (Temporal + PG + ES + NATS)
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -64,7 +64,9 @@ The two evaluation modes:
  `context.*` / `event.*` interpolation and explicit `for_each` per-item
  binding. No `exec()`.
 - **Instruction executor**: trusted-field prompt rendering, LLM call via
-  llm-connect, structured output validation, bounded validation-failure
+  llm-connect, structured output validation, item-granular recovery with a
  quarantine lane and producer guardrails (count/length/depth caps, reference
  allow-list) at the producer trust boundary, bounded validation-failure
  artifacts for report instructions, review-required audit metadata, and
  deterministic report sinks. A real downstream review queue is not implemented
  in this repo.
@@ -88,6 +90,9 @@ The two evaluation modes:
 - **REST admin API** (FastAPI): CRUD for ActivityDefinitions, manual trigger,
  event type registry queries.
 - **Prometheus metrics**: Temporal SDK metrics exposed for scraping.
 - **Automation status surface**: deterministic, non-LLM status reporting via
  `make automation-status` / `scripts/automation_status.py`, using repo-owned
  evidence sources rather than coding assistant scheduler state.
 - **Operational runbook**: `docs/runbook.md`.
 ---
@@ -114,6 +119,10 @@ The two evaluation modes:
  runs on Railiance infrastructure (or Docker Compose for dev).
 - **End-user task UI** — tasks land in issue-core; presentation is separate.
 - **Synchronous request-response patterns** — Temporal is async-first.
 - **Coding assistant automation infrastructure** — assistant-provided reminders,
  heartbeats, or scheduled jobs are not the execution or evidence authority for
  activity-core automations. Assistants may run and summarize repo-native
  commands only.
 ---
@@ -130,6 +139,8 @@ The two evaluation modes:
  commands.
 - You are replacing scattered bespoke cron jobs and manual coordination with
  a governed, observable automation layer.
 - You need to answer "how did our automations go since Friday?" from
  deterministic repo-native evidence before any optional LLM summary.
 ---
@@ -320,6 +331,9 @@ new one-off control paths.
  governance model, event type schema, ActivityDefinition structure.
 - `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
  model, evaluation semantics, audit trail, testing strategy.
 - `docs/adr/adr-004-producer-trust-boundary.md` — untrusted-producer premise,
  trust-but-handle vs verify-and-mitigate postures, error-locality and
  quarantine-with-provenance, producer guardrails for LLM/agent/human output.
 ---
--- a/docs/adr/adr-004-producer-trust-boundary.md
+++ b/docs/adr/adr-004-producer-trust-boundary.md
@@ -0,0 +1,156 @@
 ---
 id: ACT-ADR-004
 type: architecture-decision-record
 title: "The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output"
 status: accepted
 decided_by: Bernd Worsch
 date: "2026-06-26"
 scope: cross-repo
 affects:
  - activity-core
  - rules-core (future extraction)
 tags: ["architecture", "llm", "safety", "validation", "guardrails", "trust-boundary", "resilience"]
 ---
 # ACT-ADR-004: The Producer Trust Boundary
 ## Status
 Accepted.
 ## Context
 On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called
 llm-connect successfully, and produced a long ranked recommendation list — but
 the JSON broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because
 the report was validated and consumed as a single monolithic JSON document, one
 malformed delimiter discarded the **entire** run, including the 7 perfectly good
 recommendations the model had already emitted. The scheduling and runtime layers
 were healthy; the failure was entirely at the seam where free-form model output
 meets a strict consumer.
 This is not a one-off bug but a recurring class of failure. activity-core has a **trust
 boundary** wherever generative or human-authored output meets strict deterministic
 consumers: the JSON Schema validator, the task emitter, and any classic compute
 pipeline downstream. The producers on the other side of that boundary — **LLMs,
 agents, and humans** — are all *untrusted producers*. Their output may be:
 - **erroneous** — hallucination, truncation at a token limit, drift, type slips,
  typos, a missing delimiter; or
 - **malicious** — prompt injection, crafted payloads, or oversized / deeply-nested
  structures intended to exhaust or confuse the consumer.
 The pre-existing design treated producer output optimistically: parse the whole
 document, validate the whole document, and on any failure discard the whole
 document (preserving only a bounded diagnostic preview). That gives **zero error
 locality** — the blast radius of any single defect is the entire activation.
 ## Decision
 Treat the producer→consumer seam as an explicit, adversarial **trust boundary**,
 and place guardrails plus error-correction tooling *at that boundary* rather than
 letting raw producer output flow into deterministic consumers.
 ### Two non-fail-fast postures
 When hard-failing on a problem is undesirable, there are two sound strategies, and
 they **compose**:
 - **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
   as-is; on exception, catch, then repair and retry, or quarantine. Cheap on the happy
  path; blast radius depends entirely on how granular the catch is. Best when
  failures are rare and locally recoverable. Risk: failures surface late, possibly
  after partial side effects.
 - **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
  and normalize the output to a known-good shape *before* it enters the pipeline —
  drop bad items, coerce types, bound sizes/depth, allow-list references — so the
  consumer only ever sees clean input. Higher upfront cost, smaller blast radius,
  no partial side effects. Best when failures are common or consequences are high.
 ### Governing principles
 1. **Push verification to the boundary; keep the interior strict.** Apply posture
   **B** at the producer→consumer boundary; keep posture **A** for residual
   exceptions inside the verified core. Never relax the interior schema to absorb
   producer sloppiness.
 2. **Make error locality match the unit of work.** One bad recommendation must
   cost one recommendation, not the whole report. Structuring the payload so each
   item is independently parseable and validatable is the highest-leverage change.
 3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
   provenance-tagged artifacts (`index`, `error`, `raw` snippet, `reason`) so they
   can be debugged or replayed. Degraded-but-usable is reported distinctly from
   total loss.
 4. **Both human and agent input get the same rigor.** Guardrails are
   producer-agnostic: the same count / length / depth caps and reference
   allow-lists apply whether the producer is an LLM, an agent, or a human.
 ### What this means concretely in activity-core
 Implemented in `src/activity_core/rules/executor.py`:
 - **Strict-structure-only schema.** The daily-triage output schema is strict on
  per-item *structure* (`required [rank, candidate, action, why]`, typed `wsjf`)
  and carries `maxItems` as a producer *hint* — never as a hard whole-document
  reject, which would reproduce the very blast-radius failure (ACT-ADR-002 governs
  the schema format; `schemas/daily-triage-report.json`).
 - **Item-granular recovery (posture B).** When whole-document parse + one retry
  fail, `_resilient_report` recovers individually-parseable recommendation objects
  via a brace/quote-aware scanner (`_extract_object_spans`) that works for both
  pretty-printed and NDJSON output, attempts a best-effort `_try_repair` on a
  truncated tail, validates each recovered object against the item schema, and
  keeps the valid ones. Survivors are emitted with `output_validated=true`,
  `partial=true`, and `review_required=true`.
 - **Producer guardrails (`_partition_items`, applied on both the recovery and the
  happy path).** Per recommendation: structural type → schema → structural caps
  (`_MAX_DEPTH`, `_MAX_STRING_LEN`) → reference allow-list → count cap (top-N by
  `maxItems`). The first failing check quarantines the item with provenance and a
  `reason` (`malformed` / `schema` / `guardrail` / `allow_list` / `over_limit`).
 - **Reference allow-list.** A recommendation whose `candidate` is not in the set of
  known ids is quarantined. The set is sourced from resolved context
  (`context["known_candidates"]`, via `_allow_list_from_context`); the check is
  inert until a context resolver populates it, so the capability ships now and
  activates with a one-line resolver change.
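Item-granular recovery can be illustrated with a short standalone sketch. It is not the code in `executor.py`: `extract_object_spans` and `recover_items` here are simplified stand-ins, the item check only tests the four required keys, and the truncated-tail repair step (`_try_repair`) is left out.

```python
import json

REQUIRED = ("rank", "candidate", "action", "why")


def extract_object_spans(text):
    """Return (start, end) for every balanced {...} span, ignoring braces in strings."""
    spans, stack = [], []
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append(i)
        elif ch == "}" and stack:
            spans.append((stack.pop(), i + 1))
    # Outer spans first, so a parseable container hides its nested objects.
    return sorted(spans, key=lambda s: (s[0], -s[1]))


def recover_items(text):
    """Keep individually valid items; quarantine the rest with provenance."""
    spans = extract_object_spans(text)
    kept, quarantined = [], []
    covered_until = 0
    for start, end in spans:
        if start < covered_until:
            continue  # nested inside a span that was already handled
        raw = text[start:end]
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError as exc:
            if any(start < s < end for s, _ in spans):
                continue  # damaged container: try the objects nested inside it
            reason, error = "malformed", str(exc)
        else:
            missing = [key for key in REQUIRED if key not in obj]
            if not missing:
                kept.append(obj)
                covered_until = end
                continue
            reason, error = "schema", f"missing keys: {missing}"
        covered_until = end
        quarantined.append({
            "index": len(kept) + len(quarantined),  # position among screened spans
            "reason": reason,
            "error": error,
            "raw": raw[:200],  # bounded snippet, as in the quarantine principle
        })
    return kept, quarantined
```

Because the scanner tracks string state, a brace inside a `why` text does not split an item. An unterminated final object produces no span at all, so in this sketch a truncated tail is simply absent from both lists; the real implementation attempts a repair there.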
 ### Where each posture sits
 | Layer | Posture | Mechanism |
 |-------|---------|-----------|
 | Schema / contract | B | strict per-item structure; `maxItems` as hint |
 | Whole-document parse | A | tolerant parse + single retry |
 | Failed parse | B | item-granular recovery + repair + quarantine |
 | Per-item screening | B | schema + depth/length caps + allow-list + count cap |
 | Emitted report | — | `partial` / `quarantined_*` provenance; never silent |
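The per-item screening row can be sketched as a first-failure-wins chain. This is an illustration, not `_partition_items` itself: the cap values are invented placeholders, the schema check only tests required keys, and the count cap keeps the first N items on the assumption that they arrive in rank order.

```python
MAX_DEPTH = 4          # placeholder value, not the repo's _MAX_DEPTH
MAX_STRING_LEN = 2000  # placeholder value, not the repo's _MAX_STRING_LEN
REQUIRED = ("rank", "candidate", "action", "why")


def depth(value):
    """Nesting depth: scalars are 0, each dict or list level adds 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def longest_string(value):
    """Length of the longest string value anywhere inside the item."""
    if isinstance(value, str):
        return len(value)
    if isinstance(value, dict):
        return max((longest_string(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((longest_string(v) for v in value), default=0)
    return 0


def first_failure(item, allow_list):
    """Return the quarantine reason for an item, or None if it passes every check."""
    if not isinstance(item, dict):
        return "malformed"
    if any(key not in item for key in REQUIRED):
        return "schema"
    if depth(item) > MAX_DEPTH or longest_string(item) > MAX_STRING_LEN:
        return "guardrail"
    # The allow-list is inert (None) until a resolver supplies known candidates.
    if allow_list is not None and item["candidate"] not in allow_list:
        return "allow_list"
    return None


def partition_items(items, max_items, allow_list=None):
    """Split items into kept and quarantined; nothing is dropped silently."""
    kept, quarantined = [], []
    for index, item in enumerate(items):
        reason = first_failure(item, allow_list)
        if reason is None and len(kept) >= max_items:
            reason = "over_limit"
        if reason is None:
            kept.append(item)
        else:
            quarantined.append({"index": index, "reason": reason, "raw": repr(item)[:200]})
    return kept, quarantined
```

Ordering the checks from cheapest and most structural to most contextual means each quarantined item carries the earliest reason it failed, which keeps the provenance easy to read.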
 ## Consequences
 - A single malformed or oversized item no longer discards an entire activation;
  the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid
  recommendations and quarantine the broken tail.
 - Reports gain a `partial` / `quarantined_*` vocabulary; downstream report sinks
  and reviewers can distinguish degraded-but-usable from total loss.
 - Guardrail thresholds (`_MAX_DEPTH`, `_MAX_STRING_LEN`, `maxItems`, the
  allow-list) are policy knobs that will need tuning; they are intentionally
  conservative defaults, not a finished calibration.
 - **Known retention gap (follow-on):** `LLMConnectClient.complete()` still returns
  only `content`, discarding `finish_reason`/`usage`, and the total-loss artifact
  caps raw output below realistic break points. Capturing those signals so
  failures stay debuggable is tracked as a retention fix, not closed by this ADR.
 ## Alternatives considered
 - **Hard-enforce `maxItems` in the validator.** Rejected: a hard reject of an
  over-count document reproduces the whole-document blast radius. Mitigation (keep
  top-N, quarantine the rest) is preferred.
 - **Relax the schema to accept anything.** Rejected: violates principle 1; pushes
  malformed data into downstream consumers.
 - **Retry-until-valid only (pure posture A).** Rejected as the sole strategy: the
  2026-06-26 failure recurred across both the initial attempt and the retry, so
  retry alone does not bound the blast radius.
 ## References
 - ACT-ADR-002 — markdown-as-definition format and output schema governance.
 - ACT-ADR-003 — Rule vs. Instruction model; the Instruction prompt-injection
  surface this boundary complements on the output side.
 - `workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md` — the
  implementing workplan.
--- a/docs/issue-core-emission-boundary.md
+++ b/docs/issue-core-emission-boundary.md
@@ -11,7 +11,9 @@ The current authoritative boundary is the issue-core REST API:
 POST {ISSUE_CORE_URL}/issues/
 ```
-`IssueCoreRestSink` sends this payload:
+`IssueCoreRestSink` authenticates with the shared `ISSUE_CORE_API_KEY` env var
 (same value as the issue-core server) via `Authorization: Bearer <key>` and
 sends this payload:
 ```json
 {
@@ -52,7 +54,7 @@ task reference before it can replace `IssueCoreRestSink`.
 Weekly SBOM staleness is safe to evaluate in dry-run mode because the rule
 contract is deterministic and tested. Do not enable it against the real REST sink
-until issue-core credentials, endpoint reachability, and duplicate-handling are
+until `ISSUE_CORE_API_KEY`, endpoint reachability, and duplicate-handling are
 verified in the target environment.
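Before enabling the real sink, the request shape can be checked without sending anything. The sketch below builds the `POST {ISSUE_CORE_URL}/issues/` request with the Bearer header using only the standard library. `build_issue_request` is a hypothetical helper rather than `IssueCoreRestSink` code, and the payload passed in the usage line is a placeholder, not the documented payload.

```python
import json
import urllib.request


def build_issue_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the issue-core issues endpoint."""
    if not api_key:
        raise ValueError("ISSUE_CORE_API_KEY is empty")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/issues/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder payload for illustration only; use the documented fields in practice.
request = build_issue_request("http://127.0.0.1:8765", "example-key", {"title": "placeholder"})
```

Inspecting `request.full_url` and `request.get_header("Authorization")` gives a quick offline check that the tunnel port and key wiring match the target environment.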
 ## Verification
--- a/docs/runbook.md
+++ b/docs/runbook.md
@@ -116,7 +116,129 @@ asyncio.run(publish())
 ---
-## Syncing schedules manually
+## Syncing definitions and schedules manually
 When the API is running, prefer the admin sync endpoint for definition or
 schedule changes. It refreshes file-backed ActivityDefinitions and reconciles
 Temporal Schedules without restarting the worker:
 ```bash
 curl -s -X POST \
  'http://localhost:8010/admin/sync?definitions=true&schedules=true'
 ```
 The response reports:
 - `definitions.synced`
 - `event_types.synced`
 - `schedules.upserted`
 - `schedules.paused`
 - `schedules.deleted_orphans`
 - bounded `errors[]`
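
For scripted checks, the response fields listed above can be flattened into one line per counter. This is a sketch under an assumption: it treats the dotted names as nested JSON objects (for example `{"schedules": {"upserted": 2}}`), which this runbook does not state explicitly. `summarize_sync` is a hypothetical helper, not part of the repo.

```python
# Field names come from the list above; the nesting is an assumption.
_SYNC_FIELDS = [
    ("definitions", "synced"),
    ("event_types", "synced"),
    ("schedules", "upserted"),
    ("schedules", "paused"),
    ("schedules", "deleted_orphans"),
]


def summarize_sync(response: dict) -> str:
    """Render one `section.field=value` line per reported counter."""
    lines = []
    for section, field in _SYNC_FIELDS:
        value = (response.get(section) or {}).get(field, "n/a")
        lines.append(f"{section}.{field}={value}")
    # errors[] is bounded by the API, so reporting its length is cheap.
    lines.append(f"errors={len(response.get('errors') or [])}")
    return "\n".join(lines)
```

Sections that were not requested (such as `event_types` when the flag is left off) show up as `n/a` rather than raising, so the same helper works for every flag combination.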
 ## Automation inventory
 Use the repo-native inventory command to answer "what automations are scheduled
 at all?" before checking whether a recent window succeeded. The command is
 read-only: it loads ActivityDefinition rows or files and, when `TEMPORAL_HOST`
 is configured, describes Temporal schedules for visibility. It does not sync,
 upsert, pause, delete, or enqueue schedules.
 ```bash
 # Human-readable configured automation inventory.
 make automation-list
 # JSON for scripts or assistant summarization.
 make automation-list-json
 # Common filters.
 make automation-list ENABLED=true TRIGGER=cron
 make automation-list ACTIVITY_ID=6fca51fa-387a-4fd0-bc4e-d62c29eb859a
 ```
 Inventory answers what is configured; `make automation-status` answers what
 happened in a time window. Missing optional live sources are warnings, not
 silent omissions, so a degraded local run still lists repo definition files.
 Compact human output looks like:
 ```text
 - Daily State Hub WSJF Triage [enabled cron] schedule=activity-schedule-... trigger=20 7 * * * tz=Europe/Berlin source=files temporal=not_checked
 ```
 ## Automation status
 Use the repo-native status command to answer operator questions such as "how did
 our automations go since Friday?". This is the baseline evidence surface; LLMs
 or coding assistants may summarize the output, but they are not the scheduler or
 source of truth.
 ```bash
 # Human-readable status. `friday` resolves in Europe/Berlin by default.
 make automation-status SINCE=friday
 # JSON for scripts or assistant summarization.
 make automation-status-json SINCE=2026-06-26
 ```
 The command reads activity-core owned evidence only: ActivityDefinition files or
 DB rows, `activity_runs`, State Hub progress, working-memory report notes, and
 Temporal visibility when `TEMPORAL_HOST` is configured. Missing live sources are
 reported as warnings rather than hidden. It exits non-zero for real automation
 failures such as `missed`, `validation_failed`, or `sink_failed`.
 Useful knobs:
 ```bash
 AUTOMATION_STATUS_TIMEOUT_SECONDS=10 make automation-status SINCE=friday
 make automation-status SINCE=2026-06-26 FORMAT=json
 make automation-status SINCE=2026-06-26 UNTIL=2026-06-27 ACTCORE_DB_URL=
 ```
 Example distinction from the June 2026 daily triage evidence:
 ```text
 - Activity 6fca51fa-387a-4fd0-bc4e-d62c29eb859a [validation_failed] expected=0 runs=0 evidence=2
  evidence state_hub_progress event_type=daily_triage run=ebec6e41... output_validated=false validation_error=Unterminated string...
  evidence state_hub_progress event_type=daily_triage run=c7370f9c... output_validated=false validation_error=Expecting ',' delimiter...
 ```
 That means the schedule/report path left evidence, but the report was not a
 clean validated output. Disabled schedules, such as the gated weekly coding
 retro, are reported as `disabled` and are not counted as missed runs.
 `event_types` defaults to `false` for `POST /admin/sync` because event-triggered
 definitions already reload from the DB in the event router path; opt in when
 the operator intentionally changed event type definition files:
 ```bash
 curl -s -X POST \
  'http://localhost:8010/admin/sync?definitions=true&schedules=true&event_types=true'
 ```
 The v1 posture is manual/operator-triggered sync. A periodic background loop is
 deferred until live use shows it is needed; this keeps customer definition
 changes explicit and avoids background repo scanning from the worker.
 ### Railiance01 no-restart smoke
 After changing a projected definition in `k8s/railiance/20-runtime.yaml`,
 apply the ConfigMap and wait for the API pod volume to refresh (up to ~60s),
 then reconcile without restarting `actcore-worker`:
 ```bash
 export KUBECONFIG=~/.kube/config-hosteurope
 kubectl apply -f k8s/railiance/20-runtime.yaml
 sleep 60
 kubectl -n activity-core exec deploy/actcore-api -- \
  python3 -c 'import urllib.request; req=urllib.request.Request("http://localhost:8010/admin/sync?definitions=true&schedules=true", method="POST"); print(urllib.request.urlopen(req).read().decode())'
 ```
 Automated regression for the disabled `ops-service-inventory-probes`
 projection (enable/cadence flip, idempotent repeat sync, rollback) lives in
 `scripts/smoke_admin_sync_no_restart.py`.
 If the API is unavailable, the schedule-only CLI remains available:
 ```bash
 TEMPORAL_HOST=localhost:7233 \
@@ -126,7 +248,7 @@ ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
 This reconciles all Temporal Schedules with the `activity_definitions` table:
 - Upserts schedules for every enabled cron definition
- Creates paused schedules for disabled cron definitions
+- Creates paused schedules for disabled cron or one-shot scheduled definitions
 - Deletes orphaned schedules with no matching DB row
 After adding or changing a recurring ActivityDefinition or workflow activity
@@ -282,6 +404,52 @@ the same durable consumer name provides automatic failover.
 ---
 ## Run-miss recovery policies (cron triggers)
 A cron fire is **missed** when the worker or Temporal is unavailable at trigger
 time. `trigger_config.misfire_policy` selects what happens when the system
 recovers. Each policy combines a Temporal **catchup window** (how far back missed
 fires are recovered) with an **overlap policy** (what to do if a recovered fire
 would start while a prior run is still executing):
 | `misfire_policy` | Behaviour | Default catchup window | Overlap |
 | --- | --- | --- | --- |
 | `skip` | Run on trigger or skip — a missed fire is never recovered | 60s grace | `SKIP` |
 | `catchup_all` | Recover **every** fire missed during the outage | 365 days | `BUFFER_ALL` |
 | `catchup_latest` | Recover only the **most recent** missed fire; no backlog | 24h | `BUFFER_ONE` |
 Set `trigger_config.catchup_window_seconds` to override the per-policy default
 (e.g. an hourly definition using `catchup_latest` should set it to ~3600 so a
 single missed hour is recovered but older ones are not).
 Legacy values are still accepted: `catchup` → `catchup_all`,
 `compress` → `catchup_latest`.
 > **Why this exists:** before ACTIVITY-WP-0014 no catchup window was set, so a
 > brief outage at trigger time silently dropped the fire with no recovery and no
 > log line. The `daily-statehub-wsjf-triage` definition now uses `catchup_latest`.
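
The policy table and the legacy aliases can be read as a small lookup. The sketch below is illustrative only: `MisfireSettings` and `resolve_misfire_settings` are hypothetical names, and the overlap values are plain strings rather than Temporal SDK enums. The defaults and alias mapping are copied from this section.

```python
from dataclasses import dataclass

# Legacy values accepted for backward compatibility (from the text above).
_LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}

# policy -> (default catchup window in seconds, overlap policy name)
_DEFAULTS = {
    "skip": (60, "SKIP"),
    "catchup_all": (365 * 24 * 3600, "BUFFER_ALL"),
    "catchup_latest": (24 * 3600, "BUFFER_ONE"),
}


@dataclass(frozen=True)
class MisfireSettings:
    policy: str
    catchup_window_seconds: int
    overlap: str


def resolve_misfire_settings(trigger_config: dict) -> MisfireSettings:
    """Normalize a trigger_config into policy, window, and overlap."""
    raw = trigger_config["misfire_policy"]
    policy = _LEGACY_ALIASES.get(raw, raw)
    if policy not in _DEFAULTS:
        raise ValueError(f"unknown misfire_policy: {raw!r}")
    default_window, overlap = _DEFAULTS[policy]
    # An explicit catchup_window_seconds overrides the per-policy default.
    window = int(trigger_config.get("catchup_window_seconds", default_window))
    return MisfireSettings(policy, window, overlap)
```

For the hourly case mentioned above, `resolve_misfire_settings({"misfire_policy": "catchup_latest", "catchup_window_seconds": 3600})` keeps `BUFFER_ONE` but narrows the window to one hour.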
 ## State Hub write idempotency (ACTIVITY-WP-0014 T05)
 Every State Hub write from activity-core (report-sink progress, ops-evidence
 progress, schedule-miss alerts) carries a stable **`Idempotency-Key`** header
 derived deterministically from the write's identity
 (`run_id:instruction_id:event_type`, or `schedule_miss:activity_id:last_fired`
 for miss alerts). This makes writes safe to **buffer and replay** under the
 planned State Hub *beachhead* (per-machine read cache + write outbox): a flush —
 possibly retried after an outage — cannot create duplicate progress/triage
 events once State Hub / the beachhead honours the header.
 The guarantee lives on the write, not on a live dedup read. The read-based
 `_progress_exists` check is now best-effort only: if State Hub is unreachable it
 returns `False` (proceed to the keyed write) rather than hard-failing. The header
 passes untouched through the `actcore-state-hub-bridge` proxy and is ignored by
 State Hub versions that do not yet honour it.
 > The queue/cache itself is **not** built in activity-core — it belongs to the
 > state-hub beachhead. activity-core only emits the key. See the proposal sent to
 > the `state-hub` agent.
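
A deterministic key of this kind can be sketched as follows. The identity strings (`run_id:instruction_id:event_type` and `schedule_miss:activity_id:last_fired`) come from this section; hashing them with SHA-256 is an assumption made for the example, and the function names are hypothetical. The actual derivation in activity-core may differ.

```python
import hashlib


def idempotency_key(*parts: str) -> str:
    """Derive a stable key from the write's identity parts.

    Assumption: the colon-joined identity is hashed with SHA-256 so the
    header value has a fixed length regardless of the inputs.
    """
    identity = ":".join(parts)
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()


def progress_write_key(run_id: str, instruction_id: str, event_type: str) -> str:
    return idempotency_key(run_id, instruction_id, event_type)


def schedule_miss_key(activity_id: str, last_fired: str) -> str:
    return idempotency_key("schedule_miss", activity_id, last_fired)
```

Because the key depends only on the write's identity, a replayed flush produces the same `Idempotency-Key` value as the original attempt, which is what lets the receiving side discard the duplicate.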
 ## Troubleshooting
 ### Worker fails to start: "ACTCORE_DB_URL is required"
@@ -291,6 +459,9 @@ Set the environment variable before running the worker.
 1. Check Temporal UI → Schedules tab for the schedule status.
 2. Ensure `enabled=True` on the ActivityDefinition (paused schedules don't fire).
 3. Verify the cron expression with: `docker exec temporal-admin-tools temporal schedule describe --schedule-id activity-schedule-<uuid>`
 4. If a fire was **missed entirely** (no run, no failure event) during an outage,
   check `misfire_policy` — under `skip` missed fires are dropped by design. Use
   `catchup_all` or `catchup_latest` to recover them. See *Run-miss recovery policies*.
 ### Event not routing
 1. Check NATS monitoring: http://localhost:8222/jsz to verify the `ACTIVITY_EVENTS` stream exists.
--- a/k8s/railiance/20-runtime.yaml
+++ b/k8s/railiance/20-runtime.yaml
@@ -14,8 +14,8 @@ data:
  LLM_CONNECT_URL: http://llm-connect.activity-core.svc.cluster.local:8080
  LLM_CONNECT_TIMEOUT_SECONDS: "300"
  REPO_SCOPING_URL: http://repo-scoping.repo-scoping.svc.cluster.local:8020
-  ISSUE_CORE_URL: http://issue-core.issue-core.svc.cluster.local:8010
+  ISSUE_CORE_URL: http://actcore-issue-core-bridge.activity-core.svc.cluster.local:8765
-  ISSUE_SINK_TYPE: "null"
+  ISSUE_SINK_TYPE: "rest"
  ACTIVITY_DEFINITION_DIRS: /etc/activity-core/external-definitions
  OPS_INVENTORY_PATH: /etc/activity-core/ops/service-inventory.yml
  INTER_HUB_URL: ""
@@ -47,7 +47,10 @@ data:
      type: cron
      cron_expression: "20 7 * * *"
      timezone: Europe/Berlin
-      misfire_policy: skip
+      # ACTIVITY-WP-0014: recover the most recent missed daily fire when the
      # worker/Temporal was unavailable at trigger time, without accumulating a
      # backlog after a multi-day outage.
      misfire_policy: catchup_latest
    context_sources:
      - type: static
        bind_to: context.prompt_path
@@ -91,15 +94,19 @@ data:
      Score each recommendation with the WSJF rubric from the prompt:
      (strategic_value + time_criticality + risk_reduction +
      opportunity_enablement) / job_size. Use integer factor values from 1 to 5,
-      round score to one decimal place, sort recommendations by rank, and return at
+      round score to one decimal place, sort recommendations by rank, and return
-      most 10 recommendations.
+      only the bounded top-7 (at most 7) ranked recommendations. If uncertain,
      emit fewer well-formed recommendations rather than more.
      Curated digest:
      {context.daily_triage_digest}
      Return only JSON matching
-      `/etc/activity-core/schemas/daily-triage-report.json`. Do not wrap the JSON
+      `/etc/activity-core/schemas/daily-triage-report.json`. Emit the "summary"
-      in Markdown fences or add prose before or after it:
+      field first, then inside the "recommendations" array write one complete
      recommendation JSON object per line (NDJSON-style per-item framing) so
      each item can be recovered independently if the output is truncated. Do
      not wrap the JSON in Markdown fences or add prose before or after it:
      {
        "summary": "short operator-facing summary",
        "recommendations": [
@@ -164,6 +171,36 @@ data:
    Kubernetes projection of the Custodian-owned definition in
    `/home/worsch/the-custodian/activity-definitions/hourly-recently-on-scope.md`.
  state-hub-consistency-sweep.md: |
    ---
    id: "7c4e9a12-8f3b-4d5e-9c6a-1b2d3e4f5a6b"
    name: "State Hub Consistency Sweep"
    type: activity-definition
    version: "1.0"
    enabled: true
    owner: custodian
    governance: custodian
    status: active
    created: "2026-06-21"
    trigger:
      type: cron
      cron_expression: "*/15 * * * *"
      timezone: UTC
      misfire_policy: skip
    context_sources:
      - type: state-hub
        query: consistency_sweep_remote_all
        required: true
        params:
          max_seconds: 300
          source: activity-core
        bind_to: context.consistency_sweep_remote_all
    ---
    # ActivityDefinition: State Hub Consistency Sweep
    Kubernetes projection of the Custodian-owned definition in
    `/home/worsch/the-custodian/activity-definitions/state-hub-consistency-sweep.md`.
  ops-service-inventory-probes.md: |
    ---
    id: "40d15a87-7ff6-4d8e-992c-37df15f95110"
@@ -399,7 +436,7 @@ data:
        "recommendations": {
          "type": "array",
          "minItems": 1,
-          "maxItems": 10,
+          "maxItems": 7,
          "items": {
            "type": "object",
            "required": ["rank", "candidate", "action", "why", "confidence", "wsjf"],
@@ -408,7 +445,7 @@ data:
              "rank": {
                "type": "integer",
                "minimum": 1,
-                "maximum": 10
+                "maximum": 7
              },
              "candidate": {
                "type": "string"
@@ -578,7 +615,8 @@ spec:
                          method=self.command,
                      )
                      try:
-                          with urlopen(request, timeout=30) as response:
+                          timeout = 360 if self.command == "POST" else 30
                          with urlopen(request, timeout=timeout) as response:
                              payload = response.read()
                              self.send_response(response.status)
                              for key, value in response.headers.items():
@@ -599,12 +637,123 @@ spec:
              ThreadingHTTPServer(("0.0.0.0", 18080), Proxy).serve_forever()
          readinessProbe:
            httpGet:
-              path: /state/summary
+              path: /state/health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 6
 ---
 apiVersion: v1
 kind: Service
 metadata:
  name: actcore-issue-core-bridge
  namespace: activity-core
  labels:
    app.kubernetes.io/name: actcore-issue-core-bridge
    app.kubernetes.io/part-of: activity-core
 spec:
  selector:
    app.kubernetes.io/name: actcore-issue-core-bridge
  ports:
    - name: http
      port: 8765
      targetPort: http
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
  name: actcore-issue-core-bridge
  namespace: activity-core
  labels:
    app.kubernetes.io/name: actcore-issue-core-bridge
    app.kubernetes.io/part-of: activity-core
 spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: actcore-issue-core-bridge
  template:
    metadata:
      labels:
        app.kubernetes.io/name: actcore-issue-core-bridge
        app.kubernetes.io/part-of: activity-core
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: proxy
          image: activity-core:railiance01-prod
          imagePullPolicy: Never
          ports:
            - name: http
              containerPort: 18081
          command:
            - python
            - -c
            - |
              from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
              from urllib.error import HTTPError, URLError
              from urllib.request import Request, urlopen
              TARGET = "http://127.0.0.1:18765"
              HOP_HEADERS = {"connection", "host", "keep-alive", "proxy-authenticate",
                             "proxy-authorization", "te", "trailers",
                             "transfer-encoding", "upgrade"}
              class Proxy(BaseHTTPRequestHandler):
                  def do_GET(self):
                      self._proxy()
                  def do_POST(self):
                      self._proxy()
                  def do_PATCH(self):
                      self._proxy()
                  def _proxy(self):
                      length = int(self.headers.get("content-length", "0") or "0")
                      body = self.rfile.read(length) if length else None
                      headers = {
                          key: value
                          for key, value in self.headers.items()
                          if key.lower() not in HOP_HEADERS
                      }
                      request = Request(
                          TARGET + self.path,
                          data=body,
                          headers=headers,
                          method=self.command,
                      )
                      try:
                          timeout = 360 if self.command == "POST" else 30
                          with urlopen(request, timeout=timeout) as response:
                              payload = response.read()
                              self.send_response(response.status)
                              for key, value in response.headers.items():
                                  if key.lower() not in HOP_HEADERS:
                                      self.send_header(key, value)
                              self.end_headers()
                              self.wfile.write(payload)
                      except HTTPError as exc:
                          payload = exc.read()
                          self.send_response(exc.code)
                          self.end_headers()
                          self.wfile.write(payload)
                      except URLError as exc:
                          self.send_response(502)
                          self.end_headers()
                          self.wfile.write(str(exc).encode())
              ThreadingHTTPServer(("0.0.0.0", 18081), Proxy).serve_forever()
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 6
 ---
 apiVersion: batch/v1
 kind: Job
--- a/schemas/daily-triage-report.json
+++ b/schemas/daily-triage-report.json
@@ -1,4 +1,5 @@
 {
  "$comment": "ACTIVITY-WP-0016-T02. Strict, bounded contract for the daily WSJF triage report. The per-item 'recommendations' schema is intentionally strict on STRUCTURE (types + required keys) so the T03 boundary parser can validate each recommendation independently and quarantine only the malformed ones. 'maxItems' is a producer hint (honoured by llm-connect constrained decoding and by the prompt); it is deliberately NOT hard-enforced by the in-repo validator, because rejecting a whole report for having too many items would reproduce the monolithic-failure bug WP-0016 exists to remove. Over-count is mitigated in T03 (keep top-N by rank, quarantine the rest). Value-domain vocabularies (action/confidence) are documented in the prompt and enforced by T04 guardrails with mitigation, not as brittle hard-fail enums here.",
  "type": "object",
  "required": ["summary", "recommendations"],
  "properties": {
@@ -7,8 +8,28 @@
    },
    "recommendations": {
      "type": "array",
      "maxItems": 7,
      "items": {
-        "type": "object"
+        "type": "object",
        "required": ["rank", "candidate", "action", "why"],
        "properties": {
          "rank": { "type": "integer" },
          "candidate": { "type": "string" },
          "action": { "type": "string" },
          "why": { "type": "string" },
          "confidence": { "type": "string" },
          "wsjf": {
            "type": "object",
            "properties": {
              "score": { "type": "number" },
              "strategic_value": { "type": "number" },
              "time_criticality": { "type": "number" },
              "risk_reduction": { "type": "number" },
              "opportunity_enablement": { "type": "number" },
              "job_size": { "type": "number" }
            }
          }
        }
      }
    }
  }
--- a/scripts/automation_inventory.py
+++ b/scripts/automation_inventory.py
@@ -0,0 +1,8 @@
 #!/usr/bin/env python3
 """CLI wrapper for the repo-native automation inventory report."""
 from activity_core.automation_status import inventory_main
 if __name__ == "__main__":
    raise SystemExit(inventory_main())
--- a/scripts/automation_status.py
+++ b/scripts/automation_status.py
@@ -0,0 +1,8 @@
 #!/usr/bin/env python3
 """CLI wrapper for the repo-native automation status report."""
 from activity_core.automation_status import main
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/scripts/smoke_admin_sync_no_restart.py
+++ b/scripts/smoke_admin_sync_no_restart.py
@@ -0,0 +1,212 @@
 #!/usr/bin/env python3
 """Railiance01 no-restart smoke for POST /admin/sync.
 Patches the disabled ops-service-inventory-probes projection in the cluster
 ConfigMap, waits for the API pod volume to refresh, runs /admin/sync twice,
 verifies DB + Temporal schedule drift without restarting actcore-worker, then
 rolls the ConfigMap back to the disabled baseline.
 Requires:
  - KUBECONFIG pointing at railiance01 (for example ~/.kube/config-hosteurope)
  - kubectl access to the activity-core namespace
 Example:
  export KUBECONFIG=~/.kube/config-hosteurope
  python3 scripts/smoke_admin_sync_no_restart.py
 """
 from __future__ import annotations
 import json
 import subprocess
 import sys
 import time
 ACTIVITY_ID = "40d15a87-7ff6-4d8e-992c-37df15f95110"
 CONFIGMAP = "actcore-external-activity-definitions"
 DEFINITION_KEY = "ops-service-inventory-probes.md"
 MOUNTED_FILE = (
    "/etc/activity-core/external-definitions/activity-definitions/"
    f"{DEFINITION_KEY}"
 )
 VOLUME_PROPAGATION_SECONDS = 65
 def kubectl(*args: str, input_text: str | None = None) -> str:
    cmd = ["kubectl", "-n", "activity-core", *args]
    return subprocess.check_output(
        cmd,
        input=input_text,
        text=True,
    )
 def api_json(path: str, *, method: str = "GET") -> dict:
    script = (
        "import urllib.request, json\n"
        f'req = urllib.request.Request("http://localhost:8010{path}", method="{method}")\n'
        "print(urllib.request.urlopen(req).read().decode())"
    )
    return json.loads(kubectl("exec", "deploy/actcore-api", "--", "python3", "-c", script))
 def worker_lines(script: str) -> list[str]:
    return kubectl("exec", "deploy/actcore-worker", "--", "python3", "-c", script).splitlines()
 def worker_uid() -> str:
    return kubectl(
        "get",
        "pod",
        "-l",
        "app.kubernetes.io/name=actcore-worker",
        "-o",
        "jsonpath={.items[0].metadata.uid}",
    ).strip()
 def load_configmap() -> dict:
    return json.loads(kubectl("get", "configmap", CONFIGMAP, "-o", "json"))
 def apply_configmap(cm: dict) -> None:
    kubectl("apply", "-f", "-", input_text=json.dumps(cm))
 def patch_definition(cm: dict, *, enabled: bool, cron: str) -> None:
    text = cm["data"][DEFINITION_KEY]
    for line in text.splitlines():
        if line.strip().startswith("enabled:"):
            break
    else:
        raise RuntimeError("enabled field not found in projection")
    text = _replace_once(text, 'enabled: false', f"enabled: {'true' if enabled else 'false'}")
    text = _replace_once(text, 'enabled: true', f"enabled: {'true' if enabled else 'false'}")
    text = _replace_once(
        text,
        'cron_expression: "15 * * * *"',
        f'cron_expression: "{cron}"',
    )
    text = _replace_once(
        text,
        'cron_expression: "25 * * * *"',
        f'cron_expression: "{cron}"',
    )
    cm["data"][DEFINITION_KEY] = text
    apply_configmap(cm)
 def _replace_once(text: str, old: str, new: str) -> str:
    if old not in text:
        return text
    return text.replace(old, new, 1)
 def wait_for_mount(*, enabled: bool, cron: str) -> None:
    deadline = time.time() + VOLUME_PROPAGATION_SECONDS
    want_enabled = "enabled: true" if enabled else "enabled: false"
    want_cron = f'cron_expression: "{cron}"'
    while time.time() < deadline:
        content = kubectl("exec", "deploy/actcore-api", "--", "cat", MOUNTED_FILE)
        if want_enabled in content and want_cron in content:
            return
        time.sleep(5)
    raise RuntimeError(
        f"ConfigMap projection did not refresh within {VOLUME_PROPAGATION_SECONDS}s"
    )
 def get_definition() -> dict[str, object]:
    for item in api_json("/activity-definitions/"):
        if item["id"] == ACTIVITY_ID:
            return {
                "enabled": item["enabled"],
                "cron": item["trigger_config"]["cron_expression"],
            }
    raise RuntimeError(f"ActivityDefinition {ACTIVITY_ID} not found")
 def describe_schedule() -> dict[str, object]:
    script = f"""
 import asyncio
 from temporalio.client import Client
 async def main() -> None:
    client = await Client.connect("actcore-temporal:7233")
    handle = client.get_schedule_handle("activity-schedule-{ACTIVITY_ID}")
    described = await handle.describe()
    schedule = described.schedule
    minute = schedule.spec.calendars[0].minute[0].start if schedule.spec.calendars else None
    print(schedule.state.paused)
    print(minute)
 asyncio.run(main())
 """
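    paused, minute = worker_lines(script)
    # The remote script prints the literal "None" when the schedule has no
    # calendar spec; report that as a missing minute instead of crashing.
    return {"paused": paused == "True", "minute": None if minute == "None" else int(minute)}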
 def main() -> int:
    worker_before = worker_uid()
    cm = load_configmap()
    print("1) enable + cadence change via ConfigMap")
    patch_definition(cm, enabled=True, cron="25 * * * *")
    wait_for_mount(enabled=True, cron="25 * * * *")
    print("2) POST /admin/sync (first pass)")
    sync1 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
    if not sync1.get("ok"):
        print(json.dumps(sync1, indent=2), file=sys.stderr)
        return 1
    defn = get_definition()
    schedule = describe_schedule()
    print("   definition:", defn)
    print("   schedule:", schedule)
    if defn != {"enabled": True, "cron": "25 * * * *"}:
        print("definition drift after sync", file=sys.stderr)
        return 1
    if schedule["paused"] or schedule["minute"] != 25:
        print("schedule drift after enable sync", file=sys.stderr)
        return 1
    print("3) POST /admin/sync (idempotent repeat)")
    sync2 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
    if sync2.get("schedules") != sync1.get("schedules"):
        print("idempotent schedule counts changed", file=sys.stderr)
        print(json.dumps({"sync1": sync1, "sync2": sync2}, indent=2), file=sys.stderr)
        return 1
    print("4) rollback ConfigMap + sync")
    cm = load_configmap()
    patch_definition(cm, enabled=False, cron="15 * * * *")
    wait_for_mount(enabled=False, cron="15 * * * *")
    sync3 = api_json("/admin/sync?definitions=true&schedules=true", method="POST")
    if not sync3.get("ok"):
        print(json.dumps(sync3, indent=2), file=sys.stderr)
        return 1
    defn = get_definition()
    schedule = describe_schedule()
    print("   definition:", defn)
    print("   schedule:", schedule)
    if defn != {"enabled": False, "cron": "15 * * * *"}:
        print("rollback definition drift", file=sys.stderr)
        return 1
    if not schedule["paused"] or schedule["minute"] != 15:
        print("rollback schedule drift", file=sys.stderr)
        return 1
    worker_after = worker_uid()
    if worker_before != worker_after:
        print("actcore-worker pod restarted during smoke", file=sys.stderr)
        return 1
    print("smoke passed: admin sync hot-reload without worker restart")
    return 0
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/src/activity_core/activities.py
+++ b/src/activity_core/activities.py
@@ -149,6 +149,8 @@ async def resolve_context(
        query = source.get("query", "")
        params = source.get("params") or {}
        required = bool(source.get("required") or params.get("required", False))
        resolver_params = dict(params)
        resolver_params["required"] = required
        raw_bind = source.get("bind_to") or source.get("name") or source_type
        # Strip the 'context.' namespace prefix so evaluator can find the key.
        bind_key = raw_bind.removeprefix("context.") if raw_bind.startswith("context.") else raw_bind
@@ -172,7 +174,7 @@ async def resolve_context(
            continue
        try:
-            resolved = resolver_cls().resolve(query, event_envelope, params)
+            resolved = resolver_cls().resolve(query, event_envelope, resolver_params)
            snapshot[bind_key] = _bind_resolver_result(bind_key, resolved)
        except Exception as exc:
            if required:
@@ -364,6 +366,7 @@ async def evaluate_instructions(payload: dict) -> dict:
                "output_validated": result.output_validated,
                "review_required": result.review_required,
                "validation_error": result.validation_error,
                "llm_response_metadata": result.llm_response_metadata,
            })
        for spec in result.tasks:
            task_specs.append({
--- a/src/activity_core/api.py
+++ b/src/activity_core/api.py
@@ -40,6 +40,7 @@ from temporalio.client import Client
 from activity_core.models import ActivityDefinition, CronTriggerConfig
 from activity_core.orm import ActivityDefinition as ActivityDefinitionRow, EventType as EventTypeRow
 from activity_core.schedule_manager import delete_schedule, upsert_schedule
 from activity_core.sync_service import run_sync
 from activity_core.webhook_receiver import router as webhook_router
 TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
@@ -275,6 +276,24 @@ async def trigger_definition(definition_id: uuid.UUID) -> dict[str, str]:
    return {"workflow_id": handle.id, "trigger_key": trigger_key}
 # --- Admin sync ---------------------------------------------------------------
 @app.post("/admin/sync")
 async def admin_sync(
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
 ) -> dict[str, Any]:
    """Run operator-triggered definition/event/schedule sync without restart."""
    return await run_sync(
        session_factory=_get_db(),
        temporal_client=_get_temporal() if schedules else None,
        definitions=definitions,
        schedules=schedules,
        event_types=event_types,
    )
 # T42: Curator gate — event type approval endpoint
@app.get("/health")
--- a/src/activity_core/automation_status.py
+++ b/src/activity_core/automation_status.py
--- a/src/activity_core/context_resolvers/__init__.py
+++ b/src/activity_core/context_resolvers/__init__.py
@@ -4,4 +4,5 @@ from activity_core.context_resolvers import (  # noqa: F401
    ops_inventory,
    repo_scoping,
    state_hub,
    reuse_surface,
 )
--- a/src/activity_core/context_resolvers/reuse_surface.py
+++ b/src/activity_core/context_resolvers/reuse_surface.py
@@ -0,0 +1,516 @@
 """Reuse-surface registry hygiene context adapter.
 Registered as source type ``reuse-surface`` and as the ``shell`` resolver
 dispatcher for the ``reuse_surface_report_gaps`` query.  Other shell queries
 continue to delegate to the kaizen resolver for backward compatibility.
 """
 from __future__ import annotations
 import json
 import logging
 import os
 import socket
 import subprocess
 from dataclasses import dataclass
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Any
 import httpx
 import yaml
 from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, ContextResolver
 from activity_core.context_resolvers.kaizen import KaizenContextResolver
 from activity_core.context_resolvers.state_hub import StateHubContextResolver
 logger = logging.getLogger(__name__)
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _REPORT_TIMEOUT_SECONDS = 60
 _STATE_HUB_TIMEOUT_SECONDS = 10.0
 _KNOWN_SIGNALS = frozenset(
    {
        "registry_gap",
        "empty_capability_scaffold",
        "stale_scope",
        "stale_sbom",
        "publish_check_fail",
    }
 )
 @dataclass(frozen=True)
 class RosterEntry:
    slug: str
    domain: str | None = None
    publish_check: str | None = None
 def _base_url() -> str:
    return os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL).rstrip("/")
 def _runner_host(params: dict[str, Any]) -> str:
    return str(
        params.get("runner_host")
        or os.environ.get("KAIZEN_RUNNER_HOST")
        or socket.gethostname()
    )
 def _as_required(params: dict[str, Any]) -> bool:
    return bool(params.get("required", False))
 def reuse_surface_report_gaps(params: dict[str, Any]) -> dict[str, Any]:
    """Resolve registry-hygiene gaps for the next rollout batch.
    When an operational dependency is missing, a required source re-raises the
    error and an optional source logs a warning and returns an empty gap list,
    so definitions can opt into either behavior without changing rule logic.
    """
    try:
        return _resolve_reuse_surface_report_gaps(params)
    except Exception as exc:
        if _as_required(params):
            raise
        logger.warning("reuse_surface_report_gaps unavailable: %s", exc)
        return {"gaps": []}
 def _resolve_reuse_surface_report_gaps(params: dict[str, Any]) -> dict[str, Any]:
    roster_path = _roster_path(params)
    entries = _load_active_roster_entries(roster_path)
    if not entries:
        return {"gaps": []}
    state_path = _round_robin_state_path(params, roster_path)
    selected, next_cursor = _select_round_robin_batch(
        entries,
        _batch_size(params),
        state_path,
    )
    if not selected:
        return {"gaps": []}
    signals = _enabled_signals(_signals_path(params, roster_path))
    roots = _resolve_repo_roots(selected, _runner_host(params))
    report = _reuse_surface_report(params, signals)
    gaps = _gap_records(selected, roots, signals, report)
    _write_round_robin_state(state_path, next_cursor, selected)
    return {"gaps": gaps}
 def _roster_path(params: dict[str, Any]) -> Path:
    raw = params.get("roster")
    if not raw:
        raise ValueError("reuse_surface_report_gaps requires params.roster")
    path = Path(str(raw)).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"reuse_surface_report_gaps roster not found: {path}")
    return path
 def _batch_size(params: dict[str, Any]) -> int:
    try:
        return max(1, int(params.get("batch_size", 3)))
    except (TypeError, ValueError):
        return 3
 def _round_robin_state_path(params: dict[str, Any], roster_path: Path) -> Path:
    raw = params.get("round_robin_state")
    if raw:
        return Path(str(raw)).expanduser()
    return roster_path.with_name("round-robin-state.json")
 def _signals_path(params: dict[str, Any], roster_path: Path) -> Path:
    raw = params.get("signals")
    if raw:
        return Path(str(raw)).expanduser()
    return roster_path.with_name("signals.yml")
 def _load_active_roster_entries(path: Path) -> list[RosterEntry]:
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"reuse_surface rollout roster is not a mapping: {path}")
    entries: dict[str, RosterEntry] = {}
    for domain, block in _iter_domain_blocks(data):
        if _domain_phase(block) != "active":
            continue
        for item in _repo_items(block):
            entry = _entry_from_item(item, domain, block)
            if entry and entry.slug not in entries:
                entries[entry.slug] = entry
    return list(entries.values())
 def _iter_domain_blocks(data: dict[str, Any]) -> list[tuple[str | None, dict[str, Any]]]:
    domains = data.get("domains")
    if isinstance(domains, dict):
        return [
            (str(name), block)
            for name, block in domains.items()
            if isinstance(block, dict)
        ]
    if isinstance(domains, list):
        return [
            (str(block.get("name") or block.get("domain") or ""), block)
            for block in domains
            if isinstance(block, dict)
        ]
    if isinstance(data.get("active"), list):
        return [(None, {"phase": "active", "repos": data["active"]})]
    return [
        (str(name), block)
        for name, block in data.items()
        if isinstance(block, dict) and ("phase" in block or "repos" in block)
    ]
 def _domain_phase(block: dict[str, Any]) -> str:
    return str(block.get("phase") or block.get("status") or "").lower()
 def _repo_items(block: dict[str, Any]) -> list[Any]:
    repos = (
        block.get("repos")
        or block.get("repo_slugs")
        or block.get("repositories")
        or block.get("slugs")
        or []
    )
    if isinstance(repos, dict):
        items: list[Any] = []
        for slug, config in repos.items():
            if isinstance(config, dict):
                item = dict(config)
                item.setdefault("slug", slug)
                items.append(item)
            else:
                items.append(str(slug))
        return items
    if isinstance(repos, list):
        return repos
    return []
 def _entry_from_item(
    item: Any,
    domain: str | None,
    block: dict[str, Any],
 ) -> RosterEntry | None:
    publish_check = block.get("publish_check")
    if isinstance(item, str):
        slug = item
    elif isinstance(item, dict):
        slug = item.get("slug") or item.get("repo") or item.get("name")
        publish_check = item.get("publish_check", publish_check)
    else:
        return None
    if not slug:
        return None
    return RosterEntry(
        slug=str(slug),
        domain=domain or None,
        publish_check=str(publish_check).lower() if publish_check is not None else None,
    )
 def _select_round_robin_batch(
    entries: list[RosterEntry],
    batch_size: int,
    state_path: Path,
 ) -> tuple[list[RosterEntry], int]:
    if not entries:
        return [], 0
    cursor = _read_round_robin_cursor(state_path) % len(entries)
    size = min(batch_size, len(entries))
    selected = [entries[(cursor + offset) % len(entries)] for offset in range(size)]
    next_cursor = (cursor + size) % len(entries)
    return selected, next_cursor
 def _read_round_robin_cursor(path: Path) -> int:
    if not path.is_file():
        return 0
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return 0
    if not isinstance(data, dict):
        return 0
    try:
        return int(data.get("cursor", 0))
    except (TypeError, ValueError):
        return 0
 def _write_round_robin_state(
    path: Path,
    cursor: int,
    selected: list[RosterEntry],
 ) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "cursor": cursor,
        "last_batch": [entry.slug for entry in selected],
        "updated_at": datetime.now(timezone.utc).isoformat(),
    }
    path.write_text(
        json.dumps(payload, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
 def _enabled_signals(path: Path) -> set[str]:
    if not path.is_file():
        return set(_KNOWN_SIGNALS)
    data = yaml.safe_load(path.read_text(encoding="utf-8"))
    node = data.get("signals") if isinstance(data, dict) else data
    enabled: set[str] = set()
    saw_known_signal = False
    if isinstance(node, dict):
        for name, config in node.items():
            if str(name) not in _KNOWN_SIGNALS:
                continue
            saw_known_signal = True
            if isinstance(config, dict) and config.get("enabled") is False:
                continue
            if config is False:
                continue
            enabled.add(str(name))
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, str) and item in _KNOWN_SIGNALS:
                saw_known_signal = True
                enabled.add(item)
            elif isinstance(item, dict):
                name = item.get("id") or item.get("signal") or item.get("name")
                if str(name) in _KNOWN_SIGNALS and item.get("enabled", True) is not False:
                    saw_known_signal = True
                    enabled.add(str(name))
    return enabled if saw_known_signal else set(_KNOWN_SIGNALS)
 def _resolve_repo_roots(
    entries: list[RosterEntry],
    runner_host: str,
 ) -> dict[str, Path]:
    requested = {entry.slug for entry in entries}
    roots: dict[str, Path] = {}
    for repo in _fetch_repos():
        slug = str(repo.get("slug") or "")
        if slug not in requested:
            continue
        raw = _repo_path_for_host(repo, runner_host)
        if raw:
            roots[slug] = Path(raw)
    return roots
 def _fetch_repos() -> list[dict[str, Any]]:
    url = f"{_base_url()}/repos/"
    try:
        resp = httpx.get(url, timeout=_STATE_HUB_TIMEOUT_SECONDS)
        resp.raise_for_status()
    except httpx.HTTPError as exc:
        raise RuntimeError(f"State Hub unreachable at {url}: {exc}") from exc
    payload = resp.json()
    if not isinstance(payload, list):
        raise RuntimeError(f"State Hub /repos/ returned non-list: {type(payload)!r}")
    return [repo for repo in payload if isinstance(repo, dict)]
 def _repo_path_for_host(repo: dict[str, Any], runner_host: str) -> str | None:
    host_paths = repo.get("host_paths") or {}
    raw = None
    if isinstance(host_paths, dict):
        raw = host_paths.get(runner_host)
    raw = raw or repo.get("local_path")
    if not raw or raw == "(unknown)":
        return None
    return str(raw)
 def _reuse_surface_report(params: dict[str, Any], signals: set[str]) -> dict[str, Any]:
    if not (signals & {"registry_gap", "empty_capability_scaffold"}):
        return {}
    binary = str(params.get("reuse_surface_bin") or "reuse-surface")
    try:
        completed = subprocess.run(
            [binary, "report", "gaps", "--format", "json"],
            capture_output=True,
            check=False,
            text=True,
            timeout=_REPORT_TIMEOUT_SECONDS,
        )
    except FileNotFoundError as exc:
        raise RuntimeError(f"reuse-surface CLI not found: {binary}") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError("reuse-surface report gaps timed out") from exc
    if completed.returncode != 0:
        detail = completed.stderr.strip() or completed.stdout.strip()
        raise RuntimeError(f"reuse-surface report gaps failed: {detail}")
    try:
        payload = json.loads(completed.stdout or "{}")
    except json.JSONDecodeError as exc:
        raise RuntimeError("reuse-surface report gaps returned invalid JSON") from exc
    if not isinstance(payload, dict):
        raise RuntimeError("reuse-surface report gaps returned non-object JSON")
    return payload
 def _gap_records(
    entries: list[RosterEntry],
    roots: dict[str, Path],
    signals: set[str],
    report: dict[str, Any],
 ) -> list[dict[str, Any]]:
    empty_scaffolds = _repo_set(report, {"empty_scaffolds", "empty_scaffold"})
    publish_fail = _repo_set(
        report,
        {"publish_fail", "publish_fails", "publish_failures"},
    )
    gaps: list[dict[str, Any]] = []
    seen: set[tuple[str, str]] = set()
    for entry in entries:
        root = roots.get(entry.slug)
        if root is None:
            logger.info("reuse_surface repo_unreachable slug=%s", entry.slug)
            continue
        if (
            signals & {"registry_gap", "empty_capability_scaffold"}
            and entry.slug in empty_scaffolds
        ):
            _append_gap(gaps, seen, entry.slug, root, "empty_capability_scaffold")
        if "registry_gap" in signals and entry.slug in publish_fail:
            _append_gap(gaps, seen, entry.slug, root, "registry_gap")
        if "publish_check_fail" in signals and entry.publish_check == "fail":
            _append_gap(gaps, seen, entry.slug, root, "publish_check_fail")
        if "stale_scope" in signals and _scope_is_stale(root):
            _append_gap(gaps, seen, entry.slug, root, "stale_scope")
        if "stale_sbom" in signals and _sbom_is_stale(entry.slug):
            _append_gap(gaps, seen, entry.slug, root, "stale_sbom")
    return gaps
 def _append_gap(
    gaps: list[dict[str, Any]],
    seen: set[tuple[str, str]],
    slug: str,
    root: Path,
    signal: str,
 ) -> None:
    key = (slug, signal)
    if key in seen:
        return
    seen.add(key)
    gaps.append(
        {
            "repo": slug,
            "root": str(root),
            "signal": signal,
            "hygiene_signal": signal,
        }
    )
 def _scope_is_stale(root: Path) -> bool:
    scope = root / "SCOPE.md"
    if not scope.is_file():
        return True
    age_seconds = datetime.now(timezone.utc).timestamp() - scope.stat().st_mtime
    return age_seconds > 90 * 24 * 60 * 60
 def _sbom_is_stale(slug: str) -> bool:
    payload = StateHubContextResolver().resolve(
        "repo_sbom_status",
        None,
        {"repo_slug": slug},
    )
    if not isinstance(payload, dict):
        return False
    try:
        return int(payload.get("sbom_age_days", 0)) > 30
    except (TypeError, ValueError):
        return False
 def _repo_set(report: dict[str, Any], keys: set[str]) -> set[str]:
    slugs: set[str] = set()
    for value in _values_for_keys(report, keys):
        slugs.update(_slugs_from_value(value))
    return slugs
 def _values_for_keys(value: Any, keys: set[str]) -> list[Any]:
    values: list[Any] = []
    if isinstance(value, dict):
        for key, nested in value.items():
            if key in keys:
                values.append(nested)
            values.extend(_values_for_keys(nested, keys))
    elif isinstance(value, list):
        for item in value:
            values.extend(_values_for_keys(item, keys))
    return values
 def _slugs_from_value(value: Any) -> set[str]:
    if isinstance(value, str):
        return {value}
    if isinstance(value, list):
        slugs: set[str] = set()
        for item in value:
            slugs.update(_slugs_from_value(item))
        return slugs
    if isinstance(value, dict):
        for key in ("repo", "repo_slug", "slug", "name"):
            if value.get(key):
                return {str(value[key])}
        slugs = set()
        for key, nested in value.items():
            if nested is True or isinstance(nested, (dict, list)):
                slugs.add(str(key))
            slugs.update(_slugs_from_value(nested))
        return slugs
    return set()
 class ReuseSurfaceContextResolver(ContextResolver):
    """Resolves reuse-surface registry hygiene gap reports."""
    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
        if query == "reuse_surface_report_gaps":
            return reuse_surface_report_gaps(params)
        return {}
 class ShellContextResolver(ContextResolver):
    """Dispatch shell-backed context queries without breaking kaizen aliases."""
    def resolve(self, query: str, event: Any, params: dict[str, Any]) -> dict[str, Any]:
        if query == "reuse_surface_report_gaps":
            return reuse_surface_report_gaps(params)
        return KaizenContextResolver().resolve(query, event, params)
 CONTEXT_RESOLVER_REGISTRY["reuse-surface"] = ReuseSurfaceContextResolver
 CONTEXT_RESOLVER_REGISTRY["shell"] = ShellContextResolver
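Review note: `_select_round_robin_batch` wraps the cursor modulo the roster length, so consecutive runs walk the roster in order and start again from the top. The sketch below isolates that arithmetic. It assumes plain string slugs instead of `RosterEntry` and takes the cursor as an argument rather than reading it from the state file; `select_batch` is a hypothetical name, not part of the module.

```python
def select_batch(slugs: list[str], batch_size: int, cursor: int) -> tuple[list[str], int]:
    """Return the next batch and the cursor to persist for the following run."""
    if not slugs:
        return [], 0
    start = cursor % len(slugs)         # tolerate a stale cursor after the roster shrinks
    size = min(batch_size, len(slugs))  # never pick the same slug twice in one batch
    batch = [slugs[(start + offset) % len(slugs)] for offset in range(size)]
    return batch, (start + size) % len(slugs)


roster = ["a", "b", "c", "d", "e"]
first, cursor = select_batch(roster, 3, 0)        # first three slugs
second, cursor = select_batch(roster, 3, cursor)  # wraps past the end of the roster
```

Because the cursor is reduced modulo the current roster length on every read, removing repos from the roster between runs cannot push the cursor out of range.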
--- a/src/activity_core/context_resolvers/state_hub.py
+++ b/src/activity_core/context_resolvers/state_hub.py
@@ -12,6 +12,7 @@ Supported queries:
  - coding_retro:     latest /progress/ item with event_type=coding_retro
  - daily_triage_digest: curated scalar JSON digest for daily WSJF triage
  - recently_on_scope_hourly: POST {STATE_HUB_URL}/recently-on-scope/hourly
  - consistency_sweep_remote_all: POST {STATE_HUB_URL}/consistency/sweep/remote-all
 No caching — state hub data is live operational state and must not be stale
 within a single workflow run.
@@ -31,6 +32,7 @@ from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY, Cont
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _TIMEOUT_SECONDS = 10.0
 _SWEEP_TIMEOUT_SECONDS = 330.0
 _OPEN_WORKSTREAM_STATUSES = {"active", "ready", "blocked"}
 _OPEN_TASK_STATUSES = {"wait", "todo", "progress"}
 # Sentinel age for repos that have never had an SBOM ingested. Large enough
@@ -53,13 +55,26 @@ def _fetch_json(path: str, params: dict[str, Any] | None = None) -> Any:
        return {}
-def _post_json(path: str, payload: dict[str, Any]) -> Any:
+def _post_json(path: str, payload: dict[str, Any], *, timeout: float = _TIMEOUT_SECONDS) -> Any:
    url = f"{_base_url()}{path}"
-    resp = httpx.post(url, json=payload, timeout=_TIMEOUT_SECONDS)
+    resp = httpx.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()
 def _validate_consistency_sweep_remote_all(result: Any) -> dict[str, Any]:
    if not isinstance(result, dict):
        raise RuntimeError("consistency_sweep_remote_all returned a non-object response")
    required_keys = {"exit_code", "lock_skipped", "repos_processed"}
    missing = required_keys - set(result)
    if missing:
        missing_list = ", ".join(sorted(missing))
        raise RuntimeError(
            f"consistency_sweep_remote_all response missing required key(s): {missing_list}"
        )
    return result
 def _validate_recently_on_scope_hourly(result: Any) -> dict[str, Any]:
    if not isinstance(result, dict):
        raise RuntimeError("recently_on_scope_hourly returned a non-object response")
@@ -107,6 +122,18 @@ class StateHubContextResolver(ContextResolver):
            }
            result = _post_json("/recently-on-scope/hourly", payload)
            return _validate_recently_on_scope_hourly(result)
        if query == "consistency_sweep_remote_all":
            payload = {
                key: value
                for key, value in params.items()
                if key not in {"required"}
            }
            result = _post_json(
                "/consistency/sweep/remote-all",
                payload,
                timeout=_SWEEP_TIMEOUT_SECONDS,
            )
            return _validate_consistency_sweep_remote_all(result)
        return {}
--- a/src/activity_core/issue_sink.py
+++ b/src/activity_core/issue_sink.py
@@ -20,7 +20,8 @@ from activity_core.rules.models import TaskRef, TaskSpec
 logger = logging.getLogger(__name__)
-ISSUE_CORE_URL = os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8010")
+ISSUE_CORE_URL = os.environ.get("ISSUE_CORE_URL", "http://127.0.0.1:8765")
 ISSUE_CORE_API_KEY_ENV = "ISSUE_CORE_API_KEY"
 ISSUE_SINK_TYPE = os.environ.get("ISSUE_SINK_TYPE", "rest")
@@ -30,10 +31,30 @@ class IssueSink(ABC):
 class IssueCoreRestSink(IssueSink):
-    """POSTs to issue-core REST API. Config: ISSUE_CORE_URL env var."""
+    """POSTs to issue-core REST API.
-    def __init__(self, base_url: str = ISSUE_CORE_URL) -> None:
+    Config: ISSUE_CORE_URL and ISSUE_CORE_API_KEY env vars (shared key with
    the issue-core server).
    """
    def __init__(
        self,
        base_url: str = ISSUE_CORE_URL,
        api_key: str | None = None,
    ) -> None:
        self._base_url = base_url.rstrip("/")
        if api_key is not None:
            self._api_key = api_key.strip()
        else:
            self._api_key = os.environ.get(ISSUE_CORE_API_KEY_ENV, "").strip()
    def _auth_headers(self) -> dict[str, str]:
        if not self._api_key:
            raise RuntimeError(
                f"{ISSUE_CORE_API_KEY_ENV} is not set. "
                "Required when ISSUE_SINK_TYPE=rest."
            )
        return {"Authorization": f"Bearer {self._api_key}"}
    def emit(self, task_spec: TaskSpec) -> TaskRef:
        payload = {
@@ -45,10 +66,19 @@ class IssueCoreRestSink(IssueSink):
            "due_in_days": task_spec.due_in_days,
            "source_type": task_spec.source_type,
            "source_id": task_spec.source_id,
-            "triggering_event_id": task_spec.triggering_event_id,
+            "triggering_event_id": (
                str(task_spec.triggering_event_id)
                if task_spec.triggering_event_id is not None
                else None
            ),
            "activity_definition_id": task_spec.activity_definition_id,
        }
-        resp = httpx.post(f"{self._base_url}/issues/", json=payload, timeout=10.0)
+        resp = httpx.post(
            f"{self._base_url}/issues/",
            json=payload,
            headers=self._auth_headers(),
            timeout=10.0,
        )
        resp.raise_for_status()
        data = resp.json()
        return TaskRef(
--- a/src/activity_core/llm_client.py
+++ b/src/activity_core/llm_client.py
@@ -17,6 +17,8 @@ import httpx
 class DisabledLLMClient:
    """LLM client used when no llm-connect endpoint is configured."""
    last_response_metadata: dict[str, Any] | None = None
    def complete(
        self,
        prompt: str,
@@ -32,6 +34,7 @@ class LLMConnectClient:
    def __init__(self, base_url: str, timeout_seconds: float = 300.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout_seconds = timeout_seconds
        self.last_response_metadata: dict[str, Any] | None = None
    def complete(
        self,
@@ -54,12 +57,48 @@ class LLMConnectClient:
        )
        resp.raise_for_status()
        data = resp.json()
        self.last_response_metadata = _extract_response_metadata(data)
        content = data.get("content")
        if not isinstance(content, str):
            raise ValueError("llm-connect response missing string content")
        return content
 _SAFE_RESPONSE_METADATA_KEYS = {
    "finish_reason",
    "usage",
    "model",
    "model_name",
    "provider",
    "request_id",
    "response_id",
    "trace_id",
    "latency_ms",
    "duration_ms",
    "elapsed_ms",
    "created",
    "created_at",
 }
 def _extract_response_metadata(data: dict[str, Any]) -> dict[str, Any]:
    """Keep non-secret llm-connect diagnostics alongside the returned content."""
    return {
        key: value for key, value in data.items()
        if key in _SAFE_RESPONSE_METADATA_KEYS and _json_safe(value)
    }
 def _json_safe(value: Any) -> bool:
    import json  # local import, kept outside the try so only serialization errors are caught
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True
 def get_llm_client() -> DisabledLLMClient | LLMConnectClient:
    base_url = os.environ.get("LLM_CONNECT_URL", "").strip()
    if not base_url:
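Review note: `_extract_response_metadata` keeps a response field only if its key is on the allowlist and its value is JSON-serializable. A minimal sketch of that filter follows, assuming a shortened allowlist and an invented response body; `SAFE_KEYS`, `extract_metadata`, and the `api_key` field are hypothetical and only illustrate the behavior.

```python
import json
from typing import Any

# Shortened allowlist for illustration; the module's own set is longer.
SAFE_KEYS = {"finish_reason", "usage", "model", "request_id", "latency_ms"}


def extract_metadata(data: dict[str, Any]) -> dict[str, Any]:
    """Keep allowlisted keys whose values survive json.dumps."""
    kept: dict[str, Any] = {}
    for key, value in data.items():
        if key not in SAFE_KEYS:
            continue  # anything not explicitly allowlisted is dropped
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # value cannot be serialized, so it is dropped as well
        kept[key] = value
    return kept


response = {
    "content": "hello",
    "model": "example-model",
    "usage": {"prompt_tokens": 12, "completion_tokens": 3},
    "api_key": "should-never-be-copied",  # hypothetical field, not on the allowlist
    "latency_ms": object(),               # allowlisted key, but not JSON-serializable
}
metadata = extract_metadata(response)
```

An allowlist fails closed: a new field added to the llm-connect response stays out of stored diagnostics until someone adds its key deliberately.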
--- a/src/activity_core/models.py
+++ b/src/activity_core/models.py
@@ -49,7 +49,18 @@ class CronTriggerConfig(BaseModel):
    )
    timezone: str = Field(default="UTC", description="IANA timezone name.")
    jitter_seconds: int = Field(default=0, ge=0)
-    misfire_policy: Literal["skip", "catchup", "compress"] = Field(default="skip")
+    # Run-miss recovery behaviour (ACTIVITY-WP-0014). What happens when a fire is
    # missed because the worker / Temporal was unavailable at trigger time:
    #   skip           - run on trigger or skip; a missed fire is never recovered
    #   catchup_all    - recover every fire missed during the outage window
    #   catchup_latest - recover only the most recent missed fire; do not accumulate
    # Legacy aliases are accepted: catchup → catchup_all, compress → catchup_latest.
    misfire_policy: Literal[
        "skip", "catchup_all", "catchup_latest", "catchup", "compress"
    ] = Field(default="skip")
    # Override the per-policy default catchup window (how far back Temporal will
    # recover missed fires after an outage). None uses the policy default.
    catchup_window_seconds: int | None = Field(default=None, ge=0)
 class EventTriggerConfig(BaseModel):
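Review note: the `misfire_policy` comment states that the legacy values `catchup` and `compress` are accepted as aliases. This hunk does not show where the aliases are resolved, so the helper below is only a sketch of one way to normalize them before scheduling; `normalize_misfire_policy` and `CanonicalPolicy` are hypothetical names.

```python
from typing import Literal

CanonicalPolicy = Literal["skip", "catchup_all", "catchup_latest"]

# Alias table taken from the comment on misfire_policy.
_LEGACY_ALIASES: dict[str, CanonicalPolicy] = {
    "catchup": "catchup_all",
    "compress": "catchup_latest",
}
_CANONICAL: tuple[CanonicalPolicy, ...] = ("skip", "catchup_all", "catchup_latest")


def normalize_misfire_policy(value: str) -> CanonicalPolicy:
    """Map a legacy alias to its canonical name and reject unknown values."""
    if value in _LEGACY_ALIASES:
        return _LEGACY_ALIASES[value]
    for policy in _CANONICAL:
        if value == policy:
            return policy
    raise ValueError(f"unknown misfire_policy: {value!r}")
```

Normalizing once at the boundary would let downstream code branch on three canonical values instead of five accepted spellings.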
--- a/src/activity_core/ops_evidence_sinks.py
+++ b/src/activity_core/ops_evidence_sinks.py
@@ -2,12 +2,15 @@
 from __future__ import annotations
 import json
 import os
 from pathlib import Path
 from typing import Any
 import httpx
 from activity_core.context_resolvers.ops_inventory import _sanitize_url
 from activity_core.state_hub_write import idempotency_headers
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _INTER_HUB_SINK_TYPES = {
@@ -15,6 +18,10 @@ _INTER_HUB_SINK_TYPES = {
    "inter-hub-event",
    "inter-hub-interaction-event",
 }
 _CORE_HUB_SINK_TYPES = {
    "core-hub",
    "core-hub-interaction-event",
 }
 def persist_ops_inventory_evidence(payload: dict[str, Any]) -> list[dict[str, Any]]:
@@ -55,6 +62,12 @@ def persist_ops_inventory_evidence(payload: dict[str, Any]) -> list[dict[str, An
                    results.append(
                        _post_state_hub_progress(payload, bind_key, probe_result, sink)
                    )
                elif sink_type in _CORE_HUB_SINK_TYPES:
                    results.append(
                        _post_core_hub_interaction_event(
                            payload, bind_key, probe_result, sink
                        )
                    )
                elif sink_type in _INTER_HUB_SINK_TYPES:
                    results.append(_inter_hub_result(sink))
                else:
@@ -121,6 +134,7 @@ def _post_state_hub_progress(
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
        headers=idempotency_headers(run_id, context_key, event_type),
        timeout=float(sink.get("timeout_seconds", 10.0)),
    )
    resp.raise_for_status()
@@ -136,12 +150,17 @@ def _post_state_hub_progress(
 def _progress_exists(base_url: str, event_type: str, idempotency_key: str) -> bool:
    # Best-effort optimisation only; the Idempotency-Key header on the write is the
    # real dedup guarantee. Do not hard-fail if State Hub is unreachable here.
    try:
        resp = httpx.get(
            f"{base_url}/progress/",
            params={"limit": 100},
            timeout=10.0,
        )
        resp.raise_for_status()
    except httpx.HTTPError:
        return False
    for item in resp.json():
        detail = item.get("detail") or {}
        if (
@@ -152,6 +171,213 @@ def _progress_exists(base_url: str, event_type: str, idempotency_key: str) -> bo
    return False
 def _post_core_hub_interaction_event(
    payload: dict[str, Any],
    context_key: str,
    probe_result: dict[str, Any],
    sink: dict[str, Any],
 ) -> dict[str, Any]:
    raw_base_url = (
        sink.get("core_hub_url")
        or sink.get("base_url")
        or os.environ.get("CORE_HUB_BASE_URL")
        or ""
    )
    base_url = str(raw_base_url).rstrip("/")
    runtime_token = _core_hub_runtime_token(sink)
    widget_id = _core_hub_widget_id(sink, probe_result)
    missing: list[str] = []
    if not base_url:
        missing.append("CORE_HUB_BASE_URL")
    if not runtime_token:
        missing.append("CORE_HUB_RUNTIME_TOKEN or CORE_HUB_RUNTIME_TOKEN_FILE")
    if not widget_id:
        missing.append("widget_id or CORE_HUB_WIDGET_ID")
    if missing:
        return {
            "type": sink.get("type"),
            "status": "skipped",
            "reason": "missing_core_hub_config",
            "missing": missing,
            "context_key": context_key,
        }
    endpoint = _selected_endpoint(probe_result, sink)
    event_type = sink.get("event_type", "ops-endpoint-verified")
    timeout = float(sink.get("timeout_seconds", 10.0))
    body = {
        "widgetId": widget_id,
        "eventType": event_type,
        "viewContext": _core_hub_view_context(payload, context_key, endpoint, sink),
        "metadata": _core_hub_metadata(payload, context_key, probe_result, endpoint),
    }
    resp = httpx.post(
        f"{base_url}/api/v2/interaction-events",
        json=body,
        headers=_core_hub_headers(runtime_token),
        timeout=timeout,
    )
    resp.raise_for_status()
    data = resp.json()
    event_id = data.get("id")
    if not event_id:
        raise RuntimeError("Core Hub interaction event response did not include an id")
    if not _core_hub_event_exists(base_url, runtime_token, str(event_id), timeout):
        raise RuntimeError("Core Hub interaction event was not visible after create")
    return {
        "type": sink.get("type"),
        "status": "posted",
        "event_type": data.get("eventType", event_type),
        "event_id": event_id,
        "widget_id": data.get("widgetId", widget_id),
        "verified": True,
        "context_key": context_key,
    }
 def _core_hub_headers(runtime_token: str) -> dict[str, str]:
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {runtime_token}",
        "Content-Type": "application/json",
        "User-Agent": "activity-core-ops-evidence/0.1",
    }
 def _core_hub_runtime_token(sink: dict[str, Any]) -> str:
    token_file = (
        sink.get("runtime_token_file")
        or sink.get("token_file")
        or os.environ.get("CORE_HUB_RUNTIME_TOKEN_FILE")
    )
    if token_file:
        return Path(str(token_file)).read_text(encoding="utf-8").strip()
    env_name = (
        sink.get("runtime_token_env")
        or os.environ.get("CORE_HUB_RUNTIME_TOKEN_ENV")
        or "CORE_HUB_RUNTIME_TOKEN"
    )
    return os.environ.get(str(env_name), "").strip()
 def _core_hub_widget_id(sink: dict[str, Any], probe_result: dict[str, Any]) -> str:
    direct = sink.get("widget_id") or os.environ.get("CORE_HUB_WIDGET_ID")
    if direct:
        return str(direct)
    endpoint = _selected_endpoint(probe_result, sink)
    widget_ref = endpoint.get("widget_ref") if endpoint else None
    if not widget_ref:
        return ""
    mapping = sink.get("widget_mapping") or sink.get("capability_mapping")
    if mapping is None:
        mapping = os.environ.get("CORE_HUB_WIDGET_MAPPING")
    parsed = _parse_widget_mapping(mapping)
    return parsed.get(str(widget_ref), "")
 def _parse_widget_mapping(raw: Any) -> dict[str, str]:
    if isinstance(raw, dict):
        return {str(key): str(value) for key, value in raw.items() if value}
    if not isinstance(raw, str) or not raw.strip():
        return {}
    value = raw.strip()
    if value.startswith("{"):
        try:
            loaded = json.loads(value)
        except json.JSONDecodeError:
            return {}
        if isinstance(loaded, dict):
            return {str(key): str(item) for key, item in loaded.items() if item}
        return {}
    if "=" not in value:
        return {}
    pairs: dict[str, str] = {}
    for part in value.split(","):
        key, _, item = part.partition("=")
        if key.strip() and item.strip():
            pairs[key.strip()] = item.strip()
    return pairs
 def _selected_endpoint(probe_result: dict[str, Any], sink: dict[str, Any]) -> dict[str, Any]:
    endpoints = [
        endpoint
        for endpoint in probe_result.get("endpoints", [])
        if isinstance(endpoint, dict)
    ]
    endpoint_id = sink.get("endpoint_id")
    if endpoint_id:
        match = next(
            (endpoint for endpoint in endpoints if endpoint.get("endpoint_id") == endpoint_id),
            None,
        )
        if match:
            return match
    return next(
        (endpoint for endpoint in endpoints if endpoint.get("widget_ref")),
        endpoints[0] if endpoints else {},
    )
 def _core_hub_view_context(
    payload: dict[str, Any],
    context_key: str,
    endpoint: dict[str, Any],
    sink: dict[str, Any],
 ) -> str:
    return str(
        sink.get("view_context")
        or endpoint.get("view_context")
        or f"activity-core/ops-inventory/{payload.get('run_id', 'unknown')}/{context_key}"
    )
 def _core_hub_metadata(
    payload: dict[str, Any],
    context_key: str,
    probe_result: dict[str, Any],
    endpoint: dict[str, Any],
 ) -> dict[str, Any]:
    compact = _compact_probe_result(probe_result)
    return {
        "activity_id": payload.get("activity_id"),
        "activity_core_run_id": payload.get("run_id"),
        "scheduled_for": payload.get("scheduled_for"),
        "source_type": "ops-inventory",
        "context_key": context_key,
        "probe": {
            "generated_at": compact.get("generated_at"),
            "inventory_path": compact.get("inventory_path"),
            "status": compact.get("status"),
            "reason": compact.get("reason"),
            "summary": compact.get("summary", {}),
        },
        "endpoint": _compact_endpoint(endpoint) if endpoint else {},
    }
 def _core_hub_event_exists(
    base_url: str,
    runtime_token: str,
    event_id: str,
    timeout: float,
 ) -> bool:
    resp = httpx.get(
        f"{base_url}/api/v2/interaction-events",
        headers=_core_hub_headers(runtime_token),
        timeout=timeout,
    )
    resp.raise_for_status()
    payload = resp.json()
    data = payload.get("data") if isinstance(payload, dict) else []
    if not isinstance(data, list):
        return False
    return any(isinstance(item, dict) and item.get("id") == event_id for item in data)
 def _inter_hub_result(sink: dict[str, Any]) -> dict[str, Any]:
    missing: list[str] = []
    if not (sink.get("inter_hub_url") or os.environ.get("INTER_HUB_URL")):
--- a/src/activity_core/report_sinks.py
+++ b/src/activity_core/report_sinks.py
@@ -11,6 +11,8 @@ from zoneinfo import ZoneInfo
 import httpx
 from activity_core.state_hub_write import idempotency_headers
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 _THE_CUSTODIAN_ROOT = Path("/home/worsch/the-custodian")
 _FORBIDDEN_CUSTODIAN_ROOTS = (
@@ -134,6 +136,7 @@ def _post_state_hub_progress(
            "output_validated": report_entry.get("output_validated"),
            "review_required": report_entry.get("review_required"),
            "validation_error": report_entry.get("validation_error"),
            "llm_response_metadata": report_entry.get("llm_response_metadata"),
            "report": report,
        },
    }
@@ -149,6 +152,7 @@ def _post_state_hub_progress(
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
        headers=idempotency_headers(run_id, instruction_id, event_type),
        timeout=float(sink.get("timeout_seconds", 10.0)),
    )
    resp.raise_for_status()
@@ -167,12 +171,18 @@ def _progress_exists(
    instruction_id: str,
    event_type: str,
 ) -> bool:
    # Best-effort read-dedup optimisation only. The Idempotency-Key header on the
    # write is the real guarantee; if State Hub is unreachable here we must not
    # hard-fail — proceed to the (keyed) write rather than raising.
    try:
        resp = httpx.get(
            f"{base_url}/progress/",
            params={"limit": 100},
            timeout=10.0,
        )
        resp.raise_for_status()
    except httpx.HTTPError:
        return False
    for item in resp.json():
        detail = item.get("detail") or {}
        if (
@@ -215,6 +225,16 @@ def _render_markdown(
        lines.extend([summary, ""])
    if validation_error:
        lines.extend(["Validation error:", "", f"`{validation_error}`", ""])
    metadata = report_entry.get("llm_response_metadata")
    if metadata:
        lines.extend([
            "LLM response metadata:",
            "",
            "```json",
            json.dumps(metadata, indent=2, sort_keys=True),
            "```",
            "",
        ])
    lines.extend([
        "```json",
        json.dumps(report, indent=2, sort_keys=True),
--- a/src/activity_core/rules/executor.py
+++ b/src/activity_core/rules/executor.py
@@ -41,6 +41,7 @@ class InstructionResult:
    review_required: bool = False
    condition_matched: str | None = None
    validation_error: str | None = None
    llm_response_metadata: dict[str, Any] | None = None
 def _resolve_path(obj: Any, path: str) -> Any:
@@ -160,15 +161,22 @@ def _execute(
    prompt_hash = hashlib.sha256(rendered.encode()).hexdigest()
    llm_config = _llm_run_config(instr)
    # Reference allow-list (WP-0016-T04): if a context resolver supplied the set
    # of known candidate ids, recommendations pointing at anything else are
    # quarantined. Absent (None) today → the check is inert until wired.
    allow_list = _allow_list_from_context(context)
    # Step 3 — call LLM
    raw_output = llm_client.complete(rendered, model=instr.model, config=llm_config)
    response_metadata = _llm_response_metadata(llm_client)
    # Step 4 — validate and optionally retry
-    task_specs, report, error = _validate_output(raw_output, instr)
+    task_specs, report, error = _validate_output(raw_output, instr, allow_list)
    if error:
        retry_prompt = rendered + f"\n\nPrevious output was invalid: {error}\nPlease fix."
        raw_output = llm_client.complete(retry_prompt, model=instr.model, config=llm_config)
-        task_specs, report, error = _validate_output(raw_output, instr)
+        response_metadata = _llm_response_metadata(llm_client)
+        task_specs, report, error = _validate_output(raw_output, instr, allow_list)
        if error:
            # Truncate to keep log volume bounded but long enough to see the
            # actual JSON shape mismatch (typical reports are <2KB).
@@ -178,7 +186,18 @@ def _execute(
                "error=%s, raw_output_preview=%r",
                instr.id, prompt_hash, error, preview,
            )
-            failure_report = _invalid_output_report(instr, error, raw_output)
+            # Posture B (WP-0016-T03): try to recover a partial-but-usable
+            # report from individually-parseable items before declaring total
+            # loss. One bad item should cost one item, not the whole report.
+            recovered = _resilient_report(
+                instr, raw_output, error, prompt_hash, allow_list,
+                response_metadata=response_metadata,
+            )
+            if recovered is not None:
+                return recovered
+            failure_report = _invalid_output_report(
+                instr, error, raw_output, response_metadata=response_metadata,
+            )
            if failure_report is not None:
                return InstructionResult(
                    tasks=[],
@@ -189,6 +208,7 @@ def _execute(
                    review_required=True,
                    condition_matched=instr.condition or None,
                    validation_error=error,
                    llm_response_metadata=response_metadata,
                )
            return _empty_result(instr, prompt_hash=prompt_hash, validation_error=error)
@@ -200,6 +220,7 @@ def _execute(
        output_validated=True,
        review_required=bool(getattr(instr, "review_required", False)),
        condition_matched=instr.condition or None,
        llm_response_metadata=response_metadata,
    )
@@ -239,6 +260,7 @@ def _invalid_output_report(
    instr: Any,
    validation_error: str,
    raw_output: Any,
    response_metadata: dict[str, Any] | None = None,
 ) -> dict[str, Any] | None:
    """Build a durable diagnostic report for invalid report-sink output.
@@ -256,7 +278,7 @@ def _invalid_output_report(
            partial_output = _parse_json_output(raw_output)
        except json.JSONDecodeError:
            partial_output = None
-            raw_preview = raw_output[:4000]
+            raw_preview = raw_output[:_RAW_OUTPUT_PREVIEW_LIMIT]
    else:
        partial_output = raw_output
@@ -268,6 +290,8 @@ def _invalid_output_report(
        "status": "validation_failed",
        "validation_error": validation_error,
    }
    if response_metadata:
        report["llm_response_metadata"] = response_metadata
    if isinstance(partial_output, dict):
        if isinstance(partial_output.get("summary"), str):
            report["partial_summary"] = partial_output["summary"]
@@ -279,6 +303,358 @@ def _invalid_output_report(
    return report
 # ---------------------------------------------------------------------------
 # Resilient report recovery (ACTIVITY-WP-0016-T03)
 #
 # Posture B — verify & mitigate at the producer→consumer boundary. When the
 # whole-document parse/validate fails, recover individually-parseable
 # recommendation objects, validate each against the item schema, keep the valid
 # ones, and quarantine the malformed/over-limit ones with provenance. One bad
 # item costs one item, not the whole report (error locality == unit of work).
 # ---------------------------------------------------------------------------
 _QUARANTINE_LIMIT = 20
 _SNIPPET_LIMIT = 200
 # Producer guardrails (ACTIVITY-WP-0016-T04): structural bounds applied to every
 # recommendation regardless of producer (LLM, agent, or human). These are
 # verify-and-mitigate limits — an offending item is quarantined, never allowed to
 # fail the whole report or flow unbounded into a downstream consumer.
 _MAX_STRING_LEN = 4000
 _MAX_DEPTH = 8
 _RAW_OUTPUT_PREVIEW_LIMIT = 12000
 _SUMMARY_RE = re.compile(r'"summary"\s*:\s*"((?:[^"\\]|\\.)*)"')
 _SAFE_RESPONSE_METADATA_KEYS = {
    "finish_reason",
    "usage",
    "model",
    "model_name",
    "provider",
    "request_id",
    "response_id",
    "trace_id",
    "latency_ms",
    "duration_ms",
    "elapsed_ms",
    "created",
    "created_at",
 }
 def _llm_response_metadata(llm_client: Any) -> dict[str, Any] | None:
    metadata = getattr(llm_client, "last_response_metadata", None)
    if not isinstance(metadata, dict) or not metadata:
        return None
    safe: dict[str, Any] = {}
    for key, value in metadata.items():
        if key not in _SAFE_RESPONSE_METADATA_KEYS:
            continue
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue
        safe[str(key)] = value
    return safe or None
 def _snippet(value: Any) -> str:
    text = value if isinstance(value, str) else json.dumps(value, default=str)
    return text[:_SNIPPET_LIMIT]
 def _json_depth(value: Any, depth: int = 1) -> int:
    if depth > _MAX_DEPTH:
        return depth
    if isinstance(value, dict):
        return max((_json_depth(v, depth + 1) for v in value.values()), default=depth)
    if isinstance(value, list):
        return max((_json_depth(v, depth + 1) for v in value), default=depth)
    return depth
 def _has_oversized_string(value: Any) -> bool:
    if isinstance(value, str):
        return len(value) > _MAX_STRING_LEN
    if isinstance(value, dict):
        return any(_has_oversized_string(v) for v in value.values())
    if isinstance(value, list):
        return any(_has_oversized_string(v) for v in value)
    return False
 def _item_structure_error(item: Any) -> str | None:
    """Producer-agnostic structural guardrail: depth and string-length caps."""
    if _json_depth(item) > _MAX_DEPTH:
        return f"exceeds max nesting depth {_MAX_DEPTH}"
    if _has_oversized_string(item):
        return f"contains a string longer than {_MAX_STRING_LEN} chars"
    return None
 def _allow_list_from_context(context: dict | None) -> set[str] | None:
    """Build the recommendation-candidate allow-list from resolved context.
    Looks for `context["known_candidates"]` (a list/set of valid candidate ids).
    Returns None when absent so the allow-list check stays inert until a context
    resolver populates it — the guardrail capability ships now; activation is a
    one-line resolver change.
    """
    if not isinstance(context, dict):
        return None
    known = context.get("known_candidates")
    if isinstance(known, (list, set, tuple)):
        return {str(item) for item in known}
    return None
 def _report_contract(instr: Any) -> tuple[dict[str, Any] | None, int | None]:
    """Extract (item_schema, max_items) for the recommendations list, if any."""
    try:
        schema = _load_output_schema(getattr(instr, "output_schema", ""))
    except (OSError, json.JSONDecodeError, TypeError):
        return None, None
    if not isinstance(schema, dict):
        return None, None
    recs = (schema.get("properties") or {}).get("recommendations")
    if not isinstance(recs, dict):
        return None, None
    item_schema = recs.get("items") if isinstance(recs.get("items"), dict) else None
    max_items = recs.get("maxItems") if isinstance(recs.get("maxItems"), int) else None
    return item_schema, max_items
 def _extract_object_spans(raw: str) -> list[tuple[str, bool]]:
    """Return (span, complete) for each recommendation object in raw output.
    Scans the `recommendations` array brace-aware and string-aware so it recovers
    objects whether they are pretty-printed across many lines or emitted one per
    line (NDJSON). A truncated trailing object is returned with complete=False.
    """
    key = raw.find('"recommendations"')
    start_region = raw.find("[", key) if key >= 0 else -1
    if start_region < 0:
        return []
    spans: list[tuple[str, bool]] = []
    i, n = start_region + 1, len(raw)
    while i < n:
        ch = raw[i]
        if ch == "]":
            break
        if ch != "{":
            i += 1
            continue
        depth, in_str, esc, j = 0, False, False, i
        closed = False
        while j < n:
            c = raw[j]
            if in_str:
                if esc:
                    esc = False
                elif c == "\\":
                    esc = True
                elif c == '"':
                    in_str = False
            elif c == '"':
                in_str = True
            elif c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:
                    spans.append((raw[i:j + 1], True))
                    closed = True
                    break
            j += 1
        if not closed:
            spans.append((raw[i:], False))  # truncated tail
            break
        i = j + 1
    return spans
 def _try_repair(span: str) -> str:
    """Best-effort close of a truncated JSON object: balance quote, braces, brackets."""
    in_str, esc, depth_c, depth_b = False, False, 0, 0
    for c in span:
        if in_str:
            if esc:
                esc = False
            elif c == "\\":
                esc = True
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
        elif c == "{":
            depth_c += 1
        elif c == "}":
            depth_c -= 1
        elif c == "[":
            depth_b += 1
        elif c == "]":
            depth_b -= 1
    repaired = span.rstrip().rstrip(",")
    if in_str:
        repaired += '"'
    return repaired + "]" * max(depth_b, 0) + "}" * max(depth_c, 0)
 def _recover_recommendations(
    raw: str,
 ) -> tuple[str | None, list[dict[str, Any]], list[dict[str, Any]]]:
    """Recover (summary, items, quarantined) from a failed report payload."""
    summary_match = _SUMMARY_RE.search(raw)
    summary = None
    if summary_match:
        try:
            summary = json.loads(f'"{summary_match.group(1)}"')
        except json.JSONDecodeError:
            summary = summary_match.group(1)
    items: list[dict[str, Any]] = []
    quarantined: list[dict[str, Any]] = []
    for index, (span, complete) in enumerate(_extract_object_spans(raw)):
        parsed: Any = None
        try:
            parsed = json.loads(span)
        except json.JSONDecodeError as exc:
            if not complete:
                try:
                    parsed = json.loads(_try_repair(span))
                except json.JSONDecodeError:
                    parsed = None
            if parsed is None:
                quarantined.append(
                    {"index": index, "error": str(exc), "raw": _snippet(span),
                     "reason": "truncated" if not complete else "unparseable"}
                )
                continue
        if isinstance(parsed, dict):
            items.append(parsed)
        else:
            quarantined.append(
                {"index": index, "error": "item is not a JSON object",
                 "raw": _snippet(span)}
            )
    return summary, items, quarantined
 def _partition_items(
    items: list[dict[str, Any]],
    item_schema: dict[str, Any] | None,
    max_items: int | None,
    *,
    run_schema: bool = True,
    allow_list: set[str] | None = None,
 ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Screen items into (valid, quarantined).
    Applied uniformly to recovered items (run_schema=True) and to already
    schema-valid happy-path items (run_schema=False). Order of checks: structural
    type → schema → producer guardrails (depth/length) → reference allow-list →
    count cap. The first failing check quarantines the item with provenance.
    """
    valid: list[dict[str, Any]] = []
    quarantined: list[dict[str, Any]] = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            quarantined.append(
                {"index": index, "error": "item is not a JSON object",
                 "raw": _snippet(item), "reason": "malformed"}
            )
            continue
        schema_error = (
            _validate_schema_node(item, item_schema, f"recommendations[{index}]")
            if (run_schema and item_schema)
            else None
        )
        if schema_error:
            quarantined.append(
                {"index": index, "error": schema_error, "raw": _snippet(item),
                 "reason": "schema"}
            )
            continue
        structure_error = _item_structure_error(item)
        if structure_error:
            quarantined.append(
                {"index": index, "error": structure_error, "raw": _snippet(item),
                 "reason": "guardrail"}
            )
            continue
        if allow_list is not None:
            candidate = item.get("candidate")
            if not isinstance(candidate, str) or candidate not in allow_list:
                quarantined.append(
                    {"index": index, "error": f"candidate {candidate!r} not in allow-list",
                     "raw": _snippet(item), "reason": "allow_list"}
                )
                continue
        valid.append(item)
    if max_items is not None and len(valid) > max_items:
        for item in valid[max_items:]:
            quarantined.append(
                {"index": None, "error": f"exceeds maxItems={max_items}",
                 "raw": _snippet(item), "reason": "over_limit"}
            )
        valid = valid[:max_items]
    return valid, quarantined
 def _resilient_report(
    instr: Any,
    raw_output: Any,
    original_error: str,
    prompt_hash: str | None,
    allow_list: set[str] | None = None,
    response_metadata: dict[str, Any] | None = None,
 ) -> InstructionResult | None:
    """Recover a partial-but-usable report from output that failed validation.
    Returns None when nothing usable can be recovered, so the caller falls back
    to the total-loss diagnostic artifact (_invalid_output_report).
    """
    if not getattr(instr, "report_sinks", None) or not isinstance(raw_output, str):
        return None
    item_schema, max_items = _report_contract(instr)
    summary, items, quarantined = _recover_recommendations(raw_output)
    if not items:
        return None
    valid, item_quarantine = _partition_items(
        items, item_schema, max_items, allow_list=allow_list,
    )
    quarantined.extend(item_quarantine)
    if not valid:
        return None
    report: dict[str, Any] = {
        "summary": summary
        or f"Partial daily triage: recovered {len(valid)} recommendation(s) "
        "after the full report failed validation.",
        "recommendations": valid,
        "status": "partial",
        "partial": True,
        "quarantined_count": len(quarantined),
        "quarantined_items": quarantined[:_QUARANTINE_LIMIT],
        "recovery_note": f"original validation error: {original_error}",
    }
    if response_metadata:
        report["llm_response_metadata"] = response_metadata
    logger.warning(
        "instruction_output_recovered: instruction=%r, kept=%d, quarantined=%d",
        getattr(instr, "id", None), len(valid), len(quarantined),
    )
    return InstructionResult(
        tasks=[],
        report=report,
        prompt_hash=prompt_hash,
        model=getattr(instr, "model", None),
        output_validated=True,
        review_required=True,
        condition_matched=getattr(instr, "condition", "") or None,
        validation_error=None,
        llm_response_metadata=response_metadata,
    )
 def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
    """Build a durable diagnostic report when a report instruction cannot run."""
    if not getattr(instr, "report_sinks", None):
@@ -295,6 +671,7 @@ def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
 def _validate_output(
    raw_output: Any,
    instr: Any,
    allow_list: set[str] | None = None,
 ) -> tuple[list[TaskSpec], dict[str, Any] | None, str | None]:
    """Parse raw LLM output into TaskSpecs and optional report payload.
@@ -349,6 +726,28 @@ def _validate_output(
                source_type="instruction",
                source_id=instr.id,
            ))
        # Happy-path producer guardrails (WP-0016-T04): the whole document already
        # passed schema validation, so recommendations are schema-valid; still apply
        # the count cap, structural caps, and reference allow-list, quarantining any
        # offenders rather than emitting them. Report shape only changes when an item
        # is actually quarantined.
        if isinstance(report, dict) and isinstance(report.get("recommendations"), list):
            item_schema, max_items = _report_contract(instr)
            kept, quarantined = _partition_items(
                report["recommendations"], item_schema, max_items,
                run_schema=False, allow_list=allow_list,
            )
            if quarantined:
                report = {
                    **report,
                    "recommendations": kept,
                    "status": "partial",
                    "partial": True,
                    "quarantined_count": len(quarantined),
                    "quarantined_items": quarantined[:_QUARANTINE_LIMIT],
                }
        return specs, report, None
    except (json.JSONDecodeError, AttributeError, KeyError, TypeError) as exc:
        return [], None, str(exc)
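
A minimal standalone sketch of the string-aware brace scan that the resilient recovery path relies on, written so it runs without `activity_core`. The helper name `complete_objects` and the sample payload are hypothetical; the sketch only keeps objects that close cleanly and, unlike `_extract_object_spans`, does not try to repair the truncated tail.

```python
from __future__ import annotations

import json
from typing import Any


def complete_objects(raw: str) -> list[dict[str, Any]]:
    """Return every outermost {...} object that closes cleanly inside raw.

    Hypothetical helper: braces inside JSON strings are ignored, and an
    object cut off by truncation is simply dropped.
    """
    found: list[dict[str, Any]] = []
    depth, in_str, esc, start = 0, False, False, 0
    for pos, ch in enumerate(raw):
        if in_str:
            if esc:
                esc = False          # character after a backslash is literal
            elif ch == "\\":
                esc = True
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch == "{":
            if depth == 0:
                start = pos          # remember where this object begins
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                found.append(json.loads(raw[start:pos + 1]))
    return found


# Second object is truncated mid-key; the first contains a brace in a string.
raw = '[{"candidate": "a", "note": "brace } in string"}, {"candidate": "b", "no'
recovered = complete_objects(raw)
```

The point of tracking `in_str` and `esc` is that a naive brace counter would close the first object early at the `}` inside the string value; the scan above treats that brace as data.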
--- a/src/activity_core/schedule_health.py
+++ b/src/activity_core/schedule_health.py
@@ -0,0 +1,194 @@
 """Missed-fire detection for cron schedules (ACTIVITY-WP-0014, T03).
 Even with a catchup window configured, an operator wants to *know* when a fire
 was missed — especially under ``misfire_policy: skip`` where missed fires are
 dropped by design and leave no run and no failure event. This module turns the
 schedule's own bookkeeping into an explicit verdict and an optional State Hub
 alert so a miss is never invisible again.
 Temporal already counts fires that were dropped because they fell outside the
 catchup window in ``ScheduleInfo.num_actions_missed_catchup_window``. We surface
 that, plus a staleness check on the most recent fire, as a ``ScheduleHealth``
 verdict. The verdict logic is a pure function so it is testable without a live
 Temporal server; ``check_schedule_health`` is the thin async reader.
 """
 from __future__ import annotations
 import os
 from dataclasses import dataclass, field
 from datetime import datetime, timedelta, timezone
 from typing import Any
 from uuid import UUID
 import httpx
 from activity_core.schedule_manager import schedule_id
 from activity_core.state_hub_write import idempotency_headers
 _DEFAULT_STATE_HUB_URL = "http://127.0.0.1:8000"
 @dataclass(frozen=True)
 class ScheduleHealth:
    """Verdict for a single schedule's recent firing behaviour."""
    activity_id: str
    healthy: bool
    missed_catchup_window: int
    last_fired_at: datetime | None
    staleness: timedelta | None
    reasons: list[str] = field(default_factory=list)
    @property
    def missed(self) -> bool:
        return not self.healthy
 def evaluate_schedule_health(
    *,
    activity_id: str,
    missed_catchup_window: int,
    last_fired_at: datetime | None,
    now: datetime,
    expected_interval: timedelta | None = None,
    tolerance: timedelta = timedelta(minutes=10),
 ) -> ScheduleHealth:
    """Pure verdict: was a fire missed?
    A schedule is unhealthy if Temporal dropped any fire past the catchup window,
    or — when ``expected_interval`` is known — if the most recent fire is older
    than one interval plus ``tolerance`` (i.e. a fire should have happened and
    did not).
    """
    reasons: list[str] = []
    if missed_catchup_window > 0:
        reasons.append(
            f"{missed_catchup_window} fire(s) dropped outside the catchup window"
        )
    staleness: timedelta | None = None
    if last_fired_at is not None:
        staleness = now - last_fired_at
        if expected_interval is not None and staleness > expected_interval + tolerance:
            reasons.append(
                f"last fire was {staleness} ago, exceeding the expected "
                f"{expected_interval} interval"
            )
    elif expected_interval is not None:
        reasons.append("no recorded fire for a schedule that should have fired")
    return ScheduleHealth(
        activity_id=activity_id,
        healthy=not reasons,
        missed_catchup_window=missed_catchup_window,
        last_fired_at=last_fired_at,
        staleness=staleness,
        reasons=reasons,
    )
 def _extract_info(desc: Any) -> tuple[int, datetime | None]:
    """Pull (missed_catchup_window, last_fired_at) from a ScheduleDescription.
    Accesses are defensive so a Temporal SDK field rename degrades to "unknown"
    rather than raising inside an operational health check.
    """
    info = getattr(desc, "info", None)
    missed = int(getattr(info, "num_actions_missed_catchup_window", 0) or 0)
    last_fired: datetime | None = None
    recent = getattr(info, "recent_actions", None) or []
    times = [
        getattr(a, "scheduled_at", None) or getattr(a, "started_at", None)
        for a in recent
    ]
    times = [t for t in times if t is not None]
    if times:
        last_fired = max(times)
    return missed, last_fired
 async def check_schedule_health(
    client: Any,
    activity_id: str | UUID,
    *,
    now: datetime | None = None,
    expected_interval: timedelta | None = None,
    tolerance: timedelta = timedelta(minutes=10),
 ) -> ScheduleHealth:
    """Describe the schedule for ``activity_id`` and evaluate its health."""
    now = now or datetime.now(tz=timezone.utc)
    handle = client.get_schedule_handle(schedule_id(activity_id))
    desc = await handle.describe()
    missed, last_fired = _extract_info(desc)
    return evaluate_schedule_health(
        activity_id=str(activity_id),
        missed_catchup_window=missed,
        last_fired_at=last_fired,
        now=now,
        expected_interval=expected_interval,
        tolerance=tolerance,
    )
 def post_missed_fire_alert(
    health: ScheduleHealth,
    *,
    state_hub_url: str | None = None,
    author: str = "activity-core",
    topic_id: str | None = None,
    workstream_id: str | None = None,
    timeout_seconds: float = 10.0,
 ) -> dict[str, Any]:
    """Post a ``schedule_miss`` progress event to State Hub for an unhealthy schedule.
    No-op (returns ``status: ok``) when the schedule is healthy, so callers can
    invoke unconditionally.
    """
    if health.healthy:
        return {"type": "schedule-miss-alert", "status": "ok"}
    base_url = state_hub_url or os.environ.get("STATE_HUB_URL", _DEFAULT_STATE_HUB_URL)
    base_url = str(base_url).rstrip("/")
    body: dict[str, Any] = {
        "event_type": "schedule_miss",
        "author": author,
        "summary": (
            f"Schedule {health.activity_id} missed a fire: "
            + "; ".join(health.reasons)
        ),
        "detail": {
            "activity_id": health.activity_id,
            "missed_catchup_window": health.missed_catchup_window,
            "last_fired_at": (
                health.last_fired_at.isoformat() if health.last_fired_at else None
            ),
            "staleness_seconds": (
                health.staleness.total_seconds() if health.staleness else None
            ),
            "reasons": health.reasons,
        },
    }
    if topic_id:
        body["topic_id"] = topic_id
    if workstream_id:
        body["workstream_id"] = workstream_id
    # Dedup repeated alerts for the same missed window (same schedule + last fire).
    last_fired = health.last_fired_at.isoformat() if health.last_fired_at else "none"
    resp = httpx.post(
        f"{base_url}/progress/",
        json=body,
        headers=idempotency_headers("schedule_miss", health.activity_id, last_fired),
        timeout=timeout_seconds,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "type": "schedule-miss-alert",
        "status": "posted",
        "progress_id": data.get("id"),
    }
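
A minimal standalone sketch of the staleness rule described in `evaluate_schedule_health`, re-implemented so it runs without `activity_core` or a Temporal server. The helper name `is_stale` and the sample timestamps are hypothetical; the comparison itself (staleness greater than `expected_interval + tolerance`) follows the function above.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def is_stale(
    last_fired_at: datetime | None,
    now: datetime,
    expected_interval: timedelta,
    tolerance: timedelta = timedelta(minutes=10),
) -> bool:
    """Hypothetical helper mirroring the staleness branch of the verdict."""
    if last_fired_at is None:
        # No recorded fire for a schedule that should have fired.
        return True
    return (now - last_fired_at) > expected_interval + tolerance


now = datetime(2026, 7, 2, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)

# 65 minutes since the last fire: still inside interval + tolerance (70 min).
on_time = is_stale(now - timedelta(minutes=65), now, hourly)
# 75 minutes since the last fire: past interval + tolerance.
late = is_stale(now - timedelta(minutes=75), now, hourly)
# Never fired at all.
never = is_stale(None, now, hourly)
```

Passing `now` explicitly, as the real function does, is what keeps the verdict testable with fixed timestamps.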
--- a/src/activity_core/schedule_manager.py
+++ b/src/activity_core/schedule_manager.py
@@ -17,7 +17,6 @@ from temporalio.client import (
    Schedule,
    ScheduleActionStartWorkflow,
    ScheduleAlreadyRunningError,
-    ScheduleBackfill,
    ScheduleCalendarSpec,
    ScheduleHandle,
    ScheduleOverlapPolicy,
@@ -38,13 +37,49 @@ _ORCHESTRATOR_TASK_QUEUE = "orchestrator-tq"
 # RunActivityWorkflow detects this value and derives run dedup key from workflow_id.
 SCHEDULED_TRIGGER_KEY = "scheduled"
-# T24: misfire_policy → ScheduleOverlapPolicy
-_MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
-    "skip": ScheduleOverlapPolicy.SKIP,
-    "catchup": ScheduleOverlapPolicy.BUFFER_ALL,
-    "compress": ScheduleOverlapPolicy.BUFFER_ONE,
+# ACTIVITY-WP-0014: misfire_policy → run-miss recovery behaviour.
+#
+# A "missed fire" happens when the worker / Temporal is unavailable at trigger
+# time. Two Temporal levers together define the behaviour:
+#   - catchup_window: how far back the server will recover missed fires once it
 #     is healthy again. The previous code never set this, so a brief outage at
 #     trigger time silently dropped the fire with no recovery and no signal.
 #   - overlap: what to do when a (recovered) fire would start while a prior run
 #     is still executing.
 #
 # Legacy values (catchup, compress) are aliased onto the explicit names.
 _MISFIRE_ALIASES: dict[str, str] = {
    "catchup": "catchup_all",
    "compress": "catchup_latest",
 }
 # overlap policy + default catchup window (seconds) per normalised policy.
 _SKIP_WINDOW_SECONDS = 60
 _CATCHUP_ALL_WINDOW_SECONDS = 365 * 24 * 3600
 _CATCHUP_LATEST_WINDOW_SECONDS = 24 * 3600
 _MISFIRE_TO_OVERLAP: dict[str, ScheduleOverlapPolicy] = {
    # Run on trigger or skip — recover nothing past a tiny grace window.
    "skip": ScheduleOverlapPolicy.SKIP,
    # Run on trigger or recover every missed fire during the outage window.
    "catchup_all": ScheduleOverlapPolicy.BUFFER_ALL,
    # Run on trigger or recover the most recent missed fire only; BUFFER_ONE
    # buffers at most one start and drops the rest, so a backlog never accumulates.
    "catchup_latest": ScheduleOverlapPolicy.BUFFER_ONE,
 }
 _MISFIRE_DEFAULT_WINDOW: dict[str, int] = {
    "skip": _SKIP_WINDOW_SECONDS,
    "catchup_all": _CATCHUP_ALL_WINDOW_SECONDS,
    "catchup_latest": _CATCHUP_LATEST_WINDOW_SECONDS,
 }
 def _normalize_misfire_policy(misfire_policy: str) -> str:
    """Map legacy aliases onto the explicit run-miss policy names."""
    canonical = _MISFIRE_ALIASES.get(misfire_policy, misfire_policy)
    return canonical if canonical in _MISFIRE_TO_OVERLAP else "skip"
 def schedule_id(activity_id: str | UUID) -> str:
    """Return the canonical Temporal Schedule ID for an ActivityDefinition."""
@@ -57,7 +92,15 @@ def smoke_schedule_id(activity_id: str | UUID) -> str:
 def _overlap_policy(misfire_policy: str) -> ScheduleOverlapPolicy:
-    return _MISFIRE_TO_OVERLAP.get(misfire_policy, ScheduleOverlapPolicy.SKIP)
+    return _MISFIRE_TO_OVERLAP[_normalize_misfire_policy(misfire_policy)]
 def _catchup_window(cfg: CronTriggerConfig) -> timedelta:
    """Resolve the catchup window: explicit override, else the policy default."""
    if cfg.catchup_window_seconds is not None:
        return timedelta(seconds=cfg.catchup_window_seconds)
    policy = _normalize_misfire_policy(cfg.misfire_policy)
    return timedelta(seconds=_MISFIRE_DEFAULT_WINDOW[policy])
 def _build_schedule(defn: ActivityDefinition) -> Schedule:
@@ -80,7 +123,10 @@ def _build_schedule(defn: ActivityDefinition) -> Schedule:
        jitter=timedelta(seconds=cfg.jitter_seconds) if cfg.jitter_seconds else None,
    )
-    policy = SchedulePolicy(overlap=_overlap_policy(cfg.misfire_policy))
+    policy = SchedulePolicy(
+        overlap=_overlap_policy(cfg.misfire_policy),
+        catchup_window=_catchup_window(cfg),
+    )
    state = ScheduleState(paused=not defn.enabled)
    return Schedule(action=action, spec=spec, policy=policy, state=state)
@@ -282,18 +328,10 @@ async def upsert_schedule(client: Client, defn: ActivityDefinition) -> ScheduleH
        else:
            await handle.pause(note="disabled via upsert_schedule")
-    # T24 catchup: backfill any fires missed in the last hour.
-    if isinstance(defn.trigger_config, CronTriggerConfig):
-        if defn.trigger_config.misfire_policy == "catchup":
-            now = datetime.now(tz=timezone.utc)
-            backfill_start = now - timedelta(hours=1)
-            await handle.backfill(
-                ScheduleBackfill(
-                    start_at=backfill_start,
-                    end_at=now,
-                    overlap=ScheduleOverlapPolicy.BUFFER_ALL,
-                )
-            )
+    # ACTIVITY-WP-0014: missed-fire recovery is now handled natively by the
+    # schedule's catchup_window (see _build_schedule), which the server applies
+    # continuously after any outage — not only at upsert time. The previous
+    # ad-hoc 1-hour backfill is therefore no longer needed.
    return handle
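Reviewer note: the window resolution in `_catchup_window` can be sketched without Temporal, roughly as below. This is a minimal sketch under stated assumptions: the alias table and the window values are hypothetical placeholders (the real `_MISFIRE_ALIASES` and `_*_WINDOW_SECONDS` constants are defined outside this hunk), `CronCfg` is a stand-in for `CronTriggerConfig`, and the sketch validates policy names against the window table rather than `_MISFIRE_TO_OVERLAP`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta

# Hypothetical values for illustration only; the real constants are not
# visible in this hunk.
ALIASES = {"catchup": "catchup_all"}
DEFAULT_WINDOW = {"skip": 60, "catchup_all": 86_400, "catchup_latest": 86_400}


@dataclass
class CronCfg:
    """Stand-in for CronTriggerConfig with only the two fields used here."""

    misfire_policy: str = "skip"
    catchup_window_seconds: int | None = None


def normalize(policy: str) -> str:
    """Map an alias to its canonical name; unknown names fall back to 'skip'."""
    canonical = ALIASES.get(policy, policy)
    return canonical if canonical in DEFAULT_WINDOW else "skip"


def catchup_window(cfg: CronCfg) -> timedelta:
    """An explicit override wins (even 0); otherwise use the policy default."""
    if cfg.catchup_window_seconds is not None:
        return timedelta(seconds=cfg.catchup_window_seconds)
    return timedelta(seconds=DEFAULT_WINDOW[normalize(cfg.misfire_policy)])
```

The `is not None` check matters: an explicit `catchup_window_seconds=0` is honoured as an override instead of silently falling back to the policy default.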
--- a/src/activity_core/state_hub_write.py
+++ b/src/activity_core/state_hub_write.py
@@ -0,0 +1,34 @@
 """Idempotency-keyed State Hub writes (ACTIVITY-WP-0014 T05).
 Under the State Hub *beachhead* model, a write may be buffered locally while
 central State Hub is unreachable and **flushed later, possibly with retries**.
 To keep that flush safe — no duplicate progress / triage events — every write
 carries a stable ``Idempotency-Key`` header derived deterministically from the
 write's identity. The guarantee lives on the write itself and does **not** depend
 on a live dedup read, so it holds even when the beachhead is serving offline.
 activity-core does not implement the queue/cache (that is state-hub's beachhead);
 it only emits the key so the beachhead / State Hub can dedup on flush. The header
 passes untouched through the existing ``actcore-state-hub-bridge`` proxy and is
 ignored by State Hub versions that do not yet honour it.
 """
 from __future__ import annotations
 IDEMPOTENCY_HEADER = "Idempotency-Key"
 def idempotency_key(*parts: str | None) -> str:
    """Build a stable, header-safe idempotency key from identity parts.
    Empty/None parts are kept as empty segments so the key shape is stable across
    calls. Whitespace, control, and non-ASCII characters are each replaced with
    ``_`` to keep the value a valid single-line HTTP header.
    """
    raw = ":".join((p or "") for p in parts)
    return "".join(ch if 0x20 < ord(ch) < 0x7F else "_" for ch in raw) or "_"
 def idempotency_headers(*parts: str | None) -> dict[str, str]:
    """Return the header dict to attach to a State Hub write."""
    return {IDEMPOTENCY_HEADER: idempotency_key(*parts)}
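Reviewer note: a short usage sketch of the helpers above, showing why a retried flush can be deduplicated. The two functions are copied from this file; the identity parts (`"run-42"`, `"daily triage"`) are hypothetical and only illustrate the key shape.

```python
from __future__ import annotations

IDEMPOTENCY_HEADER = "Idempotency-Key"


def idempotency_key(*parts: str | None) -> str:
    # Join identity parts; None/empty parts stay as empty segments.
    raw = ":".join((p or "") for p in parts)
    # Keep printable ASCII except space; everything else becomes "_".
    return "".join(ch if 0x20 < ord(ch) < 0x7F else "_" for ch in raw) or "_"


def idempotency_headers(*parts: str | None) -> dict[str, str]:
    return {IDEMPOTENCY_HEADER: idempotency_key(*parts)}


# Hypothetical identity parts: a run id, a missing middle part, a label.
first_attempt = idempotency_headers("run-42", None, "daily triage")
retry_attempt = idempotency_headers("run-42", None, "daily triage")

# Same identity -> same header, so a receiver that honours the header can
# drop the retry without a live dedup read.
print(first_attempt)
print(first_attempt == retry_attempt)
```

Because the key is a pure function of the identity parts, no state has to be shared between the first attempt and the retry.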
--- a/src/activity_core/sync_schedules.py
+++ b/src/activity_core/sync_schedules.py
@@ -15,6 +15,8 @@ import asyncio
 import logging
 import os
 import uuid
 from dataclasses import dataclass
 from typing import Sequence
 from sqlalchemy import select
 from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
@@ -30,6 +32,20 @@ TEMPORAL_HOST = os.environ.get("TEMPORAL_HOST", "localhost:7233")
 TEMPORAL_NAMESPACE = os.environ.get("TEMPORAL_NAMESPACE", "default")
@dataclass
 class ScheduleSyncResult:
    upserted: int = 0
    paused: int = 0
    deleted_orphans: int = 0
    def to_dict(self) -> dict[str, int]:
        return {
            "upserted": self.upserted,
            "paused": self.paused,
            "deleted_orphans": self.deleted_orphans,
        }
 def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
    """Convert an ORM row to a domain ActivityDefinition for schedule_manager."""
    return ActivityDefinition.model_validate(
@@ -46,12 +62,82 @@ def _row_to_domain(row: ActivityDefinitionRow) -> ActivityDefinition:
    )
-async def sync(client: Client, db_url: str) -> None:
+def _valid_schedule_activity_id(defn: ActivityDefinition) -> str:
    if isinstance(defn.trigger_config, ScheduledTriggerConfig):
        return f"{defn.id}-once"
    return str(defn.id)
 async def _load_schedule_rows(
    session_factory: async_sessionmaker[AsyncSession],
 ) -> Sequence[ActivityDefinitionRow]:
    async with session_factory() as session:
        return (
            await session.scalars(
                select(ActivityDefinitionRow).where(
                    ActivityDefinitionRow.trigger_type.in_(["cron", "scheduled"])
                )
            )
        ).all()
 async def sync_schedule_rows(
    client: Client,
    rows: Sequence[ActivityDefinitionRow],
 ) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules against already-loaded definition rows."""
    valid_schedule_activity_ids: set[str] = set()
    result = ScheduleSyncResult()
    for row in rows:
        defn = _row_to_domain(row)
        if not isinstance(
            defn.trigger_config,
            (CronTriggerConfig, ScheduledTriggerConfig),
        ):
            continue
        valid_schedule_activity_ids.add(_valid_schedule_activity_id(defn))
        await upsert_schedule(client, defn)
        if defn.enabled:
            result.upserted += 1
            logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
        else:
            result.paused += 1
            logger.info("upserted paused schedule for disabled activity %s", defn.id)
    # Tombstone cleanup: remove Temporal Schedules with no matching DB row.
    existing_schedules = await list_schedules(client)
    for entry in existing_schedules:
        if entry["activity_id"] not in valid_schedule_activity_ids:
            await delete_schedule(client, entry["activity_id"])
            result.deleted_orphans += 1
            logger.info("deleted orphaned schedule %s", entry["schedule_id"])
    logger.info(
        "sync_schedules complete — upserted=%d paused=%d deleted_orphans=%d",
        result.upserted,
        result.paused,
        result.deleted_orphans,
    )
    return result
 async def sync_with_session_factory(
    client: Client,
    session_factory: async_sessionmaker[AsyncSession],
 ) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules using an existing DB session factory."""
    return await sync_schedule_rows(client, await _load_schedule_rows(session_factory))
 async def sync(client: Client, db_url: str) -> ScheduleSyncResult:
    """Reconcile Temporal Schedules against the ActivityDefinition table.
    Steps:
-      1. Load all enabled cron ActivityDefinitions from Postgres.
+      1. Load all cron/scheduled ActivityDefinitions from Postgres.
-      2. Upsert a Temporal Schedule for each one.
+      2. Upsert a Temporal Schedule for each one, paused when disabled.
      3. Delete Temporal Schedules whose activity_id has no matching DB row
         (tombstone cleanup for deleted or trigger-type-changed definitions).
    """
@@ -59,55 +145,10 @@ async def sync(client: Client, db_url: str) -> None:
    session_factory = async_sessionmaker(engine, expire_on_commit=False)
    try:
-        async with session_factory() as session:
-            rows = (
-                await session.scalars(
-                    select(ActivityDefinitionRow).where(
-                        ActivityDefinitionRow.trigger_type.in_(["cron", "scheduled"])
-                    )
-                )
-            ).all()
+        return await sync_with_session_factory(client, session_factory)
    finally:
        await engine.dispose()
-    db_activity_ids: set[str] = set()
-    upserted = 0
-    skipped = 0
-    for row in rows:
-        defn = _row_to_domain(row)
-        if not isinstance(defn.trigger_config, (CronTriggerConfig, ScheduledTriggerConfig)):
-            continue
-        db_activity_ids.add(str(defn.id))
-        if defn.enabled:
-            await upsert_schedule(client, defn)
-            upserted += 1
-            logger.info("upserted schedule for activity %s (%s)", defn.id, defn.name)
-        else:
-            # Disabled definitions: schedule may exist (paused) — leave it;
-            # upsert_schedule already handles the paused state.
-            await upsert_schedule(client, defn)
-            skipped += 1
-            logger.info("upserted paused schedule for disabled activity %s", defn.id)
-    # Tombstone cleanup: remove Temporal Schedules with no matching DB row.
-    existing_schedules = await list_schedules(client)
-    deleted = 0
-    for entry in existing_schedules:
-        if entry["activity_id"] not in db_activity_ids:
-            await delete_schedule(client, entry["activity_id"])
-            deleted += 1
-            logger.info("deleted orphaned schedule %s", entry["schedule_id"])
-    logger.info(
-        "sync_schedules complete — upserted=%d skipped_disabled=%d deleted_orphans=%d",
-        upserted,
-        skipped,
-        deleted,
-    )
 async def main() -> None:
    logging.basicConfig(level=logging.INFO)
@@ -116,7 +157,13 @@ async def main() -> None:
        raise RuntimeError("ACTCORE_DB_URL is required")
    client = await Client.connect(TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE)
-    await sync(client, db_url)
+    result = await sync(client, db_url)
    print(
        "Synced schedules: "
        f"upserted={result.upserted} "
        f"paused={result.paused} "
        f"deleted_orphans={result.deleted_orphans}"
    )
 if __name__ == "__main__":
--- a/src/activity_core/sync_service.py
+++ b/src/activity_core/sync_service.py
@@ -0,0 +1,97 @@
 """Shared ActivityDefinition/event type/schedule sync orchestration."""
 from __future__ import annotations
 from typing import Any
 from temporalio.client import Client
 from activity_core.event_type_registry import sync_event_types
 from activity_core.sync_activity_definitions import sync as sync_activity_definitions
 from activity_core.sync_schedules import ScheduleSyncResult, sync_with_session_factory
 _MAX_ERRORS = 20
 _MAX_ERROR_MESSAGE_LENGTH = 1000
 def _empty_result(
    *,
    definitions: bool,
    schedules: bool,
    event_types: bool,
 ) -> dict[str, Any]:
    return {
        "ok": True,
        "ran": {
            "definitions": definitions,
            "schedules": schedules,
            "event_types": event_types,
        },
        "definitions": {"synced": 0},
        "event_types": {"synced": 0},
        "schedules": ScheduleSyncResult().to_dict(),
        "errors": [],
    }
 def _record_error(result: dict[str, Any], stage: str, exc: Exception) -> None:
    errors = result["errors"]
    if len(errors) >= _MAX_ERRORS:
        return
    errors.append(
        {
            "stage": stage,
            "type": type(exc).__name__,
            "message": str(exc)[:_MAX_ERROR_MESSAGE_LENGTH],
        }
    )
    result["ok"] = False
 async def run_sync(
    *,
    session_factory: Any,
    temporal_client: Client | None,
    definitions: bool = True,
    schedules: bool = True,
    event_types: bool = False,
 ) -> dict[str, Any]:
    """Run the requested sync stages and return bounded operator-facing status.
    The orchestration deliberately accepts its database and Temporal
    dependencies as arguments so startup and the API can share the same behavior
    without creating another global runtime.
    """
    result = _empty_result(
        definitions=definitions,
        schedules=schedules,
        event_types=event_types,
    )
    if definitions:
        try:
            result["definitions"]["synced"] = await sync_activity_definitions(
                session_factory
            )
        except Exception as exc:  # pragma: no cover - exercised through tests
            _record_error(result, "definitions", exc)
    if event_types:
        try:
            result["event_types"]["synced"] = await sync_event_types(session_factory)
        except Exception as exc:  # pragma: no cover - exercised through tests
            _record_error(result, "event_types", exc)
    if schedules:
        try:
            if temporal_client is None:
                raise RuntimeError("Temporal client is required for schedule sync")
            schedule_result = await sync_with_session_factory(
                temporal_client,
                session_factory,
            )
            result["schedules"] = schedule_result.to_dict()
        except Exception as exc:  # pragma: no cover - exercised through tests
            _record_error(result, "schedules", exc)
    return result
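Reviewer note: the stage isolation and bounded error reporting in `run_sync` can be sketched as below. This is an illustrative sketch, not the module itself: `run_stages`, `ok_stage`, and `failing_stage` are hypothetical names, and the real function wires concrete sync stages instead of a dict of callables.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

MAX_ERRORS = 20
MAX_MESSAGE_LENGTH = 1000


def record_error(result: dict[str, Any], stage: str, exc: Exception) -> None:
    """Append a truncated error entry unless the error list is already full."""
    errors = result["errors"]
    if len(errors) >= MAX_ERRORS:
        return
    errors.append(
        {
            "stage": stage,
            "type": type(exc).__name__,
            "message": str(exc)[:MAX_MESSAGE_LENGTH],
        }
    )
    result["ok"] = False


async def run_stages(
    stages: dict[str, Callable[[], Awaitable[int]]],
) -> dict[str, Any]:
    """Run every stage; a failure is recorded and later stages still run."""
    result: dict[str, Any] = {"ok": True, "synced": {}, "errors": []}
    for name, stage in stages.items():
        try:
            result["synced"][name] = await stage()
        except Exception as exc:
            record_error(result, name, exc)
    return result


async def ok_stage() -> int:
    return 3


async def failing_stage() -> int:
    raise RuntimeError("x" * 5000)  # oversized message, truncated on record


outcome = asyncio.run(
    run_stages({"definitions": failing_stage, "schedules": ok_stage})
)
print(outcome["ok"], outcome["synced"], len(outcome["errors"][0]["message"]))
```

The caller gets one bounded, serializable status dict for all stages, which is what lets worker startup and the admin API share the same reporting path.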
--- a/src/activity_core/worker.py
+++ b/src/activity_core/worker.py
@@ -46,8 +46,7 @@ from activity_core.activities import (
 )
 from activity_core.db import make_engine
 from sqlalchemy.ext.asyncio import async_sessionmaker
-from activity_core.sync_activity_definitions import sync as sync_activity_defs
+from activity_core.sync_service import run_sync
 from activity_core.sync_schedules import sync as sync_schedules
 from activity_core.workflows import RunActivityWorkflow, TaskExecutorWorkflow
 logger = logging.getLogger(__name__)
@@ -77,20 +76,26 @@ async def run() -> None:
        TEMPORAL_HOST, namespace=TEMPORAL_NAMESPACE, runtime=runtime
    )
-    # T45: Sync ActivityDefinition files into DB before schedule sync.
-    logger.info("Syncing ActivityDefinition files...")
+    logger.info("Syncing ActivityDefinitions and Temporal Schedules...")
+    sync_engine = make_engine(db_url)
+    session_factory = async_sessionmaker(sync_engine, expire_on_commit=False)
    try:
-        session_factory = async_sessionmaker(make_engine(db_url), expire_on_commit=False)
-        await sync_activity_defs(session_factory)
-    except Exception:
-        logger.exception("activity definition sync failed — continuing worker startup")
-
-    # T23: Sync Temporal Schedules with the DB before workers start accepting tasks.
-    logger.info("Syncing Temporal Schedules with ActivityDefinition DB...")
-    try:
-        await sync_schedules(client, db_url)
-    except Exception:
-        logger.exception("schedule sync failed — continuing worker startup")
+        sync_result = await run_sync(
+            session_factory=session_factory,
+            temporal_client=client,
+            definitions=True,
+            schedules=True,
+            event_types=False,
+        )
+        for error in sync_result["errors"]:
+            logger.error(
+                "startup sync %s failed — %s: %s",
+                error["stage"],
+                error["type"],
+                error["message"],
+            )
+    finally:
+        await sync_engine.dispose()
    orchestrator_worker = Worker(
        client,
--- a/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
+++ b/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
@@ -0,0 +1,5 @@
 {
  "_note": "PARTIAL 4000-char preview of the 2026-06-26 daily-triage validation failure (retry attempt). Full payload not recoverable from activity-core: complete() drops finish_reason; report sink caps raw at 4000 chars; the JSON break is at char 5268 (beyond this preview). Full response would require llm-connect producer-side logs on railiance01.",
  "validation_error": "Expecting ',' delimiter: line 136 column 22 (char 5268)",
  "raw_output_preview": "{\n  \"summary\": \"Triage report focusing on high-priority workstreams with pending human intervention or critical dependencies, and addressing recently cleared dependencies to unblock progress.\",\n  \"recommendations\": [\n    {\n      \"rank\": 1,\n      \"candidate\": \"2731fece-6c49-45b8-ab8a-4ea6c04ac603\",\n      \"action\": \"work-next\",\n      \"why\": \"A critical dependency (T03 - Configure bounded OpenBao token roles and policies) for this workstream has been cleared, unblocking significant progress on credential management. This workstream has 8 todo tasks and no waits, indicating it's ready for immediate action.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 5.0,\n        \"strategic_value\": 5,\n        \"time_criticality\": 5,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 5,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 2,\n      \"candidate\": \"bd086c41-287d-4a4e-8ac5-9ab270f14d72\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (T04 - Provision the runtime API key outside Git) and is currently blocked by 3 'wait' tasks. Human intervention is required to unblock progress.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 3,\n      \"candidate\": \"9b56414a-c71f-4e72-9b2b-d2166aaf50d0\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (Task: Execute Live Ops-Hub Bootstrap) and is currently blocked by a 'wait' task. 
Human intervention is required to proceed with the bootstrap.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 4,\n      \"candidate\": \"84e17675-0d15-4268-a8bd-540124d37018\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has 4 'needs_human' tasks, including 'T02 \u2014 Resolve Forgejo production design decisions', indicating significant human input is required to move forward with the migration.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.0,\n        \"strategic_value\": 4,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 5,\n      \"candidate\": \"5646e13a-13af-4724-bca6-3c0d86f96733\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has a 'needs_human' task ('Three-Run Calibration Feedback') and is currently in a 'wait' state. Human feedback is crucial for operational hardening.\",\n      \"confidence\": \"medium\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 6,\n      \"candidate\": \"896ace77-21b3-450b-8fb7-254aefc8c570\",\n      \"action\": \"close-out\",\n      \"why\": \"The task 'Wire activity-core to the live service' has been resolved, and the workstream shows 2 progress tasks with 0 todo/wait tasks. 
This indicates the deployment is likely complete or nearing completion and ready for close-out after verification.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 7,\n      \"candidate\": \"656e435d-3a00-4f5e-a38e-114467f9062e\",\n      \"action\": \"work-next\",\n      \"why\": \"This high-priority workstream has a single 'wait' task ('Task: Activate Ops-Hub Widgets In Inter-Hub') and no 'needs_human' tasks. It appears ready for the next step to activate the widgets.\",\n      \"confidence\": \"medium\",\n      \"wsjf"
 }
--- a/tests/rules/test_actions.py
+++ b/tests/rules/test_actions.py
@@ -88,6 +88,43 @@ def test_for_each_binds_each_list_item_before_condition_and_action_rendering() -
    ]
 def test_for_each_can_gate_registry_hygiene_gaps_on_signal() -> None:
    rules = [
        {
            "id": "flag-registry-hygiene-gap",
            "for_each": "context.gaps",
            "bind_as": "g",
            "condition": 'context.g.hygiene_signal != ""',
            "action": {
                "task_template": "Close registry hygiene gap for {context.g.repo}",
                "target_repo": "context.g.repo",
                "priority": "medium",
                "labels": ["registry-hygiene", "{context.g.hygiene_signal}"],
            },
        }
    ]
    context = {
        "gaps": [
            {
                "repo": "reuse-surface",
                "hygiene_signal": "empty_capability_scaffold",
            },
            {
                "repo": "activity-core",
                "hygiene_signal": "",
            },
        ]
    }
    specs = expand_rule_actions(rules, _Event(), context)
    assert [spec["target_repo"] for spec in specs] == ["reuse-surface"]
    assert specs[0]["labels"] == [
        "registry-hygiene",
        "empty_capability_scaffold",
    ]
 def test_for_each_rejects_non_path_expression() -> None:
    rules = [
        {
--- a/tests/rules/test_executor.py
+++ b/tests/rules/test_executor.py
@@ -12,6 +12,7 @@ Covers:
 from __future__ import annotations
 import json
 from pathlib import Path
 from types import SimpleNamespace
 from typing import Any
@@ -333,7 +334,14 @@ def test_execute_instruction_forwards_output_schema_to_llm_connect(tmp_path, mon
 def test_execute_instruction_with_audit_accepts_report_payload():
    report_data = {
        "summary": "State Hub has loose ends.",
-        "recommendations": [{"action": "revisit", "candidate": "CUST-WP-0045"}],
+        "recommendations": [
            {
                "rank": 1,
                "action": "revisit",
                "candidate": "CUST-WP-0045",
                "why": "Loose ends need attention.",
            }
        ],
    }
    llm = _CountingLLM([json.dumps(report_data)])
    instr = _instr(
@@ -353,7 +361,14 @@ def test_execute_instruction_with_audit_accepts_report_payload():
 def test_execute_instruction_with_audit_accepts_fenced_report_payload():
    report_data = {
        "summary": "State Hub has loose ends.",
-        "recommendations": [{"action": "revisit", "candidate": "CUST-WP-0045"}],
+        "recommendations": [
            {
                "rank": 1,
                "action": "revisit",
                "candidate": "CUST-WP-0045",
                "why": "Loose ends need attention.",
            }
        ],
    }
    llm = _CountingLLM([f"```json\n{json.dumps(report_data)}\n```"])
    instr = _instr(
@@ -389,6 +404,216 @@ def test_execute_instruction_with_audit_rejects_invalid_report_schema():
    assert llm.call_count == 2
 # ── WP-0016-T03 resilient report recovery ─────────────────────────────────────
 def _valid_rec(rank: int) -> dict[str, Any]:
    return {
        "rank": rank,
        "candidate": f"WS-{rank}",
        "action": "work-next",
        "why": f"reason {rank}",
        "wsjf": {"score": 5.0},
    }
 def _pretty_triage_with_truncated_tail(num_valid: int) -> str:
    body = ",\n".join("    " + json.dumps(_valid_rec(i)) for i in range(1, num_valid + 1))
    # Trailing object is cut off mid-string — the whole document is invalid JSON,
    # reproducing the 2026-06-26 failure shape (valid prefix, broken tail).
    return (
        '{\n  "summary": "Daily triage.",\n  "recommendations": [\n'
        + body
        + ',\n    {\n      "rank": '
        + str(num_valid + 1)
        + ',\n      "candidate": "WS-X",\n      "action": "work-'
    )
 def test_resilient_report_recovers_valid_prefix_and_quarantines_truncated_tail():
    raw = _pretty_triage_with_truncated_tail(7)
    llm = _CountingLLM([raw, raw])
    instr = _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        output_schema="schemas/daily-triage-report.json",
        report_sinks=[{"type": "working-memory"}],
    )
    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
    assert result.output_validated is True
    assert result.review_required is True
    assert result.report is not None
    assert result.report["partial"] is True
    assert len(result.report["recommendations"]) == 7
    assert result.report["summary"] == "Daily triage."
    assert result.report["quarantined_count"] >= 1
    # The broken tail is dropped — either as an unparseable/truncated span or,
    # if _try_repair salvages its structure, as a schema-invalid item. Either way
    # it carries a diagnostic error and never pollutes the surviving report.
    assert result.report["quarantined_items"][0]["error"]
 def test_resilient_report_quarantines_one_bad_item_among_valid():
    recs = [_valid_rec(1), {"candidate": "WS-2", "action": "x", "why": "no rank"}, _valid_rec(3)]
    raw = json.dumps({"summary": "Triage.", "recommendations": recs})
    llm = _CountingLLM([raw, raw])
    instr = _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        output_schema="schemas/daily-triage-report.json",
        report_sinks=[{"type": "working-memory"}],
    )
    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
    assert result.output_validated is True
    assert result.report["partial"] is True
    assert len(result.report["recommendations"]) == 2
    assert result.report["quarantined_count"] == 1
    assert "rank" in result.report["quarantined_items"][0]["error"]
 # ── WP-0016-T04 producer guardrails ───────────────────────────────────────────
 def _triage_instr() -> SimpleNamespace:
    return _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        output_schema="schemas/daily-triage-report.json",
        report_sinks=[{"type": "working-memory"}],
    )
 def test_guardrail_count_cap_on_valid_happy_path():
    # 9 fully-valid recommendations in a syntactically valid document: schema
    # validation passes, but the maxItems=7 count cap must keep 7 and quarantine 2.
    recs = [_valid_rec(i) for i in range(1, 10)]
    raw = json.dumps({"summary": "Triage.", "recommendations": recs})
    llm = _CountingLLM([raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert llm.call_count == 1  # no retry — the document was valid
    assert result.report["partial"] is True
    assert len(result.report["recommendations"]) == 7
    assert result.report["quarantined_count"] == 2
    assert all(q["reason"] == "over_limit" for q in result.report["quarantined_items"])
 def test_guardrail_oversized_string_quarantined():
    big = _valid_rec(2)
    big["why"] = "x" * 5000  # exceeds _MAX_STRING_LEN
    raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), big]})
    llm = _CountingLLM([raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert len(result.report["recommendations"]) == 1
    assert result.report["quarantined_count"] == 1
    assert result.report["quarantined_items"][0]["reason"] == "guardrail"
 def test_guardrail_allow_list_rejects_unknown_candidate():
    raw = json.dumps({
        "summary": "Triage.",
        "recommendations": [_valid_rec(1), _valid_rec(2)],  # candidates WS-1, WS-2
    })
    llm = _CountingLLM([raw])
    context = {"known_candidates": ["WS-1"]}
    result = execute_instruction_with_audit(_triage_instr(), _Event(), context, llm)
    assert len(result.report["recommendations"]) == 1
    assert result.report["recommendations"][0]["candidate"] == "WS-1"
    assert result.report["quarantined_items"][0]["reason"] == "allow_list"
 def _nested(depth: int) -> dict[str, Any]:
    node: dict[str, Any] = {"leaf": 1}
    for _ in range(depth):
        node = {"a": node}
    return node
 def test_guardrail_over_depth_quarantined():
    deep = _valid_rec(2)
    deep["extra"] = _nested(12)  # well past _MAX_DEPTH
    raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), deep]})
    llm = _CountingLLM([raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert len(result.report["recommendations"]) == 1
    assert result.report["quarantined_count"] == 1
    assert result.report["quarantined_items"][0]["reason"] == "guardrail"
    assert "depth" in result.report["quarantined_items"][0]["error"]
 def test_resilient_recovery_against_real_2026_06_26_fixture():
    # The actual captured failure payload (4000-char preview, truncated at the 7th
    # recommendation) — the run that reset the WP-0006-T03 streak. Before WP-0016
    # this discarded the whole report; now it must recover the valid prefix.
    fixture = json.loads(
        Path("tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json")
        .read_text(encoding="utf-8")
    )
    raw = fixture["raw_output_preview"]
    llm = _CountingLLM([raw, raw])
    result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
    assert result.output_validated is True
    assert result.report["partial"] is True
    # Six recommendations are fully intact before the truncation point.
    assert len(result.report["recommendations"]) >= 6
    assert all("rank" in rec and "candidate" in rec for rec in result.report["recommendations"])
 class _MetadataBadLLM:
    def __init__(self) -> None:
        self.call_count = 0
        self.last_response_metadata: dict[str, Any] | None = None
    def complete(
        self,
        prompt: str,
        model: str = "",
        config: dict | None = None,
    ) -> str:
        self.call_count += 1
        self.last_response_metadata = {
            "finish_reason": "length",
            "usage": {"input_tokens": 1100, "output_tokens": 1200},
        }
        return ("x" * 9000) + "{"
 def test_invalid_report_preserves_response_metadata_and_long_preview():
    llm = _MetadataBadLLM()
    instr = _instr(
        id="daily-triage-report",
        prompt="Report.",
        trusted_fields=[],
        report_sinks=[{"type": "working-memory", "path": "/tmp"}],
    )
    result = execute_instruction_with_audit(instr, _Event(), {}, llm)
    assert llm.call_count == 2
    assert result.output_validated is False
    assert result.llm_response_metadata == {
        "finish_reason": "length",
        "usage": {"input_tokens": 1100, "output_tokens": 1200},
    }
    assert result.report["llm_response_metadata"] == result.llm_response_metadata
    assert len(result.report["raw_output_preview"]) > 4000
 def test_execute_instruction_with_audit_preserves_invalid_report_with_sinks(
    tmp_path,
    monkeypatch,
--- a/tests/test_admin_sync_api.py
+++ b/tests/test_admin_sync_api.py
@@ -0,0 +1,114 @@
 from __future__ import annotations
 from typing import Any
 import pytest
 from activity_core import api
@pytest.mark.asyncio
 async def test_admin_sync_definitions_only_does_not_require_temporal(
    monkeypatch,
 ) -> None:
    seen: dict[str, Any] = {}
    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
        seen.update(kwargs)
        return {"ok": True, "ran": {"definitions": True}}
    monkeypatch.setattr(api, "_session_factory", object())
    monkeypatch.setattr(api, "_temporal_client", None)
    monkeypatch.setattr(api, "run_sync", fake_run_sync)
    result = await api.admin_sync(
        definitions=True,
        schedules=False,
        event_types=False,
    )
    assert result == {"ok": True, "ran": {"definitions": True}}
    assert seen["session_factory"] is api._session_factory
    assert seen["temporal_client"] is None
    assert seen["definitions"] is True
    assert seen["schedules"] is False
    assert seen["event_types"] is False
@pytest.mark.asyncio
 async def test_admin_sync_schedules_only_passes_temporal(monkeypatch) -> None:
    temporal = object()
    seen: dict[str, Any] = {}
    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
        seen.update(kwargs)
        return {
            "ok": True,
            "schedules": {
                "upserted": 1,
                "paused": 0,
                "deleted_orphans": 0,
            },
        }
    monkeypatch.setattr(api, "_session_factory", object())
    monkeypatch.setattr(api, "_temporal_client", temporal)
    monkeypatch.setattr(api, "run_sync", fake_run_sync)
    result = await api.admin_sync(
        definitions=False,
        schedules=True,
        event_types=False,
    )
    assert result["schedules"]["upserted"] == 1
    assert seen["temporal_client"] is temporal
    assert seen["definitions"] is False
    assert seen["schedules"] is True
    assert seen["event_types"] is False
@pytest.mark.asyncio
 async def test_admin_sync_all_sync_returns_failure_result(monkeypatch) -> None:
    async def fake_run_sync(**kwargs: Any) -> dict[str, Any]:
        return {
            "ok": False,
            "ran": {
                "definitions": kwargs["definitions"],
                "schedules": kwargs["schedules"],
                "event_types": kwargs["event_types"],
            },
            "errors": [
                {
                    "stage": "event_types",
                    "type": "RuntimeError",
                    "message": "bad event type",
                }
            ],
        }
    monkeypatch.setattr(api, "_session_factory", object())
    monkeypatch.setattr(api, "_temporal_client", object())
    monkeypatch.setattr(api, "run_sync", fake_run_sync)
    result = await api.admin_sync(
        definitions=True,
        schedules=True,
        event_types=True,
    )
    assert result == {
        "ok": False,
        "ran": {
            "definitions": True,
            "schedules": True,
            "event_types": True,
        },
        "errors": [
            {
                "stage": "event_types",
                "type": "RuntimeError",
                "message": "bad event type",
            }
        ],
    }
--- a/tests/test_automation_status.py
+++ b/tests/test_automation_status.py
@@ -0,0 +1,289 @@
 from __future__ import annotations
 import asyncio
 import json
 from datetime import datetime
 from pathlib import Path
 from zoneinfo import ZoneInfo
 from activity_core import automation_status as status
 ACTIVITY_ID = "00000000-0000-0000-0000-000000000123"
 def _window():
    return status.resolve_window(
        "2026-06-26",
        "2026-06-29",
        "Europe/Berlin",
    )
 def _definition(enabled: bool = True):
    return {
        "id": ACTIVITY_ID,
        "name": "Daily Check",
        "enabled": enabled,
        "trigger_type": "cron",
        "trigger_config": {
            "trigger_type": "cron",
            "cron_expression": "0 9 * * *",
            "timezone": "Europe/Berlin",
            "misfire_policy": "skip",
        },
        "source": "test",
    }
 def test_friday_shortcut_resolves_to_previous_friday_start() -> None:
    now = datetime(2026, 6, 29, 12, 0, tzinfo=ZoneInfo("Europe/Berlin"))
    window = status.resolve_window("friday", None, "Europe/Berlin", now=now)
    assert window["since"].isoformat() == "2026-06-26T00:00:00+02:00"
    assert window["until"].isoformat() == "2026-06-29T12:00:00+02:00"
 def test_expected_fires_for_simple_cron_window() -> None:
    fires = status.expected_fires(_definition(), _window())
    assert fires == [
        "2026-06-26T09:00:00+02:00",
        "2026-06-27T09:00:00+02:00",
        "2026-06-28T09:00:00+02:00",
        "2026-06-29T09:00:00+02:00",
    ]
 def test_completed_when_expected_run_exists() -> None:
    run = {
        "run_id": "run-1",
        "activity_id": ACTIVITY_ID,
        "scheduled_for": "2026-06-26T07:00:00+00:00",
        "fired_at": "2026-06-26T07:00:10+00:00",
        "tasks_spawned": 1,
    }
    report = status.classify_activity(
        _definition(),
        _window(),
        [run],
        [{"source": "state_hub_progress", "run_id": "run-1", "output_validated": True}],
        None,
        ["2026-06-26T09:00:00+02:00"],
        runs_available=True,
    )
    assert report["status"] == "completed"
 def test_validation_failure_wins_over_completed_run() -> None:
    run = {"run_id": "run-1", "activity_id": ACTIVITY_ID, "scheduled_for": None, "fired_at": "2026-06-26T07:00:10+00:00"}
    report = status.classify_activity(
        _definition(),
        _window(),
        [run],
        [{"source": "working_memory", "run_id": "run-1", "output_validated": False}],
        None,
        ["2026-06-26T09:00:00+02:00"],
        runs_available=True,
    )
    assert report["status"] == "validation_failed"
 def test_missed_when_expected_fire_has_no_run_and_runs_available() -> None:
    report = status.classify_activity(
        _definition(),
        _window(),
        [],
        [],
        None,
        ["2026-06-26T09:00:00+02:00"],
        runs_available=True,
    )
    assert report["status"] == "missed"
 def test_disabled_schedule_is_not_counted_as_missed() -> None:
    report = status.classify_activity(
        _definition(enabled=False),
        _window(),
        [],
        [],
        None,
        ["2026-06-26T09:00:00+02:00"],
        runs_available=True,
    )
    assert report["status"] == "disabled"
 def test_scheduled_definition_reports_one_shot_schedule_id() -> None:
    definition = {
        "id": ACTIVITY_ID,
        "name": "One Shot",
        "enabled": True,
        "trigger_type": "scheduled",
        "trigger_config": {
            "trigger_type": "scheduled",
            "at": "2026-06-26T09:00:00+02:00",
            "timezone": "Europe/Berlin",
        },
        "source": "test",
    }
    report = status.classify_activity(
        definition,
        _window(),
        [],
        [],
        None,
        ["2026-06-26T09:00:00+02:00"],
        runs_available=False,
    )
    assert status.automation_schedule_id(_definition()) == f"activity-schedule-{ACTIVITY_ID}"
    assert report["schedule_id"] == f"activity-schedule-{ACTIVITY_ID}-once"
 def test_partial_source_availability_is_unknown_not_missed() -> None:
    report = status.classify_activity(
        _definition(),
        _window(),
        [],
        [],
        None,
        ["2026-06-26T09:00:00+02:00"],
        runs_available=False,
    )
    assert report["status"] == "unknown"
    assert "missed-run verdict is unknown" in report["warnings"][0]
 def test_working_memory_frontmatter_evidence(tmp_path: Path) -> None:
    note = tmp_path / "daily-triage-2026-06-26-run.md"
    note.write_text(
        "---\n"
        "source: activity-core\n"
        f"activity_id: {ACTIVITY_ID}\n"
        "activity_core_run_id: run-1\n"
        "scheduled_for: 2026-06-26T07:00:00+00:00\n"
        "output_validated: false\n"
        "created: 2026-06-26T07:01:00+00:00\n"
        "---\n"
        "body\n",
        encoding="utf-8",
    )
    evidence, source = status.load_working_memory_evidence(str(tmp_path), _window())
    assert source["status"] == "ok"
    assert evidence[0]["run_id"] == "run-1"
    assert evidence[0]["output_validated"] is False
 def _scheduled_definition(enabled: bool = False):
    return {
        "id": "00000000-0000-0000-0000-000000000456",
        "name": "One Shot",
        "enabled": enabled,
        "trigger_type": "scheduled",
        "trigger_config": {
            "trigger_type": "scheduled",
            "at": "2026-06-26T09:00:00+02:00",
            "timezone": "Europe/Berlin",
        },
        "source": "db",
    }
 def test_inventory_report_uses_db_definition_rows(monkeypatch) -> None:
    async def fake_load_definitions(args, warnings):
        return [dict(_definition(), source="db"), _scheduled_definition()], {"status": "ok", "source": "db"}
    async def fake_temporal(host, namespace, definitions, *, timeout_seconds):
        return {
            ACTIVITY_ID: {
                "schedule_id": f"activity-schedule-{ACTIVITY_ID}",
                "available": True,
                "paused": False,
                "missed_catchup_window": 0,
                "last_fired_at": None,
            },
        }, {"status": "ok", "count": 1}
    monkeypatch.setattr(status, "load_definitions", fake_load_definitions)
    monkeypatch.setattr(status, "load_temporal_visibility", fake_temporal)
    args = status.parse_inventory_args(["--format", "json"])
    report, exit_code = asyncio.run(status.build_inventory_report(args))
    assert exit_code == 0
    assert report["sources"]["definitions"] == {"status": "ok", "source": "db"}
    assert report["summary"]["automation_count"] == 2
    assert report["automations"][0]["definition_source"] == "db"
    assert report["automations"][0]["temporal"]["status"] == "active"
    assert report["automations"][1]["schedule_id"].endswith("-once")
 def test_inventory_file_fallback_when_db_url_missing(monkeypatch) -> None:
    monkeypatch.setattr(status, "file_definitions", lambda: [dict(_definition(), source="files")])
    args = status.parse_inventory_args(["--db-url", "", "--temporal-host", ""])
    report, exit_code = asyncio.run(status.build_inventory_report(args))
    assert exit_code == 0
    assert report["sources"]["definitions"]["status"] == "degraded"
    assert report["automations"][0]["definition_source"] == "files"
    assert "ACTCORE_DB_URL is not set" in report["warnings"][0]
 def test_inventory_filters_disabled_definitions() -> None:
    definitions = [_definition(enabled=True), _scheduled_definition(enabled=False)]
    filtered = status.filter_inventory_definitions(
        definitions,
        ids=[],
        names=[],
        enabled=False,
        trigger_types=set(),
    )
    assert [item["name"] for item in filtered] == ["One Shot"]
 def test_inventory_temporal_unavailable_is_warning_not_failure(monkeypatch) -> None:
    async def fake_load_definitions(args, warnings):
        return [_definition()], {"status": "ok", "source": "db"}
    async def fake_temporal(host, namespace, definitions, *, timeout_seconds):
        return {}, {"status": "unavailable", "warning": "Temporal unavailable: nope"}
    monkeypatch.setattr(status, "load_definitions", fake_load_definitions)
    monkeypatch.setattr(status, "load_temporal_visibility", fake_temporal)
    args = status.parse_inventory_args([])
    report, exit_code = asyncio.run(status.build_inventory_report(args))
    assert exit_code == 0
    assert report["automations"][0]["temporal"]["status"] == "not_checked"
    assert report["warnings"] == ["Temporal unavailable: nope"]
 def test_inventory_cli_emits_json(monkeypatch, capsys) -> None:
    monkeypatch.setattr(status, "file_definitions", lambda: [dict(_definition(), source="files")])
    exit_code = asyncio.run(status.async_inventory_main([
        "--db-url", "",
        "--temporal-host", "",
        "--format", "json",
    ]))
    payload = json.loads(capsys.readouterr().out)
    assert exit_code == 0
    assert payload["mode"] == "automation-inventory"
    assert payload["automations"][0]["name"] == "Daily Check"
--- a/tests/test_instruction_evaluation.py
+++ b/tests/test_instruction_evaluation.py
@@ -1,6 +1,7 @@
 from __future__ import annotations
 import json
 from pathlib import Path
 import pytest
@@ -70,7 +71,14 @@ async def test_evaluate_instructions_returns_task_specs_with_audit(monkeypatch)
 async def test_evaluate_instructions_returns_report_payload(monkeypatch) -> None:
    llm = FakeLLMClient(json.dumps({
        "summary": "State Hub has open loose ends.",
-        "recommendations": [{"candidate": "CUST-WP-0045", "action": "work-next"}],
+        "recommendations": [
+            {
+                "rank": 1,
+                "candidate": "CUST-WP-0045",
+                "action": "work-next",
+                "why": "Open loose ends.",
+            }
+        ],
    }))
    monkeypatch.setattr(activities, "get_llm_client", lambda: llm)
@@ -209,6 +217,12 @@ async def test_evaluate_instructions_forwards_llm_connect_depth_config(monkeypat
        "context": {},
    })
    # Read the live schema file rather than hard-coding it, so the forwarded
    # json_schema assertion tracks schemas/daily-triage-report.json as the
    # contract evolves (ACTIVITY-WP-0016-T02).
    expected_schema = json.loads(
        Path("schemas/daily-triage-report.json").read_text(encoding="utf-8")
    )
    assert llm.calls[0][2] == {
        "model_name": "custodian-triage-balanced",
        "temperature": 0.2,
@@ -216,16 +230,6 @@ async def test_evaluate_instructions_forwards_llm_connect_depth_config(monkeypat
        "max_depth": 2,
        "model_params": {
            "reasoning_effort": "medium",
-            "json_schema": {
-                "type": "object",
-                "required": ["summary", "recommendations"],
-                "properties": {
-                    "summary": {"type": "string"},
-                    "recommendations": {
-                        "type": "array",
-                        "items": {"type": "object"},
-                    },
-                },
-            },
+            "json_schema": expected_schema,
        },
    }
--- a/tests/test_issue_sink.py
+++ b/tests/test_issue_sink.py
@@ -34,7 +34,7 @@ def test_issue_core_rest_sink_posts_task_contract(monkeypatch) -> None:
    monkeypatch.setattr(httpx, "post", fake_post)
-    ref = IssueCoreRestSink("http://issue-core.test/").emit(TaskSpec(
+    ref = IssueCoreRestSink("http://issue-core.test/", api_key="test-key").emit(TaskSpec(
        title="Run SBOM rescan for activity-core",
        description="SBOM is older than 30 days.",
        target_repo="activity-core",
@@ -67,12 +67,30 @@ def test_issue_core_rest_sink_posts_task_contract(monkeypatch) -> None:
                "triggering_event_id": "scheduled",
                "activity_definition_id": "activity-1",
            },
            "headers": {"Authorization": "Bearer test-key"},
            "timeout": 10.0,
        }
    ]
    assert "review_required" not in posts[0]["json"]
 def test_issue_core_rest_sink_requires_api_key() -> None:
    sink = IssueCoreRestSink("http://issue-core.test/", api_key="")
    with pytest.raises(RuntimeError, match="ISSUE_CORE_API_KEY"):
        sink.emit(TaskSpec(
            title="t",
            description="",
            target_repo="activity-core",
            priority="low",
            labels=[],
            due_in_days=None,
            source_type="rule",
            source_id="r",
            triggering_event_id="e",
            activity_definition_id="a",
        ))
@pytest.mark.asyncio
 async def test_emit_tasks_raises_when_sink_fails(monkeypatch) -> None:
    class FailingSink:
--- a/tests/test_llm_client.py
+++ b/tests/test_llm_client.py
@@ -13,7 +13,12 @@ def test_llm_connect_client_forwards_run_config(monkeypatch) -> None:
            pass
        def json(self) -> dict:
-            return {"content": '{"summary":"ok","recommendations":[]}'}
+            return {
+                "content": '{"summary":"ok","recommendations":[]}',
+                "finish_reason": "stop",
+                "usage": {"input_tokens": 10, "output_tokens": 20},
+                "raw_response": {"provider_blob": "not persisted"},
+            }
    def fake_post(url: str, json: dict, timeout: float) -> Response:
        captured["url"] = url
@@ -50,3 +55,7 @@ def test_llm_connect_client_forwards_run_config(monkeypatch) -> None:
            "timeout_seconds": 42,
        },
    }
    assert client.last_response_metadata == {
        "finish_reason": "stop",
        "usage": {"input_tokens": 10, "output_tokens": 20},
    }
--- a/tests/test_ops_evidence_sinks.py
+++ b/tests/test_ops_evidence_sinks.py
@@ -166,6 +166,93 @@ def test_state_hub_progress_sink_is_idempotent(monkeypatch) -> None:
    assert result[0]["idempotency_key"] == idempotency_key
 def test_core_hub_interaction_event_sink_posts_and_verifies_compact_event(monkeypatch) -> None:
    posts: list[dict[str, Any]] = []
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        assert url == "http://core-hub.test/api/v2/interaction-events"
        assert kwargs["headers"]["Authorization"] == "Bearer runtime-secret"
        posts.append({"url": url, **kwargs})
        return DummyResponse(
            {
                "id": "event-1",
                "eventType": "ops-endpoint-verified",
                "widgetId": "widget-1",
            }
        )
    def fake_get(url: str, **kwargs: Any) -> DummyResponse:
        assert url == "http://core-hub.test/api/v2/interaction-events"
        assert kwargs["headers"]["Authorization"] == "Bearer runtime-secret"
        return DummyResponse({"data": [{"id": "event-1"}]})
    monkeypatch.setenv("CORE_HUB_RUNTIME_TOKEN", "runtime-secret")
    monkeypatch.setattr(httpx, "post", fake_post)
    monkeypatch.setattr(httpx, "get", fake_get)
    result = persist_ops_inventory_evidence(
        _payload([
            {
                "type": "core-hub-interaction-event",
                "core_hub_url": "http://core-hub.test",
                "widget_id": "widget-1",
                "event_type": "ops-endpoint-verified",
            }
        ])
    )
    assert result == [
        {
            "type": "core-hub-interaction-event",
            "status": "posted",
            "event_type": "ops-endpoint-verified",
            "event_id": "event-1",
            "widget_id": "widget-1",
            "verified": True,
            "context_key": "ops_probe",
        }
    ]
    body = posts[0]["json"]
    assert body["widgetId"] == "widget-1"
    assert body["eventType"] == "ops-endpoint-verified"
    assert body["metadata"]["activity_core_run_id"] == _run_id()
    assert body["metadata"]["endpoint"]["url"] == "http://state-hub.test/health"
    assert body["metadata"]["endpoint"]["widget_ref"] == "ops:endpoint:state-hub-health"
    serialized = json.dumps(body, sort_keys=True)
    assert "runtime-secret" not in serialized
    assert "secret response body" not in serialized
    assert "Authorization" not in serialized
    assert "user:pass" not in serialized
    assert "token=secret" not in serialized
 def test_core_hub_sink_skips_cleanly_when_config_missing(monkeypatch) -> None:
    monkeypatch.delenv("CORE_HUB_BASE_URL", raising=False)
    monkeypatch.delenv("CORE_HUB_RUNTIME_TOKEN", raising=False)
    monkeypatch.delenv("CORE_HUB_RUNTIME_TOKEN_FILE", raising=False)
    monkeypatch.delenv("CORE_HUB_WIDGET_ID", raising=False)
    monkeypatch.delenv("CORE_HUB_WIDGET_MAPPING", raising=False)
    result = persist_ops_inventory_evidence(
        _payload([{"type": "core-hub-interaction-event"}])
    )
    assert result == [
        {
            "type": "core-hub-interaction-event",
            "status": "skipped",
            "reason": "missing_core_hub_config",
            "missing": [
                "CORE_HUB_BASE_URL",
                "CORE_HUB_RUNTIME_TOKEN or CORE_HUB_RUNTIME_TOKEN_FILE",
                "widget_id or CORE_HUB_WIDGET_ID",
            ],
            "context_key": "ops_probe",
        }
    ]
 def test_inter_hub_sink_skips_cleanly_when_config_missing(monkeypatch) -> None:
    monkeypatch.delenv("INTER_HUB_URL", raising=False)
    monkeypatch.delenv("OPS_HUB_KEY", raising=False)
--- a/tests/test_railiance_ops_inventory_wiring.py
+++ b/tests/test_railiance_ops_inventory_wiring.py
@@ -93,12 +93,21 @@ def test_external_configmap_projects_enabled_daily_wsjf_definition(tmp_path) ->
    assert definition.trigger_config["cron_expression"] == "20 7 * * *"
    assert definition.trigger_config["timezone"] == "Europe/Berlin"
    assert instruction["id"] == "daily-triage-report"
    assert instruction["max_tokens"] == 1800
    assert "most 7 recommendations" in instruction["prompt"]
    assert "fewer well-formed" in instruction["prompt"]
    assert instruction["output_schema"] == (
        "/etc/activity-core/schemas/daily-triage-report.json"
    )
    assert instruction["report_sinks"][0]["type"] == "working-memory"
    assert instruction["report_sinks"][1]["event_type"] == "daily_triage"
    schema = _by_kind_name("ConfigMap", "actcore-report-schemas")
    daily_schema = yaml.safe_load(schema["data"]["daily-triage-report.json"])
    recommendations = daily_schema["properties"]["recommendations"]
    assert recommendations["maxItems"] == 7
    assert recommendations["items"]["properties"]["rank"]["maximum"] == 7
 def test_ops_inventory_configmap_contains_probeable_inventory() -> None:
    config = _by_kind_name("ConfigMap", "actcore-ops-service-inventory")
--- a/tests/test_report_sinks.py
+++ b/tests/test_report_sinks.py
@@ -37,6 +37,10 @@ def _payload(sinks: list[dict[str, Any]]) -> dict[str, Any]:
                "output_validated": True,
                "review_required": False,
                "validation_error": None,
                "llm_response_metadata": {
                    "finish_reason": "stop",
                    "usage": {"output_tokens": 50},
                },
            }
        ],
    }
@@ -62,6 +66,8 @@ def test_working_memory_sink_writes_idempotently(tmp_path) -> None:
    assert "output_validated: true" in text
    assert "review_required: false" in text
    assert "model: test-model" in text
    assert "LLM response metadata:" in text
    assert '"finish_reason": "stop"' in text
    assert "State Hub has loose ends." in text
@@ -113,6 +119,10 @@ def test_state_hub_progress_sink_posts(monkeypatch) -> None:
    assert posts[0]["json"]["detail"]["activity_core_run_id"] == payload_run_id()
    assert posts[0]["json"]["detail"]["output_validated"] is True
    assert posts[0]["json"]["detail"]["review_required"] is False
    assert posts[0]["json"]["detail"]["llm_response_metadata"] == {
        "finish_reason": "stop",
        "usage": {"output_tokens": 50},
    }
 def test_state_hub_progress_includes_prior_working_memory_path(
--- a/tests/test_reuse_surface_context_resolver.py
+++ b/tests/test_reuse_surface_context_resolver.py
@@ -0,0 +1,167 @@
 from __future__ import annotations
 import json
 from pathlib import Path
 from typing import Any
 import pytest
 from temporalio.exceptions import ApplicationError
 from activity_core.activities import resolve_context
 from activity_core.context_resolvers import reuse_surface
 from activity_core.context_resolvers.base import CONTEXT_RESOLVER_REGISTRY
 class _Response:
    def __init__(self, payload: Any) -> None:
        self._payload = payload
    def raise_for_status(self) -> None:
        return None
    def json(self) -> Any:
        return self._payload
 class _Completed:
    returncode = 0
    stderr = ""
    def __init__(self, payload: dict[str, Any]) -> None:
        self.stdout = json.dumps(payload)
 def _write_rollout(path: Path) -> None:
    path.write_text(
        """
 domains:
  reuse:
    phase: active
    repos:
      - reuse-surface
      - activity-core
  parked:
    phase: backlog
    repos:
      - ignored-repo
 """.lstrip(),
        encoding="utf-8",
    )
 def _write_cli_only_signals(path: Path) -> None:
    path.write_text(
        """
 signals:
  empty_capability_scaffold:
    enabled: true
  registry_gap:
    enabled: false
  stale_scope:
    enabled: false
  stale_sbom:
    enabled: false
  publish_check_fail:
    enabled: false
 """.lstrip(),
        encoding="utf-8",
    )
 def test_shell_resolver_emits_reuse_surface_gaps_and_advances_cursor(
    tmp_path,
    monkeypatch,
 ) -> None:
    rollout = tmp_path / "rollout.yaml"
    _write_rollout(rollout)
    _write_cli_only_signals(tmp_path / "signals.yml")
    reuse_root = tmp_path / "reuse-surface"
    reuse_root.mkdir()
    (reuse_root / "SCOPE.md").write_text("fresh\n", encoding="utf-8")
    activity_root = tmp_path / "activity-core"
    activity_root.mkdir()
    monkeypatch.setenv("KAIZEN_RUNNER_HOST", "runner")
    def fake_get(url: str, **kwargs: Any) -> _Response:
        assert url.endswith("/repos/")
        return _Response(
            [
                {
                    "slug": "reuse-surface",
                    "host_paths": {"runner": str(reuse_root)},
                },
                {
                    "slug": "activity-core",
                    "host_paths": {"runner": str(activity_root)},
                },
            ]
        )
    def fake_run(cmd: list[str], **kwargs: Any) -> _Completed:
        assert cmd == ["reuse-surface", "report", "gaps", "--format", "json"]
        return _Completed({"empty_scaffolds": ["reuse-surface"]})
    monkeypatch.setattr(reuse_surface.httpx, "get", fake_get)
    monkeypatch.setattr(reuse_surface.subprocess, "run", fake_run)
    import activity_core.context_resolvers  # noqa: F401
    result = CONTEXT_RESOLVER_REGISTRY["shell"]().resolve(
        "reuse_surface_report_gaps",
        None,
        {
            "roster": str(rollout),
            "batch_size": 1,
        },
    )
    assert result == {
        "gaps": [
            {
                "repo": "reuse-surface",
                "root": str(reuse_root),
                "signal": "empty_capability_scaffold",
                "hygiene_signal": "empty_capability_scaffold",
            }
        ]
    }
    state = json.loads((tmp_path / "round-robin-state.json").read_text(encoding="utf-8"))
    assert state["cursor"] == 1
    assert state["last_batch"] == ["reuse-surface"]
 def test_shell_resolver_keeps_kaizen_fallback_for_existing_queries() -> None:
    assert CONTEXT_RESOLVER_REGISTRY["shell"]().resolve("unknown_query", None, {}) == {}
@pytest.mark.asyncio
 async def test_optional_reuse_surface_missing_roster_binds_empty_list(tmp_path) -> None:
    snapshot = await resolve_context(
        [
            {
                "type": "shell",
                "query": "reuse_surface_report_gaps",
                "params": {"roster": str(tmp_path / "missing.yaml")},
                "bind_to": "context.gaps",
            }
        ]
    )
    assert snapshot == {"gaps": []}
@pytest.mark.asyncio
 async def test_required_reuse_surface_missing_roster_fails_visibly(tmp_path) -> None:
    with pytest.raises(ApplicationError, match="Required context resolver"):
        await resolve_context(
            [
                {
                    "type": "shell",
                    "query": "reuse_surface_report_gaps",
                    "params": {"roster": str(tmp_path / "missing.yaml")},
                    "bind_to": "context.gaps",
                    "required": True,
                }
            ]
        )
--- a/tests/test_schedule_health.py
+++ b/tests/test_schedule_health.py
@@ -0,0 +1,81 @@
 """ACTIVITY-WP-0014 T03: missed-fire detection verdict tests."""
 from __future__ import annotations
 from datetime import datetime, timedelta, timezone
 from activity_core.schedule_health import evaluate_schedule_health
 NOW = datetime(2026, 6, 23, 12, 0, tzinfo=timezone.utc)
 def test_healthy_when_recent_fire_and_no_drops() -> None:
    health = evaluate_schedule_health(
        activity_id="a1",
        missed_catchup_window=0,
        last_fired_at=NOW - timedelta(minutes=5),
        now=NOW,
        expected_interval=timedelta(hours=1),
    )
    assert health.healthy is True
    assert health.missed is False
    assert health.reasons == []
 def test_unhealthy_when_catchup_window_dropped_fires() -> None:
    health = evaluate_schedule_health(
        activity_id="a1",
        missed_catchup_window=2,
        last_fired_at=NOW - timedelta(minutes=5),
        now=NOW,
    )
    assert health.missed is True
    assert "2 fire(s) dropped" in health.reasons[0]
 def test_unhealthy_when_last_fire_too_stale() -> None:
    health = evaluate_schedule_health(
        activity_id="daily",
        missed_catchup_window=0,
        last_fired_at=NOW - timedelta(days=2),
        now=NOW,
        expected_interval=timedelta(days=1),
    )
    assert health.missed is True
    assert any("exceeding the expected" in r for r in health.reasons)
    assert health.staleness == timedelta(days=2)
 def test_within_tolerance_is_healthy() -> None:
    health = evaluate_schedule_health(
        activity_id="daily",
        missed_catchup_window=0,
        last_fired_at=NOW - (timedelta(days=1) + timedelta(minutes=5)),
        now=NOW,
        expected_interval=timedelta(days=1),
        tolerance=timedelta(minutes=10),
    )
    assert health.healthy is True
 def test_no_fire_recorded_for_due_schedule_is_unhealthy() -> None:
    health = evaluate_schedule_health(
        activity_id="daily",
        missed_catchup_window=0,
        last_fired_at=None,
        now=NOW,
        expected_interval=timedelta(days=1),
    )
    assert health.missed is True
    assert "no recorded fire" in health.reasons[0]
 def test_no_interval_and_no_fire_is_not_flagged() -> None:
    # Without an expected interval we cannot assert a miss from absence alone.
    health = evaluate_schedule_health(
        activity_id="event-ish",
        missed_catchup_window=0,
        last_fired_at=None,
        now=NOW,
    )
    assert health.healthy is True
--- a/tests/test_schedule_lifecycle.py
+++ b/tests/test_schedule_lifecycle.py
@@ -37,6 +37,7 @@ def _make_defn(
    misfire_policy: str = "skip",
    enabled: bool = True,
    jitter: int = 0,
    catchup_window_seconds: int | None = None,
 ) -> ActivityDefinition:
    return ActivityDefinition(
        id=uuid.uuid4(),
@@ -46,6 +47,7 @@ def _make_defn(
            cron_expression=cron,
            misfire_policy=misfire_policy,
            jitter_seconds=jitter,
            catchup_window_seconds=catchup_window_seconds,
        ),
    )
@@ -186,6 +188,76 @@ async def test_misfire_policy_compress_sets_overlap_buffer_one(env: WorkflowEnvi
    await delete_schedule(env.client, defn.id)
 # ── ACTIVITY-WP-0014: explicit run-miss policies + catchup window ────────────
@pytest.mark.asyncio
 async def test_skip_sets_short_catchup_window(env: WorkflowEnvironment) -> None:
    """skip = run on trigger or skip: tiny grace window, no real recovery."""
    defn = _make_defn(misfire_policy="skip")
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.SKIP
    assert desc.schedule.policy.catchup_window == timedelta(seconds=60)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_catchup_all_recovers_full_window(env: WorkflowEnvironment) -> None:
    """catchup_all = recover every missed fire: long window, BUFFER_ALL."""
    defn = _make_defn(misfire_policy="catchup_all")
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ALL
    assert desc.schedule.policy.catchup_window == timedelta(days=365)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_catchup_latest_does_not_accumulate(env: WorkflowEnvironment) -> None:
    """catchup_latest = recover only the most recent missed fire: BUFFER_ONE."""
    defn = _make_defn(misfire_policy="catchup_latest")
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.overlap == ScheduleOverlapPolicy.BUFFER_ONE
    assert desc.schedule.policy.catchup_window == timedelta(hours=24)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_legacy_aliases_map_to_explicit_policies(env: WorkflowEnvironment) -> None:
    """Legacy catchup/compress keep working and pick up the new catchup windows."""
    catchup = _make_defn(misfire_policy="catchup")
    compress = _make_defn(misfire_policy="compress")
    await upsert_schedule(env.client, catchup)
    await upsert_schedule(env.client, compress)
    d1 = await env.client.get_schedule_handle(schedule_id(catchup.id)).describe()
    d2 = await env.client.get_schedule_handle(schedule_id(compress.id)).describe()
    assert d1.schedule.policy.catchup_window == timedelta(days=365)
    assert d2.schedule.policy.catchup_window == timedelta(hours=24)
    await delete_schedule(env.client, catchup.id)
    await delete_schedule(env.client, compress.id)
@pytest.mark.asyncio
 async def test_explicit_catchup_window_override(env: WorkflowEnvironment) -> None:
    """An explicit catchup_window_seconds overrides the per-policy default."""
    defn = _make_defn(misfire_policy="skip", catchup_window_seconds=7200)
    await upsert_schedule(env.client, defn)
    desc = await env.client.get_schedule_handle(schedule_id(defn.id)).describe()
    assert desc.schedule.policy.catchup_window == timedelta(hours=2)
    await delete_schedule(env.client, defn.id)
@pytest.mark.asyncio
 async def test_schedule_smoke_test_creates_one_shot_schedule(
    env: WorkflowEnvironment,
--- a/tests/test_state_hub_context_resolver.py
+++ b/tests/test_state_hub_context_resolver.py
@@ -407,6 +407,70 @@ def test_recently_on_scope_hourly_failure_bubbles(monkeypatch) -> None:
        StateHubContextResolver().resolve("recently_on_scope_hourly", None, {"range": "1h"})
 def test_consistency_sweep_remote_all_posts_batch(monkeypatch) -> None:
    calls: list[dict[str, Any]] = []
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        calls.append({"url": url, **kwargs})
        return DummyResponse(
            {
                "exit_code": 0,
                "lock_skipped": False,
                "repos_processed": [{"repo_slug": "state-hub", "result": "pass"}],
                "skipped_clean": ["quiet-repo"],
                "skipped_missing": [],
                "skipped_budget": [],
            }
        )
    monkeypatch.setenv("STATE_HUB_URL", "http://state-hub.test/")
    monkeypatch.setattr(httpx, "post", fake_post)
    result = StateHubContextResolver().resolve(
        "consistency_sweep_remote_all",
        None,
        {"max_seconds": 300, "source": "activity-core", "required": True},
    )
    assert result["exit_code"] == 0
    assert result["repos_processed"][0]["repo_slug"] == "state-hub"
    assert calls == [
        {
            "url": "http://state-hub.test/consistency/sweep/remote-all",
            "json": {"max_seconds": 300, "source": "activity-core"},
            "timeout": 330.0,
        }
    ]
 def test_consistency_sweep_remote_all_failure_bubbles(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        raise httpx.ConnectError("offline")
    monkeypatch.setattr(httpx, "post", fake_post)
    with pytest.raises(httpx.ConnectError):
        StateHubContextResolver().resolve(
            "consistency_sweep_remote_all",
            None,
            {"max_seconds": 300},
        )
 def test_consistency_sweep_remote_all_rejects_empty_response(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse({})
    monkeypatch.setattr(httpx, "post", fake_post)
    with pytest.raises(RuntimeError, match="missing required key"):
        StateHubContextResolver().resolve(
            "consistency_sweep_remote_all",
            None,
            {"max_seconds": 300},
        )
 def test_recently_on_scope_hourly_rejects_empty_response(monkeypatch) -> None:
    def fake_post(url: str, **kwargs: Any) -> DummyResponse:
        return DummyResponse({})
--- a/tests/test_state_hub_write.py
+++ b/tests/test_state_hub_write.py
@@ -0,0 +1,81 @@
 """ACTIVITY-WP-0014 T05: idempotency-keyed State Hub writes."""
 from __future__ import annotations
 import httpx
 import pytest
 from activity_core import report_sinks
 from activity_core.state_hub_write import (
    IDEMPOTENCY_HEADER,
    idempotency_headers,
    idempotency_key,
 )
 def test_key_is_stable_and_deterministic() -> None:
    a = idempotency_key("run1", "daily-triage-report", "daily_triage")
    b = idempotency_key("run1", "daily-triage-report", "daily_triage")
    assert a == b == "run1:daily-triage-report:daily_triage"
 def test_key_shape_stable_with_missing_parts() -> None:
    assert idempotency_key("run1", None, "daily_triage") == "run1::daily_triage"
 def test_key_sanitizes_control_and_whitespace() -> None:
    key = idempotency_key("run 1", "a\tb", "x\n")
    assert "\t" not in key and "\n" not in key and " " not in key
 def test_headers_carry_the_key() -> None:
    headers = idempotency_headers("run1", "i", "e")
    assert headers == {IDEMPOTENCY_HEADER: "run1:i:e"}
 def test_distinct_identities_get_distinct_keys() -> None:
    assert idempotency_key("r", "i", "daily_triage") != idempotency_key(
        "r", "i", "schedule_miss"
    )
 def test_progress_exists_is_best_effort_on_connection_error(monkeypatch) -> None:
    """A down State Hub must not hard-fail the dedup read; it returns False so the
    keyed write can still proceed."""
    def _boom(*args, **kwargs):
        raise httpx.ConnectError("Connection refused")
    monkeypatch.setattr(report_sinks.httpx, "get", _boom)
    assert (
        report_sinks._progress_exists(
            "http://127.0.0.1:8000", "run1", "daily-triage-report", "daily_triage"
        )
        is False
    )
 def test_report_sink_post_sends_idempotency_header(monkeypatch) -> None:
    """The state-hub-progress write carries a stable Idempotency-Key header."""
    captured: dict[str, object] = {}
    monkeypatch.setattr(report_sinks, "_progress_exists", lambda *a, **k: False)
    class _Resp:
        def raise_for_status(self) -> None: ...
        def json(self) -> dict[str, str]:
            return {"id": "pid-1"}
    def _capture_post(url, json, headers, timeout):  # noqa: A002
        captured["headers"] = headers
        return _Resp()
    monkeypatch.setattr(report_sinks.httpx, "post", _capture_post)
    payload = {"run_id": "run1", "activity_id": "act1", "scheduled_for": None}
    report_entry = {"instruction_id": "daily-triage-report", "report": {"summary": "s"}}
    sink = {"event_type": "daily_triage"}
    result = report_sinks._post_state_hub_progress(payload, report_entry, sink)
    assert result["status"] == "posted"
    assert captured["headers"][IDEMPOTENCY_HEADER] == "run1:daily-triage-report:daily_triage"
--- a/tests/test_sync_schedules.py
+++ b/tests/test_sync_schedules.py
@@ -0,0 +1,126 @@
 from __future__ import annotations
 import uuid
 from datetime import datetime, timezone
 from types import SimpleNamespace
 from typing import Any
 import pytest
 from activity_core import sync_schedules
 def _row(
    *,
    activity_id: uuid.UUID,
    enabled: bool,
    trigger_config: dict[str, Any],
 ) -> SimpleNamespace:
    return SimpleNamespace(
        id=activity_id,
        name=f"definition-{activity_id}",
        enabled=enabled,
        trigger_config=trigger_config,
        context_sources=[],
        task_templates=[],
        dedupe_key_strategy="skip",
        version=1,
    )
@pytest.mark.asyncio
 async def test_sync_schedule_rows_reports_drift_counts_and_preserves_one_shots(
    monkeypatch,
 ) -> None:
    new_id = uuid.uuid4()
    disabled_old_id = uuid.uuid4()
    one_shot_id = uuid.uuid4()
    orphan_id = uuid.uuid4()
    upserted: list[tuple[uuid.UUID, bool, str]] = []
    deleted: list[str] = []
    async def fake_upsert_schedule(client: object, defn: object) -> None:
        upserted.append((
            defn.id,
            defn.enabled,
            defn.trigger_config.trigger_type,
        ))
    async def fake_list_schedules(client: object) -> list[dict[str, str]]:
        return [
            {
                "schedule_id": f"activity-schedule-{disabled_old_id}",
                "activity_id": str(disabled_old_id),
            },
            {
                "schedule_id": f"activity-schedule-{one_shot_id}-once",
                "activity_id": f"{one_shot_id}-once",
            },
            {
                "schedule_id": f"activity-schedule-{orphan_id}",
                "activity_id": str(orphan_id),
            },
        ]
    async def fake_delete_schedule(client: object, activity_id: str) -> None:
        deleted.append(activity_id)
    monkeypatch.setattr(sync_schedules, "upsert_schedule", fake_upsert_schedule)
    monkeypatch.setattr(sync_schedules, "list_schedules", fake_list_schedules)
    monkeypatch.setattr(sync_schedules, "delete_schedule", fake_delete_schedule)
    result = await sync_schedules.sync_schedule_rows(
        object(),
        [
            _row(
                activity_id=new_id,
                enabled=True,
                trigger_config={
                    "trigger_type": "cron",
                    "cron_expression": "20 7 * * *",
                    "timezone": "Europe/Berlin",
                    "misfire_policy": "skip",
                },
            ),
            _row(
                activity_id=disabled_old_id,
                enabled=False,
                trigger_config={
                    "trigger_type": "cron",
                    "cron_expression": "20 * * * *",
                    "timezone": "Europe/Berlin",
                    "misfire_policy": "skip",
                },
            ),
            _row(
                activity_id=one_shot_id,
                enabled=True,
                trigger_config={
                    "trigger_type": "scheduled",
                    "at": datetime(2026, 6, 19, 8, 0, tzinfo=timezone.utc),
                    "timezone": "UTC",
                },
            ),
            _row(
                activity_id=uuid.uuid4(),
                enabled=True,
                trigger_config={
                    "trigger_type": "event",
                    "event_type": "kaizen.metrics.recorded",
                    "filters": {},
                },
            ),
        ],
    )
    assert result.to_dict() == {
        "upserted": 2,
        "paused": 1,
        "deleted_orphans": 1,
    }
    assert upserted == [
        (new_id, True, "cron"),
        (disabled_old_id, False, "cron"),
        (one_shot_id, True, "scheduled"),
    ]
    assert deleted == [str(orphan_id)]
--- a/tests/test_sync_service.py
+++ b/tests/test_sync_service.py
@@ -0,0 +1,134 @@
 from __future__ import annotations
 from typing import Any
 import pytest
 from activity_core import sync_service
 from activity_core.sync_schedules import ScheduleSyncResult
@pytest.mark.asyncio
 async def test_run_sync_runs_requested_sections(monkeypatch) -> None:
    calls: list[str] = []
    async def fake_definitions(session_factory: object) -> int:
        calls.append("definitions")
        return 2
    async def fake_event_types(session_factory: object) -> int:
        calls.append("event_types")
        return 5
    async def fake_schedules(
        temporal_client: object,
        session_factory: object,
    ) -> ScheduleSyncResult:
        calls.append("schedules")
        return ScheduleSyncResult(upserted=3, paused=1, deleted_orphans=2)
    monkeypatch.setattr(sync_service, "sync_activity_definitions", fake_definitions)
    monkeypatch.setattr(sync_service, "sync_event_types", fake_event_types)
    monkeypatch.setattr(sync_service, "sync_with_session_factory", fake_schedules)
    result = await sync_service.run_sync(
        session_factory=object(),
        temporal_client=object(),
        definitions=True,
        schedules=True,
        event_types=True,
    )
    assert calls == ["definitions", "event_types", "schedules"]
    assert result["ok"] is True
    assert result["ran"] == {
        "definitions": True,
        "schedules": True,
        "event_types": True,
    }
    assert result["definitions"] == {"synced": 2}
    assert result["event_types"] == {"synced": 5}
    assert result["schedules"] == {
        "upserted": 3,
        "paused": 1,
        "deleted_orphans": 2,
    }
    assert result["errors"] == []
@pytest.mark.asyncio
 async def test_run_sync_collects_errors_and_continues(monkeypatch) -> None:
    calls: list[str] = []
    async def failing_definitions(session_factory: object) -> int:
        calls.append("definitions")
        raise RuntimeError("definition parse failed")
    async def fake_schedules(
        temporal_client: object,
        session_factory: object,
    ) -> ScheduleSyncResult:
        calls.append("schedules")
        return ScheduleSyncResult(upserted=1)
    monkeypatch.setattr(
        sync_service,
        "sync_activity_definitions",
        failing_definitions,
    )
    monkeypatch.setattr(sync_service, "sync_with_session_factory", fake_schedules)
    result = await sync_service.run_sync(
        session_factory=object(),
        temporal_client=object(),
        definitions=True,
        schedules=True,
        event_types=False,
    )
    assert calls == ["definitions", "schedules"]
    assert result["ok"] is False
    assert result["definitions"] == {"synced": 0}
    assert result["schedules"]["upserted"] == 1
    assert result["errors"] == [
        {
            "stage": "definitions",
            "type": "RuntimeError",
            "message": "definition parse failed",
        }
    ]
@pytest.mark.asyncio
 async def test_run_sync_reports_missing_temporal_client_for_schedules() -> None:
    result = await sync_service.run_sync(
        session_factory=object(),
        temporal_client=None,
        definitions=False,
        schedules=True,
        event_types=False,
    )
    assert result["ok"] is False
    assert result["errors"] == [
        {
            "stage": "schedules",
            "type": "RuntimeError",
            "message": "Temporal client is required for schedule sync",
        }
    ]
 def test_record_error_bounds_error_count() -> None:
    result: dict[str, Any] = {
        "ok": True,
        "errors": [],
    }
    for i in range(25):
        sync_service._record_error(result, "stage", RuntimeError(f"boom {i}"))
    assert result["ok"] is False
    assert len(result["errors"]) == 20
    assert result["errors"][0]["message"] == "boom 0"
    assert result["errors"][-1]["message"] == "boom 19"
--- a/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
+++ b/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Post-triage operational hardening"
 domain: custodian
 repo: activity-core
-status: active
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-03"
-updated: "2026-06-16"
+updated: "2026-06-30"
 state_hub_workstream_id: "5646e13a-13af-4724-bca6-3c0d86f96733"
 ---
@@ -104,7 +104,7 @@ and emitted a validated `daily_triage` report plus working-memory note.
 ```task
 id: ACTIVITY-WP-0006-T03
-status: wait
+status: done
 priority: medium
 state_hub_task_id: "7cbf0a35-71a1-47ac-afc2-f51ad2180fd0"
 ```
@@ -174,6 +174,56 @@ the worker consumes the configured URL, then produce schema-valid daily triage
 evidence and three clean scheduled runs. This narrower path is tracked in
 `ACTIVITY-WP-0010`.
 2026-06-25: Consecutive-run streak resumed. State Hub `daily_triage` progress
 events from author `activity-core` fired on time on **2026-06-24 05:20:56Z** and
 **2026-06-25 05:20:47Z** (07:20 Berlin), both delivered, no misfires. That is two
 clean consecutive scheduled runs. **RECHECK 2026-06-26 (after 05:20Z):** confirm
 the 06-26 scheduled `daily_triage` event delivered. If clean, that completes three
 clean consecutive scheduled runs (06-24 / 06-25 / 06-26) — record the calibration
 result in State Hub and close T03. If the 06-26 run misfires or is missing, the
 streak resets and T03 stays `wait`. Flag deliberately kept in-repo (agent-agnostic)
 rather than tied to any single coding agent's scheduler.
 2026-06-26 recheck outcome: **streak reset at two.** The 06-26 scheduled run fired
 on time (`daily_triage` event 05:20:57Z) — scheduling layer healthy, no misfire —
 but the `daily-triage-report` instruction output **failed schema validation**:
 `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The model produced a
 long ranked WSJF recommendation list (reached rank 7+ with nested `wsjf` objects)
 whose JSON broke ~char 5268; only a bounded 4000-char preview is preserved in the
 State Hub event, so the exact offending token needs the runtime llm-connect log.
 This is an LLM-output-quality failure (tracked by `ACTIVITY-WP-0010`), not a
 runtime/projection failure. T03 stays `wait`; three clean consecutive scheduled
 runs not yet achieved (06-24 ✅, 06-25 ✅, 06-26 ✗-validation).
 2026-06-27 recheck outcome: streak remains reset. The scheduled run fired and
 wrote State Hub progress plus working memory, but daily-triage-report failed
 validation again with an unterminated string around char 5246. This confirms the
 runner/sink path is alive and the active blocker is live deployment of the
 ACTIVITY-WP-0016 output-robustness bundle and runtime prompt/token changes, not
 a missing schedule. T03 stays wait until a post-deployment smoke passes and three
 new clean scheduled runs are collected.
 2026-06-30 early checkpoint: two new clean scheduled runs exist after the
 validation failures. State Hub daily_triage progress shows 2026-06-28
 05:20:51Z run `6a44d6dd-3f02-53f2-a5d8-d42b76b0ef98` and 2026-06-29
 05:20:49Z run `1dfb47c9-07bf-551b-b778-1d21a40bd95c`, both with
 `output_validated=true` and working-memory notes written. The current local time
 was 2026-06-30 01:37 Europe/Berlin, before the expected 07:20 Berlin scheduled
 fire, so the three-clean-run gate cannot close yet. Recheck after 2026-06-30
 05:20Z; if that scheduled run validates, the clean streak is 06-28 / 06-29 /
 06-30 and T03 can close with calibration feedback.
 2026-06-30 closeout: the 07:20 Berlin scheduled run fired at 05:20:50Z as run
 `ac3d71a0-2f8f-50df-b3ce-7c60c2abb5c5` with `output_validated=true` and a
 working-memory note written. The post-failure clean streak is now complete:
 2026-06-28 (`6a44d6dd`), 2026-06-29 (`1dfb47c9`), and 2026-06-30 (`ac3d71a0`).
 Calibration feedback: the scheduler, worker, llm-connect route, State Hub sink,
 and working-memory sink are stable again; the recommendations were operationally
 useful but too dense at 10 items, repeatedly emphasizing human-dependency and
 infrastructure-unblock work. ACTIVITY-WP-0016 now owns the density/contract fix:
 Railiance runtime projection was aligned to a top-7 contract so the next live
 run can prove the bounded output posture. T03 is done.
 ## Rule Action Contract Documentation
 ```task
--- a/workplans/ACTIVITY-WP-0010-daily-triage-llm-reconciliation.md
+++ b/workplans/ACTIVITY-WP-0010-daily-triage-llm-reconciliation.md
@@ -8,7 +8,7 @@ status: blocked
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
-updated: "2026-06-18"
+updated: "2026-06-27"
 state_hub_workstream_id: "f2c73ac6-13f0-4005-82cc-76c7c9f9c8b9"
 ---
@@ -87,7 +87,7 @@ reported 9 passed.
 ```task
 id: ACTIVITY-WP-0010-T02
-status: wait
+status: done
 priority: high
 state_hub_task_id: "23545ddc-926b-485a-8535-5cc11e01134a"
 ```
@@ -107,6 +107,30 @@ Current wait reason: this is Railiance/operator-owned live cluster work. State
 Hub handoff message `9a074b7c-4b87-4e3c-a6bf-e1fe5580daa8` asks
 `railiance-cluster` to reconcile the updated config and smoke it.
 2026-06-19 recheck:
 - Deployed `llm-connect` into the `activity-core` namespace on `railiance01`
  (the cluster that runs `actcore-worker`). Previously only `coulombcore` ran
  llm-connect, and the in-cluster Service URL is cluster-local.
 - `actcore-runtime-config` already exposed the verified URL and timeout;
  `deployment/actcore-worker` was restarted and now reports
  `LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080`.
 - `llm-connect-provider-secrets` reports `DATA 1`; no Secret values were
  inspected.
 - Worker health probe to llm-connect `/health` returns `{"status": "ok"}`.
 - `actcore-state-hub-bridge` remains `0/1` Ready with upstream timeouts, so T02
  is not fully closed until the node-local State Hub tunnel is restored.
 2026-06-27 recheck:
 - Superseded by real scheduled runner evidence: State Hub daily_triage events on
  2026-06-24, 2026-06-25, 2026-06-26, and 2026-06-27 all reached State Hub and
  wrote working-memory notes. The bridge/sink is therefore reachable for the
  live runner.
 - 2026-06-24 and 2026-06-25 were schema-valid; 2026-06-26 and 2026-06-27 failed
  output validation after calling llm-connect. That moves the active blocker out
  of T02 and into the WP-0016 live bundle/smoke lane. Marking T02 done.
 ## Run Daily Triage Fixture Smoke
 ```task
@@ -128,6 +152,27 @@ Done when:
  detail;
 - `scripts/verify_daily_triage.py` reports the smoke/manual run as present.
 2026-06-19 recheck:
 - In-namespace llm-connect fixture smoke on `railiance01` passed:
  `smoke: pass health=ok latency_seconds=1.681 recommendations=1`.
 - Manual `POST /activity-definitions/6fca51fa-387a-4fd0-bc4e-d62c29eb859a/trigger`
  reached llm-connect, but the workflow failed at `persist_instruction_reports`
  with `state-hub-progress` sink `Connection refused` while
  `actcore-state-hub-bridge` is unhealthy.
 - T03 therefore remains open until State Hub bridge reachability is restored and
  a run emits non-secret `daily_triage` progress with `output_validated=true`.
 2026-06-27 recheck:
 - Scheduled runs on 2026-06-24 and 2026-06-25 satisfy the non-secret smoke
  evidence for llm-connect call, State Hub progress with output_validated=true,
  and working-memory note creation.
 - Kept T03 at progress rather than done because the workstation did not run the
  live verifier against Temporal/activity-core DB, and the smoke must be repeated
  after the WP-0016 code/schema/runtime-prompt deployment due to the 2026-06-26 and
  2026-06-27 malformed-output failures.
 ## Collect Three Clean Scheduled Runs
 ```task
@@ -151,6 +196,14 @@ Done when:
 - `ACTIVITY-WP-0006-T03` and `ACTIVITY-WP-0009-T01` can move from `wait` to
  `done`.
 2026-06-27 recheck:
 - Three-clean-run streak is reset. The latest sequence is 2026-06-24 clean,
  2026-06-25 clean, 2026-06-26 validation_failed, 2026-06-27 validation_failed.
 - Current pickup is to deploy ACTIVITY-WP-0016 code/schema together with the
  Railiance runtime prompt and max_tokens changes, run a live smoke, then restart
  the three-consecutive-scheduled-run gate from zero.
 ## Close Handoff State
 ```task
--- a/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
+++ b/workplans/ACTIVITY-WP-0012-definition-schedule-hot-reload.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Definition And Schedule Hot Reload"
 domain: custodian
 repo: activity-core
-status: active
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-18"
-updated: "2026-06-18"
+updated: "2026-06-22"
 state_hub_workstream_id: "8887075e-21ec-451b-b82b-cd81035c9ca5"
 ---
@@ -39,7 +39,7 @@ a repo checkout manager or CI system.
 ```task
 id: ACTIVITY-WP-0012-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "53a7970b-7eec-47f5-ad30-bbd7c6271952"
 ```
@@ -57,11 +57,17 @@ Done when:
 - failures are collected into a bounded `errors[]` result while preserving the
  current startup best-effort behavior.
 2026-06-19: Completed. Added `activity_core.sync_service.run_sync`, which
 orchestrates ActivityDefinition, event type, and schedule sync independently
 from explicit DB session factory and Temporal client dependencies. Worker
 startup now calls the shared service for definitions+schedules and logs bounded
 stage errors while continuing startup.
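The collect-and-continue behaviour can be sketched roughly as follows. This is a minimal illustration inferred from the assertions in `tests/test_sync_service.py`, not the actual `activity_core.sync_service` source: `run_stages`, `record_error`, and the `MAX_ERRORS = 20` cap are hypothetical names and values chosen to match what the tests observe.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Assumption: the tests only show that 25 recorded failures leave 20 entries.
MAX_ERRORS = 20


def record_error(result: dict[str, Any], stage: str, exc: Exception) -> None:
    """Mark the run as failed and keep at most MAX_ERRORS entries, oldest first."""
    result["ok"] = False
    if len(result["errors"]) < MAX_ERRORS:
        result["errors"].append(
            {"stage": stage, "type": type(exc).__name__, "message": str(exc)}
        )


async def run_stages(
    stages: dict[str, Callable[[], Awaitable[int]]],
) -> dict[str, Any]:
    """Run each stage independently; a failing stage does not stop later ones."""
    result: dict[str, Any] = {"ok": True, "errors": []}
    for name, stage in stages.items():
        try:
            result[name] = {"synced": await stage()}
        except Exception as exc:  # best-effort: record the failure and continue
            result[name] = {"synced": 0}
            record_error(result, name, exc)
    return result
```

Keeping the first entries rather than the last means the earliest failure in a run is always visible, which is usually the one that explains the rest.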
 ## Add Admin Sync Endpoint
 ```task
 id: ACTIVITY-WP-0012-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8697c761-15d1-4da0-b66b-d838218a2495"
 ```
@@ -80,11 +86,17 @@ Done when:
 - endpoint tests cover definitions-only, schedules-only, all-sync, and failure
  result behavior.
 2026-06-19: Completed. Added `POST /admin/sync` with defaults
 `definitions=true`, `schedules=true`, and `event_types=false`. The response
 reports definition/event counts, schedule upsert/pause/orphan-delete counts, and
 bounded `errors[]`. Tests cover definitions-only, schedules-only, all-sync, and
 failure-result behavior.
 ## Preserve Schedule Drift Semantics
 ```task
 id: ACTIVITY-WP-0012-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "efeac412-632c-4c90-9428-bb575ac7a624"
 ```
@@ -101,11 +113,18 @@ Done when:
 - regression tests demonstrate the Coulomb hourly-to-daily rename shape without
  needing a worker restart.
 2026-06-19: Completed. `sync_schedules` now returns explicit counts for enabled
 schedule upserts, disabled schedule pauses, and orphan deletes. Regression tests
 cover the hourly-to-daily rename shape: a new enabled cron schedule is upserted,
 the old disabled cron schedule is preserved as paused, unrelated orphan
 schedules are deleted, event-triggered definitions do not create schedules, and
 one-shot scheduled definitions are no longer mistaken for orphans.
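The drift classification described above can be sketched as a pure planning step. This is an illustration under assumptions, not the real `sync_schedules` code: `Row` and `plan_sync` are hypothetical, and the `-once` suffix is taken from the one-shot schedule ids used in `tests/test_sync_schedules.py`.

```python
from dataclasses import dataclass

# Suffix seen on one-shot schedule ids in the test fixture.
ONE_SHOT_SUFFIX = "-once"


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for an ActivityDefinition row."""

    activity_id: str
    enabled: bool
    trigger_type: str  # "cron", "scheduled", or "event"


def plan_sync(rows: list[Row], existing_activity_ids: list[str]) -> dict[str, list[str]]:
    """Classify desired rows and existing schedules into the three drift buckets."""
    # Event-triggered definitions never own a schedule.
    schedulable = [r for r in rows if r.trigger_type in ("cron", "scheduled")]
    known = {r.activity_id for r in schedulable}
    orphans = []
    for existing in existing_activity_ids:
        # A one-shot schedule belongs to its base definition, so strip the suffix
        # before deciding whether the schedule is orphaned.
        if existing.removesuffix(ONE_SHOT_SUFFIX) not in known:
            orphans.append(existing)
    return {
        "upserted": [r.activity_id for r in schedulable if r.enabled],
        "paused": [r.activity_id for r in schedulable if not r.enabled],
        "deleted_orphans": orphans,
    }
```

With the rename shape from the regression test (one new enabled cron row, one disabled old cron row, one one-shot row, one event row), the plan yields two upserts, one pause, and one orphan delete.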
 ## Optional Background Sync Loop
 ```task
 id: ACTIVITY-WP-0012-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "d774087b-c51d-4444-8e90-bfef43765456"
 ```
@@ -121,11 +140,17 @@ Done when:
  last error summary;
 - the loop does not block worker startup or workflow task processing.
 2026-06-19: Completed by decision. v1 stays manual/operator-triggered through
 `POST /admin/sync`; no background loop was added. The runbook records this
 posture so customer definition changes stay explicit and the worker does not
 start background repo scanning. A periodic loop remains a future option if live
 operator use proves it is needed.
 ## Live No-Restart Smoke
 ```task
 id: ACTIVITY-WP-0012-T05
-status: wait
+status: done
 priority: high
 state_hub_task_id: "68a0e22a-106a-4d21-9f39-c6279850cb5e"
 ```
@@ -141,5 +166,27 @@ Done when non-secret State Hub evidence shows:
 - event-triggered definitions still fire normally;
 - rollback or repeat sync is idempotent.
-Current wait reason: this gate depends on the implementation tasks and a
+2026-06-22: Completed on Railiance01 (`KUBECONFIG=~/.kube/config-hosteurope`).
-cluster-owned smoke path.
+
 Smoke target: disabled projection `ops-service-inventory-probes`
 (`40d15a87-7ff6-4d8e-992c-37df15f95110`) in
 `actcore-external-activity-definitions`.
 Evidence:
 - ConfigMap flip `enabled: false -> true` and cadence `15 * * * * -> 25 * * * *`,
  then `POST /admin/sync?definitions=true&schedules=true` from `actcore-api`.
 - DB after sync: `enabled=true`, `cron=25 * * * *`.
 - Temporal schedule after sync: `paused=false`, calendar minute `25`.
 - Repeat sync returned identical schedule counts
  (`upserted=5`, `paused=1`, `deleted_orphans=0`) — idempotent.
 - Rollback flip restored `enabled=false`, `cron=15 * * * *`, schedule
  `paused=true`, calendar minute `15`.
 - `actcore-worker` pod UID unchanged (`a68d6539-2bba-457e-a78a-39564002a980`,
  started `2026-06-21T18:46:46Z`); `actcore-event-router` pod UID unchanged.
 - Event-triggered definitions: none projected on Railiance01 today; hot DB
  reload path for event definitions remains covered by T03 unit tests and an
  unchanged event-router deployment.
 Automation: `scripts/smoke_admin_sync_no_restart.py`. Runbook section added
 under "Railiance01 no-restart smoke".
--- a/workplans/ACTIVITY-WP-0013-reuse-surface-report-gaps-resolver.md
+++ b/workplans/ACTIVITY-WP-0013-reuse-surface-report-gaps-resolver.md
@@ -0,0 +1,78 @@
 ---
 id: ACTIVITY-WP-0013
 type: workplan
 title: "Reuse Surface Report Gaps Resolver"
 domain: custodian
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: activity-core
 created: "2026-06-18"
 updated: "2026-06-18"
 state_hub_workstream_id: "01e68dfd-b146-4aef-a575-2d3b178ca5c2"
 ---
 # Reuse Surface Report Gaps Resolver
 Implement the R2 handoff from kaizen-agentic (`bffa224c`) so the
 `reuse_surface_report_gaps` shell context source populates
 `context.gaps` for the Coulomb daily registry hygiene sweep.
 ## Register Shell Resolver Query
 ```task
 id: ACTIVITY-WP-0013-T01
 status: done
 priority: high
 state_hub_task_id: "a6e1fc5c-7b42-436d-914e-4d605cb6f329"
 ```
 Add a dedicated reuse-surface context resolver module and register
 `reuse_surface_report_gaps` on the `shell` resolver path while preserving
 the existing kaizen shell query behavior.
 ## Implement Batch And Signal Semantics
 ```task
 id: ACTIVITY-WP-0013-T02
 status: done
 priority: high
 state_hub_task_id: "229cf285-8388-471d-95fd-08400db1553e"
 ```
 Load the Coulomb rollout roster, select active repos with a persisted
 round-robin cursor, resolve repo roots from State Hub host paths, run
 `reuse-surface report gaps --format json`, and emit gap records for the
 enabled registry hygiene signals.
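One way a persisted round-robin cursor could work is sketched below. The cursor file layout (`{"next_index": N}`), the `next_batch` name, and the `batch_size` parameter are all assumptions made for illustration; the workplan does not specify how the resolver stores its cursor.

```python
import json
from pathlib import Path


def next_batch(repos: list[str], cursor_path: Path, batch_size: int) -> list[str]:
    """Return the next batch of repos, wrapping around, and persist the cursor."""
    if not repos:
        return []
    try:
        start = int(json.loads(cursor_path.read_text())["next_index"])
    except (FileNotFoundError, KeyError, ValueError, TypeError):
        start = 0  # missing or unreadable cursor: start from the top of the roster
    start %= len(repos)  # tolerate a roster that shrank since the last run
    count = min(batch_size, len(repos))
    batch = [repos[(start + i) % len(repos)] for i in range(count)]
    cursor_path.write_text(json.dumps({"next_index": (start + count) % len(repos)}))
    return batch
```

Pointing `cursor_path` at a throwaway file gives a run that does not disturb the production rotation, which is the posture the T04 smoke takes with its temporary cursor.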
 ## Cover Required And Optional Failure Modes
 ```task
 id: ACTIVITY-WP-0013-T03
 status: done
 priority: high
 state_hub_task_id: "85b5c7d4-40e1-4945-8ada-1dff2363c194"
 ```
 Ensure missing required dependencies fail visibly while optional resolver
 sources bind an empty `context.gaps` list. Add unit coverage for fixture
 rollout data, mocked CLI JSON, resolver binding, and `hygiene_signal`
 rule gating.
 ## Smoke Real Coulomb Rollout
 ```task
 id: ACTIVITY-WP-0013-T04
 status: done
 priority: medium
 state_hub_task_id: "6a5446ed-b4ec-4693-b508-65415571d834"
 ```
 Run a live resolver smoke against
 `/home/worsch/coulomb-loop/loops/registry-hygiene/rollout.yaml` using a
 temporary round-robin cursor. The real active rollout produced five gaps,
 including one for `reuse-surface` with `hygiene_signal: stale_sbom`.
 The smoke supplied `reuse_surface_bin:
 /home/worsch/reuse-surface/.venv/bin/reuse-surface` and
 `runner_host: bnt-lap001`; the worker environment or definition params must
 provide equivalent values before enabling the production sweep.
--- a/workplans/ACTIVITY-WP-0014-schedule-misfire-robustness.md
+++ b/workplans/ACTIVITY-WP-0014-schedule-misfire-robustness.md
@@ -0,0 +1,194 @@
 ---
 id: ACTIVITY-WP-0014
 type: workplan
 title: "Schedule Misfire Robustness & Run-Miss Recovery Options"
 domain: infotech
 repo: activity-core
 status: finished
 owner: claude
 topic_slug: activity-core
 created: "2026-06-23"
 updated: "2026-06-24"
 status_note: "T01-T05 complete; beachhead-endpoint adoption split to ACTIVITY-WP-0015"
 state_hub_workstream_id: "91b64686-5d17-4c86-bc9e-3d0ee6720cf5"
 ---
 # Schedule Misfire Robustness & Run-Miss Recovery Options
 Make cron-triggered ActivityDefinitions robust to missed fires (worker/Temporal
 unavailable at trigger time) with explicit, per-definition recovery behaviour,
 plus detection/alerting when a scheduled fire is missed.
 ## Motivation
 On 2026-06-22 and 2026-06-23 the `daily-statehub-wsjf-triage` definition
 (cron `20 7 * * *` Europe/Berlin, projected into the Railiance runtime ConfigMap
 `actcore-external-activity-definitions`) produced **no `daily_triage` progress
 event at all** — neither a success nor a `could not run; operator review
 required` failure.
 > **Corrected by T01 (2026-06-23).** The initial hypothesis below — that
 > `_build_schedule()` never set `catchup_window`, so a short-default catchup
 > window silently dropped the fire — was **disproven on the live cluster**. The
 > Temporal schedule is healthy with `CatchupWindow 365d` (the server default) and
 > `0 MissedCatchupWindow`. The real cause is that the run **fired and ran but
 > failed at the report sink** with `Connection refused` posting to State Hub,
 > because railiance01 reaches State Hub via a reverse tunnel back to the
 > workstation, which is asleep at 07:20 Berlin. See the T01 findings and T05.
 The trigger now originates entirely on **railiance01** (in-cluster Temporal
 Schedule, ConfigMap-projected definition) and is **not** laptop-dependent — but
 the triage's State Hub *data dependencies* (context resolution and report
 delivery) still route back to the workstation State Hub.
 This workplan still delivers worthwhile robustness — explicit run-miss recovery
 policies (T02) and missed-fire detection (T03) — but the fix for *this* incident
 is T05 (resilient sinks/resolvers + a workstation-independent State Hub endpoint).
 ## Desired run-miss options (from Bernd)
 Three explicit, per-definition behaviours when a fire is missed:
 1. **Run on trigger or skip** — never recover a missed fire.
 2. **Run on trigger or later if missed** — recover **all** missed fires when back up.
 3. **Run on trigger or later if missed, but skip if next trigger reached** —
   recover only the **most recent** missed fire; do not accumulate a backlog.
 Proposed mapping to a new `misfire_policy` value set (names open to review):
 | Policy | Semantics | Temporal mapping |
 | --- | --- | --- |
 | `skip` | Run on trigger or skip | `catchup_window ≈ 0`, `overlap=SKIP` |
 | `catchup_all` | Run on trigger or all missed later | `catchup_window=<long>`, `overlap=BUFFER_ALL` |
 | `catchup_latest` | Run on trigger or only the latest missed | `catchup_window ≈ 1 interval`, `overlap=BUFFER_ONE` |
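The table can be read as a small lookup, sketched here under stated assumptions. `Overlap` is a local stand-in for `temporalio.client.ScheduleOverlapPolicy` so the block runs without Temporal installed; `resolve_policy` is a hypothetical helper, not the repo's `_build_schedule()`. The 365-day and 24-hour windows and the explicit-override behaviour come from the schedule tests in this change set. The 60-second `skip` window is a placeholder for "≈ 0", and the overlap chosen for the legacy `catchup`/`compress` aliases is an assumption, since the tests only check their windows.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Overlap(Enum):
    """Local stand-in for temporalio.client.ScheduleOverlapPolicy."""

    SKIP = "skip"
    BUFFER_ONE = "buffer_one"
    BUFFER_ALL = "buffer_all"


POLICY_DEFAULTS: dict[str, tuple[Overlap, timedelta]] = {
    # Placeholder window: the table only says "catchup_window ≈ 0".
    "skip": (Overlap.SKIP, timedelta(seconds=60)),
    "catchup_all": (Overlap.BUFFER_ALL, timedelta(days=365)),
    "catchup_latest": (Overlap.BUFFER_ONE, timedelta(hours=24)),
}

# Assumption: legacy names alias the explicit policies one-to-one.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}


def resolve_policy(
    misfire_policy: str, catchup_window_seconds: Optional[int] = None
) -> tuple[Overlap, timedelta]:
    """Map a misfire_policy name to (overlap, catchup window)."""
    name = LEGACY_ALIASES.get(misfire_policy, misfire_policy)
    if name not in POLICY_DEFAULTS:
        raise ValueError(f"unknown misfire_policy: {misfire_policy!r}")
    overlap, window = POLICY_DEFAULTS[name]
    if catchup_window_seconds is not None:
        # An explicit per-definition window overrides the policy default.
        window = timedelta(seconds=catchup_window_seconds)
    return overlap, window
```

Keeping the override separate from the policy name lets a definition choose `skip` semantics while still tolerating a short outage, as in the `catchup_window_seconds=7200` test case.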
 ## Confirm root cause on Railiance01
 ```task
 id: ACTIVITY-WP-0014-T01
 status: done
 priority: high
 state_hub_task_id: "c90ff214-9214-48c7-96b9-7d699528d5ab"
 ```
 Inspected via `ssh railiance01` + in-node `kubectl`/`temporal` (no k3s tunnel is
 defined for railiance01; the documented access path is SSH to the host).
 **Findings (2026-06-23) — the WP-0014 premise was wrong for this incident:**
 - All pods healthy; `actcore-worker` up 44h, 0 restarts. Not a crash.
 - The daily-triage Temporal schedule (`activity-schedule-6fca51fa-…`) is
  **healthy**: `Paused false`, `OverlapPolicy Skip`, **`CatchupWindow 365d`**
  (Temporal's *default* when unset), `ActionCounts {Total:8, MissedCatchupWindow:0}`.
  So fires were **not** silently dropped — my original "no catchup window → silent
  drop" hypothesis does not hold; the server default is already 365d.
 - The `2026-06-23T05:20:00Z` fire **did fire and ran**, then **Failed at the report
  sink**: `report sink failure: state-hub-progress … '[Errno 111] Connection
  refused'`. The run produced a report but could not deliver it to State Hub, so
  no `daily_triage` progress event (not even a "could not run" one) was posted →
  the silence. The 06-22 fire has no execution in retention (the bridge was likely
  down then too, or the fire fell in the schedule update window; `LastUpdateAt` was
  1d ago).
 - Root cause is **State Hub connectivity from railiance01**, not Temporal. The
  in-cluster `actcore-state-hub-bridge` (`hostNetwork`) proxies to
  `127.0.0.1:18000` on the node — the local end of the ops-bridge **reverse tunnel
  back to the workstation's State Hub**. At 07:20 Europe/Berlin (= 05:20 UTC) the
  workstation/tunnel was unreachable → `Connection refused`. Chronic flakiness
  confirmed: 102 State Hub resolver timeouts in 24h (69 `recently_on_scope`,
  33 `consistency_sweep`).
 **Implication:** the trigger *is* independent of the laptop, but the triage's
 **data dependencies (State Hub context resolution + report delivery) still route
 back to the workstation State Hub**, which is asleep at 07:20 Berlin. WP-0014's
 misfire policies are still good robustness, but the real fix is (a) State Hub
 reachable from railiance01 independent of the workstation, and/or (b) sinks/
 resolvers resilient to transient State Hub unavailability (retry/backoff,
 store-and-forward) instead of hard-failing the workflow. Tracked as follow-up
 below. Backfill deferred: a replay only succeeds while the workstation State Hub
 is reachable.
 ## Implement explicit misfire recovery modes
 ```task
 id: ACTIVITY-WP-0014-T02
 status: done
 priority: high
 state_hub_task_id: "19615562-4cb2-4f25-872f-505d6e40dcc5"
 ```
 Add `catchup_window_seconds` to `CronTriggerConfig` and redefine `misfire_policy`
 into the three explicit modes above. In `_build_schedule()` set
 `SchedulePolicy(overlap=..., catchup_window=timedelta(...))` per mode. Remove the
 ad-hoc 1-hour `backfill` hack in favour of native catchup-window semantics. Keep
 backward compatibility for existing `skip`/`catchup`/`compress` values (alias
 map). Unit tests for each mode's `(catchup_window, overlap)` mapping.
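The per-mode mapping described above can be sketched as a small pure function. This is a minimal sketch, not the repo's `_build_schedule()`: the function name, the two window constants, and the targets of the legacy aliases are assumptions made for illustration. The overlap values are plain strings here so the block runs without the Temporal SDK.

```python
from datetime import timedelta

# Illustrative values only. The table above says "≈ 0" and "<long>"; the exact
# numbers used in the repo are not stated in this workplan.
MIN_CATCHUP = timedelta(seconds=10)
LONG_CATCHUP = timedelta(days=365)

# Hypothetical alias map. The legacy names come from the text; which new mode
# each legacy value maps to is an assumption.
LEGACY_ALIASES = {"catchup": "catchup_all", "compress": "catchup_latest"}


def resolve_misfire_policy(policy: str, interval: timedelta) -> tuple[timedelta, str]:
    """Return (catchup_window, overlap) for a misfire_policy value."""
    mode = LEGACY_ALIASES.get(policy, policy)
    if mode == "skip":
        # Never recover a missed fire.
        return MIN_CATCHUP, "SKIP"
    if mode == "catchup_all":
        # Recover every missed fire once the scheduler is back.
        return LONG_CATCHUP, "BUFFER_ALL"
    if mode == "catchup_latest":
        # Recover only the most recent missed fire: one interval of catchup.
        return interval, "BUFFER_ONE"
    raise ValueError(f"unknown misfire_policy: {policy!r}")
```

For a daily definition, `resolve_misfire_policy("catchup_latest", timedelta(days=1))` yields a one-day window with `BUFFER_ONE`. Keeping the mapping pure makes the per-mode unit tests a table of inputs and expected pairs.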
 ## Missed-fire detection & alert sink
 ```task
 id: ACTIVITY-WP-0014-T03
 status: done
 priority: medium
 state_hub_task_id: "dbedd96a-59ca-4b83-bce6-35755b076807"
 ```
 Detect when a scheduled definition has no successful run within its expected
 interval + tolerance, and emit a signal (State Hub progress event and/or
 agent-inbox message) so a miss is visible even under `skip`. This is the
 observability the current silent-drop behaviour lacks — a miss should never again
 be invisible.
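The detection rule above reduces to one comparison. The sketch below assumes the caller already knows the timestamp of the last successful run; the function and parameter names are hypothetical and do not come from the repo.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fire_missed(
    last_success: Optional[datetime],
    now: datetime,
    interval: timedelta,
    tolerance: timedelta,
) -> bool:
    """True when no successful run falls within interval + tolerance of now."""
    if last_success is None:
        # No successful run on record at all counts as a miss.
        return True
    return now - last_success > interval + tolerance


# Example: a daily definition with a one-hour tolerance.
last_ok = datetime(2026, 6, 22, 5, 20, tzinfo=timezone.utc)
late = datetime(2026, 6, 23, 6, 30, tzinfo=timezone.utc)
on_time = datetime(2026, 6, 23, 5, 50, tzinfo=timezone.utc)
```

With these values `is_fire_missed(last_ok, late, timedelta(days=1), timedelta(hours=1))` is true and the same call with `on_time` is false. The check keys on the last successful run, not the last fire, so a run that fired but failed at the sink still shows up as a miss.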
 ## Apply policy to runtime definitions & document
 ```task
 id: ACTIVITY-WP-0014-T04
 status: done
 priority: medium
 state_hub_task_id: "04e9d1d2-1192-4402-9402-b12c5d7d44e5"
 ```
 Set `misfire_policy: catchup_latest` for `daily-statehub-wsjf-triage` and
 documented the run-miss options in `docs/runbook.md`.
 **Deployed to railiance01 & verified (2026-06-24):** built
 `activity-core:railiance01-prod` with the WP-0014 code (T02/T03/T05), imported into k3s
 containerd, applied the ConfigMap, rolled `actcore-worker`/`api`/`event-router`
 onto the new image, and ran `/admin/sync` (6 defs, 4 schedules upserted, 0
 errors). The live Temporal schedule now reports `OverlapPolicy BufferOne` +
 `CatchupWindow 1d` (= `catchup_latest`); pods healthy, API `db:true temporal:true`.
 ## Keep activity-core thin under the State Hub beachhead model
 ```task
 id: ACTIVITY-WP-0014-T05
 status: done
 priority: high
 state_hub_task_id: "b7e5b877-1b09-421c-a04e-78f785dc00a1"
 ```
 **Architecture decision (Bernd, 2026-06-23):** the resilience that this incident
 needs — queuing writes and caching reads while State Hub is unreachable — must
 **not** be a burden carried by client repos. It belongs to State Hub as a
 **per-machine local "beachhead"** (transparent read cache + write outbox, possibly
 with State-Hub federation), owned by custodian/state-hub. It handles all three
 failure modes: network interruption, central State Hub crash, central machine
 down. This is handed off to state-hub (see the coordination message / proposal);
 **do not build client-side queue/cache logic in activity-core.**
 activity-core's only responsibilities under this model are thin:
 - **Idempotent writes — DONE (2026-06-23, in-repo):** added
  `activity_core/state_hub_write` (`idempotency_headers`); every State Hub write
  (report-sink, ops-evidence, schedule-miss) now sends a stable `Idempotency-Key`
  header derived from `run_id:instruction_id:event_type`. The read-based
  `_progress_exists` dedup is now best-effort (returns `False` on connection
  error instead of hard-failing), so the guarantee lives on the keyed write, not
  a live read. Tests in `tests/test_state_hub_write.py`; documented in
  `docs/runbook.md`.
 - **Adopt the beachhead endpoint — MOVED to [[ACTIVITY-WP-0015]]:** pointing
  `STATE_HUB_URL` at the local beachhead and retiring the bespoke
  `actcore-state-hub-bridge` proxy depend on the state-hub beachhead existing
  first. Split into WP-0015 (status `blocked`) so this workplan can close on its
  completed in-repo work rather than waiting on an external capability.
 T05 is done as far as activity-core can act now; the external-dependent adoption
 lives in WP-0015.
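The idempotent-write responsibility can be illustrated with two small helpers. This is a sketch under stated assumptions: the text only says the key is derived from `run_id:instruction_id:event_type`, so hashing it with SHA-256 is a guess, and `progress_exists` takes a hypothetical `lookup` callable standing in for the real State Hub read.

```python
import hashlib
from typing import Callable


def idempotency_headers(run_id: str, instruction_id: str, event_type: str) -> dict[str, str]:
    """Build a stable Idempotency-Key header for one logical State Hub write."""
    raw = f"{run_id}:{instruction_id}:{event_type}"
    # Assumption: hash the triple so the header has a fixed length.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return {"Idempotency-Key": digest}


def progress_exists(lookup: Callable[[], bool]) -> bool:
    """Best-effort dedup read: an unreachable State Hub means 'not known to exist'."""
    try:
        return bool(lookup())
    except ConnectionError:
        # Do not hard-fail the workflow; the keyed write carries the guarantee.
        return False
```

The same `(run_id, instruction_id, event_type)` triple always produces the same header, so a retried or replayed write can be deduplicated on the receiving side even when the pre-write read could not be served.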
--- a/workplans/ACTIVITY-WP-0015-adopt-statehub-beachhead-endpoint.md
+++ b/workplans/ACTIVITY-WP-0015-adopt-statehub-beachhead-endpoint.md
@@ -0,0 +1,54 @@
 ---
 id: ACTIVITY-WP-0015
 type: workplan
 title: "Adopt State Hub Beachhead Endpoint"
 domain: infotech
 repo: activity-core
 status: blocked
 owner: claude
 topic_slug: activity-core
 created: "2026-06-24"
 updated: "2026-06-24"
 state_hub_workstream_id: "bbc07f9e-9323-4b2b-b556-c33b37d0b228"
 ---
 # Adopt State Hub Beachhead Endpoint
 Carries the **blocked remainder** of [[ACTIVITY-WP-0014]] T05. The in-repo half
 (idempotency-keyed State Hub writes) shipped in WP-0014; this workplan is the
 client-side adoption that depends on the state-hub-owned **beachhead** capability
 (per-machine read cache + write outbox) existing first.
 **Blocked on:** the state-hub beachhead (proposal sent to the `state-hub` agent,
 2026-06-23). Do not build queue/cache logic in activity-core — see
 [[statehub-beachhead-principle]].
 ## Point STATE_HUB_URL at the beachhead
 ```task
 id: ACTIVITY-WP-0015-T01
 status: wait
 priority: medium
 state_hub_task_id: "76b6132d-394a-4a67-bef6-73bb9d1e277e"
 ```
 Once the state-hub beachhead exposes a local endpoint, point activity-core's
 `STATE_HUB_URL` (and the railiance runtime config) at it and verify reads are
 served from cache and writes are queued/flushed correctly when central State Hub
 is unreachable. Confirm idempotency-keyed writes dedup on flush (no duplicate
 `daily_triage`/progress events).
 ## Retire the bespoke actcore-state-hub-bridge proxy
 ```task
 id: ACTIVITY-WP-0015-T02
 status: wait
 priority: medium
 state_hub_task_id: "526c2129-cbf7-4531-a319-aebfc75cc6a3"
 ```
 Remove the inline `hostNetwork` HTTP proxy `actcore-state-hub-bridge` from
 `k8s/railiance/20-runtime.yaml` — it is a primitive precursor of the beachhead
 and should be replaced by the state-hub-owned component, not extended. Re-verify
 the daily triage end-to-end after cutover, including an overnight scheduled run
 while the workstation is asleep (the original failure condition).
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -0,0 +1,434 @@
 ---
 id: ACTIVITY-WP-0016
 type: workplan
 title: "LLM Output Robustness & The Producer Trust Boundary"
 domain: custodian
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-26"
 updated: "2026-06-30"
 state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726"
 ---
 # ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary
 ## Context
 On 2026-06-26 the scheduled `daily-statehub-wsjf-triage` instruction fired on
 time (`daily_triage` event 05:20:57Z) but its output **failed schema
 validation**: `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The
 model emitted a long ranked WSJF recommendation list (reached rank 7+ with
 nested `wsjf` objects) and the JSON broke deep in that list. Because the report
 is a single monolithic JSON document, one malformed delimiter discarded the
 **entire** run. This reset the three-clean-consecutive-scheduled-runs streak in
 `ACTIVITY-WP-0006-T03` (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the
 LLM-output-quality surface deferred from `ACTIVITY-WP-0010`.
 The scheduling/runtime layer is healthy — this is purely an output-robustness
 and boundary-design problem. Today's code (`src/activity_core/rules/executor.py`)
 already: passes the output schema to llm-connect as a `json_schema` model param
 (`_llm_run_config`), retries once, runs a fenced/`raw_decode` tolerant parser
 (`_parse_json_output`), and preserves a bounded 4000-char preview on hard
 failure (`_invalid_output_report`). None of that helps when error locality is
 zero: the failure unit is the whole document, not the offending item.
 ## Design Frame — The Producer Trust Boundary
 This workplan is anchored to a deliberate architectural stance, not just a bug
 fix. Capture it in an ADR (T04) so future work inherits it.
 **Premise.** activity-core has a *trust boundary* where free-form producer
 output meets strict deterministic consumers (JSON Schema validators, the task
 emitter, classic compute pipelines). The producers are **LLMs and humans (and
 agents acting for either)**. Both are *untrusted producers*: their output may be
 - **erroneous** — hallucination, truncation (token-limit cutoff), drift,
  type slips, typos; or
 - **malicious** — prompt injection, crafted payloads, oversized/deeply-nested
  structures aimed at exhausting or confusing the consumer.
 The architecture should treat the boundary as an adversarial frontier and place
 **guardrails + error-correction tooling there**, rather than letting raw
 producer output flow into deterministic consumers and fail (or worse, partially
 succeed) downstream.
 **Two non-fail-fast postures.** When we do *not* want to hard-fail on a problem,
 there are two sensible strategies — and they compose:
 - **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
  as-is; on exception, catch → repair → retry → or quarantine. Cheap on the
  happy path. Blast radius depends entirely on how granular the catch is. Good
  when failures are rare and locally recoverable. Risk: failures surface late,
  possibly after partial side effects.
 - **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
  and normalize the output to a known-good shape *before* it enters the pipeline
  — drop bad items, coerce types, bound sizes/depth, allow-list references — so
  the consumer only ever sees clean input. Higher upfront cost, smaller blast
  radius, no partial side effects. Good when failures are common or
  consequences are high.
 **Governing principles for this repo:**
 1. **Push verification to the boundary; keep the interior strict.** Apply
   posture **B** at the producer→consumer boundary (verify+mitigate structure);
   keep posture **A** for residual exceptions inside the verified core. Never
   relax the interior schema to absorb producer sloppiness.
 2. **Make error locality match the unit of work.** One bad recommendation must
   cost one recommendation, not the whole report. Framing the payload so each
   item is independently parseable is the single highest-leverage change.
 3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
   provenance-tagged artifacts (index, error, raw snippet) so they can be
   debugged or replayed — degraded-but-usable is distinct from total loss.
 4. **Both human and agent input get the same rigor.** Guardrails are
   producer-agnostic: the same size/depth/count caps, reference allow-lists, and
   truncation detection apply whether the producer is an LLM, an agent, or a
   human form submission.
 ## Reproduce & Root-Cause The Failure
 ```task
 id: ACTIVITY-WP-0016-T01
 status: cancel
 priority: high
 state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138"
 ```
 Recover the **full** raw llm-connect response for the 06-26 failure (the State
 Hub event keeps only a 4000-char preview; the break is at char 5268) and
 establish the precise cause.
 Done when:
 - the full raw response is pulled from the runtime llm-connect log / response
  store and the exact offending token at char 5268 is identified;
 - `finish_reason` is captured to confirm or rule out token-limit **truncation**
  vs a structural mid-stream glitch;
 - it is confirmed whether llm-connect actually **enforced** the `json_schema`
  constrained-decoding hint or merely accepted it as advisory (this determines
  whether the schema param is load-bearing);
 - the failing payload is captured as a regression fixture under `tests/`.
 2026-06-26 findings (local analysis on the workstation):
 - **Mechanism confirmed structurally.** There are **16 active workstreams**
  org-wide and the triage instruction emits ~one ranked recommendation per
  candidate. The preserved preview holds 7 fully-formed recommendations; the JSON
  break is at char 5268 (~rank 8–9). The unbounded one-per-workstream list is the
  structural cause — more items = more tokens = higher odds of a mid-stream JSON
  slip and/or truncation. This directly justifies T02's bounded top-N + per-item
  framing.
 - **Both attempts failed.** `executor._execute` retries once
  (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the
  **retry** output, so the model produced invalid JSON twice — not a one-off.
 - **activity-core discards the diagnostics needed to root-cause this.** Three
  retention gaps mean the exact char-5268 token cannot be recovered from
  activity-core data at all:
  1. `LLMConnectClient.complete()` returns only `data["content"]`
     (`llm_client.py:57-60`) — it drops `finish_reason`/`usage` from the
     llm-connect HTTP response, so truncation-vs-structural cannot be
     distinguished locally.
  2. the report sink caps raw output at **4000 chars** (`_invalid_output_report`,
     `executor.py:259`) — below the 5268 break.
  3. the worker log caps the preview at **2000 chars** (`executor.py:175`).
 - **Remaining (remote, operator-owned).** Confirming the exact offending token
  and `finish_reason` requires llm-connect's producer-side logs on `railiance01`
  — cluster access, outside this repo's SCOPE for direct action. Truncation is
  the leading hypothesis given the 16-item input, but the mitigation (T02/T03) is
  identical either way, so T01 does not block the build work.
 - **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture
  `finish_reason`/`usage` and persist a larger bounded raw artifact on validation
  failure so this class of failure is never un-debuggable again.
 - Partial fixture saved:
  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
  (the 4000-char preview + validation error; full payload pending the remote pull).
 2026-06-30 local retention hardening: activity-core now preserves future
 llm-connect diagnostic metadata instead of dropping it at the client boundary.
 `LLMConnectClient.complete()` still returns the content string for compatibility,
 but records safe non-secret response fields such as `finish_reason` and `usage`
 on `last_response_metadata`; the executor copies that into report artifacts,
 State Hub progress detail, and working-memory notes. Invalid report raw previews
 were raised from 4000 to 12000 chars. This does not recover the historical
 06-26 full payload or producer-side `finish_reason`, so T01 remains in `wait`
 pending the remote llm-connect log pull, but the retention gap is closed for
 future failures.
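The retention change described above can be sketched as follows. The class name and the injected `transport` callable are hypothetical stand-ins for the real `LLMConnectClient` and its HTTP call; only `last_response_metadata`, `finish_reason`, `usage`, and `content` come from the text.

```python
from typing import Any, Callable

# Allow-list of non-secret response fields worth keeping for diagnostics.
SAFE_METADATA_FIELDS = ("finish_reason", "usage")


class SketchLLMClient:
    """Returns the content string, but remembers safe response metadata."""

    def __init__(self, transport: Callable[[str], dict[str, Any]]) -> None:
        # `transport` is an assumption: any callable mapping a prompt to the
        # decoded llm-connect JSON response.
        self._transport = transport
        self.last_response_metadata: dict[str, Any] = {}

    def complete(self, prompt: str) -> str:
        data = self._transport(prompt)
        # Copy only allow-listed fields so nothing secret leaks into reports.
        self.last_response_metadata = {
            key: data[key] for key in SAFE_METADATA_FIELDS if key in data
        }
        return data["content"]
```

Because `complete()` still returns a plain string, existing callers keep working, and the executor can read `last_response_metadata` right after the call to tell a `length` cutoff from a structural failure.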
 ## Schema + Prompt Redesign For Error Locality
 ```task
 id: ACTIVITY-WP-0016-T02
 status: done
 priority: high
 state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
 ```
 Redesign the daily-triage report contract so a single malformed item can no
 longer discard the whole report (principle #2).
 Done when:
 - the recommendation list is **bounded** (configurable top-N, default 5–7) in
  both the prompt and the output schema — long lists are where the model drifts;
 - the report uses a **per-item-framed** shape (JSON Lines / NDJSON — one
  recommendation object per line — or an equivalent delimited per-item form)
  behind a minimal stable envelope (`summary` + framed items), so each item is
  an independent parse unit;
 - the prompt explicitly states the contract, the per-item framing, the cap, and
  an "if uncertain, emit fewer well-formed items rather than more" instruction;
 - `max_tokens` is set with headroom for the bounded list so truncation cannot
  occur at the expected size;
 - the output schema file (`_load_output_schema` target) is updated to match.
 2026-06-26 progress (in-repo portion):
 - **Strict, bounded schema written** — `schemas/daily-triage-report.json` went
  from `recommendations.items: {type: object}` (accept-anything) to a strict
  per-item contract: `required [rank, candidate, action, why]` with typed
  `wsjf` sub-fields, plus `maxItems: 7`. The strict item shape is what lets the
  T03 boundary parser validate each recommendation independently.
 - **`maxItems` is a hint, not a hard reject** — the in-repo validator
  (`_validate_schema_node`) only enforces `type`/`required`/`properties`/`items`
  and ignores `maxItems`/`enum`. That is deliberate: a hard `maxItems` reject
  would discard a whole 16-item report — the exact blast-radius bug WP-0016
  removes. The bound is enforced via the prompt + the llm-connect `json_schema`
  constraint hint + T03 mitigation (keep top-N by rank, quarantine extras).
 - **DEPLOY COUPLING (important):** this schema file is consumed *both* as the
  llm-connect hint *and* by the current whole-document validator. Tightening
  per-item `required` fields makes the existing whole-doc validation hard-fail
  **more** until T03 replaces it with per-item quarantine. Therefore the schema
  change MUST ship together with T03 — do not deploy the strict schema to the
  runtime bundle ahead of the T03 parser. Four executor/instruction tests that
  asserted the old loose contract were updated to the strict contract; the
  forwarded-schema test now reads the live file instead of hard-coding it.
 - **Truncation hypothesis corroborated** — the instruction config carries
  `max_tokens` on the order of 1200 (per the wiring test fixture). 5268 chars ≈
  1300–1500 tokens, so a ~1200-token cap would truncate a 16-item list right at
  the observed break. This strengthens T01's leading hypothesis and makes the
  `max_tokens` headroom change below concrete.
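The token estimate in the last bullet can be reproduced with a rule-of-thumb conversion. The 3.5 to 4 characters-per-token ratio is an assumption chosen because it reproduces the quoted range; it is not a measured value for the tokenizer behind llm-connect.

```python
BREAK_CHAR = 5268   # offset of the JSON break reported by the validator
MAX_TOKENS = 1200   # approximate cap from the wiring test fixture


def estimated_tokens(chars: int, chars_per_token: float) -> float:
    """Rough token count for `chars` characters of output."""
    return chars / chars_per_token


low = estimated_tokens(BREAK_CHAR, 4.0)    # sparser tokenization
high = estimated_tokens(BREAK_CHAR, 3.5)   # denser tokenization
exceeds_cap = low > MAX_TOKENS
```

Both ends of the estimate sit above the cap, which is consistent with the output being cut off near the observed break and not failing for an unrelated structural reason.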
 **Bundle handoff (NOT in this repo — runtime-projected definition).** The triage
 prompt and `max_tokens` live in the Railiance runtime bundle, not in repo files.
 Apply there:
 1. Instruct a **bounded top-N** (≤ 7) ranked recommendations, "if uncertain emit
   fewer well-formed items rather than more."
 2. Specify the **per-item framing** the T03 parser will consume (NDJSON: a
   leading summary object, then one recommendation JSON object per line).
 3. Raise **`max_tokens`** to give clear headroom for 7 framed items (eliminate
   truncation at the expected size).
 4. State the value vocabularies (`action`, `confidence`) the T04 guardrails will
   check.
 2026-06-30 live evidence check: the 2026-06-28 and 2026-06-29 scheduled
 `daily_triage` events validated successfully, which shows the runtime is no
 longer failing every day. However, the preserved State Hub reports still contain
 10 recommendations, not the requested bounded top-N of 7 / framed item contract.
 Treat that as evidence that the runtime-projected prompt/schema/max-token bundle
 has not fully absorbed the T02 handoff yet.
 2026-06-30 source projection closeout: patched `k8s/railiance/20-runtime.yaml`
 so the projected `daily-statehub-wsjf-triage.md` prompt now says at most 7
 recommendations and instructs the model to emit fewer well-formed items rather
 than more. The projected `daily-triage-report.json` now has `maxItems: 7` and
 `rank.maximum: 7`, aligned with the repo schema. `max_tokens: 1800` remains as
 headroom for the bounded report. T02 is done in source; live deployment and an
 observed <=7 recommendation run remain under T05.
 ## Boundary Parser — Verify & Mitigate (Posture B)
 ```task
 id: ACTIVITY-WP-0016-T03
 status: done
 priority: high
 state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709"
 ```
 Implement item-granular parsing with a quarantine lane in
 `src/activity_core/rules/executor.py`, applying posture **B** at the boundary
 (principles #1–#3).
 Done when:
 - the parser splits the envelope from the framed items, then parses **each item
  independently**; a malformed item is routed to a bounded `quarantined_items`
  artifact (index + validation error + raw snippet), not raised;
 - a run with some valid and some invalid items emits a report over the surviving
  valid items with `output_validated=true`, plus `partial=true` and
  `quarantined_count` / `quarantined_items` markers — degraded-but-usable is
  reported distinctly from total loss;
 - a best-effort **repair** pass (close unterminated brackets/quotes, recover the
  valid prefix) is attempted per item before quarantining it;
 - truncation detected in T01 is handled as its own signal (recover whole items
  emitted before the cutoff rather than failing the document);
 - the existing monolithic-document path remains as the fallback when framing is
  absent (backward compatible with task-only instructions).
 2026-06-26 progress (implemented in `src/activity_core/rules/executor.py`):
 - **Resilient recovery wired into `_execute`.** When the whole-document parse +
  one retry still fail, report instructions (those with `report_sinks`) now run
  `_resilient_report` *before* the total-loss `_invalid_output_report`. If it
  recovers ≥1 valid item it returns a partial report; otherwise it returns None
  and the prior total-loss path is preserved unchanged.
 - **Brace/quote-aware object scanner, not line-splitting.** The real 06-26 output
  was pretty-printed (multi-line objects), so naive NDJSON line recovery would
  have failed. `_extract_object_spans` walks the `recommendations` array
  brace-depth- and string-aware, so it recovers each recommendation object
  whether pretty-printed across many lines *or* emitted one-per-line (NDJSON).
  The truncated trailing object is returned with `complete=False`.
 - **Layered mitigation per item:** `json.loads` → on failure for a truncated
  tail, a best-effort `_try_repair` (balance open string/brackets/braces) →
  then `_partition_items` validates each recovered object against the T02 item
  schema. Valid items survive; malformed or over-`maxItems` items are
  quarantined with provenance (`index`, `error`, `raw` snippet, `reason`).
 - **Report shape on degradation:** `output_validated=True` over the survivors,
  `review_required=True`, `partial=True`, `quarantined_count`, and a bounded
  `quarantined_items` list (cap 20). Degraded-but-usable is now reported
  distinctly from total loss.
 - **Verified against the real failure shape.** New tests reconstruct a
  pretty-printed report with 7 valid recommendations + a truncated tail (the
  06-26 shape) and a one-bad-item-among-valid case. The 7-item run now recovers
  all 7 and quarantines the broken tail (previously: whole run discarded);
  log line `instruction_output_recovered: kept=7, quarantined=1`. The bad-item
  run keeps 2 and quarantines the rank-less one.
 - **Deferred to T04 (clean scope boundary):** enforcing `maxItems` top-N on the
  *happy* path (valid JSON, all items schema-valid, but > N items) — the resilient
  path only runs on failure, so over-limit-on-success is a guardrail/count-cap
  concern, which is exactly T04's remit.
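The brace- and string-aware scan described above can be sketched in plain Python. This is not the repo's `_extract_object_spans` or `_partition_items`: the function names, the 200-character snippet bound, and the simplified required-field check are assumptions, and the repair pass is omitted for brevity.

```python
import json

REQUIRED = ("rank", "candidate", "action", "why")


def extract_object_spans(text: str, key: str = "recommendations") -> list[tuple[str, bool]]:
    """Return (raw, complete) for each top-level object in the array under `key`."""
    spans: list[tuple[str, bool]] = []
    key_pos = text.find(f'"{key}"')
    start = text.find("[", key_pos) if key_pos >= 0 else -1
    if start < 0:
        return spans
    depth, in_string, escaped, obj_start = 0, False, False, None
    for i in range(start + 1, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string, braces are data; only track escapes and the close quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                obj_start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0 and obj_start is not None:
                spans.append((text[obj_start:i + 1], True))
                obj_start = None
        elif ch == "]" and depth == 0:
            return spans  # array closed cleanly
    if obj_start is not None:
        spans.append((text[obj_start:], False))  # truncated trailing object
    return spans


def partition_items(spans: list[tuple[str, bool]]) -> tuple[list[dict], list[dict]]:
    """Split spans into valid items and quarantined records with provenance."""
    kept, quarantined = [], []
    for index, (raw, complete) in enumerate(spans):
        try:
            item = json.loads(raw)
        except json.JSONDecodeError as exc:
            error = str(exc) if complete else "truncated"
            quarantined.append({"index": index, "reason": "malformed", "error": error, "raw": raw[:200]})
            continue
        missing = [name for name in REQUIRED if name not in item]
        if missing:
            quarantined.append({"index": index, "reason": "schema", "error": f"missing {missing}", "raw": raw[:200]})
        else:
            kept.append(item)
    return kept, quarantined
```

Scanning by brace depth instead of by line is what lets the same code handle pretty-printed objects and one-per-line NDJSON. A partial report can then be emitted whenever `kept` is non-empty.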
 ## Producer Guardrails + ADR-004
 ```task
 id: ACTIVITY-WP-0016-T04
 status: done
 priority: medium
 state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9"
 ```
 Write the architecture decision record and add the producer-agnostic guardrails
 (principle #4).
 Done when:
 - `docs/adr/adr-004-producer-trust-boundary.md` documents the trust boundary,
  the untrusted-producer premise (erroneous **and** malicious; human and agent),
  the A vs B taxonomy and where each applies, the error-locality principle, and
  the quarantine-with-provenance rule;
 - boundary guardrails are enforced at the consumer edge: max item **count**, max
  string length, max nesting **depth**, and a **reference allow-list** (e.g. a
  recommendation `candidate` / a task `target_repo` must resolve to a known
  workstream/repo before it is acted on);
 - guardrail rejections are quarantined with provenance, consistent with T03;
 - SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance
  changes the documented contract.
 2026-06-26 progress:
 - **ADR-004 written** — `docs/adr/adr-004-producer-trust-boundary.md` documents
  the untrusted-producer premise (erroneous + malicious; LLM/agent/human), the
  A-vs-B posture taxonomy, the four governing principles, the concrete
  activity-core mechanisms, a posture-by-layer table, consequences, and
  alternatives considered. Accepted, scope cross-repo.
 - **Producer guardrails implemented** in `executor.py`, applied uniformly on the
  happy path *and* the recovery path via `_partition_items`: per-item order is
  structural-type → schema → structural caps (`_MAX_DEPTH=8`,
  `_MAX_STRING_LEN=4000`) → reference allow-list → count cap (`maxItems`). Each
  quarantine carries a `reason` (`malformed`/`schema`/`guardrail`/`allow_list`/
  `over_limit`).
 - **Happy-path count cap closed** (the item deferred from T03): a syntactically
  valid 9-item report now keeps 7 and quarantines 2 as `over_limit`, emitting a
  `partial` report — without a retry.
 - **Reference allow-list wired but inert.** `_allow_list_from_context` reads
  `context["known_candidates"]`; when present, recommendations with an unknown
  `candidate` are quarantined (`reason: allow_list`). Absent today → check is
  inert; activation is a one-line context-resolver change. Keeps the guardrail
  producer-agnostic (principle #4) and ready.
 - **SCOPE.md updated** — instruction-executor bullet now names the quarantine
  lane + guardrails; ADR-004 added to the Architecture Decisions list. No INTENT
  drift: this hardens the existing output contract, it does not extend scope.
 - New tests: happy-path count cap, oversized-string guardrail, allow-list
  rejection (all green).
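The guardrail order after schema validation (structural caps, then allow-list, then count cap) can be sketched as below. The two limits mirror the `_MAX_DEPTH` and `_MAX_STRING_LEN` values quoted above; the function names are hypothetical, items are assumed to be dicts that already passed the type and schema checks, and the count cap assumes the list is already in rank order.

```python
from typing import Any, Optional

MAX_DEPTH = 8
MAX_STRING_LEN = 4000


def nesting_depth(value: Any) -> int:
    """Scalars have depth 0; each dict or list level adds 1."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return 0
    return 1 + max((nesting_depth(child) for child in children), default=0)


def longest_string(value: Any) -> int:
    """Length of the longest string value anywhere inside `value`."""
    if isinstance(value, str):
        return len(value)
    if isinstance(value, dict):
        return max((longest_string(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return max((longest_string(v) for v in value), default=0)
    return 0


def apply_guardrails(
    items: list[dict],
    max_items: int = 7,
    known_candidates: Optional[set[str]] = None,
) -> tuple[list[dict], list[dict]]:
    """Keep items that pass every guardrail; quarantine the rest with a reason."""
    kept, quarantined = [], []
    for index, item in enumerate(items):
        reason = None
        if nesting_depth(item) > MAX_DEPTH or longest_string(item) > MAX_STRING_LEN:
            reason = "guardrail"
        elif known_candidates is not None and item.get("candidate") not in known_candidates:
            # The allow-list is inert when no known candidates are supplied.
            reason = "allow_list"
        elif len(kept) >= max_items:
            reason = "over_limit"
        if reason is None:
            kept.append(item)
        else:
            quarantined.append({"index": index, "reason": reason})
    return kept, quarantined
```

Applying the count cap last means a slot is never spent on an item that would have been rejected anyway, so a 9-item report with one oversized entry still yields 7 usable recommendations.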
 ## Tests + Calibration Re-Entry
 ```task
 id: ACTIVITY-WP-0016-T05
 status: done
 priority: high
 state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f"
 ```
 Prove the new posture and hand back to the calibration gates.
 Done when:
 - regression tests cover: the captured 06-26 payload, a truncated-mid-list
  payload, a one-bad-item-among-good payload (asserts quarantine + partial), an
  oversized/over-deep payload (asserts guardrail rejection), and an
  injection-shaped reference (asserts allow-list rejection);
 - the full suite passes and the result is recorded here with the count;
 - a daily-triage smoke against the live runtime shows a previously-failing
  payload now **degrades gracefully** (valid items delivered, bad items
  quarantined) instead of discarding the run;
 - a progress note hands back to `ACTIVITY-WP-0010-T04` and `ACTIVITY-WP-0006-T03`
  that the output-robustness blocker is cleared so the three-clean-run gate can
  resume on its own.
 2026-06-26 progress (in-repo portion complete):
 - **Regression coverage complete.** Across T03/T04/T05: truncated-mid-list,
  one-bad-item-among-good (quarantine + partial), oversized-string and over-depth
  guardrail rejection, allow-list (injection-shaped) rejection, happy-path count
  cap, and a test driving the **actual captured 2026-06-26 payload**
  (`tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`)
  — it now recovers 6+ valid recommendations and quarantines the truncated tail,
  where before it discarded the whole run.
 - **Full suite green:** 218 passed, 1 skipped (recorded at T04; the T05 fixture +
  over-depth tests add to this — see the commit).
 - **Hand-back notes posted** to `ACTIVITY-WP-0006-T03` (State Hub event
  `b6b8c2b8`) and `ACTIVITY-WP-0010-T04` (`b813f0dc`).
 - **Remaining (remote, operator-owned):** the live daily-triage smoke on
  `railiance01` proving end-to-end graceful degradation. It depends on deploying
  the T02 bundle prompt/`max_tokens`/NDJSON changes together with this code, which
  is cluster/operator work outside this repo's SCOPE. T05 therefore stays in
  `progress` until that live run exists; the in-repo deliverables are done.
 2026-06-30 follow-up: added forward-looking diagnostics so future validation
 failures carry llm-connect response metadata and a larger bounded raw-output
 preview in activity-core-owned evidence. Focused verification passed:
 `uv run pytest tests/test_llm_client.py tests/rules/test_executor.py tests/test_report_sinks.py -q`
 => 39 passed. This improves future root-cause ability but does not replace the
 required live smoke proving graceful degradation on railiance01.
 2026-06-30 projection follow-up: local source projection now enforces the top-7
 prompt/schema contract. Remaining T05 proof is operational: deploy or sync the
 updated `k8s/railiance/20-runtime.yaml`, run `actcore-sync`/schedule smoke or wait
 for the next 07:20 Berlin fire, then confirm State Hub `daily_triage` evidence is
 `output_validated=true` with no more than 7 recommendations.
 ## Relationships
 - **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and
  `ACTIVITY-WP-0010-T04` (collect three clean scheduled runs) — both stalled on
  the same output-quality failure this workplan removes.
 - **References:** `ACTIVITY-WP-0009` (scheduled-run trust gap).
 - **Boundary discipline:** keeps activity-core inside its SCOPE — this hardens
  the instruction-executor output contract; it does not move provider
  credentials, cluster reconciliation, or task lifecycle into this repo.
 ## Closure 2026-07-02 (RAIL-BS-WP-0008 live deploy)
 - T05 done: the robustness bundle (strict per-item schema + T03 quarantine
  parser + bounded top-7/NDJSON runtime prompt, activity-core `7612112`) was
  deployed to railiance01 and live-proven. A manually triggered daily-triage
  run produced a clean schema-valid report with exactly 7 ranked
  recommendations: State Hub event `24d2d321-c761-47f7-bf9e-7950a6253c21`,
  `output_validated=true`, working memory written. Calibration re-entry: the
  three-clean-run streak (WP-0006-T03 / WP-0010-T04) restarts from this run.
 - T01 cancelled: the raw 2026-06-26 llm-connect response is unrecoverable
  (stateless pod, no response store, log stream holds only 2 startup lines
  since 2026-06-19). Root cause stands on the retained 4000-char preview and
  break-at-char-5268 evidence: output exceeded the old ~1200-token budget and
  truncated mid-JSON. The deployed mitigation (1800-token headroom, bounded
  top-7, per-item recovery) addresses exactly that failure mode.
--- a/workplans/ACTIVITY-WP-0017-core-hub-ops-evidence-sink.md
+++ b/workplans/ACTIVITY-WP-0017-core-hub-ops-evidence-sink.md
@@ -0,0 +1,58 @@
 ---
 id: ACTIVITY-WP-0017
 type: workplan
 title: "Core Hub ops evidence sink"
 domain: infotech
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-27"
 updated: "2026-06-27"
 state_hub_workstream_id: "2a073bf4-febf-433e-a721-5daf71760912"
 ---
 # Core Hub ops evidence sink
 ## Goal
 Provide the activity-core side of the Core Hub replacement evidence path for
 `CORE-WP-0008-T03`, without depending on the legacy Haskell Inter-Hub sink and
 without placing secret material in activity definitions, logs, State Hub, or
 chat.
 ## Task: Add Core Hub interaction-event sink
 ```task
 id: ACTIVITY-WP-0017-T01
 status: done
 priority: high
 state_hub_task_id: "32aab1af-6be5-4b52-afa1-c11f52c65892"
 ```
 Add a `core-hub-interaction-event` ops evidence sink that posts sanitized
 ops-inventory probe evidence to Core Hub `/api/v2/interaction-events`, verifies
 the created event is visible, and reports only non-secret ids/statuses.
 Acceptance:
 - runtime token is read through `CORE_HUB_RUNTIME_TOKEN_FILE` or a named
 environment variable, never from workplan content;
 - sink configuration accepts `CORE_HUB_BASE_URL` and a widget id or widget
 mapping;
 - emitted metadata reuses the existing compact/sanitized probe evidence path;
 - missing Core Hub config skips cleanly with explicit non-secret missing keys;
 - tests prove the POST/visibility check and secret non-disclosure.
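The configuration rules above (token from a file or a named environment variable, clean skip with non-secret missing keys) could be sketched roughly as follows. This is an illustration, not the code in `activity_core.ops_evidence_sinks`: `CORE_HUB_WIDGET_ID`, `CORE_HUB_RUNTIME_TOKEN`, `CoreHubConfig`, and `load_core_hub_config` are hypothetical names; only `CORE_HUB_BASE_URL` and `CORE_HUB_RUNTIME_TOKEN_FILE` come from this workplan.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class CoreHubConfig:
    base_url: str
    widget_id: str
    token: str  # held in memory only; never logged or copied into evidence


def load_core_hub_config(
    env: Mapping[str, str] = os.environ,
) -> tuple[Optional[CoreHubConfig], list[str]]:
    """Return (config, missing_keys); config is None when anything is missing."""
    missing: list[str] = []

    base_url = env.get("CORE_HUB_BASE_URL", "").strip()
    if not base_url:
        missing.append("CORE_HUB_BASE_URL")

    # Hypothetical variable name for the widget id.
    widget_id = env.get("CORE_HUB_WIDGET_ID", "").strip()
    if not widget_id:
        missing.append("CORE_HUB_WIDGET_ID")

    # Prefer the token file; fall back to a (hypothetical) named variable.
    token = ""
    token_file = env.get("CORE_HUB_RUNTIME_TOKEN_FILE", "").strip()
    if token_file and Path(token_file).is_file():
        token = Path(token_file).read_text(encoding="utf-8").strip()
    if not token:
        token = env.get("CORE_HUB_RUNTIME_TOKEN", "").strip()
    if not token:
        # Report only the key name, never a value.
        missing.append("CORE_HUB_RUNTIME_TOKEN_FILE")

    if missing:
        return None, missing
    return CoreHubConfig(base_url, widget_id, token), []
```

A caller would skip the sink and log only the returned key names when the config is `None`, so the skip message never carries a secret.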
 Verification 2026-06-27: `tests/test_ops_evidence_sinks.py` passed, and
 a disposable local Core Hub runtime accepted an activity-core
 `core-hub-interaction-event` sink emission, then listed the created
 `ops-endpoint-verified` event back through `/api/v2/interaction-events`.
 The verification asserted sanitized metadata did not include response body,
 authorization header, URL userinfo, or token query material.
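The non-disclosure property asserted above can be illustrated with a small sanitizer. This is a sketch under assumptions, not the existing compact/sanitized probe evidence path: the function names and both key sets are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed deny-lists; the real sink may use different keys.
SENSITIVE_QUERY_KEYS = {"token", "access_token", "api_key", "key", "secret"}
SENSITIVE_METADATA_KEYS = {"authorization", "response_body"}


def sanitize_url(url: str) -> str:
    """Drop userinfo, sensitive query parameters, and the fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    query = urlencode(
        [
            (k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SENSITIVE_QUERY_KEYS
        ]
    )
    return urlunsplit((parts.scheme, host, parts.path, query, ""))


def sanitize_metadata(metadata: dict) -> dict:
    """Remove sensitive keys and sanitize any string value stored under a *url key."""
    clean = {}
    for key, value in metadata.items():
        if key.lower() in SENSITIVE_METADATA_KEYS:
            continue
        if key.lower().endswith("url") and isinstance(value, str):
            value = sanitize_url(value)
        clean[key] = value
    return clean
```

Rebuilding the network location from `hostname` and `port` rather than editing the original string means userinfo cannot leak through an unusual URL shape.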
 Completed 2026-06-27: implemented the Core Hub interaction-event sink in
 `activity_core.ops_evidence_sinks` with unit coverage for POST/visibility
 verification, missing config behavior, and secret non-disclosure. This provides
 the direct Core Hub consumer path needed by `CORE-WP-0008-T03`; deployed use
 still requires an approved Core Hub runtime token and widget id/mapping.
--- a/workplans/ACTIVITY-WP-0018-own-infra-automation-status.md
+++ b/workplans/ACTIVITY-WP-0018-own-infra-automation-status.md
@@ -0,0 +1,248 @@
 ---
 id: ACTIVITY-WP-0018
 type: workplan
 title: "Own-infrastructure automation status surface"
 domain: infotech
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: automation-observability
 created: "2026-06-29"
 updated: "2026-06-29"
 state_hub_workstream_id: "0220b38b-7c73-4601-9601-5f2c1a5b29e8"
 ---
 # Own-infrastructure automation status surface
 ## Goal
 Make activity-core's own scheduling and evidence infrastructure the explicit
 operating preference for durable automations, independent of any scheduler or
 reminder system provided by a coding assistant.
 An operator should be able to answer a question like "How did our automations go
 since Friday?" with a repo-native command that does not require an LLM. Coding
 assistants may inspect or summarize that command's output, but they must not be
 the source of truth for scheduled execution, run history, or operational
 evidence.
 ## Review notes
 The repo already owns the correct infrastructure direction:
 - `SCOPE.md` defines activity-core as the org-wide event bridge for cron,
  one-off scheduled datetime, and event-triggered automation.
 - `Makefile` exposes sync and service targets, but no operator status target for
  recent automation outcomes.
 - `docs/runbook.md` documents daily-triage verification through
  `scripts/verify_daily_triage.py`, but that helper is activity-specific and
  still reads like a checklist rather than the baseline answer surface for all
  automations.
 - Existing workplan evidence shows the status question is operationally common:
  2026-06-24 and 2026-06-25 daily triage runs were clean, while 2026-06-26 and
  2026-06-27 fired on schedule but failed output validation. That distinction is
  exactly what the baseline command must make obvious.
 ## Task: Codify the own-infra scheduling preference
 ```task
 id: ACTIVITY-WP-0018-T01
 status: done
 priority: high
 state_hub_task_id: "00127678-5ce4-4cb3-b81c-f42e04407c73"
 ```
 Record the repository preference that durable automation scheduling, execution
 history, and run evidence belong to activity-core's own infrastructure: Temporal
 Schedules, NATS JetStream, activity-core run records, State Hub progress, and
 working-memory/report sinks.
 Acceptance:
 - `AGENTS.md` repo-specific instructions say not to use automation tooling
  provided by coding assistants as the execution or evidence source for
  activity-core automations.
 - `SCOPE.md` and `docs/runbook.md` describe coding assistants as callers or
  summarizers of repo-native automation commands, not as schedulers.
 - The preference distinguishes durable automation from harmless local session
  reminders: production/operational recurrence belongs to activity-core.
 - The text names the authoritative evidence sources and avoids tying the policy
  to any one assistant product.
 2026-06-29 progress: Added the immediate repo-agent instruction in `AGENTS.md`
 that durable activity-core automations must use repo-owned infrastructure, not
 coding assistant automation/reminder/heartbeat tooling, as the execution or
 evidence source. Remaining T01 work is to carry the same preference into
 `SCOPE.md` and `docs/runbook.md`.
 ## Task: Define the automation status evidence contract
 ```task
 id: ACTIVITY-WP-0018-T02
 status: done
 priority: high
 state_hub_task_id: "17e6bb87-d4bf-4ef3-b91c-4bdfe2fe3492"
 ```
 Define a small, deterministic report contract for answering recent automation
 status questions across all ActivityDefinitions.
 Acceptance:
 - The contract covers schedule state, expected fires in the requested window,
  observed workflow runs, `activity_runs` rows, State Hub progress events,
  working-memory/report sink evidence, and known validation or sink failures.
 - It defines normalized statuses such as `completed`, `running`, `retrying`,
  `validation_failed`, `sink_failed`, `missed`, `disabled`, and `unknown`.
 - Partial data is explicit: if Temporal, Postgres, State Hub, or a sink path is
  unavailable, the report includes warnings rather than silently passing or
  failing the whole check.
 - The contract is safe for operator logs: no secrets, prompts, raw model output,
  or credential-bearing URLs.
 - The contract can be emitted as JSON for scripts and rendered as concise text
  for humans.
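One possible shape for such a contract is sketched below. The status values are the ones listed above; the class names, field names, and text layout are assumptions for illustration and are not taken from `activity_core.automation_status`.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class RunStatus(str, Enum):
    COMPLETED = "completed"
    RUNNING = "running"
    RETRYING = "retrying"
    VALIDATION_FAILED = "validation_failed"
    SINK_FAILED = "sink_failed"
    MISSED = "missed"
    DISABLED = "disabled"
    UNKNOWN = "unknown"


@dataclass
class ActivityStatus:
    activity_id: str
    name: str
    status: RunStatus
    expected_fires: int = 0
    observed_runs: int = 0
    # Short non-secret notes only: no prompts, raw model output, or URLs with credentials.
    evidence: list[str] = field(default_factory=list)


@dataclass
class StatusReport:
    since: str
    until: str
    activities: list[ActivityStatus] = field(default_factory=list)
    # Partial-source problems go here instead of failing the whole report.
    warnings: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the output stable for scripts and tests.
        return json.dumps(asdict(self), sort_keys=True)

    def to_text(self) -> str:
        lines = [f"Automation status {self.since} .. {self.until}"]
        for a in self.activities:
            lines.append(
                f"- {a.name}: {a.status.value} "
                f"({a.observed_runs}/{a.expected_fires} expected fires observed)"
            )
        lines.extend(f"! warning: {w}" for w in self.warnings)
        return "\n".join(lines)
```

Because `RunStatus` mixes in `str`, the JSON form carries the plain status string, so scripts never need to know about the enum.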
 ## Task: Implement the non-LLM automation status CLI
 ```task
 id: ACTIVITY-WP-0018-T03
 status: done
 priority: high
 state_hub_task_id: "7831f2fc-8b76-48fe-aa34-9dcc11ee84db"
 ```
 Add a deterministic CLI, likely under `scripts/automation_status.py` or an
 `activity_core` module, that answers recent automation status questions without
 calling an LLM.
 Acceptance:
 - Supports `--since`, `--until`, activity name/id filters, JSON output, and a
  concise human summary.
 - Accepts simple operator dates, including absolute dates and a documented
  `friday`/`last-friday` style shortcut, resolving them to concrete dates in the
  configured timezone.
 - Inspects all enabled scheduled ActivityDefinitions by default, not just daily
  triage.
 - Uses live sources when configured: Postgres `activity_definitions` /
  `activity_runs`, Temporal schedule and workflow visibility, State Hub
  progress, and configured local report sink paths.
 - Degrades usefully when a source is unavailable and exits non-zero only for
  real status failures or invalid input, not for optional evidence gaps that are
  clearly reported.
 - Includes focused unit tests with fixture data for clean runs, validation
  failures, missed runs, disabled schedules, and partial-source availability.
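The date-shortcut requirement could be handled along these lines. This sketch assumes that `friday` means the most recent Friday on or before today and that `last-friday` means the most recent Friday strictly before today; the shipped CLI may define the shortcuts differently. `resolve_since` and `today_in` are hypothetical helper names.

```python
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo

FRIDAY = 4  # date.weekday(): Monday=0 ... Sunday=6


def today_in(tz_name: str) -> date:
    """Current calendar date in the configured timezone, e.g. 'Europe/Berlin'."""
    return datetime.now(ZoneInfo(tz_name)).date()


def resolve_since(value: str, today: date) -> date:
    """Resolve an operator date argument to a concrete date.

    Accepts ISO dates (YYYY-MM-DD) plus the assumed shortcuts 'friday' and
    'last-friday'. Raises ValueError for anything else, which a CLI can map
    to an invalid-input exit code.
    """
    text = value.strip().lower()
    if text in {"friday", "last-friday"}:
        days_back = (today.weekday() - FRIDAY) % 7
        if text == "last-friday" and days_back == 0:
            days_back = 7  # today is Friday; go back a full week
        return today - timedelta(days=days_back)
    return date.fromisoformat(text)
```

Passing `today` in explicitly keeps the resolver deterministic under test; the CLI would call `resolve_since(args.since, today_in(tz_name))`.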
 ## Task: Add the Make target baseline
 ```task
 id: ACTIVITY-WP-0018-T04
 status: done
 priority: high
 state_hub_task_id: "451bdf62-b619-4ace-9262-46d20b912781"
 ```
 Expose the CLI through a Make target that is easy for an operator or any coding
 assistant to run before attempting a prose summary.
 Acceptance:
 - `make automation-status SINCE=2026-06-26` prints the human-readable baseline.
 - `make automation-status SINCE=friday` is supported or documented with the
  exact accepted shortcut.
 - A JSON form is available, either through `FORMAT=json` or a separate target
  such as `make automation-status-json`.
 - The target does not require LLM credentials, coding assistant automation
  tooling, or interactive prompts.
 - `make help` lists the target with a clear one-line description.
 ## Task: Update operator docs and examples
 ```task
 id: ACTIVITY-WP-0018-T05
 status: done
 priority: medium
 state_hub_task_id: "233659aa-e14a-4b3d-b156-d04f0fa16db6"
 ```
 Update the runbook so "How did automations go since Friday?" has an obvious
 operator recipe.
 Acceptance:
 - `docs/runbook.md` has a short "Automation status" section near the scheduling
  operations.
 - The docs include example output or a compact sample for the known daily
  triage distinction: fired on time versus completed successfully versus output
  validation failure.
 - The docs clarify that LLM summaries are optional convenience only; the Make
  target output is the baseline evidence.
 - The daily-triage-specific helper is either kept as a lower-level diagnostic or
  folded into the generalized status command.
 ## Task: Verify against recent scheduled-run evidence
 ```task
 id: ACTIVITY-WP-0018-T06
 status: done
 priority: medium
 state_hub_task_id: "24efbe9f-dfff-482f-9edc-456379c9a2aa"
 ```
 Prove the new surface against the recent evidence that motivated this workplan.
 Acceptance:
 - Running the status command over the window starting Friday, 2026-06-26 shows
  that the daily triage schedule fired on 2026-06-26 and 2026-06-27 but did not
  produce clean validated reports.
 - The command distinguishes scheduling health from output/schema validation
  failure.
 - Disabled or waiting schedules, such as the weekly coding retro gate when its
  upstream read model is not available, are reported without being counted as
  missed runs.
 - Verification results are recorded in this workplan and as a State Hub progress
  note once the implementation lands.
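The distinction these criteria ask for (a schedule that fired versus a run that produced a valid report, and disabled versus missed) can be expressed as a small classifier. The precedence order and the failing-status set below are assumptions made for this sketch; `running` and `retrying` are left out for brevity.

```python
FAILING_STATUSES = {"validation_failed", "sink_failed", "missed"}  # assumed set


def classify(
    enabled: bool,
    expected_fires: int,
    observed_runs: int,
    validation_errors: list[str],
    sink_errors: list[str],
) -> str:
    """Map evidence for one activity onto a normalized status string."""
    if not enabled:
        # A disabled or gated schedule is never counted as missed.
        return "disabled"
    if validation_errors:
        # The schedule fired, but the output did not validate.
        return "validation_failed"
    if sink_errors:
        return "sink_failed"
    if observed_runs < expected_fires:
        return "missed"
    if expected_fires == 0 and observed_runs == 0:
        return "unknown"
    return "completed"


def exit_code(statuses: list[str]) -> int:
    """Non-zero only for real status failures, not for disabled or unknown rows."""
    return 1 if any(s in FAILING_STATUSES for s in statuses) else 0
```

Under this ordering, a daily triage window with two fires and two validation errors is reported as `validation_failed` rather than `missed`, which keeps scheduling health separate from output quality.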
 ## Implementation Result
 Completed 2026-06-29: implemented the own-infrastructure automation status
 surface and codified the scheduling preference.
 Delivered:
 - `AGENTS.md` now states that durable activity-core automations use repo-owned
  infrastructure, not coding assistant automation/reminder/heartbeat tooling, as
  execution or evidence authority.
 - `SCOPE.md` and `docs/runbook.md` describe the deterministic status surface and
  assistant boundary.
 - `src/activity_core/automation_status.py` and `scripts/automation_status.py`
  provide the non-LLM CLI.
 - `make automation-status SINCE=...` and `make automation-status-json` expose the
  baseline operator commands.
 - `tests/test_automation_status.py` covers date shortcuts, cron fire estimation,
  completed runs, validation failures, missed runs, disabled schedules, partial
  source availability, and working-memory evidence parsing.
 Verification:
 ```bash
 python3 -m py_compile src/activity_core/automation_status.py scripts/automation_status.py tests/test_automation_status.py
 /home/worsch/.local/bin/uv run pytest tests/test_automation_status.py tests/test_daily_triage_verifier.py -q
 /home/worsch/.local/bin/uv run python scripts/automation_status.py \
  --since 2026-06-26 --until 2026-06-27 --db-url '' \
  --progress-event-type daily_triage --timeout-seconds 10 \
  --working-memory-dir /tmp --format json
 ```
 Results:
 - focused tests: `11 passed`;
 - `make help` lists `automation-status` and `automation-status-json`;
 - the 2026-06-26 through 2026-06-27 status run exited `1` as expected because
  State Hub evidence classified daily triage activity
  `6fca51fa-387a-4fd0-bc4e-d62c29eb859a` as `validation_failed` with two
  non-secret evidence records: 2026-06-26 `Expecting ',' delimiter` and
  2026-06-27 `Unterminated string`;
 - the same report classified the gated weekly coding retro as `disabled`, not
  `missed`.
--- a/workplans/ACTIVITY-WP-0019-automation-schedule-inventory-targets.md
+++ b/workplans/ACTIVITY-WP-0019-automation-schedule-inventory-targets.md
@@ -0,0 +1,204 @@
 ---
 id: ACTIVITY-WP-0019
 type: workplan
 title: "Automation schedule inventory Make targets"
 domain: infotech
 repo: activity-core
 status: finished
 owner: codex
 topic_slug: automation-inventory
 created: "2026-06-29"
 updated: "2026-07-01"
 state_hub_workstream_id: "21c73763-9adc-42f6-8fd2-1b8b33c2c770"
 ---
 # Automation schedule inventory Make targets
 ## Goal
 Provide a repo-native, non-LLM way to list every scheduled automation that
 activity-core knows about.
 `ACTIVITY-WP-0018` added the status surface for questions like "How did our
 automations go since Friday?". The next operator question is the inventory
 baseline: "What automations are scheduled at all?" That should be answerable
 through Make targets backed by activity-core's own ActivityDefinitions,
 database, and Temporal schedule metadata when available, independent of any
 coding assistant automation infrastructure.
 ## Review notes
 - `Makefile` currently exposes `automation-status` and
  `automation-status-json`, but no dedicated inventory/list target.
 - `scripts/automation_status.py` and `src/activity_core/automation_status.py`
  already load scheduled ActivityDefinitions and compute their Temporal schedule
  ids. The inventory target should reuse that parsing/loading posture where it
  fits rather than creating a second discovery path.
 - `make sync-schedules` reconciles Temporal schedules from the
  `activity_definitions` database, but it is an action target, not a read-only
  operator inventory command.
 - The inventory command should remain useful in degraded local mode: file-backed
  definitions are enough to list configured scheduled automations, while live
  DB and Temporal visibility can enrich the output.
 ## Task: Define the automation inventory contract
 ```task
 id: ACTIVITY-WP-0019-T01
 status: done
 priority: high
 state_hub_task_id: "8de24590-f9ee-4d0e-8692-b7ada9f232ed"
 ```
 Define the fields and source precedence for a deterministic scheduled
 automation inventory report.
 Acceptance:
 - The report includes every ActivityDefinition with `trigger_type` of `cron` or
  `scheduled`, including disabled definitions.
 - Each row includes id, name, enabled/disabled state, trigger type, schedule
  expression or one-shot datetime, timezone, overlap/catchup policy when known,
  and the derived Temporal schedule id.
 - The report identifies its source for each row: database, repo definition file,
  Temporal visibility, or a combination.
 - If Temporal is reachable, the report adds paused/missing/drift hints without
  mutating schedules.
 - Missing optional sources produce warnings, not silent omissions.
 - The JSON shape is stable enough for scripts and tests.
 ## Task: Implement a non-mutating inventory CLI
 ```task
 id: ACTIVITY-WP-0019-T02
 status: done
 priority: high
 state_hub_task_id: "538cb9a5-48f3-470c-8518-29ee66c96678"
 ```
 Add a deterministic CLI path for listing scheduled automations without requiring
 LLM credentials or coding assistant tooling.
 Acceptance:
 - A script or module command, likely sharing code with
  `activity_core.automation_status`, supports human and JSON output.
 - The command is read-only: it does not call `sync-schedules`, upsert schedules,
  delete schedules, enqueue workflows, or write State Hub evidence.
 - It supports filters by activity id, activity name, enabled state, and trigger
  type.
 - It loads from the database when configured and falls back to repo definition
  files when the database is unavailable or explicitly disabled.
 - It optionally enriches rows from Temporal when `TEMPORAL_HOST` is configured,
  with bounded timeouts so an unreachable service does not hang the command.
 - Unit tests cover DB rows, file fallback, disabled definitions, unavailable
  Temporal enrichment, and JSON output.
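The source-precedence and filter behavior described in these criteria might look like the sketch below. It is read-only by construction: the loaders are injected callables that only return rows. `InventoryRow`, `load_inventory`, and `filter_rows` are hypothetical names, and the row carries only a subset of the contract fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class InventoryRow:
    activity_id: str
    name: str
    enabled: bool
    trigger_type: str  # "cron" or "scheduled"
    schedule: str      # cron expression or one-shot datetime
    source: str        # "database" or "file"


Loader = Callable[[], list[InventoryRow]]


def load_inventory(
    db_loader: Optional[Loader], file_loader: Loader
) -> tuple[list[InventoryRow], list[str]]:
    """Prefer the database; fall back to repo definition files with a warning."""
    warnings: list[str] = []
    if db_loader is None:
        warnings.append("database disabled; using repo definition files")
    else:
        try:
            return db_loader(), warnings
        except Exception as exc:  # any DB failure degrades to the file path
            warnings.append(
                f"database unavailable ({type(exc).__name__}); "
                "using repo definition files"
            )
    rows = file_loader()
    if not rows:
        warnings.append("no scheduled definitions discoverable")
    return rows, warnings


def filter_rows(
    rows: list[InventoryRow],
    enabled: Optional[bool] = None,
    trigger_type: Optional[str] = None,
) -> list[InventoryRow]:
    """Apply the optional enabled-state and trigger-type filters."""
    return [
        r
        for r in rows
        if (enabled is None or r.enabled == enabled)
        and (trigger_type is None or r.trigger_type == trigger_type)
    ]
```

Only the exception type name is placed in the warning, so a connection string embedded in a driver error message does not reach operator logs.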
 ## Task: Add Make targets
 ```task
 id: ACTIVITY-WP-0019-T03
 status: done
 priority: high
 state_hub_task_id: "f2001721-07f3-42f5-a15e-0c7d1b0ed801"
 ```
 Expose the inventory command through Make targets that are easy for humans,
 scripts, and coding assistants to run before asking for a prose summary.
 Acceptance:
 - `make automation-list` prints a concise human-readable inventory.
 - `make automation-list-json` emits the same inventory as JSON.
 - Optional Make variables pass through cleanly, for example `ENABLED=true`,
  `TRIGGER=cron`, `ACTIVITY_ID=<uuid>`, or `FORMAT=json`.
 - `make help` lists both targets with clear one-line descriptions.
 - The targets do not require LLM access, coding assistant automation tooling,
  or interactive prompts.
 ## Task: Document the inventory workflow
 ```task
 id: ACTIVITY-WP-0019-T04
 status: done
 priority: medium
 state_hub_task_id: "f687743b-3936-413e-ae50-d35484ae9a81"
 ```
 Update operator documentation so the scheduled automation inventory path is
 discoverable next to the status path.
 Acceptance:
 - `docs/runbook.md` documents `make automation-list` and
  `make automation-list-json`.
 - The docs distinguish inventory from status: inventory answers what is
  configured; status answers what happened in a time window.
 - The docs state that the command is read-only and uses activity-core-owned
  scheduling evidence.
 - The docs include a compact example of the expected human output.
 ## Task: Verify against current repo and live/degraded sources
 ```task
 id: ACTIVITY-WP-0019-T05
 status: done
 priority: medium
 state_hub_task_id: "5317b532-5cef-4eff-b6d8-3e85bbca8e8a"
 ```
 Prove the target against the current scheduled automation definitions and
 degraded local conditions.
 Acceptance:
 - `make automation-list` shows the current scheduled automations, including
  daily triage and weekly scheduled definitions when present in the selected
  source.
 - JSON output is valid and includes the same rows.
 - A DB-unavailable run falls back to repo definition files or reports a clear
  warning if no definitions are discoverable.
 - A Temporal-unavailable run exits successfully with Temporal warnings rather
  than hanging.
 - Focused tests pass and the result is recorded in this workplan before the
  workplan is moved to `finished`.
 ## Implementation Result
 Completed 2026-07-01: implemented the read-only scheduled automation inventory
 surface.
 Delivered:
 - `scripts/automation_inventory.py` exposes the inventory CLI backed by
  `activity_core.automation_status` shared definition and Temporal helpers.
 - `make automation-list` and `make automation-list-json` list configured
  scheduled ActivityDefinitions with filters for `ENABLED`, `TRIGGER`,
  `ACTIVITY_ID`, and `ACTIVITY_NAME`.
 - JSON output is script-safe; the Make JSON target suppresses command echo and
  recursive make directory chatter.
 - `docs/runbook.md` now distinguishes inventory (what is configured) from status
  (what happened in a time window).
 - Tests cover DB-backed rows, file fallback, disabled filtering,
  Temporal-unavailable warnings, and JSON CLI output.
 Verification:
 ```bash
 /home/worsch/.local/bin/uv run pytest tests/test_automation_status.py tests/test_daily_triage_verifier.py -q
 bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make automation-list ACTCORE_DB_URL= TEMPORAL_HOST='
 bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make automation-list-json ACTCORE_DB_URL= TEMPORAL_HOST= > /tmp/activity-core-inventory.json && python3 -m json.tool /tmp/activity-core-inventory.json >/tmp/activity-core-inventory.pretty'
 bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make automation-list ACTCORE_DB_URL= TEMPORAL_HOST= ENABLED=true TRIGGER=cron'
 bash -lc 'export PATH="/home/worsch/.local/bin:$PATH"; make help'
 ```
 Results:
 - focused tests: `16 passed`;
 - degraded Make inventory run listed 9 file-backed scheduled automations, with
  5 enabled and 4 disabled;
 - filtered Make run with `ENABLED=true TRIGGER=cron` listed 5 enabled cron
  automations;
 - `automation-list-json` emitted parseable JSON directly;
 - `make help` lists `automation-list` and `automation-list-json`.
--- a/workplans/archived/260603-WP-0002-next-steps.md
+++ b/workplans/archived/260603-WP-0002-next-steps.md
@@ -3,6 +3,7 @@ type: session-note
 created: "2026-03-28"
 updated: "2026-06-03"
 status: archived
 state_hub_workstream_id: "b221e65a-6f97-44b0-8dae-442fffcb7f64"
 ---
 # WP-0002 Handoff Note — Continue on CoulombCore