Add activity_core/schedule_health: a pure evaluate_schedule_health() verdict
(built on Temporal's num_actions_missed_catchup_window plus a staleness check),
an async check_schedule_health() reader, and post_missed_fire_alert() that emits
a schedule_miss State Hub progress event. Makes a missed fire visible even under
misfire_policy=skip, where Temporal drops it by design. Unit tests for the
verdict logic.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Set Temporal catchup_window on cron schedules so a fire missed during a
worker/Temporal outage is no longer silently dropped. Redefine misfire_policy
into three explicit modes — skip, catchup_all, catchup_latest — mapping to
(catchup_window, overlap) pairs; legacy catchup/compress aliased. Add
catchup_window_seconds override. Remove the ad-hoc upsert-time 1h backfill in
favour of native catchup. Apply catchup_latest to daily-statehub-wsjf-triage in
the Railiance runtime manifest and document run-miss policies in the runbook.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cron fires are silently dropped: _build_schedule() sets SchedulePolicy(overlap=)
but never catchup_window, so a brief worker/Temporal outage at trigger time drops
the fire with no recovery and no signal (root cause of missing 06-22/06-23 daily
triage runs). Define three explicit run-miss policies: skip, catchup_all,
catchup_latest, plus missed-fire detection.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The actcore-state-hub-bridge readiness probe hit /state/summary through
the tunnel proxy chain. Cold-cache summary requests and intermittent
tunnel stalls routinely exceeded the 5s probe timeout (1584 failures
over 17h), leaving the pod 0/1 Ready and breaking hourly/triage sinks.
Use /state/health instead — same signal the ops inventory already
expects, and completes in ~30ms through the bridge.
Apply the new 'tooling' category (reusable internal tooling/infrastructure)
from the Repo Classification Standard. First-pass agent classification.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
First-pass agent classification per the Repo Classification Standard v1.0
(canon-repo-classification); pending human review.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Note the 2026-06-19 live reconciliation on railiance01: llm-connect
deployed, worker restarted with LLM_CONNECT_URL, fixture smoke passed.
Manual daily triage still blocked on actcore-state-hub-bridge reachability.
After coulomb-loop bootstrap E2E (3/3 cycles on 2026-06-18), revert
activity-core from experimental daily crons to weekly Monday schedules
so discover_kaizen_scheduled_repos(cadence=weekly) matches the
operate-phase ActivityDefinitions. Drop the disabled tdd-workflow stub.
IssueCoreRestSink.emit() passed task_spec.triggering_event_id straight
into the httpx json= payload. When the field is a UUID object (rather
than a string), httpx's JSON encoder raised
"TypeError: Object of type UUID is not JSON serializable", failing the
emission. Guard with str(), preserving None for optional event ids.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add ISSUE_CORE_URL, ISSUE_CORE_API_KEY, and ISSUE_SINK_TYPE guidance so
agents pair keys locally or via OpenBao instead of requesting them from
ops-warden.
Issue-core requires a shared ingestion key on POST /issues/. The REST sink
now sends Authorization: Bearer using ISSUE_CORE_API_KEY and fails fast
when the key is missing under ISSUE_SINK_TYPE=rest.
Updates .env.example, emission boundary docs, and unit tests for the
header contract and missing-key error.
When discover_kaizen_projects returns {"projects": [...]} bound to
context.projects, for_each can iterate the list directly. Multi-key
summaries (e.g. repo SBOM bulk) remain unchanged.
Implement discover_kaizen_scheduled_repos and discover_kaizen_projects per
kaizen-agentic ADR-005 contract: State Hub roster, roster.yaml filter, schedule
validation, and prepare_command emission. Register kaizen/resolver/shell source
types with unit tests and runbook dry-run instructions.