Add a test driving the actual captured 2026-06-26 failure payload
(tests/fixtures/wp0016/...partial.json): it now recovers 6+ valid recommendations
and quarantines the truncated tail, where before WP-0016 it discarded the whole run.
Add an over-depth guardrail test. Together with T03/T04 the regression set now covers
truncation, one-bad-item, oversized-string, over-depth, allow-list/injection-shaped,
and happy-path count cap.
In-repo portion of T05 complete; the live railiance01 graceful-degradation smoke is
operator-owned cluster work (deploy-coupled with the T02 bundle changes) and remains
outstanding. Hand-back notes posted to WP-0006-T03 and WP-0010-T04. Full suite: 220
passed, 1 skipped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add ADR-004 documenting the producer trust boundary: untrusted producers (LLM,
agent, human; erroneous and malicious), the trust-but-handle vs verify-and-mitigate
postures, error-locality and quarantine-with-provenance principles, and the concrete
activity-core mechanisms.
Implement producer-agnostic guardrails in executor.py, applied uniformly on the
happy path and the recovery path via _partition_items: structural-type -> schema ->
structural caps (_MAX_DEPTH, _MAX_STRING_LEN) -> reference allow-list -> count cap.
Each quarantine carries a reason. Closes the happy-path maxItems count cap deferred
from T03 (valid 9-item report keeps 7, quarantines 2). Reference allow-list reads
context["known_candidates"] via _allow_list_from_context; inert until a resolver
populates it. SCOPE.md updated (executor bullet + ADR list); no INTENT drift.
New tests: happy-path count cap, oversized-string guardrail, allow-list rejection.
Full suite: 218 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When the whole-document parse + one retry still fail, report instructions now run
_resilient_report before the total-loss path. A brace/quote-aware scanner
(_extract_object_spans) recovers each recommendation object whether pretty-printed
across many lines or NDJSON one-per-line; a truncated tail gets a best-effort
_try_repair; _partition_items validates each recovered object against the T02 item
schema. Valid items survive (output_validated=True, partial=True), malformed/
over-maxItems items are quarantined with provenance (index, error, raw, reason),
capped at 20. Error locality now matches the unit of work: one bad item costs one
item, not the whole report.
Verified against the real 06-26 shape: 7 valid recommendations + a truncated tail
now recovers all 7 and quarantines the broken tail (previously the whole run was
discarded). Happy-path maxItems top-N enforcement is deferred to T04 (count caps).
Full suite: 215 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the accept-anything recommendations.items ({type: object}) with a strict
per-item contract (required [rank, candidate, action, why] + typed wsjf) and a
maxItems:7 hint. Strict item structure is what lets the T03 boundary parser
validate each recommendation independently and quarantine only malformed ones.
maxItems is a producer hint (prompt + llm-connect json_schema + T03 mitigation),
NOT a hard reject — a hard maxItems reject would discard a whole 16-item report,
the blast-radius bug WP-0016 removes. DEPLOY COUPLING: the strict schema is also
consumed by the current whole-doc validator, so it must ship with T03's per-item
quarantine parser; until then it increases whole-doc hard-fails. Prompt + max_tokens
headroom + NDJSON framing are documented as a runtime-bundle handoff.
Updated four tests to the strict contract; the forwarded-schema test now reads the
live schema file instead of hard-coding it. Full suite: 213 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Local analysis of the 2026-06-26 daily-triage validation failure: the unbounded
~1-recommendation-per-workstream list (16 active workstreams; JSON break at char
5268, ~rank 8-9) is the structural cause; both the first attempt and the retry
failed. The exact offending token and finish_reason are unrecoverable from
activity-core data — complete() drops finish_reason/usage, the report sink caps
raw output at 4000 chars (< 5268), and the log preview at 2000. Confirming the
exact token needs llm-connect producer-side logs on railiance01 (operator-owned);
mitigation (T02/T03) is identical regardless. Partial fixture captured.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add activity_core/state_hub_write: every State Hub write (report-sink,
ops-evidence, schedule-miss) now sends a stable Idempotency-Key header derived
from run_id:instruction_id:event_type. Makes writes safe to buffer/replay under
the future state-hub beachhead without duplicate progress/triage events. The
read-based _progress_exists dedup is now best-effort (returns False on connection
error instead of hard-failing), so the guarantee lives on the keyed write rather
than a live read. Tests + runbook note. Endpoint adoption / proxy retirement stays
blocked on the state-hub beachhead capability.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add activity_core/schedule_health: a pure evaluate_schedule_health() verdict
(built on Temporal's num_actions_missed_catchup_window plus a staleness check),
an async check_schedule_health() reader, and post_missed_fire_alert() that emits
a schedule_miss State Hub progress event. Makes a missed fire visible even under
misfire_policy=skip, where Temporal drops it by design. Unit tests for the
verdict logic.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Set Temporal catchup_window on cron schedules so a fire missed during a
worker/Temporal outage is no longer silently dropped. Redefine misfire_policy
into three explicit modes — skip, catchup_all, catchup_latest — mapping to
(catchup_window, overlap) pairs; legacy catchup/compress aliased. Add
catchup_window_seconds override. Remove the ad-hoc upsert-time 1h backfill in
favour of native catchup. Apply catchup_latest to daily-statehub-wsjf-triage in
the Railiance runtime manifest and document run-miss policies in the runbook.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Issue-core requires a shared ingestion key on POST /issues/. The REST sink
now sends Authorization: Bearer using ISSUE_CORE_API_KEY and fails fast
when the key is missing under ISSUE_SINK_TYPE=rest.
Updates .env.example, emission boundary docs, and unit tests for the
header contract and missing-key error.
When discover_kaizen_projects returns {"projects": [...]} bound to
context.projects, for_each can iterate the list directly. Multi-key
summaries (e.g. repo SBOM bulk) remain unchanged.
Implement discover_kaizen_scheduled_repos and discover_kaizen_projects per
kaizen-agentic ADR-005 contract: State Hub roster, roster.yaml filter, schedule
validation, and prepare_command emission. Register kaizen/resolver/shell source
types with unit tests and runbook dry-run instructions.
Add safe action interpolation and for_each binding for rule fan-out, update the weekly SBOM definition, cover the new evaluation path, and reconcile activity-core scope/workplans for the State Hub sync.
The state-hub resolver was calling GET /sbom/status?repo={slug}, which State
Hub does not expose. Real SBOM routes are /sbom/, /sbom/{slug},
/sbom/snapshots/, /sbom/snapshots/{id}, /sbom/ingest/, /sbom/report/licences/.
The weekly-sbom-staleness ActivityDefinition was passing params {repos: all}
and the resolver was reading params.get("repo_slug", ""), so the URL
collapsed to /sbom/status?repo= and 404'd. _fetch_json swallowed the error,
the rule context.repos.sbom_age_days > 30 evaluated against {} and never
matched, and the weekly SBOM check has been a silent no-op for as long as
the route mismatch has existed.
Resolver now supports two modes selected by params:
- single-repo: {repo_slug: foo} → GET /sbom/{foo}, returns
{repo_slug, last_sbom_at, sbom_age_days, has_sbom}
- bulk: {repos: all} → GET /repos/, computes per-repo age, returns the
worst repo's fields hoisted to the top of the result alongside
stale_count, total_count, worst_* fields, and the full per-repo list
Never-scanned repos get a 99999 sentinel age so threshold rules treat
them as very stale without forcing the rule to special-case None.
Hoisting the worst entry to the top preserves the existing rule
expression context.repos.sbom_age_days > 30 (and target_repo:
context.repos.repo_slug, though that field is a separate interpolation
gap tracked as ADHOC-2026-06-01-T02). The integration tests'
aspirational per-repo iteration model is left intact.
Live validation against State Hub on 2026-06-01:
- single: activity-core → 36 days since 2026-04-26 ingest
- bulk: 48 repos total, 46 stale (>30d), worst is info-tech-canon (never
scanned), rule expression evaluates True
Tests: 120 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>