From caa260809285b373214987279dc7ccbdf11bd64a Mon Sep 17 00:00:00 2001
From: tegwick
Date: Fri, 26 Jun 2026 17:52:28 +0200
Subject: [PATCH] chore(consistency): renormalize lifecycle state [auto]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Updated by fix-consistency on 2026-06-26:
- workplan status: proposed → active
---
 ...16-llm-output-robustness-trust-boundary.md | 35 ++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
index 7d56e33..5b1438a 100644
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -4,7 +4,7 @@ type: workplan
 title: "LLM Output Robustness & The Producer Trust Boundary"
 domain: custodian
 repo: activity-core
-status: proposed
+status: active
 owner: codex
 topic_slug: custodian
 created: "2026-06-26"
@@ -238,6 +238,39 @@ Done when:
 - the existing monolithic-document path remains as the fallback when framing is
   absent (backward compatible with task-only instructions).
 
+2026-06-26 progress (implemented in `src/activity_core/rules/executor.py`):
+
+- **Resilient recovery wired into `_execute`.** When the whole-document parse
+  and its one retry both fail, report instructions (those with `report_sinks`) now run
+  `_resilient_report` *before* the total-loss `_invalid_output_report`. If it
+  recovers ≥1 valid item it returns a partial report; otherwise it returns None
+  and the prior total-loss path is preserved unchanged.
+- **Brace/quote-aware object scanner, not line-splitting.** The real 06-26 output
+  was pretty-printed (multi-line objects), so naive NDJSON line recovery would
+  have failed. `_extract_object_spans` walks the `recommendations` array
+  brace-depth- and string-aware, so it recovers each recommendation object
+  whether pretty-printed across many lines *or* emitted one-per-line (NDJSON).
+  The truncated trailing object is returned with `complete=False`.
+- **Layered mitigation per item:** `json.loads` → on failure for a truncated
+  tail, a best-effort `_try_repair` (balance open string/brackets/braces) →
+  then `_partition_items` validates each recovered object against the T02 item
+  schema. Valid items survive; malformed or over-`maxItems` items are
+  quarantined with provenance (`index`, `error`, `raw` snippet, `reason`).
+- **Report shape on degradation:** `output_validated=True` over the survivors,
+  `review_required=True`, `partial=True`, `quarantined_count`, and a bounded
+  `quarantined_items` list (cap 20). Degraded-but-usable is now reported
+  distinctly from total loss.
+- **Verified against the real failure shape.** New tests reconstruct a
+  pretty-printed report with 7 valid recommendations + a truncated tail (the
+  06-26 shape) and a one-bad-item-among-valid case. The 7-item run now recovers
+  all 7 and quarantines the broken tail (previously: whole run discarded);
+  log line `instruction_output_recovered: kept=7, quarantined=1`. The bad-item
+  run keeps 2 and quarantines the rank-less one.
+- **Deferred to T04 (clean scope boundary):** enforcing `maxItems` top-N on the
+  *happy* path (valid JSON, all items schema-valid, but > N items) — the resilient
+  path only runs on failure, so over-limit-on-success is a guardrail/count-cap
+  concern, which is exactly T04's remit.
+
 ## Producer Guardrails + ADR-004
 
 ```task
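A minimal Python sketch of the brace- and string-aware scanning and per-item fallback that the progress note describes. It assumes the model output is a JSON document whose `recommendations` key holds an array of objects. `extract_object_spans` and `try_repair` only echo the names `_extract_object_spans` and `_try_repair`; they are illustrative stand-ins, not the code in `executor.py`. `ObjectSpan` and `parse_span` are hypothetical helpers introduced here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectSpan:
    text: str       # raw source text of one array element
    complete: bool  # False if the document ended before the closing brace


def extract_object_spans(doc: str, key: str = "recommendations") -> list[ObjectSpan]:
    """Split the array stored under `key` into per-object text spans.

    Tracks brace depth and whether we are inside a JSON string, so braces in
    string values, escaped quotes and pretty-printed multi-line objects are
    handled the same way as one-object-per-line (NDJSON-style) output.
    """
    key_pos = doc.find(f'"{key}"')
    array_pos = doc.find("[", key_pos) if key_pos != -1 else -1
    if array_pos == -1:
        return []
    spans: list[ObjectSpan] = []
    depth, in_string, escaped, obj_start = 0, False, False, -1
    for i in range(array_pos + 1, len(doc)):
        ch = doc[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                obj_start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(ObjectSpan(doc[obj_start : i + 1], complete=True))
        elif ch == "]" and depth == 0:
            return spans  # the array closed normally
    if depth > 0:
        spans.append(ObjectSpan(doc[obj_start:], complete=False))  # truncated tail
    return spans


def try_repair(fragment: str) -> Optional[dict]:
    """Best-effort repair of a truncated object: close an open string, then
    close open brackets/braces innermost-first. Returns None if still invalid."""
    closers: list[str] = []
    in_string = escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    candidate = fragment[:-1] if escaped else fragment  # drop a dangling backslash
    if in_string:
        candidate += '"'
    # A trailing comma right before the closers would keep the JSON invalid.
    candidate = candidate.rstrip().rstrip(",") + "".join(reversed(closers))
    try:
        value = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_span(span: ObjectSpan) -> Optional[dict]:
    """Layered parse: json.loads first, repair only for the truncated tail."""
    try:
        value = json.loads(span.text)
        return value if isinstance(value, dict) else None
    except json.JSONDecodeError:
        return try_repair(span.text) if not span.complete else None
```

Because the scanner tracks string state, a `{` or `}` inside a string value never changes the depth count, which is what lets pretty-printed and one-object-per-line output go through the same code path. Repair is attempted only for the truncated tail; a complete span that fails `json.loads` stays unparsed and is left for the quarantine step.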
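The partition and report-shape steps from the progress note can be sketched in Python as follows, assuming each recovered candidate arrives as a `(raw, parsed, parse_error)` triple. `partition_items`, `build_partial_report`, the `reason` strings, the `items` key, the 200-character `raw` snippet and the `require_rank` validator (a stand-in for the unspecified T02 item schema) are all hypothetical. Only the report flags, the provenance field names and the cap of 20 come from the note.

```python
from typing import Any, Callable, Optional

QUARANTINE_CAP = 20      # the note caps the reported quarantined_items list at 20
RAW_SNIPPET_CHARS = 200  # hypothetical length of the `raw` provenance snippet

# (raw text, parsed object or None, parse error message or None)
Candidate = tuple[str, Optional[Any], Optional[str]]


def partition_items(
    candidates: list[Candidate],
    validate: Callable[[Any], Optional[str]],
    max_items: int,
) -> tuple[list[dict], list[dict]]:
    """Keep schema-valid items up to max_items; quarantine the rest with provenance."""
    kept: list[dict] = []
    quarantined: list[dict] = []
    for index, (raw, obj, parse_error) in enumerate(candidates):
        if obj is None:
            error, reason = parse_error or "unparseable", "parse_error"
        elif (schema_error := validate(obj)) is not None:
            error, reason = schema_error, "schema_invalid"
        elif len(kept) >= max_items:
            error, reason = f"more than maxItems={max_items}", "over_max_items"
        else:
            kept.append(obj)
            continue
        quarantined.append(
            {"index": index, "error": error, "raw": raw[:RAW_SNIPPET_CHARS], "reason": reason}
        )
    return kept, quarantined


def build_partial_report(kept: list[dict], quarantined: list[dict]) -> Optional[dict]:
    """Degraded-but-usable report, or None so the caller keeps its total-loss path."""
    if not kept:
        return None
    return {
        "items": kept,
        "output_validated": True,  # validated over the survivors only
        "review_required": True,
        "partial": True,
        "quarantined_count": len(quarantined),
        "quarantined_items": quarantined[:QUARANTINE_CAP],
    }


def require_rank(obj: Any) -> Optional[str]:
    """Hypothetical stand-in for the T02 item schema: only checks an integer `rank`."""
    if not isinstance(obj, dict) or not isinstance(obj.get("rank"), int):
        return "missing or non-integer 'rank'"
    return None


# Usage: one rank-less item and one unparseable truncated tail among valid items.
candidates = [
    ('{"rank": 1}', {"rank": 1}, None),
    ('{"title": "no rank"}', {"title": "no rank"}, None),
    ('{"rank": 2}', {"rank": 2}, None),
    ('{"rank": 3, "ti', None, "truncated tail"),
]
kept, quarantined = partition_items(candidates, require_rank, max_items=5)
report = build_partial_report(kept, quarantined)
```

Ordering matters in this sketch: schema validity is checked before the `maxItems` limit, so an invalid item never consumes a slot. Returning None when nothing survives leaves the caller's total-loss path in charge, which mirrors the behavior the note describes for `_resilient_report`.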