generated from coulomb/repo-seed
chore(consistency): renormalize lifecycle state [auto]
Updated by fix-consistency on 2026-06-26:
- workplan status: proposed → active
@@ -4,7 +4,7 @@ type: workplan
title: "LLM Output Robustness & The Producer Trust Boundary"
domain: custodian
repo: activity-core
-status: proposed
+status: active
owner: codex
topic_slug: custodian
created: "2026-06-26"
@@ -238,6 +238,39 @@ Done when:
- the existing monolithic-document path remains as the fallback when framing is
  absent (backward compatible with task-only instructions).

2026-06-26 progress (implemented in `src/activity_core/rules/executor.py`):

- **Resilient recovery wired into `_execute`.** When the whole-document parse +
  one retry still fail, report instructions (those with `report_sinks`) now run
  `_resilient_report` *before* the total-loss `_invalid_output_report`. If it
  recovers ≥1 valid item it returns a partial report; otherwise it returns None
  and the prior total-loss path is preserved unchanged.
- **Brace/quote-aware object scanner, not line-splitting.** The real 06-26 output
  was pretty-printed (multi-line objects), so naive NDJSON line recovery would
  have failed. `_extract_object_spans` walks the `recommendations` array
  brace-depth- and string-aware, so it recovers each recommendation object
  whether pretty-printed across many lines *or* emitted one-per-line (NDJSON).
  The truncated trailing object is returned with `complete=False`.
- **Layered mitigation per item:** `json.loads` → on failure for a truncated
  tail, a best-effort `_try_repair` (balance open string/brackets/braces) →
  then `_partition_items` validates each recovered object against the T02 item
  schema. Valid items survive; malformed or over-`maxItems` items are
  quarantined with provenance (`index`, `error`, `raw` snippet, `reason`).
- **Report shape on degradation:** `output_validated=True` over the survivors,
  `review_required=True`, `partial=True`, `quarantined_count`, and a bounded
  `quarantined_items` list (cap 20). Degraded-but-usable is now reported
  distinctly from total loss.
- **Verified against the real failure shape.** New tests reconstruct a
  pretty-printed report with 7 valid recommendations + a truncated tail (the
  06-26 shape) and a one-bad-item-among-valid case. The 7-item run now recovers
  all 7 and quarantines the broken tail (previously: whole run discarded);
  log line `instruction_output_recovered: kept=7, quarantined=1`. The bad-item
  run keeps 2 and quarantines the rank-less one.
- **Deferred to T04 (clean scope boundary):** enforcing `maxItems` top-N on the
  *happy* path (valid JSON, all items schema-valid, but > N items) — the resilient
  path only runs on failure, so over-limit-on-success is a guardrail/count-cap
  concern, which is exactly T04's remit.
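Illustrative sketch only, not the code in `executor.py`: the scan, repair, and partition layers described in the progress notes could look roughly like the following. The names `scan_object_spans`, `try_repair`, `recover_items`, and `REQUIRED_FIELDS` are hypothetical, the required-field check is a stand-in assumption for the T02 item schema (which is not shown here), and the quarantine record is simplified to `index`, `reason`, and a `raw` snippet.

```python
import json

# Assumption: stand-in for the real T02 item schema, which is not shown here.
REQUIRED_FIELDS = ("rank", "title")


def scan_object_spans(text, key="recommendations"):
    """Return (raw, complete) for each object in the array under `key`."""
    key_pos = text.find(f'"{key}"')
    start = text.find("[", key_pos) if key_pos != -1 else -1
    if start == -1:
        return []
    spans, depth, obj_start = [], 0, None
    in_string = escaped = False
    for i in range(start + 1, len(text)):
        ch = text[i]
        if in_string:
            # Braces and brackets inside string values must not change depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                obj_start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((text[obj_start:i + 1], True))
                obj_start = None
        elif ch == "]" and depth == 0:
            break  # end of the target array
    if obj_start is not None:
        # Output ended inside an object: hand back the truncated tail.
        spans.append((text[obj_start:], False))
    return spans


def try_repair(raw):
    """Best effort: close an open string, then any open brackets/braces."""
    closers, in_string, escaped = [], False, False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    repaired = raw + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None


def recover_items(text, max_items=10):
    """Split recovered objects into (kept, quarantined)."""
    kept, quarantined = [], []
    for index, (raw, complete) in enumerate(scan_object_spans(text)):
        reason = None
        try:
            item = json.loads(raw)
        except json.JSONDecodeError:
            # Only a truncated tail gets a repair attempt.
            item = None if complete else try_repair(raw)
            if item is None:
                reason = "unparseable"
        if reason is None:
            missing = [f for f in REQUIRED_FIELDS if f not in item]
            if missing:
                reason = "missing " + ", ".join(missing)
            elif len(kept) >= max_items:
                reason = "over max_items"
        if reason is None:
            kept.append(item)
        else:
            quarantined.append({"index": index, "reason": reason, "raw": raw[:80]})
    return kept, quarantined
```

In this sketch a truncated tail that parses after repair is kept if it passes the field check, and is quarantined otherwise; whether the real `_partition_items` treats repaired tails the same way is not stated in the notes above.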

## Producer Guardrails + ADR-004

```task