--- id: ACTIVITY-WP-0016 type: workplan title: "LLM Output Robustness & The Producer Trust Boundary" domain: custodian repo: activity-core status: active owner: codex topic_slug: custodian created: "2026-06-26" updated: "2026-06-26" state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726" --- # ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary ## Context On 2026-06-26 the scheduled `daily-statehub-wsjf-triage` instruction fired on time (`daily_triage` event 05:20:57Z) but its output **failed schema validation**: `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The model emitted a long ranked WSJF recommendation list (reached rank 7+ with nested `wsjf` objects) and the JSON broke deep in that list. Because the report is a single monolithic JSON document, one malformed delimiter discarded the **entire** run. This reset the three-clean-consecutive-scheduled-runs streak in `ACTIVITY-WP-0006-T03` (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the LLM-output-quality surface deferred from `ACTIVITY-WP-0010`. The scheduling/runtime layer is healthy — this is purely an output-robustness and boundary-design problem. Today's code (`src/activity_core/rules/executor.py`) already: passes the output schema to llm-connect as a `json_schema` model param (`_llm_run_config`), retries once, runs a fenced/`raw_decode` tolerant parser (`_parse_json_output`), and preserves a bounded 4000-char preview on hard failure (`_invalid_output_report`). None of that helps when error locality is zero: the failure unit is the whole document, not the offending item. ## Design Frame — The Producer Trust Boundary This workplan is anchored to a deliberate architectural stance, not just a bug fix. Capture it in an ADR (T04) so future work inherits it. **Premise.** activity-core has a *trust boundary* where free-form producer output meets strict deterministic consumers (JSON Schema validators, the task emitter, classic compute pipelines). The producers are **LLMs and humans (and agents acting for either)**. Both are *untrusted producers*: their output may be - **erroneous** — hallucination, truncation (token-limit cutoff), drift, type slips, typos; or - **malicious** — prompt injection, crafted payloads, oversized/deeply-nested structures aimed at exhausting or confusing the consumer. The architecture should treat the boundary as an adversarial frontier and place **guardrails + error-correction tooling there**, rather than letting raw producer output flow into deterministic consumers and fail (or worse, partially succeed) downstream. **Two non-fail-fast postures.** When we do *not* want to hard-fail on a problem, there are two sensible strategies — and they compose: - **A) Trust but handle exceptions** (optimistic / reactive). Consume the output as-is; on exception, catch → repair → retry → or quarantine. Cheap on the happy path. Blast radius depends entirely on how granular the catch is. Good when failures are rare and locally recoverable. Risk: failures surface late, possibly after partial side effects. - **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp, and normalize the output to a known-good shape *before* it enters the pipeline — drop bad items, coerce types, bound sizes/depth, allow-list references — so the consumer only ever sees clean input. Higher upfront cost, smaller blast radius, no partial side effects. Good when failures are common or consequences are high. **Governing principles for this repo:** 1. **Push verification to the boundary; keep the interior strict.** Apply posture **B** at the producer→consumer boundary (verify+mitigate structure); keep posture **A** for residual exceptions inside the verified core. Never relax the interior schema to absorb producer sloppiness. 2. **Make error locality match the unit of work.** One bad recommendation must cost one recommendation, not the whole report. Framing the payload so each item is independently parseable is the single highest-leverage change. 3. **Quarantine, never silently drop.** Invalid units are preserved as bounded, provenance-tagged artifacts (index, error, raw snippet) so they can be debugged or replayed — degraded-but-usable is distinct from total loss. 4. **Both human and agent input get the same rigor.** Guardrails are producer-agnostic: the same size/depth/count caps, reference allow-lists, and truncation detection apply whether the producer is an LLM, an agent, or a human form submission. ## Reproduce & Root-Cause The Failure ```task id: ACTIVITY-WP-0016-T01 status: wait priority: high state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138" ``` Recover the **full** raw llm-connect response for the 06-26 failure (the State Hub event keeps only a 4000-char preview; the break is at char 5268) and establish the precise cause. Done when: - the full raw response is pulled from the runtime llm-connect log / response store and the exact offending token at char 5268 is identified; - `finish_reason` is captured to confirm or rule out token-limit **truncation** vs a structural mid-stream glitch; - it is confirmed whether llm-connect actually **enforced** the `json_schema` constrained-decoding hint or merely accepted it as advisory (this determines whether the schema param is load-bearing); - the failing payload is captured as a regression fixture under `tests/`. 2026-06-26 findings (local analysis on the workstation): - **Mechanism confirmed structurally.** There are **16 active workstreams** org-wide and the triage instruction emits ~one ranked recommendation per candidate. The preserved preview holds 7 fully-formed recommendations; the JSON break is at char 5268 (~rank 8–9). The unbounded one-per-workstream list is the structural cause — more items = more tokens = higher odds of a mid-stream JSON slip and/or truncation. This directly justifies T02's bounded top-N + per-item framing. - **Both attempts failed.** `executor._execute` retries once (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the **retry** output, so the model produced invalid JSON twice — not a one-off. - **activity-core discards the diagnostics needed to root-cause this.** Three retention gaps mean the exact char-5268 token cannot be recovered from activity-core data at all: 1. `LLMConnectClient.complete()` returns only `data["content"]` (`llm_client.py:57-60`) — it drops `finish_reason`/`usage` from the llm-connect HTTP response, so truncation-vs-structural cannot be distinguished locally. 2. the report sink caps raw output at **4000 chars** (`_invalid_output_report`, `executor.py:259`) — below the 5268 break. 3. the worker log caps the preview at **2000 chars** (`executor.py:175`). - **Remaining (remote, operator-owned).** Confirming the exact offending token and `finish_reason` requires llm-connect's producer-side logs on `railiance01` — cluster access, outside this repo's SCOPE for direct action. Truncation is the leading hypothesis given the 16-item input, but the mitigation (T02/T03) is identical either way, so T01 does not block the build work. - **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture `finish_reason`/`usage` and persist a larger bounded raw artifact on validation failure so this class of failure is never un-debuggable again. - Partial fixture saved: `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json` (the 4000-char preview + validation error; full payload pending the remote pull). ## Schema + Prompt Redesign For Error Locality ```task id: ACTIVITY-WP-0016-T02 status: progress priority: high state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758" ``` Redesign the daily-triage report contract so a single malformed item can no longer discard the whole report (principle #2). Done when: - the recommendation list is **bounded** (configurable top-N, default 5–7) in both the prompt and the output schema — long lists are where the model drifts; - the report uses a **per-item-framed** shape (JSON Lines / NDJSON — one recommendation object per line — or an equivalent delimited per-item form) behind a minimal stable envelope (`summary` + framed items), so each item is an independent parse unit; - the prompt explicitly states the contract, the per-item framing, the cap, and a "if uncertain, emit fewer well-formed items rather than more" instruction; - `max_tokens` is set with headroom for the bounded list so truncation cannot occur at the expected size; - the output schema file (`_load_output_schema` target) is updated to match. 2026-06-26 progress (in-repo portion): - **Strict, bounded schema written** — `schemas/daily-triage-report.json` went from `recommendations.items: {type: object}` (accept-anything) to a strict per-item contract: `required [rank, candidate, action, why]` with typed `wsjf` sub-fields, plus `maxItems: 7`. The strict item shape is what lets the T03 boundary parser validate each recommendation independently. - **`maxItems` is a hint, not a hard reject** — the in-repo validator (`_validate_schema_node`) only enforces `type`/`required`/`properties`/`items` and ignores `maxItems`/`enum`. That is deliberate: a hard `maxItems` reject would discard a whole 16-item report — the exact blast-radius bug WP-0016 removes. The bound is enforced via the prompt + the llm-connect `json_schema` constraint hint + T03 mitigation (keep top-N by rank, quarantine extras). - **DEPLOY COUPLING (important):** this schema file is consumed *both* as the llm-connect hint *and* by the current whole-document validator. Tightening per-item `required` fields makes the existing whole-doc validation hard-fail **more** until T03 replaces it with per-item quarantine. Therefore the schema change MUST ship together with T03 — do not deploy the strict schema to the runtime bundle ahead of the T03 parser. Four executor/instruction tests that asserted the old loose contract were updated to the strict contract; the forwarded-schema test now reads the live file instead of hard-coding it. - **Truncation hypothesis corroborated** — the instruction config carries `max_tokens` on the order of ~1200 (per the wiring test fixture). 5268 chars ≈ ~1300–1500 tokens, so a ~1200-token cap would truncate a 16-item list right at the observed break. This strengthens T01's leading hypothesis and makes the `max_tokens` headroom change below concrete. **Bundle handoff (NOT in this repo — runtime-projected definition).** The triage prompt and `max_tokens` live in the Railiance runtime bundle, not in repo files. Apply there: 1. Instruct a **bounded top-N** (≤ 7) ranked recommendations, "if uncertain emit fewer well-formed items rather than more." 2. Specify the **per-item framing** the T03 parser will consume (NDJSON: a leading summary object, then one recommendation JSON object per line). 3. Raise **`max_tokens`** to give clear headroom for 7 framed items (eliminate truncation at the expected size). 4. State the value vocabularies (`action`, `confidence`) the T04 guardrails will check. ## Boundary Parser — Verify & Mitigate (Posture B) ```task id: ACTIVITY-WP-0016-T03 status: done priority: high state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709" ``` Implement item-granular parsing with a quarantine lane in `src/activity_core/rules/executor.py`, applying posture **B** at the boundary (principles #1–#3). Done when: - the parser splits the envelope from the framed items, then parses **each item independently**; a malformed item is routed to a bounded `quarantined_items` artifact (index + validation error + raw snippet), not raised; - a run with some valid and some invalid items emits a report over the surviving valid items with `output_validated=true`, plus `partial=true` and `quarantined_count` / `quarantined_items` markers — degraded-but-usable is reported distinctly from total loss; - a best-effort **repair** pass (close unterminated brackets/quotes, recover the valid prefix) is attempted per item before quarantining it; - truncation detected in T01 is handled as its own signal (recover whole items emitted before the cutoff rather than failing the document); - the existing monolithic-document path remains as the fallback when framing is absent (backward compatible with task-only instructions). 2026-06-26 progress (implemented in `src/activity_core/rules/executor.py`): - **Resilient recovery wired into `_execute`.** When the whole-document parse + one retry still fail, report instructions (those with `report_sinks`) now run `_resilient_report` *before* the total-loss `_invalid_output_report`. If it recovers ≥1 valid item it returns a partial report; otherwise it returns None and the prior total-loss path is preserved unchanged. - **Brace/quote-aware object scanner, not line-splitting.** The real 06-26 output was pretty-printed (multi-line objects), so naive NDJSON line recovery would have failed. `_extract_object_spans` walks the `recommendations` array brace-depth- and string-aware, so it recovers each recommendation object whether pretty-printed across many lines *or* emitted one-per-line (NDJSON). The truncated trailing object is returned with `complete=False`. - **Layered mitigation per item:** `json.loads` → on failure for a truncated tail, a best-effort `_try_repair` (balance open string/brackets/braces) → then `_partition_items` validates each recovered object against the T02 item schema. Valid items survive; malformed or over-`maxItems` items are quarantined with provenance (`index`, `error`, `raw` snippet, `reason`). - **Report shape on degradation:** `output_validated=True` over the survivors, `review_required=True`, `partial=True`, `quarantined_count`, and a bounded `quarantined_items` list (cap 20). Degraded-but-usable is now reported distinctly from total loss. - **Verified against the real failure shape.** New tests reconstruct a pretty-printed report with 7 valid recommendations + a truncated tail (the 06-26 shape) and a one-bad-item-among-valid case. The 7-item run now recovers all 7 and quarantines the broken tail (previously: whole run discarded); log line `instruction_output_recovered: kept=7, quarantined=1`. The bad-item run keeps 2 and quarantines the rank-less one. - **Deferred to T04 (clean scope boundary):** enforcing `maxItems` top-N on the *happy* path (valid JSON, all items schema-valid, but > N items) — the resilient path only runs on failure, so over-limit-on-success is a guardrail/count-cap concern, which is exactly T04's remit. ## Producer Guardrails + ADR-004 ```task id: ACTIVITY-WP-0016-T04 status: todo priority: medium state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9" ``` Write the architecture decision record and add the producer-agnostic guardrails (principle #4). Done when: - `docs/adr/adr-004-producer-trust-boundary.md` documents the trust boundary, the untrusted-producer premise (erroneous **and** malicious; human and agent), the A vs B taxonomy and where each applies, the error-locality principle, and the quarantine-with-provenance rule; - boundary guardrails are enforced at the consumer edge: max item **count**, max string length, max nesting **depth**, and a **reference allow-list** (e.g. a recommendation `candidate` / a task `target_repo` must resolve to a known workstream/repo before it is acted on); - guardrail rejections are quarantined with provenance, consistent with T03; - SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance changes the documented contract. 2026-06-26 progress: - **ADR-004 written** — `docs/adr/adr-004-producer-trust-boundary.md` documents the untrusted-producer premise (erroneous + malicious; LLM/agent/human), the A-vs-B posture taxonomy, the four governing principles, the concrete activity-core mechanisms, a posture-by-layer table, consequences, and alternatives considered. Accepted, scope cross-repo. - **Producer guardrails implemented** in `executor.py`, applied uniformly on the happy path *and* the recovery path via `_partition_items`: per-item order is structural-type → schema → structural caps (`_MAX_DEPTH=8`, `_MAX_STRING_LEN=4000`) → reference allow-list → count cap (`maxItems`). Each quarantine carries a `reason` (`malformed`/`schema`/`guardrail`/`allow_list`/ `over_limit`). - **Happy-path count cap closed** (the item deferred from T03): a syntactically valid 9-item report now keeps 7 and quarantines 2 as `over_limit`, emitting a `partial` report — without a retry. - **Reference allow-list wired but inert.** `_allow_list_from_context` reads `context["known_candidates"]`; when present, recommendations with an unknown `candidate` are quarantined (`reason: allow_list`). Absent today → check is inert; activation is a one-line context-resolver change. Keeps the guardrail producer-agnostic (principle #4) and ready. - **SCOPE.md updated** — instruction-executor bullet now names the quarantine lane + guardrails; ADR-004 added to the Architecture Decisions list. No INTENT drift: this hardens the existing output contract, it does not extend scope. - New tests: happy-path count cap, oversized-string guardrail, allow-list rejection (all green). ## Tests + Calibration Re-Entry ```task id: ACTIVITY-WP-0016-T05 status: todo priority: high state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f" ``` Prove the new posture and hand back to the calibration gates. Done when: - regression tests cover: the captured 06-26 payload, a truncated-mid-list payload, a one-bad-item-among-good payload (asserts quarantine + partial), an oversized/over-deep payload (asserts guardrail rejection), and an injection-shaped reference (asserts allow-list rejection); - the full suite passes and the result is recorded here with the count; - a daily-triage smoke against the live runtime shows a previously-failing payload now **degrades gracefully** (valid items delivered, bad items quarantined) instead of discarding the run; - a progress note hands back to `ACTIVITY-WP-0010-T04` and `ACTIVITY-WP-0006-T03` that the output-robustness blocker is cleared so the three-clean-run gate can resume on its own. 2026-06-26 progress (in-repo portion complete): - **Regression coverage complete.** Across T03/T04/T05: truncated-mid-list, one-bad-item-among-good (quarantine + partial), oversized-string and over-depth guardrail rejection, allow-list (injection-shaped) rejection, happy-path count cap, and a test driving the **actual captured 2026-06-26 payload** (`tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`) — it now recovers 6+ valid recommendations and quarantines the truncated tail, where before it discarded the whole run. - **Full suite green:** 218 passed, 1 skipped (recorded at T04; the T05 fixture + over-depth tests add to this — see the commit). - **Hand-back notes posted** to `ACTIVITY-WP-0006-T03` (State Hub event `b6b8c2b8`) and `ACTIVITY-WP-0010-T04` (`b813f0dc`). - **Remaining (remote, operator-owned):** the live daily-triage smoke on `railiance01` proving end-to-end graceful degradation. It depends on deploying the T02 bundle prompt/`max_tokens`/NDJSON changes together with this code, which is cluster/operator work outside this repo's SCOPE. T05 therefore stays `progress` until that live run exists; the in-repo deliverables are done. ## Relationships - **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and `ACTIVITY-WP-0010-T04` (collect three clean scheduled runs) — both stalled on the same output-quality failure this workplan removes. - **References:** `ACTIVITY-WP-0009` (scheduled-run trust gap). - **Boundary discipline:** keeps activity-core inside its SCOPE — this hardens the instruction-executor output contract; it does not move provider credentials, cluster reconciliation, or task lifecycle into this repo.