---
id: ACT-ADR-004
type: architecture-decision-record
title: "The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output"
status: accepted
decided_by: Bernd Worsch
date: "2026-06-26"
scope: cross-repo
affects:
  - activity-core
  - rules-core (future extraction)
tags: ["architecture", "llm", "safety", "validation", "guardrails", "trust-boundary", "resilience"]
---

# ACT-ADR-004: The Producer Trust Boundary

## Status

Accepted.

## Context

On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called llm-connect successfully, and produced a long ranked recommendation list. The JSON, however, broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because the report was validated and consumed as a single monolithic JSON document, one malformed delimiter discarded the **entire** run, including the 7 perfectly good recommendations the model had already emitted. The scheduling and runtime layers were healthy; the failure was entirely at the seam where free-form model output meets a strict consumer.

This is not a one-off bug; it is a recurring class of failure. activity-core has a **trust boundary** wherever generative or human-authored output meets strict deterministic consumers: the JSON Schema validator, the task emitter, and any classic compute pipeline downstream. The producers on the other side of that boundary (**LLMs, agents, and humans**) are all *untrusted producers*. Their output may be:

- **erroneous**: hallucination, truncation at a token limit, drift, type slips, typos, a missing delimiter; or
- **malicious**: prompt injection, crafted payloads, or oversized / deeply-nested structures intended to exhaust or confuse the consumer.

The pre-existing design treated producer output optimistically: parse the whole document, validate the whole document, and on any failure discard the whole document (preserving only a bounded diagnostic preview).
That gives **zero error locality**: the blast radius of any single defect is the entire activation.

## Decision

Treat the producer→consumer seam as an explicit, adversarial **trust boundary**, and place guardrails plus error-correction tooling *at that boundary* rather than letting raw producer output flow into deterministic consumers.

### Two non-fail-fast postures

When hard-failing on a problem is undesirable, there are two sound strategies, and they **compose**:

- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output as-is; on exception, catch → repair → retry, or else quarantine. Cheap on the happy path; blast radius depends entirely on how granular the catch is. Best when failures are rare and locally recoverable. Risk: failures surface late, possibly after partial side effects.
- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp, and normalize the output to a known-good shape *before* it enters the pipeline (drop bad items, coerce types, bound sizes/depth, allow-list references) so the consumer only ever sees clean input. Higher upfront cost, smaller blast radius, no partial side effects. Best when failures are common or consequences are high.

### Governing principles

1. **Push verification to the boundary; keep the interior strict.** Apply posture **B** at the producer→consumer boundary; keep posture **A** for residual exceptions inside the verified core. Never relax the interior schema to absorb producer sloppiness.
2. **Make error locality match the unit of work.** One bad recommendation must cost one recommendation, not the whole report. Structuring the payload so each item is independently parseable and validatable is the highest-leverage change.
3. **Quarantine, never silently drop.** Invalid units are preserved as bounded, provenance-tagged artifacts (`index`, `error`, `raw` snippet, `reason`) so they can be debugged or replayed. Degraded-but-usable is reported distinctly from total loss.
4. **Both human and agent input get the same rigor.** Guardrails are producer-agnostic: the same count / length / depth caps and reference allow-lists apply whether the producer is an LLM, an agent, or a human.

### What this means concretely in activity-core

Implemented in `src/activity_core/rules/executor.py`:

- **Strict-structure-only schema.** The daily-triage output schema is strict on per-item *structure* (`required [rank, candidate, action, why]`, typed `wsjf`) and carries `maxItems` as a producer *hint*, never as a hard whole-document reject, which would reproduce the very blast-radius failure described above (ACT-ADR-002 governs the schema format; see `schemas/daily-triage-report.json`).
- **Item-granular recovery (posture B).** When the whole-document parse and its one retry both fail, `_resilient_report` recovers individually-parseable recommendation objects via a brace/quote-aware scanner (`_extract_object_spans`) that works for both pretty-printed and NDJSON output, attempts a best-effort `_try_repair` on a truncated tail, validates each recovered object against the item schema, and keeps the valid ones. Survivors are emitted with `output_validated=true`, `partial=true`, and `review_required=true`.
- **Producer guardrails (`_partition_items`, applied on both the recovery and the happy path).** Per recommendation, the checks run in order: structural type → schema → structural caps (`_MAX_DEPTH`, `_MAX_STRING_LEN`) → reference allow-list → count cap (top-N by `maxItems`). The first failing check quarantines the item with provenance and a `reason` (`malformed` / `schema` / `guardrail` / `allow_list` / `over_limit`).
- **Reference allow-list.** A recommendation whose `candidate` is not in the set of known ids is quarantined. The set is sourced from resolved context (`context["known_candidates"]`, via `_allow_list_from_context`); the check is inert until a context resolver populates it, so the capability ships now and activates with a one-line resolver change.

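
The item-granular recovery and per-item screening described above can be illustrated with a minimal sketch. It is not the code in `executor.py`: the function names (`extract_object_spans`, `within_caps`, `screen`, `recover_items`), the threshold values, and the report key `quarantined_items` are all assumptions made for illustration. A required-key check stands in for full JSON Schema validation, the count cap is assumed to keep the first N items in document order, and truncated-tail repair is omitted.

```python
import json

# Illustrative assumptions only: these names and thresholds are NOT taken
# from src/activity_core/rules/executor.py.
REQUIRED_KEYS = ("rank", "candidate", "action", "why")
MAX_DEPTH = 4           # maximum levels of nested containers per item
MAX_STRING_LEN = 500    # maximum length of any string value
MAX_RAW_SNIPPET = 200   # bound on raw text kept for a quarantined item


def extract_object_spans(text):
    """Return (start, end) spans of the outermost balanced {...} objects.

    Braces inside JSON strings are ignored. An object that never closes
    (for example a truncated outer document) yields no span, so its
    already-closed children become the outermost spans.
    """
    spans, stack = [], []
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            stack.append(i)
        elif ch == "}" and stack:
            start = stack.pop()
            # Spans that began after `start` are nested inside this one.
            spans = [s for s in spans if s[0] < start]
            spans.append((start, i + 1))
    return spans


def within_caps(value, depth=0):
    """True if container nesting and string lengths stay within the caps."""
    if isinstance(value, str):
        return len(value) <= MAX_STRING_LEN
    if isinstance(value, (dict, list)):
        if depth >= MAX_DEPTH:
            return False
        children = value.values() if isinstance(value, dict) else value
        return all(within_caps(child, depth + 1) for child in children)
    return True


def screen(item, known_candidates):
    """Return None if the item passes, else the first failing reason."""
    if not isinstance(item, dict):
        return "malformed"
    if any(key not in item for key in REQUIRED_KEYS):
        return "schema"  # stand-in for validating against the item schema
    if not within_caps(item):
        return "guardrail"
    # The allow-list is inert while the set of known ids is empty.
    if known_candidates and item["candidate"] not in known_candidates:
        return "allow_list"
    return None


def recover_items(raw, known_candidates=frozenset(), max_items=10):
    """Recover valid items from broken output and quarantine the rest."""
    kept, quarantined = [], []
    for index, (start, end) in enumerate(extract_object_spans(raw)):
        snippet = raw[start:end]
        try:
            item = json.loads(snippet)
            reason, error = screen(item, known_candidates), None
        except json.JSONDecodeError as exc:
            item, reason, error = None, "malformed", str(exc)
        if reason is None and len(kept) >= max_items:
            reason = "over_limit"
        if reason is None:
            kept.append(item)
        else:
            quarantined.append({"index": index, "reason": reason,
                                "error": error,
                                "raw": snippet[:MAX_RAW_SNIPPET]})
    return {"recommendations": kept, "quarantined_items": quarantined,
            "partial": True, "review_required": True}
```

Because the scanner keeps only the outermost objects that actually close, a document truncated mid-item still yields every recommendation completed before the break, and an NDJSON stream yields one span per line. Each rejected span costs exactly one item and is preserved with its `index`, `reason`, `error`, and a bounded `raw` snippet.
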
### Where each posture sits

| Layer | Posture | Mechanism |
|-------|---------|-----------|
| Schema / contract | B | strict per-item structure; `maxItems` as hint |
| Whole-document parse | A | tolerant parse + single retry |
| Failed parse | B | item-granular recovery + repair + quarantine |
| Per-item screening | B | schema + depth/length caps + allow-list + count cap |
| Emitted report | n/a | `partial` / `quarantined_*` provenance; never silent |

## Consequences

- A single malformed or oversized item no longer discards an entire activation; the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid recommendations and quarantine the broken tail.
- Reports gain a `partial` / `quarantined_*` vocabulary; downstream report sinks and reviewers can distinguish degraded-but-usable from total loss.
- Guardrail thresholds (`_MAX_DEPTH`, `_MAX_STRING_LEN`, `maxItems`, the allow-list) are policy knobs that will need tuning; they are intentionally conservative defaults, not a finished calibration.
- **Known retention gap (follow-on):** `LLMConnectClient.complete()` still returns only `content`, discarding `finish_reason`/`usage`, and the total-loss artifact caps raw output below realistic break points. Capturing those signals so failures stay debuggable is tracked as a retention fix, not closed by this ADR.

## Alternatives considered

- **Hard-enforce `maxItems` in the validator.** Rejected: a hard reject of an over-count document reproduces the whole-document blast radius. Mitigation (keep top-N, quarantine the rest) is preferred.
- **Relax the schema to accept anything.** Rejected: violates principle 1; pushes malformed data into downstream consumers.
- **Retry-until-valid only (pure posture A).** Rejected as the sole strategy: the 2026-06-26 failure recurred across both the initial attempt and the retry, so retry alone does not bound the blast radius.

## References

- ACT-ADR-002: markdown-as-definition format and output schema governance.
- ACT-ADR-003: Rule vs. Instruction model; the Instruction prompt-injection surface this boundary complements on the output side.
- `workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md`: the implementing workplan.