activity-core/docs/adr/adr-004-producer-trust-boundary.md

---
id: ACT-ADR-004
type: architecture-decision-record
title: "The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output"
status: accepted
decided_by: Bernd Worsch
date: "2026-06-26"
scope: cross-repo
affects:
  - activity-core
  - rules-core (future extraction)
tags: ["architecture", "llm", "safety", "validation", "guardrails", "trust-boundary", "resilience"]
---

# ACT-ADR-004: The Producer Trust Boundary

## Status

Accepted.

## Context

On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called
llm-connect successfully, and produced a long ranked recommendation list — but
the JSON broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because
the report was validated and consumed as a single monolithic JSON document, one
malformed delimiter discarded the **entire** run, including the 7 perfectly good
recommendations the model had already emitted. The scheduling and runtime layers
were healthy; the failure was entirely at the seam where free-form model output
meets a strict consumer.

This is not a one-off bug, it is a recurring class. activity-core has a **trust
boundary** wherever generative or human-authored output meets strict deterministic
consumers: the JSON Schema validator, the task emitter, and any classic compute
pipeline downstream. The producers on the other side of that boundary — **LLMs,
agents, and humans** — are all *untrusted producers*. Their output may be:

- **erroneous** — hallucination, truncation at a token limit, drift, type slips,
  typos, a missing delimiter; or
- **malicious** — prompt injection, crafted payloads, or oversized / deeply-nested
  structures intended to exhaust or confuse the consumer.

The pre-existing design treated producer output optimistically: parse the whole
document, validate the whole document, and on any failure discard the whole
document (preserving only a bounded diagnostic preview). That gives **zero error
locality** — the blast radius of any single defect is the entire activation.

## Decision

Treat the producer→consumer seam as an explicit, adversarial **trust boundary**,
and place guardrails plus error-correction tooling *at that boundary* rather than
letting raw producer output flow into deterministic consumers.

### Two non-fail-fast postures

When hard-failing on a problem is undesirable, there are two sound strategies, and
they **compose**:

- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
  as-is; on exception, catch → repair → retry → or quarantine. Cheap on the happy
  path; blast radius depends entirely on how granular the catch is. Best when
  failures are rare and locally recoverable. Risk: failures surface late, possibly
  after partial side effects.
- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
  and normalize the output to a known-good shape *before* it enters the pipeline —
  drop bad items, coerce types, bound sizes/depth, allow-list references — so the
  consumer only ever sees clean input. Higher upfront cost, smaller blast radius,
  no partial side effects. Best when failures are common or consequences are high.

### Governing principles

1. **Push verification to the boundary; keep the interior strict.** Apply posture
   **B** at the producer→consumer boundary; keep posture **A** for residual
   exceptions inside the verified core. Never relax the interior schema to absorb
   producer sloppiness.
2. **Make error locality match the unit of work.** One bad recommendation must
   cost one recommendation, not the whole report. Structuring the payload so each
   item is independently parseable and validatable is the highest-leverage change.
3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
   provenance-tagged artifacts (`index`, `error`, `raw` snippet, `reason`) so they
   can be debugged or replayed. Degraded-but-usable is reported distinctly from
   total loss.
4. **Both human and agent input get the same rigor.** Guardrails are
   producer-agnostic: the same count / length / depth caps and reference
   allow-lists apply whether the producer is an LLM, an agent, or a human.

### What this means concretely in activity-core

Implemented in `src/activity_core/rules/executor.py`:

- **Strict-structure-only schema.** The daily-triage output schema is strict on
  per-item *structure* (`required [rank, candidate, action, why]`, typed `wsjf`)
  and carries `maxItems` as a producer *hint* — never as a hard whole-document
  reject, which would reproduce the very blast-radius failure (ACT-ADR-002 governs
  the schema format; `schemas/daily-triage-report.json`).
- **Item-granular recovery (posture B).** When whole-document parse + one retry
  fail, `_resilient_report` recovers individually-parseable recommendation objects
  via a brace/quote-aware scanner (`_extract_object_spans`) that works for both
  pretty-printed and NDJSON output, attempts a best-effort `_try_repair` on a
  truncated tail, validates each recovered object against the item schema, and
  keeps the valid ones. Survivors are emitted with `output_validated=true`,
  `partial=true`, and `review_required=true`.
- **Producer guardrails (`_partition_items`, applied on both the recovery and the
  happy path).** Per recommendation: structural type → schema → structural caps
  (`_MAX_DEPTH`, `_MAX_STRING_LEN`) → reference allow-list → count cap (top-N by
  `maxItems`). The first failing check quarantines the item with provenance and a
  `reason` (`malformed` / `schema` / `guardrail` / `allow_list` / `over_limit`).
- **Reference allow-list.** A recommendation whose `candidate` is not in the set of
  known ids is quarantined. The set is sourced from resolved context
  (`context["known_candidates"]`, via `_allow_list_from_context`); the check is
  inert until a context resolver populates it, so the capability ships now and
  activates with a one-line resolver change.

### Where each posture sits

| Layer | Posture | Mechanism |
|-------|---------|-----------|
| Schema / contract | B | strict per-item structure; `maxItems` as hint |
| Whole-document parse | A | tolerant parse + single retry |
| Failed parse | B | item-granular recovery + repair + quarantine |
| Per-item screening | B | schema + depth/length caps + allow-list + count cap |
| Emitted report | — | `partial` / `quarantined_*` provenance; never silent |

## Consequences

- A single malformed or oversized item no longer discards an entire activation;
  the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid
  recommendations and quarantine the broken tail.
- Reports gain a `partial` / `quarantined_*` vocabulary; downstream report sinks
  and reviewers can distinguish degraded-but-usable from total loss.
- Guardrail thresholds (`_MAX_DEPTH`, `_MAX_STRING_LEN`, `maxItems`, the
  allow-list) are policy knobs that will need tuning; they are intentionally
  conservative defaults, not a finished calibration.
- **Known retention gap (follow-on):** `LLMConnectClient.complete()` still returns
  only `content`, discarding `finish_reason`/`usage`, and the total-loss artifact
  caps raw output below realistic break points. Capturing those signals so
  failures stay debuggable is tracked as a retention fix, not closed by this ADR.

## Alternatives considered

- **Hard-enforce `maxItems` in the validator.** Rejected: a hard reject of an
  over-count document reproduces the whole-document blast radius. Mitigation (keep
  top-N, quarantine the rest) is preferred.
- **Relax the schema to accept anything.** Rejected: violates principle 1; pushes
  malformed data into downstream consumers.
- **Retry-until-valid only (pure posture A).** Rejected as the sole strategy: the
  2026-06-26 failure recurred across both the initial attempt and the retry, so
  retry alone does not bound the blast radius.

## References

- ACT-ADR-002 — markdown-as-definition format and output schema governance.
- ACT-ADR-003 — Rule vs. Instruction model; the Instruction prompt-injection
  surface this boundary complements on the output side.
- `workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md` — the
  implementing workplan.