Add ADR-004 documenting the producer trust boundary: untrusted producers (LLM, agent, human; erroneous and malicious), the trust-but-handle vs verify-and-mitigate postures, error-locality and quarantine-with-provenance principles, and the concrete activity-core mechanisms. Implement producer-agnostic guardrails in executor.py, applied uniformly on the happy path and the recovery path via _partition_items: structural-type -> schema -> structural caps (_MAX_DEPTH, _MAX_STRING_LEN) -> reference allow-list -> count cap. Each quarantine carries a reason. Closes the happy-path maxItems count cap deferred from T03 (valid 9-item report keeps 7, quarantines 2). Reference allow-list reads context["known_candidates"] via _allow_list_from_context; inert until a resolver populates it. SCOPE.md updated (executor bullet + ADR list); no INTENT drift. New tests: happy-path count cap, oversized-string guardrail, allow-list rejection. Full suite: 218 passed, 1 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| id | type | title | status | decided_by | date | scope | affects | tags |
|---|---|---|---|---|---|---|---|---|
| ACT-ADR-004 | architecture-decision-record | The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output | accepted | Bernd Worsch | 2026-06-26 | cross-repo | | |
ACT-ADR-004: The Producer Trust Boundary
Status
Accepted.
Context
On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called llm-connect successfully, and produced a long ranked recommendation list — but the JSON broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because the report was validated and consumed as a single monolithic JSON document, one malformed delimiter discarded the entire run, including the 7 perfectly good recommendations the model had already emitted. The scheduling and runtime layers were healthy; the failure was entirely at the seam where free-form model output meets a strict consumer.
This is not a one-off bug; it is a recurring class of failure. activity-core has a trust boundary wherever generative or human-authored output meets strict deterministic consumers: the JSON Schema validator, the task emitter, and any classic compute pipeline downstream. The producers on the other side of that boundary (LLMs, agents, and humans) are all untrusted. Their output may be:
- erroneous — hallucination, truncation at a token limit, drift, type slips, typos, a missing delimiter; or
- malicious — prompt injection, crafted payloads, or oversized / deeply-nested structures intended to exhaust or confuse the consumer.
The pre-existing design treated producer output optimistically: parse the whole document, validate the whole document, and on any failure discard the whole document (preserving only a bounded diagnostic preview). That gives zero error locality — the blast radius of any single defect is the entire activation.
Decision
Treat the producer→consumer seam as an explicit, adversarial trust boundary, and place guardrails plus error-correction tooling at that boundary rather than letting raw producer output flow into deterministic consumers.
Two non-fail-fast postures
When hard-failing on a problem is undesirable, there are two sound strategies, and they compose:
- A) Trust but handle exceptions (optimistic / reactive). Consume the output as-is; on exception, catch → repair → retry, or quarantine if recovery fails. Cheap on the happy path; blast radius depends entirely on how granular the catch is. Best when failures are rare and locally recoverable. Risk: failures surface late, possibly after partial side effects.
- B) Verify and mitigate (defensive / proactive). Validate, sanitize, clamp, and normalize the output to a known-good shape before it enters the pipeline — drop bad items, coerce types, bound sizes/depth, allow-list references — so the consumer only ever sees clean input. Higher upfront cost, smaller blast radius, no partial side effects. Best when failures are common or consequences are high.
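The contrast between the two postures can be sketched in a few lines. This is an illustration only: `consume_trusting`, `consume_verified`, `MAX_ITEMS`, and the item shape are hypothetical names, not part of activity-core.

```python
import json

MAX_ITEMS = 7  # hypothetical cap, for illustration only


def consume_trusting(raw: str) -> list:
    """Posture A: consume as-is and handle the exception if one occurs."""
    try:
        return json.loads(raw)["recommendations"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # The catch is whole-document, so the blast radius is the whole document.
        return []


def consume_verified(items: list) -> tuple:
    """Posture B: screen each item before it enters the pipeline."""
    kept, quarantined = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict) or not isinstance(item.get("rank"), int):
            quarantined.append({"index": index, "reason": "schema"})
        elif len(kept) >= MAX_ITEMS:
            quarantined.append({"index": index, "reason": "over_limit"})
        else:
            kept.append(item)
    return kept, quarantined
```

In the first function a single broken delimiter costs every recommendation; in the second, a bad item costs only itself, which is why the two are meant to compose rather than compete.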
Governing principles
- Push verification to the boundary; keep the interior strict. Apply posture B at the producer→consumer boundary; keep posture A for residual exceptions inside the verified core. Never relax the interior schema to absorb producer sloppiness.
- Make error locality match the unit of work. One bad recommendation must cost one recommendation, not the whole report. Structuring the payload so each item is independently parseable and validatable is the highest-leverage change.
- Quarantine, never silently drop. Invalid units are preserved as bounded, provenance-tagged artifacts (index, error, raw snippet, reason) so they can be debugged or replayed. Degraded-but-usable is reported distinctly from total loss.
- Both human and agent input get the same rigor. Guardrails are producer-agnostic: the same count / length / depth caps and reference allow-lists apply whether the producer is an LLM, an agent, or a human.
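A minimal sketch of what a bounded, provenance-tagged quarantine artifact might look like. The four fields follow the list in the quarantine principle; the `QuarantinedItem` class, the `quarantine` helper, and the `_MAX_SNIPPET` bound are assumptions made for illustration.

```python
from dataclasses import dataclass

_MAX_SNIPPET = 200  # hypothetical bound on the preserved raw text


@dataclass(frozen=True)
class QuarantinedItem:
    index: int   # position of the unit in the producer output
    error: str   # human-readable description of what went wrong
    raw: str     # bounded snippet of the offending raw text
    reason: str  # coarse category, e.g. "schema" or "guardrail"


def quarantine(index: int, raw: str, error: str, reason: str) -> QuarantinedItem:
    """Preserve an invalid unit as a bounded artifact instead of dropping it."""
    return QuarantinedItem(index=index, error=error, raw=raw[:_MAX_SNIPPET], reason=reason)
```

Bounding the snippet keeps a hostile, oversized payload from turning the quarantine store itself into an attack surface.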
What this means concretely in activity-core
Implemented in src/activity_core/rules/executor.py:
- Strict-structure-only schema. The daily-triage output schema is strict on per-item structure (required [rank, candidate, action, why], typed wsjf) and carries maxItems as a producer hint, never as a hard whole-document reject, which would reproduce the very blast-radius failure (ACT-ADR-002 governs the schema format; schemas/daily-triage-report.json).
- Item-granular recovery (posture B). When whole-document parse + one retry fail, _resilient_report recovers individually-parseable recommendation objects via a brace/quote-aware scanner (_extract_object_spans) that works for both pretty-printed and NDJSON output, attempts a best-effort _try_repair on a truncated tail, validates each recovered object against the item schema, and keeps the valid ones. Survivors are emitted with output_validated=true, partial=true, and review_required=true.
- Producer guardrails (_partition_items, applied on both the recovery and the happy path). Per recommendation: structural type → schema → structural caps (_MAX_DEPTH, _MAX_STRING_LEN) → reference allow-list → count cap (top-N by maxItems). The first failing check quarantines the item with provenance and a reason (malformed / schema / guardrail / allow_list / over_limit).
- Reference allow-list. A recommendation whose candidate is not in the set of known ids is quarantined. The set is sourced from resolved context (context["known_candidates"], via _allow_list_from_context); the check is inert until a context resolver populates it, so the capability ships now and activates with a one-line resolver change.
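The ordering of the guardrail checks can be sketched as follows. This is not the actual `_partition_items`: the threshold values, the helper names, and the simplified schema check are assumptions, and the sketch keeps the first N passing items in emitted order as a stand-in for top-N. Only the check order and the reason vocabulary are taken from the text above.

```python
_MAX_DEPTH = 4          # assumed value; the ADR does not state the real default
_MAX_STRING_LEN = 2000  # assumed value
_REQUIRED = ("rank", "candidate", "action", "why")


def _depth(value, level=1):
    """Nesting depth of dicts and lists; scalars sit at the current level."""
    if isinstance(value, dict):
        return max((_depth(v, level + 1) for v in value.values()), default=level)
    if isinstance(value, list):
        return max((_depth(v, level + 1) for v in value), default=level)
    return level


def _strings(value):
    """Yield every string nested anywhere inside value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from _strings(v)


def allow_list_from_context(context):
    """Known candidate ids as a set, or None (inert) when none were resolved."""
    known = context.get("known_candidates")
    return set(known) if known else None


def first_failure(item, allow_list):
    """Return the reason of the first failing check, or None if the item passes."""
    if not isinstance(item, dict):
        return "malformed"
    if any(k not in item for k in _REQUIRED) or not isinstance(item["candidate"], str):
        return "schema"
    if _depth(item) > _MAX_DEPTH or any(len(s) > _MAX_STRING_LEN for s in _strings(item)):
        return "guardrail"
    if allow_list is not None and item["candidate"] not in allow_list:
        return "allow_list"
    return None


def partition(items, max_items, allow_list=None):
    """Split items into kept and quarantined; the count cap is applied last."""
    kept, quarantined = [], []
    for index, item in enumerate(items):
        reason = first_failure(item, allow_list)
        if reason is None and len(kept) >= max_items:
            reason = "over_limit"
        if reason is None:
            kept.append(item)
        else:
            quarantined.append({"index": index, "reason": reason})
    return kept, quarantined
```

With `max_items=7`, a valid nine-item report keeps seven items and quarantines two as `over_limit`. Passing `allow_list=None` leaves the reference check inert.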
Where each posture sits
| Layer | Posture | Mechanism |
|---|---|---|
| Schema / contract | B | strict per-item structure; maxItems as hint |
| Whole-document parse | A | tolerant parse + single retry |
| Failed parse | B | item-granular recovery + repair + quarantine |
| Per-item screening | B | schema + depth/length caps + allow-list + count cap |
| Emitted report | — | partial / quarantined_* provenance; never silent |
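The "Failed parse" row depends on finding individually parseable objects inside text that no longer parses as a whole. One way such a brace/quote-aware scan could work is sketched below. It is a hypothetical stand-in, not the real `_extract_object_spans`, and it skips unparseable spans where a real implementation would quarantine them.

```python
import json


def extract_object_spans(text: str) -> list:
    """Return (start, end) spans of the outermost complete {...} objects."""
    spans = []
    stack = []          # offsets of currently open '{'
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            # Braces inside string literals must not affect nesting.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append(pos)
        elif ch == "}" and stack:
            start = stack.pop()
            # Drop spans nested inside the object that just closed.
            spans = [s for s in spans if s[0] < start]
            spans.append((start, pos + 1))
    return spans


def recover_items(text: str) -> list:
    """Parse each complete span on its own; skip spans that still fail."""
    items = []
    for start, end in extract_object_spans(text):
        try:
            items.append(json.loads(text[start:end]))
        except json.JSONDecodeError:
            continue  # a real implementation would quarantine this span
    return items
```

Because an unclosed wrapper object never produces a span, the same scan handles a truncated pretty-printed document and truncated NDJSON alike: only the objects that closed before the break are returned.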
Consequences
- A single malformed or oversized item no longer discards an entire activation; the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid recommendations and quarantine the broken tail.
- Reports gain a partial / quarantined_* vocabulary; downstream report sinks and reviewers can distinguish degraded-but-usable from total loss.
- Guardrail thresholds (_MAX_DEPTH, _MAX_STRING_LEN, maxItems, the allow-list) are policy knobs that will need tuning; they are intentionally conservative defaults, not a finished calibration.
- Known retention gap (follow-on): LLMConnectClient.complete() still returns only content, discarding finish_reason / usage, and the total-loss artifact caps raw output below realistic break points. Capturing those signals so failures stay debuggable is tracked as a retention fix, not closed by this ADR.
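As an illustration of the retention follow-on, a completion result could carry the discarded signals alongside the content. `CompletionResult` and `truncated` are hypothetical names. Treating a `finish_reason` of `"length"` as truncation follows the convention of OpenAI-compatible chat APIs and is an assumption about what llm-connect would report.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class CompletionResult:
    content: str
    finish_reason: Optional[str] = None                   # e.g. "stop" or "length" (assumed)
    usage: Dict[str, int] = field(default_factory=dict)   # token counts, if reported

    @property
    def truncated(self) -> bool:
        """True when the producer reports stopping at a token limit."""
        return self.finish_reason == "length"
```

A consumer holding such a result could tell a token-limit truncation apart from a genuinely malformed document before attempting any repair.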
Alternatives considered
- Hard-enforce maxItems in the validator. Rejected: a hard reject of an over-count document reproduces the whole-document blast radius. Mitigation (keep top-N, quarantine the rest) is preferred.
- Relax the schema to accept anything. Rejected: violates the first governing principle (keep the interior strict) and pushes malformed data into downstream consumers.
- Retry-until-valid only (pure posture A). Rejected as the sole strategy: the 2026-06-26 failure recurred across both the initial attempt and the retry, so retry alone does not bound the blast radius.
References
- ACT-ADR-002 — markdown-as-definition format and output schema governance.
- ACT-ADR-003 — Rule vs. Instruction model; the Instruction prompt-injection surface this boundary complements on the output side.
- workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md — the implementing workplan.