feat(ACTIVITY-WP-0016-T04): producer trust-boundary guardrails + ADR-004

Add ADR-004 documenting the producer trust boundary: untrusted producers (LLM,
agent, human; erroneous and malicious), the trust-but-handle vs verify-and-mitigate
postures, error-locality and quarantine-with-provenance principles, and the concrete
activity-core mechanisms.

Implement producer-agnostic guardrails in executor.py, applied uniformly on the
happy path and the recovery path via _partition_items: structural-type -> schema ->
structural caps (_MAX_DEPTH, _MAX_STRING_LEN) -> reference allow-list -> count cap.
Each quarantined item carries a reason. Implements the happy-path maxItems count cap
deferred from T03 (a valid 9-item report keeps 7 and quarantines 2). The reference
allow-list reads context["known_candidates"] via _allow_list_from_context and stays
inert until a resolver populates it. SCOPE.md updated (executor bullet + ADR list);
no INTENT drift.

New tests: happy-path count cap, oversized-string guardrail, allow-list rejection.
Full suite: 218 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 18:10:17 +02:00
parent c5440e8429
commit 9be4ddbdb7
5 changed files with 373 additions and 12 deletions

@@ -64,7 +64,9 @@ The two evaluation modes:
`context.*` / `event.*` interpolation and explicit `for_each` per-item
binding. No `exec()`.
- **Instruction executor**: trusted-field prompt rendering, LLM call via
-llm-connect, structured output validation, bounded validation-failure
+llm-connect, structured output validation, item-granular recovery with a
+quarantine lane and producer guardrails (count/length/depth caps, reference
+allow-list) at the producer trust boundary, bounded validation-failure
artifacts for report instructions, review-required audit metadata, and
deterministic report sinks. A real downstream review queue is not implemented
in this repo.
@@ -320,6 +322,9 @@ new one-off control paths.
governance model, event type schema, ActivityDefinition structure.
- `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
model, evaluation semantics, audit trail, testing strategy.
+- `docs/adr/adr-004-producer-trust-boundary.md` — untrusted-producer premise,
+trust-but-handle vs verify-and-mitigate postures, error-locality and
+quarantine-with-provenance, producer guardrails for LLM/agent/human output.
---