generated from coulomb/repo-seed
feat(ACTIVITY-WP-0016-T04): producer trust-boundary guardrails + ADR-004
Add ADR-004 documenting the producer trust boundary: untrusted producers (LLM, agent, human; erroneous and malicious), the trust-but-handle vs verify-and-mitigate postures, error-locality and quarantine-with-provenance principles, and the concrete activity-core mechanisms. Implement producer-agnostic guardrails in executor.py, applied uniformly on the happy path and the recovery path via _partition_items: structural-type -> schema -> structural caps (_MAX_DEPTH, _MAX_STRING_LEN) -> reference allow-list -> count cap. Each quarantine carries a reason. Closes the happy-path maxItems count cap deferred from T03 (valid 9-item report keeps 7, quarantines 2). Reference allow-list reads context["known_candidates"] via _allow_list_from_context; inert until a resolver populates it. SCOPE.md updated (executor bullet + ADR list); no INTENT drift. New tests: happy-path count cap, oversized-string guardrail, allow-list rejection. Full suite: 218 passed, 1 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
7
SCOPE.md
7
SCOPE.md
@@ -64,7 +64,9 @@ The two evaluation modes:
|
||||
`context.*` / `event.*` interpolation and explicit `for_each` per-item
|
||||
binding. No `exec()`.
|
||||
- **Instruction executor**: trusted-field prompt rendering, LLM call via
|
||||
llm-connect, structured output validation, bounded validation-failure
|
||||
llm-connect, structured output validation, item-granular recovery with a
|
||||
quarantine lane and producer guardrails (count/length/depth caps, reference
|
||||
allow-list) at the producer trust boundary, bounded validation-failure
|
||||
artifacts for report instructions, review-required audit metadata, and
|
||||
deterministic report sinks. A real downstream review queue is not implemented
|
||||
in this repo.
|
||||
@@ -320,6 +322,9 @@ new one-off control paths.
|
||||
governance model, event type schema, ActivityDefinition structure.
|
||||
- `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
|
||||
model, evaluation semantics, audit trail, testing strategy.
|
||||
- `docs/adr/adr-004-producer-trust-boundary.md` — untrusted-producer premise,
|
||||
trust-but-handle vs verify-and-mitigate postures, error-locality and
|
||||
quarantine-with-provenance, producer guardrails for LLM/agent/human output.
|
||||
|
||||
---
|
||||
|
||||
|
||||
156
docs/adr/adr-004-producer-trust-boundary.md
Normal file
156
docs/adr/adr-004-producer-trust-boundary.md
Normal file
@@ -0,0 +1,156 @@
|
||||
---
|
||||
id: ACT-ADR-004
|
||||
type: architecture-decision-record
|
||||
title: "The Producer Trust Boundary — Guardrails and Error-Correction for Untrusted Output"
|
||||
status: accepted
|
||||
decided_by: Bernd Worsch
|
||||
date: "2026-06-26"
|
||||
scope: cross-repo
|
||||
affects:
|
||||
- activity-core
|
||||
- rules-core (future extraction)
|
||||
tags: ["architecture", "llm", "safety", "validation", "guardrails", "trust-boundary", "resilience"]
|
||||
---
|
||||
|
||||
# ACT-ADR-004: The Producer Trust Boundary
|
||||
|
||||
## Status
|
||||
|
||||
Accepted.
|
||||
|
||||
## Context
|
||||
|
||||
On 2026-06-26 the scheduled daily WSJF triage instruction fired on time, called
|
||||
llm-connect successfully, and produced a long ranked recommendation list — but
|
||||
the JSON broke at char 5268 (~rank 8–9 of ~16), failing schema validation. Because
|
||||
the report was validated and consumed as a single monolithic JSON document, one
|
||||
malformed delimiter discarded the **entire** run, including the 7 perfectly good
|
||||
recommendations the model had already emitted. The scheduling and runtime layers
|
||||
were healthy; the failure was entirely at the seam where free-form model output
|
||||
meets a strict consumer.
|
||||
|
||||
This is not a one-off bug, it is a recurring class. activity-core has a **trust
|
||||
boundary** wherever generative or human-authored output meets strict deterministic
|
||||
consumers: the JSON Schema validator, the task emitter, and any classic compute
|
||||
pipeline downstream. The producers on the other side of that boundary — **LLMs,
|
||||
agents, and humans** — are all *untrusted producers*. Their output may be:
|
||||
|
||||
- **erroneous** — hallucination, truncation at a token limit, drift, type slips,
|
||||
typos, a missing delimiter; or
|
||||
- **malicious** — prompt injection, crafted payloads, or oversized / deeply-nested
|
||||
structures intended to exhaust or confuse the consumer.
|
||||
|
||||
The pre-existing design treated producer output optimistically: parse the whole
|
||||
document, validate the whole document, and on any failure discard the whole
|
||||
document (preserving only a bounded diagnostic preview). That gives **zero error
|
||||
locality** — the blast radius of any single defect is the entire activation.
|
||||
|
||||
## Decision
|
||||
|
||||
Treat the producer→consumer seam as an explicit, adversarial **trust boundary**,
|
||||
and place guardrails plus error-correction tooling *at that boundary* rather than
|
||||
letting raw producer output flow into deterministic consumers.
|
||||
|
||||
### Two non-fail-fast postures
|
||||
|
||||
When hard-failing on a problem is undesirable, there are two sound strategies, and
|
||||
they **compose**:
|
||||
|
||||
- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
|
||||
as-is; on exception, catch → repair → retry → or quarantine. Cheap on the happy
|
||||
path; blast radius depends entirely on how granular the catch is. Best when
|
||||
failures are rare and locally recoverable. Risk: failures surface late, possibly
|
||||
after partial side effects.
|
||||
- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
|
||||
and normalize the output to a known-good shape *before* it enters the pipeline —
|
||||
drop bad items, coerce types, bound sizes/depth, allow-list references — so the
|
||||
consumer only ever sees clean input. Higher upfront cost, smaller blast radius,
|
||||
no partial side effects. Best when failures are common or consequences are high.
|
||||
|
||||
### Governing principles
|
||||
|
||||
1. **Push verification to the boundary; keep the interior strict.** Apply posture
|
||||
**B** at the producer→consumer boundary; keep posture **A** for residual
|
||||
exceptions inside the verified core. Never relax the interior schema to absorb
|
||||
producer sloppiness.
|
||||
2. **Make error locality match the unit of work.** One bad recommendation must
|
||||
cost one recommendation, not the whole report. Structuring the payload so each
|
||||
item is independently parseable and validatable is the highest-leverage change.
|
||||
3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
|
||||
provenance-tagged artifacts (`index`, `error`, `raw` snippet, `reason`) so they
|
||||
can be debugged or replayed. Degraded-but-usable is reported distinctly from
|
||||
total loss.
|
||||
4. **Both human and agent input get the same rigor.** Guardrails are
|
||||
producer-agnostic: the same count / length / depth caps and reference
|
||||
allow-lists apply whether the producer is an LLM, an agent, or a human.
|
||||
|
||||
### What this means concretely in activity-core
|
||||
|
||||
Implemented in `src/activity_core/rules/executor.py`:
|
||||
|
||||
- **Strict-structure-only schema.** The daily-triage output schema is strict on
|
||||
per-item *structure* (`required [rank, candidate, action, why]`, typed `wsjf`)
|
||||
and carries `maxItems` as a producer *hint* — never as a hard whole-document
|
||||
reject, which would reproduce the very blast-radius failure (ACT-ADR-002 governs
|
||||
the schema format; `schemas/daily-triage-report.json`).
|
||||
- **Item-granular recovery (posture B).** When whole-document parse + one retry
|
||||
fail, `_resilient_report` recovers individually-parseable recommendation objects
|
||||
via a brace/quote-aware scanner (`_extract_object_spans`) that works for both
|
||||
pretty-printed and NDJSON output, attempts a best-effort `_try_repair` on a
|
||||
truncated tail, validates each recovered object against the item schema, and
|
||||
keeps the valid ones. Survivors are emitted with `output_validated=true`,
|
||||
`partial=true`, and `review_required=true`.
|
||||
- **Producer guardrails (`_partition_items`, applied on both the recovery and the
|
||||
happy path).** Per recommendation: structural type → schema → structural caps
|
||||
(`_MAX_DEPTH`, `_MAX_STRING_LEN`) → reference allow-list → count cap (top-N by
|
||||
`maxItems`). The first failing check quarantines the item with provenance and a
|
||||
`reason` (`malformed` / `schema` / `guardrail` / `allow_list` / `over_limit`).
|
||||
- **Reference allow-list.** A recommendation whose `candidate` is not in the set of
|
||||
known ids is quarantined. The set is sourced from resolved context
|
||||
(`context["known_candidates"]`, via `_allow_list_from_context`); the check is
|
||||
inert until a context resolver populates it, so the capability ships now and
|
||||
activates with a one-line resolver change.
|
||||
|
||||
### Where each posture sits
|
||||
|
||||
| Layer | Posture | Mechanism |
|
||||
|-------|---------|-----------|
|
||||
| Schema / contract | B | strict per-item structure; `maxItems` as hint |
|
||||
| Whole-document parse | A | tolerant parse + single retry |
|
||||
| Failed parse | B | item-granular recovery + repair + quarantine |
|
||||
| Per-item screening | B | schema + depth/length caps + allow-list + count cap |
|
||||
| Emitted report | — | `partial` / `quarantined_*` provenance; never silent |
|
||||
|
||||
## Consequences
|
||||
|
||||
- A single malformed or oversized item no longer discards an entire activation;
|
||||
the daily-triage run that failed on 2026-06-26 would now deliver its 7 valid
|
||||
recommendations and quarantine the broken tail.
|
||||
- Reports gain a `partial` / `quarantined_*` vocabulary; downstream report sinks
|
||||
and reviewers can distinguish degraded-but-usable from total loss.
|
||||
- Guardrail thresholds (`_MAX_DEPTH`, `_MAX_STRING_LEN`, `maxItems`, the
|
||||
allow-list) are policy knobs that will need tuning; they are intentionally
|
||||
conservative defaults, not a finished calibration.
|
||||
- **Known retention gap (follow-on):** `LLMConnectClient.complete()` still returns
|
||||
only `content`, discarding `finish_reason`/`usage`, and the total-loss artifact
|
||||
caps raw output below realistic break points. Capturing those signals so
|
||||
failures stay debuggable is tracked as a retention fix, not closed by this ADR.
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
- **Hard-enforce `maxItems` in the validator.** Rejected: a hard reject of an
|
||||
over-count document reproduces the whole-document blast radius. Mitigation (keep
|
||||
top-N, quarantine the rest) is preferred.
|
||||
- **Relax the schema to accept anything.** Rejected: violates principle 1; pushes
|
||||
malformed data into downstream consumers.
|
||||
- **Retry-until-valid only (pure posture A).** Rejected as the sole strategy: the
|
||||
2026-06-26 failure recurred across both the initial attempt and the retry, so
|
||||
retry alone does not bound the blast radius.
|
||||
|
||||
## References
|
||||
|
||||
- ACT-ADR-002 — markdown-as-definition format and output schema governance.
|
||||
- ACT-ADR-003 — Rule vs. Instruction model; the Instruction prompt-injection
|
||||
surface this boundary complements on the output side.
|
||||
- `workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md` — the
|
||||
implementing workplan.
|
||||
@@ -160,15 +160,20 @@ def _execute(
|
||||
prompt_hash = hashlib.sha256(rendered.encode()).hexdigest()
|
||||
llm_config = _llm_run_config(instr)
|
||||
|
||||
# Reference allow-list (WP-0016-T04): if a context resolver supplied the set
|
||||
# of known candidate ids, recommendations pointing at anything else are
|
||||
# quarantined. Absent (None) today → the check is inert until wired.
|
||||
allow_list = _allow_list_from_context(context)
|
||||
|
||||
# Step 3 — call LLM
|
||||
raw_output = llm_client.complete(rendered, model=instr.model, config=llm_config)
|
||||
|
||||
# Step 4 — validate and optionally retry
|
||||
task_specs, report, error = _validate_output(raw_output, instr)
|
||||
task_specs, report, error = _validate_output(raw_output, instr, allow_list)
|
||||
if error:
|
||||
retry_prompt = rendered + f"\n\nPrevious output was invalid: {error}\nPlease fix."
|
||||
raw_output = llm_client.complete(retry_prompt, model=instr.model, config=llm_config)
|
||||
task_specs, report, error = _validate_output(raw_output, instr)
|
||||
task_specs, report, error = _validate_output(raw_output, instr, allow_list)
|
||||
if error:
|
||||
# Truncate to keep log volume bounded but long enough to see the
|
||||
# actual JSON shape mismatch (typical reports are <2KB).
|
||||
@@ -181,7 +186,9 @@ def _execute(
|
||||
# Posture B (WP-0016-T03): try to recover a partial-but-usable
|
||||
# report from individually-parseable items before declaring total
|
||||
# loss. One bad item should cost one item, not the whole report.
|
||||
recovered = _resilient_report(instr, raw_output, error, prompt_hash)
|
||||
recovered = _resilient_report(
|
||||
instr, raw_output, error, prompt_hash, allow_list,
|
||||
)
|
||||
if recovered is not None:
|
||||
return recovered
|
||||
failure_report = _invalid_output_report(instr, error, raw_output)
|
||||
@@ -297,6 +304,12 @@ def _invalid_output_report(
|
||||
|
||||
_QUARANTINE_LIMIT = 20
|
||||
_SNIPPET_LIMIT = 200
|
||||
# Producer guardrails (ACTIVITY-WP-0016-T04): structural bounds applied to every
|
||||
# recommendation regardless of producer (LLM, agent, or human). These are
|
||||
# verify-and-mitigate limits — an offending item is quarantined, never allowed to
|
||||
# fail the whole report or flow unbounded into a downstream consumer.
|
||||
_MAX_STRING_LEN = 4000
|
||||
_MAX_DEPTH = 8
|
||||
_SUMMARY_RE = re.compile(r'"summary"\s*:\s*"((?:[^"\\]|\\.)*)"')
|
||||
|
||||
|
||||
@@ -305,6 +318,51 @@ def _snippet(value: Any) -> str:
|
||||
return text[:_SNIPPET_LIMIT]
|
||||
|
||||
|
||||
def _json_depth(value: Any, depth: int = 1) -> int:
|
||||
if depth > _MAX_DEPTH:
|
||||
return depth
|
||||
if isinstance(value, dict):
|
||||
return max((_json_depth(v, depth + 1) for v in value.values()), default=depth)
|
||||
if isinstance(value, list):
|
||||
return max((_json_depth(v, depth + 1) for v in value), default=depth)
|
||||
return depth
|
||||
|
||||
|
||||
def _has_oversized_string(value: Any) -> bool:
|
||||
if isinstance(value, str):
|
||||
return len(value) > _MAX_STRING_LEN
|
||||
if isinstance(value, dict):
|
||||
return any(_has_oversized_string(v) for v in value.values())
|
||||
if isinstance(value, list):
|
||||
return any(_has_oversized_string(v) for v in value)
|
||||
return False
|
||||
|
||||
|
||||
def _item_structure_error(item: Any) -> str | None:
|
||||
"""Producer-agnostic structural guardrail: depth and string-length caps."""
|
||||
if _json_depth(item) > _MAX_DEPTH:
|
||||
return f"exceeds max nesting depth {_MAX_DEPTH}"
|
||||
if _has_oversized_string(item):
|
||||
return f"contains a string longer than {_MAX_STRING_LEN} chars"
|
||||
return None
|
||||
|
||||
|
||||
def _allow_list_from_context(context: dict | None) -> set[str] | None:
|
||||
"""Build the recommendation-candidate allow-list from resolved context.
|
||||
|
||||
Looks for `context["known_candidates"]` (a list/set of valid candidate ids).
|
||||
Returns None when absent so the allow-list check stays inert until a context
|
||||
resolver populates it — the guardrail capability ships now; activation is a
|
||||
one-line resolver change.
|
||||
"""
|
||||
if not isinstance(context, dict):
|
||||
return None
|
||||
known = context.get("known_candidates")
|
||||
if isinstance(known, (list, set, tuple)):
|
||||
return {str(item) for item in known}
|
||||
return None
|
||||
|
||||
|
||||
def _report_contract(instr: Any) -> tuple[dict[str, Any] | None, int | None]:
|
||||
"""Extract (item_schema, max_items) for the recommendations list, if any."""
|
||||
try:
|
||||
@@ -440,20 +498,53 @@ def _partition_items(
|
||||
items: list[dict[str, Any]],
|
||||
item_schema: dict[str, Any] | None,
|
||||
max_items: int | None,
|
||||
*,
|
||||
run_schema: bool = True,
|
||||
allow_list: set[str] | None = None,
|
||||
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
|
||||
"""Split items into (valid, quarantined): schema-invalid then over-limit."""
|
||||
"""Screen items into (valid, quarantined).
|
||||
|
||||
Applied uniformly to recovered items (run_schema=True) and to already
|
||||
schema-valid happy-path items (run_schema=False). Order of checks: structural
|
||||
type → schema → producer guardrails (depth/length) → reference allow-list →
|
||||
count cap. The first failing check quarantines the item with provenance.
|
||||
"""
|
||||
valid: list[dict[str, Any]] = []
|
||||
quarantined: list[dict[str, Any]] = []
|
||||
for index, item in enumerate(items):
|
||||
error = (
|
||||
if not isinstance(item, dict):
|
||||
quarantined.append(
|
||||
{"index": index, "error": "item is not a JSON object",
|
||||
"raw": _snippet(item), "reason": "malformed"}
|
||||
)
|
||||
continue
|
||||
schema_error = (
|
||||
_validate_schema_node(item, item_schema, f"recommendations[{index}]")
|
||||
if item_schema
|
||||
if (run_schema and item_schema)
|
||||
else None
|
||||
)
|
||||
if error:
|
||||
quarantined.append({"index": index, "error": error, "raw": _snippet(item)})
|
||||
else:
|
||||
valid.append(item)
|
||||
if schema_error:
|
||||
quarantined.append(
|
||||
{"index": index, "error": schema_error, "raw": _snippet(item),
|
||||
"reason": "schema"}
|
||||
)
|
||||
continue
|
||||
structure_error = _item_structure_error(item)
|
||||
if structure_error:
|
||||
quarantined.append(
|
||||
{"index": index, "error": structure_error, "raw": _snippet(item),
|
||||
"reason": "guardrail"}
|
||||
)
|
||||
continue
|
||||
if allow_list is not None:
|
||||
candidate = item.get("candidate")
|
||||
if not isinstance(candidate, str) or candidate not in allow_list:
|
||||
quarantined.append(
|
||||
{"index": index, "error": f"candidate {candidate!r} not in allow-list",
|
||||
"raw": _snippet(item), "reason": "allow_list"}
|
||||
)
|
||||
continue
|
||||
valid.append(item)
|
||||
if max_items is not None and len(valid) > max_items:
|
||||
for item in valid[max_items:]:
|
||||
quarantined.append(
|
||||
@@ -469,6 +560,7 @@ def _resilient_report(
|
||||
raw_output: Any,
|
||||
original_error: str,
|
||||
prompt_hash: str | None,
|
||||
allow_list: set[str] | None = None,
|
||||
) -> InstructionResult | None:
|
||||
"""Recover a partial-but-usable report from output that failed validation.
|
||||
|
||||
@@ -481,7 +573,9 @@ def _resilient_report(
|
||||
summary, items, quarantined = _recover_recommendations(raw_output)
|
||||
if not items:
|
||||
return None
|
||||
valid, item_quarantine = _partition_items(items, item_schema, max_items)
|
||||
valid, item_quarantine = _partition_items(
|
||||
items, item_schema, max_items, allow_list=allow_list,
|
||||
)
|
||||
quarantined.extend(item_quarantine)
|
||||
if not valid:
|
||||
return None
|
||||
@@ -528,6 +622,7 @@ def _execution_failure_report(instr: Any, error: str) -> dict[str, Any] | None:
|
||||
def _validate_output(
|
||||
raw_output: Any,
|
||||
instr: Any,
|
||||
allow_list: set[str] | None = None,
|
||||
) -> tuple[list[TaskSpec], dict[str, Any] | None, str | None]:
|
||||
"""Parse raw LLM output into TaskSpecs and optional report payload.
|
||||
|
||||
@@ -582,6 +677,28 @@ def _validate_output(
|
||||
source_type="instruction",
|
||||
source_id=instr.id,
|
||||
))
|
||||
|
||||
# Happy-path producer guardrails (WP-0016-T04): the whole document already
|
||||
# passed schema validation, so recommendations are schema-valid; still apply
|
||||
# the count cap, structural caps, and reference allow-list, quarantining any
|
||||
# offenders rather than emitting them. Report shape only changes when an item
|
||||
# is actually quarantined.
|
||||
if isinstance(report, dict) and isinstance(report.get("recommendations"), list):
|
||||
item_schema, max_items = _report_contract(instr)
|
||||
kept, quarantined = _partition_items(
|
||||
report["recommendations"], item_schema, max_items,
|
||||
run_schema=False, allow_list=allow_list,
|
||||
)
|
||||
if quarantined:
|
||||
report = {
|
||||
**report,
|
||||
"recommendations": kept,
|
||||
"status": "partial",
|
||||
"partial": True,
|
||||
"quarantined_count": len(quarantined),
|
||||
"quarantined_items": quarantined[:_QUARANTINE_LIMIT],
|
||||
}
|
||||
|
||||
return specs, report, None
|
||||
except (json.JSONDecodeError, AttributeError, KeyError, TypeError) as exc:
|
||||
return [], None, str(exc)
|
||||
|
||||
@@ -475,6 +475,62 @@ def test_resilient_report_quarantines_one_bad_item_among_valid():
|
||||
assert "rank" in result.report["quarantined_items"][0]["error"]
|
||||
|
||||
|
||||
# ── WP-0016-T04 producer guardrails ───────────────────────────────────────────
|
||||
|
||||
def _triage_instr() -> SimpleNamespace:
|
||||
return _instr(
|
||||
id="daily-triage-report",
|
||||
prompt="Report.",
|
||||
trusted_fields=[],
|
||||
output_schema="schemas/daily-triage-report.json",
|
||||
report_sinks=[{"type": "working-memory"}],
|
||||
)
|
||||
|
||||
|
||||
def test_guardrail_count_cap_on_valid_happy_path():
|
||||
# 9 fully-valid recommendations in a syntactically valid document: schema
|
||||
# validation passes, but the maxItems=7 count cap must keep 7 and quarantine 2.
|
||||
recs = [_valid_rec(i) for i in range(1, 10)]
|
||||
raw = json.dumps({"summary": "Triage.", "recommendations": recs})
|
||||
llm = _CountingLLM([raw])
|
||||
|
||||
result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
|
||||
|
||||
assert llm.call_count == 1 # no retry — the document was valid
|
||||
assert result.report["partial"] is True
|
||||
assert len(result.report["recommendations"]) == 7
|
||||
assert result.report["quarantined_count"] == 2
|
||||
assert all(q["reason"] == "over_limit" for q in result.report["quarantined_items"])
|
||||
|
||||
|
||||
def test_guardrail_oversized_string_quarantined():
|
||||
big = _valid_rec(2)
|
||||
big["why"] = "x" * 5000 # exceeds _MAX_STRING_LEN
|
||||
raw = json.dumps({"summary": "Triage.", "recommendations": [_valid_rec(1), big]})
|
||||
llm = _CountingLLM([raw])
|
||||
|
||||
result = execute_instruction_with_audit(_triage_instr(), _Event(), {}, llm)
|
||||
|
||||
assert len(result.report["recommendations"]) == 1
|
||||
assert result.report["quarantined_count"] == 1
|
||||
assert result.report["quarantined_items"][0]["reason"] == "guardrail"
|
||||
|
||||
|
||||
def test_guardrail_allow_list_rejects_unknown_candidate():
|
||||
raw = json.dumps({
|
||||
"summary": "Triage.",
|
||||
"recommendations": [_valid_rec(1), _valid_rec(2)], # candidates WS-1, WS-2
|
||||
})
|
||||
llm = _CountingLLM([raw])
|
||||
context = {"known_candidates": ["WS-1"]}
|
||||
|
||||
result = execute_instruction_with_audit(_triage_instr(), _Event(), context, llm)
|
||||
|
||||
assert len(result.report["recommendations"]) == 1
|
||||
assert result.report["recommendations"][0]["candidate"] == "WS-1"
|
||||
assert result.report["quarantined_items"][0]["reason"] == "allow_list"
|
||||
|
||||
|
||||
def test_execute_instruction_with_audit_preserves_invalid_report_with_sinks(
|
||||
tmp_path,
|
||||
monkeypatch,
|
||||
|
||||
@@ -297,6 +297,33 @@ Done when:
|
||||
- SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance
|
||||
changes the documented contract.
|
||||
|
||||
2026-06-26 progress:
|
||||
|
||||
- **ADR-004 written** — `docs/adr/adr-004-producer-trust-boundary.md` documents
|
||||
the untrusted-producer premise (erroneous + malicious; LLM/agent/human), the
|
||||
A-vs-B posture taxonomy, the four governing principles, the concrete
|
||||
activity-core mechanisms, a posture-by-layer table, consequences, and
|
||||
alternatives considered. Accepted, scope cross-repo.
|
||||
- **Producer guardrails implemented** in `executor.py`, applied uniformly on the
|
||||
happy path *and* the recovery path via `_partition_items`: per-item order is
|
||||
structural-type → schema → structural caps (`_MAX_DEPTH=8`,
|
||||
`_MAX_STRING_LEN=4000`) → reference allow-list → count cap (`maxItems`). Each
|
||||
quarantine carries a `reason` (`malformed`/`schema`/`guardrail`/`allow_list`/
|
||||
`over_limit`).
|
||||
- **Happy-path count cap closed** (the item deferred from T03): a syntactically
|
||||
valid 9-item report now keeps 7 and quarantines 2 as `over_limit`, emitting a
|
||||
`partial` report — without a retry.
|
||||
- **Reference allow-list wired but inert.** `_allow_list_from_context` reads
|
||||
`context["known_candidates"]`; when present, recommendations with an unknown
|
||||
`candidate` are quarantined (`reason: allow_list`). Absent today → check is
|
||||
inert; activation is a one-line context-resolver change. Keeps the guardrail
|
||||
producer-agnostic (principle #4) and ready.
|
||||
- **SCOPE.md updated** — instruction-executor bullet now names the quarantine
|
||||
lane + guardrails; ADR-004 added to the Architecture Decisions list. No INTENT
|
||||
drift: this hardens the existing output contract, it does not extend scope.
|
||||
- New tests: happy-path count cap, oversized-string guardrail, allow-list
|
||||
rejection (all green).
|
||||
|
||||
## Tests + Calibration Re-Entry
|
||||
|
||||
```task
|
||||
|
||||
Reference in New Issue
Block a user