Add WP-0016 to make the instruction-executor output contract robust after the 2026-06-26 daily-triage validation failure (one malformed delimiter discarded a whole report). Per-item framing for error locality, verify-and-mitigate boundary parsing with a quarantine lane, producer-trust-boundary guardrails (ADR-004), and regression/calibration tests. Unblocks WP-0006-T03 / WP-0010-T04. Also record the 06-26 recheck outcome (streak reset at two) in WP-0006-T03. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
10 KiB
id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_id
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| ACTIVITY-WP-0016 | workplan | LLM Output Robustness & The Producer Trust Boundary | custodian | activity-core | proposed | codex | custodian | 2026-06-26 | 2026-06-26 | 4ef0d53b-1777-41ae-80c6-1b69fdb34726 |
ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary
Context
On 2026-06-26 the scheduled daily-statehub-wsjf-triage instruction fired on
time (daily_triage event 05:20:57Z) but its output failed schema
validation: Expecting ',' delimiter: line 136 column 22 (char 5268). The
model emitted a long ranked WSJF recommendation list (reached rank 7+ with
nested wsjf objects) and the JSON broke deep in that list. Because the report
is a single monolithic JSON document, one malformed delimiter discarded the
entire run. This reset the three-clean-consecutive-scheduled-runs streak in
ACTIVITY-WP-0006-T03 (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the
LLM-output-quality surface deferred from ACTIVITY-WP-0010.
The scheduling/runtime layer is healthy — this is purely an output-robustness
and boundary-design problem. Today's code (src/activity_core/rules/executor.py)
already: passes the output schema to llm-connect as a json_schema model param
(_llm_run_config), retries once, runs a fenced/raw_decode tolerant parser
(_parse_json_output), and preserves a bounded 4000-char preview on hard
failure (_invalid_output_report). None of that helps when error locality is
zero: the failure unit is the whole document, not the offending item.
Design Frame — The Producer Trust Boundary
This workplan is anchored to a deliberate architectural stance, not just a bug fix. Capture it in an ADR (T04) so future work inherits it.
Premise. activity-core has a trust boundary where free-form producer output meets strict deterministic consumers (JSON Schema validators, the task emitter, classic compute pipelines). The producers are LLMs and humans (and agents acting for either). Both are untrusted producers: their output may be
- erroneous — hallucination, truncation (token-limit cutoff), drift, type slips, typos; or
- malicious — prompt injection, crafted payloads, oversized/deeply-nested structures aimed at exhausting or confusing the consumer.
The architecture should treat the boundary as an adversarial frontier and place guardrails + error-correction tooling there, rather than letting raw producer output flow into deterministic consumers and fail (or worse, partially succeed) downstream.
Two non-fail-fast postures. When we do not want to hard-fail on a problem, there are two sensible strategies — and they compose:
- A) Trust but handle exceptions (optimistic / reactive). Consume the output as-is; on exception, catch → repair → retry → or quarantine. Cheap on the happy path. Blast radius depends entirely on how granular the catch is. Good when failures are rare and locally recoverable. Risk: failures surface late, possibly after partial side effects.
- B) Verify and mitigate (defensive / proactive). Validate, sanitize, clamp, and normalize the output to a known-good shape before it enters the pipeline — drop bad items, coerce types, bound sizes/depth, allow-list references — so the consumer only ever sees clean input. Higher upfront cost, smaller blast radius, no partial side effects. Good when failures are common or consequences are high.
Governing principles for this repo:
- Push verification to the boundary; keep the interior strict. Apply posture B at the producer→consumer boundary (verify+mitigate structure); keep posture A for residual exceptions inside the verified core. Never relax the interior schema to absorb producer sloppiness.
- Make error locality match the unit of work. One bad recommendation must cost one recommendation, not the whole report. Framing the payload so each item is independently parseable is the single highest-leverage change.
- Quarantine, never silently drop. Invalid units are preserved as bounded, provenance-tagged artifacts (index, error, raw snippet) so they can be debugged or replayed — degraded-but-usable is distinct from total loss.
- Both human and agent input get the same rigor. Guardrails are producer-agnostic: the same size/depth/count caps, reference allow-lists, and truncation detection apply whether the producer is an LLM, an agent, or a human form submission.
Reproduce & Root-Cause The Failure
id: ACTIVITY-WP-0016-T01
status: todo
priority: high
state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138"
Recover the full raw llm-connect response for the 06-26 failure (the State Hub event keeps only a 4000-char preview; the break is at char 5268) and establish the precise cause.
Done when:
- the full raw response is pulled from the runtime llm-connect log / response store and the exact offending token at char 5268 is identified;
finish_reasonis captured to confirm or rule out token-limit truncation vs a structural mid-stream glitch;- it is confirmed whether llm-connect actually enforced the
json_schemaconstrained-decoding hint or merely accepted it as advisory (this determines whether the schema param is load-bearing); - the failing payload is captured as a regression fixture under
tests/.
Schema + Prompt Redesign For Error Locality
id: ACTIVITY-WP-0016-T02
status: todo
priority: high
state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
Redesign the daily-triage report contract so a single malformed item can no longer discard the whole report (principle #2).
Done when:
- the recommendation list is bounded (configurable top-N, default 5–7) in both the prompt and the output schema — long lists are where the model drifts;
- the report uses a per-item-framed shape (JSON Lines / NDJSON — one
recommendation object per line — or an equivalent delimited per-item form)
behind a minimal stable envelope (
summary+ framed items), so each item is an independent parse unit; - the prompt explicitly states the contract, the per-item framing, the cap, and a "if uncertain, emit fewer well-formed items rather than more" instruction;
max_tokensis set with headroom for the bounded list so truncation cannot occur at the expected size;- the output schema file (
_load_output_schematarget) is updated to match.
Boundary Parser — Verify & Mitigate (Posture B)
id: ACTIVITY-WP-0016-T03
status: todo
priority: high
state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709"
Implement item-granular parsing with a quarantine lane in
src/activity_core/rules/executor.py, applying posture B at the boundary
(principles #1–#3).
Done when:
- the parser splits the envelope from the framed items, then parses each item
independently; a malformed item is routed to a bounded
quarantined_itemsartifact (index + validation error + raw snippet), not raised; - a run with some valid and some invalid items emits a report over the surviving
valid items with
output_validated=true, pluspartial=trueandquarantined_count/quarantined_itemsmarkers — degraded-but-usable is reported distinctly from total loss; - a best-effort repair pass (close unterminated brackets/quotes, recover the valid prefix) is attempted per item before quarantining it;
- truncation detected in T01 is handled as its own signal (recover whole items emitted before the cutoff rather than failing the document);
- the existing monolithic-document path remains as the fallback when framing is absent (backward compatible with task-only instructions).
Producer Guardrails + ADR-004
id: ACTIVITY-WP-0016-T04
status: todo
priority: medium
state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9"
Write the architecture decision record and add the producer-agnostic guardrails (principle #4).
Done when:
docs/adr/adr-004-producer-trust-boundary.mddocuments the trust boundary, the untrusted-producer premise (erroneous and malicious; human and agent), the A vs B taxonomy and where each applies, the error-locality principle, and the quarantine-with-provenance rule;- boundary guardrails are enforced at the consumer edge: max item count, max
string length, max nesting depth, and a reference allow-list (e.g. a
recommendation
candidate/ a tasktarget_repomust resolve to a known workstream/repo before it is acted on); - guardrail rejections are quarantined with provenance, consistent with T03;
- SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance changes the documented contract.
Tests + Calibration Re-Entry
id: ACTIVITY-WP-0016-T05
status: todo
priority: high
state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f"
Prove the new posture and hand back to the calibration gates.
Done when:
- regression tests cover: the captured 06-26 payload, a truncated-mid-list payload, a one-bad-item-among-good payload (asserts quarantine + partial), an oversized/over-deep payload (asserts guardrail rejection), and an injection-shaped reference (asserts allow-list rejection);
- the full suite passes and the result is recorded here with the count;
- a daily-triage smoke against the live runtime shows a previously-failing payload now degrades gracefully (valid items delivered, bad items quarantined) instead of discarding the run;
- a progress note hands back to
ACTIVITY-WP-0010-T04andACTIVITY-WP-0006-T03that the output-robustness blocker is cleared so the three-clean-run gate can resume on its own.
Relationships
- Blocks / feeds:
ACTIVITY-WP-0006-T03(three clean scheduled runs) andACTIVITY-WP-0010-T04(collect three clean scheduled runs) — both stalled on the same output-quality failure this workplan removes. - References:
ACTIVITY-WP-0009(scheduled-run trust gap). - Boundary discipline: keeps activity-core inside its SCOPE — this hardens the instruction-executor output contract; it does not move provider credentials, cluster reconciliation, or task lifecycle into this repo.