feat(ACTIVITY-WP-0016): register LLM output robustness & producer trust boundary workplan

Add WP-0016 to make the instruction-executor output contract robust after the 2026-06-26 daily-triage validation failure (one malformed delimiter discarded a whole report). Per-item framing for error locality, verify-and-mitigate boundary parsing with a quarantine lane, producer-trust-boundary guardrails (ADR-004), and regression/calibration tests. Unblocks WP-0006-T03 / WP-0010-T04. Also record the 06-26 recheck outcome (streak reset at two) in WP-0006-T03. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 14:39:21 +02:00
parent 612c226472
commit 5eb33bd3bb
2 changed files with 248 additions and 0 deletions
--- a/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
+++ b/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
@@ -174,6 +174,27 @@ the worker consumes the configured URL, then produce schema-valid daily triage
 evidence and three clean scheduled runs. This narrower path is tracked in
 `ACTIVITY-WP-0010`.

+2026-06-25: Consecutive-run streak resumed. State Hub `daily_triage` progress
+events from author `activity-core` fired on time on **2026-06-24 05:20:56Z** and
+**2026-06-25 05:20:47Z** (07:20 Berlin), both delivered, no misfires. That is two
+clean consecutive scheduled runs. **RECHECK 2026-06-26 (after 05:20Z):** confirm
+the 06-26 scheduled `daily_triage` event delivered. If clean, that completes three
+clean consecutive scheduled runs (06-24 / 06-25 / 06-26) — record the calibration
+result in State Hub and close T03. If the 06-26 run misfires or is missing, the
+streak resets and T03 stays `wait`. Flag deliberately kept in-repo (agent-agnostic)
+rather than tied to any single coding agent's scheduler.
+
+2026-06-26 recheck outcome: **streak reset at two.** The 06-26 scheduled run fired
+on time (`daily_triage` event 05:20:57Z) — scheduling layer healthy, no misfire —
+but the `daily-triage-report` instruction output **failed schema validation**:
+`Expecting ',' delimiter: line 136 column 22 (char 5268)`. The model produced a
+long ranked WSJF recommendation list (reached rank 7+ with nested `wsjf` objects)
+whose JSON broke ~char 5268; only a bounded 4000-char preview is preserved in the
+State Hub event, so the exact offending token needs the runtime llm-connect log.
+This is an LLM-output-quality failure (tracked by `ACTIVITY-WP-0010`), not a
+runtime/projection failure. T03 stays `wait`; three clean consecutive scheduled
+runs not yet achieved (06-24 ✅, 06-25 ✅, 06-26 ✗-validation).
+
 ## Rule Action Contract Documentation

 ```task
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -0,0 +1,227 @@
+---
+id: ACTIVITY-WP-0016
+type: workplan
+title: "LLM Output Robustness & The Producer Trust Boundary"
+domain: custodian
+repo: activity-core
+status: proposed
+owner: codex
+topic_slug: custodian
+created: "2026-06-26"
+updated: "2026-06-26"
+state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726"
+---
+
+# ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary
+
+## Context
+
+On 2026-06-26 the scheduled `daily-statehub-wsjf-triage` instruction fired on
+time (`daily_triage` event 05:20:57Z) but its output **failed schema
+validation**: `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The
+model emitted a long ranked WSJF recommendation list (reached rank 7+ with
+nested `wsjf` objects) and the JSON broke deep in that list. Because the report
+is a single monolithic JSON document, one malformed delimiter discarded the
+**entire** run. This reset the three-clean-consecutive-scheduled-runs streak in
+`ACTIVITY-WP-0006-T03` (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the
+LLM-output-quality surface deferred from `ACTIVITY-WP-0010`.
+
+The scheduling/runtime layer is healthy — this is purely an output-robustness
+and boundary-design problem. Today's code (`src/activity_core/rules/executor.py`)
+already: passes the output schema to llm-connect as a `json_schema` model param
+(`_llm_run_config`), retries once, runs a fenced/`raw_decode` tolerant parser
+(`_parse_json_output`), and preserves a bounded 4000-char preview on hard
+failure (`_invalid_output_report`). None of that helps when error locality is
+zero: the failure unit is the whole document, not the offending item.
+
+## Design Frame — The Producer Trust Boundary
+
+This workplan is anchored to a deliberate architectural stance, not just a bug
+fix. Capture it in an ADR (T04) so future work inherits it.
+
+**Premise.** activity-core has a *trust boundary* where free-form producer
+output meets strict deterministic consumers (JSON Schema validators, the task
+emitter, classic compute pipelines). The producers are **LLMs and humans (and
+agents acting for either)**. Both are *untrusted producers*: their output may be
+
+- **erroneous** — hallucination, truncation (token-limit cutoff), drift,
+  type slips, typos; or
+- **malicious** — prompt injection, crafted payloads, oversized/deeply-nested
+  structures aimed at exhausting or confusing the consumer.
+
+The architecture should treat the boundary as an adversarial frontier and place
+**guardrails + error-correction tooling there**, rather than letting raw
+producer output flow into deterministic consumers and fail (or worse, partially
+succeed) downstream.
+
+**Two non-fail-fast postures.** When we do *not* want to hard-fail on a problem,
+there are two sensible strategies — and they compose:
+
+- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
+  as-is; on exception, catch → repair → retry → or quarantine. Cheap on the
+  happy path. Blast radius depends entirely on how granular the catch is. Good
+  when failures are rare and locally recoverable. Risk: failures surface late,
+  possibly after partial side effects.
+- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
+  and normalize the output to a known-good shape *before* it enters the pipeline
+  — drop bad items, coerce types, bound sizes/depth, allow-list references — so
+  the consumer only ever sees clean input. Higher upfront cost, smaller blast
+  radius, no partial side effects. Good when failures are common or
+  consequences are high.
+
+**Governing principles for this repo:**
+
+1. **Push verification to the boundary; keep the interior strict.** Apply
+   posture **B** at the producer→consumer boundary (verify+mitigate structure);
+   keep posture **A** for residual exceptions inside the verified core. Never
+   relax the interior schema to absorb producer sloppiness.
+2. **Make error locality match the unit of work.** One bad recommendation must
+   cost one recommendation, not the whole report. Framing the payload so each
+   item is independently parseable is the single highest-leverage change.
+3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
+   provenance-tagged artifacts (index, error, raw snippet) so they can be
+   debugged or replayed — degraded-but-usable is distinct from total loss.
+4. **Both human and agent input get the same rigor.** Guardrails are
+   producer-agnostic: the same size/depth/count caps, reference allow-lists, and
+   truncation detection apply whether the producer is an LLM, an agent, or a
+   human form submission.
+
+## Reproduce & Root-Cause The Failure
+
+```task
+id: ACTIVITY-WP-0016-T01
+status: todo
+priority: high
+state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138"
+```
+
+Recover the **full** raw llm-connect response for the 06-26 failure (the State
+Hub event keeps only a 4000-char preview; the break is at char 5268) and
+establish the precise cause.
+
+Done when:
+
+- the full raw response is pulled from the runtime llm-connect log / response
+  store and the exact offending token at char 5268 is identified;
+- `finish_reason` is captured to confirm or rule out token-limit **truncation**
+  vs a structural mid-stream glitch;
+- it is confirmed whether llm-connect actually **enforced** the `json_schema`
+  constrained-decoding hint or merely accepted it as advisory (this determines
+  whether the schema param is load-bearing);
+- the failing payload is captured as a regression fixture under `tests/`.
+
+## Schema + Prompt Redesign For Error Locality
+
+```task
+id: ACTIVITY-WP-0016-T02
+status: todo
+priority: high
+state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
+```
+
+Redesign the daily-triage report contract so a single malformed item can no
+longer discard the whole report (principle #2).
+
+Done when:
+
+- the recommendation list is **bounded** (configurable top-N, default 5–7) in
+  both the prompt and the output schema — long lists are where the model drifts;
+- the report uses a **per-item-framed** shape (JSON Lines / NDJSON — one
+  recommendation object per line — or an equivalent delimited per-item form)
+  behind a minimal stable envelope (`summary` + framed items), so each item is
+  an independent parse unit;
+- the prompt explicitly states the contract, the per-item framing, the cap, and
+  a "if uncertain, emit fewer well-formed items rather than more" instruction;
+- `max_tokens` is set with headroom for the bounded list so truncation cannot
+  occur at the expected size;
+- the output schema file (`_load_output_schema` target) is updated to match.
+
+## Boundary Parser — Verify & Mitigate (Posture B)
+
+```task
+id: ACTIVITY-WP-0016-T03
+status: todo
+priority: high
+state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709"
+```
+
+Implement item-granular parsing with a quarantine lane in
+`src/activity_core/rules/executor.py`, applying posture **B** at the boundary
+(principles #1–#3).
+
+Done when:
+
+- the parser splits the envelope from the framed items, then parses **each item
+  independently**; a malformed item is routed to a bounded `quarantined_items`
+  artifact (index + validation error + raw snippet), not raised;
+- a run with some valid and some invalid items emits a report over the surviving
+  valid items with `output_validated=true`, plus `partial=true` and
+  `quarantined_count` / `quarantined_items` markers — degraded-but-usable is
+  reported distinctly from total loss;
+- a best-effort **repair** pass (close unterminated brackets/quotes, recover the
+  valid prefix) is attempted per item before quarantining it;
+- truncation detected in T01 is handled as its own signal (recover whole items
+  emitted before the cutoff rather than failing the document);
+- the existing monolithic-document path remains as the fallback when framing is
+  absent (backward compatible with task-only instructions).
+
+## Producer Guardrails + ADR-004
+
+```task
+id: ACTIVITY-WP-0016-T04
+status: todo
+priority: medium
+state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9"
+```
+
+Write the architecture decision record and add the producer-agnostic guardrails
+(principle #4).
+
+Done when:
+
+- `docs/adr/adr-004-producer-trust-boundary.md` documents the trust boundary,
+  the untrusted-producer premise (erroneous **and** malicious; human and agent),
+  the A vs B taxonomy and where each applies, the error-locality principle, and
+  the quarantine-with-provenance rule;
+- boundary guardrails are enforced at the consumer edge: max item **count**, max
+  string length, max nesting **depth**, and a **reference allow-list** (e.g. a
+  recommendation `candidate` / a task `target_repo` must resolve to a known
+  workstream/repo before it is acted on);
+- guardrail rejections are quarantined with provenance, consistent with T03;
+- SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance
+  changes the documented contract.
+
+## Tests + Calibration Re-Entry
+
+```task
+id: ACTIVITY-WP-0016-T05
+status: todo
+priority: high
+state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f"
+```
+
+Prove the new posture and hand back to the calibration gates.
+
+Done when:
+
+- regression tests cover: the captured 06-26 payload, a truncated-mid-list
+  payload, a one-bad-item-among-good payload (asserts quarantine + partial), an
+  oversized/over-deep payload (asserts guardrail rejection), and an
+  injection-shaped reference (asserts allow-list rejection);
+- the full suite passes and the result is recorded here with the count;
+- a daily-triage smoke against the live runtime shows a previously-failing
+  payload now **degrades gracefully** (valid items delivered, bad items
+  quarantined) instead of discarding the run;
+- a progress note hands back to `ACTIVITY-WP-0010-T04` and `ACTIVITY-WP-0006-T03`
+  that the output-robustness blocker is cleared so the three-clean-run gate can
+  resume on its own.
+
+## Relationships
+
+- **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and
+  `ACTIVITY-WP-0010-T04` (collect three clean scheduled runs) — both stalled on
+  the same output-quality failure this workplan removes.
+- **References:** `ACTIVITY-WP-0009` (scheduled-run trust gap).
+- **Boundary discipline:** keeps activity-core inside its SCOPE — this hardens
+  the instruction-executor output contract; it does not move provider
+  credentials, cluster reconciliation, or task lifecycle into this repo.