generated from coulomb/repo-seed
Local analysis of the 2026-06-26 daily-triage validation failure: the unbounded ~1-recommendation-per-workstream list (16 active workstreams; JSON break at char 5268, ~rank 8-9) is the structural cause; both the first attempt and the retry failed. The exact offending token and finish_reason are unrecoverable from activity-core data — complete() drops finish_reason/usage, the report sink caps raw output at 4000 chars (< 5268), and the log preview at 2000. Confirming the exact token needs llm-connect producer-side logs on railiance01 (operator-owned); mitigation (T02/T03) is identical regardless. Partial fixture captured. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
262 lines
12 KiB
Markdown
---
id: ACTIVITY-WP-0016
type: workplan
title: "LLM Output Robustness & The Producer Trust Boundary"
domain: custodian
repo: activity-core
status: proposed
owner: codex
topic_slug: custodian
created: "2026-06-26"
updated: "2026-06-26"
state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726"
---

# ACTIVITY-WP-0016 — LLM Output Robustness & The Producer Trust Boundary

## Context

On 2026-06-26 the scheduled `daily-statehub-wsjf-triage` instruction fired on
time (`daily_triage` event 05:20:57Z) but its output **failed schema
validation**: `Expecting ',' delimiter: line 136 column 22 (char 5268)`. The
model emitted a long ranked WSJF recommendation list (reached rank 7+ with
nested `wsjf` objects) and the JSON broke deep in that list. Because the report
is a single monolithic JSON document, one malformed delimiter discarded the
**entire** run. This reset the three-clean-consecutive-scheduled-runs streak in
`ACTIVITY-WP-0006-T03` (06-24 ✅, 06-25 ✅, 06-26 ✗-validation) and is the
LLM-output-quality surface deferred from `ACTIVITY-WP-0010`.

The scheduling/runtime layer is healthy; this is purely an output-robustness
and boundary-design problem. Today's code (`src/activity_core/rules/executor.py`)
already passes the output schema to llm-connect as a `json_schema` model param
(`_llm_run_config`), retries once, runs a fenced/`raw_decode` tolerant parser
(`_parse_json_output`), and preserves a bounded 4000-char preview on hard
failure (`_invalid_output_report`). None of that helps when error locality is
zero: the failure unit is the whole document, not the offending item.

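
To make "error locality is zero" concrete, the sketch below parses a miniature monolithic report with one missing comma between two items. The payload, the `try_parse` helper, and the field values are hypothetical stand-ins, not the real 06-26 output; only the standard-library `json` behaviour is relied on.

```python
import json

# Hypothetical miniature of the report: three recommendations, with the comma
# between the second and third items missing.
raw_report = """{
  "summary": "daily triage",
  "recommendations": [
    {"rank": 1, "candidate": "ws-a", "wsjf": {"score": 8.0}},
    {"rank": 2, "candidate": "ws-b", "wsjf": {"score": 6.5}}
    {"rank": 3, "candidate": "ws-c", "wsjf": {"score": 5.0}}
  ]
}"""


def try_parse(raw: str):
    """Return (document, None) on success or (None, error) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, exc


document, error = try_parse(raw_report)

# The two well-formed items before the break are lost along with the rest.
recovered_items = 0 if document is None else len(document["recommendations"])

if error is not None:
    print(f"{error.msg}: line {error.lineno} column {error.colno} (char {error.pos})")
print("recovered items:", recovered_items)
```

The parser reports the same class of error as the failed run, and nothing in the document survives, even though every item except the boundary between two of them is well-formed.
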
## Design Frame — The Producer Trust Boundary

This workplan is anchored to a deliberate architectural stance, not just a bug
fix. Capture it in an ADR (T04) so future work inherits it.

**Premise.** activity-core has a *trust boundary* where free-form producer
output meets strict deterministic consumers (JSON Schema validators, the task
emitter, classic compute pipelines). The producers are **LLMs and humans (and
agents acting for either)**. Both are *untrusted producers*: their output may be

- **erroneous**: hallucination, truncation (token-limit cutoff), drift,
  type slips, typos; or
- **malicious**: prompt injection, crafted payloads, oversized/deeply-nested
  structures aimed at exhausting or confusing the consumer.

The architecture should treat the boundary as an adversarial frontier and place
**guardrails + error-correction tooling there**, rather than letting raw
producer output flow into deterministic consumers and fail (or worse, partially
succeed) downstream.

**Two non-fail-fast postures.** When we do *not* want to hard-fail on a problem,
there are two sensible strategies, and they compose:

- **A) Trust but handle exceptions** (optimistic / reactive). Consume the output
  as-is; on exception, catch → repair → retry, or quarantine. Cheap on the
  happy path. Blast radius depends entirely on how granular the catch is. Good
  when failures are rare and locally recoverable. Risk: failures surface late,
  possibly after partial side effects.
- **B) Verify and mitigate** (defensive / proactive). Validate, sanitize, clamp,
  and normalize the output to a known-good shape *before* it enters the pipeline
  (drop bad items, coerce types, bound sizes/depth, allow-list references) so
  the consumer only ever sees clean input. Higher upfront cost, smaller blast
  radius, no partial side effects. Good when failures are common or
  consequences are high.

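
The two postures can be contrasted in a few lines. This is a minimal sketch under assumptions: `consume_posture_a`, `normalize_posture_b`, `MAX_ITEMS`, and the `rank` / `candidate` item fields are illustrative names, not code from `executor.py`.

```python
import json
from typing import Any

MAX_ITEMS = 7  # assumed cap, for illustration only


def consume_posture_a(raw: str) -> dict[str, Any]:
    """Posture A: consume as-is, react to the exception afterwards."""
    try:
        return {"ok": True, "report": json.loads(raw)}
    except json.JSONDecodeError as exc:
        # The catch is document-granular, so the blast radius is the document.
        return {"ok": False, "error": exc.msg, "quarantined": raw[:200]}


def normalize_posture_b(items: list[Any]) -> tuple[list[dict[str, Any]], list[Any]]:
    """Posture B: verify and mitigate before the consumer sees anything."""
    clean: list[dict[str, Any]] = []
    rejected: list[Any] = list(items[MAX_ITEMS:])  # overflow is kept, not lost
    for item in items[:MAX_ITEMS]:
        if not isinstance(item, dict) or "candidate" not in item:
            rejected.append(item)
            continue
        try:
            rank = int(item.get("rank"))  # coerce "3" to 3
        except (TypeError, ValueError):
            rejected.append(item)
            continue
        clean.append({"rank": rank, "candidate": str(item["candidate"])})
    return clean, rejected
```
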
**Governing principles for this repo:**

1. **Push verification to the boundary; keep the interior strict.** Apply
   posture **B** at the producer→consumer boundary (verify+mitigate structure);
   keep posture **A** for residual exceptions inside the verified core. Never
   relax the interior schema to absorb producer sloppiness.
2. **Make error locality match the unit of work.** One bad recommendation must
   cost one recommendation, not the whole report. Framing the payload so each
   item is independently parseable is the single highest-leverage change.
3. **Quarantine, never silently drop.** Invalid units are preserved as bounded,
   provenance-tagged artifacts (index, error, raw snippet) so they can be
   debugged or replayed: degraded-but-usable is distinct from total loss.
4. **Both human and agent input get the same rigor.** Guardrails are
   producer-agnostic: the same size/depth/count caps, reference allow-lists, and
   truncation detection apply whether the producer is an LLM, an agent, or a
   human form submission.

## Reproduce & Root-Cause The Failure

```task
id: ACTIVITY-WP-0016-T01
status: todo
priority: high
state_hub_task_id: "74fd16a5-4ea5-4dfe-8526-dfa27cf76138"
```

Recover the **full** raw llm-connect response for the 06-26 failure (the State
Hub event keeps only a 4000-char preview; the break is at char 5268) and
establish the precise cause.

Done when:

- the full raw response is pulled from the runtime llm-connect log / response
  store and the exact offending token at char 5268 is identified;
- `finish_reason` is captured to confirm or rule out token-limit **truncation**
  vs a structural mid-stream glitch;
- it is confirmed whether llm-connect actually **enforced** the `json_schema`
  constrained-decoding hint or merely accepted it as advisory (this determines
  whether the schema param is load-bearing);
- the failing payload is captured as a regression fixture under `tests/`.

2026-06-26 findings (local analysis on the workstation):

- **Mechanism confirmed structurally.** There are **16 active workstreams**
  org-wide and the triage instruction emits ~one ranked recommendation per
  candidate. The preserved preview holds 7 fully-formed recommendations; the JSON
  break is at char 5268 (~rank 8–9). The unbounded one-per-workstream list is the
  structural cause: more items mean more tokens and higher odds of a mid-stream
  JSON slip and/or truncation. This directly justifies T02's bounded top-N +
  per-item framing.
- **Both attempts failed.** `executor._execute` retries once
  (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the
  **retry** output, so the model produced invalid JSON twice, not as a one-off.
- **activity-core discards the diagnostics needed to root-cause this.** Three
  retention gaps mean the exact char-5268 token cannot be recovered from
  activity-core data at all:
  1. `LLMConnectClient.complete()` returns only `data["content"]`
     (`llm_client.py:57-60`); it drops `finish_reason`/`usage` from the
     llm-connect HTTP response, so truncation-vs-structural cannot be
     distinguished locally.
  2. the report sink caps raw output at **4000 chars** (`_invalid_output_report`,
     `executor.py:259`), below the char-5268 break.
  3. the worker log caps the preview at **2000 chars** (`executor.py:175`).
- **Remaining (remote, operator-owned).** Confirming the exact offending token
  and `finish_reason` requires llm-connect's producer-side logs on `railiance01`
  (cluster access, outside this repo's SCOPE for direct action). Truncation is
  the leading hypothesis given the 16-item input, but the mitigation (T02/T03) is
  identical either way, so T01 does not block the build work.
- **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture
  `finish_reason`/`usage` and persist a larger bounded raw artifact on validation
  failure so this class of failure is never un-debuggable again.
- Partial fixture saved:
  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
  (the 4000-char preview + validation error; full payload pending the remote pull).

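
The first retention gap could be closed by returning a small result object instead of a bare string. The sketch below is an assumption-laden illustration: `CompletionResult` and `completion_from_response` are hypothetical names, the `finish_reason` / `usage` keys are taken from the finding above, and treating `"length"` as the token-limit stop value is a common provider convention that still has to be confirmed against llm-connect.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CompletionResult:
    """Content plus the diagnostics that complete() currently drops."""

    content: str
    finish_reason: Optional[str] = None
    usage: Mapping[str, Any] = field(default_factory=dict)

    @property
    def truncated(self) -> bool:
        # Assumption: "length" marks a token-limit cutoff in llm-connect.
        return self.finish_reason == "length"


def completion_from_response(data: Mapping[str, Any]) -> CompletionResult:
    """Build a result from the decoded HTTP response body."""
    return CompletionResult(
        content=data["content"],
        finish_reason=data.get("finish_reason"),
        usage=dict(data.get("usage") or {}),
    )
```

With this shape, a validation failure can record `finish_reason` next to the raw artifact, which is enough to separate truncation from a structural slip without producer-side logs.
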
## Schema + Prompt Redesign For Error Locality

```task
id: ACTIVITY-WP-0016-T02
status: todo
priority: high
state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
```

Redesign the daily-triage report contract so a single malformed item can no
longer discard the whole report (principle #2).

Done when:

- the recommendation list is **bounded** (configurable top-N, default 5–7) in
  both the prompt and the output schema, since long lists are where the model
  drifts;
- the report uses a **per-item-framed** shape (JSON Lines / NDJSON, with one
  recommendation object per line, or an equivalent delimited per-item form)
  behind a minimal stable envelope (`summary` + framed items), so each item is
  an independent parse unit;
- the prompt explicitly states the contract, the per-item framing, the cap, and
  an "if uncertain, emit fewer well-formed items rather than more" instruction;
- `max_tokens` is set with headroom for the bounded list so truncation cannot
  occur at the expected size;
- the output schema file (`_load_output_schema` target) is updated to match.

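
One possible per-item framing is sketched below: the first line carries the envelope and every later line carries one recommendation. The layout, the `item_count` field, and the `split_framed` helper are assumptions for illustration; the actual contract is what this task has to decide.

```python
import json

# Hypothetical framed output. The second recommendation is deliberately
# malformed (its closing brace is missing).
framed_output = "\n".join([
    '{"summary": "daily triage", "item_count": 3}',
    '{"rank": 1, "candidate": "ws-a", "wsjf": {"score": 8.0}}',
    '{"rank": 2, "candidate": "ws-b", "wsjf": {"score": 6.5}',
    '{"rank": 3, "candidate": "ws-c", "wsjf": {"score": 5.0}}',
])


def split_framed(raw: str):
    """Parse the envelope, then each item line as its own parse unit.

    Assumes at least one non-blank line (the envelope) is present.
    """
    lines = [line for line in raw.splitlines() if line.strip()]
    envelope = json.loads(lines[0])
    items, bad_indexes = [], []
    for index, line in enumerate(lines[1:], start=1):
        try:
            items.append(json.loads(line))
        except json.JSONDecodeError:
            bad_indexes.append(index)  # one bad line costs one item
    return envelope, items, bad_indexes


envelope, items, bad_indexes = split_framed(framed_output)
print(len(items), bad_indexes)
```

Here the malformed second item is the only loss: ranks 1 and 3 survive, which is the behaviour principle #2 asks for.
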
## Boundary Parser — Verify & Mitigate (Posture B)

```task
id: ACTIVITY-WP-0016-T03
status: todo
priority: high
state_hub_task_id: "d65a6281-f1f9-4a9b-a835-da065411b709"
```

Implement item-granular parsing with a quarantine lane in
`src/activity_core/rules/executor.py`, applying posture **B** at the boundary
(principles #1–#3).

Done when:

- the parser splits the envelope from the framed items, then parses **each item
  independently**; a malformed item is routed to a bounded `quarantined_items`
  artifact (index + validation error + raw snippet), not raised;
- a run with some valid and some invalid items emits a report over the surviving
  valid items with `output_validated=true`, plus `partial=true` and
  `quarantined_count` / `quarantined_items` markers, so degraded-but-usable is
  reported distinctly from total loss;
- a best-effort **repair** pass (close unterminated brackets/quotes, recover the
  valid prefix) is attempted per item before quarantining it;
- truncation detected in T01 is handled as its own signal (recover whole items
  emitted before the cutoff rather than failing the document);
- the existing monolithic-document path remains as the fallback when framing is
  absent (backward compatible with task-only instructions).

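
A minimal sketch of the repair-then-quarantine flow follows, assuming the items have already been split into one string per item. `close_unterminated`, `parse_items`, and `SNIPPET_LIMIT` are hypothetical names; the `partial`, `quarantined_count`, and `quarantined_items` keys mirror the markers listed above. The repair only closes an open string and open brackets, so it is best-effort and will not fix every break.

```python
import json
from typing import Any

SNIPPET_LIMIT = 200  # assumed bound for quarantined raw snippets


def close_unterminated(fragment: str) -> str:
    """Best-effort repair: close an open string and any open brackets."""
    closers = {"{": "}", "[": "]"}
    stack: list[str] = []
    in_string = False
    escaped = False
    for char in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in closers:
            stack.append(closers[char])
        elif char in "}]" and stack and stack[-1] == char:
            stack.pop()
    repaired = fragment[:-1] if escaped else fragment  # drop a dangling backslash
    if in_string:
        repaired += '"'
    return repaired + "".join(reversed(stack))


def parse_items(lines: list[str]) -> dict[str, Any]:
    """Parse each item independently; repair once, then quarantine."""
    items: list[dict[str, Any]] = []
    quarantined: list[dict[str, Any]] = []
    for index, line in enumerate(lines):
        parsed, error = None, None
        for candidate in (line, close_unterminated(line)):
            try:
                parsed = json.loads(candidate)
                break
            except json.JSONDecodeError as exc:
                error = error or str(exc)  # keep the error from the raw text
        if isinstance(parsed, dict):
            items.append(parsed)
        else:
            quarantined.append({
                "index": index,
                "error": error or "item is not a JSON object",
                "raw": line[:SNIPPET_LIMIT],
            })
    return {
        "items": items,
        "partial": bool(quarantined),
        "quarantined_count": len(quarantined),
        "quarantined_items": quarantined,
    }
```

A repaired item has only been made syntactically valid; it would still need to pass the per-item schema before counting toward `output_validated=true`.
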
## Producer Guardrails + ADR-004

```task
id: ACTIVITY-WP-0016-T04
status: todo
priority: medium
state_hub_task_id: "f5c3af5b-9e28-42b0-9af5-4c99284e99b9"
```

Write the architecture decision record and add the producer-agnostic guardrails
(principle #4).

Done when:

- `docs/adr/adr-004-producer-trust-boundary.md` documents the trust boundary,
  the untrusted-producer premise (erroneous **and** malicious; human and agent),
  the A vs B taxonomy and where each applies, the error-locality principle, and
  the quarantine-with-provenance rule;
- boundary guardrails are enforced at the consumer edge: max item **count**, max
  string length, max nesting **depth**, and a **reference allow-list** (e.g. a
  recommendation `candidate` / a task `target_repo` must resolve to a known
  workstream/repo before it is acted on);
- guardrail rejections are quarantined with provenance, consistent with T03;
- SCOPE.md / INTENT.md are checked for drift and updated if the boundary stance
  changes the documented contract.

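
The four guardrails can be sketched as a single pass over already-parsed items. Everything named here is an assumption for illustration: the limits, the `KNOWN_CANDIDATES` allow-list (which in practice would come from the known workstreams/repos), and the `shape_violations` / `guard_items` helpers. The walk is iterative so that a deliberately deep payload cannot exhaust the Python call stack.

```python
from typing import Any

MAX_ITEMS = 7             # assumed count cap
MAX_STRING_LENGTH = 2000  # assumed per-string cap
MAX_DEPTH = 6             # assumed container nesting cap
KNOWN_CANDIDATES = frozenset({"ws-a", "ws-b"})  # hypothetical allow-list


def shape_violations(value: Any) -> list[str]:
    """Check string length and nesting depth without recursion."""
    violations: list[str] = []
    stack = [(value, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, str):
            if len(node) > MAX_STRING_LENGTH:
                violations.append("string too long")
        elif isinstance(node, (dict, list)):
            if depth > MAX_DEPTH:
                violations.append("nesting too deep")
                continue  # do not descend further into an over-deep branch
            children = node.values() if isinstance(node, dict) else node
            stack.extend((child, depth + 1) for child in children)
    return violations


def guard_items(items: list[Any]) -> tuple[list[Any], list[dict[str, Any]]]:
    """Split items into accepted and rejected-with-provenance."""
    accepted: list[Any] = []
    rejected: list[dict[str, Any]] = []
    for index, item in enumerate(items):
        reasons = ["over item cap"] if index >= MAX_ITEMS else []
        reasons.extend(shape_violations(item))
        candidate = item.get("candidate") if isinstance(item, dict) else None
        if not isinstance(candidate, str) or candidate not in KNOWN_CANDIDATES:
            reasons.append("unknown candidate")
        if reasons:
            rejected.append(
                {"index": index, "reasons": reasons, "raw": repr(item)[:200]}
            )
        else:
            accepted.append(item)
    return accepted, rejected
```

Rejections carry the index, the reasons, and a bounded raw snippet, which keeps them in the same shape as the T03 quarantine records. Dictionary keys are not length-checked in this sketch.
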
## Tests + Calibration Re-Entry

```task
id: ACTIVITY-WP-0016-T05
status: todo
priority: high
state_hub_task_id: "c881500b-5459-4620-81c0-b176971e989f"
```

Prove the new posture and hand back to the calibration gates.

Done when:

- regression tests cover: the captured 06-26 payload, a truncated-mid-list
  payload, a one-bad-item-among-good payload (asserts quarantine + partial), an
  oversized/over-deep payload (asserts guardrail rejection), and an
  injection-shaped reference (asserts allow-list rejection);
- the full suite passes and the result is recorded here with the count;
- a daily-triage smoke against the live runtime shows a previously-failing
  payload now **degrades gracefully** (valid items delivered, bad items
  quarantined) instead of discarding the run;
- a progress note hands back to `ACTIVITY-WP-0010-T04` and `ACTIVITY-WP-0006-T03`
  that the output-robustness blocker is cleared so the three-clean-run gate can
  resume on its own.

## Relationships

- **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and
  `ACTIVITY-WP-0010-T04` (collect three clean scheduled runs); both are stalled
  on the same output-quality failure this workplan removes.
- **References:** `ACTIVITY-WP-0009` (scheduled-run trust gap).
- **Boundary discipline:** keeps activity-core inside its SCOPE: this hardens
  the instruction-executor output contract; it does not move provider
  credentials, cluster reconciliation, or task lifecycle into this repo.