generated from coulomb/repo-seed
chore(ACTIVITY-WP-0016-T01): record root-cause findings + partial failure fixture
Local analysis of the 2026-06-26 daily-triage validation failure: the unbounded ~1-recommendation-per-workstream list (16 active workstreams; JSON break at char 5268, ~rank 8-9) is the structural cause; both the first attempt and the retry failed. The exact offending token and finish_reason are unrecoverable from activity-core data: complete() drops finish_reason/usage, the report sink caps raw output at 4000 chars (< 5268), and the worker log caps the preview at 2000 chars. Confirming the exact token needs llm-connect producer-side logs on railiance01 (operator-owned); the mitigation (T02/T03) is identical regardless. Partial fixture captured.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -110,6 +110,40 @@ Done when:
whether the schema param is load-bearing);
- the failing payload is captured as a regression fixture under `tests/`.

2026-06-26 findings (local analysis on the workstation):
- **Mechanism confirmed structurally.** There are **16 active workstreams**
  org-wide and the triage instruction emits ~one ranked recommendation per
  candidate. The preserved preview holds 7 fully-formed recommendations; the JSON
  break is at char 5268 (~rank 8–9). The unbounded one-per-workstream list is the
  structural cause: more items mean more tokens, and more tokens mean higher odds
  of a mid-stream JSON slip and/or truncation. This directly justifies T02's
  bounded top-N + per-item framing.
- **Both attempts failed.** `executor._execute` retries once
  (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the
  **retry** output, so the model produced invalid JSON twice, not as a one-off.
- **activity-core discards the diagnostics needed to root-cause this.** Three
  retention gaps mean the exact char-5268 token cannot be recovered from
  activity-core data at all:
  1. `LLMConnectClient.complete()` returns only `data["content"]`
     (`llm_client.py:57-60`). It drops `finish_reason`/`usage` from the
     llm-connect HTTP response, so truncation cannot be distinguished from a
     structural JSON error locally.
  2. The report sink caps raw output at **4000 chars** (`_invalid_output_report`,
     `executor.py:259`), which is below the 5268 break.
  3. The worker log caps the preview at **2000 chars** (`executor.py:175`).
- **Remaining (remote, operator-owned).** Confirming the exact offending token
  and `finish_reason` requires llm-connect's producer-side logs on `railiance01`.
  That needs cluster access, which is outside this repo's SCOPE for direct action.
  Truncation is the leading hypothesis given the 16-item input, but the mitigation
  (T02/T03) is identical either way, so T01 does not block the build work.
- **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture
  `finish_reason`/`usage` and persist a larger bounded raw artifact on validation
  failure so this class of failure is never un-debuggable again.
- Partial fixture saved:
  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
  (the 4000-char preview + validation error; full payload pending the remote pull).
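
A minimal sketch of what closing the retention gaps listed above might look like. It is not the activity-core implementation. `CompletionResult`, `parse_completion`, `json_break_offset`, `failure_artifact`, and `MAX_RAW_ARTIFACT_CHARS` are hypothetical names. The sketch assumes the llm-connect response carries `finish_reason` and `usage` as top-level keys next to `content`; the 32000-char bound is only illustrative.

```python
import json
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical bound: larger than the current 4000-char report cap, still finite.
MAX_RAW_ARTIFACT_CHARS = 32_000


@dataclass(frozen=True)
class CompletionResult:
    """Completion text plus the diagnostics that are currently dropped."""
    content: str
    finish_reason: Optional[str]
    usage: Optional[Mapping[str, Any]]


def parse_completion(data: Mapping[str, Any]) -> CompletionResult:
    # Assumed response shape: "finish_reason" and "usage" sit beside "content".
    return CompletionResult(
        content=data["content"],
        finish_reason=data.get("finish_reason"),
        usage=data.get("usage"),
    )


def json_break_offset(raw: str) -> Optional[int]:
    """Return the character offset where JSON parsing fails, or None if valid."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        return exc.pos
    return None


def failure_artifact(result: CompletionResult) -> dict:
    """Build a bounded record to persist when output validation fails."""
    raw = result.content
    return {
        "finish_reason": result.finish_reason,
        "usage": dict(result.usage) if result.usage is not None else None,
        "raw_length": len(raw),
        "json_break_offset": json_break_offset(raw),
        "raw": raw[:MAX_RAW_ARTIFACT_CHARS],
        "raw_truncated": len(raw) > MAX_RAW_ARTIFACT_CHARS,
    }
```

Recording `raw_length` and `json_break_offset` next to the bounded `raw` text means that even when the artifact is cut, the record still shows whether the break fell inside the preserved portion.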
## Schema + Prompt Redesign For Error Locality
```task