From 0e9e18a59a84bb69aecacab6cf976a5d8376b13a Mon Sep 17 00:00:00 2001
From: tegwick <bernd.worsch@gmail.com>
Date: Fri, 26 Jun 2026 15:04:27 +0200
Subject: [PATCH] chore(ACTIVITY-WP-0016-T01): record root-cause findings +
 partial failure fixture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Local analysis of the 2026-06-26 daily-triage validation failure: the unbounded
~1-recommendation-per-workstream list (16 active workstreams; JSON break at char
5268, ~rank 8-9) is the structural cause; both the first attempt and the retry
failed. The exact offending token and finish_reason are unrecoverable from
activity-core data: complete() drops finish_reason/usage, the report sink caps
raw output at 4000 chars (< 5268), and the worker log caps its preview at 2000.
Confirming the exact token needs llm-connect producer-side logs on railiance01
(operator-owned); the mitigation (T02/T03) is the same either way. Partial
fixture captured.
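
A minimal sketch of what capturing the missing diagnostics could look like.
It assumes the llm-connect HTTP response carries finish_reason and usage keys
next to content; only data["content"] is confirmed by llm_client.py, and
CompletionResult / failure_artifact are hypothetical names, not existing
activity-core code. It keeps a bounded head plus a window around
JSONDecodeError.pos, so a break past char 4000 stays visible. It also shows
why the error text alone cannot settle truncation: Python's json raises the
same "Expecting ',' delimiter" when the input simply stops right after a value.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CompletionResult:
    """Hypothetical return type for complete() that keeps the fields it drops today."""

    content: str
    finish_reason: Optional[str] = None
    usage: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_response(cls, data: Dict[str, Any]) -> "CompletionResult":
        # Assumed llm-connect response keys; only "content" is confirmed.
        return cls(
            content=data["content"],
            finish_reason=data.get("finish_reason"),
            usage=data.get("usage") or {},
        )


def failure_artifact(result: CompletionResult, head: int = 4000,
                     window: int = 500) -> Optional[Dict[str, Any]]:
    """Return a bounded diagnostic record, or None if the content parses as JSON."""
    try:
        json.loads(result.content)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - window)
        return {
            "validation_error": str(err),
            "finish_reason": result.finish_reason,
            "usage": result.usage,
            "raw_length": len(result.content),
            # True when the error sits at the very end of the text, as after a cut-off.
            "ends_at_error": err.pos >= len(result.content),
            "raw_head": result.content[:head],
            # Keep the text around err.pos even when it lies beyond the head cap.
            "raw_around_error": result.content[start:err.pos + window],
        }
    return None


if __name__ == "__main__":
    full = json.dumps({"recommendations": [{"rank": 1, "job_size": 4}]}, indent=2)
    cut = full[: full.index("4") + 1]  # stop right after the value 4
    resp = {"content": cut, "finish_reason": "length", "usage": {}}
    art = failure_artifact(CompletionResult.from_response(resp))
    print(art["validation_error"], art["finish_reason"], art["ends_at_error"])
```

If llm-connect passes through OpenAI-style finish_reason values (an
assumption), "length" together with ends_at_error=True would point at
truncation, while "stop" with an error mid-text would point at a structural
slip.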

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 ...2026-06-26_validation_failure.partial.json |  5 +++
 ...16-llm-output-robustness-trust-boundary.md | 34 +++++++++++++++++++
 2 files changed, 39 insertions(+)
 create mode 100644 tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json

diff --git a/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json b/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
new file mode 100644
index 0000000..feeb270
--- /dev/null
+++ b/tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json
@@ -0,0 +1,5 @@
+{
+  "_note": "PARTIAL 4000-char preview of the 2026-06-26 daily-triage validation failure (retry attempt). Full payload not recoverable from activity-core: complete() drops finish_reason; report sink caps raw at 4000 chars; the JSON break is at char 5268 (beyond this preview). Full response would require llm-connect producer-side logs on railiance01.",
+  "validation_error": "Expecting ',' delimiter: line 136 column 22 (char 5268)",
+  "raw_output_preview": "{\n  \"summary\": \"Triage report focusing on high-priority workstreams with pending human intervention or critical dependencies, and addressing recently cleared dependencies to unblock progress.\",\n  \"recommendations\": [\n    {\n      \"rank\": 1,\n      \"candidate\": \"2731fece-6c49-45b8-ab8a-4ea6c04ac603\",\n      \"action\": \"work-next\",\n      \"why\": \"A critical dependency (T03 - Configure bounded OpenBao token roles and policies) for this workstream has been cleared, unblocking significant progress on credential management. This workstream has 8 todo tasks and no waits, indicating it's ready for immediate action.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 5.0,\n        \"strategic_value\": 5,\n        \"time_criticality\": 5,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 5,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 2,\n      \"candidate\": \"bd086c41-287d-4a4e-8ac5-9ab270f14d72\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (T04 - Provision the runtime API key outside Git) and is currently blocked by 3 'wait' tasks. Human intervention is required to unblock progress.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 3,\n      \"candidate\": \"9b56414a-c71f-4e72-9b2b-d2166aaf50d0\",\n      \"action\": \"needs-human\",\n      \"why\": \"This high-priority workstream has a 'needs_human' task (Task: Execute Live Ops-Hub Bootstrap) and is currently blocked by a 'wait' task. 
Human intervention is required to proceed with the bootstrap.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.7,\n        \"strategic_value\": 5,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 5,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 3\n      }\n    },\n    {\n      \"rank\": 4,\n      \"candidate\": \"84e17675-0d15-4268-a8bd-540124d37018\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has 4 'needs_human' tasks, including 'T02 \u2014 Resolve Forgejo production design decisions', indicating significant human input is required to move forward with the migration.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 4.0,\n        \"strategic_value\": 4,\n        \"time_criticality\": 4,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 5,\n      \"candidate\": \"5646e13a-13af-4724-bca6-3c0d86f96733\",\n      \"action\": \"needs-human\",\n      \"why\": \"This workstream has a 'needs_human' task ('Three-Run Calibration Feedback') and is currently in a 'wait' state. Human feedback is crucial for operational hardening.\",\n      \"confidence\": \"medium\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 6,\n      \"candidate\": \"896ace77-21b3-450b-8fb7-254aefc8c570\",\n      \"action\": \"close-out\",\n      \"why\": \"The task 'Wire activity-core to the live service' has been resolved, and the workstream shows 2 progress tasks with 0 todo/wait tasks. 
This indicates the deployment is likely complete or nearing completion and ready for close-out after verification.\",\n      \"confidence\": \"high\",\n      \"wsjf\": {\n        \"score\": 3.7,\n        \"strategic_value\": 4,\n        \"time_criticality\": 3,\n        \"risk_reduction\": 4,\n        \"opportunity_enablement\": 4,\n        \"job_size\": 4\n      }\n    },\n    {\n      \"rank\": 7,\n      \"candidate\": \"656e435d-3a00-4f5e-a38e-114467f9062e\",\n      \"action\": \"work-next\",\n      \"why\": \"This high-priority workstream has a single 'wait' task ('Task: Activate Ops-Hub Widgets In Inter-Hub') and no 'needs_human' tasks. It appears ready for the next step to activate the widgets.\",\n      \"confidence\": \"medium\",\n      \"wsjf"
+}
\ No newline at end of file
diff --git a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
index 26cec64..844c333 100644
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -110,6 +110,40 @@ Done when:
   whether the schema param is load-bearing);
 - the failing payload is captured as a regression fixture under `tests/`.
 
+2026-06-26 findings (local analysis on the workstation):
+
+- **Mechanism confirmed structurally.** There are **16 active workstreams**
+  org-wide and the triage instruction emits ~one ranked recommendation per
+  candidate. The preserved preview holds 6 complete recommendations plus the
+  start of rank 7; the JSON break is at char 5268 (~rank 8–9). The unbounded
+  one-per-workstream list is the structural cause: more items mean more tokens
+  and higher odds of a mid-stream JSON slip and/or truncation. This directly
+  justifies T02's bounded top-N + per-item framing.
+- **Both attempts failed.** `executor._execute` retries once
+  (`src/activity_core/rules/executor.py:166-171`); the recorded error is from the
+  **retry** output, so the model produced invalid JSON twice — not a one-off.
+- **activity-core discards the diagnostics needed to root-cause this.** Three
+  retention gaps mean the exact char-5268 token cannot be recovered from
+  activity-core data at all:
+  1. `LLMConnectClient.complete()` returns only `data["content"]`
+     (`llm_client.py:57-60`) and drops `finish_reason`/`usage` from the
+     llm-connect HTTP response, so a truncated response cannot be told apart
+     from a structurally invalid one locally.
+  2. the report sink caps raw output at **4000 chars** (`_invalid_output_report`,
+     `executor.py:259`) — below the 5268 break.
+  3. the worker log caps the preview at **2000 chars** (`executor.py:175`).
+- **Remaining (remote, operator-owned).** Confirming the exact offending token
+  and `finish_reason` requires llm-connect's producer-side logs on `railiance01`
+  — cluster access, outside this repo's SCOPE for direct action. Truncation is
+  the leading hypothesis given the 16-item input, but the mitigation (T02/T03) is
+  identical either way, so T01 does not block the build work.
+- **Feeds T03/T04.** The retention gaps are themselves defects to fix: capture
+  `finish_reason`/`usage` and persist a larger bounded raw artifact on validation
+  failure so that this class of failure can always be debugged after the fact.
+- Partial fixture saved:
+  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
+  (the 4000-char preview + validation error; full payload pending the remote pull).
+
 ## Schema + Prompt Redesign For Error Locality
 
 ```task