---
id: RAIL-BS-WP-0008
type: workplan
title: "activity-core WP-0016 triage-output robustness deploy"
domain: financials
repo: railiance-cluster
status: active
owner: railiance-cluster
topic_slug: railiance
created: "2026-07-01"
updated: "2026-07-02"
state_hub_workstream_id: "7cbbe0d6-fea9-41c6-840c-46d0d8e8edde"
---
# activity-core WP-0016 triage-output robustness deploy
## Context
Inbox message `87952ff1` (activity-core, 2026-06-26): the scheduled daily WSJF
triage run on 2026-06-26 failed schema validation and the whole run was
discarded, resetting the WP-0006-T03 three-clean-run streak. ACTIVITY-WP-0016
hardened the instruction-executor output contract in-repo (commits
`5eb33bd..bf877b7` on activity-core main, 220 tests passed). The remaining
work is operator/cluster-owned on railiance01.
**Deploy coupling constraint:** `schemas/daily-triage-report.json` is now
strict per-item and is consumed by both the llm-connect hint and the
whole-doc validator. It MUST ship together with the new `executor.py`
(T03 per-item quarantine parser). Never deploy the schema ahead of the code.
## Deploy activity-core with coupled schema and executor
```task
id: RAIL-BS-WP-0008-T01
status: todo
priority: high
state_hub_task_id: "079e39a9-f938-4d03-a5bc-4d3d2f7b1d83"
```
Rebuild/import the activity-core image from main (`bf877b7` or later) into
the railiance01 k3s runtime and reconcile the activity-core deployment so the
new executor and the strict per-item schema ship together.
2026-07-02: Added `make deploy-activity-core-triage-robustness` /
`bin/railiance deploy-triage-robustness` as the repeatable operator path. The
command gates the remote activity-core repo on `bf877b7` or later, checks the
runtime bundle contract before applying it, restarts the activity-core
deployments by default, waits for migrate/sync jobs and rollouts, then records
non-secret State Hub evidence. Live execution on railiance01 remains pending.
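
The "`bf877b7` or later" gate can be expressed as a git ancestry check. The sketch below is illustrative only and is not the code behind `bin/railiance`; the function name, the `repo_dir` argument, and the injectable `run` parameter are assumptions made for the example.

```python
import subprocess

REQUIRED_COMMIT = "bf877b7"  # minimum activity-core commit named in this workplan


def is_at_or_after(repo_dir, required=REQUIRED_COMMIT, ref="HEAD", run=subprocess.run):
    """Return True when `required` is an ancestor of (or equal to) `ref`.

    Hypothetical helper: `run` is injectable so the check can be exercised
    without a real checkout.
    """
    # `git merge-base --is-ancestor A B` exits 0 if A is an ancestor of B,
    # 1 if it is not, and another code on errors (for example, unknown commit).
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", required, ref],
        capture_output=True,
    )
    if result.returncode in (0, 1):
        return result.returncode == 0
    raise RuntimeError(f"git failed with exit code {result.returncode}")
```

An ancestry check is preferable to comparing hashes or dates, since "or later" only makes sense along the history of main.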
## Update daily-statehub-wsjf-triage runtime-bundle Instruction
```task
id: RAIL-BS-WP-0008-T02
status: todo
priority: high
state_hub_task_id: "129fb472-41e8-4e5c-bcbb-0995a96e223b"
```
In the runtime projection (not the activity-core repo), update the
`daily-statehub-wsjf-triage` Instruction:
- raise `max_tokens` (currently ~1200; give clear headroom above the
~1300-1500-token 16-workstream list);
- prompt: bounded top-N (≤7) ranked recommendations, "if uncertain emit fewer
well-formed items rather than more";
- prompt: per-item NDJSON framing (leading summary object, then one
recommendation JSON object per line) so the T03 parser recovers items
independently.
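
To illustrate why per-line framing helps, here is a minimal sketch of per-item recovery from NDJSON output. It is not the `executor.py` T03 parser; the function name and the sample field names (`summary`, `rank`, `id`) are hypothetical, and it assumes the first well-formed object is the summary.

```python
import json


def parse_ndjson_report(raw):
    """Split NDJSON output into (summary, recommendations, quarantined lines)."""
    summary, items, quarantined = None, [], []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            # A broken or truncated line affects only itself.
            quarantined.append(line)
            continue
        if not isinstance(obj, dict):
            quarantined.append(line)
        elif summary is None:
            summary = obj  # assumption: first valid object is the summary
        else:
            items.append(obj)
    return summary, items, quarantined


# Example: the last line is cut off mid-object, as with a token-limit truncation.
raw = (
    '{"summary": "3 workstreams"}\n'
    '{"rank": 1, "id": "a"}\n'
    '{"rank": 2, "id": "b"}\n'
    '{"rank": 3, "id'
)
summary, items, quarantined = parse_ndjson_report(raw)
```

With a single whole-document JSON payload, the same truncation would invalidate every recommendation; here only the final line is quarantined.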
2026-07-02: The new deploy command enforces this contract against
`k8s/railiance/20-runtime.yaml` before it will touch the cluster: it requires
the daily instruction, a top-7 bound, the "fewer well-formed" fallback, NDJSON
or one-object-per-line framing, and `max_tokens` headroom of at least 1800.
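
A rough sketch of such a contract check follows. How the real command reads `k8s/railiance/20-runtime.yaml` and which phrasings it matches are not stated here, so the function name, its arguments, and the string patterns are all assumptions; only the 1800 floor and the four requirements come from the note above.

```python
import re

MIN_MAX_TOKENS = 1800  # headroom floor stated in this workplan


def contract_violations(prompt, max_tokens):
    """Return a list of contract problems; an empty list means the check passes."""
    text = prompt.lower()
    problems = []
    if max_tokens < MIN_MAX_TOKENS:
        problems.append(f"max_tokens {max_tokens} is below {MIN_MAX_TOKENS}")
    # Hypothetical phrasing checks; the real command may match differently.
    if not re.search(r"top[- ]7\b", text):
        problems.append("missing top-7 bound")
    if "fewer well-formed" not in text:
        problems.append("missing 'fewer well-formed' fallback")
    framing = ("ndjson", "one object per line", "one-object-per-line")
    if not any(marker in text for marker in framing):
        problems.append("missing NDJSON / one-object-per-line framing")
    return problems
```

Returning the full list of problems, rather than failing on the first, lets an operator fix the Instruction in one pass.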
## Pull raw llm-connect response for the 2026-06-26 run
```task
id: RAIL-BS-WP-0008-T03
status: todo
priority: medium
state_hub_task_id: "59559f1d-821f-4660-8a7d-c623c6631864"
```
From the llm-connect pod logs / response store on railiance01, capture the
full raw response and `finish_reason` for the 2026-06-26 05:20:57Z run
(activity-core retained only a 4000-char preview; the JSON break is at char
5268). Send to activity-core to close ACTIVITY-WP-0016-T01. Logs only, no
secrets.
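
Once the full raw response is available, the break offset can be re-derived rather than taken on trust. The sketch below assumes the response is meant to be a single JSON document and that the retained preview is the first 4000 characters; both helper names are hypothetical.

```python
import json

PREVIEW_CHARS = 4000  # preview length retained by activity-core, per this workplan


def json_break_offset(raw):
    """Return the character offset where JSON parsing fails, or None if valid."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        return err.pos  # index into `raw` where the decoder gave up
    return None


def break_visible_in_preview(offset, preview_chars=PREVIEW_CHARS):
    """True when the failing offset falls inside the retained preview."""
    return offset is not None and offset < preview_chars
```

For the reported offset of 5268, `break_visible_in_preview(5268)` is false, which is why the preview alone cannot close ACTIVITY-WP-0016-T01.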
## Acceptance smoke
```task
id: RAIL-BS-WP-0008-T04
status: todo
priority: high
state_hub_task_id: "8096621a-54ee-4be5-943e-5dc2da19ed28"
```
Trigger one daily-triage run against the reconciled runtime and confirm it
either (i) returns a clean schema-valid report, or (ii) degrades gracefully
(valid recommendations with `output_validated=true`, `partial=true`,
`quarantined_count>0`) instead of discarding the run. Confirm the State Hub
shows a matching `daily_triage` progress event. Closes ACTIVITY-WP-0016-T05
and unblocks the three-clean-run streak for ACTIVITY-WP-0010-T04 /
WP-0006-T03.
2026-07-02: The deploy command now triggers the daily-triage definition after
reconcile and polls State Hub for a post-trigger `daily_triage` event with
`output_validated=true`. If the run is partial, it also requires
`quarantined_count>0` before posting pass evidence.
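
The pass condition can be written as a small predicate. This is a sketch only: it assumes the State Hub event is available as a dict carrying the three fields named above at the top level, which may not match the real event shape.

```python
def smoke_passes(event):
    """Apply the acceptance rule to one post-trigger `daily_triage` event."""
    # Either outcome requires a validated output.
    if event.get("output_validated") is not True:
        return False
    # A partial run only counts if something was actually quarantined.
    if event.get("partial"):
        return event.get("quarantined_count", 0) > 0
    # Clean, schema-valid report.
    return True
```

A partial run with `quarantined_count` of zero is rejected because it would mean items went missing without being accounted for.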