Record daily triage clean streak checkpoint

2026-06-30 10:07:28 +02:00
parent 3ef57f63c1
commit cb7fd7b19d
3 changed files with 65 additions and 18 deletions

@@ -1,6 +1,6 @@
# Daily-Triage Stabilization Status
Updated: 2026-06-27
Updated: 2026-06-30
## Purpose
@@ -20,10 +20,16 @@ Recent scheduled run evidence:
| 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
| 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
| 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |
| 2026-06-28 | `f0d8477e-1db9-4c07-bb8c-d28cbb868abc` | schema-valid daily triage, working memory written; still emitted 10 recommendations |
| 2026-06-29 | `176d2ea7-f0e3-48cd-999b-4ab6055c6a55` | schema-valid daily triage, working memory written; still emitted 10 recommendations |
| 2026-06-30 | `27d695b2-a537-481b-ada6-ca84ec24cd96` | schema-valid daily triage, working memory written; still emitted 10 recommendations |
The 2026-06-26 and 2026-06-27 failures are both overlong malformed JSON
responses from `daily-triage-report`. They are not missed schedules and they are
not silent sink failures.
not silent sink failures. The 2026-06-28 through 2026-06-30 events restore a
three-run schema-valid streak, but they do not prove the bounded WP-0016
contract because the reports still emit 10 recommendations instead of the
targeted top-N framing.
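
The distinction drawn above, a banked schema-valid streak versus an unproven bounded contract, can be sketched as two separate checks. This is only an illustrative sketch: the `TriageRun` record, the `gate_status` helper, and the `top_n=5` bound are assumptions made for the example (the source does not state the value of N), and recommendation counts for runs before 2026-06-28 are left as `None` because they are not recorded here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TriageRun:
    """Hypothetical record of one scheduled daily-triage run."""
    day: date
    event_id: str
    schema_valid: bool
    recommendation_count: Optional[int]  # None when the count is not recorded


def clean_streak(runs):
    """Count schema-valid runs backwards from the newest record.

    Records are treated as consecutive; calendar gaps are not checked.
    """
    streak = 0
    for run in sorted(runs, key=lambda r: r.day, reverse=True):
        if not run.schema_valid:
            break
        streak += 1
    return streak


def gate_status(runs, required_streak=3, top_n=5):
    """Report the streak gate and the bounded-contract gate separately.

    top_n=5 is a placeholder bound, not a value taken from WP-0016.
    """
    streak_ok = clean_streak(runs) >= required_streak
    recent = sorted(runs, key=lambda r: r.day)[-required_streak:]
    bounded_ok = all(
        r.recommendation_count is not None and r.recommendation_count <= top_n
        for r in recent
    )
    return {
        "streak_banked": streak_ok,
        "bounded_contract_proven": streak_ok and bounded_ok,
    }


RUNS = [
    TriageRun(date(2026, 6, 25), "cbba6bc0-14cb-492b-ab23-74b9349326c8", True, None),
    TriageRun(date(2026, 6, 26), "97fd20a0-eee0-45ea-8290-6d91874e1515", False, None),
    TriageRun(date(2026, 6, 27), "c5ab50a8-404b-4e30-849f-841b059ace65", False, None),
    TriageRun(date(2026, 6, 28), "f0d8477e-1db9-4c07-bb8c-d28cbb868abc", True, 10),
    TriageRun(date(2026, 6, 29), "176d2ea7-f0e3-48cd-999b-4ab6055c6a55", True, 10),
    TriageRun(date(2026, 6, 30), "27d695b2-a537-481b-ada6-ca84ec24cd96", True, 10),
]

print(gate_status(RUNS))
```

Keeping the two results apart mirrors the status text: the streak can be banked for calibration while the live-proof gate stays open.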
## Current Blocker
@@ -40,15 +46,16 @@ malformed tail.
- producer guardrails and ADR-004;
- regression tests for the 2026-06-26 failure shape.
The remaining gate is the live deployment/smoke path:
The remaining gate is the live contract/smoke path:
1. Deploy the WP-0016 code and schema together.
2. Update the Railiance runtime prompt bundle with bounded top-N instructions,
per-item framing, value vocabularies, and sufficient `max_tokens` headroom.
3. Run a live daily-triage smoke on railiance01 and confirm malformed-tail
output degrades to partial valid output with quarantined items.
4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
`ACTIVITY-WP-0010-T04`.
4. Record the 2026-06-28 / 2026-06-29 / 2026-06-30 three-clean-run
calibration result with the caveat that top-N contract adoption is still
pending.
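
The smoke expectation in step 3, that malformed-tail output degrades to partial valid output with quarantined items, might look roughly like the sketch below. It is not the WP-0016 implementation: the `salvage_items` helper, the assumption that the payload is a JSON array of recommendation objects, and the `REQUIRED_KEYS` field names are all hypothetical. It relies only on the standard-library `json.JSONDecoder.raw_decode` to parse one item at a time.

```python
import json

# Hypothetical per-item fields; the real schema lives in the WP-0016 contract.
REQUIRED_KEYS = {"title", "priority"}


def salvage_items(text):
    """Parse a JSON array item by item, keeping whatever parses cleanly.

    Items that parse but miss required keys, and any unparseable tail,
    are quarantined instead of failing the whole report.
    """
    decoder = json.JSONDecoder()
    valid, quarantined = [], []
    complete = False

    start = text.find("[")
    if start < 0:
        return {"status": "failed", "items": [], "quarantined": [text]}
    pos = start + 1

    while True:
        # raw_decode does not skip separators, so step over them manually.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            break  # ran out of input before the closing bracket
        if text[pos] == "]":
            complete = True
            break
        try:
            item, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            quarantined.append(text[pos:])  # keep the malformed tail as evidence
            break
        if isinstance(item, dict) and REQUIRED_KEYS <= item.keys():
            valid.append(item)
        else:
            quarantined.append(item)

    if complete and not quarantined:
        status = "valid"
    elif valid:
        status = "partial"
    else:
        status = "failed"
    return {"status": status, "items": valid, "quarantined": quarantined}


truncated = (
    '[{"title": "a", "priority": "high"}, '
    '{"title": "b", "priority": "low"}, '
    '{"title": "c", "prio'
)
result = salvage_items(truncated)
print(result["status"], len(result["items"]), len(result["quarantined"]))
```

A live smoke would then assert on the status and the quarantine count rather than on full-document validity.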
## Hygiene Note
@@ -66,3 +73,14 @@ to remove or reconcile stale duplicate task rows from the State Hub index.
runner evidence proves the State Hub sink and working-memory path are reachable.
The live human-needed notes now sit on the post-deployment smoke, WP-0016 live
proof, and three-clean-run calibration tasks.
2026-06-30 recheck: State Hub now has schema-valid scheduled `daily_triage`
events for 2026-06-28 (`f0d8477e-1db9-4c07-bb8c-d28cbb868abc`), 2026-06-29
(`176d2ea7-f0e3-48cd-999b-4ab6055c6a55`), and 2026-06-30
(`27d695b2-a537-481b-ada6-ca84ec24cd96`), all with working-memory notes. This
is enough to bank the scheduling/sink/schema-validity streak for calibration,
but not enough to close the WP-0016 live-proof gate: the reports still contain
10 recommendations rather than the bounded top-N contract, and the local
activity-core worktree already has separate in-flight diagnostic/status changes
that should be committed by their owner before Custodian treats them as source
truth.

@@ -1,6 +1,6 @@
# Infrastructure Stabilization Pickup Checkpoint
Updated: 2026-06-27
Updated: 2026-06-30
Coordinator workplan: `CUST-WP-0051`
## Purpose
@@ -68,7 +68,7 @@ separate ops-warden worker.
| State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
| Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Legacy/fallback only. Prefer Core Hub deployed smoke; run attended Inter-Hub bootstrap only by explicit operator supersede/rollback decision. |
| Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Do not materialize legacy `OPS_HUB_KEY` until a deployed Core Hub smoke or explicit legacy Inter-Hub smoke is ready to use it. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean streak, then have the activity-core owner land/sync the in-flight WP-0016 diagnostics and prove bounded top-N plus graceful-degradation smoke. |
| activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
| Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
| OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | `warden-sign` lane is verified/banked; broader custody profile and issuer automation remain separate operator-design gates. |
@@ -77,23 +77,32 @@ separate ops-warden worker.
## Daily Automation Evidence
The scheduled daily-triage runner is alive and writing State Hub plus working
memory evidence. The current blocker is output validation, not scheduling or
sink reachability.
memory evidence. The current blocker is bounded output-contract adoption and
live graceful-degradation proof, not scheduling or sink reachability.
Latest clean scheduled run:
Latest clean scheduled streak:
- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
schema-valid daily triage, working memory written.
- 2026-06-28: event `f0d8477e-1db9-4c07-bb8c-d28cbb868abc`, schema-valid daily
triage, working memory written.
- 2026-06-29: event `176d2ea7-f0e3-48cd-999b-4ab6055c6a55`, schema-valid daily
triage, working memory written.
- 2026-06-30: event `27d695b2-a537-481b-ada6-ca84ec24cd96`, schema-valid daily
triage, working memory written.
Latest failed scheduled runs:
Latest failed scheduled runs before the clean streak:
- 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
at char 5268, working memory written.
- 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
at char 5246, working memory written.
Resume from `docs/daily-triage-stabilization-status.md` and
`ACTIVITY-WP-0016` before restarting the three-clean-run gate.
Bank the three-run calibration streak, but keep the WP-0016 live-proof gate open
until the bounded top-N contract and graceful-degradation smoke are proven. The
activity-core worktree currently has in-flight uncommitted ACTIVITY-WP-0016
and ACTIVITY-WP-0018/0019 changes, so Custodian should wait for that owner to
commit/sync or explicitly hand off before treating those files as source truth.
Use the activity-core repo-native automation status surface once it lands; do
not use assistant-provided scheduling as operational evidence.
## Production Service Summary
@@ -117,8 +126,10 @@ Resume from `docs/daily-triage-stabilization-status.md` and
2. Keep `CUST-WP-0047` and `CUST-WP-0049` as legacy evidence/fallback until
Core Hub deployed smoke evidence or an explicit supersede decision closes
them.
3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
bundle, then run the railiance01 daily-triage smoke.
3. Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean daily-triage
streak for calibration, then have the activity-core owner land/sync the
in-flight WP-0016 diagnostics/status work and prove the bounded top-N plus
graceful-degradation smoke.
4. Complete the issue-core handoff by wiring activity-core to port `8765` with
`ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or