From cb7fd7b19dccf9539e11641ed81232873179cb82 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Tue, 30 Jun 2026 10:07:28 +0200
Subject: [PATCH] Record daily triage clean streak checkpoint

---
 docs/daily-triage-stabilization-status.md     | 28 ++++++++++++---
 ...ructure-stabilization-pickup-checkpoint.md | 35 ++++++++++++-------
 ...1-infrastructure-stabilization-metaplan.md | 20 ++++++++++-
 3 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/docs/daily-triage-stabilization-status.md b/docs/daily-triage-stabilization-status.md
index 9f05e2a..7878166 100644
--- a/docs/daily-triage-stabilization-status.md
+++ b/docs/daily-triage-stabilization-status.md
@@ -1,6 +1,6 @@
 # Daily-Triage Stabilization Status
 
-Updated: 2026-06-27
+Updated: 2026-06-30
 
 ## Purpose
 
@@ -20,10 +20,16 @@ Recent scheduled run evidence:
 | 2026-06-25 | `cbba6bc0-14cb-492b-ab23-74b9349326c8` | schema-valid daily triage, working memory written |
 | 2026-06-26 | `97fd20a0-eee0-45ea-8290-6d91874e1515` | validation failed at char 5268, working memory written |
 | 2026-06-27 | `c5ab50a8-404b-4e30-849f-841b059ace65` | validation failed at char 5246, working memory written |
+| 2026-06-28 | `f0d8477e-1db9-4c07-bb8c-d28cbb868abc` | schema-valid daily triage, working memory written; still emitted 10 recommendations |
+| 2026-06-29 | `176d2ea7-f0e3-48cd-999b-4ab6055c6a55` | schema-valid daily triage, working memory written; still emitted 10 recommendations |
+| 2026-06-30 | `27d695b2-a537-481b-ada6-ca84ec24cd96` | schema-valid daily triage, working memory written; still emitted 10 recommendations |
 
 The 2026-06-26 and 2026-06-27 failures are both overlong malformed JSON
 responses from `daily-triage-report`. They are not missed schedules and they are
-not silent sink failures.
+not silent sink failures. The 2026-06-28 through 2026-06-30 events restore a
+three-run schema-valid streak, but they do not prove the bounded WP-0016
+contract because the reports still emit 10 recommendations instead of the
+targeted top-N framing.
 
 ## Current Blocker
 
@@ -40,15 +46,16 @@ malformed tail.
 - producer guardrails and ADR-004;
 - regression tests for the 2026-06-26 failure shape.
 
-The remaining gate is the live deployment/smoke path:
+The remaining gate is the live contract/smoke path:
 
 1. Deploy the WP-0016 code and schema together.
 2. Update the Railiance runtime prompt bundle with bounded top-N
    instructions, per-item framing, value vocabularies, and sufficient
    `max_tokens` headroom.
 3. Run a live daily-triage smoke on railiance01 and confirm malformed-tail
    output degrades to partial valid output with quarantined items.
-4. Resume the three-clean-scheduled-run gate for `ACTIVITY-WP-0006-T03` and
-   `ACTIVITY-WP-0010-T04`.
+4. Record the 2026-06-28 / 2026-06-29 / 2026-06-30 three-clean-run
+   calibration result with the caveat that top-N contract adoption is still
+   pending.
 
 ## Hygiene Note
 
@@ -66,3 +73,14 @@ to remove or reconcile stale duplicate task rows from the State Hub index.
 runner evidence proves the State Hub sink and working-memory path are
 reachable. The live human-needed notes now sit on the post-deployment smoke,
 WP-0016 live proof, and three-clean-run calibration tasks.
+
+2026-06-30 recheck: State Hub now has schema-valid scheduled `daily_triage`
+events for 2026-06-28 (`f0d8477e-1db9-4c07-bb8c-d28cbb868abc`), 2026-06-29
+(`176d2ea7-f0e3-48cd-999b-4ab6055c6a55`), and 2026-06-30
+(`27d695b2-a537-481b-ada6-ca84ec24cd96`), all with working-memory notes. This
+is enough to bank the scheduling/sink/schema-validity streak for calibration,
+but not enough to close the WP-0016 live-proof gate: the reports still contain
+10 recommendations rather than the bounded top-N contract, and the local
+activity-core worktree already has separate in-flight diagnostic/status changes
+that should be committed by their owner before Custodian treats them as source
+truth.
diff --git a/docs/infrastructure-stabilization-pickup-checkpoint.md b/docs/infrastructure-stabilization-pickup-checkpoint.md
index 3336570..95c2153 100644
--- a/docs/infrastructure-stabilization-pickup-checkpoint.md
+++ b/docs/infrastructure-stabilization-pickup-checkpoint.md
@@ -1,6 +1,6 @@
 # Infrastructure Stabilization Pickup Checkpoint
 
-Updated: 2026-06-27
+Updated: 2026-06-30
 Coordinator workplan: `CUST-WP-0051`
 
 ## Purpose
@@ -68,7 +68,7 @@ separate ops-warden worker.
 | State Hub fallback retirement | Custodian/operator approval; `CUST-WP-0038-T08` | HA failover drill id, restore drill id, stabilization pass | Keep deferred until after HA drills; do not retire WSL2 fallback early. |
 | Inter-Hub ops-hub bootstrap | `inter-hub-bootstrap-ssh`, `openbao-api-key`, `ssh-cert-host-access` as needed | Hub id, manifest id, widget count, runtime key prefix only, smoke result | Legacy/fallback only. Prefer Core Hub deployed smoke; run attended Inter-Hub bootstrap only by explicit operator supersede/rollback decision. |
 | Ops-hub runtime evidence key | `openbao-api-key` / OpenBao custody | OpenBao path/version or populated key count, event smoke id | Do not materialize legacy `OPS_HUB_KEY` until a deployed Core Hub smoke or explicit legacy Inter-Hub smoke is ready to use it. |
-| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Deploy WP-0016 code/schema and bounded runtime prompt bundle, then run railiance01 smoke. |
+| Daily-triage live proof | activity-core deploy/runtime operator | State Hub `daily_triage` id, output-valid or partial/quarantine status, working-memory path | Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean streak, then have the activity-core owner land/sync the in-flight WP-0016 diagnostics, prove the bounded top-N contract, and pass the graceful-degradation smoke. |
 | activity-core to issue-core | route `activity-core-issue-sink` | `actcore-runtime-secret` has key, activity-core points to issue-core port `8765`, HTTP 201, Gitea issue id | Inject `ISSUE_CORE_API_KEY` through approved custody, set REST sink env, restart/sync, run safe emission. |
 | Forgejo production design | Forgejo/operator decisions plus OpenBao/KeyCape/ops-bridge routes as needed | Decision id, SMTP smoke, backup/restore drill, package/action smoke, cutover approval id | Resolve T02 production choices before any production cutover work. |
 | OpenBao unseal and credential helper | `openbao-api-key`, `railiance-infra-principals`, `ssh-cert-host-access`, `key-cape-oidc-login` | Policy names, role names, token accessor only, allow/deny smoke | `warden-sign` lane is verified/banked; broader custody profile and issuer automation remain separate operator-design gates. |
@@ -77,23 +77,32 @@ separate ops-warden worker.
 ## Daily Automation Evidence
 
 The scheduled daily-triage runner is alive and writing State Hub plus working
-memory evidence. The current blocker is output validation, not scheduling or
-sink reachability.
+memory evidence. The current blocker is bounded output-contract adoption and
+live graceful-degradation proof, not scheduling or sink reachability.
 
-Latest clean scheduled run:
+Latest clean scheduled streak:
 
-- 2026-06-25: State Hub event `cbba6bc0-14cb-492b-ab23-74b9349326c8`,
-  schema-valid daily triage, working memory written.
+- 2026-06-28: event `f0d8477e-1db9-4c07-bb8c-d28cbb868abc`, schema-valid daily
+  triage, working memory written.
+- 2026-06-29: event `176d2ea7-f0e3-48cd-999b-4ab6055c6a55`, schema-valid daily
+  triage, working memory written.
+- 2026-06-30: event `27d695b2-a537-481b-ada6-ca84ec24cd96`, schema-valid daily
+  triage, working memory written.
 
-Latest failed scheduled runs:
+Latest failed scheduled runs before the clean streak:
 
 - 2026-06-26: event `97fd20a0-eee0-45ea-8290-6d91874e1515`, validation failed
   at char 5268, working memory written.
 - 2026-06-27: event `c5ab50a8-404b-4e30-849f-841b059ace65`, validation failed
   at char 5246, working memory written.
 
-Resume from `docs/daily-triage-stabilization-status.md` and
-`ACTIVITY-WP-0016` before restarting the three-clean-run gate.
+Bank the three-run calibration streak, but keep the WP-0016 live-proof gate open
+until the bounded top-N contract and graceful-degradation smoke are proven. The
+activity-core worktree currently has in-flight uncommitted ACTIVITY-WP-0016
+and ACTIVITY-WP-0018/0019 changes, so Custodian should wait for that owner to
+commit/sync or explicitly hand off before treating those files as source truth.
+Use the activity-core repo-native automation status surface once it lands; do
+not use assistant-provided scheduling as operational evidence.
 
 ## Production Service Summary
 
@@ -117,8 +126,10 @@ Resume from `docs/daily-triage-stabilization-status.md` and
 2. Keep `CUST-WP-0047` and `CUST-WP-0049` as legacy evidence/fallback until
    Core Hub deployed smoke evidence or an explicit supersede decision closes
    them.
-3. Deploy the activity-core WP-0016 code/schema and bounded runtime prompt
-   bundle, then run the railiance01 daily-triage smoke.
+3. Bank the 2026-06-28 / 2026-06-29 / 2026-06-30 clean daily-triage
+   streak for calibration, then have the activity-core owner land/sync the
+   in-flight WP-0016 diagnostics/status work, prove the bounded top-N
+   contract, and pass the graceful-degradation smoke.
 4. Complete the issue-core handoff by wiring activity-core to port `8765` with
    `ISSUE_SINK_TYPE=rest` and one known-safe emission smoke.
 5. Request explicit State Hub cutover approval for `CUST-WP-0011-T07`, or
diff --git a/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md b/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
index d4b2be1..963054a 100644
--- a/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
+++ b/workplans/CUST-WP-0051-infrastructure-stabilization-metaplan.md
@@ -10,7 +10,7 @@ topic_slug: custodian
 planning_priority: high
 planning_order: 51
 created: "2026-06-27"
-updated: "2026-06-27"
+updated: "2026-06-30"
 state_hub_workstream_id: "21cabc98-3f80-4d00-b3b7-06e2ac2af88f"
 ---
 
@@ -270,6 +270,24 @@ Progress 2026-06-27:
 - Cleared the stale human-needed flag from the completed bridge/config task
   and moved live intervention notes onto the deploy/smoke/calibration gate.
 
+Progress 2026-06-30 daily-triage recheck:
+
+- State Hub now shows three consecutive schema-valid scheduled `daily_triage`
+  events after the malformed 2026-06-26 and 2026-06-27 outputs:
+  2026-06-28 `f0d8477e-1db9-4c07-bb8c-d28cbb868abc`, 2026-06-29
+  `176d2ea7-f0e3-48cd-999b-4ab6055c6a55`, and 2026-06-30
+  `27d695b2-a537-481b-ada6-ca84ec24cd96`; all wrote working memory.
+- This banks the scheduling/sink/schema-validity streak for
+  `ACTIVITY-WP-0006-T03` calibration feedback, but not the full WP-0016
+  live-proof gate because the reports still emit 10 recommendations instead of
+  the bounded top-N contract.
+- `/home/worsch/activity-core` currently has in-flight uncommitted changes for
+  ACTIVITY-WP-0016 diagnostics and new ACTIVITY-WP-0018/0019
+  automation-status/inventory workplans. Custodian should not overwrite or
+  commit that worktree; the next clean handoff is for the activity-core owner to
+  commit/sync or explicitly hand it off, then use the repo-native automation
+  status surface as evidence.
+
 ## Task: Finish Near-Term Production Service Lanes
 
 ```task