From 38c6b11103161697ad7d4e2dd32b88ccaeb08ce8 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 14:52:08 +0200
Subject: [PATCH] RAILIANCE-WP-0009/0010 T07: credential lane lifecycle runbook

Co-Authored-By: Claude Fable 5
---
 docs/credential-lane-lifecycle-runbook.md     | 89 +++++++++++++++++++
 ...9-issue-core-runtime-ingestion-key-lane.md | 15 +++-
 ...lm-connect-openrouter-provider-key-lane.md | 15 +++-
 3 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 docs/credential-lane-lifecycle-runbook.md

diff --git a/docs/credential-lane-lifecycle-runbook.md b/docs/credential-lane-lifecycle-runbook.md
new file mode 100644
index 0000000..30ff4cf
--- /dev/null
+++ b/docs/credential-lane-lifecycle-runbook.md
@@ -0,0 +1,89 @@
+# Credential Lane Lifecycle Runbook
+
+Status: active (RAILIANCE-WP-0009-T07 / RAILIANCE-WP-0010-T07)
+Date: 2026-07-02
+
+Covers deactivation, rotation, and compromise response for the workload KV
+lanes established by `CCR-2026-0002` (issue-core) and `CCR-2026-0003`
+(llm-connect). The **canonical, always-current procedure** is generated from
+the CCR itself — this runbook adds only the lane-specific consumer facts the
+generator cannot know.
+
+```bash
+scripts/credential-change.py lifecycle-plan --action {deactivate|rotate|compromise}
+# then execute the rendered steps and record:
+scripts/credential-change.py lifecycle-event --action \
+  --actor --reason "" --detail "" --record-state-hub
+```
+
+All three actions share the same invariants: the front door goes
+non-resolvable *first*, OpenBao metadata changes use approved operator or
+delegated-applier authority (never `platform-admin` handoffs), audit
+evidence is preserved (never delete the audit device or its entries), and no
+secret value ever appears in Git, State Hub, chat, prompts, or shell history.
+
+## Lane: issue-core runtime ingestion (`CCR-2026-0002`)
+
+| Item | Value |
+| --- | --- |
+| KV path | `platform/workloads/issue-core/issue-core/issue-core-runtime` |
+| Fields | `ISSUE_CORE_API_KEY`, `GITEA_BACKEND_TOKEN` |
+| Policy / auth role | `workload-kv-read-issue-core-runtime` / `auth/kubernetes/role/external-secrets-issue-core` |
+| Primary consumer | ExternalSecret `issue-core/issue-core-runtime` (CoulombCore cluster, 1h refresh) |
+| ops-warden catalog | `issue-core-ingestion-api-key` |
+
+**Consumer facts the generated plan does not cover:**
+
+- Deactivating the policy/role stops the ExternalSecret from *refreshing*,
+  but the materialized Kubernetes Secret **persists** with the last value —
+  a real deactivation or compromise response must also delete
+  `secret/issue-core-runtime` in the `issue-core` namespace (ESO will not
+  recreate it while the lane is down) and restart the issue-core Deployment.
+- **`ISSUE_CORE_API_KEY` has a second consumer**: railiance01's
+  `activity-core/actcore-runtime-secret` holds an operator-injected copy
+  (2026-07-02, ISSUE-WP-0003-T06). Rotation and compromise response MUST
+  re-inject the new value there (stdin-only pipe from OpenBao) and restart
+  `deploy/actcore-worker`, or activity-core emission silently starts failing
+  with 401s on the next run.
+- `GITEA_BACKEND_TOKEN` is a scoped Gitea token for service user
+  `issue-core-svc`; rotating it means minting a new token in Gitea first,
+  then updating OpenBao — order matters, or ingestion breaks between steps.
+
+## Lane: llm-connect OpenRouter provider key (`CCR-2026-0003`)
+
+| Item | Value |
+| --- | --- |
+| KV path | `platform/workloads/activity-core/llm-connect/llm-connect-provider-secrets` |
+| Field | `OPENROUTER_API_KEY` |
+| Policy / auth role | `workload-kv-read-llm-connect-provider-secrets` / `auth/kubernetes/role/external-secrets-activity-core` |
+| Primary consumer | ExternalSecret `activity-core/llm-connect-provider-secrets` (CoulombCore cluster, 1h refresh) |
+| ops-warden catalog | `openrouter-llm-connect` |
+
+**Consumer facts the generated plan does not cover:**
+
+- llm-connect consumes the Secret via `envFrom`, so a rotated value reaches
+  the runtime only after `kubectl -n activity-core rollout restart
+  deploy/llm-connect` (CoulombCore). Wait for the ExternalSecret refresh (or
+  `force-sync` annotate) *before* restarting.
+- **The railiance01 llm-connect instance is out of scope of this lane**: it
+  uses a bootstrap-provisioned Secret from
+  `activity-core/k8s/railiance/bootstrap-secrets.sh`. Rotating the OpenRouter
+  key upstream (at OpenRouter) invalidates *both* copies — a provider-side
+  rotation therefore always requires the railiance01 manual update too, or
+  the daily triage runs start failing with provider auth errors.
+- Compromise response for a provider key has an extra step the plan cannot
+  render: **revoke the key at OpenRouter itself** (provider console) before
+  or immediately after disabling the front door; OpenBao custody actions
+  alone do not stop a leaked provider key from working.
+
+## Verification after rotate
+
+Return the lane to `active` only with fresh positive + negative evidence,
+same shape as activation (2026-07-02 precedent):
+
+- positive: ExternalSecret `SecretSynced=True` with a new refresh timestamp,
+  consumer pod healthy after restart;
+- negative: a `default`-policy token denied on the KV data path, matched in
+  the file audit device by path and timestamp;
+- record via `lifecycle-event ... --record-state-hub` and notify ops-warden
+  to flip the catalog entry back to active.
diff --git a/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md b/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md
index 2171b73..b9eeefc 100644
--- a/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md
+++ b/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md
@@ -249,7 +249,7 @@ Acceptance:
 
 ```task
 id: RAILIANCE-WP-0009-T07
-status: wait
+status: done
 priority: medium
 state_hub_task_id: "c85d1139-1f7d-4ed4-a2fc-5ea4ecbdf0c6"
 ```
@@ -293,3 +293,16 @@ the field-set decision to keep `ISSUE_CORE_API_KEY` and `GITEA_BACKEND_TOKEN`.
   `/openbao/audit/openbao-audit.log`.
 - T06 progress: front-door handoff sent to ops-warden (State Hub message
   `5d47caaa-dd3f-496f-94ba-a488722f8d82`); waiting on catalog confirmation.
+
+
+## T07 completed 2026-07-02
+
+Lifecycle operations documented in
+`docs/credential-lane-lifecycle-runbook.md`: the canonical per-action
+procedure is generated by `scripts/credential-change.py lifecycle-plan
+ --action {deactivate|rotate|compromise}`, and the runbook adds the
+lane-specific consumer facts (materialized-Secret persistence, second
+consumers, restart requirements, provider-side revocation for the OpenRouter
+key) plus the post-rotate verification contract. Front-door disable comes
+first in every action; audit evidence is never deleted; values stay in
+OpenBao/operator custody.
diff --git a/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md b/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md
index 1610fe0..a17d411 100644
--- a/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md
+++ b/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md
@@ -263,7 +263,7 @@ Acceptance:
 
 ```task
 id: RAILIANCE-WP-0010-T07
-status: wait
+status: done
 priority: medium
 state_hub_task_id: "130155a5-e0f9-49f8-ba27-b48098746f02"
 ```
@@ -326,3 +326,16 @@ activity-core-owner); T01 closes on that approval with
 the llm-connect instance on the railiance01 k3s cluster still consumes its
 bootstrap-provisioned Secret; migrating it is railiance01-cluster work, not
 part of CCR-2026-0003.
+
+
+## T07 completed 2026-07-02
+
+Lifecycle operations documented in
+`docs/credential-lane-lifecycle-runbook.md`: the canonical per-action
+procedure is generated by `scripts/credential-change.py lifecycle-plan
+ --action {deactivate|rotate|compromise}`, and the runbook adds the
+lane-specific consumer facts (materialized-Secret persistence, second
+consumers, restart requirements, provider-side revocation for the OpenRouter
+key) plus the post-rotate verification contract. Front-door disable comes
+first in every action; audit evidence is never deleted; values stay in
+OpenBao/operator custody.
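
The lane-specific consumer facts in the runbook above amount to an ordered checklist per field and action. The following is a minimal sketch of that checklist as a reading aid, not part of the commit and not part of `scripts/credential-change.py`. The function name `consumer_followups`, the constants, and the step wording are hypothetical. Only the resource names and the stated orderings (Gitea before OpenBao, ExternalSecret refresh before restart) come from the runbook text; the relative order of the other steps is this sketch's own choice.

```python
"""Hypothetical checklist sketch of the runbook's lane-specific follow-ups."""

ACTIONS = ("deactivate", "rotate", "compromise")
ISSUE_CORE_FIELDS = ("ISSUE_CORE_API_KEY", "GITEA_BACKEND_TOKEN")
LLM_CONNECT_FIELDS = ("OPENROUTER_API_KEY",)


def consumer_followups(field: str, action: str) -> list[str]:
    """Return the follow-up steps the runbook names for one field and action.

    These steps are in addition to the generated lifecycle plan; they do not
    replace it.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if field not in ISSUE_CORE_FIELDS + LLM_CONNECT_FIELDS:
        raise ValueError(f"unknown field: {field!r}")

    steps: list[str] = []

    # issue-core lane: the materialized Kubernetes Secret persists after the
    # policy/role is deactivated, so it has to be removed explicitly.
    if field in ISSUE_CORE_FIELDS and action in ("deactivate", "compromise"):
        steps += [
            "delete secret/issue-core-runtime in the issue-core namespace",
            "restart the issue-core Deployment",
        ]

    # Second consumer of ISSUE_CORE_API_KEY on railiance01.
    if field == "ISSUE_CORE_API_KEY" and action in ("rotate", "compromise"):
        steps += [
            "re-inject the new value into activity-core/actcore-runtime-secret "
            "(stdin-only pipe from OpenBao)",
            "restart deploy/actcore-worker",
        ]

    # Gitea token rotation: mint first, then update OpenBao.
    if field == "GITEA_BACKEND_TOKEN" and action == "rotate":
        steps += [
            "mint a new token in Gitea for issue-core-svc",
            "update the OpenBao field with the new token",
        ]

    if field == "OPENROUTER_API_KEY":
        if action == "compromise":
            steps.append("revoke the key at OpenRouter (provider console)")
        if action == "rotate":
            steps += [
                "wait for the ExternalSecret refresh",
                "kubectl -n activity-core rollout restart deploy/llm-connect",
                "if rotated at OpenRouter: update the railiance01 "
                "bootstrap-provisioned Secret manually",
            ]

    return steps


if __name__ == "__main__":
    for step in consumer_followups("OPENROUTER_API_KEY", "rotate"):
        print("-", step)
```

Returning plain strings keeps the sketch free of any cluster or OpenBao access, so it can be checked offline; no secret value is ever an input.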