Add credential approval workflow plan

2026-06-27 22:48:24 +02:00
parent 9d42c73833
commit 85a4278a55
8 changed files with 1103 additions and 0 deletions
--- a/docs/credential-change-approval.md
+++ b/docs/credential-change-approval.md
@@ -0,0 +1,236 @@
+# Credential Change Approval Workflow
+
+This document sketches the operator workflow we want for it-sec and credential
+changes. The goal is to remove raw OpenBao command authoring from routine human
+operation while preserving explicit human approval, auditability, and safe
+handling of secret values.
+
+## Problem
+
+The current workflow still asks operators to translate a reviewed intent into
+OpenBao commands by hand:
+
+- create or update policies;
+- create auth roles with the right bound claims;
+- create or rotate secret paths and fields;
+- verify positive and negative access;
+- tell ops-warden or another access front door when a lane may become active.
+
+That is inefficient and easy to get wrong. It is also hard to review because
+the actual unit of work is spread across chat, workplans, OpenBao UI screens,
+State Hub notes, and shell commands.
+
+## Direction
+
+Treat OpenBao as the enforcement and audit engine, not the primary review UI.
+Add a small approval control plane in front of it:
+
+1. an agent or CLI creates a structured, non-secret credential change request;
+2. humans review the rendered proposal, risk notes, generated OpenBao plan, and
+   verification plan;
+3. a human approves or denies with a comment;
+4. only approved requests can be applied by an operator-controlled helper;
+5. the helper records non-secret evidence and marks the request active,
+   denied, deactivated, rotated, or compromised.
+
+This can be implemented with repo files, State Hub, and CLI/chat integration
+first. An OpenBao UI extension can come later if the workflow proves itself.
+
+## Core Object
+
+The canonical unit is a credential change request, abbreviated `CCR`.
+
+The CCR must be non-secret. It may contain:
+
+- stable request id and title;
+- requester, reviewer, approver, and applier identities;
+- target domain, tenant, workload, environment, and purpose;
+- OpenBao mount, path, field names, policy names, and auth role names;
+- exact non-secret policy HCL or generated policy references;
+- proposed auth bindings and bound claims;
+- delivery surface such as ops-warden, External Secrets, CSI, or direct caller
+  fetch;
+- risk classification and approval requirements;
+- generated apply plan;
+- verification plan;
+- rollback, deactivate, rotate, and compromise response plan;
+- comments, approvals, denials, and timestamps;
+- non-secret OpenBao audit request ids or timestamps after execution.
+
+It must not contain:
+
+- secret values;
+- wrapped token values;
+- root, platform-admin, or issuer tokens;
+- passwords, API keys, private keys, OTP seeds, unseal shares, or recovery
+  codes;
+- command output that includes secret values.
+
+## State Machine
+
+Suggested states:
+
+```text
+draft
+proposed
+needs_changes
+approved
+denied
+apply_pending
+applied
+verified
+active
+deactivated
+rotated
+compromised
+superseded
+cancelled
+```
+
+Only `approved` requests may be applied. Only `verified` requests may become
+`active`.
+
+Emergency break-glass work may create a request after the fact, but it must be
+marked as break-glass, reviewed retrospectively, and linked to audit evidence.
+
+## Review Surface
+
+A reviewer should see a concise rendered proposal:
+
+```text
+Request: whynot-design npm publish token lane
+Type: workload-kv-read
+Mount/path/field:
+  platform/workloads/whynot-design/whynot-design/npm-publish
+  NPM_AUTH_TOKEN
+Policy:
+  workload-kv-read-whynot-design-npm-publish
+Auth binding:
+  netkingdom OIDC role whynot-design-workload-kv-read
+  bound claim: groups includes whynot-design
+Access front door:
+  ops-warden whynot-design-npm-token
+Risk:
+  grants read access to npm publish credential
+Checks:
+  positive whynot fetch, negative non-whynot denial, OpenBao audit evidence
+Decision:
+  approve | deny | needs changes
+Comment:
+  free text
+```
+
+The reviewer should not need to know the exact `bao write` syntax. They should
+be able to discuss the proposal in chat, request changes, and then make a
+formal decision.
+
+## Minimal Implementation
+
+Version 1 should be boring:
+
+- store CCR files under `credential-change-requests/`;
+- validate CCR schema offline;
+- render a human-readable review summary;
+- generate OpenBao apply plans from approved CCRs;
+- require an approval record before apply;
+- apply only non-secret policy/auth/path metadata;
+- prompt or delegate separately for secret value entry;
+- record non-secret evidence in State Hub.
+
+The CLI shape can be:
+
+```bash
+scripts/credential-change.py propose workload-kv ...
+scripts/credential-change.py render CCR-YYYY-NNNN
+scripts/credential-change.py approve CCR-YYYY-NNNN --comment "..."
+scripts/credential-change.py deny CCR-YYYY-NNNN --comment "..."
+scripts/credential-change.py apply CCR-YYYY-NNNN
+scripts/credential-change.py verify CCR-YYYY-NNNN
+scripts/credential-change.py deactivate CCR-YYYY-NNNN --reason "..."
+```
+
+The same operations can be exposed through chat by having the agent create the
+proposal, show the rendered summary, then call the CLI only after the human
+gives an explicit approval phrase.
+
+## State Hub Role
+
+State Hub should hold:
+
+- request lifecycle events;
+- review comments;
+- approval/denial decisions;
+- non-secret apply and verification evidence;
+- links to workplans and CCR files.
+
+State Hub should not hold secret values. It can be the first review UI because
+it already supports messages, progress, task status, and cross-repo
+coordination.
+
+## OpenBao Role
+
+OpenBao remains authoritative for:
+
+- policy enforcement;
+- auth method configuration;
+- token issuance and revocation;
+- secret storage;
+- audit logs.
+
+Where OpenBao supports non-secret metadata on secret paths or auth roles, we can
+mirror CCR ids and status labels. The workflow must not depend on OpenBao being
+the only index, because operators need to see proposed, denied, deactivated,
+rotated, and compromised items across repos and access front doors.
+
+## ops-warden Role
+
+ops-warden should consume only access lanes whose CCR has reached `active`.
+
+For draft requests, ops-warden may create a draft catalog entry that points to
+the CCR, but it should not activate the entry until the CCR is verified.
+
+For `warden access --fetch` / `--exec`, the catalog should include the CCR id
+and refuse active use when the CCR state is not `active`.
+
+## Interactive Runbook Role
+
+The interactive runbook is the operator bridge:
+
+1. load a CCR;
+2. show the rendered summary and exact generated plan;
+3. confirm the request is approved;
+4. acquire operator authority through an approved path;
+5. apply the plan;
+6. ask for attended secret entry when needed;
+7. run positive and negative verification;
+8. record non-secret evidence;
+9. notify downstream front doors such as ops-warden.
+
+This lets operators safely drive privileged work without needing to remember
+every OpenBao command.
+
+## Compromise And Deactivation
+
+Every active CCR needs a deactivate, rotate, and compromise-response path:
+
+- `deactivated`: access intentionally disabled but not necessarily compromised;
+- `rotated`: secret value replaced and old value no longer valid;
+- `compromised`: emergency state requiring immediate disablement, rotation,
+  blast-radius notes, and incident follow-up.
+
+The workflow must support marking an existing credential or lane as compromised
+even when the original request predates this system.
+
+## Near-Term Target
+
+Use the whynot-design npm token lane as the pilot:
+
+1. encode the existing non-secret lane as a CCR;
+2. render it for review;
+3. approve or request changes from chat;
+4. generate/apply the OpenBao policy and auth role only after approval;
+5. provision the secret value by attended operator custody;
+6. verify and activate the ops-warden catalog entry.
+
+Once the pilot works end to end, reuse it for the sibling workload-KV lanes and
+the credential broker's OpenBao token-role gates.
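
The state rules in the `docs/credential-change-approval.md` hunk above (only `approved` requests may be applied, only `verified` requests may become `active`) could be enforced with a small transition table. What follows is a minimal Python sketch and is not part of this commit. The state names come from the document; the `ALLOWED` table, the `can_transition` and `transition` helpers, and every edge other than those two rules are assumptions made for illustration.

```python
from enum import Enum


class State(str, Enum):
    DRAFT = "draft"
    PROPOSED = "proposed"
    NEEDS_CHANGES = "needs_changes"
    APPROVED = "approved"
    DENIED = "denied"
    APPLY_PENDING = "apply_pending"
    APPLIED = "applied"
    VERIFIED = "verified"
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    ROTATED = "rotated"
    COMPROMISED = "compromised"
    SUPERSEDED = "superseded"
    CANCELLED = "cancelled"


# Hypothetical transition table. Only two constraints are taken from the
# document: apply starts from `approved`, and `active` is reached only
# from `verified`. All other edges are illustrative guesses.
ALLOWED = {
    State.DRAFT: {State.PROPOSED, State.CANCELLED},
    State.PROPOSED: {State.NEEDS_CHANGES, State.APPROVED, State.DENIED,
                     State.CANCELLED},
    State.NEEDS_CHANGES: {State.PROPOSED, State.CANCELLED},
    State.APPROVED: {State.APPLY_PENDING, State.SUPERSEDED, State.CANCELLED},
    State.APPLY_PENDING: {State.APPLIED},
    State.APPLIED: {State.VERIFIED},
    State.VERIFIED: {State.ACTIVE},
    State.ACTIVE: {State.DEACTIVATED, State.ROTATED, State.COMPROMISED,
                   State.SUPERSEDED},
    State.ROTATED: {State.VERIFIED, State.COMPROMISED},
    State.DEACTIVATED: {State.COMPROMISED},
}


def can_transition(current: State, target: State) -> bool:
    """Return True when the table permits moving from current to target."""
    return target in ALLOWED.get(current, set())


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not permitted."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

A CLI such as the proposed `apply` subcommand could call `transition(state, State.APPLY_PENDING)` before touching OpenBao, so that an unapproved request fails before any privileged step runs.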
--- a/docs/openbao.md
+++ b/docs/openbao.md
@@ -404,6 +404,10 @@ platform/operators/<purpose>
 The template policy for workload KV reads is
 `openbao/policies/workload-kv-read-template.hcl`.

+Concrete workload access lanes used by ops-warden and similar front doors are
+tracked in `docs/workload-kv-access-lanes.md`. That document carries only
+non-secret path, field, policy, auth-role, and verification pointers.
+
 ## Backup, Restore, Audit, And Monitoring

 Before any live application secrets move into OpenBao:
--- a/docs/workload-kv-access-lanes.md
+++ b/docs/workload-kv-access-lanes.md
@@ -0,0 +1,154 @@
+# Workload KV Access Lanes
+
+This document records concrete OpenBao workload KV paths that external access
+front doors can reference without storing or vending secret values themselves.
+The first lane is for ops-warden `warden access --fetch` / `--exec`.
+
+## Safety Rules
+
+- Do not put secret values in Git, State Hub, chat, prompts, workplans, or logs.
+- Store only non-secret pointers here: path, field name, policy name, auth role,
+  flex-auth reference, and verification status.
+- ops-warden may proxy a read as the caller, but it must not hold the returned
+  value beyond the caller-requested fetch/exec process.
+- Live writes require an approved OpenBao/operator path and attended handling
+  of the secret value.
+
+## whynot-design npm Publish Token
+
+Ops-warden request:
+`551031d1-335e-4db8-9535-820fea52d0a3`
+
+| Item | Value |
+| --- | --- |
+| ops-warden catalog id | `whynot-design-npm-token` |
+| KV mount | `platform` |
+| OpenBao CLI path | `platform/workloads/whynot-design/whynot-design/npm-publish` |
+| Secret field | `NPM_AUTH_TOKEN` |
+| Read policy | `workload-kv-read-whynot-design-npm-publish` |
+| Policy file | `openbao/policies/workload-kv-read-whynot-design-npm-publish.hcl` |
+| OIDC auth mount | `netkingdom` |
+| OIDC role | `whynot-design-workload-kv-read` |
+| Kubernetes auth role | `whynot-design-workload-kv-read` if an in-cluster service account consumes this lane |
+| flex-auth ref | `secret.read:whynot-design` if tenant policy requires pre-approval |
+
+Expected caller login shape:
+
+```bash
+bao login -method=oidc -path=netkingdom role=whynot-design-workload-kv-read
+```
+
+Expected fetch shape:
+
+```bash
+bao kv get -field=NPM_AUTH_TOKEN platform/workloads/whynot-design/whynot-design/npm-publish
+```
+
+The fetch command returns the secret value to the authenticated caller. Run it
+only in an attended shell or through a process that consumes the value without
+logging it.
+
+## OpenBao Policy
+
+The source policy grants only:
+
+```text
+read platform/data/workloads/whynot-design/whynot-design/npm-publish
+read platform/metadata/workloads/whynot-design/whynot-design/npm-publish
+```
+
+It does not grant write, delete, patch, sudo, auth, sibling workload, or parent
+list capabilities.
+
+Dry-run the policy apply path:
+
+```bash
+make openbao-workload-kv-lanes-dry-run
+```
+
+Apply the policy with an approved platform-admin/operator token:
+
+```bash
+OPENBAO_TOKEN_FILE=~/.local/openbao/platform-admin.token \
+  make openbao-configure-workload-kv-lanes
+```
+
+If the OpenBao pod has an approved token-helper session, use:
+
+```bash
+make openbao-configure-workload-kv-lanes OPENBAO_WORKLOAD_KV_ARGS=--use-token-helper
+```
+
+Do not paste the token into shell history or logs. Unless `--use-token-helper`
+is set, the helper reads the token from `OPENBAO_TOKEN_FILE` or from an
+interactive hidden prompt and passes it to OpenBao through stdin.
+
+## Auth Role
+
+The intended OpenBao OIDC role is:
+
+```text
+auth/netkingdom/role/whynot-design-workload-kv-read
+```
+
+The role must attach only:
+
+```text
+workload-kv-read-whynot-design-npm-publish
+```
+
+Before applying the role, confirm the KeyCape/NetKingdom claim that identifies
+the whynot-design caller. The role must bind to that claim; do not create an
+OIDC role without bound claims, because it would grant this policy to every
+OIDC user.
+
+If the consumer is an in-cluster service account instead of an OIDC caller, use
+Kubernetes auth with the same role name and bind only the approved namespace
+and service account.
+
+## Secret Provisioning
+
+An approved operator must create or confirm the secret with:
+
+```text
+path:  platform/workloads/whynot-design/whynot-design/npm-publish
+field: NPM_AUTH_TOKEN
+```
+
+The value must be entered directly through OpenBao/operator custody. Record only
+non-secret evidence: actor, timestamp, path, field name, policy name, and
+verification result.
+
+## Verification
+
+Positive verification:
+
+1. Authenticate as the whynot-design caller using the approved OIDC or
+   Kubernetes auth role.
+2. Fetch the field in an attended session or through `warden access --fetch`.
+3. Record only that the fetch succeeded; do not record the value.
+
+Negative verification:
+
+1. Authenticate as a non-whynot identity.
+2. Confirm the same field read is denied.
+3. Record the non-secret OpenBao audit request ids or timestamps for the
+   allowed and denied attempts.
+
+## ops-warden Handoff
+
+Send ops-warden only these pointers:
+
+```text
+catalog id: whynot-design-npm-token
+mount: platform
+path: platform/workloads/whynot-design/whynot-design/npm-publish
+field: NPM_AUTH_TOKEN
+oidc login: bao login -method=oidc -path=netkingdom role=whynot-design-workload-kv-read
+policy: workload-kv-read-whynot-design-npm-publish
+policy file: openbao/policies/workload-kv-read-whynot-design-npm-publish.hcl
+flex-auth ref: secret.read:whynot-design, if tenant policy requires it
+runbook: docs/workload-kv-access-lanes.md
+```
+
+Until live provisioning and verification are complete, ops-warden should keep
+the catalog entry in `draft` or equivalent non-active state.
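
The lane table in `docs/workload-kv-access-lanes.md` lists the CLI path `platform/workloads/...`, while the policy section names `platform/data/...` and `platform/metadata/...`. On a KV version 2 mount the API path carries a `data/` or `metadata/` segment directly after the mount name, and `bao kv get` adds it on the caller's behalf. The mapping can be sketched in Python as below. This is not part of the commit: the function name `kv2_policy_paths` is hypothetical, and the sketch assumes `platform` is a KV v2 mount, as the policy paths imply.

```python
def kv2_policy_paths(mount: str, cli_path: str) -> tuple[str, str]:
    """Return the (data, metadata) API paths a KV v2 read policy must name."""
    mount = mount.strip("/")
    prefix = mount + "/"
    if not cli_path.startswith(prefix):
        raise ValueError(f"{cli_path!r} is not under mount {mount!r}")
    # Everything after the mount name is the secret's relative path.
    relative = cli_path[len(prefix):].strip("/")
    if not relative:
        raise ValueError("path must name a secret below the mount")
    return f"{mount}/data/{relative}", f"{mount}/metadata/{relative}"


data_path, metadata_path = kv2_policy_paths(
    "platform",
    "platform/workloads/whynot-design/whynot-design/npm-publish",
)
print(data_path)
print(metadata_path)
```

A helper like this could let a plan generator derive policy paths from the single CLI path recorded in a lane, instead of asking operators to maintain both forms by hand.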