# OpenBao - Platform Secrets Service

**Chart:** `openbao/openbao`  
**Chart version:** `0.28.2`  
**App version:** `v2.5.3`  
**Namespace:** `openbao`  
**Managed by:** `railiance-platform` (S3)  
**Workplan:** `RAIL-PL-WP-0002`  
**Initial target:** Railiance01 (`92.205.62.239`)

---

## Architecture

```
S5 workloads / operators
  -> openbao.openbao.svc.cluster.local:8200
  -> openbao-0
  -> integrated Raft storage on local-path PVC
  -> audit storage PVC mounted at /openbao/audit

Platform operators with approved admin identity
  -> https://bao.coulomb.social
  -> Traefik Ingress + TLS
  -> openbao-ui service
  -> OpenBao UI/API
  -> KeyCape OIDC at https://kc.coulomb.social for login
```

- OpenBao is the canonical Railiance S3 secrets service.
- SOPS/age remains the Git-at-rest bootstrap mechanism.
- The first Railiance01 deployment is single-replica Raft, not true HA.
- Browser UI/API exposure is declared for `https://bao.coulomb.social`. Operators authenticate through KeyCape/OIDC with MFA and the `platform-admin` role. Do not use the root token through the browser UI.
- `kubectl exec` and port-forwarding remain valid break-glass/operator paths for maintenance and non-browser verification.
- TLS is disabled inside the pod listener for this internal-only bootstrap. Add cert-manager-backed internal TLS before relying on cluster-internal traffic from untrusted namespaces.

## Deployment

The official OpenBao project recommends the Helm chart for Kubernetes deployments and warns to run Helm with `--dry-run` before install or upgrade.

From a host with kubeconfig access:

```bash
make openbao-dry-run
make openbao-deploy
make openbao-status
```

`make openbao-deploy` applies `helm/openbao-middleware.yaml` (Traefik rate-limit and HSTS), upgrades the OpenBao Helm release, then applies the KeyCape login overlay gateway (`helm/openbao-ui-overlay-k8s.yaml`). Public ingress for `bao.coulomb.social` targets `openbao-ui-gateway`, not the chart ingress (which stays disabled in `helm/openbao-values.yaml`).

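
The dry-run-first recommendation above can be enforced rather than remembered. The following is a minimal sketch, not something shipped in the repo: `openbao_gated_deploy` is a hypothetical wrapper name, and it assumes the three `make` targets shown above exit non-zero on failure.

```bash
# Hypothetical wrapper (not part of railiance-platform): run the dry-run
# first and stop before deploying if it fails.
openbao_gated_deploy() {
  if ! make openbao-dry-run; then
    echo "dry-run failed; not deploying" >&2
    return 1
  fi
  # Deploy only after a clean dry-run, then report status.
  make openbao-deploy || return 1
  make openbao-status
}
```

Source the function in an operator shell and call `openbao_gated_deploy` in place of the three separate commands. A failed dry-run then leaves the running release untouched.
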

On Railiance01 directly:

```bash
cd ~/railiance-platform
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-dry-run
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-deploy
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-status
```

If the repo is not present on Railiance01 yet, copy only the non-secret values and middleware files, then run Helm directly:

```bash
scp helm/openbao-values.yaml tegwick@92.205.62.239:/tmp/openbao-values.yaml
scp helm/openbao-middleware.yaml tegwick@92.205.62.239:/tmp/openbao-middleware.yaml
ssh tegwick@92.205.62.239 \
  'sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply -f /tmp/openbao-middleware.yaml'
ssh tegwick@92.205.62.239 \
  'sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install openbao openbao/openbao \
    --version 0.28.2 \
    --namespace openbao \
    --create-namespace \
    -f /tmp/openbao-values.yaml \
    --dry-run'
```

Repeat without `--dry-run` to deploy.

## Verification

```bash
kubectl get pods,svc,pvc -n openbao -o wide
kubectl exec -n openbao openbao-0 -- bao status
```

Expected immediately after install:

- `openbao-0` is Running.
- `openbao`, `openbao-active`, `openbao-internal`, and `openbao-ui` services exist as cluster-internal services.
- After DNS points at the cluster ingress, `https://bao.coulomb.social` serves the OpenBao UI over valid TLS.
- data and audit PVCs are Bound.
- `bao status` reports `Initialized: false` and `Sealed: true`.

That state is intentional until the bootstrap ceremony is completed. `bao status` may return exit code `2` while sealed; this is expected for the pre-init state and does not by itself indicate a deployment failure.

## Bootstrap Ceremony

Do not initialize OpenBao in a casual shell session. Initialization emits the unseal keys and initial root token. Treat this as a break-glass event.

### Setup Operator And King Credential

The initial accountable setup operator/contact is `tegwick` (`bernd.worsch@gmail.com`), with Gitea identity `tegwick`.
This identity can assemble early infrastructure, receive notifications, and operate day-to-day Git/Gitea workflows, but it is not the desired long-term platform root of trust.

The actual platform-root target is a separate king credential created through the NetKingdom bootstrap path before OpenBao becomes live secret custody. Email may receive notifications, but Gitea, Git, State Hub, chat, tickets, shell history, and email must not store or transfer OpenBao unseal keys, root tokens, private keys, OTP seeds, recovery codes, or screenshots of secret output.

The canonical custody policy is in `net-kingdom/docs/platform-root-custody.md`. The preferred production posture is independent two-of-three custody. Temporary single-operator king custody is feasible for pre-production bootstrap only when second-factor protection, offline recovery storage, and a low-friction upgrade path to additional custodians are in place.

Pre-flight checks:

```bash
make openbao-status
make openbao-verify
```

Proceed only when:

- `openbao-0` is Running.
- data and audit PVCs are Bound.
- `bao status` reports `Initialized: false` and `Sealed: true`.
- Railiance01 host/cluster backup posture is understood for this maintenance window.
- the guided NetKingdom bootstrap path exists for creating or importing the king credential.
- the OpenBao custody mode is recorded: preferred independent custody, or an explicit temporary single-custodian king bootstrap exception.

Recommended ceremony:

1. Confirm the Railiance01 backup posture first.
2. Prepare the king credential and approved escrow holders or offline single-custody locations.
3. Run initialization once:

   ```bash
   kubectl exec -n openbao openbao-0 -- \
     bao operator init -key-shares=3 -key-threshold=2
   ```

4. Give each unseal share to its escrow owner or approved king-custody location through an out-of-band channel.
5. Unseal with two shares. Run the command once per share; it prompts for the share, so the value stays out of shell history. The `-it` flags attach the terminal that the prompt needs:

   ```bash
   kubectl exec -it -n openbao openbao-0 -- bao operator unseal
   ```

6. Log in with the initial root token only long enough to create durable admin auth, enable audit, and prepare policies.
7. Revoke or tightly escrow the initial root token.

Do not paste unseal keys, root tokens, screenshots, or command output into Git, State Hub, chat, shell history, or issue trackers.

Each unseal share goes to one escrow owner through an out-of-band channel. The initial root token is either revoked after a non-root platform-admin token exists or stored as offline break-glass material with the same handling as unseal shares.

## Initial Configuration After Unseal

File audit is configured declaratively in `helm/openbao-values.yaml` with a server config `audit "file" "file"` stanza that writes to `/openbao/audit/openbao-audit.log` on the audit PVC.

Enable the first KV v2 mount:

```bash
kubectl exec -n openbao openbao-0 -- \
  bao secrets enable -path=platform kv-v2
```

Kubernetes auth, database dynamic credentials, PKI, CSI, and External Secrets integration are follow-up tasks in `RAIL-PL-WP-0002`. Do not migrate live application secrets until those policies and restore drills are documented.

The repo now includes a non-secret helper for the first post-unseal configuration:

```bash
make openbao-configure-initial
```

The target prompts for a token, verifies the declarative file audit device is visible, enables the `platform/` KV v2 mount, enables Kubernetes auth, configures Kubernetes auth from the in-pod service account, and loads:

- `openbao/policies/platform-admin.hcl`
- `openbao/policies/platform-readonly.hcl`

It does not print or store the token. You may also set `OPENBAO_TOKEN_FILE=/path/to/token-file` for an operator-local, uncommitted token file.

OpenBao audit is a production gate. If `bao audit list` does not show `file/`, fix the declarative audit stanza or Helm rollout before moving production secrets into OpenBao.

The helper is idempotent.
Re-running it should report existing `platform/` and `kubernetes/` paths as already enabled instead of failing the ceremony.

After the helper succeeds, create a non-root admin token:

```bash
kubectl exec -n openbao openbao-0 -- \
  bao token create -policy=platform-admin -period=24h -orphan
```

Store that token through the approved operator secret path, then revoke or tightly escrow the initial root token. The root token should not become the normal operator credential.

## SSH Secrets Engine (ops-warden)

After `openbao-configure-initial`, enable the SSH user CA used by `ops-warden` (`warden sign` via `backend: vault`). This is **NET-WP-0020 T5** / **WP-0008 T2** prerequisite.

Declarative artifacts:

- `openbao/ssh/roles-spec.yaml`: `adm-role`, `agt-role`, `atm-role` TTLs
- `openbao/policies/warden-sign.hcl`: least-privilege signing policy
- `scripts/openbao-apply-ssh-engine.sh`: idempotent apply via `kubectl exec`
- `scripts/openbao-verify-ssh-engine.sh`: non-mutating verification

Apply (requires `platform-admin` or equivalent token with `ssh/*` admin):

```bash
mkdir -p ~/.local/openbao
# Store platform-admin token locally (mode 600, never commit):
#   echo '' > ~/.local/openbao/platform-admin.token && chmod 600 ~/.local/openbao/platform-admin.token

OPENBAO_TOKEN_FILE=~/.local/openbao/platform-admin.token \
  OPENBAO_SSH_CA_PUBKEY_OUT=/tmp/openbao-ssh-ca.pub \
  make openbao-configure-ssh

OPENBAO_TOKEN_FILE=~/.local/openbao/platform-admin.token make openbao-verify-ssh
```

The apply script exports the CA public key to `OPENBAO_SSH_CA_PUBKEY_OUT` and updates K8s secret `openbao/openbao-ssh-ca-pub` (non-secret pubkey only).

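
The "mode 600, never commit" rule for the local token file can be checked before any target reads it. This is a minimal sketch under stated assumptions: `check_token_file` is a hypothetical helper that does not exist in the repo, and it relies on GNU coreutils `stat -c` as found on Linux hosts (BSD/macOS `stat` takes different flags).

```bash
# Hypothetical helper (not in the repo): refuse a token file that is
# missing, empty, or readable by anyone other than its owner.
# Assumes GNU coreutils `stat -c '%a'`.
check_token_file() {
  local file=$1
  local mode
  if [ ! -f "$file" ]; then
    echo "token file not found: $file" >&2
    return 1
  fi
  if [ ! -s "$file" ]; then
    echo "token file is empty: $file" >&2
    return 1
  fi
  mode=$(stat -c '%a' "$file") || return 1
  if [ "$mode" != "600" ]; then
    echo "token file $file has mode $mode, expected 600" >&2
    return 1
  fi
}

# Example use before a make target that reads OPENBAO_TOKEN_FILE:
#   check_token_file ~/.local/openbao/platform-admin.token \
#     && OPENBAO_TOKEN_FILE=~/.local/openbao/platform-admin.token make openbao-verify-ssh
```

The helper only inspects file metadata and never reads or prints the token, in line with the no-echo handling described for the make targets.
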

Create a dedicated warden signing token (do not use platform-admin daily):

```bash
kubectl exec -n openbao openbao-0 -- bao token create -policy=warden-sign -period=8h -orphan
```

Host trust and principals are **railiance-infra** scope:

```bash
cd ~/railiance-infra
make bootstrap-ssh-ca SSH_CA_PUBKEY=/tmp/openbao-ssh-ca.pub
```

Then on the workstation: `bao login` (or export `VAULT_TOKEN` from the `warden-sign` token) and run `warden sign` per `ops-warden/wiki/OpenBaoSshEngineChecklist.md`.

## Auth And Workload Integration

Initial auth model:

| Actor | Method | Notes |
|-------|--------|-------|
| Setup operator/contact | Gitea `tegwick` / `bernd.worsch@gmail.com` | low-trust assembly and notifications; not platform root of trust |
| King credential | NetKingdom custody record for dedicated platform-root identity | accountable bootstrap/recovery authority; not a Git or email secret store |
| Bootstrap operator | one-time root token | only for initial audit, mounts, auth, policies, and non-root token creation |
| Platform operator | token with `platform-admin` | temporary until NetKingdom OIDC/admin integration is ready |
| Read-only reviewer | token with `platform-readonly` | metadata and health visibility, no secret reads |
| Kubernetes workload | Kubernetes auth role | namespace/service-account bound, policy per workload |
| Human identity | NetKingdom IAM Profile/OIDC | target model; OpenBao is not the identity provider |
| Automation | Kubernetes auth or short-lived operator token | no root tokens in automation |

### Browser UI Login

The browser operator surface is:

```text
https://bao.coulomb.social
```

Operators see a streamlined **Sign in with KeyCape** mask. The raw OpenBao fields (namespace, method, mount path, role) are hidden presets applied by the UI overlay in `helm/openbao-ui-overlay/`. Public ingress targets the `openbao-ui-gateway` nginx proxy, which injects overlay assets and forwards to the OpenBao service.


Hidden defaults (also in `helm/openbao-ui-overlay/presets.json`):

```text
method: OIDC
namespace: leave blank
mount path: netkingdom
role: platform-admin
```

Deploy or refresh the overlay:

```bash
make openbao-overlay-apply
make openbao-verify-login-overlay
make openbao-verify-login-overlay OPENBAO_VERIFY_LOGIN_OVERLAY_ARGS=--check-upstream-drift
```

After an OpenBao image or chart upgrade, follow `helm/openbao-ui-overlay/README.md` to refresh overlay selectors and `patches//manifest.sha256` fingerprints if upstream login markup changed.

OIDC mounts must be visible to the unauthenticated UI listing or Ember falls back to token auth (`?with=token`). Apply once per cluster:

```bash
OPENBAO_TOKEN_FILE=~/.local/openbao/platform-admin.token \
  scripts/openbao-tune-auth-listing.sh
```

The login overlay also redirects to `?with=netkingdom/` and starts KeyCape OIDC directly when the operator clicks **Sign in with KeyCape**.

The OpenBao UI redirects the browser to KeyCape at `kc.coulomb.social`, then returns to:

```text
https://bao.coulomb.social/ui/vault/auth/netkingdom/oidc/callback
```

The legacy `keycape` mount remains a compatibility alias for existing operator notes and CLI experiments. The preferred browser mount is `netkingdom`.

The browser callback URI must be present in both:

- KeyCape `openbao-admin` client redirect URIs; and
- OpenBao `auth/netkingdom/role/platform-admin` `allowed_redirect_uris`.

If the compatibility alias is kept enabled, also keep `https://bao.coulomb.social/ui/vault/auth/keycape/oidc/callback` in the KeyCape client and `auth/keycape/role/platform-admin`.

Use the browser UI for metadata inspection and attended operator workflows. Do not use the OpenBao root token through the browser UI. Do not copy secret values, Inter-Hub keys, unseal shares, root tokens, OIDC client secrets, or screenshots of secret values into Git, State Hub, chat, or workplans.

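
The two callback URIs named in this section differ only in the auth mount name, so the value to register on both the KeyCape client and the OpenBao role can be derived instead of retyped. A minimal sketch: `openbao_callback_uri` is a hypothetical helper, and the URI pattern is inferred from the two examples above, not from a documented contract.

```bash
# Hypothetical helper: print the browser OIDC callback URI for an auth mount.
# The path pattern is taken from the netkingdom and keycape examples above.
openbao_callback_uri() {
  local mount=$1
  local base=${2:-https://bao.coulomb.social}
  # Strip one trailing slash from the base so the result has no "//".
  printf '%s/ui/vault/auth/%s/oidc/callback\n' "${base%/}" "$mount"
}

# Print the URIs for the preferred mount and the compatibility alias.
for mount in netkingdom keycape; do
  openbao_callback_uri "$mount"
done
```

Comparing this output against the KeyCape `openbao-admin` client and each role's `allowed_redirect_uris` is a quick way to spot a typo on one side.
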

For `HF-WP-0001`, prefer metadata-only inspection of candidate paths such as:

```text
platform/
platform/operators/
platform/operators/inter-hub/
```

Workload delivery choice:

- Prefer External Secrets Operator for values that should become Kubernetes Secrets consumed by ordinary Helm charts.
- Use CSI-mounted files for workloads that need file references, sharper mount-level boundaries, or secret refresh without rewriting application manifests.
- Do not use the OpenBao injector in the current deployment; the Helm values leave it disabled.
- Application repositories request paths and policies; `railiance-platform` owns platform mounts, policy shape, and delivery mechanisms.

Path convention:

```text
platform/workloads///
platform/object-storage/
platform/databases/
platform/operators/
```

The template policy for workload KV reads is `openbao/policies/workload-kv-read-template.hcl`.

## Backup, Restore, Audit, And Monitoring

Before any live application secrets move into OpenBao:

1. Confirm file audit is enabled and an audit file is written under `/openbao/audit/openbao-audit.log`.
2. Create an OpenBao Raft snapshot from the unsealed pod:

   ```bash
   kubectl exec -n openbao openbao-0 -- \
     bao operator raft snapshot save /tmp/openbao-raft.snap
   kubectl cp openbao/openbao-0:/tmp/openbao-raft.snap ./openbao-raft.snap
   ```

3. Encrypt the snapshot with age/SOPS-compatible custody before it leaves the operator machine.
4. Run an isolated restore drill before treating OpenBao as live secret custody. The drill must prove that a fresh OpenBao instance can restore the snapshot, unseal, and read a test secret. Record only non-secret evidence using `docs/openbao-restore-drill-evidence.example.json` as a template, replace every placeholder with real drill evidence, then validate it with:

   ```bash
   make openbao-validate-restore-evidence \
     OPENBAO_RESTORE_EVIDENCE=/path/to/evidence.json
   ```

5. Decide where audit logs are shipped durably. The audit PVC alone is not a durable audit sink.
   The interim `audit-core` mock file backend can prove API and setup wiring, but it writes to `/tmp` and is not production retention.
6. Run:

   ```bash
   make openbao-verify-post-unseal
   ```

Authenticated verification, after the KeyCape-backed `platform-admin` path or another approved operator token is available:

```bash
make openbao-verify-authenticated
```

The target prompts for the token without echoing it, never puts the token on the command line, and only runs non-mutating checks. It verifies that `bao audit list` shows `file/`, `bao secrets list` shows `platform/`, `bao auth list` shows `kubernetes/`, `netkingdom/`, and `keycape/`, and that the file audit log is non-empty.

If a previous attended OIDC login stored a still-valid token in the pod token helper, use:

```bash
make openbao-verify-authenticated OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper
```

Current durable audit status: the file audit device writes to the audit PVC, which is necessary but not enough for production trust. Before application secrets move into OpenBao, choose and test a durable audit sink beyond that PVC such as an encrypted platform backup/export path or the future centralized logging stack. Do not treat non-secret hashes, screenshots, or State Hub notes as substitutes for retained audit log custody.

Interim integration status: `/home/worsch/audit-core` provides a mock Audit Core backend that writes JSONL records under `/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and deletes files older than seven days. Use it only to wire interfaces and setup validation before the durable Audit Core archive exists.

Emergency seal/unseal drills are disruptive and must only run in an attended window with threshold unseal shares available.
Record non-secret drill evidence using `docs/openbao-emergency-drill-evidence.example.json` as a template, replace every placeholder with real drill evidence, then validate it with:

```bash
make openbao-validate-emergency-evidence \
  OPENBAO_EMERGENCY_EVIDENCE=/path/to/evidence.json
```

Monitoring baseline:

- pod readiness and liveness from Kubernetes probes
- `bao status` seal/init state
- PVC capacity for data and audit storage
- audit log write success
- future Prometheus scraping once the cluster monitoring stack exists

## Artifact-Store Object Storage Handoff

`artifact-store` is the consumer-facing artifact preservation service for generated outputs, evidence packages, reports, logs, snapshots, exports, and release artifacts. It already has an S3-compatible backend with `env:NAME` and `file:/mounted/path` credential references, plus an `artifactstore storage verify --backend s3` smoke path. Railiance should avoid building a parallel object-storage client or credential vending flow in OpenBao.

The ownership split is:

- `railiance-platform` / OpenBao owns bootstrap secret custody, policy, audit, break-glass access, and workload secret delivery.
- `artifact-store` owns artifact package manifests, the S3 backend, storage verification, and whether temporary credentials require backend refresh support or a sidecar/controller.
- `net-kingdom` owns the identity issuer and role-claim model if object storage adopts STS with `AssumeRoleWithWebIdentity`.

Initial static-credential bridge, before STS is proven:

1. Create a scoped object-store access key limited to the artifact-store bucket and prefix. Do not use object-store root credentials.
2. Store the key pair in OpenBao under a platform-owned path such as `platform/object-storage/artifact-store`.
3. Deliver the values to the artifact-store pod through CSI or External Secrets as mounted files.
4. Configure artifact-store with file references:

   ```bash
   export ARTIFACTSTORE_S3_ACCESS_KEY_REF=file:/run/secrets/artifactstore/s3-access-key
   export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore/s3-secret-key
   ```

5. Verify from artifact-store:

   ```bash
   artifactstore storage verify --backend s3
   ```

STS credential vending remains linked to `ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS Credential Vending`. If that workstream chooses MinIO-compatible `AssumeRoleWithWebIdentity`, OpenBao should not become the identity provider by default. Use the NetKingdom OIDC issuer for workload/user identity, map object storage roles and policies there, and keep OpenBao responsible for bootstrap, break-glass, audit, and delivery of any controller configuration.

Current artifact-store configuration exposes access key and secret key refs, but no session-token ref. `ARTIFACT-STORE-WP-0007-T004` must either add temporary-session-token support to the S3 backend or choose a sidecar/secret controller pattern that keeps refreshed credentials available through the existing env/file reference contract.

## Upgrade And Rollback

1. Read the OpenBao chart release notes.
2. Update `OPENBAO_CHART_VERSION` in `Makefile`.
3. Run `make openbao-dry-run`.
4. Confirm current backup and audit log posture.
5. Run `make openbao-deploy`.
6. Run `make openbao-status`.

For rollback, run `helm rollback openbao -n openbao` on Railiance01 and re-check `bao status`.

## Scaling To Three Nodes

When Railiance02 and Railiance03 join:

1. Move storage from `local-path` to distributed storage.
2. Set `server.affinity` back to anti-affinity.
3. Set `server.ha.replicas: 3`.
4. Re-enable a PodDisruptionBudget.
5. Run an unseal, failover, backup, and restore drill before migrating secrets.
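
Rollbacks and drills in this runbook end with a `bao status` re-check, and a sealed instance makes that command exit non-zero. The sketch below maps the exit code to a label so scripts do not treat "sealed" as a failure. `classify_bao_status_rc` is a hypothetical helper. The Verification section documents exit code `2` for sealed. Treating `0` as unsealed and any other code as an error is an assumption based on the Vault-compatible CLI convention, as is `kubectl exec` passing the remote exit code through.

```bash
# Hypothetical helper: translate a `bao status` exit code into a label.
# 2 = sealed is documented in the Verification section; 0 = unsealed and
# "anything else = error" are assumptions about the CLI convention.
classify_bao_status_rc() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}

# Example use after a rollback or drill (assumes kubectl exec propagates
# the remote exit code):
#   kubectl exec -n openbao openbao-0 -- bao status >/dev/null 2>&1
#   classify_bao_status_rc "$?"
```

A caller can then fail only on `error` and route `sealed` to the attended unseal procedure.
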