OpenBao - Platform Secrets Service

  • Chart: openbao/openbao
  • Chart version: 0.28.2
  • App version: v2.5.3
  • Namespace: openbao
  • Managed by: railiance-platform (S3)
  • Workplan: RAIL-PL-WP-0002
  • Initial target: Railiance01 (92.205.62.239)


Architecture

S5 workloads / operators
  -> openbao.openbao.svc.cluster.local:8200
       -> openbao-0
            -> integrated Raft storage on local-path PVC
            -> audit storage PVC mounted at /openbao/audit

  • OpenBao is the canonical Railiance S3 secrets service.
  • SOPS/age remains the Git-at-rest bootstrap mechanism.
  • The first Railiance01 deployment is single-replica Raft, not true HA.
  • Public ingress is disabled. Operators use kubectl exec or port-forwarding.
  • TLS is disabled inside the pod listener for this internal-only bootstrap. Add cert-manager-backed internal TLS before exposing OpenBao beyond cluster-local traffic.

Deployment

The official OpenBao project recommends the Helm chart for Kubernetes deployments and advises running Helm with --dry-run before any install or upgrade.

From a host with kubeconfig access:

make openbao-dry-run
make openbao-deploy
make openbao-status

On Railiance01 directly:

cd ~/railiance-platform
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-dry-run
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-deploy
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-status

If the repo is not present on Railiance01 yet, copy only the non-secret values file and run Helm directly:

scp helm/openbao-values.yaml tegwick@92.205.62.239:/tmp/openbao-values.yaml
ssh tegwick@92.205.62.239 \
  'sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install openbao openbao/openbao \
     --version 0.28.2 \
     --namespace openbao \
     --create-namespace \
     -f /tmp/openbao-values.yaml \
     --dry-run'

Repeat without --dry-run to deploy.

Verification

kubectl get pods,svc,pvc -n openbao -o wide
kubectl exec -n openbao openbao-0 -- bao status

Expected immediately after install:

  • openbao-0 is Running.
  • openbao, openbao-active, openbao-internal, and openbao-ui services exist as cluster-internal services.
  • data and audit PVCs are Bound.
  • bao status reports Initialized: false and Sealed: true.

That state is intentional until the bootstrap ceremony is completed. bao status may return exit code 2 while sealed; this is expected for the pre-init state and does not by itself indicate a deployment failure.
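For scripted checks, the exit code can be mapped to a state instead of being treated as plain pass/fail. The sketch below assumes the upstream CLI convention of 0 for unsealed, 2 for sealed, and 1 for an error; bao_status_state is a hypothetical helper name and is not part of the repo's Makefile.

```shell
# Map the exit code of `bao status` to a coarse state.
# Assumption: 0 = unsealed, 2 = sealed, anything else = error.
bao_status_state() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}

# Example wiring (not executed here):
#   kubectl exec -n openbao openbao-0 -- bao status >/dev/null 2>&1
#   bao_status_state "$?"
```

With this mapping, a wrapper can accept "sealed" as healthy before the bootstrap ceremony and fail only on "error".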

Bootstrap Ceremony

Do not initialize OpenBao in a casual shell session. Initialization emits the unseal keys and initial root token. Treat this as a break-glass event.

Setup Operator And King Credential

The initial accountable setup operator/contact is tegwick (bernd.worsch@gmail.com), with Gitea identity tegwick. This identity can assemble early infrastructure, receive notifications, and operate day-to-day Git/Gitea workflows, but it is not the desired long-term platform root of trust.

The actual platform-root target is a separate king credential created through the NetKingdom bootstrap path before OpenBao becomes live secret custody. Email may receive notifications, but Gitea, Git, State Hub, chat, tickets, shell history, and email must not store or transfer OpenBao unseal keys, root tokens, private keys, OTP seeds, recovery codes, or screenshots of secret output.

The canonical custody policy is in net-kingdom/docs/platform-root-custody.md. The preferred production posture is independent two-of-three custody. Temporary single-operator king custody is feasible for pre-production bootstrap only when second-factor protection, offline recovery storage, and a low-friction upgrade path to additional custodians are in place.

Pre-flight checks:

make openbao-status
make openbao-verify

Proceed only when:

  • openbao-0 is Running.
  • data and audit PVCs are Bound.
  • bao status reports Initialized: false and Sealed: true.
  • Railiance01 host/cluster backup posture is understood for this maintenance window.
  • the guided NetKingdom bootstrap path exists for creating or importing the king credential.
  • the OpenBao custody mode is recorded: preferred independent custody, or an explicit temporary single-custodian king bootstrap exception.
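The seal/init condition in the list above can also be checked mechanically. This is a minimal sketch, assuming the default table output of bao status (one "Key Value" pair per line); bao_is_preinit is a hypothetical helper and is not what make openbao-verify runs.

```shell
# Read `bao status` table output on stdin.
# Succeeds only when Initialized is false and Sealed is true.
bao_is_preinit() {
  awk '
    $1 == "Initialized" { init = $2 }
    $1 == "Sealed"      { sealed = $2 }
    END { exit !(init == "false" && sealed == "true") }
  '
}

# Example wiring (not executed here):
#   kubectl exec -n openbao openbao-0 -- bao status | bao_is_preinit \
#     && echo "pre-init state confirmed"
```

The remaining conditions (backup posture, king credential path, custody mode) are human decisions and stay outside any script.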

Recommended ceremony:

  1. Confirm the Railiance01 backup posture first.

  2. Prepare the king credential and approved escrow holders or offline single-custody locations.

  3. Run initialization once:

    kubectl exec -n openbao openbao-0 -- \
      bao operator init -key-shares=3 -key-threshold=2
    
  4. Give each unseal share to its escrow owner or approved king-custody location through an out-of-band channel.

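  5. Unseal with two shares. Run the command once per share; each invocation prompts for one share, and -it attaches the terminal so the prompt can read it:

    kubectl exec -it -n openbao openbao-0 -- bao operator unseal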
    
  6. Log in with the initial root token only long enough to create durable admin auth, enable audit, and prepare policies.

  7. Revoke or tightly escrow the initial root token.

Do not paste unseal keys, root tokens, screenshots, or command output into Git, State Hub, chat, shell history, or issue trackers. Each unseal share goes to one escrow owner through an out-of-band channel. The initial root token is either revoked after a non-root platform-admin token exists or stored as offline break-glass material with the same handling as unseal shares.

Initial Configuration After Unseal

File audit is configured declaratively in helm/openbao-values.yaml with a server config audit "file" "file" stanza that writes to /openbao/audit/openbao-audit.log on the audit PVC.

Enable the first KV v2 mount:

kubectl exec -n openbao openbao-0 -- \
  bao secrets enable -path=platform kv-v2

Kubernetes auth, database dynamic credentials, PKI, CSI, and External Secrets integration are follow-up tasks in RAIL-PL-WP-0002. Do not migrate live application secrets until those policies and restore drills are documented.

The repo now includes a non-secret helper for the first post-unseal configuration:

make openbao-configure-initial

The target prompts for a token, verifies the declarative file audit device is visible, enables the platform/ KV v2 mount, enables Kubernetes auth, configures Kubernetes auth from the in-pod service account, and loads:

  • openbao/policies/platform-admin.hcl
  • openbao/policies/platform-readonly.hcl

It does not print or store the token. You may also set OPENBAO_TOKEN_FILE=/path/to/token-file for an operator-local, uncommitted token file.

OpenBao audit is a production gate. If bao audit list does not show file/, fix the declarative audit stanza or Helm rollout before moving production secrets into OpenBao.

The helper is idempotent. Re-running it should report existing platform/ and kubernetes/ paths as already enabled instead of failing the ceremony.
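The idempotency described above comes down to checking whether a path is already listed before enabling it. The following is a sketch of that pattern only, not the implementation behind make openbao-configure-initial; path_is_listed and ensure_kv_mount are hypothetical names, and the code assumes it runs where an authenticated bao CLI is available.

```shell
# Succeeds when the given path (for example "platform/") appears in the
# first column of `bao secrets list` or `bao auth list` output on stdin.
path_is_listed() {
  awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Enable a KV v2 mount only when it is not already present.
ensure_kv_mount() {
  local path="$1"
  if bao secrets list | path_is_listed "${path}/"; then
    echo "${path}/ already enabled"
  else
    bao secrets enable -path="$path" kv-v2
  fi
}

# Example (not executed here):
#   ensure_kv_mount platform
```

Running ensure_kv_mount a second time reports the existing mount instead of failing, which is the behavior the ceremony relies on.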

After the helper succeeds, create a non-root admin token:

kubectl exec -n openbao openbao-0 -- \
  bao token create -policy=platform-admin -period=24h -orphan

Store that token through the approved operator secret path, then revoke or tightly escrow the initial root token. The root token should not become the normal operator credential.

Auth And Workload Integration

Initial auth model:

| Actor | Method | Notes |
| --- | --- | --- |
| Setup operator/contact | Gitea tegwick / bernd.worsch@gmail.com | low-trust assembly and notifications; not platform root of trust |
| King credential | NetKingdom custody record for dedicated platform-root identity | accountable bootstrap/recovery authority; not a Git or email secret store |
| Bootstrap operator | one-time root token | only for initial audit, mounts, auth, policies, and non-root token creation |
| Platform operator | token with platform-admin | temporary until NetKingdom OIDC/admin integration is ready |
| Read-only reviewer | token with platform-readonly | metadata and health visibility, no secret reads |
| Kubernetes workload | Kubernetes auth role | namespace/service-account bound, policy per workload |
| Human identity | NetKingdom IAM Profile/OIDC | target model; OpenBao is not the identity provider |
| Automation | Kubernetes auth or short-lived operator token | no root tokens in automation |

Workload delivery choice:

  • Prefer External Secrets Operator for values that should become Kubernetes Secrets consumed by ordinary Helm charts.
  • Use CSI-mounted files for workloads that need file references, sharper mount-level boundaries, or secret refresh without rewriting application manifests.
  • Do not use the OpenBao injector in the current deployment; the Helm values leave it disabled.
  • Application repositories request paths and policies; railiance-platform owns platform mounts, policy shape, and delivery mechanisms.

Path convention:

platform/workloads/<namespace>/<service-account>/<secret-name>
platform/object-storage/<consumer>
platform/databases/<consumer>
platform/operators/<purpose>

The template policy for workload KV reads is openbao/policies/workload-kv-read-template.hcl.
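To show how the workload convention composes, the sketch below builds a path from its three segments. The helper name workload_secret_path, the example segment values, and the allowed character set (lowercase letters, digits, dot, hyphen) are assumptions for illustration; the source does not define a validation rule.

```shell
# Build a KV path of the form
#   platform/workloads/<namespace>/<service-account>/<secret-name>
# Assumption: segments are restricted to lowercase letters, digits, "." and "-".
workload_secret_path() {
  local part
  if [ "$#" -ne 3 ]; then
    echo "usage: workload_secret_path <namespace> <service-account> <secret-name>" >&2
    return 2
  fi
  for part in "$@"; do
    case "$part" in
      ""|*[!a-z0-9.-]*)
        echo "invalid path segment: '$part'" >&2
        return 1 ;;
    esac
  done
  printf 'platform/workloads/%s/%s/%s\n' "$1" "$2" "$3"
}

# Example with hypothetical names:
#   workload_secret_path billing api-sa db-password
```

Rejecting empty segments and slashes keeps a caller from producing a path that escapes the namespace/service-account boundary the policy template is scoped to.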

Backup, Restore, Audit, And Monitoring

Before any live application secrets move into OpenBao:

  1. Confirm file audit is enabled and an audit file is written under /openbao/audit/openbao-audit.log.

  2. Create an OpenBao Raft snapshot from the unsealed pod:

    kubectl exec -n openbao openbao-0 -- \
      bao operator raft snapshot save /tmp/openbao-raft.snap
    kubectl cp openbao/openbao-0:/tmp/openbao-raft.snap ./openbao-raft.snap
    
  3. Encrypt the snapshot with age/SOPS-compatible custody before it leaves the operator machine.

  4. Run an isolated restore drill before treating OpenBao as live secret custody. The drill must prove that a fresh OpenBao instance can restore the snapshot, unseal, and read a test secret. Record only non-secret evidence using docs/openbao-restore-drill-evidence.example.json as a template, then validate it with:

    make openbao-validate-restore-evidence \
      OPENBAO_RESTORE_EVIDENCE=/path/to/evidence.json
    
  5. Decide where audit logs are shipped durably. The audit PVC alone is not a durable audit sink. The interim audit-core mock file backend can prove API and setup wiring, but it writes to /tmp and is not production retention.

  6. Run:

    make openbao-verify-post-unseal
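Step 3 above can be sketched with the age CLI, using its -r (recipient) and -o (output) flags. The helper name encrypt_snapshot is hypothetical, the recipient is a placeholder for an approved age public key, and removing the plaintext copy after encryption is an assumption about local handling rather than documented policy.

```shell
# Encrypt a Raft snapshot to <snapshot>.age for one age recipient, then
# remove the plaintext copy. Refuses to run on a missing or empty file.
encrypt_snapshot() {
  local snapshot="$1" recipient="$2"
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi
  age -r "$recipient" -o "${snapshot}.age" "$snapshot" || return 1
  rm -f -- "$snapshot"
}

# Example (not executed here; AGE_RECIPIENT is a placeholder variable):
#   encrypt_snapshot ./openbao-raft.snap "$AGE_RECIPIENT"
```

The plaintext copy left in /tmp inside the pod by step 2 is not touched by this helper and needs separate cleanup.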
    

Authenticated verification, after the KeyCape-backed platform-admin path or another approved operator token is available:

make openbao-verify-authenticated

The target prompts for the token without echoing it, never puts the token on the command line, and only runs non-mutating checks. It verifies that bao audit list shows file/, bao secrets list shows platform/, bao auth list shows both kubernetes/ and keycape/, and that the file audit log is non-empty.

If a previous attended OIDC login stored a still-valid token in the pod token helper, use:

make openbao-verify-authenticated OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper

Current durable audit status: the file audit device writes to the audit PVC, which is necessary but not enough for production trust. Before application secrets move into OpenBao, choose and test a durable audit sink beyond that PVC such as an encrypted platform backup/export path or the future centralized logging stack. Do not treat non-secret hashes, screenshots, or State Hub notes as substitutes for retained audit log custody.

Interim integration status: /home/worsch/audit-core provides a mock Audit Core backend that writes JSONL records under /tmp/audit-core/audit-YYYYMMDDTHH.jsonl and deletes files older than seven days. Use it only to wire interfaces and setup validation before the durable Audit Core archive exists.

Monitoring baseline:

  • pod readiness and liveness from Kubernetes probes
  • bao status seal/init state
  • PVC capacity for data and audit storage
  • audit log write success
  • future Prometheus scraping once the cluster monitoring stack exists
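Until Prometheus scraping exists, the PVC capacity item can be covered by a simple threshold check. This sketch assumes that df is available in the container and that df -P prints the usage percentage in the fifth column; pvc_usage_ok and the 80 percent default are hypothetical choices, not values from the repo.

```shell
# Read `df -P <mountpoint>` output on stdin.
# Succeeds when the usage percentage is at or below the limit (default 80).
pvc_usage_ok() {
  local limit="${1:-80}"
  awk -v limit="$limit" '
    NR == 2 { sub(/%/, "", $5); used = $5 + 0; seen = 1 }
    END { exit !(seen && used <= limit) }
  '
}

# Example wiring (not executed here):
#   kubectl exec -n openbao openbao-0 -- df -P /openbao/audit | pvc_usage_ok 80 \
#     || echo "audit PVC above threshold"
```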

Artifact-Store Object Storage Handoff

artifact-store is the consumer-facing artifact preservation service for generated outputs, evidence packages, reports, logs, snapshots, exports, and release artifacts. It already has an S3-compatible backend with env:NAME and file:/mounted/path credential references, plus an artifactstore storage verify --backend s3 smoke path.

Railiance should avoid building a parallel object-storage client or credential vending flow in OpenBao. The ownership split is:

  • railiance-platform / OpenBao owns bootstrap secret custody, policy, audit, break-glass access, and workload secret delivery.
  • artifact-store owns artifact package manifests, the S3 backend, storage verification, and whether temporary credentials require backend refresh support or a sidecar/controller.
  • net-kingdom owns the identity issuer and role-claim model if object storage adopts STS with AssumeRoleWithWebIdentity.

Initial static-credential bridge, before STS is proven:

  1. Create a scoped object-store access key limited to the artifact-store bucket and prefix. Do not use object-store root credentials.

  2. Store the key pair in OpenBao under a platform-owned path such as platform/object-storage/artifact-store.

  3. Deliver the values to the artifact-store pod through CSI or External Secrets as mounted files.

  4. Configure artifact-store with file references:

    export ARTIFACTSTORE_S3_ACCESS_KEY_REF=file:/run/secrets/artifactstore/s3-access-key
    export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore/s3-secret-key
    
  5. Verify from artifact-store:

    artifactstore storage verify --backend s3
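The env:NAME and file:/mounted/path reference forms used in step 4 can be illustrated with a small resolver. This is a sketch of the reference contract as described in this document, not artifact-store's actual code; resolve_ref is a hypothetical name and the script requires bash for indirect variable expansion.

```shell
# Resolve a credential reference of the form env:NAME or file:/mounted/path
# and print the referenced value on stdout.
resolve_ref() {
  local ref="$1" name
  case "$ref" in
    env:*)
      name="${ref#env:}"
      if [ -z "${!name+x}" ]; then
        echo "environment variable not set: $name" >&2
        return 1
      fi
      printf '%s' "${!name}" ;;
    file:*)
      cat -- "${ref#file:}" ;;
    *)
      echo "unsupported reference: $ref" >&2
      return 1 ;;
  esac
}

# Example (not executed here):
#   resolve_ref "$ARTIFACTSTORE_S3_ACCESS_KEY_REF"
```

File references fit the CSI or External Secrets delivery in step 3 because a refreshed mounted file is read again on the next resolution, while an environment value is fixed when the process starts.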
    

STS credential vending remains linked to ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS Credential Vending. If that workstream chooses MinIO-compatible AssumeRoleWithWebIdentity, OpenBao should not become the identity provider by default. Use the NetKingdom OIDC issuer for workload/user identity, map object storage roles and policies there, and keep OpenBao responsible for bootstrap, break-glass, audit, and delivery of any controller configuration.

Current artifact-store configuration exposes access key and secret key refs, but no session-token ref. ARTIFACT-STORE-WP-0007-T004 must either add temporary-session-token support to the S3 backend or choose a sidecar/secret controller pattern that keeps refreshed credentials available through the existing env/file reference contract.

Upgrade And Rollback

  1. Read the OpenBao chart release notes.
  2. Update OPENBAO_CHART_VERSION in Makefile.
  3. Run make openbao-dry-run.
  4. Confirm current backup and audit log posture.
  5. Run make openbao-deploy.
  6. Run make openbao-status.

For rollback, list the release revisions with helm history openbao -n openbao, then run helm rollback openbao <REVISION> -n openbao on Railiance01 and re-check bao status.

Scaling To Three Nodes

When Railiance02 and Railiance03 join:

  1. Move storage from local-path to distributed storage.
  2. Set server.affinity back to anti-affinity.
  3. Set server.ha.replicas: 3.
  4. Re-enable a PodDisruptionBudget.
  5. Run an unseal, failover, backup, and restore drill before migrating secrets.