2026-05-23 13:59:58 +02:00


# OpenBao - Platform Secrets Service
- **Chart:** `openbao/openbao`
- **Chart version:** `0.28.2`
- **App version:** `v2.5.3`
- **Namespace:** `openbao`
- **Managed by:** `railiance-platform` (S3)
- **Workplan:** `RAIL-PL-WP-0002`
- **Initial target:** Railiance01 (`92.205.62.239`)

---
## Architecture
```
S5 workloads / operators
-> openbao.openbao.svc.cluster.local:8200
-> openbao-0
-> integrated Raft storage on local-path PVC
-> audit storage PVC mounted at /openbao/audit
```
- OpenBao is the canonical Railiance S3 secrets service.
- SOPS/age remains the Git-at-rest bootstrap mechanism.
- The first Railiance01 deployment is single-replica Raft, not true HA.
- Public ingress is disabled. Operators use `kubectl exec` or port-forwarding.
- TLS is disabled inside the pod listener for this internal-only bootstrap. Add
cert-manager-backed internal TLS before exposing OpenBao beyond cluster-local
traffic.
## Deployment
The official OpenBao project recommends the Helm chart for Kubernetes
deployments and advises running Helm with `--dry-run` before any install or upgrade.
From a host with kubeconfig access:
```bash
make openbao-dry-run
make openbao-deploy
make openbao-status
```
On Railiance01 directly:
```bash
cd ~/railiance-platform
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-dry-run
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-deploy
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-status
```
If the repo is not present on Railiance01 yet, copy only the non-secret values
file and run Helm directly:
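```bash
scp helm/openbao-values.yaml tegwick@92.205.62.239:/tmp/openbao-values.yaml
# Root's Helm needs the chart repository once before `openbao/openbao` resolves.
ssh tegwick@92.205.62.239 \
  'sudo helm repo add openbao https://openbao.github.io/openbao-helm && sudo helm repo update'
ssh tegwick@92.205.62.239 \
  'sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install openbao openbao/openbao \
    --version 0.28.2 \
    --namespace openbao \
    --create-namespace \
    -f /tmp/openbao-values.yaml \
    --dry-run'
```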
Repeat without `--dry-run` to deploy.
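The dry-run-then-deploy sequence can be wrapped so the deploy path always re-runs the dry-run first. This is a minimal sketch, not the implementation behind `make openbao-deploy`; the function name `openbao_helm` and the `OPENBAO_VALUES`/`HELM` overrides are hypothetical, and it assumes the `openbao` Helm repo is already configured.

```bash
# Hypothetical wrapper; the Makefile targets remain the supported entry points.
OPENBAO_CHART_VERSION="${OPENBAO_CHART_VERSION:-0.28.2}"
OPENBAO_VALUES="${OPENBAO_VALUES:-helm/openbao-values.yaml}"
# Set HELM=echo to print the commands instead of running them
# ($HELM is left unquoted on purpose so "echo" or a full path both work).
HELM="${HELM:-helm}"

openbao_helm() {
  local mode="${1:-}"
  local -a args=(upgrade --install openbao openbao/openbao
    --version "$OPENBAO_CHART_VERSION"
    --namespace openbao --create-namespace
    -f "$OPENBAO_VALUES")
  case "$mode" in
    dry-run)
      $HELM "${args[@]}" --dry-run
      ;;
    deploy)
      # Re-run the dry-run first and deploy only if it succeeds.
      $HELM "${args[@]}" --dry-run && $HELM "${args[@]}"
      ;;
    *)
      echo "usage: openbao_helm dry-run|deploy" >&2
      return 2
      ;;
  esac
}
```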
## Verification
```bash
kubectl get pods,svc,pvc -n openbao -o wide
kubectl exec -n openbao openbao-0 -- bao status
```
Expected immediately after install:
- `openbao-0` is Running.
- `openbao`, `openbao-active`, `openbao-internal`, and `openbao-ui` services
exist as cluster-internal services.
- data and audit PVCs are Bound.
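- `bao status` reports `Initialized: false` and `Sealed: true`.

That state is intentional until the bootstrap ceremony is completed. `bao status`
exits with code 2 while the server is sealed, so a non-zero exit from this
command is expected at this stage.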
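To script this check, the JSON form of the status output can be tested directly. A minimal sketch, assuming `bao status -format=json` emits top-level `"initialized"` and `"sealed"` booleans as the Vault-derived CLI does; the function name is hypothetical, and the grep matching is deliberately simple rather than a full JSON parser.

```bash
# Hypothetical helper: succeed only for an uninitialized, sealed server.
# Reads `bao status -format=json` output on stdin.
openbao_expect_pre_init() {
  local status_json
  status_json="$(cat)"
  grep -Eq '"initialized":[[:space:]]*false' <<<"$status_json" &&
    grep -Eq '"sealed":[[:space:]]*true' <<<"$status_json"
}
```

Pipe `kubectl exec -n openbao openbao-0 -- bao status -format=json` into it; without `pipefail`, the helper's result decides the pipeline status.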
## Bootstrap Ceremony
Do not initialize OpenBao in a casual shell session. Initialization emits the
unseal keys and initial root token. Treat this as a break-glass event.
Pre-flight checks:
```bash
make openbao-status
make openbao-verify
```
Proceed only when:
- `openbao-0` is Running.
- data and audit PVCs are Bound.
- `bao status` reports `Initialized: false` and `Sealed: true`.
- Railiance01 host/cluster backup posture is understood for this maintenance
window.
- three human escrow recipients are named before the command is run.
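The PVC condition can be checked mechanically before the ceremony. A minimal sketch; `openbao_all_bound` is a hypothetical helper that reads `name phase` pairs on stdin, and any PVC names used to exercise it are illustrative.

```bash
# Hypothetical pre-flight helper: reads "name phase" lines on stdin and
# succeeds only if at least one PVC is listed and every PVC is Bound.
openbao_all_bound() {
  local name phase count=0 bad=0
  while read -r name phase; do
    [ -z "$name" ] && continue
    count=$((count + 1))
    if [ "$phase" != "Bound" ]; then
      echo "PVC $name is '${phase:-unknown}', expected Bound" >&2
      bad=1
    fi
  done
  # An empty list also fails: the chart should have created data and audit PVCs.
  [ "$count" -gt 0 ] && [ "$bad" -eq 0 ]
}
```

Feed it from `kubectl get pvc -n openbao -o jsonpath='{range .items[*]}{.metadata.name} {.status.phase}{"\n"}{end}'`.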
Recommended ceremony:
1. Confirm the Railiance01 backup posture first.
2. Prepare three human escrow recipients for unseal shares.
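3. Run initialization once:
   ```bash
   kubectl exec -n openbao openbao-0 -- \
     bao operator init -key-shares=3 -key-threshold=2
   ```
4. Give each unseal share to its escrow owner through an out-of-band channel.
5. Unseal with two shares. Run the command once per share; it prompts for the
   share, so it needs an interactive TTY (`-it`), and the share stays off the
   command line and out of shell history:
   ```bash
   kubectl exec -it -n openbao openbao-0 -- bao operator unseal
   ```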
6. Log in with the initial root token only long enough to create durable admin
auth, enable audit, and prepare policies.
7. Revoke or tightly escrow the initial root token.
Do not paste unseal keys, root tokens, screenshots, or command output into Git,
State Hub, chat, shell history, or issue trackers. Each unseal share goes to one
escrow owner through an out-of-band channel. The initial root token is either
revoked after a non-root platform-admin token exists or stored as offline
break-glass material with the same handling as unseal shares.
## Initial Configuration After Unseal
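These commands need the server unsealed and an authenticated token in the exec
session, for example from an interactive `bao login` run through `kubectl exec -it`.
Enable file audit: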
```bash
kubectl exec -n openbao openbao-0 -- \
bao audit enable file file_path=/openbao/audit/openbao-audit.log
```
Enable the first KV v2 mount:
```bash
kubectl exec -n openbao openbao-0 -- \
bao secrets enable -path=platform kv-v2
```
Kubernetes auth, database dynamic credentials, PKI, CSI, and External Secrets
integration are follow-up tasks in `RAIL-PL-WP-0002`. Do not migrate live
application secrets until those policies and restore drills are documented.
The repo now includes a non-secret helper for the first post-unseal
configuration:
```bash
make openbao-configure-initial
```
The target prompts for a token, enables file audit, enables the `platform/` KV
v2 mount, enables Kubernetes auth, configures Kubernetes auth from the in-pod
service account, and loads:
- `openbao/policies/platform-admin.hcl`
- `openbao/policies/platform-readonly.hcl`
It does not print or store the token. You may also set
`OPENBAO_TOKEN_FILE=/path/to/token-file` for an operator-local, uncommitted
token file.
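For review, the helper's documented steps map roughly onto the CLI calls below. This is a sketch, not the contents of the Makefile target: `run_bao`, `write_policy`, and `DRY_RUN` are hypothetical names, `kubernetes_host=https://kubernetes.default.svc` is an assumed in-cluster API address, and the in-pod `bao` is assumed to already hold an authenticated token. Policy files are streamed over stdin because they live in the repo, not in the pod.

```bash
# Hypothetical sketch; `make openbao-configure-initial` remains authoritative.
# Set DRY_RUN=1 to print the commands instead of executing them.
NS=openbao
POD=openbao-0

run_bao() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "bao $*"
  else
    kubectl exec -n "$NS" "$POD" -- bao "$@"
  fi
}

write_policy() {
  # Stream the repo's policy file to `bao policy write <name> -` in the pod.
  local name="$1" file="openbao/policies/$1.hcl"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "bao policy write $name - < $file"
  else
    kubectl exec -i -n "$NS" "$POD" -- bao policy write "$name" - < "$file"
  fi
}

openbao_configure_initial() {
  run_bao audit enable file file_path=/openbao/audit/openbao-audit.log || return
  run_bao secrets enable -path=platform kv-v2 || return
  run_bao auth enable kubernetes || return
  # Leaving token_reviewer_jwt and kubernetes_ca_cert unset (with the default
  # disable_local_ca_jwt=false) makes the server use its own pod service account.
  run_bao write auth/kubernetes/config kubernetes_host=https://kubernetes.default.svc || return
  write_policy platform-admin || return
  write_policy platform-readonly
}
```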
After the helper succeeds, create a non-root admin token:
```bash
kubectl exec -n openbao openbao-0 -- \
bao token create -policy=platform-admin -period=24h -orphan
```
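Store that token through the approved operator secret path, then revoke or
tightly escrow the initial root token. The root token should not become the
normal operator credential. Because the token is periodic, it expires unless it
is renewed (for example with `bao token renew`) within each 24h period.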
## Auth And Workload Integration
Initial auth model:
| Actor | Method | Notes |
|-------|--------|-------|
| Bootstrap operator | one-time root token | only for initial audit, mounts, auth, policies, and non-root token creation |
| Platform operator | token with `platform-admin` | temporary until NetKingdom OIDC/admin integration is ready |
| Read-only reviewer | token with `platform-readonly` | metadata and health visibility, no secret reads |
| Kubernetes workload | Kubernetes auth role | namespace/service-account bound, policy per workload |
| Human identity | NetKingdom IAM Profile/OIDC | target model; OpenBao is not the identity provider |
| Automation | Kubernetes auth or short-lived operator token | no root tokens in automation |
Workload delivery choice:
- Prefer External Secrets Operator for values that should become Kubernetes
Secrets consumed by ordinary Helm charts.
- Use CSI-mounted files for workloads that need file references, sharper
mount-level boundaries, or secret refresh without rewriting application
manifests.
- Do not use the OpenBao injector in the current deployment; the Helm values
leave it disabled.
- Application repositories request paths and policies; `railiance-platform`
owns platform mounts, policy shape, and delivery mechanisms.
Path convention:
```text
platform/workloads/<namespace>/<service-account>/<secret-name>
platform/object-storage/<consumer>
platform/databases/<consumer>
platform/operators/<purpose>
```
The template policy for workload KV reads is
`openbao/policies/workload-kv-read-template.hcl`.
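Applied to one workload, the path convention and the KV v2 mount combine as below. A minimal sketch with hypothetical helper names, role naming (`<namespace>-<service-account>`), policy naming (`workload-<namespace>-<service-account>`), and an illustrative `token_ttl=1h`; `openbao/policies/workload-kv-read-template.hcl` remains the real template. The detail worth keeping is that KV v2 policies address secret data under `platform/data/...`, even though `bao kv get` accepts the shorter `platform/...` path.

```bash
# Hypothetical helpers for one workload's path, read policy, and role arguments.
workload_kv_path() {
  local ns="$1" sa="$2" name="$3" seg
  for seg in "$ns" "$sa" "$name"; do
    # Reject empty segments and anything that could escape the prefix.
    case "$seg" in
      "" | */* | . | ..)
        echo "invalid path segment: '$seg'" >&2
        return 1
        ;;
    esac
  done
  echo "platform/workloads/$ns/$sa/$name"
}

workload_read_policy() {
  # KV v2 policies target the data/ API path under the platform/ mount.
  cat <<EOF
path "platform/data/workloads/$1/$2/*" {
  capabilities = ["read"]
}
EOF
}

workload_k8s_role_args() {
  # Arguments for: bao write auth/kubernetes/role/...
  echo "auth/kubernetes/role/$1-$2" \
    "bound_service_account_names=$2" \
    "bound_service_account_namespaces=$1" \
    "token_policies=workload-$1-$2" \
    "token_ttl=1h"
}
```

The policy text can be streamed to `bao policy write <name> -`, and the role arguments passed, word-split, to `bao write`.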
## Backup, Restore, Audit, And Monitoring
Before any live application secrets move into OpenBao:
1. Enable file audit and confirm an audit file is written under
`/openbao/audit/openbao-audit.log`.
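2. Create an OpenBao Raft snapshot from the unsealed pod, copy it out, and
   remove the in-pod copy:
   ```bash
   kubectl exec -n openbao openbao-0 -- \
     bao operator raft snapshot save /tmp/openbao-raft.snap
   # kubectl cp relies on `tar` being present in the container image.
   kubectl cp openbao/openbao-0:/tmp/openbao-raft.snap ./openbao-raft.snap
   kubectl exec -n openbao openbao-0 -- rm -f /tmp/openbao-raft.snap
   ```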
3. Encrypt the snapshot with age/SOPS-compatible custody before it leaves the
operator machine.
4. Run an isolated restore drill before treating OpenBao as live secret
custody. The drill must prove that a fresh OpenBao instance can restore the
snapshot, unseal, and read a test secret.
5. Decide where audit logs are shipped durably. The audit PVC alone is not a
durable audit sink.
6. Run:
```bash
make openbao-verify-post-unseal
```
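Steps 2 and 3 can share a consistent naming and custody routine. A minimal sketch, assuming `age` and GNU `sha256sum` are installed on the operator machine and `AGE_RECIPIENT` holds an age public key; the timestamped file name and both function names are hypothetical.

```bash
# Hypothetical helpers for Raft snapshot naming and at-rest encryption.
snapshot_name() {
  # $1: optional UTC timestamp (YYYYmmddTHHMMSSZ); defaults to now.
  echo "openbao-raft-${1:-$(date -u +%Y%m%dT%H%M%SZ)}.snap"
}

encrypt_snapshot() {
  local snap="$1"
  if [ -z "${AGE_RECIPIENT:-}" ]; then
    echo "set AGE_RECIPIENT to an age public key" >&2
    return 1
  fi
  # Record a checksum of the plaintext so a restore drill can verify the
  # snapshot after decryption.
  sha256sum "$snap" > "$snap.sha256" || return
  # Delete the plaintext only if encryption succeeded.
  age -r "$AGE_RECIPIENT" -o "$snap.age" "$snap" && rm -f "$snap"
}
```

For example, use `snap=$(snapshot_name)` as the `kubectl cp` destination, then run `encrypt_snapshot "$snap"`.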
Monitoring baseline:
- pod readiness and liveness from Kubernetes probes
- `bao status` seal/init state
- PVC capacity for data and audit storage
- audit log write success
- future Prometheus scraping once the cluster monitoring stack exists
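Until Prometheus scraping exists, PVC capacity can be checked from inside the pod. A minimal sketch, assuming the image's `df` supports POSIX output (`-P`) and that the data volume is mounted at `/openbao/data` (an assumption about the chart defaults; `/openbao/audit` comes from this document). The function name and the 80% default are hypothetical.

```bash
# Hypothetical capacity check: reads `df -P` output on stdin, prints a WARN
# line per filesystem at or above the limit, and returns 1 if any matched.
check_volume_usage() {
  local limit="${1:-80}"
  awk -v limit="$limit" '
    NR == 1 { next }                 # skip the df header line
    {
      pct = $5
      sub(/%$/, "", pct)             # Capacity column, e.g. "42%" becomes "42"
      if (pct + 0 >= limit + 0) {
        printf "WARN %s at %s%% (limit %s%%)\n", $6, pct, limit
        over = 1
      }
    }
    END { exit (over ? 1 : 0) }
  '
}
```

For example: `kubectl exec -n openbao openbao-0 -- df -P /openbao/data /openbao/audit | check_volume_usage 80`.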
## Artifact-Store Object Storage Handoff
`artifact-store` is the consumer-facing artifact preservation service for
generated outputs, evidence packages, reports, logs, snapshots, exports, and
release artifacts. It already has an S3-compatible backend with `env:NAME` and
`file:/mounted/path` credential references, plus an
`artifactstore storage verify --backend s3` smoke path.
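This reference contract is what lets OpenBao-delivered files slot in without code changes in artifact-store. The sketch below only illustrates how `env:NAME` and `file:/path` references can be resolved; it is not artifact-store's implementation, and `resolve_ref` is a hypothetical name.

```bash
# Illustration only: resolve an env:NAME or file:/absolute/path reference.
resolve_ref() {
  local ref="$1" name path
  case "$ref" in
    env:*)
      name="${ref#env:}"
      # Validate the name before indirect expansion, then require it to be set.
      if ! [[ "$name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || [ -z "${!name+set}" ]; then
        echo "unset or invalid env reference: $name" >&2
        return 1
      fi
      printf '%s' "${!name}"
      ;;
    file:/*)
      path="${ref#file:}"
      if [ ! -r "$path" ]; then
        echo "unreadable file reference: $path" >&2
        return 1
      fi
      # Command substitution drops trailing newlines, which mounted
      # secret files often carry.
      printf '%s' "$(cat "$path")"
      ;;
    *)
      echo "unsupported reference: $ref" >&2
      return 1
      ;;
  esac
}
```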
Railiance should avoid building a parallel object-storage client or credential
vending flow in OpenBao. The ownership split is:
- `railiance-platform` / OpenBao owns bootstrap secret custody, policy, audit,
break-glass access, and workload secret delivery.
- `artifact-store` owns artifact package manifests, the S3 backend, storage
verification, and whether temporary credentials require backend refresh
support or a sidecar/controller.
- `net-kingdom` owns the identity issuer and role-claim model if object storage
adopts STS with `AssumeRoleWithWebIdentity`.
Initial static-credential bridge, before STS is proven:
1. Create a scoped object-store access key limited to the artifact-store bucket
and prefix. Do not use object-store root credentials.
2. Store the key pair in OpenBao under a platform-owned path such as
`platform/object-storage/artifact-store`.
3. Deliver the values to the artifact-store pod through CSI or External Secrets
as mounted files.
4. Configure artifact-store with file references:
```bash
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=file:/run/secrets/artifactstore/s3-access-key
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore/s3-secret-key
```
5. Verify from artifact-store:
```bash
artifactstore storage verify --backend s3
```
STS credential vending remains linked to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending`. If that workstream chooses MinIO-compatible
`AssumeRoleWithWebIdentity`, OpenBao should not become the identity provider by
default. Use the NetKingdom OIDC issuer for workload/user identity, map object
storage roles and policies there, and keep OpenBao responsible for bootstrap,
break-glass, audit, and delivery of any controller configuration.
Current artifact-store configuration exposes access key and secret key refs,
but no session-token ref. `ARTIFACT-STORE-WP-0007-T004` must either add
temporary-session-token support to the S3 backend or choose a sidecar/secret
controller pattern that keeps refreshed credentials available through the
existing env/file reference contract.
## Upgrade And Rollback
1. Read the OpenBao chart release notes.
2. Update `OPENBAO_CHART_VERSION` in `Makefile`.
3. Run `make openbao-dry-run`.
4. Confirm current backup and audit log posture.
5. Run `make openbao-deploy`.
6. Run `make openbao-status`.
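For rollback, list revisions with `helm history openbao -n openbao`, run
`helm rollback openbao <REVISION> -n openbao` on Railiance01, and re-check
`bao status`. A Helm rollback reverts Kubernetes resources only; it does not
restore Raft data, which can only come back from a snapshot. With Shamir unseal
shares, any restart of `openbao-0` during an upgrade or rollback leaves it
sealed, so two escrow owners must be available to unseal it again.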
## Scaling To Three Nodes
When Railiance02 and Railiance03 join:
1. Move storage from `local-path` to distributed storage.
2. Set `server.affinity` back to anti-affinity.
3. Set `server.ha.replicas: 3`.
4. Re-enable a PodDisruptionBudget.
5. Run an unseal, failover, backup, and restore drill before migrating secrets.
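Step 3 also implies telling each Raft peer how to find the others. A minimal sketch of rendering `retry_join` stanzas for the chart's Raft configuration, assuming pods named `openbao-0` through `openbao-<N-1>` that resolve through the `openbao-internal` service and a listener still on plain HTTP (switch to `https` once internal TLS exists). The function name is hypothetical, and where the stanzas belong in the values file depends on the chart version.

```bash
# Hypothetical renderer: one retry_join block per StatefulSet replica.
raft_retry_join() {
  local replicas="${1:-3}" i
  for ((i = 0; i < replicas; i++)); do
    cat <<EOF
retry_join {
  leader_api_addr = "http://openbao-$i.openbao-internal:8200"
}
EOF
  done
}
```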