# OpenBao - Platform Secrets Service

- **Chart:** `openbao/openbao`
- **Chart version:** `0.28.2`
- **App version:** `v2.5.3`
- **Namespace:** `openbao`
- **Managed by:** `railiance-platform` (S3)
- **Workplan:** `RAIL-PL-WP-0002`
- **Initial target:** Railiance01 (`92.205.62.239`)

---
## Architecture
```
S5 workloads / operators
-> openbao.openbao.svc.cluster.local:8200
-> openbao-0
-> integrated Raft storage on local-path PVC
-> audit storage PVC mounted at /openbao/audit
```
- OpenBao is the canonical Railiance S3 secrets service.
- SOPS/age remains the Git-at-rest bootstrap mechanism.
- The first Railiance01 deployment is single-replica Raft, not true HA.
- Public ingress is disabled. Operators use `kubectl exec` or port-forwarding.
- TLS is disabled inside the pod listener for this internal-only bootstrap. Add
cert-manager-backed internal TLS before exposing OpenBao beyond cluster-local
traffic.
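
Because public ingress is disabled, an operator machine reaches the listener through `kubectl port-forward`. A minimal sketch, assuming a local `bao` CLI that honours `BAO_ADDR` and a free local port 8200. The helper names `openbao_local_addr` and `openbao_port_forward` are hypothetical and not part of the repo:

```bash
# Hypothetical operator helpers for the cluster-internal OpenBao listener.
# TLS is disabled on the pod listener, so the forwarded address is plain HTTP
# on loopback.

# Print the local address for a forwarded port (default 8200).
openbao_local_addr() {
  printf 'http://127.0.0.1:%s\n' "${1:-8200}"
}

# Start a background port-forward to the `openbao` service and point the
# `bao` CLI at it. Stop it later with: kill "$OPENBAO_PF_PID"
openbao_port_forward() {
  local port="${1:-8200}"
  kubectl port-forward -n openbao svc/openbao "${port}:8200" >/dev/null 2>&1 &
  OPENBAO_PF_PID=$!
  export BAO_ADDR
  BAO_ADDR="$(openbao_local_addr "$port")"
}
```

Typical use: `openbao_port_forward`, then `bao status`, then `kill "$OPENBAO_PF_PID"` to close the tunnel. The forward may need a moment to become ready before the first request succeeds.
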
## Deployment
The official OpenBao project recommends the Helm chart for Kubernetes
deployments and advises running Helm with `--dry-run` before every install or
upgrade.
From a host with kubeconfig access:
```bash
make openbao-dry-run
make openbao-deploy
make openbao-status
```
On Railiance01 directly:
```bash
cd ~/railiance-platform
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-dry-run
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-deploy
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-status
```
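If the repo is not present on Railiance01 yet, copy only the non-secret values
file and run Helm directly. This assumes the `openbao` chart repository
(`helm repo add openbao https://openbao.github.io/openbao-helm`) is already
configured for the user that runs Helm, which is root when Helm is invoked
through `sudo`: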
```bash
scp helm/openbao-values.yaml tegwick@92.205.62.239:/tmp/openbao-values.yaml
ssh tegwick@92.205.62.239 \
  'sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install openbao openbao/openbao \
    --version 0.28.2 \
    --namespace openbao \
    --create-namespace \
    -f /tmp/openbao-values.yaml \
    --dry-run'
```
Repeat without `--dry-run` to deploy.
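
The dry-run-first advice can also be enforced when running Helm by hand. This is a sketch, not a repo script: `openbao_helm_upgrade` is a hypothetical name, and reading `OPENBAO_CHART_VERSION` from the environment (falling back to `0.28.2`) is an assumption that mirrors the Makefile variable.

```bash
# Hypothetical wrapper: dry-run first, require an explicit "y", then apply
# exactly the same arguments.
openbao_helm_upgrade() {
  local values="$1" answer
  if [ -z "$values" ] || [ ! -r "$values" ]; then
    echo "usage: openbao_helm_upgrade <readable values file>" >&2
    return 1
  fi
  local -a args=(upgrade --install openbao openbao/openbao
    --version "${OPENBAO_CHART_VERSION:-0.28.2}"
    --namespace openbao --create-namespace -f "$values")

  # 1. Render and validate without changing the cluster; output stays visible for review.
  helm "${args[@]}" --dry-run || { echo "dry-run failed; not deploying" >&2; return 1; }

  # 2. Apply only after an explicit confirmation.
  read -r -p "Apply this OpenBao release? [y/N] " answer
  if [ "$answer" != "y" ]; then
    echo "not applied" >&2
    return 1
  fi

  # 3. Same arguments, without --dry-run.
  helm "${args[@]}"
}
```

For example, `openbao_helm_upgrade /tmp/openbao-values.yaml` on Railiance01 after exporting `KUBECONFIG`. Reusing one argument array for both runs keeps the reviewed dry-run and the applied release identical.
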
## Verification
```bash
kubectl get pods,svc,pvc -n openbao -o wide
kubectl exec -n openbao openbao-0 -- bao status
```
Expected immediately after install:
- `openbao-0` is Running.
- `openbao`, `openbao-active`, `openbao-internal`, and `openbao-ui` services
exist as cluster-internal services.
- data and audit PVCs are Bound.
- `bao status` reports `Initialized: false` and `Sealed: true`.

That state is intentional until the bootstrap ceremony is completed.
`bao status` may return exit code `2` while sealed; this is expected for the
pre-init state and does not by itself indicate a deployment failure.
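
Scripts that run this check should treat exit code `2` as a state rather than a failure. A minimal sketch, assuming the `bao status` exit-code convention (`0` unsealed, `1` error, `2` sealed) and that `kubectl exec` passes the in-pod exit code through. Both function names are hypothetical:

```bash
# Map `bao status` exit codes to a seal state:
# 0 = unsealed, 2 = sealed, anything else (normally 1) = error.
openbao_seal_state() {
  case "$1" in
    0) echo unsealed ;;
    2) echo sealed ;;
    *) echo error ;;
  esac
}

# Capture the exit code without tripping `set -e` in the calling script.
openbao_check_status() {
  local rc=0
  kubectl exec -n openbao openbao-0 -- bao status >/dev/null 2>&1 || rc=$?
  openbao_seal_state "$rc"
}
```

Immediately after install, `openbao_check_status` should print `sealed`. The `error` result also covers kubectl itself failing to reach the pod.
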
## Bootstrap Ceremony
Do not initialize OpenBao in a casual shell session. Initialization emits the
unseal keys and initial root token. Treat this as a break-glass event.
### Setup Operator And King Credential
The initial accountable setup operator/contact is `tegwick`
(`bernd.worsch@gmail.com`), with Gitea identity `tegwick`. This identity can
assemble early infrastructure, receive notifications, and operate day-to-day
Git/Gitea workflows, but it is not the desired long-term platform root of
trust.
The actual platform-root target is a separate king credential created through
the NetKingdom bootstrap path before OpenBao becomes live secret custody. Email
may receive notifications, but Gitea, Git, State Hub, chat, tickets, shell
history, and email must not store or transfer OpenBao unseal keys, root tokens,
private keys, OTP seeds, recovery codes, or screenshots of secret output.
The canonical custody policy is in
`net-kingdom/docs/platform-root-custody.md`. The preferred production posture
is independent two-of-three custody. Temporary single-operator king custody is
feasible for pre-production bootstrap only when second-factor protection,
offline recovery storage, and a low-friction upgrade path to additional
custodians are in place.
Pre-flight checks:
```bash
make openbao-status
make openbao-verify
```
Proceed only when:
- `openbao-0` is Running.
- data and audit PVCs are Bound.
- `bao status` reports `Initialized: false` and `Sealed: true`.
- Railiance01 host/cluster backup posture is understood for this maintenance
window.
- the guided NetKingdom bootstrap path exists for creating or importing the
king credential.
- the OpenBao custody mode is recorded: preferred independent custody, or an
explicit temporary single-custodian king bootstrap exception.
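
The PVC and status conditions above can be checked mechanically. A sketch with hypothetical helper names; each function reads command output on stdin, and the JSON matching assumes the `initialized` and `sealed` keys of `bao status -format=json`:

```bash
# Succeeds only for the pre-init state (initialized=false, sealed=true)
# in `bao status -format=json` output read from stdin.
openbao_preinit_state() {
  local json
  json="$(cat)"
  grep -Eq '"initialized"[[:space:]]*:[[:space:]]*false' <<<"$json" &&
    grep -Eq '"sealed"[[:space:]]*:[[:space:]]*true' <<<"$json"
}

# Succeeds when stdin has at least one non-empty line and every such line
# reads "Bound" (one PVC phase per line).
openbao_all_bound() {
  awk 'NF { n++; if ($1 != "Bound") bad = 1 } END { exit (n == 0 || bad) }'
}
```

For example, `kubectl get pvc -n openbao -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' | openbao_all_bound` and `kubectl exec -n openbao openbao-0 -- bao status -format=json | openbao_preinit_state`. Leave `pipefail` off for the second pipeline: `bao status` exits `2` while sealed, which would otherwise fail the pipeline even in the expected state. The pod phase comes from `kubectl get pod -n openbao openbao-0 -o jsonpath='{.status.phase}'` and should read `Running`.
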
Recommended ceremony:
1. Confirm the Railiance01 backup posture first.
2. Prepare the king credential and approved escrow holders or offline
single-custody locations.
3. Run initialization once:
   ```bash
   kubectl exec -n openbao openbao-0 -- \
     bao operator init -key-shares=3 -key-threshold=2
   ```
4. Give each unseal share to its escrow owner or approved king-custody location
   through an out-of-band channel.
5. Unseal with two shares. Run the command twice, once per share; `-it` attaches
   a terminal so `bao` can prompt for each share without echoing it:
   ```bash
   kubectl exec -it -n openbao openbao-0 -- bao operator unseal
   ```
6. Log in with the initial root token only long enough to create durable admin
auth, enable audit, and prepare policies.
7. Revoke or tightly escrow the initial root token.
Do not paste unseal keys, root tokens, screenshots, or command output into Git,
State Hub, chat, shell history, or issue trackers. Each unseal share goes to one
escrow owner through an out-of-band channel. The initial root token is either
revoked after a non-root platform-admin token exists or stored as offline
break-glass material with the same handling as unseal shares.
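
Shell history stores commands, not their output, so the practical risk during the ceremony is a key or token typed as a command argument, for example `bao operator unseal <key>` instead of the hidden prompt. A bash-only sketch for the ceremony terminal; `ceremony_shell_begin` is a hypothetical name:

```bash
# Hypothetical helper for the ceremony terminal (bash only).
ceremony_shell_begin() {
  set +o history   # stop adding commands to the in-memory history list
  unset HISTFILE   # nothing is written to a history file when the shell exits
  umask 077        # files created from now on are readable by the owner only
}
```

Call it once at the start of a dedicated terminal that is closed after the ceremony; the settings last for the lifetime of that shell.
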
## Initial Configuration After Unseal
File audit is configured declaratively in `helm/openbao-values.yaml` with a
server config `audit "file" "file"` stanza that writes to
`/openbao/audit/openbao-audit.log` on the audit PVC.
Enable the first KV v2 mount from an authenticated `bao` session; without a
token the command is rejected:
```bash
kubectl exec -n openbao openbao-0 -- \
bao secrets enable -path=platform kv-v2
```
Kubernetes auth, database dynamic credentials, PKI, CSI, and External Secrets
integration are follow-up tasks in `RAIL-PL-WP-0002`. Do not migrate live
application secrets until those policies and restore drills are documented.
The repo now includes a non-secret helper for the first post-unseal
configuration:
```bash
make openbao-configure-initial
```
The target prompts for a token, verifies the declarative file audit device is
visible, enables the `platform/` KV v2 mount, enables Kubernetes auth,
configures Kubernetes auth from the in-pod service account, and loads:
- `openbao/policies/platform-admin.hcl`
- `openbao/policies/platform-readonly.hcl`
It does not print or store the token. You may also set
`OPENBAO_TOKEN_FILE=/path/to/token-file` for an operator-local, uncommitted
token file.
OpenBao audit is a production gate. If `bao audit list` does not show `file/`,
fix the declarative audit stanza or Helm rollout before moving production
secrets into OpenBao.
The helper is idempotent. Re-running it should report existing `platform/` and
`kubernetes/` paths as already enabled instead of failing the ceremony.
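
The check-before-enable behaviour can be sketched as follows. This illustrates the idea and is not the code behind `make openbao-configure-initial`; the function names are hypothetical, and the parser assumes the default table output of `bao secrets list` and `bao auth list` (two header lines, path in the first column):

```bash
# Succeeds if the path (e.g. "platform/") appears in the first column of a
# `bao secrets list` or `bao auth list` table read from stdin.
openbao_path_listed() {
  awk -v p="$1" 'NR > 2 && $1 == p { found = 1 } END { exit !found }'
}

# Enable a secrets engine only when its path is not mounted yet.
openbao_ensure_secrets() {
  local path="${1%/}" type="$2"
  if bao secrets list | openbao_path_listed "${path}/"; then
    echo "${path}/ already enabled"
  else
    bao secrets enable -path="$path" "$type"
  fi
}

# Enable an auth method only when its path is not enabled yet.
openbao_ensure_auth() {
  local path="${1%/}" type="$2"
  if bao auth list | openbao_path_listed "${path}/"; then
    echo "${path}/ already enabled"
  else
    bao auth enable -path="$path" "$type"
  fi
}
```

From an authenticated session: `openbao_ensure_secrets platform kv-v2` and `openbao_ensure_auth kubernetes kubernetes`.
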
After the helper succeeds, create a non-root admin token:
```bash
kubectl exec -n openbao openbao-0 -- \
bao token create -policy=platform-admin -period=24h -orphan
```
Store that token through the approved operator secret path, then revoke or
tightly escrow the initial root token. The root token should not become the
normal operator credential.
## Auth And Workload Integration
Initial auth model:
| Actor | Method | Notes |
|-------|--------|-------|
| Setup operator/contact | Gitea `tegwick` / `bernd.worsch@gmail.com` | low-trust assembly and notifications; not platform root of trust |
| King credential | NetKingdom custody record for dedicated platform-root identity | accountable bootstrap/recovery authority; not a Git or email secret store |
| Bootstrap operator | one-time root token | only for initial audit, mounts, auth, policies, and non-root token creation |
| Platform operator | token with `platform-admin` | temporary until NetKingdom OIDC/admin integration is ready |
| Read-only reviewer | token with `platform-readonly` | metadata and health visibility, no secret reads |
| Kubernetes workload | Kubernetes auth role | namespace/service-account bound, policy per workload |
| Human identity | NetKingdom IAM Profile/OIDC | target model; OpenBao is not the identity provider |
| Automation | Kubernetes auth or short-lived operator token | no root tokens in automation |
Workload delivery choice:
- Prefer External Secrets Operator for values that should become Kubernetes
Secrets consumed by ordinary Helm charts.
- Use CSI-mounted files for workloads that need file references, sharper
mount-level boundaries, or secret refresh without rewriting application
manifests.
- Do not use the OpenBao injector in the current deployment; the Helm values
leave it disabled.
- Application repositories request paths and policies; `railiance-platform`
owns platform mounts, policy shape, and delivery mechanisms.
Path convention:
```text
platform/workloads/<namespace>/<service-account>/<secret-name>
platform/object-storage/<consumer>
platform/databases/<consumer>
platform/operators/<purpose>
```
The template policy for workload KV reads is
`openbao/policies/workload-kv-read-template.hcl`.
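
For KV v2 mounts, policies must name the API paths rather than the logical ones: data lives under `platform/data/...` and metadata under `platform/metadata/...`. A hypothetical renderer for a per-workload read policy following the convention above; it is not the repository's template and may differ from `workload-kv-read-template.hcl`:

```bash
# Hypothetical renderer for a per-workload KV v2 read policy.
render_workload_kv_read_policy() {
  local ns="$1" sa="$2"
  # Kubernetes names use lowercase alphanumerics, '-' and '.'; rejecting
  # anything else keeps '/' or '*' from widening the policy path.
  local re='^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$'
  if ! [[ "$ns" =~ $re && "$sa" =~ $re ]]; then
    echo "invalid namespace or service account: '$ns' '$sa'" >&2
    return 1
  fi
  cat <<EOF
path "platform/data/workloads/${ns}/${sa}/*" {
  capabilities = ["read"]
}

path "platform/metadata/workloads/${ns}/${sa}/*" {
  capabilities = ["read", "list"]
}
EOF
}
```

The rendered policy can be loaded and bound to a Kubernetes auth role, for example `render_workload_kv_read_policy apps web | bao policy write wl-apps-web -` followed by `bao write auth/kubernetes/role/apps-web bound_service_account_names=web bound_service_account_namespaces=apps token_policies=wl-apps-web token_ttl=1h`. The policy name, role name, and TTL are illustrative. Drop the metadata stanza if the consumer never lists or inspects keys.
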
## Backup, Restore, Audit, And Monitoring
Before any live application secrets move into OpenBao:
1. Confirm file audit is enabled and entries are being written to
   `/openbao/audit/openbao-audit.log`.
2. Create an OpenBao Raft snapshot from the unsealed pod with an authenticated
   token, copy it to the operator machine, and remove the in-pod copy:
   ```bash
   kubectl exec -n openbao openbao-0 -- \
     bao operator raft snapshot save /tmp/openbao-raft.snap
   kubectl cp openbao/openbao-0:/tmp/openbao-raft.snap ./openbao-raft.snap
   kubectl exec -n openbao openbao-0 -- rm -f /tmp/openbao-raft.snap
   ```
3. Encrypt the snapshot with age/SOPS-compatible custody before it leaves the
   operator machine.
4. Run an isolated restore drill before treating OpenBao as live secret
   custody. The drill must prove that a fresh OpenBao instance can restore the
   snapshot, unseal with the original cluster's shares, and read a test secret.
   Restoring onto a different cluster needs
   `bao operator raft snapshot restore -force`, because the snapshot carries
   the source cluster's keyring.

   Record only non-secret evidence using
   `docs/openbao-restore-drill-evidence.example.json` as a template, replace
   every placeholder with real drill evidence, then validate it with:
   ```bash
   make openbao-validate-restore-evidence \
     OPENBAO_RESTORE_EVIDENCE=/path/to/evidence.json
   ```
5. Decide where audit logs are shipped durably. The audit PVC alone is not a
durable audit sink. The interim `audit-core` mock file backend can prove API
and setup wiring, but it writes to `/tmp` and is not production retention.
6. Run:
```bash
make openbao-verify-post-unseal
```
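
Step 3 can be sketched as a small helper that encrypts to age recipients, records a checksum of the ciphertext for the restore drill, and removes the plaintext. It assumes the `age` CLI and GNU coreutils `sha256sum` on the operator machine; `encrypt_openbao_snapshot` and the recipients file are hypothetical:

```bash
# Hypothetical helper: encrypt a Raft snapshot to age recipients, write a
# checksum of the ciphertext, then delete the plaintext snapshot.
encrypt_openbao_snapshot() {
  local snap="$1" recipients="$2"   # recipients: one age public key per line
  if [ ! -s "$snap" ] || [ ! -s "$recipients" ]; then
    echo "usage: encrypt_openbao_snapshot <snapshot> <recipients-file>" >&2
    return 1
  fi
  age -R "$recipients" -o "${snap}.age" "$snap" || return 1
  sha256sum "${snap}.age" > "${snap}.age.sha256" || return 1
  # rm unlinks the file; it does not securely wipe the underlying disk blocks.
  rm -f -- "$snap"
}
```

For example, `encrypt_openbao_snapshot ./openbao-raft.snap ./custodian-recipients.txt`. For the restore drill, `age -d -i <identity-file> -o openbao-raft.snap openbao-raft.snap.age` recovers the snapshot.
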
Once the KeyCape-backed `platform-admin` path or another approved operator
token is available, run the authenticated verification:
```bash
make openbao-verify-authenticated
```
The target prompts for the token without echoing it, never puts the token on
the command line, and only runs non-mutating checks. It verifies that
`bao audit list` shows `file/`, `bao secrets list` shows `platform/`,
`bao auth list` shows both `kubernetes/` and `keycape/`, and that the file
audit log is non-empty.
If a previous attended OIDC login stored a still-valid token in the pod token
helper, use:
```bash
make openbao-verify-authenticated OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper
```
Current durable audit status: the file audit device writes to the audit PVC,
which is necessary but not enough for production trust. Before application
secrets move into OpenBao, choose and test a durable audit sink beyond that PVC
such as an encrypted platform backup/export path or the future centralized
logging stack. Do not treat non-secret hashes, screenshots, or State Hub notes
as substitutes for retained audit log custody.
Interim integration status: `/home/worsch/audit-core` provides a mock
Audit Core backend that writes JSONL records under
`/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and deletes files older than seven
days. Use it only to wire interfaces and setup validation before the durable
Audit Core archive exists.
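
The mock's naming and retention can be checked with two small helpers. A sketch that assumes the hour in the file name is UTC and that GNU `date -d @EPOCH` is available; the function names are hypothetical:

```bash
# Print the file the mock should be writing for an epoch second (default: now).
audit_core_file_for() {
  local epoch="${1:-$(date +%s)}"
  printf '/tmp/audit-core/audit-%s.jsonl\n' "$(date -u -d "@${epoch}" +%Y%m%dT%H)"
}

# List files older than seven days (10080 minutes); expect no output if the
# mock's cleanup is running.
audit_core_stale_files() {
  find /tmp/audit-core -maxdepth 1 -name 'audit-*.jsonl' \
    -mmin +$((7 * 24 * 60)) -print 2>/dev/null
}
```

On the host running the mock, `test -s "$(audit_core_file_for)"` after a setup action shows whether the current hour's file received records.
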
Emergency seal/unseal drills are disruptive and must only run in an attended
window with threshold unseal shares available. Record non-secret drill evidence
using `docs/openbao-emergency-drill-evidence.example.json` as a template,
replace every placeholder with real drill evidence, then validate it with:
```bash
make openbao-validate-emergency-evidence \
OPENBAO_EMERGENCY_EVIDENCE=/path/to/evidence.json
```
Monitoring baseline:
- pod readiness and liveness from Kubernetes probes
- `bao status` seal/init state
- PVC capacity for data and audit storage
- audit log write success
- future Prometheus scraping once the cluster monitoring stack exists
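
Part of this baseline can be scripted before Prometheus exists. A sketch with hypothetical helper names: the health mapping assumes the standard `/v1/sys/health` status codes, and the disk check assumes POSIX `df -P` output plus the chart's default data mount `/openbao/data` next to the audit mount `/openbao/audit`:

```bash
# Map /v1/sys/health HTTP status codes to a state:
# 200 active, 429 unsealed standby, 501 not initialized, 503 sealed.
openbao_health_state() {
  case "$1" in
    200) echo active ;;
    429) echo standby ;;
    501) echo uninitialized ;;
    503) echo sealed ;;
    *)   echo "unknown($1)" ;;
  esac
}

# Probe the unauthenticated health endpoint, e.g. through a port-forward;
# curl prints 000 when it cannot connect.
openbao_health_probe() {
  local addr="${BAO_ADDR:-http://127.0.0.1:8200}" code
  code="$(curl -s -o /dev/null -w '%{http_code}' "${addr}/v1/sys/health" || true)"
  openbao_health_state "$code"
}

# Print mount points at or above a usage percentage, from `df -P` on stdin.
openbao_df_over() {
  awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6 }'
}
```

For example, with a port-forward open, `openbao_health_probe`; for capacity, `kubectl exec -n openbao openbao-0 -- df -P /openbao/data /openbao/audit | openbao_df_over 80`.
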
## Artifact-Store Object Storage Handoff
`artifact-store` is the consumer-facing artifact preservation service for
generated outputs, evidence packages, reports, logs, snapshots, exports, and
release artifacts. It already has an S3-compatible backend with `env:NAME` and
`file:/mounted/path` credential references, plus an
`artifactstore storage verify --backend s3` smoke path.
Railiance should avoid building a parallel object-storage client or credential
vending flow in OpenBao. The ownership split is:
- `railiance-platform` / OpenBao owns bootstrap secret custody, policy, audit,
break-glass access, and workload secret delivery.
- `artifact-store` owns artifact package manifests, the S3 backend, storage
verification, and whether temporary credentials require backend refresh
support or a sidecar/controller.
- `net-kingdom` owns the identity issuer and role-claim model if object storage
adopts STS with `AssumeRoleWithWebIdentity`.
Initial static-credential bridge, before STS is proven:
1. Create a scoped object-store access key limited to the artifact-store bucket
and prefix. Do not use object-store root credentials.
2. Store the key pair in OpenBao under a platform-owned path such as
`platform/object-storage/artifact-store`.
3. Deliver the values to the artifact-store pod through CSI or External Secrets
as mounted files.
4. Configure artifact-store with file references:
```bash
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=file:/run/secrets/artifactstore/s3-access-key
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore/s3-secret-key
```
5. Verify from artifact-store:
```bash
artifactstore storage verify --backend s3
```
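
Steps 2 and 4 can be sketched so that key material never appears on a command line. The `key=@file` form makes the `bao` CLI read a value from a file. The key names `access_key`/`secret_key` and both function names are hypothetical and should match whatever the CSI or External Secrets mapping expects:

```bash
# Step 2: store the key pair from operator-local files. Write the files
# without a trailing newline; the file contents are assumed to be stored as-is.
store_artifact_store_keys() {
  local access_file="$1" secret_file="$2"
  if [ ! -s "$access_file" ] || [ ! -s "$secret_file" ]; then
    echo "usage: store_artifact_store_keys <access-key-file> <secret-key-file>" >&2
    return 1
  fi
  bao kv put platform/object-storage/artifact-store \
    "access_key=@${access_file}" "secret_key=@${secret_file}"
}

# Step 4 pre-check inside the artifact-store pod: every reference must use
# the file: form and point at a readable, non-empty file.
check_file_refs() {
  local ref path
  for ref in "$@"; do
    path="${ref#file:}"
    if [ "$path" = "$ref" ] || [ ! -r "$path" ] || [ ! -s "$path" ]; then
      echo "bad file reference: $ref" >&2
      return 1
    fi
  done
}
```

Inside the pod: `check_file_refs "$ARTIFACTSTORE_S3_ACCESS_KEY_REF" "$ARTIFACTSTORE_S3_SECRET_KEY_REF" && artifactstore storage verify --backend s3`.
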
STS credential vending remains linked to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending`. If that workstream chooses MinIO-compatible
`AssumeRoleWithWebIdentity`, OpenBao should not become the identity provider by
default. Use the NetKingdom OIDC issuer for workload/user identity, map object
storage roles and policies there, and keep OpenBao responsible for bootstrap,
break-glass, audit, and delivery of any controller configuration.
Current artifact-store configuration exposes access key and secret key refs,
but no session-token ref. `ARTIFACT-STORE-WP-0007-T004` must either add
temporary-session-token support to the S3 backend or choose a sidecar/secret
controller pattern that keeps refreshed credentials available through the
existing env/file reference contract.
## Upgrade And Rollback
1. Read the OpenBao chart release notes.
2. Update `OPENBAO_CHART_VERSION` in `Makefile`.
3. Run `make openbao-dry-run`.
4. Confirm current backup and audit log posture.
5. Run `make openbao-deploy`.
6. Run `make openbao-status`.
For rollback, run `helm rollback openbao <REVISION> -n openbao` on Railiance01
and re-check `bao status`.
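
Two facts shape a rollback: `helm rollback` reverts only Kubernetes objects, not the Raft data on the PVC, and every restart of the pod brings OpenBao up sealed. A sketch that prints the commands by default and executes them with `DRY_RUN=0`; `openbao_run`, `openbao_rollback`, and `DRY_RUN` are hypothetical, and the pod deletion assumes the chart's default `OnDelete` StatefulSet update strategy is unchanged in `helm/openbao-values.yaml`:

```bash
# Print a command (default) or run it when DRY_RUN=0.
openbao_run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

openbao_rollback() {
  local revision="$1"
  if ! [[ "$revision" =~ ^[0-9]+$ ]]; then
    echo "usage: openbao_rollback <revision>  (see: helm history openbao -n openbao)" >&2
    return 1
  fi
  openbao_run helm rollback openbao "$revision" -n openbao || return 1
  # With OnDelete, the running pod keeps its old spec until it is recreated.
  openbao_run kubectl delete pod -n openbao openbao-0 || return 1
  echo "next: unseal with threshold shares (kubectl exec -it -n openbao openbao-0 -- bao operator unseal), then re-check bao status" >&2
}
```

Find the target revision with `helm history openbao -n openbao`, review the plan with `openbao_rollback <REVISION>`, then run `DRY_RUN=0 openbao_rollback <REVISION>` in an attended window with unseal shares available.
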
## Scaling To Three Nodes
When Railiance02 and Railiance03 join:
1. Move storage from `local-path` to distributed storage.
2. Set `server.affinity` back to anti-affinity.
3. Set `server.ha.replicas: 3`.
4. Re-enable a PodDisruptionBudget.
5. Run an unseal, failover, backup, and restore drill before migrating secrets.
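
The scale-out steps can be turned into a reviewable command list. A sketch that only prints commands and executes nothing: `openbao_scale_plan` is hypothetical, pod names assume the StatefulSet pattern `openbao-<n>`, the values changes from steps 1 to 4 are assumed to be committed already, and the manual `bao operator raft join` is only needed if the Raft storage stanza has no `retry_join` blocks:

```bash
# Hypothetical plan printer for scaling the Raft cluster; executes nothing.
openbao_scale_plan() {
  local replicas="${1:-3}" i
  local version="${OPENBAO_CHART_VERSION:-0.28.2}"
  if ! [[ "$replicas" =~ ^[0-9]+$ ]] || [ "$replicas" -lt 2 ]; then
    echo "usage: openbao_scale_plan <replicas >= 2>" >&2
    return 1
  fi
  echo "helm upgrade openbao openbao/openbao -n openbao --version ${version} -f helm/openbao-values.yaml --dry-run"
  echo "helm upgrade openbao openbao/openbao -n openbao --version ${version} -f helm/openbao-values.yaml"
  for ((i = 1; i < replicas; i++)); do
    # Each new pod joins the cluster led by openbao-0 (plain HTTP while
    # listener TLS is disabled), then is unsealed with the same shares.
    echo "kubectl exec -n openbao openbao-$i -- bao operator raft join http://openbao-0.openbao-internal:8200"
    echo "kubectl exec -it -n openbao openbao-$i -- bao operator unseal   # repeat until threshold"
  done
  # Needs an authenticated token in the pod session.
  echo "kubectl exec -n openbao openbao-0 -- bao operator raft list-peers"
}
```

`openbao_scale_plan 3` prints the Helm dry-run and upgrade, then a join and unseal step per new pod, then a `list-peers` check.
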