Added OpenBao secrets management and phased out Bitnami in favor of CloudNative PG

2026-05-18 11:53:59 +02:00
parent fc0a6c280b
commit 980947894e
8 changed files with 493 additions and 16 deletions

1
.gitignore vendored

@@ -2,6 +2,7 @@
helm/*.yaml
!helm/*.sops.yaml
!helm/*.yaml.template
!helm/openbao-values.yaml
# Kubernetes manifests (no secrets) are safe to commit
!helm/*-cluster.yaml
!helm/*-networkpolicies.yaml

66
AGENTS.md Normal file

@@ -0,0 +1,66 @@
# railiance-platform — Codex Instructions
**OAS Stack Level:** S3 Platform Services
**Scope:** Shared services supporting cluster workloads — PostgreSQL HA,
object storage, secret management, identity, message brokers, caches.
**Pre-condition:** `railiance-cluster` must be converged (k3s running,
Helm available) before deploying anything from this repo.
## Custodian State Hub Integration
Domain: **railiance** — topic ID: `ca369340-a64e-442e-98f1-a4fa7dc74a38`
State Hub: http://127.0.0.1:8000
### Session Protocol
**Step 1 — Orient**
```
get_domain_summary("railiance")
```
**Step 2 — Scan workplans**
```
ls workplans/ # read all active workplans; note todo/in_progress tasks
```
**Step 3 — Present brief**
1. Active workstreams for railiance with `[repo:railiance-platform]` tasks
2. Pending tasks from local workplans
3. Goal guidance from summary (needs_workplan / alignment_warnings)
4. Suggested next action
**During work:** use `record_decision()`, `add_progress_event()`, `resolve_decision()`.
**Session close:** `add_progress_event()` with topic_id and workstream_id.
> Design boundary: the hub is a read model. Bootstrap tools (create_workstream,
> create_task) are for the First Session Protocol only. Work originates as files
> per ADR-001.
### Repo Boundary Rule (ADR-003)
This repo owns **S3 Platform Services only**. Do not manage:
- OS-level concerns → `railiance-infra` (S1)
- Kubernetes runtime concerns → `railiance-cluster` (S2)
- CI/CD or developer tooling → `railiance-enablement` (S4)
- Application deployments → `railiance-apps` (S5)
Reference: `railiance-infra/docs/adr/ADR-003-railiance-5repo-stack-architecture.md`
### Workplan Convention (ADR-001)
File location: `workplans/RAIL-PL-WP-NNNN-<slug>.md`
Prefix: `RAIL-PL`
### SBOM
After updating dependencies:
```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=railiance-platform SCAN=1 REPO_PATH=/home/worsch/railiance-platform
```
### Quick Reference
`~/the-custodian/state-hub/mcp_server/TOOLS.md`


@@ -8,6 +8,10 @@ NAMESPACE := platform
PG_CHART_VERSION ?= 16.2.2
VALKEY_CHART_VERSION ?= 2.x
OPENBAO_CHART_VERSION ?= 0.28.2
OPENBAO_NAMESPACE ?= openbao
OPENBAO_RELEASE ?= openbao
OPENBAO_VALUES ?= helm/openbao-values.yaml
##@ CloudNative PG (cnpg) — primary database operator
@@ -60,6 +64,33 @@ valkey-deploy: ## Deploy / upgrade Valkey (Redis-compatible) to platform namespa
valkey-status: ## Check Valkey pod status
$(KUBECTL) get pods -n $(NAMESPACE) -l app.kubernetes.io/name=valkey
##@ OpenBao (secrets)
openbao-repo: ## Add / update the official OpenBao Helm repository
$(HELM) repo add openbao https://openbao.github.io/openbao-helm --force-update
$(HELM) repo update openbao
openbao-dry-run: openbao-repo ## Render the OpenBao Helm release without applying it
$(HELM) upgrade --install $(OPENBAO_RELEASE) openbao/openbao \
--version $(OPENBAO_CHART_VERSION) \
--namespace $(OPENBAO_NAMESPACE) \
--create-namespace \
-f $(OPENBAO_VALUES) \
--dry-run
openbao-deploy: openbao-repo ## Deploy / upgrade OpenBao to the openbao namespace
$(KUBECTL) create namespace $(OPENBAO_NAMESPACE) --dry-run=client -o yaml | $(KUBECTL) apply -f -
$(HELM) upgrade --install $(OPENBAO_RELEASE) openbao/openbao \
--version $(OPENBAO_CHART_VERSION) \
--namespace $(OPENBAO_NAMESPACE) \
-f $(OPENBAO_VALUES) \
--wait --timeout 5m
openbao-status: ## Show OpenBao pods, services, PVCs, and seal/init status
$(KUBECTL) get pods,svc,pvc -n $(OPENBAO_NAMESPACE) \
-l app.kubernetes.io/instance=$(OPENBAO_RELEASE) -o wide
-$(KUBECTL) exec -n $(OPENBAO_NAMESPACE) $(OPENBAO_RELEASE)-0 -- bao status
##@ Backup
backup: ## Backup platform services (PostgreSQL logical dump) — age-encrypted to Nextcloud
@@ -72,4 +103,4 @@ help: ## Show this help
/^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-22s\033[0m %s\n", $$1, $$2 } \
/^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) }' $(MAKEFILE_LIST)
.PHONY: db-deploy db-status db-shell db-logs pg-deploy pg-status pg-pgpool-check valkey-deploy valkey-status backup help
.PHONY: db-deploy db-status db-shell db-logs pg-deploy pg-status pg-pgpool-check valkey-deploy valkey-status openbao-repo openbao-dry-run openbao-deploy openbao-status backup help


@@ -22,7 +22,8 @@ Railiance is structured as five independent repos per OAS Stack layer. This repo
- PostgreSQL via CloudNative PG operator (cnpg) — operator deployed, `databases` namespace active
- Valkey / Redis-compatible cache as a standalone Helm release (to be extracted from S2)
- Secret management infrastructure
- Secret management infrastructure (OpenBao as the platform service,
SOPS/age for Git-at-rest bootstrap material)
- Identity services integration point (with net-kingdom)
- Message brokers (RabbitMQ, similar)
- Object storage (MinIO / S3-compatible)
@@ -111,10 +112,17 @@ description: S3-compatible object storage service (MinIO) for artifact storage,
keywords: [minio, s3, object-storage, storage, artifacts, backup]
```
```capability
type: security
title: OpenBao platform secrets service
description: Canonical S3 secrets service for runtime secrets, dynamic credentials, audit, and future workload integrations. SOPS/age remains the bootstrap mechanism for Git-at-rest secrets.
keywords: [openbao, secrets, vault-compatible, secret-management, dynamic-credentials, audit, kubernetes-auth]
```
---
## Getting Oriented
- Start with: `CLAUDE.md` (session protocol, boundary rules)
- Key files / directories: `workplans/RAIL-PL-WP-0001-platform-baseline.md`, `helm/` (platform Helm charts), `Makefile`
- Key files / directories: `workplans/RAIL-PL-WP-0001-platform-baseline.md`, `workplans/RAIL-PL-WP-0002-openbao-platform-secrets-service.md`, `helm/` (platform Helm charts), `docs/openbao.md`, `Makefile`
- Pre-conditions: railiance-cluster (S2) converged with k3s running; cluster backup verified before migration steps (`sudo make backup` in railiance-cluster)

207
docs/openbao.md Normal file

@@ -0,0 +1,207 @@
# OpenBao - Platform Secrets Service
**Chart:** `openbao/openbao`
**Chart version:** `0.28.2`
**App version:** `v2.5.3`
**Namespace:** `openbao`
**Managed by:** `railiance-platform` (S3)
**Workplan:** `RAIL-PL-WP-0002`
**Initial target:** Railiance01 (`92.205.62.239`)
---
## Architecture
```
S5 workloads / operators
-> openbao.openbao.svc.cluster.local:8200
-> openbao-0
-> integrated Raft storage on local-path PVC
-> audit storage PVC mounted at /openbao/audit
```
- OpenBao is the canonical Railiance S3 secrets service.
- SOPS/age remains the Git-at-rest bootstrap mechanism.
- The first Railiance01 deployment is single-replica Raft, not true HA.
- Public ingress is disabled. Operators use `kubectl exec` or port-forwarding.
- TLS is disabled inside the pod listener for this internal-only bootstrap. Add
cert-manager-backed internal TLS before exposing OpenBao beyond cluster-local
traffic.
## Deployment
The official OpenBao project recommends the Helm chart for Kubernetes
deployments and advises running Helm with `--dry-run` before any install or
upgrade.
From a host with kubeconfig access:
```bash
make openbao-dry-run
make openbao-deploy
make openbao-status
```
On Railiance01 directly:
```bash
cd ~/railiance-platform
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-dry-run
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-deploy
sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml make openbao-status
```
If the repo is not present on Railiance01 yet, copy only the non-secret values
file and run Helm directly:
```bash
scp helm/openbao-values.yaml tegwick@92.205.62.239:/tmp/openbao-values.yaml
ssh tegwick@92.205.62.239 \
'sudo env KUBECONFIG=/etc/rancher/k3s/k3s.yaml helm upgrade --install openbao openbao/openbao \
--version 0.28.2 \
--namespace openbao \
--create-namespace \
-f /tmp/openbao-values.yaml \
--dry-run'
```
Repeat without `--dry-run` to deploy.
## Verification
```bash
kubectl get pods,svc,pvc -n openbao -o wide
kubectl exec -n openbao openbao-0 -- bao status
```
Expected immediately after install:
- `openbao-0` is Running.
- `openbao`, `openbao-active`, `openbao-internal`, and `openbao-ui` services
exist as cluster-internal services.
- data and audit PVCs are Bound.
- `bao status` reports `Initialized: false` and `Sealed: true`.
That state is intentional until the bootstrap ceremony is completed.
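The sealed state also shows up in the exit code of `bao status`, which is why the `openbao-status` Make target tolerates a failing exit. The following is a minimal sketch, assuming OpenBao keeps the Vault-compatible convention (0 = unsealed, 2 = sealed, 1 = error). `describe_bao_status` is a hypothetical helper and not part of this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a `bao status` exit code to a readable state.
# Assumption: Vault-compatible exit codes (0 = unsealed, 2 = sealed, 1 = error).
describe_bao_status() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}

# Usage against the cluster (not executed here):
#   kubectl exec -n openbao openbao-0 -- bao status >/dev/null
#   describe_bao_status "$?"
```

Right after install the helper would be expected to report `sealed`, matching the state described above.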
## Bootstrap Ceremony
Do not initialize OpenBao in a casual shell session. Initialization emits the
unseal keys and initial root token. Treat this as a break-glass event.
Recommended ceremony:
1. Confirm the Railiance01 backup posture first.
2. Prepare three human escrow recipients for unseal shares.
3. Run initialization once:
```bash
kubectl exec -n openbao openbao-0 -- \
bao operator init -key-shares=3 -key-threshold=2
```
4. Give each unseal share to its escrow owner through an out-of-band channel.
5. Unseal with two shares, running the command once per share. The `-it` flags
give the hidden key prompt a terminal:
```bash
kubectl exec -it -n openbao openbao-0 -- bao operator unseal
```
6. Log in with the initial root token only long enough to create durable admin
auth, enable audit, and prepare policies.
7. Revoke or tightly escrow the initial root token.
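With `-key-shares=3 -key-threshold=2`, any two of the three escrow owners are enough to unseal, so the unseal command runs once per contributing share. The sketch below only prints that plan and never handles key material. `unseal_plan` is a hypothetical helper, and the `-it` flags are an assumption so that the hidden key prompt has a terminal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one unseal command per share needed to reach
# the threshold. It only prints commands; it never touches unseal keys.
unseal_plan() {
  local shares="$1" threshold="$2" i
  # Sanity check only: a threshold above the share count can never be met.
  if [ "$threshold" -lt 1 ] || [ "$threshold" -gt "$shares" ]; then
    echo "threshold must be between 1 and the number of shares" >&2
    return 1
  fi
  for i in $(seq 1 "$threshold"); do
    echo "share $i of $threshold: kubectl exec -it -n openbao openbao-0 -- bao operator unseal"
  done
}

unseal_plan 3 2
```

Each escrow owner enters their own share at the prompt, so no single person needs to see more than one share.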
## Initial Configuration After Unseal
Enable file audit:
```bash
kubectl exec -n openbao openbao-0 -- \
bao audit enable file file_path=/openbao/audit/openbao-audit.log
```
Enable the first KV v2 mount:
```bash
kubectl exec -n openbao openbao-0 -- \
bao secrets enable -path=platform kv-v2
```
Kubernetes auth, database dynamic credentials, PKI, CSI, and External Secrets
integration are follow-up tasks in `RAIL-PL-WP-0002`. Do not migrate live
application secrets until those policies and restore drills are documented.
## Artifact-Store Object Storage Handoff
`artifact-store` is the consumer-facing artifact preservation service for
generated outputs, evidence packages, reports, logs, snapshots, exports, and
release artifacts. It already has an S3-compatible backend with `env:NAME` and
`file:/mounted/path` credential references, plus an
`artifactstore storage verify --backend s3` smoke path.
Railiance should avoid building a parallel object-storage client or credential
vending flow in OpenBao. The ownership split is:
- `railiance-platform` / OpenBao owns bootstrap secret custody, policy, audit,
break-glass access, and workload secret delivery.
- `artifact-store` owns artifact package manifests, the S3 backend, storage
verification, and whether temporary credentials require backend refresh
support or a sidecar/controller.
- `net-kingdom` owns the identity issuer and role-claim model if object storage
adopts STS with `AssumeRoleWithWebIdentity`.
Initial static-credential bridge, before STS is proven:
1. Create a scoped object-store access key limited to the artifact-store bucket
and prefix. Do not use object-store root credentials.
2. Store the key pair in OpenBao under a platform-owned path such as
`platform/object-storage/artifact-store`.
3. Deliver the values to the artifact-store pod through CSI or External Secrets
as mounted files.
4. Configure artifact-store with file references:
```bash
export ARTIFACTSTORE_S3_ACCESS_KEY_REF=file:/run/secrets/artifactstore/s3-access-key
export ARTIFACTSTORE_S3_SECRET_KEY_REF=file:/run/secrets/artifactstore/s3-secret-key
```
5. Verify from artifact-store:
```bash
artifactstore storage verify --backend s3
```
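To make the `env:NAME` and `file:/mounted/path` reference contract concrete, here is a minimal sketch of how such a reference could be resolved. `resolve_ref` is hypothetical and is not artifact-store's implementation; it only illustrates why mounted files from CSI or External Secrets fit the existing contract without code changes.

```bash
#!/usr/bin/env bash
# Hypothetical resolver for the two reference styles named above.
# Illustration only; artifact-store's real resolver may behave differently.
resolve_ref() {
  local ref="$1" name path value
  case "$ref" in
    env:*)
      name="${ref#env:}"
      # printenv exits non-zero when the variable is not in the environment.
      value="$(printenv "$name")" || { echo "unset variable: $name" >&2; return 1; }
      printf '%s' "$value"
      ;;
    file:*)
      path="${ref#file:}"
      [ -r "$path" ] || { echo "unreadable file: $path" >&2; return 1; }
      # Command substitution drops the trailing newline a mounted file may carry.
      value="$(cat "$path")"
      printf '%s' "$value"
      ;;
    *)
      echo "unsupported reference: $ref" >&2
      return 1
      ;;
  esac
}

# Usage (paths as configured in step 4; not executed here):
#   resolve_ref "file:/run/secrets/artifactstore/s3-access-key"
```

Because the file is read at resolution time, a controller that rewrites the mounted file can rotate the static key without changing the reference itself.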
STS credential vending remains linked to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending`. If that workstream chooses MinIO-compatible
`AssumeRoleWithWebIdentity`, OpenBao should not become the identity provider by
default. Use the NetKingdom OIDC issuer for workload/user identity, map object
storage roles and policies there, and keep OpenBao responsible for bootstrap,
break-glass, audit, and delivery of any controller configuration.
Current artifact-store configuration exposes access key and secret key refs,
but no session-token ref. `ARTIFACT-STORE-WP-0007-T004` must either add
temporary-session-token support to the S3 backend or choose a sidecar/secret
controller pattern that keeps refreshed credentials available through the
existing env/file reference contract.
## Upgrade And Rollback
1. Read the OpenBao chart release notes.
2. Update `OPENBAO_CHART_VERSION` in `Makefile`.
3. Run `make openbao-dry-run`.
4. Confirm current backup and audit log posture.
5. Run `make openbao-deploy`.
6. Run `make openbao-status`.
For rollback, run `helm rollback openbao <REVISION> -n openbao` on Railiance01
and re-check `bao status`. If the rollback restarts the pod, it comes back
sealed and must be unsealed again.
## Scaling To Three Nodes
When Railiance02 and Railiance03 join:
1. Move storage from `local-path` to distributed storage.
2. Set `server.affinity` back to anti-affinity.
3. Set `server.ha.replicas: 3`.
4. Re-enable a PodDisruptionBudget.
5. Run an unseal, failover, backup, and restore drill before migrating secrets.

123
helm/openbao-values.yaml Normal file

@@ -0,0 +1,123 @@
# Railiance S3 OpenBao platform secrets service.
#
# This file intentionally contains no secret material. OpenBao initialization
# creates unseal keys and the initial root token; handle those outside Git
# during the bootstrap ceremony documented in docs/openbao.md.
global:
namespace: openbao
tlsDisable: true
injector:
enabled: false
server:
enabled: true
logLevel: info
logFormat: json
image:
registry: quay.io
repository: openbao/openbao
pullPolicy: IfNotPresent
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
ingress:
enabled: false
authDelegator:
enabled: true
# Single-node Railiance01 bootstrap. Remove this override and scale
# server.ha.replicas when Railiance02/03 join with distributed storage.
affinity: ""
readinessProbe:
enabled: true
path: /v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204
port: 8200
livenessProbe:
enabled: true
path: /v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204
port: 8200
initialDelaySeconds: 60
networkPolicy:
enabled: true
ingress:
- from:
- namespaceSelector: {}
ports:
- port: 8200
protocol: TCP
- port: 8201
protocol: TCP
dataStorage:
enabled: true
size: 5Gi
storageClass: local-path
accessMode: ReadWriteOnce
auditStorage:
enabled: true
size: 2Gi
storageClass: local-path
accessMode: ReadWriteOnce
standalone:
enabled: false
ha:
enabled: true
replicas: 1
disruptionBudget:
enabled: false
raft:
enabled: true
setNodeId: true
config: |
ui = true
listener "tcp" {
tls_disable = 1
address = "[::]:8200"
cluster_address = "[::]:8201"
telemetry {
unauthenticated_metrics_access = "true"
}
}
storage "raft" {
path = "/openbao/data"
}
service_registration "kubernetes" {}
telemetry {
prometheus_retention_time = "30s"
disable_hostname = true
}
serviceAccount:
create: true
name: openbao
serviceDiscovery:
enabled: true
ui:
enabled: true
serviceType: ClusterIP
activeOpenbaoPodOnly: false
csi:
enabled: false


@@ -4,7 +4,7 @@ type: workplan
title: "OpenBao Platform Secrets Service"
domain: railiance
repo: railiance-platform
status: proposed
status: active
owner: codex
topic_slug: railiance
planning_priority: high
@@ -74,7 +74,7 @@ Out of scope:
```task
id: RAIL-PL-WP-0002-T01
status: todo
status: done
priority: high
state_hub_task_id: "e997ffe0-6b61-4242-b585-f271e9b75e99"
```
@@ -84,11 +84,16 @@ ops-warden, Railiance, and application runbooks. Decide whether
Railiance standardizes on OpenBao, keeps Vault-compatible abstraction
language, or supports both for a transition period.
**2026-05-17:** Decision recorded in State Hub:
`a0df816c-3749-4418-9c8b-28eb428be953`. Railiance S3 standardizes on
OpenBao as the runtime platform secrets service. SOPS/age remains the
Git-at-rest bootstrap mechanism.
### T02 - Kubernetes Deployment Design
```task
id: RAIL-PL-WP-0002-T02
status: todo
status: done
priority: high
state_hub_task_id: "fb6ac85d-e77f-400d-8342-70a0ec6e82ef"
```
@@ -98,11 +103,18 @@ backend, HA posture, ingress/internal service exposure, TLS, resource
limits, PodDisruptionBudget, NetworkPolicies, and upgrade/rollback
strategy.
**2026-05-17:** Implemented `helm/openbao-values.yaml`, Make targets, and
`docs/openbao.md`. Deployed chart `openbao/openbao` `0.28.2` (app
`v2.5.3`) to Railiance01 namespace `openbao` as internal-only,
single-replica Raft with data/audit PVCs. Public ingress remains disabled;
OpenBao is intentionally uninitialized and sealed until the bootstrap
ceremony.
### T03 - Bootstrap, Unseal, And Break-Glass Procedure
```task
id: RAIL-PL-WP-0002-T03
status: todo
status: in_progress
priority: high
state_hub_task_id: "509ccfd4-1775-4be4-b8e4-8d5bcf17f91e"
```
@@ -112,6 +124,10 @@ emergency access, backup escrow, and recovery drill. Ensure the design
does not introduce an unmanaged "secret zero" worse than the current
SOPS/age bootstrap.
**2026-05-17:** Initial ceremony documented in `docs/openbao.md`. Still
needs human escrow assignment, root-token retirement details, and a
restore/recovery drill before live secrets move into OpenBao.
### T04 - Auth Methods And Workload Integration
```task
@@ -130,7 +146,7 @@ Operator, or sidecars/controllers.
```task
id: RAIL-PL-WP-0002-T05
status: todo
status: in_progress
priority: medium
state_hub_task_id: "0d717bdd-76bc-41b4-b633-ba07214b4095"
```
@@ -141,6 +157,16 @@ PostgreSQL, Kubernetes token generation where appropriate, PKI/SSH
future paths, and an assessment of object-storage credential vending
integration with NK-WP-0007.
**2026-05-17:** Object-storage credential vending assessment started and
documented in `docs/openbao.md`. Existing `artifact-store` capabilities cover
artifact package preservation, an S3-compatible backend, env/file secret refs,
and `artifactstore storage verify --backend s3`. Railiance S3 should use
OpenBao for bootstrap custody, policy, audit, break-glass, and workload secret
delivery, while `artifact-store` owns S3 backend behavior and
`ARTIFACT-STORE-WP-0007` owns MinIO/fork compatibility plus temporary
credential refresh decisions. NetKingdom remains the default owner for OIDC
identity if object storage adopts `AssumeRoleWithWebIdentity`.
### T06 - Backup, Audit, Monitoring, And Verification
```task
@@ -158,7 +184,7 @@ developer/operator verification script for the deployed service.
```task
id: RAIL-PL-WP-0002-T07
status: todo
status: in_progress
priority: medium
state_hub_task_id: "89149b60-562b-4a5b-978d-0f9136ffa114"
```
@@ -168,6 +194,21 @@ artifact-store, and S5 applications where documentation or integration
must move from HashiCorp Vault-specific assumptions to OpenBao-first
or Vault-compatible abstraction language.
**2026-05-17:** Started cross-repo transition by updating
`net-kingdom/docs/platform-identity-security-architecture.md` and
`net-kingdom/SCOPE.md` so NetKingdom treats OpenBao as the runtime
platform secrets authority while SOPS/age remains bootstrap/Git-at-rest
protection. Follow-ups are still needed for ops-warden, ops-bridge,
artifact-store, and S5 apps, plus cleanup of stale HashiCorp Vault wording.
**2026-05-17:** Linked the artifact-store transition to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending` instead of creating duplicate S3 backend work in
`railiance-platform`. The OpenBao side of the handoff is now documented in
`docs/openbao.md`; remaining artifact-store work belongs in
`ARTIFACT-STORE-WP-0007-T004` and follow-up routing in
`ARTIFACT-STORE-WP-0007-T005`.
## Acceptance Criteria
- Railiance has an explicit decision on OpenBao versus HashiCorp Vault


@@ -10,7 +10,7 @@ topic_slug: railiance
state_hub_workstream_id: "e4ec133c-7cb9-43c6-95f0-50d6591f13d7"
superseded_by: RAIL-HO-WP-0004
created: "2026-03-11"
updated: "2026-03-26"
updated: "2026-05-17"
---
# S3 Platform Services Baseline
@@ -59,7 +59,7 @@ depend on.
```task
id: RAIL-PL-WP-0001-T01
state_hub_task_id: f5af95bf-3d2d-458a-b695-666d4dc2dc99
status: todo
status: cancelled
priority: high
```
@@ -111,7 +111,7 @@ Running in the `platform` namespace; `make smoke` still passes.
```task
id: RAIL-PL-WP-0001-T02
state_hub_task_id: c1073011-935a-4c1a-9a9f-dc4db1fc3e88
status: todo
status: cancelled
priority: high
```
@@ -149,7 +149,7 @@ all data intact.
```task
id: RAIL-PL-WP-0001-T03
state_hub_task_id: a820cd02-0f30-4488-abf1-897120f1fbc1
status: todo
status: cancelled
priority: medium
```
@@ -188,7 +188,7 @@ still operational; tombstone in place.
```task
id: RAIL-PL-WP-0001-T04
state_hub_task_id: 8df4774c-5251-4c85-be57-61b903be82ee
status: todo
status: cancelled
priority: high
```
@@ -212,7 +212,7 @@ remains available within the recovery window.
```task
id: RAIL-PL-WP-0001-T05
state_hub_task_id: 231f6f8a-97a0-4aa0-8318-8e4361af67a3
status: todo
status: cancelled
priority: medium
```
@@ -254,7 +254,7 @@ railiance-cluster backup still covers etcd/kubeconfig; no duplication.
```task
id: RAIL-PL-WP-0001-T06
state_hub_task_id: 20899c81-2b24-4d70-ad02-f6a1383b6811
status: todo
status: cancelled
priority: low
```