---
id: RAIL-PL-WP-0002
type: workplan
title: "OpenBao Platform Secrets Service"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 2
created: "2026-05-17"
updated: "2026-05-29"
depends_on:
  - RAIL-PL-WP-0001
state_hub_workstream_id: "fd1c045a-01d4-43be-980f-acbda6c64e6c"
---

# RAIL-PL-WP-0002 - OpenBao Platform Secrets Service

## Goal

Establish OpenBao as the canonical Railiance S3 platform secrets service, or define a controlled transition path from existing HashiCorp Vault assumptions to OpenBao.

This workplan belongs in `railiance-platform` because S3 owns shared platform services: secret management, identity integration, object storage, backups, and other services consumed by S5 applications.

## Context

OpenBao is an open-source, Linux Foundation-governed fork of Vault for managing, storing, and distributing secrets, certificates, and keys. The official OpenBao documentation includes Kubernetes deployment via Helm, CSI provider support, dynamic database secrets, Kubernetes service account token generation, and lease/revocation semantics.

Current local architecture references still mention HashiCorp Vault in several places, especially credential bootstrap and ops-warden's Vault SSH backend. Railiance also uses SOPS/age for Git-at-rest secrets. The platform needs an explicit decision and migration path so "Vault" does not remain an accidental brand-specific dependency where "secrets manager" is what we really mean.

## Scope

In scope:

- decide whether OpenBao is the canonical Railiance platform secrets service
- define deployment topology for OpenBao on the Railiance Kubernetes platform
- define auth methods for workloads, operators, and automations
- define secret engines for KV, database dynamic secrets, Kubernetes tokens, PKI/certificates, and future object-storage credential vending integrations
- define CSI provider and/or External Secrets Operator integration
- define unseal, backup, restore, break-glass, audit, and monitoring procedures
- identify NetKingdom documentation and workplan updates needed to replace HashiCorp Vault-specific language with OpenBao-first language

Out of scope:

- replacing SOPS/age for Git-at-rest bootstrap secrets
- changing S1/S2 cluster runtime configuration without coordination
- rewriting ops-warden's SSH certificate backend in this workplan
- implementing application-specific secrets in S5

## Tasks

### T01 - OpenBao Decision And Migration Inventory

```task
id: RAIL-PL-WP-0002-T01
status: done
priority: high
state_hub_task_id: "e997ffe0-6b61-4242-b585-f271e9b75e99"
```

Inventory current HashiCorp Vault assumptions across NetKingdom, ops-warden, Railiance, and application runbooks. Decide whether Railiance standardizes on OpenBao, keeps Vault-compatible abstraction language, or supports both for a transition period.

**2026-05-17:** Decision recorded in State Hub: `a0df816c-3749-4418-9c8b-28eb428be953`. Railiance S3 standardizes on OpenBao as the runtime platform secrets service. SOPS/age remains the Git-at-rest bootstrap mechanism.

### T02 - Kubernetes Deployment Design

```task
id: RAIL-PL-WP-0002-T02
status: done
priority: high
state_hub_task_id: "fb6ac85d-e77f-400d-8342-70a0ec6e82ef"
```

Design the OpenBao Helm deployment for Railiance: namespace, storage backend, HA posture, ingress/internal service exposure, TLS, resource limits, PodDisruptionBudget, NetworkPolicies, and upgrade/rollback strategy.

**2026-05-17:** Implemented `helm/openbao-values.yaml`, Make targets, and `docs/openbao.md`. Deployed chart `openbao/openbao` `0.28.2` (app `v2.5.3`) to Railiance01 namespace `openbao` as internal-only, single-replica Raft with data/audit PVCs. Public ingress remains disabled; OpenBao is intentionally uninitialized and sealed until the bootstrap ceremony.

### T03 - Bootstrap, Unseal, And Break-Glass Procedure

```task
id: RAIL-PL-WP-0002-T03
status: done
priority: high
state_hub_task_id: "509ccfd4-1775-4be4-b8e4-8d5bcf17f91e"
```

Define initialization, unseal, root-token retirement, operator access, emergency access, backup escrow, and recovery drill. Ensure the design does not introduce an unmanaged "secret zero" worse than the current SOPS/age bootstrap.

**2026-05-17:** Initial ceremony documented in `docs/openbao.md`. Still needs human escrow assignment, root-token retirement details, and a restore/recovery drill before live secrets move into OpenBao.

**2026-05-23:** Added non-secret bootstrap support: `make openbao-verify`, `make openbao-verify-post-unseal`, `make openbao-configure-initial`, `scripts/openbao-verify.sh`, `scripts/openbao-apply-initial-config.sh`, and initial platform policies under `openbao/policies/`. `docs/openbao.md` now spells out pre-flight checks, escrow handling, root-token retirement, and the post-unseal initial configuration path. The actual initialization/unseal ceremony remains gated on named human escrow recipients and must not happen in a casual agent shell.

**2026-05-24:** Revised the custody model: `tegwick` (`bernd.worsch@gmail.com`, Gitea `tegwick`) is the setup operator/contact, not the long-term platform root of trust. The OpenBao ceremony is now gated on a separate NetKingdom king credential and guided bootstrap path. T03 remains `in_progress`: the live OpenBao init/unseal ceremony is still gated on king credential creation, custody mode approval, root-token disposition, reset/rotation, and restore-drill execution.

**2026-05-26:** Live OpenBao is now initialized, unsealed, and post-unseal verified on Railiance01. NetKingdom bootstrap metadata records custody approval, root-token revocation, unseal-key rotation, and restore-drill confirmation. T03 remains `in_progress` for production-trust closeout: declarative audit, durable audit shipping, OIDC-backed admin login verification, residual taint response, and cleanup before live application secrets move in. These remaining operator-facing gates are consolidated in `NET-WP-0017`.

**2026-05-29:** Railiance-owned bootstrap and break-glass scope is complete: `make openbao-status` and `make openbao-verify-post-unseal` pass against the live Railiance01 OpenBao pod, which is initialized, unsealed, and active with Bound data/audit PVCs. The production-trust gates that remain before ordinary user onboarding or live application secrets move into OpenBao are now explicitly owned by `NET-WP-0017`: declarative/durable audit closeout, OIDC-backed admin login evidence, residual taint cleanup, and hardening.

### T04 - Auth Methods And Workload Integration

```task
id: RAIL-PL-WP-0002-T04
status: done
priority: high
state_hub_task_id: "ca2b3ac2-b522-4445-a418-c6ec312cd5f4"
```

Configure or document auth methods for Kubernetes workloads, NetKingdom identity, admins, agents, and automations. Decide when workloads use OpenBao directly, CSI-mounted secrets, External Secrets Operator, or sidecars/controllers.

**2026-05-23:** Documented the auth and delivery model in `docs/openbao.md`. Bootstrap uses the one-time root token only for initial setup; platform operators use a non-root `platform-admin` token until NetKingdom OIDC/admin integration is ready; reviewers use `platform-readonly`; workloads use Kubernetes auth with namespace/service-account-bound policies.
External Secrets Operator is preferred for Helm-compatible Kubernetes Secrets, CSI is reserved for mounted-file delivery and refresh-sensitive workloads, and the OpenBao injector remains disabled.

### T05 - Secret Engines And Dynamic Credentials

```task
id: RAIL-PL-WP-0002-T05
status: done
priority: medium
state_hub_task_id: "0d717bdd-76bc-41b4-b633-ba07214b4095"
```

Enable and document the initial secret engines: KV v2 for platform configuration, database dynamic credentials for CNPG-managed PostgreSQL, Kubernetes token generation where appropriate, PKI/SSH future paths, and an assessment of object-storage credential vending integration with NK-WP-0007.

**2026-05-17:** Object-storage credential vending assessment started and documented in `docs/openbao.md`. Existing `artifact-store` capabilities cover artifact package preservation, an S3-compatible backend, env/file secret refs, and `artifactstore storage verify --backend s3`. Railiance S3 should use OpenBao for bootstrap custody, policy, audit, break-glass, and workload secret delivery, while `artifact-store` owns S3 backend behavior and `ARTIFACT-STORE-WP-0007` owns MinIO/fork compatibility plus temporary credential refresh decisions. NetKingdom remains the default owner for OIDC identity if object storage adopts `AssumeRoleWithWebIdentity`.

**2026-05-29:** Initial secret-engine scope is complete for this workplan: OpenBao has the `platform/` KV path and Kubernetes auth configured through the initial configuration helper, with `platform-admin` and `platform-readonly` policies present. Database dynamic credentials, PKI, SSH, and object-storage STS vending remain future integration work owned by their downstream service workplans and `ARTIFACT-STORE-WP-0007`; they are not blockers for the platform secrets service closeout.

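
The closeout state in the 2026-05-29 note (a `platform/` KV mount, Kubernetes auth, and the two platform policies) lends itself to a small non-mutating check. The sketch below is illustrative only and is not the repository's `scripts/openbao-apply-initial-config.sh`. It assumes an operator has already captured JSON listings, for example from `bao secrets list -format=json`, `bao auth list -format=json`, and `bao policy list -format=json`, and that mounts arrive as an object keyed by path while policies arrive as an array of names. All function and constant names are hypothetical.

```python
import json

# Expected names are taken from the workplan text above.
EXPECTED_SECRET_MOUNTS = {"platform/"}
EXPECTED_AUTH_MOUNTS = {"kubernetes/"}
EXPECTED_POLICIES = {"platform-admin", "platform-readonly"}


def names_from_listing(raw_json: str) -> set[str]:
    """Return mount paths or policy names from a captured JSON listing.

    Assumption: mount listings are JSON objects keyed by path, and the
    policy listing is a JSON array of policy names.
    """
    data = json.loads(raw_json)
    if isinstance(data, dict):
        return set(data.keys())
    if isinstance(data, list):
        return {str(item) for item in data}
    raise ValueError("unexpected listing shape")


def missing_items(expected: set[str], raw_json: str) -> list[str]:
    """Return the expected names that are absent from the listing, sorted."""
    return sorted(expected - names_from_listing(raw_json))
```

An empty list from `missing_items` means every expected name was found. The helper only reads text that was already captured, so no token passes through it.
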
### T06 - Backup, Audit, Monitoring, And Verification

```task
id: RAIL-PL-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "cd61bc7d-8b9f-484f-97bd-7254c227b0ee"
```

Define backup/restore procedure, audit device configuration, metrics, logs, health checks, restore drill, and smoke tests. Include a developer/operator verification script for the deployed service.

**2026-05-23:** Documented audit, Raft snapshot, encrypted snapshot custody, isolated restore drill, durable audit-log shipping, and monitoring baseline in `docs/openbao.md`. Added `scripts/openbao-verify.sh` plus Make targets for basic and post-unseal verification. The restore drill still must be executed before any live application secrets are migrated; that remains a gate under T03.

**2026-05-26:** `make openbao-verify-post-unseal` passes against the live OpenBao pod: Kubernetes objects exist, the pod is running, OpenBao reports `Initialized: true` and `Sealed: false`, and data/audit directories exist. Authenticated checks for audit devices, auth methods, and mounts still require the OIDC-backed or temporary platform-admin path and remain part of the production-readiness closeout.

**2026-06-01:** Added the source-side declarative file-audit configuration required by `NET-WP-0017-T02`: `helm/openbao-values.yaml` now includes an OpenBao `audit "file" "file"` stanza writing to `/openbao/audit/openbao-audit.log`, and `scripts/openbao-apply-initial-config.sh` now verifies audit visibility with `bao audit list` instead of attempting API-managed audit creation. The post-unseal verifier now warns when the audit log file is missing or empty. Live verification still reports the pod unsealed and healthy, but also reports the audit log file missing because this Helm change has not yet been rolled out. Roll out only in an attended window with unseal shares available.

**2026-06-01:** Rolled out the declarative audit configuration to the live Railiance01 OpenBao release in an attended window.
Because the StatefulSet uses `OnDelete`, the pod was explicitly recycled after the Helm values upgrade and then unsealed by the operator. Post-unseal verification now reports OpenBao `2.5.4`, `Sealed: false`, the audit directory present, and `/openbao/audit/openbao-audit.log` present and non-empty. The source values now pin the live OpenBao image tag to `2.5.4`; Helm release revision 3 has the same explicit tag and the pod remained ready, so future chart upgrades do not implicitly change the runtime version while applying unrelated configuration.

**2026-06-01:** Added `make openbao-verify-authenticated` as a non-mutating operator proof for the remaining OpenBao readiness checks that require an approved token. The helper prompts for the token without echoing it, verifies `file/` audit visibility, `platform/` secrets, `kubernetes/` and `keycape/` auth methods, and confirms the audit log file is non-empty. It can also use an already-valid pod token helper via `OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper` so the token does not move through the local shell at all. Durable audit shipping beyond the audit PVC remains intentionally open until a tested sink is selected; State Hub notes and hashes are evidence, not retained audit custody.

**2026-06-01:** Ran the authenticated verifier against the live pod token helper immediately after a fresh `bao login -no-print -method=oidc -path=keycape role=platform-admin` browser/MFA flow. The verifier passed: OpenBao is unsealed on `2.5.4`, `bao audit list` shows `file/`, `bao secrets list` shows `platform/`, `bao auth list` shows `kubernetes/` and `keycape/`, and `/openbao/audit/openbao-audit.log` grew from 7969 bytes to 23330 bytes during the check. No token value was printed or copied into the workplan. The cached verifier token was then revoked with `bao token revoke -self`.

**2026-06-01:** Durable tenant-aware audit retention is now a separate `audit-core` product/repo instead of a Railiance OpenBao bootstrap subtask.
The initial Audit Core mock backend writes JSONL events under `/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and removes files older than seven days; it is suitable for interface wiring and setup validation only. Railiance still owns the OpenBao file audit device and PVC, while production retention, tenant policy, and tamper-evident archive belong to Audit Core.

**2026-06-01:** Added a non-secret OpenBao restore-drill evidence template and `make openbao-validate-restore-evidence`. The validator requires concrete review evidence such as snapshot hashes, encrypted snapshot location, isolated restore completion, unseal/status/test-secret verification, isolated environment destruction, and a `no_secret_material_recorded` assertion. This keeps `NET-WP-0017-T02` from relying on a bare UI checkbox for restore proof.

**2026-06-01:** Added the matching non-secret emergency seal/unseal drill evidence template and `make openbao-validate-emergency-evidence`. The validator requires an attended seal/unseal evidence file with timing, sealed-state proof, unseal quorum availability, post-unseal verification, availability-window duration, and `no_secret_material_recorded`. The validator does not run the disruptive drill; it only checks the evidence captured after the attended operation.

**2026-06-02:** Hardened both evidence validators so unchanged templates or obvious placeholder values cannot accidentally satisfy NetKingdom T02. Restore evidence now rejects placeholder digests and template wording, while emergency drill evidence rejects template wording. Operators must copy the examples into local evidence files and replace placeholders with real non-secret drill evidence before validation can pass.

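
The placeholder-rejection behaviour described for the restore-evidence validator can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind `make openbao-validate-restore-evidence`: the field names, the placeholder word list, the SHA-256 digest format, and the sample location string are all hypothetical, and only `no_secret_material_recorded` is taken from the text above.

```python
import hashlib
import re

# Hypothetical field names; the real evidence template may differ.
REQUIRED_FIELDS = (
    "snapshot_sha256",
    "encrypted_snapshot_location",
    "isolated_restore_completed",
    "post_restore_verification",
    "isolated_environment_destroyed",
    "no_secret_material_recorded",
)
# Assumed markers of an unchanged template.
PLACEHOLDER_WORDS = ("todo", "tbd", "replace-me", "example", "placeholder")
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")


def validate_restore_evidence(evidence: dict) -> list[str]:
    """Return a list of problems; an empty list means the evidence passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in evidence:
            problems.append(f"missing field: {field}")
    for field, value in evidence.items():
        if isinstance(value, str) and any(
            word in value.lower() for word in PLACEHOLDER_WORDS
        ):
            problems.append(f"placeholder value in: {field}")
    digest = evidence.get("snapshot_sha256", "")
    if isinstance(digest, str) and digest:
        if not SHA256_RE.match(digest):
            problems.append("snapshot_sha256 is not a 64-character hex digest")
        elif len(set(digest)) == 1:
            # A digest made of one repeated character is treated as a placeholder.
            problems.append("snapshot_sha256 looks like a placeholder digest")
    if evidence.get("no_secret_material_recorded") is not True:
        problems.append("no_secret_material_recorded must be true")
    return problems


# Illustrative evidence only; the digest is computed from dummy bytes.
SAMPLE_EVIDENCE = {
    "snapshot_sha256": hashlib.sha256(b"illustrative snapshot bytes").hexdigest(),
    "encrypted_snapshot_location": "offline-escrow/restore-drill.snap.age",
    "isolated_restore_completed": True,
    "post_restore_verification": "unseal, status, and test-secret read verified",
    "isolated_environment_destroyed": True,
    "no_secret_material_recorded": True,
}
```

Returning a list of problems rather than a single boolean lets a reviewer see every gap in one pass, which suits evidence files that are filled in by hand after an attended drill.
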
### T07 - Cross-Repo Transition Tasks

```task
id: RAIL-PL-WP-0002-T07
status: done
priority: medium
state_hub_task_id: "89149b60-562b-4a5b-978d-0f9136ffa114"
```

Create or link follow-up tasks for NetKingdom, ops-warden, ops-bridge, artifact-store, and S5 applications where documentation or integration must move from HashiCorp Vault-specific assumptions to OpenBao-first or Vault-compatible abstraction language.

**2026-05-17:** Started cross-repo transition by updating `net-kingdom/docs/platform-identity-security-architecture.md` and `net-kingdom/SCOPE.md` so NetKingdom treats OpenBao as the runtime platform secrets authority while SOPS/age remains bootstrap/Git-at-rest protection. Still needs ops-warden, ops-bridge, artifact-store, S5 app, and stale HashiCorp Vault wording follow-ups.

**2026-05-24:** Updated NetKingdom custody linkage: `net-kingdom/docs/platform-root-custody.md`, `NET-WP-0015`, and `NET-WP-0016` now define `tegwick` as setup operator/contact and a separate king credential as the platform-root custody target for OpenBao.

**2026-05-17:** Linked the artifact-store transition to `ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS Credential Vending` instead of creating duplicate S3 backend work in `railiance-platform`. The OpenBao side of the handoff is now documented in `docs/openbao.md`; remaining artifact-store work belongs in `ARTIFACT-STORE-WP-0007-T004` and follow-up routing in `ARTIFACT-STORE-WP-0007-T005`.

**2026-05-29:** Cross-repo transition ownership is explicit enough for Railiance closeout. NetKingdom owns the remaining identity, OIDC admin login, operator UX, hardening, and onboarding-readiness gates through `NET-WP-0017`. Artifact-store owns S3-compatible backend and credential-vending decisions through `ARTIFACT-STORE-WP-0007`. Future application-specific OpenBao adoption belongs with the relevant S5/application workplans once user onboarding is unblocked.

## Acceptance Criteria

- Railiance has an explicit decision on OpenBao versus HashiCorp Vault for platform secrets management.
- OpenBao deployment topology is defined for the S3 platform-services layer.
- Bootstrap, unseal, backup, restore, audit, and break-glass procedures are documented before live secrets are migrated.
- Integration choices are clear for Kubernetes workloads, NetKingdom identity, dynamic database credentials, and future object-storage STS credential vending.
- SOPS/age remains the bootstrap Git-at-rest mechanism unless a later ADR deliberately replaces it.