id: RAIL-PL-WP-0002
type: workplan
title: OpenBao Platform Secrets Service
domain: railiance
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 2
created: 2026-05-17
updated: 2026-05-29
depends_on: RAIL-PL-WP-0001
state_hub_workstream_id: fd1c045a-01d4-43be-980f-acbda6c64e6c

RAIL-PL-WP-0002 - OpenBao Platform Secrets Service

Goal

Establish OpenBao as the canonical secrets service for the Railiance S3 platform-services layer, or define a controlled transition path from existing HashiCorp Vault assumptions to OpenBao.

This workplan belongs in railiance-platform because S3 owns shared platform services: secret management, identity integration, object storage, backups, and other services consumed by S5 applications.

Context

OpenBao is an open-source, Linux Foundation-governed fork of Vault for managing, storing, and distributing secrets, certificates, and keys. The official OpenBao documentation includes Kubernetes deployment via Helm, CSI provider support, dynamic database secrets, Kubernetes service account token generation, and lease/revocation semantics.

Current local architecture references still mention HashiCorp Vault in several places, especially credential bootstrap and ops-warden's Vault SSH backend. Railiance also uses SOPS/age for Git-at-rest secrets. The platform needs an explicit decision and migration path so "Vault" does not remain an accidental brand-specific dependency where "secrets manager" is what we really mean.

Scope

In scope:

  • decide whether OpenBao is the canonical Railiance platform secrets service
  • define deployment topology for OpenBao on the Railiance Kubernetes platform
  • define auth methods for workloads, operators, and automations
  • define secret engines for KV, database dynamic secrets, Kubernetes tokens, PKI/certificates, and future object-storage credential vending integrations
  • define CSI provider and/or External Secrets Operator integration
  • define unseal, backup, restore, break-glass, audit, and monitoring procedures
  • identify NetKingdom documentation and workplan updates needed to replace HashiCorp Vault-specific language with OpenBao-first language

Out of scope:

  • replacing SOPS/age for Git-at-rest bootstrap secrets
  • changing S1/S2 cluster runtime configuration without coordination
  • rewriting ops-warden's SSH certificate backend in this workplan
  • implementing application-specific secrets in S5

Tasks

T01 - OpenBao Decision And Migration Inventory

id: RAIL-PL-WP-0002-T01
status: done
priority: high
state_hub_task_id: "e997ffe0-6b61-4242-b585-f271e9b75e99"

Inventory current HashiCorp Vault assumptions across NetKingdom, ops-warden, Railiance, and application runbooks. Decide whether Railiance standardizes on OpenBao, keeps Vault-compatible abstraction language, or supports both for a transition period.

2026-05-17: Decision recorded in State Hub: a0df816c-3749-4418-9c8b-28eb428be953. Railiance S3 standardizes on OpenBao as the runtime platform secrets service. SOPS/age remains the Git-at-rest bootstrap mechanism.
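
The inventory step can be made repeatable with a small scan for brand-specific wording. The following is a minimal sketch, not part of the Railiance tooling: the pattern, the file suffixes, and the helper names `scan_text` and `find_vault_mentions` are all assumptions chosen for illustration, and every hit still needs human review to decide whether it means HashiCorp Vault specifically or "secrets manager" in general.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple

# Hypothetical pattern: case-sensitive brand wording worth reviewing by hand.
VAULT_PATTERN = re.compile(r"\b(HashiCorp\s+Vault|Vault)\b")


class Mention(NamedTuple):
    path: str
    line_no: int
    text: str


def scan_text(path: str, text: str) -> Iterator[Mention]:
    """Yield one Mention per line that names Vault explicitly."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        if VAULT_PATTERN.search(line):
            yield Mention(path, line_no, line.strip())


def find_vault_mentions(root: str, suffixes=(".md", ".yaml", ".yml", ".sh")) -> list[Mention]:
    """Walk a checkout and collect candidate Vault-specific references."""
    hits: list[Mention] = []
    for file in sorted(Path(root).rglob("*")):
        if file.is_file() and file.suffix in suffixes:
            hits.extend(scan_text(str(file), file.read_text(errors="ignore")))
    return hits
```

Running `find_vault_mentions` against each repository checkout would produce a reviewable list of paths and line numbers to attach to the follow-up tasks.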

T02 - Kubernetes Deployment Design

id: RAIL-PL-WP-0002-T02
status: done
priority: high
state_hub_task_id: "fb6ac85d-e77f-400d-8342-70a0ec6e82ef"

Design the OpenBao Helm deployment for Railiance: namespace, storage backend, HA posture, ingress/internal service exposure, TLS, resource limits, PodDisruptionBudget, NetworkPolicies, and upgrade/rollback strategy.

2026-05-17: Implemented helm/openbao-values.yaml, Make targets, and docs/openbao.md. Deployed chart openbao/openbao 0.28.2 (app v2.5.3) to Railiance01 namespace openbao as internal-only, single-replica Raft with data/audit PVCs. Public ingress remains disabled; OpenBao is intentionally uninitialized and sealed until the bootstrap ceremony.

T03 - Bootstrap, Unseal, And Break-Glass Procedure

id: RAIL-PL-WP-0002-T03
status: done
priority: high
state_hub_task_id: "509ccfd4-1775-4be4-b8e4-8d5bcf17f91e"

Define initialization, unseal, root-token retirement, operator access, emergency access, backup escrow, and recovery drill. Ensure the design does not introduce an unmanaged "secret zero" worse than the current SOPS/age bootstrap.

2026-05-17: Initial ceremony documented in docs/openbao.md. Still needs human escrow assignment, root-token retirement details, and a restore/recovery drill before live secrets move into OpenBao.

2026-05-23: Added non-secret bootstrap support: make openbao-verify, make openbao-verify-post-unseal, make openbao-configure-initial, scripts/openbao-verify.sh, scripts/openbao-apply-initial-config.sh, and initial platform policies under openbao/policies/. docs/openbao.md now spells out pre-flight checks, escrow handling, root-token retirement, and the post-unseal initial configuration path. The actual initialization/unseal ceremony remains gated on named human escrow recipients and must not happen in a casual agent shell.

2026-05-24: Revised the custody model: tegwick (bernd.worsch@gmail.com, Gitea tegwick) is the setup operator/contact, not the long-term platform root of trust. The OpenBao ceremony is now gated on a separate NetKingdom king credential and guided bootstrap path. T03 remains in_progress: the live OpenBao init/unseal ceremony is still gated on king credential creation, custody mode approval, root-token disposition, reset/rotation, and restore-drill execution.

2026-05-26: Live OpenBao is now initialized, unsealed, and post-unseal verified on Railiance01. NetKingdom bootstrap metadata records custody approval, root-token revocation, unseal-key rotation, and restore-drill confirmation. T03 remains in_progress for production-trust closeout: declarative audit, durable audit shipping, OIDC-backed admin login verification, residual taint response, and cleanup before live application secrets move in. These remaining operator-facing gates are consolidated in NET-WP-0017.

2026-05-29: Railiance-owned bootstrap and break-glass scope is complete: make openbao-status and make openbao-verify-post-unseal pass against the live Railiance01 OpenBao pod, which is initialized, unsealed, and active with Bound data/audit PVCs. The production-trust gates that remain before ordinary user onboarding or live application secrets move into OpenBao are now explicitly owned by NET-WP-0017: declarative/durable audit closeout, OIDC-backed admin login evidence, residual taint cleanup, and hardening.
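
The post-unseal gate described above comes down to two fields of the server status. As a minimal sketch, assuming the status is captured as JSON (for example from `bao status -format=json`), the check could look like the following. The function name `check_post_unseal` is hypothetical, and the real scripts/openbao-verify.sh may perform the check differently.

```python
import json


def check_post_unseal(status_json: str) -> list[str]:
    """Return a list of problems; an empty list means the gate passes."""
    status = json.loads(status_json)
    problems = []
    # Require explicit booleans so a missing field counts as a failure.
    if status.get("initialized") is not True:
        problems.append("OpenBao is not initialized")
    if status.get("sealed") is not False:
        problems.append("OpenBao is sealed")
    return problems
```

Treating a missing field as a failure keeps the check conservative: truncated or unexpected output blocks the gate instead of passing it.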

T04 - Auth Methods And Workload Integration

id: RAIL-PL-WP-0002-T04
status: done
priority: high
state_hub_task_id: "ca2b3ac2-b522-4445-a418-c6ec312cd5f4"

Configure or document auth methods for Kubernetes workloads, NetKingdom identity, admins, agents, and automations. Decide when workloads use OpenBao directly, CSI-mounted secrets, External Secrets Operator, or sidecars/controllers.

2026-05-23: Documented the auth and delivery model in docs/openbao.md. Bootstrap uses the one-time root token only for initial setup; platform operators use a non-root platform-admin token until NetKingdom OIDC/admin integration is ready; reviewers use platform-readonly; workloads use Kubernetes auth with namespace/service-account-bound policies. External Secrets Operator is preferred for Helm-compatible Kubernetes Secrets, CSI is reserved for mounted-file delivery and refresh-sensitive workloads, and the OpenBao injector remains disabled.
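
For workloads that talk to OpenBao directly, Kubernetes auth exchanges the pod's service-account token for a short-lived client token. The sketch below only builds the login request and parses a response; it sends nothing. It assumes the auth method is mounted at the default `kubernetes` path, and the role name and server address used in the usage note are hypothetical placeholders, not values from the Railiance configuration.

```python
import json
import urllib.request

# Default location of the service-account token inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(addr: str, role: str, jwt: str,
                        mount: str = "kubernetes") -> urllib.request.Request:
    """Build (but do not send) the Kubernetes auth login request."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        url=f"{addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_client_token(response_body: bytes) -> str:
    """Pull the client token out of a successful login response."""
    return json.loads(response_body)["auth"]["client_token"]
```

A workload would read the JWT from `SA_TOKEN_PATH`, call `build_login_request("https://openbao.example.internal:8200", "demo-app", jwt)`, send it with `urllib.request.urlopen`, and pass the response body to `extract_client_token`. The policies attached to the returned token are whatever the role's namespace and service-account binding grants.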

T05 - Secret Engines And Dynamic Credentials

id: RAIL-PL-WP-0002-T05
status: done
priority: medium
state_hub_task_id: "0d717bdd-76bc-41b4-b633-ba07214b4095"

Enable and document the initial secret engines: KV v2 for platform configuration, database dynamic credentials for CNPG-managed PostgreSQL, Kubernetes token generation where appropriate, future PKI and SSH paths, and an assessment of object-storage credential vending integration with NK-WP-0007.

2026-05-17: Object-storage credential vending assessment started and documented in docs/openbao.md. Existing artifact-store capabilities cover artifact package preservation, an S3-compatible backend, env/file secret refs, and artifactstore storage verify --backend s3. Railiance S3 should use OpenBao for bootstrap custody, policy, audit, break-glass, and workload secret delivery, while artifact-store owns S3 backend behavior and ARTIFACT-STORE-WP-0007 owns MinIO/fork compatibility plus temporary credential refresh decisions. NetKingdom remains the default owner for OIDC identity if object storage adopts AssumeRoleWithWebIdentity.

2026-05-29: Initial secret-engine scope is complete for this workplan: OpenBao has the platform/ KV path and Kubernetes auth configured through the initial configuration helper, with platform-admin and platform-readonly policies present. Database dynamic credentials, PKI, SSH, and object-storage STS vending remain future integration work owned by their downstream service workplans and ARTIFACT-STORE-WP-0007; they are not blockers for the platform secrets service closeout.
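
Consumers of the platform/ KV path should keep in mind that KV v2 inserts a `data/` segment into the HTTP API path and wraps the secret payload in a nested `data.data` object. A minimal sketch of both conventions follows. The helper names are hypothetical, and the secret path `demo/config` in the usage note is a placeholder rather than a real Railiance secret.

```python
import json


def kv2_data_path(mount: str, secret_path: str) -> str:
    """Return the HTTP API path for reading or writing a KV v2 secret."""
    return f"/v1/{mount.strip('/')}/data/{secret_path.strip('/')}"


def kv2_extract(response_body: str) -> dict:
    """KV v2 nests the payload under data.data, next to data.metadata."""
    return json.loads(response_body)["data"]["data"]
```

For example, `kv2_data_path("platform", "demo/config")` yields `/v1/platform/data/demo/config`, which is the path a policy rule must cover, not `platform/demo/config`.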

T06 - Backup, Audit, Monitoring, And Verification

id: RAIL-PL-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "cd61bc7d-8b9f-484f-97bd-7254c227b0ee"

Define backup/restore procedure, audit device configuration, metrics, logs, health checks, restore drill, and smoke tests. Include a developer/operator verification script for the deployed service.

2026-05-23: Documented audit, Raft snapshot, encrypted snapshot custody, isolated restore drill, durable audit-log shipping, and monitoring baseline in docs/openbao.md. Added scripts/openbao-verify.sh plus Make targets for basic and post-unseal verification. The restore drill still must be executed before any live application secrets are migrated; that remains a gate under T03.

2026-05-26: make openbao-verify-post-unseal passes against the live OpenBao pod: Kubernetes objects exist, the pod is running, OpenBao reports Initialized: true and Sealed: false, and data/audit directories exist. Authenticated checks for audit devices, auth methods, and mounts still require the OIDC-backed or temporary platform-admin path and remain part of the production-readiness closeout.

2026-06-01: Added the source-side declarative file-audit configuration required by NET-WP-0017-T02: helm/openbao-values.yaml now includes an OpenBao audit "file" "file" stanza writing to /openbao/audit/openbao-audit.log, and scripts/openbao-apply-initial-config.sh now verifies audit visibility with bao audit list instead of attempting API-managed audit creation. The post-unseal verifier now warns when the audit log file is missing or empty. Live verification still reports the pod unsealed and healthy, but also reports the audit log file missing because this Helm change has not yet been rolled out. Roll out only in an attended window with unseal shares available.
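
The missing-or-empty audit log warning can be expressed as a three-state check. This is an illustrative sketch, not the logic in scripts/openbao-verify.sh: the function name `audit_log_state` is an assumption, and only the log path is taken from the Helm stanza described above.

```python
from pathlib import Path

# Path taken from the file audit stanza; adjust if the mount differs.
AUDIT_LOG = "/openbao/audit/openbao-audit.log"


def audit_log_state(path: str = AUDIT_LOG) -> str:
    """Classify the audit log as 'missing', 'empty', or 'ok'."""
    log = Path(path)
    if not log.is_file():
        return "missing"
    if log.stat().st_size == 0:
        return "empty"
    return "ok"
```

Before the Helm rollout the check would be expected to report `missing`. After rollout, `empty` would indicate that the device is declared but no request has been audited yet.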

T07 - Cross-Repo Transition Tasks

id: RAIL-PL-WP-0002-T07
status: done
priority: medium
state_hub_task_id: "89149b60-562b-4a5b-978d-0f9136ffa114"

Create or link follow-up tasks for NetKingdom, ops-warden, ops-bridge, artifact-store, and S5 applications where documentation or integration must move from HashiCorp Vault-specific assumptions to OpenBao-first or Vault-compatible abstraction language.

2026-05-17: Started cross-repo transition by updating net-kingdom/docs/platform-identity-security-architecture.md and net-kingdom/SCOPE.md so NetKingdom treats OpenBao as the runtime platform secrets authority while SOPS/age remains bootstrap/Git-at-rest protection. Still needs ops-warden, ops-bridge, artifact-store, S5 app, and stale HashiCorp Vault wording follow-ups.

2026-05-17: Linked the artifact-store transition to ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS Credential Vending instead of creating duplicate S3 backend work in railiance-platform. The OpenBao side of the handoff is now documented in docs/openbao.md; remaining artifact-store work belongs in ARTIFACT-STORE-WP-0007-T004 and follow-up routing in ARTIFACT-STORE-WP-0007-T005.

2026-05-24: Updated NetKingdom custody linkage: net-kingdom/docs/platform-root-custody.md, NET-WP-0015, and NET-WP-0016 now define tegwick as setup operator/contact and a separate king credential as the platform-root custody target for OpenBao.

2026-05-29: Cross-repo transition ownership is explicit enough for Railiance closeout. NetKingdom owns the remaining identity, OIDC admin login, operator UX, hardening, and onboarding-readiness gates through NET-WP-0017. Artifact-store owns S3-compatible backend and credential-vending decisions through ARTIFACT-STORE-WP-0007. Future application-specific OpenBao adoption belongs with the relevant S5/application workplans once user onboarding is unblocked.

Acceptance Criteria

  • Railiance has an explicit decision on OpenBao versus HashiCorp Vault for platform secrets management.
  • OpenBao deployment topology is defined for the S3 platform-services layer.
  • Bootstrap, unseal, backup, restore, audit, and break-glass procedures are documented before live secrets are migrated.
  • Integration choices are clear for Kubernetes workloads, NetKingdom identity, dynamic database credentials, and future object-storage STS credential vending.
  • SOPS/age remains the bootstrap Git-at-rest mechanism unless a later ADR deliberately replaces it.