---
id: RAIL-PL-WP-0002
type: workplan
title: "OpenBao Platform Secrets Service"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 2
created: "2026-05-17"
updated: "2026-05-29"
depends_on:
  - RAIL-PL-WP-0001
state_hub_workstream_id: "fd1c045a-01d4-43be-980f-acbda6c64e6c"
---

# RAIL-PL-WP-0002 - OpenBao Platform Secrets Service

## Goal

Establish OpenBao as the canonical Railiance S3 platform secrets service,
or define a controlled transition path from existing HashiCorp Vault
assumptions to OpenBao.

This workplan belongs in `railiance-platform` because S3 owns shared
platform services: secret management, identity integration, object
storage, backups, and other services consumed by S5 applications.

## Context

OpenBao is an open-source, Linux Foundation-governed fork of Vault for
managing, storing, and distributing secrets, certificates, and keys.
The official OpenBao documentation includes Kubernetes deployment via
Helm, CSI provider support, dynamic database secrets, Kubernetes service
account token generation, and lease/revocation semantics.

Current local architecture references still mention HashiCorp Vault in
several places, especially credential bootstrap and ops-warden's Vault
SSH backend. Railiance also uses SOPS/age for Git-at-rest secrets. The
platform needs an explicit decision and migration path so "Vault" does
not remain an accidental brand-specific dependency where "secrets
manager" is what we really mean.

## Scope

In scope:

- decide whether OpenBao is the canonical Railiance platform secrets
  service
- define deployment topology for OpenBao on the Railiance Kubernetes
  platform
- define auth methods for workloads, operators, and automations
- define secret engines for KV, database dynamic secrets, Kubernetes
  tokens, PKI/certificates, and future object-storage credential
  vending integrations
- define CSI provider and/or External Secrets Operator integration
- define unseal, backup, restore, break-glass, audit, and monitoring
  procedures
- identify NetKingdom documentation and workplan updates needed to
  replace HashiCorp Vault-specific language with OpenBao-first language

Out of scope:

- replacing SOPS/age for Git-at-rest bootstrap secrets
- changing S1/S2 cluster runtime configuration without coordination
- rewriting ops-warden's SSH certificate backend in this workplan
- implementing application-specific secrets in S5

## Tasks

### T01 - OpenBao Decision And Migration Inventory

```task
id: RAIL-PL-WP-0002-T01
status: done
priority: high
state_hub_task_id: "e997ffe0-6b61-4242-b585-f271e9b75e99"
```

Inventory current HashiCorp Vault assumptions across NetKingdom,
ops-warden, Railiance, and application runbooks. Decide whether
Railiance standardizes on OpenBao, keeps Vault-compatible abstraction
language, or supports both for a transition period.

**2026-05-17:** Decision recorded in State Hub:
`a0df816c-3749-4418-9c8b-28eb428be953`. Railiance S3 standardizes on
OpenBao as the runtime platform secrets service. SOPS/age remains the
Git-at-rest bootstrap mechanism.

### T02 - Kubernetes Deployment Design

```task
id: RAIL-PL-WP-0002-T02
status: done
priority: high
state_hub_task_id: "fb6ac85d-e77f-400d-8342-70a0ec6e82ef"
```

Design the OpenBao Helm deployment for Railiance: namespace, storage
backend, HA posture, ingress/internal service exposure, TLS, resource
limits, PodDisruptionBudget, NetworkPolicies, and upgrade/rollback
strategy.

**2026-05-17:** Implemented `helm/openbao-values.yaml`, Make targets, and
`docs/openbao.md`. Deployed chart `openbao/openbao` `0.28.2` (app
`v2.5.3`) to Railiance01 namespace `openbao` as internal-only,
single-replica Raft with data/audit PVCs. Public ingress remains disabled;
OpenBao is intentionally uninitialized and sealed until the bootstrap
ceremony.

### T03 - Bootstrap, Unseal, And Break-Glass Procedure

```task
id: RAIL-PL-WP-0002-T03
status: done
priority: high
state_hub_task_id: "509ccfd4-1775-4be4-b8e4-8d5bcf17f91e"
```

Define initialization, unseal, root-token retirement, operator access,
emergency access, backup escrow, and recovery drill. Ensure the design
does not introduce an unmanaged "secret zero" worse than the current
SOPS/age bootstrap.

**2026-05-17:** Initial ceremony documented in `docs/openbao.md`. Still
needs human escrow assignment, root-token retirement details, and a
restore/recovery drill before live secrets move into OpenBao.

**2026-05-23:** Added non-secret bootstrap support: `make openbao-verify`,
`make openbao-verify-post-unseal`, `make openbao-configure-initial`,
`scripts/openbao-verify.sh`, `scripts/openbao-apply-initial-config.sh`, and
initial platform policies under `openbao/policies/`. `docs/openbao.md` now
spells out pre-flight checks, escrow handling, root-token retirement, and the
post-unseal initial configuration path. The actual initialization/unseal
ceremony remains gated on named human escrow recipients and must not happen in
a casual agent shell.

**2026-05-24:** Revised the custody model: `tegwick`
(`bernd.worsch@gmail.com`, Gitea `tegwick`) is the setup operator/contact, not
the long-term platform root of trust. The OpenBao ceremony is now gated on a
separate NetKingdom king credential and guided bootstrap path. T03 remains
`in_progress`: the live OpenBao init/unseal ceremony is still gated on king
credential creation, custody mode approval, root-token disposition,
reset/rotation, and restore-drill execution.

**2026-05-26:** Live OpenBao is now initialized, unsealed, and post-unseal
verified on Railiance01. NetKingdom bootstrap metadata records custody approval,
root-token revocation, unseal-key rotation, and restore-drill confirmation.
T03 remains `in_progress` for production-trust closeout: declarative audit,
durable audit shipping, OIDC-backed admin login verification, residual taint
response, and cleanup before live application secrets move in. These remaining
operator-facing gates are consolidated in `NET-WP-0017`.

**2026-05-29:** Railiance-owned bootstrap and break-glass scope is complete:
`make openbao-status` and `make openbao-verify-post-unseal` pass against the
live Railiance01 OpenBao pod, which is initialized, unsealed, and active with
Bound data/audit PVCs. The production-trust gates that remain before ordinary
user onboarding or live application secrets move into OpenBao are now explicitly
owned by `NET-WP-0017`: declarative/durable audit closeout, OIDC-backed admin
login evidence, residual taint cleanup, and hardening.

### T04 - Auth Methods And Workload Integration

```task
id: RAIL-PL-WP-0002-T04
status: done
priority: high
state_hub_task_id: "ca2b3ac2-b522-4445-a418-c6ec312cd5f4"
```

Configure or document auth methods for Kubernetes workloads,
NetKingdom identity, admins, agents, and automations. Decide when
workloads use OpenBao directly, CSI-mounted secrets, External Secrets
Operator, or sidecars/controllers.

**2026-05-23:** Documented the auth and delivery model in `docs/openbao.md`.
Bootstrap uses the one-time root token only for initial setup; platform
operators use a non-root `platform-admin` token until NetKingdom OIDC/admin
integration is ready; reviewers use `platform-readonly`; workloads use
Kubernetes auth with namespace/service-account-bound policies. External
Secrets Operator is preferred for Helm-compatible Kubernetes Secrets, CSI is
reserved for mounted-file delivery and refresh-sensitive workloads, and the
OpenBao injector remains disabled.

### T05 - Secret Engines And Dynamic Credentials

```task
id: RAIL-PL-WP-0002-T05
status: done
priority: medium
state_hub_task_id: "0d717bdd-76bc-41b4-b633-ba07214b4095"
```

Enable and document the initial secret engines: KV v2 for platform
configuration, database dynamic credentials for CNPG-managed
PostgreSQL, Kubernetes token generation where appropriate, PKI/SSH
future paths, and an assessment of object-storage credential vending
integration with NK-WP-0007.

**2026-05-17:** Object-storage credential vending assessment started and
documented in `docs/openbao.md`. Existing `artifact-store` capabilities cover
artifact package preservation, an S3-compatible backend, env/file secret refs,
and `artifactstore storage verify --backend s3`. Railiance S3 should use
OpenBao for bootstrap custody, policy, audit, break-glass, and workload secret
delivery, while `artifact-store` owns S3 backend behavior and
`ARTIFACT-STORE-WP-0007` owns MinIO/fork compatibility plus temporary
credential refresh decisions. NetKingdom remains the default owner for OIDC
identity if object storage adopts `AssumeRoleWithWebIdentity`.

**2026-05-29:** Initial secret-engine scope is complete for this workplan:
OpenBao has the `platform/` KV path and Kubernetes auth configured through the
initial configuration helper, with `platform-admin` and `platform-readonly`
policies present. Database dynamic credentials, PKI, SSH, and object-storage
STS vending remain future integration work owned by their downstream service
workplans and `ARTIFACT-STORE-WP-0007`; they are not blockers for the platform
secrets service closeout.

### T06 - Backup, Audit, Monitoring, And Verification

```task
id: RAIL-PL-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "cd61bc7d-8b9f-484f-97bd-7254c227b0ee"
```

Define backup/restore procedure, audit device configuration, metrics,
logs, health checks, restore drill, and smoke tests. Include a
developer/operator verification script for the deployed service.

**2026-05-23:** Documented audit, Raft snapshot, encrypted snapshot custody,
isolated restore drill, durable audit-log shipping, and monitoring baseline in
`docs/openbao.md`. Added `scripts/openbao-verify.sh` plus Make targets for
basic and post-unseal verification. The restore drill still must be executed
before any live application secrets are migrated; that remains a gate under
T03.

**2026-05-26:** `make openbao-verify-post-unseal` passes against the live
OpenBao pod: Kubernetes objects exist, the pod is running, OpenBao reports
`Initialized: true` and `Sealed: false`, and data/audit directories exist.
Authenticated checks for audit devices, auth methods, and mounts still require
the OIDC-backed or temporary platform-admin path and remain part of the
production-readiness closeout.

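The `Initialized: true` / `Sealed: false` gate described above could be expressed roughly as follows. This is only a sketch, not the contents of `scripts/openbao-verify.sh`: it assumes the status is obtained as JSON (for example from `bao status -format=json`) and that the payload carries `initialized` and `sealed` keys, as Vault-compatible CLIs usually do. The function name and the sample payload are hypothetical.

```python
import json


def check_status(status_json: str) -> list[str]:
    """Return a list of failures; an empty list means the gate passes."""
    status = json.loads(status_json)
    failures = []
    # Require explicit booleans so a missing key counts as a failure.
    if status.get("initialized") is not True:
        failures.append("OpenBao is not initialized")
    if status.get("sealed") is not False:
        failures.append("OpenBao is sealed or seal state is unknown")
    return failures


# Hypothetical sample payload; a real one would come from the status
# command run inside the pod (assumed key names).
sample = '{"initialized": true, "sealed": false, "version": "2.5.4"}'
print(check_status(sample) or "ok")
```
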
**2026-06-01:** Added the source-side declarative file-audit configuration
required by `NET-WP-0017-T02`: `helm/openbao-values.yaml` now includes an
OpenBao `audit "file" "file"` stanza writing to
`/openbao/audit/openbao-audit.log`, and
`scripts/openbao-apply-initial-config.sh` now verifies audit visibility with
`bao audit list` instead of attempting API-managed audit creation. The
post-unseal verifier now warns when the audit log file is missing or empty.
Live verification still reports the pod unsealed and healthy, but also reports
the audit log file missing because this Helm change has not yet been rolled
out. Roll out only in an attended window with unseal shares available.

**2026-06-01:** Rolled out the declarative audit configuration to the live
Railiance01 OpenBao release in an attended window. Because the StatefulSet uses
`OnDelete`, the pod was explicitly recycled after the Helm values upgrade and
then unsealed by the operator. Post-unseal verification now reports OpenBao
`2.5.4`, `Sealed: false`, the audit directory present, and
`/openbao/audit/openbao-audit.log` present and non-empty. The source values now
pin the live OpenBao image tag to `2.5.4`; Helm release revision 3 has the same
explicit tag and the pod remained ready, so future chart upgrades do not
implicitly change the runtime version while applying unrelated configuration.

**2026-06-01:** Added `make openbao-verify-authenticated` as a non-mutating
operator proof for the remaining OpenBao readiness checks that require an
approved token. The helper prompts for the token without echoing it, verifies
`file/` audit visibility, `platform/` secrets, `kubernetes/` and `keycape/`
auth methods, and confirms the audit log file is non-empty. It can also use an
already-valid pod token helper via
`OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper` so the token does not move
through the local shell at all. Durable audit shipping beyond the audit PVC
remains intentionally open until a tested sink is selected; State Hub notes and
hashes are evidence, not retained audit custody.

**2026-06-01:** Ran the authenticated verifier against the live pod token
helper immediately after a fresh `bao login -no-print -method=oidc
-path=keycape role=platform-admin` browser/MFA flow. The verifier passed:
OpenBao is unsealed on `2.5.4`, `bao audit list` shows `file/`,
`bao secrets list` shows `platform/`, `bao auth list` shows `kubernetes/` and
`keycape/`, and `/openbao/audit/openbao-audit.log` grew from 7969 bytes to
23330 bytes during the check. No token value was printed or copied into the
workplan. The cached verifier token was then revoked with
`bao token revoke -self`.

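The "audit log grew during the check" observation can be reduced to a before/after size comparison. The sketch below illustrates that idea only; the real verifier's logic is not shown in this workplan, and the helper names are hypothetical. With the sizes recorded above, the log grew by 23330 - 7969 = 15361 bytes.

```python
import os
import tempfile


def file_size(path: str) -> int:
    """Return the file size in bytes, or 0 when the file is missing."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def audit_log_grew(size_before: int, size_after: int) -> bool:
    """True only if the log was already non-empty and strictly grew."""
    return size_before > 0 and size_after > size_before


# Sizes recorded in the 2026-06-01 entry above.
print(audit_log_grew(7969, 23330), 23330 - 7969)  # True 15361

# Self-contained demonstration against a temporary file that stands in
# for the audit log (the real path lives inside the pod).
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "openbao-audit.log")
    with open(log_path, "w", encoding="utf-8") as fh:
        fh.write('{"type": "request"}\n')
    before = file_size(log_path)
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write('{"type": "response"}\n')
    after = file_size(log_path)
    print(audit_log_grew(before, after))  # True
```
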
**2026-06-01:** Durable tenant-aware audit retention is now a separate
`audit-core` product/repo instead of a Railiance OpenBao bootstrap subtask. The
initial Audit Core mock backend writes JSONL events under
`/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and removes files older than seven
days; it is suitable for interface wiring and setup validation only. Railiance
still owns the OpenBao file audit device and PVC, while production retention,
tenant policy, and tamper-evident archive belong to Audit Core.

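The mock backend's behavior (hourly JSONL files plus a seven-day cleanup) can be pictured with the following sketch. It is not Audit Core's code: the function names are hypothetical, UTC timestamps are assumed, and file age is derived from the timestamp in the file name, which is one plausible reading of "older than seven days". The demonstration writes to a temporary directory rather than `/tmp/audit-core`.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION = timedelta(days=7)
NAME_FORMAT = "audit-%Y%m%dT%H.jsonl"  # matches audit-YYYYMMDDTHH.jsonl


def append_event(root: Path, event: dict, when: datetime) -> Path:
    """Append one JSON line to the hourly file for `when`."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / when.strftime(NAME_FORMAT)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return path


def prune(root: Path, now: datetime) -> list[str]:
    """Delete hourly files whose name timestamp is older than RETENTION."""
    removed = []
    for path in sorted(root.glob("audit-*.jsonl")):
        try:
            stamp = datetime.strptime(path.name, NAME_FORMAT)
        except ValueError:
            continue  # leave files that do not follow the naming scheme
        if now - stamp.replace(tzinfo=timezone.utc) > RETENTION:
            path.unlink()
            removed.append(path.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "audit-core"
    now = datetime(2026, 6, 10, 12, tzinfo=timezone.utc)
    append_event(root, {"action": "old"}, now - timedelta(days=8))
    append_event(root, {"action": "new"}, now)
    removed = prune(root, now)
    remaining = sorted(p.name for p in root.iterdir())
    print(removed, remaining)
```
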
**2026-06-01:** Added a non-secret OpenBao restore-drill evidence template and
`make openbao-validate-restore-evidence`. The validator requires concrete
review evidence such as snapshot hashes, encrypted snapshot location, isolated
restore completion, unseal/status/test-secret verification, isolated
environment destruction, and a `no_secret_material_recorded` assertion. This
keeps `NET-WP-0017-T02` from relying on a bare UI checkbox for restore proof.

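A validator of this kind might look like the sketch below. The evidence file format and field names are not given in this workplan, so everything except `no_secret_material_recorded` is a hypothetical stand-in (a flat mapping with one SHA-256 digest, a location string, and boolean confirmations); the actual Make target may check different or additional fields.

```python
import re

# Hypothetical boolean confirmations, one per requirement listed above.
REQUIRED_TRUE = (
    "isolated_restore_completed",
    "unseal_verified",
    "status_verified",
    "test_secret_verified",
    "isolated_environment_destroyed",
    "no_secret_material_recorded",
)


def validate_restore_evidence(evidence: dict) -> list[str]:
    """Return human-readable errors; an empty list means the evidence passes."""
    errors = []
    digest = str(evidence.get("snapshot_sha256", ""))
    if not re.fullmatch(r"[0-9a-f]{64}", digest):
        errors.append("snapshot_sha256 must be a 64-character lowercase hex digest")
    if not str(evidence.get("encrypted_snapshot_location", "")).strip():
        errors.append("encrypted_snapshot_location must be set")
    for key in REQUIRED_TRUE:
        if evidence.get(key) is not True:
            errors.append(f"{key} must be true")
    return errors


# Illustrative evidence only; the digest and location are made-up values.
evidence = {
    "snapshot_sha256": "9f" * 32,
    "encrypted_snapshot_location": "offline-vault/drill-01.snap.age",
    **{key: True for key in REQUIRED_TRUE},
}
print(validate_restore_evidence(evidence) or "ok")
```
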
**2026-06-01:** Added the matching non-secret emergency seal/unseal drill
evidence template and `make openbao-validate-emergency-evidence`. The validator
requires an attended seal/unseal evidence file with timing, sealed-state proof,
unseal quorum availability, post-unseal verification, availability-window
duration, and `no_secret_material_recorded`. The validator does not run the
disruptive drill; it only checks the evidence captured after the attended
operation.

**2026-06-02:** Hardened both evidence validators so unchanged templates or
obvious placeholder values cannot accidentally satisfy NetKingdom
`NET-WP-0017-T02`. Restore evidence now rejects placeholder digests and
template wording, while emergency drill evidence rejects template wording.
Operators must copy the examples into local evidence files and replace
placeholders with real non-secret drill evidence before validation can pass.

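Placeholder rejection of the kind described above could be approximated as follows. This is a sketch under assumptions: the marker strings and the "single repeated character" digest heuristic are illustrative choices, not the rules the real validators apply.

```python
import hashlib
import re

# Hypothetical markers that suggest a template was copied without editing.
TEMPLATE_MARKERS = ("replace_me", "todo", "placeholder", "<", ">")


def is_placeholder_digest(digest: str) -> bool:
    """Treat malformed digests and single-character runs (e.g. 64 zeros) as placeholders."""
    value = digest.strip().lower()
    if not re.fullmatch(r"[0-9a-f]{64}", value):
        return True
    return len(set(value)) == 1


def find_template_wording(evidence: dict) -> list[str]:
    """Return the keys whose string values still contain template markers."""
    flagged = []
    for key, value in evidence.items():
        if isinstance(value, str) and any(m in value.lower() for m in TEMPLATE_MARKERS):
            flagged.append(key)
    return flagged


real_digest = hashlib.sha256(b"drill snapshot bytes").hexdigest()
print(is_placeholder_digest("0" * 64), is_placeholder_digest(real_digest))
print(find_template_wording({"operator": "<name>", "notes": "restore ok"}))
```
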
### T07 - Cross-Repo Transition Tasks

```task
id: RAIL-PL-WP-0002-T07
status: done
priority: medium
state_hub_task_id: "89149b60-562b-4a5b-978d-0f9136ffa114"
```

Create or link follow-up tasks for NetKingdom, ops-warden, ops-bridge,
artifact-store, and S5 applications where documentation or integration
must move from HashiCorp Vault-specific assumptions to OpenBao-first
or Vault-compatible abstraction language.

**2026-05-17:** Started cross-repo transition by updating
`net-kingdom/docs/platform-identity-security-architecture.md` and
`net-kingdom/SCOPE.md` so NetKingdom treats OpenBao as the runtime
platform secrets authority while SOPS/age remains bootstrap/Git-at-rest
protection. Still needs ops-warden, ops-bridge, artifact-store, S5 app,
and stale HashiCorp Vault wording follow-ups.

**2026-05-24:** Updated NetKingdom custody linkage:
`net-kingdom/docs/platform-root-custody.md`, `NET-WP-0015`, and `NET-WP-0016`
now define `tegwick` as setup operator/contact and a separate king credential
as the platform-root custody target for OpenBao.

**2026-05-17:** Linked the artifact-store transition to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending` instead of creating duplicate S3 backend work in
`railiance-platform`. The OpenBao side of the handoff is now documented in
`docs/openbao.md`; remaining artifact-store work belongs in
`ARTIFACT-STORE-WP-0007-T004` and follow-up routing in
`ARTIFACT-STORE-WP-0007-T005`.

**2026-05-29:** Cross-repo transition ownership is explicit enough for
Railiance closeout. NetKingdom owns the remaining identity, OIDC admin login,
operator UX, hardening, and onboarding-readiness gates through `NET-WP-0017`.
Artifact-store owns S3-compatible backend and credential-vending decisions
through `ARTIFACT-STORE-WP-0007`. Future application-specific OpenBao adoption
belongs with the relevant S5/application workplans once user onboarding is
unblocked.

## Acceptance Criteria

- Railiance has an explicit decision on OpenBao versus HashiCorp Vault
  for platform secrets management.
- OpenBao deployment topology is defined for the S3 platform-services
  layer.
- Bootstrap, unseal, backup, restore, audit, and break-glass procedures
  are documented before live secrets are migrated.
- Integration choices are clear for Kubernetes workloads, NetKingdom
  identity, dynamic database credentials, and future object-storage STS
  credential vending.
- SOPS/age remains the bootstrap Git-at-rest mechanism unless a later
  ADR deliberately replaces it.