Archive closed workplans to workplans/archived/ (ADR-001)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 00:25:41 +02:00
parent 5dc71fb6ff
commit 60814fc76a
5 changed files with 0 additions and 0 deletions

@@ -0,0 +1,373 @@
---
id: RAIL-PL-WP-0002
type: workplan
title: "OpenBao Platform Secrets Service"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 2
created: "2026-05-17"
updated: "2026-05-29"
depends_on:
- RAIL-PL-WP-0001
state_hub_workstream_id: "fd1c045a-01d4-43be-980f-acbda6c64e6c"
---
# RAIL-PL-WP-0002 - OpenBao Platform Secrets Service
## Goal
Establish OpenBao as the canonical Railiance S3 platform secrets service,
or define a controlled transition path from existing HashiCorp Vault
assumptions to OpenBao.
This workplan belongs in `railiance-platform` because S3 owns shared
platform services: secret management, identity integration, object
storage, backups, and other services consumed by S5 applications.
## Context
OpenBao is an open-source, Linux Foundation-governed fork of Vault for
managing, storing, and distributing secrets, certificates, and keys.
The official OpenBao documentation includes Kubernetes deployment via
Helm, CSI provider support, dynamic database secrets, Kubernetes service
account token generation, and lease/revocation semantics.
Current local architecture references still mention HashiCorp Vault in
several places, especially credential bootstrap and ops-warden's Vault
SSH backend. Railiance also uses SOPS/age for Git-at-rest secrets. The
platform needs an explicit decision and migration path so "Vault" does
not remain an accidental brand-specific dependency where "secrets
manager" is what we really mean.
## Scope
In scope:
- decide whether OpenBao is the canonical Railiance platform secrets
service
- define deployment topology for OpenBao on the Railiance Kubernetes
platform
- define auth methods for workloads, operators, and automations
- define secret engines for KV, database dynamic secrets, Kubernetes
tokens, PKI/certificates, and future object-storage credential
vending integrations
- define CSI provider and/or External Secrets Operator integration
- define unseal, backup, restore, break-glass, audit, and monitoring
procedures
- identify NetKingdom documentation and workplan updates needed to
replace HashiCorp Vault-specific language with OpenBao-first language
Out of scope:
- replacing SOPS/age for Git-at-rest bootstrap secrets
- changing S1/S2 cluster runtime configuration without coordination
- rewriting ops-warden's SSH certificate backend in this workplan
- implementing application-specific secrets in S5
## Tasks
### T01 - OpenBao Decision And Migration Inventory
```task
id: RAIL-PL-WP-0002-T01
status: done
priority: high
state_hub_task_id: "e997ffe0-6b61-4242-b585-f271e9b75e99"
```
Inventory current HashiCorp Vault assumptions across NetKingdom,
ops-warden, Railiance, and application runbooks. Decide whether
Railiance standardizes on OpenBao, keeps Vault-compatible abstraction
language, or supports both for a transition period.
**2026-05-17:** Decision recorded in State Hub:
`a0df816c-3749-4418-9c8b-28eb428be953`. Railiance S3 standardizes on
OpenBao as the runtime platform secrets service. SOPS/age remains the
Git-at-rest bootstrap mechanism.
### T02 - Kubernetes Deployment Design
```task
id: RAIL-PL-WP-0002-T02
status: done
priority: high
state_hub_task_id: "fb6ac85d-e77f-400d-8342-70a0ec6e82ef"
```
Design the OpenBao Helm deployment for Railiance: namespace, storage
backend, HA posture, ingress/internal service exposure, TLS, resource
limits, PodDisruptionBudget, NetworkPolicies, and upgrade/rollback
strategy.
**2026-05-17:** Implemented `helm/openbao-values.yaml`, Make targets, and
`docs/openbao.md`. Deployed chart `openbao/openbao` `0.28.2` (app
`v2.5.3`) to Railiance01 namespace `openbao` as internal-only,
single-replica Raft with data/audit PVCs. Public ingress remains disabled;
OpenBao is intentionally uninitialized and sealed until the bootstrap
ceremony.
### T03 - Bootstrap, Unseal, And Break-Glass Procedure
```task
id: RAIL-PL-WP-0002-T03
status: done
priority: high
state_hub_task_id: "509ccfd4-1775-4be4-b8e4-8d5bcf17f91e"
```
Define initialization, unseal, root-token retirement, operator access,
emergency access, backup escrow, and recovery drill. Ensure the design
does not introduce an unmanaged "secret zero" worse than the current
SOPS/age bootstrap.
**2026-05-17:** Initial ceremony documented in `docs/openbao.md`. Still
needs human escrow assignment, root-token retirement details, and a
restore/recovery drill before live secrets move into OpenBao.
**2026-05-23:** Added non-secret bootstrap support: `make openbao-verify`,
`make openbao-verify-post-unseal`, `make openbao-configure-initial`,
`scripts/openbao-verify.sh`, `scripts/openbao-apply-initial-config.sh`, and
initial platform policies under `openbao/policies/`. `docs/openbao.md` now
spells out pre-flight checks, escrow handling, root-token retirement, and the
post-unseal initial configuration path. The actual initialization/unseal
ceremony remains gated on named human escrow recipients and must not happen in
a casual agent shell.
**2026-05-24:** Revised the custody model: `tegwick`
(`bernd.worsch@gmail.com`, Gitea `tegwick`) is the setup operator/contact, not
the long-term platform root of trust. The OpenBao ceremony is now gated on a
separate NetKingdom king credential and guided bootstrap path. T03 remains
`in_progress`: the live OpenBao init/unseal ceremony is still gated on king
credential creation, custody mode approval, root-token disposition,
reset/rotation, and restore-drill execution.
**2026-05-26:** Live OpenBao is now initialized, unsealed, and post-unseal
verified on Railiance01. NetKingdom bootstrap metadata records custody approval,
root-token revocation, unseal-key rotation, and restore-drill confirmation.
T03 remains `in_progress` for production-trust closeout: declarative audit,
durable audit shipping, OIDC-backed admin login verification, residual taint
response, and cleanup before live application secrets move in. These remaining
operator-facing gates are consolidated in `NET-WP-0017`.
**2026-05-29:** Railiance-owned bootstrap and break-glass scope is complete:
`make openbao-status` and `make openbao-verify-post-unseal` pass against the
live Railiance01 OpenBao pod, which is initialized, unsealed, and active with
Bound data/audit PVCs. The production-trust gates that remain before ordinary
user onboarding or live application secrets move into OpenBao are now explicitly
owned by `NET-WP-0017`: declarative/durable audit closeout, OIDC-backed admin
login evidence, residual taint cleanup, and hardening.
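The "initialized, unsealed" check above can be illustrated with a minimal sketch. It is not the content of `scripts/openbao-verify.sh`; the function name `bao_is_ready` is hypothetical, and the sketch assumes the default table output of `bao status`, where each line is a key followed by a value.

```bash
#!/usr/bin/env bash
# Sketch only: bao_is_ready is a hypothetical helper, not part of the repo.
# Reads `bao status` table output on stdin and succeeds only when the
# server reports Initialized=true and Sealed=false.
bao_is_ready() {
  awk '
    $1 == "Initialized" { init = $2 }
    $1 == "Sealed"      { sealed = $2 }
    END { exit !(init == "true" && sealed == "false") }
  '
}
```

A caller would pipe the status output of the live pod into the function, for example `kubectl exec -n openbao <pod> -- bao status | bao_is_ready`, where `<pod>` stands for whatever pod name the release actually uses.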
### T04 - Auth Methods And Workload Integration
```task
id: RAIL-PL-WP-0002-T04
status: done
priority: high
state_hub_task_id: "ca2b3ac2-b522-4445-a418-c6ec312cd5f4"
```
Configure or document auth methods for Kubernetes workloads,
NetKingdom identity, admins, agents, and automations. Decide when
workloads use OpenBao directly, CSI-mounted secrets, External Secrets
Operator, or sidecars/controllers.
**2026-05-23:** Documented the auth and delivery model in `docs/openbao.md`.
Bootstrap uses the one-time root token only for initial setup; platform
operators use a non-root `platform-admin` token until NetKingdom OIDC/admin
integration is ready; reviewers use `platform-readonly`; workloads use
Kubernetes auth with namespace/service-account-bound policies. External
Secrets Operator is preferred for Helm-compatible Kubernetes Secrets, CSI is
reserved for mounted-file delivery and refresh-sensitive workloads, and the
OpenBao injector remains disabled.
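As a rough illustration of "namespace/service-account-bound policies", the sketch below prints (it does not run) the kind of `bao write` command that binds one Kubernetes service account to one policy on the `kubernetes/` auth mount. The helper name `k8s_role_cmd`, the example role, service account, and policy names, and the one-hour TTL are all assumptions made for illustration; the real bindings live in the repo's configuration helper and `docs/openbao.md`.

```bash
#!/usr/bin/env bash
# Sketch only: prints the command instead of executing it, so it is safe
# to run without a token. All argument values are hypothetical examples.
k8s_role_cmd() {
  local role="$1" namespace="$2" service_account="$3" policy="$4"
  printf 'bao write auth/kubernetes/role/%s bound_service_account_names=%s bound_service_account_namespaces=%s token_policies=%s token_ttl=1h\n' \
    "$role" "$service_account" "$namespace" "$policy"
}
```

For example, `k8s_role_cmd demo-app demo-ns demo-sa demo-read` prints a command an operator could review before running it in an attended session.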
### T05 - Secret Engines And Dynamic Credentials
```task
id: RAIL-PL-WP-0002-T05
status: done
priority: medium
state_hub_task_id: "0d717bdd-76bc-41b4-b633-ba07214b4095"
```
Enable and document the initial secret engines: KV v2 for platform
configuration, database dynamic credentials for CNPG-managed
PostgreSQL, Kubernetes token generation where appropriate, PKI/SSH
future paths, and an assessment of object-storage credential vending
integration with NK-WP-0007.
**2026-05-17:** Object-storage credential vending assessment started and
documented in `docs/openbao.md`. Existing `artifact-store` capabilities cover
artifact package preservation, an S3-compatible backend, env/file secret refs,
and `artifactstore storage verify --backend s3`. Railiance S3 should use
OpenBao for bootstrap custody, policy, audit, break-glass, and workload secret
delivery, while `artifact-store` owns S3 backend behavior and
`ARTIFACT-STORE-WP-0007` owns MinIO/fork compatibility plus temporary
credential refresh decisions. NetKingdom remains the default owner for OIDC
identity if object storage adopts `AssumeRoleWithWebIdentity`.
**2026-05-29:** Initial secret-engine scope is complete for this workplan:
OpenBao has the `platform/` KV path and Kubernetes auth configured through the
initial configuration helper, with `platform-admin` and `platform-readonly`
policies present. Database dynamic credentials, PKI, SSH, and object-storage
STS vending remain future integration work owned by their downstream service
workplans and `ARTIFACT-STORE-WP-0007`; they are not blockers for the platform
secrets service closeout.
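To make the relationship between the `platform/` KV v2 mount and a read-only policy concrete, here is a minimal sketch that renders a policy body. It is not the policy stored under `openbao/policies/`; the function name `render_readonly_policy` is hypothetical, and the sketch only relies on the KV v2 convention that secret data sits under `<mount>/data/` and metadata under `<mount>/metadata/`.

```bash
#!/usr/bin/env bash
# Sketch only: renders a read-only policy for a KV v2 mount.
# The actual platform-readonly policy in the repo may differ.
render_readonly_policy() {
  local mount="$1"
  cat <<EOF
path "${mount}/data/*" {
  capabilities = ["read"]
}
path "${mount}/metadata/*" {
  capabilities = ["list", "read"]
}
EOF
}
```

The output could be reviewed and then piped to `bao policy write <name> -` by an operator holding an approved token.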
### T06 - Backup, Audit, Monitoring, And Verification
```task
id: RAIL-PL-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "cd61bc7d-8b9f-484f-97bd-7254c227b0ee"
```
Define backup/restore procedure, audit device configuration, metrics,
logs, health checks, restore drill, and smoke tests. Include a
developer/operator verification script for the deployed service.
**2026-05-23:** Documented audit, Raft snapshot, encrypted snapshot custody,
isolated restore drill, durable audit-log shipping, and monitoring baseline in
`docs/openbao.md`. Added `scripts/openbao-verify.sh` plus Make targets for
basic and post-unseal verification. The restore drill still must be executed
before any live application secrets are migrated; that remains a gate under
T03.
**2026-05-26:** `make openbao-verify-post-unseal` passes against the live
OpenBao pod: Kubernetes objects exist, the pod is running, OpenBao reports
`Initialized: true` and `Sealed: false`, and data/audit directories exist.
Authenticated checks for audit devices, auth methods, and mounts still require
the OIDC-backed or temporary platform-admin path and remain part of the
production-readiness closeout.
**2026-06-01:** Added the source-side declarative file-audit configuration
required by `NET-WP-0017-T02`: `helm/openbao-values.yaml` now includes an
OpenBao `audit "file" "file"` stanza writing to
`/openbao/audit/openbao-audit.log`, and
`scripts/openbao-apply-initial-config.sh` now verifies audit visibility with
`bao audit list` instead of attempting API-managed audit creation. The
post-unseal verifier now warns when the audit log file is missing or empty.
Live verification still reports the pod unsealed and healthy, but also reports
the audit log file missing because this Helm change has not yet been rolled
out. Roll out only in an attended window with unseal shares available.
**2026-06-01:** Rolled out the declarative audit configuration to the live
Railiance01 OpenBao release in an attended window. Because the StatefulSet uses
`OnDelete`, the pod was explicitly recycled after the Helm values upgrade and
then unsealed by the operator. Post-unseal verification now reports OpenBao
`2.5.4`, `Sealed: false`, the audit directory present, and
`/openbao/audit/openbao-audit.log` present and non-empty. The source values now
pin the live OpenBao image tag to `2.5.4`; Helm release revision 3 has the same
explicit tag and the pod remained ready, so future chart upgrades do not
implicitly change the runtime version while applying unrelated configuration.
**2026-06-01:** Added `make openbao-verify-authenticated` as a non-mutating
operator proof for the remaining OpenBao readiness checks that require an
approved token. The helper prompts for the token without echoing it, verifies
`file/` audit visibility, `platform/` secrets, `kubernetes/` and `keycape/`
auth methods, and confirms the audit log file is non-empty. It can also use an
already-valid pod token helper via
`OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper` so the token does not move
through the local shell at all. Durable audit shipping beyond the audit PVC
remains intentionally open until a tested sink is selected; State Hub notes and
hashes are evidence, not retained audit custody.
**2026-06-01:** Ran the authenticated verifier against the live pod token
helper immediately after a fresh `bao login -no-print -method=oidc
-path=keycape role=platform-admin` browser/MFA flow. The verifier passed:
OpenBao is unsealed on `2.5.4`, `bao audit list` shows `file/`,
`bao secrets list` shows `platform/`, `bao auth list` shows `kubernetes/` and
`keycape/`, and `/openbao/audit/openbao-audit.log` grew from 7969 bytes to
23330 bytes during the check. No token value was printed or copied into the
workplan. The cached verifier token was then revoked with
`bao token revoke -self`.
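The "audit log grew during the check" evidence can be sketched as a size comparison before and after an authenticated call. This is an illustration, not the verifier's actual code; `file_size` and `audit_log_grew` are hypothetical names, and in practice the file lives inside the pod, so the commands would run through `kubectl exec`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for a before/after audit-log check.

# Print the size of a file in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Succeed when the log is non-empty and larger than the recorded size.
audit_log_grew() {
  local log="$1" before="$2" after
  [ -s "$log" ] || return 1
  after=$(file_size "$log")
  [ "$after" -gt "$before" ]
}
```

A caller records `file_size` first, performs the authenticated reads, then calls `audit_log_grew` with the recorded value.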
**2026-06-01:** Durable tenant-aware audit retention is now a separate
`audit-core` product/repo instead of a Railiance OpenBao bootstrap subtask. The
initial Audit Core mock backend writes JSONL events under
`/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and removes files older than seven
days; it is suitable for interface wiring and setup validation only. Railiance
still owns the OpenBao file audit device and PVC, while production retention,
tenant policy, and tamper-evident archive belong to Audit Core.
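The hourly file naming and seven-day cleanup described for the mock backend can be sketched in a few lines. This is not Audit Core's implementation; the function names are hypothetical, and `find -mtime +7` only approximates "older than seven days" because it counts whole 24-hour periods.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring the described mock behavior.

# Print the JSONL path for the current UTC hour.
audit_file_for_now() {
  local dir="${1:-/tmp/audit-core}"
  printf '%s/audit-%s.jsonl\n' "$dir" "$(date -u +%Y%m%dT%H)"
}

# Delete audit files whose modification time is more than seven days old.
prune_audit_files() {
  local dir="${1:-/tmp/audit-core}"
  find "$dir" -type f -name 'audit-*.jsonl' -mtime +7 -delete
}
```

Because the files sit under `/tmp` and are pruned by age, this shape fits interface wiring only, as the entry above states.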
**2026-06-01:** Added a non-secret OpenBao restore-drill evidence template and
`make openbao-validate-restore-evidence`. The validator requires concrete
review evidence such as snapshot hashes, encrypted snapshot location, isolated
restore completion, unseal/status/test-secret verification, isolated
environment destruction, and a `no_secret_material_recorded` assertion. This
keeps `NET-WP-0017-T02` from relying on a bare UI checkbox for restore proof.
**2026-06-01:** Added the matching non-secret emergency seal/unseal drill
evidence template and `make openbao-validate-emergency-evidence`. The validator
requires an attended seal/unseal evidence file with timing, sealed-state proof,
unseal quorum availability, post-unseal verification, availability-window
duration, and `no_secret_material_recorded`. The validator does not run the
disruptive drill; it only checks the evidence captured after the attended
operation.
**2026-06-02:** Hardened both evidence validators so unchanged templates or
obvious placeholder values cannot accidentally satisfy NetKingdom T02. Restore
evidence now rejects placeholder digests and template wording, while emergency
drill evidence rejects template wording. Operators must copy the examples into
local evidence files and replace placeholders with real non-secret drill
evidence before validation can pass.
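The placeholder-rejection idea can be illustrated with a small sketch. It is not the logic behind `make openbao-validate-restore-evidence`; it assumes a flat `key: value` evidence file, and both the function name `validate_evidence` and the example key names used with it are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: rejects missing keys, obvious placeholder values, and a
# missing or false no_secret_material_recorded assertion.
validate_evidence() {
  local file="$1"; shift
  local key value
  for key in "$@" no_secret_material_recorded; do
    value=$(sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1)
    if [ -z "$value" ]; then
      echo "missing: $key" >&2
      return 1
    fi
    case "$value" in
      *'<'*'>'*|*TODO*|*REPLACE*|*CHANGEME*|*0000000000000000*)
        echo "placeholder: $key" >&2
        return 1
        ;;
    esac
  done
  value=$(sed -n 's/^no_secret_material_recorded:[[:space:]]*//p' "$file" | head -n 1)
  [ "$value" = "true" ]
}
```

An unchanged template fails on the first placeholder, which is the property the hardening entry above is after.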
### T07 - Cross-Repo Transition Tasks
```task
id: RAIL-PL-WP-0002-T07
status: done
priority: medium
state_hub_task_id: "89149b60-562b-4a5b-978d-0f9136ffa114"
```
Create or link follow-up tasks for NetKingdom, ops-warden, ops-bridge,
artifact-store, and S5 applications where documentation or integration
must move from HashiCorp Vault-specific assumptions to OpenBao-first
or Vault-compatible abstraction language.
**2026-05-17:** Started cross-repo transition by updating
`net-kingdom/docs/platform-identity-security-architecture.md` and
`net-kingdom/SCOPE.md` so NetKingdom treats OpenBao as the runtime
platform secrets authority while SOPS/age remains bootstrap/Git-at-rest
protection. Still needs ops-warden, ops-bridge, artifact-store, S5 app,
and stale HashiCorp Vault wording follow-ups.
**2026-05-24:** Updated NetKingdom custody linkage:
`net-kingdom/docs/platform-root-custody.md`, `NET-WP-0015`, and `NET-WP-0016`
now define `tegwick` as setup operator/contact and a separate king credential
as the platform-root custody target for OpenBao.
**2026-05-17:** Linked the artifact-store transition to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending` instead of creating duplicate S3 backend work in
`railiance-platform`. The OpenBao side of the handoff is now documented in
`docs/openbao.md`; remaining artifact-store work belongs in
`ARTIFACT-STORE-WP-0007-T004` and follow-up routing in
`ARTIFACT-STORE-WP-0007-T005`.
**2026-05-29:** Cross-repo transition ownership is explicit enough for
Railiance closeout. NetKingdom owns the remaining identity, OIDC admin login,
operator UX, hardening, and onboarding-readiness gates through `NET-WP-0017`.
Artifact-store owns S3-compatible backend and credential-vending decisions
through `ARTIFACT-STORE-WP-0007`. Future application-specific OpenBao adoption
belongs with the relevant S5/application workplans once user onboarding is
unblocked.
## Acceptance Criteria
- Railiance has an explicit decision on OpenBao versus HashiCorp Vault
for platform secrets management.
- OpenBao deployment topology is defined for the S3 platform-services
layer.
- Bootstrap, unseal, backup, restore, audit, and break-glass procedures
are documented before live secrets are migrated.
- Integration choices are clear for Kubernetes workloads, NetKingdom
identity, dynamic database credentials, and future object-storage STS
credential vending.
- SOPS/age remains the bootstrap Git-at-rest mechanism unless a later
ADR deliberately replaces it.

@@ -0,0 +1,416 @@
---
id: RAILIANCE-WP-0003
type: workplan
title: "Provision shared CNPG cluster apps-pg"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 3
created: "2026-05-19"
updated: "2026-05-19"
state_hub_workstream_id: "665b3b9b-608a-4be4-84b6-dcb8261ff57b"
---
# RAILIANCE-WP-0003 - Provision shared CNPG cluster apps-pg
## Goal
Provision a new shared CloudNativePG cluster `apps-pg` in the
`databases` namespace that S5 application workloads can use to host
their own PostgreSQL databases — without each app forcing the creation
of a dedicated CNPG cluster.
This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme
needs a `vergabe` role + `vergabe_db` database) by establishing the
shared cluster and the governed onboarding contract future S5 apps adopt
by default.
## Context
`railiance-apps` workplan `RAILIANCE-WP-0002` (establish
vergabe-teilnahme on railiance01) found at T01 that the two existing
CNPG clusters in `databases` are app-dedicated:
| Cluster          | PostgreSQL version | Owner app   |
|------------------|--------------------|-------------|
| `gitea-db`       | 18                 | gitea       |
| `net-kingdom-pg` | 16                 | net-kingdom |
Decision `D-01` (resolved 2026-05-18, bernd) selected option D:
**provision a new shared cluster `apps-pg`** rather than create a third
dedicated cluster (option A) or retrofit an existing app cluster (B/C).
A coordination message was sent from `railiance-apps` to
`railiance-platform` requesting this work; this workplan is the
response.
## Placement in the Railiance Tooling Set
S3 owns CNPG `Cluster` CRs (per ADR-003 and the pattern already
established by `helm/gitea-db-cluster.yaml`). CNPG 1.28 has standalone
`Database` CRs, but PostgreSQL role lifecycle is managed through the
target `Cluster` spec's `.spec.managed.roles` stanza or through a
controlled operator-run SQL workflow. The shared-cluster contract must
therefore make role onboarding explicit; S5 repos should not assume a
standalone CNPG `Role` CR exists.
| Concern | Owner repo | Scope |
|---------|------------|-------|
| `Cluster apps-pg` CR, shared NetworkPolicies, bootstrap secret, baseline docs | `railiance-platform` | this workplan |
| Per-app database request and application DSN wiring | each S5 repo | not here |
| Per-app PostgreSQL role + credential provisioning | coordinated | documented here; platform-administered until OpenBao/dedicated automation exists |
| Per-app runtime Secret in the consumer namespace | each S5 repo | not here |
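For orientation, a platform-owned managed-role entry could look like the fragment rendered by the sketch below. The function name `render_managed_role` and the role and Secret names passed to it are hypothetical; the field names follow the CNPG `.spec.managed.roles` stanza, and the fragment would be merged into the `Cluster` spec by the platform, not applied by an S5 repo.

```bash
#!/usr/bin/env bash
# Sketch only: prints a .spec.managed.roles fragment for review.
# The role and Secret names are supplied by the caller.
render_managed_role() {
  local role="$1" secret="$2"
  cat <<EOF
spec:
  managed:
    roles:
      - name: ${role}
        ensure: present
        login: true
        passwordSecret:
          name: ${secret}
EOF
}
```

Keeping the stanza in the platform repo is what makes role onboarding explicit: the consumer supplies a request, and the `Cluster` owner decides when the role exists.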
## Current Evidence
- `kubectl get crd | grep cnpg` confirms CNPG 1.28.1 with the
`databases.postgresql.cnpg.io` CRD — databases can be represented
declaratively.
- CNPG role management is cluster-scoped via `.spec.managed.roles`;
no standalone CNPG `Role` CR is available for app repos to apply.
- Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
(`cnpg-system` namespace).
- `databases` namespace has a default-deny-all NetworkPolicy; each
CNPG cluster therefore needs its own NetworkPolicy triplet
(egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns)
— pattern visible in `helm/gitea-db-networkpolicies.yaml`.
- `helm/apps-pg-cluster.yaml`, `helm/apps-pg-networkpolicies.yaml`,
`helm/apps-pg-secret.sops.yaml.template`, and `docs/apps-pg.md` are
present in the repo.
- Coordination message id: `768c18f4-8785-4108-a900-fa117eb8778f`
(state-hub thread).
## Implementation Notes
Completed on 2026-05-19.
- CNPG operator is `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`.
- `clusters.postgresql.cnpg.io` and `databases.postgresql.cnpg.io` CRDs
are present; `roles.postgresql.cnpg.io` is not present, so role
onboarding remains platform-administered through managed roles or a
controlled SQL workflow.
- `local-path` is the default StorageClass. The single K3s node reports
no memory, disk, or PID pressure; allocatable ephemeral storage is
about 97.7 GB and memory is about 3.8 GiB. Existing CNPG PVC footprint
before `apps-pg` was two 10Gi PVCs (`gitea-db-1`,
`net-kingdom-pg-1`).
- `databases` exists with `default-deny-all`; `cnpg-system` has the
required `kubernetes.io/metadata.name=cnpg-system` namespace label.
- The live CNPG CRD rejected `spec.postgresql.version`; the deployed
`apps-pg` manifest therefore pins PostgreSQL 16 with
`imageName: ghcr.io/cloudnative-pg/postgresql:16`.
- `apps-pg` is deployed in `databases`, reports `Cluster in healthy
state`, and has primary `apps-pg-1`.
- Services `apps-pg-rw` and `apps-pg-ro` exist. With one instance,
`apps-pg-ro` is present but has no replica endpoint until HA is added.
- A disposable namespace labeled
`railiance.io/postgres-client=apps-pg` successfully connected to
`apps-pg-rw.databases.svc.cluster.local:5432/apps_meta` as
`apps_admin`; the temporary namespace and copied smoke-test secret
were deleted immediately after the check.
## Safety Contract
- Do not commit plaintext credentials. Create the bootstrap secret once,
manually, with `kubectl create secret`; commit only the SOPS-encrypted
form as `helm/apps-pg-secret.sops.yaml.template`.
- Do not expose `apps_admin` to S5 applications. It is a platform
bootstrap/smoke-test role, not a runtime credential.
- Do not co-locate non-app data (Gitea, net-kingdom) in `apps-pg`.
This cluster is for S5 *application* DBs.
- Preserve the default-deny NetworkPolicy posture in `databases`;
only allow ingress from namespaces that have a registered consumer.
- Do not advertise self-service role creation until the role
provisioning mechanism is explicit. CNPG `Database` CRs still require
their owner role to exist.
- Initial sizing is conservative (1 instance, 10Gi) to match the
existing per-cluster footprint. Resize is a follow-up workplan.
- Cluster name `apps-pg` is locked once published — renaming changes
every consumer DSN.
## Target State
- `kubectl get cluster apps-pg -n databases` reports
`Cluster in healthy state` with the primary `apps-pg-1`.
- `kubectl get svc apps-pg-rw apps-pg-ro -n databases` lists both services.
- NetworkPolicies for `apps-pg` mirror the `gitea-db` triplet.
- The Make targets `apps-pg-deploy`, `apps-pg-status`, `apps-pg-shell`,
and `apps-pg-logs` exist and work.
- Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`)
exist for cluster health probes and to anchor the bootstrap; the
cluster is otherwise empty of per-app data.
- Documentation explains how an S5 consumer registers a new database,
including the current CNPG boundary: the `Database` CR is separate,
but role lifecycle is cluster-scoped and therefore governed by the
platform contract.
- `railiance-apps` is notified via the hub thread; their
`RAILIANCE-WP-0002 T04` can proceed using the documented onboarding
path.
## Tasks
### T01 — Inventory and capacity check
```task
id: RAILIANCE-WP-0003-T01
status: done
priority: high
state_hub_task_id: "37843f2f-0022-4725-ab07-29f6ae4c1749"
```
Confirm the substrate before adding a new cluster.
Checks:
- CNPG operator version (≥ 1.28.x required for the `Database` CR
consumer pattern).
- Role/database API boundary: `Database` CR is present; role lifecycle
is `.spec.managed.roles` or controlled SQL, not a separate `Role` CR.
- Node-level disk space available for an additional 10Gi PVC
(`local-path` storage class is the active default).
- Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any
current resource pressure.
- That the `databases` namespace already exists and has its
default-deny NetworkPolicy in place.
- That `cnpg-system` namespace label
`kubernetes.io/metadata.name=cnpg-system` is set (required by the
ingress-from-operator NetworkPolicy).
**Done when:** the implementation notes record CNPG version, available
PVC capacity, the chosen role onboarding mechanism, and any
pre-condition gaps.
---
### T02 — Create bootstrap credential secret
```task
id: RAILIANCE-WP-0003-T02
status: done
priority: high
state_hub_task_id: "b4777198-e42f-4ca1-b562-a595559fdf08"
```
Mint the one-time bootstrap secret that CNPG uses to create the initial
`apps_admin` role.
Steps:
```bash
APPS_PG_PW=$(openssl rand -base64 32)
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"
```
Then commit a SOPS-encrypted template:
- `helm/apps-pg-secret.sops.yaml.template` — encrypted form for
declarative reapply; do not commit the plaintext password.
The bootstrap role is intentionally not a consumer role. Per-app runtime
roles are created later through the onboarding mechanism documented in
T06; until dedicated automation exists, that mechanism is
platform-administered.
**Done when:** the secret exists in the cluster and an encrypted
template is committed.
---
### T03 — Add the CNPG Cluster manifest
```task
id: RAILIANCE-WP-0003-T03
status: done
priority: high
state_hub_task_id: "0840583d-23b2-4b93-9002-7977e6896a12"
```
Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`.
Do not add app-specific roles or databases to the baseline cluster
manifest unless T01 explicitly chooses a platform-owned managed-role
stanza as the interim onboarding path for the first consumer.
Shape:
```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1 # bump when node RAM > 8GB
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials
```
Note: PG version is 16 (matches vergabe-teilnahme's minimum and the
existing `net-kingdom-pg`). Bumping to 17/18 is a separate decision.
**Done when:** the manifest is committed and
`kubectl apply --dry-run=server` validates against the cluster.
---
### T04 — Add NetworkPolicies for apps-pg
```task
id: RAILIANCE-WP-0003-T04
status: done
priority: high
state_hub_task_id: "7237f0f2-28e6-4eee-981b-06d0115cb0d1"
```
Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet
but parameterised for the *apps* consumer namespace pattern.
Three policies (all in `databases`, all selecting
`cnpg.io/cluster: apps-pg`):
1. `allow-egress-kube-api-apps-pg` — egress to TCP/6443.
2. `allow-ingress-from-cnpg-operator-apps-pg` — ingress from
`namespaceSelector kubernetes.io/metadata.name=cnpg-system` on TCP
ports 5432 / 8000 / 9187.
3. `allow-ingress-from-app-namespaces-apps-pg` — ingress on TCP/5432
from any namespace carrying the label
`railiance.io/postgres-client=apps-pg`. (Each consuming app
namespace adds this label; this avoids hard-coding a namespace list
in the platform repo.)
The label-based selector is the meaningful difference from gitea-db,
which hard-codes `default`. The shared cluster cannot know its
consumer namespaces in advance, so it expects a positive opt-in label.
**Done when:** the policies are committed and applied; consumer namespaces
can connect after applying the `railiance.io/postgres-client=apps-pg`
label.
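The label-based opt-in can be sketched as follows. This is not a copy of `helm/apps-pg-networkpolicies.yaml`; the function name `render_consumer_policy` is hypothetical, and the manifest only reflects the selector, label, and port named in the list above.

```bash
#!/usr/bin/env bash
# Sketch only: prints the third policy (ingress from opted-in namespaces)
# for a given CNPG cluster name. Review before applying.
render_consumer_policy() {
  local cluster="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-${cluster}
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: ${cluster}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: ${cluster}
      ports:
        - protocol: TCP
          port: 5432
EOF
}
```

On the consumer side, the opt-in is a single label, for example `kubectl label namespace <consumer-namespace> railiance.io/postgres-client=apps-pg`.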
---
### T05 — Makefile targets, deploy, verify
```task
id: RAILIANCE-WP-0003-T05
status: done
priority: high
state_hub_task_id: "dc346e73-eadf-4eaa-8296-358df262f648"
```
Add targets that mirror the `db-*` (gitea-db) family:
```make
apps-pg-deploy: ## Apply shared apps-pg CNPG Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml
apps-pg-status: ## Show apps-pg CNPG cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
		$(KUBECTL) get cluster apps-pg -n databases -o wide
apps-pg-shell: ## Open psql shell on apps-pg primary
	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta
apps-pg-logs: ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50
```
Then deploy and wait for the cluster to converge:
```bash
make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m
```
Smoke checks:
- `cnpg status` reports `Cluster in healthy state`.
- Services `apps-pg-rw` and `apps-pg-ro` exist.
- From a disposable pod in a temporary namespace labeled
`railiance.io/postgres-client=apps-pg`, a platform-operated test
connection to `apps-pg-rw.databases:5432/apps_meta` succeeds. Delete
the temporary namespace and any copied test secret immediately after
the check; do not place `apps_admin` in an application namespace.
**Done when:** the smoke checks pass.
---
### T06 — Reply to railiance-apps, document the consumer contract
```task
id: RAILIANCE-WP-0003-T06
status: done
priority: medium
state_hub_task_id: "8b78934d-0a3c-413c-a66f-295092282547"
```
Notify the requester and capture the pattern.
Steps:
- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` through the
State Hub `/messages/` REST API with this workplan's id and the
cluster's connection details. Do not send bootstrap credentials.
- Add `docs/apps-pg.md` with:
  - Cluster identity and connection endpoints.
  - The per-app onboarding recipe: (a) request/approve a per-app role,
    (b) provision the backing role and credential through the chosen
    platform mechanism, (c) create the CNPG `Database` CR in the
    `databases` namespace with `spec.cluster.name: apps-pg` and
    `spec.owner` set to the approved role, (d) label the consumer
    namespace `railiance.io/postgres-client=apps-pg`, (e) publish or
    mirror the runtime Secret into the consumer namespace, and (f) wire
    the DSN into the application Helm values.
  - The CNPG 1.28 boundary: `Database` is standalone; role management is
    not a standalone `Role` CR and must follow the platform contract.
  - Backup posture (when the cluster is added to the existing platform
    backup process) and the resize / replicate roadmap.
**Done when:** the message is replied to and `docs/apps-pg.md` is
committed.
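Step (c) of the onboarding recipe can be sketched as a rendered `Database` CR. This is an illustration, not the content of `docs/apps-pg.md`; the function name `render_database_cr` is hypothetical, and the database and owner names are supplied by the caller once the role has been approved and provisioned.

```bash
#!/usr/bin/env bash
# Sketch only: prints a CNPG Database CR bound to the shared cluster.
# Underscores are replaced in metadata.name because Kubernetes object
# names must be DNS-compatible; spec.name keeps the PostgreSQL name.
render_database_cr() {
  local db="$1" owner="$2"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: ${db//_/-}
  namespace: databases
spec:
  name: ${db}
  owner: ${owner}
  cluster:
    name: apps-pg
EOF
}
```

For the first consumer this would be called as `render_database_cr vergabe_db vergabe`, after the `vergabe` role exists, since the `Database` CR requires its owner role to be present.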
## Completion Criteria
This workplan is complete when:
1. `apps-pg` reports healthy in the `databases` namespace.
2. NetworkPolicies enforce the default-deny posture with label-based
consumer opt-in.
3. Makefile targets work end-to-end.
4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
acknowledged via the hub thread.
5. `docs/apps-pg.md` explains the consumer onboarding contract,
including the CNPG role/database boundary.
## Notes
- This intentionally does **not** hard-code the `vergabe` role or
`vergabe_db` into the shared cluster baseline. The consumer onboarding
doc must describe the follow-up request/manifest needed for
`railiance-apps` so the platform layer stays generic until an app
explicitly registers.
- Backup inclusion of `apps-pg` is a follow-up. The existing
`make backup` target only covers the legacy PostgreSQL-HA setup;
CNPG backup configuration is its own workplan.
- A second replica (HA) and a connection pooler (PgBouncer / CNPG
`Pooler`) are deferred. The cluster spec leaves room for both; re-enable
them when node capacity allows.


@@ -0,0 +1,281 @@
---
id: RAILIANCE-WP-0004
type: workplan
title: "Establish ArgoCD GitOps bootstrap contract"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 4
created: "2026-06-19"
updated: "2026-06-25"
state_hub_workstream_id: "e57e487b-8557-439d-8093-0457c73ede93"
---
# RAILIANCE-WP-0004 - Establish ArgoCD GitOps Bootstrap Contract
## Goal
Establish the minimal platform-owned ArgoCD GitOps contract needed for
Railiance application teams to deploy through the already-installed ArgoCD
instance on `railiance01`.
This work responds to the `issue-core` dependency message from 2026-06-18:
ArgoCD is installed and healthy on `railiance01`, but unused. `issue-core`
will be the first tenant Application and needs platform decisions before it
can author its workload deployment.
## Intent Alignment
`INTENT.md` defines this repo as the shared platform-services layer:
stateful services, secret custody, stable interfaces, and recoverable
operational contracts.
ArgoCD itself is not an application, database, or secret store. The work in
this repo is therefore intentionally limited to the platform contract around
GitOps:
- repository trust and credential registration for ArgoCD;
- AppProject guardrails that keep tenant syncs inside expected boundaries;
- a root app-of-apps entrypoint that provides a stable onboarding surface;
- the OpenBao-backed runtime secret delivery convention tenants must use.
Application workloads, container images, per-service manifests, and business
logic remain owned by the tenant repos.
## Scope
In scope:
- Define the bootstrap manifests for ArgoCD AppProjects and the root
app-of-apps Application.
- Define how Git source repositories are registered without committing
credentials.
- Define where tenant Application manifests are placed and how they point
back to tenant-owned workload manifests.
- Confirm the runtime secret delivery pattern: OpenBao custody delivered to
Kubernetes via External Secrets Operator by default; CSI-mounted files only
when a workload requires file references; OpenBao injector remains disabled.
- Provide an `issue-core` pilot Application example so that repo can author
its final manifest against a concrete contract.
Out of scope:
- Installing or upgrading ArgoCD itself; that is cluster/runtime ownership.
- Moving S5 application workload manifests into this repo.
- Storing ArgoCD repository credentials, API tokens, or application secrets in
Git, workplans, State Hub, or chat.
- Applying live manifests that require operator-owned credentials.
## Decisions
### D-01 - Bootstrap Layout
Use this repo only for the platform-owned GitOps bootstrap:
```text
argocd/bootstrap/ AppProjects and root app-of-apps Application
argocd/applications/ thin tenant Application manifests reviewed by platform
argocd/repositories/ SOPS templates for ArgoCD repository Secret objects
docs/argocd-gitops.md GitOps contract and onboarding guidance
```
The root Application syncs `argocd/applications/` from this repo. Tenant
Application manifests in that directory point to workload manifests in each
tenant repo, normally `k8s/railiance/`.
### D-02 - AppProject Model
Create two AppProjects:
- `railiance-bootstrap` only allows the root app to manage ArgoCD
`Application` objects in the `argocd` namespace.
- `railiance-tenants` allows tenant Applications to sync ordinary namespaced
workload resources into their own namespaces, plus namespace creation. It
does not grant CRD, ClusterRole, ClusterRoleBinding, or arbitrary
cluster-admin authority.
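The guardrail in D-02 could be checked offline along these lines. This is a sketch under assumptions: it inspects the standard AppProject `spec.clusterResourceWhitelist` field, but the function name and the sample project below are hypothetical and do not reproduce the manifests in `argocd/bootstrap/`.

```python
FORBIDDEN_CLUSTER_KINDS = {
    "CustomResourceDefinition",
    "ClusterRole",
    "ClusterRoleBinding",
}


def cluster_scope_violations(app_project: dict) -> list:
    """Return whitelisted cluster-scoped kinds that D-02 says tenants must not get."""
    whitelist = app_project.get("spec", {}).get("clusterResourceWhitelist", [])
    violations = []
    for entry in whitelist:
        kind = entry.get("kind", "")
        # A "*" kind would grant every cluster-scoped resource.
        if kind == "*" or kind in FORBIDDEN_CLUSTER_KINDS:
            violations.append(f"{entry.get('group', '')}/{kind}")
    return violations


# Hypothetical tenant project: namespace creation only at cluster scope.
tenants_project = {
    "kind": "AppProject",
    "metadata": {"name": "railiance-tenants", "namespace": "argocd"},
    "spec": {"clusterResourceWhitelist": [{"group": "", "kind": "Namespace"}]},
}
too_broad = {"spec": {"clusterResourceWhitelist": [{"group": "*", "kind": "*"}]}}

print(cluster_scope_violations(tenants_project))
print(cluster_scope_violations(too_broad))
```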
### D-03 - Sync Policy
Default tenant Applications use automated sync with prune and self-heal
enabled after platform review. Recommended sync options are:
```yaml
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
- ApplyOutOfSyncOnly=true
- PruneLast=true
```
Sync waves are reserved for dependency ordering. Platform services and secret
delivery resources should sync before workloads that consume them.
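As a rough illustration of the ordering rule, the sketch below sorts manifests by the ArgoCD `argocd.argoproj.io/sync-wave` annotation, where lower waves sync first and a missing annotation means wave 0. It is simplified (ArgoCD also considers phase, kind, and name), and the resource names and the choice of wave `-1` are hypothetical.

```python
SYNC_WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def sync_wave(manifest: dict) -> int:
    """Read the sync wave of a manifest; ArgoCD treats a missing wave as 0."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    return int(annotations.get(SYNC_WAVE_ANNOTATION, "0"))


def in_wave_order(manifests: list) -> list:
    """Order manifests by wave only (stable sort keeps the input order per wave)."""
    return sorted(manifests, key=sync_wave)


# Hypothetical tenant resources: the secret delivery object syncs before the workload.
resources = [
    {"kind": "Deployment", "metadata": {"name": "issue-core"}},
    {
        "kind": "ExternalSecret",
        "metadata": {
            "name": "issue-core-runtime",
            "annotations": {SYNC_WAVE_ANNOTATION: "-1"},
        },
    },
]
ordered_kinds = [m["kind"] for m in in_wave_order(resources)]
print(ordered_kinds)
```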
### D-04 - Secret Delivery
OpenBao remains the canonical runtime secret custody service. For ordinary
Kubernetes workloads, use External Secrets Operator to materialize OpenBao
values as Kubernetes Secrets. Do not use the OpenBao injector in the current
deployment.
Runtime path convention for workload credential custody:
```text
platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>
```
Kubernetes namespace and service-account bounds belong in the auth role or
External Secrets binding unless the namespace is itself the approved workload
identity.
ArgoCD repository credentials are operator credentials, not workload secrets,
and should live under:
```text
platform/operators/argocd/repositories/<repo-name>
```
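The two path conventions above can be expressed as small builders. This is a sketch, not repo code: the function names are hypothetical, and the lowercase-and-hyphen segment rule is an assumption added here to reject unresolved `<...>` placeholders and path tricks. The workplan itself does not define a segment grammar.

```python
import re

# Assumed naming rule for path segments (not defined by the workplan).
SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def _check_segments(*segments: str) -> None:
    for segment in segments:
        if not SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")


def workload_secret_path(tenant: str, workload: str, purpose: str) -> str:
    """platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>"""
    _check_segments(tenant, workload, purpose)
    return f"platform/workloads/{tenant}/{workload}/{purpose}"


def argocd_repository_path(repo_name: str) -> str:
    """platform/operators/argocd/repositories/<repo-name>"""
    _check_segments(repo_name)
    return f"platform/operators/argocd/repositories/{repo_name}"


# "api-key" is a hypothetical secret purpose used only for illustration.
print(workload_secret_path("issue-core", "issue-core", "api-key"))
print(argocd_repository_path("railiance-platform"))
```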
## Current Evidence
- State Hub inbox message `d7a18ff9-e6c6-4e44-a39e-78369e530dfc` reports
ArgoCD is installed and healthy on `railiance01`, with zero Applications,
zero ApplicationSets, zero registered repositories, and only the stock
`default` AppProject.
- `INTENT.md` and `SCOPE.md` keep this repo focused on shared platform
services and secret custody. This work therefore creates a bootstrap
contract and secret-delivery convention, not app workload ownership.
- `docs/openbao.md` already states the preferred delivery pattern:
External Secrets Operator for values that become Kubernetes Secrets, CSI for
file-reference workloads, and no OpenBao injector in the current deployment.
## Follow-up Progress (2026-06-25)
- Added a platform-owned `railiance-platform-addons` AppProject for
cluster-scoped add-ons.
- Added the `external-secrets` ArgoCD Application for External Secrets
Operator and the `openbao-secretstore` Application for
`ClusterSecretStore/openbao`.
- Added the least-privilege OpenBao policy and Kubernetes auth role helper for
the issue-core ESO pilot. The role binds only the
`external-secrets/external-secrets` service account and reads only
`platform/workloads/issue-core/issue-core/*`.
- Limited the initial `ClusterSecretStore/openbao` to the `issue-core`
namespace; broaden only through a later platform review.
## Target State
- `argocd/bootstrap/` contains the two AppProjects and root app-of-apps
Application.
- `argocd/applications/` documents the tenant Application contract and includes
an `issue-core` example manifest.
- `argocd/repositories/` contains non-secret SOPS templates for ArgoCD
repository registration.
- `docs/argocd-gitops.md` answers the four questions raised by `issue-core`:
repository registration, source layout, sync policy, and secret delivery.
- Make targets exist for dry-run, deploy, status, and SOPS-backed repository
secret application.
- `issue-core` can author its final Application and workload manifests against
this contract without waiting for more platform design.
## Tasks
### T01 - Review intent and scope boundary
```task
id: RAILIANCE-WP-0004-T01
status: done
priority: high
state_hub_task_id: "7cb56ad6-5435-41af-b416-e68fe661b7a0"
```
Review `INTENT.md`, `SCOPE.md`, existing OpenBao delivery docs, and the
`issue-core` inbox request. Capture the boundary that ArgoCD bootstrap belongs
here only as a platform trust and secret-delivery contract.
### T02 - Add ArgoCD bootstrap manifests
```task
id: RAILIANCE-WP-0004-T02
status: done
priority: high
state_hub_task_id: "68f7ef19-686d-4d16-bf75-ffcbba158023"
```
Add AppProject manifests and the root app-of-apps Application under
`argocd/bootstrap/`.
Done when the manifests build cleanly with `kubectl apply -k` (for example as a
client-side dry run) and contain no secret material.
### T03 - Define tenant onboarding and repository registration
```task
id: RAILIANCE-WP-0004-T03
status: done
priority: high
state_hub_task_id: "e6dc9176-af33-4216-9871-a61ad7e69943"
```
Add documentation and templates for tenant Applications, per-repo ArgoCD
repository Secret registration, and the `issue-core` pilot example.
### T04 - Confirm OpenBao-backed secret delivery
```task
id: RAILIANCE-WP-0004-T04
status: done
priority: high
state_hub_task_id: "d859e4ef-d8d1-4403-8225-839925f8bedf"
```
Document that OpenBao remains the runtime custody authority, External Secrets
Operator is the default Kubernetes delivery mechanism, CSI is reserved for
file-reference workloads, and the OpenBao injector remains disabled.
### T05 - Operator live bootstrap
```task
id: RAILIANCE-WP-0004-T05
status: done
priority: high
state_hub_task_id: "981f46c0-8dd7-4111-9a4f-2ca58ddb0664"
```
Apply the bootstrap and repository credentials to live ArgoCD after these repo
changes are merged to the Git source ArgoCD reads and, if the source repo is
private, after an operator provides or materializes read-only repository
credentials through the approved OpenBao/operator path.
Applied 2026-06-19 on the live ArgoCD cluster (`92.205.130.254`, default
`~/.kube/config`). `make argocd-bootstrap-dry-run` and
`make argocd-bootstrap-deploy` succeeded. Repository registration was skipped
because `railiance-platform` and `issue-core` Gitea repos are currently public.
Post-bootstrap status:
- `railiance-apps-root`: Synced / Healthy
- `issue-core`: OutOfSync / Missing — sync fails because
`external-secrets.io/ExternalSecret` CRD is not installed on the cluster
Do not paste credentials into the workplan, State Hub, or chat.
### T06 - Notify first tenant
```task
id: RAILIANCE-WP-0004-T06
status: done
priority: medium
state_hub_task_id: "73bdda1d-8e25-48d2-ab92-b203c5050d45"
```
Reply to `issue-core` with the GitOps contract pointer and confirm that it owns
the final `issue-core` Application proposal and workload manifests. Include the
OpenBao path convention for `ISSUE_CORE_API_KEY` and the Gitea backend token.
State Hub reply: `56df276d-77d0-427f-92a5-a99cacc1290f`.


@@ -0,0 +1,359 @@
---
id: RAILIANCE-WP-0006
type: workplan
title: "Workload KV Access Lanes for ops-warden Fetch"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 6
created: "2026-06-27"
updated: "2026-06-29"
depends_on_workplans:
- RAIL-PL-WP-0002
- RAILIANCE-WP-0004
related_state_hub_messages:
- "551031d1-335e-4db8-9535-820fea52d0a3"
- "f76d3a9e-a98f-4081-885d-b79d94312699"
state_hub_workstream_id: "96c8a93d-7a5a-4fa9-8f7b-865119551da3"
---
# RAILIANCE-WP-0006 - Workload KV Access Lanes for ops-warden Fetch
## Goal
Provision concrete, least-privilege OpenBao workload KV read lanes that
`ops-warden` can expose through `warden access --fetch` / `--exec` without
holding secret values itself.
The immediate request is for `whynot-design` to retrieve its npm publish token.
The path must be concrete, policy-scoped, and documented so the ops-warden
catalog can replace the current unresolved template path with a live
`whynot-design-npm-publish` entry.
No task in this workplan may paste, commit, log, or send secret values through
Git, State Hub, chat, prompts, or workplan text.
## Requirements Reviewed
Ops-warden message `551031d1-335e-4db8-9535-820fea52d0a3` asks
`railiance-platform` to provide non-secret pointers for:
- a concrete OpenBao KV path and field for `NPM_AUTH_TOKEN`;
- the KV mount used by the path;
- the OIDC login role for whynot-design or its operator identity;
- a read policy scoped to whynot-design's identity/service account;
- the flex-auth policy reference, if pre-approval is required.
Once these pointers are live, ops-warden will add a dedicated
`whynot-design-npm-publish` access catalog entry and a playbook, then notify
whynot-design.
## Proposed Contract
Use the workload credential convention documented in `docs/openbao.md`:
```text
platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>
```
For this lane, the proposed non-secret contract is:
| Item | Proposed value |
| --- | --- |
| KV mount | `platform` |
| Tenant/org | `coulomb` |
| Workload/project | `whynot-design` |
| CLI path | `platform/workloads/coulomb/whynot-design/npm-publish` |
| KV-v2 policy data path | `platform/data/workloads/coulomb/whynot-design/npm-publish` |
| KV-v2 policy metadata path | `platform/metadata/workloads/coulomb/whynot-design/npm-publish` |
| Secret field | `NPM_AUTH_TOKEN` |
| OpenBao read policy | `workload-kv-read-whynot-design-npm-publish` |
| OIDC auth mount | `netkingdom` unless KeyCape compatibility requires `keycape` |
| OIDC role | `whynot-design-workload-kv-read` |
| Kubernetes auth role | `whynot-design-workload-kv-read` if an in-cluster service account consumes it |
| flex-auth ref | `secret.read:whynot-design` if tenant policy requires pre-approval |
The expected caller-facing read shape is:
```bash
bao login -method=oidc -path=netkingdom role=whynot-design-workload-kv-read
bao kv get -field=NPM_AUTH_TOKEN platform/workloads/coulomb/whynot-design/npm-publish
```
The command shape is illustrative only. Verification must avoid printing the
secret value; use attended operator checks or commands that prove read access
without persisting the token in logs.
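The table lists three spellings of the same lane because KV-v2 inserts `data/` and `metadata/` after the mount in policy paths, while the CLI path omits them. A minimal sketch of that mapping follows; the function name is hypothetical.

```python
def kv2_policy_paths(mount: str, cli_path: str) -> dict:
    """Derive the KV-v2 policy paths that correspond to a `bao kv` CLI path."""
    prefix = mount.rstrip("/") + "/"
    if not cli_path.startswith(prefix):
        raise ValueError(f"{cli_path!r} is not under mount {mount!r}")
    relative = cli_path[len(prefix):]
    return {
        "data": f"{prefix}data/{relative}",
        "metadata": f"{prefix}metadata/{relative}",
    }


paths = kv2_policy_paths(
    "platform", "platform/workloads/coulomb/whynot-design/npm-publish"
)
print(paths["data"])
print(paths["metadata"])
```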
## Tasks
## T01 - Capture ops-warden request and path contract
```task
id: RAILIANCE-WP-0006-T01
status: done
priority: high
state_hub_task_id: "0c93496a-48bf-44e7-a75b-52e51e2639bc"
```
Record the ops-warden request, existing workload path convention, and proposed
whynot-design contract in this workplan.
Acceptance:
- The workplan names the concrete path, field, mount, policy, auth role, and
optional flex-auth ref needed by ops-warden.
- The plan distinguishes non-secret pointers from secret values.
- The plan keeps this workload KV read lane separate from
`RAILIANCE-WP-0005`, which tracks short-lived OpenBao token issuance for the
ops-warden signing smoke.
**2026-06-27:** Reviewed the unread ops-warden request and existing
`platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>` convention.
Captured the proposed `whynot-design` npm publish lane above with no secret
values.
## T02 - Add least-privilege OpenBao read policy
```task
id: RAILIANCE-WP-0006-T02
status: done
priority: high
state_hub_task_id: "9c06d531-2566-4767-aa2f-8339605f23d5"
```
Create a concrete policy artifact for the whynot-design npm publish lane,
derived from `openbao/policies/workload-kv-read-template.hcl` but narrowed to
the selected `npm-publish` path.
Acceptance:
- A policy file under `openbao/policies/` defines read access to the exact
`platform/data/workloads/coulomb/whynot-design/npm-publish` path.
- Metadata/list capabilities are only as broad as needed for the caller and
ops-warden fetch UX.
- The policy grants no write, delete, patch, sudo, auth, or unrelated workload
capabilities.
- The policy name matches the pointer intended for ops-warden:
`workload-kv-read-whynot-design-npm-publish`.
**2026-06-27:** Added the concrete policy artifact at
`openbao/policies/workload-kv-read-whynot-design-npm-publish.hcl`. It grants
only `read` on the exact KV-v2 data and metadata paths for
`platform/workloads/coulomb/whynot-design/npm-publish`; it does not grant
write/delete/list/sudo/auth or sibling workload access. Added
`scripts/openbao-apply-workload-kv-lanes.sh`,
`make openbao-workload-kv-lanes-dry-run`, and
`make openbao-configure-workload-kv-lanes` for the source-owned policy apply
step. Dry-run passed. A live apply attempt with
`OPENBAO_WORKLOAD_KV_ARGS=--use-token-helper` reached unsealed OpenBao but was
denied with `403 permission denied` while writing the policy, so live policy
application waits on an approved platform-admin/operator token or a narrow
token-helper capability.
**2026-06-28:** Using the temporary operator token provided outside the repo,
Codex applied/confirmed the live policy in OpenBao. The verification read of the
policy succeeded and no secret values were printed or recorded.
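For orientation, a policy of the shape described above (only `read`, on the exact data and metadata paths) could be rendered like this. The sketch is an assumption about the general HCL shape and does not reproduce the checked-in `workload-kv-read-whynot-design-npm-publish.hcl`; the function name is hypothetical.

```python
def render_read_policy(data_path: str, metadata_path: str) -> str:
    """Render an OpenBao policy granting only `read` on two exact paths."""
    for path in (data_path, metadata_path):
        # Exact paths only: a glob would widen the lane to sibling secrets.
        if "*" in path or "+" in path:
            raise ValueError(f"wildcards are not allowed in {path!r}")
    blocks = [
        f'path "{path}" {{\n  capabilities = ["read"]\n}}\n'
        for path in (data_path, metadata_path)
    ]
    return "\n".join(blocks)


policy_hcl = render_read_policy(
    "platform/data/workloads/coulomb/whynot-design/npm-publish",
    "platform/metadata/workloads/coulomb/whynot-design/npm-publish",
)
print(policy_hcl)
```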
## T03 - Define and apply auth bindings
```task
id: RAILIANCE-WP-0006-T03
status: done
priority: high
state_hub_task_id: "a217371a-0f85-40c6-b691-ac67834c86b5"
```
Define the auth role that lets whynot-design or an approved operator identity
read the lane as itself.
Acceptance:
- The OIDC login role is documented as
`bao login -method=oidc -path=netkingdom role=whynot-design-workload-kv-read`,
or a different approved role is recorded with the reason.
- The role attaches only the whynot-design npm publish read policy.
- If an in-cluster whynot-design service account consumes the token, the
Kubernetes auth role binds only the approved namespace and service account.
- Compatibility with the legacy `keycape` auth mount is either configured or
explicitly declined.
**2026-06-27:** Documented the intended OIDC role pointer as
`auth/netkingdom/role/whynot-design-workload-kv-read` in
`docs/workload-kv-access-lanes.md`. Live application is waiting on confirmation
of the KeyCape/NetKingdom whynot-design bound claim or approved service-account
subject; do not create an unbounded OIDC role.
**2026-06-28:** Created/confirmed
`auth/netkingdom/role/whynot-design-workload-kv-read` with
`groups=["whynot-design"]`, only the
`workload-kv-read-whynot-design-npm-publish` policy, `ttl=15m`, and the approved
browser/local CLI callback URIs.
**2026-06-28:** Positive verification found the OIDC role was missing
`oidc_scopes`, causing OpenBao login to fail with `groups claim not found`.
Updated the live role and source CCR to request `openid`, `profile`, `email`,
and `groups`, matching the platform-admin OIDC scope shape.
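The role settings recorded above map onto the JWT/OIDC auth role parameters roughly as sketched below. Several values are assumptions, not the live configuration: `user_claim="sub"`, `groups_claim="groups"`, and the redirect URI (the conventional local CLI callback, used here as a placeholder for the approved URIs, which this workplan does not list). The function name is hypothetical.

```python
def oidc_role_payload(policy: str, group: str, redirect_uris: list) -> dict:
    """Build a role payload for auth/<mount>/role/<name>, bounded by one group."""
    if not group:
        # Mirrors the rule above: never create an unbounded OIDC role.
        raise ValueError("refusing to build an OIDC role without a bound group")
    if not redirect_uris:
        raise ValueError("OIDC roles need at least one allowed redirect URI")
    return {
        "role_type": "oidc",
        "user_claim": "sub",        # assumption
        "groups_claim": "groups",   # assumption
        "bound_claims": {"groups": [group]},
        "oidc_scopes": ["openid", "profile", "email", "groups"],
        "allowed_redirect_uris": list(redirect_uris),
        "token_policies": [policy],
        "token_ttl": "15m",
    }


role = oidc_role_payload(
    "workload-kv-read-whynot-design-npm-publish",
    "whynot-design",
    ["http://localhost:8250/oidc/callback"],  # placeholder callback URI
)
print(sorted(role))
```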
## T04 - Provision the KV path without exposing the token
```task
id: RAILIANCE-WP-0006-T04
status: done
priority: high
state_hub_task_id: "c43724a3-c83e-4ab6-b7d1-e427fd93a9a9"
```
Have an approved operator create or confirm the OpenBao KV entry for the npm
publish token.
Acceptance:
- The path exists at
`platform/workloads/coulomb/whynot-design/npm-publish`.
- The field is named exactly `NPM_AUTH_TOKEN`.
- The token value is entered through an approved operator/OpenBao path and is
never written to Git, State Hub, chat, prompts, shell history, or workplan
text.
- Non-secret evidence records only the path, field name, actor, timestamp,
policy name, and verification result.
**2026-06-27:** The concrete path and field are now documented. Live secret
provisioning is waiting on an approved operator/OpenBao custody path for the
actual `NPM_AUTH_TOKEN` value.
**2026-06-28:** Confirmed the OpenBao metadata at
`platform/workloads/coulomb/whynot-design/npm-publish` includes
`catalog-id=whynot-design-npm-publish` and that the `NPM_AUTH_TOKEN` field is
present. The value was not printed, recorded, or copied into Git, State Hub,
chat, or workplans.
## T05 - Verify caller-scoped fetch behavior
```task
id: RAILIANCE-WP-0006-T05
status: done
priority: high
state_hub_task_id: "dc1f470b-e78a-48a9-9957-965aed47861f"
```
Prove that the authorized identity can read the token through the intended
OpenBao path and that unauthorized identities cannot.
Acceptance:
- An approved whynot-design identity or operator role can authenticate and
perform the fetch without unresolved `<...>` placeholders.
- Negative verification shows a non-whynot identity cannot read the path.
- Verification output contains no token value.
- OpenBao audit evidence exists for the authorized read and denied read, with
only non-secret request ids/timestamps recorded in the workplan or State Hub.
**2026-06-27:** Verification is waiting on live policy/role application and
secret provisioning. The runbook requires positive and negative fetch evidence
without printing the token value.
**2026-06-28:** Non-secret operator checks now pass for policy, auth role,
metadata, and field presence. Remaining verification is the attended
whynot-design OIDC positive check and a non-whynot denial check, both without
printing the token.
**2026-06-29:** Positive and negative caller verification passed without
printing the token value. The negative check failed OIDC login with the expected
groups bound-claim mismatch. `platform-root` was restored to the
`whynot-design` group after the temporary negative-test removal.
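One way to produce non-secret evidence from a read is to reduce the response to a presence flag before anything is printed or stored. The sketch below assumes the caller already holds the secret's field map in memory; the function name and evidence shape are hypothetical, and the value used here is a dummy.

```python
def field_presence_evidence(secret_fields: dict, field: str) -> dict:
    """Report whether a field exists and is non-empty, without returning its value."""
    value = secret_fields.get(field)
    return {
        "field": field,
        "present": isinstance(value, str) and len(value) > 0,
    }


# Dummy stand-in for a KV read result; never a real token.
dummy_fields = {"NPM_AUTH_TOKEN": "dummy-not-a-real-token"}
evidence = field_presence_evidence(dummy_fields, "NPM_AUTH_TOKEN")
print(evidence)
```

Only `evidence` is safe to log or record; the field map itself should go out of scope as soon as the check completes.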
## T06 - Coordinate ops-warden catalog activation
```task
id: RAILIANCE-WP-0006-T06
status: done
priority: high
state_hub_task_id: "8e84ec19-01db-4baf-a532-de87e51d4994"
```
Send ops-warden the non-secret pointers needed to create and activate its
dedicated access catalog entry.
Acceptance:
- The State Hub reply to ops-warden includes only path, field, KV mount,
OIDC role, policy name/path, optional flex-auth ref, and runbook location.
- Ops-warden confirms the `whynot-design-npm-publish` catalog entry no longer
contains unresolved placeholders.
- `warden access "npm auth token" --fetch` or the agreed exact selector resolves
to the whynot-design lane and proxies the read as the caller.
- ops-warden confirms it holds no token value and only proxies OpenBao access.
**2026-06-27:** Added `docs/workload-kv-access-lanes.md` with the non-secret
handoff payload for ops-warden and sent the pointers by State Hub message. The
entry should remain draft/non-active until live OpenBao provisioning and
verification complete.
**2026-06-28:** The generic `openbao-api-key` ops-warden access lane can proxy
the check with explicit `--path` and `--field`, but the dedicated
`whynot-design-npm-publish` route is not yet present in the ops-warden routing
catalog. Keep activation pending until caller verification and catalog update.
**2026-06-29:** `CCR-2026-0001` is now active with
`access_frontdoor.readiness=ready` and `resolvable=true`. ops-warden still needs
to confirm that its dedicated `whynot-design-npm-publish` catalog selector
resolves through the caller-scoped lane.
**2026-06-29:** ops-warden confirmed in State Hub message
`f76d3a9e-a98f-4081-885d-b79d94312699` that catalog selector
`whynot-design-npm-publish` is `status: active`, `resolvable: true`, and wired
to the owner-confirmed lane:
`platform/workloads/coulomb/whynot-design/npm-publish`, field
`NPM_AUTH_TOKEN`, OIDC role `whynot-design-workload-kv-read`, and policy
`workload-kv-read-whynot-design-npm-publish`. ops-warden also confirmed it
notified whynot-design with `warden access whynot-design-npm-publish --exec -- npm publish`,
and that the sibling lanes remain draft for separate planning.
## T07 - Decide whether to batch sibling workload-KV requests
```task
id: RAILIANCE-WP-0006-T07
status: done
priority: medium
state_hub_task_id: "0b3ab5f5-e933-41f2-b29a-ab4ac50593aa"
```
Ops-warden noted similar still-open access lanes for
`issue-core-ingestion-api-key` and `openrouter-llm-connect`. Decide whether to
batch those paths in the same provisioning pass or keep this workplan scoped to
whynot-design.
Acceptance:
- The decision is recorded without secret values.
- If batching is approved, add concrete sub-tasks or a follow-up workplan for
each additional lane.
- If batching is deferred, notify ops-warden that this workplan will deliver
whynot-design first and leave the sibling entries for separate planning.
**2026-06-27:** Initially deferred sibling lanes (`issue-core-ingestion-api-key`
and `openrouter-llm-connect`) so the whynot-design npm token request could be
serviced first. The later ops-warden batch follow-up is now represented as
proposed CCRs in `RAILIANCE-WP-0007`, still unapproved and unresolvable until
human review and verification.
**2026-06-29:** Reviewed the sibling lane suggestions against `INTENT.md`.
Created follow-up workplans `RAILIANCE-WP-0009` for the issue-core runtime
ingestion credential lane and `RAILIANCE-WP-0010` for the llm-connect
OpenRouter provider key lane. Both plans keep this repo's scope limited to
shared platform secret custody, least-privilege OpenBao/External Secrets
delivery, verification, and ops-warden front-door handoff.
## Exit Criteria
- The whynot-design npm publish token has a concrete OpenBao KV path, field,
read policy, and auth role.
- The authorized caller can fetch the token as itself through OpenBao and
ops-warden without ops-warden storing the value.
- Unauthorized reads are denied.
- ops-warden has enough non-secret pointers to activate
`whynot-design-npm-publish`.
- No secret values appear in Git, State Hub, chat, prompts, logs, or workplans.


@@ -0,0 +1,404 @@
---
id: RAILIANCE-WP-0007
type: workplan
title: "Credential Change Proposal Review Workflow"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 7
created: "2026-06-27"
updated: "2026-06-30"
depends_on_workplans:
- RAIL-PL-WP-0002
- RAILIANCE-WP-0005
- RAILIANCE-WP-0006
state_hub_workstream_id: "4d7ce243-f40a-4249-a46a-a24f75d6fe4c"
---
# RAILIANCE-WP-0007 - Credential Change Proposal Review Workflow
## Goal
Create a proposal -> review -> approve/deny with comment -> apply -> verify
workflow for credential and IT-security changes, so operators do not need to
author or mentally validate raw OpenBao commands.
The first target is the whynot-design npm token lane from `RAILIANCE-WP-0006`.
The workflow should then generalize to workload KV paths, OpenBao token roles,
ops-warden access catalog entries, External Secrets lanes, credential rotation,
deactivation, and compromise handling.
## Direction
Do not start by extending OpenBao. Instead, build a small approval control
plane around OpenBao:
- OpenBao remains the enforcement, secret storage, token, and audit engine.
- State Hub stores non-secret request lifecycle, comments, decisions, and
evidence.
- Repo files store reviewable non-secret request specs and generated policy
artifacts.
- Agents and CLIs create proposals and render them for human review.
- Humans approve or deny with comments.
- Only approved requests can be applied by an operator-controlled runner or
interactive runbook.
If the workflow proves valuable, a later UI or OpenBao extension can surface the
same request index and statuses.
## Proposed Object
Introduce a non-secret Credential Change Request, or `CCR`.
Each CCR captures:
- request id, title, requester, reviewer, approver, and applier;
- target tenant/workload/environment/purpose;
- OpenBao mount, path, fields, policies, auth roles, and bound claims;
- access front door such as ops-warden, External Secrets, CSI, or direct caller
fetch;
- risk classification and approval requirements;
- generated apply plan and verification plan;
- rollback, deactivate, rotate, and compromise response plan;
- comments, decision, timestamps, and non-secret audit evidence.
Each CCR explicitly excludes secret values, token values, private keys,
passwords, unseal/recovery material, and secret-bearing command output.
## Tasks
## T01 - Record the approval workflow design
```task
id: RAILIANCE-WP-0007-T01
status: done
priority: high
state_hub_task_id: "c82ee783-80f1-48da-a9ed-4565eac699fc"
```
Document the desired operator workflow and why it should sit around OpenBao
rather than inside the OpenBao UI initially.
Acceptance:
- The design describes the proposal, review, approval/denial, apply, verify,
activate, deactivate, rotate, and compromised states.
- The design names where State Hub, OpenBao, ops-warden, repo files, agents,
and interactive runbooks fit.
- The design keeps secret values out of State Hub, Git, chat, and prompts.
**2026-06-27:** Added `docs/credential-change-approval.md` with the control
plane direction, CCR object, state machine, State Hub/OpenBao/ops-warden roles,
interactive runbook role, and compromise/deactivation path.
## T02 - Define the CCR schema and storage layout
```task
id: RAILIANCE-WP-0007-T02
status: done
priority: high
state_hub_task_id: "d50fb9e2-68c2-4a2b-8476-ce646d13e60a"
```
Create a versioned non-secret schema for credential change requests.
Acceptance:
- A schema exists for `workload-kv-read` requests covering mount, path, fields,
policy name, auth role, bound claims, access front door, verification plan,
and activation conditions.
- The schema supports decision metadata: requested, proposed, approved,
denied, needs_changes, applied, verified, active, deactivated, rotated,
compromised, superseded, and cancelled.
- The schema supports comments and references State Hub ids without storing
secrets.
- Example CCR fixtures include the whynot-design npm token lane.
**2026-06-27:** Added `schemas/credential-change-request.schema.yaml`, the
`credential-change-requests/` storage directory, and
`credential-change-requests/CCR-2026-0001-whynot-design-npm-publish.yaml` as the
first non-secret CCR fixture. The whynot CCR is intentionally `proposed` and
marks the bound claim as unconfirmed, so apply is blocked until review.
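To make the acceptance criteria concrete, a `workload-kv-read` CCR could be modeled as below. The status set is taken from the list above; every field name is a hypothetical stand-in, since the actual `schemas/credential-change-request.schema.yaml` is not reproduced in this workplan.

```python
from dataclasses import dataclass, field

CCR_STATUSES = frozenset({
    "requested", "proposed", "approved", "denied", "needs_changes",
    "applied", "verified", "active", "deactivated", "rotated",
    "compromised", "superseded", "cancelled",
})


@dataclass
class WorkloadKvReadRequest:
    """Non-secret description of a workload KV read lane (illustrative fields)."""
    ccr_id: str
    mount: str
    path: str
    fields: list
    policy_name: str
    auth_role: str
    bound_claims: dict
    bound_claims_confirmed: bool = False
    status: str = "proposed"
    comments: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.status not in CCR_STATUSES:
            raise ValueError(f"unknown CCR status: {self.status!r}")


ccr = WorkloadKvReadRequest(
    ccr_id="CCR-2026-0001",
    mount="platform",
    path="platform/workloads/coulomb/whynot-design/npm-publish",
    fields=["NPM_AUTH_TOKEN"],  # field name only, never the value
    policy_name="workload-kv-read-whynot-design-npm-publish",
    auth_role="whynot-design-workload-kv-read",
    bound_claims={"groups": ["whynot-design"]},
)
print(ccr.status, ccr.bound_claims_confirmed)
```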
## T03 - Add offline validation and rendering
```task
id: RAILIANCE-WP-0007-T03
status: done
priority: high
state_hub_task_id: "012f05cd-30ce-43dd-802b-4acc938db133"
```
Add a helper that validates CCR files and renders human review summaries.
Acceptance:
- Invalid CCRs fail before any OpenBao apply is attempted.
- The renderer produces a compact review block that a human can understand in
chat or State Hub.
- The renderer highlights risky fields: broad claims, wildcard paths,
privileged policies, missing negative verification, and missing deactivation
plan.
- A secret-pattern scan rejects likely token values in CCR files.
**2026-06-27:** Added `scripts/credential-change.py validate` and `render`,
plus Make targets `credential-change-validate` and `credential-change-render`.
Validation rejects secret-looking markers and broad/unsafe request shapes; render
produces the chat/State Hub review summary and highlights unconfirmed bound
claims. CCRs now also carry machine-readable front-door readiness fields:
`access_frontdoor.readiness` and `access_frontdoor.resolvable`. Unit coverage
lives in `tests/test_credential_change.py`.
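The two checks described above (a secret-pattern scan and risk highlighting) might look roughly like this. It is a sketch, not the logic in `scripts/credential-change.py`: the patterns are two illustrative examples rather than the helper's actual marker list, and the CCR keys (`path`, `bound_claims`, `verification.negative`, `deactivation_plan`) are hypothetical.

```python
import re

# Illustrative patterns only; a real scan would cover more token shapes.
SECRET_PATTERNS = [
    re.compile(r"npm_[A-Za-z0-9]{36}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def contains_secret_marker(text: str) -> bool:
    """True when the text matches any known secret-looking pattern."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def risk_flags(ccr: dict) -> list:
    """Collect the risky shapes a reviewer should see highlighted."""
    flags = []
    if "*" in ccr.get("path", ""):
        flags.append("wildcard path")
    if not ccr.get("bound_claims"):
        flags.append("broad or missing bound claims")
    if not ccr.get("verification", {}).get("negative"):
        flags.append("missing negative verification")
    if not ccr.get("deactivation_plan"):
        flags.append("missing deactivation plan")
    return flags


draft = {
    "path": "platform/workloads/coulomb/whynot-design/npm-publish",
    "bound_claims": {"groups": ["whynot-design"]},
    "verification": {"positive": "attended OIDC fetch"},
}
print(risk_flags(draft))
```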
## T04 - Generate OpenBao apply plans from approved CCRs
```task
id: RAILIANCE-WP-0007-T04
status: done
priority: high
state_hub_task_id: "1b2e7752-815c-46f8-a2e2-212e8d04da80"
```
Generate deterministic, reviewable OpenBao apply plans from CCRs.
Acceptance:
- A workload KV CCR can generate policy HCL and auth-role commands or API
payloads.
- The plan includes a dry-run mode and a diff against existing source
artifacts when available.
- Applying a plan is refused unless the CCR is approved.
- The applier uses an approved operator authority path and does not accept raw
tokens in argv or logs.
**2026-06-27:** Added `plan` and guarded `apply-plan` rendering for workload KV
CCRs, with Make targets `credential-change-plan` and
`credential-change-apply-plan`. `apply-plan` currently refuses any CCR that is
not `approved` and also refuses unconfirmed bound claims. Remaining T04 work is
to add a richer diff against existing source artifacts and eventually bridge
from reviewed plan to the interactive live applier.
**2026-06-28:** Added OIDC `allowed_redirect_uris` to the CCR contract and
generated role payloads after live OpenBao rejected an OIDC role without
callbacks. Unit coverage now checks the generated whynot-design role payload.
**2026-06-30:** Added source-artifact diff rendering to `plan` and delegated
`applier-dry-run` output. The generated plan now reports whether the checked-in
policy artifact matches the CCR-generated HCL and shows a unified diff when it
does not. Approved-only `apply-plan`/`operator-commands` remain gated by CCR
status and confirmed auth binding.
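The apply gate and the source-artifact diff can be sketched with the standard library alone. The function names are hypothetical and this is not the helper's actual code. It only illustrates the two rules stated above: refuse unless the CCR is `approved` with a confirmed binding, and show a unified diff when the checked-in policy differs from the generated one.

```python
import difflib
from typing import Optional


def refusal_reason(status: str, binding_confirmed: bool) -> Optional[str]:
    """Return why apply must be refused, or None when apply may proceed."""
    if status != "approved":
        return f"CCR status is {status!r}, not 'approved'"
    if not binding_confirmed:
        return "auth binding is not confirmed"
    return None


def policy_diff(checked_in: str, generated: str) -> str:
    """Unified diff of checked-in vs CCR-generated HCL; empty when identical."""
    return "".join(
        difflib.unified_diff(
            checked_in.splitlines(keepends=True),
            generated.splitlines(keepends=True),
            fromfile="checked-in",
            tofile="ccr-generated",
        )
    )


generated_hcl = 'path "platform/data/example" {\n  capabilities = ["read"]\n}\n'
drifted_hcl = generated_hcl.replace('["read"]', '["read", "list"]')

print(refusal_reason("proposed", False))
print(policy_diff(drifted_hcl, generated_hcl))
```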
## T05 - Add chat/CLI approval commands
```task
id: RAILIANCE-WP-0007-T05
status: done
priority: high
state_hub_task_id: "e6d4d2d1-1881-4db7-92f8-05e3fdb846ae"
```
Make the workflow usable from chat and command line.
Acceptance:
- Operators can approve, deny, or request changes with a comment.
- Approvals/denials are recorded as non-secret State Hub events and in the CCR
file or linked decision record.
- The system refuses apply when the latest human decision is denied or
needs_changes.
- Agents can propose changes and respond to review comments without receiving
secret values.
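The "latest human decision" rule in the acceptance list can be sketched as below. The review-record shape (a list of dicts with a `decision` key, oldest first) is an assumption for illustration.

```python
def latest_decision(reviews):
    """Return the most recent decision, or None when nobody has reviewed yet."""
    return reviews[-1]["decision"] if reviews else None


def apply_permitted(reviews):
    """Permit apply only when the latest human decision is an approval."""
    return latest_decision(reviews) == "approved"


history = [
    {"reviewer": "operator-a", "decision": "approved"},
    {"reviewer": "operator-b", "decision": "needs_changes"},
]
```

With this history, apply stays blocked: an earlier approval does not outlive a later `needs_changes`.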
**2026-06-27:** Added file-backed `approve`, `deny`, and `needs-changes`
commands that require reviewer and comment text and append non-secret review
comments to the CCR. Added `confirm-binding` for explicit non-secret auth
binding confirmation. Added `status` plus Make targets
`credential-change-status` and `credential-change-status-json` so ops-warden can
consume `readiness`/`resolvable` without scraping prose. Created a State Hub
decision for `CCR-2026-0001` and added `sync-decision` so resolved State Hub
decisions can update the file-backed CCR status. Remaining T05 work is State
Hub decision-event emission and tighter chat integration.
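A consumer such as ops-warden could read the JSON status along these lines. The JSON shape is an assumption: only the `readiness` and `resolvable` keys are named in this workplan, and `lane_is_resolvable` is a hypothetical helper.

```python
import json


def lane_is_resolvable(status_json: str) -> bool:
    """Treat a lane as usable only when it is ready and explicitly resolvable."""
    status = json.loads(status_json)
    return status.get("readiness") == "ready" and status.get("resolvable") is True


template_lane = '{"readiness": "template", "resolvable": false}'
active_lane = '{"readiness": "ready", "resolvable": true}'
```

Checking both fields means a missing or malformed key is treated as not resolvable.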
**2026-06-30:** Added optional `--record-state-hub` emission for approve, deny,
and needs-changes commands. Review comments are checked for known secret markers
before being written, and the State Hub progress event records only non-secret
CCR id/path/policy/field/auth-role metadata plus the reviewer comment.
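The secret-marker check on review comments could be sketched as below. The marker patterns are illustrative assumptions; the actual list used by the tooling is not given in this workplan.

```python
import re

# Illustrative markers only; not the tooling's real list.
SECRET_MARKERS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "npm-style token": re.compile(r"\bnpm_[A-Za-z0-9]{20,}"),
    "token assignment": re.compile(r"(?i)\b(token|password|secret)\s*[=:]\s*\S{8,}"),
}


def find_secret_markers(comment: str) -> list[str]:
    """Return the names of markers found, never the matched text itself."""
    return [name for name, pattern in SECRET_MARKERS.items() if pattern.search(comment)]


def check_review_comment(comment: str) -> str:
    """Return the comment unchanged, or raise if it looks like it holds a secret."""
    hits = find_secret_markers(comment)
    if hits:
        raise ValueError("comment rejected, looks like it contains: " + ", ".join(hits))
    return comment
```

The error names the marker category only, so a rejected comment does not leak its contents into logs.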
## T06 - Build an interactive runbook for apply and verify
```task
id: RAILIANCE-WP-0007-T06
status: done
priority: high
state_hub_task_id: "3c3fc38c-afa4-4367-b3e6-ba4b286ced30"
```
Wrap privileged application in an operator-friendly guided runbook.
Acceptance:
- The runbook loads an approved CCR, shows the plan, asks for final attended
confirmation, then applies policy/auth metadata.
- Secret value entry is handled through an approved OpenBao/operator path and
is never echoed or logged.
- Positive and negative verification steps are guided.
- Non-secret evidence is recorded automatically.
**2026-06-30:** Added `scripts/credential-change.py runbook <CCR>` and Make
target `credential-change-runbook` to render the attended operator checklist,
final confirmation phrase, metadata apply guidance, secret custody instructions,
positive/negative verification steps, activation conditions, and evidence
commands. `runbook --execute-metadata` is opt-in, requires the exact `APPLY
<CCR-ID>` confirmation phrase, uses the local `bao` CLI with ambient approved
operator authority, writes only policy/auth metadata, and records a non-secret
`metadata_apply` evidence entry. Added `record-evidence` plus Make target
`credential-change-record-evidence` so operators can append apply, secret
provisioning, verification, and activation evidence to the CCR and optionally
State Hub without storing secret values.
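The confirmation phrase and token-free command line can be sketched as follows. The helper names are hypothetical. `bao policy write <name> <file>` is the standard OpenBao CLI form, and the sketch assumes authentication comes from the operator's ambient environment rather than from arguments.

```python
def expected_phrase(ccr_id: str) -> str:
    """The exact phrase the operator must type before metadata is applied."""
    return f"APPLY {ccr_id}"


def confirmed(ccr_id: str, typed: str) -> bool:
    """Exact match only; a trailing newline from input is tolerated."""
    return typed.rstrip("\n") == expected_phrase(ccr_id)


def policy_write_argv(policy_name: str, hcl_path: str) -> list[str]:
    """Build argv for the local bao CLI; no token ever appears in it."""
    return ["bao", "policy", "write", policy_name, hcl_path]
```

Passing the argv list to `subprocess.run` without `shell=True` keeps the policy name and path from being reinterpreted by a shell.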
## T07 - Pilot with whynot-design and ops-warden
```task
id: RAILIANCE-WP-0007-T07
status: done
priority: high
state_hub_task_id: "07a7d8bf-5528-41c8-a791-d6ccd0466a33"
```
Use the existing whynot-design npm token lane as the first end-to-end pilot.
Acceptance:
- The current whynot-design lane is represented as a CCR.
- The CCR is rendered and reviewed in chat or State Hub.
- A human approval or denial comment is recorded.
- If approved, the runbook applies the policy/auth metadata, guides secret
provisioning, verifies access, and notifies ops-warden.
- ops-warden activates its catalog entry only after CCR verification.
**2026-06-27:** The whynot-design lane is represented as `CCR-2026-0001` and
can be rendered for review. The whynot-design bound claim was confirmed from
operator chat context and recorded in the CCR, but the CCR itself remains
proposed and unapproved, so live apply and ops-warden activation are correctly
blocked.
**2026-06-27:** Converted the ops-warden batch follow-up
`fe5b1696-8956-4bd5-9d6f-dbde1901a076` into three proposed CCRs:
`CCR-2026-0001` for `whynot-design-npm-publish`, `CCR-2026-0002` for
`issue-core-ingestion-api-key`, and `CCR-2026-0003` for
`llm-connect-openrouter-api-key`. All three are explicitly `readiness: template`
and `resolvable: false` until owner confirmation, approval, OpenBao apply,
secret provisioning, and verification are complete.
**2026-06-28:** Synced State Hub decision
`250669d0-8475-4527-9624-cd072249f9a9` into `CCR-2026-0001`; the lane is now
`approved` with confirmed binding and `apply_allowed: true`. A live OpenBao
policy apply using the available token helper reached the active OpenBao pod but
still failed with `403 permission denied` on
`sys/policies/acl/workload-kv-read-whynot-design-npm-publish`, so the front door
remains `readiness: template` and `resolvable: false`. Added guarded
`credential-change-operator-commands` output so a platform operator can run the
reviewed non-secret policy and auth-role commands without hand-writing them;
secret value provisioning and verification remain under approved custody.
**2026-06-28:** After correcting the tenant/org to `coulomb`, the corrected
approval was synced from State Hub decision
`e6381a56-6b04-4fd5-b2de-f3ef59cde888`; `CCR-2026-0001` is approved and
`apply_allowed: true` for
`platform/workloads/coulomb/whynot-design/npm-publish`. The operator reported
that secret provisioning had likely completed, but Codex metadata-only
verification still received `403 permission denied`. Prepared
`docs/whynot-design-npm-publish-handoff.md` as the next-session checklist for
policy, auth-role, metadata verification, positive verification, negative
verification, and activation without printing the token.
**2026-06-28:** With the temporary operator token, Codex applied/confirmed the
OpenBao read policy and OIDC role, confirmed metadata `catalog-id`, and confirmed
`NPM_AUTH_TOKEN` field presence without printing or recording the value. The CCR
now records non-secret evidence for that apply check. Positive verification as
the whynot-design caller and negative verification as a non-whynot-design caller
still gate `active`/`ready`.
**2026-06-29:** The whynot-design pilot completed OpenBao verification. Positive
fetch succeeded with output suppressed, negative login failed with the expected
groups bound-claim mismatch, `platform-root` membership was restored afterward,
and `CCR-2026-0001` is now active/ready/resolvable. ops-warden catalog
confirmation remains the external closeout step.
**2026-06-30:** Closed the pilot task based on the active/ready/resolvable CCR
state and prior ops-warden catalog confirmation that the selector is active and
resolvable. The remaining lifecycle work is now tracked separately in T08.
## T08 - Add deactivation, rotation, and compromise flows
```task
id: RAILIANCE-WP-0007-T08
status: done
priority: medium
state_hub_task_id: "23d6ef9d-8dbc-4468-b486-5ec8ada71130"
```
Support lifecycle states beyond initial creation.
Acceptance:
- Existing credentials can be imported as CCR-backed inventory without secret
values.
- Operators can mark a lane deactivated, rotated, or compromised with reason
and evidence.
- Deactivation disables the relevant access front door and auth/policy path.
- Compromise flow records blast-radius notes and required follow-up tasks.
**2026-06-30:** Added `lifecycle-plan`, `lifecycle-event`, and
`import-inventory` commands plus Make targets. Lifecycle plans render
deactivation, rotation, and compromise guidance, including access-front-door
state changes and OpenBao metadata disable commands for deactivation or
compromise. Lifecycle events update CCR status/front-door readiness, append
non-secret lifecycle evidence, and optionally post State Hub progress.
Compromise events accept non-secret blast-radius and follow-up references.
`import-inventory` can create a CCR-backed inventory file and matching read
policy artifact for an existing lane without asking for or storing secret
values.
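The lifecycle-event behaviour can be sketched as below. The `Lane` shape, the status values, and the rule that only deactivation and compromise close the front door are assumptions drawn from the acceptance list, not from the implementation.

```python
from dataclasses import dataclass, field

EVENTS = {"deactivated", "rotated", "compromised"}
FRONT_DOOR_DISABLING = {"deactivated", "compromised"}


@dataclass
class Lane:
    # Hypothetical, simplified view of a CCR-backed lane.
    ccr_id: str
    status: str = "active"
    resolvable: bool = True
    evidence: list = field(default_factory=list)


def record_lifecycle_event(lane, event, reason, blast_radius=None):
    """Update lane state and append a non-secret evidence entry."""
    if event not in EVENTS:
        raise ValueError(f"unknown lifecycle event: {event!r}")
    if not reason.strip():
        raise ValueError("a reason is required")
    if event == "compromised" and not blast_radius:
        raise ValueError("compromise events need blast-radius notes")
    lane.status = event
    if event in FRONT_DOOR_DISABLING:
        lane.resolvable = False  # close the access front door
    entry = {"event": event, "reason": reason}
    if blast_radius:
        entry["blast_radius"] = blast_radius
    lane.evidence.append(entry)
    return lane
```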
## T09 - Add decision templates and guided review actions
```task
id: RAILIANCE-WP-0007-T09
status: done
priority: high
state_hub_task_id: "c436fd8b-cd82-4600-81b0-87ec069d7ae6"
```
Remove the current friction where reviewers must know magic rationale prefixes
for State Hub decisions to sync back into CCR status.
Acceptance:
- Each CCR review page or chat handoff shows explicit approve, deny, and
needs-changes templates.
- Generated templates include the accepted prefixes (`APPROVE:`, `DENY:`, and
`NEEDS_CHANGES:`) and pre-fill the CCR id, corrected path, policy, auth role,
and non-secret rationale prompt.
- The dashboard or agent response links directly to the decision and states what
phrase or button will be recognized.
- The sync tooling refuses ambiguous free-text approvals with a friendly message
that shows the valid templates.
- Future UI work can replace prefix parsing with structured decision outcomes
without changing the CCR audit trail.
**2026-06-30:** Added `scripts/credential-change.py decision-templates <CCR>`
and Make target `credential-change-decision-templates`. The generated templates
include accepted prefixes, CCR id, KV path, policy, auth-role path, and the
linked State Hub decision. Ambiguous State Hub rationale text now fails with the
valid templates in the error message.
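Prefix recognition with a friendly failure can be sketched as below. The three prefixes come from the acceptance list; `parse_decision`, `AmbiguousDecision`, and the outcome names are hypothetical.

```python
PREFIXES = {
    "APPROVE:": "approved",
    "DENY:": "denied",
    "NEEDS_CHANGES:": "needs_changes",
}


class AmbiguousDecision(ValueError):
    """Raised when rationale text does not start with an accepted prefix."""


def parse_decision(rationale: str):
    """Return (outcome, comment) or raise with the list of valid prefixes."""
    text = rationale.lstrip()
    for prefix, outcome in PREFIXES.items():
        if text.startswith(prefix):
            return outcome, text[len(prefix):].strip()
    valid = ", ".join(PREFIXES)
    raise AmbiguousDecision(f"Rationale must start with one of: {valid}")
```

Keeping the mapping in one table makes it straightforward to replace prefix parsing with structured outcomes later, since callers only see the outcome names.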
## Exit Criteria
- A human can review and approve or deny a credential/security change without
writing raw OpenBao commands.
- An approved request can be applied by an operator-controlled helper or
interactive runbook.
- State Hub and repo artifacts contain non-secret lifecycle, decision, and
evidence records.
- OpenBao remains the enforcement and audit source for actual secret access.
- The whynot-design npm token lane can complete through this workflow.