Archive closed workplans to workplans/archived/ (ADR-001)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@@ -0,0 +1,373 @@
---
id: RAIL-PL-WP-0002
type: workplan
title: "OpenBao Platform Secrets Service"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 2
created: "2026-05-17"
updated: "2026-05-29"
depends_on:
  - RAIL-PL-WP-0001
state_hub_workstream_id: "fd1c045a-01d4-43be-980f-acbda6c64e6c"
---

# RAIL-PL-WP-0002 - OpenBao Platform Secrets Service

## Goal

Establish OpenBao as the canonical Railiance S3 platform secrets service,
or define a controlled transition path from existing HashiCorp Vault
assumptions to OpenBao.

This workplan belongs in `railiance-platform` because S3 owns shared
platform services: secret management, identity integration, object
storage, backups, and other services consumed by S5 applications.

## Context

OpenBao is an open-source, Linux Foundation-governed fork of Vault for
managing, storing, and distributing secrets, certificates, and keys.
The official OpenBao documentation includes Kubernetes deployment via
Helm, CSI provider support, dynamic database secrets, Kubernetes service
account token generation, and lease/revocation semantics.

Current local architecture references still mention HashiCorp Vault in
several places, especially credential bootstrap and ops-warden's Vault
SSH backend. Railiance also uses SOPS/age for Git-at-rest secrets. The
platform needs an explicit decision and migration path so "Vault" does
not remain an accidental brand-specific dependency where "secrets
manager" is what we really mean.

## Scope

In scope:

- decide whether OpenBao is the canonical Railiance platform secrets
  service
- define deployment topology for OpenBao on the Railiance Kubernetes
  platform
- define auth methods for workloads, operators, and automations
- define secret engines for KV, database dynamic secrets, Kubernetes
  tokens, PKI/certificates, and future object-storage credential
  vending integrations
- define CSI provider and/or External Secrets Operator integration
- define unseal, backup, restore, break-glass, audit, and monitoring
  procedures
- identify NetKingdom documentation and workplan updates needed to
  replace HashiCorp Vault-specific language with OpenBao-first language

Out of scope:

- replacing SOPS/age for Git-at-rest bootstrap secrets
- changing S1/S2 cluster runtime configuration without coordination
- rewriting ops-warden's SSH certificate backend in this workplan
- implementing application-specific secrets in S5

## Tasks

### T01 - OpenBao Decision And Migration Inventory

```task
id: RAIL-PL-WP-0002-T01
status: done
priority: high
state_hub_task_id: "e997ffe0-6b61-4242-b585-f271e9b75e99"
```

Inventory current HashiCorp Vault assumptions across NetKingdom,
ops-warden, Railiance, and application runbooks. Decide whether
Railiance standardizes on OpenBao, keeps Vault-compatible abstraction
language, or supports both for a transition period.

**2026-05-17:** Decision recorded in State Hub:
`a0df816c-3749-4418-9c8b-28eb428be953`. Railiance S3 standardizes on
OpenBao as the runtime platform secrets service. SOPS/age remains the
Git-at-rest bootstrap mechanism.

### T02 - Kubernetes Deployment Design

```task
id: RAIL-PL-WP-0002-T02
status: done
priority: high
state_hub_task_id: "fb6ac85d-e77f-400d-8342-70a0ec6e82ef"
```

Design the OpenBao Helm deployment for Railiance: namespace, storage
backend, HA posture, ingress/internal service exposure, TLS, resource
limits, PodDisruptionBudget, NetworkPolicies, and upgrade/rollback
strategy.

**2026-05-17:** Implemented `helm/openbao-values.yaml`, Make targets, and
`docs/openbao.md`. Deployed chart `openbao/openbao` `0.28.2` (app
`v2.5.3`) to Railiance01 namespace `openbao` as internal-only,
single-replica Raft with data/audit PVCs. Public ingress remains disabled;
OpenBao is intentionally uninitialized and sealed until the bootstrap
ceremony.

### T03 - Bootstrap, Unseal, And Break-Glass Procedure

```task
id: RAIL-PL-WP-0002-T03
status: done
priority: high
state_hub_task_id: "509ccfd4-1775-4be4-b8e4-8d5bcf17f91e"
```

Define initialization, unseal, root-token retirement, operator access,
emergency access, backup escrow, and recovery drill. Ensure the design
does not introduce an unmanaged "secret zero" worse than the current
SOPS/age bootstrap.

**2026-05-17:** Initial ceremony documented in `docs/openbao.md`. Still
needs human escrow assignment, root-token retirement details, and a
restore/recovery drill before live secrets move into OpenBao.

**2026-05-23:** Added non-secret bootstrap support: `make openbao-verify`,
`make openbao-verify-post-unseal`, `make openbao-configure-initial`,
`scripts/openbao-verify.sh`, `scripts/openbao-apply-initial-config.sh`, and
initial platform policies under `openbao/policies/`. `docs/openbao.md` now
spells out pre-flight checks, escrow handling, root-token retirement, and the
post-unseal initial configuration path. The actual initialization/unseal
ceremony remains gated on named human escrow recipients and must not happen in
a casual agent shell.

**2026-05-24:** Revised the custody model: `tegwick`
(`bernd.worsch@gmail.com`, Gitea `tegwick`) is the setup operator/contact, not
the long-term platform root of trust. The OpenBao ceremony is now gated on a
separate NetKingdom king credential and guided bootstrap path. T03 remains
`in_progress`: the live OpenBao init/unseal ceremony is still gated on king
credential creation, custody mode approval, root-token disposition,
reset/rotation, and restore-drill execution.

**2026-05-26:** Live OpenBao is now initialized, unsealed, and post-unseal
verified on Railiance01. NetKingdom bootstrap metadata records custody approval,
root-token revocation, unseal-key rotation, and restore-drill confirmation.
T03 remains `in_progress` for production-trust closeout: declarative audit,
durable audit shipping, OIDC-backed admin login verification, residual taint
response, and cleanup before live application secrets move in. These remaining
operator-facing gates are consolidated in `NET-WP-0017`.

**2026-05-29:** Railiance-owned bootstrap and break-glass scope is complete:
`make openbao-status` and `make openbao-verify-post-unseal` pass against the
live Railiance01 OpenBao pod, which is initialized, unsealed, and active with
Bound data/audit PVCs. The production-trust gates that remain before ordinary
user onboarding or live application secrets move into OpenBao are now explicitly
owned by `NET-WP-0017`: declarative/durable audit closeout, OIDC-backed admin
login evidence, residual taint cleanup, and hardening.

### T04 - Auth Methods And Workload Integration

```task
id: RAIL-PL-WP-0002-T04
status: done
priority: high
state_hub_task_id: "ca2b3ac2-b522-4445-a418-c6ec312cd5f4"
```

Configure or document auth methods for Kubernetes workloads,
NetKingdom identity, admins, agents, and automations. Decide when
workloads use OpenBao directly, CSI-mounted secrets, External Secrets
Operator, or sidecars/controllers.

**2026-05-23:** Documented the auth and delivery model in `docs/openbao.md`.
Bootstrap uses the one-time root token only for initial setup; platform
operators use a non-root `platform-admin` token until NetKingdom OIDC/admin
integration is ready; reviewers use `platform-readonly`; workloads use
Kubernetes auth with namespace/service-account-bound policies. External
Secrets Operator is preferred for Helm-compatible Kubernetes Secrets, CSI is
reserved for mounted-file delivery and refresh-sensitive workloads, and the
OpenBao injector remains disabled.

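
As a rough illustration of "Kubernetes auth with namespace/service-account-bound policies", the sketch below binds one service account to one policy. It assumes the `kubernetes/` auth mount listed under T06; the role, namespace, service account, policy name, and TTL in the usage line are hypothetical placeholders, and `bind_workload_role` is not a helper from this repo. Setting `BAO="echo bao"` turns the call into a dry run that only prints the command.

```bash
# Sketch only: bind one Kubernetes service account to one OpenBao policy.
# All argument values in the usage line below are hypothetical placeholders.
BAO="${BAO:-bao}"   # set BAO="echo bao" for a dry run that only prints the command

bind_workload_role() {
  role="$1"; namespace="$2"; service_account="$3"; policy="$4"
  # $BAO is intentionally unquoted so "echo bao" splits into two words.
  $BAO write "auth/kubernetes/role/${role}" \
    bound_service_account_names="${service_account}" \
    bound_service_account_namespaces="${namespace}" \
    token_policies="${policy}" \
    token_ttl=1h
}

# Usage (requires an authenticated operator session):
#   bind_workload_role demo-app demo-ns demo-sa demo-app-read
```

Keeping one role per namespace and service account pair makes the policy binding easy to review, at the cost of more roles to manage.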
### T05 - Secret Engines And Dynamic Credentials

```task
id: RAIL-PL-WP-0002-T05
status: done
priority: medium
state_hub_task_id: "0d717bdd-76bc-41b4-b633-ba07214b4095"
```

Enable and document the initial secret engines: KV v2 for platform
configuration, database dynamic credentials for CNPG-managed
PostgreSQL, Kubernetes token generation where appropriate, PKI/SSH
future paths, and an assessment of object-storage credential vending
integration with NK-WP-0007.

**2026-05-17:** Object-storage credential vending assessment started and
documented in `docs/openbao.md`. Existing `artifact-store` capabilities cover
artifact package preservation, an S3-compatible backend, env/file secret refs,
and `artifactstore storage verify --backend s3`. Railiance S3 should use
OpenBao for bootstrap custody, policy, audit, break-glass, and workload secret
delivery, while `artifact-store` owns S3 backend behavior and
`ARTIFACT-STORE-WP-0007` owns MinIO/fork compatibility plus temporary
credential refresh decisions. NetKingdom remains the default owner for OIDC
identity if object storage adopts `AssumeRoleWithWebIdentity`.

**2026-05-29:** Initial secret-engine scope is complete for this workplan:
OpenBao has the `platform/` KV path and Kubernetes auth configured through the
initial configuration helper, with `platform-admin` and `platform-readonly`
policies present. Database dynamic credentials, PKI, SSH, and object-storage
STS vending remain future integration work owned by their downstream service
workplans and `ARTIFACT-STORE-WP-0007`; they are not blockers for the platform
secrets service closeout.

### T06 - Backup, Audit, Monitoring, And Verification

```task
id: RAIL-PL-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "cd61bc7d-8b9f-484f-97bd-7254c227b0ee"
```

Define backup/restore procedure, audit device configuration, metrics,
logs, health checks, restore drill, and smoke tests. Include a
developer/operator verification script for the deployed service.

**2026-05-23:** Documented audit, Raft snapshot, encrypted snapshot custody,
isolated restore drill, durable audit-log shipping, and monitoring baseline in
`docs/openbao.md`. Added `scripts/openbao-verify.sh` plus Make targets for
basic and post-unseal verification. The restore drill still must be executed
before any live application secrets are migrated; that remains a gate under
T03.

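
The snapshot custody step could be sketched as follows, assuming GNU `sha256sum` is available on the operator machine. `bao operator raft snapshot save` is the Raft snapshot command inherited from Vault; the file name and the `record_snapshot_digest` helper are hypothetical, and encrypting the snapshot before it leaves the host is deliberately left out of the sketch.

```bash
# Sketch: print a non-secret digest line for a snapshot file so it can be
# recorded as evidence. The helper name and output format are hypothetical.
record_snapshot_digest() {
  snapshot="$1"
  [ -f "$snapshot" ] || { echo "missing snapshot: $snapshot" >&2; return 1; }
  # sha256sum prints "<hex>  <file>"; keep only the hex digest.
  digest="$(sha256sum "$snapshot" | cut -d' ' -f1)"
  echo "sha256:${digest}"
}

# Usage, in an attended operator session:
#   bao operator raft snapshot save openbao.snap
#   record_snapshot_digest openbao.snap
```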
**2026-05-26:** `make openbao-verify-post-unseal` passes against the live
OpenBao pod: Kubernetes objects exist, the pod is running, OpenBao reports
`Initialized: true` and `Sealed: false`, and data/audit directories exist.
Authenticated checks for audit devices, auth methods, and mounts still require
the OIDC-backed or temporary platform-admin path and remain part of the
production-readiness closeout.

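
The unauthenticated part of that check can be approximated with a small sketch like the one below. It assumes `bao status -format=json` emits `initialized` and `sealed` boolean fields, as the Vault-compatible CLI does; `check_unsealed` is a hypothetical helper and not the contents of `scripts/openbao-verify.sh`.

```bash
# Sketch: read `bao status -format=json` on stdin and succeed only when the
# server reports initialized=true and sealed=false. Not the real verifier.
check_unsealed() {
  status_json="$(cat)"
  printf '%s\n' "$status_json" \
    | grep -Eq '"initialized"[[:space:]]*:[[:space:]]*true' \
    || { echo "not initialized" >&2; return 1; }
  printf '%s\n' "$status_json" \
    | grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false' \
    || { echo "sealed" >&2; return 1; }
  echo "initialized and unsealed"
}

# Usage:
#   bao status -format=json | check_unsealed
```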
**2026-06-01:** Added the source-side declarative file-audit configuration
required by `NET-WP-0017-T02`: `helm/openbao-values.yaml` now includes an
OpenBao `audit "file" "file"` stanza writing to
`/openbao/audit/openbao-audit.log`, and
`scripts/openbao-apply-initial-config.sh` now verifies audit visibility with
`bao audit list` instead of attempting API-managed audit creation. The
post-unseal verifier now warns when the audit log file is missing or empty.
Live verification still reports the pod unsealed and healthy, but also reports
the audit log file missing because this Helm change has not yet been rolled
out. Roll out only in an attended window with unseal shares available.

**2026-06-01:** Rolled out the declarative audit configuration to the live
Railiance01 OpenBao release in an attended window. Because the StatefulSet uses
`OnDelete`, the pod was explicitly recycled after the Helm values upgrade and
then unsealed by the operator. Post-unseal verification now reports OpenBao
`2.5.4`, `Sealed: false`, the audit directory present, and
`/openbao/audit/openbao-audit.log` present and non-empty. The source values now
pin the live OpenBao image tag to `2.5.4`; Helm release revision 3 has the same
explicit tag and the pod remained ready, so future chart upgrades do not
implicitly change the runtime version while applying unrelated configuration.

**2026-06-01:** Added `make openbao-verify-authenticated` as a non-mutating
operator proof for the remaining OpenBao readiness checks that require an
approved token. The helper prompts for the token without echoing it, verifies
`file/` audit visibility, `platform/` secrets, `kubernetes/` and `keycape/`
auth methods, and confirms the audit log file is non-empty. It can also use an
already-valid pod token helper via
`OPENBAO_VERIFY_AUTH_ARGS=--use-token-helper` so the token does not move
through the local shell at all. Durable audit shipping beyond the audit PVC
remains intentionally open until a tested sink is selected; State Hub notes and
hashes are evidence, not retained audit custody.

**2026-06-01:** Ran the authenticated verifier against the live pod token
helper immediately after a fresh
`bao login -no-print -method=oidc -path=keycape role=platform-admin`
browser/MFA flow. The verifier passed:
OpenBao is unsealed on `2.5.4`, `bao audit list` shows `file/`,
`bao secrets list` shows `platform/`, `bao auth list` shows `kubernetes/` and
`keycape/`, and `/openbao/audit/openbao-audit.log` grew from 7969 bytes to
23330 bytes during the check. No token value was printed or copied into the
workplan. The cached verifier token was then revoked with
`bao token revoke -self`.

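
The "audit log grew during the check" evidence could be captured with a sketch like this one. `file_size` and `audit_log_grew` are hypothetical helper names; the log path is the one named above, and the commands would have to run where that path is readable, for example inside the pod.

```bash
# Sketch: compare an audit log's byte size before and after an authenticated
# request, which is how growth such as 7969 -> 23330 bytes would be observed.
file_size() {
  # wc -c on stdin prints only the byte count; strip padding some wc builds add.
  wc -c < "$1" | tr -d '[:space:]'
}

audit_log_grew() {
  before="$1"; after="$2"
  [ "$after" -gt "$before" ]
}

# Usage:
#   before="$(file_size /openbao/audit/openbao-audit.log)"
#   bao audit list >/dev/null          # any audited, authenticated request
#   after="$(file_size /openbao/audit/openbao-audit.log)"
#   audit_log_grew "$before" "$after" && echo "audit log grew: $before -> $after"
```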
**2026-06-01:** Durable tenant-aware audit retention is now a separate
`audit-core` product/repo instead of a Railiance OpenBao bootstrap subtask. The
initial Audit Core mock backend writes JSONL events under
`/tmp/audit-core/audit-YYYYMMDDTHH.jsonl` and removes files older than seven
days; it is suitable for interface wiring and setup validation only. Railiance
still owns the OpenBao file audit device and PVC, while production retention,
tenant policy, and tamper-evident archive belong to Audit Core.

**2026-06-01:** Added a non-secret OpenBao restore-drill evidence template and
`make openbao-validate-restore-evidence`. The validator requires concrete
review evidence such as snapshot hashes, encrypted snapshot location, isolated
restore completion, unseal/status/test-secret verification, isolated
environment destruction, and a `no_secret_material_recorded` assertion. This
keeps `NET-WP-0017-T02` from relying on a bare UI checkbox for restore proof.

**2026-06-01:** Added the matching non-secret emergency seal/unseal drill
evidence template and `make openbao-validate-emergency-evidence`. The validator
requires an attended seal/unseal evidence file with timing, sealed-state proof,
unseal quorum availability, post-unseal verification, availability-window
duration, and `no_secret_material_recorded`. The validator does not run the
disruptive drill; it only checks the evidence captured after the attended
operation.

**2026-06-02:** Hardened both evidence validators so unchanged templates or
obvious placeholder values cannot accidentally satisfy NetKingdom T02. Restore
evidence now rejects placeholder digests and template wording, while emergency
drill evidence rejects template wording. Operators must copy the examples into
local evidence files and replace placeholders with real non-secret drill
evidence before validation can pass.

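
The placeholder-rejection idea can be sketched as below. The `key=value` evidence format, the field names other than `no_secret_material_recorded`, and the placeholder markers are all assumptions made for illustration; the real validators behind `make openbao-validate-restore-evidence` may use a different format and different rules.

```bash
# Sketch: validate a non-secret evidence file made of `key=value` lines.
# Field names (except no_secret_material_recorded) are hypothetical.
validate_evidence() {
  file="$1"
  for key in snapshot_sha256 restore_completed no_secret_material_recorded; do
    grep -q "^${key}=." "$file" \
      || { echo "missing field: ${key}" >&2; return 1; }
  done
  # Reject unchanged template wording and obvious placeholder values.
  if grep -Eq 'REPLACE_ME|TODO|<[^>]*>' "$file"; then
    echo "placeholder value found" >&2
    return 1
  fi
  # Reject an all-zero digest, a common stand-in for a real hash.
  if grep -Eq '^snapshot_sha256=0+$' "$file"; then
    echo "placeholder digest found" >&2
    return 1
  fi
  grep -q '^no_secret_material_recorded=true$' "$file" \
    || { echo "no_secret_material_recorded must be true" >&2; return 1; }
  echo "evidence ok"
}

# Usage:
#   validate_evidence ./restore-drill-evidence.env
```

Failing closed on any placeholder marker means a copied but unedited template can never pass, which is the property the hardening entry describes.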
### T07 - Cross-Repo Transition Tasks

```task
id: RAIL-PL-WP-0002-T07
status: done
priority: medium
state_hub_task_id: "89149b60-562b-4a5b-978d-0f9136ffa114"
```

Create or link follow-up tasks for NetKingdom, ops-warden, ops-bridge,
artifact-store, and S5 applications where documentation or integration
must move from HashiCorp Vault-specific assumptions to OpenBao-first
or Vault-compatible abstraction language.

**2026-05-17:** Started cross-repo transition by updating
`net-kingdom/docs/platform-identity-security-architecture.md` and
`net-kingdom/SCOPE.md` so NetKingdom treats OpenBao as the runtime
platform secrets authority while SOPS/age remains bootstrap/Git-at-rest
protection. Still needs ops-warden, ops-bridge, artifact-store, S5 app,
and stale HashiCorp Vault wording follow-ups.

**2026-05-24:** Updated NetKingdom custody linkage:
`net-kingdom/docs/platform-root-custody.md`, `NET-WP-0015`, and `NET-WP-0016`
now define `tegwick` as setup operator/contact and a separate king credential
as the platform-root custody target for OpenBao.

**2026-05-17:** Linked the artifact-store transition to
`ARTIFACT-STORE-WP-0007 - MinIO Compatibility, MaxIO Fork Assessment, And STS
Credential Vending` instead of creating duplicate S3 backend work in
`railiance-platform`. The OpenBao side of the handoff is now documented in
`docs/openbao.md`; remaining artifact-store work belongs in
`ARTIFACT-STORE-WP-0007-T004` and follow-up routing in
`ARTIFACT-STORE-WP-0007-T005`.

**2026-05-29:** Cross-repo transition ownership is explicit enough for
Railiance closeout. NetKingdom owns the remaining identity, OIDC admin login,
operator UX, hardening, and onboarding-readiness gates through `NET-WP-0017`.
Artifact-store owns S3-compatible backend and credential-vending decisions
through `ARTIFACT-STORE-WP-0007`. Future application-specific OpenBao adoption
belongs with the relevant S5/application workplans once user onboarding is
unblocked.

## Acceptance Criteria

- Railiance has an explicit decision on OpenBao versus HashiCorp Vault
  for platform secrets management.
- OpenBao deployment topology is defined for the S3 platform-services
  layer.
- Bootstrap, unseal, backup, restore, audit, and break-glass procedures
  are documented before live secrets are migrated.
- Integration choices are clear for Kubernetes workloads, NetKingdom
  identity, dynamic database credentials, and future object-storage STS
  credential vending.
- SOPS/age remains the bootstrap Git-at-rest mechanism unless a later
  ADR deliberately replaces it.

@@ -0,0 +1,416 @@
---
id: RAILIANCE-WP-0003
type: workplan
title: "Provision shared CNPG cluster apps-pg"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 3
created: "2026-05-19"
updated: "2026-05-19"
state_hub_workstream_id: "665b3b9b-608a-4be4-84b6-dcb8261ff57b"
---

# RAILIANCE-WP-0003 - Provision shared CNPG cluster apps-pg

## Goal

Provision a new shared CloudNativePG cluster `apps-pg` in the
`databases` namespace that S5 application workloads can use to host
their own PostgreSQL databases, so that no app has to force the creation
of a dedicated CNPG cluster.

This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme
needs a `vergabe` role + `vergabe_db` database) by establishing the
shared cluster and the governed onboarding contract future S5 apps adopt
by default.

## Context

`railiance-apps` workplan `RAILIANCE-WP-0002` (establish
vergabe-teilnahme on railiance01) found at T01 that the two existing
CNPG clusters in `databases` are app-dedicated:

| Cluster          | PG | Owner app   |
|------------------|----|-------------|
| `gitea-db`       | 18 | gitea       |
| `net-kingdom-pg` | 16 | net-kingdom |

Decision `D-01` (resolved 2026-05-18, bernd) selected option D:
**provision a new shared cluster `apps-pg`** rather than create a third
dedicated cluster (option A) or retrofit an existing app cluster (B/C).

A coordination message was sent from `railiance-apps` to
`railiance-platform` requesting this work; this workplan is the
response.

## Placement in the Railiance Tooling Set

S3 owns CNPG `Cluster` CRs (per ADR-003 and the pattern already
established by `helm/gitea-db-cluster.yaml`). CNPG 1.28 has standalone
`Database` CRs, but PostgreSQL role lifecycle is managed through the
target `Cluster` spec's `.spec.managed.roles` stanza or through a
controlled operator-run SQL workflow. The shared-cluster contract must
therefore make role onboarding explicit; S5 repos should not assume a
standalone CNPG `Role` CR exists.

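
To make that boundary concrete, here is a hedged sketch that prints the two declarative pieces an onboarding would involve: a platform-owned `.spec.managed.roles` fragment for the `apps-pg` Cluster and a standalone `Database` CR. The `vergabe` and `vergabe_db` names come from the Goal section; the Secret name `apps-pg-vergabe-credentials`, the resource name `vergabe-db`, and both function names are hypothetical, and the field names should be checked against the CRDs installed in the cluster.

```bash
# Sketch: print onboarding manifests for one consumer. Names are illustrative.
render_managed_role_fragment() {
  role="$1"; secret="$2"
  cat <<EOF
# Fragment to merge into the apps-pg Cluster spec (platform-owned).
spec:
  managed:
    roles:
      - name: ${role}
        ensure: present
        login: true
        passwordSecret:
          name: ${secret}
EOF
}

render_database_cr() {
  cr_name="$1"; db="$2"; owner="$3"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: ${cr_name}
  namespace: databases
spec:
  name: ${db}
  owner: ${owner}
  cluster:
    name: apps-pg
EOF
}

# Usage: review the output, then apply it through the platform workflow.
#   render_managed_role_fragment vergabe apps-pg-vergabe-credentials
#   render_database_cr vergabe-db vergabe_db vergabe
```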

| Concern | Owner repo | Scope |
|---------|------------|-------|
| `Cluster apps-pg` CR, shared NetworkPolicies, bootstrap secret, baseline docs | `railiance-platform` | this workplan |
| Per-app database request and application DSN wiring | each S5 repo | not here |
| Per-app PostgreSQL role + credential provisioning | coordinated | documented here; platform-administered until OpenBao/dedicated automation exists |
| Per-app runtime Secret in the consumer namespace | each S5 repo | not here |

## Current Evidence

- `kubectl get crd | grep cnpg` confirms CNPG 1.28.1 with the
  `databases.postgresql.cnpg.io` CRD, so databases can be represented
  declaratively.
- CNPG role management is cluster-scoped via `.spec.managed.roles`;
  no standalone CNPG `Role` CR is available for app repos to apply.
- Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
  (`cnpg-system` namespace).
- `databases` namespace has a default-deny-all NetworkPolicy; each
  CNPG cluster therefore needs its own NetworkPolicy triplet
  (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns).
  The pattern is visible in `helm/gitea-db-networkpolicies.yaml`.
- `helm/apps-pg-cluster.yaml`, `helm/apps-pg-networkpolicies.yaml`,
  `helm/apps-pg-secret.sops.yaml.template`, and `docs/apps-pg.md` are
  present in the repo.
- Coordination message id: `768c18f4-8785-4108-a900-fa117eb8778f`
  (state-hub thread).

## Implementation Notes

Completed on 2026-05-19.

- CNPG operator is `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`.
- `clusters.postgresql.cnpg.io` and `databases.postgresql.cnpg.io` CRDs
  are present; `roles.postgresql.cnpg.io` is not present, so role
  onboarding remains platform-administered through managed roles or a
  controlled SQL workflow.
- `local-path` is the default StorageClass. The single K3s node reports
  no memory, disk, or PID pressure; allocatable ephemeral storage is
  about 97.7 GB and memory is about 3.8 GiB. Existing CNPG PVC footprint
  before `apps-pg` was two 10Gi PVCs (`gitea-db-1`,
  `net-kingdom-pg-1`).
- `databases` exists with `default-deny-all`; `cnpg-system` has the
  required `kubernetes.io/metadata.name=cnpg-system` namespace label.
- The live CNPG CRD rejected `spec.postgresql.version`; the deployed
  `apps-pg` manifest therefore pins PostgreSQL 16 with
  `imageName: ghcr.io/cloudnative-pg/postgresql:16`.
- `apps-pg` is deployed in `databases`, reports
  `Cluster in healthy state`, and has primary `apps-pg-1`.
- Services `apps-pg-rw` and `apps-pg-ro` exist. With one instance,
  `apps-pg-ro` is present but has no replica endpoint until HA is added.
- A disposable namespace labeled
  `railiance.io/postgres-client=apps-pg` successfully connected to
  `apps-pg-rw.databases.svc.cluster.local:5432/apps_meta` as
  `apps_admin`; the temporary namespace and copied smoke-test secret
  were deleted immediately after the check.

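
The operator-version pre-condition recorded above can also be checked mechanically. The sketch below extracts the tag from an image reference and compares it with a minimum version using GNU `sort -V`; both helper names are hypothetical, and the `1.28.0` floor reflects this workplan's stated requirement rather than an upstream constant.

```bash
# Sketch: compare an operator image tag against a minimum version.
image_tag() {
  # Everything after the last ':' in an image reference such as repo/name:1.2.3
  echo "${1##*:}"
}

version_ge() {
  # True when $1 >= $2 under version ordering (requires GNU sort -V).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   tag="$(image_tag ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1)"
#   version_ge "$tag" 1.28.0 && echo "operator version ok: $tag"
```

Plain string comparison would order `1.9.0` after `1.28.0`, which is why the sketch relies on version-aware sorting.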
## Safety Contract

- Do not commit plaintext credentials. Create the bootstrap secret once,
  manually, with `kubectl create secret`, then SOPS-encrypt a template and
  commit it as `helm/apps-pg-secret.sops.yaml.template`.
- Do not expose `apps_admin` to S5 applications. It is a platform
  bootstrap/smoke-test role, not a runtime credential.
- Do not co-locate non-app data (Gitea, net-kingdom) in `apps-pg`.
  This cluster is for S5 *application* DBs.
- Preserve the default-deny NetworkPolicy posture in `databases`;
  only allow ingress from namespaces that have a registered consumer.
- Do not advertise self-service role creation until the role
  provisioning mechanism is explicit. CNPG `Database` CRs still require
  their owner role to exist.
- Initial sizing is conservative (1 instance, 10Gi) to match the
  existing per-cluster footprint. Resize is a follow-up workplan.
- Cluster name `apps-pg` is locked once published: renaming changes
  every consumer DSN.

## Target State

- `kubectl get cluster apps-pg -n databases` reports
  `Cluster in healthy state` with the primary `apps-pg-1`.
- `kubectl get svc apps-pg-rw apps-pg-ro -n databases` returns both
  Services.
- NetworkPolicies for `apps-pg` mirror the `gitea-db` triplet.
- The `make` targets `apps-pg-deploy`, `apps-pg-status`, `apps-pg-shell`,
  and `apps-pg-logs` exist and work.
- Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`)
  exist for cluster health probes and to anchor the bootstrap; the
  cluster is otherwise empty of per-app data.
- Documentation explains how an S5 consumer registers a new database,
  including the current CNPG boundary: the `Database` CR is separate,
  but role lifecycle is cluster-scoped and therefore governed by the
  platform contract.
- `railiance-apps` is notified via the hub thread; their
  `RAILIANCE-WP-0002 T04` can proceed using the documented onboarding
  path.

## Tasks

### T01 - Inventory and capacity check

```task
id: RAILIANCE-WP-0003-T01
status: done
priority: high
state_hub_task_id: "37843f2f-0022-4725-ab07-29f6ae4c1749"
```

Confirm the substrate before adding a new cluster.

Checks:

- CNPG operator version (≥ 1.28.x required for the `Database` CR
  consumer pattern).
- Role/database API boundary: `Database` CR is present; role lifecycle
  is `.spec.managed.roles` or controlled SQL, not a separate `Role` CR.
- Node-level disk space available for an additional 10Gi PVC
  (`local-path` storage class is the active default).
- Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any
  current resource pressure.
- The `databases` namespace already exists and has its default-deny
  NetworkPolicy in place.
- The `cnpg-system` namespace label
  `kubernetes.io/metadata.name=cnpg-system` is set (required by the
  ingress-from-operator NetworkPolicy).

**Done when:** the implementation notes record CNPG version, available
PVC capacity, the chosen role onboarding mechanism, and any
pre-condition gaps.

---

### T02 - Create bootstrap credential secret

```task
id: RAILIANCE-WP-0003-T02
status: done
priority: high
state_hub_task_id: "b4777198-e42f-4ca1-b562-a595559fdf08"
```

Mint the one-time bootstrap secret that CNPG uses to create the initial
`apps_admin` role.

Steps:

```bash
# Generate a random password; it exists only in this shell session.
APPS_PG_PW=$(openssl rand -base64 32)
# Create the bootstrap secret referenced by the apps-pg Cluster manifest.
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"
```

Then commit a SOPS-encrypted template:

- `helm/apps-pg-secret.sops.yaml.template`: encrypted form for
  declarative reapply; do not commit the plaintext password.

The bootstrap role is intentionally not a consumer role. Per-app runtime
roles are created later through the onboarding mechanism documented in
T06; until dedicated automation exists, that mechanism is
platform-administered.

**Done when:** the secret exists in the cluster and an encrypted
template is committed.

---

### T03 - Add the CNPG Cluster manifest

```task
id: RAILIANCE-WP-0003-T03
status: done
priority: high
state_hub_task_id: "0840583d-23b2-4b93-9002-7977e6896a12"
```

Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`.
Do not add app-specific roles or databases to the baseline cluster
manifest unless T01 explicitly chooses a platform-owned managed-role
stanza as the interim onboarding path for the first consumer.

Shape:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1 # bump when node RAM > 8GB
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials
```

Note: PG version is 16 (matches vergabe-teilnahme's minimum and the
existing `net-kingdom-pg`). Bumping to 17/18 is a separate decision.

**Done when:** the manifest is committed and
`kubectl apply --dry-run=server` validates it against the cluster.

---

### T04 - Add NetworkPolicies for apps-pg

```task
id: RAILIANCE-WP-0003-T04
status: done
priority: high
state_hub_task_id: "7237f0f2-28e6-4eee-981b-06d0115cb0d1"
```

Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet
but parameterised for the *apps* consumer namespace pattern.

Three policies (all in `databases`, all selecting
`cnpg.io/cluster: apps-pg`):

1. `allow-egress-kube-api-apps-pg`: egress to TCP/6443.
2. `allow-ingress-from-cnpg-operator-apps-pg`: ingress from
   `namespaceSelector kubernetes.io/metadata.name=cnpg-system` on TCP
   ports 5432 / 8000 / 9187.
3. `allow-ingress-from-app-namespaces-apps-pg`: ingress on TCP/5432
   from any namespace carrying the label
   `railiance.io/postgres-client=apps-pg`. (Each consuming app
   namespace adds this label; this avoids hard-coding a namespace list
   in the platform repo.)

The label-based selector is the meaningful difference from gitea-db,
which hard-codes `default`. The shared cluster cannot know its
consumer namespaces in advance, so it expects a positive opt-in label.

**Done when:** the policies are committed and applied; consumer namespaces
|
||||
can connect after applying the `railiance.io/postgres-client=apps-pg`
|
||||
label.
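The third policy is the one consumers depend on, so a sketch may help reviewers. The function below only prints a manifest; the policy name, namespace, pod selector, opt-in label, and port come from the list above, while the function name and the exact field layout are assumptions and not a copy of `helm/apps-pg-networkpolicies.yaml`.

```bash
# Sketch only: prints the label-based ingress policy for a shared CNPG cluster.
# $1 is the CNPG cluster name (for this workplan: apps-pg).
render_app_ingress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-$1
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: $1
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Any namespace that opts in with this label may connect.
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: $1
      ports:
        - protocol: TCP
          port: 5432
EOF
}
```

Piping `render_app_ingress_policy apps-pg` into `kubectl apply --dry-run=server -f -` is one way to check the shape against the cluster without applying it.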
|
||||
|
||||
---
|
||||
|
||||
### T05 — Makefile targets, deploy, verify
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0003-T05
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "dc346e73-eadf-4eaa-8296-358df262f648"
|
||||
```
|
||||
|
||||
Add targets that mirror the `db-*` (gitea-db) family:
|
||||
|
||||
```make
apps-pg-deploy: ## Apply shared apps-pg CNPG Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml

apps-pg-status: ## Show apps-pg CNPG cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
		$(KUBECTL) get cluster apps-pg -n databases -o wide

apps-pg-shell: ## Open psql shell on apps-pg primary
	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta

apps-pg-logs: ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50
```
|
||||
|
||||
Then deploy and wait for the cluster to converge:
|
||||
|
||||
```bash
make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m
```
|
||||
|
||||
Smoke checks:
|
||||
|
||||
- `cnpg status` reports `Cluster in healthy state`.
|
||||
- Services `apps-pg-rw` and `apps-pg-ro` exist.
|
||||
- From a disposable pod in a temporary namespace labeled
|
||||
`railiance.io/postgres-client=apps-pg`, a platform-operated test
|
||||
connection to `apps-pg-rw.databases:5432/apps_meta` succeeds. Delete
|
||||
the temporary namespace and any copied test secret immediately after
|
||||
the check; do not place `apps_admin` in an application namespace.
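A lighter variant of the third smoke check can be sketched as follows. It is an assumption-laden sketch, not the platform procedure: the namespace name `apps-pg-smoke`, the pod name `pg-smoke`, and the function names are hypothetical, and `pg_isready` only proves that the server accepts connections through the NetworkPolicy. It does not authenticate, so the credentialed connection to `apps_meta` still needs a platform-operated check. `KUBECTL` is assumed to be a single command name.

```bash
# Sketch only: reachability probe from a temporary, labeled namespace.
KUBECTL="${KUBECTL:-kubectl}"
SMOKE_NS="${SMOKE_NS:-apps-pg-smoke}"   # hypothetical temporary namespace

smoke_check() {
  # Create the namespace and opt it in via the consumer label.
  "$KUBECTL" create namespace "$SMOKE_NS" &&
  "$KUBECTL" label namespace "$SMOKE_NS" railiance.io/postgres-client=apps-pg &&
  # Run a disposable pod; no credentials are copied into the namespace.
  "$KUBECTL" run pg-smoke -n "$SMOKE_NS" --rm -i --restart=Never \
    --image=ghcr.io/cloudnative-pg/postgresql:16 -- \
    pg_isready -h apps-pg-rw.databases -p 5432 -d apps_meta
}

smoke_cleanup() {
  # Remove the temporary namespace immediately after the check.
  "$KUBECTL" delete namespace "$SMOKE_NS" --wait=false
}
```

A typical run would call `smoke_check` and then `smoke_cleanup` regardless of the result, so the temporary namespace never outlives the check.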
|
||||
|
||||
**Done when:** the smoke checks pass.
|
||||
|
||||
---
|
||||
|
||||
### T06 — Reply to railiance-apps, document the consumer contract
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0003-T06
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "8b78934d-0a3c-413c-a66f-295092282547"
|
||||
```
|
||||
|
||||
Notify the requester and capture the pattern.
|
||||
|
||||
Steps:
|
||||
|
||||
- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` through the
|
||||
State Hub `/messages/` REST API with this workplan's id and the
|
||||
cluster's connection details. Do not send bootstrap credentials.
|
||||
- Add `docs/apps-pg.md` with:
  - Cluster identity and connection endpoints.
  - The per-app onboarding recipe: (a) request/approve a per-app role,
    (b) provision the backing role and credential through the chosen
    platform mechanism, (c) create the CNPG `Database` CR in the
    `databases` namespace with `spec.cluster.name: apps-pg` and
    `spec.owner` set to the approved role, (d) label the consumer
    namespace `railiance.io/postgres-client=apps-pg`, (e) publish or
    mirror the runtime Secret into the consumer namespace, and (f) wire
    the DSN into the application Helm values.
  - The CNPG 1.28 boundary: `Database` is standalone; role management is
    not a standalone `Role` CR and must follow the platform contract.
  - Backup posture (when the cluster is added to the existing platform
    backup process) and the resize / replicate roadmap.
|
||||
|
||||
**Done when:** the message is replied to and `docs/apps-pg.md` is
|
||||
committed.
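Step (c) of the onboarding recipe could look roughly like the manifest printed below. This is a sketch under assumptions: the function name and the example arguments (`example-app`, `example_app`) are hypothetical, and the approved owner role must already exist through the platform mechanism before the `Database` CR reconciles.

```bash
# Sketch only: prints a CNPG Database CR for one consumer.
# $1 = Kubernetes object name (DNS-safe, so no underscores)
# $2 = PostgreSQL database name
# $3 = approved owner role (provisioned separately, not by this CR)
render_database_cr() {
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: $1
  namespace: databases
spec:
  name: $2
  owner: $3
  cluster:
    name: apps-pg
EOF
}
```

For example, `render_database_cr example-app example_app example_app` prints a CR that could be reviewed and then applied in the `databases` namespace.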
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
This workplan is complete when:
|
||||
|
||||
1. `apps-pg` reports healthy in the `databases` namespace.
|
||||
2. NetworkPolicies enforce the default-deny posture with label-based
|
||||
consumer opt-in.
|
||||
3. Makefile targets work end-to-end.
|
||||
4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
|
||||
acknowledged via the hub thread.
|
||||
5. `docs/apps-pg.md` explains the consumer onboarding contract,
|
||||
including the CNPG role/database boundary.
|
||||
|
||||
## Notes
|
||||
|
||||
- This intentionally does **not** hard-code the `vergabe` role or
|
||||
`vergabe_db` into the shared cluster baseline. The consumer onboarding
|
||||
doc must describe the follow-up request/manifest needed for
|
||||
`railiance-apps` so the platform layer stays generic until an app
|
||||
explicitly registers.
|
||||
- Backup inclusion of `apps-pg` is a follow-up. The existing
|
||||
`make backup` target only covers the legacy PostgreSQL-HA setup;
|
||||
CNPG backup configuration is its own workplan.
|
||||
- A second replica (HA) and a connection pooler (PgBouncer / CNPG
|
||||
`Pooler`) are deferred. The cluster spec leaves room
|
||||
for both; enable them when node capacity allows.
|
||||
@@ -0,0 +1,281 @@
|
||||
---
|
||||
id: RAILIANCE-WP-0004
|
||||
type: workplan
|
||||
title: "Establish ArgoCD GitOps bootstrap contract"
|
||||
domain: financials
|
||||
repo: railiance-platform
|
||||
status: finished
|
||||
owner: codex
|
||||
topic_slug: railiance
|
||||
planning_priority: high
|
||||
planning_order: 4
|
||||
created: "2026-06-19"
|
||||
updated: "2026-06-25"
|
||||
state_hub_workstream_id: "e57e487b-8557-439d-8093-0457c73ede93"
|
||||
---
|
||||
|
||||
# RAILIANCE-WP-0004 - Establish ArgoCD GitOps Bootstrap Contract
|
||||
|
||||
## Goal
|
||||
|
||||
Establish the minimal platform-owned ArgoCD GitOps contract needed for
|
||||
Railiance application teams to deploy through the already-installed ArgoCD
|
||||
instance on `railiance01`.
|
||||
|
||||
This work responds to the `issue-core` dependency message from 2026-06-18:
|
||||
ArgoCD is installed and healthy on `railiance01`, but unused. `issue-core`
|
||||
will be the first tenant Application and needs platform decisions before it
|
||||
can author its workload deployment.
|
||||
|
||||
## Intent Alignment
|
||||
|
||||
`INTENT.md` defines this repo as the shared platform-services layer:
|
||||
stateful services, secret custody, stable interfaces, and recoverable
|
||||
operational contracts.
|
||||
|
||||
ArgoCD itself is not an application, database, or secret store. The work in
|
||||
this repo is therefore intentionally limited to the platform contract around
|
||||
GitOps:
|
||||
|
||||
- repository trust and credential registration for ArgoCD;
|
||||
- AppProject guardrails that keep tenant syncs inside expected boundaries;
|
||||
- a root app-of-apps entrypoint that provides a stable onboarding surface;
|
||||
- the OpenBao-backed runtime secret delivery convention tenants must use.
|
||||
|
||||
Application workloads, container images, per-service manifests, and business
|
||||
logic remain owned by the tenant repos.
|
||||
|
||||
## Scope
|
||||
|
||||
In scope:
|
||||
|
||||
- Define the bootstrap manifests for ArgoCD AppProjects and the root
|
||||
app-of-apps Application.
|
||||
- Define how Git source repositories are registered without committing
|
||||
credentials.
|
||||
- Define where tenant Application manifests are placed and how they point
|
||||
back to tenant-owned workload manifests.
|
||||
- Confirm the runtime secret delivery pattern: OpenBao custody delivered to
|
||||
Kubernetes via External Secrets Operator by default; CSI-mounted files only
|
||||
when a workload requires file references; OpenBao injector remains disabled.
|
||||
- Provide an `issue-core` pilot Application example so that repo can author
|
||||
its final manifest against a concrete contract.
|
||||
|
||||
Out of scope:
|
||||
|
||||
- Installing or upgrading ArgoCD itself; that is cluster/runtime ownership.
|
||||
- Moving S5 application workload manifests into this repo.
|
||||
- Storing ArgoCD repository credentials, API tokens, or application secrets in
|
||||
Git, workplans, State Hub, or chat.
|
||||
- Applying live manifests that require operator-owned credentials.
|
||||
|
||||
## Decisions
|
||||
|
||||
### D-01 - Bootstrap Layout
|
||||
|
||||
Use this repo only for the platform-owned GitOps bootstrap:
|
||||
|
||||
```text
argocd/bootstrap/      AppProjects and root app-of-apps Application
argocd/applications/   thin tenant Application manifests reviewed by platform
argocd/repositories/   SOPS templates for ArgoCD repository Secret objects
docs/argocd-gitops.md  GitOps contract and onboarding guidance
```
|
||||
|
||||
The root Application syncs `argocd/applications/` from this repo. Tenant
|
||||
Application manifests in that directory point to workload manifests in each
|
||||
tenant repo, normally `k8s/railiance/`.
|
||||
|
||||
### D-02 - AppProject Model
|
||||
|
||||
Create two AppProjects:
|
||||
|
||||
- `railiance-bootstrap` only allows the root app to manage ArgoCD
|
||||
`Application` objects in the `argocd` namespace.
|
||||
- `railiance-tenants` allows tenant Applications to sync ordinary namespaced
|
||||
workload resources into their own namespaces, plus namespace creation. It
|
||||
does not grant CRD, ClusterRole, ClusterRoleBinding, or arbitrary
|
||||
cluster-admin authority.
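The `railiance-tenants` guardrails can be sketched as an AppProject that whitelists only `Namespace` at cluster scope. The sketch below is illustrative: the function name, the `TENANT_REPO_GLOB` value, and passing tenant namespaces as arguments are assumptions, and the committed manifest under `argocd/bootstrap/` may differ.

```bash
# Sketch only: prints a tenant AppProject; arguments are the allowed namespaces.
TENANT_REPO_GLOB="${TENANT_REPO_GLOB:-https://gitea.example.invalid/*}"  # hypothetical

render_tenants_project() {
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: railiance-tenants
  namespace: argocd
spec:
  sourceRepos:
    - "${TENANT_REPO_GLOB}"
  destinations:
EOF
  # One destination entry per approved tenant namespace.
  for ns in "$@"; do
    printf '    - server: https://kubernetes.default.svc\n'
    printf '      namespace: %s\n' "$ns"
  done
  cat <<EOF
  # Only Namespace is allowed at cluster scope; other cluster-scoped
  # kinds (CRDs, cluster RBAC) are not whitelisted and stay denied.
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
EOF
}
```

Calling `render_tenants_project issue-core` yields a project limited to the first tenant's namespace; further namespaces would be added only after platform review.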
|
||||
|
||||
### D-03 - Sync Policy
|
||||
|
||||
Default tenant Applications use automated sync with prune and self-heal
|
||||
enabled after platform review. Recommended sync options are:
|
||||
|
||||
```yaml
syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - CreateNamespace=true
    - ApplyOutOfSyncOnly=true
    - PruneLast=true
```
|
||||
|
||||
Sync waves are reserved for dependency ordering. Platform services and secret
|
||||
delivery resources should sync before workloads that consume them.
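Combining the D-01 layout with the sync options above, a thin tenant Application might look like the output of this sketch. The function name, the repository URL passed as an argument, and `targetRevision: main` are assumptions; the project name and the `k8s/railiance/` path follow the decisions in this workplan.

```bash
# Sketch only: prints a thin tenant Application.
# $1 = tenant/application name (also used as target namespace)
# $2 = tenant repository URL (hypothetical in the usage example)
render_tenant_application() {
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: $1
  namespace: argocd
spec:
  project: railiance-tenants
  source:
    repoURL: $2
    targetRevision: main   # assumption: tenant default branch
    path: k8s/railiance
  destination:
    server: https://kubernetes.default.svc
    namespace: $1
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ApplyOutOfSyncOnly=true
      - PruneLast=true
EOF
}
```

For instance, `render_tenant_application issue-core https://gitea.example.invalid/org/issue-core.git` prints a manifest a tenant could propose for `argocd/applications/`.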
|
||||
|
||||
### D-04 - Secret Delivery
|
||||
|
||||
OpenBao remains the canonical runtime secret custody service. For ordinary
|
||||
Kubernetes workloads, use External Secrets Operator to materialize OpenBao
|
||||
values as Kubernetes Secrets. Do not use the OpenBao injector in the current
|
||||
deployment.
|
||||
|
||||
Runtime path convention for workload credential custody:
|
||||
|
||||
```text
|
||||
platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>
|
||||
```
|
||||
|
||||
Kubernetes namespace and service-account bounds belong in the auth role or
|
||||
External Secrets binding unless the namespace is itself the approved workload
|
||||
identity.
|
||||
|
||||
ArgoCD repository credentials are operator credentials, not workload secrets,
|
||||
and should live under:
|
||||
|
||||
```text
|
||||
platform/operators/argocd/repositories/<repo-name>
|
||||
```
|
||||
|
||||
## Current Evidence
|
||||
|
||||
- State Hub inbox message `d7a18ff9-e6c6-4e44-a39e-78369e530dfc` reports
|
||||
ArgoCD is installed and healthy on `railiance01`, with zero Applications,
|
||||
zero ApplicationSets, zero registered repositories, and only the stock
|
||||
`default` AppProject.
|
||||
- `INTENT.md` and `SCOPE.md` keep this repo focused on shared platform
|
||||
services and secret custody. This work therefore creates a bootstrap
|
||||
contract and secret-delivery convention, not app workload ownership.
|
||||
- `docs/openbao.md` already states the preferred delivery pattern:
|
||||
External Secrets Operator for values that become Kubernetes Secrets, CSI for
|
||||
file-reference workloads, and no OpenBao injector in the current deployment.
|
||||
|
||||
|
||||
## Follow-up Progress (2026-06-25)
|
||||
|
||||
- Added a platform-owned `railiance-platform-addons` AppProject for
|
||||
cluster-scoped add-ons.
|
||||
- Added the `external-secrets` ArgoCD Application for External Secrets
|
||||
Operator and the `openbao-secretstore` Application for
|
||||
`ClusterSecretStore/openbao`.
|
||||
- Added the least-privilege OpenBao policy and Kubernetes auth role helper for
|
||||
the issue-core ESO pilot. The role binds only the
|
||||
`external-secrets/external-secrets` service account and reads only
|
||||
`platform/workloads/issue-core/issue-core/*`.
|
||||
- Limited the initial `ClusterSecretStore/openbao` to the `issue-core`
|
||||
namespace; broaden only through a later platform review.
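A tenant-side `ExternalSecret` against `ClusterSecretStore/openbao` could be sketched as below. Several details are assumptions and should be checked against the installed operator and store: the `external-secrets.io/v1beta1` API version, the secret purpose `api-key`, the refresh interval, and a remote key written relative to the `platform` mount. The field name `ISSUE_CORE_API_KEY` is the one T06 of this workplan mentions. No secret value appears in the manifest.

```bash
# Sketch only: prints an ExternalSecret for the issue-core pilot.
ESO_API_VERSION="${ESO_API_VERSION:-external-secrets.io/v1beta1}"  # assumption

render_issue_core_external_secret() {
  cat <<EOF
apiVersion: ${ESO_API_VERSION}
kind: ExternalSecret
metadata:
  name: issue-core-api-key
  namespace: issue-core
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: openbao
  target:
    name: issue-core-api-key
  data:
    - secretKey: ISSUE_CORE_API_KEY
      remoteRef:
        # Hypothetical purpose segment; path is relative to the KV mount.
        key: workloads/issue-core/issue-core/api-key
        property: ISSUE_CORE_API_KEY
EOF
}
```

The rendered manifest belongs in the tenant repo next to the workload that consumes the resulting Kubernetes Secret.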
|
||||
|
||||
## Target State
|
||||
|
||||
- `argocd/bootstrap/` contains the two AppProjects and root app-of-apps
|
||||
Application.
|
||||
- `argocd/applications/` documents the tenant Application contract and includes
|
||||
an `issue-core` example manifest.
|
||||
- `argocd/repositories/` contains non-secret SOPS templates for ArgoCD
|
||||
repository registration.
|
||||
- `docs/argocd-gitops.md` answers the four questions raised by `issue-core`:
|
||||
repository registration, source layout, sync policy, and secret delivery.
|
||||
- Make targets exist for dry-run, deploy, status, and SOPS-backed repository
|
||||
secret application.
|
||||
- `issue-core` can author its final Application and workload manifests against
|
||||
this contract without waiting for more platform design.
|
||||
|
||||
## Tasks
|
||||
|
||||
### T01 - Review intent and scope boundary
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0004-T01
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "7cb56ad6-5435-41af-b416-e68fe661b7a0"
|
||||
```
|
||||
|
||||
Review `INTENT.md`, `SCOPE.md`, existing OpenBao delivery docs, and the
|
||||
`issue-core` inbox request. Capture the boundary that ArgoCD bootstrap belongs
|
||||
here only as a platform trust and secret-delivery contract.
|
||||
|
||||
### T02 - Add ArgoCD bootstrap manifests
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0004-T02
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "68f7ef19-686d-4d16-bf75-ffcbba158023"
|
||||
```
|
||||
|
||||
Add AppProject manifests and the root app-of-apps Application under
|
||||
`argocd/bootstrap/`.
|
||||
|
||||
Done when the manifests can be rendered by `kubectl apply -k` and avoid secret
|
||||
material.
|
||||
|
||||
### T03 - Define tenant onboarding and repository registration
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0004-T03
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "e6dc9176-af33-4216-9871-a61ad7e69943"
|
||||
```
|
||||
|
||||
Add documentation and templates for tenant Applications, per-repo ArgoCD
|
||||
repository Secret registration, and the `issue-core` pilot example.
|
||||
|
||||
### T04 - Confirm OpenBao-backed secret delivery
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0004-T04
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "d859e4ef-d8d1-4403-8225-839925f8bedf"
|
||||
```
|
||||
|
||||
Document that OpenBao remains the runtime custody authority, External Secrets
|
||||
Operator is the default Kubernetes delivery mechanism, CSI is reserved for
|
||||
file-reference workloads, and the OpenBao injector remains disabled.
|
||||
|
||||
### T05 - Operator live bootstrap
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0004-T05
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "981f46c0-8dd7-4111-9a4f-2ca58ddb0664"
|
||||
```
|
||||
|
||||
Apply the bootstrap and repository credentials to live ArgoCD after these repo
|
||||
changes are merged to the Git source ArgoCD reads and, if the source repo is
|
||||
private, after an operator provides or materializes read-only repository
|
||||
credentials through the approved OpenBao/operator path.
|
||||
|
||||
Applied 2026-06-19 on the live ArgoCD cluster (`92.205.130.254`, default
|
||||
`~/.kube/config`). `make argocd-bootstrap-dry-run` and
|
||||
`make argocd-bootstrap-deploy` succeeded. Repository registration was skipped
|
||||
because `railiance-platform` and `issue-core` Gitea repos are currently public.
|
||||
|
||||
Post-bootstrap status:
|
||||
|
||||
- `railiance-apps-root`: Synced / Healthy
|
||||
- `issue-core`: OutOfSync / Missing — sync fails because
|
||||
`external-secrets.io/ExternalSecret` CRD is not installed on the cluster
|
||||
|
||||
Do not paste credentials into the workplan, State Hub, or chat.
|
||||
|
||||
### T06 - Notify first tenant
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0004-T06
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "73bdda1d-8e25-48d2-ab92-b203c5050d45"
|
||||
```
|
||||
|
||||
Reply to `issue-core` with the GitOps contract pointer and confirm that it owns
|
||||
the final `issue-core` Application proposal and workload manifests. Include the
|
||||
OpenBao path convention for `ISSUE_CORE_API_KEY` and the Gitea backend token.
|
||||
|
||||
State Hub reply: `56df276d-77d0-427f-92a5-a99cacc1290f`.
|
||||
@@ -0,0 +1,359 @@
|
||||
---
|
||||
id: RAILIANCE-WP-0006
|
||||
type: workplan
|
||||
title: "Workload KV Access Lanes for ops-warden Fetch"
|
||||
domain: financials
|
||||
repo: railiance-platform
|
||||
status: finished
|
||||
owner: codex
|
||||
topic_slug: railiance
|
||||
planning_priority: high
|
||||
planning_order: 6
|
||||
created: "2026-06-27"
|
||||
updated: "2026-06-29"
|
||||
depends_on_workplans:
|
||||
- RAIL-PL-WP-0002
|
||||
- RAILIANCE-WP-0004
|
||||
related_state_hub_messages:
|
||||
- "551031d1-335e-4db8-9535-820fea52d0a3"
|
||||
- "f76d3a9e-a98f-4081-885d-b79d94312699"
|
||||
state_hub_workstream_id: "96c8a93d-7a5a-4fa9-8f7b-865119551da3"
|
||||
---
|
||||
|
||||
# RAILIANCE-WP-0006 - Workload KV Access Lanes for ops-warden Fetch
|
||||
|
||||
## Goal
|
||||
|
||||
Provision concrete, least-privilege OpenBao workload KV read lanes that
|
||||
`ops-warden` can expose through `warden access --fetch` / `--exec` without
|
||||
holding secret values itself.
|
||||
|
||||
The immediate request is for `whynot-design` to retrieve its npm publish token.
|
||||
The path must be concrete, policy-scoped, and documented so the ops-warden
|
||||
catalog can replace the current unresolved template path with a live
|
||||
`whynot-design-npm-publish` entry.
|
||||
|
||||
No task in this workplan may paste, commit, log, or send secret values through
|
||||
Git, State Hub, chat, prompts, or workplan text.
|
||||
|
||||
## Requirements Reviewed
|
||||
|
||||
Ops-warden message `551031d1-335e-4db8-9535-820fea52d0a3` asks
|
||||
`railiance-platform` to provide non-secret pointers for:
|
||||
|
||||
- a concrete OpenBao KV path and field for `NPM_AUTH_TOKEN`;
|
||||
- the KV mount used by the path;
|
||||
- the OIDC login role for whynot-design or its operator identity;
|
||||
- a read policy scoped to whynot-design's identity/service account;
|
||||
- the flex-auth policy reference, if pre-approval is required.
|
||||
|
||||
Once these pointers are live, ops-warden will add a dedicated
|
||||
`whynot-design-npm-publish` access catalog entry and a playbook, then notify
|
||||
whynot-design.
|
||||
|
||||
## Proposed Contract
|
||||
|
||||
Use the workload credential convention documented in `docs/openbao.md`:
|
||||
|
||||
```text
|
||||
platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>
|
||||
```
|
||||
|
||||
For this lane, the proposed non-secret contract is:
|
||||
|
||||
| Item | Proposed value |
| --- | --- |
| KV mount | `platform` |
| Tenant/org | `coulomb` |
| Workload/project | `whynot-design` |
| CLI path | `platform/workloads/coulomb/whynot-design/npm-publish` |
| KV-v2 policy data path | `platform/data/workloads/coulomb/whynot-design/npm-publish` |
| KV-v2 policy metadata path | `platform/metadata/workloads/coulomb/whynot-design/npm-publish` |
| Secret field | `NPM_AUTH_TOKEN` |
| OpenBao read policy | `workload-kv-read-whynot-design-npm-publish` |
| OIDC auth mount | `netkingdom` unless KeyCape compatibility requires `keycape` |
| OIDC role | `whynot-design-workload-kv-read` |
| Kubernetes auth role | `whynot-design-workload-kv-read` if an in-cluster service account consumes it |
| flex-auth ref | `secret.read:whynot-design` if tenant policy requires pre-approval |
|
||||
|
||||
The expected caller-facing read shape is:
|
||||
|
||||
```bash
bao login -method=oidc -path=netkingdom role=whynot-design-workload-kv-read
bao kv get -field=NPM_AUTH_TOKEN platform/workloads/coulomb/whynot-design/npm-publish
```
|
||||
|
||||
The command shape is illustrative only. Verification must avoid printing the
|
||||
secret value; use attended operator checks or commands that prove read access
|
||||
without persisting the token in logs.
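One way to prove read access without printing the value is to discard the command output and report only success or failure. The sketch below assumes the caller has already logged in; the function name and the `BAO` override variable are hypothetical helpers, not part of OpenBao or ops-warden.

```bash
# Sketch only: reports whether a KV field is readable, never its value.
BAO="${BAO:-bao}"

verify_field_readable() {
  # $1 = CLI path, $2 = field name
  # The secret value goes to /dev/null; only the outcome is echoed.
  if "$BAO" kv get -field="$2" "$1" >/dev/null 2>&1; then
    echo "readable: $1#$2"
  else
    echo "denied-or-missing: $1#$2"
    return 1
  fi
}
```

An attended check would then be `verify_field_readable platform/workloads/coulomb/whynot-design/npm-publish NPM_AUTH_TOKEN`, which leaves only the path, field name, and result in the terminal.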
|
||||
|
||||
## Tasks
|
||||
|
||||
### T01 - Capture ops-warden request and path contract
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0006-T01
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "0c93496a-48bf-44e7-a75b-52e51e2639bc"
|
||||
```
|
||||
|
||||
Record the ops-warden request, existing workload path convention, and proposed
|
||||
whynot-design contract in this workplan.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- The workplan names the concrete path, field, mount, policy, auth role, and
|
||||
optional flex-auth ref needed by ops-warden.
|
||||
- The plan distinguishes non-secret pointers from secret values.
|
||||
- The plan keeps this workload KV read lane separate from
|
||||
`RAILIANCE-WP-0005`, which tracks short-lived OpenBao token issuance for the
|
||||
ops-warden signing smoke.
|
||||
|
||||
**2026-06-27:** Reviewed the unread ops-warden request and existing
|
||||
`platform/workloads/<tenant-or-org>/<workload>/<secret-purpose>` convention.
|
||||
Captured the proposed `whynot-design` npm publish lane above with no secret
|
||||
values.
|
||||
|
||||
### T02 - Add least-privilege OpenBao read policy
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0006-T02
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "9c06d531-2566-4767-aa2f-8339605f23d5"
|
||||
```
|
||||
|
||||
Create a concrete policy artifact for the whynot-design npm publish lane,
|
||||
derived from `openbao/policies/workload-kv-read-template.hcl` but narrowed to
|
||||
the selected `npm-publish` path.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- A policy file under `openbao/policies/` defines read access to the exact
|
||||
`platform/data/workloads/coulomb/whynot-design/npm-publish` path.
|
||||
- Metadata/list capabilities are only as broad as needed for the caller and
|
||||
ops-warden fetch UX.
|
||||
- The policy grants no write, delete, patch, sudo, auth, or unrelated workload
|
||||
capabilities.
|
||||
- The policy name matches the pointer intended for ops-warden:
|
||||
`workload-kv-read-whynot-design-npm-publish`.
|
||||
|
||||
**2026-06-27:** Added the concrete policy artifact at
|
||||
`openbao/policies/workload-kv-read-whynot-design-npm-publish.hcl`. It grants
|
||||
only `read` on the exact KV-v2 data and metadata paths for
|
||||
`platform/workloads/coulomb/whynot-design/npm-publish`; it does not grant
|
||||
write/delete/list/sudo/auth or sibling workload access. Added
|
||||
`scripts/openbao-apply-workload-kv-lanes.sh`,
|
||||
`make openbao-workload-kv-lanes-dry-run`, and
|
||||
`make openbao-configure-workload-kv-lanes` for the source-owned policy apply
|
||||
step. Dry-run passed. A live apply attempt with
|
||||
`OPENBAO_WORKLOAD_KV_ARGS=--use-token-helper` reached unsealed OpenBao but was
|
||||
denied with `403 permission denied` while writing the policy, so live policy
|
||||
application waits on an approved platform-admin/operator token or a narrow
|
||||
token-helper capability.
|
||||
|
||||
**2026-06-28:** Using the temporary operator token provided outside the repo,
|
||||
Codex applied/confirmed the live policy in OpenBao. The verification read of the
|
||||
policy succeeded and no secret values were printed or recorded.
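Based on the description above, the policy body is probably close to what this sketch prints. It is a reconstruction from the workplan text, not a copy of `openbao/policies/workload-kv-read-whynot-design-npm-publish.hcl`; the function and variable names are hypothetical.

```bash
# Sketch only: prints a read-only policy for one exact KV-v2 lane.
POLICY_NAME="workload-kv-read-whynot-design-npm-publish"
LANE="workloads/coulomb/whynot-design/npm-publish"

render_lane_policy() {
  cat <<EOF
# Read the secret data at the exact lane path (no wildcard).
path "platform/data/${LANE}" {
  capabilities = ["read"]
}

# Read metadata for the same path; no list, write, or delete.
path "platform/metadata/${LANE}" {
  capabilities = ["read"]
}
EOF
}
```

An operator with a suitable token could apply it with `render_lane_policy | bao policy write "$POLICY_NAME" -`, where `-` tells the CLI to read the policy from standard input.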
|
||||
|
||||
### T03 - Define and apply auth bindings
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0006-T03
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "a217371a-0f85-40c6-b691-ac67834c86b5"
|
||||
```
|
||||
|
||||
Define the auth role that lets whynot-design or an approved operator identity
|
||||
read the lane as itself.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- The OIDC login role is documented as
|
||||
`bao login -method=oidc -path=netkingdom role=whynot-design-workload-kv-read`,
|
||||
or a different approved role is recorded with the reason.
|
||||
- The role attaches only the whynot-design npm publish read policy.
|
||||
- If an in-cluster whynot-design service account consumes the token, the
|
||||
Kubernetes auth role binds only the approved namespace and service account.
|
||||
- Compatibility with the legacy `keycape` auth mount is either configured or
|
||||
explicitly declined.
|
||||
|
||||
**2026-06-27:** Documented the intended OIDC role pointer as
|
||||
`auth/netkingdom/role/whynot-design-workload-kv-read` in
|
||||
`docs/workload-kv-access-lanes.md`. Live application is waiting on confirmation
|
||||
of the KeyCape/NetKingdom whynot-design bound claim or approved service-account
|
||||
subject; do not create an unbounded OIDC role.
|
||||
|
||||
**2026-06-28:** Created/confirmed
|
||||
`auth/netkingdom/role/whynot-design-workload-kv-read` with
|
||||
`groups=["whynot-design"]`, only the
|
||||
`workload-kv-read-whynot-design-npm-publish` policy, `ttl=15m`, and the approved
|
||||
browser/local CLI callback URIs.
|
||||
|
||||
**2026-06-28:** Positive verification found the OIDC role was missing
|
||||
`oidc_scopes`, causing OpenBao login to fail with `groups claim not found`.
|
||||
Updated the live role and source CCR to request `openid`, `profile`, `email`,
|
||||
and `groups`, matching the platform-admin OIDC scope shape.
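The role settings recorded above could be expressed as a JSON payload like the one this sketch prints. The `user_claim` value and the redirect URI are assumptions (the workplan does not state them); the bound group, policy, TTL, and scopes follow the 2026-06-28 notes. The function name is hypothetical.

```bash
# Sketch only: prints a JSON payload for the OIDC role.
render_oidc_role_payload() {
  cat <<'EOF'
{
  "role_type": "oidc",
  "user_claim": "sub",
  "groups_claim": "groups",
  "bound_claims": { "groups": ["whynot-design"] },
  "oidc_scopes": ["openid", "profile", "email", "groups"],
  "token_policies": ["workload-kv-read-whynot-design-npm-publish"],
  "token_ttl": "15m",
  "allowed_redirect_uris": ["http://localhost:8250/oidc/callback"]
}
EOF
}
```

Under those assumptions, `render_oidc_role_payload | bao write auth/netkingdom/role/whynot-design-workload-kv-read -` would create or update the role from standard input. Without the `groups` scope the identity provider may omit the groups claim, which matches the `groups claim not found` failure noted above.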
|
||||
|
||||
### T04 - Provision the KV path without exposing the token
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0006-T04
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "c43724a3-c83e-4ab6-b7d1-e427fd93a9a9"
|
||||
```
|
||||
|
||||
Have an approved operator create or confirm the OpenBao KV entry for the npm
|
||||
publish token.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- The path exists at
|
||||
`platform/workloads/coulomb/whynot-design/npm-publish`.
|
||||
- The field is named exactly `NPM_AUTH_TOKEN`.
|
||||
- The token value is entered through an approved operator/OpenBao path and is
|
||||
never written to Git, State Hub, chat, prompts, shell history, or workplan
|
||||
text.
|
||||
- Non-secret evidence records only the path, field name, actor, timestamp,
|
||||
policy name, and verification result.
|
||||
|
||||
**2026-06-27:** The concrete path and field are now documented. Live secret
|
||||
provisioning is waiting on an approved operator/OpenBao custody path for the
|
||||
actual `NPM_AUTH_TOKEN` value.
|
||||
|
||||
**2026-06-28:** Confirmed the OpenBao metadata at
|
||||
`platform/workloads/coulomb/whynot-design/npm-publish` includes
|
||||
`catalog-id=whynot-design-npm-publish` and that the `NPM_AUTH_TOKEN` field is
|
||||
present. The value was not printed, recorded, or copied into Git, State Hub,
|
||||
chat, or workplans.
|
||||
|
||||
### T05 - Verify caller-scoped fetch behavior
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0006-T05
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "dc1f470b-e78a-48a9-9957-965aed47861f"
|
||||
```
|
||||
|
||||
Prove that the authorized identity can read the token through the intended
|
||||
OpenBao path and that unauthorized identities cannot.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- An approved whynot-design identity or operator role can authenticate and
|
||||
perform the fetch without unresolved `<...>` placeholders.
|
||||
- Negative verification shows a non-whynot identity cannot read the path.
|
||||
- Verification output contains no token value.
|
||||
- OpenBao audit evidence exists for the authorized read and denied read, with
|
||||
only non-secret request ids/timestamps recorded in the workplan or State Hub.
|
||||
|
||||
**2026-06-27:** Verification is waiting on live policy/role application and
|
||||
secret provisioning. The runbook requires positive and negative fetch evidence
|
||||
without printing the token value.
|
||||
|
||||
**2026-06-28:** Non-secret operator checks now pass for policy, auth role,
|
||||
metadata, and field presence. Remaining verification is the attended
|
||||
whynot-design OIDC positive check and a non-whynot denial check, both without
|
||||
printing the token.
|
||||
|
||||
**2026-06-29:** Positive and negative caller verification passed without
|
||||
printing the token value. The negative check failed OIDC login with the expected
|
||||
groups bound-claim mismatch. `platform-root` was restored to the
|
||||
`whynot-design` group after the temporary negative-test removal.
|
||||
|
||||
### T06 - Coordinate ops-warden catalog activation
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0006-T06
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "8e84ec19-01db-4baf-a532-de87e51d4994"
|
||||
```
|
||||
|
||||
Send ops-warden the non-secret pointers needed to create and activate its
|
||||
dedicated access catalog entry.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- The State Hub reply to ops-warden includes only path, field, KV mount,
|
||||
OIDC role, policy name/path, optional flex-auth ref, and runbook location.
- Ops-warden confirms the `whynot-design-npm-publish` catalog entry no longer
contains unresolved placeholders.
- `warden access "npm auth token" --fetch` or the agreed exact selector resolves
to the whynot-design lane and proxies the read as the caller.
- ops-warden confirms it holds no token value and only proxies OpenBao access.

**2026-06-27:** Added `docs/workload-kv-access-lanes.md` with the non-secret
handoff payload for ops-warden and sent the pointers by State Hub message. The
entry should remain draft/non-active until live OpenBao provisioning and
verification complete.

**2026-06-28:** The generic `openbao-api-key` ops-warden access lane can proxy
the check with explicit `--path` and `--field`, but the dedicated
`whynot-design-npm-publish` route is not yet present in the ops-warden routing
catalog. Keep activation pending until caller verification and catalog update.

**2026-06-29:** `CCR-2026-0001` is now active with
`access_frontdoor.readiness=ready` and `resolvable=true`. ops-warden still needs
to confirm that its dedicated `whynot-design-npm-publish` catalog selector
resolves through the caller-scoped lane.

**2026-06-29:** ops-warden confirmed in State Hub message
`f76d3a9e-a98f-4081-885d-b79d94312699` that catalog selector
`whynot-design-npm-publish` is `status: active`, `resolvable: true`, and wired
to the owner-confirmed lane:
`platform/workloads/coulomb/whynot-design/npm-publish`, field
`NPM_AUTH_TOKEN`, OIDC role `whynot-design-workload-kv-read`, and policy
`workload-kv-read-whynot-design-npm-publish`. ops-warden also confirmed it
notified whynot-design with `warden access whynot-design-npm-publish --exec -- npm publish`,
and that the sibling lanes remain draft for separate planning.

## T07 - Decide whether to batch sibling workload-KV requests

```task
id: RAILIANCE-WP-0006-T07
status: done
priority: medium
state_hub_task_id: "0b3ab5f5-e933-41f2-b29a-ab4ac50593aa"
```

Ops-warden noted similar still-open access lanes for
`issue-core-ingestion-api-key` and `openrouter-llm-connect`. Decide whether to
batch those paths in the same provisioning pass or keep this workplan scoped to
whynot-design.

Acceptance:

- The decision is recorded without secret values.
- If batching is approved, add concrete sub-tasks or a follow-up workplan for
each additional lane.
- If batching is deferred, notify ops-warden that this workplan will deliver
whynot-design first and leave the sibling entries for separate planning.

**2026-06-27:** Initially deferred sibling lanes (`issue-core-ingestion-api-key`
and `openrouter-llm-connect`) so the whynot-design npm token request could be
serviced first. The later ops-warden batch follow-up is now represented as
proposed CCRs in `RAILIANCE-WP-0007`, still unapproved and unresolvable until
human review and verification.

**2026-06-29:** Reviewed the sibling lane suggestions against `INTENT.md`.
Created follow-up workplans `RAILIANCE-WP-0009` for the issue-core runtime
ingestion credential lane and `RAILIANCE-WP-0010` for the llm-connect
OpenRouter provider key lane. Both plans keep this repo's scope limited to
shared platform secret custody, least-privilege OpenBao/External Secrets
delivery, verification, and ops-warden front-door handoff.

## Exit Criteria

- The whynot-design npm publish token has a concrete OpenBao KV path, field,
read policy, and auth role.
- The authorized caller can fetch the token as itself through OpenBao and
ops-warden without ops-warden storing the value.
- Unauthorized reads are denied.
- ops-warden has enough non-secret pointers to activate
`whynot-design-npm-publish`.
- No secret values appear in Git, State Hub, chat, prompts, logs, or workplans.
@@ -0,0 +1,404 @@
|
||||
---
id: RAILIANCE-WP-0007
type: workplan
title: "Credential Change Proposal Review Workflow"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 7
created: "2026-06-27"
updated: "2026-06-30"
depends_on_workplans:
- RAIL-PL-WP-0002
- RAILIANCE-WP-0005
- RAILIANCE-WP-0006
state_hub_workstream_id: "4d7ce243-f40a-4249-a46a-a24f75d6fe4c"
---

# RAILIANCE-WP-0007 - Credential Change Proposal Review Workflow

## Goal

Create a proposal -> review -> approve/deny with comment -> apply -> verify
workflow for credential and it-sec changes, so operators do not need to author
or mentally validate raw OpenBao commands.

The first target is the whynot-design npm token lane from `RAILIANCE-WP-0006`.
The workflow should then generalize to workload KV paths, OpenBao token roles,
ops-warden access catalog entries, External Secrets lanes, credential rotation,
deactivation, and compromise handling.

## Direction

Do not start by extending OpenBao. Instead, build a small approval control
plane around OpenBao:

- OpenBao remains the enforcement, secret storage, token, and audit engine.
- State Hub stores non-secret request lifecycle, comments, decisions, and
evidence.
- Repo files store reviewable non-secret request specs and generated policy
artifacts.
- Agents and CLIs create proposals and render them for human review.
- Humans approve or deny with comments.
- Only approved requests can be applied by an operator-controlled runner or
interactive runbook.

If the workflow proves valuable, a later UI or OpenBao extension can surface the
same request index and statuses.

## Proposed Object

Introduce a non-secret Credential Change Request, or `CCR`.

Each CCR captures:

- request id, title, requester, reviewer, approver, and applier;
- target tenant/workload/environment/purpose;
- OpenBao mount, path, fields, policies, auth roles, and bound claims;
- access front door such as ops-warden, External Secrets, CSI, or direct caller
fetch;
- risk classification and approval requirements;
- generated apply plan and verification plan;
- rollback, deactivate, rotate, and compromise response plan;
- comments, decision, timestamps, and non-secret audit evidence.

Each CCR explicitly excludes secret values, token values, private keys,
passwords, unseal/recovery material, and secret-bearing command output.

## Tasks

## T01 - Record the approval workflow design

```task
id: RAILIANCE-WP-0007-T01
status: done
priority: high
state_hub_task_id: "c82ee783-80f1-48da-a9ed-4565eac699fc"
```

Document the desired operator workflow and why it should sit around OpenBao
rather than inside the OpenBao UI initially.

Acceptance:

- The design describes the proposal, review, approval/denial, apply, verify,
activate, deactivate, rotate, and compromised states.
- The design names where State Hub, OpenBao, ops-warden, repo files, agents,
and interactive runbooks fit.
- The design keeps secret values out of State Hub, Git, chat, and prompts.

**2026-06-27:** Added `docs/credential-change-approval.md` with the control
plane direction, CCR object, state machine, State Hub/OpenBao/ops-warden roles,
interactive runbook role, and compromise/deactivation path.

## T02 - Define the CCR schema and storage layout

```task
id: RAILIANCE-WP-0007-T02
status: done
priority: high
state_hub_task_id: "d50fb9e2-68c2-4a2b-8476-ce646d13e60a"
```

Create a versioned non-secret schema for credential change requests.

Acceptance:

- A schema exists for `workload-kv-read` requests covering mount, path, fields,
policy name, auth role, bound claims, access front door, verification plan,
and activation conditions.
- The schema supports decision metadata: requested, proposed, approved,
denied, needs_changes, applied, verified, active, deactivated, rotated,
compromised, superseded, and cancelled.
- The schema supports comments and references State Hub ids without storing
secrets.
- Example CCR fixtures include the whynot-design npm token lane.

**2026-06-27:** Added `schemas/credential-change-request.schema.yaml`, the
`credential-change-requests/` storage directory, and
`credential-change-requests/CCR-2026-0001-whynot-design-npm-publish.yaml` as the
first non-secret CCR fixture. The whynot CCR is intentionally `proposed` and
marks the bound claim as unconfirmed, so apply is blocked until review.

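
The T02 acceptance criteria can be pictured as a small data model. The following is a minimal Python sketch, not the contents of `schemas/credential-change-request.schema.yaml`: the class names, attribute names, the `kv` mount, and the placeholder claim value are assumptions made for illustration. The status values, path, field name, policy, and auth role are the ones named in this workplan.

```python
from dataclasses import dataclass, field
from enum import Enum


class CcrStatus(str, Enum):
    """Decision states listed in the T02 acceptance criteria."""

    REQUESTED = "requested"
    PROPOSED = "proposed"
    APPROVED = "approved"
    DENIED = "denied"
    NEEDS_CHANGES = "needs_changes"
    APPLIED = "applied"
    VERIFIED = "verified"
    ACTIVE = "active"
    DEACTIVATED = "deactivated"
    ROTATED = "rotated"
    COMPROMISED = "compromised"
    SUPERSEDED = "superseded"
    CANCELLED = "cancelled"


@dataclass
class WorkloadKvReadCcr:
    """Hypothetical in-memory shape of a non-secret workload-kv-read CCR."""

    ccr_id: str
    mount: str
    path: str
    fields: list[str]
    policy_name: str
    auth_role: str
    bound_claims: dict[str, str] = field(default_factory=dict)
    bound_claims_confirmed: bool = False
    status: CcrStatus = CcrStatus.PROPOSED

    def apply_allowed(self) -> bool:
        # Apply stays blocked until a human has approved the CCR and the
        # auth binding has been explicitly confirmed.
        return self.status is CcrStatus.APPROVED and self.bound_claims_confirmed


ccr = WorkloadKvReadCcr(
    ccr_id="CCR-2026-0001",
    mount="kv",  # assumption: the real mount name is not stated here
    path="platform/workloads/coulomb/whynot-design/npm-publish",
    fields=["NPM_AUTH_TOKEN"],  # field name only, never the value
    policy_name="workload-kv-read-whynot-design-npm-publish",
    auth_role="whynot-design-workload-kv-read",
    bound_claims={"groups": "<unconfirmed>"},  # placeholder claim value
)
print(ccr.status.value, ccr.apply_allowed())
```

A freshly proposed CCR with an unconfirmed binding reports that apply is not allowed, which matches the blocked state described in the 2026-06-27 note.
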
## T03 - Add offline validation and rendering

```task
id: RAILIANCE-WP-0007-T03
status: done
priority: high
state_hub_task_id: "012f05cd-30ce-43dd-802b-4acc938db133"
```

Add a helper that validates CCR files and renders human review summaries.

Acceptance:

- Invalid CCRs fail before any OpenBao apply is attempted.
- The renderer produces a compact review block that a human can understand in
chat or State Hub.
- The renderer highlights risky fields: broad claims, wildcard paths,
privileged policies, missing negative verification, and missing deactivation
plan.
- A secret-pattern scan rejects likely token values in CCR files.

**2026-06-27:** Added `scripts/credential-change.py validate` and `render`,
plus Make targets `credential-change-validate` and `credential-change-render`.
Validation rejects secret-looking markers and broad/unsafe request shapes; render
produces the chat/State Hub review summary and highlights unconfirmed bound
claims. CCRs now also carry machine-readable front-door readiness fields:
`access_frontdoor.readiness` and `access_frontdoor.resolvable`. Unit coverage
lives in `tests/test_credential_change.py`.

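
The secret-pattern scan and risky-field highlighting from T03 could look roughly like the sketch below. It is not the logic in `scripts/credential-change.py`: the function names, the two regular expressions, and the dictionary keys (`verification`, `deactivation_plan`) are hypothetical and chosen only to illustrate the checks listed in the acceptance criteria.

```python
import re

# Illustrative patterns only; the markers the real scanner uses are not
# documented in this workplan.
SECRET_PATTERNS = {
    "npm token": re.compile(r"\bnpm_[A-Za-z0-9]{36}\b"),
    "PEM private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_for_secrets(text: str) -> list[str]:
    """Return the labels of secret-looking patterns found in CCR text."""
    return [
        label for label, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    ]


def risk_flags(ccr: dict) -> list[str]:
    """Collect review warnings for the risky fields named in T03."""
    flags = []
    if "*" in ccr.get("path", ""):
        flags.append("wildcard path")
    if not ccr.get("bound_claims"):
        flags.append("broad or missing bound claims")
    if not ccr.get("verification", {}).get("negative"):
        flags.append("missing negative verification")
    if not ccr.get("deactivation_plan"):
        flags.append("missing deactivation plan")
    return flags


example = {
    "path": "platform/workloads/coulomb/whynot-design/*",  # deliberately unsafe
    "bound_claims": {"groups": "<placeholder>"},
    "verification": {"positive": "caller fetch succeeds"},
}
print(scan_for_secrets("field: NPM_AUTH_TOKEN"))
print(risk_flags(example))
```

Naming a field such as `NPM_AUTH_TOKEN` does not trip the scan, since only value-shaped strings match. The wildcard path and the missing negative verification and deactivation plan in the example are reported as review warnings.
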
## T04 - Generate OpenBao apply plans from approved CCRs

```task
id: RAILIANCE-WP-0007-T04
status: done
priority: high
state_hub_task_id: "1b2e7752-815c-46f8-a2e2-212e8d04da80"
```

Generate deterministic, reviewable OpenBao apply plans from CCRs.

Acceptance:

- A workload KV CCR can generate policy HCL and auth-role commands or API
payloads.
- The plan includes a dry-run mode and a diff against existing source
artifacts when available.
- Applying a plan is refused unless the CCR is approved.
- The applier uses an approved operator authority path and does not accept raw
tokens in argv or logs.

**2026-06-27:** Added `plan` and guarded `apply-plan` rendering for workload KV
CCRs, with Make targets `credential-change-plan` and
`credential-change-apply-plan`. `apply-plan` currently refuses any CCR that is
not `approved` and also refuses unconfirmed bound claims. Remaining T04 work is
to add a richer diff against existing source artifacts and eventually bridge
from reviewed plan to the interactive live applier.

**2026-06-28:** Added OIDC `allowed_redirect_uris` to the CCR contract and
generated role payloads after live OpenBao rejected an OIDC role without
callbacks. Unit coverage now checks the generated whynot-design role payload.

**2026-06-30:** Added source-artifact diff rendering to `plan` and delegated
`applier-dry-run` output. The generated plan now reports whether the checked-in
policy artifact matches the CCR-generated HCL and shows a unified diff when it
does not. Approved-only `apply-plan`/`operator-commands` remain gated by CCR
status and confirmed auth binding.

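
The source-artifact diff described for T04 can be illustrated with Python's standard `difflib`. This is a sketch under stated assumptions, not the `plan` implementation: `render_read_policy` and `diff_policy` are hypothetical helpers, the `kv` mount is a placeholder, and the `data/` path segment assumes a KV version 2 mount.

```python
import difflib


def render_read_policy(mount: str, path: str) -> str:
    """Render a read-only policy for one KV path (assumes a KV v2 mount)."""
    return (
        f'path "{mount}/data/{path}" {{\n'
        '  capabilities = ["read"]\n'
        "}\n"
    )


def diff_policy(generated: str, checked_in: str, name: str) -> str:
    """Return a unified diff from the checked-in artifact to the generated HCL.

    An empty string means the checked-in artifact already matches.
    """
    lines = difflib.unified_diff(
        checked_in.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=f"checked-in/{name}.hcl",  # labels are illustrative
        tofile=f"generated/{name}.hcl",
    )
    return "".join(lines)


name = "workload-kv-read-whynot-design-npm-publish"
generated = render_read_policy(
    "kv",  # assumption: the real mount name is not stated here
    "platform/workloads/coulomb/whynot-design/npm-publish",
)
# Simulate a checked-in artifact that drifted to a broader capability set.
drifted = generated.replace('["read"]', '["read", "list"]')
print(diff_policy(generated, drifted, name) or "policy artifact matches")
```

Because the rendering is deterministic, an empty diff is enough to state that the checked-in policy artifact matches the CCR, and any drift shows up as a reviewable unified diff.
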
## T05 - Add chat/CLI approval commands

```task
id: RAILIANCE-WP-0007-T05
status: done
priority: high
state_hub_task_id: "e6d4d2d1-1881-4db7-92f8-05e3fdb846ae"
```

Make the workflow usable from chat and command line.

Acceptance:

- Operators can approve, deny, or request changes with a comment.
- Approvals/denials are recorded as non-secret State Hub events and in the CCR
file or linked decision record.
- The system refuses apply when the latest human decision is denied or
needs_changes.
- Agents can propose changes and respond to review comments without receiving
secret values.

**2026-06-27:** Added file-backed `approve`, `deny`, and `needs-changes`
commands that require reviewer and comment text and append non-secret review
comments to the CCR. Added `confirm-binding` for explicit non-secret auth
binding confirmation. Added `status` plus Make targets
`credential-change-status` and `credential-change-status-json` so ops-warden can
consume `readiness`/`resolvable` without scraping prose. Remaining T05 work is
State Hub decision-event emission and tighter chat integration. Created a
State Hub decision for `CCR-2026-0001` and added `sync-decision` so resolved
State Hub decisions can update the file-backed CCR status.

**2026-06-30:** Added optional `--record-state-hub` emission for approve, deny,
and needs-changes commands. Review comments are checked for known secret markers
before being written, and the State Hub progress event records only non-secret
CCR id/path/policy/field/auth-role metadata plus the reviewer comment.

## T06 - Build an interactive runbook for apply and verify

```task
id: RAILIANCE-WP-0007-T06
status: done
priority: high
state_hub_task_id: "3c3fc38c-afa4-4367-b3e6-ba4b286ced30"
```

Wrap privileged application in an operator-friendly guided runbook.

Acceptance:

- The runbook loads an approved CCR, shows the plan, asks for final attended
confirmation, then applies policy/auth metadata.
- Secret value entry is handled through an approved OpenBao/operator path and
is never echoed or logged.
- Positive and negative verification steps are guided.
- Non-secret evidence is recorded automatically.

**2026-06-30:** Added `scripts/credential-change.py runbook <CCR>` and Make
target `credential-change-runbook` to render the attended operator checklist,
final confirmation phrase, metadata apply guidance, secret custody instructions,
positive/negative verification steps, activation conditions, and evidence
commands. `runbook --execute-metadata` is opt-in, requires the exact `APPLY
<CCR-ID>` confirmation phrase, uses the local `bao` CLI with ambient approved
operator authority, writes only policy/auth metadata, and records a non-secret
`metadata_apply` evidence entry. Added `record-evidence` plus Make target
`credential-change-record-evidence` so operators can append apply, secret
provisioning, verification, and activation evidence to the CCR and optionally
State Hub without storing secret values.

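
The attended confirmation gate described for `runbook --execute-metadata` might be sketched as follows. The helper names and the policy file path are hypothetical, and the sketch only builds the command list rather than running it. It assumes the policy is written with the `bao policy write <name> <file>` subcommand and that operator authority comes from the ambient environment, so no token appears in argv.

```python
def confirmation_phrase(ccr_id: str) -> str:
    """Phrase the operator must type before metadata is applied."""
    return f"APPLY {ccr_id}"


def is_confirmed(ccr_id: str, typed: str) -> bool:
    # Exact, case-sensitive match; only a trailing line ending is tolerated.
    return typed.rstrip("\r\n") == confirmation_phrase(ccr_id)


def metadata_commands(policy_name: str, policy_file: str) -> list[list[str]]:
    """Build the non-secret policy command as an argv list.

    No token is passed here: the sketch assumes authority is supplied by the
    operator's ambient environment, not by command-line arguments.
    """
    return [["bao", "policy", "write", policy_name, policy_file]]


ccr_id = "CCR-2026-0001"
typed = "APPLY CCR-2026-0001\n"  # stand-in for attended operator input
if is_confirmed(ccr_id, typed):
    for argv in metadata_commands(
        "workload-kv-read-whynot-design-npm-publish",
        "policies/workload-kv-read-whynot-design-npm-publish.hcl",  # hypothetical path
    ):
        print(" ".join(argv))
else:
    print(f"Refusing to apply; type exactly: {confirmation_phrase(ccr_id)}")
```

Keeping the phrase tied to the CCR id means a confirmation typed for one request cannot be replayed against another.
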
## T07 - Pilot with whynot-design and ops-warden

```task
id: RAILIANCE-WP-0007-T07
status: done
priority: high
state_hub_task_id: "07a7d8bf-5528-41c8-a791-d6ccd0466a33"
```

Use the existing whynot-design npm token lane as the first end-to-end pilot.

Acceptance:

- The current whynot-design lane is represented as a CCR.
- The CCR is rendered and reviewed in chat or State Hub.
- A human approval or denial comment is recorded.
- If approved, the runbook applies the policy/auth metadata, guides secret
provisioning, verifies access, and notifies ops-warden.
- ops-warden activates its catalog entry only after CCR verification.

**2026-06-27:** The whynot-design lane is represented as `CCR-2026-0001` and
can be rendered for review. The whynot-design bound claim was confirmed from
operator chat context and recorded in the CCR, but it remains proposed/unapproved,
so live apply and ops-warden activation are correctly blocked.

**2026-06-27:** Converted the ops-warden batch follow-up
`fe5b1696-8956-4bd5-9d6f-dbde1901a076` into three proposed CCRs:
`CCR-2026-0001` for `whynot-design-npm-publish`, `CCR-2026-0002` for
`issue-core-ingestion-api-key`, and `CCR-2026-0003` for
`llm-connect-openrouter-api-key`. All three are explicitly `readiness: template`
and `resolvable: false` until owner confirmation, approval, OpenBao apply,
secret provisioning, and verification are complete.

**2026-06-28:** Synced State Hub decision
`250669d0-8475-4527-9624-cd072249f9a9` into `CCR-2026-0001`; the lane is now
`approved` with confirmed binding and `apply_allowed: true`. A live OpenBao
policy apply using the available token helper reached the active OpenBao pod but
still failed with `403 permission denied` on
`sys/policies/acl/workload-kv-read-whynot-design-npm-publish`, so the front door
remains `readiness: template` and `resolvable: false`. Added guarded
`credential-change-operator-commands` output so a platform operator can run the
reviewed non-secret policy and auth-role commands without hand-writing them;
secret value provisioning and verification remain under approved custody.

**2026-06-28:** After correcting the tenant/org to `coulomb`, the corrected
approval was synced from State Hub decision
`e6381a56-6b04-4fd5-b2de-f3ef59cde888`; `CCR-2026-0001` is approved and
`apply_allowed: true` for
`platform/workloads/coulomb/whynot-design/npm-publish`. The operator reported
secret provisioning likely completed, but Codex metadata-only verification still
received `403 permission denied`. Prepared
`docs/whynot-design-npm-publish-handoff.md` as the next-session checklist for
policy, auth-role, metadata verification, positive verification, negative
verification, and activation without printing the token.

**2026-06-28:** With the temporary operator token, Codex applied/confirmed the
OpenBao read policy and OIDC role, confirmed metadata `catalog-id`, and confirmed
`NPM_AUTH_TOKEN` field presence without printing or recording the value. The CCR
now records non-secret evidence for that apply check. Positive whynot-design and
negative non-whynot caller verification still gate `active`/`ready`.

**2026-06-29:** The whynot-design pilot completed OpenBao verification. Positive
fetch succeeded with output suppressed, negative login failed with the expected
groups bound-claim mismatch, `platform-root` membership was restored afterward,
and `CCR-2026-0001` is now active/ready/resolvable. ops-warden catalog
confirmation remains the external closeout step.

**2026-06-30:** Closed the pilot task based on the active/ready/resolvable CCR
state and prior ops-warden catalog confirmation that the selector is active and
resolvable. The remaining lifecycle work is now tracked separately in T08.

## T08 - Add deactivation, rotation, and compromise flows

```task
id: RAILIANCE-WP-0007-T08
status: done
priority: medium
state_hub_task_id: "23d6ef9d-8dbc-4468-b486-5ec8ada71130"
```

Support lifecycle states beyond initial creation.

Acceptance:

- Existing credentials can be imported as CCR-backed inventory without secret
values.
- Operators can mark a lane deactivated, rotated, or compromised with reason
and evidence.
- Deactivation disables the relevant access front door and auth/policy path.
- Compromise flow records blast-radius notes and required follow-up tasks.

**2026-06-30:** Added `lifecycle-plan`, `lifecycle-event`, and
`import-inventory` commands plus Make targets. Lifecycle plans render
deactivation, rotation, and compromise guidance, including access-front-door
state changes and OpenBao metadata disable commands for deactivation or
compromise. Lifecycle events update CCR status/front-door readiness, append
non-secret lifecycle evidence, and optionally post State Hub progress.
Compromise events accept non-secret blast-radius and follow-up references.
`import-inventory` can create a CCR-backed inventory file and matching read
policy artifact for an existing lane without asking for or storing secret
values.

## T09 - Add decision templates and guided review actions

```task
id: RAILIANCE-WP-0007-T09
status: done
priority: high
state_hub_task_id: "c436fd8b-cd82-4600-81b0-87ec069d7ae6"
```

Remove the current friction where reviewers must know magic rationale prefixes
for State Hub decisions to sync back into CCR status.

Acceptance:

- Each CCR review page or chat handoff shows explicit approve, deny, and needs
changes templates.
- Generated templates include the accepted prefixes (`APPROVE:`, `DENY:`, and
`NEEDS_CHANGES:`) and pre-fill the CCR id, corrected path, policy, auth role,
and non-secret rationale prompt.
- The dashboard or agent response links directly to the decision and states what
phrase or button will be recognized.
- The sync tooling refuses ambiguous free-text approvals with a friendly message
that shows the valid templates.
- Future UI work can replace prefix parsing with structured decision outcomes
without changing the CCR audit trail.

**2026-06-30:** Added `scripts/credential-change.py decision-templates <CCR>`
and Make target `credential-change-decision-templates`. The generated templates
include accepted prefixes, CCR id, KV path, policy, auth-role path, and the
linked State Hub decision. Ambiguous State Hub rationale text now fails with the
valid templates in the error message.

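
The prefix handling described for T09 could be sketched as below. This is an illustration, not the `sync-decision` code: `parse_decision` is a hypothetical helper, and treating an empty rationale as an error is an assumption. The three prefixes and the refusal of ambiguous free text come from the acceptance criteria.

```python
# Accepted prefixes from T09, mapped to the CCR statuses they would set.
PREFIXES = {
    "APPROVE:": "approved",
    "DENY:": "denied",
    "NEEDS_CHANGES:": "needs_changes",
}


def valid_templates() -> str:
    """Human-readable list of the recognized decision templates."""
    return ", ".join(f"'{prefix} <non-secret rationale>'" for prefix in PREFIXES)


def parse_decision(rationale: str) -> tuple[str, str]:
    """Map State Hub rationale text to (status, comment) or raise ValueError."""
    text = rationale.strip()
    for prefix, status in PREFIXES.items():
        if text.startswith(prefix):
            comment = text[len(prefix):].strip()
            if not comment:
                raise ValueError(
                    f"Decision needs a rationale; use one of: {valid_templates()}"
                )
            return status, comment
    # Free text such as "looks good" is refused rather than guessed at.
    raise ValueError(f"Ambiguous decision text; use one of: {valid_templates()}")


print(parse_decision("APPROVE: bound claim confirmed by lane owner"))
try:
    parse_decision("looks good to me")
except ValueError as err:
    print(err)
```

Returning a structured `(status, comment)` pair keeps the audit trail independent of how the decision was entered, so a later UI could supply the same pair without prefix parsing.
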
## Exit Criteria

- A human can review and approve or deny a credential/security change without
writing raw OpenBao commands.
- An approved request can be applied by an operator-controlled helper or
interactive runbook.
- State Hub and repo artifacts contain non-secret lifecycle, decision, and
evidence records.
- OpenBao remains the enforcement and audit source for actual secret access.
- The whynot-design npm token lane can complete through this workflow.