---
id: RAILIANCE-WP-0003
type: workplan
title: "Provision shared CNPG cluster apps-pg"
domain: railiance
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 3
created: "2026-05-19"
updated: "2026-05-19"
state_hub_workstream_id: "665b3b9b-608a-4be4-84b6-dcb8261ff57b"
---

# RAILIANCE-WP-0003 - Provision shared CNPG cluster apps-pg

## Goal

Provision a new shared CloudNativePG cluster `apps-pg` in the
`databases` namespace that S5 application workloads can use to host
their own PostgreSQL databases, so that no app has to force the creation
of a dedicated CNPG cluster.

This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme
needs a `vergabe` role + `vergabe_db` database) by establishing the
shared cluster and the governed onboarding contract future S5 apps adopt
by default.

## Context

`railiance-apps` workplan `RAILIANCE-WP-0002` (establish
vergabe-teilnahme on railiance01) found at T01 that the two existing
CNPG clusters in `databases` are app-dedicated:

| Cluster          | PG | Owner app   |
|------------------|----|-------------|
| `gitea-db`       | 18 | gitea       |
| `net-kingdom-pg` | 16 | net-kingdom |

Decision `D-01` (resolved 2026-05-18, bernd) selected option D:
**provision a new shared cluster `apps-pg`** rather than create a third
dedicated cluster (option A) or retrofit an existing app cluster (B/C).

A coordination message was sent from `railiance-apps` to
`railiance-platform` requesting this work; this workplan is the
response.

## Placement in the Railiance Tooling Set

S3 owns CNPG `Cluster` CRs (per ADR-003 and the pattern already
established by `helm/gitea-db-cluster.yaml`). CNPG 1.28 has standalone
`Database` CRs, but PostgreSQL role lifecycle is managed through the
target `Cluster` spec's `.spec.managed.roles` stanza or through a
controlled operator-run SQL workflow. The shared-cluster contract must
therefore make role onboarding explicit; S5 repos should not assume a
standalone CNPG `Role` CR exists.

| Concern | Owner repo | Scope |
|---------|------------|-------|
| `Cluster apps-pg` CR, shared NetworkPolicies, bootstrap secret, baseline docs | `railiance-platform` | this workplan |
| Per-app database request and application DSN wiring | each S5 repo | not here |
| Per-app PostgreSQL role + credential provisioning | coordinated | documented here; platform-administered until OpenBao/dedicated automation exists |
| Per-app runtime Secret in the consumer namespace | each S5 repo | not here |

## Current Evidence

- `kubectl get crd | grep cnpg` confirms CNPG 1.28.1 with the
  `databases.postgresql.cnpg.io` CRD, so databases can be represented
  declaratively.
- CNPG role management is cluster-scoped via `.spec.managed.roles`;
  no standalone CNPG `Role` CR is available for app repos to apply.
- Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
  (`cnpg-system` namespace).
- `databases` namespace has a default-deny-all NetworkPolicy; each
  CNPG cluster therefore needs its own NetworkPolicy triplet
  (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns).
  The pattern is visible in `helm/gitea-db-networkpolicies.yaml`.
- `helm/apps-pg-cluster.yaml`, `helm/apps-pg-networkpolicies.yaml`,
  `helm/apps-pg-secret.sops.yaml.template`, and `docs/apps-pg.md` are
  present in the repo.
- Coordination message id: `768c18f4-8785-4108-a900-fa117eb8778f`
  (state-hub thread).

## Implementation Notes

Completed on 2026-05-19.

- CNPG operator is `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`.
- `clusters.postgresql.cnpg.io` and `databases.postgresql.cnpg.io` CRDs
  are present; `roles.postgresql.cnpg.io` is not present, so role
  onboarding remains platform-administered through managed roles or a
  controlled SQL workflow.
- `local-path` is the default StorageClass. The single K3s node reports
  no memory, disk, or PID pressure; allocatable ephemeral storage is
  about 97.7 GB and memory is about 3.8 GiB. Existing CNPG PVC footprint
  before `apps-pg` was two 10Gi PVCs (`gitea-db-1`,
  `net-kingdom-pg-1`).
- `databases` exists with `default-deny-all`; `cnpg-system` has the
  required `kubernetes.io/metadata.name=cnpg-system` namespace label.
- The live CNPG CRD rejected `spec.postgresql.version`; the deployed
  `apps-pg` manifest therefore pins PostgreSQL 16 with
  `imageName: ghcr.io/cloudnative-pg/postgresql:16`.
- `apps-pg` is deployed in `databases`, reports
  `Cluster in healthy state`, and has primary `apps-pg-1`.
- Services `apps-pg-rw` and `apps-pg-ro` exist. With one instance,
  `apps-pg-ro` is present but has no replica endpoint until HA is added.
- A disposable namespace labeled
  `railiance.io/postgres-client=apps-pg` successfully connected to
  `apps-pg-rw.databases.svc.cluster.local:5432/apps_meta` as
  `apps_admin`; the temporary namespace and copied smoke-test secret
  were deleted immediately after the check.

## Safety Contract

- Do not commit plaintext credentials. The bootstrap secret is created
  once by hand with `kubectl create secret`; a SOPS-encrypted template is
  then committed as `helm/apps-pg-secret.sops.yaml.template`.
- Do not expose `apps_admin` to S5 applications. It is a platform
  bootstrap/smoke-test role, not a runtime credential.
- Do not co-locate non-app data (Gitea, net-kingdom) in `apps-pg`.
  This cluster is for S5 *application* DBs.
- Preserve the default-deny NetworkPolicy posture in `databases`;
  only allow ingress from namespaces that have a registered consumer.
- Do not advertise self-service role creation until the role
  provisioning mechanism is explicit. CNPG `Database` CRs still require
  their owner role to exist.
- Initial sizing is conservative (1 instance, 10Gi) to match the
  existing per-cluster footprint. Resize is a follow-up workplan.
- Cluster name `apps-pg` is locked once published, because renaming
  changes every consumer DSN.

## Target State

- `kubectl get cluster apps-pg -n databases` reports
  `Cluster in healthy state` with the primary `apps-pg-1`.
- `kubectl get svc apps-pg-rw apps-pg-ro -n databases` returns both
  Services.
- NetworkPolicies for `apps-pg` mirror the `gitea-db` triplet.
- The `make` targets `apps-pg-deploy`, `apps-pg-status`,
  `apps-pg-shell`, and `apps-pg-logs` exist and work.
- Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`)
  exist for cluster health probes and to anchor the bootstrap; the
  cluster is otherwise empty of per-app data.
- Documentation explains how an S5 consumer registers a new database,
  including the current CNPG boundary: the `Database` CR is separate,
  but role lifecycle is cluster-scoped and therefore governed by the
  platform contract.
- `railiance-apps` is notified via the hub thread; their
  `RAILIANCE-WP-0002 T04` can proceed using the documented onboarding
  path.

## Tasks

### T01 — Inventory and capacity check

```task
id: RAILIANCE-WP-0003-T01
status: done
priority: high
state_hub_task_id: "37843f2f-0022-4725-ab07-29f6ae4c1749"
```

Confirm the substrate before adding a new cluster.

Checks:

- CNPG operator version (≥ 1.28.x required for the `Database` CR
  consumer pattern).
- Role/database API boundary: `Database` CR is present; role lifecycle
  is `.spec.managed.roles` or controlled SQL, not a separate `Role` CR.
- Node-level disk space available for an additional 10Gi PVC
  (`local-path` storage class is the active default).
- Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any
  current resource pressure.
- That the `databases` namespace already exists and has its
  default-deny NetworkPolicy in place.
- That `cnpg-system` namespace label
  `kubernetes.io/metadata.name=cnpg-system` is set (required by the
  ingress-from-operator NetworkPolicy).

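The role/database API boundary check in the list above can be scripted. The following is a minimal sketch rather than the command used for this workplan: `has_crd` is a hypothetical helper (not part of the repo) that reads the output of `kubectl get crd -o name` on stdin and succeeds only if the named CRD is listed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeeds if the CRD named in $1
# appears in `kubectl get crd -o name` output supplied on stdin.
has_crd() {
  grep -qxF "customresourcedefinition.apiextensions.k8s.io/$1"
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get crd -o name | has_crd databases.postgresql.cnpg.io \
#     && echo "Database CR available"
#   kubectl get crd -o name | has_crd roles.postgresql.cnpg.io \
#     || echo "no Role CR: use managed roles or controlled SQL"
```

Matching the full resource name with `-x` and `-F` avoids a false positive from a CRD whose name merely contains the search string.
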
**Done when:** the implementation notes record CNPG version, available
PVC capacity, the chosen role onboarding mechanism, and any
pre-condition gaps.

---

### T02 — Create bootstrap credential secret

```task
id: RAILIANCE-WP-0003-T02
status: done
priority: high
state_hub_task_id: "b4777198-e42f-4ca1-b562-a595559fdf08"
```

Mint the one-time bootstrap secret that CNPG uses to create the initial
`apps_admin` role.

Steps:

```bash
APPS_PG_PW=$(openssl rand -base64 32)
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"
```

Then commit a SOPS-encrypted template:

- `helm/apps-pg-secret.sops.yaml.template`: encrypted form for
  declarative reapply; do not commit the plaintext password.

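To reduce the risk of committing a plaintext secret, a check before `git add` can look for the top-level `sops:` metadata key that SOPS writes into encrypted YAML files. This is a sketch under that assumption; `is_sops_encrypted` is a hypothetical helper, not an existing repo script, and it is only a heuristic: it confirms the metadata block is present, not that every value is encrypted.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeeds only if the file in $1 exists and carries the
# top-level `sops:` metadata block that SOPS adds when it encrypts YAML.
is_sops_encrypted() {
  [ -f "$1" ] && grep -q '^sops:' "$1"
}

# Usage, e.g. before staging the template:
#   is_sops_encrypted helm/apps-pg-secret.sops.yaml.template \
#     || { echo "refusing to commit: file is not SOPS-encrypted" >&2; exit 1; }
```
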
The bootstrap role is intentionally not a consumer role. Per-app runtime
roles are created later through the onboarding mechanism documented in
T06; until dedicated automation exists, that mechanism is
platform-administered.

**Done when:** the secret exists in the cluster and an encrypted
template is committed.

---

### T03 — Add the CNPG Cluster manifest

```task
id: RAILIANCE-WP-0003-T03
status: done
priority: high
state_hub_task_id: "0840583d-23b2-4b93-9002-7977e6896a12"
```

Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`.
Do not add app-specific roles or databases to the baseline cluster
manifest unless T01 explicitly chooses a platform-owned managed-role
stanza as the interim onboarding path for the first consumer.

Shape:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1  # bump when node RAM > 8GB
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials
```

Note: PG version is 16 (matches vergabe-teilnahme's minimum and the
existing `net-kingdom-pg`). Bumping to 17/18 is a separate decision.

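If T01 selects the managed-role stanza as the interim onboarding path, the addition to the `Cluster` spec might look like the fragment printed below. This is a sketch, not the committed manifest: `render_managed_role` is a hypothetical helper, and the role and Secret names in the usage line are placeholders. The fields shown (`name`, `ensure`, `login`, `passwordSecret.name`) come from CNPG's declarative role management and should be checked against the operator version in use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a `.spec.managed.roles` fragment for one role.
#   $1 = PostgreSQL role name
#   $2 = name of the Secret (in `databases`) that holds the role's password
render_managed_role() {
  cat <<EOF
spec:
  managed:
    roles:
      - name: $1
        ensure: present
        login: true
        passwordSecret:
          name: $2
EOF
}

# Example with placeholder names; merge the output into the Cluster manifest:
#   render_managed_role example_app example-app-credentials
```

Keeping the fragment separate from the baseline manifest makes it easy to review what a single consumer adds to the shared cluster.
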
**Done when:** the manifest is committed and
`kubectl apply --dry-run=server` validates against the cluster.

---

### T04 — Add NetworkPolicies for apps-pg

```task
id: RAILIANCE-WP-0003-T04
status: done
priority: high
state_hub_task_id: "7237f0f2-28e6-4eee-981b-06d0115cb0d1"
```

Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet
but parameterised for the *apps* consumer namespace pattern.

Three policies (all in `databases`, all selecting
`cnpg.io/cluster: apps-pg`):

1. `allow-egress-kube-api-apps-pg`: egress to TCP/6443.
2. `allow-ingress-from-cnpg-operator-apps-pg`: ingress from
   `namespaceSelector kubernetes.io/metadata.name=cnpg-system` on TCP
   ports 5432 / 8000 / 9187.
3. `allow-ingress-from-app-namespaces-apps-pg`: ingress on TCP/5432
   from any namespace carrying the label
   `railiance.io/postgres-client=apps-pg`. (Each consuming app
   namespace adds this label; this avoids hard-coding a namespace list
   in the platform repo.)

The label-based selector is the meaningful difference from gitea-db,
which hard-codes `default`. The shared cluster cannot know its
consumer namespaces in advance, so it expects a positive opt-in label.

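As an illustration of the label-based opt-in, policy 3 could be expressed roughly as follows. This is a sketch built from the description above; the committed `helm/apps-pg-networkpolicies.yaml` may differ in detail, and `render_app_ingress_policy` is a hypothetical helper used here only to print the manifest.

```bash
#!/usr/bin/env bash
# Sketch of policy 3: prints the manifest so it can be reviewed or piped to
# `kubectl apply -f -`. Names and labels follow the list above.
render_app_ingress_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: apps-pg
      ports:
        - protocol: TCP
          port: 5432
EOF
}

# Consumer opt-in (the namespace name is a placeholder):
#   kubectl label namespace example-app railiance.io/postgres-client=apps-pg
```

Because the selector matches a namespace label rather than a namespace name, onboarding a new consumer needs no change to the platform repo's policy file.
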
**Done when:** the policies are committed and applied; consumer namespaces
can connect after applying the `railiance.io/postgres-client=apps-pg`
label.

---

### T05 — Makefile targets, deploy, verify

```task
id: RAILIANCE-WP-0003-T05
status: done
priority: high
state_hub_task_id: "dc346e73-eadf-4eaa-8296-358df262f648"
```

Add targets that mirror the `db-*` (gitea-db) family:

```make
apps-pg-deploy: ## Apply shared apps-pg CNPG Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml

apps-pg-status: ## Show apps-pg CNPG cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
		$(KUBECTL) get cluster apps-pg -n databases -o wide

apps-pg-shell: ## Open psql shell on apps-pg primary
	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta

apps-pg-logs: ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50
```

Then deploy and wait for the cluster to converge:

```bash
make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m
```

Smoke checks:

- `cnpg status` reports `Cluster in healthy state`.
- Services `apps-pg-rw` and `apps-pg-ro` exist.
- From a disposable pod in a temporary namespace labeled
  `railiance.io/postgres-client=apps-pg`, a platform-operated test
  connection to `apps-pg-rw.databases:5432/apps_meta` succeeds. Delete
  the temporary namespace and any copied test secret immediately after
  the check; do not place `apps_admin` in an application namespace.

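For the connection smoke check, the connection parameters can be assembled without embedding the password. The sketch below uses a hypothetical helper, `apps_pg_conninfo`, that prints a libpq keyword/value connection string for the read-write Service; the fully qualified Service name is the one recorded in the implementation notes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a libpq keyword/value connection string for the
# apps-pg read-write Service. $1 = database, $2 = role.
# The password is deliberately left out; supply it via PGPASSWORD or a
# mounted Secret so it never appears in the process arguments.
apps_pg_conninfo() {
  printf 'host=apps-pg-rw.databases.svc.cluster.local port=5432 dbname=%s user=%s\n' "$1" "$2"
}

# Usage from a pod in a labeled namespace (requires psql and network access):
#   PGPASSWORD=... psql "$(apps_pg_conninfo apps_meta apps_admin)" -c 'select version()'
```

The keyword/value form is used instead of a `postgresql://` URI because a base64-generated password can contain `+`, `/`, and `=`, which would need percent-encoding in a URI.
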
**Done when:** the smoke checks pass.

---

### T06 — Reply to railiance-apps, document the consumer contract

```task
id: RAILIANCE-WP-0003-T06
status: done
priority: medium
state_hub_task_id: "8b78934d-0a3c-413c-a66f-295092282547"
```

Notify the requester and capture the pattern.

Steps:

- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` through the
  State Hub `/messages/` REST API with this workplan's id and the
  cluster's connection details. Do not send bootstrap credentials.
- Add `docs/apps-pg.md` with:
  - Cluster identity and connection endpoints.
  - The per-app onboarding recipe: (a) request/approve a per-app role,
    (b) provision the backing role and credential through the chosen
    platform mechanism, (c) create the CNPG `Database` CR in the
    `databases` namespace with `spec.cluster.name: apps-pg` and
    `spec.owner` set to the approved role, (d) label the consumer
    namespace `railiance.io/postgres-client=apps-pg`, (e) publish or
    mirror the runtime Secret into the consumer namespace, and (f) wire
    the DSN into the application Helm values.
  - The CNPG 1.28 boundary: `Database` is standalone; role management is
    not a standalone `Role` CR and must follow the platform contract.
  - Backup posture (when the cluster is added to the existing platform
    backup process) and the resize / replicate roadmap.

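Step (c) of the onboarding recipe can be illustrated with a small generator. This is a sketch under stated assumptions: `render_database_cr` is a hypothetical helper, and the names in the usage line are the ones requested for vergabe-teilnahme in `RAILIANCE-WP-0002 T04`, shown for illustration only. The owner role must already exist (step b) before the `Database` CR can reconcile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a CNPG Database manifest for onboarding step (c).
#   $1 = CR name (Kubernetes object name)
#   $2 = PostgreSQL database name
#   $3 = owner role (must already exist in apps-pg)
render_database_cr() {
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: $1
  namespace: databases
spec:
  name: $2
  owner: $3
  cluster:
    name: apps-pg
EOF
}

# Illustrative usage (requires kubectl access):
#   render_database_cr vergabe-db vergabe_db vergabe | kubectl apply -f -
```

The CR name and the database name are separate arguments because Kubernetes object names cannot contain underscores, while PostgreSQL database names commonly do.
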
**Done when:** the message is replied to and `docs/apps-pg.md` is
committed.

## Completion Criteria

This workplan is complete when:

1. `apps-pg` reports healthy in the `databases` namespace.
2. NetworkPolicies enforce the default-deny posture with label-based
   consumer opt-in.
3. Makefile targets work end-to-end.
4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
   acknowledged via the hub thread.
5. `docs/apps-pg.md` explains the consumer onboarding contract,
   including the CNPG role/database boundary.

## Notes

- This intentionally does **not** hard-code the `vergabe` role or
  `vergabe_db` into the shared cluster baseline. The consumer onboarding
  doc must describe the follow-up request/manifest needed for
  `railiance-apps` so the platform layer stays generic until an app
  explicitly registers.
- Backup inclusion of `apps-pg` is a follow-up. The existing
  `make backup` target only covers the legacy PostgreSQL-HA setup;
  CNPG backup configuration is its own workplan.
- A second replica (HA) and a connection pooler (PgBouncer / CNPG
  `Pooler`) are deferred. The cluster spec leaves room for both; enable
  them when node capacity allows.