---
id: RAILIANCE-WP-0003
type: workplan
title: "Provision shared CNPG cluster apps-pg"
domain: financials
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 3
created: "2026-05-19"
updated: "2026-05-19"
state_hub_workstream_id: "665b3b9b-608a-4be4-84b6-dcb8261ff57b"
---

# RAILIANCE-WP-0003 - Provision shared CNPG cluster apps-pg

## Goal

Provision a new shared CloudNativePG cluster `apps-pg` in the `databases`
namespace that S5 application workloads can use to host their own PostgreSQL
databases — without each app forcing the creation of a dedicated CNPG cluster.

This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme needs a
`vergabe` role + `vergabe_db` database) by establishing the shared cluster and
the governed onboarding contract future S5 apps adopt by default.

## Context

`railiance-apps` workplan `RAILIANCE-WP-0002` (establish vergabe-teilnahme on
railiance01) found at T01 that the two existing CNPG clusters in `databases`
are app-dedicated:

| Cluster          | PG | Owner app   |
|------------------|----|-------------|
| `gitea-db`       | 18 | gitea       |
| `net-kingdom-pg` | 16 | net-kingdom |

Decision `D-01` (resolved 2026-05-18, bernd) selected option D: **provision a
new shared cluster `apps-pg`** rather than create a third dedicated cluster
(option A) or retrofit an existing app cluster (B/C).

A coordination message was sent from `railiance-apps` to `railiance-platform`
requesting this work; this workplan is the response.

## Placement in the Railiance Tooling Set

S3 owns CNPG `Cluster` CRs (per ADR-003 and the pattern already established by
`helm/gitea-db-cluster.yaml`). CNPG 1.28 has standalone `Database` CRs, but
PostgreSQL role lifecycle is managed through the target `Cluster` spec's
`.spec.managed.roles` stanza or through a controlled operator-run SQL workflow.
The shared-cluster contract must therefore make role onboarding explicit; S5
repos should not assume a standalone CNPG `Role` CR exists.

| Concern | Owner repo | Scope |
|---------|------------|-------|
| `Cluster apps-pg` CR, shared NetworkPolicies, bootstrap secret, baseline docs | `railiance-platform` | this workplan |
| Per-app database request and application DSN wiring | each S5 repo | not here |
| Per-app PostgreSQL role + credential provisioning | coordinated | documented here; platform-administered until OpenBao/dedicated automation exists |
| Per-app runtime Secret in the consumer namespace | each S5 repo | not here |

## Current Evidence

- `kubectl get crd | grep cnpg` confirms CNPG 1.28.1 with the
  `databases.postgresql.cnpg.io` CRD — databases can be represented
  declaratively.
- CNPG role management is cluster-scoped via `.spec.managed.roles`; no
  standalone CNPG `Role` CR is available for app repos to apply.
- Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
  (`cnpg-system` namespace).
- `databases` namespace has a default-deny-all NetworkPolicy; each CNPG cluster
  therefore needs its own NetworkPolicy triplet (egress-to-kube-api,
  ingress-from-cnpg-operator, ingress-from-app-ns) — pattern visible in
  `helm/gitea-db-networkpolicies.yaml`.
- `helm/apps-pg-cluster.yaml`, `helm/apps-pg-networkpolicies.yaml`,
  `helm/apps-pg-secret.sops.yaml.template`, and `docs/apps-pg.md` are present
  in the repo.
- Coordination message id: `768c18f4-8785-4108-a900-fa117eb8778f` (state-hub
  thread).

## Implementation Notes

Completed on 2026-05-19.

- CNPG operator is `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`.
- `clusters.postgresql.cnpg.io` and `databases.postgresql.cnpg.io` CRDs are
  present; `roles.postgresql.cnpg.io` is not present, so role onboarding
  remains platform-administered through managed roles or a controlled SQL
  workflow.
- `local-path` is the default StorageClass. The single K3s node reports no
  memory, disk, or PID pressure; allocatable ephemeral storage is about
  97.7 GB and memory is about 3.8 GiB. Existing CNPG PVC footprint before
  `apps-pg` was two 10Gi PVCs (`gitea-db-1`, `net-kingdom-pg-1`).
- `databases` exists with `default-deny-all`; `cnpg-system` has the required
  `kubernetes.io/metadata.name=cnpg-system` namespace label.
- The live CNPG CRD rejected `spec.postgresql.version`; the deployed `apps-pg`
  manifest therefore pins PostgreSQL 16 with
  `imageName: ghcr.io/cloudnative-pg/postgresql:16`.
- `apps-pg` is deployed in `databases`, reports `Cluster in healthy state`, and
  has primary `apps-pg-1`.
- Services `apps-pg-rw` and `apps-pg-ro` exist. With one instance,
  `apps-pg-ro` is present but has no replica endpoint until HA is added.
- A disposable namespace labeled `railiance.io/postgres-client=apps-pg`
  successfully connected to
  `apps-pg-rw.databases.svc.cluster.local:5432/apps_meta` as `apps_admin`; the
  temporary namespace and copied smoke-test secret were deleted immediately
  after the check.

## Safety Contract

- Do not commit plaintext credentials. The bootstrap secret is created once,
  manually, with `kubectl create secret`; a SOPS-encrypted template is then
  committed as `helm/apps-pg-secret.sops.yaml.template`.
- Do not expose `apps_admin` to S5 applications. It is a platform
  bootstrap/smoke-test role, not a runtime credential.
- Do not co-locate non-app data (Gitea, net-kingdom) in `apps-pg`. This
  cluster is for S5 *application* DBs.
- Preserve the default-deny NetworkPolicy posture in `databases`; only allow
  ingress from namespaces that have a registered consumer.
- Do not advertise self-service role creation until the role provisioning
  mechanism is explicit. CNPG `Database` CRs still require their owner role to
  exist.
- Initial sizing is conservative (1 instance, 10Gi) to match the existing
  per-cluster footprint. Resize is a follow-up workplan.
- Cluster name `apps-pg` is locked once published — renaming changes every
  consumer DSN.

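
For orientation, the managed-role path named in the implementation notes can be pictured as a fragment of the `apps-pg` Cluster spec. This is only a sketch under stated assumptions: the role name `vergabe` is borrowed from the goal statement, the Secret name `vergabe-credentials` is hypothetical, and nothing like this belongs in the baseline manifest until a consumer has been formally onboarded.

```yaml
# Hypothetical fragment to be merged into the existing apps-pg Cluster spec.
# It is not a complete manifest and is not part of the committed baseline.
spec:
  managed:
    roles:
      - name: vergabe                 # per-app runtime role (example consumer)
        ensure: present
        login: true
        superuser: false
        passwordSecret:
          name: vergabe-credentials   # assumed Secret in `databases` with username/password keys
```

Because the stanza lives in the `Cluster` CR, adding a role this way is a platform-repo change, which is consistent with the contract that S5 repos do not create roles themselves.
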
## Target State

- `kubectl get cluster apps-pg -n databases` reports
  `Cluster in healthy state` with the primary `apps-pg-1`.
- `kubectl get svc apps-pg-rw apps-pg-ro -n databases` returns both services.
- NetworkPolicies for `apps-pg` mirror the `gitea-db` triplet.
- The `make` targets `apps-pg-deploy`, `apps-pg-status`, `apps-pg-shell`, and
  `apps-pg-logs` exist and work.
- Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`) exist
  for cluster health probes and to anchor the bootstrap; the cluster is
  otherwise empty of per-app data.
- Documentation explains how an S5 consumer registers a new database,
  including the current CNPG boundary: the `Database` CR is separate, but role
  lifecycle is cluster-scoped and therefore governed by the platform contract.
- `railiance-apps` is notified via the hub thread; their
  `RAILIANCE-WP-0002 T04` can proceed using the documented onboarding path.

## Tasks

### T01 — Inventory and capacity check

```task
id: RAILIANCE-WP-0003-T01
status: done
priority: high
state_hub_task_id: "37843f2f-0022-4725-ab07-29f6ae4c1749"
```

Confirm the substrate before adding a new cluster.

Checks:

- CNPG operator version (≥ 1.28.x required for the `Database` CR consumer
  pattern).
- Role/database API boundary: `Database` CR is present; role lifecycle is
  `.spec.managed.roles` or controlled SQL, not a separate `Role` CR.
- Node-level disk space available for an additional 10Gi PVC (`local-path`
  storage class is the active default).
- Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any current
  resource pressure.
- The `databases` namespace already exists and has its default-deny
  NetworkPolicy in place.
- The `cnpg-system` namespace label
  `kubernetes.io/metadata.name=cnpg-system` is set (required by the
  ingress-from-operator NetworkPolicy).

**Done when:** the implementation notes record CNPG version, available PVC
capacity, the chosen role onboarding mechanism, and any pre-condition gaps.


---

### T02 — Create bootstrap credential secret

```task
id: RAILIANCE-WP-0003-T02
status: done
priority: high
state_hub_task_id: "b4777198-e42f-4ca1-b562-a595559fdf08"
```

Mint the one-time bootstrap secret that CNPG uses to create the initial
`apps_admin` role.

Steps:

```bash
APPS_PG_PW=$(openssl rand -base64 32)
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"
```

Then commit a SOPS-encrypted template:

- `helm/apps-pg-secret.sops.yaml.template` — encrypted form for declarative
  reapply; do not commit the plaintext password.

The bootstrap role is intentionally not a consumer role. Per-app runtime roles
are created later through the onboarding mechanism documented in T06; until
dedicated automation exists, that mechanism is platform-administered.

**Done when:** the secret exists in the cluster and an encrypted template is
committed.

---

### T03 — Add the CNPG Cluster manifest

```task
id: RAILIANCE-WP-0003-T03
status: done
priority: high
state_hub_task_id: "0840583d-23b2-4b93-9002-7977e6896a12"
```

Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`. Do not
add app-specific roles or databases to the baseline cluster manifest unless T01
explicitly chooses a platform-owned managed-role stanza as the interim
onboarding path for the first consumer.

Shape:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1          # bump when node RAM > 8GB
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials
```

Note: PG version is 16 (matches vergabe-teilnahme's minimum and the existing
`net-kingdom-pg`). Bumping to 17/18 is a separate decision.

**Done when:** the manifest is committed and
`kubectl apply --dry-run=server` validates it against the cluster.

---

### T04 — Add NetworkPolicies for apps-pg

```task
id: RAILIANCE-WP-0003-T04
status: done
priority: high
state_hub_task_id: "7237f0f2-28e6-4eee-981b-06d0115cb0d1"
```

Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet but
parameterised for the *apps* consumer namespace pattern.

Three policies (all in `databases`, all selecting `cnpg.io/cluster: apps-pg`):

1. `allow-egress-kube-api-apps-pg` — egress to TCP/6443.
2. `allow-ingress-from-cnpg-operator-apps-pg` — ingress from
   `namespaceSelector kubernetes.io/metadata.name=cnpg-system` on TCP ports
   5432 / 8000 / 9187.
3. `allow-ingress-from-app-namespaces-apps-pg` — ingress on TCP/5432 from any
   namespace carrying the label `railiance.io/postgres-client=apps-pg`. (Each
   consuming app namespace adds this label; this avoids hard-coding a
   namespace list in the platform repo.)

The label-based selector is the meaningful difference from gitea-db, which
hard-codes `default`. The shared cluster cannot know its consumer namespaces in
advance, so it expects a positive opt-in label.

**Done when:** the policies are committed and applied; consumer namespaces can
connect after applying the `railiance.io/postgres-client=apps-pg` label.

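
The third policy is the one that carries the opt-in contract, so a minimal sketch may help. It assumes the standard `networking.k8s.io/v1` NetworkPolicy schema and uses only the names given above; the committed `helm/apps-pg-networkpolicies.yaml` may differ in detail.

```yaml
# Sketch: label-based consumer ingress for the shared apps-pg cluster.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-apps-pg
  namespace: databases
spec:
  # Applies only to the pods of the apps-pg CNPG cluster.
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Any namespace that has opted in via the consumer label.
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: apps-pg
      ports:
        - protocol: TCP
          port: 5432
```

A consumer namespace opts in by carrying the label, for example with `kubectl label namespace <consumer-ns> railiance.io/postgres-client=apps-pg`; removing the label withdraws access without any change in the platform repo.
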

---

### T05 — Makefile targets, deploy, verify

```task
id: RAILIANCE-WP-0003-T05
status: done
priority: high
state_hub_task_id: "dc346e73-eadf-4eaa-8296-358df262f648"
```

Add targets that mirror the `db-*` (gitea-db) family:

```make
apps-pg-deploy: ## Apply shared apps-pg CNPG Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml

apps-pg-status: ## Show apps-pg CNPG cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
	  $(KUBECTL) get cluster apps-pg -n databases -o wide

apps-pg-shell: ## Open psql shell on apps-pg primary
	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta

apps-pg-logs: ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50
```

Then deploy and wait for the cluster to converge:

```bash
make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m
```

Smoke checks:

- `cnpg status` reports `Cluster in healthy state`.
- Services `apps-pg-rw` and `apps-pg-ro` exist.
- From a disposable pod in a temporary namespace labeled
  `railiance.io/postgres-client=apps-pg`, a platform-operated test connection
  to `apps-pg-rw.databases:5432/apps_meta` succeeds. Delete the temporary
  namespace and any copied test secret immediately after the check; do not
  place `apps_admin` in an application namespace.

**Done when:** the smoke checks pass.

---

### T06 — Reply to railiance-apps, document the consumer contract

```task
id: RAILIANCE-WP-0003-T06
status: done
priority: medium
state_hub_task_id: "8b78934d-0a3c-413c-a66f-295092282547"
```

Notify the requester and capture the pattern.

Steps:

- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` through the State Hub
  `/messages/` REST API with this workplan's id and the cluster's connection
  details. Do not send bootstrap credentials.
- Add `docs/apps-pg.md` with:
  - Cluster identity and connection endpoints.
  - The per-app onboarding recipe: (a) request/approve a per-app role,
    (b) provision the backing role and credential through the chosen platform
    mechanism, (c) create the CNPG `Database` CR in the `databases` namespace
    with `spec.cluster.name: apps-pg` and `spec.owner` set to the approved
    role, (d) label the consumer namespace
    `railiance.io/postgres-client=apps-pg`, (e) publish or mirror the runtime
    Secret into the consumer namespace, and (f) wire the DSN into the
    application Helm values.
  - The CNPG 1.28 boundary: `Database` is standalone; role management is not a
    standalone `Role` CR and must follow the platform contract.
  - Backup posture (when the cluster is added to the existing platform backup
    process) and the resize / replicate roadmap.

**Done when:** the message is replied to and `docs/apps-pg.md` is committed.

## Completion Criteria

This workplan is complete when:

1. `apps-pg` reports healthy in the `databases` namespace.
2. NetworkPolicies enforce the default-deny posture with label-based consumer
   opt-in.
3. Makefile targets work end-to-end.
4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
   acknowledged via the hub thread.
5. `docs/apps-pg.md` explains the consumer onboarding contract, including the
   CNPG role/database boundary.

## Notes

- This intentionally does **not** hard-code the `vergabe` role or `vergabe_db`
  into the shared cluster baseline. The consumer onboarding doc must describe
  the follow-up request/manifest needed for `railiance-apps` so the platform
  layer stays generic until an app explicitly registers.
- Backup inclusion of `apps-pg` is a follow-up. The existing `make backup`
  target only covers the legacy PostgreSQL-HA setup; CNPG backup configuration
  is its own workplan.
- A second replica (HA) and a connection pooler (PgBouncer / CNPG `Pooler`)
  are deferred. The cluster spec leaves room for both — re-enable when node
  capacity allows.

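
To illustrate the follow-up manifest mentioned in the first note, and step (c) of the T06 onboarding recipe, a consumer's `Database` CR might look like the sketch below. The database and role names come from the goal statement; the `metadata.name` value `vergabe-db` is a hypothetical choice, and the manifest assumes the `vergabe` role has already been provisioned through the platform mechanism.

```yaml
# Sketch: per-app database registered against the shared cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: vergabe-db        # hypothetical Kubernetes object name
  namespace: databases    # same namespace as the apps-pg Cluster
spec:
  cluster:
    name: apps-pg         # the shared cluster this database is created in
  name: vergabe_db        # PostgreSQL database name
  owner: vergabe          # must already exist; the Database CR does not create roles
```

Keeping this manifest out of the platform baseline means the shared cluster only gains a database once an app explicitly registers, which is the behaviour the first note asks for.
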