railiance-platform/workplans/RAILIANCE-WP-0003-apps-pg-shared-cluster.md
tegwick 017934d479 Add vergabe role + vergabe_db database for RAILIANCE-WP-0002 T04
First consumer of the shared apps-pg cluster: managed role vergabe in apps-pg-cluster.yaml plus Database CR vergabe-db in new helm/apps-pg-databases.yaml. .gitignore whitelists helm/*-databases.yaml. Workplan implementation notes from codex folded in. Live: Database CR applied=true, psql from vergabe-teilnahme ns returns PostgreSQL 16.13.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 15:47:06 +02:00

id: RAILIANCE-WP-0003
type: workplan
title: Provision shared CNPG cluster apps-pg
domain: railiance
repo: railiance-platform
status: finished
owner: codex
topic_slug: railiance
planning_priority: high
planning_order: 3
created: 2026-05-19
updated: 2026-05-19
state_hub_workstream_id: 665b3b9b-608a-4be4-84b6-dcb8261ff57b

RAILIANCE-WP-0003 - Provision shared CNPG cluster apps-pg

Goal

Provision a new shared CloudNativePG cluster apps-pg in the databases namespace that S5 application workloads can use to host their own PostgreSQL databases — without each app forcing the creation of a dedicated CNPG cluster.

This unblocks railiance-apps RAILIANCE-WP-0002 T04 (vergabe-teilnahme needs a vergabe role + vergabe_db database) by establishing the shared cluster and the governed onboarding contract future S5 apps adopt by default.

Context

railiance-apps workplan RAILIANCE-WP-0002 (establish vergabe-teilnahme on railiance01) found at T01 that the two existing CNPG clusters in databases are app-dedicated:

| Cluster | PG | Owner app |
|---|---|---|
| gitea-db | 18 | gitea |
| net-kingdom-pg | 16 | net-kingdom |

Decision D-01 (resolved 2026-05-18, bernd) selected option D: provision a new shared cluster apps-pg rather than create a third dedicated cluster (option A) or retrofit an existing app cluster (B/C).

A coordination message was sent from railiance-apps to railiance-platform requesting this work; this workplan is the response.

Placement in the Railiance Tooling Set

S3 owns CNPG Cluster CRs (per ADR-003 and the pattern already established by helm/gitea-db-cluster.yaml). CNPG 1.28 has standalone Database CRs, but PostgreSQL role lifecycle is managed through the target Cluster spec's .spec.managed.roles stanza or through a controlled operator-run SQL workflow. The shared-cluster contract must therefore make role onboarding explicit; S5 repos should not assume a standalone CNPG Role CR exists.
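
As a minimal sketch of the managed-role path described above, the stanza below shows how a per-app role might be declared inside the apps-pg Cluster spec. The role name vergabe is taken from the first consumer request; the Secret name vergabe-credentials is a hypothetical placeholder, assumed to live in the databases namespace and to carry username and password keys. This is an illustration of the .spec.managed.roles shape, not the committed manifest.

```yaml
# Fragment of the apps-pg Cluster spec; not a complete manifest.
spec:
  managed:
    roles:
      - name: vergabe                  # per-app role requested by the consumer
        ensure: present                # reconcile the role into existence
        login: true                    # the application connects with this role
        superuser: false
        createdb: false
        passwordSecret:
          name: vergabe-credentials    # hypothetical Secret name (assumption)
```

Because the stanza lives in the Cluster CR, every role change is a change to a platform-owned manifest, which is why role onboarding stays with railiance-platform.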

| Concern | Owner repo | Scope |
|---|---|---|
| Cluster apps-pg CR, shared NetworkPolicies, bootstrap secret, baseline docs | railiance-platform | this workplan |
| Per-app database request and application DSN wiring | each S5 repo | not here |
| Per-app PostgreSQL role + credential provisioning | coordinated | documented here; platform-administered until OpenBao/dedicated automation exists |
| Per-app runtime Secret in the consumer namespace | each S5 repo | not here |

Current Evidence

  • kubectl get crd | grep cnpg confirms that the databases.postgresql.cnpg.io CRD is installed (operator version 1.28.1, per the operator image tag), so databases can be represented declaratively.
  • CNPG role management is cluster-scoped via .spec.managed.roles; no standalone CNPG Role CR is available for app repos to apply.
  • Operator image: ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1 (cnpg-system namespace).
  • databases namespace has a default-deny-all NetworkPolicy; each CNPG cluster therefore needs its own NetworkPolicy triplet (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns) — pattern visible in helm/gitea-db-networkpolicies.yaml.
  • helm/apps-pg-cluster.yaml, helm/apps-pg-networkpolicies.yaml, helm/apps-pg-secret.sops.yaml.template, and docs/apps-pg.md are present in the repo.
  • Coordination message id: 768c18f4-8785-4108-a900-fa117eb8778f (state-hub thread).

Implementation Notes

Completed on 2026-05-19.

  • CNPG operator is ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1.
  • clusters.postgresql.cnpg.io and databases.postgresql.cnpg.io CRDs are present; roles.postgresql.cnpg.io is not present, so role onboarding remains platform-administered through managed roles or a controlled SQL workflow.
  • local-path is the default StorageClass. The single K3s node reports no memory, disk, or PID pressure; allocatable ephemeral storage is about 97.7 GB and memory is about 3.8 GiB. Existing CNPG PVC footprint before apps-pg was two 10Gi PVCs (gitea-db-1, net-kingdom-pg-1).
  • databases exists with default-deny-all; cnpg-system has the required kubernetes.io/metadata.name=cnpg-system namespace label.
  • The live CNPG CRD rejected spec.postgresql.version; the deployed apps-pg manifest therefore pins PostgreSQL 16 with imageName: ghcr.io/cloudnative-pg/postgresql:16.
  • apps-pg is deployed in databases, reports Cluster in healthy state, and has primary apps-pg-1.
  • Services apps-pg-rw and apps-pg-ro exist. With one instance, apps-pg-ro is present but has no replica endpoint until HA is added.
  • A disposable namespace labeled railiance.io/postgres-client=apps-pg successfully connected to apps-pg-rw.databases.svc.cluster.local:5432/apps_meta as apps_admin; the temporary namespace and copied smoke-test secret were deleted immediately after the check.

Safety Contract

  • Do not commit plaintext credentials. The bootstrap secret is created once, manually, with kubectl create secret; only a SOPS-encrypted template is committed, as helm/apps-pg-secret.sops.yaml.template.
  • Do not expose apps_admin to S5 applications. It is a platform bootstrap/smoke-test role, not a runtime credential.
  • Do not collocate non-app data (Gitea, net-kingdom) into apps-pg. This cluster is for S5 application DBs.
  • Preserve the default-deny NetworkPolicy posture in databases; only allow ingress from namespaces that have a registered consumer.
  • Do not advertise self-service role creation until the role provisioning mechanism is explicit. CNPG Database CRs still require their owner role to exist.
  • Initial sizing is conservative (1 instance, 10Gi) to match the existing per-cluster footprint. Resize is a follow-up workplan.
  • Cluster name apps-pg is locked once published — renaming changes every consumer DSN.

Target State

  • kubectl get cluster apps-pg -n databases reports Cluster in healthy state with the primary apps-pg-1.
  • kubectl get svc apps-pg-rw apps-pg-ro -n databases returns both Services.
  • NetworkPolicies for apps-pg mirror the gitea-db triplet.
  • make apps-pg-deploy / apps-pg-status / apps-pg-shell / apps-pg-logs targets exist and work.
  • Bootstrap admin role (apps_admin) and meta database (apps_meta) exist for cluster health probes and to anchor the bootstrap; the cluster is otherwise empty of per-app data.
  • Documentation explains how an S5 consumer registers a new database, including the current CNPG boundary: the Database CR is separate, but role lifecycle is cluster-scoped and therefore governed by the platform contract.
  • railiance-apps is notified via the hub thread; their RAILIANCE-WP-0002 T04 can proceed using the documented onboarding path.

Tasks

T01 — Inventory and capacity check

id: RAILIANCE-WP-0003-T01
status: done
priority: high
state_hub_task_id: "37843f2f-0022-4725-ab07-29f6ae4c1749"

Confirm the substrate before adding a new cluster.

Checks:

  • CNPG operator version (≥ 1.28.x required for the Database CR consumer pattern).
  • Role/database API boundary: Database CR is present; role lifecycle is .spec.managed.roles or controlled SQL, not a separate Role CR.
  • Node-level disk space available for an additional 10Gi PVC (local-path storage class is the active default).
  • Existing cluster footprint (gitea-db, net-kingdom-pg) and any current resource pressure.
  • That the databases namespace already exists and has its default-deny NetworkPolicy in place.
  • That cnpg-system namespace label kubernetes.io/metadata.name=cnpg-system is set (required by the ingress-from-operator NetworkPolicy).

Done when: the implementation notes record CNPG version, available PVC capacity, the chosen role onboarding mechanism, and any pre-condition gaps.


T02 — Create bootstrap credential secret

id: RAILIANCE-WP-0003-T02
status: done
priority: high
state_hub_task_id: "b4777198-e42f-4ca1-b562-a595559fdf08"

Mint the one-time bootstrap secret that CNPG uses to create the initial apps_admin role.

Steps:

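# Generate a random 32-byte password, base64-encoded, held only in a shell variable.
APPS_PG_PW=$(openssl rand -base64 32)
# Create the bootstrap secret that the CNPG initdb step reads (keys: username, password).
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"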

Then commit a SOPS-encrypted template:

  • helm/apps-pg-secret.sops.yaml.template — encrypted form for declarative reapply; do not commit the plaintext password.
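
For orientation, the plaintext shape that such a template would encode might look like the sketch below. It mirrors the kubectl command above (an Opaque secret with username and password keys); the placeholder value is an assumption for illustration, and this form must only ever exist locally before SOPS encryption, never in the repository.

```yaml
# Pre-encryption shape only; encrypt with SOPS before committing.
apiVersion: v1
kind: Secret
metadata:
  name: apps-pg-credentials
  namespace: databases
type: Opaque                      # matches what `kubectl create secret generic` produces
stringData:
  username: apps_admin
  password: REPLACE_WITH_GENERATED_PASSWORD   # placeholder (assumption), never a real value
```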

The bootstrap role is intentionally not a consumer role. Per-app runtime roles are created later through the onboarding mechanism documented in T06; until dedicated automation exists, that mechanism is platform-administered.

Done when: the secret exists in the cluster and an encrypted template is committed.


T03 — Add the CNPG Cluster manifest

id: RAILIANCE-WP-0003-T03
status: done
priority: high
state_hub_task_id: "0840583d-23b2-4b93-9002-7977e6896a12"

Add helm/apps-pg-cluster.yaml modeled on helm/gitea-db-cluster.yaml. Do not add app-specific roles or databases to the baseline cluster manifest unless T01 explicitly chooses a platform-owned managed-role stanza as the interim onboarding path for the first consumer.

Shape:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1            # bump when node RAM > 8GB
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials

Note: PG version is 16 (matches vergabe-teilnahme's minimum and the existing net-kingdom-pg). Bumping to 17/18 is a separate decision.

Done when: the manifest is committed and kubectl apply --dry-run=server -f helm/apps-pg-cluster.yaml validates against the live cluster.


T04 — Add NetworkPolicies for apps-pg

id: RAILIANCE-WP-0003-T04
status: done
priority: high
state_hub_task_id: "7237f0f2-28e6-4eee-981b-06d0115cb0d1"

Add helm/apps-pg-networkpolicies.yaml modeled on the gitea-db triplet but parameterised for the apps consumer namespace pattern.

Three policies (all in databases, all selecting cnpg.io/cluster: apps-pg):

  1. allow-egress-kube-api-apps-pg — egress to TCP/6443.
  2. allow-ingress-from-cnpg-operator-apps-pg — ingress from namespaceSelector kubernetes.io/metadata.name=cnpg-system on TCP ports 5432 / 8000 / 9187.
  3. allow-ingress-from-app-namespaces-apps-pg — ingress on TCP/5432 from any namespace carrying the label railiance.io/postgres-client=apps-pg. (Each consuming app namespace adds this label; this avoids hard-coding a namespace list in the platform repo.)
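
The first two policies in the list above could be sketched as follows. The names, selectors, and ports come from the list; leaving the egress rule without a destination selector is an assumption here, since the exact shape used in helm/gitea-db-networkpolicies.yaml is not reproduced in this workplan.

```yaml
# Policy 1: let the apps-pg instance pods reach the Kubernetes API.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-kube-api-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Egress
  egress:
    - ports:                      # assumption: restricted by port only, no `to` selector
        - protocol: TCP
          port: 6443
---
# Policy 2: let the CNPG operator reach the instance pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-cnpg-operator-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cnpg-system
      ports:
        - protocol: TCP
          port: 5432
        - protocol: TCP
          port: 8000
        - protocol: TCP
          port: 9187
```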

The label-based selector is the meaningful difference from gitea-db, which hard-codes the default namespace. The shared cluster cannot know its consumer namespaces in advance, so it expects a positive opt-in label.
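
A minimal sketch of the label-based consumer policy, using the policy name, label, and port stated in this task. Any namespace carrying railiance.io/postgres-client=apps-pg is admitted on TCP/5432; nothing else in the manifest names a specific consumer.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: apps-pg   # opt-in label set by each consumer namespace
      ports:
        - protocol: TCP
          port: 5432
```

Onboarding a new consumer then requires no change to this file: labeling the consumer namespace is enough for the selector to match.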

Done when: the policies are committed and applied; consumer namespaces can connect after applying the railiance.io/postgres-client=apps-pg label.


T05 — Makefile targets, deploy, verify

id: RAILIANCE-WP-0003-T05
status: done
priority: high
state_hub_task_id: "dc346e73-eadf-4eaa-8296-358df262f648"

Add targets that mirror the db-* (gitea-db) family:

apps-pg-deploy:  ## Apply shared apps-pg CNPG Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml

apps-pg-status:  ## Show apps-pg CNPG cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
	  $(KUBECTL) get cluster apps-pg -n databases -o wide

apps-pg-shell:   ## Open psql shell on apps-pg primary
	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta

apps-pg-logs:    ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50

Then deploy and wait for the cluster to converge:

make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m

Smoke checks:

  • cnpg status reports Cluster in healthy state.
  • Services apps-pg-rw and apps-pg-ro exist.
  • From a disposable pod in a temporary namespace labeled railiance.io/postgres-client=apps-pg, a platform-operated test connection to apps-pg-rw.databases:5432/apps_meta succeeds. Delete the temporary namespace and any copied test secret immediately after the check; do not place apps_admin in an application namespace.
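
The disposable-namespace check could be sketched as below. The namespace name apps-pg-smoke, the pod name, and the copied Secret apps-pg-smoke-credentials are hypothetical names chosen for illustration; the image is the same PostgreSQL 16 image the cluster uses, assumed here only because it ships psql.

```yaml
# Temporary namespace carrying the opt-in label (name is hypothetical).
apiVersion: v1
kind: Namespace
metadata:
  name: apps-pg-smoke
  labels:
    railiance.io/postgres-client: apps-pg
---
# One-shot pod that runs a single query against the read-write Service.
apiVersion: v1
kind: Pod
metadata:
  name: apps-pg-smoke
  namespace: apps-pg-smoke
spec:
  restartPolicy: Never
  containers:
    - name: psql
      image: ghcr.io/cloudnative-pg/postgresql:16
      command: ["psql"]
      args:
        - "-h"
        - "apps-pg-rw.databases.svc.cluster.local"
        - "-p"
        - "5432"
        - "-U"
        - "apps_admin"
        - "-d"
        - "apps_meta"
        - "-c"
        - "SELECT version();"
      env:
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: apps-pg-smoke-credentials   # hypothetical temporary copy of the bootstrap secret
              key: password
```

Deleting the namespace afterwards removes the pod and the copied secret together, which keeps apps_admin out of any long-lived namespace.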

Done when: the smoke checks pass.


T06 — Reply to railiance-apps, document the consumer contract

id: RAILIANCE-WP-0003-T06
status: done
priority: medium
state_hub_task_id: "8b78934d-0a3c-413c-a66f-295092282547"

Notify the requester and capture the pattern.

Steps:

  • Reply to thread 768c18f4-8785-4108-a900-fa117eb8778f through the State Hub /messages/ REST API with this workplan's id and the cluster's connection details. Do not send bootstrap credentials.
  • Add docs/apps-pg.md with:
    • Cluster identity and connection endpoints.
    • The per-app onboarding recipe: (a) request/approve a per-app role, (b) provision the backing role and credential through the chosen platform mechanism, (c) create the CNPG Database CR in the databases namespace with spec.cluster.name: apps-pg and spec.owner set to the approved role, (d) label the consumer namespace railiance.io/postgres-client=apps-pg, (e) publish or mirror the runtime Secret into the consumer namespace, and (f) wire the DSN into the application Helm values.
    • The CNPG 1.28 boundary: Database is standalone; role management is not a standalone Role CR and must follow the platform contract.
    • Backup posture (when the cluster is added to the existing platform backup process) and the resize / replicate roadmap.
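
Steps (c) and (d) of the onboarding recipe could be sketched as follows for the first consumer. The resource name vergabe-db, the database vergabe_db, the owner role vergabe, and the namespace vergabe-teilnahme follow the names used for RAILIANCE-WP-0002 T04; treat the manifest as an illustration of the contract rather than the committed file.

```yaml
# (c) Database CR in the databases namespace, bound to the shared cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: vergabe-db
  namespace: databases
spec:
  name: vergabe_db        # database name inside PostgreSQL
  owner: vergabe          # must already exist as a role (see step b)
  cluster:
    name: apps-pg
---
# (d) Consumer namespace opts in to apps-pg network access via the label.
apiVersion: v1
kind: Namespace
metadata:
  name: vergabe-teilnahme
  labels:
    railiance.io/postgres-client: apps-pg
```

The Database CR is reconciled only once its owner role exists, so step (b) has to land before step (c).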

Done when: the message is replied to and docs/apps-pg.md is committed.

Completion Criteria

This workplan is complete when:

  1. apps-pg reports healthy in the databases namespace.
  2. NetworkPolicies enforce the default-deny posture with label-based consumer opt-in.
  3. Makefile targets work end-to-end.
  4. railiance-apps RAILIANCE-WP-0002 T04 is unblocked and explicitly acknowledged via the hub thread.
  5. docs/apps-pg.md explains the consumer onboarding contract, including the CNPG role/database boundary.

Notes

  • This intentionally does not hard-code the vergabe role or vergabe_db into the shared cluster baseline. The consumer onboarding doc must describe the follow-up request/manifest needed for railiance-apps so the platform layer stays generic until an app explicitly registers.
  • Backup inclusion of apps-pg is a follow-up. The existing make backup target only covers the legacy PostgreSQL-HA setup; CNPG backup configuration is its own workplan.
  • A second replica (HA) and a connection pooler (PgBouncer / CNPG Pooler) are deferred. The cluster spec leaves room for both; enable them when node capacity allows.
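
For the deferred pooler, a minimal sketch of what a CNPG Pooler resource could look like is shown below. The resource name apps-pg-pooler-rw, the single instance, and the session pool mode are assumptions made for illustration; nothing here is deployed, and a pooler would also need its own NetworkPolicy coverage under the default-deny posture.

```yaml
# Not deployed; shape of a possible follow-up only.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: apps-pg-pooler-rw       # hypothetical name
  namespace: databases
spec:
  cluster:
    name: apps-pg
  instances: 1                  # assumption, sized for the single-node footprint
  type: rw                      # pool connections to the primary
  pgbouncer:
    poolMode: session           # assumption; transaction mode is the other common choice
```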