railiance-platform/workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md
tegwick e1a6ea5f18 Propose RAILIANCE-WP-0003: shared cnpg cluster apps-pg
6-task plan to provision a shared CloudNative PG cluster apps-pg in
the databases namespace, with NetworkPolicies that use a label-based
consumer opt-in (railiance.io/postgres-client=apps-pg) instead of
the per-namespace allowlist gitea-db uses.

Responds to coordination message 768c18f4 from railiance-apps and
unblocks RAILIANCE-WP-0002 T04 (vergabe-teilnahme role+db creation).

Keeps platform agnostic of individual apps per ADR-003: per-app
Database CRs and credential Secrets are owned by the consuming repos.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 00:46:50 +02:00


id: RAILIANCE-WP-0003
type: workplan
title: Provision shared cnpg cluster apps-pg
domain: railiance
repo: railiance-platform
status: proposed
owner: railiance-platform
topic_slug: railiance
planning_priority: high
planning_order: 3
created: 2026-05-19
updated: 2026-05-19

Provision shared cnpg cluster apps-pg

Goal

Provision a new shared CloudNative PG cluster apps-pg in the databases namespace where S5 application workloads can host their own PostgreSQL databases, so that adding an app no longer requires a dedicated cnpg cluster.

This unblocks railiance-apps RAILIANCE-WP-0002 T04 (vergabe-teilnahme needs a vergabe role and a vergabe_db database) and establishes the shared-cluster pattern that future S5 apps will adopt by default.

Context

railiance-apps workplan RAILIANCE-WP-0002 (establish vergabe-teilnahme on railiance01) found at T01 that the two existing cnpg clusters in databases are app-dedicated:

| Cluster | PG version | Owner app |
|---|---|---|
| gitea-db | 18 | gitea |
| net-kingdom-pg | 16 | net-kingdom |

Decision D-01 (resolved 2026-05-18, bernd) selected option D: provision a new shared cluster apps-pg rather than create a third dedicated cluster (option A) or retrofit an existing app cluster (B/C).

A coordination message was sent from railiance-apps to railiance-platform requesting this work; this workplan is the response.

Placement in the Railiance Tooling Set

S3 owns cnpg Cluster CRs (per ADR-003 and the pattern already established by helm/gitea-db-cluster.yaml). S5 consumers own two things. The first is their per-app Database CRs, which reference the shared cluster by name (apps-pg). The second is the credential Secrets their apps use to reach the cluster's apps-pg-rw service.

| Concern | Owner repo | Scope |
|---|---|---|
| Cluster apps-pg CR, NetworkPolicies | railiance-platform | this workplan |
| apps-pg backups | railiance-platform | follow-up workplan |
| Per-app Database CRs (vergabe_db, ...) | each S5 repo | not here |
| Per-app credential Secrets | each S5 repo | not here |
| Helm release wiring DSNs into app pods | each S5 repo | not here |

Current Evidence

  • kubectl get crd | grep cnpg shows the databases.postgresql.cnpg.io CRD, so consumers can self-provision DBs declaratively. The installed operator version (1.28.1) comes from the operator image listed below, not from the CRD listing.
  • Operator image: ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1 (cnpg-system namespace).
  • databases namespace has a default-deny-all NetworkPolicy; each cnpg cluster therefore needs its own NetworkPolicy triplet (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns) — pattern visible in helm/gitea-db-networkpolicies.yaml.
  • No helm/apps-pg-*.yaml artifacts exist yet.
  • Coordination message id: 768c18f4-8785-4108-a900-fa117eb8778f (state-hub thread).

Safety Contract

  • Do not commit plaintext credentials. The bootstrap secret is created once by hand with kubectl create secret. Only a SOPS-encrypted template, helm/apps-pg-secret.sops.yaml.template, is committed.
  • Do not colocate non-app data (Gitea, net-kingdom) in apps-pg. This cluster is for S5 application DBs only.
  • Preserve the default-deny NetworkPolicy posture in databases. Only allow ingress from namespaces that opt in with the railiance.io/postgres-client=apps-pg label.
  • Initial sizing is conservative (1 instance, 10Gi) to match the existing per-cluster footprint. Resize is a follow-up workplan.
  • Cluster name apps-pg is locked once published — renaming changes every consumer DSN.

Target State

  • kubectl get cluster apps-pg -n databases reports Cluster in healthy state with the primary apps-pg-1.
  • kubectl get svc apps-pg-rw apps-pg-ro -n databases lists both services.
  • NetworkPolicies for apps-pg mirror the gitea-db triplet, with the app-namespace rule replaced by the label-based consumer opt-in.
  • make apps-pg-deploy / apps-pg-status / apps-pg-shell / apps-pg-logs targets exist and work.
  • Bootstrap admin role (apps_admin) and meta database (apps_meta) exist for cluster health probes and to anchor the bootstrap; the cluster is otherwise empty of per-app data.
  • Documentation explains how an S5 consumer registers a new database via a Database CR plus its own credential Secret, without touching this repo.
  • railiance-apps is notified via the hub thread; their RAILIANCE-WP-0002 T04 can proceed.

Tasks

T01 — Inventory and capacity check

id: RAILIANCE-WP-0003-T01
status: todo
priority: high

Confirm the substrate before adding a new cluster.

Checks:

  • cnpg operator version. The Database CR consumer pattern needs ≥ 1.25, the release that introduced the Database CRD; 1.28.1 is expected.
  • Node-level disk space available for an additional 10Gi PVC (local-path storage class is the active default).
  • Existing cluster footprint (gitea-db, net-kingdom-pg) and any current resource pressure.
  • That the databases namespace already exists and has its default-deny NetworkPolicy in place.
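  • That the cnpg-system namespace carries the label kubernetes.io/metadata.name=cnpg-system, which the ingress-from-operator NetworkPolicy requires. Kubernetes has set this label automatically on every namespace since v1.21, so this is a confirmation rather than a change.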

Done when: the workplan records cnpg version, available PVC capacity, and any pre-condition gaps.


T02 — Create bootstrap credential secret

id: RAILIANCE-WP-0003-T02
status: todo
priority: high

Mint the one-time bootstrap secret that cnpg uses to create the initial apps_admin role.

Steps:

APPS_PG_PW=$(openssl rand -hex 32)   # hex output needs no escaping inside a postgresql:// DSN
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"

Then commit a SOPS-encrypted template:

  • helm/apps-pg-secret.sops.yaml.template — encrypted form for declarative reapply; do not commit the plaintext password.
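For reference, the plaintext Secret that the SOPS template encrypts could look like the sketch below. This is an assumed shape, not the committed file.

- "<generated-password>" is a placeholder.
- type: Opaque is chosen to match what kubectl create secret generic produces. A Secret's type cannot be changed after creation, so the template and the live Secret must agree.
- Because the filename ends in .template, sops will not infer YAML from the extension. Passing --input-type yaml --output-type yaml (and optionally --encrypted-regex '^(data|stringData)$' to keep metadata readable) is one way to encrypt it in place, unless the repo's .sops.yaml already handles this.

```yaml
# Plaintext shape BEFORE sops encryption; never commit this form.
apiVersion: v1
kind: Secret
metadata:
  name: apps-pg-credentials
  namespace: databases
type: Opaque                        # matches `kubectl create secret generic`; type is immutable
stringData:
  username: apps_admin
  password: "<generated-password>"  # placeholder: substitute $APPS_PG_PW locally, then encrypt
```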

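The bootstrap role is intentionally minimal. Per-app roles are created later for their owning repos in one of two ways:

- with direct SQL (CREATE ROLE / GRANT), or
- through cnpg declarative role management, which lives in spec.managed.roles on this repo's Cluster CR (cnpg has no standalone Role resource).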

Done when: the secret exists in the cluster and an encrypted template is committed.


T03 — Add the cnpg Cluster manifest

id: RAILIANCE-WP-0003-T03
status: todo
priority: high

Add helm/apps-pg-cluster.yaml modeled on helm/gitea-db-cluster.yaml.

Shape:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1            # bump when node RAM > 8GB
  imageName: ghcr.io/cloudnative-pg/postgresql:16   # PostgreSQL 16; cnpg derives the major version from the image tag
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials

Note: PG version is 16 (matches vergabe-teilnahme's minimum and the existing net-kingdom-pg). Bumping to 17/18 is a separate decision.

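Done when: the manifest is committed and kubectl apply --dry-run=server -f helm/apps-pg-cluster.yaml succeeds. A server-side dry run passes the manifest through the API server and the cnpg admission webhooks without persisting it.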


T04 — Add NetworkPolicies for apps-pg

id: RAILIANCE-WP-0003-T04
status: todo
priority: high

Add helm/apps-pg-networkpolicies.yaml modeled on the gitea-db triplet but parameterised for the apps consumer namespace pattern.

Three policies (all in databases, all selecting cnpg.io/cluster: apps-pg):

  1. allow-egress-kube-api-apps-pg — egress to TCP/6443.
  2. allow-ingress-from-cnpg-operator-apps-pg — ingress from namespaceSelector kubernetes.io/metadata.name=cnpg-system on TCP ports 5432 / 8000 / 9187.
  3. allow-ingress-from-app-namespaces-apps-pg — ingress on TCP/5432 from any namespace carrying the label railiance.io/postgres-client=apps-pg. (Each consuming app namespace adds this label; this avoids hard-coding a namespace list in the platform repo.)

The label-based selector is the meaningful difference from gitea-db, whose ingress rule hard-codes the default namespace. A shared cluster cannot know its consumer namespaces in advance, so it requires a positive opt-in label instead.
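A sketch of helm/apps-pg-networkpolicies.yaml follows. It is written from the three policy descriptions above rather than copied from helm/gitea-db-networkpolicies.yaml, whose exact contents are not reproduced here. It assumes two things:

- The instance pods carry the cnpg.io/cluster: apps-pg label that cnpg applies to cluster pods.
- No egress other than the API server is needed, as the list implies.

```yaml
# 1. Instance manager -> Kubernetes API (TCP/6443, any destination).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-kube-api-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 6443
---
# 2. cnpg operator -> postgres (5432), instance status (8000), metrics (9187).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-cnpg-operator-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: cnpg-system
      ports:
        - protocol: TCP
          port: 5432
        - protocol: TCP
          port: 8000
        - protocol: TCP
          port: 9187
---
# 3. Any namespace labeled railiance.io/postgres-client=apps-pg -> postgres.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-apps-pg
  namespace: databases
spec:
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: apps-pg
      ports:
        - protocol: TCP
          port: 5432
```

Policy 3 uses a namespaceSelector with no podSelector, so every pod in an opted-in namespace can reach TCP/5432. Pairing it with a podSelector would narrow this further, at the cost of consumers also labeling their pods.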

Done when: the policies are committed and applied; consumer namespaces can connect after applying the railiance.io/postgres-client=apps-pg label.


T05 — Makefile targets, deploy, verify

id: RAILIANCE-WP-0003-T05
status: todo
priority: high

Add targets that mirror the db-* (gitea-db) family:

apps-pg-deploy:  ## Apply shared apps-pg cnpg Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml

apps-pg-status:  ## Show apps-pg cnpg cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
	  $(KUBECTL) get cluster apps-pg -n databases -o wide

apps-pg-shell:   ## Open psql on the apps-pg primary (as postgres, via local peer auth) in apps_meta
	$(KUBECTL) cnpg psql apps-pg -n databases -- apps_meta

apps-pg-logs:    ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg,cnpg.io/instanceRole=primary -f --tail=50

Then deploy and wait for the cluster to converge:

make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m

Smoke checks:

  • cnpg status reports Cluster in healthy state.
  • Services apps-pg-rw and apps-pg-ro exist.
  • From a disposable pod in a namespace labeled railiance.io/postgres-client=apps-pg, psql 'postgresql://apps_admin:...@apps-pg-rw.databases:5432/apps_meta' connects.
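The network half of the last smoke check can be exercised without handling credentials. Run pg_isready from a throwaway namespace that carries the opt-in label.

- The namespace and pod names and the postgres:16 image are assumptions for illustration.
- pg_isready only checks that the server accepts connections. The psql login check is still needed for the credential path.

```yaml
# Hypothetical throwaway namespace that opts in to the apps-pg ingress policy.
apiVersion: v1
kind: Namespace
metadata:
  name: apps-pg-smoke
  labels:
    railiance.io/postgres-client: apps-pg
---
# One-shot pod: pg_isready exits 0 (pod phase Succeeded) if apps-pg-rw accepts
# connections, non-zero (pod phase Failed) otherwise.
apiVersion: v1
kind: Pod
metadata:
  name: apps-pg-isready
  namespace: apps-pg-smoke
spec:
  restartPolicy: Never
  containers:
    - name: pg-isready
      image: postgres:16
      command: ["pg_isready", "-h", "apps-pg-rw.databases", "-p", "5432", "-t", "10"]
```

Deleting the apps-pg-smoke namespace afterwards removes both objects.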

Done when: the smoke checks pass.


T06 — Reply to railiance-apps, document the consumer contract

id: RAILIANCE-WP-0003-T06
status: todo
priority: medium

Notify the requester and capture the pattern.

Steps:

  • Reply to thread 768c18f4-8785-4108-a900-fa117eb8778f via reply_to_message with this workplan's id and the cluster's connection details (service name, port, bootstrap admin role).
  • Add docs/apps-pg.md with:
    • Cluster identity and connection endpoints.
    • The per-app onboarding recipe:
      (a) label the consumer namespace railiance.io/postgres-client=apps-pg;
      (b) create a credential Secret in the consumer namespace;
      (c) create a cnpg Database CR whose spec.cluster.name is apps-pg (cnpg requires the CR to live in the databases namespace alongside the Cluster, and its owner role must already exist);
      (d) wire the DSN into the application Helm values.
    • Backup posture (when the cluster is added to the existing platform backup process) and the resize / replicate roadmap.
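As an illustration of onboarding steps (a) to (c), a consumer repo might ship manifests like the following. All names are assumptions:

- the namespace vergabe-teilnahme, the Secret name and its DATABASE_URL key, and the vergabe role are borrowed from the RAILIANCE-WP-0002 context;
- "<password>" is a placeholder.

Two cnpg constraints shape the Database CR. First, it references its Cluster through spec.cluster.name and must be created in the Cluster's namespace. Second, cnpg does not create the owner role, so vergabe has to exist in apps-pg before the CR can reconcile.

```yaml
# (a) Opt the consumer namespace in to apps-pg ingress.
#     If the namespace is managed elsewhere, add only the label there.
apiVersion: v1
kind: Namespace
metadata:
  name: vergabe-teilnahme
  labels:
    railiance.io/postgres-client: apps-pg
---
# (b) App-side credentials in the consumer namespace (SOPS-encrypt before committing).
apiVersion: v1
kind: Secret
metadata:
  name: vergabe-db-credentials
  namespace: vergabe-teilnahme
type: Opaque
stringData:
  DATABASE_URL: "postgresql://vergabe:<password>@apps-pg-rw.databases:5432/vergabe_db"
---
# (c) Per-app database. Must sit in the Cluster's namespace; the owner role
#     must already exist in apps-pg.
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: apps-pg-vergabe-db
  namespace: databases
spec:
  cluster:
    name: apps-pg
  name: vergabe_db
  owner: vergabe
```

Step (c) therefore needs write access to the databases namespace from the consumer side. docs/apps-pg.md is the natural place to state how that access is granted.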

Done when: the message is replied to and docs/apps-pg.md is committed.

Completion Criteria

This workplan is complete when:

  1. apps-pg reports healthy in the databases namespace.
  2. NetworkPolicies enforce the default-deny posture with label-based consumer opt-in.
  3. Makefile targets work end-to-end.
  4. railiance-apps RAILIANCE-WP-0002 T04 is unblocked and explicitly acknowledged via the hub thread.
  5. docs/apps-pg.md explains the consumer onboarding contract.

Notes

  • This intentionally does not create the vergabe role or vergabe_db — that work belongs in railiance-apps. Keeping the platform layer ignorant of individual apps preserves ADR-003.
  • Backup inclusion of apps-pg is a follow-up. The existing make backup target only covers the legacy PostgreSQL-HA setup; cnpg backup configuration is its own workplan.
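  • A second replica (HA) and a connection pooler (PgBouncer / cnpg Pooler) are deferred. The cluster spec leaves room for both; add them when node capacity allows. Raising instances above 1 will also need a NetworkPolicy that admits replication traffic on TCP/5432 between apps-pg pods, because none of the three T04 policies covers it.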