From e1a6ea5f18c085bbe886d92fcbd9d0693395f247 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Tue, 19 May 2026 00:46:50 +0200
Subject: [PATCH] Propose RAILIANCE-WP-0003: shared cnpg cluster apps-pg

6-task plan to provision a shared CloudNative PG cluster apps-pg in the
databases namespace, with NetworkPolicies that use a label-based
consumer opt-in (railiance.io/postgres-client=apps-pg) instead of the
per-namespace allowlist gitea-db uses.

Responds to coordination message 768c18f4 from railiance-apps and
unblocks RAILIANCE-WP-0002 T04 (vergabe-teilnahme role+db creation).
Keeps platform agnostic of individual apps per ADR-003: per-app
Database CRs and credential Secrets are owned by the consuming repos.

Co-Authored-By: Claude Opus 4.7
---
 ...platform-WP-0003-apps-pg-shared-cluster.md | 350 ++++++++++++++++++
 1 file changed, 350 insertions(+)
 create mode 100644 workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md

diff --git a/workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md b/workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md
new file mode 100644
index 0000000..09b1c92
--- /dev/null
+++ b/workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md
@@ -0,0 +1,350 @@
+---
+id: RAILIANCE-WP-0003
+type: workplan
+title: "Provision shared cnpg cluster apps-pg"
+domain: railiance
+repo: railiance-platform
+status: proposed
+owner: railiance-platform
+topic_slug: railiance
+planning_priority: high
+planning_order: 3
+created: "2026-05-19"
+updated: "2026-05-19"
+---
+
+# Provision shared cnpg cluster apps-pg
+
+## Goal
+
+Provision a new shared CloudNative PG cluster `apps-pg` in the
+`databases` namespace that S5 application workloads can use to host
+their own PostgreSQL databases — without each app forcing the creation
+of a dedicated cnpg cluster.
+
+This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme
+needs a `vergabe` role + `vergabe_db` database) and establishes the
+shared-cluster pattern future S5 apps adopt by default.
+
+## Context
+
+`railiance-apps` workplan `RAILIANCE-WP-0002` (establish
+vergabe-teilnahme on railiance01) found at T01 that the two existing
+cnpg clusters in `databases` are app-dedicated:
+
+| Cluster          | PG | Owner app   |
+|------------------|----|-------------|
+| `gitea-db`       | 18 | gitea       |
+| `net-kingdom-pg` | 16 | net-kingdom |
+
+Decision `D-01` (resolved 2026-05-18, bernd) selected option D:
+**provision a new shared cluster `apps-pg`** rather than create a third
+dedicated cluster (option A) or retrofit an existing app cluster (B/C).
+
+A coordination message was sent from `railiance-apps` to
+`railiance-platform` requesting this work; this workplan is the
+response.
+
+## Placement in the Railiance Tooling Set
+
+S3 owns cnpg `Cluster` CRs (per ADR-003 and the pattern already
+established by `helm/gitea-db-cluster.yaml`). S5 consumers create their
+own per-app `Database` CRs and credential Secrets pointing at the
+shared cluster's service.
+
+| Concern | Owner repo | Scope |
+|---------|------------|-------|
+| `Cluster apps-pg` CR, NetworkPolicies, backups | `railiance-platform` | this workplan |
+| Per-app `Database` CRs (`vergabe_db`, ...) | each S5 repo | not here |
+| Per-app credential Secrets | each S5 repo | not here |
+| Helm release wiring DSNs into app pods | each S5 repo | not here |
+
+## Current Evidence
+
+- `kubectl get crd | grep cnpg` confirms cnpg 1.28.1 with the
+  `databases.postgresql.cnpg.io` CRD — consumers can self-provision
+  DBs declaratively.
+- Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
+  (`cnpg-system` namespace).
+- `databases` namespace has a default-deny-all NetworkPolicy; each
+  cnpg cluster therefore needs its own NetworkPolicy triplet
+  (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns)
+  — pattern visible in `helm/gitea-db-networkpolicies.yaml`.
+- No `helm/apps-pg-*.yaml` artifacts exist yet.
+- Coordination message id: `768c18f4-8785-4108-a900-fa117eb8778f`
+  (state-hub thread).
+
+## Safety Contract
+
+- Do not commit plaintext credentials. The bootstrap secret is created
+  once, by hand, with `kubectl create secret`; a SOPS-encrypted
+  template is then committed as
+  `helm/apps-pg-secret.sops.yaml.template`.
+- Do not co-locate non-app data (Gitea, net-kingdom) in `apps-pg`.
+  This cluster is for S5 *application* DBs.
+- Preserve the default-deny NetworkPolicy posture in `databases`;
+  only allow ingress from namespaces that have a registered consumer.
+- Initial sizing is conservative (1 instance, 10Gi) to match the
+  existing per-cluster footprint. Resize is a follow-up workplan.
+- Cluster name `apps-pg` is locked once published — renaming changes
+  every consumer DSN.
+
+## Target State
+
+- `kubectl get cluster apps-pg -n databases` reports
+  `Cluster in healthy state` with the primary `apps-pg-1`.
+- `kubectl get svc apps-pg-rw apps-pg-ro -n databases` returns both
+  services.
+- NetworkPolicies for `apps-pg` mirror the `gitea-db` triplet.
+- `make apps-pg-deploy / apps-pg-status / apps-pg-shell / apps-pg-logs`
+  targets exist and work.
+- Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`)
+  exist for cluster health probes and to anchor the bootstrap; the
+  cluster is otherwise empty of per-app data.
+- Documentation explains how an S5 consumer registers a new database
+  via a `Database` CR plus its own credential Secret, without touching
+  this repo.
+- `railiance-apps` is notified via the hub thread; their
+  `RAILIANCE-WP-0002 T04` can proceed.
+
+## Tasks
+
+### T01 — Inventory and capacity check
+
+```task
+id: RAILIANCE-WP-0003-T01
+status: todo
+priority: high
+```
+
+Confirm the substrate before adding a new cluster.
+
+Checks:
+
+- cnpg operator version (≥ 1.28.x required for the `Database` CR
+  consumer pattern).
+- Node-level disk space available for an additional 10Gi PVC
+  (`local-path` storage class is the active default).
+- Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any
+  current resource pressure.
+- That the `databases` namespace already exists and has its
+  default-deny NetworkPolicy in place.
+- That `cnpg-system` namespace label
+  `kubernetes.io/metadata.name=cnpg-system` is set (required by the
+  ingress-from-operator NetworkPolicy).
+
+**Done when:** the workplan records cnpg version, available PVC
+capacity, and any pre-condition gaps.
+
+---
+
+### T02 — Create bootstrap credential secret
+
+```task
+id: RAILIANCE-WP-0003-T02
+status: todo
+priority: high
+```
+
+Mint the one-time bootstrap secret that cnpg uses to create the initial
+`apps_admin` role.
+
+Steps:
+
+```bash
+# Generate a random password; it lives only in this shell session.
+APPS_PG_PW=$(openssl rand -base64 32)
+# cnpg reads the username/password keys; its docs use the basic-auth type.
+kubectl create secret generic apps-pg-credentials \
+  --namespace databases \
+  --type=kubernetes.io/basic-auth \
+  --from-literal=username=apps_admin \
+  --from-literal=password="$APPS_PG_PW"
+```
+
+Then commit a SOPS-encrypted template:
+
+- `helm/apps-pg-secret.sops.yaml.template` — encrypted form for
+  declarative reapply; do not commit the plaintext password.
+
+The bootstrap role is intentionally minimal — per-app roles are
+created later by their owning repos via cnpg's declarative role
+management (`spec.managed.roles` on the `Cluster`) or plain SQL.
+
+**Done when:** the secret exists in the cluster and an encrypted
+template is committed.
+
+---
+
+### T03 — Add the cnpg Cluster manifest
+
+```task
+id: RAILIANCE-WP-0003-T03
+status: todo
+priority: high
+```
+
+Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`.
+
+Shape:
+
+```yaml
+apiVersion: postgresql.cnpg.io/v1
+kind: Cluster
+metadata:
+  name: apps-pg
+  namespace: databases
+  labels:
+    app.kubernetes.io/name: apps-pg
+    app.kubernetes.io/component: database
+    app.kubernetes.io/managed-by: manual
+    railiance.io/layer: s3-platform
+    railiance.io/role: shared-apps-database
+spec:
+  instances: 1  # bump when node RAM > 8GB
+  # The PG major version is selected through the operand image tag.
+  imageName: ghcr.io/cloudnative-pg/postgresql:16
+  storage:
+    size: 10Gi
+  bootstrap:
+    initdb:
+      database: apps_meta
+      owner: apps_admin
+      secret:
+        name: apps-pg-credentials
+```
+
+Note: PG version is 16 (matches vergabe-teilnahme's minimum and the
+existing `net-kingdom-pg`). Bumping to 17/18 is a separate decision.
+
+**Done when:** the manifest is committed and
+`kubectl apply --dry-run=server` validates against the cluster.
+
+---
+
+### T04 — Add NetworkPolicies for apps-pg
+
+```task
+id: RAILIANCE-WP-0003-T04
+status: todo
+priority: high
+```
+
+Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet
+but parameterised for the *apps* consumer namespace pattern.
+
+Three policies (all in `databases`, all selecting
+`cnpg.io/cluster: apps-pg`):
+
+1. `allow-egress-kube-api-apps-pg` — egress to TCP/6443.
+2. `allow-ingress-from-cnpg-operator-apps-pg` — ingress from
+   `namespaceSelector kubernetes.io/metadata.name=cnpg-system` on TCP
+   ports 5432 / 8000 / 9187.
+3. `allow-ingress-from-app-namespaces-apps-pg` — ingress on TCP/5432
+   from any namespace carrying the label
+   `railiance.io/postgres-client=apps-pg`. (Each consuming app
+   namespace adds this label; this avoids hard-coding a namespace list
+   in the platform repo.)
+
+The label-based selector is the meaningful difference from gitea-db,
+which hard-codes `default`. The shared cluster cannot know its
+consumer namespaces in advance, so it expects a positive opt-in label.
+
+**Done when:** the policies are committed and applied; consumer namespaces
+can connect after applying the `railiance.io/postgres-client=apps-pg`
+label.
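+
+A minimal sketch of policy 3, the label-based opt-in, assuming the
+standard `networking.k8s.io/v1` NetworkPolicy shape. The policy name,
+selector label, and port are taken from the list above; the committed
+file may be laid out differently.
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-ingress-from-app-namespaces-apps-pg
+  namespace: databases
+spec:
+  # Applies to every pod of the apps-pg cluster.
+  podSelector:
+    matchLabels:
+      cnpg.io/cluster: apps-pg
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        # Any namespace that has opted in via the consumer label.
+        - namespaceSelector:
+            matchLabels:
+              railiance.io/postgres-client: apps-pg
+      ports:
+        - protocol: TCP
+          port: 5432
+```
+
+Because the selector matches a namespace label rather than a namespace
+name, onboarding a new consumer should need no change to this file.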
+
+---
+
+### T05 — Makefile targets, deploy, verify
+
+```task
+id: RAILIANCE-WP-0003-T05
+status: todo
+priority: high
+```
+
+Add targets that mirror the `db-*` (gitea-db) family:
+
+```make
+apps-pg-deploy: ## Apply shared apps-pg cnpg Cluster + NetworkPolicies
+	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
+	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml
+
+apps-pg-status: ## Show apps-pg cnpg cluster health
+	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
+		$(KUBECTL) get cluster apps-pg -n databases -o wide
+
+apps-pg-shell: ## Open psql shell on apps-pg primary
+	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta
+
+apps-pg-logs: ## Tail apps-pg primary logs
+	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50
+```
+
+Then deploy and wait for the cluster to converge:
+
+```bash
+make apps-pg-deploy
+kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m
+```
+
+Smoke checks:
+
+- `cnpg status` reports `Cluster in healthy state`.
+- Services `apps-pg-rw` and `apps-pg-ro` exist.
+- From a disposable pod in a namespace labeled
+  `railiance.io/postgres-client=apps-pg`,
+  `psql 'postgresql://apps_admin:...@apps-pg-rw.databases:5432/apps_meta'`
+  connects.
+
+**Done when:** the smoke checks pass.
+
+---
+
+### T06 — Reply to railiance-apps, document the consumer contract
+
+```task
+id: RAILIANCE-WP-0003-T06
+status: todo
+priority: medium
+```
+
+Notify the requester and capture the pattern.
+
+Steps:
+
+- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` via
+  `reply_to_message` with this workplan's id and the cluster's
+  connection details (service name, port, bootstrap admin role).
+- Add `docs/apps-pg.md` with:
+  - Cluster identity and connection endpoints.
+  - The per-app onboarding recipe: (a) label the consumer namespace
+    `railiance.io/postgres-client=apps-pg`, (b) create a credential
+    Secret in the consumer namespace, (c) create a cnpg `Database` CR
+    referencing the cluster and the credential Secret, (d) wire the
+    DSN into the application Helm values.
+  - Backup posture (when the cluster is added to the existing platform
+    backup process) and the resize / replicate roadmap.
+
+**Done when:** the message is replied to and `docs/apps-pg.md` is
+committed.
+
+## Completion Criteria
+
+This workplan is complete when:
+
+1. `apps-pg` reports healthy in the `databases` namespace.
+2. NetworkPolicies enforce the default-deny posture with label-based
+   consumer opt-in.
+3. Makefile targets work end-to-end.
+4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
+   acknowledged via the hub thread.
+5. `docs/apps-pg.md` explains the consumer onboarding contract.
+
+## Notes
+
+- This intentionally does **not** create the `vergabe` role or
+  `vergabe_db` — that work belongs in `railiance-apps`. Keeping the
+  platform layer ignorant of individual apps preserves ADR-003.
+- Backup inclusion of `apps-pg` is a follow-up. The existing
+  `make backup` target only covers the legacy PostgreSQL-HA setup;
+  cnpg backup configuration is its own workplan.
+- A second replica (HA) and a connection pooler (PgBouncer / cnpg
+  `Pooler`) are deferred. The cluster spec leaves room
+  for both — enable them when node capacity allows.
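+
+For orientation, steps (a) and (c) of the T06 onboarding recipe might
+look roughly like this from a consumer repo. This is a sketch, not a
+tested manifest: the `vergabe` namespace name and the `vergabe-db`
+object name are hypothetical, the `vergabe` role is assumed to exist
+already, and placing the `Database` object in `databases` rests on the
+assumption that cnpg resolves `spec.cluster.name` within the object's
+own namespace. Check that against the cnpg API reference before
+relying on it.
+
+```yaml
+# (a) Opt the consumer namespace in to apps-pg ingress.
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: vergabe                    # hypothetical consumer namespace
+  labels:
+    railiance.io/postgres-client: apps-pg
+---
+# (c) Declare the per-app database on the shared cluster.
+apiVersion: postgresql.cnpg.io/v1
+kind: Database
+metadata:
+  name: vergabe-db                 # hypothetical object name
+  namespace: databases             # assumption: same namespace as the Cluster
+spec:
+  cluster:
+    name: apps-pg
+  name: vergabe_db                 # PostgreSQL database name
+  owner: vergabe                   # role assumed to exist already
+```
+
+If the same-namespace assumption holds, the consumer repo needs write
+access to `Database` objects in `databases`, which `docs/apps-pg.md`
+would have to spell out.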