# PostgreSQL HA: Platform Service

**Chart:** `bitnami/postgresql-ha`  
**Namespace:** `platform`  
**Managed by:** `railiance-platform` (S3)  
**Workplan:** `RAIL-PL-WP-0001`

---

## Architecture

```
Apps (S5)
  └── pgpool  (load balancer / connection pooler)
        ├── postgresql-0  [Primary - repmgr]
        ├── postgresql-1  [Standby - repmgr]
        └── postgresql-2  [Standby - repmgr]
```

- **pgpool-II** distributes reads across standbys, routes writes to primary
- **repmgr** handles automatic failover if the primary disappears
- All pods in `platform` namespace; app pods connect via pgpool service

## Connection string pattern

```
postgresql://DBUSER:DBPASS@postgresql-ha-pgpool.platform.svc.cluster.local:5432/DBNAME
```

Replace `DBUSER`, `DBPASS`, `DBNAME` with the database-specific credentials.

---

## Initial deployment

### Prerequisites

- `railiance-cluster` converged (`make smoke` passes)
- SOPS age key accessible: `sops -d helm/postgresql-ha-values.sops.yaml` returns plaintext
- `helm repo add bitnami https://charts.bitnami.com/bitnami && helm repo update` done on the node

### Steps

```bash
# 1. Ensure the platform namespace exists
kubectl create namespace platform --dry-run=client -o yaml | kubectl apply -f -

# 2. Deploy (from railiance-platform/)
make pg-deploy

# 3. Verify
make pg-status
# Expected: 3 postgresql pods + 1 pgpool pod, all Running

# 4. Smoke test
make smoke
```

---

## Creating a new database for an app

```bash
# Connect via pgpool
kubectl exec -it -n platform \
  $(kubectl get pod -n platform -l app.kubernetes.io/component=pgpool -o name | head -1) \
  -- psql -U postgres

# Inside psql:
CREATE DATABASE myapp;
CREATE USER myapp WITH PASSWORD 'strong-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp;
\c myapp
GRANT ALL ON SCHEMA public TO myapp;
\q
```

Add the user password to the app's own secrets (managed in the app's repo, not here).
The connection string for the app will be:

```
postgresql://myapp:strong-password@postgresql-ha-pgpool.platform.svc.cluster.local:5432/myapp
```

---

## Password rotation

1. Update the password in the plaintext values template
2. Re-encrypt: `sops -e -i helm/postgresql-ha-values.sops.yaml`
3. Upgrade: `make pg-deploy`
4. Update the app's connection secret to match
5. Rolling restart the app pods to pick up the new connection

---

## pgpool-password: critical note

The `postgresql.pgpoolPassword` value in the Helm chart maps to the `pgpool-password` key
in the `postgresql-ha-postgresql` Kubernetes Secret. The pgpool container mounts this key
at startup; if it is absent, pgpool enters CrashLoopBackOff with **no log output**.

**This was the root cause of the 2026-03-10 incident (RAIL-BS-WP-0003).**

Always verify after `helm upgrade`:

```bash
kubectl get secret -n platform postgresql-ha-postgresql \
  -o jsonpath='{.data.pgpool-password}' | base64 -d && echo
# Must print a non-empty string
```

---

## HA failover test

Per Decision D3, any change to this service requires a passing failover test:

```bash
# From railiance-cluster/
make test-ha-failover GITEA_URL=https://
```

The test kills the primary PostgreSQL pod and asserts:

1. repmgr promotes a standby within 60s
2. All pods return to Running within 120s
3. pgpool returns to Running (catches the missing-key bug)

---

## Backup

Platform backup (PostgreSQL logical dump) is handled by the `railiance-backup` tool
in this repo:

```bash
make backup
```

This produces an age-encrypted dump uploaded to Nextcloud.
For cluster-level backup (etcd, kubeconfig), see `railiance-cluster/`.

---

## Scaling to 3 nodes (ThreePhoenix)

When Railiance02 and Railiance03 join the cluster:

1. Switch StorageClass from `local-path` to `longhorn` in the values file
2. Change `postgresql.podAntiAffinityPreset` from `soft` to `hard`
3. Run `make pg-deploy`; the Helm rolling update spreads pods across nodes
4. Run `make test-ha-failover` to confirm HA is genuine (not just replicated on one node)
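
Before running the failover test, it can help to confirm that the pods really landed on separate nodes. The following is a minimal sketch, not part of this repo's Makefile: `check_spread` is a hypothetical helper, and the label selector in the usage comment is an assumption modelled on the pgpool selector used earlier, so adjust it to the labels the chart actually applies.

```bash
#!/usr/bin/env bash
# Sketch: confirm that each PostgreSQL pod sits on a different node.

# check_spread (hypothetical helper) reads "POD NODE" pairs on stdin.
# It prints any node that hosts more than one pod and returns non-zero
# in that case; otherwise it prints an ok message and returns zero.
check_spread() {
  local dupes
  dupes=$(awk '{ print $2 }' | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "co-located pods on: $dupes"
    return 1
  fi
  echo "ok: every pod is on its own node"
}

# Usage against a live cluster (the label selector is an assumption):
# kubectl get pods -n platform -l app.kubernetes.io/component=postgresql \
#   -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName --no-headers \
#   | check_spread
```

A non-zero exit here would suggest the anti-affinity change from step 2 has not taken effect yet, in which case a passing failover test would say little about node-level resilience.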