feat(platform): T01 — standalone PostgreSQL HA chart scaffold

Lays out the S3 platform layer foundation for RAIL-PL-WP-0001 T01:

- .sops.yaml: age encryption policy (shared key, *.sops.yaml pattern)
- .gitignore: prevents accidental commit of decrypted values files
- Makefile: pg-deploy, pg-status, pg-pgpool-check, valkey-deploy,
  valkey-status, backup targets with KUBECONFIG/HELM wiring
- helm/postgresql-ha-values.yaml.template: annotated values schema with
  CHANGEME_ placeholders; includes pgpool-password fix from
  RAIL-BS-WP-0003; notes on single-node vs ThreePhoenix scaling
- docs/postgresql-ha.md: connection strings, DB creation, password
  rotation, pgpool-password critical note, HA failover test ref,
  ThreePhoenix scaling path

To complete T01: fill in CHANGEME_ values, encrypt with sops -e -i,
then run make pg-deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs/postgresql-ha.md

# PostgreSQL HA — Platform Service

- **Chart:** `bitnami/postgresql-ha`
- **Namespace:** `platform`
- **Managed by:** `railiance-platform` (S3)
- **Workplan:** `RAIL-PL-WP-0001`

---

## Architecture

```
Apps (S5)
  └── pgpool (load balancer / connection pooler)
        ├── postgresql-0  [Primary — repmgr]
        ├── postgresql-1  [Standby — repmgr]
        └── postgresql-2  [Standby — repmgr]
```

- **pgpool-II** distributes reads across the standbys and routes writes to the primary
- **repmgr** handles automatic failover: if the primary disappears, it promotes one of the standbys, so `postgresql-0` is only the initial primary
- All pods run in the `platform` namespace; app pods connect via the pgpool service

## Connection string pattern

```
postgresql://DBUSER:DBPASS@postgresql-ha-pgpool.platform.svc.cluster.local:5432/DBNAME
```

Replace `DBUSER`, `DBPASS`, and `DBNAME` with the database-specific credentials.
If the password contains URL-reserved characters such as `@`, `:`, `/` or `%`,
percent-encode it first.
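
A minimal sketch in bash for assembling the connection string with the user and password percent-encoded. Unreserved characters (RFC 3986) are kept and every other character becomes `%XX`. `urlencode` and `pg_url` are hypothetical helpers and are not part of the repo's Makefile. The pgpool hostname is the one above, and it can be overridden through an assumed `PGPOOL_HOST` variable. The sketch assumes ASCII passwords.

```bash
# Hypothetical helpers (not part of railiance-platform): build a pgpool
# connection string with user and password percent-encoded.

# urlencode STRING: keep RFC 3986 unreserved characters, encode the rest as %XX.
# Assumes ASCII input.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'x" makes printf use x's code point
    esac
  done
  printf '%s' "$out"
}

# pg_url USER PASSWORD DBNAME: print the connection string for pgpool.
pg_url() {
  local host="${PGPOOL_HOST:-postgresql-ha-pgpool.platform.svc.cluster.local}"
  printf 'postgresql://%s:%s@%s:5432/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$host" "$3"
}
```

For example, `pg_url myapp 'p@ss:w/rd' myapp` prints `postgresql://myapp:p%40ss%3Aw%2Frd@postgresql-ha-pgpool.platform.svc.cluster.local:5432/myapp`.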

---

## Initial deployment

### Prerequisites

- `railiance-cluster` converged (`make smoke` passes)
- SOPS age key accessible: `sops -d helm/postgresql-ha-values.sops.yaml` returns plaintext
- Bitnami Helm repo configured on the node: `helm repo add bitnami https://charts.bitnami.com/bitnami && helm repo update`
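
You can check the prerequisites in one go before running `make pg-deploy`. This is a sketch in bash, run from `railiance-platform/`. `require_cmds` and `pg_preflight` are hypothetical names. The `bitnami` repo check parses `helm repo list` output and assumes the repo name is in the first column. It does not re-run the cluster's `make smoke`.

```bash
# Hypothetical preflight helpers for the prerequisites above.

# require_cmds CMD...: fail, listing what is missing, unless every CMD is on PATH.
require_cmds() {
  local missing=() c
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || missing+=("$c")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing: ${missing[*]}" >&2
    return 1
  fi
}

# pg_preflight: tools present, values file decryptable, bitnami repo configured.
pg_preflight() {
  require_cmds kubectl helm sops make || return 1
  sops -d helm/postgresql-ha-values.sops.yaml >/dev/null \
    || { echo "cannot decrypt helm/postgresql-ha-values.sops.yaml" >&2; return 1; }
  helm repo list 2>/dev/null | awk '$1 == "bitnami" { found = 1 } END { exit !found }' \
    || { echo "bitnami Helm repo not configured" >&2; return 1; }
  echo "preflight OK"
}
```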

### Steps

```bash
# 1. Ensure the platform namespace exists (idempotent)
kubectl create namespace platform --dry-run=client -o yaml | kubectl apply -f -

# 2. Deploy (from railiance-platform/)
make pg-deploy

# 3. Verify
make pg-status
# Expected: 3 postgresql pods + 1 pgpool pod, all Running

# 4. Smoke test (from railiance-cluster/)
make smoke
```
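
You can assert the expected state from step 3 instead of eyeballing it. The bash sketch below reads `kubectl get pods -n platform --no-headers` output. Its assumptions:

- The pod-name patterns (`postgresql-ha-postgresql-N`, `postgresql-ha-pgpool-...`) assume the Helm release is named `postgresql-ha`, as the service and secret names in this doc suggest.
- `check_pg_pods` is a hypothetical name.
- It requires the READY column to be full (e.g. `1/1`) as well as STATUS `Running`, because a crash-looping pgpool can briefly show `Running`.

```bash
# Hypothetical check for step 3. Reads `kubectl get pods --no-headers` output
# (NAME READY STATUS RESTARTS AGE) on stdin; names assume release "postgresql-ha".
check_pg_pods() {
  awk '
    # ready(): STATUS is Running and READY is "n/n" with n > 0
    function ready() { split($2, r, "/"); return $3 == "Running" && r[1] == r[2] && r[2] > 0 }
    $1 ~ /^postgresql-ha-postgresql-[0-9]+$/ { pg++;   if (ready()) pg_ok++ }
    $1 ~ /^postgresql-ha-pgpool-/            { pool++; if (ready()) pool_ok++ }
    END {
      printf "postgresql: %d/%d ready, pgpool: %d/%d ready\n", pg_ok, pg, pool_ok, pool
      exit !(pg == 3 && pg_ok == 3 && pool >= 1 && pool_ok == pool)
    }'
}

# Usage: kubectl get pods -n platform --no-headers | check_pg_pods
```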

---

## Creating a new database for an app

```bash
# Connect via pgpool
kubectl exec -it -n platform \
  "$(kubectl get pod -n platform -l app.kubernetes.io/component=pgpool -o name | head -1)" \
  -- psql -U postgres

# Inside psql:
CREATE DATABASE myapp;
CREATE USER myapp WITH PASSWORD 'strong-password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp;
\c myapp
GRANT ALL ON SCHEMA public TO myapp;
\q
```

Add the user password to the app's own secrets (managed in the app's repo,
not here). The connection string for the app will be:

```
postgresql://myapp:strong-password@postgresql-ha-pgpool.platform.svc.cluster.local:5432/myapp
```
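
For scripted provisioning, you can generate the same statements and pipe them into `psql` non-interactively. A sketch in bash:

- `create_app_db_sql` is a hypothetical helper.
- It accepts only a plain lowercase identifier as the database/user name, so the name can be interpolated unquoted.
- It doubles any single quotes in the password so the SQL string literal stays valid.
- Password generation is left to the caller; `openssl rand -hex 24` gives a URL-safe value.
- Piping into `kubectl exec -i` assumes `psql` in the pgpool pod can authenticate as `postgres` without an interactive prompt.

```bash
# Hypothetical helper: print the provisioning SQL for one app database/user.
create_app_db_sql() {
  local app="$1" pass="$2" q="'"
  # Only plain lowercase identifiers, so they are safe to interpolate unquoted.
  if [[ ! "$app" =~ ^[a-z_][a-z0-9_]*$ ]]; then
    echo "invalid database name: $app" >&2
    return 1
  fi
  pass="${pass//$q/$q$q}"   # escape ' as '' inside the SQL string literal
  cat <<EOF
CREATE DATABASE ${app};
CREATE USER ${app} WITH PASSWORD '${pass}';
GRANT ALL PRIVILEGES ON DATABASE ${app} TO ${app};
\\c ${app}
GRANT ALL ON SCHEMA public TO ${app};
EOF
}

# Example (store APP_PASS in the app's own secrets as well):
#   APP_PASS="$(openssl rand -hex 24)"
#   create_app_db_sql myapp "$APP_PASS" | kubectl exec -i -n platform \
#     "$(kubectl get pod -n platform -l app.kubernetes.io/component=pgpool -o name | head -1)" \
#     -- psql -U postgres
```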

---

## Password rotation

1. Decrypt the values file (`sops -d -i helm/postgresql-ha-values.sops.yaml`) and update the password in it
2. Re-encrypt: `sops -e -i helm/postgresql-ha-values.sops.yaml`
3. Upgrade: `make pg-deploy`
4. Update the app's connection secret to match
5. Do a rolling restart of the app pods so they pick up the new connection
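
The steps above cover passwords held in the Helm values. An app user created with `CREATE USER` (as in the section on creating a database) has its password only inside PostgreSQL. Rotating it therefore means an `ALTER USER`, issued before step 4. A sketch in bash:

- `rotate_app_password_sql` is a hypothetical helper.
- It applies the same identifier restriction and quote escaping as plain SQL would require.
- The `kubectl exec -i` pipe assumes `psql` in the pgpool pod can authenticate as `postgres` without a prompt.

```bash
# Hypothetical helper: print SQL that rotates one app user's password.
rotate_app_password_sql() {
  local user="$1" pass="$2" q="'"
  if [[ ! "$user" =~ ^[a-z_][a-z0-9_]*$ ]]; then
    echo "invalid user name: $user" >&2
    return 1
  fi
  # Double any ' so the password stays a single SQL string literal.
  printf "ALTER USER %s WITH PASSWORD '%s';\n" "$user" "${pass//$q/$q$q}"
}

# Example (hex passwords need no URL-encoding in the connection string):
#   NEW_PASS="$(openssl rand -hex 24)"
#   rotate_app_password_sql myapp "$NEW_PASS" | kubectl exec -i -n platform \
#     "$(kubectl get pod -n platform -l app.kubernetes.io/component=pgpool -o name | head -1)" \
#     -- psql -U postgres
```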

---

## pgpool-password — critical note

The `postgresql.pgpoolPassword` value in the Helm chart maps to the
`pgpool-password` key in the `postgresql-ha-postgresql` Kubernetes Secret.
The pgpool container mounts this key at startup; if it is absent, pgpool
enters CrashLoopBackOff with **no log output**.

**This was the root cause of the 2026-03-10 incident (RAIL-BS-WP-0003).**

Always verify after `helm upgrade`:

```bash
kubectl get secret -n platform postgresql-ha-postgresql \
  -o jsonpath='{.data.pgpool-password}' | base64 -d && echo
# Must print a non-empty string
```
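
The same check can serve as a pass/fail guard in scripts, so that a missing key fails loudly instead of printing an empty line. The repo's Makefile also lists a `pg-pgpool-check` target that may already wrap this. The sketch below is in bash with hypothetical function names. It never prints the secret value itself.

```bash
# Hypothetical post-upgrade guard for the pgpool-password key.

# b64_nonempty VALUE: succeed only if VALUE base64-decodes to a non-empty string.
b64_nonempty() {
  local decoded
  decoded="$(printf '%s' "$1" | base64 -d 2>/dev/null)" || return 1
  [ -n "$decoded" ]
}

# verify_pgpool_password [NAMESPACE] [SECRET]
verify_pgpool_password() {
  local ns="${1:-platform}" secret="${2:-postgresql-ha-postgresql}" raw
  # A missing key yields an empty string, which b64_nonempty rejects.
  raw="$(kubectl get secret -n "$ns" "$secret" \
    -o jsonpath='{.data.pgpool-password}')" || return 1
  if b64_nonempty "$raw"; then
    echo "OK: $ns/$secret has a non-empty pgpool-password"
  else
    echo "FAIL: pgpool-password missing or empty in $ns/$secret" >&2
    return 1
  fi
}

# Example: make pg-deploy && verify_pgpool_password
```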

---

## HA failover test

Per Decision D3, any change to this service requires a passing failover test:

```bash
# From railiance-cluster/
make test-ha-failover GITEA_URL=https://<gitea-hostname>
```

The test kills the primary PostgreSQL pod and asserts that:

1. repmgr promotes a standby within 60s
2. All pods return to Running within 120s
3. pgpool returns to Running (this catches the missing `pgpool-password` key bug)
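
When debugging a failing run, you can reproduce the timing assertions by hand. The bash sketch below has these assumptions:

- `wait_until` and `pg_pods_ready` are hypothetical helpers.
- Readiness comes from each pod's `Ready` condition, not its phase, because a container in CrashLoopBackOff leaves its pod in phase `Running`.
- The label `app.kubernetes.io/instance=postgresql-ha` assumes the Bitnami chart's standard labels and the release name `postgresql-ha`.
- The default count of 4 Ready pods is the 3 postgresql + 1 pgpool pods this doc expects.
- It does not check which standby repmgr promoted; the real test covers that.

```bash
# Hypothetical helpers for a manual failover drill.

# wait_until SECONDS CMD [ARGS...]: re-run CMD until it succeeds or SECONDS elapse.
wait_until() {
  local timeout="$1" deadline
  shift
  deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "${WAIT_INTERVAL:-2}"
  done
}

# pg_pods_ready [COUNT]: succeed if at least COUNT release pods report Ready=True.
pg_pods_ready() {
  local want="${1:-4}" ready
  ready="$(kubectl get pods -n platform -l app.kubernetes.io/instance=postgresql-ha \
    -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | grep -c '^True$')"
  [ "$ready" -ge "$want" ]
}

# Example drill (assumes postgresql-ha-postgresql-0 is the current primary):
#   kubectl delete pod -n platform postgresql-ha-postgresql-0
#   wait_until 120 pg_pods_ready 4 && echo "all pods Ready again"
```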

---

## Backup

Platform backup (PostgreSQL logical dump) is handled by the `railiance-backup`
tool in this repo:

```bash
make backup
```

This produces an age-encrypted dump uploaded to Nextcloud. For cluster-level
backup (etcd, kubeconfig), see `railiance-cluster/`.

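---

## Scaling to 3 nodes (ThreePhoenix)

When Railiance02 and Railiance03 join the cluster:

1. Switch the StorageClass from `local-path` to `longhorn` in the values file
2. Change `postgresql.podAntiAffinityPreset` from `soft` to `hard`
3. Run `make pg-deploy` so that Helm spreads the pods across nodes. A StatefulSet's volume claim templates (including the StorageClass) cannot be changed in place, and existing `local-path` volumes pin their pods to the node holding the data. This step therefore likely requires recreating the StatefulSet and its PVCs and restoring data from a backup, rather than a plain rolling update.
4. Run `make test-ha-failover` (from `railiance-cluster/`, with `GITEA_URL` as above) to confirm HA is genuine, not just replicated on one node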
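
Before running the failover test, a quick check that the PostgreSQL pods really landed on separate nodes can catch a soft anti-affinity fallback. The bash sketch below has these assumptions:

- `pods_on_distinct_nodes` is a hypothetical name.
- The label `app.kubernetes.io/component=postgresql` in the usage example assumes the Bitnami chart's standard component label.
- Pods without an assigned node (empty lines) are ignored.

```bash
# Hypothetical check: reads one node name per PostgreSQL pod on stdin and
# fails unless there are at least 3 nodes and no node hosts two pods.
pods_on_distinct_nodes() {
  grep -v '^$' | sort | uniq -c | awk '
    { pods += $1; nodes++; if ($1 > 1) shared = 1 }
    END {
      printf "%d pods on %d nodes\n", pods, nodes
      exit (shared || nodes < 3)
    }'
}

# Example:
#   kubectl get pods -n platform -l app.kubernetes.io/component=postgresql \
#     -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | pods_on_distinct_nodes
```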