feat(platform): T01 — standalone PostgreSQL HA chart scaffold

Lays out the S3 platform layer foundation for RAIL-PL-WP-0001 T01:

- .sops.yaml: age encryption policy (shared key, *.sops.yaml pattern)
- .gitignore: prevents accidental commit of decrypted values files
- Makefile: pg-deploy, pg-status, pg-pgpool-check, valkey-deploy,
  valkey-status, backup targets with KUBECONFIG/HELM wiring
- helm/postgresql-ha-values.yaml.template: annotated values schema
  with CHANGEME_ placeholders; includes pgpool-password fix from
  RAIL-BS-WP-0003; notes on single-node vs ThreePhoenix scaling
- docs/postgresql-ha.md: connection strings, DB creation, password
  rotation, pgpool-password critical note, HA failover test ref,
  ThreePhoenix scaling path

To complete T01: fill in CHANGEME_ values, encrypt with sops -e -i,
then run make pg-deploy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 02:17:55 +01:00
parent b2d9b67783
commit 01d280120d
5 changed files with 290 additions and 1 deletions


@@ -0,0 +1,64 @@
# postgresql-ha-values.yaml.template
#
# Standalone PostgreSQL HA for railiance-platform (S3)
# Chart: bitnami/postgresql-ha version: ~16.x (pin to 16.2.2 or latest stable)
#
# Usage:
#   1. Copy this file:
#        cp helm/postgresql-ha-values.yaml.template helm/postgresql-ha-values.sops.yaml
#   2. Fill in all CHANGEME_ values (passwords, storage class, replica count)
#   3. Encrypt with SOPS (age key must be loaded):
#        sops -e -i helm/postgresql-ha-values.sops.yaml
#   4. Deploy:
#        make pg-deploy
#
# Never put real passwords in this .template file; it is committed in plaintext.
# Only the encrypted copy (helm/postgresql-ha-values.sops.yaml) gets committed with secrets.
#
# NOTE: pgpoolPassword MUST match postgresql.pgpoolPassword.
# This was the root cause of the 2026-03-10 incident (RAIL-BS-WP-0003).
# Do not omit it.
global:
  postgresql:
    username: postgres
    password: CHANGEME_postgres_password
    database: postgres
    repmgrUsername: repmgr
    repmgrPassword: CHANGEME_repmgr_password
postgresql:
  replicaCount: 3 # all 3 pods on 1 node for now; set anti-affinity when 3 nodes exist
  password: CHANGEME_postgres_password # must match global.postgresql.password
  postgresPassword: CHANGEME_postgres_superuser_password
  repmgrPassword: CHANGEME_repmgr_password # must match global.postgresql.repmgrPassword
  # pgpoolPassword is the sr_check_password used by pgpool to probe replicas.
  # It MUST be set here to survive helm upgrade (see incident RAIL-BS-WP-0003).
  pgpoolPassword: CHANGEME_pgpool_sr_check_password
  persistence:
    enabled: true
    storageClass: "" # use default StorageClass (local-path on single node; longhorn on 3 nodes)
    size: 10Gi
  podAntiAffinityPreset: "soft" # soft = prefer spread; switch to "hard" when 3 nodes exist
pgpool:
  replicaCount: 1
  adminPassword: CHANGEME_pgpool_admin_password
  # numInitChildren controls max connections; default 32 is fine for single node
  numInitChildren: 32
  maxPool: 4
  # Connection load balancing
  loadBalancingOnWrite: "transaction"
  readinessProbe:
    enabled: true
  livenessProbe:
    enabled: true
# Metrics (optional; enable when Prometheus is deployed)
metrics:
  enabled: false
  serviceMonitor:
    enabled: false
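
The usage steps in the template header (fill in CHANGEME_ values, keep duplicated passwords in sync, then encrypt) could be guarded by a small preflight check before `sops -e -i` is run. The sketch below is not part of this commit: the function names `check_placeholders` and `check_key_consistent` are hypothetical, and the key matching is a naive line-based scan that assumes one `key: value` pair per line and no spaces or `#` characters inside the values.

```shell
#!/bin/sh
# Hypothetical preflight helpers; names and behavior are illustrative only.

# Fail while any CHANGEME_ placeholder remains in the given values file.
check_placeholders() {
    values_file="$1"
    if [ ! -f "$values_file" ]; then
        echo "error: $values_file not found" >&2
        return 2
    fi
    # Print offending lines (with line numbers) to stderr.
    if grep -n 'CHANGEME_' "$values_file" >&2; then
        echo "error: unfilled CHANGEME_ placeholders in $values_file" >&2
        return 1
    fi
    return 0
}

# Fail when a key that appears more than once carries different values,
# e.g. global.postgresql.password vs postgresql.password.
# Naive line match: one "key: value" per line, no spaces or '#' in the value.
check_key_consistent() {
    key="$1"
    values_file="$2"
    distinct=$(sed -n "s/^[[:space:]]*${key}:[[:space:]]*\([^[:space:]#]*\).*/\1/p" \
        "$values_file" | sort -u | wc -l | tr -d '[:space:]')
    if [ "$distinct" -gt 1 ]; then
        echo "error: $key has $distinct different values in $values_file" >&2
        return 1
    fi
    return 0
}

# Example, run before `sops -e -i` (the file must still be plaintext):
#   f=helm/postgresql-ha-values.sops.yaml
#   check_placeholders "$f" &&
#     check_key_consistent password "$f" &&
#     check_key_consistent repmgrPassword "$f" &&
#     sops -e -i "$f" && make pg-deploy
```

Both checks only work on the plaintext copy; once the file is encrypted the values are no longer comparable, so the order in the example matters.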