# Incident: pgpool CrashLoopBackOff on PostgreSQL HA Failover

**Date:** 2026-03-10

**Severity:** High (Gitea write operations unavailable for ~4 hours)

**Component:** postgresql-ha subchart (Bitnami v16.2.2) via Gitea Helm chart v12.2.0

**Status:** Resolved; permanent fix pending `helm upgrade` with correct values

---

## Summary

A PostgreSQL HA failover caused the pgpool connection pooler to enter CrashLoopBackOff. Gitea logins and all write operations hung silently for approximately 4 hours. The root page continued to load (served from the Valkey cache), which masked the failure.

Root cause: the `pgpool-password` key was absent from the `gitea-postgresql-ha-postgresql` Kubernetes Secret. The Bitnami postgresql-ha subchart does not populate this key automatically. The key had been missing since initial deployment (2025-08-31), but the gap was never discovered because the pgpool pod had not restarted in 20 days.

---

## Timeline

| Time (UTC) | Event |
|---|---|
| ~09:45 | `postgresql-0`, `postgresql-2` pods restarted (repmgr failover) |
| ~09:45 | pgpool pod restarted → CrashLoopBackOff (silent, no logs) |
| ~11:00 | User noticed Gitea login hanging; home page still loading |
| ~13:00 | Root cause identified: missing `pgpool-password` secret key |
| ~13:10 | Secret patched manually; pgpool pod deleted and restarted |
| ~13:15 | Gitea fully operational |

---

## Root Cause

The Bitnami `pgpool` container startup script reads `/opt/bitnami/pgpool/secrets/pgpool-password`, which is mounted from the `gitea-postgresql-ha-postgresql` Secret via `subPath`. The Helm chart never wrote that key. The container exited immediately with no log output, so the failure looked like a silent crash.
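Because the container gives no log output when the key is absent, a pre-flight check of the Secret's keys can surface the gap before a restart does. The following is a minimal sketch, not part of the original incident tooling: the function name `missing_secret_keys` is hypothetical, and the list of required keys is an assumption based on the keys named in this report.

```bash
# Print every required key that is absent from a newline-separated key list.
#   $1      newline-separated keys present in the Secret
#   $2...   keys that must exist
missing_secret_keys() {
  local present="$1"
  shift
  local key
  for key in "$@"; do
    # -x: match the whole line, -F: literal string, -q: no output
    if ! printf '%s\n' "$present" | grep -qxF -- "$key"; then
      printf '%s\n' "$key"
    fi
  done
}

# Example against a live cluster (requires kubectl access; left commented out):
# present="$(kubectl get secret -n default gitea-postgresql-ha-postgresql \
#   -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}')"
# missing_secret_keys "$present" password postgres-password repmgr-password pgpool-password
```

With the key set observed during this incident, the function would print `pgpool-password`; empty output means every required key is present.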
---

## Evidence

```bash
# Placeholders in <angle brackets> must be replaced before running.

# Secret was missing pgpool-password; only these keys existed:
kubectl get secret -n default gitea-postgresql-ha-postgresql -o jsonpath='{.data}' | python3 -m json.tool
# password, postgres-password, repmgr-password (pgpool-password absent)

# pgpool had 824 back-off restarts over 173 minutes with no logs
kubectl logs -n default <pgpool-pod> --previous
# (empty output)

# Gitea process had zero TCP connections to PostgreSQL (5432 = 0x1538)
cat /proc/<gitea-pid>/net/tcp | grep 1538
# no results
# All connections were to Valkey (6379 = 0x18EB)
```

---

## Immediate Fix (manual; will regress on `helm upgrade`)

```bash
# Placeholders in <angle brackets> must be replaced before running.

# Base64 of the pgpool admin password
PASSWORD_B64=$(printf '%s' "<pgpool-admin-password>" | base64)

# Add the missing key to the existing Secret
kubectl patch secret -n default gitea-postgresql-ha-postgresql \
  --type='json' \
  -p="[{\"op\":\"add\",\"path\":\"/data/pgpool-password\",\"value\":\"${PASSWORD_B64}\"}]"

# Delete the pgpool pod so it restarts with the patched Secret
kubectl delete pod -n default <pgpool-pod>
```

---

## Permanent Fix

Add `pgpool.adminPassword` (nested under the `postgresql-ha` subchart values) to `helm/gitea-values.yaml` so the key is present after every `helm upgrade`:

```bash
helm upgrade gitea gitea/gitea --values helm/gitea-values.yaml
```

See `helm/gitea-values.yaml`, which must be filled with the actual pgpool password before running the upgrade.

---

## Decisions Triggered

**D3: HA and failover scenarios must be tested before a workplan is considered done.**

Any workplan deploying an HA component is not complete until:

1. A failover test script in `tests/` passes against a live cluster
2. Smoke tests check the connection pooler/proxy, not just backing nodes
3. All required Helm values are in the versioned values file

See: `DECISIONS.md` and `workplans/RAIL-BS-WP-0003-pgpool-ha-failover-fix.md`

---

## Recovery Checklist

If pgpool enters CrashLoopBackOff again:

```bash
# 1. Verify the secret key exists
kubectl get secret -n default gitea-postgresql-ha-postgresql \
  -o jsonpath='{.data.pgpool-password}'
# Empty output = key missing → apply the patch from "Immediate Fix"

# 2. After patching, force pgpool restart
#    (select by label; passing `-o name` output to `delete pod` fails
#    because the names already carry a `pod/` prefix)
kubectl delete pod -n default -l app.kubernetes.io/component=pgpool

# 3. Confirm Running state
kubectl get pods -n default | grep pgpool

# 4. Confirm Gitea can reach PostgreSQL
# In the Gitea pod:
nc -zv gitea-postgresql-ha-pgpool 5432
```
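Step 4 only proves that the port is reachable. To confirm that the Gitea process actually holds database connections, the `/proc/<pid>/net/tcp` check from the Evidence section can be repeated after recovery. The sketch below is an illustration under stated assumptions: the function name `count_established_to_port` is hypothetical, it reads only the IPv4 table format (IPv6 connections appear in `/proc/<pid>/net/tcp6`), and it relies on state code `01` meaning ESTABLISHED.

```bash
# Count ESTABLISHED connections to a remote port in /proc/net/tcp-style input.
#   $1     remote port in decimal (e.g. 5432)
#   stdin  contents of /proc/<pid>/net/tcp
count_established_to_port() {
  local hex_port
  # /proc/net/tcp stores ports as 4-digit uppercase hex (5432 -> 1538)
  hex_port="$(printf '%04X' "$1")"
  awk -v port="$hex_port" '
    NR > 1 {                       # skip the header row
      split($3, rem, ":")          # column 3 is rem_address as HEXIP:HEXPORT
      if (toupper(rem[2]) == port && $4 == "01") n++   # 01 = ESTABLISHED
    }
    END { print n + 0 }            # print 0 when nothing matched
  '
}

# Example inside the Gitea pod (replace <gitea-pid>; left commented out):
# count_established_to_port 5432 < /proc/<gitea-pid>/net/tcp
```

A result of `0` for port 5432 alongside a non-zero count for 6379 would reproduce the pattern seen during this incident, where Gitea was talking only to Valkey.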