---
id: RAIL-PL-WP-0001
type: workplan
title: "S3 Platform Services Baseline"
domain: railiance
repo: railiance-platform
status: superseded
owner: railiance
topic_slug: railiance
state_hub_workstream_id: "e4ec133c-7cb9-43c6-95f0-50d6591f13d7"
superseded_by: RAIL-HO-WP-0004
created: "2026-03-11"
updated: "2026-05-17"
---

# S3 Platform Services Baseline

> **SUPERSEDED** by `RAIL-HO-WP-0004` (2026-03-26).
> This workplan targeted Bitnami postgresql-ha, which is now stale.
> CloudNative PG (cnpg) is the deployed operator. See WP-0004 T03–T05 for
> the current cnpg-based platform baseline work.

## Goal

Establish `railiance-platform` (S3) as a reproducible, OAS-compliant platform layer. Currently, PostgreSQL HA and Valkey are deployed implicitly as subcharts of the Gitea Helm release in S2 (`railiance-cluster`). This violates the OAS boundary rule: S3 owns platform services; S2 owns only the cluster runtime. This workplan makes S3 a proper, standalone layer that S5 applications can depend on.

## Scope

| Concern | Current location | After this workplan |
|---------|------------------|---------------------|
| PostgreSQL HA (repmgr + pgpool) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Valkey (Redis-compatible cache) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Gitea Helm values | `railiance-cluster/helm/` (S2) | `railiance-apps/helm/` (S5) |
| `railiance-backup` tool | `railiance-cluster/tools/cmd/` (S2) | `railiance-platform/tools/cmd/` (S3) |

## Pre-conditions

- `railiance-cluster` converged: k3s running, Helm available (`make smoke` passes)
- Active backup on Nextcloud before any migration step
- SSH tunnel active for State Hub MCP access

## Boundary rule reminder (ADR-003)

> S3 owns shared platform services. S5 owns application deployments.
> S2 must not manage database or cache services directly.
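
A small preflight helper can check the pre-conditions before each migration step. This is a minimal sketch, not an existing script in either repo. The function names `require_cmds` and `preflight` and the `CLUSTER_DIR` variable are hypothetical. The Nextcloud backup and the State Hub SSH tunnel still need to be confirmed by hand.

```bash
# Hypothetical preflight helpers for the RAIL-PL-WP-0001 pre-conditions.
# Assumes railiance-cluster is checked out at $CLUSTER_DIR (default: ~/railiance-cluster).

# Succeed only if every named command is on PATH; report each missing one on stderr.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the tooling, then run the cluster smoke test the pre-conditions require.
# The Nextcloud backup and the State Hub SSH tunnel are NOT checked here.
preflight() {
  local cluster_dir="${CLUSTER_DIR:-$HOME/railiance-cluster}"
  require_cmds helm kubectl make || return 1
  if ! make -C "$cluster_dir" smoke; then
    echo "make smoke failed in $cluster_dir" >&2
    return 1
  fi
}
```

After the functions are sourced, `preflight` returns non-zero if `helm`, `kubectl` or `make` is missing, or if `make smoke` fails in the cluster checkout.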

---

## Tasks

### T01 — Codify standalone PostgreSQL HA Helm chart

```task
id: RAIL-PL-WP-0001-T01
state_hub_task_id: f5af95bf-3d2d-458a-b695-666d4dc2dc99
status: cancelled
priority: high
```

Write `helm/postgresql-ha-values.sops.yaml` using the Bitnami `postgresql-ha` chart. Capture the values currently baked into the Gitea subchart, including the `pgpool-password` fix from RAIL-BS-WP-0003:

```yaml
# helm/postgresql-ha-values.sops.yaml (schema only — encrypt secrets with SOPS)
postgresql:
  replicaCount: 3
  password: ENC[...]
  postgresPassword: ENC[...]
  repmgrPassword: ENC[...]
pgpool:
  replicaCount: 1
  adminPassword: ENC[...]
  # pgpool-password must be set — see RAIL-BS-WP-0003
  pgpoolPassword: ENC[...]
persistence:
  enabled: true
  size: 10Gi
```

Add `make` targets. Helm cannot read SOPS `ENC[...]` values itself, so `sops exec-file` decrypts the values file to a temporary path and substitutes it for `{}`. If decryption fails, Helm never runs:

```makefile
pg-deploy: ## Deploy standalone PostgreSQL HA to cluster
	sops exec-file helm/postgresql-ha-values.sops.yaml \
	  'helm upgrade --install postgresql-ha bitnami/postgresql-ha -f {} --namespace platform --create-namespace'

pg-status: ## Check PostgreSQL HA pod status
	kubectl get pods -n platform -l app.kubernetes.io/name=postgresql-ha
```

Add `docs/postgresql-ha.md` documenting:

- Chart version pinned
- Connection string pattern for apps
- How to create a new database for an app
- How to rotate passwords (SOPS re-encrypt → helm upgrade)

**Done when:** `make pg-deploy` succeeds; three PostgreSQL pods and one pgpool pod are Running in the `platform` namespace; `make smoke` still passes.

---

### T02 — Migrate Gitea to use external PostgreSQL

```task
id: RAIL-PL-WP-0001-T02
state_hub_task_id: c1073011-935a-4c1a-9a9f-dc4db1fc3e88
status: cancelled
priority: high
```

**Pre-condition:** T01 done and `postgresql-ha` healthy in the `platform` namespace.

Steps:

1. **Backup first:** run `make backup` in `railiance-cluster` and verify the upload to Nextcloud.
2. Create a `gitea` user and database on the new standalone cluster. Pass each statement as its own `-c`. psql runs a multi-statement `-c` string as a single transaction, and `CREATE DATABASE` cannot run inside a transaction block. Making `gitea` the database owner, rather than using `GRANT ALL ON DATABASE`, also gives it `CREATE` on the `public` schema, which PostgreSQL 15+ no longer grants by default. Run this against the current primary: `postgresql-ha-postgresql-0` is only the primary until the first failover.

   ```bash
   kubectl exec -n platform postgresql-ha-postgresql-0 -- \
     psql -U postgres \
       -c "CREATE USER gitea WITH PASSWORD '...';" \
       -c "CREATE DATABASE gitea OWNER gitea;"
   ```

3. Migrate data: `pg_dump` from the old subchart database, then `pg_restore` into the new cluster. Dump in custom format (`pg_dump -Fc`), because `pg_restore` cannot read plain-SQL dumps. Keep Gitea scaled down while the dump runs so no writes are lost.
4. Update `helm/gitea-values.sops.yaml` to disable the subchart and point Gitea at the external database. The Gitea chart reads the connection settings from `gitea.config.database`, which it renders into the `[database]` section of `app.ini`:

   ```yaml
   postgresql-ha:
     enabled: false
   gitea:
     config:
       database:
         DB_TYPE: postgres
         HOST: postgresql-ha-pgpool.platform.svc.cluster.local:5432
         NAME: gitea
         USER: gitea
         PASSWD: ENC[...]
   ```

5. Run `helm upgrade gitea` and verify that Gitea is operational.

**Done when:** Gitea login works; the `postgresql-ha` subchart pods are gone; all data is intact.

---

### T03 — Relocate Gitea Helm deployment to railiance-apps (S5)

```task
id: RAIL-PL-WP-0001-T03
state_hub_task_id: a820cd02-0f30-4488-abf1-897120f1fbc1
status: cancelled
priority: medium
```

**Pre-condition:** T02 done.

`git mv` cannot move a file into another repository. Copy the file across instead, then record the removal and the addition in each repo:

```bash
# In railiance-cluster:
mkdir -p ../railiance-apps/helm
cp helm/gitea-values.sops.yaml ../railiance-apps/helm/
git rm helm/gitea-values.sops.yaml
git -C ../railiance-apps add helm/gitea-values.sops.yaml
```

Add to `railiance-apps/Makefile`:

```makefile
gitea-deploy: ## Deploy / upgrade Gitea
	sops exec-file helm/gitea-values.sops.yaml \
	  'helm upgrade --install gitea gitea-charts/gitea -f {} --namespace apps --create-namespace'

gitea-status: ## Check Gitea pod status
	kubectl get pods -n apps -l app.kubernetes.io/name=gitea
```

Add a tombstone in `railiance-cluster/helm/MOVED.md`:

```
gitea-values.sops.yaml moved to railiance-apps/helm/ (2026-03-11, RAIL-PL-WP-0001-T03)
```

If this moves Gitea into the `apps` namespace, update `railiance-cluster/tests/smoke_kube.sh` and `tests/test_ha_failover.sh` to reference `apps`.

**Done when:** `gitea-values.sops.yaml` is in `railiance-apps/helm/`; Gitea is still operational; the tombstone is in place.
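
The file-level "Done when" checks for T03, and the matching ones for T05, can be scripted. This is a minimal sketch. It assumes sibling checkouts and that the T05 tombstone stub replaces `railiance-backup` at its old path. `verify_relocation` is a hypothetical helper, and it does not check that Gitea is still operational.

```bash
# Hypothetical helper for the T03/T05 "Done when" file checks.
# Usage: verify_relocation <old_path> <new_path> <tombstone_path>
# Returns 0 when the file exists at <new_path>, is gone from <old_path>
# (unless <old_path> is itself the tombstone stub, as assumed for T05),
# and the tombstone mentions the moved file's name.
verify_relocation() {
  local old="$1" new="$2" tombstone="$3" status=0
  local name
  name="$(basename "$new")"

  # The file must exist at its new home.
  if [ ! -f "$new" ]; then
    echo "missing at new location: $new" >&2
    status=1
  fi

  # The old copy must be gone, unless the stub now occupies that path.
  if [ -e "$old" ] && [ "$old" != "$tombstone" ]; then
    echo "still present at old location: $old" >&2
    status=1
  fi

  # The tombstone must exist and name the moved file.
  if ! grep -qF -- "$name" "$tombstone" 2>/dev/null; then
    echo "tombstone does not mention $name: $tombstone" >&2
    status=1
  fi

  return "$status"
}
```

For T03, a possible call is `verify_relocation ~/railiance-cluster/helm/gitea-values.sops.yaml ~/railiance-apps/helm/gitea-values.sops.yaml ~/railiance-cluster/helm/MOVED.md`. For T05, pass the old script path as both the first and the third argument.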

---

### T04 — Smoke + HA failover tests pass post-migration

```task
id: RAIL-PL-WP-0001-T04
state_hub_task_id: 8df4774c-5251-4c85-be57-61b903be82ee
status: cancelled
priority: high
```

Per Decision D3: no HA deployment is complete until the failover test exits 0.

```bash
# From railiance-cluster:
make smoke                               # all assertions green
make test-ha-failover GITEA_URL=https://
```

Expected: pgpool recovers cleanly after the primary pod is deleted, and Gitea login remains available within the recovery window.

**Done when:** both targets exit 0 against the migrated live cluster.

---

### T05 — Relocate railiance-backup tool from S2 to S3

```task
id: RAIL-PL-WP-0001-T05
state_hub_task_id: 231f6f8a-97a0-4aa0-8318-8e4361af67a3
status: cancelled
priority: medium
```

As flagged in RAIL-HO-WP-0003 T04: backup is a platform concern (S3), not a cluster runtime concern (S2). `git mv` cannot cross repository boundaries, so copy the script (keeping its mode bits) and record the change in each repo:

```bash
mkdir -p ~/railiance-platform/tools/cmd
cp -p ~/railiance-cluster/tools/cmd/railiance-backup ~/railiance-platform/tools/cmd/railiance-backup
git -C ~/railiance-cluster rm tools/cmd/railiance-backup
git -C ~/railiance-platform add tools/cmd/railiance-backup
```

Update `railiance-platform/Makefile`:

```makefile
backup: ## Backup platform services (PostgreSQL, Valkey) — age-encrypted
	sudo tools/cmd/railiance-backup
```

In `railiance-cluster/tools/cmd/`, commit an executable tombstone stub in place of the removed script. The stub exits non-zero, so a stale caller (for example a cron job) fails loudly instead of silently skipping the backup:

```bash
#!/usr/bin/env bash
# railiance-backup — MOVED to railiance-platform/tools/cmd/ (RAIL-PL-WP-0001-T05)
echo "railiance-backup has moved to railiance-platform/tools/cmd/; run 'make backup' there" >&2
exit 1
```

Update the `backup` target in `railiance-cluster/Makefile` so that it covers only the cluster runtime (via the S2-only `tools/cmd/railiance-backup-s2`) and points to railiance-platform for the platform backup:

```makefile
backup: ## Backup cluster runtime (etcd, kubeconfig); platform backup lives in railiance-platform
	@echo "Cluster backup (etcd + kubeconfig):"
	sudo tools/cmd/railiance-backup-s2
	@echo "Platform backup (PostgreSQL, Valkey): run 'make backup' in railiance-platform"
```

**Done when:** `make backup` in railiance-platform runs the platform backup; the railiance-cluster backup still covers etcd and kubeconfig; no backup logic is duplicated between the two repos.
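
T04 expects the Gitea login to stay reachable within a recovery window after the primary pod is deleted. The polling such a check needs can be sketched as a small POSIX shell helper. `wait_until` is a hypothetical name, this is not the contents of `tests/test_ha_failover.sh`, and it measures time only in whole seconds via `date +%s`.

```bash
# Hypothetical polling helper; not taken from test_ha_failover.sh.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Reruns <command> (its output is discarded) until it succeeds, then prints
# the elapsed whole seconds. Returns 1 if <timeout_seconds> passes first.
wait_until() {
  local timeout="$1" interval="$2" start now
  shift 2
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    now=$(date +%s)
    if [ $(( now - start )) -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
  now=$(date +%s)
  echo $(( now - start ))
}
```

In a failover run, this could follow the primary-pod deletion, for example `wait_until 120 5 curl -fsS -o /dev/null "$GITEA_URL/user/login"`. The 120-second limit is a placeholder for whatever recovery window Decision D3 sets.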

---

### T06 — Codify Valkey as standalone S3 asset

```task
id: RAIL-PL-WP-0001-T06
state_hub_task_id: 20899c81-2b24-4d70-ad02-f6a1383b6811
status: cancelled
priority: low
```

Valkey is currently deployed as a Gitea subchart. It must be deployed independently so that Gitea and future apps (Zulip) can use it. T02 disables only the `postgresql-ha` subchart, so Gitea's bundled Valkey has to be switched off separately once Gitea points at the standalone instance.

Write `helm/valkey-values.sops.yaml`:

```yaml
# Bitnami Valkey chart
auth:
  enabled: true
  password: ENC[...]
replica:
  replicaCount: 1
persistence:
  enabled: true
  size: 2Gi
```

Add `make` targets, again decrypting the SOPS file with `sops exec-file`:

```makefile
valkey-deploy: ## Deploy Valkey (Redis-compatible) to platform namespace
	sops exec-file helm/valkey-values.sops.yaml \
	  'helm upgrade --install valkey bitnami/valkey -f {} --namespace platform'

valkey-status: ## Check Valkey pod status
	kubectl get pods -n platform -l app.kubernetes.io/name=valkey
```

**Done when:** `make valkey-deploy` succeeds; Valkey is Running in the `platform` namespace; Gitea is reconnected to the new Valkey endpoint.

---

## References

- OAS Standard: `canon/standards/orthogonal-architecture_v1.0.md`
- ADR-003 (boundary rule): `railiance-infra/docs/adr/ADR-003-railiance-5repo-stack-architecture.md`
- RAIL-BS-WP-0003 (pgpool fix): `railiance-cluster/workplans/RAIL-BS-WP-0003-pgpool-ha-failover-fix.md`
- RAIL-HO-WP-0003 T04 (relocation table): `railiance-infra/workplans/RAIL-HO-WP-0003-5repo-stack-restructure.md`
- Decision D3 (HA testing policy): `railiance-cluster/DECISIONS.md`
- State Hub workstream: `e4ec133c-7cb9-43c6-95f0-50d6591f13d7`