From b2d9b67783a0c00a34810c3bb7c8d86576827888 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Wed, 11 Mar 2026 02:10:06 +0100
Subject: [PATCH] feat(workplan): RAIL-PL-WP-0001 S3 Platform Services Baseline

First workplan for railiance-platform (S3). Separates platform services
from the S2 cluster runtime layer per ADR-003:

- T01: standalone PostgreSQL HA Helm chart (platform namespace)
- T02: migrate Gitea to external DB, remove subchart coupling
- T03: relocate Gitea Helm values to railiance-apps (S5)
- T04: smoke + HA failover tests (D3 policy)
- T05: relocate railiance-backup tool from S2 to S3
- T06: standalone Valkey deployment (enables Zulip reuse)

Workstream: e4ec133c-7cb9-43c6-95f0-50d6591f13d7

Co-Authored-By: Claude Sonnet 4.6
---
 .../RAIL-PL-WP-0001-platform-baseline.md | 294 ++++++++++++++++++
 1 file changed, 294 insertions(+)
 create mode 100644 workplans/RAIL-PL-WP-0001-platform-baseline.md

diff --git a/workplans/RAIL-PL-WP-0001-platform-baseline.md b/workplans/RAIL-PL-WP-0001-platform-baseline.md
new file mode 100644
index 0000000..e53164c
--- /dev/null
+++ b/workplans/RAIL-PL-WP-0001-platform-baseline.md
@@ -0,0 +1,294 @@
+---
+id: RAIL-PL-WP-0001
+type: workplan
+title: "S3 Platform Services Baseline"
+domain: railiance
+repo: railiance-platform
+status: active
+owner: railiance
+topic_slug: railiance
+state_hub_workstream_id: "e4ec133c-7cb9-43c6-95f0-50d6591f13d7"
+created: "2026-03-11"
+updated: "2026-03-11"
+---
+
+# S3 Platform Services Baseline
+
+## Goal
+
+Establish `railiance-platform` (S3) as a reproducible, OAS-compliant platform
+layer. Currently, PostgreSQL HA and Valkey are deployed implicitly as subcharts
+of the Gitea Helm release in S2 (`railiance-cluster`). This violates the OAS
+boundary rule: S3 owns platform services; S2 owns only the cluster runtime.
+
+This workplan makes S3 a proper, standalone layer that S5 applications can
+depend on.
+
+## Scope
+
+| Concern | Current location | After this workplan |
+|---------|-----------------|---------------------|
+| PostgreSQL HA (repmgr + pgpool) | Gitea subchart in S2 | Standalone Helm release in S3 |
+| Valkey (Redis-compatible cache) | Gitea subchart in S2 | Standalone Helm release in S3 |
+| Gitea Helm values | `railiance-cluster/helm/` (S2) | `railiance-apps/helm/` (S5) |
+| `railiance-backup` tool | `railiance-cluster/tools/cmd/` (S2) | `railiance-platform/tools/cmd/` (S3) |
+
+## Pre-conditions
+
+- `railiance-cluster` converged: k3s running, Helm available (`make smoke` passes)
+- Active backup on Nextcloud before any migration step
+- SSH tunnel active for State Hub MCP access
+
+## Boundary rule reminder (ADR-003)
+
+> S3 owns shared platform services. S5 owns application deployments.
+> S2 must not manage database or cache services directly.
+
+---
+
+## Tasks
+
+### T01 — Codify standalone PostgreSQL HA Helm chart
+
+```task
+id: RAIL-PL-WP-0001-T01
+state_hub_task_id: f5af95bf-3d2d-458a-b695-666d4dc2dc99
+status: todo
+priority: high
+```
+
+Write `helm/postgresql-ha-values.sops.yaml` using the Bitnami `postgresql-ha`
+chart. Capture the values currently baked into the Gitea subchart, including
+the `pgpool-password` fix from RAIL-BS-WP-0003:
+
+```yaml
+# helm/postgresql-ha-values.sops.yaml (schema only — encrypt secrets with SOPS)
+postgresql:
+  replicaCount: 3
+  password: ENC[...]
+  postgresPassword: ENC[...]
+  repmgrPassword: ENC[...]
+pgpool:
+  replicaCount: 1
+  adminPassword: ENC[...]
+  # pgpool-password must be set — see RAIL-BS-WP-0003
+  pgpoolPassword: ENC[...]
+persistence:
+  enabled: true
+  size: 10Gi
+```
+
+Add `make` targets:
+
+```makefile
+pg-deploy: ## Deploy standalone PostgreSQL HA to cluster
+	helm upgrade --install postgresql-ha bitnami/postgresql-ha \
+		-f helm/postgresql-ha-values.yaml --namespace platform --create-namespace
+
+pg-status: ## Check PostgreSQL HA pod status
+	kubectl get pods -n platform -l app.kubernetes.io/name=postgresql-ha
+```
+
+Add `docs/postgresql-ha.md` documenting:
+- Chart version pinned
+- Connection string pattern for apps
+- How to create a new database for an app
+- How to rotate passwords (SOPS re-encrypt → helm upgrade)
+
+**Done when:** `make pg-deploy` succeeds; three postgresql-ha pods and pgpool
+are Running in the `platform` namespace; `make smoke` still passes.
+
+---
+
+### T02 — Migrate Gitea to use external PostgreSQL
+
+```task
+id: RAIL-PL-WP-0001-T02
+state_hub_task_id: c1073011-935a-4c1a-9a9f-dc4db1fc3e88
+status: todo
+priority: high
+```
+
+**Pre-condition:** T01 done and `postgresql-ha` healthy in `platform` namespace.
+
+Steps:
+1. **Backup first:** `make backup` in `railiance-cluster` — verify upload to Nextcloud.
+2. Create a `gitea` database and user on the new standalone cluster:
+   ```bash
+   # One -c per statement: CREATE DATABASE cannot run inside a multi-statement -c string.
+   kubectl exec -n platform postgresql-ha-postgresql-0 -- \
+     psql -U postgres -c "CREATE DATABASE gitea;" -c "CREATE USER gitea WITH PASSWORD '...';" \
+       -c "GRANT ALL ON DATABASE gitea TO gitea;"
+   ```
+3. Migrate data: `pg_dump` from old DB → `pg_restore` into new cluster.
+4. Update `helm/gitea-values.sops.yaml` to disable the subchart and point to
+   the external DB:
+   ```yaml
+   postgresql-ha:
+     enabled: false
+   externalDatabase:
+     host: postgresql-ha-pgpool.platform.svc.cluster.local
+     port: 5432
+     database: gitea
+     username: gitea
+     password: ENC[...]
+   ```
+5. `helm upgrade gitea` — verify Gitea operational.
+
+**Done when:** Gitea login works; `postgresql-ha` subchart pods are gone;
+all data intact.
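+
+Note on step 4: the upstream `gitea-charts/gitea` README describes external
+databases under `gitea.config.database` (rendered into `app.ini`) rather than
+an `externalDatabase` block. Depending on the chart version in use, the values
+may therefore need to look more like the sketch below. Host, port, and names
+are carried over from step 4; the key layout is an assumption to verify against
+the pinned chart before relying on it.
+
+```yaml
+# Sketch only: assumes the gitea-charts/gitea layout where gitea.config.* maps to app.ini.
+postgresql-ha:
+  enabled: false
+gitea:
+  config:
+    database:
+      DB_TYPE: postgres
+      HOST: postgresql-ha-pgpool.platform.svc.cluster.local:5432
+      NAME: gitea
+      USER: gitea
+      PASSWD: ENC[...]
+```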
+
+---
+
+### T03 — Relocate Gitea Helm deployment to railiance-apps (S5)
+
+```task
+id: RAIL-PL-WP-0001-T03
+state_hub_task_id: a820cd02-0f30-4488-abf1-897120f1fbc1
+status: todo
+priority: medium
+```
+
+**Pre-condition:** T02 done.
+
+```bash
+# In railiance-cluster (git mv cannot cross repositories, so copy, stage, then remove):
+cp helm/gitea-values.sops.yaml ../railiance-apps/helm/
+git -C ../railiance-apps add helm/gitea-values.sops.yaml
+git rm helm/gitea-values.sops.yaml
+```
+
+Add to `railiance-apps/Makefile`:
+```makefile
+gitea-deploy: ## Deploy / upgrade Gitea
+	helm upgrade --install gitea gitea-charts/gitea \
+		-f helm/gitea-values.yaml --namespace apps --create-namespace
+
+gitea-status: ## Check Gitea pod status
+	kubectl get pods -n apps -l app.kubernetes.io/name=gitea
+```
+
+Add tombstone in `railiance-cluster/helm/MOVED.md`:
+```
+gitea-values.sops.yaml moved to railiance-apps/helm/ (2026-03-11, RAIL-PL-WP-0001-T03)
+```
+
+Update `railiance-cluster/tests/smoke_kube.sh` and `tests/test_ha_failover.sh`
+to reference the new namespace (`apps`) if Gitea moves namespaces.
+
+**Done when:** `gitea-values.sops.yaml` is in `railiance-apps/helm/`; Gitea
+still operational; tombstone in place.
+
+---
+
+### T04 — Smoke + HA failover tests pass post-migration
+
+```task
+id: RAIL-PL-WP-0001-T04
+state_hub_task_id: 8df4774c-5251-4c85-be57-61b903be82ee
+status: todo
+priority: high
+```
+
+Per Decision D3: no HA deployment is complete until the failover test exits 0.
+
+```bash
+# From railiance-cluster:
+make smoke  # all assertions green
+make test-ha-failover GITEA_URL=https://
+```
+
+Expected: pgpool recovers cleanly after primary pod deletion; Gitea login
+remains available within the recovery window.
+
+**Done when:** both scripts exit 0 against the migrated live cluster.
+
+---
+
+### T05 — Relocate railiance-backup tool from S2 to S3
+
+```task
+id: RAIL-PL-WP-0001-T05
+state_hub_task_id: 231f6f8a-97a0-4aa0-8318-8e4361af67a3
+status: todo
+priority: medium
+```
+
+As flagged in RAIL-HO-WP-0003 T04: backup is a platform concern (S3), not a
+cluster runtime concern (S2).
+
+```bash
+# git mv cannot cross repositories: copy, stage in railiance-platform, then remove from railiance-cluster.
+mkdir -p ~/railiance-platform/tools/cmd
+cp -a ~/railiance-cluster/tools/cmd/railiance-backup ~/railiance-platform/tools/cmd/
+git -C ~/railiance-platform add tools/cmd/railiance-backup
+git -C ~/railiance-cluster rm -r tools/cmd/railiance-backup
+```
+
+Update `railiance-platform/Makefile`:
+```makefile
+backup: ## Backup platform services (PostgreSQL, Valkey) — age-encrypted
+	sudo tools/cmd/railiance-backup
+```
+
+Add tombstone stub in `railiance-cluster/tools/cmd/`:
+```bash
+# railiance-backup — MOVED to railiance-platform/tools/cmd/ (RAIL-PL-WP-0001-T05)
+```
+
+Update the `railiance-cluster/Makefile` `backup` target so it covers only the
+cluster runtime and points to the platform backup:
+```makefile
+backup: ## Backup cluster runtime — delegates platform backup to railiance-platform
+	@echo "Cluster backup (etcd + kubeconfig):"
+	sudo tools/cmd/railiance-backup-s2
+	@echo "Platform backup (PostgreSQL, Valkey): run 'make backup' in railiance-platform"
+```
+
+**Done when:** `make backup` in railiance-platform runs the platform backup;
+railiance-cluster backup still covers etcd/kubeconfig; no duplication.
+
+---
+
+### T06 — Codify Valkey as standalone S3 asset
+
+```task
+id: RAIL-PL-WP-0001-T06
+state_hub_task_id: 20899c81-2b24-4d70-ad02-f6a1383b6811
+status: todo
+priority: low
+```
+
+Valkey is currently deployed as a Gitea subchart. Once T02 removes the subchart
+bundle, Valkey must be deployed independently so Gitea and future apps
+(Zulip) can use it.
+
+Write `helm/valkey-values.sops.yaml`:
+```yaml
+# Bitnami Valkey chart
+auth:
+  enabled: true
+  password: ENC[...]
+replica:
+  replicaCount: 1
+persistence:
+  enabled: true
+  size: 2Gi
+```
+
+Add `make` targets:
+```makefile
+valkey-deploy: ## Deploy Valkey (Redis-compatible) to platform namespace
+	helm upgrade --install valkey bitnami/valkey \
+		-f helm/valkey-values.yaml --namespace platform
+
+valkey-status: ## Check Valkey pod status
+	kubectl get pods -n platform -l app.kubernetes.io/name=valkey
+```
+
+**Done when:** `make valkey-deploy` succeeds; Valkey Running in `platform`
+namespace; Gitea reconnected to new Valkey endpoint.
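+
+Reconnecting Gitea could look roughly like the values fragment below. This is
+a sketch under assumptions: the service name `valkey-primary` is what a Bitnami
+release named `valkey` would typically expose, the subchart toggle name varies
+by Gitea chart version, and the `cache`, `session`, and `queue` keys mirror the
+corresponding `app.ini` sections. Check all three against the deployed charts.
+
+```yaml
+# Hypothetical addition to gitea-values.sops.yaml; hostnames and toggles are assumptions.
+# Plaintext shape of each ENC value before SOPS encryption:
+#   redis://:<password>@valkey-primary.platform.svc.cluster.local:6379/0
+valkey-cluster:
+  enabled: false  # older chart versions name this subchart redis-cluster
+gitea:
+  config:
+    cache:
+      ADAPTER: redis
+      HOST: ENC[...]
+    session:
+      PROVIDER: redis
+      PROVIDER_CONFIG: ENC[...]
+    queue:
+      TYPE: redis
+      CONN_STR: ENC[...]
+```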
+
+---
+
+## References
+
+- OAS Standard: `canon/standards/orthogonal-architecture_v1.0.md`
+- ADR-003 (boundary rule): `railiance-infra/docs/adr/ADR-003-railiance-5repo-stack-architecture.md`
+- RAIL-BS-WP-0003 (pgpool fix): `railiance-cluster/workplans/RAIL-BS-WP-0003-pgpool-ha-failover-fix.md`
+- RAIL-HO-WP-0003 T04 (relocation table): `railiance-infra/workplans/RAIL-HO-WP-0003-5repo-stack-restructure.md`
+- Decision D3 (HA testing policy): `railiance-cluster/DECISIONS.md`
+- State Hub workstream: `e4ec133c-7cb9-43c6-95f0-50d6591f13d7`