railiance-platform/workplans/archived/RAIL-PL-WP-0001-platform-baseline.md
2026-05-19 01:40:42 +02:00

---
id: RAIL-PL-WP-0001
type: workplan
title: "S3 Platform Services Baseline"
domain: railiance
repo: railiance-platform
status: archived
owner: railiance
topic_slug: railiance
state_hub_workstream_id: "e4ec133c-7cb9-43c6-95f0-50d6591f13d7"
superseded_by: RAIL-HO-WP-0004
created: "2026-03-11"
updated: "2026-05-17"
---
# S3 Platform Services Baseline
> **SUPERSEDED** by `RAIL-HO-WP-0004` (2026-03-26).
> This workplan targeted Bitnami postgresql-ha, which is now stale.
> CloudNativePG (cnpg) is the deployed operator. See WP-0004 T03–T05 for
> the current cnpg-based platform baseline work.
## Goal
Establish `railiance-platform` (S3) as a reproducible, OAS-compliant platform
layer. Currently, PostgreSQL HA and Valkey are deployed implicitly as subcharts
of the Gitea Helm release in S2 (`railiance-cluster`). This violates the OAS
boundary rule: S3 owns platform services; S2 owns only the cluster runtime.
This workplan makes S3 a proper, standalone layer that S5 applications can
depend on.
## Scope
| Concern | Current location | After this workplan |
|---------|-----------------|---------------------|
| PostgreSQL HA (repmgr + pgpool) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Valkey (Redis-compatible cache) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Gitea Helm values | `railiance-cluster/helm/` (S2) | `railiance-apps/helm/` (S5) |
| `railiance-backup` tool | `railiance-cluster/tools/cmd/` (S2) | `railiance-platform/tools/cmd/` (S3) |
## Pre-conditions
- `railiance-cluster` converged: k3s running, Helm available (`make smoke` passes)
- A current backup uploaded to Nextcloud before any migration step
- SSH tunnel active for State Hub MCP access
## Boundary rule reminder (ADR-003)
> S3 owns shared platform services. S5 owns application deployments.
> S2 must not manage database or cache services directly.
---
## Tasks
### T01 — Codify standalone PostgreSQL HA Helm chart
```task
id: RAIL-PL-WP-0001-T01
state_hub_task_id: f5af95bf-3d2d-458a-b695-666d4dc2dc99
status: cancelled
priority: high
```
Write `helm/postgresql-ha-values.sops.yaml` using the Bitnami `postgresql-ha`
chart. Capture the values currently baked into the Gitea subchart, including
the `pgpool-password` fix from RAIL-BS-WP-0003:
```yaml
# helm/postgresql-ha-values.sops.yaml (schema only; encrypt secrets with SOPS)
postgresql:
  replicaCount: 3
  password: ENC[...]
  postgresPassword: ENC[...]
  repmgrPassword: ENC[...]
pgpool:
  replicaCount: 1
  adminPassword: ENC[...]
  # pgpool-password must be set (see RAIL-BS-WP-0003)
  pgpoolPassword: ENC[...]
persistence:
  enabled: true
  size: 10Gi
```
Add a `make` target:
```makefile
pg-deploy: ## Deploy standalone PostgreSQL HA to cluster
	helm upgrade --install postgresql-ha bitnami/postgresql-ha \
		-f helm/postgresql-ha-values.yaml --namespace platform --create-namespace

pg-status: ## Check PostgreSQL HA pod status
	kubectl get pods -n platform -l app.kubernetes.io/name=postgresql-ha
```
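
The `pg-deploy` target reads `helm/postgresql-ha-values.yaml`, whereas the values file written above is `helm/postgresql-ha-values.sops.yaml`, so a decryption step is presumably implied. A minimal sketch of one way to bridge the two, assuming the `sops` CLI is installed and has access to the decryption key. The function name `deploy_with_sops` is hypothetical and not part of the repository:

```bash
# Hypothetical helper: decrypt a SOPS values file and feed it to Helm on stdin,
# so that no plaintext secrets are written to disk.
deploy_with_sops() {
  local release="$1" chart="$2" values="$3" namespace="$4"
  local decrypted

  # Decrypt first and stop on failure, so Helm never runs with empty values.
  decrypted="$(sops --decrypt "$values")" || return 1

  # "-f -" tells Helm to read the values document from stdin.
  printf '%s\n' "$decrypted" | helm upgrade --install "$release" "$chart" \
    -f - --namespace "$namespace" --create-namespace
}
```

Usage would look like `deploy_with_sops postgresql-ha bitnami/postgresql-ha helm/postgresql-ha-values.sops.yaml platform`. Decrypting into a variable before calling Helm is a deliberate choice: in a plain pipeline, a failed decryption would still let Helm start with empty input.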
Add `docs/postgresql-ha.md` documenting:
- Chart version pinned
- Connection string pattern for apps
- How to create a new database for an app
- How to rotate passwords (SOPS re-encrypt → helm upgrade)

**Done when:** `make pg-deploy` succeeds; three postgresql-ha pods and one
pgpool pod are Running in the `platform` namespace; `make smoke` still passes.
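
The pod-count part of this criterion could be checked mechanically. A minimal sketch, assuming the label selector from `pg-status` matches both the PostgreSQL pods and the pgpool pod (so four pods in total); `count_running` and `assert_running` are hypothetical helper names:

```bash
# Count pods whose STATUS column is "Running".
# Expects `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
count_running() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Hypothetical helper: succeed only if exactly <expected> matching pods are Running.
assert_running() {
  local namespace="$1" selector="$2" expected="$3" running
  running="$(kubectl get pods -n "$namespace" -l "$selector" --no-headers | count_running)"
  if [ "$running" -eq "$expected" ]; then
    echo "ok: $running/$expected pods Running"
  else
    echo "fail: $running/$expected pods Running" >&2
    return 1
  fi
}
```

For example, `assert_running platform app.kubernetes.io/name=postgresql-ha 4`. Note that a `Running` status does not imply the pod is Ready, so this complements `make smoke` rather than replacing it.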
---
### T02 — Migrate Gitea to use external PostgreSQL
```task
id: RAIL-PL-WP-0001-T02
state_hub_task_id: c1073011-935a-4c1a-9a9f-dc4db1fc3e88
status: cancelled
priority: high
```
**Pre-condition:** T01 done and `postgresql-ha` healthy in `platform` namespace.

Steps:

1. **Backup first:** `make backup` in `railiance-cluster` — verify upload to Nextcloud.
2. Create a `gitea` database and user on the new standalone cluster:
   ```bash
   # One -c per statement: CREATE DATABASE fails inside a multi-statement -c string.
   kubectl exec -n platform postgresql-ha-postgresql-0 -- psql -U postgres \
     -c "CREATE DATABASE gitea;" -c "CREATE USER gitea WITH PASSWORD '...';" \
     -c "GRANT ALL ON DATABASE gitea TO gitea;"
   ```
3. Migrate data: `pg_dump` from old DB → `pg_restore` into new cluster.
4. Update `helm/gitea-values.sops.yaml` to disable the subchart and point to
   the external DB:
   ```yaml
   postgresql-ha:
     enabled: false
   externalDatabase:
     host: postgresql-ha-pgpool.platform.svc.cluster.local
     port: 5432
     database: gitea
     username: gitea
     password: ENC[...]
   ```
5. Run `helm upgrade gitea` and verify that Gitea is operational.

**Done when:** Gitea login works; `postgresql-ha` subchart pods are gone;
all data intact.
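
Step 3 above could be sketched as follows. The old pod name and namespace are passed in as arguments because this workplan does not name them. `migrate_gitea_db` is a hypothetical helper, and it assumes that local access as `postgres` works inside both pods, as in step 2:

```bash
# Hypothetical helper: copy the gitea database from the old subchart cluster
# into the standalone cluster created in T01.
migrate_gitea_db() {
  local old_ns="$1" old_pod="$2" dump_file="${3:-gitea.dump}"

  # Custom-format dump (-Fc) so pg_restore can read it.
  kubectl exec -n "$old_ns" "$old_pod" -- \
    pg_dump -U postgres -Fc gitea > "$dump_file" || return 1

  # -i streams the dump over stdin; --no-owner with --role=gitea makes the
  # restored objects belong to the gitea role created in step 2.
  kubectl exec -i -n platform postgresql-ha-postgresql-0 -- \
    pg_restore -U postgres -d gitea --no-owner --role=gitea < "$dump_file"
}
```

Stopping Gitea for the duration of the dump and restore avoids losing writes made after the dump was taken.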
---
### T03 — Relocate Gitea Helm deployment to railiance-apps (S5)
```task
id: RAIL-PL-WP-0001-T03
state_hub_task_id: a820cd02-0f30-4488-abf1-897120f1fbc1
status: cancelled
priority: medium
```
**Pre-condition:** T02 done.
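```bash
# In railiance-cluster (git mv cannot cross repository boundaries):
cp helm/gitea-values.sops.yaml ../railiance-apps/helm/
git rm helm/gitea-values.sops.yaml
git -C ../railiance-apps add helm/gitea-values.sops.yaml
```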
Add to `railiance-apps/Makefile`:
```makefile
gitea-deploy: ## Deploy / upgrade Gitea
	helm upgrade --install gitea gitea-charts/gitea \
		-f helm/gitea-values.yaml --namespace apps --create-namespace

gitea-status: ## Check Gitea pod status
	kubectl get pods -n apps -l app.kubernetes.io/name=gitea
```
Add tombstone in `railiance-cluster/helm/MOVED.md`:
```
gitea-values.sops.yaml moved to railiance-apps/helm/ (2026-03-11, RAIL-PL-WP-0001-T03)
```
Update `railiance-cluster/tests/smoke_kube.sh` and `tests/test_ha_failover.sh`
to reference the new namespace (`apps`) if Gitea moves namespaces.
**Done when:** `gitea-values.sops.yaml` is in `railiance-apps/helm/`; Gitea
still operational; tombstone in place.
---
### T04 — Smoke + HA failover tests pass post-migration
```task
id: RAIL-PL-WP-0001-T04
state_hub_task_id: 8df4774c-5251-4c85-be57-61b903be82ee
status: cancelled
priority: high
```
Per Decision D3: no HA deployment is complete until the failover test exits 0.
```bash
# From railiance-cluster:
make smoke # all assertions green
make test-ha-failover GITEA_URL=https://<gitea-hostname>
```
Expected: pgpool recovers cleanly after primary pod deletion; Gitea login
remains available within the recovery window.
**Done when:** both scripts exit 0 against the migrated live cluster.
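
The "recovery window" expectation can be illustrated with a small polling loop. This is a sketch under stated assumptions, not the content of `tests/test_ha_failover.sh`: the function name `wait_until_healthy`, the default window of 120 seconds, and the 5-second interval are all hypothetical, since the source does not define the window.

```bash
# Hypothetical helper: poll a URL until it answers successfully or the
# recovery window (in seconds) is exhausted.
wait_until_healthy() {
  local url="$1" window="${2:-120}" interval="${3:-5}" elapsed=0
  while (( elapsed <= window )); do
    # -f turns HTTP errors (>= 400) into a non-zero exit; --max-time bounds each probe.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  echo "not healthy within ${window}s" >&2
  return 1
}
```

A failover drill might delete the current primary pod with `kubectl delete pod -n platform <primary-pod>` and then call `wait_until_healthy "$GITEA_URL" 120 5`. The reported time is the sum of sleep intervals and does not include time spent inside each probe, so it is a lower bound on the real recovery time.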
---
### T05 — Relocate railiance-backup tool from S2 to S3
```task
id: RAIL-PL-WP-0001-T05
state_hub_task_id: 231f6f8a-97a0-4aa0-8318-8e4361af67a3
status: cancelled
priority: medium
```
As flagged in RAIL-HO-WP-0003 T04: backup is a platform concern (S3), not a
cluster runtime concern (S2).
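```bash
# git mv cannot cross repository boundaries: copy, then remove from S2.
mkdir -p ~/railiance-platform/tools/cmd
cp -a ~/railiance-cluster/tools/cmd/railiance-backup ~/railiance-platform/tools/cmd/
git -C ~/railiance-cluster rm -r tools/cmd/railiance-backup
git -C ~/railiance-platform add tools/cmd/railiance-backup
```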
Update `railiance-platform/Makefile`:
```makefile
backup: ## Backup platform services (PostgreSQL, Valkey), age-encrypted
	sudo tools/cmd/railiance-backup
```
Add tombstone stub in `railiance-cluster/tools/cmd/`:
```bash
# railiance-backup — MOVED to railiance-platform/tools/cmd/ (RAIL-PL-WP-0001-T05)
```
Update `railiance-cluster/Makefile` `backup` target to delegate:
```makefile
backup: ## Backup cluster runtime; delegates platform backup to railiance-platform
	@echo "Cluster backup (etcd + kubeconfig):"
	sudo tools/cmd/railiance-backup-s2
	@echo "Platform backup (PostgreSQL, Valkey): run 'make backup' in railiance-platform"
```
**Done when:** `make backup` in railiance-platform runs the platform backup;
railiance-cluster backup still covers etcd/kubeconfig; no duplication.
---
### T06 — Codify Valkey as standalone S3 asset
```task
id: RAIL-PL-WP-0001-T06
state_hub_task_id: 20899c81-2b24-4d70-ad02-f6a1383b6811
status: cancelled
priority: low
```
Valkey is currently deployed as a Gitea subchart. Once T02 removes the subchart
bundle, Valkey must be deployed independently so Gitea and future apps
(Zulip) can use it.
Write `helm/valkey-values.sops.yaml`:
```yaml
# Bitnami Valkey chart
auth:
  enabled: true
  password: ENC[...]
replica:
  replicaCount: 1
  persistence:
    enabled: true
    size: 2Gi
```
Add `make` targets:
```makefile
valkey-deploy: ## Deploy Valkey (Redis-compatible) to platform namespace
helm upgrade --install valkey bitnami/valkey \
-f helm/valkey-values.yaml --namespace platform
valkey-status: ## Check Valkey pod status
kubectl get pods -n platform -l app.kubernetes.io/name=valkey
```
**Done when:** `make valkey-deploy` succeeds; Valkey Running in `platform`
namespace; Gitea reconnected to new Valkey endpoint.
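
A quick way to confirm that the new Valkey endpoint answers authenticated requests is a `PING` from inside the pod. A minimal sketch: `valkey_ping` is a hypothetical helper, and the default pod name `valkey-primary-0` is an assumption about the chart's naming that should be checked against `make valkey-status`:

```bash
# Hypothetical helper: PING Valkey from inside a pod and expect PONG.
# The default pod name "valkey-primary-0" is an assumption, not taken from the source.
valkey_ping() {
  local password="$1" pod="${2:-valkey-primary-0}" reply
  reply="$(kubectl exec -n platform "$pod" -- \
    valkey-cli --no-auth-warning -a "$password" ping)" || return 1
  # An authenticated, healthy server answers PING with PONG.
  [ "$reply" = "PONG" ]
}
```

This only shows that Valkey itself is reachable with the configured password. Whether Gitea has reconnected still has to be verified from the Gitea side. Passing the password with `-a` exposes it in the pod's process list for the duration of the call, which may or may not be acceptable here.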
---
## References
- OAS Standard: `canon/standards/orthogonal-architecture_v1.0.md`
- ADR-003 (boundary rule): `railiance-infra/docs/adr/ADR-003-railiance-5repo-stack-architecture.md`
- RAIL-BS-WP-0003 (pgpool fix): `railiance-cluster/workplans/RAIL-BS-WP-0003-pgpool-ha-failover-fix.md`
- RAIL-HO-WP-0003 T04 (relocation table): `railiance-infra/workplans/RAIL-HO-WP-0003-5repo-stack-restructure.md`
- Decision D3 (HA testing policy): `railiance-cluster/DECISIONS.md`
- State Hub workstream: `e4ec133c-7cb9-43c6-95f0-50d6591f13d7`