feat(workplan): RAIL-PL-WP-0001 S3 Platform Services Baseline
First workplan for railiance-platform (S3). Separates platform services from the S2 cluster runtime layer per ADR-003:

- T01: standalone PostgreSQL HA Helm chart (platform namespace)
- T02: migrate Gitea to external DB, remove subchart coupling
- T03: relocate Gitea Helm values to railiance-apps (S5)
- T04: smoke + HA failover tests (D3 policy)
- T05: relocate railiance-backup tool from S2 to S3
- T06: standalone Valkey deployment (enables Zulip reuse)

Workstream: e4ec133c-7cb9-43c6-95f0-50d6591f13d7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
workplans/RAIL-PL-WP-0001-platform-baseline.md (new file, 294 lines)
---
id: RAIL-PL-WP-0001
type: workplan
title: "S3 Platform Services Baseline"
domain: railiance
repo: railiance-platform
status: active
owner: railiance
topic_slug: railiance
state_hub_workstream_id: "e4ec133c-7cb9-43c6-95f0-50d6591f13d7"
created: "2026-03-11"
updated: "2026-03-11"
---

# S3 Platform Services Baseline

## Goal

Establish `railiance-platform` (S3) as a reproducible, OAS-compliant platform
layer. Currently, PostgreSQL HA and Valkey are deployed implicitly as subcharts
of the Gitea Helm release in S2 (`railiance-cluster`). This violates the OAS
boundary rule: S3 owns platform services; S2 owns only the cluster runtime.

This workplan makes S3 a proper, standalone layer that S5 applications can
depend on.

## Scope

| Concern | Current location | After this workplan |
|---------|-----------------|---------------------|
| PostgreSQL HA (repmgr + pgpool) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Valkey (Redis-compatible cache) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Gitea Helm values | `railiance-cluster/helm/` (S2) | `railiance-apps/helm/` (S5) |
| `railiance-backup` tool | `railiance-cluster/tools/cmd/` (S2) | `railiance-platform/tools/cmd/` (S3) |

## Pre-conditions

- `railiance-cluster` converged: k3s running, Helm available (`make smoke` passes)
- A fresh backup uploaded to Nextcloud and verified before any migration step
- SSH tunnel active for State Hub MCP access

## Boundary rule reminder (ADR-003)

> S3 owns shared platform services. S5 owns application deployments.
> S2 must not manage database or cache services directly.

---

## Tasks

### T01 — Codify standalone PostgreSQL HA Helm chart

```task
id: RAIL-PL-WP-0001-T01
state_hub_task_id: f5af95bf-3d2d-458a-b695-666d4dc2dc99
status: todo
priority: high
```

Write `helm/postgresql-ha-values.sops.yaml` using the Bitnami `postgresql-ha`
chart. Capture the values currently baked into the Gitea subchart, including
the `pgpool-password` fix from RAIL-BS-WP-0003:

```yaml
# helm/postgresql-ha-values.sops.yaml (schema only — encrypt secrets with SOPS)
postgresql:
  replicaCount: 3
  password: ENC[...]
  postgresPassword: ENC[...]
  repmgrPassword: ENC[...]
pgpool:
  replicaCount: 1
  adminPassword: ENC[...]
  # pgpool-password must be set — see RAIL-BS-WP-0003
  pgpoolPassword: ENC[...]
persistence:
  enabled: true
  size: 10Gi
```

Add `make` targets. The values file is SOPS-encrypted, so `pg-deploy` decrypts it
and passes it to Helm on stdin (`-f -`), skipping Helm entirely if decryption fails:

```makefile
pg-deploy: ## Deploy standalone PostgreSQL HA to cluster
	values="$$(sops -d helm/postgresql-ha-values.sops.yaml)" && \
	  printf '%s\n' "$$values" | helm upgrade --install postgresql-ha bitnami/postgresql-ha \
	  -f - --namespace platform --create-namespace

pg-status: ## Check PostgreSQL HA pod status
	kubectl get pods -n platform -l app.kubernetes.io/name=postgresql-ha
```
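
The T01 **Done when** check (three PostgreSQL pods plus pgpool Running) can be scripted on top of the `pg-status` output. A minimal sketch, assuming the Bitnami chart's default pod names for a release called `postgresql-ha` (`postgresql-ha-postgresql-N`, `postgresql-ha-pgpool-...`); `pg_ready_check` is a hypothetical helper, not something the chart or the repo provides:

```bash
# pg_ready_check (hypothetical helper): reads `kubectl get pods --no-headers`
# output on stdin (columns: NAME READY STATUS RESTARTS AGE) and succeeds only
# if at least 3 postgresql pods and 1 pgpool pod have STATUS Running.
# Name prefixes assume the Bitnami defaults for a release named postgresql-ha.
pg_ready_check() {
  awk '
    $3 == "Running" && $1 ~ /^postgresql-ha-postgresql-/ { pg++ }
    $3 == "Running" && $1 ~ /^postgresql-ha-pgpool-/     { pool++ }
    END {
      printf "postgresql running: %d, pgpool running: %d\n", pg, pool
      exit !(pg >= 3 && pool >= 1)
    }'
}
```

Usage: `kubectl get pods -n platform -l app.kubernetes.io/name=postgresql-ha --no-headers | pg_ready_check`. Keeping the helper a pure text filter means it can be tested without a cluster.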
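
Add `docs/postgresql-ha.md` documenting:
- The pinned chart version
- Connection string pattern for apps
- How to create a new database for an app
- How to rotate passwords (SOPS re-encrypt → helm upgrade; on an already-initialized cluster the chart does not change passwords already stored in PostgreSQL, so also run `ALTER USER ... PASSWORD` for the affected roles)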
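
For the connection-string item, apps connect through the pgpool Service rather than to a PostgreSQL pod directly, since pgpool routes writes to whichever node is currently primary. A sketch, assuming one database and one role per app, both named after the app (as T02 does for `gitea`), and a password with no URL-reserved characters; `pg_app_url` is a hypothetical helper:

```bash
# pg_app_url APP PASSWORD (hypothetical helper) -> libpq connection URI
# for APP's own database, routed through the pgpool Service.
# Assumes database name == role name == APP and a URL-safe PASSWORD
# (characters such as @ : / would need percent-encoding).
pg_app_url() {
  local app="$1" password="$2"
  local host="postgresql-ha-pgpool.platform.svc.cluster.local"
  printf 'postgresql://%s:%s@%s:5432/%s\n' "$app" "$password" "$host" "$app"
}
```

For example, `pg_app_url gitea s3cret` prints `postgresql://gitea:s3cret@postgresql-ha-pgpool.platform.svc.cluster.local:5432/gitea`.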

**Done when:** `make pg-deploy` succeeds; three postgresql-ha pods + pgpool
Running in the `platform` namespace; `make smoke` still passes.

---

### T02 — Migrate Gitea to use external PostgreSQL

```task
id: RAIL-PL-WP-0001-T02
state_hub_task_id: c1073011-935a-4c1a-9a9f-dc4db1fc3e88
status: todo
priority: high
```

**Pre-condition:** T01 done and `postgresql-ha` healthy in `platform` namespace.

Steps:
1. **Backup first:** `make backup` in `railiance-cluster` — verify upload to Nextcloud.
2. Create a `gitea` role and database on the new standalone cluster, on the current
   primary (right after T01 that is `postgresql-ha-postgresql-0`). Making `gitea` the
   database owner, rather than using `GRANT ALL ON DATABASE`, also lets it create
   tables in the `public` schema on PostgreSQL 15+:
   ```bash
   # One statement per -c: CREATE DATABASE cannot run inside the implicit
   # transaction that a single multi-statement -c string creates.
   kubectl exec -n platform postgresql-ha-postgresql-0 -- psql -U postgres \
     -c "CREATE USER gitea WITH PASSWORD '...';" -c "CREATE DATABASE gitea OWNER gitea;"
   ```
3. Migrate data: with Gitea scaled to zero so nothing writes mid-dump, `pg_dump`
   the old subchart DB → `pg_restore` into the new cluster.
4. Update `helm/gitea-values.sops.yaml` to disable the subchart and point Gitea at
   the external DB. The Gitea chart reads its database settings from
   `gitea.config.database` (the `[database]` section of `app.ini`):
   ```yaml
   postgresql-ha:
     enabled: false
   gitea:
     config:
       database:
         DB_TYPE: postgres
         HOST: postgresql-ha-pgpool.platform.svc.cluster.local:5432
         NAME: gitea
         USER: gitea
         PASSWD: ENC[...]
   ```
5. Run `helm upgrade` on the `gitea` release with the updated (SOPS-decrypted) values and verify Gitea is operational.
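
Step 3 can be sketched as a single streamed dump and restore, so no plaintext dump touches local disk. This assumes the old database is pod `gitea-postgresql-ha-postgresql-0` in namespace `gitea`, that the Gitea Deployment is named `gitea`, and that the `postgres` superuser can authenticate inside both pods without a prompt; all of these are placeholders to confirm against the live cluster, and `migrate_gitea_db` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch of T02 step 3 (hypothetical helper). Namespace, pod and Deployment
# names below are assumptions; confirm with `kubectl get pods,deploy -A`.
set -euo pipefail

OLD_NS="${OLD_NS:-gitea}"                               # assumed
OLD_POD="${OLD_POD:-gitea-postgresql-ha-postgresql-0}"  # assumed
NEW_NS="${NEW_NS:-platform}"
NEW_POD="${NEW_POD:-postgresql-ha-postgresql-0}"        # primary right after T01

migrate_gitea_db() {
  # Freeze writes so the dump is consistent; re-check the replica count after step 5.
  kubectl scale deployment gitea -n "$OLD_NS" --replicas=0
  # Custom-format dump on stdout, restored from stdin in the new cluster.
  # --no-owner plus --role=gitea makes the gitea role own the restored objects.
  kubectl exec -n "$OLD_NS" "$OLD_POD" -- pg_dump -U postgres -Fc gitea \
    | kubectl exec -i -n "$NEW_NS" "$NEW_POD" -- \
        pg_restore -U postgres -d gitea --no-owner --role=gitea
}
```

Run `migrate_gitea_db` once, after step 2 and before step 4. `pg_restore` reports errors but keeps going by default, so with `pipefail` a non-zero exit means at least one object failed and the output needs reviewing before Gitea is switched over.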

**Done when:** Gitea login works; `postgresql-ha` subchart pods are gone;
all data intact.

---

### T03 — Relocate Gitea Helm deployment to railiance-apps (S5)

```task
id: RAIL-PL-WP-0001-T03
state_hub_task_id: a820cd02-0f30-4488-abf1-897120f1fbc1
status: todo
priority: medium
```

**Pre-condition:** T02 done.

```bash
# In railiance-cluster. git mv cannot cross repositories: copy, add there, remove here.
cp helm/gitea-values.sops.yaml ../railiance-apps/helm/
git -C ../railiance-apps add helm/gitea-values.sops.yaml
git rm helm/gitea-values.sops.yaml
```

Add to `railiance-apps/Makefile` (the values file is SOPS-encrypted, so decrypt it and pass it to Helm on stdin):
```makefile
gitea-deploy: ## Deploy / upgrade Gitea
	values="$$(sops -d helm/gitea-values.sops.yaml)" && \
	  printf '%s\n' "$$values" | helm upgrade --install gitea gitea-charts/gitea \
	  -f - --namespace apps --create-namespace

gitea-status: ## Check Gitea pod status
	kubectl get pods -n apps -l app.kubernetes.io/name=gitea
```

Add tombstone in `railiance-cluster/helm/MOVED.md`:
```
gitea-values.sops.yaml moved to railiance-apps/helm/ (2026-03-11, RAIL-PL-WP-0001-T03)
```
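
The file-level parts of the T03 criteria (values file moved, tombstone present) are easy to check mechanically. A sketch with a hypothetical `check_t03` helper that takes the two repository paths; it does not check that Gitea itself is still operational:

```bash
# check_t03 CLUSTER_REPO APPS_REPO (hypothetical helper): succeeds only if the
# values file now lives in railiance-apps, is gone from railiance-cluster,
# and the MOVED.md tombstone mentions this task.
check_t03() {
  local cluster="$1" apps="$2"
  local f="helm/gitea-values.sops.yaml"
  [ -f "$apps/$f" ] || { echo "missing: $apps/$f" >&2; return 1; }
  [ ! -e "$cluster/$f" ] || { echo "still present: $cluster/$f" >&2; return 1; }
  grep -q 'RAIL-PL-WP-0001-T03' "$cluster/helm/MOVED.md" 2>/dev/null \
    || { echo "tombstone missing: $cluster/helm/MOVED.md" >&2; return 1; }
}
```

Usage: `check_t03 ~/railiance-cluster ~/railiance-apps`.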

Update `railiance-cluster/tests/smoke_kube.sh` and `tests/test_ha_failover.sh`
to reference the new namespace (`apps`) if Gitea moves namespaces. Helm releases
are namespace-scoped, so deploying to `apps` installs a new release rather than
moving the existing one.

**Done when:** `gitea-values.sops.yaml` is in `railiance-apps/helm/`; Gitea
still operational; tombstone in place.

---

### T04 — Smoke + HA failover tests pass post-migration

```task
id: RAIL-PL-WP-0001-T04
state_hub_task_id: 8df4774c-5251-4c85-be57-61b903be82ee
status: todo
priority: high
```

Per Decision D3: no HA deployment is complete until the failover test exits 0.

```bash
# From railiance-cluster:
make smoke  # all assertions green
make test-ha-failover GITEA_URL=https://<gitea-hostname>
```

Expected: pgpool recovers cleanly after primary pod deletion; Gitea login
remains available within the recovery window.
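
The "within the recovery window" condition amounts to a bounded retry. A minimal sketch of such a loop; `wait_for` is a hypothetical helper, and the 60-second window in the usage line is a placeholder, not the value from Decision D3:

```bash
# wait_for TIMEOUT_SECONDS CMD [ARGS...] (hypothetical helper)
# Re-runs CMD about once per second until it succeeds (returns 0) or
# TIMEOUT_SECONDS have elapsed (returns 1). Uses bash's SECONDS counter.
wait_for() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    (( SECONDS < deadline )) || return 1
    sleep 1
  done
}
```

For example, after deleting the primary pod: `wait_for 60 curl -fsS -o /dev/null https://<gitea-hostname>/user/login`.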

**Done when:** both scripts exit 0 against the migrated live cluster.

---

### T05 — Relocate railiance-backup tool from S2 to S3

```task
id: RAIL-PL-WP-0001-T05
state_hub_task_id: 231f6f8a-97a0-4aa0-8318-8e4361af67a3
status: todo
priority: medium
```

As flagged in RAIL-HO-WP-0003 T04: backup is a platform concern (S3), not a
cluster runtime concern (S2).

```bash
# git mv cannot cross repositories: copy, then stage the add/remove in each repo.
mkdir -p ~/railiance-platform/tools/cmd
cp -a ~/railiance-cluster/tools/cmd/railiance-backup ~/railiance-platform/tools/cmd/
git -C ~/railiance-platform add tools/cmd/railiance-backup
git -C ~/railiance-cluster rm -r tools/cmd/railiance-backup
```

Update `railiance-platform/Makefile`:
```makefile
backup: ## Backup platform services (PostgreSQL, Valkey) — age-encrypted
	sudo tools/cmd/railiance-backup
```

Add tombstone stub in `railiance-cluster/tools/cmd/`:
```bash
# railiance-backup — MOVED to railiance-platform/tools/cmd/ (RAIL-PL-WP-0001-T05)
```
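
A comment-only file gives no signal if anything still calls the old path. A sketch of a stub that fails loudly instead, exiting 1 with a pointer to the new location; `write_tombstone` is a hypothetical helper:

```bash
# write_tombstone PATH NEW_LOCATION TASK_ID (hypothetical helper)
# Writes an executable stub at PATH that reports the move on stderr and
# exits 1, so callers of the old path fail loudly instead of silently.
write_tombstone() {
  local path="$1" new_location="$2" task="$3" name
  name="$(basename "$path")"
  cat > "$path" <<EOF
#!/usr/bin/env bash
# $name MOVED to $new_location ($task)
echo "$name has moved to $new_location ($task)" >&2
exit 1
EOF
  chmod +x "$path"
}
```

Usage, after the move: `write_tombstone ~/railiance-cluster/tools/cmd/railiance-backup railiance-platform/tools/cmd/ RAIL-PL-WP-0001-T05`.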

Update `railiance-cluster/Makefile` `backup` target to delegate:
```makefile
backup: ## Backup cluster runtime — delegates platform backup to railiance-platform
	@echo "Cluster backup (etcd + kubeconfig):"
	sudo tools/cmd/railiance-backup-s2
	@echo "Platform backup (PostgreSQL, Valkey): run 'make backup' in railiance-platform"
```

**Done when:** `make backup` in railiance-platform runs the platform backup;
railiance-cluster backup still covers etcd/kubeconfig; no duplication.

---

### T06 — Codify Valkey as standalone S3 asset

```task
id: RAIL-PL-WP-0001-T06
state_hub_task_id: 20899c81-2b24-4d70-ad02-f6a1383b6811
status: todo
priority: low
```

Valkey is currently deployed as a Gitea subchart. Once T02 removes the subchart
bundle, Valkey must be deployed independently so Gitea and future apps
(Zulip) can use it.

Write `helm/valkey-values.sops.yaml`:
```yaml
# Bitnami Valkey chart
auth:
  enabled: true
  password: ENC[...]
replica:
  replicaCount: 1
  persistence:
    enabled: true
    size: 2Gi
```

Add `make` targets (the values file is SOPS-encrypted, so decrypt it and pass it to Helm on stdin):
```makefile
valkey-deploy: ## Deploy Valkey (Redis-compatible) to platform namespace
	values="$$(sops -d helm/valkey-values.sops.yaml)" && \
	  printf '%s\n' "$$values" | helm upgrade --install valkey bitnami/valkey \
	  -f - --namespace platform

valkey-status: ## Check Valkey pod status
	kubectl get pods -n platform -l app.kubernetes.io/name=valkey
```
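
The last Done-when item needs Gitea pointed at the new Service. Gitea's Redis-backed cache, session and queue settings take a `redis://` connection URI, which also works against Valkey. A sketch of building that URI, assuming the Bitnami chart's `<release>-primary` Service name (`valkey-primary`) and database index 0 by default; `valkey_url` is a hypothetical helper and the password must be URL-safe:

```bash
# valkey_url PASSWORD [DB] (hypothetical helper) -> redis:// URI for the
# standalone Valkey in S3. The host assumes the Bitnami "<release>-primary"
# Service for release "valkey"; PASSWORD must be URL-safe.
valkey_url() {
  local password="$1" db="${2:-0}"
  local host="valkey-primary.platform.svc.cluster.local"
  printf 'redis://:%s@%s:6379/%s\n' "$password" "$host" "$db"
}
```

For example, `valkey_url s3cret` prints `redis://:s3cret@valkey-primary.platform.svc.cluster.local:6379/0`.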

**Done when:** `make valkey-deploy` succeeds; Valkey Running in `platform`
namespace; Gitea reconnected to new Valkey endpoint.

---

## References

- OAS Standard: `canon/standards/orthogonal-architecture_v1.0.md`
- ADR-003 (boundary rule): `railiance-infra/docs/adr/ADR-003-railiance-5repo-stack-architecture.md`
- RAIL-BS-WP-0003 (pgpool fix): `railiance-cluster/workplans/RAIL-BS-WP-0003-pgpool-ha-failover-fix.md`
- RAIL-HO-WP-0003 T04 (relocation table): `railiance-infra/workplans/RAIL-HO-WP-0003-5repo-stack-restructure.md`
- Decision D3 (HA testing policy): `railiance-cluster/DECISIONS.md`
- State Hub workstream: `e4ec133c-7cb9-43c6-95f0-50d6591f13d7`