railiance-platform/workplans/RAIL-PL-WP-0001-platform-baseline.md
tegwick 007afdcb6b chore(workplan): mark WP-0001 superseded by RAIL-HO-WP-0004
WP-0001 targeted Bitnami postgresql-ha; CloudNative PG (cnpg) is the
deployed operator. Migration path now tracked in RAIL-HO-WP-0004-T03–T05.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:02:23 +01:00


id: RAIL-PL-WP-0001
type: workplan
title: S3 Platform Services Baseline
domain: railiance
repo: railiance-platform
status: superseded
owner: railiance
topic_slug: railiance
state_hub_workstream_id: e4ec133c-7cb9-43c6-95f0-50d6591f13d7
superseded_by: RAIL-HO-WP-0004
created: 2026-03-11
updated: 2026-03-26

S3 Platform Services Baseline

SUPERSEDED by RAIL-HO-WP-0004 (2026-03-26). This workplan targeted Bitnami postgresql-ha, which is now stale: CloudNative PG (cnpg) is the deployed operator. See WP-0004 T03–T05 for the current cnpg-based platform baseline work.

Goal

Establish railiance-platform (S3) as a reproducible, OAS-compliant platform layer. Currently, PostgreSQL HA and Valkey are deployed implicitly as subcharts of the Gitea Helm release in S2 (railiance-cluster). This violates the OAS boundary rule: S3 owns platform services; S2 owns only the cluster runtime.

This workplan makes S3 a proper, standalone layer that S5 applications can depend on.

Scope

| Concern | Current location | After this workplan |
|---|---|---|
| PostgreSQL HA (repmgr + pgpool) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Valkey (Redis-compatible cache) | Gitea subchart in S2 | Standalone Helm release in S3 |
| Gitea Helm values | railiance-cluster/helm/ (S2) | railiance-apps/helm/ (S5) |
| railiance-backup tool | railiance-cluster/tools/cmd/ (S2) | railiance-platform/tools/cmd/ (S3) |

Pre-conditions

  • railiance-cluster converged: k3s running, Helm available (make smoke passes)
  • A recent backup verified on Nextcloud before any migration step
  • SSH tunnel active for State Hub MCP access
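The tool half of these pre-conditions can be scripted. Below is a minimal sketch for a POSIX shell. `require_tools` is a hypothetical helper, and it does not replace running `make smoke`, confirming the Nextcloud backup, or opening the SSH tunnel.

```shell
#!/bin/sh
# Hypothetical preflight helper: covers only "tool is on PATH" checks.
# make smoke, the Nextcloud backup and the SSH tunnel still need checking.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools kubectl helm sops && make smoke` stops before `make smoke` if any of the CLIs used in this workplan is missing.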

Boundary rule reminder (ADR-003)

S3 owns shared platform services. S5 owns application deployments. S2 must not manage database or cache services directly.


Tasks

T01 — Codify standalone PostgreSQL HA Helm chart

id: RAIL-PL-WP-0001-T01
state_hub_task_id: f5af95bf-3d2d-458a-b695-666d4dc2dc99
status: todo
priority: high

Write helm/postgresql-ha-values.sops.yaml using the Bitnami postgresql-ha chart. Capture the values currently baked into the Gitea subchart, including the pgpool-password fix from RAIL-BS-WP-0003:

# helm/postgresql-ha-values.sops.yaml (schema only — encrypt secrets with SOPS)
postgresql:
  replicaCount: 3
  password: ENC[...]
  postgresPassword: ENC[...]
  repmgrPassword: ENC[...]
pgpool:
  replicaCount: 1
  adminPassword: ENC[...]
  # pgpool-password must be set — see RAIL-BS-WP-0003
  pgpoolPassword: ENC[...]
persistence:
  enabled: true
  size: 10Gi
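This file lives in git, so a check that no password-like key holds a plaintext value can run before each commit. The sketch below rests on two assumptions: every secret key name ends in `password`/`Password` (as all five keys above do), and SOPS-encrypted values begin with `ENC[`. `check_sops_secrets` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical pre-commit guard: fail if any key ending in password/Password
# (password, postgresPassword, repmgrPassword, adminPassword, pgpoolPassword)
# has a non-empty value that is not a SOPS ENC[...] blob.
check_sops_secrets() {
  # First grep: password-like keys with a value. Second grep: drop the
  # encrypted ones. Anything left over is printed and treated as a leak.
  if grep -nE '^[[:space:]]*[A-Za-z]*[Pp]assword:[[:space:]]*[^[:space:]]' "$1" \
       | grep -vE ':[[:space:]]*ENC\['; then
    echo "plaintext secret(s) found in $1" >&2
    return 1
  fi
  return 0
}
```

Comment lines such as the `pgpool-password` note are skipped because the key must start the line after indentation.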

Add make targets:

pg-deploy: ## Deploy standalone PostgreSQL HA to cluster
    sops -d helm/postgresql-ha-values.sops.yaml | \
        helm upgrade --install postgresql-ha bitnami/postgresql-ha \
        -f - --namespace platform --create-namespace

pg-status: ## Check PostgreSQL HA pod status
    kubectl get pods -n platform -l app.kubernetes.io/name=postgresql-ha

Add docs/postgresql-ha.md documenting:

  • Chart version pinned
  • Connection string pattern for apps
  • How to create a new database for an app
  • How to rotate passwords (SOPS re-encrypt → helm upgrade)
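The connection-string item could document a pattern like the one below. Apps connect through the pgpool Service rather than a specific postgresql-N pod, so a primary failover does not change the address. `pg_conn_url` is hypothetical. The host is the pgpool Service name used in T02, and port 5432 is carried over from there as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: apps reach PostgreSQL via the pgpool Service,
# never directly via a postgresql-N pod.
PG_HOST=postgresql-ha-pgpool.platform.svc.cluster.local
PG_PORT=5432

pg_conn_url() {
  # $1 = database user, $2 = database name. The password is supplied
  # separately (e.g. PGPASSWORD or a Kubernetes Secret), not embedded here.
  printf 'postgresql://%s@%s:%s/%s\n' "$1" "$PG_HOST" "$PG_PORT" "$2"
}
```

`pg_conn_url gitea gitea` yields `postgresql://gitea@postgresql-ha-pgpool.platform.svc.cluster.local:5432/gitea`. The password is deliberately left out of the URL.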

Done when: make pg-deploy succeeds; three postgresql-ha pods + pgpool Running in the platform namespace; make smoke still passes.
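The pod-count part of this condition can be checked mechanically. The sketch below reads `kubectl get pods -n platform --no-headers` output on stdin. The pod-name prefixes assume the release is named `postgresql-ha` as in `pg-deploy`. Only the STATUS column is inspected, not READY. `pg_ready` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: reads `kubectl get pods --no-headers` lines
# (NAME READY STATUS RESTARTS AGE) and succeeds only if at least three
# postgresql pods and one pgpool pod are Running.
pg_ready() {
  awk '
    $3 == "Running" && $1 ~ /^postgresql-ha-postgresql-/ { pg++ }
    $3 == "Running" && $1 ~ /^postgresql-ha-pgpool-/     { pool++ }
    END { exit !(pg >= 3 && pool >= 1) }
  '
}
```

Usage: `kubectl get pods -n platform --no-headers | pg_ready && echo ok`.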


T02 — Migrate Gitea to use external PostgreSQL

id: RAIL-PL-WP-0001-T02
state_hub_task_id: c1073011-935a-4c1a-9a9f-dc4db1fc3e88
status: todo
priority: high

Pre-condition: T01 done and postgresql-ha healthy in platform namespace.

Steps:

  1. Backup first: make backup in railiance-cluster — verify upload to Nextcloud.
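  2. Create a gitea user and database on the new standalone cluster (postgresql-0 is the initial primary). Pass each statement as its own -c: psql runs a multi-statement -c string as one transaction, and CREATE DATABASE cannot run inside a transaction block. Making gitea the database owner also gives it CREATE on the public schema, which GRANT ALL ON DATABASE alone no longer provides on PostgreSQL 15+:
    kubectl exec -n platform postgresql-ha-postgresql-0 -- \
      psql -U postgres \
        -c "CREATE USER gitea WITH PASSWORD '...';" \
        -c "CREATE DATABASE gitea OWNER gitea;"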
    
  3. Migrate data: pg_dump from old DB → pg_restore into new cluster.
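  4. Update helm/gitea-values.sops.yaml to disable the subchart and point to the external DB. The gitea-charts/gitea chart takes database settings from gitea.config.database (mapped to the app.ini [database] section), not from an externalDatabase block:
    postgresql-ha:
      enabled: false
    gitea:
      config:
        database:
          DB_TYPE: postgres
          HOST: postgresql-ha-pgpool.platform.svc.cluster.local:5432
          NAME: gitea
          USER: gitea
          PASSWD: ENC[...]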
    
  5. helm upgrade gitea, then verify that Gitea is operational.
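Step 3 can be sketched as a single streamed dump and restore, ideally with Gitea scaled down so no writes land mid-dump. The sketch makes three assumptions: the old database is named `gitea`, `postgres` can connect without an interactive password inside both pods, and the old namespace and pod are passed in as hypothetical arguments. `migrate_cmd` only prints the pipeline so it can be reviewed. `run_migration` executes it under `bash -o pipefail`.

```shell
#!/bin/sh
# Sketch of T02 step 3. $1/$2 (old namespace / old pod) are hypothetical:
# the current primary of the S2 Gitea postgresql-ha subchart.
NEW_POD=postgresql-ha-postgresql-0   # must be the primary of the new cluster

migrate_cmd() {
  # pg_dump -Fc writes a custom-format archive to stdout; pg_restore reads it
  # from stdin (kubectl exec -i) and, via SET ROLE gitea, creates objects owned by gitea.
  printf '%s | %s\n' \
    "kubectl exec -n $1 $2 -- pg_dump -U postgres -Fc gitea" \
    "kubectl exec -i -n platform $NEW_POD -- pg_restore -U postgres -d gitea --no-owner --role=gitea"
}

run_migration() {
  # pipefail: a failing pg_dump fails the step instead of being masked
  # by pg_restore's exit status.
  bash -o pipefail -c "$(migrate_cmd "$1" "$2")"
}
```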

Done when: Gitea login works; postgresql-ha subchart pods are gone; all data intact.


T03 — Relocate Gitea Helm deployment to railiance-apps (S5)

id: RAIL-PL-WP-0001-T03
state_hub_task_id: a820cd02-0f30-4488-abf1-897120f1fbc1
status: todo
priority: medium

Pre-condition: T02 done.

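# In railiance-cluster (git mv cannot move a file into another repository):
mkdir -p ../railiance-apps/helm
cp helm/gitea-values.sops.yaml ../railiance-apps/helm/
git rm helm/gitea-values.sops.yaml
git -C ../railiance-apps add helm/gitea-values.sops.yaml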

Add to railiance-apps/Makefile:

gitea-deploy: ## Deploy / upgrade Gitea
    sops -d helm/gitea-values.sops.yaml | \
        helm upgrade --install gitea gitea-charts/gitea \
        -f - --namespace apps --create-namespace

gitea-status: ## Check Gitea pod status
    kubectl get pods -n apps -l app.kubernetes.io/name=gitea

Add tombstone in railiance-cluster/helm/MOVED.md:

gitea-values.sops.yaml moved to railiance-apps/helm/ (2026-03-11, RAIL-PL-WP-0001-T03)

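Update railiance-cluster/tests/smoke_kube.sh and tests/test_ha_failover.sh to reference the new namespace (apps) if Gitea moves namespaces. Helm releases are namespaced, so installing into apps creates a new release rather than moving the old one: uninstall the old release and migrate its persistent volumes explicitly.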

Done when: gitea-values.sops.yaml is in railiance-apps/helm/; Gitea still operational; tombstone in place.


T04 — Smoke + HA failover tests pass post-migration

id: RAIL-PL-WP-0001-T04
state_hub_task_id: 8df4774c-5251-4c85-be57-61b903be82ee
status: todo
priority: high

Per Decision D3: no HA deployment is complete until the failover test exits 0.

# From railiance-cluster:
make smoke           # all assertions green
make test-ha-failover GITEA_URL=https://<gitea-hostname>

Expected: pgpool recovers cleanly after primary pod deletion; Gitea login remains available within the recovery window.

Done when: both scripts exit 0 against the migrated live cluster.
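A recovery-window check can be built from a generic poll loop. The helper below is hypothetical and is not the contents of `test_ha_failover.sh`. It prints the number of seconds until the probe first succeeded, or fails once the timeout has elapsed.

```shell
#!/bin/sh
# Hypothetical polling helper: wait_for TIMEOUT INTERVAL CMD [ARGS...]
# Retries CMD every INTERVAL seconds; prints elapsed seconds on success,
# returns 1 once TIMEOUT seconds have passed without success.
wait_for() {
  wf_timeout=$1
  wf_interval=$2
  shift 2
  wf_start=$(date +%s)
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo $(( $(date +%s) - wf_start ))
      return 0
    fi
    if [ $(( $(date +%s) - wf_start )) -ge "$wf_timeout" ]; then
      return 1
    fi
    sleep "$wf_interval"
  done
}
```

After deleting the primary pod, `wait_for 120 2 curl -fsS -o /dev/null "$GITEA_URL/user/login"` would fail if the login page is not back within 120 seconds. The 120-second window is an assumption, since the threshold from Decision D3 is not quoted here.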


T05 — Relocate railiance-backup tool from S2 to S3

id: RAIL-PL-WP-0001-T05
state_hub_task_id: 231f6f8a-97a0-4aa0-8318-8e4361af67a3
status: todo
priority: medium

As flagged in RAIL-HO-WP-0003 T04: backup is a platform concern (S3), not a cluster runtime concern (S2).

mkdir -p ~/railiance-platform/tools/cmd
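# git mv cannot cross repositories: copy, then git rm / git add on each side
cp -a ~/railiance-cluster/tools/cmd/railiance-backup \
      ~/railiance-platform/tools/cmd/railiance-backup
git -C ~/railiance-cluster rm -r tools/cmd/railiance-backup
git -C ~/railiance-platform add tools/cmd/railiance-backup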

Update railiance-platform/Makefile:

backup: ## Backup platform services (PostgreSQL, Valkey) — age-encrypted
    sudo tools/cmd/railiance-backup
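The PostgreSQL half of an age-encrypted platform backup might look like the sketch below. The actual railiance-backup tool may work differently. `AGE_RECIPIENT` (an age public key), the output-directory argument and both function names are assumptions, and Valkey is not covered.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the PostgreSQL part of `make backup`.

backup_name() {
  # UTC timestamp keeps archives sortable, e.g. postgresql-ha-20260311T120000Z.sql.age
  printf '%s-%s.sql.age\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

backup_postgres() {
  out="$1/$(backup_name postgresql-ha)"
  # The dump is streamed straight into age, so no plaintext copy hits disk.
  # pipefail makes a failed pg_dumpall fail the function instead of
  # reporting a truncated archive as success.
  ( set -o pipefail
    kubectl exec -n platform postgresql-ha-postgresql-0 -- pg_dumpall -U postgres \
      | age -r "$AGE_RECIPIENT" -o "$out" ) && echo "$out"
}
```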

Add tombstone stub in railiance-cluster/tools/cmd/:

# railiance-backup — MOVED to railiance-platform/tools/cmd/ (RAIL-PL-WP-0001-T05)

Update railiance-cluster/Makefile backup target to delegate:

backup: ## Backup cluster runtime — delegates platform backup to railiance-platform
    @echo "Cluster backup (etcd + kubeconfig):"
    sudo tools/cmd/railiance-backup-s2
    @echo "Platform backup (PostgreSQL, Valkey): run 'make backup' in railiance-platform"

Done when: make backup in railiance-platform runs the platform backup; railiance-cluster backup still covers etcd/kubeconfig; no duplication.


T06 — Codify Valkey as standalone S3 asset

id: RAIL-PL-WP-0001-T06
state_hub_task_id: 20899c81-2b24-4d70-ad02-f6a1383b6811
status: todo
priority: low

Valkey is currently deployed as a Gitea subchart. Once T02 removes the subchart bundle, Valkey must be deployed independently so Gitea and future apps (Zulip) can use it.

Write helm/valkey-values.sops.yaml:

# Bitnami Valkey chart (persistence is configured per role, not top-level)
auth:
  enabled: true
  password: ENC[...]
primary:
  persistence:
    enabled: true
    size: 2Gi
replica:
  replicaCount: 1
  persistence:
    enabled: true
    size: 2Gi

Add make targets:

valkey-deploy: ## Deploy Valkey (Redis-compatible) to platform namespace
    sops -d helm/valkey-values.sops.yaml | \
        helm upgrade --install valkey bitnami/valkey \
        -f - --namespace platform

valkey-status: ## Check Valkey pod status
    kubectl get pods -n platform -l app.kubernetes.io/name=valkey

Done when: make valkey-deploy succeeds; Valkey Running in platform namespace; Gitea reconnected to new Valkey endpoint.
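Reconnecting Gitea means pointing its cache (and any session or queue settings that used the subchart) at the new Service. The sketch below builds that connection string. It assumes four things: Gitea's `redis://` URL form, the Bitnami chart's `valkey-primary` Service for a release named `valkey`, port 6379, and logical DB 0. In the gitea-charts/gitea chart the value would presumably go under `gitea.config.cache` as `HOST`. `valkey_cache_url` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: Gitea cache HOST value for the standalone Valkey
# (assumed Service valkey-primary in namespace platform, port 6379, DB 0).
valkey_cache_url() {
  # $1 = Valkey password; URL-encode it first if it contains reserved
  # characters such as @ : / ?
  printf 'redis://:%s@valkey-primary.platform.svc.cluster.local:6379/0\n' "$1"
}
```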


References

  • OAS Standard: canon/standards/orthogonal-architecture_v1.0.md
  • ADR-003 (boundary rule): railiance-infra/docs/adr/ADR-003-railiance-5repo-stack-architecture.md
  • RAIL-BS-WP-0003 (pgpool fix): railiance-cluster/workplans/RAIL-BS-WP-0003-pgpool-ha-failover-fix.md
  • RAIL-HO-WP-0003 T04 (relocation table): railiance-infra/workplans/RAIL-HO-WP-0003-5repo-stack-restructure.md
  • Decision D3 (HA testing policy): railiance-cluster/DECISIONS.md
  • State Hub workstream: e4ec133c-7cb9-43c6-95f0-50d6591f13d7