Propose RAILIANCE-WP-0003: shared cnpg cluster apps-pg
6-task plan to provision a shared CloudNative PG cluster apps-pg in the
databases namespace, with NetworkPolicies that use a label-based consumer
opt-in (railiance.io/postgres-client=apps-pg) instead of the per-namespace
allowlist gitea-db uses.

Responds to coordination message 768c18f4 from railiance-apps and unblocks
RAILIANCE-WP-0002 T04 (vergabe-teilnahme role+db creation). Keeps platform
agnostic of individual apps per ADR-003: per-app Database CRs and credential
Secrets are owned by the consuming repos.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
350
workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md
Normal file
@@ -0,0 +1,350 @@
---
id: RAILIANCE-WP-0003
type: workplan
title: "Provision shared cnpg cluster apps-pg"
domain: railiance
repo: railiance-platform
status: proposed
owner: railiance-platform
topic_slug: railiance
planning_priority: high
planning_order: 3
created: "2026-05-19"
updated: "2026-05-19"
---

# Provision shared cnpg cluster apps-pg

## Goal

Provision a new shared CloudNative PG cluster `apps-pg` in the
`databases` namespace that S5 application workloads can use to host
their own PostgreSQL databases — without each app forcing the creation
of a dedicated cnpg cluster.

This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme
needs a `vergabe` role + `vergabe_db` database) and establishes the
shared-cluster pattern future S5 apps adopt by default.

## Context

`railiance-apps` workplan `RAILIANCE-WP-0002` (establish
vergabe-teilnahme on railiance01) found at T01 that the two existing
cnpg clusters in `databases` are app-dedicated:

| Cluster          | PG | Owner app   |
|------------------|----|-------------|
| `gitea-db`       | 18 | gitea       |
| `net-kingdom-pg` | 16 | net-kingdom |

Decision `D-01` (resolved 2026-05-18, bernd) selected option D:
**provision a new shared cluster `apps-pg`** rather than create a third
dedicated cluster (option A) or retrofit an existing app cluster (B/C).

A coordination message was sent from `railiance-apps` to
`railiance-platform` requesting this work; this workplan is the
response.

## Placement in the Railiance Tooling Set

S3 owns cnpg `Cluster` CRs (per ADR-003 and the pattern already
established by `helm/gitea-db-cluster.yaml`). S5 consumers create their
own per-app `Database` CRs and credential Secrets pointing at the
shared cluster's service.

| Concern | Owner repo | Scope |
|---------|------------|-------|
| `Cluster apps-pg` CR, NetworkPolicies, backups | `railiance-platform` | this workplan |
| Per-app `Database` CRs (`vergabe_db`, ...) | each S5 repo | not here |
| Per-app credential Secrets | each S5 repo | not here |
| Helm release wiring DSNs into app pods | each S5 repo | not here |

## Current Evidence

- `kubectl get crd | grep cnpg` confirms cnpg 1.28.1 with the
  `databases.postgresql.cnpg.io` CRD — consumers can self-provision
  DBs declaratively.
- Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
  (`cnpg-system` namespace).
- `databases` namespace has a default-deny-all NetworkPolicy; each
  cnpg cluster therefore needs its own NetworkPolicy triplet
  (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns)
  — pattern visible in `helm/gitea-db-networkpolicies.yaml`.
- No `helm/apps-pg-*.yaml` artifacts exist yet.
- Coordination message id: `768c18f4-8785-4108-a900-fa117eb8778f`
  (state-hub thread).

## Safety Contract

- Do not commit plaintext credentials. Create the bootstrap secret once,
  manually, with `kubectl create secret`, then commit only a
  SOPS-encrypted template as `helm/apps-pg-secret.sops.yaml.template`.
- Do not collocate non-app data (Gitea, net-kingdom) into `apps-pg`.
  This cluster is for S5 *application* DBs.
- Preserve the default-deny NetworkPolicy posture in `databases`;
  only allow ingress from namespaces that have a registered consumer.
- Initial sizing is conservative (1 instance, 10Gi) to match the
  existing per-cluster footprint. Resize is a follow-up workplan.
- Cluster name `apps-pg` is locked once published — renaming changes
  every consumer DSN.

## Target State

- `kubectl get cluster apps-pg -n databases` reports
  `Cluster in healthy state` with the primary `apps-pg-1`.
- `kubectl get svc apps-pg-rw apps-pg-ro -n databases` returns both
  Services.
- NetworkPolicies for `apps-pg` mirror the `gitea-db` triplet, except
  that app ingress is selected by namespace label rather than by a
  hard-coded namespace.
- The `make` targets `apps-pg-deploy`, `apps-pg-status`,
  `apps-pg-shell`, and `apps-pg-logs` exist and work.
- Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`)
  exist for cluster health probes and to anchor the bootstrap; the
  cluster is otherwise empty of per-app data.
- Documentation explains how an S5 consumer registers a new database
  via a `Database` CR plus its own credential Secret, without touching
  this repo.
- `railiance-apps` is notified via the hub thread; their
  `RAILIANCE-WP-0002 T04` can proceed.

## Tasks

### T01 — Inventory and capacity check

```task
id: RAILIANCE-WP-0003-T01
status: todo
priority: high
```

Confirm the substrate before adding a new cluster.

Checks:

- cnpg operator version (≥ 1.28.x required for the `Database` CR
  consumer pattern).
- Node-level disk space available for an additional 10Gi PVC
  (`local-path` storage class is the active default).
- Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any
  current resource pressure.
- That the `databases` namespace already exists and has its
  default-deny NetworkPolicy in place.
- That `cnpg-system` namespace label
  `kubernetes.io/metadata.name=cnpg-system` is set (required by the
  ingress-from-operator NetworkPolicy).
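
The checks above could be gathered in one read-only pass. This is a minimal sketch, assuming `kubectl` access to the cluster; the function name `apps_pg_preflight` and the `KUBECTL` override are illustrative and not part of the repo. Node-level disk space is not visible through the API and still needs a `df` on the node.

```bash
#!/usr/bin/env bash
# Hypothetical T01 pre-flight helper: read-only, changes nothing.
KUBECTL="${KUBECTL:-kubectl}"

apps_pg_preflight() {
  # The operator image tag reveals the cnpg version.
  $KUBECTL get deployment -n cnpg-system \
    -o jsonpath='{.items[*].spec.template.spec.containers[*].image}'
  echo
  # The Database CRD must be installed for the consumer pattern.
  $KUBECTL get crd databases.postgresql.cnpg.io
  # Existing cnpg clusters and their PVC footprint.
  $KUBECTL get clusters.postgresql.cnpg.io,pvc -n databases
  # Active default storage class (expected: local-path).
  $KUBECTL get storageclass
  # Default-deny policy in the databases namespace.
  $KUBECTL get networkpolicy -n databases
  # Namespace label required by the ingress-from-operator policy.
  $KUBECTL get namespace cnpg-system \
    -o jsonpath='{.metadata.labels.kubernetes\.io/metadata\.name}'
  echo
}
```

Running `apps_pg_preflight` and pasting its output into this workplan would cover most of the "Done when" record for T01.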

**Done when:** the workplan records cnpg version, available PVC
capacity, and any pre-condition gaps.

---

### T02 — Create bootstrap credential secret

```task
id: RAILIANCE-WP-0003-T02
status: todo
priority: high
```

Mint the one-time bootstrap secret that cnpg uses to create the initial
`apps_admin` role.

Steps:

```bash
APPS_PG_PW=$(openssl rand -base64 32)
kubectl create secret generic apps-pg-credentials \
  --namespace databases \
  --from-literal=username=apps_admin \
  --from-literal=password="$APPS_PG_PW"
```

Then commit a SOPS-encrypted template:

- `helm/apps-pg-secret.sops.yaml.template` — encrypted form for
  declarative reapply; do not commit the plaintext password.
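
One way to produce the encrypted template without writing plaintext to disk is to pipe a client-side dry-run manifest straight into `sops`. This is a sketch only: it assumes `sops` is installed and its recipients are already configured (for example through a `.sops.yaml` rule that applies to this input), and the function name `encrypt_apps_pg_secret` plus the `KUBECTL` / `SOPS` overrides are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render the bootstrap Secret locally and encrypt it.
KUBECTL="${KUBECTL:-kubectl}"
SOPS="${SOPS:-sops}"

encrypt_apps_pg_secret() {
  local out="${1:-helm/apps-pg-secret.sops.yaml.template}"
  : "${APPS_PG_PW:?set APPS_PG_PW to the bootstrap password}"
  # --dry-run=client renders the manifest locally; nothing reaches the API.
  # --encrypted-regex limits encryption to the data/stringData values, so
  # the metadata stays readable in review.
  $KUBECTL create secret generic apps-pg-credentials \
    --namespace databases \
    --from-literal=username=apps_admin \
    --from-literal=password="$APPS_PG_PW" \
    --dry-run=client -o yaml |
    $SOPS --encrypt --input-type yaml --output-type yaml \
      --encrypted-regex '^(data|stringData)$' /dev/stdin > "$out"
}
```

Calling `encrypt_apps_pg_secret` in the same shell session that minted `APPS_PG_PW` keeps the password out of any intermediate file.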

The bootstrap role is intentionally minimal — per-app roles are
created later by their owning repos via cnpg `Role` declarations or
direct grants.

**Done when:** the secret exists in the cluster and an encrypted
template is committed.

---

### T03 — Add the cnpg Cluster manifest

```task
id: RAILIANCE-WP-0003-T03
status: todo
priority: high
```

Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`.

Shape:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
  labels:
    app.kubernetes.io/name: apps-pg
    app.kubernetes.io/component: database
    app.kubernetes.io/managed-by: manual
    railiance.io/layer: s3-platform
    railiance.io/role: shared-apps-database
spec:
  instances: 1 # bump when node RAM > 8GB
  # PG 16. cnpg takes the major version from the image rather than from a
  # spec.postgresql.version field; the tag below is an assumption, pin it
  # the same way helm/gitea-db-cluster.yaml does.
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: apps_meta
      owner: apps_admin
      secret:
        name: apps-pg-credentials
```

Note: PG version is 16 (matches vergabe-teilnahme's minimum and the
existing `net-kingdom-pg`). Bumping to 17/18 is a separate decision.

**Done when:** the manifest is committed and
`kubectl apply --dry-run=server` validates against the cluster.

---

### T04 — Add NetworkPolicies for apps-pg

```task
id: RAILIANCE-WP-0003-T04
status: todo
priority: high
```

Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet
but parameterised for the *apps* consumer namespace pattern.

Three policies (all in `databases`, all selecting
`cnpg.io/cluster: apps-pg`):

1. `allow-egress-kube-api-apps-pg` — egress to TCP/6443.
2. `allow-ingress-from-cnpg-operator-apps-pg` — ingress from
   `namespaceSelector kubernetes.io/metadata.name=cnpg-system` on TCP
   ports 5432 / 8000 / 9187.
3. `allow-ingress-from-app-namespaces-apps-pg` — ingress on TCP/5432
   from any namespace carrying the label
   `railiance.io/postgres-client=apps-pg`. (Each consuming app
   namespace adds this label; this avoids hard-coding a namespace list
   in the platform repo.)

The label-based selector is the meaningful difference from gitea-db,
which hard-codes `default`. The shared cluster cannot know its
consumer namespaces in advance, so it expects a positive opt-in label.
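
The third policy is the only one that departs from the gitea-db pattern, so it is worth spelling out. The sketch below prints a manifest built purely from the names listed above; the wrapper function `render_apps_pg_client_policy` is hypothetical and exists only so the manifest can be piped into a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print policy 3 (label-based consumer opt-in).
render_apps_pg_client_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-app-namespaces-apps-pg
  namespace: databases
spec:
  # Applies to the apps-pg instance pods only.
  podSelector:
    matchLabels:
      cnpg.io/cluster: apps-pg
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Any namespace that opted in via the consumer label.
        - namespaceSelector:
            matchLabels:
              railiance.io/postgres-client: apps-pg
      ports:
        - protocol: TCP
          port: 5432
EOF
}
```

For example, `render_apps_pg_client_policy | kubectl apply --dry-run=server -f -` would validate it against the cluster before it is committed to `helm/apps-pg-networkpolicies.yaml`.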

**Done when:** the policies are committed and applied; consumer namespaces
can connect after applying the `railiance.io/postgres-client=apps-pg`
label.

---

### T05 — Makefile targets, deploy, verify

```task
id: RAILIANCE-WP-0003-T05
status: todo
priority: high
```

Add targets that mirror the `db-*` (gitea-db) family:

```make
apps-pg-deploy: ## Apply shared apps-pg cnpg Cluster + NetworkPolicies
	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml

apps-pg-status: ## Show apps-pg cnpg cluster health
	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
		$(KUBECTL) get cluster apps-pg -n databases -o wide

apps-pg-shell: ## Open psql shell on apps-pg primary
	$(KUBECTL) cnpg psql apps-pg -n databases -- -U apps_admin apps_meta

apps-pg-logs: ## Tail apps-pg primary logs
	$(KUBECTL) logs -n databases -l cnpg.io/cluster=apps-pg -f --tail=50
```

Then deploy and wait for the cluster to converge:

```bash
make apps-pg-deploy
kubectl wait --for=condition=Ready cluster/apps-pg -n databases --timeout=5m
```

Smoke checks:

- `cnpg status` reports `Cluster in healthy state`.
- Services `apps-pg-rw` and `apps-pg-ro` exist.
- From a disposable pod in a namespace labeled
  `railiance.io/postgres-client=apps-pg`,
  `psql 'postgresql://apps_admin:...@apps-pg-rw.databases:5432/apps_meta'`
  connects.
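
The last smoke check could be run from a throwaway pod as sketched below. The helper names, the `postgres:16` client image, and passing the password through `PGPASSWORD` are assumptions for illustration; the target namespace must already carry the `railiance.io/postgres-client=apps-pg` label.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helpers for T05.
KUBECTL="${KUBECTL:-kubectl}"

# Build the DSN for the read-write service, without a password.
apps_pg_dsn() {
  printf 'postgresql://%s@apps-pg-rw.databases:5432/%s\n' "$1" "$2"
}

# Run a one-shot psql pod in the given (already labeled) namespace.
apps_pg_smoke() {
  local ns="$1"
  : "${APPS_PG_PW:?set APPS_PG_PW to the apps_admin password}"
  # --rm removes the pod again once psql exits.
  $KUBECTL run apps-pg-smoke --rm -i --restart=Never -n "$ns" \
    --image=postgres:16 --env="PGPASSWORD=$APPS_PG_PW" -- \
    psql "$(apps_pg_dsn apps_admin apps_meta)" -c 'SELECT 1'
}
```

A run against a scratch namespace would look like `kubectl label namespace <ns> railiance.io/postgres-client=apps-pg` followed by `apps_pg_smoke <ns>`. Note that the password ends up in the pod spec for the lifetime of the pod, which is acceptable only for a disposable check.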

**Done when:** the smoke checks pass.

---

### T06 — Reply to railiance-apps, document the consumer contract

```task
id: RAILIANCE-WP-0003-T06
status: todo
priority: medium
```

Notify the requester and capture the pattern.

Steps:

- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` via
  `reply_to_message` with this workplan's id and the cluster's
  connection details (service name, port, bootstrap admin role).
- Add `docs/apps-pg.md` with:
  - Cluster identity and connection endpoints.
  - The per-app onboarding recipe: (a) label the consumer namespace
    `railiance.io/postgres-client=apps-pg`, (b) create a credential
    Secret in the consumer namespace, (c) create a cnpg `Database` CR
    referencing the cluster and the credential Secret, (d) wire the
    DSN into the application Helm values.
  - Backup posture (when the cluster is added to the existing platform
    backup process) and the resize / replicate roadmap.
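
For `docs/apps-pg.md`, step (c) of the onboarding recipe might be illustrated along these lines. This is a sketch under assumptions: the helper name `render_database_cr` is hypothetical, the object is placed in `databases` on the understanding that cnpg resolves `spec.cluster` within the `Database` object's own namespace, and the owner role is assumed to exist already. How the credential Secret is tied to that role is left out here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a cnpg Database CR for one consumer database.
render_database_cr() {
  local db="$1" owner="$2" name
  # Kubernetes object names cannot contain underscores; the PostgreSQL
  # database name in spec.name keeps them.
  name=$(printf '%s' "$db" | tr '_' '-')
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: ${name}
  namespace: databases
spec:
  name: ${db}
  owner: ${owner}
  cluster:
    name: apps-pg
EOF
}
```

For the first consumer this would be `render_database_cr vergabe_db vergabe`, applied from the consuming repo after step (a), `kubectl label namespace <consumer-ns> railiance.io/postgres-client=apps-pg`.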

**Done when:** the message is replied to and `docs/apps-pg.md` is
committed.

## Completion Criteria

This workplan is complete when:

1. `apps-pg` reports healthy in the `databases` namespace.
2. NetworkPolicies enforce the default-deny posture with label-based
   consumer opt-in.
3. Makefile targets work end-to-end.
4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
   acknowledged via the hub thread.
5. `docs/apps-pg.md` explains the consumer onboarding contract.

## Notes

- This intentionally does **not** create the `vergabe` role or
  `vergabe_db` — that work belongs in `railiance-apps`. Keeping the
  platform layer ignorant of individual apps preserves ADR-003.
- Backup inclusion of `apps-pg` is a follow-up. The existing
  `make backup` target only covers the legacy PostgreSQL-HA setup;
  cnpg backup configuration is its own workplan.
- A second replica (HA) and a connection pooler (PgBouncer / cnpg
  `Pooler`) are deferred. The cluster spec leaves room for both;
  enable them when node capacity allows.