diff --git a/workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md b/workplans/RAILIANCE-WP-0003-apps-pg-shared-cluster.md
similarity index 61%
rename from workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md
rename to workplans/RAILIANCE-WP-0003-apps-pg-shared-cluster.md
index 09b1c92..bfc757b 100644
--- a/workplans/railiance-platform-WP-0003-apps-pg-shared-cluster.md
+++ b/workplans/RAILIANCE-WP-0003-apps-pg-shared-cluster.md
@@ -1,41 +1,43 @@
 ---
 id: RAILIANCE-WP-0003
 type: workplan
-title: "Provision shared cnpg cluster apps-pg"
+title: "Provision shared CNPG cluster apps-pg"
 domain: railiance
 repo: railiance-platform
-status: proposed
-owner: railiance-platform
+status: ready
+owner: codex
 topic_slug: railiance
 planning_priority: high
 planning_order: 3
 created: "2026-05-19"
 updated: "2026-05-19"
+state_hub_workstream_id: "665b3b9b-608a-4be4-84b6-dcb8261ff57b"
 ---
 
-# Provision shared cnpg cluster apps-pg
+# RAILIANCE-WP-0003 - Provision shared CNPG cluster apps-pg
 
 ## Goal
 
-Provision a new shared CloudNative PG cluster `apps-pg` in the
+Provision a new shared CloudNativePG cluster `apps-pg` in the
 `databases` namespace that S5 application workloads can use to host
 their own PostgreSQL databases — without each app forcing the creation
-of a dedicated cnpg cluster.
+of a dedicated CNPG cluster.
 
 This unblocks `railiance-apps RAILIANCE-WP-0002 T04` (vergabe-teilnahme
-needs a `vergabe` role + `vergabe_db` database) and establishes the
-shared-cluster pattern future S5 apps adopt by default.
+needs a `vergabe` role + `vergabe_db` database) by establishing the
+shared cluster and the governed onboarding contract future S5 apps adopt
+by default.
 
 ## Context
 
 `railiance-apps` workplan `RAILIANCE-WP-0002` (establish
 vergabe-teilnahme on railiance01) found at T01 that the two existing
-cnpg clusters in `databases` are app-dedicated:
+CNPG clusters in `databases` are app-dedicated:
 
 | Cluster | PG | Owner app |
 |----------------|----|-------------|
 | `gitea-db` | 18 | gitea |
-| `net-kingdom-pg`| 16 | net-kingdom |
+| `net-kingdom-pg` | 16 | net-kingdom |
 
 Decision `D-01` (resolved 2026-05-18, bernd) selected option D:
 **provision a new shared cluster `apps-pg`** rather than create a third
@@ -47,27 +49,32 @@ response.
 
 ## Placement in the Railiance Tooling Set
 
-S3 owns cnpg `Cluster` CRs (per ADR-003 and the pattern already
-established by `helm/gitea-db-cluster.yaml`). S5 consumers create their
-own per-app `Database` CRs and credential Secrets pointing at the
-shared cluster's service.
+S3 owns CNPG `Cluster` CRs (per ADR-003 and the pattern already
+established by `helm/gitea-db-cluster.yaml`). CNPG 1.28 has standalone
+`Database` CRs, but PostgreSQL role lifecycle is managed through the
+target `Cluster` spec's `.spec.managed.roles` stanza or through a
+controlled operator-run SQL workflow. The shared-cluster contract must
+therefore make role onboarding explicit; S5 repos should not assume a
+standalone CNPG `Role` CR exists.
 
 | Concern | Owner repo | Scope |
 |---------|------------|-------|
-| `Cluster apps-pg` CR, NetworkPolicies, backups | `railiance-platform` | this workplan |
-| Per-app `Database` CRs (`vergabe_db`, ...) | each S5 repo | not here |
-| Per-app credential Secrets | each S5 repo | not here |
-| Helm release wiring DSNs into app pods | each S5 repo | not here |
+| `Cluster apps-pg` CR, shared NetworkPolicies, bootstrap secret, baseline docs | `railiance-platform` | this workplan |
+| Per-app database request and application DSN wiring | each S5 repo | not here |
+| Per-app PostgreSQL role + credential provisioning | coordinated | documented here; platform-administered until OpenBao/dedicated automation exists |
+| Per-app runtime Secret in the consumer namespace | each S5 repo | not here |
 
 ## Current Evidence
 
-- `kubectl get crd | grep cnpg` confirms cnpg 1.28.1 with the
-  `databases.postgresql.cnpg.io` CRD — consumers can self-provision
-  DBs declaratively.
+- `kubectl get crd | grep cnpg` confirms CNPG 1.28.1 with the
+  `databases.postgresql.cnpg.io` CRD — databases can be represented
+  declaratively.
+- CNPG role management is cluster-scoped via `.spec.managed.roles`;
+  no standalone CNPG `Role` CR is available for app repos to apply.
 - Operator image: `ghcr.io/cloudnative-pg/cloudnative-pg:1.28.1`
   (`cnpg-system` namespace).
 - `databases` namespace has a default-deny-all NetworkPolicy; each
-  cnpg cluster therefore needs its own NetworkPolicy triplet
+  CNPG cluster therefore needs its own NetworkPolicy triplet
   (egress-to-kube-api, ingress-from-cnpg-operator, ingress-from-app-ns) —
   pattern visible in `helm/gitea-db-networkpolicies.yaml`.
 - No `helm/apps-pg-*.yaml` artifacts exist yet.
@@ -79,10 +86,15 @@ shared cluster's service.
 - Do not commit plaintext credentials. Bootstrap secret is a one-time
   manual `kubectl create secret` then SOPS-encrypt a template into
   `helm/apps-pg-secret.sops.yaml.template`.
+- Do not expose `apps_admin` to S5 applications. It is a platform
+  bootstrap/smoke-test role, not a runtime credential.
 - Do not collocate non-app data (Gitea, net-kingdom) into `apps-pg`.
   This cluster is for S5 *application* DBs.
 - Preserve the default-deny NetworkPolicy posture in `databases`; only
   allow ingress from namespaces that have a registered consumer.
+- Do not advertise self-service role creation until the role
+  provisioning mechanism is explicit. CNPG `Database` CRs still require
+  their owner role to exist.
 - Initial sizing is conservative (1 instance, 10Gi) to match the
   existing per-cluster footprint. Resize is a follow-up workplan.
 - Cluster name `apps-pg` is locked once published — renaming changes
@@ -99,11 +111,13 @@ shared cluster's service.
 - Bootstrap admin role (`apps_admin`) and meta database (`apps_meta`)
   exist for cluster health probes and to anchor the bootstrap; the
   cluster is otherwise empty of per-app data.
-- Documentation explains how an S5 consumer registers a new database
-  via a `Database` CR plus its own credential Secret, without touching
-  this repo.
+- Documentation explains how an S5 consumer registers a new database,
+  including the current CNPG boundary: the `Database` CR is separate,
+  but role lifecycle is cluster-scoped and therefore governed by the
+  platform contract.
 - `railiance-apps` is notified via the hub thread; their
-  `RAILIANCE-WP-0002 T04` can proceed.
+  `RAILIANCE-WP-0002 T04` can proceed using the documented onboarding
+  path.
 
 ## Tasks
 
@@ -113,14 +127,17 @@ shared cluster's service.
 id: RAILIANCE-WP-0003-T01
 status: todo
 priority: high
+state_hub_task_id: "37843f2f-0022-4725-ab07-29f6ae4c1749"
 ```
 
 Confirm the substrate before adding a new cluster.
 
 Checks:
 
-- cnpg operator version (≥ 1.28.x required for the `Database` CR
+- CNPG operator version (≥ 1.28.x required for the `Database` CR
   consumer pattern).
+- Role/database API boundary: `Database` CR is present; role lifecycle
+  is `.spec.managed.roles` or controlled SQL, not a separate `Role` CR.
 - Node-level disk space available for an additional 10Gi PVC
   (`local-path` storage class is the active default).
 - Existing cluster footprint (`gitea-db`, `net-kingdom-pg`) and any
@@ -131,8 +148,9 @@ Checks:
   `kubernetes.io/metadata.name=cnpg-system` is set (required by the
   ingress-from-operator NetworkPolicy).
 
-**Done when:** the workplan records cnpg version, available PVC
-capacity, and any pre-condition gaps.
+**Done when:** the implementation notes record CNPG version, available
+PVC capacity, the chosen role onboarding mechanism, and any
+pre-condition gaps.
 
 ---
 
@@ -142,9 +160,10 @@ capacity, and any pre-condition gaps.
 id: RAILIANCE-WP-0003-T02
 status: todo
 priority: high
+state_hub_task_id: "b4777198-e42f-4ca1-b562-a595559fdf08"
 ```
 
-Mint the one-time bootstrap secret that cnpg uses to create the initial
+Mint the one-time bootstrap secret that CNPG uses to create the initial
 `apps_admin` role.
 
 Steps:
@@ -162,24 +181,29 @@ capacity, and any pre-condition gaps.
 Then commit a SOPS-encrypted template:
 - `helm/apps-pg-secret.sops.yaml.template` — encrypted form for declarative reapply; do not commit the plaintext password.
 
-The bootstrap role is intentionally minimal — per-app roles are
-created later by their owning repos via cnpg `Role` declarations or
-direct grants.
+The bootstrap role is intentionally not a consumer role. Per-app runtime
+roles are created later through the onboarding mechanism documented in
+T06; until dedicated automation exists, that mechanism is
+platform-administered.
 
 **Done when:** the secret exists in the cluster and an encrypted
 template is committed.
 
 ---
 
-### T03 — Add the cnpg Cluster manifest
+### T03 — Add the CNPG Cluster manifest
 
 ```task
 id: RAILIANCE-WP-0003-T03
 status: todo
 priority: high
+state_hub_task_id: "0840583d-23b2-4b93-9002-7977e6896a12"
 ```
 
 Add `helm/apps-pg-cluster.yaml` modeled on `helm/gitea-db-cluster.yaml`.
+Do not add app-specific roles or databases to the baseline cluster
+manifest unless T01 explicitly chooses a platform-owned managed-role
+stanza as the interim onboarding path for the first consumer.
 
 Shape:
 
@@ -223,6 +247,7 @@ validates against the cluster.
 id: RAILIANCE-WP-0003-T04
 status: todo
 priority: high
+state_hub_task_id: "7237f0f2-28e6-4eee-981b-06d0115cb0d1"
 ```
 
 Add `helm/apps-pg-networkpolicies.yaml` modeled on the gitea-db triplet
@@ -257,16 +282,17 @@ label.
 id: RAILIANCE-WP-0003-T05
 status: todo
 priority: high
+state_hub_task_id: "dc346e73-eadf-4eaa-8296-358df262f648"
 ```
 
 Add targets that mirror the `db-*` (gitea-db) family:
 
 ```make
-apps-pg-deploy: ## Apply shared apps-pg cnpg Cluster + NetworkPolicies
+apps-pg-deploy: ## Apply shared apps-pg CNPG Cluster + NetworkPolicies
 	$(KUBECTL) apply -f helm/apps-pg-cluster.yaml
 	$(KUBECTL) apply -f helm/apps-pg-networkpolicies.yaml
 
-apps-pg-status: ## Show apps-pg cnpg cluster health
+apps-pg-status: ## Show apps-pg CNPG cluster health
 	$(KUBECTL) cnpg status apps-pg -n databases 2>/dev/null || \
 		$(KUBECTL) get cluster apps-pg -n databases -o wide
 
@@ -288,10 +314,11 @@ Smoke checks:
 
 - `cnpg status` reports `Cluster in healthy state`.
 - Services `apps-pg-rw` and `apps-pg-ro` exist.
-- From a disposable pod in a namespace labeled
-  `railiance.io/postgres-client=apps-pg`,
-  `psql 'postgresql://apps_admin:...@apps-pg-rw.databases:5432/apps_meta'`
-  connects.
+- From a disposable pod in a temporary namespace labeled
+  `railiance.io/postgres-client=apps-pg`, a platform-operated test
+  connection to `apps-pg-rw.databases:5432/apps_meta` succeeds. Delete
+  the temporary namespace and any copied test secret immediately after
+  the check; do not place `apps_admin` in an application namespace.
 
 **Done when:** the smoke checks pass.
 
@@ -303,22 +330,28 @@ Smoke checks:
 id: RAILIANCE-WP-0003-T06
 status: todo
 priority: medium
+state_hub_task_id: "8b78934d-0a3c-413c-a66f-295092282547"
 ```
 
 Notify the requester and capture the pattern.
 
 Steps:
 
-- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` via
-  `reply_to_message` with this workplan's id and the cluster's
-  connection details (service name, port, bootstrap admin role).
+- Reply to thread `768c18f4-8785-4108-a900-fa117eb8778f` through the
+  State Hub `/messages/` REST API with this workplan's id and the
+  cluster's connection details. Do not send bootstrap credentials.
 - Add `docs/apps-pg.md` with:
   - Cluster identity and connection endpoints.
-  - The per-app onboarding recipe: (a) label the consumer namespace
-    `railiance.io/postgres-client=apps-pg`, (b) create a credential
-    Secret in the consumer namespace, (c) create a cnpg `Database` CR
-    referencing the cluster and the credential Secret, (d) wire the
-    DSN into the application Helm values.
+  - The per-app onboarding recipe: (a) request/approve a per-app role,
+    (b) provision the backing role and credential through the chosen
+    platform mechanism, (c) create the CNPG `Database` CR in the
+    `databases` namespace with `spec.cluster.name: apps-pg` and
+    `spec.owner` set to the approved role, (d) label the consumer
+    namespace `railiance.io/postgres-client=apps-pg`, (e) publish or
+    mirror the runtime Secret into the consumer namespace, and (f) wire
+    the DSN into the application Helm values.
+  - The CNPG 1.28 boundary: `Database` is standalone; role management is
+    not a standalone `Role` CR and must follow the platform contract.
   - Backup posture (when the cluster is added to the existing platform
     backup process) and the resize / replicate roadmap.
 
@@ -335,16 +368,19 @@ Smoke checks:
 3. Makefile targets work end-to-end.
 4. `railiance-apps RAILIANCE-WP-0002 T04` is unblocked and explicitly
    acknowledged via the hub thread.
-5. `docs/apps-pg.md` explains the consumer onboarding contract.
+5. `docs/apps-pg.md` explains the consumer onboarding contract,
+   including the CNPG role/database boundary.
 
 ## Notes
 
-- This intentionally does **not** create the `vergabe` role or
-  `vergabe_db` — that work belongs in `railiance-apps`. Keeping the
-  platform layer ignorant of individual apps preserves ADR-003.
+- This intentionally does **not** hard-code the `vergabe` role or
+  `vergabe_db` into the shared cluster baseline. The consumer onboarding
+  doc must describe the follow-up request/manifest needed for
+  `railiance-apps` so the platform layer stays generic until an app
+  explicitly registers.
 - Backup inclusion of `apps-pg` is a follow-up. The existing
   `make backup` target only covers the legacy PostgreSQL-HA setup;
-  cnpg backup configuration is its own workplan.
+  CNPG backup configuration is its own workplan.
-- A second replica (HA) and a connection pooler (PgBouncer / cnpg
+- A second replica (HA) and a connection pooler (PgBouncer / CNPG
   `Pooler`) are deferred. The cluster spec leaves room for both —
   re-enable when node capacity allows.
diff --git a/workplans/archived/RAIL-PL-WP-0001-platform-baseline.md b/workplans/archived/RAIL-PL-WP-0001-platform-baseline.md
index 1031ffa..0a13e16 100644
--- a/workplans/archived/RAIL-PL-WP-0001-platform-baseline.md
+++ b/workplans/archived/RAIL-PL-WP-0001-platform-baseline.md
@@ -4,7 +4,7 @@ type: workplan
 title: "S3 Platform Services Baseline"
 domain: railiance
 repo: railiance-platform
-status: superseded
+status: archived
 owner: railiance
 topic_slug: railiance
 state_hub_workstream_id: "e4ec133c-7cb9-43c6-95f0-50d6591f13d7"