From 52efcaa0b286eed50b9d0e2d086b27be6e19fb05 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Mon, 18 May 2026 18:21:28 +0200
Subject: [PATCH] Propose RAILIANCE-WP-0002: vergabe-teilnahme on railiance01

8-task plan to deploy vergabe-teilnahme as a Helm release at
vergabe-teilnahme.whywhynot.de with image from gitea.coulomb.social and
a dedicated role on the shared cnpg cluster.

Co-Authored-By: Claude Opus 4.7
---
 ...P-0002-vergabe-teilnahme-on-railiance01.md | 383 ++++++++++++++++++
 1 file changed, 383 insertions(+)
 create mode 100644 workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md

diff --git a/workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md b/workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
new file mode 100644
index 0000000..84a5f86
--- /dev/null
+++ b/workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
@@ -0,0 +1,383 @@
+---
+id: RAILIANCE-WP-0002
+type: workplan
+title: "Establish vergabe-teilnahme as an Application on railiance01"
+domain: railiance
+repo: railiance-apps
+status: proposed
+owner: railiance
+topic_slug: railiance
+created: "2026-05-18"
+updated: "2026-05-18"
+planning_priority: high
+planning_order: 2
+---
+
+# Establish vergabe-teilnahme as an Application on railiance01
+
+## Goal
+
+Deploy the `vergabe-teilnahme` Django application as a governed Helm release on
+the production cluster node `railiance01` (`92.205.130.254`), reachable at
+`https://vergabe-teilnahme.whywhynot.de`, with its container image published
+through the Railiance Gitea OCI registry and its PostgreSQL data living in the
+shared cnpg cluster.
+
+This establishes vergabe-teilnahme as the second application (after Gitea)
+running on the S5 layer of the Railiance OAS Stack and exercises the freshly
+enabled container registry from `RAILIANCE-WP-0001` end-to-end.
+
+## Placement in the Railiance Tooling Set
+
+This workplan lives in `railiance-apps` because vergabe-teilnahme is an S5
+application workload. The deployment surface added by this workplan is:
+
+- `helm/vergabe-teilnahme-values.sops.yaml` — SOPS-encrypted Helm values
+  (`DJANGO_SECRET_KEY`, DB DSN, etc.).
+- `releases/vergabe-teilnahme/` — chart selection / overlay (Bitnami generic
+  chart or hand-rolled chart, decided in T05).
+- `manifests/vergabe-teilnahme-ingress.yaml` — Traefik ingress + cert-manager
+  TLS for `vergabe-teilnahme.whywhynot.de`.
+- `Makefile` targets: `vergabe-deploy`, `vergabe-status`, `vergabe-migrate`.
+
+Cross-repo coordination required:
+
+| Concern | Owner repo | Coordination |
+|---------|------------|--------------|
+| Application Helm release | `railiance-apps` | This workplan |
+| Containerization (Dockerfile, entrypoint, asset build) | `vergabe-teilnahme` | This workplan opens a task in that repo |
+| PostgreSQL role + database in shared cnpg cluster | `railiance-platform` | Capability request via hub |
+| DNS A record for `vergabe-teilnahme.whywhynot.de` | DNS owner of `whywhynot.de` | Out-of-band, captured in T06 |
+| Ingress controller / cluster routing primitives | `railiance-cluster` | Reuse — no changes expected |
+| cert-manager ClusterIssuer `letsencrypt-prod` | `railiance-platform` | Reuse — no changes expected |
+
+## Current Evidence
+
+- `vergabe-teilnahme/CLAUDE.md`: Django 6.x · uv · Tailwind CSS v4 (Vite) ·
+  HTMX 2.x · Alpine.js 3.x · PostgreSQL 16+ (psycopg3). German UI.
+- `vergabe-teilnahme/` currently has no `Dockerfile`. `docker-compose.dev.yml`
+  documents the local Postgres but isn't started when the shared
+  `infra-postgres-1` is up.
+- `railiance-apps/Makefile` deploys Gitea via `helm/gitea-values.sops.yaml`;
+  the same SOPS + Helm pattern is the template for this workplan.
+- `RAILIANCE-WP-0001` confirmed `https://gitea.coulomb.social/v2/` returns
+  the OCI registry auth challenge.
+  Image naming convention established:
+  `gitea.coulomb.social//:`.
+- `manifests/gitea-ingress.yaml` confirms the ingress recipe:
+  `ingressClassName: traefik` + annotation
+  `cert-manager.io/cluster-issuer: letsencrypt-prod`.
+- Domain `whywhynot.de` has no prior references in any railiance repo —
+  DNS and a fresh Let's Encrypt cert will need to be set up.
+- Live cnpg state: `gitea-db` runs in the `databases` namespace. T01
+  confirms whether a single shared cluster exists for app DBs or whether
+  one must be designated.
+
+## Safety Contract
+
+- Do not commit decrypted Helm values, the Django `SECRET_KEY`, DB
+  credentials, or any other secret material.
+- Use a dedicated PostgreSQL role with privileges scoped to a single
+  database; never reuse the gitea role or grant superuser.
+- No public exposure until cert-manager has successfully issued a TLS
+  certificate for `vergabe-teilnahme.whywhynot.de`.
+- Do not start with `DEBUG=True`; production settings only.
+- Preserve Gitea behavior: the new ingress must not conflict with the
+  `gitea` ingress on the cluster's default ingress controller.
+- If DB role provisioning requires changes to the shared cnpg cluster
+  resource limits, pause and create a `railiance-platform` task.
+- If DNS for `whywhynot.de` is owned outside this operator's control,
+  pause T06 until DNS ownership is confirmed.
+- Start fresh — no migration of data from any existing dev database in
+  this workplan.
+
+## Target State
+
+- `vergabe-teilnahme:` image is built and pushed to
+  `gitea.coulomb.social/coulomb/vergabe-teilnahme:`.
+- A `vergabe` PostgreSQL role and `vergabe_db` database exist in the
+  shared cnpg cluster (single role, single DB, no cross-app grants).
+- A Helm release `vergabe-teilnahme` is deployed in a dedicated
+  namespace with a single replica, a Service, a PVC for media (if any),
+  and the necessary secrets sourced from SOPS values.
+- Django `migrate` and `make seed` have run successfully against the
+  shared cnpg database.
+- `https://vergabe-teilnahme.whywhynot.de` serves the Django app behind
+  a valid Let's Encrypt certificate.
+- Login as a superuser succeeds; the dashboard renders; static assets
+  are served correctly (Tailwind/Vite build pipeline integrated into the
+  image).
+- Runbook, registry naming, DB credentials handling, and rollback steps
+  are documented for the next operator.
+
+## Tasks
+
+### T01 — Inventory the deployment substrate
+
+```task
+id: RAILIANCE-WP-0002-T01
+status: todo
+priority: high
+```
+
+Confirm the pre-conditions before any code is written.
+
+Checks:
+
+- Identify the shared cnpg cluster intended for app databases (name,
+  namespace, version, current databases/roles, available capacity).
+- Verify `gitea.coulomb.social/v2/` still returns an OCI registry auth
+  challenge and that the operator has a package-capable token.
+- Verify cert-manager `letsencrypt-prod` ClusterIssuer is healthy and
+  has successfully issued at least one production certificate recently
+  (`gitea-tls` is a known good example).
+- Confirm DNS ownership and the change path for `whywhynot.de` — record
+  who owns the zone and how an A record is added.
+- Confirm Traefik is the active ingress controller and note the public
+  IP/hostname an A record must point at.
+
+**Done when:** the workplan records (a) the cnpg cluster to use,
+(b) the operator's path to a Gitea package token, (c) the DNS change
+path for `whywhynot.de`, and (d) any pre-condition gaps.
+
+---
+
+### T02 — Add Dockerfile and asset build to vergabe-teilnahme
+
+```task
+id: RAILIANCE-WP-0002-T02
+status: todo
+priority: high
+repo: vergabe-teilnahme
+```
+
+Open a companion task in the `vergabe-teilnahme` repo to add a
+production-ready container image definition.
+
+Scope:
+
+- Multi-stage `Dockerfile` at the repo root:
+  - Stage 1: Node + Vite + Tailwind asset build (`npm ci` →
+    `npm run build` → emits to `static/dist/`).
+  - Stage 2: Python image, `uv sync --frozen`, copy app and built
+    assets, run `manage.py collectstatic --noinput`.
+- Production WSGI/ASGI server (gunicorn) listening on `:8000`.
+- Whitenoise-style static asset serving (or document an alternative).
+- Non-root container user, sensible `WORKDIR`, healthcheck endpoint.
+- `.dockerignore` excluding `node_modules`, `media/`, `__pycache__`,
+  `.venv`, `static/dist` source, etc.
+- Document the build command and the chosen image tag scheme in the
+  vergabe-teilnahme README.
+
+Coordination: this task crosses into `vergabe-teilnahme`. Track via a
+hub task and reference the PR/commit when complete.
+
+**Done when:** `docker build` against the vergabe-teilnahme repo produces
+a runnable image that responds to a smoke request locally.
+
+---
+
+### T03 — Build and publish image to Gitea container registry
+
+```task
+id: RAILIANCE-WP-0002-T03
+status: todo
+priority: high
+```
+
+Push the first production image of vergabe-teilnahme through the
+registry enabled in `RAILIANCE-WP-0001`.
+
+Steps:
+
+```bash
+docker login gitea.coulomb.social
+docker build -t gitea.coulomb.social/coulomb/vergabe-teilnahme: \
+  /home/worsch/vergabe-teilnahme
+docker push gitea.coulomb.social/coulomb/vergabe-teilnahme:
+```
+
+Choose a deterministic tag scheme (`` recommended). Record the
+exact image reference and digest used for the first deployment.
+
+**Done when:** the image is fetchable from a disposable Kubernetes pod
+on the cluster.
+
+---
+
+### T04 — Provision PostgreSQL role and database
+
+```task
+id: RAILIANCE-WP-0002-T04
+status: todo
+priority: high
+```
+
+Create a `vergabe` PostgreSQL role and `vergabe_db` database inside the
+shared cnpg cluster identified in T01.
+
+Approach:
+
+- Use a cnpg `Database` and `Role` (or `ScheduledBackup` / SQL bootstrap)
+  resource — never an out-of-band `psql` change without recording it.
+- The role owns only `vergabe_db`; no `CREATEDB`, no superuser, no grants
+  on other databases.
+- Capture the database DSN in the SOPS values file (T05).
+- Coordinate with `railiance-platform` if any cluster-level change is
+  needed (resource limits, backup inclusion, monitoring).
+
+**Done when:** the new role can connect to `vergabe_db` from inside the
+cluster (`kubectl run --rm -it psql ...`) and is recorded in the SOPS
+values used by T05.
+
+---
+
+### T05 — Author Helm release for vergabe-teilnahme
+
+```task
+id: RAILIANCE-WP-0002-T05
+status: todo
+priority: high
+```
+
+Add the chart selection (or bespoke chart) and SOPS-encrypted values
+that turn the published image into a Kubernetes Deployment.
+
+Deliverables:
+
+- Decide chart approach: Bitnami `common` template, a thin in-repo
+  chart under `charts/vergabe-teilnahme/`, or raw manifests. Record the
+  rationale in the workplan log.
+- `helm/vergabe-teilnahme-values.sops.yaml` containing:
+  - image repo + tag from T03,
+  - env (DJANGO_SETTINGS_MODULE=`vergabe_teilnahme.settings.prod`,
+    `ALLOWED_HOSTS`, `CSRF_TRUSTED_ORIGINS`),
+  - `SECRET_KEY` (generated, never committed in clear),
+  - DB DSN from T04,
+  - resource requests/limits, single replica, readiness/liveness probes
+    targeting the healthcheck endpoint introduced in T02.
+- A dedicated namespace (`vergabe-teilnahme`).
+- Optional: PVC for media uploads if Django `MEDIA_ROOT` is needed in
+  v1; otherwise omit and document deferral.
+- `Makefile` targets: `vergabe-deploy`, `vergabe-status`,
+  `vergabe-migrate`.
+
+**Done when:** `make vergabe-deploy` renders cleanly with `--dry-run`
+and produces no plaintext secrets in the rendered manifest source.
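
The "no plaintext secrets" criterion for T05 could be backed by an automated check on the rendered output. The following is a minimal sketch under stated assumptions, not existing tooling in `railiance-apps`: the function name `check_inline_secrets` and the `SENSITIVE_NAMES` variable are hypothetical, and `DATABASE_URL` is an assumed name for the DB DSN variable. It reads rendered Kubernetes YAML on stdin and fails when a sensitive container env entry carries an inline `value:` instead of a `valueFrom:` reference.

```shell
#!/usr/bin/env bash
# Sketch only: check_inline_secrets and SENSITIVE_NAMES are hypothetical names.
# Reads rendered Kubernetes YAML on stdin. Returns 1 if an env entry whose
# name is listed in SENSITIVE_NAMES is immediately followed by an inline
# "value:" line, and 0 otherwise (for example when it uses "valueFrom:").
check_inline_secrets() {
  awk -v names="${SENSITIVE_NAMES:-SECRET_KEY DJANGO_SECRET_KEY DATABASE_URL}" '
    BEGIN {
      bad = 0; pending = ""
      n = split(names, list, " ")
      for (i = 1; i <= n; i++) sensitive[list[i]] = 1
    }
    # An env list item such as "- name: SECRET_KEY"
    /^[ \t]*- name:/ {
      name = $0
      sub(/^[ \t]*- name:[ \t]*/, "", name)
      gsub(/[ \t"]/, "", name)
      pending = (name in sensitive) ? name : ""
      next
    }
    # The line right after a sensitive name decides the outcome.
    pending != "" {
      if ($0 ~ /^[ \t]*value:/) {
        print "inline value for " pending
        bad = 1
      }
      pending = ""
    }
    END { exit bad }
  '
}
```

Whatever the dry-run renders can be piped into the function, and a non-zero exit status can fail the `make` target. The sketch only inspects env entries in workload specs. It does not look inside `Secret` objects, whose base64 data is expected to appear in a rendered release, and it assumes `value:` or `valueFrom:` directly follows the `name:` line.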
+
+---
+
+### T06 — DNS, ingress, and TLS for vergabe-teilnahme.whywhynot.de
+
+```task
+id: RAILIANCE-WP-0002-T06
+status: todo
+priority: high
+```
+
+Make the application reachable behind a valid Let's Encrypt certificate.
+
+Steps:
+
+- Add an A record `vergabe-teilnahme.whywhynot.de` →
+  cluster public IP (per T01). Use the DNS change path captured in T01.
+- Add `manifests/vergabe-teilnahme-ingress.yaml` modeled on
+  `gitea-ingress.yaml`:
+  - `ingressClassName: traefik`,
+  - annotation `cert-manager.io/cluster-issuer: letsencrypt-prod`,
+  - `tls.secretName: vergabe-teilnahme-tls`,
+  - host `vergabe-teilnahme.whywhynot.de`, backend the Service from T05.
+- Wait for cert-manager to issue the cert.
+- Validate `https://vergabe-teilnahme.whywhynot.de/healthz` (or
+  equivalent) returns 200 with a trusted cert chain.
+
+Boundary note: ingress controller and cluster networking changes
+belong in `railiance-cluster`. This task only adds an `Ingress`
+resource that consumes the existing controller.
+
+**Done when:** the public hostname serves the app over HTTPS and the
+certificate chain validates from outside the cluster.
+
+---
+
+### T07 — Initial migration, seed, and smoke test
+
+```task
+id: RAILIANCE-WP-0002-T07
+status: todo
+priority: high
+```
+
+Bring the app to a usable state in production.
+
+Steps:
+
+- Run `manage.py migrate` as a Kubernetes `Job` or one-shot
+  `kubectl exec` against the running Deployment (record which).
+- Run `manage.py seed` (the `make seed` target) — vergabe-teilnahme's
+  idempotent seed.
+- Create the first superuser via `manage.py createsuperuser`.
+- Smoke checklist:
+  - Login at `/admin/` succeeds.
+  - The dashboard at `/` renders without errors.
+  - Static assets (Tailwind build output) are served with correct
+    content-type and 200 status.
+  - HTMX partial requests succeed on at least one page.
+  - A new `Ausschreibung` can be created and saved.
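
The unauthenticated parts of the smoke checklist above can be scripted with `curl`. This is a hedged sketch rather than a committed test suite: `expect`, `http_status`, and `run_smoke` are hypothetical helper names, and `STATIC_PROBE_PATH` and `HTMX_PROBE_PATH` are placeholders the operator must set, because the workplan does not name concrete asset or partial URLs. It also assumes the stock Django admin login page at `/admin/login/`.

```shell
#!/usr/bin/env bash
# Sketch only: the BASE_URL default comes from the workplan. STATIC_PROBE_PATH
# and HTMX_PROBE_PATH are hypothetical placeholders set by the operator.
BASE_URL="${BASE_URL:-https://vergabe-teilnahme.whywhynot.de}"

# expect LABEL EXPECTED ACTUAL: print a PASS/FAIL line, return 1 on mismatch.
expect() {
  if [ "$2" = "$3" ]; then
    echo "PASS $1"
  else
    echo "FAIL $1 (expected $2, got $3)"
    return 1
  fi
}

# http_status URL [curl args...]: print only the HTTP status code.
http_status() {
  curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$@"
}

# run_smoke: probe the login page, one static asset, and one HTMX partial.
# Returns 1 if any probe does not answer with HTTP 200.
run_smoke() {
  local failed=0
  expect "admin login page" 200 \
    "$(http_status "$BASE_URL/admin/login/")" || failed=1
  expect "static asset" 200 \
    "$(http_status "$BASE_URL$STATIC_PROBE_PATH")" || failed=1
  # HTMX marks its requests with the HX-Request header.
  expect "htmx partial" 200 \
    "$(http_status "$BASE_URL$HTMX_PROBE_PATH" -H 'HX-Request: true')" || failed=1
  return "$failed"
}
```

Calling `run_smoke` after setting the two probe paths prints one PASS or FAIL line per check. The superuser login, the dashboard render, and creating an `Ausschreibung` need an authenticated session and stay manual in this sketch.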
+
+**Done when:** the smoke checklist passes and `kubectl logs` shows no
+unexpected errors.
+
+---
+
+### T08 — Document handoff, runbook, and backup posture
+
+```task
+id: RAILIANCE-WP-0002-T08
+status: todo
+priority: medium
+```
+
+Capture everything an on-call operator needs.
+
+Deliverables in `docs/vergabe-teilnahme.md`:
+
+- Registry image naming and tag scheme.
+- Namespace, Deployment, Service, Ingress names.
+- DB DSN handling (where secrets live, how to rotate).
+- Restart, rollback (`helm rollback`), and migration commands.
+- Backup posture: confirm whether the shared cnpg cluster's backup job
+  includes `vergabe_db`; if not, open a `railiance-platform` follow-up.
+- Pointer to the vergabe-teilnahme repo for app-level changes vs.
+  `railiance-apps` for Helm/ingress changes.
+
+**Done when:** a new operator can find vergabe-teilnahme, deploy a new
+image tag, and recover from a pod crash without reading this workplan.
+
+## Completion Criteria
+
+This workplan is complete when:
+
+1. The vergabe-teilnahme image is published to
+   `gitea.coulomb.social/coulomb/vergabe-teilnahme:`.
+2. A dedicated PostgreSQL role and database serve the app from the
+   shared cnpg cluster.
+3. `helm/vergabe-teilnahme-values.sops.yaml` and the ingress manifest
+   are committed; `make vergabe-deploy` is the single command to deploy.
+4. `https://vergabe-teilnahme.whywhynot.de` serves the app over HTTPS
+   with a valid Let's Encrypt cert.
+5. Migrations + seed have run; the smoke checklist passes.
+6. Runbook is in `docs/vergabe-teilnahme.md`.
+
+## Notes
+
+- This is the second application on `railiance01` (after Gitea). It
+  intentionally adopts the same SOPS + Helm + Traefik + cert-manager
+  pattern so the operator workflow stays consistent.
+- v1 deliberately defers: 3-stage canary (Staged Promotion Lifecycle is
+  still 0/7), SSO/Keycloak integration, S3-backed media storage,
+  Celery + Redis workers (optional in the architecture blueprint), and
+  multi-replica HA.
+  Each can become its own follow-up workplan once the
+  baseline runs.
+- The `whywhynot.de` domain enters the Railiance stack for the first
+  time here. Treat the DNS path established in T01/T06 as the reference
+  for any future `*.whywhynot.de` workloads.