---
id: RAILIANCE-WP-0002
type: workplan
title: "Establish vergabe-teilnahme as an Application on railiance01"
domain: railiance
repo: railiance-apps
status: proposed
owner: railiance
topic_slug: railiance
created: "2026-05-18"
updated: "2026-05-18"
planning_priority: high
planning_order: 2
state_hub_workstream_id: "94522a85-80d5-4f2c-8eb0-8d0bcb15f3b0"
---

# Establish vergabe-teilnahme as an Application on railiance01

## Goal

Deploy the `vergabe-teilnahme` Django application as a governed Helm release on the production cluster node `railiance01` (`92.205.130.254`), reachable at `https://vergabe-teilnahme.whywhynot.de`, with its container image published through the Railiance Gitea OCI registry and its PostgreSQL data living in the shared cnpg cluster.

This establishes vergabe-teilnahme as the second application (after Gitea) running on the S5 layer of the Railiance OAS Stack and exercises the freshly enabled container registry from `RAILIANCE-WP-0001` end-to-end.

## Placement in the Railiance Tooling Set

This workplan lives in `railiance-apps` because vergabe-teilnahme is an S5 application workload. The deployment surface added by this workplan is:

- `helm/vergabe-teilnahme-values.sops.yaml` — SOPS-encrypted Helm values (`DJANGO_SECRET_KEY`, DB DSN, etc.).
- `releases/vergabe-teilnahme/` — chart selection / overlay (Bitnami generic chart or hand-rolled chart, decided in T05).
- `manifests/vergabe-teilnahme-ingress.yaml` — Traefik ingress + cert-manager TLS for `vergabe-teilnahme.whywhynot.de`.
- `Makefile` targets: `vergabe-deploy`, `vergabe-status`, `vergabe-migrate`.
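
As a rough sketch of what the `vergabe-deploy` target listed above might wrap, the function below decrypts the SOPS values to stdout and pipes them into Helm, so the decrypted values are never written to disk. The function name `vergabe_deploy` and the `CHART` variable are hypothetical (the chart is only decided in T05), and the recipe is an assumption about the SOPS + Helm pattern, not a copy of the existing Gitea target.

```bash
#!/bin/sh
# Sketch only: CHART is a hypothetical placeholder; the chart choice is made in T05.
vergabe_deploy() {
  chart="${CHART:?set CHART to the chart path or reference chosen in T05}"
  # Decrypt to stdout and let Helm read the values from stdin ("--values -"),
  # so no plaintext values file is created.
  sops --decrypt helm/vergabe-teilnahme-values.sops.yaml |
    helm upgrade --install vergabe-teilnahme "$chart" \
      --namespace vergabe-teilnahme --create-namespace \
      --values - "$@"
}
```

Extra arguments are passed through to Helm, so `CHART=charts/vergabe-teilnahme vergabe_deploy --dry-run` would render the release without applying it.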
Cross-repo coordination required:

| Concern | Owner repo | Coordination |
|---------|------------|--------------|
| Application Helm release | `railiance-apps` | This workplan |
| Containerization (Dockerfile, entrypoint, asset build) | `vergabe-teilnahme` | This workplan opens a task in that repo |
| PostgreSQL role + database in shared cnpg cluster | `railiance-platform` | Capability request via hub |
| DNS A record for `vergabe-teilnahme.whywhynot.de` | DNS owner of `whywhynot.de` | Out-of-band, captured in T06 |
| Ingress controller / cluster routing primitives | `railiance-cluster` | Reuse — no changes expected |
| cert-manager ClusterIssuer `letsencrypt-prod` | `railiance-platform` | Reuse — no changes expected |

## Current Evidence

- `vergabe-teilnahme/CLAUDE.md`: Django 6.x · uv · Tailwind CSS v4 (Vite) · HTMX 2.x · Alpine.js 3.x · PostgreSQL 16+ (psycopg3). German UI.
- `vergabe-teilnahme/` currently has no `Dockerfile`. `docker-compose.dev.yml` documents the local Postgres but isn't started when the shared `infra-postgres-1` is up.
- `railiance-apps/Makefile` deploys Gitea via `helm/gitea-values.sops.yaml`; the same SOPS + Helm pattern is the template for this workplan.
- `RAILIANCE-WP-0001` confirmed `https://gitea.coulomb.social/v2/` returns the OCI registry auth challenge. Image naming convention established: `gitea.coulomb.social//:`.
- `manifests/gitea-ingress.yaml` confirms the ingress recipe: `ingressClassName: traefik` + annotation `cert-manager.io/cluster-issuer: letsencrypt-prod`.
- Domain `whywhynot.de` has no prior references in any railiance repo — DNS and a fresh Let's Encrypt cert will need to be set up.
- Live cnpg state: `gitea-db` runs in the `databases` namespace. T01 confirms whether a single shared cluster exists for app DBs or whether one must be designated.

## Safety Contract

- Do not commit decrypted Helm values, the Django `SECRET_KEY`, DB credentials, or any other secret material.
- Use a dedicated PostgreSQL role with privileges scoped to a single database; never reuse the gitea role or grant superuser.
- No public exposure until cert-manager has successfully issued a TLS certificate for `vergabe-teilnahme.whywhynot.de`.
- Do not start with `DEBUG=True`; production settings only.
- Preserve Gitea behavior: the new ingress must not conflict with the `gitea` ingress on the cluster's default ingress controller.
- If DB role provisioning requires changes to the shared cnpg cluster resource limits, pause and create a `railiance-platform` task.
- If DNS for `whywhynot.de` is owned outside this operator's control, pause T06 until DNS ownership is confirmed.
- Start fresh — no migration of data from any existing dev database in this workplan.

## Target State

- `vergabe-teilnahme:` image is built and pushed to `gitea.coulomb.social/coulomb/vergabe-teilnahme:`.
- A `vergabe` PostgreSQL role and `vergabe_db` database exist in the shared cnpg cluster (single role, single DB, no cross-app grants).
- A Helm release `vergabe-teilnahme` is deployed in a dedicated namespace with a single replica, a Service, a PVC for media (if any), and the necessary secrets sourced from SOPS values.
- Django `migrate` and `make seed` have run successfully against the shared cnpg database.
- `https://vergabe-teilnahme.whywhynot.de` serves the Django app behind a valid Let's Encrypt certificate.
- Login as a superuser succeeds; the dashboard renders; static assets are served correctly (Tailwind/Vite build pipeline integrated into the image).
- Runbook, registry naming, DB credentials handling, and rollback steps are documented for the next operator.

## Tasks

### T01 — Inventory the deployment substrate

```task
id: RAILIANCE-WP-0002-T01
status: done
priority: high
state_hub_task_id: "49aa7d85-96bd-4d97-952c-80dcfff06610"
```

Confirm the pre-conditions before any code is written.

Checks:

- Identify the shared cnpg cluster intended for app databases (name, namespace, version, current databases/roles, available capacity).
- Verify `gitea.coulomb.social/v2/` still returns an OCI registry auth challenge and that the operator has a package-capable token.
- Verify cert-manager `letsencrypt-prod` ClusterIssuer is healthy and has successfully issued at least one production certificate recently (`gitea-tls` is a known good example).
- Confirm DNS ownership and the change path for `whywhynot.de` — record who owns the zone and how an A record is added.
- Confirm Traefik is the active ingress controller and note the public IP/hostname an A record must point at.

**Done when:** the workplan records (a) the cnpg cluster to use, (b) the operator's path to a Gitea package token, (c) the DNS change path for `whywhynot.de`, and (d) any pre-condition gaps.

**Findings (2026-05-18):**

- **cnpg landscape — no shared cluster yet.** `kubectl get clusters.postgresql.cnpg.io -A` returns two app-dedicated clusters in the `databases` namespace:
  - `gitea-db` — `ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie`, 1 instance, 10Gi
  - `net-kingdom-pg` — `ghcr.io/cloudnative-pg/postgresql:16`, 1 instance, 10Gi

  Neither was provisioned as a shared cluster. The user's earlier choice ("shared cnpg cluster, new database role") therefore requires a sub-decision — see **Decision D-01** below.
- **Gitea registry reachable.** `curl --resolve gitea.coulomb.social:443:92.205.130.254 https://gitea.coulomb.social/v2/` returns `HTTP 405` for `HEAD` with a valid TLS chain (cert: `default/gitea-tls`, ready 3d). The OCI endpoint is up; the `405` is an expected HEAD-versus-GET difference, not a registry fault.
- **Gitea package token — still required.** No package-capable PAT is currently held by the operator in this session (carryover blocker from `RAILIANCE-WP-0001-T04`). A token with `write:package` scope must be minted via the Gitea web UI before T03.
- **Public DNS for `whywhynot.de`:** A-record currently `217.160.0.212` (IONOS web hosting). Authoritative NS = `ns1126.ui-dns.{de,biz,com,org}` (IONOS / 1&1). The zone is administered through the operator's IONOS web console — DNS change is a manual out-of-band step.
- **Traefik LB public IP:** `92.205.130.254` (`kube-system/traefik` LoadBalancer service, ports 80/443). This is the target the new A-record must point at.
- **cert-manager:** `ClusterIssuer/letsencrypt-prod` is `Ready=True` (59d). Most recent successful issuance: `default/gitea-tls`, 3d4h ago. Several stale failing certs in `mfa` and `sso` namespaces are unrelated to this workplan.
- **Pre-condition gaps before downstream tasks unblock:**
  1. D-01 below (cnpg target cluster) — blocks T04.
  2. Gitea package-capable PAT — blocks T03.
  3. DNS A-record for `vergabe-teilnahme.whywhynot.de → 92.205.130.254` — blocks T06.

**Decision D-01 — cnpg target for `vergabe_db`** (pending; required before T04):

| Option | Pros | Cons |
|--------|------|------|
| A. New dedicated cluster `vergabe-pg` | Matches the existing one-cluster-per-app pattern; clean blast radius | Resource cost grows linearly with apps; no actual "shared" cluster emerges |
| B. Add role+db to existing `net-kingdom-pg` (PG 16) | Reuses a healthy PG 16 cluster matching vergabe-teilnahme's minimum; lowest cost | Cluster name no longer reflects its content; coupling with netkingdom domain |
| C. Add role+db to existing `gitea-db` (PG 18) | Newest cluster image; same operator | Couples gitea ops with vergabe ops; name no longer reflects content |
| D. Provision a new general-purpose cluster `apps-pg` (PG 16+) | Establishes a real shared cluster that future apps adopt | Net-new infra; needs a `railiance-platform` task to own the cluster |

Recommendation: **D** (creates the "shared cluster" the user asked for as a real artifact rather than retrofitting an existing name). Recorded as a pending hub decision.

**Resolution (2026-05-18, bernd):** Option D.
Provision a new shared cnpg cluster `apps-pg` (PG 16, 1 instance, 10Gi initial) in namespace `databases`. cnpg `Cluster` CRs live in `railiance-platform` per ADR-003 (confirmed: `helm/gitea-db-cluster.yaml`). A coordination message has been sent to `railiance-platform` requesting the cluster. T04 below is now sequenced **after** that cluster reports healthy.

---

### T02 — Add Dockerfile and asset build to vergabe-teilnahme

```task
id: RAILIANCE-WP-0002-T02
status: todo
priority: high
repo: vergabe-teilnahme
state_hub_task_id: "43ce85c4-0bdb-43c4-b0a5-81fa366800a6"
```

Open a companion task in the `vergabe-teilnahme` repo to add a production-ready container image definition.

Scope:

- Multi-stage `Dockerfile` at the repo root:
  - Stage 1: Node + Vite + Tailwind asset build (`npm ci` → `npm run build` → emits to `static/dist/`).
  - Stage 2: Python image, `uv sync --frozen`, copy app and built assets, run `manage.py collectstatic --noinput`.
  - Production WSGI/ASGI server (gunicorn) listening on `:8000`.
  - Whitenoise-style static asset serving (or document an alternative).
  - Non-root container user, sensible `WORKDIR`, healthcheck endpoint.
- `.dockerignore` excluding `node_modules`, `media/`, `__pycache__`, `.venv`, `static/dist` source, etc.
- Document the build command and the chosen image tag scheme in the vergabe-teilnahme README.

Coordination: this task crosses into `vergabe-teilnahme`. Track via a hub task and reference the PR/commit when complete.

**Done when:** `docker build` against the vergabe-teilnahme repo produces a runnable image that responds to a smoke request locally.

---

### T03 — Build and publish image to Gitea container registry

```task
id: RAILIANCE-WP-0002-T03
status: todo
priority: high
state_hub_task_id: "d0f8db8c-fad9-4e0b-a404-9e3a04cffb05"
```

Push the first production image of vergabe-teilnahme through the registry enabled in `RAILIANCE-WP-0001`.
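
Because Docker rejects an image reference whose tag is empty, it may help to compose the reference in one place and fail early. A minimal sketch, assuming a hypothetical helper `image_ref` that is not part of any existing repo tooling; the tag itself still comes from whatever scheme this task settles on:

```bash
# Hypothetical helper: compose "<registry>/<owner>/<name>:<tag>" and refuse empty parts.
image_ref() {
  if [ "$#" -ne 4 ]; then
    echo "usage: image_ref REGISTRY OWNER NAME TAG" >&2
    return 2
  fi
  for part in "$@"; do
    if [ -z "$part" ]; then
      echo "image_ref: empty component" >&2
      return 1
    fi
  done
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}
```

Called as `image_ref gitea.coulomb.social coulomb vergabe-teilnahme "$TAG"`, it prints the full reference, or returns non-zero if `TAG` is unset or empty.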

Steps:

```bash
# TAG is a placeholder for the tag scheme chosen in this task; abort if unset.
: "${TAG:?set TAG to the image tag}"
docker login gitea.coulomb.social   # needs the package-capable token from T01
docker build -t "gitea.coulomb.social/coulomb/vergabe-teilnahme:${TAG}" \
  /home/worsch/vergabe-teilnahme
docker push "gitea.coulomb.social/coulomb/vergabe-teilnahme:${TAG}"
```

Choose a deterministic tag scheme (`` recommended). Record the exact image reference and digest used for the first deployment.

**Done when:** the image is fetchable from a disposable Kubernetes pod on the cluster.

---

### T04 — Provision PostgreSQL role and database

```task
id: RAILIANCE-WP-0002-T04
status: blocked
priority: high
state_hub_task_id: "925ace1c-f9bf-4644-bd0b-637705d72ea6"
```

Create a `vergabe` PostgreSQL role and `vergabe_db` database inside the new shared `apps-pg` cnpg cluster being provisioned by `railiance-platform` (per resolved decision D-01).

Blocked on: `apps-pg` cluster reaching `Cluster in healthy state` in namespace `databases`. Coordination message sent to `railiance-platform` on 2026-05-18; record the platform workstream/task IDs here once returned.

Approach:

- Use a cnpg `Database` resource and a declaratively managed role (cnpg declares roles under the `Cluster`'s `spec.managed.roles`, not as a standalone `Role` resource, so the role entry is a `railiance-platform` change); never make an out-of-band `psql` change without recording it.
- The role owns only `vergabe_db`; no `CREATEDB`, no superuser, no grants on other databases.
- Capture the database DSN in the SOPS values file (T05).
- If the cluster needs to grow (more instances, more storage, backup inclusion), pause and add a follow-up `railiance-platform` task — do not edit cluster-level resources from this repo.

**Done when:** the new role can connect to `vergabe_db` from inside the cluster (`kubectl run --rm -it psql ...`) and its DSN is recorded in the SOPS values used by T05.

---

### T05 — Author Helm release for vergabe-teilnahme

```task
id: RAILIANCE-WP-0002-T05
status: todo
priority: high
state_hub_task_id: "29ba6add-6f23-4053-acb9-9d7efa0b3881"
```

Add the chart selection (or bespoke chart) and SOPS-encrypted values that turn the published image into a Kubernetes Deployment.

Deliverables:

- Decide chart approach: Bitnami `common` template, a thin in-repo chart under `charts/vergabe-teilnahme/`, or raw manifests. Record the rationale in the workplan log.
- `helm/vergabe-teilnahme-values.sops.yaml` containing:
  - image repo + tag from T03,
  - env (`DJANGO_SETTINGS_MODULE=vergabe_teilnahme.settings.prod`, `ALLOWED_HOSTS`, `CSRF_TRUSTED_ORIGINS`),
  - `SECRET_KEY` (generated, never committed in clear),
  - DB DSN from T04,
  - resource requests/limits, single replica, readiness/liveness probes targeting the healthcheck endpoint introduced in T02.
- A dedicated namespace (`vergabe-teilnahme`).
- Optional: PVC for media uploads if Django `MEDIA_ROOT` is needed in v1; otherwise omit and document deferral.
- `Makefile` targets: `vergabe-deploy`, `vergabe-status`, `vergabe-migrate`.

**Done when:** `make vergabe-deploy` renders cleanly with `--dry-run` and produces no plaintext secrets in the rendered manifest source.

---

### T06 — DNS, ingress, and TLS for vergabe-teilnahme.whywhynot.de

```task
id: RAILIANCE-WP-0002-T06
status: todo
priority: high
state_hub_task_id: "8e673ee6-5338-4eb5-8973-a1818b4dc7f5"
```

Make the application reachable behind a valid Let's Encrypt certificate.

Steps:

- Add an A record `vergabe-teilnahme.whywhynot.de` → cluster public IP (per T01). Use the DNS change path captured in T01.
- Add `manifests/vergabe-teilnahme-ingress.yaml` modeled on `gitea-ingress.yaml`:
  - `ingressClassName: traefik`,
  - annotation `cert-manager.io/cluster-issuer: letsencrypt-prod`,
  - `tls.secretName: vergabe-teilnahme-tls`,
  - host `vergabe-teilnahme.whywhynot.de`, backend the Service from T05.
- Wait for cert-manager to issue the cert.
- Validate `https://vergabe-teilnahme.whywhynot.de/healthz` (or equivalent) returns 200 with a trusted cert chain.

Boundary note: ingress controller and cluster networking changes belong in `railiance-cluster`. This task only adds an `Ingress` resource that consumes the existing controller.
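
The Ingress described in the steps above could look roughly like the manifest printed by this sketch. The function name `render_ingress`, the Ingress name, and the default Service name and port are assumptions (8000 is the gunicorn port from T02; the real Service is defined in T05), and `gitea-ingress.yaml` remains the authoritative template.

```bash
# Sketch: print an Ingress manifest for the T06 steps.
# Assumed defaults: Service "vergabe-teilnahme" on port 8000 (override via $1 and $2).
render_ingress() {
  service_name="${1:-vergabe-teilnahme}"
  service_port="${2:-8000}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vergabe-teilnahme
  namespace: vergabe-teilnahme
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - vergabe-teilnahme.whywhynot.de
      secretName: vergabe-teilnahme-tls
  rules:
    - host: vergabe-teilnahme.whywhynot.de
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service_name}
                port:
                  number: ${service_port}
EOF
}
```

For example, `render_ingress > manifests/vergabe-teilnahme-ingress.yaml` writes the file, which can then be checked with `kubectl apply --dry-run=client -f manifests/vergabe-teilnahme-ingress.yaml` before it is applied for real.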

**Done when:** the public hostname serves the app over HTTPS and the certificate chain validates from outside the cluster.

---

### T07 — Initial migration, seed, and smoke test

```task
id: RAILIANCE-WP-0002-T07
status: todo
priority: high
state_hub_task_id: "be1decb5-b734-4312-b98d-20ed5299d02c"
```

Bring the app to a usable state in production.

Steps:

- Run `manage.py migrate` as a Kubernetes `Job` or one-shot `kubectl exec` against the running Deployment (record which).
- Run `manage.py seed` (the `make seed` target) — vergabe-teilnahme's idempotent seed.
- Create the first superuser via `manage.py createsuperuser`.
- Smoke checklist:
  - Login at `/admin/` succeeds.
  - The dashboard at `/` renders without errors.
  - Static assets (Tailwind build output) are served with correct content-type and 200 status.
  - HTMX partial requests succeed on at least one page.
  - A new `Ausschreibung` can be created and saved.

**Done when:** the smoke checklist passes and `kubectl logs` shows no unexpected errors.

---

### T08 — Document handoff, runbook, and backup posture

```task
id: RAILIANCE-WP-0002-T08
status: todo
priority: medium
state_hub_task_id: "594d3591-b61f-40c4-850c-efaa02c859ed"
```

Capture everything an on-call operator needs.

Deliverables in `docs/vergabe-teilnahme.md`:

- Registry image naming and tag scheme.
- Namespace, Deployment, Service, Ingress names.
- DB DSN handling (where secrets live, how to rotate).
- Restart, rollback (`helm rollback`), and migration commands.
- Backup posture: confirm whether the shared cnpg cluster's backup job includes `vergabe_db`; if not, open a `railiance-platform` follow-up.
- Pointer to the vergabe-teilnahme repo for app-level changes vs. `railiance-apps` for Helm/ingress changes.

**Done when:** a new operator can find vergabe-teilnahme, deploy a new image tag, and recover from a pod crash without reading this workplan.

## Completion Criteria

This workplan is complete when:

1.
   The vergabe-teilnahme image is published to `gitea.coulomb.social/coulomb/vergabe-teilnahme:`.
2. A dedicated PostgreSQL role and database serve the app from the shared cnpg cluster.
3. `helm/vergabe-teilnahme-values.sops.yaml` and the ingress manifest are committed; `make vergabe-deploy` is the single command to deploy.
4. `https://vergabe-teilnahme.whywhynot.de` serves the app over HTTPS with a valid Let's Encrypt cert.
5. Migrations + seed have run; the smoke checklist passes.
6. Runbook is in `docs/vergabe-teilnahme.md`.

## Notes

- This is the second application on `railiance01` (after Gitea). It intentionally adopts the same SOPS + Helm + Traefik + cert-manager pattern so the operator workflow stays consistent.
- v1 deliberately defers: 3-stage canary (Staged Promotion Lifecycle is still 0/7), SSO/Keycloak integration, S3-backed media storage, Celery + Redis workers (optional in the architecture blueprint), and multi-replica HA. Each can become its own follow-up workplan once the baseline runs.
- The `whywhynot.de` domain enters the Railiance stack for the first time here. Treat the DNS path established in T01/T06 as the reference for any future `*.whywhynot.de` workloads.