Propose RAILIANCE-WP-0002: vergabe-teilnahme on railiance01

8-task plan to deploy vergabe-teilnahme as a Helm release at
vergabe-teilnahme.whywhynot.de with image from gitea.coulomb.social
and a dedicated role on the shared cnpg cluster.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 18:21:28 +02:00
parent 2537ca17b8
commit 52efcaa0b2

@@ -0,0 +1,383 @@
---
id: RAILIANCE-WP-0002
type: workplan
title: "Establish vergabe-teilnahme as an Application on railiance01"
domain: railiance
repo: railiance-apps
status: proposed
owner: railiance
topic_slug: railiance
created: "2026-05-18"
updated: "2026-05-18"
planning_priority: high
planning_order: 2
---
# Establish vergabe-teilnahme as an Application on railiance01
## Goal
Deploy the `vergabe-teilnahme` Django application as a governed Helm release on
the production cluster node `railiance01` (`92.205.130.254`), reachable at
`https://vergabe-teilnahme.whywhynot.de`, with its container image published
through the Railiance Gitea OCI registry and its PostgreSQL data living in the
shared cnpg cluster.

This establishes vergabe-teilnahme as the second application (after Gitea)
running on the S5 layer of the Railiance OAS Stack and exercises the freshly
enabled container registry from `RAILIANCE-WP-0001` end-to-end.
## Placement in the Railiance Tooling Set
This workplan lives in `railiance-apps` because vergabe-teilnahme is an S5
application workload. The deployment surface added by this workplan is:
- `helm/vergabe-teilnahme-values.sops.yaml` — SOPS-encrypted Helm values
(`DJANGO_SECRET_KEY`, DB DSN, etc.).
- `releases/vergabe-teilnahme/` — chart selection / overlay (thin in-repo
chart or raw manifests, decided in T05).
- `manifests/vergabe-teilnahme-ingress.yaml` — Traefik ingress + cert-manager
TLS for `vergabe-teilnahme.whywhynot.de`.
- `Makefile` targets: `vergabe-deploy`, `vergabe-status`, `vergabe-migrate`.

Cross-repo coordination required:

| Concern | Owner repo | Coordination |
|---------|------------|--------------|
| Application Helm release | `railiance-apps` | This workplan |
| Containerization (Dockerfile, entrypoint, asset build) | `vergabe-teilnahme` | This workplan opens a task in that repo |
| PostgreSQL role + database in shared cnpg cluster | `railiance-platform` | Capability request via hub |
| DNS A record for `vergabe-teilnahme.whywhynot.de` | DNS owner of `whywhynot.de` | Out-of-band; change path recorded in T01, record added in T06 |
| Ingress controller / cluster routing primitives | `railiance-cluster` | Reuse — no changes expected |
| cert-manager ClusterIssuer `letsencrypt-prod` | `railiance-platform` | Reuse — no changes expected |
## Current Evidence
- `vergabe-teilnahme/CLAUDE.md`: Django 6.x · uv · Tailwind CSS v4 (Vite) ·
HTMX 2.x · Alpine.js 3.x · PostgreSQL 16+ (psycopg3). German UI.
- `vergabe-teilnahme/` currently has no `Dockerfile`. `docker-compose.dev.yml`
documents the local Postgres but isn't started when the shared
`infra-postgres-1` is up.
- `railiance-apps/Makefile` deploys Gitea via `helm/gitea-values.sops.yaml`;
the same SOPS + Helm pattern is the template for this workplan.
- `RAILIANCE-WP-0001` confirmed `https://gitea.coulomb.social/v2/` returns
the OCI registry auth challenge. Image naming convention established:
`gitea.coulomb.social/<org>/<image>:<tag>`.
- `manifests/gitea-ingress.yaml` confirms the ingress recipe:
`ingressClassName: traefik` + annotation
`cert-manager.io/cluster-issuer: letsencrypt-prod`.
- Domain `whywhynot.de` has no prior references in any railiance repo —
DNS and a fresh Let's Encrypt cert will need to be set up.
- Live cnpg state: `gitea-db` runs in the `databases` namespace. T01
confirms whether a single shared cluster exists for app DBs or whether
one must be designated.
## Safety Contract
- Do not commit decrypted Helm values, the Django `SECRET_KEY`, DB
credentials, or any other secret material.
- Use a dedicated PostgreSQL role with privileges scoped to a single
database; never reuse the gitea role or grant superuser.
- No public exposure until cert-manager has successfully issued a TLS
certificate for `vergabe-teilnahme.whywhynot.de`.
- Do not start with `DEBUG=True`; production settings only.
- Preserve Gitea behavior: the new ingress must not conflict with the
`gitea` ingress on the cluster's default ingress controller.
- If DB role provisioning requires changes to the shared cnpg cluster
resource limits, pause and create a `railiance-platform` task.
- If DNS for `whywhynot.de` is owned outside this operator's control,
pause T06 until DNS ownership is confirmed.
- Start fresh — no migration of data from any existing dev database in
this workplan.
## Target State
- `vergabe-teilnahme:<tag>` image is built and pushed to
`gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>`.
- A `vergabe` PostgreSQL role and `vergabe_db` database exist in the
shared cnpg cluster (single role, single DB, no cross-app grants).
- A Helm release `vergabe-teilnahme` is deployed in a dedicated
namespace with a single replica, a Service, a PVC for media (if any),
and the necessary secrets sourced from SOPS values.
- Django `migrate` and the seed command (`manage.py seed`, i.e. the
`make seed` target) have run successfully against the shared cnpg database.
- `https://vergabe-teilnahme.whywhynot.de` serves the Django app behind
a valid Let's Encrypt certificate.
- Login as a superuser succeeds; the dashboard renders; static assets
are served correctly (Tailwind/Vite build pipeline integrated into the
image).
- Runbook, registry naming, DB credentials handling, and rollback steps
are documented for the next operator.
## Tasks
### T01 — Inventory the deployment substrate
```task
id: RAILIANCE-WP-0002-T01
status: todo
priority: high
```
Confirm the pre-conditions before any code is written.
Checks:
- Identify the shared cnpg cluster intended for app databases (name,
namespace, version, current databases/roles, available capacity).
- Verify `gitea.coulomb.social/v2/` still returns an OCI registry auth
challenge and that the operator has a package-capable token.
- Verify cert-manager `letsencrypt-prod` ClusterIssuer is healthy and
has successfully issued at least one production certificate recently
(`gitea-tls` is a known good example).
- Confirm DNS ownership and the change path for `whywhynot.de` — record
who owns the zone and how an A record is added.
- Confirm Traefik is the active ingress controller and note the public
IP/hostname an A record must point at.

**Done when:** the workplan records (a) the cnpg cluster to use,
(b) the operator's path to a Gitea package token, (c) the DNS change
path for `whywhynot.de`, and (d) any pre-condition gaps.
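
A minimal sketch of the T01 checks as shell functions, assuming `kubectl` targets `railiance01` and that `curl` and `dig` are installed. Both function names and the Traefik label selector are assumptions, not existing `railiance-apps` tooling. `registry_challenge_ok` only parses response headers from stdin, so it can be exercised without network access.

```bash
# Hypothetical helpers for the T01 inventory (not existing railiance-apps tooling).

# Succeeds if the HTTP response headers on stdin show a 401 status and a
# WWW-Authenticate challenge, which is how an OCI registry answers an
# unauthenticated GET /v2/. Header names are matched case-insensitively
# because HTTP/2 responses use lowercase names.
registry_challenge_ok() {
  local headers
  headers=$(tr -d '\r')
  printf '%s\n' "$headers" | head -n 1 | grep -Eq '^HTTP/[0-9.]+ 401( |$)' || return 1
  printf '%s\n' "$headers" | grep -iq '^www-authenticate:'
}

# Read-only inventory; paste the output into the workplan log.
inventory_substrate() {
  echo '== cnpg clusters =='
  kubectl get clusters.postgresql.cnpg.io -A

  echo '== Gitea OCI registry =='
  if curl -sI https://gitea.coulomb.social/v2/ | registry_challenge_ok; then
    echo 'registry: 401 + WWW-Authenticate (OK)'
  else
    echo 'registry: unexpected response' >&2
  fi

  echo '== ClusterIssuer letsencrypt-prod Ready status =='
  kubectl get clusterissuer letsencrypt-prod \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'
  kubectl get certificates -A

  echo '== ingress classes and Traefik service =='
  kubectl get ingressclass
  # Assumes the Traefik Service carries the Helm chart's standard name label.
  kubectl get svc -A -l app.kubernetes.io/name=traefik -o wide

  echo '== current A record (expected empty before T06) =='
  dig +short A vergabe-teilnahme.whywhynot.de
}
```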

---
### T02 — Add Dockerfile and asset build to vergabe-teilnahme
```task
id: RAILIANCE-WP-0002-T02
status: todo
priority: high
repo: vergabe-teilnahme
```
Open a companion task in the `vergabe-teilnahme` repo to add a
production-ready container image definition.
Scope:
- Multi-stage `Dockerfile` at the repo root:
  - Stage 1: Node + Vite + Tailwind asset build (`npm ci`, then
    `npm run build`, emitting to `static/dist/`).
  - Stage 2: Python image, `uv sync --frozen`, copy app and built
    assets, run `manage.py collectstatic --noinput`.
- Production WSGI/ASGI server (gunicorn) listening on `:8000`.
- WhiteNoise-style static asset serving (or document an alternative).
- Non-root container user, sensible `WORKDIR`, healthcheck endpoint.
- `.dockerignore` excluding `node_modules`, `media/`, `__pycache__`,
  `.venv`, any locally built `static/dist/` output, etc.
- Document the build command and the chosen image tag scheme in the
  vergabe-teilnahme README.

Coordination: this task crosses into `vergabe-teilnahme`. Track it via a
hub task and reference the PR/commit when complete.

**Done when:** `docker build` against the vergabe-teilnahme repo produces
a runnable image that responds to a smoke request locally.
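
One way to automate the T02 smoke request, as a sketch that assumes the image serves HTTP on `:8000`, exposes a `/healthz` endpoint (T06 names that path only as an example), and answers it without a database. `DJANGO_ALLOWED_HOSTS`, the container and tag names, and both functions are hypothetical; `wait_until` is plain shell and works with any probe command.

```bash
# Hypothetical local smoke test for the T02 image.

# Retry CMD until it succeeds or ATTEMPTS run out.
# Usage: wait_until ATTEMPTS CMD [ARGS...]; WAIT_INTERVAL sets the pause in seconds.
wait_until() {
  local attempts=$1 i=1
  shift
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "${WAIT_INTERVAL:-2}"
    i=$((i + 1))
  done
  return 1
}

# Build the image, start it with throwaway settings, probe the health endpoint,
# show recent logs, and stop the container again.
smoke_test_image() {
  local src=$1 image=vergabe-teilnahme:smoke name=vergabe-smoke rc
  docker build -t "$image" "$src" || return 1
  # DJANGO_ALLOWED_HOSTS is an assumed variable name; match it to prod settings.
  docker run -d --rm --name "$name" -p 8000:8000 \
    -e DJANGO_SETTINGS_MODULE=vergabe_teilnahme.settings.prod \
    -e DJANGO_SECRET_KEY=local-smoke-test-only \
    -e DJANGO_ALLOWED_HOSTS=localhost \
    "$image" || return 1
  # /healthz is an assumed path for the endpoint T02 introduces.
  wait_until 15 curl -fsS -o /dev/null http://localhost:8000/healthz
  rc=$?
  docker logs "$name" 2>&1 | tail -n 20
  docker stop "$name" >/dev/null
  return "$rc"
}
```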

---
### T03 — Build and publish image to Gitea container registry
```task
id: RAILIANCE-WP-0002-T03
status: todo
priority: high
```
Push the first production image of vergabe-teilnahme through the
registry enabled in `RAILIANCE-WP-0001`.
Steps:
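```bash
# Tag with the short git SHA of the checked-out vergabe-teilnahme commit.
SRC=/home/worsch/vergabe-teilnahme
TAG=$(git -C "$SRC" rev-parse --short HEAD)
IMAGE=gitea.coulomb.social/coulomb/vergabe-teilnahme:$TAG
docker login gitea.coulomb.social
docker build -t "$IMAGE" "$SRC"
docker push "$IMAGE"
# Print the pushed digest to record alongside the tag.
docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE"
```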
Choose a deterministic tag scheme (`<git-sha>` recommended). Record the
exact image reference and digest used for the first deployment.

**Done when:** the image is fetchable from a disposable Kubernetes pod
on the cluster.
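
To pin the first deployment to an immutable reference and confirm the cluster can pull it, something like the following would work, assuming the registry requires credentials for pulls and that `GITEA_USER`/`GITEA_TOKEN` hold the package-capable token from T01. The secret name `gitea-registry`, the pod name, and both functions are illustrative. `is_digest_ref` only checks the `name@sha256:<64 hex>` shape; pass it the output of the `docker inspect` line above, e.g. `check_pull_in_cluster "$(docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE")"`.

```bash
# Hypothetical helpers for pinning and pull-testing the first image.

# True if REF is pinned by digest: <name>@sha256:<64 lowercase hex chars>.
is_digest_ref() {
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Create an image pull secret (in case the registry requires auth for pulls)
# and start a throwaway pod that pulls REF, runs `true`, and is deleted (--rm).
# GITEA_USER and GITEA_TOKEN are assumed environment variables.
check_pull_in_cluster() {
  local ref=$1 ns=vergabe-teilnahme
  is_digest_ref "$ref" || { echo "not digest-pinned: $ref" >&2; return 1; }
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  kubectl -n "$ns" create secret docker-registry gitea-registry \
    --docker-server=gitea.coulomb.social \
    --docker-username="$GITEA_USER" --docker-password="$GITEA_TOKEN" \
    --dry-run=client -o yaml | kubectl apply -f -
  kubectl -n "$ns" run image-pull-check --rm -i --restart=Never \
    --image="$ref" \
    --overrides='{"apiVersion":"v1","spec":{"imagePullSecrets":[{"name":"gitea-registry"}]}}' \
    --command -- true
}
```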

---
### T04 — Provision PostgreSQL role and database
```task
id: RAILIANCE-WP-0002-T04
status: todo
priority: high
```
Create a `vergabe` PostgreSQL role and `vergabe_db` database inside the
shared cnpg cluster identified in T01.
Approach:
- Declare the database as a cnpg `Database` resource (cnpg 1.25+) and the
  role as a managed role in the shared `Cluster` (`spec.managed.roles`);
  never make an out-of-band `psql` change without recording it.
- The role owns only `vergabe_db`; no `CREATEDB`, no superuser, no grants
  on other databases.
- Capture the database DSN in the SOPS values file (T05).
- Coordinate with `railiance-platform` if any cluster-level change is
  needed (resource limits, backup inclusion, monitoring).

**Done when:** the new role can connect to `vergabe_db` from inside the
cluster (`kubectl run --rm -it psql ...`) and is recorded in the SOPS
values used by T05.
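
A sketch of the declarative T04 objects, assuming CloudNativePG 1.25 or later and that the shared `Cluster` lives in the `databases` namespace next to `gitea-db` (a `Database` must sit in its `Cluster`'s namespace). The cluster name is a placeholder until T01 identifies it, and both function names are hypothetical. Because the managed role belongs to the `Cluster` resource that `railiance-platform` owns, the role is only printed as a fragment to hand over rather than applied from here.

```bash
# Hypothetical renderer for the T04 CloudNativePG objects (cnpg 1.25+).
# Usage: render_vergabe_db <cluster-from-T01> | kubectl apply -f -
render_vergabe_db() {
  local cluster=$1
  [ -n "$cluster" ] || { echo 'usage: render_vergabe_db CLUSTER_NAME' >&2; return 1; }
  # The role is declared in the shared Cluster (spec.managed.roles), so it is
  # emitted only as a commented fragment for railiance-platform.
  cat <<'EOF'
# spec.managed.roles entry to add to the shared Cluster:
#   - name: vergabe
#     ensure: present
#     login: true
#     superuser: false
#     createdb: false
#     passwordSecret:
#       name: vergabe-db-credentials   # Secret of type kubernetes.io/basic-auth
EOF
  # Assumed namespace: databases (where gitea-db runs).
  cat <<EOF
---
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: vergabe-db
  namespace: databases
spec:
  name: vergabe_db
  owner: vergabe
  cluster:
    name: ${cluster}
EOF
}

# Create the role's password Secret directly in the cluster, so the password
# is never written to the repository.
create_vergabe_credentials() {
  kubectl -n databases create secret generic vergabe-db-credentials \
    --type=kubernetes.io/basic-auth \
    --from-literal=username=vergabe \
    --from-literal=password="$(head -c 32 /dev/urandom | base64 | tr -d '\n/+=')"
}
```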

---
### T05 — Author Helm release for vergabe-teilnahme
```task
id: RAILIANCE-WP-0002-T05
status: todo
priority: high
```
Add the chart selection (or bespoke chart) and SOPS-encrypted values
that turn the published image into a Kubernetes Deployment.
Deliverables:
- Decide the chart approach: a thin in-repo chart under
  `charts/vergabe-teilnahme/` (optionally built on the Bitnami `common`
  library chart, which cannot be installed on its own) or raw manifests.
  Record the rationale in the workplan log.
- `helm/vergabe-teilnahme-values.sops.yaml` containing:
  - image repo + tag from T03,
  - env (`DJANGO_SETTINGS_MODULE=vergabe_teilnahme.settings.prod`,
    `ALLOWED_HOSTS`, `CSRF_TRUSTED_ORIGINS`),
  - Django `SECRET_KEY` (generated, never committed in clear),
  - DB DSN from T04,
  - resource requests/limits, single replica, readiness/liveness probes
    targeting the healthcheck endpoint introduced in T02.
- A dedicated namespace (`vergabe-teilnahme`).
- Optional: PVC for media uploads if Django `MEDIA_ROOT` is needed in
v1; otherwise omit and document deferral.
- `Makefile` targets: `vergabe-deploy`, `vergabe-status`,
`vergabe-migrate`.
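
A sketch of seeding the values file, assuming the repository's existing SOPS configuration (as used for `helm/gitea-values.sops.yaml`) supplies the encryption keys. The value keys (`image.*`, `replicaCount`, `env`, `resources`), the `DATABASE_URL` variable, the resource figures, and both function names are placeholders to adapt once the chart decision is made. The plaintext exists only until `sops --encrypt --in-place` rewrites the file, so it must not be staged before that step.

```bash
# Hypothetical generator for helm/vergabe-teilnahme-values.sops.yaml.
# Value keys and resource figures are placeholders pending the chart decision.

# 64 random URL-safe characters for Django's SECRET_KEY.
gen_secret_key() {
  head -c 48 /dev/urandom | base64 | tr -d '\n' | tr '+/' '_-'
}

# Print plaintext values; all inputs come from the environment.
render_values() {
  if [ -z "${IMAGE_TAG:-}" ] || [ -z "${DATABASE_URL:-}" ] || [ -z "${SECRET_KEY:-}" ]; then
    echo 'render_values: IMAGE_TAG, DATABASE_URL and SECRET_KEY must be set' >&2
    return 1
  fi
  cat <<EOF
image:
  repository: gitea.coulomb.social/coulomb/vergabe-teilnahme
  tag: "${IMAGE_TAG}"
replicaCount: 1
env:
  DJANGO_SETTINGS_MODULE: vergabe_teilnahme.settings.prod
  ALLOWED_HOSTS: vergabe-teilnahme.whywhynot.de
  CSRF_TRUSTED_ORIGINS: https://vergabe-teilnahme.whywhynot.de
  DJANGO_SECRET_KEY: "${SECRET_KEY}"
  DATABASE_URL: "${DATABASE_URL}"
resources:
  requests: {cpu: 100m, memory: 256Mi}
  limits: {memory: 512Mi}
EOF
}

# Usage (encrypt before anything is staged):
#   SECRET_KEY=$(gen_secret_key) IMAGE_TAG=<git-sha> DATABASE_URL=<dsn-from-T04> \
#     render_values > helm/vergabe-teilnahme-values.sops.yaml
#   sops --encrypt --in-place helm/vergabe-teilnahme-values.sops.yaml
```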

**Done when:** `make vergabe-deploy` renders cleanly with `--dry-run`
and produces no plaintext secrets in the rendered manifest source.
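
The done-when check could be scripted roughly as below, assuming `vergabe-deploy` follows a decrypt-and-pipe pattern (`sops --decrypt` into `helm upgrade --install --values -`, Helm reading values from stdin when given `-`) and that the chart sits at `charts/vergabe-teilnahme/`. `looks_sops_encrypted` is a heuristic of this sketch: it only confirms that a file carries SOPS metadata and at least one `ENC[...]` value, not that every key is encrypted.

```bash
# Hypothetical pre-deploy checks for T05.

# Heuristic: an encrypted values file has a top-level sops: block and
# ENC[AES256_GCM,...] values.
looks_sops_encrypted() {
  grep -q '^sops:' "$1" && grep -q 'ENC\[AES256_GCM,' "$1"
}

# Fail if any tracked *.sops.yaml file looks like plaintext.
check_tracked_sops_files() {
  git ls-files '*.sops.yaml' | while IFS= read -r f; do
    looks_sops_encrypted "$f" || { echo "not encrypted: $f" >&2; exit 1; }
  done
}

# Render the release without applying it; decrypted values never touch disk.
vergabe_deploy_dry_run() {
  local values=helm/vergabe-teilnahme-values.sops.yaml
  looks_sops_encrypted "$values" || { echo "$values is not encrypted" >&2; return 1; }
  sops --decrypt "$values" |
    helm upgrade --install vergabe-teilnahme ./charts/vergabe-teilnahme \
      --namespace vergabe-teilnahme --create-namespace \
      --values - --dry-run
}
```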

---
### T06 — DNS, ingress, and TLS for vergabe-teilnahme.whywhynot.de
```task
id: RAILIANCE-WP-0002-T06
status: todo
priority: high
```
Make the application reachable behind a valid Let's Encrypt certificate.
Steps:
- Add an A record for `vergabe-teilnahme.whywhynot.de` pointing at the
  cluster public IP (per T01), using the DNS change path captured in T01.
- Add `manifests/vergabe-teilnahme-ingress.yaml` modeled on
  `gitea-ingress.yaml`:
  - `ingressClassName: traefik`,
  - annotation `cert-manager.io/cluster-issuer: letsencrypt-prod`,
  - `tls.secretName: vergabe-teilnahme-tls`,
  - host `vergabe-teilnahme.whywhynot.de`, backend the Service from T05.
- Wait for cert-manager to issue the cert.
- Validate that `https://vergabe-teilnahme.whywhynot.de/healthz` (or
  equivalent) returns 200 with a trusted cert chain.

Boundary note: ingress controller and cluster networking changes
belong in `railiance-cluster`. This task only adds an `Ingress`
resource that consumes the existing controller.

**Done when:** the public hostname serves the app over HTTPS and the
certificate chain validates from outside the cluster.
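
A sketch of the T06 manifest and checks, assuming the T05 Service is named `vergabe-teilnahme` and listens on port 8000 (both hypothetical until the chart exists) and that `/healthz` is the health path. cert-manager's ingress-shim names the `Certificate` after `tls.secretName`, which the wait step relies on. `render_ingress` prints the file content so it can be reviewed before committing it as `manifests/vergabe-teilnahme-ingress.yaml`; the function names are illustrative.

```bash
# Hypothetical helpers for T06.
HOST=vergabe-teilnahme.whywhynot.de
NS=vergabe-teilnahme

render_ingress() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vergabe-teilnahme
  namespace: ${NS}
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - ${HOST}
      secretName: vergabe-teilnahme-tls
  rules:
    - host: ${HOST}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vergabe-teilnahme   # assumed Service name from T05
                port:
                  number: 8000            # assumed container port from T02
EOF
}

# After the A record exists and the Ingress is applied.
verify_tls() {
  dig +short A "$HOST"
  kubectl -n "$NS" wait --for=condition=Ready certificate/vergabe-teilnahme-tls --timeout=10m
  # curl verifies the chain against the local trust store by default.
  curl -fsS -o /dev/null -w '%{http_code}\n' "https://${HOST}/healthz"
}
```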

---
### T07 — Initial migration, seed, and smoke test
```task
id: RAILIANCE-WP-0002-T07
status: todo
priority: high
```
Bring the app to a usable state in production.
Steps:
- Run `manage.py migrate` as a Kubernetes `Job` or one-shot
`kubectl exec` against the running Deployment (record which).
- Run `manage.py seed` (the `make seed` target) — vergabe-teilnahme's
idempotent seed.
- Create the first superuser via `manage.py createsuperuser`.
- Smoke checklist:
  - Login at `/admin/` succeeds.
  - The dashboard at `/` renders without errors.
  - Static assets (Tailwind build output) are served with correct
    content-type and 200 status.
  - HTMX partial requests succeed on at least one page.
  - A new `Ausschreibung` can be created and saved.

**Done when:** the smoke checklist passes and `kubectl logs` shows no
unexpected errors.
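
The `kubectl exec` variant of T07 could look like this, assuming the Deployment is named `vergabe-teilnahme` and that `python` on the container's `PATH` resolves to the project's virtualenv. The function names are hypothetical, and the static asset path must be taken from the rendered page. `content_type_of` reads response headers from stdin; sending `HX-Request: true` mimics an HTMX partial request, since HTMX adds that header to the requests it issues.

```bash
# Hypothetical helpers for T07 (kubectl exec variant).
NS=vergabe-teilnahme
BASE=https://vergabe-teilnahme.whywhynot.de

manage() {
  kubectl -n "$NS" exec deploy/vergabe-teilnahme -- python manage.py "$@"
}

migrate_and_seed() {
  manage migrate --noinput
  manage seed
  # createsuperuser prompts interactively, so it needs a TTY.
  kubectl -n "$NS" exec -it deploy/vergabe-teilnahme -- python manage.py createsuperuser
}

# Print the lowercased media type (parameters dropped) from headers on stdin.
content_type_of() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { split($2, a, ";"); print tolower(a[1]); exit }'
}

# Usage: smoke_static_asset /static/dist/<file-from-page> text/css
smoke_static_asset() {
  [ "$(curl -fsSI "$BASE$1" | content_type_of)" = "$2" ]
}

# Usage: smoke_htmx /some/partial/endpoint
smoke_htmx() {
  curl -fsS -o /dev/null -H 'HX-Request: true' "$BASE$1"
}
```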

---
### T08 — Document handoff, runbook, and backup posture
```task
id: RAILIANCE-WP-0002-T08
status: todo
priority: medium
```
Capture everything an on-call operator needs.
Deliverables in `docs/vergabe-teilnahme.md`:
- Registry image naming and tag scheme.
- Namespace, Deployment, Service, Ingress names.
- DB DSN handling (where secrets live, how to rotate).
- Restart, rollback (`helm rollback`), and migration commands.
- Backup posture: confirm whether the shared cnpg cluster's backup job
includes `vergabe_db`; if not, open a `railiance-platform` follow-up.
- Pointer to the vergabe-teilnahme repo for app-level changes vs.
`railiance-apps` for Helm/ingress changes.

**Done when:** a new operator can find vergabe-teilnahme, deploy a new
image tag, and recover from a pod crash without reading this workplan.
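
A possible core for the runbook's command section, assuming the release, namespace, and Deployment are all named `vergabe-teilnahme` and that the shared cnpg cluster lives in `databases`; the function names are hypothetical. A Helm rollback restores the previous manifests and image but does not reverse Django migrations. For the backup question, CloudNativePG backups are physical (base backups plus WAL archiving) and so should cover every database in a cluster; the check then reduces to whether backups are configured for the shared cluster at all.

```bash
# Hypothetical runbook helpers for T08.
NS=vergabe-teilnahme
RELEASE=vergabe-teilnahme

vergabe_restart() {
  kubectl -n "$NS" rollout restart deployment/vergabe-teilnahme
  kubectl -n "$NS" rollout status deployment/vergabe-teilnahme --timeout=5m
}

# Roll back to REVISION (see `helm history`); with no argument, Helm
# rolls back to the previous release.
vergabe_rollback() {
  helm -n "$NS" history "$RELEASE"
  helm -n "$NS" rollback "$RELEASE" ${1:+"$1"} --wait
}

# Show whether the shared cluster has scheduled and completed backups.
vergabe_backup_posture() {
  kubectl -n databases get scheduledbackups.postgresql.cnpg.io
  kubectl -n databases get backups.postgresql.cnpg.io
}
```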
## Completion Criteria
This workplan is complete when:
1. The vergabe-teilnahme image is published to
`gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>`.
2. A dedicated PostgreSQL role and database serve the app from the
shared cnpg cluster.
3. `helm/vergabe-teilnahme-values.sops.yaml` and the ingress manifest
are committed; `make vergabe-deploy` is the single command to deploy.
4. `https://vergabe-teilnahme.whywhynot.de` serves the app over HTTPS
with a valid Let's Encrypt cert.
5. Migrations + seed have run; the smoke checklist passes.
6. Runbook is in `docs/vergabe-teilnahme.md`.
## Notes
- This is the second application on `railiance01` (after Gitea). It
intentionally adopts the same SOPS + Helm + Traefik + cert-manager
pattern so the operator workflow stays consistent.
- v1 deliberately defers: 3-stage canary (Staged Promotion Lifecycle is
still 0/7), SSO/Keycloak integration, S3-backed media storage,
Celery + Redis workers (optional in the architecture blueprint), and
multi-replica HA. Each can become its own follow-up workplan once the
baseline runs.
- The `whywhynot.de` domain enters the Railiance stack for the first
time here. Treat the DNS path established in T01/T06 as the reference
for any future `*.whywhynot.de` workloads.