# vergabe-teilnahme: operator runbook

Production deployment of the Django tender-management app, shipped under `RAILIANCE-WP-0002`.

## Identity

| | |
|---|---|
| Public URL | https://vergabe-teilnahme.whywhynot.de |
| Namespace | `vergabe-teilnahme` |
| Helm release | `vergabe-teilnahme` |
| Chart | `charts/vergabe-teilnahme/` |
| Values | `helm/vergabe-teilnahme-values.yaml` (plain, no SOPS) |
| Ingress | `manifests/vergabe-teilnahme-ingress.yaml` |
| Image | `gitea.coulomb.social/coulomb/vergabe-teilnahme:` |
| Database | `vergabe_db` on shared cnpg `apps-pg` (see `railiance-platform/docs/apps-pg.md`) |
| TLS | `vergabe-teilnahme-tls`, issued by cert-manager `letsencrypt-prod` |

## Secrets

Two K8s Secrets in the `vergabe-teilnahme` namespace:

| Secret | Type | Source of truth | Used for |
|--------|------|-----------------|----------|
| `vergabe-app-credentials` | `kubernetes.io/basic-auth` | mirror of `databases/vergabe-app-credentials` (cnpg-owned) | raw DB role credential |
| `vergabe-teilnahme-env` | `Opaque` | created by operator | `SECRET_KEY` + URL-encoded `DATABASE_URL` (envFrom on the Deployment) |

**No SOPS encryption** for this app: all sensitive material lives in K8s Secrets, not in committed values files.

### Rotating the DB password

1. Have `railiance-platform` rotate the cnpg-managed Secret (`databases/vergabe-app-credentials`).
2. Mirror the new password into `vergabe-teilnahme/vergabe-app-credentials`.
3. Rebuild `DATABASE_URL` in `vergabe-teilnahme-env`, **URL-encoding the password** (the base64 character set otherwise breaks the URL parser; see `RAILIANCE-WP-0004 I01`):

   ```bash
   make vergabe-db-url-secret
   kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
   ```

### Rotating `SECRET_KEY`

Rotating the Django `SECRET_KEY` invalidates active sessions but otherwise causes no downtime:

```bash
NEW=$(openssl rand -base64 50 | tr -d '\n' | tr '/+=' 'abc')
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
  --type=merge -p "{\"stringData\":{\"SECRET_KEY\":\"$NEW\"}}"
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
```

## Day-to-day commands

```bash
make vergabe-status     # pods, svc, ingress, certificate
make vergabe-logs       # tail app logs
make vergabe-dry-run    # helm template render (audit values)
make vergabe-deploy     # helm upgrade --install (idempotent)
make vergabe-migrate    # manage.py migrate against live deploy
make vergabe-seed       # seed_dev: DEV ONLY, creates max.muster/testpass123 (do not run in prod)
make vergabe-superuser  # interactive createsuperuser
```

## Promoting a new image tag

1. Build and push from the `vergabe-teilnahme` repo using the portable package path: `issue-core` resolves from the Gitea PyPI registry, not from a sibling checkout.
2. Update `image.tag` in `helm/vergabe-teilnahme-values.yaml` to the new git SHA.
3. Run `make vergabe-deploy`. Helm rolls out a new ReplicaSet with zero downtime (`maxSurge: 1, maxUnavailable: 0`).
4. Verify via `make vergabe-status` and an HTTPS probe.
5. If migrations are needed, run `make vergabe-migrate` after the rollout completes.

## Rollback

```bash
helm history vergabe-teilnahme -n vergabe-teilnahme
helm rollback vergabe-teilnahme -n vergabe-teilnahme
```

Rollback does **not** unwind DB migrations. For any rollback that crosses a migration boundary, plan a `manage.py migrate ` reverse step explicitly.

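As a side note to the DB password rotation recipe earlier in this runbook, the URL-encoding step can be illustrated with a small pure-bash helper. This is a minimal sketch under stated assumptions, not the implementation behind `make vergabe-db-url-secret` (which this runbook does not show); the function name `urlencode` and the sample password are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percent-encode every byte outside the RFC 3986
# "unreserved" set so a password can be embedded in a DATABASE_URL.
urlencode() {
  local LC_ALL=C            # treat the input as bytes, not locale characters
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                     # unreserved: copy as-is
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;       # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Sample (made-up) base64-style password containing '/', '+' and '='.
urlencode 'aB/3+x=='
```

Run as written, the last line prints `aB%2F3%2Bx%3D%3D`. The encoded value is what belongs in the password position of `DATABASE_URL`; the raw password in `vergabe-app-credentials` stays unencoded.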
## Troubleshooting

### Pod stuck `Running` 0/1, kube-probe failing

Most likely the probe's `Host` header doesn't match `ALLOWED_HOSTS`. The chart sets `probes.hostHeader: vergabe-teilnahme.whywhynot.de` precisely to avoid this. If you change `ALLOWED_HOSTS` in values, also update `probes.hostHeader`. Symptom in `kubectl logs`: kube-probe requests returning HTTP 400. See `docs/django-on-railiance.md` for the reusable pattern.

### `dj-database-url` error: "The database name 'XYZ...' is longer than 63 characters"

The `DATABASE_URL` password isn't URL-encoded. See the rotation recipe above. Tracked in `RAILIANCE-WP-0004 I01`.

### Cert-manager: cert stuck in `False`

Check the Order/Challenge resources:

```bash
kubectl get order,challenge -n vergabe-teilnahme
kubectl describe challenge -n vergabe-teilnahme
```

Common causes: DNS has not yet propagated to all resolvers, Let's Encrypt is rate-limiting requests, or the ingress controller isn't forwarding `/.well-known/acme-challenge/` requests.

### `make vergabe-status` shows certificate `False`

The chart leaves the certificate lifecycle to cert-manager. If renewal fails, the old certificate remains in place until it expires. Investigate with `kubectl describe certificate vergabe-teilnahme-tls -n vergabe-teilnahme`.

## Data durability and restore readiness

`vergabe_db` lives on the shared `apps-pg` CNPG cluster owned by `railiance-platform`. S5 owns the app release runbook and post-restore app checks; platform owns the database backup and restore mechanism.

Current status: `apps-pg` backup coverage is still platform follow-up work, so `vergabe-teilnahme` should not be treated as holding production-critical data until the gate in `docs/app-data-backup-restore-handoff.md` is satisfied.

A manual logical dump is a break-glass or inspection option, not the durable backup contract:

```bash
kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > "vergabe_db-$(date +%F).dump"
```

Before promotion beyond smoke or development use, record platform backup evidence, an isolated restore drill, migration result, health check, HTTPS smoke check, and representative app workflow verification.

## Deferred for v1

- Multi-replica HA (`replicaCount: 1`).
- Media-upload PVC (`persistence.media.enabled: false`; Django `MEDIA_ROOT` is in-pod ephemeral).
- 3-stage canary (the Staged Promotion Lifecycle workstream is still 0/7).
- SSO / Keycloak integration (Django built-in auth only).
- Celery + Redis workers.

## Cross-references

- Workplan: `workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md`
- Improvements backlog: `workplans/railiance-apps-WP-0004-app-deployment-improvements.md`
- Shared DB cluster: `railiance-platform/docs/apps-pg.md`
- Container registry: `/home/worsch/railiance-forge/docs/gitea-container-registry.md`
- Python package registry: `/home/worsch/railiance-forge/docs/gitea-package-registry.md`
- S5 app onboarding checklist: `docs/s5-app-onboarding-checklist.md`
- App data backup handoff: `docs/app-data-backup-restore-handoff.md`
- Manifest dry-run prerequisites: `docs/manifest-server-dry-run.md`
- Django deployment recipe: `docs/django-on-railiance.md`
- Operator setup: `docs/operator-setup.md`
- Operator recipes: `docs/operator-recipes.md`
- App source: https://gitea.coulomb.social/coulomb/vergabe-teilnahme