vergabe-teilnahme — operator runbook

Production deployment of the Django tender-management app, shipped under RAILIANCE-WP-0002.

Identity

| Field | Value |
| --- | --- |
| Public URL | https://vergabe-teilnahme.whywhynot.de |
| Namespace | vergabe-teilnahme |
| Helm release | vergabe-teilnahme |
| Chart | charts/vergabe-teilnahme/ |
| Values | helm/vergabe-teilnahme-values.yaml (plain, no SOPS) |
| Ingress | manifests/vergabe-teilnahme-ingress.yaml |
| Image | gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag> |
| Database | vergabe_db on shared cnpg apps-pg (see railiance-platform/docs/apps-pg.md) |
| TLS | vergabe-teilnahme-tls, issued by cert-manager letsencrypt-prod |

Secrets

Two K8s Secrets in the vergabe-teilnahme namespace:

| Secret | Type | Source of truth | Used for |
| --- | --- | --- | --- |
| vergabe-app-credentials | kubernetes.io/basic-auth | mirror of databases/vergabe-app-credentials (cnpg-owned) | raw DB role credential |
| vergabe-teilnahme-env | Opaque | created by operator | SECRET_KEY + URL-encoded DATABASE_URL (envFrom on the Deployment) |

No SOPS encryption for this app — all sensitive material lives in K8s Secrets, not in committed values files.

Rotating the DB password

  1. Have railiance-platform rotate the cnpg-managed Secret (databases/vergabe-app-credentials).
  2. Mirror the new password into vergabe-teilnahme/vergabe-app-credentials.
  3. Rebuild DATABASE_URL in vergabe-teilnahme-env, URL-encoding the password (the base64 character set includes /, + and =, which break the URL parser otherwise; see RAILIANCE-WP-0004 I01), then restart the Deployment:
    make vergabe-db-url-secret
    kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
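
The encoding in step 3 can be sketched in plain POSIX shell. This is an illustration only, not the contents of the make vergabe-db-url-secret target, which this runbook does not show; the function name urlencode and the sample password are made up for the example.

```shell
#!/bin/sh
# Percent-encode every character outside the RFC 3986 "unreserved" set.
# Assumes ASCII input, which holds for base64-style passwords.
# The body runs in a subshell ( ... ) so the helper variables do not leak.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # "'c" yields the character code
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Sample value only; a real password would come from the mirrored Secret.
sample_password='pa/ss+wo=rd'
urlencode "$sample_password"   # prints pa%2Fss%2Bwo%3Drd
```

The unencoded / is the character that causes the most trouble: a URL parser treats everything after it as the path, which is how part of the password ends up being read as the database name.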
    

Rotating SECRET_KEY

Django SECRET_KEY rotation invalidates active sessions but is otherwise zero-downtime:

NEW=$(openssl rand -base64 50 | tr -d '\n' | tr '/+=' 'abc')
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
  --type=merge -p "{\"stringData\":{\"SECRET_KEY\":\"$NEW\"}}"
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme

Day-to-day commands

make vergabe-status      # pods, svc, ingress, certificate
make vergabe-logs        # tail app logs
make vergabe-dry-run     # helm template render (audit values)
make vergabe-deploy      # helm upgrade --install (idempotent)
make vergabe-migrate     # manage.py migrate against live deploy
make vergabe-seed        # seed_dev — DEV ONLY, creates max.muster/testpass123 (do not run in prod)
make vergabe-superuser   # interactive createsuperuser

Promoting a new image tag

  1. Build + push from the vergabe-teilnahme repo using the portable package path: issue-core must resolve from the Gitea PyPI registry, not from a sibling checkout. If issue-core==0.2.0 is not published yet, keep railiance-apps-WP-0004 I03 on hold until it is.
  2. Update image.tag in helm/vergabe-teilnahme-values.yaml to the new git SHA.
  3. Run make vergabe-deploy. Helm rolls out a new ReplicaSet with zero downtime (maxSurge: 1, maxUnavailable: 0).
  4. Verify via make vergabe-status and an HTTPS probe.
  5. If migrations are needed, run make vergabe-migrate after the rollout completes.
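
The HTTPS probe in step 4 could look like the sketch below. The helper names status_ok and probe and the 10-second timeout are illustrative choices, not part of the repo's Makefile; the URL is the public one from the Identity section.

```shell
#!/bin/sh
# Treat any 2xx or 3xx status as healthy. curl reports 000 when it received
# no HTTP response at all, which falls through to the failure branch.
status_ok() {
  case $1 in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch only the status code; the response body is discarded.
probe() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1") || return 1
  echo "HTTP $code from $1"
  status_ok "$code"
}

# Usage (needs network access, so it is left commented out here):
# probe https://vergabe-teilnahme.whywhynot.de/
```

Counting 3xx as healthy is a judgment call: it assumes the root URL may redirect, for example to a login page. Tighten the pattern to 2xx only if the probed path is expected to answer directly.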

Rollback

helm history vergabe-teilnahme -n vergabe-teilnahme
helm rollback vergabe-teilnahme <REVISION> -n vergabe-teilnahme

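Rollback does not unwind DB migrations. For any rollback that crosses a migration boundary, plan an explicit reverse migration (manage.py migrate <app> <name>, where <name> is the last migration the older release knows about). Run it while the newer image is still deployed, because the older image does not contain the migration files being reversed.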

Troubleshooting

Pod stuck Running 0/1, kube-probe failing

Most likely the probe's Host header doesn't match ALLOWED_HOSTS. The chart sets probes.hostHeader: vergabe-teilnahme.whywhynot.de precisely to avoid this — if you change ALLOWED_HOSTS in values, also update probes.hostHeader. Symptom in kubectl logs: kube-probe requests returning HTTP 400. See docs/django-on-railiance.md for the reusable pattern.
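
Before changing either value, the pairing can be checked locally with a sketch like the following. It is a simplified stand-in for Django's host matching (exact names, a leading-dot subdomain wildcard, and *); it ignores ports and case folding, and the function name host_allowed is hypothetical.

```shell
#!/bin/sh
# host_allowed HOST "entry1,entry2,..."
# Exit status 0 if HOST would match one of the comma-separated entries.
# Runs in a subshell so the IFS and globbing changes stay local.
host_allowed() (
  host=$1
  IFS=,
  set -f                      # keep a literal * entry from being glob-expanded
  for pattern in $2; do
    case $pattern in
      '*') exit 0 ;;          # wildcard entry matches every host
      .*)                     # ".example.com" matches example.com and subdomains
        case $host in
          "${pattern#.}"|*"$pattern") exit 0 ;;
        esac ;;
      *) [ "$host" = "$pattern" ] && exit 0 ;;
    esac
  done
  exit 1
)

# The probe host header from the chart against a sample ALLOWED_HOSTS value.
host_allowed vergabe-teilnahme.whywhynot.de 'vergabe-teilnahme.whywhynot.de,localhost' \
  && echo "probe host header is allowed"
```

A check such as host_allowed 10.0.0.5 'vergabe-teilnahme.whywhynot.de' fails, which mirrors what happens when an httpGet probe falls back to the kubelet default of sending the pod IP as the Host header.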

dj-database-url error: "The database name 'XYZ...' is longer than 63 characters"

The DATABASE_URL password isn't URL-encoded. See the rotation recipe above. Tracked in RAILIANCE-WP-0004 I01.

Cert-manager: certificate stuck at READY False

Check the Order/Challenge resources:

kubectl get order,challenge -n vergabe-teilnahme
kubectl describe challenge -n vergabe-teilnahme

Common causes: DNS has not yet propagated to all resolvers, Let's Encrypt is rate-limiting the domain, or the ingress controller is not forwarding /.well-known/acme-challenge/ requests.

make vergabe-status shows certificate False

The chart leaves certificate lifecycle to cert-manager. If a renewal fails, the existing certificate keeps being served until it expires. Investigate with kubectl describe certificate vergabe-teilnahme-tls -n vergabe-teilnahme.

Backup posture (open)

The shared apps-pg cluster is not yet covered by an automated backup job — only the legacy PostgreSQL-HA setup is. Manual logical dump for now:

kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > vergabe_db-$(date +%F).dump

Tracked as a follow-up in RAILIANCE-WP-0003 Notes (CNPG backup configuration belongs to railiance-platform).
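
Because the output redirect in the dump command runs in the local shell, a failed kubectl exec can leave an empty or partial file behind. A minimal sanity check, assuming the custom format (-Fc) used above, is sketched here; the function name verify_dump is hypothetical.

```shell
#!/bin/sh
# verify_dump FILE
# Exit status 0 if FILE is non-empty and starts with the 5-byte magic string
# "PGDMP" that pg_dump writes at the start of custom-format archives.
verify_dump() {
  [ -s "$1" ] || { echo "empty or missing: $1" >&2; return 1; }
  magic=$(dd if="$1" bs=5 count=1 2>/dev/null)
  [ "$magic" = PGDMP ] || { echo "not a custom-format dump: $1" >&2; return 1; }
}

# Usage after taking a dump (file name as produced by the command above):
# verify_dump "vergabe_db-$(date +%F).dump" && echo "dump header looks fine"
```

This only inspects the header, so it cannot detect a dump that was cut off midway. Running pg_restore --list on the file is a fuller check, since it has to read the archive's table of contents.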

Deferred for v1

  • Multi-replica HA (replicaCount: 1).
  • Media-upload PVC (persistence.media.enabled: false — Django MEDIA_ROOT is in-pod ephemeral).
  • 3-stage canary (the Staged Promotion Lifecycle workstream is still 0/7).
  • SSO / Keycloak integration (Django built-in auth only).
  • Celery + Redis workers.

Cross-references

  • Workplan: workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
  • Improvements backlog: workplans/railiance-apps-WP-0004-app-deployment-improvements.md
  • Shared DB cluster: railiance-platform/docs/apps-pg.md
  • Container registry: /home/worsch/railiance-forge/docs/gitea-container-registry.md
  • Python package registry: /home/worsch/railiance-forge/docs/gitea-package-registry.md
  • Django deployment recipe: docs/django-on-railiance.md
  • Operator setup: docs/operator-setup.md
  • Operator recipes: docs/operator-recipes.md
  • App source: https://gitea.coulomb.social/coulomb/vergabe-teilnahme