vergabe-teilnahme — operator runbook

Production deployment of the Django tender-management app, shipped under RAILIANCE-WP-0002.

Identity

| Item | Value |
|---|---|
| Public URL | https://vergabe-teilnahme.whywhynot.de |
| Namespace | vergabe-teilnahme |
| Helm release | vergabe-teilnahme |
| Chart | charts/vergabe-teilnahme/ |
| Values | helm/vergabe-teilnahme-values.yaml (plain, no SOPS) |
| Ingress | manifests/vergabe-teilnahme-ingress.yaml |
| Image | gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag> |
| Database | vergabe_db on shared cnpg apps-pg (see railiance-platform/docs/apps-pg.md) |
| TLS | vergabe-teilnahme-tls, issued by cert-manager letsencrypt-prod |

Secrets

Two K8s Secrets in the vergabe-teilnahme namespace:

| Secret | Type | Source of truth | Used for |
|---|---|---|---|
| vergabe-app-credentials | kubernetes.io/basic-auth | mirror of databases/vergabe-app-credentials (cnpg-owned) | raw DB role credential |
| vergabe-teilnahme-env | Opaque | created by operator | SECRET_KEY + URL-encoded DATABASE_URL (envFrom on the Deployment) |

No SOPS encryption for this app — all sensitive material lives in K8s Secrets, not in committed values files.

Rotating the DB password

  1. Have railiance-platform rotate the cnpg-managed Secret (databases/vergabe-app-credentials).
  2. Mirror the new password into vergabe-teilnahme/vergabe-app-credentials.
  3. Rebuild DATABASE_URL in vergabe-teilnahme-env, URL-encoding the password. The base64 character set includes /, + and =, and an unencoded / breaks the URL parser (see RAILIANCE-WP-0004 I01):
    make vergabe-db-url-secret
    kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
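
What the URL-encoding step above has to achieve can be sketched as follows. This is a minimal illustration and not the contents of make vergabe-db-url-secret, which may work differently. The function names urlencode and build_database_url are hypothetical, and the sketch assumes an ASCII password such as base64 output.

```shell
#!/usr/bin/env bash
# Sketch: percent-encode a password before embedding it in DATABASE_URL.
# Assumes an ASCII password (for example base64 output).

# Encode every character outside the RFC 3986 "unreserved" set
# (A-Z a-z 0-9 - . _ ~) as %XX.
urlencode() {
  local LC_ALL=C s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Assemble the URL. All five parts are supplied by the caller;
# only the password is encoded.
build_database_url() {
  local user="$1" password="$2" host="$3" port="$4" db="$5"
  printf 'postgres://%s:%s@%s:%s/%s\n' \
    "$user" "$(urlencode "$password")" "$host" "$port" "$db"
}
```

For example, build_database_url some_role 'p/w+x=' db.example 5432 vergabe_db prints postgres://some_role:p%2Fw%2Bx%3D@db.example:5432/vergabe_db. The role name and host here are placeholders; take the real values from the mirrored vergabe-app-credentials Secret and railiance-platform/docs/apps-pg.md.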
    

Rotating SECRET_KEY

Django SECRET_KEY rotation invalidates active sessions but is otherwise zero-downtime:

NEW=$(openssl rand -base64 50 | tr -d '\n' | tr '/+=' 'abc')
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
  --type=merge -p "{\"stringData\":{\"SECRET_KEY\":\"$NEW\"}}"
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme

Day-to-day commands

make vergabe-status      # pods, svc, ingress, certificate
make vergabe-logs        # tail app logs
make vergabe-dry-run     # helm template render (audit values)
make vergabe-deploy      # helm upgrade --install (idempotent)
make vergabe-migrate     # manage.py migrate against live deploy
make vergabe-seed        # seed_dev — DEV ONLY, creates max.muster/testpass123 (do not run in prod)
make vergabe-superuser   # interactive createsuperuser

Promoting a new image tag

  1. Build + push from the vergabe-teilnahme repo using the portable package path: issue-core must resolve from the Gitea PyPI registry, not from a sibling checkout. If issue-core==0.2.0 is not published yet, keep railiance-apps-WP-0004 I03 in wait.
  2. Update image.tag in helm/vergabe-teilnahme-values.yaml to the new git SHA.
  3. make vergabe-deploy: Helm upgrades the release and the Deployment rolls out a new ReplicaSet with zero downtime (maxSurge: 1, maxUnavailable: 0).
  4. Verify via make vergabe-status and an HTTPS probe.
  5. If migrations are needed, run make vergabe-migrate after the rollout completes.
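
The HTTPS probe in step 4 can be scripted so that it tolerates a few seconds of warm-up after the rollout. The sketch below is one possible shape. The function name wait_for_https is hypothetical, and it assumes that the probed URL answers with a 2xx or 3xx status when the app is healthy.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it answers with a 2xx/3xx status.

# wait_for_https <url> [attempts] [delay-seconds]
wait_for_https() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" code="" i
  for (( i = 1; i <= attempts; i++ )); do
    # curl prints only the status code; connection errors yield 000.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    case "$code" in
      2??|3??) echo "ok after $i attempt(s): HTTP $code"; return 0 ;;
    esac
    sleep "$delay"
  done
  echo "giving up: last status ${code:-none}" >&2
  return 1
}
```

A typical call after make vergabe-deploy would be kubectl rollout status deploy/vergabe-teilnahme -n vergabe-teilnahme --timeout=180s followed by wait_for_https https://vergabe-teilnahme.whywhynot.de/.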

Rollback

helm history vergabe-teilnahme -n vergabe-teilnahme
helm rollback vergabe-teilnahme <REVISION> -n vergabe-teilnahme

Rollback does not unwind DB migrations. For any rollback that crosses a migration boundary, plan a manage.py migrate <app> <name> reverse step explicitly.
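
Ordering matters for the reverse step: Django can only unapply a migration while its migration file is present in the running image, so the reverse migration generally has to run before the Helm rollback replaces the newer image. A dry-run-by-default sketch of that sequence follows. The function names are hypothetical, and it assumes that python manage.py works from the container's working directory.

```shell
#!/usr/bin/env bash
# Sketch: roll back across a migration boundary.
# Prints the commands unless DRY_RUN=0 is set.

NS=vergabe-teilnahme
RELEASE=vergabe-teilnahme

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# rollback_with_migrations <helm-revision> <django-app> <migration-name>
rollback_with_migrations() {
  local revision="$1" app="$2" migration="$3"
  # 1. Reverse migrations while the newer image is still running.
  #    Assumption: manage.py is in the container's working directory.
  run kubectl exec -n "$NS" "deploy/$RELEASE" -- \
    python manage.py migrate "$app" "$migration" || return 1
  # 2. Only then roll the Helm release back.
  run helm rollback "$RELEASE" "$revision" -n "$NS" || return 1
  # 3. Wait for the older ReplicaSet to become ready.
  run kubectl rollout status "deploy/$RELEASE" -n "$NS"
}
```

Calling rollback_with_migrations <REVISION> <app> <name> first prints the three commands for review; rerun with DRY_RUN=0 to execute them.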

Troubleshooting

Pod stuck Running 0/1, kube-probe failing

Most likely the probe's Host header doesn't match ALLOWED_HOSTS. The chart sets probes.hostHeader: vergabe-teilnahme.whywhynot.de precisely to avoid this — if you change ALLOWED_HOSTS in values, also update probes.hostHeader. Symptom in kubectl logs: kube-probe requests returning HTTP 400. See docs/django-on-railiance.md for the reusable pattern.
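
To confirm the diagnosis, the probe can be reproduced by hand with and without the expected Host header. The sketch below assumes a local port-forward to the pod; the function names, the container port 8000, and the probed path / are assumptions, so check the chart's probe settings for the real values.

```shell
#!/usr/bin/env bash
# Sketch: compare HTTP status with and without an explicit Host header.

# http_status <url> [host-header]: prints the status code only.
http_status() {
  local url="$1" host="${2:-}"
  if [ -n "$host" ]; then
    curl -s -o /dev/null -w '%{http_code}' -H "Host: $host" "$url"
  else
    curl -s -o /dev/null -w '%{http_code}' "$url"
  fi
}

# diagnose_host_header <url> <expected-host>
diagnose_host_header() {
  local url="$1" host="$2" with without
  with="$(http_status "$url" "$host")"
  without="$(http_status "$url")"
  echo "Host=$host -> HTTP $with; default Host -> HTTP $without"
  if [ "$with" = "400" ]; then
    echo "ALLOWED_HOSTS does not seem to accept '$host'"
  else
    echo "'$host' is accepted; confirm probes.hostHeader sends this value"
  fi
}
```

With kubectl port-forward -n vergabe-teilnahme deploy/vergabe-teilnahme 8000:8000 running in another terminal, call diagnose_host_header http://127.0.0.1:8000/ vergabe-teilnahme.whywhynot.de.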

dj-database-url error: "The database name 'XYZ...' is longer than 63 characters"

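The DATABASE_URL password isn't URL-encoded: an unencoded / in the password ends the host section of the URL early, so everything after it is parsed as the database name. See the rotation recipe above. Tracked in RAILIANCE-WP-0004 I01.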

cert-manager: Certificate stuck at Ready=False

Check the Order/Challenge resources:

kubectl get order,challenge -n vergabe-teilnahme
kubectl describe challenge -n vergabe-teilnahme

Common causes: DNS not yet propagated to all resolvers, Let's Encrypt rate-limited, or the ingress controller isn't forwarding /.well-known/acme-challenge/ requests.

make vergabe-status shows certificate False

The chart leaves the certificate lifecycle to cert-manager. If a renewal fails, the existing certificate keeps being served until it expires. Investigate with kubectl describe certificate vergabe-teilnahme-tls -n vergabe-teilnahme.

Data durability and restore readiness

vergabe_db lives on the shared apps-pg CNPG cluster owned by railiance-platform. S5 owns the app release runbook and post-restore app checks; platform owns the database backup and restore mechanism.

Current status: apps-pg backup coverage is still platform follow-up work, so vergabe-teilnahme should not be treated as production-critical data until the gate in docs/app-data-backup-restore-handoff.md is satisfied.

Manual logical dump is a break-glass or inspection option, not the durable backup contract:

kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > vergabe_db-$(date +%F).dump
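
Because the dump is written through a shell redirect, a failed kubectl exec can leave behind an empty file or a file containing only an error message. A cheap sanity check is sketched below; the function name is hypothetical. It relies on custom-format (-Fc) archives starting with the magic string PGDMP, and it does not prove that the dump is restorable.

```shell
#!/usr/bin/env bash
# Sketch: cheap sanity check for a pg_dump custom-format (-Fc) archive.

# is_pg_custom_dump <file>: succeeds if the file is non-empty and
# starts with the 5-byte magic string "PGDMP".
is_pg_custom_dump() {
  local file="$1"
  [ -s "$file" ] || return 1          # missing or empty file
  [ "$(head -c 5 "$file")" = "PGDMP" ]
}
```

For a more thorough check, pg_restore --list on the dump file prints the archive's table of contents without touching any database.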

Before promotion beyond smoke or development use, record platform backup evidence, an isolated restore drill, migration result, health check, HTTPS smoke check, and representative app workflow verification.

Deferred for v1

  • Multi-replica HA (replicaCount: 1).
  • Media-upload PVC (persistence.media.enabled: false — Django MEDIA_ROOT is in-pod ephemeral).
  • 3-stage canary (the Staged Promotion Lifecycle workstream is still 0/7).
  • SSO / Keycloak integration (Django built-in auth only).
  • Celery + Redis workers.

Cross-references

  • Workplan: workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
  • Improvements backlog: workplans/railiance-apps-WP-0004-app-deployment-improvements.md
  • Shared DB cluster: railiance-platform/docs/apps-pg.md
  • Container registry: railiance-forge/docs/gitea-container-registry.md
  • Python package registry: railiance-forge/docs/gitea-package-registry.md
  • S5 app onboarding checklist: docs/s5-app-onboarding-checklist.md
  • App data backup handoff: docs/app-data-backup-restore-handoff.md
  • Manifest dry-run prerequisites: docs/manifest-server-dry-run.md
  • Django deployment recipe: docs/django-on-railiance.md
  • Operator setup: docs/operator-setup.md
  • Operator recipes: docs/operator-recipes.md
  • App source: https://gitea.coulomb.social/coulomb/vergabe-teilnahme