vergabe-teilnahme — operator runbook
Production deployment of the Django tender-management app, shipped
under RAILIANCE-WP-0002.
Identity
| Public URL | https://vergabe-teilnahme.whywhynot.de |
| Namespace | vergabe-teilnahme |
| Helm release | vergabe-teilnahme |
| Chart | charts/vergabe-teilnahme/ |
| Values | helm/vergabe-teilnahme-values.yaml (plain — no SOPS) |
| Ingress | manifests/vergabe-teilnahme-ingress.yaml |
| Image | gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag> |
| Database | vergabe_db on shared cnpg apps-pg (see railiance-platform/docs/apps-pg.md) |
| TLS | vergabe-teilnahme-tls, issued by cert-manager letsencrypt-prod |
Secrets
Two K8s Secrets in the vergabe-teilnahme namespace:
| Secret | Type | Source of truth | Used for |
|---|---|---|---|
| vergabe-app-credentials | kubernetes.io/basic-auth | mirror of databases/vergabe-app-credentials (cnpg-owned) | raw DB role credential |
| vergabe-teilnahme-env | Opaque | created by operator | SECRET_KEY + URL-encoded DATABASE_URL (envFrom on the Deployment) |
No SOPS encryption for this app — all sensitive material lives in K8s Secrets, not in committed values files.
Rotating the DB password
- Have railiance-platform rotate the cnpg-managed Secret (databases/vergabe-app-credentials).
- Mirror the new password into vergabe-teilnahme/vergabe-app-credentials.
- Rebuild DATABASE_URL in vergabe-teilnahme-env, URL-encoding the password (base64 passwords can contain /, + and =, which break the URL parser otherwise; see RAILIANCE-WP-0004 I01):
make vergabe-db-url-secret
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
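The URL-encoding step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind make vergabe-db-url-secret: the helper name build_database_url and the role name, host, and port passed to it are hypothetical placeholders.

```python
from urllib.parse import quote, unquote, urlsplit


def build_database_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a postgres:// URL with the credentials percent-encoded."""
    # safe="" forces "/" to be encoded as well; quote() leaves it alone by default.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    return f"postgres://{enc_user}:{enc_password}@{host}:{port}/{dbname}"


# Hypothetical values, for illustration only.
url = build_database_url("vergabe_app", "a/b+c=", "apps-pg-rw.databases.svc", 5432, "vergabe_db")
print(url)

# Round trip: a URL parser recovers the original password after decoding.
assert unquote(urlsplit(url).password) == "a/b+c="
```

Encoding with safe="" matters because the default would leave "/" untouched, which is exactly the character that ends the authority part of a URL.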
Rotating SECRET_KEY
Django SECRET_KEY rotation invalidates active sessions but is
otherwise zero-downtime:
NEW=$(openssl rand -base64 50 | tr -d '\n' | tr '/+=' 'abc')
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
--type=merge -p "{\"stringData\":{\"SECRET_KEY\":\"$NEW\"}}"
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
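As an alternative to hand-escaped JSON, the key and the patch body could be produced in Python. This is only a sketch, not the documented procedure: make_secret_key and merge_patch are hypothetical helper names, and the resulting string would still be handed to kubectl patch with --type=merge.

```python
import json
import secrets


def make_secret_key(nbytes: int = 50) -> str:
    """Return a random URL-safe string built from nbytes of OS randomness."""
    # token_urlsafe only emits A-Z, a-z, 0-9, "-" and "_", so no characters
    # have to be substituted afterwards.
    return secrets.token_urlsafe(nbytes)


def merge_patch(secret_key: str) -> str:
    """Build the JSON merge-patch body that sets SECRET_KEY via stringData."""
    # json.dumps takes care of quoting, so the key is never spliced into
    # a JSON string by hand.
    return json.dumps({"stringData": {"SECRET_KEY": secret_key}})
```

Calling merge_patch(make_secret_key()) yields a body with the same shape as the -p argument above.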
Day-to-day commands
make vergabe-status # pods, svc, ingress, certificate
make vergabe-logs # tail app logs
make vergabe-dry-run # helm template render (audit values)
make vergabe-deploy # helm upgrade --install (idempotent)
make vergabe-migrate # manage.py migrate against live deploy
make vergabe-seed # seed_dev — DEV ONLY, creates max.muster/testpass123 (do not run in prod)
make vergabe-superuser # interactive createsuperuser
Promoting a new image tag
- Build and push from the vergabe-teilnahme repo using the portable package path: issue-core must resolve from the Gitea PyPI registry, not from a sibling checkout. If issue-core==0.2.0 is not published yet, keep railiance-apps-WP-0004 I03 in wait.
- Update image.tag in helm/vergabe-teilnahme-values.yaml to the new git SHA.
- Run make vergabe-deploy. Helm rolls a new ReplicaSet with zero downtime (maxSurge: 1, maxUnavailable: 0).
- Verify via make vergabe-status and an HTTPS probe.
- If migrations are needed, run make vergabe-migrate after the rollout completes.
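The HTTPS probe in the verification step is not specified further. One way to script it, assuming a plain GET against the public URL is enough, is the sketch below. The function name https_probe and its opener parameter are hypothetical; the parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def https_probe(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> int:
    """Return the HTTP status code for url, including 4xx/5xx responses."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still the answer we want.
        return error.code
```

A call such as https_probe("https://vergabe-teilnahme.whywhynot.de") returns the final status after redirects. TLS or connection failures are deliberately not caught, so a broken certificate surfaces as an exception rather than a status code.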
Rollback
helm history vergabe-teilnahme -n vergabe-teilnahme
helm rollback vergabe-teilnahme <REVISION> -n vergabe-teilnahme
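Rollback does not unwind DB migrations. For any rollback that
crosses a migration boundary, explicitly plan a reverse step with
manage.py migrate <app> <name>, where <name> is the last migration
the target release expects. Run it while the newer image is still
deployed, since only that image contains the migration files being
unapplied.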
Troubleshooting
Pod stuck Running 0/1, kube-probe failing
Most likely the probe's Host header doesn't match
ALLOWED_HOSTS. The chart sets probes.hostHeader: vergabe-teilnahme.whywhynot.de precisely to avoid this — if you
change ALLOWED_HOSTS in values, also update probes.hostHeader.
Symptom in kubectl logs: kube-probe requests returning HTTP 400.
See docs/django-on-railiance.md for the reusable pattern.
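The mismatch can be reasoned about with a small approximation of how Django matches a Host header against ALLOWED_HOSTS. This is a simplified sketch, not Django's own code: host_allowed is a hypothetical name, the port is assumed to be stripped already, and the pod IP is made up. A kubelet HTTP probe without an explicit Host header addresses the pod by IP, which is why it fails the check.

```python
def host_allowed(host: str, allowed_hosts: list[str]) -> bool:
    """Approximate Django's ALLOWED_HOSTS matching for a Host header value."""
    host = host.lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True  # wildcard accepts any host
        if pattern.startswith("."):
            # ".example.com" matches example.com and every subdomain of it
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False


allowed = ["vergabe-teilnahme.whywhynot.de"]
print(host_allowed("vergabe-teilnahme.whywhynot.de", allowed))  # True
print(host_allowed("10.42.0.17", allowed))  # False (hypothetical pod IP)
```

Setting probes.hostHeader to a name that passes this check keeps the probe from being rejected with HTTP 400.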
dj-database-url error: "The database name 'XYZ...' is longer than 63 characters"
The DATABASE_URL password isn't URL-encoded. See the rotation
recipe above. Tracked in RAILIANCE-WP-0004 I01.
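A plausible mechanism for this error can be reproduced with Python's standard URL parser. The sketch assumes that the DATABASE_URL is split the way urllib.parse does it; the role name, password, and host are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical credentials and host; the password contains a raw "/".
broken = "postgres://vergabe_app:pa/ss+word=@apps-pg-rw.databases.svc:5432/vergabe_db"
parts = urlsplit(broken)

# The authority component ends at the first "/", so the password is cut short...
print(parts.netloc)  # vergabe_app:pa
# ...and the remainder is treated as the path, which becomes the database name.
print(parts.path)    # /ss+word=@apps-pg-rw.databases.svc:5432/vergabe_db
```

With a full-length base64 password, the leftover password fragment plus host, port, and real database name can easily exceed 63 characters, which would match the error text.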
Cert-manager: certificate stuck at Ready=False
Check the Order/Challenge resources:
kubectl get order,challenge -n vergabe-teilnahme
kubectl describe challenge -n vergabe-teilnahme
Common causes: DNS not yet propagated to all resolvers, Let's
Encrypt rate-limited, or the ingress controller isn't forwarding
/.well-known/acme-challenge/ requests.
make vergabe-status shows certificate False
The chart leaves cert lifecycle to cert-manager. If a renewal
fails, the existing cert stays in place and keeps being served until it expires.
Investigate with kubectl describe certificate vergabe-teilnahme-tls -n vergabe-teilnahme.
Backup posture (open)
The shared apps-pg cluster is not yet covered by an automated
backup job — only the legacy PostgreSQL-HA setup is. Manual logical
dump for now:
kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > vergabe_db-$(date +%F).dump
Tracked as a follow-up in RAILIANCE-WP-0003 Notes (CNPG backup
configuration belongs to railiance-platform).
Deferred for v1
- Multi-replica HA (replicaCount: 1).
- Media-upload PVC (persistence.media.enabled: false; Django MEDIA_ROOT is in-pod ephemeral).
- 3-stage canary (the Staged Promotion Lifecycle workstream is still 0/7).
- SSO / Keycloak integration (Django built-in auth only).
- Celery + Redis workers.
Cross-references
- Workplan: workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
- Improvements backlog: workplans/railiance-apps-WP-0004-app-deployment-improvements.md
- Shared DB cluster: railiance-platform/docs/apps-pg.md
- Container registry: /home/worsch/railiance-forge/docs/gitea-container-registry.md
- Python package registry: /home/worsch/railiance-forge/docs/gitea-package-registry.md
- Django deployment recipe: docs/django-on-railiance.md
- Operator setup: docs/operator-setup.md
- Operator recipes: docs/operator-recipes.md
- App source: https://gitea.coulomb.social/coulomb/vergabe-teilnahme