vergabe-teilnahme — operator runbook
Production deployment of the Django tender-management app, shipped
under RAILIANCE-WP-0002.
Identity
| Public URL | https://vergabe-teilnahme.whywhynot.de |
| Namespace | vergabe-teilnahme |
| Helm release | vergabe-teilnahme |
| Chart | charts/vergabe-teilnahme/ |
| Values | helm/vergabe-teilnahme-values.yaml (plain — no SOPS) |
| Ingress | manifests/vergabe-teilnahme-ingress.yaml |
| Image | gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag> |
| Database | vergabe_db on shared cnpg apps-pg (see railiance-platform/docs/apps-pg.md) |
| TLS | vergabe-teilnahme-tls, issued by cert-manager letsencrypt-prod |
Secrets
Two K8s Secrets in the vergabe-teilnahme namespace:
| Secret | Type | Source of truth | Used for |
|---|---|---|---|
| vergabe-app-credentials | kubernetes.io/basic-auth | mirror of databases/vergabe-app-credentials (cnpg-owned) | raw DB role credential |
| vergabe-teilnahme-env | Opaque | created by operator | SECRET_KEY + URL-encoded DATABASE_URL (envFrom on the Deployment) |
No SOPS encryption for this app — all sensitive material lives in K8s Secrets, not in committed values files.
Rotating the DB password
- Have railiance-platform rotate the cnpg-managed Secret (databases/vergabe-app-credentials).
- Mirror the new password into vergabe-teilnahme/vergabe-app-credentials.
- Rebuild DATABASE_URL in vergabe-teilnahme-env, URL-encoding the password (the base64 character set breaks the URL parser otherwise; see RAILIANCE-WP-0004 I01):
  make vergabe-db-url-secret
  kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
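The URL-encoding step can be sketched in Python as below. This is a minimal illustration, not the implementation behind make vergabe-db-url-secret. The function name build_database_url is hypothetical, and the role name, password, host, and port in the usage line are placeholder assumptions.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a postgres:// URL with percent-encoded credentials."""
    # safe="" forces '/', '+', '=' and ':' to be percent-encoded as well;
    # by default quote() would leave '/' untouched.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    return f"postgres://{enc_user}:{enc_password}@{host}:{port}/{dbname}"


# Placeholder values for illustration only (not the real credential or host).
print(build_database_url("vergabe", "ab/cd+ef==", "apps-pg-rw.databases", 5432, "vergabe_db"))
# postgres://vergabe:ab%2Fcd%2Bef%3D%3D@apps-pg-rw.databases:5432/vergabe_db
```

Encoding with safe="" matters because base64 output can contain '/', which a URL parser would otherwise read as the start of the path.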
Rotating SECRET_KEY
Django SECRET_KEY rotation invalidates active sessions but is
otherwise zero-downtime:
NEW=$(openssl rand -base64 50 | tr -d '\n' | tr '/+=' 'abc')
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
--type=merge -p "{\"stringData\":{\"SECRET_KEY\":\"$NEW\"}}"
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
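As an alternative to the openssl and tr pipeline, the key and the merge patch can be generated in Python. This is a sketch, not part of the repo's tooling. The function name secret_key_patch is hypothetical, and it assumes that a URL-safe random token is an acceptable SECRET_KEY for this app.

```python
import json
import secrets


def secret_key_patch(nbytes: int = 50) -> str:
    """Return a JSON merge patch that sets a fresh SECRET_KEY."""
    # token_urlsafe emits only [A-Za-z0-9_-], so the value needs no
    # character substitution and no manual quote escaping.
    new_key = secrets.token_urlsafe(nbytes)
    # json.dumps handles the JSON quoting that the shell version does by hand.
    return json.dumps({"stringData": {"SECRET_KEY": new_key}})


patch = secret_key_patch()
# Hand the string to kubectl, e.g. via a shell variable (PATCH is a hypothetical name):
#   kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme --type=merge -p "$PATCH"
print(sorted(json.loads(patch)["stringData"]))  # ['SECRET_KEY']
```

The rollout restart is still required afterwards, since envFrom values are only read at pod start.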
Day-to-day commands
make vergabe-status # pods, svc, ingress, certificate
make vergabe-logs # tail app logs
make vergabe-dry-run # helm template render (audit values)
make vergabe-deploy # helm upgrade --install (idempotent)
make vergabe-migrate # manage.py migrate against live deploy
make vergabe-seed # seed_dev — DEV ONLY, creates max.muster/testpass123 (do not run in prod)
make vergabe-superuser # interactive createsuperuser
Promoting a new image tag
- Build + push from the vergabe-teilnahme repo using the portable package path: issue-core must resolve from the Gitea PyPI registry, not from a sibling checkout. If issue-core==0.2.0 is not published yet, keep railiance-apps-WP-0004 I03 in wait.
- Update image.tag in helm/vergabe-teilnahme-values.yaml to the new git SHA.
- make vergabe-deploy: Helm rolls a new ReplicaSet with zero downtime (maxSurge: 1, maxUnavailable: 0).
- Verify via make vergabe-status and an HTTPS probe.
- If migrations are needed, run make vergabe-migrate after the rollout completes.
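The HTTPS probe in the verify step could look like the sketch below. It assumes that a 2xx response on the public root URL is a sufficient smoke signal, which the runbook does not specify. The function name https_ok and the injectable opener parameter are hypothetical and exist only to make the check testable offline.

```python
import urllib.request


def https_ok(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the URL answers with a 2xx status (redirects followed)."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), TLS failures and socket timeouts
        # are all OSError subclasses, so every failure mode maps to False.
        return False


# Usage after make vergabe-deploy (performs a real network call):
#   https_ok("https://vergabe-teilnahme.whywhynot.de")
```

A TLS failure also returns False, so the same call catches an expired or missing certificate.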
Rollback
helm history vergabe-teilnahme -n vergabe-teilnahme
helm rollback vergabe-teilnahme <REVISION> -n vergabe-teilnahme
Rollback does not unwind DB migrations. For any rollback that
crosses a migration boundary, plan a manage.py migrate <app> <name>
reverse step explicitly.
Troubleshooting
Pod stuck Running 0/1, kube-probe failing
Most likely the probe's Host header doesn't match
ALLOWED_HOSTS. The chart sets probes.hostHeader: vergabe-teilnahme.whywhynot.de precisely to avoid this — if you
change ALLOWED_HOSTS in values, also update probes.hostHeader.
Symptom in kubectl logs: kube-probe requests returning HTTP 400.
See docs/django-on-railiance.md for the reusable pattern.
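The mismatch can be illustrated with a simplified approximation of Django's ALLOWED_HOSTS matching. This is not Django's own code: host_allowed and strip_port are hypothetical helpers, and the pod IP and port are made up. Without a Host header override, a kubelet httpGet probe addresses the pod by IP, so the Host value is not the public hostname.

```python
def strip_port(host: str) -> str:
    """Drop a trailing :port from a Host header value (hostname or IPv4 only)."""
    name, sep, port = host.rpartition(":")
    return name if sep and port.isdigit() else host


def host_allowed(host_header: str, allowed_hosts: list[str]) -> bool:
    """Approximate Django's ALLOWED_HOSTS check for one Host header value."""
    host = strip_port(host_header).lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True
        # A leading dot matches the domain itself and any subdomain.
        if pattern.startswith(".") and (host.endswith(pattern) or host == pattern[1:]):
            return True
    return False


allowed = ["vergabe-teilnahme.whywhynot.de"]
print(host_allowed("10.42.0.17:8000", allowed))                 # False -> Django answers 400
print(host_allowed("vergabe-teilnahme.whywhynot.de", allowed))  # True  -> probe passes
```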
dj-database-url error: "The database name 'XYZ...' is longer than 63 characters"
The DATABASE_URL password isn't URL-encoded. See the rotation
recipe above. Tracked in RAILIANCE-WP-0004 I01.
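The failure mode can be reproduced with the standard-library URL parser. This sketch assumes dj-database-url takes the database name from the URL path, as the error message suggests. The password and host are illustrative placeholders.

```python
from urllib.parse import quote, urlsplit

# Placeholder credentials and host, for illustration only.
password = "ab/cd+ef=="
raw = f"postgres://vergabe:{password}@apps-pg-rw.databases:5432/vergabe_db"
encoded = f"postgres://vergabe:{quote(password, safe='')}@apps-pg-rw.databases:5432/vergabe_db"

# The first '/' in the raw password ends the authority part early, so the
# rest of the password plus host and port land in the path component.
print(urlsplit(raw).path)      # /cd+ef==@apps-pg-rw.databases:5432/vergabe_db
print(urlsplit(encoded).path)  # /vergabe_db
```

With a real 32+ character password the mis-parsed path easily exceeds 63 characters, which matches the reported error.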
Cert-manager: cert stuck in False
Check the Order/Challenge resources:
kubectl get order,challenge -n vergabe-teilnahme
kubectl describe challenge -n vergabe-teilnahme
Common causes: DNS not yet propagated to all resolvers, Let's
Encrypt rate-limited, or the ingress controller isn't forwarding
/.well-known/acme-challenge/ requests.
make vergabe-status shows certificate False
The chart leaves cert lifecycle to cert-manager. If a renewal
fails, the old cert stays in place and keeps being served until it expires.
Investigate with kubectl describe certificate vergabe-teilnahme-tls -n vergabe-teilnahme.
Data durability and restore readiness
vergabe_db lives on the shared apps-pg CNPG cluster owned by
railiance-platform. S5 owns the app release runbook and post-restore app
checks; platform owns the database backup and restore mechanism.
Current status: apps-pg backup coverage is still platform follow-up work, so
vergabe-teilnahme should not be treated as production-critical data until the
gate in docs/app-data-backup-restore-handoff.md is satisfied.
Manual logical dump is a break-glass or inspection option, not the durable backup contract:
kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > vergabe_db-$(date +%F).dump
Before promotion beyond smoke or development use, record platform backup evidence, an isolated restore drill, migration result, health check, HTTPS smoke check, and representative app workflow verification.
Deferred for v1
- Multi-replica HA (replicaCount: 1).
- Media-upload PVC (persistence.media.enabled: false; Django MEDIA_ROOT is in-pod ephemeral).
- 3-stage canary (the Staged Promotion Lifecycle workstream is still 0/7).
- SSO / Keycloak integration (Django built-in auth only).
- Celery + Redis workers.
Cross-references
- Workplan: workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
- Improvements backlog: workplans/railiance-apps-WP-0004-app-deployment-improvements.md
- Shared DB cluster: railiance-platform/docs/apps-pg.md
- Container registry: /home/worsch/railiance-forge/docs/gitea-container-registry.md
- Python package registry: /home/worsch/railiance-forge/docs/gitea-package-registry.md
- S5 app onboarding checklist: docs/s5-app-onboarding-checklist.md
- App data backup handoff: docs/app-data-backup-restore-handoff.md
- Manifest dry-run prerequisites: docs/manifest-server-dry-run.md
- Django deployment recipe: docs/django-on-railiance.md
- Operator setup: docs/operator-setup.md
- Operator recipes: docs/operator-recipes.md
- App source: https://gitea.coulomb.social/coulomb/vergabe-teilnahme