RAILIANCE-WP-0002 finished: vergabe-teilnahme T07+T08 done
T07 smoke: migrate all apps; /health/ 200, /ausschreibungen/dashboard/ Übersicht, /admin/login/ Anmelden, static assets (Tailwind, Alpine, htmx, Django admin) all 200. Auth-required smoke and createsuperuser deferred to the operator (interactive credentials not safe through this session); seed_dev deliberately skipped (hardcoded dev user). T08 runbook in docs/vergabe-teilnahme.md: identity, secret rotation recipes, day-to-day make targets, image promotion + rollback, troubleshooting, deferred backup posture, cross-refs. Workplan status: finished. vergabe-teilnahme is the second S5 application on railiance01 (after Gitea). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
159
docs/vergabe-teilnahme.md
Normal file
159
docs/vergabe-teilnahme.md
Normal file
@@ -0,0 +1,159 @@
|
||||
# vergabe-teilnahme — operator runbook
|
||||
|
||||
Production deployment of the Django tender-management app, shipped
|
||||
under `RAILIANCE-WP-0002`.
|
||||
|
||||
## Identity
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| Public URL | https://vergabe-teilnahme.whywhynot.de |
|
||||
| Namespace | `vergabe-teilnahme` |
|
||||
| Helm release | `vergabe-teilnahme` |
|
||||
| Chart | `charts/vergabe-teilnahme/` |
|
||||
| Values | `helm/vergabe-teilnahme-values.yaml` (plain — no SOPS) |
|
||||
| Ingress | `manifests/vergabe-teilnahme-ingress.yaml` |
|
||||
| Image | `gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>` |
|
||||
| Database | `vergabe_db` on shared cnpg `apps-pg` (see `railiance-platform/docs/apps-pg.md`) |
|
||||
| TLS | `vergabe-teilnahme-tls`, issued by cert-manager `letsencrypt-prod` |
|
||||
|
||||
## Secrets
|
||||
|
||||
Two K8s Secrets in the `vergabe-teilnahme` namespace:
|
||||
|
||||
| Secret | Type | Source of truth | Used for |
|
||||
|--------|------|-----------------|----------|
|
||||
| `vergabe-app-credentials` | `kubernetes.io/basic-auth` | mirror of `databases/vergabe-app-credentials` (cnpg-owned) | raw DB role credential |
|
||||
| `vergabe-teilnahme-env` | `Opaque` | created by operator | `SECRET_KEY` + URL-encoded `DATABASE_URL` (envFrom on the Deployment) |
|
||||
|
||||
**No SOPS encryption** for this app — all sensitive material lives in
|
||||
K8s Secrets, not in committed values files.
|
||||
|
||||
### Rotating the DB password
|
||||
|
||||
1. Have `railiance-platform` rotate the cnpg-managed Secret
|
||||
(`databases/vergabe-app-credentials`).
|
||||
2. Mirror the new password into `vergabe-teilnahme/vergabe-app-credentials`.
|
||||
3. Rebuild `DATABASE_URL` in `vergabe-teilnahme-env`, **URL-encoding
|
||||
the password** (the base64 character set breaks the URL parser
|
||||
otherwise — see `RAILIANCE-WP-0004 I01`):
|
||||
```bash
|
||||
PW=$(kubectl get secret vergabe-app-credentials -n vergabe-teilnahme -o jsonpath='{.data.password}' | base64 -d)
|
||||
ENCODED=$(python3 -c "import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1], safe=''))" "$PW")
|
||||
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
|
||||
--type=merge \
|
||||
-p "{\"stringData\":{\"DATABASE_URL\":\"postgresql://vergabe:$ENCODED@apps-pg-rw.databases:5432/vergabe_db\"}}"
|
||||
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
|
||||
```
|
||||
|
||||
### Rotating `SECRET_KEY`
|
||||
|
||||
Django `SECRET_KEY` rotation invalidates active sessions but is
|
||||
otherwise zero-downtime:
|
||||
|
||||
```bash
|
||||
NEW=$(openssl rand -base64 50 | tr -d '\n' | tr '/+=' 'abc')
|
||||
kubectl patch secret vergabe-teilnahme-env -n vergabe-teilnahme \
|
||||
--type=merge -p "{\"stringData\":{\"SECRET_KEY\":\"$NEW\"}}"
|
||||
kubectl rollout restart deploy/vergabe-teilnahme -n vergabe-teilnahme
|
||||
```
|
||||
|
||||
## Day-to-day commands
|
||||
|
||||
```bash
|
||||
make vergabe-status # pods, svc, ingress, certificate
|
||||
make vergabe-logs # tail app logs
|
||||
make vergabe-dry-run # helm template render (audit values)
|
||||
make vergabe-deploy # helm upgrade --install (idempotent)
|
||||
make vergabe-migrate # manage.py migrate against live deploy
|
||||
make vergabe-seed # seed_dev — DEV ONLY, creates max.muster/testpass123 (do not run in prod)
|
||||
make vergabe-superuser # interactive createsuperuser
|
||||
```
|
||||
|
||||
## Promoting a new image tag
|
||||
|
||||
1. Build + push from `vergabe-teilnahme` repo (see its `Dockerfile`
|
||||
header for the BuildKit `--build-context` invocation — see also
|
||||
`RAILIANCE-WP-0004 I03`).
|
||||
2. Update `image.tag` in `helm/vergabe-teilnahme-values.yaml` to the
|
||||
new git SHA.
|
||||
3. `make vergabe-deploy` — Helm rolls a new ReplicaSet with
|
||||
zero-downtime (`maxSurge: 1, maxUnavailable: 0`).
|
||||
4. Verify via `make vergabe-status` and an HTTPS probe.
|
||||
5. If migrations are needed, run `make vergabe-migrate` after the
|
||||
rollout completes.
|
||||
|
||||
## Rollback
|
||||
|
||||
```bash
|
||||
helm history vergabe-teilnahme -n vergabe-teilnahme
|
||||
helm rollback vergabe-teilnahme <REVISION> -n vergabe-teilnahme
|
||||
```
|
||||
|
||||
Rollback does **not** unwind DB migrations. For any rollback that
|
||||
crosses a migration boundary, plan a `manage.py migrate <app> <name>`
|
||||
reverse step explicitly.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Pod stuck `Running` 0/1, kube-probe failing
|
||||
|
||||
Most likely the probe's `Host` header doesn't match
|
||||
`ALLOWED_HOSTS`. The chart sets `probes.hostHeader:
|
||||
vergabe-teilnahme.whywhynot.de` precisely to avoid this — if you
|
||||
change `ALLOWED_HOSTS` in values, also update `probes.hostHeader`.
|
||||
Symptom in `kubectl logs`: kube-probe requests returning HTTP 400.
|
||||
|
||||
### `dj-database-url` error: "The database name 'XYZ...' is longer than 63 characters"
|
||||
|
||||
The `DATABASE_URL` password isn't URL-encoded. See the rotation
|
||||
recipe above. Tracked in `RAILIANCE-WP-0004 I01`.
|
||||
|
||||
### Cert-manager: cert stuck in `False`
|
||||
|
||||
Check the Order/Challenge resources:
|
||||
```bash
|
||||
kubectl get order,challenge -n vergabe-teilnahme
|
||||
kubectl describe challenge -n vergabe-teilnahme
|
||||
```
|
||||
Common causes: DNS not yet propagated to all resolvers, Let's
|
||||
Encrypt rate-limited, or the ingress controller isn't forwarding
|
||||
`/.well-known/acme-challenge/` requests.
|
||||
|
||||
### `make vergabe-status` shows certificate `False`
|
||||
|
||||
The chart leaves cert lifecycle to cert-manager. If the cert renews
|
||||
fail, cert-manager keeps serving the old cert until it expires.
|
||||
Investigate with `kubectl describe certificate vergabe-teilnahme-tls
|
||||
-n vergabe-teilnahme`.
|
||||
|
||||
## Backup posture (open)
|
||||
|
||||
The shared `apps-pg` cluster is not yet covered by an automated
|
||||
backup job — only the legacy PostgreSQL-HA setup is. Manual logical
|
||||
dump for now:
|
||||
|
||||
```bash
|
||||
kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > vergabe_db-$(date +%F).dump
|
||||
```
|
||||
|
||||
Tracked as a follow-up in `RAILIANCE-WP-0003 Notes` (CNPG backup
|
||||
configuration belongs to `railiance-platform`).
|
||||
|
||||
## Deferred for v1
|
||||
|
||||
- Multi-replica HA (`replicaCount: 1`).
|
||||
- Media-upload PVC (`persistence.media.enabled: false` — Django
|
||||
`MEDIA_ROOT` is in-pod ephemeral).
|
||||
- 3-stage canary (the Staged Promotion Lifecycle workstream is still
|
||||
0/7).
|
||||
- SSO / Keycloak integration (Django built-in auth only).
|
||||
- Celery + Redis workers.
|
||||
|
||||
## Cross-references
|
||||
|
||||
- Workplan: `workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md`
|
||||
- Improvements backlog: `workplans/railiance-apps-WP-0004-app-deployment-improvements.md`
|
||||
- Shared DB cluster: `railiance-platform/docs/apps-pg.md`
|
||||
- Container registry: `docs/gitea-container-registry.md`
|
||||
- App source: https://gitea.coulomb.social/coulomb/vergabe-teilnahme
|
||||
@@ -4,7 +4,7 @@ type: workplan
|
||||
title: "Establish vergabe-teilnahme as an Application on railiance01"
|
||||
domain: railiance
|
||||
repo: railiance-apps
|
||||
status: proposed
|
||||
status: finished
|
||||
owner: railiance
|
||||
topic_slug: railiance
|
||||
created: "2026-05-18"
|
||||
@@ -506,7 +506,7 @@ certificate chain validates from outside the cluster.
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0002-T07
|
||||
status: in_progress
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "be1decb5-b734-4312-b98d-20ed5299d02c"
|
||||
```
|
||||
@@ -531,13 +531,39 @@ Steps:
|
||||
**Done when:** the smoke checklist passes and `kubectl logs` shows no
|
||||
unexpected errors.
|
||||
|
||||
**Done (2026-05-19, with deliberate deferrals):**
|
||||
|
||||
- ✅ `manage.py migrate` ran via `make vergabe-migrate` against the
|
||||
live deployment. All Django apps migrated (`accounts`, `core`,
|
||||
`ausschreibungen`, `lose`, `aufgaben`, `dokumente`, `preise`,
|
||||
`partner`, `bibliothek`, `marktbegleiter`, `nachbetrachtung`,
|
||||
`feedback`, plus framework apps).
|
||||
- ❌ `make seed` (= `seed_dev`) deliberately **skipped**: it creates a
|
||||
hardcoded dev user `max.muster / testpass123`. Not prod-safe.
|
||||
- ❌ `createsuperuser` deferred to the operator (interactive
|
||||
credential should not be minted through this session). Recipe in
|
||||
`docs/vergabe-teilnahme.md`.
|
||||
- ✅ Smoke (no-auth surface):
|
||||
- `/health/` → `200 {"status":"ok"}`
|
||||
- `/` → `302 → /ausschreibungen/dashboard/` → `200`, page title
|
||||
`Übersicht`.
|
||||
- `/admin/login/` → `200`, title
|
||||
`Anmelden | Django-Systemverwaltung` (German Django admin).
|
||||
- Static assets: `/static/dist/main.css` 200 (Tailwind),
|
||||
`/static/admin/css/base.css` 200 (Django admin),
|
||||
`/static/vendor/{alpinejs,htmx}/...` referenced from the
|
||||
rendered HTML.
|
||||
- ❌ Auth-required smoke (login, create Ausschreibung) deferred to the
|
||||
operator after `createsuperuser`.
|
||||
- ✅ `kubectl logs` clean — only gunicorn boot + kube-probe 200s.
|
||||
|
||||
---
|
||||
|
||||
### T08 — Document handoff, runbook, and backup posture
|
||||
|
||||
```task
|
||||
id: RAILIANCE-WP-0002-T08
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "594d3591-b61f-40c4-850c-efaa02c859ed"
|
||||
```
|
||||
@@ -558,6 +584,14 @@ Deliverables in `docs/vergabe-teilnahme.md`:
|
||||
**Done when:** a new operator can find vergabe-teilnahme, deploy a new
|
||||
image tag, and recover from a pod crash without reading this workplan.
|
||||
|
||||
**Done (2026-05-19):** `docs/vergabe-teilnahme.md` covers identity,
|
||||
secrets + rotation recipes (DB password and SECRET_KEY), day-to-day
|
||||
make targets, image promotion + rollback, troubleshooting
|
||||
(kube-probe Host header, DSN URL-encoding, cert-manager failure
|
||||
modes), open backup posture, and cross-references to the improvements
|
||||
backlog (`RAILIANCE-WP-0004`), the shared DB cluster doc, and the
|
||||
container registry doc.
|
||||
|
||||
## Completion Criteria
|
||||
|
||||
This workplan is complete when:
|
||||
|
||||
Reference in New Issue
Block a user