railiance-apps/workplans/railiance-apps-WP-0002-vergabe-teilnahme-on-railiance01.md
tegwick 45cfd7fd66 RAILIANCE-WP-0002 T06 partial: DNS live for vergabe-teilnahme.whywhynot.de
A record now resolves to 92.205.130.254 (Traefik LB). HTTP probe
reaches Traefik and returns 404 as expected (no Ingress rule yet).
Ingress + cert-manager TLS will be created together with the backing
Service from T05 to avoid wasting a Let's Encrypt issuance attempt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 00:02:00 +02:00


id: RAILIANCE-WP-0002
type: workplan
title: Establish vergabe-teilnahme as an Application on railiance01
domain: railiance
repo: railiance-apps
status: proposed
owner: railiance
topic_slug: railiance
created: 2026-05-18
updated: 2026-05-18
planning_priority: high
planning_order: 2
state_hub_workstream_id: 94522a85-80d5-4f2c-8eb0-8d0bcb15f3b0

Establish vergabe-teilnahme as an Application on railiance01

Goal

Deploy the vergabe-teilnahme Django application as a governed Helm release on the production cluster node railiance01 (92.205.130.254), reachable at https://vergabe-teilnahme.whywhynot.de, with its container image published through the Railiance Gitea OCI registry and its PostgreSQL data living in the shared cnpg cluster.

This establishes vergabe-teilnahme as the second application (after Gitea) running on the S5 layer of the Railiance OAS Stack and exercises the freshly enabled container registry from RAILIANCE-WP-0001 end-to-end.

Placement in the Railiance Tooling Set

This workplan lives in railiance-apps because vergabe-teilnahme is an S5 application workload. The deployment surface added by this workplan is:

  • helm/vergabe-teilnahme-values.sops.yaml — SOPS-encrypted Helm values (DJANGO_SECRET_KEY, DB DSN, etc.).
  • releases/vergabe-teilnahme/ — chart selection / overlay (Bitnami generic chart or hand-rolled chart, decided in T05).
  • manifests/vergabe-teilnahme-ingress.yaml — Traefik ingress + cert-manager TLS for vergabe-teilnahme.whywhynot.de.
  • Makefile targets: vergabe-deploy, vergabe-status, vergabe-migrate.

Cross-repo coordination required:

| Concern | Owner repo | Coordination |
| --- | --- | --- |
| Application Helm release | railiance-apps | This workplan |
| Containerization (Dockerfile, entrypoint, asset build) | vergabe-teilnahme | This workplan opens a task in that repo |
| PostgreSQL role + database in shared cnpg cluster | railiance-platform | Capability request via hub |
| DNS A record for vergabe-teilnahme.whywhynot.de | DNS owner of whywhynot.de | Out-of-band, captured in T06 |
| Ingress controller / cluster routing primitives | railiance-cluster | Reuse, no changes expected |
| cert-manager ClusterIssuer letsencrypt-prod | railiance-platform | Reuse, no changes expected |

Current Evidence

  • vergabe-teilnahme/CLAUDE.md: Django 6.x · uv · Tailwind CSS v4 (Vite) · HTMX 2.x · Alpine.js 3.x · PostgreSQL 16+ (psycopg3). German UI.
  • vergabe-teilnahme/ currently has no Dockerfile. docker-compose.dev.yml documents the local Postgres but isn't started when the shared infra-postgres-1 is up.
  • railiance-apps/Makefile deploys Gitea via helm/gitea-values.sops.yaml; the same SOPS + Helm pattern is the template for this workplan.
  • RAILIANCE-WP-0001 confirmed https://gitea.coulomb.social/v2/ returns the OCI registry auth challenge. Image naming convention established: gitea.coulomb.social/<org>/<image>:<tag>.
  • manifests/gitea-ingress.yaml confirms the ingress recipe: ingressClassName: traefik + annotation cert-manager.io/cluster-issuer: letsencrypt-prod.
  • Domain whywhynot.de has no prior references in any railiance repo — DNS and a fresh Let's Encrypt cert will need to be set up.
  • Live cnpg state: gitea-db runs in the databases namespace. T01 confirms whether a single shared cluster exists for app DBs or whether one must be designated.

Safety Contract

  • Do not commit decrypted Helm values, the Django SECRET_KEY, DB credentials, or any other secret material.
  • Use a dedicated PostgreSQL role with privileges scoped to a single database; never reuse the gitea role or grant superuser.
  • No public exposure until cert-manager has successfully issued a TLS certificate for vergabe-teilnahme.whywhynot.de.
  • Do not start with DEBUG=True; production settings only.
  • Preserve Gitea behavior: the new ingress must not conflict with the gitea ingress on the cluster's default ingress controller.
  • If DB role provisioning requires changes to the shared cnpg cluster resource limits, pause and create a railiance-platform task.
  • If DNS for whywhynot.de is owned outside this operator's control, pause T06 until DNS ownership is confirmed.
  • Start fresh — no migration of data from any existing dev database in this workplan.

Target State

  • vergabe-teilnahme:<tag> image is built and pushed to gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>.
  • A vergabe PostgreSQL role and vergabe_db database exist in the shared cnpg cluster (single role, single DB, no cross-app grants).
  • A Helm release vergabe-teilnahme is deployed in a dedicated namespace with a single replica, a Service, a PVC for media (if any), and the necessary secrets sourced from SOPS values.
  • Django migrate and make seed have run successfully against the shared cnpg database.
  • https://vergabe-teilnahme.whywhynot.de serves the Django app behind a valid Let's Encrypt certificate.
  • Login as a superuser succeeds; the dashboard renders; static assets are served correctly (Tailwind/Vite build pipeline integrated into the image).
  • Runbook, registry naming, DB credentials handling, and rollback steps are documented for the next operator.

Tasks

T01 — Inventory the deployment substrate

id: RAILIANCE-WP-0002-T01
status: done
priority: high
state_hub_task_id: "49aa7d85-96bd-4d97-952c-80dcfff06610"

Confirm the pre-conditions before any code is written.

Checks:

  • Identify the shared cnpg cluster intended for app databases (name, namespace, version, current databases/roles, available capacity).
  • Verify gitea.coulomb.social/v2/ still returns an OCI registry auth challenge and that the operator has a package-capable token.
  • Verify cert-manager letsencrypt-prod ClusterIssuer is healthy and has successfully issued at least one production certificate recently (gitea-tls is a known good example).
  • Confirm DNS ownership and the change path for whywhynot.de — record who owns the zone and how an A record is added.
  • Confirm Traefik is the active ingress controller and note the public IP/hostname an A record must point at.
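
The checks above can be run read-only, roughly as sketched here. This is a minimal sketch, not the exact commands used for the findings: it assumes kubectl, curl and dig are available on the operator machine, and the function name check_substrate is hypothetical. The IP and hostnames are the ones recorded in this workplan.

```shell
# Read-only pre-condition checks for T01. Nothing here modifies the cluster.
LB_IP="92.205.130.254"               # Traefik LB IP recorded in this workplan
REGISTRY_HOST="gitea.coulomb.social"
ZONE="whywhynot.de"

check_substrate() {
  # cnpg clusters across all namespaces (name, namespace, instances, status)
  kubectl get clusters.postgresql.cnpg.io -A

  # Registry endpoint: print only the HTTP status of GET /v2/.
  # The workplan expects an OCI registry auth challenge here.
  curl -s -o /dev/null -w '%{http_code}\n' \
    --resolve "${REGISTRY_HOST}:443:${LB_IP}" "https://${REGISTRY_HOST}/v2/"

  # ClusterIssuer readiness and the state of issued certificates
  kubectl get clusterissuer letsencrypt-prod
  kubectl get certificates -A

  # DNS: authoritative name servers and the current A record of the zone
  dig +short NS "${ZONE}"
  dig +short A "${ZONE}"

  # Ingress controller Service and its public address
  kubectl -n kube-system get svc traefik
}
```

Calling check_substrate prints the raw output of each check, which can then be pasted into the findings.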

Done when: the workplan records (a) the cnpg cluster to use, (b) the operator's path to a Gitea package token, (c) the DNS change path for whywhynot.de, and (d) any pre-condition gaps.

Findings (2026-05-18):

  • cnpg landscape — no shared cluster yet. kubectl get clusters.postgresql.cnpg.io -A returns two app-dedicated clusters in the databases namespace:
    • gitea-db: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie, 1 instance, 10Gi
    • net-kingdom-pg: ghcr.io/cloudnative-pg/postgresql:16, 1 instance, 10Gi
    Neither was provisioned as a shared cluster. The user's earlier choice ("shared cnpg cluster, new database role") therefore requires a sub-decision; see Decision D-01 below.
  • Gitea registry reachable. curl --resolve gitea.coulomb.social:443:92.205.130.254 https://gitea.coulomb.social/v2/ returns HTTP 405 for HEAD with a valid TLS chain (cert: default/gitea-tls, ready 3d). The OCI endpoint is up; the 405 applies to the HEAD method only, and the difference from the GET response is expected.
  • Gitea package token — still required. No package-capable PAT is currently held by the operator in this session (carryover blocker from RAILIANCE-WP-0001-T04). Token must be minted via the Gitea web UI by a user with write:package scope before T03.
  • Public DNS for whywhynot.de: A-record currently 217.160.0.212 (IONOS web hosting). Authoritative NS = ns1126.ui-dns.{de,biz,com,org} (IONOS / 1&1). The zone is administered through the operator's IONOS web console — DNS change is a manual out-of-band step.
  • Traefik LB public IP: 92.205.130.254 (kube-system/traefik LoadBalancer service, ports 80/443). This is the target the new A-record must point at.
  • cert-manager: ClusterIssuer/letsencrypt-prod is Ready=True (59d). Most recent successful issuance: default/gitea-tls, 3d4h ago. Several stale failing certs in mfa and sso namespaces are unrelated to this workplan.
  • Pre-condition gaps before downstream tasks unblock:
    1. D-01 below (cnpg target cluster) — blocks T04.
    2. Gitea package-capable PAT — blocks T03.
    3. DNS A-record for vergabe-teilnahme.whywhynot.de → 92.205.130.254 — blocks T06.

Decision D-01 — cnpg target for vergabe_db (pending; required before T04):

| Option | Pros | Cons |
| --- | --- | --- |
| A. New dedicated cluster vergabe-pg | Matches the existing one-cluster-per-app pattern; clean blast radius | Resource cost grows linearly with apps; no actual "shared" cluster emerges |
| B. Add role+db to existing net-kingdom-pg (PG 16) | Reuses a healthy PG 16 cluster matching vergabe-teilnahme's minimum; lowest cost | Cluster name no longer reflects its content; coupling with netkingdom domain |
| C. Add role+db to existing gitea-db (PG 18) | Newest cluster image; same operator | Couples gitea ops with vergabe ops; name no longer reflects content |
| D. Provision a new general-purpose cluster apps-pg (PG 16+) | Establishes a real shared cluster that future apps adopt | Net-new infra; needs a railiance-platform task to own the cluster |

Recommendation: D (creates the "shared cluster" the user asked for as a real artifact rather than retrofitting an existing name). Recorded as a pending hub decision.

Resolution (2026-05-18, bernd): Option D. Provision a new shared cnpg cluster apps-pg (PG 16, 1 instance, 10Gi initial) in namespace databases. cnpg Cluster CRs live in railiance-platform per ADR-003 (confirmed: helm/gitea-db-cluster.yaml). A coordination message has been sent to railiance-platform requesting the cluster. T04 below is now sequenced after that cluster reports healthy.
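
For orientation, the cluster requested from railiance-platform could look roughly like the manifest below. This is a sketch under assumptions, not the manifest railiance-platform will ship: the image tag is borrowed from the existing net-kingdom-pg cluster, and the function name render_apps_pg_cluster is hypothetical. Name, namespace, instance count and storage size come from the resolution above.

```shell
# Prints a minimal cnpg Cluster manifest for the shared apps-pg cluster.
render_apps_pg_cluster() {
  cat <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: apps-pg
  namespace: databases
spec:
  instances: 1
  # Assumption: same PG 16 image line as net-kingdom-pg
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
EOF
}
```

A client-side check such as render_apps_pg_cluster | kubectl apply --dry-run=client -f - validates the shape without creating anything. The real CR belongs in railiance-platform per ADR-003.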


T02 — Add Dockerfile and asset build to vergabe-teilnahme

id: RAILIANCE-WP-0002-T02
status: done
priority: high
repo: vergabe-teilnahme
state_hub_task_id: "43ce85c4-0bdb-43c4-b0a5-81fa366800a6"

Open a companion task in the vergabe-teilnahme repo to add a production-ready container image definition.

Scope:

  • Multi-stage Dockerfile at the repo root:
    • Stage 1: Node + Vite + Tailwind asset build (npm ci, then npm run build → emits to static/dist/).
    • Stage 2: Python image, uv sync --frozen, copy app and built assets, run manage.py collectstatic --noinput.
  • Production WSGI/ASGI server (gunicorn) listening on :8000.
  • Whitenoise-style static asset serving (or document an alternative).
  • Non-root container user, sensible WORKDIR, healthcheck endpoint.
  • .dockerignore excluding node_modules, media/, __pycache__, .venv, static/dist source, etc.
  • Document the build command and the chosen image tag scheme in the vergabe-teilnahme README.

Coordination: this task crosses into vergabe-teilnahme. Track via a hub task and reference the PR/commit when complete.

Done when: docker build against the vergabe-teilnahme repo produces a runnable image that responds to a smoke request locally.

Resolution (2026-05-18): issue-facade was renamed to issue-core upstream. Rewired vergabe-teilnahme to depend on issue-core (@ file:///home/worsch/issue-core); the three Python imports were updated (issue_tracker.* → issue_core.*). All 20 aufgaben tests pass after the rewire. See vergabe-teilnahme commit 17f511f.

Dockerfile delivered in vergabe-teilnahme repo:

  • Three stages (assets / python-deps / runtime) with whitenoise static serving and collectstatic at build time.
  • BuildKit named context resolves the ../issue-core path dep: docker build --build-context issue-core=/home/worsch/issue-core .
  • Non-root app user, /health/ HEALTHCHECK, gunicorn on :8000.
  • Smoke test: container reports (healthy), /health/ → 200.
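
The build and smoke test described above can be reproduced along these lines. This is a sketch under assumptions: the local tag vergabe-teilnahme:smoke and the env file .env.smoke (carrying the settings the container needs to start) are hypothetical, and the commands are meant to be run from the vergabe-teilnahme repo root with BuildKit enabled.

```shell
IMAGE="vergabe-teilnahme:smoke"            # hypothetical local tag
ISSUE_CORE_DIR="/home/worsch/issue-core"   # path dep named in this workplan
ENV_FILE="${ENV_FILE:-.env.smoke}"         # hypothetical env file for the smoke run

build_image() {
  # The named build context lets the Dockerfile resolve the ../issue-core path dep
  DOCKER_BUILDKIT=1 docker build \
    --build-context "issue-core=${ISSUE_CORE_DIR}" \
    -t "${IMAGE}" .
}

smoke_test() {
  # Start the container, poll /health/ for up to 30 seconds, then clean up
  cid="$(docker run -d --env-file "${ENV_FILE}" -p 8000:8000 "${IMAGE}")" || return 1
  i=0
  while [ "$i" -lt 30 ]; do
    if curl -fsS -o /dev/null http://127.0.0.1:8000/health/; then
      docker rm -f "$cid" >/dev/null
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  docker logs "$cid"          # show why the container never became healthy
  docker rm -f "$cid" >/dev/null
  return 1
}
```

Running build_image && smoke_test exits non-zero if /health/ never answers with a success status.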

Original blocker (now resolved): vergabe-teilnahme couldn't uv sync because pyproject.toml referenced universal-issue-tracker @ file:///home/worsch/issue-facade, but that directory was effectively empty (only .claude/ remained).

error: Failed to generate package metadata for
  `universal-issue-tracker==0.1.0 @ directory+../issue-facade`
  Caused by: /home/worsch/issue-facade does not appear to be a Python
  project, as neither `pyproject.toml` nor `setup.py` are present.

Related candidate sources investigated:

  • /home/worsch/issue-core/ — a separate package (issue-core), not the missing universal-issue-tracker facade.
  • /home/worsch/markitect-main/_issue-tracking/issue-facade/ — does not exist.

At the time, this had to be resolved upstream in vergabe-teilnahme (or by restoring issue-facade) before T02 could produce a buildable image. Options considered:

  1. Restore issue-facade — recover the missing source (git reflog, backup, or recreate from issue-core's public surface).
  2. Repoint vergabe-teilnahme's dep to issue-core directly if that's the intended replacement, then update uv.lock.
  3. Vendor a minimal stub interface in vergabe-teilnahme/vendor/ to unblock the container build; restore the real dep later.

Original recommendation (superseded by the resolution above, which took option 2): route to whoever owns issue-facade (likely a markitect or personhood domain task) and pause T02 until the dep resolves cleanly outside Docker.


T03 — Build and publish image to Gitea container registry

id: RAILIANCE-WP-0002-T03
status: todo
priority: high
state_hub_task_id: "d0f8db8c-fad9-4e0b-a404-9e3a04cffb05"

Push the first production image of vergabe-teilnahme through the registry enabled in RAILIANCE-WP-0001.

Steps:

docker login gitea.coulomb.social
docker build -t gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag> \
  /home/worsch/vergabe-teilnahme
docker push gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>

Choose a deterministic tag scheme (<git-sha> recommended). Record the exact image reference and digest used for the first deployment.
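
A git-SHA tag scheme can be derived mechanically, as in the sketch below. The helper names image_ref, current_tag and pushed_digest are hypothetical. The repository path is the one used throughout this workplan.

```shell
REGISTRY_REPO="gitea.coulomb.social/coulomb/vergabe-teilnahme"

image_ref() {
  # $1: tag, for example a short git SHA
  printf '%s:%s\n' "${REGISTRY_REPO}" "$1"
}

current_tag() {
  # Short commit SHA of the checkout in $1 (defaults to the current directory)
  git -C "${1:-.}" rev-parse --short HEAD
}

pushed_digest() {
  # After a successful push the local image records its registry digest.
  # $1: full image reference as returned by image_ref
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}
```

For example, ref="$(image_ref "$(current_tag /home/worsch/vergabe-teilnahme)")" yields the reference to pass to docker build and docker push, and pushed_digest "$ref" gives the digest to record for the first deployment.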

Done when: the image is fetchable from a disposable Kubernetes pod on the cluster.
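
One way to verify fetchability is a throwaway pod that pulls the image and exits. This is a sketch under assumptions: the pull secret name passed in is hypothetical and must already exist in the target namespace if the package is private, and the image is assumed to contain a true binary.

```shell
pull_overrides() {
  # $1: name of an existing image pull secret; prints a pod spec override as JSON
  printf '{"apiVersion":"v1","spec":{"imagePullSecrets":[{"name":"%s"}]}}' "$1"
}

pull_check() {
  # $1: full image reference, $2: namespace, $3: pull secret name
  # The pod is removed again after it exits (--rm).
  kubectl run pull-check --rm -i --restart=Never \
    --namespace "$2" \
    --image "$1" \
    --overrides "$(pull_overrides "$3")" \
    --command -- true
}
```

If the pull fails, kubectl describe pod pull-check in the same namespace shows the pull error while the pod still exists.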


T04 — Provision PostgreSQL role and database

id: RAILIANCE-WP-0002-T04
status: blocked
priority: high
state_hub_task_id: "925ace1c-f9bf-4644-bd0b-637705d72ea6"

Create a vergabe PostgreSQL role and vergabe_db database inside the new shared apps-pg cnpg cluster being provisioned by railiance-platform (per resolved decision D-01).

Blocked on: the apps-pg cluster in namespace databases reporting the cnpg status "Cluster in healthy state". Coordination message sent to railiance-platform on 2026-05-18; record the platform workstream/task IDs here once returned.

Approach:

  • Use a cnpg Database and Role resource — never an out-of-band psql change without recording it.
  • The role owns only vergabe_db; no CREATEDB, no superuser, no grants on other databases.
  • Capture the database DSN in the SOPS values file (T05).
  • If the cluster needs to grow (more instances, more storage, backup inclusion), pause and add a follow-up railiance-platform task — do not edit cluster-level resources from this repo.
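
A declarative version of this approach might look like the sketch below. It rests on assumptions: the installed cnpg operator must be recent enough to ship the Database resource, the resource name vergabe-db and the secret name vergabe-db-credentials are hypothetical, and in cnpg roles are declared under the Cluster's spec.managed.roles rather than as a standalone resource, so the role fragment would have to go through railiance-platform.

```shell
# Prints a cnpg Database manifest for vergabe_db on the apps-pg cluster.
render_vergabe_database() {
  cat <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: vergabe-db            # hypothetical resource name
  namespace: databases
spec:
  name: vergabe_db
  owner: vergabe
  cluster:
    name: apps-pg
EOF
}

# Prints the fragment that would sit under spec: in the apps-pg Cluster CR.
render_vergabe_role_fragment() {
  cat <<'EOF'
managed:
  roles:
    - name: vergabe
      ensure: present
      login: true
      superuser: false
      createdb: false
      passwordSecret:
        name: vergabe-db-credentials   # hypothetical basic-auth Secret
EOF
}
```

Both functions only print YAML. Keeping createdb and superuser explicitly false mirrors the privilege limits listed above.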

Done when: the new role can connect to vergabe_db from inside the cluster (kubectl run --rm -it psql ...) and is recorded in the SOPS values used by T05.


T05 — Author Helm release for vergabe-teilnahme

id: RAILIANCE-WP-0002-T05
status: todo
priority: high
state_hub_task_id: "29ba6add-6f23-4053-acb9-9d7efa0b3881"

Add the chart selection (or bespoke chart) and SOPS-encrypted values that turn the published image into a Kubernetes Deployment.

Deliverables:

  • Decide chart approach: Bitnami common template, a thin in-repo chart under charts/vergabe-teilnahme/, or raw manifests. Record the rationale in the workplan log.
  • helm/vergabe-teilnahme-values.sops.yaml containing:
    • image repo + tag from T03,
    • env (DJANGO_SETTINGS_MODULE=vergabe_teilnahme.settings.prod, ALLOWED_HOSTS, CSRF_TRUSTED_ORIGINS),
    • SECRET_KEY (generated, never committed in clear),
    • DB DSN from T04,
    • resource requests/limits, single replica, readiness/liveness probes targeting the healthcheck endpoint introduced in T02.
  • A dedicated namespace (vergabe-teilnahme).
  • Optional: PVC for media uploads if Django MEDIA_ROOT is needed in v1; otherwise omit and document deferral.
  • Makefile targets: vergabe-deploy, vergabe-status, vergabe-migrate.

Done when: make vergabe-deploy renders cleanly with --dry-run and produces no plaintext secrets in the rendered manifest source.
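
The dry-run behind make vergabe-deploy could be wired roughly as follows. This is a sketch, assuming sops and helm are installed and that the in-repo chart option charts/vergabe-teilnahme/ is the one chosen (not yet decided). The function names are hypothetical.

```shell
RELEASE="vergabe-teilnahme"
NAMESPACE="vergabe-teilnahme"
VALUES="helm/vergabe-teilnahme-values.sops.yaml"
CHART="charts/vergabe-teilnahme"   # assumption: one of the options listed in T05

is_sops_encrypted() {
  # A SOPS-encrypted YAML file carries a top-level "sops:" metadata block
  grep -q '^sops:' "$1"
}

vergabe_render() {
  # Refuse to proceed if the values file looks like plaintext
  is_sops_encrypted "${VALUES}" || {
    echo "refusing: ${VALUES} is not SOPS-encrypted" >&2
    return 1
  }
  # Decrypt to stdout only and feed helm via stdin, so no plaintext file is written
  sops --decrypt "${VALUES}" |
    helm upgrade --install "${RELEASE}" "${CHART}" \
      --namespace "${NAMESPACE}" --create-namespace \
      --values - --dry-run
}
```

Note that the dry-run output itself contains the rendered Secret values, so it should go to the terminal only and never be redirected into the repository.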


T06 — DNS, ingress, and TLS for vergabe-teilnahme.whywhynot.de

id: RAILIANCE-WP-0002-T06
status: in_progress
priority: high
state_hub_task_id: "8e673ee6-5338-4eb5-8973-a1818b4dc7f5"

Make the application reachable behind a valid Let's Encrypt certificate.

Steps:

  • Add an A record vergabe-teilnahme.whywhynot.de → cluster public IP (per T01). Use the DNS change path captured in T01.
  • Add manifests/vergabe-teilnahme-ingress.yaml modeled on gitea-ingress.yaml:
    • ingressClassName: traefik,
    • annotation cert-manager.io/cluster-issuer: letsencrypt-prod,
    • tls.secretName: vergabe-teilnahme-tls,
    • host vergabe-teilnahme.whywhynot.de, backend the Service from T05.
  • Wait for cert-manager to issue the cert.
  • Validate https://vergabe-teilnahme.whywhynot.de/health/ (the healthcheck endpoint delivered in T02) returns 200 with a trusted cert chain.
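
The ingress described in these steps could be rendered as in the sketch below. The Service name vergabe-teilnahme and Service port 8000 are assumptions, since the Service is only defined in T05. The port mirrors the gunicorn port from T02, and the function name is hypothetical.

```shell
# Prints an Ingress manifest following the gitea-ingress.yaml recipe.
render_vergabe_ingress() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vergabe-teilnahme
  namespace: vergabe-teilnahme
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - vergabe-teilnahme.whywhynot.de
      secretName: vergabe-teilnahme-tls
  rules:
    - host: vergabe-teilnahme.whywhynot.de
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vergabe-teilnahme   # assumption: Service name from T05
                port:
                  number: 8000            # assumption: Service exposes the gunicorn port
EOF
}
```

Once the backing Service exists, render_vergabe_ingress > manifests/vergabe-teilnahme-ingress.yaml followed by kubectl apply -f on that file creates the Ingress. kubectl -n vergabe-teilnahme get certificate then shows issuance progress, and curl -fsS -o /dev/null -w '%{http_code}\n' https://vergabe-teilnahme.whywhynot.de/health/ checks the result from outside the cluster.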

Boundary note: ingress controller and cluster networking changes belong in railiance-cluster. This task only adds an Ingress resource that consumes the existing controller.

Done when: the public hostname serves the app over HTTPS and the certificate chain validates from outside the cluster.

Progress (2026-05-18):

  • DNS A record live: vergabe-teilnahme.whywhynot.de → 92.205.130.254 (TTL 3600; served authoritatively by ns1126.ui-dns.*).
  • Traefik routing reaches the cluster: HTTP probe returns 404 — the expected pre-state because no Ingress rule matches the host yet.
  • manifests/vergabe-teilnahme-ingress.yaml: not yet created. It waits on the Service from T05 as its backend; creating the ingress before the backend Service exists would waste a Let's Encrypt issuance attempt.
  • vergabe-teilnahme-tls Secret — pending ingress.

T07 — Initial migration, seed, and smoke test

id: RAILIANCE-WP-0002-T07
status: todo
priority: high
state_hub_task_id: "be1decb5-b734-4312-b98d-20ed5299d02c"

Bring the app to a usable state in production.

Steps:

  • Run manage.py migrate as a Kubernetes Job or one-shot kubectl exec against the running Deployment (record which).
  • Run manage.py seed (the make seed target) — vergabe-teilnahme's idempotent seed.
  • Create the first superuser via manage.py createsuperuser.
  • Smoke checklist:
    • Login at /admin/ succeeds.
    • The dashboard at / renders without errors.
    • Static assets (Tailwind build output) are served with correct content-type and 200 status.
    • HTMX partial requests succeed on at least one page.
    • A new Ausschreibung can be created and saved.

Done when: the smoke checklist passes and kubectl logs shows no unexpected errors.
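
If the kubectl exec route is chosen, the steps might be scripted as below. This is a sketch under assumptions: the Deployment name vergabe-teilnahme is hypothetical until T05 fixes it, and python is assumed to be on the container's PATH with the project environment active.

```shell
NS="vergabe-teilnahme"
TARGET="deploy/vergabe-teilnahme"   # assumption: Deployment name from T05

manage() {
  # Run a non-interactive manage.py command in a pod of the Deployment
  kubectl -n "${NS}" exec "${TARGET}" -- python manage.py "$@"
}

manage_tty() {
  # Same, but with a terminal attached for interactive prompts
  kubectl -n "${NS}" exec -it "${TARGET}" -- python manage.py "$@"
}

run_initial_setup() {
  manage migrate --noinput &&
    manage seed &&
    manage_tty createsuperuser
}

asset_headers() {
  # $1: full URL of a built static asset; prints status line and headers
  curl -sSI "$1"
}
```

run_initial_setup stops at the first failing step. asset_headers helps with the static-asset item of the smoke checklist by showing the status and Content-Type of a given asset URL.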


T08 — Document handoff, runbook, and backup posture

id: RAILIANCE-WP-0002-T08
status: todo
priority: medium
state_hub_task_id: "594d3591-b61f-40c4-850c-efaa02c859ed"

Capture everything an on-call operator needs.

Deliverables in docs/vergabe-teilnahme.md:

  • Registry image naming and tag scheme.
  • Namespace, Deployment, Service, Ingress names.
  • DB DSN handling (where secrets live, how to rotate).
  • Restart, rollback (helm rollback), and migration commands.
  • Backup posture: confirm whether the shared cnpg cluster's backup job includes vergabe_db; if not, open a railiance-platform follow-up.
  • Pointer to the vergabe-teilnahme repo for app-level changes vs. railiance-apps for Helm/ingress changes.

Done when: a new operator can find vergabe-teilnahme, deploy a new image tag, and recover from a pod crash without reading this workplan.
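
The restart and rollback commands for the runbook could be captured as small helpers like these. This is a sketch: the release and namespace names follow this workplan, the Deployment name is an assumption until T05, and the helper names are hypothetical.

```shell
RELEASE="vergabe-teilnahme"
NAMESPACE="vergabe-teilnahme"
DEPLOYMENT="deploy/vergabe-teilnahme"   # assumption: Deployment name from T05

vergabe_status() {
  helm -n "${NAMESPACE}" status "${RELEASE}"
  kubectl -n "${NAMESPACE}" get deploy,svc,ingress,pods
}

vergabe_history() {
  # Lists release revisions; pick the revision number to roll back to
  helm -n "${NAMESPACE}" history "${RELEASE}"
}

vergabe_rollback() {
  # $1: revision number taken from vergabe_history
  helm -n "${NAMESPACE}" rollback "${RELEASE}" "$1" --wait
}

vergabe_restart() {
  kubectl -n "${NAMESPACE}" rollout restart "${DEPLOYMENT}" &&
    kubectl -n "${NAMESPACE}" rollout status "${DEPLOYMENT}"
}
```

A Helm rollback only reverts Kubernetes resources. Database migrations applied by a newer image are not undone and need their own documented procedure.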

Completion Criteria

This workplan is complete when:

  1. The vergabe-teilnahme image is published to gitea.coulomb.social/coulomb/vergabe-teilnahme:<tag>.
  2. A dedicated PostgreSQL role and database serve the app from the shared cnpg cluster.
  3. helm/vergabe-teilnahme-values.sops.yaml and the ingress manifest are committed; make vergabe-deploy is the single command to deploy.
  4. https://vergabe-teilnahme.whywhynot.de serves the app over HTTPS with a valid Let's Encrypt cert.
  5. Migrations + seed have run; the smoke checklist passes.
  6. Runbook is in docs/vergabe-teilnahme.md.

Notes

  • This is the second application on railiance01 (after Gitea). It intentionally adopts the same SOPS + Helm + Traefik + cert-manager pattern so the operator workflow stays consistent.
  • v1 deliberately defers: 3-stage canary (Staged Promotion Lifecycle is still 0/7), SSO/Keycloak integration, S3-backed media storage, Celery + Redis workers (optional in the architecture blueprint), and multi-replica HA. Each can become its own follow-up workplan once the baseline runs.
  • The whywhynot.de domain enters the Railiance stack for the first time here. Treat the DNS path established in T01/T06 as the reference for any future *.whywhynot.de workloads.