| id | type | title | domain | repo | status | owner | topic_slug | planning_priority | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RAILIANCE-WP-0004 | workplan | App deployment improvements (lessons from RAILIANCE-WP-0002) | railiance | railiance-apps | finished | railiance | railiance | medium | 2026-05-19 | 2026-06-05 | b61a9aca-4e43-4b3d-a48b-999e0fa842cf |
App deployment improvements
This workplan collects concrete follow-ups surfaced while shipping
vergabe-teilnahme under RAILIANCE-WP-0002. Each item is small,
independent, and can be picked up in isolation when the next S5 app
lands or when the next operator onboards. Activated on 2026-05-22;
local railiance-apps guardrails are implemented, with the package
publication item completed through the forge-owned Gitea package registry.
I01 — URL-encode DB passwords at Secret-build time
id: RAILIANCE-WP-0004-I01
status: done
priority: medium
state_hub_task_id: "a05a855a-00a0-4e0e-ba82-27e0a072f777"
Problem. cnpg-generated bootstrap passwords come from
openssl rand -base64 N and contain =, +, /. Embedded raw in
DATABASE_URL, those characters confuse dj-database-url (it parsed
vergabe:<pw>@apps-pg-rw:5432/vergabe_db as having an 80-character
database name). Diagnosing this cost one Helm revision and one pod
restart.
Fix. Add a tiny helper (shell script or Makefile target) that
takes the raw role password from the cnpg secret and emits the
DSN-ready URL-encoded form into the consumer-namespace env Secret.
Alternative: switch to individual env vars (POSTGRES_HOST,
POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB) so no URL
parsing is needed at all.
Where it lives: new tools/ script + Makefile target, or chart
helper template.
Implemented 2026-05-22. Added tools/build-database-url-secret.sh
and make vergabe-db-url-secret; updated the app runbook to use the
helper during DB password rotation.
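The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not the contents of tools/build-database-url-secret.sh; the function name, the postgres:// scheme, and the sample password are assumptions made for the example.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Return a DSN whose user and password are percent-encoded.

    Hypothetical helper for illustration; the postgres:// scheme is an assumption.
    """
    # safe="" matters: by default quote() leaves "/" unescaped, and "/" is
    # one of the characters base64 passwords contain.
    enc_user = quote(user, safe="")
    enc_password = quote(password, safe="")
    return f"postgres://{enc_user}:{enc_password}@{host}:{port}/{dbname}"
```

With a made-up password such as `ab+c/d==`, the `+`, `/`, and `=` characters become `%2B`, `%2F`, and `%3D`, so a URL parser sees exactly one `/` before the database name.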
I02 — Document the Django + kube-probe Host-header pattern
id: RAILIANCE-WP-0004-I02
status: done
priority: low
state_hub_task_id: "22a212e6-31b1-490a-8d1c-0a33ddc62501"
Problem. The kube-probe sends Host: <pod-ip>:8000. With
production Django settings (DEBUG=False, narrow ALLOWED_HOSTS),
that fails the Host validation and returns HTTP 400 Bad Request,
which the kubelet treats as Unhealthy. The first deploy revision
kept restarting on liveness failure for ~5 minutes before the cause was diagnosed.
Fix. The charts/vergabe-teilnahme chart already sets
httpGet.httpHeaders[Host] from probes.hostHeader. Promote this
pattern into a documented "Django-on-Railiance" recipe (short doc in
docs/) so the next Django app starts there rather than rediscovering
the gotcha. A "common chart values" sketch is also worth adding if a second
Django app justifies the abstraction.
Implemented 2026-05-22. Added docs/django-on-railiance.md and
cross-linked it from the vergabe-teilnahme runbook.
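Why the probe fails, and why overriding the Host header fixes it, can be shown with a simplified stand-in for the host check. This is a sketch under stated assumptions: it is not Django's implementation, the function names are hypothetical, and vergabe.example.org and the pod IP are placeholders.

```python
def strip_port(host_header: str) -> str:
    """Drop a trailing :port from a Host header (hostnames and IPv4 only)."""
    name, sep, port = host_header.rpartition(":")
    if sep and port.isdigit():
        return name.lower()
    return host_header.lower()


def host_allowed(host_header: str, allowed_hosts: list[str]) -> bool:
    """Simplified stand-in for an ALLOWED_HOSTS check (not Django's code)."""
    host = strip_port(host_header)
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True
        # A leading dot matches the domain itself and any subdomain.
        if pattern.startswith(".") and (host.endswith(pattern) or host == pattern[1:]):
            return True
    return False


def probe_status(host_header: str, allowed_hosts: list[str]) -> int:
    """HTTP status a probe would see: 200 if the Host passes, else 400."""
    return 200 if host_allowed(host_header, allowed_hosts) else 400
```

With `allowed_hosts = ["vergabe.example.org"]`, a probe carrying `Host: 10.42.0.17:8000` maps to 400, while the same probe with the Host header overridden to `vergabe.example.org` maps to 200. That is the effect of setting probes.hostHeader in the chart.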
I03 — Publish issue-core to a Gitea Python package registry
id: RAILIANCE-WP-0004-I03
status: done
priority: medium
state_hub_task_id: "f412b874-0670-4a4a-89fc-575fe4994646"
Problem. vergabe-teilnahme/pyproject.toml has a path dependency
on ../issue-core. Building the container image therefore requires
the --build-context issue-core=/home/worsch/issue-core BuildKit
flag, which is operator-machine-specific and breaks CI builds,
remote builds, and builds on other workstations.
Fix. Enable the Gitea Python package registry (analogous to the
container registry from RAIL-AP-WP-0001), publish issue-core as a
proper wheel with version, and switch the dep to
issue-core>=0.2,<0.3 with a normal index URL. The image build then
no longer needs the --build-context flag and becomes portable.
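As a rough illustration of what the issue-core>=0.2,<0.3 pin accepts, the sketch below compares plain dotted release numbers as integer tuples. It is deliberately simplified: real resolvers follow PEP 440, including pre-release and post-release handling that this ignores, and the function names are hypothetical.

```python
def parse_release(text: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as '0.2.1' into (0, 2, 1)."""
    return tuple(int(part) for part in text.split("."))


def in_pinned_series(version: str, lower: str = "0.2", upper: str = "0.3") -> bool:
    """True when lower <= version < upper for plain dotted releases."""
    release = parse_release(version)
    return parse_release(lower) <= release < parse_release(upper)
```

Under this simplification, 0.2.0 and any later 0.2.x release satisfy the pin, while 0.1.9 and 0.3.0 do not, so patch releases are picked up without a pyproject.toml change.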
Coordination: depends on the forge-owned Gitea PyPI endpoint and package
token posture in railiance-forge, plus a release pipeline for issue-core
in its source repo.
Local progress 2026-05-22. helm/gitea-registry-values.yaml set
packages.LIMIT_SIZE_PYPI: -1 while Gitea was still operated from this repo.
That registry operating surface has since moved to railiance-forge; current
PyPI endpoint docs live at
/home/worsch/railiance-forge/docs/gitea-package-registry.md. The remaining
release and dependency change must happen in the issue-core and
vergabe-teilnahme repos.
Cross-repo progress 2026-05-23. issue-core now has a validated
make package-check build and Gitea Actions publish workflow for the
0.2.x package series. vergabe-teilnahme has been switched in
pyproject.toml to issue-core>=0.2,<0.3, with the Docker named
issue-core build context removed in favor of the Gitea PyPI index.
The final unblock still requires a Gitea package username/token to
publish issue-core==0.2.0; once published, regenerate
vergabe-teilnahme/uv.lock from the registry and mark this task done.
Completed 2026-06-05. Published issue-core==0.2.0 to the Coulomb Gitea
PyPI registry using an operator token read from /tmp/gat.tmp without
recording the secret value. railiance-forge exposed the approved
/api/packages ingress path, the public package-specific simple index returned
200, a clean temporary environment installed issue-core==0.2.0 from Gitea,
and vergabe-teilnahme/uv.lock was regenerated so it uses the Gitea registry
instead of ../issue-core.
I04 — Operator onboarding: install the kubectl cnpg plugin
id: RAILIANCE-WP-0004-I04
status: done
priority: low
state_hub_task_id: "2f44cad1-b70c-4406-91a9-0c0fa9c75583"
Problem. make vergabe-status, apps-pg-status, db-shell use
kubectl cnpg ... first and fall back to bare kubectl when the
plugin is missing. The fallback works but the cnpg plugin gives much
better cluster diagnostics (status table, primary/replica health,
backup state).
Fix. Add the plugin install command to operator onboarding (one
line: kubectl krew install cnpg or a direct binary download). Add
a make check-tools target that warns when kubectl cnpg or helm
is missing.
Implemented 2026-05-22. Added make check-tools,
docs/operator-setup.md, and cnpg fallback status output for Gitea and
the shared apps-pg cluster.
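A tool check of this kind can be sketched as follows. It relies on kubectl discovering plugins as executables named kubectl-<name> on PATH, so looking for kubectl-cnpg approximates "is kubectl cnpg available". The function names are hypothetical and this is not the actual make check-tools implementation.

```python
import shutil
from typing import Callable, Iterable, Optional

# kubectl-cnpg is the executable name that backs the `kubectl cnpg` plugin.
REQUIRED_TOOLS = ("kubectl", "helm", "kubectl-cnpg")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def format_warnings(missing: list[str]) -> list[str]:
    """Build one warning line per missing tool; warn rather than fail."""
    return [f"warning: {tool} not found on PATH" for tool in missing]
```

Passing `which` as a parameter keeps the check testable without touching the operator's real PATH.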
I05 — Operator onboarding: SOPS / age key bootstrap
id: RAILIANCE-WP-0004-I05
status: done
priority: low
state_hub_task_id: "741d8a73-8cb0-40ac-a218-f1d3a74ebef3"
Problem. Several Makefile targets read helm/*.sops.yaml via
sops -d. A new operator with no ~/.config/sops/age/keys.txt
sees a confusing decryption failure rather than a clear "you need
the age key" message. The session that produced this workplan had to
skip the SOPS template step for apps-pg-secret.sops.yaml.template.
Fix. Add a docs/operator-setup.md with the age key handoff
procedure (where to put the key, how to verify, how to rotate). A
make check-sops target that asserts the keys file exists and can
decrypt a known sentinel would catch this at the first deploy attempt
rather than at the failing apply.
Implemented 2026-05-22. Added docs/operator-setup.md,
tools/check-sops.sh, and make check-sops. After the forge extraction,
make check-sops requires an explicit SOPS_SENTINEL=<encrypted-file> so this
repo does not depend on forge-owned Gitea SOPS files.
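The two assertions (key file present, sentinel decryptable) might look like the sketch below. It is an illustration, not the contents of tools/check-sops.sh; the function name and message wording are assumptions, and the sentinel path is whatever the operator passes as SOPS_SENTINEL.

```python
import subprocess
from pathlib import Path

# Key location named in the problem statement above.
DEFAULT_KEYS_FILE = Path.home() / ".config" / "sops" / "age" / "keys.txt"


def check_sops(keys_file: Path, sentinel: Path, run=subprocess.run) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    if not keys_file.is_file():
        # Fail early with the clear message instead of a decryption error.
        return [f"age key not found at {keys_file}: you need the age key"]
    try:
        result = run(["sops", "-d", str(sentinel)], capture_output=True, text=True)
    except FileNotFoundError:
        return ["sops is not installed or not on PATH"]
    if result.returncode != 0:
        return [f"sops could not decrypt {sentinel}: {result.stderr.strip()}"]
    return []
```

Checking for the key file before calling sops is what turns the confusing decryption failure into an actionable message.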
I06 — CI guard against stale committed manifests vs live CRD drift
id: RAILIANCE-WP-0004-I06
status: done
priority: medium
state_hub_task_id: "a319c20b-993c-46b7-889a-f0ac738056c4"
Problem. helm/gitea-db-cluster.yaml (in railiance-platform)
had spec.postgresql.version: "16" — a field that has never
existed in the CNPG v1 schema. The committed manifest had silently
diverged from the live cluster for months and would have been rejected on
the next make db-deploy. It was caught only when applying a new file
that copied the same stale shape.
Fix. Add a per-PR CI job that runs
kubectl apply --dry-run=server -f <changed-yaml> against a
representative cluster (or a kind cluster seeded with the same CRDs).
The cnpg / cert-manager / Traefik CRDs change between operator
releases; strict server-side decoding catches drift that
yamllint and Helm template rendering miss.
Note. Primarily a railiance-platform and railiance-cluster
concern, but mirrored here because every S5 manifest in
charts/ and manifests/ carries the same risk.
Implemented 2026-05-22. Added tools/k8s-server-dry-run.sh,
make k8s-server-dry-run, and a .gitea/workflows/ PR workflow that
runs the guard when charts, Helm values, manifests, or the dry-run tool
change.
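The core of such a guard can be sketched as a loop over changed files. This is a simplified illustration rather than the contents of tools/k8s-server-dry-run.sh: the function names are hypothetical, and it assumes every changed .yaml file is a plain manifest (chart templates would first need rendering, and encrypted files would need excluding).

```python
import subprocess


def dry_run_command(path: str) -> list[str]:
    """Command that asks the API server to validate a manifest without persisting it."""
    return ["kubectl", "apply", "--dry-run=server", "-f", path]


def failing_manifests(changed_files: list[str], run=subprocess.run) -> list[tuple[str, str]]:
    """Return (path, stderr) for every changed YAML file the server rejects."""
    failures = []
    for path in changed_files:
        if not path.endswith((".yaml", ".yml")):
            continue  # only YAML files are candidates for the dry run
        result = run(dry_run_command(path), capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((path, result.stderr.strip()))
    return failures
```

A CI job would exit non-zero when the returned list is non-empty, surfacing the server's own decoding error next to the offending file.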
I07 — kubectl run --rm -i smoke pattern is unreliable
id: RAILIANCE-WP-0004-I07
status: done
priority: low
state_hub_task_id: "e3f59b3d-95c8-4cf9-9943-b1597954fd77"
Problem. Repeated false negatives when testing service-IP
connectivity with kubectl run --rm -i …: the smoke pod exits
before the connection completes, producing "Connection refused"
output even though the destination service was fully healthy. This wasted
significant debugging time during apps-pg verification before the
switch to a persistent pod + kubectl exec.
Fix. Add a docs/operator-recipes.md note (or inline in the
runbook) recommending the persistent-pod-plus-exec pattern for any
service-IP smoke check. Optional: ship tools/smoke.sh that
wraps the pattern.
Implemented 2026-05-22. Added docs/operator-recipes.md and
tools/smoke-service.sh.
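The persistent-pod-plus-exec pattern boils down to four kubectl invocations: start a long-lived pod, wait for it to be Ready, exec the check, then delete the pod. The sketch below only builds those commands. It is not the contents of tools/smoke-service.sh; the function name, default pod name, and sleep duration are assumptions.

```python
def smoke_commands(
    namespace: str,
    image: str,
    check: list[str],
    pod: str = "smoke",
    timeout: str = "60s",
) -> list[list[str]]:
    """Build the kubectl commands for a persistent-pod smoke check."""
    ns = ["-n", namespace]
    return [
        # 1. Start a pod that stays alive instead of exiting with the check.
        ["kubectl", "run", pod, *ns, f"--image={image}", "--restart=Never",
         "--command", "--", "sleep", "3600"],
        # 2. Block until the pod is Ready so the check cannot race startup.
        ["kubectl", "wait", *ns, "--for=condition=Ready", f"pod/{pod}",
         f"--timeout={timeout}"],
        # 3. Run the actual connectivity check inside the running pod.
        ["kubectl", "exec", *ns, pod, "--", *check],
        # 4. Clean up the pod afterwards.
        ["kubectl", "delete", "pod", pod, *ns, "--wait=false"],
    ]
```

For an apps-pg check, one plausible call is `smoke_commands("apps", "postgres:16", ["pg_isready", "-h", "apps-pg-rw", "-p", "5432"])`, where the namespace and image are placeholders. Because the pod outlives the check, a failed exec can be retried or inspected instead of disappearing with the pod.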
Notes
- Items were activated on 2026-05-22 and completed on 2026-06-05. I03 closed after issue-core==0.2.0 was published to the Gitea PyPI registry, the package API route was exposed by railiance-forge, and the vergabe-teilnahme source lock moved off the sibling checkout.
- I06 is genuinely cross-repo; the others are local to railiance-apps or its operator workflow.
- The first three items (I01, I02, I03) are the highest-leverage for the second S5 app onboarding.