the-custodian/docs/forgejo-production-decisions.md
codex e8a7f49bde Record ADR-004 in-cluster Forgejo runner decision for T04
Updates forgejo-production-decisions and CUST-WP-0054-T04 partial progress.
2026-07-03 22:29:28 +02:00


# Forgejo Production Decisions (Wave 1)
Date: 2026-07-03
Workplans: `RAIL-HO-WP-0005-T02`, `CUST-WP-0054-T04`
Operator input: 2026-07-03
## Decision log
| # | Topic | Decision | Status | Evidence (2026-07-03) |
| --- | --- | --- | --- | --- |
| 1 | **Production hostname** | `forgejo.coulomb.social` | **decided** | DNS A → `92.205.62.239` (railiance01); HTTPS reaches Traefik on railiance01 |
| 2 | Exposure model | Private HTTPS via railiance01 Traefik ingress + cert-manager `letsencrypt-prod` | **decided** | Same pattern as Gitea (`manifests/forgejo-ingress.yaml` in `railiance-apps`) |
| 2b | Deployment pattern | `gitea-charts/gitea` **12.5.0** Helm + Forgejo image; CNPG `forgejo-db` in `railiance-platform`; Makefile in `railiance-apps` | **decided** | Chart 12.6+ requires Gitea 1.26 `config edit-ini` (incompatible with Forgejo 11); see `railiance-apps/docs/forgejo-on-railiance01.md` |
| 2c | Live deploy | Forgejo pod + ingress + TLS on railiance01 | **done** (2026-07-03) | `make forgejo-smoke` → HTTP 200 + OCI `/v2/` 401 challenge; cert `forgejo-tls` Ready |
| 3 | Gitea during transition | `gitea.coulomb.social` on coulombcore remains canonical **until** Forgejo restore/migration drills pass; then read-only mirror | unchanged | Per `RAIL-HO-WP-0005` safety contract |
| 4 | SMTP / password reset | TBD | open | — |
| 5 | Package registry scope | TBD (container images assumed first) | open | — |
| 6 | Actions runner model | **In-cluster** on railiance01: `forgejo-runner` Deployment + DinD (`railiance01-build-01`) | **decided** | `railiance-infra/docs/adr/ADR-004-forgejo-in-cluster-actions-runner.md`; manifests in `railiance-apps/manifests/forgejo-runner.yaml` |
| 7 | Backup target + retention | TBD | open | — |
| 8 | Cutover mode | TBD (staged per-repo vs freeze-all) | open | — |
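
Decision 2b pins the chart at 12.5.0 because chart 12.6+ is incompatible with Forgejo 11. The following is a minimal sketch of a guard a deploy script could run before upgrading. The function name `chart_version_allowed` is hypothetical and is not taken from the `railiance-apps` Makefile; it assumes a plain numeric `major.minor[.patch]` version string.

```bash
#!/bin/sh
# Hypothetical guard (not part of railiance-apps): accept only chart versions
# below 12.6, the first series that decision 2b records as incompatible.

chart_version_allowed() {
  version="$1"
  major="${version%%.*}"            # text before the first dot
  case "$version" in
    *.*) rest="${version#*.}"       # text after the first dot
         minor="${rest%%.*}" ;;     # second numeric component
    *)   minor=0 ;;                 # bare major version, e.g. "12"
  esac
  # Allowed when major < 12, or major == 12 and minor < 6.
  [ "$major" -lt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -lt 6 ]; }
}

if chart_version_allowed "12.5.0"; then
  echo "chart 12.5.0: allowed"
fi
```

Comparing numeric components rather than strings keeps a version such as `12.10.0` from being mistaken for something older than `12.6.0`.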
## Hostname decision detail
**Chosen hostname:** `https://forgejo.coulomb.social`
| Field | Value |
| --- | --- |
| DNS | `forgejo.coulomb.social` A → `92.205.62.239` (railiance01) |
| Edge | railiance01 k3s Traefik (`kube-system/traefik` LoadBalancer) |
| Target machine | railiance01 (production home per `CUST-WP-0054`) |
| Canonical git remote (post-cutover) | `https://forgejo.coulomb.social/coulomb/<repo>.git` |
| OCI registry (post-cutover) | `forgejo.coulomb.social/coulomb/<image>` |
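
The post-cutover git remote and OCI registry paths in the table above share one host and one owner segment, so both can be derived from a repository or image name. A minimal sketch under that assumption: the helper names `forgejo_git_remote` and `forgejo_image_ref`, the `latest` default tag, and the names `example-repo` and `example-image` are hypothetical.

```bash
#!/bin/sh
# Hypothetical helpers that build the post-cutover paths from the table above.

FORGEJO_HOST="forgejo.coulomb.social"
FORGEJO_OWNER="coulomb"

# Canonical git remote: https://<host>/<owner>/<repo>.git
forgejo_git_remote() {
  printf 'https://%s/%s/%s.git\n' "$FORGEJO_HOST" "$FORGEJO_OWNER" "$1"
}

# OCI image reference: <host>/<owner>/<image>:<tag>; the tag default is an assumption.
forgejo_image_ref() {
  printf '%s/%s/%s:%s\n' "$FORGEJO_HOST" "$FORGEJO_OWNER" "$1" "${2:-latest}"
}

forgejo_git_remote example-repo
forgejo_image_ref example-image v1
```

After cutover, a checkout could be repointed with `git remote set-url origin "$(forgejo_git_remote example-repo)"`. Until then, Gitea stays canonical per decision 3.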
### Live probe (2026-07-03, post-deploy)
```bash
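# DNS resolves to railiance01
getent hosts forgejo.coulomb.social # 92.205.62.239
# Web UI answers over HTTPS
curl -fsS -o /dev/null -w '%{http_code}\n' https://forgejo.coulomb.social/ # 200
# Registry endpoint: dump response headers from a GET and discard the body (401 challenge expected)
curl -sS -o /dev/null -D - https://forgejo.coulomb.social/v2/ | grep -i docker-distribution # registry/2.0
# Pod, ingress, and certificate status in the forgejo namespace
KUBECONFIG=~/.kube/config-hosteurope kubectl get pods,ingress,certificate -n forgejo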
```
Forgejo is serving HTTPS with a valid Let's Encrypt cert. Gitea on coulombcore
remains canonical for git remotes until migration drills pass.
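
The pass criteria recorded for decision 2c are HTTP 200 on the web root and a 401 challenge on `/v2/`. A minimal sketch of how a script might turn two observed status codes into a verdict is shown here. `smoke_verdict` is a hypothetical helper and is not the implementation of `make forgejo-smoke`; it takes the status codes as arguments so the logic can be exercised without network access.

```bash
#!/bin/sh
# Hypothetical verdict helper: arguments are the HTTP status observed for the
# web root and for the OCI /v2/ endpoint, in that order.

smoke_verdict() {
  web_status="$1"
  registry_status="$2"
  if [ "$web_status" = "200" ] && [ "$registry_status" = "401" ]; then
    echo "pass"
  else
    echo "fail: web=$web_status registry=$registry_status"
    return 1
  fi
}

# Example with the values recorded in the decision log.
smoke_verdict 200 401
```

In a live check, the two arguments could come from `curl -sS -o /dev/null -w '%{http_code}'` run against each URL.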
### Implications for CUST-WP-0054
- Wave 1 can proceed with a fixed hostname for overlays, ingress manifests, and
CI `IMAGE_REPOSITORY` variables.
- State Hub / sweep checkouts on railiance01 (T05) should clone from
`forgejo.coulomb.social` once cutover completes.
- Remaining T02 items (SMTP, backup, cutover mode) still block
production cutover and `RAIL-HO-WP-0005-T11`; the runner model is decided (decision 6).
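
With the hostname fixed, a CI job can verify that its `IMAGE_REPOSITORY` value points at the Forgejo registry rather than at a leftover Gitea path. A minimal sketch, assuming the `coulomb` owner segment from the hostname table; the function `image_repository_ok` and the value `example-image` are hypothetical.

```bash
#!/bin/sh
# Hypothetical check that IMAGE_REPOSITORY targets the fixed Forgejo hostname.

EXPECTED_REGISTRY_PREFIX="forgejo.coulomb.social/coulomb/"

image_repository_ok() {
  case "$1" in
    # The prefix must be followed by at least one character (the image name).
    "$EXPECTED_REGISTRY_PREFIX"?*) return 0 ;;
    *) return 1 ;;
  esac
}

IMAGE_REPOSITORY="forgejo.coulomb.social/coulomb/example-image"  # hypothetical value
if image_repository_ok "$IMAGE_REPOSITORY"; then
  echo "IMAGE_REPOSITORY targets Forgejo"
else
  echo "IMAGE_REPOSITORY does not target Forgejo: $IMAGE_REPOSITORY"
fi
```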
## Open decisions (need operator input)
1. SMTP provider, sender address, and SPF/DKIM alignment for `@coulomb.social`
2. Package types beyond OCI at launch (npm, PyPI, Helm, …)
3. Actions runner: resolved as in-cluster on railiance01 (decision 6, ADR-004); no further operator input needed
4. Backup destination and restore cadence
5. Cutover: staged project-by-project vs single freeze window