From e8a7f49bde8bed1ce364c7b3d2f17db9cd7aa41e Mon Sep 17 00:00:00 2001
From: codex
Date: Fri, 3 Jul 2026 22:29:28 +0200
Subject: [PATCH] Record ADR-004 in-cluster Forgejo runner decision for T04

Updates forgejo-production-decisions and CUST-WP-0054-T04 partial progress.
---
 docs/forgejo-production-decisions.md          | 61 +++++++++++++++++++
 ...tion-independence-and-fleet-realignment.md | 15 +++--
 2 files changed, 70 insertions(+), 6 deletions(-)
 create mode 100644 docs/forgejo-production-decisions.md

diff --git a/docs/forgejo-production-decisions.md b/docs/forgejo-production-decisions.md
new file mode 100644
index 0000000..143a7bb
--- /dev/null
+++ b/docs/forgejo-production-decisions.md
@@ -0,0 +1,61 @@
+# Forgejo Production Decisions (Wave 1)
+
+Date: 2026-07-03
+Workplans: `RAIL-HO-WP-0005-T02`, `CUST-WP-0054-T04`
+Operator input: 2026-07-03
+
+## Decision log
+
+| # | Topic | Decision | Status | Evidence (2026-07-03) |
+| --- | --- | --- | --- | --- |
+| 1 | **Production hostname** | `forgejo.coulomb.social` | **decided** | DNS A → `92.205.62.239` (railiance01); HTTPS reaches Traefik on railiance01 |
+| 2 | Exposure model | Private HTTPS via railiance01 Traefik ingress + cert-manager `letsencrypt-prod` | **decided** | Same pattern as Gitea (`manifests/forgejo-ingress.yaml` in `railiance-apps`) |
+| 2b | Deployment pattern | `gitea-charts/gitea` **12.5.0** Helm + Forgejo image; CNPG `forgejo-db` in `railiance-platform`; Makefile in `railiance-apps` | **decided** | Chart 12.6+ requires Gitea 1.26 `config edit-ini` (incompatible with Forgejo 11); see `railiance-apps/docs/forgejo-on-railiance01.md` |
+| 2c | Live deploy | Forgejo pod + ingress + TLS on railiance01 | **done** (2026-07-03) | `make forgejo-smoke` → HTTP 200 + OCI `/v2/` 401 challenge; cert `forgejo-tls` Ready |
+| 3 | Gitea during transition | `gitea.coulomb.social` on coulombcore remains canonical **until** Forgejo restore/migration drills pass; then read-only mirror | unchanged | Per `RAIL-HO-WP-0005` safety contract |
+| 4 | SMTP / password reset | TBD | open | — |
+| 5 | Package registry scope | TBD (container images first assumed) | open | — |
+| 6 | Actions runner model | **In-cluster** on railiance01: `forgejo-runner` Deployment + DinD (`railiance01-build-01`) | **decided** | `railiance-infra/docs/adr/ADR-004-forgejo-in-cluster-actions-runner.md`; manifests in `railiance-apps/manifests/forgejo-runner.yaml` |
+| 7 | Backup target + retention | TBD | open | — |
+| 8 | Cutover mode | TBD (staged per-repo vs freeze-all) | open | — |
+
+## Hostname decision detail
+
+**Chosen hostname:** `https://forgejo.coulomb.social`
+
+| Field | Value |
+| --- | --- |
+| DNS | `forgejo.coulomb.social` → `92.205.62.239` (railiance01) |
+| Edge | railiance01 k3s Traefik (`kube-system/traefik` LoadBalancer) |
+| Target machine | railiance01 (production home per `CUST-WP-0054`) |
+| Canonical git remote (post-cutover) | `https://forgejo.coulomb.social/coulomb/.git` |
+| OCI registry (post-cutover) | `forgejo.coulomb.social/coulomb/` |
+
+### Live probe (2026-07-03, post-deploy)
+
+```bash
+getent hosts forgejo.coulomb.social # 92.205.62.239
+curl -fsS -o /dev/null -w '%{http_code}\n' https://forgejo.coulomb.social/ # 200
+curl -sSI -X GET https://forgejo.coulomb.social/v2/ | grep -i docker-distribution # registry/2.0
+KUBECONFIG=~/.kube/config-hosteurope kubectl get pods,ingress,certificate -n forgejo
+```
+
+Forgejo is serving HTTPS with a valid Let's Encrypt cert. Gitea on coulombcore
+remains canonical for git remotes until migration drills pass.
+
+### Implications for CUST-WP-0054
+
+- Wave 1 can proceed with a fixed hostname for overlays, ingress manifests, and
+  CI `IMAGE_REPOSITORY` variables.
+- State Hub / sweep checkouts on railiance01 (T05) should clone from
+  `forgejo.coulomb.social` once cutover completes.
+- Remaining T02 items (SMTP, runners, backup, cutover mode) still block
+  production cutover and `RAIL-HO-WP-0005-T11`.
+
+## Open decisions (need operator input)
+
+1. SMTP provider, sender address, and SPF/DKIM alignment for `@coulomb.social`
+2. Package types beyond OCI at launch (npm, PyPI, Helm, …)
+3. Actions runner: in-cluster ephemeral vs long-lived pod vs host runner
+4. Backup destination and restore cadence
+5. Cutover: staged project-by-project vs single freeze window
\ No newline at end of file
diff --git a/workplans/CUST-WP-0054-workstation-independence-and-fleet-realignment.md b/workplans/CUST-WP-0054-workstation-independence-and-fleet-realignment.md
index 2bb7809..299e091 100644
--- a/workplans/CUST-WP-0054-workstation-independence-and-fleet-realignment.md
+++ b/workplans/CUST-WP-0054-workstation-independence-and-fleet-realignment.md
@@ -107,7 +107,7 @@ production dependency (likely identity/OpenBao) has moved.
 
 ```task
 id: CUST-WP-0054-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "67b91b18-9ad0-4917-990a-056a7007a2d4"
 ```
@@ -124,7 +124,7 @@ host and target host. Done when every row has a target and a migration owner
 
 ```task
 id: CUST-WP-0054-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "4f2ae1f1-f9ad-44bb-bae7-151030634f56"
 ```
@@ -148,7 +148,7 @@ emission working (partial T10 rehearsal).
 
 ```task
 id: CUST-WP-0054-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "70a25fbd-71d7-4d74-a04b-30e775984feb"
 ```
@@ -165,7 +165,7 @@ authenticates through them).
 
 ```task
 id: CUST-WP-0054-T04
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "79b9ee4d-f792-434c-a2ea-2fe216a948ca"
 ```
@@ -174,8 +174,11 @@ Execute/absorb `RAIL-HO-WP-0005`: Forgejo production on railiance01 becomes
 the canonical remote for all repos; coulombcore Gitea becomes a read-only
 mirror until decommission. Stand up Actions runners so container images
 (state-hub, core-hub, issue-core, activity-core) build and push in CI from
-tags — the workstation stops being the build/publish host. Done when a
-release ships with the workstation off.
+tags — the workstation stops being the build/publish host.
+
+**Partial (2026-07-03):** ADR-004 in-cluster runner (`railiance01-build-01` +
+DinD) replaces interim coulombcore host runner. Remaining: image-build workflow
+on runner, repo migration, release with workstation off.
 
 ## Task: State Hub production home on railiance01
 