Record ADR-004 in-cluster Forgejo runner decision for T04

Updates forgejo-production-decisions and CUST-WP-0054-T04 partial progress.
codex
2026-07-03 22:29:28 +02:00
parent 4084c849f5
commit e8a7f49bde
2 changed files with 70 additions and 6 deletions

@@ -0,0 +1,61 @@
# Forgejo Production Decisions (Wave 1)
Date: 2026-07-03
Workplans: `RAIL-HO-WP-0005-T02`, `CUST-WP-0054-T04`
Operator input: 2026-07-03
## Decision log
| # | Topic | Decision | Status | Evidence (2026-07-03) |
| --- | --- | --- | --- | --- |
| 1 | **Production hostname** | `forgejo.coulomb.social` | **decided** | DNS A → `92.205.62.239` (railiance01); HTTPS reaches Traefik on railiance01 |
| 2 | Exposure model | Private HTTPS via railiance01 Traefik ingress + cert-manager `letsencrypt-prod` | **decided** | Same pattern as Gitea (`manifests/forgejo-ingress.yaml` in `railiance-apps`) |
| 2b | Deployment pattern | `gitea-charts/gitea` **12.5.0** Helm + Forgejo image; CNPG `forgejo-db` in `railiance-platform`; Makefile in `railiance-apps` | **decided** | Chart 12.6+ requires Gitea 1.26 `config edit-ini` (incompatible with Forgejo 11); see `railiance-apps/docs/forgejo-on-railiance01.md` |
| 2c | Live deploy | Forgejo pod + ingress + TLS on railiance01 | **done** (2026-07-03) | `make forgejo-smoke` → HTTP 200 + OCI `/v2/` 401 challenge; cert `forgejo-tls` Ready |
| 3 | Gitea during transition | `gitea.coulomb.social` on coulombcore remains canonical **until** Forgejo restore/migration drills pass; then read-only mirror | unchanged | Per `RAIL-HO-WP-0005` safety contract |
| 4 | SMTP / password reset | TBD | open | — |
| 5 | Package registry scope | TBD (container images first assumed) | open | — |
| 6 | Actions runner model | **In-cluster** on railiance01: `forgejo-runner` Deployment + DinD (`railiance01-build-01`) | **decided** | `railiance-infra/docs/adr/ADR-004-forgejo-in-cluster-actions-runner.md`; manifests in `railiance-apps/manifests/forgejo-runner.yaml` |
| 7 | Backup target + retention | TBD | open | — |
| 8 | Cutover mode | TBD (staged per-repo vs freeze-all) | open | — |
## Hostname decision detail
**Chosen hostname:** `https://forgejo.coulomb.social`
| Field | Value |
| --- | --- |
| DNS | `forgejo.coulomb.social` → `92.205.62.239` (railiance01) |
| Edge | railiance01 k3s Traefik (`kube-system/traefik` LoadBalancer) |
| Target machine | railiance01 (production home per `CUST-WP-0054`) |
| Canonical git remote (post-cutover) | `https://forgejo.coulomb.social/coulomb/<repo>.git` |
| OCI registry (post-cutover) | `forgejo.coulomb.social/coulomb/<image>` |
### Live probe (2026-07-03, post-deploy)
```bash
getent hosts forgejo.coulomb.social # 92.205.62.239
curl -fsS -o /dev/null -w '%{http_code}\n' https://forgejo.coulomb.social/ # 200
curl -sSI -X GET https://forgejo.coulomb.social/v2/ | grep -i docker-distribution # registry/2.0
KUBECONFIG=~/.kube/config-hosteurope kubectl get pods,ingress,certificate -n forgejo
```
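The probe above can be turned into a simple pass/fail check. The sketch below assumes the expected results recorded in this document (HTTP 200 on `/` and a 401 challenge on `/v2/`). The function names `classify_probe` and `probe` and the `FORGEJO_HOST` variable are hypothetical and are not part of `railiance-apps`.

```bash
# Hypothetical helper, not part of the repo: interpret the two probe status codes.
FORGEJO_HOST="${FORGEJO_HOST:-forgejo.coulomb.social}"

# classify_probe WEB_STATUS REGISTRY_STATUS
# Prints "ok" and returns 0 when both codes match the expected values,
# otherwise prints the first mismatch and returns 1.
classify_probe() {
  if [ "$1" != "200" ]; then
    echo "web: expected 200, got $1"
    return 1
  fi
  if [ "$2" != "401" ]; then
    echo "registry: expected 401 challenge, got $2"
    return 1
  fi
  echo "ok"
}

# probe: fetch both status codes with curl and classify them.
# A transport failure is reported as status 000.
probe() {
  web="$(curl -sS -o /dev/null -w '%{http_code}' "https://${FORGEJO_HOST}/")" || web="000"
  registry="$(curl -sS -o /dev/null -w '%{http_code}' "https://${FORGEJO_HOST}/v2/")" || registry="000"
  classify_probe "$web" "$registry"
}
```

Calling `probe` after sourcing the file runs both requests. Keeping the classification separate from the network calls lets the expected values be checked without reaching the host.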
Forgejo is serving HTTPS with a valid Let's Encrypt cert. Gitea on coulombcore
remains canonical for git remotes until migration drills pass.
### Implications for CUST-WP-0054
- Wave 1 can proceed with a fixed hostname for overlays, ingress manifests, and
CI `IMAGE_REPOSITORY` variables.
- State Hub / sweep checkouts on railiance01 (T05) should clone from
`forgejo.coulomb.social` once cutover completes.
- Remaining T02 items (SMTP, backup, cutover mode) still block production
cutover and `RAIL-HO-WP-0005-T11`; the runner model is now decided (row 6).
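As an illustration of what the fixed hostname enables, the sketch below derives the post-cutover git remote and the image repository from the values in the hostname table. The function and variable names are hypothetical, and the tag `v1.2.3` in the usage note is a placeholder, not a real release.

```bash
# Assumed values, copied from the hostname table above.
FORGEJO_HOST="forgejo.coulomb.social"
FORGEJO_OWNER="coulomb"

# canonical_remote REPO -> post-cutover git remote URL
canonical_remote() {
  printf 'https://%s/%s/%s.git\n' "$FORGEJO_HOST" "$FORGEJO_OWNER" "$1"
}

# image_repository IMAGE -> value for a CI IMAGE_REPOSITORY variable
image_repository() {
  printf '%s/%s/%s\n' "$FORGEJO_HOST" "$FORGEJO_OWNER" "$1"
}

# image_ref IMAGE TAG -> full image reference for a tagged build
image_ref() {
  printf '%s:%s\n' "$(image_repository "$1")" "$2"
}
```

After cutover, a checkout could be repointed with `git remote set-url origin "$(canonical_remote state-hub)"`, and a CI job could set `IMAGE_REPOSITORY="$(image_repository state-hub)"` before building `"$(image_ref state-hub v1.2.3)"`.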
## Open decisions (need operator input)
1. SMTP provider, sender address, and SPF/DKIM alignment for `@coulomb.social`
2. Package types beyond OCI at launch (npm, PyPI, Helm, …)
3. Actions runner: resolved (decision 6, ADR-004: in-cluster `forgejo-runner` Deployment + DinD)
4. Backup destination and restore cadence
5. Cutover: staged project-by-project vs single freeze window

@@ -107,7 +107,7 @@ production dependency (likely identity/OpenBao) has moved.
```task
id: CUST-WP-0054-T01
-status: todo
+status: done
priority: high
state_hub_task_id: "67b91b18-9ad0-4917-990a-056a7007a2d4"
```
@@ -124,7 +124,7 @@ host and target host. Done when every row has a target and a migration owner
```task
id: CUST-WP-0054-T02
-status: todo
+status: done
priority: high
state_hub_task_id: "4f2ae1f1-f9ad-44bb-bae7-151030634f56"
```
@@ -148,7 +148,7 @@ emission working (partial T10 rehearsal).
```task
id: CUST-WP-0054-T03
-status: todo
+status: done
priority: high
state_hub_task_id: "70a25fbd-71d7-4d74-a04b-30e775984feb"
```
@@ -165,7 +165,7 @@ authenticates through them).
```task
id: CUST-WP-0054-T04
-status: todo
+status: progress
priority: high
state_hub_task_id: "79b9ee4d-f792-434c-a2ea-2fe216a948ca"
```
@@ -174,8 +174,11 @@ Execute/absorb `RAIL-HO-WP-0005`: Forgejo production on railiance01 becomes
the canonical remote for all repos; coulombcore Gitea becomes a read-only
mirror until decommission. Stand up Actions runners so container images
(state-hub, core-hub, issue-core, activity-core) build and push in CI from
-tags — the workstation stops being the build/publish host. Done when a
-release ships with the workstation off.
+tags — the workstation stops being the build/publish host.
+**Partial (2026-07-03):** ADR-004 in-cluster runner (`railiance01-build-01` +
+DinD) replaces interim coulombcore host runner. Remaining: image-build workflow
+on runner, repo migration, release with workstation off.
## Task: State Hub production home on railiance01