---
id: RAIL-HO-WP-0005
type: workplan
title: "Forgejo Production Migration on railiance01"
domain: railiance
repo: railiance-infra
status: active
owner: railiance
topic_slug: railiance
created: "2026-05-03"
updated: "2026-06-04"
state_hub_workstream_id: "84e17675-0d15-4268-a8bd-540124d37018"
---

# Forgejo Production Migration on railiance01

## Goal

Establish Forgejo as the production-grade source forge and package base for Railiance, then migrate all repositories and workflows currently relying on Gitea to the new Forgejo installation.

Forgejo will become the heart of Railiance infrastructure. The work must be fully automated, backup-backed, recovery-drilled, and suitable for long-lived operation on railiance01 before any production cutover happens.

## Placement in the Railiance Tooling Set

This workplan lives in `railiance-infra` because it is the cross-layer production infrastructure coordination plan and belongs next to `RAIL-HO-WP-0004-production-readiness.md`.

Implementation must respect the OAS repo boundaries:

| Concern | Repo | Layer |
|---------|------|-------|
| Server prerequisites, inventory, OS packages, SSH/system users | `railiance-infra` | S1 |
| k3s runtime prerequisites, namespaces, ingress class, cluster backup hooks | `railiance-cluster` | S2 |
| PostgreSQL, object storage, backup targets, registry storage dependencies | `railiance-platform` | S3 |
| Forgejo Actions runner templates, CI conventions, migration automation | `railiance-enablement` | S4 |
| Forgejo Helm release, app config, mail config, package registry, app backups | `railiance-apps` | S5 |

This file is the umbrella plan. If an implementation step requires files in a different repo, that repo should receive its own workplan or task before the change is made there.

## Key Decisions to Confirm

1. Public/private hostname for Forgejo and whether Gitea remains reachable during the transition.
2. Mail delivery path for password reset and account recovery (SMTP relay, sender domain, SPF/DKIM/DMARC expectations).
3. Package registry scope: container images only at first, or also generic, npm, PyPI, Go, Maven, and Helm packages.
4. Actions runner model: in-cluster ephemeral runners, long-lived runner pod, or isolated host runner.
5. Backup destination and retention target for database, repositories, attachments, LFS, Actions artifacts/logs, and package data.
6. Cutover mode: freeze-and-migrate all repos in one window, or staged project-by-project transition.

## Safety Contract

- Gitea remains the production source of truth until Forgejo restore and migration drills pass.
- No repository is deleted from Gitea during this workplan.
- A fresh Gitea backup must be taken before every migration drill and before final cutover.
- Forgejo backups must be restored into an isolated namespace before accepting production use.
- Password reset and email recovery must be verified with a real controlled account before onboarding users.
- Forgejo Actions may not receive broad cluster credentials by default; runner permissions must be least-privilege and repo-scoped where practical.
- Secrets stay in SOPS/age or Kubernetes Secrets managed by the appropriate repo. No plaintext SMTP passwords, admin tokens, runner tokens, or registry credentials in Git.

## Probe Strategy

A `forgejo-railiance-probe` is reasonable and should be treated as a disposable S5/S4 integration probe, not as the production install.

The probe should prove:

- Helm values and cnpg database wiring converge cleanly.
- Initial admin bootstrap is automated and repeatable.
- SMTP/password reset works end-to-end.
- Package registry endpoints work for the package types Railiance needs first.
- Forgejo Actions can run a minimal workflow and publish a test package.
- Backup and restore works in an isolated namespace.
- Migration from a sample Gitea repo preserves git history, issues, releases, wiki, LFS or attachments where applicable.

The probe is destroyed or explicitly archived after production Forgejo is live.

## Target Architecture

```
operator / agents / developers
  -> private HTTPS endpoint
  -> railiance01 ingress
  -> forgejo Service in forgejo namespace
  -> Forgejo Deployment/StatefulSet
  -> forgejo-db CloudNative PG Cluster in databases namespace
  -> Valkey/cache if required
  -> persistent storage for repositories, attachments, LFS, packages
  -> Actions runner(s) with restricted execution scope
  -> backup jobs to the approved backup target
```

## Tasks

### T01 — Inventory current Gitea functionality and migration requirements

```task
id: RAIL-HO-WP-0005-T01
status: in_progress
priority: high
state_hub_task_id: "cf59d171-5629-45c9-9d44-8d6499827ffc"
```

Create a source-of-truth inventory of current Gitea usage.

First-pass inventory artifact: `docs/forgejo-migration-inventory.md`.

Minimum inventory:

- All repositories in the `coulomb` organization.
- Registered vs unregistered State Hub repos.
- Users, organizations, teams, deploy keys, SSH keys, access tokens.
- Issues, labels, milestones, releases, wiki, packages, LFS, attachments.
- Existing webhook usage and automation assumptions.
- Current Gitea package registry status and the missing `[packages]` config that is blocking container image publication.

**Done when:** the inventory identifies every feature that must work in Forgejo before cutover and classifies each migration item as automatic, manual, unsupported, or explicitly out of scope.

---

### T02 — Resolve Forgejo production design decisions

```task
id: RAIL-HO-WP-0005-T02
status: todo
priority: high
needs_human: true
state_hub_task_id: "f88115bf-4f99-49ef-a415-0b23750141b3"
```

Decide the production choices listed in "Key Decisions to Confirm".

Expected output:

- A short decision record in this workplan or a dedicated ADR.
- Hostname and exposure model.
- SMTP provider and sender identity.
- Package registry scope.
- Actions runner isolation model.
- Backup target, retention, encryption, and restore cadence.
- Cutover strategy and rollback window.

**Done when:** implementation tasks are no longer blocked by open production choices.

---

### T03 — Build forgejo-railiance-probe

```task
id: RAIL-HO-WP-0005-T03
status: todo
priority: high
state_hub_task_id: "b516018a-415e-4a58-8c62-07c14ece9353"
```

Create a disposable probe environment for Forgejo before touching production.

Expected repo ownership:

- `railiance-platform`: probe cnpg database and storage dependencies.
- `railiance-apps`: probe Forgejo Helm values and namespace.
- `railiance-enablement`: probe Actions runner template and workflows.

Probe acceptance:

- `make forgejo-probe-deploy` or equivalent converges from a clean cluster state.
- Admin bootstrap is automated.
- A test user can reset a password via email.
- A test repository can be created, cloned, pushed, and protected.
- A test package can be published and pulled.
- A test Forgejo Actions workflow runs successfully.
- A probe backup restores into an isolated namespace.

**Done when:** the probe demonstrates the whole lifecycle without manual cluster surgery.

---

### T04 — Define Forgejo platform services

```task
id: RAIL-HO-WP-0005-T04
status: todo
priority: high
state_hub_task_id: "28b351fe-bfbe-4a8b-bbfa-1b148e69f8e0"
```

In `railiance-platform`, define production platform services for Forgejo.

Minimum scope:

- `forgejo-db` CloudNative PG cluster.
- Database credentials via SOPS-managed Secret or approved secret flow.
- Backup configuration for database base backups and WAL archiving.
- Object storage or persistent volume plan for repositories, attachments, LFS, packages, Actions artifacts, and logs.
- Restore runbook for database and blob/package data.

**Done when:** platform dependencies can be deployed and restored without the Forgejo app running.
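
The "no plaintext credentials in Git" rule that applies to the SOPS-managed database Secret could be backed by a small pre-commit check. The sketch below is only an illustration: it assumes the Secret's `stringData` has already been parsed into a Python dict, that values use the default SOPS `ENC[AES256_GCM,data:...]` envelope, and the function names are hypothetical rather than part of any Railiance tooling.

```python
# Hypothetical pre-commit helper: flag Secret values that do not look SOPS-encrypted.
# Assumption: values encrypted by SOPS use its default "ENC[AES256_GCM,data:...]" envelope.

SOPS_PREFIX = "ENC[AES256_GCM,data:"


def looks_encrypted(value: str) -> bool:
    """Return True if the value has the shape of a SOPS-encrypted payload."""
    return value.startswith(SOPS_PREFIX) and value.endswith("]")


def plaintext_keys(string_data: dict[str, str]) -> list[str]:
    """Return the sorted keys whose values appear to be committed in plaintext."""
    return sorted(key for key, value in string_data.items() if not looks_encrypted(value))


if __name__ == "__main__":
    # Example input; key names and values are made up for illustration.
    sample = {
        "username": "ENC[AES256_GCM,data:abc,iv:def,tag:ghi,type:str]",
        "password": "not-encrypted",
    }
    offenders = plaintext_keys(sample)
    if offenders:
        print("plaintext values found for keys:", ", ".join(offenders))
```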

---

### T05 — Define production Forgejo application deployment

```task
id: RAIL-HO-WP-0005-T05
status: todo
priority: high
state_hub_task_id: "11540ba4-d31c-4f64-836b-c6de69107aa4"
```

In `railiance-apps`, create the production Forgejo deployment.

Minimum scope:

- Forgejo Helm release or manifests in the S5 boundary.
- App configuration for database, SSH, HTTPS, mailer, packages, LFS, and security settings.
- Initial admin/user bootstrap that is automated but does not commit secrets.
- Health/status targets in the Makefile.
- Migration-safe configuration for coexistence with Gitea during the cutover.

**Done when:** Forgejo runs on railiance01 against production platform services and can serve login, git clone/push, package registry, and admin operations.

---

### T06 — Implement usable email recovery cycle

```task
id: RAIL-HO-WP-0005-T06
status: todo
priority: high
needs_human: true
state_hub_task_id: "417faa4d-eab8-4247-9485-4f80e5d5b7ff"
```

Configure and test mail delivery for account recovery.

Minimum scope:

- SMTP credentials stored through the approved secret path.
- Sender address and domain alignment documented.
- Password reset email works for a controlled non-admin account.
- Account recovery runbook covers lost password, lost MFA, disabled account, and emergency admin access.
- Mail failure is observable through logs or a health check.

**Done when:** a user can complete password recovery without operator database edits, and the operator has a documented emergency path.

---

### T07 — Enable and harden package registry base

```task
id: RAIL-HO-WP-0005-T07
status: todo
priority: high
state_hub_task_id: "9578f672-e2b8-43a3-8419-5f86f8871326"
```

Enable Forgejo packages for Railiance's near-term build and deployment needs.

Initial package types:

- Container registry for State Hub and future app images.
- Generic packages for release artifacts.
- Additional package types only after the inventory proves they are needed.

Acceptance:

- Authenticated push and pull works from operator workstation and railiance01.
- Container image pull works from k3s deployments.
- Retention and cleanup expectations are documented.
- Package data is included in backup and restore drills.

**Done when:** `state-hub` or a probe image can be published to Forgejo and pulled by railiance01.

---

### T08 — Enable Forgejo Actions

```task
id: RAIL-HO-WP-0005-T08
status: todo
priority: high
state_hub_task_id: "f45f98c9-2f02-4224-bbfd-c2e1ec38581e"
```

Enable Forgejo Actions with a least-privilege runner model.

Minimum scope:

- Runner registration automated without committing runner tokens.
- Runner isolation model documented.
- Minimal workflows for lint/test/build on representative repositories.
- Workflow to build and publish a probe container image to Forgejo packages.
- Secret handling policy for Actions.
- Resource limits to avoid repeating previous single-node overload patterns.

**Done when:** a representative repository can run Forgejo Actions and publish a test artifact without privileged cluster-wide credentials.

---

### T09 — Implement Forgejo backup and restore automation

```task
id: RAIL-HO-WP-0005-T09
status: todo
priority: high
state_hub_task_id: "25892007-36ca-4bd9-8adf-84d505465d7d"
```

Create backup automation for all Forgejo state.

Must cover:

- PostgreSQL database.
- Git repositories.
- Attachments.
- LFS.
- Packages.
- Avatars and app data.
- Actions logs/artifacts if retained.
- App configuration required for restore.

Acceptance:

- Scheduled backups run without manual intervention.
- Backups are encrypted or stored in an approved protected target.
- Restore into an isolated namespace is drilled and documented.
- RPO/RTO expectations are recorded.

**Done when:** a fresh backup restores to a working isolated Forgejo instance with repository, package, and user recovery checks passing.
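
The T09 "Must cover" list lends itself to an automated coverage check after each backup run. The following is a minimal sketch, not the planned implementation: the `BackupManifest` shape, the component keys, and the `unprotected_target` marker are all hypothetical names chosen for illustration, and the sizes are assumed to come from whatever backup job is eventually built.

```python
from dataclasses import dataclass

# Components that T09 lists under "Must cover" (key names are illustrative).
REQUIRED_COMPONENTS = (
    "postgresql",
    "repositories",
    "attachments",
    "lfs",
    "packages",
    "avatars_app_data",
    "app_config",
)
# Only required when Actions logs/artifacts are retained.
CONDITIONAL_COMPONENTS = ("actions_logs_artifacts",)


@dataclass(frozen=True)
class BackupManifest:
    """Hypothetical summary of one backup run: component name -> size in bytes."""
    components: dict[str, int]
    protected: bool  # True if encrypted or stored in an approved protected target


def coverage_gaps(manifest: BackupManifest, actions_retained: bool = False) -> list[str]:
    """Return missing or empty components, plus a marker if the target is unprotected."""
    required = list(REQUIRED_COMPONENTS)
    if actions_retained:
        required.extend(CONDITIONAL_COMPONENTS)
    gaps = [name for name in required if manifest.components.get(name, 0) <= 0]
    if not manifest.protected:
        gaps.append("unprotected_target")
    return gaps


if __name__ == "__main__":
    # Example run with made-up sizes: only the database was captured.
    partial = BackupManifest(components={"postgresql": 1024}, protected=True)
    print(coverage_gaps(partial, actions_retained=True))
```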

---

### T10 — Drill Gitea to Forgejo migration

```task
id: RAIL-HO-WP-0005-T10
status: todo
priority: high
state_hub_task_id: "6befde73-00bc-4643-be0b-a7ce7944e75f"
```

Run a non-production migration drill from Gitea to Forgejo.

Minimum checks:

- Git history and default branches preserved.
- Issues, labels, milestones, releases, wiki, and attachments handled per inventory classification.
- SSH/HTTPS clone and push paths work.
- Existing local remotes can be transformed predictably.
- State Hub registered repo remotes can be updated safely.
- Rollback plan is rehearsed.

**Done when:** a sample migration has a written result matrix and no unknown critical migration gaps remain.

---

### T11 — Production cutover from Gitea to Forgejo

```task
id: RAIL-HO-WP-0005-T11
status: todo
priority: high
needs_human: true
state_hub_task_id: "b1b66687-ca33-4971-b312-743c8e059c5e"
```

Execute the production migration only after the probe, backup restore, package registry, email recovery, and Actions gates pass.

Cutover sequence:

1. Announce freeze window.
2. Take final Gitea backup and verify it exists.
3. Freeze Gitea writes.
4. Migrate repositories and metadata to Forgejo.
5. Validate critical repositories and package pulls.
6. Update State Hub repo remotes and host paths as needed.
7. Update local and railiance01 remotes.
8. Keep Gitea read-only as rollback until the stabilization window passes.

**Done when:** all Railiance/Custodian repos use Forgejo as primary, Gitea is read-only fallback, and rollback instructions are documented.

---

### T12 — Retire or archive legacy Gitea

```task
id: RAIL-HO-WP-0005-T12
status: todo
priority: medium
needs_human: true
state_hub_task_id: "a63147b0-31d5-4705-89ea-40c10faf779f"
```

Retire legacy Gitea only after a stabilization period and explicit approval.

Minimum scope:

- Confirm no active remotes, webhooks, packages, or dashboards depend on Gitea.
- Preserve final Gitea backup.
- Update runbooks and dashboards from Gitea to Forgejo.
- Remove or archive Gitea Helm release according to the rollback decision.
- Close stale State Hub references to `railiance-bootstrap` if confirmed as an alias rather than a real repo.

**Done when:** Forgejo is the only active source forge and package base, with legacy Gitea either archived or intentionally retained as documented fallback.

## Phasing and Dependencies

```
T01 inventory ─┬─► T02 decisions ─┬─► T03 probe ─┬─► T04 platform
               │                  │              ├─► T05 app
               │                  │              ├─► T06 mail recovery
               │                  │              ├─► T07 packages
               │                  │              ├─► T08 actions
               │                  │              └─► T09 backups
               └────────────────────────────────────► T10 migration drill

T03-T10 all pass ─► T11 production cutover ─► T12 legacy Gitea retirement
```

Recommended first slice: T01, T02, T03.

Do not start T11 until T06, T07, T08, T09, and T10 are complete.

## railiance-bootstrap Note

State Hub currently registers both `railiance-bootstrap` and `railiance-cluster`, but they point to the same local path (`/home/worsch/railiance-cluster`) and the same git fingerprint. The `railiance-bootstrap` entry has no remote URL. The earlier restructure workplan (`RAIL-HO-WP-0003-T03`) says `railiance-bootstrap` was renamed to `railiance-cluster`.

Working assumption: `railiance-bootstrap` is a stale logical alias or leftover repo goal, not a separate Gitea repository. This workplan should not create a new Forgejo repository named `railiance-bootstrap` unless a concrete remaining purpose is identified.

## References

- `RAIL-HO-WP-0004-production-readiness.md`
- `RAIL-HO-WP-0003-5repo-stack-restructure.md`
- `CUST-WP-0014-repo-sync-automation.md`
- `CUST-WP-0021-multi-host-repo-paths.md`
- `ops/incidents/2026-03-25-gitea-pgpool-crashloop.md`
- `ops/incidents/2026-03-26-coulombcore-runaway-agent-overload.md`
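
As an illustration of the gating rule in "Phasing and Dependencies" (do not start T11 until T06, T07, T08, T09, and T10 are complete), the check could be expressed as a small function over task statuses. This is a sketch under assumptions: the workplan only shows the status values `todo` and `in_progress`, so `done` as the completion value is assumed here, and `cutover_blockers` is a hypothetical name.

```python
# Gate tasks that must be complete before T11, per "Phasing and Dependencies".
CUTOVER_GATES = ("T06", "T07", "T08", "T09", "T10")

# Assumption: a finished task carries the status "done"; the workplan itself
# only shows "todo" and "in_progress".
DONE_STATUS = "done"


def cutover_blockers(statuses: dict[str, str]) -> list[str]:
    """Return gate tasks that are not done; tasks missing from the map also block."""
    return [task for task in CUTOVER_GATES if statuses.get(task) != DONE_STATUS]


if __name__ == "__main__":
    # Statuses as listed in this workplan at the time of writing.
    current = {
        "T01": "in_progress",
        "T06": "todo",
        "T07": "todo",
        "T08": "todo",
        "T09": "todo",
        "T10": "todo",
    }
    blockers = cutover_blockers(current)
    print("T11 may start" if not blockers else f"T11 blocked by: {', '.join(blockers)}")
```

Treating a missing task as a blocker is a deliberate choice in this sketch, so that an incomplete status export cannot open the cutover gate by accident.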