---
id: RAIL-HO-WP-0005
type: workplan
title: "Forgejo Production Migration on railiance01"
domain: railiance
repo: railiance-infra
status: active
owner: railiance
topic_slug: railiance
created: "2026-05-03"
updated: "2026-06-04"
state_hub_workstream_id: "84e17675-0d15-4268-a8bd-540124d37018"
---

# Forgejo Production Migration on railiance01

## Goal

Establish Forgejo as the production-grade source forge and package base for
Railiance, then migrate all repositories and workflows currently relying on
Gitea to the new Forgejo installation.

Forgejo will become the heart of Railiance infrastructure. The work must be
fully automated, backup-backed, recovery-drilled, and suitable for long-lived
operation on railiance01 before any production cutover happens.

## Placement in the Railiance Tooling Set

This workplan lives in `railiance-infra` because it is the cross-layer
production infrastructure coordination plan and belongs next to
`RAIL-HO-WP-0004-production-readiness.md`.

Implementation must respect the OAS repo boundaries:

| Concern | Repo | Layer |
|---------|------|-------|
| Server prerequisites, inventory, OS packages, SSH/system users | `railiance-infra` | S1 |
| k3s runtime prerequisites, namespaces, ingress class, cluster backup hooks | `railiance-cluster` | S2 |
| PostgreSQL, object storage, backup targets, registry storage dependencies | `railiance-platform` | S3 |
| Forgejo Actions runner templates, CI conventions, migration automation | `railiance-enablement` | S4 |
| Forgejo Helm release, app config, mail config, package registry, app backups | `railiance-apps` | S5 |

This file is the umbrella plan. If an implementation step requires files in a
different repo, that repo should receive its own workplan or task before the
change is made there.

## Key Decisions to Confirm

1. Public/private hostname for Forgejo and whether Gitea remains reachable
   during the transition.
2. Mail delivery path for password reset and account recovery
   (SMTP relay, sender domain, SPF/DKIM/DMARC expectations).
3. Package registry scope: container images only at first, or also generic,
   npm, PyPI, Go, Maven, and Helm packages.
4. Actions runner model: in-cluster ephemeral runners, long-lived runner pod,
   or isolated host runner.
5. Backup destination and retention target for database, repositories,
   attachments, LFS, Actions artifacts/logs, and package data.
6. Cutover mode: freeze-and-migrate all repos in one window, or staged
   project-by-project transition.

## Safety Contract

- Gitea remains the production source of truth until Forgejo restore and
  migration drills pass.
- No repository is deleted from Gitea during this workplan.
- A fresh Gitea backup must be taken before every migration drill and before
  final cutover.
- Forgejo backups must be restored into an isolated namespace before accepting
  production use.
- Password reset and email recovery must be verified with a real controlled
  account before onboarding users.
- Forgejo Actions may not receive broad cluster credentials by default; runner
  permissions must be least-privilege and repo-scoped where practical.
- Secrets stay in SOPS/age or Kubernetes Secrets managed by the appropriate
  repo. No plaintext SMTP passwords, admin tokens, runner tokens, or registry
  credentials in Git.

## Probe Strategy

A `forgejo-railiance-probe` is reasonable and should be treated as a disposable
S5/S4 integration probe, not as the production install.

The probe should prove:

- Helm values and cnpg database wiring converge cleanly.
- Initial admin bootstrap is automated and repeatable.
- SMTP/password reset works end-to-end.
- Package registry endpoints work for the package types Railiance needs first.
- Forgejo Actions can run a minimal workflow and publish a test package.
- Backup and restore work in an isolated namespace.
- Migration from a sample Gitea repo preserves Git history, issues, releases,
  wiki, and LFS or attachments where applicable.

The probe is destroyed or explicitly archived after production Forgejo is live.

## Target Architecture

```
operator / agents / developers
-> private HTTPS endpoint
-> railiance01 ingress
-> forgejo Service in forgejo namespace
-> Forgejo Deployment/StatefulSet
-> forgejo-db CloudNative PG Cluster in databases namespace
-> Valkey/cache if required
-> persistent storage for repositories, attachments, LFS, packages
-> Actions runner(s) with restricted execution scope
-> backup jobs to the approved backup target
```

## Tasks

### T01 — Inventory current Gitea functionality and migration requirements

```task
id: RAIL-HO-WP-0005-T01
status: in_progress
priority: high
state_hub_task_id: "cf59d171-5629-45c9-9d44-8d6499827ffc"
```

Create a source-of-truth inventory of current Gitea usage.

First-pass inventory artifact: `docs/forgejo-migration-inventory.md`.

Minimum inventory:

- All repositories in the `coulomb` organization.
- Registered vs unregistered State Hub repos.
- Users, organizations, teams, deploy keys, SSH keys, access tokens.
- Issues, labels, milestones, releases, wiki, packages, LFS, attachments.
- Existing webhook usage and automation assumptions.
- Current Gitea package registry status and the missing `[packages]` config
  that is blocking container image publication.

**Done when:** the inventory identifies every feature that must work in
Forgejo before cutover and classifies each migration item as automatic,
manual, unsupported, or explicitly out of scope.

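
The classification rule in the done criterion can be checked mechanically. The following is a minimal sketch, not the inventory's actual format: the `InventoryItem` type, its field names, and the sample entries are hypothetical; only the four classification values come from this task.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Migration(Enum):
    """The four classifications named in the T01 done criterion."""
    AUTOMATIC = "automatic"
    MANUAL = "manual"
    UNSUPPORTED = "unsupported"
    OUT_OF_SCOPE = "out of scope"


@dataclass
class InventoryItem:
    # Hypothetical record shape; the real inventory lives in
    # docs/forgejo-migration-inventory.md.
    name: str
    classification: Optional[Migration] = None


def unclassified(items: list[InventoryItem]) -> list[str]:
    """Return the names of items that still lack a classification."""
    return [item.name for item in items if item.classification is None]


# Illustrative entries only, not real inventory results.
sample = [
    InventoryItem("git history", Migration.AUTOMATIC),
    InventoryItem("deploy keys", Migration.MANUAL),
    InventoryItem("webhooks"),
]

print(unclassified(sample))  # ['webhooks']
```
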

---

### T02 — Resolve Forgejo production design decisions

```task
id: RAIL-HO-WP-0005-T02
status: todo
priority: high
needs_human: true
state_hub_task_id: "f88115bf-4f99-49ef-a415-0b23750141b3"
```

Decide the production choices listed in "Key Decisions to Confirm".

Expected output:

- A short decision record in this workplan or a dedicated ADR.
- Hostname and exposure model.
- SMTP provider and sender identity.
- Package registry scope.
- Actions runner isolation model.
- Backup target, retention, encryption, and restore cadence.
- Cutover strategy and rollback window.

**Done when:** implementation tasks are no longer blocked by open production
choices.

---

### T03 — Build forgejo-railiance-probe

```task
id: RAIL-HO-WP-0005-T03
status: todo
priority: high
state_hub_task_id: "b516018a-415e-4a58-8c62-07c14ece9353"
```

Create a disposable probe environment for Forgejo before touching production.

Expected repo ownership:

- `railiance-platform`: probe cnpg database and storage dependencies.
- `railiance-apps`: probe Forgejo Helm values and namespace.
- `railiance-enablement`: probe Actions runner template and workflows.

Probe acceptance:

- `make forgejo-probe-deploy` or equivalent converges from a clean cluster
  state.
- Admin bootstrap is automated.
- A test user can reset a password via email.
- A test repository can be created, cloned, pushed, and protected.
- A test package can be published and pulled.
- A test Forgejo Actions workflow runs successfully.
- A probe backup restores into an isolated namespace.

**Done when:** the probe demonstrates the whole lifecycle without manual
cluster surgery.

---

### T04 — Define Forgejo platform services

```task
id: RAIL-HO-WP-0005-T04
status: todo
priority: high
state_hub_task_id: "28b351fe-bfbe-4a8b-bbfa-1b148e69f8e0"
```

In `railiance-platform`, define production platform services for Forgejo.

Minimum scope:

- `forgejo-db` CloudNative PG cluster.
- Database credentials via SOPS-managed Secret or approved secret flow.
- Backup configuration for database base backups and WAL archiving.
- Object storage or persistent volume plan for repositories, attachments, LFS,
  packages, Actions artifacts, and logs.
- Restore runbook for database and blob/package data.

**Done when:** platform dependencies can be deployed and restored without the
Forgejo app running.

---

### T05 — Define production Forgejo application deployment

```task
id: RAIL-HO-WP-0005-T05
status: todo
priority: high
state_hub_task_id: "11540ba4-d31c-4f64-836b-c6de69107aa4"
```

In `railiance-apps`, create the production Forgejo deployment.

Minimum scope:

- Forgejo Helm release or manifests in the S5 boundary.
- App configuration for database, SSH, HTTPS, mailer, packages, LFS, and
  security settings.
- Initial admin/user bootstrap that is automated but does not commit secrets.
- Health/status targets in the Makefile.
- Migration-safe configuration for coexistence with Gitea during the cutover.

**Done when:** Forgejo runs on railiance01 against production platform
services and can serve login, git clone/push, package registry, and admin
operations.

---

### T06 — Implement usable email recovery cycle

```task
id: RAIL-HO-WP-0005-T06
status: todo
priority: high
needs_human: true
state_hub_task_id: "417faa4d-eab8-4247-9485-4f80e5d5b7ff"
```

Configure and test mail delivery for account recovery.

Minimum scope:

- SMTP credentials stored through the approved secret path.
- Sender address and domain alignment documented.
- Password reset email works for a controlled non-admin account.
- Account recovery runbook covers lost password, lost MFA, disabled account,
  and emergency admin access.
- Mail failure is observable through logs or a health check.

**Done when:** a user can complete password recovery without operator database
edits, and the operator has a documented emergency path.

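
Before testing the Forgejo reset flow itself, the SMTP relay leg can be checked in isolation. The sketch below assumes a relay that accepts STARTTLS with username/password login; the function names, subject line, and addresses are hypothetical, and credentials are expected to come from the approved secret path rather than from source code. It does not exercise Forgejo's own password reset, which still has to be triggered for the controlled account.

```python
import smtplib
from email.message import EmailMessage


def build_probe_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test message for the controlled account."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Forgejo mail delivery probe"
    msg.set_content("If this arrives, the SMTP relay path works.")
    return msg


def send_probe(host: str, port: int, user: str, password: str,
               msg: EmailMessage) -> None:
    """Send the message over STARTTLS; any failure raises an exception."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


# Placeholder addresses; nothing is sent until send_probe() is called.
message = build_probe_message("forgejo@example.org", "probe-user@example.org")
```
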

---

### T07 — Enable and harden package registry base

```task
id: RAIL-HO-WP-0005-T07
status: todo
priority: high
state_hub_task_id: "9578f672-e2b8-43a3-8419-5f86f8871326"
```

Enable Forgejo packages for Railiance's near-term build and deployment needs.

Initial package types:

- Container registry for State Hub and future app images.
- Generic packages for release artifacts.
- Additional package types only after the inventory proves they are needed.

Acceptance:

- Authenticated push and pull work from operator workstation and railiance01.
- Container image pull works from k3s deployments.
- Retention and cleanup expectations are documented.
- Package data is included in backup and restore drills.

**Done when:** `state-hub` or a probe image can be published to Forgejo and
pulled by railiance01.

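
For the two initial package types, it helps to fix the naming scheme early so that workflows and k3s manifests agree on it. The helpers below are a sketch under assumptions: the hostname is a placeholder (the real one is a T02 decision), and the generic-package path layout is assumed from the upstream package registry documentation and should be verified against the deployed Forgejo version.

```python
def container_ref(registry_host: str, owner: str, image: str, tag: str) -> str:
    """Image reference in the usual host/owner/image:tag form."""
    return f"{registry_host}/{owner}/{image}:{tag}"


def generic_package_url(base_url: str, owner: str, package: str,
                        version: str, filename: str) -> str:
    """URL of one file in a generic package (assumed path layout)."""
    return (f"{base_url.rstrip('/')}/api/packages/{owner}/generic/"
            f"{package}/{version}/{filename}")


# Placeholder host; `coulomb` and `state-hub` are the names used in this plan.
HOST = "forgejo.example.internal"

print(container_ref(HOST, "coulomb", "state-hub", "probe"))
print(generic_package_url(f"https://{HOST}/", "coulomb", "state-hub",
                          "0.0.1", "state-hub.tar.gz"))
```
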

---

### T08 — Enable Forgejo Actions

```task
id: RAIL-HO-WP-0005-T08
status: todo
priority: high
state_hub_task_id: "f45f98c9-2f02-4224-bbfd-c2e1ec38581e"
```

Enable Forgejo Actions with a least-privilege runner model.

Minimum scope:

- Runner registration automated without committing runner tokens.
- Runner isolation model documented.
- Minimal workflows for lint/test/build on representative repositories.
- Workflow to build and publish a probe container image to Forgejo packages.
- Secret handling policy for Actions.
- Resource limits to avoid repeating previous single-node overload patterns.

**Done when:** a representative repository can run Forgejo Actions and publish
a test artifact without privileged cluster-wide credentials.

---

### T09 — Implement Forgejo backup and restore automation

```task
id: RAIL-HO-WP-0005-T09
status: todo
priority: high
state_hub_task_id: "25892007-36ca-4bd9-8adf-84d505465d7d"
```

Create backup automation for all Forgejo state.

Must cover:

- PostgreSQL database.
- Git repositories.
- Attachments.
- LFS.
- Packages.
- Avatars and app data.
- Actions logs/artifacts if retained.
- App configuration required for restore.

Acceptance:

- Scheduled backups run without manual intervention.
- Backups are encrypted or stored in an approved protected target.
- Restore into an isolated namespace is drilled and documented.
- RPO/RTO expectations are recorded.

**Done when:** a fresh backup restores to a working isolated Forgejo instance
with repository, package, and user recovery checks passing.

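
A restore drill is only meaningful if the backup actually contains every component in the "Must cover" list. One way to guard against silent gaps is a coverage check over a backup manifest. This is a sketch: the manifest shape (component key mapped to archive name), the component keys, and the file names are all hypothetical.

```python
# Hypothetical component keys, one per item in the "Must cover" list.
BASE_COMPONENTS = frozenset({
    "database", "repositories", "attachments", "lfs",
    "packages", "avatars-app-data", "app-config",
})


def missing_components(manifest: dict[str, str],
                       actions_retained: bool = True) -> list[str]:
    """Return required components that have no entry in the manifest."""
    required = set(BASE_COMPONENTS)
    if actions_retained:
        # Actions logs/artifacts are only required if they are retained.
        required.add("actions")
    return sorted(required - set(manifest))


# Illustrative manifest with deliberate gaps.
manifest = {"database": "db.dump", "repositories": "repos.tar", "lfs": "lfs.tar"}
print(missing_components(manifest, actions_retained=False))
```
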

---

### T10 — Drill Gitea to Forgejo migration

```task
id: RAIL-HO-WP-0005-T10
status: todo
priority: high
state_hub_task_id: "6befde73-00bc-4643-be0b-a7ce7944e75f"
```

Run a non-production migration drill from Gitea to Forgejo.

Minimum checks:

- Git history and default branches preserved.
- Issues, labels, milestones, releases, wiki, and attachments handled per
  inventory classification.
- SSH/HTTPS clone and push paths work.
- Existing local remotes can be transformed predictably.
- State Hub registered repo remotes can be updated safely.
- Rollback plan is rehearsed.

**Done when:** a sample migration has a written result matrix and no unknown
critical migration gaps remain.

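
"Transformed predictably" can be made concrete as a pure function from old remote URL to new remote URL, which is easy to test before touching any working copy. The sketch below handles the scp-like SSH form and URL forms; both hostnames are placeholders, and it assumes owner and repository paths stay unchanged across the migration.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_remote(url: str, old_host: str, new_host: str) -> str:
    """Point a Git remote at new_host; leave unrelated remotes untouched."""
    if "://" not in url:
        # scp-like SSH form: [user@]host:owner/repo.git
        user_host, sep, path = url.partition(":")
        user, at, host = user_host.rpartition("@")
        if sep and host == old_host:
            return f"{user}{at}{new_host}:{path}"
        return url
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url
    # Replace only the host, keeping any user info and port in place.
    netloc = parts.netloc.replace(old_host, new_host, 1)
    return urlunsplit(parts._replace(netloc=netloc))


OLD, NEW = "gitea.example.internal", "forgejo.example.internal"
print(rewrite_remote(f"git@{OLD}:coulomb/state-hub.git", OLD, NEW))
print(rewrite_remote(f"https://{OLD}/coulomb/state-hub.git", OLD, NEW))
```
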

---

### T11 — Production cutover from Gitea to Forgejo

```task
id: RAIL-HO-WP-0005-T11
status: todo
priority: high
needs_human: true
state_hub_task_id: "b1b66687-ca33-4971-b312-743c8e059c5e"
```

Execute the production migration only after the probe, backup restore, package
registry, email recovery, and Actions gates pass.

Cutover sequence:

1. Announce freeze window.
2. Take final Gitea backup and verify it exists.
3. Freeze Gitea writes.
4. Migrate repositories and metadata to Forgejo.
5. Validate critical repositories and package pulls.
6. Update State Hub repo remotes and host paths as needed.
7. Update local and railiance01 remotes.
8. Keep Gitea read-only as rollback until the stabilization window passes.

**Done when:** all Railiance/Custodian repos use Forgejo as primary, Gitea is
read-only fallback, and rollback instructions are documented.

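
Because each cutover step depends on the ones before it, any automation around the sequence should stop at the first failed step rather than continue. A minimal sketch of that control flow follows; the step actions are placeholders returning fixed values, and real ones would call the backup, migration, and validation tooling.

```python
from typing import Callable

# A step is a name plus an action that reports success or failure.
Step = tuple[str, Callable[[], bool]]


def run_cutover(steps: list[Step]) -> list[str]:
    """Run steps in order; stop at the first failure, return completed names."""
    completed: list[str] = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Placeholder actions; the fifth step simulates a failed validation.
steps: list[Step] = [
    ("announce freeze window", lambda: True),
    ("take and verify final Gitea backup", lambda: True),
    ("freeze Gitea writes", lambda: True),
    ("migrate repositories and metadata", lambda: True),
    ("validate critical repositories and package pulls", lambda: False),
    ("update State Hub repo remotes", lambda: True),
]

completed = run_cutover(steps)
print(completed[-1])  # migrate repositories and metadata
```
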

---

### T12 — Retire or archive legacy Gitea

```task
id: RAIL-HO-WP-0005-T12
status: todo
priority: medium
needs_human: true
state_hub_task_id: "a63147b0-31d5-4705-89ea-40c10faf779f"
```

Retire legacy Gitea only after a stabilization period and explicit approval.

Minimum scope:

- Confirm no active remotes, webhooks, packages, or dashboards depend on Gitea.
- Preserve final Gitea backup.
- Update runbooks and dashboards from Gitea to Forgejo.
- Remove or archive Gitea Helm release according to the rollback decision.
- Close stale State Hub references to `railiance-bootstrap` if confirmed as
  an alias rather than a real repo.

**Done when:** Forgejo is the only active source forge and package base, with
legacy Gitea either archived or intentionally retained as documented fallback.

## Phasing and Dependencies

```
T01 inventory ─┬─► T02 decisions ─┬─► T03 probe ─┬─► T04 platform
               │                  │              ├─► T05 app
               │                  │              ├─► T06 mail recovery
               │                  │              ├─► T07 packages
               │                  │              ├─► T08 actions
               │                  │              └─► T09 backups
               └────────────────────────────────────► T10 migration drill

T03-T10 all pass ─► T11 production cutover ─► T12 legacy Gitea retirement
```

Recommended first slice: T01, T02, T03. Do not start T11 until T06, T07, T08,
T09, and T10 are complete.
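
The dependency diagram can also be held as data, so that "what may start next" is computed rather than read off by eye. The sketch below encodes one reading of the diagram (T10 depending directly on T01, and T11 on all of T03 through T10); that reading is an interpretation of the diagram, and the helper name `ready` is hypothetical.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on, as read from the diagram.
DEPENDS_ON = {
    "T02": {"T01"},
    "T03": {"T02"},
    "T04": {"T03"},
    "T05": {"T03"},
    "T06": {"T03"},
    "T07": {"T03"},
    "T08": {"T03"},
    "T09": {"T03"},
    "T10": {"T01"},
    "T11": {"T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"},
    "T12": {"T11"},
}


def ready(done: set[str]) -> list[str]:
    """Tasks not yet done whose dependencies are all done."""
    tasks = set(DEPENDS_ON) | {"T01"}
    return sorted(t for t in tasks - done
                  if DEPENDS_ON.get(t, set()) <= done)


# One valid execution order; T01 comes first and T12 last.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(ready({"T01"}))  # ['T02', 'T10']
```

With T01 through T03 done, `ready` returns T04 through T10, while T11 stays blocked until every one of them has passed.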

## railiance-bootstrap Note

State Hub currently registers both `railiance-bootstrap` and
`railiance-cluster`, but they point to the same local path
(`/home/worsch/railiance-cluster`) and the same git fingerprint. The
`railiance-bootstrap` entry has no remote URL. The earlier restructure workplan
(`RAIL-HO-WP-0003-T03`) says `railiance-bootstrap` was renamed to
`railiance-cluster`.

Working assumption: `railiance-bootstrap` is a stale logical alias or leftover
repo goal, not a separate Gitea repository. This workplan should not create a
new Forgejo repository named `railiance-bootstrap` unless a concrete remaining
purpose is identified.
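
The same check that exposed this alias (identical local path and identical git fingerprint) can be run across the whole State Hub registry during the T01 inventory. The following is a sketch only: the record fields `name`, `path`, and `fingerprint` are hypothetical, not the State Hub schema, and the fingerprint values and the third path are placeholders.

```python
from collections import defaultdict


def aliased_entries(registry: list[dict]) -> list[list[str]]:
    """Group repo names that share both local path and git fingerprint."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for entry in registry:
        groups[(entry["path"], entry["fingerprint"])].append(entry["name"])
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Placeholder fingerprints; the shared path is the one cited in this note.
registry = [
    {"name": "railiance-cluster", "path": "/home/worsch/railiance-cluster",
     "fingerprint": "aaaa"},
    {"name": "railiance-bootstrap", "path": "/home/worsch/railiance-cluster",
     "fingerprint": "aaaa"},
    {"name": "railiance-infra", "path": "/srv/example/railiance-infra",
     "fingerprint": "bbbb"},
]

print(aliased_entries(registry))
```

Any group this returns is a candidate alias to resolve before repositories are created in Forgejo.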

## References

- `RAIL-HO-WP-0004-production-readiness.md`
- `RAIL-HO-WP-0003-5repo-stack-restructure.md`
- `CUST-WP-0014-repo-sync-automation.md`
- `CUST-WP-0021-multi-host-repo-paths.md`
- `ops/incidents/2026-03-25-gitea-pgpool-crashloop.md`
- `ops/incidents/2026-03-26-coulombcore-runaway-agent-overload.md`