Adapt RAIL-HO-WP-0005 for production Forgejo and staged repo ladder
Reflects live railiance01 deploy, cancels isolated probe T03 in favor of in-production pilots, marks T08/T10 progress (forgejo-actions-probe, glas-harness), and documents tier 0-3 migration sequencing before state-hub.
@@ -204,3 +204,25 @@ lost or left with an untracked remote.
 This first pass satisfies the public and infrastructure metadata part of T01.
 T01 should remain open until the authenticated admin inventory and missing repo
 classification are complete.
+
+## Addendum (2026-07-04) — migration ladder and new repos
+
+`RAIL-HO-WP-0005` now uses a **staged per-repo ladder** instead of an isolated
+probe namespace (T03 cancelled). Repos to add or re-classify on next inventory
+refresh:
+
+| Repo | On Gitea (2026-06) | On Forgejo (2026-07-04) | Tier | Notes |
+| --- | --- | --- | ---: | --- |
+| `forgejo-actions-probe` | — | yes | 0 | Disposable runner/OCI probe |
+| `glas-harness` | yes (not in table above) | yes (canonical) | 1 | Git+SSH+CI pilot; see `the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md` |
+
+**Tier definitions** (for per-repo `migration tier` column in a future refresh):
+
+| Tier | Criteria | Examples |
+| ---: | --- | --- |
+| 0 | Disposable integration probes | `forgejo-actions-probe` |
+| 1 | Non-production; git+CI only | `glas-harness` |
+| 2 | Non-production with container image + registry pull | TBD (`key-cape` candidate) |
+| 3 | Production drain wave / sweep registration | `state-hub`, `issue-core`, … |
+
+Production repos stay on Gitea until tier 0–2 gates and T09 backup drill pass.
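
The tier criteria above can be read as an ordered decision. Production status overrides everything else. A disposable probe stays at tier 0 even if it builds images. Publishing an image is what separates tier 2 from tier 1. The Python sketch below shows that ordering. It assumes the inventory refresh records three booleans per repo. `RepoTraits` and its fields are hypothetical, not an existing inventory schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoTraits:
    """Hypothetical per-repo facts an inventory refresh could record."""

    name: str
    production: bool    # in a production drain wave / sweep registration
    disposable: bool    # integration probe that can be thrown away
    builds_image: bool  # publishes a container image pulled from the registry


def migration_tier(repo: RepoTraits) -> int:
    """Map the tier criteria onto 0-3, checking the strongest criterion first."""
    if repo.production:
        return 3  # production drain wave / sweep registration
    if repo.disposable:
        return 0  # disposable integration probe, even if it builds images
    if repo.builds_image:
        return 2  # non-production with container image + registry pull
    return 1      # non-production; git+CI only


if __name__ == "__main__":
    for repo in (
        RepoTraits("forgejo-actions-probe", production=False, disposable=True, builds_image=True),
        RepoTraits("glas-harness", production=False, disposable=False, builds_image=False),
        RepoTraits("state-hub", production=True, disposable=False, builds_image=True),
    ):
        print(f"{repo.name}: tier {migration_tier(repo)}")
```

Checking `disposable` before `builds_image` is deliberate: `forgejo-actions-probe` builds an OCI image but is still tier 0.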
@@ -8,7 +8,7 @@ status: active
 owner: railiance
 topic_slug: railiance
 created: "2026-05-03"
-updated: "2026-06-04"
+updated: "2026-07-04"
 state_hub_workstream_id: "84e17675-0d15-4268-a8bd-540124d37018"
 ---

@@ -24,6 +24,13 @@ Forgejo will become the heart of Railiance infrastructure. The work must be
 fully automated, backup-backed, recovery-drilled, and suitable for long-lived
 operation on railiance01 before any production cutover happens.

+**Sequencing update (2026-07-04):** Production Forgejo is live on railiance01
+with Gitea still canonical per the safety contract. Repo cutover proceeds
+**staged per-repo** using a migration ladder (disposable probes → non-production
+pilots → image-capable pilots → production repos). `state-hub` is last. See
+`CUST-WP-0054-T04` and
+`the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md`.
+
 ## Placement in the Railiance Tooling Set

 This workplan lives in `railiance-infra` because it is the cross-layer
@@ -48,7 +55,7 @@ change is made there.

 1. ~~Public/private hostname for Forgejo~~ **DECIDED 2026-07-03:**
 `forgejo.coulomb.social` → railiance01 (`92.205.62.239`). DNS active;
-Traefik edge live; Forgejo workload not deployed yet (404). Gitea remains
+Traefik edge live; Forgejo workload deployed and serving HTTPS. Gitea remains
 canonical until migration drills pass. Record:
 `the-custodian/docs/forgejo-production-decisions.md`.
 2. Mail delivery path for password reset and account recovery
@@ -60,8 +67,9 @@ change is made there.
 host runner retired after cutover.
 5. Backup destination and retention target for database, repositories,
 attachments, LFS, Actions artifacts/logs, and package data.
-6. Cutover mode: freeze-and-migrate all repos in one window, or staged
-project-by-project transition.
+6. Cutover mode: ~~freeze-all vs staged~~ **LEANING staged per-repo (2026-07-04)**
+based on `glas-harness` pilot; operator confirmation still needed. Freeze-all
+remains fallback for final production wave if drift risk is unacceptable.

 ## Safety Contract

@@ -80,23 +88,30 @@ change is made there.
 repo. No plaintext SMTP passwords, admin tokens, runner tokens, or registry
 credentials in Git.

-## Probe Strategy
+## Probe and pilot strategy (revised 2026-07-04)

-A `forgejo-railiance-probe` is reasonable and should be treated as a disposable
-S5/S4 integration probe, not as the production install.
+Original T03 planned a **disposable isolated-namespace probe** before any
+production install. That path was **superseded**: production Forgejo deployed on
+railiance01 under the safety contract (Gitea remains canonical; no Gitea deletes).

-The probe should prove:
+Integration evidence now comes from **in-production probes and repo pilots**:

-- Helm values and cnpg database wiring converge cleanly.
-- Initial admin bootstrap is automated and repeatable.
-- SMTP/password reset works end-to-end.
-- Package registry endpoints work for the package types Railiance needs first.
-- Forgejo Actions can run a minimal workflow and publish a test package.
-- Backup and restore works in an isolated namespace.
-- Migration from a sample Gitea repo preserves git history, issues, releases,
-wiki, LFS or attachments where applicable.
+| Tier | Repo | Purpose | Status |
+| --- | --- | --- | --- |
+| 0 | `coulomb/forgejo-actions-probe` | Runner scheduling, DinD, OCI image-build | **done** |
+| 1 | `coulomb/glas-harness` | Non-production git+SSH+CI routing drill | **done** |
+| 2 | TBD (small lib with image, e.g. `key-cape`) | Image-build workflow + registry pull on railiance01 | **next** |
+| 3 | Production set (`state-hub`, `issue-core`, …) | Canonical remotes, sweep paths, deploy loops | **gated** |

-The probe is destroyed or explicitly archived after production Forgejo is live.
+Each tier must pass before the next. T03 (isolated probe namespace) is cancelled;
+acceptance criteria below are tracked across T05, T07, T08, and T10 instead.
+
+Still to prove before T11:
+
+- SMTP/password reset end-to-end (T06).
+- Backup and restore in isolated namespace (T09).
+- Issues/releases/wiki/LFS per inventory classification (T10 matrix).
+- Operator SSH identity on Forgejo beyond interim `forgejo_admin` keys (T02/T10).

 ## Target Architecture

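
The rule "each tier must pass before the next" can be checked mechanically against a status table like the tier table in this hunk. Below is a minimal sketch. It assumes each tier's status is tracked as `done`, `next` or `gated`. The `Status` enum and both helpers are illustrative, not part of any existing tooling.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    DONE = "done"
    NEXT = "next"
    GATED = "gated"


def next_open_tier(statuses: dict[int, Status]) -> int | None:
    """Return the lowest tier that has not passed, or None if every tier passed.

    Raises ValueError when a higher tier is marked done while a lower one is
    not, since that would break "each tier must pass before the next".
    """
    tiers = sorted(statuses)
    first_open = next((t for t in tiers if statuses[t] is not Status.DONE), None)
    if first_open is None:
        return None
    skipped_ahead = [t for t in tiers if t > first_open and statuses[t] is Status.DONE]
    if skipped_ahead:
        raise ValueError(f"tiers {skipped_ahead} passed before tier {first_open}")
    return first_open


def may_start(tier: int, statuses: dict[int, Status]) -> bool:
    """A tier may start only when every lower tier has passed."""
    return all(statuses[t] is Status.DONE for t in statuses if t < tier)


if __name__ == "__main__":
    ladder = {0: Status.DONE, 1: Status.DONE, 2: Status.NEXT, 3: Status.GATED}
    print(next_open_tier(ladder), may_start(3, ladder))
```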
@@ -141,6 +156,10 @@ Minimum inventory:
 Forgejo before cutover and classifies each migration item as automatic,
 manual, unsupported, or explicitly out of scope.

+**Gap (2026-07-04):** first-pass inventory predates repos created after
+2026-06-04 (e.g. `glas-harness`, `forgejo-actions-probe`). Refresh org repo
+list and add a **migration tier** column (0–3) per repo before T11.
+
 ---

 ### T02 — Resolve Forgejo production design decisions
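
For the inventory gap, the org repo list can be pulled from the Gitea-compatible REST API that both Gitea and Forgejo expose: `GET /api/v1/orgs/{org}/repos`, paged with `page` and `limit`. The sketch below assumes a token with read access. Several names are placeholders:

- the base URL `https://forge.example`
- the `FORGE_TOKEN` environment variable
- the empty inventory set, which stands in for however the inventory document's existing repo names are loaded

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable, Iterator

Fetch = Callable[[str], Any]


def http_get_json(url: str, token: str | None = None) -> Any:
    """GET a JSON document, optionally sending a Gitea/Forgejo access token."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    if token:
        req.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_org_repos(base_url: str, org: str, fetch: Fetch = http_get_json,
                   limit: int = 50) -> Iterator[dict]:
    """Yield every repository object in `org`, page by page.

    Stops only on an empty page, since the server may cap `limit` below the
    requested value and a short page is therefore not proof of the last page.
    """
    page = 1
    while True:
        url = f"{base_url.rstrip('/')}/api/v1/orgs/{org}/repos?page={page}&limit={limit}"
        batch = fetch(url)
        if not batch:
            return
        yield from batch
        page += 1


def missing_from_inventory(repo_names: list[str], inventory: set[str]) -> list[str]:
    """Repos that exist on the forge but are absent from the inventory."""
    return sorted(set(repo_names) - inventory)


if __name__ == "__main__":
    import functools
    import os

    fetch = functools.partial(http_get_json, token=os.environ.get("FORGE_TOKEN"))
    names = [r["name"] for r in iter_org_repos("https://forge.example", "coulomb", fetch)]
    print(missing_from_inventory(names, inventory=set()))  # placeholder inventory
```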
@@ -155,8 +174,10 @@ state_hub_task_id: "f88115bf-4f99-49ef-a415-0b23750141b3"

 Decide the production choices listed in "Key Decisions to Confirm".

-**Partial (2026-07-03):** hostname and in-cluster runner model decided (`ADR-004`).
-Remaining: SMTP, package scope, backup, cutover mode. See
+**Partial (2026-07-04):** hostname, exposure, deployment pattern, live deploy,
+and in-cluster runner model decided (`ADR-004`). Cutover mode **leaning** staged
+per-repo (glas-harness pilot). Remaining operator decisions: SMTP, package scope
+beyond OCI, backup target, final cutover confirmation. See
 `the-custodian/docs/forgejo-production-decisions.md`.

 Expected output:
@@ -174,36 +195,21 @@ choices.

 ---

-### T03 — Build forgejo-railiance-probe
+### T03 — Build forgejo-railiance-probe (isolated namespace)

 ```task
 id: RAIL-HO-WP-0005-T03
-status: todo
+status: cancel
 priority: high
 state_hub_task_id: "b516018a-415e-4a58-8c62-07c14ece9353"
 ```

-Create a disposable probe environment for Forgejo before touching production.
-
-Expected repo ownership:
-
-- `railiance-platform`: probe cnpg database and storage dependencies.
-- `railiance-apps`: probe Forgejo Helm values and namespace.
-- `railiance-enablement`: probe Actions runner template and workflows.
-
-Probe acceptance:
-
-- `make forgejo-probe-deploy` or equivalent converges from a clean cluster
-state.
-- Admin bootstrap is automated.
-- A test user can reset a password via email.
-- A test repository can be created, cloned, pushed, and protected.
-- A test package can be published and pulled.
-- A test Forgejo Actions workflow runs successfully.
-- A probe backup restores into an isolated namespace.
-
-**Done when:** the probe demonstrates the whole lifecycle without manual
-cluster surgery.
+**Cancelled 2026-07-04:** superseded by production Forgejo on railiance01 (T05)
+plus in-production integration probes (`forgejo-actions-probe`, `glas-harness`).
+Isolated-namespace probe added latency without reducing risk given the safety
+contract (Gitea canonical, no deletes). Remaining T03 acceptance items map to:
+T05 (deploy), T06 (mail), T07 (packages), T08 (Actions), T09 (backup restore),
+T10 (repo migration drill).

 ---

@@ -227,6 +233,11 @@ Minimum scope:
 packages, Actions artifacts, and logs.
 - Restore runbook for database and blob/package data.

+**Partial (2026-07-04):** `forgejo-db` CNPG cluster healthy on railiance01
+(`make forgejo-db-status` → Cluster in healthy state). SOPS secret path and
+network policies in `railiance-platform`. Remaining: backup/WAL archiving to
+approved target, blob/package storage restore drill (feeds T09).
+
 **Done when:** platform dependencies can be deployed and restored without the
 Forgejo app running.

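
This diff does not show how the `forgejo-db-status` target is implemented. One way to reproduce its health check is to read the `.status.phase` field of the CloudNativePG `Cluster` resource. CNPG sets that field to `Cluster in healthy state` when the cluster is healthy. The sketch below assumes `kubectl` access to railiance01. The namespace `forgejo` is a placeholder. `run` is injectable, so the check can be exercised without a cluster.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

# status.phase that CloudNativePG reports for a healthy Cluster resource.
HEALTHY_PHASE = "Cluster in healthy state"

Runner = Callable[[Sequence[str]], str]


def run_stdout(argv: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout


def cnpg_phase_argv(cluster: str, namespace: str) -> list[str]:
    """kubectl invocation that prints only the Cluster's status.phase."""
    return ["kubectl", "get", "clusters.postgresql.cnpg.io", cluster,
            "-n", namespace, "-o", "jsonpath={.status.phase}"]


def cluster_is_healthy(cluster: str, namespace: str, run: Runner = run_stdout) -> bool:
    """True when the CNPG Cluster reports the healthy phase."""
    return run(cnpg_phase_argv(cluster, namespace)).strip() == HEALTHY_PHASE


if __name__ == "__main__":
    # "forgejo" is a placeholder namespace; use the one railiance-platform deploys into.
    print(cluster_is_healthy("forgejo-db", "forgejo"))
```

This only checks cluster health. It says nothing about WAL archiving or restore, which remain T04/T09 work.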
@@ -252,9 +263,11 @@ Minimum scope:
 - Health/status targets in the Makefile.
 - Migration-safe configuration for coexistence with Gitea during the cutover.

-**Partial (2026-07-03):** `railiance-apps` deploy live — HTTPS smoke pass, Actions
-enabled, `coulomb` org + probe workflow success. Remaining: SOPS secrets,
-SMTP, Docker on runner host for image builds, migration drills.
+**Partial (2026-07-04):** `railiance-apps` deploy live — HTTPS smoke pass,
+ingress + TLS, SSH NodePort `30022`, Actions enabled, `coulomb` org,
+`railiance01-build-01` runner (ADR-004). Git push/pull via HTTPS and
+`forgejo-remote` SSH proven. Remaining: SOPS hardening for all secrets,
+SMTP (T06), operator user accounts beyond `forgejo_admin`.

 **Done when:** Forgejo runs on railiance01 against production platform
 services and can serve login, git clone/push, package registry, and admin
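
Using `forgejo-remote` SSH over NodePort `30022` implies a host alias in `~/.ssh/config`. The actual entry is not shown here. The sketch below renders one plausible block under three assumptions:

- the NodePort is reachable on the host that `forgejo.coulomb.social` resolves to
- Forgejo's SSH user is `git`
- the identity file path in the usage line is hypothetical

```python
from __future__ import annotations


def ssh_host_block(alias: str, hostname: str, port: int, user: str = "git",
                   identity_file: str | None = None) -> str:
    """Render an ~/.ssh/config Host block for a git SSH endpoint on a NodePort."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    Port {port}",
        f"    User {user}",
    ]
    if identity_file:
        # Pin the key so ssh does not offer every agent identity first.
        lines += [f"    IdentityFile {identity_file}", "    IdentitiesOnly yes"]
    return "\n".join(lines) + "\n"


def scp_style_remote(alias: str, owner: str, repo: str) -> str:
    """Remote URL that routes through the Host alias, e.g. alias:owner/repo.git."""
    return f"{alias}:{owner}/{repo}.git"


if __name__ == "__main__":
    print(ssh_host_block("forgejo-remote", "forgejo.coulomb.social", 30022,
                         identity_file="~/.ssh/id_ed25519_forgejo"))
    print(scp_style_remote("forgejo-remote", "coulomb", "glas-harness"))
```

Routing through an alias keeps the non-standard port out of every remote URL. Only the config block has to change if the NodePort moves.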
@@ -312,8 +325,13 @@ Acceptance:
 - Retention and cleanup expectations are documented.
 - Package data is included in backup and restore drills.

-**Done when:** `state-hub` or a probe image can be published to Forgejo and
-pulled by railiance01.
+**Partial (2026-07-04):** OCI registry live (`/v2/` auth challenge). Probe image
+`forgejo.coulomb.social/coulomb/forgejo-actions-probe` built and pushed via
+Actions. Remaining: publish and pull a **tier-2 pilot** app image (not yet
+`state-hub`); document retention; include packages in backup drill (T09).
+
+**Done when:** a tier-2 pilot image (or `state-hub` after explicit approval) can
+be published to Forgejo and pulled by railiance01 k3s.

 ---

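
The registry check ("`/v2/` auth challenge") relies on standard registry token-auth behaviour. An anonymous `GET /v2/` is answered with `401` and a `WWW-Authenticate: Bearer ...` header that names the token endpoint. The minimal Python sketch below fetches and parses that challenge. Any sample header used to exercise the parser is illustrative, not captured from railiance01.

```python
from __future__ import annotations

import re
import urllib.error
import urllib.request

# key="value" pairs inside a challenge, e.g. realm="...",service="...".
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header: str) -> dict[str, str]:
    """Split a `Bearer k="v",...` WWW-Authenticate value into its parameters."""
    scheme, _, params = header.strip().partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unexpected auth scheme: {scheme!r}")
    return dict(_PARAM.findall(params))


def fetch_v2_challenge(registry: str) -> dict[str, str]:
    """Anonymously GET /v2/ and return the parsed challenge from the 401 reply."""
    try:
        with urllib.request.urlopen(f"https://{registry}/v2/"):
            pass
    except urllib.error.HTTPError as err:
        challenge = err.headers.get("WWW-Authenticate")
        if err.code == 401 and challenge:
            return parse_bearer_challenge(challenge)
        raise
    raise RuntimeError("registry answered /v2/ without an auth challenge")


if __name__ == "__main__":
    print(fetch_v2_challenge("forgejo.coulomb.social"))
```

The returned `realm` is where a client such as k3s's containerd would request a pull token. A pull test for the tier-2 pilot therefore also exercises that endpoint.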
@@ -321,7 +339,7 @@ pulled by railiance01.

 ```task
 id: RAIL-HO-WP-0005-T08
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "f45f98c9-2f02-4224-bbfd-c2e1ec38581e"
 ```
@@ -337,8 +355,16 @@ Minimum scope:
 - Secret handling policy for Actions.
 - Resource limits to avoid repeating previous single-node overload patterns.

-**Done when:** a representative repository can run Forgejo Actions and publish
-a test artifact without privileged cluster-wide credentials.
+**Partial (2026-07-04):** in-cluster runner live (`railiance-apps/manifests/
+forgejo-runner.yaml`, ADR-004). Proven workflows: `forgejo-actions-probe`
+(image-build), `glas-harness` (host+container CI smoke). Org secrets
+`REGISTRY_USER`/`REGISTRY_TOKEN` set. Documented constraints: host runner is
+non-root (static docker-cli, no `apk add`); `actions/checkout@v4` fails — use
+`git clone` in job. Remaining: reusable workflow templates in
+`railiance-enablement` (S4); resource limits review; no cluster-admin on runner.
+
+**Done when:** tier-2 pilot repo runs Forgejo Actions end-to-end and publishes
+a pullable image without privileged cluster-wide credentials.

 ---

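
`actions/checkout@v4` fails on the non-root host runner, so jobs clone with plain `git`. The sketch below builds that clone step. It rests on two assumptions, which should be verified on `railiance01-build-01` before relying on it:

- the runner exposes GitHub-compatible variables (`GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, `GITHUB_SHA`)
- Forgejo accepts an access token as the Basic-auth password over HTTPS

The user/token pair would come from workflow secrets. `CI_USER` and `CI_TOKEN` are hypothetical names.

```python
from __future__ import annotations

import base64
from typing import Mapping


def clone_commands(env: Mapping[str, str], dest: str = "src",
                   user: str | None = None, token: str | None = None) -> list[list[str]]:
    """Build argv lists that clone the job's repo and check out the triggering commit.

    Credentials, if given, go into a one-off `-c http.extraHeader`, so they are
    not written to the clone's .git/config.
    """
    url = f"{env['GITHUB_SERVER_URL'].rstrip('/')}/{env['GITHUB_REPOSITORY']}.git"
    clone = ["git"]
    if token:
        if not user:
            raise ValueError("a username is required alongside the token")
        basic = base64.b64encode(f"{user}:{token}".encode()).decode()
        clone += ["-c", f"http.extraHeader=Authorization: Basic {basic}"]
    clone += ["clone", url, dest]
    # A full clone fetches all branches, so a commit on any branch is reachable.
    checkout = ["git", "-C", dest, "checkout", "--detach", env["GITHUB_SHA"]]
    return [clone, checkout]


if __name__ == "__main__":
    import os
    import subprocess

    # CI_USER / CI_TOKEN are hypothetical names for workflow-provided secrets.
    for argv in clone_commands(os.environ, user=os.environ.get("CI_USER"),
                               token=os.environ.get("CI_TOKEN")):
        subprocess.run(argv, check=True)
```

If the job image has no Python, the two argv lists map one-to-one onto shell commands. A plain clone does not fetch pull-request refs, so pull-request-triggered jobs would need an extra fetch.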
@@ -376,29 +402,38 @@ with repository, package, and user recovery checks passing.

 ---

-### T10 — Drill Gitea to Forgejo migration
+### T10 — Drill Gitea to Forgejo migration (staged ladder)

 ```task
 id: RAIL-HO-WP-0005-T10
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "6befde73-00bc-4643-be0b-a7ce7944e75f"
 ```

-Run a non-production migration drill from Gitea to Forgejo.
+Run staged migration drills from Gitea to Forgejo before production repos move.

-Minimum checks:
+**Tier 1 complete (2026-07-04):** `glas-harness` — git history preserved,
+`origin` on Forgejo, `gitea` legacy remote retained, SSH+HTTPS push, CI smoke
+green. Result matrix:
+`the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md`.
+
+Minimum checks (per tier):

 - Git history and default branches preserved.
 - Issues, labels, milestones, releases, wiki, and attachments handled per
-inventory classification.
-- SSH/HTTPS clone and push paths work.
-- Existing local remotes can be transformed predictably.
-- State Hub registered repo remotes can be updated safely.
-- Rollback plan is rehearsed.
+inventory classification (N/A for tier-1 git-only repos).
+- SSH/HTTPS clone and push paths work (`forgejo-remote` in `~/.ssh/config`).
+- Existing local remotes can be transformed predictably (`origin`/`gitea` split).
+- State Hub registered repo remotes can be updated safely (deferred for tier-1).
+- Rollback plan is rehearsed (Gitea copy unchanged).

-**Done when:** a sample migration has a written result matrix and no unknown
-critical migration gaps remain.
+**Next:** tier-2 repo with container image + `.gitea/workflows` port to
+`.forgejo/workflows`. **Not ready:** `state-hub` until hub-core build context
+template and sweep `remote_url` playbook exist.
+
+**Done when:** tiers 0–2 pass with written result matrices and no unknown
+critical migration gaps remain for production repos.

 ---

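
The `origin`/`gitea` split used for `glas-harness` can be expressed as a small plan that is safe to re-run. The sketch below assumes `remotes` is parsed from `git remote -v` and that the default branch is `main`. The `gitea-remote:` URL in the usage line is a hypothetical alias.

```python
from __future__ import annotations


def plan_remote_split(remotes: dict[str, str], forgejo_url: str,
                      branch: str = "main") -> list[list[str]]:
    """Plan git commands that make Forgejo `origin` and keep Gitea as `gitea`.

    Returns an empty plan when `origin` already points at Forgejo, so the
    plan is safe to re-run.
    """
    if remotes.get("origin") == forgejo_url:
        return []
    if "origin" not in remotes:
        raise ValueError("no origin remote to migrate")
    if "gitea" in remotes:
        raise ValueError("a 'gitea' remote already exists; resolve by hand")
    return [
        # Rename keeps the Gitea URL; branch upstreams now point at gitea/<branch>.
        ["git", "remote", "rename", "origin", "gitea"],
        ["git", "remote", "add", "origin", forgejo_url],
        ["git", "fetch", "origin"],
        # Re-point the working branch at Forgejo after the rename moved it.
        ["git", "branch", f"--set-upstream-to=origin/{branch}", branch],
    ]


if __name__ == "__main__":
    # Hypothetical aliases; substitute the real Gitea and Forgejo remotes.
    plan = plan_remote_split({"origin": "gitea-remote:coulomb/glas-harness.git"},
                             "forgejo-remote:coulomb/glas-harness.git")
    for argv in plan:
        print(" ".join(argv))
```

`git remote rename` also rewrites branch upstream settings to the renamed remote. That is why the plan re-points the branch at the new `origin` after fetching. Gitea itself is never touched, which keeps the rollback path intact.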
@@ -412,19 +447,21 @@ needs_human: true
 state_hub_task_id: "b1b66687-ca33-4971-b312-743c8e059c5e"
 ```

-Execute the production migration only after the probe, backup restore, package
-registry, email recovery, and Actions gates pass.
+Execute production migration only after T06, T07, T08, T09, and T10 tier 0–2
+gates pass. `state-hub` and other Wave-1 production repos require explicit
+operator approval per `CUST-WP-0054` drain sequence.

-Cutover sequence:
+**Preferred cutover (staged per-repo):**

-1. Announce freeze window.
-2. Take final Gitea backup and verify it exists.
-3. Freeze Gitea writes.
-4. Migrate repositories and metadata to Forgejo.
-5. Validate critical repositories and package pulls.
-6. Update State Hub repo remotes and host paths as needed.
-7. Update local and railiance01 remotes.
-8. Keep Gitea read-only as rollback until the stabilization window passes.
+1. Per repo: Gitea backup snapshot (or org-wide before each wave).
+2. Mirror git to Forgejo; switch workstation `origin` to `forgejo-remote`.
+3. Port/verify Actions workflows on Forgejo runner.
+4. Update State Hub `remote_url` and railiance01 sweep checkouts when promoted.
+5. Mark Gitea repo read-only (org policy); do not delete.
+6. Repeat until production set complete.

+**Freeze-all fallback:** single window if staged drift is unacceptable — same
+steps but all repos in one maintenance period.
+
 **Done when:** all Railiance/Custodian repos use Forgejo as primary, Gitea is
 read-only fallback, and rollback instructions are documented.
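
The staged per-repo cutover is essentially a gated loop:

1. Refuse to start until T06 to T10 have passed.
2. Require explicit approval for every repo in the wave.
3. Run steps 1 to 5 in order for each repo.
4. Stop at the first failure, so the rest of the wave stays on Gitea.

The sketch below uses hypothetical step names. `step` stands in for the real operations, and the T10 "tier 0–2" gate is simplified to a single boolean.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Mapping

REQUIRED_GATES = ("T06", "T07", "T08", "T09", "T10")

# Hypothetical names for steps 1-5 of the staged cutover; step 6 is the loop.
# No step deletes anything on Gitea, matching the safety contract.
STEPS = (
    "gitea_backup_snapshot",
    "mirror_and_switch_origin",
    "verify_forgejo_actions",
    "update_state_hub_remote_url",
    "mark_gitea_read_only",
)


@dataclass
class WaveResult:
    completed: list[tuple[str, str]] = field(default_factory=list)
    stopped_at: tuple[str, str] | None = None


def run_wave(repos: Iterable[str], gates: Mapping[str, bool], approved: set[str],
             step: Callable[[str, str], bool]) -> WaveResult:
    """Run the per-repo steps for one wave, stopping at the first failed step."""
    missing = [g for g in REQUIRED_GATES if not gates.get(g, False)]
    if missing:
        raise RuntimeError(f"cutover gates not passed: {missing}")
    repos = list(repos)
    unapproved = [r for r in repos if r not in approved]
    if unapproved:
        raise PermissionError(f"no explicit operator approval for: {unapproved}")
    result = WaveResult()
    for repo in repos:
        for name in STEPS:
            if not step(repo, name):
                result.stopped_at = (repo, name)  # later repos stay on Gitea
                return result
            result.completed.append((repo, name))
    return result


if __name__ == "__main__":
    def dry_step(repo: str, name: str) -> bool:
        print(f"would run {name} for {repo}")
        return True

    gates = dict.fromkeys(REQUIRED_GATES, True)
    print(run_wave(["state-hub"], gates, {"state-hub"}, dry_step).stopped_at)
```

Approvals are checked before any step runs. A missing approval therefore fails the wave up front rather than leaving it half-migrated.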
@@ -458,19 +495,28 @@ legacy Gitea either archived or intentionally retained as documented fallback.
 ## Phasing and Dependencies

 ```
-T01 inventory ─┬─► T02 decisions ─┬─► T03 probe ─┬─► T04 platform
-               │                  │              ├─► T05 app
-               │                  │              ├─► T06 mail recovery
-               │                  │              ├─► T07 packages
-               │                  │              ├─► T08 actions
-               │                  │              └─► T09 backups
-               └────────────────────────────────────► T10 migration drill
+T01 inventory ──► T02 decisions ──┬──► T04 platform (forgejo-db ✓ partial)
+                                  ├──► T05 app (live ✓ partial)
+                                  ├──► T06 mail recovery
+                                  ├──► T07 packages (OCI probe ✓ partial)
+                                  ├──► T08 actions (runner ✓ partial)
+                                  └──► T09 backups

-T03-T10 all pass ─► T11 production cutover ─► T12 legacy Gitea retirement
+T05+T08 ──► T10 migration ladder ──► T11 production cutover ──► T12 Gitea retire
+            tier0 probe ✓
+            tier1 glas-harness ✓
+            tier2 image repo (next)
+            tier3 production (gated)
+
+T03 isolated probe: CANCELLED (superseded by T05 + in-production pilots)
 ```

-Recommended first slice: T01, T02, T03. Do not start T11 until T06, T07, T08,
-T09, and T10 are complete.
+**Current focus (2026-07-04):** T10 tier-2 image pilot; parallel T09 backup
+drill and T02 open decisions (SMTP, backup target). Do not start T11
+`state-hub` until T09 complete and `CUST-WP-0054` Wave-1 gates satisfied.
+
+**Absorbed by `CUST-WP-0054-T04`:** forge + CI on railiance01; workstation
+build retirement; staged repo promotion before State Hub primary move (T05).

 ## railiance-bootstrap Note

@@ -490,7 +536,14 @@ purpose is identified.

 - `RAIL-HO-WP-0004-production-readiness.md`
 - `RAIL-HO-WP-0003-5repo-stack-restructure.md`
+- `CUST-WP-0054-workstation-independence-and-fleet-realignment.md` (T04 forge+CI)
 - `CUST-WP-0014-repo-sync-automation.md`
 - `CUST-WP-0021-multi-host-repo-paths.md`
+- `docs/adr/ADR-004-forgejo-in-cluster-actions-runner.md`
+- `docs/forgejo-migration-inventory.md`
+- `the-custodian/docs/forgejo-production-decisions.md`
+- `the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md`
+- `railiance-apps/docs/forgejo-on-railiance01.md`
+- `railiance-forge/docs/forgejo-actions-runner-substrate.md`
 - `ops/incidents/2026-03-25-gitea-pgpool-crashloop.md`
 - `ops/incidents/2026-03-26-coulombcore-runaway-agent-overload.md`