Documents the three-machine role model, fleet mesh topology, coulombcore freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel install assets and refreshes ops service inventory to reflect 2026-07-03 production placement (cluster State Hub, fleet mesh, draining coulombcore).
CoulombCore Drain and Production Placement Plan
Date: 2026-07-03
Workplan: CUST-WP-0054-T03
Freeze policy: canon/standards/coulombcore-production-freeze_v0.1.md
Architecture: docs/workstation-independence-fleet-architecture.md
Purpose
This plan gives the ordered drain sequence for every production workload on coulombcore
(92.205.130.254, coulombcore-k3s). Each row of the placement register names the current placement,
target placement, migration method, owner workplan, wave, and status; prerequisites are listed per wave.
Coupling rule: forge and State Hub move early; identity + OpenBao move last because everything authenticates through them.
Wave overview
Wave 0 Freeze policy (this document + canon) — effective 2026-07-03
Wave 1 Source forge + CI runners ─────────── RAIL-HO-WP-0005 / CUST-WP-0054-T04
Wave 2 State Hub primary + sweep checkouts ── CUST-WP-0054-T05 / CUST-WP-0011
Wave 3 Core Hub production ────────────────── CORE-WP-0005
Wave 4 issue-core ─────────────────────────── ISSUE-WP-0003 + overlay
Wave 5 GitOps control plane (ESO, ArgoCD) ─── railiance-cluster overlays
Wave 6 Application stragglers ─────────────── per-app overlays
Wave 7 OpenBao + identity stack ───────────── NET-WP-0020 + key-cape (LAST)
Wave 8 coulombcore phoenix → railiance02 ─── CUST-WP-0054-T09
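The wave order can be checked mechanically against the prerequisites stated in the per-wave detail. The following is a minimal sketch, assuming Python 3.9+ for `graphlib`. The `PREREQS` mapping is transcribed from the Prereq and Gate lines of this plan, and the helper name `order_respects_prereqs` is hypothetical.

```python
from graphlib import TopologicalSorter

# Wave -> waves that must complete first, as stated in this plan.
# Waves 6 and 7 list no explicit prerequisite; Wave 7 is last by policy.
PREREQS = {
    1: set(),
    2: {1},
    3: {1},
    4: {1},
    5: {1, 2},
    6: set(),
    7: set(),
    8: {7},
}

def order_respects_prereqs(order, prereqs):
    """Return True if every wave appears after all of its prerequisites."""
    position = {wave: i for i, wave in enumerate(order)}
    return all(
        position[dep] < position[wave]
        for wave, deps in prereqs.items()
        for dep in deps
    )

# The numbered sequence is one valid ordering of the dependency graph.
numbered_order = list(range(1, 9))
assert order_respects_prereqs(numbered_order, PREREQS)

# TopologicalSorter raises CycleError if the prerequisites contradict each other.
derived_order = list(TopologicalSorter(PREREQS).static_order())
assert order_respects_prereqs(derived_order, PREREQS)
```

The graph only encodes hard prerequisites. The policy that identity and OpenBao move last is a coupling rule, so it is not derivable from the graph alone.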
Placement register
| # | Workload | Current (2026-07-03) | Target | Method | Owner | Wave | Status |
|---|---|---|---|---|---|---|---|
| 1 | Gitea + OCI registry | coulombcore-k3s default; gitea.coulomb.social | railiance01 forgejo.coulomb.social | Staged-promotion S5 overlay; RAIL-HO-WP-0005 probe → production; Gitea → read-only mirror | RAIL-HO-WP-0005, CUST-WP-0054-T04 | 1 | grandfathered |
| 2 | Forgejo Actions / CI runners | none (workstation manual build) | railiance01 | New S5 overlay; image build on tag push | CUST-WP-0054-T04 | 1 | planned |
| 3 | Gitea DB + PVC | coulombcore databases / gitea-shared-storage | railiance01 CNPG + PVC | Migrate with Forgejo; backup/restore drill required | RAIL-HO-WP-0005 | 1 | grandfathered |
| 4 | State Hub API (primary) | coulombcore CNPG state-hub-db; cluster Svc 10.43.170.94:8000 | railiance01 CNPG + Deployment | CUST-WP-0011-T07 playbook: freeze → exact-count restore → rewire; staged-promotion overlay | CUST-WP-0054-T05, CUST-WP-0011 | 2 | grandfathered |
| 5 | State Hub sweep checkouts | workstation /home/worsch/* (74 repos) | railiance01 clone tree from forge | Relocate host_paths / local_path; no workstation writeback | CUST-WP-0054-T05, STATE-WP-0064 | 2 | planned |
| 6 | WSL2 State Hub fallback | workstation WSL2 | retired | Stop after railiance01 primary stabilizes | CUST-WP-0011-T08/T09, CUST-WP-0054-T10 | 2 | grandfathered |
| 7 | Core Hub | coulombcore core-hub-staging; public hub.coulomb.social | railiance01 | Staged-promotion overlay; dual-run prerequisite (CORE-WP-0005-T04) | CORE-WP-0005 | 3 | grandfathered |
| 8 | Inter-Hub (Haskell) | coulombcore external | retired | Rollback-only after Core Hub cutover | CORE-WP-0007 | 3 | grandfathered |
| 9 | issue-core | coulombcore issue-core ns; ClusterIP 10.43.103.154:8765 | railiance01 | Staged-promotion overlay; shorten fleet tunnel to local svc | ISSUE-WP-0003, CUST-WP-0054-T03 | 4 | grandfathered |
| 10 | issue-core CNPG | coulombcore | railiance01 | Migrate with issue-core workload | railiance-platform | 4 | grandfathered |
| 11 | External Secrets Operator | coulombcore | railiance01 | GitOps follows forge; ESO stores point at railiance01 OpenBao post-Wave 7 or interim coulombcore path documented | railiance-platform | 5 | grandfathered |
| 12 | ArgoCD | coulombcore (boundary: should be S4) | railiance01 | Staged-promotion; repoint repo URLs to Forgejo | railiance-cluster | 5 | grandfathered |
| 13 | llm-connect | railiance01 activity-core ns (partial) | railiance01 | Already on target machine; complete in-cluster profile | CCR-2026-0003 lane | 6 | observed |
| 14 | activity-core | railiance01 activity-core ns | railiance01 (retain) | No move; update sinks (T06) and hub URL post-Wave 2 | — | — | on target |
| 15 | Temporal / NATS | railiance01 | railiance01 (retain) | Co-located with activity-core | — | — | on target |
| 16 | ops-hub evidence / widgets | files + Core Hub path | railiance01 via Core Hub | Follows Core Hub; not coulombcore-blocking | CUST-WP-0025, CUST-WP-0049 | 6 | planned |
| 17 | artifact-store / MinIO lane | assessment only | railiance01 or compatible endpoint | Compatibility-profile per ARTIFACT-STORE-WP-0007 | ARTIFACT-STORE-WP-0007 | 6 | planned |
| 18 | OpenBao | coulombcore | railiance01 | Last infrastructure wave; NET-WP-0020 unseal automation; CNPG + seal migration | NET-WP-0020, railiance-platform | 7 | grandfathered |
| 19 | KeyCape | coulombcore | railiance01 | Follows OpenBao; OIDC/MFA paths | key-cape | 7 | grandfathered |
| 20 | Authelia | coulombcore | railiance01 | Identity front door | key-cape / railiance-platform | 7 | grandfathered |
| 21 | privacyIDEA | coulombcore | railiance01 | MFA backend | key-cape | 7 | grandfathered |
| 22 | lldap | coulombcore | railiance01 | LDAP directory | key-cape / railiance-platform | 7 | grandfathered |
| 23 | flex-auth | coulombcore | railiance01 | Policy registry follows identity | flex-auth | 7 | grandfathered |
| 24 | Fleet mesh transit tunnels | railiance01 systemd → coulombcore ClusterIPs | railiance01-local services | Retire when Waves 2+4 complete (hub + issue-core local) | CUST-WP-0054-T02 | 2–4 | interim active |
| 25 | CNPG operator | coulombcore (boundary note) | railiance01 | Platform operator moves with Wave 2+ workloads | railiance-platform | 2–7 | grandfathered |
| 26 | coulombcore host identity | coulombcore | railiance02 | Machine phoenix after Wave 7 | CUST-WP-0054-T09, CUST-WP-0054-T08 | 8 | wait |
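
The register lends itself to a small typed structure that can be grouped by wave or filtered by status. The sketch below is illustrative only: the `Placement` class, its field names, and the helpers are hypothetical, and the rows are a short excerpt transcribed from the table rather than the full register.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    number: int
    workload: str
    target: str
    wave: str    # kept as text because rows 24 and 25 span several waves
    status: str

# Excerpt transcribed from the register; not the full table.
REGISTER = [
    Placement(1, "Gitea + OCI registry", "railiance01", "1", "grandfathered"),
    Placement(4, "State Hub API (primary)", "railiance01", "2", "grandfathered"),
    Placement(5, "State Hub sweep checkouts", "railiance01", "2", "planned"),
    Placement(18, "OpenBao", "railiance01", "7", "grandfathered"),
    Placement(26, "coulombcore host identity", "railiance02", "8", "wait"),
]

def by_wave(rows):
    """Group workload names by their wave label, preserving row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.wave].append(row.workload)
    return dict(grouped)

def with_status(rows, status):
    """Return the register numbers of all rows with the given status."""
    return [row.number for row in rows if row.status == status]

print(by_wave(REGISTER)["2"])
print(with_status(REGISTER, "grandfathered"))
```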
Per-wave detail
Wave 1 — Source forge + CI (unblocks repos and images)
Goal: All repos and container images publish from railiance01; coulombcore Gitea becomes read-only mirror.
| Step | Action | Done when |
|---|---|---|
| 1.1 | Resolve RAIL-HO-WP-0005-T02 production decisions (hostname decided: forgejo.coulomb.social; SMTP, runners, backup still open) | docs/forgejo-production-decisions.md |
| 1.2 | Disposable Forgejo probe namespace + restore drill | Backup/restore evidence id recorded |
| 1.3 | Production Forgejo cutover | All 74 repo remotes point at Forgejo; push/pull verified |
| 1.4 | Actions runners for state-hub, core-hub, activity-core, issue-core | Tag-triggered image lands in forge OCI |
| 1.5 | Gitea → read-only mirror on coulombcore | Rollback window documented; no new writes |
Blocks: Wave 2 sweep checkouts (needs forge clones on railiance01).
Wave 2 — State Hub home on railiance01
Goal: Automation loop machine-local; consistency sweeps write back to railiance01 checkouts, not workstation paths.
| Step | Action | Done when |
|---|---|---|
| 2.1 | CNPG + storage review on railiance01 | Platform sign-off |
| 2.2 | CUST-WP-0011-T07 cutover to railiance01 primary | Row counts match; 127.0.0.1:8000 serves railiance01 hub |
| 2.3 | Clone/register 74 repos on railiance01 from Forgejo | fix-consistency writebacks use railiance01 paths |
| 2.4 | Retire fleet tunnel fleet-state-hub-coulombcore | activity-core reaches hub without coulombcore hop |
| 2.5 | WSL2 fallback retirement (optional, after stabilization) | CUST-WP-0011-T08/T09 |
Prereq: Wave 1 forge (clone source).
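The "row counts match" criterion in step 2.2 amounts to comparing per-table counts taken on the source after the write freeze with counts taken on the restored target. A minimal sketch follows, assuming the counts have already been collected into dictionaries. The function name, the table names, and all numbers other than the 74 repos are hypothetical and do not describe the actual State Hub schema.

```python
def count_mismatches(source_counts, target_counts):
    """Return {table: (source, target)} for every table whose counts differ.

    A table missing on either side is reported with None for that side.
    """
    tables = sorted(set(source_counts) | set(target_counts))
    return {
        table: (source_counts.get(table), target_counts.get(table))
        for table in tables
        if source_counts.get(table) != target_counts.get(table)
    }

# Hypothetical counts; table names are illustrative only.
source = {"repos": 74, "workplans": 120, "events": 5000}
target = {"repos": 74, "workplans": 119}

mismatches = count_mismatches(source, target)
print(mismatches)

# An exact-count restore passes only when nothing is reported.
restore_ok = not mismatches
```

An empty result is the only passing outcome. Reporting missing tables explicitly avoids treating an absent table as a zero-row match.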
Wave 3 — Core Hub production
Goal: hub.coulomb.social served from railiance01 Core Hub.
| Step | Action | Done when |
|---|---|---|
| 3.1 | Close CORE-WP-0005-T04 prerequisites (widget types, auth posture) | Catalog gap resolved |
| 3.2 | Operator-approved cutover with rollback plan | Deployed smoke + activity-core sink green |
| 3.3 | Inter-Hub marked rollback-only | CORE-WP-0007 unblocks |
Prereq: Wave 1 (images via forge CI).
Wave 4 — issue-core
Goal: Emission path is railiance01-local; no coulombcore ClusterIP in path.
| Step | Action | Done when |
|---|---|---|
| 4.1 | Staged-promotion overlay on railiance01 | ArgoCD sync healthy |
| 4.2 | Migrate CNPG + secrets | ExternalSecret Ready |
| 4.3 | Point ISSUE_CORE_URL at in-cluster svc | Retire fleet-issue-core-coulombcore tunnel |
| 4.4 | Safe emission smoke | HTTP 201 + Gitea/Forgejo issue created |
Prereq: Wave 1 (image + gitops); credential lane CCR-2026-0002 active.
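The Wave 4 goal (no coulombcore ClusterIP in the emission path) can be spot-checked on a configured URL. The sketch below assumes the two coulombcore ClusterIPs listed in the placement register. The function name and the example in-cluster service hostname are hypothetical. It detects only direct ClusterIP references, not a localhost port that is forwarded through a fleet tunnel.

```python
from urllib.parse import urlsplit

# coulombcore ClusterIPs named in the placement register (rows 4 and 9).
COULOMBCORE_CLUSTER_IPS = {"10.43.170.94", "10.43.103.154"}

def routes_via_coulombcore(url):
    """Return True if the URL's host is a known coulombcore ClusterIP."""
    host = urlsplit(url).hostname
    return host in COULOMBCORE_CLUSTER_IPS

# Current issue-core address from the register.
print(routes_via_coulombcore("http://10.43.103.154:8765"))

# Hypothetical in-cluster service name on railiance01.
print(routes_via_coulombcore("http://issue-core.issue-core.svc:8765"))
```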
Wave 5 — GitOps control plane
Goal: ArgoCD and ESO run on railiance01 and track Forgejo repos.
| Step | Action | Done when |
|---|---|---|
| 5.1 | ArgoCD overlay on railiance01 | Sync from Forgejo remotes |
| 5.2 | ESO → SecretStore paths updated | Workloads on railiance01 pull secrets |
| 5.3 | Decommission coulombcore ArgoCD Applications | No new syncs to coulombcore-k3s |
Prereq: Waves 1–2 (forge URLs, hub coordination).
Wave 6 — Application stragglers
Low-coupling apps and evidence lanes that do not block earlier waves:
- llm-connect production profile completion
- ops-hub widget evidence via Core Hub
- artifact-store compatibility endpoint (if approved)
Each uses staged-promotion unless listed under Documented exceptions.
Wave 7 — OpenBao + identity (LAST)
Goal: Authentication and secret custody off coulombcore.
| Step | Action | Done when |
|---|---|---|
| 7.1 | OpenBao staged-promotion to railiance01 | Unseal automation (NET-WP-0020) proven |
| 7.2 | KeyCape / Authelia / privacyIDEA / lldap migration | OIDC login smoke on railiance01 |
| 7.3 | flex-auth registry points at new identity endpoints | Credential lanes re-pointed |
| 7.4 | CCR/applier paths verified | No production secret reads from coulombcore OpenBao |
Gate: CUST-WP-0054-T09 cannot start until Wave 7 completes.
Wave 8 — Phoenix to railiance02
Execute CUST-WP-0054-T09 via T08 automation: wipe coulombcore, rebuild it as
railiance02, and join it to the fleet. Include a DNS/cert plan for the remaining *.coulomb.social names.
Documented exceptions
| Workload | Reason | Target date | Rollback | Approval |
|---|---|---|---|---|
| Fleet mesh systemd tunnels | Wave 2/4 not complete; railiance01 reaches coulombcore ClusterIPs | Until Waves 2+4 done | Re-enable workstation reverse tunnels per docs/fleet-mesh-dehub-runbook.md | CUST-WP-0054-T02 cutover 2026-07-03 |
| Core Hub staging on coulombcore | Pre-cutover smoke environment | Until Wave 3 cutover | Keep staging namespace | CORE-WP-0005 |
| Static id_ops SSH key on railiance01 fleet units | atm-fleet-mesh cert_command blocked on VAULT_TOKEN | Until warden sign available | ops-bridge or rotated key | CUST-WP-0054-T02 interim |
No other exceptions as of 2026-07-03. New exceptions require a State Hub decision or workplan amendment.
Staged-promotion method (default)
Per RAIL-BS-WP-0006 (finished):
- railiance/<app>/app.toml + overlay in owning repo
- Stage 1 deploy → observe → promote with evidence
- Backup/restore drill before production promotion
- Rollback revision documented
Apps without overlays yet must get an overlay scaffold before Wave execution.
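The checklist above can be read as a promotion precondition: an app is promotable only when every item has evidence. The following is a minimal sketch under that reading, not the RAIL-BS-WP-0006 implementation. The `PromotionRecord` class, its field names, and the example evidence id are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PromotionRecord:
    """Evidence collected for one app; all field names are hypothetical."""
    app: str
    has_overlay: bool = False
    stage1_evidence_id: str = ""
    restore_drill_evidence_id: str = ""
    rollback_revision: str = ""

def missing_for_promotion(record: PromotionRecord) -> list[str]:
    """List the checklist items that still block production promotion."""
    checks = [
        (record.has_overlay, "app.toml + overlay in owning repo"),
        (bool(record.stage1_evidence_id), "Stage 1 observation evidence"),
        (bool(record.restore_drill_evidence_id), "backup/restore drill"),
        (bool(record.rollback_revision), "rollback revision"),
    ]
    return [label for ok, label in checks if not ok]

# Example: overlay and Stage 1 evidence exist, the drill and rollback do not.
record = PromotionRecord(app="issue-core", has_overlay=True, stage1_evidence_id="ev-001")
print(missing_for_promotion(record))
```

An empty list means all four items are satisfied; anything else names what to close before the wave executes.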
Inventory sync
ops/service-inventory.yml updated 2026-07-03 for:
- coulombcore lifecycle_state: draining on grandfathered production services
- State Hub primary on coulombcore cluster (not workstation)
- railiance01 fleet-mesh and activity-core placement
- ops-bridge on railiance01 via systemd (not workstation hub)
Regenerate catalog view: make ops-inventory-view
Human gates (not agent-executable)
| Gate | Owner | Blocks |
|---|---|---|
| Forgejo T02 production decisions | operator | Wave 1 |
| State Hub railiance01 cutover approval | operator; CUST-WP-0011-T07 | Wave 2 |
| Core Hub production cutover | operator; CORE-WP-0005-T04 | Wave 3 |
| OpenBao/identity migration approval | operator + custody | Wave 7 |
| coulombcore phoenix approval | operator | Wave 8 |
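
Because these gates need human approval, an agent can at most report which waves remain blocked. A minimal sketch, assuming approvals are tracked as a set of gate names: the `HUMAN_GATES` mapping is transcribed from the table above, while the function name and the way approvals are recorded are hypothetical.

```python
# Gate name -> wave it blocks, transcribed from the human gates table.
HUMAN_GATES = {
    "Forgejo T02 production decisions": 1,
    "State Hub railiance01 cutover approval": 2,
    "Core Hub production cutover": 3,
    "OpenBao/identity migration approval": 7,
    "coulombcore phoenix approval": 8,
}

def blocked_waves(approved_gates):
    """Return the sorted waves whose human gate has not been approved."""
    return sorted(
        wave for gate, wave in HUMAN_GATES.items() if gate not in approved_gates
    )

# Example: only the Forgejo decisions have been signed off.
print(blocked_waves({"Forgejo T02 production decisions"}))
```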