Documents the three-machine role model, fleet mesh topology, coulombcore freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel install assets and refreshes ops service inventory to reflect 2026-07-03 production placement (cluster State Hub, fleet mesh, draining coulombcore).
# CoulombCore Drain and Production Placement Plan

Date: 2026-07-03
Workplan: `CUST-WP-0054-T03`
Freeze policy: `canon/standards/coulombcore-production-freeze_v0.1.md`
Architecture: `docs/workstation-independence-fleet-architecture.md`

## Purpose

This plan defines the ordered drain sequence for every production workload on
coulombcore (`92.205.130.254`, `coulombcore-k3s`). Each row of the placement
register names the current placement, target placement, migration method,
owner workplan, and prerequisites.

**Coupling rule:** forge and State Hub move early; identity + OpenBao move
last because everything authenticates through them.

## Wave overview

```
Wave 0 Freeze policy (this document + canon) — effective 2026-07-03
Wave 1 Source forge + CI runners ─────────── RAIL-HO-WP-0005 / CUST-WP-0054-T04
Wave 2 State Hub primary + sweep checkouts ── CUST-WP-0054-T05 / CUST-WP-0011
Wave 3 Core Hub production ────────────────── CORE-WP-0005
Wave 4 issue-core ─────────────────────────── ISSUE-WP-0003 + overlay
Wave 5 GitOps control plane (ESO, ArgoCD) ─── railiance-cluster overlays
Wave 6 Application stragglers ─────────────── per-app overlays
Wave 7 OpenBao + identity stack ───────────── NET-WP-0020 + key-cape (LAST)
Wave 8 coulombcore phoenix → railiance02 ─── CUST-WP-0054-T09
```

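The wave ordering can be checked mechanically. The sketch below is illustrative only: the edges for Waves 2, 3, 4, 5 and 8 follow the "Prereq" and "Gate" lines in the per-wave sections of this plan, while the edges for Waves 1, 6 and 7 are assumptions derived from the coupling rule (freeze first, identity last). The names `PREREQS`, `drain_order`, and `violations` are hypothetical and not part of any existing tooling.

```python
from graphlib import TopologicalSorter

# Wave -> waves that must complete first.
# Waves 2, 3, 4, 5, 8: taken from the Prereq/Gate lines in this plan.
# Waves 1, 6, 7: ASSUMED edges (freeze first; ops-hub evidence follows
# Core Hub; identity + OpenBao move after everything else).
PREREQS = {
    0: set(),
    1: {0},
    2: {1},
    3: {1},
    4: {1},
    5: {1, 2},
    6: {3},
    7: {2, 3, 4, 5, 6},
    8: {7},
}


def drain_order(prereqs):
    """Return one valid execution order (raises CycleError on a cycle)."""
    return list(TopologicalSorter(prereqs).static_order())


def violations(order, prereqs):
    """List (wave, prerequisite) pairs where a wave runs before its prerequisite."""
    position = {wave: i for i, wave in enumerate(order)}
    return [
        (wave, dep)
        for wave, deps in prereqs.items()
        for dep in deps
        if position[dep] > position[wave]
    ]


if __name__ == "__main__":
    order = drain_order(PREREQS)
    print("one valid order:", order)
    print("violations:", violations(order, PREREQS))
```

A proposed schedule that moves Wave 7 ahead of Waves 2 to 6 would show up in `violations` as one pair per skipped prerequisite.
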
## Placement register

| # | Workload | Current (2026-07-03) | Target | Method | Owner | Wave | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | **Gitea + OCI registry** | coulombcore-k3s `default`; `gitea.coulomb.social` | railiance01 **`forgejo.coulomb.social`** | Staged-promotion S5 overlay; `RAIL-HO-WP-0005` probe → production; Gitea → read-only mirror | `RAIL-HO-WP-0005`, `CUST-WP-0054-T04` | 1 | grandfathered |
| 2 | **Forgejo Actions / CI runners** | none (workstation manual build) | railiance01 | New S5 overlay; image build on tag push | `CUST-WP-0054-T04` | 1 | planned |
| 3 | **Gitea DB + PVC** | coulombcore `databases` / `gitea-shared-storage` | railiance01 CNPG + PVC | Migrate with Forgejo; backup/restore drill required | `RAIL-HO-WP-0005` | 1 | grandfathered |
| 4 | **State Hub API (primary)** | coulombcore CNPG `state-hub-db`; cluster Svc `10.43.170.94:8000` | railiance01 CNPG + Deployment | `CUST-WP-0011-T07` playbook: freeze → exact-count restore → rewire; staged-promotion overlay | `CUST-WP-0054-T05`, `CUST-WP-0011` | 2 | grandfathered |
| 5 | **State Hub sweep checkouts** | workstation `/home/worsch/*` (74 repos) | railiance01 clone tree from forge | Relocate `host_paths` / `local_path`; no workstation writeback | `CUST-WP-0054-T05`, `STATE-WP-0064` | 2 | planned |
| 6 | **WSL2 State Hub fallback** | workstation WSL2 | retired | Stop after railiance01 primary stabilizes | `CUST-WP-0011-T08/T09`, `CUST-WP-0054-T10` | 2 | grandfathered |
| 7 | **Core Hub** | coulombcore `core-hub-staging`; public `hub.coulomb.social` | railiance01 | Staged-promotion overlay; dual-run prerequisite (`CORE-WP-0005-T04`) | `CORE-WP-0005` | 3 | grandfathered |
| 8 | **Inter-Hub (Haskell)** | coulombcore external | retired | Rollback-only after Core Hub cutover | `CORE-WP-0007` | 3 | grandfathered |
| 9 | **issue-core** | coulombcore `issue-core` ns; ClusterIP `10.43.103.154:8765` | railiance01 | Staged-promotion overlay; shorten fleet tunnel to local svc | `ISSUE-WP-0003`, `CUST-WP-0054-T03` | 4 | grandfathered |
| 10 | **issue-core CNPG** | coulombcore | railiance01 | Migrate with issue-core workload | `railiance-platform` | 4 | grandfathered |
| 11 | **External Secrets Operator** | coulombcore | railiance01 | GitOps follows forge; ESO stores point at railiance01 OpenBao post-Wave 7 or interim coulombcore path documented | `railiance-platform` | 5 | grandfathered |
| 12 | **ArgoCD** | coulombcore (boundary: should be S4) | railiance01 | Staged-promotion; repoint repo URLs to Forgejo | `railiance-cluster` | 5 | grandfathered |
| 13 | **llm-connect** | railiance01 `activity-core` ns (partial) | railiance01 | Already on target machine; complete in-cluster profile | `CCR-2026-0003` lane | 6 | observed |
| 14 | **activity-core** | railiance01 `activity-core` ns | railiance01 (retain) | No move; update sinks (T06) and hub URL post-Wave 2 | — | — | **on target** |
| 15 | **Temporal / NATS** | railiance01 | railiance01 (retain) | Co-located with activity-core | — | — | **on target** |
| 16 | **ops-hub evidence / widgets** | files + Core Hub path | railiance01 via Core Hub | Follows Core Hub; not coulombcore-blocking | `CUST-WP-0025`, `CUST-WP-0049` | 6 | planned |
| 17 | **artifact-store / MinIO lane** | assessment only | railiance01 or compatible endpoint | Compatibility-profile per `ARTIFACT-STORE-WP-0007` | `ARTIFACT-STORE-WP-0007` | 6 | planned |
| 18 | **OpenBao** | coulombcore | railiance01 | **Last infrastructure wave**; `NET-WP-0020` unseal automation; CNPG + seal migration | `NET-WP-0020`, `railiance-platform` | 7 | grandfathered |
| 19 | **KeyCape** | coulombcore | railiance01 | Follows OpenBao; OIDC/MFA paths | `key-cape` | 7 | grandfathered |
| 20 | **Authelia** | coulombcore | railiance01 | Identity front door | `key-cape` / `railiance-platform` | 7 | grandfathered |
| 21 | **privacyIDEA** | coulombcore | railiance01 | MFA backend | `key-cape` | 7 | grandfathered |
| 22 | **lldap** | coulombcore | railiance01 | LDAP directory | `key-cape` / `railiance-platform` | 7 | grandfathered |
| 23 | **flex-auth** | coulombcore | railiance01 | Policy registry follows identity | `flex-auth` | 7 | grandfathered |
| 24 | **Fleet mesh transit tunnels** | railiance01 systemd → coulombcore ClusterIPs | railiance01-local services | Retire when Waves 2+4 complete (hub + issue-core local) | `CUST-WP-0054-T02` | 2–4 | **interim active** |
| 25 | **CNPG operator** | coulombcore (boundary note) | railiance01 | Platform operator moves with Wave 2+ workloads | `railiance-platform` | 2–7 | grandfathered |
| 26 | **coulombcore host identity** | coulombcore | railiance02 | Machine phoenix after Wave 7 | `CUST-WP-0054-T09`, `CUST-WP-0054-T08` | 8 | wait |

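To make the register queryable, a few rows can be modelled as records and grouped by wave. This is a minimal sketch, not the format of any existing inventory: the `Placement` class and `drain_backlog` function are hypothetical, the host field is reduced to the machine name, and only five rows from the table above are included.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    workload: str
    current: str          # machine name only (simplification of the register)
    target: str
    wave: Optional[int]   # None where the register shows no wave
    status: str


# Subset of the placement register, reduced to machine names.
ROWS = [
    Placement("Gitea + OCI registry", "coulombcore", "railiance01", 1, "grandfathered"),
    Placement("State Hub API (primary)", "coulombcore", "railiance01", 2, "grandfathered"),
    Placement("activity-core", "railiance01", "railiance01", None, "on target"),
    Placement("OpenBao", "coulombcore", "railiance01", 7, "grandfathered"),
    Placement("coulombcore host identity", "coulombcore", "railiance02", 8, "wait"),
]


def drain_backlog(rows, host="coulombcore"):
    """Group workloads that still run on `host` but target another machine, by wave."""
    backlog = defaultdict(list)
    for row in rows:
        if row.current == host and row.target != host:
            backlog[row.wave].append(row.workload)
    # Sort by wave number; rows without a wave (None) sort last.
    return dict(sorted(backlog.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)))


if __name__ == "__main__":
    for wave, workloads in drain_backlog(ROWS).items():
        print(f"Wave {wave}: {', '.join(workloads)}")
```

An empty result for `drain_backlog` would be one signal that nothing in the modelled rows still depends on coulombcore.
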
## Per-wave detail

### Wave 1 — Source forge + CI (unblocks repos and images)

**Goal:** All repos and container images publish from railiance01; coulombcore
Gitea becomes read-only mirror.

| Step | Action | Done when |
| --- | --- | --- |
| 1.1 | Resolve `RAIL-HO-WP-0005-T02` production decisions (hostname **decided:** `forgejo.coulomb.social`; SMTP, runners, backup still open) | `docs/forgejo-production-decisions.md` |
| 1.2 | Disposable Forgejo probe namespace + restore drill | Backup/restore evidence id recorded |
| 1.3 | Production Forgejo cutover | All 74 repo remotes point at Forgejo; push/pull verified |
| 1.4 | Actions runners for `state-hub`, `core-hub`, `activity-core`, `issue-core` | Tag-triggered image lands in forge OCI |
| 1.5 | Gitea → read-only mirror on coulombcore | Rollback window documented; no new writes |

**Blocks:** Wave 2 sweep checkouts (needs forge clones on railiance01).

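The "all repo remotes point at Forgejo" condition in step 1.3 could be checked with something like the sketch below. It assumes the remote URLs have already been collected into a mapping (for example from `git remote get-url origin` per checkout); the function names and the `example-org` path segment are hypothetical.

```python
from urllib.parse import urlsplit

FORGE_HOST = "forgejo.coulomb.social"


def remote_host(url):
    """Extract the host from an https/ssh URL or an scp-style git remote."""
    if "://" in url:
        return urlsplit(url).hostname
    # scp-style remote: [user@]host:path
    head, sep, _ = url.partition(":")
    if not sep:
        return None  # local path or unrecognised form
    return head.rpartition("@")[2].lower() or None


def stale_remotes(remotes, forge_host=FORGE_HOST):
    """Return the sorted repo names whose remote does not point at the forge host."""
    return sorted(
        name for name, url in remotes.items() if remote_host(url) != forge_host
    )


if __name__ == "__main__":
    # Hypothetical remotes; the org path is made up for illustration.
    remotes = {
        "state-hub": "https://forgejo.coulomb.social/example-org/state-hub.git",
        "core-hub": "git@gitea.coulomb.social:example-org/core-hub.git",
        "issue-core": "ssh://git@forgejo.coulomb.social:2222/example-org/issue-core.git",
    }
    print("still to repoint:", stale_remotes(remotes))
```

An empty list across all 74 checkouts would satisfy the remote half of the step; the push/pull verification still needs a real round trip.
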
### Wave 2 — State Hub home on railiance01

**Goal:** Automation loop machine-local; consistency sweeps write back to
railiance01 checkouts, not workstation paths.

| Step | Action | Done when |
| --- | --- | --- |
| 2.1 | CNPG + storage review on railiance01 | Platform sign-off |
| 2.2 | `CUST-WP-0011-T07` cutover to railiance01 primary | Row counts match; `127.0.0.1:8000` serves railiance01 hub |
| 2.3 | Clone/register 74 repos on railiance01 from Forgejo | `fix-consistency` writebacks use railiance01 paths |
| 2.4 | Retire fleet tunnel `fleet-state-hub-coulombcore` | activity-core reaches hub without coulombcore hop |
| 2.5 | WSL2 fallback retirement (optional, after stabilization) | `CUST-WP-0011-T08/T09` |

**Prereq:** Wave 1 forge (clone source).

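The "row counts match" criterion in step 2.2 amounts to comparing per-table counts taken on the frozen source and on the restored target. The sketch below shows only that comparison; it is not the `CUST-WP-0011-T07` playbook, and the table names and counts are invented for illustration.

```python
def count_mismatches(source_counts, target_counts):
    """Return {table: (source, target)} for every table whose counts differ.

    A table missing on one side is reported with None for that side.
    """
    tables = sorted(set(source_counts) | set(target_counts))
    return {
        table: (source_counts.get(table), target_counts.get(table))
        for table in tables
        if source_counts.get(table) != target_counts.get(table)
    }


if __name__ == "__main__":
    # Hypothetical table names and counts.
    source = {"workplans": 120, "decisions": 48, "evidence": 310}
    target = {"workplans": 120, "decisions": 47}
    print(count_mismatches(source, target))
```

An empty result is the exact-count condition; any entry is a reason to hold the rewire step until the restore is repeated or explained.
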
### Wave 3 — Core Hub production

**Goal:** `hub.coulomb.social` served from railiance01 Core Hub.

| Step | Action | Done when |
| --- | --- | --- |
| 3.1 | Close `CORE-WP-0005-T04` prerequisites (widget types, auth posture) | Catalog gap resolved |
| 3.2 | Operator-approved cutover with rollback plan | Deployed smoke + activity-core sink green |
| 3.3 | Inter-Hub marked rollback-only | `CORE-WP-0007` unblocks |

**Prereq:** Wave 1 (images via forge CI).

### Wave 4 — issue-core

**Goal:** Emission path is railiance01-local; no coulombcore ClusterIP in path.

| Step | Action | Done when |
| --- | --- | --- |
| 4.1 | Staged-promotion overlay on railiance01 | ArgoCD sync healthy |
| 4.2 | Migrate CNPG + secrets | ExternalSecret Ready |
| 4.3 | Point `ISSUE_CORE_URL` at in-cluster svc | Retire `fleet-issue-core-coulombcore` tunnel |
| 4.4 | Safe emission smoke | HTTP 201 + Gitea/Forgejo issue created |

**Prereq:** Wave 1 (image + gitops); credential lane `CCR-2026-0002` active.

### Wave 5 — GitOps control plane

**Goal:** ArgoCD and ESO run on railiance01 and track Forgejo repos.

| Step | Action | Done when |
| --- | --- | --- |
| 5.1 | ArgoCD overlay on railiance01 | Sync from Forgejo remotes |
| 5.2 | ESO → SecretStore paths updated | Workloads on railiance01 pull secrets |
| 5.3 | Decommission coulombcore ArgoCD Applications | No new syncs to coulombcore-k3s |

**Prereq:** Waves 1–2 (forge URLs, hub coordination).

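Repointing ArgoCD repo URLs to Forgejo is essentially a host swap that leaves the scheme, userinfo, port, and path alone. The following is a rough sketch of that rewrite for URL-form remotes only; the `repoint` helper and the `example-org` path are hypothetical, and how the rewritten URLs get applied to ArgoCD Applications is outside its scope.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "gitea.coulomb.social"
NEW_HOST = "forgejo.coulomb.social"


def repoint(repo_url, old_host=OLD_HOST, new_host=NEW_HOST):
    """Swap the host of a URL-form repo remote; leave any other URL untouched."""
    parts = urlsplit(repo_url)
    if parts.hostname != old_host:
        return repo_url
    # Keep userinfo and port; replace only the host portion of netloc.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Hypothetical repo path.
    print(repoint("https://gitea.coulomb.social/example-org/issue-core.git"))
```

URLs on any other host pass through unchanged, so the function can be run over a whole list of Application sources without filtering first.
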
### Wave 6 — Application stragglers

Low-coupling apps and evidence lanes that do not block earlier waves:

- llm-connect production profile completion
- ops-hub widget evidence via Core Hub
- artifact-store compatibility endpoint (if approved)

Each uses staged-promotion unless listed under **Documented exceptions**.

### Wave 7 — OpenBao + identity (LAST)

**Goal:** Authentication and secret custody off coulombcore.

| Step | Action | Done when |
| --- | --- | --- |
| 7.1 | OpenBao staged-promotion to railiance01 | Unseal automation (`NET-WP-0020`) proven |
| 7.2 | KeyCape / Authelia / privacyIDEA / lldap migration | OIDC login smoke on railiance01 |
| 7.3 | flex-auth registry points at new identity endpoints | Credential lanes re-pointed |
| 7.4 | CCR/applier paths verified | No production secret reads from coulombcore OpenBao |

**Gate:** `CUST-WP-0054-T09` cannot start until Wave 7 completes.

### Wave 8 — Phoenix to railiance02

Execute `CUST-WP-0054-T09` via T08 automation: wipe coulombcore, rebuild it as
railiance02, and join it to the fleet. A DNS/cert plan is still needed for the
remaining `*.coulomb.social` names.

## Documented exceptions

| Workload | Reason | Target date | Rollback | Approval |
| --- | --- | --- | --- | --- |
| Fleet mesh systemd tunnels | Wave 2/4 not complete; railiance01 reaches coulombcore ClusterIPs | Until Waves 2+4 done | Re-enable workstation reverse tunnels per `docs/fleet-mesh-dehub-runbook.md` | `CUST-WP-0054-T02` cutover 2026-07-03 |
| Core Hub staging on coulombcore | Pre-cutover smoke environment | Until Wave 3 cutover | Keep staging namespace | `CORE-WP-0005` |
| Static `id_ops` SSH key on railiance01 fleet units | `atm-fleet-mesh` cert_command blocked on VAULT_TOKEN | Until warden sign available | ops-bridge or rotated key | `CUST-WP-0054-T02` interim |

No other exceptions as of 2026-07-03. New exceptions require a State Hub
decision or workplan amendment.

## Staged-promotion method (default)

Per `RAIL-BS-WP-0006` (finished):

1. `railiance/<app>/app.toml` + overlay in owning repo
2. Stage 1 deploy → observe → promote with evidence
3. Backup/restore drill before production promotion
4. Rollback revision documented

Apps without overlays yet must get an overlay scaffold before Wave execution.

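The four steps above can be read as a promotion gate: production promotion is blocked until each one has something to show for it. The sketch below encodes that reading. It is an assumption about how the checklist might be tracked, not the `RAIL-BS-WP-0006` implementation, and the `PromotionRecord` fields and evidence ids are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromotionRecord:
    app: str
    has_overlay: bool                              # step 1
    stage1_evidence: Optional[str] = None          # step 2
    restore_drill_evidence: Optional[str] = None   # step 3
    rollback_revision: Optional[str] = None        # step 4


def promotion_blockers(record: PromotionRecord) -> List[str]:
    """Return the unmet steps, in checklist order; empty means promotable."""
    checks = [
        (record.has_overlay, "step 1: no app.toml + overlay in owning repo"),
        (bool(record.stage1_evidence), "step 2: no stage 1 observation evidence"),
        (bool(record.restore_drill_evidence), "step 3: no backup/restore drill"),
        (bool(record.rollback_revision), "step 4: no rollback revision documented"),
    ]
    return [reason for ok, reason in checks if not ok]


if __name__ == "__main__":
    # Hypothetical evidence id.
    record = PromotionRecord("core-hub", has_overlay=True, stage1_evidence="ev-3")
    for reason in promotion_blockers(record):
        print(reason)
```

An app with no overlay scaffold fails at step 1, which matches the rule that a scaffold must exist before Wave execution.
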
## Inventory sync

`ops/service-inventory.yml` updated 2026-07-03 for:

- coulombcore `lifecycle_state: draining` on grandfathered production services
- State Hub primary on coulombcore cluster (not workstation)
- railiance01 fleet-mesh and activity-core placement
- ops-bridge on railiance01 via systemd (not workstation hub)

Regenerate catalog view: `make ops-inventory-view`

## Human gates (not agent-executable)

| Gate | Owner | Blocks |
| --- | --- | --- |
| Forgejo T02 production decisions | operator | Wave 1 |
| State Hub railiance01 cutover approval | operator; `CUST-WP-0011-T07` | Wave 2 |
| Core Hub production cutover | operator; `CORE-WP-0005-T04` | Wave 3 |
| OpenBao/identity migration approval | operator + custody | Wave 7 |
| coulombcore phoenix approval | operator | Wave 8 |
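
An agent or script that schedules wave work could consult the gate table before starting anything. The sketch below is a simple lookup under that assumption: the gate names and waves are copied from the table, while the `GATES` mapping and `blocked_waves` function are hypothetical, and recording an approval remains a human action.

```python
# Gate name -> wave it blocks, copied from the human gates table.
GATES = {
    "Forgejo T02 production decisions": 1,
    "State Hub railiance01 cutover approval": 2,
    "Core Hub production cutover": 3,
    "OpenBao/identity migration approval": 7,
    "coulombcore phoenix approval": 8,
}


def blocked_waves(approved):
    """Return the sorted waves whose human gate has not been approved yet."""
    return sorted({wave for gate, wave in GATES.items() if gate not in approved})


if __name__ == "__main__":
    approved = {"Forgejo T02 production decisions"}
    print("waves still gated:", blocked_waves(approved))
```

Waves 4, 5 and 6 never appear in the result because the table lists no human gate for them; their prerequisites are covered by the per-wave sections instead.
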