the-custodian/docs/coulombcore-drain-placement-plan.md
codex cf4be716e1 CUST-WP-0054 T01-T03: fleet architecture, de-hub runbook, drain plan
Documents the three-machine role model, fleet mesh topology, coulombcore
freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel
install assets and refreshes ops service inventory to reflect 2026-07-03
production placement (cluster State Hub, fleet mesh, draining coulombcore).
2026-07-04 00:29:55 +02:00

CoulombCore Drain and Production Placement Plan

Date: 2026-07-03
Workplan: CUST-WP-0054-T03
Freeze policy: canon/standards/coulombcore-production-freeze_v0.1.md
Architecture: docs/workstation-independence-fleet-architecture.md

Purpose

Ordered drain sequence for every production workload on coulombcore (92.205.130.254, coulombcore-k3s). Each row names current placement, target placement, migration method, owner workplan, and prerequisites.

Coupling rule: forge and State Hub move early; identity + OpenBao move last because everything authenticates through them.

Wave overview

Wave 0  Freeze policy (this document + canon) — effective 2026-07-03
Wave 1  Source forge + CI runners ─────────── RAIL-HO-WP-0005 / CUST-WP-0054-T04
Wave 2  State Hub primary + sweep checkouts ── CUST-WP-0054-T05 / CUST-WP-0011
Wave 3  Core Hub production ────────────────── CORE-WP-0005
Wave 4  issue-core ─────────────────────────── ISSUE-WP-0003 + overlay
Wave 5  GitOps control plane (ESO, ArgoCD) ─── railiance-cluster overlays
Wave 6  Application stragglers ─────────────── per-app overlays
Wave 7  OpenBao + identity stack ───────────── NET-WP-0020 + key-cape (LAST)
Wave 8  coulombcore phoenix → railiance02 ─── CUST-WP-0054-T09

Placement register

| # | Workload | Current (2026-07-03) | Target | Method | Owner | Wave | Status |
|---|---|---|---|---|---|---|---|
| 1 | Gitea + OCI registry | coulombcore-k3s default; gitea.coulomb.social | railiance01 forgejo.coulomb.social | Staged-promotion S5 overlay; RAIL-HO-WP-0005 probe → production; Gitea → read-only mirror | RAIL-HO-WP-0005, CUST-WP-0054-T04 | 1 | grandfathered |
| 2 | Forgejo Actions / CI runners | none (workstation manual build) | railiance01 | New S5 overlay; image build on tag push | CUST-WP-0054-T04 | 1 | planned |
| 3 | Gitea DB + PVC | coulombcore databases / gitea-shared-storage | railiance01 CNPG + PVC | Migrate with Forgejo; backup/restore drill required | RAIL-HO-WP-0005 | 1 | grandfathered |
| 4 | State Hub API (primary) | coulombcore CNPG state-hub-db; cluster Svc 10.43.170.94:8000 | railiance01 CNPG + Deployment | CUST-WP-0011-T07 playbook: freeze → exact-count restore → rewire; staged-promotion overlay | CUST-WP-0054-T05, CUST-WP-0011 | 2 | grandfathered |
| 5 | State Hub sweep checkouts | workstation /home/worsch/* (74 repos) | railiance01 clone tree from forge | Relocate host_paths / local_path; no workstation writeback | CUST-WP-0054-T05, STATE-WP-0064 | 2 | planned |
| 6 | WSL2 State Hub fallback | workstation WSL2 | retired | Stop after railiance01 primary stabilizes | CUST-WP-0011-T08/T09, CUST-WP-0054-T10 | 2 | grandfathered |
| 7 | Core Hub | coulombcore core-hub-staging; public hub.coulomb.social | railiance01 | Staged-promotion overlay; dual-run prerequisite (CORE-WP-0005-T04) | CORE-WP-0005 | 3 | grandfathered |
| 8 | Inter-Hub (Haskell) | coulombcore external | retired | Rollback-only after Core Hub cutover | CORE-WP-0007 | 3 | grandfathered |
| 9 | issue-core | coulombcore issue-core ns; ClusterIP 10.43.103.154:8765 | railiance01 | Staged-promotion overlay; shorten fleet tunnel to local svc | ISSUE-WP-0003, CUST-WP-0054-T03 | 4 | grandfathered |
| 10 | issue-core CNPG | coulombcore | railiance01 | Migrate with issue-core workload | railiance-platform | 4 | grandfathered |
| 11 | External Secrets Operator | coulombcore | railiance01 | GitOps follows forge; ESO stores point at railiance01 OpenBao post-Wave 7 or interim coulombcore path documented | railiance-platform | 5 | grandfathered |
| 12 | ArgoCD | coulombcore (boundary: should be S4) | railiance01 | Staged-promotion; repoint repo URLs to Forgejo | railiance-cluster | 5 | grandfathered |
| 13 | llm-connect | railiance01 activity-core ns (partial) | railiance01 | Already on target machine; complete in-cluster profile | CCR-2026-0003 lane | 6 | observed |
| 14 | activity-core | railiance01 activity-core ns | railiance01 (retain) | No move; update sinks (T06) and hub URL post-Wave 2 | | | on target |
| 15 | Temporal / NATS | railiance01 | railiance01 (retain) | Co-located with activity-core | | | on target |
| 16 | ops-hub evidence / widgets | files + Core Hub path | railiance01 via Core Hub | Follows Core Hub; not coulombcore-blocking | CUST-WP-0025, CUST-WP-0049 | 6 | planned |
| 17 | artifact-store / MinIO lane | assessment only | railiance01 or compatible endpoint | Compatibility-profile per ARTIFACT-STORE-WP-0007 | ARTIFACT-STORE-WP-0007 | 6 | planned |
| 18 | OpenBao | coulombcore | railiance01 | Last infrastructure wave; NET-WP-0020 unseal automation; CNPG + seal migration | NET-WP-0020, railiance-platform | 7 | grandfathered |
| 19 | KeyCape | coulombcore | railiance01 | Follows OpenBao; OIDC/MFA paths | key-cape | 7 | grandfathered |
| 20 | Authelia | coulombcore | railiance01 | Identity front door | key-cape / railiance-platform | 7 | grandfathered |
| 21 | privacyIDEA | coulombcore | railiance01 | MFA backend | key-cape | 7 | grandfathered |
| 22 | lldap | coulombcore | railiance01 | LDAP directory | key-cape / railiance-platform | 7 | grandfathered |
| 23 | flex-auth | coulombcore | railiance01 | Policy registry follows identity | flex-auth | 7 | grandfathered |
| 24 | Fleet mesh transit tunnels | railiance01 systemd → coulombcore ClusterIPs | railiance01-local services | Retire when Waves 2+4 complete (hub + issue-core local) | CUST-WP-0054-T02 | 2–4 | interim active |
| 25 | CNPG operator | coulombcore (boundary note) | railiance01 | Platform operator moves with Wave 2+ workloads | railiance-platform | 2–7 | grandfathered |
| 26 | coulombcore host identity | coulombcore | railiance02 | Machine phoenix after Wave 7 | CUST-WP-0054-T09, CUST-WP-0054-T08 | 8 | wait |

Per-wave detail

Wave 1 — Source forge + CI (unblocks repos and images)

Goal: All repos and container images publish from railiance01; coulombcore Gitea becomes read-only mirror.

| Step | Action | Done when |
|---|---|---|
| 1.1 | Resolve RAIL-HO-WP-0005-T02 production decisions (hostname decided: forgejo.coulomb.social; SMTP, runners, backup still open) | docs/forgejo-production-decisions.md |
| 1.2 | Disposable Forgejo probe namespace + restore drill | Backup/restore evidence id recorded |
| 1.3 | Production Forgejo cutover | All 74 repo remotes point at Forgejo; push/pull verified |
| 1.4 | Actions runners for state-hub, core-hub, activity-core, issue-core | Tag-triggered image lands in forge OCI |
| 1.5 | Gitea → read-only mirror on coulombcore | Rollback window documented; no new writes |

Blocks: Wave 2 sweep checkouts (needs forge clones on railiance01).
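
The "done when" condition for step 1.3 could be checked mechanically. The following is a minimal sketch, assuming the origin URL of each repo has already been collected (for example with `git remote get-url origin`). The function names, the sample repo paths, and the `ops/` owner prefix are hypothetical and not taken from the plan.

```python
from urllib.parse import urlparse

FORGE_HOST = "forgejo.coulomb.social"  # hostname decided in step 1.1


def remote_host(url: str) -> str:
    """Return the host of a URL-style or scp-style git remote."""
    if "://" in url:
        return urlparse(url).hostname or ""
    # scp-style remote: git@host:owner/repo.git
    user_host = url.split(":", 1)[0]
    return user_host.rsplit("@", 1)[-1]


def stale_remotes(remotes: dict[str, str]) -> list[str]:
    """Names of repos whose origin does not point at the Forgejo host."""
    return sorted(
        name for name, url in remotes.items()
        if remote_host(url) != FORGE_HOST
    )


# Hypothetical sample input; owner paths are illustrative only.
sample = {
    "state-hub": "https://forgejo.coulomb.social/ops/state-hub.git",
    "core-hub": "git@forgejo.coulomb.social:ops/core-hub.git",
    "issue-core": "https://gitea.coulomb.social/ops/issue-core.git",
}
print(stale_remotes(sample))  # ['issue-core']
```

An empty result across all 74 repos would correspond to the first half of the step 1.3 condition; push/pull verification still needs a separate check.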

Wave 2 — State Hub home on railiance01

Goal: Automation loop machine-local; consistency sweeps write back to railiance01 checkouts, not workstation paths.

| Step | Action | Done when |
|---|---|---|
| 2.1 | CNPG + storage review on railiance01 | Platform sign-off |
| 2.2 | CUST-WP-0011-T07 cutover to railiance01 primary | Row counts match; 127.0.0.1:8000 serves railiance01 hub |
| 2.3 | Clone/register 74 repos on railiance01 from Forgejo | fix-consistency writebacks use railiance01 paths |
| 2.4 | Retire fleet tunnel fleet-state-hub-coulombcore | activity-core reaches hub without coulombcore hop |
| 2.5 | WSL2 fallback retirement (optional, after stabilization) | CUST-WP-0011-T08/T09 |

Prereq: Wave 1 forge (clone source).
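
The "row counts match" condition in step 2.2 amounts to comparing per-table counts from the frozen source against the restored target. A minimal sketch of that comparison follows, assuming the counts have already been gathered into plain dictionaries. The table names and numbers are hypothetical, and the real CUST-WP-0011-T07 playbook may verify this differently.

```python
from typing import Optional


def count_mismatches(
    source: dict[str, int], target: dict[str, int]
) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Tables whose row counts differ, or that exist on only one side.

    Values are (source_count, target_count); None marks a missing table.
    """
    tables = source.keys() | target.keys()
    return {
        table: (source.get(table), target.get(table))
        for table in sorted(tables)
        if source.get(table) != target.get(table)
    }


# Hypothetical counts, for illustration only.
source_counts = {"workplans": 120, "tasks": 845, "decisions": 37}
target_counts = {"workplans": 120, "tasks": 844}

mismatches = count_mismatches(source_counts, target_counts)
if mismatches:
    print("do not rewire:", mismatches)
else:
    print("exact-count restore verified")
```

Treating a table that is missing on either side as a mismatch keeps the check strict, in line with the "exact-count restore" wording in the placement register.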

Wave 3 — Core Hub production

Goal: hub.coulomb.social served from railiance01 Core Hub.

| Step | Action | Done when |
|---|---|---|
| 3.1 | Close CORE-WP-0005-T04 prerequisites (widget types, auth posture) | Catalog gap resolved |
| 3.2 | Operator-approved cutover with rollback plan | Deployed smoke + activity-core sink green |
| 3.3 | Inter-Hub marked rollback-only | CORE-WP-0007 unblocks |

Prereq: Wave 1 (images via forge CI).

Wave 4 — issue-core

Goal: Emission path is railiance01-local; no coulombcore ClusterIP in path.

| Step | Action | Done when |
|---|---|---|
| 4.1 | Staged-promotion overlay on railiance01 | ArgoCD sync healthy |
| 4.2 | Migrate CNPG + secrets | ExternalSecret Ready |
| 4.3 | Point ISSUE_CORE_URL at in-cluster svc | Retire fleet-issue-core-coulombcore tunnel |
| 4.4 | Safe emission smoke | HTTP 201 + Gitea/Forgejo issue created |

Prereq: Wave 1 (image + gitops); credential lane CCR-2026-0002 active.

Wave 5 — GitOps control plane

Goal: ArgoCD and ESO run on railiance01 and track Forgejo repos.

| Step | Action | Done when |
|---|---|---|
| 5.1 | ArgoCD overlay on railiance01 | Sync from Forgejo remotes |
| 5.2 | ESO → SecretStore paths updated | Workloads on railiance01 pull secrets |
| 5.3 | Decommission coulombcore ArgoCD Applications | No new syncs to coulombcore-k3s |

Prereq: Waves 1–2 (forge URLs, hub coordination).

Wave 6 — Application stragglers

Low-coupling apps and evidence lanes that do not block earlier waves:

  • llm-connect production profile completion
  • ops-hub widget evidence via Core Hub
  • artifact-store compatibility endpoint (if approved)

Each uses staged-promotion unless listed under Documented exceptions.

Wave 7 — OpenBao + identity (LAST)

Goal: Authentication and secret custody off coulombcore.

| Step | Action | Done when |
|---|---|---|
| 7.1 | OpenBao staged-promotion to railiance01 | Unseal automation (NET-WP-0020) proven |
| 7.2 | KeyCape / Authelia / privacyIDEA / lldap migration | OIDC login smoke on railiance01 |
| 7.3 | flex-auth registry points at new identity endpoints | Credential lanes re-pointed |
| 7.4 | CCR/applier paths verified | No production secret reads from coulombcore OpenBao |

Gate: CUST-WP-0054-T09 cannot start until Wave 7 completes.

Wave 8 — Phoenix to railiance02

Execute CUST-WP-0054-T09 via T08 automation: wipe coulombcore, rebuild as railiance02, join fleet. DNS/cert plan for remaining *.coulomb.social names.

Documented exceptions

| Workload | Reason | Target date | Rollback | Approval |
|---|---|---|---|---|
| Fleet mesh systemd tunnels | Wave 2/4 not complete; railiance01 reaches coulombcore ClusterIPs | Until Waves 2+4 done | Re-enable workstation reverse tunnels per docs/fleet-mesh-dehub-runbook.md | CUST-WP-0054-T02 cutover 2026-07-03 |
| Core Hub staging on coulombcore | Pre-cutover smoke environment | Until Wave 3 cutover | Keep staging namespace | CORE-WP-0005 |
| Static id_ops SSH key on railiance01 fleet units | atm-fleet-mesh cert_command blocked on VAULT_TOKEN | Until warden sign available | ops-bridge or rotated key | CUST-WP-0054-T02 interim |

No other exceptions as of 2026-07-03. New exceptions require a State Hub decision or workplan amendment.

Staged-promotion method (default)

Per RAIL-BS-WP-0006 (finished):

  1. railiance/<app>/app.toml + overlay in owning repo
  2. Stage 1 deploy → observe → promote with evidence
  3. Backup/restore drill before production promotion
  4. Rollback revision documented

Apps without overlays yet must get an overlay scaffold before Wave execution.
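
The four staged-promotion steps can be read as a checklist that must be complete before production promotion. A minimal sketch of such a checklist follows. The `PromotionRecord` fields, the gate labels, and the sample evidence id are hypothetical; they do not reflect the actual RAIL-BS-WP-0006 tooling or the app.toml schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromotionRecord:
    app: str
    has_overlay: bool                   # step 1: app.toml + overlay exist
    stage1_evidence_id: Optional[str]   # step 2: evidence from the stage 1 deploy
    restore_drill_passed: bool          # step 3: backup/restore drill
    rollback_revision: Optional[str]    # step 4: documented rollback revision


def missing_gates(rec: PromotionRecord) -> list[str]:
    """List the staged-promotion steps that are not yet satisfied."""
    checks = [
        ("overlay scaffold", rec.has_overlay),
        ("stage 1 evidence", bool(rec.stage1_evidence_id)),
        ("backup/restore drill", rec.restore_drill_passed),
        ("rollback revision", bool(rec.rollback_revision)),
    ]
    return [name for name, ok in checks if not ok]


# Hypothetical record, for illustration only.
rec = PromotionRecord("issue-core", True, "ev-0001", False, None)
print(missing_gates(rec))  # ['backup/restore drill', 'rollback revision']
```

An app is ready for production promotion under this reading only when `missing_gates` returns an empty list.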

Inventory sync

ops/service-inventory.yml updated 2026-07-03 for:

  • coulombcore lifecycle_state: draining on grandfathered production services
  • State Hub primary on coulombcore cluster (not workstation)
  • railiance01 fleet-mesh and activity-core placement
  • ops-bridge on railiance01 via systemd (not workstation hub)

Regenerate catalog view: make ops-inventory-view
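
The first inventory bullet implies an invariant: every grandfathered production service on coulombcore carries `lifecycle_state: draining`. A sketch of a check for that invariant follows, assuming the inventory has been loaded into a list of dictionaries. Only the `lifecycle_state` key and its `draining` value come from this plan; the `name`, `host`, and `status` keys are a hypothetical schema, not the real layout of ops/service-inventory.yml.

```python
def not_draining(services: list[dict]) -> list[str]:
    """Grandfathered coulombcore services not yet marked as draining."""
    return sorted(
        svc["name"]
        for svc in services
        if svc.get("host") == "coulombcore"
        and svc.get("status") == "grandfathered"
        and svc.get("lifecycle_state") != "draining"
    )


# Hypothetical inventory entries, for illustration only.
inventory = [
    {"name": "openbao", "host": "coulombcore",
     "status": "grandfathered", "lifecycle_state": "draining"},
    {"name": "argocd", "host": "coulombcore",
     "status": "grandfathered"},
    {"name": "activity-core", "host": "railiance01",
     "status": "on target", "lifecycle_state": "active"},
]
print(not_draining(inventory))  # ['argocd']
```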

Human gates (not agent-executable)

| Gate | Owner | Blocks |
|---|---|---|
| Forgejo T02 production decisions | operator | Wave 1 |
| State Hub railiance01 cutover approval | operator; CUST-WP-0011-T07 | Wave 2 |
| Core Hub production cutover | operator; CORE-WP-0005-T04 | Wave 3 |
| OpenBao/identity migration approval | operator + custody | Wave 7 |
| coulombcore phoenix approval | operator | Wave 8 |