CUST-WP-0054 T01-T03: fleet architecture, de-hub runbook, drain plan
Documents the three-machine role model, fleet mesh topology, coulombcore freeze policy, and ordered drain sequence. Adds railiance01 systemd tunnel install assets and refreshes ops service inventory to reflect 2026-07-03 production placement (cluster State Hub, fleet mesh, draining coulombcore).
200
docs/coulombcore-drain-placement-plan.md
Normal file
@@ -0,0 +1,200 @@
# CoulombCore Drain and Production Placement Plan

Date: 2026-07-03
Workplan: `CUST-WP-0054-T03`
Freeze policy: `canon/standards/coulombcore-production-freeze_v0.1.md`
Architecture: `docs/workstation-independence-fleet-architecture.md`

## Purpose

Ordered drain sequence for every production workload on coulombcore
(`92.205.130.254`, `coulombcore-k3s`). Each row names current placement,
target placement, migration method, owner workplan, and prerequisites.

**Coupling rule:** forge and State Hub move early; identity + OpenBao move
last because everything authenticates through them.

## Wave overview

```
Wave 0  Freeze policy (this document + canon) — effective 2026-07-03
Wave 1  Source forge + CI runners ─────────── RAIL-HO-WP-0005 / CUST-WP-0054-T04
Wave 2  State Hub primary + sweep checkouts ── CUST-WP-0054-T05 / CUST-WP-0011
Wave 3  Core Hub production ────────────────── CORE-WP-0005
Wave 4  issue-core ─────────────────────────── ISSUE-WP-0003 + overlay
Wave 5  GitOps control plane (ESO, ArgoCD) ─── railiance-cluster overlays
Wave 6  Application stragglers ─────────────── per-app overlays
Wave 7  OpenBao + identity stack ───────────── NET-WP-0020 + key-cape (LAST)
Wave 8  coulombcore phoenix → railiance02 ─── CUST-WP-0054-T09
```

## Placement register

| # | Workload | Current (2026-07-03) | Target | Method | Owner | Wave | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | **Gitea + OCI registry** | coulombcore-k3s `default`; `gitea.coulomb.social` | railiance01 **`forgejo.coulomb.social`** | Staged-promotion S5 overlay; `RAIL-HO-WP-0005` probe → production; Gitea → read-only mirror | `RAIL-HO-WP-0005`, `CUST-WP-0054-T04` | 1 | grandfathered |
| 2 | **Forgejo Actions / CI runners** | none (workstation manual build) | railiance01 | New S5 overlay; image build on tag push | `CUST-WP-0054-T04` | 1 | planned |
| 3 | **Gitea DB + PVC** | coulombcore `databases` / `gitea-shared-storage` | railiance01 CNPG + PVC | Migrate with Forgejo; backup/restore drill required | `RAIL-HO-WP-0005` | 1 | grandfathered |
| 4 | **State Hub API (primary)** | coulombcore CNPG `state-hub-db`; cluster Svc `10.43.170.94:8000` | railiance01 CNPG + Deployment | `CUST-WP-0011-T07` playbook: freeze → exact-count restore → rewire; staged-promotion overlay | `CUST-WP-0054-T05`, `CUST-WP-0011` | 2 | grandfathered |
| 5 | **State Hub sweep checkouts** | workstation `/home/worsch/*` (74 repos) | railiance01 clone tree from forge | Relocate `host_paths` / `local_path`; no workstation writeback | `CUST-WP-0054-T05`, `STATE-WP-0064` | 2 | planned |
| 6 | **WSL2 State Hub fallback** | workstation WSL2 | retired | Stop after railiance01 primary stabilizes | `CUST-WP-0011-T08/T09`, `CUST-WP-0054-T10` | 2 | grandfathered |
| 7 | **Core Hub** | coulombcore `core-hub-staging`; public `hub.coulomb.social` | railiance01 | Staged-promotion overlay; dual-run prerequisite (`CORE-WP-0005-T04`) | `CORE-WP-0005` | 3 | grandfathered |
| 8 | **Inter-Hub (Haskell)** | coulombcore external | retired | Rollback-only after Core Hub cutover | `CORE-WP-0007` | 3 | grandfathered |
| 9 | **issue-core** | coulombcore `issue-core` ns; ClusterIP `10.43.103.154:8765` | railiance01 | Staged-promotion overlay; shorten fleet tunnel to local svc | `ISSUE-WP-0003`, `CUST-WP-0054-T03` | 4 | grandfathered |
| 10 | **issue-core CNPG** | coulombcore | railiance01 | Migrate with issue-core workload | `railiance-platform` | 4 | grandfathered |
| 11 | **External Secrets Operator** | coulombcore | railiance01 | GitOps follows forge; ESO stores point at railiance01 OpenBao post-Wave 7 or interim coulombcore path documented | `railiance-platform` | 5 | grandfathered |
| 12 | **ArgoCD** | coulombcore (boundary: should be S4) | railiance01 | Staged-promotion; repoint repo URLs to Forgejo | `railiance-cluster` | 5 | grandfathered |
| 13 | **llm-connect** | railiance01 `activity-core` ns (partial) | railiance01 | Already on target machine; complete in-cluster profile | `CCR-2026-0003` lane | 6 | observed |
| 14 | **activity-core** | railiance01 `activity-core` ns | railiance01 (retain) | No move; update sinks (T06) and hub URL post-Wave 2 | — | — | **on target** |
| 15 | **Temporal / NATS** | railiance01 | railiance01 (retain) | Co-located with activity-core | — | — | **on target** |
| 16 | **ops-hub evidence / widgets** | files + Core Hub path | railiance01 via Core Hub | Follows Core Hub; not coulombcore-blocking | `CUST-WP-0025`, `CUST-WP-0049` | 6 | planned |
| 17 | **artifact-store / MinIO lane** | assessment only | railiance01 or compatible endpoint | Compatibility-profile per `ARTIFACT-STORE-WP-0007` | `ARTIFACT-STORE-WP-0007` | 6 | planned |
| 18 | **OpenBao** | coulombcore | railiance01 | **Last infrastructure wave**; `NET-WP-0020` unseal automation; CNPG + seal migration | `NET-WP-0020`, `railiance-platform` | 7 | grandfathered |
| 19 | **KeyCape** | coulombcore | railiance01 | Follows OpenBao; OIDC/MFA paths | `key-cape` | 7 | grandfathered |
| 20 | **Authelia** | coulombcore | railiance01 | Identity front door | `key-cape` / `railiance-platform` | 7 | grandfathered |
| 21 | **privacyIDEA** | coulombcore | railiance01 | MFA backend | `key-cape` | 7 | grandfathered |
| 22 | **lldap** | coulombcore | railiance01 | LDAP directory | `key-cape` / `railiance-platform` | 7 | grandfathered |
| 23 | **flex-auth** | coulombcore | railiance01 | Policy registry follows identity | `flex-auth` | 7 | grandfathered |
| 24 | **Fleet mesh transit tunnels** | railiance01 systemd → coulombcore ClusterIPs | railiance01-local services | Retire when Waves 2+4 complete (hub + issue-core local) | `CUST-WP-0054-T02` | 2–4 | **interim active** |
| 25 | **CNPG operator** | coulombcore (boundary note) | railiance01 | Platform operator moves with Wave 2+ workloads | `railiance-platform` | 2–7 | grandfathered |
| 26 | **coulombcore host identity** | coulombcore | railiance02 | Machine phoenix after Wave 7 | `CUST-WP-0054-T09`, `CUST-WP-0054-T08` | 8 | wait |

## Per-wave detail

### Wave 1 — Source forge + CI (unblocks repos and images)

**Goal:** All repos and container images publish from railiance01; coulombcore
Gitea becomes read-only mirror.

| Step | Action | Done when |
| --- | --- | --- |
| 1.1 | Resolve `RAIL-HO-WP-0005-T02` production decisions (hostname **decided:** `forgejo.coulomb.social`; SMTP, runners, backup still open) | `docs/forgejo-production-decisions.md` |
| 1.2 | Disposable Forgejo probe namespace + restore drill | Backup/restore evidence id recorded |
| 1.3 | Production Forgejo cutover | All 74 repo remotes point at Forgejo; push/pull verified |
| 1.4 | Actions runners for `state-hub`, `core-hub`, `activity-core`, `issue-core` | Tag-triggered image lands in forge OCI |
| 1.5 | Gitea → read-only mirror on coulombcore | Rollback window documented; no new writes |

**Blocks:** Wave 2 sweep checkouts (needs forge clones on railiance01).

### Wave 2 — State Hub home on railiance01

**Goal:** Automation loop machine-local; consistency sweeps write back to
railiance01 checkouts, not workstation paths.

| Step | Action | Done when |
| --- | --- | --- |
| 2.1 | CNPG + storage review on railiance01 | Platform sign-off |
| 2.2 | `CUST-WP-0011-T07` cutover to railiance01 primary | Row counts match; `127.0.0.1:8000` serves railiance01 hub |
| 2.3 | Clone/register 74 repos on railiance01 from Forgejo | `fix-consistency` writebacks use railiance01 paths |
| 2.4 | Retire fleet tunnel `fleet-state-hub-coulombcore` | activity-core reaches hub without coulombcore hop |
| 2.5 | WSL2 fallback retirement (optional, after stabilization) | `CUST-WP-0011-T08/T09` |

**Prereq:** Wave 1 forge (clone source).

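The "row counts match" criterion in step 2.2 could be evaluated with a comparison like the one below. This is a minimal sketch, not the `CUST-WP-0011-T07` playbook itself: the table names and counts are made-up sample data, and `count_mismatches` is a hypothetical helper.

```python
def count_mismatches(source, target):
    """Compare per-table row counts.

    Returns {table: (source_count, target_count)} for every difference.
    A table missing on one side is reported with None for that side.
    """
    diffs = {}
    for table in sorted(set(source) | set(target)):
        src, dst = source.get(table), target.get(table)
        if src != dst:
            diffs[table] = (src, dst)
    return diffs


# Hypothetical counts: taken after the freeze on coulombcore and after the
# restore on railiance01.
coulombcore = {"tasks": 1200, "progress_events": 58231, "inbox": 40}
railiance01 = {"tasks": 1200, "progress_events": 58230, "inbox": 40}

print(count_mismatches(coulombcore, railiance01))
```

An empty result would correspond to "row counts match"; any entry means the rewire step should not proceed.
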
### Wave 3 — Core Hub production

**Goal:** `hub.coulomb.social` served from railiance01 Core Hub.

| Step | Action | Done when |
| --- | --- | --- |
| 3.1 | Close `CORE-WP-0005-T04` prerequisites (widget types, auth posture) | Catalog gap resolved |
| 3.2 | Operator-approved cutover with rollback plan | Deployed smoke + activity-core sink green |
| 3.3 | Inter-Hub marked rollback-only | `CORE-WP-0007` unblocks |

**Prereq:** Wave 1 (images via forge CI).

### Wave 4 — issue-core

**Goal:** Emission path is railiance01-local; no coulombcore ClusterIP in path.

| Step | Action | Done when |
| --- | --- | --- |
| 4.1 | Staged-promotion overlay on railiance01 | ArgoCD sync healthy |
| 4.2 | Migrate CNPG + secrets | ExternalSecret Ready |
| 4.3 | Point `ISSUE_CORE_URL` at in-cluster svc | Retire `fleet-issue-core-coulombcore` tunnel |
| 4.4 | Safe emission smoke | HTTP 201 + Gitea/Forgejo issue created |

**Prereq:** Wave 1 (image + gitops); credential lane `CCR-2026-0002` active.

### Wave 5 — GitOps control plane

**Goal:** ArgoCD and ESO run on railiance01 and track Forgejo repos.

| Step | Action | Done when |
| --- | --- | --- |
| 5.1 | ArgoCD overlay on railiance01 | Sync from Forgejo remotes |
| 5.2 | ESO → SecretStore paths updated | Workloads on railiance01 pull secrets |
| 5.3 | Decommission coulombcore ArgoCD Applications | No new syncs to coulombcore-k3s |

**Prereq:** Waves 1–2 (forge URLs, hub coordination).

### Wave 6 — Application stragglers

Low-coupling apps and evidence lanes that do not block earlier waves:

- llm-connect production profile completion
- ops-hub widget evidence via Core Hub
- artifact-store compatibility endpoint (if approved)

Each uses staged-promotion unless listed under **Documented exceptions**.

### Wave 7 — OpenBao + identity (LAST)

**Goal:** Authentication and secret custody off coulombcore.

| Step | Action | Done when |
| --- | --- | --- |
| 7.1 | OpenBao staged-promotion to railiance01 | Unseal automation (`NET-WP-0020`) proven |
| 7.2 | KeyCape / Authelia / privacyIDEA / lldap migration | OIDC login smoke on railiance01 |
| 7.3 | flex-auth registry points at new identity endpoints | Credential lanes re-pointed |
| 7.4 | CCR/applier paths verified | No production secret reads from coulombcore OpenBao |

**Gate:** `CUST-WP-0054-T09` cannot start until Wave 7 completes.

### Wave 8 — Phoenix to railiance02

Execute `CUST-WP-0054-T09` via T08 automation: wipe coulombcore, rebuild as
railiance02, join fleet. DNS/cert plan for remaining `*.coulomb.social` names.

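The prerequisites and gates stated in the per-wave sections can be checked mechanically against a proposed execution order. The sketch below is illustrative only: `PREREQS` transcribes just the wave-to-wave prerequisites written in this plan (Waves 6 and 7 state none explicitly, and non-wave prerequisites such as credential lanes are left out), and `violations` is a hypothetical helper, not part of any existing tool.

```python
# Wave-to-wave prerequisites as stated in this plan.
PREREQS = {
    1: set(),
    2: {1},      # forge is the clone source
    3: {1},      # images via forge CI
    4: {1},      # image + gitops
    5: {1, 2},   # forge URLs, hub coordination
    6: set(),
    7: set(),
    8: {7},      # CUST-WP-0054-T09 cannot start until Wave 7 completes
}


def violations(order):
    """Return (wave, missing_prerequisite) pairs for waves started too early."""
    done = set()
    problems = []
    for wave in order:
        for need in sorted(PREREQS[wave] - done):
            problems.append((wave, need))
        done.add(wave)
    return problems


print(violations([1, 2, 3, 4, 5, 6, 7, 8]))  # planned order
print(violations([2, 1, 8, 7]))              # deliberately wrong order
```

The planned order yields no violations; the second call reports that Wave 2 ran before Wave 1 and Wave 8 before Wave 7.
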
## Documented exceptions

| Workload | Reason | Target date | Rollback | Approval |
| --- | --- | --- | --- | --- |
| Fleet mesh systemd tunnels | Wave 2/4 not complete; railiance01 reaches coulombcore ClusterIPs | Until Waves 2+4 done | Re-enable workstation reverse tunnels per `docs/fleet-mesh-dehub-runbook.md` | `CUST-WP-0054-T02` cutover 2026-07-03 |
| Core Hub staging on coulombcore | Pre-cutover smoke environment | Until Wave 3 cutover | Keep staging namespace | `CORE-WP-0005` |
| Static `id_ops` SSH key on railiance01 fleet units | `atm-fleet-mesh` cert_command blocked on VAULT_TOKEN | Until warden sign available | ops-bridge or rotated key | `CUST-WP-0054-T02` interim |

No other exceptions as of 2026-07-03. New exceptions require a State Hub
decision or workplan amendment.

## Staged-promotion method (default)

Per `RAIL-BS-WP-0006` (finished):

1. `railiance/<app>/app.toml` + overlay in owning repo
2. Stage 1 deploy → observe → promote with evidence
3. Backup/restore drill before production promotion
4. Rollback revision documented

Apps without overlays yet must get an overlay scaffold before Wave execution.

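The four steps above amount to a checklist that must be complete before a workload counts as promoted. A minimal sketch of that gate follows; the `PromotionEvidence` fields and the `missing_gates` helper are hypothetical names chosen for illustration and do not reflect how `RAIL-BS-WP-0006` actually records evidence.

```python
from dataclasses import dataclass


@dataclass
class PromotionEvidence:
    has_overlay: bool           # railiance/<app>/app.toml + overlay exist in the owning repo
    stage1_observed: bool       # Stage 1 deployed and observed, evidence recorded
    restore_drill_passed: bool  # backup/restore drill completed
    rollback_revision: str      # documented rollback revision ("" if missing)


def missing_gates(ev):
    """List unsatisfied staged-promotion steps, in the order of the list above."""
    checks = [
        ("overlay", ev.has_overlay),
        ("stage-1 evidence", ev.stage1_observed),
        ("backup/restore drill", ev.restore_drill_passed),
        ("rollback revision", bool(ev.rollback_revision)),
    ]
    return [name for name, ok in checks if not ok]


print(missing_gates(PromotionEvidence(True, True, False, "")))
```

An empty list would mean all four steps are evidenced; anything else names what still blocks promotion.
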
## Inventory sync

`ops/service-inventory.yml` updated 2026-07-03 for:

- coulombcore `lifecycle_state: draining` on grandfathered production services
- State Hub primary on coulombcore cluster (not workstation)
- railiance01 fleet-mesh and activity-core placement
- ops-bridge on railiance01 via systemd (not workstation hub)

Regenerate catalog view: `make ops-inventory-view`

## Human gates (not agent-executable)

| Gate | Owner | Blocks |
| --- | --- | --- |
| Forgejo T02 production decisions | operator | Wave 1 |
| State Hub railiance01 cutover approval | operator; `CUST-WP-0011-T07` | Wave 2 |
| Core Hub production cutover | operator; `CORE-WP-0005-T04` | Wave 3 |
| OpenBao/identity migration approval | operator + custody | Wave 7 |
| coulombcore phoenix approval | operator | Wave 8 |

147
docs/fleet-mesh-dehub-runbook.md
Normal file
@@ -0,0 +1,147 @@
# Fleet Mesh De-Hub Runbook (CUST-WP-0054-T02)

Date: 2026-07-03
Workplan: `CUST-WP-0054-T02`
Architecture: `docs/workstation-independence-fleet-architecture.md`

## Goal

Remove the workstation from production data paths between railiance01
(activity-core) and coulombcore (State Hub cluster, issue-core). Workstation
tunnels become interactive dev access only.

## Before (workstation hub)

```
railiance01:18000 ──reverse──► workstation:8000  ──forward──► coulombcore cluster State Hub
railiance01:18765 ──reverse──► workstation:18765 ──forward──► coulombcore cluster issue-core
```

## After (fleet-owned)

```
railiance01:18000 ──forward via SSH to coulombcore──► 10.43.170.94:8000  (State Hub)
railiance01:18765 ──forward via SSH to coulombcore──► 10.43.103.154:8765 (issue-core)
```

activity-core `actcore-state-hub-bridge` and `actcore-issue-core-bridge` keep
proxying to `127.0.0.1:18000` and `127.0.0.1:18765` on the railiance01 node.

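Each line of the "After" diagram corresponds to one SSH local port forward. The sketch below only builds the argument list, so the shape of the tunnel is explicit. It assumes OpenSSH's `-N`, `-i`, `-o`, and `-L` options and reuses the user, key path, and addresses that appear in this runbook; the `ExitOnForwardFailure` and `ServerAliveInterval` settings are assumptions, not taken from the unit files in `infra/fleet-mesh/systemd/`, and `forward_command` is a hypothetical helper.

```python
def forward_command(local_port, target_ip, target_port,
                    ssh_host="92.205.130.254", user="tegwick", key="~/.ssh/id_ops"):
    """Build an OpenSSH argv forwarding 127.0.0.1:<local_port> to <target_ip>:<target_port> via ssh_host."""
    return [
        "ssh", "-N",                       # forwarding only, no remote command
        "-i", key,
        "-o", "ExitOnForwardFailure=yes",  # assumption: exit if the listener cannot bind
        "-o", "ServerAliveInterval=30",    # assumption: keepalive interval in seconds
        "-L", f"127.0.0.1:{local_port}:{target_ip}:{target_port}",
        f"{user}@{ssh_host}",
    ]


print(" ".join(forward_command(18000, "10.43.170.94", 8000)))    # State Hub
print(" ".join(forward_command(18765, "10.43.103.154", 8765)))   # issue-core
```

Binding the listener to `127.0.0.1` matches the bridge pods proxying to loopback ports on the railiance01 node.
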
## Prerequisites

| Item | Check |
| --- | --- |
| ops-bridge installed on railiance01 | `which bridge` (on railiance01 this currently resolves to the `iproute2` utility, not ops-bridge; see Install) |
| SSH key authorized on coulombcore | `ssh -i ~/.ssh/id_ops tegwick@92.205.130.254 true` from railiance01 |
| ClusterIPs current | Compare against the targets of the `state-hub-primary` and `issue-core-coulombcore` workstation tunnels |
| warden `atm-fleet-mesh` (target) | `cert_command` migration after static-key smoke passes |

Reference config: `infra/fleet-mesh/railiance01-tunnels.yaml`

## Install (railiance01)

The `bridge` binary on railiance01 is the `iproute2` utility, not ops-bridge. Use the
systemd user units in `infra/fleet-mesh/systemd/` (or the installer script).

```bash
# From the-custodian repo on the workstation
bash infra/fleet-mesh/install-railiance01.sh railiance01
```

The installer copies:

- `infra/fleet-mesh/systemd/*.service` → `~/.config/systemd/user/`
- `infra/fleet-mesh/railiance01-tunnels.yaml` → `~/.config/bridge/tunnels.yaml` (reference for future ops-bridge install)
- `~/.ssh/id_ops` → railiance01 (static key interim; migrate to `atm-fleet-mesh` + `cert_command`)

Enable lingering so user units survive logout/reboot:

```bash
ssh railiance01 'sudo loginctl enable-linger tegwick'
```

## Cutover

```bash
# 1. Stop workstation reverse tunnels (one at a time — ops-bridge CLI)
bridge down state-hub-railiance01
bridge down issue-core-railiance01

# 2. Start fleet-owned forward tunnels on railiance01 (systemd)
ssh railiance01 'systemctl --user enable --now fleet-state-hub-coulombcore fleet-issue-core-coulombcore'

# 3. Smoke from railiance01 node
ssh railiance01 'curl -sf http://127.0.0.1:18000/state/health && curl -sf http://127.0.0.1:18765/healthz'
```

**Cutover evidence (2026-07-03):** workstation reverse tunnels stopped;
railiance01 systemd forwards healthy; `actcore-*-bridge` pods 1/1; progress
write through fleet path succeeded (event `647b70c0`).

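Right after `systemctl --user enable --now`, the forwards may need a moment before the smoke in step 3 succeeds. A polling variant of that smoke might look like the sketch below. It is an illustration, not part of the runbook tooling: `wait_healthy` is a hypothetical helper, the attempt count and delay are arbitrary, and the demonstration uses a stubbed fetch so it runs without network access.

```python
import time
import urllib.request


def wait_healthy(url, attempts=5, delay=2.0, fetch=None, sleep=time.sleep):
    """Poll url until it answers HTTP 200 or attempts run out; return True on success."""
    def default_fetch(u):
        with urllib.request.urlopen(u, timeout=5) as resp:
            return resp.status

    fetch = fetch or default_fetch
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:  # URLError and connection errors are OSError subclasses
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Demonstration with a stub that fails twice, then succeeds (no network needed).
responses = iter([OSError("refused"), OSError("refused"), 200])


def flaky(_url):
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


print(wait_healthy("http://127.0.0.1:18000/state/health", fetch=flaky, sleep=lambda s: None))
```

On the railiance01 node the same function would be called without the `fetch` and `sleep` overrides, once for `http://127.0.0.1:18000/state/health` and once for `http://127.0.0.1:18765/healthz`.
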
## Verify production (partial T10 rehearsal)

With workstation reverse tunnels **down**, confirm:

```bash
# Bridge pods healthy
ssh railiance01 'kubectl -n activity-core get pods -l app.kubernetes.io/part-of=activity-core | grep bridge'

# Consistency sweep API (from railiance01 cluster network)
ssh railiance01 'kubectl -n activity-core exec deploy/actcore-api -- python -c "
import urllib.request
print(urllib.request.urlopen(\"http://actcore-state-hub-bridge:8000/state/health\").read().decode())
"'

# Issue-core bridge
ssh railiance01 'kubectl -n activity-core exec deploy/actcore-api -- python -c "
import urllib.request
print(urllib.request.urlopen(\"http://actcore-issue-core-bridge:8765/healthz\").read().decode())
"'
```

Optional emission smoke (safe label only): trigger a known-safe activity-core
run or use the issue-core REST sink checklist from
`near-term-production-service-lanes-status.md`.

## Persist across reboot

Systemd user units are enabled via `install-railiance01.sh`. Confirm:

```bash
ssh railiance01 'loginctl show-user tegwick -p Linger; systemctl --user is-enabled fleet-state-hub-coulombcore fleet-issue-core-coulombcore'
```

When ops-bridge is installed on railiance01, `railiance01-tunnels.yaml` is the
drop-in config; until then systemd units are the production implementation.

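The output of the confirmation command can be evaluated mechanically. The sketch below assumes the usual output shape: a `Linger=yes` line from `loginctl show-user ... -p Linger`, followed by one state line per unit from `systemctl --user is-enabled`. `persistence_ok` is a hypothetical helper and the sample strings are illustrative, not captured output.

```python
def persistence_ok(output, expected_units=2):
    """Return True if lingering is on and every listed unit reports 'enabled'."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    linger = "Linger=yes" in lines
    states = [ln for ln in lines if not ln.startswith("Linger=")]
    return linger and len(states) == expected_units and all(s == "enabled" for s in states)


print(persistence_ok("Linger=yes\nenabled\nenabled\n"))
print(persistence_ok("Linger=no\nenabled\nenabled\n"))
```

Requiring exactly `expected_units` state lines guards against a unit that is missing from the output altogether.
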
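## Rollback

```bash
# Stop the fleet-owned forwards on railiance01 (systemd user units; ops-bridge is not installed there)
ssh railiance01 'systemctl --user disable --now fleet-state-hub-coulombcore fleet-issue-core-coulombcore'
# Restore the workstation reverse tunnels (ops-bridge CLI)
bridge up state-hub-railiance01 issue-core-railiance01
```
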
## Workstation tunnel policy after cutover

| Keep (interactive dev) | Retire from production dependency |
| --- | --- |
| `state-hub-primary` (MCP/agents) | `state-hub-railiance01` |
| `k3s-api-*` | `issue-core-railiance01` |
| `state-hub-mcp-*` | — |
| `issue-core-coulombcore` (workstation dev only) | — |

Production on railiance01 must not depend on any workstation tunnel.

## WireGuard evaluation

The current fleet mesh uses two forward tunnels (two systemd units). A WireGuard
successor is deferred until the persistent unit count exceeds ~5, per workplan T02.

## cert_command migration (follow-on)

Replace static `id_ops` with `atm-fleet-mesh` + `cert_command`:

1. Register `atm-fleet-mesh` in warden inventory and CoulombCore `ssh_principals.yaml`
2. Generate dedicated keypair on railiance01
3. Set `cert_command: "warden sign atm-fleet-mesh --pubkey ..."` per
   `ops-warden/wiki/playbooks/ops-bridge-tunnel-cert.md`

@@ -3,7 +3,7 @@
 <!-- generated by ops/render_service_inventory.py; edit ops/service-inventory.yml instead -->

 Source: `ops/service-inventory.yml`
-Inventory last reviewed: `2026-06-05`
+Inventory last reviewed: `2026-07-03`

 This is the repo-native first view for `CUST-WP-0047`. It exists so an
 operator can answer what is running where before the full standalone
@@ -16,9 +16,9 @@ operator can answer what is running where before the full standalone
 | Environments | 4 |
 | Hosts | 3 |
 | Clusters | 3 |
-| Services | 8 |
-| Services: observed_ok | 2 |
-| Services: unknown | 6 |
+| Services | 11 |
+| Services: observed_ok | 6 |
+| Services: unknown | 5 |

 ## Service Catalog

@@ -27,10 +27,13 @@ operator can answer what is running where before the full standalone
 | Gitea (gitea) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-apps | https://gitea.coulomb.social/v2/<br>Expected: status 401, OCI registry auth challenge | unknown<br>2026-05-16: Inventory draft records Helm release gitea, namespace default, app version 1.25.4, NodePort 32166, and registry auth challenge. | database:gitea-db<br>pvc:default/gitea-shared-storage | k8s: unknown (coulombcore-k3s/default) | Package token and push/pull verification need current evidence. |
 | Gitea Database (gitea-database) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: databases | railiance-platform | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/databases) | Backup and restore evidence not recorded in ops inventory. |
 | Gitea Shared Storage (gitea-shared-storage) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: default | railiance-platform<br>railiance-apps | - | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | k8s: unknown (coulombcore-k3s/default/pvc/gitea-shared-storage) | Package blob backup and restore evidence not confirmed. |
-| State Hub (state-hub) | Local Workstation<br>type: local-process; host: local-workstation; ports: 8000 | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-06-05: State Hub accepted inbox, task, and progress API calls. | postgresql:state-hub | http: observed_ok (http://127.0.0.1:8000) | Future cluster deployment readiness still needs ops evidence. |
+| State Hub (state-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: state-hub | state-hub<br>the-custodian | http://127.0.0.1:8000/state/health<br>Expected: status 200, health response | observed_ok<br>2026-07-03: Cluster hub healthy; railiance01 reaches via fleet forward tunnel. | postgresql:state-hub-db | http: observed_ok (workstation tunnel state-hub-primary → cluster)<br>tunnel: observed_ok (railiance01 systemd fleet-state-hub-coulombcore → cluster) | Primary home must move to railiance01 per CUST-WP-0054-T05. |
+| issue-core (issue-core) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: issue-core | issue-core | http://127.0.0.1:8765/healthz<br>Expected: status 200, version response | observed_ok<br>2026-07-02: REST emission live via cross-machine fleet path. | postgresql:issue-core | tunnel: observed_ok (railiance01 fleet-issue-core-coulombcore → cluster) | Target railiance01 overlay per CUST-WP-0054 drain Wave 4. |
+| Core Hub (core-hub) | CoulombCore<br>type: k3s; cluster: coulombcore-k3s; namespace: core-hub-staging | core-hub | https://hub.coulomb.social/api/v2/hubs<br>Expected: status 200, hub list when authenticated | observed_ok<br>2026-07-02: Staging deployed; production cutover gated on CORE-WP-0005-T04. | postgresql:core-hub | k8s: observed_ok (coulombcore-k3s/core-hub-staging) | Production cutover to railiance01 pending operator approval. |
+| Fleet Mesh (railiance01) (fleet-mesh-railiance01) | Railiance01<br>type: systemd; host: railiance01 | the-custodian<br>ops-bridge | http://127.0.0.1:18000/state/health<br>Expected: status 200 | observed_ok<br>2026-07-03: Workstation reverse tunnels stopped; systemd forwards healthy. | - | ssh-tunnel: observed_ok (railiance01 → coulombcore ClusterIPs) | Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN available. |
 | Inter-Hub (inter-hub) | ThreePhoenix Production<br>type: external; public_endpoint: https://hub.coulomb.social | inter-hub | https://hub.coulomb.social/api/v2/openapi.json<br>Expected: status 200, OpenAPI document | unknown<br>2026-05-16: /home/worsch/helix-forge/wiki/OpsHubInventory.md | - | https: unknown (https://hub.coulomb.social) | ops-hub bootstrap requires authenticated UI flow or deployment-side migration. |
 | activity-core (activity-core) | Railiance01<br>type: k3s; cluster: railiance01-k3s; namespace: activity-core | activity-core<br>the-custodian | activity-core API health endpoint<br>Expected: status 200, healthy DB and Temporal status | observed_ok<br>2026-05-23: API health, worker rollout, Temporal CLI schedule listing, and State Hub bridge were verified. | postgresql:activity-core<br>temporal:activity-core<br>nats:railiance01 | k8s: observed_ok (railiance01-k3s/activity-core) | Add explicit ops inventory probes and evidence events. |
-| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | unknown<br>2026-05-16: Bridge is useful for connected-server visibility but is not itself the service catalog. | - | ssh-tunnel: unknown (connected remote servers) | Emit reachability evidence into ops-hub instead of relying on bridge state as inventory. |
+| Ops Bridge (ops-bridge) | Local Workstation<br>type: bridge; host: local-workstation | ops-bridge | - | observed_ok<br>2026-07-03: state-hub-railiance01 and issue-core-railiance01 stopped; not production-critical. | - | ssh-tunnel: observed_ok (interactive dev tunnels only (k3s-api, state-hub-primary)) | Install ops-bridge on railiance01 or keep systemd fleet-mesh units. |
 | Haskell Build Agent (haskell-build-agent) | Local Workstation<br>type: systemd; host: haskell-build-vm | the-custodian | http://127.0.0.1:18000<br>Expected: VM can reach State Hub through SSH forward | unknown<br>undated: Build agent is a systemd service and registers with State Hub on boot. | - | ssh: unknown (local workstation reverse tunnel port 12222) | Current tunnel and capability registration need live evidence in ops-hub. |

 ## Open Operating Gaps
@@ -50,7 +53,21 @@ operator can answer what is running where before the full standalone

 ### State Hub (`state-hub`)

-- Future cluster deployment readiness still needs ops evidence.
+- Primary home must move to railiance01 per CUST-WP-0054-T05.
+- Consistency sweep writebacks still target workstation paths.
+
+### issue-core (`issue-core`)
+
+- Target railiance01 overlay per CUST-WP-0054 drain Wave 4.
+
+### Core Hub (`core-hub`)
+
+- Production cutover to railiance01 pending operator approval.
+
+### Fleet Mesh (railiance01) (`fleet-mesh-railiance01`)
+
+- Migrate to atm-fleet-mesh cert_command when VAULT_TOKEN available.
+- Retire when State Hub and issue-core move to railiance01.

 ### Inter-Hub (`inter-hub`)

@@ -62,7 +79,7 @@ operator can answer what is running where before the full standalone

 ### Ops Bridge (`ops-bridge`)

-- Emit reachability evidence into ops-hub instead of relying on bridge state as inventory.
+- Install ops-bridge on railiance01 or keep systemd fleet-mesh units.

 ### Haskell Build Agent (`haskell-build-agent`)

298
docs/workstation-independence-fleet-architecture.md
Normal file
@@ -0,0 +1,298 @@
# Workstation Independence and Fleet Role Architecture

Date: 2026-07-03
Status: draft (canon-adjacent; promote to `canon/architecture/` after review)
Workplan: `CUST-WP-0054` T01
Related: `ADR-001`, `ADR-004`, `RAIL-BS-WP-0006`, `RAIL-HO-WP-0005`, `CUST-WP-0011`

## Purpose

Pin down the three-machine role model, the fleet mesh topology, the promotion gate
for "production", and the phoenix path `coulombcore → railiance02`. Provide a
dependency register so every workload, tunnel, repo remote, sink path, and
build pipeline has a **current host**, **target host**, and **migration owner**.

The acceptance proof for the whole plan is `CUST-WP-0054-T10`: production runs
24h+ with the workstation fully offline.

## Machine Roles

| Machine | IP / identity | Current role (2026-07-03) | Target role |
| --- | --- | --- | --- |
| **railiance01** | `92.205.62.239` | First ThreePhoenix foundation node; hosts activity-core production, partial State Hub cluster footprint, automation schedules | **Production home** — first node of the growing Railiance fleet; hosts State Hub primary, forge, CI runners, and the automation loop |
| **coulombcore** | `92.205.130.254` | De-facto production host: State Hub cluster primary, Core Hub (`hub.coulomb.social`), issue-core, OpenBao, identity stack, ESO/ArgoCD, Gitea/registry | **Frozen legacy** — no new production; drain workload-by-workload; eventually wiped and **reborn as railiance02** |
| **workstation** | `bnt-lap001` / WSL2 | Production network hub (all 16 ops-bridge tunnels), State Hub client endpoint (`127.0.0.1:8000`), consistency-sweep writebacks, image build/publish, dev checkouts for 74 registered repos | **Temporary dev environment** — clone repos, run `make dev-hub`, push when connected; nothing in the production loop may depend on it being on |

### Role invariants

1. Production workloads authenticate, schedule, emit, and reconcile without the
   workstation.
2. `coulombcore` is frozen for new production immediately (policy; see T03).
3. A workload counts as "production on railiance01" only after passing the
   staged-promotion gate (see below).
4. Files remain authoritative per ADR-001; fleet databases are disposable caches.

## Fleet Mesh Topology

### Current topology (workstation as hub)

All ops-bridge tunnels originate on the workstation. Two production data paths
**chain through** it:

```
railiance01                                        workstation                             coulombcore
───────────                                        ───────────                             ───────────
activity-core ──(state-hub-railiance01 reverse)──► :18000 ──(state-hub-primary forward)──► State Hub cluster
activity-core ──(issue-core-railiance01 reverse)──► :local ──(issue-core-coulombcore forward)──► issue-core
```

Live tunnel inventory (2026-07-03, `bridge status`):

| Tunnel | Direction | Actor | Production-critical? |
| --- | --- | --- | --- |
| `state-hub-primary` | workstation → coulombcore cluster | `agt-claude-coulombcore` | **yes**: MCP/agents reach cluster hub via `127.0.0.1:8000` |
| `state-hub-cluster-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | dev/ops access |
| `state-hub-railiance01` | railiance01 → workstation (reverse) | `agt-claude-railiance01` | **yes**: activity-core reaches hub |
| `state-hub-mcp-railiance01` | railiance01 → workstation (reverse) | `agt-claude-railiance01` | dev MCP |
| `issue-core-railiance01` | railiance01 → workstation (reverse) | `agt-claude-railiance01` | **yes**: emission lane |
| `issue-core-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | **yes**: completes emission chain |
| `state-hub-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | legacy/dev |
| `state-hub-mcp-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | dev MCP |
| `k3s-api-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | operator dev |
| `k3s-api-haskelseed` | workstation → haskelseed | `agt-claude-haskelseed` | experimental |
| `flex-auth-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | identity dev |
| `core-hub-staging-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | staging |
| `inter-hub-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | legacy Inter-Hub |
| `state-hub-haskelseed` | haskelseed → workstation | `agt-claude-haskelseed` | experimental |
| `state-hub-mcp-haskelseed` | haskelseed → workstation | `agt-claude-haskelseed` | experimental |
| `nix-daemon-haskelseed` | haskelseed → workstation | `agt-claude-haskelseed` | build dev |

A workstation reboot breaks daily triage evidence, consistency sweeps, and
issue emission until tunnels recover.

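The exposure described above can be made concrete with a small sketch that filters the inventory for production-critical lanes with the workstation at either end. The `Tunnel` type and `workstation_dependent` helper are hypothetical names introduced for illustration; the rows are transcribed from the table (the four production-critical lanes plus two dev lanes for contrast), not read from `bridge status`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    name: str
    source: str
    target: str
    production_critical: bool


# Subset of the inventory table above; not a live query.
TUNNELS = [
    Tunnel("state-hub-primary", "workstation", "coulombcore", True),
    Tunnel("state-hub-railiance01", "railiance01", "workstation", True),
    Tunnel("issue-core-railiance01", "railiance01", "workstation", True),
    Tunnel("issue-core-coulombcore", "workstation", "coulombcore", True),
    Tunnel("state-hub-mcp-railiance01", "railiance01", "workstation", False),
    Tunnel("k3s-api-coulombcore", "workstation", "coulombcore", False),
]


def workstation_dependent(tunnels):
    """Names of production-critical tunnels that start or end on the workstation."""
    return [
        t.name
        for t in tunnels
        if t.production_critical and "workstation" in (t.source, t.target)
    ]


if __name__ == "__main__":
    for name in workstation_dependent(TUNNELS):
        print(name)
```

In the target topology this list should be empty: every production-critical lane would run between fleet machines only.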
### Target topology (fleet-owned mesh)

```
railiance01 ◄────────────────────────────────────► coulombcore (draining)
  │  direct atm- tunnels (ops-bridge on-host)        │
  │  State Hub API                                   │ legacy until drain complete
  │  issue-core REST                                 │
  └─ activity-core, Temporal, sweep checkouts        └─ identity, OpenBao (last to move)

workstation (optional client)
  │  interactive-only: k3s API, hub read, dev-hub
  └─ may disconnect without production impact
```

Implementation owner: `CUST-WP-0054-T02`.

Key changes:

- ops-bridge (or systemd ssh units) runs **on railiance01** with `atm-` actor
  certs for cross-machine lanes.
- `actcore-state-hub-bridge` and `actcore-issue-core-bridge` point at
  machine-local tunnel ports, not workstation forwards.
- Workstation tunnels remain for interactive dev only.
- Evaluate WireGuard mesh when persistent unit count exceeds ~5.

This posture extends ADR-004 (connectivity-first) from "workstation connects
everything" to "fleet machines connect each other; workstation is a client."

## Production Promotion Gate

A workload is **production on railiance01** only when it conforms to the
finished staged-promotion contract (`RAIL-BS-WP-0006`):

| Gate | Requirement |
| --- | --- |
| Overlay repo | `railiance/<app>/` with `app.toml` and stage manifests |
| Stage commands | `stage deploy`, `stage observe`, `stage promote`, `stage rollback` proven |
| Evidence | Backup/restore drill, canary observation, operator approval recorded |
| Registry | Image in forge OCI registry with immutable tag |

**Exceptions** must be documented in the placement plan (T03) with explicit
rollback. No exception bypasses backup evidence for stateful workloads.

`coulombcore` workloads still running in production today are **grandfathered
legacy** until their drain task completes, not newly promoted production.

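The gate table and the exception rule can be read as a single predicate. The sketch below is one interpretation, not the staged-promotion tooling: the `Candidate` type, its field names, and the choice to let a documented exception waive a missing gate are all assumptions made for illustration. The one rule taken directly from the text is that a stateful workload without backup evidence never qualifies.

```python
from dataclasses import dataclass

# Gate names mirror the table rows; the identifiers themselves are hypothetical.
GATES = ("overlay_repo", "stage_commands", "evidence", "registry")


@dataclass(frozen=True)
class Candidate:
    name: str
    stateful: bool
    passed: frozenset                    # subset of GATES already proven
    backup_evidence: bool                # backup/restore drill recorded
    documented_exception: bool = False   # recorded in the placement plan with rollback


def is_production_on_railiance01(c: Candidate) -> bool:
    # Hard rule from the text: no exception bypasses backup evidence
    # for stateful workloads.
    if c.stateful and not c.backup_evidence:
        return False
    missing = set(GATES) - c.passed
    # Assumption: a documented exception may waive the remaining gates.
    return not missing or c.documented_exception


if __name__ == "__main__":
    db = Candidate("example-db", True, frozenset(GATES), False, True)
    print(db.name, is_production_on_railiance01(db))
```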
## Phoenix Path: coulombcore → railiance02

Machine-scale phoenix rotation reuses the same automation intended for future
3-node weekly rotations (`RAIL-BS-WP-0007`, `CUST-WP-0038` deferred until
railiance02 exists).

### Preconditions (drain complete)

All production dependencies moved off coulombcore per T03 ordering:

1. Forge + CI (T04): repos and images no longer depend on `gitea.coulomb.social`
2. State Hub primary (T05): cluster DB and sweep checkouts on railiance01
3. Core Hub, issue-core, Inter-Hub legacy: per T03 sequence
4. Identity + OpenBao: **last** (everything authenticates through them)

### Phoenix execution

Owner: `CUST-WP-0054-T09`; automation: `CUST-WP-0054-T08`.

| Phase | Action | Tooling |
| --- | --- | --- |
| S0 | Final inventory sweep, DNS/cert plan for `*.coulomb.social`, data archival | T09 |
| S1 | Wipe and greenfield rebuild | `NET-WP-0020` unseal + bootstrap chain |
| S2 | Join as `railiance02` | `railiance-cluster` overlay, `atm-` certs |
| S3 | Prove join-ready | Phoenix drill on disposable target first (T08) |

Longhorn distributed storage and PG streaming HA unlock once railiance01 +
railiance02 are both fleet nodes.

## Dev Environment (Files-First Beachhead)

Strategy A from the workplan; owner: `CUST-WP-0054-T07`.

```
git clone → make dev-hub → local ephemeral hub (compose)
                           │
                           ├─ C-06 registration rebuilds workplan/task state from files
                           ├─ offline write buffer (STATE-WP-0068) for progress/task events
                           └─ reconnect relay upstream; files reconcile, databases do not replicate
```

MCP config gains explicit `dev` / `fleet` profile switch. The workstation is
genuinely temporary: no fleet DB sync required for orientation.

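The buffer-then-relay behaviour in the diagram can be sketched as an ordered queue that holds events while the upstream hub is unreachable and drains them on reconnect. This is an illustrative model only, not the `STATE-WP-0068` design: the `OfflineWriteBuffer` class, the `send` callable, and its boolean return convention are assumptions.

```python
from collections import deque


class OfflineWriteBuffer:
    """Hold progress/task events locally and relay them upstream in order."""

    def __init__(self, send):
        # `send(event)` is a hypothetical callable that returns True when
        # the upstream hub accepted the event and False when it is unreachable.
        self._send = send
        self._pending = deque()

    def emit(self, event):
        """Queue an event, then try to relay; returns how many were delivered."""
        self._pending.append(event)
        return self.flush()

    def flush(self):
        """Relay queued events oldest-first, stopping at the first failure."""
        delivered = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)


if __name__ == "__main__":
    link = {"up": False}
    buf = OfflineWriteBuffer(lambda event: link["up"])
    buf.emit({"task": "T07", "status": "in_progress"})
    print("pending while offline:", len(buf))
    link["up"] = True
    print("delivered on reconnect:", buf.flush())
```

An event leaves the queue only after the upstream accepts it, so a dropped connection mid-flush loses nothing and preserves ordering.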
## Dependency Register

### Workloads

| Workload | Current host | Target host | Migration owner | Method / notes |
| --- | --- | --- | --- | --- |
| State Hub API (primary) | coulombcore CNPG cluster via workstation tunnel `state-hub-primary` → `127.0.0.1:8000` | railiance01 | `CUST-WP-0054-T05` | `CUST-WP-0011-T07` playbook: freeze → exact-count restore → rewire |
| State Hub API (WSL2 fallback) | workstation WSL2 | retired | `CUST-WP-0011-T08/T09` → absorbed by `CUST-WP-0054-T10` | Stabilization window; not part of target architecture |
| activity-core | railiance01 k3s (`activity-core` ns) | railiance01 (retain) | — | Already on target machine; fix bridges in T02 |
| issue-core | coulombcore k3s | railiance01 | `CUST-WP-0054-T03` drain seq. | `ISSUE-WP-0003` live; emission chain fixed in T02 |
| Core Hub | coulombcore (`hub.coulomb.social`) | railiance01 | `CORE-WP-0005` + `CUST-WP-0054-T03` | Staging on coulombcore; production cutover human-gated |
| Inter-Hub (legacy Haskell) | coulombcore external | retired | `CORE-WP-0007` | Rollback-only after Core Hub cutover |
| Gitea + OCI registry | coulombcore k3s | railiance01 Forgejo | `RAIL-HO-WP-0005` / `CUST-WP-0054-T04` | Read-only mirror on coulombcore until decommission |
| OpenBao | coulombcore | railiance01 | `CUST-WP-0054-T03` (last) | NET-WP-0020 unseal automation |
| Identity stack (KeyCape, Authelia, privacyIDEA, lldap) | coulombcore | railiance01 | `CUST-WP-0054-T03` (last) | Coupled to OpenBao |
| ESO + ArgoCD control plane | coulombcore | railiance01 | `CUST-WP-0054-T03` | GitOps follows forge move |
| CNPG databases (per workload) | coulombcore / railiance01 | railiance01 per workload | `CUST-WP-0054-T03`, `CUST-WP-0054-T05` | CNPG pattern proven; migrate with workload |
| llm-connect | TBD cluster | railiance01 | near-term lanes board | `CCR-2026-0003` credential lane active |
| ops-hub (widget/evidence) | files + Inter-Hub widgets | railiance01 via Core Hub | `CUST-WP-0025`, `CUST-WP-0049` | Not blocking workstation independence |
| Temporal (activity-core) | railiance01 | railiance01 (retain) | — | Co-locate with activity-core |
| NATS (activity-core) | railiance01 | railiance01 (retain) | — | Co-locate with activity-core |

### Network tunnels (production-critical)

| Lane | Current path | Target path | Owner |
| --- | --- | --- | --- |
| activity-core → State Hub | railiance01 reverse → workstation → `state-hub-primary` → coulombcore | railiance01 `atm-` forward → railiance01 State Hub (local or short hop) | `CUST-WP-0054-T02` |
| Agents/MCP → State Hub | workstation `127.0.0.1:8000` → `state-hub-primary` → coulombcore | workstation `127.0.0.1:8000` → tunnel to railiance01 hub (dev client) or fleet endpoint | `CUST-WP-0054-T05` + T07 profiles |
| railiance01 automations → State Hub | `:18000` chain via workstation | railiance01-local bridge port | `CUST-WP-0054-T02` |
| activity-core → issue-core | railiance01 reverse → workstation → `issue-core-coulombcore` | railiance01 `atm-` forward → issue-core (on railiance01 post-drain) | `CUST-WP-0054-T02`, then T03 |
| Operator k3s access | workstation forwards (`k3s-api-*`) | workstation interactive (non-critical) | — |

### Repo remotes

All checked 2026-07-03; pattern is uniform:

| Repo (sample) | Current remote | Target remote | Owner |
| --- | --- | --- | --- |
| the-custodian | `gitea.coulomb.social/coulomb/the-custodian.git` | `forgejo.coulomb.social/coulomb/the-custodian.git` | `CUST-WP-0054-T04` |
| state-hub | `gitea.coulomb.social/coulomb/state-hub.git` | `forgejo.coulomb.social/coulomb/state-hub.git` | `CUST-WP-0054-T04` |
| activity-core | `gitea.coulomb.social/coulomb/activity-core.git` | `forgejo.coulomb.social/coulomb/activity-core.git` | `CUST-WP-0054-T04` |
| issue-core | `gitea.coulomb.social/coulomb/issue-core.git` | `forgejo.coulomb.social/coulomb/issue-core.git` | `CUST-WP-0054-T04` |
| ops-bridge | `gitea.coulomb.social/coulomb/ops-bridge.git` | `forgejo.coulomb.social/coulomb/ops-bridge.git` | `CUST-WP-0054-T04` |
| ops-warden | `gitea.coulomb.social/coulomb/ops-warden.git` | `forgejo.coulomb.social/coulomb/ops-warden.git` | `CUST-WP-0054-T04` |
| core-hub | `gitea.coulomb.social/coulomb/core-hub.git` | `forgejo.coulomb.social/coulomb/core-hub.git` | `CUST-WP-0054-T04` |
| *(all 74 registered repos)* | `gitea.coulomb.social/coulomb/<slug>.git` | `forgejo.coulomb.social/coulomb/<slug>.git` | `CUST-WP-0054-T04` |

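Because the remote pattern is uniform, the rewrite for all 74 repos reduces to a host swap. The following is a minimal sketch assuming the target hostname shown in the table; `retarget_remote` is a hypothetical helper, and it handles only the bare `host/path` form used above plus the same form with a scheme prefix. Remotes with embedded credentials or scp-style SSH syntax are returned unchanged.

```python
OLD_HOST = "gitea.coulomb.social"
# Taken from the table above; treat as a placeholder until the
# production Forgejo hostname is confirmed.
NEW_HOST = "forgejo.coulomb.social"


def retarget_remote(remote: str) -> str:
    """Swap the forge host in a remote, leaving scheme and path untouched."""
    scheme, sep, rest = remote.rpartition("://")
    host, slash, path = rest.partition("/")
    if host != OLD_HOST:
        return remote  # not on the old forge, or a form this sketch ignores
    return f"{scheme}{sep}{NEW_HOST}{slash}{path}"


if __name__ == "__main__":
    for slug in ("the-custodian", "state-hub", "ops-bridge"):
        print(retarget_remote(f"{OLD_HOST}/coulomb/{slug}.git"))
```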
### State Hub repo checkout paths

| Concern | Current | Target | Owner |
| --- | --- | --- | --- |
| `local_path` for 74 repos | `/home/worsch/<repo>` on workstation | railiance01 clone tree (e.g. `/home/tegwick/<repo>` or gitops-managed path) | `CUST-WP-0054-T05` |
| Consistency sweep writeback host | workstation (`consistency_check.py --remote` via API) | railiance01 checkouts from forge | `CUST-WP-0054-T05`, `STATE-WP-0064` |
| COULOMBCORE `host_paths` | `/home/tegwick/<repo>` (11 repos, `CUST-WP-0021`) | retired with coulombcore drain | `CUST-WP-0054-T09` |
| Multi-host path resolution | `host_paths` map per hostname | fleet-primary host only + dev-hub local | `CUST-WP-0054-T07` |

### Sink and prompt paths

| Sink / path | Current | Target | Owner |
| --- | --- | --- | --- |
| Daily triage working-memory | `/home/worsch/the-custodian/memory/working` (ActivityDefinition + PVC mount) | repo-relative or PVC-native path + sweep sync-to-repo | `CUST-WP-0054-T06` |
| Daily triage State Hub progress | cluster hub via workstation tunnel | railiance01 hub direct | `CUST-WP-0054-T02`, `T05` |
| Consistency sweep progress event | via workstation-hosted sweep | railiance01-hosted sweep | `CUST-WP-0054-T05`, `STATE-WP-0064` |
| Agent session traces (`runtime/agent.py`) | `memory/working/agent-session-*.md` on workstation | dev-hub local buffer; commit on reconnect | `CUST-WP-0054-T07` |
| `output_schema` in ActivityDefinitions | absolute paths under `/home/worsch/the-custodian/` | repo-relative resolution in activity-core | `CUST-WP-0054-T06` |

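The move from absolute workstation paths to repo-relative resolution can be sketched with `pathlib`. This is not the activity-core implementation: `to_repo_relative`, `resolve`, and the `/srv/checkouts/the-custodian` root used in the demo are hypothetical. Only the legacy prefix comes from the table.

```python
from pathlib import PurePosixPath

# Legacy prefix named in the table above.
LEGACY_ROOT = PurePosixPath("/home/worsch/the-custodian")


def to_repo_relative(path: str) -> PurePosixPath:
    """Strip the workstation prefix; relative paths pass through unchanged.

    Raises ValueError for absolute paths outside the legacy root, so a
    stray host-specific path fails loudly instead of being silently kept.
    """
    p = PurePosixPath(path)
    if not p.is_absolute():
        return p
    return p.relative_to(LEGACY_ROOT)


def resolve(path: str, repo_root: str) -> PurePosixPath:
    """Resolve a stored sink path against whatever checkout is local."""
    return PurePosixPath(repo_root) / to_repo_relative(path)


if __name__ == "__main__":
    stored = "/home/worsch/the-custodian/memory/working"
    # Hypothetical checkout location on a fleet machine.
    print(resolve(stored, "/srv/checkouts/the-custodian"))
```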
### Build and publish pipelines

| Image / artifact | Current build host | Current registry | Target build | Target registry | Owner |
| --- | --- | --- | --- | --- | --- |
| state-hub | workstation `docker build` | `gitea.coulomb.social/coulomb/state-hub` | Forgejo Actions runner on railiance01 | railiance01 forge OCI | `CUST-WP-0054-T04` |
| core-hub | workstation / railiance-forge docs | `gitea.coulomb.social/coulomb/core-hub` | CI runner | railiance01 forge OCI | `CUST-WP-0054-T04` |
| activity-core | workstation manual rebuild + scp | railiance01 k3s import / Gitea | CI on tag push | railiance01 forge OCI | `CUST-WP-0054-T04` |
| issue-core | workstation / manual | `gitea.coulomb.social/coulomb/issue-core` | CI runner | railiance01 forge OCI | `CUST-WP-0054-T04` |
| Haskell build agent | workstation VM (`haskell-build-vm`) | n/a | retired (`CORE-WP-0007`) | n/a | `CORE-WP-0007` |

Done criterion for T01: every row above has a target and migration owner. ✓

## Drain Sequence

Detailed plan: `docs/coulombcore-drain-placement-plan.md`
Freeze policy: `canon/standards/coulombcore-production-freeze_v0.1.md`

```
Wave 1  Forge + CI (T04)
Wave 2  State Hub primary (T05)
Wave 3  Core Hub (CORE-WP-0005)
Wave 4  issue-core
Wave 5  ESO / ArgoCD
Wave 6  Supporting apps
Wave 7  OpenBao + identity (LAST)
Wave 8  coulombcore phoenix → railiance02 (T09)
```

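A drain log can be checked against the wave order with a few lines of code. The sketch below assumes the waves are strictly sequential, which is stricter than the text states; the workload slugs in `WAVE_OF` and the `order_violations` helper are hypothetical names introduced for illustration.

```python
# Wave numbers from the sequence above; slugs are illustrative only.
WAVE_OF = {
    "forge-ci": 1,
    "state-hub": 2,
    "core-hub": 3,
    "issue-core": 4,
    "eso-argocd": 5,
    "supporting-apps": 6,
    "openbao-identity": 7,
    "phoenix": 8,
}


def order_violations(drained_in_order):
    """Return (earlier, later) pairs where a later wave was drained first."""
    violations = []
    max_wave, max_name = 0, None
    for name in drained_in_order:
        wave = WAVE_OF[name]
        if wave < max_wave:
            # `name` belongs to an earlier wave than something already drained.
            violations.append((max_name, name))
        else:
            max_wave, max_name = wave, name
    return violations


if __name__ == "__main__":
    # Identity moved before State Hub: flagged as out of order.
    print(order_violations(["forge-ci", "openbao-identity", "state-hub"]))
```

The check encodes the coupling rule directly: anything drained after OpenBao and identity, other than the phoenix step itself, shows up as a violation.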
## Sequencing Map

```
T01 (this document) ✓
  ├─ T02 de-hub network ✓
  ├─ T03 placement plan / freeze ✓
  │    ├─ T04 forge + CI
  │    └─ T05 State Hub home on railiance01
  ├─ T06 sink decoupling
  ├─ T07 dev beachhead
  └─ T08 phoenix drill
       └─ T09 coulombcore → railiance02
            └─ T10 workstation-off acceptance
```

## Evidence and Inventory Sources

- Live tunnel state: `bridge status` (2026-07-03)
- State Hub health: `http://127.0.0.1:8000/state/health` (cluster primary via tunnel)
- Registered repos: `GET /repos/` returns 74 repos, all with `local_path` under `/home/worsch/`
- `ops/service-inventory.yml` (2026-06-05; predates cluster cutover, refresh in T03)
- `docs/infrastructure-stabilization-pickup-checkpoint.md` (2026-07-03 metaplan closeout)
- Activity definitions: `activity-definitions/daily-statehub-wsjf-triage.md`,
  `activity-definitions/state-hub-consistency-sweep.md`

## Open Gaps (not T01 blockers)

| Gap | Follow-on |
| --- | --- |
| Forgejo production hostname / SMTP / exposure decisions | `RAIL-HO-WP-0005-T02` (human) |
| `ops/service-inventory.yml` stale environment labels | Refresh during T03 |
| Core Hub widget-type registry prerequisite | `CORE-WP-0005-T04` |
| HA Postgres / Longhorn across 2+ nodes | `RAIL-BS-WP-0007`, `CUST-WP-0038` after railiance02 |

## Promotion to Canon

After operator review:

1. Move to `canon/architecture/adr-006-workstation-independence-fleet-roles.md`
   (or equivalent ADR number).
2. Update `ops/service-inventory.yml` environment and service rows to match.
3. Link from `SCOPE.md` and `.custodian-brief.md` generation inputs.