# Workstation Independence and Fleet Role Architecture

Date: 2026-07-03
Status: draft (canon-adjacent; promote to `canon/architecture/` after review)
Workplan: `CUST-WP-0054` T01
Related: `ADR-001`, `ADR-004`, `RAIL-BS-WP-0006`, `RAIL-HO-WP-0005`, `CUST-WP-0011`

## Purpose

Fix the three-machine role model, the fleet mesh topology, the promotion gate for "production", and the phoenix path `coulombcore → railiance02`. Provide a dependency register so every workload, tunnel, repo remote, sink path, and build pipeline has a **current host**, **target host**, and **migration owner**.

The acceptance proof for the whole plan is `CUST-WP-0054-T10`: production runs 24h+ with the workstation fully offline.

## Machine Roles

| Machine | IP / identity | Current role (2026-07-03) | Target role |
| --- | --- | --- | --- |
| **railiance01** | `92.205.62.239` | First ThreePhoenix foundation node; hosts activity-core production, partial State Hub cluster footprint, automation schedules | **Production home**: first node of the growing Railiance fleet; hosts State Hub primary, forge, CI runners, and the automation loop |
| **coulombcore** | `92.205.130.254` | De-facto production host: State Hub cluster primary, Core Hub (`hub.coulomb.social`), issue-core, OpenBao, identity stack, ESO/ArgoCD, Gitea/registry | **Frozen legacy**: no new production; drain workload-by-workload; eventually wiped and **reborn as railiance02** |
| **workstation** | `bnt-lap001` / WSL2 | Production network hub (all 16 ops-bridge tunnels), State Hub client endpoint (`127.0.0.1:8000`), consistency-sweep writebacks, image build/publish, dev checkouts for 74 registered repos | **Temporary dev environment**: clone repos, run `make dev-hub`, push when connected; nothing in the production loop may depend on it being on |

### Role invariants

1. Production workloads authenticate, schedule, emit, and reconcile without the workstation.
2. `coulombcore` is frozen for new production immediately (policy; see T03).
3. A workload counts as "production on railiance01" only after passing the staged-promotion gate (see below).
4. Files remain authoritative per ADR-001; fleet databases are disposable caches.

## Fleet Mesh Topology

### Current topology (workstation as hub)

All ops-bridge tunnels originate on the workstation. Two production data paths **chain through** it:

```
railiance01                                          workstation                                   coulombcore
───────────                                          ───────────                                   ───────────
activity-core ──(state-hub-railiance01 reverse)──►  :18000 ──(state-hub-primary forward)──►       State Hub cluster
activity-core ──(issue-core-railiance01 reverse)──► :local ──(issue-core-coulombcore forward)──►  issue-core
```

Live tunnel inventory (2026-07-03, `bridge status`):

| Tunnel | Direction | Actor | Production-critical? |
| --- | --- | --- | --- |
| `state-hub-primary` | workstation → coulombcore cluster | `agt-claude-coulombcore` | **yes**: MCP/agents reach cluster hub via `127.0.0.1:8000` |
| `state-hub-cluster-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | dev/ops access |
| `state-hub-railiance01` | railiance01 → workstation (reverse) | `agt-claude-railiance01` | **yes**: activity-core reaches hub |
| `state-hub-mcp-railiance01` | railiance01 → workstation (reverse) | `agt-claude-railiance01` | dev MCP |
| `issue-core-railiance01` | railiance01 → workstation (reverse) | `agt-claude-railiance01` | **yes**: emission lane |
| `issue-core-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | **yes**: completes emission chain |
| `state-hub-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | legacy/dev |
| `state-hub-mcp-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | dev MCP |
| `k3s-api-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | operator dev |
| `k3s-api-haskelseed` | workstation → haskelseed | `agt-claude-haskelseed` | experimental |
| `flex-auth-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | identity dev |
| `core-hub-staging-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | staging |
| `inter-hub-coulombcore` | workstation → coulombcore | `agt-claude-coulombcore` | legacy Inter-Hub |
| `state-hub-haskelseed` | haskelseed → workstation | `agt-claude-haskelseed` | experimental |
| `state-hub-mcp-haskelseed` | haskelseed → workstation | `agt-claude-haskelseed` | experimental |
| `nix-daemon-haskelseed` | haskelseed → workstation | `agt-claude-haskelseed` | build dev |

A workstation reboot breaks daily triage evidence, consistency sweeps, and issue emission until tunnels recover.

### Target topology (fleet-owned mesh)

```
railiance01 ◄────────────────────────────────────► coulombcore (draining)
    │  direct atm- tunnels (ops-bridge on-host)        │
    │  State Hub API                                   │  legacy until drain complete
    │  issue-core REST                                 │
    └─ activity-core, Temporal, sweep checkouts        └─ identity, OpenBao (last to move)

workstation (optional client)
    │  interactive-only: k3s API, hub read, dev-hub
    └─ may disconnect without production impact
```

Implementation owner: `CUST-WP-0054-T02`. Key changes:

- ops-bridge (or systemd ssh units) runs **on railiance01** with `atm-` actor certs for cross-machine lanes.
- `actcore-state-hub-bridge` and `actcore-issue-core-bridge` point at machine-local tunnel ports, not workstation forwards.
- Workstation tunnels remain for interactive dev only.
- Evaluate WireGuard mesh when persistent unit count exceeds ~5.

This posture extends ADR-004 (connectivity-first) from "workstation connects everything" to "fleet machines connect each other; workstation is a client."
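
The first role invariant can be checked mechanically against the tunnel inventory. The following is a minimal sketch, assuming a hand-copied subset of the `bridge status` table; the `Tunnel` dataclass and the `workstation_dependent` helper are hypothetical names for illustration and are not part of ops-bridge.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Tunnel:
    """One row of the tunnel inventory (hypothetical model, not an ops-bridge type)."""
    name: str
    source: str
    target: str
    production_critical: bool


# Subset of the 2026-07-03 inventory, copied by hand from the table above.
TUNNELS = [
    Tunnel("state-hub-primary", "workstation", "coulombcore", True),
    Tunnel("state-hub-railiance01", "railiance01", "workstation", True),
    Tunnel("issue-core-railiance01", "railiance01", "workstation", True),
    Tunnel("issue-core-coulombcore", "workstation", "coulombcore", True),
    Tunnel("k3s-api-coulombcore", "workstation", "coulombcore", False),
    Tunnel("nix-daemon-haskelseed", "haskelseed", "workstation", False),
]


def workstation_dependent(tunnels: Iterable[Tunnel]) -> List[str]:
    """Names of production-critical tunnels with the workstation as an endpoint."""
    return [
        t.name
        for t in tunnels
        if t.production_critical and "workstation" in (t.source, t.target)
    ]


if __name__ == "__main__":
    # In the target topology this list should be empty.
    for name in workstation_dependent(TUNNELS):
        print(f"still depends on workstation: {name}")
```

Against the current inventory the helper reports the four lanes marked **yes** in the table; an empty result would be one piece of evidence toward the `CUST-WP-0054-T10` acceptance proof, not a substitute for the 24h offline run.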
## Production Promotion Gate

A workload is **production on railiance01** only when it conforms to the finished staged-promotion contract (`RAIL-BS-WP-0006`):

| Gate | Requirement |
| --- | --- |
| Overlay repo | `railiance//` with `app.toml` and stage manifests |
| Stage commands | `stage deploy`, `stage observe`, `stage promote`, `stage rollback` proven |
| Evidence | Backup/restore drill, canary observation, operator approval recorded |
| Registry | Image in forge OCI registry with immutable tag |

**Exceptions** must be documented in the placement plan (T03) with explicit rollback. No exception bypasses backup evidence for stateful workloads.

`coulombcore` workloads still running in production today are **grandfathered legacy** until their drain task completes, not newly promoted production.

## Phoenix Path: coulombcore → railiance02

Machine-scale phoenix rotation reuses the same automation intended for future 3-node weekly rotations (`RAIL-BS-WP-0007`, `CUST-WP-0038` deferred until railiance02 exists).

### Preconditions (drain complete)

All production dependencies moved off coulombcore per T03 ordering:

1. Forge + CI (T04): repos and images no longer depend on `gitea.coulomb.social`
2. State Hub primary (T05): cluster DB and sweep checkouts on railiance01
3. Core Hub, issue-core, Inter-Hub legacy: per T03 sequence
4. Identity + OpenBao: **last** (everything authenticates through them)

### Phoenix execution

Owner: `CUST-WP-0054-T09`, automation: `CUST-WP-0054-T08`.

| Phase | Action | Tooling |
| --- | --- | --- |
| S0 | Final inventory sweep, DNS/cert plan for `*.coulomb.social`, data archival | T09 |
| S1 | Wipe and greenfield rebuild | `NET-WP-0020` unseal + bootstrap chain |
| S2 | Join as `railiance02` | `railiance-cluster` overlay, `atm-` certs |
| S3 | Prove join-ready | Phoenix drill on disposable target first (T08) |

Longhorn distributed storage and PG streaming HA unlock once railiance01 + railiance02 are both fleet nodes.
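
The drain preconditions for the phoenix path can be expressed as an ordered checklist. The sketch below assumes the four numbered preconditions form a strict order (the source only states that identity + OpenBao go last and defers the rest to T03); the group labels and the functions `next_drain_group` and `phoenix_may_start` are hypothetical and do not correspond to existing tooling.

```python
from typing import Optional, Set

# Drain groups in the order listed under "Preconditions (drain complete)".
# The labels are illustrative; they are not identifiers from the workplan.
DRAIN_ORDER = [
    "forge-ci",                       # T04
    "state-hub-primary",              # T05
    "core-hub-issue-core-inter-hub",  # per T03 sequence
    "identity-openbao",               # last: everything authenticates through them
]


def next_drain_group(drained: Set[str]) -> Optional[str]:
    """Return the next group allowed to drain, or None when all are drained.

    Raises ValueError for unknown groups or if a later group was drained
    before an earlier one (assumed strict ordering).
    """
    unknown = drained - set(DRAIN_ORDER)
    if unknown:
        raise ValueError(f"unknown drain groups: {sorted(unknown)}")
    for index, group in enumerate(DRAIN_ORDER):
        if group not in drained:
            skipped_ahead = [g for g in DRAIN_ORDER[index + 1:] if g in drained]
            if skipped_ahead:
                raise ValueError(
                    f"{skipped_ahead} drained before {group!r}; order violated"
                )
            return group
    return None


def phoenix_may_start(drained: Set[str]) -> bool:
    """Phase S0 may begin only when every drain group is complete."""
    return next_drain_group(drained) is None


if __name__ == "__main__":
    print(next_drain_group({"forge-ci"}))
    print(phoenix_may_start(set(DRAIN_ORDER)))
```

If T03 allows some groups to drain in parallel, the strict-order check would need to be relaxed to a "last group only" rule for identity + OpenBao.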
## Dev Environment (Files-First Beachhead)

Strategy A from the workplan; owner: `CUST-WP-0054-T07`.

```
git clone → make dev-hub → local ephemeral hub (compose)
                │
                ├─ C-06 registration rebuilds workplan/task state from files
                ├─ offline write buffer (STATE-WP-0068) for progress/task events
                └─ reconnect relay upstream; files reconcile, databases do not replicate
```

MCP config gains explicit `dev` / `fleet` profile switch. The workstation is genuinely temporary: no fleet DB sync required for orientation.

## Dependency Register

### Workloads

| Workload | Current host | Target host | Migration owner | Method / notes |
| --- | --- | --- | --- | --- |
| State Hub API (primary) | coulombcore CNPG cluster via workstation tunnel `state-hub-primary` → `127.0.0.1:8000` | railiance01 | `CUST-WP-0054-T05` | `CUST-WP-0011-T07` playbook: freeze → exact-count restore → rewire |
| State Hub API (WSL2 fallback) | workstation WSL2 | retired | `CUST-WP-0011-T08/T09` → absorbed by `CUST-WP-0054-T10` | Stabilization window; not part of target architecture |
| activity-core | railiance01 k3s (`activity-core` ns) | railiance01 (retain) | n/a | Already on target machine; fix bridges in T02 |
| issue-core | coulombcore k3s | railiance01 | `CUST-WP-0054-T03` drain seq. | `ISSUE-WP-0003` live; emission chain fixed in T02 |
| Core Hub | coulombcore (`hub.coulomb.social`) | railiance01 | `CORE-WP-0005` + `CUST-WP-0054-T03` | Staging on coulombcore; production cutover human-gated |
| Inter-Hub (legacy Haskell) | coulombcore external | retired | `CORE-WP-0007` | Rollback-only after Core Hub cutover |
| Gitea + OCI registry | coulombcore k3s | railiance01 Forgejo | `RAIL-HO-WP-0005` / `CUST-WP-0054-T04` | Read-only mirror on coulombcore until decommission |
| OpenBao | coulombcore | railiance01 | `CUST-WP-0054-T03` (last) | NET-WP-0020 unseal automation |
| Identity stack (KeyCape, Authelia, privacyIDEA, lldap) | coulombcore | railiance01 | `CUST-WP-0054-T03` (last) | Coupled to OpenBao |
| ESO + ArgoCD control plane | coulombcore | railiance01 | `CUST-WP-0054-T03` | GitOps follows forge move |
| CNPG databases (per workload) | coulombcore / railiance01 | railiance01 per workload | `CUST-WP-0054-T03`, `CUST-WP-0054-T05` | CNPG pattern proven; migrate with workload |
| llm-connect | TBD cluster | railiance01 | near-term lanes board | `CCR-2026-0003` credential lane active |
| ops-hub (widget/evidence) | files + Inter-Hub widgets | railiance01 via Core Hub | `CUST-WP-0025`, `CUST-WP-0049` | Not blocking workstation independence |
| Temporal (activity-core) | railiance01 | railiance01 (retain) | n/a | Co-locate with activity-core |
| NATS (activity-core) | railiance01 | railiance01 (retain) | n/a | Co-locate with activity-core |

### Network tunnels (production-critical)

| Lane | Current path | Target path | Owner |
| --- | --- | --- | --- |
| activity-core → State Hub | railiance01 reverse → workstation → `state-hub-primary` → coulombcore | railiance01 `atm-` forward → railiance01 State Hub (local or short hop) | `CUST-WP-0054-T02` |
| Agents/MCP → State Hub | workstation `127.0.0.1:8000` → `state-hub-primary` → coulombcore | workstation `127.0.0.1:8000` → tunnel to railiance01 hub (dev client) or fleet endpoint | `CUST-WP-0054-T05` + T07 profiles |
| railiance01 automations → State Hub | `:18000` chain via workstation | railiance01-local bridge port | `CUST-WP-0054-T02` |
| activity-core → issue-core | railiance01 reverse → workstation → `issue-core-coulombcore` | railiance01 `atm-` forward → issue-core (on railiance01 post-drain) | `CUST-WP-0054-T02`, then T03 |
| Operator k3s access | workstation forwards (`k3s-api-*`) | workstation interactive (non-critical) | n/a |

### Repo remotes

All checked 2026-07-03; pattern is uniform:

| Repo (sample) | Current remote | Target remote | Owner |
| --- | --- | --- | --- |
| the-custodian | `gitea.coulomb.social/coulomb/the-custodian.git` | `forgejo.coulomb.social/coulomb/the-custodian.git` | `CUST-WP-0054-T04` |
| state-hub | `gitea.coulomb.social/coulomb/state-hub.git` | `forgejo.coulomb.social/coulomb/state-hub.git` | `CUST-WP-0054-T04` |
| activity-core | `gitea.coulomb.social/coulomb/activity-core.git` | `forgejo.coulomb.social/coulomb/activity-core.git` | `CUST-WP-0054-T04` |
| issue-core | `gitea.coulomb.social/coulomb/issue-core.git` | `forgejo.coulomb.social/coulomb/issue-core.git` | `CUST-WP-0054-T04` |
| ops-bridge | `gitea.coulomb.social/coulomb/ops-bridge.git` | `forgejo.coulomb.social/coulomb/ops-bridge.git` | `CUST-WP-0054-T04` |
| ops-warden | `gitea.coulomb.social/coulomb/ops-warden.git` | `forgejo.coulomb.social/coulomb/ops-warden.git` | `CUST-WP-0054-T04` |
| core-hub | `gitea.coulomb.social/coulomb/core-hub.git` | `forgejo.coulomb.social/coulomb/core-hub.git` | `CUST-WP-0054-T04` |
| *(all 74 registered repos)* | `gitea.coulomb.social/coulomb/.git` | `forgejo.coulomb.social/coulomb/.git` | `CUST-WP-0054-T04` |

### State Hub repo checkout paths

| Concern | Current | Target | Owner |
| --- | --- | --- | --- |
| `local_path` for 74 repos | `/home/worsch/` on workstation | railiance01 clone tree (e.g. `/home/tegwick/` or gitops-managed path) | `CUST-WP-0054-T05` |
| Consistency sweep writeback host | workstation (`consistency_check.py --remote` via API) | railiance01 checkouts from forge | `CUST-WP-0054-T05`, `STATE-WP-0064` |
| COULOMBCORE `host_paths` | `/home/tegwick/` (11 repos, `CUST-WP-0021`) | retired with coulombcore drain | `CUST-WP-0054-T09` |
| Multi-host path resolution | `host_paths` map per hostname | fleet-primary host only + dev-hub local | `CUST-WP-0054-T07` |

### Sink and prompt paths

| Sink / path | Current | Target | Owner |
| --- | --- | --- | --- |
| Daily triage working-memory | `/home/worsch/the-custodian/memory/working` (ActivityDefinition + PVC mount) | repo-relative or PVC-native path + sweep sync-to-repo | `CUST-WP-0054-T06` |
| Daily triage State Hub progress | cluster hub via workstation tunnel | railiance01 hub direct | `CUST-WP-0054-T02`, `T05` |
| Consistency sweep progress event | via workstation-hosted sweep | railiance01-hosted sweep | `CUST-WP-0054-T05`, `STATE-WP-0064` |
| Agent session traces (`runtime/agent.py`) | `memory/working/agent-session-*.md` on workstation | dev-hub local buffer; commit on reconnect | `CUST-WP-0054-T07` |
| `output_schema` in ActivityDefinitions | absolute paths under `/home/worsch/the-custodian/` | repo-relative resolution in activity-core | `CUST-WP-0054-T06` |

### Build and publish pipelines

| Image / artifact | Current build host | Current registry | Target build | Target registry | Owner |
| --- | --- | --- | --- | --- | --- |
| state-hub | workstation `docker build` | `gitea.coulomb.social/coulomb/state-hub` | Forgejo Actions runner on railiance01 | railiance01 forge OCI | `CUST-WP-0054-T04` |
| core-hub | workstation / railiance-forge docs | `gitea.coulomb.social/coulomb/core-hub` | CI runner | railiance01 forge OCI | `CUST-WP-0054-T04` |
| activity-core | workstation manual rebuild + scp | railiance01 k3s import / Gitea | CI on tag push | railiance01 forge OCI | `CUST-WP-0054-T04` |
| issue-core | workstation / manual | `gitea.coulomb.social/coulomb/issue-core` | CI runner | railiance01 forge OCI | `CUST-WP-0054-T04` |
| Haskell build agent | workstation VM (`haskell-build-vm`) | n/a | retired (`CORE-WP-0007`) | n/a | `CORE-WP-0007` |

Done criterion for T01: every row above has a target and migration owner. ✓

## Drain Sequence

Detailed plan: `docs/coulombcore-drain-placement-plan.md`
Freeze policy: `canon/standards/coulombcore-production-freeze_v0.1.md`

```
Wave 1  Forge + CI (T04)
Wave 2  State Hub primary (T05)
Wave 3  Core Hub (CORE-WP-0005)
Wave 4  issue-core
Wave 5  ESO / ArgoCD
Wave 6  Supporting apps
Wave 7  OpenBao + identity (LAST)
Wave 8  coulombcore phoenix → railiance02 (T09)
```

## Sequencing Map

```
T01 (this document) ✓
 ├─ T02 de-hub network ✓
 ├─ T03 placement plan / freeze ✓
 │    ├─ T04 forge + CI
 │    └─ T05 State Hub home on railiance01
 ├─ T06 sink decoupling
 ├─ T07 dev beachhead
 └─ T08 phoenix drill
      └─ T09 coulombcore → railiance02
           └─ T10 workstation-off acceptance
```

## Evidence and Inventory Sources

- Live tunnel state: `bridge status` (2026-07-03)
- State Hub health: `http://127.0.0.1:8000/state/health` (cluster primary via tunnel)
- Registered repos: `GET /repos/` (74 repos, all `local_path` under `/home/worsch/`)
- `ops/service-inventory.yml` (2026-06-05; predates cluster cutover; refresh in T03)
- `docs/infrastructure-stabilization-pickup-checkpoint.md` (2026-07-03 metaplan closeout)
- Activity definitions: `activity-definitions/daily-statehub-wsjf-triage.md`, `activity-definitions/state-hub-consistency-sweep.md`

## Open Gaps (not T01 blockers)

| Gap | Follow-on |
| --- | --- |
| Forgejo production hostname / SMTP / exposure decisions | `RAIL-HO-WP-0005-T02` (human) |
| `ops/service-inventory.yml` stale environment labels | Refresh during T03 |
| Core Hub widget-type registry prerequisite | `CORE-WP-0005-T04` |
| HA Postgres / Longhorn across 2+ nodes | `RAIL-BS-WP-0007`, `CUST-WP-0038` after railiance02 |

## Promotion to Canon

After operator review:

1. Move to `canon/architecture/adr-006-workstation-independence-fleet-roles.md` (or equivalent ADR number).
2. Update `ops/service-inventory.yml` environment and service rows to match.
3. Link from `SCOPE.md` and `.custodian-brief.md` generation inputs.