# Railiance Stack — Full Deploy Runbook

> **When to use this:** Starting from a bare server (post-OS install) or rebuilding
> after a catastrophic failure. For day-to-day operations use the individual layer
> repos. See ADR-003 for layer boundaries and ADR-004 for connectivity posture.

## Pre-conditions checklist

Before starting, verify you have:

- [ ] SSH access to the target server (COULOMBCORE: 92.205.130.254, user: tegwick, key: `~/.ssh/id_ops`)
- [ ] SOPS age private key available (`~/.config/sops/age/keys.txt` or `SOPS_AGE_KEY` env)
- [ ] ops-bridge running on the workstation (needed for state hub MCP): `make mcp-http` in `~/the-custodian/state-hub/`
- [ ] Gitea accessible (needed for git pull on remote): SSH via `gitea-remote:coulomb/.git`
- [ ] If re-provisioning from scratch: Hetzner/HostEurope API credentials decryptable via SOPS

---

## S1 — Infrastructure Substrate (`railiance-infra`)

```bash
# On workstation
cd ~/railiance-infra

# Provision server (skip if server already exists)
make tf-plan    # review Terraform plan
make tf-apply   # create/update server

# Converge OS baseline
# NOTE: Ansible runs locally on CoulombCore (workstation has no Ansible installed)
ssh -i ~/.ssh/id_ops tegwick@92.205.130.254 \
  'cd ~/railiance-infra && git pull && \
   cd ansible && ansible-playbook playbooks/bootstrap.yaml \
   -c local --become -l CoulombCore'

# Verify OS baseline
make verify
```

**Checkpoint:** UFW active, fail2ban running, swap enabled, nproc limits in place, SOPS/age installed.

---

## S2 — Cluster Runtime (`railiance-cluster`)

```bash
# On CoulombCore (SSH in first)
ssh -i ~/.ssh/id_ops tegwick@92.205.130.254
cd ~/railiance-cluster
make converge   # installs k3s, Helm, cert-manager, nginx ingress, cnpg operator
make smoke      # runs cluster health assertions
```

**Checkpoint:** k3s running, Helm available, cert-manager and nginx-ingress pods Running, cnpg-system namespace active.

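
If `make smoke` is unavailable or you want to spot-check the S2 checkpoint by hand, the "pods Running" part can be sketched as a small filter over `kubectl get pods --no-headers` output. The helper name `not_running` is hypothetical and not part of any railiance repo. It relies on the STATUS value being the third column, which holds for a single-namespace listing but not with `-A`.

```bash
# not_running: reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints the name of every
# pod whose STATUS is neither Running nor Completed.
# Hypothetical helper for illustration; not part of the railiance tooling.
not_running() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Made-up sample lines in the same column layout, for illustration only.
printf '%s\n' \
  'example-pod-a 1/1 Running 0 5m' \
  'example-pod-b 0/1 CrashLoopBackOff 4 5m' \
  | not_running
# prints: example-pod-b
```

On CoulombCore this might be used as `kubectl get pods -n cert-manager --no-headers | not_running`, where empty output means every pod in that namespace is up. The namespace names `cert-manager` and `ingress-nginx` are the conventional ones and are an assumption here; check what `make converge` actually creates.
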

---

## S3 — Platform Services (`railiance-platform`)

```bash
# On CoulombCore (kubectl available after S2)
cd ~/railiance-platform && git pull

# Create Gitea DB credentials secret (one-time; do NOT commit plaintext)
kubectl create secret generic gitea-db-credentials \
  --namespace databases \
  --from-literal=username=gitea \
  --from-literal=password=

# Deploy cnpg Gitea database cluster
make db-deploy

# Wait for cluster to be healthy (~60s)
make db-status

# Deploy Valkey cache (standalone, not as Gitea subchart)
# Requires: helm/valkey-values.sops.yaml with encrypted password
make valkey-deploy
make valkey-status
```

**Checkpoint:** `kubectl get cluster -n databases` shows `gitea-db` healthy; Valkey pod Running in platform namespace.

---

## S4 — Developer Enablement (`railiance-enablement`)

No formal workplan yet. ArgoCD is currently deployed at cluster level (S2 boundary violation, tracked in RAIL-HO-WP-0004). No S4-specific steps required at this time.

---

## S5 — Workloads & Experience (`railiance-apps`)

```bash
# On CoulombCore
cd ~/railiance-apps && git pull

# Deploy Gitea (git hosting)
# Requires: helm/gitea-values.sops.yaml with encrypted values
make gitea-deploy
make gitea-status

# Deploy state-hub (Custodian cognitive infrastructure)
# See RAIL-HO-WP-0004-T09 for full steps
make state-hub-deploy       # (not yet implemented — pending T09)

# Deploy activity-core
# See RAIL-HO-WP-0004-T10 for full steps
make activity-core-deploy   # (not yet implemented — pending T10)
```

**Checkpoint:** Gitea accessible and all repos cloneable via SSH; state-hub `/state/health` returns 200.

---

## ops-bridge tunnel setup (workstation)

After S2 is up, establish the persistent tunnels from the workstation:

```bash
bridge up state-hub-coulombcore       # state-hub HTTP (port 18000 remote)
bridge up state-hub-mcp-coulombcore   # state-hub MCP (port 18001 remote)
bridge up k3s-api-coulombcore         # k3s API (port 16443 local)
```

Verify: `bridge status` shows all three connected.

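
Several steps in this runbook amount to waiting until a check passes: the S3 "wait for cluster to be healthy (~60s)" step, the S5 state-hub health check, and the tunnel verification here. A minimal polling sketch is shown below. The function `retry` is a hypothetical helper, and it only helps if the wrapped command exits non-zero while the target is unhealthy, which this runbook does not confirm for `make db-status` or `bridge status`.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or ATTEMPTS runs have failed.
# Hypothetical helper for illustration; not part of the railiance tooling.
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      echo "gave up after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Demo with a command that always succeeds.
retry 3 0 true && echo "ready"
# prints: ready
```

Under the exit-code assumption above, the S3 wait could be written as `retry 12 5 make db-status`, which polls for up to about a minute.
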

---

## Recovery pointers

- **Node overload / SSH unresponsive:** See `the-custodian/ops/runbooks/gitea-coulombcore.md` Issue #3
- **Incident report:** `the-custodian/ops/incidents/2026-03-26-coulombcore-runaway-agent-overload.md`
- **Cluster backup restore:** `railiance-cluster/tools/cmd/railiance-restore-s2`
- **Gitea SSH not working:** Check `gitea-ssh-nodeport` service exists: `kubectl get svc -n default gitea-ssh-nodeport`

---

## Layer dependency chain

```
S1 (infra) → S2 (cluster) → S3 (platform) → S4 (enablement) → S5 (workloads)
```

Each layer must be fully converged and verified before starting the next. Never configure S2 concerns from S3+ repos (ADR-003 boundary rule).
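
The "verify before starting the next" rule can be sketched as a gate that walks the chain in order and reports the first layer whose check fails. Both `first_unverified` and `demo_check` are hypothetical names. A real per-layer check would wrap whatever each repo provides (for example `make verify` for S1 or `make smoke` for S2), which is an assumption about how you would wire it up.

```bash
# first_unverified CHECK_CMD
# Calls "CHECK_CMD <layer>" for S1..S5 in order. Prints the first layer
# whose check fails and returns 1; prints "all-verified" and returns 0
# if every check passes. Hypothetical helper for illustration.
first_unverified() {
  fu_check="$1"
  for fu_layer in S1 S2 S3 S4 S5; do
    if ! "$fu_check" "$fu_layer"; then
      echo "$fu_layer"
      return 1
    fi
  done
  echo "all-verified"
  return 0
}

# Demo check (made up): pretend S1 and S2 are converged and S3 is not.
demo_check() {
  case "$1" in
    S1|S2) return 0 ;;
    *)     return 1 ;;
  esac
}

first_unverified demo_check || true
# prints: S3
```

Stopping at the first failure mirrors the chain: work resumes at the reported layer, and nothing downstream of it is touched until it verifies.
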