# 🚀 Provisioning Servers with RailianceHosts
This guide explains **where you declare servers**, **how Terraform uses that declaration**, and **how to provision** (and later destroy) machines on Hetzner.

---
## 🚀 Fast Path: Using the Helper Script
Instead of manually editing `inventory/servers.yaml` and running `make tf-apply`, you can use the convenience script [`scripts/hcloud_new_server.sh`](../scripts/hcloud_new_server.sh).
This script will:
1. Add the new host entry to `inventory/servers.yaml`
2. Decrypt your Hetzner API token with SOPS
3. Run Terraform (`init/plan/apply`) to provision the server
4. Print the IPv4 address and a ready-to-use SSH command

**Example:**
```bash
scripts/hcloud_new_server.sh core-01 --type cpx11 --region nbg1 --role core
```
This creates a small `cpx11` instance in the Nuremberg (`nbg1`) region, tagged with the role `core`.
You can then connect directly:
```bash
ssh admin@<printed-ip>
```
👉 The script is optional. You can always manage servers by editing `inventory/servers.yaml` and running `make tf-apply` instead.
## 1) Where you define servers
All desired hosts live in **`inventory/servers.yaml`**. Each entry is a simple YAML object with the required attributes:
```yaml
servers:
  - name: core-01
    labels: [core, wireguard, git]
    role: "core"
    region: "nbg1"         # Hetzner location (e.g., nbg1, fsn1, hel1)
    type: "cpx21"          # Hetzner server type/flavor
    image: "ubuntu-24.04"  # OS image slug
    ssh_user: "admin"      # bootstrap user (cloud-init creates this)
```
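
Because each entry is identified by its `name`, a quick duplicate check before committing can catch copy-paste mistakes. The following is a minimal sketch, not part of the repo: `duplicate_server_names` is a hypothetical helper, and it assumes every entry starts with a `- name: <value>` line as in the example above (it does not parse YAML in general).

```bash
# Hypothetical helper (not part of the repo): print every server name that
# appears more than once in an inventory file.
# Assumes each entry starts with a "- name: <value>" line.
# Usage: duplicate_server_names inventory/servers.yaml
duplicate_server_names() {
  local file="$1"
  # 1) keep only the value of each "- name:" line
  # 2) drop trailing comments and whitespace
  # 3) drop optional quotes around the value
  # 4) report values that occur at least twice
  sed -n 's/^[[:space:]]*-[[:space:]]*name:[[:space:]]*//p' "$file" \
    | sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//' \
    | tr -d "\"'" \
    | sort | uniq -d
}
```

Empty output means no duplicates were found under these assumptions.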
> Tip: Keep **names stable**. If you rename a server in this file, Terraform will plan to destroy the server with the old name and create a new one under the new name.
---
## 2) Two ways to add a server
### A) Edit YAML by hand (simple)
Open `inventory/servers.yaml`, add a new entry, save, commit.
### B) Use the helper script (safe & quick)
```bash
# requires scripts/new_host.py
make new-host NAME=web-01 TYPE=cpx21 REGION=nbg1 ROLE=web
# or directly:
python3 scripts/new_host.py --name web-01 --type cpx21 --region nbg1 --role web
```
You can also do **add + provision in one step**:
```bash
scripts/hcloud_new_server.sh web-01 --type cpx21 --region nbg1 --role web
```
---
## 3) How Terraform uses your declaration
The module at `terraform/hetzner/`:
- Reads `inventory/servers.yaml` (`for_each` over `servers`)
- Registers your SSH key from `keys/admin_ssh.pub`
- Injects **cloud-init** that sets up the `admin` user and basic hardening
- Creates/updates/destroys servers to match the YAML

Outputs include a map of server names → IPv4 addresses.

---
## 4) Provision (create/update)
Make sure your Hetzner API token is present and **SOPS-decryptable** in `inventory/group_vars/secrets.sops.yaml` under `ops.hcloud_token`.
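
A small preflight check can confirm that the files this guide relies on are in place before Terraform runs. This is a sketch under stated assumptions: `check_provisioning_files` is a hypothetical helper that does not exist in the repo, and it only checks that the files exist, not that their contents are valid.

```bash
# Hypothetical preflight helper (not part of the repo): verify that the files
# named in this guide exist before running Terraform.
# Usage: check_provisioning_files [repo-root]   (defaults to the current directory)
check_provisioning_files() {
  local root="${1:-.}" missing=0 path
  for path in \
    inventory/servers.yaml \
    inventory/group_vars/secrets.sops.yaml \
    keys/admin_ssh.pub
  do
    if [ ! -f "$root/$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is missing.
  return "$missing"
}

# To also confirm the token decrypts (assumes the key path named in this guide):
#   sops -d --extract '["ops"]["hcloud_token"]' inventory/group_vars/secrets.sops.yaml >/dev/null
```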
Then run either:
```bash
# plan and apply in separate steps
make tf-plan
make tf-apply
```
or the end-to-end convenience:
```bash
make apply # terraform apply + ansible bootstrap
```
If you used the one-shot script:
```bash
scripts/hcloud_new_server.sh web-01 --type cpx21 --region nbg1 --role web
```
Terraform prints the new servers' IPv4 addresses at the end.

---
## 5) Connect & converge
Connect via SSH:
```bash
ssh admin@<server-ip>
```
Run Ansible base bootstrap (if not using `make apply`):
```bash
make ansible-bootstrap
```
---
## 6) Destroy (tear down)
To remove all servers managed by this repo:
```bash
make tf-destroy
```
To remove just one server, delete its entry from `inventory/servers.yaml`, commit, then:
```bash
make tf-apply
```
Terraform will destroy the server that is no longer declared and leave the others intact.

---
## 7) Notes & conventions
- **Idempotent:** You can run `make apply` repeatedly; Terraform converges infra, Ansible converges config.
- **SSH keys:** Ensure `keys/admin_ssh.pub` exists before provisioning.
- **Secret token:** The Hetzner API token must be in `inventory/group_vars/secrets.sops.yaml` (encrypted with SOPS).
- **Cloud-init delay:** Allow ~30–60s after creation for first-boot tasks before first SSH.
- **Labels & role:** `labels` are freeform tags; `role` can drive Ansible plays as you grow.
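
Rather than waiting a fixed time for cloud-init, the first SSH attempt can be retried until it succeeds. The sketch below shows one way to do that. `retry_until` is a hypothetical helper, not something the repo provides, and the attempt count and delay are illustrative values.

```bash
# Hypothetical helper (not part of the repo): run a command until it succeeds
# or the attempt budget is exhausted.
# Usage: retry_until <max-attempts> <delay-seconds> <command> [args...]
retry_until() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `retry_until 12 5 ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new admin@<server-ip> true` tries for roughly a minute or more. `accept-new` trusts the host key on first contact, which is a convenience trade-off; drop it if you verify host keys another way.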