ThreePhoenixWorkplan

*Self-healing, load-balanced application and service hosting*

This is a plan for moving to a "3-Node Phoenix" architecture with High Availability (HA) at every layer, on bare metal.

Here is the staged workplan to go from **3 Blank Ubuntu Machines** to a **Self-Healing, Load-Balanced Gitea Cluster**.

### Prerequisite Checklist

* **Hardware:** 3x Ubuntu Servers (22.04 or 24.04 LTS).
* **Network:** All 3 nodes must be able to talk to each other (for a default K3s install: at minimum TCP 6443, 2379-2380, and 10250, plus UDP 8472 for Flannel VXLAN).
* **DNS:** A domain pointing to your cluster (e.g., `git.yourdomain.com`).
* **Tooling:** `helm` and `kubectl` on the machine you manage the cluster from. K3s does not install the `helm` CLI.

---

### Phase 1: The Foundation (K3s Cluster)

We will use **K3s** with embedded etcd. This gives you a true HA control plane without the complexity of "The Hard Way."

**1. Prepare the Nodes (Run on All 3)**

Disable swap (Kubernetes requirement) and update.

```bash
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
sudo apt update && sudo apt upgrade -y
```

**2. Initialize the First Node (The Seed)**

Run this on **Node 1**:

```bash
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san git.yourdomain.com \
  --token SECRET_CLUSTER_TOKEN
```

* `--cluster-init`: Tells K3s this is the start of an HA cluster.
* `SECRET_CLUSTER_TOKEN`: Make up a strong password. You need this for the other nodes.

**3. Join Nodes 2 & 3**

Run this on **Node 2** and **Node 3**:

```bash
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://<NODE_1_IP>:6443 \
  --token SECRET_CLUSTER_TOKEN
```

* `<NODE_1_IP>`: The address of Node 1 as reachable from the other nodes.

**4. Verification**

On Node 1, run `sudo k3s kubectl get nodes`. You should see all 3 nodes in `Ready` state, each with the `control-plane` and `etcd` roles.

*(Copy `/etc/rancher/k3s/k3s.yaml` to your local machine as `~/.kube/config` to manage the cluster remotely. Change the `server:` address in that file from `127.0.0.1` to a node IP or `git.yourdomain.com`.)*

---

### Phase 2: The Storage (Longhorn)

**Crucial:** Gitea HA requires a "Shared Filesystem" (ReadWriteMany) so all 3 Gitea pods see the same Git Repos. On bare metal, **Longhorn** is a common way to achieve this.

Longhorn needs `open-iscsi` on every node, and ReadWriteMany volumes also need an NFS client. On Ubuntu, run `sudo apt install -y open-iscsi nfs-common` on all 3 nodes first.

**1. Install Longhorn**

```bash
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
```

**2. Verify Storage Class**

Run `kubectl get sc`. You should see `longhorn` (default). K3s also ships its own `local-path` class marked as default; if both show `(default)`, remove the default annotation from `local-path` so that claims without an explicit class land on Longhorn.

---

### Phase 3: The Database (Postgres HA)

We deploy the 3-node Postgres cluster with `pgpool` load balancing.

**1. Create `postgres-values.yaml`**

```yaml
# Key names should be checked against `helm show values bitnami/postgresql-ha`
# for the chart version you install.
architecture: replication
postgresql:
  replicaCount: 3
pgpool:
  replicaCount: 3
  loadBalancing:
    mode: on # The magic setting for performance
persistence:
  storageClass: "longhorn"
  size: 10Gi
metrics:
  enabled: true # For monitoring later
  serviceMonitor:
    enabled: true # Needs the ServiceMonitor CRD; turn on after Phase 7 is installed
```

**2. Install**

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install gitea-db bitnami/postgresql-ha -f postgres-values.yaml
```

---

### Phase 4: The Application (Gitea HA)

Now for the complex part. We need Gitea to be stateless: every pod must be replaceable at any time, with all state held in Postgres and on the shared volume.

**1. Create `gitea-values.yaml`**

```yaml
replicaCount: 3 # Run 3 copies (a top-level key in the Gitea chart)
gitea:
  config:
    database:
      DB_TYPE: postgres
      HOST: gitea-db-postgresql-ha-pgpool:5432 # Point to Pgpool!
      NAME: gitea
      USER: postgres
      # PASSWD is also required; take it from the secret created by the gitea-db release
    # CRITICAL: Shared Storage for Repos
    repository:
      ROOT: /data/git/repositories
    # Cache and sessions held in memory are per-pod, so with 3 replicas a user
    # can be logged out whenever a request lands on a different pod.
    cache:
      ADAPTER: memory # Switch to Redis for true HA
    session:
      PROVIDER: memory # Switch to Redis for true HA
persistence:
  enabled: true
  accessModes:
    - ReadWriteMany # This demands Longhorn
  size: 20Gi
  storageClass: longhorn
service:
  http:
    type: ClusterIP # Don't expose directly!
```

**2. Install**

```bash
helm repo add gitea-charts https://dl.gitea.com/charts/
helm install gitea gitea-charts/gitea -f gitea-values.yaml
```

---

### Phase 5: The Security (Nginx + SSL)

We stop exposing ports directly and use an Ingress Controller.

**1. Install Nginx Ingress**

K3s comes with Traefik by default. You can disable it or use it.
If you prefer **Nginx**, disable Traefik first (add `--disable traefik` to the K3s server install) so the two controllers do not compete for ports 80 and 443, then install:

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx
```

**2. Install Cert-Manager (For Let's Encrypt)**

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
```

**3. Create the Ingress Resource**

Save as `gitea-ingress.yaml`. The annotation assumes a `ClusterIssuer` named `letsencrypt-prod` already exists; the cert-manager manifest above does not create one.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gitea-ingress
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - git.yourdomain.com
      secretName: gitea-tls-secret
  rules:
    - host: git.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gitea-http
                port:
                  number: 3000
```

Apply it with `kubectl apply -f gitea-ingress.yaml`.

---

### Phase 6: The "Phoenix" Automation (Bonus)

Deploy the logic we discussed to rotate nodes.

**1. Create the Service Account**

Save as `rbac.yaml`. This gives the bot permission to kill pods.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: phoenix-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: phoenix-role
rules:
  - apiGroups: [""]
    resources: ["pods", "persistentvolumeclaims"]
    verbs: ["get", "list", "delete"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: phoenix-binding
subjects:
  - kind: ServiceAccount
    name: phoenix-sa
    namespace: default
roleRef:
  kind: ClusterRole
  name: phoenix-role
  apiGroup: rbac.authorization.k8s.io
```

Apply it with `kubectl apply -f rbac.yaml`.

**2. Deploy the CronJob**

Use the **"3-Node Phoenix Script"** from my previous response. Save it as `phoenix-cron.yaml` and apply it.

---

### Phase 7: Monitoring & Notification (Bonus Material)

You want to know *before* your users do if something breaks.

**1. Install the "Kube-Prometheus-Stack"**

This gives you Prometheus (Database), Grafana (Dashboards), and Alertmanager (Notifications) in one go.

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack
```

**2. Configure Alerts (Email/Slack)**

Edit the `alertmanager` config to send notifications.

* **Trigger:** `PostgresqlDown` (If pgpool can't see a backend).
* **Trigger:** `KubePodCrashLooping` (If Gitea is restarting).

**3. Silencing Alerts During the Phoenix Window**

Since you have a Phoenix strategy that *intentionally* kills pods, you need to **Silence Alerts** during that specific maintenance window (Sunday 3 AM), or you will wake up to panic emails every week.

* You can automate "Silence" creation via the Alertmanager API in your Phoenix script. The Alertmanager service name depends on the Helm release name, so confirm it with `kubectl get svc` before relying on the URL below.

```bash
curl -XPOST -H 'Content-Type: application/json' \
  http://monitoring-alertmanager:9093/api/v2/silences -d '{...}'
```

### Summary of Result

You now have:

1. **3 Physical Nodes** mirroring data via Longhorn.
2. **3 Database Replicas** load-balanced by Pgpool.
3. **SSL Termination** handling security.
4. **Auto-Rotation** killing and rebuilding servers weekly.
5. **Monitoring** watching it all.
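For the silence step in Phase 7, the elided request body could be built along the following lines. This is only a sketch under stated assumptions: it uses the Alertmanager v2 silence fields (`matchers`, `startsAt`, `endsAt`, `createdBy`, `comment`), GNU `date` as shipped on Ubuntu, and hypothetical values that do not come from the Phoenix script (a `namespace` matcher, a 60-minute window, and the author name `phoenix-cronjob`).

```bash
#!/usr/bin/env bash
# Sketch: build an Alertmanager v2 silence payload for the Phoenix window.
# Matcher, duration, and author are illustrative assumptions.
set -eu

# build_silence_payload MINUTES NAMESPACE
# Prints a JSON body silencing alerts labelled namespace=NAMESPACE from now
# until MINUTES minutes from now (UTC timestamps in RFC 3339 form).
# The function body is a subshell so its variables stay private.
build_silence_payload() (
  minutes="$1"
  namespace="$2"
  starts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  ends="$(date -u -d "+${minutes} minutes" +%Y-%m-%dT%H:%M:%SZ)" # GNU date syntax
  cat <<EOF
{
  "matchers": [
    {"name": "namespace", "value": "${namespace}", "isRegex": false, "isEqual": true}
  ],
  "startsAt": "${starts}",
  "endsAt": "${ends}",
  "createdBy": "phoenix-cronjob",
  "comment": "Scheduled Phoenix rotation"
}
EOF
)

payload="$(build_silence_payload 60 default)"
echo "$payload"

# To create the silence, post the payload. The service name is taken from the
# text above and may differ in your cluster:
# curl -sS -XPOST -H 'Content-Type: application/json' \
#   http://monitoring-alertmanager:9093/api/v2/silences -d "$payload"
```

Matching on a whole namespace is deliberately broad. A narrower matcher on `alertname` would keep unrelated alerts audible during the window.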
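The choice of three server nodes in Phase 1 follows from etcd's majority quorum: the cluster stays writable only while more than half of its members are up. The arithmetic can be sketched as below. The function names `quorum` and `fault_tolerance` are made up for this illustration.

```bash
#!/usr/bin/env bash
# Sketch: majority-quorum arithmetic for an etcd cluster of N members.
set -eu

# quorum N -> smallest majority of N members
quorum() { echo $(( $1 / 2 + 1 )); }

# fault_tolerance N -> members that can fail while a majority survives
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
  echo "servers=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Two servers tolerate no failures at all, which is why the plan starts at three. A fourth server would raise the quorum without raising the number of failures tolerated.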