docs: add ThreePhoenix architecture concept and workplan

RailianceThreePhoenix: 3-node HA Kubernetes cluster with embedded etcd, Longhorn distributed storage, PostgreSQL HA (repmgr + Pgpool-II), and Phoenix CronJob for weekly node rotation to prevent configuration drift.

ThreePhoenixWorkplan: 7-phase implementation plan from blank Ubuntu nodes to self-healing Gitea cluster with monitoring and alert silencing.

Also adds CLAUDE.md with Custodian State Hub session protocol.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 01:13:05 +01:00
parent b7696e657f
commit eb8a6902b6
3 changed files with 477 additions and 0 deletions
--- a/wiki/ThreePhoenixWorkplan.md
+++ b/wiki/ThreePhoenixWorkplan.md
@@ -0,0 +1,301 @@
+ThreePhoenixWorkplan
+
+*Self-healing, load-balanced application and service hosting*
+
+This is a plan for moving to a "3-Node Phoenix" architecture on bare metal, with High Availability (HA) at every layer.
+
+Here is the staged workplan to go from **3 Blank Ubuntu Machines** to a **Self-Healing, Load-Balanced Gitea Cluster**.
+
+### Prerequisite Checklist
+
+* **Hardware:** 3x Ubuntu Servers (22.04 or 24.04 LTS).
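+* **Network:** All 3 nodes must reach each other on TCP 6443 (Kubernetes API), TCP 2379-2380 (embedded etcd), TCP 10250 (kubelet), and UDP 8472 (Flannel VXLAN, the K3s default).
+* **Tooling:** `kubectl` and `helm` on the machine you manage the cluster from.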
+* **DNS:** A domain pointing to your cluster (e.g., `git.yourdomain.com`).
+
+---
+
+### Phase 1: The Foundation (K3s Cluster)
+
+We will use **K3s** with embedded etcd. This gives you a true HA control plane without the complexity of "The Hard Way."
+
+**1. Prepare the Nodes (Run on All 3)**
+Disable swap (Kubernetes requirement) and update.
+
+```bash
+sudo swapoff -a
+sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
+sudo apt update && sudo apt upgrade -y
+
+```
+
+**2. Initialize the First Node (The Seed)**
+Run this on **Node 1**:
+
+```bash
+curl -sfL https://get.k3s.io | sh -s - server \
+  --cluster-init \
+  --tls-san git.yourdomain.com \
+  --token SECRET_CLUSTER_TOKEN
+
+```
+
+* `--cluster-init`: Tells K3s this is the start of an HA cluster.
+* `SECRET_CLUSTER_TOKEN`: Make up a strong password. You need this for the other nodes.
+
+**3. Join Nodes 2 & 3**
+Run this on **Node 2** and **Node 3**:
+
+```bash
+curl -sfL https://get.k3s.io | sh -s - server \
+  --server https://<IP_OF_NODE_1>:6443 \
+  --token SECRET_CLUSTER_TOKEN
+
+```
+
+**4. Verification**
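+On Node 1, run `sudo k3s kubectl get nodes`. You should see all 3 nodes in the `Ready` state, each with the `control-plane` and `etcd` roles.
+*(To manage the cluster remotely, copy `/etc/rancher/k3s/k3s.yaml` to your local machine as `~/.kube/config` and change the `server:` address from `https://127.0.0.1:6443` to the IP or DNS name of a node.)*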
+
+---
+
+### Phase 2: The Storage (Longhorn)
+
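+**Crucial:** Gitea HA requires a shared filesystem (a `ReadWriteMany` volume) so that all 3 Gitea pods see the same Git repositories. On bare metal, **Longhorn** is a common way to achieve this. Longhorn needs `open-iscsi` on every node, and its ReadWriteMany volumes additionally need an NFSv4 client, so run `sudo apt install -y open-iscsi nfs-common` on all 3 nodes first.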
+
+**1. Install Longhorn**
+
+```bash
+helm repo add longhorn https://charts.longhorn.io
+helm repo update
+helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace
+
+```
+
+**2. Verify Storage Class**
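+Run `kubectl get sc`. You should see `longhorn` marked `(default)`. K3s's bundled `local-path` class may also be marked default, which is why the values files in the following phases name `longhorn` explicitly.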
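+
+Before moving on, it can be worth confirming that Longhorn actually provisions a ReadWriteMany volume, since Phase 4 depends on it. The manifest below is a minimal sketch; the claim name `rwx-check`, the file name `rwx-check.yaml`, and the 1Gi size are placeholders rather than part of the plan.
+
+```yaml
+# Throwaway claim used only to test ReadWriteMany provisioning.
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: rwx-check # hypothetical name
+  namespace: default
+spec:
+  accessModes:
+    - ReadWriteMany # the mode Gitea will request in Phase 4
+  storageClassName: longhorn
+  resources:
+    requests:
+      storage: 1Gi
+```
+
+Apply it with `kubectl apply -f rwx-check.yaml`. If `kubectl get pvc rwx-check` reports `Bound`, shared storage should be usable; delete the claim afterwards with `kubectl delete pvc rwx-check`.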
+
+---
+
+### Phase 3: The Database (Postgres HA)
+
+We deploy a 3-node Postgres cluster (repmgr handles replication and failover) behind **Pgpool-II**, which load-balances queries across the replicas.
+
+**1. Create `postgres-values.yaml`**
+
+```yaml
+architecture: replication
+postgresql:
+  replicaCount: 3
+pgpool:
+  replicaCount: 3
+  loadBalancing:
+    mode: on # The magic setting for performance
+persistence:
+  storageClass: "longhorn"
+  size: 10Gi
+metrics:
+  enabled: true # For monitoring later
+  serviceMonitor:
+    enabled: false # Set to true after Phase 7; the ServiceMonitor CRD does not exist until then
+
+```
+
+**2. Install**
+
+```bash
+helm repo add bitnami https://charts.bitnami.com/bitnami
+helm install gitea-db bitnami/postgresql-ha -f postgres-values.yaml
+
+```
+
+---
+
+### Phase 4: The Application (Gitea HA)
+
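+Now for the complex part. Each Gitea pod must be stateless: repositories live on shared storage, data lives in Postgres, and nothing a user depends on is kept inside a single pod.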
+
+**1. Create `gitea-values.yaml`**
+
+```yaml
+replicaCount: 3 # Run 3 copies (a top-level key in the Gitea chart)
+
+gitea:
+  config:
+    database:
+      DB_TYPE: postgres
+      HOST: gitea-db-postgresql-ha-pgpool:5432 # Point to Pgpool!
+      NAME: gitea
+      USER: postgres
+    # CRITICAL: Shared Storage for Repos
+    repository:
+      ROOT: /data/git/repositories
+    # In-memory cache and sessions are per-pod, so a login will not survive
+    # a request landing on a different replica. Switch both to Redis for true HA.
+    cache:
+      ADAPTER: memory
+    session:
+      PROVIDER: memory
+
+persistence:
+  enabled: true
+  accessModes:
+    - ReadWriteMany # This demands Longhorn
+  size: 20Gi
+  storageClass: longhorn
+
+service:
+  http:
+    type: ClusterIP # Don't expose directly!
+
+```
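+
+For true HA, the cache and session settings need a store that all 3 pods share. The fragment below is a sketch of what the Redis variant of those settings might look like; `gitea-redis-master` is an assumed Service name for a Redis instance you would deploy separately, and the database numbers are arbitrary.
+
+```yaml
+# Hypothetical override for gitea-values.yaml.
+# Assumption: a Redis Service named "gitea-redis-master" exists in the same namespace.
+gitea:
+  config:
+    cache:
+      ADAPTER: redis
+      HOST: redis://gitea-redis-master:6379/0
+    session:
+      PROVIDER: redis
+      PROVIDER_CONFIG: redis://gitea-redis-master:6379/1
+    # Background queues are also per-pod unless they share a backend.
+    queue:
+      TYPE: redis
+      CONN_STR: redis://gitea-redis-master:6379/2
+```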
+
+**2. Install**
+
+```bash
+helm repo add gitea-charts https://dl.gitea.com/charts/
+helm install gitea gitea-charts/gitea -f gitea-values.yaml
+
+```
+
+---
+
+### Phase 5: The Security (Nginx + SSL)
+
+We stop exposing ports directly and use an Ingress Controller.
+
+**1. Install Nginx Ingress**
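+K3s ships with Traefik by default. You can keep using it, but if you prefer **Nginx**, disable Traefik first (install K3s with `--disable traefik`) so the two controllers do not compete for ports 80 and 443:
+
+```bash
+helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
+helm repo update
+helm install ingress-nginx ingress-nginx/ingress-nginx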
+
+```
+
+**2. Install Cert-Manager (For Let's Encrypt)**
+
+```bash
+kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
+
+```
+
+**3. Create the Ingress Resource**
+Save as `gitea-ingress.yaml`:
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: gitea-ingress
+  annotations:
+    cert-manager.io/cluster-issuer: "letsencrypt-prod"
+spec:
+  ingressClassName: nginx
+  tls:
+  - hosts:
+    - git.yourdomain.com
+    secretName: gitea-tls-secret
+  rules:
+  - host: git.yourdomain.com
+    http:
+      paths:
+      - path: /
+        pathType: Prefix
+        backend:
+          service:
+            name: gitea-http
+            port:
+              number: 3000
+
+```
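+
+The Ingress references a cluster issuer named `letsencrypt-prod`, which has to exist before cert-manager can issue the certificate. A minimal sketch of such an issuer follows, assuming the HTTP-01 challenge is served through the Nginx ingress class; the e-mail address and the account-key Secret name are placeholders.
+
+```yaml
+apiVersion: cert-manager.io/v1
+kind: ClusterIssuer
+metadata:
+  name: letsencrypt-prod # must match the annotation on the Ingress
+spec:
+  acme:
+    server: https://acme-v02.api.letsencrypt.org/directory
+    email: admin@yourdomain.com # assumption: replace with a real contact address
+    privateKeySecretRef:
+      name: letsencrypt-prod-account-key # hypothetical Secret name
+    solvers:
+    - http01:
+        ingress:
+          class: nginx
+```
+
+Apply it before the Ingress. While testing, pointing `server` at the Let's Encrypt staging endpoint avoids the production rate limits.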
+
+---
+
+### Phase 6: The "Phoenix" Automation (Bonus)
+
+Deploy the Phoenix CronJob, which rotates nodes weekly to prevent configuration drift.
+
+**1. Create the Service Account**
+Save as `rbac.yaml`. This lets the Phoenix job list and delete pods and persistent volume claims, and patch StatefulSets.
+
+```yaml
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: phoenix-sa
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: phoenix-role
+rules:
+- apiGroups: [""]
+  resources: ["pods", "persistentvolumeclaims"]
+  verbs: ["get", "list", "delete"]
+- apiGroups: ["apps"]
+  resources: ["statefulsets"]
+  verbs: ["get", "list", "patch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: phoenix-binding
+subjects:
+- kind: ServiceAccount
+  name: phoenix-sa
+  namespace: default
+roleRef:
+  kind: ClusterRole
+  name: phoenix-role
+  apiGroup: rbac.authorization.k8s.io
+
+```
+
+`kubectl apply -f rbac.yaml`
+
+**2. Deploy the CronJob**
+Wrap the **"3-Node Phoenix Script"** in a CronJob manifest that runs under the `phoenix-sa` service account, save it as `phoenix-cron.yaml`, and apply it.
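+
+The script itself is not included in this document, so the following is only a skeleton of the surrounding CronJob. It shows the weekly Sunday 3 AM schedule and the `phoenix-sa` binding. The job name, the container image, and the placeholder command are assumptions.
+
+```yaml
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: phoenix-rotation # hypothetical name
+  namespace: default # same namespace as phoenix-sa
+spec:
+  schedule: "0 3 * * 0" # Sundays at 03:00 in the controller's time zone (usually UTC)
+  concurrencyPolicy: Forbid # never run two rotations at once
+  jobTemplate:
+    spec:
+      backoffLimit: 0 # do not retry a half-finished rotation automatically
+      template:
+        spec:
+          serviceAccountName: phoenix-sa
+          restartPolicy: Never
+          containers:
+          - name: phoenix
+            image: bitnami/kubectl:latest # assumption: any image that ships kubectl and a shell
+            command: ["/bin/sh", "-c"]
+            args:
+            - |
+              # Placeholder only: the real rotation script goes here.
+              kubectl get pods -o wide
+```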
+
+---
+
+### Phase 7: Monitoring & Notification (Bonus Material)
+
+You want to know *before* your users do if something breaks.
+
+**1. Install the "Kube-Prometheus-Stack"**
+This gives you Prometheus (Database), Grafana (Dashboards), and Alertmanager (Notifications) in one go.
+
+```bash
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm install monitoring prometheus-community/kube-prometheus-stack
+
+```
+
+**2. Configure Alerts (Email/Slack)**
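+Edit the `alertmanager` config to send notifications.
+
+* **Trigger:** `PostgresqlDown` (if pgpool can't see a backend). This is a custom rule you must define yourself; it is not among the stack's default rules.
+* **Trigger:** `KubePodCrashLooping` (if Gitea is restarting repeatedly). This rule ships with the stack.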
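+
+One way to express both the notification target and a Postgres rule is through Helm values for the kube-prometheus-stack chart. The sketch below makes several assumptions: a Slack webhook (the URL and the `#alerts` channel are placeholders), a hypothetical file name `monitoring-values.yaml`, and a scraped Postgres exporter from Phase 3 so that the `pg_up` metric exists.
+
+```yaml
+alertmanager:
+  config:
+    route:
+      receiver: slack-oncall # default receiver for all alerts
+      group_by: ["alertname", "namespace"]
+    receivers:
+    - name: "null" # kept because the chart's default routes reference it
+    - name: slack-oncall # hypothetical receiver name
+      slack_configs:
+      - api_url: https://hooks.slack.com/services/REPLACE/ME # placeholder webhook
+        channel: "#alerts"
+        send_resolved: true
+
+additionalPrometheusRulesMap:
+  gitea-db: # hypothetical rule-set name
+    groups:
+    - name: gitea-db
+      rules:
+      - alert: PostgresqlDown
+        expr: pg_up == 0 # the exporter cannot reach its Postgres instance
+        for: 2m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Postgres instance {{ $labels.instance }} is down"
+```
+
+Apply it with `helm upgrade monitoring prometheus-community/kube-prometheus-stack -f monitoring-values.yaml`.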
+
+**3. The "Dead Man's Switch"**
+Since you have a Phoenix strategy that *intentionally* kills pods, you need to **Silence Alerts** during that specific maintenance window (Sunday 3 AM), or you will wake up to panic emails every week.
+
+* You can automate "Silence" creation via the Alertmanager API in your Phoenix script:
+```bash
+curl -XPOST -H 'Content-Type: application/json' http://monitoring-alertmanager:9093/api/v2/silences -d '{...}'
+
+```
+
+
+
+### Summary of Result
+
+You now have:
+
+1. **3 Physical Nodes** mirroring data via Longhorn.
+2. **3 Database Replicas** load-balanced by Pgpool.
+3. **TLS Termination** at the Ingress, with certificates from Let's Encrypt.
+4. **Auto-Rotation** killing and rebuilding servers weekly.
+5. **Monitoring** watching it all.
+
+