# inter-hub Production Deploy Runbook

## Architecture

- **Deployment cluster:** COULOMBCORE K3s (`92.205.130.254`) as observed from the haskelseed runner kube context on 2026-06-14.
- **Stale public DNS host:** `hub.coulomb.social` still resolved to `92.205.62.239` on 2026-06-14, which served the older API surface.
- **Namespace:** `inter-hub`
- **Image registry:** `gitea.coulomb.social/coulomb/inter-hub:`
- **Database:** CloudNativePG cluster `net-kingdom-pg` in `databases` namespace
  - RW endpoint: `net-kingdom-pg-rw.databases.svc.cluster.local:5432`
  - Database: `interhub`, User: `interhub`
- **Ingress:** Traefik → `hub.coulomb.social` (TLS via letsencrypt-prod)
- **Secrets:** `inter-hub-env` Secret in `inter-hub` namespace
- **App handoff:** `app.toml` points Railiance operators to `railiance-apps/charts/inter-hub` with values from `railiance-apps/helm/inter-hub-values.yaml`

## Public DNS Gate

The app deployment can be healthy while public smoke tests still fail if DNS points `hub.coulomb.social` at the stale host. On 2026-06-14:

- Kubernetes reported image `gitea.coulomb.social/coulomb/inter-hub:6455902` ready in namespace `inter-hub` on node `92.205.130.254`.
- An in-cluster probe to `http://inter-hub:8000/api/v2/hubs` returned `401`.
- Forcing public TLS to the cluster ingress also returned `401`: `curl --resolve hub.coulomb.social:443:92.205.130.254 https://hub.coulomb.social/api/v2/hubs`.
- Normal DNS resolved `hub.coulomb.social` to `92.205.62.239`, where `/api/v2/hubs` returned `404` and OpenAPI lacked the bootstrap paths.

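
The observations above separate a DNS problem from a deploy problem. As a minimal sketch of that reasoning, the helper below turns probe results into a verdict. The function name `classify_dns_gate` and its verdict strings are hypothetical and not part of this repo. It assumes `401` is the expected unauthenticated response on `/api/v2/hubs`, as in the smoke test. It performs no network calls itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): interpret DNS gate probe results.
# Arguments:
#   $1 IP that public DNS currently returns for hub.coulomb.social
#   $2 IP of the active cluster ingress
#   $3 HTTP code from the normal public probe
#   $4 HTTP code from the forced-ingress probe (curl --resolve ...)
classify_dns_gate() {
  local resolved_ip="$1" ingress_ip="$2" public_code="$3" forced_code="$4"

  if [ "$forced_code" != "401" ]; then
    # Even the cluster ingress does not show the v2 auth gate:
    # inspect the deployment itself before blaming DNS.
    echo "deploy-suspect"
  elif [ "$resolved_ip" != "$ingress_ip" ]; then
    # The cluster answers correctly, but public DNS points somewhere else.
    echo "stale-dns"
  elif [ "$public_code" = "401" ]; then
    # DNS and both probes agree with the expected auth gate.
    echo "ok"
  else
    # DNS is correct and the forced probe passes, yet the public probe differs.
    echo "public-path-suspect"
  fi
}

# Example wiring (performs network calls, so it is left commented out):
#   resolved="$(getent ahosts hub.coulomb.social | awk 'NR==1{print $1}')"
#   classify_dns_gate "$resolved" 92.205.130.254 "$public_code" "$forced_code"
```

With the 2026-06-14 values (`92.205.62.239`, `92.205.130.254`, `404`, `401`) the sketch reports `stale-dns`, which matches the situation described in this section.
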
Before treating a deploy as failed, compare DNS and forced-ingress probes:

```bash
getent ahosts hub.coulomb.social
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/hubs
curl --resolve hub.coulomb.social:443:92.205.130.254 \
  -s -o /dev/null -w "%{http_code}" \
  https://hub.coulomb.social/api/v2/hubs
```

The public bootstrap gate passes when the DNS A record for `hub.coulomb.social` points at the active ingress IP (`92.205.130.254`) or the workflow kubeconfig is intentionally rotated to deploy to the cluster behind the current DNS target.

## Deployment

Normal deployment is handled by Gitea Actions on push to `main`:

- runner labels: `self-hosted`, `haskelseed`
- build: `nix build .#docker`
- publish: `gitea.coulomb.social/coulomb/inter-hub:` and `latest`
- deploy: `helm upgrade --install inter-hub deploy/helm/inter-hub ...`
- smoke: public landing page and v2 auth gate

Manual deployment from this repo:

```bash
helm upgrade --install inter-hub deploy/helm/inter-hub \
  --namespace inter-hub --create-namespace \
  --set image.tag= \
  --wait --timeout 5m
```

Manual deployment through the Railiance app handoff chart:

```bash
helm upgrade --install inter-hub /home/worsch/railiance-apps/charts/inter-hub \
  --namespace inter-hub --create-namespace \
  -f /home/worsch/railiance-apps/helm/inter-hub-values.yaml \
  --set image.tag= \
  --wait --timeout 5m
```

## Image Build (on haskelseed)

```bash
ssh root@192.168.178.135
cd /root/inter-hub

# Build:
nix build .#docker --log-format raw > /tmp/build.log 2>&1

# Push:
SHA=$(git rev-parse --short HEAD)
TOKEN=$(curl -fsS \
  "https://gitea.coulomb.social/v2/token?service=container_registry&scope=repository:coulomb/inter-hub:push,pull" \
  -u "tegwick:" | awk -F'"' '/token/{print $4}')
skopeo copy --insecure-policy \
  --dest-registry-token "$TOKEN" \
  docker-archive:result \
  docker://gitea.coulomb.social/coulomb/inter-hub:$SHA
```

**Notes:**

- Haskelseed is a build/deploy runner, not the production app host.
- The IHP Nix Docker image may not have `/bin/sh`. Prefer Kubernetes-native checks from other pods or the database pod when possible.

## Gitea Registry Credentials

The deploy workflow uses the repository Actions secret `REGISTRY_TOKEN` to request a short-lived registry bearer token from `https://gitea.coulomb.social/v2/token`.

If publishing starts failing with an authentication error:

1. Generate or rotate a Gitea token with package write access.
2. Update the `REGISTRY_TOKEN` Actions secret for `coulomb/inter-hub`.
3. Rerun the workflow or push a non-production test commit.

Do not print token values in logs, State Hub, or commits.

## Runtime Secret Source

The live deployment currently consumes the Kubernetes Secret `inter-hub/inter-hub-env`. The durable source file is:

```text
deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Create or refresh it from the live Secret using:

```bash
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
kubectl -n inter-hub get secret inter-hub-env -o json \
  | python3 deploy/railiance/secrets/k8s-secret-json-to-sops-input.py \
  > "$tmp"
sops --encrypt \
  --age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
  "$tmp" > deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Apply the encrypted source:

```bash
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml \
  | kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
```

Custody-backed recovery verification:

```bash
# after the approved custody unlock makes the age identity available
make recovery-drill
```

The drill prints UTC/local timestamps, verifies that the committed SOPS file can be decrypted in memory, checks the expected Secret metadata and key names, and does not print secret values. Keep the PASS output as non-secret recovery evidence.

## Database Migration

IHP migrations can be run from the production image when needed.
Because the image is Nix-built and may not contain a shell, first inspect the binary path:

```bash
kubectl exec -n inter-hub deploy/inter-hub -- find /nix/store -path '*inter-hub*/bin/RunProdServer'
kubectl exec -n inter-hub deploy/inter-hub -- /nix/store/-inter-hub/bin/RunProdServer migrate
```

To check migration status:

```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres interhub -c "\dt"
```

## Logs

```bash
kubectl logs -n inter-hub -l app=inter-hub --tail=100 -f

# Previous pod logs:
kubectl logs -n inter-hub -l app=inter-hub --previous --tail=50
```

## Restart / Rollback

```bash
# Restart:
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub

# Rollback to previous image:
kubectl rollout undo deployment/inter-hub -n inter-hub

# Rollback to specific version:
helm rollback inter-hub 1 --namespace inter-hub
```

## Secret Rotation

To rotate the session secret:

```bash
sops deploy/railiance/secrets/inter-hub.env.sops.yaml
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
```

To rotate the database password:

1. Update the password in PostgreSQL (via kubectl exec to the CNPG pod)
2. Update the `inter-hub-env` secret
3. Restart the deployment

## Smoke Test

```bash
getent ahosts hub.coulomb.social # expected: 92.205.130.254
curl -fsS https://hub.coulomb.social/ | grep "inter-hub"
curl -fsS https://hub.coulomb.social/api/v2/openapi.json >/dev/null
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/widgets | grep 401
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/hubs | grep 401
```

## Database Connection Check

The IHP Nix image has no `/bin/sh`.
Connect via the CNPG pod instead:

```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres -d interhub -c "SELECT version();"
```

## Password Hashing

IHP uses `pwstore-fast` (`Crypto.PasswordStore`), **not bcrypt**. Hash format:

```
sha256|17||
```

To generate a correct hash (requires GHC with pwstore-fast available on haskelseed):

```bash
ssh root@192.168.178.135
cat > /tmp/genhash.hs << 'EOF'
import qualified Crypto.PasswordStore as PS
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  h <- PS.makePassword (B8.pack "yourpassword") 17
  B8.putStrLn h
EOF
/nix/store/yp23474ys67f1fd2z2ff1nn3q5wrmjng-ghc-9.10.3-with-packages/bin/runghc /tmp/genhash.hs
```

## haskelseed Build VM

- **Host:** 192.168.178.135
- **Access:** ops-bridge SSH path with the approved operator key
- **Role:** self-hosted Gitea Actions runner and Nix build machine only
- **Runner:** OpenRC `act_runner` service registered to `https://gitea.coulomb.social`
- **Build logs:** Gitea Actions logs and temporary runner work directories
- **Nix store:** `/dev/sdb1` (100 GB, mounted at `/nix`)
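
Because the Nix store lives on a dedicated 100 GB volume, repeated image builds can plausibly fill it. The sketch below is one way to check usage of `/nix` from `df -P` output. The function names `mount_usage_percent` and `nix_store_status` and the 85% default threshold are assumptions for illustration, not existing tooling or policy on haskelseed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `df -P` output on stdin and print the usage
# percentage (without the % sign) for the given mount point.
mount_usage_percent() {
  local mount_point="$1"
  # POSIX `df -P` columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
  awk -v mp="$mount_point" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: compare a usage percentage against a threshold.
# The default of 85 is an assumed value, not a documented limit.
nix_store_status() {
  local percent="$1" threshold="${2:-85}"
  if [ "$percent" -ge "$threshold" ]; then
    echo "WARN: /nix at ${percent}% (threshold ${threshold}%)"
  else
    echo "OK: /nix at ${percent}%"
  fi
}

# Example wiring on the build VM (left commented out):
#   nix_store_status "$(df -P /nix | mount_usage_percent /nix)"
```

If the check warns, `nix-collect-garbage` is the standard Nix command for reclaiming store space. Whether and when to run it on the runner is an operator decision that this runbook does not define.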