Files
inter-hub/deploy/railiance/RUNBOOK.md

211 lines
6.4 KiB
Markdown

# inter-hub on Railiance01 — Runbook
## Architecture
- **Cluster:** Railiance01 (K3s, 92.205.62.239)
- **Namespace:** `inter-hub`
- **Image registry:** `gitea.coulomb.social/coulomb/inter-hub:<sha>`
- **Database:** CloudNativePG cluster `net-kingdom-pg` in `databases` namespace
- RW endpoint: `net-kingdom-pg-rw.databases.svc.cluster.local:5432`
- Database: `interhub`, User: `interhub`
- **Ingress:** Traefik → `hub.coulomb.social` (TLS via letsencrypt-prod)
- **Secrets:** `inter-hub-env` Secret in `inter-hub` namespace
- **App handoff:** `app.toml` points Railiance operators to
`railiance-apps/charts/inter-hub` with values from
`railiance-apps/helm/inter-hub-values.yaml`
## Deployment
Normal deployment is handled by Gitea Actions on push to `main`:
- runner labels: `self-hosted`, `haskelseed`
- build: `nix build .#docker`
- publish: `gitea.coulomb.social/coulomb/inter-hub:<short-sha>` and `latest`
- deploy: `helm upgrade --install inter-hub deploy/helm/inter-hub ...`
- smoke: public landing page and v2 auth gate
Manual deployment from this repo:
```bash
helm upgrade --install inter-hub deploy/helm/inter-hub \
--namespace inter-hub --create-namespace \
--set image.tag=<short-sha> \
--wait --timeout 5m
```
Manual deployment through the Railiance app handoff chart:
```bash
helm upgrade --install inter-hub /home/worsch/railiance-apps/charts/inter-hub \
--namespace inter-hub --create-namespace \
-f /home/worsch/railiance-apps/helm/inter-hub-values.yaml \
--set image.tag=<short-sha> \
--wait --timeout 5m
```
## Image Build (on haskelseed)
```bash
ssh root@192.168.178.135
cd /root/inter-hub
# Build:
nix build .#docker --log-format raw > /tmp/build.log 2>&1
# Push:
SHA=$(git rev-parse --short HEAD)
TOKEN=$(curl -fsS \
"https://gitea.coulomb.social/v2/token?service=container_registry&scope=repository:coulomb/inter-hub:push,pull" \
-u "tegwick:<REGISTRY_TOKEN>" | awk -F'"' '/token/{print $4}')
skopeo copy --insecure-policy \
--dest-registry-token "$TOKEN" \
docker-archive:result \
docker://gitea.coulomb.social/coulomb/inter-hub:$SHA
```
**Notes:**
- Haskelseed is a build/deploy runner, not the production app host.
- The IHP Nix Docker image may not have `/bin/sh`. Prefer Kubernetes-native
checks from other pods or the database pod when possible.
## Gitea Registry Credentials
The deploy workflow uses the repository Actions secret `REGISTRY_TOKEN` to
request a short-lived registry bearer token from
`https://gitea.coulomb.social/v2/token`.
If publishing starts failing with an authentication error:
1. Generate or rotate a Gitea token with package write access.
2. Update the `REGISTRY_TOKEN` Actions secret for `coulomb/inter-hub`.
3. Rerun the workflow or push a non-production test commit.
Do not print token values in logs, State Hub, or commits.
## Runtime Secret Source
The live deployment currently consumes the Kubernetes Secret
`inter-hub/inter-hub-env`. The durable source file is:
```text
deploy/railiance/secrets/inter-hub.env.sops.yaml
```
Create or refresh it from the live Secret using:
```bash
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
kubectl -n inter-hub get secret inter-hub-env -o json \
| python3 deploy/railiance/secrets/k8s-secret-json-to-sops-input.py \
> "$tmp"
sops --encrypt \
--age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
"$tmp" > deploy/railiance/secrets/inter-hub.env.sops.yaml
```
Apply the encrypted source:
```bash
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml \
| kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
```
## Database Migration
IHP migrations can be run from the production image when needed. Because the
image is Nix-built and may not contain a shell, first inspect the binary path:
```bash
kubectl exec -n inter-hub deploy/inter-hub -- find /nix/store -path '*inter-hub*/bin/RunProdServer'
kubectl exec -n inter-hub deploy/inter-hub -- /nix/store/<hash>-inter-hub/bin/RunProdServer migrate
```
To check migration status:
```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres interhub -c "\dt"
```
## Logs
```bash
kubectl logs -n inter-hub -l app=inter-hub --tail=100 -f
# Previous pod logs:
kubectl logs -n inter-hub -l app=inter-hub --previous --tail=50
```
## Restart / Rollback
```bash
# Restart:
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
# Rollback to previous image:
kubectl rollout undo deployment/inter-hub -n inter-hub
# Rollback to specific version:
helm rollback inter-hub 1 --namespace inter-hub
```
## Secret Rotation
To rotate the session secret:
```bash
sops deploy/railiance/secrets/inter-hub.env.sops.yaml
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
```
To rotate the database password:
1. Update the password in PostgreSQL (via kubectl exec to the CNPG pod)
2. Update the `inter-hub-env` secret
3. Restart the deployment
## Smoke Test
```bash
curl -fsS https://hub.coulomb.social/ | grep "inter-hub"
curl -fsS https://hub.coulomb.social/api/v2/openapi.json >/dev/null
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/widgets | grep 401
```
## Database Connection Check
The IHP Nix image has no `/bin/sh`. Connect via the CNPG pod instead:
```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres -d interhub -c "SELECT version();"
```
## Password Hashing
IHP uses `pwstore-fast` (`Crypto.PasswordStore`) — **not bcrypt**. Hash format:
```
sha256|17|<base64-salt>|<base64-hash>
```
To generate a correct hash (requires GHC with pwstore-fast available on haskelseed):
```bash
ssh root@192.168.178.135
cat > /tmp/genhash.hs << 'EOF'
import qualified Crypto.PasswordStore as PS
import qualified Data.ByteString.Char8 as B8
main :: IO ()
main = do
h <- PS.makePassword (B8.pack "yourpassword") 17
B8.putStrLn h
EOF
/nix/store/yp23474ys67f1fd2z2ff1nn3q5wrmjng-ghc-9.10.3-with-packages/bin/runghc /tmp/genhash.hs
```
## haskelseed Build VM
- **Host:** 192.168.178.135
- **Access:** ops-bridge SSH path with the approved operator key
- **Role:** self-hosted Gitea Actions runner and Nix build machine only
- **Runner:** OpenRC `act_runner` service registered to `https://gitea.coulomb.social`
- **Build logs:** Gitea Actions logs and temporary runner work directories
- **Nix store:** `/dev/sdb1` (100 GB, mounted at `/nix`)