# inter-hub on Railiance01 — Runbook
## Architecture
- **Cluster:** Railiance01 (K3s, 92.205.62.239)
- **Namespace:** `inter-hub`
- **Image registry:** `gitea.coulomb.social/coulomb/inter-hub:<sha>`
- **Database:** CloudNativePG cluster `net-kingdom-pg` in `databases` namespace
  - RW endpoint: `net-kingdom-pg-rw.databases.svc.cluster.local:5432`
  - Database: `interhub`, User: `interhub`
- **Ingress:** Traefik → `hub.coulomb.social` (TLS via letsencrypt-prod)
- **Secrets:** `inter-hub-env` Secret in `inter-hub` namespace
- **App handoff:** `app.toml` points Railiance operators to
`railiance-apps/charts/inter-hub` with values from
`railiance-apps/helm/inter-hub-values.yaml`
## Deployment
Normal deployment is handled by Gitea Actions on push to `main`:
- runner labels: `self-hosted`, `haskelseed`
- build: `nix build .#docker`
- publish: `gitea.coulomb.social/coulomb/inter-hub:<short-sha>` and `latest`
- deploy: `helm upgrade --install inter-hub deploy/helm/inter-hub ...`
- smoke: public landing page and v2 auth gate
Manual deployment from this repo:
```bash
helm upgrade --install inter-hub deploy/helm/inter-hub \
--namespace inter-hub --create-namespace \
--set image.tag=<short-sha> \
--wait --timeout 5m
```
Manual deployment through the Railiance app handoff chart:
```bash
helm upgrade --install inter-hub /home/worsch/railiance-apps/charts/inter-hub \
--namespace inter-hub --create-namespace \
-f /home/worsch/railiance-apps/helm/inter-hub-values.yaml \
--set image.tag=<short-sha> \
--wait --timeout 5m
```
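Both manual commands above take `<short-sha>` as the image tag. A minimal sketch of a guard that rejects anything else (for example `latest` or an empty value) before Helm runs is shown below. The function names `is_short_sha` and `deploy_tag` are hypothetical and not part of this repo. The sketch also swaps `--set` for `--set-string`, on the assumption that an all-digit SHA could otherwise be interpreted by Helm as a number.

```bash
# Hypothetical helper: accept only a 7-40 character lowercase hex tag,
# which is what `git rev-parse --short HEAD` produces.
is_short_sha() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{7,40}$'
}

# Hypothetical wrapper around the manual deployment command from this repo.
deploy_tag() {
  local tag="$1"
  if ! is_short_sha "$tag"; then
    echo "refusing to deploy: '$tag' is not a short commit SHA" >&2
    return 1
  fi
  # --set-string keeps the tag a string even if it consists only of digits.
  helm upgrade --install inter-hub deploy/helm/inter-hub \
    --namespace inter-hub --create-namespace \
    --set-string image.tag="$tag" \
    --wait --timeout 5m
}
```

Usage would be `deploy_tag "$(git rev-parse --short HEAD)"` from a checkout of the commit whose image has already been published.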
## Image Build (on haskelseed)
```bash
ssh root@192.168.178.135
cd /root/inter-hub
# Build:
nix build .#docker --log-format raw > /tmp/build.log 2>&1
# Push:
SHA=$(git rev-parse --short HEAD)
TOKEN=$(curl -fsS \
  "https://gitea.coulomb.social/v2/token?service=container_registry&scope=repository:coulomb/inter-hub:push,pull" \
  -u "tegwick:<REGISTRY_TOKEN>" | awk -F'"' '/token/{print $4}')
skopeo copy --insecure-policy \
--dest-registry-token "$TOKEN" \
docker-archive:result \
docker://gitea.coulomb.social/coulomb/inter-hub:$SHA
```
**Notes:**
- Haskelseed is a build/deploy runner, not the production app host.
- The IHP Nix Docker image may not have `/bin/sh`. Prefer Kubernetes-native
checks from other pods or the database pod when possible.
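If the token request in the push step succeeds but the response is not shaped as expected, `TOKEN` ends up empty and `skopeo` fails with a less obvious error. The sketch below shows one way to extract the token and stop early when it is missing. `extract_token` is a hypothetical helper, and it assumes the response is a JSON object with a string field named `token`.

```bash
# Hypothetical helper: read a registry token response on stdin and print the
# value of its "token" field (assumed to be a plain JSON string).
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (not run here; <REGISTRY_TOKEN> is the placeholder from the push step):
#   TOKEN=$(curl -fsS \
#     "https://gitea.coulomb.social/v2/token?service=container_registry&scope=repository:coulomb/inter-hub:push,pull" \
#     -u "tegwick:<REGISTRY_TOKEN>" | extract_token)
#   : "${TOKEN:?registry token request returned no token}"
```

The `: "${TOKEN:?...}"` line aborts the shell with the given message when `TOKEN` is unset or empty, without printing the token itself.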
## Gitea Registry Credentials
The deploy workflow uses the repository Actions secret `REGISTRY_TOKEN` to
request a short-lived registry bearer token from
`https://gitea.coulomb.social/v2/token`.
If publishing starts failing with an authentication error:
1. Generate or rotate a Gitea token with package write access.
2. Update the `REGISTRY_TOKEN` Actions secret for `coulomb/inter-hub`.
3. Rerun the workflow or push a non-production test commit.
Do not print token values in logs, State Hub, or commits.
## Runtime Secret Source
The live deployment currently consumes the Kubernetes Secret
`inter-hub/inter-hub-env`. The durable source file is:
```text
deploy/railiance/secrets/inter-hub.env.sops.yaml
```
Create or refresh it from the live Secret using:
```bash
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT
kubectl -n inter-hub get secret inter-hub-env -o json \
| python3 deploy/railiance/secrets/k8s-secret-json-to-sops-input.py \
> "$tmp"
sops --encrypt \
--age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
"$tmp" > deploy/railiance/secrets/inter-hub.env.sops.yaml
```
Apply the encrypted source:
```bash
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml \
| kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
```
Custody-backed recovery verification:
```bash
# after the approved custody unlock makes the age identity available
make recovery-drill
```
The drill prints UTC/local timestamps, verifies that the committed SOPS file can
be decrypted in memory, checks the expected Secret metadata and key names, and
does not print secret values. Keep the PASS output as non-secret recovery
evidence.
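For ad-hoc checks outside the drill, the key names of a Secret manifest can be listed without ever printing the values. The sketch below is not the implementation behind `make recovery-drill`; `secret_key_names` is a hypothetical helper, and it assumes the manifest is plain YAML with single-line values under a top-level `data:` or `stringData:` block.

```bash
# Hypothetical helper: print only the key names found under a top-level
# `data:` or `stringData:` block of a Secret manifest read from stdin.
# Values are never printed. Multi-line (block scalar) values are not handled.
secret_key_names() {
  awk '
    /^(data|stringData):[ \t]*$/ { in_block = 1; next }   # block starts
    /^[^ \t]/                    { in_block = 0 }         # next top-level key
    in_block && /^[ \t]+[A-Za-z0-9_.-]+:/ {
      key = $1
      sub(/:.*/, "", key)
      print key
    }
  '
}

# Usage (not run here):
#   sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | secret_key_names
```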
## Database Migration
IHP migrations can be run from the production image when needed. Because the
image is Nix-built and may not contain a shell, first inspect the binary path:
```bash
kubectl exec -n inter-hub deploy/inter-hub -- find /nix/store -path '*inter-hub*/bin/RunProdServer'
kubectl exec -n inter-hub deploy/inter-hub -- /nix/store/<hash>-inter-hub/bin/RunProdServer migrate
```
To list the tables in `interhub` as a quick check that migrations were applied:
```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres interhub -c "\dt"
```
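The two-step migration procedure above (locate the binary, then run it) can be combined so the store hash does not have to be copied by hand. This is a minimal sketch: `find_prod_server` and `run_migrations` are hypothetical names, and it assumes `find` is present in the image, as the command above already does.

```bash
# Hypothetical helper: print the first RunProdServer path found in the pod.
find_prod_server() {
  kubectl exec -n inter-hub deploy/inter-hub -- \
    find /nix/store -path '*inter-hub*/bin/RunProdServer' | head -n 1
}

# Hypothetical helper: resolve the binary path, then run the migrate command
# from the runbook. Stops with an error if no binary was found.
run_migrations() {
  local bin
  bin="$(find_prod_server)"
  if [ -z "$bin" ]; then
    echo "RunProdServer not found under /nix/store" >&2
    return 1
  fi
  kubectl exec -n inter-hub deploy/inter-hub -- "$bin" migrate
}
```

Calling `run_migrations` from a shell with cluster access runs both steps and returns a non-zero status if the lookup comes back empty.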
## Logs
```bash
kubectl logs -n inter-hub -l app=inter-hub --tail=100 -f
# Previous pod logs:
kubectl logs -n inter-hub -l app=inter-hub --previous --tail=50
```
## Restart / Rollback
```bash
# Restart:
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
# Rollback to previous image:
kubectl rollout undo deployment/inter-hub -n inter-hub
# Rollback to a specific Helm revision (list revisions with `helm history`):
helm history inter-hub --namespace inter-hub
helm rollback inter-hub <revision> --namespace inter-hub
```
## Secret Rotation
To rotate the session secret:
```bash
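# Edit the encrypted file in place (opens $EDITOR) and replace the session secret value:
sops deploy/railiance/secrets/inter-hub.env.sops.yaml
# Apply the updated Secret and restart so the pods pick it up:
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub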
```
To rotate the database password:
1. Update the password in PostgreSQL (via `kubectl exec` into the CNPG pod).
2. Update the `inter-hub-env` Secret through its SOPS source file.
3. Restart the deployment.
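The database password steps above could be sketched as follows. This is an illustration under assumptions, not a tested procedure: `alter_role_sql` is a hypothetical helper, the new password is restricted to alphanumeric characters so it needs no SQL escaping, and the statement is piped to `psql` on stdin rather than passed on the command line.

```bash
# Hypothetical helper: print an ALTER ROLE statement for the given role and
# password. Rejects empty or non-alphanumeric passwords so that no quoting
# or escaping is required inside the SQL string literal.
alter_role_sql() {
  local role="$1" password="$2"
  case "$password" in
    ''|*[!A-Za-z0-9]*)
      echo "password must be non-empty and alphanumeric" >&2
      return 1
      ;;
  esac
  printf "ALTER ROLE %s WITH PASSWORD '%s';\n" "$role" "$password"
}

# Usage (not run here):
#   new_pw="$(openssl rand -hex 24)"
#   alter_role_sql interhub "$new_pw" \
#     | kubectl exec -i -n databases net-kingdom-pg-1 -- psql -U postgres -d interhub
#   sops deploy/railiance/secrets/inter-hub.env.sops.yaml   # store the new password
#   sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -
#   kubectl rollout restart deployment/inter-hub -n inter-hub
```

Between the `ALTER ROLE` and the restart, running pods may still hold open connections with the old credentials, so the steps are best run back to back.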
## Smoke Test
```bash
curl -fsS https://hub.coulomb.social/ | grep "inter-hub"
curl -fsS https://hub.coulomb.social/api/v2/openapi.json >/dev/null
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/widgets | grep 401
```
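The status check above matches `401` anywhere in the output. A stricter variant compares the whole status code and reports which URL failed. `expect_status` is a hypothetical helper, and the usage lines assume the landing page and the OpenAPI document both return 200.

```bash
# Hypothetical helper: fetch a URL and compare its HTTP status code with the
# expected one. Prints PASS on stdout, or FAIL on stderr with exit status 1.
expect_status() {
  local want="$1" url="$2" got
  got="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  if [ "$got" = "$want" ]; then
    echo "PASS $url -> $got"
  else
    echo "FAIL $url -> got '$got', want $want" >&2
    return 1
  fi
}

# Usage (not run here):
#   expect_status 200 https://hub.coulomb.social/ &&
#   expect_status 200 https://hub.coulomb.social/api/v2/openapi.json &&
#   expect_status 401 https://hub.coulomb.social/api/v2/widgets
```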
## Database Connection Check
The IHP Nix image may not include `/bin/sh`, so connect through the CNPG pod instead:
```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres -d interhub -c "SELECT version();"
```
## Password Hashing
IHP uses `pwstore-fast` (`Crypto.PasswordStore`) — **not bcrypt**. Hash format:
```text
sha256|17|<base64-salt>|<base64-hash>
```
To generate a correct hash (requires GHC with pwstore-fast available on haskelseed):
```bash
ssh root@192.168.178.135
cat > /tmp/genhash.hs << 'EOF'
import qualified Crypto.PasswordStore as PS
import qualified Data.ByteString.Char8 as B8
main :: IO ()
main = do
  h <- PS.makePassword (B8.pack "yourpassword") 17
  B8.putStrLn h
EOF
/nix/store/yp23474ys67f1fd2z2ff1nn3q5wrmjng-ghc-9.10.3-with-packages/bin/runghc /tmp/genhash.hs
```
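Before storing a generated value, its shape can be checked against the hash format shown above, which also catches a bcrypt hash pasted by mistake. This is a minimal sketch: `is_pwstore_hash` is a hypothetical helper that only validates the four `|`-separated fields and assumes standard base64 for salt and hash. It does not verify a password.

```bash
# Hypothetical helper: succeed if the argument looks like
# sha256|<strength>|<base64-salt>|<base64-hash>.
is_pwstore_hash() {
  printf '%s\n' "$1" \
    | grep -Eq '^sha256[|][0-9]+[|][A-Za-z0-9+/]+=*[|][A-Za-z0-9+/]+=*$'
}

# Usage (not run here):
#   is_pwstore_hash "$hash" || echo "not a pwstore-fast hash" >&2
```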
## haskelseed Build VM
- **Host:** 192.168.178.135
- **Access:** ops-bridge SSH path with the approved operator key
- **Role:** self-hosted Gitea Actions runner and Nix build machine only
- **Runner:** OpenRC `act_runner` service registered to `https://gitea.coulomb.social`
- **Build logs:** Gitea Actions logs and temporary runner work directories
- **Nix store:** `/dev/sdb1` (100 GB, mounted at `/nix`)