# inter-hub Production Deploy Runbook

## Architecture

- **Deployment cluster:** COULOMBCORE K3s (`92.205.130.254`) as observed from
  the haskelseed runner kube context on 2026-06-14.
- **Stale public DNS host:** `hub.coulomb.social` still resolved to
  `92.205.62.239` on 2026-06-14, which served the older API surface.
- **Namespace:** `inter-hub`
- **Image registry:** `gitea.coulomb.social/coulomb/inter-hub:<sha>`
- **Database:** CloudNativePG cluster `net-kingdom-pg` in `databases` namespace
  - RW endpoint: `net-kingdom-pg-rw.databases.svc.cluster.local:5432`
  - Database: `interhub`, User: `interhub`
- **Ingress:** Traefik → `hub.coulomb.social` (TLS via letsencrypt-prod)
- **Secrets:** `inter-hub-env` Secret in `inter-hub` namespace
- **App handoff:** `app.toml` points Railiance operators to
  `railiance-apps/charts/inter-hub` with values from
  `railiance-apps/helm/inter-hub-values.yaml`

## Public DNS Gate

The app deployment can be healthy while public smoke tests still fail if DNS
points `hub.coulomb.social` at the stale host. On 2026-06-14:

- Kubernetes reported image `gitea.coulomb.social/coulomb/inter-hub:6455902`
  ready in namespace `inter-hub` on node `92.205.130.254`.
- An in-cluster probe to `http://inter-hub:8000/api/v2/hubs` returned `401`,
  the response the v2 auth gate is expected to give.
- Forcing public TLS to the cluster ingress also returned `401`:
  `curl --resolve hub.coulomb.social:443:92.205.130.254 https://hub.coulomb.social/api/v2/hubs`.
- Normal DNS resolved `hub.coulomb.social` to `92.205.62.239`, where
  `/api/v2/hubs` returned `404` and OpenAPI lacked the bootstrap paths.

Before treating a deploy as failed, compare DNS and forced-ingress probes:

```bash
getent ahosts hub.coulomb.social
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/hubs
curl --resolve hub.coulomb.social:443:92.205.130.254 \
  -s -o /dev/null -w "%{http_code}" \
  https://hub.coulomb.social/api/v2/hubs
```

The public bootstrap gate passes when the DNS A record for
`hub.coulomb.social` points at the active ingress IP (`92.205.130.254`) or the
workflow kubeconfig is intentionally rotated to deploy to the cluster behind the
current DNS target.

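The comparison between the DNS probe and the forced-ingress probe can be sketched as a small helper. This is a minimal sketch, not something that exists in the repo: the function name `classify_gate` and its verdict strings are hypothetical, while the IP address and status codes are the ones observed on 2026-06-14.

```bash
# Hypothetical helper: classify the public DNS gate from three observations.
#   $1  IP that hub.coulomb.social currently resolves to
#   $2  HTTP status of the normal public probe of /api/v2/hubs
#   $3  HTTP status of the forced-ingress (--resolve) probe
ACTIVE_INGRESS_IP="92.205.130.254"   # active ingress IP named in this runbook

classify_gate() {
  local dns_ip="$1" public_code="$2" forced_code="$3"
  if [ "$forced_code" != "401" ]; then
    # Even the cluster ingress does not show the auth gate: look at the deploy.
    echo "deploy-problem"
  elif [ "$dns_ip" != "$ACTIVE_INGRESS_IP" ]; then
    # The cluster answers correctly, but public DNS points elsewhere.
    echo "stale-dns"
  elif [ "$public_code" = "401" ]; then
    echo "pass"
  else
    echo "public-path-problem"
  fi
}
```

With the 2026-06-14 observations, `classify_gate 92.205.62.239 404 401` prints `stale-dns`, which is the case where the deploy should not be treated as failed.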
## Deployment

Normal deployment is handled by Gitea Actions on push to `main`:

- runner labels: `self-hosted`, `haskelseed`
- build: `nix build .#docker`
- publish: `gitea.coulomb.social/coulomb/inter-hub:<short-sha>` and `latest`
- deploy: `helm upgrade --install inter-hub deploy/helm/inter-hub ...`
- smoke: public landing page and v2 auth gate

Manual deployment from this repo:

```bash
helm upgrade --install inter-hub deploy/helm/inter-hub \
  --namespace inter-hub --create-namespace \
  --set image.tag=<short-sha> \
  --wait --timeout 5m
```

Manual deployment through the Railiance app handoff chart:

```bash
helm upgrade --install inter-hub /home/worsch/railiance-apps/charts/inter-hub \
  --namespace inter-hub --create-namespace \
  -f /home/worsch/railiance-apps/helm/inter-hub-values.yaml \
  --set image.tag=<short-sha> \
  --wait --timeout 5m
```

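Because `<short-sha>` in the manual commands is a placeholder, a guard before running `helm` can catch a tag that was never filled in. The following is a minimal sketch under an assumption the runbook does not state as a rule: that image tags are abbreviated Git SHAs of 7 to 40 lowercase hex characters (as with `6455902`). The function names `valid_image_tag` and `image_ref` are hypothetical.

```bash
# Hypothetical guard: accept only tags that look like an abbreviated Git SHA.
valid_image_tag() {
  [[ "$1" =~ ^[0-9a-f]{7,40}$ ]]
}

# Print the full image reference for a valid tag, or fail with a message.
image_ref() {
  if ! valid_image_tag "$1"; then
    echo "invalid image tag: '$1'" >&2
    return 1
  fi
  printf 'gitea.coulomb.social/coulomb/inter-hub:%s\n' "$1"
}
```

For example, `image_ref "$(git rev-parse --short HEAD)"` prints the reference to deploy, while `image_ref '<short-sha>'` fails before anything reaches the cluster. The check would also reject `latest`, so it only fits deploys pinned to a SHA tag.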
## Image Build (on haskelseed)

```bash
ssh root@192.168.178.135
cd /root/inter-hub
# Build:
nix build .#docker --log-format raw > /tmp/build.log 2>&1

# Push:
SHA=$(git rev-parse --short HEAD)
TOKEN=$(curl -fsS \
  "https://gitea.coulomb.social/v2/token?service=container_registry&scope=repository:coulomb/inter-hub:push,pull" \
  -u "tegwick:<REGISTRY_TOKEN>" | awk -F'"' '/token/{print $4}')
skopeo copy --insecure-policy \
  --dest-registry-token "$TOKEN" \
  docker-archive:result \
  docker://gitea.coulomb.social/coulomb/inter-hub:$SHA
```

**Notes:**
- Haskelseed is a build/deploy runner, not the production app host.
- The IHP Nix Docker image may not have `/bin/sh`. Prefer Kubernetes-native
  checks from other pods or the database pod when possible.

## Gitea Registry Credentials

The deploy workflow uses the repository Actions secret `REGISTRY_TOKEN` to
request a short-lived registry bearer token from
`https://gitea.coulomb.social/v2/token`.

If publishing starts failing with an authentication error:
1. Generate or rotate a Gitea token with package write access.
2. Update the `REGISTRY_TOKEN` Actions secret for `coulomb/inter-hub`.
3. Rerun the workflow or push a non-production test commit.

Do not print token values in logs, State Hub, or commits.

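When debugging an authentication failure, it helps to confirm that a bearer token came back without ever echoing it. The sketch below reuses the `awk` extraction from the image build commands and assumes the token endpoint answers with a JSON body of the form `{"token":"..."}`, which that `awk` expression implies but this runbook does not show. The function names `extract_token` and `token_status` are hypothetical.

```bash
# Read the token endpoint's JSON response on stdin and print the token field.
# Same awk expression as in the image build commands.
extract_token() {
  awk -F'"' '/token/{print $4}'
}

# Report whether a token was received, without printing its value.
token_status() {
  local token
  token="$(extract_token)"
  if [ -n "$token" ]; then
    echo "token received"
  else
    echo "no token in response" >&2
    return 1
  fi
}
```

Piping the `curl -fsS ... /v2/token ...` response into `token_status` gives a log-safe pass/fail line. A failure here points at the `REGISTRY_TOKEN` secret, not at `skopeo`.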
## Runtime Secret Source

The live deployment currently consumes the Kubernetes Secret
`inter-hub/inter-hub-env`. The durable source file is:

```text
deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Create or refresh it from the live Secret using:

```bash
tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT

kubectl -n inter-hub get secret inter-hub-env -o json \
  | python3 deploy/railiance/secrets/k8s-secret-json-to-sops-input.py \
  > "$tmp"

sops --encrypt \
  --age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
  "$tmp" > deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Apply the encrypted source:

```bash
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml \
  | kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
```

Custody-backed recovery verification:

```bash
# after the approved custody unlock makes the age identity available
make recovery-drill
```

The drill prints UTC/local timestamps, verifies that the committed SOPS file can
be decrypted in memory, checks the expected Secret metadata and key names, and
does not print secret values. Keep the PASS output as non-secret recovery
evidence.

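The key-name check that the drill performs can be illustrated with a value-free comparison. This is only a sketch of the idea, not the logic behind `make recovery-drill`: the function name `missing_keys` is hypothetical, and so are any key names passed to it.

```bash
# Hypothetical helper: read actual key names on stdin (one per line) and print
# every expected key (given as arguments) that is absent. Values are never read.
missing_keys() {
  local actual key rc=0
  actual="$(cat)"
  for key in "$@"; do
    if ! grep -qxF -- "$key" <<<"$actual"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}
```

Assuming `jq` is available, the live key names could be fed in with `kubectl -n inter-hub get secret inter-hub-env -o json | jq -r '.data | keys[]' | missing_keys KEY_ONE KEY_TWO`, where `KEY_ONE` and `KEY_TWO` stand in for the real expected names. An empty output with exit status 0 means every expected key is present.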
## Database Migration

The current Nix production image is intentionally minimal: image metadata for
`6455902` points at
`/nix/store/<hash>-inter-hub/bin/RunProdServer`, and the package contains only
`RunProdServer` and `RunJobs`. It has no shell and no packaged migration
runner, so schema work is performed through the CloudNativePG pod.

Check schema state:
```bash
kubectl exec -n databases net-kingdom-pg-1 -- \
  psql -d interhub -Atc "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public';"
```

Initialize a blank production database from the canonical schema:
```bash
kubectl exec -i -n databases net-kingdom-pg-1 -- \
  psql -d interhub -v ON_ERROR_STOP=1 -1 -f - < Application/Schema.sql

kubectl exec -i -n databases net-kingdom-pg-1 -- \
  psql -d interhub -v ON_ERROR_STOP=1 -1 -f - < Application/Migration/1744502400-seed-type-registries.sql

kubectl exec -i -n databases net-kingdom-pg-1 -- psql -d interhub -v ON_ERROR_STOP=1 -1 -f - <<'SQL'
GRANT USAGE ON SCHEMA public TO interhub;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO interhub;
GRANT USAGE, SELECT, UPDATE ON ALL SEQUENCES IN SCHEMA public TO interhub;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO interhub;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO interhub;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT USAGE, SELECT, UPDATE ON SEQUENCES TO interhub;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT EXECUTE ON FUNCTIONS TO interhub;
SQL

kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
```

Do not apply `1744416000-seed-admin-user.sql` unattended in production; it uses
a documented default password intended for initial local deployment only.

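The schema-state check yields a single number, and the initialization steps are meant for a blank database only. A minimal sketch of that decision follows, assuming "blank" means zero tables in the `public` schema (the runbook does not define it more precisely). The function name `schema_action` is hypothetical.

```bash
# Hypothetical helper: decide what to do from the table count printed by the
# schema-state query. Prints "initialize" for a blank database, "skip" otherwise.
schema_action() {
  local count="$1"
  if ! [[ "$count" =~ ^[0-9]+$ ]]; then
    # psql errors or empty output should never be mistaken for "blank".
    echo "unexpected table count: '$count'" >&2
    return 2
  fi
  if [ "$count" -eq 0 ]; then
    echo "initialize"
  else
    echo "skip"
  fi
}
```

Passing the output of the `psql -Atc "SELECT count(*) ..."` command as the argument keeps `Application/Schema.sql` from being applied to a database that already has tables.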
## Logs

```bash
kubectl logs -n inter-hub -l app=inter-hub --tail=100 -f
# Previous pod logs:
kubectl logs -n inter-hub -l app=inter-hub --previous --tail=50
```

## Restart / Rollback

```bash
# Restart:
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub

# Roll back to the previous Deployment revision (previous image):
kubectl rollout undo deployment/inter-hub -n inter-hub

# Roll back to a specific Helm revision (revision 1 here; list revisions with
# `helm history inter-hub --namespace inter-hub`):
helm rollback inter-hub 1 --namespace inter-hub
```

## Secret Rotation

To rotate the session secret:
```bash
# Edit the encrypted file in place (sops opens the decrypted content in $EDITOR):
sops deploy/railiance/secrets/inter-hub.env.sops.yaml
# Apply the updated Secret, then restart the deployment:
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -
kubectl rollout restart deployment/inter-hub -n inter-hub
```

To rotate the database password:
1. Update the password in PostgreSQL (via `kubectl exec` to the CNPG pod).
2. Update the `inter-hub-env` Secret (edit the SOPS source and apply it, as for
   the session secret).
3. Restart the deployment.

## Smoke Test

```bash
getent ahosts hub.coulomb.social # expected: 92.205.130.254
curl -fsS https://hub.coulomb.social/ | grep "inter-hub"
curl -fsS https://hub.coulomb.social/api/v2/openapi.json >/dev/null
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/widgets | grep 401
curl -s -o /dev/null -w "%{http_code}" https://hub.coulomb.social/api/v2/hubs | grep 401
```

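The auth-gate checks pipe the status code through `grep 401`, which matches any output containing that substring. A stricter variant compares the code exactly and reports what it saw on failure. This is a minimal sketch, not the workflow's actual smoke step, and the function names `http_code` and `expect_code` are hypothetical.

```bash
# Print only the HTTP status code for a URL (same curl flags as the smoke test).
http_code() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Succeed only when the status code equals the expected value exactly.
expect_code() {
  local want="$1" url="$2" got
  got="$(http_code "$url")"
  if [ "$got" = "$want" ]; then
    echo "ok   $want $url"
  else
    echo "FAIL $url: expected $want, got $got" >&2
    return 1
  fi
}
```

For example, `expect_code 401 https://hub.coulomb.social/api/v2/hubs` fails with a message naming the actual code, which makes the stale-host `404` case visible at a glance.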
## Database Connection Check

The IHP Nix image has no `/bin/sh`. Connect via the CNPG pod instead:
```bash
kubectl exec -n databases net-kingdom-pg-1 -- psql -U postgres -d interhub -c "SELECT version();"
```

## Password Hashing

IHP uses `pwstore-fast` (`Crypto.PasswordStore`), **not bcrypt**. Hash format:
```text
sha256|17|<base64-salt>|<base64-hash>
```

To generate a correct hash (requires GHC with pwstore-fast available on haskelseed):
```bash
ssh root@192.168.178.135
cat > /tmp/genhash.hs << 'EOF'
import qualified Crypto.PasswordStore as PS
import qualified Data.ByteString.Char8 as B8
main :: IO ()
main = do
  h <- PS.makePassword (B8.pack "yourpassword") 17
  B8.putStrLn h
EOF
/nix/store/yp23474ys67f1fd2z2ff1nn3q5wrmjng-ghc-9.10.3-with-packages/bin/runghc /tmp/genhash.hs
```

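Before writing a hash into the database, its shape can be checked against the format above. This is a minimal sketch that validates only the layout of the four `|`-separated fields, not the hash itself. The function name `pwstore_strength` is hypothetical, and the base64 character class is an assumption based on the `<base64-salt>` and `<base64-hash>` placeholders.

```bash
# Hypothetical check: accept "sha256|<strength>|<salt>|<hash>" and print the
# strength field (the second argument given to makePassword, 17 above).
pwstore_strength() {
  local re='^sha256[|]([0-9]+)[|][A-Za-z0-9+/=]+[|][A-Za-z0-9+/=]+$'
  if [[ "$1" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "not a pwstore-fast hash" >&2
    return 1
  fi
}
```

A bcrypt string, which starts with a `$2` prefix, is rejected, so a hash generated with the wrong tool is caught before it locks a user out.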
## haskelseed Build VM

- **Host:** 192.168.178.135
- **Access:** ops-bridge SSH path with the approved operator key
- **Role:** self-hosted Gitea Actions runner and Nix build machine only
- **Runner:** OpenRC `act_runner` service registered to `https://gitea.coulomb.social`
- **Build logs:** Gitea Actions logs and temporary runner work directories
- **Nix store:** `/dev/sdb1` (100 GB, mounted at `/nix`)