diff --git a/workplans/IHUB-WP-0018-railiance01-deployment.md b/workplans/IHUB-WP-0018-railiance01-deployment.md
new file mode 100644
index 0000000..12b17b1
--- /dev/null
+++ b/workplans/IHUB-WP-0018-railiance01-deployment.md
@@ -0,0 +1,294 @@
+---
+id: IHUB-WP-0018
+type: workplan
+title: "Railiance01 Deployment — Production Operations Scaffold"
+domain: inter_hub
+repo: inter-hub
+status: open
+owner: custodian
+topic_slug: inter_hub
+created: "2026-04-29"
+updated: "2026-04-29"
+depends_on: IHUB-WP-0015
+state_hub_workstream_id: "080d841a-3acd-4adf-b684-2d1890a5e986"
+---
+
+# IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold
+
+## Goal
+
+Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic
+deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a
+Gitea Actions CI/CD pipeline. After this workplan, every push to `main`
+automatically builds an OCI container image on haskelseed, pushes it to the
+Railiance container registry, and deploys it, with automatic restart on node
+reboot guaranteed by K3s.
+
+## Background
+
+inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer
+and socat. That setup is a development convenience, not a production operations
+scaffold. The target is the Railiance01 K3s cluster, which has:
+
+- K3s (single-node for now; ThreePhoenix HA cluster is in progress)
+- Traefik ingress with TLS
+- PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
+- SOPS/age secret management
+- Gitea with built-in container registry (or separate registry service)
+- Staged Promotion Lifecycle CLI (`railiance run / deploy / promote / rollback`)
+
+**Key constraint:** This workplan depends on Railiance01 K3s being operational.
+Gate R3 verifies cluster readiness before any deployment work begins. If K3s
+or the container registry is not ready, this workplan blocks there and the
+cluster work must be completed first.
+
+**IHP specifics:** IHP DevServer is a development server. For production we
+build the IHP binary via `nix build` (which produces a self-contained binary)
+and wrap it in a minimal OCI image using Nix's `dockerTools.buildLayeredImage`. The
+app serves HTTP on port 8000; the socat workaround is not needed in Kubernetes
+since Traefik routes directly to the pod's port.
+
+## Architecture
+
+```
+git push → Gitea Actions
+  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
+  → helm upgrade inter-hub railiance-apps/helm/inter-hub
+      → Deployment (1 replica): inter-hub:$SHA + env from Secrets
+      → Service (ClusterIP :8000)
+      → Ingress (Traefik): hub.coulomb.social → Service
+      → PersistentVolumeClaim: /app/static (generated CSS/JS)
+      → PostgreSQL: database 'interhub' on railiance-platform HA cluster
+```
+
+## Tasks
+
+### R1 — Add OCI image build to flake.nix
+
+Add a `packages.docker` output to `flake.nix` using `pkgs.dockerTools.buildLayeredImage`.
+The image wraps the IHP production binary produced by `nix build .#default`.
+
+```nix
+packages.docker = pkgs.dockerTools.buildLayeredImage {
+  name = "inter-hub";
+  tag = "latest";
+  contents = [ self.packages.${system}.default pkgs.cacert ];
+  config = {
+    Cmd = [ "/bin/inter-hub" ];
+    ExposedPorts = { "8000/tcp" = {}; };
+    Env = [
+      "PORT=8000"
+      "IHP_ENV=Production"
+    ];
+  };
+};
+```
+
+Test locally on haskelseed:
+```bash
+nix build .#docker
+docker load < result
+docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
+```
+
+**Note:** First build pulls the full Haskell binary closure (~2 GB); subsequent
+builds are incremental (layer caching). Build must run on haskelseed, the only
+machine with the Nix store populated for GHC 9.10.3.
+
+### R2 — Verify container runs correctly
+
+On haskelseed, run the container image against the existing `interhub` database.
+Confirm:
+- `curl http://localhost:8000/` returns 200 (LandingAction)
+- `curl http://localhost:8000/api/v2/hubs` returns 401 (auth required)
+- Static assets load (Tailwind CSS present in image)
+- Container exits cleanly on SIGTERM
+
+If Tailwind CSS output (`static/app.css`) is not bundled into the Nix binary
+closure, add a pre-build step: run tailwindcss and include `static/` in the
+image via `dockerTools.buildLayeredImage` `contents` or a NixOS module.
+
+### R3 — Verify Railiance01 readiness (gate)
+
+This is a dependency gate. Before proceeding, confirm:
+
+```bash
+# From CoulombCore (execution origin):
+kubectl get nodes                                  # must show Ready
+kubectl get pods -n kube-system | grep traefik     # Traefik must be running
+kubectl get pods -n railiance-platform             # PostgreSQL HA pods
+```
+
+Also confirm:
+- Container registry is reachable from haskelseed (verify push access)
+- Registry address is known (e.g., `registry.coulomb.social` or `gitea.coulomb.social`)
+- SOPS/age key is present on CoulombCore at `~/.config/sops/age/keys.txt`
+
+If any check fails, block here and open the relevant Railiance workstream.
+Do not proceed until all checks pass.
+
+### R4 — Provision inter-hub database on railiance-platform
+
+On the PostgreSQL HA cluster, create the inter-hub database and user:
+
+```sql
+CREATE USER interhub WITH PASSWORD '';
+CREATE DATABASE interhub OWNER interhub;
+GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;
+```
+
+Run schema migration (IHP migrations) as part of the first deployment via an
+init container or a manual `migrate` run inside the pod. Document the
+migration procedure in `deploy/railiance/RUNBOOK.md`.
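The password in the `CREATE USER` statement is left blank in this plan. One way to fill it, sketched here under assumptions, is to generate a hex-only secret so it can be embedded in a connection URL without escaping. The helper names `gen_hex` and `build_database_url` are hypothetical and not part of the railiance tooling; the host, port, and database name are the ones this workplan uses for `DATABASE_URL`.

```bash
# Sketch only: helper names are illustrative, not an existing tool.

# Print NBYTES random bytes as lowercase hex (2 * NBYTES characters).
# Hex contains only [0-9a-f], so it is safe inside a URL as-is.
gen_hex() {
  head -c "$1" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Compose a PostgreSQL connection URL.
# Arguments: user password host port database
build_database_url() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example usage (values other than the generated secrets come from this plan):
#   db_password="$(gen_hex 24)"       # 48 hex characters
#   session_secret="$(gen_hex 32)"    # 64 hex characters
#   build_database_url interhub "$db_password" \
#     pgpool.railiance-platform.svc 5432 interhub
```

Generating 32 bytes yields the 64 hex characters that the `IHP_SESSION_SECRET` placeholder calls for, so the same helper can serve both secrets.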
+
+### R5 — SOPS-encrypted secrets
+
+Create `deploy/railiance/secrets/inter-hub.env.sops.yaml` with:
+
+```yaml
+# sops encrypted; do not edit manually
+DATABASE_URL: postgresql://interhub:@pgpool.railiance-platform.svc:5432/interhub
+IHP_SESSION_SECRET: <64-char-hex>
+IHP_BASEURL: https://hub.coulomb.social
+```
+
+Encrypt in place with the age key (redirecting output onto the input file would truncate it before sops reads it):
+```bash
+sops --encrypt --in-place --age "$(grep 'public key' ~/.config/sops/age/keys.txt | awk '{print $4}')" \
+  deploy/railiance/secrets/inter-hub.env.sops.yaml
+```
+
+Commit the encrypted file. The Gitea Actions workflow decrypts at deploy time
+using the age key from a Kubernetes Secret (bootstrapped once manually).
+
+### R6 — Helm chart in railiance-apps
+
+Create `helm/inter-hub/` in the `railiance-apps` repository following the
+Railiance app.toml contract. Minimal chart:
+
+```
+helm/inter-hub/
+  Chart.yaml          name: inter-hub, version: 0.1.0
+  values.yaml         image.tag, ingress.host, resources
+  values.prod.yaml    replicas: 1, resources.requests.memory: 1Gi
+  templates/
+    deployment.yaml   envFrom: secretRef inter-hub-env
+    service.yaml      ClusterIP :8000
+    ingress.yaml      Traefik annotations, TLS
+    secret.yaml       created by sops-operator or external-secrets
+```
+
+`app.toml` in the inter-hub repo root for railiance CLI integration:
+```toml
+[app]
+name = "inter-hub"
+slug = "inter-hub"
+kind = "native"
+registry = "registry.coulomb.social/coulomb/inter-hub"
+
+[deploy]
+chart = "railiance-apps/helm/inter-hub"
+namespace = "inter-hub"
+```
+
+### R7 — Gitea Actions CI/CD pipeline
+
+Create `.gitea/workflows/deploy.yaml` in the inter-hub repo:
+
+```yaml
+on:
+  push:
+    branches: [main]
+
+jobs:
+  build-and-deploy:
+    runs-on: ubuntu-latest  # or self-hosted if available
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Build OCI image on haskelseed
+        run: |
+          ssh haskelseed "cd /root/inter-hub && git pull && \
+            nix build .#docker && \
+            docker load < result && \
+            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
+            docker push $REGISTRY/inter-hub:${{ github.sha }}"
+
+      - name: Deploy to Railiance01
+        run: |
+          ssh coulombcore "helm upgrade --install inter-hub \
+            railiance-apps/helm/inter-hub \
+            --namespace inter-hub --create-namespace \
+            --set image.tag=${{ github.sha }} \
+            -f railiance-apps/helm/inter-hub/values.prod.yaml"
+```
+
+Secrets in Gitea: `REGISTRY` (must be exported to the build step's environment as `$REGISTRY`), `SSH_KEY_HASKELSEED`, `SSH_KEY_COULOMBCORE`.
+
+**Alternative if self-hosted runner is available on CoulombCore:** run the
+deploy step directly without the SSH hop to coulombcore.
+
+### R8 — Staged deployment and smoke test
+
+Follow the Railiance staged promotion lifecycle:
+
+1. **Local verify** (done in R2: container runs correctly)
+2. **Deploy to Railiance01:**
+   ```bash
+   railiance deploy inter-hub --tag
+   ```
+3. **Smoke test:**
+   ```bash
+   curl -s https://hub.coulomb.social/ | grep "Inter-Hub"   # Landing page
+   curl -s https://hub.coulomb.social/capabilities          # Capabilities
+   curl -H "Authorization: Bearer " \
+     https://hub.coulomb.social/api/v2/hubs                 # API (200)
+   curl https://hub.coulomb.social/api/v2/hubs              # Unauthenticated (401)
+   ```
+4. **Verify restart persistence:**
+   ```bash
+   kubectl rollout restart deployment/inter-hub -n inter-hub
+   kubectl rollout status deployment/inter-hub -n inter-hub
+   # Then re-run smoke test
+   ```
+
+### R9 — Document and register
+
+- Write `deploy/railiance/RUNBOOK.md`: image build, migration procedure,
+  secret rotation, rollback (`railiance rollback inter-hub`), log access
+  (`kubectl logs -n inter-hub -l app=inter-hub --tail=100`)
+- Add progress event to state hub
+- Remove haskelseed socat/OpenRC production role note from quickstart;
+  document it as the build machine only, not the production host
+
+## Exit Criteria
+
+- `https://hub.coulomb.social/` returns the Landing page (200, no auth)
+- `/api/v2/hubs` returns 401 unauthenticated, 200 with valid API key
+- All 12 IHF dashboards accessible after admin login
+- `kubectl rollout restart` followed by smoke test passes (K3s restart
+  persistence confirmed)
+- Gitea Actions pipeline: push to `main` → image built → deployed → smoke
+  test green within 15 minutes
+- No dependency on haskelseed being up for the app to *run* (only for builds)
+
+## Open Questions / Pre-flight Checks
+
+1. **K3s status**: ThreePhoenix HA cluster workstream is active but not complete.
+   Confirm whether Railiance01 is a single-node cluster already accepting
+   workloads or still being provisioned. Gate R3 is the go/no-go check.
+
+2. **Container registry**: Is Gitea's built-in registry available on Railiance01,
+   or is a separate registry service needed? If neither, add registry deployment
+   to the scope.
+
+3. **PostgreSQL HA status**: railiance-platform baseline workstream is active.
+   Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.
+
+4. **Static asset bundling**: The Nix production binary may or may not include
+   `static/app.css` (Tailwind output). Verify in R2 and adjust image build
+   if needed.
+
+5. **Anthropic API key**: Phase 5 AI-assisted distillation requires
+   `IHP_ANTHROPIC_API_KEY`. Add to SOPS secrets if the feature is to be
+   active on Railiance01.
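
The smoke checks from R8 and the exit criteria could be scripted so the pipeline can fail on a wrong status code rather than relying on manual `curl` inspection. This is a minimal sketch, not the project's actual test script: the function names and the `BASE_URL` and `API_KEY` variables are assumptions introduced here, and only the endpoints and expected status codes come from this workplan.

```bash
# Sketch only: function and variable names are illustrative.
BASE_URL="${BASE_URL:-https://hub.coulomb.social}"

# Print the HTTP status code for a URL; extra arguments are passed to curl.
http_status() {
  url="$1"; shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "$url"
}

# Compare an expected status with the actual one and report the result.
# Arguments: label expected actual. Returns non-zero on mismatch.
expect_status() {
  if [ "$2" = "$3" ]; then
    echo "ok   $1 ($3)"
  else
    echo "FAIL $1: expected $2, got $3"
    return 1
  fi
}

# Run the checks named in the exit criteria; succeed only if all pass.
run_smoke() {
  failures=0
  expect_status "landing page" 200 "$(http_status "$BASE_URL/")" \
    || failures=$((failures + 1))
  expect_status "api unauthenticated" 401 "$(http_status "$BASE_URL/api/v2/hubs")" \
    || failures=$((failures + 1))
  # The authenticated check runs only when an API key is supplied.
  if [ -n "${API_KEY:-}" ]; then
    expect_status "api authenticated" 200 \
      "$(http_status "$BASE_URL/api/v2/hubs" -H "Authorization: Bearer $API_KEY")" \
      || failures=$((failures + 1))
  fi
  [ "$failures" -eq 0 ]
}
```

Calling `run_smoke` after `kubectl rollout status` completes would cover the restart check as well. Keeping the comparison in `expect_status` separate from the network call makes the reporting logic checkable without cluster access.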