---
id: IHUB-WP-0018
type: workplan
title: "Railiance01 Deployment — Production Operations Scaffold"
domain: inter_hub
repo: inter-hub
status: active
owner: custodian
topic_slug: inter_hub
created: "2026-04-29"
updated: "2026-06-07"
depends_on: IHUB-WP-0015
state_hub_workstream_id: "080d841a-3acd-4adf-b684-2d1890a5e986"
---

# IHUB-WP-0018 - Railiance01 Deployment: Production Operations Scaffold

## Goal

Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a Gitea Actions CI/CD pipeline. After this workplan, every push to `main` automatically builds an OCI container image on haskelseed, pushes it to the Railiance container registry, and deploys it, with automatic restart on node reboot guaranteed by K3s.

## Background

inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer and socat. That setup is a development convenience, not a production operations scaffold.

The target is the Railiance01 K3s cluster, which has:

- K3s (single-node for now; ThreePhoenix HA cluster is in progress)
- Traefik ingress with TLS
- PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
- SOPS/age secret management
- Gitea with built-in container registry (or separate registry service)
- Staged Promotion Lifecycle CLI (`railiance run / deploy / promote / rollback`)

**Key constraint:** This workplan depends on Railiance01 K3s being operational. Gate R3 verifies cluster readiness before any deployment work begins: if K3s or the container registry is not ready, this workplan blocks there and the cluster work must be completed first.

**IHP specifics:** IHP DevServer is a development server. For production we build the IHP binary via `nix build` (which produces a self-contained binary) and wrap it in a minimal OCI image using Nix's `dockerTools.buildImage`.
The app serves HTTP on port 8000; the socat workaround is not needed in Kubernetes since Traefik routes directly to the pod's port.

## Architecture

```
git push → Gitea Actions
  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
  → helm upgrade inter-hub railiance-apps/helm/inter-hub
      → Deployment (1 replica): inter-hub:$SHA + env from Secrets
      → Service (ClusterIP :8000)
      → Ingress (Traefik): hub.coulomb.social → Service
      → PersistentVolumeClaim: /app/static (generated CSS/JS)
      → PostgreSQL: database 'interhub' on railiance-platform HA cluster
```

## Close-out Audit - 2026-06-04

WSJF triage flagged this workplan as a close-out candidate because State Hub had no indexed task rows for it. The deployment work is not complete; this file now contains explicit task blocks so the hub can track the remaining Railiance01 deployment work instead of treating the workplan as empty.

## Deployment Review - 2026-06-05

Review against the current repo and public Railiance endpoint shows the deployment scaffold is partially implemented but the live deployment is behind `origin/main`.

- `origin/main` is at `a3d980c`, which includes the completed ops-hub bootstrap API work from `IHUB-WP-0019`.
- `https://hub.coulomb.social/` returns 200 and serves inter-hub.
- The public OpenAPI only lists the older v2 endpoints; it does not include `/hubs`, `/hub-capability-manifests`, `/api-consumers`, or `/policy-scopes`.
- Unauthenticated `/api/v2/hubs` returns 404 publicly, while current source should route it and return 401. This means ops-hub bootstrap cannot run against production until the current image is deployed.
- The registry endpoint returns the expected unauthenticated `/v2/` 401 challenge, but this workspace does not have `kubectl`, so R3 cluster readiness cannot be fully verified from here.
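
The stale-versus-current distinction drawn in this review comes down to two HTTP status codes. The sketch below is illustrative only: `classify_deploy` and `http_status` are hypothetical helper names, not part of the repo or the runbook, and it assumes `curl` is installed wherever the probe runs.

```bash
# Classify a deployment from two HTTP status codes:
#   $1 = status of GET /              (landing page)
#   $2 = status of GET /api/v2/hubs   (unauthenticated)
classify_deploy() {
  if [ "$1" != "200" ]; then
    echo "down"       # landing page is not being served
  elif [ "$2" = "401" ]; then
    echo "current"    # route exists and demands authentication
  elif [ "$2" = "404" ]; then
    echo "stale"      # route missing: image predates the bootstrap API
  else
    echo "unknown"
  fi
}

# Print only the HTTP status code of a URL (defined here, not called at load time).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Usage (needs network access):
#   classify_deploy "$(http_status https://hub.coulomb.social/)" \
#                   "$(http_status https://hub.coulomb.social/api/v2/hubs)"
```

With the values observed on 2026-06-05 (200 and 404) the function prints `stale`, which matches the review's conclusion that the current image has not been deployed.
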
## Tasks

### R1 - Add OCI image build to flake.nix

```task
id: IHUB-WP-0018-T01
status: done
priority: high
state_hub_task_id: "27420bd7-0f70-4793-8805-393d8d5cacfd"
```

Add a `packages.docker` output to `flake.nix` using `pkgs.dockerTools.buildLayeredImage`. The image wraps the IHP production binary produced by `nix build .#default`.

```nix
packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    Cmd = [ "/bin/inter-hub" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [ "PORT=8000" "IHP_ENV=Production" ];
  };
};
```

Test locally on haskelseed:

```bash
nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
```

**Note:** First build pulls the full Haskell binary closure (~2 GB); subsequent builds are incremental (layer caching). Build must run on haskelseed - the only machine with the Nix store populated for GHC 9.10.3.

**Implementation note (2026-06-05):** `flake.nix` exposes `packages.docker = config.packages.unoptimized-docker-image`, the IHP-provided production OCI image used by the Railiance runbook. The original `buildLayeredImage` sketch is superseded by that IHP image path.

### R2 - Verify container runs correctly

```task
id: IHUB-WP-0018-T02
status: todo
priority: high
state_hub_task_id: "5ab45e4e-16bc-4feb-8b1b-e8eeb05bf39a"
```

On haskelseed, run the container image against the existing `interhub` database. Confirm:

- `curl http://localhost:8000/` returns 200 (LandingAction)
- `curl http://localhost:8000/api/v2/hubs` returns 401 (auth required)
- Static assets load (Tailwind CSS present in image)
- Container exits cleanly on SIGTERM

If Tailwind CSS output (`static/app.css`) is not bundled into the Nix binary closure, add a pre-build step: run tailwindcss and include `static/` in the image via `dockerTools.buildLayeredImage` `contents` or a NixOS module.
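
For the SIGTERM check in R2, the container's exit code distinguishes the possible outcomes. `docker stop` sends SIGTERM and falls back to SIGKILL after its grace period, and exit codes above 128 follow the `128 + signal number` convention. A minimal sketch, assuming the container is started with a name (the hypothetical `inter-hub-test`) and without `--rm`, so the exit code can still be inspected afterwards; `describe_exit` is a hypothetical helper.

```bash
# Map a container exit code to a verdict for the SIGTERM check.
describe_exit() {
  case "$1" in
    0)   echo "clean" ;;        # handled SIGTERM and exited normally
    143) echo "terminated" ;;   # 128 + 15: ended by SIGTERM itself
    137) echo "killed" ;;       # 128 + 9: SIGKILL, e.g. after the stop grace period
    *)   echo "error" ;;        # any other code: the app failed on its own
  esac
}

# Usage on haskelseed (container name is an assumption):
#   docker stop inter-hub-test
#   describe_exit "$(docker inspect --format '{{.State.ExitCode}}' inter-hub-test)"
```

A `killed` verdict suggests the process ignored SIGTERM, which would also delay pod shutdown in Kubernetes by the termination grace period.
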
### R3 - Verify Railiance01 readiness (gate)

```task
id: IHUB-WP-0018-T03
status: blocked
priority: high
state_hub_task_id: "79b5cf2c-3a5b-4b4b-8f84-f635cb6891c1"
```

This is a dependency gate. Before proceeding, confirm:

```bash
# From CoulombCore (execution origin):
kubectl get nodes                                # must show Ready
kubectl get pods -n kube-system | grep traefik   # Traefik must be running
kubectl get pods -n railiance-platform           # PostgreSQL HA pods
```

Also confirm:

- Container registry is reachable from haskelseed (verify push access)
- Registry address (e.g., `registry.coulomb.social` or `gitea.coulomb.social`)
- SOPS/age key is present on CoulombCore at `~/.config/sops/age/keys.txt`

If any check fails, block here and open the relevant Railiance workstream. Do not proceed until all checks pass.

**Review note (2026-06-05):** Public smoke probes show `https://hub.coulomb.social/` returning 200 and the Gitea registry `/v2/` endpoint returning the expected unauthenticated 401 challenge. Full R3 remains blocked from this workspace because `kubectl` is not available here, and the live app is not serving the current `origin/main` v2 bootstrap routes.

### R4 - Provision inter-hub database on railiance-platform

```task
id: IHUB-WP-0018-T04
status: blocked
priority: high
state_hub_task_id: "c937cf36-3850-4ab3-aa83-2d846e1a378e"
```

On the PostgreSQL HA cluster, create the inter-hub database and user:

```sql
-- The password literal is blank in this plan; supply the real value when running this.
CREATE USER interhub WITH PASSWORD '';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;
```

Run the schema migration (IHP migrations) as part of the first deployment, either via an init container or a manual `migrate` run inside the pod. Document the migration procedure in `deploy/railiance/RUNBOOK.md`.
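
R4 needs a database password and R5 a 64-character hex session secret, but the workplan does not say how either value is generated. One possible approach, offered as an assumption rather than the project's procedure, is to read random bytes from `/dev/urandom` and hex-encode them; `random_hex` is a hypothetical helper name.

```bash
# Print N random bytes as lowercase hex (2*N characters, no trailing newline).
random_hex() {
  # od -An -tx1 emits space-separated hex bytes; tr strips spaces and newlines.
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage:
#   random_hex 32    # 64 hex characters
```

`random_hex 32` yields 64 hex characters, which matches the `<64-char-hex>` placeholder in R5. A hex-only password also needs no percent-encoding when it is embedded in `DATABASE_URL`.
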
### R5 - SOPS-encrypted secrets

```task
id: IHUB-WP-0018-T05
status: blocked
priority: high
state_hub_task_id: "926f82d1-15cd-425d-8a41-3d6b51c07f0b"
```

Create `deploy/railiance/secrets/inter-hub.env.sops.yaml` with:

```yaml
# sops encrypted - do not edit manually
DATABASE_URL: postgresql://interhub:@pgpool.railiance-platform.svc:5432/interhub
IHP_SESSION_SECRET: <64-char-hex>
IHP_BASEURL: https://hub.coulomb.social
```

Encrypt with the age key:

```bash
# Encrypt in place: redirecting output onto the input file would truncate it first
sops --encrypt --in-place \
  --age "$(grep 'public key' ~/.config/sops/age/keys.txt | awk '{print $4}')" \
  deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Commit the encrypted file. The Gitea Actions workflow decrypts at deploy time using the age key from a Kubernetes Secret (bootstrapped once manually).

### R6 - Helm chart in railiance-apps

```task
id: IHUB-WP-0018-T06
status: in_progress
priority: high
state_hub_task_id: "4c4acc98-5773-4289-ad57-03f3fd5c381c"
```

Create `helm/inter-hub/` in the `railiance-apps` repository following the Railiance app.toml contract. Minimal chart:

```
helm/inter-hub/
  Chart.yaml          name: inter-hub, version: 0.1.0
  values.yaml         image.tag, ingress.host, resources
  values.prod.yaml    replicas: 1, resources.requests.memory: 1Gi
  templates/
    deployment.yaml   envFrom: secretRef inter-hub-env
    service.yaml      ClusterIP :8000
    ingress.yaml      Traefik annotations, TLS
    secret.yaml       created by sops-operator or external-secrets
```

`app.toml` in the inter-hub repo root for railiance CLI integration:

```toml
[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "registry.coulomb.social/coulomb/inter-hub"

[deploy]
chart = "railiance-apps/helm/inter-hub"
namespace = "inter-hub"
```

**Implementation note (2026-06-05):** A Helm chart exists in `deploy/helm/inter-hub/` with Deployment, Service, Ingress, and values for the current Gitea registry and `hub.coulomb.social`.
Remaining gaps: no repo-root `app.toml`, no committed SOPS secret manifest, and no separate `railiance-apps/helm/inter-hub` handoff in this repo.

### R7 - Gitea Actions CI/CD pipeline

```task
id: IHUB-WP-0018-T07
status: blocked
priority: medium
state_hub_task_id: "ec25c67c-3cb0-4534-9fb0-9bd6578a2def"
```

Create `.gitea/workflows/deploy.yaml` in the inter-hub repo:

```yaml
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest  # or self-hosted if available
    steps:
      - uses: actions/checkout@v4
      - name: Build OCI image on haskelseed
        env:
          REGISTRY: ${{ secrets.REGISTRY }}  # expanded on the runner before ssh runs
        run: |
          ssh haskelseed "cd /root/inter-hub && git pull && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"
      - name: Deploy to Railiance01
        run: |
          ssh coulombcore "helm upgrade --install inter-hub \
            railiance-apps/helm/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub/values.prod.yaml"
```

Secrets in Gitea: `REGISTRY`, `SSH_KEY_HASKELSEED`, `SSH_KEY_COULOMBCORE`.

**Alternative if self-hosted runner is available on CoulombCore:** run the deploy step directly without the SSH hop to coulombcore.

**Implementation note (2026-06-05):** `.gitea/workflows/deploy.yaml` exists and builds `.#docker` on a self-hosted `haskelseed` runner, pushes to `92.205.130.254:32166/coulomb/inter-hub`, deploys with Helm, and smoke-tests the public endpoint. Remote `main` is already current, but production is still serving an older API surface, so the workflow needs an attended rerun/inspection or a new deployment trigger.

**Runner substrate finding (2026-06-07):** Pushed commits `fa96fb8` and `7cc3173` to trigger the workflow, but public `/api/v2/hubs` remained `404` while `/` stayed `200`, indicating the current image was not deployed.
Repo search shows `railiance-forge` owns the Actions runner substrate, but its 2026-06-05 migration plan explicitly lists "No Actions runner deployment" as a non-goal, and no runner manifest, script, or workplan exists there yet. `haskelseed` itself is reachable on SSH and historical port 8080, but this workspace cannot authenticate non-interactively. Treat R7 as blocked on a forge-owned runner prerequisite rather than continuing to push commits as deployment probes.

### R8 - Staged deployment and smoke test

```task
id: IHUB-WP-0018-T08
status: blocked
priority: high
state_hub_task_id: "2b02ae5c-47b9-4f09-88f0-a4af7900b38f"
```

Follow the Railiance staged promotion lifecycle:

1. **Local verify** (done in R2: container runs correctly)
2. **Deploy to Railiance01:**
   ```bash
   railiance deploy inter-hub --tag
   ```
3. **Smoke test:**
   ```bash
   curl -s https://hub.coulomb.social/ | grep "Inter-Hub"    # Landing page
   curl -s https://hub.coulomb.social/capabilities           # Capabilities
   # Print only the HTTP status code for the two API checks
   curl -s -o /dev/null -w '%{http_code}\n' -H "Authorization: Bearer " \
     https://hub.coulomb.social/api/v2/hubs                  # API (200)
   curl -s -o /dev/null -w '%{http_code}\n' \
     https://hub.coulomb.social/api/v2/hubs                  # Unauthenticated (401)
   ```
4. **Verify restart persistence:**
   ```bash
   kubectl rollout restart deployment/inter-hub -n inter-hub
   kubectl rollout status deployment/inter-hub -n inter-hub
   # Then re-run smoke test
   ```

### R9 - Document and register

```task
id: IHUB-WP-0018-T09
status: in_progress
priority: medium
state_hub_task_id: "4d1e55c7-8dbb-480f-b07b-6c5e39a04218"
```

- Write `deploy/railiance/RUNBOOK.md`: image build, migration procedure, secret rotation, rollback (`railiance rollback inter-hub`), log access (`kubectl logs -n inter-hub -l app=inter-hub --tail=100`)
- Add progress event to state hub
- Remove the haskelseed socat/OpenRC production role note from the quickstart; document haskelseed as the build machine only, not the production host

**Implementation note (2026-06-05):** `deploy/railiance/RUNBOOK.md` exists and documents architecture, image build/push, Helm deployment, logs, restart, rollback, secret rotation, and smoke checks. The deployment record remains incomplete until current `main` is running and the ops-hub bootstrap smoke test passes against production.

## Exit Criteria

- `https://hub.coulomb.social/` returns the Landing page (200, no auth)
- `/api/v2/hubs` returns 401 unauthenticated, 200 with valid API key
- All 12 IHF dashboards accessible after admin login
- `kubectl rollout restart` followed by smoke test passes (K3s restart persistence confirmed)
- Gitea Actions pipeline: push to `main` → image built → deployed → smoke test green within 15 minutes
- No dependency on haskelseed being up for the app to *run* (only for builds)

## Open Questions / Pre-flight Checks

1. **K3s status**: ThreePhoenix HA cluster workstream is active but not complete. Confirm whether Railiance01 is a single-node cluster already accepting workloads or still being provisioned. Gate R3 is the go/no-go check.
2. **Container registry**: Is Gitea's built-in registry available on Railiance01, or is a separate registry service needed? If neither, add registry deployment to the scope.
3. **PostgreSQL HA status**: railiance-platform baseline workstream is active. Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.
4. **Static asset bundling**: The Nix production binary may or may not include `static/app.css` (Tailwind output). Verify in R2 and adjust image build if needed.
5. **Anthropic API key**: Phase 5 AI-assisted distillation requires `IHP_ANTHROPIC_API_KEY`. Add to SOPS secrets if the feature is to be active on Railiance01.
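
Pre-flight check 1 and gate R3 both come down to whether every node reports `Ready`. A minimal sketch of that check as a filter over `kubectl get nodes --no-headers` output: `all_nodes_ready` is a hypothetical helper, and it assumes the default column layout in which STATUS is the second column.

```bash
# Read `kubectl get nodes --no-headers` output on stdin.
# Succeeds only if at least one node is listed and every STATUS is exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") counts as not ready under this strict match.
all_nodes_ready() {
  awk '
    NF == 0       { next }      # skip blank lines
                  { seen = 1 }  # at least one node row was read
    $2 != "Ready" { bad = 1 }   # any other status fails the gate
    END           { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Usage from CoulombCore:
#   kubectl get nodes --no-headers | all_nodes_ready && echo "R3 node check: go"
```

Empty input fails the check on purpose, so an unreachable cluster or an empty node list is never read as a pass.
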