---
id: IHUB-WP-0018
type: workplan
title: "Railiance01 Deployment — Production Operations Scaffold"
domain: inter_hub
repo: inter-hub
status: finished
owner: custodian
topic_slug: inter_hub
created: "2026-04-29"
updated: "2026-06-14"
depends_on: IHUB-WP-0015
state_hub_workstream_id: "080d841a-3acd-4adf-b684-2d1890a5e986"
---

# IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold

## Goal

Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a Gitea Actions CI/CD pipeline. After this workplan, every push to `main` automatically builds an OCI container image on haskelseed, pushes it to the Railiance container registry, and deploys it — with automatic restart on node reboot guaranteed by K3s.

## Background

inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer and socat. That setup is a development convenience, not a production operations scaffold. The target is the Railiance01 K3s cluster, which has:

- K3s (single-node for now; ThreePhoenix HA cluster is in progress)
- Traefik ingress with TLS
- PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
- SOPS/age secret management
- Gitea with built-in container registry (or separate registry service)
- Staged Promotion Lifecycle CLI (`railiance run / deploy / promote / rollback`)

**Key constraint:** This workplan depends on Railiance01 K3s being operational. Gate R3 verifies cluster readiness before any deployment work begins — if K3s or the container registry is not ready, this workplan blocks there and the cluster work must be completed first.

**IHP specifics:** IHP DevServer is a development server. For production we build the IHP binary via `nix build` (which produces a self-contained binary) and wrap it in a minimal OCI image using Nix's `dockerTools.buildImage`. The app serves HTTP on port 8000; the socat workaround is not needed in Kubernetes since Traefik routes directly to the pod's port.

## Architecture

```
git push
  → Gitea Actions
      → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
      → helm upgrade inter-hub railiance-apps/helm/inter-hub
          → Deployment (1 replica): inter-hub:$SHA + env from Secrets
          → Service (ClusterIP :8000)
          → Ingress (Traefik): hub.coulomb.social → Service
          → PersistentVolumeClaim: /app/static (generated CSS/JS)
          → PostgreSQL: database 'interhub' on railiance-platform HA cluster
```

## Close-out Audit - 2026-06-04

WSJF triage flagged this workplan as a close-out candidate because State Hub had no indexed task rows for it. The deployment work is not complete; this file now contains explicit task blocks so the hub can track the remaining Railiance01 deployment work instead of treating the workplan as empty.

## Deployment Review - 2026-06-05

Review against the current repo and public Railiance endpoint shows the deployment scaffold is partially implemented but the live deployment is behind `origin/main`.

- `origin/main` is at `a3d980c`, which includes the completed ops-hub bootstrap API work from `IHUB-WP-0019`.
- `https://hub.coulomb.social/` returns 200 and serves inter-hub.
- The public OpenAPI only lists the older v2 endpoints; it does not include `/hubs`, `/hub-capability-manifests`, `/api-consumers`, or `/policy-scopes`.
- Unauthenticated `/api/v2/hubs` returns 404 publicly, while current source should route it and return 401. This means ops-hub bootstrap cannot run against production until the current image is deployed.
- The registry endpoint returns the expected unauthenticated `/v2/` 401 challenge, but this workspace does not have `kubectl`, so R3 cluster readiness cannot be fully verified from here.

## Tasks

### R1 - Add OCI image build to flake.nix

```task
id: IHUB-WP-0018-T01
status: done
priority: high
state_hub_task_id: "27420bd7-0f70-4793-8805-393d8d5cacfd"
```

Add a `packages.docker` output to `flake.nix` using `pkgs.dockerTools.buildLayeredImage`. The image wraps the IHP production binary produced by `nix build .#default`.

```nix
packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    Cmd = [ "/bin/inter-hub" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [ "PORT=8000" "IHP_ENV=Production" ];
  };
};
```

Test locally on haskelseed:

```bash
nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
```

**Note:** First build pulls the full Haskell binary closure (~2 GB); subsequent builds are incremental (layer caching). Build must run on haskelseed - the only machine with the Nix store populated for GHC 9.10.3.

**Implementation note (2026-06-05):** `flake.nix` exposes `packages.docker = config.packages.unoptimized-docker-image`, the IHP-provided production OCI image used by the Railiance runbook. The original `buildLayeredImage` sketch is superseded by that IHP image path.

### R2 — Verify container runs correctly

```task
id: IHUB-WP-0018-T02
status: done
priority: high
state_hub_task_id: "5ab45e4e-16bc-4feb-8b1b-e8eeb05bf39a"
```

On haskelseed, run the container image against the existing `interhub` database. Confirm:

- `curl http://localhost:8000/` returns 200 (LandingAction)
- `curl http://localhost:8000/api/v2/hubs` returns 401 (auth required)
- Static assets load (Tailwind CSS present in image)
- Container exits cleanly on SIGTERM

If Tailwind CSS output (`static/app.css`) is not bundled into the Nix binary closure, add a pre-build step: run tailwindcss and include `static/` in the image via `dockerTools.buildLayeredImage` `contents` or a NixOS module.

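The SIGTERM check from the R2 list can be scripted. The following is a minimal sketch, not the procedure that was actually used: `exits_within_grace` is a hypothetical helper, and the grace period, the startup delay, and the stand-in child process are all assumptions. For the real check the command would be the `docker run` invocation from R1, relying on the Docker CLI forwarding the signal to the container, which it does by default when no TTY is attached.

```python
import signal
import subprocess
import sys
import time


def exits_within_grace(cmd, grace_seconds=10.0, startup_delay=0.0):
    """Start cmd, send it SIGTERM, and return True if it exits within the grace period.

    Only the timing is checked; the exit code itself is not inspected.
    """
    proc = subprocess.Popen(cmd)
    try:
        time.sleep(startup_delay)  # let the process finish starting before signalling
        proc.send_signal(signal.SIGTERM)
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        return False
    finally:
        if proc.poll() is None:
            proc.kill()  # do not leave a stuck process behind
            proc.wait()


# Stand-in for the container (assumption): a child that would otherwise sleep for a minute.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(60)"]
```

A call such as `exits_within_grace(STAND_IN, grace_seconds=5.0)` exercises the helper without Docker; a container that ignores SIGTERM would instead run into the timeout and be killed.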
### R3 — Verify Railiance01 readiness (gate)

```task
id: IHUB-WP-0018-T03
status: done
priority: high
state_hub_task_id: "79b5cf2c-3a5b-4b4b-8f84-f635cb6891c1"
```

This is a dependency gate. Before proceeding, confirm:

```bash
# From CoulombCore (execution origin):
kubectl get nodes                               # must show Ready
kubectl get pods -n kube-system | grep traefik  # Traefik must be running
kubectl get pods -n railiance-platform          # PostgreSQL HA pods
```

Also confirm:

- Container registry is reachable from haskelseed (verify push access)
- The registry address is known (e.g., `registry.coulomb.social` or `gitea.coulomb.social`)
- SOPS/age key is present on CoulombCore at `~/.config/sops/age/keys.txt`

If any check fails, block here and open the relevant Railiance workstream. Do not proceed until all checks pass.

**Review note (2026-06-05):** Public smoke probes show `https://hub.coulomb.social/` returning 200 and the Gitea registry `/v2/` endpoint returning the expected unauthenticated 401 challenge. Full R3 remains blocked from this workspace because `kubectl` is not available here, and the live app is not serving the current `origin/main` v2 bootstrap routes.

**Recovery note (2026-06-14):** Re-established the haskelseed ops-bridge path and verified the runner substrate before deployment. `make runner-status` in `railiance-forge` confirmed `act_runner` is registered to `https://gitea.coulomb.social`, running under OpenRC, and has the expected self-hosted labels and build/deploy tools. The K3s API path, Helm deploy path, and Gitea registry host were exercised successfully by the production rollout.

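The node check in the gate above can be made mechanical. This is a minimal sketch under assumptions: it expects the text printed by `kubectl get nodes --no-headers`, the helper names are hypothetical, and treating a cordoned node (`Ready,SchedulingDisabled`) as a failure is a design choice of this sketch rather than something the workplan specifies.

```python
def not_ready_nodes(kubectl_output):
    """Return the names of nodes whose STATUS column is not exactly 'Ready'.

    Expects the output of `kubectl get nodes --no-headers`, where the first
    two whitespace-separated columns are NAME and STATUS.
    """
    flagged = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[1]
        # STATUS may be a comma-separated list such as "Ready,SchedulingDisabled".
        if status != "Ready":
            flagged.append(name)
    return flagged


def gate_passes(kubectl_output):
    """The gate passes only if at least one node is listed and none is flagged."""
    listed = [line for line in kubectl_output.splitlines() if line.split()]
    return bool(listed) and not not_ready_nodes(kubectl_output)
```

Requiring at least one listed node means an empty response (for example, from a failed API call whose error went to stderr) blocks the gate instead of passing it.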
### R4 — Provision inter-hub database on railiance-platform

```task
id: IHUB-WP-0018-T04
status: done
priority: high
state_hub_task_id: "c937cf36-3850-4ab3-aa83-2d846e1a378e"
```

On the PostgreSQL HA cluster, create the inter-hub database and user:

```sql
CREATE USER interhub WITH PASSWORD '';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;
```

Run schema migration (IHP migrations) as part of the first deployment via an init container or a manual `migrate` run inside the pod. Document the migration procedure in `deploy/railiance/RUNBOOK.md`.

**Recovery note (2026-06-14):** Bootstrapped the production database manually on the Railiance PostgreSQL cluster: role `interhub`, database `interhub`, schema ownership, and privileges were created/updated. The running deployment now uses that database through the `inter-hub-env` Kubernetes Secret.

### R5 — SOPS-encrypted secrets

```task
id: IHUB-WP-0018-T05
status: done
priority: high
state_hub_task_id: "926f82d1-15cd-425d-8a41-3d6b51c07f0b"
```

Create `deploy/railiance/secrets/inter-hub.env.sops.yaml` with:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: inter-hub-env
  namespace: inter-hub
type: Opaque
stringData:
  DATABASE_URL: postgresql://interhub:@net-kingdom-pg-rw.databases.svc.cluster.local:5432/interhub?sslmode=disable
  IHP_SESSION_SECRET: <64-char-hex>
  IHP_BASEURL: https://hub.coulomb.social
  PORT: "8000"
  IHP_ENV: Production
```

Encrypt with the age key:

```bash
sops --encrypt \
  --age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
  /tmp/inter-hub-env.yaml > deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Commit only the encrypted file. Apply it with `sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -`.

**Recovery note (2026-06-14):** Runtime secrets were bootstrapped manually in Kubernetes so production could deploy safely. This task remains in progress until the durable SOPS-encrypted source for `DATABASE_URL`, `IHP_SESSION_SECRET`, and related runtime env is committed and wired into the deploy path.

**Progress note (2026-06-14):** Added repo root `.sops.yaml`, plaintext guardrails under `deploy/railiance/secrets/`, an example Secret manifest, and `k8s-secret-json-to-sops-input.py` to convert the live Kubernetes Secret into a SOPS-ready manifest without printing values. At that point the encrypted source file was still pending because local `sops` tooling was not available.

**Completion note (2026-06-14):** Created `deploy/railiance/secrets/inter-hub.env.sops.yaml` from the live `inter-hub/inter-hub-env` Kubernetes Secret using temporary `sops` v3.13.1 and the shared Railiance age recipient. Verified the file is SOPS-encrypted, parses as YAML, leaves only non-secret metadata reviewable, and does not contain the checked plaintext runtime markers. Decryption/apply verification remains a custody-backed operator capability because the private age identity is not present in the normal workstation or haskelseed shell.

### R6 — Helm chart in railiance-apps

```task
id: IHUB-WP-0018-T06
status: done
priority: high
state_hub_task_id: "4c4acc98-5773-4289-ad57-03f3fd5c381c"
```

Create `charts/inter-hub/` in the `railiance-apps` repository following the Railiance app.toml contract.

Minimal chart:

```
charts/inter-hub/
  Chart.yaml                   name: inter-hub, version: 0.1.0
  values.yaml                  image.tag, ingress.host, resources
  helm/inter-hub-values.yaml   production non-secret overrides
  templates/
    deployment.yaml            envFrom: secretRef inter-hub-env
    service.yaml               ClusterIP :8000
    ingress.yaml               Traefik annotations, TLS
```

`app.toml` in the inter-hub repo root for railiance CLI integration:

```toml
[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "gitea.coulomb.social/coulomb/inter-hub"

[deploy]
chart = "railiance-apps/charts/inter-hub"
namespace = "inter-hub"
```

**Implementation note (2026-06-05):** A Helm chart exists in `deploy/helm/inter-hub/` with Deployment, Service, Ingress, and values for the current Gitea registry and `hub.coulomb.social`. Remaining gaps: no repo-root `app.toml`, no committed SOPS secret manifest, and no separate `railiance-apps/helm/inter-hub` handoff in this repo.

**Recovery note (2026-06-14):** The local chart under `deploy/helm/inter-hub/` successfully deployed the app to Railiance01. This task remains in progress because the repo-root `app.toml` and railiance-apps handoff are still not completed.

**Completion note (2026-06-14):** Added repo-root `app.toml` in inter-hub and added `charts/inter-hub`, `helm/inter-hub-values.yaml`, Makefile targets, and server-dry-run coverage in `railiance-apps`. The chart rendered successfully on haskelseed with `helm template`.

### R7 — Gitea Actions CI/CD pipeline

```task
id: IHUB-WP-0018-T07
status: done
priority: medium
state_hub_task_id: "ec25c67c-3cb0-4534-9fb0-9bd6578a2def"
```

Create `.gitea/workflows/deploy.yaml` in the inter-hub repo:

```yaml
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest  # or self-hosted if available
    env:
      # Expanded on the runner, before the command string is sent over SSH
      REGISTRY: ${{ secrets.REGISTRY }}
    steps:
      - uses: actions/checkout@v4
      - name: Build OCI image on haskelseed
        run: |
          ssh haskelseed "cd /root/inter-hub && git pull && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"
      - name: Deploy to Railiance01
        run: |
          ssh coulombcore "helm upgrade --install inter-hub \
            railiance-apps/helm/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub/values.prod.yaml"
```

Secrets in Gitea: `REGISTRY`, `SSH_KEY_HASKELSEED`, `SSH_KEY_COULOMBCORE`.

**Alternative if self-hosted runner is available on CoulombCore:** run the deploy step directly without the SSH hop to coulombcore.

**Implementation note (2026-06-05):** `.gitea/workflows/deploy.yaml` exists and builds `.#docker` on a self-hosted `haskelseed` runner, pushes to `92.205.130.254:32166/coulomb/inter-hub`, deploys with Helm, and smoke-tests the public endpoint. Remote `main` is already current, but production is still serving an older API surface, so the workflow needs an attended rerun/inspection or a new deployment trigger.

**Runner substrate finding (2026-06-07):** Pushed commits `fa96fb8` and `7cc3173` to trigger the workflow, but public `/api/v2/hubs` remained `404` while `/` stayed `200`, indicating the current image was not deployed. Repo search shows `railiance-forge` owns Actions runner substrate, but its 2026-06-05 migration plan explicitly lists "No Actions runner deployment" as a non-goal and no runner manifest/script/workplan exists there yet. `haskelseed` itself is reachable on SSH and historical port 8080, but this workspace cannot authenticate non-interactively. Treat R7 as blocked on a forge-owned runner prerequisite rather than continuing to push commits as deployment probes.

**Recovery note (2026-06-14):** The runner prerequisite was restored through the haskelseed ops-bridge path. The workflow now builds the Nix OCI image, publishes to `gitea.coulomb.social/coulomb/inter-hub` using a registry bearer token from the repo `REGISTRY_TOKEN` Actions secret, deploys with Helm, and runs public smoke checks. Gitea Actions run `2913` completed successfully for commit `5663fab`.

**Load-control note (2026-06-14):** Added workflow `paths-ignore` for docs, workplans, `.custodian-brief.md`, `app.toml`, `.sops.yaml`, and `deploy/railiance/**` so State Hub consistency/doc-only commits do not consume a haskelseed build/deploy cycle.

**Bootstrap-gate deploy note (2026-06-14):** Hardened the deployment workflow smoke test so a production rollout only passes when `/api/v2/hubs` returns the expected unauthenticated `401` and OpenAPI exposes `/hubs`, `/hub-capability-manifests`, `/api-consumers`, and `/policy-scopes`. This directly protects the ops-hub bootstrap gate instead of only checking the landing page and generic widget auth gate.

**Authenticated inspection note (2026-06-14):** The stored local Tea token is stale for `https://gitea.coulomb.social`, but runner-side inspection succeeded. `make runner-status` in `railiance-forge` showed `act_runner` registered to `https://gitea.coulomb.social`, started under OpenRC, and carrying the expected `self-hosted`/`haskelseed` labels. The runner log shows task `19` for `coulomb/inter-hub` starting at `2026-06-14T19:59:19+02:00`, matching the `6455902` deploy trigger.

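The bootstrap-gate rule described in the notes above reduces to two conditions, which the following sketch evaluates. It is an illustration of the rule, not the workflow's actual smoke-test code, which this workplan does not show. It assumes the OpenAPI document is standard JSON with a top-level `paths` object whose keys match the path names listed in the note; `bootstrap_gate_failures` is a hypothetical name.

```python
import json

# Paths named in the bootstrap-gate deploy note.
BOOTSTRAP_PATHS = ("/hubs", "/hub-capability-manifests", "/api-consumers", "/policy-scopes")


def bootstrap_gate_failures(hubs_status, openapi_text):
    """Return human-readable failures; an empty list means the gate passes.

    hubs_status is the HTTP status of an unauthenticated GET on /api/v2/hubs,
    openapi_text is the raw body of the published OpenAPI document.
    """
    failures = []
    if hubs_status != 401:
        failures.append(f"/api/v2/hubs returned {hubs_status}, expected 401")
    # Assumption: path keys appear as "/hubs" etc., as the note writes them.
    paths = json.loads(openapi_text).get("paths", {})
    for path in BOOTSTRAP_PATHS:
        if path not in paths:
            failures.append(f"OpenAPI is missing {path}")
    return failures
```

Fed with the values observed on 2026-06-07 (status `404`, OpenAPI without the bootstrap paths), this reports five failures, which is the stale-image signature that the landing-page check alone could not detect.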
### R8 — Staged deployment and smoke test

```task
id: IHUB-WP-0018-T08
status: done
priority: high
state_hub_task_id: "2b02ae5c-47b9-4f09-88f0-a4af7900b38f"
```

Follow the Railiance staged promotion lifecycle:

1. **Local verify** (done in R2 — container runs correctly)
2. **Deploy to Railiance01:**
   ```bash
   railiance deploy inter-hub --tag
   ```
3. **Smoke test:**
   ```bash
   curl -s https://hub.coulomb.social/ | grep "Inter-Hub"   # Landing page
   curl -s https://hub.coulomb.social/capabilities          # Capabilities
   curl -H "Authorization: Bearer " \
     https://hub.coulomb.social/api/v2/hubs                 # API (200)
   curl https://hub.coulomb.social/api/v2/hubs              # Unauthenticated (401)
   ```
4. **Verify restart persistence:**
   ```bash
   kubectl rollout restart deployment/inter-hub -n inter-hub
   kubectl rollout status deployment/inter-hub -n inter-hub
   # Then re-run smoke test
   ```

**Recovery note (2026-06-14):** Production is deployed from image `gitea.coulomb.social/coulomb/inter-hub:5663fab`; Kubernetes reports the `inter-hub` deployment ready with one replica. Public smoke checks pass: `/` returns 200 and contains `inter-hub`, `/api/v2/openapi.json` returns 200, and unauthenticated `/api/v2/widgets` returns 401.

**DNS gate finding (2026-06-14):** The deployment workflow did publish and deploy `gitea.coulomb.social/coulomb/inter-hub:6455902`; Kubernetes reports the `inter-hub` Deployment ready on the COULOMBCORE K3s node `92.205.130.254`. An in-cluster probe to `http://inter-hub:8000/api/v2/hubs` returned the expected unauthenticated `401`, and forcing public TLS to `92.205.130.254` also returned `401`. The public DNS record for `hub.coulomb.social`, however, resolves to `92.205.62.239`, where `/api/v2/hubs` still returns `404` and OpenAPI lacks the bootstrap paths. The remaining production gate is therefore DNS cutover (or an intentional kubeconfig rotation to the cluster behind `92.205.62.239`), not a runner, build, registry, Helm, or image-content issue.

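The DNS gate finding suggests checking name resolution before interpreting any public smoke result. A minimal sketch, with hypothetical helper names: `resolve_ipv4` asks the local resolver, and `dns_points_at_cluster` compares the answer with the expected node address. Requiring every resolved address to match is an assumption of this sketch; a setup with several intended ingress addresses would need a set comparison instead.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def dns_points_at_cluster(resolved_ips, cluster_ip):
    """True only if at least one address resolved and all of them equal cluster_ip."""
    return bool(resolved_ips) and set(resolved_ips) == {cluster_ip}
```

For example, `dns_points_at_cluster(resolve_ipv4("hub.coulomb.social"), "92.205.130.254")` would have returned `False` at the time of the finding, because the record resolved to `92.205.62.239`.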
### R9 — Document and register

```task
id: IHUB-WP-0018-T09
status: done
priority: medium
state_hub_task_id: "4d1e55c7-8dbb-480f-b07b-6c5e39a04218"
```

- Write `deploy/railiance/RUNBOOK.md`: image build, migration procedure, secret rotation, rollback (`railiance rollback inter-hub`), log access (`kubectl logs -n inter-hub -l app=inter-hub --tail=100`)
- Add progress event to state hub
- Remove the haskelseed socat/OpenRC production role note from the quickstart; document haskelseed as the build machine only, not the production host

**Implementation note (2026-06-05):** `deploy/railiance/RUNBOOK.md` exists and documents architecture, image build/push, Helm deployment, logs, restart, rollback, secret rotation, and smoke checks. The deployment record remains incomplete until current `main` is running and the ops-hub bootstrap smoke test passes against production.

**Recovery note (2026-06-14):** Current `main` is running in production and the deployment evidence has been recorded here. Remaining documentation work is to capture the durable secret-management and railiance-apps handoff path once R5 and R6 are completed.

**Completion note (2026-06-14):** Updated `deploy/railiance/RUNBOOK.md` for the current Gitea registry host, runner-based build/deploy path, SOPS secret handoff, current smoke checks, and haskelseed's build-runner-only role. Updated `docs/new-hub-quickstart.md` so haskelseed is no longer described as a production/shared database runtime.

## Exit Criteria

- `https://hub.coulomb.social/` returns the Landing page (200, no auth)
- `/api/v2/hubs` returns 401 unauthenticated, 200 with valid API key
- All 12 IHF dashboards accessible after admin login
- `kubectl rollout restart` followed by smoke test passes (K3s restart persistence confirmed)
- Gitea Actions pipeline: push to `main` → image built → deployed → smoke test green within 15 minutes
- No dependency on haskelseed being up for the app to *run* (only for builds)

## Open Questions / Pre-flight Checks

1. **K3s status**: ThreePhoenix HA cluster workstream is active but not complete. Confirm whether Railiance01 is a single-node cluster already accepting workloads or still being provisioned. Gate R3 is the go/no-go check.
2. **Container registry**: Is Gitea's built-in registry available on Railiance01, or is a separate registry service needed? If neither, add registry deployment to the scope.
3. **PostgreSQL HA status**: railiance-platform baseline workstream is active. Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.
4. **Static asset bundling**: The Nix production binary may or may not include `static/app.css` (Tailwind output). Verify in R2 and adjust image build if needed.
5. **Anthropic API key**: Phase 5 AI-assisted distillation requires `IHP_ANTHROPIC_API_KEY`. Add to SOPS secrets if the feature is to be active on Railiance01.