inter-hub/workplans/IHUB-WP-0018-railiance01-deployment.md
tegwick 0edf05324e
feat(WP-0018): workplan for Railiance01 deployment with full ops scaffold
OCI image build (Nix dockerTools), Helm chart in railiance-apps,
SOPS/age secrets, PostgreSQL HA on railiance-platform, Traefik ingress,
Gitea Actions CI/CD. Includes dependency gate on K3s cluster readiness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:50:24 +02:00


id: IHUB-WP-0018
type: workplan
title: Railiance01 Deployment — Production Operations Scaffold
domain: inter_hub
repo: inter-hub
status: open
owner: custodian
topic_slug: inter_hub
created: 2026-04-29
updated: 2026-04-29
depends_on: IHUB-WP-0015
state_hub_workstream_id: 080d841a-3acd-4adf-b684-2d1890a5e986

IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold

Goal

Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a Gitea Actions CI/CD pipeline. After this workplan, every push to main automatically builds an OCI container image on haskelseed, pushes it to the Railiance container registry, and deploys it — with automatic restart on node reboot guaranteed by K3s.

Background

inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer and socat. That setup is a development convenience, not a production operations scaffold. The target is the Railiance01 K3s cluster, which has:

  • K3s (single-node for now; ThreePhoenix HA cluster is in progress)
  • Traefik ingress with TLS
  • PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
  • SOPS/age secret management
  • Gitea with built-in container registry (or separate registry service)
  • Staged Promotion Lifecycle CLI (railiance run / deploy / promote / rollback)

Key constraint: This workplan depends on Railiance01 K3s being operational. Gate R3 verifies cluster readiness before any deployment work begins — if K3s or the container registry is not ready, this workplan blocks there and the cluster work must be completed first.

IHP specifics: IHP DevServer is a development server. For production we build the IHP binary via nix build (which produces a self-contained binary) and wrap it in a minimal OCI image using Nix's dockerTools.buildLayeredImage (see R1). The app serves HTTP on port 8000; the socat workaround is not needed in Kubernetes since Traefik routes directly to the pod's port.

Architecture

git push → Gitea Actions
  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
  → helm upgrade inter-hub railiance-apps/helm/inter-hub
    → Deployment (1 replica): inter-hub:$SHA + env from Secrets
    → Service (ClusterIP :8000)
    → Ingress (Traefik): hub.coulomb.social → Service
    → PersistentVolumeClaim: /app/static (generated CSS/JS)
  → PostgreSQL: database 'interhub' on railiance-platform HA cluster

Tasks

R1 — Add OCI image build to flake.nix

Add a packages.docker output to flake.nix using pkgs.dockerTools.buildLayeredImage. The image wraps the IHP production binary produced by nix build .#default.

packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    Cmd = [ "/bin/inter-hub" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [
      "PORT=8000"
      "IHP_ENV=Production"
    ];
  };
};

Test locally on haskelseed:

nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest

Note: First build pulls the full Haskell binary closure (~2 GB); subsequent builds are incremental (layer caching). Build must run on haskelseed — the only machine with the Nix store populated for GHC 9.10.3.

R2 — Verify container runs correctly

On haskelseed, run the container image against the existing interhub database. Confirm:

  • curl http://localhost:8000/ returns 200 (LandingAction)
  • curl http://localhost:8000/api/v2/hubs returns 401 (auth required)
  • Static assets load (Tailwind CSS present in image)
  • Container exits cleanly on SIGTERM

If Tailwind CSS output (static/app.css) is not bundled into the Nix binary closure, add a pre-build step: run tailwindcss and include static/ in the image via dockerTools.buildLayeredImage contents or a NixOS module.
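
The two HTTP checks in R2 can be scripted so they are repeatable. The following is a minimal sketch, assuming curl is installed on haskelseed and the container is already listening on port 8000. The names expect_status, run_r2_checks and BASE_URL are illustrative and do not exist in the repository; the static-asset and SIGTERM checks are left out because they depend on details not settled here.

```shell
#!/bin/sh
# Hypothetical helpers for the R2 HTTP checks (names are illustrative).
BASE_URL="${BASE_URL:-http://localhost:8000}"

expect_status() {
  # $1 = request path, $2 = expected HTTP status code
  actual=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL$1")
  if [ "$actual" = "$2" ]; then
    echo "ok   $1 -> $actual"
  else
    echo "FAIL $1 -> $actual (expected $2)"
    return 1
  fi
}

run_r2_checks() {
  # Landing page is public; the API must reject unauthenticated calls.
  expect_status / 200 && expect_status /api/v2/hubs 401
}
```

Calling run_r2_checks after docker run returns non-zero on the first mismatch, so it can also serve as a pre-push guard.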

R3 — Verify Railiance01 readiness (gate)

This is a dependency gate. Before proceeding, confirm:

# From CoulombCore (execution origin):
kubectl get nodes          # must show Ready
kubectl get pods -n kube-system | grep traefik   # Traefik must be running
kubectl get pods -n railiance-platform            # PostgreSQL HA pods

Also confirm:

  • Container registry is reachable from haskelseed (verify push access)
  • Registry address is decided and recorded (e.g., registry.coulomb.social or gitea.coulomb.social)
  • SOPS/age key is present on CoulombCore at ~/.config/sops/age/keys.txt

If any check fails, block here and open the relevant Railiance workstream. Do not proceed until all checks pass.
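
One way to make the gate mechanical is a small script that runs every check and fails if any of them does. This is a sketch under assumptions: the helper names are hypothetical, "Traefik running" is approximated by a pod line containing both traefik and Running, and the PostgreSQL and registry checks are omitted because their pod names and address are not fixed in this workplan.

```shell
#!/bin/sh
# Hypothetical R3 gate runner: report each check, fail if any check failed.
gate_failed=0

check() {
  # $1 = label, remaining arguments = command to run
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $label"
  else
    echo "FAIL $label"
    gate_failed=1
  fi
}

nodes_ready() {
  kubectl wait --for=condition=Ready nodes --all --timeout=30s
}

traefik_running() {
  kubectl get pods -n kube-system | grep traefik | grep -q Running
}

run_r3_gate() {
  gate_failed=0
  check "nodes Ready" nodes_ready
  check "Traefik running" traefik_running
  check "age key present" test -s "$HOME/.config/sops/age/keys.txt"
  return "$gate_failed"
}
```

Running all checks before returning, rather than stopping at the first failure, gives a complete picture of what the Railiance workstream still has to deliver.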

R4 — Provision inter-hub database on railiance-platform

On the PostgreSQL HA cluster, create the inter-hub database and user:

CREATE USER interhub WITH PASSWORD '<generated>';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;

Run schema migration (IHP migrations) as part of the first deployment via an init container or a manual migrate run inside the pod. Document the migration procedure in deploy/railiance/RUNBOOK.md.

R5 — SOPS-encrypted secrets

Create deploy/railiance/secrets/inter-hub.env.sops.yaml with:

# sops encrypted — do not edit manually
DATABASE_URL: postgresql://interhub:<pass>@pgpool.railiance-platform.svc:5432/interhub
IHP_SESSION_SECRET: <64-char-hex>
IHP_BASEURL: https://hub.coulomb.social

Encrypt in place with the age public key (redirecting sops output onto its own input file would truncate the plaintext before sops reads it):

sops --encrypt --in-place \
  --age "$(grep 'public key' ~/.config/sops/age/keys.txt | awk '{print $4}')" \
  deploy/railiance/secrets/inter-hub.env.sops.yaml

Commit the encrypted file. The Gitea Actions workflow decrypts at deploy time using the age key from a Kubernetes Secret (bootstrapped once manually).
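
The <64-char-hex> placeholder for IHP_SESSION_SECRET has to be filled with random data before encryption. A minimal sketch of one way to produce it using only standard Unix tools; random_hex is a hypothetical helper name, and whether IHP expects exactly this format should be checked against the IHP documentation.

```shell
#!/bin/sh
# Hypothetical helper: print N random bytes as lowercase hex
# (two hex characters per byte).
random_hex() {
  head -c "$1" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# 32 bytes yield 64 hex characters, matching the <64-char-hex> placeholder.
session_secret=$(random_hex 32)
echo "IHP_SESSION_SECRET: $session_secret"
```

The printed line can be pasted into the plaintext YAML immediately before the sops encryption step, so the secret never lands in version control unencrypted.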

R6 — Helm chart in railiance-apps

Create helm/inter-hub/ in the railiance-apps repository following the Railiance app.toml contract. Minimal chart:

helm/inter-hub/
  Chart.yaml          name: inter-hub, version: 0.1.0
  values.yaml         image.tag, ingress.host, resources
  values.prod.yaml    replicas: 1, resources.requests.memory: 1Gi
  templates/
    deployment.yaml   envFrom: secretRef inter-hub-env
    service.yaml      ClusterIP :8000
    ingress.yaml      Traefik annotations, TLS
    secret.yaml       created by sops-operator or external-secrets

app.toml in the inter-hub repo root for railiance CLI integration:

[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "registry.coulomb.social/coulomb/inter-hub"

[deploy]
chart = "railiance-apps/helm/inter-hub"
namespace = "inter-hub"

R7 — Gitea Actions CI/CD pipeline

Create .gitea/workflows/deploy.yaml in the inter-hub repo:

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest   # or self-hosted if available
    steps:
      - uses: actions/checkout@v4

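      - name: Build OCI image on haskelseed
        env:
          REGISTRY: ${{ secrets.REGISTRY }}   # without this, $REGISTRY expands to nothing
        run: |
          # $REGISTRY and the SHA are expanded on the runner before ssh sends the command.
          # Check out the pushed commit so the image content matches its tag.
          ssh haskelseed "cd /root/inter-hub && git fetch origin && \
            git checkout --detach ${{ github.sha }} && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"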

      - name: Deploy to Railiance01
        run: |
          ssh coulombcore "helm upgrade --install inter-hub \
            railiance-apps/helm/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub/values.prod.yaml"

Secrets in Gitea: REGISTRY, SSH_KEY_HASKELSEED, SSH_KEY_COULOMBCORE.

Alternative if self-hosted runner is available on CoulombCore: run the deploy step directly without the SSH hop to coulombcore.
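
The workflow lists SSH_KEY_HASKELSEED and SSH_KEY_COULOMBCORE as secrets, but the ssh commands only work once those keys are on disk in the runner. A possible sketch of a helper that a workflow step could call first, assuming the key material is passed in through an environment variable; install_ssh_key and the file name id_haskelseed are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a private key to ~/.ssh with owner-only access.
install_ssh_key() {
  # $1 = file name under ~/.ssh, $2 = private key material
  mkdir -p "$HOME/.ssh"
  chmod 700 "$HOME/.ssh"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && printf '%s\n' "$2" > "$HOME/.ssh/$1" )
}
```

A step would then run install_ssh_key id_haskelseed "$SSH_KEY_HASKELSEED" and pass -i ~/.ssh/id_haskelseed to ssh. Host key verification for haskelseed and coulombcore still has to be handled separately.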

R8 — Staged deployment and smoke test

Follow the Railiance staged promotion lifecycle:

  1. Local verify (done in R2 — container runs correctly)
  2. Deploy to Railiance01:
    railiance deploy inter-hub --tag <sha>
    
  3. Smoke test:
    curl -s https://hub.coulomb.social/ | grep "Inter-Hub"   # Landing page
    curl -s https://hub.coulomb.social/capabilities           # Capabilities
    curl -H "Authorization: Bearer <key>" \
      https://hub.coulomb.social/api/v2/hubs                 # API (200)
    curl https://hub.coulomb.social/api/v2/hubs              # Unauthenticated (401)
    
  4. Verify restart persistence:
    kubectl rollout restart deployment/inter-hub -n inter-hub
    kubectl rollout status deployment/inter-hub -n inter-hub
    # Then re-run smoke test
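
Right after a deploy or a rollout restart the pod may need some time before it answers, so a single curl can fail spuriously. A minimal sketch of a retry wrapper for the smoke test; retry and landing_page_up are hypothetical names, and the attempt count and delay are assumptions to tune.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
retry() {
  # $1 = maximum attempts, $2 = seconds between attempts, rest = command
  attempts=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

landing_page_up() {
  # -f makes curl fail on HTTP errors instead of printing the error page.
  curl -fsS https://hub.coulomb.social/ | grep -q "Inter-Hub"
}
```

For example, retry 30 10 landing_page_up waits up to roughly five minutes for the landing page, which fits inside the 15-minute pipeline budget in the exit criteria.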
    

R9 — Document and register

  • Write deploy/railiance/RUNBOOK.md: image build, migration procedure, secret rotation, rollback (railiance rollback inter-hub), log access (kubectl logs -n inter-hub -l app=inter-hub --tail=100)
  • Add progress event to state hub
  • Remove haskelseed socat/OpenRC production role note from quickstart — document it as the build machine only, not the production host

Exit Criteria

  • https://hub.coulomb.social/ returns the Landing page (200, no auth)
  • /api/v2/hubs returns 401 unauthenticated, 200 with valid API key
  • All 12 IHF dashboards accessible after admin login
  • kubectl rollout restart followed by smoke test passes (K3s restart persistence confirmed)
  • Gitea Actions pipeline: push to main → image built → deployed → smoke test green within 15 minutes
  • No dependency on haskelseed being up for the app to run (only for builds)

Open Questions / Pre-flight Checks

  1. K3s status: ThreePhoenix HA cluster workstream is active but not complete. Confirm whether Railiance01 is a single-node cluster already accepting workloads or still being provisioned. Gate R3 is the go/no-go check.

  2. Container registry: Is Gitea's built-in registry available on Railiance01, or is a separate registry service needed? If neither, add registry deployment to the scope.

  3. PostgreSQL HA status: railiance-platform baseline workstream is active. Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.

  4. Static asset bundling: The Nix production binary may or may not include static/app.css (Tailwind output). Verify in R2 and adjust image build if needed.

  5. Anthropic API key: Phase 5 AI-assisted distillation requires IHP_ANTHROPIC_API_KEY. Add to SOPS secrets if the feature is to be active on Railiance01.