inter-hub/workplans/IHUB-WP-0018-railiance01-deployment.md


id: IHUB-WP-0018
type: workplan
title: Railiance01 Deployment — Production Operations Scaffold
domain: inter_hub
repo: inter-hub
status: finished
owner: custodian
topic_slug: inter_hub
created: 2026-04-29
updated: 2026-06-14
depends_on: IHUB-WP-0015
state_hub_workstream_id: 080d841a-3acd-4adf-b684-2d1890a5e986

IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold

Goal

Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a Gitea Actions CI/CD pipeline. After this workplan, every push to main automatically builds an OCI container image on haskelseed, pushes it to the Railiance container registry, and deploys it — with automatic restart on node reboot guaranteed by K3s.

Background

inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer and socat. That setup is a development convenience, not a production operations scaffold. The target is the Railiance01 K3s cluster, which has:

  • K3s (single-node for now; ThreePhoenix HA cluster is in progress)
  • Traefik ingress with TLS
  • PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
  • SOPS/age secret management
  • Gitea with built-in container registry (or separate registry service)
  • Staged Promotion Lifecycle CLI (railiance run / deploy / promote / rollback)

Key constraint: This workplan depends on Railiance01 K3s being operational. Gate R3 verifies cluster readiness before any deployment work begins — if K3s or the container registry is not ready, this workplan blocks there and the cluster work must be completed first.

IHP specifics: IHP's DevServer is a development server only. For production we build the IHP binaries with nix build and wrap them, together with their Nix runtime closure, in a minimal OCI image built with Nix's dockerTools (the R1 sketch uses buildLayeredImage; the implementation uses IHP's own packaged image). The app serves HTTP on port 8000. The socat workaround is not needed in Kubernetes because Traefik routes directly to the pod's port.

Architecture

git push → Gitea Actions
  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
  → helm upgrade inter-hub railiance-apps/charts/inter-hub
    → Deployment (1 replica): inter-hub:$SHA + env from Secrets
    → Service (ClusterIP :8000)
    → Ingress (Traefik): hub.coulomb.social → Service
    → PersistentVolumeClaim: /app/static (generated CSS/JS)
  → PostgreSQL: database 'interhub' on railiance-platform HA cluster
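The commit SHA ties the pipeline above together: it names the pushed image and pins the Helm release. Below is a minimal Python sketch of that derivation. It assumes the registry repository used by the 2026-06-14 workflow (gitea.coulomb.social/coulomb/inter-hub) and the chart path declared in app.toml. The helper names are hypothetical and are not part of any Railiance tooling.

```python
import re

# Assumption: repository path used by the 2026-06-14 workflow (R7 notes).
DEFAULT_REPOSITORY = "gitea.coulomb.social/coulomb/inter-hub"

# Abbreviated or full lowercase hex commit SHA.
_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def image_ref(sha: str, repository: str = DEFAULT_REPOSITORY) -> str:
    """Return '<repository>:<sha>', rejecting anything that is not a commit SHA."""
    if not _SHA_RE.fullmatch(sha):
        raise ValueError(f"not a commit SHA: {sha!r}")
    return f"{repository}:{sha}"


def helm_upgrade_argv(sha: str, chart: str, namespace: str = "inter-hub") -> list[str]:
    """Argv for 'helm upgrade --install' pinning the release to the image tag."""
    image_ref(sha)  # same validation as the registry tag
    return [
        "helm", "upgrade", "--install", "inter-hub", chart,
        "--namespace", namespace, "--create-namespace",
        "--set", f"image.tag={sha}",
    ]
```

For example, image_ref("6455902") yields gitea.coulomb.social/coulomb/inter-hub:6455902. Validating the SHA up front keeps a branch name such as main from ever becoming a mutable image tag.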

Close-out Audit - 2026-06-04

WSJF triage flagged this workplan as a close-out candidate because State Hub had no indexed task rows for it. The deployment work is not complete; this file now contains explicit task blocks so the hub can track the remaining Railiance01 deployment work instead of treating the workplan as empty.

Deployment Review - 2026-06-05

Review against the current repo and public Railiance endpoint shows the deployment scaffold is partially implemented but the live deployment is behind origin/main.

  • origin/main is at a3d980c, which includes the completed ops-hub bootstrap API work from IHUB-WP-0019.
  • https://hub.coulomb.social/ returns 200 and serves inter-hub.
  • The public OpenAPI only lists the older v2 endpoints; it does not include /hubs, /hub-capability-manifests, /api-consumers, or /policy-scopes.
  • Unauthenticated /api/v2/hubs returns 404 publicly, while current source should route it and return 401. This means ops-hub bootstrap cannot run against production until the current image is deployed.
  • The registry endpoint returns the expected unauthenticated /v2/ 401 challenge, but this workspace does not have kubectl, so R3 cluster readiness cannot be fully verified from here.
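An unauthenticated GET on a registry's /v2/ endpoint is expected to return 401 with a WWW-Authenticate challenge (Docker Registry HTTP API V2). That is the signal described above. Below is a stdlib-only sketch of checking it. It assumes a Bearer-token challenge, which token-authenticated registries typically send. The function names are hypothetical, and the header used in testing is illustrative, not captured from the Railiance registry.

```python
from __future__ import annotations

import re
import urllib.error
import urllib.request

# key="value" pairs inside a WWW-Authenticate challenge.
_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header: str) -> dict[str, str] | None:
    """Return the challenge parameters, or None if the scheme is not Bearer."""
    scheme, _, params = header.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None
    return dict(_PARAM_RE.findall(params))


def registry_challenge(base_url: str, timeout: float = 10.0) -> dict[str, str] | None:
    """GET <base_url>/v2/ without credentials; return Bearer params on a 401, else None."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/v2/", timeout=timeout):
            return None  # 2xx: the registry did not ask for credentials
    except urllib.error.HTTPError as err:
        if err.code != 401:
            return None
        return parse_bearer_challenge(err.headers.get("WWW-Authenticate", ""))
```

registry_challenge("https://gitea.coulomb.social") returns the realm and service parameters when the endpoint challenges as expected. It returns None otherwise.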

Tasks

R1 - Add OCI image build to flake.nix

id: IHUB-WP-0018-T01
status: done
priority: high
state_hub_task_id: "27420bd7-0f70-4793-8805-393d8d5cacfd"

Add a packages.docker output to flake.nix using pkgs.dockerTools.buildLayeredImage. The image wraps the IHP production binary produced by nix build .#default.

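packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  # The IHP production build plus CA certificates for outbound TLS.
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    # IHP's production server executable (the IHP build ships RunProdServer and RunJobs).
    Cmd = [ "/bin/RunProdServer" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [
      "PORT=8000"
      "IHP_ENV=Production"
    ];
  };
};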

Test locally on haskelseed:

nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest

Note: The first build pulls the full Haskell binary closure (~2 GB). Subsequent builds are incremental, reusing the Nix store and cached layers. The build must run on haskelseed, the only machine with the Nix store populated for GHC 9.10.3.
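After docker load has run, the loaded image's config can be compared with what the sketch declares. Below is a minimal sketch that reads docker image inspect JSON. It assumes the expected port and environment from the Nix sketch above (8000/tcp, IHP_ENV=Production). The IHP-provided image may set these differently, and the helper names are hypothetical.

```python
import json
import subprocess


def image_config(ref: str) -> dict:
    """Return the Config section of `docker image inspect <ref>`."""
    out = subprocess.run(
        ["docker", "image", "inspect", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)[0]["Config"]


def config_problems(config: dict, port: str = "8000/tcp",
                    env: tuple[str, ...] = ("IHP_ENV=Production",)) -> list[str]:
    """List mismatches between an image Config and the expected port and env."""
    problems = []
    if port not in (config.get("ExposedPorts") or {}):
        problems.append(f"port {port} not exposed")
    present = set(config.get("Env") or [])
    problems += [f"missing env {e}" for e in env if e not in present]
    return problems
```

config_problems(image_config("inter-hub:latest")) returns an empty list when both expectations hold.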

Implementation note (2026-06-05): flake.nix exposes packages.docker = config.packages.unoptimized-docker-image, the IHP-provided production OCI image used by the Railiance runbook. The original buildLayeredImage sketch is superseded by that IHP image path.

R2 — Verify container runs correctly

id: IHUB-WP-0018-T02
status: done
priority: high
state_hub_task_id: "5ab45e4e-16bc-4feb-8b1b-e8eeb05bf39a"

On haskelseed, run the container image against the existing interhub database. Confirm:

  • curl http://localhost:8000/ returns 200 (LandingAction)
  • curl http://localhost:8000/api/v2/hubs returns 401 (auth required)
  • Static assets load (Tailwind CSS present in image)
  • Container exits cleanly on SIGTERM

If Tailwind CSS output (static/app.css) is not bundled into the Nix binary closure, add a pre-build step: run tailwindcss and include static/ in the image via dockerTools.buildLayeredImage contents or a NixOS module.
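You can also check offline, without running a container, whether static/app.css made it into the image. Scan the image archive: either the result of nix build .#docker or docker save output. The sketch below assumes the stylesheet sits at a path ending in static/app.css inside some layer. archive_contains is a hypothetical helper.

```python
import tarfile
from typing import BinaryIO


def archive_contains(fileobj: BinaryIO, suffix: str = "static/app.css") -> bool:
    """True if any layer tarball inside the image archive has a path ending in suffix.

    Handles both the legacy `<id>/layer.tar` layout and OCI `blobs/sha256/*`
    layouts: every regular member is opened as a (possibly compressed) tar,
    and members that are not tars (manifest or config JSON) are skipped.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as outer:
        for member in outer:
            layer_file = outer.extractfile(member) if member.isfile() else None
            if layer_file is None:
                continue
            try:
                with tarfile.open(fileobj=layer_file, mode="r:*") as layer:
                    if any(name.endswith(suffix) for name in layer.getnames()):
                        return True
            except tarfile.ReadError:
                continue  # not a tar: manifest.json, config blobs, ...
    return False
```

Usage: with open("result", "rb") as f: archive_contains(f). The function lists member names without extracting anything to disk.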

R3 — Verify Railiance01 readiness (gate)

id: IHUB-WP-0018-T03
status: done
priority: high
state_hub_task_id: "79b5cf2c-3a5b-4b4b-8f84-f635cb6891c1"

This is a dependency gate. Before proceeding, confirm:

# From CoulombCore (execution origin):
kubectl get nodes          # must show Ready
kubectl get pods -n kube-system | grep traefik   # Traefik must be running
kubectl get pods -n railiance-platform            # PostgreSQL HA pods

Also confirm:

  • Container registry is reachable from haskelseed (verify push access)
  • Registry address is settled (e.g., registry.coulomb.social or gitea.coulomb.social)
  • SOPS/age key is present on CoulombCore at ~/.config/sops/age/keys.txt

If any check fails, block here and open the relevant Railiance workstream. Do not proceed until all checks pass.
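The node and Traefik checks above can be scripted for repeatable go/no-go runs. The sketch below assumes kubectl's default table columns (NAME STATUS ... for nodes, NAME READY STATUS ... for pods) and an ambient kubeconfig on CoulombCore. The function names are hypothetical. The PostgreSQL and registry checks are not included.

```python
import subprocess


def kubectl(*args: str) -> str:
    """Run kubectl with the ambient kubeconfig and return stdout."""
    return subprocess.run(["kubectl", *args], check=True,
                          capture_output=True, text=True).stdout


def all_nodes_ready(get_nodes_output: str) -> bool:
    """True if every row of `kubectl get nodes` has a STATUS of Ready (or Ready,...)."""
    rows = [line.split() for line in get_nodes_output.strip().splitlines()[1:]]
    return bool(rows) and all(r[1].split(",")[0] == "Ready" for r in rows)


def running_pods(get_pods_output: str, name_contains: str) -> list[str]:
    """Names of pods whose name contains the substring and whose STATUS is Running."""
    rows = [line.split() for line in get_pods_output.strip().splitlines()[1:]]
    return [r[0] for r in rows if name_contains in r[0] and r[2] == "Running"]
```

Usage: all_nodes_ready(kubectl("get", "nodes")) and running_pods(kubectl("get", "pods", "-n", "kube-system"), "traefik"). Parsing -o json would be sturdier against column changes. The text form is used here because it matches the commands listed above.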

Review note (2026-06-05): Public smoke probes show https://hub.coulomb.social/ returning 200 and the Gitea registry /v2/ endpoint returning the expected unauthenticated 401 challenge. Full R3 remains blocked from this workspace because kubectl is not available here, and the live app is not serving the current origin/main v2 bootstrap routes.

Recovery note (2026-06-14): Re-established the haskelseed ops-bridge path and verified the runner substrate before deployment. make runner-status in railiance-forge confirmed act_runner is registered to https://gitea.coulomb.social, running under OpenRC, and has the expected self-hosted labels and build/deploy tools. The K3s API path, Helm deploy path, and Gitea registry host were exercised successfully by the production rollout.

R4 — Provision inter-hub database on railiance-platform

id: IHUB-WP-0018-T04
status: done
priority: high
state_hub_task_id: "c937cf36-3850-4ab3-aa83-2d846e1a378e"

On the PostgreSQL HA cluster, create the inter-hub database and user:

CREATE USER interhub WITH PASSWORD '<generated>';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;

Run schema migration (IHP migrations) as part of the first deployment via an init container or a manual migrate run inside the pod. Document the migration procedure in deploy/railiance/RUNBOOK.md.

Recovery note (2026-06-14): Bootstrapped the production database manually on the Railiance PostgreSQL cluster: role interhub, database interhub, schema ownership, and privileges were created/updated. The running deployment now uses that database through the inter-hub-env Kubernetes Secret.

Production initialization note (2026-06-14): After DNS/TLS and network access were restored, production OpenAPI still failed because the interhub database was blank (public_table_count:0). The IHP production image only contains RunProdServer and RunJobs, so there was no packaged migration runner to execute. Initialized the database through the CloudNativePG pod by loading Application/Schema.sql in one transaction, applying the idempotent type-registry seed migration 1744502400, and granting app privileges on the new schema to the interhub role. The default admin seed with a known password was intentionally not applied to production.
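The manual initialization can be captured as reproducible commands. The sketch below only builds kubectl exec / psql argument vectors and assumes psql is available in the CloudNativePG pod. The namespace databases comes from the DATABASE_URL host. The pod name in the usage line is a hypothetical placeholder; target the current primary.

```python
# Counts tables in the public schema; 0 means the schema was never loaded.
PUBLIC_TABLE_COUNT_SQL = (
    "SELECT count(*) FROM information_schema.tables "
    "WHERE table_schema = 'public'"
)


def psql_in_pod(namespace: str, pod: str, database: str, *psql_args: str) -> list[str]:
    """kubectl exec argv running psql inside a database pod; -i forwards stdin."""
    return ["kubectl", "exec", "-i", "-n", namespace, pod, "--",
            "psql", "-v", "ON_ERROR_STOP=1", "-d", database, *psql_args]


def load_sql_file_argv(namespace: str, pod: str, database: str = "interhub") -> list[str]:
    """Load SQL from stdin as one transaction; any error rolls the whole file back."""
    return psql_in_pod(namespace, pod, database, "--single-transaction", "-f", "-")


def table_count_argv(namespace: str, pod: str, database: str = "interhub") -> list[str]:
    """Unaligned, tuples-only query that prints just the public table count."""
    return psql_in_pod(namespace, pod, database, "-Atc", PUBLIC_TABLE_COUNT_SQL)
```

Usage: subprocess.run(load_sql_file_argv("databases", "<primary-pod>"), stdin=open("Application/Schema.sql", "rb"), check=True). Combining --single-transaction with ON_ERROR_STOP=1 means a failing statement rolls back the whole file, which matches the one-transaction load described above.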

R5 — SOPS-encrypted secrets

id: IHUB-WP-0018-T05
status: done
priority: high
state_hub_task_id: "926f82d1-15cd-425d-8a41-3d6b51c07f0b"

Create deploy/railiance/secrets/inter-hub.env.sops.yaml with:

apiVersion: v1
kind: Secret
metadata:
  name: inter-hub-env
  namespace: inter-hub
type: Opaque
stringData:
  DATABASE_URL: postgresql://interhub:<pass>@net-kingdom-pg-rw.databases.svc.cluster.local:5432/interhub?sslmode=disable
  IHP_SESSION_SECRET: <64-char-hex>
  IHP_BASEURL: https://hub.coulomb.social
  PORT: "8000"
  IHP_ENV: Production
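The <pass> placeholder in DATABASE_URL must be percent-encoded if it contains reserved characters. Below is a minimal sketch for generating the interhub role's password and assembling the URL. It assumes the host, port, database and sslmode shown above. The helper names are hypothetical.

```python
import secrets
from urllib.parse import quote

# Host from the Secret above (CloudNativePG read-write service).
DB_HOST = "net-kingdom-pg-rw.databases.svc.cluster.local"


def new_db_password(nbytes: int = 32) -> str:
    """Random URL-safe password: letters, digits, '-' and '_' only."""
    return secrets.token_urlsafe(nbytes)


def database_url(user: str, password: str, host: str = DB_HOST, port: int = 5432,
                 database: str = "interhub", sslmode: str = "disable") -> str:
    """postgresql:// URL with user and password percent-encoded in the userinfo part."""
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}?sslmode={sslmode}")
```

The same password goes into CREATE USER interhub WITH PASSWORD '...' in R4. token_urlsafe output needs no escaping in that SQL literal either.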

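Encrypt with the age key, encrypting only the data/stringData values so the manifest metadata stays reviewable:

sops --encrypt \
  --age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
  --encrypted-regex '^(data|stringData)$' \
  /tmp/inter-hub-env.yaml > deploy/railiance/secrets/inter-hub.env.sops.yaml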

Commit only the encrypted file. Apply it with sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -.

Recovery note (2026-06-14): Runtime secrets were bootstrapped manually in Kubernetes so production could deploy safely. This task remains in progress until the durable SOPS-encrypted source for DATABASE_URL, IHP_SESSION_SECRET, and related runtime env is committed and wired into the deploy path.

Progress note (2026-06-14): Added repo root .sops.yaml, plaintext guardrails under deploy/railiance/secrets/, an example Secret manifest, and k8s-secret-json-to-sops-input.py to convert the live Kubernetes Secret into a SOPS-ready manifest without printing values. At that point the encrypted source file was still pending because local sops tooling was not available.
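The conversion described above, from a live Secret to a SOPS-ready manifest, amounts to decoding .data into .stringData and dropping server-managed metadata. Below is a hypothetical stand-in for that step; it is not the repo's k8s-secret-json-to-sops-input.py. It assumes the input is kubectl get secret -o json.

```python
import base64
import json
import sys


def to_sops_input(secret: dict) -> dict:
    """Keep only name/namespace metadata and move base64 .data into plain .stringData."""
    meta = secret["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "type": secret.get("type", "Opaque"),
        "stringData": {
            key: base64.b64decode(value).decode("utf-8")
            for key, value in sorted(secret.get("data", {}).items())
        },
    }


def main() -> None:
    """Filter: Secret JSON on stdin, manifest JSON on stdout, key count on stderr."""
    manifest = to_sops_input(json.load(sys.stdin))
    json.dump(manifest, sys.stdout, indent=2)
    sys.stdout.write("\n")
    print(f"converted {len(manifest['stringData'])} keys", file=sys.stderr)
```

Pipe kubectl get secret inter-hub-env -n inter-hub -o json into main() and redirect stdout to a temporary file. JSON is valid YAML, so sops should accept the output as a .yaml input. Values reach only stdout; the stderr line reports only the key count.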

Completion note (2026-06-14): Created deploy/railiance/secrets/inter-hub.env.sops.yaml from the live inter-hub/inter-hub-env Kubernetes Secret using temporary sops v3.13.1 and the shared Railiance age recipient. Verified the file is SOPS-encrypted, parses as YAML, leaves only non-secret metadata reviewable, and does not contain the checked plaintext runtime markers. Decryption/apply verification remains a custody-backed operator capability because the private age identity is not present in the normal workstation or haskelseed shell.
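The verification described above can be approximated without decryption or a YAML library. Every value under stringData should be a SOPS ENC[...] envelope, and a top-level sops: metadata block should exist. Below is a line-based sketch. It assumes sops' default block-style output with one scalar per line. The function names are hypothetical.

```python
import re

# SOPS stores each encrypted scalar as ENC[<cipher>,data:...,iv:...,tag:...,type:...].
_ENC = re.compile(r"ENC\[[A-Z0-9_]+,data:")


def unencrypted_keys(text: str, section: str = "stringData") -> list[str]:
    """Keys under the top-level `section:` block whose values are not ENC[...] envelopes."""
    bad, inside = [], False
    for line in text.splitlines():
        if not line.strip():
            continue                      # ignore blank lines
        if not line[0].isspace():         # a new top-level key starts here
            inside = line.rstrip() == f"{section}:"
            continue
        if inside and ":" in line:
            key, _, value = line.strip().partition(":")
            if not _ENC.match(value.strip().strip("'\"")):
                bad.append(key)
    return bad


def has_sops_metadata(text: str) -> bool:
    """SOPS writes its recipients and MAC under a top-level `sops:` key."""
    return any(line.rstrip() == "sops:" for line in text.splitlines())
```

An empty list from unencrypted_keys plus True from has_sops_metadata means no stringData value is stored in plaintext. The check covers structure only and says nothing about which recipients can decrypt.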

R6 — Helm chart in railiance-apps

id: IHUB-WP-0018-T06
status: done
priority: high
state_hub_task_id: "4c4acc98-5773-4289-ad57-03f3fd5c381c"

Create charts/inter-hub/ in the railiance-apps repository following the Railiance app.toml contract. Minimal chart:

charts/inter-hub/
  Chart.yaml          name: inter-hub, version: 0.1.0
  values.yaml         image.tag, ingress.host, resources
  templates/
    deployment.yaml   envFrom: secretRef inter-hub-env
    service.yaml      ClusterIP :8000
    ingress.yaml      Traefik annotations, TLS
helm/inter-hub-values.yaml   production non-secret overrides

app.toml in the inter-hub repo root for railiance CLI integration:

[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "gitea.coulomb.social/coulomb/inter-hub"

[deploy]
chart = "railiance-apps/charts/inter-hub"
namespace = "inter-hub"

Implementation note (2026-06-05): A Helm chart exists in deploy/helm/inter-hub/ with Deployment, Service, Ingress, and values for the current Gitea registry and hub.coulomb.social. Remaining gaps: no repo-root app.toml, no committed SOPS secret manifest, and no separate railiance-apps/helm/inter-hub handoff in this repo.

Recovery note (2026-06-14): The local chart under deploy/helm/inter-hub/ successfully deployed the app to Railiance01. This task remains in progress because the repo-root app.toml and railiance-apps handoff are still not completed.

Completion note (2026-06-14): Added repo-root app.toml in inter-hub and added charts/inter-hub, helm/inter-hub-values.yaml, Makefile targets, and server-dry-run coverage in railiance-apps. The chart rendered successfully on haskelseed with helm template.

R7 — Gitea Actions CI/CD pipeline

id: IHUB-WP-0018-T07
status: done
priority: medium
state_hub_task_id: "ec25c67c-3cb0-4534-9fb0-9bd6578a2def"

Create .gitea/workflows/deploy.yaml in the inter-hub repo:

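on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest   # or self-hosted if available
    env:
      # Secrets are not exported to steps automatically; map REGISTRY explicitly.
      REGISTRY: ${{ secrets.REGISTRY }}
    steps:
      - uses: actions/checkout@v4

      - name: Install SSH keys
        run: |
          install -d -m 700 ~/.ssh
          printf '%s\n' "${{ secrets.SSH_KEY_HASKELSEED }}" > ~/.ssh/haskelseed
          printf '%s\n' "${{ secrets.SSH_KEY_COULOMBCORE }}" > ~/.ssh/coulombcore
          chmod 600 ~/.ssh/haskelseed ~/.ssh/coulombcore

      - name: Build OCI image on haskelseed
        run: |
          # Build the pushed commit itself so the image tag matches its contents.
          ssh -i ~/.ssh/haskelseed haskelseed "cd /root/inter-hub && \
            git fetch origin && git checkout --detach ${{ github.sha }} && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"

      - name: Deploy to Railiance01
        run: |
          # Chart and values paths as laid out in R6 (railiance-apps).
          ssh -i ~/.ssh/coulombcore coulombcore "helm upgrade --install inter-hub \
            railiance-apps/charts/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub-values.yaml"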

Secrets in Gitea: REGISTRY, SSH_KEY_HASKELSEED, SSH_KEY_COULOMBCORE.

Alternative if self-hosted runner is available on CoulombCore: run the deploy step directly without the SSH hop to coulombcore.

Implementation note (2026-06-05): .gitea/workflows/deploy.yaml exists and builds .#docker on a self-hosted haskelseed runner, pushes to 92.205.130.254:32166/coulomb/inter-hub, deploys with Helm, and smoke-tests the public endpoint. Remote main is already current, but production is still serving an older API surface, so the workflow needs an attended rerun/inspection or a new deployment trigger.

Runner substrate finding (2026-06-07): Pushed commits fa96fb8 and 7cc3173 to trigger the workflow, but public /api/v2/hubs remained 404 while / stayed 200, indicating the current image was not deployed. Repo search shows railiance-forge owns the Actions runner substrate, but its 2026-06-05 migration plan explicitly lists "No Actions runner deployment" as a non-goal and no runner manifest/script/workplan exists there yet. haskelseed itself is reachable over SSH and on its historical port 8080, but this workspace cannot authenticate non-interactively. Treat R7 as blocked on a forge-owned runner prerequisite rather than continuing to push commits as deployment probes.

Recovery note (2026-06-14): The runner prerequisite was restored through the haskelseed ops-bridge path. The workflow now builds the Nix OCI image, publishes to gitea.coulomb.social/coulomb/inter-hub using a registry bearer token from the repo REGISTRY_TOKEN Actions secret, deploys with Helm, and runs public smoke checks. Gitea Actions run 2913 completed successfully for commit 5663fab.

Load-control note (2026-06-14): Added workflow paths-ignore for docs, workplans, .custodian-brief.md, app.toml, .sops.yaml, and deploy/railiance/** so State Hub consistency/doc-only commits do not consume a haskelseed build/deploy cycle.

Bootstrap-gate deploy note (2026-06-14): Hardened the deployment workflow smoke test so a production rollout only passes when /api/v2/hubs returns the expected unauthenticated 401 and OpenAPI exposes /hubs, /hub-capability-manifests, /api-consumers, and /policy-scopes. This directly protects the ops-hub bootstrap gate instead of only checking the landing page and generic widget auth gate.
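The hardened gate reduces to a pure rule over two probe results, so it can be tested without the network. The sketch below assumes the OpenAPI document keys its paths exactly as listed above (/hubs and so on, relative to /api/v2). The function names are hypothetical; the workflow itself implements its own check.

```python
import json
import urllib.error
import urllib.request

BOOTSTRAP_PATHS = ("/hubs", "/hub-capability-manifests", "/api-consumers", "/policy-scopes")


def gate_failures(hubs_status: int, openapi: dict) -> list[str]:
    """Reasons the rollout should fail; an empty list means the gate passes."""
    failures = []
    if hubs_status != 401:
        failures.append(f"/api/v2/hubs returned {hubs_status}, expected 401")
    paths = openapi.get("paths", {})
    failures += [f"OpenAPI lacks {p}" for p in BOOTSTRAP_PATHS if p not in paths]
    return failures


def probe(base: str = "https://hub.coulomb.social", timeout: float = 10.0) -> list[str]:
    """Fetch both signals from a live deployment and apply gate_failures."""
    try:
        with urllib.request.urlopen(f"{base}/api/v2/hubs", timeout=timeout) as resp:
            hubs_status = resp.status
    except urllib.error.HTTPError as err:
        hubs_status = err.code
    with urllib.request.urlopen(f"{base}/api/v2/openapi.json", timeout=timeout) as resp:
        openapi = json.load(resp)
    return gate_failures(hubs_status, openapi)
```

Against the stale deployment described in the 2026-06-05 review, this rule reports the 404 and all four missing paths.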

Authenticated inspection note (2026-06-14): The stored local Tea token is stale for https://gitea.coulomb.social, but runner-side inspection succeeded. make runner-status in railiance-forge showed act_runner registered to https://gitea.coulomb.social, started under OpenRC, and carrying the expected self-hosted/haskelseed labels. The runner log shows task 19 for coulomb/inter-hub starting at 2026-06-14T19:59:19+02:00, matching the 6455902 deploy trigger.

R8 — Staged deployment and smoke test

id: IHUB-WP-0018-T08
status: done
priority: high
state_hub_task_id: "2b02ae5c-47b9-4f09-88f0-a4af7900b38f"

Follow the Railiance staged promotion lifecycle:

  1. Local verify (done in R2 — container runs correctly)
  2. Deploy to Railiance01:
    railiance deploy inter-hub --tag <sha>
    
  3. Smoke test:
    curl -s https://hub.coulomb.social/ | grep -i "inter-hub"   # Landing page
    curl -s -o /dev/null -w '%{http_code}\n' \
      https://hub.coulomb.social/capabilities                # Capabilities
    curl -s -o /dev/null -w '%{http_code}\n' -H "Authorization: Bearer <key>" \
      https://hub.coulomb.social/api/v2/hubs                 # API (200)
    curl -s -o /dev/null -w '%{http_code}\n' \
      https://hub.coulomb.social/api/v2/hubs                 # Unauthenticated (401)
    
  4. Verify restart persistence:
    kubectl rollout restart deployment/inter-hub -n inter-hub
    kubectl rollout status deployment/inter-hub -n inter-hub
    # Then re-run smoke test
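The status-code expectations in step 3 can be expressed as data, so the same list runs after the deploy and again after the rollout restart. The sketch below assumes /capabilities is a public page returning 200 (the steps above give no expected status for it). It also assumes fetch can be swapped for a stub in tests. The names are hypothetical.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, token: str | None = None, timeout: float = 10.0) -> int:
    """GET url (optionally with a Bearer token); return the HTTP status, 4xx/5xx included."""
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def smoke_test(base: str, api_key: str,
               fetch: Callable[[str, str | None], int] = http_status) -> list[str]:
    """One failure line per unmet expectation; an empty list means green."""
    checks = [  # (path, token, expected status)
        ("/", None, 200),
        ("/capabilities", None, 200),   # assumption: public page
        ("/api/v2/hubs", api_key, 200),
        ("/api/v2/hubs", None, 401),
    ]
    failures = []
    for path, token, expected in checks:
        got = fetch(base.rstrip("/") + path, token)
        if got != expected:
            auth = "with key" if token else "no key"
            failures.append(f"{path} ({auth}): expected {expected}, got {got}")
    return failures
```

Usage: smoke_test("https://hub.coulomb.social", "<key>") before and after kubectl rollout restart.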
    

Recovery note (2026-06-14): Production is deployed from image gitea.coulomb.social/coulomb/inter-hub:5663fab; Kubernetes reports the inter-hub deployment ready with one replica. Public smoke checks pass: / returns 200 and contains inter-hub, /api/v2/openapi.json returns 200, and unauthenticated /api/v2/widgets returns 401.

DNS gate finding (2026-06-14): The deployment workflow did publish and deploy gitea.coulomb.social/coulomb/inter-hub:6455902; Kubernetes reports the inter-hub Deployment ready on the COULOMBCORE K3s node 92.205.130.254. An in-cluster probe to http://inter-hub:8000/api/v2/hubs returned the expected unauthenticated 401, and forcing public TLS to 92.205.130.254 also returned 401. The public DNS record for hub.coulomb.social, however, resolves to 92.205.62.239, where /api/v2/hubs still returns 404 and OpenAPI lacks the bootstrap paths. The remaining production gate is therefore DNS cutover (or an intentional kubeconfig rotation to the cluster behind 92.205.62.239), not a runner, build, registry, Helm, or image-content issue.
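The DNS gate comes down to comparing the resolved A records with the cluster ingress address. The sketch below uses socket.getaddrinfo with an injectable resolver so it can be tested offline. The function names are hypothetical, and results reflect the local resolver's cache. The forced-TLS probe mentioned above corresponds to curl's --resolve hub.coulomb.social:443:92.205.130.254 option.

```python
from __future__ import annotations

import socket
from typing import Callable


def resolved_ipv4(host: str, resolver: Callable = socket.getaddrinfo) -> set[str]:
    """All IPv4 addresses the resolver returns for host on port 443."""
    infos = resolver(host, 443, socket.AF_INET, socket.SOCK_STREAM)
    return {sockaddr[0] for *_, sockaddr in infos}


def dns_cutover_done(host: str, expected_ip: str,
                     resolver: Callable = socket.getaddrinfo) -> bool:
    """True only when every A record points at the expected cluster ingress."""
    return resolved_ipv4(host, resolver) == {expected_ip}
```

While the record still pointed at 92.205.62.239, dns_cutover_done("hub.coulomb.social", "92.205.130.254") would have returned False.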

Production gate completion note (2026-06-14): DNS for hub.coulomb.social now resolves to 92.205.130.254, cert-manager issued a Let's Encrypt certificate for the host, and the app deployment is serving image gitea.coulomb.social/coulomb/inter-hub:6455902. The final blockers were database ingress from inter-hub to net-kingdom-pg and the blank production schema. Added/applied the platform NetworkPolicy, initialized the interhub schema and framework type registries, granted privileges to the app role, and restarted the deployment. The ops-hub gate probe now passes: /api/v2/hubs returns the expected unauthenticated 401, /api/v2/openapi.json returns 200, and OpenAPI exposes /hubs, /hub-capability-manifests, /api-consumers, and /policy-scopes.

R9 — Document and register

id: IHUB-WP-0018-T09
status: done
priority: medium
state_hub_task_id: "4d1e55c7-8dbb-480f-b07b-6c5e39a04218"

  • Write deploy/railiance/RUNBOOK.md: image build, migration procedure, secret rotation, rollback (railiance rollback inter-hub), log access (kubectl logs -n inter-hub -l app=inter-hub --tail=100)
  • Add progress event to state hub
  • Remove the haskelseed socat/OpenRC production-role note from the quickstart; document haskelseed as the build machine only, not the production host

Implementation note (2026-06-05): deploy/railiance/RUNBOOK.md exists and documents architecture, image build/push, Helm deployment, logs, restart, rollback, secret rotation, and smoke checks. The deployment record remains incomplete until current main is running and the ops-hub bootstrap smoke test passes against production.

Recovery note (2026-06-14): Current main is running in production and the deployment evidence has been recorded here. Remaining documentation work is to capture the durable secret-management and railiance-apps handoff path once R5 and R6 are completed.

Completion note (2026-06-14): Updated deploy/railiance/RUNBOOK.md for the current Gitea registry host, runner-based build/deploy path, SOPS secret handoff, current smoke checks, and haskelseed's build-runner-only role. Updated docs/new-hub-quickstart.md so haskelseed is no longer described as a production/shared database runtime.

Exit Criteria

  • https://hub.coulomb.social/ returns the Landing page (200, no auth)
  • /api/v2/hubs returns 401 unauthenticated, 200 with valid API key
  • All 12 IHF dashboards accessible after admin login
  • kubectl rollout restart followed by smoke test passes (K3s restart persistence confirmed)
  • Gitea Actions pipeline: push to main → image built → deployed → smoke test green within 15 minutes
  • No dependency on haskelseed being up for the app to run (only for builds)

Open Questions / Pre-flight Checks

  1. K3s status: ThreePhoenix HA cluster workstream is active but not complete. Confirm whether Railiance01 is a single-node cluster already accepting workloads or still being provisioned. Gate R3 is the go/no-go check.

  2. Container registry: Is Gitea's built-in registry available on Railiance01, or is a separate registry service needed? If no registry is available, add a registry deployment to the scope.

  3. PostgreSQL HA status: railiance-platform baseline workstream is active. Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.

  4. Static asset bundling: The Nix production binary may or may not include static/app.css (Tailwind output). Verify in R2 and adjust image build if needed.

  5. Anthropic API key: Phase 5 AI-assisted distillation requires IHP_ANTHROPIC_API_KEY. Add to SOPS secrets if the feature is to be active on Railiance01.