inter-hub/workplans/IHUB-WP-0018-railiance01-deployment.md
tegwick 0edf05324e
feat(WP-0018): workplan for Railiance01 deployment with full ops scaffold
OCI image build (Nix dockerTools), Helm chart in railiance-apps,
SOPS/age secrets, PostgreSQL HA on railiance-platform, Traefik ingress,
Gitea Actions CI/CD. Includes dependency gate on K3s cluster readiness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 15:50:24 +02:00

---
id: IHUB-WP-0018
type: workplan
title: "Railiance01 Deployment — Production Operations Scaffold"
domain: inter_hub
repo: inter-hub
status: open
owner: custodian
topic_slug: inter_hub
created: "2026-04-29"
updated: "2026-04-29"
depends_on: IHUB-WP-0015
state_hub_workstream_id: "080d841a-3acd-4adf-b684-2d1890a5e986"
---
# IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold
## Goal
Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic
deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a
Gitea Actions CI/CD pipeline. After this workplan, every push to `main`
automatically builds an OCI container image on haskelseed, pushes it to the
Railiance container registry, and deploys it — with automatic restart on node
reboot guaranteed by K3s.
## Background
inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer
and socat. That setup is a development convenience, not a production operations
scaffold. The target is the Railiance01 K3s cluster, which has:
- K3s (single-node for now; ThreePhoenix HA cluster is in progress)
- Traefik ingress with TLS
- PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
- SOPS/age secret management
- Gitea with built-in container registry (or separate registry service)
- Staged Promotion Lifecycle CLI (`railiance run / deploy / promote / rollback`)
**Key constraint:** This workplan depends on Railiance01 K3s being operational.
Gate R3 verifies cluster readiness before any deployment work begins — if K3s
or the container registry is not ready, this workplan blocks there and the
cluster work must be completed first.
**IHP specifics:** The IHP DevServer is a development server only. For
production we build the IHP binary via `nix build` (which produces a
self-contained binary) and wrap it in a minimal OCI image using Nix's
`dockerTools.buildLayeredImage` (see R1). The app serves HTTP on port 8000.
The socat workaround is not needed in Kubernetes because Traefik routes
directly to the pod's port.
## Architecture
```
git push → Gitea Actions
  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
  → helm upgrade inter-hub railiance-apps/helm/inter-hub
      → Deployment (1 replica): inter-hub:$SHA + env from Secrets
      → Service (ClusterIP :8000)
      → Ingress (Traefik): hub.coulomb.social → Service
      → PersistentVolumeClaim: /app/static (generated CSS/JS)
      → PostgreSQL: database 'interhub' on railiance-platform HA cluster
```
## Tasks
### R1 — Add OCI image build to flake.nix
Add a `packages.docker` output to `flake.nix` using `pkgs.dockerTools.buildLayeredImage`.
The image wraps the IHP production binary produced by `nix build .#default`.
```nix
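# Layered OCI image wrapping the production binary from `nix build .#default`
packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  # cacert ships CA certificates alongside the application binary
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    Cmd = [ "/bin/inter-hub" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [
      "PORT=8000"
      "IHP_ENV=Production"
    ];
  };
};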
```
Test locally on haskelseed:
```bash
nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
```
**Note:** First build pulls the full Haskell binary closure (~2 GB); subsequent
builds are incremental (layer caching). Build must run on haskelseed — the only
machine with the Nix store populated for GHC 9.10.3.
### R2 — Verify container runs correctly
On haskelseed, run the container image against the existing `interhub` database.
Confirm:
- `curl http://localhost:8000/` returns 200 (LandingAction)
- `curl http://localhost:8000/api/v2/hubs` returns 401 (auth required)
- Static assets load (Tailwind CSS present in image)
- Container exits cleanly on SIGTERM
If Tailwind CSS output (`static/app.css`) is not bundled into the Nix binary
closure, add a pre-build step: run tailwindcss and include `static/` in the
image via `dockerTools.buildLayeredImage` `contents` or a NixOS module.
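The two status checks listed above can be scripted so that R2 yields a pass/fail result instead of a visual inspection. The following is a minimal sketch; `http_status` and `expect_status` are hypothetical helper names, and the only tool assumed is `curl`.

```bash
# http_status URL: print only the HTTP status code of a GET request.
http_status() {
    # -s silences progress output, -o discards the body,
    # -w prints the status code after the transfer
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# expect_status URL CODE: succeed if URL answers with CODE, fail otherwise.
expect_status() {
    url=$1
    want=$2
    got=$(http_status "$url")
    if [ "$got" = "$want" ]; then
        echo "ok   $url -> $got"
    else
        echo "FAIL $url -> $got (expected $want)" >&2
        return 1
    fi
}
```

With the container from R1 running, `expect_status http://localhost:8000/ 200 && expect_status http://localhost:8000/api/v2/hubs 401` would cover the first two bullets. The static-asset and SIGTERM checks still need to be done separately.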
### R3 — Verify Railiance01 readiness (gate)
This is a dependency gate. Before proceeding, confirm:
```bash
# From CoulombCore (execution origin):
kubectl get nodes # must show Ready
kubectl get pods -n kube-system | grep traefik # Traefik must be running
kubectl get pods -n railiance-platform # PostgreSQL HA pods
```
Also confirm:
- The container registry is reachable from haskelseed and push access works
- The registry address is settled (e.g., `registry.coulomb.social` or `gitea.coulomb.social`)
- The SOPS/age key is present on CoulombCore at `~/.config/sops/age/keys.txt`
If any check fails, block here and open the relevant Railiance workstream.
Do not proceed until all checks pass.
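One way to make the gate mechanical is to wrap each check in a small runner that records failures and produces a single go/no-go result. This is a sketch under assumptions: `gate`, `gate_result`, and `GATE_FAILURES` are hypothetical names, and the exact checks to pass in are the ones listed above.

```bash
# Number of failed checks so far.
GATE_FAILURES=0

# gate DESCRIPTION COMMAND...: run one readiness check and report the result.
gate() {
    desc=$1
    shift
    # The check's own output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $desc"
    else
        echo "FAIL  $desc"
        GATE_FAILURES=$((GATE_FAILURES + 1))
    fi
}

# gate_result: succeed only if every check so far has passed.
gate_result() {
    [ "$GATE_FAILURES" -eq 0 ]
}
```

A possible use from CoulombCore: `gate "node Ready" sh -c 'kubectl get nodes | grep -qw Ready'`, then `gate "age key present" test -s ~/.config/sops/age/keys.txt`, and finally `gate_result || exit 1`. Running every check before deciding, rather than stopping at the first failure, shows all blockers in one pass.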
### R4 — Provision inter-hub database on railiance-platform
On the PostgreSQL HA cluster, create the inter-hub database and user:
```sql
CREATE USER interhub WITH PASSWORD '<generated>';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;
```
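The `<generated>` password can come from any cryptographically secure source. As a sketch that relies only on common Unix tools (`head`, `od`, `tr`), the hypothetical helper `hex_secret` below prints random bytes as hex. Thirty-two bytes yield 64 hex characters, which is also the format R5 asks for in `IHP_SESSION_SECRET`.

```bash
# hex_secret [NBYTES]: print NBYTES random bytes (default 32) as lowercase hex.
hex_secret() {
    nbytes=${1:-32}
    # od -An drops the offset column, -tx1 prints each byte as two hex digits;
    # tr then removes the spaces and line breaks od inserts
    head -c "$nbytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}
```

Hex output avoids characters that would need quoting in SQL or URL-encoding in a `postgresql://` connection string.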
Run the IHP schema migrations as part of the first deployment, either via an
init container or with a manual `migrate` run inside the pod. Document the
migration procedure in `deploy/railiance/RUNBOOK.md`.
### R5 — SOPS-encrypted secrets
Create `deploy/railiance/secrets/inter-hub.env.sops.yaml` with:
```yaml
# sops encrypted — do not edit manually
DATABASE_URL: postgresql://interhub:<pass>@pgpool.railiance-platform.svc:5432/interhub
IHP_SESSION_SECRET: <64-char-hex>
IHP_BASEURL: https://hub.coulomb.social
```
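Encrypt in place with the age public key. Redirecting the output of `sops`
back onto its own input file would truncate the file before it is read, so
use `--in-place` instead:
```bash
sops --encrypt --in-place \
  --age "$(grep 'public key' ~/.config/sops/age/keys.txt | awk '{print $4}')" \
  deploy/railiance/secrets/inter-hub.env.sops.yaml
```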
Commit the encrypted file. The Gitea Actions workflow decrypts at deploy time
using the age key from a Kubernetes Secret (bootstrapped once manually).
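Because the plaintext and encrypted versions share one filename, a guard before committing can help avoid pushing the unencrypted file. The sketch below uses a hypothetical helper, `is_sops_encrypted`, and relies on two properties of sops YAML output: a top-level `sops:` metadata key and values wrapped as `ENC[AES256_GCM,...]`. It is a heuristic, not a substitute for `sops --decrypt` succeeding.

```bash
# is_sops_encrypted FILE: succeed only if FILE looks like sops-encrypted YAML.
is_sops_encrypted() {
    file=$1
    # Require the sops metadata block and at least one encrypted value,
    # and reject files that still contain a plaintext connection string.
    grep -q '^sops:' "$file" &&
        grep -q 'ENC\[AES256_GCM,' "$file" &&
        ! grep -q 'postgresql://' "$file"
}
```

For example, `is_sops_encrypted deploy/railiance/secrets/inter-hub.env.sops.yaml || exit 1` could run in a pre-commit hook or as an early CI step.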
### R6 — Helm chart in railiance-apps
Create `helm/inter-hub/` in the `railiance-apps` repository following the
Railiance app.toml contract. Minimal chart:
```
helm/inter-hub/
  Chart.yaml          name: inter-hub, version: 0.1.0
  values.yaml         image.tag, ingress.host, resources
  values.prod.yaml    replicas: 1, resources.requests.memory: 1Gi
  templates/
    deployment.yaml   envFrom: secretRef inter-hub-env
    service.yaml      ClusterIP :8000
    ingress.yaml      Traefik annotations, TLS
    secret.yaml       created by sops-operator or external-secrets
```
`app.toml` in the inter-hub repo root for railiance CLI integration:
```toml
[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "registry.coulomb.social/coulomb/inter-hub"
[deploy]
chart = "railiance-apps/helm/inter-hub"
namespace = "inter-hub"
```
### R7 — Gitea Actions CI/CD pipeline
Create `.gitea/workflows/deploy.yaml` in the inter-hub repo:
```yaml
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest  # or self-hosted if available
    env:
      # $REGISTRY is expanded on the runner before the command is sent over SSH
      REGISTRY: ${{ secrets.REGISTRY }}
    steps:
      - uses: actions/checkout@v4
      - name: Build OCI image on haskelseed
        run: |
          ssh haskelseed "cd /root/inter-hub && git pull && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"
      - name: Deploy to Railiance01
        run: |
          ssh coulombcore "helm upgrade --install inter-hub \
            railiance-apps/helm/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub/values.prod.yaml"
```
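Secrets in Gitea: `REGISTRY`, `SSH_KEY_HASKELSEED`, `SSH_KEY_COULOMBCORE`.
The workflow above assumes the runner's SSH client is already configured with
these keys and with the `haskelseed` and `coulombcore` host aliases; it does
not install them itself.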
**Alternative if self-hosted runner is available on CoulombCore:** run the
deploy step directly without the SSH hop to coulombcore.
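The exit criteria expect the pipeline to end with a green smoke test, and a freshly upgraded pod may need some time before it answers. A retry wrapper is one way to add that step after the deploy. This is a sketch only: `retry` is a hypothetical helper, and the attempt count and delay are assumptions to tune.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...: re-run COMMAND until it succeeds
# or ATTEMPTS runs have failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For instance, `retry 30 10 curl -fsS -o /dev/null https://hub.coulomb.social/` would poll the landing page for up to about five minutes, which fits inside the 15-minute budget. `curl -f` makes HTTP errors count as failures.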
### R8 — Staged deployment and smoke test
Follow the Railiance staged promotion lifecycle:
1. **Local verify** (done in R2 — container runs correctly)
2. **Deploy to Railiance01:**
```bash
railiance deploy inter-hub --tag <sha>
```
3. **Smoke test:**
```bash
curl -s https://hub.coulomb.social/ | grep "Inter-Hub" # Landing page
curl -s https://hub.coulomb.social/capabilities # Capabilities
curl -H "Authorization: Bearer <key>" \
https://hub.coulomb.social/api/v2/hubs # API (200)
curl https://hub.coulomb.social/api/v2/hubs # Unauthenticated (401)
```
4. **Verify restart persistence:**
```bash
kubectl rollout restart deployment/inter-hub -n inter-hub
kubectl rollout status deployment/inter-hub -n inter-hub
# Then re-run smoke test
```
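Steps 2 and 3 can be tied together so that a failed smoke test triggers the rollback command named in R9. The sketch below assumes the `railiance` CLI behaves as the commands in this workplan suggest (`railiance deploy inter-hub --tag <sha>` and `railiance rollback inter-hub`); `deploy_or_rollback` is a hypothetical wrapper, and whether a rollback is appropriate when the deploy command itself fails is a policy decision left open here.

```bash
# deploy_or_rollback TAG SMOKE_COMMAND...: deploy TAG, run the smoke command,
# and roll back if either step fails.
deploy_or_rollback() {
    tag=$1
    shift
    if railiance deploy inter-hub --tag "$tag" && "$@"; then
        echo "deploy of $tag verified"
    else
        echo "deploy of $tag failed, rolling back" >&2
        railiance rollback inter-hub
        return 1
    fi
}
```

The smoke command is passed in as arguments, so any script that exits non-zero on failure can be used, for example `deploy_or_rollback <sha> ./smoke.sh` with a hypothetical `smoke.sh` containing the curl checks from step 3.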
### R9 — Document and register
- Write `deploy/railiance/RUNBOOK.md`: image build, migration procedure,
secret rotation, rollback (`railiance rollback inter-hub`), log access
(`kubectl logs -n inter-hub -l app=inter-hub --tail=100`)
- Add progress event to state hub
- Remove haskelseed socat/OpenRC production role note from quickstart —
document it as the build machine only, not the production host
## Exit Criteria
- `https://hub.coulomb.social/` returns the Landing page (200, no auth)
- `/api/v2/hubs` returns 401 unauthenticated, 200 with valid API key
- All 12 IHF dashboards accessible after admin login
- `kubectl rollout restart` followed by smoke test passes (K3s restart
persistence confirmed)
- Gitea Actions pipeline: push to `main` → image built → deployed → smoke
test green within 15 minutes
- No dependency on haskelseed being up for the app to *run* (only for builds)
## Open Questions / Pre-flight Checks
1. **K3s status**: ThreePhoenix HA cluster workstream is active but not complete.
Confirm whether Railiance01 is a single-node cluster already accepting
workloads or still being provisioned. Gate R3 is the go/no-go check.
2. **Container registry**: Is Gitea's built-in registry available on Railiance01,
or is a separate registry service needed? If neither, add registry deployment
to the scope.
3. **PostgreSQL HA status**: railiance-platform baseline workstream is active.
Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.
4. **Static asset bundling**: The Nix production binary may or may not include
`static/app.css` (Tailwind output). Verify in R2 and adjust image build
if needed.
5. **Anthropic API key**: Phase 5 AI-assisted distillation requires
`IHP_ANTHROPIC_API_KEY`. Add to SOPS secrets if the feature is to be
active on Railiance01.