generated from coulomb/repo-seed
feat(WP-0018): workplan for Railiance01 deployment with full ops scaffold
Some checks failed
Test / test (push) Has been cancelled
OCI image build (Nix dockerTools), Helm chart in railiance-apps, SOPS/age secrets, PostgreSQL HA on railiance-platform, Traefik ingress, Gitea Actions CI/CD. Includes dependency gate on K3s cluster readiness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
294
workplans/IHUB-WP-0018-railiance01-deployment.md
Normal file
@@ -0,0 +1,294 @@
---
id: IHUB-WP-0018
type: workplan
title: "Railiance01 Deployment — Production Operations Scaffold"
domain: inter_hub
repo: inter-hub
status: open
owner: custodian
topic_slug: inter_hub
created: "2026-04-29"
updated: "2026-04-29"
depends_on: IHUB-WP-0015
state_hub_workstream_id: "080d841a-3acd-4adf-b684-2d1890a5e986"
---

# IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold

## Goal

Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic
deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a
Gitea Actions CI/CD pipeline. After this workplan, every push to `main`
automatically builds an OCI container image on haskelseed, pushes it to the
Railiance container registry, and deploys it, with automatic restart on node
reboot guaranteed by K3s.

## Background

inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer
and socat. That setup is a development convenience, not a production operations
scaffold. The target is the Railiance01 K3s cluster, which has:

- K3s (single-node for now; ThreePhoenix HA cluster is in progress)
- Traefik ingress with TLS
- PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
- SOPS/age secret management
- Gitea with built-in container registry (or separate registry service)
- Staged Promotion Lifecycle CLI (`railiance run / deploy / promote / rollback`)

**Key constraint:** This workplan depends on Railiance01 K3s being operational.
Gate R3 verifies cluster readiness before any deployment work begins. If K3s
or the container registry is not ready, this workplan blocks there and the
cluster work must be completed first.

**IHP specifics:** IHP DevServer is a development server. For production we
build the IHP binary via `nix build` (which produces a self-contained binary)
and wrap it in a minimal OCI image using Nix's `dockerTools.buildLayeredImage`
(see R1). The app serves HTTP on port 8000; the socat workaround is not needed
in Kubernetes since Traefik routes directly to the pod's port.

## Architecture

```
git push → Gitea Actions
  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
  → helm upgrade inter-hub railiance-apps/helm/inter-hub
      → Deployment (1 replica): inter-hub:$SHA + env from Secrets
      → Service (ClusterIP :8000)
      → Ingress (Traefik): hub.coulomb.social → Service
      → PersistentVolumeClaim: /app/static (generated CSS/JS)
      → PostgreSQL: database 'interhub' on railiance-platform HA cluster
```

## Tasks

### R1 — Add OCI image build to flake.nix

Add a `packages.docker` output to `flake.nix` using `pkgs.dockerTools.buildLayeredImage`.
The image wraps the IHP production binary produced by `nix build .#default`.

```nix
packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  # App binary plus the CA certificate bundle.
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    # Assumes .#default installs its executable as /bin/inter-hub;
    # adjust if the binary name differs.
    Cmd = [ "/bin/inter-hub" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [
      "PORT=8000"
      "IHP_ENV=Production"
    ];
  };
};
```

Test locally on haskelseed:
```bash
nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
```

**Note:** The first build pulls the full Haskell binary closure (~2 GB); subsequent
builds are incremental (layer caching). The build must run on haskelseed, the only
machine with the Nix store populated for GHC 9.10.3.

### R2 — Verify container runs correctly

On haskelseed, run the container image against the existing `interhub` database.
Confirm:
- `curl http://localhost:8000/` returns 200 (LandingAction)
- `curl http://localhost:8000/api/v2/hubs` returns 401 (auth required)
- Static assets load (Tailwind CSS present in image)
- Container exits cleanly on SIGTERM

If the Tailwind CSS output (`static/app.css`) is not bundled into the Nix binary
closure, add a pre-build step: run tailwindcss and include `static/` in the
image via the `contents` attribute of `dockerTools.buildLayeredImage` or a NixOS module.
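
The SIGTERM check above can be made concrete by stopping the container and inspecting its exit code. The sketch below is one way to do that, assuming the container was started detached under the hypothetical name `inter-hub-test` and without `--rm`, so that its state survives the stop. `docker stop` sends SIGTERM and falls back to SIGKILL after the grace period, so an exit code of 137 suggests the app did not shut down in time.

```shell
#!/usr/bin/env bash
# Sketch: classify how a container exited after `docker stop`.
# The container name "inter-hub-test" is an assumption for illustration.

# classify_exit CODE: map an exit code to a short verdict.
#   0   -> the process handled SIGTERM and exited on its own
#   143 -> terminated by SIGTERM without a handler (128 + 15)
#   137 -> killed by SIGKILL after the grace period (128 + 9)
classify_exit() {
  case "$1" in
    0)   echo "clean" ;;
    143) echo "terminated-by-sigterm" ;;
    137) echo "killed-after-timeout" ;;
    *)   echo "error" ;;
  esac
}

# stop_and_classify NAME: stop the container, then print the verdict.
stop_and_classify() {
  local name="$1" code
  docker stop "$name" >/dev/null
  code="$(docker inspect --format '{{.State.ExitCode}}' "$name")"
  classify_exit "$code"
}

# Usage (not run automatically):
#   docker run -d --name inter-hub-test -p 8000:8000 \
#     -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
#   stop_and_classify inter-hub-test
```

Only the `clean` verdict satisfies the checklist item as written; whether `terminated-by-sigterm` is acceptable is a judgment call for the runbook.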

### R3 — Verify Railiance01 readiness (gate)

This is a dependency gate. Before proceeding, confirm:

```bash
# From CoulombCore (execution origin):
kubectl get nodes                                # must show Ready
kubectl get pods -n kube-system | grep traefik   # Traefik must be running
kubectl get pods -n railiance-platform           # PostgreSQL HA pods
```

Also confirm:
- Container registry is reachable from haskelseed (verify push access)
- Registry address is settled (e.g., `registry.coulomb.social` or `gitea.coulomb.social`)
- SOPS/age key is present on CoulombCore at `~/.config/sops/age/keys.txt`

If any check fails, block here and open the relevant Railiance workstream.
Do not proceed until all checks pass.
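
The gate can also be scripted so that a single exit code answers go or no-go. This is a minimal sketch, not the Railiance CLI's own mechanism: the helper names `check`, `traefik_running`, and `run_gate` are hypothetical, and the registry push check is left out because the registry address is still an open question.

```shell
#!/usr/bin/env bash
# Sketch of a go/no-go gate: run each readiness check and count failures.
failures=0

# check LABEL COMMAND [ARGS...]: run the command quietly and record the result.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Succeeds if at least one Traefik pod in kube-system reports Running.
traefik_running() {
  kubectl get pods -n kube-system --no-headers | grep traefik | grep -q Running
}

run_gate() {
  check "nodes Ready"        kubectl wait --for=condition=Ready nodes --all --timeout=30s
  check "Traefik running"    traefik_running
  # Only proves the namespace can be queried; pod health still needs a look.
  check "platform namespace" kubectl get pods -n railiance-platform
  check "age key present"    test -s "$HOME/.config/sops/age/keys.txt"
  [ "$failures" -eq 0 ]
}

# Usage (from CoulombCore):
#   run_gate || { echo "Gate R3 failed: $failures check(s)"; exit 1; }
```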

### R4 — Provision inter-hub database on railiance-platform

On the PostgreSQL HA cluster, create the inter-hub database and user:

```sql
CREATE USER interhub WITH PASSWORD '<generated>';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;
```

Run the schema migration (IHP migrations) as part of the first deployment via an
init container or a manual `migrate` run inside the pod. Document the
migration procedure in `deploy/railiance/RUNBOOK.md`.

### R5 — SOPS-encrypted secrets

Create `deploy/railiance/secrets/inter-hub.env.sops.yaml` with:

```yaml
# SOPS-encrypted; do not edit manually
DATABASE_URL: postgresql://interhub:<pass>@pgpool.railiance-platform.svc:5432/interhub
IHP_SESSION_SECRET: <64-char-hex>
IHP_BASEURL: https://hub.coulomb.social
```
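
The `<pass>` and `<64-char-hex>` placeholders need random values. A minimal sketch, assuming hex output from `/dev/urandom` is acceptable for both; the function name `gen_hex` is hypothetical. Hex has the side benefit that the password can be embedded in `DATABASE_URL` without URL-encoding.

```shell
#!/usr/bin/env bash
# gen_hex N: print N random bytes as 2*N lowercase hex characters.
gen_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage:
#   gen_hex 32   # 64 hex characters, e.g. for IHP_SESSION_SECRET
#   gen_hex 16   # 32 hex characters, e.g. for the database password
```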

Encrypt with the age key:
```bash
# Encrypt in place: redirecting sops output onto its own input file
# would truncate the file before sops reads it.
sops --encrypt --in-place \
  --age "$(grep 'public key' ~/.config/sops/age/keys.txt | awk '{print $4}')" \
  deploy/railiance/secrets/inter-hub.env.sops.yaml
```

Commit only the encrypted file. The Gitea Actions workflow decrypts at deploy time
using the age key from a Kubernetes Secret (bootstrapped once manually).
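
Before committing, it is worth confirming that the file really is encrypted and still decrypts with the local key. The following is a hedged sketch: `age_recipient`, `is_sops_encrypted`, and `verify_encrypted` are hypothetical helper names. It relies on two assumptions, that the key file has the layout written by `age-keygen` (a `# public key:` comment line) and that a SOPS-encrypted YAML file carries a top-level `sops:` metadata block.

```shell
#!/usr/bin/env bash
# age_recipient KEYFILE: print the public key recorded in an age-keygen key file.
age_recipient() {
  awk '/^# public key: / { print $4; exit }' "$1"
}

# is_sops_encrypted FILE: succeed if the YAML file carries sops metadata.
is_sops_encrypted() {
  grep -q '^sops:' "$1"
}

# verify_encrypted FILE: refuse plaintext, then prove the file decrypts.
verify_encrypted() {
  is_sops_encrypted "$1" || { echo "not encrypted: $1" >&2; return 1; }
  sops --decrypt "$1" >/dev/null
}

# Usage:
#   age_recipient ~/.config/sops/age/keys.txt
#   verify_encrypted deploy/railiance/secrets/inter-hub.env.sops.yaml
```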

### R6 — Helm chart in railiance-apps

Create `helm/inter-hub/` in the `railiance-apps` repository following the
Railiance app.toml contract. Minimal chart:

```
helm/inter-hub/
  Chart.yaml          name: inter-hub, version: 0.1.0
  values.yaml         image.tag, ingress.host, resources
  values.prod.yaml    replicas: 1, resources.requests.memory: 1Gi
  templates/
    deployment.yaml   envFrom: secretRef inter-hub-env
    service.yaml      ClusterIP :8000
    ingress.yaml      Traefik annotations, TLS
    secret.yaml       created by sops-operator or external-secrets
```

`app.toml` in the inter-hub repo root for railiance CLI integration:
```toml
[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "registry.coulomb.social/coulomb/inter-hub"

[deploy]
chart = "railiance-apps/helm/inter-hub"
namespace = "inter-hub"
```

### R7 — Gitea Actions CI/CD pipeline

Create `.gitea/workflows/deploy.yaml` in the inter-hub repo:

```yaml
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest  # or self-hosted if available
    env:
      # Expose the registry secret so $REGISTRY expands in the run steps.
      REGISTRY: ${{ secrets.REGISTRY }}
    steps:
      - uses: actions/checkout@v4

      # Assumes SSH access to both hosts is already configured on the runner
      # (keys from SSH_KEY_HASKELSEED / SSH_KEY_COULOMBCORE).
      - name: Build OCI image on haskelseed
        run: |
          # Check out exactly the pushed commit so the image tag matches its contents.
          ssh haskelseed "cd /root/inter-hub && git fetch origin && \
            git checkout --detach ${{ github.sha }} && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"

      - name: Deploy to Railiance01
        run: |
          ssh coulombcore "helm upgrade --install inter-hub \
            railiance-apps/helm/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub/values.prod.yaml"
```

Secrets in Gitea: `REGISTRY`, `SSH_KEY_HASKELSEED`, `SSH_KEY_COULOMBCORE`.

**Alternative if self-hosted runner is available on CoulombCore:** run the
deploy step directly without the SSH hop to coulombcore.

### R8 — Staged deployment and smoke test

Follow the Railiance staged promotion lifecycle:

1. **Local verify** (done in R2: container runs correctly)
2. **Deploy to Railiance01:**
   ```bash
   railiance deploy inter-hub --tag <sha>
   ```
3. **Smoke test:**
   ```bash
   curl -s https://hub.coulomb.social/ | grep "Inter-Hub"   # Landing page
   curl -s https://hub.coulomb.social/capabilities          # Capabilities
   curl -H "Authorization: Bearer <key>" \
     https://hub.coulomb.social/api/v2/hubs                 # API (200)
   curl https://hub.coulomb.social/api/v2/hubs              # Unauthenticated (401)
   ```
4. **Verify restart persistence:**
   ```bash
   kubectl rollout restart deployment/inter-hub -n inter-hub
   kubectl rollout status deployment/inter-hub -n inter-hub
   # Then re-run smoke test
   ```
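
The smoke test in step 3 prints response bodies, so the expected status codes still have to be read off by hand. A sketch that asserts them instead, under stated assumptions: `http_status`, `expect_status`, and `smoke_test` are hypothetical helper names, `BASE_URL` defaults to the hostname used in this workplan, and `API_KEY` must be supplied by the caller.

```shell
#!/usr/bin/env bash
# Sketch: assert HTTP status codes instead of eyeballing curl output.
BASE_URL="${BASE_URL:-https://hub.coulomb.social}"

# http_status [CURL_ARGS...] URL: print only the response status code.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# expect_status LABEL CODE [CURL_ARGS...] URL
# The label is printed instead of the arguments so that headers
# (including the API key) never end up in the log.
expect_status() {
  local label="$1" expected="$2" actual
  shift 2
  actual="$(http_status "$@")"
  if [ "$actual" = "$expected" ]; then
    echo "ok    $label ($expected)"
  else
    echo "FAIL  $label (expected $expected, got $actual)"
    return 1
  fi
}

smoke_test() {
  local rc=0
  expect_status "landing page"        200 "$BASE_URL/" || rc=1
  expect_status "api unauthenticated" 401 "$BASE_URL/api/v2/hubs" || rc=1
  expect_status "api authenticated"   200 \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY}" \
    "$BASE_URL/api/v2/hubs" || rc=1
  return "$rc"
}

# Usage: API_KEY=... smoke_test
```

Running every check before returning, rather than stopping at the first failure, keeps the log useful when several endpoints break at once.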

### R9 — Document and register

- Write `deploy/railiance/RUNBOOK.md`: image build, migration procedure,
  secret rotation, rollback (`railiance rollback inter-hub`), log access
  (`kubectl logs -n inter-hub -l app=inter-hub --tail=100`)
- Add progress event to state hub
- Remove the haskelseed socat/OpenRC production-role note from the quickstart;
  document haskelseed as the build machine only, not the production host

## Exit Criteria

- `https://hub.coulomb.social/` returns the Landing page (200, no auth)
- `/api/v2/hubs` returns 401 unauthenticated, 200 with valid API key
- All 12 IHF dashboards accessible after admin login
- `kubectl rollout restart` followed by smoke test passes (K3s restart
  persistence confirmed)
- Gitea Actions pipeline: push to `main` → image built → deployed → smoke
  test green within 15 minutes
- No dependency on haskelseed being up for the app to *run* (only for builds)
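
The 15-minute criterion implies polling until the new revision answers or a deadline passes. A minimal sketch of such a loop; `wait_until` is a hypothetical helper, and the probe command in the usage line is only an example.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
wait_until() {
  local timeout="$1" interval="$2" start now
  shift 2
  start="$(date +%s)"
  while ! "$@"; do
    now="$(date +%s)"
    if [ $((now - start)) -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Usage: allow up to 15 minutes, probing every 10 seconds.
#   wait_until 900 10 curl -fsS -o /dev/null https://hub.coulomb.social/
```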

## Open Questions / Pre-flight Checks

1. **K3s status**: ThreePhoenix HA cluster workstream is active but not complete.
   Confirm whether Railiance01 is a single-node cluster already accepting
   workloads or still being provisioned. Gate R3 is the go/no-go check.

2. **Container registry**: Is Gitea's built-in registry available on Railiance01,
   or is a separate registry service needed? If neither, add registry deployment
   to the scope.

3. **PostgreSQL HA status**: railiance-platform baseline workstream is active.
   Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.

4. **Static asset bundling**: The Nix production binary may or may not include
   `static/app.css` (Tailwind output). Verify in R2 and adjust image build
   if needed.

5. **Anthropic API key**: Phase 5 AI-assisted distillation requires
   `IHP_ANTHROPIC_API_KEY`. Add to SOPS secrets if the feature is to be
   active on Railiance01.