| id | type | title | domain | repo | status | owner | topic_slug | created | updated | depends_on | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IHUB-WP-0018 | workplan | Railiance01 Deployment — Production Operations Scaffold | inter_hub | inter-hub | finished | custodian | inter_hub | 2026-04-29 | 2026-06-14 | IHUB-WP-0015 | 080d841a-3acd-4adf-b684-2d1890a5e986 |
IHUB-WP-0018 — Railiance01 Deployment: Production Operations Scaffold
Goal
Deploy inter-hub to the Railiance01 Kubernetes cluster with fully automatic
deployment, SOPS-encrypted secrets, Traefik ingress, PostgreSQL HA, and a
Gitea Actions CI/CD pipeline. After this workplan, every push to main
automatically builds an OCI container image on haskelseed, pushes it to the
Railiance container registry, and deploys it — with automatic restart on node
reboot guaranteed by K3s.
Background
inter-hub v0.2.0-alpha.1 is running on haskelseed (Alpine) via RunDevServer and socat. That setup is a development convenience, not a production operations scaffold. The target is the Railiance01 K3s cluster, which has:
- K3s (single-node for now; ThreePhoenix HA cluster is in progress)
- Traefik ingress with TLS
- PostgreSQL HA (repmgr + pgpool) managed by railiance-platform
- SOPS/age secret management
- Gitea with built-in container registry (or separate registry service)
- Staged Promotion Lifecycle CLI (railiance run / deploy / promote / rollback)
Key constraint: This workplan depends on Railiance01 K3s being operational. Gate R3 verifies cluster readiness before any deployment work begins — if K3s or the container registry is not ready, this workplan blocks there and the cluster work must be completed first.
IHP specifics: IHP DevServer is a development server. For production we
build the IHP binary via nix build (which produces a self-contained binary)
and wrap it in a minimal OCI image using Nix's dockerTools.buildImage. The
app serves HTTP on port 8000; the socat workaround is not needed in Kubernetes
since Traefik routes directly to the pod's port.
Architecture
git push → Gitea Actions
  → SSH to haskelseed: nix build → docker load → docker push registry/inter-hub:$SHA
  → helm upgrade inter-hub railiance-apps/helm/inter-hub
      → Deployment (1 replica): inter-hub:$SHA + env from Secrets
      → Service (ClusterIP :8000)
      → Ingress (Traefik): hub.coulomb.social → Service
      → PersistentVolumeClaim: /app/static (generated CSS/JS)
      → PostgreSQL: database 'interhub' on railiance-platform HA cluster
Close-out Audit - 2026-06-04
WSJF triage flagged this workplan as a close-out candidate because State Hub had no indexed task rows for it. The deployment work is not complete; this file now contains explicit task blocks so the hub can track the remaining Railiance01 deployment work instead of treating the workplan as empty.
Deployment Review - 2026-06-05
Review against the current repo and public Railiance endpoint shows the
deployment scaffold is partially implemented but the live deployment is behind
origin/main.
- origin/main is at a3d980c, which includes the completed ops-hub bootstrap API work from IHUB-WP-0019.
- https://hub.coulomb.social/ returns 200 and serves inter-hub.
- The public OpenAPI only lists the older v2 endpoints; it does not include /hubs, /hub-capability-manifests, /api-consumers, or /policy-scopes.
- Unauthenticated /api/v2/hubs returns 404 publicly, while current source should route it and return 401. This means ops-hub bootstrap cannot run against production until the current image is deployed.
- The registry endpoint returns the expected unauthenticated /v2/ 401 challenge, but this workspace does not have kubectl, so R3 cluster readiness cannot be fully verified from here.
Tasks
R1 - Add OCI image build to flake.nix
id: IHUB-WP-0018-T01
status: done
priority: high
state_hub_task_id: "27420bd7-0f70-4793-8805-393d8d5cacfd"
Add a packages.docker output to flake.nix using pkgs.dockerTools.buildLayeredImage.
The image wraps the IHP production binary produced by nix build .#default.
packages.docker = pkgs.dockerTools.buildLayeredImage {
  name = "inter-hub";
  tag = "latest";
  contents = [ self.packages.${system}.default pkgs.cacert ];
  config = {
    Cmd = [ "/bin/inter-hub" ];
    ExposedPorts = { "8000/tcp" = {}; };
    Env = [
      "PORT=8000"
      "IHP_ENV=Production"
    ];
  };
};
Test locally on haskelseed:
nix build .#docker
docker load < result
docker run --rm -p 8000:8000 -e DATABASE_URL=... -e IHP_SESSION_SECRET=... inter-hub:latest
Note: First build pulls the full Haskell binary closure (~2 GB); subsequent builds are incremental (layer caching). Build must run on haskelseed - the only machine with the Nix store populated for GHC 9.10.3.
Implementation note (2026-06-05): flake.nix exposes packages.docker = config.packages.unoptimized-docker-image, the IHP-provided production OCI
image used by the Railiance runbook. The original buildLayeredImage sketch is
superseded by that IHP image path.
R2 — Verify container runs correctly
id: IHUB-WP-0018-T02
status: done
priority: high
state_hub_task_id: "5ab45e4e-16bc-4feb-8b1b-e8eeb05bf39a"
On haskelseed, run the container image against the existing interhub database.
Confirm:
- curl http://localhost:8000/ returns 200 (LandingAction)
- curl http://localhost:8000/api/v2/hubs returns 401 (auth required)
- Static assets load (Tailwind CSS present in image)
- Container exits cleanly on SIGTERM
If Tailwind CSS output (static/app.css) is not bundled into the Nix binary
closure, add a pre-build step: run tailwindcss and include static/ in the
image via dockerTools.buildLayeredImage contents or a NixOS module.
R3 — Verify Railiance01 readiness (gate)
id: IHUB-WP-0018-T03
status: done
priority: high
state_hub_task_id: "79b5cf2c-3a5b-4b4b-8f84-f635cb6891c1"
This is a dependency gate. Before proceeding, confirm:
# From CoulombCore (execution origin):
kubectl get nodes # must show Ready
kubectl get pods -n kube-system | grep traefik # Traefik must be running
kubectl get pods -n railiance-platform # PostgreSQL HA pods
Also confirm:
- Container registry is reachable from haskelseed (verify push access)
- Registry address (e.g., registry.coulomb.social or gitea.coulomb.social)
- SOPS/age key is present on CoulombCore at ~/.config/sops/age/keys.txt
If any check fails, block here and open the relevant Railiance workstream. Do not proceed until all checks pass.
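The node check in this gate can also be evaluated mechanically. The following is a minimal sketch, assuming the input is the output of kubectl get nodes -o json; the function names not_ready_nodes and gate_passes are hypothetical and not part of the Railiance tooling.

```python
import json


def not_ready_nodes(nodes_json: str) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    doc = json.loads(nodes_json)
    failing = []
    for node in doc.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            failing.append(node["metadata"]["name"])
    return failing


def gate_passes(nodes_json: str) -> bool:
    """The gate passes only if at least one node exists and all are Ready."""
    doc = json.loads(nodes_json)
    return bool(doc.get("items")) and not not_ready_nodes(nodes_json)
```

An empty node list is treated as a failure, so an unreachable or unprovisioned cluster does not pass the gate by accident.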
Review note (2026-06-05): Public smoke probes show
https://hub.coulomb.social/ returning 200 and the Gitea registry /v2/
endpoint returning the expected unauthenticated 401 challenge. Full R3 remains
blocked from this workspace because kubectl is not available here, and the
live app is not serving the current origin/main v2 bootstrap routes.
Recovery note (2026-06-14): Re-established the haskelseed ops-bridge path
and verified the runner substrate before deployment. make runner-status in
railiance-forge confirmed act_runner is registered to
https://gitea.coulomb.social, running under OpenRC, and has the expected
self-hosted labels and build/deploy tools. The K3s API path, Helm deploy path,
and Gitea registry host were exercised successfully by the production rollout.
R4 — Provision inter-hub database on railiance-platform
id: IHUB-WP-0018-T04
status: done
priority: high
state_hub_task_id: "c937cf36-3850-4ab3-aa83-2d846e1a378e"
On the PostgreSQL HA cluster, create the inter-hub database and user:
CREATE USER interhub WITH PASSWORD '<generated>';
CREATE DATABASE interhub OWNER interhub;
GRANT ALL PRIVILEGES ON DATABASE interhub TO interhub;
Run schema migration (IHP migrations) as part of the first deployment via an
init container or a manual migrate run inside the pod. Document the
migration procedure in deploy/railiance/RUNBOOK.md.
Recovery note (2026-06-14): Bootstrapped the production database manually on
the Railiance PostgreSQL cluster: role interhub, database interhub, schema
ownership, and privileges were created/updated. The running deployment now uses
that database through the inter-hub-env Kubernetes Secret.
R5 — SOPS-encrypted secrets
id: IHUB-WP-0018-T05
status: done
priority: high
state_hub_task_id: "926f82d1-15cd-425d-8a41-3d6b51c07f0b"
Create deploy/railiance/secrets/inter-hub.env.sops.yaml with:
apiVersion: v1
kind: Secret
metadata:
  name: inter-hub-env
  namespace: inter-hub
type: Opaque
stringData:
  DATABASE_URL: postgresql://interhub:<pass>@net-kingdom-pg-rw.databases.svc.cluster.local:5432/interhub?sslmode=disable
  IHP_SESSION_SECRET: <64-char-hex>
  IHP_BASEURL: https://hub.coulomb.social
  PORT: "8000"
  IHP_ENV: Production
Encrypt with the age key:
sops --encrypt \
--age age1aq8twfd78wvpra0had8cezcnj96tj4q0068edrz5jez8d6xwmflqdepsh4 \
/tmp/inter-hub-env.yaml > deploy/railiance/secrets/inter-hub.env.sops.yaml
Commit only the encrypted file. Apply it with
sops -d deploy/railiance/secrets/inter-hub.env.sops.yaml | kubectl apply -f -.
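The placeholder values in the manifest (<pass> and <64-char-hex>) have to be generated before encryption. A minimal sketch of one way to do that follows; the helper names new_session_secret and database_url are hypothetical, and percent-encoding the password is an added precaution that the original plan does not mention.

```python
import secrets
from urllib.parse import quote


def new_session_secret() -> str:
    """32 random bytes rendered as hex give a 64-character value."""
    return secrets.token_hex(32)


def database_url(
    password: str,
    host: str,
    db: str = "interhub",
    user: str = "interhub",
    port: int = 5432,
) -> str:
    """Build the DATABASE_URL in the shape used by the Secret manifest."""
    # Percent-encode the password so characters such as '@' or '/'
    # cannot be mistaken for URL delimiters.
    encoded = quote(password, safe="")
    return f"postgresql://{user}:{encoded}@{host}:{port}/{db}?sslmode=disable"
```

The generated values belong only in the temporary plaintext file that is passed to sops, never in a committed file.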
Recovery note (2026-06-14): Runtime secrets were bootstrapped manually in
Kubernetes so production could deploy safely. This task remains in progress
until the durable SOPS-encrypted source for DATABASE_URL, IHP_SESSION_SECRET,
and related runtime env is committed and wired into the deploy path.
Progress note (2026-06-14): Added repo root .sops.yaml, plaintext
guardrails under deploy/railiance/secrets/, an example Secret manifest, and
k8s-secret-json-to-sops-input.py to convert the live Kubernetes Secret into a
SOPS-ready manifest without printing values. At that point the encrypted source
file was still pending because local sops tooling was not available.
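The conversion step described in the progress note can be pictured roughly as follows. This is an illustrative sketch, not the contents of k8s-secret-json-to-sops-input.py; it assumes the input is the object returned by kubectl get secret -o json, and the function names are hypothetical.

```python
import base64


def secret_to_sops_input(live: dict) -> dict:
    """Turn a live Secret object into a minimal stringData manifest."""
    meta = live["metadata"]
    # Kubernetes stores Secret values base64-encoded under "data".
    string_data = {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in live.get("data", {}).items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        # Keep only stable identity fields; uid, resourceVersion and
        # similar server-side fields are dropped.
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "type": live.get("type", "Opaque"),
        "stringData": string_data,
    }


def summarize(manifest: dict) -> str:
    """Describe the manifest by key names only, so no value is printed."""
    keys = ", ".join(sorted(manifest["stringData"]))
    meta = manifest["metadata"]
    return f"{meta['namespace']}/{meta['name']}: {keys}"
```

Logging only summarize(...) keeps secret values out of terminal scrollback and CI logs while still showing which keys were captured.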
Completion note (2026-06-14): Created
deploy/railiance/secrets/inter-hub.env.sops.yaml from the live
inter-hub/inter-hub-env Kubernetes Secret using temporary sops v3.13.1 and
the shared Railiance age recipient. Verified the file is SOPS-encrypted, parses
as YAML, leaves only non-secret metadata reviewable, and does not contain the
checked plaintext runtime markers. Decryption/apply verification remains a
custody-backed operator capability because the private age identity is not
present in the normal workstation or haskelseed shell.
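A check of the kind described in the completion note can be done without the private age identity. The sketch below is an assumption about how such a check might look, not the check that was actually run; it relies on the SOPS file layout (a top-level sops metadata block and values wrapped in ENC[...]), and the marker list is supplied by the caller.

```python
def check_encrypted(text: str, plaintext_markers: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks encrypted."""
    problems = []
    lines = text.splitlines()
    # SOPS appends its metadata under a top-level "sops:" key.
    if not any(line.startswith("sops:") for line in lines):
        problems.append("missing top-level sops metadata block")
    # Encrypted values are stored as ENC[AES256_GCM,data:...,type:str].
    if "ENC[" not in text:
        problems.append("no ENC[...] values found")
    for marker in plaintext_markers:
        if marker in text:
            problems.append(f"plaintext marker present: {marker}")
    return problems
```

A marker such as postgresql:// is a reasonable candidate, since it would appear in the file only if DATABASE_URL had been committed in plaintext.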
R6 — Helm chart in railiance-apps
id: IHUB-WP-0018-T06
status: done
priority: high
state_hub_task_id: "4c4acc98-5773-4289-ad57-03f3fd5c381c"
Create charts/inter-hub/ in the railiance-apps repository following the
Railiance app.toml contract. Minimal chart:
charts/inter-hub/
  Chart.yaml          name: inter-hub, version: 0.1.0
  values.yaml         image.tag, ingress.host, resources
  templates/
    deployment.yaml   envFrom: secretRef inter-hub-env
    service.yaml      ClusterIP :8000
    ingress.yaml      Traefik annotations, TLS
helm/inter-hub-values.yaml   production non-secret overrides
app.toml in the inter-hub repo root for railiance CLI integration:
[app]
name = "inter-hub"
slug = "inter-hub"
kind = "native"
registry = "gitea.coulomb.social/coulomb/inter-hub"
[deploy]
chart = "railiance-apps/charts/inter-hub"
namespace = "inter-hub"
Implementation note (2026-06-05): A Helm chart exists in
deploy/helm/inter-hub/ with Deployment, Service, Ingress, and values for the
current Gitea registry and hub.coulomb.social. Remaining gaps: no repo-root
app.toml, no committed SOPS secret manifest, and no separate
railiance-apps/helm/inter-hub handoff in this repo.
Recovery note (2026-06-14): The local chart under deploy/helm/inter-hub/
successfully deployed the app to Railiance01. This task remains in progress
because the repo-root app.toml and railiance-apps handoff are still not
completed.
Completion note (2026-06-14): Added repo-root app.toml in inter-hub and
added charts/inter-hub, helm/inter-hub-values.yaml, Makefile targets, and
server-dry-run coverage in railiance-apps. The chart rendered successfully on
haskelseed with helm template.
R7 — Gitea Actions CI/CD pipeline
id: IHUB-WP-0018-T07
status: done
priority: medium
state_hub_task_id: "ec25c67c-3cb0-4534-9fb0-9bd6578a2def"
Create .gitea/workflows/deploy.yaml in the inter-hub repo:
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest # or self-hosted if available
    env:
      REGISTRY: ${{ secrets.REGISTRY }} # expanded by the runner shell before ssh runs
    steps:
      - uses: actions/checkout@v4
      - name: Build OCI image on haskelseed
        run: |
          ssh haskelseed "cd /root/inter-hub && git pull && \
            nix build .#docker && \
            docker load < result && \
            docker tag inter-hub:latest $REGISTRY/inter-hub:${{ github.sha }} && \
            docker push $REGISTRY/inter-hub:${{ github.sha }}"
      - name: Deploy to Railiance01
        run: |
          ssh coulombcore "helm upgrade --install inter-hub \
            railiance-apps/helm/inter-hub \
            --namespace inter-hub --create-namespace \
            --set image.tag=${{ github.sha }} \
            -f railiance-apps/helm/inter-hub/values.prod.yaml"
Secrets in Gitea: REGISTRY, SSH_KEY_HASKELSEED, SSH_KEY_COULOMBCORE.
Alternative if self-hosted runner is available on CoulombCore: run the deploy step directly without the SSH hop to coulombcore.
Implementation note (2026-06-05): .gitea/workflows/deploy.yaml exists and
builds .#docker on a self-hosted haskelseed runner, pushes to
92.205.130.254:32166/coulomb/inter-hub, deploys with Helm, and smoke-tests
the public endpoint. Remote main is already current, but production is still
serving an older API surface, so the workflow needs an attended rerun/inspection
or a new deployment trigger.
Runner substrate finding (2026-06-07): Pushed commits fa96fb8 and
7cc3173 to trigger the workflow, but public /api/v2/hubs remained 404
while / stayed 200, indicating the current image was not deployed. Repo
search shows railiance-forge owns Actions runner substrate, but its
2026-06-05 migration plan explicitly lists "No Actions runner deployment" as a
non-goal and no runner manifest/script/workplan exists there yet. haskelseed
itself is reachable on SSH and historical port 8080, but this workspace cannot
authenticate non-interactively. Treat R7 as blocked on a forge-owned runner
prerequisite rather than continuing to push commits as deployment probes.
Recovery note (2026-06-14): The runner prerequisite was restored through
the haskelseed ops-bridge path. The workflow now builds the Nix OCI image,
publishes to gitea.coulomb.social/coulomb/inter-hub using a registry bearer
token from the repo REGISTRY_TOKEN Actions secret, deploys with Helm, and
runs public smoke checks. Gitea Actions run 2913 completed successfully for
commit 5663fab.
Load-control note (2026-06-14): Added workflow paths-ignore for docs,
workplans, .custodian-brief.md, app.toml, .sops.yaml, and
deploy/railiance/** so State Hub consistency/doc-only commits do not consume a
haskelseed build/deploy cycle.
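The effect of the paths-ignore rule can be approximated as: a push triggers a build only if at least one changed file falls outside every ignore pattern. The sketch below illustrates that logic; the exact glob strings for docs and workplans are assumptions, and fnmatch only approximates the workflow engine's glob matching.

```python
from fnmatch import fnmatchcase

# Patterns modelled on the list in the note above; "docs/**" and
# "workplans/**" are assumed spellings, not copied from the workflow file.
IGNORED = [
    "docs/**",
    "workplans/**",
    ".custodian-brief.md",
    "app.toml",
    ".sops.yaml",
    "deploy/railiance/**",
]


def triggers_deploy(changed_files: list[str], ignored: list[str] = IGNORED) -> bool:
    """True when at least one changed file matches none of the ignore patterns."""
    return any(
        not any(fnmatchcase(path, pattern) for pattern in ignored)
        for path in changed_files
    )
```

Under this model a commit that touches only deploy/railiance/RUNBOOK.md is skipped, while one that also touches the Helm chart under deploy/helm/ still deploys.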
Bootstrap-gate deploy note (2026-06-14): Hardened the deployment workflow
smoke test so a production rollout only passes when /api/v2/hubs returns the
expected unauthenticated 401 and OpenAPI exposes /hubs,
/hub-capability-manifests, /api-consumers, and /policy-scopes. This
directly protects the ops-hub bootstrap gate instead of only checking the
landing page and generic widget auth gate.
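The pass/fail rule of the hardened smoke test can be stated as a small pure function. This is a sketch of the rule as described above, not the workflow's actual script; it assumes the OpenAPI document lists routes under the standard paths key, and the function name is hypothetical.

```python
REQUIRED_PATHS = (
    "/hubs",
    "/hub-capability-manifests",
    "/api-consumers",
    "/policy-scopes",
)


def bootstrap_gate_failures(hubs_status: int, openapi: dict) -> list[str]:
    """Return the reasons the bootstrap gate fails; empty means it passes."""
    failures = []
    # An unauthenticated request must be routed and rejected, not 404.
    if hubs_status != 401:
        failures.append(f"/api/v2/hubs returned {hubs_status}, expected 401")
    paths = openapi.get("paths", {})
    for required in REQUIRED_PATHS:
        if required not in paths:
            failures.append(f"OpenAPI is missing {required}")
    return failures
```

Fed the state observed on 2026-06-05 (404 on /api/v2/hubs and an OpenAPI document without the bootstrap paths), this rule reports five failures, which is the situation the landing-page check alone could not detect.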
Authenticated inspection note (2026-06-14): The stored local Tea token is
stale for https://gitea.coulomb.social, but runner-side inspection succeeded.
make runner-status in railiance-forge showed act_runner registered to
https://gitea.coulomb.social, started under OpenRC, and carrying the expected
self-hosted/haskelseed labels. The runner log shows task 19 for
coulomb/inter-hub starting at 2026-06-14T19:59:19+02:00, matching the
6455902 deploy trigger.
R8 — Staged deployment and smoke test
id: IHUB-WP-0018-T08
status: done
priority: high
state_hub_task_id: "2b02ae5c-47b9-4f09-88f0-a4af7900b38f"
Follow the Railiance staged promotion lifecycle:
- Local verify (done in R2 — container runs correctly)
- Deploy to Railiance01:
  railiance deploy inter-hub --tag <sha>
- Smoke test:
  curl -s https://hub.coulomb.social/ | grep "Inter-Hub" # Landing page
  curl -s https://hub.coulomb.social/capabilities # Capabilities
  curl -H "Authorization: Bearer <key>" \
    https://hub.coulomb.social/api/v2/hubs # API (200)
  curl https://hub.coulomb.social/api/v2/hubs # Unauthenticated (401)
- Verify restart persistence:
  kubectl rollout restart deployment/inter-hub -n inter-hub
  kubectl rollout status deployment/inter-hub -n inter-hub
  # Then re-run smoke test
Recovery note (2026-06-14): Production is deployed from image
gitea.coulomb.social/coulomb/inter-hub:5663fab; Kubernetes reports the
inter-hub deployment ready with one replica. Public smoke checks pass:
/ returns 200 and contains inter-hub, /api/v2/openapi.json returns 200,
and unauthenticated /api/v2/widgets returns 401.
DNS gate finding (2026-06-14): The deployment workflow did publish and
deploy gitea.coulomb.social/coulomb/inter-hub:6455902; Kubernetes reports the
inter-hub Deployment ready on the COULOMBCORE K3s node
92.205.130.254. An in-cluster probe to
http://inter-hub:8000/api/v2/hubs returned the expected unauthenticated
401, and forcing public TLS to 92.205.130.254 also returned 401. The
public DNS record for hub.coulomb.social, however, resolves to
92.205.62.239, where /api/v2/hubs still returns 404 and OpenAPI lacks the
bootstrap paths. The remaining production gate is therefore DNS cutover (or an
intentional kubeconfig rotation to the cluster behind 92.205.62.239), not a
runner, build, registry, Helm, or image-content issue.
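The DNS gate reduces to one comparison: does the public name resolve to the node that runs the current Deployment? A minimal sketch follows, with an injectable resolver so the comparison can be exercised without network access; the function names are hypothetical.

```python
import socket


def resolve_ipv4(host: str) -> set[str]:
    """Resolve a hostname to its set of IPv4 addresses."""
    infos = socket.getaddrinfo(
        host, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # For AF_INET each entry's sockaddr is an (address, port) tuple.
    return {info[4][0] for info in infos}


def dns_cutover_done(host: str, cluster_ip: str, resolver=resolve_ipv4) -> bool:
    """True only when the name resolves exclusively to the cluster address."""
    return resolver(host) == {cluster_ip}
```

With the values from the finding, dns_cutover_done("hub.coulomb.social", "92.205.130.254") would stay false for as long as the record points at 92.205.62.239.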
R9 — Document and register
id: IHUB-WP-0018-T09
status: done
priority: medium
state_hub_task_id: "4d1e55c7-8dbb-480f-b07b-6c5e39a04218"
- Write deploy/railiance/RUNBOOK.md: image build, migration procedure, secret rotation, rollback (railiance rollback inter-hub), log access (kubectl logs -n inter-hub -l app=inter-hub --tail=100)
- Add progress event to state hub
- Remove haskelseed socat/OpenRC production role note from quickstart - document it as the build machine only, not the production host
Implementation note (2026-06-05): deploy/railiance/RUNBOOK.md exists and
documents architecture, image build/push, Helm deployment, logs, restart,
rollback, secret rotation, and smoke checks. The deployment record remains
incomplete until current main is running and the ops-hub bootstrap smoke test
passes against production.
Recovery note (2026-06-14): Current main is running in production and the
deployment evidence has been recorded here. Remaining documentation work is to
capture the durable secret-management and railiance-apps handoff path once R5
and R6 are completed.
Completion note (2026-06-14): Updated deploy/railiance/RUNBOOK.md for the
current Gitea registry host, runner-based build/deploy path, SOPS secret handoff,
current smoke checks, and haskelseed's build-runner-only role. Updated
docs/new-hub-quickstart.md so haskelseed is no longer described as a
production/shared database runtime.
Exit Criteria
- https://hub.coulomb.social/ returns the Landing page (200, no auth)
- /api/v2/hubs returns 401 unauthenticated, 200 with valid API key
- All 12 IHF dashboards accessible after admin login
- kubectl rollout restart followed by smoke test passes (K3s restart persistence confirmed)
- Gitea Actions pipeline: push to main → image built → deployed → smoke test green within 15 minutes
- No dependency on haskelseed being up for the app to run (only for builds)
Open Questions / Pre-flight Checks
- K3s status: ThreePhoenix HA cluster workstream is active but not complete. Confirm whether Railiance01 is a single-node cluster already accepting workloads or still being provisioned. Gate R3 is the go/no-go check.
- Container registry: Is Gitea's built-in registry available on Railiance01, or is a separate registry service needed? If neither, add registry deployment to the scope.
- PostgreSQL HA status: railiance-platform baseline workstream is active. Confirm whether the HA cluster (repmgr + pgpool) is operational before R4.
- Static asset bundling: The Nix production binary may or may not include static/app.css (Tailwind output). Verify in R2 and adjust image build if needed.
- Anthropic API key: Phase 5 AI-assisted distillation requires IHP_ANTHROPIC_API_KEY. Add to SOPS secrets if the feature is to be active on Railiance01.