refine(CUST-WP-0032): incorporate all four architecture decisions

- Packer uses NAT during build; setup-vm.sh does post-import bridged config
- Bake GHC 9.8.4 (primary) + 9.6.6 (LTS coverage); drop Stack + HLS
- state-hub always via forward tunnel port 18000 (CoulombCore pattern)
- autossh opens -R (reverse SSH) + -L 18000 (state-hub forward) together
- Decisions section replaces Open Questions; all four resolved 2026-04-20

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -9,6 +9,7 @@ owner: custodian
 topic_slug: railiance
 created: "2026-04-20"
 updated: "2026-04-20"
+decisions_resolved: "2026-04-20"
 state_hub_workstream_id: "f2bfac74-de8a-4c86-9aa9-4d95a92336e2"
 ---
 
@@ -30,11 +31,12 @@ laptop running VirtualBox — without changing the developer workflow.
 ```
 Laptop (VirtualBox host)
   └── haskell-build VM (Ubuntu 24.04, bridged network)
-        ├── GHC 9.8.x + Cabal + Stack via GHCup
+        ├── GHC 9.8.4 (default) + GHC 9.6.6 + Cabal via GHCup (no Stack, no HLS)
         ├── build-agent (systemd): registers with state-hub on boot
         │     └── POST /capability-catalog/ { capability_type: "haskell-build-agent" }
-        └── autossh: reverse tunnel → workstation port 12222
-              └── ssh -R 12222:localhost:22 worsch@<workstation-ip>
+        └── autossh: two tunnels in one SSH connection
+              ├── -R 12222:localhost:22 (reverse: workstation → VM SSH)
+              └── -L 18000:localhost:8000 (forward: VM → state-hub, port 18000)
 
 Workstation (WSL2)
   ├── state-hub: sees haskell-build-* capability entries, knows tunnel port
@@ -55,14 +57,18 @@ Workstation (WSL2)
 
 | Path | Purpose |
 |------|---------|
-| `infra/build-machines/haskell/haskell-build.pkr.hcl` | Packer build definition |
-| `infra/build-machines/haskell/scripts/install-haskell.sh` | GHCup + toolchain install |
+| `infra/build-machines/haskell/haskell-build.pkr.hcl` | Packer build definition (NAT during build) |
+| `infra/build-machines/haskell/scripts/install-haskell.sh` | GHCup + GHC 9.8.4 + 9.6.6 install |
 | `infra/build-machines/haskell/scripts/install-agent.sh` | Agent + systemd install |
-| `infra/build-machines/haskell/files/build-agent.py` | Boot registration + tunnel agent |
+| `infra/build-machines/haskell/scripts/setup-vm.sh` | Post-import VBoxManage network config |
+| `infra/build-machines/haskell/scripts/inject-keys.sh` | SSH key + env injection for new VMs |
+| `infra/build-machines/haskell/files/build-agent.py` | Boot registration + dual tunnel agent |
 | `infra/build-machines/haskell/files/build-agent.service` | systemd unit |
 | `infra/build-machines/haskell/files/build-agent.env.template` | Env var template |
 | `infra/build-machines/haskell/files/cloud-init/user-data` | Ubuntu autoinstall config |
 | `infra/build-machines/haskell/files/cloud-init/meta-data` | cloud-init meta-data |
+| `infra/build-machines/port-registry.yml` | Port assignment tracker (12221-12230) |
+| `infra/build-machines/state-hub-refs.yml` | State-hub entity UUID references |
 | `infra/build-machines/README.md` | Deployment & usage guide |
 
 ---
@@ -83,7 +89,8 @@ Create `infra/build-machines/haskell/haskell-build.pkr.hcl`.
 The Packer `virtualbox-iso` source must:
 - Base: Ubuntu 24.04 LTS server ISO (amd64)
 - Disk: 40 GB, Memory: 8192 MB, CPUs: 4
-- Network: bridged (host adapter selected at runtime, not baked in)
+- **Network during build: NAT** — Packer needs internet for ISO + packages; bridged
+  is set post-import by `setup-vm.sh` (adapter names are laptop-specific)
 - Unattended install via cloud-init autoinstall (not preseed)
 - SSH communicator: `build` user, key-based after provisioning
 - Provisioners: `install-haskell.sh`, then `install-agent.sh`
@@ -94,9 +101,27 @@ Key Packer variables to expose:
 - `var.disk_size` (default: `40960`)
 - `var.memory` (default: `8192`)
 - `var.cpus` (default: `4`)
-- `var.ghc_version` (default: `9.8.4`)
+- `var.ghc_primary_version` (default: `9.8.4`)
+- `var.ghc_secondary_version` (default: `9.6.6`)
 - `var.cabal_version` (default: `3.12.1.0`)
 
+Also create `scripts/setup-vm.sh` (run once after OVA import):
+```bash
+#!/bin/bash
+# setup-vm.sh — switches imported VM from NAT to bridged networking
+VM_NAME="${1:?Usage: setup-vm.sh <vm-name> [adapter]}"
+# Auto-detect first available bridge interface if not specified (keeps names containing spaces intact)
+ADAPTER="${2:-$(VBoxManage list bridgedifs | sed -n 's/^Name:[[:space:]]*//p' | head -n 1)}"
+
+VBoxManage modifyvm "$VM_NAME" \
+  --nic1 bridged \
+  --bridgeadapter1 "$ADAPTER" \
+  --memory 8192 --cpus 4
+
+echo "Configured $VM_NAME: bridged on $ADAPTER"
+echo "Next: inject keys with scripts/inject-keys.sh, then start VM"
+```
+
 ### Task: Ubuntu autoinstall (cloud-init) config
 
 ```task
@@ -144,11 +169,17 @@ apt-get install -y -qq build-essential curl git \
     libgmp-dev libffi-dev zlib1g-dev libncurses-dev libtinfo-dev pkg-config
 
 # GHCup — non-interactive bootstrap
+# Primary version (9.8.4) is the default; secondary (9.6.6) covers LTS 22/23.
+# Skip Stack (cabal covers 95% of projects) and HLS (saves ~2 GB image size).
+GHC_PRIMARY="${GHC_PRIMARY_VERSION:-9.8.4}"
+GHC_SECONDARY="${GHC_SECONDARY_VERSION:-9.6.6}"
+CABAL_VERSION="${CABAL_VERSION:-3.12.1.0}"
+
 export BOOTSTRAP_HASKELL_NONINTERACTIVE=1
-export BOOTSTRAP_HASKELL_GHC_VERSION=${GHC_VERSION:-9.8.4}
-export BOOTSTRAP_HASKELL_CABAL_VERSION=${CABAL_VERSION:-3.12.1.0}
-export BOOTSTRAP_HASKELL_INSTALL_STACK=1
-export BOOTSTRAP_HASKELL_INSTALL_HLS=0 # skip HLS — large, not needed for CI
+export BOOTSTRAP_HASKELL_GHC_VERSION="$GHC_PRIMARY"
+export BOOTSTRAP_HASKELL_CABAL_VERSION="$CABAL_VERSION"
+export BOOTSTRAP_HASKELL_INSTALL_STACK=0 # not needed; cabal suffices
+export BOOTSTRAP_HASKELL_INSTALL_HLS=0 # ~2 GB — skip for build-only image
 
 curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org \
     | runuser -l build -c 'sh -s -- --no-modify-path'
@@ -157,11 +188,18 @@ curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org \
 echo '. "$HOME/.ghcup/env"' >> /home/build/.bashrc
 echo '. "$HOME/.ghcup/env"' >> /home/build/.profile
 
+# Install secondary GHC version (~500 MB, shared GHCup base — worth it)
+runuser -l build -c "source ~/.ghcup/env && ghcup install ghc $GHC_SECONDARY"
+
+# Ensure primary is the default
+runuser -l build -c "source ~/.ghcup/env && ghcup set ghc $GHC_PRIMARY"
+
 # Pre-warm cabal package db (saves 2-3 min on first real build)
 runuser -l build -c 'source ~/.ghcup/env && cabal update'
 
-# Verify
-runuser -l build -c 'source ~/.ghcup/env && ghc --version && cabal --version'
+# Verify both versions present
+runuser -l build -c "source ~/.ghcup/env && ghc --version && cabal --version"
+runuser -l build -c "source ~/.ghcup/env && ghcup run --ghc $GHC_SECONDARY -- ghc --version"
 ```
 
 ### Task: Agent installation script
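The hunk above verifies both compilers once, at image build time. The agent could repeat a cheap version of that check at boot before it advertises the capability. A minimal sketch, assuming GHCup's default non-XDG layout of one directory per compiler under `~/.ghcup/ghc/<version>`; the function name `missing_ghc_versions` is hypothetical and not part of `build-agent.py`:

```python
from pathlib import Path


def missing_ghc_versions(versions, ghcup_root=None):
    """Return the requested GHC versions that have no install directory.

    Assumes GHCup's default layout: <ghcup_root>/ghc/<version>/.
    """
    root = Path(ghcup_root) if ghcup_root else Path.home() / ".ghcup"
    return [v for v in versions if not (root / "ghc" / v).is_dir()]


if __name__ == "__main__":
    # Versions taken from the diff: 9.8.4 (primary) and 9.6.6 (secondary).
    missing = missing_ghc_versions(["9.8.4", "9.6.6"])
    print("missing:", missing or "none")
```

A directory check only shows that an install was attempted. The `ghcup run --ghc <ver> -- ghc --version` call in the script remains the stronger test.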
@@ -264,7 +302,10 @@ def get_local_ip():
     return "unknown"
 
 def register(cfg):
-    state_hub = cfg.get("STATE_HUB_URL", "http://192.168.1.100:8000")
+    # State-hub is always accessed via the forward tunnel (port 18000), never
+    # via direct LAN. This matches the CoulombCore remote worker pattern and
+    # works regardless of network topology (LAN, VPN, different subnet).
+    state_hub = cfg.get("STATE_HUB_URL", "http://127.0.0.1:18000")
     hostname = socket.gethostname()
     domain = cfg.get("STATE_HUB_DOMAIN", "railiance")
     remote_port = cfg.get("REMOTE_PORT", "12222")
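The comment in this hunk states a rule (state-hub is reached only through the forward tunnel) that the code enforces only through its default value. A stale env file with a LAN address would still be accepted. A minimal sketch of a stricter lookup, assuming `cfg` is a plain dict of strings as the `cfg.get(...)` calls suggest; `resolve_state_hub_url` and `DEFAULT_STATE_HUB_URL` are hypothetical names:

```python
import ipaddress
from urllib.parse import urlsplit

# Forward-tunnel endpoint used as the default in the diff.
DEFAULT_STATE_HUB_URL = "http://127.0.0.1:18000"


def resolve_state_hub_url(cfg):
    """Return STATE_HUB_URL, rejecting any host that is not a loopback address."""
    url = cfg.get("STATE_HUB_URL", DEFAULT_STATE_HUB_URL)
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"STATE_HUB_URL has no host part: {url!r}")
    if host != "localhost":
        try:
            is_loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Not an IP literal (for example a DNS name): treat as non-local.
            is_loopback = False
        if not is_loopback:
            raise ValueError(
                f"STATE_HUB_URL must point at the local tunnel end, got {url!r}"
            )
    return url
```

Whether a hard failure or a logged warning is the better reaction depends on how the agent is expected to behave during migration from the old LAN default; the sketch picks the strict option.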
@@ -327,12 +368,16 @@ def open_tunnel(cfg):
         "-o", "StrictHostKeyChecking=no",
         "-o", "UserKnownHostsFile=/dev/null",
         "-N",
-        "-R", f"{remote_port}:localhost:22",
+        "-R", f"{remote_port}:localhost:22", # reverse: workstation → VM SSH
+        "-L", "18000:localhost:8000", # forward: VM → state-hub (port 18000)
         "-i", ssh_key,
         f"{relay_user}@{relay_host}",
     ]
-    print(f"[build-agent] Opening tunnel: {relay_host}:{remote_port} -> local:22",
-          flush=True)
+    print(
+        f"[build-agent] Opening tunnels: "
+        f"-R {remote_port}→local:22, -L 18000→state-hub:8000",
+        flush=True,
+    )
     subprocess.run(cmd) # autossh manages reconnects internally
 
 def main():
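Because both forwards now share one SSH connection, the argument list is easier to review and test when it is built by a pure function. A minimal sketch under stated assumptions: the first entries of `cmd` are not visible in this hunk, so `autossh -M 0`, the `ServerAlive*` options and `ExitOnForwardFailure=yes` are illustrative choices and not the author's configuration. `build_tunnel_cmd` is a hypothetical name:

```python
def build_tunnel_cmd(relay_host, relay_user, ssh_key, remote_port,
                     forward_port=18000, state_hub_port=8000):
    """Build an autossh argv carrying one reverse and one forward tunnel."""
    port = int(remote_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"remote_port out of range: {remote_port!r}")
    return [
        "autossh", "-M", "0",                 # assumption: rely on SSH keepalives
        "-o", "ServerAliveInterval=30",       # assumption: keepalive settings
        "-o", "ServerAliveCountMax=3",
        "-o", "ExitOnForwardFailure=yes",     # assumption: fail if either bind fails
        "-N",
        "-R", f"{port}:localhost:22",                        # workstation -> VM SSH
        "-L", f"{forward_port}:localhost:{state_hub_port}",  # VM -> state-hub
        "-i", ssh_key,
        f"{relay_user}@{relay_host}",
    ]


if __name__ == "__main__":
    print(" ".join(build_tunnel_cmd(
        "192.168.1.100", "worsch", "/home/build/.ssh/id_build", "12222")))
```

With `ExitOnForwardFailure=yes`, a port clash on the workstation ends the SSH session instead of leaving a connection with only one of the two tunnels. Whether autossh then retries depends on its gate-time settings.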
@@ -393,24 +438,26 @@ WantedBy=multi-user.target
 Create `files/build-agent.env.template`:
 
 ```bash
-# Custodian State Hub URL (reachable from VM on LAN)
-# On workstation: http://<workstation-lan-ip>:8000
-# If using ops-bridge reverse tunnel from VM: http://127.0.0.1:18000
-STATE_HUB_URL=http://192.168.1.100:8000
+# Custodian State Hub URL — always access via forward tunnel (port 18000).
+# The agent opens -L 18000:localhost:8000 alongside the reverse SSH tunnel,
+# so this works regardless of network topology (LAN, VPN, different subnet).
+# Matches the CoulombCore remote worker bridge pattern.
+STATE_HUB_URL=http://127.0.0.1:18000
 
 # Domain to register capability under
 STATE_HUB_DOMAIN=railiance
 
-# Workstation hostname or LAN IP for reverse SSH tunnel
-SSH_RELAY_HOST=192.168.1.100
+# Workstation hostname or LAN IP for SSH relay connection (replace with the actual IP).
+# The VM connects OUT to this host to establish both tunnels.
+SSH_RELAY_HOST=192.168.1.100
 SSH_RELAY_USER=worsch
 
 # Path to private key for SSH tunnel (matching authorized_keys on workstation)
 SSH_KEY_PATH=/home/build/.ssh/id_build
 
 # Port to bind on workstation (ssh -R <REMOTE_PORT>:localhost:22)
-# Each VM instance should use a distinct port to avoid conflicts
-# 12221 = first instance, 12222 = second, etc.
+# Each VM instance must use a distinct port — see port-registry.yml
+# Range: 12221-12230
 REMOTE_PORT=12222
 ```
 
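How the agent loads this file is not shown in the diff. If it is read via systemd `EnvironmentFile=` or a plain `KEY=VALUE` split, a trailing `# ...` note on a value line would, as far as I know, end up inside the value, since only lines that start with `#` are treated as comments. A minimal sketch of a tolerant loader that also checks `REMOTE_PORT` against the 12221-12230 range named in the template; `parse_env` and `check_remote_port` are hypothetical names, and quoted values are deliberately not handled:

```python
import re

# 12221-12230 inclusive, per the template comment and port-registry.yml.
PORT_RANGE = range(12221, 12231)


def parse_env(text):
    """Parse KEY=VALUE lines; skip blanks and comments, drop trailing '# ...' notes."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Whitespace followed by '#' starts a trailing comment.
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        cfg[key.strip()] = value
    return cfg


def check_remote_port(cfg):
    """Return REMOTE_PORT as an int, or raise if it is outside the registry range."""
    port = int(cfg.get("REMOTE_PORT", "12222"))  # default taken from the diff
    if port not in PORT_RANGE:
        raise ValueError(
            f"REMOTE_PORT {port} outside {PORT_RANGE.start}-{PORT_RANGE.stop - 1}"
        )
    return port
```

The range check catches typos but not collisions between VMs; uniqueness still has to come from `port-registry.yml`.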
@@ -718,22 +765,25 @@ Create `infra/build-machines/README.md` covering:
 None — this workplan is self-contained. Packer and VirtualBox are workstation-level
 tools that need to be installed once.
 
-## Open Questions / Decisions Needed
+## Decisions (Resolved 2026-04-20)
 
-1. **Host network adapter name**: Bridged mode requires specifying the adapter
-   (e.g. `eth0`, `enp3s0`). Should this be a Packer variable or set post-import?
-   → Recommend: set post-import via VBoxManage / GUI (avoids hardcoding laptop-specific adapter).
+1. **Host network adapter name** → **Post-import via `setup-vm.sh`** (VBoxManage
+   auto-detects first available bridge interface). Packer builds with NAT for internet
+   access; `setup-vm.sh` switches to bridged after OVA import. Avoids hardcoding any
+   laptop-specific adapter name in the image.
 
-2. **GHC version pinning**: Should each OVA bake a specific GHC version, or use
-   `ghcup` to install on first boot? Baking is faster; first-boot install is flexible.
-   → Recommend: bake one version (9.8.4), expose `var.ghc_version` for rebuilds.
+2. **GHC version pinning** → **Bake two versions: 9.8.4 (primary) + 9.6.6 (secondary)**.
+   Both installed at image build time via GHCup. 9.8.4 is the default; 9.6.6 covers
+   Stackage LTS 22/23 projects at a cost of ~500 MB extra image size. Additional
+   versions can be added post-deployment with `ghcup install ghc <ver>` — no rebuild.
 
-3. **Multiple GHC versions**: Projects may need different GHC versions. Should the
-   VM pre-install multiple versions via `ghcup install ghc <ver>`?
-   → Defer to v2 — for now, one version per image, rebuild for major version bumps.
+3. **Stack vs Cabal** → **Cabal only, no Stack, no HLS**. Cabal covers 95%+ of
+   Haskell projects. Stack adds ~400 MB and HLS adds ~2 GB to the image with no
+   benefit for a CI/build sandbox. Both can be added post-deployment if a project
+   specifically requires them.
 
-4. **state-hub reachability from VM**: Does the laptop's LAN have direct access to
-   the workstation on port 8000, or does the VM need to tunnel state-hub access
-   through the SSH relay? If the latter, the agent should use `http://127.0.0.1:18000`
-   and add a forward tunnel alongside the reverse tunnel.
-   → Decision needed before implementing T05.
+4. **state-hub reachability from VM** → **Always via forward tunnel (port 18000)**.
+   The agent opens `-L 18000:localhost:8000` alongside `-R <port>:localhost:22` in a
+   single autossh connection. `STATE_HUB_URL` defaults to `http://127.0.0.1:18000`.
+   This matches the CoulombCore remote worker bridge pattern and works on any network
+   topology without exposing state-hub on a non-loopback interface.