railiance-infra/docs/verification.md
tegwick b32dfd4f5a docs: add verification guide, close WP-0002
- docs/verification.md: explains spec/server-baseline.yaml, goss/baseline.yaml,
  make verify workflow, assertion mapping table, and how to add new checks
- docs/convergence.md: replace manual spot-check snippet with make verify reference
- workplans/RAIL-HO-WP-0002: mark completed (all tasks done, workstream closed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 19:37:10 +01:00

Server Verification

RailianceHosts ships a declarative baseline spec and a Goss test suite that asserts every managed node matches it. This replaces manual spot-checks with a reproducible, CI-friendly pass/fail verdict.

The spec

spec/server-baseline.yaml is the single source of truth for the target state of every managed node. It covers:

  • Firewall — UFW active, default deny inbound, required ports allowed (SSH 22/tcp, k3s API 6443/tcp, Flannel VXLAN 8472/udp)
  • SSH daemon — root login disabled, password auth disabled, pubkey auth enabled
  • Services — ufw, fail2ban, ssh.socket enabled and running
  • Packages — ufw, fail2ban, git, curl, vim, htop (age and sops installed as binaries)
  • Users — admin user with bash shell and passwordless sudo
  • Security — fail2ban sshd jail active, HISTCONTROL=ignorespace in /etc/profile.d/

When you change the desired state of a node, update this file first. Then update the Ansible role and the Goss tests to match.
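As a rough illustration of what such a spec could look like, here is a minimal sketch. The top-level section names (firewall, ssh, services, packages, users, security) follow the mapping table in this guide; every nested key name and the overall layout are assumptions, not a copy of the real spec/server-baseline.yaml.

```yaml
# Hypothetical sketch of spec/server-baseline.yaml.
# Section names come from this guide; nested keys are illustrative assumptions.
firewall:
  ufw:
    enabled: true
    default_incoming: deny
    allow:
      - { port: 22, proto: tcp }    # SSH
      - { port: 6443, proto: tcp }  # k3s API
      - { port: 8472, proto: udp }  # Flannel VXLAN
ssh:
  permit_root_login: false
  password_authentication: false
  pubkey_authentication: true
services:
  - ufw
  - fail2ban
  - ssh.socket
packages:
  apt: [ufw, fail2ban, git, curl, vim, htop]
  binaries: [age, sops]             # installed as binaries, not packages
users:
  - name: admin                     # assumed account name for the admin user
    shell: /bin/bash
    sudo: nopasswd
security:
  fail2ban_jails: [sshd]
  histcontrol: ignorespace
```

Keeping the spec purely declarative like this means the Ansible role and the Goss tests can both be reviewed against the same short file.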

Running verification

make verify

This runs ansible/playbooks/verify.yaml against all hosts. The playbook:

  1. Downloads the Goss binary (pinned version) to /usr/local/bin/goss
  2. Copies goss/baseline.yaml to /etc/goss/baseline.yaml on each host
  3. Runs goss validate --format tap
  4. Fails the play (non-zero exit) if any assertion fails
  5. Fetches the TAP report to reports/goss-<host>-<timestamp>.tap
  6. Auto-commits the report to git

All assertions passed → exit 0
One or more assertions FAILED → exit non-zero, TAP report in reports/
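The steps above could be expressed roughly as the following playbook. This is a sketch under stated assumptions, not the contents of ansible/playbooks/verify.yaml: the variable names, the placeholder version, the download URL pattern, and the choice to write the report from the registered output are all illustrative. It defers the failure until after the report is saved, which is one way to keep a TAP report for failing hosts; the auto-commit step is omitted.

```yaml
# Hypothetical sketch of a verify playbook; names and paths are assumptions.
- name: Verify hosts against the baseline
  hosts: all
  become: true
  vars:
    goss_version: "vX.Y.Z"  # placeholder; the real pin is not given in this guide
    goss_url: "https://github.com/goss-org/goss/releases/download/{{ goss_version }}/goss-linux-amd64"
    repo_root: "{{ playbook_dir }}/../.."
  tasks:
    - name: Download the pinned Goss binary
      ansible.builtin.get_url:
        url: "{{ goss_url }}"
        dest: /usr/local/bin/goss
        mode: "0755"

    - name: Ensure /etc/goss exists
      ansible.builtin.file:
        path: /etc/goss
        state: directory
        mode: "0755"

    - name: Copy the Goss test file to the host
      ansible.builtin.copy:
        src: "{{ repo_root }}/goss/baseline.yaml"
        dest: /etc/goss/baseline.yaml
        mode: "0644"

    - name: Run goss validate with TAP output
      ansible.builtin.command:
        cmd: goss --gossfile /etc/goss/baseline.yaml validate --format tap
      register: goss_result
      changed_when: false
      failed_when: false  # defer the failure so the report is still saved

    - name: Save the TAP report on the controller (reports/ must exist)
      ansible.builtin.copy:
        content: "{{ goss_result.stdout }}\n"
        dest: "{{ repo_root }}/reports/goss-{{ inventory_hostname }}-{{ ansible_date_time.iso8601_basic_short }}.tap"
        mode: "0644"
      delegate_to: localhost
      become: false

    - name: Fail the play if any assertion failed
      ansible.builtin.fail:
        msg: "Goss reported failed assertions on {{ inventory_hostname }}"
      when: goss_result.rc != 0
```

Registering the result with failed_when: false and failing in a final task keeps the exit code semantics (non-zero on any failed assertion) while still producing a report for every host.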

After convergence

The standard workflow after converging a new or updated node:

make converge   # bring the node to the desired state
make verify     # assert it got there

Run make status for a quick human-readable summary; run make verify when you need a structured, automatable check.

Goss test file

goss/baseline.yaml contains at least one Goss assertion per spec item (some items, such as users, need more than one resource). The mapping is:

| Spec section | Goss resource |
|---|---|
| firewall | command: ufw status (stdout patterns) |
| ssh | file: /etc/ssh/sshd_config.d/10-hardening.conf (contains) |
| services | service: blocks |
| packages | package: blocks |
| users | user: block + command: grep NOPASSWD |
| security.histcontrol | command: grep -r HISTCONTROL /etc/profile.d/ |
| security.fail2ban_jails | command: fail2ban-client status sshd |
| age, sops (binary installs) | command: test -x /usr/local/bin/{age,sops} |
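To make the mapping concrete, here is an illustrative excerpt of how a few of these rows might look as Goss resources. It is a sketch, not the real goss/baseline.yaml: the matched strings, the `ufw-status` resource ID, and the `admin` user name are assumptions. The `contains` attribute follows the wording of the mapping; depending on the pinned Goss version, the attribute may be named `contents` instead.

```yaml
# Illustrative excerpt only; patterns and IDs are assumptions.
command:
  ufw-status:
    exec: "ufw status"
    exit-status: 0
    stdout:
      - "Status: active"
      - "22/tcp"
      - "6443/tcp"
      - "8472/udp"
  fail2ban-client status sshd:
    exit-status: 0
  test -x /usr/local/bin/age:
    exit-status: 0
  test -x /usr/local/bin/sops:
    exit-status: 0
file:
  /etc/ssh/sshd_config.d/10-hardening.conf:
    exists: true
    contains:                 # may be `contents` in newer Goss versions
      - "PermitRootLogin no"
      - "PasswordAuthentication no"
      - "PubkeyAuthentication yes"
service:
  fail2ban:
    enabled: true
    running: true
package:
  fail2ban:
    installed: true
user:
  admin:                      # assumed account name
    exists: true
    shell: /bin/bash
```

Checks that Goss has no native resource for (UFW rules, fail2ban jails, standalone binaries) fall back to `command` resources that assert on exit status or stdout.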

Adding new assertions

  1. Add the desired state to spec/server-baseline.yaml
  2. Add the Ansible task to ansible/roles/base/tasks/main.yml
  3. Add the Goss assertion to goss/baseline.yaml
  4. Run make converge && make verify to confirm
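As a worked example of the first three steps, suppose a new package needs to be part of the baseline. The package name `unattended-upgrades` and the spec layout below are hypothetical, chosen only for illustration; the three YAML documents stand for fragments of the three files, not their full contents.

```yaml
# 1. spec/server-baseline.yaml (fragment; layout is an assumption)
packages:
  apt:
    - unattended-upgrades   # hypothetical new requirement
---
# 2. ansible/roles/base/tasks/main.yml (fragment)
- name: Install unattended-upgrades
  ansible.builtin.apt:
    name: unattended-upgrades
    state: present
---
# 3. goss/baseline.yaml (fragment)
package:
  unattended-upgrades:
    installed: true
```

Changing all three files in the same commit keeps the spec, the role, and the tests from drifting apart.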

Reports

TAP reports are committed to reports/ after each make verify run. They are machine-readable and suitable for CI pipelines. A cleanup policy for old reports is tracked as extension point EP 78ef4879.