From b32dfd4f5a2c98e62bbdfcf1ea9d4662c1834d2d Mon Sep 17 00:00:00 2001
From: tegwick
Date: Mon, 9 Mar 2026 19:37:10 +0100
Subject: [PATCH] docs: add verification guide, close WP-0002

- docs/verification.md: explains spec/server-baseline.yaml, goss/baseline.yaml, make verify workflow, assertion mapping table, and how to add new checks
- docs/convergence.md: replace manual spot-check snippet with make verify reference
- workplans/RAIL-HO-WP-0002: mark completed (all tasks done, workstream closed)

Co-Authored-By: Claude Sonnet 4.6
---
 docs/convergence.md                           | 17 ++--
 docs/verification.md                          | 79 +++++++++++++++++++
 ...L-HO-WP-0002-server-spec-and-test-suite.md |  3 +-
 3 files changed, 90 insertions(+), 9 deletions(-)
 create mode 100644 docs/verification.md

diff --git a/docs/convergence.md b/docs/convergence.md
index e57df5c..2883084 100644
--- a/docs/convergence.md
+++ b/docs/convergence.md
@@ -26,19 +26,20 @@ This will:
 
 ## Verifying
 
-Once convergence completes, you can test:
+After convergence, run the automated test suite to assert the node matches the
+baseline spec:
 
 ```bash
-ssh admin@
+make verify
+```
 
-# Check sudo access without password
-sudo -n true && echo "✔ sudo OK"
+This runs Goss assertions against all hosts and exits non-zero on failure.
+TAP reports are written to `reports/`. See `docs/verification.md` for details.
 
-# Firewall status
-sudo ufw status
+For a quick human-readable summary without assertions:
 
-# Installed tools
-htop --version
+```bash
+make status
 ```
 
 ## Notes
diff --git a/docs/verification.md b/docs/verification.md
new file mode 100644
index 0000000..7280fe6
--- /dev/null
+++ b/docs/verification.md
@@ -0,0 +1,79 @@
+# Server Verification
+
+RailianceHosts ships a declarative baseline spec and a Goss test suite that
+asserts every managed node matches it. This replaces manual spot-checks with
+a reproducible, CI-friendly pass/fail verdict.
+
+## The spec
+
+`spec/server-baseline.yaml` is the single source of truth for the target state
+of every managed node. It covers:
+
+- **Firewall** — UFW active, default deny inbound, required ports allowed
+  (SSH 22/tcp, k3s API 6443/tcp, Flannel VXLAN 8472/udp)
+- **SSH daemon** — root login disabled, password auth disabled, pubkey auth enabled
+- **Services** — ufw, fail2ban, ssh.socket enabled and running
+- **Packages** — ufw, fail2ban, git, curl, vim, htop (age and sops installed as binaries)
+- **Users** — admin user with bash shell and passwordless sudo
+- **Security** — fail2ban sshd jail active, HISTCONTROL=ignorespace in /etc/profile.d/
+
+When you change the desired state of a node, update this file first. Then
+update the Ansible role **and** the Goss tests to match.
+
+## Running verification
+
+```bash
+make verify
+```
+
+This runs `ansible/playbooks/verify.yaml` against all hosts. The playbook:
+
+1. Downloads the Goss binary (pinned version) to `/usr/local/bin/goss`
+2. Copies `goss/baseline.yaml` to `/etc/goss/baseline.yaml` on each host
+3. Runs `goss validate --format tap`
+4. Fails the play (non-zero exit) if any assertion fails
+5. Fetches the TAP report to `reports/goss--.tap`
+6. Auto-commits the report to git
+
+**All assertions passed** → exit 0
+**One or more assertions FAILED** → exit non-zero, TAP report in `reports/`
+
+## After convergence
+
+The standard workflow after converging a new or updated node:
+
+```bash
+make converge   # bring the node to the desired state
+make verify     # assert it got there
+```
+
+Run `make status` for a quick human-readable summary; run `make verify` when
+you need a structured, automatable check.
+
+## Goss test file
+
+`goss/baseline.yaml` contains one Goss assertion per spec item. The mapping is:
+
+| spec section | Goss resource |
+|---|---|
+| `firewall` | `command: ufw status` stdout patterns |
+| `ssh` | `file: /etc/ssh/sshd_config.d/10-hardening.conf` contains |
+| `services` | `service:` blocks |
+| `packages` | `package:` blocks |
+| `users` | `user:` block + `command: grep NOPASSWD` |
+| `security.histcontrol` | `command: grep -r HISTCONTROL /etc/profile.d/` |
+| `security.fail2ban_jails` | `command: fail2ban-client status sshd` |
+| `age`, `sops` (binary installs) | `command: test -x /usr/local/bin/{age,sops}` |
+
+## Adding new assertions
+
+1. Add the desired state to `spec/server-baseline.yaml`
+2. Add the Ansible task to `ansible/roles/base/tasks/main.yml`
+3. Add the Goss assertion to `goss/baseline.yaml`
+4. Run `make converge && make verify` to confirm
+
+## Reports
+
+TAP reports are committed to `reports/` after each `make verify` run.
+They are machine-readable and suitable for CI pipelines. A cleanup policy
+for old reports is tracked as extension point EP `78ef4879`.
diff --git a/workplans/RAIL-HO-WP-0002-server-spec-and-test-suite.md b/workplans/RAIL-HO-WP-0002-server-spec-and-test-suite.md
index aa66447..30f2faa 100644
--- a/workplans/RAIL-HO-WP-0002-server-spec-and-test-suite.md
+++ b/workplans/RAIL-HO-WP-0002-server-spec-and-test-suite.md
@@ -4,12 +4,13 @@ type: workplan
 title: "Server Specification and Automated Test Suite"
 domain: railiance
 repo: railiance-hosts
-status: active
+status: completed
 owner: railiance
 topic_slug: railiance
 state_hub_workstream_id: "8fed53c2-4c39-4471-8bb9-61f58771fe0c"
 created: "2026-03-09"
 updated: "2026-03-09"
+completed: "2026-03-09"
 ---
 
 # Server Specification and Automated Test Suite