Update all operational references to reflect the new repo name per ADR-003 (OAS S1 Infrastructure Substrate). Historical text in ADRs and state-hub-inbox files preserved as-is. Gitea remote URL updated locally (Gitea repo rename is a manual step). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| id | type | title | domain | repo | status | owner | topic_slug | state_hub_workstream_id | created | updated | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RAIL-HO-WP-0002 | workplan | Server Specification and Automated Test Suite | railiance | railiance-infra | completed | railiance | railiance | 8fed53c2-4c39-4471-8bb9-61f58771fe0c | 2026-03-09 | 2026-03-09 | 2026-03-09 |
Server Specification and Automated Test Suite
Motivation
make status produces raw shell output that requires manual interpretation.
There is no machine-readable specification of what a converged Railiance node
should look like, and therefore no way to assert automatically whether a server
is in the correct state.
This workplan closes that gap by introducing:
- A declarative server specification (spec/server-baseline.yaml): the single source of truth for the target state of every managed node.
- A Goss test suite derived from that spec: YAML assertions that map one-to-one to spec items and produce a structured pass/fail report.
- make verify: runs the test suite against all hosts and exits non-zero on failure, suitable for CI.
- An ADR that formally defines the boundary between railiance-hosts and railiance-bootstrap.
Concept
Separation of concerns
| Repo | Responsibility |
|---|---|
| railiance-hosts | What a managed node should look like (spec), how to get it there (Ansible roles), how to verify it got there (Goss tests), inventory, secrets |
| railiance-bootstrap | Upstream Kubernetes/app-layer provisioning that builds on an already-converged base node; does NOT own security baseline |
The railiance-bootstrap Ansible work (harden.yml, bootstrap.yml) is
superseded by roles/base and roles/sops_agent in this repo. Going forward,
any security or OS-level configuration belongs here. railiance-bootstrap may
consume a node that has already been converged by this repo, but must not
re-configure items owned here.
Test framework: Goss
Goss is a Go binary that evaluates YAML test files against the live node. It was chosen because:
- Tests and spec map one-to-one (Goss YAML IS the assertion)
- Single binary, no Python/Ruby runtime on target host
- Fast (runs in-process, no SSH per test)
- Output can be TAP, JSON, or human-readable
- Deployable via Ansible in a single task
Directory layout
    spec/
      server-baseline.yaml     ← authoritative target-state spec (already created)
    goss/
      baseline.yaml            ← Goss assertions (derived from spec)
      vars/
        baseline-vars.yaml     ← parameterised values (ports, users, etc.)
    ansible/
      playbooks/
        verify.yaml            ← deploy Goss + run tests + fetch results
      roles/
        goss/                  ← role: install binary, copy tests, run, report
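
The layout lists a vars file for parameterised values without showing how it is used. The following is a minimal sketch of one way the vars file and the test file could fit together, using Goss's Go-template support (`.Vars`). The key names `admin_user` and `ssh_port` are hypothetical and not taken from the actual repo.

```yaml
# goss/vars/baseline-vars.yaml (hypothetical contents; key names are assumptions)
admin_user: admin
ssh_port: 22
---
# goss/baseline.yaml fragment (sketch). Goss renders the Go template first
# and only then parses the result as YAML.
user:
  {{ .Vars.admin_user }}:
    exists: true
command:
  ufw status:
    exit-status: 0
    stdout:
      # Enclosing slashes mark the pattern as a regular expression in Goss.
      - "/{{ .Vars.ssh_port }}/tcp.*ALLOW/"
```

Under these assumptions the suite would be run with the vars file passed explicitly, for example `goss --vars goss/vars/baseline-vars.yaml -g goss/baseline.yaml validate`.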
Tasks
T01 — Resolve duplicate converge target and fix SSH check
id: T01
status: done
completed: "2026-03-09"
priority: high
state_hub_task_id: "892f8bb8-beff-463a-b47c-ffd9a672d065"
- Remove redundant converge: ansible-bootstrap alias (caused Makefile warning)
- Fix sshd -T command (requires hostkeys) → replaced with grep -iE '^(PermitRootLogin|PasswordAuthentication)' /etc/ssh/sshd_config
Done when: make status completes without warnings and SSH section returns
PermitRootLogin no / PasswordAuthentication no.
T02 — Finalise server baseline spec
id: T02
status: done
completed: "2026-03-09"
priority: high
state_hub_task_id: "293d950e-c0b3-4ae2-ac08-dcbf3fe5b114"
Created spec/server-baseline.yaml covering:
- Firewall rules (UFW, default deny, allowed ports)
- SSH daemon settings
- Required services and packages
- Admin user constraints
- Security settings (fail2ban jails, HISTCONTROL)
Done when: spec reviewed and agreed — it becomes the contract that roles and tests must satisfy.
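
For orientation, here is an illustrative sketch of the shape such a spec might take. It is not the contents of the real spec/server-baseline.yaml. The top-level section names follow the spec sections named in T03, and the ports, packages and user come from the Goss example there. Every other key name and value is an assumption.

```yaml
# spec/server-baseline.yaml (illustrative shape only; the real file may differ)
firewall:
  status: active
  default_incoming: deny          # assumed key name for "default deny"
  rules:
    - { port: 22, proto: tcp, action: allow }
    - { port: 6443, proto: tcp, action: allow }
    - { port: 8472, proto: udp, action: allow }
ssh:
  PermitRootLogin: "no"           # quoted so YAML does not read it as a boolean
  PasswordAuthentication: "no"
packages:
  - ufw
  - fail2ban
services:
  - { name: ufw, enabled: true, running: true }
  - { name: fail2ban, enabled: true, running: true }
users:
  admin:
    groups: [sudo]
    shell: /bin/bash
security:
  fail2ban_jails: [sshd]          # assumed jail name
  histcontrol: ignoreboth         # placeholder value, not confirmed by the spec
```

Keeping each section flat and declarative like this is what lets a Goss resource be derived from it almost mechanically.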
T03 — Implement Goss test suite
id: T03
status: done
completed: "2026-03-09"
priority: high
state_hub_task_id: "a34a1626-ff38-4925-a957-d94036fbded6"
Create goss/baseline.yaml with Goss assertions that implement every item in
spec/server-baseline.yaml. Each spec section maps to a Goss resource type:
| spec section | Goss resource |
|---|---|
| firewall.status | command: ufw status |
| firewall.rules | command: ufw status stdout contains |
| ssh.* | file: /etc/ssh/sshd_config contains |
| services | service: blocks |
| packages | package: blocks |
| users | user: + file: /etc/sudoers.d/admin |
Example structure:
    # goss/baseline.yaml
    package:
      ufw:
        installed: true
      fail2ban:
        installed: true
    service:
      ufw:
        enabled: true
        running: true
      fail2ban:
        enabled: true
        running: true
    file:
      /etc/ssh/sshd_config:
        exists: true
        contains:
          - /^PermitRootLogin no/
          - /^PasswordAuthentication no/
    command:
      ufw status:
        exit-status: 0
        stdout:
          # Plain strings are matched literally; patterns enclosed in
          # slashes are treated as regular expressions.
          - "Status: active"
          - "/22/tcp.*ALLOW/"
          - "/6443/tcp.*ALLOW/"
          - "/8472/udp.*ALLOW/"
    user:
      admin:
        exists: true
        groups:
          - sudo
        shell: /bin/bash
Done when: goss validate passes on a freshly converged node.
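
The mapping table pairs the users section with a file resource for /etc/sudoers.d/admin, which the example structure does not show. A possible sketch of that assertion follows. The owner, group and mode values are assumptions based on common sudoers conventions, not values taken from the spec.

```yaml
# Addition to goss/baseline.yaml (sketch; owner, group and mode are assumptions)
file:
  /etc/sudoers.d/admin:
    exists: true
    filetype: file
    owner: root
    group: root
    mode: "0440"
```

When merged into the real test file, this entry would sit under the existing file: key next to /etc/ssh/sshd_config, since YAML does not allow the same top-level key twice.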
T04 — Ansible role and playbook for Goss
id: T04
status: done
completed: "2026-03-09"
priority: high
state_hub_task_id: "c072c45b-f18d-45be-b747-6d219c3f1439"
Create ansible/roles/goss/ with tasks that:
- Download the Goss binary (pinned version) to /usr/local/bin/goss
- Copy goss/baseline.yaml to /etc/goss/baseline.yaml
- Run goss -g /etc/goss/baseline.yaml validate --format tap
- Fetch the TAP output back to the control node as reports/goss-<host>-<date>.tap
- Fail the play if any test fails (rc != 0)
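
The steps above could be sketched as the role tasks below. This is a minimal sketch, not the role as implemented. It assumes a hypothetical `goss_version` variable (a release tag set in the role defaults), a linux-amd64 target, and that the playbook lives in ansible/playbooks so that `playbook_dir/../..` is the repo root. It also writes the registered stdout on the control node instead of using the fetch module, which avoids a temporary file on the target.

```yaml
# ansible/roles/goss/tasks/main.yaml (sketch; variable names are assumptions)
- name: Install pinned Goss binary
  ansible.builtin.get_url:
    # goss_version is a hypothetical role default holding a release tag
    url: "https://github.com/goss-org/goss/releases/download/{{ goss_version }}/goss-linux-amd64"
    dest: /usr/local/bin/goss
    mode: "0755"

- name: Create Goss test directory
  ansible.builtin.file:
    path: /etc/goss
    state: directory
    mode: "0755"

- name: Copy baseline assertions
  ansible.builtin.copy:
    src: "{{ playbook_dir }}/../../goss/baseline.yaml"
    dest: /etc/goss/baseline.yaml
    mode: "0644"

- name: Run Goss and capture TAP output
  ansible.builtin.command: goss -g /etc/goss/baseline.yaml validate --format tap
  register: goss_result
  changed_when: false
  failed_when: false    # defer failure until the report has been saved

- name: Ensure reports directory exists on the control node
  ansible.builtin.file:
    path: "{{ playbook_dir }}/../../reports"
    state: directory
  delegate_to: localhost
  become: false
  run_once: true

- name: Save TAP report on the control node
  ansible.builtin.copy:
    content: "{{ goss_result.stdout }}\n"
    dest: "{{ playbook_dir }}/../../reports/goss-{{ inventory_hostname }}-{{ ansible_date_time.date }}.tap"
  delegate_to: localhost
  become: false

- name: Fail the play if any assertion failed
  ansible.builtin.fail:
    msg: "Goss reported failures on {{ inventory_hostname }}"
  when: goss_result.rc != 0
```

Deferring the failure to the last task means a report is written even for a failing host, which is usually when it is needed most. `ansible_date_time` relies on fact gathering being enabled, which is the playbook default.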
Create ansible/playbooks/verify.yaml:
    - hosts: all
      become: true
      roles:
        - role: goss
Done when: ansible-playbook ansible/playbooks/verify.yaml exits 0 on a
clean node, non-zero on a deliberately broken one (test with a manual config change).
T05 — Add make verify target
id: T05
status: done
completed: "2026-03-09"
priority: medium
state_hub_task_id: "a8100b8e-aed0-4bb4-a0dc-a6bdf3938b8d"
Add to Makefile:
    verify: ## Run Goss test suite against all hosts; exits non-zero on failure
    	cd $(ANS_DIR) && ansible-playbook playbooks/verify.yaml -u $(SSH_USER)
Also update make status to print a summary line ("All assertions passed" /
"N assertions FAILED") rather than raw shell output.
Done when: make verify exits 0 on a good node, non-zero on a bad one.
T06 — Write ADR: railiance-hosts vs railiance-bootstrap boundary
id: T06
status: done
completed: "2026-03-09"
priority: medium
state_hub_task_id: "c3d98022-638d-4dcb-bdc7-a9501e1b6cd9"
Create docs/adr/ADR-002-repo-boundary-hosts-vs-bootstrap.md documenting:
- What railiance-hosts owns (OS baseline, security, spec, tests)
- What railiance-bootstrap owns (Kubernetes/app layer, consumes a converged node)
- Decision: any item present in spec/server-baseline.yaml must NOT be managed by railiance-bootstrap
- Migration note: superseded bootstrap.yml / harden.yml in that repo
Done when: ADR written and merged.
References
- Goss documentation: https://github.com/goss-org/goss
- Server spec: spec/server-baseline.yaml
- Bootstrap workplan: workplans/RAIL-HO-WP-0001-hosteurope-bootstrap.md