Commit Graph

94 Commits

Author SHA1 Message Date
3f4f03e838 feat(ansible): inject ops-bridge key in base role at bootstrap
Add ops_bridge_pubkey to group_vars/all.yaml (public key only, safe to
commit) and inject it via ansible.posix.authorized_key in the base role,
immediately after SSH hardening. This ensures ops-bridge tunnel
connectivity is available as soon as SSH infrastructure is up on any
managed host — no manual key provisioning required for new nodes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:52:54 +01:00
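
Note on 3f4f03e838: the key injection this commit describes could be sketched as the task below. This is an illustration only, not the committed task. `ops_bridge_pubkey` comes from the commit message; `ops_bridge_user` is a hypothetical variable, since the message does not name the target account.

```yaml
# Sketch of a base-role task, placed after the SSH hardening tasks.
# ops_bridge_user is an assumed variable name; only ops_bridge_pubkey
# is named in the commit message.
- name: Authorize the ops-bridge public key
  ansible.posix.authorized_key:
    user: "{{ ops_bridge_user }}"
    key: "{{ ops_bridge_pubkey }}"
    state: present
```

Because `state: present` is idempotent, re-running the base role on an already provisioned host should report no change for this task.
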
ab92c58bda chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 13:42:11 +01:00
127231bf62 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 13:41:52 +01:00
72da7bd151 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 13:41:46 +01:00
93080128fd chore(workplan): mark T06 done (Gitea values → railiance-apps)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 13:41:21 +01:00
d722cac4a5 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 10:03:40 +01:00
afd664b248 chore(workplan): mark T05 done — Valkey standalone S3 asset deployed
bitnami/valkey 5.4.9 in platform namespace; gitea-valkey-cluster
subchart decommissioned; Gitea cache/session/queue verified working.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 10:03:14 +01:00
28f08b17f3 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 09:12:50 +01:00
26849bd4d6 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 09:12:40 +01:00
dfed454353 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 09:12:39 +01:00
bbde20d78d chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 09:12:31 +01:00
7bff1f211d chore(workplan): mark T04 done — Gitea migrated to cnpg gitea-db
postgresql-ha subchart decommissioned; 4 users, 26 repos verified intact;
NetworkPolicy for default→databases ingress applied.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 09:12:22 +01:00
2d7e0101bc feat(infra): UFW k3s routing + full deploy runbook
- base role: allow UFW routing direction (required for k3s flannel
  pod networking to function across nodes)
- docs/deploy-stack.md: full S1→S5 ordered deploy runbook with
  pre-conditions checklist and layer-by-layer steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 02:28:51 +01:00
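
Note on 2d7e0101bc: one way to express "allow UFW routing direction" in Ansible is the `community.general.ufw` module with `direction: routed`. The commit message does not say which module or command the base role uses, so treat the following as an assumption-based sketch.

```yaml
# Sketch only: sets the default policy for routed (forwarded) traffic
# to allow, so that pod traffic crossing nodes is not dropped by UFW.
# The module choice is an assumption; the role may use a different mechanism.
- name: Allow routed traffic through UFW for k3s flannel
  community.general.ufw:
    direction: routed
    default: allow
```
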
aa822164b5 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 02:24:47 +01:00
74f7c72dbb chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 02:24:32 +01:00
13443ee2d5 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 02:24:27 +01:00
11a2c37bde chore(workplan): mark T03 and T08 done in WP-0004
T03 (gitea-db cnpg cluster): cluster healthy after adding missing
NetworkPolicies for databases namespace default-deny-all policy.
T08 (deploy-stack docs): docs/deploy-stack.md written last session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 02:24:07 +01:00
a787a8acb0 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 02:20:53 +01:00
8c08b4b806 fix(custodian-agent): dedicated playbook, correct working dir
- ansible/playbooks/custodian-agent.yaml: minimal playbook with only
  the custodian_agent role — avoids loading base/sops_agent/etc when
  all we need is key injection
- Makefile: use custodian-agent.yaml in provision targets; remove
  --tags workaround (was fragile; standalone playbook is correct)

Manual invocation (from CoulombCore):
  cd ~/railiance-infra/ansible
  ansible-playbook playbooks/custodian-agent.yaml -u tegwick --limit Railiance01

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 02:20:33 +01:00
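
Note on 8c08b4b806: a playbook that loads only the `custodian_agent` role might look like the sketch below. The role name comes from the commit message; the `hosts` pattern and `become` setting are assumptions, not taken from the repository.

```yaml
# Sketch of a minimal single-role playbook.
# hosts and become are assumed values; narrow the run with --limit
# as in the manual invocation above.
- name: Provision the Custodian agent SSH identity
  hosts: all
  become: true
  roles:
    - custodian_agent
```
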
087f5da57b chore: add .venv to .gitignore
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:53:58 +01:00
ff59d4e0f8 feat(ansible): add swapfile + resource_limits roles; add CoulombCore to inventory
T01: roles/swapfile — idempotent 4GB swapfile, vm.swappiness=10, fstab entry
T02: roles/resource_limits — PAM nproc caps (512/1024), systemd user-1000.slice
     memory limits (1500M/512M); templated per-host via host_vars
- inventory/host_vars/CoulombCore.yml — host-specific vars for both roles
- inventory/servers.yaml — add CoulombCore with id_ops SSH key
- inventory_from_yaml.py — load host_vars files into Ansible hostvars
- playbooks/bootstrap.yaml — include swapfile + resource_limits roles
- workplans/WP-0004 — flag T04/T09/T10 needs_human, add CoulombCore-local convergence note

Codifies manual INC-002 hardening. See RAIL-HO-WP-0004-T01/T02.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:49:35 +01:00
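
Note on ff59d4e0f8: the swapfile behaviour listed under T01 (idempotent 4GB swapfile, `vm.swappiness=10`, fstab entry) could be approximated with tasks like these. The `/swapfile` path and the task layout are assumptions; the actual role is not shown in this log.

```yaml
# Sketch only. /swapfile is an assumed path.
- name: Create a 4 GB swapfile (skipped when the file already exists)
  ansible.builtin.command: fallocate -l 4G /swapfile
  args:
    creates: /swapfile
  register: swapfile_created

- name: Restrict swapfile permissions
  ansible.builtin.file:
    path: /swapfile
    owner: root
    group: root
    mode: "0600"

- name: Format and enable the swapfile only when it was just created
  ansible.builtin.shell: mkswap /swapfile && swapon /swapfile
  when: swapfile_created is changed

- name: Persist the swapfile in /etc/fstab
  ansible.posix.mount:
    path: none
    src: /swapfile
    fstype: swap
    opts: sw
    state: present

- name: Set vm.swappiness to 10
  ansible.posix.sysctl:
    name: vm.swappiness
    value: "10"
    state: present
```

The `creates` argument and the `when` guard are what keep a second run from reformatting an active swapfile.
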
e10789bdd2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 01:26:01 +01:00
abb3c50a5c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 01:25:27 +01:00
f5bfc1a922 feat(custodian-agent): add custodian agent public key
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIC/V9fe5MGKdhTBz9KwEvC1NE+HjdoCtQocpGxP6Pko9

Generated 2026-03-27 via make custodian-keygen. Private key at workstation
only (~/.ssh/id_custodian_agent), never committed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:22:45 +01:00
30a3f908aa feat(custodian-agent): Ansible role + Makefile for Custodian SSH identity
Establishes a dedicated SSH keypair for the Custodian automation agent:
- ansible/roles/custodian_agent/: authorized_key task (tagged custodian_agent)
- ansible/inventory/group_vars/all.yaml: custodian_agent_user/pubkey vars
- ansible/playbooks/bootstrap.yaml: custodian_agent role added
- Makefile: provision-custodian-agent / provision-custodian-agent-host targets

Keypair generation: cd ~/the-custodian && make custodian-keygen
Then deploy:        cd ~/railiance-infra && make provision-custodian-agent

The private key lives at ~/.ssh/id_custodian_agent — never committed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:21:57 +01:00
caa6ae36da feat(workplan): add RAIL-HO-WP-0004 production-readiness workplan
10-task cross-layer workplan covering: Ansible hardening codification (T01-T02),
cnpg platform baseline superseding stale WP-0001 (T03-T05), S2→S5 Gitea boundary
fix (T06), SSH git automation on CoulombCore (T07, done), deploy-stack docs (T08),
state-hub + activity-core migration to cluster (T09-T10).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:01:47 +01:00
9d59b5c667 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 01:00:15 +01:00
967707bdca chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 00:59:45 +01:00
c4470fa4ac chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 00:58:30 +01:00
f2951f1a5b chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 00:58:11 +01:00
3d647e7181 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 00:54:37 +01:00
8688750dce chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 00:54:27 +01:00
4b3c2dcdce chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 00:53:47 +01:00
1c56bd3efc chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 00:53:38 +01:00
cc8c05fdd2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 00:53:06 +01:00
1171a27324 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 00:52:54 +01:00
86ceabd80e chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-hosts
2026-03-27 00:10:20 +01:00
fda9d1c386 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-03-27:
  - update .custodian-brief.md for railiance-infra
2026-03-27 00:10:12 +01:00
7e1a5ef87b Updated scope
2026-03-20 23:44:33 +01:00
12feb80a98 chore(sbom): add system-level tool manifest for railiance-infra
Captures direct tool dependencies (terraform 1.9.5, sops 3.10.2, ansible, age,
cloud-init) with SPDX licence identifiers. Low-confidence entries flagged for
human verification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 18:35:20 +01:00
216514e3a0 docs: add SCOPE.md for rapid orientation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 23:11:38 +01:00
0433453806 feat(backup): implement S1 integrated backup (Q3/D4)
tools/cmd/railiance-backup-s1:
  - OS config snapshot: sshd, ufw, fail2ban, hosts, apt sources
  - installed packages list
  - age-encrypted, output: /opt/backup/railiance/infra/
  - requires root, no network dependency

Makefile: add `make backup` target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 21:18:05 +01:00
558d2d9677 chore(makefile): remove tunnel target (moved to the-custodian state-hub)
The reverse SSH tunnel is State Hub infrastructure, not infra-layer
tooling. Use: cd ~/the-custodian/state-hub && make tunnel HOST=user@host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 01:19:43 +01:00
2634102ce2 chore(workplan): mark WP-0003 completed — all tasks done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 00:36:56 +01:00
1433877aa2 feat(relocate): receive cloudinit and railiance-plan-host from railiance-cluster
Per ADR-003: cloud-init (S1 node provisioning) and host planning tool
belong at the Infrastructure Substrate layer. Moved from railiance-cluster.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 00:34:50 +01:00
703c57d91c chore(rename): railiance-hosts → railiance-infra
Update all operational references to reflect the new repo name per
ADR-003 (OAS S1 Infrastructure Substrate). Historical text in ADRs
and state-hub-inbox files preserved as-is. Gitea remote URL updated
locally (Gitea repo rename is a manual step).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 00:34:18 +01:00
a680fb51af feat(adr): add ADR-003 (5-repo OAS stack); supersede ADR-002
ADR-003 formalises the 5-repo structure aligned with OAS Stack S1-S5:
railiance-infra, railiance-cluster, railiance-platform,
railiance-enablement, railiance-apps. Defines boundary rule, pre-condition
chain, and content relocation table. ADR-002 marked superseded.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 00:27:18 +01:00
d9f6848a5b feat(workplan): add WP-0003 for 5-repo OAS stack restructure
Plans the rename of railiance-hosts→infra and railiance-bootstrap→cluster,
creation of railiance-platform/enablement/apps, ADR-003 (supersedes ADR-002),
content relocations, state hub re-registration, and resolution of the
pending railiance-apps decision (7cddead6).

7 tasks; state_hub_workstream_id: 3ae0afc5-13f2-4e6c-aea7-1c1fb9f1ab81

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 23:53:49 +01:00
15bb2978cc feat(tunnel): add make tunnel target; complete WP-0001
- Add `make tunnel` to Makefile: reads first host from
  inventory/servers.yaml and opens a reverse SSH tunnel
  forwarding local state-hub (port 8000) to the remote host
- Mark T02 done and close WP-0001 (all tasks complete)
- WP-0002 T01/T02 task IDs backfilled by consistency checker

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 19:53:10 +01:00
b32dfd4f5a docs: add verification guide, close WP-0002
- docs/verification.md: explains spec/server-baseline.yaml, goss/baseline.yaml,
  make verify workflow, assertion mapping table, and how to add new checks
- docs/convergence.md: replace manual spot-check snippet with make verify reference
- workplans/RAIL-HO-WP-0002: mark completed (all tasks done, workstream closed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 19:37:10 +01:00