Commit Graph

7 Commits

Author SHA1 Message Date
dddc7ebd81 Add activity-core cluster verifier
Some checks failed
railiance-tests / smoke (push) Has been cancelled
2026-06-16 03:51:01 +02:00
4e1a90032b fix(backup): elevate sudo in Makefile and guard mkdir after root check
Some checks failed
railiance-tests / smoke (push) Has been cancelled
- `make backup` now invokes `sudo tools/cmd/railiance-backup-s2` directly
- Move `mkdir -p` in railiance-backup-s2 to after the root check so the
  script emits a clear error instead of a raw permission-denied failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 22:33:49 +00:00
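The reordering described in 4e1a90032b can be sketched as follows. This is a minimal illustration, not the actual contents of railiance-backup-s2: `BACKUP_DIR`, its default path, `require_root`, and `prepare_backup_dir` are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Hypothetical sketch: names and the default path are assumptions,
# not taken from railiance-backup-s2.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/railiance}"

# Fail with a readable message unless the given UID (default: current) is 0.
require_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -ne 0 ]; then
        echo "error: must be run as root (try: sudo $0)" >&2
        return 1
    fi
}

# Check privileges first; only create the directory once the check passes,
# so a non-root caller sees the message above instead of "Permission denied".
prepare_backup_dir() {
    require_root "$1" || return 1
    mkdir -p "$BACKUP_DIR"
}
```

Calling `prepare_backup_dir` with no argument checks the current user. The optional UID argument exists only so the ordering can be exercised without root.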
66f8ca4009 docs(wp-0004): add implementation notes for sudo, etcd, helm, cron
Some checks failed
railiance-tests / smoke (push) Has been cancelled
T02: note to verify etcd is in use before implementing; flags root requirement
T03: add KUBECONFIG to helm commands; note root access approach
T06: document solution to sudo problem — run cron under root's crontab,
     not a sudoers whitelist. Add restore drill commands. Fix cron to use
     absolute path (~ unreliable in root crontab).
T01: note to remove old railiance-backup script (wrong scope)
Makefile: fix stale backup description, add restore target, fix .PHONY

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 16:52:40 +00:00
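The T06 note in 66f8ca4009 (run the job from root's crontab and use an absolute path) could be enforced with a small helper like the one below. This is a sketch under assumptions: `cron_line` is a hypothetical function, and the schedule and install path in the usage note are placeholders, not values from the workplan.

```shell
#!/bin/sh
# Hypothetical helper: emit a crontab entry only if the command path is
# absolute, since "~" is unreliable in root's crontab.
cron_line() {
    schedule="$1"
    cmd="$2"
    case "$cmd" in
        /*) printf '%s %s\n' "$schedule" "$cmd" ;;
        *)  echo "error: command path must be absolute: $cmd" >&2
            return 1 ;;
    esac
}
```

For example, `cron_line '0 3 * * *' /opt/railiance/tools/cmd/railiance-backup-s2` prints a line that can be pasted into `sudo crontab -e`. The `/opt/railiance` prefix is illustrative only.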
75467673a8 feat(safety-net): create WP-0004, update preflight for OAS 5-repo layout
- workplans/RAIL-BS-WP-0004-safety-net.md: ADR-001 workplan file for
  current-env-safety-net workstream (7e8b0c20), T01-T04 done, T05-T06 todo
- tools/cmd/railiance-preflight: update REPOS to OAS S1-S5 stack
  (railiance-infra/cluster/platform/enablement/apps) + project repos;
  remove stale railiance-bootstrap reference
- docs/backup-restore.md: fix Step 5 clone commands to current repo names
- Makefile: add make backup and make preflight targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 15:21:29 +01:00
3297ac1f6c fix(test): correct ha-failover test — wrong URL, wrong pod label, missing kubectl
Three bugs:
- GITEA_URL defaulted to localhost:3000; Gitea NodePort is 32166
- Pod label app.kubernetes.io/name=postgresql-ha matched pgpool pod too;
  added component=postgresql to target only postgres nodes
- Used bare 'kubectl' which is not on PATH; switched to 'k3s kubectl'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 13:42:54 +00:00
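The label and PATH fixes from 3297ac1f6c might look roughly like this. It is a sketch, not the test script itself: `KUBECTL`, `pg_selector`, and `list_postgres_pods` are hypothetical names, and the second label is written exactly as the commit message gives it (`component=postgresql`); the full label key in the cluster may differ.

```shell
#!/bin/sh
# Hypothetical sketch. "k3s kubectl" replaces a bare "kubectl" that is
# not on PATH; override KUBECTL to use a different client.
KUBECTL="${KUBECTL:-k3s kubectl}"

# Selector that matches only postgres nodes, not the pgpool pod.
# The component label key is copied from the commit message (assumption).
pg_selector() {
    printf '%s,%s\n' \
        'app.kubernetes.io/name=postgresql-ha' \
        'component=postgresql'
}

# List matching pods by name. $KUBECTL is left unquoted on purpose so
# that "k3s kubectl" splits into two words.
list_postgres_pods() {
    $KUBECTL get pods -l "$(pg_selector)" -o name
}
```

Keeping the client in a variable makes the selector logic checkable on a machine without a cluster, for instance by setting `KUBECTL=echo`.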
660a63c674 feat(pgpool): implement WP-0003 T01-T04 — permanent fix for pgpool-password bug
Some checks failed
railiance-tests / smoke (push) Has been cancelled
T01: helm/gitea-values.yaml with postgresql-ha.pgpool.adminPassword
     (fill REPLACE_WITH_PGPOOL_ADMIN_PASSWORD before helm upgrade)
T02: tests/smoke_kube.sh — add pgpool and postgresql-ha pod health checks
T03: tests/test_ha_failover.sh — D3 HA failover test script
T04: docs/incidents/2026-03-10-pgpool-missing-secret.md + README link

Also: make test-ha-failover target, Makefile .PHONY updated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 14:16:22 +01:00
901535ca44 feat(k3s-baseline): complete WP-0002 T01-T05
- bootstrap.yml: install k3s (server+cluster-init, pinned v1.35.1+k3s1)
  and Helm (v3.17.3 with checksum verify); fetch kubeconfig to control node
- tests/smoke_kube.sh: assert node Ready, helm, CoreDNS, Traefik
- docs/kubeconfig.md: usage, merge, context-switch, security note
- Makefile: k3s-install and smoke targets with make help

Closes T01, T02, T03, T04, T05 of RAIL-BS-WP-0002.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 09:43:16 +00:00
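The "checksum verify" step mentioned for the Helm install in 901535ca44 can be illustrated in shell. This is a generic sketch, not the task from bootstrap.yml: `verify_sha256` is a hypothetical helper, and it assumes GNU `sha256sum` is available on the node.

```shell
#!/bin/sh
# Hypothetical helper: verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the file's SHA-256 digest equals EXPECTED_HEX, 1 otherwise.
# Assumes sha256sum (GNU coreutils) is installed.
verify_sha256() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ "$actual" != "$expected" ]; then
        echo "error: checksum mismatch for $file" >&2
        return 1
    fi
}
```

A caller would download the release archive and its published digest, then run `verify_sha256 "$archive" "$digest" || exit 1` before unpacking anything.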