# Backup & Restore

Covers the current single-server development environment. This is the safety net that must be operational before any infrastructure migration work begins.

---

## What is protected

| Asset | Location | Risk without backup |
|---|---|---|
| Custodian State Hub database | Docker volume `infra_pg_data` | Total loss of all workstreams, tasks, decisions, progress history |
| Claude config & memory | `~/.claude/`, `~/.claude.json` | Loss of project memory, MCP registration, settings |
| Git config | `~/.gitconfig` | Minor friction, recoverable |
| age private key | `~/.config/age/railiance-backup.key` | Cannot decrypt any existing backup |

Git repositories are **not** included in the backup — they are protected by being pushed to Gitea remotes. The preflight check verifies this.

---

## Encryption

All backups are encrypted with [age](https://age-encryption.org/) before leaving the machine.

**Key locations:**

| Copy | Location | Purpose |
|---|---|---|
| Operational | `~/.config/age/railiance-backup.key` | Used locally for restore drills |
| Recovery | Password manager | Used when the machine is gone |

Permissions: `chmod 700 ~/.config/age && chmod 600 ~/.config/age/railiance-backup.key`

The public key is hardcoded in `tools/cmd/railiance-backup`. To retrieve it:

```bash
grep "public key" ~/.config/age/railiance-backup.key
```

> **The password manager copy is the only key that survives hardware failure.**
> Verify it is there before doing any infrastructure work.

---

## Destination

Backups are uploaded to a Nextcloud file drop (upload-only, no credentials required to write, cannot be read back without Nextcloud admin access).

The endpoint URL is stored locally in `wiki/260225-backup-dropoff-link.txt` (gitignored).

Uploads use a direct HTTP PUT via curl — rclone is not used because Nextcloud file drop links only permit PUT requests.

A local cache of the last 7 backups of each type is kept in `~/.cache/railiance/backups/`.

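The "last 7 backups of each type" retention could be enforced with a small pruning helper. The sketch below is illustrative only and is not taken from `railiance-backup`: the function name `prune_backups` is hypothetical, and it assumes backup file names embed a timestamp that sorts lexically, so that a reverse sort lists the newest file first.

```bash
# Hypothetical helper (not part of the real tool): keep the newest N files
# whose names start with a given prefix, delete the rest.
# Assumption: names embed a lexically sortable timestamp, so `sort -r`
# puts the newest backup first.
prune_backups() {
  local dir="$1" prefix="$2" keep="${3:-7}"
  local name
  ls -1 "$dir" \
    | grep -- "^${prefix}" \
    | sort -r \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r name; do
        rm -f -- "$dir/$name"   # everything past the first $keep entries
      done
}

# Example calls, one per backup type:
#   prune_backups ~/.cache/railiance/backups "db-" 7
#   prune_backups ~/.cache/railiance/backups "config-" 7
```

Pruning per prefix keeps the two backup types independent, so a burst of database dumps never evicts the config snapshots.
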
---

## Running a backup

```bash
bin/railiance backup
```

This runs two steps:

1. **PostgreSQL dump** — `pg_dump` from the running `infra-postgres-1` container, piped through `age`, uploaded as `db-.sql.age`.
2. **Config snapshot** — tar of `~/.claude/`, `~/.claude.json`, `~/.gitconfig`, encrypted with `age`, uploaded as `config-.tar.gz.age`.

A `.last-backup` stamp is written to the local cache; the preflight check reads this to verify freshness.

**Automated:** a cron job runs the backup daily at 02:00:

```
0 2 * * * /home/worsch/railiance-cluster/bin/railiance backup >> ~/.cache/railiance/backup.log 2>&1
```

---

## Pre-migration preflight

Before touching any infrastructure, run:

```bash
bin/railiance preflight
```

Checks performed:

| Check | Pass condition |
|---|---|
| DB backup freshness | Latest `db-*.sql.age` is less than 24 hours old |
| Config backup freshness | Latest `config-*.tar.gz.age` is less than 24 hours old |
| Git repos clean | No uncommitted changes in any tracked repo |
| Git repos pushed | No unpushed commits in any tracked repo |
| age key present | `~/.config/age/railiance-backup.key` exists |

Exit 0 = safe to proceed. Exit 1 = do not proceed.

---

## Restore procedure

Use this when recovering from hardware failure, WSL2 corruption, or accidental data loss. Work through it in order — each step depends on the previous one.

### Step 0 — Prerequisites

On a fresh Ubuntu / WSL2 instance, install the required tools:

```bash
sudo apt-get update && sudo apt-get install -y \
  git curl docker.io age postgresql-client
```

Start Docker:

```bash
sudo service docker start
```

### Step 1 — Retrieve the age private key

Copy the private key from your password manager into the machine:

```bash
mkdir -p ~/.config/age
# paste the key content:
cat > ~/.config/age/railiance-backup.key
# (paste, then Ctrl-D)
chmod 700 ~/.config/age && chmod 600 ~/.config/age/railiance-backup.key
```

### Step 2 — Get the backup files

Download the most recent backup files from Nextcloud (ask the Nextcloud admin for read access, or retrieve from `~/.cache/railiance/backups/` on a secondary machine if the local cache survived).

Files needed:

- `db-.sql.age`
- `config-.tar.gz.age`

### Step 3 — Restore PostgreSQL

This step runs from the `the-custodian` working copy. On a fresh machine that directory does not exist yet, so clone that repository first (see Step 5).

Start a fresh postgres container:

```bash
cd ~/the-custodian/state-hub
cp infra/.env.example infra/.env   # fill in POSTGRES_PASSWORD
make db
```

Decrypt and restore the database dump:

```bash
age --decrypt \
  -i ~/.config/age/railiance-backup.key \
  db-.sql.age \
  | docker exec -i infra-postgres-1 psql -U custodian custodian
```

Verify row counts look sane:

```bash
docker exec infra-postgres-1 psql -U custodian custodian \
  -c "SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE n_live_tup > 0 ORDER BY n_live_tup DESC;"
```

### Step 4 — Restore config files

```bash
age --decrypt \
  -i ~/.config/age/railiance-backup.key \
  config-.tar.gz.age \
  | tar -xz -C ~
```

This restores `~/.claude/`, `~/.claude.json`, and `~/.gitconfig`.

### Step 5 — Clone repositories

```bash
git clone /coulomb/railiance-bootstrap.git ~/railiance-bootstrap
git clone /tegwick/the-custodian.git ~/the-custodian
git clone /coulomb/markitect_project.git ~/markitect_project
# ... remaining repos as needed
```

If Gitea is offline, clone from the local bare mirrors in `~/.cache/railiance/git-mirrors/` if they were set up (see T3).

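For the Gitea-offline case, cloning from a local bare mirror might look like the sketch below. The layout is an assumption, not something this document specifies: it supposes one bare repository per project, named `<name>.git`, directly under the mirror directory. The function name `clone_from_mirror` and the `MIRROR_DIR` variable are hypothetical.

```bash
# Assumed layout (not confirmed by this document):
#   ~/.cache/railiance/git-mirrors/<name>.git   one bare repo per project
MIRROR_DIR="${MIRROR_DIR:-$HOME/.cache/railiance/git-mirrors}"

# Hypothetical helper: clone <name> from the local mirror into <dest>.
clone_from_mirror() {
  local name="$1" dest="$2"
  local src="$MIRROR_DIR/$name.git"
  if [ ! -d "$src" ]; then
    echo "no mirror for $name at $src" >&2
    return 1
  fi
  git clone --quiet "$src" "$dest"
}

# Example call:
#   clone_from_mirror the-custodian ~/the-custodian
```

A clone made this way has the mirror path as its `origin`. Once Gitea is reachable again, point it back at the real remote with `git remote set-url origin` followed by the Gitea URL.
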
### Step 6 — Register the MCP server

```bash
cd ~/the-custodian/state-hub
python3 scripts/patch_mcp_cwd.py
```

### Step 7 — Start the state hub and verify

```bash
cd ~/the-custodian/state-hub
make api &   # in background or a separate terminal
```

Smoke test — confirm state hub is responding:

```bash
curl -sf http://127.0.0.1:8000/state/summary | python3 -m json.tool | head -20
```

And from Claude Code, confirm MCP tools are available:

```
bin/railiance preflight
```

---

## Restore drill (validation)

Run a restore drill before doing any major infrastructure work. The drill validates that the procedure above actually works without waiting for a real disaster.

A minimal drill that does not require a second machine:

```bash
# 1. Start a second postgres container on a different port
docker run -d --name restore-test \
  -e POSTGRES_DB=custodian \
  -e POSTGRES_USER=custodian \
  -e POSTGRES_PASSWORD=testpass \
  -p 5433:5432 \
  postgres:16-alpine

# 2. Decrypt and restore to it
age --decrypt \
  -i ~/.config/age/railiance-backup.key \
  ~/.cache/railiance/backups/db-$(ls ~/.cache/railiance/backups/db-*.sql.age | sort -r | head -1 | xargs basename | sed 's/db-//;s/.sql.age//').sql.age \
  | docker exec -i restore-test psql -U custodian custodian

# 3. Check row counts
docker exec restore-test psql -U custodian custodian \
  -c "SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE n_live_tup > 0 ORDER BY n_live_tup DESC;"

# 4. Clean up
docker rm -f restore-test
```

Record the drill completion with a dated file (preflight checks for this in T5):

```bash
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) restore drill OK" \
  >> ~/.cache/railiance/restore-drill.log
```

---

## Extension Points

### EP-RAIL-003 — Git bare-repo mirrors as secondary restore source

The current design relies on Gitea remotes for git repo recovery. If Gitea is offline during a migration, repos can only be recovered from local working copies (if they survive).
A secondary bare-repo mirror (e.g., in a local directory or on a NAS) would make git recovery independent of Gitea availability.

**Trigger:** when the Gitea server becomes a SPOF for restore operations (e.g., during ThreePhoenix migration work on the server that runs Gitea).

**Constraint:** mirrors must be updated on the same schedule as the DB backup; stale mirrors provide false confidence.

### EP-RAIL-004 — Offsite secondary copy of encrypted backups

The current Nextcloud file drop is the only offsite copy. A second destination (rclone to an S3-compatible store, or rsync to a NAS) would protect against Nextcloud unavailability.

**Trigger:** when Nextcloud is not available or is itself hosted on the same infrastructure being migrated.

**Constraint:** the second destination must also be write-only or similarly access-controlled; duplicating to a readable location without additional access controls widens the blast radius of a credential leak.