Define backup restore secret handoff

2026-06-05 16:28:03 +02:00
parent 4915ecf374
commit 8de78f1636
5 changed files with 221 additions and 2 deletions


@@ -23,6 +23,7 @@ Key contracts:
- `docs/initial-operating-contracts.md`
- `docs/ci-runner-actions-gitops-ownership.md`
- `docs/backup-restore-secret-handoff.md`
- `docs/gitea-container-registry.md`
- `docs/gitea-package-registry.md`


@@ -34,6 +34,8 @@ Deploy-capable Gitea Helm/SOPS/manifests also live here now; `railiance-apps`
keeps only transitional compatibility wrappers for old operator entry points.
The runner, Actions, and GitOps ownership contract lives in
`docs/ci-runner-actions-gitops-ownership.md`.
The backup, restore, and secret custody handoff contract lives in
`docs/backup-restore-secret-handoff.md`.
---
@@ -178,7 +180,9 @@ Known starting point:
`docs/gitea-package-registry.md`.
6. For runner, Actions, and GitOps ownership, read
`docs/ci-runner-actions-gitops-ownership.md`.
7. For backup, restore, and secret custody handoffs, read
`docs/backup-restore-secret-handoff.md`.
8. For migration context, read
`/home/worsch/railiance-apps/workplans/RAILIANCE-WP-0006-railiance-forge-extraction.md`.
---


@@ -0,0 +1,210 @@
# Backup, Restore, And Secret Handoff
Last reviewed: 2026-06-05
Status: contract v1. This document defines ownership, evidence gates, and
allowed references. It does not authorize a live backup job, restore drill,
secret rotation, OpenBao policy change, or credential migration.
## Purpose
Forge data is operationally important, but the mechanisms that make data
durable belong mostly below the forge layer. This contract states what
`railiance-forge` owns, what `railiance-platform` implements, and what evidence
S5 application releases can trust without taking custody of forge secrets.
## Boundary Summary
- `railiance-forge` owns the inventory of forge data, artifact restore
requirements, retention posture, operator runbooks, and non-secret evidence
that downstream consumers cite.
- `railiance-platform` owns shared database, object-storage, backup/restore,
OpenBao, policy, audit, and runtime secret-delivery mechanisms.
- `railiance-cluster` owns cluster-level recovery primitives such as etcd,
kubeconfig, node/runtime recovery, and cluster add-ons.
- `railiance-apps` consumes published artifacts and restore evidence in app
runbooks; it does not own package blobs, registry credentials, runner tokens,
or forge database backups.
- Source repos own source code, build definitions, package metadata, and image
build definitions.
## Protected Asset Inventory
| Asset | Current anchor | Forge responsibility | Platform/lower-layer responsibility | Trust gate |
| --- | --- | --- | --- | --- |
| Gitea application database | CNPG cluster `databases/gitea-db`, checked by `make gitea-status` | State what must be restorable and what forge checks prove after restore | CNPG backup/restore implementation and database recovery mechanisms | Restored database must support login, repo list, package metadata, and registry metadata checks |
| Gitea shared storage | PVC `default/gitea-shared-storage`, mounted at `/data`; package blobs under `/data/packages` | Track package/blob growth, retention posture, and restore requirements | Durable volume backup, object-storage export, or future storage replication | Restored storage must support Git clone, package download/install, and container pull checks |
| Source repositories and forge app state | Gitea database plus `/data` storage | Define restore drill scope and consumer evidence | Database/PVC/object-storage restore tooling | Non-production restore drill proves a known repo can be cloned after restore |
| Container and Python package registry data | Package blobs under `/data/packages`; metadata in Gitea database | Define retention, cleanup, package evidence, and consumer verification gates | Durable backup of blobs and metadata | Known image/package can be pulled or installed after restore |
| Runner registration and labels | Forge-owned runner substrate | Inventory labels, runner purpose, and replacement expectations | Secret delivery for runner tokens where OpenBao or platform policies apply | Replacement runner can run a sample job with the same semantic labels |
| SOPS-encrypted Gitea values | `helm/gitea-values.sops.yaml` | Keep encrypted deploy input and sentinel check | SOPS/age bootstrap custody remains outside runtime secret delivery | `make check-sops` proves authorized decryption without storing plaintext |
| Runtime secrets | Kubernetes Secrets, OpenBao paths, operator custody paths | Reference names and required purposes only | OpenBao paths, policy, audit, break-glass, and workload secret delivery | Platform OpenBao restore/audit evidence exists before production-trust use |
| Artifact evidence | Forge docs, future artifact-store package, State Hub notes | Define required evidence fields and consumer references | Object-storage backend and credential delivery where evidence packages become durable artifacts | Evidence is retained without embedding secret material |
Known current caveat: the Gitea package data is on a 10 GiB `local-path` PVC.
On 2026-05-19 `/data/packages` was about 798.5 MiB, and no Kubernetes `CronJob`
backup resources were observed. That posture is acceptable for smoke and
development artifacts, but production-critical package reliance needs recorded
backup and restore evidence first.
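
As a rough worked check of the figures in this caveat, the arithmetic can be sketched as follows. It assumes the full 10 GiB is usable and counts only `/data/packages`; Git repository data on the same volume was not measured here and is not included.

```python
# Rough headroom arithmetic for the caveat above (illustrative only).
PVC_CAPACITY_MIB = 10 * 1024   # 10 GiB PVC, assumed fully usable
PACKAGES_USED_MIB = 798.5      # /data/packages as observed on 2026-05-19


def used_fraction(used_mib: float, capacity_mib: float) -> float:
    """Return the fraction of capacity consumed by the given usage."""
    if capacity_mib <= 0:
        raise ValueError("capacity must be positive")
    return used_mib / capacity_mib


fraction = used_fraction(PACKAGES_USED_MIB, PVC_CAPACITY_MIB)
print(f"packages use about {fraction:.1%} of the PVC")  # about 7.8%
```
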
## Backup Ownership
Forge owns the question: "What forge data must be recoverable, and what does a
successful recovery prove?"
Platform owns the mechanism for:
- CNPG database backup and restore;
- S3-compatible/object-storage backup targets;
- OpenBao runtime secret custody, audit, backup, and restore;
- workload secret delivery through External Secrets, CSI, or another approved
platform mechanism;
- future object-storage credential vending and policy shape.
Cluster owns the mechanism for:
- etcd and kubeconfig backup;
- Kubernetes runtime recovery;
- cluster add-ons needed before platform services can recover.
Until platform backup coverage is explicitly available for a forge asset, forge
docs must treat that asset as not production-trustworthy. Operators may still
use it for smoke, development, and migration evidence if the risk is recorded.
## Restore Drill Requirements
A forge restore drill should be non-production first and should record only
non-secret evidence.
Minimum drills:
1. Source forge restore:
- restore the Gitea database and shared storage into an isolated namespace or
host;
- verify Gitea starts;
- verify a known repository can be listed and cloned;
- verify a known user/org/repo permission path still exists.
2. Package/blob restore:
- restore package metadata and package blobs together;
- verify a known Python package version can be installed;
- verify a known container image tag or digest can be pulled;
- verify registry authentication behavior without recording the token.
3. Runner substrate restore:
- replace a runner without reusing old registration tokens;
- verify semantic labels still match the published label contract;
- run a non-production sample workflow;
- record runner identity and label evidence, not runner tokens.
4. Secret delivery restore:
- cite platform OpenBao restore evidence before relying on OpenBao-delivered
forge credentials;
- verify a non-production secret reaches the intended workload path;
- verify no secret value appears in Git, State Hub notes, logs, screenshots,
or drill artifacts.
Evidence for a successful drill should include:
- date and operator;
- source backup reference or encrypted snapshot reference;
- restored environment name;
- commands run, with secret values redacted before recording;
- post-restore checks and results;
- explicit `no_secret_material_recorded` assertion;
- rollback or cleanup note for the restored environment.
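
The evidence fields above could be captured in a small record like the one below. This is a minimal sketch, not a schema this contract defines: the class name, field names, and example values are all hypothetical, and only `no_secret_material_recorded` is taken from the list above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RestoreDrillEvidence:
    """Hypothetical non-secret evidence record for one restore drill."""

    drill_date: date
    operator: str
    backup_reference: str            # backup id or encrypted snapshot reference
    restored_environment: str
    commands_redacted: list[str]     # commands with secret values already redacted
    check_results: dict[str, bool]   # post-restore check name -> passed
    cleanup_note: str
    no_secret_material_recorded: bool = False  # must be asserted explicitly

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are empty or unasserted."""
        required = ("operator", "backup_reference", "restored_environment",
                    "commands_redacted", "check_results", "cleanup_note")
        missing = [name for name in required if not getattr(self, name)]
        if not self.no_secret_material_recorded:
            missing.append("no_secret_material_recorded")
        return missing


# Placeholder values only; none of these describe a real drill.
evidence = RestoreDrillEvidence(
    drill_date=date(2026, 6, 5),
    operator="example-operator",
    backup_reference="example-backup-id",
    restored_environment="forge-restore-drill",
    commands_redacted=["make gitea-status"],
    check_results={"repo_clone": True},
    cleanup_note="restored namespace deleted after checks",
)
print(evidence.missing_fields())  # ['no_secret_material_recorded']
```

Defaulting the assertion to `False` means a record is incomplete until the operator sets it deliberately.
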
## Secret Custody Boundaries
SOPS/age remains the Git-at-rest bootstrap mechanism for encrypted deploy
inputs such as `helm/gitea-values.sops.yaml`. This repo may keep encrypted SOPS
files and may provide `make check-sops` as a sentinel, but it must not commit or
log decrypted values.
OpenBao is the platform runtime secret service. The platform docs define paths
such as:
```text
platform/workloads/<namespace>/<service-account>/<secret-name>
platform/object-storage/<consumer>
platform/databases/<consumer>
platform/operators/<purpose>
```
Forge may request or reference OpenBao paths for forge workloads, package
tokens, runner registration, object-storage credentials, and database access.
Forge does not define OpenBao mounts, audit devices, root/unseal custody,
break-glass policy, or global secret-delivery mechanisms.
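
When forge docs reference such paths, the path names can be built from the templates above rather than typed by hand. The sketch below is illustrative: the helper name, the template keys, and the segment rule (lowercase letters, digits, hyphens) are assumptions, not platform policy.

```python
import re

# Path shapes copied from the templates above; the dictionary keys are illustrative.
PATH_TEMPLATES = {
    "workload": "platform/workloads/{namespace}/{service_account}/{secret_name}",
    "object_storage": "platform/object-storage/{consumer}",
    "database": "platform/databases/{consumer}",
    "operator": "platform/operators/{purpose}",
}

# Assumed segment rule: lowercase letters, digits, and inner hyphens only.
_SEGMENT = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def reference_path(kind: str, **segments: str) -> str:
    """Build a non-secret OpenBao path name for use in docs and evidence."""
    for name, value in segments.items():
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"invalid path segment {name}={value!r}")
    return PATH_TEMPLATES[kind].format(**segments)


print(reference_path("database", consumer="gitea"))  # platform/databases/gitea
```

The helper only produces a path name to cite. It never reads from or writes to OpenBao.
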
Do not store in forge docs, State Hub notes, screenshots, logs, or workplans:
- decrypted SOPS values;
- OpenBao tokens, root tokens, unseal shares, or recovery codes;
- database passwords or connection strings with passwords;
- package tokens or tokenized package index URLs;
- runner registration tokens;
- object-storage access keys or secret keys;
- kubeconfigs or bearer tokens.
Allowed references:
- Kubernetes namespace and Secret names;
- SOPS file paths;
- OpenBao path names and policy names;
- credential purpose and scope;
- non-secret command names;
- redacted command examples;
- timestamps, backup ids, encrypted snapshot locations, and evidence file names
that do not reveal secret material.
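
A redaction pass over command text before it is recorded might look like the sketch below. The patterns are heuristics chosen for illustration and will miss secrets that do not match these shapes, so they supplement operator review rather than replace it. The variable name `RUNNER_TOKEN`, the `make register` target, and the host `forge.example` are hypothetical.

```python
import re

# Heuristic patterns only; they do not guarantee that all secrets are caught.
_USERINFO = re.compile(r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*://)[^/\s@]+@")
_ASSIGNMENT = re.compile(
    r"(?P<key>\b\w*(?:token|password|secret|key)\w*)(?P<sep>\s*[=:]\s*)\S+",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace credential-looking substrings with a fixed placeholder."""
    # Strip user:password or token userinfo from URLs and connection strings.
    text = _USERINFO.sub(r"\g<scheme>REDACTED@", text)
    # Blank out values assigned to names that look like credentials.
    return _ASSIGNMENT.sub(r"\g<key>\g<sep>REDACTED", text)


print(redact("RUNNER_TOKEN=abcdef make register"))
# RUNNER_TOKEN=REDACTED make register
print(redact("pip install --index-url https://user:abc123@forge.example/simple pkg"))
# pip install --index-url https://REDACTED@forge.example/simple pkg
```
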
## S5 Artifact Verification Without Registry Credentials
S5 application runbooks can trust forge artifacts only through evidence, not by
owning forge credentials.
For a consuming app release, S5 may cite:
- source repo and commit SHA;
- package name and version;
- container image repository, tag, and digest when available;
- forge publish job id or evidence reference;
- package/blob restore drill reference when the artifact is production-critical;
- namespace-local pull Secret name if private registry access is required;
- app deployment dry-run and smoke-test result.
S5 should not store:
- package publish credentials;
- registry write tokens;
- package index URLs containing credentials;
- forge backup snapshots;
- OpenBao tokens or platform-root paths;
- package blob cleanup procedures as app-owned operations.
If an S5 release depends on a private package or image, the app runbook should
name the consuming Kubernetes Secret or OpenBao-delivered workload path and cite
forge/platform evidence that the artifact can be restored. The app repo should
not copy the credential or the forge backup recipe.
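
One way an app runbook check could enforce the "no package index URLs containing credentials" rule is to inspect citation values for URL userinfo. This is a minimal sketch using Python's standard `urllib.parse`; the function names, citation keys, and all example values are hypothetical.

```python
from urllib.parse import urlsplit


def url_embeds_credentials(value: str) -> bool:
    """Return True if the value is a URL carrying userinfo (user, password, or token)."""
    parts = urlsplit(value)
    return parts.username is not None or parts.password is not None


def credential_bearing_fields(citation: dict[str, str]) -> list[str]:
    """Return the citation fields whose values are URLs with embedded credentials."""
    return [name for name, value in citation.items() if url_embeds_credentials(value)]


# Placeholder citation; none of these values refer to a real artifact.
citation = {
    "source_repo": "example-org/example-app",
    "commit_sha": "0123abcd",
    "package": "example-package==1.2.3",
    "image": "forge.example/example-org/example-app@sha256:0123abcd",
    "index_url": "https://ci-user:example-token@forge.example/simple",
}
print(credential_bearing_fields(citation))  # ['index_url']
```

An image reference pinned by digest contains `@` but has no URL scheme, so it is not flagged.
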
## Production-Trust Gates
Before treating forge packages, images, or source state as production-critical,
the relevant asset must have:
- backup mechanism identified;
- restore drill completed in an isolated environment;
- consumer verification command recorded;
- secret custody path documented without live values;
- rollback or disable path documented;
- storage growth inspection procedure documented;
- owner assigned for the next restore drill.
If any of these gates is missing, consumers may still use forge artifacts for
smoke, development, or migration work, but production promotion should record a
follow-up against the owning layer before relying on the artifact.
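
The gate list can be treated as a checklist per asset. The sketch below assumes a simple boolean status map; the gate keys are hypothetical names for the gates above, and the example status is illustrative rather than a statement about any real asset.

```python
# Hypothetical keys, one per production-trust gate, in contract order.
PRODUCTION_TRUST_GATES = (
    "backup_mechanism_identified",
    "restore_drill_completed",
    "consumer_verification_recorded",
    "secret_custody_documented",
    "rollback_path_documented",
    "storage_growth_procedure_documented",
    "next_drill_owner_assigned",
)


def missing_gates(asset_status: dict[str, bool]) -> list[str]:
    """Return gates that are absent or false for an asset, in contract order."""
    return [gate for gate in PRODUCTION_TRUST_GATES if not asset_status.get(gate, False)]


def is_production_trustworthy(asset_status: dict[str, bool]) -> bool:
    """An asset passes only when every gate is recorded as met."""
    return not missing_gates(asset_status)


# Illustrative status for an unnamed asset.
status = {
    "backup_mechanism_identified": True,
    "secret_custody_documented": True,
    "rollback_path_documented": True,
}
print(missing_gates(status))
print(is_production_trustworthy(status))  # False
```

A non-empty result does not block smoke, development, or migration use; it lists what the follow-up against the owning layer should cover.
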
## Follow-Ups
- WP-0006-T08 should turn backup, restore, storage growth, and runner status
evidence into inspectable operating signals.
- WP-0006-T09 should model forge backup/restore and secret-delivery edges in
Railiance Fabric.
- `RAILIANCE-WP-0005-T04` should use this contract when documenting S5 app data
restore readiness and app runbook evidence requirements.


@@ -80,4 +80,4 @@ The PVC is `default/gitea-shared-storage`, 10 GiB, `local-path`, `RWO`. The live
cluster showed no Kubernetes `CronJob` backup resources across namespaces on
2026-05-19. This is acceptable for the current smoke-test images, but heavy tag
growth should wait for the forge/platform backup and retention follow-up in
`docs/backup-restore-secret-handoff.md`.


@@ -70,6 +70,8 @@ leaving live deploy and secret custody changes behind separate review gates.
drill for the relevant storage path.
- S5 app releases may consume forge artifacts, but they should cite forge
evidence rather than owning package blob backup procedures themselves.
- The detailed backup, restore, and secret custody handoff contract lives in
`docs/backup-restore-secret-handoff.md`.
## Secret Custody
@@ -79,6 +81,8 @@ leaving live deploy and secret custody changes behind separate review gates.
tokens, tokenized package index URLs, or generated credential material.
- Deploy-capable files that reference encrypted values move only after review of
the SOPS/OpenBao handoff and compatibility pointers.
- Allowed and forbidden secret references are defined in
`docs/backup-restore-secret-handoff.md`.
## Observability And Evidence