diff --git a/README.md b/README.md
index 8575158..960e51c 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Key contracts:
 
 - `docs/initial-operating-contracts.md`
 - `docs/ci-runner-actions-gitops-ownership.md`
+- `docs/backup-restore-secret-handoff.md`
 - `docs/gitea-container-registry.md`
 - `docs/gitea-package-registry.md`
 
diff --git a/SCOPE.md b/SCOPE.md
index 73ac072..bfff64c 100644
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -34,6 +34,8 @@ Deploy-capable Gitea Helm/SOPS/manifests also live here now; `railiance-apps`
 keeps only transitional compatibility wrappers for old operator entry points.
 The runner, Actions, and GitOps ownership contract lives in
 `docs/ci-runner-actions-gitops-ownership.md`.
+The backup, restore, and secret custody handoff contract lives in
+`docs/backup-restore-secret-handoff.md`.
 
 ---
 
@@ -178,7 +180,9 @@ Known starting point:
    `docs/gitea-package-registry.md`.
 6. For runner, Actions, and GitOps ownership, read
    `docs/ci-runner-actions-gitops-ownership.md`.
-7. For migration context, read
+7. For backup, restore, and secret custody handoffs, read
+   `docs/backup-restore-secret-handoff.md`.
+8. For migration context, read
    `/home/worsch/railiance-apps/workplans/RAILIANCE-WP-0006-railiance-forge-extraction.md`.
 
 ---
diff --git a/docs/backup-restore-secret-handoff.md b/docs/backup-restore-secret-handoff.md
new file mode 100644
index 0000000..2573da8
--- /dev/null
+++ b/docs/backup-restore-secret-handoff.md
@@ -0,0 +1,210 @@
+# Backup, Restore, And Secret Handoff
+
+Last reviewed: 2026-06-05
+
+Status: contract v1. This document defines ownership, evidence gates, and
+allowed references. It does not authorize a live backup job, restore drill,
+secret rotation, OpenBao policy change, or credential migration.
+
+## Purpose
+
+Forge data is operationally important, but the mechanisms that make data
+durable belong mostly below the forge layer. This contract states what
+`railiance-forge` owns, what `railiance-platform` implements, and what evidence
+S5 application releases can trust without taking custody of forge secrets.
+
+## Boundary Summary
+
+- `railiance-forge` owns the inventory of forge data, artifact restore
+  requirements, retention posture, operator runbooks, and non-secret evidence
+  that downstream consumers cite.
+- `railiance-platform` owns shared database, object-storage, backup/restore,
+  OpenBao, policy, audit, and runtime secret-delivery mechanisms.
+- `railiance-cluster` owns cluster-level recovery primitives such as etcd,
+  kubeconfig, node/runtime recovery, and cluster add-ons.
+- `railiance-apps` consumes published artifacts and restore evidence in app
+  runbooks; it does not own package blobs, registry credentials, runner tokens,
+  or forge database backups.
+- Source repos own source code, build definitions, package metadata, and image
+  build definitions.
+
+## Protected Asset Inventory
+
+| Asset | Current anchor | Forge responsibility | Platform/lower-layer responsibility | Trust gate |
+| --- | --- | --- | --- | --- |
+| Gitea application database | CNPG cluster `databases/gitea-db`, checked by `make gitea-status` | State what must be restorable and what forge checks prove after restore | CNPG backup/restore implementation and database recovery mechanisms | Restored database must support login, repo list, package metadata, and registry metadata checks |
+| Gitea shared storage | PVC `default/gitea-shared-storage`, mounted at `/data`; package blobs under `/data/packages` | Track package/blob growth, retention posture, and restore requirements | Durable volume backup, object-storage export, or future storage replication | Restored storage must support Git clone, package download/install, and container pull checks |
+| Source repositories and forge app state | Gitea database plus `/data` storage | Define restore drill scope and consumer evidence | Database/PVC/object-storage restore tooling | Non-production restore drill proves a known repo can be cloned after restore |
+| Container and Python package registry data | Package blobs under `/data/packages`; metadata in Gitea database | Define retention, cleanup, package evidence, and consumer verification gates | Durable backup of blobs and metadata | Known image/package can be pulled or installed after restore |
+| Runner registration and labels | Forge-owned runner substrate | Inventory labels, runner purpose, and replacement expectations | Secret delivery for runner tokens where OpenBao or platform policies apply | Replacement runner can run a sample job with the same semantic labels |
+| SOPS-encrypted Gitea values | `helm/gitea-values.sops.yaml` | Keep encrypted deploy input and sentinel check | SOPS/age bootstrap custody remains outside runtime secret delivery | `make check-sops` proves authorized decryption without storing plaintext |
+| Runtime secrets | Kubernetes Secrets, OpenBao paths, operator custody paths | Reference names and required purposes only | OpenBao paths, policy, audit, break-glass, and workload secret delivery | Platform OpenBao restore/audit evidence exists before production-trust use |
+| Artifact evidence | Forge docs, future artifact-store package, State Hub notes | Define required evidence fields and consumer references | Object-storage backend and credential delivery where evidence packages become durable artifacts | Evidence is retained without embedding secret material |
+
+Known current caveat: the Gitea package data is on a 10 GiB `local-path` PVC.
+On 2026-05-19 `/data/packages` was about 798.5 MiB, and no Kubernetes `CronJob`
+backup resources were observed. That posture is acceptable for smoke and
+development artifacts, but production-critical package reliance needs recorded
+backup and restore evidence first.
+
+## Backup Ownership
+
+Forge owns the question: "What forge data must be recoverable, and what does a
+successful recovery prove?"
+
+Platform owns the mechanism for:
+
+- CNPG database backup and restore;
+- S3-compatible/object-storage backup targets;
+- OpenBao runtime secret custody, audit, backup, and restore;
+- workload secret delivery through External Secrets, CSI, or another approved
+  platform mechanism;
+- future object-storage credential vending and policy shape.
+
+Cluster owns the mechanism for:
+
+- etcd and kubeconfig backup;
+- Kubernetes runtime recovery;
+- cluster add-ons needed before platform services can recover.
+
+Until platform backup coverage is explicitly available for a forge asset, forge
+docs must treat that asset as not production-trustworthy. Operators may still
+use it for smoke, development, and migration evidence if the risk is recorded.
+
+## Restore Drill Requirements
+
+A forge restore drill should be non-production first and should record only
+non-secret evidence.
+
+Minimum drills:
+
+1. Source forge restore:
+   - restore the Gitea database and shared storage into an isolated namespace or
+     host;
+   - verify Gitea starts;
+   - verify a known repository can be listed and cloned;
+   - verify a known user/org/repo permission path still exists.
+2. Package/blob restore:
+   - restore package metadata and package blobs together;
+   - verify a known Python package version can be installed;
+   - verify a known container image tag or digest can be pulled;
+   - verify registry authentication behavior without recording the token.
+3. Runner substrate restore:
+   - replace a runner without reusing old registration tokens;
+   - verify semantic labels still match the published label contract;
+   - run a non-production sample workflow;
+   - record runner identity and label evidence, not runner tokens.
+4. Secret delivery restore:
+   - cite platform OpenBao restore evidence before relying on OpenBao-delivered
+     forge credentials;
+   - verify a non-production secret reaches the intended workload path;
+   - verify no secret value appears in Git, State Hub notes, logs, screenshots,
+     or drill artifacts.
+
+Successful evidence should include:
+
+- date and operator;
+- source backup reference or encrypted snapshot reference;
+- restored environment name;
+- commands run, with secret values redacted before recording;
+- post-restore checks and results;
+- explicit `no_secret_material_recorded` assertion;
+- rollback or cleanup note for the restored environment.
+
+## Secret Custody Boundaries
+
+SOPS/age remains the Git-at-rest bootstrap mechanism for encrypted deploy
+inputs such as `helm/gitea-values.sops.yaml`. This repo may keep encrypted SOPS
+files and may provide `make check-sops` as a sentinel, but it must not commit or
+log decrypted values.
+
+OpenBao is the platform runtime secret service. The platform docs define paths
+such as:
+
+```text
+platform/workloads///
+platform/object-storage/
+platform/databases/
+platform/operators/
+```
+
+Forge may request or reference OpenBao paths for forge workloads, package
+tokens, runner registration, object-storage credentials, and database access.
+Forge does not define OpenBao mounts, audit devices, root/unseal custody,
+break-glass policy, or global secret-delivery mechanisms.
+
+Do not store in forge docs, State Hub notes, screenshots, logs, or workplans:
+
+- decrypted SOPS values;
+- OpenBao tokens, root tokens, unseal shares, or recovery codes;
+- database passwords or connection strings with passwords;
+- package tokens or tokenized package index URLs;
+- runner registration tokens;
+- object-storage access keys or secret keys;
+- kubeconfigs or bearer tokens.
+
+Allowed references:
+
+- Kubernetes namespace and Secret names;
+- SOPS file paths;
+- OpenBao path names and policy names;
+- credential purpose and scope;
+- non-secret command names;
+- redacted command examples;
+- timestamps, backup ids, encrypted snapshot locations, and evidence file names
+  that do not reveal secret material.
+
+## S5 Artifact Verification Without Registry Credentials
+
+S5 application runbooks can trust forge artifacts only through evidence, not by
+owning forge credentials.
+
+For a consuming app release, S5 may cite:
+
+- source repo and commit SHA;
+- package name and version;
+- container image repository, tag, and digest when available;
+- forge publish job id or evidence reference;
+- package/blob restore drill reference when the artifact is production-critical;
+- namespace-local pull Secret name if private registry access is required;
+- app deployment dry-run and smoke-test result.
+
+S5 should not store:
+
+- package publish credentials;
+- registry write tokens;
+- package index URLs containing credentials;
+- forge backup snapshots;
+- OpenBao tokens or platform-root paths;
+- package blob cleanup procedures as app-owned operations.
+
+If an S5 release depends on a private package or image, the app runbook should
+name the consuming Kubernetes Secret or OpenBao-delivered workload path and cite
+forge/platform evidence that the artifact can be restored. The app repo should
+not copy the credential or the forge backup recipe.
+
+## Production-Trust Gates
+
+Before treating forge packages, images, or source state as production-critical,
+the relevant asset must have:
+
+- backup mechanism identified;
+- restore drill completed in an isolated environment;
+- consumer verification command recorded;
+- secret custody path documented without live values;
+- rollback or disable path documented;
+- storage growth inspection procedure;
+- owner for the next restore drill.
+
+If one of these gates is missing, consumers may still use forge artifacts for
+smoke, development, or migration work, but production promotion should record a
+follow-up against the owning layer before relying on the artifact.
+
+## Follow-Ups
+
+- WP-0006-T08 should turn backup, restore, storage growth, and runner status
+  evidence into inspectable operating signals.
+- WP-0006-T09 should model forge backup/restore and secret-delivery edges in
+  Railiance Fabric.
+- `RAILIANCE-WP-0005-T04` should use this contract when documenting S5 app data
+  restore readiness and app runbook evidence requirements.
diff --git a/docs/gitea-container-registry.md b/docs/gitea-container-registry.md
index ebc41ae..604f1c8 100644
--- a/docs/gitea-container-registry.md
+++ b/docs/gitea-container-registry.md
@@ -80,4 +80,4 @@ The PVC is `default/gitea-shared-storage`, 10 GiB, `local-path`, `RWO`. The live
 cluster showed no Kubernetes `CronJob` backup resources across namespaces on
 2026-05-19. This is acceptable for the current smoke-test images, but heavy tag
 growth should wait for the forge/platform backup and retention follow-up in
-`docs/initial-operating-contracts.md`.
+`docs/backup-restore-secret-handoff.md`.
diff --git a/docs/initial-operating-contracts.md b/docs/initial-operating-contracts.md
index a8954f9..8202ccd 100644
--- a/docs/initial-operating-contracts.md
+++ b/docs/initial-operating-contracts.md
@@ -70,6 +70,8 @@ leaving live deploy and secret custody changes behind separate review gates.
   drill for the relevant storage path.
 - S5 app releases may consume forge artifacts, but they should cite forge
   evidence rather than owning package blob backup procedures themselves.
+- The detailed backup, restore, and secret custody handoff contract lives in
+  `docs/backup-restore-secret-handoff.md`.
 
 ## Secret Custody
 
@@ -79,6 +81,8 @@ leaving live deploy and secret custody changes behind separate review gates.
   tokens, tokenized package index URLs, or generated credential material.
 - Deploy-capable files that reference encrypted values move only after review of
   the SOPS/OpenBao handoff and compatibility pointers.
+- Allowed and forbidden secret references are defined in
+  `docs/backup-restore-secret-handoff.md`.
 
 ## Observability And Evidence
 