generated from coulomb/repo-seed
Define backup restore secret handoff
210
docs/backup-restore-secret-handoff.md
Normal file
@@ -0,0 +1,210 @@
# Backup, Restore, And Secret Handoff

Last reviewed: 2026-06-05

Status: contract v1. This document defines ownership, evidence gates, and
allowed references. It does not authorize a live backup job, restore drill,
secret rotation, OpenBao policy change, or credential migration.

## Purpose

Forge data is operationally important, but the mechanisms that make data
durable belong mostly below the forge layer. This contract states what
`railiance-forge` owns, what `railiance-platform` implements, and what evidence
S5 application releases can trust without taking custody of forge secrets.

## Boundary Summary

- `railiance-forge` owns the inventory of forge data, artifact restore
  requirements, retention posture, operator runbooks, and non-secret evidence
  that downstream consumers cite.
- `railiance-platform` owns shared database, object-storage, backup/restore,
  OpenBao, policy, audit, and runtime secret-delivery mechanisms.
- `railiance-cluster` owns cluster-level recovery primitives such as etcd,
  kubeconfig, node/runtime recovery, and cluster add-ons.
- `railiance-apps` consumes published artifacts and restore evidence in app
  runbooks; it does not own package blobs, registry credentials, runner tokens,
  or forge database backups.
- Source repos own source code, build definitions, package metadata, and image
  build definitions.

## Protected Asset Inventory

| Asset | Current anchor | Forge responsibility | Platform/lower-layer responsibility | Trust gate |
| --- | --- | --- | --- | --- |
| Gitea application database | CNPG cluster `databases/gitea-db`, checked by `make gitea-status` | State what must be restorable and what forge checks prove after restore | CNPG backup/restore implementation and database recovery mechanisms | Restored database must support login, repo list, package metadata, and registry metadata checks |
| Gitea shared storage | PVC `default/gitea-shared-storage`, mounted at `/data`; package blobs under `/data/packages` | Track package/blob growth, retention posture, and restore requirements | Durable volume backup, object-storage export, or future storage replication | Restored storage must support Git clone, package download/install, and container pull checks |
| Source repositories and forge app state | Gitea database plus `/data` storage | Define restore drill scope and consumer evidence | Database/PVC/object-storage restore tooling | Non-production restore drill proves a known repo can be cloned after restore |
| Container and Python package registry data | Package blobs under `/data/packages`; metadata in Gitea database | Define retention, cleanup, package evidence, and consumer verification gates | Durable backup of blobs and metadata | Known image/package can be pulled or installed after restore |
| Runner registration and labels | Forge-owned runner substrate | Inventory labels, runner purpose, and replacement expectations | Secret delivery for runner tokens where OpenBao or platform policies apply | Replacement runner can run a sample job with the same semantic labels |
| SOPS-encrypted Gitea values | `helm/gitea-values.sops.yaml` | Keep encrypted deploy input and sentinel check | SOPS/age bootstrap custody remains outside runtime secret delivery | `make check-sops` proves authorized decryption without storing plaintext |
| Runtime secrets | Kubernetes Secrets, OpenBao paths, operator custody paths | Reference names and required purposes only | OpenBao paths, policy, audit, break-glass, and workload secret delivery | Platform OpenBao restore/audit evidence exists before production-trust use |
| Artifact evidence | Forge docs, future artifact-store package, State Hub notes | Define required evidence fields and consumer references | Object-storage backend and credential delivery where evidence packages become durable artifacts | Evidence is retained without embedding secret material |

Known current caveat: the Gitea package data is on a 10 GiB `local-path` PVC.
On 2026-05-19 `/data/packages` was about 798.5 MiB, and no Kubernetes `CronJob`
backup resources were observed. That posture is acceptable for smoke and
development artifacts, but production-critical package reliance needs recorded
backup and restore evidence first.
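
The figures in the caveat put `/data/packages` at just under 8% of the PVC. A minimal sketch of that arithmetic, assuming binary units (1 GiB = 1024 MiB). The helper name `pvc_utilization` and the 80% warning threshold are illustrative assumptions, not forge policy, and the result covers only `/data/packages`, so total PVC usage is higher:

```python
MIB_PER_GIB = 1024


def pvc_utilization(used_mib: float, capacity_gib: float) -> float:
    """Return the used fraction of a volume, given usage in MiB and capacity in GiB."""
    if capacity_gib <= 0:
        raise ValueError("capacity_gib must be positive")
    return used_mib / (capacity_gib * MIB_PER_GIB)


# Figures from the caveat: about 798.5 MiB on a 10 GiB PVC (observed 2026-05-19).
observed = pvc_utilization(used_mib=798.5, capacity_gib=10)
print(f"/data/packages accounts for {observed:.1%} of the PVC")  # 7.8%

# Hypothetical warning threshold, for illustration only.
WARN_FRACTION = 0.80
print("needs attention" if observed >= WARN_FRACTION else "within headroom")
```

A recurring check of this kind is one way to make the "storage growth inspection" expectation concrete without touching any credential.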

## Backup Ownership

Forge owns the question: "What forge data must be recoverable, and what does a
successful recovery prove?"

Platform owns the mechanism for:

- CNPG database backup and restore;
- S3-compatible/object-storage backup targets;
- OpenBao runtime secret custody, audit, backup, and restore;
- workload secret delivery through External Secrets, CSI, or another approved
  platform mechanism;
- future object-storage credential vending and policy shape.

Cluster owns the mechanism for:

- etcd and kubeconfig backup;
- Kubernetes runtime recovery;
- cluster add-ons needed before platform services can recover.

Until platform backup coverage is explicitly available for a forge asset, forge
docs must treat that asset as not production-trustworthy. Operators may still
use it for smoke, development, and migration evidence if the risk is recorded.

## Restore Drill Requirements

A forge restore drill should be non-production first and should record only
non-secret evidence.

Minimum drills:

1. Source forge restore:
   - restore the Gitea database and shared storage into an isolated namespace or
     host;
   - verify Gitea starts;
   - verify a known repository can be listed and cloned;
   - verify a known user/org/repo permission path still exists.
2. Package/blob restore:
   - restore package metadata and package blobs together;
   - verify a known Python package version can be installed;
   - verify a known container image tag or digest can be pulled;
   - verify registry authentication behavior without recording the token.
3. Runner substrate restore:
   - replace a runner without reusing old registration tokens;
   - verify semantic labels still match the published label contract;
   - run a non-production sample workflow;
   - record runner identity and label evidence, not runner tokens.
4. Secret delivery restore:
   - cite platform OpenBao restore evidence before relying on OpenBao-delivered
     forge credentials;
   - verify a non-production secret reaches the intended workload path;
   - verify no secret value appears in Git, State Hub notes, logs, screenshots,
     or drill artifacts.

Successful evidence should include:

- date and operator;
- source backup reference or encrypted snapshot reference;
- restored environment name;
- commands run, with secret values redacted before recording;
- post-restore checks and results;
- explicit `no_secret_material_recorded` assertion;
- rollback or cleanup note for the restored environment.
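
The evidence fields above could be captured as a small structured record. The sketch below is one possible shape, not a schema this contract defines: apart from `no_secret_material_recorded`, every class and field name is a hypothetical choice, and the sample values are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RestoreDrillEvidence:
    """Non-secret record of one restore drill (field names are hypothetical)."""

    drill_date: date
    operator: str
    backup_reference: str          # backup id or encrypted snapshot reference
    restored_environment: str      # isolated namespace or host name
    commands_run: tuple[str, ...]  # redacted before they are recorded here
    checks: dict[str, bool]        # post-restore check name -> passed?
    no_secret_material_recorded: bool
    cleanup_note: str

    def problems(self) -> list[str]:
        """Return reasons this record is not acceptable evidence (empty if fine)."""
        found = []
        for name in ("operator", "backup_reference", "restored_environment", "cleanup_note"):
            if not getattr(self, name).strip():
                found.append(f"missing {name}")
        if not self.commands_run:
            found.append("no commands recorded")
        if not self.checks:
            found.append("no post-restore checks recorded")
        found.extend(f"check failed: {name}" for name, ok in self.checks.items() if not ok)
        if not self.no_secret_material_recorded:
            found.append("no_secret_material_recorded is not asserted")
        return found


# Placeholder values only; this does not describe a real drill.
evidence = RestoreDrillEvidence(
    drill_date=date(2026, 1, 1),
    operator="example-operator",
    backup_reference="backup-id-placeholder",
    restored_environment="gitea-restore-drill",
    commands_run=("make gitea-status",),
    checks={"repo_clone": True, "package_install": False},
    no_secret_material_recorded=True,
    cleanup_note="restored namespace deleted after the drill",
)
print(evidence.problems())  # ['check failed: package_install']
```

Keeping the record free of any field that could hold a credential makes the `no_secret_material_recorded` assertion easier to review.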

## Secret Custody Boundaries

SOPS/age remains the Git-at-rest bootstrap mechanism for encrypted deploy
inputs such as `helm/gitea-values.sops.yaml`. This repo may keep encrypted SOPS
files and may provide `make check-sops` as a sentinel, but it must not commit or
log decrypted values.

OpenBao is the platform runtime secret service. The platform docs define paths
such as:

```text
platform/workloads/<namespace>/<service-account>/<secret-name>
platform/object-storage/<consumer>
platform/databases/<consumer>
platform/operators/<purpose>
```

Forge may request or reference OpenBao paths for forge workloads, package
tokens, runner registration, object-storage credentials, and database access.
Forge does not define OpenBao mounts, audit devices, root/unseal custody,
break-glass policy, or global secret-delivery mechanisms.
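
Because forge only references paths, a forge-side check can be limited to confirming that a referenced path has one of the shapes listed above. A minimal sketch follows. The segment grammar (lowercase alphanumerics, dots, dashes, underscores) and the name `classify_path` are assumptions, and the example paths are hypothetical; the platform docs remain the authority on valid paths.

```python
import re
from typing import Optional

# One path segment. This grammar is an assumption, not a platform rule.
_SEGMENT = r"[a-z0-9][a-z0-9._-]*"

# Shapes copied from the documented path list; only the shape is checked.
_PATH_SHAPES = {
    "workload": re.compile(rf"platform/workloads/{_SEGMENT}/{_SEGMENT}/{_SEGMENT}"),
    "object-storage": re.compile(rf"platform/object-storage/{_SEGMENT}"),
    "database": re.compile(rf"platform/databases/{_SEGMENT}"),
    "operator": re.compile(rf"platform/operators/{_SEGMENT}"),
}


def classify_path(path: str) -> Optional[str]:
    """Return which documented shape a path matches, or None if it matches none."""
    for kind, pattern in _PATH_SHAPES.items():
        if pattern.fullmatch(path):
            return kind
    return None


# Hypothetical path names; they reference secrets by name and hold no values.
print(classify_path("platform/workloads/forge/gitea-runner/registration-token"))
print(classify_path("platform/databases/gitea"))
print(classify_path("platform/workloads/forge/gitea-runner"))  # too few segments
```

The check never contacts OpenBao and never sees a secret value, which keeps it within the "reference names only" boundary.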

Do not store in forge docs, State Hub notes, screenshots, logs, or workplans:

- decrypted SOPS values;
- OpenBao tokens, root tokens, unseal shares, or recovery codes;
- database passwords or connection strings with passwords;
- package tokens or tokenized package index URLs;
- runner registration tokens;
- object-storage access keys or secret keys;
- kubeconfigs or bearer tokens.

Allowed references:

- Kubernetes namespace and Secret names;
- SOPS file paths;
- OpenBao path names and policy names;
- credential purpose and scope;
- non-secret command names;
- redacted command examples;
- timestamps, backup ids, encrypted snapshot locations, and evidence file names
  that do not reveal secret material.
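
Some of the forbidden items above have recognizable shapes, such as URLs with embedded credentials and `password=...` style assignments. The sketch below is a heuristic scan for text that is about to be recorded as evidence. The patterns, the placeholder set, and the name `find_secret_like` are all assumptions; a clean result does not prove the text is free of secrets, and the scan is no substitute for review.

```python
import re
from urllib.parse import urlsplit

# Heuristic patterns only; they cover a few obvious shapes and nothing more.
_URL = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s'\"]+")
_ASSIGNMENT = re.compile(
    r"(?i)(password|passwd|token|secret[_-]?key|access[_-]?key)\s*[=:]\s*(\S+)"
)
_PLACEHOLDERS = {"REDACTED", "<redacted>", "***"}


def find_secret_like(text: str) -> list[str]:
    """Describe secret-like fragments in text without echoing the values."""
    findings = []
    for match in _URL.finditer(text):
        try:
            parts = urlsplit(match.group(0))
        except ValueError:
            continue  # not parseable as a URL; skip rather than guess
        # Flag any password, or a bare username on http(s) (often a token).
        if parts.password or (parts.scheme in ("http", "https") and parts.username):
            findings.append(f"URL with embedded credentials (host: {parts.hostname})")
    for match in _ASSIGNMENT.finditer(text):
        if match.group(2) not in _PLACEHOLDERS:
            findings.append(f"{match.group(1).lower()} assignment with a value")
    return findings


# Hypothetical host and credentials, used only to exercise the scan.
print(find_secret_like("pip install --index-url https://ci:abc123@forge.example/simple pkg"))
print(find_secret_like("export RUNNER_TOKEN=REDACTED"))  # []
```

Reporting only the host and the key name, never the matched value, keeps the scan output itself safe to paste into notes.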

## S5 Artifact Verification Without Registry Credentials

S5 application runbooks can trust forge artifacts only through evidence, not by
owning forge credentials.

For a consuming app release, S5 may cite:

- source repo and commit SHA;
- package name and version;
- container image repository, tag, and digest when available;
- forge publish job id or evidence reference;
- package/blob restore drill reference when the artifact is production-critical;
- namespace-local pull Secret name if private registry access is required;
- app deployment dry-run and smoke-test result.

S5 should not store:

- package publish credentials;
- registry write tokens;
- package index URLs containing credentials;
- forge backup snapshots;
- OpenBao tokens or platform-root paths;
- package blob cleanup procedures as app-owned operations.

If an S5 release depends on a private package or image, the app runbook should
name the consuming Kubernetes Secret or OpenBao-delivered workload path and cite
forge/platform evidence that the artifact can be restored. The app repo should
not copy the credential or the forge backup recipe.
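
A citation of this kind can be represented as a record that holds references and names only. The sketch below is an illustration under stated assumptions: the class and field names are hypothetical, commit ids are assumed to be full 40-character SHA-1 values, and digests are assumed to be `sha256:` followed by 64 hex characters.

```python
import re
from dataclasses import dataclass
from typing import Optional

_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")
_GIT_SHA = re.compile(r"[0-9a-f]{40}")  # assumes full SHA-1 commit ids


@dataclass(frozen=True)
class ArtifactCitation:
    """What an app runbook cites for one image (names are hypothetical)."""

    source_repo: str
    commit_sha: str
    image_repository: str
    image_tag: str
    image_digest: Optional[str] = None             # cite when available
    restore_drill_reference: Optional[str] = None  # needed if production-critical
    pull_secret_name: Optional[str] = None         # a Secret name, never its contents

    def image_reference(self) -> str:
        """Prefer the immutable digest reference; fall back to the tag."""
        if self.image_digest:
            return f"{self.image_repository}@{self.image_digest}"
        return f"{self.image_repository}:{self.image_tag}"

    def problems(self, production_critical: bool) -> list[str]:
        """Return reasons the citation is incomplete (empty if acceptable)."""
        found = []
        if not _GIT_SHA.fullmatch(self.commit_sha):
            found.append("commit_sha is not a full 40-character SHA")
        if self.image_digest and not _SHA256_DIGEST.fullmatch(self.image_digest):
            found.append("image_digest is not a sha256 digest")
        if production_critical and not self.restore_drill_reference:
            found.append("production-critical artifact lacks a restore drill reference")
        return found


# Placeholder identifiers only.
citation = ArtifactCitation(
    source_repo="example-org/example-app",
    commit_sha="a" * 40,
    image_repository="forge.example/example-org/example-app",
    image_tag="1.2.3",
    image_digest="sha256:" + "0" * 64,
)
print(citation.image_reference())
print(citation.problems(production_critical=True))
```

The record has no field for a token or password, so the app repo cannot accidentally become a second place where the credential lives.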

## Production-Trust Gates

Before treating forge packages, images, or source state as production-critical,
the relevant asset must have:

- backup mechanism identified;
- restore drill completed in an isolated environment;
- consumer verification command recorded;
- secret custody path documented without live values;
- rollback or disable path documented;
- storage growth inspection procedure documented;
- owner named for the next restore drill.

If any of these gates is missing, consumers may still use forge artifacts for
smoke, development, or migration work, but production promotion should record a
follow-up against the owning layer before relying on the artifact.
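
The gate list reduces to a simple rule: all gates satisfied means production-critical use is allowed; anything less limits the asset to smoke, development, or migration work. A minimal sketch of that rule, with hypothetical enum and function names:

```python
from enum import Enum


class Gate(Enum):
    """The production-trust gates listed above (member names are hypothetical)."""

    BACKUP_MECHANISM_IDENTIFIED = "backup mechanism identified"
    RESTORE_DRILL_COMPLETED = "restore drill completed in an isolated environment"
    CONSUMER_VERIFICATION_RECORDED = "consumer verification command recorded"
    SECRET_CUSTODY_DOCUMENTED = "secret custody path documented without live values"
    ROLLBACK_DOCUMENTED = "rollback or disable path documented"
    STORAGE_GROWTH_PROCEDURE = "storage growth inspection procedure"
    NEXT_DRILL_OWNER = "owner for the next restore drill"


def missing_gates(satisfied: set[Gate]) -> list[Gate]:
    """Return unmet gates in the order the contract lists them."""
    return [gate for gate in Gate if gate not in satisfied]


def allowed_use(satisfied: set[Gate]) -> str:
    """Map gate status to the widest use the contract permits."""
    if not missing_gates(satisfied):
        return "production-critical"
    return "smoke, development, or migration"


# Example: everything is in place except an owner for the next drill.
status = set(Gate) - {Gate.NEXT_DRILL_OWNER}
print(allowed_use(status))
print([gate.value for gate in missing_gates(status)])
```

The list returned by `missing_gates` is a natural basis for the follow-up that production promotion should record against the owning layer.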

## Follow-Ups

- WP-0006-T08 should turn backup, restore, storage growth, and runner status
  evidence into inspectable operating signals.
- WP-0006-T09 should model forge backup/restore and secret-delivery edges in
  Railiance Fabric.
- `RAILIANCE-WP-0005-T04` should use this contract when documenting S5 app data
  restore readiness and app runbook evidence requirements.

@@ -80,4 +80,4 @@ The PVC is `default/gitea-shared-storage`, 10 GiB, `local-path`, `RWO`. The live
cluster showed no Kubernetes `CronJob` backup resources across namespaces on
2026-05-19. This is acceptable for the current smoke-test images, but heavy tag
growth should wait for the forge/platform backup and retention follow-up in
-`docs/initial-operating-contracts.md`.
+`docs/backup-restore-secret-handoff.md`.

@@ -70,6 +70,8 @@ leaving live deploy and secret custody changes behind separate review gates.
  drill for the relevant storage path.
- S5 app releases may consume forge artifacts, but they should cite forge
  evidence rather than owning package blob backup procedures themselves.
- The detailed backup, restore, and secret custody handoff contract lives in
  `docs/backup-restore-secret-handoff.md`.

## Secret Custody

@@ -79,6 +81,8 @@ leaving live deploy and secret custody changes behind separate review gates.
  tokens, tokenized package index URLs, or generated credential material.
- Deploy-capable files that reference encrypted values move only after review of
  the SOPS/OpenBao handoff and compatibility pointers.
- Allowed and forbidden secret references are defined in
  `docs/backup-restore-secret-handoff.md`.

## Observability And Evidence