# Backup, Restore, And Secret Handoff

Last reviewed: 2026-06-05

Status: contract v1. This document defines ownership, evidence gates, and
allowed references. It does not authorize a live backup job, restore drill,
secret rotation, OpenBao policy change, or credential migration.

## Purpose

Forge data is operationally important, but the mechanisms that make data
durable belong mostly below the forge layer. This contract states what
`railiance-forge` owns, what `railiance-platform` implements, and what evidence
S5 application releases can trust without taking custody of forge secrets.

## Boundary Summary

- `railiance-forge` owns the inventory of forge data, artifact restore
  requirements, retention posture, operator runbooks, and non-secret evidence
  that downstream consumers cite.
- `railiance-platform` owns shared database, object-storage, backup/restore,
  OpenBao, policy, audit, and runtime secret-delivery mechanisms.
- `railiance-cluster` owns cluster-level recovery primitives such as etcd,
  kubeconfig, node/runtime recovery, and cluster add-ons.
- `railiance-apps` consumes published artifacts and restore evidence in app
  runbooks; it does not own package blobs, registry credentials, runner tokens,
  or forge database backups.
- Source repos own source code, build definitions, package metadata, and image
  build definitions.

## Protected Asset Inventory

| Asset | Current anchor | Forge responsibility | Platform/lower-layer responsibility | Trust gate |
| --- | --- | --- | --- | --- |
| Gitea application database | CNPG cluster `databases/gitea-db`, checked by `make gitea-status` | State what must be restorable and what forge checks prove after restore | CNPG backup/restore implementation and database recovery mechanisms | Restored database must support login, repo list, package metadata, and registry metadata checks |
| Gitea shared storage | PVC `default/gitea-shared-storage`, mounted at `/data`; package blobs under `/data/packages` | Track package/blob growth, retention posture, and restore requirements | Durable volume backup, object-storage export, or future storage replication | Restored storage must support Git clone, package download/install, and container pull checks |
| Source repositories and forge app state | Gitea database plus `/data` storage | Define restore drill scope and consumer evidence | Database/PVC/object-storage restore tooling | Non-production restore drill proves a known repo can be cloned after restore |
| Container and Python package registry data | Package blobs under `/data/packages`; metadata in Gitea database | Define retention, cleanup, package evidence, and consumer verification gates | Durable backup of blobs and metadata | Known image/package can be pulled or installed after restore |
| Runner registration and labels | Forge-owned runner substrate | Inventory labels, runner purpose, and replacement expectations | Secret delivery for runner tokens where OpenBao or platform policies apply | Replacement runner can run a sample job with the same semantic labels |
| SOPS-encrypted Gitea values | `helm/gitea-values.sops.yaml` | Keep encrypted deploy input and sentinel check | SOPS/age bootstrap custody remains outside runtime secret delivery | `make check-sops` proves authorized decryption without storing plaintext |
| Runtime secrets | Kubernetes Secrets, OpenBao paths, operator custody paths | Reference names and required purposes only | OpenBao paths, policy, audit, break-glass, and workload secret delivery | Platform OpenBao restore/audit evidence exists before production-trust use |
| Artifact evidence | Forge docs, future artifact-store package, State Hub notes | Define required evidence fields and consumer references | Object-storage backend and credential delivery where evidence packages become durable artifacts | Evidence is retained without embedding secret material |

Known current caveat: the Gitea package data is on a 10 GiB `local-path` PVC.
On 2026-05-19 `/data/packages` was about 798.5 MiB, and no Kubernetes `CronJob`
backup resources were observed. That posture is acceptable for smoke and
development artifacts, but production-critical package reliance needs recorded
backup and restore evidence first.

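To put the caveat's figures in proportion, the arithmetic can be sketched as below. The capacity and usage values come from the caveat; the 80% review threshold is purely an illustrative assumption, not a documented policy. The measured figure covers only `/data/packages`, so whole-volume usage on the PVC will be higher.

```python
# Figures from the caveat above; WARN_FRACTION is a hypothetical threshold.
PVC_CAPACITY_MIB = 10 * 1024   # 10 GiB local-path PVC
PACKAGES_USED_MIB = 798.5      # /data/packages as observed on 2026-05-19
WARN_FRACTION = 0.80           # assumed review threshold, not a forge policy


def utilization(used_mib: float, capacity_mib: float) -> float:
    """Return the fraction of capacity in use."""
    if capacity_mib <= 0:
        raise ValueError("capacity must be positive")
    return used_mib / capacity_mib


fraction = utilization(PACKAGES_USED_MIB, PVC_CAPACITY_MIB)
print(f"/data/packages uses {fraction:.1%} of the PVC capacity")
if fraction >= WARN_FRACTION:
    print("at or above the assumed review threshold")
else:
    print("below the assumed review threshold")
```

Package blobs alone account for roughly 7.8% of the volume, so the immediate concern in the caveat is the missing backup evidence rather than capacity.
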
## Backup Ownership

Forge owns the question: "What forge data must be recoverable, and what does a
successful recovery prove?"

Platform owns the mechanism for:

- CNPG database backup and restore;
- S3-compatible/object-storage backup targets;
- OpenBao runtime secret custody, audit, backup, and restore;
- workload secret delivery through External Secrets, CSI, or another approved
  platform mechanism;
- future object-storage credential vending and policy shape.

Cluster owns the mechanism for:

- etcd and kubeconfig backup;
- Kubernetes runtime recovery;
- cluster add-ons needed before platform services can recover.

Until platform backup coverage is explicitly available for a forge asset, forge
docs must treat that asset as not production-trustworthy. Operators may still
use it for smoke, development, and migration evidence if the risk is recorded.

## Restore Drill Requirements

A forge restore drill should be non-production first and should record only
non-secret evidence.

Minimum drills:

1. Source forge restore:
   - restore the Gitea database and shared storage into an isolated namespace or
     host;
   - verify Gitea starts;
   - verify a known repository can be listed and cloned;
   - verify a known user/org/repo permission path still exists.
2. Package/blob restore:
   - restore package metadata and package blobs together;
   - verify a known Python package version can be installed;
   - verify a known container image tag or digest can be pulled;
   - verify registry authentication behavior without recording the token.
3. Runner substrate restore:
   - replace a runner without reusing old registration tokens;
   - verify semantic labels still match the published label contract;
   - run a non-production sample workflow;
   - record runner identity and label evidence, not runner tokens.
4. Secret delivery restore:
   - cite platform OpenBao restore evidence before relying on OpenBao-delivered
     forge credentials;
   - verify a non-production secret reaches the intended workload path;
   - verify no secret value appears in Git, State Hub notes, logs, screenshots,
     or drill artifacts.

Successful evidence should include:

- date and operator;
- source backup reference or encrypted snapshot reference;
- restored environment name;
- commands run, with secret values redacted before recording;
- post-restore checks and results;
- explicit `no_secret_material_recorded` assertion;
- rollback or cleanup note for the restored environment.

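One way to make the evidence list above checkable is a small record type. The following is a minimal sketch, not a defined forge schema: the class name, field names other than `no_secret_material_recorded`, and all example values are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class DrillEvidence:
    """Non-secret record of one restore drill (field names are hypothetical)."""

    drill_date: date
    operator: str
    backup_reference: str        # backup id or encrypted snapshot reference
    restored_environment: str
    commands: list[str]          # redacted before they are recorded
    checks: dict[str, bool]      # post-restore check name -> passed
    no_secret_material_recorded: bool
    cleanup_note: str

    def problems(self) -> list[str]:
        """Return reasons this record should not be accepted as evidence."""
        issues = []
        for name in ("operator", "backup_reference",
                     "restored_environment", "cleanup_note"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        if not self.commands:
            issues.append("no commands recorded")
        if not self.checks:
            issues.append("no post-restore checks recorded")
        failed = sorted(name for name, ok in self.checks.items() if not ok)
        if failed:
            issues.append("failed checks: " + ", ".join(failed))
        if not self.no_secret_material_recorded:
            issues.append("no_secret_material_recorded is not asserted")
        return issues


# Placeholder values only; none of these describe a real drill.
example = DrillEvidence(
    drill_date=date(2000, 1, 1),
    operator="example-operator",
    backup_reference="example-backup-id",
    restored_environment="example-restore-namespace",
    commands=["make gitea-status"],
    checks={"gitea_starts": True, "known_repo_clones": True},
    no_secret_material_recorded=True,
    cleanup_note="example namespace deleted after the drill",
)
print(example.problems())
```

An empty list means every field from the evidence list is present and every recorded check passed. The record holds references and results only, which keeps it safe to store alongside forge docs.
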
## Secret Custody Boundaries

SOPS/age remains the Git-at-rest bootstrap mechanism for encrypted deploy
inputs such as `helm/gitea-values.sops.yaml`. This repo may keep encrypted SOPS
files and may provide `make check-sops` as a sentinel, but it must not commit or
log decrypted values.

OpenBao is the platform runtime secret service. The platform docs define paths
such as:

```text
platform/workloads/<namespace>/<service-account>/<secret-name>
platform/object-storage/<consumer>
platform/databases/<consumer>
platform/operators/<purpose>
```

Forge may request or reference OpenBao paths for forge workloads, package
tokens, runner registration, object-storage credentials, and database access.
Forge does not define OpenBao mounts, audit devices, root/unseal custody,
break-glass policy, or global secret-delivery mechanisms.

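When forge docs reference an OpenBao path, it can help to confirm the reference matches one of the four shapes listed above. The sketch below assumes each placeholder is a single path segment without slashes; the platform docs remain authoritative and may allow other shapes. The function name and example paths are hypothetical.

```python
import re

# One pattern per path shape listed above. Assumption: every placeholder is a
# single segment with no "/" in it; the platform docs may define otherwise.
TEMPLATES = {
    "workload": re.compile(
        r"^platform/workloads/(?P<namespace>[^/]+)/"
        r"(?P<service_account>[^/]+)/(?P<secret_name>[^/]+)$"
    ),
    "object-storage": re.compile(r"^platform/object-storage/(?P<consumer>[^/]+)$"),
    "database": re.compile(r"^platform/databases/(?P<consumer>[^/]+)$"),
    "operator": re.compile(r"^platform/operators/(?P<purpose>[^/]+)$"),
}


def classify(path: str):
    """Return (kind, fields) for a recognized path shape, else None."""
    for kind, pattern in TEMPLATES.items():
        match = pattern.match(path)
        if match:
            return kind, match.groupdict()
    return None


# Hypothetical path names used only to exercise the patterns.
print(classify("platform/workloads/forge/gitea/registry-pull"))
print(classify("platform/databases/gitea"))
print(classify("some/other/mount"))
```

The check works on path names only, which are allowed references under this contract. It never needs to read a secret value.
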
Do not store in forge docs, State Hub notes, screenshots, logs, or workplans:

- decrypted SOPS values;
- OpenBao tokens, root tokens, unseal shares, or recovery codes;
- database passwords or connection strings with passwords;
- package tokens or tokenized package index URLs;
- runner registration tokens;
- object-storage access keys or secret keys;
- kubeconfigs or bearer tokens.

Allowed references:

- Kubernetes namespace and Secret names;
- SOPS file paths;
- OpenBao path names and policy names;
- credential purpose and scope;
- non-secret command names;
- redacted command examples;
- timestamps, backup ids, encrypted snapshot locations, and evidence file names
  that do not reveal secret material.

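A pre-commit style scan can catch some of the forbidden items above before a note or drill artifact is saved. This is a rough sketch with hypothetical patterns: it flags only a few obvious shapes (credentialed URLs, private key headers, inline `password=` style assignments) and is no substitute for review. The sample text uses fake placeholder values.

```python
import re

# Hypothetical, deliberately narrow patterns; real secrets take many more forms.
PATTERNS = {
    "credentialed URL": re.compile(
        r"[a-z][a-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@", re.IGNORECASE
    ),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline secret assignment": re.compile(
        r"\b(?:password|passwd|token|secret[_-]?key)\s*[:=]\s*(?!<redacted>)\S+",
        re.IGNORECASE,
    ),
}


def find_secret_like(text: str):
    """Return (line_number, label) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# Fake values for illustration; line 2 is the only one that should be flagged.
sample = "\n".join([
    "make check-sops",
    "pip install --index-url https://example-user:example-value@forge.example.invalid/simple pkg",
    "password=<redacted>",
    "Secret name: forge/gitea-registry-pull",
])
print(find_secret_like(sample))
```

Redacted examples and Secret names pass through, matching the allowed references. A clean result only means none of these few patterns matched.
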
## S5 Artifact Verification Without Registry Credentials

S5 application runbooks can trust forge artifacts only through evidence, not by
owning forge credentials.

For a consuming app release, S5 may cite:

- source repo and commit SHA;
- package name and version;
- container image repository, tag, and digest when available;
- forge publish job id or evidence reference;
- package/blob restore drill reference when the artifact is production-critical;
- namespace-local pull Secret name if private registry access is required;
- app deployment dry-run and smoke-test result.

S5 should not store:

- package publish credentials;
- registry write tokens;
- package index URLs containing credentials;
- forge backup snapshots;
- OpenBao tokens or platform-root paths;
- package blob cleanup procedures as app-owned operations.

If an S5 release depends on a private package or image, the app runbook should
name the consuming Kubernetes Secret or OpenBao-delivered workload path and cite
forge/platform evidence that the artifact can be restored. The app repo should
not copy the credential or the forge backup recipe.

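A release citation can be linted for the two most mechanical rules above: any cited index URL must not carry credentials, and a cited image digest should be a real digest rather than a tag. The sketch below is illustrative only; the dictionary keys and the choice of required keys are assumptions, since the contract lists what S5 may cite without fixing a schema.

```python
import re
from urllib.parse import urlsplit

SHA256_DIGEST = re.compile(r"^sha256:[0-9a-f]{64}$")
# Hypothetical choice of required keys; the contract only says S5 "may cite" them.
REQUIRED_KEYS = ("source_repo", "commit_sha", "publish_evidence")


def has_embedded_credentials(url: str) -> bool:
    """True when the URL carries a user or password in its authority part."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None


def citation_problems(citation: dict) -> list:
    """Return reasons a release citation should be revised."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not citation.get(key)]
    digest = citation.get("image_digest")  # optional: "digest when available"
    if digest and not SHA256_DIGEST.match(digest):
        problems.append("image_digest is not a sha256 digest")
    index_url = citation.get("package_index_url")
    if index_url and has_embedded_credentials(index_url):
        problems.append("package_index_url embeds credentials")
    return problems


# Placeholder citation; none of these values refer to a real release.
citation = {
    "source_repo": "example-org/example-app",
    "commit_sha": "0" * 40,
    "publish_evidence": "example-evidence-ref",
    "image_digest": "sha256:" + "a" * 64,
    "package_index_url": "https://forge.example.invalid/simple",
}
print(citation_problems(citation))
```

Using `urlsplit` rather than a regular expression keeps the credential check aligned with how URL clients parse the authority component.
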
## Production-Trust Gates

Before treating forge packages, images, or source state as production-critical,
the relevant asset must have:

- backup mechanism identified;
- restore drill completed in an isolated environment;
- consumer verification command recorded;
- secret custody path documented without live values;
- rollback or disable path documented;
- storage growth inspection procedure documented;
- owner assigned for the next restore drill.

If one of these gates is missing, consumers may still use forge artifacts for
smoke, development, or migration work, but production promotion should record a
follow-up against the owning layer before relying on the artifact.

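The gate rule above reduces to a simple check: all seven gates met means production-critical use is supportable, and any missing gate limits use to smoke, development, or migration work and calls for a follow-up. A minimal sketch follows; the gate identifiers are hypothetical names for the seven bullets.

```python
# Hypothetical identifiers, one per gate in the list above, in the same order.
GATES = (
    "backup_mechanism_identified",
    "restore_drill_completed",
    "consumer_verification_recorded",
    "secret_custody_documented",
    "rollback_path_documented",
    "storage_growth_procedure_documented",
    "next_drill_owner_assigned",
)


def missing_gates(status: dict) -> list:
    """Return gates that are absent or false, in contract order."""
    return [gate for gate in GATES if not status.get(gate, False)]


def allowed_uses(status: dict) -> list:
    """Smoke, development, and migration are always allowed; production
    is added only when no gate is missing."""
    uses = ["smoke", "development", "migration"]
    if not missing_gates(status):
        uses.append("production")
    return uses


# Placeholder status for an asset with only two gates met.
status = {"backup_mechanism_identified": True, "restore_drill_completed": True}
print(allowed_uses(status))
print("follow-ups needed for:", missing_gates(status))
```

Each missing gate maps to one follow-up to record against the owning layer before production promotion.
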
## Follow-Ups

- WP-0006-T08 should turn backup, restore, storage growth, and runner status
  evidence into inspectable operating signals.
- WP-0006-T09 should model forge backup/restore and secret-delivery edges in
  Railiance Fabric.
- `RAILIANCE-WP-0005-T04` should use this contract when documenting S5 app data
  restore readiness and app runbook evidence requirements.