Close S5 app readiness workplan
94
docs/app-data-backup-restore-handoff.md
Normal file
@@ -0,0 +1,94 @@
# App Data Backup And Restore Handoff

This document defines the S5 app release boundary for data durability. It does
not create backup jobs, authorize a live restore drill, or move platform
backup ownership into `railiance-apps`.

## Current App Data

`vergabe-teilnahme` stores relational app data in `vergabe_db` on the shared
CloudNativePG cluster `apps-pg` in the `databases` namespace. The cluster is an
S3 platform service owned by `railiance-platform`; see
`/home/worsch/railiance-platform/docs/apps-pg.md`.

The app currently has no durable media PVC enabled. `persistence.media.enabled`
is `false`, so uploaded media is deferred rather than an S5 durability promise.

## Ownership Matrix

| Concern | S5 app repo owns | Upstream owner |
| --- | --- | --- |
| App database request | App name, namespace, database name, role name, intended use, and production-readiness need | `railiance-platform` reviews and provisions the role/database |
| Runtime DB Secret use | Secret name in the app namespace and URL-encoded DSN rebuild helper | `railiance-platform` owns platform credential source and future secret delivery |
| Database backup job | Readiness gate and consumer evidence requirement | `railiance-platform` owns CNPG backup and restore implementation |
| App restore verification | App-specific post-restore checks, migrations, login/smoke path, and rollback note | `railiance-platform` restores the backing database |
| Forge images/packages | Artifact identity and consumer evidence cited by app runbooks | `railiance-forge` owns registry/package restore evidence |
| App media/blob data | PVC declaration and app-level restore checks if enabled | `railiance-platform` owns storage backup mechanism once media is production-critical |

## Production Readiness Gate

Before an app release treats data as production-critical, the app runbook should
record:

- data class: disposable, externally reproducible, or production-critical;
- owning platform workplan or doc for the backup mechanism;
- latest non-secret backup evidence reference;
- latest restore-drill evidence reference, ideally from an isolated
  environment;
- app-specific post-restore checks, such as migrations, health endpoint,
  login/admin path, and representative business workflow;
- rollback or disable path if restore fails;
- assertion that no secret material was copied into Git, logs, screenshots, or
  State Hub notes.

If this gate is missing, the app can still be used for smoke, development, or
migration validation, but promotion beyond that should create or link a
`railiance-platform` workplan.
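
One way to make the gate mechanically checkable is sketched below. It assumes the runbook evidence is kept as plain `field: value` lines in a file; that file layout and every field name are hypothetical and only mirror the list items above, so treat this as an illustration rather than an existing repo convention.

```bash
# Hypothetical field names, one per readiness-gate item listed above.
GATE_FIELDS="data_class backup_owner_doc backup_evidence restore_drill_evidence post_restore_checks rollback_path no_secret_assertion"

# Print every gate field that has no non-empty "field: value" line in file $1.
# Empty output means all gate fields are recorded.
gate_missing_fields() {
  for field in $GATE_FIELDS; do
    grep -Eq "^${field}:[[:space:]]*[^[:space:]]" "$1" || printf '%s\n' "$field"
  done
}
```

A caller could run `gate_missing_fields path/to/evidence-file` and refuse promotion while the output is non-empty. The check only proves that a reference was written down, not that the referenced evidence is valid.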

## `vergabe-teilnahme` Gate

Current posture:

- database: `vergabe_db` on `databases/apps-pg`;
- app role Secret: `vergabe-app-credentials`;
- env Secret: `vergabe-teilnahme-env`;
- current backup status: platform docs state that `apps-pg` backup coverage is
  follow-up work;
- restore status: no app-level restore drill evidence is recorded in this repo.

Minimum evidence before production-trust use:

- platform confirms `apps-pg` backup coverage for `vergabe_db`;
- an isolated restore drill proves the database can be restored;
- `vergabe-teilnahme` runs migrations successfully after restore;
- health endpoint and HTTPS smoke checks pass;
- operator verifies the app can complete a representative tender-management
  workflow after restore;
- any app media path remains disabled or has its own storage restore evidence.

## Forge Artifact Evidence

S5 runbooks may cite forge-owned package and blob restore evidence, but must not
own Gitea package backup procedures or registry credentials. Use
`/home/worsch/railiance-forge/docs/backup-restore-secret-handoff.md` for the
forge artifact boundary.

For app releases, cite:

- image repository, tag, and digest when available;
- source commit and package version;
- forge publish job or evidence reference;
- package/blob restore drill evidence when the artifact is production-critical;
- namespace-local pull Secret or approved workload secret path, without token
  values.

## Filing Upstream Gaps

When the missing durability item is not local to S5:

1. Keep the S5 task focused on the app release impact.
2. Create or link the platform/forge workplan that owns the missing mechanism.
3. Mark the S5 task `blocked` only when the app release cannot safely continue
   without that upstream evidence.
4. Record the State Hub workstream/task id in the app runbook or workplan.
5. Revisit the S5 promotion gate after upstream evidence exists.
94
docs/manifest-server-dry-run.md
Normal file
@@ -0,0 +1,94 @@
# Manifest Server Dry-Run Prerequisites

`make k8s-server-dry-run` checks committed manifests and rendered app charts
against a Kubernetes API server using server-side dry-run. It catches schema and
admission drift that Helm rendering alone cannot see.

## What The Command Does

The helper in `tools/k8s-server-dry-run.sh`:

1. verifies `kubectl` and `helm` are installed;
2. verifies a Kubernetes API server is reachable;
3. optionally creates the target namespace when
   `DRY_RUN_CREATE_NAMESPACES=true`;
4. renders `charts/vergabe-teilnahme/` with
   `helm/vergabe-teilnahme-values.yaml`;
5. runs `kubectl apply --dry-run=server -f manifests`;
6. runs `kubectl apply --dry-run=server` against the rendered chart output.

The namespace creation step is a real apply, not a dry-run. Use
`DRY_RUN_CREATE_NAMESPACES=true` only against a disposable or approved
representative cluster where creating the app namespace is acceptable.
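
The step order above can be sketched roughly as follows. This is not the content of `tools/k8s-server-dry-run.sh`; it is a simplified illustration under stated assumptions: the namespace and release names, the `rendered` scratch directory, the `PLAN_ONLY` switch, and the `run` and `ensure_namespace` helpers are all invented for this example.

```bash
NAMESPACE="${NAMESPACE:-vergabe-teilnahme}"   # assumption: app namespace name
RELEASE="${RELEASE:-vergabe-teilnahme}"       # assumption: Helm release name
OUT_DIR="${OUT_DIR:-rendered}"                # assumption: scratch directory for rendered YAML

# Echo each step; skip execution when PLAN_ONLY=1 (a switch invented for this sketch).
run() {
  printf '+ %s\n' "$*"
  [ "${PLAN_ONLY:-0}" = 1 ] || "$@"
}

# Real mutation, not a dry-run: creates the namespace if it is missing.
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}

server_dry_run() {
  # Steps 1-2: required tools exist and the API server answers.
  run command -v kubectl || return 1
  run command -v helm || return 1
  run kubectl cluster-info || return 1
  # Step 3: optional namespace creation, only when explicitly requested.
  if [ "${DRY_RUN_CREATE_NAMESPACES:-false}" = true ]; then
    run ensure_namespace "$NAMESPACE" || return 1
  fi
  # Step 4: render the chart with the committed values.
  run helm template "$RELEASE" charts/vergabe-teilnahme/ \
    -f helm/vergabe-teilnahme-values.yaml \
    --namespace "$NAMESPACE" --output-dir "$OUT_DIR" || return 1
  # Steps 5-6: server-side dry-run of committed manifests, then of the render.
  run kubectl apply --dry-run=server -f manifests || return 1
  run kubectl apply --dry-run=server -n "$NAMESPACE" -R -f "$OUT_DIR"
}
```

Calling `PLAN_ONLY=1 server_dry_run` prints the planned commands without contacting a cluster, which makes it easy to confirm whether the namespace side effect would be triggered before pointing the check at a real API server.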

## Representative Cluster Requirement

The check expects a live Kubernetes API server whose version, admission
webhooks, and installed APIs are close enough to Railiance to be meaningful. A
pure local render or unseeded kind cluster is not enough.

The current `vergabe-teilnahme` release uses built-in Kubernetes APIs:

- `apps/v1` Deployment;
- `v1` Service;
- `v1` PersistentVolumeClaim when media persistence is enabled;
- `networking.k8s.io/v1` Ingress.

For realistic S5 validation, the representative cluster should also have the
same ingress class, cert-manager issuer posture, NetworkPolicy posture, and
admission policies as Railiance. Future app manifests that introduce CNPG,
cert-manager, Traefik, External Secrets, or other CRDs require those CRDs and
webhooks to be installed before the dry-run result is meaningful.

## Runner And Credential Requirements

Local operator runs require:

- `kubectl` context pointed at the representative cluster;
- credentials with `get` access for API discovery;
- server-side dry-run permission for the rendered resources;
- namespace create/apply permission only when
  `DRY_RUN_CREATE_NAMESPACES=true`.

CI runs require forge-owned runner prerequisites:

- a runner label approved for S5 release checks, such as `s5-release-check` or
  `cluster-dry-run`;
- approved kubeconfig or equivalent cluster access delivery;
- runner placement that is allowed to reach the representative API server;
- no kubeconfig, bearer token, package token, or secret value stored in Git.

The runner label contract and secret boundary live in
`/home/worsch/railiance-forge/docs/ci-runner-actions-gitops-ownership.md`.

## Failure Classification

Treat these as release-blocking when the representative cluster and runner are
known-good:

- Helm render fails;
- server-side dry-run rejects a changed manifest;
- Kubernetes schema or admission policy rejects an app resource;
- the rendered image reference or required Secret name is structurally invalid.

Treat these as prerequisite gaps rather than app release failures:

- no runner with the required label is available;
- the runner cannot reach the representative cluster;
- kubeconfig or secret delivery is missing;
- required CRDs/admission webhooks are absent from the representative cluster;
- namespace creation is forbidden while `DRY_RUN_CREATE_NAMESPACES=true`.

For prerequisite gaps, link or create the owning forge, cluster, platform, or
enablement workplan instead of weakening the S5 app chart.
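
As an illustration, the two classes could be encoded as a small lookup that a CI wrapper consults when deciding whether to fail the release check or only open a follow-up. The symptom labels below are invented for this sketch; neither the helper script nor the workflow is known to emit them.

```bash
# Map a failure symptom (hypothetical labels) to the handling class described above.
classify_failure() {
  case "$1" in
    helm-render-failed|dry-run-rejected|schema-or-admission-rejected|invalid-image-or-secret-ref)
      echo release-blocking ;;
    no-runner|cluster-unreachable|missing-kubeconfig|missing-crd-or-webhook|namespace-create-forbidden)
      echo prerequisite-gap ;;
    *)
      # Unknown symptoms are surfaced instead of being silently treated as passing.
      echo unclassified
      return 1 ;;
  esac
}
```

Returning a non-zero status for unknown symptoms is a deliberate choice in this sketch: a new failure mode should be triaged by a person rather than defaulting to either class.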

## Enforcement Gate

The workflow in `.gitea/workflows/manifest-server-dry-run.yaml` is ready to
enforce PR checks only when:

- forge has published the runner label and placement evidence;
- cluster/platform have provided representative API access and secret delivery;
- the namespace side effect is accepted or pre-provisioned;
- at least one successful dry-run result is recorded for the current release
  surface.
@@ -38,6 +38,11 @@ make k8s-server-dry-run
```

 The command expects a representative Kubernetes API server with the same
-CRDs as the Railiance cluster. CI should run it against a disposable kind
-cluster seeded with CNPG, cert-manager, Traefik, and any other CRDs used
-by changed manifests.
+APIs, CRDs, admission webhooks, ingress posture, and cert-manager posture as
+the Railiance cluster. The CI workflow sets `DRY_RUN_CREATE_NAMESPACES=true`,
+which creates the app namespace before server-side dry-run so namespaced
+resources can validate. Use that mode only against a disposable or approved
+representative cluster.
+
+See `docs/manifest-server-dry-run.md` for runner, credential, and failure
+classification rules.
100
docs/s5-app-onboarding-checklist.md
Normal file
@@ -0,0 +1,100 @@
# S5 App Onboarding Checklist

Use this checklist when adding a new user-facing Railiance workload to
`railiance-apps`. It turns the `vergabe-teilnahme` lessons into a repeatable
starting point so new app releases do not have to read the historical
workplans first.

## Scope And Planning

- [ ] Create or update a repo-local workplan under `workplans/`.
- [ ] Confirm the work is S5 app release wiring, not application source code,
  forge runtime operation, platform service provisioning, or cluster addon work.
- [ ] Name the source application repo and the owning package/image release
  path.
- [ ] Record upstream dependencies on `railiance-platform`, `railiance-forge`,
  or `railiance-enablement` instead of hiding them in app values.
- [ ] Run State Hub consistency sync after task-status edits.

## Release Files

- [ ] Add a Helm chart under `charts/<app>/`.
- [ ] Add non-secret production values under `helm/<app>-values.yaml`.
- [ ] Add app-specific manifests under `manifests/` only when they do not fit
  cleanly in the chart.
- [ ] Keep `image.tag` pinned by git SHA or immutable version.
- [ ] Keep committed values non-secret. Runtime secrets must come from
  Kubernetes Secrets, approved SOPS files, or a platform secret-delivery path.
- [ ] Add Makefile targets for render, deploy, status, logs, migrations, and
  any app-specific secret rebuild helpers.
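
The `image.tag` pinning item lends itself to a quick pre-commit or CI check. The sketch below is a heuristic under stated assumptions: it treats a 7 to 40 character lowercase hex string as a git SHA and a `MAJOR.MINOR.PATCH` string (optionally prefixed with `v`) as an immutable version, and it rejects a few common floating tag names. What actually counts as immutable is a registry and release policy question that this function cannot verify.

```bash
# Return 0 when the tag looks pinned (git SHA or semantic version), 1 otherwise.
# Heuristic only: the accepted shapes and rejected names are assumptions of this sketch.
is_pinned_tag() {
  case "$1" in
    ""|latest|main|master|stable) return 1 ;;
  esac
  printf '%s\n' "$1" |
    grep -Eq '^([0-9a-f]{7,40}|v?[0-9]+\.[0-9]+\.[0-9]+([-+][0-9A-Za-z.-]+)?)$'
}
```

For example, `is_pinned_tag 3f2a9c1` and `is_pinned_tag v1.4.2` succeed, while `is_pinned_tag latest` and `is_pinned_tag 1.4` fail.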

## Image And Artifact Consumption

- [ ] Use a forge-owned image or package registry path.
- [ ] Link to source-repo publish instructions instead of duplicating build
  pipelines here.
- [ ] Record the source commit, image tag, package version, and evidence needed
  by the app runbook.
- [ ] For private images or packages, name the consuming Secret or approved
  secret-delivery path without storing tokenized URLs.
- [ ] Verify a cluster can pull the image before promoting the release beyond
  smoke-test use.

## Database And Runtime Secrets

- [ ] Request app database, role, and Secret handoff through
  `railiance-platform` when using shared platform databases.
- [ ] Use an app-scoped database and role, for example `<app>_db` and `<app>`.
- [ ] Mirror only the app role credential into the app namespace.
- [ ] If the app consumes `DATABASE_URL`, URL-encode generated passwords before
  writing the env Secret.
- [ ] Prefer separate PostgreSQL env vars when a framework does not require a
  single DSN string.
- [ ] Document secret rotation commands without printing or committing secret
  values.
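
The URL-encoding item matters because generated passwords often contain characters such as `@`, `/`, `:`, or `#`, which change how a DSN is parsed. The following is a minimal sketch of such an encoder, not the repo's actual rebuild helper: the function names `urlencode` and `build_dsn` and the `postgresql://` DSN layout are assumptions made for illustration.

```bash
# Percent-encode every byte outside the RFC 3986 "unreserved" set.
urlencode() {
  # od emits one lowercase hex pair per input byte; tr puts each pair on its own line.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '\n\n' |
  while read -r hex; do
    [ -n "$hex" ] || continue
    case "$hex" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        # Unreserved (0-9, A-Z, a-z, "-", ".", "_", "~"): emit the byte itself.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Everything else becomes %XX with uppercase hex digits.
        printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
}

# Assemble a DSN from: user password host port database (layout assumed).
build_dsn() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}
```

In real use the result should be piped straight into the Secret-writing command rather than echoed to a terminal or log, in keeping with the rotation item above.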

## Ingress, TLS, And Probes

- [ ] Name the public host, namespace, Helm release, ingress, and TLS Secret in
  the app runbook.
- [ ] Confirm ingress class and cert-manager issuer ownership with
  `railiance-cluster`.
- [ ] Keep certificate lifecycle in cert-manager, not in app scripts.
- [ ] For framework apps with strict host validation, set HTTP probe `Host`
  headers to a value included in the app's allowed hosts.
- [ ] Keep readiness and liveness paths stable and unauthenticated.

## Validation And Smoke Tests

- [ ] Run `make check-tools`.
- [ ] Run `make <app>-dry-run` or the equivalent Helm render before deploy.
- [ ] Run `make k8s-server-dry-run` against a representative cluster before
  enforcing PR checks.
- [ ] Use the persistent-pod plus `kubectl exec` smoke pattern from
  `docs/operator-recipes.md`.
- [ ] Capture app-level deployment evidence: dry-run result, rollout status,
  HTTPS or service smoke check, migration result when applicable, and rollback
  note.

## Runbook Baseline

Each S5 app runbook should include:

- identity table with URL, namespace, release, chart, values, ingress, image,
  database, and TLS Secret;
- secrets and rotation section;
- day-to-day operator commands;
- image promotion steps;
- rollback behavior and migration warning;
- troubleshooting for probes, database URLs, TLS, and app-specific failure
  modes;
- backup and restore readiness gate;
- cross-references to source repo, platform handoff docs, forge artifact docs,
  and common S5 recipes.

## Done Gate

A new app is ready for routine S5 operation when an operator can deploy, verify,
roll back, inspect logs, rotate app-owned runtime secrets, understand upstream
data durability gates, and sync the workplan without reading old app-specific
history.
@@ -125,18 +125,26 @@ fail, cert-manager keeps serving the old cert until it expires.
 Investigate with `kubectl describe certificate vergabe-teilnahme-tls
 -n vergabe-teilnahme`.

-## Backup posture (open)
+## Data durability and restore readiness

-The shared `apps-pg` cluster is not yet covered by an automated
-backup job; only the legacy PostgreSQL-HA setup is. Manual logical
-dump for now:
+`vergabe_db` lives on the shared `apps-pg` CNPG cluster owned by
+`railiance-platform`. S5 owns the app release runbook and post-restore app
+checks; platform owns the database backup and restore mechanism.
+
+Current status: `apps-pg` backup coverage is still platform follow-up work, so
+`vergabe-teilnahme` should not be treated as production-critical data until the
+gate in `docs/app-data-backup-restore-handoff.md` is satisfied.
+
+Manual logical dump is a break-glass or inspection option, not the durable
+backup contract:

 ```bash
 kubectl exec -n databases apps-pg-1 -- pg_dump -U postgres -Fc vergabe_db > vergabe_db-$(date +%F).dump
 ```

-Tracked as a follow-up in `RAILIANCE-WP-0003 Notes` (CNPG backup
-configuration belongs to `railiance-platform`).
+Before promotion beyond smoke or development use, record platform backup
+evidence, an isolated restore drill, migration result, health check, HTTPS
+smoke check, and representative app workflow verification.

 ## Deferred for v1
@@ -155,6 +163,9 @@ configuration belongs to `railiance-platform`).
 - Shared DB cluster: `railiance-platform/docs/apps-pg.md`
 - Container registry: `/home/worsch/railiance-forge/docs/gitea-container-registry.md`
 - Python package registry: `/home/worsch/railiance-forge/docs/gitea-package-registry.md`
+- S5 app onboarding checklist: `docs/s5-app-onboarding-checklist.md`
+- App data backup handoff: `docs/app-data-backup-restore-handoff.md`
+- Manifest dry-run prerequisites: `docs/manifest-server-dry-run.md`
 - Django deployment recipe: `docs/django-on-railiance.md`
 - Operator setup: `docs/operator-setup.md`
 - Operator recipes: `docs/operator-recipes.md`