Add Railiance Stage 2 deploy observe tooling
Some checks failed
railiance-tests / smoke (push) Has been cancelled

2026-06-27 16:51:02 +02:00
parent 11ceeed03c
commit 9a463e0749
9 changed files with 529 additions and 20 deletions

```diff
@@ -77,6 +77,7 @@ From two bare Linux servers, a Git repo, and valid credentials, you can rebuild
 - [Railiance app.toml contract](app-toml-contract.md)
 - [Railiance overlay repo pattern](overlay-repo-pattern.md)
 - [Canary Helm template](canary-helm-template.md)
+- [Stage 2 deploy and observe](stage2-deploy-observe.md)
 - [Railiance run command](railiance-run-command.md)
 ## 👥 Contributing
```

```diff
@@ -186,16 +186,17 @@ records only the route, target object, and pass/fail state.
 ## Command Semantics
-Commands in `app.toml` are declarations for future tooling. Until T04-T07
-implement the CLI, they may point to existing scripts or runbook commands.
+Commands in `app.toml` are declarations for Railiance tooling. Stage 1 and
+Stage 2 commands now have local CLI support; Stage 3 commands may still point
+to existing scripts or runbook commands until T07 lands.
 Expected mapping:
-- Stage 1 commands are consumed by `bin/railiance run <app>`.
-- Stage 2 commands are consumed by `bin/railiance deploy --stage 2 <app>` and
-`bin/railiance observe <app>`.
-- Stage 3 commands are consumed by `bin/railiance promote <app>` and
-`bin/railiance rollback <app>`.
+- Stage 1 commands are consumed by `bin/railiance run <overlay-dir>`.
+- Stage 2 commands are consumed by `bin/railiance deploy --stage 2 <overlay-dir>`
+and `bin/railiance observe --stage 2 <overlay-dir>`.
+- Stage 3 commands are consumed by future `bin/railiance promote <overlay-dir>`
+and `bin/railiance rollback <overlay-dir>` commands.
 Tooling must emit machine-readable results with workload identity, candidate
 revision, checks run, pass/fail status, non-secret evidence, rollback target,
```

````diff
@@ -320,11 +320,11 @@ must not cut over to Stage 3.
 Future CLI tasks should make these lifecycle operations repeatable:
 ```text
-bin/railiance run <app> # Stage 1 local validation
-bin/railiance deploy --stage 2 <app> # Stage 2 canary deployment
-bin/railiance observe <app> # Stage 2/3 evidence collection
-bin/railiance promote <app> # Stage 3 production promotion
-bin/railiance rollback <app> # rollback to previous stable
+bin/railiance run <overlay-dir> # Stage 1 local validation
+bin/railiance deploy --stage 2 <overlay-dir> --plan # Stage 2 canary plan
+bin/railiance observe --stage 2 <overlay-dir> --plan # Stage 2 evidence targets
+bin/railiance promote <overlay-dir> # Stage 3 production promotion
+bin/railiance rollback <overlay-dir> # rollback to previous stable
 ```
 The exact command names may change as implementation lands, but the behavior
````

@@ -0,0 +1,49 @@
# Stage 2 Deploy And Observe
`bin/railiance deploy --stage 2` and `bin/railiance observe --stage 2` provide
the repeatable command path for production canaries declared in
`railiance/app.toml`.
Both commands default to non-mutating plan mode.
## Deploy
```bash
bin/railiance deploy --stage 2 /path/to/overlay --pretty
bin/railiance deploy --stage 2 /path/to/overlay --server-dry-run --pretty
bin/railiance deploy --stage 2 /path/to/overlay --apply --approval-id <state-hub-id>
```
Plan mode validates the local Stage 2 chart and values paths and emits a
`railiance.stage2-deploy-result.v1` JSON plan. It does not contact the cluster.
`--server-dry-run` runs `helm upgrade --install --dry-run=server` when Helm and
cluster access are available. `--apply` runs the Helm canary apply path with
`--atomic --wait`. If Stage 2 declares `requires_approval = true`, apply mode
fails closed unless `--approval-id` is provided.
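The three modes escalate from a local plan, to a server-side dry run, to a real apply. The following is a minimal sketch of that escalation, not the actual `bin/railiance` internals. It assumes a hypothetical `stage2_helm_args` helper that takes placeholder release, chart, namespace, and values arguments, and it prints the Helm command each mode would run:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map a Stage 2 deploy mode to the Helm command it would
# run. The helper name and its arguments are illustrative, not bin/railiance code.

stage2_helm_args() {
  local mode=$1 release=$2 chart=$3 namespace=$4 values=$5
  # Shared upgrade-or-install invocation for the canary release.
  local base=(helm upgrade --install "$release" "$chart"
              --namespace "$namespace" --values "$values")
  case "$mode" in
    plan)
      # Plan mode never contacts the cluster, so no Helm command is produced.
      return 0 ;;
    server-dry-run)
      # Render and validate against the API server without persisting changes.
      printf '%s\n' "${base[*]} --dry-run=server" ;;
    apply)
      # Roll back on failure and wait until the release's resources are ready.
      printf '%s\n' "${base[*]} --atomic --wait" ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2 ;;
  esac
}

# Example with placeholder values.
stage2_helm_args apply my-app-canary ./charts/app my-app values-canary.yaml
```

The sketch prints the command instead of executing it, so it is safe to run anywhere. Plan mode prints nothing because it never reaches Helm.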
The result records release identity, namespace, chart path, values path,
expected checks/evidence, precheck status, and per-command output byte counts.
It does not embed Helm or kubectl logs.
## Observe
```bash
bin/railiance observe --stage 2 /path/to/overlay --pretty
bin/railiance observe --stage 2 /path/to/overlay --live --pretty
```
Plan mode emits the rollout, pod selector, ingress selector, health URL, and
metrics targets that live observation will query.
Live mode uses `kubectl` to check rollout status, deployment JSON, canary pods,
ingress/routing resources, and pod metrics when metrics-server is available.
Metrics unavailability is reported separately so a canary can fail for rollout
or readiness problems without hiding missing observability.
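The split between rollout health and metrics availability can be sketched as follows. This illustrates the idea under stated assumptions and is not the shipped observer. The `observe_canary` function, its namespace/deployment/selector arguments, the 120-second rollout timeout, and the `KUBECTL` override used to stub `kubectl` in tests are all hypothetical. The `kubectl` subcommands themselves are standard.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Stage 2 live observation. The function name, its
# arguments, and the KUBECTL override are illustrative, not bin/railiance code.

observe_canary() {
  local namespace=$1 deployment=$2 pod_selector=$3 ingress_selector=$4
  local kubectl=${KUBECTL:-kubectl}   # override lets tests stub kubectl
  local status=pass metrics=available

  # Any failing rollout or resource query marks the canary as failed.
  # Output is discarded here; a real tool would summarize it as evidence.
  "$kubectl" -n "$namespace" rollout status "deployment/$deployment" \
    --timeout=120s >/dev/null 2>&1 || status=fail   # 120s is an assumed value
  "$kubectl" -n "$namespace" get "deployment/$deployment" -o json \
    >/dev/null 2>&1 || status=fail
  "$kubectl" -n "$namespace" get pods -l "$pod_selector" -o json \
    >/dev/null 2>&1 || status=fail
  "$kubectl" -n "$namespace" get ingress -l "$ingress_selector" -o json \
    >/dev/null 2>&1 || status=fail

  # `kubectl top` depends on metrics-server; its failure is recorded in a
  # separate field instead of changing the rollout status.
  "$kubectl" -n "$namespace" top pods -l "$pod_selector" \
    >/dev/null 2>&1 || metrics=unavailable

  printf 'status=%s metrics=%s\n' "$status" "$metrics"
  [ "$status" = pass ]
}
```

On a cluster without metrics-server where every rollout and resource query succeeds, this sketch reports `status=pass metrics=unavailable`. The missing metrics stay visible without failing an otherwise healthy canary.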
## Safety
Stage 2 remains blocked when required local paths are missing, Helm is missing
for dry-run/apply, `kubectl` is missing for live observe, or approval evidence
is missing for an apply that requires approval. Use the emitted JSON as
non-secret evidence in State Hub progress notes.
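The blocking rules above amount to a precheck that runs before any Helm or `kubectl` call. The following is a minimal sketch under stated assumptions. It uses a hypothetical `stage2_blockers` function whose action names (`plan`, `server-dry-run`, `apply`, `observe-live`) and arguments stand in for values read from `railiance/app.toml` and the command line. It also assumes that the chart and values paths are required for every action.

```bash
#!/usr/bin/env bash
# Hypothetical Stage 2 precheck sketch; the function, its action names, and
# its arguments are illustrative, not bin/railiance internals.
# Prints one blocker per line and returns non-zero when Stage 2 is blocked.

stage2_blockers() {
  local action=$1 chart=$2 values=$3 requires_approval=$4 approval_id=${5:-}
  local -a blockers=()

  # Assumption: the chart and values paths are required for every action.
  [ -e "$chart" ] || blockers+=("missing chart path: $chart")
  [ -e "$values" ] || blockers+=("missing values path: $values")

  # Tool checks depend on what the action will actually call.
  case "$action" in
    server-dry-run|apply)
      command -v helm >/dev/null 2>&1 || blockers+=("helm not found") ;;
    observe-live)
      command -v kubectl >/dev/null 2>&1 || blockers+=("kubectl not found") ;;
  esac

  # Fail closed: an apply that requires approval must carry an approval id.
  if [ "$action" = apply ] && [ "$requires_approval" = true ] &&
     [ -z "$approval_id" ]; then
    blockers+=("approval required: pass --approval-id")
  fi

  if [ "${#blockers[@]}" -gt 0 ]; then
    printf '%s\n' "${blockers[@]}"
    return 1
  fi
  return 0
}
```

The sketch collects every blocker instead of stopping at the first one. As a result, a single run can report, for example, both a missing values file and a missing approval id.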