Add Railiance promote rollback tooling
Some checks failed
railiance-tests / smoke (push) Has been cancelled
@@ -20,6 +20,8 @@ Commands:
   run           Run Stage 1 local validation from railiance/app.toml
   deploy        Plan/apply Stage 2 canary deployment
   observe       Plan/run Stage 2 observation checks
+  promote       Plan/apply Stage 3 stable promotion
+  rollback      Plan/apply rollback to previous stable
   build-spore   Build a distributable "Spore" bundle
   seed-local    Run the seed script on this machine
   checklist     Pre-VM checklist
@@ -47,6 +49,8 @@ case "$cmd" in
   run) exec railiance-run "$@" ;;
   deploy) exec railiance-stage2 deploy "$@" ;;
   observe) exec railiance-stage2 observe "$@" ;;
+  promote) exec railiance-stage3 promote "$@" ;;
+  rollback) exec railiance-stage3 rollback "$@" ;;
   build-spore) bash "$ROOT/tools/build_spore.sh" ;;
   seed-local) bash "$ROOT/tools/seed_node.sh" ;;
   checklist)
@@ -78,6 +78,7 @@ From two bare Linux servers, a Git repo, and valid credentials, you can rebuild
 - [Railiance overlay repo pattern](overlay-repo-pattern.md)
 - [Canary Helm template](canary-helm-template.md)
 - [Stage 2 deploy and observe](stage2-deploy-observe.md)
+- [Promote, rollback, and onboarding](promote-rollback-onboarding.md)
 - [Railiance run command](railiance-run-command.md)

 ## 👥 Contributing
@@ -186,17 +186,17 @@ records only the route, target object, and pass/fail state.
 ## Command Semantics

-Commands in `app.toml` are declarations for Railiance tooling. Stage 1 and
-Stage 2 commands now have local CLI support; Stage 3 commands may still point
-to existing scripts or runbook commands until T07 lands.
+Commands in `app.toml` are declarations for Railiance tooling. Stage 1, Stage
+2, and Stage 3 commands now have local CLI support; workload scripts may still
+wrap them for service-specific checks.

 Expected mapping:

 - Stage 1 commands are consumed by `bin/railiance run <overlay-dir>`.
 - Stage 2 commands are consumed by `bin/railiance deploy --stage 2 <overlay-dir>`
   and `bin/railiance observe --stage 2 <overlay-dir>`.
-- Stage 3 commands are consumed by future `bin/railiance promote <overlay-dir>`
-  and `bin/railiance rollback <overlay-dir>` commands.
+- Stage 3 commands are consumed by `bin/railiance promote <overlay-dir>` and
+  `bin/railiance rollback <overlay-dir>`.

 Tooling must emit machine-readable results with workload identity, candidate
 revision, checks run, pass/fail status, non-secret evidence, rollback target,
@@ -317,14 +317,14 @@ must not cut over to Stage 3.
 ## Minimum Command Contract

-Future CLI tasks should make these lifecycle operations repeatable:
+The Railiance CLI makes these lifecycle operations repeatable:

 ```text
-bin/railiance run <overlay-dir>                       # Stage 1 local validation
-bin/railiance deploy --stage 2 <overlay-dir> --plan   # Stage 2 canary plan
-bin/railiance observe --stage 2 <overlay-dir> --plan  # Stage 2 evidence targets
-bin/railiance promote <overlay-dir>                   # Stage 3 production promotion
-bin/railiance rollback <overlay-dir>                  # rollback to previous stable
+bin/railiance run <overlay-dir>                       # Stage 1 local validation
+bin/railiance deploy --stage 2 <overlay-dir> --plan   # Stage 2 canary plan
+bin/railiance observe --stage 2 <overlay-dir> --plan  # Stage 2 evidence targets
+bin/railiance promote <overlay-dir> --plan            # Stage 3 production promotion
+bin/railiance rollback <overlay-dir> --plan           # rollback to previous stable
 ```

 The exact command names may change as implementation lands, but the behavior
71
docs/promote-rollback-onboarding.md
Normal file
@@ -0,0 +1,71 @@
# Promote, Rollback, And Onboarding

This guide shows the representative Railiance lifecycle for an overlay repo.
Commands default to plan mode so the path is repeatable before cluster access or
operator approval exists.
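
The plan-by-default behavior can be sketched with `argparse`, mirroring the flag layout that `tools/cmd/railiance-stage3` uses in this commit. The helper names `build_parser` and `resolve_mode` are illustrative assumptions, not part of the tooling.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser whose mode stays "plan" unless --apply is passed explicitly."""
    parser = argparse.ArgumentParser(description="Illustrative plan/apply switch.")
    parser.add_argument("app_dir", nargs="?", default=".")
    # --mode carries the default; --plan and --apply are shorthands that
    # write into the same destination.
    parser.add_argument("--mode", choices=["plan", "apply"], default="plan")
    parser.add_argument("--plan", action="store_const", const="plan", dest="mode")
    parser.add_argument("--apply", action="store_const", const="apply", dest="mode")
    return parser


def resolve_mode(argv: list[str]) -> str:
    """Return the effective mode for a hypothetical command line."""
    return build_parser().parse_args(argv).mode


if __name__ == "__main__":
    print(resolve_mode([]))
    print(resolve_mode(["/path/to/overlay", "--apply"]))
```

Because the non-mutating mode is the default, a command run without flags can only produce a plan; mutation always requires an explicit `--apply`.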

## Stage 1

```bash
bin/railiance run /path/to/overlay --pretty
```

Stage 1 validates `railiance/app.toml`, local commands, and local checks. Save
the JSON result as non-secret evidence before Stage 2.

## Stage 2

```bash
bin/railiance deploy --stage 2 /path/to/overlay --plan --pretty
bin/railiance observe --stage 2 /path/to/overlay --plan --pretty
```

When Helm, kubectl, cluster access, and approval evidence are ready:

```bash
bin/railiance deploy --stage 2 /path/to/overlay --apply --approval-id <state-hub-id>
bin/railiance observe --stage 2 /path/to/overlay --live --pretty
```

For critical workloads, Stage 2 apply must not run until the operator has
approved canary exposure and rollback context is known.

## Stage 3

```bash
bin/railiance promote /path/to/overlay --plan --pretty
bin/railiance rollback /path/to/overlay --plan --pretty
```

Promotion plan mode emits a `railiance.stage3-promote-result.v1` JSON result
with stable release identity, chart and values paths, previous-stable target,
expected evidence, and approval requirements.

Rollback plan mode emits a `railiance.stage3-rollback-result.v1` JSON result
with rollback strategy, release identity, verification text, and apply-time
requirements.
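
A reviewer could check a saved plan result for the fields described above with a small script. This is a minimal sketch: the key names follow the result dictionaries built in `tools/cmd/railiance-stage3`, while `EXPECTED_FIELDS`, `missing_fields`, and the sample values are assumptions made for illustration.

```python
import json

# Keys a reviewer might expect per schema (an illustrative selection).
EXPECTED_FIELDS = {
    "railiance.stage3-promote-result.v1": [
        "release", "namespace", "chart", "values",
        "previous_stable", "evidence_expected", "requires_approval",
    ],
    "railiance.stage3-rollback-result.v1": [
        "release", "namespace", "rollback", "prechecks",
    ],
}


def missing_fields(result: dict) -> list[str]:
    """Return expected keys that are absent or null in a plan result."""
    schema = result.get("schema_version")
    if schema not in EXPECTED_FIELDS:
        raise ValueError(f"unexpected schema_version: {schema!r}")
    return [key for key in EXPECTED_FIELDS[schema] if result.get(key) is None]


# Hypothetical promote plan with no recorded previous-stable target.
sample = json.loads(json.dumps({
    "schema_version": "railiance.stage3-promote-result.v1",
    "status": "blocked",
    "release": "demo-app",
    "namespace": "demo-app",
    "chart": "/overlay/charts/demo-app",
    "values": "/overlay/values/stage3-production.yaml",
    "previous_stable": None,
    "evidence_expected": [],
    "requires_approval": True,
}))

if __name__ == "__main__":
    print(missing_fields(sample))
```

A null `previous_stable` is reported here, which matches the precheck that blocks promotion until a rollback target is recorded.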

When approval evidence and Helm access are ready:

```bash
bin/railiance promote /path/to/overlay --apply --approval-id <state-hub-id>
bin/railiance rollback /path/to/overlay --apply --approval-id <state-hub-id> --revision <helm-revision>
```

Stage 3 apply fails closed if the chart or values are missing, previous stable
is not recorded, Helm is unavailable, or approval evidence is missing. Rollback
apply fails closed if the rollback strategy is missing, Helm is unavailable,
approval evidence is missing, or a Helm revision is required but absent.
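
The fail-closed rule for rollback apply can be sketched as a pure function that lists what still blocks the action. The blocker names reuse the precheck names from `railiance-stage3`; the function `rollback_apply_blockers` and its parameters are hypothetical and exist only for this illustration.

```python
from __future__ import annotations


def rollback_apply_blockers(
    strategy: str | None,
    helm_available: bool,
    approval_id: str | None,
    revision: str | None,
) -> list[str]:
    """List the conditions that must block a rollback apply."""
    blockers: list[str] = []
    if not strategy:
        blockers.append("rollback-strategy")
    if not helm_available:
        blockers.append("helm")
    if not approval_id:
        blockers.append("approval-id")
    # A revision is only mandatory for the helm-revision strategy.
    if strategy == "helm-revision" and not revision:
        blockers.append("helm-revision")
    return blockers


def may_apply(blockers: list[str]) -> bool:
    """Fail closed: any blocker at all prevents the apply step."""
    return not blockers


if __name__ == "__main__":
    print(rollback_apply_blockers("helm-revision", True, "approval-123", None))
```

Apply runs only when the list is empty, so a missing input stops the action rather than letting it proceed with a default.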

## Human Approval Points

Critical infrastructure workloads require explicit operator approval before:

- Stage 2 canary exposure;
- Stage 3 stable promotion;
- rollback apply, unless an incident runbook defines a narrower break-glass
  process and records the evidence id.

Progress notes should include only non-secret result summaries: schema version,
status, release, namespace, approval id, check counts, and command byte counts.
Do not paste command logs, kubeconfigs, tokens, or private service output.
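
One way to produce such a note is to reduce a result to the listed fields before sharing it. The sketch below assumes the result shape emitted by `railiance-stage3` (`prechecks`, `actions`, `stdout_bytes`, `stderr_bytes`); the function name `progress_summary` and the sample data are illustrative.

```python
def progress_summary(result: dict) -> dict:
    """Reduce a Stage 3 result to the non-secret fields fit for a progress note."""
    prechecks = result.get("prechecks", [])
    actions = result.get("actions", [])
    return {
        "schema_version": result.get("schema_version"),
        "status": result.get("status"),
        "release": result.get("release"),
        "namespace": result.get("namespace"),
        "approval_id": result.get("approval_id"),
        "prechecks_total": len(prechecks),
        "prechecks_passed": sum(1 for item in prechecks if item.get("status") == "passed"),
        # Only byte counts are kept; command output itself is never copied.
        "stdout_bytes": sum(item.get("stdout_bytes", 0) for item in actions),
        "stderr_bytes": sum(item.get("stderr_bytes", 0) for item in actions),
    }


# Hypothetical applied promotion result.
example = {
    "schema_version": "railiance.stage3-promote-result.v1",
    "status": "applied",
    "release": "demo-app",
    "namespace": "demo-app",
    "approval_id": "approval-123",
    "chart": "/overlay/charts/demo-app",
    "prechecks": [
        {"name": "app.toml", "status": "passed", "required": True},
        {"name": "helm", "status": "passed", "required": True},
    ],
    "actions": [{"command_ref": "stage3.helm-promote", "stdout_bytes": 512, "stderr_bytes": 0}],
}

if __name__ == "__main__":
    print(progress_summary(example))
```

Using an allow-list of keys means a field added to the result later stays out of progress notes until someone adds it deliberately.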
@@ -60,6 +60,11 @@ This model emphasizes:
 ---



+### `railiance-stage3`
+- Backs `bin/railiance promote` and `bin/railiance rollback`.
+- Emits non-secret JSON plans/results for stable promotion and rollback.
+
 ### `railiance-stage2`
 - Backs `bin/railiance deploy --stage 2` and `bin/railiance observe --stage 2`.
 - Emits non-secret JSON plans/results for canary deployment and observation.

377
tools/cmd/railiance-stage3
Executable file
@@ -0,0 +1,377 @@
#!/usr/bin/env python3
"""Railiance Stage 3 promote and rollback tooling."""

from __future__ import annotations

import argparse
import json
import shutil
import subprocess
import sys
import time
import tomllib
from datetime import UTC, datetime
from pathlib import Path
from typing import Any

SUPPORTED_SCHEMA = "railiance.app.v1"


def utc_now() -> str:
    return datetime.now(UTC).replace(microsecond=0).isoformat().replace("+00:00", "Z")


def load_contract(app_dir: Path) -> tuple[Path, dict[str, Any]]:
    contract_path = app_dir / "railiance" / "app.toml"
    if not contract_path.exists():
        raise SystemExit(f"Missing Railiance contract: {contract_path}")
    with contract_path.open("rb") as handle:
        data = tomllib.load(handle)
    if data.get("schema_version") != SUPPORTED_SCHEMA:
        raise SystemExit(
            f"Unsupported schema_version {data.get('schema_version')!r}; expected {SUPPORTED_SCHEMA}"
        )
    return contract_path, data


def app_identity(data: dict[str, Any]) -> dict[str, Any]:
    app = data.get("app", {})
    source = data.get("source", {})
    return {
        "app": {
            "id": app.get("id"),
            "name": app.get("name"),
            "repo": app.get("repo"),
            "owner": app.get("owner"),
            "criticality": app.get("criticality"),
        },
        "source": {
            "revision": source.get("revision"),
            "artifact": source.get("artifact"),
            "digest_policy": source.get("digest_policy"),
        },
    }


def checks_by_id(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    return {check.get("id"): check for check in data.get("checks", [])}


def stage_checks(data: dict[str, Any], stage_name: str) -> list[dict[str, Any]]:
    stage = data.get("stages", {}).get(stage_name, {})
    lookup = checks_by_id(data)
    return [lookup[item] for item in stage.get("checks", []) if item in lookup]


def stage2_helm_check(data: dict[str, Any]) -> dict[str, Any] | None:
    for check in stage_checks(data, "stage2"):
        if check.get("type") == "helm":
            return check
    return None


def precheck(name: str, status: str, required: bool, detail: str | None = None) -> dict[str, Any]:
    item: dict[str, Any] = {"name": name, "status": status, "required": required}
    if detail:
        item["detail"] = detail
    return item


def required_failures(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return [item for item in items if item.get("required", True) and item.get("status") != "passed"]


def run_command(args: list[str], cwd: Path, timeout: int, command_ref: str) -> dict[str, Any]:
    # Only exit status, duration, and output byte counts are recorded, so
    # command output never reaches the JSON result.
    started = time.monotonic()
    try:
        completed = subprocess.run(
            args,
            cwd=cwd,
            text=True,
            capture_output=True,
            timeout=timeout,
            check=False,
        )
        return {
            "command_ref": command_ref,
            "status": "passed" if completed.returncode == 0 else "failed",
            "exit_code": completed.returncode,
            "duration_seconds": round(time.monotonic() - started, 3),
            "stdout_bytes": len(completed.stdout.encode()),
            "stderr_bytes": len(completed.stderr.encode()),
        }
    except subprocess.TimeoutExpired as exc:
        stdout = exc.stdout if isinstance(exc.stdout, str) else ""
        stderr = exc.stderr if isinstance(exc.stderr, str) else ""
        return {
            "command_ref": command_ref,
            "status": "failed",
            "exit_code": None,
            "duration_seconds": round(time.monotonic() - started, 3),
            "error": f"timeout after {timeout}s",
            "stdout_bytes": len(stdout.encode()),
            "stderr_bytes": len(stderr.encode()),
        }


def stage3_context(app_dir: Path, contract_path: Path, data: dict[str, Any]) -> dict[str, Any]:
    stage = data.get("stages", {}).get("stage3", {})
    if not stage.get("enabled", False):
        raise SystemExit("Stage 3 is disabled in railiance/app.toml")
    app = data.get("app", {})
    helm = stage2_helm_check(data) or {}
    chart = app_dir / str(helm.get("chart", f"charts/{app.get('id', 'app')}"))
    values = app_dir / "values" / "stage3-production.yaml"
    release = str(stage.get("release", app.get("id", "app")))
    namespace = str(stage.get("namespace", app.get("id", "default")))
    context = {
        "contract": str(contract_path),
        "app_dir": str(app_dir),
        "stage": "stage3",
        "namespace": namespace,
        "release": release,
        "chart": str(chart),
        "values": str(values),
        "promotion_mode": stage.get("promotion_mode"),
        "previous_stable": stage.get("previous_stable"),
        "requires_approval": bool(stage.get("requires_approval", False)),
        "evidence_expected": list(stage.get("evidence", [])),
        "checks_expected": list(stage.get("checks", [])),
    }
    context.update(app_identity(data))
    return context


def rollback_context(app_dir: Path, contract_path: Path, data: dict[str, Any]) -> dict[str, Any]:
    context = stage3_context(app_dir, contract_path, data)
    rollback = data.get("rollback", {})
    context["rollback"] = {
        "strategy": rollback.get("strategy"),
        "command_ref": "rollback.command",
        "verification": rollback.get("verification"),
    }
    return context


def promote_prechecks(app_dir: Path, context: dict[str, Any], mode: str, approval_id: str | None) -> list[dict[str, Any]]:
    checks = [precheck("app.toml", "passed", True)]
    chart = Path(context["chart"])
    values = Path(context["values"])
    checks.append(precheck("stage3-chart", "passed" if chart.exists() else "failed", True, str(chart)))
    checks.append(precheck("stage3-values", "passed" if values.exists() else "failed", True, str(values)))
    checks.append(
        precheck(
            "previous-stable",
            "passed" if context.get("previous_stable") else "failed",
            True,
            "Stage 3 must record the rollback target before promotion",
        )
    )
    if mode == "apply":
        checks.append(precheck("helm", "passed" if shutil.which("helm") else "failed", True, "helm executable"))
    else:
        checks.append(precheck("helm", "not_required", False, "plan mode does not execute helm"))
    if mode == "apply" and context.get("requires_approval"):
        checks.append(
            precheck(
                "approval-id",
                "passed" if approval_id else "failed",
                True,
                "Stage 3 requires approval before stable promotion",
            )
        )
    elif context.get("requires_approval"):
        checks.append(precheck("approval-id", "required_before_apply", False))
    return checks


def rollback_prechecks(context: dict[str, Any], mode: str, approval_id: str | None, revision: str | None) -> list[dict[str, Any]]:
    checks = [precheck("app.toml", "passed", True)]
    strategy = context.get("rollback", {}).get("strategy")
    checks.append(precheck("rollback-strategy", "passed" if strategy else "failed", True, str(strategy or "")))
    if mode == "apply":
        checks.append(precheck("helm", "passed" if shutil.which("helm") else "failed", True, "helm executable"))
        checks.append(
            precheck(
                "approval-id",
                "passed" if approval_id else "failed",
                True,
                "Rollback apply requires approval or incident evidence",
            )
        )
        if strategy == "helm-revision":
            checks.append(precheck("helm-revision", "passed" if revision else "failed", True))
    else:
        checks.append(precheck("helm", "not_required", False, "plan mode does not execute helm"))
        checks.append(precheck("approval-id", "required_before_apply", False))
        if strategy == "helm-revision":
            checks.append(precheck("helm-revision", "required_before_apply", False))
    return checks


def promote_args(context: dict[str, Any], timeout: int) -> list[str]:
    return [
        "helm",
        "upgrade",
        "--install",
        context["release"],
        context["chart"],
        "--namespace",
        context["namespace"],
        "--create-namespace",
        "-f",
        context["values"],
        "--atomic",
        "--wait",
        "--timeout",
        f"{timeout}m",
    ]


def rollback_args(context: dict[str, Any], revision: str | None, timeout: int) -> list[str]:
    args = ["helm", "rollback", context["release"]]
    # Omit the revision when none was supplied so that Helm rolls back to the
    # previous release instead of receiving the literal string "None".
    if revision:
        args.append(revision)
    args += [
        "--namespace",
        context["namespace"],
        "--wait",
        "--timeout",
        f"{timeout}m",
    ]
    return args


def promote(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Plan or apply a Stage 3 stable promotion.")
    parser.add_argument("app_dir", nargs="?", default=".")
    parser.add_argument("--mode", choices=["plan", "apply"], default="plan")
    parser.add_argument("--plan", action="store_const", const="plan", dest="mode")
    parser.add_argument("--apply", action="store_const", const="apply", dest="mode")
    parser.add_argument("--approval-id")
    parser.add_argument("--timeout-minutes", type=int, default=10)
    parser.add_argument("--json-out")
    parser.add_argument("--pretty", action="store_true")
    args = parser.parse_args(argv)

    app_dir = Path(args.app_dir).resolve()
    contract_path, data = load_contract(app_dir)
    context = stage3_context(app_dir, contract_path, data)
    checks = promote_prechecks(app_dir, context, args.mode, args.approval_id)
    failures = required_failures(checks)
    actions: list[dict[str, Any]] = []
    status = "planned" if not failures else "blocked"
    # Fail closed: Helm runs only in apply mode with no required precheck failing.
    if args.mode == "apply" and not failures:
        action = run_command(promote_args(context, args.timeout_minutes), app_dir, args.timeout_minutes * 60, "stage3.helm-promote")
        actions.append(action)
        status = "applied" if action.get("status") == "passed" else "failed"
    result: dict[str, Any] = {
        "schema_version": "railiance.stage3-promote-result.v1",
        "status": status,
        "mode": args.mode,
        "generated_at": utc_now(),
        **context,
        "approval_id": args.approval_id,
        "prechecks": checks,
        "actions": actions,
        "planned_actions": [
            {
                "action_ref": "stage3.helm-promote",
                "tool": "helm",
                "release": context["release"],
                "namespace": context["namespace"],
                "chart": context["chart"],
                "values": context["values"],
            }
        ],
        "summary": {
            "required_prechecks_failed": len(failures),
            "actions_total": len(actions),
            "actions_failed": len([item for item in actions if item.get("status") != "passed"]),
        },
    }
    return emit(result, args.json_out, args.pretty, {"planned", "applied"})


def rollback(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Plan or apply a rollback to the previous stable release.")
    parser.add_argument("app_dir", nargs="?", default=".")
    parser.add_argument("--mode", choices=["plan", "apply"], default="plan")
    parser.add_argument("--plan", action="store_const", const="plan", dest="mode")
    parser.add_argument("--apply", action="store_const", const="apply", dest="mode")
    parser.add_argument("--approval-id")
    parser.add_argument("--revision", help="Helm revision to roll back to for helm-revision strategy.")
    parser.add_argument("--timeout-minutes", type=int, default=10)
    parser.add_argument("--json-out")
    parser.add_argument("--pretty", action="store_true")
    args = parser.parse_args(argv)

    app_dir = Path(args.app_dir).resolve()
    contract_path, data = load_contract(app_dir)
    context = rollback_context(app_dir, contract_path, data)
    checks = rollback_prechecks(context, args.mode, args.approval_id, args.revision)
    failures = required_failures(checks)
    actions: list[dict[str, Any]] = []
    status = "planned" if not failures else "blocked"
    if args.mode == "apply" and not failures:
        action = run_command(
            rollback_args(context, args.revision, args.timeout_minutes),
            app_dir,
            args.timeout_minutes * 60,
            "stage3.helm-rollback",
        )
        actions.append(action)
        status = "applied" if action.get("status") == "passed" else "failed"
    result: dict[str, Any] = {
        "schema_version": "railiance.stage3-rollback-result.v1",
        "status": status,
        "mode": args.mode,
        "generated_at": utc_now(),
        **context,
        "approval_id": args.approval_id,
        "revision": args.revision,
        "prechecks": checks,
        "actions": actions,
        "planned_actions": [
            {
                "action_ref": "stage3.helm-rollback",
                "tool": "helm",
                "release": context["release"],
                "namespace": context["namespace"],
                "revision": args.revision,
            }
        ],
        "summary": {
            "required_prechecks_failed": len(failures),
            "actions_total": len(actions),
            "actions_failed": len([item for item in actions if item.get("status") != "passed"]),
        },
    }
    return emit(result, args.json_out, args.pretty, {"planned", "applied"})


def emit(result: dict[str, Any], json_out: str | None, pretty: bool, success_statuses: set[str]) -> int:
    rendered = json.dumps(result, indent=2 if pretty else None, sort_keys=True)
    print(rendered)
    if json_out:
        output = Path(json_out)
        output.parent.mkdir(parents=True, exist_ok=True)
        output.write_text(rendered + "\n", encoding="utf-8")
    # Exit non-zero for "blocked" and "failed" so callers can gate on the result.
    return 0 if result["status"] in success_statuses else 1


def main(argv: list[str]) -> int:
    if not argv:
        print("Usage: railiance-stage3 <promote|rollback> [args]", file=sys.stderr)
        return 2
    command = argv[0]
    if command == "promote":
        return promote(argv[1:])
    if command == "rollback":
        return rollback(argv[1:])
    print(f"Unknown Stage 3 command: {command}", file=sys.stderr)
    return 2


if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
@@ -152,7 +152,7 @@ digest_policy = "preferred"

 [rollback]
 strategy = "helm-revision"
-command = "bin/railiance rollback ${APP_ID}"
+command = "railiance rollback . --plan"
 verification = "Stable release health check returns 200 after rollback."

 [platform]
@@ -197,7 +197,7 @@ observation_minutes = 30
 enabled = true
 namespace = "${APP_ID}"
 release = "${APP_ID}"
-commands = ["bin/railiance promote ${APP_ID}", "bin/railiance observe ${APP_ID}"]
+commands = ["railiance promote . --plan", "railiance rollback . --plan"]
 checks = ["stage2-accepted", "rollback-target", "cluster-health"]
 evidence = ["promotion command id", "new stable digest", "post-promotion smoke"]
 requires_approval = true
@@ -748,7 +748,9 @@ change \`railiance.traffic.mode\` to \`weighted\`, set \`provider: traefik\`,
 and choose explicit stable/canary weights in \`values/stage2-canary.yaml\`.

 Before Stage 2 apply, fill in real image repositories, platform dependencies,
-observability endpoints, rollback target details, and approval evidence.
+observability endpoints, rollback target details, and approval evidence. Before
+Stage 3, run \`railiance promote . --plan\` and \`railiance rollback . --plan\`
+so stable promotion and rollback evidence can be reviewed together.
 EOF

 cat > "${OUT_DIR}/.gitignore" <<'EOF'
@@ -224,7 +224,7 @@ generated overlays to declare the repeatable Stage 2 plan commands.

 ```task
 id: RAIL-BS-WP-0006-T07
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "476198f6-0049-4ac4-9593-6723c86c9602"
 ```
@@ -242,6 +242,15 @@ Expected output:
 **Done when:** a representative app can move Stage 1 -> Stage 2 -> Stage 3 and
 back through rollback using documented commands.

+2026-06-27: Added `tools/cmd/railiance-stage3` and dispatcher entries for
+`bin/railiance promote` and `bin/railiance rollback`. Both commands default to
+non-mutating JSON plans, apply modes require approval evidence and Helm, and
+rollback apply also requires a Helm revision for `helm-revision` strategy.
+Added `docs/promote-rollback-onboarding.md` with the representative Stage 1 ->
+Stage 2 -> Stage 3 -> rollback path and explicit human approval points for
+critical workloads. Updated generated overlays to declare promote/rollback plan
+commands.
+
 ## Dependencies

 This workplan should be done before the Forgejo production cutover. It can run