# Railiance app.toml Contract

This document defines the repository-local `railiance/app.toml` contract used by
Railiance staged promotion tooling. The file tells Railiance how a workload
moves through Stage 1 local validation, Stage 2 production canary, and Stage 3
production promotion without relying on bespoke operator notes.

The contract is intentionally declarative. Commands, health checks, platform
dependencies, and secret references are described by stable names. Plaintext
secrets, bearer tokens, kubeconfigs, and private key material must never appear
in `railiance/app.toml`.

The machine-readable schema lives at `schemas/railiance-app.schema.json`. A
minimal example lives at `examples/railiance/app.toml`.

## File Location

Participating workload repositories declare the contract at:

```text
railiance/app.toml
```

Overlay repositories for third-party applications use the same path in the
overlay repo, not in the upstream source repository.

## Versioning

Every file must include:

```toml
schema_version = "railiance.app.v1"
```

Breaking contract changes require a new schema version. Tooling must fail closed
when it sees an unsupported `schema_version`.

## Top-Level Sections

### app

Identifies the workload and its ownership boundary.

Required fields:

- `id`: stable lowercase id using letters, numbers, and hyphens.
- `name`: human-readable workload name.
- `repo`: owning source or overlay repository slug.
- `owner`: owning team, domain, or operator group.
- `criticality`: one of `low`, `medium`, `high`, or `critical`.
- `description`: short purpose statement.

Production-critical workloads include source forge, identity, State Hub,
Inter-Hub, databases, object stores, backup systems, ingress, and cluster-wide
policy controllers. For those workloads, `criticality = "critical"` requires
explicit human approval before Stage 2 traffic exposure and Stage 3 promotion.
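
As an illustration of the fields listed above, an `[app]` table might look like
the following sketch. The workload, repository slug, and owner are hypothetical
placeholders, not values taken from any real Railiance repository.

```toml
# Hypothetical workload; every value below is a placeholder.
[app]
id = "example-api"                    # lowercase letters, numbers, hyphens
name = "Example API"
repo = "example-org/example-api"      # assumed slug format
owner = "platform-team"
criticality = "high"                  # low | medium | high | critical
description = "Internal API used to illustrate the app table."
```

Setting `criticality = "critical"` instead would bring the human-approval
requirement described above into play for Stage 2 and Stage 3.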

### source

Identifies the candidate under promotion.

Required fields:

- `revision`: commit id, tag, or immutable source revision expression.
- `artifact`: artifact kind, normally `image`, `helm-chart`, or `bundle`.
- `digest_policy`: one of `required`, `preferred`, or `not-applicable`.

If an image is promoted, Stage 2 and Stage 3 tooling should prefer immutable
image digests over mutable tags.
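
A `[source]` table for an image-based workload could be sketched as below. The
revision is a made-up placeholder commit id, and `digest_policy = "required"`
is chosen only to show the strictest option.

```toml
[source]
# Placeholder commit id; a tag or other immutable revision expression also fits.
revision = "0123456789abcdef0123456789abcdef01234567"
artifact = "image"            # normally image | helm-chart | bundle
digest_policy = "required"    # required | preferred | not-applicable
```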

### platform.dependencies

Declares platform services required before canary or production promotion.

Each dependency has:

- `name`: stable service name.
- `kind`: dependency kind such as `postgres`, `redis`, `object-store`,
  `identity`, `state-hub`, `inter-hub`, `network`, or `other`.
- `required`: boolean.
- `stage`: earliest stage that needs it, one of `stage1`, `stage2`, `stage3`.
- `evidence`: non-secret evidence expected before promotion, such as a health
  endpoint result, Kubernetes Ready condition, or State Hub progress id.
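
The sketch below shows two dependency entries. It assumes dependencies are
written as a TOML array of tables under `platform.dependencies` and that
`evidence` is a free-text string; the schema file is the authority on both
points. Service names are hypothetical.

```toml
[[platform.dependencies]]
name = "example-postgres"     # hypothetical service name
kind = "postgres"
required = true
stage = "stage2"              # earliest stage that needs it
evidence = "Kubernetes Ready condition on the database resource"

[[platform.dependencies]]
name = "state-hub"
kind = "state-hub"
required = false
stage = "stage3"
evidence = "State Hub progress id recorded before promotion"
```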

### secrets.references

Declares required secret references without secret values.

Each reference has:

- `name`: workload-local secret name.
- `route`: approved credential route id, for example `openbao-api-key`,
  `key-cape-oidc-login`, or `activity-core-issue-sink`.
- `target`: non-secret target reference such as a Kubernetes Secret name,
  ExternalSecret name, OpenBao path, or environment variable name.
- `stage`: earliest stage that needs the secret.
- `required`: boolean.

Forbidden content includes plaintext values, tokens, passwords, kubeconfigs, and
private keys. Tooling must reject suspicious field names such as `value`,
`token`, `password`, `secret`, `private_key`, or `kubeconfig` inside secret
reference objects unless they are part of the approved non-secret `target` text.
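
A secret reference that follows these rules might look like the sketch below.
It assumes an array of tables under `secrets.references` and the same
`stage1`/`stage2`/`stage3` values used elsewhere in the contract; the secret
name and target are hypothetical. Only the route and the target name appear,
never the secret material itself.

```toml
[[secrets.references]]
name = "api-key"                      # workload-local name (placeholder)
route = "openbao-api-key"             # approved credential route id
target = "example-api-credentials"    # hypothetical Kubernetes Secret name
stage = "stage2"
required = true
# No value, token, password, private_key, or kubeconfig keys belong here;
# tooling is expected to reject them.
```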

### observability

Defines how promotion tooling proves the workload is alive and observable.

Fields:

- `health_endpoints`: required; one or more HTTP health endpoint declarations.
- `metrics`: optional metrics endpoint or query references.
- `logs`: optional log selectors or query references.

Health endpoint declarations include `name`, `url`, `stage`, and the expected
status code. URLs may be internal service URLs for Stage 2 and Stage 3; they
must not embed credentials.
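
A single health endpoint declaration could be sketched as follows. The key
`expected_status` is an assumption borrowed from the `http` check type, since
this section does not name the key for the expected status code; the URL and
namespace are hypothetical. The optional `metrics` and `logs` fields are
omitted because their shape is not described here.

```toml
[[observability.health_endpoints]]
name = "readiness"
# Internal service URL with no embedded credentials (placeholder host).
url = "http://example-api.example-canary.svc.cluster.local:8080/healthz"
stage = "stage2"
expected_status = 200     # assumed key name
```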

### rollback

Defines how the workload returns to a previous stable state.

Required fields:

- `strategy`: one of `helm-revision`, `image-digest`, `traffic-shift`,
  `manual-runbook`, or `none`.
- `command`: command name or runbook path. This may be a placeholder before
  T07 implements automation, but it must tell the operator where rollback lives.
- `verification`: non-secret check to confirm rollback succeeded.

`strategy = "none"` is allowed only for Stage 1-only workloads and must not be
used for production-critical workloads.
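
For example, a workload that rolls back through Helm revisions and still relies
on a runbook might declare something like this. The runbook path and the
verification wording are hypothetical.

```toml
[rollback]
strategy = "helm-revision"
# Placeholder runbook path; it only has to tell the operator where rollback lives.
command = "docs/runbooks/example-api-rollback.md"
verification = "readiness endpoint returns 200 on the previous release"
```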

## Stage Sections

The contract has one table for each stage:

```toml
[stages.stage1]
[stages.stage2]
[stages.stage3]
```

Each stage includes:

- `enabled`: boolean.
- `namespace`: target Kubernetes namespace, or a local namespace for Stage 1.
- `release`: release identity.
- `commands`: ordered command aliases or shell commands that tooling may run.
- `checks`: ordered check ids to evaluate.
- `evidence`: expected non-secret evidence outputs.
- `requires_approval`: boolean.

Stage 2 additionally includes `canary_mode`, one of `weighted`, `header`,
`path`, `shadow`, or `isolated`, plus `observation_minutes` and optional
`traffic_percent` when weighted routing is used.

Stage 3 additionally includes `promotion_mode`, one of `traffic-shift`,
`release-replace`, `selector-switch`, or `workflow`, plus `previous_stable`.
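
Filled in, the three stage tables might look like the sketch below. Several
details are assumptions rather than contract facts: `commands`, `checks`, and
`evidence` are shown as arrays of strings, the check ids and command aliases
are hypothetical, and the format of `previous_stable` is not specified here,
so a placeholder image reference is used.

```toml
[stages.stage1]
enabled = true
namespace = "local"
release = "example-api-local"
commands = ["make test"]                  # hypothetical shell command
checks = ["helm-template"]                # hypothetical check id
evidence = ["rendered chart output"]
requires_approval = false

[stages.stage2]
enabled = true
namespace = "example-canary"
release = "example-api-canary"
commands = ["deploy-canary"]              # hypothetical command alias
checks = ["canary-health"]
evidence = ["canary health check result"]
requires_approval = true
canary_mode = "weighted"
observation_minutes = 30
traffic_percent = 5                       # only meaningful for weighted routing

[stages.stage3]
enabled = true
namespace = "example-prod"
release = "example-api"
commands = ["promote"]
checks = ["canary-health"]
evidence = ["promotion result"]
requires_approval = true
promotion_mode = "release-replace"
# Placeholder; substitute the reference of the current stable release.
previous_stable = "registry.example.com/example-api@sha256:<current-stable-digest>"
```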

## Check Definitions

Checks are declared as `[[checks]]` entries and referenced by id from each
stage's `checks` list.

Required fields:

- `id`: stable check id.
- `type`: one of `command`, `http`, `kubernetes`, `helm`, `metric`, `log`, or
  `manual`.
- `stage`: earliest stage that may run the check.
- `description`: human-readable purpose.
- `required`: boolean.

Type-specific fields:

- `command`: `run` command string and optional `timeout_seconds`.
- `http`: `url`, `expected_status`, and optional `timeout_seconds`.
- `kubernetes`: `namespace`, `resource`, and `condition`.
- `helm`: `chart`, `values`, and `mode` such as `template` or
  `server-dry-run`.
- `metric`: `query`, `window_minutes`, and `threshold`.
- `log`: `selector`, `window_minutes`, and `forbidden_patterns`.
- `manual`: `evidence_required` text.

Checks must not print secrets. If a check needs secret-backed access, the result
records only the route, target object, and pass/fail state.
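
The sketch below shows three check entries, one each for the `helm`, `http`,
and `manual` types. It assumes the type-specific fields sit as flat keys in the
same `[[checks]]` table and that `values` takes a single path string; the
schema may structure either differently. All ids, paths, and URLs are
hypothetical.

```toml
[[checks]]
id = "helm-template"
type = "helm"
stage = "stage1"
description = "Chart renders with the candidate values."
required = true
chart = "charts/example-api"                  # placeholder chart path
values = "charts/example-api/values.yaml"     # assumed to be a single path
mode = "template"

[[checks]]
id = "canary-health"
type = "http"
stage = "stage2"
description = "Canary readiness endpoint responds."
required = true
url = "http://example-api.example-canary.svc.cluster.local:8080/healthz"
expected_status = 200
timeout_seconds = 10

[[checks]]
id = "operator-approval"
type = "manual"
stage = "stage2"
description = "Human approval before canary traffic exposure."
required = true
evidence_required = "Approver identity and approval record id."
```

A stage would then reference these entries by id, for example
`checks = ["canary-health", "operator-approval"]` in `[stages.stage2]`.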

## Command Semantics

Commands in `app.toml` are declarations for future tooling. Until T04-T07
implement the CLI, they may point to existing scripts or runbook commands.

Expected mapping:

- Stage 1 commands are consumed by `bin/railiance run <app>`.
- Stage 2 commands are consumed by `bin/railiance deploy --stage 2 <app>` and
  `bin/railiance observe <app>`.
- Stage 3 commands are consumed by `bin/railiance promote <app>` and
  `bin/railiance rollback <app>`.

Tooling must emit machine-readable results with workload identity, candidate
revision, checks run, pass/fail status, non-secret evidence, rollback target,
and approval state.

## Minimal Example

See `examples/railiance/app.toml`. It declares a critical internal service with:

- immutable image digest requirement;
- Stage 1 local validation;
- Stage 2 isolated canary;
- Stage 3 release replacement;
- OpenBao-routed secret references without values;
- HTTP, Helm, Kubernetes, and manual approval checks.

## Adoption Rules

A workload can enter Stage 1 when `app.toml` passes schema validation and all
Stage 1 required checks are declared.

A workload can enter Stage 2 only when:

- Stage 1 passed for the same candidate artifact;
- Stage 2 namespace, release, canary mode, health checks, dependencies, and
  rollback target are declared;
- secret references use approved routes and contain no values;
- production-critical workloads have explicit approval.

A workload can enter Stage 3 only when:

- Stage 2 acceptance gates passed for the same candidate artifact;
- `previous_stable` and rollback verification are recorded;
- backup/restore posture is current for stateful workloads;
- production-critical workloads have explicit human approval.