# Railiance app.toml Contract

This document defines the repository-local `railiance/app.toml` contract used by Railiance staged promotion tooling. The file tells Railiance how a workload moves through Stage 1 local validation, Stage 2 production canary, and Stage 3 production promotion without relying on bespoke operator notes.

The contract is intentionally declarative. Commands, health checks, platform dependencies, and secret references are described by stable names. Plaintext secrets, bearer tokens, kubeconfigs, and private key material must never appear in `railiance/app.toml`.

The machine-readable schema lives at `schemas/railiance-app.schema.json`. A minimal example lives at `examples/railiance/app.toml`.

## File Location

Participating workload repositories declare the contract at:

```text
railiance/app.toml
```

Overlay repositories for third-party applications use the same path in the overlay repo, not in the upstream source repository.

## Versioning

Every file must include:

```toml
schema_version = "railiance.app.v1"
```

Breaking contract changes require a new schema version. Tooling must fail closed when it sees an unsupported `schema_version`.

## Top-Level Sections

### app

Identifies the workload and its ownership boundary.

Required fields:

- `id`: stable lowercase id using letters, numbers, and hyphens.
- `name`: human-readable workload name.
- `repo`: owning source or overlay repository slug.
- `owner`: owning team, domain, or operator group.
- `criticality`: one of `low`, `medium`, `high`, or `critical`.
- `description`: short purpose statement.

Production-critical workloads include source forge, identity, State Hub, Inter-Hub, databases, object stores, backup systems, ingress, and cluster-wide policy controllers. For those workloads, `criticality = "critical"` requires explicit human approval before Stage 2 traffic exposure and Stage 3 promotion.

### source

Identifies the candidate under promotion.

Required fields:

- `revision`: commit id, tag, or immutable source revision expression.
- `artifact`: artifact kind, normally `image`, `helm-chart`, or `bundle`.
- `digest_policy`: one of `required`, `preferred`, or `not-applicable`.

If an image is promoted, Stage 2 and Stage 3 tooling should prefer immutable image digests over mutable tags.

### platform.dependencies

Declares platform services required before canary or production promotion.

Each dependency has:

- `name`: stable service name.
- `kind`: dependency kind such as `postgres`, `redis`, `object-store`, `identity`, `state-hub`, `inter-hub`, `network`, or `other`.
- `required`: boolean.
- `stage`: earliest stage that needs it, one of `stage1`, `stage2`, `stage3`.
- `evidence`: non-secret evidence expected before promotion, such as a health endpoint result, Kubernetes Ready condition, or State Hub progress id.

### secrets.references

Declares required secret references without secret values.

Each reference has:

- `name`: workload-local secret name.
- `route`: approved credential route id, for example `openbao-api-key`, `key-cape-oidc-login`, or `activity-core-issue-sink`.
- `target`: non-secret target reference such as a Kubernetes Secret name, ExternalSecret name, OpenBao path, or environment variable name.
- `stage`: earliest stage that needs the secret.
- `required`: boolean.

Forbidden fields include plaintext values, tokens, passwords, kubeconfigs, or private keys. Tooling must reject suspicious field names such as `value`, `token`, `password`, `secret`, `private_key`, or `kubeconfig` inside secret reference objects unless they are part of the approved non-secret `target` text.

### observability

Defines how promotion tooling proves the workload is alive and observable.

Required fields:

- `health_endpoints`: one or more HTTP health endpoint declarations.
- `metrics`: optional metrics endpoint or query references.
- `logs`: optional log selectors or query references.

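
As a rough illustration of the `app`, `source`, `platform.dependencies`, and `secrets.references` sections described above, the fragment below sketches them for a hypothetical workload. The array-of-tables layout (`[[platform.dependencies]]`, `[[secrets.references]]`) and every value shown, including the workload id, repository slug, dependency name, and target name, are assumptions made for illustration; `schemas/railiance-app.schema.json` remains the authority on the exact shape.

```toml
# Hypothetical fragment; all values are placeholders, not taken from a real workload.
schema_version = "railiance.app.v1"

[app]
id = "example-api"
name = "Example API"
repo = "example-org/example-api"
owner = "platform-team"
criticality = "medium"
description = "Illustrative internal API used to show the contract shape."

[source]
revision = "v1.4.2"
artifact = "image"
digest_policy = "required"

# Assumed layout: one array-of-tables entry per dependency.
[[platform.dependencies]]
name = "example-api-db"
kind = "postgres"
required = true
stage = "stage2"
evidence = "Kubernetes Ready condition on the database resource"

# Assumed layout: one array-of-tables entry per secret reference.
# Only the route and a non-secret target appear; no secret value is stored here.
[[secrets.references]]
name = "api-key"
route = "openbao-api-key"
target = "example-api-credentials"
stage = "stage2"
required = true
```

Note that the secret reference carries none of the suspicious field names listed above (`value`, `token`, `password`, and so on), so it would not trip the rejection rule.
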
Health endpoint declarations include `name`, `url`, `stage`, and expected status code. URLs may be internal service URLs for Stage 2/3; they must not embed credentials.

### rollback

Defines how the workload returns to a previous stable state.

Required fields:

- `strategy`: one of `helm-revision`, `image-digest`, `traffic-shift`, `manual-runbook`, or `none`.
- `command`: command name or runbook path. This may be a placeholder before T07 implements automation, but it must tell the operator where rollback lives.
- `verification`: non-secret check to confirm rollback succeeded.

`strategy = "none"` is allowed only for Stage 1-only workloads and must not be used for production-critical workloads.

## Stage Sections

The contract has one table for each stage:

```toml
[stages.stage1]
[stages.stage2]
[stages.stage3]
```

Each stage includes:

- `enabled`: boolean.
- `namespace`: target Kubernetes namespace, or a local namespace for Stage 1.
- `release`: release identity.
- `commands`: ordered command aliases or shell commands that tooling may run.
- `checks`: ordered check ids to evaluate.
- `evidence`: expected non-secret evidence outputs.
- `requires_approval`: boolean.

Stage 2 additionally includes `canary_mode`, one of `weighted`, `header`, `path`, `shadow`, or `isolated`, plus `observation_minutes` and optional `traffic_percent` when weighted routing is used.

Stage 3 additionally includes `promotion_mode`, one of `traffic-shift`, `release-replace`, `selector-switch`, or `workflow`, plus `previous_stable`.

## Check Definitions

Checks live under `[[checks]]` entries and are referenced by stage `checks`.

Required fields:

- `id`: stable check id.
- `type`: one of `command`, `http`, `kubernetes`, `helm`, `metric`, `log`, or `manual`.
- `stage`: earliest stage that may run the check.
- `description`: human-readable purpose.
- `required`: boolean.

Type-specific fields:

- `command`: `run` command string and optional `timeout_seconds`.
- `http`: `url`, `expected_status`, and optional `timeout_seconds`.
- `kubernetes`: `namespace`, `resource`, and `condition`.
- `helm`: `chart`, `values`, and `mode` such as `template` or `server-dry-run`.
- `metric`: `query`, `window_minutes`, and `threshold`.
- `log`: `selector`, `window_minutes`, and `forbidden_patterns`.
- `manual`: `evidence_required` text.

Checks must not print secrets. If a check needs secret-backed access, the result records only the route, target object, and pass/fail state.

## Command Semantics

Commands in `app.toml` are declarations for future tooling. Until T04-T07 implement the CLI, they may point to existing scripts or runbook commands.

Expected mapping:

- Stage 1 commands are consumed by `bin/railiance run `.
- Stage 2 commands are consumed by `bin/railiance deploy --stage 2 ` and `bin/railiance observe `.
- Stage 3 commands are consumed by `bin/railiance promote ` and `bin/railiance rollback `.

Tooling must emit machine-readable results with workload identity, candidate revision, checks run, pass/fail status, non-secret evidence, rollback target, and approval state.

## Minimal Example

See `examples/railiance/app.toml`. It declares a critical internal service with:

- immutable image digest requirement;
- Stage 1 local validation;
- Stage 2 isolated canary;
- Stage 3 release replacement;
- OpenBao-routed secret references without values;
- HTTP, Helm, Kubernetes, and manual approval checks.

## Adoption Rules

A workload can enter Stage 1 when `app.toml` passes schema validation and all Stage 1 required checks are declared.

A workload can enter Stage 2 only when:

- Stage 1 passed for the same candidate artifact;
- Stage 2 namespace, release, canary mode, health checks, dependencies, and rollback target are declared;
- secret references use approved routes and contain no values;
- production-critical workloads have explicit approval.

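
To make the Stage 2 requirements more concrete, here is a minimal sketch of a Stage 2 table together with two checks it references. The field names come from the Stage Sections and Check Definitions above; the namespace, release, command alias, URL, and resource names are hypothetical, and representing `evidence` as a list of strings and placing type-specific fields directly inside each `[[checks]]` entry are assumptions rather than confirmed schema details.

```toml
# Hypothetical Stage 2 declaration; names and values are placeholders.
[stages.stage2]
enabled = true
namespace = "example-api-canary"
release = "example-api-canary"
commands = ["deploy-canary"]               # ordered command aliases
checks = ["canary-health", "canary-ready"] # ordered check ids, defined below
evidence = ["canary health result", "deployment Available condition"]
requires_approval = false
canary_mode = "weighted"
observation_minutes = 30
traffic_percent = 5                        # only meaningful with weighted routing

# HTTP check: the URL is an internal service URL and embeds no credentials.
[[checks]]
id = "canary-health"
type = "http"
stage = "stage2"
description = "Canary health endpoint responds successfully."
required = true
url = "http://example-api-canary.example-api-canary.svc:8080/healthz"
expected_status = 200
timeout_seconds = 10

# Kubernetes check: evaluates a condition on a named resource.
[[checks]]
id = "canary-ready"
type = "kubernetes"
stage = "stage2"
description = "Canary deployment reports Available."
required = true
namespace = "example-api-canary"
resource = "deployment/example-api-canary"
condition = "Available"
```

For a workload with `criticality = "critical"`, `requires_approval` would presumably need to be `true` here, since explicit approval is required before Stage 2 traffic exposure.
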
A workload can enter Stage 3 only when:

- Stage 2 acceptance gates passed for the same candidate artifact;
- `previous_stable` and rollback verification are recorded;
- backup/restore posture is current for stateful workloads;
- production-critical workloads have explicit human approval.
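
The Stage 3 preconditions can be pictured with a short sketch of the `rollback` section and a Stage 3 table. The runbook path, release names, and check id are hypothetical, and the string format used for `previous_stable` is an assumption; the contract above does not specify how the previous stable release is encoded.

```toml
# Hypothetical rollback and Stage 3 declarations; values are placeholders.
[rollback]
strategy = "helm-revision"
command = "docs/runbooks/example-api-rollback.md"  # runbook path until automation exists
verification = "Health endpoint returns 200 on the previous stable release"

[stages.stage3]
enabled = true
namespace = "example-api"
release = "example-api"
commands = ["promote-release"]
checks = ["prod-health"]  # assumed to be defined in a [[checks]] entry elsewhere in the file
evidence = ["production health result", "previous stable revision recorded"]
requires_approval = true
promotion_mode = "release-replace"
previous_stable = "helm revision 41"  # assumed encoding of the rollback target
```

Recording `previous_stable` next to a rollback `verification` gives an operator both the target to return to and a non-secret way to confirm the return succeeded.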