@@ -74,6 +74,7 @@ From two bare Linux servers, a Git repo, and valid credentials, you can rebuild
 ## Operations

 - [Deployment lifecycle](deployment-lifecycle.md)
+- [Railiance app.toml contract](app-toml-contract.md)

 ## 👥 Contributing

233
docs/app-toml-contract.md
Normal file
@@ -0,0 +1,233 @@
# Railiance app.toml Contract

This document defines the repository-local `railiance/app.toml` contract used by
Railiance staged promotion tooling. The file tells Railiance how a workload
moves through Stage 1 local validation, Stage 2 production canary, and Stage 3
production promotion without relying on bespoke operator notes.

The contract is intentionally declarative. Commands, health checks, platform
dependencies, and secret references are described by stable names. Plaintext
secrets, bearer tokens, kubeconfigs, and private key material must never appear
in `railiance/app.toml`.

The machine-readable schema lives at `schemas/railiance-app.schema.json`. A
minimal example lives at `examples/railiance/app.toml`.

## File Location

Participating workload repositories declare the contract at:

```text
railiance/app.toml
```

Overlay repositories for third-party applications use the same path in the
overlay repo, not in the upstream source repository.

## Versioning

Every file must include:

```toml
schema_version = "railiance.app.v1"
```

Breaking contract changes require a new schema version. Tooling must fail closed
when it sees an unsupported `schema_version`.

## Top-Level Sections

### app

Identifies the workload and its ownership boundary.

Required fields:

- `id`: stable lowercase id using letters, numbers, and hyphens.
- `name`: human-readable workload name.
- `repo`: owning source or overlay repository slug.
- `owner`: owning team, domain, or operator group.
- `criticality`: one of `low`, `medium`, `high`, or `critical`.
- `description`: short purpose statement.

Production-critical workloads include source forge, identity, State Hub,
Inter-Hub, databases, object stores, backup systems, ingress, and cluster-wide
policy controllers. For those workloads, `criticality = "critical"` requires
explicit human approval before Stage 2 traffic exposure and Stage 3 promotion.

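
As a rough illustration of the fields listed above, an `[app]` table might look like the sketch below. The table name follows the section heading, and every value is hypothetical; the schema at `schemas/railiance-app.schema.json` remains the authority on exact layout.

```toml
# Hypothetical workload; none of these values come from a real repository.
[app]
id = "example-api"                 # lowercase letters, numbers, hyphens
name = "Example API"
repo = "example-org/example-api"   # assumed slug format
owner = "platform-team"
criticality = "medium"             # low | medium | high | critical
description = "Internal API used to illustrate the contract."
```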
### source

Identifies the candidate under promotion.

Required fields:

- `revision`: commit id, tag, or immutable source revision expression.
- `artifact`: artifact kind, normally `image`, `helm-chart`, or `bundle`.
- `digest_policy`: one of `required`, `preferred`, or `not-applicable`.

If an image is promoted, Stage 2 and Stage 3 tooling should prefer immutable
image digests over mutable tags.

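
A minimal sketch of a `[source]` table for an image candidate follows. The commit id is a placeholder, and the table name is assumed from the section heading.

```toml
# Hypothetical candidate; the revision below is a placeholder commit id.
[source]
revision = "0123456789abcdef0123456789abcdef01234567"
artifact = "image"            # image | helm-chart | bundle
digest_policy = "required"    # required | preferred | not-applicable
```

Setting `digest_policy = "required"` matches the stated preference for immutable digests over mutable tags.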
### platform.dependencies

Declares platform services required before canary or production promotion.

Each dependency has:

- `name`: stable service name.
- `kind`: dependency kind such as `postgres`, `redis`, `object-store`,
  `identity`, `state-hub`, `inter-hub`, `network`, or `other`.
- `required`: boolean.
- `stage`: earliest stage that needs it, one of `stage1`, `stage2`, `stage3`.
- `evidence`: non-secret evidence expected before promotion, such as a health
  endpoint result, Kubernetes Ready condition, or State Hub progress id.

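
The dependency fields above could be written as an array of tables, one entry per dependency. This sketch assumes that `platform.dependencies` is a TOML array of tables and that `evidence` is free text; both are assumptions, and the service names are hypothetical.

```toml
# Hypothetical dependencies; names and evidence text are illustrative.
[[platform.dependencies]]
name = "example-postgres"
kind = "postgres"
required = true
stage = "stage2"      # earliest stage that needs the dependency
evidence = "Kubernetes Ready condition on the database resource"

[[platform.dependencies]]
name = "state-hub"
kind = "state-hub"
required = false
stage = "stage3"
evidence = "State Hub progress id recorded for the promotion"
```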
### secrets.references

Declares required secret references without secret values.

Each reference has:

- `name`: workload-local secret name.
- `route`: approved credential route id, for example `openbao-api-key`,
  `key-cape-oidc-login`, or `activity-core-issue-sink`.
- `target`: non-secret target reference such as a Kubernetes Secret name,
  ExternalSecret name, OpenBao path, or environment variable name.
- `stage`: earliest stage that needs the secret.
- `required`: boolean.

Forbidden fields include plaintext values, tokens, passwords, kubeconfigs, and
private keys. Tooling must reject suspicious field names such as `value`,
`token`, `password`, `secret`, `private_key`, or `kubeconfig` inside secret
reference objects. Such words are acceptable only as part of the approved
non-secret `target` text.

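
For illustration, a single secret reference might be declared as below. The array-of-tables layout is an assumption based on the section heading, the route id is one of the examples named above, and the secret name and target are hypothetical. Note that the entry carries only a pointer to the secret, never its value.

```toml
# Hypothetical reference; no secret material appears anywhere in the entry.
[[secrets.references]]
name = "database-credentials"
route = "openbao-api-key"     # approved credential route id
target = "example-api-db"     # e.g. a Kubernetes Secret name (non-secret)
stage = "stage2"              # earliest stage that needs the secret
required = true
```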
### observability

Defines how promotion tooling proves the workload is alive and observable.

Fields:

- `health_endpoints`: required; one or more HTTP health endpoint declarations.
- `metrics`: optional metrics endpoint or query references.
- `logs`: optional log selectors or query references.

Health endpoint declarations include `name`, `url`, `stage`, and expected
status code. URLs may be internal service URLs for Stage 2/3; they must not
embed credentials.

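
A health endpoint declaration could be sketched as follows. The key `expected_status` is borrowed from the `http` check type and is an assumption here, as is the array-of-tables layout; the URL is a hypothetical in-cluster address with no embedded credentials. The optional `metrics` and `logs` fields are omitted because their shape is not described in this document.

```toml
# Hypothetical endpoint; the key name for the status code is assumed.
[[observability.health_endpoints]]
name = "readiness"
url = "http://example-api.example-canary.svc.cluster.local:8080/healthz"
stage = "stage2"
expected_status = 200
```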
### rollback

Defines how the workload returns to a previous stable state.

Required fields:

- `strategy`: one of `helm-revision`, `image-digest`, `traffic-shift`,
  `manual-runbook`, or `none`.
- `command`: command name or runbook path. This may be a placeholder before
  T07 implements automation, but it must tell the operator where rollback lives.
- `verification`: non-secret check to confirm rollback succeeded.

`strategy = "none"` is allowed only for Stage 1-only workloads and must not be
used for production-critical workloads.

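
As a sketch, a workload that rolls back through a Helm revision and still relies on a runbook might declare the table below. The runbook path and verification text are hypothetical, and treating `verification` as free text is an assumption.

```toml
# Hypothetical rollback declaration; the runbook path does not exist.
[rollback]
strategy = "helm-revision"
command = "docs/runbooks/example-api-rollback.md"   # where rollback lives
verification = "readiness endpoint returns the expected status on the previous stable release"
```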
## Stage Sections

The contract has one table for each stage:

```toml
[stages.stage1]
[stages.stage2]
[stages.stage3]
```

Each stage includes:

- `enabled`: boolean.
- `namespace`: target Kubernetes namespace, or a local namespace for Stage 1.
- `release`: release identity.
- `commands`: ordered command aliases or shell commands that tooling may run.
- `checks`: ordered check ids to evaluate.
- `evidence`: expected non-secret evidence outputs.
- `requires_approval`: boolean.

Stage 2 additionally includes `canary_mode`, one of `weighted`, `header`,
`path`, `shadow`, or `isolated`, plus `observation_minutes` and optional
`traffic_percent` when weighted routing is used.

Stage 3 additionally includes `promotion_mode`, one of `traffic-shift`,
`release-replace`, `selector-switch`, or `workflow`, plus `previous_stable`.

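
Filled in, the three stage tables might look like the sketch below. All names, commands, and check ids are hypothetical; each check id would need a matching `[[checks]]` entry. Representing `evidence` as a list of strings and `previous_stable` as a string are assumptions.

```toml
# Hypothetical stage declarations for one workload.
[stages.stage1]
enabled = true
namespace = "local"
release = "example-api-local"
commands = ["make test"]
checks = ["helm-template"]
evidence = ["test-report"]
requires_approval = false

[stages.stage2]
enabled = true
namespace = "example-canary"
release = "example-api-canary"
commands = ["deploy-canary"]
checks = ["canary-readiness"]
evidence = ["health-check-result"]
requires_approval = true
canary_mode = "weighted"        # weighted | header | path | shadow | isolated
observation_minutes = 30
traffic_percent = 10            # only meaningful with weighted routing

[stages.stage3]
enabled = true
namespace = "example-prod"
release = "example-api"
commands = ["promote"]
checks = ["canary-readiness", "operator-approval"]
evidence = ["promotion-record"]
requires_approval = true
promotion_mode = "release-replace"
previous_stable = "example-api@revision-12"   # assumed format
```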
## Check Definitions

Checks live under `[[checks]]` entries and are referenced by `id` from each
stage's `checks` list.

Required fields:

- `id`: stable check id.
- `type`: one of `command`, `http`, `kubernetes`, `helm`, `metric`, `log`, or
  `manual`.
- `stage`: earliest stage that may run the check.
- `description`: human-readable purpose.
- `required`: boolean.

Type-specific fields:

- `command`: `run` command string and optional `timeout_seconds`.
- `http`: `url`, `expected_status`, and optional `timeout_seconds`.
- `kubernetes`: `namespace`, `resource`, and `condition`.
- `helm`: `chart`, `values`, and `mode` such as `template` or
  `server-dry-run`.
- `metric`: `query`, `window_minutes`, and `threshold`.
- `log`: `selector`, `window_minutes`, and `forbidden_patterns`.
- `manual`: `evidence_required` text.

Checks must not print secrets. If a check needs secret-backed access, the result
records only the route, target object, and pass/fail state.

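
The following sketch shows three check entries of different types. It assumes that type-specific fields sit directly in the check's table and that `values` is a single file path; neither is confirmed by this document. Ids, paths, and the URL are hypothetical.

```toml
# Hypothetical checks; field placement is assumed to be flat.
[[checks]]
id = "helm-template"
type = "helm"
stage = "stage1"
description = "Chart renders with the candidate values."
required = true
chart = "charts/example-api"
values = "charts/example-api/values.yaml"
mode = "template"

[[checks]]
id = "canary-readiness"
type = "http"
stage = "stage2"
description = "Canary answers on its readiness endpoint."
required = true
url = "http://example-api.example-canary.svc.cluster.local:8080/healthz"
expected_status = 200
timeout_seconds = 10

[[checks]]
id = "operator-approval"
type = "manual"
stage = "stage3"
description = "Operator confirms the promotion."
required = true
evidence_required = "Approval record id from the operator on duty."
```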
## Command Semantics

Commands in `app.toml` are declarations for future tooling. Until T04-T07
implement the CLI, they may point to existing scripts or runbook commands.

Expected mapping:

- Stage 1 commands are consumed by `bin/railiance run <app>`.
- Stage 2 commands are consumed by `bin/railiance deploy --stage 2 <app>` and
  `bin/railiance observe <app>`.
- Stage 3 commands are consumed by `bin/railiance promote <app>` and
  `bin/railiance rollback <app>`.

Tooling must emit machine-readable results with workload identity, candidate
revision, checks run, pass/fail status, non-secret evidence, rollback target,
and approval state.

## Minimal Example

See `examples/railiance/app.toml`. It declares a critical internal service with:

- immutable image digest requirement;
- Stage 1 local validation;
- Stage 2 isolated canary;
- Stage 3 release replacement;
- OpenBao-routed secret references without values;
- HTTP, Helm, Kubernetes, and manual approval checks.

## Adoption Rules

A workload can enter Stage 1 when `app.toml` passes schema validation and all
Stage 1 required checks are declared.

A workload can enter Stage 2 only when:

- Stage 1 passed for the same candidate artifact;
- Stage 2 namespace, release, canary mode, health checks, dependencies, and
  rollback target are declared;
- secret references use approved routes and contain no values;
- production-critical workloads have explicit human approval.

A workload can enter Stage 3 only when:

- Stage 2 acceptance gates passed for the same candidate artifact;
- `previous_stable` and rollback verification are recorded;
- backup/restore posture is current for stateful workloads;
- production-critical workloads have explicit human approval.

@@ -54,9 +54,10 @@ Each stage emits a machine-readable result with:
 ## Workload Declaration

 Each participating workload should declare its promotion contract in a
-repository-local `railiance/app.toml`. The full schema is defined by the next
-workplan task, but this lifecycle expects every workload declaration to provide
-at least:
+repository-local `railiance/app.toml`. The contract is defined in
+`docs/app-toml-contract.md`, with a machine-readable schema at
+`schemas/railiance-app.schema.json`. This lifecycle expects every workload
+declaration to provide at least:

 - stable workload name and owning repo;
 - source revision, image tag, or image digest policy;