ConfigLayering
Bernd Worsch edited this page 2026-06-26 13:53:38 +00:00

Introduction to ConfigLayering

ConfigLayering is the practice of composing the effective configuration of a system from multiple ordered configuration scopes. Instead of assuming that configuration lives in one file, one database table, one environment, or one operations team, ConfigLayering recognizes that real companies accumulate configuration across many places: application defaults, infrastructure code, deployment environments, security policies, tenant settings, feature flags, secrets, operational overrides, user preferences, and emergency controls.

The key insight is that configuration is not merely data. Configuration is distributed control information. It determines how systems behave, which capabilities are available, who may access what, which limits apply, which integrations are active, how risks are constrained, and how business rules become executable. In a fast-moving company, this control information is rarely cleanly centralized. It is layered across systems, teams, tools, vendors, environments, tenants, and responsibilities.

ConfigLayering therefore asks a central question:

How is the final, effective configuration of a system produced from all relevant scopes, and can that result be discovered, explained, validated, governed, and safely changed?

A simple layering model may start with product defaults, then add company baselines, platform or domain settings, environment-specific overlays, regional or cluster-specific settings, installation-specific settings, tenant or customer settings, role or group settings, user or agent settings, and finally temporary runtime or incident overrides. Each layer contributes values, constraints, defaults, or policies. More specific layers may override broader layers, but some higher-level layers may define non-overridable guardrails.

This makes ConfigLayering both a technical and organizational discipline. Technically, it requires clear precedence, schema validation, merge rules, runtime delivery, secrets handling, feature-flag separation, policy enforcement, rollback, and observability. Organizationally, it requires ownership, scope boundaries, change authority, evidence, auditability, and conflict resolution.

The most important concept is the effective configuration: the final configuration that actually applies to a given system, environment, tenant, user, request, or operational situation. Individual files or settings are only partial evidence. The effective configuration is the resolved result of all relevant layers. Without visibility into this result, organizations can know what was configured somewhere, but not what is actually in force.

This has several implications.

First, configuration needs a scope model. A company must know whether a setting belongs to the product, the platform, an environment, a region, an installation, a tenant, a group, a user, an agent, or an emergency override. Without a scope model, configuration becomes a mixture of local decisions and inherited assumptions.

Second, configuration needs an ownership model. Every meaningful key should have an owner, a purpose, a lifecycle, and a change process. A setting without an owner becomes operational debt. A setting with multiple implicit owners becomes a conflict surface.

Third, configuration needs explicit precedence and merge semantics. It must be clear which layer wins, whether objects are deep-merged, whether lists are replaced or merged by key, whether null means “unset” or “delete,” and whether a value may be overridden at all.

Fourth, configuration needs separation by kind. Ordinary runtime configuration, infrastructure desired state, secrets, feature flags, policies, tenant entitlements, and emergency controls should not be treated as the same thing. They differ in sensitivity, lifecycle, blast radius, mutability, and governance requirements.

Fifth, configuration needs evidence. For every effective value, it should be possible to answer: where did this value come from, what did it override, who owns it, when was it changed, why was it changed, which systems consume it, and how can it be rolled back?

Best practice is to treat ConfigLayering as a governed configuration supply chain. Configuration should be declared as close as possible to its natural owner, but as high as necessary for consistency and control. Product defaults belong with the product. Company baselines and security guardrails belong in controlled central layers. Environment and deployment settings belong in platform or operations layers. Tenant settings belong in tenant-governed scopes. Secrets belong in dedicated secret management. Feature flags belong in runtime control infrastructure. Emergency overrides require strong audit and expiry.

A practical ConfigLayering standard should define:

  1. A canonical configuration registry for all known keys.
  2. A scope model describing where configuration may exist.
  3. A precedence model describing which layers override others.
  4. A schema model describing valid types, ranges, defaults, and constraints.
  5. A policy model describing what may never be overridden.
  6. A secrets model keeping sensitive values outside ordinary configuration.
  7. A feature-control model for runtime behavior switches.
  8. An evidence model for audit, rollback, drift detection, and effective-config explanation.
  9. A lifecycle model for deprecating, replacing, and migrating configuration keys.
  10. An observability model for inspecting redacted effective configuration safely.
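The last item, a redacted effective-config view, could look roughly like the following sketch. It assumes each key carries a security class in the registry; the class name `secret` and the `<redacted>` placeholder are illustrative choices, not part of any standard.

```python
from typing import Any

def redacted_view(effective: dict[str, Any],
                  security_class: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the effective config that is safe to display."""
    view = {}
    for key, value in effective.items():
        # Fail closed: a key missing from the registry is treated as secret.
        cls = security_class.get(key, "secret")
        view[key] = "<redacted>" if cls == "secret" else value
    return view

effective = {
    "mail.delivery.max_batch_size": 1000,
    "smtp.password": "hunter2",       # hypothetical key
    "experimental.unregistered": 42,  # hypothetical key, not in the registry
}
classes = {
    "mail.delivery.max_batch_size": "operational",
    "smtp.password": "secret",
}
safe = redacted_view(effective, classes)
print(safe)
```

Treating unregistered keys as secret means a forgotten registry entry hides a value instead of leaking it.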

The goal is not to centralize every setting into one giant configuration database. That would create a different kind of fragility. The goal is to make the distributed configuration surface of the company discoverable, explainable, governable, and safe to evolve.

In this sense, ConfigLayering is the foundation of a Configuration Control Plane. It turns scattered configuration from an unmanaged source of operational risk into a visible, structured, and auditable company capability.

What config layering is

Config layering means building the final, effective configuration for a system from multiple ordered scopes. Each layer contributes defaults, constraints, or overrides. The result should be deterministic and explainable.

A simple model:

product defaults
  < company baseline
  < domain/platform baseline
  < environment: dev/stage/prod
  < region / datacenter / cluster
  < installation / deployment
  < tenant / customer / community
  < group / role
  < user / agent
  < temporary runtime / incident override

“More specific wins” is the normal rule, but company or security layers may define non-overridable guardrails.

This is the same basic pattern behind Kubernetes Kustomize bases/overlays, Helm values precedence, NixOS modules, and many application configuration frameworks: start with a reusable base, apply increasingly specific overlays, and produce one effective configuration. Kubernetes documents Kustomize in terms of bases and overlays, Helm explicitly defines override precedence for values, and NixOS uses a modular declarative system configuration model. ([Kubernetes]1)
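As a minimal sketch of "more specific wins, except for guardrails", the resolver below walks a shortened layer list from broadest to most specific. The layer names, the `locked` mapping, and the sample keys are assumptions made for illustration; flat keys are used to keep merge semantics out of the picture.

```python
from typing import Any

# Broadest layer first, most specific last (shortened, illustrative list).
LAYER_ORDER = ["product", "company", "environment", "tenant", "incident"]

def resolve(layers: dict[str, dict[str, Any]],
            locked: dict[str, str]) -> tuple[dict[str, Any], list[tuple[str, str]]]:
    """Resolve flat layers; `locked` maps a key to the layer that pins it."""
    effective: dict[str, Any] = {}
    rejected: list[tuple[str, str]] = []
    for rank, name in enumerate(LAYER_ORDER):
        for key, value in layers.get(name, {}).items():
            pinned_by = locked.get(key)
            # A guardrail pinned at a broader layer blocks more specific layers.
            if pinned_by is not None and rank > LAYER_ORDER.index(pinned_by):
                rejected.append((name, key))
                continue
            effective[key] = value
    return effective, rejected

layers = {
    "product": {"tls.min_version": "1.2", "mail.batch": 500},
    "company": {"tls.min_version": "1.3"},
    "tenant": {"tls.min_version": "1.0", "mail.batch": 1000},
}
effective, rejected = resolve(layers, locked={"tls.min_version": "company"})
print(effective)  # {'tls.min_version': '1.3', 'mail.batch': 1000}
print(rejected)   # [('tenant', 'tls.min_version')]
```

Returning the rejected overrides, rather than dropping them silently, lets a pipeline fail the change or report it to the owning team.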

The most important distinction

Do not treat all config as one thing. A companywide config strategy should separate at least these categories:

  1. Code defaults: safe defaults shipped with the app.
  2. Deployment config: environment, endpoints, resource limits, region, cluster, runtime mode.
  3. Secrets: passwords, tokens, keys, certificates.
  4. Feature flags: runtime behavior switches and experiments.
  5. Policy config: access rules, compliance constraints, guardrails.
  6. Tenant/customer config: entitlements, limits, preferences, routing.
  7. Operational overrides: incident switches, kill switches, temporary throttles.
  8. Infrastructure desired state: machines, networks, Kubernetes objects, IAM, storage.

The Twelve-Factor App principle is still useful here: configuration should be separated from code, commonly exposed through the environment at runtime. In Kubernetes, ConfigMaps are explicitly meant to decouple environment-specific configuration from container images, while Secrets are a separate object type for sensitive values. ([12factor.net]2)

Best-practice architecture

For systemwide or companywide config management, one workable structure is:

Config Registry
  - canonical keys
  - descriptions
  - owners
  - schema
  - allowed scopes
  - mutability class
  - security classification
  - default value policy

Config Sources
  - Git repos for declarative desired state
  - secret manager for secrets
  - runtime config / feature flag service for dynamic behavior
  - tenant/admin UI for allowed business-level settings

Config Resolution Engine
  - deterministic layer ordering
  - validation
  - conflict detection
  - policy enforcement
  - effective-config rendering

Config Distribution
  - env vars
  - generated files
  - Kubernetes ConfigMaps / Secrets
  - sidecar / agent
  - SDK lookup
  - API lookup

Config Evidence
  - audit log
  - effective config snapshots
  - who changed what, when, why
  - rollback points
  - drift detection

The core idea: Git for declarative desired state, a secret manager for secrets, a feature/config service for dynamic runtime behavior, and policy-as-code for guardrails. Argo CD describes the GitOps model as automating desired application states in target environments; Terraform similarly uses human-readable declarative configuration files for infrastructure lifecycle management. ([argo-cd.readthedocs.io]3)
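The Config Evidence entries above (who changed what, when, why, and rollback points) could be captured along these lines. This is a sketch only: the `ChangeRecord` fields and the in-memory `log` list are assumptions, and a real system would write to durable, append-only storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any

@dataclass(frozen=True)
class ChangeRecord:
    key: str
    previous: Any
    new: Any
    changed_by: str
    reason: str
    scope: str
    changed_at: datetime

def record_change(config: dict, key: str, new: Any, changed_by: str,
                  reason: str, scope: str, log: list) -> ChangeRecord:
    """Apply a change and append its evidence record to the log."""
    if not reason.strip():
        raise ValueError("a config change needs a stated reason")
    record = ChangeRecord(key, config.get(key), new, changed_by,
                          reason, scope, datetime.now(timezone.utc))
    config[key] = new
    log.append(record)
    return record

def rollback(config: dict, record: ChangeRecord,
             changed_by: str, log: list) -> ChangeRecord:
    """Restore the previous value; the rollback is itself a recorded change."""
    reason = f"rollback of change made at {record.changed_at.isoformat()}"
    return record_change(config, record.key, record.previous,
                         changed_by, reason, record.scope, log)
```

One known limitation of this sketch: if the key did not exist before the change, rolling back sets it to `None` rather than removing it.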

A practical companywide layering standard could look like this:

L0 vendor/product defaults
L1 company baseline
L2 platform/domain baseline
L3 environment overlay: dev, test, stage, prod
L4 region/zone/cluster overlay
L5 installation/deployment overlay
L6 tenant/customer/community overlay
L7 group/role overlay
L8 user/agent/workload overlay
L9 emergency/runtime override

Each config key should declare which layers may override it. For example:

key: mail.delivery.max_batch_size
type: integer
default: 500
allowed_layers:
  - company
  - environment
  - installation
  - tenant
minimum: 1
maximum: 5000
hot_reloadable: true
owner: platform-delivery
security_class: operational

That gives you a companywide contract: teams know what the key means, who owns it, where it may be changed, and what values are legal.
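A registry entry like this can be enforced mechanically. The sketch below mirrors the fields of the example entry; the `KeySpec` class and `check_override` function are hypothetical names, and a production setup would more likely generate such checks from a schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeySpec:
    key: str
    default: int
    allowed_layers: frozenset
    minimum: int
    maximum: int

def check_override(spec: KeySpec, layer: str, value: object) -> list[str]:
    """Return a list of violations; an empty list means the override is legal."""
    errors = []
    if layer not in spec.allowed_layers:
        errors.append(f"{spec.key}: layer '{layer}' may not override this key")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        errors.append(f"{spec.key}: expected integer, got {type(value).__name__}")
    elif not spec.minimum <= value <= spec.maximum:
        errors.append(f"{spec.key}: {value} outside [{spec.minimum}, {spec.maximum}]")
    return errors

MAX_BATCH = KeySpec(
    key="mail.delivery.max_batch_size",
    default=500,
    allowed_layers=frozenset({"company", "environment", "installation", "tenant"}),
    minimum=1,
    maximum=5000,
)

print(check_override(MAX_BATCH, "tenant", 1000))  # []
print(check_override(MAX_BATCH, "user", 9000))    # two violations
```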

Merge rules matter a lot

The most dangerous part of config layering is vague merge behavior. Define it explicitly.

Good default rules:

scalar:      more specific layer replaces earlier value
object/map:  deep merge by key
array/list:  replace by default, unless keyed merge is explicitly declared
null:        not deletion unless tombstone semantics are defined
secret:      never merged into normal config
policy:      restrictive rule wins unless explicitly delegated
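The scalar, map, list, and null rules above can be sketched as one small recursive function. It assumes no tombstone semantics are defined, so null is read as "no opinion"; the secret and policy rules are out of scope here, and the sample keys are invented for the example.

```python
from typing import Any

def merge(base: Any, overlay: Any) -> Any:
    """Merge `overlay` (more specific) onto `base` (broader)."""
    # null: not a deletion; keep whatever the broader layer said.
    if overlay is None:
        return base
    # object/map: deep merge by key.
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = dict(base)
        for key, value in overlay.items():
            if value is None:
                continue  # null inside a map is also "no opinion"
            merged[key] = merge(base.get(key), value)
        return merged
    # scalar and array/list: the more specific layer replaces the value.
    return overlay

base = {
    "smtp": {"host": "mail.internal", "port": 25},
    "recipients": ["ops", "oncall"],
    "retries": 3,
}
overlay = {
    "smtp": {"port": 587, "host": None},
    "recipients": ["tenant-admin"],
}
merged = merge(base, overlay)
print(merged)
# {'smtp': {'host': 'mail.internal', 'port': 587},
#  'recipients': ['tenant-admin'], 'retries': 3}
```

Lists are replaced wholesale on purpose: merging by position is listed among the anti-patterns, and keyed list merge would need an explicit declaration per key.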

Avoid hidden “last writer wins” behavior. Every effective value should be explainable:

config explain mail.delivery.max_batch_size

effective value: 1000
source: environments/prod.yaml
overrides:
  - defaults/product.yaml: 500
  - baselines/company.yaml: 800
validated by: schemas/mail-delivery.schema.json
owner: platform-delivery
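An explain command of this kind mostly needs provenance: which layers set the key, in order. The following sketch assumes flat, already-parsed layers supplied broadest first; the `explain` function and its result shape are hypothetical, and only the file names and values are taken from the sample output.

```python
from typing import Any

def explain(key: str, layers: list[tuple[str, dict[str, Any]]]) -> dict[str, Any]:
    """Trace one key through ordered (source, values) layers, broadest first."""
    trail = [(source, values[key]) for source, values in layers if key in values]
    if not trail:
        raise KeyError(key)
    source, value = trail[-1]  # the most specific layer that set the key wins
    return {
        "key": key,
        "effective": value,
        "source": source,
        "overridden": trail[:-1],
    }

LAYERS = [
    ("defaults/product.yaml", {"mail.delivery.max_batch_size": 500}),
    ("baselines/company.yaml", {"mail.delivery.max_batch_size": 800}),
    ("environments/prod.yaml", {"mail.delivery.max_batch_size": 1000}),
]
result = explain("mail.delivery.max_batch_size", LAYERS)
print(result["effective"], "from", result["source"])
# 1000 from environments/prod.yaml
```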

JSON Schema and CUE are both useful for typed validation. JSON Schema is a declarative language for defining and validating JSON structure and constraints; CUE is designed for validating data, schemas, and configuration alignment with policies. ([json-schema.org]4)

Mutability classes

Every key should have a mutability class:

build-time        requires rebuild
deploy-time       requires redeploy
startup-time      requires process restart
hot-reloadable    can reload safely while running
per-request       can vary by tenant/user/request
emergency         can override quickly with strong audit

This prevents a common failure mode: teams treat dangerous structural config like a harmless feature flag. Feature flags are excellent for changing application behavior without redeploying code, and OpenFeature provides a vendor-neutral abstraction for that pattern. AWS AppConfig and Azure App Configuration are examples of managed services that support dynamic configuration and feature flags with safer rollout patterns. ([openfeature.dev]5)
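One way to make mutability classes bite is to let the runtime change path consult them. In this sketch the class names come from the table above, while the action labels, the `registry` mapping, and the key `db.schema_version` are illustrative assumptions.

```python
# Action labels are shorthand for the "requires ..." column above.
ACTION_FOR_CLASS = {
    "build-time": "rebuild",
    "deploy-time": "redeploy",
    "startup-time": "restart",
    "hot-reloadable": "reload",
    "per-request": "none",
    "emergency": "audited override",
}

# Classes that may be changed while the system is running.
RUNTIME_CLASSES = frozenset({"hot-reloadable", "per-request", "emergency"})

def required_action(key: str, registry: dict[str, str]) -> str:
    """What has to happen before a change to `key` takes effect."""
    return ACTION_FOR_CLASS[registry[key]]

def allow_runtime_change(key: str, registry: dict[str, str]) -> bool:
    """Gate for a flag-style change API; unregistered keys are refused."""
    return registry.get(key) in RUNTIME_CLASSES

registry = {
    "mail.delivery.max_batch_size": "hot-reloadable",
    "db.schema_version": "deploy-time",  # hypothetical structural key
}
print(allow_runtime_change("mail.delivery.max_batch_size", registry))  # True
print(allow_runtime_change("db.schema_version", registry))             # False
print(required_action("db.schema_version", registry))                  # redeploy
```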

Secrets must be separate

Secrets should not live in ordinary config files. Even encrypted secrets should stay out of them unless their lifecycle (access, rotation, audit) is deliberately designed rather than casually handled.

Best practice:

normal config: Git / config registry / ConfigMap
secrets:       OpenBao, Vault, cloud secret manager, SOPS, External Secrets
injection:     identity-based, least privilege, short-lived where possible
audit:         access and rotation evidence

SOPS supports encrypted YAML, JSON, ENV, INI, and binary files with KMS/age/PGP-style backends, while External Secrets Operator synchronizes secrets from external APIs into Kubernetes Secrets. ([getsops.io]6)

Companywide best practices

The strongest practices are these:

1. Treat config as a governed product. Each key needs a name, owner, description, type, allowed scope, default, lifecycle, validation, and deprecation path.

2. Prefer declarative config over imperative scripts. For infrastructure and system state, use desired-state tools: Terraform/OpenTofu, Ansible, NixOS, Puppet, Kubernetes manifests, Helm, Kustomize, Argo CD, or Flux depending on the layer. Ansible playbooks are explicitly repeatable and source-controllable, and Puppet-style configuration management is built around desired state. ([Ansible Documentation]7)

3. Make the effective config observable. Every service should be able to expose a redacted effective-config view: version, source layers, schema version, feature flags, and active policy set. This is essential for debugging.

4. Validate before rollout. Use schema validation, policy-as-code, static checks, config unit tests, and environment simulation. OPA is a general-purpose policy engine usable across microservices, Kubernetes, CI/CD, API gateways, and more. ([openpolicyagent.org]8)

5. Use progressive rollout for risky runtime config. Feature flags, rate limits, routing, and model/provider selection should support staged rollout, canary, percentage rollout, tenant allowlists, health checks, and fast rollback.

6. Keep global config small. Companywide config should define defaults and guardrails, not become a giant mutable dictionary. The more global a config key is, the higher the blast radius.

7. Separate ownership from override rights. A tenant admin may change tenant preferences. A platform team may change platform limits. Security may own non-overridable guardrails. Product may own entitlements. Finance may own pricing parameters.

8. Record evidence. For every config change: who changed it, what changed, why, approval link, rollout scope, affected services, previous value, new value, rollback path.
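The percentage rollout and tenant allowlist mentioned in practice 5 are commonly built on stable hashing, so that a tenant stays in the same bucket as the percentage grows. The sketch below is one such approach, not the method of any particular flag service; the flag name and tenant IDs are made up.

```python
import hashlib

def bucket(flag: str, tenant_id: str) -> int:
    """Map (flag, tenant) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{tenant_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def is_enabled(flag: str, tenant_id: str, percentage: int,
               allowlist: frozenset = frozenset()) -> bool:
    """Allowlisted tenants are always in; others join as the percentage grows."""
    if tenant_id in allowlist:
        return True
    return bucket(flag, tenant_id) < percentage

# A tenant enabled at 10% stays enabled at 50%, because its bucket is fixed.
tenants = [f"tenant-{i}" for i in range(1000)]
at_10 = {t for t in tenants if is_enabled("new-router", t, 10)}
at_50 = {t for t in tenants if is_enabled("new-router", t, 50)}
print(at_10 <= at_50)  # True
```

Including the flag name in the hash keeps buckets independent across flags, so the same tenants are not always the first to receive every change.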

Anti-patterns to avoid

one giant companywide YAML file
manual console changes not mirrored anywhere
secrets mixed with normal config
environment-specific if/else logic in application code
untyped stringly-typed config
arrays merged by position
feature flags that live forever
global kill switches without ownership
tenant-specific config copied across files
no way to explain the winning value
no rollback path

The worst variant is “centralized chaos”: everything is technically in one place, but nobody knows who owns a key, what it means, which systems consume it, or whether changing it is safe.

A good companywide target state

For multi-repo, multi-tenant, platform-oriented work, the target state can be framed as a Configuration Control Plane:

Config Canon
  defines the vocabulary and schema

Config Registry
  catalogs every key, owner, type, scope, lifecycle

Config Resolver
  renders effective config from layered sources

Config Policy
  validates allowed values and allowed overrides

Config Delivery
  pushes or exposes config to systems

Config Evidence
  records snapshots, changes, drift, rollout, rollback
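The drift part of Config Evidence reduces to comparing the rendered effective config with what a system actually reports. A minimal sketch, assuming both sides are available as flat dictionaries; the `MISSING` marker and the extra key `debug.enabled` are illustrative.

```python
from typing import Any

# Marker for a key that is absent on one side (illustrative choice).
MISSING = "<missing>"

def detect_drift(desired: dict[str, Any],
                 observed: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (desired, observed)} for every key that differs."""
    drift = {}
    for key in sorted(desired.keys() | observed.keys()):
        want = desired.get(key, MISSING)
        have = observed.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift

desired = {"mail.delivery.max_batch_size": 1000, "tls.min_version": "1.3"}
observed = {"mail.delivery.max_batch_size": 800, "tls.min_version": "1.3",
            "debug.enabled": True}
drift = detect_drift(desired, observed)
print(drift)
# {'debug.enabled': ('<missing>', True),
#  'mail.delivery.max_batch_size': (1000, 800)}
```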

The guiding rule:

Put config as close as possible to the owner, but as high as necessary for consistency.

For example, company security baselines belong high. Tenant preferences belong low. Secrets belong outside normal config. Feature flags belong in a runtime control plane. Infrastructure desired state belongs in GitOps/IaC. Application defaults belong with the code.