diff --git a/research/README.md b/research/README.md
new file mode 100644
index 0000000..dbe641e
--- /dev/null
+++ b/research/README.md
@@ -0,0 +1,31 @@
+# research/
+
+Deep-research notes backing **ConfigAtlas** — the configuration control plane for
+discovering, mapping, explaining, and governing the living configuration surface
+of fast-moving companies.
+
+This directory is the *evidence layer* for the product thesis. It is not part of
+the configuration surface registry (`registry/`); it is the reasoning and sourcing
+behind the registry's design choices.
+
+## Contents
+
+| File | What it is |
+|------|------------|
+| [`configuration-control-plane.md`](configuration-control-plane.md) | Main digest: what the configuration control plane is, why it matters, the layering and resolution model, adjacent topics, and where ConfigAtlas fits. |
+| [`sources.md`](sources.md) | Annotated bibliography — every cited source with a one-line note on why it matters. |
+
+## Related repo material
+
+- [`../wiki/ConfigLayering.md`](../wiki/ConfigLayering.md) — the layering primer (scope model, precedence, merge rules, mutability classes).
+- [`../wiki/CompetitiveLandscape.md`](../wiki/CompetitiveLandscape.md) — adjacent tool families and white space.
+- [`../wiki/ProductVision.md`](../wiki/ProductVision.md) / [`../wiki/BrandFrame.md`](../wiki/BrandFrame.md) — product framing.
+
+## Method
+
+Synthesized 2026-06-26 from the repo's own wiki material plus web research across
+vendor docs, primary-source engineering writing (Brian Grant / KRM, InfoQ, CUE),
+and 2024–2026 outage retrospectives. Sources captured in `sources.md`; claims tied
+to live behavior of tools are dated because this category is moving fast.
+
+
diff --git a/research/configuration-control-plane.md b/research/configuration-control-plane.md
new file mode 100644
index 0000000..7ba98fe
--- /dev/null
+++ b/research/configuration-control-plane.md
@@ -0,0 +1,244 @@
+# Configuration Layering and the Configuration Control Plane — Research Digest
+
+> Compiled 2026-06-26. Numbered references resolve in [`sources.md`](sources.md).
+> This digest deepens the repo's own [ConfigLayering primer](../wiki/ConfigLayering.md)
+> and [CompetitiveLandscape](../wiki/CompetitiveLandscape.md) with primary sources
+> and the surrounding technical context.
+
+---
+
+## 1. The thesis in one paragraph
+
+Configuration stopped being static data a long time ago. It is now *distributed
+control information*: the live mechanism that changes how production systems
+behave, in real time, often faster and with less ceremony than a code deploy. As
+cloud-native scale grew, the industry independently converged on treating
+configuration as a **control plane** — something that needs staged rollout,
+blast-radius containment, dependency-aware validation, and automated rollback,
+exactly like the deployment systems it sits beside [1]. **ConfigAtlas** bets that
+before companies can *control* that surface safely, they first need to *see* it:
+discover where configuration lives, classify it by kind and scope, resolve the
+effective value, and attach ownership and evidence. Map the territory, then govern
+it.
+
+---
+
+## 2. Why this matters now: configuration is the dominant failure mode
+
+The strongest argument for a configuration control plane is the outage record. A
+disproportionate share of large 2024–2026 incidents trace to a configuration
+change rather than a code defect [4][5]:
+
+- **CrowdStrike (Jul 2024)** — a faulty Falcon *sensor configuration* update
+  blue-screened Windows hosts worldwide; estimated ~$5.4B impact to Fortune 500
+  firms alone. A content/config push, not a binary release [5].
+- **AT&T Mobility (Feb 2024)** — an equipment *configuration error* took down
+  ~125M devices for 12+ hours, blocking ~92M calls including 25,000 to 911 [5].
+- **Cloudflare (Nov 2025)** — a global outage taking down X, ChatGPT, Spotify and
+  others, triggered by a software bug *exposed by a configuration change* [5].
+- **Azure Front Door (Nov 2025) / Azure networking (2025)** — a control-plane
+  defect and a networking *configuration change* produced multi-hour to ~50-hour
+  degradations across services [4][7].
+
+ThousandEyes' 2024 internet-outage analysis names configuration change as a
+leading, recurring cause [4]. The lesson the hyperscalers drew is not "stop
+changing config" — it is "make unsafe configuration changes progressively harder
+to express, deploy, or overlook" [1]. That sentence is essentially the ConfigAtlas
+mission restated as a safety property.
+
+---
+
+## 3. Configuration layering — the resolution model
+
+Layering is the practice of composing one **effective configuration** from
+multiple ordered scopes. The repo's primer [internal] gives the canonical stack;
+the research backs *why* each design choice is non-negotiable.
+
+### 3.1 The scope stack
+
+```
+L0  vendor/product defaults
+L1  company baseline
+L2  platform/domain baseline
+L3  environment overlay (dev/test/stage/prod)
+L4  region/zone/cluster overlay
+L5  installation/deployment overlay
+L6  tenant/customer/community overlay
+L7  group/role overlay
+L8  user/agent/workload overlay
+L9  emergency/runtime override
+```
+
+"More specific wins" is the default, but **higher layers may declare
+non-overridable guardrails** (a security baseline a tenant cannot loosen). This is
+the same base+overlay pattern behind Kubernetes Kustomize, Helm value precedence,
+and NixOS modules [8][9] — the industry already agrees on the shape; what is
+missing is a cross-tool *view* of it.
+
+### 3.2 The effective configuration is the only thing that's real
+
+A file or a flag is partial evidence.
+The value that actually applies to a given
+system/tenant/request is the resolved result of every relevant layer. The central
+product capability — and the line between a config *database* and a config
+*control plane* — is answering: **what value applies here, which layer won, what
+did it override, which policy constrained it, and who is affected** [internal,
+CompetitiveLandscape §"Effective configuration resolution"].
+
+### 3.3 Merge semantics are where layering quietly fails
+
+Vague merge behavior is the most dangerous part of layering. Define it explicitly:
+
+```
+scalar      more specific layer replaces earlier value
+object/map  deep merge by key
+array/list  replace by default; keyed merge only if declared
+null        not deletion unless tombstone semantics are defined
+secret      never merged into normal config
+policy      restrictive rule wins unless explicitly delegated
+```
+
+The schema/validation choice matters here. **JSON Schema** validates structure and
+constraints but keeps schema and data separate. **CUE** unifies types and values
+in a single lattice where merge (`&`) is commutative, associative, and idempotent
+— so the resolved result is *order-independent*, and the same definition both
+validates data and reduces boilerplate [2][3]. By contrast Jsonnet's `+` mixin
+composition is order-dependent (right-hand side wins on scalar conflicts) [2].
+For a control plane whose whole value proposition is a *deterministic, explainable*
+effective value, order-independent merge is a meaningful property, not a detail.
+Notably, CUE itself now ships **CUE Hub**, explicitly branded "the Configuration
+Control Plane" — independent validation that the category name is forming [6].
+
+### 3.4 Mutability classes prevent the worst failure mode
+
+Every key should declare how it can change: `build-time`, `deploy-time`,
+`startup-time`, `hot-reloadable`, `per-request`, `emergency`.
+The recurring failure is treating dangerous structural config like a harmless flag — exactly the
+CrowdStrike-shaped risk where a "content update" had deploy-grade blast radius [5].
+
+---
+
+## 4. The adjacent topics (the converging market)
+
+The control plane is not one product; it is a convergence of tool families.
+ConfigAtlas's stance is **integrate and map, don't replace** [internal,
+CompetitiveLandscape]. Summary of each adjacency and the research behind it:
+
+### 4.1 Configuration-as-Data (the closest intellectual neighbor)
+Brian Grant — creator of the Kubernetes Resource Model (KRM), now CTO of ConfigHub
+— argues configuration should be *data*, authoritative and stored like data, with
+code that operates on it kept separate [10][11]. ConfigHub stores each variant in
+fully-rendered "WET" form (no templates/variables/generators), versioned with
+metadata, and — because KRM *is* the API representation — can update config *from*
+live state, mitigating drift bidirectionally [10][12]. This is the strongest
+direct competitor and the sharpest articulation of "config is graph-shaped
+operational data, not files." **ConfigAtlas differentiation:** discovery-first and
+cross-tool — map config that already lives in many systems, rather than asking
+everyone to move into one store.
+
+### 4.2 GitOps / IaC — desired state and drift
+Argo CD and Flux continuously reconcile live cluster state against Git-declared
+desired state; any divergence is *drift*, flagged or auto-corrected on a sync loop
+[13]. Terraform/OpenTofu apply the same desired-state model to infrastructure,
+typically surfacing drift when a plan runs rather than on a continuous loop. This
+camp owns the "desired state" narrative. **ConfigAtlas complements it with the "effective
+state" narrative:** GitOps tells you what you *intended* to deploy; ConfigAtlas
+tells you which scopes contributed, what actually applies, who owns it, and what's
+risky to change [internal].
+
+### 4.3 Feature flags / runtime control — and the AI-era expansion
+Feature management (LaunchDarkly, Unleash, Flagsmith, OpenFeature as the
+vendor-neutral standard) owns live behavior change and **progressive delivery**:
+ring-based rollout (internal → 1–5% canary → 10–25% beta → 100%), deterministic
+cohorts for blast-radius containment, and kill switches / circuit breakers that
+auto-deactivate on SLO breach [14][15]. The frontier is **AI configuration**:
+LaunchDarkly's AI Configs / AgentControl move prompts, model selection, and tool
+access out of code into runtime config that propagates in <200ms, with guarded
+rollouts that auto-revert when eval metrics (e.g., accuracy, toxicity) regress
+[16][17]. This validates the core ConfigAtlas claim — the *kinds* of configuration keep
+multiplying (now: agent behavior), so a map that spans kinds is increasingly
+valuable. **ConfigAtlas treats flags as one scope class among many**, not the
+whole plane [internal].
+
+### 4.4 Secrets management — adjacent but kept separate
+Representative tools include Vault, OpenBao, Infisical, and Doppler, plus SOPS and
+External Secrets for the GitOps path. Secrets differ in sensitivity, lifecycle, and
+blast radius and must never be merged into ordinary config [internal]. **ConfigAtlas
+stores references and dependencies, never values** — which config depends on which
+secret, where it's injected, what's affected if it rotates.
+
+### 4.5 Policy-as-code — the guardrail backend
+OPA, Kyverno, and Checkov answer "is this change allowed?" across K8s, CI/CD, IaC, and
+more [internal]. They are ideal *validation backends* for a control plane but
+don't model provenance, ownership, or effective behavior. **ConfigAtlas is the
+context and evidence layer around them** — which policy applies, at which scope,
+and why.
+
+### 4.6 CMDB / developer portals / SSPM — the enterprise gravity wells
+CMDBs (ServiceNow et al.)
+model assets and services; developer portals (Backstage,
+Port, Cortex, OpsLevel) model ownership; SSPM tools (CoreView, AppOmni) model SaaS
+posture drift [internal]. None model the layered behavioral config surface with
+effective-value resolution. **ConfigAtlas integrates** — enriching catalogs and
+portals rather than displacing them; a Backstage/Port plugin is a plausible
+adoption path.
+
+---
+
+## 5. Reference architecture for a configuration control plane
+
+Synthesizing the layering primer with the control-plane framing [1][internal]:
+
+```
+Config Canon     vocabulary + schema (what a key means)
+Config Registry  every key: owner, type, allowed scopes, lifecycle, mutability, security class
+Config Resolver  deterministic layer ordering -> effective value (the "explain" engine)
+Config Policy    allowed values + allowed overrides (OPA/Kyverno/CUE backends)
+Config Delivery  env vars / ConfigMaps / sidecar / SDK / API lookup
+Config Evidence  snapshots, who/what/why/when, drift, rollout, rollback
+```
+
+The InfoQ framing adds three forward-looking elements that map directly onto this:
+**reconciler-first control planes** (resolution as a continuous loop, à la GitOps),
+**configuration knowledge graphs** (the `key → service → deployment → tenant →
+feature → policy → secret → owner → incident` graph), and **AI-assisted decision
+support** (surfacing blast radius and risk before a human approves a change) [1].
+The knowledge-graph element is precisely ConfigAtlas's differentiator.
+
+Guiding rule from the primer: **put config as close as possible to its owner, but
+as high as necessary for consistency** — defaults with the product, guardrails
+high and central, tenant prefs low, secrets outside, flags in the runtime plane,
+infra state in GitOps.
+
+---
+
+## 6. The wedge and the white space
+
+The defensible opening is **read-first configuration intelligence**, not
+write-first control [internal, CompetitiveLandscape].
+The category name ("Configuration Control Plane") is emerging and not yet owned — InfoQ frames it as
+a pattern [1], CUE markets a product under the exact phrase [6], ConfigHub attacks
+the same instinct from the data angle [10]. None yet own the **companywide living
+configuration surface**: cross-tool discovery, effective-value resolution,
+organizational scope/ownership governance, blast-radius/dependency intelligence,
+and change evidence.
+
+Sharpest positioning [internal]:
+
+> **ConfigAtlas is not where all configuration must live. It is where
+> configuration becomes visible, explainable, governable, and safe to change.**
+
+---
+
+## 7. Open questions to drive the next research pass
+
+1. **Discovery connectors** — what is the minimum viable set of ingestion sources
+   (Git, K8s, Terraform state, a feature-flag platform, a secret manager) to
+   prove cross-tool effective-config resolution end to end?
+2. **Effective-value provenance schema** — can the registry's entry schema carry
+   enough to render a full `config explain` (source layer, overrides, validating
+   schema, owner) without becoming a second source of truth for values?
+3. **Graph model** — what is the canonical edge set for the configuration
+   knowledge graph, and does it reuse the State Hub's existing relationship model?
+4. **CUE vs JSON Schema** for atlas entry validation — does order-independent
+   merge buy enough to justify the toolchain cost over JSON Schema? [2][3]
+5. **AI-config as a first-class scope** — given the LaunchDarkly trajectory [16],
+   should "agent/model configuration" be a named scope class in the L-stack now?
+
diff --git a/research/sources.md b/research/sources.md
new file mode 100644
index 0000000..bd06934
--- /dev/null
+++ b/research/sources.md
@@ -0,0 +1,99 @@
+# Sources — Configuration Control Plane research
+
+Annotated bibliography for [`configuration-control-plane.md`](configuration-control-plane.md).
+Captured 2026-06-26.
+"internal" citations refer to this repo's own
+[`wiki/ConfigLayering.md`](../wiki/ConfigLayering.md) and
+[`wiki/CompetitiveLandscape.md`](../wiki/CompetitiveLandscape.md), which already
+carry their own source lists.
+
+## Category framing
+
+- [1] **Configuration as a Control Plane: Designing for Safety and Reliability at Scale** — InfoQ.
+  The anchor source. Argues hyperscalers independently converged on the same safety
+  patterns (staged rollout, blast-radius containment, dependency-aware validation,
+  automated rollback) and names the emerging tech: reconciler-first control planes,
+  configuration knowledge graphs, AI-assisted decision support.
+  https://www.infoq.com/articles/configuration-control-plane/
+
+- [6] **CUE Hub: the Configuration Control Plane** — CUE Labs.
+  Independent use of the exact category phrase; a vendor branding a product as
+  "the configuration control plane." Evidence the category name is forming.
+  https://cue.dev/blog/announcing-cue-labs/
+
+## Layering, schema, and merge semantics
+
+- [2] **Config Wars — Chapter 3: CUE** — Miru's Blog (Vedant Nair).
+  Comparative analysis of CUE vs JSON Schema vs Jsonnet merge semantics;
+  establishes CUE's commutative/associative/idempotent unification and Jsonnet's
+  order-dependent mixin composition.
+  https://mirurobotics.substack.com/p/config-wars-chapter-3-cue
+
+- [3] **Data Validation use case** — CUE official docs.
+  Primary source: CUE merges schema and data; one definition both validates and
+  templates.
+  https://cuelang.org/docs/concept/data-validation-use-case/
+
+- [8] **Declarative Management of Kubernetes Objects Using Kustomize** — Kubernetes docs.
+  Canonical base/overlay layering pattern.
+  https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/
+
+- [9] **Store config in the environment** — The Twelve-Factor App.
+  Foundational "separate config from code" principle underpinning the kind-separation.
+  https://12factor.net/config
+
+## Configuration-as-data
+
+- [10] **Introducing ConfigHub** — Brian Grant, ITNEXT.
+  Closest direct competitor; "configuration as authoritative data," WET rendered
+  config, versioned units, live-state reconciliation.
+  https://itnext.io/introducing-confighub-b127736641c5
+
+- [11] **What is Configuration as Data?** — Brian Grant, ITNEXT.
+  Primary articulation of CaD vs IaC; data is authoritative, code operates on it separately.
+  https://itnext.io/what-is-configuration-as-data-210b0c4be324
+
+- [12] **Configuration as Data** — ConfigHub docs.
+  Product-doc treatment of the same concept, incl. updating config from live state.
+  https://docs.confighub.com/background/config-as-data/
+
+## GitOps / drift / desired vs effective state
+
+- [13] **GitOps Prescription: Curing the Configuration Drift Epidemic** — BridgePhase.
+  Desired-state vs live-state reconciliation, drift detection/self-healing with
+  Argo CD and Flux.
+  https://bridgephase.com/insights/drift-detection/
+
+## Feature flags, progressive delivery, AI-era config
+
+- [14] **Kill switches vs progressive delivery** — Unleash.
+  Ring-based rollout, blast-radius containment, kill switch / circuit-breaker patterns.
+  https://www.getunleash.io/blog/kill-switch-vs-progressive-delivery
+
+- [15] **7 Advanced Feature Flagging Best Practices for 2025** — OpsMoon.
+  Progressive delivery cohorts, SLO-triggered automated rollback.
+  https://opsmoon.com/blog/feature-flagging-best-practices/
+
+- [16] **AI Configs is now GA: Runtime control for prompts and models** — LaunchDarkly.
+  Prompts/model selection as runtime config; <200ms propagation; guarded rollouts
+  that auto-revert on eval-metric regression.
+  https://launchdarkly.com/blog/ai-configs-ga-runtime-control-prompts-models/
+
+- [17] **LaunchDarkly launches runtime control layer for the agentic AI era** — SiliconANGLE.
+  Independent coverage of AgentControl; runtime control of AI agents without redeploy.
+  https://siliconangle.com/2026/05/19/launchdarkly-launches-runtime-control-layer-agentic-ai-era/
+
+## Outages — why configuration safety matters
+
+- [4] **Configuration Change Trouble & Other 2024 Outage Trends** — ThousandEyes.
+  Names configuration change as a leading recurring outage cause.
+  https://www.thousandeyes.com/blog/internet-report-configuration-change-outages
+
+- [5] **8 major IT disasters of 2024** — CIO.
+  CrowdStrike Falcon config update, AT&T equipment config error, McDonald's POS
+  third-party config change.
+  https://www.cio.com/article/3624552/8-major-it-disasters-of-2024.html
+
+- [7] **Azure Front Door Outage: How a Single Control-Plane Defect Exposed Architectural Fragility** — InfoQ.
+  Control-plane defect as outage cause; reinforces the control-plane safety thesis.
+  https://www.infoq.com/news/2025/11/azure-afd-control-plane-failure/
+
diff --git a/wiki/BrandFrame.md b/wiki/BrandFrame.md
new file mode 100644
index 0000000..a359f94
--- /dev/null
+++ b/wiki/BrandFrame.md
@@ -0,0 +1,7 @@
+ConfigAtlas
+
+The Configuration Control Plane for discovering, mapping, and governing
+the living configuration surface of fast-moving companies.
+
+Reveal every configuration scope, understand every override,
+and govern change safely across systems, teams, tenants, and environments.
\ No newline at end of file
diff --git a/wiki/CompetitiveLandscape.md b/wiki/CompetitiveLandscape.md
new file mode 100644
index 0000000..c006098
--- /dev/null
+++ b/wiki/CompetitiveLandscape.md
@@ -0,0 +1,378 @@
+## Competitive landscape: Configuration Control Plane
+
+As of June 26, 2026, “Configuration Control Plane” looks like an emerging category, not yet a mature analyst-defined software segment. The problem is recognized, though: modern configuration is increasingly treated as a live control surface that changes production behavior, affects reliability, and needs staged rollout, policy enforcement, rollback, blast-radius control, and explainability.
+
+- https://www.infoq.com/articles/configuration-control-plane
+
+For ConfigAtlas, the competition is therefore not one category. It is a converging market made from several adjacent tool families.
+
+## 1. Direct and near-direct competitors
+
+These are closest to the product idea.
+
+
+Player | What they do | Relevance to ConfigAtlas
+-- | -- | --
+ConfigHub | Treats configuration as authoritative data, not generated files. It emphasizes API-based config reads/writes, versioned config units, WET “fully rendered” config, validation, policy checks, and live-state reconciliation. (ITNEXT) | Very close conceptual competitor. Strongest direct watch item. More focused on configuration-as-data and deployment operations than companywide discovery/governance.
+Configu | Open-source / cloud “Configuration-as-Code” platform for managing application configuration across environments, with validation, dependency checks, integrations, secrets/feature flag awareness, and automation across storage systems. (configu.com) | Directly relevant for application config and ConfigOps. Less obviously positioned around organizational scope discovery, ownership graphs, or effective-config intelligence.
+Pulumi ESC | Manages hierarchical environments, secrets, and configuration; supports composing environments, secret management, dynamic values from providers, and use from apps or Pulumi IaC. (pulumi) | Strong in environment/secrets/config composition. More developer/IaC-oriented than enterprise-wide configuration cartography.
+Humanitec + Score | Humanitec’s Platform Orchestrator generates deployment configuration from Score workload definitions; Score aims to provide platform-agnostic workload configuration and reduce environment inconsistency. (Humanitec) | Competes where the problem is “how do workloads get configured consistently?” Less focused on discovering existing scattered config and overlapping responsibilities.
+Crossplane | A framework for building cloud-native control planes and declarative platform APIs. (docs.crossplane.io) | Not a config intelligence product, but a powerful “build your own control plane” substrate. Potential integration or infrastructure-layer competitor.
+
+
ConfigHub is the most dangerous direct competitor because it has a very similar category instinct: configuration as structured data, API-addressable, versioned, queryable, validated, and operationally safer than template-driven Git workflows. (ITNEXT)
ConfigAtlas differentiation: go broader and more discovery-first, covering organizational config cartography, existing-tool ingestion, an ownership and scope graph, unknown-unknown discovery, and effective-config explanation.
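The research digest calls the configuration knowledge graph ConfigAtlas's differentiator. In practice the "ownership and scope graph" answers a question like: given one key, what does a change to it reach, and who owns each piece? The sketch below is a minimal illustration built on a toy adjacency list that loosely follows the digest's `key → service → deployment → tenant` edges, plus an ownership map. `EDGES`, `OWNERS`, `blast_radius`, `unowned`, and every node name are hypothetical. None of them is a ConfigAtlas API.

```python
from collections import deque

# Hypothetical configuration knowledge graph. Each edge points from a node to
# the nodes that depend on it (key -> service -> deployment -> tenant).
EDGES: dict[str, list[str]] = {
    "key:payments.timeout_ms": ["service:checkout"],
    "service:checkout": ["deployment:checkout-prod-eu", "deployment:checkout-prod-us"],
    "deployment:checkout-prod-eu": ["tenant:acme"],
    "deployment:checkout-prod-us": ["tenant:globex"],
}

# Hypothetical ownership records; any node missing here has no known owner.
OWNERS: dict[str, str] = {
    "key:payments.timeout_ms": "team-payments",
    "service:checkout": "team-payments",
    "deployment:checkout-prod-eu": "team-platform",
}


def blast_radius(start: str) -> list[str]:
    """Return every node reachable from `start`, in breadth-first order."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in EDGES.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def unowned(nodes: list[str]) -> list[str]:
    """Return the nodes with no recorded owner (an 'unknown owner' finding)."""
    return [node for node in nodes if node not in OWNERS]


if __name__ == "__main__":
    affected = blast_radius("key:payments.timeout_ms")
    print(affected)
    print(unowned(affected))
```

In this toy graph, a change to the timeout key reaches two deployments and two tenants. Three of those five nodes have no recorded owner, which is exactly the kind of unknown-owner finding the read-first wedge is meant to surface. A real graph would also type its edges (policy, secret, incident) instead of treating every edge as "depends on".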
Large enterprises may assume this belongs in ServiceNow or another CMDB. ServiceNow defines its CMDB around configuration items (CIs) and their relationships across infrastructure and services. (ServiceNow)
ConfigAtlas differentiation: CMDBs know assets; ConfigAtlas knows layered behavioral control information. Integrate rather than replace.
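"Layered behavioral control information" is easiest to see on a single key. The sketch below resolves one key across an ordered stack and records three things: which layer won, what that layer overrode, and which later layers a guardrail blocked. That is roughly the `config explain` answer described in the research digest. It makes three assumptions. Layers follow the digest's broad-to-specific L0 to L9 ordering. Scalar values are resolved by "more specific wins". A hypothetical per-layer `locked` set stands in for non-overridable guardrails. `Layer`, `Explanation`, `resolve`, and the example values are for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Layer:
    """One scope in the stack. Layers are listed broad-to-specific (L0 first)."""
    name: str
    values: dict[str, Any]
    locked: set[str] = field(default_factory=set)  # keys this layer makes non-overridable


@dataclass
class Explanation:
    key: str
    value: Any
    winner: Optional[str]                                 # layer whose value applies
    overridden: list[str] = field(default_factory=list)  # earlier layers that were replaced
    blocked: list[str] = field(default_factory=list)     # later layers stopped by a guardrail


def resolve(key: str, stack: list[Layer]) -> Explanation:
    """More specific wins (scalar replace), unless a broader layer locked the key."""
    result = Explanation(key=key, value=None, winner=None)
    locked_by: Optional[str] = None
    for layer in stack:
        if key not in layer.values:
            continue
        if locked_by is not None:
            result.blocked.append(layer.name)  # guardrail holds; record the attempt
            continue
        if result.winner is not None:
            result.overridden.append(result.winner)
        result.value, result.winner = layer.values[key], layer.name
        if key in layer.locked:
            locked_by = layer.name
    return result


# Hypothetical stack: a company baseline locks the TLS floor so a tenant cannot loosen it.
EXAMPLE_STACK = [
    Layer("L0 vendor defaults", {"tls_min": "1.0", "page_size": 20}),
    Layer("L1 company baseline", {"tls_min": "1.2"}, locked={"tls_min"}),
    Layer("L3 env:prod", {"page_size": 50}),
    Layer("L6 tenant:acme", {"tls_min": "1.0", "page_size": 100}),
]

if __name__ == "__main__":
    print(resolve("tls_min", EXAMPLE_STACK))
    print(resolve("page_size", EXAMPLE_STACK))
```

The sketch models only the scalar rule. The digest's other merge rules (object deep-merge, array replacement, tombstones, and policy delegation) would each need explicit handling before this could explain a real effective value.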
Feature management platforms already own runtime behavior changes and progressive delivery. LaunchDarkly explicitly markets runtime control, progressive release, automated rollback, AI agent control, and cost/performance optimization for AI workloads. (LaunchDarkly)
ConfigAtlas differentiation: treat feature flags as one class of configuration scope among many, not the whole control plane.
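The research digest says progressive release relies on deterministic cohorts. A given user always stays in the same bucket, so widening a rollout only ever adds users. Seen that way, a percentage rollout is just one more scope dimension, which matches the framing above. The sketch below assumes SHA-256 bucketing of the flag key plus the user ID. `BUCKETS`, the hashing scheme, and the function names are illustrative assumptions. They are not LaunchDarkly's algorithm or any other vendor's.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: one bucket = 0.01% of users


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """True when the user's bucket falls inside the current rollout percentage."""
    return bucket(flag_key, user_id) < rollout_percent / 100 * BUCKETS


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for ring in (1, 5, 25, 100):  # widening rings, e.g. canary -> beta -> everyone
        enabled = sum(is_enabled("new-checkout", u, ring) for u in users)
        print(f"{ring:>3}% -> {enabled} users")
```

Buckets never move, so the 1% ring is a subset of the 5% ring, which is a subset of the 25% ring. As a result, lowering the percentage during a rollback switches off only the users who were added most recently.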
Humanitec/Score is strong where the buyer wants standardized workload configuration and developer self-service. (Humanitec)
ConfigAtlas differentiation: discover and govern config across the company, including legacy and already-existing config, not only platform-generated workload config.
SSPM vendors such as CoreView and AppOmni validate that SaaS configuration drift and tenant resilience are becoming board-level concerns, especially in Microsoft 365 and SaaS-heavy companies. (coreview.com)
ConfigAtlas differentiation: become the broader cross-domain configuration map, while SSPM remains a specialized security-posture input.
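For SSPM tools, as for GitOps, drift comes down to comparing an approved baseline with what is actually configured. The sketch below assumes both snapshots have already been fetched as nested dictionaries. The setting names are invented, and no SSPM vendor's API is implied. `flatten` and `drift` are hypothetical helpers.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def drift(baseline: dict[str, Any], live: dict[str, Any]) -> dict[str, list]:
    """Report keys added, removed, or changed in `live` relative to `baseline`."""
    base, cur = flatten(baseline), flatten(live)
    return {
        "added": sorted(cur.keys() - base.keys()),
        "removed": sorted(base.keys() - cur.keys()),
        # Keys are unique, so sorting these tuples never compares the values.
        "changed": sorted(
            (k, base[k], cur[k]) for k in base.keys() & cur.keys() if base[k] != cur[k]
        ),
    }


if __name__ == "__main__":
    baseline = {"sharing": {"external": False, "guest_expiry_days": 30}, "mfa": {"required": True}}
    live = {
        "sharing": {"external": True, "guest_expiry_days": 30},
        "mfa": {"required": True},
        "legacy_auth": {"enabled": True},
    }
    print(drift(baseline, live))
```

Flattening to dotted keys keeps the report readable, for example `sharing.external` changed from `False` to `True`. It also gives each drifted setting an address that ownership and scope data can attach to.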
The best initial wedge is read-first configuration intelligence, not write-first control.
Start with:

- discover config sources
- classify config by kind and scope
- build ownership graph
- detect duplicates and conflicts
- show effective config paths
- surface unknown owners and risky overrides
- generate audit/evidence reports
- integrate with existing tools

Only later add:

- controlled changes
- approval workflows
- policy enforcement
- safe rollout
- rollback orchestration
- runtime override management

That reduces adoption friction. Companies are more willing to connect a discovery and evidence layer than to hand over control of production configuration on day one.
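The read-first step "detect duplicates and conflicts" listed above can be sketched as a grouping pass over records that connectors have already normalized. The `Record` shape, the source strings, and the function name below are hypothetical. Real connectors for Git, Kubernetes, or flag platforms would each need their own parsing to produce such records.

```python
from collections import defaultdict
from typing import Any, NamedTuple


class Record(NamedTuple):
    """One discovered setting, as a hypothetical connector might normalize it."""
    source: str  # where it was found, e.g. a Git path or a ConfigMap
    scope: str   # where it applies, e.g. "prod" or "tenant:acme"
    key: str
    value: Any


def find_duplicates_and_conflicts(records: list[Record]) -> tuple[list, list]:
    """Group records by (scope, key). The same value in several sources is a
    duplicate; different values for the same scope and key are a conflict."""
    groups: dict[tuple[str, str], list[Record]] = defaultdict(list)
    for rec in records:
        groups[(rec.scope, rec.key)].append(rec)

    duplicates: list = []
    conflicts: list = []
    for (scope, key), recs in sorted(groups.items()):
        if len(recs) < 2:
            continue  # defined in one place only: nothing to compare
        sources = sorted(r.source for r in recs)
        # repr() makes unhashable values (lists, dicts) comparable inside a set.
        distinct_values = {repr(r.value) for r in recs}
        if len(distinct_values) == 1:
            duplicates.append((scope, key, sources))
        else:
            conflicts.append((scope, key, sources))
    return duplicates, conflicts


# Hypothetical records from several sources, for illustration only.
EXAMPLE_RECORDS = [
    Record("git:helm/values-prod.yaml", "prod", "checkout.timeout_ms", 3000),
    Record("k8s:configmap/checkout", "prod", "checkout.timeout_ms", 5000),
    Record("git:terraform/prod.tfvars", "prod", "region", "eu-west-1"),
    Record("env:checkout-prod", "prod", "region", "eu-west-1"),
    Record("flags:new-checkout", "tenant:acme", "new_checkout", True),
]

if __name__ == "__main__":
    duplicates, conflicts = find_duplicates_and_conflicts(EXAMPLE_RECORDS)
    print("duplicates:", duplicates)
    print("conflicts:", conflicts)
```

Both findings stay read-only, which matches the wedge. The output is evidence for a human to act on, not an automatic fix.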
The market is real but fragmented. The exact phrase “Configuration Control Plane” is not yet fully owned, which leaves room to claim it. The strongest adjacent categories are already crowded, but none of them fully covers the companywide living configuration surface.
ConfigAtlas has a credible opening if it becomes the map, resolver, and evidence layer across existing systems.
The sharpest positioning:
ConfigAtlas is not where all configuration must live. It is where configuration becomes visible, explainable, governable, and safe to change.