Featured Post

Statistical Inference with Emergent Constraints

Various attempts have been made to narrow the likely range of the equilibrium climate sensitivity (ECS) through exploitation of “emergent constraints.” They generally use correlations between the response of climate models to increasing greenhouse gas (GHG) concentrations and a quantity in principle observable in the present climate (e.g., an amplitude of natural fluctuations) to constrain ECS given measurements of the present-day observable. However, recent studies have arrived at different conclusions about likely ECS ranges. The different conclusions arise at least in part because the studies have systematically underestimated statistical uncertainties. 

For example, Brown and Caldeira (2017) use fluctuations in Earth’s top-of-the-atmosphere (TOA) energy budget and their correlation with the response of climate models to increases in GHG concentrations to infer that ECS lies between 3 and 4.2 K with 50% probability, and most likely is 3.7 K. Assuming t statistics, this roughly corresponds to an ECS range that in IPCC parlance is considered likely (66% probability) between 2.8 and 4.5 K. By contrast, Cox et al. (2018) use fluctuations of the global-mean temperature and their correlation with the response of climate models to increases in GHG concentrations to infer that ECS likely lies between 2.2 and 3.4 K, and most likely is 2.8 K. These estimates are quite different from one another, albeit not statistically significantly so. Why?

One reason is that the statistical inference procedure, which is similar in both studies, systematically underestimates uncertainties. One way to illustrate this is to look at the data Florent Brient and I analyzed in another emergent-constraint paper, which used fluctuations in TOA energy fluxes in marine tropical low-cloud (TLC) regions and their correlation with ECS (Brient and Schneider 2016, see blog post). [The data used in our paper are similar to those in Brown and Caldeira (2017).]

Figure 1: (a) Scatterplot of ECS vs. deseasonalized covariance of marine tropical low-cloud (TLC) reflectance \alpha_c with surface temperature T in CMIP5 models (numbered in order of increasing ECS). Gray lines represent a robust regression line (solid), with the 90% confidence interval of the fitted values (dashed) estimated by a bootstrap procedure. The green line at the lower axis indicates the PDF of the deseasonalized TLC reflectance variation with surface temperature inferred from observations. The vertical green band indicates the 66% band of the observations. The blue horizontal band shows the likely (66%) ECS range inferred from a linear regression procedure, taking into account uncertainties estimated by bootstrapping predictions from estimated regression models. (b) Posterior PDF of ECS (orange) obtained by a weighted average of the climate models, given the observations. The bars with circles represent the mode and confidence intervals (66% and 90%) implied by the posterior (orange) PDF and the prior (gray) PDF. Adapted from Brient and Schneider (2016). [Update 01/25/18: Corrected blue shading band. Note that the dashed lines around the regression line mark the 90% confidence interval on the fitted ECS values, not on predicted ECS values; they are not directly used in the estimation of the ECS uncertainty based on the regression.]

Figure 1a shows the relation in 29 current climate models between ECS and the strength with which the reflection of sunlight in TLC regions covaries with surface temperature. That is, the horizontal axis shows the percentage change in the reflection of sunlight per degree surface warming, for deseasonalized natural variations. It is clear that there is a strong correlation (correlation coefficient about -0.7) between ECS on the vertical axis and the natural fluctuations on the horizontal axis—an example of an empirical fluctuation-dissipation relation in the models. The green line on the horizontal axis indicates the probability density function (PDF) of the observed natural fluctuations. The center 66% of this PDF is indicated by the green shaded band.

What many previous emergent-constraint studies have done is to take such a band of observations and project it onto the vertical ECS axis using the estimated regression line between ECS and the natural fluctuations, taking into account uncertainties in the estimated regression model. This is what both Brown and Caldeira (2017) and Cox et al. (2018) did, among several others. If we do this with the data here, we obtain an ECS that likely lies within the blue band: between 3.1 and 4.2 K, with a most likely value of 3.7 K. Simply looking at the scatter of the 29 models in this plot indicates that this uncertainty band is too narrow. For example, model 7 is consistent with the observations, but has a much lower ECS of 2.6 K. The regression analysis would imply that the probability of an ECS this low or lower is less than 4%. Yet this is one of 29 models, and one of relatively few (around 9) that are likely consistent with the data. Obviously, the probability of an ECS this low is much larger than what the regression analysis implies. What went wrong in the regression-based inference? [Update 01/25/18: Corrected the inferred ECS range in this paragraph and related text, which previously were incorrect because of coding errors.]
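The projection step can be sketched in a few lines. The following is a hypothetical Python illustration with synthetic numbers standing in for the CMIP5 ensemble and the observations; it is not the actual analysis code (which is Matlab, on GitHub):

```python
# Hypothetical sketch: project an observed band onto ECS through a fitted
# regression line. All numbers here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "model ensemble": predictor x (natural-fluctuation strength)
# and ECS, with a built-in linear relation plus scatter.
n = 29
x = rng.normal(0.0, 1.0, n)
ecs = 3.5 - 0.6 * x + rng.normal(0.0, 0.4, n)

# Ordinary least-squares fit of ECS on x (polyfit returns slope first).
b1, b0 = np.polyfit(x, ecs, 1)

# Project the observed central value and its 66% band through the line.
x_obs, x_lo, x_hi = 0.3, -0.1, 0.7   # hypothetical observation and band
ecs_central = b0 + b1 * x_obs
ecs_band = sorted([b0 + b1 * x_lo, b0 + b1 * x_hi])
print(ecs_central, ecs_band)
```

The narrowness problem discussed above is not visible in this simple projection itself; it arises from treating the fitted line as the true relationship when converting the observational band into an ECS band.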

There are several problems with this kind of inference. Most fundamentally, the inference revolves around assuming that there exists a linear relationship, and estimating parameters in the linear relationship from climate models. But it is not clear that such a linear relationship does in fact exist, and estimating parameters in it is strongly influenced by models that are inconsistent with the observations, such as models 2 and 3, and to a lesser degree, model 28 in Figure 1. In other words, the analysis neglects structural uncertainty about the adequacy of the assumed linear model, and the parameter uncertainty the analysis does take into account is strongly reduced by models that are “bad” by this model-data mismatch metric. Models that are inconsistent with the data, such as models 2 and 3, strongly influence the result, whereas the influence of models such as 7, which are consistent with the data but off the regression line, is diminished (they primarily affect the ECS uncertainty through their contribution to the variance of residuals). Given that there is no strong a priori knowledge about any linear relationship—this is why it is an “emergent” constraint—it seems inadvisable to make one’s statistical inference strongly dependent on models that are not consistent with the data at hand.
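The leverage effect described above can be illustrated with synthetic data: a couple of points far from the observations but near the regression line inflate the spread of the predictor and thereby shrink the estimated slope uncertainty. This is a hypothetical Python sketch, not the analysis behind Figure 1:

```python
# Hypothetical sketch: two high-leverage "models" far from the observed
# predictor range tighten the estimated regression slope.
import numpy as np

rng = np.random.default_rng(4)
n = 29
x = rng.normal(0.0, 1.0, n)
ecs = 3.5 - 0.6 * x + rng.normal(0.0, 0.4, n)

# Make two models extreme outliers in the predictor, sitting near the line
# (analogous to models inconsistent with the data but on the regression line).
x[:2] = [3.5, 4.0]
ecs[:2] = 3.5 - 0.6 * x[:2]

def slope_se(xv, yv):
    """Standard error of the OLS slope estimate."""
    b1, b0 = np.polyfit(xv, yv, 1)
    resid = yv - (b0 + b1 * xv)
    s2 = (resid**2).sum() / (len(xv) - 2)
    return np.sqrt(s2 / ((xv - xv.mean())**2).sum())

# The slope uncertainty shrinks when the high-leverage points are included,
# even though those points lie far from anything observable.
print(slope_se(x, ecs), slope_se(x[2:], ecs[2:]))
```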

There are several other problems. For example:

  • Often analysis parameters (such as the choice of how the TOA energy fluxes are averaged in space) are chosen so as to give strong correlations between the response of models to increases in GHG (e.g., ECS) and the natural fluctuations. This introduces selection bias in the estimation of the regression lines, which leads to biased estimates and underestimation of uncertainties in parameters such as the slope of the regression line (e.g., Miller 1984). In other words, when analysis parameters and subsets of regression variables are chosen so as to make a correlation large, thereafter estimating the correlation leads to biased estimates with underestimated uncertainties. This underestimation of uncertainties propagates into underestimated ECS uncertainties.
  • When regression parameters are estimated by least squares, the observable on the horizontal axis is treated as being a known predictor, rather than as being affected by error (e.g., from sampling variability). This likewise leads to underestimation of uncertainties in regression parameters. This problem can be mitigated by using errors-in-variables methods.
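As an illustration of the second point, here is a minimal Python sketch (with made-up noise levels, not the data of Figure 1) contrasting OLS with a Deming errors-in-variables fit when the predictor is itself measured with error:

```python
# Hypothetical sketch: OLS slope attenuation under predictor error, and a
# Deming (errors-in-variables) fit with known error-variance ratio.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x_true = rng.normal(0.0, 1.0, n)
y = 2.0 - 0.5 * x_true + rng.normal(0.0, 0.2, n)   # true relation, slope -0.5
x_meas = x_true + rng.normal(0.0, 0.5, n)          # predictor observed with error

# OLS slope is biased toward zero by the predictor noise.
ols_slope = np.polyfit(x_meas, y, 1)[0]

# Deming regression, with delta = var(y noise) / var(x noise) assumed known.
delta = 0.2**2 / 0.5**2
sxx = np.var(x_meas, ddof=1)
syy = np.var(y, ddof=1)
sxy = np.cov(x_meas, y)[0, 1]
deming_slope = (syy - delta * sxx
                + np.sqrt((syy - delta * sxx)**2 + 4 * delta * sxy**2)) / (2 * sxy)

# The Deming slope should be closer to the true -0.5 than the OLS slope.
print(ols_slope, deming_slope)
```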

We in fact first tried to estimate ECS from the data in Figure 1a in the way described above, based on regression lines estimated by a robust regression method. But the uncertainties looked too small. So we developed an alternative inference procedure that does not suffer from some of the problems above. The idea is to arrive at a posterior PDF for ECS by weighting each model’s ECS by the likelihood of the model given the observations of the natural fluctuations. We used a measure from information theory, the Kullback–Leibler divergence or relative entropy, to estimate the logarithm of this model likelihood (Burnham and Anderson 2010). In this analysis, models such as numbers 2 and 3, which are inconsistent with observations, receive essentially zero weight—unlike in the regression-based analysis, they do not influence the final result. No linear relationship is assumed or implied, so models such as 7 receive a large weight because they are consistent with the data, although they lie far from any regression line. The resulting posterior PDF for ECS is shown by the orange line in Figure 1b. The most likely ECS value according to this analysis is 4.0 K—shifted upward relative to the regression estimate, toward the values in the cluster of models (around numbers 25 and 26) with relatively high ECS that are consistent with the observations. The likely ECS range stretches from 2.9 to 4.5 K. This is perhaps a disappointingly wide range. It is 50% wider than what the analysis based on linear regressions suggests, and it is not much narrower than what simple-minded equal weighting of raw climate models gives (gray line in Figure 1b). But it is a much more statistically defensible range.
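The weighting idea can be sketched in a few lines. The sketch below uses a simple Gaussian likelihood in place of the Kullback–Leibler measure used in the paper, and synthetic data in place of the CMIP5 ensemble; it is an illustration of the general approach, not the paper's procedure:

```python
# Hypothetical sketch: weight each model's ECS by its likelihood given the
# observed fluctuation statistic, instead of regressing ECS on it.
import numpy as np

rng = np.random.default_rng(2)
n = 29
x_models = rng.normal(0.0, 1.0, n)                 # each model's statistic
ecs = 3.5 - 0.6 * x_models + rng.normal(0.0, 0.4, n)

# Observations: mean and spread of the same statistic (synthetic values).
x_obs_mean, x_obs_std = 0.3, 0.25

# Gaussian log-likelihood of each model given the observations (a simple
# stand-in for the KL-divergence measure of the paper).
loglik = -0.5 * ((x_models - x_obs_mean) / x_obs_std) ** 2
w = np.exp(loglik - loglik.max())
w /= w.sum()

# Likelihood-weighted posterior mean ECS; models far from the observations
# receive essentially zero weight, regardless of any regression line.
ecs_posterior_mean = (w * ecs).sum()
print(ecs_posterior_mean)
```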

Even this more justifiable inference still suffers from several shortcomings. For example, it suffers from selection bias, and it treats the model ensemble as a random sample (which it is not). It also only weights models (see our previous discussion of this issue). An ECS far outside the range of what current models produce will always come out as being very unlikely. Yet what is the probability that Earth may in fact have an ECS outside the range of the current models? It is quite possible that there are processes and feedbacks that all models miss, and the probability of that being the case may not be all that small, given, for example, the rudimentary state of modeling clouds and their climate feedback.

[Update 01/26/18: Data and code for the regression analysis based on Figure 1a are on GitHub. The multimodel inference procedure (somewhat similar to but not quite the same as Bayesian model averaging) used in Figure 1b is described in Brient and Schneider (2016), and the model weights are listed in the paper.]


  1. Hi Tapio,

    This was a really informative post. The plain-language explanations of the various limitations of the linear regression approach to emergent constraints were especially useful. So, thank you!

    To my knowledge, the vast majority of emergent constraints-based work has focused on fields related to climate sensitivity — either ECS itself or specific feedbacks. But there seems to me no a priori reason why it couldn’t be applied to other aspects of the global warming response — e.g. regional precipitation change. Would you agree, or is there reason to suspect that emergent constraints are less well suited to hydrological fields and/or regional scales?


    1. Hi Spencer,

      Yes, I agree. To put it perhaps more broadly: Covariances (e.g., between natural variations in cloud cover and temperature, or between natural variations in carbon uptake and physical climate variables, as in Cox et al. 2013) can be very valuable metrics for model evaluation. And I would go further: we should use them not just to evaluate models, but to make component models (e.g., for clouds, the carbon cycle etc.) better.


  2. Dear Tapio

    It appears that you corrected your example Emergent Constraint on 25th Jan. Can you say what you did and why?
    Also, would you mind posting the (x,y) coordinates for the data in the scatter plot (your Fig 1a) along with the mean and stdev of the observational constraint?


    Peter Cox

    1. Dear Peter,

      What was there at first came from Florent’s old files, which did not contain what I first thought they contained. (The dangers of trying to come up with a quantitative illustration on the fly…) I then redid the analysis (Matlab code and data at GitHub, at the added link above). What I did is this:

      (1) Assume a regression model ECS = b_0 + b_1 x + \sigma \epsilon , where x is d\alpha/dT.
      (2) Estimate the parameters b_0, b_1, and \sigma from the climate model (ECS, x). Then predict ECS using the observational x.
      (3) Repeat (1) and (2) by bootstrapping pairs (ECS, x) from the climate models and predicting ECS, drawing normal noise terms \epsilon.
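      A minimal Python sketch of these three steps (the actual analysis is the Matlab code on GitHub, and the model data below are synthetic placeholders):

```python
# Hypothetical sketch of the bootstrap regression-prediction procedure
# described in steps (1)-(3); all data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(3)
n = 29
x = rng.normal(0.0, 1.0, n)                    # d(alpha)/dT per model
ecs = 3.5 - 0.6 * x + rng.normal(0.0, 0.4, n)  # model ECS values
x_obs = 0.3                                    # observational mean predictor

preds = []
for _ in range(2000):
    # (3) Resample (ECS, x) pairs from the model ensemble.
    idx = rng.integers(0, n, n)
    xb, yb = x[idx], ecs[idx]
    # (2) Fit ECS = b0 + b1*x by OLS and estimate the residual scale sigma.
    b1, b0 = np.polyfit(xb, yb, 1)
    sigma = np.std(yb - (b0 + b1 * xb), ddof=2)
    # (1)-(2) Predict ECS at the observed x, drawing a normal noise term.
    preds.append(b0 + b1 * x_obs + sigma * rng.normal())

preds = np.array(preds)
print(preds.mean(), preds.std())
```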

      There are several variants of this procedure possible (available in the code on GitHub). For example, one can use robust regression or ordinary least squares for the estimation. In the prediction, one can also draw from a (stationary) bootstrap sample of the observations (as we did in the paper), or just use the observational mean. All of these variants give essentially the same result. I think using standard regression inference as in your paper should be asymptotically equivalent (under the assumption of model adequacy). With OLS and using the mean observations as predictor, the estimated mean ECS is 3.68 K, standard deviation 0.58 K.


  3. This is a great post – thanks!

    It seems to me there are 2 natural ways to use emergent constraints. One approach is to interpret everything through the functional relationship between the constraint and the quantity to be predicted. This is basically the approach you’re criticizing in this note. The other approach (which you advocate) is to weight models based on their agreement with observations – basically throwing out models which seem unlikely. In my mind, the value of your note is in providing motivation for why one approach is better than the other. Would you agree with this characterization?

    Also, I’m not sure I agree with the argument that it’s wrong to assume a linear relationship exists. The whole goal of emergent-constraint hunting is to find predictors which we have reason to expect are tightly-related to our predictand. In this context even models whose predictors are far outside of the observational range can be valuable for clarifying the slope and uncertainty of the predictive relationship. It also seems unfair to say that the model-weighting approach is better because it doesn’t rely on the existence of a linear relationship when you *chose* the variable to compare against observations on the basis of that variable providing a good linear fit to your predictand. Are you objecting in particular to the assumption of a linear rather than a higher-order relationship? Or are you arguing against using data points which are far from observations? Note that my intuition is that model weighting is better than linear fitting… I’m just playing devil’s advocate here because the motivation for disliking linear fitting is still not entirely clear to me.

    Do you plan to publish this work? I’d like to cite it!

    Thanks again!

    1. Hi Peter,

      Thanks for your thoughts. My views are a bit more nuanced. A main point I wanted to make is that assuming a linear (or any other functional) relationship, without actually knowing a priori that it holds, leads to underestimation of uncertainties. The reason is that in the inference procedure, the functional relationship represented by the assumed model is treated as known, only with unknown parameters. Structural uncertainty about the assumed model is neglected; only parametric uncertainty is taken into account. If some fluctuation-dissipation relation for the climate system were known to exist, we would not need ’emergent’ constraints but would have actual constraints, and would have far fewer problems estimating climate sensitivity!

      Given that the (usually) linear relationship is not assured to hold, I am concerned about an inference procedure that relies on a linear relationship that is selected in part because it looks strong in models. It leads to selection bias, and it often (as in the example above) is most strongly constrained by models that are not consistent with data. As you say, that would be fine if there is in fact good scientific reason for the linear relationship to hold. But do we really believe this to be the case, e.g., for low-cloud and turbulence parameterizations, on which the relationship in the above example (and probably in many other emergent constraints) depends? That is, do we really believe that all or most parameterizations represent the relevant physics correctly, but just happen to exhibit (perhaps because of different parameter choices) weaker or stronger responses to perturbations? I am not convinced, and hence I would be cautious about making inferences dependent on models that are inconsistent with data.

      That being said, model weighting has its own issues. As you say, it still is affected by selection bias because it is also useful only if there is some clear (though not necessarily linear) empirical relationship between something observable and climate predictions. I do not want to advocate model weighting as the cure for all inference challenges with emergent constraints. But I do want to advocate for caution, as you nicely did in your 2014 paper on data mining for emergent constraints. I think we all need to be more cautious about stating probabilities for climate sensitivities.


      1. Hi Tapio,

        Thanks for the reply, which I wholeheartedly agree with. Perhaps we could say that probabilistic statements about climate sensitivity ignore uncertainty regarding the true relationship between predictor and predictand. That uncertainty can be broken down into 2 pieces: statements based on model weighting ignore uncertainty about how tight (and real) the constraint actually is, while statements based on an assumed functional relationship not only neglect uncertainty related to constraint validity, but also ignore uncertainty regarding what the correct functional relationship should actually be. In either case, assigning probabilities to ECS based on emergent constraints makes me uncomfortable.


  4. Dear Tapio,

    Thank you for this excellent post. I very largely agree with what you say, including the key point that it is quite possible that ECS may be outside the range of current models. I have a few comments, if I may.

    Regarding the emergent constraint used in Brient & Schneider (2016), it is noteworthy that if the models are weighted by reference to their consistency with the data, regression of ECS on TLC reflection variability explains almost none of the intermodel ECS variation – the R-squared is negligible. While that certainly justifies the rejection in the paper of the usual linear relationship assumption, I’m not sure that a likelihood-weighted model-averaging approach is very satisfactory either.

    One problem is that many models are closely related to and/or have similar characteristics to other models, but are nevertheless given full weightings. For instance, the method gives substantial weightings to both IPSL-CM5A-LR and IPSL-CM5A-MR, but these models differ only in their resolution.

    More fundamentally, if a constraint is satisfied by one or more low sensitivity models and one or more high sensitivity models, how can it be considered to give useful information about ECS? The fact that more of the constraint-satisfying models have high ECS than have low ECS seems very weak evidence that ECS is high. The models are, as we know, not a random sample.

    It is interesting that both IPSL-CM5A-LR and IPSL-CM5B-LR, respectively a high (4.1 K) and low (2.6 K) ECS model, closely satisfy the observational constraint. Although the later CM5B model scores marginally less well on your KL divergence measure, it has improved representation of the convective boundary layer and cumulus clouds, of tropical SW CRF and of mid-level cloud coverage. The fact that developing a model can leave its satisfaction of an observational constraint unaltered but drastically change its sensitivity seems to me to fatally undermine the emergent constraint approach in relation to any constraint for which this can occur.

    These points all support the need for caution regarding emergent constraints that you advocate.


    Nic Lewis

    PS The mean ECS of 3.45 ± 0.76 K given in Table 1 of Brient & Schneider 2016 looks wrong; the simple mean is 3.34 K. Maybe the median ECS (which is 3.45 K) was given by mistake?

    1. Dear Nic,

      Thanks for your comment. You are right to worry about the effect of correlations among models on any likelihood-weighted inference, e.g., about ECS. I tried to make the same point, perhaps too briefly, in the final paragraph of the post: Model weighting still assumes the models form a random sample, which is not the case, for the reasons you point out.

      If “a constraint is satisfied by one or more low sensitivity models and one or more high sensitivity models,” with model weighting, at least this would be reflected in the posterior ECS, in that it may not be much different from the prior. And of course, any scalar metric of model adequacy is incomplete at best and can lead to situations like the one you describe with the IPSL models, where a model may seem marginally worse by one scalar metric but is better by several other metrics.

      And I agree that too strong a reliance on emergent constraints can be misleading.

      Finally, thanks for pointing out the error in our Table 1. We do indeed seem to have reported the median ECS (apparently because we talked about the median in other places in the paper). But Florent assures me the other values in the table are indeed mean values.


  5. Dear Tapio,

    Fundamental question.

    How do you calculate the correlation between CO2 concentration and temperature, and do you think that the correlation occurs in both directions?

    There are many types of calculation, and I would like to know how you compute this correlation.

    In my opinion, correlation occurs in both directions.

    Best Regards

    Hannu (Finnish Meteorological Institute)

Comments are closed.