# Statistical Inference with Emergent Constraints

Various attempts have been made to narrow the likely range of the equilibrium climate sensitivity (ECS) through exploitation of “emergent constraints.” They generally use correlations between the response of climate models to increasing greenhouse gas (GHG) concentrations and a quantity in principle observable in the present climate (e.g., an amplitude of natural fluctuations) to constrain ECS given measurements of the present-day observable. However, recent studies have arrived at different conclusions about likely ECS ranges. The different conclusions arise at least in part because the studies have systematically underestimated statistical uncertainties.

For example, Brown and Caldeira (2017) use fluctuations in Earth’s top-of-the-atmosphere (TOA) energy budget and their correlation with the response of climate models to increases in GHG concentrations to infer that ECS lies between 3 and 4.2 K with 50% probability, and most likely is 3.7 K. Assuming t statistics, this roughly corresponds to an ECS range that in IPCC parlance is considered likely (66% probability) between 2.8 and 4.5 K. By contrast, Cox et al. (2018) use fluctuations of the global-mean temperature and their correlation with the response of climate models to increases in GHG concentrations to infer that ECS likely lies between 2.2 and 3.4 K, and most likely is 2.8 K. These estimates are quite different from one another, albeit not statistically significantly different. Why?
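
The conversion from a 50% range to a 66% range can be checked with a few lines of arithmetic. The sketch below is only an approximation: it assumes a symmetric normal distribution centered on the midpoint of the interval as a stand-in for the t statistics mentioned above, and the helper name `rescale_interval` is made up for this illustration.

```python
from statistics import NormalDist

def rescale_interval(lo, hi, p_from, p_to):
    """Rescale a central interval from coverage p_from to coverage p_to.

    Assumption: the underlying distribution is normal and symmetric about
    the interval midpoint (a stand-in for the t distribution).
    """
    z = NormalDist()  # standard normal
    center = 0.5 * (lo + hi)
    half_width = 0.5 * (hi - lo)
    # Ratio of the standard-normal quantiles that bound the two central intervals
    scale = z.inv_cdf(0.5 + p_to / 2) / z.inv_cdf(0.5 + p_from / 2)
    return center - half_width * scale, center + half_width * scale

# 50% range of 3 to 4.2 K, widened to a 66% ("likely") range
lo66, hi66 = rescale_interval(3.0, 4.2, 0.50, 0.66)
print(f"{lo66:.2f} to {hi66:.2f} K")
```

Under these assumptions the result is about 2.75 to 4.45 K, in line with the 2.8 to 4.5 K quoted above. A t distribution with few degrees of freedom would give a slightly different range.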

One reason is that the statistical inference procedure, which is similar in both studies, systematically underestimates uncertainties. One way to illustrate this is to look at the data Florent Brient and I analyzed in another emergent-constraint paper, which used fluctuations in TOA energy fluxes in marine tropical low-cloud (TLC) regions and their correlation with ECS (Brient and Schneider 2016, see blog post). [The data used in our paper are similar to those in Brown and Caldeira (2017).]

Figure 1a shows the relation in 29 current climate models between ECS and the strength with which the reflection of sunlight in TLC regions covaries with surface temperature. That is, the horizontal axis shows the percentage change in the reflection of sunlight per degree surface warming, for deseasonalized natural variations. It is clear that there is a strong correlation (correlation coefficient about -0.7) between ECS on the vertical axis and the natural fluctuations on the horizontal axis—an example of an empirical fluctuation-dissipation relation in the models. The green line on the horizontal axis indicates the probability density function (PDF) of the observed natural fluctuations. The center 66% of this PDF is indicated by the green shaded band.

What many previous emergent-constraint studies have done is to take such a band of observations and project it onto the vertical ECS axis using the estimated regression line between ECS and the natural fluctuations, taking into account uncertainties in the estimated regression model. This is what both Brown and Caldeira (2017) and Cox et al. (2018) did, among several others. If we do this with the data here, we obtain an ECS that likely lies within the blue band: between 3.1 and 4.2 K, with a most likely value of 3.7 K. Simply looking at the scatter of the 29 models in this plot indicates that this uncertainty band is too narrow. For example, model 7 is consistent with the observations, but has a much lower ECS of 2.6 K. The regression analysis would imply that the probability of an ECS this low or lower is less than 4%. Yet this is one of 29 models, and one of relatively few (around 9) that are likely consistent with the data. The probability of an ECS this low must therefore be much larger than what the regression analysis implies. What went wrong in the regression-based inference? [Update 01/25/18: Corrected the inferred ECS range in this paragraph and related text, which previously were incorrect because of coding errors.]
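
To make the regression-based projection concrete, here is a minimal sketch of one common variant. It is not the code used for Figure 1a: the ensemble is synthetic, the observed value `x_obs` and its standard error are invented, ordinary least squares replaces whatever regression method a given study uses, and normal distributions stand in for t distributions.

```python
import math
import random

random.seed(0)

# Synthetic stand-in for a 29-model ensemble (NOT the data of Figure 1a).
n = 29
x = [random.uniform(-3.0, 1.0) for _ in range(n)]           # hypothetical observable (% per K)
y = [3.3 - 0.6 * xi + random.gauss(0.0, 0.6) for xi in x]   # hypothetical ECS (K)

# Ordinary least-squares fit of ECS on the observable
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid_var = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

# Hypothetical observation of the natural fluctuations and its standard error
x_obs, x_obs_se = -1.0, 0.3

# Monte Carlo projection onto the ECS axis: draw an observation, then draw
# an ECS value from the regression's predictive distribution at that point.
samples = []
for _ in range(20000):
    xo = random.gauss(x_obs, x_obs_se)
    pred_sd = math.sqrt(resid_var * (1 + 1 / n + (xo - xbar) ** 2 / sxx))
    samples.append(random.gauss(intercept + slope * xo, pred_sd))
samples.sort()

# Central 66% ("likely") range of the projected ECS
ecs_lo = samples[int(0.17 * len(samples))]
ecs_hi = samples[int(0.83 * len(samples))]
print(f"likely ECS range: {ecs_lo:.2f} to {ecs_hi:.2f} K")
```

Note that every model in the ensemble enters the fit with equal weight, whether or not it is consistent with the observation, and that the width of the resulting band is tied to the assumed linear model.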

There are several problems with this kind of inference. Most fundamentally, the inference revolves around assuming that there exists a linear relationship, and estimating parameters in the linear relationship from climate models. But it is not clear that such a linear relationship does in fact exist, and estimating parameters in it is strongly influenced by models that are inconsistent with the observations, such as models 2 and 3, and to a lesser degree, model 28 in Figure 1. In other words, the analysis neglects structural uncertainty about the adequacy of the assumed linear model, and the parameter uncertainty the analysis does take into account is strongly reduced by models that are “bad” by this model-data mismatch metric. Models that are inconsistent with the data, such as models 2 and 3, strongly influence the result, whereas the influence of models such as 7, which are consistent with the data but off the regression line, is diminished (they primarily affect the ECS uncertainty through their contribution to the variance of residuals). Given that there is no strong a priori knowledge about any linear relationship—this is why it is an “emergent” constraint—it seems inadvisable to make one’s statistical inference strongly dependent on models that are not consistent with the data at hand.

There are several other problems. For example:

• Often analysis parameters (such as the choice of how the TOA energy fluxes are averaged in space) are chosen so as to give strong correlations between the response of models to increases in GHG (e.g., ECS) and the natural fluctuations. This introduces selection bias in the estimation of the regression lines, which leads to biased estimates and underestimation of uncertainties in parameters such as the slope of the regression line (e.g., Miller 1984). In other words, when analysis parameters and subsets of regression variables are chosen so as to make a correlation large, thereafter estimating the correlation leads to biased estimates with underestimated uncertainties. This underestimation of uncertainties propagates into underestimated ECS uncertainties.
• When regression parameters are estimated by least squares, the observable on the horizontal axis is treated as being a known predictor, rather than as being affected by error (e.g., from sampling variability). This likewise leads to underestimation of uncertainties in regression parameters. This problem can be mitigated by using errors-in-variables methods.
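
The selection-bias effect in the first point can be illustrated with a small simulation. The sketch below is a deliberately extreme null case in which there is no true relationship at all, and in which the candidate predictors are independent of each other (real analysis choices are not). The function names and the numbers of candidates and trials are arbitrary choices for illustration.

```python
import math
import random

def corr(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_abs_corr(n_models, n_choices, rng):
    """Largest |correlation| among n_choices candidate predictors,
    all of which are pure noise and unrelated to the response."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
    best = 0.0
    for _ in range(n_choices):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
        best = max(best, abs(corr(x, y)))
    return best

rng = random.Random(1)
trials = 300

# Mean |r| when the predictor is fixed in advance vs. picked as the best of 20
fixed = sum(max_abs_corr(29, 1, rng) for _ in range(trials)) / trials
selected = sum(max_abs_corr(29, 20, rng) for _ in range(trials)) / trials
print(f"mean |r|, predictor fixed in advance: {fixed:.2f}")
print(f"mean |r|, best of 20 candidates:      {selected:.2f}")
```

With 29 models, picking the best of 20 unrelated candidates yields a markedly larger apparent correlation than a predictor fixed in advance. Standard regression uncertainties computed after such a selection do not account for the search that preceded them.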

We in fact first tried to estimate ECS from the data in Figure 1a in the way described above, based on regression lines estimated by a robust regression method. But the uncertainties looked too small. So we developed an alternative inference procedure that does not suffer from some of the problems above. The idea is to arrive at a posterior PDF for ECS by weighting each model’s ECS by the likelihood of the model given the observations of the natural fluctuations. We used a measure from information theory, the Kullback–Leibler divergence or relative entropy, to estimate the logarithm of this model likelihood (Burnham and Anderson 2010). In this analysis, models such as numbers 2 and 3, which are inconsistent with observations, receive essentially zero weight—unlike in the regression-based analysis, they do not influence the final result. No linear relationship is assumed or implied, so models such as 7 receive a large weight because they are consistent with the data, although they lie far from any regression line. The resulting posterior PDF for ECS is shown by the orange line in Figure 1b. The most likely ECS value according to this analysis is 4.0 K—shifted upward relative to the regression estimate, toward the values in the cluster of models (around numbers 25 and 26) with relatively high ECS that are consistent with the observations. The likely ECS range stretches from 2.9 to 4.5 K. This is perhaps a disappointingly wide range. It is 50% wider than what the analysis based on linear regressions suggests, and it is not much narrower than what simple-minded equal weighting of raw climate models gives (gray line in Figure 1b). But it is a much more statistically defensible range.
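
The weighting idea can be sketched as follows. This is a simplified illustration, not the procedure of Brient and Schneider (2016): it assumes Gaussian PDFs for the observed and modeled fluctuations, takes the weight of each model to be proportional to exp(-KL divergence), and uses a made-up five-member ensemble. The actual scaling of the log-likelihood and the model weights are given in the paper.

```python
import math

def kl_gauss(mu_p, sd_p, mu_q, sd_q):
    """Kullback-Leibler divergence D(P || Q) between two univariate Gaussians."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)

# Hypothetical ensemble: (ECS in K, mean and std. dev. of the model's observable)
models = [
    (2.6, -1.0, 0.3),   # low ECS, consistent with the observation
    (2.1,  1.0, 0.3),   # inconsistent with the observation
    (3.9, -1.2, 0.3),
    (4.1, -0.9, 0.3),
    (4.6, -2.5, 0.3),   # inconsistent with the observation
]
obs_mu, obs_sd = -1.0, 0.3  # hypothetical observed fluctuation and its uncertainty

# Assumed weighting: log-likelihood of each model ~ -KL(observations || model)
raw = [math.exp(-kl_gauss(obs_mu, obs_sd, mu, sd)) for _, mu, sd in models]
weights = [r / sum(raw) for r in raw]

def weighted_quantile(values, weights, q):
    """Smallest value at which the cumulative weight reaches q."""
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= q:
            return value
    return max(values)

ecs = [m[0] for m in models]
posterior_mean = sum(w * e for w, e in zip(weights, ecs))
likely = (weighted_quantile(ecs, weights, 0.17), weighted_quantile(ecs, weights, 0.83))

for (e, _, _), w in zip(models, weights):
    print(f"ECS {e:.1f} K  weight {w:.3f}")
print(f"posterior mean ECS: {posterior_mean:.2f} K, likely range: {likely}")
```

In this toy ensemble the models that disagree with the observation receive essentially zero weight, while the low-ECS model that agrees with it keeps a substantial weight, so it widens the resulting range instead of being averaged away by a regression line.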

Even such a more justifiable inference still suffers from several shortcomings. For example, it suffers from selection bias, and it treats the model ensemble as a random sample (which it is not). It also only reweights existing models (see our previous discussion of this issue), so an ECS far outside the range of what current models produce will always come out as being very unlikely. Yet what is the probability that Earth may in fact have an ECS outside the range of the current models? It is quite possible that there are processes and feedbacks that all models miss, and the probability of that being the case may not be all that small, given, for example, the rudimentary state of modeling clouds and their climate feedback.

[Update 01/26/18: Data and code for the regression analysis based on Figure 1a are on GitHub. The multimodel inference procedure (somewhat similar to but not quite the same as Bayesian model averaging) used in Figure 1b is described in Brient and Schneider (2016), and the model weights are listed in the paper.]