RegEM: Regularized Expectation Maximization

RegEM is a software package that provides regularized variants of the classical expectation maximization algorithm for estimating statistics from and filling in missing values in incomplete datasets. It is widely used, for example, for imputing missing values in climate and other datasets and for estimating information about past climates from proxies such as tree-ring widths.

ARfit: Multivariate Autoregressive Model Fitting

ARfit is a software package for autoregressive (AR) time series modeling. It can estimate multivariate AR models from time series data, analyze spectral information (eigenmodes or principal oscillation patterns) of fitted models, and simulate time series. It is widely used, for example, for modeling and analyzing climate and finance time series and electroencephalograms.

GCMs: General Circulation Models

We use a variety of GCMs for our research, from dry atmosphere models to relatively complex coupled atmosphere-ocean models. Our model codes are freely available.

PyCLES: A Python-Based Large-Eddy Simulation Infrastructure

PyCLES is a Python-based large-eddy simulation (LES) code for the simulation of clouds and boundary layers.

RegEM: Regularized Expectation Maximization

Purpose

RegEM is a collection of Matlab modules for

  • the estimation of mean values and covariance matrices from incomplete datasets, and
  • the imputation of missing values in incomplete datasets.

The modules implement the regularized EM algorithm described in

T. Schneider, 2001: Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate, 14, 853-871.

The EM algorithm for Gaussian data is based on iterated linear regression analyses. In the regularized EM algorithm, a regularized estimation method replaces the conditional maximum likelihood estimation of regression parameters in the conventional EM algorithm for Gaussian data. The modules here provide truncated total least squares (with fixed truncation parameter) and ridge regression with generalized cross-validation as regularized estimation methods.

The implementation of the regularized EM algorithm is modular, so that the modules that perform he regularized estimation of regression parameters (e.g., ridge regression and generalized cross-validation) can be exchanged for other regularization methods and other methods of determiningca regularization parameter. Per-Christian Hansen’s Regularization Tools contain Matlab modules implementing a collection of regularization methods that can be adapted to fit into the framework of the EM algorithm. The generalized cross-validation modules of the regularized EM algorithm are adapted from Hansen’s generalized cross-validation modules.

In the Matlab implementation of the regularized EM algorithm, more emphasis was placed on the modularity of the program code than on computational efficiency.

The U.S. National Science Foundation’s Paleo Perspectives on Climate Change program provided funding for recent further developments of the RegEM algorithm.

 


Installation

The program package consists of several Matlab modules, all available on Github. To install the programs, download the package into a directory that is accessible by Matlab.

Starting Matlab and invoking Matlab’s online help function

help filename

displays information on the module filename.m.

ARfit: Multivariate Autoregressive Model Fitting

Purpose

ARfit is a collection of Matlab modules for

  • estimating parameters of multivariate autoregressive (AR) models,
  • diagnostic checking of fitted AR models, and
  • analyzing eigenmodes of fitted AR models.

The algorithms implemented in ARfit are described in the following papers, which should be referenced if you use ARfit in publications:

A. Neumaier and T. Schneider, 2001: Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans. Math. Softw., 27, 27-57.

T. Schneider and A. Neumaier, 2001: Algorithm 808: ARfit – A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans. Math. Softw., 27, 58-65.

ARfit includes support for multiple realizations (trials) of time series and can estimate parameters of multivariate AR models taking all available realizations into account.

Last ARfit revision: 1 December 2010

 


Installation

The ARfit package consists of a number of Matlab modules, all available on Github. To install Matlab, download the package and copy it into a directory accessible by Matlab.

Starting Matlab and invoking Matlab’s online help function

help filename

calls up detailed information on the purpose and the calling syntax of the module filename.m. The script ardem.m demonstrates the basic features of the modules contained in ARfit.


Scilab version

ARfit is also available for Scilab (provided by Holger Nahrstaedt).

GCMs: General Circulation Models

We use a variety of GCMs for our research, from dry atmosphere models to relatively complex coupled atmosphere-ocean models. The GCMs we use are built using the Flexible Modeling System (FMS) of NOAA’s Geophysical Fluid Dynamics Laboratory. For idealized GCMs, we use the FMS dynamical core (that is, the basic numerical schemes FMS provides for the hydrostatic primitive equations), with various idealizations for the lower boundary conditions, for radiative transfer, and for moist or dry convection.

Several dry and moist idealized GCMs are available in a common package available at Github. The package contains rudimentary instructions on how to run the models and sample output and log files with parameter settings for a few model configurations (dry, moist, moist with bucket surface hydrology).

The models were developed with support by the U.S. National Science Foundation (NSF) and the National Aeronautics and Space Administration (NASA).

 


Dry Idealized GCM

The dry idealized GCM has a spherical model surface that is spatially uniform and thermally insulating. Radiative heating and cooling are represented by Newtonian relaxation of temperatures toward prescribed radiative-equilibrium states. Because the radiative-equilibrium states are statically unstable, it is necessary to include a representation of convection. (The GCM can be run without a convection scheme, but the results become strongly resolution-dependent.) In the GCM, if a layer is statically unstable relative to a specified convective temperature lapse rate, a convection scheme relaxes temperatures in that layer toward an enthalpy-conserving profile with the convective lapse rate. The convective lapse rate can be specified to be smaller than the dry-adiabatic lapse rate, in which case the convection schemes mimics the stabilizing effect latent heat release has in moist convection. This makes it possible to obtain in this dry GCM mean circulations that resemble those of Earth’s atmosphere.

The GCM is described in

Schneider, T., and C. Walker, 2006: Self-organization of atmospheric macroturbulence into critical states of weak nonlinear eddy-eddy interactions. J. Atmos. Sci., 63, 1569-1586.

The radiative-equilibrium states toward which temperatures are relaxed are those of a semi-gray atmosphere, described in

Schneider, T., 2004: The tropopause and the thermal stratification in the extratropics of a dry atmosphere. J. Atmos. Sci., 61,1317-1340.


Moist Idealized GCM

The moist idealized GCM has a spherical model surface that is entirely water covered (an “aquaplanet”). By default, water at the surface does not move (a “slab ocean”), but it is also possible to prescribe ocean heat transport or to take wind-driven ocean heat transport in low latitudes into account through a simple one-dimensional model driven by surface winds. Radiative heating and cooling are represented by a two-stream radiative transfer scheme for a gray atmosphere, in which absorption and emission of solar and of thermal radiation do not depend on wavelength. The surface has a closed energy budget, with heating driving sensible heat fluxes and evaporation. Atmospheric water vapor is advected by the flow; it condenses and precipitates when it reaches saturation on the grid-scale of the model. The thermodynamics of water are simplified in that only the vapor-liquid phase transition is taken into account, and the latent heat of vaporization is taken to be constant, as in Frierson et al. (2006). Moist convection is represented by a simplified Betts-Miller convection scheme that relaxes temperatures toward a moist adiabat and specific humidities toward a profile with a prescribed relative humidity. (The convection scheme is a slight variant of that described in Frierson (2007).)

The GCM is described in

O’Gorman, P. A., and T. Schneider, 2008: The hydrological cycle over a wide range of climates simulated with an idealized GCM. J. Climate, 21, 3815-3832.

How the atmosphere can be coupled to wind-driven ocean heat transport goes back to Klinger and Marotzke (2000) and is described in

Levine, X. J., and T. Schneider, 2011: Response of the Hadley circulation to climate change in an aquaplanet GCM coupled to a simple representation of ocean heat transport. J. Atmos. Sci., 68, 769-783.


Planetary GCMs

The idealized GCMs described above are flexible and general enough that it is relatively straightforward to convert them into GCMs for other planets, for example, with different planetary rotation rates and radii and with different thermodynamic properties of the atmosphere, condensable species, and surface. Included in the distribution available here is code for simulations of the upper atmospheres of Jupiter and Saturn and of the troposphere and lower stratosphere of the Saturn moon Titan.

In these planetary GCMs, we use a relatively simple two-stream radiative transfer for scattering and absorbing atmospheres, with assumed diffuse incident of solar radiation at the top of the model domain. We use similar dry and moist convection schemes as in the dry and moist models above, for Titan adapted to the thermodynamic properties of methane as the condensable species. For the giant planet atmospheres (Jupiter, Saturn), there is an artificial lower boundary, at which drag acts and from which an intrinsic heat flux can emanate.

The GCMs are described in

Schneider, T., and J. Liu, 2009: Formation of jets and equatorial superrotation on Jupiter. J. Atmos. Sci., 66, 579-601.

Liu, J., and T. Schneider, 2010: Mechanisms of jet formation on the giant planets. J. Atmos. Sci., 67, 3652-3672.

Schneider, T., S. D. Graves, E. L. Schaller, and M. E. Brown, 2012: Polar methane accumulation and rain storms on Titan from simulations of the methane cycle. Nature, 481, 58-61.

PyCLES

PyCLES is a Python-based large-eddy simulation (LES) code, whose development is led by Kyle Pressel.  The source code, including test cases, can be downloaded from Github. We welcome further development of the code by the user community.

PyCLES is described in:

Pressel, K. G., C. M. Kaul, T. Schneider, Z. Tan, and S. Mishra, 2015: Large-eddy simulation in an anelastic framework with closed water and entropy balances. Journal of Advances in Modeling Earth Systems,  7, 1425–1456, doi:10.1002/2015MS000496.

A novel forcing framework with closed surface energy balance has been implemented in PyCLES as described in:

Tan, Z., T. Schneider, J. Teixeira, and K. G. Pressel, 2016: Large-eddy simulation of subtropical cloud-topped boundary layers. Part I: A forcing framework with closed surface energy balance. Journal of Advances in Modeling Earth Systems, in press.