RegEM: Regularized Expectation Maximization

RegEM is a software package that provides regularized variants of the classical expectation maximization algorithm for estimating statistics from and filling in missing values in incomplete datasets. It is widely used, for example, for imputing missing values in climate and other datasets and for estimating information about past climates from proxies such as tree-ring widths.

ARfit: Multivariate Autoregressive Model Fitting

ARfit is a software package for autoregressive (AR) time series modeling. It can estimate multivariate AR models from time series data, analyze spectral information (eigenmodes or principal oscillation patterns) of fitted models, and simulate time series. It is widely used, for example, for modeling and analyzing climate and finance time series and electroencephalograms.

GCMs: General Circulation Models

We use a variety of GCMs for our research, from dry atmosphere models to relatively complex coupled atmosphere-ocean models. Our model codes are freely available.

PyCLES: A Python-Based Large-Eddy Simulation Infrastructure

PyCLES is a Python-based large-eddy simulation (LES) code for the simulation of clouds and boundary layers.

RegEM: Regularized Expectation Maximization

Purpose

What follows is a collection of Matlab modules for

  • the estimation of mean values and covariance matrices from incomplete datasets, and
  • the imputation of missing values in incomplete datasets.

The modules implement the regularized EM algorithm described in

T. Schneider, 2001: Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate, 14, 853-871.

The EM algorithm for Gaussian data is based on iterated linear regression analyses. In the regularized EM algorithm, a regularized estimation method replaces the conditional maximum likelihood estimation of regression parameters in the conventional EM algorithm for Gaussian data. The modules here provide truncated total least squares (with fixed truncation parameter) and ridge regression with generalized cross-validation as regularized estimation methods.

The implementation of the regularized EM algorithm is modular, so that the modules that perform he regularized estimation of regression parameters (e.g., ridge regression and generalized cross-validation) can be exchanged for other regularization methods and other methods of determiningca regularization parameter. Per-Christian Hansen’s Regularization Tools contain Matlab modules implementing a collection of regularization methods that can be adapted to fit into the framework of the EM algorithm. The generalized cross-validation modules of the regularized EM algorithm are adapted from Hansen’s generalized cross-validation modules.

In the Matlab implementation of the regularized EM algorithm, more emphasis was placed on the modularity of the program code than on computational efficiency. The regularized EM algorithm is currently being developed further under a project funded by the U.S. National Science Foundation’s Paleo Perspectives on Climate Change program

 


Installation

The program package consists of several Matlab modules. To install the programs, copy the package (available as a tar.gz-file) into a directory that is accessible by Matlab. Unpack the package using

gunzip imputation.tar.gz
tar -xvf imputation.tar

Starting Matlab and invoking Matlab’s online help function

help filename

displays information on the module filename.m.

 


Module Descriptions

CHANGES
Recent significant changes of the programs.
center.m
Centers data by subtracting the mean.
gcvfctn.m (auxiliary module to gcvridge.m)
Evaluates generalized cross-validation function.
gcvridge.m
Finds minimum of generalized cross-validation function for ridge regression.
iridge.m
Computes regression parameters by individual ridge regressions.
kcv_ttls.m
Selects truncation parameter for TTLS by K-fold cross-validation.
kcvindices.m
Returns random indices for K-fold cross-validation.
missingness_patterns.m
Returns unique patterns of missing values in a data matrix.
mridge.m
Computes regression parameters by a multiple ridge regression.
nancov.m
Sample covariance matrix of available values in incomplete dataset.
nanmean.m
Sample mean of available values in incomplete dataset.
nanstd.m
Standard deviation of available values in incomplete dataset.
nansum.m
Sum over available values in incomplete dataset.
pca_truncation_criteria.m
Computes criteria for truncating principal component analyses
peigs.m
Computes positive eigenvalues and corresponding eigenvectors.
pttls.m
Computes regression parameters by truncated total least squares.
regem.m
Driver module for regularized EM algorithm.
standardize.m
Standardizes data by subtracting the mean and scaling with the standard deviation.

ARfit: Multivariate Autoregressive Model Fitting

Purpose

ARfit is a collection of Matlab modules for

  • estimating parameters of multivariate autoregressive (AR) models,
  • diagnostic checking of fitted AR models, and
  • analyzing eigenmodes of fitted AR models.

The algorithms implemented in ARfit are described in the following papers, which should be referenced if you use ARfit in publications:

A. Neumaier and T. Schneider, 2001: Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans. Math. Softw., 27, 27-57.

T. Schneider and A. Neumaier, 2001: Algorithm 808: ARfit – A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans. Math. Softw., 27, 58-65.

ARfit includes support for multiple realizations (trials) of time series and can estimate parameters of multivariate AR models taking all available realizations into account.

Last ARfit revision: 1 December 2010

 


Installation

The ARfit package consists of a number of Matlab modules, the file CHANGES with a history of recent revisions of the programs, and the above papers.

To install ARfit, copy the package (available as a zip-archive) into a directory that is accessible by Matlab. Unpack the package using

unzip arfit.zip

on Unix/Linux platforms or an equivalent command on other platforms.

Starting Matlab and invoking Matlab’s online help function

help filename

calls up detailed information on the purpose and the calling syntax of the module filename.m. The script ardem.m demonstrates the basic features of the modules contained in ARfit.

If you experience problems downloading ARfit in the packaged form, you may want to download the ARfit files individually.

 


Module descriptions

CHANGES
A history of recent changes to ARfit.
acf.m
Plots the sample autocorrelation function of a univariate time series (using XCORR from the Matlab Signal Processing Toolbox).
adjph.m (auxiliary routine)
Multiplies a complex vector by a phase factor such that the real part and the imaginary part of the vector are orthogonal and the norm of the real part is greater than or equal to the norm of the imaginary part. ADJPH is required by ARMODE to normalize the eigenmodes of an AR model.
arconf.m
Computes approximate confidence intervals for the AR model coefficients.
ardem.m
Demonstrates the use of modules contained in the ARfit package.
arfit.m
Stepwise selection of the order of an AR model and least squares estimation of AR model parameters.
arfit.pdf
Published description of the algorithms.
arfit_alg.pdf
Published note on using ARfit.
armode.m
Eigendecomposition of AR model. For a fitted AR model, ARMODE computes eigenmodes and their associated oscillation periods and damping times, as well as approximate confidence intervals for the eigenmodes, periods, and damping times.
arord.m (auxiliary routine)
Computes approximate order selection criteria for a sequence of AR models. ARORD is required by ARFIT.
arqr.m (auxiliary routine)
QR factorization for least squares estimation of AR model parameters. ARQR is required by ARFIT.
arres.m
Diagnostic checking of the residuals of a fitted model. Computes the time series of residuals. The modified multivariate portmanteau statistic of Li & McLeod (1981) is used to test the residuals for uncorrelatedness.
arsim.m
Simulation of AR processes.
tquant.m (auxiliary routine)
Quantiles of Student’s t distribution. (TQUANT is required by ARCONF and ARMODE in the construction of confidence intervals.)

Scilab version

ARfit is also available for Scilab (provided by Holger Nahrstaedt).

GCMs: General Circulation Models

We use a variety of GCMs for our research, from dry atmosphere models to relatively complex coupled atmosphere-ocean models. The GCMs we use are built using the Flexible Modeling System (FMS) of NOAA’s Geophysical Fluid Dynamics Laboratory. For idealized GCMs, we use the FMS dynamical core (that is, the basic numerical schemes FMS provides for the hydrostatic primitive equations), with various idealizations for the lower boundary conditions, for radiative transfer, and for moist or dry convection.

Several dry and moist idealized GCMs are available in a common package available at Github. The package contains rudimentary instructions on how to run the models and sample output and log files with parameter settings for a few model configurations (dry, moist, moist with bucket surface hydrology).

The models were developed with support by the U.S. National Science Foundation (NSF) and the National Aeronautics and Space Administration (NASA).

 


Dry Idealized GCM

The dry idealized GCM has a spherical model surface that is spatially uniform and thermally insulating. Radiative heating and cooling are represented by Newtonian relaxation of temperatures toward prescribed radiative-equilibrium states. Because the radiative-equilibrium states are statically unstable, it is necessary to include a representation of convection. (The GCM can be run without a convection scheme, but the results become strongly resolution-dependent.) In the GCM, if a layer is statically unstable relative to a specified convective temperature lapse rate, a convection scheme relaxes temperatures in that layer toward an enthalpy-conserving profile with the convective lapse rate. The convective lapse rate can be specified to be smaller than the dry-adiabatic lapse rate, in which case the convection schemes mimics the stabilizing effect latent heat release has in moist convection. This makes it possible to obtain in this dry GCM mean circulations that resemble those of Earth’s atmosphere.

The GCM is described in

Schneider, T., and C. Walker, 2006: Self-organization of atmospheric macroturbulence into critical states of weak nonlinear eddy-eddy interactions. J. Atmos. Sci., 63, 1569-1586.

The radiative-equilibrium states toward which temperatures are relaxed are those of a semi-gray atmosphere, described in

Schneider, T., 2004: The tropopause and the thermal stratification in the extratropics of a dry atmosphere. J. Atmos. Sci., 61,1317-1340.


Moist Idealized GCM

The moist idealized GCM has a spherical model surface that is entirely water covered (an “aquaplanet”). By default, water at the surface does not move (a “slab ocean”), but it is also possible to prescribe ocean heat transport or to take wind-driven ocean heat transport in low latitudes into account through a simple one-dimensional model driven by surface winds. Radiative heating and cooling are represented by a two-stream radiative transfer scheme for a gray atmosphere, in which absorption and emission of solar and of thermal radiation do not depend on wavelength. The surface has a closed energy budget, with heating driving sensible heat fluxes and evaporation. Atmospheric water vapor is advected by the flow; it condenses and precipitates when it reaches saturation on the grid-scale of the model. The thermodynamics of water are simplified in that only the vapor-liquid phase transition is taken into account, and the latent heat of vaporization is taken to be constant, as in Frierson et al. (2006). Moist convection is represented by a simplified Betts-Miller convection scheme that relaxes temperatures toward a moist adiabat and specific humidities toward a profile with a prescribed relative humidity. (The convection scheme is a slight variant of that described in Frierson (2007).)

The GCM is described in

O’Gorman, P. A., and T. Schneider, 2008: The hydrological cycle over a wide range of climates simulated with an idealized GCM. J. Climate, 21, 3815-3832.

How the atmosphere can be coupled to wind-driven ocean heat transport goes back to Klinger and Marotzke (2000) and is described in

Levine, X. J., and T. Schneider, 2011: Response of the Hadley circulation to climate change in an aquaplanet GCM coupled to a simple representation of ocean heat transport. J. Atmos. Sci., 68, 769-783.


Planetary GCMs

The idealized GCMs described above are flexible and general enough that it is relatively straightforward to convert them into GCMs for other planets, for example, with different planetary rotation rates and radii and with different thermodynamic properties of the atmosphere, condensable species, and surface. Included in the distribution available here is code for simulations of the upper atmospheres of Jupiter and Saturn and of the troposphere and lower stratosphere of the Saturn moon Titan.

In these planetary GCMs, we use a relatively simple two-stream radiative transfer for scattering and absorbing atmospheres, with assumed diffuse incident of solar radiation at the top of the model domain. We use similar dry and moist convection schemes as in the dry and moist models above, for Titan adapted to the thermodynamic properties of methane as the condensable species. For the giant planet atmospheres (Jupiter, Saturn), there is an artificial lower boundary, at which drag acts and from which an intrinsic heat flux can emanate.

The GCMs are described in

Schneider, T., and J. Liu, 2009: Formation of jets and equatorial superrotation on Jupiter. J. Atmos. Sci., 66, 579-601.

Liu, J., and T. Schneider, 2010: Mechanisms of jet formation on the giant planets. J. Atmos. Sci., 67, 3652-3672.

Schneider, T., S. D. Graves, E. L. Schaller, and M. E. Brown, 2012: Polar methane accumulation and rain storms on Titan from simulations of the methane cycle. Nature, 481, 58-61.

PyCLES

PyCLES is a Python-based large-eddy simulation (LES) code, whose development is led by Kyle Pressel.  The source code, including test cases, can be downloaded from Github. We welcome further development of the code by the user community.

PyCLES is described in:

Pressel, K. G., C. M. Kaul, T. Schneider, Z. Tan, and S. Mishra, 2015: Large-eddy simulation in an anelastic framework with closed water and entropy balances. Journal of Advances in Modeling Earth Systems,  7, 1425–1456, doi:10.1002/2015MS000496.

A novel forcing framework with closed surface energy balance has been implemented in PyCLES as described in:

Tan, Z., T. Schneider, J. Teixeira, and K. G. Pressel, 2016: Large-eddy simulation of subtropical cloud-topped boundary layers. Part I: A forcing framework with closed surface energy balance. Journal of Advances in Modeling Earth Systems, in press.