## RegEM: Regularized Expectation Maximization

RegEM is a software package that provides regularized variants of the classical expectation maximization algorithm for estimating statistics from incomplete datasets and imputing their missing values. It is widely used, for example, to impute missing values in climate and other datasets and to estimate information about past climates from proxies such as tree-ring widths.

## ARfit: Multivariate Autoregressive Model Fitting

ARfit is a software package for autoregressive (AR) time series modeling. It can estimate multivariate AR models from time series data, analyze spectral information (eigenmodes or principal oscillation patterns) of fitted models, and simulate time series. It is widely used, for example, for modeling and analyzing climate and finance time series and electroencephalograms.

## GCMs: General Circulation Models

We use a variety of GCMs for our research, from dry atmosphere models to relatively complex coupled atmosphere-ocean models. Our model codes are freely available.

## PyCLES: A Python-Based Large-Eddy Simulation Infrastructure

PyCLES is a Python-based large-eddy simulation (LES) code for the simulation of clouds and boundary layers.

## RegEM: Regularized Expectation Maximization

### Purpose

What follows is a collection of Matlab modules for

- the estimation of mean values and covariance matrices from incomplete datasets, and
- the imputation of missing values in incomplete datasets.

The modules implement the regularized EM algorithm described in

T. Schneider, 2001: Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. *Journal of Climate*, **14**, 853-871.

The EM algorithm for Gaussian data is based on iterated linear regression analyses. In the regularized EM algorithm, a regularized estimation method replaces the conditional maximum likelihood estimation of regression parameters in the conventional EM algorithm for Gaussian data. The modules here provide truncated total least squares (with fixed truncation parameter) and ridge regression with generalized cross-validation as regularized estimation methods.
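As a rough illustration of the iterated regressions described above, the following Python/NumPy sketch runs a simplified regularized EM loop. It is not the package's Matlab code: the function name `regem_sketch` and its arguments are hypothetical, the ridge parameter `h` is held fixed instead of being chosen by generalized cross-validation, the residual covariance uses the conventional conditional-covariance formula with the ridge coefficients substituted, and every record is assumed to contain at least one available value.

```python
import numpy as np

def regem_sketch(X, h=0.1, n_iter=50):
    """Simplified regularized EM with a fixed ridge parameter h (illustrative only)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    missing = np.isnan(X)

    # Initial guess: fill missing values with the means of the available values.
    mu = np.nanmean(X, axis=0)
    Xf = np.where(missing, mu, X)
    C = np.cov(Xf, rowvar=False, bias=True)

    for _ in range(n_iter):
        C_res_sum = np.zeros((p, p))
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            a = ~m
            Caa = C[np.ix_(a, a)]
            Cam = C[np.ix_(a, m)]
            # Ridge-regularized regression of missing on available variables.
            D = np.diag(np.diag(Caa))
            B = np.linalg.solve(Caa + h**2 * D, Cam)
            # Impute the missing values from the regression.
            Xf[i, m] = mu[m] + (Xf[i, a] - mu[a]) @ B
            # Accumulate the residual covariance of the imputed values.
            C_res_sum[np.ix_(m, m)] += C[np.ix_(m, m)] - Cam.T @ B
        # Re-estimate mean and covariance from the completed data.
        mu = Xf.mean(axis=0)
        Xc = Xf - mu
        C = (Xc.T @ Xc + C_res_sum) / n
    return Xf, mu, C
```

Calling `regem_sketch(X)` on an array with `NaN` entries returns the completed data together with the estimated mean and covariance matrix; a fixed number of iterations stands in for a proper convergence check.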

The implementation of the regularized EM algorithm is modular, so that the modules that perform the regularized estimation of regression parameters (e.g., ridge regression and generalized cross-validation) can be exchanged for other regularization methods and other methods of determining a regularization parameter. Per-Christian Hansen’s Regularization Tools contain Matlab modules implementing a collection of regularization methods that can be adapted to fit into the framework of the EM algorithm. The generalized cross-validation modules of the regularized EM algorithm are adapted from Hansen’s generalized cross-validation modules.

In the Matlab implementation of the regularized EM algorithm, more emphasis was placed on the modularity of the program code than on computational efficiency. The regularized EM algorithm is currently being developed further under a project funded by the U.S. National Science Foundation’s Paleo Perspectives on Climate Change program.

### Installation

The program package consists of several Matlab modules. To install the programs, copy the package (available as a tar.gz-file) into a directory that is accessible by Matlab. Unpack the package using

`gunzip imputation.tar.gz`

`tar -xvf imputation.tar`

Starting Matlab and invoking Matlab’s online help function

`help filename`

displays information on the module `filename.m`.

### Module Descriptions

- `CHANGES`: Recent significant changes of the programs.
- `center.m`: Centers data by subtracting the mean.
- `gcvfctn.m` (auxiliary module to `gcvridge.m`): Evaluates generalized cross-validation function.
- `gcvridge.m`: Finds minimum of generalized cross-validation function for ridge regression.
- `iridge.m`: Computes regression parameters by individual ridge regressions.
- `kcv_ttls.m`: Selects truncation parameter for TTLS by K-fold cross-validation.
- `kcvindices.m`: Returns random indices for K-fold cross-validation.
- `missingness_patterns.m`: Returns unique patterns of missing values in a data matrix.
- `mridge.m`: Computes regression parameters by a multiple ridge regression.
- `nancov.m`: Sample covariance matrix of available values in incomplete dataset.
- `nanmean.m`: Sample mean of available values in incomplete dataset.
- `nanstd.m`: Standard deviation of available values in incomplete dataset.
- `nansum.m`: Sum over available values in incomplete dataset.
- `pca_truncation_criteria.m`: Computes criteria for truncating principal component analyses.
- `peigs.m`: Computes positive eigenvalues and corresponding eigenvectors.
- `pttls.m`: Computes regression parameters by truncated total least squares.
- `regem.m`: Driver module for regularized EM algorithm.
- `standardize.m`: Standardizes data by subtracting the mean and scaling with the standard deviation.
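To make the role of the generalized cross-validation modules more concrete, here is a minimal Python/NumPy sketch of choosing a ridge parameter by minimizing the GCV function over a list of candidates. It is an illustration under stated assumptions, not a port of `gcvridge.m` or `gcvfctn.m`: the name `ridge_gcv` is hypothetical, the minimum is taken over a user-supplied grid of positive values and not by a continuous search, and the design matrix is assumed to have more rows than columns.

```python
import numpy as np

def ridge_gcv(X, y, candidates):
    """Pick the ridge parameter h that minimizes the GCV function (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    # Part of y outside the column space of X; it is unaffected by h.
    outside = max(y @ y - Uty @ Uty, 0.0)

    best_h, best_G, best_beta = None, np.inf, None
    for h in candidates:
        f = s**2 / (s**2 + h**2)              # ridge filter factors
        resid2 = np.sum(((1.0 - f) * Uty) ** 2) + outside
        G = n * resid2 / (n - f.sum()) ** 2   # GCV function value
        if G < best_G:
            best_h, best_G = h, G
            # Ridge solution expressed through the SVD of X.
            best_beta = Vt.T @ (s / (s**2 + h**2) * Uty)
    return best_h, best_beta
```

For example, `ridge_gcv(X, y, [0.01, 0.1, 1.0, 10.0])` returns the selected candidate and the corresponding ridge regression coefficients.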

## ARfit: Multivariate Autoregressive Model Fitting

### Purpose

ARfit is a collection of Matlab modules for

- estimating parameters of multivariate autoregressive (AR) models,
- diagnostic checking of fitted AR models, and
- analyzing eigenmodes of fitted AR models.

The algorithms implemented in ARfit are described in the following papers, which should be referenced if you use ARfit in publications:

A. Neumaier and T. Schneider, 2001: Estimation of parameters and eigenmodes of multivariate autoregressive models. *ACM Trans. Math. Softw.*, **27**, 27-57.

T. Schneider and A. Neumaier, 2001: Algorithm 808: ARfit – A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. *ACM Trans. Math. Softw.*, **27**, 58-65.

ARfit includes support for multiple realizations (trials) of time series and can estimate parameters of multivariate AR models taking all available realizations into account.
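For readers unfamiliar with multivariate AR models, the following Python/NumPy sketch shows a plain least squares fit of a model of the form v_t = w + A_1 v_{t-1} + ... + A_p v_{t-p} + noise for a fixed order p and a single realization. It is only an illustration: the function name `fit_var` is hypothetical, and the sketch omits the stepwise order selection, the QR-based estimation, and the support for multiple realizations that the ARfit modules provide.

```python
import numpy as np

def fit_var(v, p):
    """Least squares fit of a multivariate AR(p) model to v, shape (n, m). Illustrative only."""
    v = np.asarray(v, dtype=float)
    n, m = v.shape
    # Predictor matrix with rows [1, v_{t-1}, ..., v_{t-p}] for t = p, ..., n-1.
    K = np.hstack([np.ones((n - p, 1))] +
                  [v[p - k:n - k] for k in range(1, p + 1)])
    Y = v[p:]
    B, *_ = np.linalg.lstsq(K, Y, rcond=None)
    w = B[0]                                           # intercept vector
    A = [B[1 + (k - 1) * m:1 + k * m].T for k in range(1, p + 1)]
    resid = Y - K @ B
    # Noise covariance estimate, normalized by the residual degrees of freedom.
    C = resid.T @ resid / (n - p - K.shape[1])
    return w, A, C
```

Here `A[k - 1]` holds the coefficient matrix for lag `k`, and `C` estimates the covariance matrix of the noise.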

Last ARfit revision: 1 December 2010

### Installation

The ARfit package consists of a number of Matlab modules, the file CHANGES with a history of recent revisions of the programs, and the above papers.

To install ARfit, copy the package (available as a zip-archive) into a directory that is accessible by Matlab. Unpack the package using

`unzip arfit.zip`

on Unix/Linux platforms or an equivalent command on other platforms.

Starting Matlab and invoking Matlab’s online help function

`help filename`

calls up detailed information on the purpose and the calling syntax of the module `filename.m`. The script `ardem.m` demonstrates the basic features of the modules contained in ARfit.

If you experience problems downloading ARfit in the packaged form, you may want to download the ARfit files individually.

### Module descriptions

- `CHANGES`: A history of recent changes to ARfit.
- `acf.m`: Plots the sample autocorrelation function of a univariate time series (using XCORR from the Matlab Signal Processing Toolbox).
- `adjph.m` (auxiliary routine): Multiplies a complex vector by a phase factor such that the real part and the imaginary part of the vector are orthogonal and the norm of the real part is greater than or equal to the norm of the imaginary part. ADJPH is required by ARMODE to normalize the eigenmodes of an AR model.
- `arconf.m`: Computes approximate confidence intervals for the AR model coefficients.
- `ardem.m`: Demonstrates the use of modules contained in the ARfit package.
- `arfit.m`: Stepwise selection of the order of an AR model and least squares estimation of AR model parameters.
- `arfit.pdf`: Published description of the algorithms.
- `arfit_alg.pdf`: Published note on using ARfit.
- `armode.m`: Eigendecomposition of AR model. For a fitted AR model, ARMODE computes eigenmodes and their associated oscillation periods and damping times, as well as approximate confidence intervals for the eigenmodes, periods, and damping times.
- `arord.m` (auxiliary routine): Computes approximate order selection criteria for a sequence of AR models. ARORD is required by ARFIT.
- `arqr.m` (auxiliary routine): QR factorization for least squares estimation of AR model parameters. ARQR is required by ARFIT.
- `arres.m`: Diagnostic checking of the residuals of a fitted model. Computes the time series of residuals. The modified multivariate portmanteau statistic of Li & McLeod (1981) is used to test the residuals for uncorrelatedness.
- `arsim.m`: Simulation of AR processes.
- `tquant.m` (auxiliary routine): Quantiles of Student’s t distribution. TQUANT is required by ARCONF and ARMODE in the construction of confidence intervals.
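The oscillation periods and damping times mentioned for ARMODE can be illustrated with a short Python/NumPy sketch. It assumes a stable model, builds the companion matrix of the AR coefficient matrices, and converts each eigenvalue λ into a damping time −1/log|λ| and a period 2π/|arg λ| (in units of the sampling interval). The function name `ar_eigenmodes` is hypothetical, and the sketch omits the eigenmode normalization and the confidence intervals that the package computes.

```python
import numpy as np

def ar_eigenmodes(A_list):
    """Eigenvalues, damping times, and periods of an AR(p) model (illustrative only)."""
    m = A_list[0].shape[0]
    p = len(A_list)
    # Companion matrix: coefficient matrices in the top block row,
    # identity blocks on the subdiagonal.
    comp = np.zeros((m * p, m * p))
    comp[:m, :] = np.hstack(A_list)
    if p > 1:
        comp[m:, :-m] = np.eye(m * (p - 1))
    lam = np.linalg.eigvals(comp)
    tau = -1.0 / np.log(np.abs(lam))          # damping times
    with np.errstate(divide="ignore"):
        # Real positive eigenvalues have zero phase and hence an infinite period.
        period = 2.0 * np.pi / np.abs(np.angle(lam))
    return lam, tau, period
```

As a check of the formulas, a first-order model whose coefficient matrix is 0.9 times a rotation by π/6 has eigenvalues 0.9·exp(±iπ/6), which correspond to a period of 12 time steps.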

### Scilab version

ARfit is also available for Scilab (provided by Holger Nahrstaedt).

## PyCLES

PyCLES is a Python-based large-eddy simulation (LES) code whose development is led by Kyle Pressel. The source code, including test cases, can be downloaded from GitHub. We welcome further development of the code by the user community.

PyCLES is described in:

Pressel, K. G., C. M. Kaul, T. Schneider, Z. Tan, and S. Mishra, 2015: Large-eddy simulation in an anelastic framework with closed water and entropy balances. *Journal of Advances in Modeling Earth Systems*, **7**, 1425–1456, doi:10.1002/2015MS000496.

A novel forcing framework with closed surface energy balance has been implemented in PyCLES as described in:

Tan, Z., T. Schneider, J. Teixeira, and K. G. Pressel, 2016: Large-eddy simulation of subtropical cloud-topped boundary layers. Part I: A forcing framework with closed surface energy balance. *Journal of Advances in Modeling Earth Systems*, in press.