Augmenting Data Wisely to Speed up the EM Algorithm

Xiao-Li Meng and David A. van Dyk
Department of Statistics, The University of Chicago, Chicago, IL 60637, U.S.A.

Speeding up the EM algorithm has been a topic of interest since its formulation, but thus far it has been typically achieved by sacrificing the simplicity and/or stability of EM. In this paper we show that in the common EM application of fitting multivariate t-distributions to arbitrary data sets, it is possible to speed up EM while losing neither simplicity nor stability. Our general idea is to introduce a working parameter into the data-augmentation scheme underlying the EM construction, minimize the fraction of missing information as a function of the working parameter, and thus maximize the speed of EM. For the t problem, we show that the optimal EM is computationally as simple as the standard EM (i.e., the iteratively reweighted least squared algorithm), but always converges faster for any t-model being fit to any data set. Our simulations show that the reduction in the number of iterations is especially dramatic (e.g., 90%) when the degrees of freedom is small (e.g., the Cauchy model) and/or the dimension of t is high. We thus suggest that the standard EM algorithm for fitting the t model be replaced by the optimal EM algorithm in all applications.

Key Words: Convergence Rate; Data Augmentation; EM Algorithm; Incomplete Data; Iteratively Reweighted Least Squares; t-Distribution.

Return to David van Dyk's homepage.