Augmenting Data Wisely to Speed up the EM Algorithm
Xiao-Li Meng and David A. van Dyk
Department of Statistics, The University of Chicago,
Chicago, IL 60637, U.S.A.
Speeding up the EM algorithm has been a topic of interest since its
formulation, but thus far it has been typically achieved
by sacrificing the simplicity and/or stability of EM.
In this paper we show that in the common EM application
of fitting multivariate t-distributions to arbitrary data
sets, it is possible to speed up EM while losing neither simplicity nor
stability. Our general idea is to introduce a working parameter into the
data-augmentation scheme underlying the EM construction,
minimize the fraction of missing information as a function of the
working parameter, and thus maximize the speed of EM.
For the t problem, we
show that the optimal EM is computationally as simple as the
standard EM (i.e., the iteratively reweighted least squared
algorithm), but always converges faster
for any t-model being fit to any data set.
Our simulations show that the
reduction in the number of iterations is
especially dramatic (e.g., 90%) when the degrees of freedom is
small (e.g., the Cauchy model) and/or the dimension of t is high.
We thus suggest that the standard EM algorithm for
fitting the t model be replaced by the optimal EM
algorithm in all applications.
Key Words: Convergence Rate; Data Augmentation; EM Algorithm;
Incomplete Data; Iteratively Reweighted Least Squares;
t-Distribution.
Return to David van Dyk's homepage.