Multiple Imputation for Incomplete Data with Semicontinuous Variables

Submitted to Biometrika

Kristin Javaras
Department of Statistics, Oxford University, Oxford, OX1 3RG, U.K.

David A. van Dyk
Department of Statistics, Harvard University, Cambridge, MA 02138, U.S.A.

We consider an application of multiple imputation to data consisting not only of partially missing categorical and continuous variables but also of partially missing 'semicontinuous variables': variables that take on a single discrete value with positive probability but are otherwise continuously distributed. Multiple imputation requires a suitable imputation model, so for this situation we propose the 'blocked general location model', an extension of the standard general location model proposed by Olkin & Tate (1961) for mixed discrete and continuous variables. We also introduce EM and data augmentation algorithms for the blocked general location model with missing data, which can be used to generate imputations under that model.
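To make the notion of a semicontinuous variable concrete, the following minimal Python sketch simulates one and splits it into a binary indicator plus a continuous part. It illustrates only the definition given in the abstract and is not the blocked general location model or its EM and data augmentation algorithms. The function names, the choice of zero as the discrete value, the lognormal continuous part, and all parameter values are assumptions made for illustration.

```python
import random


def draw_semicontinuous(n, p_zero=0.3, mu=0.0, sigma=1.0, seed=0):
    """Draw n values equal to 0 with probability p_zero, otherwise lognormal.

    Hypothetical helper: the point mass at 0 and the lognormal
    continuous part are illustrative assumptions only.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        if rng.random() < p_zero:
            values.append(0.0)  # the single discrete value
        else:
            values.append(rng.lognormvariate(mu, sigma))  # continuous part
    return values


def two_part_encoding(values, point=0.0):
    """Encode each value as (indicator, continuous part).

    indicator is 0 when the value sits at the point mass, in which
    case the continuous part is undefined (None); otherwise it is 1.
    """
    return [(0, None) if v == point else (1, v) for v in values]


sample = draw_semicontinuous(1000)
encoded = two_part_encoding(sample)
share_at_zero = sum(1 for ind, _ in encoded if ind == 0) / len(encoded)
print(f"share of observations at the point mass: {share_at_zero:.3f}")
```

The encoding pairs a categorical component with a continuous component that is defined only for some observations, which is the kind of mixed structure an imputation model for such data has to accommodate.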
