This package extends the mice package by adding functionality to perform multivariate predictive mean matching (PMM) on imputed data, as well as new functionality to perform predictive mean matching on factor variables.
The miceExt package can be installed from CRAN as follows: `install.packages("miceExt")`
Overall, miceExt provides three functions, namely `mice.post.matching()`, `mice.binarize()`, and `mice.factorize()`.
The first function post-processes the results of the `mice()` algorithm by performing multivariate predictive mean matching on a user-defined set of column tuples. The resulting imputations are always equal to already-observed values, which eliminates the chance of getting unrealistic output values. The latter two functions provide a new option for imputing categorical data by building on `mice.post.matching()`. The function `mice.binarize()` transforms the categorical attributes of a given data frame into a binary dummy representation, which yields an exclusively numerical data set that mice can handle well. Inconsistencies within the imputed dummy columns (for example, several ones within one set of dummy variables) can then be resolved by `mice.post.matching()`. Finally, `mice.factorize()` retransforms the imputed binary data into the corresponding original categories, resulting in a proper imputation of the given categorical data.
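The idea behind the binarize/factorize round trip can be illustrated with a minimal base R sketch. It is not the implementation used by `mice.binarize()` or `mice.factorize()`; the toy factor `x` and the helpers `to_dummies()`, `from_dummies()`, and `is_consistent()` are hypothetical names introduced only for this illustration.

```r
# Toy factor with one missing value (hypothetical data, not from the package).
x <- factor(c("a", "b", NA, "c", "a"), levels = c("a", "b", "c"))

# One 0/1 column per level; a missing category stays NA in every column.
to_dummies <- function(f) {
  m <- sapply(levels(f), function(lv) as.integer(f == lv))
  colnames(m) <- levels(f)
  m
}

# Inverse step: in each row, pick the level whose dummy column equals 1.
from_dummies <- function(m) {
  idx <- apply(m, 1, function(r) if (anyNA(r)) NA_integer_ else which(r == 1)[1])
  factor(colnames(m)[idx], levels = colnames(m))
}

# A dummy row is consistent only if exactly one column is set.
is_consistent <- function(r) sum(r) == 1

d <- to_dummies(x)
x_back <- from_dummies(d)

# Imputing each dummy column separately can produce rows like this one,
# which do not correspond to any single category.
bad_row <- c(a = 1L, b = 1L, c = 0L)
```

Rows such as `bad_row` are the kind of inconsistency that the post-matching step is meant to remove before the data is converted back into factors.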
In this example, we work on ‘mammal_data’, a modification of the ‘mammalsleep’ data set from mice. It is included in the miceExt package and has identical missing data patterns on the column tuples (‘ps’,‘sws’) and (‘mls’,‘gt’). We want to post-process the imputations obtained by running `mice()` on this data by performing multivariate PMM on these tuples. This procedure works in two simple steps:
1. Run mice on the data set ‘mammal_data’ and obtain a mids object to post-process: `mids_mammal <- mice(mammal_data)`
2. Run `mice.post.matching()`. As the column argument ‘cols’ has not been specified, it automatically detects the column tuples with identical missing data patterns and then imputes on these: `post_mammal <- mice.post.matching(mids_mammal)`
Now we can look into the resulting imputations via `post_mammal$midsobj$imp` or analyze the results via the `with()` function.
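To make the notion of multivariate matching more concrete, the following base R sketch imputes a tuple of two columns jointly from a single donor row. It is a simplified illustration and not the algorithm implemented in `mice.post.matching()`: the simulated data, the linear models, and the choice of the single nearest donor (instead of a random draw among several close donors, as PMM usually does) are all assumptions made for this example.

```r
set.seed(1)

# Simulated data: columns 'a' and 'b' share one missing data pattern.
n <- 30
z <- rnorm(n)
dat <- data.frame(z = z,
                  a = 2 * z + rnorm(n, sd = 0.3),
                  b = -z + rnorm(n, sd = 0.3))
miss <- c(3, 11, 20)
dat[miss, c("a", "b")] <- NA

# Predict both columns for every row from the fully observed rows.
obs <- complete.cases(dat)
fit_a <- lm(a ~ z, data = dat[obs, ])
fit_b <- lm(b ~ z, data = dat[obs, ])
pred <- cbind(a = predict(fit_a, newdata = dat),
              b = predict(fit_b, newdata = dat))

imputed <- dat
for (i in which(!obs)) {
  # Squared Euclidean distance between the predicted tuples.
  target <- matrix(pred[i, ], nrow = sum(obs), ncol = 2, byrow = TRUE)
  d <- rowSums((pred[obs, , drop = FALSE] - target)^2)
  # Copy the observed tuple of the closest donor as a whole.
  donor <- which(obs)[which.min(d)]
  imputed[i, c("a", "b")] <- dat[donor, c("a", "b")]
}
```

Because both values are taken from the same donor row, every imputed tuple is a combination that was actually observed in the data.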
In this example, we want to use the package to impute the categorical columns ‘gen’ and ‘phb’ of the data set ‘boys’, which is included in the mice package. This works in three main steps:
1. Binarize the factor columns in ‘boys’ that we want to impute on: `boys_bin <- mice.binarize(boys, cols = c("gen", "phb"))`
2. Run mice on the binarized data (note that we need to grab `boys_bin$data`) and post-process it with `mice.post.matching()`, as it is very likely that mice imputed multiple ones among one set of dummy variables: `mids_boys <- mice(boys_bin$data)` followed by `post_boys <- mice.post.matching(mids_boys, distmetric = "residual")`
3. Retransform the resulting imputations back into categorical format: `res_boys <- mice.factorize(post_boys$midsobj, boys_bin$par_list)`
In this case, too, we can analyze the resulting imputed data set via the `with()` function. If, for example, we want to take a closer look at the distribution of the values of ‘gen’, we can use: `with(res_boys, table(gen))`