JointAI: Joint Analysis and Imputation of Incomplete Data

Travis-CI Build Status CRAN_Status_Badge Download counter Rdoc

The package JointAI provides joint analysis and imputation of (generalized) linear regression models, (generalized) linear mixed models and parametric (Weibull) survival models with incomplete (covariate) data in the Bayesian framework.

The package performs some preprocessing of the data and creates a JAGS model, which will then automatically be passed to JAGS with the help of the R package rjags.

JointAI also provides summary and plotting functions for the output.

Installation

You can install JointAI from GitHub with:

# install.packages("devtools")
devtools::install_github("NErler/JointAI")

Main functions

Currently, there are three main functions that perform linear, generalized linear or linear mixed regression:

lm_imp()
glm_imp()
lme_imp()
glme_imp()
survreg_imp()

lm_imp() and glm_imp() use specification similar to their complete data counterparts lm() and glm(), whereas lme_imp() uses similar specification as lme() from the package nlme. glme_imp() uses a combination of the specifications from glm() (with regards to the family) and lme (with regards to the random effects structure). survreg_imp() is a missing data version of survreg from the package survival.

Functions summary(), traceplot(), densityplot() provide a summary of the posterior distribution and its visualization.

GR_crit() and MC_error() provide the Gelman-Rubin diagnostic for convergence and the Monte Carlo error of the MCMC sample, respectively.

Minimal Example

Visualize the observed data and missing data pattern

library(JointAI)

par(mar = c(2.5, 3, 2.5, 1), mgp = c(2, 0.8, 0))
plot_all(NHANES[c(1, 5:6, 8:12)], fill = '#18bc9c', border = '#2C3E50', ncol = 4, nclass = 30)

md_pattern(NHANES, color = c('#2C3E50', '#18bc9c'))

Fit a linear regression model with incomplete covariates

lm1 <- lm_imp(SBP ~ gender + age + WC + alc + educ + bili,
              data = NHANES, n.iter = 500, progress.bar = 'none')

Visualize the MCMC sample

traceplot(lm1, col = c('#E74C3C', '#2C3E50', '#18bc9c'), ncol = 4)

densplot(lm1, col = c('#E74C3C', '#2C3E50', '#18bc9c'), ncol = 4, lwd = 2)

Summarize the Result

summary(lm1)
#> 
#>  Linear model fitted with JointAI 
#> 
#> Call:
#> lm_imp(formula = SBP ~ gender + age + WC + alc + educ + bili, 
#>     data = NHANES, n.iter = 500, progress.bar = "none")
#> 
#> Posterior summary:
#>                Mean     SD    2.5%   97.5% tail-prob. GR-crit
#> (Intercept)  88.138 8.8987  70.907 105.004    0.00000   1.016
#> genderfemale -3.493 2.2629  -7.995   0.733    0.11333   1.026
#> age           0.333 0.0698   0.198   0.471    0.00000   0.999
#> educhigh     -2.834 2.1179  -6.937   1.213    0.17600   1.001
#> WC            0.226 0.0727   0.087   0.365    0.00133   1.006
#> bili         -5.325 4.8712 -14.671   4.107    0.27867   1.034
#> alc>=1        6.474 2.3852   1.617  10.975    0.00800   1.008
#> 
#> Posterior summary of residual std. deviation:
#>           Mean    SD 2.5% 97.5% GR-crit
#> sigma_SBP 13.6 0.713 12.3  15.1       1
#> 
#> 
#> MCMC settings:
#> Iterations = 101:600
#> Sample size per chain = 500 
#> Thinning interval = 1 
#> Number of chains = 3 
#> 
#> Number of observations: 186
coef(lm1)
#>  (Intercept) genderfemale          age     educhigh           WC 
#>   88.1376999   -3.4929987    0.3329173   -2.8344294    0.2261967 
#>         bili       alc>=1 
#>   -5.3246059    6.4737628

confint(lm1)
#>                      2.5%       97.5%
#> (Intercept)   70.90745371 105.0043681
#> genderfemale  -7.99483709   0.7328822
#> age            0.19811543   0.4708507
#> educhigh      -6.93670692   1.2126142
#> WC             0.08699786   0.3646676
#> bili         -14.67086056   4.1073254
#> alc>=1         1.61683002  10.9745147
#> sigma_SBP     12.29417867  15.0952757