# MCMC Settings

#### 2018-12-04

In JointAI, models are estimated in the Bayesian framework, using MCMC (Markov Chain Monte Carlo) sampling. The sampling is done by the software JAGS (“Just Another Gibbs Sampler”), which performs Gibbs sampling. JointAI pre-processes the data to get it into a form that can be passed to JAGS and writes the JAGS model. The R package rjags is used as the interface to JAGS.

Here, we describe how to specify the arguments in the main functions that control MCMC related settings. To learn more about how to specify the other arguments in JointAI models, check out the vignette on Model Specification.

In this vignette we use the NHANES data for examples in cross-sectional data and the dataset simLong for examples in longitudinal data. For more information on these datasets, check out the vignette Visualizing Incomplete Data, in which the distributions of variables and missing values in both datasets are explored.

Note:
In many of the examples we use progress.bar = 'none' which prevents printing of the progress of the MCMC sampling, since this would result in lengthy output in the vignette.

## Chains and Iterations

In MCMC sampling, values are drawn from a probability distribution. The distribution the current value is drawn from depends on the previously drawn value (but not on values before that). Values, thus, form a chain. Once the chain has converged, its elements can be seen as a sample from the target posterior distribution.
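As a toy illustration of this dependence (a base-R sketch, not part of JointAI), a random-walk Metropolis sampler draws each value from a proposal centred at the previous value:

```r
# Toy random-walk Metropolis sampler for a standard normal target.
# Each value is drawn based only on the previously drawn value, so the
# sampled values form a Markov chain.
set.seed(42)
n_iter <- 1000
chain <- numeric(n_iter)
chain[1] <- 5  # deliberately poor starting value

for (i in 2:n_iter) {
  proposal <- chain[i - 1] + rnorm(1)  # proposal depends on previous value
  # accept with probability min(1, target(proposal) / target(previous))
  if (runif(1) < dnorm(proposal) / dnorm(chain[i - 1])) {
    chain[i] <- proposal
  } else {
    chain[i] <- chain[i - 1]
  }
}
```

After the influence of the poor starting value has worn off, the remaining values behave like a (correlated) sample from the target distribution.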

### Number of chains

To evaluate convergence of MCMC chains it is helpful to create multiple chains with different starting values. The argument n.chains selects the number of chains (by default, n.chains = 3).
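For example (a hypothetical call, using the NHANES model from the examples below), five chains could be requested with:

```r
# Sketch: request five chains instead of the default three
mod <- lm_imp(SBP ~ alc, data = NHANES, n.chains = 5,
              n.iter = 100, progress.bar = 'none')
```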

For calculating the model summary, multiple chains are merged.

### Adaptive phase

JAGS has an adaptive mode, in which samplers are optimized (for example, the step size is adjusted). Samples obtained during the adaptive mode do not form a Markov chain and are discarded. The argument n.adapt controls the length of this adaptive phase.

The default value for n.adapt is 100, which works well in many of the examples considered here. Complex models may require longer adaptive phases. If the adaptive phase is not sufficient for JAGS to optimize the samplers, a warning message will be printed (see example below).

### Sampling iterations

n.iter specifies the number of iterations in the sampling phase, i.e., the length of the MCMC chain. How many samples are required to reach convergence and to have sufficient precision depends on the complexity of data and model, and may range from as few as 100 to several million.

#### Side note: How to evaluate convergence?

Convergence can, for instance, be evaluated visually using a traceplot() or using the Gelman-Rubin diagnostic criterion [1] (implemented in GR_crit()). The latter compares within- and between-chain variability and requires the JointAI object to have at least two chains.
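The idea behind the Gelman-Rubin criterion can be sketched in base R (a simplified version for illustration only; in JointAI, use GR_crit()):

```r
# Simplified potential scale reduction factor: compares within-chain
# and between-chain variability. 'chains' is a matrix with one column
# per chain.
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var_hat <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_hat / W)                     # values close to 1 suggest convergence
}

set.seed(1)
# three chains sampled from the same distribution: rhat is close to 1
converged <- matrix(rnorm(3000), ncol = 3)
rhat(converged)
```

If the chains explored different regions (e.g., because they have not yet converged), the between-chain variance would dominate and rhat would be clearly larger than 1.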

#### Side note: How to check precision?

Precision of the MCMC sample can be checked with the function MC_error(). It calculates the Monte Carlo error (the error that is made since the sample is finite) and compares it to the standard deviation of the posterior sample. A rule of thumb is that the Monte Carlo error should not be more than 5% of the standard deviation [2].
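The rule of thumb can be sketched in base R (a simplified version that assumes independent draws; autocorrelated chains have a larger Monte Carlo error):

```r
# For independent draws, the Monte Carlo error of the posterior mean
# is sd(x) / sqrt(n), so its ratio to the posterior standard deviation
# is 1 / sqrt(n): 1000 independent samples give about 3%, i.e., within
# the 5% rule of thumb.
set.seed(123)
x <- rnorm(1000)                   # stand-in for a posterior sample
mc_err <- sd(x) / sqrt(length(x))
mc_err / sd(x)                     # = 1 / sqrt(1000), about 0.032
```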

### Thinning

In settings with high autocorrelation, i.e., when there are no large jumps in the chain and sampled values remain close to the previous value, it may take many iterations before the sample sufficiently represents the whole range of the posterior distribution.

Processing of such long chains can be slow and take a lot of memory. The argument thin specifies whether, and by how much, the MCMC chains should be thinned before being stored. By default, thin = 1, which corresponds to keeping all values. With thin = 10, every 10th value is kept and all other values are discarded.
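Thinning itself is simple to sketch in base R:

```r
# Keeping every 10th value of a chain of 500 sampled values
chain <- rnorm(500)
thin <- 10
thinned <- chain[seq(thin, length(chain), by = thin)]
length(thinned)  # 50 values remain
```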

### Examples

#### Default settings

n.adapt = 100 and thin = 1 with 100 sampling iterations

mod1 <- lm_imp(SBP ~ alc, data = NHANES, n.iter = 100, progress.bar = 'none')

The relevant part of the model summary (obtained with summary()) shows that the first 100 iterations (adaptive phase) were discarded, and the 100 iterations that follow form the posterior sample. Thinning was set to 1 and there are 3 chains.

#> [...]
#> MCMC settings:
#> Iterations = 101:200
#> Sample size per chain = 100
#> Thinning interval = 1
#> Number of chains = 3

mod2 <- lm_imp(SBP ~ alc, data = NHANES, n.adapt = 10, n.iter = 100, progress.bar = 'none')
#> Warning in rjags::jags.model(file = modelfile, data = data_list, inits = inits, : Adaptation
#> incomplete
#> NOTE: Stopping adaptation

Specifying n.adapt = 10 results in a warning message. The relevant part of the model summary from the resulting model is:

#> [...]
#> MCMC settings:
#> Iterations = 11:110
#> Sample size per chain = 100
#> Thinning interval = 1
#> Number of chains = 3

#### Thinning

mod3 <- lm_imp(SBP ~ alc, data = NHANES, n.iter = 500, thin = 10, progress.bar = 'none')

Here, iterations 110 to 600 are used in the output, but due to the thinning interval of ten, the resulting MCMC chains contain only 50 samples instead of 500, namely the samples from iterations 110, 120, 130, …

#> [...]
#> MCMC settings:
#> Iterations = 110:600
#> Sample size per chain = 50
#> Thinning interval = 10
#> Number of chains = 3

## Parameters to follow

JAGS only saves the values of MCMC chains for those nodes for which the user has specified that they should be monitored. This is also the case in JointAI.

### What are nodes?

Nodes are variables in the Bayesian framework, i.e., everything that is observed or unobserved. This includes the data and parameters that are estimated, but also missing values in the data or parts of the data that are generally unobserved, such as random effects or latent class indicators.

### Specifying which nodes should be monitored

For this purpose, lm_imp(), glm_imp() and lme_imp() have an argument monitor_params.

For details, explanation and examples see the vignette Parameter Selection.
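As a brief, hypothetical illustration (the available options are described in that vignette), the imputed values could be added to the monitored nodes via monitor_params:

```r
# Sketch: additionally monitor the imputed values
mod <- lm_imp(SBP ~ alc, data = NHANES, n.iter = 100,
              monitor_params = c(imps = TRUE), progress.bar = 'none')
```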

## Initial values

Initial values are the starting point for the MCMC sampler. Setting good initial values, i.e., initial values that are likely under the posterior distribution, can speed up convergence. By default, the argument inits = TRUE, which means that initial values are generated automatically by JointAI. When inits = FALSE, initial values are generated by JAGS.

### User specified initial values

It is also possible to supply initial values directly as

• a list or
• a function.

Initial values can be specified for every unobserved node, that is, parameters and missing values, but it is possible to only specify initial values for a subset of nodes.

#### Initial values in a list of lists

A list of initial values should have one element per chain, where each element is a named list of initial values. The initial values should differ between the chains.

For example, to create initial values for the parameter vector beta and the precision parameter tau_SBP for three chains:

init_list <- lapply(1:3, function(i) {
  list(beta = rnorm(4),
       tau_SBP = rgamma(1, 1, 1))
})

init_list
#> [[1]]
#> [[1]]$beta
#> [1] -1.2995363 -0.8698701  1.0543623 -0.1486396
#>
#> [[1]]$tau_SBP
#> [1] 1.479009
#>
#>
#> [[2]]
#> [[2]]$beta
#> [1] -0.7215865 -0.7787781 -0.6181664  1.6052045
#>
#> [[2]]$tau_SBP
#> [1] 0.1708875
#>
#>
#> [[3]]
#> [[3]]$beta
#> [1] -0.2227807 -0.4625003 -0.7372436  0.2611442
#>
#> [[3]]$tau_SBP
#> [1] 1.499624

The list is then passed to the argument inits. User provided lists of initial values are stored in the JointAI object:

mod4a <- lm_imp(SBP ~ gender + age + WC, data = NHANES, progress.bar = 'none',
                inits = init_list)

mod4a$mcmc_settings$inits
#> [[1]]
#> [[1]]$beta
#> [1] -1.2995363 -0.8698701  1.0543623 -0.1486396
#>
#> [[1]]$tau_SBP
#> [1] 1.479009
#>
#>
#> [[2]]
#> [[2]]$beta
#> [1] -0.7215865 -0.7787781 -0.6181664  1.6052045
#>
#> [[2]]$tau_SBP
#> [1] 0.1708875
#>
#>
#> [[3]]
#> [[3]]$beta
#> [1] -0.2227807 -0.4625003 -0.7372436  0.2611442
#>
#> [[3]]$tau_SBP
#> [1] 1.499624

#### Initial values as a function

Initial values can be specified by a function. The function should either take no arguments or a single argument called chain, and return a named list that supplies values for one chain.

For example, to create initial values for the parameter vectors beta and alpha:

inits_fun <- function() {
  list(beta = rnorm(4),
       alpha = rnorm(3))
}

inits_fun()
#> $beta
#> [1] -0.2727562 -0.2795425 -0.1957527  0.3865116
#>
#> $alpha
#> [1] 0.6313753 0.9355431 0.7734924

mod4b <- lm_imp(SBP ~ gender + age + WC, data = NHANES, progress.bar = 'none',
                inits = inits_fun)

mod4b$mcmc_settings$inits
#> function() {
#>   list(beta = rnorm(4),
#>        alpha = rnorm(3))
#> }
#> <bytecode: 0x000000001386ba70>

When a function is supplied, the function itself, but not the actual values, is stored in the JointAI object.
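A function that does use the chain argument could, for example, make the starting values depend on the chain index (a hypothetical sketch; inits_fun_chain is not part of JointAI):

```r
# Hypothetical sketch: chain-specific seed and spread of starting values,
# so that each chain starts from systematically different values
inits_fun_chain <- function(chain) {
  set.seed(chain)
  list(beta = rnorm(4, mean = 0, sd = chain),
       tau_SBP = rgamma(1, 1, 1))
}

inits_fun_chain(2)
```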

### For which nodes can initial values be specified?

Initial values can be specified for all unobserved stochastic nodes, i.e., parameters or unobserved data for which a distribution is specified in the JAGS model.

Initial values have to be supplied in the format in which the parameter or unobserved value is used in the JAGS model.

To find out which nodes there are in a model, the function coef() from the package rjags can be used. It returns a list of the current values of the MCMC chains, by default from the first chain. Elements of the initial values should have the same structure as the corresponding elements in this list.

#### Example:

We are using a longitudinal model and the simLong data in this example. The output is abbreviated to show the relevant characteristics.

mod4c <- lme_imp(bmi ~ time + HEIGHT_M + hc + SMOKE, random = ~ time | ID,
                 data = simLong, progress.bar = 'none')

coef(mod4c$model)
#> $RinvD
#>          [,1]      [,2]
#> [1,] 1.028083        NA
#> [2,]       NA 0.2349163
#>
#> $Xc
#>        [,1]       [,2] [,3] [,4]
#>   [1,]   NA         NA   NA   NA
#>   [2,]   NA         NA   NA   NA
#>   [3,]   NA         NA   NA   NA
#>
#> [...]
#>
#>   [9,]   NA         NA   NA   NA
#>  [10,]   NA         NA   NA   NA
#>  [11,]   NA -1.5208035   NA   NA
#>  [12,]   NA         NA   NA   NA
#>
#> [...]
#>
#> [498,]   NA         NA   NA   NA
#> [499,]   NA         NA   NA   NA
#>
#> $Xcat
#>        [,1]
#>   [1,]   NA
#>   [2,]   NA
#>   [3,]   NA
#>
#> [...]
#>
#>  [35,]   NA
#>  [36,]   NA
#>  [37,]    1
#>  [38,]   NA
#>  [39,]    1
#>  [40,]   NA
#>  [41,]   NA
#>
#> [...]
#>
#> [498,]   NA
#> [499,]   NA
#>
#> $alpha
#> [1] -0.007947985 -0.002403037
#>
#> $b
#>            [,1]        [,2]
#>   [1,] 16.35337 -1.91232469
#>   [2,] 18.92922 -1.32261424
#>   [3,] 15.73310 -0.88070042
#>   [4,] 18.13042 -0.76116505
#>   [5,] 14.45357 -0.41300231
#>   [6,] 17.55362 -0.96329273
#>
#> [...]
#>
#> [496,] 14.31769 -1.46642473
#> [497,] 15.93877 -0.79078109
#> [498,] 14.61694 -1.17746873
#> [499,] 14.58357 -0.58934464
#>
#> $beta
#> [1] 16.48923658  0.04855539 -0.05471580  0.10887890 -1.17770728  1.19268157
#>
#> $delta_SMOKE
#> [1] -0.8540864
#>
#> $gamma_SMOKE
#> [1] 1.127806       NA
#>
#> $invD
#>            [,1]       [,2]
#> [1,]  0.6847447 -0.1678952
#> [2,] -0.1678952  6.4420629
#>
#> $mu_b
#>        [,1]      [,2]
#>   [1,]   NA -1.177707
#>   [2,]   NA -1.177707
#>   [3,]   NA -1.177707
#>
#> [...]
#>
#> [497,]   NA -1.177707
#> [498,]   NA -1.177707
#> [499,]   NA -1.177707
#>
#> $tau_HEIGHT_M
#> [1] 1.039257
#>
#> $tau_bmi
#> [1] 1.315459

RinvD is the scale matrix in the Wishart prior for invD, the inverse of the random effects covariance matrix D. In the data that is passed to JAGS (which is stored in the element data_list of a JointAI object), this matrix is specified as a diagonal matrix with unknown diagonal elements:

mod4c$data_list['RinvD']
#> $RinvD
#>      [,1] [,2]
#> [1,]   NA    0
#> [2,]    0   NA

These diagonal elements are estimated in the model and have a Gamma prior. The corresponding part of the JAGS model is:

#> [...]
#>   # Priors for the covariance of the random effects
#>   for (k in 1:2){
#>     RinvD[k, k] ~ dgamma(a_diag_RinvD, b_diag_RinvD)
#>   }
#>   invD[1:2, 1:2] ~ dwish(RinvD[ , ], KinvD)
#>   D[1:2, 1:2] <- inverse(invD[ , ])
#> [...]

The element RinvD in the initial values therefore has to be a 2 × 2 matrix with positive values on the diagonal and NA as off-diagonal elements, since the off-diagonal elements are fixed in the data.

The next element in the output of coef(mod4c$model) is Xc, the fixed effects design matrix containing the baseline covariates. The corresponding matrix in the data is

head(mod4c$data_list$Xc, 15)
#>        (Intercept)   HEIGHT_M SMOKEsmoked until pregnancy was known
#> 1.1              1  1.4568523                                    NA
#> 10.1             1 -1.2863098                                    NA
#> 100.11           1 -1.1350146                                    NA
#> 101.1            1  0.2962555                                    NA
#> 102.4            1  0.7211347                                    NA
#> 103.15           1 -1.8893248                                    NA
#> 104.4            1 -0.8216534                                    NA
#> 105.2            1  0.5784470                                    NA
#> 106.1            1  0.3689238                                    NA
#> 107.1            1 -0.6264986                                    NA
#> 108.7            1         NA                                    NA
#> 109.1            1  0.5398961                                    NA
#> 11.1             1  0.2053576                                    NA
#> 110.26           1  0.6703233                                    NA
#> 111.1            1  1.6552801                                    NA
#>        SMOKEcontinued smoking in pregnancy
#> 1.1                                     NA
#> 10.1                                    NA
#> 100.11                                  NA
#> 101.1                                   NA
#> 102.4                                   NA
#> 103.15                                  NA
#> 104.4                                   NA
#> 105.2                                   NA
#> 106.1                                   NA
#> 107.1                                   NA
#> 108.7                                   NA
#> 109.1                                   NA
#> 11.1                                    NA
#> 110.26                                  NA
#> 111.1                                   NA

Again, the matrix Xc in the initial values has the same dimension as Xc in the data, with values where Xc in the data has missing values (e.g., Xc[11, 2]) and NA elsewhere. No initial values are specified for the third and fourth columns, since these columns contain the dummy variables corresponding to the categorical variable SMOKE and are calculated from a column of the matrix Xcat, which contains the original version of SMOKE:

head(mod4c$data_list$Xcat)
#>        SMOKE
#> 1.1        3
#> 10.1       1
#> 100.11     1
#> 101.1      3
#> 102.4      3
#> 103.15     1

The relevant part of the JAGS model is:

#> [...]
#>
#>   # ordinal model for SMOKE
#>   Xcat[i, 1] ~ dcat(p_SMOKE[i, 1:3])
#>
#> [...]
#>
#>   Xc[i, 3] <- ifelse(Xcat[i, 1] == 2, 1, 0)
#>   Xc[i, 4] <- ifelse(Xcat[i, 1] == 3, 1, 0)
#> [...]

Where Xcat in the data has missing values (e.g., rows 37 and 39), initial values need to be specified.

Elements that are completely unobserved, like the parameter vectors alpha and beta, the random effects b, or the scalar parameters delta_SMOKE and gamma_SMOKE, are entirely specified in the initial values.

## Other arguments

There are two more arguments in *_imp() that are passed directly to the rjags functions jags.model() or coda.samples():

• quiet: should messages generated during compilation be suppressed?
• progress.bar: selects the type of progress bar. Possible values are "text", "gui" and "none".
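For example (a hypothetical call), both the compilation messages and the progress bar can be suppressed with:

```r
# Sketch: suppress compilation messages and the progress bar
mod <- lm_imp(SBP ~ alc, data = NHANES, n.iter = 100,
              quiet = TRUE, progress.bar = 'none')
```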

1. Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.

2. See p. 195 of Lesaffre, E., & Lawson, A. B. (2012). Bayesian Biostatistics. John Wiley & Sons.