# Statistics and Measures for Mixed Effects Models

This vignettes demontrates those functions of the sjstats-package that deal especially with mixed effects models. sjstats provides following functions:

• deff() and smpsize_lmm()
• converge_ok() and is_singular()
• p_value()
• scale_weights()
• get_re_var() and re_var()
• icc()

Befor we start, we fit a simple linear mixed model:

library(sjstats)
library(lme4)
data(sleepstudy)

# fit linear mixed model
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## Sample Size Calculation for Mixed Models

The first two functions, deff() and smpsize_lmm(), can be used to approximately calculate the sample size in the context of power calculation. Calculating the sample size for simple linear models is pretty straightforward, however, for (linear) mixed models, statistical power is affected through the change of the variance of test statistics. This is what Hsieh et al. (2003) call a design effect (or variance inflation factor, VIF). Once this design effect is calculated, the sample size calculated for a standard design can be adjusted accordingly.

### Design Effect for Two-Level Mixed Models

deff() computes this design effect for linear mixed models with two-level design. It requires the approximated average number of observations per grouping cluster (i.e. level-2 unit) and the assumed intraclass correlation coefficient (ICC) for the multilevel-model. Typically, the minimum assumed value for the ICC is 0.05.

# Design effect for two-level model with 30 observations per
# cluster group (level-2 unit) and an assumed intraclass
# correlation coefficient of 0.05.
deff(n = 30)
#> [1] 2.45

# Design effect for two-level model with 24 observation per cluster
# group and an assumed intraclass correlation coefficient of 0.2.
deff(n = 24, icc = 0.2)
#> [1] 5.6

### Calculating the Sample Size for Linear Mixed Models

smpsize_lmm() combines the functions for power calculation from the pwr-package and design effect deff(). It computes an approximated sample size for linear mixed models (two-level-designs), based on power-calculation for standard design and adjusted for design effect for 2-level-designs.

# Sample size for multilevel model with 30 cluster groups and a small to
# medium effect size (Cohen's d) of 0.3. 27 subjects per cluster and
# hence a total sample size of about 802 observations is needed.
smpsize_lmm(eff.size = .3, k = 30)
#> $Subjects per Cluster #> [1] 27 #> #>$Total Sample Size
#> [1] 802

# Sample size for multilevel model with 20 cluster groups and a medium
# to large effect size for linear models of 0.2. Five subjects per cluster and
# hence a total sample size of about 107 observations is needed.
smpsize_lmm(eff.size = .2, df.n = 5, k = 20, power = .9)
#> $Subjects per Cluster #> [1] 5 #> #>$Total Sample Size
#> [1] 107

There are more ways to perform power calculations for multilevel models, however, most of these require very detailed knowledge about the sample characteristics and performing simulation studys. smpsize_lmm() is a more pragmatic alternative to these approaches.

## Trouble Shooting

Sometimes, when fitting mixed models, covergence warnings or warnings about singularity may come up (see details on these issues in this FAQ). These warnings may arise due to the strict tresholds and / or may be safely ignored. converge_ok() and is_singular() may help to check whether such a warning is problematic or not.

converge_ok() provides an alternative convergence test for merMod-objects (with a less strict treshold, as suggested by one of the lme4-package authors), while is_singular() can be used in case of post-fitting convergence warnings, such as warnings about negative eigenvalues of the Hessian. In both cases, if the function returns TRUE, these warnings can most likely be ignored.

converge_ok(m)
#> 1.79669205387819e-07
#>                 TRUE

is_singular(m)
#> [1] FALSE

## Rescale model weights for complex samples

Most functions to fit multilevel and mixed effects models only allow to specify frequency weights, but not design (i.e. sampling or probability) weights, which should be used when analyzing complex samples and survey data.

scale_weights() implements an algorithm proposed by Aaparouhov (2006) and Carle (2009) to rescale design weights in survey data to account for the grouping structure of multilevel models, which then can be used for multilevel modelling.

To calculate a weight-vector that can be used in multilevel models, scale_weights() needs that data frame with survey data as x-argument. This data frame should contain 1) a cluster ID (argument cluster.id), which represents the strata of the survey data (the level-2-cluster variable) and 2) the probability weights (argument pweight), which represents the design or sampling weights of the survey data (level-1-weight).

scale_weights() then returns the original data frame, including two new variables: svywght_a, where the sample weights pweight are adjusted by a factor that represents the proportion of cluster size divided by the sum of sampling weights within each cluster. The adjustment factor for svywght_b is the sum of sample weights within each cluster devided by the sum of squared sample weights within each cluster (see Carle (2009), Appendix B, for details).

data(nhanes_sample)
scale_weights(nhanes_sample, SDMVSTRA, WTINT2YR)
#> # A tibble: 2,992 x 9
#>     total   age RIAGENDR RIDRETH1 SDMVPSU SDMVSTRA WTINT2YR svywght_a svywght_b
#>     <dbl> <dbl>    <dbl>    <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>
#>  1   1.00 2.20      1.00     3.00    2.00     31.0    97594     1.57      1.20
#>  2   7.00 2.08      2.00     3.00    1.00     29.0    39599     0.623     0.525
#>  3   3.00 1.48      2.00     1.00    2.00     42.0    26620     0.898     0.544
#>  4   4.00 1.32      2.00     4.00    2.00     33.0    34999     0.708     0.550
#>  5   1.00 2.00      2.00     1.00    1.00     41.0    14746     0.422     0.312
#>  6   6.00 2.20      2.00     4.00    1.00     38.0    28232     0.688     0.516
#>  7 350    1.60      1.00     3.00    2.00     33.0    93162     1.89      1.46
#>  8  NA    1.48      2.00     3.00    1.00     29.0    82276     1.29      1.09
#>  9   3.00 2.28      2.00     4.00    1.00     41.0    24726     0.707     0.523
#> 10  30.0  0.840     1.00     3.00    2.00     35.0    39895     0.760     0.594
#> # ... with 2,982 more rows

## P-Values

For linear mixed models, the summary() in lme4 does not report p-values. Other objects from regression functions are not consistent in their output structure when reporting p-values. p_value() aims at returning a standardized (“tidy”) output for any regression model. The return value is always a data frame (a tibble) with three columns: term, p.value and std.error, which represent the name, p-value and standard error for each term.

For linear mixed models, the approximation of p-values are more precise when p.kr = TRUE, based on conditional F-tests with Kenward-Roger approximation for the degrees of freedom (calling pbkrtest::get_Lb_ddf()).

# Using the t-statistics for calculating the p-value
p_value(m)
#> # A tibble: 2 x 3
#>   term                                                                                                                                                                                                                                                                                                              p.value std.error
#>   <chr>                                                                                                                                                                                                                                                                                                               <dbl>     <dbl>
#> 1 (Intercept) 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000450      6.82
#> 2 Days        0.0000000000127                                                                                                                                                                                                                                                                                                    1.55

# p-values based on conditional F-tests with
# Kenward-Roger approximation for the degrees of freedom
p_value(m, p.kr = TRUE)
#> # A tibble: 2 x 3
#>   term                      p.value std.error
#>   <chr>                       <dbl>     <dbl>
#> 1 (Intercept) 0.0000000000000000117      6.82
#> 2 Days        0.00000326                 1.55

## Random Effect Variances

In mixed effects models, several random effect variances (depending on the model specification) are calculated:

• sigma_2: Within-group (residual) variance
• tau.00: Between-group-variance (variation between individual intercepts and average intercept)
• tau.11: Random-slope-variance (variation between individual slopes and average slope)
• tau.01: Random-Intercept-Slope-covariance
• rho.01: Random-Intercept-Slope-correlation

You can access on of these values with get_re_var(), or all of them with re_var():

# get residual variance
get_re_var(m, "sigma_2")
#> [1] 654.941

# get all random effect variances
re_var(m)
#>       Within-group-variance:  654.941
#>      Between-group-variance:  612.090 (Subject)
#>       Random-slope-variance:   35.072 (Subject.Days)
#>  Slope-Intercept-covariance:    9.604 (Subject)
#> Slope-Intercept-correlation:    0.066 (Subject)

## Intraclass-Correlation Coefficient

The components of the random effect variances are of interest when calculating the intraclass-correlation coefficient, ICC. The ICC is calculated by dividing the between-group-variance (random intercept variance) by the total variance (i.e. sum of between-group-variance and within-group (residual) variance). The ICC can be interpreted as “the proportion of the variance explained by the grouping structure in the population” (Hox 2002: 15).

Usually, the ICC is calculated for the null model (“unconditional model”). However, according to Raudenbush and Bryk (2002) or Rabe-Hesketh and Skrondal (2012) it is also feasible to compute the ICC for full models with covariates (“conditional models”) and compare how much a level-2 variable explains the portion of variation in the grouping structure (random intercept).

The ICC for mixed models can be computed with icc():

icc(m)
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#>
#>   ICC (Subject): 0.483090

You can also compute ICCs for multiple models at once:

# fit 2nd model
sleepstudy\$mygrp <- sample(1:45, size = 180, replace = TRUE)
m2 <- lmer(Reaction ~ Days + (1 | mygrp) + (Days | Subject), sleepstudy)

# calculate ICC for both models
icc(m, m2)
#> [[1]]
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#>
#>   ICC (Subject): 0.483090
#>
#> [[2]]
#>
#> Linear mixed model
#>  Family: gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (Days | Subject)
#>
#>     ICC (mygrp): 0.000000
#>   ICC (Subject): 0.483090

# References

Aaparouhov T. 2006. General Multi-Level Modeling with Sampling Weights. Communications in Statistics—Theory and Methods (35): 439–460

Carle AC. 2009. Fitting multilevel models in complex survey data with design weights: Recommendations. BMC Medical Research Methodology 9(49): 1-13

Hox J. 2002. Multilevel analysis: techniques and applications. Mahwah, NJ: Erlbaum

Hsieh FY, Lavori PW, Cohen HJ, Feussner JR. 2003. An Overview of Variance Inflation Factors for Sample-Size Calculation. Evaluation & the Health Professions 26: 239–257. doi: 10.1177/0163278703255230

Rabe-Hesketh S, Skrondal A. 2012. Multilevel and longitudinal modeling using Stata. 3rd ed. College Station, Tex: Stata Press Publication

Raudenbush SW, Bryk AS. 2002. Hierarchical linear models: applications and data analysis methods. 2nd ed. Thousand Oaks: Sage Publications