This vignette demonstrates those functions of the *sjstats* package that deal with mixed effects models. *sjstats* provides the following functions:

- `deff()` and `smpsize_lmm()`
- `converge_ok()` and `is_singular()`
- `p_value()`
- `scale_weights()`
- `get_re_var()` and `re_var()`
- `icc()`
- `r2()`

Before we start, we fit a simple linear mixed model:

```
library(sjstats)
library(lme4)
# load sample data
data(sleepstudy)
# fit linear mixed model
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
set.seed(2018)
sleepstudy$mygrp <- sample(1:45, size = 180, replace = TRUE)
m2 <- lmer(Reaction ~ Days + (1 | mygrp) + (1 | Subject), sleepstudy)
```

The first two functions, `deff()` and `smpsize_lmm()`, can be used to approximately calculate the sample size in the context of power calculation. Calculating the sample size for simple linear models is pretty straightforward; for (linear) mixed models, however, statistical power is affected by the change in the variance of the test statistics. This is what *Hsieh et al. (2003)* call a *design effect* (or variance inflation factor, VIF). Once this design effect is calculated, the sample size calculated for a standard design can be adjusted accordingly.

`deff()` computes this design effect for linear mixed models with a two-level design. It requires the approximate average number of observations per grouping cluster (i.e. level-2 unit) and the assumed intraclass correlation coefficient (ICC) of the multilevel model. Typically, the minimum assumed value for the ICC is *0.05*.

```
# Design effect for two-level model with 30 observations per
# cluster group (level-2 unit) and an assumed intraclass
# correlation coefficient of 0.05.
deff(n = 30)
#> [1] 2.45
# Design effect for two-level model with 24 observations per cluster
# group and an assumed intraclass correlation coefficient of 0.2.
deff(n = 24, icc = 0.2)
#> [1] 5.6
```
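
The design effect reported above follows the standard formula for two-level designs from *Hsieh et al. (2003)*, *1 + (n - 1) × ICC*, so the values are easy to verify by hand (`deff_manual()` is just an illustrative helper for this check, not part of *sjstats*):

```r
# design effect for a two-level design: 1 + (n - 1) * icc
deff_manual <- function(n, icc = 0.05) 1 + (n - 1) * icc

deff_manual(30)             # 1 + 29 * 0.05 = 2.45
deff_manual(24, icc = 0.2)  # 1 + 23 * 0.2  = 5.6
```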

`smpsize_lmm()` combines the power-calculation functions from the **pwr** package with the design effect computed by `deff()`. It computes an approximate sample size for linear mixed models (two-level designs), based on the power calculation for a standard design and adjusted for the design effect of the two-level design.

```
# Sample size for multilevel model with 30 cluster groups and a small to
# medium effect size (Cohen's d) of 0.3. 27 subjects per cluster and
# hence a total sample size of about 802 observations is needed.
smpsize_lmm(eff.size = .3, k = 30)
#> $`Subjects per Cluster`
#> [1] 27
#>
#> $`Total Sample Size`
#> [1] 802
# Sample size for multilevel model with 20 cluster groups and a medium
# to large effect size for linear models of 0.2. Five subjects per cluster and
# hence a total sample size of about 107 observations is needed.
smpsize_lmm(eff.size = .2, df.n = 5, k = 20, power = .9)
#> $`Subjects per Cluster`
#> [1] 5
#>
#> $`Total Sample Size`
#> [1] 107
```

There are more sophisticated ways to perform power calculations for multilevel models; however, most of these require very detailed knowledge about the sample characteristics and involve simulation studies. `smpsize_lmm()` is a more pragmatic alternative to these approaches.

Sometimes, when fitting mixed models, convergence warnings or warnings about singularity may come up (see details on these issues in this FAQ). These warnings sometimes arise due to strict thresholds and may be safely ignored, but sometimes they indicate overly complex models or models with a poorly defined random structure. `converge_ok()` and `is_singular()` may help to check whether such a warning is problematic or not.

`converge_ok()` provides an alternative convergence test for merMod objects (with a less strict threshold, as suggested by one of the *lme4* package authors), while `is_singular()` can be used in case of post-fitting convergence warnings, such as warnings about negative eigenvalues of the Hessian. Typically, you want `TRUE` for `converge_ok()` and non-singular models (i.e. `FALSE` for `is_singular()`).
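
For the sleepstudy model from above, such a check might look like the following sketch (the model is refitted here so the chunk is self-contained; the exact value printed by `converge_ok()` will depend on the fit):

```r
library(sjstats)
library(lme4)

data(sleepstudy)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# convergence check with a less strict threshold than lme4's default
converge_ok(m)

# singularity check; TRUE would indicate an overfitted random structure
is_singular(m)
```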

Regarding singular models, there may be some concerns that should be checked:

- singular fits correspond to overfitted models that may have poor power;
- chances of numerical problems and mis-convergence are higher for singular models (e.g. it may be computationally difficult to compute profile confidence intervals for such models);
- standard inferential procedures such as Wald statistics and likelihood ratio tests may be inappropriate.

Singular fits are likely to occur when the number of random-effect levels is small or for complex random-effects models, e.g. models with several different random-slope terms. There are several (partly contradictory) proposals for how to deal with singularity, although there is no consensus about the best approach:

- Start with the most complex model and drop terms until the model fit is non-singular and convergence is ok (see *Barr et al. 2013*).
- Define a parsimonious, simplified model *a priori* (see *Bates et al. 2015*, *Matuschek et al. 2017*).

See `?is_singular` and `?lme4::isSingular` for further details.

Most functions for fitting multilevel and mixed effects models only allow specifying frequency weights, but not design (i.e. *sampling* or *probability*) weights, which should be used when analyzing complex samples and survey data.

`scale_weights()` implements an algorithm proposed by *Asparouhov (2006)* and *Carle (2009)* to rescale design weights in survey data to account for the grouping structure of multilevel models; the rescaled weights can then be used for multilevel modelling.

To calculate a weight vector that can be used in multilevel models, `scale_weights()` needs the data frame with the survey data as its `x` argument. This data frame should contain 1) a *cluster ID* (argument `cluster.id`), which represents the *strata* of the survey data (the level-2 cluster variable), and 2) the probability weights (argument `pweight`), which represent the design or sampling weights of the survey data (the level-1 weights).

`scale_weights()` then returns the original data frame, including two new variables: `svywght_a`, where the sampling weights `pweight` are adjusted by a factor that represents the proportion of the cluster size divided by the sum of the sampling weights within each cluster; and `svywght_b`, where the adjustment factor is the sum of the sampling weights within each cluster divided by the sum of the squared sampling weights within each cluster (see *Carle (2009)*, Appendix B, for details).

```
data(nhanes_sample)
scale_weights(nhanes_sample, SDMVSTRA, WTINT2YR)
#> # A tibble: 2,992 x 9
#>    total   age RIAGENDR RIDRETH1 SDMVPSU SDMVSTRA WTINT2YR svywght_a svywght_b
#>    <dbl> <dbl>    <dbl>    <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>
#>  1     1  2.2         1        3       2       31   97594.     1.57      1.20
#>  2     7  2.08        2        3       1       29   39599.     0.623     0.525
#>  3     3  1.48        2        1       2       42   26620.     0.898     0.544
#>  4     4  1.32        2        4       2       33   34999.     0.708     0.550
#>  5     1  2           2        1       1       41   14746.     0.422     0.312
#>  6     6  2.2         2        4       1       38   28232.     0.688     0.516
#>  7   350  1.6         1        3       2       33   93162.     1.89      1.46
#>  8    NA  1.48        2        3       1       29   82276.     1.29      1.09
#>  9     3  2.28        2        4       1       41   24726.     0.707     0.523
#> 10    30  0.84        1        3       2       35   39895.     0.760     0.594
#> # ... with 2,982 more rows
```

For linear mixed models, the `summary()` method in **lme4** does not report p-values. Other objects from regression functions are not consistent in their output structure when reporting p-values. `p_value()` aims at returning a standardized ("tidy") output for any regression model. The return value is always a data frame with three columns: *term*, *p.value* and *std.error*, which represent the name, p-value and standard error of each term.

For linear mixed models, the approximation of p-values is more precise when `p.kr = TRUE`, based on conditional F-tests with the Kenward-Roger approximation for the degrees of freedom (calling `pbkrtest::get_Lb_ddf()`).

```
# Using the t-statistics for calculating the p-value
p_value(m2)
#>          term p.value std.error
#> 1 (Intercept)       0     9.933
#> 2        Days       0     0.789

# p-values based on conditional F-tests with
# Kenward-Roger approximation for the degrees of freedom
p_value(m2, p.kr = TRUE)
#>          term p.value std.error
#> 1 (Intercept)       0     9.945
#> 2        Days       0     0.796
```

To see more details on the degrees of freedom when using the Kenward-Roger approximation, use the `summary()` method:
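
A sketch of such a call (assuming a `summary()` method is defined for the return value of `p_value()`; the exact output depends on the *sjstats* version; the model is refitted here so the chunk is self-contained):

```r
library(sjstats)
library(lme4)

data(sleepstudy)
set.seed(2018)
sleepstudy$mygrp <- sample(1:45, size = 180, replace = TRUE)
m2 <- lmer(Reaction ~ Days + (1 | mygrp) + (1 | Subject), sleepstudy)

# print p-values together with the Kenward-Roger degrees of freedom
summary(p_value(m2, p.kr = TRUE))
```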

In mixed effects models, several random effect variances (depending on the model specification) can be calculated:

- `sigma_2`: within-group (residual) variance
- `tau.00`: between-group variance (variation between individual intercepts and the average intercept)
- `tau.11`: random-slope variance (variation between individual slopes and the average slope)
- `tau.01`: random-intercept-slope covariance
- `rho.01`: random-intercept-slope correlation

You can access one of these values with `get_re_var()`, or all of them with `re_var()`:

```
# get residual variance
get_re_var(m, "sigma_2")
#> [1] 654.9408
# get all random effect variances
re_var(m)
#> Within-group-variance: 654.941
#> Between-group-variance: 611.898 (Subject)
#> Random-slope-variance: 35.081 (Subject.Days)
#> Slope-Intercept-covariance: 9.614 (Subject.(Intercept))
#> Slope-Intercept-correlation: 0.066 (Subject)
```

Further variance components can be calculated using `adjusted = TRUE`. In this case, `re_var()` returns the variance attributable to the fixed effects, the variance of the random effects, the variance due to additive dispersion, and the distribution-specific variance. The residual variance is the sum of the dispersion and the distribution-specific variance (see *Nakagawa et al. (2017)* and *Johnson et al. (2014)*):

```
# get all variance components of mixed models
re_var(m, adjusted = TRUE)
#>
#> Variance Components of Mixed Models
#>
#> Family : gaussian (identity)
#> Formula: list(conditional = Reaction ~ Days, random = ~Days | Subject)
#>
#> fixed: 908.953
#> random: 1698.233
#> residual: 654.941
#> dispersion: 0.000
#> distribution: 654.941
```

*Nakagawa et al. (2017)* proposed a method to compute marginal and conditional r-squared values, which is implemented in the `r2()` function. For mixed models, the marginal r-squared considers only the variance of the fixed effects, while the conditional r-squared takes both the fixed and random effects into account. `r2()` can be used with models fitted with the functions of the **lme4** and **glmmTMB** packages.
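
Using the variance components reported by `re_var(m, adjusted = TRUE)` above (fixed: 908.953, random: 1698.233, residual: 654.941), the two r-squared values can be reproduced with plain arithmetic. This is a sketch of the Nakagawa et al. approach, not a replacement for calling `r2(m)`:

```r
var_fixed    <- 908.953   # variance attributable to the fixed effects
var_random   <- 1698.233  # variance of the random effects
var_residual <- 654.941   # residual variance
var_total    <- var_fixed + var_random + var_residual

# marginal R2: variance explained by the fixed effects alone
var_fixed / var_total                 # ~0.279
# conditional R2: variance explained by fixed and random effects together
(var_fixed + var_random) / var_total  # ~0.799
```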

The components of the random effect variances are of interest when calculating the intraclass-correlation coefficient, ICC. The ICC is calculated by dividing the between-group-variance (random intercept variance) by the total variance (i.e. sum of between-group-variance and within-group (residual) variance). The ICC can be interpreted as “the proportion of the variance explained by the grouping structure in the population” (Hox 2002: 15).
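
With the variances shown by `re_var(m)` above (between-group: 611.898, within-group: 654.941), this amounts to simple arithmetic:

```r
tau_00  <- 611.898  # between-group variance (Subject)
sigma_2 <- 654.941  # within-group (residual) variance

# ICC = between-group variance / total variance
tau_00 / (tau_00 + sigma_2)  # ~0.483
```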

Usually, the ICC is calculated for the null model ("unconditional model"). However, according to *Raudenbush and Bryk (2002)* or *Rabe-Hesketh and Skrondal (2012)*, it is also feasible to compute the ICC for full models with covariates ("conditional models") and compare how much of the variation in the grouping structure (random intercept) is explained by a level-2 variable.

The ICC for mixed models can be computed with `icc()`. *Caution:* for random-slope-intercept models, the ICC differs at each unit of the predictors, so the ICC for these kinds of models cannot be understood simply as a proportion of variance (see *Goldstein et al. 2010*). For convenience, as the `icc()` function is also used to extract the different random effect variances (see `re_var()` above), the ICC for random-slope-intercept models is reported nonetheless, but it is usually not a meaningful summary of the proportion of variances. To get a meaningful ICC also for models with random slopes, use `adjusted = TRUE`. The adjusted ICC uses the mean random effect variance, which is based on the random effect variances for each value of the random slope (see *Johnson 2014*).

By default, for three-level models (depending on the nested structure of the model) or for cross-classified models, `icc()` only reports the proportion of variance explained for each grouping level. Use `adjusted = TRUE` to calculate the adjusted and conditional ICC, which take all random effect variances into account.

```
icc(m)
#> Caution! ICC for random-slope-intercept models usually not meaningful. Use `adjusted = TRUE` to use the mean random effect variance to calculate the ICC. See 'Note' in `?icc`.
#>
#> Intraclass Correlation Coefficient for Linear mixed model
#>
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (Days | Subject)
#>
#> ICC (Subject): 0.4830
icc(m2)
#>
#> Intraclass Correlation Coefficient for Linear mixed model
#>
#> Family : gaussian (identity)
#> Formula: Reaction ~ Days + (1 | mygrp) + (1 | Subject)
#>
#> ICC (mygrp): 0.0383
#> ICC (Subject): 0.5981
```

If `adjusted = TRUE`, an adjusted and a conditional ICC are calculated, which take all sources of uncertainty (of all random effects) into account to report an "adjusted" ICC, as well as the conditional ICC. The latter also takes the fixed effects variances into account (see *Nakagawa et al. 2017*). If random effects are not nested and not cross-classified, the adjusted (`adjusted = TRUE`) and unadjusted (`adjusted = FALSE`) ICC are identical.

```
icc(m, adjusted = TRUE)
#>
#> Intraclass Correlation Coefficient for Generalized Linear Mixed Model
#>
#> Family : gaussian (identity)
#> Formula: ~Days | Subject Reaction ~ Days NA
#>
#> Adjusted ICC: 0.7217
#> Conditional ICC: 0.5206
icc(m2, adjusted = TRUE)
#>
#> Intraclass Correlation Coefficient for Generalized Linear Mixed Model
#>
#> Family : gaussian (identity)
#> Formula: list(~1 | mygrp, ~1 | Subject) Reaction ~ Days NA
#>
#> Adjusted ICC: 0.6363
#> Conditional ICC: 0.4609
```

Asparouhov T. 2006. *General Multi-Level Modeling with Sampling Weights.* Communications in Statistics—Theory and Methods (35): 439–460

Barr DJ, Levy R, Scheepers C, Tily HJ. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3):255–278

Bates D, Kliegl R, Vasishth S, Baayen H. 2015. Parsimonious Mixed Models. arXiv:1506.04967

Carle AC. 2009. *Fitting multilevel models in complex survey data with design weights: Recommendations.* BMC Medical Research Methodology 9(49): 1-13

Goldstein H, Browne W, Rasbash J. 2010. Partitioning Variation in Multilevel Models. Understanding Statistics, 1:4, 223-231, doi: 10.1207/S15328031US0104_02

Hox J. 2002. *Multilevel analysis: techniques and applications.* Mahwah, NJ: Erlbaum

Hsieh FY, Lavori PW, Cohen HJ, Feussner JR. 2003. *An Overview of Variance Inflation Factors for Sample-Size Calculation.* Evaluation & the Health Professions 26: 239–257. doi: 10.1177/0163278703255230

Johnson PC, O’Hara RB. 2014. Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods Ecol Evol, 5: 944-946. doi: 10.1111/2041-210X.12225

Matuschek H, Kliegl R, Vasishth S, Baayen H, Bates D. 2017. Balancing type I error and power in linear mixed models. Journal of Memory and Language, 94:305–315

Nakagawa S, Johnson P, Schielzeth H. 2017. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14. doi: 10.1098/rsif.2017.0213

Rabe-Hesketh S, Skrondal A. 2012. *Multilevel and longitudinal modeling using Stata.* 3rd ed. College Station, Tex: Stata Press Publication

Raudenbush SW, Bryk AS. 2002. *Hierarchical linear models: applications and data analysis methods.* 2nd ed. Thousand Oaks: Sage Publications