This vignette shows how you can further investigate your models using Monte Carlo simulations. In addition to reporting power, simulations allows you to investigate parameter bias, model misspecification, Type I errors and convergence issues. For instance, even if the analytical power calculations work well, it can be useful to check the small sample properties of the model, e.g. estimates of cluster-level random effects with only a few clusters.

The simulation-function accepts the same study setup-object that we’ve used in the two- and three-level power vignettes. The simulation-function supports all models that are available in **powerlmm**. As an example we’ll investigate the partially nested three-level model.

When doing simulations it can be a good idea to specify the model using raw values, to ensure all parameters are reasonable. Here I use estimates from a real clinical trial.

```
p <- study_parameters(n1 = 11,
n2 = 10,
n3 = 6,
fixed_intercept = 37,
fixed_slope = -0.64,
sigma_subject_intercept = 2.8,
sigma_subject_slope = 0.4,
sigma_cluster_intercept = 0,
cor_subject = -0.2,
icc_slope = 0.05,
sigma_error = 2.6,
dropout = dropout_weibull(proportion = 0.3,
rate = 1/2),
partially_nested = TRUE,
effect_size = cohend(-0.8,
standardizer = "pretest_SD"))
p
```

The `get_monte_carlo_se`

-function accepts an power object, and will return the 95 % interval for a given amount of simulations. This is useful to gauge the precision of the empirical power estimates.

```
x <- get_power(p)
get_monte_carlo_se(x, nsim = c(1000, 2000, 5000))
```

We would need about 5000 simulation if we want our empirical power to be +/- 1% from the analytical power.

Now, let’s run the simulation. First we need to write the **lme4**-formula, since the simulated data sets are analyzed using `lme4::lmer`

. Then we pass the `study_parameters`

-object to the `simulate`

function, and when the simulation is finished we use `summary`

to view the results. You can run the simulation in parallel by settings `cores`

> 1. The parallel computations will be done using the **parallel** package. If performed on a Windows machine *or* inside a GUI, then PSOCK clusters are used, if on Unix/macOS and the simulation is run non-interactively in a terminal then forking is used.

```
f <- "y ~ treatment * time + (1 + time | subject) + (0 + treatment:time | cluster)"
cores <- parallel::detectCores() # use all cores
res <- simulate(object = p,
nsim = 5000,
formula = f,
satterthwaite = TRUE,
cores = cores,
save = FALSE)
summary(res)
```

It is also possible to automatically create the `formula`

by leaving it blank. Then the `lmer-formula`

is created by including the parameters that are not specified as `NA`

or `NULL`

in the model, see `?create_lmer_formula`

. N.B., parameters specified as 0 are included in the model.

Let’s compare the correct partially nested model to a two-level linear mixed-effects model. The formula argument accepts a named list with a `correct`

and `wrong`

formula. By setting `effect_size`

to 0 the simulation results tells us the empirical Type I errors for the correct and misspecified model.

```
p <- update(p, effect_size = 0)
f <- list("correct" = "y ~ treatment * time + (1 + time | subject) + (0 + treatment:time | cluster)",
"wrong" = "y ~ treatment * time + (1 + time | subject)")
res <- simulate(object = p,
nsim = 5000,
formula = f,
satterthwaite = TRUE,
cores = cores,
save = FALSE)
summary(res)
```

All of the parameters are summarized and presented for both models. Since we specified `satterthwaite = TRUE`

, empirical power is both based on the Wald Z test and the Wald *t* test using Satterthwaite’s *df* approximation (from the **lmerTest** package).

We can also simulate multiple designs and compare them. Let’s compare 6 vs 12 clusters in the treatment arm, and `ìcc_slope`

of 0.05 and 0.1, and a Cohen’s d of 0.5 and 0.8.

```
p <- study_parameters(n1 = 11,
n2 = 10,
n3 = c(6, 12),
fixed_intercept = 37,
fixed_slope = -0.64,
sigma_subject_intercept = 2.8,
sigma_subject_slope = 0.4,
sigma_cluster_intercept = 0,
cor_subject = -0.2,
icc_slope = c(0.05, 0.1),
sigma_error = 2.6,
dropout = dropout_weibull(proportion = 0.3,
rate = 1/2),
partially_nested = TRUE,
effect_size = cohend(c(0.5, 0.8),
standardizer = "pretest_SD"))
f <- "y ~ treatment * time + (1 + time | subject) + (0 + treatment:time | cluster)"
res <- simulate(object = p,
nsim = 5000,
formula = f,
satterthwaite = TRUE,
cores = cores,
save = FALSE)
```

The setup and simulation is the same, except that we define vectors of values for some of the parameters. In this scenario we have a total of 2 * 2 * 2 = 8 different combination of parameter values. For each of the 8 setups `nsim`

number of simulations will be performed. With multiple designs `summary`

works a bit differently. Instead of summarizing all parameters, we now have to choose which parameters to summarize.

```
# Summarize the 'time:treatment' results
summary(res, para = "time:treatment", type = "fixed", model = "correct")
# Summarize the cluster-level random slope
summary(res, para = "cluster_slope", type = "random", model = "correct")
```