Using merTools to Marginalize Over Random Effect Levels

Jared Knowles and Carl Frederick

2019-05-12

Marginalizing Random Effects

One of the most common questions about multilevel models is how much influence grouping terms have on the outcome. One way to explore this is to simulate the predicted values of an observation across the distribution of random effects for a specific grouping variable and term. This can be described as “marginalizing” predictions over the distribution of random effects. This allows you to explore the influence of the grouping term and grouping levels on the outcome scale by simulating predictions for simulated values of each observation across the distribution of effect sizes.

The REmargins() function allows you to do this. Here, we take the example sleepstudy model and marginalize predictions for all of the random effect terms (Subject:Intercept, Subject:Days). By default, the function will marginalize over the quartiles of the expected rank (see expected rank vignette) of the effect distribution for each term.

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
mfx <- REmargins(merMod = fm1, newdata = sleepstudy[1:10,])
head(mfx)
#>   Reaction Days Subject case grouping_var      term breaks
#> 1   249.56    0     309    1      Subject Intercept      1
#> 2   249.56    0     334    1      Subject      Days      1
#> 3   249.56    0     350    1      Subject Intercept      2
#> 4   249.56    0     330    1      Subject      Days      2
#> 5   249.56    0     308    1      Subject Intercept      3
#> 6   249.56    0     332    1      Subject      Days      3
#>   original_group_level fit_combined upr_combined lwr_combined fit_Subject
#> 1                  308     211.6022     252.5443     177.4033  -40.220047
#> 2                  308     244.4658     283.3303     207.5940   -7.471686
#> 3                  308     237.1407     277.0037     201.7126  -15.712567
#> 4                  308     276.0716     311.4830     239.3424   22.626852
#> 5                  308     256.3923     292.7700     216.6426    3.815614
#> 6                  308     262.6514     295.5489     224.5716   10.306149
#>   upr_Subject lwr_Subject fit_fixed upr_fixed lwr_fixed
#> 1   -4.177672   -74.44036  249.3960  283.5701  215.7122
#> 2   29.913604   -45.58652  251.8809  286.7075  218.0804
#> 3   21.558164   -52.58777  250.4633  288.5282  216.1605
#> 4   59.942004   -14.46200  249.9996  285.5817  216.0925
#> 5   40.481303   -34.47880  252.2277  285.3248  217.0583
#> 6   48.049021   -28.16187  252.2168  286.4449  218.3318

The new data frame output from REmargins contains a lot of information. The first few columns contain the original data passed to newdata. Each observation in newdata is identified by a case number, because the function repeats each observation by the number of random effect terms and number of breaks to simulate each term over. Then the grouping_var

Summarizing

Plotting

Finally - you can plot the results marginalization to evaluate the effect of the random effect terms graphically.

ggplot(mfx) + aes(x = breaks, y = fit_Subject, group = case) +
  geom_line() +
  facet_wrap(~term)