Technical Details: Difference between ggpredict() and ggemmeans()

Daniel Lüdecke

2019-09-03

ggpredict() and ggemmeans() compute predicted values for all possible levels or values of a model's predictors. In essence, ggpredict() wraps the predict() method for the respective model, while ggemmeans() wraps the emmeans() function from the emmeans package. Both functions prepare the data so that it can be passed to the newdata argument of predict() or to the at argument of emmeans(), respectively. If you have not yet read the general introduction, it is recommended to do so first.
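To make the idea of "wrapping predict() with a prepared newdata" concrete, here is a minimal base-R sketch. It is only an illustration of the general pattern, not the actual ggeffects implementation. It assumes the built-in mtcars data instead of efc, and the names fit_demo, grid and result are hypothetical.

```r
# Illustrative sketch using the built-in mtcars data (not the efc data).
fit_demo <- lm(mpg ~ hp + wt, data = mtcars)

# Data grid: vary the focal term hp, hold wt constant at its mean.
grid <- data.frame(
  hp = seq(50, 300, by = 50),
  wt = mean(mtcars$wt)
)

# predict() evaluates the model at each grid row; se.fit = TRUE also
# returns the standard errors of the predictions.
pr <- predict(fit_demo, newdata = grid, se.fit = TRUE)

result <- data.frame(
  x = grid$hp,
  predicted = pr$fit,
  std.error = pr$se.fit
)
print(result)
```

With emmeans, the same grid values would be supplied as a named list through the at argument rather than as a data frame.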

For models without categorical predictors, the results from ggpredict() and ggemmeans() are identical, apart from slight and negligible differences in the associated confidence intervals.

library(ggeffects)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    75.072     1.077   72.962    77.183
#>   20    70.155     0.895   68.400    71.909
#>   45    64.008     0.818   62.405    65.610
#>   65    59.090     0.902   57.323    60.857
#>   85    54.172     1.087   52.042    56.302
#>  105    49.255     1.331   46.645    51.864
#>  125    44.337     1.609   41.184    47.490
#>  170    33.272     2.289   28.787    37.758
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

ggemmeans(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted conf.low conf.high
#>    0    75.072   72.959    77.186
#>   20    70.155   68.398    71.912
#>   45    64.008   62.403    65.612
#>   65    59.090   57.320    60.860
#>   85    54.172   52.039    56.305
#>  105    49.255   46.641    51.868
#>  125    44.337   41.180    47.494
#>  170    33.272   28.780    37.764
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83
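The small differences in the confidence limits printed above are consistent with one interval being built from a normal quantile and the other from a t quantile. This is an assumption inferred from the printed numbers (for example, 75.072 - 1.96 * 1.077 is roughly 72.96), not something stated by either package here. The base-R sketch below, using the built-in mtcars data and the hypothetical name fit_demo, shows how the two choices differ.

```r
# Illustrative sketch with the built-in mtcars data (not the efc data).
fit_demo <- lm(mpg ~ hp + wt, data = mtcars)
grid <- data.frame(hp = 150, wt = mean(mtcars$wt))

# Prediction, its standard error, and the residual degrees of freedom.
pr <- predict(fit_demo, newdata = grid, se.fit = TRUE)

# Critical values: normal quantile vs. t quantile with residual df.
z_crit <- qnorm(0.975)
t_crit <- qt(0.975, df = pr$df)

# Two 95% intervals around the same predicted value.
ci_z <- pr$fit + c(-1, 1) * z_crit * pr$se.fit
ci_t <- pr$fit + c(-1, 1) * t_crit * pr$se.fit

print(rbind(normal = ci_z, t = ci_t))
```

With many residual degrees of freedom the two critical values nearly coincide, which would explain why the intervals differ only in the second or third decimal place.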

As the "Adjusted for" note in both outputs shows, the continuous predictor neg_c_7 is held constant at its mean value, 11.83. For categorical predictors, ggpredict() and ggemmeans() behave differently. ggpredict() holds each categorical predictor constant at its reference level, whereas ggemmeans(), like ggeffect(), averages over the proportions of the categories of factors.

library(sjmisc)
data(efc)
efc$e42dep <- to_label(efc$e42dep)
fit <- lm(barthtot ~ c12hour + neg_c_7 + e42dep, data = efc)

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    92.745     2.173   88.485    97.004
#>   20    91.317     2.169   87.067    95.567
#>   45    89.532     2.208   85.206    93.859
#>   65    88.105     2.274   83.649    92.561
#>   85    86.677     2.368   82.037    91.318
#>  105    85.250     2.486   80.376    90.123
#>  125    83.822     2.627   78.674    88.970
#>  170    80.610     3.005   74.721    86.499
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> *  e42dep = independent

ggemmeans(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted conf.low conf.high
#>    0    73.515   71.853    75.176
#>   20    72.087   70.646    73.528
#>   45    70.302   68.894    71.711
#>   65    68.875   67.287    70.462
#>   85    67.447   65.550    69.344
#>  105    66.019   63.735    68.304
#>  125    64.592   61.875    67.309
#>  170    61.380   57.608    65.152
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83
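Note that the two outputs above differ by a constant 19.23 at every value of c12hour: in a linear model without interactions, changing how the factor is held constant only shifts the predictions. The base-R sketch below illustrates the two strategies described in the text, holding a factor at its reference level versus averaging predictions over its levels weighted by their observed proportions. It is a simplified illustration under stated assumptions, not the code either package runs: it uses the built-in mtcars data, cyl_f is a hypothetical helper column, and the weighting that emmeans itself applies depends on its weights argument.

```r
# Illustrative sketch with built-in mtcars data; cyl_f is a hypothetical
# helper column created for this example.
dat <- mtcars
dat$cyl_f <- factor(dat$cyl)
fit_demo <- lm(mpg ~ hp + cyl_f, data = dat)

hp_values <- c(100, 200)

# (1) Hold the factor at its reference level (first level, here "4").
grid_ref <- data.frame(
  hp = hp_values,
  cyl_f = factor(levels(dat$cyl_f)[1], levels = levels(dat$cyl_f))
)
pred_ref <- predict(fit_demo, newdata = grid_ref)

# (2) Predict at every factor level, then average per hp value,
# weighting each level by its observed proportion in the data.
grid_all <- expand.grid(hp = hp_values, cyl_f = levels(dat$cyl_f))
grid_all$cyl_f <- factor(grid_all$cyl_f, levels = levels(dat$cyl_f))
grid_all$pred <- predict(fit_demo, newdata = grid_all)
props <- prop.table(table(dat$cyl_f))
grid_all$w <- as.numeric(props[as.character(grid_all$cyl_f)])
pred_avg <- sapply(split(grid_all, grid_all$hp),
                   function(d) sum(d$pred * d$w))

print(cbind(hp = hp_values, reference = pred_ref, averaged = pred_avg))
```

As in the efc example, the gap between the two columns is the same at each focal value, because the factor enters the model additively.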

In this case, ggpredict() and ggemmeans() return the same predicted values again if the condition argument is used to define the specific level at which a variable, here the factor e42dep, should be held constant.

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    92.745     2.173   88.485    97.004
#>   20    91.317     2.169   87.067    95.567
#>   45    89.532     2.208   85.206    93.859
#>   65    88.105     2.274   83.649    92.561
#>   85    86.677     2.368   82.037    91.318
#>  105    85.250     2.486   80.376    90.123
#>  125    83.822     2.627   78.674    88.970
#>  170    80.610     3.005   74.721    86.499
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> *  e42dep = independent

ggemmeans(fit, terms = "c12hour", condition = c(e42dep = "independent"))
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted conf.low conf.high
#>    0    92.745   88.479    97.010
#>   20    91.317   87.061    95.573
#>   45    89.532   85.199    93.865
#>   65    88.105   83.642    92.567
#>   85    86.677   82.030    91.324
#>  105    85.250   80.370    90.130
#>  125    83.822   78.667    88.977
#>  170    80.610   74.712    86.507
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

Creating plots is as simple as described in the vignette Plotting Marginal Effects.

ggemmeans(fit, terms = c("c12hour", "e42dep")) %>% plot()