Effect Size Statistics for Anova Tables

This vignette demonstrates the functions of the sjstats package that deal with Anova tables. These functions report different effect size measures, which are useful beyond significance tests (p-values) because they estimate the magnitude of effects independently of sample size. sjstats provides the following functions:

• eta_sq()
• omega_sq()
• epsilon_sq()
• cohens_f()
• anova_stats()

Before we start, we fit a simple model:

library(sjstats)
# load sample data
data(efc)

# fit linear model
fit <- aov(
  c12hour ~ as.factor(e42dep) + as.factor(c172code) + c160age,
  data = efc
)

All functions accept objects of class aov or anova, so you can also use model fits from the car package, which allows fitting Anovas with different types of sums of squares. Other objects, like lm, will be coerced to anova internally.

The following functions return the effect size statistics as a data frame with one row per model term, labelled with the model's term names.

Eta-Squared

Eta-squared is the proportion of the total variability in the dependent variable that is accounted for by the variation in the independent variable. It is the ratio of the sum of squares for each effect to the total sum of squares, and it can be interpreted as the percentage of variance accounted for by a variable.

For variables with 1 degree of freedom (in the numerator), the square root of eta-squared is equal to the correlation coefficient r. For variables with more than 1 degree of freedom, eta-squared equals R-squared. This makes eta-squared easily interpretable. Furthermore, these effect sizes can easily be converted into effect size measures that can, for instance, be further processed in meta-analyses.
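The relationship to r and R-squared can be checked with base R alone. The following is a minimal sketch that assumes models with a single term and uses the built-in cars and PlantGrowth data sets rather than the efc data; all object names are illustrative.

```r
# One numeric predictor (1 degree of freedom): sqrt(eta-squared) vs. |r|
ss_speed <- anova(lm(dist ~ speed, data = cars))[["Sum Sq"]]
eta2_speed <- ss_speed[1] / sum(ss_speed)
r <- cor(cars$speed, cars$dist)
all.equal(sqrt(eta2_speed), abs(r))

# One factor with three levels (2 degrees of freedom): eta-squared vs. R-squared
m_group <- lm(weight ~ group, data = PlantGrowth)
ss_group <- anova(m_group)[["Sum Sq"]]
eta2_group <- ss_group[1] / sum(ss_group)
r2_group <- summary(m_group)$r.squared
all.equal(eta2_group, r2_group)
```

Both comparisons hold exactly only because each model contains just one term; with several terms, each term receives its own share of the total sum of squares.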

Eta-squared can be computed simply with:

eta_sq(fit)
#>                  term etasq
#> 1   as.factor(e42dep) 0.266
#> 2 as.factor(c172code) 0.005
#> 3             c160age 0.048
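As a check on the definition, these values can be reproduced by hand. The sketch below hardcodes the sums of squares that anova_stats(fit) reports for this model in the last section of this vignette; the variable names are illustrative and not part of sjstats.

```r
# Sums of squares, copied from the sumsq column of anova_stats(fit)
ss_effect <- c(
  "as.factor(e42dep)"   = 577756.33,
  "as.factor(c172code)" = 11722.05,
  "c160age"             = 105169.60
)
ss_resid <- 1476436.34

# eta-squared: effect sum of squares divided by the total sum of squares
ss_total <- sum(ss_effect) + ss_resid
eta2 <- ss_effect / ss_total
round(eta2, 3)  # 0.266, 0.005, 0.048, matching the etasq column
```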

Partial Eta-Squared

The partial eta-squared value is the ratio of the sum of squares for each effect to the sum of squares for that effect plus the residual sum of squares. It is more difficult to interpret, because its value strongly depends on the variability of the residuals. Partial eta-squared values should be reported with caution, and Levine and Hullett (2002) recommend reporting eta- or omega-squared rather than partial eta-squared.

Use the partial-argument to compute partial eta-squared values:

eta_sq(fit, partial = TRUE)
#>                  term partial.etasq
#> 1   as.factor(e42dep)         0.281
#> 2 as.factor(c172code)         0.008
#> 3             c160age         0.066
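The only change from eta-squared is the denominator, which a short hand calculation makes visible. This sketch again hardcodes the sums of squares reported by anova_stats(fit) in the last section of this vignette; the variable names are illustrative.

```r
# Sums of squares, copied from the sumsq column of anova_stats(fit)
ss_effect <- c(
  "as.factor(e42dep)"   = 577756.33,
  "as.factor(c172code)" = 11722.05,
  "c160age"             = 105169.60
)
ss_resid <- 1476436.34

# partial eta-squared: each effect is compared only to itself plus the residuals
partial_eta2 <- ss_effect / (ss_effect + ss_resid)
round(partial_eta2, 3)  # 0.281, 0.008, 0.066, matching the partial.etasq column
```

Because the other effects are dropped from the denominator, each partial value is at least as large as the corresponding eta-squared.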

Omega-Squared

While eta-squared estimates tend to be biased in certain situations, e.g. when the sample size is small or the independent variables have many group levels, omega-squared estimates are corrected for this bias.

Omega-squared can be simply computed with:

omega_sq(fit)
#>                  term omegasq
#> 1   as.factor(e42dep)   0.263
#> 2 as.factor(c172code)   0.004
#> 3             c160age   0.048
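The bias correction can be illustrated with the common textbook formula for omega-squared, which subtracts the expected error contribution from each effect. That this is the estimator sjstats uses is an assumption, although the formula does reproduce the output above. The inputs are copied from the anova_stats(fit) table in the last section of this vignette, and the variable names are illustrative.

```r
# Values copied from the df, sumsq and meansq columns of anova_stats(fit)
ss_effect <- c(577756.33, 11722.05, 105169.60)
df_effect <- c(3, 2, 1)
ss_resid  <- 1476436.34
ms_resid  <- 1770.307

ss_total <- sum(ss_effect) + ss_resid

# omega-squared: (SS_effect - df_effect * MS_resid) / (SS_total + MS_resid)
omega2 <- (ss_effect - df_effect * ms_resid) / (ss_total + ms_resid)
round(omega2, 3)  # 0.263, 0.004, 0.048, matching the omegasq column
```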

Partial Omega-Squared

omega_sq() also has a partial-argument to compute partial omega-squared values. Computing the partial omega-squared statistics is based on bootstrapping. In this case, use n to define the number of samples (1000 by default).

omega_sq(fit, partial = TRUE, n = 100)
#>                  term partial.omegasq
#> 1   as.factor(e42dep)           0.278
#> 2 as.factor(c172code)           0.005
#> 3             c160age           0.065

Epsilon-Squared

Epsilon-squared is a less common measure of effect size. It is sometimes considered an “adjusted r-squared” value. You can compute this effect size using epsilon_sq().

epsilon_sq(fit)
#>                  term epsilonsq
#> 1   as.factor(e42dep)     0.264
#> 2 as.factor(c172code)     0.004
#> 3             c160age     0.048
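A hand calculation shows how close epsilon-squared is to omega-squared: in the usual textbook formula the numerator is the same, and only the denominator differs. That sjstats uses exactly this formula is an assumption, although it reproduces the output above. The inputs come from the anova_stats(fit) table in the last section of this vignette; the variable names are illustrative.

```r
# Values copied from the df, sumsq and meansq columns of anova_stats(fit)
ss_effect <- c(577756.33, 11722.05, 105169.60)
df_effect <- c(3, 2, 1)
ss_resid  <- 1476436.34
ms_resid  <- 1770.307

ss_total <- sum(ss_effect) + ss_resid

# epsilon-squared: (SS_effect - df_effect * MS_resid) / SS_total
epsilon2 <- (ss_effect - df_effect * ms_resid) / ss_total
round(epsilon2, 3)  # 0.264, 0.004, 0.048, matching the epsilonsq column
```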

When the ci.lvl-argument is defined, bootstrapping is used to compute the confidence intervals.

epsilon_sq(fit, ci.lvl = .95, n = 100)
#>                  term epsilonsq conf.low conf.high
#> 1   as.factor(e42dep)     0.264    0.204     0.313
#> 2 as.factor(c172code)     0.004   -0.003     0.015
#> 3             c160age     0.048    0.025     0.073

Cohen’s F

Finally, cohens_f() computes Cohen’s F effect size for all independent variables in the model:

cohens_f(fit)
#>                  term   cohens.f
#> 1   as.factor(e42dep) 0.62555427
#> 2 as.factor(c172code) 0.08910342
#> 3             c160age 0.26689334
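These values are consistent with one common definition of Cohen's f, the square root of partial eta-squared divided by one minus partial eta-squared, which simplifies to the square root of the effect sum of squares over the residual sum of squares. The sketch below assumes that definition and hardcodes the sums of squares from the anova_stats(fit) table in the last section of this vignette; the variable names are illustrative.

```r
# Sums of squares, copied from the sumsq column of anova_stats(fit)
ss_effect <- c(577756.33, 11722.05, 105169.60)
ss_resid  <- 1476436.34

# Cohen's f from partial eta-squared
partial_eta2 <- ss_effect / (ss_effect + ss_resid)
f_from_eta <- sqrt(partial_eta2 / (1 - partial_eta2))

# Equivalent shortcut: effect sum of squares relative to the residuals
f_direct <- sqrt(ss_effect / ss_resid)

round(f_direct, 3)  # 0.626, 0.089, 0.267
```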

Complete Statistical Table Output

The anova_stats() function takes a model input and computes a comprehensive summary, including the above effect size measures, returned as a tidy data frame:

anova_stats(fit)
#>                  term  df      sumsq     meansq statistic p.value etasq partial.etasq omegasq partial.omegasq epsilonsq cohens.f power
#> 1   as.factor(e42dep)   3  577756.33 192585.444   108.786   0.000 0.266         0.281   0.263           0.278     0.264    0.626  1.00
#> 2 as.factor(c172code)   2   11722.05   5861.024     3.311   0.037 0.005         0.008   0.004           0.005     0.004    0.089  0.63
#> 3             c160age   1  105169.60 105169.595    59.408   0.000 0.048         0.066   0.048           0.065     0.048    0.267  1.00
#> 4           Residuals 834 1476436.34   1770.307        NA      NA    NA            NA      NA              NA        NA       NA    NA
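The test columns of this table follow from the sums of squares and degrees of freedom. As a minimal sketch, the row for as.factor(c172code) can be recomputed with base R's pf(); the variable names are illustrative.

```r
# Values copied from the table above
ss_effect <- 11722.05    # sumsq of as.factor(c172code)
df_effect <- 2
ss_resid  <- 1476436.34  # sumsq of Residuals
df_resid  <- 834

# Mean squares are sums of squares divided by their degrees of freedom
ms_effect <- ss_effect / df_effect
ms_resid  <- ss_resid / df_resid

# F statistic and its upper-tail p-value
f_stat  <- ms_effect / ms_resid
p_value <- pf(f_stat, df1 = df_effect, df2 = df_resid, lower.tail = FALSE)

round(c(statistic = f_stat, p.value = p_value), 3)  # 3.311 and 0.037
```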

Like the other functions, the input may also be an object of class anova, so you can also use model fits from the car package, which allows fitting Anovas with different types of sums of squares:

anova_stats(car::Anova(fit, type = 3))
#>                  term       sumsq     meansq  df statistic p.value etasq partial.etasq omegasq partial.omegasq epsilonsq cohens.f power
#> 1         (Intercept)   26851.070  26851.070   1    15.167   0.000 0.013         0.018   0.012           0.017     0.012    0.135 0.973
#> 2   as.factor(e42dep)  426461.571 142153.857   3    80.299   0.000 0.209         0.224   0.206           0.220     0.206    0.537 1.000
#> 3 as.factor(c172code)    7352.049   3676.025   2     2.076   0.126 0.004         0.005   0.002           0.003     0.002    0.071 0.429
#> 4             c160age  105169.595 105169.595   1    59.408   0.000 0.051         0.066   0.051           0.065     0.051    0.267 1.000
#> 5           Residuals 1476436.343   1770.307 834        NA      NA    NA            NA      NA              NA        NA       NA    NA

Confidence Intervals

eta_sq() and omega_sq() have a ci.lvl-argument. If it is not NULL, a confidence interval is computed as well.

For eta-squared, i.e. eta_sq() with partial = FALSE, confidence intervals are based on bootstrap methods because the sampling distribution is not symmetric. Confidence intervals for partial omega-squared, i.e. omega_sq() with partial = TRUE, are also based on bootstrapping. In these cases, n indicates the number of bootstrap samples to be drawn to compute the confidence intervals.

eta_sq(fit, partial = TRUE, ci.lvl = .8)
#>                  term partial.etasq conf.low conf.high
#> 1   as.factor(e42dep)         0.281    0.247     0.310
#> 2 as.factor(c172code)         0.008    0.001     0.016
#> 3             c160age         0.066    0.047     0.089

# uses bootstrapping - here, for speed, just 100 samples
omega_sq(fit, partial = TRUE, ci.lvl = .9, n = 100)
#>                  term partial.omegasq conf.low conf.high
#> 1   as.factor(e42dep)           0.278    0.230     0.327
#> 2 as.factor(c172code)           0.005   -0.004     0.018
#> 3             c160age           0.065    0.039     0.097
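To illustrate what a bootstrap interval for an effect size involves, here is a generic percentile bootstrap in base R. It is a sketch under stated assumptions, not the sjstats implementation: it uses the built-in cars data, a one-predictor model and a hypothetical helper eta_sq_manual().

```r
set.seed(42)  # make the resampling reproducible

# Hypothetical helper: eta-squared of a one-predictor model from its Anova table
eta_sq_manual <- function(d) {
  ss <- anova(lm(dist ~ speed, data = d))[["Sum Sq"]]
  ss[1] / sum(ss)
}

# Draw n bootstrap samples by resampling rows with replacement
n <- 100
boot_est <- replicate(
  n,
  eta_sq_manual(cars[sample(nrow(cars), replace = TRUE), ])
)

# 80% percentile interval: the 10% and 90% quantiles of the bootstrap estimates
boot_ci <- quantile(boot_est, probs = c(0.1, 0.9))
boot_ci
```

A larger n gives more stable interval limits at the cost of computing time, which is why the examples above lower n to 100 for speed.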

References

Levine TR, Hullett CR. Eta Squared, Partial Eta Squared, and Misreporting of Effect Size in Communication Research. Human Communication Research 28(4); 2002: 612-625.