Introduction

In many real-world applications there is no straightforward way of obtaining standardized effect sizes. However, it is possible to get approximations of most of the effect size indices ((d), (r), (\eta^2_p)…) from test statistics. These conversions are based on the idea that test statistics are a function of effect size and sample size; thus, information about sample size (or, more often, degrees of freedom) can be used to reverse-engineer indices of effect size from test statistics. This idea and these functions also power our Effect Sizes From Test Statistics Shiny app.

The measures discussed here are, in one way or another, signal-to-noise ratios, with the “noise” representing the unaccounted variance in the outcome variable[1].

The indices are:

• Percent variance explained ((\eta^2_p), (\omega^2_p), (\epsilon^2_p)).
• Measure of association ((r)).
• Measure of difference ((d)).

(Partial) Percent Variance Explained

These measures represent the ratio of (Signal^2 / (Signal^2 + Noise^2)), with the “noise” having all other “signals” partialled out (be they of other fixed or random effects). The most popular of these indices is (\eta^2_p) (Eta; which is equivalent to (R^2)).

The conversion of the (F)- or (t)-statistic is based on Friedman (1982).
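The arithmetic behind these conversions is simple enough to sketch directly. Here is a minimal illustration of the (F)-to-(\eta^2_p) formula (in Python, purely to show the arithmetic; the lower-case function names are made up and are not the effectsize API):

```python
def f_to_eta2(f, df, df_error):
    # Partial eta squared recovered from F and its degrees of freedom
    # (Friedman, 1982): eta2_p = F * df / (F * df + df_error)
    return (f * df) / (f * df + df_error)

def t_to_eta2(t, df_error):
    # A t-statistic is an F with 1 numerator df, so t^2 takes the place of F
    return f_to_eta2(t**2, df=1, df_error=df_error)

# e.g., F = 40.72 with df = (2, 18) recovers a partial eta squared of ~0.82
print(round(f_to_eta2(40.72, 2, 18), 2))  # 0.82
```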

Let’s look at an example:

library(afex)

data(md_12.1)

aov_fit <- aov_car(rt ~ angle * noise + Error(id / (angle * noise)),
                   data = md_12.1,
                   anova_table = list(correction = "none", es = "pes"))
aov_fit
> Anova Table (Type 3 tests)
>
> Response: rt
>        Effect    df     MSE         F  pes p.value
> 1       angle 2, 18 3560.00 40.72 *** .819   <.001
> 2       noise  1, 9 8460.00 33.77 *** .790   <.001
> 3 angle:noise 2, 18 1160.00 45.31 *** .834   <.001
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1


Let’s compare the (\eta^2_p) (the pes column) obtained here with those recovered from F_to_eta2():

library(effectsize)

F_to_eta2(
  f = c(40.72, 33.77, 45.31),
  df = c(2, 1, 2),
  df_error = c(18, 9, 18)
)
> Eta2 (partial) |       90% CI
> -----------------------------
>           0.82 | [0.66, 0.89]
>           0.79 | [0.49, 0.89]
>           0.83 | [0.69, 0.90]


They are identical![2] (except that F_to_eta2() also provides confidence intervals[3] :)

In this case we were able to obtain the effect sizes easily (thanks to afex!), but in other cases it might not be as easy, and estimates based on test statistics offer a good approximation.

For example:

In Simple Effect and Contrast Analysis

library(emmeans)

joint_tests(aov_fit, by = "noise")
> noise = absent:
>  model term df1 df2 F.ratio p.value
>  angle        2  29       5 0.0144
>
> noise = present:
>  model term df1 df2 F.ratio p.value
>  angle        2  29      79 <.0001

F_to_eta2(f = c(5, 79),
          df = 2,
          df_error = 29)
> Eta2 (partial) |       90% CI
> -----------------------------
>           0.26 | [0.04, 0.44]
>           0.84 | [0.75, 0.89]


We can also use t_to_eta2() for contrast analysis:

pairs(emmeans(aov_fit, ~ angle))
> NOTE: Results may be misleading due to involvement in interactions

>  contrast estimate   SE df t.ratio p.value
>  X0 - X4      -108 18.9 18 -5.700  0.0001
>  X0 - X8      -168 18.9 18 -8.900  <.0001
>  X4 - X8       -60 18.9 18 -3.200  0.0137
>
> Results are averaged over the levels of: noise
> P value adjustment: tukey method for comparing a family of 3 estimates

t_to_eta2(t = c(-5.7, -8.9, -3.2),
          df_error = 18)
> Eta2 (partial) |       90% CI
> -----------------------------
>           0.64 | [0.39, 0.78]
>           0.81 | [0.66, 0.88]
>           0.36 | [0.09, 0.58]


In Linear Mixed Models

library(lmerTest)

fit_lmm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

anova(fit_lmm)
> Type III Analysis of Variance Table with Satterthwaite's method
>      Sum Sq Mean Sq NumDF DenDF F value  Pr(>F)
> Days  30031   30031     1    17    45.9 3.3e-06 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

F_to_eta2(45.8, 1, 17)
> Eta2 (partial) |       90% CI
> -----------------------------
>           0.73 | [0.51, 0.83]


We can also use t_to_eta2() for the slope of Days (which in this case gives the same result).

model_parameters(fit_lmm, df_method = "satterthwaite")
> Parameter   | Coefficient |   SE |           95% CI |     t |    df |      p
> ----------------------------------------------------------------------------
> (Intercept) |      251.41 | 6.82 | [238.03, 264.78] | 36.84 | 17.00 | < .001
> Days        |       10.47 | 1.55 | [  7.44,  13.50] |  6.77 | 17.00 | < .001

t_to_eta2(6.77, df_error = 17)
> Eta2 (partial) |       90% CI
> -----------------------------
>           0.73 | [0.51, 0.83]


Bias-Corrected Indices

Alongside (\eta^2_p) there are also the less biased (\omega^2_p) (Omega) and (\epsilon^2_p) (Epsilon; sometimes called (\text{Adj. }\eta^2_p), which is equivalent to (R^2_{adj}); Albers and Lakens (2018), Mordkoff (2019)).

F_to_eta2(45.8, 1, 17)
> Eta2 (partial) |       90% CI
> -----------------------------
>           0.73 | [0.51, 0.83]

F_to_epsilon2(45.8, 1, 17)
> Epsilon2 (partial) |       90% CI
> ---------------------------------
>               0.71 | [0.48, 0.82]

F_to_omega2(45.8, 1, 17)
> Omega2 (partial) |       90% CI
> -------------------------------
>             0.70 | [0.47, 0.82]
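Under the hood, these corrections are small tweaks to the same ratio: both replace (F) with (F - 1) in the numerator, and (\omega^2_p) additionally adds 1 to the denominator. A sketch of the arithmetic that reproduces the three values above (Python, with illustrative function names; not the effectsize API):

```python
def f_to_eta2(f, df, df_error):
    # eta2_p = F * df / (F * df + df_error)
    return (f * df) / (f * df + df_error)

def f_to_epsilon2(f, df, df_error):
    # Epsilon replaces F with (F - 1) in the numerator
    return ((f - 1) * df) / (f * df + df_error)

def f_to_omega2(f, df, df_error):
    # Omega additionally adds 1 to the denominator
    return ((f - 1) * df) / (f * df + df_error + 1)

for fn in (f_to_eta2, f_to_epsilon2, f_to_omega2):
    print(round(fn(45.8, 1, 17), 2))  # 0.73, 0.71, 0.7
```

Note that when (F < 1) the corrected numerators go negative; such values are usually truncated to 0 when reported.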


Measure of Association

Similar to (\eta^2_p), (r) is a signal-to-noise ratio, and is in fact equal to (\sqrt{\eta^2_p}) (so it’s really a partial (r)). It is often used instead of (\eta^2_p) when discussing the strength of association (but I suspect people use it because it gives a bigger number, which looks better).
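In terms of the statistic, (r = t / \sqrt{t^2 + df_{error}}): the sign of (t) is preserved, and squaring recovers the (\eta^2_p) obtained from the same (t). A sketch (Python, purely illustrative; not the effectsize API):

```python
import math

def t_to_r(t, df_error):
    # Partial correlation recovered from t; keeps the sign of t,
    # and r^2 equals the partial eta squared from the same statistic
    return t / math.sqrt(t**2 + df_error)

r = t_to_r(6.77, 17)   # slope of Days from the mixed-model example
print(round(r, 2))     # 0.85
print(round(r**2, 2))  # 0.73
```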

For Slopes

model_parameters(fit_lmm, df_method = "satterthwaite")
> Parameter   | Coefficient |   SE |           95% CI |     t |    df |      p
> ----------------------------------------------------------------------------
> (Intercept) |      251.41 | 6.82 | [238.03, 264.78] | 36.84 | 17.00 | < .001
> Days        |       10.47 | 1.55 | [  7.44,  13.50] |  6.77 | 17.00 | < .001

t_to_r(6.77, df_error = 17)
>    r |       95% CI
> -------------------
> 0.85 | [0.67, 0.92]


In a fixed-effect linear model, this returns the partial correlation. Compare:

fit_lm <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

model_parameters(fit_lm)
> Parameter    | Coefficient |   SE |       95% CI |     t |  df |      p
> -----------------------------------------------------------------------
> (Intercept)  |        2.25 | 0.25 | [1.76, 2.74] |  9.07 | 147 | < .001
> Sepal.Width  |        0.60 | 0.07 | [0.46, 0.73] |  8.59 | 147 | < .001
> Petal.Length |        0.47 | 0.02 | [0.44, 0.51] | 27.57 | 147 | < .001

t_to_r(t = c(8.59, 27.57),
       df_error = 147)
>    r |       95% CI
> -------------------
> 0.58 | [0.47, 0.66]
> 0.92 | [0.89, 0.93]


to:

correlation::correlation(iris[,1:3], partial = TRUE)[1:2, c(1:3,7:8)]
> Parameter1   |   Parameter2 |    r |  df |      p
> -------------------------------------------------
> Sepal.Length |  Sepal.Width | 0.58 | 148 | < .001
> Sepal.Length | Petal.Length | 0.92 | 148 | < .001


In Contrast Analysis

This measure is also sometimes used in contrast analysis, where it is called the point-biserial correlation, (r_{pb}) (Cohen and others 1965; Rosnow, Rosenthal, and Rubin 2000):

pairs(emmeans(aov_fit, ~ angle))
> NOTE: Results may be misleading due to involvement in interactions

>  contrast estimate   SE df t.ratio p.value
>  X0 - X4      -108 18.9 18 -5.700  0.0001
>  X0 - X8      -168 18.9 18 -8.900  <.0001
>  X4 - X8       -60 18.9 18 -3.200  0.0137
>
> Results are averaged over the levels of: noise
> P value adjustment: tukey method for comparing a family of 3 estimates

t_to_r(t = c(-5.7, -8.9, -3.2),
       df_error = 18)
>     r |         95% CI
> ----------------------
> -0.80 | [-0.89, -0.57]
> -0.90 | [-0.95, -0.78]
> -0.60 | [-0.79, -0.22]


Measures of Difference

These indices represent (Signal/Noise) with the “signal” representing the difference between two means. This is akin to Cohen’s (d), and is a close approximation when comparing two groups of equal size (Wolf 1986; Rosnow, Rosenthal, and Rubin 2000).

These can be useful in contrast analyses.
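In terms of the statistic, the approximation is (d \approx 2t/\sqrt{df_{error}}) for between-subjects contrasts and (d \approx t/\sqrt{df_{error}}) for paired (within-subject) ones. A sketch (Python, purely illustrative; not the effectsize API):

```python
import math

def t_to_d(t, df_error, paired=False):
    # Between subjects: d ~ 2t / sqrt(df_error)
    # Within subjects (paired): d ~ t / sqrt(df_error)
    if paired:
        return t / math.sqrt(df_error)
    return 2 * t / math.sqrt(df_error)

print(round(t_to_d(2.5, 51), 2))                # 0.7
print(round(t_to_d(-5.7, 18, paired=True), 2))  # -1.34
```

Omitting the paired flag for a within-subject contrast would double the estimate.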

Between-Subject Contrasts

warp.lm <- lm(breaks ~ tension, data = warpbreaks)

pairs(emmeans(warp.lm,  ~ tension))
>  contrast estimate SE df t.ratio p.value
>  L - M        10.0  4 51 2.500   0.0400
>  L - H        14.7  4 51 3.700   <.0001
>  M - H         4.7  4 51 1.200   0.4600
>
> P value adjustment: tukey method for comparing a family of 3 estimates

t_to_d(t = c(2.5, 3.7, 1.2),
       df_error = 51)
>    d |        95% CI
> --------------------
> 0.70 | [ 0.13, 1.26]
> 1.04 | [ 0.45, 1.62]
> 0.34 | [-0.22, 0.89]


Within-Subject Contrasts

pairs(emmeans(aov_fit, ~ angle))
> NOTE: Results may be misleading due to involvement in interactions

>  contrast estimate   SE df t.ratio p.value
>  X0 - X4      -108 18.9 18 -5.700  0.0001
>  X0 - X8      -168 18.9 18 -8.900  <.0001
>  X4 - X8       -60 18.9 18 -3.200  0.0137
>
> Results are averaged over the levels of: noise
> P value adjustment: tukey method for comparing a family of 3 estimates

t_to_d(t = c(-5.7, -8.9, -3.2),
       df_error = 18,
       paired = TRUE)
>     d |         95% CI
> ----------------------
> -1.34 | [-1.97, -0.70]
> -2.10 | [-2.87, -1.31]
> -0.75 | [-1.27, -0.22]


(Note the use of paired = TRUE to avoid over-estimating the size of the effect; Rosenthal (1991); Rosnow, Rosenthal, and Rubin (2000))

References

Albers, Casper, and Daniël Lakens. 2018. “When Power Analyses Based on Pilot Data Are Biased: Inaccurate Effect Size Estimators and Follow-up Bias.” Journal of Experimental Social Psychology 74: 187–95.

Cohen, Jacob, and others. 1965. “Some Statistical Issues in Psychological Research.” Handbook of Clinical Psychology, 95–121.

Friedman, Herbert. 1982. “Simplified Determinations of Statistical Power, Magnitude of Effect and Research Sample Sizes.” Educational and Psychological Measurement 42 (2): 521–26.

Mordkoff, J Toby. 2019. “A Simple Method for Removing Bias from a Popular Measure of Standardized Effect Size: Adjusted Partial Eta Squared.” Advances in Methods and Practices in Psychological Science 2 (3): 228–32.

Rosenthal, Robert. 1991. Meta-Analytic Procedures for Social Sciences. Newbury Park, CA: Sage.

Rosnow, Ralph L, Robert Rosenthal, and Donald B Rubin. 2000. “Contrasts and Correlations in Effect-Size Estimation.” Psychological Science 11 (6): 446–53.

Wolf, Fredric M. 1986. Meta-Analysis: Quantitative Methods for Research Synthesis. Vol. 59. Sage.

1. Note that for generalized linear models (Poisson, logistic…), where the outcome is never on an arbitrary scale, the estimates themselves are indices of effect size! Thus, this vignette is relevant only to general linear models.

2. Note that these are partial percent variance explained, and so their sum can be larger than 1.

3. Confidence intervals for all indices are estimated using the non-centrality parameter method: these methods search for the best non-central parameter of the non-central (F)/(t) distribution for the desired tail-probabilities, and then convert these ncps to the corresponding effect sizes.