# Statistical Test Plotting with ‘ggasym’

## Purpose

A good use of ‘ggasym’ is plotting two values from the results of a multi-way statistical test. Each pairwise comparison appears twice in the symmetric matrix, once in the top-left triangle and once in the bottom-right triangle, so a different value can be used for each fill. Below I give a brief example that plots the difference in means and the adjusted p-value on the symmetric matrix.

library(dplyr)
library(ggplot2)
library(tibble)
library(purrr)
library(broom)
library(ggasym)

## Data

The data will be modeled as expression values of 6 genes, each with 10 measurements. I will then ask whether any of them have different levels of expression.

n_reps <- 10  # number of measurements per gene
expt_std_dev <- 1.5  # std. dev. of measurements
genes <- c("FAK", "talin", "paxillin", "vinculin", "B1integrin", "kindlin")
# "real" expression levels to be used as the mean in rnorm
real_expression_levels <- sample(seq(1, 5, 0.1), length(genes), replace = TRUE)
# create a tibble
expr_data <- tibble(gene = rep(genes, n_reps),
                    real_expr = rep(real_expression_levels, n_reps),
                    rep_num = sort(rep(1:n_reps, length(genes))))
# add in the measured expression values as a normal distribution around the mean
expr_data <- expr_data %>%
    mutate(expt_expr = rnorm(nrow(expr_data),
                             mean = real_expr,
                             sd = expt_std_dev))
# show the first rows
head(expr_data)
#> # A tibble: 6 x 4
#>   gene       real_expr rep_num expt_expr
#>   <chr>          <dbl>   <int>     <dbl>
#> 1 FAK              2.3       1     2.92
#> 2 talin            1.3       1    -1.01
#> 3 paxillin         4.8       1     3.41
#> 4 vinculin         1         1     0.558
#> 5 B1integrin       4.3       1     4.29
#> 6 kindlin          3.2       1     6.81
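
The question posed above can be tested with a one-way ANOVA followed by Tukey's post-hoc test, which is presumably where the tukey_res object used in the next section comes from. The following is a minimal sketch under that assumption, using aov() and TukeyHSD() from base R's stats package. The name res_aov and the stand-in data (built only if expr_data is not already in the workspace) are assumptions made for illustration.

```r
# Assumption: `expr_data` is the tibble built in the Data section. If it is not
# in the workspace, build a small stand-in with the same columns so that this
# block runs on its own.
if (!exists("expr_data")) {
  set.seed(0)  # hypothetical seed, used only for the stand-in data
  genes <- c("FAK", "talin", "paxillin", "vinculin", "B1integrin", "kindlin")
  expr_data <- data.frame(
    gene = factor(rep(genes, 10)),
    expt_expr = rnorm(60, mean = rep(c(2, 1, 5, 1, 4, 3), 10), sd = 1.5)
  )
}

# One-way ANOVA: does mean expression differ between genes?
res_aov <- aov(expt_expr ~ gene, data = expr_data)
summary(res_aov)

# Tukey's HSD post-hoc test for every pairwise comparison of genes
tukey_res <- TukeyHSD(res_aov)
head(tukey_res$gene)
```

With 6 genes there are 15 pairwise comparisons, each with a difference in means, a confidence interval, and an adjusted p-value.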

## Plotting

Now I want to plot the estimate (the difference in means) in the top-left and the adjusted p-value in the bottom-right. First, I must prepare the data for use with geom_asymmat() by passing the results of the Tukey post-hoc test to asymmetrise_stats(). It returns the data in a tibble with two new columns, x and y, which are the result of splitting the comparison column.

asymmat_tib <- asymmetrise_stats(tukey_res)
# show the first rows
head(asymmat_tib)
#> # A tibble: 6 x 8
#>   term  comparison    estimate conf.low conf.high adj.p.value x      y
#>   <chr> <chr>            <dbl>    <dbl>     <dbl>       <dbl> <chr>  <chr>
#> 1 gene  FAK-B1integr…   -1.81     -3.70    0.0877  0.0698     FAK    B1int…
#> 2 gene  kindlin-B1in…   -1.27     -3.16    0.628   0.369      kindl… B1int…
#> 3 gene  paxillin-B1i…    0.440    -1.46    2.34    0.983      paxil… B1int…
#> 4 gene  talin-B1inte…   -3.26     -5.15   -1.36    0.0000698  talin  B1int…
#> 5 gene  vinculin-B1i…   -3.79     -5.68   -1.89    0.00000356 vincu… B1int…
#> 6 gene  kindlin-FAK      0.540    -1.36    2.44    0.958      kindl… FAK
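
To make the x and y columns concrete, here is a minimal base R sketch of splitting a comparison label on the hyphen. It is only an illustration of the idea, not the implementation inside asymmetrise_stats(). The name split_df is hypothetical, and apart from "kindlin-FAK" the labels are made-up pairs from the gene list.

```r
# Comparison labels of the form "<gene 1>-<gene 2>" (illustrative values)
comparison <- c("kindlin-FAK", "talin-paxillin", "vinculin-talin")

# Split each label on the literal hyphen
parts <- strsplit(comparison, "-", fixed = TRUE)

# The first piece becomes x and the second becomes y
split_df <- data.frame(
  comparison = comparison,
  x = vapply(parts, `[`, character(1), 1),
  y = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
split_df
```

A split like this only works when the group names themselves contain no hyphen, which holds for the gene names used here.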

Finally, I can plot the data using geom_asymmat().

ggplot(asymmat_tib, aes(x = x, y = y)) +
    geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) +
    scale_fill_tl_gradient2(low = "dodgerblue", high = "tomato") +
    scale_fill_br_distiller(type = "seq", palette = "Greens", direction = 1)
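
The bottom-right fill is mapped to -log(adj.p.value) rather than to the adjusted p-value itself. In R, log() is the natural logarithm, so smaller p-values become larger positive numbers and the most significant comparisons get the strongest color on a sequential palette. A small sketch of the transformation, using three adjusted p-values taken from the table above (the names adj_p and neg_log_p are hypothetical):

```r
# Three adjusted p-values from the printed rows, from largest to smallest
adj_p <- c(0.958, 0.0698, 0.0000698)

# Negative natural log: near 0 for p close to 1, large for small p
neg_log_p <- -log(adj_p)

data.frame(adj_p = adj_p, neg_log_p = neg_log_p)
```

Using -log10() instead would be a common alternative. It changes only the scale of the legend, not the ordering of the cells.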

A few styling changes can then be added with normal ‘ggplot2’ semantics.

ggplot(asymmat_tib, aes(x = x, y = y)) +
    geom_asymmat(aes(fill_tl = estimate, fill_br = -log(adj.p.value))) +
    scale_fill_tl_gradient2(low = "dodgerblue", high = "tomato",
                            guide = guide_colourbar(order = 1)) +
    scale_fill_br_distiller(type = "seq", palette = "Greens", direction = 1,
                            guide = guide_colourbar(order = 2)) +
    theme_bw() +
    theme(panel.background = element_rect(fill = "grey50"),
          panel.grid = element_blank(),
          axis.title = element_blank(),
          plot.title = element_text(hjust = 0.5)) +
    labs(title = "Differential Gene Expression",
         fill_tl = "diff. in\nmean expr.",
         fill_br = "-log( adj. p-value )") +
    scale_x_discrete(expand = c(0,0)) +
    scale_y_discrete(expand = c(0,0))

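Because the data are simulated without a fixed seed, the exact values differ from run to run. In the rows printed above, for example, the comparison of vinculin and B1integrin has a large difference in means (estimate of -3.79) and a very low adjusted p-value (3.56e-6), so one conclusion is that vinculin is expressed at a lower level than B1integrin. By contrast, the comparison of kindlin and FAK (estimate of 0.540, adjusted p-value of 0.958) shows no significant difference.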
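
One way to check a conclusion like this against the numbers is to rank the comparisons by the absolute value of the estimate. The sketch below uses only the six rows printed in the Plotting section, so it says nothing about the comparisons that were not printed. The data frame name shown is hypothetical, and the comparison labels are written out in full from the gene list because the printed tibble truncates them.

```r
# The six printed rows (estimate and adjusted p-value copied from the output)
shown <- data.frame(
  comparison = c("FAK-B1integrin", "kindlin-B1integrin", "paxillin-B1integrin",
                 "talin-B1integrin", "vinculin-B1integrin", "kindlin-FAK"),
  estimate = c(-1.81, -1.27, 0.440, -3.26, -3.79, 0.540),
  adj.p.value = c(0.0698, 0.369, 0.983, 0.0000698, 0.00000356, 0.958),
  stringsAsFactors = FALSE
)

# Rank by the size of the difference in means, largest first
ranked <- shown[order(-abs(shown$estimate)), ]
ranked

# Comparisons below a conventional 0.05 threshold
ranked[ranked$adj.p.value < 0.05, ]
```

On the full asymmat_tib the same ordering and filtering would apply to all comparisons rather than to this subset.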