# SimplifyStats Vignette

#### 2019-03-11

SimplifyStats provides a set of functions to simplify the process of 1) generating descriptive statistics for the numeric variables of multiple groups and 2) performing hypothesis testing between all combinations of groups.

## Generate group-wise descriptive statistics

The function group_summarize generates descriptive statistics for each variable in var_cols within every group defined by a unique combination of the grouping variables in group_cols.

```r
library(SimplifyStats)

# Generate data.
df <- iris

# Introduce a missing value to demonstrate NA handling (na.rm, PropNA).
df$Sepal.Length[1] <- NA

# Add another grouping variable.
df$Condition <- rep(c("untreated", "treated"), 75)

# Generate descriptive statistics.
group_summarize(
  df,
  group_cols = c("Species", "Condition"),
  var_cols = c("Sepal.Length", "Sepal.Width"),
  na.rm = TRUE
)
#> # A tibble: 12 x 17
#>    Variable Species Condition     N  Mean StdDev StdErr   Min Quartile1
#>    <chr>    <fct>   <chr>     <int> <dbl>  <dbl>  <dbl> <dbl>     <dbl>
#>  1 Sepal.L~ setosa  untreated    24  5.02  0.399 0.0814   4.4      4.77
#>  2 Sepal.L~ setosa  treated      25  4.99  0.317 0.0633   4.3      4.8
#>  3 Sepal.L~ versic~ untreated    25  5.99  0.556 0.111    5        5.6
#>  4 Sepal.L~ versic~ treated      25  5.88  0.478 0.0956   4.9      5.6
#>  5 Sepal.L~ virgin~ untreated    25  6.50  0.603 0.121    4.9      6.2
#>  6 Sepal.L~ virgin~ treated      25  6.67  0.669 0.134    5.6      6.3
#>  7 Sepal.W~ setosa  untreated    25  3.48  0.325 0.0651   2.9      3.2
#>  8 Sepal.W~ setosa  treated      25  3.38  0.426 0.0853   2.3      3.1
#>  9 Sepal.W~ versic~ untreated    25  2.78  0.336 0.0672   2        2.6
#> 10 Sepal.W~ versic~ treated      25  2.76  0.297 0.0594   2.3      2.5
#> 11 Sepal.W~ virgin~ untreated    25  2.94  0.287 0.0574   2.5      2.8
#> 12 Sepal.W~ virgin~ treated      25  3.01  0.356 0.0713   2.2      2.8
#> # ... with 8 more variables: Median <dbl>, Quartile3 <dbl>, Max <dbl>,
#> #   PropNA <dbl>, Kurtosis <dbl>, Skewness <dbl>,
#> #   Jarque-Bera_p.value <dbl>, Shapiro-Wilk_p.value <dbl>
```
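Some of the per-group columns can be reproduced in base R, which may help clarify what group_summarize computes. The sketch below is not SimplifyStats' own implementation: it assumes that N counts non-missing values and that StdErr = StdDev / sqrt(N), which agrees with the table above (e.g. 0.399 / sqrt(24) ≈ 0.0814). The helper summarize_group() is hypothetical.

```r
# Base-R sketch of group-wise summaries; not the SimplifyStats implementation.
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- rep(c("untreated", "treated"), 75)

# Hypothetical helper. Assumes N counts non-missing values and
# StdErr = StdDev / sqrt(N).
summarize_group <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  c(N = n, Mean = mean(x), StdDev = sd(x), StdErr = sd(x) / sqrt(n))
}

# split() forms one group per unique Species x Condition combination,
# with names such as "setosa.untreated".
groups <- split(df$Sepal.Length, list(df$Species, df$Condition), drop = TRUE)

# One row per group, one column per statistic.
summary_mat <- t(sapply(groups, summarize_group))
round(summary_mat, 4)
```

Each row of summary_mat corresponds to one Species x Condition group. The setosa.untreated row should report N = 24, because the NA introduced in row 1 falls in that group.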

## Perform pairwise hypothesis testing

Similarly, the function pairwise_stats applies a statistical test, here t.test, between every pair of groups defined by the unique combinations of the grouping variables, separately for each variable in var_cols.

```r
# Perform pairwise t-tests.
pairwise_stats(
  df,
  group_cols = c("Species", "Condition"),
  var_cols = c("Sepal.Length", "Sepal.Width"),
  t.test
)
#> # A tibble: 30 x 15
#>    Variable A.Species A.Condition B.Species B.Condition estimate estimate1
#>    <chr>    <fct>     <chr>       <fct>     <chr>          <dbl>     <dbl>
#>  1 Sepal.L~ setosa    untreated   setosa    treated       0.0328      5.02
#>  2 Sepal.L~ setosa    untreated   versicol~ untreated    -0.971       5.02
#>  3 Sepal.L~ setosa    untreated   versicol~ treated      -0.859       5.02
#>  4 Sepal.L~ setosa    untreated   virginica untreated    -1.48        5.02
#>  5 Sepal.L~ setosa    untreated   virginica treated      -1.65        5.02
#>  6 Sepal.L~ setosa    treated     versicol~ untreated    -1.00        4.99
#>  7 Sepal.L~ setosa    treated     versicol~ treated      -0.892       4.99
#>  8 Sepal.L~ setosa    treated     virginica untreated    -1.52        4.99
#>  9 Sepal.L~ setosa    treated     virginica treated      -1.68        4.99
#> 10 Sepal.L~ versicol~ untreated   versicol~ treated       0.112       5.99
#> # ... with 20 more rows, and 8 more variables: estimate2 <dbl>,
#> #   statistic <dbl>, p.value <dbl>, parameter <dbl>, conf.low <dbl>,
#> #   conf.high <dbl>, method <chr>, alternative <chr>
```
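To illustrate what testing "all combinations of groups" involves, here is a base-R sketch of the same kind of comparison. It is an approximation, not pairwise_stats itself: it uses combn() to enumerate every unordered pair of the six Species x Condition groups (choose(6, 2) = 15 pairs per variable, hence 30 rows for two variables) and keeps only the difference in means and the p-value from each t.test() call. The helper test_variable() is hypothetical, and row order and column names differ from the package output. The estimate column here is mean(A) - mean(B), which matches the relationship estimate = estimate1 - estimate2 visible above (0.0328 ≈ 5.02 - 4.99).

```r
# Base-R sketch of pairwise testing; not the SimplifyStats implementation.
df <- iris
df$Sepal.Length[1] <- NA
df$Condition <- rep(c("untreated", "treated"), 75)

# Label each row with its Species x Condition group (6 groups).
df$Group <- interaction(df$Species, df$Condition, sep = "/", drop = TRUE)

# Every unordered pair of groups, one pair per column: choose(6, 2) = 15.
group_pairs <- combn(levels(df$Group), 2)

# Hypothetical helper: run one t-test per group pair for variable v.
test_variable <- function(v) {
  rows <- lapply(seq_len(ncol(group_pairs)), function(i) {
    a <- df[[v]][df$Group == group_pairs[1, i]]
    b <- df[[v]][df$Group == group_pairs[2, i]]
    # Welch two-sample t-test (R's default); NAs are dropped from each sample.
    tt <- t.test(a, b)
    data.frame(
      Variable = v,
      A = group_pairs[1, i],
      B = group_pairs[2, i],
      estimate = unname(tt$estimate[1] - tt$estimate[2]),  # mean(A) - mean(B)
      p.value = tt$p.value,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

results <- do.call(rbind, lapply(c("Sepal.Length", "Sepal.Width"), test_variable))
nrow(results)  # 15 pairs x 2 variables = 30
```

Enumerating pairs with combn() keeps each comparison unordered, so no pair of groups is tested twice.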