Usage guidance

2020-12-14

Introduction

DescrTab2 is the successor to the DescrTab package. It offers a wide range of customization options and can be used in .Rmd files together with knitr.

Preamble settings

imbi_report

You’re all set. Everything is already included.

pdf_document

Here is what you need to include in the YAML header to use DescrTab2 inside an .Rmd file with pdf_document output:

---
title: "DescrTab2 tutorial"
header-includes:
   - \usepackage{needspace}
   - \usepackage{longtable}
   - \usepackage{booktabs}
output: pdf_document
---

html & word_document

No special preamble is needed. Make sure pandoc version 2.0 or later is installed on your system.

Global print_format option

For DescrTab2 to work properly with your document type of choice, you need to set the print_format option, preferably right at the start of your document. You can do this by typing:

options(print_format = "html") # or = "word" or "tex", depending on your document type
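If the same .Rmd file is rendered to several formats, the option can be derived from the output format instead of being hard-coded. The sketch below is an assumption, not part of DescrTab2: print_format_for() is a hypothetical helper that expects the pandoc target name of the current render, which rmarkdown stores in knitr's rmarkdown.pandoc.to option.

```r
# Hypothetical helper (not part of DescrTab2): translate the pandoc output
# target of the current render into a DescrTab2 print_format value.
print_format_for <- function(pandoc_to) {
  # Outside of an rmarkdown render there is no pandoc target.
  if (is.null(pandoc_to) || length(pandoc_to) != 1) return(NULL)
  switch(pandoc_to,
    latex = , beamer = "tex",
    html = , html4 = , html5 = "html",
    docx = "word",
    NULL  # any other target: leave the option unset (console default)
  )
}

print_format_for("latex")  # "tex"
print_format_for("docx")   # "word"
print_format_for(NULL)     # NULL
```

In a setup chunk you could then write options(print_format = print_format_for(knitr::opts_knit$get("rmarkdown.pandoc.to"))). Setting an option to NULL removes it, so for unrecognized targets DescrTab2 falls back to its default console output.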

Getting started

For instructive purposes, we will use the following dataset:

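library(dplyr)     # provides mutate()
library(magrittr)  # provides %>% and the assignment pipe %<>%
dat <- iris[, c("Species", "Sepal.Length")]
dat %<>% mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor())
dat %<>% mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())  # random: counts vary between runs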
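The snippet above relies on dplyr and magrittr. If you prefer not to load them, the same dataset can be built in base R. This is a sketch: the set.seed() call and its value are additions for reproducibility, so the food counts will not match the tables shown in this vignette.

```r
# Base-R construction of the example dataset used throughout this vignette
dat <- iris[, c("Species", "Sepal.Length")]

# Alternating labels: rows 1, 3, 5, ... are "Mammal", rows 2, 4, 6, ... are "Fish"
dat$animal <- factor(rep(c("Mammal", "Fish"), 75))

# Random food choice; the seed value is an arbitrary assumption
set.seed(2020)
dat$food <- factor(sample(c("fries", "wedges"), 150, replace = TRUE))

str(dat)
```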

Make sure you load the DescrTab2 package by typing

library(DescrTab2)

somewhere in the document before you use it. You are now ready to go! Producing beautiful descriptive tables in html and tex is as easy as typing:

```{r, results='asis'}
descr(dat)
```
Variables       Total       p
                (N=150)
Species
  setosa        50 (33%)    >0.999 (chi1)
  versicolor    50 (33%)
  virginica     50 (33%)
Sepal.Length
  N             150         <0.001 (tt1)
  mean          5.8
  sd            0.83
  median        5.8
  Q1 - Q3       5.1 – 6.4
  min - max     4.3 – 7.9
animal
  Fish          75 (50%)    >0.999 (chi1)
  Mammal        75 (50%)
food
  fries         74 (49%)    0.870 (chi1)
  wedges        76 (51%)
chi1: Chi-squared goodness-of-fit test
tt1: Student's one-sample t-test

Note the chunk option results='asis'. DescrTab2 produces raw LaTeX or html code, and pandoc only renders it properly if the results='asis' option is set. An alternative is described further below.

To produce descriptive tables for a word document, a bit more typing is required:

```{r}
descr(dat) %>% print() %>% knitr::knit_print()
```

When producing word tables in this fashion, you must not have the results='asis' chunk option set.

Note that DescrTab2 can also produce console output! In fact, this is the default setting (i.e. if the global print_format option is not set).

Accessing table elements

The object returned from the descr function is basically just a named list. You may be interested in referencing certain summary statistics from the table in your document. To do this, you can save the list returned by descr:

my_table <- descr(dat)

You can then access the elements of the list using the $ operator.

my_table$variables$Sepal.Length$results$Total$mean
#> [1] 5.843333

RStudio's autocomplete suggestions are very helpful when navigating this list.

The print function returns a formatted version of this list, which you can also save and access using the same syntax.

my_table <- descr(dat) %>% print(silent=TRUE)
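When you need such values inside a loop, e.g. for every variable in the table, chaining $ by hand becomes tedious. A small helper that follows a vector of names is sketched below; get_path() and the mock object are hypothetical and only mirror the path shown earlier in this section.

```r
# Hypothetical helper: follow a character vector of names into a nested list,
# equivalent to chaining $ or [[ ]] by hand
get_path <- function(x, path) Reduce(function(node, key) node[[key]], path, x)

# Mock object with the same nesting as the path used above (illustration only;
# a real descr() result contains many more elements)
mock_table <- list(variables = list(Sepal.Length = list(results = list(
  Total = list(mean = mean(iris$Sepal.Length))))))

get_path(mock_table, c("variables", "Sepal.Length", "results", "Total", "mean"))
```

With a real table, a call such as get_path(my_table, c("variables", v, "results", "Total", "mean")) inside a loop over variable names v would retrieve each variable's overall mean, assuming every variable has that element.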

Specifying a group

Use the group option to specify the name of a grouping variable in your data:

descr(dat, "Species")
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (F)
  mean          5           5.9         6.6         5.8
  sd            0.35        0.52        0.64        0.83
  median        5           5.9         6.5         5.8
  Q1 - Q3       4.8 – 5.2   5.6 – 6.3   6.2 – 6.9   5.1 – 6.4
  min - max     4.3 – 5.8   4.9 – 7     4.9 – 7.9   4.3 – 7.9
animal
  Fish          25 (50%)    25 (50%)    25 (50%)    75 (50%)    >0.999 (chi2)
  Mammal        25 (50%)    25 (50%)    25 (50%)    75 (50%)
food
  fries         24 (48%)    23 (46%)    27 (54%)    74 (49%)    0.707 (chi2)
  wedges        26 (52%)    27 (54%)    23 (46%)    76 (51%)
F: F-test (ANOVA)
chi2: Pearson's chi-squared test
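To see where the numbers in the grouped table come from, the Sepal.Length summaries and the F-test can be recomputed in base R. This is a cross-check sketch, not DescrTab2 code; that DescrTab2's "F-test (ANOVA)" is the classical one-way ANOVA is an assumption based on its label.

```r
# Per-group summaries of Sepal.Length (rows: statistics, columns: species)
stats_by_group <- sapply(split(iris$Sepal.Length, iris$Species), function(v)
  c(N = length(v), mean = mean(v), sd = sd(v), median = median(v)))
print(signif(stats_by_group, 2))

# Classical one-way ANOVA F-test of Sepal.Length across species
anova_p <- anova(lm(Sepal.Length ~ Species, data = iris))[["Pr(>F)"]][1]
anova_p
```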

Assigning labels

Use the group_labels option to assign group labels and the var_labels option to assign variable labels:

descr(dat, "Species", group_labels=list(setosa="My custom group label"), var_labels = list(Sepal.Length = "My custom variable label"))
Variables                 My custom group label  versicolor  virginica   Total       p
                          (N=50)                 (N=50)      (N=50)      (N=150)
My custom variable label
  N                       50                     50          50          150         <0.001 (F)
  mean                    5                      5.9         6.6         5.8
  sd                      0.35                   0.52        0.64        0.83
  median                  5                      5.9         6.5         5.8
  Q1 - Q3                 4.8 – 5.2              5.6 – 6.3   6.2 – 6.9   5.1 – 6.4
  min - max               4.3 – 5.8              4.9 – 7     4.9 – 7.9   4.3 – 7.9
animal
  Fish                    25 (50%)               25 (50%)    25 (50%)    75 (50%)    >0.999 (chi2)
  Mammal                  25 (50%)               25 (50%)    25 (50%)    75 (50%)
food
  fries                   24 (48%)               23 (46%)    27 (54%)    74 (49%)    0.707 (chi2)
  wedges                  26 (52%)               27 (54%)    23 (46%)    76 (51%)
F: F-test (ANOVA)
chi2: Pearson's chi-squared test

Confidence intervals for two group comparisons

For two-group comparisons, DescrTab2 automatically calculates confidence intervals for the difference in effect measures:

descr(dat, "animal")
Variables       Fish        Mammal      Total       p               CI
                (N=75)      (N=75)      (N=150)
Species
  setosa        25 (33%)    25 (33%)    50 (33%)    >0.999 (chi2)
  versicolor    25 (33%)    25 (33%)    50 (33%)
  virginica     25 (33%)    25 (33%)    50 (33%)
Sepal.Length
  N             75          75          150         0.961 (tt2)     Mean dif. CI
  mean          5.8         5.8         5.8                         [-0.26, 0.27]
  sd            0.86        0.81        0.83
  median        5.7         5.8         5.8
  Q1 - Q3       5.1 – 6.4   5.1 – 6.5   5.1 – 6.4
  min - max     4.3 – 7.9   4.4 – 7.7   4.3 – 7.9
food
  fries         41 (55%)    33 (44%)    74 (49%)    0.253 (chi2)    Prop. dif. CI
  wedges        34 (45%)    42 (56%)    76 (51%)                    [-0.066, 0.28]
chi2: Pearson's chi-squared test
tt2: Welch's two-sample t-test
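The Sepal.Length row can be cross-checked with base R's t.test(), whose default (var.equal = FALSE) is Welch's test. This is a sketch for comparison only; t.test() reports the difference as first factor level minus second (Fish minus Mammal), and whether DescrTab2 uses the same direction is not stated here, so the interval may appear with flipped signs.

```r
# Rebuild the deterministic 'animal' grouping from the vignette's dataset
dat <- iris[, c("Species", "Sepal.Length")]
dat$animal <- factor(rep(c("Mammal", "Fish"), 75))

# Welch's two-sample t-test (t.test() default: var.equal = FALSE)
res <- t.test(Sepal.Length ~ animal, data = dat)
res$p.value   # compare with the p column above
res$conf.int  # 95% CI for the difference in means (Fish minus Mammal)
```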

Different tests

There are many different tests available. See the test_choice_tree vignette for details: vignette("test_choice_tree", "DescrTab2"), or look at https://imbi-heidelberg.github.io/DescrTab2/articles/test_choice_tree_html.html

Here are some different tests in action:

descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))
Variables       Fish        Mammal      Total       p               CI
                (N=75)      (N=75)      (N=150)
Sepal.Length
  N             75          75          150         0.871 (MWU)     HL CI
  mean          5.8         5.8         5.8                         [-0.3, 0.3]
  sd            0.86        0.81        0.83
  median        5.7         5.8         5.8
  Q1 - Q3       5.1 – 6.4   5.1 – 6.5   5.1 – 6.4
  min - max     4.3 – 7.9   4.4 – 7.7   4.3 – 7.9
food
  fries         41 (55%)    33 (44%)    74 (49%)    0.221 (Bolo)    Prop. dif. CI
  wedges        34 (45%)    42 (56%)    76 (51%)                    [-0.066, 0.28]
MWU: Mann-Whitney U test
Bolo: Boschloo's test

descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (KW)
  mean          5           5.9         6.6         5.8
  sd            0.35        0.52        0.64        0.83
  median        5           5.9         6.5         5.8
  Q1 - Q3       4.8 – 5.2   5.6 – 6.3   6.2 – 6.9   5.1 – 6.4
  min - max     4.3 – 5.8   4.9 – 7     4.9 – 7.9   4.3 – 7.9
KW: Kruskal-Wallis one-way ANOVA
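The nonparametric tests above have base-R counterparts, sketched here for comparison. kruskal.test() corresponds to the "KW" test; wilcox.test() with conf.int = TRUE gives a Mann-Whitney U test plus a Hodges-Lehmann interval. With 75 observations per group and ties in the data, wilcox.test() uses a normal approximation, so its p-value may differ somewhat from the exact value in the table above.

```r
# Kruskal-Wallis test of Sepal.Length across the three species
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
kw$p.value

# Mann-Whitney U test between the two 'animal' groups, with a Hodges-Lehmann
# interval; warnings about ties are suppressed for this illustration
animal <- factor(rep(c("Mammal", "Fish"), 75))
mwu <- suppressWarnings(
  wilcox.test(iris$Sepal.Length ~ animal, conf.int = TRUE)
)
mwu$p.value
mwu$conf.int
```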
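
The next call requests paired tests. It fails because the supplied indices do not pair observations across the two animal groups: animal alternates between Mammal and Fish from row to row, and each value of rep(1:50, 3) falls only on rows of a single group. DescrTab2 converts the resulting errors into warnings and reports the affected p-values and confidence intervals as NA:

descr(dat %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices=rep(1:50, 3)))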
#> Warning in test_cont(var, group, test_options, test_override, var_name): Removed
#> paired observations with missings.
#> Error converted to warning: Error in t.test.default(x, y, paired = TRUE): not enough 'x' observations
#> Error converted to warning: Error in test_cat(var, group, test_options, test_override, var_name): tmp1$idx == tmp2$idx are not all TRUE
Variables       Fish        Mammal      Total       p               CI
                (N=75)      (N=75)      (N=150)
Sepal.Length
  N             75          75          150         NA
  mean          5.8         5.8         5.8                         [NA, NA]
  sd            0.86        0.81        0.83
  median        5.7         5.8         5.8
  Q1 - Q3       5.1 – 6.4   5.1 – 6.5   5.1 – 6.4
  min - max     4.3 – 7.9   4.4 – 7.7   4.3 – 7.9
food
  fries         41 (55%)    33 (44%)    74 (49%)    NA
  wedges        34 (45%)    42 (56%)    76 (51%)                    [NA, NA]
Unknown test: Test errored, check console output.
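Why does the paired call fail? A quick base-R check shows that no index value occurs in both animal groups, so no observation has a partner. The second part sketches a hypothetical index vector that does pair the groups (rows 2k-1 and 2k share index k); whether such a pairing is meaningful for this toy dataset is a separate question.

```r
# Reproduce the grouping and the indices passed to test_options above
animal  <- rep(c("Mammal", "Fish"), 75)
indices <- rep(1:50, 3)

# Index values that occur in both groups, i.e. observations that could form pairs
shared <- intersect(indices[animal == "Mammal"], indices[animal == "Fish"])
length(shared)  # 0: every index value belongs to a single group

# Hypothetical alternative: each index appears exactly once in each group
paired_indices <- rep(1:75, each = 2)
all(table(paired_indices, animal) == 1)
```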

Significant digits

Every summary statistic in DescrTab2 is formatted by a corresponding formatting function, and you can replace any of these functions with your own. For example, to show the mean with four significant digits:

descr(dat, "Species", format_summary_stats = list(mean=function(x)formatC(x, digits = 4)) )
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (F)
  mean          5.006       5.936       6.588       5.843
  sd            0.35        0.52        0.64        0.83
  median        5           5.9         6.5         5.8
  Q1 - Q3       4.8 – 5.2   5.6 – 6.3   6.2 – 6.9   5.1 – 6.4
  min - max     4.3 – 5.8   4.9 – 7     4.9 – 7.9   4.3 – 7.9
animal
  Fish          25 (50%)    25 (50%)    25 (50%)    75 (50%)    >0.999 (chi2)
  Mammal        25 (50%)    25 (50%)    25 (50%)    75 (50%)
food
  fries         24 (48%)    23 (46%)    27 (54%)    74 (49%)    0.707 (chi2)
  wedges        26 (52%)    27 (54%)    23 (46%)    76 (51%)
F: F-test (ANOVA)
chi2: Pearson's chi-squared test
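A formatting function simply takes a number and returns a string, so candidates can be tried out on their own before passing them to descr(). Below, mean_4_sig reproduces the function used above, and mean_2_dec is a hypothetical alternative with a fixed two decimals (formatC's format = "f").

```r
# Two candidate formatting functions for format_summary_stats
mean_4_sig <- function(x) formatC(x, digits = 4)               # 4 significant digits
mean_2_dec <- function(x) formatC(x, format = "f", digits = 2)  # fixed 2 decimals

group_means <- tapply(iris$Sepal.Length, iris$Species, mean)
mean_4_sig(group_means)  # "5.006" "5.936" "6.588"
mean_2_dec(group_means)  # "5.01"  "5.94"  "6.59"
```

Either function could then be supplied as format_summary_stats = list(mean = mean_2_dec).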

Omitting summary statistics

Suppose you don’t want to calculate quantiles for your numeric variables. You can pass the summary_stats_cont option a list of all summary statistics except the quantiles:

descr(dat, "Species", summary_stats_cont = list(
  N = DescrTab2:::.N, Nmiss = DescrTab2:::.Nmiss, mean = DescrTab2:::.mean,
  sd = DescrTab2:::.sd, median = DescrTab2:::.median,
  min = DescrTab2:::.min, max = DescrTab2:::.max))
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (F)
  mean          5           5.9         6.6         5.8
  sd            0.35        0.52        0.64        0.83
  median        5           5.9         6.5         5.8
  min - max     4.3 – 5.8   4.9 – 7     4.9 – 7.9   4.3 – 7.9
animal
  Fish          25 (50%)    25 (50%)    25 (50%)    75 (50%)    >0.999 (chi2)
  Mammal        25 (50%)    25 (50%)    25 (50%)    75 (50%)
food
  fries         24 (48%)    23 (46%)    27 (54%)    74 (49%)    0.707 (chi2)
  wedges        26 (52%)    27 (54%)    23 (46%)    76 (51%)
F: F-test (ANOVA)
chi2: Pearson's chi-squared test
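summary_stats_cont is a named list of functions, each applied to the variable. The same idea can be sketched with plain base-R functions; these stand in for DescrTab2's internal .N, .mean, etc., whose exact behaviour (e.g. handling of missing values) may differ.

```r
# A named list of summary functions, analogous in spirit to summary_stats_cont
stats_no_quantiles <- list(
  N      = length,
  mean   = mean,
  sd     = sd,
  median = median,
  min    = min,
  max    = max
)

# Apply every function to the same variable; names carry over to the result
res <- vapply(stats_no_quantiles,
              function(f) as.numeric(f(iris$Sepal.Length)),
              numeric(1))
res
```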

Adding summary statistics

Suppose you have a categorical variable whose levels are numbers, and you want to calculate its mean. No problem:

# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1,2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]

descr(dat2, "Species", summary_stats_cat=list(mean=DescrTab2:::.factormean))
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
cat_var
  mean          1.6         1.4         1.5         1.5         0.487 (chi2)
  1             22 (44%)    28 (56%)    25 (50%)    75 (50%)
  2             28 (56%)    22 (44%)    25 (50%)    75 (50%)
chi2: Pearson's chi-squared test
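Taking the mean of a factor needs care, because as.numeric() on a factor returns the internal level codes, not the values in the labels. The helper below is hypothetical and shows the usual conversion via the labels; DescrTab2's internal .factormean is assumed, not confirmed, to do something equivalent.

```r
# Hypothetical helper: mean of a factor whose levels are numbers
factor_mean <- function(f) mean(as.numeric(as.character(f)))

f <- factor(c(10, 20, 20))
as.numeric(f)   # 1 2 2  (level codes, not values)
factor_mean(f)  # 16.66667
```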

Combining mean and sd

Use the format_options = list(combine_mean_sd=TRUE) option:

descr(dat, "Species", format_options = list(combine_mean_sd=TRUE))
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (F)
  mean ± sd     5 ± 0.35    5.9 ± 0.52  6.6 ± 0.64  5.8 ± 0.83
  median        5           5.9         6.5         5.8
  Q1 - Q3       4.8 – 5.2   5.6 – 6.3   6.2 – 6.9   5.1 – 6.4
  min - max     4.3 – 5.8   4.9 – 7     4.9 – 7.9   4.3 – 7.9
animal
  Fish          25 (50%)    25 (50%)    25 (50%)    75 (50%)    >0.999 (chi2)
  Mammal        25 (50%)    25 (50%)    25 (50%)    75 (50%)
food
  fries         24 (48%)    23 (46%)    27 (54%)    74 (49%)    0.707 (chi2)
  wedges        26 (52%)    27 (54%)    23 (46%)    76 (51%)
F: F-test (ANOVA)
chi2: Pearson's chi-squared test
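The combined cell is just the formatted mean and sd joined by "±". A base-R sketch of that string follows; rounding to two significant digits is an assumption that happens to reproduce the values in the table above, not a documented DescrTab2 default.

```r
# Join mean and sd into one "mean ± sd" string (2 significant digits each)
mean_sd <- function(v) {
  paste0(formatC(mean(v), digits = 2), " \u00b1 ", formatC(sd(v), digits = 2))
}

sapply(split(iris$Sepal.Length, iris$Species), mean_sd)
```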

Omitting p values

Set format_options = list(print_p = FALSE) to omit p-values:

descr(dat, "animal", format_options = list(print_p = FALSE))
Variables       Fish        Mammal      Total       CI
                (N=75)      (N=75)      (N=150)
Species
  setosa        25 (33%)    25 (33%)    50 (33%)
  versicolor    25 (33%)    25 (33%)    50 (33%)
  virginica     25 (33%)    25 (33%)    50 (33%)
Sepal.Length
  N             75          75          150         Mean dif. CI
  mean          5.8         5.8         5.8         [-0.26, 0.27]
  sd            0.86        0.81        0.83
  median        5.7         5.8         5.8
  Q1 - Q3       5.1 – 6.4   5.1 – 6.5   5.1 – 6.4
  min - max     4.3 – 7.9   4.4 – 7.7   4.3 – 7.9
food
  fries         41 (55%)    33 (44%)    74 (49%)    Prop. dif. CI
  wedges        34 (45%)    42 (56%)    76 (51%)    [-0.066, 0.28]

Similarly, set print_CI = FALSE to omit confidence intervals:

descr(dat, "animal", format_options = list(print_CI = FALSE))
Variables       Fish        Mammal      Total       p
                (N=75)      (N=75)      (N=150)
Species
  setosa        25 (33%)    25 (33%)    50 (33%)    >0.999 (chi2)
  versicolor    25 (33%)    25 (33%)    50 (33%)
  virginica     25 (33%)    25 (33%)    50 (33%)
Sepal.Length
  N             75          75          150         0.961 (tt2)
  mean          5.8         5.8         5.8
  sd            0.86        0.81        0.83
  median        5.7         5.8         5.8
  Q1 - Q3       5.1 – 6.4   5.1 – 6.5   5.1 – 6.4
  min - max     4.3 – 7.9   4.4 – 7.7   4.3 – 7.9
food
  fries         41 (55%)    33 (44%)    74 (49%)    0.253 (chi2)
  wedges        34 (45%)    42 (56%)    76 (51%)
chi2: Pearson's chi-squared test
tt2: Welch's two-sample t-test

Printing without results='asis'

Sometimes the results='asis' option must not be set, e.g. if a loop inside your R chunk plots graphics in between descriptive tables. You can still use DescrTab2 with the following commands:

```{r}
capture.output(print(descr(dat, "Species"))) %>%  knitr::raw_html() # or knitr::raw_tex() for tex
```
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (F)
  mean          5           5.9         6.6         5.8
  sd            0.35        0.52        0.64        0.83
  median        5           5.9         6.5         5.8
  Q1 - Q3       4.8 -- 5.2  5.6 -- 6.3  6.2 -- 6.9  5.1 -- 6.4
  min - max     4.3 -- 5.8  4.9 -- 7    4.9 -- 7.9  4.3 -- 7.9
animal
  Fish          25 (50%)    25 (50%)    25 (50%)    75 (50%)    >0.999 (chi2)
  Mammal        25 (50%)    25 (50%)    25 (50%)    75 (50%)
food
  fries         24 (48%)    23 (46%)    27 (54%)    74 (49%)    0.707 (chi2)
  wedges        26 (52%)    27 (54%)    23 (46%)    76 (51%)
F: F-test (ANOVA)
chi2: Pearson's chi-squared test

For Word documents this is irrelevant, because results='asis' is never needed there.

Controlling options on a per-variable level

You can use the var_options list to control formatting and test options on a per-variable basis. Suppose that, in the iris dataset, we want only the Sepal.Length variable to show more digits for the mean and to use a nonparametric test:

descr(iris, "Species", var_options = list(Sepal.Length = list(
  format_summary_stats = list(
    mean = function(x)
      formatC(x, digits = 4)
  ),
  test_options = c(nonparametric = TRUE)
)))
Variables       setosa      versicolor  virginica   Total       p
                (N=50)      (N=50)      (N=50)      (N=150)
Sepal.Length
  N             50          50          50          150         <0.001 (KW)
  mean          5.006       5.936       6.588       5.843
  sd            0.35        0.52        0.64        0.83
  median        5           5.9         6.5         5.8
  Q1 - Q3       4.8 – 5.2   5.6 – 6.3   6.2 – 6.9   5.1 – 6.4
  min - max     4.3 – 5.8   4.9 – 7     4.9 – 7.9   4.3 – 7.9
Sepal.Width
  N             50          50          50          150         <0.001 (F)
  mean          3.4         2.8         3           3.1
  sd            0.38        0.31        0.32        0.44
  median        3.4         2.8         3           3
  Q1 - Q3       3.2 – 3.7   2.5 – 3     2.8 – 3.2   2.8 – 3.3
  min - max     2.3 – 4.4   2 – 3.4     2.2 – 3.8   2 – 4.4
Petal.Length
  N             50          50          50          150         <0.001 (F)
  mean          1.5         4.3         5.6         3.8
  sd            0.17        0.47        0.55        1.8
  median        1.5         4.3         5.5         4.3
  Q1 - Q3       1.4 – 1.6   4 – 4.6     5.1 – 5.9   1.6 – 5.1
  min - max     1 – 1.9     3 – 5.1     4.5 – 6.9   1 – 6.9
Petal.Width
  N             50          50          50          150         <0.001 (F)
  mean          0.25        1.3         2           1.2
  sd            0.11        0.2         0.27        0.76
  median        0.2         1.3         2           1.3
  Q1 - Q3       0.2 – 0.3   1.2 – 1.5   1.8 – 2.3   0.3 – 1.8
  min - max     0.1 – 0.6   1 – 1.8     1.4 – 2.5   0.1 – 2.5
KW: Kruskal-Wallis one-way ANOVA
F: F-test (ANOVA)
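Because var_options is an ordinary named list keyed by variable name, the same settings can be generated for several variables at once instead of being typed out repeatedly. The sketch below only builds the list; mean_4_sig and the choice of variables are illustrative assumptions.

```r
# Hypothetical helper: 4 significant digits for the mean
mean_4_sig <- function(x) formatC(x, digits = 4)

# Apply the same per-variable options to several variables at once
vars <- c("Sepal.Length", "Petal.Length")
var_opts <- setNames(
  lapply(vars, function(v) list(
    format_summary_stats = list(mean = mean_4_sig),
    test_options = list(nonparametric = TRUE)
  )),
  vars
)

str(var_opts, max.level = 2)
```

The result could then be passed as descr(iris, "Species", var_options = var_opts).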