Tutorial: fmt_table1

Daniel D. Sjoberg

Last Updated: November 29, 2018

Introduction

This vignette walks through the fmt_table1() function and the functions available to modify and add to an existing Table 1.

To start, a quick note on the magrittr package’s pipe operator, %>%. By default, the pipe places whatever is on its left-hand side into the first argument of the function on its right-hand side. The pipe can make code involving fmt_table1() easier to read, but it is not required. Here are a few examples of how %>% translates into typical R notation.

x %>% f() is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
y %>% f(x, .) is equivalent to f(x, y)
z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z)
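These equivalences can be checked directly. The sketch below assumes the magrittr package is installed; f() is a hypothetical toy function defined only for illustration.

```r
library(magrittr)

# Hypothetical toy function, used only to make the equivalences checkable
f <- function(a, b = 0, arg = 0) a - b + 10 * arg

x <- 5
y <- 2
z <- 1

# Each comparison evaluates to TRUE
identical(x %>% f(), f(x))
identical(x %>% f(y), f(x, y))
identical(y %>% f(x, .), f(x, y))
identical(z %>% f(x, y, arg = .), f(x, y, arg = z))
```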

Here’s how this translates into the use of fmt_table1().

mtcars %>% fmt_table1() is equivalent to fmt_table1(mtcars)
mtcars %>% fmt_table1(by = "am") is equivalent to fmt_table1(mtcars, by = "am")
fmt_table1(mtcars, by = "am") %>% add_comparison() is equivalent to
    t = fmt_table1(mtcars, by = "am")
    add_comparison(t)

Basic Usage

We’ll be using the trial data set throughout this example. This data set contains data from 200 patients randomized to a new adjuvant therapy or placebo. The outcome is a binary tumor response. Each variable in the data frame has been assigned a label attribute (e.g. attr(trial$trt, "label") = "Treatment Randomization"). These labels are displayed in the output table by default. For a data frame without labels, the variable names are printed instead.

trt      Treatment Randomization
age      Age, yrs
marker   Marker Level, ng/mL
stage    T Stage
grade    Grade
response Tumor Response
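As a minimal sketch of how such labels are stored and read, the example below attaches label attributes to a small toy data frame in base R and falls back to the variable name when no label is set. The toy data frame and the get_label() helper are hypothetical and are not part of gtsummary.

```r
# Toy data frame (two rows copied from the trial data shown in this vignette)
toy <- data.frame(
  trt = c("Drug", "Placebo"),
  age = c(23, 46),
  marker = c(0.160, 2.067)
)

# Attach labels to two of the three columns
attr(toy$trt, "label") <- "Treatment Randomization"
attr(toy$age, "label") <- "Age, yrs"

# Hypothetical helper: return the label if present, else the variable name
get_label <- function(data, variable) {
  label <- attr(data[[variable]], "label")
  if (is.null(label)) variable else label
}

labels <- vapply(names(toy), function(v) get_label(toy, v), character(1))
labels
```

The unlabeled marker column falls back to its own name, which mirrors the behavior described above for data frames without labels.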
library(dplyr)
library(knitr)
library(kableExtra)
library(gtsummary)

# printing trial data
head(trial) %>% kable()
trt      age  marker  stage  grade  response
Drug     23   0.160   T3     I      1
Drug     9    1.107   T4     III    1
Drug     31   0.277   T1     I      1
Placebo  46   2.067   T4     II     1
Drug     51   2.767   T2     II     0
Drug     39   0.613   T1     III    1

The default output from fmt_table1() is meant to be publication ready. Let’s start by creating a descriptive statistics table from the trial data set built into the gtsummary package. The fmt_table1() function requires only a data set as input and returns descriptive statistics for each column in the data frame.

For brevity, we keep only a subset of the variables in the trial data set.

trial2 =
  trial %>%
  select(trt, marker, stage)

fmt_table1(trial2)
Variable                   N = 200
Treatment Randomization
  Drug                     107 (54%)
  Placebo                  93 (46%)
Marker Level, ng/mL        0.68 (0.22, 1.42)
  Unknown                  8
T Stage
  T1                       51 (26%)
  T2                       49 (24%)
  T3                       42 (21%)
  T4                       58 (29%)
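The n (%) cells for a categorical variable can be reproduced by hand. The sketch below uses the T Stage counts from the table above in base R; it illustrates the arithmetic only and is not a description of how fmt_table1() is implemented.

```r
# T Stage counts taken from the table above
counts <- c(T1 = 51, T2 = 49, T3 = 42, T4 = 58)

# Total number of patients
N <- sum(counts)

# Percentages rounded to whole numbers
percent <- round(100 * counts / N)

# Assemble the "n (p%)" cells
cells <- paste0(counts, " (", percent, "%)")
names(cells) <- names(counts)
cells
```

Note that R's round() rounds exact halves to the even digit, so 24.5 becomes 24 and 25.5 becomes 26, which agrees with the percentages shown for T2 and T1.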

If your output does not appear in a formatted table, it is likely due to a known issue in the knitr::kable() function. One way around the issue is to add styling from the kableExtra package.
fmt_table1(trial2) %>% as_tibble() %>% knitr::kable() %>% kableExtra::kable_styling()

This is a great table, but for trial data the summary statistics should be split by randomization group. While reporting p-values for a randomized trial isn’t recommended, we’ll do it here as an illustration. To compare two or more groups, append add_comparison() to the function call.

fmt_table1(trial2, by = "trt") %>% add_comparison()
Variable               Drug               Placebo            p-value
                       N = 107            N = 93
Marker Level, ng/mL    0.61 (0.22, 1.20)  0.72 (0.22, 1.63)  0.4
  Unknown              4                  4
T Stage                                                      0.13
  T1                   25 (23%)           26 (28%)
  T2                   26 (24%)           23 (25%)
  T3                   29 (27%)           13 (14%)
  T4                   27 (25%)           31 (33%)

Customize Table 1 Output

It’s also possible to customize and add information to fmt_table1() output. The code below builds the standard table with summary statistics split by treatment randomization and applies several modifications, each described in a code comment.

trial2 %>%
  # build base table 1
  fmt_table1(
    by = "trt",
    # change variable labels
    label = list(
      marker = "Pretreatment Marker Level, ng/mL",
      stage = "Clinical T Stage"
      ),
    # change statistics printed in table
    statistic = list(
      continuous = "{mean} ({sd})",
      categorical = "{n} / {N} ({p}%)"
    ),
    missing = "no"
  ) %>%
  # add p-values to table, perform t-test for the marker,
  # and round large p-values to two decimal places
  add_comparison(
    test = list(marker = "t.test"),
    pvalue_fun = function(x) fmt_pvalue(x, digits = 2)
  ) %>%
  # add q-values (p-values adjusted for multiple testing)
  add_q(pvalue_fun = function(x) fmt_pvalue(x, digits = 2)) %>%
  # add overall column
  add_overall() %>%
  # add column with N
  add_n() %>%
  # add statistic labels
  add_stat_label() %>%
  # bold variable labels, italicize levels
  bold_labels() %>%
  italicize_levels() %>%
  # bold p-values under a given threshold (default 0.05)
  bold_p(t = 0.2) %>%
  # include percent in headers
  modify_header(
    stat_by = c("{level}", "N = {n} ({p}%)"),
    stat_overall = c("All Patients", "N = {N} (100%)")
  )
Variable                          Statistic  N    All Patients    Drug            Placebo        p-value  q-value
                                                  N = 200 (100%)  N = 107 (54%)   N = 93 (46%)
Pretreatment Marker Level, ng/mL  Mean (SD)  192  0.93 (0.85)     0.90 (0.88)     0.97 (0.83)    0.58     0.58
Clinical T Stage                             200                                                 0.13     0.26
  T1                              n / N (%)       51 / 200 (26%)  25 / 107 (23%)  26 / 93 (28%)
  T2                              n / N (%)       49 / 200 (24%)  26 / 107 (24%)  23 / 93 (25%)
  T3                              n / N (%)       42 / 200 (21%)  29 / 107 (27%)  13 / 93 (14%)
  T4                              n / N (%)       58 / 200 (29%)  27 / 107 (25%)  31 / 93 (33%)
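The statistic argument takes templates in which names in curly braces stand for computed statistics. As a rough sketch of the idea, the hypothetical fill_template() helper below substitutes values into such a template using base R; it is written for illustration only, and gtsummary's own mechanism may differ.

```r
# Hypothetical helper: replace each {name} in the template with its value
fill_template <- function(template, values) {
  for (nm in names(values)) {
    template <- gsub(
      paste0("{", nm, "}"),
      as.character(values[[nm]]),
      template,
      fixed = TRUE
    )
  }
  template
}

# Overall T4 counts taken from the table above
n <- 58
N <- 200

cell <- fill_template(
  "{n} / {N} ({p}%)",
  list(n = n, N = N, p = round(100 * n / N))
)
cell
```

The result matches the All Patients cell for T4 in the table above.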

Each of the modification functions has additional options outlined in its help file.
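The q-values in the table above are p-values adjusted for multiple testing. This text does not state which adjustment add_q() applies, so the sketch below rests on an assumption: it uses base R's p.adjust() with the Benjamini-Hochberg ("BH") method on the two rounded p-values from the table.

```r
# Rounded p-values as displayed in the table above
p_values <- c(marker = 0.58, stage = 0.13)

# Assumption: Benjamini-Hochberg adjustment (not confirmed by this vignette)
q_values <- p.adjust(p_values, method = "BH")
q_values
```

Under this assumption the adjusted values agree with the q-value column shown above. With only two p-values, method = "holm" produces the same numbers, so the table alone cannot distinguish between the two methods.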

Report Results Inline

Having a well-formatted and reproducible table is great! But we often need to report the results from a table in the text of an R Markdown report. Inline reporting has been made simple with inline_text().

Let’s first create a basic Table 1.

tab1 = fmt_table1(trial2, by = "trt")
tab1
Variable               Drug               Placebo
                       N = 107            N = 93
Marker Level, ng/mL    0.61 (0.22, 1.20)  0.72 (0.22, 1.63)
  Unknown              4                  4
T Stage
  T1                   25 (23%)           26 (28%)
  T2                   26 (24%)           23 (25%)
  T3                   29 (27%)           13 (14%)
  T4                   27 (25%)           31 (33%)

To report the median (IQR) of the marker levels in each group, use the following commands inline.

The median (IQR) marker levels in the drug and placebo groups are `r inline_text(tab1, cell = "marker:Drug")` and `r inline_text(tab1, cell = "marker:Placebo")`, respectively.

Here’s how the line will appear in your report.

The median (IQR) marker levels in the drug and placebo groups are 0.61 (0.22, 1.20) and 0.72 (0.22, 1.63), respectively.

The cell argument tells inline_text() which statistic to display. The components identifying the statistic are separated by ":". The first term is the variable name and the last is the level of the by variable, e.g. marker:Placebo displays the summary statistics for the variable marker among patients in the Placebo group. To display a statistic from a categorical variable, include the desired level after the variable name, e.g. stage:T1:Drug.

`r inline_text(tab1, "stage:T1:Drug")` resolves to “25 (23%)”
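To make the structure of the cell argument concrete, the sketch below splits a cell string on ":" in base R. parse_cell() is a hypothetical helper written for illustration; it is not a gtsummary function and does not reflect how inline_text() is implemented.

```r
# Hypothetical helper: break a cell string into its components
parse_cell <- function(cell) {
  parts <- strsplit(cell, ":", fixed = TRUE)[[1]]
  list(
    # first term: the variable name
    variable = parts[1],
    # middle term (categorical variables only): the level of the variable
    level = if (length(parts) == 3) parts[2] else NA_character_,
    # last term: the level of the by variable
    by_level = parts[length(parts)]
  )
}

continuous_cell <- parse_cell("marker:Placebo")
categorical_cell <- parse_cell("stage:T1:Drug")

str(continuous_cell)
str(categorical_cell)
```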

gtsummary + kableExtra

Need a data frame for any reason (e.g. if you want to get extra fancy with kableExtra)? Use the generic function as_tibble() to extract an easy-to-use data frame from any fmt_table1 object.

#get data frame from fmt_table1 object
tab1_df <- as_tibble(tab1)

If you want to customize anything with knitr::kable or kableExtra, you can use as_tibble() as above along with the function indent_key(), which extracts the row numbers to indent when knitting your table to HTML. (NOTE: Only load library(kableExtra) and use the code below if knitting to HTML; this will not work with Word or PDF.) For more on customizing your tables with kableExtra, check out the package’s vignette on HTML output.

# knit pretty table
tab1 %>%
  bold_labels() %>% # bold labels in here if you want
  as_tibble() %>%
  kable(
    row.names = FALSE,
    caption = "Table 1: Summary of Patient and Clinical Variables"
  ) %>%
  # Below, using kableExtra functions to do things like change table style, add 
  # grouped column header, footnote, and indent variable categories
  kable_styling(
    bootstrap_options = c("striped", "condensed", "hover"), #popular bootstrap styles
    font_size = 16,
    full_width = FALSE
  ) %>%
  add_header_above(c(" " = 1, "Treatment assignment" = 2)) %>%
  footnote(
    general = "Isn't this footnote so nice?",
    number = c("You can also add numbered or lettered footnotes", "Which is great.")
  ) %>%
  add_indent(indent_key(tab1)) 
Table 1: Summary of Patient and Clinical Variables
                       Treatment assignment
Variable               Drug               Placebo
                       N = 107            N = 93
Marker Level, ng/mL    0.61 (0.22, 1.20)  0.72 (0.22, 1.63)
  Unknown              4                  4
T Stage
  T1                   25 (23%)           26 (28%)
  T2                   26 (24%)           23 (25%)
  T3                   29 (27%)           13 (14%)
  T4                   27 (25%)           31 (33%)
Note:
Isn’t this footnote so nice?
1 You can also add numbered or lettered footnotes
2 Which is great.
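The idea behind an indent key can be sketched in base R: given a column that records each row's type, collect the positions of the rows that should be indented. The toy data frame below copies the body rows of the table above, and the rule that level and missing rows are the ones indented is an assumption made for illustration, not a description of indent_key() itself.

```r
# Toy copy of the table body; row types are assigned by hand
body_rows <- data.frame(
  row_type = c("label", "missing", "label", "level", "level", "level", "level"),
  label = c("Marker Level, ng/mL", "Unknown", "T Stage", "T1", "T2", "T3", "T4")
)

# Assumed rule: indent the rows that sit beneath a variable label
rows_to_indent <- which(body_rows$row_type %in% c("level", "missing"))
rows_to_indent
```

A vector of row numbers like this is the kind of input passed to add_indent() in the example above.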

Under the Hood

When you print the output from the fmt_table1() function to the R console or in an R Markdown document, default printing functions are called in the background: print.fmt_table1() and knit_print.fmt_table1(). The true output from fmt_table1() is a named list, but when you print to the R console, only the interesting portions are displayed, taken from the .$table1 data frame.

t = fmt_table1(trial2, by = "trt") %>% add_comparison()
ls(t)
#> [1] "by"        "call"      "call_list" "inputs"    "meta_data" "table1"
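The relationship between the stored list and what gets printed can be sketched with a toy S3 class. Everything below is hypothetical (the toy_table1 class and its print method are made up for illustration) and shows only the general pattern of a print method that displays one element of a named list.

```r
# A named list with a class attribute, loosely shaped like the output above
toy <- structure(
  list(
    table1 = data.frame(
      label = c("Marker Level, ng/mL", "Unknown"),
      stat = c("0.61 (0.22, 1.20)", "4")
    ),
    by = "trt",
    meta_data = data.frame(variable = "marker", summary_type = "continuous")
  ),
  class = "toy_table1"
)

# Print method: show only the table1 element, return the object invisibly
print.toy_table1 <- function(x, ...) {
  print(x$table1, row.names = FALSE)
  invisible(x)
}

# Printing dispatches to print.toy_table1() and shows only the table
print(toy)

# The other elements remain available in the list
names(toy)
toy$meta_data
```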

There is additional information stored in the fmt_table1() output list.

It is particularly useful to access .$meta_data to confirm which statistical tests were used to calculate the p-values in the table.

print.listof(t)
#> table1 :
#> # A tibble: 9 x 6
#>   .variable row_type label            stat_by1        stat_by2       pvalue
#>   <chr>     <chr>    <chr>            <chr>           <chr>          <chr> 
#> 1 <NA>      header2  Variable         Drug            Placebo        p-val~
#> 2 <NA>      header1  ""               N = 107         N = 93         ""    
#> 3 marker    label    Marker Level, n~ 0.61 (0.22, 1.~ 0.72 (0.22, 1~ 0.4   
#> 4 marker    missing  Unknown          4               4              <NA>  
#> 5 stage     label    T Stage          <NA>            <NA>           0.13  
#> 6 stage     level    T1               25 (23%)        26 (28%)       <NA>  
#> 7 stage     level    T2               26 (24%)        23 (25%)       <NA>  
#> 8 stage     level    T3               29 (27%)        13 (14%)       <NA>  
#> 9 stage     level    T4               27 (25%)        31 (33%)       <NA>  
#> 
#> by :
#> [1] "trt"
#> 
#> meta_data :
#> # A tibble: 2 x 10
#>   .variable .class .summary_type .dichotomous_va~ .var_label .stat_display
#>   <chr>     <chr>  <chr>         <list>           <chr>      <chr>        
#> 1 marker    numer~ continuous    <NULL>           Marker Le~ {median} ({q~
#> 2 stage     factor categorical   <NULL>           T Stage    {n} ({p}%)   
#> # ... with 4 more variables: .digits <dbl>, stat_test <chr>,
#> #   pvalue_exact <dbl>, pvalue <chr>
#> 
#> call :
#> fmt_table1(trial2, by = "trt")
#> 
#> inputs :
#> $data
#> # A tibble: 200 x 3
#>    trt     marker stage
#>    <chr>    <dbl> <fct>
#>  1 Drug     0.16  T3   
#>  2 Drug     1.11  T4   
#>  3 Drug     0.277 T1   
#>  4 Placebo  2.07  T4   
#>  5 Drug     2.77  T2   
#>  6 Drug     0.613 T1   
#>  7 Drug     0.354 T4   
#>  8 Drug     1.74  T4   
#>  9 Drug     0.144 T4   
#> 10 Placebo  0.205 T2   
#> # ... with 190 more rows
#> 
#> $by
#> [1] "trt"
#> 
#> $label
#> NULL
#> 
#> $type
#> NULL
#> 
#> $statistic
#> NULL
#> 
#> $digits
#> NULL
#> 
#> $id
#> NULL
#> 
#> $missing
#> [1] "ifany"
#> 
#> 
#> call_list :
#> $fmt_table1
#> fmt_table1(data = trial2, by = "trt")
#> 
#> $add_comparison
#> add_comparison(x = .)