# Tutorial: fmt_table1

## Introduction

This vignette walks through the `fmt_table1()` function and the functions available to modify and add to an existing Table 1.

To start, a quick note on the magrittr package’s pipe operator, `%>%`. By default, the pipe places whatever is on its left-hand side into the first argument of the function on its right-hand side. The pipe can make code involving `fmt_table1()` easier to read, but it is not required. Here are a few examples of how `%>%` translates into typical R notation.

• `x %>% f()` is equivalent to `f(x)`
• `x %>% f(y)` is equivalent to `f(x, y)`
• `y %>% f(x, .)` is equivalent to `f(x, y)`
• `z %>% f(x, y, arg = .)` is equivalent to `f(x, y, arg = z)`
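
The equivalences above can be checked directly in R. The following is a minimal sketch that assumes the magrittr package is installed; the base R functions `sqrt()`, `round()`, `rep()`, and `paste()` stand in for the generic `f()` and are chosen only for illustration.

```r
library(magrittr)

x <- c(1.234, 5.678)

# x %>% f() is f(x)
a <- identical(x %>% sqrt(), sqrt(x))

# x %>% f(y) is f(x, y)
b <- identical(x %>% round(1), round(x, 1))

# y %>% f(x, .) is f(x, y): the dot marks where the left-hand side goes
c3 <- identical(2 %>% rep("a", .), rep("a", 2))

# z %>% f(x, y, arg = .) is f(x, y, arg = z)
d <- identical("-" %>% paste("a", "b", sep = .), paste("a", "b", sep = "-"))

# Each comparison holds, so the piped and unpiped forms agree
all(a, b, c3, d)
```

When the dot appears as an argument of the right-hand call, the left-hand side is placed there rather than in the first position, which is why the last two forms work.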

Here’s how this translates into the use of fmt_table1().

• `mtcars %>% fmt_table1()` is equivalent to `fmt_table1(mtcars)`
• `mtcars %>% fmt_table1(by = "am")` is equivalent to `fmt_table1(mtcars, by = "am")`
• `fmt_table1(mtcars, by = "am") %>% add_comparison()` is equivalent to `t = fmt_table1(mtcars, by = "am")` followed by `add_comparison(t)`

We’ll use the `trial` data set throughout this example. It contains data from 200 patients randomized to a new adjuvant therapy or placebo. The outcome is a binary tumor response. Each variable in the data frame has been assigned a label attribute (e.g. `attr(trial$trt, "label") = "Treatment Randomization"`). These labels are displayed in the output table by default. A data frame without labels will print variable names.

| Variable | Label |
|---|---|
| trt | Treatment Randomization |
| age | Age, yrs |
| marker | Marker Level, ng/mL |
| stage | T Stage |
| grade | Grade |
| response | Tumor Response |

    library(dplyr)
    library(knitr)
    library(kableExtra)
    library(gtsummary)

    # printing trial data
    head(trial) %>% kable()

| trt | age | marker | stage | grade | response |
|---|---|---|---|---|---|
| Drug | 23 | 0.160 | T3 | I | 1 |
| Drug | 9 | 1.107 | T4 | III | 1 |
| Drug | 31 | 0.277 | T1 | I | 1 |
| Placebo | 46 | 2.067 | T4 | II | 1 |
| Drug | 51 | 2.767 | T2 | II | 0 |
| Drug | 39 | 0.613 | T1 | III | 1 |

The default output from `fmt_table1()` is meant to be publication ready. Let’s start by creating a descriptive statistics table from the `trial` data set built into the gtsummary package. At a minimum, `fmt_table1()` takes a data set as its only input and returns descriptive statistics for each column in the data frame.

For brevity, we keep only a subset of the variables in the `trial` data set.

    trial2 = trial %>% select(trt, marker, stage)
    fmt_table1(trial2)

| Variable | N = 200 |
|---|---|
| Treatment Randomization | |
| Drug | 107 (54%) |
| Placebo | 93 (46%) |
| Marker Level, ng/mL | 0.68 (0.22, 1.42) |
| Unknown | 8 |
| T Stage | |
| T1 | 51 (26%) |
| T2 | 49 (24%) |
| T3 | 42 (21%) |
| T4 | 58 (29%) |

If your output does not appear in a formatted table, it is likely due to a known issue in the `knitr::kable()` function. One way around the issue is to add styling from the kableExtra package.

    fmt_table1(trial2) %>%
      as_tibble() %>%
      knitr::kable() %>%
      kableExtra::kable_styling()

This is a great table, but for trial data the summary statistics should be split by randomization group. While reporting p-values for a randomized trial isn’t recommended, we’ll do it here as an illustration.
To compare two or more groups, add `add_comparison()` to the function call.

    fmt_table1(trial2, by = "trt") %>% add_comparison()

| Variable | Drug | Placebo | p-value |
|---|---|---|---|
| | N = 107 | N = 93 | |
| Marker Level, ng/mL | 0.61 (0.22, 1.20) | 0.72 (0.22, 1.63) | 0.4 |
| Unknown | 4 | 4 | |
| T Stage | | | 0.13 |
| T1 | 25 (23%) | 26 (28%) | |
| T2 | 26 (24%) | 23 (25%) | |
| T3 | 29 (27%) | 13 (14%) | |
| T4 | 27 (25%) | 31 (33%) | |

## Customize Table 1 Output

It’s also possible to add information to `fmt_table1()` output. The code below calculates the standard table with summary statistics split by treatment randomization, with the following modifications:

• Report ‘mean (SD)’ and ‘n / N (%)’
• Use t-test instead of Wilcoxon rank-sum
• Do not add row for number of missing observations
• Round large p-values to two decimal places
• Add column of q-values (p-values adjusted using FDR)
• Add column reporting summary statistics for the cohort overall
• Add column reporting N not missing for each variable
• Add column with statistic labels
• Modify header to include percentages in each group
• Bold variable labels
• Italicize variable levels

    trial2 %>%
      # build base table 1
      fmt_table1(
        by = "trt",
        # change variable labels
        label = list(
          marker = "Pretreatment Marker Level, ng/mL",
          stage = "Clinical T Stage"
        ),
        # change statistics printed in table
        statistic = list(
          continuous = "{mean} ({sd})",
          categorical = "{n} / {N} ({p}%)"
        ),
        missing = "no"
      ) %>%
      # add p-values to table, perform t-test for the marker,
      # and round large p-values to two decimal places
      add_comparison(
        test = list(marker = "t.test"),
        pvalue_fun = function(x) fmt_pvalue(x, digits = 2)
      ) %>%
      # add q-values (p-values adjusted for multiple testing)
      add_q(pvalue_fun = function(x) fmt_pvalue(x, digits = 2)) %>%
      # add overall column
      add_overall() %>%
      # add column with N
      add_n() %>%
      # add statistic labels
      add_stat_label() %>%
      # bold variable labels, italicize levels
      bold_labels() %>%
      italicize_levels() %>%
      # bold p-values under a given threshold (default 0.05)
      bold_p(t = 0.2) %>%
      # include percent in headers
      modify_header(
        stat_by = c("{level}", "N = {n} ({p}%)"),
        stat_overall = c("All Patients", "N = {N} (100%)")
      )

| Variable | Statistic | N | All Patients | Drug | Placebo | p-value | q-value |
|---|---|---|---|---|---|---|---|
| | | | N = 200 (100%) | N = 107 (54%) | N = 93 (46%) | | |
| Pretreatment Marker Level, ng/mL | Mean (SD) | 192 | 0.93 (0.85) | 0.90 (0.88) | 0.97 (0.83) | 0.58 | 0.58 |
| Clinical T Stage | | 200 | | | | 0.13 | 0.26 |
| T1 | n / N (%) | | 51 / 200 (26%) | 25 / 107 (23%) | 26 / 93 (28%) | | |
| T2 | n / N (%) | | 49 / 200 (24%) | 26 / 107 (24%) | 23 / 93 (25%) | | |
| T3 | n / N (%) | | 42 / 200 (21%) | 29 / 107 (27%) | 13 / 93 (14%) | | |
| T4 | n / N (%) | | 58 / 200 (29%) | 27 / 107 (25%) | 31 / 93 (33%) | | |

Each of the modification functions has additional options outlined in its help file.

## Report Results Inline

Having a well-formatted and reproducible table is great! But we often need to report the results from a table in the text of an R Markdown report. Inline reporting has been made simple with `inline_text()`.

Let’s first create a basic Table 1.

    tab1 = fmt_table1(trial2, by = "trt")
    tab1

| Variable | Drug | Placebo |
|---|---|---|
| | N = 107 | N = 93 |
| Marker Level, ng/mL | 0.61 (0.22, 1.20) | 0.72 (0.22, 1.63) |
| Unknown | 4 | 4 |
| T Stage | | |
| T1 | 25 (23%) | 26 (28%) |
| T2 | 26 (24%) | 23 (25%) |
| T3 | 29 (27%) | 13 (14%) |
| T4 | 27 (25%) | 31 (33%) |

To report the median (IQR) of the marker levels in each group, use the following commands inline.

    The median (IQR) marker levels in the drug and placebo groups are `r inline_text(tab1, cell = "marker:Drug")` and `r inline_text(tab1, cell = "marker:Placebo")`, respectively.

Here’s how the line will appear in your report.

> The median (IQR) marker levels in the drug and placebo groups are 0.61 (0.22, 1.20) and 0.72 (0.22, 1.63), respectively.

The `cell` argument tells `inline_text()` which statistic to display. The parts of the specification are separated by `:`. The first term is the variable name and the last is the level of the `by` variable; e.g., `marker:Placebo` displays the summary statistics for the variable `marker` among patients in the Placebo group.
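
To make the colon-separated convention concrete, here is a rough base R sketch of how such a string breaks into its parts. It is not the package’s own parsing code: `parse_cell()` is a hypothetical helper written only for this illustration, and it assumes the first term is the variable, the last term is the `by` level, and an optional middle term names a level of a categorical variable.

```r
# Hypothetical helper for illustration only; not part of the package.
parse_cell <- function(cell) {
  # Split on the literal ":" separator
  parts <- strsplit(cell, ":", fixed = TRUE)[[1]]
  list(
    variable = parts[1],
    # A middle term is present only in three-part specifications
    level = if (length(parts) == 3) parts[2] else NA_character_,
    by_level = parts[length(parts)]
  )
}

str(parse_cell("marker:Placebo"))
str(parse_cell("stage:T1:Drug"))
```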
To display a statistic from a categorical variable, include the desired level after the variable name, e.g. `stage:T1:Drug`.

`` `r inline_text(tab1, "stage:T1:Drug")` `` resolves to “25 (23%)”.

## gtsummary + kableExtra

Need a data frame for any reason (e.g. if you want to get extra fancy with kableExtra)? Use the generic function `as_tibble()` to extract an easy-to-use data frame from any `fmt_table1` object.

    # get data frame from fmt_table1 object
    tab1_df <- as_tibble(tab1)

If you want to customize anything with `knitr::kable` or kableExtra, you can use `as_tibble()` as above together with the function `indent_key()`, which extracts the row numbers you want indented when knitting your table to HTML. (NOTE: Only load `library(kableExtra)` and use the code below if knitting to HTML; this will not work with Word or PDF.) For more on customizing your tables with kableExtra, check out the package’s vignette on HTML output.

    # knit pretty table
    tab1 %>%
      bold_labels() %>% # bold labels in here if you want
      as_tibble() %>%
      kable(
        row.names = FALSE,
        caption = "Table 1: Summary of Patient and Clinical Variables"
      ) %>%
      # Below, using kableExtra functions to do things like change table style, add
      # grouped column header, footnote, and indent variable categories
      kable_styling(
        bootstrap_options = c("striped", "condensed", "hover"), # popular bootstrap styles
        font_size = 16,
        full_width = FALSE
      ) %>%
      add_header_above(c(" " = 1, "Treatment assignment" = 2)) %>%
      footnote(
        general = "Isn't this footnote so nice?",
        number = c("You can also add numbered or lettered footnotes", "Which is great.")
      ) %>%
      add_indent(indent_key(tab1))

Table 1: Summary of Patient and Clinical Variables

| | Treatment assignment | |
|---|---|---|
| Variable | Drug | Placebo |
| | N = 107 | N = 93 |
| Marker Level, ng/mL | 0.61 (0.22, 1.20) | 0.72 (0.22, 1.63) |
| Unknown | 4 | 4 |
| T Stage | | |
| T1 | 25 (23%) | 26 (28%) |
| T2 | 26 (24%) | 23 (25%) |
| T3 | 29 (27%) | 13 (14%) |
| T4 | 27 (25%) | 31 (33%) |

Note: Isn’t this footnote so nice?

1 You can also add numbered or lettered footnotes

2 Which is great.
## Under the Hood

When you print the output from the `fmt_table1()` function to the R console or in an R Markdown document, default printing functions are called in the background: `print.fmt_table1()` and `knit_print.fmt_table1()`. The true output from `fmt_table1()` is a named list, but when you print to the R console only the interesting portions, taken from the `.$table1` data frame, are displayed.

    t = fmt_table1(trial2, by = "trt") %>% add_comparison()
    ls(t)
    #> [1] "by"        "call"      "call_list" "inputs"    "meta_data" "table1"

There is additional information stored in the fmt_table1() output list.

• `table1`: data frame with summary statistics
• `meta_data`: data frame with one row per variable, containing information about each variable in the object
• `by`: the `by =` variable name from the function call
• `call`: the `fmt_table1` function call
• `call_list`: named list of each function called on the `fmt_table1` object. The example above would have two elements in the list: `fmt_table1` and `add_comparison`.
• `inputs`: inputs from the function call. Not only is the call stored, but the values of the inputs as well. For example, you can access the data frame passed to `fmt_table1()`.
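
Because the object is a named list, its elements can be pulled out with the usual list tools. The sketch below uses a hand-built stand-in list (`obj`, whose contents are placeholders assumed for illustration) rather than a real `fmt_table1()` result, so that it runs without the package.

```r
# Stand-in for a fmt_table1 object: a plain named list that reuses a few of
# the element names listed above. The contents are placeholders.
obj <- list(
  table1 = data.frame(label = c("Marker Level, ng/mL", "T Stage")),
  meta_data = data.frame(.variable = c("marker", "stage")),
  by = "trt"
)

names(obj)                    # element names in the order they are stored
ls(obj)                       # the same names, sorted alphabetically
obj$by                        # extract one element with $
obj[["meta_data"]]$.variable  # drill into a data frame stored in the list
```

The same `$` and `[[` access patterns apply to any named list, which is why `ls()` can be used to list the element names of the object.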

It is particularly useful to access `.$meta_data` to confirm which statistical tests were used to calculate the p-values in the table.

    print.listof(t)
    #> table1 :
    #> # A tibble: 9 x 6
    #>   .variable row_type label            stat_by1        stat_by2       pvalue
    #>   <chr>     <chr>    <chr>            <chr>           <chr>          <chr>
    #> 1 <NA>      header2  Variable         Drug            Placebo        p-val~
    #> 2 <NA>      header1  ""               N = 107         N = 93         ""
    #> 3 marker    label    Marker Level, n~ 0.61 (0.22, 1.~ 0.72 (0.22, 1~ 0.4
    #> 4 marker    missing  Unknown          4               4              <NA>
    #> 5 stage     label    T Stage          <NA>            <NA>           0.13
    #> 6 stage     level    T1               25 (23%)        26 (28%)       <NA>
    #> 7 stage     level    T2               26 (24%)        23 (25%)       <NA>
    #> 8 stage     level    T3               29 (27%)        13 (14%)       <NA>
    #> 9 stage     level    T4               27 (25%)        31 (33%)       <NA>
    #>
    #> by :
    #> [1] "trt"
    #>
    #> meta_data :
    #> # A tibble: 2 x 10
    #>   .variable .class .summary_type .dichotomous_va~ .var_label .stat_display
    #>   <chr>     <chr>  <chr>         <list>           <chr>      <chr>
    #> 1 marker    numer~ continuous    <NULL>           Marker Le~ {median} ({q~
    #> 2 stage     factor categorical   <NULL>           T Stage    {n} ({p}%)
    #> # ... with 4 more variables: .digits <dbl>, stat_test <chr>,
    #> #   pvalue_exact <dbl>, pvalue <chr>
    #>
    #> call :
    #> fmt_table1(trial2, by = "trt")
    #>
    #> inputs :
    #> $data
    #> # A tibble: 200 x 3
    #>    trt     marker stage
    #>    <chr>    <dbl> <fct>
    #>  1 Drug     0.16  T3
    #>  2 Drug     1.11  T4
    #>  3 Drug     0.277 T1
    #>  4 Placebo  2.07  T4
    #>  5 Drug     2.77  T2
    #>  6 Drug     0.613 T1
    #>  7 Drug     0.354 T4
    #>  8 Drug     1.74  T4
    #>  9 Drug     0.144 T4
    #> 10 Placebo  0.205 T2
    #> # ... with 190 more rows
    #>
    #> $by
    #> [1] "trt"
    #>
    #> $label
    #> NULL
    #>
    #> $type
    #> NULL
    #>
    #> $statistic
    #> NULL
    #>
    #> $digits
    #> NULL
    #>
    #> $id
    #> NULL
    #>
    #> $missing
    #> [1] "ifany"
    #>
    #>
    #> call_list :
    #> $fmt_table1
    #> fmt_table1(data = trial2, by = "trt")
    #>
    #> add_comparison(x = .)