Tutorial: fmt_regression

Daniel D. Sjoberg

Last Updated: November 29, 2018

Introduction

This vignette will walk a reader through the fmt_regression() function, and the various functions available to modify and make additions to an existing formatted regression table. fmt_regression() uses broom::tidy() to perform the initial model formatting, and can accommodate many different model types (e.g. lm(), glm(), coxph(), survreg() and more).

To start, a quick note on the magrittr package’s pipe function, %>%. By default the pipe operator puts whatever is on the left hand side of %>% into the first argument of the function on the right hand side. The pipe function can be used to make the code relating to fmt_regression() easier to use, but it is not required. Here are a few examples of how %>% translates into typical R notation.

x %>% f() is equivalent to f(x)
x %>% f(y) is equivalent to f(x, y)
y %>% f(x, .) is equivalent to f(x, y)
z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z)

Basic Usage

We will use the trial data set throughout this example. This set contains data from 200 patients randomized to a new adjuvant treatment or placebo. The outcome is a binary tumor response. Each variable in the data frame has been assigned an attribute label (i.e. attr(trial$trt, "label") == "Treatment Randomization")

trt      Treatment Randomization
age      Age, yrs
marker   Marker Level, ng/mL
stage    T Stage
grade    Grade
response Tumor Response

The default output from fmt_regression() is meant to be publication ready. Let’s start by creating a regression model table from the trial data set included in the gtsummary package. We will predict tumor response using age, stage, and grade using a logistic regression model.

# build logistic regression model
m1 = glm(response ~ age + stage + grade, trial, family = binomial(link = "logit"))

# format results into data frame
fmt_regression(m1, exponentiate = TRUE)
N = 183 OR 95% CI p-value
Age, yrs 1.00 0.98, 1.02 0.9
T Stage
T1 Ref.
T2 1.49 0.64, 3.49 0.4
T3 0.65 0.26, 1.58 0.3
T4 0.51 0.22, 1.14 0.10
Grade
I Ref.
II 0.55 0.25, 1.19 0.13
III 0.79 0.38, 1.64 0.5

Because the variables in the data set were labeled, the labels were carried through into the model regression results table. (Had the data not been labeled, the default is to display the variable name.) The model was recognized as logistic regression with coefficients exponentiated, so the header displayed “OR” for odds ratio. The correct reference group has also been added to the table.

Imagine you are not interested in the individual p-values for the stage and grade levels. You can use the add_global() function to add a global p-value for each. The add_global() function is a wrapper for cars::Anova(), which calculates LRT and Wald test statistics (a few other methods are also available depending on the type of model). The p-value presented is the default from cars::Anova() for your type of model. You can pass arguments to this function via add_global() to change the defaults.

# format results into data frame with global p-values
fmt_regression(m1, exponentiate = TRUE) %>% 
  add_global()
N = 183 OR 95% CI p-value
Age, yrs 1.00 0.98, 1.02 0.9
T Stage 0.057
T1 Ref.
T2 1.49 0.64, 3.49
T3 0.65 0.26, 1.58
T4 0.51 0.22, 1.14
Grade 0.3
I Ref.
II 0.55 0.25, 1.19
III 0.79 0.38, 1.64

Format Model Results

There are formatting options available, such as bolding and italicizing rows. In the example below, variable labels are bolded, and the levels of categorical levels are italicized.

# format results into data frame with global p-values
fmt_regression(m1, exponentiate = TRUE) %>% 
  bold_labels() %>% 
  italicize_levels()
N = 183 OR 95% CI p-value
Age, yrs 1.00 0.98, 1.02 0.9
T Stage
T1 Ref.
T2 1.49 0.64, 3.49 0.4
T3 0.65 0.26, 1.58 0.3
T4 0.51 0.22, 1.14 0.10
Grade
I Ref.
II 0.55 0.25, 1.19 0.13
III 0.79 0.38, 1.64 0.5

The fmt_regression() has default methods for rounding p-values and model coefficients. The default p-value formatting function is fmt_pvalue() and fmt_beta() for the coefficients. In the example below, the large large p-values were rounded to 1 decimal place rather than 2.

# format results into data frame with global p-values
fmt_regression(
  m1, 
  exponentiate = TRUE, 
  pvalue_fun = function(x) fmt_pvalue(x, digits = 2)
) 
N = 183 OR 95% CI p-value
Age, yrs 1.00 0.98, 1.02 0.89
T Stage
T1 Ref.
T2 1.49 0.64, 3.49 0.35
T3 0.65 0.26, 1.58 0.35
T4 0.51 0.22, 1.14 0.10
Grade
I Ref.
II 0.55 0.25, 1.19 0.13
III 0.79 0.38, 1.64 0.52

Modify Header

The results of the regression model are stored in a data frame with named columns–label, est, ci, and pvalue. The header rows can be modified with the modify_header() function by referencing these columns.

# format results into data frame with global p-values
fmt_regression(m1, exponentiate = TRUE) %>% 
  modify_header(label = " ", est = "Odds Ratio", ci = "95% Confidence Interval")
Odds Ratio 95% Confidence Interval p-value
Age, yrs 1.00 0.98, 1.02 0.9
T Stage
T1 Ref.
T2 1.49 0.64, 3.49 0.4
T3 0.65 0.26, 1.58 0.3
T4 0.51 0.22, 1.14 0.10
Grade
I Ref.
II 0.55 0.25, 1.19 0.13
III 0.79 0.38, 1.64 0.5

Report Results Inline

Having a well formatted and reproducible table is a great. But we often need to report the results from a table in the text of an Rmarkdown report. Inline reporting has been made simple with inline_text(). We’ll continue to use the logistic regression model m1 in this example.

fmt_m1 = fmt_regression(m1, exponentiate = TRUE)
fmt_m1
N = 183 OR 95% CI p-value
Age, yrs 1.00 0.98, 1.02 0.9
T Stage
T1 Ref.
T2 1.49 0.64, 3.49 0.4
T3 0.65 0.26, 1.58 0.3
T4 0.51 0.22, 1.14 0.10
Grade
I Ref.
II 0.55 0.25, 1.19 0.13
III 0.79 0.38, 1.64 0.5

To report the result for age, use the following commands inline.

`r inline_text(fmt_m1, "age")`

Here’s how the line will appear in your report.

1.00 (95% CI 0.98, 1.02; p=0.9)

It is reasonable that you’ll need to modify the text. To do this, use the stat option. You have access to following fields within the stat option. The stat option syntax follows the glue::glue() format with referenced R objects being inserted between curly brackets.

{est}      Primary estimate (e.g. odds ratio, model coefficient)
{ll}       Rounded lower limit of confidence interval
{ul}       Rounded upper limit of confidence interval
{ci}       Rounded confidence interval. Lower and upper limits separated by a comma
{pvalue}   Rounded p-value
{p_pvalue} Rounded p-value with a 'p=' or 'p<' prepended to the pvalue.

Age was not significantly associated with tumor response `r inline_text(fmt_m1, "age", stat = "(OR {est}; 95% CI {ci}; {p_pvalue})")`.

Age was not significantly associated with tumor response (OR 1.00; 95% CI 0.98, 1.02; p=0.9).

If you’re printing results from a categorical variable, include the desired level after the variable name, e.g. inline_text(fmt_m1, "stage:T3") resolves to “0.65 (95% CI 0.26, 1.58; p=0.3)”.

gtsummary + kableExtra

Remember how the results are stored in a data frame? Need that data frame for any reason (e.g. if you want to get extra fancy with kableExtra)? Use generic function as_tibble to extract an easy-to-use data frame from any fmt_regression object.

#make data frame out of fmt_regression object
mod1_df <- as_tibble(fmt_m1)

If you want to customize anything with knitr::kable or kableExtra, you can use the above as_tibble along with our function indent_key which extracts the factors that you might want indented when knitting your table to HTML. (NOTE: Only load library(kableExtra) and use the below if knitting to HTML, this will not work with Word or PDF.) For more on customizing your tables with kableExtra check out the package’s vignette on HTML output.

library(knitr)
library(kableExtra)

# knit pretty table with indent into HTM
fmt_m1 %>%
  bold_labels() %>%
  as_tibble() %>%
  kable(
    row.names = FALSE,
    caption = "Regression Summary"
  ) %>%
  # Below, using kableExtra functions to do things like change table style, add 
  # grouped column header, footnote, and indent variable categories
  kable_styling(
    bootstrap_options = c("striped", "condensed", "hover"), #popular bootstrap styles
    font_size = 16,
    full_width = F
  ) %>%
  add_header_above(c(" " = 1, "MVA" = 3)) %>% #add grouped header if needed
  footnote(general = "Values calculated using logistic regression predicting response.")%>%
  add_indent(indent_key(fmt_m1)) 
Regression Summary
MVA
N = 183 OR 95% CI p-value
Age, yrs 1.00 0.98, 1.02 0.9
T Stage
T1 Ref.
T2 1.49 0.64, 3.49 0.4
T3 0.65 0.26, 1.58 0.3
T4 0.51 0.22, 1.14 0.10
Grade
I Ref.
II 0.55 0.25, 1.19 0.13
III 0.79 0.38, 1.64 0.5
Note:
Values calculated using logistic regression predicting response.

Under the Hood

When you print the output from the fmt_regression() function into the R console or into an Rmarkdown, there are default printing functions that are called in the background: print.fmt_regression() and knit_print.fmt_regression(). The true output from fmt_regression() is a named list, but when you print into the R console the interesting portions are displayed from the .$model_tbl data frame.

ls(fmt_m1)
#> [1] "inputs"    "model_obj" "model_tbl" "n"

There is additional information stored in the fmt_regression() output list.

print.listof(fmt_m1)
#> model_tbl :
#> # A tibble: 11 x 12
#>    row_type var_type variable label     N est   ll    ul    ci   
#>    <chr>    <chr>    <chr>    <chr> <int> <chr> <chr> <chr> <chr>
#>  1 header1  <NA>     <NA>     N = ~    NA OR    <NA>  <NA>  95% ~
#>  2 label    continu~ age      Age,~   183 1.00  0.98  1.02  0.98~
#>  3 label    categor~ stage    T St~   183 <NA>  <NA>  <NA>  <NA> 
#>  4 level    categor~ stage    T1      183 Ref.  <NA>  <NA>  <NA> 
#>  5 level    categor~ stage    T2      183 1.49  0.64  3.49  0.64~
#>  6 level    categor~ stage    T3      183 0.65  0.26  1.58  0.26~
#>  7 level    categor~ stage    T4      183 0.51  0.22  1.14  0.22~
#>  8 label    categor~ grade    Grade   183 <NA>  <NA>  <NA>  <NA> 
#>  9 level    categor~ grade    I       183 Ref.  <NA>  <NA>  <NA> 
#> 10 level    categor~ grade    II      183 0.55  0.25  1.19  0.25~
#> 11 level    categor~ grade    III     183 0.79  0.38  1.64  0.38~
#> # ... with 3 more variables: pvalue_exact <dbl>, pvalue <chr>,
#> #   p_pvalue <chr>
#> 
#> n :
#> [1] 183
#> 
#> model_obj :
#> 
#> Call:  glm(formula = response ~ age + stage + grade, family = binomial(link = "logit"), 
#>     data = trial)
#> 
#> Coefficients:
#> (Intercept)          age      stageT2      stageT3      stageT4  
#>    0.127164     0.001502     0.399070    -0.428039    -0.675289  
#>     gradeII     gradeIII  
#>   -0.590765    -0.238397  
#> 
#> Degrees of Freedom: 182 Total (i.e. Null);  176 Residual
#>   (17 observations deleted due to missingness)
#> Null Deviance:       250.8 
#> Residual Deviance: 241.4     AIC: 255.4
#> 
#> inputs :
#> $x
#> 
#> Call:  glm(formula = response ~ age + stage + grade, family = binomial(link = "logit"), 
#>     data = trial)
#> 
#> Coefficients:
#> (Intercept)          age      stageT2      stageT3      stageT4  
#>    0.127164     0.001502     0.399070    -0.428039    -0.675289  
#>     gradeII     gradeIII  
#>   -0.590765    -0.238397  
#> 
#> Degrees of Freedom: 182 Total (i.e. Null);  176 Residual
#>   (17 observations deleted due to missingness)
#> Null Deviance:       250.8 
#> Residual Deviance: 241.4     AIC: 255.4
#> 
#> $exponentiate
#> [1] TRUE
#> 
#> $label
#> NULL
#> 
#> $include
#> [1] "response" "age"      "stage"    "grade"   
#> 
#> $show_yesno
#> NULL
#> 
#> $conf.level
#> [1] 0.95
#> 
#> $intercept
#> [1] FALSE
#> 
#> $beta_fun
#> function (x) 
#> {
#>     dplyr::case_when(is.na(x) ~ NA_character_, abs(x) >= 100 ~ 
#>         sprintf("%.0f", x), abs(x) >= 10 ~ sprintf("%.1f", x), 
#>         TRUE ~ sprintf("%.2f", x))
#> }
#> <bytecode: 0x000000001c9bdbf8>
#> <environment: namespace:gtsummary>
#> 
#> $pvalue_fun
#> function (x, digits = 1, prepend_p = FALSE) 
#> {
#>     if (digits == 2) {
#>         p_fmt <- dplyr::case_when(x > 1 ~ NA_character_, x < 
#>             0 ~ NA_character_, x > 0.99 ~ ">0.99", round(x, 2) >= 
#>             0.1 ~ sprintf("%.2f", x), x >= 0.001 ~ sprintf("%.3f", 
#>             x), x < 0.001 ~ "<0.001")
#>     }
#>     if (digits == 1) {
#>         p_fmt <- dplyr::case_when(x > 1 ~ NA_character_, x < 
#>             0 ~ NA_character_, x > 0.9 ~ ">0.9", round(x, 1) >= 
#>             0.2 ~ sprintf("%.1f", x), round(x, 2) >= 0.1 ~ sprintf("%.2f", 
#>             x), x >= 0.001 ~ sprintf("%.3f", x), x < 0.001 ~ 
#>             "<0.001")
#>     }
#>     if (prepend_p == TRUE) {
#>         p_fmt <- dplyr::case_when(is.na(p_fmt) ~ NA_character_, 
#>             stringr::str_sub(p_fmt, end = 1L) %in% c("<", ">") ~ 
#>                 paste0("p", p_fmt), TRUE ~ paste0("p=", p_fmt))
#>     }
#>     return(p_fmt)
#> }
#> <bytecode: 0x000000001c9c0918>
#> <environment: namespace:gtsummary>

fmt_uni_regression

The fmt_uni_regression() produces a table of univariate regression results. The function is a wrapper for fmt_regression(), and as a result, accepts nearly identical function arguments. The function’s results can be modified in similar ways to fmt_regression() as well.

# rounding pvalues to 2 decimal places, 
# and adding global p-values,
# and bold_labels
fmt_uni_regression(
 trial,
 method = "glm",
 y = "response",
 method.args = list(family = binomial),
 exponentiate = TRUE,
 pvalue_fun = function(x) fmt_pvalue(x, digits = 2)
) %>%
  # overrides the default that shows p-values for each level
  add_global() %>%
  # adjusts global p-values for multiple testing (default method: FDR)
  add_q() %>%
  # bold p-values under a given threshold (default 0.05)
  bold_p() %>% 
  # now bold q-values under the threshold of 0.10
  bold_p(
    t = 0.10,
    q = TRUE
  ) %>%
  bold_labels()
Variable N OR 95% CI p-value q-value
Treatment Randomization 191 0.011 0.055
Drug Ref.
Placebo 0.47 0.26, 0.84
Age, yrs 183 1.00 0.98, 1.02 0.90 0.90
Marker Level, ng/mL 183 0.92 0.65, 1.30 0.64 0.80
T Stage 191 0.055 0.14
T1 Ref.
T2 1.40 0.63, 3.14
T3 0.63 0.26, 1.50
T4 0.50 0.22, 1.11
Grade 191 0.40 0.66
I Ref.
II 0.60 0.29, 1.25
III 0.76 0.38, 1.51