Recommendations for Using summarytools With Rmarkdown

Dominic Comtois

2019-04-11

Configuration

This document uses the rmarkdown::html_vignette theme. Its YAML header looks like this:

# ---
# title: "Recommendations for Using summarytools With Rmarkdown"
# author: "Dominic Comtois"
# date: "2019-04-11"
# output: 
#   rmarkdown::html_vignette: 
#     css: 
#     - !expr system.file("rmarkdown/templates/html_vignette/resources/vignette.css", package = "rmarkdown")
# vignette: >
#   %\VignetteIndexEntry{Recommendations for Rmarkdown}
#   %\VignetteEngine{knitr::rmarkdown}
#   %\VignetteEncoding{UTF-8}
# ---

The following summarytools global options have been set. Other options can also be useful, but this is a good starting point.

library(summarytools)
st_options(bootstrap.css     = FALSE,       # Already part of the theme, so no need for it
           plain.ascii       = FALSE,       # One of the essential settings
           style             = "rmarkdown", # Idem.
           dfSummary.silent  = TRUE,        # Suppresses messages about temporary files
           footnote          = NA,          # Keeping the results minimalistic
           subtitle.emphasis = FALSE)       # For the vignette theme, this gives
                                            # much better results. Your mileage may vary.

Also, the following knitr chunk options were set this way:

library(knitr)
opts_chunk$set(comment=NA, prompt=FALSE, cache=FALSE, echo=TRUE, results='asis')

Finally, summarytools’ CSS was included with the following chunk, using chunk option echo = FALSE:

st_css()

Demo / Examples

Below are examples using the recommended styles for Rmarkdown rendering. The styles available in summarytools are the same as pander’s table styles.

For freq(), descr() and (with some caveats) ctable(), the rmarkdown style is recommended. For dfSummary(), grid is recommended.

Starting with freq(), we’ll review the recommended methods and styles to quickly get satisfying results in your Rmarkdown documents.


freq()

freq() is best used with style = 'rmarkdown'; HTML rendering is also possible.
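The chunk that produced the table below is not reproduced in this document. A minimal sketch of an equivalent call, assuming the tobacco dataset bundled with summarytools and a chunk using results = 'asis' (as set above), could look like this:

```r
library(summarytools)  # also makes the bundled tobacco dataset available

# Frequency table of a factor; plain.ascii = FALSE and style = "rmarkdown"
# produce a markdown table that pandoc converts when knitting
gender_freq <- freq(tobacco$gender, plain.ascii = FALSE, style = "rmarkdown")

# With results = 'asis', the printed markdown goes straight into the document
print(gender_freq)
```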

Rmarkdown Style

Frequencies

tobacco$gender
Type: Factor

          Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
F          489     50.00          50.00     48.90          48.90
M          489     50.00         100.00     48.90          97.80
<NA>        22                               2.20         100.00
Total     1000    100.00         100.00    100.00         100.00

HTML Rendering
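A sketch of the HTML alternative, under the same assumptions: the object is printed with method = 'render', which generates HTML instead of markdown. This only makes sense for HTML output formats.

```r
library(summarytools)

gender_freq <- freq(tobacco$gender)

# method = "render" builds the table as HTML, which knitr inserts as-is
# in the output document; the summarytools CSS (st_css()) styles it
print(gender_freq, method = "render")
```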

Frequencies

tobacco$gender
Type: Factor
                    Valid             Total
gender    Freq        %   % Cum.        %   % Cum.
F          489    50.00    50.00    48.90    48.90
M          489    50.00   100.00    48.90    97.80
<NA>        22                       2.20   100.00
Total     1000   100.00   100.00   100.00   100.00

If you find the table too large, you can use table.classes = 'st-small'; an example is provided further below.


ctable()

Rmarkdown Style

Tables with headings spanning two rows are not fully supported in markdown (yet), but the result is getting close to acceptable. This does not hold for every theme, however, which is why the rendering method is preferred.
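A possible call for the markdown version, again a sketch assuming the bundled tobacco data; prop = "r" requests the row proportions named in the table’s title.

```r
library(summarytools)

# Cross-tabulation of gender (rows) by smoker (columns) with row proportions,
# formatted as a markdown table
gender_smoker <- ctable(tobacco$gender, tobacco$smoker, prop = "r",
                        plain.ascii = FALSE, style = "rmarkdown")
print(gender_smoker)
```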

Cross-Tabulation, Row Proportions

gender * smoker
Data Frame: tobacco

        smoker           Yes            No           Total
gender
F                147 (30.1%)   342 (69.9%)    489 (100.0%)
M                143 (29.2%)   346 (70.8%)    489 (100.0%)
<NA>               8 (36.4%)    14 (63.6%)     22 (100.0%)
Total            298 (29.8%)   702 (70.2%)   1000 (100.0%)

HTML Rendering

For best results, use this method.
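The rendered version could be obtained with a sketch like the following (same assumptions as above); the HTML table handles the two-row heading that markdown struggles with.

```r
library(summarytools)

gender_smoker <- ctable(tobacco$gender, tobacco$smoker, prop = "r")

# Render as HTML: the "smoker" heading spans the Yes / No / Total columns
print(gender_smoker, method = "render")
```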

Cross-Tabulation, Row Proportions

gender * smoker
Data Frame: tobacco
                          smoker
gender           Yes            No           Total
F        147 (30.1%)   342 (69.9%)    489 (100.0%)
M        143 (29.2%)   346 (70.8%)    489 (100.0%)
<NA>       8 (36.4%)    14 (63.6%)     22 (100.0%)
Total    298 (29.8%)   702 (70.2%)   1000 (100.0%)

descr()

Like freq(), descr() is best used with style = 'rmarkdown'; HTML rendering is also supported.
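A sketch of a descr() call that yields a markdown table like the one below, assuming the bundled tobacco data. When given a whole data frame, descr() analyses only its numerical columns and lists the others in a message.

```r
library(summarytools)

# Descriptive statistics for every numerical column of tobacco;
# non-numerical columns are skipped (a message lists them)
tobacco_stats <- descr(tobacco, plain.ascii = FALSE, style = "rmarkdown")
print(tobacco_stats)
```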

Rmarkdown Style

Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease

Descriptive Statistics

tobacco
N: 1000

                 BMI     age  cigs.per.day  samp.wgts
Mean           25.73   49.60          6.78       1.00
Std.Dev         4.49   18.29         11.88       0.08
Min             8.83   18.00          0.00       0.86
Q1             22.93   34.00          0.00       0.86
Median         25.62   50.00          0.00       1.04
Q3             28.65   66.00         11.00       1.05
Max            39.44   80.00         40.00       1.06
MAD             4.18   23.72          0.00       0.01
IQR             5.72   32.00         11.00       0.19
CV              0.17    0.37          1.75       0.08
Skewness        0.02   -0.04          1.54      -1.04
SE.Skewness     0.08    0.08          0.08       0.08
Kurtosis        0.26   -1.26          0.90      -0.90
N.Valid       974.00  975.00        965.00    1000.00
Pct.Valid      97.40   97.50         96.50     100.00

HTML Rendering

We’ll use table.classes = 'st-small' to show how it affects the table’s size (compare with the freq() table rendered earlier).
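For the HTML version, table.classes can be passed to print() along with method = 'render'. A sketch, assuming the st-small class comes from the summarytools CSS included earlier with st_css():

```r
library(summarytools)

tobacco_stats <- descr(tobacco)

# "st-small" is added to the classes of the generated <table> element,
# which makes the rendered table more compact
print(tobacco_stats, method = "render", table.classes = "st-small")
```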

Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease

Descriptive Statistics

tobacco
N: 1000
                 BMI     age  cigs.per.day  samp.wgts
Mean           25.73   49.60          6.78       1.00
Std.Dev         4.49   18.29         11.88       0.08
Min             8.83   18.00          0.00       0.86
Q1             22.93   34.00          0.00       0.86
Median         25.62   50.00          0.00       1.04
Q3             28.65   66.00         11.00       1.05
Max            39.44   80.00         40.00       1.06
MAD             4.18   23.72          0.00       0.01
IQR             5.72   32.00         11.00       0.19
CV              0.17    0.37          1.75       0.08
Skewness        0.02   -0.04          1.54      -1.04
SE.Skewness     0.08    0.08          0.08       0.08
Kurtosis        0.26   -1.26          0.90      -0.90
N.Valid          974     975           965       1000
Pct.Valid      97.40   97.50         96.50     100.00

dfSummary()

Grid Style

This style gives good results, and since version 0.9 the graphs are shown as true images. Don’t forget to specify plain.ascii = FALSE (or set it globally with st_options(plain.ascii = FALSE)); otherwise the output will not render properly.
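A minimal sketch of such a call, assuming the bundled tobacco data. The tmp.img.dir argument, which sets the directory where the graph images are written, may not exist in older versions of summarytools (check ?dfSummary); the "/tmp" value assumes a Unix-like system.

```r
library(summarytools)

# Grid-style summary with graphs; plain.ascii = FALSE is essential here.
# tmp.img.dir: directory for the temporary graph images
# ("/tmp" assumes a Unix-like system)
tobacco_summary <- dfSummary(tobacco,
                             plain.ascii = FALSE,
                             style       = "grid",
                             tmp.img.dir = "/tmp")
print(tobacco_summary)
```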

HTML Rendering

This method has also been much improved in version 0.9 of summarytools.
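The rendered version can be sketched as follows (same assumptions as above); here the graphs are embedded in the generated HTML.

```r
library(summarytools)

tobacco_summary <- dfSummary(tobacco)

# HTML rendering: graphs are embedded directly in the generated HTML table
print(tobacco_summary, method = "render")
```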

Data Frame Summary

tobacco
Dimensions: 1000 x 9
Duplicates: 2
No  Variable                Stats / Values                   Freqs (% of Valid)    Valid         Missing
1   gender [factor]         1. F                             489 (50.0%)           978 (97.8%)   22 (2.2%)
                            2. M                             489 (50.0%)
2   age [numeric]           Mean (sd) : 49.6 (18.3)          63 distinct values    975 (97.5%)   25 (2.5%)
                            min < med < max:
                            18 < 50 < 80
                            IQR (CV) : 32 (0.4)
3   age.gr [factor]         1. 18-34                         258 (26.5%)           975 (97.5%)   25 (2.5%)
                            2. 35-50                         241 (24.7%)
                            3. 51-70                         317 (32.5%)
                            4. 71 +                          159 (16.3%)
4   BMI [numeric]           Mean (sd) : 25.7 (4.5)           974 distinct values   974 (97.4%)   26 (2.6%)
                            min < med < max:
                            8.8 < 25.6 < 39.4
                            IQR (CV) : 5.7 (0.2)
5   smoker [factor]         1. Yes                           298 (29.8%)           1000 (100%)   0 (0%)
                            2. No                            702 (70.2%)
6   cigs.per.day [numeric]  Mean (sd) : 6.8 (11.9)           37 distinct values    965 (96.5%)   35 (3.5%)
                            min < med < max:
                            0 < 0 < 40
                            IQR (CV) : 11 (1.8)
7   diseased [factor]       1. Yes                           224 (22.4%)           1000 (100%)   0 (0%)
                            2. No                            776 (77.6%)
8   disease [character]     1. Hypertension                  36 (16.2%)            222 (22.2%)   778 (77.8%)
                            2. Cancer                        34 (15.3%)
                            3. Cholesterol                   21 (9.5%)
                            4. Heart                         20 (9.0%)
                            5. Pulmonary                     20 (9.0%)
                            6. Musculoskeletal               19 (8.6%)
                            7. Diabetes                      14 (6.3%)
                            8. Hearing                       14 (6.3%)
                            9. Digestive                     12 (5.4%)
                            10. Hypotension                  11 (5.0%)
                            [ 3 others ]                     21 (9.5%)
9   samp.wgts [numeric]     Mean (sd) : 1 (0.1)              0.86! : 267 (26.7%)   1000 (100%)   0 (0%)
                            min < med < max:                 1.04! : 249 (24.9%)
                            0.9 < 1 < 1.1                    1.05! : 324 (32.4%)
                            IQR (CV) : 0.2 (0.1)             1.06! : 160 (16.0%)
                                                             ! rounded

Final Notes

This is by no means a definitive guide; depending on the themes you use, you may find that other settings yield better results. Also, this document focuses on HTML rendering. If you want to create a Word or PDF document, you might want to try different combinations of options.

One thing that seems clear though: the rendering method is not well-suited for Word documents.
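One way to act on this, sketched here as a suggestion rather than a summarytools feature, is to pick the print method at knit time with knitr::is_html_output(), falling back to the default pander method (a markdown table) for Word or PDF output. It assumes the bundled tobacco data and a chunk using results = 'asis'.

```r
library(summarytools)

gender_freq <- freq(tobacco$gender, plain.ascii = FALSE, style = "rmarkdown")

# knitr::is_html_output() is TRUE only when knitting to an HTML format
# (it returns FALSE outside of knitting)
if (knitr::is_html_output()) {
  print(gender_freq, method = "render")   # HTML table
} else {
  print(gender_freq, method = "pander")   # markdown table, converted by pandoc
}
```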