Recommendations for Using summarytools With Rmarkdown

Dominic Comtois

2018-10-07

This document uses the rmarkdown::html_vignette theme.

Below are examples using the recommended styles for Rmarkdown rendering. The styles available in summarytools are the same as pander’s.

For freq() and descr() (and, with some caveats, ctable()), the rmarkdown style is recommended. For dfSummary(), the grid style is recommended.

Important Note

The knitr chunk option results = 'asis' must be set for the tables to render properly. This can be done globally via opts_chunk$set(results = 'asis'), or in individual chunks.
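A minimal sketch of the global approach, in a setup chunk near the top of the .Rmd file. Only results = 'asis' comes from the text above; the rest is ordinary knitr usage.

```r
# Setup chunk: send every chunk's printed output to the document as-is,
# so the markdown or HTML produced by summarytools is not wrapped in a code block
library(knitr)
opts_chunk$set(results = 'asis')

# Per-chunk alternative: add results='asis' to an individual chunk header,
# e.g. {r, results='asis'}
```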

The following summarytools global options have been set:

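library(summarytools)
#st_options('omit.headings', TRUE)   # left disabled, so headings are shown
st_options('bootstrap.css', FALSE)   # don't add Bootstrap CSS; the Rmarkdown theme styles the page
st_options('footnote', NA)           # NA removes the footnote shown below html tables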

Using method = 'render'

To generate tables using summarytools’ own HTML rendering, the YAML header of the .Rmd document must point to the package’s summarytools.css file. This can be achieved in several ways; this vignette uses the following configuration:

output: 
  rmarkdown::html_vignette: 
    css: 
    - !expr system.file("rmarkdown/templates/html_vignette/resources/vignette.css", package = "rmarkdown")
    - !expr system.file("includes/stylesheets/summarytools.css", package = "summarytools")

An alternative is to give the full path to summarytools.css on your system (the example below is from a Windows library for R 3.4; adjust it to your own installation):

---
title: "RMarkdown using summarytools"
output: 
  html_document: 
    css: C:/R/win-library/3.4/summarytools/includes/stylesheets/summarytools.css
---
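If you are unsure where that file lives on your system, system.file() can report it. This sketch reuses the relative path from the vignette’s own YAML header and stops with an error if the file cannot be found (system.file() returns an empty string in that case).

```r
# Locate summarytools.css inside the installed package;
# paste the printed path into the css: field of the YAML header
css_path <- system.file("includes/stylesheets/summarytools.css",
                        package = "summarytools")
if (!nzchar(css_path)) stop("summarytools.css not found; is summarytools installed?")
print(css_path)
```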

Starting with freq(), we’ll review the recommended methods and styles to get going with summarytools in Rmarkdown documents.

freq()

freq() is best used with style = 'rmarkdown'; HTML rendering is also possible.
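The chunk source is not reproduced in this document; calls like the following should give output similar to the two tables below. This is a sketch assuming results = 'asis' and the tobacco data set that ships with summarytools; argument names are those used in this vignette and may differ in other summarytools releases.

```r
library(summarytools)  # also makes the bundled tobacco data set available

# Rmarkdown style: a markdown table written straight into the document
freq(tobacco$gender, style = 'rmarkdown')

# HTML rendering: summarytools builds the HTML table itself
print(freq(tobacco$gender), method = 'render')
```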

Rmarkdown Style

Frequencies

Variable: tobacco$gender
Type: Factor (unordered)

         Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
F         489     50.00          50.00     48.90          48.90
M         489     50.00         100.00     48.90          97.80
<NA>       22                               2.20         100.00
Total    1000    100.00         100.00    100.00         100.00

HTML Rendering

Frequencies

Variable: gender
Type: Factor (unordered)
                     Valid              Total
gender   Freq        %   % Cumul        %   % Cumul
F         489    50.00     50.00    48.90     48.90
M         489    50.00    100.00    48.90     97.80
<NA>       22                        2.20    100.00
Total    1000   100.00    100.00   100.00    100.00

If you find the table too large, you can use table.classes = 'st-small'; an example is provided further below.

ctable()

Rmarkdown Style

Tables with headings spanning two rows are not yet fully supported in markdown, but the result is close to acceptable.
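A sketch of a call that should yield the table below, assuming results = 'asis'. No prop argument is given, so ctable() falls back on its default row proportions, which matches the table title.

```r
library(summarytools)

# Cross-tabulate gender (rows) by smoker (columns) as a markdown table;
# with no prop argument, proportions are computed by row
ctable(tobacco$gender, tobacco$smoker, style = 'rmarkdown')
```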

Cross-Tabulation / Row Proportions

Variables: gender * smoker
Data Frame: tobacco

smoker            Yes             No            Total
gender
F        147 (30.06%)   342 (69.94%)    489 (100.00%)
M        143 (29.24%)   346 (70.76%)    489 (100.00%)
<NA>       8 (36.36%)    14 (63.64%)     22 (100.00%)
Total    298 (29.80%)   702 (70.20%)   1000 (100.00%)

HTML Rendering

For best results, use this method.
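The HTML version of the same cross-tabulation can be sketched as follows. The exact call is an assumption; method = 'render' comes from the text.

```r
library(summarytools)

# Same cross-tabulation, rendered as HTML by summarytools;
# here the two-row heading (smoker over Yes / No / Total) displays as intended
print(ctable(tobacco$gender, tobacco$smoker), method = 'render')
```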

Cross-Tabulation / Row Proportions

Variables: gender * smoker
Data Frame: tobacco
                            smoker
gender            Yes             No            Total
F        147 (30.06%)   342 (69.94%)    489 (100.00%)
M        143 (29.24%)   346 (70.76%)    489 (100.00%)
<NA>       8 (36.36%)    14 (63.64%)     22 (100.00%)
Total    298 (29.80%)   702 (70.20%)   1000 (100.00%)

descr()

Like freq(), descr() is best used with style = 'rmarkdown'; HTML rendering is also supported.
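A sketch of the corresponding call, passing the whole data frame and assuming the chunk uses results = 'asis'.

```r
library(summarytools)

# Descriptive statistics for every numeric column of tobacco;
# non-numeric columns are skipped and listed in a note above the table
descr(tobacco, style = 'rmarkdown')
```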

Rmarkdown Style

Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease

Descriptive Statistics

Data Frame: tobacco
N: 1000

                 age      BMI   cigs.per.day   samp.wgts
Mean           49.60    25.73           6.78        1.00
Std.Dev        18.29     4.49          11.88        0.08
Min            18.00     8.83           0.00        0.86
Q1             34.00    22.93           0.00        0.86
Median         50.00    25.62           0.00        1.04
Q3             66.00    28.65          11.00        1.05
Max            80.00    39.44          40.00        1.06
MAD            23.72     4.18           0.00        0.01
IQR            32.00     5.72          11.00        0.19
CV              0.37     0.17           1.75        0.08
Skewness       -0.04     0.02           1.54       -1.04
SE.Skewness     0.08     0.08           0.08        0.08
Kurtosis       -1.26     0.26           0.90       -0.90
N.Valid       975.00   974.00         965.00     1000.00
Pct.Valid      97.50    97.40          96.50      100.00

HTML Rendering

We’ll use table.classes = 'st-small' to show how it affects the table’s size (compare with the freq() table rendered earlier).
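A sketch of how the table.classes argument might be passed when printing with method = 'render'. The exact call used to build this vignette is not shown, so treat it as an assumption.

```r
library(summarytools)

# HTML rendering with the 'st-small' class, which makes the table more compact
print(descr(tobacco), method = 'render', table.classes = 'st-small')
```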

Non-numerical variable(s) ignored: gender, age.gr, smoker, diseased, disease

Descriptive Statistics

Data Frame: tobacco
N: 1000
                 age      BMI   cigs.per.day   samp.wgts
Mean           49.60    25.73           6.78        1.00
Std.Dev        18.29     4.49          11.88        0.08
Min            18.00     8.83           0.00        0.86
Q1             34.00    22.93           0.00        0.86
Median         50.00    25.62           0.00        1.04
Q3             66.00    28.65          11.00        1.05
Max            80.00    39.44          40.00        1.06
MAD            23.72     4.18           0.00        0.01
IQR            32.00     5.72          11.00        0.19
CV              0.37     0.17           1.75        0.08
Skewness       -0.04     0.02           1.54       -1.04
SE.Skewness     0.08     0.08           0.08        0.08
Kurtosis       -1.26     0.26           0.90       -0.90
N.Valid          975      974            965        1000
Pct.Valid      97.50    97.40          96.50      100.00

dfSummary()

Grid Style

This gives good results, although the histograms are not shown; this is due to an unresolved issue for which we are still working on a solution. Don’t forget to specify plain.ascii = FALSE, otherwise the table will not render properly.
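A sketch of a grid-style call with plain.ascii = FALSE, assuming results = 'asis'.

```r
library(summarytools)

# Grid-style data frame summary; plain.ascii = FALSE keeps the
# markdown formatting needed for the table to render in the document
dfSummary(tobacco, style = 'grid', plain.ascii = FALSE)
```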

Data Frame Summary

tobacco
N: 1000

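+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| No | Variable     | Stats / Values           | Freqs (% of Valid)  | Text Graph       | Valid   | Missing |
+====+==============+==========================+=====================+==================+=========+=========+
| 1  | gender       | 1. F                     | 489 (50.0%)         | IIIIIIIIIIIIIIII | 978     | 22      |
|    | [factor]     | 2. M                     | 489 (50.0%)         | IIIIIIIIIIIIIIII | (97.8%) | (2.2%)  |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 2  | age          | mean (sd) : 49.6 (18.29) | 63 distinct values  |                  | 975     | 25      |
|    | [numeric]    | min < med < max :        |                     |                  | (97.5%) | (2.5%)  |
|    |              | 18 < 50 < 80             |                     |                  |         |         |
|    |              | IQR (CV) : 32 (0.37)     |                     |                  |         |         |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 3  | age.gr       | 1. 18-34                 | 258 (26.5%)         | IIIIIIIIIIIII    | 975     | 25      |
|    | [factor]     | 2. 35-50                 | 241 (24.7%)         | IIIIIIIIIIII     | (97.5%) | (2.5%)  |
|    |              | 3. 51-70                 | 317 (32.5%)         | IIIIIIIIIIIIIIII |         |         |
|    |              | 4. 71 +                  | 159 (16.3%)         | IIIIIIII         |         |         |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 4  | BMI          | mean (sd) : 25.73 (4.49) | 974 distinct values |                  | 974     | 26      |
|    | [numeric]    | min < med < max :        |                     |                  | (97.4%) | (2.6%)  |
|    |              | 8.83 < 25.62 < 39.44     |                     |                  |         |         |
|    |              | IQR (CV) : 5.72 (0.17)   |                     |                  |         |         |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 5  | smoker       | 1. Yes                   | 298 (29.8%)         | IIIIII           | 1000    | 0       |
|    | [factor]     | 2. No                    | 702 (70.2%)         | IIIIIIIIIIIIIIII | (100%)  | (0%)    |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 6  | cigs.per.day | mean (sd) : 6.78 (11.88) | 37 distinct values  |                  | 965     | 35      |
|    | [numeric]    | min < med < max :        |                     |                  | (96.5%) | (3.5%)  |
|    |              | 0 < 0 < 40               |                     |                  |         |         |
|    |              | IQR (CV) : 11 (1.75)     |                     |                  |         |         |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 7  | diseased     | 1. Yes                   | 224 (22.4%)         | IIII             | 1000    | 0       |
|    | [factor]     | 2. No                    | 776 (77.6%)         | IIIIIIIIIIIIIIII | (100%)  | (0%)    |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 8  | disease      | 1. Hypertension          | 36 (16.2%)          | IIIIIIIIIIIIIIII | 222     | 778     |
|    | [character]  | 2. Cancer                | 34 (15.3%)          | IIIIIIIIIIIIIII  | (22.2%) | (77.8%) |
|    |              | 3. Cholesterol           | 21 ( 9.5%)          | IIIIIIIII        |         |         |
|    |              | 4. Heart                 | 20 ( 9.0%)          | IIIIIIII         |         |         |
|    |              | 5. Pulmonary             | 20 ( 9.0%)          | IIIIIIII         |         |         |
|    |              | 6. Musculoskeletal       | 19 ( 8.6%)          | IIIIIIII         |         |         |
|    |              | 7. Diabetes              | 14 ( 6.3%)          | IIIIII           |         |         |
|    |              | 8. Hearing               | 14 ( 6.3%)          | IIIIII           |         |         |
|    |              | 9. Digestive             | 12 ( 5.4%)          | IIIII            |         |         |
|    |              | 10. Hypotension          | 11 ( 5.0%)          | IIII             |         |         |
|    |              | [ 3 others ]             | 21 ( 9.5%)          | IIIIIIIII        |         |         |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+
| 9  | samp.wgts    | mean (sd) : 1 (0.08)     | 0.86!: 267 (26.7%)  | IIIIIIIIIIIII    | 1000    | 0       |
|    | [numeric]    | min < med < max :        | 1.04!: 249 (24.9%)  | IIIIIIIIIIII     | (100%)  | (0%)    |
|    |              | 0.86 < 1.04 < 1.06       | 1.05!: 324 (32.4%)  | IIIIIIIIIIIIIIII |         |         |
|    |              | IQR (CV) : 0.19 (0.08)   | 1.06!: 160 (16.0%)  | IIIIIII          |         |         |
|    |              |                          | ! rounded           |                  |         |         |
+----+--------------+--------------------------+---------------------+------------------+---------+---------+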

HTML Rendering

Although the results are not as neat as those of an HTML report generated directly from an R session (the transparency of the graphs is lost in translation), this is still the best method.
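A sketch of the HTML-rendered version, assuming the YAML header points to summarytools.css as described earlier. The exact call used for this vignette is not shown.

```r
library(summarytools)

# HTML rendering of the data frame summary, with graphs generated by summarytools
print(dfSummary(tobacco), method = 'render')
```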

Data Frame Summary

tobacco

N: 1000
No   Variable       Stats / Values             Freqs (% of Valid)    Graph   Valid     Missing

1    gender         1. F                       489 (50.0%)                   978       22
     [factor]       2. M                       489 (50.0%)                   (97.8%)   (2.2%)

2    age            mean (sd) : 49.6 (18.29)   63 distinct values            975       25
     [numeric]      min < med < max :                                        (97.5%)   (2.5%)
                    18 < 50 < 80
                    IQR (CV) : 32 (0.37)

3    age.gr         1. 18-34                   258 (26.5%)                   975       25
     [factor]       2. 35-50                   241 (24.7%)                   (97.5%)   (2.5%)
                    3. 51-70                   317 (32.5%)
                    4. 71 +                    159 (16.3%)

4    BMI            mean (sd) : 25.73 (4.49)   974 distinct values           974       26
     [numeric]      min < med < max :                                        (97.4%)   (2.6%)
                    8.83 < 25.62 < 39.44
                    IQR (CV) : 5.72 (0.17)

5    smoker         1. Yes                     298 (29.8%)                   1000      0
     [factor]       2. No                      702 (70.2%)                   (100%)    (0%)

6    cigs.per.day   mean (sd) : 6.78 (11.88)   37 distinct values            965       35
     [numeric]      min < med < max :                                        (96.5%)   (3.5%)
                    0 < 0 < 40
                    IQR (CV) : 11 (1.75)

7    diseased       1. Yes                     224 (22.4%)                   1000      0
     [factor]       2. No                      776 (77.6%)                   (100%)    (0%)

8    disease        1. Hypertension            36 (16.2%)                    222       778
     [character]    2. Cancer                  34 (15.3%)                    (22.2%)   (77.8%)
                    3. Cholesterol             21 (9.5%)
                    4. Heart                   20 (9.0%)
                    5. Pulmonary               20 (9.0%)
                    6. Musculoskeletal         19 (8.6%)
                    7. Diabetes                14 (6.3%)
                    8. Hearing                 14 (6.3%)
                    9. Digestive               12 (5.4%)
                    10. Hypotension            11 (5.0%)
                    [ 3 others ]               21 (9.5%)

9    samp.wgts      mean (sd) : 1 (0.08)       0.86! : 267 (26.7%)           1000      0
     [numeric]      min < med < max :          1.04! : 249 (24.9%)           (100%)    (0%)
                    0.86 < 1.04 < 1.06         1.05! : 324 (32.4%)
                    IQR (CV) : 0.19 (0.08)     1.06! : 160 (16.0%)
                                               ! rounded