This vignette will review the different ways of selecting variables to describe with crosstable
. For more general informations about crosstable
, see vignette("crosstable")
(link). For more information and tips on tidyselect
, see tidyselect syntax
Please note that tables look better on a white background. If you are using RStudio in dark mode, borders may look blurry.
The simplest case is when you want to describe the whole table, as you need no further argument. If you really want to be more explicit, you also can use tidyselect::everything()
. All tidyselect
helpers are re-exported by dplyr
so we only want to load this latter package.
Here are the 10 first lines of the mtcars2
dataset:
library(crosstable)
library(dplyr)
=crosstable(mtcars2, everything())
ctdim(ct)
#> [1] 46 4
%>% head(10) %>%
ct as_flextable(keep_id=TRUE)
.id | label | variable | value |
mpg | Miles/(US) gallon | Min / Max | 10.4 / 33.9 |
Med [IQR] | 19.2 [15.4;22.8] | ||
Mean (std) | 20.1 (6.0) | ||
N (NA) | 32 (0) | ||
cyl | Number of cylinders | 4 | 11 (34.38%) |
6 | 7 (21.88%) | ||
8 | 14 (43.75%) | ||
disp | Displacement (cu.in.) | Min / Max | 71.1 / 472.0 |
Med [IQR] | 196.3 [120.8;326.0] | ||
Mean (std) | 230.7 (123.9) |
Just like with dplyr::select
, you can use names with or without quotes to select variables you want to describe:
crosstable(mtcars2, mpg, by=vs) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | Engine | |
straight | vshaped | |||
mpg | Miles/(US) gallon | Min / Max | 17.8 / 33.9 | 10.4 / 26.0 |
Med [IQR] | 22.8 [21.4;29.6] | 15.6 [14.8;19.1] | ||
Mean (std) | 24.6 (5.4) | 16.6 (3.9) | ||
N (NA) | 14 (0) | 18 (0) |
.id | label | variable | Transmission | |
auto | manual | |||
qsec | 1/4 mile time | Min / Max | 15.4 / 22.9 | 14.5 / 19.9 |
Med [IQR] | 17.8 [17.2;19.2] | 17.0 [16.5;18.6] | ||
Mean (std) | 18.2 (1.8) | 17.4 (1.8) | ||
N (NA) | 19 (0) | 13 (0) |
However, it is better to use all_of
or any_of
when you take your column names from an external vector. Otherwise, there would be an ambiguity as you might have wanted to select a column named like this vector.
=c("mpg", "cyl") #well, that's quite a bad variable name...
qseccrosstable(mtcars2, any_of(qsec), by=vs) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | Engine | |
straight | vshaped | |||
mpg | Miles/(US) gallon | Min / Max | 17.8 / 33.9 | 10.4 / 26.0 |
Med [IQR] | 22.8 [21.4;29.6] | 15.6 [14.8;19.1] | ||
Mean (std) | 24.6 (5.4) | 16.6 (3.9) | ||
N (NA) | 14 (0) | 18 (0) | ||
cyl | Number of cylinders | 4 | 10 (90.91%) | 1 (9.09%) |
6 | 4 (57.14%) | 3 (42.86%) | ||
8 | 0 (0%) | 14 (100.00%) |
You can also use negation to keep all but some columns:
crosstable(mtcars2, -mpg, by=vs) %>% head(7) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | Engine | |
straight | vshaped | |||
cyl | Number of cylinders | 4 | 10 (90.91%) | 1 (9.09%) |
6 | 4 (57.14%) | 3 (42.86%) | ||
8 | 0 (0%) | 14 (100.00%) | ||
disp | Displacement (cu.in.) | Min / Max | 71.1 / 258.0 | 120.3 / 472.0 |
Med [IQR] | 120.5 [83.0;162.4] | 311.0 [275.8;360.0] | ||
Mean (std) | 132.5 (56.9) | 307.1 (106.8) | ||
N (NA) | 14 (0) | 18 (0) |
.id | label | variable | Engine | |
straight | vshaped | |||
disp | Displacement (cu.in.) | Min / Max | 71.1 / 258.0 | 120.3 / 472.0 |
Med [IQR] | 120.5 [83.0;162.4] | 311.0 [275.8;360.0] | ||
Mean (std) | 132.5 (56.9) | 307.1 (106.8) | ||
N (NA) | 14 (0) | 18 (0) | ||
hp | Gross horsepower | Min / Max | 52.0 / 123.0 | 91.0 / 335.0 |
Med [IQR] | 96.0 [66.0;109.8] | 180.0 [156.2;226.2] | ||
Mean (std) | 91.4 (24.4) | 189.7 (60.3) | ||
N (NA) | 14 (0) | 18 (0) |
.id | label | variable | Engine | |
straight | vshaped | |||
disp | Displacement (cu.in.) | Min / Max | 71.1 / 258.0 | 120.3 / 472.0 |
Med [IQR] | 120.5 [83.0;162.4] | 311.0 [275.8;360.0] | ||
Mean (std) | 132.5 (56.9) | 307.1 (106.8) | ||
N (NA) | 14 (0) | 18 (0) | ||
hp | Gross horsepower | Min / Max | 52.0 / 123.0 | 91.0 / 335.0 |
Med [IQR] | 96.0 [66.0;109.8] | 180.0 [156.2;226.2] | ||
Mean (std) | 91.4 (24.4) | 189.7 (60.3) | ||
N (NA) | 14 (0) | 18 (0) |
This can be useful sometimes, for instance when you want to quickly describe the 3 first columns.
crosstable(mtcars2, 1:3, by=vs) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | Engine | |
straight | vshaped | |||
mpg | Miles/(US) gallon | Min / Max | 17.8 / 33.9 | 10.4 / 26.0 |
Med [IQR] | 22.8 [21.4;29.6] | 15.6 [14.8;19.1] | ||
Mean (std) | 24.6 (5.4) | 16.6 (3.9) | ||
N (NA) | 14 (0) | 18 (0) | ||
cyl | Number of cylinders | 4 | 10 (90.91%) | 1 (9.09%) |
6 | 4 (57.14%) | 3 (42.86%) | ||
8 | 0 (0%) | 14 (100.00%) | ||
disp | Displacement (cu.in.) | Min / Max | 71.1 / 258.0 | 120.3 / 472.0 |
Med [IQR] | 120.5 [83.0;162.4] | 311.0 [275.8;360.0] | ||
Mean (std) | 132.5 (56.9) | 307.1 (106.8) | ||
N (NA) | 14 (0) | 18 (0) |
You can also use negation (-(1:3)
), concatenation (c(1,2,3)
), or both (crosstable(mtcars2, 1:4, -2, by=vs)
).
tidyselect
helpersAlong with everything()
, tidyselect
provides a large choice of helpers. You can see ?tidyselect::select_helpers
for a complete list. They all have the useful ignore.case
argument which is often very convenient.
The main are re-exported by crosstable: starts_with
, ends_with
, contains
and matches
. Here are some examples:
crosstable(mtcars2, starts_with("d")) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | value |
disp | Displacement (cu.in.) | Min / Max | 71.1 / 472.0 |
Med [IQR] | 196.3 [120.8;326.0] | ||
Mean (std) | 230.7 (123.9) | ||
N (NA) | 32 (0) | ||
drat | Rear axle ratio | Min / Max | 2.8 / 4.9 |
Med [IQR] | 3.7 [3.1;3.9] | ||
Mean (std) | 3.6 (0.5) | ||
N (NA) | 32 (0) |
.id | label | variable | value |
mpg | Miles/(US) gallon | Min / Max | 10.4 / 33.9 |
Med [IQR] | 19.2 [15.4;22.8] | ||
Mean (std) | 20.1 (6.0) | ||
N (NA) | 32 (0) | ||
cyl | Number of cylinders | 4 | 11 (34.38%) |
6 | 7 (21.88%) | ||
8 | 14 (43.75%) |
.id | label | variable | value |
disp | Displacement (cu.in.) | Min / Max | 71.1 / 472.0 |
Med [IQR] | 196.3 [120.8;326.0] | ||
Mean (std) | 230.7 (123.9) | ||
N (NA) | 32 (0) | ||
drat | Rear axle ratio | Min / Max | 2.8 / 4.9 |
Med [IQR] | 3.7 [3.1;3.9] | ||
Mean (std) | 3.6 (0.5) | ||
N (NA) | 32 (0) | ||
gear | Number of forward gears | 3 | 15 (46.88%) |
4 | 12 (37.50%) | ||
5 | 5 (15.62%) |
Sometimes, you want to select columns if they meet a set of specifications, for instance of type or of value. You can then use predicate functions: functions that return a single logical value. If the function is named, it is a good practice to wrap it in where
.
For instance, you might want to keep only numeric variables:
crosstable(mtcars2, where(is.numeric)) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | value |
mpg | Miles/(US) gallon | Min / Max | 10.4 / 33.9 |
Med [IQR] | 19.2 [15.4;22.8] | ||
Mean (std) | 20.1 (6.0) | ||
N (NA) | 32 (0) | ||
disp | Displacement (cu.in.) | Min / Max | 71.1 / 472.0 |
Med [IQR] | 196.3 [120.8;326.0] | ||
Mean (std) | 230.7 (123.9) | ||
N (NA) | 32 (0) | ||
hp | Gross horsepower | Min / Max | 52.0 / 335.0 |
Med [IQR] | 123.0 [96.5;180.0] | ||
Mean (std) | 146.7 (68.6) | ||
N (NA) | 32 (0) | ||
drat | Rear axle ratio | Min / Max | 2.8 / 4.9 |
Med [IQR] | 3.7 [3.1;3.9] | ||
Mean (std) | 3.6 (0.5) | ||
N (NA) | 32 (0) | ||
wt | Weight (1000 lbs) | Min / Max | 1.5 / 5.4 |
Med [IQR] | 3.3 [2.6;3.6] | ||
Mean (std) | 3.2 (1.0) | ||
N (NA) | 32 (0) | ||
qsec | 1/4 mile time | Min / Max | 14.5 / 22.9 |
Med [IQR] | 17.7 [16.9;18.9] | ||
Mean (std) | 17.8 (1.8) | ||
N (NA) | 32 (0) | ||
carb | Number of carburetors | Min / Max | 1.0 / 8.0 |
Med [IQR] | 2.0 [2.0;4.0] | ||
Mean (std) | 2.8 (1.6) | ||
N (NA) | 32 (0) |
Using anonymous functions, you can even use more complicated patterns. For instance, you might want only numeric variables which mean is higher than 100:
crosstable(mtcars2, function(x) is.numeric(x) && mean(x)>100) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | value |
disp | Displacement (cu.in.) | Min / Max | 71.1 / 472.0 |
Med [IQR] | 196.3 [120.8;326.0] | ||
Mean (std) | 230.7 (123.9) | ||
N (NA) | 32 (0) | ||
hp | Gross horsepower | Min / Max | 52.0 / 335.0 |
Med [IQR] | 123.0 [96.5;180.0] | ||
Mean (std) | 146.7 (68.6) | ||
N (NA) | 32 (0) |
Of note, crosstable support lambda-functions, so you could write crosstable(mtcars2, ~is.numeric(.x) && mean(.x)>100)
for the exact same result.
The only, logical constraint is that the function should return a single logical value. Use &&
, ||
, and parenthesis to combine functions in complex patterns.
If you want to mutate some variables in real-time, you can use the formula interface. The left-hand-side are the variables to describe, while the right-hand-side is the by
variable (which can be set to NULL
, 0
or 1
for “no variable”).
crosstable(mtcars2, mpg+cyl ~ vs) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | Engine | |
straight | vshaped | |||
mpg | Miles/(US) gallon | Min / Max | 17.8 / 33.9 | 10.4 / 26.0 |
Med [IQR] | 22.8 [21.4;29.6] | 15.6 [14.8;19.1] | ||
Mean (std) | 24.6 (5.4) | 16.6 (3.9) | ||
N (NA) | 14 (0) | 18 (0) | ||
cyl | Number of cylinders | 4 | 10 (90.91%) | 1 (9.09%) |
6 | 4 (57.14%) | 3 (42.86%) | ||
8 | 0 (0%) | 14 (100.00%) |
This permits very complex and interesting patterns, using functions in situ and operations using with the I
function. Labels are inherited and make little sense though.
crosstable(mtcars2, sqrt(mpg) + I(qsec^2) ~ ifelse(mpg>20,"mpg>20","mpg<20"),
label=FALSE) %>%
as_flextable(keep_id=TRUE)
.id | label | variable | ifelse(mpg > 20, "mpg>20", "mpg<20") | |
mpg<20 | mpg>20 | |||
sqrt(mpg) | sqrt(mpg) | Min / Max | 3.2 / 4.4 | 4.6 / 5.8 |
Med [IQR] | 4.0 [3.8;4.2] | 4.9 [4.6;5.4] | ||
Mean (std) | 4.0 (0.4) | 5.0 (0.4) | ||
N (NA) | 18 (0) | 14 (0) | ||
I(qsec^2) | I(qsec^2) | Min / Max | 210.2 / 408.8 | 270.9 / 524.4 |
Med [IQR] | 301.0 [259.3;321.8] | 351.8 [303.0;391.8] | ||
Mean (std) | 294.3 (50.4) | 356.8 (66.7) | ||
N (NA) | 18 (0) | 14 (0) |
Note that you cannot use tidyselect
helpers in formulas.
Lets play a little with all of this :-)
I want all numeric variables that do not start by “d” or “w”, but I still want drat
in the end.
crosstable(mtcars2, where(is.numeric), -matches("^d|w"), drat, label=FALSE) %>%
as_flextable()
label | variable | value |
mpg | Min / Max | 10.4 / 33.9 |
Med [IQR] | 19.2 [15.4;22.8] | |
Mean (std) | 20.1 (6.0) | |
N (NA) | 32 (0) | |
hp | Min / Max | 52.0 / 335.0 |
Med [IQR] | 123.0 [96.5;180.0] | |
Mean (std) | 146.7 (68.6) | |
N (NA) | 32 (0) | |
qsec | Min / Max | 14.5 / 22.9 |
Med [IQR] | 17.7 [16.9;18.9] | |
Mean (std) | 17.8 (1.8) | |
N (NA) | 32 (0) | |
carb | Min / Max | 1.0 / 8.0 |
Med [IQR] | 2.0 [2.0;4.0] | |
Mean (std) | 2.8 (1.6) | |
N (NA) | 32 (0) | |
drat | Min / Max | 2.8 / 4.9 |
Med [IQR] | 3.7 [3.1;3.9] | |
Mean (std) | 3.6 (0.5) | |
N (NA) | 32 (0) |