This version: May, 2014. Stefan Milton Bache
This introduction to the magrittr package (to be pronounced with a sophisticated French accent) will be short(ish) and to the point, as is the package itself. magrittr has two aims: to decrease development time and to improve the readability and maintainability of code. Or even shortr: to make your code smokin' (puff puff)!
To achieve its humble aims, magrittr (remember the accent) provides a new “pipe”-like operator, %>%, with which you may pipe a value forward into an expression or function call; something along the lines of x %>% f, rather than f(x). This is not an unknown feature elsewhere; a prime example is the |> operator used extensively in F# (to say the least), and indeed this, along with Unix pipes, served as a motivation for developing the magrittr package.
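As a minimal illustration of the rule, assuming magrittr is installed, the sketch below checks that x %>% f gives the same result as f(x), and that extra arguments written on the right-hand side follow the piped value. The vector x and the result names are arbitrary.

```r
library(magrittr)

x <- c(1, 4, 9)

# x %>% f is evaluated as f(x)
piped <- x %>% sqrt
plain <- sqrt(x)

# x %>% f(y) is evaluated as f(x, y): the LHS fills the first argument
piped_chain <- x %>% sqrt %>% sum %>% paste("is the total")
plain_chain <- paste(sum(sqrt(x)), "is the total")

print(identical(piped, plain))              # [1] TRUE
print(identical(piped_chain, plain_chain))  # [1] TRUE
print(piped_chain)                          # [1] "6 is the total"
```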
At first encounter, you may wonder whether such an operator can really be all that beneficial; but as you may notice, it semantically changes your code in a way that makes it more intuitive to both read and write.
Consider the following example, in which the airquality dataset shipped with R is aggregated by week. We also print the first 3 rows for exposition.
library(magrittr)
weekly <-
airquality %>%
transform(Date = paste(1973, Month, Day, sep = "-") %>% as.Date) %>%
aggregate(. ~ Date %>% format("%W"), ., mean)
weekly %>% head(3)
Date %>% format("%W") Ozone Solar.R Wind Temp Month Day Date
1 18 26.75 192.5 9.875 68.75 5 2.5 1218
2 19 15.40 192.6 12.280 64.00 5 9.8 1225
3 20 18.14 203.4 12.457 63.29 5 17.0 1232
We start with the value airquality (a data.frame). Then, based on this, we make the “transformation” of adding a Date column using month, day and year (the year can be found in the dataset's documentation). Then we aggregate the data by week (which is a “format” of the date) using mean as the aggregator. Note how the code is arranged in the logical order of how you think about the task: data->transform->aggregate.
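To see what the two nested pipes inside transform and aggregate compute, this sketch applies them to a single date, May 1st 1973, the first day in airquality. It assumes magrittr is attached; the names first_day and week are illustrative. The "%W" format gives the week of the year (00-53) with Monday as the first day of the week, which is the week number shown in the first output row above.

```r
library(magrittr)

# Build one date the same way the transform() step does (year 1973, May 1st)
first_day <- paste(1973, 5, 1, sep = "-") %>% as.Date
print(first_day)   # [1] "1973-05-01"

# Week of the year, Monday as first day of the week, as in the aggregate() step
week <- first_day %>% format("%W")
print(week)        # [1] "18"
```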
A horrific alternative would be to write
weekly <- aggregate(. ~ format(Date, "%W"), transform(airquality,
Date = as.Date(paste(1973, Month, Day, sep = "-"))), mean)
head(weekly, 3)
format(Date, "%W") Ozone Solar.R Wind Temp Month Day Date
1 18 26.75 192.5 9.875 68.75 5 2.5 1218
2 19 15.40 192.6 12.280 64.00 5 9.8 1225
3 20 18.14 203.4 12.457 63.29 5 17.0 1232
There is a lot more clutter with parentheses, and the mental task of deciphering the code is more challenging, in particular if you did not write it yourself. Note how even the extraction of a few rows reads more naturally in the first example than in the second, even though neither is hard to understand. Granted, you could improve the second example, perhaps by introducing a few temporary variables (something magrittr often lets you avoid), but one often sees cluttered lines like the ones presented.
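For comparison, here is a sketch of the temporary-variable style next to the piped version, assuming magrittr is attached. The names with_date and by_week are illustrative only. Because aggregate names the grouping column after the expression that produced it, the two results differ in that column name, so only the values are compared.

```r
library(magrittr)

# Piped version, as in the first example
piped <- airquality %>%
  transform(Date = paste(1973, Month, Day, sep = "-") %>% as.Date) %>%
  aggregate(. ~ Date %>% format("%W"), ., mean)

# Same steps with temporary variables (names chosen for illustration)
with_date <- transform(airquality,
                       Date = as.Date(paste(1973, Month, Day, sep = "-")))
by_week <- aggregate(. ~ format(Date, "%W"), data = with_date, FUN = mean)

# Compare the week labels and the aggregated columns separately
print(identical(piped[[1]], by_week[[1]]))        # [1] TRUE
print(isTRUE(all.equal(piped[-1], by_week[-1])))  # [1] TRUE
```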
And here is another selling point. Suppose I want to quickly go a step further and extract a subset somewhere in the process. Simply add a few steps to the chain:
windy.weeks <-
airquality %>%
transform(Date = paste(1973, Month, Day, sep = "-") %>% as.Date) %>%
aggregate(. ~ Date %>% format("%W"), ., mean) %>%
subset(Wind > 12, c(Ozone, Solar.R, Wind)) %>%
print
Ozone Solar.R Wind
2 15.40 192.6 12.28
3 18.14 203.4 12.46
7 27.00 207.7 14.53
I will refrain from making the alternative code even messier, but it should be clear that adding steps to a magrittr chain is simpler than working one's way through a labyrinth of parentheses.
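The combined example shows a few neat features of the pipe (which it is not):
- By default, the left-hand side (LHS) is piped in as the first argument of the function appearing on the right-hand side (RHS). This is the case in the transform expression.
- %>% may be used in a nested fashion, i.e. it may appear in expressions within arguments. This is used when creating the Date column (arguably, "2014-02-01" %>% as.Date is more pleasant than as.Date("2014-02-01")).
- When the LHS is needed at a position other than the first, one can use the dot, '.', as a placeholder. This is used in the aggregate expression.
- The dot in, e.g., a formula is not confused with the placeholder, which is also utilized in the aggregate expression.
- Whenever the LHS is the only argument needed, the empty parentheses can be omitted. This is used in the call to print (which also returns its argument). Here, LHS %>% print(), or even LHS %>% print(.), would also work.

One feature which was not utilized above is piping into anonymous functions. This is also possible, e.g.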
windy.weeks %>%
(function(x) rbind(x %>% head(1), x %>% tail(1)))
Ozone Solar.R Wind
2 15.4 192.6 12.28
7 27.0 207.7 14.53
Here the right-hand side is enclosed in parentheses, which is not strictly necessary, but advised. Whenever the RHS is parenthesized, it is evaluated before the piping operation is carried out, i.e., one could do:
1:10 %>% (substitute(f(), list(f = sum)))
[1] 55
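The placement rules discussed in this section can be checked on a toy data frame. This is a sketch assuming a recent magrittr release; df and scale_by() are hypothetical names made up for the illustration. Note that without the parentheses, df$y %>% scale_by(10) would instead call scale_by(df$y, 10).

```r
library(magrittr)

df <- data.frame(x = 1:4, y = c(2, 4, 6, 8))

# Default: the LHS becomes the first argument, i.e. subset(df, x > 2)
big_x <- df %>% subset(x > 2)

# A bare '.' used directly as an argument marks where the LHS goes,
# so this is lm(y ~ x, data = df) and df is not also inserted first
fit <- df %>% lm(y ~ x, data = .)

# With no other arguments the parentheses can be dropped: nrow(df)
n <- df %>% nrow

# A parenthesized RHS is evaluated first; the function it returns is
# then applied to the LHS (scale_by() is a hypothetical helper)
scale_by <- function(k) function(v) v * k
scaled <- df$y %>% (scale_by(10))

print(nrow(big_x))  # [1] 2
print(coef(fit))    # intercept approximately 0, slope 2
print(n)            # [1] 4
print(scaled)       # [1] 20 40 60 80
```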
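To summarize the important features:
- x %>% f is equivalent to f(x), and x %>% f(y) to f(x, y).
- x %>% f(y, .) is equivalent to f(y, x); when '.' appears as a direct argument, the LHS is not also inserted as the first argument.
- A parenthesized RHS is evaluated first and the result is then applied to the LHS, which is how anonymous functions are piped into.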
In addition to the %>% operator, magrittr provides some aliases for other operators, which make operations such as addition or multiplication fit well into the magrittr syntax. As an example, consider:
rnorm(1000) %>%
multiply_by(5) %>%
add(5) %>%
(function(x)
cat("Mean:", x %>% mean,
"Variance:", x %>% var, "\n"))
Mean: 5.049 Variance: 24.66
which could be written in more compact form as
rnorm(1000) %>% `*`(5) %>% `+`(5) %>%
(function(x) cat("Mean:", x %>% mean, "Variance:", x %>% var, "\n"))
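A minimal check that the aliases and the backticked operators are interchangeable, assuming magrittr is attached. A deterministic input replaces rnorm() here so the result can be verified; the names v, aliased and backticked are illustrative.

```r
library(magrittr)

v <- 1:5

# multiply_by() is an alias for `*` and add() an alias for `+`
aliased <- v %>% multiply_by(5) %>% add(5)

# Backticked operators do the same job without the aliases
backticked <- v %>% `*`(5) %>% `+`(5)

print(aliased)                         # [1] 10 15 20 25 30
print(identical(aliased, backticked))  # [1] TRUE
```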
To see a list of the aliases, execute e.g. ?multiply_by. For more examples of %>% in use, see the development page: github.com/smbache/magrittr.