magrittr-pipe

This version: May, 2014. Stefan Milton Bache

This introduction to the magrittr (to be pronounced with a sophisticated french accent) will be short(ish) and to the point; as is the package itself. magrittr has two aims: to decrease development time and to improve readability and maintainability of code. Or even shortr: to make your code smokin' (puff puff)!

To archive its humble aims, magrittr (remember the accent) provides a new “pipe”-like operator, %>%, with which you may pipe a value forward into an expression or function call; something along the lines of x %>% f, rather than f(x). This is not an unknown feature elsewhere; a prime example is the |> operator used extensively in F# (to say the least) and indeed this – along with Unix pipes – served as a motivation for developing the magrittr package.

At first encounter, you may wonder whether such an operator can really be all that beneficial; but as you may notice, it semantically changes your code in a way that makes it more intuitive to both read and write.

Consider the following example, in which the airquality dataset shipped with R is aggregated by week. We also print the first 3 rows for exposition.

library(magrittr)

weekly <-
  airquality %>% 
  transform(Date = paste(1973, Month, Day, sep = "-") %>% as.Date) %>% 
  aggregate(. ~ Date %>% format("%W"), ., mean)

weekly %>% head(3)
  Date %>% format("%W") Ozone Solar.R   Wind  Temp Month  Day Date
1                    18 26.75   192.5  9.875 68.75     5  2.5 1218
2                    19 15.40   192.6 12.280 64.00     5  9.8 1225
3                    20 18.14   203.4 12.457 63.29     5 17.0 1232

We start with the value airquality (a data.frame). Then based on this, we make the “transformation” of adding a Date column using month, day and year (the year can be found in the dataset's documentation). Then we aggregate the data by week (which is a “format” of the date) using mean as aggregator. Note how the code is arranged in the logical order of how you think about the task: data->transform->aggregate. A horrific alternative would be to write

weekly <- aggregate(. ~ format(Date, "%W"), transform(airquality, 
  Date = as.Date(paste(1973, Month, Day, sep = "-"))), mean)

head(weekly, 3)
  format(Date, "%W") Ozone Solar.R   Wind  Temp Month  Day Date
1                 18 26.75   192.5  9.875 68.75     5  2.5 1218
2                 19 15.40   192.6 12.280 64.00     5  9.8 1225
3                 20 18.14   203.4 12.457 63.29     5 17.0 1232

There is a lot more clutter with parentheses, and the mental task of deciphering the code is more challenging—in particular if you did not write it yourself. Note how even the extraction of few rows has a semantic appeal in the first example over the second, even though none of them are hard to understand. Granted: you may make the second example better, perhaps throw in a few temporary variables (which is often avoided to some degree when using magrittr), but one often sees cluttered lines like the ones presented.

And here is another selling point. Suppose I want to quickly go a step further and extract a subset somewhere in the process. Simply add a few steps to the chain:

windy.weeks <-
  airquality %>% 
  transform(Date = paste(1973, Month, Day, sep = "-") %>% as.Date) %>% 
  aggregate(. ~ Date %>% format("%W"), ., mean) %>%
  subset(Wind > 12, c(Ozone, Solar.R, Wind)) %>% 
  print
  Ozone Solar.R  Wind
2 15.40   192.6 12.28
3 18.14   203.4 12.46
7 27.00   207.7 14.53

I will refrain from making the alternative code even messier, but it should be clear that adding steps in a magrittr chain is simpler than working ones way through a labyrinth of parentheses.

The combined example shows a few neat features of the pipe (which it is not):

  1. By default the left-hand side (LHS) will be piped in as the first argument of the function appearing on the right-hand side (RHS). This is the case in the transform expression.
  2. It may be used in a nested fashion, e.g. appearing in expressions within arguments. This is used in the date conversion (Note how the “pronunciation” of e.g. "2014-02-01" %>% as.Date is more pleasant than is as.Date("2014-02-01")).
  3. When the LHS is needed at a position other than the first, one can use the dot,'.', as placeholder. This is used in the aggregate expression.
  4. The dot in e.g. a formula is not confused with a placeholder, which is utilized in the aggregate expression.
  5. Whenever only one argument is needed, the LHS, then one can omit the empty parentheses. This is used in the call to print (which also returns its argument). Here, LHS %>% print(), or even LHS %>% print(.) would also work.

One feature, which was not utilized above is piping into anonymous functions. This is also possible, e.g.

windy.weeks %>%
(function(x) rbind(x %>% head(1), x %>% tail(1)))
  Ozone Solar.R  Wind
2  15.4   192.6 12.28
7  27.0   207.7 14.53

Here the right-hand side is enclosed in parentheses, which is not strictly necessary, but advised. Whenever the RHS is parenthesized, is evaluated before the piping operation is carried out, i.e., one could do:

1:10 %>% (substitute(f(), list(f = sum)))
[1] 55

To summarize the important features:

  1. Piping defaults to first-argument placement.
  2. Using a dot as placeholder allows you to use the LHS anywhere on the right-hand side.
  3. You may omit parentheses when only the LHS is needed as argument.
  4. You can use anonymous functions.
  5. You can parenthesize the right-hand side if this itself evaluates to the relevant call or function.

In addition to the %>%-operator, magrittr provides some aliases for other operators which make operations such as addition or multiplication fit well into the magrittr-syntax. As an example, consider:

rnorm(1000)    %>%
multiply_by(5) %>%
add(5)         %>%
function(x) 
  cat("Mean:",     x %>% mean, 
      "Variance:", x %>% var,  "\n")
Mean: 5.049 Variance: 24.66 

which could be written in more compact form as

rnorm(100) %>% `*`(5) %>% `+`(5) %>% 
function(x) cat("Mean:", x %>% mean, "Variance:", x %>% var,  "\n")

To see a list of the aliases, execute e.g. ?multiply_by. For more examples of %>% in use, see the development page: github.com/smbache/magrittr.