Unpacking Assignment

2017-08-25

Getting Started

The zeallot package defines an operator for unpacking assignment, sometimes called parallel assignment or destructuring assignment in other programming languages. The operator is written as %<-% and used like this.

c(lat, lng) %<-% list(38.061944, -122.643889)

The result is that the list is unpacked into its elements, and the elements are assigned to lat and lng.

lat
#> [1] 38.06194
lng
#> [1] -122.6439

You can also unpack the elements of a vector.

c(lat, lng) %<-% c(38.061944, -122.643889)
lat
#> [1] 38.06194
lng
#> [1] -122.6439

You can unpack much longer structures, too, of course, such as the 6-part summary of a vector.

c(min_wt, q1_wt, med_wt, mean_wt, q3_wt, max_wt) %<-% summary(mtcars$wt)
#> Warning in if (!is_list(value)) {: the condition has length > 1 and only
#> the first element will be used
min_wt
#> [1] 1.513
q1_wt
#> [1] 2.58125
med_wt
#> [1] 3.325
mean_wt
#> [1] 3.21725
q3_wt
#> [1] 3.61
max_wt
#> [1] 5.424

If the left-hand side and right-hand sides do not match, an error is raised. This guards against missing or unexpected values.

c(stg1, stg2, stg3) %<-% list("Moe", "Donald")
#> Error: invalid `%<-%` right-hand side, incorrect number of values
c(stg1, stg2, stg3) %<-% list("Moe", "Larry", "Curley", "Donald")
#> Error: invalid `%<-%` right-hand side, incorrect number of values

Unpacking a returned value

A common use-case is when a function returns a list of values and you want to extract the individual values. In this example, the list of values returned by coords_list() is unpacked into the variables lat and lng.

#
# A function which returns a list of 2 numeric values.
# 
coords_list <- function() {
  list(38.061944, -122.643889)
}

c(lat, lng) %<-% coords_list()
lat
#> [1] 38.06194
lng
#> [1] -122.6439

As in the earlier example, colon and braces on the left-hand side of the assignment tell the assignment operator to expect a list.

In this example, we call a function that returns a vector. Now we omit the braces.

#
# Convert cartesian coordinates to polar
#
to_polar = function(x, y) {
  c(sqrt(x^2 + y^2), atan(y / x))
}

c(radius, angle) %<-% to_polar(12, 5)
radius
#> [1] 13
angle
#> [1] 0.3947911

Example: Intercept and slope of regression

You can directly unpack the coefficients of a simple linear regression into the intercept and slope.

c(inter, slope) %<-% coef(lm(mpg ~ cyl, data = mtcars))
inter
#> [1] 37.88458
slope
#> [1] -2.87579

Example: Unpacking the result of safely

The purrr package includes the safely function. It wraps a given function to create a new, “safe” version of the original function.

safe_log <- purrr::safely(log)

The safe version returns a list of two items. The first item is the result of calling the original function, assuming no error occurred; or NULL if an error did occur. The second item is the error, if an error occurred; or NULL if no error occurred. Whether or not the original function would have thrown an error, the safe version will never throw an error.

pair <- safe_log(10)
pair$result
#> [1] 2.302585
pair$error
#> NULL
pair <- safe_log("donald")
pair$result
#> NULL
pair$error
#> <simpleError in .f(...): non-numeric argument to mathematical function>

You can tighten and clarify calls to the safe function by using %<-%.

c(res, err) %<-% safe_log(10)
res
#> [1] 2.302585
err
#> NULL

Unpacking a data frame

A data frame is simply a list of columns, so the zeallot assignment does what you expect. It unpacks the data frame into individual columns.

c(mpg, cyl, disp, hp) %<-% mtcars[, 1:4]

head(mpg)
#> [1] 21.0 21.0 22.8 21.4 18.7 18.1

head(cyl)
#> [1] 6 6 4 6 8 6

head(disp)
#> [1] 160 160 108 258 360 225

head(hp)
#> [1] 110 110  93 110 175 105

Example: List of data frames

Bear in mind that a list of data frames is still just a list. The assignment will extract the list elements (which are data frames) but not unpack the data frames themselves.

quartet <- lapply(1:4, function(i) anscombe[, c(i, i + 4)])

c(an1, an2, an3, an4) %<-% lapply(quartet, head, n = 3)

an1
#>   x1   y1
#> 1 10 8.04
#> 2  8 6.95
#> 3 13 7.58

an2
#>   x2   y2
#> 1 10 9.14
#> 2  8 8.14
#> 3 13 8.74

an3
#>   x3    y3
#> 1 10  7.46
#> 2  8  6.77
#> 3 13 12.74

an4
#>   x4   y4
#> 1  8 6.58
#> 2  8 5.76
#> 3  8 7.71

The %<-% operator assigned four data frames to four variables, leaving the data frames intact.

Unpacking nested values

In addition to unpacking flat lists, you can unpack lists of lists.

c(a, c(b, d), e) %<-% list("begin", list("middle1", "middle2"), "end")
a
#> [1] "begin"
b
#> [1] "middle1"
d
#> [1] "middle2"
e
#> [1] "end"

Not only does this simplify extracting individual elements, it also adds a level of checking. If the described list structure does not match the actual list structure, an error is raised.

c(a, c(b, d, e), f) %<-% list("begin", list("middle1", "middle2"), "end")
#> Error: invalid `%<-%` right-hand side, incorrect number of values

Splitting a value into its parts

The previous examples dealt with unpacking a list or vector into its elements. You can also split certain kinds of individual values into subvalues.

Character vectors

You can assign individual characters of a string to variables. Since character strings are vectors, do not forget to exclude braces from the assignment expression.

c(ch1, ch2, ch3) %<-% "abc"
ch1
#> [1] "a"
ch2
#> [1] "b"
ch3
#> [1] "c"

Dates

You can split a Date into its year, month, and day, and assign the parts to variables.

c(y, m, d) %<-% Sys.Date()
y
#> [1] 2017
m
#> [1] 8
d
#> [1] 25

Class objects

zeallot includes implementations of destructure for character strings, complex numbers, data frames, date objects, and linear model summaries. However, because destructure is a generic function, you can define new implementations for custom classes. When defining a new implementation keep in mind the implementation needs to return a list so that values are properly unpacked.

Trailing values: the “everything else” clause

In some cases, you want the first few elements of a list or vector but do not care about the trailing elements. The summary function of lm, for example, returns a list of 11 values, and you might want only the first few. Fortunately, there is a way to capture those first few and say “don’t worry about everything else”.

f <- lm(mpg ~ cyl, data = mtcars)

c(fcall, fterms, resids, ...rest) %<-% summary(f)

fcall
#> lm(formula = mpg ~ cyl, data = mtcars)

fterms
#> mpg ~ cyl
#> attr(,"variables")
#> list(mpg, cyl)
#> attr(,"factors")
#>     cyl
#> mpg   0
#> cyl   1
#> attr(,"term.labels")
#> [1] "cyl"
#> attr(,"order")
#> [1] 1
#> attr(,"intercept")
#> [1] 1
#> attr(,"response")
#> [1] 1
#> attr(,".Environment")
#> <environment: R_GlobalEnv>
#> attr(,"predvars")
#> list(mpg, cyl)
#> attr(,"dataClasses")
#>       mpg       cyl 
#> "numeric" "numeric"

head(resids)
#>         Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive 
#>         0.3701643         0.3701643        -3.5814159         0.7701643 
#> Hornet Sportabout           Valiant 
#>         3.8217446        -2.5298357

Here, rest will capture everything else.

str(rest)
#> List of 8
#>  $ coefficients : num [1:2, 1:4] 37.885 -2.876 2.074 0.322 18.268 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "cyl"
#>   .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
#>  $ aliased      : Named logi [1:2] FALSE FALSE
#>   ..- attr(*, "names")= chr [1:2] "(Intercept)" "cyl"
#>  $ sigma        : num 3.21
#>  $ df           : int [1:3] 2 30 2
#>  $ r.squared    : num 0.726
#>  $ adj.r.squared: num 0.717
#>  $ fstatistic   : Named num [1:3] 79.6 1 30
#>   ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
#>  $ cov.unscaled : num [1:2, 1:2] 0.4185 -0.0626 -0.0626 0.0101
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:2] "(Intercept)" "cyl"
#>   .. ..$ : chr [1:2] "(Intercept)" "cyl"

The assignment operator noticed that ...rest is prefixed with ..., and it created a variable called rest for the trailing values of the list. If you omitted the “everything else” prefix, there would be an error because the lengths of the left- and right-hand sides of the assignment would be mismatched.

c(fcall, fterms, resids, rest) %<-% summary(f)
#> Error: invalid `%<-%` right-hand side, incorrect number of values

If multiple collector variables are specified at a particular depth it is ambiguous which values to assign to which collector and an error will be raised.

Leading values and middle values

In addition to collecting trailing values, you can also collect initial values and assign specific remaining values.

c(...skip, e, f) %<-% list(1, 2, 3, 4, 5)
skip
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
e
#> [1] 4
f
#> [1] 5

Or you can assign the first value, skip values, and then assign the last value.

c(begin, ...middle, end) %<-% list(1, 2, 3, 4, 5)
begin
#> [1] 1
middle
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 3
#> 
#> [[3]]
#> [1] 4
end
#> [1] 5

Skipped values: anonymous elements

You can skip one or more values without raising an error by using a period (.) instead of a variable name. For example, you might care only about the min, mean, and max values of a vector’s summary.

c(min_wt, ., ., mean_wt, ., max_wt) %<-% summary(mtcars$wt)
#> Warning in if (!is_list(value)) {: the condition has length > 1 and only
#> the first element will be used
min_wt
#> [1] 1.513
mean_wt
#> [1] 3.21725
max_wt
#> [1] 5.424

By combining an anonymous element (.) with the collector prefix, (...), you can ignore whole sublists.

c(begin, ..., end) %<-% list("hello", "blah", list("blah"), "blah", "world!")
begin
#> [1] "hello"
end
#> [1] "world!"

You can mix periods and collectors together to selectively keep and discard elements.

c(begin, ., ...middle, end) %<-% as.list(1:5)
begin
#> [1] 1
middle
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 4
end
#> [1] 5

It is important to note that although value(s) are skipped they are still expected. The next section touches on how to handle missing values.

Default values: handle missing values

You can specify a default value for a left-hand side variable using =, similar to specifying the default value of a function argument. This comes in handy when the number of elements returned by a function cannot be guaranteed. tail for example may return fewer elements than asked for.

nums <- 1:2
c(x, y) %<-% tail(nums, 2)
x
#> [1] 1
y
#> [1] 2

However, if we tried to get 3 elements and assign them an error would be raised because tail(nums, 3) still returns only 2 values.

c(x, y, z) %<-% tail(nums, 3)
#> Error: invalid `%<-%` right-hand side, incorrect number of values

We can fix the problem and resolve the error by specifying a default value for z.

c(x, y, z = NULL) %<-% tail(nums, 3)
x
#> [1] 1
y
#> [1] 2
z
#> NULL

Swapping values

A handy trick is swapping values without the use of a temporary variable.

c(first, last) %<-% c("Ai", "Genly")
first
#> [1] "Ai"
last
#> [1] "Genly"

c(first, last) %<-% c(last, first)
first
#> [1] "Genly"
last
#> [1] "Ai"

or

cat <- "meow"
dog <- "bark"

c(cat, dog, fish) %<-% c(dog, cat, dog)
cat
#> [1] "bark"
dog
#> [1] "meow"
fish
#> [1] "bark"

Right operator

The magrittr package provides a pipe operator %>% which allows functions to be called in succession instead of nested. The left operator %<-% does not work well with these function chains. Instead, the right operator %->% is recommended. The below example is adapted from the magrittr readme.

library(magrittr)

mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean() %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %->% 
  c(cyl, mpg, ...rest)

cyl
#> [1] 4 6 8
mpg
#> [1] 25.90 19.74 15.10
rest
#> $disp
#> [1] 108.05 183.31 353.10
#> 
#> $hp
#> [1] 111.00 122.29 209.21
#> 
#> $drat
#> [1] 3.94 3.59 3.23
#> 
#> $wt
#> [1] 2.15 3.12 4.00
#> 
#> $qsec
#> [1] 17.75 17.98 16.77
#> 
#> $vs
#> [1] 1.00 0.57 0.00
#> 
#> $am
#> [1] 1.00 0.43 0.14
#> 
#> $gear
#> [1] 4.50 3.86 3.29
#> 
#> $carb
#> [1] 2.00 3.43 3.50
#> 
#> $kpl
#> [1] 11.010090  8.391474  6.419010