An implementation of common higher-order functions with syntactic sugar for anonymous functions. It also provides a link to ‘dplyr’ for common transformations on data frames that works around non-standard evaluation by default.
Install from GitHub:
devtools::install_github("wahani/dat")
Or install from CRAN:
install.packages("dat")
- You have to work around non-standard evaluation to satisfy R CMD check. And you don’t like that.
- dplyr does not respect the class of the object it operates on; the class attribute changes on the fly.
- Neither dplyr nor data.table plays nicely with S4, but you really, really want an S4 data.table or tbl_df.
- You like the syntactic sugar for anonymous functions found in rlist and purrr.
The examples are from the introductory vignette of dplyr. You still work with data frames, so you can simply mix in dplyr features whenever you need them. The functions filtar, mutar and sumar are R CMD check friendly replacements for the corresponding versions in dplyr. For select you can use extract. The function names are chosen so that they are similar to, but do not conflict with, dplyr's function names, so dplyr can be safely attached to the search path.
library("nycflights13")
library("dat")
## Loading required package: aoos
##
## Attaching package: 'dat'
## The following object is masked from 'package:base':
##
## replace
filtar can be used as a replacement for filter and slice. When you reference a variable in the data itself, you can indicate this by using a one-sided formula.
filtar(flights, ~ month == 1 & day == 1)
filtar(flights, 1:10)
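To make the role of the one-sided formula concrete, here is a minimal base R sketch of how such a formula can be evaluated against a data frame. The helper `filter_rows` is hypothetical and only illustrates the idea; it is not how dat implements filtar.

```r
# Hypothetical helper, not part of dat: evaluate the right-hand side of a
# one-sided formula with the columns of `data` in scope.
filter_rows <- function(data, f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  # f[[2]] is the expression after the ~; names that are not columns are
  # looked up in the environment where the formula was created.
  idx <- eval(f[[2L]], data, environment(f))
  data[idx, , drop = FALSE]
}

df <- data.frame(month = c(1, 1, 2), day = c(1, 2, 1))
res <- filter_rows(df, ~ month == 1 & day == 1)
nrow(res)  # 1
```

Because the formula is an ordinary R object, a call like this contains no unquoted column names at the top level of the calling code, which is the property that keeps static checks quiet.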
And for sorting:
filtar(flights, ~ order(year, month, day))
## # A tibble: 336,776 x 19
## year month day dep_t… sched_… dep_d… arr_… sched… arr_d… carr… flig…
## <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> <int>
## 1 2013 1 1 517 515 2.00 830 819 11.0 UA 1545
## 2 2013 1 1 533 529 4.00 850 830 20.0 UA 1714
## 3 2013 1 1 542 540 2.00 923 850 33.0 AA 1141
## 4 2013 1 1 544 545 -1.00 1004 1022 -18.0 B6 725
## 5 2013 1 1 554 600 -6.00 812 837 -25.0 DL 461
## 6 2013 1 1 554 558 -4.00 740 728 12.0 UA 1696
## 7 2013 1 1 555 600 -5.00 913 854 19.0 B6 507
## 8 2013 1 1 557 600 -3.00 709 723 -14.0 EV 5708
## 9 2013 1 1 557 600 -3.00 838 846 - 8.00 B6 79
## 10 2013 1 1 558 600 -2.00 753 745 8.00 AA 301
## # ... with 336,766 more rows, and 8 more variables: tailnum <chr>,
## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## # minute <dbl>, time_hour <dttm>
You can use characters, logicals, regular expressions and functions to select columns. Regular expressions are indicated by a leading “^”. Character vectors are simply passed to dplyr::select_.
flights %>%
  extract(c("year", "month", "day")) %>%
  extract("year:day") %>%
  extract("^day$") %>%
  extract(is.numeric)
The main difference between mutate and mutar is that you use a ~ instead of =.
mutar(
  flights,
  gain ~ arr_delay - dep_delay,
  speed ~ distance / air_time * 60
)
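The two-sided formula can be read as "new column name ~ expression". A minimal base R sketch of that reading follows; `add_column` is a hypothetical helper for illustration, not dat's implementation of mutar.

```r
# Hypothetical helper, not part of dat: treat the left-hand side of a
# two-sided formula as the new column name and the right-hand side as the
# expression to evaluate within the data.
add_column <- function(data, f) {
  stopifnot(inherits(f, "formula"), length(f) == 3L)
  name <- as.character(f[[2L]])  # left-hand side, e.g. "gain"
  data[[name]] <- eval(f[[3L]], data, environment(f))
  data
}

df <- data.frame(arr_delay = c(11, 20), dep_delay = c(2, 4))
res <- add_column(df, gain ~ arr_delay - dep_delay)
res$gain  # 9 16
```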
Grouping data is handled within mutar:
mutar(flights, n ~ n(), by = "month")
sumar(flights, delay ~ mean(dep_delay, na.rm = TRUE), by = "month")
You can also provide additional arguments to a formula. This is especially helpful when you want to pass arguments from a function to such expressions. The additional argument, given after a |, can be anything that you can use to select columns (character, regular expression, function) or a named list where each element is a character.
sumar(
  flights,
  .n ~ mean(.n, na.rm = TRUE) | "^.*delay$",
  x ~ mean(x, na.rm = TRUE) | list(x = "arr_time"),
  by = "month"
)
## # A tibble: 12 x 4
## month dep_delay arr_delay arr_time
## <int> <dbl> <dbl> <dbl>
## 1 1 10.0 6.13 1523
## 2 2 10.8 5.61 1522
## 3 3 13.2 5.81 1510
## 4 4 13.9 11.2 1501
## 5 5 13.0 3.52 1503
## 6 6 20.8 16.5 1468
## 7 7 21.7 16.7 1456
## 8 8 12.6 6.04 1495
## 9 9 6.72 - 4.02 1504
## 10 10 6.24 - 0.167 1520
## 11 11 5.44 0.461 1523
## 12 12 16.6 14.9 1505
Using this package you can create S4 classes that contain a data frame (or a data.table) and use the interface to dplyr. Neither dplyr nor data.table supports integration with S4. The main function here is mutar, which is generic enough to link to subsetting of rows and columns as well as mutate and summarise. In the background, dplyr's ability to work on a data.table is being used.
library("data.table")
setClass("DataTable", "data.table")
DataTable <- function(...) {
  new("DataTable", data.table::data.table(...))
}
setMethod("[", "DataTable", mutar)
dtflights <- do.call(DataTable, nycflights13::flights)
dtflights[1:10, "year:day"]
dtflights[n ~ n(), by = "month"]
dtflights[n ~ n(), sby = "month"]
dtflights %>%
  filtar(~month > 6) %>%
  mutar(n ~ n(), by = "month") %>%
  sumar(n ~ first(n), by = "month")
Inspired by rlist and purrr, some low-level operations on vectors are supported. The aim here is to integrate syntactic sugar for anonymous functions. Furthermore, the functions should support the use of pipes.
- map and flatmap as replacements for the apply functions
- extract for subsetting
- replace for replacing elements in a vector
What we can do with map:
map(1:3, ~ .^2)
flatmap(1:3, ~ .^2)
map(1:3 ~ 11:13, c) # zip
dat <- data.frame(x = 1, y = "")
map(dat, x ~ x + 1, is.numeric)
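The formula shorthand for anonymous functions used above can be sketched in base R. The helper `as_function` below is hypothetical and only shows one way a one-sided formula with a `.` placeholder could be turned into a function; dat's own conversion may differ.

```r
# Hypothetical helper, not part of dat: turn a one-sided formula such as
# ~ .^2 into a function with a single argument named `.`.
as_function <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)
  fun <- function(.) NULL
  body(fun) <- f[[2L]]                # expression after the ~
  environment(fun) <- environment(f)  # keep the caller's variables visible
  fun
}

square <- as_function(~ .^2)
res <- lapply(1:3, square)
unlist(res)  # 1 4 9
```

With a conversion like this in place, any function that accepts a function argument can also accept a formula, which is what keeps calls inside a pipe short.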
What we can do with extract:
extract(1:10, ~ . %% 2 == 0) %>% sum
extract(1:15, ~ 15 %% . == 0)
l <- list(aList = list(x = 1), aAtomic = "hi")
extract(l, "^aL")
extract(l, is.atomic)
What we can do with replace:
replace(c(1, 2, NA), is.na, 0)
replace(c(1, 2, NA), rep(TRUE, 3), 0)
replace(c(1, 2, NA), 3, 0)
replace(list(x = 1, y = 2), "x", 0)
replace(list(x = 1, y = 2), "^x$", 0)
replace(list(x = 1, y = "a"), is.character, NULL)