As of cdata
version 1.0.8
cdata
implements an operator notation for data transform.
The idea is simple, yet powerful. In fact it helps us find new names for the transforms: multiply to block records and factor to row records. These terms will become clear as we work examples.
First let’s start with some data.
d <- wrapr::build_frame(
"model_id" , "measure", "value" |
1 , "AUC" , 0.7 |
1 , "R2" , 0.4 |
2 , "AUC" , 0.8 |
2 , "R2" , 0.5 )
knitr::kable(d)
model_id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
In the above data we have two measurements each for two individuals (individuals identified by the “model_id
” column). Using cdata
’s rowrecs_to_blocks_spec()
method we can capture a description of this record structure and transformation details.
library("cdata")
transform <- rowrecs_to_blocks_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "model_id")
print(transform)
#> {
#> row_record <- wrapr::qchar_frame(
#> "model_id" , "AUC", "R2" |
#> . , AUC , R2 )
#> row_keys <- c('model_id')
#>
#> # becomes
#>
#> block_record <- wrapr::qchar_frame(
#> "model_id" , "measure", "value" |
#> . , "AUC" , AUC |
#> . , "R2" , R2 )
#> block_keys <- c('model_id', 'measure')
#>
#> # args: c(checkNames = TRUE, checkKeys = TRUE, strict = FALSE)
#> }
Once we have this specification we can transform the data using operator notation.
We can collect the record blocks into rows by a “factor-out” (or aggregation/projection) step.
knitr::kable(d)
model_id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
d2 <- d %//% t(transform)
knitr::kable(d2)
model_id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
We can expand record rows into blocks by a “multiplication” (or join) step.
knitr::kable(d2)
model_id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
d3 <- d2 %**% transform
knitr::kable(d3)
model_id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
(%//%
and %**%
being two operators introduced by the cdata
package.)
And the two operators have an inverse/adjoint relation.
knitr::kable(d)
model_id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
# identity
d4 <- d %//% t(transform) %**% transform
knitr::kable(d4)
model_id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
We can also pipe into the spec (and into its adjoint) using the wrapr
dot pipe operator
.
# reverse or adjoint/transpose operation specification
t_record_spec <- t(transform)
d %.>%
t_record_spec %.>%
knitr::kable(.)
model_id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
# using dot-pipe's bquote style .() execute immediate notation
d %.>%
.(t(transform)) %.>%
knitr::kable(.)
model_id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
# identity
d %.>%
.(t(transform)) %.>%
transform %.>%
knitr::kable(.)
model_id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
And, of course, the exact same functionality for database tables.
have_db <- requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)
raw_connection <- DBI::dbConnect(RSQLite::SQLite(),
":memory:")
RSQLite::initExtension(raw_connection)
db <- rquery::rquery_db_info(
connection = raw_connection,
is_dbi = TRUE,
connection_options = rquery::rq_connection_tests(raw_connection))
d_td <- rquery::rq_copy_to(db, "d", d)
ops <- d_td %//% t(transform)
cat(format(ops))
#> table(`d`;
#> model_id,
#> measure,
#> value) %.>%
#> non_sql_node(., blocks_to_rowrecs(.))
rquery::execute(db, ops) %.>%
knitr::kable(.)
model_id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
d_td %.>%
.(t(transform)) %.>%
rquery::execute(db, .) %.>%
knitr::kable(.)
model_id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
DBI::dbDisconnect(raw_connection)
And that is truly fluid data manipulation.