As of `cdata`

version `1.0.8`

`cdata`

implements an operator notation for data transform.

The idea is simple, yet powerful. In fact it helps us find new names for the transforms: *multiply* to block records and *factor* to row records. These terms will become clear as we work examples.

First let’s start with some data.

```
d <- wrapr::build_frame(
"model_id" , "measure", "value" |
1 , "AUC" , 0.7 |
1 , "R2" , 0.4 |
2 , "AUC" , 0.8 |
2 , "R2" , 0.5 )
knitr::kable(d)
```

model_id | measure | value |
---|---|---|

1 | AUC | 0.7 |

1 | R2 | 0.4 |

2 | AUC | 0.8 |

2 | R2 | 0.5 |

In the above data we have two measurements each for two individuals (individuals identified by the “`model_id`

” column). Using `cdata`

’s `rowrecs_to_blocks_spec()`

method we can capture a description of this record structure and transformation details.

```
library("cdata")
transform <- rowrecs_to_blocks_spec(
wrapr::qchar_frame(
"measure", "value" |
"AUC" , AUC |
"R2" , R2 ),
recordKeys = "model_id")
print(transform)
#> {
#> row_record <- wrapr::qchar_frame(
#> "model_id" , "AUC", "R2" |
#> . , AUC , R2 )
#> row_keys <- c('model_id')
#>
#> # becomes
#>
#> block_record <- wrapr::qchar_frame(
#> "model_id" , "measure", "value" |
#> . , "AUC" , AUC |
#> . , "R2" , R2 )
#> block_keys <- c('model_id', 'measure')
#>
#> # args: c(checkNames = TRUE, checkKeys = TRUE, strict = FALSE)
#> }
```

Once we have this specification we can transform the data using operator notation.

We can collect the record blocks into rows by a “factor-out” (or aggregation/projection) step.

`knitr::kable(d)`

model_id | measure | value |
---|---|---|

1 | AUC | 0.7 |

1 | R2 | 0.4 |

2 | AUC | 0.8 |

2 | R2 | 0.5 |

```
d2 <- d %//% t(transform)
knitr::kable(d2)
```

model_id | AUC | R2 |
---|---|---|

1 | 0.7 | 0.4 |

2 | 0.8 | 0.5 |

We can expand record rows into blocks by a “multiplication” (or join) step.

`knitr::kable(d2)`

model_id | AUC | R2 |
---|---|---|

1 | 0.7 | 0.4 |

2 | 0.8 | 0.5 |

```
d3 <- d2 %**% transform
knitr::kable(d3)
```

model_id | measure | value |
---|---|---|

1 | AUC | 0.7 |

1 | R2 | 0.4 |

2 | AUC | 0.8 |

2 | R2 | 0.5 |

(`%//%`

and `%**%`

being two operators introduced by the `cdata`

package.)

And the two operators have an inverse/adjoint relation.

`knitr::kable(d)`

model_id | measure | value |
---|---|---|

1 | AUC | 0.7 |

1 | R2 | 0.4 |

2 | AUC | 0.8 |

2 | R2 | 0.5 |

```
# identity
d4 <- d %//% t(transform) %**% transform
knitr::kable(d4)
```

model_id | measure | value |
---|---|---|

1 | AUC | 0.7 |

1 | R2 | 0.4 |

2 | AUC | 0.8 |

2 | R2 | 0.5 |

We can also pipe into the spec (and into its adjoint) using the `wrapr`

`dot pipe operator`

.

```
# reverse or adjoint/transpose operation specification
t_record_spec <- t(transform)
d %.>%
t_record_spec %.>%
knitr::kable(.)
```

model_id | AUC | R2 |
---|---|---|

1 | 0.7 | 0.4 |

2 | 0.8 | 0.5 |

```
# using dot-pipe's bquote style .() execute immediate notation
d %.>%
.(t(transform)) %.>%
knitr::kable(.)
```

model_id | AUC | R2 |
---|---|---|

1 | 0.7 | 0.4 |

2 | 0.8 | 0.5 |

```
# identity
d %.>%
.(t(transform)) %.>%
transform %.>%
knitr::kable(.)
```

model_id | measure | value |
---|---|---|

1 | AUC | 0.7 |

1 | R2 | 0.4 |

2 | AUC | 0.8 |

2 | R2 | 0.5 |

And, of course, the exact same functionality for database tables.

```
have_db <- requireNamespace("DBI", quietly = TRUE) &&
requireNamespace("RSQLite", quietly = TRUE)
```

```
raw_connection <- DBI::dbConnect(RSQLite::SQLite(),
":memory:")
RSQLite::initExtension(raw_connection)
db <- rquery::rquery_db_info(
connection = raw_connection,
is_dbi = TRUE,
connection_options = rquery::rq_connection_tests(raw_connection))
d_td <- rquery::rq_copy_to(db, "d", d)
```

```
ops <- d_td %//% t(transform)
cat(format(ops))
#> table(`d`;
#> model_id,
#> measure,
#> value) %.>%
#> non_sql_node(., blocks_to_rowrecs(.))
rquery::execute(db, ops) %.>%
knitr::kable(.)
```

model_id | AUC | R2 |
---|---|---|

1 | 0.7 | 0.4 |

2 | 0.8 | 0.5 |

```
d_td %.>%
.(t(transform)) %.>%
rquery::execute(db, .) %.>%
knitr::kable(.)
```

model_id | AUC | R2 |
---|---|---|

1 | 0.7 | 0.4 |

2 | 0.8 | 0.5 |

`DBI::dbDisconnect(raw_connection)`

And that is truly fluid data manipulation.