Advanced REDCapR Operations

2017-05-18

This vignette covers the less-typical uses of REDCapR to interact with REDCap through its API.

Next Steps

Set project-wide values.

Some information is specific to a REDCap project rather than to an individual operation: (1) the URI of the server, and (2) the token for the user’s project. The project used here is hosted on a machine used in REDCapR’s public test suite, so you can run this example from any computer, unless the tests happen to be running at that moment.

library(REDCapR) #Load the package into the current R session.
uri                   <- "https://bbmc.ouhsc.edu/redcap/api/"
token_simple          <- "9A81268476645C4E5F03428B8AC3AA7B"
token_longitudinal    <- "0434F0E9CF53ED0587847AB6E51DE762"
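Since these values are reused by every later call, it can help to check them once up front. The following is a minimal sketch in base R, assuming that a token is 32 hexadecimal characters (the shape of the two tokens above). `is_plausible_token()` is a hypothetical helper written for this illustration; it is not part of REDCapR.

```r
# Hypothetical helper (not part of REDCapR): checks that a token is a single
# string of 32 hexadecimal characters, the shape of the two tokens above.
is_plausible_token <- function(token) {
  is.character(token) && length(token) == 1L &&
    grepl("^[0-9A-Fa-f]{32}$", token)
}

token_simple       <- "9A81268476645C4E5F03428B8AC3AA7B"
token_longitudinal <- "0434F0E9CF53ED0587847AB6E51DE762"

is_plausible_token(token_simple)
is_plausible_token(token_longitudinal)
is_plausible_token("not-a-token")
```

A check like this only catches typos and truncated copies; whether the server accepts the token is known only after a request is made.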

Converting from tall/long to wide

Disclaimer: Occasionally we’re asked to convert a longitudinal dataset from a “long/tall format” (where typically each row is one observation for a participant) to a “wide format” (where each row is one participant). Usually we advise against it. Besides all the database benefits of a long structure, a wide structure restricts your options for statistical routines. No modern longitudinal analysis procedures (eg, growth curve models or multilevel/hierarchical models) accept wide data. You’re pretty much stuck with repeated measures ANOVA, which is very inflexible for real-world medical-ish analyses. It requires a patient to have a measurement at every time point; otherwise the ANOVA excludes the patient entirely.
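The exclusion problem can be seen with a small example. This is only a sketch in base R with invented toy data (two patients, one of whom missed the third visit); the complete-case filter stands in for what a procedure that requires complete rows would do.

```r
# Toy data (invented for illustration): patient 2 missed visit v3.
ds_toy_long <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  visit = c("v1", "v2", "v3", "v1", "v2"),
  score = c(5, 6, 7, 4, 5)
)

# The long format keeps both of patient 2's observations.
n_obs_patient_2 <- sum(ds_toy_long$id == 2)

# Widen with base R; patient 2 gets an NA in the `score.v3` column.
ds_toy_wide <- stats::reshape(
  ds_toy_long, idvar = "id", timevar = "visit", direction = "wide"
)

# Keeping only complete rows removes patient 2 entirely.
ds_toy_complete <- ds_toy_wide[stats::complete.cases(ds_toy_wide), ]
ds_toy_complete$id
```

In the long table, patient 2 still contributes two observations; in the wide table restricted to complete rows, that patient contributes nothing.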

However, we like going wide to produce visual tables for publications, and here’s one way to do it in R. First, retrieve the dataset from REDCap.

library(magrittr)
suppressPackageStartupMessages(requireNamespace("dplyr"))
suppressPackageStartupMessages(requireNamespace("tidyr"))
events_to_retain  <- c("dose_1_arm_1", "visit_1_arm_1", "dose_2_arm_1", "visit_2_arm_1")

ds_long <- REDCapR::redcap_read_oneshot(redcap_uri=uri, token=token_longitudinal)$data
#> 18 records and 125 columns were read from REDCap in 0.4 seconds.  The http status code was 200.
ds_long %>% 
  dplyr::select(study_id, redcap_event_name, pmq1, pmq2, pmq3, pmq4)
study_id  redcap_event_name         pmq1  pmq2  pmq3  pmq4
100       enrollment_arm_1          NA    NA    NA    NA
100       dose_1_arm_1              2     2     1     1
100       visit_1_arm_1             1     0     0     0
100       dose_2_arm_1              3     1     0     0
100       visit_2_arm_1             0     1     0     0
100       final_visit_arm_1         NA    NA    NA    NA
220       enrollment_arm_1          NA    NA    NA    NA
220       dose_1_arm_1              0     1     0     2
220       visit_1_arm_1             0     3     1     0
220       dose_2_arm_1              1     2     0     1
220       visit_2_arm_1             3     4     1     0
220       final_visit_arm_1         NA    NA    NA    NA
304       enrollment_arm_2          NA    NA    NA    NA
304       deadline_to_opt_ou_arm_2  NA    NA    NA    NA
304       first_dose_arm_2          0     1     0     0
304       first_visit_arm_2         2     0     0     0
304       final_visit_arm_2         NA    NA    NA    NA
304       deadline_to_return_arm_2  NA    NA    NA    NA

When widening only one variable (eg, pmq1), the code’s pretty simple:

ds_wide <- ds_long %>% 
  dplyr::select(study_id, redcap_event_name, pmq1) %>% 
  dplyr::filter(redcap_event_name %in% events_to_retain) %>% 
  tidyr::spread(key=redcap_event_name, value=pmq1)
ds_wide
study_id  dose_1_arm_1  dose_2_arm_1  visit_1_arm_1  visit_2_arm_1
100       2             3             1              0
220       0             1             0              3

When widening more than one variable (eg, pmq1 - pmq4), it’s usually easiest to go even longer/taller (eg, ds_eav) before reversing direction and going wide:

pattern <- "^(\\w+?)_arm_(\\d)$"

ds_eav <- ds_long %>% 
  dplyr::select(study_id, redcap_event_name, pmq1, pmq2, pmq3, pmq4) %>% 
  dplyr::mutate(
    event      = sub(pattern, "\\1", redcap_event_name),
    arm        = as.integer(sub(pattern, "\\2", redcap_event_name))
  ) %>% 
  dplyr::select(study_id, event, arm, pmq1, pmq2, pmq3, pmq4) %>% 
  tidyr::gather(key=key, value=value, pmq1, pmq2, pmq3, pmq4) %>% 
  dplyr::filter(!(event %in% c(
    "enrollment", "final_visit", "deadline_to_return", "deadline_to_opt_ou")
  )) %>% 
  dplyr::mutate( # Simulate correcting for mismatched names across arms:
    event = dplyr::recode(event, "first_dose"="dose_1", "first_visit"="visit_1"),
    key = paste0(event, "_", key)
  ) %>% 
  dplyr::select(-event)
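The regular expression does the work of splitting each event name into its event and arm parts. As a small base R sketch of that step alone, applied to three event names taken from the output above:

```r
pattern <- "^(\\w+?)_arm_(\\d)$"

event_names <- c(
  "dose_1_arm_1", "first_visit_arm_2", "deadline_to_opt_ou_arm_2"
)

# The first capture group is everything before "_arm_".
events <- sub(pattern, "\\1", event_names)

# The second capture group is the single digit after "_arm_".
arms <- as.integer(sub(pattern, "\\2", event_names))

data.frame(event_names, events, arms)
```

Note that `(\\d)` captures exactly one digit, so an event in arm 10 or higher would not match; `sub()` would then return the name unchanged, and the pattern would need `(\\d+)` instead.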

# Show the first 10 rows of the EAV table.
ds_eav %>% 
  head(10)
study_id  arm  key           value
100       1    dose_1_pmq1   2
100       1    visit_1_pmq1  1
100       1    dose_2_pmq1   3
100       1    visit_2_pmq1  0
220       1    dose_1_pmq1   0
220       1    visit_1_pmq1  0
220       1    dose_2_pmq1   1
220       1    visit_2_pmq1  3
304       2    dose_1_pmq1   0
304       2    visit_1_pmq1  2
# Spread the EAV to wide.
ds_wide <- ds_eav %>% 
  tidyr::spread(key=key, value=value)
ds_wide
study_id  arm  dose_1_pmq1  dose_1_pmq2  dose_1_pmq3  dose_1_pmq4  dose_2_pmq1  dose_2_pmq2  dose_2_pmq3  dose_2_pmq4  visit_1_pmq1  visit_1_pmq2  visit_1_pmq3  visit_1_pmq4  visit_2_pmq1  visit_2_pmq2  visit_2_pmq3  visit_2_pmq4
100       1    2            2            1            1            3            1            0            0            1             0             0             0             0             1             0             0
220       1    0            1            0            2            1            2            0            1            0             3             1             0             3             4             1             0
304       2    0            1            0            0            NA           NA           NA           NA           2             0             0             0             NA            NA            NA            NA
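Releases of tidyr newer than the 0.6.3 used for this report provide `pivot_wider()`, which supersedes `spread()` and can widen several variables without the intermediate EAV step. The following is a sketch rather than a drop-in replacement: it assumes tidyr 1.1.0 or later (for the `names_glue` argument), and it uses a small hand-typed subset of the rows shown above (two events, two variables) with event names already harmonized across arms.

```r
requireNamespace("tidyr") # Assumes tidyr >= 1.1.0 (for `names_glue`).

# A hand-typed subset of the data above; "first_dose" and "first_visit"
# in arm 2 are already renamed to "dose_1" and "visit_1".
ds_small <- data.frame(
  study_id = c(100L, 100L, 220L, 220L, 304L, 304L),
  arm      = c(1L, 1L, 1L, 1L, 2L, 2L),
  event    = c("dose_1", "visit_1", "dose_1", "visit_1", "dose_1", "visit_1"),
  pmq1     = c(2L, 1L, 0L, 0L, 0L, 2L),
  pmq2     = c(2L, 0L, 1L, 3L, 1L, 0L)
)

# Widen both variables at once; `names_glue` builds names like "dose_1_pmq1".
ds_wide_small <- tidyr::pivot_wider(
  ds_small,
  id_cols     = c(study_id, arm),
  names_from  = event,
  values_from = c(pmq1, pmq2),
  names_glue  = "{event}_{.value}"
)
ds_wide_small
```

The column order can differ from the alphabetical order that `spread()` produces, so select columns by name rather than by position.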

SSL Options

The official cURL site discusses the process of using SSL to verify the server being connected to.

Use the SSL cert file that comes with the openssl package.

cert_location <- system.file("cacert.pem", package="openssl")
if( file.exists(cert_location) ) {
  config_options         <- list(cainfo=cert_location)
  ds_different_cert_file <- redcap_read_oneshot(
    redcap_uri     = uri,
    token          = token_simple,
    config_options = config_options
  )$data
}
#> 5 records and 24 columns were read from REDCap in 0.5 seconds.  The http status code was 200.
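In the example above, `ds_different_cert_file` is never created when the cert file is missing. One alternative is to decide on the options first and then make a single call. The sketch below is base R only; `build_cert_config()` is a hypothetical helper (not part of REDCapR), and it assumes that passing `NULL` as `config_options` leaves the default certificate handling in place.

```r
# Hypothetical helper (not part of REDCapR): returns a `config_options` list
# pointing at `cert_path` when that file exists, and NULL otherwise.
# Assumption: NULL means "use the default options".
build_cert_config <- function(cert_path) {
  if (nzchar(cert_path) && file.exists(cert_path)) {
    list(cainfo = cert_path)
  } else {
    NULL
  }
}

# system.file() returns "" when the package or file is not installed.
missing_config <- build_cert_config("")

# A temporary empty file stands in for a real certificate bundle here.
fake_cert <- tempfile(fileext = ".pem")
invisible(file.create(fake_cert))
found_config <- build_cert_config(fake_cert)
```

The result of `build_cert_config(system.file("cacert.pem", package="openssl"))` could then be passed as `config_options` in the call shown above.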

Force the connection to use SSL version 3 (which is not preferred, and possibly insecure).

config_options <- list(sslversion=3)
ds_ssl_3 <- redcap_read_oneshot(
  redcap_uri     = uri,
  token          = token_simple,
  config_options = config_options
)$data
#> 5 records and 24 columns were read from REDCap in 0.4 seconds.  The http status code was 200.
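
Skip verification of the server’s certificate (which is insecure, and suitable only for troubleshooting).

config_options <- list(ssl.verifypeer=FALSE)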
ds_no_ssl <- redcap_read_oneshot(
   redcap_uri     = uri,
   token          = token_simple,
   config_options = config_options
)$data
#> 5 records and 24 columns were read from REDCap in 0.5 seconds.  The http status code was 200.

Session Information

For the sake of documentation and reproducibility, the current report was rendered in the following environment.

Environment

#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.0 (2017-04-21)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_US                       
#>  collate  C                           
#>  tz       America/Chicago             
#>  date     2017-05-18
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                           
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.0)                   
#>  backports    1.0.5      2017-01-18 CRAN (R 3.3.1)                   
#>  base       * 3.4.0      2017-04-21 local                            
#>  bindr        0.1        2016-11-13 cran (@0.1)                      
#>  bindrcpp   * 0.1        2016-12-11 cran (@0.1)                      
#>  compiler     3.4.0      2017-04-21 local                            
#>  curl         2.6        2017-04-27 CRAN (R 3.4.0)                   
#>  data.table   1.10.4     2017-02-01 CRAN (R 3.3.1)                   
#>  datasets   * 3.4.0      2017-04-21 local                            
#>  devtools     1.13.1     2017-05-13 CRAN (R 3.4.0)                   
#>  digest       0.6.12     2017-01-27 CRAN (R 3.3.1)                   
#>  dplyr        0.5.0.9005 2017-05-18 Github (tidyverse/dplyr@aece1a5) 
#>  evaluate     0.10       2016-10-11 CRAN (R 3.3.1)                   
#>  glue         1.0.0      2017-04-17 CRAN (R 3.4.0)                   
#>  graphics   * 3.4.0      2017-04-21 local                            
#>  grDevices  * 3.4.0      2017-04-21 local                            
#>  highr        0.6        2016-05-09 CRAN (R 3.3.0)                   
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                   
#>  httr         1.2.1      2016-07-03 CRAN (R 3.3.1)                   
#>  kableExtra   0.1.0      2017-03-02 CRAN (R 3.3.1)                   
#>  knitr      * 1.16       2017-05-18 CRAN (R 3.4.0)                   
#>  magrittr   * 1.5        2014-11-22 CRAN (R 3.3.0)                   
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                   
#>  methods    * 3.4.0      2017-04-21 local                            
#>  R6           2.2.1      2017-05-10 CRAN (R 3.4.0)                   
#>  Rcpp         0.12.10    2017-03-19 CRAN (R 3.3.1)                   
#>  REDCapR    * 0.9.8      2017-05-18 local                            
#>  rlang        0.1.1      2017-05-18 CRAN (R 3.4.0)                   
#>  rmarkdown    1.5        2017-04-26 CRAN (R 3.4.0)                   
#>  rprojroot    1.2        2017-01-16 CRAN (R 3.3.1)                   
#>  rstudioapi   0.6        2016-06-27 CRAN (R 3.3.1)                   
#>  rvest        0.3.2      2016-06-17 CRAN (R 3.3.1)                   
#>  selectr      0.3-1      2016-12-19 CRAN (R 3.3.1)                   
#>  stats      * 3.4.0      2017-04-21 local                            
#>  stringi      1.1.5      2017-04-07 CRAN (R 3.3.3)                   
#>  stringr      1.2.0      2017-02-18 CRAN (R 3.3.1)                   
#>  tibble       1.3.1      2017-05-18 Github (tidyverse/tibble@8f30072)
#>  tidyr        0.6.3      2017-05-15 CRAN (R 3.4.0)                   
#>  tools        3.4.0      2017-04-21 local                            
#>  utils      * 3.4.0      2017-04-21 local                            
#>  withr        1.0.2      2016-06-20 CRAN (R 3.3.0)                   
#>  XML          3.98-1.7   2017-05-03 CRAN (R 3.4.0)                   
#>  xml2         1.1.1      2017-01-24 CRAN (R 3.3.1)                   
#>  yaml         2.1.14     2016-11-12 CRAN (R 3.3.1)

Report rendered by wibeasley at 2017-05-18, 12:45 -0500 in 2 seconds.