Custom workflow using low-level parsers

Hugo Gruson

2019-11-13

Some use cases require more flexibility than the high-level, user-friendly functions provided by lightr. For these use cases, lightr also exports the individual low-level parsers, which allow users to code their own custom workflow.

We don’t recommend the use of those functions unless you absolutely have to. Most users should use lr_get_spec() and lr_get_metadata() instead.

A common request from spectral data users is to keep their raw data, without any interpolation. This is not possible in lr_get_spec() because it parses spectra from many different formats and then concatenates them to output a single data frame. For this to be possible, all spectra must be evaluated over the same wavelengths, which is usually not the case at first. So we do need to interpolate them to make sure that they can safely be concatenated afterwards.
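To illustrate why a shared wavelength grid is needed before concatenation, here is a minimal sketch in base R. The two spectra and their values are made up, and approx() is used only as a stand-in for linear interpolation; this is not necessarily how lr_get_spec() proceeds internally.

```r
# Two hypothetical spectra recorded on different wavelength grids (made-up values)
spec_a <- data.frame(wl = c(300, 302, 304, 306), refl = c(10, 12, 14, 16))
spec_b <- data.frame(wl = c(300.5, 302.5, 304.5, 306.5), refl = c(20, 22, 24, 26))

# Binding the raw columns side by side would pair reflectances measured at
# different wavelengths. Interpolating onto a shared grid avoids this.
common_wl <- 301:306
interp_a <- approx(spec_a$wl, spec_a$refl, xout = common_wl)$y
interp_b <- approx(spec_b$wl, spec_b$refl, xout = common_wl)$y

# One wavelength column, then one column per spectrum
combined <- data.frame(wl = common_wl, spec_a = interp_a, spec_b = interp_b)
combined
```

The shared grid is chosen inside the range covered by both spectra, so approx() never has to extrapolate.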

In this vignette, I describe how you can import your spectral data without any interpolation, in the case where you are working with a single file format, with all files produced by the same spectrometer and software.

library(lightr)

Step 1: find all files

jdx_files <- list.files("data/heliomaster", pattern = "jdx$", full.names = TRUE)

Step 2: import individual spectra

first_jdx <- lr_parse_jdx(jdx_files[1])[[1]]
head(first_jdx)
##       wl      dark     white     scope  processed
## 1 176.36 32822.795 32822.795 32822.795        NaN
## 2 176.58 32822.795 32822.795 32822.795        NaN
## 3 176.80 32822.795 32822.795 32822.795        NaN
## 4 177.02  1611.751  1606.017  1643.290 -550.03488
## 5 177.24  1646.976  1555.227  1631.412   16.96367
## 6 177.47  2505.485  2494.426  2548.083 -385.18853

As you can see from this first file, lr_parse_$extension() functions return a data frame with many columns. The meaning of each column is explained in full detail in ?lr_parse_jdx. Here, we are only interested in the first column (the wavelengths) and the last one (the normalised spectral data).

res <- first_jdx[, c("wl", "processed")]

Step 3: create a loop

We captured the wavelengths from the first spectrum. We don’t need to save them again each time because, in this example, they are the same for all spectra. So we only record the “processed” column:

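# Append the "processed" column of each remaining file.
# seq_along(...)[-1] is empty when there is a single file, unlike 2:length(...)
for (i in seq_along(jdx_files)[-1]) {
  next_jdx <- lr_parse_jdx(jdx_files[i])[[1]]

  res <- cbind(res, next_jdx[, "processed"])
}
# Name the columns: wavelengths first, then one column per spectrum
colnames(res) <- c("wl", paste0("spec", seq_along(jdx_files)))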
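The loop above relies on all files sharing the same wavelengths. If you are unsure whether this holds for your own data, a check along the following lines may help. This is only a sketch: the `parsed` list stands in for the data frames returned by the parser, and its values are made up.

```r
# Stand-ins for parsed files: hypothetical data frames with "wl" and "processed" columns
parsed <- list(
  data.frame(wl = c(300, 301, 302), processed = c(1, 2, 3)),
  data.frame(wl = c(300, 301, 302), processed = c(4, 5, 6))
)

# Compare every wavelength vector with the first one and stop if any differs
reference_wl <- parsed[[1]]$wl
same_wl <- vapply(
  parsed,
  function(x) isTRUE(all.equal(x$wl, reference_wl)),
  logical(1)
)
stopifnot(all(same_wl))

# Safe to bind: one wavelength column, then one column per spectrum
spectra <- do.call(cbind, lapply(parsed, function(x) x$processed))
res_check <- data.frame(wl = reference_wl, spectra)
colnames(res_check) <- c("wl", paste0("spec", seq_along(parsed)))
```

If the check fails, the spectra cannot be bound column-wise without interpolation.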

And it’s done: we can now convert res to an rspec object and use it in our analyses. The spectrometer I used for those measurements is not reliable outside the 300-700 nm wavelength range, so we will only keep this range:

library(pavo)
## Welcome to pavo 2! Take a look at the latest features (and update your bibliography) in our recent publication: Maia R., Gruson H., Endler J. A., White T. E. (2019) pavo 2: new tools for the spectral and spatial analysis of colour in R. Methods in Ecology and Evolution, 10, 1097-1107.
res <- na.omit(res)
res <- res[res$wl > 300 & res$wl < 700, ]
res <- as.rspec(res, interp = FALSE, whichwl = 1)
## The spectral data contain 2 negative value(s), which may produce unexpected results if used in models. Consider using procspec() to correct them.
plot(res)

Bonus: one-liner with the tidyverse

The following snippet should work for all file types supported by lightr:

library(tidyverse)
library(fs)

get_uninterp <- function(path, extension) {
  dir_ls(path = path, glob = extension) %>%
    map_dfc(function(file) lightr:::dispatch_parser(file)[[1]]) %>%
    select(wl, starts_with("processed"))
}

get_uninterp("data/heliomaster/", extension = "*.jdx")

Caveats

The workflow presented in this vignette might be useful if you want to export your data in a human-readable format while retaining as much information as possible.
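As a sketch of such an export, the uninterpolated data can be written to a CSV file with base R. The `res_export` data frame below is a made-up stand-in with the same layout as a plain data frame of wavelengths and spectra, and the file is written to a temporary location.

```r
# Hypothetical uninterpolated data: one wavelength column, one column per spectrum
res_export <- data.frame(
  wl    = c(300.12, 300.34, 300.57),
  spec1 = c(10.2, 10.8, 11.1),
  spec2 = c(20.5, 20.9, 21.4)
)

# Write to a temporary CSV file, without row names
out_file <- tempfile(fileext = ".csv")
write.csv(res_export, out_file, row.names = FALSE)

# Read the file back to confirm that the values survive the round trip
reread <- read.csv(out_file)
```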

But be careful if you intend to use this uninterpolated data in your analyses. Many statistics and models will produce bogus results when fed uninterpolated data.

One reason is that wavelengths are not always evenly distributed within the sampling range (they can result from a \(3^{rd}\) to \(5^{th}\) order polynomial). Because of this, some regions of the wavelength range will be more heavily sampled than others, so statistics such as \(S1\) (see summary.rspec() in pavo) may not make any sense.
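The effect of uneven sampling can be seen with a toy example. The spectrum and both wavelength grids below are made up, and a plain mean is used as a simple stand-in for a summary statistic; it is not how pavo computes its colourimetric variables.

```r
# Hypothetical spectrum: reflectance rises linearly from 0 at 300 nm to 100 at 700 nm
refl <- function(wl) (wl - 300) / 4

# Uneven grid: short wavelengths are sampled far more densely than long ones
wl_uneven <- c(300, 310, 320, 330, 340, 350, 500, 700)
# Even grid over the same range
wl_even <- seq(300, 700, by = 50)

# The naive mean is pulled towards the densely sampled region
mean(refl(wl_uneven))
## [1] 23.4375
mean(refl(wl_even))
## [1] 50
```

Both grids cover the same 300-700 nm range, yet the unevenly sampled one gives a mean less than half that of the evenly sampled one.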