Regression and Similarity Evaluation for Memory-Based Learning in Spectral Chemometrics

Leo Ramirez-Lopez & Antoine Stevens

Visit the resemble site here

Installing the package is very simple:

install.packages('resemble')

If the following packages are not already installed, it is sometimes better to install them first:

install.packages('Rcpp')
install.packages('RcppArmadillo')
install.packages('foreach')
install.packages('iterators')

Note: Apart from these packages, we strongly recommend downloading and installing Rtools (directly from here or from CRAN: https://cran.r-project.org/bin/windows/Rtools/). Rtools provides the C++ toolchain that you might need for using resemble.

Then, install resemble

install.packages('C:/MyFolder/resemble-1.2.2.zip', repos = NULL)

The development version can be obtained from the package website.

After installing resemble, you should also be able to run the following lines:

require(resemble)

help(mbl)

#install.packages('prospectr')
require(prospectr)

data(NIRsoil)

Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]

Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]

Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]

# Example of the mbl function
# An mbl approach (the spectrum-based learner) as implemented in Ramirez-Lopez et al. (2013)
# An example where Yu is supposed to be unknown, but Xu (the spectral variables) is known
ctrl <- mblControl(sm = 'pc', pcSelection = list('opc', 40),
                   valMethod = 'NNv', center = TRUE)

sbl.u <- mbl(Yr = Yr, Xr = Xr, Yu = NULL, Xu = Xu,
             mblCtrl = ctrl,
             dissUsage = 'predictors',
             k = seq(40, 150, by = 10),
             method = 'gpr')

getPredictions(sbl.u)

resemble implements a function dedicated to non-linear modelling of complex visible and infrared spectral data based on memory-based learning (MBL, a.k.a instance-based learning or local modelling in the chemometrics literature). The package also includes functions for: computing and evaluating spectral similarity/dissimilarity matrices; projecting the spectra onto low dimensional orthogonal variables; removing irrelevant spectra from a reference set; etc.

The functions for computing and evaluating spectral similarity/dissimilarity matrices can be summarized as follows:

fDiss: Euclidean and Mahalanobis distances as well as the cosine dissimilarity (a.k.a spectral angle mapper)
corDiss: correlation and moving window correlation dissimilarity
sid: spectral information divergence between spectra or between the probability distributions of spectra
orthoDiss: principal components and partial least squares dissimilarity (including several options)
simEval: evaluates a given similarity/dissimilarity matrix based on the concept of side information
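
To make the idea behind these functions more concrete, here is a minimal base R sketch. It does not call resemble: it computes a Euclidean distance matrix and a cosine dissimilarity (spectral angle) matrix by hand, and then scores each matrix with a simple side-information check (how different a property is between each sample and its nearest neighbour). The toy data, the helper names (euclid_diss, cosine_diss, nn_rmsd) and the exact scoring rule are assumptions made for illustration; the package functions may scale or summarize things differently.

```r
# Toy "spectra": 5 samples (rows) x 4 wavelengths (columns); values are made up.
X <- rbind(
  c(0.10, 0.20, 0.30, 0.40),
  c(0.11, 0.21, 0.29, 0.41),
  c(0.40, 0.30, 0.20, 0.10),
  c(0.20, 0.40, 0.60, 0.80),
  c(0.39, 0.31, 0.21, 0.09)
)
# Side information: a property measured on the same samples (also made up).
y <- c(10, 11, 30, 12, 29)

# Hypothetical helper: Euclidean distance between every pair of rows.
euclid_diss <- function(X) as.matrix(dist(X, method = "euclidean"))

# Hypothetical helper: cosine dissimilarity, i.e. the angle (in radians)
# between every pair of rows.
cosine_diss <- function(X) {
  norms <- sqrt(rowSums(X^2))
  cos_sim <- tcrossprod(X) / outer(norms, norms)
  acos(pmin(pmax(cos_sim, -1), 1))  # clip to [-1, 1] to guard against rounding
}

# Hypothetical helper: root mean square difference in y between each sample
# and its nearest neighbour according to the dissimilarity matrix D.
nn_rmsd <- function(D, y) {
  diag(D) <- Inf                    # a sample must not be its own neighbour
  nn <- apply(D, 1, which.min)      # index of the nearest neighbour per row
  sqrt(mean((y - y[nn])^2))
}

D_euc <- euclid_diss(X)
D_cos <- cosine_diss(X)

nn_rmsd(D_euc, y)
nn_rmsd(D_cos, y)
```

A lower value means that spectrally similar samples also tend to be similar in the side information, which is one way to compare candidate dissimilarity measures for a given dataset.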

The functions for projecting the spectra onto low dimensional orthogonal variables are:

pcProjection: projects the spectra onto a principal component space
plsProjection: projects the spectra onto a partial least squares component space (a.k.a projection to latent structures)
orthoProjection: reproduces either the pcProjection or the plsProjection functions

The projection functions also offer different options for optimizing/selecting the number of components involved in the projection.
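
As a rough illustration of what projecting spectra and selecting the number of components can look like, the following base R sketch uses prcomp on simulated data and keeps the smallest number of components that explains at least 99% of the variance. This is not the package's own implementation: the simulated data, the 99% threshold and the cumulative-variance rule are assumptions chosen for the example, and the selection options offered by the projection functions may work differently.

```r
set.seed(1)

# Simulated "spectra": 30 samples x 50 variables driven by 3 latent factors
# plus a small amount of noise (all values are synthetic).
n <- 30
p <- 50
latent_scores   <- matrix(rnorm(n * 3), n, 3)
latent_loadings <- matrix(rnorm(3 * p), 3, p)
X <- latent_scores %*% latent_loadings + matrix(rnorm(n * p, sd = 0.01), n, p)

# Principal component analysis on the mean-centred spectra.
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Cumulative proportion of variance explained by the components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Keep the smallest number of components reaching the (assumed) 99% threshold.
n_comp <- which(cum_var >= 0.99)[1]
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

# Project "new" samples onto the same space: centre with the training means,
# then multiply by the retained loadings.
X_new <- X[1:2, , drop = FALSE]
scores_new <- sweep(X_new, 2, pca$center) %*%
  pca$rotation[, seq_len(n_comp), drop = FALSE]
```

Distances computed on scores rather than on the raw spectra are one way to obtain a principal components dissimilarity.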

The functions modelling the spectra using memory-based learning are:

mblControl: controls some modelling aspects of the mbl function
mbl: models the spectra by memory-based learning

Some additional miscellaneous functions are:

print.mbl: prints a summary of the results obtained by the mbl function
plot.mbl: plots a summary of the results obtained by the mbl function
print.localOrthoDiss: prints local distance matrices generated with the orthoDiss function

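To explain the mbl function in a little more detail, let's first define the basic input datasets:

Xr: the spectra of the reference samples
Yr: the response values of the reference samples
Xu: the spectra of the samples to be predicted
Yu: the response values of the samples to be predicted (may be unknown, in which case Yu = NULL)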

In order to predict each value in Yu, the mbl function takes each sample in Xu and searches in Xr for its k-nearest neighbours (the most spectrally similar samples). A (local) model is then calibrated with these (reference) neighbours and used immediately to predict the corresponding value in Yu from Xu. In the function, the k-nearest neighbour search is performed by computing spectral similarity/dissimilarity matrices between samples. The mbl function offers the following regression options for calibrating the (local) models:

'gpr': Gaussian process with linear kernel
'pls': Partial least squares
'wapls1': Weighted average partial least squares 1
'wapls2': Weighted average partial least squares 2 (no longer supported)
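
The neighbour search and local calibration described above can be sketched in a few lines of base R. This is only a conceptual illustration under stated assumptions, not the mbl implementation: the data are simulated, neighbours are found with plain Euclidean distance, and the local model is an ordinary least squares fit (swapped in here for simplicity in place of 'gpr', 'pls' or 'wapls1'). The helper name local_predict and the value k = 40 are hypothetical.

```r
set.seed(42)

# Simulated reference set (Xr, Yr) and samples to predict (Xu); synthetic values.
p  <- 5
Xr <- matrix(rnorm(200 * p), 200, p)
Xu <- matrix(rnorm(10 * p), 10, p)
# A non-linear response, so that local models have something to adapt to.
Yr <- sin(Xr[, 1]) + Xr[, 2]^2 + rnorm(200, sd = 0.05)

# Hypothetical helper: predict one sample from a model fitted to its
# k nearest neighbours in the reference set.
local_predict <- function(xu, Xr, Yr, k) {
  # Euclidean distance from xu to every reference sample.
  d  <- sqrt(colSums((t(Xr) - xu)^2))
  # Indices of the k most similar reference samples.
  nn <- order(d)[seq_len(k)]
  # Local ordinary least squares fit with an intercept (assumed local model).
  fit <- lm.fit(x = cbind(1, Xr[nn, , drop = FALSE]), y = Yr[nn])
  # Prediction for xu from the local coefficients.
  sum(c(1, xu) * fit$coefficients)
}

# One local model per sample in Xu.
Yu_hat <- vapply(
  seq_len(nrow(Xu)),
  function(i) local_predict(Xu[i, ], Xr, Yr, k = 40),
  numeric(1)
)
```

Because a new model is built for every sample in Xu, the choice of k and of the dissimilarity measure directly shapes each local calibration set, which is why mbl lets you evaluate a sequence of k values.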

Keywords

Bug report and development version

You can send an e-mail to the package maintainer (ramirez.lopez.leo@gmail.com) or create an issue on GitHub.