*Leo Ramirez-Lopez & Antoine Stevens*

Visit the `resemble` site here

Installing the package is very simple:

`install.packages('resemble')`

If you do not have the following packages installed, it is sometimes better to install them first:

```
install.packages('Rcpp')
install.packages('RcppArmadillo')
install.packages('foreach')
install.packages('iterators')
```
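As a minimal sketch, the check for missing packages can be done in base R before installing anything. The helper name `missing_pkgs` is hypothetical and not part of `resemble`. The install call is left commented out so the snippet can be run without side effects.

```r
# Packages this README suggests installing first
pkgs <- c("Rcpp", "RcppArmadillo", "foreach", "iterators")

# Hypothetical helper: return the packages that cannot be found
# in the current library paths (nothing is loaded or installed here)
missing_pkgs <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

to_install <- missing_pkgs(pkgs)
print(to_install)

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```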

**Note**: Apart from these packages, we strongly recommend downloading and installing Rtools (directly from here or from CRAN https://cran.r-project.org/bin/windows/Rtools/). This is important for obtaining the proper C++ toolchain that you might need for using `resemble`.

Then, install `resemble`:

`install.packages('C:/MyFolder/resemble-1.2.2.zip', repos = NULL)`

The development version can be obtained at the package website.

After installing `resemble`, you should also be able to run the following lines:

```
require(resemble)
help(mbl)
#install.packages('prospectr')
require(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]
Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]
# Example of the mbl function
# An mbl approach (the spectrum-based learner) as implemented in Ramirez-Lopez et al. (2013)
# An example where Yu is supposed to be unknown, but Xu (the spectral variables) is known
ctrl <- mblControl(sm = 'pc', pcSelection = list('opc', 40),
                   valMethod = 'NNv', center = TRUE)
sbl.u <- mbl(Yr = Yr, Xr = Xr, Yu = NULL, Xu = Xu,
             mblCtrl = ctrl,
             dissUsage = 'predictors',
             k = seq(40, 150, by = 10),
             method = 'gpr')
getPredictions(sbl.u)
```

`resemble` implements a function dedicated to non-linear modelling of complex visible and infrared spectral data based on memory-based learning (MBL, *a.k.a.* instance-based learning or local modelling in the chemometrics literature). The package also includes functions for computing and evaluating spectral similarity/dissimilarity matrices, projecting the spectra onto low-dimensional orthogonal variables, removing irrelevant spectra from a reference set, etc.

The functions for computing and evaluating spectral similarity/dissimilarity matrices can be summarized as follows:

- **`fDiss`**: Euclidean and Mahalanobis distances as well as the cosine dissimilarity
- **`corDiss`**
- **`sid`**
- **`orthoDiss`**
- **`simEval`**
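To make the difference between two of these measures concrete, here is a small base R sketch with made-up spectra. It does not call `resemble` and is not the package's implementation. The helper `cosine_diss` is a hypothetical name, and the cosine dissimilarity is assumed here to be the angle between two spectra.

```r
# Toy "spectra": 3 samples (rows) by 4 wavelengths (columns); values are made up
X <- rbind(
  a = c(0.10, 0.20, 0.30, 0.40),
  b = c(0.20, 0.40, 0.60, 0.80),  # same shape as a, twice the intensity
  c = c(0.40, 0.30, 0.20, 0.10)   # reversed shape
)

# Euclidean distance between every pair of samples
euclid_diss <- as.matrix(dist(X, method = "euclidean"))

# Hypothetical helper: cosine dissimilarity as the angle (radians) between spectra
cosine_diss <- function(X) {
  norms <- sqrt(rowSums(X^2))
  cos_sim <- tcrossprod(X) / outer(norms, norms)
  cos_sim <- pmin(pmax(cos_sim, -1), 1)  # guard against rounding outside [-1, 1]
  acos(cos_sim)
}

angle_diss <- cosine_diss(X)

print(round(euclid_diss, 3))
print(round(angle_diss, 3))
```

Samples `a` and `b` differ only by a scaling factor, so the Euclidean distance separates them while the angle between them is (numerically) zero. Which behaviour is preferable depends on whether intensity differences matter for the application.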

The functions for projecting the spectra onto low dimensional orthogonal variables are:

- **`pcProjection`**: projects the spectra onto a principal component space
- **`plsProjection`**
- **`orthoProjection`**: offers the functionality of either the `pcProjection` or the `plsProjection` functions

The projection functions also offer different options for optimizing/selecting the number of components involved in the projection.
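The idea of projecting spectra onto a few orthogonal variables can be sketched with base R's `prcomp` on made-up data. This is not how `resemble` implements its projection functions, and the 90 % cumulative-variance rule used to pick the number of components is only an assumption chosen for illustration.

```r
set.seed(1)

# Made-up "spectra": 20 samples (rows) by 50 wavelengths (columns)
X <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)

# Principal component analysis on the centred spectra
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Cumulative share of variance explained by the leading components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Assumed selection rule: smallest number of components reaching 90 % of the variance
n_comp <- which(cum_var >= 0.90)[1]

# Scores: the samples expressed in the low-dimensional orthogonal space
scores <- pca$x[, seq_len(n_comp), drop = FALSE]

print(dim(scores))
```

The columns of `scores` are mutually orthogonal, which is what makes distances computed in this space less sensitive to the strong collinearity typical of spectral variables.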

The functions modelling the spectra using memory-based learning are:

- **`mblControl`**: controls some modelling aspects of the `mbl` function
- **`mbl`**

Some additional miscellaneous functions are:

- **`print.mbl`**: prints a summary of the results obtained by the `mbl` function
- **`plot.mbl`**: plots the results obtained by the `mbl` function
- **`print.localOrthoDiss`**: print method for `localOrthoDiss` objects returned by the `orthoDiss` function

To explain the `mbl` function in a little more detail, let us first define the basic input datasets:

- **Reference (training) set**: Dataset with *n* reference samples (e.g. a spectral library) to be used in the calibration of spectral models. Xr represents the matrix of samples (containing the spectral predictor variables) and Yr represents a given response variable corresponding to Xr.
- **Prediction set**: Dataset with *m* samples where the response variable (Yu) is unknown. However, it can be predicted by applying a spectral model (calibrated using Xr and Yr) to the spectra of these samples (Xu).

In order to predict each value in Yu, the `mbl` function takes each sample in Xu and searches in Xr for its *k*-nearest neighbours (the most spectrally similar samples). A (local) model is then calibrated with these (reference) neighbours and immediately used to predict the corresponding value in Yu from Xu. In the function, the *k*-nearest neighbour search is performed by computing spectral similarity/dissimilarity matrices between samples. The `mbl` function offers the following regression options for calibrating the (local) models:

- **`'gpr'`**: Gaussian process with linear kernel
- **`'pls'`**
- **`'wapls1'`**
- **`'wapls2'`**
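The neighbour search followed by a local fit can be sketched in a few lines of base R. This is a simplified illustration and not the `mbl` implementation: it assumes squared Euclidean distance for the neighbour search and swaps in ordinary least squares (`lm`) for the regression options listed above. The function name `local_predict` and the toy data are hypothetical.

```r
# Hypothetical helper: for each row of Xu, find its k nearest rows in Xr,
# fit a local linear model on those neighbours and predict the response
local_predict <- function(Xr, Yr, Xu, k) {
  # Use the same column names in both sets so predict() can match them
  colnames(Xr) <- colnames(Xu) <- paste0("x", seq_len(ncol(Xr)))
  vapply(seq_len(nrow(Xu)), function(i) {
    # Squared Euclidean distance from sample i in Xu to every sample in Xr
    d <- rowSums(sweep(Xr, 2, Xu[i, ])^2)
    nn <- order(d)[seq_len(k)]  # indices of the k nearest neighbours
    local_set <- data.frame(y = Yr[nn], Xr[nn, , drop = FALSE])
    fit <- lm(y ~ ., data = local_set)
    unname(predict(fit, newdata = as.data.frame(Xu[i, , drop = FALSE])))
  }, numeric(1))
}

# Made-up data: a non-linear response of two predictors
set.seed(42)
Xr <- matrix(runif(200 * 2), ncol = 2)
Yr <- sin(3 * Xr[, 1]) + Xr[, 2]^2
Xu <- matrix(runif(5 * 2), ncol = 2)

pred <- local_predict(Xr, Yr, Xu, k = 20)
print(pred)
```

Because every prediction uses its own small model, a simple linear fit can follow a response that is non-linear over the whole reference set. The cost is one model fit per sample in Xu.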

Keywords: *Infrared spectroscopy*, *Chemometrics*, *Local modelling*, *Spectral library*, *Lazy learning*, *Soil spectroscopy*

- Check our other project called `prospectr`.
- Check this presentation in which we used the `resemble` package to predict soil attributes from large scale soil spectral libraries.

You can send an e-mail to the package maintainer (ramirez.lopez.leo@gmail.com) or create an issue on GitHub.