Parallel Annotation

Arnaud Wolfer

2018-06-13

The peakPantheR package is designed for the detection, integration and reporting of pre-defined features in MS files.

The Parallel Annotation is set to detect and integrate multiple compounds in multiple files in parallel and store results in a single object.

Using an example dataset, this vignette will:

Parallel Annotation Concept

Parallel compound integration is set to:

Parallel Annotation Example

We can target 2 features in 3 MS spectra files from the faahKO package with peakPantheR_parallelAnnotation(). The faahKO package can be installed with:

setRepositories(ind=1:4)
install.packages('faahKO')

Input Data

Input spectra are selected:
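A minimal sketch of how the three spectra might be selected, assuming the faahKO package is installed and ships its netCDF files under cdf/KO (the variable name input_spectraPaths is an illustrative choice):

```r
# Locate three knockout (KO) netCDF files shipped with faahKO.
# system.file() returns "" for a file it cannot find,
# for example when faahKO is not installed.
input_spectraPaths <- c(
    system.file('cdf/KO/ko15.CDF', package = 'faahKO'),
    system.file('cdf/KO/ko16.CDF', package = 'faahKO'),
    system.file('cdf/KO/ko18.CDF', package = 'faahKO')
)
input_spectraPaths
```

Checking the result with nzchar(input_spectraPaths) before going further is a cheap way to catch a missing installation.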

Two targeted features are defined and stored in a table with the following columns:

cpdID  cpdName  rtMin  rt        rtMax  mzMin       mz     mzMax
ID-1   Cpd 1    3310   3344.888  3390   522.194778  522.2  522.205222
ID-2   Cpd 2    3280   3385.577  3440   496.195038  496.2  496.204962
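The table above could be built as a plain data.frame, for instance as follows. This is a sketch in base R; the name input_targetFeatTable and the tolerance check at the end are illustrative additions, not part of the package.

```r
# Targeted features: one row per compound, values copied from the table above
input_targetFeatTable <- data.frame(
    cpdID   = c('ID-1', 'ID-2'),
    cpdName = c('Cpd 1', 'Cpd 2'),
    rtMin   = c(3310, 3280),
    rt      = c(3344.888, 3385.577),
    rtMax   = c(3390, 3440),
    mzMin   = c(522.194778, 496.195038),
    mz      = c(522.2, 496.2),
    mzMax   = c(522.205222, 496.204962),
    stringsAsFactors = FALSE
)

# Half-width of each m/z window relative to the target m/z, in ppm
mz_tolerance_ppm <- (input_targetFeatTable$mzMax - input_targetFeatTable$mz) /
    input_targetFeatTable$mz * 1e6
round(mz_tolerance_ppm, 3)
```

For both compounds the half-width evaluates to 10 ppm, so the m/z windows in this example correspond to the target m/z plus or minus 10 ppm.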

Additional compound and spectra metadata can be provided but are not used during the fit:

sampleType
sample type 1
sample type 2
sample type 1
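The spectra metadata shown above might be stored as a data.frame with one row per spectrum, in the same order as the spectra paths (the name input_spectraMetadata is an illustrative choice):

```r
# One row per spectrum; the column holds the sample type listed above
input_spectraMetadata <- data.frame(
    sampleType = c('sample type 1', 'sample type 2', 'sample type 1'),
    stringsAsFactors = FALSE
)
input_spectraMetadata
```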

Initialise and Run Parallel Annotation

A peakPantheRAnnotation object is first initialised with the path to the files to process (spectraPaths), compounds to integrate (targetFeatTable) and additional information and parameters such as spectraMetadata, uROI, FIR and if they should be used (useUROI=TRUE, useFIR=TRUE):

The resulting peakPantheRAnnotation object is not yet annotated and neither contains nor uses uROI or FIR.
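A sketch of the initialisation, assuming peakPantheR and faahKO are installed. The input variable names are illustrative, and the guard simply skips the constructor call when the package or the example files are unavailable.

```r
# Inputs as described in the Input Data section
ko_files <- c('ko15.CDF', 'ko16.CDF', 'ko18.CDF')
input_spectraPaths <- vapply(
    ko_files,
    function(f) system.file('cdf', 'KO', f, package = 'faahKO'),
    character(1), USE.NAMES = FALSE
)
input_targetFeatTable <- data.frame(
    cpdID   = c('ID-1', 'ID-2'),
    cpdName = c('Cpd 1', 'Cpd 2'),
    rtMin   = c(3310, 3280),
    rt      = c(3344.888, 3385.577),
    rtMax   = c(3390, 3440),
    mzMin   = c(522.194778, 496.195038),
    mz      = c(522.2, 496.2),
    mzMax   = c(522.205222, 496.204962),
    stringsAsFactors = FALSE
)
input_spectraMetadata <- data.frame(
    sampleType = c('sample type 1', 'sample type 2', 'sample type 1'),
    stringsAsFactors = FALSE
)

# Initialise only when peakPantheR and the faahKO example files are available
init_annotation <- NULL
if (requireNamespace('peakPantheR', quietly = TRUE) &&
    all(nzchar(input_spectraPaths))) {
    init_annotation <- peakPantheR::peakPantheRAnnotation(
        spectraPaths    = input_spectraPaths,
        targetFeatTable = input_targetFeatTable,
        spectraMetadata = input_spectraMetadata
    )
    print(init_annotation)
}
```

No uROI or FIR is passed here, which matches an object that does not contain or use them.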

peakPantheR_parallelAnnotation() will run the annotation across files in parallel (if ncores > 0; ncores=0 processes the files serially) and return the successful annotations (result$annotation) and failures (result$failures):

# annotate files serially
annotation_result <- peakPantheR_parallelAnnotation(init_annotation, ncores=0, verbose=TRUE)
#> Processing 2 compounds in 3 samples:
#>   uROI:  FALSE
#>   FIR:   FALSE
#> ----- ko15 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.47 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Found 2/2 features in 0.04 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 0.9 secs
#> ----- ko16 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.46 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.03 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 0.81 secs
#> ----- ko18 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.44 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.03 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 0.79 secs
#> Annotation object cannot be reordered by sample acquisition date
#> ----------------
#> Parallel annotation done in: 3.32 secs
#>   0 failure(s)

# successful fit
nbSamples(annotation_result$annotation)
#> [1] 3
data_annotation   <- annotation_result$annotation
data_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 3 samples. 
#>   updated ROI do not exist (uROI)
#>   does not use updated ROI (uROI)
#>   does not use fallback integration regions (FIR)
#>   is annotated

# list failed fit
annotation_result$failures
#> [1] file  error
#> <0 rows> (or 0-length row.names)

Process Parallel Annotation Results

Based on the fit results, updated ROI (uROI) and fallback integration region (FIR) can be determined using annotationParamsDiagnostic():

outputAnnotationDiagnostic() saves annotationParameters_summary.csv to disk; this file contains the original ROI and the newly determined uROI and FIR for manual validation. Additionally, a diagnostic plot is saved for each compound for reference:
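The two steps described above might be chained as in the following sketch. It assumes peakPantheR is installed and that data_annotation is the annotated object returned by peakPantheR_parallelAnnotation(); the output folder is a hypothetical location, and the guard skips the calls when either prerequisite is missing.

```r
# Folder receiving the parameter summary and diagnostic plots
# (hypothetical location, adjust as needed)
output_folder <- file.path(tempdir(), 'annotation_diagnostic')
dir.create(output_folder, showWarnings = FALSE, recursive = TRUE)

if (requireNamespace('peakPantheR', quietly = TRUE) &&
    exists('data_annotation')) {
    # Derive uROI and FIR from the fitted peaks
    updated_annotation <- peakPantheR::annotationParamsDiagnostic(
        data_annotation, verbose = TRUE
    )
    # Write annotationParameters_summary.csv and one diagnostic plot per compound
    peakPantheR::outputAnnotationDiagnostic(
        updated_annotation,
        saveFolder = output_folder,
        savePlots  = TRUE,
        verbose    = TRUE
    )
}
```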

Table continues below
cpdID  cpdName  X  ROI_rt    ROI_mz  ROI_rtMin  ROI_rtMax  ROI_mzMin
ID-1   Cpd 1    |  3344.888  522.2   3310       3390       522.194778
ID-2   Cpd 2    |  3385.577  496.2   3280       3440       496.195038
Table continues below
ROI_mzMax   X  uROI_rtMin  uROI_rtMax  uROI_mzMin  uROI_mzMax  uROI_rt
522.205222  |  3305.75893  3411.43628  522.194778  522.205222  3344.888
496.204962  |  3337.37666  3462.44903  496.195038  496.204962  3385.577
uROI_mz  X  FIR_rtMin   FIR_rtMax   FIR_mzMin   FIR_mzMax
522.2    |  3326.10635  3407.27265  522.194778  522.205222
496.2    |  3365.02386  3453.40496  496.195038  496.204962

Diagnostic plot for compound 1: the top panel overlays the extracted EIC across all samples, with the fitted curve shown as a dotted line. The panel below the EIC shows the RT peak width of each found peak (rtMin, rtMax, and the apex marked as a dot), ordered with the first sample at the top. The bottom 3 panels show the found RT (peak width), m/z (peak width) and peak area by run order, with the corresponding histograms to the right.

ROI exported to .csv can be updated based on the diagnostic plots; uROI (updated ROI potentially used for all samples) and FIR (fallback integration regions for when no peak is found) can also be tweaked to better fit the peaks.

New Initialisation with Updated Parameters to be Applied to All Study Samples

Following this manual validation of the fit on reference samples, the modified parameters in the .csv file can be reloaded and applied to all study samples.

Load new fit parameters

peakPantheR_loadAnnotationParamsCSV() will load the new .csv parameters (as generated by outputAnnotationDiagnostic()) and initialise a peakPantheRAnnotation object without spectraPaths, spectraMetadata or cpdMetadata which will need to be added before annotation. useUROI and useFIR are set to FALSE and will need to be set accordingly. uROIExist is established depending on the uROI columns present in the .csv and will only be set to TRUE if no NA are present.
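A sketch of reloading the edited parameters, assuming peakPantheR is installed. The path is a hypothetical location for the .csv written by outputAnnotationDiagnostic(), and the call is skipped when the file is not there.

```r
# Path to the manually reviewed parameter file
# (hypothetical location, adjust to where the .csv was saved)
update_csv_path <- file.path(
    tempdir(), 'annotation_diagnostic', 'annotationParameters_summary.csv'
)

new_annotation <- NULL
if (requireNamespace('peakPantheR', quietly = TRUE) &&
    file.exists(update_csv_path)) {
    # Initialise an annotation from the ROI, uROI and FIR stored in the .csv
    new_annotation <- peakPantheR::peakPantheR_loadAnnotationParamsCSV(update_csv_path)
    print(new_annotation)
}
```

Printing the object is a quick way to confirm whether uROI exist and that useUROI and useFIR are still FALSE at this point.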

Add new samples to process

Now that the fit parameters were set on the QC samples, the same processing can be applied to all study samples. resetAnnotation() will reinitialise all the results and modify the samples or compounds targeted if required:
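A sketch of this step, assuming peakPantheR and faahKO are installed and that new_annotation was loaded from the parameter .csv. The names new_spectraPaths and new_spectraMetadata are illustrative, and setting useUROI=TRUE and useFIR=TRUE assumes that uROI and FIR were present in the .csv.

```r
# Six study samples from faahKO, alternating knockout (KO) and wild type (WT)
new_spectraPaths <- c(
    system.file('cdf/KO/ko15.CDF', package = 'faahKO'),
    system.file('cdf/WT/wt15.CDF', package = 'faahKO'),
    system.file('cdf/KO/ko16.CDF', package = 'faahKO'),
    system.file('cdf/WT/wt16.CDF', package = 'faahKO'),
    system.file('cdf/KO/ko18.CDF', package = 'faahKO'),
    system.file('cdf/WT/wt18.CDF', package = 'faahKO')
)
# One metadata row per sample, in the same order as the paths
new_spectraMetadata <- data.frame(
    Group = c('KO', 'WT', 'KO', 'WT', 'KO', 'WT'),
    stringsAsFactors = FALSE
)

if (requireNamespace('peakPantheR', quietly = TRUE) &&
    exists('new_annotation') && !is.null(new_annotation) &&
    all(nzchar(new_spectraPaths))) {
    # Clear previous results, swap in the new samples, switch on uROI and FIR
    new_annotation <- peakPantheR::resetAnnotation(
        new_annotation,
        spectraPaths    = new_spectraPaths,
        spectraMetadata = new_spectraMetadata,
        useUROI         = TRUE,
        useFIR          = TRUE
    )
}
new_spectraPaths
new_spectraMetadata
```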

#> [1] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.0/library/faahKO/cdf/WT/wt15.CDF"
#> [3] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko16.CDF"
#> [4] "C:/R/R-3.5.0/library/faahKO/cdf/WT/wt16.CDF"
#> [5] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko18.CDF"
#> [6] "C:/R/R-3.5.0/library/faahKO/cdf/WT/wt18.CDF"
Group
KO
WT
KO
WT
KO
WT

Run Final Parallel Annotation

Run the final annotation:
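A sketch of the final run, assuming peakPantheR is installed and that new_annotation holds the reset object with the study samples. The name new_annotation_result is illustrative, and the call is skipped when the prerequisites are missing.

```r
final_annotation <- NULL
if (requireNamespace('peakPantheR', quietly = TRUE) &&
    exists('new_annotation') && !is.null(new_annotation)) {
    # ncores = 0 processes the files serially
    new_annotation_result <- peakPantheR::peakPantheR_parallelAnnotation(
        new_annotation, ncores = 0, verbose = FALSE
    )
    # Keep the successful annotations and inspect any failed files
    final_annotation <- new_annotation_result$annotation
    print(new_annotation_result$failures)
}
```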

Output final results

The final fits can be saved to disk with outputAnnotationDiagnostic():

For each processed sample, a peakTables entry contains all the fit information for all targeted compounds. annotationTable( , column) will group the values of any peakTables column across all samples and compounds:
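The grouping can be pictured with a small base R sketch. This is a conceptual illustration only, not the package's implementation: toy_peakTables and collect_column() are hypothetical names, and the peak areas are the values printed in this vignette for ko15 and wt15.

```r
# Two per-sample peak tables, reduced to the compound ID and peak area
toy_peakTables <- list(
    ko15 = data.frame(cpdID = c('ID-1', 'ID-2'),
                      peakArea = c(18409123, 35467323),
                      stringsAsFactors = FALSE),
    wt15 = data.frame(cpdID = c('ID-1', 'ID-2'),
                      peakArea = c(23871264, 37965512),
                      stringsAsFactors = FALSE)
)

# Collect one column from every peak table into a samples x compounds matrix
collect_column <- function(peak_tables, column) {
    n_cpd  <- nrow(peak_tables[[1]])
    values <- t(vapply(peak_tables, function(tbl) tbl[[column]], numeric(n_cpd)))
    colnames(values) <- peak_tables[[1]]$cpdID
    values
}

area_by_sample <- collect_column(toy_peakTables, 'peakArea')
area_by_sample
```

With the package itself, the corresponding calls would presumably be peakTables(final_annotation)[[1]] for the fit information of the first sample and annotationTable(final_annotation, column='peakArea') for the grouped peak areas.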

Table continues below
found  rtMin  rt    rtMax  mzMin  mz     mzMax  peakArea
TRUE   3342   3342  3395   522.2  522.2  522.2  18409123
TRUE   3345   3387  3428   496.2  496.2  496.2  35467323
Table continues below
maxIntMeasured  maxIntPredicted  is_filled  ppm_error  rt_dev_sec
889280          907347           FALSE      0.02338    2.928
1128960         1113682          FALSE      0.0246     0.9518
tailingFactor  asymmetryFactor  cpdID  cpdName
203.5          377.4            ID-1   Cpd 1
1.005          1.009            ID-2   Cpd 2

                                             ID-1      ID-2
C:/R/R-3.5.0/library/faahKO/cdf/KO/ko15.CDF  18409123  35467323
C:/R/R-3.5.0/library/faahKO/cdf/WT/wt15.CDF  23871264  37965512
C:/R/R-3.5.0/library/faahKO/cdf/KO/ko16.CDF  24775525  37795145
C:/R/R-3.5.0/library/faahKO/cdf/WT/wt16.CDF  25012332  34499235
C:/R/R-3.5.0/library/faahKO/cdf/KO/ko18.CDF  21909568  36717689
C:/R/R-3.5.0/library/faahKO/cdf/WT/wt18.CDF  21729136  36961319

Finally, all annotation results can be saved to disk as .csv files with outputAnnotationResult(). These files contain the compound metadata, the spectra metadata, and one file for each column of peakTables (with samples as rows and compounds as columns):

# save
outputAnnotationResult(final_annotation, saveFolder='/final_output_folder/', annotationName='ProjectName', verbose=TRUE)
#> Compound metadata saved at /final_output_folder/ProjectName_cpdMetadata.csv
#> Spectra metadata saved at /final_output_folder/ProjectName_spectraMetadata.csv
#> Peak measurement "found" saved at /final_output_folder/ProjectName_found.csv
#> Peak measurement "rtMin" saved at /final_output_folder/ProjectName_rtMin.csv
#> Peak measurement "rt" saved at /final_output_folder/ProjectName_rt.csv
#> Peak measurement "rtMax" saved at /final_output_folder/ProjectName_rtMax.csv
#> Peak measurement "mzMin" saved at /final_output_folder/ProjectName_mzMin.csv
#> Peak measurement "mz" saved at /final_output_folder/ProjectName_mz.csv
#> Peak measurement "mzMax" saved at /final_output_folder/ProjectName_mzMax.csv
#> Peak measurement "peakArea" saved at /final_output_folder/ProjectName_peakArea.csv
#> Peak measurement "maxIntMeasured" saved at /final_output_folder/ProjectName_maxIntMeasured.csv
#> Peak measurement "maxIntPredicted" saved at /final_output_folder/ProjectName_maxIntPredicted.csv
#> Peak measurement "is_filled" saved at /final_output_folder/ProjectName_is_filled.csv
#> Peak measurement "ppm_error" saved at /final_output_folder/ProjectName_ppm_error.csv
#> Peak measurement "rt_dev_sec" saved at /final_output_folder/ProjectName_rt_dev_sec.csv
#> Peak measurement "tailingFactor" saved at /final_output_folder/ProjectName_tailingFactor.csv
#> Peak measurement "asymmetryFactor" saved at /final_output_folder/ProjectName_asymmetryFactor.csv
#> Summary saved at /final_output_folder/ProjectName_summary.csv

See Also