The peakPantheR
package is designed for the detection, integration and reporting of pre-defined features in MS files.
The Parallel Annotation is set to detect and integrate multiple compounds in multiple files in parallel and store results in a single object.
Using an example dataset, this vignette will:
Parallel compound integration is set to:
We can target 2 features in 3 MS spectra file from the faahKO
package with peakPantheR_parallelAnnotation()
:
Input spectra are selected:
library(faahKO)
## file paths
input_spectraPaths <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
system.file('cdf/KO/ko16.CDF', package = "faahKO"),
system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
#> [1] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko16.CDF"
#> [3] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko18.CDF"
Two targeted features are defined and stored in a table with as columns:
cpdID
(numeric)cpdName
(character)rtMin
(sec)rtMax
(sec)rt
(sec, optional / NA
)mzMin
(m/z)mzMax
(m/z)mz
(m/z, optional / NA
)# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", "mz", "mzMax"))), stringsAsFactors=F)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], as.numeric)
cpdID | cpdName | rtMin | rt | rtMax | mzMin | mz | mzMax |
---|---|---|---|---|---|---|---|
ID-1 | Cpd 1 | 3310 | 3344.888 | 3390 | 522.194778 | 522.2 | 522.205222 |
ID-2 | Cpd 2 | 3280 | 3385.577 | 3440 | 496.195038 | 496.2 | 496.204962 |
Additional compound and spectra metadata can be provided but isn’t employed during the fit:
# spectra Metadata
input_spectraMetadata <- data.frame(matrix(c("sample type 1", "sample type 2", "sample type 1"), 3, 1, dimnames=list(c(), c("sampleType"))), stringsAsFactors=F)
sampleType |
---|
sample type 1 |
sample type 2 |
sample type 1 |
A peakPantheRAnnotation
object is first initialised with the path to the files to process (spectraPaths
), compounds to integrate (targetFeatTable
) and additional information and parameters such as spectraMetadata
, uROI
, FIR
and if they should be used (useUROI=TRUE
, useFIR=TRUE
):
library(peakPantheR)
#>
#> This is peakPantheR version 1.2.0
init_annotation <- peakPantheRAnnotation(spectraPaths = input_spectraPaths,
targetFeatTable = input_targetFeatTable,
spectraMetadata = input_spectraMetadata)
The resulting peakPantheRAnnotation
object is not annotated, does not contain and use uROI
and FIR
init_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 3 samples.
#> updated ROI do not exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is not annotated
peakPantheR_parallelAnnotation()
will execute the annotation across files in parallel (if ncores
>0) and return the successful annotations (result$annotation
) and failures (result$failures
):
# annotate files serially
annotation_result <- peakPantheR_parallelAnnotation(init_annotation, ncores=0, verbose=TRUE)
#> Processing 2 compounds in 3 samples:
#> uROI: FALSE
#> FIR: FALSE
#> ----- ko15 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.47 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Found 2/2 features in 0.04 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 0.9 secs
#> ----- ko16 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.46 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.03 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 0.81 secs
#> ----- ko18 -----
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Check input, mzMLPath must be a .mzML
#> Reading data from 2 windows
#> Data read in: 0.44 secs
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #1
#> Warning: rtMin/rtMax outside of ROI; datapoints cannot be used for mzMin/mzMax calculation, approximate mz and returning ROI$mzMin and ROI$mzMax for ROI #2
#> Found 2/2 features in 0.03 secs
#> Peak statistics done in: 0 secs
#> Feature search done in: 0.79 secs
#> Annotation object cannot be reordered by sample acquisition date
#> ----------------
#> Parallel annotation done in: 3.32 secs
#> 0 failure(s)
# successful fit
nbSamples(annotation_result$annotation)
#> [1] 3
data_annotation <- annotation_result$annotation
data_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 3 samples.
#> updated ROI do not exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is annotated
# list failed fit
annotation_result$failures
#> [1] file error
#> <0 rows> (or 0-length row.names)
Based on the fit results, updated ROI (uROI
) and fallback integration region (FIR
) can be determined using annotationParamsDiagnostic()
:
uROI
are established as the min/max (rt
and m/z
) of the found peaks (+/- 5% in RT)FIR
are established as the median of found rtMin
, rtMax
, mzMin
, mzMax
updated_annotation <- annotationParamsDiagnostic(data_annotation, verbose=TRUE)
#> uROI will be set as mimimum/maximum of found peaks (+/-5% of ROI in retention time)
#> FIR will be calculated as the median of found "rtMin","rtMax","mzMin","mzMax"
# uROI now exist
updated_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 3 samples.
#> updated ROI exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is annotated
outputAnnotationDiagnostic()
will save to disk annotationParameters_summary.csv
containing the original ROI
and newly determined uROI
and FIR
for manual validation. Additionnaly a diagnostic plot for each compound is saved for reference:
# create a colourScale based on the sampleType
uniq_sType <- sort(unique(spectraMetadata(updated_annotation)$sampleType),na.last=TRUE)
col_sType <- unname( setNames(c('blue', 'red'),c(uniq_sType))[spectraMetadata(updated_annotation)$sampleType] )
# output fit diagnostic to disk
outputAnnotationDiagnostic(updated_annotation, saveFolder='/output_folder/', savePlots=TRUE, sampleColour=col_sType, verbose=TRUE)
cpdID | cpdName | X | ROI_rt | ROI_mz | ROI_rtMin | ROI_rtMax | ROI_mzMin |
---|---|---|---|---|---|---|---|
ID-1 | Cpd 1 | | | 3344.888 | 522.2 | 3310 | 3390 | 522.194778 |
ID-2 | Cpd 2 | | | 3385.577 | 496.2 | 3280 | 3440 | 496.195038 |
ROI_mzMax | X | uROI_rtMin | uROI_rtMax | uROI_mzMin | uROI_mzMax | uROI_rt |
---|---|---|---|---|---|---|
522.205222 | | | 3305.75893 | 3411.43628 | 522.194778 | 522.205222 | 3344.888 |
496.204962 | | | 3337.37666 | 3462.44903 | 496.195038 | 496.204962 | 3385.577 |
uROI_mz | X | FIR_rtMin | FIR_rtMax | FIR_mzMin | FIR_mzMax |
---|---|---|---|---|---|
522.2 | | | 3326.10635 | 3407.27265 | 522.194778 | 522.205222 |
496.2 | | | 3365.02386 | 3453.40496 | 496.195038 | 496.204962 |
Diagnostic plot for compound 1: The top panel is an overlay of the extracted EIC across all samples with the fitted curve as dotted line. The panel under the EIC represent each found peak RT peakwidth (
rtMin
,rtMax
and apex marked as dot), ordered with the first sample at the top. The bottom 3 panels represent foundRT
(peakwidth),m/z
(peakwidth) andpeak area
by run order, with the corresponding histograms to the right
ROI
exported to .csv
can be updated based on the diagnostic plots; uROI
(updated ROI potentially used for all samples) and FIR
(fallback integration regions for when no peak is found) can also be tweaked to better fit the peaks.
Following this manual validation of the fit on reference samples, the modified parameters in the .csv
file can be reloaded and applied to all study samples.
peakPantheR_loadAnnotationParamsCSV()
will load the new .csv
parameters (as generated by outputAnnotationDiagnostic()
) and initialise a peakPantheRAnnotation
object without spectraPaths
, spectraMetadata
or cpdMetadata
which will need to be added before annotation. useUROI
and useFIR
are set to FALSE and will need to be set accordingly. uROIExist
is established depending on the uROI columns present in the .csv
and will only be set to TRUE if no NA
are present.
update_csv_path <- '/path_to_new_csv/'
# load csv
new_annotation <- peakPantheR_loadAnnotationParamsCSV(update_csv_path)
#> uROIExist set to TRUE
#> New peakPantheRAnnotation object initialised for 2 compounds
new_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 0 samples.
#> updated ROI exist (uROI)
#> does not use updated ROI (uROI)
#> does not use fallback integration regions (FIR)
#> is not annotated
Now that the fit parameters were set on the QC samples, the same processing can be applied to all study samples. resetAnnotation()
will reinitialise all the results and modify the samples or compounds targeted if required:
## new files
new_spectraPaths <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
system.file('cdf/WT/wt15.CDF', package = "faahKO"),
system.file('cdf/KO/ko16.CDF', package = "faahKO"),
system.file('cdf/WT/wt16.CDF', package = "faahKO"),
system.file('cdf/KO/ko18.CDF', package = "faahKO"),
system.file('cdf/WT/wt18.CDF', package = "faahKO"))
#> [1] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "C:/R/R-3.5.0/library/faahKO/cdf/WT/wt15.CDF"
#> [3] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko16.CDF"
#> [4] "C:/R/R-3.5.0/library/faahKO/cdf/WT/wt16.CDF"
#> [5] "C:/R/R-3.5.0/library/faahKO/cdf/KO/ko18.CDF"
#> [6] "C:/R/R-3.5.0/library/faahKO/cdf/WT/wt18.CDF"
## new spectra metadata
new_spectraMetadata <- data.frame(matrix(c("KO", "WT", "KO", "WT", "KO", "WT"), 6, 1, dimnames=list(c(), c("Group"))), stringsAsFactors=F)
Group |
---|
KO |
WT |
KO |
WT |
KO |
WT |
## add new samples to the annotation loaded from csv, useUROI, useFIR
new_annotation <- resetAnnotation(new_annotation, spectraPaths=new_spectraPaths, spectraMetadata=new_spectraMetadata, useUROI=TRUE, useFIR=TRUE)
#> peakPantheRAnnotation object being reset:
#> Previous "ROI", "cpdID" and "cpdName" value kept
#> Previous "uROI" value kept
#> Previous "FIR" value kept
#> Previous "cpdMetadata" value kept
#> New "spectraPaths" value set
#> New "spectraMetadata" value set
#> Previous "uROIExist" value kept
#> New "useUROI" value set
#> New "useFIR" value set
Run the final annotation:
# annotate files serially
new_annotation_result <- peakPantheR_parallelAnnotation(new_annotation, ncores=0, verbose=FALSE)
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
#> Polarity can not be extracted from netCDF files, please set manually the polarity with the 'polarity' method.
# successful fit
nbSamples(new_annotation_result$annotation)
#> [1] 6
final_annotation <- new_annotation_result$annotation
final_annotation
#> An object of class peakPantheRAnnotation
#> 2 compounds in 6 samples.
#> updated ROI exist (uROI)
#> uses updated ROI (uROI)
#> uses fallback integration regions (FIR)
#> is annotated
# list failed fit
new_annotation_result$failures
#> [1] file error
#> <0 rows> (or 0-length row.names)
The final fits can be saved to disk with outputAnnotationDiagnostic()
:
# create a colourScale based on the sampleType
uniq_group <- sort(unique(spectraMetadata(final_annotation)$Group),na.last=TRUE)
col_group <- unname( setNames(c('blue', 'red'),c(uniq_sType))[spectraMetadata(final_annotation)$Group] )
# output fit diagnostic to disk
outputAnnotationDiagnostic(final_annotation, saveFolder='/final_output_folder/', savePlots=TRUE, sampleColour=col_group, verbose=TRUE)
For each processed sample, a peakTables
contains all the fit information for all compounds targeted. annotationTable( , column)
will group the values across all samples and compounds for any peakTables
column:
found | rtMin | rt | rtMax | mzMin | mz | mzMax | peakArea |
---|---|---|---|---|---|---|---|
TRUE | 3342 | 3342 | 3395 | 522.2 | 522.2 | 522.2 | 18409123 |
TRUE | 3345 | 3387 | 3428 | 496.2 | 496.2 | 496.2 | 35467323 |
maxIntMeasured | maxIntPredicted | is_filled | ppm_error | rt_dev_sec |
---|---|---|---|---|
889280 | 907347 | FALSE | 0.02338 | 2.928 |
1128960 | 1113682 | FALSE | 0.0246 | 0.9518 |
tailingFactor | asymmetryFactor | cpdID | cpdName |
---|---|---|---|
203.5 | 377.4 | ID-1 | Cpd 1 |
1.005 | 1.009 | ID-2 | Cpd 2 |
# Extract the found peak area for all compounds and all samples
annotationTable(final_annotation, column='peakArea')
ID-1 | ID-2 | |
---|---|---|
C:/R/R-3.5.0/library/faahKO/cdf/KO/ko15.CDF | 18409123 | 35467323 |
C:/R/R-3.5.0/library/faahKO/cdf/WT/wt15.CDF | 23871264 | 37965512 |
C:/R/R-3.5.0/library/faahKO/cdf/KO/ko16.CDF | 24775525 | 37795145 |
C:/R/R-3.5.0/library/faahKO/cdf/WT/wt16.CDF | 25012332 | 34499235 |
C:/R/R-3.5.0/library/faahKO/cdf/KO/ko18.CDF | 21909568 | 36717689 |
C:/R/R-3.5.0/library/faahKO/cdf/WT/wt18.CDF | 21729136 | 36961319 |
Finally all annotation results can be saved to disk as .csv
with outputAnnotationResult()
. These .csv
will contain the compound metadata, spectra metadata and a file for each column of peakTables (with samples as rows and compounds as columns):
# save
outputAnnotationResult(final_annotation, saveFolder='/final_output_folder/', annotationName='ProjectName', verbose=TRUE)
#> Compound metadata saved at /final_output_folder/ProjectName_cpdMetadata.csv
#> Spectra metadata saved at /final_output_folder/ProjectName_spectraMetadata.csv
#> Peak measurement "found" saved at /final_output_folder/ProjectName_found.csv
#> Peak measurement "rtMin" saved at /final_output_folder/ProjectName_rtMin.csv
#> Peak measurement "rt" saved at /final_output_folder/ProjectName_rt.csv
#> Peak measurement "rtMax" saved at /final_output_folder/ProjectName_rtMax.csv
#> Peak measurement "mzMin" saved at /final_output_folder/ProjectName_mzMin.csv
#> Peak measurement "mz" saved at /final_output_folder/ProjectName_mz.csv
#> Peak measurement "mzMax" saved at /final_output_folder/ProjectName_mzMax.csv
#> Peak measurement "peakArea" saved at /final_output_folder/ProjectName_peakArea.csv
#> Peak measurement "maxIntMeasured" saved at /final_output_folder/ProjectName_maxIntMeasured.csv
#> Peak measurement "maxIntPredicted" saved at /final_output_folder/ProjectName_maxIntPredicted.csv
#> Peak measurement "is_filled" saved at /final_output_folder/ProjectName_is_filled.csv
#> Peak measurement "ppm_error" saved at /final_output_folder/ProjectName_ppm_error.csv
#> Peak measurement "rt_dev_sec" saved at /final_output_folder/ProjectName_rt_dev_sec.csv
#> Peak measurement "tailingFactor" saved at /final_output_folder/ProjectName_tailingFactor.csv
#> Peak measurement "asymmetryFactor" saved at /final_output_folder/ProjectName_asymmetryFactor.csv
#> Summary saved at /final_output_folder/ProjectName_summary.csv