metaObject
as input for analysissummarizeFilterResults()
datasetObject
with violinPlot()
datasetObject
with rocPlot()
forestPlot()
calculateScore()
The MetaIntegrator package comprises several analysis and plot functions to perform integrated multi-cohort analysis of gene expression data (meta-analysis). The advent of the gene expression microarray has allowed for a rapid increase in gene expression studies. Largely due to the MIAME standards for data sharing, many of these studies have been deposited into public repositories such as the NIH Gene Expression Omnibus (GEO) and ArrayExpress. There is now a wealth of publically available gene expression data available for re-analysis.
An obvious next step to increase statistical power in detecting changes in gene expression associated with some condition is to aggregate data from multiple studies. However, inter-study technical and biological differences prevent us from simply pooling results and summarizing our findings. A random-effects model of meta-analysis circumvents these issues by assuming that the results from each study is drawn from a single distribution, and that such inter-study differences are thus a ‘random effect’. Thus, the MetaIntegrator package will perform a DerSimonian & Laird random-effects meta-analysis for each gene (not probeset) between all target studies between cases and controls; it also performs a Fischers sum-of-logs method on the same data, and requires that a gene is significant by both methods. The resulting p-values are False discovery rate (FDR) corrected to q-values, and will evaluate the hypothesis of whether each gene is differentially expressed between cases and controls across all studies included in the analysis.
The resulting list of genes with significantly different expression between cases and controls can be used for multiple purposes, such as (1) a new diagnostic or prognostic test for the disease of interest, (2) a better understanding of the underlying biology, (3) identification of therapeutic targets, and multiple other applications. Our lab has already used these methods in a wide variety of diseases, including organ transplant reject, lung cancer, neurodegenerative disease, and sepsis (Khatri et al., J Exp Med 2013; Chen et al, Cancer Res 2014; Li et al., Acta Neur Comm 2014; Sweeney et al, Sci Trans Med 2015).
The MetaIntegrator Vignette will take the user through the basic steps of the package, including basic multi-cohort analysis, leave-one-out (LOO) analysis (whereby each of the included datasets is left out and multi-cohort analysis is run on the remaining datasets in a round-robin fashion), selection of significant genes, and then analysis of the gene set. The MetaIntegrator package assumes that the user (1) already has their data in hand, and (2) has already decided which datasets to include in the multi-cohort meta-analysis. Our group recommends that some datasets be left out of the analysis, if possible, for independent validation.
Contact
Winston A. Haynes hayneswa@stanford.edu
Links
MetaIntegrator
Installation
install.packages("MetaIntegrator")
The Metaintegrator package can be used to run a meta-analysis on microarray gene expression data as described in Khatri et al. J Exp Med. 2013. Briefly, it computes an Hedges’ g effect size for each gene in each dataset defined as:
where \(1\) and \(0\) represent the group of cases and controls for a given condition, respectively. For each gene, the summary effect size \(g_s\) is computed using a random effect model as:
where \(W_i\) is a weight equal to \(1/(V_i+T^2)\), where \(V_i\) is the variance of that gene within a given dataset \(i\), and \(T^{2}\) is the inter-dataset variation (for details see: Borenstein M et al Introduction to Meta-analysis, Wiley 2009). For each gene, the False discovery rate (FDR) is computed and a final set of genes is selected based on FDR thresholding.
For a set of signature genes, a signature score can be computed as:
where \(pos\) and \(neg\) are the sets of positive and negative genes, respectively, and \(x_i(gene)\) is the expression of any particular gene in sample \(i\) (a positive score indicates an association with cases and a negative score with controls). This score \(S\) is then converted into a z-score \(Z_s\) as:
1. Data collection, curation and annotation, select datasets for discovery and validation: Helper Functions
2. Meta-analysis on discovery datasets: Meta-Analysis, Filtering, Validation, Visualization, Search, Helper Functions
3. Validation on independent validation datasets: Visualization, Validation, Helper Functions
metaObject
as input for analysisdatasetObject
for each gene expression GEO datasetdatasetObject$expr
) and phenotype (datasetObject$pheno
) information using the read.table()
function in R
datasetObject$class
vector using the phenotype information (0 is control, 1 is case)datasetObject$keys
vector. Mappings are usually stored in GPL files (for format details see GEO Platform guidelines).datasetObject$formattedName
The final datasetObject
should have the structure:
datasetObject: named list
$class: named vector. Names are sample names. Values are 0 if control, 1 if case.
$expr: matrix. Row names are probe names. Column names are sample names. Values are expression values
$keys: named vector. Names are probe names. Values are gene names.
$pheno: data frame. Row names are the sample names. Column names are the annotation information (none required).
$formattedName: string. A formatted name for this dataset which will be used in plots.
Example object structure for one datasetObject
from tinyMetaObject
:
dataObj1 <- tinyMetaObject$originalData$PBMC.Study.1
str(dataObj1, max.level = 1)
## List of 5
## $ class : Named num [1:115] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:115] "Sample 1" "Sample 2" "Sample 3" "Sample 4" ...
## $ expr : num [1:190, 1:115] 10.3 13.6 13.2 13.8 10.2 ...
## ..- attr(*, "dimnames")=List of 2
## $ keys : Named chr [1:190] "Gene41" "Gene58" "Gene112" "Gene59" ...
## ..- attr(*, "names")= chr [1:190] "1294_at" "200036_s_at" "200080_s_at" "200089_s_at" ...
## $ pheno :'data.frame': 115 obs. of 8 variables:
## $ formattedName: chr "PBMC Study 1"
Note: Gene expression values in dataObj1$expr
should be in \(log_2\) scale and the expression data might need normalization. Also, negative gene expression values are problematic for geometric mean calculation.
This can be checked by generating a boxplot of the dataset expression values:
boxplot(dataObj1$expr[,1:15]) # -> shows samples 1-15, to see all run: boxplot(dataObj1$expr)
Here, normalization is not necessary because the median of the samples is similar, and the data is already in log scale because the expression values are between 0 and 15. (If negative expression values would be observed e.g. the lowest expression value is -1, we recommend to shift all expression values of the dataset above 0 by adding +1 to each gene expression measurement in all samples.)
datasetObject
using checkDataObject()
The function checks for errors within the datasetObject
. I returns TRUE
if the object passed error checking, FALSE
otherwise, and it prints warning messages explaining failed checks.
checkDataObject(dataObj1, "Dataset")
## [1] TRUE
metaObject
from dataset objectsGenerate a named list of dataset objects that have been imported for analysis:
# use the additional 2 example datasets from tinyMetaObject
dataObj2 = tinyMetaObject$originalData$Whole.Blood.Study.1
dataObj3 = tinyMetaObject$originalData$Whole.Blood.Study.2
# and create the metaObject
discovery_datasets <- list(dataObj1, dataObj2, dataObj3)
names(discovery_datasets) = c(dataObj1$formattedName, dataObj2$formattedName, dataObj3$formattedName)
exampleMetaObj=list()
exampleMetaObj$originalData <- discovery_datasets
IMPORTANT: Keep at least one dataset out of the discovery datasets to use it for validation!
The final metaObject
should have the structure:
metaObject: named list
$originalData: named list [1]
$datasetName: Dataset object. 'datasetName' will be the (unquoted) name of that dataset.[0,n]
metaObject
before MetaAnalysis using checkDataObject()
The function checks for errors within the metaObject
.
Example how to check your metaObject
:
checkDataObject(exampleMetaObj, "Meta", "Pre-Analysis")
## [1] TRUE
runMetaAnalysis()
Once the data is written to metaObject$originalData
, the Meta-Analysis can be started by:
runMetaAnalysis(metaObject)
The Meta-Analysis results are written into metaObject$metaAnalysis
and the results of the leave-one-out analysis into metaObject$leaveOneOutAnalysis
For details see Meta-Analysis algorithm above
Example:
exampleMetaObj <- runMetaAnalysis(exampleMetaObj, maxCores=1)
## Found common probes in 3
## Computing effect sizes...
## Computing summary effect sizes...
## Computing Fisher's output...