In some settings measurements and tests in different conditions may be correlated with one another. For example, in eQTL applications this can occur due to sample overlap among the different conditions.
Failure to deal with such correlations can cause false positives in a
To deal with these correlations
mashr allows the user to specify a correlation matrix \(V\) when setting up the data in
mash_set_data. The easiest way to specify this correlation matrix is to estimate it using
estimate_null_correlation_simple, which, as its name suggests, uses the null tests (specifically, tests without a strong \(z\) score) to estimate the correlations.
Here we simulate data for illustration. The data does not have any correlation, but we will analyze it as if we did not know that.
Read in the data, and estimate correlations:
data = mash_set_data(simdata$Bhat, simdata$Shat) V = estimate_null_correlation_simple(data) data.V = mash_update_data(data, V=V)
Now we have two mash data objects, one (
data.V) with correlations specified, and one without (
data). So analyses using
data.V will allow for correlations, whereas analyses using
data will assume measurements are independent.
Here, for illustration purposes, we proceed to analyze the data with correlations, using just the simple canonical covariances as in the initial introductory vignette.
U.c = cov_canonical(data.V) m.c = mash(data.V, U.c) # fits with correlations because data.V includes correlation information
# - Computing 2000 x 151 likelihood matrix. # - Likelihood calculations took 0.04 seconds. # - Fitting model with 151 mixture components. # - Model fitting took 0.59 seconds. # - Computing posterior matrices. # - Computation allocated took 0.01 seconds.
#  -16121.11177
We can also compare with the original analysis. (Note that the canonical covariances do not depend on the correlations, so we can use the same
U.c here for both analyses. If we used data-driven covariances we might prefer to estimate these separately for each analysis as the correlations would affect them.)
m.c.orig = mash(data, U.c) # fits without correlations because data object was set up without correlations
# - Computing 2000 x 151 likelihood matrix. # - Likelihood calculations took 0.04 seconds. # - Fitting model with 151 mixture components. # - Model fitting took 0.63 seconds. # - Computing posterior matrices. # - Computation allocated took 0.02 seconds.
#  -16120.32142
The log-likelihoods with and without correlations are similar here, which is expected since there are no actual correlations in the data.