# Introduction

We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in (Palowitch et al. 2017), involves an additive effect of allele count and multiplicative component random noise (hence “ACME”: Additive-Contribution, Multiplicative-Error), and is defined as

$y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i$

where

• $$y_i$$ – log-transformed normalized gene read count from sample $$i$$
• $$s_i$$ – minor allele count for the SNP of sample $$i$$
• $$Z_i$$ – the $$p \times 1$$ vector of covarites for the sample $$i$$
• $$\beta_0$$ – unknown “baseline” mean of raw expression
• $$\beta_1$$ – unknown linear contribution of $$s_i$$ on the raw expression
• $$\gamma$$ – unknown $$p \times 1$$ covariate coefficient
• $$\epsilon_i$$ – noise

We estimate the model using a fast iterative algorithm.
The algorithm estimates the model which is nonlinear only with respect to $$\eta = \beta_1 / \beta_0$$

$y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i$

# Installation

ACMEeqtl can be installed with the following command.

install.packages("ACMEeqtl")

# Using the Package

ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.

library(ACMEeqtl)

## Testing a Single Gene-SNP Pair

First we generate sample gene expression, SNP allele counts, and a set of covariates.

# Model parameters
beta0 = 10000
beta1 = 50000

# Data dimensions
nsample = 1000
ncvrt = 19

### Data generation
### Zero average covariates
cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt)
cvrt = t(t(cvrt) - colMeans(cvrt))

# Generate SNPs
s = rbinom(n = nsample, size = 2, prob = 0.2)

# Generate log-normalized expression
y = log(beta0 + beta1 * s) +
cvrt %*% rnorm(ncvrt) +
rnorm(nsample)

We provide two equivalent functions for model estimation.

• effectSizeEstimationR – fully coded in R
• effectSizeEstimationC – faster version with core coded in C.
z1 = effectSizeEstimationR(s, y, cvrt)
z2 = effectSizeEstimationC(s, y, cvrt)

pander(rbind(z1,z2))
beta0 beta1 nits SSE SST Ftest eta SE_eta
z1 9744 52302 6 907 1764 926 5.37 0.401
z2 9744 52302 6 907 1764 926 5.37 0.401

Variables z1, z2 show that the estimation was done in 6 iterations, with estimated parameters

• $$\hat\beta_0$$ = 9743.5 (true parameter is 10000)
• $$\hat\beta_1$$ = 52302.2 (true parameter is 50000)

## Testing All Local Gene-SNP Pairs

First we generate a eQTL dataset in filematrix format (see filematrix package).

tempdirectory = tempdir()
z = create_artificial_data(
nsample = 100,
ngene = 100,
nsnp = 501,
ncvrt = 1,
minMAF = 0.2,
saveDir = tempdirectory,
returnData = FALSE,
savefmat = TRUE,
savetxt = FALSE,
verbose = FALSE)

In this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.

multithreadACME(
genefm = "gene",
snpsfm = "snps",
glocfm = "gene_loc",
slocfm = "snps_loc",
cvrtfm = "cvrt",
acmefm = "ACME",
cisdist = 1.5e+06,
workdir = file.path(tempdirectory, "filematrices"),
verbose = FALSE)

Now the filematrix ACME holds estimations for all local gene-SNP pairs.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

pander(t(TenResults))
geneid snp_id beta0 beta1 nits SSE SST F eta SE
1 1 92.7 -26.9 8 104 114 9.51 -0.29 0.0602
1 2 82.7 -16.9 6 112 114 2.17 -0.204 0.109
1 3 72.7 -2.84 3 114 114 0.0595 -0.0391 0.155
1 4 61.7 12.8 5 113 114 1.13 0.207 0.223
2 4 99.1 -30.6 8 103 109 5.89 -0.309 0.079
2 5 82.5 -10.5 4 108 109 0.941 -0.127 0.114
2 6 86.9 -18.1 5 106 109 2.7 -0.209 0.0993
2 7 79.1 -7.04 4 109 109 0.339 -0.089 0.14
2 8 76.3 -3.47 4 109 109 0.0999 -0.0455 0.138
2 9 66.3 12.2 4 108 109 1.11 0.184 0.199

## Testing Multi-SNP Model for All Local Gene-SNP Pairs

Now we can estimate multi-SNP ACME models for each gene:

multisnpACME(
genefm = "gene",
snpsfm = "snps",
glocfm = "gene_loc",
slocfm = "snps_loc",
cvrtfm = "cvrt",
acmefm = "ACME",
workdir = file.path(tempdirectory, "filematrices"),
genecap = Inf,
verbose = FALSE)

Now the filematrix ACME_multiSNP holds estimations for all multi-SNP models.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

pander(t(TenResults))
geneid snp_id beta0 betas forward_adjR2
1 1 91.3 -29 0.0799
1 4 91.3 13.1 0.0843
2 4 120 -36.6 0.0475
2 5 120 -11.9 0.0505
3 10 104 21.6 0.00589
4 19 40.8 48.1 0.0669
4 17 40.8 21.3 0.0877
4 18 40.8 15.7 0.0939
5 21 35.7 126 0.198
5 23 35.7 51.4 0.243

Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R$$^2$$ value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model’s adjusted-R$$^2$$ is greater than that from the previous combined model.

# References

Palowitch, John, Andrey Shabalin, Yi-Hui Zhou, Andrew B Nobel, and Fred A Wright. 2017. “Estimation of Cis-eQTL Effect Sizes Using a Log of Linear Model.” Biometrics. Wiley Online Library.