Convolution-type smoothed quantile regression
The conquer library performs fast and accurate convolution-type smoothed quantile regression (Fernandes, Guerre and Horta, 2019), implemented via Barzilai-Borwein gradient descent (Barzilai and Borwein, 1988) with a Huber regression warm start. The package can also construct confidence intervals for regression coefficients using the multiplier bootstrap.
conquer is available on CRAN and can be installed from within R:
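The standard CRAN installation call is given here for completeness; it assumes a configured CRAN mirror and network access.

```r
# Install the released version of conquer from CRAN
install.packages("conquer")
```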
A collection of error / warning messages we received from issues or e-mails and their solutions:
Error: smqr.cpp: ‘quantile’ is not a member of ‘arma’. Solution: The quantile function was added to RcppArmadillo in version 0.9.850.1.0 (2020-02-09), so reinstalling or updating the library RcppArmadillo will fix this issue.
Error: unable to load shared object.. Symbol not found: _EXTPTR_PTR. Solution: This issue is common in some specific versions of R when loading any Rcpp-based library. It is an error in R caused by a minor change to EXTPTR_PTR. Upgrading R to 4.0.2 will solve the problem.
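To see whether either fix above is relevant on a given machine, the installed versions can be compared against the thresholds quoted. The following is a minimal sketch using base R only; the variable names r_ok and arma_ok are illustrative.

```r
# Compare the running R version with the 4.0.2 threshold mentioned above
r_ok <- getRversion() >= "4.0.2"

# Compare the installed RcppArmadillo (if any) with version 0.9.850.1.0
arma_ok <- requireNamespace("RcppArmadillo", quietly = TRUE) &&
  utils::packageVersion("RcppArmadillo") >= "0.9.850.1.0"

if (!r_ok) {
  message("R ", as.character(getRversion()), " predates 4.0.2; consider upgrading R.")
}
if (!arma_ok) {
  message("RcppArmadillo is missing or outdated; try install.packages(\"RcppArmadillo\").")
}
```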
Let us illustrate conquer with a simple example. For sample size n = 5000 and dimension p = 70, we generate data from the linear model y_i = β_0 + <x_i, β> + ε_i, for i = 1, 2, …, n. Here we set β_0 = 1, β is a p-dimensional vector with every entry equal to 1, x_i follows a p-dimensional standard multivariate normal distribution (generated with mvrnorm from the library MASS), and ε_i follows a t distribution with 2 degrees of freedom.
library(MASS)
library(quantreg)
library(conquer)
n = 5000
p = 70
beta = rep(1, p + 1)
set.seed(2020)
X = mvrnorm(n, rep(0, p), diag(p))
err = rt(n, 2)
Y = cbind(1, X) %*% beta + err
We then run both standard quantile regression, using the package quantreg with a Frisch-Newton approach after preprocessing (Portnoy and Koenker, 1997), and conquer (with a Gaussian kernel) on the generated data. The quantile level τ is fixed at 0.5.
tau = 0.5
start = Sys.time()
fit.qr = rq(Y ~ X, tau = tau, method = "pfn")
end = Sys.time()
time.qr = as.numeric(difftime(end, start, units = "secs"))
est.qr = norm(as.numeric(fit.qr$coefficients) - beta, "2")
start = Sys.time()
fit.conquer = conquer(X, Y, tau = tau)
end = Sys.time()
time.conquer = as.numeric(difftime(end, start, units = "secs"))
est.conquer = norm(fit.conquer$coeff - beta, "2")
Standard quantile regression takes 0.1955 seconds to run, whereas conquer takes only 0.0255 seconds. Meanwhile, the estimation error (in ℓ2 norm) is 0.1799 for quantile regression and 0.1685 for conquer. For readers’ reference, these runtimes were recorded on a MacBook Pro with a 2.3 GHz 8-core Intel Core i9 processor and 16 GB of 2667 MHz DDR4 memory.
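For intuition about the method named in the introduction, the sketch below fits a convolution-smoothed quantile regression with a Gaussian kernel using gradient descent with Barzilai-Borwein step sizes, in base R only. It is not the package's implementation: the function name smoothed_qr_sketch, the fixed bandwidth h = 0.2, the zero starting value (instead of a Huber warm start), the initial step size and the step-size cap are all assumptions made for illustration. With a Gaussian kernel, the derivative of the smoothed check loss at residual r is τ − Φ(−r/h), which is what the gradient function uses.

```r
# Illustrative only: smoothed quantile regression (Gaussian kernel) fitted by
# gradient descent with Barzilai-Borwein (BB) step sizes.
smoothed_qr_sketch <- function(X, y, tau = 0.5, h = 0.2,
                               max_iter = 500, tol = 1e-7) {
  Z <- cbind(1, X)                      # design matrix with intercept column
  n <- nrow(Z)

  # Gradient of the smoothed loss: (1/n) Z' (pnorm(-r/h) - tau), r = y - Z b
  grad <- function(b) {
    r <- as.numeric(y - Z %*% b)
    as.numeric(crossprod(Z, pnorm(-r / h) - tau)) / n
  }

  beta <- rep(0, ncol(Z))               # zero start (assumption, no warm start)
  g <- grad(beta)
  step <- 1                             # initial step size (assumption)

  for (iter in seq_len(max_iter)) {
    beta_new <- beta - step * g
    g_new <- grad(beta_new)
    if (max(abs(g_new)) < tol) {        # stop once the gradient is tiny
      beta <- beta_new
      break
    }
    d_beta <- beta_new - beta
    d_grad <- g_new - g
    curv <- sum(d_beta * d_grad)
    # BB step <d_beta, d_beta> / <d_beta, d_grad>; fall back to 1 if degenerate
    step <- if (curv > 1e-12) sum(d_beta^2) / curv else 1
    step <- min(step, 10)               # cap the step size (assumption)
    beta <- beta_new
    g <- g_new
  }
  beta
}

# Small simulated example: intercept and all slopes equal to 1
set.seed(1)
n <- 2000
p <- 3
X <- matrix(rnorm(n * p), n, p)
y <- 1 + X %*% rep(1, p) + rnorm(n)
beta_hat <- smoothed_qr_sketch(X, y, tau = 0.5)
```

Because the smoothed loss is differentiable and convex, plain first-order updates suffice, and the BB rule supplies a curvature-aware step size from two consecutive iterates without forming a Hessian.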
Help on the functions can be accessed by typing ?, followed by the function name, at the R command prompt.
For example, ?conquer will present detailed documentation with inputs, outputs and examples of the function conquer.
License: GPL-3.0
System requirements: C++11
Authors: Xuming He xmhe@umich.edu, Xiaoou Pan xip024@ucsd.edu, Kean Ming Tan keanming@umich.edu and Wen-Xin Zhou wez243@ucsd.edu
Maintainer: Xiaoou Pan xip024@ucsd.edu
Barzilai, J. and Borwein, J. M. (1988). Two-point step size gradient methods. IMA J. Numer. Anal. 8 141–148.
Fernandes, M., Guerre, E. and Horta, E. (2019). Smoothing quantile regressions. J. Bus. Econ. Statist., in press.
He, X., Pan, X., Tan, K. M., and Zhou, W.-X. (2020). Smoothed quantile regression with large-scale inference. Preprint.
Horowitz, J. L. (1998). Bootstrap methods for median regression models. Econometrica 66 1327–1351.
Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press, Cambridge.
Koenker, R. (2019). Package “quantreg”, version 5.54. CRAN.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33–50.
Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statist. Sci. 12 279–300.
Sanderson, C. and Curtin, R. (2016). Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 1 26.