Large-scale Bayesian variable selection for R

Overview

We introduce varbvs, an R package for analysis of large-scale data sets using Bayesian variable selection methods. To make Bayesian variable selection applicable to a range of problems, the varbvs interface hides most of the complexities of modeling and optimization, while also providing many options for adapting it to specific applications. The varbvs software has been used to implement Bayesian variable selection for large problems with over a million variables and thousands of samples, including analysis of massive genome-wide data sets.

The R package has been tested using Travis CI, and the code coverage of the tests has been assessed using Codecov.

Citing varbvs

If you find that this software is useful for your research project, please cite our paper:

Carbonetto, P., and Stephens, M. (2012). Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Analysis 7, 73-108.

License

Copyright (c) 2012-2016, Peter Carbonetto.

The varbvs source code repository by Peter Carbonetto is free software: you can redistribute it under the terms of the GNU General Public License. All the files in this project are part of varbvs. This project is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See file LICENSE for the full text of the license.

Installing the package

To install the official release of the varbvs package from CRAN, simply run the following in R:

install.packages("varbvs")

Alternatively, you can install the most up-to-date development version. The easiest way to do this is with the devtools package:

install.packages("devtools")
library(devtools)
install_github("pcarbo/varbvs",subdir = "varbvs-R")

Without devtools, installation is a little more complicated, but not hard. Begin by downloading the GitHub repository for this project. The simplest way to do this is to download the repository as a ZIP archive. Once you have extracted the files from the archive, you will see that the main directory has two subdirectories, one containing the MATLAB code, and the other containing the files for the R package.

The R subdirectory, varbvs-R, has all the files needed to build and install the package. To install it, follow the standard instructions for installing an R package from source. On a Unix or Unix-like platform (e.g., Mac OS X), the following steps should install the R package (the version number in the tarball name may differ depending on the version you downloaded):

mv varbvs-R varbvs
R CMD build varbvs
R CMD INSTALL varbvs_2.0.0.tar.gz
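
After either installation route, a quick check from within R can confirm that the package is visible in the current library. The sketch below uses only base R functions. The helper name check_installed is hypothetical and is not part of varbvs.

```r
# Hypothetical helper (not part of varbvs): report whether a package
# is installed and, if so, which version is found.
check_installed <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    message(pkg, " ", as.character(utils::packageVersion(pkg)),
            " is installed")
    TRUE
  } else {
    message(pkg, " is not installed")
    FALSE
  }
}

check_installed("varbvs")
```

If the function returns FALSE, the most likely cause is that the package was installed into a library that is not on the search path reported by .libPaths().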

Using the package

Once you have installed the package, load the package in R by entering

library(varbvs)

To get an overview of the package, enter

help(package = "varbvs")

The key function in this package is varbvs. Here is an example in which we fit the variable selection model to the leukemia data set:

library(varbvs)
data(leukemia)
fit <- varbvs(leukemia$x,NULL,leukemia$y,family = "binomial",
              logodds = seq(-3.5,-1,0.1),sa = 1)
print(summary(fit))

To get more information about this function, type

help(varbvs)
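
To try the same interface without the leukemia data, the following sketch fits the model to a small simulated linear regression data set. All of the simulation settings (sample size, number of variables, effect sizes, the logodds grid) are assumptions chosen for illustration, not recommendations from the package authors, and the call to predict assumes the predict method for varbvs fits is available in your installed version. The fit is skipped if varbvs is not installed.

```r
# Simulate a small linear regression data set (illustrative values only).
set.seed(1)
n <- 200                        # number of samples
p <- 50                         # number of candidate variables
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
beta <- rep(0, p)
beta[1:3] <- c(2, -2, 1.5)      # only the first three variables have an effect
y <- drop(X %*% beta + rnorm(n))

# Fit the variable selection model only if varbvs is available.
if (requireNamespace("varbvs", quietly = TRUE)) {
  fit <- varbvs::varbvs(X, NULL, y, family = "gaussian",
                        logodds = seq(-3, -1, 0.1))
  print(summary(fit))

  # Compare fitted values against the simulated outcomes.
  ypred <- predict(fit, X)
  print(cor(y, ypred))
}
```

Because only the first three simulated coefficients are nonzero, the summary gives a simple sanity check: those variables would be expected to rank near the top of the reported variables.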

Working examples

We have provided several R scripts in the vignettes and tests folders to illustrate application of varbvs to small and large data sets.

Credits

The varbvs software package was developed by:
Peter Carbonetto
Dept. of Human Genetics, University of Chicago
and AncestryDNA, San Francisco, California
2012-2016

Xiang Zhou, Xiang Zhu and Matthew Stephens have also contributed to the development of this software.