HaploReg http://archive.broadinstitute.org/mammals/haploreg/haploreg.php and RegulomeDB http://www.regulomedb.org are web-based tools that extract biological information such as eQTL, LD, motifs, etc. from large genomic projects such as ENCODE, the 1000 Genomes Project, the Roadmap Epigenomics Project and others. This is sometimes called “post-GWAS” analysis.
The R package haploR was developed to query these tools (HaploReg and RegulomeDB) directly from R in order to facilitate high-throughput genomic data analysis. Below we provide several examples that show how to work with this package.
Note: you must have a stable Internet connection to use this package.
Contact: ilya.zhbannikov@duke.edu for questions about the usage of haploR or any other issues.
In order to install the package, the user must first install R https://www.r-project.org. After that, haploR (its development version) can be installed with:
devtools::install_github("izhbannikov/haplor", build_vignettes=TRUE)
library(haploR)
results <- queryHaploreg(query=c("rs10048158","rs4791078"))
head(results)
## chr pos_hg38 r2 D' is_query_snp rsID ref alt AFR AMR ASN
## 1 17 66213160 0.82 0.93 0 rs4790914 C G 0.84 0.66 0.91
## 2 17 66213422 0.82 0.93 0 rs4791079 T G 0.85 0.67 0.91
## 3 17 66213896 0.82 0.93 0 rs4791078 A C 0.84 0.67 0.91
## 4 17 66214285 0.83 0.93 0 rs1971682 G C 0.86 0.67 0.91
## 5 17 66216124 0.83 0.93 0 rs4366742 T C 0.93 0.68 0.91
## 6 17 66219453 0.83 0.93 0 rs2215415 G A 0.91 0.67 0.91
## EUR GERP_cons SiPhy_cons Chromatin_States Chromatin_States_Imputed
## 1 0.57 0 0
## 2 0.57 0 0
## 3 0.57 0 0
## 4 0.57 0 0
## 5 0.57 0 0 E118,6_EnhG
## 6 0.57 0 0
## Chromatin_Marks
## 1
## 2
## 3 E002,H3K4me3_Pro;E004,H3K9ac_Pro
## 4 E002,H3K4me3_Pro;E056,H3K4me1_Enh;E066,H3K4me1_Enh;E118,H3K4me1_Enh
## 5 E002,H3K4me1_Enh;E066,H3K4me1_Enh;E118,H3K4me1_Enh;E006,H3K9ac_Pro;E066,H3K27ac_Enh
## 6 E038,H3K9ac_Pro;E066,H3K4me1_Enh
## DNAse Proteins
## 1 .
## 2 .
## 3 .
## 4 HepG2,POL2,UT-A,None
## 5 .
## 6 HepG2,POL2,Stanford,forskolin
## eQTL
## 1 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,3.25964766049438e-10
## 2 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,2.87827072933431e-10;Koopman2014,Heart,PRKCA,9.83E-07
## 3 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,2.87827072933431e-10
## 4 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,5.94596439339797e-11
## 5 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,6.83955923561212e-11
## 6 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,6.80544182605399e-11
## gwas grasp
## 1 . .
## 2 . .
## 3 . .
## 4 . .
## 5 . .
## 6 . .
## Motifs
## 1 AP-4_3;Ascl2;E2A_5;Foxa_disc3;HEN1_2;LBP-1_2;NRSF_disc4;NRSF_disc8;NRSF_known3;Rad21_disc10;Rad21_disc5;SMC3_disc3;ZEB1_disc1
## 2 Pou5f1_disc1;RFX5_known1;Sox_4;TATA_disc7
## 3 RFX5_known5;Zbtb3
## 4 AP-1_disc1;HEN1_1;Maf_disc2;NR4A_known1;RAR;RXRA_known3;T3R
## 5 Pdx1_2
## 6 AP-1_disc1;HEY1_disc1;TATA_disc2
## GENCODE_id GENCODE_name GENCODE_direction GENCODE_distance
## 1 ENSG00000091583.6 APOH 0 0
## 2 ENSG00000091583.6 APOH 0 0
## 3 ENSG00000091583.6 APOH 0 0
## 4 ENSG00000091583.6 APOH 0 0
## 5 ENSG00000091583.6 APOH 0 0
## 6 ENSG00000091583.6 APOH 0 0
## RefSeq_id RefSeq_name RefSeq_direction RefSeq_distance
## 1 NM_000042 APOH 0 0
## 2 NM_000042 APOH 0 0
## 3 NM_000042 APOH 0 0
## 4 NM_000042 APOH 0 0
## 5 NM_000042 APOH 0 0
## 6 NM_000042 APOH 0 0
## dbSNP_functional_annotation query_snp_rsid
## 1 INT rs10048158
## 2 INT rs10048158
## 3 INT rs10048158
## 4 INT rs10048158
## 5 INT rs10048158
## 6 INT rs10048158
Here query is a vector with the names of genetic variants, and results is a table similar to the output of HaploReg.
If you have a file with the SNPs you would like to analyze (one SNP per line), you can supply it as input as follows:
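Several columns of the returned table pack multiple records into a single string: in the output above, eQTL entries are separated by ";" and the fields within an entry by ",". The sketch below splits such a string into a data frame with base R. The helper name parse_eqtl and the column names study, tissue, gene and pvalue are assumptions made for this illustration (they are not part of haploR), and the input string is copied by hand from row 2 of the output above rather than obtained from a live query.

```r
# Hypothetical helper (not part of haploR): split one eQTL cell into a data frame.
# Assumes entries are ";"-separated and fields are ","-separated, as in the
# output shown above.
parse_eqtl <- function(cell) {
  # Empty cells and "." are treated as "no eQTL record".
  if (is.na(cell) || cell %in% c("", ".")) {
    return(data.frame(study = character(0), tissue = character(0),
                      gene = character(0), pvalue = numeric(0),
                      stringsAsFactors = FALSE))
  }
  entries <- strsplit(cell, ";", fixed = TRUE)[[1]]
  fields <- strsplit(entries, ",", fixed = TRUE)
  data.frame(
    study  = vapply(fields, `[`, character(1), 1),
    tissue = vapply(fields, `[`, character(1), 2),
    gene   = vapply(fields, `[`, character(1), 3),
    pvalue = as.numeric(vapply(fields, `[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

# Cell copied by hand from row 2 of the eQTL column printed above.
eqtl_cell <- "GTEx2015_v6,Heart_Left_Ventricle,PRKCA,2.87827072933431e-10;Koopman2014,Heart,PRKCA,9.83E-07"
eqtl <- parse_eqtl(eqtl_cell)
print(eqtl)
```

Applied to a whole column, for example with lapply(results$eQTL, parse_eqtl), this would give one small data frame per variant.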
library(haploR)
results <- queryHaploreg(file=system.file("extdata/snps.txt", package = "haploR"))
head(results)
## chr pos_hg38 r2 D' is_query_snp rsID ref alt AFR AMR ASN
## 1 17 66213160 0.82 0.93 0 rs4790914 C G 0.84 0.66 0.91
## 2 17 66213422 0.82 0.93 0 rs4791079 T G 0.85 0.67 0.91
## 3 17 66213896 0.82 0.93 0 rs4791078 A C 0.84 0.67 0.91
## 4 17 66214285 0.83 0.93 0 rs1971682 G C 0.86 0.67 0.91
## 5 17 66216124 0.83 0.93 0 rs4366742 T C 0.93 0.68 0.91
## 6 17 66219453 0.83 0.93 0 rs2215415 G A 0.91 0.67 0.91
## EUR GERP_cons SiPhy_cons Chromatin_States Chromatin_States_Imputed
## 1 0.57 0 0
## 2 0.57 0 0
## 3 0.57 0 0
## 4 0.57 0 0
## 5 0.57 0 0 E118,6_EnhG
## 6 0.57 0 0
## Chromatin_Marks
## 1
## 2
## 3 E002,H3K4me3_Pro;E004,H3K9ac_Pro
## 4 E002,H3K4me3_Pro;E056,H3K4me1_Enh;E066,H3K4me1_Enh;E118,H3K4me1_Enh
## 5 E002,H3K4me1_Enh;E066,H3K4me1_Enh;E118,H3K4me1_Enh;E006,H3K9ac_Pro;E066,H3K27ac_Enh
## 6 E038,H3K9ac_Pro;E066,H3K4me1_Enh
## DNAse Proteins
## 1 .
## 2 .
## 3 .
## 4 HepG2,POL2,UT-A,None
## 5 .
## 6 HepG2,POL2,Stanford,forskolin
## eQTL
## 1 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,3.25964766049438e-10
## 2 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,2.87827072933431e-10;Koopman2014,Heart,PRKCA,9.83E-07
## 3 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,2.87827072933431e-10
## 4 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,5.94596439339797e-11
## 5 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,6.83955923561212e-11
## 6 GTEx2015_v6,Heart_Left_Ventricle,PRKCA,6.80544182605399e-11
## gwas grasp
## 1 . .
## 2 . .
## 3 . .
## 4 . .
## 5 . .
## 6 . .
## Motifs
## 1 AP-4_3;Ascl2;E2A_5;Foxa_disc3;HEN1_2;LBP-1_2;NRSF_disc4;NRSF_disc8;NRSF_known3;Rad21_disc10;Rad21_disc5;SMC3_disc3;ZEB1_disc1
## 2 Pou5f1_disc1;RFX5_known1;Sox_4;TATA_disc7
## 3 RFX5_known5;Zbtb3
## 4 AP-1_disc1;HEN1_1;Maf_disc2;NR4A_known1;RAR;RXRA_known3;T3R
## 5 Pdx1_2
## 6 AP-1_disc1;HEY1_disc1;TATA_disc2
## GENCODE_id GENCODE_name GENCODE_direction GENCODE_distance
## 1 ENSG00000091583.6 APOH 0 0
## 2 ENSG00000091583.6 APOH 0 0
## 3 ENSG00000091583.6 APOH 0 0
## 4 ENSG00000091583.6 APOH 0 0
## 5 ENSG00000091583.6 APOH 0 0
## 6 ENSG00000091583.6 APOH 0 0
## RefSeq_id RefSeq_name RefSeq_direction RefSeq_distance
## 1 NM_000042 APOH 0 0
## 2 NM_000042 APOH 0 0
## 3 NM_000042 APOH 0 0
## 4 NM_000042 APOH 0 0
## 5 NM_000042 APOH 0 0
## 6 NM_000042 APOH 0 0
## dbSNP_functional_annotation query_snp_rsid
## 1 INT rs10048158
## 2 INT rs10048158
## 3 INT rs10048158
## 4 INT rs10048158
## 5 INT rs10048158
## 6 INT rs10048158
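To prepare such a file yourself, it is enough to write one rsID per line. The sketch below does this with base R and a temporary file; the path and variable names are illustrative, and the final call is left commented out because it needs an Internet connection.

```r
# Illustrative SNP list (the two rsIDs queried earlier in this document).
snps <- c("rs10048158", "rs4791078")

# Write one SNP per line to a temporary file.
snp_file <- tempfile(fileext = ".txt")
writeLines(snps, snp_file)

# Read the file back to confirm its layout.
snps_read <- readLines(snp_file)
print(snps_read)

# With a network connection, the file could then be passed to haploR:
# results <- haploR::queryHaploreg(file = snp_file)
```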
Sometimes you would like to explore the results of a study that has already been performed. In this case you should first explore the existing studies from the HaploReg web site and then use one of them as an input parameter. See the example below:
library(haploR)
# Getting a list of existing studies:
studies <- getStudyList()
# Let us look at the first element:
studies[[1]]
## $name
## [1] ""
##
## $id
## [1] "0"
# Let us look at the second element:
studies[[2]]
## $name
## [1] "β2-Glycoprotein I (β2-GPI) plasma levels (Athanasiadis G, 2013, 9 SNPs)"
##
## $id
## [1] "1756"
# Query HaploReg to explore the results from
# this study:
results <- queryHaploreg(study=studies[[2]])
head(results)
## chr pos_hg38 r2 D' is_query_snp rsID ref alt AFR AMR ASN
## 1 11 34524785 0.97 1 0 rs836138 C A 0.34 0.16 0.42
## 2 11 34524788 0.87 0.97 0 rs11032744 C T 0.04 0.1 0.0035
## 3 11 34526877 1 1 0 rs836137 A G 0.37 0.16 0.42
## 4 11 34527359 1 1 0 rs836135 G A 0.36 0.16 0.42
## 5 11 34527815 1 1 0 rs704727 T A 0.16 0.15 0.42
## 6 11 34530979 0.96 0.99 0 rs836133 C T 0.16 0.15 0.42
## EUR GERP_cons SiPhy_cons Chromatin_States
## 1 0.11 1 1 E091,7_Enh;E118,11_BivFlnk
## 2 0.1 1 0 E091,7_Enh;E118,11_BivFlnk
## 3 0.11 0 0 E066,7_Enh;E118,7_Enh
## 4 0.11 0 0 E066,7_Enh;E118,7_Enh
## 5 0.11 0 0 E066,7_Enh;E118,7_Enh
## 6 0.11 0 0
## Chromatin_States_Imputed
## 1
## 2
## 3
## 4
## 5
## 6
## Chromatin_Marks
## 1 E007,H3K9ac_Pro;E083,H3K9ac_Pro;E117,H3K9ac_Pro;E011,H3K4me1_Enh;E012,H3K4me1_Enh;E022,H3K4me1_Enh;E056,H3K4me1_Enh;E083,H3K4me1_Enh;E091,H3K4me1_Enh;E099,H3K4me1_Enh;E118,H3K4me1_Enh;E055,H3K27ac_Enh;E099,H3K27ac_Enh;E118,H3K4me3_Pro
## 2 E007,H3K9ac_Pro;E083,H3K9ac_Pro;E117,H3K9ac_Pro;E011,H3K4me1_Enh;E012,H3K4me1_Enh;E022,H3K4me1_Enh;E056,H3K4me1_Enh;E083,H3K4me1_Enh;E091,H3K4me1_Enh;E099,H3K4me1_Enh;E118,H3K4me1_Enh;E055,H3K27ac_Enh;E099,H3K27ac_Enh;E118,H3K4me3_Pro
## 3 E006,H3K9ac_Pro;E028,H3K4me1_Enh;E056,H3K4me1_Enh;E066,H3K4me1_Enh;E079,H3K4me1_Enh;E091,H3K4me1_Enh;E099,H3K4me1_Enh;E115,H3K4me1_Enh;E118,H3K4me1_Enh;E037,H3K4me3_Pro;E094,H3K4me3_Pro;E061,H3K27ac_Enh
## 4 E006,H3K9ac_Pro;E066,H3K4me1_Enh;E099,H3K4me1_Enh;E118,H3K4me1_Enh
## 5 E066,H3K4me1_Enh;E118,H3K4me1_Enh
## 6 E118,H3K4me1_Enh
## DNAse Proteins eQTL gwas grasp
## 1 E118 . . . .
## 2 E118 . . . .
## 3 E091 . . . .
## 4 . . . .
## 5 . . . .
## 6 . . . .
## Motifs GENCODE_id
## 1 HDAC2_disc3;Irf_disc5;NRSF_disc10;Sin3Ak-20_disc7 ENSG00000255271.1
## 2 Nanog_disc2;SP1_known3 ENSG00000255271.1
## 3 . ENSG00000255271.1
## 4 Evi-1_4;Nrf1_known2;Pou1f1_2;Pou2f2_known11 ENSG00000255271.1
## 5 Evi-1_5;HNF1_2;Osf2_2;PEBP;Pax-5_disc1;Pax-5_known3 ENSG00000255271.1
## 6 AP-1_known1;ATF4;TCF11::MafG;Zbtb3 ENSG00000255271.1
## GENCODE_name GENCODE_direction GENCODE_distance RefSeq_id RefSeq_name
## 1 RP4-594L9.2 3 8228 NM_001422 ELF5
## 2 RP4-594L9.2 3 8225 NM_001422 ELF5
## 3 RP4-594L9.2 3 6136 NM_001422 ELF5
## 4 RP4-594L9.2 3 5654 NM_001422 ELF5
## 5 RP4-594L9.2 3 5198 NM_001422 ELF5
## 6 RP4-594L9.2 3 2034 NM_001422 ELF5
## RefSeq_direction RefSeq_distance dbSNP_functional_annotation
## 1 5 11001 .
## 2 5 11004 .
## 3 5 13093 .
## 4 5 13575 .
## 5 5 14031 .
## 6 5 17195 .
## query_snp_rsid
## 1 rs836132
## 2 rs836132
## 3 rs836132
## 4 rs836132
## 5 rs836132
## 6 rs836132
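Picking a study by its position in the list is fragile, so it can help to look one up by name. The sketch below searches a list shaped like the getStudyList() output shown above (elements with name and id fields). The list mock_studies is hand-built from the two entries printed above, and find_study is a hypothetical helper, not a haploR function.

```r
# Hand-built stand-in for the output of getStudyList(); only the two
# entries printed above are reproduced here.
mock_studies <- list(
  list(name = "", id = "0"),
  list(name = "β2-Glycoprotein I (β2-GPI) plasma levels (Athanasiadis G, 2013, 9 SNPs)",
       id = "1756")
)

# Hypothetical helper: return the studies whose name matches a pattern,
# ignoring case.
find_study <- function(studies, pattern) {
  study_names <- vapply(studies, function(s) s$name, character(1))
  studies[grepl(pattern, study_names, ignore.case = TRUE)]
}

hits <- find_study(mock_studies, "glycoprotein")
print(hits[[1]]$id)

# With a network connection, the match could then be queried as above:
# results <- haploR::queryHaploreg(study = hits[[1]])
```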
library(haploR)
data <- queryRegulome(c("rs4791078","rs10048158"))
head(data)
## $res.table
## #chromosome coordinate rsid
## 1 chr17 64236317 rs10048158
## 2 chr17 64210013 rs4791078
## hits
## 1 Chromatin_Structure||DNase-seq|Progfib, Chromatin_Structure||DNase-seq|Hffmyc, Chromatin_Structure||DNase-seq|Gm04503, Chromatin_Structure||DNase-seq|Nhdfad, Chromatin_Structure||DNase-seq|Bj, Chromatin_Structure||DNase-seq|Sknmc, Chromatin_Structure||DNase-seq|Phte, Chromatin_Structure||DNase-seq|Gm04504, Chromatin_Structure||DNase-seq|Ag10803, Chromatin_Structure||DNase-seq|Hff
## 2 No data
## score
## 1 5
## 2 7
##
## $bad.snp.id
## character(0)
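The value printed above is a list with two elements, res.table and bad.snp.id. A minimal sketch of how one might work with it offline follows. Here mock_data is hand-built to mimic that structure (with the hits string shortened to three entries, and column types that may differ from a real result), so the code runs without a network connection; the variable names are illustrative.

```r
# Hand-built stand-in for the list returned by queryRegulome().
mock_data <- list(
  res.table = data.frame(
    "#chromosome" = c("chr17", "chr17"),
    coordinate = c(64236317, 64210013),
    rsid = c("rs10048158", "rs4791078"),
    hits = c(paste("Chromatin_Structure||DNase-seq|Progfib",
                   "Chromatin_Structure||DNase-seq|Hffmyc",
                   "Chromatin_Structure||DNase-seq|Bj", sep = ", "),
             "No data"),
    score = c("5", "7"),
    check.names = FALSE,
    stringsAsFactors = FALSE
  ),
  bad.snp.id = character(0)
)

tab <- mock_data$res.table

# Split the comma-separated hits of each SNP into a character vector.
hits_by_snp <- strsplit(tab$hits, ", ", fixed = TRUE)
names(hits_by_snp) <- tab$rsid

# Count annotations per SNP, treating "No data" as zero hits.
n_hits <- vapply(hits_by_snp, function(h) sum(h != "No data"), integer(1))
print(n_hits)

# An empty bad.snp.id means every queried rsID was recognised.
print(length(mock_data$bad.snp.id) == 0)
```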
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.3
##
## locale:
## [1] C/UTF-8/C/C/C/C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] haploR_1.4.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.9 XML_3.98-1.5 digest_0.6.12 rprojroot_1.2
## [5] mime_0.5 R6_2.2.0 backports_1.0.5 magrittr_1.5
## [9] evaluate_0.10 httr_1.2.1 stringi_1.1.2 curl_2.3
## [13] rmarkdown_1.3 tools_3.3.2 stringr_1.1.0 yaml_2.1.14
## [17] htmltools_0.3.5 knitr_1.15.1