This RMarkdown document demonstrates how key elements from the notebook for case study 2 in the EpiGraphDB paper can be achieved using the R package. For detailed explanations of the case study please refer to the paper or the case study notebook.
Systematic MR of molecular phenotypes, such as protein levels and transcript expression levels, offers enormous potential to prioritise drug targets for further investigation. However, many genes and gene products are not easily druggable, so some potentially important causal genes may not offer an obvious route to intervention.
A parallel problem is that current GWASes of molecular phenotypes have limited sample sizes and limited protein coverage. A potential way to address both problems is to use protein-protein interaction (PPI) information to identify druggable targets that are linked to a non-druggable but robustly causal target. Their relationship to the causal target increases our confidence in their potential causal role, even if the initial evidence of effect does not pass our multiple-testing threshold.
Here in case study 2 we demonstrate an approach to use data in EpiGraphDB to prioritise potential alternative drug targets in the same PPI network, as follows:
The triangulation of MR evidence and literature evidence as available from EpiGraphDB regarding these candidate genes will greatly enhance our confidence in identifying potential viable drug targets.
library("magrittr")
library("dplyr")
library("purrr")
library("glue")
library("epigraphdb")
Here we configure the parameters used in the case study example. We illustrate this approach using IL23R, an established drug target for inflammatory bowel disease (IBD) (Duerr et al., 2006; Momozawa et al., 2011).
While specific IL23R interventions are still undergoing trials, there is a possibility that these therapies may not be effective for all or even the majority of patients. This case study therefore explores potential alternative drug targets.
GENE_NAME <- "IL23R"
OUTCOME_TRAIT <- "Inflammatory bowel disease"
The assumption here is that the most likely alternative targets are either directly interacting with IL23R or somewhere in the same PPI network. In this example, we consider only genes that were found to interact with IL23R via direct protein-protein interactions, and require that those interacting proteins should also be druggable.
Finan et al. (2017) classified thousands of genes by their druggability: the Tier 1 category covers targets of approved drugs or drugs in clinical testing, and druggability confidence then decreases through Tier 2 and Tier 3.
Here we use the GET /gene/druggability/ppi endpoint to get data on the druggable alternative genes.
# Query druggable genes that interact with `gene_name` via direct PPI
get_drug_targets_ppi <- function(gene_name) {
  endpoint <- "/gene/druggability/ppi"
  params <- list(gene_name = gene_name)
  df <- query_epigraphdb(route = endpoint, params = params, mode = "table")
  df
}

ppi_df <- get_drug_targets_ppi(gene_name = GENE_NAME)
ppi_df
#> # A tibble: 42 x 5
#> g1.name p1.uniprot_id p2.uniprot_id g2.name g2.druggability_tier
#> <chr> <chr> <chr> <chr> <chr>
#> 1 IL23R Q5VWK5 P04141 CSF2 Tier 1
#> 2 IL23R Q5VWK5 P01562 IFNA1 Tier 1
#> 3 IL23R Q5VWK5 P01579 IFNG Tier 1
#> 4 IL23R Q5VWK5 P22301 IL10 Tier 1
#> 5 IL23R Q5VWK5 P29460 IL12B Tier 1
#> 6 IL23R Q5VWK5 P42701 IL12RB1 Tier 1
#> 7 IL23R Q5VWK5 P35225 IL13 Tier 1
#> 8 IL23R Q5VWK5 P40933 IL15 Tier 1
#> 9 IL23R Q5VWK5 Q16552 IL17A Tier 1
#> 10 IL23R Q5VWK5 Q96PD4 IL17F Tier 1
#> # … with 32 more rows
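To see how the interacting genes are spread across druggability tiers, the result can be tabulated. The following is a minimal sketch in base R. `toy_ppi` is a small hand-made stand-in for `ppi_df` so that the snippet runs without querying the API: the Tier 1 rows are copied from the output above, while `GENE_A` and `GENE_B` are hypothetical placeholders for lower-tier genes.

```r
# Toy stand-in for `ppi_df` so the example runs offline.
# Tier 1 rows come from the output above; GENE_A and GENE_B are
# hypothetical placeholders, not real query results.
toy_ppi <- data.frame(
  g1.name = "IL23R",
  g2.name = c("CSF2", "IFNA1", "IFNG", "GENE_A", "GENE_B"),
  g2.druggability_tier = c("Tier 1", "Tier 1", "Tier 1", "Tier 2", "Tier 3"),
  stringsAsFactors = FALSE
)

# Count interacting genes per druggability tier
tier_counts <- table(toy_ppi$g2.druggability_tier)
tier_counts
```

Applying the same `table()` call to the real `ppi_df` should show how many of the 42 interactors fall into each tier.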
For further analysis we select the gene of interest (IL23R) as well as its interacting genes with Tier 1 druggability.
# Collect Tier 1 interacting genes, optionally with the primary gene itself
get_gene_list <- function(ppi_df, include_primary_gene = TRUE) {
  if (include_primary_gene) {
    gene_list <- c(
      ppi_df %>% pull(`g1.name`) %>% unique(),
      ppi_df %>% filter(`g2.druggability_tier` == "Tier 1") %>% pull(`g2.name`)
    )
  } else {
    gene_list <- ppi_df %>%
      filter(`g2.druggability_tier` == "Tier 1") %>%
      pull(`g2.name`)
  }
  gene_list
}

gene_list <- get_gene_list(ppi_df)
gene_list
#> [1] "IL23R" "CSF2" "IFNA1" "IFNG" "IL10" "IL12B" "IL12RB1"
#> [8] "IL13" "IL15" "IL17A" "IL17F" "IL2" "IL22" "IL23A"
#> [15] "IL4" "IL5" "IL6" "IL9" "JAK1" "JAK2" "NFKB1"
#> [22] "PIK3CA" "RORC" "STAT3" "TSLP" "TYK2"
The next step is to find out whether any of these genes have a comparable and statistically plausible effect on IBD.
Here we search EpiGraphDB for the Mendelian randomization (MR) results for these genes and IBD from the recent study by Zheng et al., 2019 (https://epigraphdb.org/xqtl/) via the GET /xqtl/single-snp-mr endpoint.
# Query single-SNP MR results for each gene against the outcome trait
extract_mr <- function(outcome_trait, gene_list, qtl_type) {
  endpoint <- "/xqtl/single-snp-mr"
  per_gene <- function(gene_name) {
    params <- list(
      exposure_gene = gene_name,
      outcome_trait = outcome_trait,
      qtl_type = qtl_type,
      pval_threshold = 1e-5
    )
    df <- query_epigraphdb(route = endpoint, params = params, mode = "table")
    df
  }
  res_df <- gene_list %>% map_df(per_gene)
  res_df
}

# Run the query for both QTL types and record which type each row came from
xqtl_df <- c("pQTL", "eQTL") %>% map_df(function(qtl_type) {
  extract_mr(
    outcome_trait = OUTCOME_TRAIT,
    gene_list = gene_list,
    qtl_type = qtl_type
  ) %>%
    mutate(qtl_type = qtl_type)
})
xqtl_df
#> # A tibble: 9 x 9
#> gene.ensembl_id gene.name gwas.id gwas.trait r.beta r.se r.p r.rsid
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 ENSG00000162594 IL23R ieu-a-… Inflammat… 1.50 0.0546 0. rs115…
#> 2 ENSG00000113302 IL12B ieu-a-… Inflammat… 0.418 0.0345 9.59e-34 rs492…
#> 3 ENSG00000162594 IL23R ieu-a-… Inflammat… 0.887 0.0644 4.16e-43 rs206…
#> 4 ENSG00000164136 IL15 ieu-a-… Inflammat… -1.42 0.197 5.53e-13 rs753…
#> 5 ENSG00000113520 IL4 ieu-a-… Inflammat… 0.460 0.0840 4.47e- 8 rs207…
#> 6 ENSG00000096968 JAK2 ieu-a-… Inflammat… -1.90 0.204 1.32e-20 rs478…
#> 7 ENSG00000109320 NFKB1 ieu-a-… Inflammat… 0.974 0.174 2.16e- 8 rs476…
#> 8 ENSG00000143365 RORC ieu-a-… Inflammat… -0.995 0.116 1.21e-17 rs484…
#> 9 ENSG00000168610 STAT3 ieu-a-… Inflammat… 0.597 0.0757 2.96e-15 rs105…
#> # … with 1 more variable: qtl_type <chr>
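One way to read the table above is to turn each estimate into an approximate 95% confidence interval and note the direction of effect. The sketch below does this in base R on four rows copied from the output (`toy_mr` is a hand-made stand-in for `xqtl_df`). The interval uses a normal approximation, `r.beta ± qnorm(0.975) * r.se`, which is an assumption made here for illustration rather than something the endpoint returns.

```r
# Four rows copied from the output above (rounded as displayed)
toy_mr <- data.frame(
  gene.name = c("IL12B", "IL15", "JAK2", "STAT3"),
  r.beta = c(0.418, -1.42, -1.90, 0.597),
  r.se = c(0.0345, 0.197, 0.204, 0.0757),
  stringsAsFactors = FALSE
)

# Normal-approximation 95% confidence interval (illustrative assumption)
z <- qnorm(0.975)
toy_mr$ci_lower <- toy_mr$r.beta - z * toy_mr$r.se
toy_mr$ci_upper <- toy_mr$r.beta + z * toy_mr$r.se

# Direction of the estimated effect, and whether the interval excludes zero
toy_mr$direction <- ifelse(toy_mr$r.beta > 0, "positive", "negative")
toy_mr$ci_excludes_zero <- toy_mr$ci_lower > 0 | toy_mr$ci_upper < 0
toy_mr
```

Because the query already filters on `pval_threshold = 1e-5`, every returned row is expected to have an interval that excludes zero; the direction column is the more informative part when comparing candidate genes with IL23R.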
Can we find evidence in the literature where these genes are found to be associated with IBD, either to increase our level of confidence in the MR results or to provide alternative evidence where MR results do not exist?
We can use the GET /gene/literature endpoint to get data on the literature evidence for the set of genes.
# Query literature evidence linking each gene to the outcome trait
extract_literature <- function(outcome_trait, gene_list) {
  per_gene <- function(gene_name) {
    endpoint <- "/gene/literature"
    params <- list(
      gene_name = gene_name,
      object_name = outcome_trait %>% stringr::str_to_lower()
    )
    df <- query_epigraphdb(route = endpoint, params = params, mode = "table")
    df
  }
  res_df <- gene_list %>% map_df(per_gene)
  # Count the PubMed IDs behind each gene-predicate-trait row
  res_df %>%
    mutate(literature_count = map_int(pubmed_id, function(x) length(x)))
}

literature_df <- extract_literature(
  outcome_trait = OUTCOME_TRAIT,
  gene_list = gene_list
)
literature_df
#> # A tibble: 45 x 5
#> pubmed_id gene.name st.predicate st.object_name literature_count
#> <list> <chr> <chr> <chr> <int>
#> 1 <chr [2]> IL23R NEG_ASSOCIATED_… Inflammatory Bowel Di… 2
#> 2 <chr [1]> IL23R AFFECTS Inflammatory Bowel Di… 1
#> 3 <chr [21]> IL23R ASSOCIATED_WITH Inflammatory Bowel Di… 21
#> 4 <chr [1]> IL23R PREDISPOSES Inflammatory Bowel Di… 1
#> 5 <chr [2]> CSF2 ASSOCIATED_WITH Inflammatory Bowel Di… 2
#> 6 <chr [1]> CSF2 AFFECTS Inflammatory Bowel Di… 1
#> 7 <chr [3]> IFNA1 ASSOCIATED_WITH Inflammatory Bowel Di… 3
#> 8 <chr [1]> IFNA1 PREVENTS Inflammatory Bowel Di… 1
#> 9 <chr [2]> IFNG ASSOCIATED_WITH Inflammatory Bowel Di… 2
#> 10 <chr [1]> IFNG AFFECTS Inflammatory Bowel Di… 1
#> # … with 35 more rows
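To bring the two lines of evidence together, the literature counts can be summed per gene and flagged against the genes that returned an MR result. The sketch below is one possible way to do this in base R, not the procedure used in the paper. `toy_literature` reproduces the ten rows displayed above, and `mr_genes` lists the gene names from the xQTL MR table shown earlier. Summing `literature_count` across predicates may count a publication more than once if the same PubMed ID appears under several predicates.

```r
# The ten rows displayed above (gene name and per-predicate count only)
toy_literature <- data.frame(
  gene.name = c(rep("IL23R", 4), rep("CSF2", 2), rep("IFNA1", 2), rep("IFNG", 2)),
  literature_count = c(2L, 1L, 21L, 1L, 2L, 1L, 3L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Total literature count per gene, summed across predicates
lit_totals <- aggregate(
  literature_count ~ gene.name,
  data = toy_literature,
  FUN = sum
)

# Gene names that appear in the xQTL MR table shown earlier
mr_genes <- c("IL23R", "IL12B", "IL15", "IL4", "JAK2", "NFKB1", "RORC", "STAT3")
lit_totals$has_mr_evidence <- lit_totals$gene.name %in% mr_genes

# Most-cited genes first
lit_totals[order(-lit_totals$literature_count), ]
```

In this subset only IL23R has both MR and literature evidence; genes such as CSF2 or IFNG illustrate the case where literature provides evidence in the absence of an MR result.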
sessionInfo
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Manjaro Linux
#>
#> Matrix products: default
#> BLAS: /usr/lib/libblas.so.3.9.0
#> LAPACK: /usr/lib/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] epigraphdb_0.2.1 igraph_1.2.5 glue_1.4.1 purrr_0.3.4
#> [5] dplyr_1.0.2 magrittr_1.5
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.4.6 knitr_1.28 tidyselect_1.1.0 R6_2.4.1
#> [5] rlang_0.4.7 fansi_0.4.1 stringr_1.4.0 httr_1.4.2
#> [9] tools_4.0.2 xfun_0.14 utf8_1.1.4 cli_2.0.2
#> [13] gtools_3.8.2 htmltools_0.4.0 ellipsis_0.3.1 assertthat_0.2.1
#> [17] yaml_2.2.1 digest_0.6.25 tibble_3.0.3 lifecycle_0.2.0
#> [21] crayon_1.3.4 vctrs_0.3.2 curl_4.3 evaluate_0.14
#> [25] rmarkdown_2.1 stringi_1.4.6 compiler_4.0.2 pillar_1.4.6
#> [29] generics_0.0.2 jsonlite_1.7.0 pkgconfig_2.0.3