There are two prevaling methods for using HTS-SIP data to estimate the amount of isotope that each OTU incorporated:
In this vignette, we are going to show how to run both analyses and also compare the results a bit.
First, let’s load some packages including HTSSIP
.
library(dplyr)
library(ggplot2)
library(HTSSIP)
OK. We’re going to be using 2 data files:
We’ll be using the dataset that we simulated in the HTSSIP_sim vignette.
The phyloseq object is similar to the dataset in the other vignettes.
# HTS-SIP data
physeq_rep3
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 6 taxa and 144 samples ]
## sample_data() Sample Data: [ 144 samples by 4 sample variables ]
The associated qPCR data is a list of length = 2.
# qPCR data (list object)
physeq_rep3_qPCR %>% names
## [1] "raw" "summary"
For the analyses in this vignette, we only need the ‘summary’ table.
# qPCR data (list object)
physeq_rep3_qPCR_sum = physeq_rep3_qPCR$summary
physeq_rep3_qPCR_sum %>% head(n=4)
## IS_CONTROL Sample Buoyant_density qPCR_tech_rep_mean
## 1 FALSE 13C-Glu_rep1_1.671712 1.671712 46598172
## 2 FALSE 13C-Glu_rep1_1.671722 1.671722 42299449
## 3 FALSE 13C-Glu_rep1_1.680311 1.680311 53536519
## 4 FALSE 13C-Glu_rep1_1.683540 1.683540 51449609
## qPCR_tech_rep_sd Gradient Treatment Replicate
## 1 9055833 13C-Glu_rep1 13C-Glu 1
## 2 16369298 13C-Glu_rep1 13C-Glu 1
## 3 18265667 13C-Glu_rep1 13C-Glu 1
## 4 3796576 13C-Glu_rep1 13C-Glu 1
OK. Let’s quantify isotope incorporation witht the q-SIP method.
# transforming OTU counts
physeq_rep3_t = OTU_qPCR_trans(physeq_rep3, physeq_rep3_qPCR_sum)
# calculating atom fraction excess
atomX = qSIP_atom_excess(physeq_rep3_t,
control_expr='Treatment=="12C-Con"',
treatment_rep='Replicate')
atomX %>% names
## [1] "W" "A"
The resulting list object contains 2 data.frames. We are interested in the ‘A’ table, which contains estimated BD shifts (Z) and atom fraction excess (A).
atomX$A %>% head(n=4)
## # A tibble: 4 × 9
## OTU Wlab Wlight Z Gi Mlight Mheavymax Mlab
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 OTU.1 1.721357 1.699178 0.02217880 0.6361398 308.0065 317.6638 312.0268
## 2 OTU.2 1.706713 1.694977 0.01173549 0.5858298 307.9816 317.6640 310.1139
## 3 OTU.3 1.731520 1.696813 0.03470620 0.6078171 307.9925 317.6639 314.2921
## 4 OTU.4 1.730946 1.704892 0.02605395 0.7045621 308.0405 317.6636 312.7479
## # ... with 1 more variables: A <dbl>
Next, let’s calculate bootstrap confidence intervales for the atom fraction excess estimations.
# calculating bootstrapped CI values
df_atomX_boot = qSIP_bootstrap(atomX, n_boot=100)
df_atomX_boot %>% head(n=4)
## # A tibble: 4 × 11
## OTU Wlab Wlight Z Gi Mlight Mheavymax Mlab
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 OTU.1 1.721357 1.699178 0.02217880 0.6361398 308.0065 317.6638 312.0268
## 2 OTU.2 1.706713 1.694977 0.01173549 0.5858298 307.9816 317.6640 310.1139
## 3 OTU.3 1.731520 1.696813 0.03470620 0.6078171 307.9925 317.6639 314.2921
## 4 OTU.4 1.730946 1.704892 0.02605395 0.7045621 308.0405 317.6636 312.7479
## # ... with 3 more variables: A <dbl>, A_CI_low <dbl>, A_CI_high <dbl>
Now for delta_BD. The setup is easier because we are not using qPCR data, just relative abundances from 16S rRNA sequence data.
df_dBD = delta_BD(physeq_rep3, control_expr='Treatment=="12C-Con"')
df_dBD %>% head(n=4)
## # A tibble: 4 × 4
## OTU CM_control CM_treatment delta_BD
## <chr> <dbl> <dbl> <dbl>
## 1 OTU.1 1.694039 1.720068 0.02602886
## 2 OTU.2 1.690518 1.703567 0.01304861
## 3 OTU.3 1.686954 1.743139 0.05618481
## 4 OTU.4 1.705739 1.726582 0.02084385
Let’s plot the data and compare all of the results. First, let’s join all of the data into one table for plotting. We’ll also format it for plotting.
# checking & joining data
stopifnot(nrow(df_atomX_boot) == nrow(df_dBD))
df_j = dplyr::inner_join(df_atomX_boot, df_dBD, c('OTU'='OTU'))
stopifnot(nrow(df_atomX_boot) == nrow(df_j))
# formatting data for plotting
df_j = df_j %>%
dplyr::mutate(OTU = reorder(OTU, -delta_BD))
OK. Time to plot!
# plotting BD shift (Z)
ggplot(df_j, aes(OTU)) +
geom_point(aes(y=Z), color='blue') +
geom_point(aes(y=delta_BD), color='red') +
geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
labs(x='OTU', y='BD shift (Z)') +
theme_bw() +
theme(
axis.text.x = element_blank()
)
In the figure, red points are delta_BD and blue points are q-SIP. It’s easy to see that delta_BD is a lot more variable than q-SIP. This is likely due to a high influence of compositional data artifacts on delta_BD versus q-SIP.
Let’s make a boxplot to show the difference in estimation variance between the two methods.
# plotting BD shift (Z): boxplots
## formatting the table
df_j_g = df_j %>%
dplyr::select(OTU, Z, delta_BD) %>%
tidyr::gather(Method, BD_shift, Z, delta_BD)
## plotting
ggplot(df_j_g, aes(Method, BD_shift)) +
geom_boxplot() +
geom_hline(yintercept=0, linetype='dashed', alpha=0.5) +
labs(x='Method', y='BD shift (Z)') +
theme_bw()
The boxplot helps to summarize how much more variance delta_BD produces versus q-SIP.
sessionInfo()
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] phyloseq_1.19.1 HTSSIP_1.2.0 ggplot2_2.2.1 tidyr_0.6.1
## [5] dplyr_0.5.0
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.34.0 jsonlite_1.4
## [3] splines_3.3.3 foreach_1.4.3
## [5] Formula_1.2-1 assertthat_0.2.0
## [7] stats4_3.3.3 latticeExtra_0.6-28
## [9] yaml_2.1.14 RSQLite_1.1-2
## [11] backports_1.0.5 lattice_0.20-35
## [13] digest_0.6.12 GenomicRanges_1.26.4
## [15] RColorBrewer_1.1-2 XVector_0.14.1
## [17] checkmate_1.8.2 colorspace_1.3-2
## [19] htmltools_0.3.5 Matrix_1.2-8
## [21] plyr_1.8.4 XML_3.98-1.6
## [23] DESeq2_1.14.1 genefilter_1.56.0
## [25] zlibbioc_1.20.0 xtable_1.8-2
## [27] scales_0.4.1 BiocParallel_1.8.2
## [29] annotate_1.52.1 tibble_1.3.0
## [31] htmlTable_1.9 mgcv_1.8-17
## [33] IRanges_2.8.2 SummarizedExperiment_1.4.0
## [35] nnet_7.3-12 BiocGenerics_0.20.0
## [37] lazyeval_0.2.0 survival_2.41-3
## [39] magrittr_1.5 memoise_1.1.0
## [41] evaluate_0.10 doParallel_1.0.10
## [43] nlme_3.1-131 MASS_7.3-45
## [45] foreign_0.8-67 vegan_2.4-3
## [47] tools_3.3.3 data.table_1.10.4
## [49] stringr_1.2.0 S4Vectors_0.12.2
## [51] locfit_1.5-9.1 munsell_0.4.3
## [53] cluster_2.0.6 AnnotationDbi_1.36.2
## [55] Biostrings_2.42.1 ade4_1.7-6
## [57] compiler_3.3.3 GenomeInfoDb_1.10.3
## [59] rhdf5_2.18.0 grid_3.3.3
## [61] coenocliner_0.2-2 RCurl_1.95-4.8
## [63] iterators_1.0.8 biomformat_1.2.0
## [65] htmlwidgets_0.8 igraph_1.0.1
## [67] bitops_1.0-6 base64enc_0.1-3
## [69] labeling_0.3 rmarkdown_1.4
## [71] gtable_0.2.0 codetools_0.2-15
## [73] multtest_2.30.0 DBI_0.6-1
## [75] reshape2_1.4.2 R6_2.2.0
## [77] gridExtra_2.2.1 knitr_1.15.1
## [79] Hmisc_4.0-2 rprojroot_1.2
## [81] permute_0.9-4 ape_4.1
## [83] stringi_1.1.5 parallel_3.3.3
## [85] Rcpp_0.12.10 geneplotter_1.52.0
## [87] rpart_4.1-10 acepack_1.4.1