An introduction to the phyloregion package

Barnabas H. Daru, Piyal Karunarathne & Klaus Schliep

March 29, 2020

1. Installation

phyloregion is a free R package hosted on GitHub. To install phyloregion with the help of the remotes package, type the following commands in R:

if (!requireNamespace("remotes", quietly = TRUE)) 
    install.packages("remotes") 
remotes::install_github("darunabas/phyloregion")

When installed, load the package in R:

library(phyloregion)

2. Overview and general workflow of phyloregion

The phyloregion workflow runs from preparing different types of input data to visualizing the results of biogeographical regionalization, with tips on selecting the most suitable method for the types of data used and the research questions asked.

Simplified workflow for analysis of biogeographical regionalization using phyloregion. Distribution data are converted to a sparse community matrix. When paired with phylogenetic data, phylobuilder creates a subtree with the largest overlap from a species list, thereby ensuring complete representation of missing data. The workflow then proceeds from the phylogenetic community matrix to visualization of results.

3. Input data

Phylogenies

In R, phylogenetic relationships among species / taxa are often represented as a phylo object implemented in the ape package (Paradis & Schliep, 2018). Phylogenies (often in the Newick or Nexus formats) can be imported into R with the read.tree or read.nexus functions of the ape package.
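As a minimal sketch of the import step, the example below parses a small Newick string with read.tree and inspects the resulting phylo object. The tree and its tip names (sp_a to sp_d) are invented for illustration; for a file on disk, the file path would be passed instead of the text argument, and read.nexus plays the same role for Nexus files.

```r
library(ape)

# Hypothetical four-taxon tree in Newick format (names and branch lengths are made up)
newick <- "((sp_a:1,sp_b:1):1,(sp_c:1.5,sp_d:1.5):0.5);"

# Parse the string into an object of class "phylo"
tree <- read.tree(text = newick)

# Basic properties of the phylo object
n_tips <- Ntip(tree)                          # number of tips
tip_names <- tree$tip.label                   # taxon labels
total_branch_length <- sum(tree$edge.length)  # sum of all branch lengths
```

The tip labels are what link a phylogeny to the columns of a community matrix, so they are worth checking right after import.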

Community data

Community data are commonly stored in a matrix with the sites as rows and species / operational taxonomic units (OTUs) as columns. The elements of the matrix are numeric values indicating the abundance/observations or presence/absence (0/1) of OTUs in different sites. In practice, such a matrix can contain many zero values because species are known to generally have unimodal distributions along environmental gradients (Ter Braak & Prentice, 2004), and storing and analyzing every single element of that matrix can be computationally challenging and expensive.
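The layout described above can be illustrated with a toy community matrix in base R. The site and species names and the abundances are hypothetical; the point is only to show sites as rows, species as columns, and how quickly zeros come to dominate.

```r
# Toy abundance matrix: 3 sites (rows) by 4 species (columns); values are invented
comm <- matrix(
  c(3, 0, 0, 1,
    0, 0, 2, 0,
    0, 5, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("site1", "site2", "site3"),
                  c("sp_a", "sp_b", "sp_c", "sp_d"))
)

# Convert abundances to presence/absence (0/1)
presence <- (comm > 0) * 1

# Species richness per site
richness <- rowSums(presence)

# Fraction of cells that are zero
zero_fraction <- mean(comm == 0)
```

Even in this small example two thirds of the cells are zero, which is the property that a sparse representation exploits.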

phyloregion differs from other R packages (e.g. vegan (Oksanen et al., 2019), picante (Kembel et al., 2010) or betapart (Baselga & Orme, 2012)) in that the data are not stored in a (dense) matrix or data.frame but as a sparse matrix making use of the infrastructure provided by the Matrix package (Bates & Maechler, 2019). A sparse matrix is a matrix with a high proportion of zero entries (Duff, 1977), of which only the non-zero entries are stored and used for downstream analysis.
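To make the idea of storing only non-zero entries concrete, the sketch below builds a sparse community matrix directly from a long table of occurrence records using sparseMatrix from the Matrix package. The records are hypothetical, and this illustrates the data structure only; it is not the conversion code used inside phyloregion.

```r
library(Matrix)

# Hypothetical occurrence records in long format: one row per species-site presence
records <- data.frame(
  site    = c("s1", "s1", "s2", "s3", "s3"),
  species = c("sp_a", "sp_c", "sp_b", "sp_a", "sp_d"),
  stringsAsFactors = FALSE
)

sites   <- sort(unique(records$site))
species <- sort(unique(records$species))

# Only the (row, column, value) triplets of the non-zero cells are supplied
comm <- sparseMatrix(
  i = match(records$site, sites),
  j = match(records$species, species),
  x = rep(1, nrow(records)),
  dims = c(length(sites), length(species)),
  dimnames = list(sites, species)
)

# Number of explicitly stored entries versus total number of cells
n_stored <- nnzero(comm)
n_cells  <- prod(dim(comm))
```

Here 5 of the 12 cells are stored; absent species-site combinations take up no space in the value vector.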

A sparse matrix representation has two advantages. First, the community matrix can be stored in a much more memory-efficient manner, allowing analysis of larger datasets. Second, for very large datasets spanning thousands of taxa and spatial scales, computations with a sparse matrix are often much faster.
The phyloregion package contains functions to conveniently change between data formats.

## 4216952 bytes
## 885952 bytes

Here, the data set in the dense matrix representation (4216952 bytes) consumes roughly five times as much memory as the sparse representation (885952 bytes).
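A comparison of this kind can be reproduced on simulated data with object.size. The sketch below assumes a presence/absence matrix of 1000 sites by 200 species in which about 5% of cells are occupied; these dimensions and the fill rate are arbitrary choices for illustration, so the sizes will differ from the figures quoted above.

```r
library(Matrix)

set.seed(1)
n_sites   <- 1000
n_species <- 200

# Simulated presence/absence data; roughly 5% of cells are non-zero (an assumption)
dense_comm <- matrix(
  rbinom(n_sites * n_species, size = 1, prob = 0.05),
  nrow = n_sites,
  dimnames = list(paste0("site", seq_len(n_sites)),
                  paste0("sp", seq_len(n_species)))
)

# Same data in a sparse representation
sparse_comm <- Matrix(dense_comm, sparse = TRUE)

# Memory footprint of each representation, in bytes
dense_size  <- as.numeric(object.size(dense_comm))
sparse_size <- as.numeric(object.size(sparse_comm))

# How many times larger the dense object is
size_ratio <- dense_size / sparse_size
```

The saving grows as the proportion of zeros increases, so results depend strongly on how sparse the real data are.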

4. Analysis

Phylogenetic beta diversity

phyloregion offers a fast means of computing phylogenetic beta diversity, the turnover of branch lengths among sites, making use of and improving on the infrastructure provided by the betapart package (Baselga & Orme, 2012).
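To show what turnover of branch lengths means for a single pair of sites, the sketch below works through a toy example by hand with ape. It assumes the commonly used Sørensen-type dissimilarity and its Simpson-type turnover component, computed from shared and unique branch lengths. The tree, the community matrix and the helper descendant_tips are hypothetical, and this is an illustration of the concept rather than the implementation used by phyloregion or betapart.

```r
library(ape)

# Hypothetical rooted tree and presence/absence data for two sites
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
comm <- matrix(c(1, 1, 0, 0,
                 0, 1, 1, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("site1", "site2"), c("A", "B", "C", "D")))

# Helper (hypothetical): indices of all tips descending from a node
descendant_tips <- function(tree, node) {
  if (node <= length(tree$tip.label)) return(node)
  children <- tree$edge[tree$edge[, 1] == node, 2]
  unlist(lapply(children, function(child) descendant_tips(tree, child)))
}

# Edge-by-tip incidence: 1 if the tip descends from the edge
edge_by_tip <- t(sapply(tree$edge[, 2], function(node) {
  as.numeric(seq_along(tree$tip.label) %in% descendant_tips(tree, node))
}))

# Site-by-edge incidence: 1 if any species present at the site descends from the edge
site_by_edge <- (comm[, tree$tip.label] %*% t(edge_by_tip) > 0) * 1

# Branch length shared by both sites and unique to each
shared   <- sum(site_by_edge[1, ] * site_by_edge[2, ] * tree$edge.length)
unique_1 <- sum(site_by_edge[1, ] * tree$edge.length) - shared
unique_2 <- sum(site_by_edge[2, ] * tree$edge.length) - shared

# Sorensen-type dissimilarity and Simpson-type turnover component
beta_sor <- (unique_1 + unique_2) / (2 * shared + unique_1 + unique_2)
beta_sim <- min(unique_1, unique_2) / (shared + min(unique_1, unique_2))
```

In this example the two sites share 2 units of branch length, with 1 and 2 units unique to each, giving a dissimilarity of 3/7 and a turnover component of 1/3.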

Session Information

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Matrix_1.2-18     phyloregion_1.0.2
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3        compiler_3.6.1    tools_3.6.1       magic_1.5-9      
##  [5] betapart_1.5.1    digest_0.6.25     evaluate_0.14     nlme_3.1-145     
##  [9] lattice_0.20-40   mgcv_1.8-31       pkgconfig_2.0.3   rlang_0.4.5      
## [13] fastmatch_1.1-0   igraph_1.2.4.2    yaml_2.2.1        parallel_3.6.1   
## [17] xfun_0.12         raster_3.0-12     stringr_1.4.0     knitr_1.28       
## [21] cluster_2.1.0     rgeos_0.5-2       rcdd_1.2-2        grid_3.6.1       
## [25] data.table_1.12.8 rmarkdown_2.1     phangorn_2.5.5    sp_1.4-1         
## [29] magrittr_1.5      codetools_0.2-16  htmltools_0.4.0   MASS_7.3-51.5    
## [33] splines_3.6.1     abind_1.4-5       picante_1.8.1     permute_0.9-5    
## [37] ape_5.3           colorspace_1.4-1  quadprog_1.5-8    stringi_1.4.6    
## [41] geometry_0.4.5    vegan_2.5-6

REFERENCES

Baselga, A. & Orme, C.D.L. (2012) Betapart: An R package for the study of beta diversity. Methods in Ecology and Evolution, 3, 808–812.

Bates, D. & Maechler, M. (2019) Matrix: Sparse and dense matrix classes and methods.

Daru, B.H., Elliott, T.L., Park, D.S. & Davies, T.J. (2017) Understanding the processes underpinning patterns of phylogenetic regionalization. Trends in Ecology & Evolution, 32, 845–860.

Daru, B.H., Karunarathne, P. & Schliep, K. (2020) Phyloregion: R package for biogeographic regionalization and spatial conservation. bioRxiv.

Duff, I.S. (1977) A survey of sparse matrix research. Proceedings of the IEEE, 65, 500–535.

Kembel, S., Cowan, P., Helmus, M., Cornwell, W., Morlon, H., Ackerly, D., Blomberg, S. & Webb, C. (2010) Picante: R tools for integrating phylogenies and ecology. Bioinformatics, 26, 1463–1464.

Oksanen, J., Blanchet, F.G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P.R., O’Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Szoecs, E. & Wagner, H. (2019) Vegan: Community ecology package.

Paradis, E. & Schliep, K. (2018) Ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics, 35, 526–528.

Ter Braak, C.J. & Prentice, I. (2004) A theory of gradient analysis. Advances in ecological research: Classic papers, pp. 235–282. Academic Press.