`phyloregion` is a free R package hosted on GitHub. To install `phyloregion` with the help of the `remotes` package, type the following commands in R:

```
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("darunabas/phyloregion")
```

Once installed, load the package in R with `library(phyloregion)`.

The `phyloregion` workflow runs from preparing different types of data to visualizing the results of biogeographical regionalization, with tips on selecting the most suitable method for the types of data used and the research questions asked.

In R, phylogenetic relationships among species or taxa are often represented as a `phylo` object, implemented in the `ape` package (Paradis & Schliep, 2018). Phylogenies (often in Newick or Nexus format) can be imported into R with the `read.tree` or `read.nexus` functions of the `ape` package.
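As a minimal sketch of the import step, the snippet below parses a small Newick string with `read.tree`. The four-tip tree and its labels are made up for illustration; a real analysis would usually pass a file path through the `file` argument instead of `text`.

```
library(ape)

# Hypothetical four-tip tree written in Newick format
newick <- "((A:1,B:1):1,(C:1,D:1):1);"

# read.tree() parses Newick from a file or, as here, from a character string
tree <- read.tree(text = newick)

class(tree)      # the result is an object of class "phylo"
tree$tip.label   # tip labels stored in the object
Ntip(tree)       # number of tips
```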

Community data are commonly stored in a matrix with the sites as rows and species / operational taxonomic units (OTUs) as columns. The elements of the matrix are numeric values indicating the abundance/observations or presence/absence (0/1) of OTUs in different sites. In practice, such a matrix can contain many zero values because species are known to generally have unimodal distributions along environmental gradients (Ter Braak & Prentice, 2004), and storing and analyzing every single element of that matrix can be computationally challenging and expensive.
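To make the layout concrete, here is a small base R sketch that builds a sites-by-species presence/absence matrix from occurrence records. The site and species names are hypothetical, and the cross-tabulation with `table` is only one of several ways to get there.

```
# Hypothetical occurrence records: one row per species observed at a site
records <- data.frame(
  site    = c("s1", "s1", "s2", "s3", "s3", "s3"),
  species = c("sp_a", "sp_b", "sp_b", "sp_a", "sp_c", "sp_d")
)

# Cross-tabulate into a sites x species matrix of counts,
# then reduce the counts to presence/absence (0/1)
comm <- unclass(table(records$site, records$species))
comm[comm > 0] <- 1
comm

# Proportion of cells that are zero
mean(comm == 0)
```

Even in this toy case a large share of the cells are zero, and that share typically grows as more sites and species are added.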

`phyloregion` differs from other R packages (e.g. vegan (Oksanen *et al.*, 2019), picante (Kembel *et al.*, 2010) or betapart (Baselga & Orme, 2012)) in that the data are not stored in a (dense) `matrix` or `data.frame` but as a sparse matrix, making use of the infrastructure provided by the Matrix package (Bates & Maechler, 2019). A sparse matrix is a matrix with a high proportion of zero entries (Duff, 1977), of which only the non-zero entries are stored and used for downstream analysis.

A sparse matrix representation has two advantages. First, the community matrix can be stored in a much more memory-efficient manner, allowing analysis of larger datasets. Second, for very large datasets spanning thousands of taxa and sites, computations with a sparse matrix are often much faster.
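The memory argument can be illustrated with the Matrix package alone. The sketch below uses simulated data (a 1000 x 500 matrix with 1% non-zero cells, chosen arbitrarily) rather than a real community matrix, so the exact sizes are only indicative.

```
library(Matrix)

set.seed(1)

# Simulated presence/absence data: 1000 sites x 500 species, 1% of cells occupied
dense <- matrix(0, nrow = 1000, ncol = 500)
dense[sample(length(dense), 5000)] <- 1

# Convert to a sparse matrix, which stores only the non-zero entries
sparse <- Matrix(dense, sparse = TRUE)

object.size(dense)    # memory used by the dense representation
object.size(sparse)   # memory used by the sparse representation

# Both representations hold the same information
all(as.matrix(sparse) == dense)
```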

The `phyloregion` package contains functions to conveniently change between data formats.

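```
library(phyloregion)  # provides the africa dataset
library(Matrix)
data(africa)
sparse_comm <- africa$comm            # community matrix in sparse form
dense_comm <- as.matrix(sparse_comm)  # dense copy for comparison
object.size(dense_comm)
## 4216952 bytes
object.size(sparse_comm)
## 885952 bytes
```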

Here, the data set in the dense matrix representation consumes roughly five times more memory than the sparse representation.

`phyloregion` offers a fast means of computing phylogenetic beta diversity, the turnover of branch lengths among sites, making use of and improving on the infrastructure provided by the `betapart` package (Baselga & Orme, 2012).
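To show what "turnover of branch lengths among sites" means, the following toy sketch computes a Sørensen-type phylogenetic dissimilarity between two sites by hand, using only `ape`. It is not the implementation used by `phyloregion` or `betapart`; the tree, the site compositions and the helper functions `edges_to_root` and `site_edges` are hypothetical, and the sketch assumes that each site's branch length total includes the path back to the root.

```
library(ape)

# Toy tree with unit branch lengths (hypothetical data)
tree <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")

# Indices of all edges on the path from one tip up to the root
edges_to_root <- function(tree, tip) {
  node <- match(tip, tree$tip.label)
  path <- integer(0)
  repeat {
    e <- which(tree$edge[, 2] == node)
    if (length(e) == 0) break  # no parent edge: the root has been reached
    path <- c(path, e)
    node <- tree$edge[e, 1]
  }
  path
}

# Edges spanned by the species present at a site
site_edges <- function(tree, tips) {
  unique(unlist(lapply(tips, edges_to_root, tree = tree)))
}

site1 <- site_edges(tree, c("A", "B"))
site2 <- site_edges(tree, c("A", "C"))

# Total branch length per site and the branch length shared by both sites
pd1 <- sum(tree$edge.length[site1])
pd2 <- sum(tree$edge.length[site2])
shared <- sum(tree$edge.length[intersect(site1, site2)])

# Sørensen-type dissimilarity: 0 = identical branches, 1 = no shared branches
phylo_sor <- 1 - 2 * shared / (pd1 + pd2)
phylo_sor
```

Here the two sites span 3 and 4 units of branch length and share 2, so the dissimilarity is 1 - 4/7. Walking the tree tip by tip is fine for a handful of species but would be slow for large trees.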

```
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Matrix_1.2-18 phyloregion_1.0.2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 compiler_3.6.1 tools_3.6.1 magic_1.5-9
## [5] betapart_1.5.1 digest_0.6.25 evaluate_0.14 nlme_3.1-145
## [9] lattice_0.20-40 mgcv_1.8-31 pkgconfig_2.0.3 rlang_0.4.5
## [13] fastmatch_1.1-0 igraph_1.2.4.2 yaml_2.2.1 parallel_3.6.1
## [17] xfun_0.12 raster_3.0-12 stringr_1.4.0 knitr_1.28
## [21] cluster_2.1.0 rgeos_0.5-2 rcdd_1.2-2 grid_3.6.1
## [25] data.table_1.12.8 rmarkdown_2.1 phangorn_2.5.5 sp_1.4-1
## [29] magrittr_1.5 codetools_0.2-16 htmltools_0.4.0 MASS_7.3-51.5
## [33] splines_3.6.1 abind_1.4-5 picante_1.8.1 permute_0.9-5
## [37] ape_5.3 colorspace_1.4-1 quadprog_1.5-8 stringi_1.4.6
## [41] geometry_0.4.5 vegan_2.5-6
```

Baselga, A. & Orme, C.D.L. (2012) Betapart: An R package for the study of beta diversity. *Methods in Ecology and Evolution*, **3**, 808–812.

Bates, D. & Maechler, M. (2019) *Matrix: Sparse and dense matrix classes and methods*.

Daru, B.H., Elliott, T.L., Park, D.S. & Davies, T.J. (2017) Understanding the processes underpinning patterns of phylogenetic regionalization. *Trends in Ecology & Evolution*, **32**, 845–860.

Daru, B.H., Karunarathne, P. & Schliep, K. (2020) Phyloregion: R package for biogeographic regionalization and spatial conservation. *bioRxiv*.

Duff, I.S. (1977) A survey of sparse matrix research. *Proceedings of the IEEE*, **65**, 500–535.

Kembel, S., Cowan, P., Helmus, M., Cornwell, W., Morlon, H., Ackerly, D., Blomberg, S. & Webb, C. (2010) Picante: R tools for integrating phylogenies and ecology. *Bioinformatics*, **26**, 1463–1464.

Oksanen, J., Blanchet, F.G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P.R., O’Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Szoecs, E. & Wagner, H. (2019) *Vegan: Community ecology package*.

Paradis, E. & Schliep, K. (2018) Ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. *Bioinformatics*, **35**, 526–528.

Ter Braak, C.J. & Prentice, I. (2004) A theory of gradient analysis. *Advances in Ecological Research: Classic Papers*, pp. 235–282. Academic Press.