phylogram is an R package for developing evolutionary trees as deeply nested lists known as “dendrogram” objects. It provides functions for importing and exporting trees in the Newick parenthetic text format, as well as several functions for command-line tree manipulation. With an emphasis on speed and computational efficiency, phylogram also includes a suite of tools for rapidly computing distance matrices and building large trees using fast alignment-free k-mer counting and divisive clustering techniques. This package makes R’s powerful nested-list architecture more accessible to evolutionary biologists, and facilitates the analysis of very large sequence datasets.
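To make the nested-list idea concrete, the following sketch builds a three-leaf dendrogram and inspects it as an ordinary list. It uses only base R (hclust and as.dendrogram from the stats package), not phylogram functions, and the coordinates are made-up toy values chosen so that "B" and "C" cluster together.

```r
# Toy coordinates (an assumption for illustration): B and C lie close
# together, A lies far from both.
m <- matrix(c(0, 0,
              5, 5,
              6, 5),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), NULL))

# Base R clustering, converted to a "dendrogram" object.
x <- as.dendrogram(hclust(dist(m)))

is.list(x)            # a dendrogram is a list ...
length(x)             # ... whose elements are the child nodes of the root
attr(x, "members")    # number of leaves below the root
attr(x, "height")     # node attributes carry the edge information

# Each child is either a leaf or another (nested) list of children.
is.leaf(x[[1]])
is.leaf(x[[2]])
labels(x)
```

Because every inner node is itself a list, standard list tools such as `[[` indexing and recursive functions can be applied to the tree directly.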
To download phylogram from CRAN and load the package, run
install.packages("phylogram")
library("phylogram")
To download the development version from GitHub you will first need to ensure you have a C/C++ compiler and the devtools R package installed. Linux users will generally have a compiler such as gcc installed by default; however, Windows users will need to download Rtools and Mac OS X users will need Xcode (note that Rtools and Xcode are not R packages). To download and install devtools, run
install.packages("devtools")
Then install and load the phylogram package by running
devtools::install_github("shaunpwilkinson/phylogram", build_vignettes = TRUE)
library("phylogram")
Consider the simple example of a tree with three members named “A”, “B” and “C”, where “B” and “C” are more closely related to each other than they are to “A”. An unweighted Newick string for this tree would be “(A,(B,C));”. This text can be imported as a dendrogram object using the read.dendrogram function as follows:
library("phylogram")
newick <- "(A,(B,C));"
x <- read.dendrogram(text = newick)
plot(x)
The following command writes the object back to the console in Newick format without edge weights:
write.dendrogram(x, edges = FALSE)
The syntax is similar when reading and writing text files, except that the text argument is replaced by file and a valid file path is passed to the function.
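As a minimal sketch of the file-based workflow, the example below writes the three-member tree to a temporary file and reads it back. The temporary path is a placeholder chosen for illustration; any writable file path should work the same way.

```r
library("phylogram")

x <- read.dendrogram(text = "(A,(B,C));")

# Placeholder path for illustration; substitute your own file path.
path <- tempfile(fileext = ".nwk")

# Write the tree to the file, without edge weights.
write.dendrogram(x, file = path, edges = FALSE)

# Inspect the raw Newick text that was written.
readLines(path, warn = FALSE)

# Read the tree back in as a dendrogram object.
y <- read.dendrogram(file = path)
labels(y)
```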
topdown builds a tree by divisive clustering. This is done by counting k-mers and recursively partitioning the sequence set using successive k-means clustering steps. No alignment is necessary and no distance matrix is computed, making it possible to rapidly and efficiently build trees from very large sequence datasets.
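The general idea can be illustrated with a rough base R sketch. This is not the package's implementation: the helper names count_kmers and split_recursively are hypothetical, the sequences are made-up toy data, and the sketch assumes a plain A/C/G/T alphabet and a two-way split with stats::kmeans at every step. It returns a nested list of sequence names rather than a dendrogram with edge lengths.

```r
# Hypothetical helper: count overlapping k-mers in one DNA string.
# Assumes the alphabet is exactly A, C, G, T.
count_kmers <- function(s, k = 3) {
  bases <- c("A", "C", "G", "T")
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  all_kmers <- do.call(paste0, grid)            # all 4^k possible k-mers
  starts <- seq_len(nchar(s) - k + 1)
  observed <- substring(s, starts, starts + k - 1)
  counts <- tabulate(match(observed, all_kmers), nbins = length(all_kmers))
  stats::setNames(as.numeric(counts), all_kmers)
}

# Hypothetical helper: recursively split the rows of a k-mer count matrix
# in two with k-means, returning a nested list of row names.
split_recursively <- function(counts, min_size = 2) {
  # Stop when the group is small or all k-mer profiles are identical.
  if (nrow(counts) <= min_size || nrow(unique(counts)) < 2) {
    return(rownames(counts))
  }
  km <- kmeans(counts, centers = 2, nstart = 20)
  # Guard against a degenerate split that leaves one side empty.
  if (length(unique(km$cluster)) < 2) return(rownames(counts))
  lapply(1:2, function(i) {
    split_recursively(counts[km$cluster == i, , drop = FALSE], min_size)
  })
}

# Made-up toy sequences: three AT-rich and three GC-rich.
seqs <- c(s1 = "ATATATATTATATAAT", s2 = "ATTATATATATAATAT",
          s3 = "TATATAATATATTATA", s4 = "GCGCGGCGCGCCGCGC",
          s5 = "CGCGCGGCGCCGCGCG", s6 = "GCCGCGCGCGGCGCGC")

set.seed(1)                                     # k-means uses random starts
counts <- t(sapply(seqs, count_kmers, k = 3))   # one row per sequence
tree <- split_recursively(counts)
str(tree)
```

No pairwise alignment or distance matrix appears anywhere in the sketch; each split works only on the rows of the k-mer count matrix that remain in the current group.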
The following code demonstrates how to build and plot a divisive tree using the woodmouse data from the ape package:
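library("phylogram")
library("ape")
data(woodmouse)
# Build a divisive tree from 5-mer counts
x <- topdown(woodmouse, k = 5, nstart = 20)
# Save the current graphical parameters so they can be restored afterwards
op <- par(no.readonly = TRUE)
# Widen the right-hand margin to leave room for the leaf labels
par(mar = c(4, 4, 4, 5))
plot(x, horiz = TRUE)
par(op)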
These and more examples are available in the package vignette. If downloading the package from GitHub, users will need to have LaTeX installed to build the vignette. RStudio recommends MiKTeX Complete for Windows and TeX Live 2013 Full for Mac OS X and Linux. To list the vignettes available in the package, run
vignette(package = "phylogram")
An overview of the package with links to the function documentation can be found by running
help(package = "phylogram")
If you experience a problem using this package, please either raise it as an issue on GitHub or post it on the phylogram Google group.
This software was developed at Victoria University of Wellington with funding from a Rutherford Foundation Postdoctoral Research Fellowship award from the Royal Society of New Zealand.