TreeSearch
is an R package that allows, among other things, parsimony search on morphological datasets using the Profile Parsimony optimality criterion of Faith & Trueman (2001).
Profile Parsimony finds the tree that is most faithful to the information contained within a given dataset.
Character data are read from a format using the R function ; see the latter function’s for details.
files can be edited in any standard text editor, for example .
A notable limitation is that the parser cannot interpret curly braces, for example {01}.
Ambiguous tokens of this nature should be replaced with a separate character, for example ‘A’ to denote {01}, ‘B’ to denote {12}, perhaps using a search-and-replace operation in your favourite text editor.
This said, the Profile Parsimony algorithm is currently implemented only for binary characters with no ambiguous tokens. I hope to address this limitation soon.
To install the package for the first time, type
install.packages('TreeSearch')
Once the package has been installed, load it using
library('TreeSearch')
To use this script on your own data, launch R, and type (or copy-paste) the following text into the R console. Lines starting with ‘#’ are comments and do not need to be copied.
Data can be read from a nexus file (note the restrictions detailed above):
my.data <- read.nexus.data('C:/path/to/filename.nex')
Alternatively you can use a built-in dataset:
data(congreveLamsdellMatrices)
my.data <- congreveLamsdellMatrices[[10]]
A contrast matrix translates the tokens used in your dataset to the character states to which they correspond: for example decoding ‘A’ to {01}. For more details, see the ‘phangorn-specials’ vignette in the phangorn package, accesible by typing ‘?phangorn’ in the R prompt and navigating to index > package vignettes.
contrast.matrix <- matrix(data=c(
# 0 1 - # Each column corresponds to a character-state
1,0,0, # Each row corresponds to a token, here 0, denoting the character-state set {0}
0,1,0, # 1 | {1}
0,0,1, # - | {-}
1,1,0, # A | {01}
1,1,0, # + | {01}
1,1,1 # ? | {01-}
), ncol=3, byrow=TRUE); # ncol should correspond to the number of columns in the matrix
dimnames(contrast.matrix) <- list(
c(0, 1, '-', 'A', '+', '?'), # A list of the tokens corresponding to each row
# in the contrast matrix
c(0, 1, '-') # A list of the character-states corresponding to the columns
# in the contrast matrix
)
contrast.matrix
## 0 1 -
## 0 1 0 0
## 1 0 1 0
## - 0 0 1
## A 1 1 0
## + 1 1 0
## ? 1 1 1
If you need to use a contrast matrix, convert the data using
my.phyDat <- phyDat(my.data, type='USER', contrast=contrast.matrix)
We don’t have any such ambiguities in our dataset, so we can go straight in with
my.phyDat <- phangorn::phyDat(my.data, type='USER', levels=c(1, 2))
We then need to prepare our dataset for Profile Parsimony by calculating the information loss implied by each additional step in each character. Ideally we’d pick a high value of precision (> 1e+06): this takes a while, but only needs doing once for each dataset of a given size.
my.prepdata <- PrepareDataProfile(my.phyDat, precision=4e+04)
To start analysis, we need to load a starting tree. We can do this at random:
set.seed(888)
tree <- RandomTree(my.phyDat)
Or using a neighbour joining method, to start at a reasonably good tree:
tree <- NJTree(my.phyDat)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(tree)
Calculate the tree’s parsimony score:
ProfileScore(tree, my.prepdata)
## [1] -278.3369
Search for a better tree:
better.tree <- ProfileTreeSearch(ape::root(tree, '1', resolve.root=TRUE), my.prepdata, EdgeSwapper=RootedTBRSwap)
##
## - Performing tree search. Initial score: -278.3369
## - Final score -280.2899 found 2 times after 100 rearrangements
The parsimony ratchet (Nixon, 1999) is better at finding globally optimal trees. ProfileRatchet
is a convenient wrapper for the underlying function Ratchet
.
# Longwinded approach:
better.tree <- Ratchet(better.tree, my.prepdata, searchHits=10, searchIter=100, ratchIter=5,
swappers=list(RootedTBRSwap, RootedSPRSwap, RootedNNISwap),
InitializeData=ProfileInitMorphy, CleanUpData=ProfileDestroyMorphy,
TreeScorer=ProfileScoreMorphy, Bootstrapper=ProfileBootstrap)
# Less typing!
RootedSwappers <- list(RootedTBRSwap, RootedSPRSwap, RootedNNISwap)
better.tree <- ProfileRatchet(better.tree, my.prepdata,
swappers=RootedSwappers,
searchHits=10, searchIter=100, ratchIter=5)
##
## * Beginning Parsimony Ratchet, with initial score -280.2899
## Completed parsimony ratchet after 5 iterations with score -304.0829
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(better.tree)
The default parameters may not be enough to find the optimal tree; type ?Ratchet
to view all search parameters.
In parsimony search, it is generally good practice to consider trees that are slightly suboptimal. Here, we’ll take a consensus that includes all trees that are suboptimal by up to 1.5 bits. To sample this region of tree space well, the trick is to use large values of ratchHits
and ratchIter
, and small values of searchHits
and searchiter
, so that many runs don’t quite hit the optimal tree. In a serious study, you would want to sample many more than 25 Ratchet hits (ratchHits
).
suboptimals <- ProfileRatchet(better.tree, my.prepdata,
swappers=list(RootedTBRSwap),
returnAll=TRUE, suboptimal=5,
ratchHits=25, ratchIter=5000,
bootstrapHits=15, bootstrapIter=450,
searchHits=10, searchIter=50)
##
## * Beginning Parsimony Ratchet, with initial score -304.0829
## Completed parsimony ratchet after 43 iterations with score -305.1751
##
## Found 1 unique MPTs and 3 suboptimal trees.
Let’s calculate and display a consensus tree that includes slightly suboptimal trees.
par(mar=rep(0.2, 4))
plot(my.consensus <- ape::consensus(suboptimals))
Faith, D. P., & Trueman, J. W. H. (2001). Towards an inclusive philosophy for phylogenetic inference. Systematic Biology, 50(3), 331–350. doi:10.1080/10635150118627
Nixon, K. C. (1999). The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics, 15(4), 407–414. doi:10.1111/j.1096-0031.1999.tb00277.x