Profile Parsimony: Preparing data

How are profile scores generated and what do they mean?

In this vignette we’ll explore what profile scores are, and gain some insight into how the TreeSearch package calculates them.

Let’s get started by loading the package and one of its datasets. (To learn how to load your own phylogenetic data into R, see the separate TreeSearch vignette.) The datasets were generated by Congreve & Lamsdell (2016a, 2016b).

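It is useful to know how much extra precision is gained by sampling more trees when generating concavity curves. The code below prepares the same dataset at successively doubled values of the precision argument and compares the resulting information amounts.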

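library('TreeSearch')
# Assumes 'dataset' already holds the chosen matrix

# Prepare the same dataset at increasing levels of precision
preci1 <- PrepareDataProfile(dataset, precision = 2e+05) # Quick, imprecise
preci2 <- PrepareDataProfile(dataset, precision = 4e+05)
preci3 <- PrepareDataProfile(dataset, precision = 8e+05)

# Extract the information amounts calculated at each level
info1 <- attr(preci1, 'info.amounts')
info2 <- attr(preci2, 'info.amounts')
info3 <- attr(preci3, 'info.amounts')

# Change in information amounts when precision doubles from 4e+05 to 8e+05
diff32 <- as.double(info3 - info2)
hist(diff32, breaks = seq(min(diff32) - 0.002, max(diff32) + 0.005, by = 0.002))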
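As a rough guide to what to expect from such comparisons, the toy sketch below shows how the scatter of a sampled estimate shrinks as the sample grows. It does not call TreeSearch; it assumes, purely for illustration, that the estimates behave like simple Monte Carlo proportions, whose standard error falls with the square root of the sample size. The names trueProb, sampleSizes and nReplicates are invented for this example.

```r
# Toy illustration only: this does not call TreeSearch.
# Estimate a known probability by random sampling at three sample sizes,
# repeating each estimate many times to see how much the estimates scatter.
set.seed(1) # Make the sketch reproducible

trueProb <- 0.3
sampleSizes <- c(2e3, 8e3, 32e3) # Each step quadruples the sample
nReplicates <- 200

# Observed standard deviation of the estimate at each sample size
scatter <- vapply(sampleSizes, function(n) {
  estimates <- rbinom(nReplicates, size = n, prob = trueProb) / n
  sd(estimates)
}, double(1))

# Theoretical standard error of a proportion: sqrt(p * (1 - p) / n)
expected <- sqrt(trueProb * (1 - trueProb) / sampleSizes)

print(cbind(sampleSizes, scatter, expected))
```

Under this assumption, quadrupling the sample only halves the scatter, so each doubling of precision should bring a progressively smaller improvement.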

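all_the_time_in_the_world <- FALSE # Set to TRUE to run the slow comparisons
if (all_the_time_in_the_world) {
  preci4 <- PrepareDataProfile(dataset, precision = 1.6e+06)
  preci5 <- PrepareDataProfile(dataset, precision = 3.2e+06) # Slow, more precise

  info4 <- attr(preci4, 'info.amounts')
  info5 <- attr(preci5, 'info.amounts')

  diff42 <- as.double(info4 - info2)
  diff43 <- as.double(info4 - info3)
  diff54 <- as.double(info5 - info4)
  # Only consider entries whose information amount is distinguishable from zero
  nonzero <- info4 > 0.00001

  hist(diff43)
  hist(diff54)
  # print() is needed for the quantiles to be displayed from inside the if block
  print(quantile(diff54, probs = c(0, 0.05, 0.1, 0.5, 0.9, 0.95, 1)))
  hist(diff42)
  # Differences expressed as a percentage of the information amounts in info4
  hist(100 * (diff32 / info4)[nonzero])
  hist(100 * (diff42 / info4)[nonzero])
  hist(100 * (diff43 / info4)[nonzero])
}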
diff12 <- info1[1:10, ] - info2

hist(diff12, breaks=seq(min(diff12)-0.01, max(diff12)+0.01, by=0.01))

hist(info3 - info2)

hist(info3 - info1[1:10, ])

if (all_the_time_in_the_world) {
hist(info4 - info2)
}
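Histograms show the shape of the differences, but a few summary numbers can make two precision levels easier to compare. The sketch below defines a hypothetical helper, SummariseDifference, which is not part of TreeSearch, and applies it to simulated matrices that merely stand in for two sets of information amounts; the noise levels are arbitrary.

```r
# Hypothetical helper (not part of TreeSearch): summarise how far two
# estimates of the same quantities lie from one another.
SummariseDifference <- function(estimateA, estimateB) {
  stopifnot(identical(dim(estimateA), dim(estimateB)))
  difference <- as.double(estimateA - estimateB)
  c(
    maxAbs = max(abs(difference)),
    meanAbs = mean(abs(difference)),
    quantile(difference, probs = c(0.05, 0.5, 0.95))
  )
}

# Simulated stand-ins for two matrices of information amounts
set.seed(1)
truth <- matrix(runif(50, min = 0, max = 5), nrow = 10)
lowPrecision <- truth + rnorm(50, sd = 0.02)  # Noisier estimate
highPrecision <- truth + rnorm(50, sd = 0.01) # Less noisy estimate

print(SummariseDifference(lowPrecision, highPrecision))
```

With real data, the same helper could be applied to any pair of the matrices above that share dimensions, for example info3 and info2.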

References

Congreve, C. R., & Lamsdell, J. C. (2016a). Data from: Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Dryad Digital Repository. doi:10.5061/dryad.7dq0j

Congreve, C. R., & Lamsdell, J. C. (2016b). Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Palaeontology, 59(3), 447–465. doi:10.1111/pala.12236