Concavity profiles with Profile Parsimony

Martin R. Smith

2019-03-21

Profile Parsimony: Preparing data

How are profile scores generated and what do they mean?

In this vignette we’ll look at what profile scores mean, and gain some insight into how they are calculated in the TreeSearch package.

Let’s get started by loading the package and one of its datasets. (To understand how to load your own phylogenetic data into R, see the separate TreeSearch vignette.) The bundled datasets are those generated by Congreve & Lamsdell (2016b, 2016a).

It is useful to know how much extra precision is gained by sampling a larger number of trees when generating concavity curves.
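As a rough guide to what to expect, the following base R sketch illustrates the general behaviour of sampling-based estimates: the spread of an estimate shrinks in proportion to the square root of the sample size, so doubling the sample reduces the error by a factor of about 1.4 rather than 2. It assumes that the `precision` argument governs how many random trees are sampled; it does not reproduce TreeSearch's own calculation, and `trueP`, `sampleSizes` and `EstimateSpread()` are names invented for this illustration.

```r
# Illustrative only: a simple proportion stands in for a value estimated
# from randomly sampled trees. This is NOT the TreeSearch algorithm.
set.seed(1)
trueP <- 0.3                          # Hypothetical true proportion
sampleSizes <- c(2e3, 4e3, 8e3)       # Each sample twice as large as the last

# Standard deviation of the estimate across repeated samples of size n
EstimateSpread <- function(n, reps = 200) {
  estimates <- rbinom(reps, size = n, prob = trueP) / n
  sd(estimates)
}

spread <- vapply(sampleSizes, EstimateSpread, double(1))

# Standard error expected from theory: sqrt(p * (1 - p) / n)
theoretical <- sqrt(trueP * (1 - trueP) / sampleSizes)

round(cbind(sampleSizes, spread, theoretical), 5)
```

If the same relationship holds for profile calculations, each doubling of `precision` should narrow the histograms produced by the chunks that follow only modestly, while roughly doubling the run time.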

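# Prepare the same dataset at three levels of precision (larger is slower)
preci1 <- PrepareDataProfile(dataset, precision = 2e+05) # Quick, imprecise
preci2 <- PrepareDataProfile(dataset, precision = 4e+05)
preci3 <- PrepareDataProfile(dataset, precision = 8e+05)
# Extract the values stored in the 'info.amounts' attribute of each result
info1 <- attr(preci1, 'info.amounts')
info2 <- attr(preci2, 'info.amounts')
info3 <- attr(preci3, 'info.amounts')
# How much do the values change when precision doubles from 4e+05 to 8e+05?
diff32 <- as.double(info3 - info2)
hist(diff32, breaks = seq(min(diff32) - 0.002, max(diff32) + 0.005, by = 0.002))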

all_the_time_in_the_world <- FALSE # Set to TRUE to run the slower comparisons
if (all_the_time_in_the_world) {
  preci4 <- PrepareDataProfile(dataset, precision = 1.6e+06)
  preci5 <- PrepareDataProfile(dataset, precision = 3.2e+06) # Slow, more precise
  info4 <- attr(preci4, 'info.amounts')
  info5 <- attr(preci5, 'info.amounts')
  # Absolute differences between pairs of precision levels
  diff42 <- as.double(info4 - info2)
  diff43 <- as.double(info4 - info3)
  diff54 <- as.double(info5 - info4)
  # Exclude near-zero values before dividing to obtain relative differences
  nonzero <- info4 > 0.00001
  hist(diff43)
  hist(diff54)
  quantile(diff54, probs = c(0, 5, 10, 50, 90, 95, 100) / 100)
  hist(diff42)
  # Differences expressed as a percentage of the most precise available value
  hist(100 * (diff32 / info4)[nonzero])
  hist(100 * (diff42 / info4)[nonzero])
  hist(100 * (diff43 / info4)[nonzero])
}
diff12 <- info1[1:10, ] - info2
hist(diff12, breaks=seq(min(diff12)-0.01, max(diff12)+0.01, by=0.01))

hist(info3 - info2)

hist(info3 - info1[1:10, ])

if (all_the_time_in_the_world) {
hist(info4 - info2)
}
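The histograms above are compared by eye; a numeric summary can make it easier to judge whether a further increase in precision is worthwhile. The sketch below is one possible way to do this in base R. `SummariseDifference()` is a hypothetical helper, not part of TreeSearch, and the two matrices are randomly generated stand-ins for the `info.amounts` attributes, so the code runs without preparing a real dataset.

```r
# Hypothetical helper (not a TreeSearch function): summarise how far a
# coarse estimate departs from a finer one, in absolute and percentage terms.
SummariseDifference <- function(coarse, fine, tolerance = 1e-5) {
  stopifnot(identical(dim(coarse), dim(fine)))
  absolute <- as.double(fine - coarse)
  # Skip near-zero entries so that the percentages are not inflated
  nonzero <- as.double(fine) > tolerance
  relative <- 100 * absolute[nonzero] / as.double(fine)[nonzero]
  probs <- c(0, 5, 50, 95, 100) / 100
  list(absolute = quantile(absolute, probs = probs),
       percent = quantile(relative, probs = probs))
}

# Synthetic stand-ins for two 'info.amounts' matrices (assumed shape only)
set.seed(1)
fine <- matrix(runif(40, 0, 2), nrow = 10)
coarse <- fine + rnorm(40, sd = 0.005)

result <- SummariseDifference(coarse, fine)
result$absolute
result$percent
```

With real data, the same call could be made on a pair such as `info2` and `info3`, provided that the two matrices have identical dimensions.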

References

Congreve, C. R., & Lamsdell, J. C. (2016a). Data from: Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Dryad Digital Repository. doi:10.5061/dryad.7dq0j

Congreve, C. R., & Lamsdell, J. C. (2016b). Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Palaeontology, 59(3), 447–465. doi:10.1111/pala.12236