Concavity profiles with Profile Parsimony

Martin R. Smith

2018-03-19

Profile Parsimony: Preparing data

How are profile scores generated and what do they mean?

This vignette explains what profile scores mean and gives a brief insight into how the TreeSearch package calculates them.

Let’s get started by loading the package and one of its datasets. (To understand how to load your own phylogenetic data into R, see the separate TreeSearch vignette.) The bundled datasets are those generated by Congreve & Lamsdell (2016a, 2016b).
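The loading step might look something like the sketch below. It assumes that the Congreve & Lamsdell matrices ship with TreeSearch as a list named `congreveLamsdellMatrices`; the choice of the first matrix is arbitrary and purely illustrative, so substitute whichever matrix you wish to study. The guard simply skips the step if the package is not installed.

```r
# Sketch only: the dataset name and the index [[1]] are assumptions,
# not taken from the text of this vignette.
if (requireNamespace("TreeSearch", quietly = TRUE)) {
  library("TreeSearch")
  data("congreveLamsdellMatrices", package = "TreeSearch")
  dataset <- congreveLamsdellMatrices[[1]]  # arbitrary choice of matrix
}
```

The object `dataset` is the name used by the code chunks later in this vignette.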

One interesting question is how much extra precision is gained by sampling a larger number of trees when generating concavity curves.
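As a rough guide to what to expect, the error of any quantity estimated from a random sample tends to shrink with the square root of the sample size, so doubling the sample should cut the error by a factor of about 1.4, and quadrupling it should halve the error. The toy example below illustrates this with a simple sampled proportion in base R. It is not how TreeSearch computes profiles; `true_p`, `sample_sizes` and `n_repeats` are hypothetical values chosen for illustration.

```r
# Toy illustration only: estimate a proportion by random sampling and
# watch the run-to-run spread shrink as the sample size grows.
set.seed(1)                          # reproducible draws
true_p <- 0.3                        # hypothetical "true" proportion
sample_sizes <- c(2e3, 8e3, 32e3)    # each sample is 4x the previous one
n_repeats <- 200                     # independent estimates per sample size

# Standard deviation of the repeated estimates at each sample size
spread <- vapply(sample_sizes, function(n) {
  estimates <- rbinom(n_repeats, size = n, prob = true_p) / n
  sd(estimates)
}, numeric(1))

# Theoretical standard error of a sampled proportion, for comparison
expected <- sqrt(true_p * (1 - true_p) / sample_sizes)

print(rbind(sample_sizes, spread, expected))
```

If profile estimates behave similarly, successive doublings of `precision` should yield progressively smaller changes, which is what the histograms of differences are designed to show.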

```r
# Precision testing (chunk option: cache=TRUE)
preci1 <- PrepareDataProfile(dataset, precision = 2e+05) # Quick, imprecise
preci2 <- PrepareDataProfile(dataset, precision = 4e+05)
preci3 <- PrepareDataProfile(dataset, precision = 8e+05)

info1 <- attr(preci1, 'info.amounts')
info2 <- attr(preci2, 'info.amounts')
info3 <- attr(preci3, 'info.amounts')

# Change in information amounts when the sample doubles from 4e+05 to 8e+05
diff32 <- as.double(info3 - info2)
hist(diff32, breaks = seq(min(diff32) - 0.002, max(diff32) + 0.005, by = 0.002))

# Set to TRUE to run the slower, more precise comparisons
all_the_time_in_the_world <- FALSE
if (all_the_time_in_the_world) {
  preci4 <- PrepareDataProfile(dataset, precision = 1.6e+06)
  preci5 <- PrepareDataProfile(dataset, precision = 3.2e+06) # Slow, more precise

  info4 <- attr(preci4, 'info.amounts')
  info5 <- attr(preci5, 'info.amounts')

  diff42 <- as.double(info4 - info2)
  diff43 <- as.double(info4 - info3)
  diff54 <- as.double(info5 - info4)
  nonzero <- info4 > 0.00001

  hist(diff43)
  hist(thisDiff <- diff54)
  # print() is needed for output to appear from inside the if block
  print(quantile(thisDiff, probs = c(0, 5, 10, 50, 90, 95, 100) / 100))
  hist(diff42)

  # Differences expressed as a percentage of the info4 estimate
  hist(100 * (diff32 / info4)[nonzero])
  hist(100 * (diff42 / info4)[nonzero])
  hist(100 * (diff43 / info4)[nonzero])
}
```

```r
# More histograms
diff12 <- info1[1:10, ] - info2

hist(diff12, breaks = seq(min(diff12) - 0.01, max(diff12) + 0.01, by = 0.01))

hist(info3 - info2)
hist(info3 - info1[1:10, ])
if (all_the_time_in_the_world) {
  hist(info4 - info2)
}
```

References

Congreve, C. R., & Lamsdell, J. C. (2016a). Data from: Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Dryad Digital Repository. doi:10.5061/dryad.7dq0j

Congreve, C. R., & Lamsdell, J. C. (2016b). Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Palaeontology, 59(3), 447–465. doi:10.1111/pala.12236