How are profile scores generated and what do they mean?
This vignette explains what profile scores mean, and gives a small insight into how they are calculated in the TreeSearch package.
Let’s get started by loading the package and one of its datasets. (To learn how to load your own phylogenetic data into R, see the separate TreeSearch vignette.) The package's datasets were generated by Congreve & Lamsdell (2016a, 2016b).
One thing that's interesting to know is how much extra precision is gained by sampling a larger number of trees when generating concavity curves.
```r
# Precision testing (chunk option: cache=TRUE)
preci1 <- PrepareDataProfile(dataset, precision = 2e+05) # Quick, imprecise
preci2 <- PrepareDataProfile(dataset, precision = 4e+05)
preci3 <- PrepareDataProfile(dataset, precision = 8e+05)

# Extract the 'info.amounts' attribute computed at each precision
info1 <- attr(preci1, 'info.amounts')
info2 <- attr(preci2, 'info.amounts')
info3 <- attr(preci3, 'info.amounts')

# Change in the values when the precision is doubled from 4e+05 to 8e+05
diff32 <- as.double(info3 - info2)
hist(diff32, breaks = seq(min(diff32) - 0.002, max(diff32) + 0.005, by = 0.002))

# Set to TRUE to run the slower, more precise comparisons
all_the_time_in_the_world <- FALSE
if (all_the_time_in_the_world) {
  preci4 <- PrepareDataProfile(dataset, precision = 1.6e+06)
  preci5 <- PrepareDataProfile(dataset, precision = 3.2e+06) # Slow, more precise

  info4 <- attr(preci4, 'info.amounts')
  info5 <- attr(preci5, 'info.amounts')

  diff42 <- as.double(info4 - info2)
  diff43 <- as.double(info4 - info3)
  diff54 <- as.double(info5 - info4)
  nonzero <- info4 > 0.00001

  hist(diff43)
  hist(thisDiff <- diff54)
  quantile(thisDiff, probs = c(0, 5, 10, 50, 90, 95, 100) / 100)
  hist(diff42)
  # Differences as a percentage of the values in info4
  hist(100 * (diff32 / info4)[nonzero])
  hist(100 * (diff42 / info4)[nonzero])
  hist(100 * (diff43 / info4)[nonzero])
}
```
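As a rough guide to how much each doubling of `precision` might be expected to help, the sketch below works through the sampling arithmetic in base R. It rests on an assumption that the vignette does not confirm: that each information amount behaves like `-log2(p)`, where `p` is a proportion estimated from `precision` randomly sampled trees. The helper `InfoError()` and the value `trueP` are hypothetical, chosen only for illustration.

```r
# Assumption: an information amount is -log2(p), where p is a proportion
# estimated from n randomly sampled trees.
# `InfoError()` is a hypothetical helper, not part of TreeSearch.
InfoError <- function(p, n) {
  # Standard error of a proportion estimated from n samples
  seP <- sqrt(p * (1 - p) / n)
  # Delta method: the derivative of -log2(p) has magnitude 1 / (p * log(2))
  seP / (p * log(2))
}

trueP <- 0.01 # Arbitrary illustrative proportion
sampleSizes <- c(2e+05, 4e+05, 8e+05, 1.6e+06, 3.2e+06)
infoErr <- InfoError(trueP, sampleSizes)

# Approximate error, in bits, expected at each sample size
data.frame(precision = sampleSizes, approxError = signif(infoErr, 3))
```

If the assumption holds, each doubling of the sample reduces the expected error to about 71% of its previous value (a factor of `1 / sqrt(2)`), so the gain from each additional doubling becomes progressively smaller in absolute terms.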
```r
# More histograms
diff12 <- info1[1:10, ] - info2

hist(diff12, breaks = seq(min(diff12) - 0.01, max(diff12) + 0.01, by = 0.01))

hist(info3 - info2)
hist(info3 - info1[1:10, ])
if (all_the_time_in_the_world) {
  hist(info4 - info2)
}
```
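Histograms can be hard to compare by eye, so it may help to reduce each set of differences to a few numbers. The sketch below is one possible way to do that in base R. The helper `SummariseDiff()` is hypothetical, and `infoA` and `infoB` are simulated stand-ins for two `info.amounts` matrices, not real output from `PrepareDataProfile()`.

```r
# Hypothetical helper: summarise the differences between two sets of values
SummariseDiff <- function(a, b) {
  d <- as.double(a - b)
  # 5%, 50% and 95% quantiles, plus the largest absolute difference
  c(quantile(d, probs = c(0.05, 0.5, 0.95)), maxAbs = max(abs(d)))
}

# Simulated stand-ins for two 'info.amounts' matrices (not real results)
set.seed(1)
infoA <- matrix(runif(50, 0, 5), nrow = 10)
infoB <- infoA + rnorm(50, sd = 0.01)

SummariseDiff(infoB, infoA)
```

With real data, the same call could be applied to pairs such as `info3` and `info2`, assuming the two matrices have matching dimensions.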
Congreve, C. R., & Lamsdell, J. C. (2016a). Data from: Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Dryad Digital Repository. doi:10.5061/dryad.7dq0j
Congreve, C. R., & Lamsdell, J. C. (2016b). Implied weighting and its utility in palaeontological datasets: a study using modelled phylogenetic matrices. Palaeontology, 59(3), 447–465. doi:10.1111/pala.12236