Estimating the optimal number of migration edges from Treemix

This package uses results from the population software ‘Treemix’ by Pickrell and Pritchard (2002) DOI:10.1371/journal.pgen.1002967 to estimate the optimal number of migrations edges to add to the tree. Previously, it was customary to stop adding migration edges when 99.8% of variation in the data was explained, but optM automates this process using an *ad hoc* statistic based on the second order rate of change in the log likelihood. OptM has added functionality for various threshold modeling to compare with the ad hoc statistic. The various methods are:

- “Evanno” - calculates an
*ad hoc*statistic we call deltaM based on the Evanno method, or second-order rate of change in likelihood weighted by the standard deviation. - “linear” - estimates of the optimal M based on a piecewise linear (change point), bent cable (alpha), simple exponential (threshold, default 5%), or non-linear least squares (threshold, default 5%) models
- “SiZer” - a method to map and analyze derivatives for change point estimation for ecological modeling.

- To install from CRAN
- First install the R package ‘SiZer’ from CRAN using the command
`install.packages("SiZer")`

- Then install the OptM package using
`install.packages("OptM")`

- Load the package into your working R environment using
`library(OptM)`

- First install the R package ‘SiZer’ from CRAN using the command

To run OptM, you will need a folder of output files produced by Treemix v1.13. The function optM will automatically search the folder for the *stem.llik*, *stem.modelcov.gz*, and *stem.cov.gz* files; where “*stem*” is that provided to the *-o* parameter of *treemix*. This *stem* must be in the format *stem*.*M*.*i*; where

*stem*is any name you prefer*M*is the number of migration edges used for the treemix run (*-m*parameter)*i*is the iteration number for that value of*M*

In order for optM to function properly, you must run:

- At least two iterations at each value of
*M*(the number of migration edges) *M*>2.- The range for
*M*must be sequential integers (e.g., 1, 2, 3, etc) - You do not need to run
*M*=0 because*treemix*automatically includes this as the null model in each run.

**NOTE: There will be an error check to see if there is variation across iterations for each M. In other words, if the data are very robust, you may get the same likelihood across all runs, thus the standard deviation across runs is zero and the ad hoc statistic is undefined. In this case, try making larger variations in the dataset (subsetting the SNPs, varying -k in treemix, or other method of permutation/bootstrap).**

```
for m in {1..10}
do
for i in {1..5}
do
treemix \
-i test.treemix.gz \
-o test.${m}.${i} \
-global \
-m ${m} \
-k 1000
done
done
```

- First load the provided example data for a simulated dataset with 3 migration edges; and 10 iterations for
*M*={1-10}`folder <- system.file("extdata", package = "OptM")`

- Next, run
*optM*using the default “Evanno”-like method:`test.optM = optM(folder)`

- Finally, plot the results:
`plot_optM(test.optM, method = "Evanno")`

- Alternatively, run using various linear modeling estimates rather than the
*ad hoc*statistic:`folder <- system.file("extdata", package = "OptM")`

`test.linear = optM(folder, method = "linear")`

`plot_optM(test.linear, method = "linear")`

- OR using
*SiZer*:`folder <- system.file("extdata", package = "OptM")`

`test.sizer = optM(folder, method = "SiZer")`

`plot_optM(test.sizer, method = "SiZer")`

- Version 0.1.1, 2019/1/2
- Released the first version

Fitak, R. R. (2018) optM: an R package to optimize the number of migration edges using threshold models. Journal of Heredity [in prep]

- Or enter the command
`citation("OptM")`

into your R console

Robert Fitak

Department of Biology

Duke University

USA

rfitak9@gmail.com