Estimating the optimal number of migration edges from Treemix


This package uses results from the population genetics software ‘Treemix’ by Pickrell and Pritchard (2012) DOI:10.1371/journal.pgen.1002967 to estimate the optimal number of migration edges to add to the tree. Previously, it was customary to stop adding migration edges once 99.8% of the variation in the data was explained; OptM automates this choice using an ad hoc statistic based on the second-order rate of change in the log likelihood. OptM also provides several threshold models for comparison with the ad hoc statistic. The various methods are:
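The ad hoc statistic follows the idea of Evanno et al.'s ΔK for STRUCTURE. Below is a minimal, hypothetical shell/awk illustration of that idea, not OptM's own code. It assumes a table with one row per treemix run (M, iteration, final log likelihood) and consecutive integer values of M. For each interior M it computes Δm = |mean L(M+1) − 2·mean L(M) + mean L(M−1)| / sd L(M), using the sample standard deviation across iterations. The function name `delta_m` is illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OptM: Evanno-style delta-m from treemix
# log likelihoods. Input rows (stdin or files): "M iteration llik".
# Assumes M takes consecutive integer values; prints "M delta_m" for every
# M whose neighbors M-1 and M+1 are both present.
delta_m() {
  awk '
    NF < 3 { next }                       # skip blank or short lines
    {
      m = $1 + 0
      n[m]++
      val[m, n[m]] = $3 + 0
      sum[m] += $3
      if (!init) { lo = m; hi = m; init = 1 }
      if (m < lo) lo = m
      if (m > hi) hi = m
    }
    END {
      for (m = lo + 1; m < hi; m++) {
        if (!((m - 1) in n) || !(m in n) || !((m + 1) in n)) continue
        mean = sum[m] / n[m]
        # sample standard deviation across iterations (n - 1 denominator)
        ss = 0
        for (i = 1; i <= n[m]; i++) ss += (val[m, i] - mean) ^ 2
        sd = (n[m] > 1) ? sqrt(ss / (n[m] - 1)) : 0
        # absolute second-order difference of the mean log likelihoods
        d2 = sum[m + 1] / n[m + 1] - 2 * mean + sum[m - 1] / n[m - 1]
        if (d2 < 0) d2 = -d2
        if (sd == 0) print m, "undefined"
        else printf "%d %.4f\n", m, d2 / sd
      }
    }' "$@"
}
```

For example, `delta_m llik_table.txt` prints one line per interior M. An M whose runs all share the same likelihood prints `undefined`, because its standard deviation is zero.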

Install OptM (from an R console)

Preparing the input files

To run OptM, you will need a folder of output files produced by Treemix v1.13. The function optM automatically searches the folder for the stem.llik, stem.modelcov.gz, and stem.cov.gz files, where “stem” is the value passed to the -o parameter of treemix. It is recommended, but not required, to name each stem in the format stem.i.M, where i is the iteration number and M is the number of migration edges.
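Before calling optM, you may want to confirm that the folder is complete. The shell function below is a hypothetical helper (`check_treemix_folder` is not part of OptM). It treats every `*.llik` file in the folder as defining a stem, then lists any missing `stem.modelcov.gz` or `stem.cov.gz` companion. It returns a non-zero status if anything is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OptM: for every stem.llik in a folder,
# report any missing stem.modelcov.gz or stem.cov.gz companion file.
# Returns 0 if every stem is complete, 1 otherwise.
check_treemix_folder() {
  local dir=${1:-.} llik stem ext status=0
  for llik in "$dir"/*.llik; do
    # an unmatched glob stays literal, so this means "no .llik files"
    if [ ! -e "$llik" ]; then
      echo "no .llik files found in $dir" >&2
      return 1
    fi
    stem=${llik%.llik}
    for ext in modelcov.gz cov.gz; do
      if [ ! -f "$stem.$ext" ]; then
        echo "missing: $stem.$ext"
        status=1
      fi
    done
  done
  return "$status"
}
```

For example, `check_treemix_folder treemix_out/ && echo "all stems complete"`.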

In order for optM to function properly, you must run:

NOTE: optM checks whether the likelihood varies across iterations for each M. If the data are very robust, every run may return the same likelihood. The standard deviation across runs is then zero, and the ad hoc statistic is undefined. In this case, introduce more variation into the dataset, for example by subsetting the SNPs, varying -k in treemix, or using another permutation or bootstrap method.
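You can look for this problem before running optM. The sketch below uses a hypothetical helper, `llik_sd_by_m`, which is not part of OptM. It groups runs by M and prints the sample standard deviation of the final log likelihood for each M, flagging zero variation. It assumes stems follow the stem.i.M naming, so M is the last dot-separated field. It also assumes the final log likelihood is the last field on the last non-empty line of each `.llik` file. Check that assumption against your own treemix output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of OptM: sample standard deviation of the
# final log likelihood across iterations, for each M.
# Assumptions: files are named stem.i.M.llik, and the final log likelihood
# is the last field of the last non-empty line of each .llik file.
llik_sd_by_m() {
  local dir=${1:-.} f base m ll
  for f in "$dir"/*.llik; do
    [ -e "$f" ] || continue               # unmatched glob: no files
    base=${f%.llik}
    m=${base##*.}                         # M is the last dot-separated field
    ll=$(awk 'NF { last = $NF } END { print last }' "$f")
    printf '%s %s\n' "$m" "$ll"
  done | awk '
    { n[$1]++; val[$1, n[$1]] = $2 + 0; sum[$1] += $2 }
    END {
      for (m in n) {
        if (n[m] < 2) { print m, "NA"; continue }   # sd needs 2+ runs
        mean = sum[m] / n[m]
        ss = 0
        for (i = 1; i <= n[m]; i++) ss += (val[m, i] - mean) ^ 2
        sd = sqrt(ss / (n[m] - 1))
        if (sd == 0) printf "%s %.4f ZERO-VARIANCE\n", m, sd
        else printf "%s %.4f\n", m, sd
      }
    }' | sort -n
}
```

Any M flagged `ZERO-VARIANCE` is a candidate for the extra variation described in the note.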

Below is an example of running treemix from a UNIX terminal (bash) for M = 1 to 10, with 5 iterations per M:

for m in {1..10}; do
   for i in {1..5}; do
      treemix \
         -i test.treemix.gz \
         -o test.${i}.${m} \
         -global \
         -m ${m} \
         -k 1000
   done
done
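One way to follow the suggestion to vary -k between iterations is to generate the commands first and run them afterwards. The sketch below uses a hypothetical function, `treemix_commands`. It prints one treemix command per (M, iteration) pair using the same flags as the example loop, but sets -k to 400 + 100·i. Those k values are arbitrary illustrations, not recommendations from treemix or OptM.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print one treemix command per (M, iteration),
# giving each iteration a different SNP block size (-k) so replicate runs
# differ. The k values (500, 600, ...) are illustrative, not recommended.
# Usage: treemix_commands INPUT STEM [MAX_M] [ITERATIONS]
treemix_commands() {
  local input=$1 stem=$2 max_m=${3:-10} iters=${4:-5} m i k
  for m in $(seq 1 "$max_m"); do
    for i in $(seq 1 "$iters"); do
      k=$((400 + 100 * i))
      # output stem follows the recommended stem.i.M format
      printf 'treemix -i %s -o %s.%d.%d -global -m %d -k %d\n' \
        "$input" "$stem" "$i" "$m" "$m" "$k"
    done
  done
}
```

For example, `treemix_commands test.treemix.gz test 10 5 > jobs.txt` writes 50 commands that can be run with `sh jobs.txt` or handed to a job scheduler.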

To run OptM in R:

Version History


Fitak, R. R. (2018) optM: an R package to optimize the number of migration edges using threshold models. Journal of Heredity [in prep]


Robert Fitak
Department of Biology
Duke University