whao89 edited this page Nov 16, 2016 · 5 revisions

Topics:

Convergence of algorithm

By default, terastructure declares convergence when the relative change in the validation likelihood is less than 1e-5 (or when the validation likelihood has decreased at least twice in a row). However, the relative convergence cutoff is not the most relevant tuning parameter for users. Because we are sampling random SNPs, we don't compute this likelihood after every iteration. Instead, we compute it every R iterations and then check for convergence. The user specifies the value of R with the -rfreq option. The choice of R plays a major role in the performance of terastructure! If R is too small, few SNPs are sampled between convergence checks, so the likelihood changes very little from one check to the next and the algorithm terminates early. If R is too large, the algorithm may already have converged but checks for convergence too infrequently to notice.
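
The stopping rule described above can be sketched in a few lines of Python. This is an illustration only, not terastructure's actual source: the function name `should_stop`, the `history` list (one validation likelihood per check, i.e. one entry every R iterations), and the reading of "decreased at least twice in a row" as two consecutive decreases are all assumptions.

```python
def should_stop(history, rel_tol=1e-5):
    """Return True if the (assumed) convergence rule fires.

    history: validation likelihoods, one per convergence check
             (one entry every R iterations); hypothetical input format.
    rel_tol: relative-change cutoff, 1e-5 by default as in the text.
    """
    if len(history) < 2:
        return False  # need at least two checks to measure a change

    prev, curr = history[-2], history[-1]

    # Rule 1: relative change below the cutoff.
    if prev != 0 and abs(curr - prev) / abs(prev) < rel_tol:
        return True

    # Rule 2 (assumed interpretation): two consecutive decreases.
    if len(history) >= 3 and history[-1] < history[-2] < history[-3]:
        return True

    return False
```

With a very small R, consecutive entries of `history` differ only slightly, so the first rule can fire long before the fit has actually stabilized, which is the early-termination behavior described above.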

We suggest setting the -rfreq option to a value in the range of 5% to 20% of the number of SNPs, and experimenting to determine whether the value is appropriate. We can suggest two rules of thumb for assessing whether the value of R is too high or too low. If R is too low, then the most recent few reported validation likelihoods will not have plateaued much. If R is too high, then we expect to see oscillation around the maximum of the validation likelihood.
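
As a starting point for that experimentation, a small helper can list candidate -rfreq values inside the suggested 5% to 20% range. The helper name `rfreq_candidates` and the choice of 5%, 10%, and 20% as trial points are assumptions made for this sketch, not part of terastructure.

```python
def rfreq_candidates(num_snps, percents=(5, 10, 20)):
    """Candidate -rfreq values as whole-number percentages of the SNP count.

    percents: trial points inside the suggested 5%-20% range (assumed).
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return [max(1, num_snps * p // 100) for p in percents]


# Example: a dataset with one million SNPs.
print(rfreq_candidates(1_000_000))  # [50000, 100000, 200000]
```

Each candidate can then be tried in turn, using the two rules of thumb above to judge whether the value is too low or too high.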

Choosing K

We run the algorithm over a range of values of K, with multiple replicate runs (reps) for each value. Then, we extract the final validation likelihood from each run. After averaging over the reps for each K, we choose the value of K at which the averaged validation likelihood plateaus.
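
The averaging step can be sketched as follows. The input format (a dict mapping each K to the final validation likelihoods of its reps), the function names, and the numeric definition of "plateau" (the first K whose gain to the next K falls below a threshold `min_gain`) are all assumptions for illustration. In practice, plotting the averages and inspecting them by eye may be just as useful.

```python
from statistics import mean


def mean_likelihood_by_k(runs):
    """Average the final validation likelihood over reps for each K.

    runs: dict mapping K -> list of final validation likelihoods
          (hypothetical format; collect these from your own runs).
    """
    return {k: mean(vals) for k, vals in sorted(runs.items())}


def plateau_k(means, min_gain=0.001):
    """Return the first K where moving to the next K gains less than min_gain.

    min_gain is an assumed threshold; pick one suited to your data's scale.
    """
    ks = sorted(means)
    for k, k_next in zip(ks, ks[1:]):
        if means[k_next] - means[k] < min_gain:
            return k
    return ks[-1]  # no plateau found within the tested range


# Made-up numbers, for illustration only.
runs = {
    2: [-1.10, -1.12],
    3: [-1.05, -1.05],
    4: [-1.02, -1.02],
    5: [-1.02, -1.02],
}
print(plateau_k(mean_likelihood_by_k(runs)))  # 4
```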

Validation set options

TBA
