-
Notifications
You must be signed in to change notification settings - Fork 10
Open
Description
Hi,
As can be seen in the log below, terastructure is having issues with a dataset I'm trying to use it on. While computing the held-out likelihood, the gamma function overflows, I think? The result seems to occur with this data at any reasonable k (2<=k<=12), with rfreq of 1, 2 and 3 million SNPs, and with any seed. I've confirmed the shape of the dataset is correct.
$ terastructure \
> -label rfreq1M-012 \
> -rfreq 1000000 \
> -file plink/nooutlier.012 \
> -n 123 \
> -seed 1234 \
> -l 8579612 \
> -k 5 \
> -nthreads 16
+ rfreq = 1000000
+ using file plink/nooutlier.012
+ n = 123
+ L = 8579612
+ K = 5
+ Creating directory n123-k5-l8579612-rfreq1M-012
+ Writing log to n123-k5-l8579612-rfreq1M-012/infer.log
+ done initializing env
+ .012 detected+ reading (123,8579612) snps from plink/nooutlier.012
8570000 locations read+ initialization begin
+ computing initial heldout likelihood
gsl: gamma.c:1489: ERROR: overflow
done:0.00%Default GSL error handler invoked.
Aborted
EDIT:
This is with GSL 2.3, using the intel compiler suite on 64 bit Centos 6.
As an aside, I've managed to get this dataset to "work" using plink BED files, however it gives nonsense results: For all k > 2, all samples have theta of 0.999995 for one particular population. This suggests to me that read_bed and read_012 are reading differently (the dataset is identical).
Metadata
Metadata
Assignees
Labels
No labels