Skip to content

GSL overflow computing heldout likelihood #14

@kdm9

Description

@kdm9

Hi,

As can be seen in the log below, terastructure is having issues with a dataset I'm trying to use it on. While computing the held-out likelihood, the gamma function overflows, I think? The result seems to occur with this data at any reasonable k (2<=k<=12), with rfreq of 1, 2 and 3 million SNPs, and with any seed. I've confirmed the shape of the dataset is correct.

$ terastructure \
>     -label rfreq1M-012 \
>     -rfreq 1000000 \
>     -file plink/nooutlier.012 \
>     -n 123 \
>     -seed 1234 \
>     -l 8579612 \
>     -k 5 \
>     -nthreads 16
+ rfreq = 1000000
+ using file plink/nooutlier.012
+ n = 123
+ L = 8579612
+ K = 5
+ Creating directory n123-k5-l8579612-rfreq1M-012
+ Writing log to n123-k5-l8579612-rfreq1M-012/infer.log
+ done initializing env
+ .012 detected+ reading (123,8579612) snps from plink/nooutlier.012
8570000 locations read+ initialization begin
+ computing initial heldout likelihood
gsl: gamma.c:1489: ERROR: overflow
done:0.00%Default GSL error handler invoked.
Aborted

EDIT:

This is with GSL 2.3, using the intel compiler suite on 64 bit Centos 6.
As an aside, I've managed to get this dataset to "work" using plink BED files, however it gives nonsense results: For all k > 2, all samples have theta of 0.999995 for one particular population. This suggests to me that read_bed and read_012 are reading differently (the dataset is identical).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions