@dtaylo95

As is, the tool requires that variant IDs (in both the --eqtl input file and the --vcf file) be formatted <chrom>_<pos>.... As far as I can tell, there are two reasons for this:

  1. It allows the program to parse the variant's position from its ID.
  2. It meets the formatting requirements used by lmfit in the fitting step.

I am proposing changes that make the tool agnostic to the format of the variant IDs (I can imagine some users have VCFs that use dbSNP rsIDs, for example). Briefly, the changes are as follows:

  1. The --eqtl file must now include two additional columns, variant_chr and variant_pos, that give the (1-based) position of each variant. This information is then used to fetch the genotypes from the tabix-indexed VCF.
  2. Variants are assigned unique temporary IDs (a new variant_id_clean column) that meet the formatting requirements of lmfit and are used when fitting the model.
  3. I've also updated the gene_id_clean functionality to match that of the new variant_id_clean column, so it no longer assumes any specific formatting of the input gene IDs.
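
To illustrate changes 1 and 2, here is a minimal sketch of the idea, not the code in this PR. The helper names (`assign_clean_ids`, `tabix_region`), the `var` prefix, and the example rows are all hypothetical. It assumes the constraint from lmfit is that parameter names must be valid Python identifiers, and that genotypes are looked up with a standard tabix region string (`chrom:start-end`, 1-based and inclusive).

```python
import re

# Assumed lmfit constraint: names must look like Python identifiers.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def assign_clean_ids(raw_ids, prefix="var"):
    """Map each distinct raw ID to a sequential, identifier-safe temporary ID.

    Repeated raw IDs get the same clean ID; order of first appearance is kept.
    """
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = f"{prefix}_{len(mapping)}"
    return mapping


def tabix_region(chrom, pos):
    """Build a single-base tabix region string from a 1-based position."""
    return f"{chrom}:{pos}-{pos}"


# Hypothetical rows from an --eqtl file that mixes ID styles.
rows = [
    {"variant_id": "rs12345", "variant_chr": "chr1", "variant_pos": 1000},
    {"variant_id": "chr1_2000_A_G", "variant_chr": "chr1", "variant_pos": 2000},
    {"variant_id": "rs12345", "variant_chr": "chr1", "variant_pos": 1000},
]

clean = assign_clean_ids(row["variant_id"] for row in rows)
for row in rows:
    row["variant_id_clean"] = clean[row["variant_id"]]
    row["region"] = tabix_region(row["variant_chr"], row["variant_pos"])
```

Because the clean IDs are generated rather than derived from the original strings, the same approach would work for gene IDs by passing a different prefix.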
