Issues with inputting unmasked sequences #6

@edwhisnant

Hi Jon and Jason,

I am currently testing F2 without RepeatMasker, relying instead on the F2 default (tantan?). RepeatMasker is currently having issues with skipping the masking of low-complexity repeat sequences (-nolow option), so I have been considering using tantan instead. Most of the genomes I have run through the F2 pipeline complete without issue. For a few, however, I hit an error that (I think) comes from how mitochondrial contigs are handled. It seems to be genome-specific, since most others have run successfully, and it did not occur when the genomes were masked with RepeatMasker before being sent through the F2 pipeline.

Also, do you have any recommendations or preferences for masking methods prior to running the training and gene predictions? A colleague and I have been trying to decide which masking procedure to use, and have been looking at the pros and cons of different kinds of soft-masking.

Thanks!


Here is an example of the error:

Here is the genome file

Loaded training params for Acarospora_socialis_ncbi_gca_025617375.2: ['augustus', 'glimmerhmm', 'snap', 'genemark']
[Apr 30 10:39 AM] temporary files located in: /tmp/predict_1d27ff67-c777-45f8-b00b-8e4b67c52531
[Apr 30 10:39 AM] Loading genome assembly, running QC checks, searching for mitochondrial contigs, calculating softmasked regions and assembly gaps
Traceback (most recent call last):
  File "/hpc/group/bio1/ewhisnant/miniconda3/envs/funannotate2/bin/funannotate2", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/hpc/group/bio1/ewhisnant/miniconda3/envs/funannotate2/lib/python3.12/site-packages/funannotate2/__main__.py", line 26, in main
    predict(args)
  File "/hpc/group/bio1/ewhisnant/miniconda3/envs/funannotate2/lib/python3.12/site-packages/funannotate2/predict.py", line 158, in predict
    mito_contigs, _ = align_mito(args.fasta, cpus=args.cpus)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hpc/group/bio1/ewhisnant/miniconda3/envs/funannotate2/lib/python3.12/site-packages/funannotate2/align.py", line 211, in align_mito
    for i in merge_coordinates(coords):
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hpc/group/bio1/ewhisnant/miniconda3/envs/funannotate2/lib/python3.12/site-packages/funannotate2/utilities.py", line 54, in merge_coordinates
    last_merged[1] = max(last_merged[1], current[1])
    ~~~~~~~~~~~^^^
TypeError: 'tuple' object does not support item assignment
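
The traceback suggests that `merge_coordinates` tries to update an interval in place while the intervals it receives are tuples, which are immutable in Python. Below is a minimal sketch of an interval merge that avoids in-place assignment on the input. It is not the funannotate2 implementation: the function name `merge_coordinates_sketch`, the `(start, end)` input shape, and the tuple return type are all assumptions made for illustration.

```python
def merge_coordinates_sketch(coords):
    """Merge overlapping or touching (start, end) intervals.

    Hypothetical helper: accepts tuples or lists and never mutates the
    caller's objects, so immutable tuples do not raise a TypeError.
    """
    if not coords:
        return []
    # Sort by start, then end, so overlapping intervals become adjacent.
    ordered = sorted(coords, key=lambda c: (c[0], c[1]))
    # Copy the first interval into a mutable list before extending it.
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        last = merged[-1]
        if start <= last[1]:
            # Overlap or touch: extend the mutable copy, not the input.
            last[1] = max(last[1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


if __name__ == "__main__":
    print(merge_coordinates_sketch([(1, 5), (3, 8), (10, 12)]))
```

If the real function is expected to return lists rather than tuples, the final conversion can simply be dropped.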

Interestingly, on another genome (which does have a mitochondrial contig), there was no issue:

[Apr 29 04:50 PM] Loading genome assembly, running QC checks, searching for mitochondrial contigs, calculating softmasked regions and assembly gaps
[Apr 29 04:50 PM] Separating 1 mitochondrial contig(s) from the nuclear genome, will recombine at the end of predict
{'scaffold_00284': {'NW_026622727.1': 0.8227168748900229}}
[Apr 29 04:50 PM] Genome stats:
{
  "n_contigs": 413,
  "size": 34610783,
  "softmasked": "0.16%",
  "gaps": "0.16%",
  "n50": 243412,
  "n90": 43471,
  "l50": 43,
  "l90": 170,
  "avg_length": 83803
}
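
When comparing masking methods, one rough check is the fraction of lowercase bases in the soft-masked FASTA. The sketch below assumes that soft-masking is encoded as lowercase sequence characters. The helper `softmasked_fraction` is hypothetical, and the `softmasked` value F2 reports above may be computed differently.

```python
def softmasked_fraction(fasta_text):
    """Return the fraction of sequence characters that are lowercase.

    Hypothetical helper: header lines (starting with '>') and blank
    lines are skipped; every other character counts as a base.
    """
    total = 0
    lower = 0
    for line in fasta_text.splitlines():
        seq = line.strip()
        if not seq or seq.startswith(">"):
            continue
        total += len(seq)
        lower += sum(1 for base in seq if base.islower())
    return lower / total if total else 0.0


if __name__ == "__main__":
    example = ">contig_1\nACGTacgt\n>contig_2\nAAAA\n"
    print(f"{softmasked_fraction(example):.2%}")
```

Running it on the same assembly masked with RepeatMasker and with tantan gives a quick side-by-side figure for how much sequence each method masks.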
