dRep stalls on large genome sets; unclear output from -d flag #280

@Chandrasekaran-J

Hi again,

I'm running dRep for dereplication of a large genome dataset. I previously faced an issue where it seemed like dRep (or possibly fastANI) silently stalled during execution. As suggested, I re-ran the same command with the -d flag to enable debug output.

..:: dRep dereplicate Step 2. Cluster ::..

07-14 22:45 INFO Running primary clustering
07-14 22:45 INFO Running pair-wise MASH clustering
07-14 22:45 INFO Will split genomes into 9 groups for primary clustering
07-15 05:28 DEBUG Clustering MASH database
07-15 06:29 DEBUG Debug mode on - saving Mdb ASAP
07-15 07:42 DEBUG Debug mode on - saving CdbF ASAP
07-15 07:42 DEBUG Saving primary_linkage pickle to /data/chandrasekaran/drep_completed_genomes/drep_output/data/Clustering_files/
07-15 07:42 INFO 7430 primary clusters made
07-15 07:42 INFO Running secondary clustering
07-15 07:42 INFO Running 32018245 fastANI comparisons- should take ~ 20812.9 min
07-15 07:42 DEBUG running cluster 2588
07-15 07:42 DEBUG /data/chandrasekaran/miniconda3/envs/drep/bin/fastANI --ql /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/tmp/genomeList --rl /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/tmp/genomeList -o /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/fastANI_out_ejqycipfir --matrix -t 6 --minFraction 0 ejqycipfir
07-15 07:42 DEBUG running cluster 3286
07-15 07:42 DEBUG /data/chandrasekaran/miniconda3/envs/drep/bin/fastANI --ql /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/tmp/genomeList --rl /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/tmp/genomeList -o /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/fastANI_out_btqmmtzduj --matrix -t 6 --minFraction 0 btqmmtzduj
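For scale, the runtime estimate in the "Running 32018245 fastANI comparisons" log line can be converted into hours and days. This is only arithmetic on the two figures dRep printed; the variable names are my own, and I am assuming the estimate covers the whole secondary clustering step.

```python
# Figures copied from the dRep log line:
# "Running 32018245 fastANI comparisons- should take ~ 20812.9 min"
comparisons = 32_018_245
estimated_minutes = 20_812.9

# Convert the estimate into larger units.
estimated_hours = estimated_minutes / 60
estimated_days = estimated_hours / 24

# Average time budgeted per comparison, implied by the estimate.
seconds_per_comparison = estimated_minutes * 60 / comparisons

print(f"{estimated_hours:.1f} h, {estimated_days:.1f} days")
print(f"{seconds_per_comparison:.3f} s per comparison")
```

By dRep's own estimate the secondary clustering would take roughly two weeks, so a long period without new log lines may not by itself indicate a hang.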

With the -d flag I got some additional files:
(base) [chandrasekaran@actinium cmd_logs]$ ls
2025-07-15_07.42.22.045546.CMD 2025-07-15_07.42.22.045546.STDOUT 2025-07-15_07.42.35.013850.STDERR
2025-07-15_07.42.22.045546.STDERR 2025-07-15_07.42.35.013850.CMD 2025-07-15_07.42.35.013850.STDOUT

Of these, the stalled cluster 3286 corresponds to the file 2025-07-15_07.42.35.013850.STDERR:

Kmer size = 16
Fragment length = 3000
Threads = 6
ANI output file = /data/chandrasekaran/drep_completed_genomes/drep_output/data/fastANI_files/fastANI_out_btqmmtzduj
Sanity Check = 0

INFO [thread 0], skch::main, Count of threads executing parallel_for : 6
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 268142983
INFO [thread 0], skch::Sketch::index, unique minimizers = 5582865
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 1752540) ... (27134, 1)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 429.987 sec
INFO [thread 0], skch::main, Start Map 1
.
.
.
INFO [thread 0], skch::main, Start Map 40
INFO [thread 0], skch::main, Time spent mapping fragments in query #40 : 1787.84 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.570964 sec
INFO [thread 0], skch::main, Start Map 41

Here it seems to have stalled at Start Map 41. I don't understand how to proceed from here. Could you please help me debug the issue?
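To help tell a slow query from a real hang, the fastANI STDERR file can be summarised by finding the last query that was started and the mapping time of each finished query. This is a minimal sketch, assuming the log format shown in the excerpt above; the function name `summarize_fastani_log` is hypothetical and not part of dRep or fastANI.

```python
import re

# Patterns based on the log lines shown in the STDERR excerpt.
START_RE = re.compile(r"Start Map (\d+)")
DONE_RE = re.compile(r"Time spent mapping fragments in query #(\d+) : ([\d.]+) sec")


def summarize_fastani_log(text):
    """Return (last query started, {query number: mapping seconds})."""
    started = [int(m.group(1)) for m in START_RE.finditer(text)]
    finished = {int(m.group(1)): float(m.group(2)) for m in DONE_RE.finditer(text)}
    last_started = max(started) if started else None
    return last_started, finished


# Lines copied from the tail of the STDERR file above.
excerpt = """\
INFO [thread 0], skch::main, Start Map 40
INFO [thread 0], skch::main, Time spent mapping fragments in query #40 : 1787.84 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.570964 sec
INFO [thread 0], skch::main, Start Map 41
"""

last_started, finished = summarize_fastani_log(excerpt)

# A query is "in progress" if it was started but has no timing line yet.
in_progress = last_started is not None and last_started not in finished

print(f"last query started: {last_started}, still in progress: {in_progress}")
for query, seconds in sorted(finished.items()):
    print(f"query #{query}: {seconds / 60:.1f} min")
```

In the excerpt, query #40 took 1787.84 sec (about 30 minutes) to map, so query 41 may simply need a similar amount of time before the next log line appears.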

Thanks in advance!
