speed up panphlan_map.py with intervaltree? #26

In the code:

```
        with open(reads_file, mode='r') as IN:
            for line in IN:
                words = line.strip().split('\t')
                # words = CONTIG, POSITION, REFERENCE BASE, COVERAGE, READ BASE, QUALITY
                contig, position, abundance = words[0], int(words[1]), int(words[3])
                # For each gene in the contig, if position in range of gene, increase its abundance
                if contig in contig2gene.keys():
                    for gene, (fr,to) in contig2gene[contig].items():
                        if position in range(fr, to+1):
                            genes_abundances[gene] += abundance
        # WRITE
        if args.output == None:
            for g in genes_abundances:
                if genes_abundances[g] > 0:
                    sys.stdout.write(str(g) + '\t' + str(genes_abundances[g]) + '\n')
```

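For every pileup position, you loop over all genes on the contig and test `if position in range(fr, to+1)`, so the cost per input line grows linearly with the number of genes on that contig. Changing `contig2gene` to an [intervaltree](https://pypi.org/project/intervaltree/) data structure, which answers this kind of point query in roughly logarithmic time, will likely substantially speed this up.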
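To make the idea concrete, here is a rough sketch of an indexed lookup. To keep it dependency-free it swaps in sorted gene starts plus `bisect` from the standard library rather than intervaltree itself. The helper names `build_index`, `genes_at` and `add_pileup_line` are made up for illustration and are not part of panphlan_map.py. It assumes `contig2gene` maps each contig to `{gene: (fr, to)}` with inclusive coordinates, as the loop above suggests.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(contig2gene):
    """Per contig: genes sorted by start, plus the longest gene span."""
    index = {}
    for contig, genes in contig2gene.items():
        # (start, end, gene) with inclusive coordinates, as in contig2gene
        rows = sorted((fr, to, gene) for gene, (fr, to) in genes.items())
        starts = [fr for fr, _, _ in rows]
        max_len = max((to - fr + 1 for fr, to, _ in rows), default=0)
        index[contig] = (starts, rows, max_len)
    return index


def genes_at(index, contig, position):
    """Return genes whose inclusive range [fr, to] contains position."""
    if contig not in index:
        return []
    starts, rows, max_len = index[contig]
    # Only genes starting in [position - max_len + 1, position] can cover it.
    lo = bisect_left(starts, position - max_len + 1)
    hi = bisect_right(starts, position)
    return [gene for fr, to, gene in rows[lo:hi] if to >= position]


def add_pileup_line(line, index, genes_abundances):
    """Accumulate coverage from one tab-separated pileup line."""
    words = line.rstrip('\n').split('\t')
    contig, position, abundance = words[0], int(words[1]), int(words[3])
    for gene in genes_at(index, contig, position):
        genes_abundances[gene] += abundance
```

The index would be built once before reading `reads_file`, and the inner `for gene, (fr,to) in ...` loop replaced by a call like `add_pileup_line(line, index, genes_abundances)`. With intervaltree, the same lookup should be expressible as a tree per contig queried with `tree[position]`. Note that intervaltree intervals are half-open, so the inclusive end `to` would need to be stored as `to + 1`.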
