Skip to content

mateIsFwd never gets a value set for unpaired hits #46

@jenni-westoby

Description

@jenni-westoby

In RapMap v.0.5.0, the value of mateIsFwd is never set for unpaired hits in the function mergeLeftRightHits, in the file include/RapMapUtils.hpp. The function in question is below


        inline void mergeLeftRightHits(
                std::vector<QuasiAlignment>& leftHits,
                std::vector<QuasiAlignment>& rightHits,
                std::vector<QuasiAlignment>& jointHits,
                uint32_t readLen,
                uint32_t maxNumHits,
                bool& tooManyHits,
                HitCounters& hctr) {
            if (leftHits.size() > 0) {
                constexpr const int32_t signedZero{0};
                auto leftIt = leftHits.begin();
                auto leftEnd = leftHits.end();
                auto leftLen = std::distance(leftIt, leftEnd);
                if (rightHits.size() > 0) {
                    auto rightIt = rightHits.begin();
                    auto rightEnd = rightHits.end();
                    auto rightLen = std::distance(rightIt, rightEnd);
                    size_t numHits{0};
                    jointHits.reserve(std::min(leftLen, rightLen));
                    uint32_t leftTxp, rightTxp;
                    while (leftIt != leftEnd && rightIt != rightEnd) {
                        leftTxp = leftIt->tid;
                        rightTxp = rightIt->tid;
                        if (leftTxp < rightTxp) {
                            ++leftIt;
                        } else {
                            if (!(rightTxp < leftTxp)) {
                                int32_t startRead1 = std::max(leftIt->pos, signedZero);
                                int32_t startRead2 = std::max(rightIt->pos, signedZero);
                                bool read1First{(startRead1 < startRead2)};
                                int32_t fragStartPos = read1First ? startRead1 : startRead2;
                                int32_t fragEndPos = read1First ?
                                    (startRead2 + rightIt->readLen) : (startRead1 + leftIt->readLen);
                                uint32_t fragLen = fragEndPos - fragStartPos;
                                jointHits.emplace_back(leftTxp,
                                        startRead1,
                                        leftIt->fwd,
                                        leftIt->readLen,
                                        fragLen, true);
                                // Fill in the mate info
                                auto& qaln = jointHits.back();
                                qaln.mateLen = rightIt->readLen;
                                qaln.matePos = startRead2;
                                qaln.mateIsFwd = rightIt->fwd;
                                jointHits.back().mateStatus = MateStatus::PAIRED_END_PAIRED;

                                ++numHits;
                                if (numHits > maxNumHits) { tooManyHits = true; break; }
                                ++leftIt;
                            }
                            ++rightIt;
                        }
                    }
                }
                if (tooManyHits) { jointHits.clear(); ++hctr.tooManyHits; }
            }

            // If we had proper paired hits
            if (jointHits.size() > 0) {
                hctr.peHits += jointHits.size();
                //orphanStatus = 0;
            } else if (leftHits.size() + rightHits.size() > 0 and !tooManyHits) {
                // If there weren't proper paired hits, then either
                // there were too many hits, and we forcibly discarded the read
                // or we take the single end hits.
                auto numHits = leftHits.size() + rightHits.size();
                hctr.seHits += numHits;
                //orphanStatus = 0;
                //orphanStatus |= (leftHits.size() > 0) ? 0x1 : 0;
                //orphanStatus |= (rightHits.size() > 0) ? 0x2 : 0;
                jointHits.insert(jointHits.end(),
                        std::make_move_iterator(leftHits.begin()),
                        std::make_move_iterator(leftHits.end()));
                jointHits.insert(jointHits.end(),
                        std::make_move_iterator(rightHits.begin()),
                        std::make_move_iterator(rightHits.end()));
            }
        }

The problem occurs because in the final else if (leftHits.size() + rightHits.size() > 0 and !tooManyHits) statement, hits are appended to the end of joint hits without updating the unset mateIsFwd field. Instead mateIsFwd is 'randomly' filled with a garbage value which appears to have some dependency on the previous read which was aligned. The value which is assigned to mateIsFwd then has an impact on the SAM flag because mateIsFwd is used to determine the SAM flag in getSamFlags. Consequently, read SRR5237781.4 in problem4_1.fastq and problem4_2.fastq usually produces the following alignment on my machine:

SRR5237781.4	89	ENSMUST00000020488	36263	1	100M	=	36263	0	ATCCTTCTGAGGTTAGGGATCTAGGGGAAGTGGGGGATGTGCTCTGAATTAACTTTGTTAGGTAATTTAGAATATAGGAATCTCTAACATTGGACTGAAC	*	NH:i:1
SRR5237781.4	165	ENSMUST00000020488	36263	0	*	=	36263	0	GTAAGGCCAGCAATCCTTCTGAGGTTAGGGATCTAGGGGAAGTGGGGGATGTGCTCTGAATTAACTTTGTTAGGTAATTTAGAATATAGGAATCTCTAAC	*	NH:i:1

Whereas the same read in problem4_3read_1.fastq and problem4_3read_2.fastq usually produces this alignment (virtually identical except different SAM flags):

SRR5237781.4	121	ENSMUST00000020488	36263	1	100M	=	36263	0	ATCCTTCTGAGGTTAGGGATCTAGGGGAAGTGGGGGATGTGCTCTGAATTAACTTTGTTAGGTAATTTAGAATATAGGAATCTCTAACATTGGACTGAAC	*	NH:i:1
SRR5237781.4	181	ENSMUST00000020488	36263	0	*	=	36263	0	GTAAGGCCAGCAATCCTTCTGAGGTTAGGGATCTAGGGGAAGTGGGGGATGTGCTCTGAATTAACTTTGTTAGGTAATTTAGAATATAGGAATCTCTAAC	*	NH:i:1

This problem does not occur for proper paired hits, because mateIsFwd is correctly set in the if (!(rightTxp < leftTxp)) code block. On the develop-salmon branch, read SRR5237781.4 produces proper paired hits, so this test example does not work (or rather, it does work, because mateIsFwd is correctly set in if (!(rightTxp < leftTxp)) ). Unfortunately I have not yet been able to find as nice a test example for the develop-salmon branch yet, but from reading the code I believe that this might also be a problem on the develop-salmon branch. The fix would be to set the value of mateIsFwd in the final else if statement of mergeLeftRightHits().

problem4_1.fastq.gz
problem4_2.fastq.gz
problem4_3read_1.fastq.gz
problem4_3read_2.fastq.gz

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions