-
Notifications
You must be signed in to change notification settings - Fork 22
Implementation of soft-clipping and CIGAR computation #25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
I looked at the four commits and all the changes look great! I tested the soft-clipping with a couple of small examples and I found a case with the For these targets, when running the command Also in the case of running without the I have some fixes which I can commit, but we can also have a discussion before going forward. |
1868d67 to
d7f38a9
Compare
src/PufferfishAligner.cpp
Outdated
| // aconf.allowOverhangSoftclip = mopts->allowOverhangSoftclip; | ||
| // aconf.allowSoftclip = mopts->allowSoftclip; | ||
| aconf.computeCIGAR = (mopts->computeCIGAR and !mopts->noOutput); | ||
| aconf.endBonus = mopts->endBonus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to have endBonus in the aconf struct anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed it.
src/PufferfishAligner.cpp
Outdated
| // aconf.allowOverhangSoftclip = mopts->allowOverhangSoftclip; | ||
| // aconf.allowSoftclip = mopts->allowSoftclip; | ||
| aconf.computeCIGAR = (mopts->computeCIGAR and !mopts->noOutput); | ||
| aconf.endBonus = mopts->endBonus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we might need to have endBonus in the aconf struct anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed it from here as well.
|
Based on my review and the conversations we had about the changes before, everything looks good to me. |
…d overhang cases happening together
85c702c to
cf54093
Compare
Here is a short summary of the changes:
end_bonusvalue from 10 to 5KSW_EZ_EXTZ_ONLYflag for KSW2 extension alignments (otherwise it always backtrack from the end of both sequences)CIGARGenerator's variable type to enable computation of CIGAR for cases with overlapping MEMs (uint32_ttoint32_t)--allowSoftclipand--allowOverhangSoftclipwith--computeCIGAR--debug; if not used, no debug information will be printedWith these changes I compared the output of PuffAligner with Minimap2 for single-end reads. This comparison showed that generally, the alignment is computed correctly in most of the cases.
The next essential step: In some cases the alignment is wrong. The reason is that the extension is stopped (ez->stopped==1) and therefore, the score for reaching the end of query is not calculated properly. As a result, it might be the case that score for reaching end of query is incorrectly higher than the maximum scoring prefix alignment. Here is an example:
It seems that extension stopping was not written with soft-clipping and computation of CIGAR strings in mind, so this requires a fix.
I will check if the original KSW2 extension fixes this issue and if it does, we can discuss about whether it's worth to have the stopping feature or not.
So next items:
end_bonusvalue