Description
Nextclade appears to misplace an amino acid (AA) substitution when it occurs immediately before a deletion.
In our analysis of a SARS-CoV-2 sequence (lineage XFC.1), we observed a nucleotide substitution G28360T which should cause an M25I AA change in ORF9b. This substitution is immediately followed by a 9-base deletion from 28361-28370.
Instead of reporting M25I, Nextclade reports a deletion at AA position 25 and incorrectly moves the resulting Isoleucine (I) to AA position 28, which is part of the deleted region.
Here is the relevant snippet from aaChangesGroups in nextclade.json:
[
{
"cdsName": "ORF9b",
"pos": 25,
"refAa": "M",
"qryAa": "-", // BUG: Incorrectly reported as a deletion
"nucPos": 28358,
"refTriplet": "ATG",
"qryTriplet": "ATT" // CONTRADICTION: Query triplet is ATT (I), not a deletion
},
{
"cdsName": "ORF9b",
"pos": 26,
"refAa": "E",
"qryAa": "-",
"nucPos": 28361,
"refTriplet": "GAG",
"qryTriplet": "---"
},
{
"cdsName": "ORF9b",
"pos": 27,
"refAa": "N",
"qryAa": "-",
"nucPos": 28364,
"refTriplet": "AAC",
"qryTriplet": "---"
},
{
"cdsName": "ORF9b",
"pos": 28,
"refAa": "A",
"qryAa": "I", // BUG: The 'I' from pos 25 is misplaced here
"nucPos": 28367,
"refTriplet": "GCA",
"qryTriplet": "---" // CONTRADICTION: Query triplet is a deletion, not 'I'
}
]
The G28360T nucleotide substitution can be seen in the nextclade.aligned.fasta result.
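The contradictions flagged in the snippet above can also be detected mechanically. The following is a minimal sketch, not part of Nextclade: it translates each reported triplet with the standard genetic code and compares the result with the reported amino acid. The helper names `translate_triplet` and `find_inconsistent` are hypothetical, and the sketch assumes entries shaped like the ones shown above, with an all-gap triplet expected to correspond to `-`.

```python
# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_triplet(triplet):
    """Translate one codon; '---' maps to '-', anything ambiguous to 'X'."""
    triplet = triplet.upper()
    if triplet == "---":
        return "-"
    return CODON_TABLE.get(triplet, "X")


def find_inconsistent(changes):
    """Return (pos, field, reported, expected) for every AA that does not
    match the translation of its own triplet (hypothetical helper)."""
    problems = []
    for change in changes:
        for aa_key, triplet_key in (("refAa", "refTriplet"), ("qryAa", "qryTriplet")):
            expected = translate_triplet(change[triplet_key])
            if change[aa_key] != expected:
                problems.append((change["pos"], aa_key, change[aa_key], expected))
    return problems


# The four ORF9b entries from the report, without the inline comments.
orf9b_changes = [
    {"cdsName": "ORF9b", "pos": 25, "refAa": "M", "qryAa": "-",
     "nucPos": 28358, "refTriplet": "ATG", "qryTriplet": "ATT"},
    {"cdsName": "ORF9b", "pos": 26, "refAa": "E", "qryAa": "-",
     "nucPos": 28361, "refTriplet": "GAG", "qryTriplet": "---"},
    {"cdsName": "ORF9b", "pos": 27, "refAa": "N", "qryAa": "-",
     "nucPos": 28364, "refTriplet": "AAC", "qryTriplet": "---"},
    {"cdsName": "ORF9b", "pos": 28, "refAa": "A", "qryAa": "I",
     "nucPos": 28367, "refTriplet": "GCA", "qryTriplet": "---"},
]

for pos, field, reported, expected in find_inconsistent(orf9b_changes):
    print(f"pos {pos}: {field} is {reported!r} but the triplet translates to {expected!r}")
```

On these four entries the check flags positions 25 and 28 and leaves 26 and 27 alone, which matches the two `BUG` annotations above.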
Steps to Reproduce
- Analyze a SARS-CoV-2 sequence that has the G28360T substitution and the 28361-28370 deletion. This issue has been reproduced with the following public GISAID accessions from lineage XFC.1:
  - EPI_ISL_20058200
  - EPI_ISL_20066338
- Use Nextclade v3.16.0 with nextstrain/sars-cov-2/wuhan-hu-1/orfs.
- Inspect the aaChangesGroups array for the CDS ORF9b in the nextclade.json output.
- Observe the nucleotide changes in the nextclade.aligned.fasta to confirm the G->T at ref position 28360 and the deletion at 28361.
I've also attached our input sequence file (change extension to .fasta) and the full nextclade.json result to help with debugging.
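The nucleotide-level check in the last step can be sketched as a direct comparison of the reference with the aligned query. This is an illustrative helper, not Nextclade code: `diff_against_reference` is a hypothetical name, and the sketch assumes the aligned sequence has the same length as the reference (insertions stripped), so that string index plus one is the reference position. The example uses a short toy sequence with toy coordinates, not the real genome.

```python
def diff_against_reference(ref, aligned_query):
    """Return (substitutions, deletions) in 1-based reference coordinates.

    Substitutions are strings such as 'G3T'; deletions are inclusive
    (start, end) ranges. Assumes both sequences have equal length.
    """
    if len(ref) != len(aligned_query):
        raise ValueError("reference and aligned query must have equal length")

    substitutions = []
    deletions = []
    del_start = None  # start of the deletion run currently being scanned

    for pos, (r, q) in enumerate(zip(ref.upper(), aligned_query.upper()), start=1):
        if q == "-":
            if del_start is None:
                del_start = pos
            continue
        if del_start is not None:
            # A non-gap character closes the open deletion run.
            deletions.append((del_start, pos - 1))
            del_start = None
        if q != r and q != "N":
            substitutions.append(f"{r}{pos}{q}")

    if del_start is not None:
        deletions.append((del_start, len(ref)))
    return substitutions, deletions


# Toy example: a G->T substitution immediately followed by a 9-base deletion.
toy_ref = "ATGGAGAACGCAT"
toy_qry = "ATT---------T"
subs, dels = diff_against_reference(toy_ref, toy_qry)
print(subs, dels)
```

Run on the full-length reference and the matching record from nextclade.aligned.fasta, the same function would report positions in genome coordinates, provided the equal-length assumption holds.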