
Incorrect placement of ORF9b amino acid substitution adjacent to a deletion (lineage XFC.1) #1670

@LucvZon

Nextclade appears to misplace an amino acid (AA) substitution when it occurs immediately before a deletion.

In our analysis of a SARS-CoV-2 sequence (lineage XFC.1), we observed a nucleotide substitution G28360T which should cause an M25I AA change in ORF9b. This substitution is immediately followed by a 9-base deletion from 28361-28370.

Instead of reporting M25I, Nextclade reports a deletion at AA position 25 and incorrectly moves the resulting isoleucine (I) to AA position 28, which is part of the deleted region.

Here is the relevant snippet from aaChangesGroups in nextclade.json:

[
  {
    "cdsName": "ORF9b",
    "pos": 25,
    "refAa": "M",
    "qryAa": "-", // BUG: Incorrectly reported as a deletion
    "nucPos": 28358,
    "refTriplet": "ATG",
    "qryTriplet": "ATT" // CONTRADICTION: Query triplet is ATT (I), not a deletion
  },
  {
    "cdsName": "ORF9b",
    "pos": 26,
    "refAa": "E",
    "qryAa": "-",
    "nucPos": 28361,
    "refTriplet": "GAG",
    "qryTriplet": "---"
  },
  {
    "cdsName": "ORF9b",
    "pos": 27,
    "refAa": "N",
    "qryAa": "-",
    "nucPos": 28364,
    "refTriplet": "AAC",
    "qryTriplet": "---"
  },
  {
    "cdsName": "ORF9b",
    "pos": 28,
    "refAa": "A",
    "qryAa": "I", // BUG: The 'I' from pos 25 is misplaced here
    "nucPos": 28367,
    "refTriplet": "GCA",
    "qryTriplet": "---" // CONTRADICTION: Query triplet is a deletion, not 'I'
  }
]
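The two contradictions flagged in the snippet can also be detected mechanically. Below is a minimal sketch, not Nextclade's own logic: it translates each qryTriplet with the standard genetic code and compares the result with the reported qryAa. The helper names (translate_triplet, find_contradictions) are made up for illustration, and the records are copied by hand from the snippet above as a flat list; the real nesting inside nextclade.json is not assumed here.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_triplet(triplet):
    """Translate one codon; '---' maps to '-', anything unrecognised to 'X'."""
    if triplet == "---":
        return "-"
    return CODON_TABLE.get(triplet.upper(), "X")


def find_contradictions(changes):
    """Return records whose qryAa disagrees with the translated qryTriplet."""
    return [c for c in changes if translate_triplet(c["qryTriplet"]) != c["qryAa"]]


# Records copied from the snippet in this issue (comments removed).
changes = [
    {"cdsName": "ORF9b", "pos": 25, "refAa": "M", "qryAa": "-",
     "nucPos": 28358, "refTriplet": "ATG", "qryTriplet": "ATT"},
    {"cdsName": "ORF9b", "pos": 26, "refAa": "E", "qryAa": "-",
     "nucPos": 28361, "refTriplet": "GAG", "qryTriplet": "---"},
    {"cdsName": "ORF9b", "pos": 27, "refAa": "N", "qryAa": "-",
     "nucPos": 28364, "refTriplet": "AAC", "qryTriplet": "---"},
    {"cdsName": "ORF9b", "pos": 28, "refAa": "A", "qryAa": "I",
     "nucPos": 28367, "refTriplet": "GCA", "qryTriplet": "---"},
]

for record in find_contradictions(changes):
    expected = translate_triplet(record["qryTriplet"])
    print(f'{record["cdsName"]} pos {record["pos"]}: '
          f'qryAa={record["qryAa"]!r} but {record["qryTriplet"]} translates to {expected!r}')
```

On these four records the check flags positions 25 and 28, the same two entries marked as contradictory above.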

The nucleotide G28360T mutation can be seen in the nextclade.aligned.fasta result:

[Image: screenshot of the alignment in nextclade.aligned.fasta]

Steps to Reproduce

  1. Analyze a SARS-CoV-2 sequence that has the G28360T substitution and the 28361-28370 deletion. This issue has been reproduced with the following public GISAID accessions from lineage XFC.1:
  • EPI_ISL_20058200
  • EPI_ISL_20066338
  2. Use Nextclade v3.16.0 with nextstrain/sars-cov-2/wuhan-hu-1/orfs.
  3. Inspect the aaChangesGroups array for the CDS ORF9b in the nextclade.json output.
  4. Observe the nucleotide changes in the nextclade.aligned.fasta to confirm the G->T at ref position 28360 and the deletion at 28361.
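For the inspection step, a small script can pull out the ORF9b records without depending on where exactly they sit in the file. This is a rough sketch under stated assumptions: iter_change_records is a hypothetical helper, and it only assumes that each AA change is a JSON object carrying the cdsName, pos and qryTriplet fields shown in the snippet above. The toy document below has an invented layout purely for demonstration; it is not the real nextclade.json schema.

```python
import json


def iter_change_records(node, cds_name):
    """Recursively yield dicts that carry 'qryTriplet' and match cds_name."""
    if isinstance(node, dict):
        if node.get("cdsName") == cds_name and "qryTriplet" in node:
            yield node
        for value in node.values():
            yield from iter_change_records(value, cds_name)
    elif isinstance(node, list):
        for item in node:
            yield from iter_change_records(item, cds_name)


# Toy input with an invented nesting, used only to exercise the walker.
toy_document = json.loads("""
{
  "results": [
    {
      "seqName": "toy",
      "aaChangesGroups": [
        {"changes": [
          {"cdsName": "ORF9b", "pos": 28, "qryAa": "I", "qryTriplet": "---"},
          {"cdsName": "ORF9b", "pos": 25, "qryAa": "-", "qryTriplet": "ATT"},
          {"cdsName": "N", "pos": 30, "qryAa": "-", "qryTriplet": "---"}
        ]}
      ]
    }
  ]
}
""")

records = sorted(iter_change_records(toy_document, "ORF9b"), key=lambda r: r["pos"])
for record in records:
    print(record["pos"], record["qryAa"], record["qryTriplet"])
```

To run it on the real output, replace the toy document with the result of json.load on the attached nextclade.json; because the walker searches every level, it should not matter how the groups are nested.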

I've also attached our input sequence file (change extension to .fasta) and the full nextclade.json result to help with debugging.

input_sequence.txt

nextclade.json
