Handle duplicate genes #71

@JakeSAI

Description

Both the large and medium datasets on /allen contain duplicate gene names whose values differ. Assuming these samples aren't outliers, we need a process for de-duplicating gene names in the pipeline: the duplicate IDs are not only confusing for users but also cause projection to crash.

Tasks:

  • Check if scanpy has any preprocessing data validation methods that may apply here
  • Decide with the team whether to deduplicate gene names or reject datasets that contain duplicates
  • Implement solution
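
To make the two options in the tasks above concrete, here is a minimal sketch in plain Python. It is not the pipeline's actual code: the function names (`find_duplicates`, `make_unique`, `reject_duplicates`) and the example gene names are hypothetical. The renaming option follows the same idea as AnnData's `var_names_make_unique()`, which as far as I know keeps the first occurrence and appends a numeric suffix to later ones, so that built-in may already cover the first task. Note that renaming keeps every duplicate row with its own values; whether those rows should instead be merged or dropped is still a team decision.

```python
from collections import Counter


def find_duplicates(names):
    """Return {name: count} for every gene name that appears more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def make_unique(names, sep="-"):
    """Option A (deduplicate): keep the first occurrence of each name and
    suffix later occurrences with sep + counter, skipping suffixed names
    that already exist in the input."""
    taken = set(names)   # every name that must not be reused
    seen = set()         # names already emitted once
    next_suffix = {}     # next counter to try for each duplicated name
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        k = next_suffix.get(name, 1)
        candidate = f"{name}{sep}{k}"
        while candidate in taken:
            k += 1
            candidate = f"{name}{sep}{k}"
        taken.add(candidate)
        next_suffix[name] = k + 1
        result.append(candidate)
    return result


def reject_duplicates(names):
    """Option B (reject): fail early with a readable error instead of
    letting a later step crash."""
    dupes = find_duplicates(names)
    if dupes:
        raise ValueError(f"duplicate gene names found: {sorted(dupes)}")
    return list(names)


# Hypothetical example input, not taken from the /allen datasets.
genes = ["Gapdh", "Actb", "Gapdh", "Gapdh"]
unique_genes = make_unique(genes)
```

Either function could run as a validation step right after loading, before projection; option B at least turns the crash into an explicit error message.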

Validation:

  • Run the pipeline on a dataset with duplicate genes and confirm that the duplicates are handled without manual intervention
