Open
Description
Both the large and medium datasets on /allen contain duplicate gene names whose values differ, so the repeated entries are not simply copies of one another. Assuming these samples aren't outliers, we need a process for deduplicating gene names in the pipeline: the duplicate IDs are not only confusing for users but also cause projection to crash.
Tasks:
- Check whether scanpy has any preprocessing or data validation methods that apply here
- Decide with the team whether to deduplicate gene names or reject datasets that contain duplicates
- Implement solution
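To make the two options in the tasks concrete, here is a minimal sketch of what "reject" and "deduplicate by renaming" could look like on a plain list of gene names. It uses only the standard library; the function names and the example gene names are hypothetical and not part of the pipeline. The renaming policy (keep the first occurrence, suffix later ones) is one possible choice. It is worth checking whether `AnnData.var_names_make_unique()` from anndata, the data container scanpy uses, already covers this case before writing anything custom.

```python
from collections import Counter


def find_duplicate_genes(gene_names):
    """Return {gene_name: count} for every name that appears more than once."""
    counts = Counter(gene_names)
    return {name: n for name, n in counts.items() if n > 1}


def reject_duplicates(gene_names):
    """Option A (hypothetical): fail early with a readable error."""
    duplicates = find_duplicate_genes(gene_names)
    if duplicates:
        raise ValueError(f"Duplicate gene names found: {sorted(duplicates)}")


def make_gene_names_unique(gene_names, sep="-"):
    """Option B (hypothetical): keep the first occurrence unchanged and
    append sep + a running index to each later occurrence."""
    used = set(gene_names)  # names already taken, to avoid suffix collisions
    seen = Counter()
    result = []
    for name in gene_names:
        seen[name] += 1
        if seen[name] == 1:
            result.append(name)
            continue
        suffix = seen[name] - 1
        candidate = f"{name}{sep}{suffix}"
        while candidate in used:
            suffix += 1
            candidate = f"{name}{sep}{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


# Made-up gene names, for illustration only.
example = ["GeneA", "GeneB", "GeneA", "GeneA"]
renamed = make_gene_names_unique(example)
```

Renaming keeps every row of values, which matters here because the duplicated names carry different values; rejecting is simpler but pushes the cleanup back to whoever supplies the dataset.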
Validation:
- Run the pipeline on a dataset with duplicate genes and confirm that the duplicates are handled without manual intervention
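The validation step could be backed by an automated check along these lines. This is only a sketch: `validate_gene_names` is a hypothetical helper, and it assumes the chosen solution renames duplicates rather than dropping them, so the number of genes should not change.

```python
from collections import Counter


def validate_gene_names(original_names, processed_names):
    """Return a list of problems found after duplicate handling (empty if OK).

    Assumes a rename-based solution: every input gene should still be
    present in the output, under a unique name.
    """
    problems = []
    if len(processed_names) != len(original_names):
        problems.append(
            f"gene count changed: {len(original_names)} -> {len(processed_names)}"
        )
    remaining = sorted(n for n, c in Counter(processed_names).items() if c > 1)
    if remaining:
        problems.append(f"duplicate gene names remain: {remaining}")
    return problems


# Made-up names, for illustration only.
ok = validate_gene_names(["GeneA", "GeneA"], ["GeneA", "GeneA-1"])
bad = validate_gene_names(["GeneA", "GeneA"], ["GeneA", "GeneA"])
```

A check like this could run at the end of preprocessing so that a dataset with unhandled duplicates fails before it reaches projection.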