Open
Description
In-house data is stored in a variant of the 10x Genomics HDF5 format, while we only support AnnData.
A basic code snippet that performs the conversion is available here:
http://confluence.corp.alleninstitute.org/display/IT/Transcriptomic+Clustering+Pipelines+with+Test+Datasets
We need a robust, easily accessible utility for converting 10x files to AnnData with the matrix stored in CSR format.
- Add conversion script utility
- Handle exceptions for unexpected or missing dataset names.
- Handle a possible MemoryError when loading a large dataset.
- Check the maximum expression value; if it fits, store counts as uint16.
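
The steps above could be sketched roughly as follows. This is only an illustration, not the snippet from the Confluence page: it assumes the standard Cell Ranger v3 layout (a `matrix` group holding `data`, `indices`, `indptr`, `shape` and `barcodes`, with the matrix stored genes x cells in CSC order), which the in-house variant may not follow. All function names are hypothetical, and gene names are left out for brevity.

```python
import numpy as np
from scipy import sparse

# Assumed dataset names (Cell Ranger v3 layout); the in-house files may differ.
REQUIRED = ("data", "indices", "indptr", "shape", "barcodes")


def smallest_count_dtype(max_value):
    """Return uint16 when every count fits, otherwise fall back to uint32."""
    if max_value <= np.iinfo(np.uint16).max:
        return np.uint16
    return np.uint32


def check_datasets(names, required=REQUIRED):
    """Raise a descriptive KeyError if any expected dataset is missing."""
    missing = [name for name in required if name not in names]
    if missing:
        raise KeyError(f"missing datasets in 10x group: {missing}")


def to_cells_by_genes_csr(data, indices, indptr, shape):
    """Build a cells x genes CSR matrix from 10x arrays (genes x cells CSC)."""
    n_genes, n_cells = shape
    dtype = smallest_count_dtype(data.max() if data.size else 0)
    # The CSC arrays of a genes x cells matrix are the CSR arrays of its transpose.
    return sparse.csr_matrix(
        (data.astype(dtype), indices, indptr), shape=(n_cells, n_genes)
    )


def convert_10x_to_anndata(in_path, out_path, group="matrix"):
    """Read a 10x HDF5 file and write an AnnData .h5ad file (hypothetical helper)."""
    import h5py  # imported lazily so the helpers above work without these packages
    import anndata as ad

    with h5py.File(in_path, "r") as f:
        if group not in f:
            raise KeyError(f"group {group!r} not found in {in_path}")
        g = f[group]
        check_datasets(g.keys())
        try:
            X = to_cells_by_genes_csr(
                g["data"][:], g["indices"][:], g["indptr"][:], tuple(g["shape"][:])
            )
        except MemoryError as err:
            raise MemoryError(f"{in_path} is too large to load at once") from err
        # Assumes barcodes are stored as fixed-length byte strings.
        barcodes = g["barcodes"][:].astype(str)
    adata = ad.AnnData(X=X)
    adata.obs_names = barcodes
    adata.write_h5ad(out_path)
```

Reusing the CSC arrays directly as CSR arrays of the transposed matrix avoids a separate conversion pass, which matters for large inputs. Re-raising `MemoryError` with the file name only makes the failure easier to diagnose; a chunked reader would be needed to actually avoid it.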
Validation:
- Conversion module in utils.py
- A command-line executable that reports progress, expected conversion time, actual time, and memory usage.
- Show that both files contain datasets of the same size, using h5ls or h5dump.
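
One possible shape for the command-line wrapper and the size check is sketched below. It is a rough outline under several assumptions: `convert` stands for whatever conversion function ends up in utils.py (taking an input and an output path), `run_with_report`, `dataset_shapes` and `main` are hypothetical names, and progress reporting and expected-time estimation are not covered. `tracemalloc` only sees allocations routed through Python's memory hooks, so the reported peak should be read as a lower bound.

```python
import argparse
import time
import tracemalloc


def run_with_report(func, *args):
    """Call func(*args); return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def dataset_shapes(path):
    """Map each dataset path in an HDF5 file to its shape, similar to `h5ls -r`."""
    import h5py  # imported lazily; only needed for this check

    shapes = {}

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return shapes


def main(convert, argv=None):
    """Parse the two paths, run the converter, and print time and memory."""
    parser = argparse.ArgumentParser(description="Convert 10x HDF5 to AnnData.")
    parser.add_argument("input", help="path to the 10x HDF5 file")
    parser.add_argument("output", help="path of the .h5ad file to write")
    args = parser.parse_args(argv)

    _, elapsed, peak = run_with_report(convert, args.input, args.output)
    print(f"converted {args.input} -> {args.output}")
    print(f"elapsed: {elapsed:.2f} s, peak traced memory: {peak / 1e6:.1f} MB")
    return elapsed, peak
```

Comparing `dataset_shapes(input)` with `dataset_shapes(output)` gives the same information as inspecting both files with h5ls, in a form that can be asserted in a test. The dataset paths will differ between the two formats, so the comparison has to match them up explicitly (for example the 10x `data` array against the AnnData `X/data` array).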