Add utility to convert in-house 10x data to AnnData format #31

@sgratiy

Description

In-house data is stored in a variant of the 10x Genomics HDF5 format, while we only support AnnData.

A basic code snippet that does the conversion is available here:
http://confluence.corp.alleninstitute.org/display/IT/Transcriptomic+Clustering+Pipelines+with+Test+Datasets

We need a robust, easily available utility for converting 10x data to AnnData, with the expression matrix stored in CSR format.

  • Add a conversion script utility.
  • Handle exceptions for unexpected or missing dataset names.
  • Handle a possible MemoryError when loading a large dataset.
  • Check the maximum expression value and, if it fits, store counts as uint16.
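
The requirements above could be sketched roughly as follows. This is a minimal sketch, not the snippet from the Confluence page: it assumes a Cell Ranger v3-style layout (a top-level `matrix` group holding `data`, `indices`, `indptr`, `shape`, and `barcodes`), which the in-house variant may not follow exactly. The function and constant names (`convert_10x_to_anndata`, `missing_datasets`, `count_dtype`, `REQUIRED`) are hypothetical.

```python
from typing import Iterable, List

# ASSUMPTION: dataset names inside a top-level "matrix" group
# (Cell Ranger v3-style). The in-house files may use other names.
REQUIRED = ("data", "indices", "indptr", "shape", "barcodes")
UINT16_MAX = 2**16 - 1


def missing_datasets(available: Iterable[str],
                     required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required dataset names that are absent, in order."""
    present = set(available)
    return [name for name in required if name not in present]


def count_dtype(max_value: int, fallback: str = "uint32") -> str:
    """Return "uint16" when max_value fits, otherwise the fallback dtype."""
    return "uint16" if 0 <= max_value <= UINT16_MAX else fallback


def convert_10x_to_anndata(src: str, dst: str) -> None:
    """Read a 10x-style HDF5 file and write an .h5ad file with a CSR matrix."""
    # Third-party imports are local so the helpers above stay dependency-free.
    import anndata as ad
    import h5py
    import pandas as pd
    from scipy.sparse import csr_matrix

    with h5py.File(src, "r") as f:
        if "matrix" not in f:
            raise KeyError(f"{src}: no 'matrix' group; found {list(f.keys())}")
        grp = f["matrix"]
        missing = missing_datasets(grp.keys())
        if missing:
            raise KeyError(f"{src}: missing datasets under 'matrix': {missing}")
        try:
            data = grp["data"][:]
            indices = grp["indices"][:]
            indptr = grp["indptr"][:]
        except MemoryError as err:
            raise MemoryError(f"{src} is too large to load at once") from err
        # 10x stores the matrix as genes x cells in CSC order.
        n_genes, n_cells = (int(n) for n in grp["shape"][:])
        barcodes = grp["barcodes"][:].astype(str)

    # Downcast counts to uint16 when the largest value allows it.
    max_count = int(data.max()) if data.size else 0
    data = data.astype(count_dtype(max_count, fallback=str(data.dtype)))

    # The same three arrays, read as CSR, describe the transposed
    # (cells x genes) matrix, which is the orientation AnnData expects.
    X = csr_matrix((data, indices, indptr), shape=(n_cells, n_genes))
    adata = ad.AnnData(X=X, obs=pd.DataFrame(index=barcodes))
    adata.write_h5ad(dst)
```

Gene identifiers are left out because their location in the in-house files is not stated here. Some older anndata releases cast `X` to float32 on construction, so it is worth checking `adata.X.dtype` after the call. A file that raises MemoryError would need a chunked read, which this sketch does not attempt.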

Validation:

  • Conversion module in utils.py
  • A command-line executable that reports progress, expected conversion time, actual time, and memory usage.
  • Show that both files contain the same dataset sizes, using h5ls or h5dump.
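
One possible shape for the command-line executable is sketched below, using only the standard library. The names `run_with_report` and `main` are hypothetical, and `convert` stands in for whatever conversion function ends up in utils.py. The sketch covers actual time and memory usage only; estimating the expected conversion time is left open.

```python
import argparse
import time
import tracemalloc
from typing import Callable, Optional, Sequence, Tuple


def run_with_report(task: Callable[[], object]) -> Tuple[float, int]:
    """Run task() and return (elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        task()
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return elapsed, peak


def main(convert: Callable[[str, str], None],
         argv: Optional[Sequence[str]] = None) -> int:
    """Parse src/dst arguments, run the conversion, and print a short report."""
    parser = argparse.ArgumentParser(
        description="Convert a 10x HDF5 file to AnnData.")
    parser.add_argument("src", help="input 10x-formatted HDF5 file")
    parser.add_argument("dst", help="output .h5ad file")
    args = parser.parse_args(argv)

    print(f"Converting {args.src} -> {args.dst} ...")
    elapsed, peak = run_with_report(lambda: convert(args.src, args.dst))
    print(f"Done in {elapsed:.2f} s, peak traced memory {peak / 2**20:.1f} MiB")
    return 0
```

`tracemalloc` only sees allocations routed through Python's memory hooks, so buffers allocated inside native libraries may be missed and the reported peak is best read as a lower bound.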
