Conversation

@schafferde
I observed that the i-step connectivity matrices in diffusion_nn are actually fairly dense for i > 1, so the use of sparse matrices and operations contributes to the memory and time bottleneck. (diffusion_nn, in turn, accounts for most of the time and memory consumed by kBET.) I therefore introduced some switches to dense matrices and operations. Of course, there are approximations and subsampling strategies that also reduce memory and runtime, but it is valuable to be able to run the full method, both for reproducibility and in pipelines that call scib directly (e.g., OpenProblems). In those cases, these changes make kBET much more runnable; a minimal sketch of the idea follows the numbers below:

For a representative embedding of HypoMap (~220k cells of type neuron), peak memory usage dropped from 1583 GB to 541 GB, and overall runtime was reduced by a factor of ~8. For another HypoMap embedding, peak memory dropped from more than 2 TB (the limit of the machine, so the original code could not complete) to 513 GB; I was therefore only able to run kBET, and replicate that portion of the pipeline, with these modifications. Since kBET has by far the highest memory usage of any metric, this improves execution of the whole pipeline.
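
For illustration, here is a minimal sketch of the idea, not the code in this PR: it computes i-step connectivity products of a kNN graph and switches from sparse to dense storage once the intermediate result densifies. The function name and the `density_threshold` knob are hypothetical; the actual changes add their own switches inside scib's `diffusion_nn`.

```python
import numpy as np
import scipy.sparse as sp


def i_step_connectivity(conn, n_steps, density_threshold=0.1):
    """Compute the n_steps-step connectivity matrix of a kNN graph.

    Hypothetical sketch: starts with sparse products and converts both
    operands to dense arrays once the intermediate result crosses
    ``density_threshold``, since repeated products of a connectivity
    matrix fill in quickly.
    """
    mat = conn.copy()
    for _ in range(n_steps - 1):
        if sp.issparse(mat):
            density = mat.nnz / (mat.shape[0] * mat.shape[1])
            if density > density_threshold:
                # The intermediate matrix is no longer meaningfully sparse:
                # convert so the remaining products use dense BLAS instead
                # of sparse-sparse multiplication with heavy fill-in.
                mat = mat.toarray()
                conn = conn.toarray()
        mat = mat @ conn
    return mat
```

With a typical kNN connectivity matrix (e.g. `adata.obsp["connectivities"]` from scanpy), the density of the i-step product grows rapidly with i, which matches the observation above that the matrices are fairly dense for i > 1, so the dense path wins early.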
