Use dense matrices in diffusion_nn #428
Open
I observed that the i-step connectivity matrices in diffusion_nn are actually fairly dense for i > 1, so the use of sparse matrices and operations contributes to the memory and time bottleneck. (And diffusion_nn accounts for most of the time and memory consumed by kBET.) I therefore introduced switches from sparse to dense matrices and operations once the matrices densify. Of course, there are approximations and subsampling strategies that also reduce memory and runtime, but it is nice to be able to run the full method for reproducibility and/or in pipelines that call scib specifically (e.g., OpenProblems). In those cases, these changes make kBET much more runnable.
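The idea, roughly: keep the graph sparse while it is sparse, and switch to dense NumPy operations once the accumulated powers densify. Below is a minimal sketch, assuming a scipy.sparse connectivity matrix `T`; the function name and the `densify_at` threshold are hypothetical illustrations, not the exact code in this PR or scib's API.

```python
import numpy as np
import scipy.sparse as sp


def connectivity_power_sum(T, n_iter, densify_at=0.3):
    """Accumulate T + T**2 + ... + T**n_iter, switching from sparse to
    dense operations once the running power passes a density threshold.

    ``densify_at`` is a hypothetical tuning knob for this sketch, not a
    parameter of scib's diffusion_nn.
    """
    M = T.copy()      # running power T**i, sparse at first
    acc = T.copy()    # running sum of powers
    T_dense = None
    for _ in range(2, n_iter + 1):
        if T_dense is None and M.nnz / (M.shape[0] * M.shape[1]) > densify_at:
            # The i-step connectivities are fairly dense for i > 1, so
            # dense BLAS matmuls beat sparse-sparse products from here on.
            M = M.toarray()
            acc = acc.toarray()
            T_dense = T.toarray()
        if T_dense is not None:
            M = M @ T_dense
            acc += M
        else:
            M = M @ T
            acc = acc + M
    return acc


# Example on a random sparse graph (stand-in for a kNN connectivity matrix):
A = sp.random(1000, 1000, density=0.01, format="csr")
C = connectivity_power_sum(A, n_iter=5)
```

The threshold trades a one-time densification cost against faster dense matmuls; since the matrices are already fairly dense for i > 1, the switch pays off almost immediately.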
For a representative embedding of hypomap (with ~220k cells of type neuron), peak memory usage was reduced from 1583 GB to 541 GB, and overall runtime was reduced by a factor of ~8. On another hypomap embedding, peak memory was reduced to 513 GB from more than 2 TB, the limit of the machine; that run could not finish at all before, so I was only able to run kBET and replicate that portion of the pipeline with these modifications. As kBET has by far the highest memory usage of any metric, this makes running the whole pipeline much more tractable.