How to run SEACells efficiently on large-scale dataset #70

@koh2ng0

Hi,

First of all, thank you for developing this excellent package. I have tried running SEACells on our large-scale dataset (~270K cells). While it performed well, it was too slow: model training took almost 3 days and 3 hours for 50 iterations.
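
To put that run time in perspective, here is a rough per-iteration estimate. It assumes, as a simplification, that the entire run time was spent in the 50 iterations, ignoring kernel construction and initialization, so the real per-iteration cost is probably somewhat lower.

```python
from datetime import timedelta

# Figures reported above
total = timedelta(days=3, hours=3)  # total wall-clock time of the run
n_iterations = 50                   # number of training iterations

# Assumption: all of the time was spent inside the iterations
hours_total = total.total_seconds() / 3600
hours_per_iteration = hours_total / n_iterations

print(f"{hours_total:.0f} h total, {hours_per_iteration:.1f} h per iteration")
```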

I tried two approaches: using GPU and CPU.

  1. with GPU
    I attempted to run SEACells on the GPU with the following call:
model = SEACells.core.SEACells(adata,
                               build_kernel_on=build_kernel_on,
                               n_SEACells=n_SEACells,
                               n_waypoint_eigs=n_waypoint_eigs,
                               convergence_epsilon=1e-5,
                               use_gpu=True)

However, I encountered the following error:

"OutOfMemoryError: Out of memory allocating 6,121,777,152 bytes (allocated so far: 32,323,490,304 bytes)."
We have 3 GPUs, each with 32768 MiB of memory. I believed this would be sufficient, so I'm not sure why this error occurred.
Could you advise on how to resolve this issue? Additionally, is it possible to use more than one GPU for this process?
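
For reference, the numbers in the error message can be checked against the capacity of one GPU. This is only a sketch under the assumption that the run used a single device; it does not explain where the memory goes inside SEACells.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Figures taken from the error message and the hardware description above
gpu_capacity = 32768 * MIB   # memory of one GPU, in bytes
allocated = 32_323_490_304   # "allocated so far" at the time of the error
requested = 6_121_777_152    # size of the allocation that failed

# Assumption: everything was allocated on a single GPU
needed = allocated + requested
shortfall = needed - gpu_capacity

print(f"capacity of one GPU: {gpu_capacity / GIB:.2f} GiB")
print(f"already allocated  : {allocated / GIB:.2f} GiB")
print(f"failed request     : {requested / GIB:.2f} GiB")
print(f"total needed       : {needed / GIB:.2f} GiB")
print(f"fits on one GPU    : {needed <= gpu_capacity}")
```

If that assumption holds, the failed request would have pushed usage roughly 3.8 GiB past the capacity of one card, which would mean the other two GPUs were not being used.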

  2. with CPU
    This works, but it takes far too long.
model = SEACells.core.SEACells(adata,
                               build_kernel_on='X_scVI',
                               n_SEACells=n_SEACells,
                               n_waypoint_eigs=n_waypoint_eigs,
                               convergence_epsilon=1e-5,
                               use_sparse=True)

Could you recommend solutions to improve the time and memory efficiency for running SEACells on large-scale datasets?

Thank you for your assistance.
