[BUG] I pulled the Docker image, but running it produces errors that suggest the image does not support AMD GPUs. #68

@sunpian1

susie.sun@yz-amd1:~$ docker run -it rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed /bin/bash
root@c50e90963e1a:/var/lib/jenkins# deepspeed --num_gpus 1 deploy.py
[2023-12-14 01:52:04,385] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 01:52:05,180] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/bin/deepspeed", line 6, in <module>
    main()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 422, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available

Our GPU is an AMD Radeon™ RX 7900 XTX.
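
The traceback shows only that the DeepSpeed launcher found no usable GPU. It does not say why. One thing that may be worth ruling out, as an assumption not confirmed anywhere in this report, is whether the ROCm device nodes `/dev/kfd` and `/dev/dri` are visible inside the container, since the `docker run` command above passes no `--device` flags. Below is a minimal sketch of such a check. The function name `rocm_device_nodes` and its `dev_root` parameter are hypothetical, chosen only for illustration.

```python
import os


def rocm_device_nodes(dev_root="/dev"):
    """Report whether the device nodes ROCm relies on exist under dev_root.

    Hypothetical helper for illustration; it only checks for the presence
    of the paths and does not verify permissions or driver state.
    """
    kfd = os.path.join(dev_root, "kfd")  # ROCm compute interface node
    dri = os.path.join(dev_root, "dri")  # directory holding render nodes
    return {
        "kfd": os.path.exists(kfd),
        "dri": os.path.isdir(dri),
    }


if __name__ == "__main__":
    status = rocm_device_nodes()
    for name, present in status.items():
        print(f"/dev/{name}: {'present' if present else 'missing'}")
    if not all(status.values()):
        print("GPU device nodes are not visible in this environment.")
```

If either node is reported missing inside the container, restarting it with `--device=/dev/kfd --device=/dev/dri` added to `docker run` would be a reasonable next step. If both are present, the cause lies elsewhere and this check does not identify it.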

Labels: bug