About deepspeed and fsdp speed differences? #24
Comments
The AttributeError may be caused by a ninja compilation error, e.g. a missing ninja builder (which can be solved by installing ninja). As for the speed difference, we haven't checked the FSDP speed yet. It is indeed surprising that finetuning on a medical dataset can be so efficient under our framework. In fact, instruction finetuning can be even faster: only 1h for llama 30b lora + alpaca on 8*A100 🤗 |
The detailed error message is here:
Detected CUDA files, patching ldflags
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
I do not have root permission. After running apt-get install ninja-build, the terminal output is:
And ds_report is here:
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
|
It seems the gcc/g++ compiler is missing here. For users with root permissions, you may try
If you don't have the root permission, you may
Then add the binary folder to your PATH environment variable. |
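As a quick way to see which build tools are actually visible before retrying, here is a minimal sketch. The tool list (`ninja`, `gcc`, `g++`, `nvcc`) is an assumption based on the errors discussed in this thread, not an official DeepSpeed requirement list, and the helper names are hypothetical.

```python
import shutil

# Tools the JIT build of DeepSpeed ops appears to need in this thread.
# This list is an assumption for illustration, not taken from DeepSpeed docs.
REQUIRED_TOOLS = ("ninja", "gcc", "g++", "nvcc")


def find_build_tools(tools=REQUIRED_TOOLS, path=None):
    """Map each tool name to its resolved location, or None if not found.

    `path` is an optional PATH-style string; None means the current PATH.
    """
    return {tool: shutil.which(tool, path=path) for tool in tools}


def missing_tools(found):
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, location in found.items() if location is None)


if __name__ == "__main__":
    found = find_build_tools()
    for name, location in found.items():
        print(f"{name:6s} -> {location or 'NOT FOUND'}")
    if missing_tools(found):
        print("Missing:", ", ".join(missing_tools(found)))
```

If a tool is reported as missing after installing it without root (for example into a conda environment), the folder containing its binary is probably not on `PATH` yet.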
Thank you for your reply.
Emitting ninja build file /nvme/zhouyuhang/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
=================
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
× python setup.py egg_info did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
× Encountered error while generating package metadata.
note: This is an issue with the package mentioned above, not pip. |
The error message has changed, which is a step towards solving the problem. It seems the gcc compiler was installed successfully, but your local CUDA version doesn't support that gcc version. You may try installing an older version of gcc, e.g. 9.x.x. To specify a package version in conda, you may refer to https://stackoverflow.com/questions/43222407/how-to-list-package-versions-available-with-conda. |
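To check which gcc major version is active before and after downgrading, a small sketch like the following may help. The function names are hypothetical, and the maximum supported major version is deliberately left as a parameter: look it up in NVIDIA's documentation for your CUDA release rather than trusting any value hard-coded here.

```python
import re
import subprocess


def parse_gcc_major(version_text):
    """Extract the gcc major version from the first line of `gcc --version`.

    Takes the last dotted version number on that line, since distributions
    often embed other version-like strings earlier in the line.
    """
    first_line = version_text.splitlines()[0] if version_text else ""
    majors = re.findall(r"(\d+)\.\d+(?:\.\d+)?", first_line)
    return int(majors[-1]) if majors else None


def gcc_within_limit(major, max_major):
    """True if a gcc major version was found and does not exceed max_major."""
    return major is not None and major <= max_major


if __name__ == "__main__":
    output = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    major = parse_gcc_major(output)
    # max_major=9 only mirrors the "9.x.x" suggestion above; it is an
    # assumption, not a verified limit for any particular CUDA version.
    print("gcc major:", major, "| within limit:", gcc_within_limit(major, max_major=9))
```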
I have tried many different versions, but it doesn't work. It seems that different package versions are strictly matched. Could you please provide the conda list for reference? |
Hello, I have solved this problem! The root cause is on the PyTorch side: they accidentally shipped nvcc with their conda package, which breaks the toolchain. (The fix from microsoft/DeepSpeed#2684 works!) |
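One way to check whether an nvcc from a conda package is shadowing the system one is to list every `nvcc` on `PATH` in lookup order. The sketch below is illustrative only: the helper names are hypothetical, and `/usr/local/cuda/bin` is simply the location mentioned in this thread, which may differ on your machine.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in hits:
                hits.append(candidate)
    return hits


def prepend_to_path(directory, path=None):
    """Return a PATH string with `directory` first and no duplicate of it."""
    if path is None:
        path = os.environ.get("PATH", "")
    rest = [d for d in path.split(os.pathsep) if d and d != directory]
    return os.pathsep.join([directory] + rest)


if __name__ == "__main__":
    # The first entry printed is the nvcc that the build will pick up.
    for location in find_all_on_path("nvcc"):
        print(location)
    # Assumed system CUDA location; adjust to your installation.
    print(prepend_to_path("/usr/local/cuda/bin"))
```

If the first `nvcc` listed lives inside a conda environment, putting the system CUDA `bin` folder first on `PATH` changes which one the build uses.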
Hi, Thank you! |
Same problem. How did you solve it in the end? By running export PATH=/usr/local/cuda/bin:$PATH, or with another solution? Thank you! |
I have encountered some problems that prevent deepspeed from working normally.
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
May I ask whether you have compared the speed of deepspeed and fsdp? Only ~16h of finetuning for llama 30b lora is surprising.