[BUG] fatal error: cusolverDn.h: No such file or directory #2684
Comments
Hello @Shaukat-Hussain Could you try |
To follow up on this issue: the root cause is on the pytorch side. They accidentally shipped the For now, please use temporary workaround: Ref: https://discuss.pytorch.org/t/not-able-to-include-cusolverdn-h/169122 Please feel free to reopen the issue if the above solution doesn't work. |
sudo apt install nvidia-cuda-dev |
This can lead to |
I solved this issue by swapping out docker base image. Used And the issue went away. Hope this helps. |
Along with adding CUDA to $PATH, make sure CUDA_HOME also points to the toolkit that matches your nvcc version. That resolved the issue for me. |
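The CUDA_HOME advice above can be checked with a short script. This is a minimal sketch, not part of DeepSpeed: the helper names `cuda_home_from_nvcc` and `cuda_home_matches` are hypothetical, and it assumes the usual toolkit layout where nvcc lives at `<toolkit root>/bin/nvcc`. If nvcc on your system is a wrapper or a distro-packaged binary elsewhere, the inferred root may not be meaningful.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def cuda_home_from_nvcc(nvcc_path: Optional[str] = None) -> Optional[Path]:
    """Infer the toolkit root from nvcc, assuming <root>/bin/nvcc (hypothetical helper)."""
    nvcc = nvcc_path or shutil.which("nvcc")
    if nvcc is None:
        return None
    # resolve() follows symlinks such as /usr/local/cuda -> /usr/local/cuda-11.6
    return Path(nvcc).resolve().parent.parent


def cuda_home_matches(env_cuda_home: Optional[str],
                      nvcc_path: Optional[str] = None) -> bool:
    """True if CUDA_HOME resolves to the same directory as the nvcc toolkit root."""
    inferred = cuda_home_from_nvcc(nvcc_path)
    if env_cuda_home is None or inferred is None:
        return False
    return Path(env_cuda_home).resolve() == inferred


if __name__ == "__main__":
    cuda_home = os.environ.get("CUDA_HOME")
    print("CUDA_HOME :", cuda_home)
    print("nvcc root :", cuda_home_from_nvcc())
    print("consistent:", cuda_home_matches(cuda_home))
```

If the last line prints `False`, CUDA_HOME and the nvcc on $PATH likely refer to different toolkit installs.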
@HeyangQin Can you help me, sir? I have checked that the nvcc dir is correct and CUDA is already added to $PATH, but I still get this error.
ERROR traceback:
Installed CUDA version 11.2 does not match the version torch was compiled with 11.6 but since the APIs are compatible, accepting this combination
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
DS_REPORT:
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
|
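The warning quoted above compares the system CUDA version with the one torch was built against. The following sketch shows one way to make that comparison yourself. It is an illustration only and does not reproduce DeepSpeed's own check; the function names and the three result labels are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as '11.6' into (11, 6)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def compare_cuda_versions(system: Tuple[int, int],
                          torch_cuda: Tuple[int, int]) -> str:
    """Classify the relationship between the two versions (labels are illustrative)."""
    if system == torch_cuda:
        return "match"
    if system[0] == torch_cuda[0]:
        return "minor-mismatch"
    return "major-mismatch"


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.6, V11.6.124"
    print(parse_nvcc_release(sample))
    print(compare_cuda_versions((11, 2), parse_major_minor("11.6")))
```

In practice the first argument would come from the text printed by `nvcc --version` and the second from `torch.version.cuda`.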
Hi @thanhlong1997. Could you manually check if |
Hello, export PATH=/usr/local/cuda/bin:$PATH. I want to ask how to find my CUDA dir. This is my command |
For me this solved the issue: export CPATH=/usr/local/cuda/include:$CPATH |
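The CPATH fix above can also be applied from Python before triggering the JIT build. This is a rough sketch under stated assumptions: `prepend_cpath` and `has_header` are hypothetical helpers, and `/usr/local/cuda/include` is only the path from the comment above, so it may differ on your machine. Unlike the plain `export`, the helper skips empty entries, since GCC treats an empty CPATH element as the current working directory.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

HEADER = "cusolverDn.h"  # the header named in the build error


def has_header(include_dir: str, header: str = HEADER) -> bool:
    """True if the header file exists directly inside include_dir."""
    return (Path(include_dir) / header).is_file()


def prepend_cpath(env: Mapping[str, str], include_dir: str) -> Dict[str, str]:
    """Return a copy of env with include_dir placed first in CPATH."""
    new_env = dict(env)
    # Drop empty entries so no leading/trailing separator ends up in CPATH.
    parts = [p for p in new_env.get("CPATH", "").split(os.pathsep) if p]
    if include_dir not in parts:
        parts.insert(0, include_dir)
    new_env["CPATH"] = os.pathsep.join(parts)
    return new_env


if __name__ == "__main__":
    include_dir = "/usr/local/cuda/include"  # assumption: adjust to your install
    if has_header(include_dir):
        os.environ.update(prepend_cpath(os.environ, include_dir))
        print("CPATH =", os.environ["CPATH"])
    else:
        print(f"{HEADER} not found in {include_dir}")
```

Child processes started afterwards from the same Python session inherit the updated environment, so the change should be made before the extension build starts.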
Awesome! I solved the problem this way! |
Another solution if you still want to use conda to manage CUDA: simply install |
That doesn't work for me 😭 |
I found that one of the best methods is:
git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed/
DS_BUILD_CPU_ADAM=1 python setup.py build_ext -j8 bdist_wheel
pip install dist/deepspeed-0.14.3+b6e24adb-cp312-cp312-linux_x86_64.whl |
Describe the bug
I installed deepspeed and its dependencies gcc and g++ following these guides:
https://lindevs.com/install-gcc-on-ubuntu
https://lindevs.com/install-g-on-ubuntu
I am trying to run the following in a Python environment:
import deepspeed
deepspeed.ops.op_builder.CPUAdamBuilder().load()
which should load cpu_adam successfully. Instead, there is an error:
fatal error: cusolverDn.h: No such file or directory
and the final error is:
RuntimeError: Error building extension 'cpu_adam'
I have downloaded the following packages from
https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/
cuda-license-10-0_10.0.130-1_amd64.deb
cuda-cublas-dev-10-0_10.0.130-1_amd64.deb
cuda-cublas-10-0_10.0.130-1_amd64.deb
cuda-cusolver-10-0_10.0.130-1_amd64.deb
cuda-cusolver-dev-10-0_10.0.130-1_amd64.deb
cuda-curand-10-0_10.0.130-1_amd64.deb
and installed them all, but the error does not go away.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 460, in load
return self.jit_load(verbose)
File "/opt/conda/envs/bitten/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 495, in jit_load
op_module = load(
File "/opt/conda/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/opt/conda/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/opt/conda/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/opt/conda/envs/bitten/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
To Reproduce
Steps to reproduce the behavior:
OS version: Ubuntu 18.04
(bitten) root@C.5718699:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
(bitten) root@C.5718699:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46 Driver Version: 495.46 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5000 On | 00000000:04:00.0 Off | Off |
| 30% 28C P8 18W / 230W | 1MiB / 24256MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 On | 00000000:44:00.0 Off | Off |
| 30% 27C P8 19W / 230W | 1MiB / 24256MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Expected behavior
A clear and concise description of what you expected to happen.
ds_report output
Please run
ds_report
to give us details about your setup.
(bitten) root@C.5718699:~$ ds_report
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
spatial_inference ...... [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/envs/bitten/lib/python3.8/site-packages/torch']
torch version .................... 1.13.1
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/opt/conda/envs/bitten/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.6
Please help, thanks.