Describe the current behavior
Colab GPU runtime has cudnn8 pre-installed alongside cuda 11.2, but the cudnn library is placed outside $LD_LIBRARY_PATH (/usr/local/nvidia/lib:/usr/local/nvidia/lib64) and outside /usr/local/cuda, where a typical CUDA user would expect it to be. Instead it lives in /usr/lib/x86_64-linux-gnu, without being exposed to the Colab user:
# current env vars
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
PATH=/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin
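For illustration, a minimal sketch (the helper name and the multiarch fallback path are mine, not part of the Colab image) of how one can check which search-path entries actually contain a libcudnn shared object:

```python
import glob
import os

def cudnn_dirs(ld_library_path, extra_dirs=("/usr/lib/x86_64-linux-gnu",)):
    """Return directories (from a colon-separated path plus extras) containing libcudnn."""
    # Split the colon-separated search path, dropping empty entries,
    # and append the Debian multiarch directory where cudnn actually sits.
    dirs = [d for d in ld_library_path.split(":") if d] + list(extra_dirs)
    # Keep only directories holding a libcudnn shared object.
    return [d for d in dirs if glob.glob(os.path.join(d, "libcudnn.so*"))]

# On the runtime described above, the LD_LIBRARY_PATH entries come back
# empty and only the multiarch directory matches.
print(cudnn_dirs(os.environ.get("LD_LIBRARY_PATH", "")))
```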
This causes PyTorch training on CUDA to fail with a missing cudnn library error for some operators (e.g., convolution):
Describe the expected behavior
Unless we intend users to examine the directory structure and hunt down the scattered CUDA-related library packages themselves, we should export $LD_LIBRARY_PATH (or similar) to include both the cuda and cudnn libraries, so users can run CUDA workloads without re-installing or re-linking.
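A minimal sketch of that fix, assuming the cudnn location is /usr/lib/x86_64-linux-gnu as observed above (the function name is hypothetical). Note that LD_LIBRARY_PATH is read at process startup, so this helps subprocesses, not libraries already loaded in the current interpreter:

```python
def with_cudnn_path(env, cudnn_dir="/usr/lib/x86_64-linux-gnu"):
    """Return a copy of env with cudnn_dir prepended to LD_LIBRARY_PATH."""
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if cudnn_dir not in parts:
        # Prepend so the cudnn directory wins over stale entries.
        parts.insert(0, cudnn_dir)
    return {**env, "LD_LIBRARY_PATH": ":".join(parts)}
```

A subprocess launched with this environment (e.g. via subprocess.run(..., env=with_cudnn_path(dict(os.environ)))) would then resolve libcudnn without the user re-installing anything.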
I also noticed that the pre-installed cudnn version is lower than what we require:
Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.1. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Would it be possible to bump up the version? Also, the GPU box runs cuda 11.2, which is not officially supported by cudnn 8.0.5; per the cuDNN release notes, cuda 11.2 is compatible with cudnn 8.1.0+.
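The compatibility rule quoted in that error message (matching major version, runtime minor version equal or higher) can be sketched as a small check (the function name is mine):

```python
def cudnn_runtime_ok(runtime, compiled):
    """cuDNN rule from the error above: major must match,
    and the runtime minor must be >= the compiled minor."""
    r = tuple(int(x) for x in runtime.split("."))
    c = tuple(int(x) for x in compiled.split("."))
    return r[0] == c[0] and r[1] >= c[1]

print(cudnn_runtime_ok("8.0.5", "8.1.1"))  # → False, matching the failure seen here
```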
What web browser you are using
(Chrome, Firefox, Safari, etc.)
Chrome
Additional context
I was testing torch 1.11 and torch-xla 1.11 before the release, and found this issue and #2649.