Potential Bug with find.lua - Multiple GPUs #319

Open
hashbangCoder opened this issue Jan 30, 2017 · 2 comments
hashbangCoder commented Jan 30, 2017

Hi,
I have no idea how this cropped up, but require 'cudnn' threw an out-of-memory error:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-7735/cutorch/init.c line=261 error=2 : out of memory
~/distro/install/share/lua/5.1/trepl/init.lua:389: 
~/distro/install/share/lua/5.1/trepl/init.lua:389: 
~/distro/install/share/lua/5.1/cudnn/find.lua:165: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-7735/cutorch/init.c:261

This is strange because I have 4 GPUs (all TitanX; 2 idle and 2 busy), all detected by cutorch.getDeviceCount(). The error occurs even after explicitly calling cutorch.setDevice() to select an idle device and verifying with cutorch.getDevice() and cutorch.getMemoryUsage() that the current GPU is indeed idle.
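Roughly, the check looks like this (the device index here is just an example for one of the idle GPUs):

    require 'cutorch'
    print(cutorch.getDeviceCount())    -- reports 4
    cutorch.setDevice(3)               -- pick an idle GPU (example index)
    print(cutorch.getDevice())         -- confirms the idle GPU is current
    print(cutorch.getMemoryUsage(3))   -- free/total memory confirm it is idle
    require 'cudnn'                    -- this is where the OOM above is thrown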

For some weird reason, calling require 'cudnn' sets the current device to a busy one with all of its memory occupied. After digging a little into the traceback, I found that in init.lua, find.reset() is called with cutorch.synchronizeAll() here. In cutorch's init.c, this call cycles through all available GPUs and performs a synchronize() on each one.
Changing this to cutorch.synchronize() seems to resolve the error, although I don't know if I've broken anything else.
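From the traceback, cutorch.synchronizeAll() appears to amount to something like the following sketch (my reading of the behaviour, not the actual C implementation), which is what ends up touching the busy GPUs:

    -- rough sketch of what cutorch.synchronizeAll() does, per the description above:
    -- visiting every device forces a CUDA context to be created on each GPU
    local prev = cutorch.getDevice()
    for dev = 1, cutorch.getDeviceCount() do
       cutorch.setDevice(dev)
       cutorch.synchronize()
    end
    cutorch.setDevice(prev)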

I've tried updating the cudnn, cunn and cutorch modules to the latest versions, and finally a fresh install of Torch, to no effect.
Please let me know if I'm missing something obvious here.

OS - Ubuntu 14.04
CUDA - 7.5
cuDNN - 5103
GPUs - 4 Nvidia TitanX
The 2 busy GPUs are running TensorFlow, which I think allocates all GPU memory by default.

EDIT - making that change to find.lua breaks the code.

 cublas runtime error : library not initialized at /tmp/luarocks_cutorch-scm-1-1387/cutorch/lib/THC/THCGeneral.c:378

I also tried setting CUDA_VISIBLE_DEVICES to a single GPU. This causes a long traceback to be printed:

/tmp/luarocks_cutorch-scm-1-1387/cutorch/lib/THC/THCTensorIndex.cu:321: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [53,0,0], thread: [116,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
(the same assertion is repeated for threads [117,0,0] through [127,0,0])

~/distro/install/share/lua/5.1/nn/Linear.lua:66: cublas runtime error : library not initialized at /tmp/luarocks_cutorch-scm-1-1387/cutorch/lib/THC/THCGeneral.c:378
ngimel commented Jan 30, 2017

The preferred method to use a subset of GPUs is setting CUDA_VISIBLE_DEVICES; otherwise Torch will try to create a context on all the GPUs, and with the memory on your "busy" GPUs already allocated, that can fail. Setting CUDA_VISIBLE_DEVICES to a single GPU should work. Do you have a repro where it fails? The errors that you have (cublas not initialized) are totally unrelated to cudnn.torch; it looks like something is wrong with the setup. I also suspect that require 'cutorch' would result in the same error.
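For example (the GPU index is arbitrary), something like this should leave Torch seeing a single device:

    -- launched as: CUDA_VISIBLE_DEVICES=2 th   (index 2 is just an example)
    require 'cutorch'
    print(cutorch.getDeviceCount())   -- should print 1 when only one GPU is exposed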

hashbangCoder commented Jan 30, 2017

Thanks for the quick response.

Setting CUDA_VISIBLE_DEVICES to a single GPU should work.

I did just that. I tried it with each idle GPU (one at a time), which leads to the cublas error.

I also suspect that require 'cutorch' would result in the same error.

The reason I posted this on this repo is that require 'cutorch' or require 'cunn' works just fine. It's require 'cudnn' which is the problem. In fact, I used cutorch to verify the current device and memory usage.

looks like something is wrong with the setup

I have all the paths (CUDA/cuDNN) set correctly. If they were incorrect, then cutorch or cunn shouldn't load either, right?
PATH is set to PATH=$PATH:/usr/local/cuda-7.5/bin and LD_LIBRARY_PATH=/home/user/cuda/lib64/:$LD_LIBRARY_PATH

Do you have a repro where it fails?

I'm not sure what this means. Do you mean an example code/scenario? Just running require 'cutorch'; require 'cunn'; require 'cudnn' causes this error.

Also, I checked again just now when all GPUs are idle, and require 'cudnn' loads without any issues. I'm only facing problems when some GPUs are occupied on the multi-GPU server. Also, using CUDA_VISIBLE_DEVICES set to any GPU causes it to crash (the cublas error above) at all times.

EDIT - This recent cutorch issue seems very relevant to mine.
