
torch.nn.ReplicationPad2d reports "invalid configuration argument" error under Compute Sanitizer #89254

Open
Kristoff-starling opened this issue Nov 18, 2022 · 2 comments
Labels
module: error checking (bugs related to incorrect/lacking error checking)
module: nn (related to torch.nn)
module: padding
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@Kristoff-starling
Contributor

Kristoff-starling commented Nov 18, 2022

🐛 Describe the bug

A test of torch.nn.ReplicationPad2d reports an "invalid configuration argument" error when run under compute-sanitizer. Without the sanitizer, it terminates normally on the GPU.
Test:

import torch

def test():
    arg_class = torch.nn.ReplicationPad2d([0,0,30,1024,-1,0])
    arg_tensor = torch.rand([1, 1, 3, 3], dtype=torch.float32).clone().cuda()
    arg = [arg_tensor,]
    res = arg_class(*arg)

test()
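For context, the documented padding argument of ReplicationPad2d is a 4-tuple (left, right, top, bottom), while the repro above passes six values, one of them negative. The sketch below is a plain-Python illustration of what replicate padding computes for the documented 4-tuple case; it is not PyTorch's implementation, and the function name is made up for this example.

```python
# Illustrative sketch of replicate ("edge") padding on a 2-D grid.
# Padding order follows the documented ReplicationPad2d convention:
# (left, right, top, bottom). Not PyTorch's actual implementation.
def replicate_pad2d(grid, pad):
    left, right, top, bottom = pad
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(-top, h + bottom):
        src_row = grid[min(max(i, 0), h - 1)]   # clamp row index to the nearest edge
        row = [src_row[min(max(j, 0), w - 1)]   # clamp column index to the nearest edge
               for j in range(-left, w + right)]
        out.append(row)
    return out

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
# left=1, right=1, top=2, bottom=0 -> output is (3+2+0) x (3+1+1) = 5 x 5
padded = replicate_pad2d(grid, (1, 1, 2, 0))
```

Here the top two padded rows replicate the first input row, so `padded[0]` is `[1, 1, 2, 3, 3]`. With this mental model, the repro's six-element, partially negative padding list does not match the documented 2-D shape, which may be why only the CUDA kernel-launch path trips over it.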

Error log:

========= COMPUTE-SANITIZER
========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunchKernel.
========= 
========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaGetLastError.
========= 
========= ERROR SUMMARY: 2 errors

Versions

PyTorch version: 1.14.0a0+gitbdc9911
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: 11.1.0-6
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 515.65.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.14.0a0+gitbdc9911
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.5           py39he7a7128_1  
[conda] numpy-base                1.21.5           py39hf524024_1  
[conda] numpydoc                  1.2                pyhd3eb1b0_0  
[conda] torch                     1.14.0a0+gitce2f870          pypi_0    pypi

cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are

@soulitzer soulitzer added module: nn (related to torch.nn) and triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Nov 18, 2022
@mruberry mruberry added module: error checking (bugs related to incorrect/lacking error checking) and module: padding labels Nov 18, 2022
@mruberry
Collaborator

Likely an issue with error-checking when a CUDA tensor is passed to a module with parameters on the CPU.

@Zalways

Zalways commented Mar 6, 2024

I met a similar issue when I used an exported ONNX model for inference on CUDA:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument

Can you help me with my problem?
@mruberry @Kristoff-starling

4 participants