
torch.nn.ReplicationPad2d reports "invalid configuration argument" error under Compute Sanitizer #89254

Open
Kristoff-starling opened this issue Nov 18, 2022 · 2 comments
Labels
module: error checking (bugs related to incorrect/lacking error checking)
module: nn (related to torch.nn)
module: padding
triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@Kristoff-starling
Contributor

Kristoff-starling commented Nov 18, 2022

🐛 Describe the bug

A test of torch.nn.ReplicationPad2d reports an "invalid configuration argument" error when run under compute-sanitizer. Without the sanitizer, it terminates normally on the GPU.
Test:

import torch

def test():
    arg_class = torch.nn.ReplicationPad2d([0,0,30,1024,-1,0])
    arg_tensor = torch.rand([1, 1, 3, 3], dtype=torch.float32).clone().cuda()
    arg = [arg_tensor,]
    res = arg_class(*arg)

test()
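For context, the documented padding argument of ReplicationPad2d is a 4-tuple (left, right, top, bottom), while the repro above passes six values, one of them negative. The sketch below is a plain-Python illustration of what replicate padding computes for the documented 4-tuple case; it is not PyTorch's implementation, and the function name is made up for this example.

```python
# Illustrative sketch of replicate ("edge") padding on a 2-D grid.
# Padding order follows the documented ReplicationPad2d convention:
# (left, right, top, bottom). Not PyTorch's actual implementation.
def replicate_pad2d(grid, pad):
    left, right, top, bottom = pad
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(-top, h + bottom):
        src_row = grid[min(max(i, 0), h - 1)]   # clamp row index to the nearest edge
        row = [src_row[min(max(j, 0), w - 1)]   # clamp column index to the nearest edge
               for j in range(-left, w + right)]
        out.append(row)
    return out

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
# left=1, right=1, top=2, bottom=0 -> output is (3+2+0) x (3+1+1) = 5 x 5
padded = replicate_pad2d(grid, (1, 1, 2, 0))
```

Here the top two padded rows replicate the first input row, so `padded[0]` is `[1, 1, 2, 3, 3]`. With this mental model, the repro's six-element, partially negative padding list does not match the documented 2-D shape, which may be why only the CUDA kernel-launch path trips over it.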

Error log:

========= COMPUTE-SANITIZER
========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunchKernel.
========= 
========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaGetLastError.
========= 
========= ERROR SUMMARY: 2 errors

Versions

PyTorch version: 1.14.0a0+gitbdc9911
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.1 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: 11.1.0-6
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.6 (main, Nov  2 2022, 18:53:38) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
GPU 2: NVIDIA GeForce RTX 3090

Nvidia driver version: 515.65.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.14.0a0+gitbdc9911
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.5           py39he7a7128_1  
[conda] numpy-base                1.21.5           py39hf524024_1  
[conda] numpydoc                  1.2                pyhd3eb1b0_0  
[conda] torch                     1.14.0a0+gitce2f870          pypi_0    pypi

cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are

@soulitzer soulitzer added module: nn (related to torch.nn) and triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Nov 18, 2022
@mruberry mruberry added module: error checking (bugs related to incorrect/lacking error checking) and module: padding labels Nov 18, 2022
@mruberry
Collaborator

Likely an issue with error-checking when a CUDA tensor is passed to a module with parameters on the CPU.

@Zalways

Zalways commented Mar 6, 2024

I met a similar issue when I used an exported ONNX model for inference on CUDA:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running TopK node. Name:'/model/TopK' Status Message: CUDA error cudaErrorInvalidConfiguration:invalid configuration argument

Can you help me with my problem?
@mruberry @Kristoff-starling

4 participants