[BUG] cudaErrorUnsupportedPtxVersion with cuCIM+CuPy on CUDA 11.5 #170

gigony · 2021-12-01T11:08:57Z

Describe the bug

I'm seeing some poor behavior with the latest cucim when used together with cupy and CEC (CUDA Enhanced Compatibility):

In [1]: import cupy as cp

In [2]: a = cp.zeros((3, 3))

In [3]: import cucim

In [4]: a = cp.zeros((3, 3))
---------------------------------------------------------------------------
CUDARuntimeError                          Traceback (most recent call last)
<ipython-input-4-bea1f486f5af> in <module>
----> 1 a = cp.zeros((3, 3))

/datasets/bzaitlen/miniconda3/envs/cucim-2021-11-30/lib/python3.8/site-packages/cupy/_creation/basic.py in zeros(shape, dtype, order)
    207
    208     """
--> 209     a = cupy.ndarray(shape, dtype, order=order)
    210     a.data.memset_async(0, a.nbytes)
    211     return a

cupy/_core/core.pyx in cupy._core.core.ndarray.__init__()

cupy/cuda/memory.pyx in cupy.cuda.memory.alloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.MemoryPool.malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.MemoryPool.malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool.malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool._malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.SingleDeviceMemoryPool._alloc()

cupy/cuda/memory.pyx in cupy.cuda.memory._malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory._malloc()

cupy/cuda/memory.pyx in cupy.cuda.memory.Memory.__init__()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.malloc()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.check_status()

CUDARuntimeError: cudaErrorUnsupportedPtxVersion: the provided PTX was compiled with an unsupported toolchain.

Steps/Code to reproduce bug

On Ampere-architecture GPUs (such as the GeForce RTX 3090):

mamba create -n test-cucim -c rapidsai -c conda-forge cucim cudatoolkit=11.2 cupy=9.6
conda activate test-cucim

python
>>> import cupy as cp
>>> import cucim.clara
>>> a = cp.zeros((3,3))

Expected behavior

No errors

Environment details (please complete the following information):

Environment location: [Bare-metal]
Method of cuCIM install: [conda]
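
The CUDA documentation describes cudaErrorUnsupportedPtxVersion as most commonly meaning the PTX was generated by a compiler newer than the installed driver supports, so it may help to record the runtime and driver versions alongside the details above. The following is a minimal sketch, assuming CuPy is importable; `decode_cuda_version` and `report_versions` are hypothetical helper names, not part of cuCIM or CuPy.

```python
def decode_cuda_version(version):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def report_versions():
    """Return ((runtime major, minor), (driver major, minor)), or None if CuPy is missing."""
    try:
        from cupy.cuda import runtime
    except ImportError:
        return None
    # runtimeGetVersion: CUDA runtime CuPy is using.
    # driverGetVersion: newest CUDA version the installed driver supports.
    return (decode_cuda_version(runtime.runtimeGetVersion()),
            decode_cuda_version(runtime.driverGetVersion()))
```

Calling `report_versions()` in the affected environment and pasting the result into the report would show whether the driver is older than the toolkit the package was built with.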

Additional context

CMAKE_CUDA_ARCHITECTURES

cucim/cpp/plugins/cucim.kit.cumed/cmake/modules/CuCIMUtils.cmake

Lines 41 to 60 in d6d3af5

    
    # Define CMAKE_CUDA_ARCHITECTURES for the given architecture values
    #
    # Params:
    #   arch_list - architecture value list (e.g., '60;70;75;80;86')
    if(NOT COMMAND cucim_define_cuda_architectures)
        function(cucim_define_cuda_architectures arch_list)
            set(arch_string "")
            # Create SASS for all architectures in the list
            foreach(arch IN LISTS arch_list)
                set(arch_string "${arch_string}" "${arch}-real")
            endforeach(arch)
            # Create PTX for the latest architecture for forward-compatibility.
            list(GET arch_list -1 latest_arch)
            foreach(arch IN LISTS arch_list)
                set(arch_string "${arch_string}" "${latest_arch}-virtual")
            endforeach(arch)
            set(CMAKE_CUDA_ARCHITECTURES ${arch_string} PARENT_SCOPE)
        endfunction()
    endif()
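
To see which entries this function requests, its logic can be mirrored in a few lines of Python. This is only an illustrative sketch: `define_cuda_architectures` is a hypothetical name, and the model ignores CMake list details such as empty elements. It shows that a `-virtual` (PTX) entry for the last architecture is always emitted, once per input architecture.

```python
def define_cuda_architectures(arch_list):
    """Mirror the two loops of cucim_define_cuda_architectures (illustrative only)."""
    archs = arch_list.split(";")
    # First loop: SASS ("-real") for every architecture in the list.
    entries = [f"{arch}-real" for arch in archs]
    # Second loop: PTX ("-virtual") for the last architecture,
    # appended once per element of the list, as in the CMake code.
    latest = archs[-1]
    entries += [f"{latest}-virtual" for _ in archs]
    return entries


entries = define_cuda_architectures("60;70;75;80;86")
print(";".join(entries))
```

Any target compiled as CUDA with this setting therefore carries PTX for the newest listed architecture, whether or not the source contains kernels.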

The identical function (lines 41 to 60 in d6d3af5) is also defined in:

cucim/cpp/plugins/cucim.kit.cuslide/cmake/modules/CuCIMUtils.cmake
cucim/cpp/cmake/modules/CuCIMUtils.cmake
cucim/python/cmake/modules/CuCIMUtils.cmake

The error is related to compiling source files with nvcc even though they contain no CUDA kernels:

# At least one file needs to be compiled with nvcc.
# Otherwise, it will cause `/usr/bin/ld: cannot find -lcudart` error message.
set_source_files_properties(src/cucim.cpp src/filesystem/cufile_driver.cpp PROPERTIES LANGUAGE CUDA)


Including [rmm](https://github.com/rapidsai/rmm)'s CMakeLists.txt (via `add_subdirectory()` with FetchContent in CMakeLists.txt), even though rmm is not used by or linked to `libcucim.so`, polluted the CMake environment variables of the main libcucim build (cuCIM was including an old `rmm` version whose CMakeLists.txt was not modernized). As a result, PTX code was always included in `libcucim.so`, causing the issue in #170.

Since cuCIM currently doesn't use `rmm`, this patch removes the rmm dependency completely and makes sure that `libcucim.so` doesn't contain PTX code.

- Remove `superbuild_depend(rmm)` and add `superbuild_depend(googletest)`
- Remove CUDA language in CMakeLists.txt
- Fix compilation warnings/errors caused by switching to GCC compiler (instead of nvcc).
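
One way to check whether a built library still embeds PTX is to list its PTX entries with `cuobjdump --list-ptx` from the CUDA toolkit. The sketch below is a hedged illustration rather than part of the patch: `parse_ptx_entries` and `list_embedded_ptx` are hypothetical helper names, and the assumption that each entry is printed on a line beginning with `PTX file` should be verified against the local cuobjdump version.

```python
import shutil
import subprocess


def parse_ptx_entries(cuobjdump_output):
    """Return output lines that name embedded PTX (assumed to start with 'PTX file')."""
    return [line.strip() for line in cuobjdump_output.splitlines()
            if line.strip().startswith("PTX file")]


def list_embedded_ptx(library_path):
    """Run cuobjdump on a library; an empty result suggests no PTX is embedded."""
    if shutil.which("cuobjdump") is None:
        raise RuntimeError("cuobjdump not found on PATH; it ships with the CUDA toolkit")
    # cuobjdump may exit non-zero for files without device code, so do not raise on that.
    proc = subprocess.run(["cuobjdump", "--list-ptx", library_path],
                          capture_output=True, text=True, check=False)
    return parse_ptx_entries(proc.stdout)
```

For example, `list_embedded_ptx("libcucim.so")` (with the path adjusted to the installed library) would be expected to return an empty list once the library is no longer compiled with nvcc.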

gigony added the bug label Dec 1, 2021

gigony added this to the v21.12.00 milestone Dec 1, 2021

gigony self-assigned this Dec 1, 2021

gigony mentioned this issue Dec 1, 2021

Do not compile code with nvcc if no CUDA kernel exists #171

Merged

GPUtester closed this as completed in 28ac81f Dec 2, 2021

gigony mentioned this issue Dec 2, 2021

Remove rmm/nvcc dependencies to fix cudaErrorUnsupportedPtxVersion error #175

Merged
