AbstractIrrational does not play nice with CUDA #73

Closed
Red-Portal opened this issue Aug 15, 2023 · 11 comments

Comments

@Red-Portal

Hi, it seems that many of the functions in this package are not compatible with CUDA.jl out of the box, apparently due to dynamic precision handling. Here's an MWE:

LogExpFunctions.log1mexp.(CuVector([-1f0, -2f0, -3f0]))
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#broadcast_kernel#26")(::CUDA.CuKernelContext, ::CuDeviceVector{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(log1mexp), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to var"#setprecision#25"(kws::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}}, ::typeof(setprecision), f::Function, ::Type{T}, prec::Integer) where T @ Base.MPFR mpfr.jl:969)
Stacktrace:
 [1] setprecision
   @ ./mpfr.jl:969
 [2] Type
   @ ./irrationals.jl:69
 [3] <
   @ ./irrationals.jl:96
 [4] log1mexp
   @ ~/.julia/packages/LogExpFunctions/jq98q/src/basicfuns.jl:234
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:59
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/validation.jl:149
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:415 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:414 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:129
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:110 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:106
 [10] compile
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:98 [inlined]
 [11] #1037
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:104 [inlined]
 [12] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:47
 [13] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:103
 [14] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:125
 [15] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:103
 [16] macro expansion
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:318 [inlined]
 [17] macro expansion
    @ ./lock.jl:267 [inlined]
 [18] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(log1mexp), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:313
 [19] cufunction
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:310 [inlined]
 [20] macro expansion
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:104 [inlined]
 [21] #launch_heuristic#1080
    @ ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:17 [inlined]
 [22] launch_heuristic
    @ ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:15 [inlined]
 [23] _copyto!
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:65 [inlined]
 [24] copyto!
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:46 [inlined]
 [25] copy
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:37 [inlined]
 [26] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(log1mexp), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast ./broadcast.jl:873
 [27] top-level scope
    @ REPL[22]:1
 [28] top-level scope
    @ ~/.julia/packages/CUDA/tVtYo/src/initialization.j
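
For reference, the log1mexp definition at basicfuns.jl:234 that the kernel trips over is roughly the following (paraphrased; the exact source may differ slightly). The comparison against the IrrationalConstants value loghalf is what pulls in the dynamic code path:

using IrrationalConstants: loghalf   # loghalf represents log(1/2)

log1mexp(x::Real) = x < loghalf ? log1p(-exp(x)) : log(-expm1(x))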

Simply changing the definition of log1mexp to the following fixes the issue:

log1mexp_cuda(x::T) where {T <: Real} = x < log(T(1)/2) ? log1p(-exp(x)) : log(-expm1(x))
julia> log1mexp_cuda.(CuVector([-1f0, -2f0, -3f0]))
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 -0.4586752
 -0.14541346
 -0.051069178

Do we really need IrrationalConstants here?

@devmotion
Member

devmotion commented Aug 16, 2023

What exactly is the problem here? IrrationalConstants works in exactly the same way as the irrational constants in Base, so I wonder if the same problem can be provoked with e.g. pi instead of IrrationalConstants.loghalf. One advantage of these irrational constants is that they are precomputed for e.g. Float32 and Float64 but still allow precise calculations with other types and functions.

I'm very surprised that CUDA cares about the BigFloat methods if clearly only the Float32 constant is needed. Generally, I'm hesitant to remove IrrationalConstants since it is generally useful and used in Base and throughout the ecosystem, so it seems this problem should be fixed in a different way.

@Red-Portal
Author

it seems this problem should be fixed in a different way.

Let me try to summon the CUDA experts.

@Red-Portal
Author

I spoke with Tim Besard; it seems there is no easy way to fix this as long as BigFloat is involved, because some of the BigFloat conversions call into the libmpfr CPU library, which CUDA cannot support.

@devmotion
Member

BigFloat should not be involved here: for irrationals in Base and IrrationalConstants, Float32(::MyIrrational) is explicitly defined as a constant, precomputed value (and the same for Float64).
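
Roughly, the @irrational machinery generates constant conversions of this form (a simplified sketch for π, not the exact generated code):

# Simplified sketch of what @irrational emits for a constant like π:
Base.Float64(::Irrational{:π}) = 3.141592653589793
Base.Float32(::Irrational{:π}) = 3.1415927f0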

@devmotion
Member

I figured out what's going on: the fallback definitions of the comparison operators (https://github.com/JuliaLang/julia/blob/6e2e6d00258b930f5909d576f2b3510ffa49c4bf/base/irrationals.jl#L96 and surrounding lines) are based not on Float32(x) but on Float32(x, RoundDown), which, in contrast to Float32(x), is not defined as a constant but is implemented dynamically via BigFloat (https://github.com/JuliaLang/julia/blob/6e2e6d00258b930f5909d576f2b3510ffa49c4bf/base/irrationals.jl#L68-L72).
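
For reference, the fallback definitions look roughly like this (paraphrased from the linked lines, not copied verbatim):

# Comparison fallbacks for AbstractIrrational (paraphrased from base/irrationals.jl):
<(x::AbstractIrrational, y::Float32) = Float32(x, RoundUp) <= y
<(x::Float32, y::AbstractIrrational) = x <= Float32(y, RoundDown)

# The rounded conversion is computed dynamically through BigFloat, which is where
# the setprecision/mpfr.jl frames in the stack trace above come from:
function (t::Type{T})(x::AbstractIrrational, r::RoundingMode) where T<:Union{Float32,Float64}
    setprecision(BigFloat, 256) do
        T(BigFloat(x), r)
    end
end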

I wonder if we should extend the @irrational macros in Base and IrrationalConstants and define Float64(x, RoundDown/RoundUp) and Float32(x, RoundDown/RoundUp) statically with explicit constants, to avoid these dynamic dispatches at least for the common case where the irrational is defined with the macro.
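
An illustrative sketch of the idea for π (not an actual macro change; the rounded constants are baked in once on the CPU when the methods are defined, so GPU code never touches BigFloat):

# Hypothetical static definitions; the literals are computed on the host at
# definition time, not at runtime:
let lo = Float32(big(π), RoundDown), hi = Float32(big(π), RoundUp)
    @eval begin
        Base.Float32(::Irrational{:π}, ::RoundingMode{:Down}) = $lo
        Base.Float32(::Irrational{:π}, ::RoundingMode{:Up}) = $hi
    end
end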

@devmotion devmotion changed the title IrrationalConstants doesn't play nice with CUDA AbstractIrrational does not play nice with CUDA Aug 17, 2023
@devmotion
Member

As suspected, the error is not specific to IrrationalConstants. For instance,

julia> using CUDA, IrrationalConstants

julia> log1mexp_cuda(x::Real) = twoπ*exp(x) < π ? log1p(-exp(x)) : log(-expm1(x))
log1mexp_cuda (generic function with 1 method)

julia> log1mexp_cuda.(CuVector([-1f0, -2f0, -3f0]))
...

errors as well. I updated the title of the issue to reflect this.

@Red-Portal
Author

Red-Portal commented Aug 17, 2023

Oh I see! I was scratching my head looking at Float32(x, RoundDown) and wondering what it should have been. Shouldn't this be handled upstream rather than by overriding the behavior downstream? I think this issue might pop up in other places that depend on AbstractIrrational too.

@devmotion
Member

Sure, it will be present in basically all code paths that involve comparisons of FloatXX with AbstractIrrationals.

@devmotion
Member

The general issue still exists but should maybe be raised upstream. The case in the OP was fixed by #75.

@Red-Portal
Author

Okay, then I'll close this for now. I'll raise this upstream sometime.

@Tuebel

Tuebel commented Aug 22, 2023

One addition: I ran into the same problem starting with Julia 1.9 (1.8 works fine) and opened an issue on the CUDA project, which was moved to the GPUCompiler project: JuliaGPU/GPUCompiler.jl#384
It seems that the underlying issue with irrationals is not easy to resolve, so thanks for the effort here!
