AbstractIrrational does not play nice with CUDA #73

Closed
Red-Portal opened this issue Aug 15, 2023 · 11 comments

Comments

@Red-Portal

Hi, it seems that many of the functions in this package are not compatible with CUDA.jl out of the box, apparently due to dynamic precision handling. Here's an MWE:

LogExpFunctions.log1mexp.(CuVector([-1f0, -2f0, -3f0]))
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#broadcast_kernel#26")(::CUDA.CuKernelContext, ::CuDeviceVector{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(log1mexp), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to var"#setprecision#25"(kws::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}}, ::typeof(setprecision), f::Function, ::Type{T}, prec::Integer) where T @ Base.MPFR mpfr.jl:969)
Stacktrace:
 [1] setprecision
   @ ./mpfr.jl:969
 [2] Type
   @ ./irrationals.jl:69
 [3] <
   @ ./irrationals.jl:96
 [4] log1mexp
   @ ~/.julia/packages/LogExpFunctions/jq98q/src/basicfuns.jl:234
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:59
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/validation.jl:149
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:415 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:414 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:129
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:110 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:106
 [10] compile
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:98 [inlined]
 [11] #1037
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:104 [inlined]
 [12] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:47
 [13] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:103
 [14] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:125
 [15] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:103
 [16] macro expansion
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:318 [inlined]
 [17] macro expansion
    @ ./lock.jl:267 [inlined]
 [18] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(log1mexp), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:313
 [19] cufunction
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:310 [inlined]
 [20] macro expansion
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:104 [inlined]
 [21] #launch_heuristic#1080
    @ ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:17 [inlined]
 [22] launch_heuristic
    @ ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:15 [inlined]
 [23] _copyto!
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:65 [inlined]
 [24] copyto!
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:46 [inlined]
 [25] copy
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:37 [inlined]
 [26] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(log1mexp), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast ./broadcast.jl:873
 [27] top-level scope
    @ REPL[22]:1
 [28] top-level scope
    @ ~/.julia/packages/CUDA/tVtYo/src/initialization.j
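
For reference, the log1mexp definition at basicfuns.jl:234 that the kernel trips over is roughly the following (paraphrased; the exact source may differ slightly). The comparison against the IrrationalConstants value loghalf is what pulls in the dynamic code path:

using IrrationalConstants: loghalf   # loghalf represents log(1/2)

log1mexp(x::Real) = x < loghalf ? log1p(-exp(x)) : log(-expm1(x))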

Simply changing the definition of log1mexp to the following fixes the issue:

log1mexp_cuda(x::T) where {T <: Real} = x < log(T(1)/2) ? log1p(-exp(x)) : log(-expm1(x))
julia> log1mexp_cuda.(CuVector([-1f0, -2f0, -3f0]))
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 -0.4586752
 -0.14541346
 -0.051069178

Do we really need IrrationalConstants here?

@devmotion
Member

devmotion commented Aug 16, 2023

What exactly is the problem here? IrrationalConstants works in exactly the same way as the irrational constants in Base, so I wonder if the same problem can be provoked with e.g. pi instead of IrrationalConstants.loghalf. One advantage of these irrational constants is that they are precomputed for e.g. Float32 and Float64 but still allow precise calculations with other types and functions.

I'm very surprised that CUDA cares about the BigFloat methods if clearly only the Float32 constant is needed. Generally, I'm hesitant to remove IrrationalConstants since it is generally useful and used in Base and throughout the ecosystem, so it seems this problem should be fixed in a different way.

@Red-Portal
Author

it seems this problem should be fixed in a different way.

Let me try to summon the CUDA experts.

@Red-Portal
Author

I spoke with Tim Besard; it seems there is no easy way to fix this as long as BigFloat is involved, because some of the BigFloat conversions call into the libmpfr CPU library, which CUDA cannot support.

@devmotion
Member

BigFloat should not be involved here: for irrationals in Base and IrrationalConstants, Float32(::MyIrrational) is explicitly defined as a constant, precomputed value (and the same for Float64).
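
Roughly, the @irrational machinery generates constant conversions of this form (a simplified sketch for π, not the exact generated code):

# Simplified sketch of what @irrational emits for a constant like π:
Base.Float64(::Irrational{:π}) = 3.141592653589793
Base.Float32(::Irrational{:π}) = 3.1415927f0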

@devmotion
Member

I figured out what's going on: the fallback definitions of the comparison operators (https://github.com/JuliaLang/julia/blob/6e2e6d00258b930f5909d576f2b3510ffa49c4bf/base/irrationals.jl#L96 and surrounding lines) are based not on Float32(x) but on Float32(x, RoundDown), which, in contrast to Float32(x), is not defined as a constant but is implemented dynamically via BigFloat (https://github.com/JuliaLang/julia/blob/6e2e6d00258b930f5909d576f2b3510ffa49c4bf/base/irrationals.jl#L68-L72).
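
For reference, the fallback definitions look roughly like this (paraphrased from the linked lines, not copied verbatim):

# Comparison fallbacks for AbstractIrrational (paraphrased from base/irrationals.jl):
<(x::AbstractIrrational, y::Float32) = Float32(x, RoundUp) <= y
<(x::Float32, y::AbstractIrrational) = x <= Float32(y, RoundDown)

# The rounded conversion is computed dynamically through BigFloat, which is where
# the setprecision/mpfr.jl frames in the stack trace above come from:
function (t::Type{T})(x::AbstractIrrational, r::RoundingMode) where T<:Union{Float32,Float64}
    setprecision(BigFloat, 256) do
        T(BigFloat(x), r)
    end
end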

I wonder if we should extend the @irrational macros in Base and IrrationalConstants and define Float64(x, RoundDown/RoundUp) and Float32(x, RoundDown/RoundUp) statically with explicit constants, to avoid these dynamic dispatches at least for the common case where the irrational is defined with the macro.
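
An illustrative sketch of the idea for π (not an actual macro change; the rounded constants are baked in once on the CPU when the methods are defined, so GPU code never touches BigFloat):

# Hypothetical static definitions; the literals are computed on the host at
# definition time, not at runtime:
let lo = Float32(big(π), RoundDown), hi = Float32(big(π), RoundUp)
    @eval begin
        Base.Float32(::Irrational{:π}, ::RoundingMode{:Down}) = $lo
        Base.Float32(::Irrational{:π}, ::RoundingMode{:Up}) = $hi
    end
end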

@devmotion devmotion changed the title IrrationalConstants doesn't play nice with CUDA AbstractIrrational does not play nice with CUDA Aug 17, 2023
@devmotion
Member

As suspected, the error is not specific to IrrationalConstants. For instance,

julia> using CUDA, IrrationalConstants

julia> log1mexp_cuda(x::Real) = twoπ*exp(x) < π ? log1p(-exp(x)) : log(-expm1(x))
log1mexp_cuda (generic function with 1 method)

julia> log1mexp_cuda.(CuVector([-1f0, -2f0, -3f0]))
...

errors as well. I updated the title of the issue to reflect this.

@Red-Portal
Author

Red-Portal commented Aug 17, 2023

Oh I see! I was scratching my head looking at Float32(x, RoundDown) and wondering what it should have been. Shouldn't this be handled upstream rather than by overriding the behavior downstream? I think this issue might pop up in other places that depend on AbstractIrrational too.

@devmotion
Member

Sure, it will be present in basically all code paths that involve comparisons of FloatXX with AbstractIrrationals.

@devmotion
Member

The general issue still exists but should maybe be raised upstream. The case in the OP was fixed by #75.

@Red-Portal
Author

Okay, then I'll close this for now. I'll raise this upstream sometime.

@Tuebel

Tuebel commented Aug 22, 2023

One addition: I ran into the same problem starting with Julia 1.9 (1.8 works fine) and opened an issue on the CUDA project, which was moved to the GPUCompiler project: JuliaGPU/GPUCompiler.jl#384
It seems that the underlying issue with irrationals is not easy to resolve, so thanks for the effort here!
