
WMMA tests fail on julia-debug #587

Closed
maleadt opened this issue Mar 3, 2020 · 6 comments · Fixed by #591


maleadt commented Mar 3, 2020

Bunch of these:

```
Intrinsic has incorrect return type!
[8 x <2 x half>] (i8 addrspace(1)*, i32)* @llvm.nvvm.wmma.m16n16k16.load.a.col.stride.f16.p1i8
Intrinsic has incorrect return type!
[8 x <2 x half>] (i8 addrspace(1)*, i32)* @llvm.nvvm.wmma.m16n16k16.load.b.col.stride.f16.p1i8
Intrinsic has incorrect return type!
[4 x <2 x half>] (i8 addrspace(1)*, i32)* @llvm.nvvm.wmma.m16n16k16.load.c.col.stride.f16.p1i8
Intrinsic has incorrect return type!
[4 x <2 x half>] (<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>)* @llvm.nvvm.wmma.m16n16k16.mma.col.col.f16.f16
in function _Z14ptxcall_kernel13CuDeviceArrayI7Float16Li2E6GlobalE13CuDeviceArrayI7Float16Li2E6GlobalE13CuDeviceArrayI7Float16Li2E6GlobalE13CuDeviceArrayI7Float16Li2E6GlobalE7Float167Float16
MAC: A: ColMajor, B: ColMajor, C: ColMajor, D: RowMajor, C type: Float16, D type: Float16: Error During Test at /home/tim/Julia/pkg/CUDAnative/test/device/wmma.jl:190
  Got exception outside of a @test
  LLVM error: Broken function found, compilation aborted!
  Stacktrace:
   [1] handle_error(::Cstring) at /home/tim/Julia/pkg/LLVM/src/core/context.jl:103
   [2] macro expansion at /home/tim/Julia/pkg/LLVM/src/base.jl:18 [inlined]
   [3] LLVMRunPassManager at /home/tim/Julia/pkg/LLVM/lib/9.0/libLLVM_h.jl:2813 [inlined]
   [4] run! at /home/tim/Julia/pkg/LLVM/src/passmanager.jl:34 [inlined]
   [5] (::CUDAnative.var"#139#144"{LLVM.Module,CUDAnative.var"#initialize!#143"{LLVM.Module,LLVM.TargetMachine}})(::LLVM.ModulePassManager) at /home/tim/Julia/pkg/CUDAnative/src/compiler/optim.jl:24
   [6] LLVM.ModulePassManager(::CUDAnative.var"#139#144"{LLVM.Module,CUDAnative.var"#initialize!#143"{LLVM.Module,LLVM.TargetMachine}}) at /home/tim/Julia/pkg/LLVM/src/passmanager.jl:28
   [7] optimize!(::CUDAnative.CompilerJob, ::LLVM.Module, ::LLVM.Function) at /home/tim/Julia/pkg/CUDAnative/src/compiler/optim.jl:19
   [8] macro expansion at /home/tim/Julia/depot/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:214 [inlined]
   [9] macro expansion at /home/tim/Julia/pkg/CUDAnative/src/compiler/driver.jl:108 [inlined]
   [10] macro expansion at /home/tim/Julia/depot/packages/TimerOutputs/7Id5J/src/TimerOutput.jl:214 [inlined]
   [11] codegen(::Symbol, ::CUDAnative.CompilerJob; libraries::Bool, dynamic_parallelism::Bool, optimize::Bool, strip::Bool, strict::Bool) at /home/tim/Julia/pkg/CUDAnative/src/compiler/driver.jl:96
   [12] compile(::Symbol, ::CUDAnative.CompilerJob; libraries::Bool, dynamic_parallelism::Bool, optimize::Bool, strip::Bool, strict::Bool) at /home/tim/Julia/pkg/CUDAnative/src/compiler/driver.jl:45
   [13] #compile#174 at /home/tim/Julia/pkg/CUDAnative/src/compiler/driver.jl:33 [inlined]
   [14] cufunction_slow(::Function, ::Type{T} where T, ::Int64; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/tim/Julia/pkg/CUDAnative/src/execution.jl:326
   [15] #222 at /home/tim/Julia/pkg/CUDAnative/src/execution.jl:391 [inlined]
   [16] get!(::CUDAnative.var"#222#223"{Nothing,Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},typeof(kernel),DataType,Int64}, ::Dict{UInt64,CUDAnative.HostKernel}, ::UInt64) at ./dict.jl:450
   [17] cufunction_fast(::Function, ::Type{T} where T, ::Int64; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/tim/Julia/pkg/CUDAnative/src/execution.jl:390
   [18] cufunction(::typeof(kernel), ::Type{Tuple{CuDeviceArray{Float16,2,CUDAnative.AS.Global},CuDeviceArray{Float16,2,CUDAnative.AS.Global},CuDeviceArray{Float16,2,CUDAnative.AS.Global},CuDeviceArray{Float16,2,CUDAnative.AS.Global},Float16,Float16}}; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/tim/Julia/pkg/CUDAnative/src/execution.jl:419
   [19] cufunction(::Function, ::Type{T} where T) at /home/tim/Julia/pkg/CUDAnative/src/execution.jl:419
   [20] top-level scope at /home/tim/Julia/pkg/CUDAnative/src/execution.jl:157
   [21] top-level scope at /home/tim/Julia/pkg/CUDAnative/test/device/wmma.jl:233
   [22] top-level scope at /home/tim/Julia/julia/build/release/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1188
   [23] top-level scope at /home/tim/Julia/pkg/CUDAnative/test/device/wmma.jl:190
   [24] top-level scope at /home/tim/Julia/julia/build/release/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1114
   [25] top-level scope at /home/tim/Julia/pkg/CUDAnative/test/device/wmma.jl:190
   [26] top-level scope at /home/tim/Julia/julia/build/release/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1114
   [27] top-level scope at /home/tim/Julia/pkg/CUDAnative/test/device/wmma.jl:11
   [28] include(::String) at ./client.jl:441
   [29] top-level scope at /home/tim/Julia/pkg/CUDAnative/test/runtests.jl:99
   [30] top-level scope at /home/tim/Julia/julia/build/release/usr/share/julia/stdlib/v1.5/Test/src/Test.jl:1114
   [31] top-level scope at /home/tim/Julia/pkg/CUDAnative/test/runtests.jl:11
   [32] include(::Function, ::Module, ::String) at ./Base.jl:380
   [33] include(::Module, ::String) at ./Base.jl:368
   [34] exec_options(::Base.JLOptions) at ./client.jl:288
   [35] _start() at ./client.jl:490
```

Julia 1.5.0-DEV.383, CUDAnative#master.

thomasfaingnaert commented Mar 3, 2020

Hmm, I could've sworn that I tested this with a debug build of LLVM and assertions enabled. That may have been when upstream was still on LLVM 6, though.

The issue is that, strictly speaking, the intrinsics' return type is a struct, e.g. { float, float, float, float }, whereas we emit the array type [4 x float]. LLVM didn't seem to care when I last tested this, though.
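A quick way to check that lowering from the REPL (a standalone illustration, not from the original thread):

```julia
using InteractiveUtils  # for code_llvm

# Homogeneous tuples lower to LLVM array types, so anything returning
# NTuple{4, Float32} shows up as [4 x float] in the generated IR, not
# as the struct { float, float, float, float } that the NVVM intrinsics
# are declared to return.
code_llvm(identity, Tuple{NTuple{4, Float32}})
```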

Are you a fan of adding (yet another...) type to Julia to fix this?


maleadt commented Mar 3, 2020

> Are you a fan of adding (yet another...) type to Julia to fix this?

It would be nice if we could deal with this using existing types, ref JuliaLang/julia#31681 (comment)


maleadt commented Mar 3, 2020

I don't think this is a quick fix, so we should probably disable the WMMA tests when running in debug mode and print a warning instead.
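A minimal sketch of what such a guard could look like (hypothetical code and message, not the actual change that landed in #591):

```julia
# In test/runtests.jl: detect a debug build of julia via the runtime's
# jl_is_debugbuild, and skip the WMMA tests with a warning instead of
# letting the IR verifier abort compilation.
if ccall(:jl_is_debugbuild, Cint, ()) != 0
    @warn "Skipping WMMA tests: known to fail on julia-debug (see #587)"
else
    include("device/wmma.jl")
end
```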


thomasfaingnaert commented Mar 3, 2020

> I don't think this is a quick fix, so we should probably disable the WMMA tests when running in debug mode and print a warning instead.

I agree, I'd rather not go down the rabbit hole of differentiating NTuple{2, Int} and Tuple{Int, Int} by preserving the Vararg 😄
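(Those two are in fact the very same type today, which is why codegen cannot tell them apart:)

```julia
julia> NTuple{2, Int} === Tuple{Int, Int}  # NTuple{N, T} is an alias for Tuple{Vararg{T, N}}
true
```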

I suppose what we could do is emit Julia structs as LLVM structures in codegen. Instead of using NTuple{...}, we would have to declare custom structs though, and I'm not sure if there's a better way of doing this than:

```julia
struct Foo4{T}
    x1::T
    x2::T
    x3::T
    x4::T
end

struct Foo8{T}
    x1::T
    x2::T
    x3::T
    x4::T
    x5::T
    x6::T
    x7::T
    x8::T
end
```

I guess that's not too bad, since all return types are 4 or 8 elements long anyway.


thomasfaingnaert commented Mar 3, 2020

> we would have to declare custom structs though, and I'm not sure if there's a better way of doing this than:

Well, it turns out there is a better way by (ab)using Julia's metaprogramming capabilities: thomasfaingnaert@6447332.
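Roughly, the trick is to generate the struct definitions instead of writing them out by hand; a hypothetical sketch with made-up names (see the commit above for the real code):

```julia
# Generate Fragment4{T} and Fragment8{T} with fields x1::T ... xN::T.
for N in (4, 8)
    name   = Symbol(:Fragment, N)
    fields = [:($(Symbol(:x, i))::T) for i in 1:N]
    @eval struct $name{T}
        $(fields...)
    end
end
```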
If you'd like to go the struct route, I'll do some more testing and then send the relevant PRs.


maleadt commented Mar 4, 2020

> Well, it turns out there is a better way by (ab)using Julia's metaprogramming capabilities: thomasfaingnaert@6447332.

That's not really abuse, but exactly what the metaprogramming capabilities are made for. Using Cartesian's @nexprs is a bit of a stretch, but hey, if it works 😄 Please do add a comment though, with some details and a ref to this issue.
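For reference, @nexprs unrolls an expression at parse time, substituting the loop variable (a standalone illustration, unrelated to the linked commit):

```julia
using Base.Cartesian

# Expands to x_1 = 1^2; x_2 = 2^2; x_3 = 3^2; x_4 = 4^2 before compilation.
@nexprs 4 i -> (x_i = i^2)
```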
