callsite union splitting #17212
Conversation
Sigh, now I need something else to actually test dynamic dispatch. Anyway, 👍
Looks like the following:

```julia
julia> @noinline f(x) = x
f (generic function with 1 method)

julia> @noinline g(x) = f(x[])
g (generic function with 1 method)

julia> @code_warntype g(Ref{Union{Int,Float64}}())
Variables:
  #self#::#g
  x::Base.RefValue{Union{Float64,Int64}}
  #temp#::LambdaInfo

Body:
  begin  # REPL[2], line 1:
      unless (Core.isa)((Core.getfield)(x::Base.RefValue{Union{Float64,Int64}},:x)::Union{Float64,Int64},Float64)::Any goto 6
      #temp#::LambdaInfo = LambdaInfo for f(::Float64)
      goto 14
      6:
      unless (Core.isa)((Core.getfield)(x::Base.RefValue{Union{Float64,Int64}},:x)::Union{Float64,Int64},Int64)::Any goto 10
      #temp#::LambdaInfo = LambdaInfo for f(::Int64)
      goto 14
      10:
      goto 12
      12:
      (Base.error)("error in type inference due to #265")::Any
      14:
      SSAValue(0) = $(Expr(:invoke, :(#temp#), :(Main.f), :((Core.getfield)(x,:x)::Union{Float64,Int64})))
      return SSAValue(0)
  end::Union{Float64,Int64}
```
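For readers not fluent in the lowered IR, here is a rough hand-written equivalent of that union split (a sketch only; `g_split` and the explicit branches are mine, mirroring what the lowering above does automatically):

```julia
@noinline f(x) = x

# Hand-written version of the union split: test the concrete type of
# the stored value, then call the statically known specialization in
# each branch (hypothetical name `g_split`, not part of the PR).
function g_split(x::Base.RefValue{Union{Int,Float64}})
    v = x[]
    if v isa Float64
        return f(v::Float64)   # dispatch target known: f(::Float64)
    elseif v isa Int
        return f(v::Int)       # dispatch target known: f(::Int64)
    else
        error("error in type inference due to #265")
    end
end
```

With this shape the compiler can emit direct calls inside each branch instead of a generic dynamic dispatch, which is the point of the transformation.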
Yes, but that's a bug in
This also seems to do a nice job slaughtering a few of #16128 (although perhaps more the result of "disable gcframe rooting on return path of jlcall wrapper" than the call-site splitting)
```julia
append!(stmts, match)
if error_label !== nothing
    push!(stmts, error_label)
    push!(stmts, Expr(:call, GlobalRef(_topmod(sv.mod), :error), "error in type inference due to #265"))
end
```
"...issue #265"? (This is effectively "jargon.")
was not addressed
Not sure I understand the code terribly well, but from the look of it, once there are more than 16 types it throws out the oldest? Are there any circumstances where this will cause a performance degradation?
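If that reading is right, the behavior being asked about can be pictured as a fixed-capacity list that evicts its oldest entry (purely illustrative; `remember!` and the limit of 16 are my stand-ins, not the PR's actual data structure):

```julia
# Illustrative only: keep at most `limit` entries, evicting the
# oldest once the cap is exceeded (hypothetical helper, not the
# compiler's real bookkeeping).
function remember!(entries::Vector{Any}, x; limit::Int = 16)
    push!(entries, x)
    length(entries) > limit && popfirst!(entries)  # drop the oldest
    return entries
end
```

Under such a scheme, a call site that cycles through more than 16 distinct types would keep evicting and re-adding entries, which is the kind of degradation the question is probing for.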
Really cool. For me the benchmark in the OP goes from 69 seconds to 32 seconds.
As another possible benchmark, here's some code that several of us were playing with at JuliaCon (CC @simonbyrne):

```julia
using BenchmarkTools

@noinline unstable(x) = x < 0.5 ? 1 : 1.0
@noinline stable(x) = x < 0.5 ? 1.0 : 1.0
@noinline convF64(x) = convert(Float64, x)  # or try @inline, both are interesting

function mysum(A)
    s = 0.0
    for a in A
        s += convF64(unstable(a))
    end
    s
end

function mysum_md(A)  # md = manual dispatch
    s = 0.0
    for a in A
        x = unstable(a)
        if isa(x, Int)
            s += convF64(x::Int)
        elseif isa(x, Float64)
            s += convF64(x::Float64)
        else
            error("oops")
        end
    end
    s
end

function mysum_stable(A)
    s = 0.0
    for a in A
        s += convF64(stable(a))
    end
    s
end

r = rand(10^4)
@show @benchmark mysum(r)
@show @benchmark mysum_md(r)
@show @benchmark mysum_stable(r)
```

On a recent master build:
So the gap between type-stable and type-unstable is roughly 8x (this is already better than it used to be), and the annotated version gets you to within 2.5x of the type-stable code (and eliminates the memory allocation, which I didn't expect). Would be interesting to know whether this PR gets pretty close to that without the need to enumerate the possible types (which is a huge advantage).
The indirect call cost should be fairly minimal, and it saves us the work of linearizing the IR here. It also allows us to generate much more compact LLVM code, since we only have one call site rather than jlcall frames, etc. For small enough branching factors, I expect the tradeoff will be worthwhile; this approach is also more general in handling moderate branching factors.
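The "one call site" shape described here can be sketched as: branch only to select the target, then make a single (indirect) call, instead of duplicating the call in every branch. A minimal sketch, with hypothetical helpers `f_int`/`f_float` standing in for the compiled specializations (this is not the PR's actual codegen):

```julia
@noinline f(x) = x

# Hypothetical type-specialized entry points:
f_int(v)   = f(v::Int)
f_float(v) = f(v::Float64)

# One branch to pick the callee, then exactly one call site.
function g_indirect(x)
    v = x[]
    callee = v isa Float64 ? f_float : f_int
    return callee(v)
end
```

The design tradeoff is between this compact single-call form and the fully split form (a direct call in every branch), which can be faster per call but produces more code per call site.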
Force-pushed from d2646e1 to 316bfdf
I looked at the test case above.
This is pretty cool. My naive question: what is the manual dispatch version now doing that the default version is not? And what would it take to replicate that?
They use a different representation. This version allows us to generate smaller code, although it doesn't perform as well on synthetic benchmarks at present. I'm expecting Oscar will make a couple of PRs soon enough that will close the performance gap.
Provides a fast path for call-site semi-stable calls.

For example, the usual test code:

(vs. the type-stable version, which takes 12.771225 sec)