Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some inlining cost model updates #35235

Merged
merged 2 commits into from
Jun 4, 2021
Merged

some inlining cost model updates #35235

merged 2 commits into from
Jun 4, 2021

Conversation

JeffBezanson
Copy link
Sponsor Member

  • treat arrayset like arrayref
  • higher cost for passing unknown things to builtins
  • add small cost for forward branches
  • add small cost for sizeof, nfields, isa, subtype

In the spirit of #31455. I tried to keep this quite conservative and unlikely to affect high-performance code. If everyone thinks this looks reasonable I'll run benchmarks.

Some quick stats:
Before:

Base  ─────────── 36.138898 seconds
Total ───────  93.147351 seconds 
Generating precompile statements... 1713 generated in 133.823332 seconds (overhead  99.003688 seconds)
-rw-r--r-- 1 jeff jeff  12616483 Mar 22 13:52 corecompiler.ji
-rwxr-xr-x 1 jeff jeff 150780360 Mar 22 13:59 sys.so

After:

Base  ─────────── 35.660582 seconds
Total ───────  92.700790 seconds 
Generating precompile statements... 1720 generated in 130.747812 seconds (overhead  95.763560 seconds)
-rw-r--r-- 1 jeff jeff  12401243 Mar 23 10:48 corecompiler.ji
-rwxr-xr-x 1 jeff jeff 148853832 Mar 23 10:55 sys.so

@JeffBezanson JeffBezanson added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Mar 23, 2020
@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 23, 2020

add small cost for sizeof, nfields, isa, subtype

These seem like they should also be even higher cost, since if they're still present, that means we couldn't constant fold them.

And we should do nanosoldier, even if just to know which benchmarks will be affected after merging.

@JeffBezanson
Copy link
Sponsor Member Author

My hope was that would be covered by looking at the argument types; with abstract types they will generally be pretty expensive, concrete types not constant-folded might be expensive but might be in-between, e.g. sizeof on a String that does a load. Maybe isa and subtype should have higher costs though.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 23, 2020

Those also may be just a pointer comparison. Okay, makes sense this is handled by the abstract type cost.

@timholy
Copy link
Sponsor Member

timholy commented Mar 23, 2020

These all look fine to me. In the longer run, should we consider trying to tune these numbers? Maybe even to the user's specific CPU?

@JeffBezanson
Copy link
Sponsor Member Author

Oops, I messed up the type check here.

@JeffBezanson
Copy link
Sponsor Member Author

This also fixes the following inlining bug:

julia> const k = [1];

julia> f() = k[1];

julia> @code_typed f()
CodeInfo(
1 ─ %1 = invoke Base.getindex(Main.k::Array{Int64,1}, 1::Int64)::Int64
└──      return %1
) => Int64

All in all, I was hoping this PR would make us do less inlining, but I think it's going to end up just moving things around and improving the decisions, not necessarily doing more or less overall.

@JeffBezanson
Copy link
Sponsor Member Author

Ok I think this is ready to go. Will run benchmarks if CI passes.

@JeffBezanson
Copy link
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson
Copy link
Sponsor Member Author

The regressions in Random are due to inlining being rearranged in an unlucky way that prevents us from removing an allocation we used to remove, the Sampler here I believe:

rand!(rng::AbstractRNG, A::AbstractArray{T}, X) where {T}             = rand!(rng, A, Sampler(rng, X))

Looking through it, the new inlining decisions seem totally reasonable, it's just that they're not able to take this into account. So we may want to revisit this after #34126, which should make argument passing cost more uniform.

@JeffBezanson
Copy link
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 28, 2020

That looks quite positive!

@nanosoldier runbenchmarks("random", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jun 16, 2020

bump?

- treat arrayset like arrayref
- add small cost for sizeof, nfields, isa, subtype
@JeffBezanson
Copy link
Sponsor Member Author

@nanosoldier runbenchmarks("random", vs=":master")

@JeffBezanson
Copy link
Sponsor Member Author

I rebased this, and after some experiments decided to pare it down to some minimal changes. The other changes seemed to make some things a bit slower.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 31, 2021

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG

@KristofferC
Copy link
Sponsor Member

Looks pretty good, some things that might be worth looking at:

["array", "cat", "4467"] | 5.16 (5%) | 1.13 (1%)
["random", "types", ("rand", "MersenneTwister", "Complex{Int16}")] | 2.31 (25%) | 1.00 (1%)
["array", "index", ("sumelt_boundscheck", "100000:-1:1")] | 2.00 (50%) | 1.00 (1%)

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jun 2, 2021

I'm still happy with this. @JeffBezanson merge?

In the array cat example above, it is a moderately complex function which shouldn't usually need inlining. However, vcat is a varargs function, so we're not adequately accounting for the dynamic cost of choosing not to inline that. We should consider giving a boost to Vararg functions which will have a different dynamic cost (due to different compilation signature), but that is separate from this change. We could also consider changing vcat literal lowering to avoid making a tuple, but I don't know if that is necessary.

@vtjnash vtjnash merged commit abbb220 into master Jun 4, 2021
@vtjnash vtjnash deleted the jb/inliningcost2 branch June 4, 2021 02:48
shirodkara pushed a commit to shirodkara/julia that referenced this pull request Jun 9, 2021
- treat arrayset like arrayref
- add small cost for typeof
johanmon pushed a commit to johanmon/julia that referenced this pull request Jul 5, 2021
- treat arrayset like arrayref
- add small cost for typeof
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants