-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize: add at least a small cost for most instructions #31455
Conversation
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Looks like some regressions in array handling. Maybe pair this with an increase in the cost thresholds? |
That would somewhat defeat the purpose of this. In the benchmark, it looks like this improved performance by 10% on many benchmarks, and up to 40% on others. I would rather understand why that one benchmark is not being optimized well. @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Depends on the purpose; it wouldn't defeat the "endless inlining" (in the sense of infinite). But agreed, best to focus on the few failures. |
05cd026
to
ba97088
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Results seem pretty good to me now, so I'll plan to merge over the weekend, unless there's any more comments. |
Why did you only run the broadcast benchmarks? There where categories with regression outside of that.
for example |
Those are the only ones that would have changed. I looked into several others. Most were expected/intended. The usual causes were an insufficient benchmark complexity, that's now just measuring overhead noise (e.g. scalar iteration), and inaccurate cost modeling issues (harder to deal with, but maybe we should store the result for boundscheck=none at least? this problem existed before, we just don't have good benchmark coverage of the case.) |
View construction and indexing seems like quite a relevant microbenchmark... and is something we've tried hard to make as fast as possible. Why can't we just manually inline whatever is throwing a wrench into things here? Edit: oh, without looking I'd bet it's the generic |
ba97088
to
0e0e39c
Compare
0e0e39c
to
526964b
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Wow, okay, that seems quite bad |
185b62a
to
dd5a0a8
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This removes the dependence on inlining for performance.
This should better reflect the aggregate expected cost of their presence and restrict endless inlining: SROA may eliminate many of them, but probably not all of them.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG |
Not sure that it is worthwhile trying to figure out those regressions for this minor gain |
This should better reflect the aggregate expected cost of their presence
and restrict endless inlining: SROA may eliminate many of them, but
probably not all of them.
This was part of #31338, but I realized that I need to separate the nanosoldier results.