Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flag the Repr::repr function with #[inline] #9032

Closed
wants to merge 1 commit into from

Conversation

alexcrichton
Copy link
Member

This allows cross-crate inlining which is very good because this is called a
lot throughout libstd (even when libstd is inlined across crates).

In one of my projects, I have a test case with the following performance characteristics

commit optimization level runtime (seconds)
before O2 22s
before O3 107s
after O2 13s
after O3 12s

I'm a bit disturbed by the 107s runtime from O3 before this commit. The performance characteristics of this test involve doing an absurd amount of small operations. A huge portion of this is creating hashmaps which involves allocating vectors.

The worst portions of the profile are:
screen shot 2013-09-06 at 10 32 15 pm

Which as you can see looks like some serious problems with inlining. I would expect the hash map methods to be high up in the profile, but the top 9 callers of cast::transmute_copy were Repr::repr's various monomorphized instances.

I wish there we a better way to detect things like this in the future, and it's unfortunate that this is required for performance in the first place. I suppose I'm not entirely sure why this is needed because all of the methods should have been generated in-crate (monomorphized versions of library functions), so they should have gotten inlined? It also could just be that by modifying LLVM's idea of the inline cost of this function it was able to inline it in many more locations.

This allows cross-crate inlining which is *very* good because this is called a
lot throughout libstd (even when libstd is inlined across crates).
@thestinger
Copy link
Contributor

It's not an issue with LLVM. If you mark a function as #[inline] we emit the AST into the metadata and compile it again, so anything it calls in libstd is a cross-crate call.

bors added a commit that referenced this pull request Sep 7, 2013
This allows cross-crate inlining which is *very* good because this is called a
lot throughout libstd (even when libstd is inlined across crates).

In one of my projects, I have a test case with the following performance characteristics

commit | optimization level | runtime (seconds)
----|------|----
before | O2  | 22s
before | O3  | 107s
after | O2  | 13s
after | O3  | 12s

I'm a bit disturbed by the 107s runtime from O3 before this commit. The performance characteristics of this test involve doing an absurd amount of small operations. A huge portion of this is creating hashmaps which involves allocating vectors.

The worst portions of the profile are:
![screen shot 2013-09-06 at 10 32 15 pm](https://f.cloud.github.com/assets/64996/1100723/e5e8744c-177e-11e3-83fc-ddc5f18c60f9.png)

Which as you can see looks like some *serious* problems with inlining. I would expect the hash map methods to be high up in the profile, but the top 9 callers of `cast::transmute_copy` were `Repr::repr`'s various monomorphized instances.

I wish there we a better way to detect things like this in the future, and it's unfortunate that this is required for performance in the first place. I suppose I'm not entirely sure why this is needed because all of the methods should have been generated in-crate (monomorphized versions of library functions), so they should have gotten inlined? It also could just be that by modifying LLVM's idea of the inline cost of this function it was able to inline it in many more locations.
@alexcrichton
Copy link
Member Author

It seems weird though because wherever Repr::repr is compiled to, it should inline cast::transmute_copy into itself, right?

@thestinger
Copy link
Contributor

There are no guarantees about inlining those, because there are undefined bitcasts from type_use.

@thestinger
Copy link
Contributor

Can you replicate it with -Z no-monomorphic-collapse?

@bors bors closed this Sep 7, 2013
@alexcrichton
Copy link
Member Author

I cannot replicate it with no-monomorphic-collapse, interesting. The runtime goes from 23s => 11s with that flag.

@thestinger
Copy link
Contributor

Yep, so it's just from undefined behaviour in our IR.

@alexcrichton
Copy link
Member Author

There was talk of removing that part of the compiler, right?

@thestinger
Copy link
Contributor

Yes, but it hasn't happened. The mergefunc pass hits an assert on the code produced by rustc when trying a build with it enabled.

flip1995 pushed a commit to flip1995/rust that referenced this pull request Jun 30, 2022
enum_variant_names should ignore when all prefixes are _

close rust-lang#9018

When Enum prefix is only an underscore, we should not issue warnings.

changelog: fix false positive in enum_variant_names
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants