Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile hangs endlessly in LLVM in await-heavy code when ThinLTO is enabled #66036

Closed
tmandry opened this issue Nov 2, 2019 · 8 comments · Fixed by #68435
Closed

Compile hangs endlessly in LLVM in await-heavy code when ThinLTO is enabled #66036

tmandry opened this issue Nov 2, 2019 · 8 comments · Fixed by #68435
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group

Comments

@tmandry
Copy link
Member

tmandry commented Nov 2, 2019

Some code went from compiling in <10 seconds to (hours) in between commits 10f12fe..b3a0350 (unfortunately I don't have a smaller range right now).

The regression only happens when ThinLTO is enabled. Full LTO doesn't have the slowdown.

Here's the source code which is causing the regression. The function has about 20 .await points, which I suspect might be related. It doesn't seem to have anything to do with inlining the awaited function (I replaced it with a panic and linking was still slow).

Here's the LLVM IR for the fast and slow cases (built with the above commits of rustc, respectively): llvm-ir.zip
EDIT: See below for a reproducer

cc @petrhosek

@tmandry tmandry added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group labels Nov 2, 2019
@Mark-Simulacrum
Copy link
Member

10f12fe...b3a0350 is the range and the following is the list of bors commits in that range:

b3a0350 Auto merge of #65982 - RalfJung:miri, r=alexcrichton
2f16be4 Auto merge of #65990 - Centril:rollup-v843h4a, r=Centril
03d537e Auto merge of #65957 - Xanewok:update-rls, r=Xanewok
c553e8e Auto merge of #65941 - ehuss:update-cargo-books, r=alexcrichton
0b7e28a Auto merge of #65068 - estebank:trait-impl-lt-mismatch, r=nikomatsakis
aa69777 Auto merge of #65943 - tmandry:rollup-g20uvkh, r=tmandry
caa1f8d Auto merge of #65904 - matthiaskrgr:submodule_upd, r=Manishearth
d3d28a4 Auto merge of #65927 - eddyb:eval-always-considered-harmful, r=michaelwoerister
eb5ef81 Auto merge of #65435 - michaelwoerister:fix-issue-64153, r=alexcrichton
2dd4e73 Auto merge of #65919 - Centril:rollup-qrgwnt6, r=Centril
cac6821 Auto merge of #65907 - Centril:rollup-9i8ev23, r=Centril
b497e18 Auto merge of #65202 - pietroalbini:scriptify-ci-config, r=alexcrichton
8d78bf6 Auto merge of #65421 - estebank:variants, r=petrochenkov
03a50ae Auto merge of #65188 - matthewjasper:stabilize-const-constructor, r=Centril
9285d40 Auto merge of #65885 - Centril:rollup-y6b2qbf, r=Centril
95f437b Auto merge of #65869 - Centril:rollup-bzlo74f, r=Centril
0f677c6 Auto merge of #65541 - eddyb:spanned-inferred-outlives, r=nikomatsakis
b7176b4 Auto merge of #65519 - pnkfelix:issue-63438-trait-based-structural-match, r=matthewjasper
cf148a7 Auto merge of #65288 - estebank:point-at-assoc-type, r=nikomatsakis
fcf516d Auto merge of #65852 - flip1995:clippyup, r=Manishearth
fae75cd Auto merge of #65167 - hermitcore:rusty-hermit, r=alexcrichton
46e6c53 Auto merge of #65845 - Centril:rollup-28jtjfc, r=Centril
8318ef2 Auto merge of #65758 - RalfJung:miri, r=oli-obk
084edc4 Auto merge of #63812 - eddyb:promo-sanity, r=oli-obk
246be7e Auto merge of #65826 - JohnTitor:rollup-mr6crka, r=JohnTitor
23f890f Auto merge of #65804 - Centril:rollup-arlxgch, r=Centril
85943fd Auto merge of #65764 - Manishearth:clippyup, r=Manishearth
dd2df8f Auto merge of #65793 - Centril:rollup-v40xke9, r=Centril
d54111a Auto merge of #65771 - Centril:rollup-upfsvrb, r=Centril
10a52c2 Auto merge of #65762 - mati865:msys2-bug, r=pietroalbini
8e0007f Auto merge of #65474 - Mark-Simulacrum:rustc-dev-split, r=pietroalbini
55e0063 Auto merge of #65733 - Centril:rollup-0zth66f, r=Centril
4a8c5b2 Auto merge of #57545 - bovinebuddha:object_safe_for_dispatch, r=nikomatsakis
f466f52 Auto merge of #65716 - JohnTitor:rollup-fkcr85k, r=JohnTitor
d6e4028 Auto merge of #65713 - lzutao:clippy-up, r=Manishearth
57bfb80 Auto merge of #65503 - popzxc:refactor-libtest, r=wesleywiser
50ffa79 Auto merge of #65501 - alexcrichton:remove-emscripten-backend, r=Mark-Simulacrum
6576f4b Auto merge of #65671 - Centril:rollup-00glhmb, r=Centril

@tmandry tmandry added the I-compiletime Issue: Problems and improvements with respect to compile times. label Nov 12, 2019
@tmandry
Copy link
Member Author

tmandry commented Nov 16, 2019

I had to double check to be sure, but the first bad commit was 060b6cb, "Update hashbrown to 0.6.2"

The only reason I can come up with for this is that the change in hashbrown caused some IR to be emitted in a different order, and this triggered a bug somewhere, possibly LLVM, but before invoking the linker.

@tmandry tmandry changed the title Major link-time slowdown in await-heavy code when ThinLTO is enabled Major compile-time slowdown in LLVM in await-heavy code when ThinLTO is enabled Dec 3, 2019
@tmandry tmandry changed the title Major compile-time slowdown in LLVM in await-heavy code when ThinLTO is enabled Compile hangs endlessly in LLVM in await-heavy code when ThinLTO is enabled Dec 3, 2019
@mati865
Copy link
Contributor

mati865 commented Dec 4, 2019

Cc @alexcrichton @Amanieu

@tmandry
Copy link
Member Author

tmandry commented Dec 6, 2019

I finally have a reproducer that works with opt (the one built in the rust tree). I reproduced with the following line:

build/x86_64-unknown-linux-gnu/llvm/bin/opt -instcombine last_stage.nodebug.ll -o opt-out

(I've also used the --thinlto-bc and --time-passes in the longest runs I've done, but pretty sure those flags aren't necessary.)

reproducer.zip

@tmandry
Copy link
Member Author

tmandry commented Dec 6, 2019

Reduced: reduced.zip

@tmandry
Copy link
Member Author

tmandry commented Dec 6, 2019

For future reference, here's how I got the above reproducer. Note that this is for a hang, but could easily be adapted for crashing as well.

Prereqs

I had already minimized the rust source code manually, which allowed me to identify the exact function that was causing the hang. It turned out that pulling out a self-contained example was really difficult or impossible; this bug seemed to depend on a certain set of things being in a given codegen unit, for example. If I had some code that depended on items Foo, Bar, Baz, then pulled those dependencies and the offending function into their own file, the problem suddenly went away.

I had already run perf on rustc and produced a flamegraph to get a basic idea of where time was being spent. Pro-tip: --call-graph=lbr was the only mode that gave me good backtrace info for both rustc and LLVM. I could see it was spending a significant amount of time in LLVM.

Getting a llvm reproducer

The Debugging LLVM page of the rustc guide was very helpful here. Note that the following steps are easier if you can set -Ccodegen-units=1. In my case, this made the issue go away, so I couldn't.

  1. Ran rustc with -Csave-temps until it hung
  2. Observe that for each codegen unit there were a number of files (N) that looked like validator_bin_test.validator.7rcbfp3g-cgu.9.rcgu.no-opt.bc (9 is the codegen unit, no-opt is the stage bit which varies)
  3. ...except for one codegen unit, 10, which had fewer than N files. This is the one that was hanging.
  4. By sorting all of these files by creation timestamp, it's easy to see which step is missing. In this case, the last file in codegen unit 10 was validator_bin_test.validator.7rcbfp3g-cgu.10.rcgu.thin-lto-after-patch.bc. The stage after thin-lto-after-patch was thin-lto-after-pm. Okay, so the hang is occurring in the pass manager. That matches what I was seeing in perf.
  5. Now we have the input bitcode before the pass manager runs. But that isn't enough to reproduce the failure outside of rustc; rustc configures the pass manager with a specific set of passes, each with their own configuration. Reproducing that configuration in a tool like opt proved very difficult.
  6. What I did instead was to use rustc's -Zprint-llvm-passes option to print the LLVM IR for the function for each pass, until the compiler hung. Now I had the LLVM IR for the function right before the pass that was hanging (instcombine).
  7. The LLVM IR for the function printed by -Zprint-llvm-passes is not self-contained. It references other things. This is where the bitcode files above come in handy. I obtained the LLVM IR from the after-patch bitcode file above using llvm-dis, found the IR for the offending function, and replaced it with the IR output by -Zprint-llvm-passes.
  8. Simply patching IR like this doesn't quite work. The main problem is debuginfo; the debuginfo metadata (looks like !123) references global tables that aren't printed by -Zprint-llvm-passes.[1] What I did was run a regex over the newly patched-in IR to remove these debuginfo attributes. The vim-flavored regex was something like s/, \([^!,]*!\)\+[^!,]*//g. Also, I had to comment out every line that contained llvm.dbg (g/llvm\.dbg/norm I;).
  9. Now, run that ll file through opt with the pass that's hanging. In my case, opt -instcombine testcase.ll. With any luck, opt will accept the IR and hang. If you get errors you can also try with --disable-verify, though this never helped me.

[1]: Disabling debuginfo doesn't quite work, since you're probably depending on libraries (like libstd) that were compiled with debuginfo, and their code has been inlined into yours at this point.

Minimizing the reproducer

Congratulations! You now have a reproducer, but it may be huge. I used both bugpoint and llvm-reduce (in that order) to reduce the LLVM IR down to a manageable size. There are guides on how to do this. llvm-reduce's interface is simpler, but it couldn't handle all the function attributes I had in my IR (which bugpoint kindly stripped out).

For minimizing hangs, what you want to do is use timeout with a threshold that's big enough to prevent false positives. In my case, the offending IR would hang forever, but a different version of the same IR would optimize in 0.2s. I used 4 seconds as my argument to timeout.

timeout has an exit code of 124 when it times out. This is your "interesting" case. If the exit code is nonzero but not 124, that means the minimizer produced invalid input; this is not interesting. Here's the script I used with bugpoint:

#!/bin/bash -x
timeout 4 /code/fuchsia-rust/rust/build/x86_64-unknown-linux-gnu/llvm/bin/opt -instcombine "$@" -o opt-out
if [ $? -eq 124 ]; then
  echo "TIMED OUT"
  exit 1
elif [ $? -eq 0 ]; then
  echo "SUCCESS"
  exit 0
else
  echo "INVALID"
  exit 0
fi

The bugpoint run took around 9 hours in my case. llvm-reduce was much shorter, since the case was already minimized.

One final annoyance is that these two tools expect inverted exit codes from your interesting-ness script (I had to swap exit 1 and exit 0 above when running with llvm-reduce). They both check that your input is considered "interesting," so you'll know right away if you need to switch it up.

@tmandry
Copy link
Member Author

tmandry commented Dec 7, 2019

@petrhosek filed an issue upstream: https://bugs.llvm.org/show_bug.cgi?id=44245

@tmandry
Copy link
Member Author

tmandry commented Jan 15, 2020

Patch has landed: https://reviews.llvm.org/D71164

@tmandry tmandry added I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. and removed I-compiletime Issue: Problems and improvements with respect to compile times. labels Jan 15, 2020
@tmandry tmandry mentioned this issue Jan 21, 2020
bors added a commit that referenced this issue Jan 23, 2020
@bors bors closed this as completed in 13a91a8 Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants