Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra Simplifier & gotoElim passes to warm/hot #20127

Merged

Conversation

ymanton
Copy link
Member

@ymanton ymanton commented Sep 6, 2024

If Simplifier detects trivial conditional branches where the fallthrough
block is a goto block to the branch target it can remove said branches
and leave behind a goto block. The resulting CFG can then potentially
be cleaned up by RedundantGotoElimination. Running these two opts
before BlockSplitter allows these trivial conditional branches to be
completely optimized away before BlockSplitter gets a change to run and
potentially duplicate blocks that hide the opportunity and make it
harder to detect and eliminate later.

This PR inserts Simplifer and RedundantGotoElimination passes before
BlockSplitter and moves all 3 after VP, which typically detects
opportunities that result in dead code, which when removed, results
in the trivial conditional branches that we're interested in.

This PR is made more effective by eclipse/omr#7456.

@ymanton
Copy link
Member Author

ymanton commented Sep 6, 2024

Jenkins test sanity.functional xlinux|plinux|zlinux jdk11,jdk17 depends eclipse/omr#7456

@ymanton
Copy link
Member Author

ymanton commented Sep 6, 2024

Jenkins test sanity.functional xlinux,plinux,zlinux jdk11,jdk17 depends eclipse/omr#7456

@hzongaro
Copy link
Member

hzongaro commented Sep 7, 2024

In the case of scorchingStrategyOpts, does OMR::redundantGotoElimination need to be included somewhere between OMR::veryExpensiveGlobalValuePropagationGroup and OMR::blockSplitter?

And now that I'm looking at it, I should ask the same question of omrHotStrategyOpts in OMR. . . .

If Simplifier detects trivial conditional branches where the fallthrough
block is a goto block to the branch target it can remove said branches
and leave behind a goto block. The resulting CFG can then potentially
be cleaned up by RedundantGotoElimination. Running these two opts
before BlockSplitter allows these trivial conditional branches to be
completely optimized away before BlockSplitter gets a change to run and
potentially duplicate blocks that hide the opportunity and make it
harder to detect and eliminate later.

This PR inserts Simplifer and RedundantGotoElimination passes before
BlockSplitter and moves all 3 after VP, which typically detects
opportunities that result in dead code, which when removed, results
in the trivial conditional branches that we're interested in.

Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
@ymanton ymanton force-pushed the simplifier-trivial-cond-branches branch from 56c9066 to 8d9a6bd Compare September 9, 2024 22:07
@ymanton
Copy link
Member Author

ymanton commented Sep 9, 2024

In the case of scorchingStrategyOpts, does OMR::redundantGotoElimination need to be included somewhere between OMR::veryExpensiveGlobalValuePropagationGroup and OMR::blockSplitter?

And now that I'm looking at it, I should ask the same question of omrHotStrategyOpts in OMR. . . .

partialRedundancyEliminationGroup includes redundantGotoElimination and there's an unconditional pass in both the OpenJ9 scorching and OMR hot strategies.

@ymanton
Copy link
Member Author

ymanton commented Sep 9, 2024

Jenkins test sanity.functional xlinux,plinux,zlinux jdk11,jdk17 depends eclipse/omr#7456

@hzongaro
Copy link
Member

partialRedundancyEliminationGroup includes redundantGotoElimination and there's an unconditional pass in both the OpenJ9 scorching and OMR hot strategies.

Ah! Thanks for pointing that out.

@ymanton
Copy link
Member Author

ymanton commented Sep 12, 2024

The failures look like an exposed bug in BlockSplitter. Crashes happen in local VP because its processing a conditional branch node and expects to find an edge in the CFG to the node's target, but doesn't. Looking back through the trace log it looks to me like BlockSplitter goes off the rails and doesn't properly update the CFG.

[   183] O^O BLOCK SPLITTER: Splitting merge block_19 for predecessor 23 (frequency 4904)
         BLOCK CLONER: Newly created block_109 is a clone of original block_19
         BLOCK CLONER: Newly created block_110 is a clone of original block_12
         BLOCK CLONER: Newly created block_111 is a clone of original block_106

[   184] O^O BLOCK SPLITTER: Splitting merge block_23 for predecessor 21 (frequency 4903)
         BLOCK CLONER: Newly created block_113 is a clone of original block_23
         BLOCK CLONER: Newly created block_114 is a clone of original block_109

block_19 and block_12 exist prior to this pass of BlockSplitter.

n436n     BBStart <block_19> (freq 6)                                                 
n463n     treetop                                                                     
n364n       aload  jdk/internal/reflect/UnsafeQualifiedIntegerFieldAccessorImpl.unsafe...
n465n     astore  <temp slot 7>[#536  Auto] [flags 0x7 0x0 ]                          
n365n       aload  <temp slot 5>[#543  Auto] [flags 0x20000007 0x0 ]                  
n467n     lstore  <temp slot 8>[#537  Auto] [flags 0x4 0x0 ]                          
n367n       lloadi  jdk/internal/reflect/UnsafeQualifiedIntegerFieldAccessorImpl.field...
n366n         aload  <temp slot 3>[#541  Auto] [flags 0x20000007 0x0 ] (X!=0 X>=0 )   
n497n     ificmpeq --> block_12 BBStart at n503n ()                                   
n947n       iand (X>=0 cannotOverflow )                                               
n946n         l2i                                                                     
n367n           ==>lloadi
n944n         iconst 1 (X!=0 X>=0 )                                                   
n944n       ==>iconst 1
n499n     BBEnd </block_19> =====                                                     

n503n     BBStart <block_12> (freq 6)                                                 
n474n     istore  <temp slot 9>[#539  Auto] [flags 0x3 0x0 ]                          
n475n       iloadi  <unsafe shadow sym>[#538  volatile Shadow] [flags 0x80002603 0x100
n476n         aladd (X>=0 )                                                           
n477n           aload  <temp slot 7>[#536  Auto] [flags 0x7 0x0 ]                     
n479n           lload  <temp slot 8>[#537  Auto] [flags 0x4 0x0 ] (cannotOverflow )   
n542n     istore  <auto slot 4>[#441  Auto] [flags 0x3 0x0 ]                          
n475n       ==>iloadi
n358n     BBEnd </block_12> =====                                                     

        19 [0x7efebf1301c0] BBStart at 0x7efebf087800, frequency = 6
                 in        = [27(0) ]
                 out       = [12(0) ]
                 exception in  = []
                 exception out = []

block_19 ends with a trivial conditional branch to block_12.

BlockSplitter clones block_19 and block_12 and creates the corresponding block_109 and block_110.

n1152n    BBStart <block_109> (freq 32)                                               
n1153n    treetop                                                                     
n1154n      aload  jdk/internal/reflect/UnsafeQualifiedIntegerFieldAccessorImpl.unsafe
n1155n    astore  <temp slot 7>[#536  Auto] [flags 0x7 0x0 ]                          
n1156n      aload  <temp slot 5>[#543  Auto] [flags 0x20000007 0x0 ]                  
n1157n    lstore  <temp slot 8>[#537  Auto] [flags 0x4 0x0 ]                          
n1158n      lloadi  jdk/internal/reflect/UnsafeQualifiedIntegerFieldAccessorImpl.field
n1159n        aload  <temp slot 3>[#541  Auto] [flags 0x20000007 0x0 ] (X!=0 X>=0 )   
n1160n    ificmpne --> block_110 BBStart at n1165n ()                                 
n1161n      iand (X>=0 cannotOverflow )                                               
n1162n        l2i                                                                     
n1158n          ==>lloadi
n1163n        iconst 1 (X!=0 X>=0 )                                                   
n1163n      ==>iconst 1
n1164n    BBEnd </block_109> =====                                                    

n1165n    BBStart <block_110> (freq 4903)                                             
n1166n    istore  <temp slot 9>[#539  Auto] [flags 0x3 0x0 ]                          
n1167n      iloadi  <unsafe shadow sym>[#538  volatile Shadow] [flags 0x80002603 0x100
n1168n        aladd (X>=0 )                                                           
n1169n          aload  <temp slot 7>[#536  Auto] [flags 0x7 0x0 ]                     
n1170n          lload  <temp slot 8>[#537  Auto] [flags 0x4 0x0 ] (cannotOverflow )   
n1171n    istore  <auto slot 4>[#441  Auto] [flags 0x3 0x0 ]                          
n1167n      ==>iloadi
n1172n    BBEnd </block_110> =====                                                    

       109 [0x7efebf2ad9d0] BBStart at 0x7efebf1857d0, frequency = 32
                 in        = [23(0) ]
                 out       = [110(0) ]
                 exception in  = []
                 exception out = []

It then clones block_109 to get block_114, but doesn't clone block_110, which means that block_109 block_114 needs two exit edges in the CFG, but according to the log output it doesn't have an edge to block_110 as it should.

(I don't know why block_115 was created, it's not mentioned in the log, but it looks OK in the CFG at least.)

n1190n    BBStart <block_114> (freq 6)                                              
n1191n    treetop                                                                   
n1192n      aload  jdk/internal/reflect/UnsafeQualifiedIntegerFieldAccessorImpl.unsafe...
n1193n    astore  <temp slot 7>[#536  Auto] [flags 0x7 0x0 ]                        
n1194n      aload  <temp slot 5>[#543  Auto] [flags 0x20000007 0x0 ]                
n1195n    lstore  <temp slot 8>[#537  Auto] [flags 0x4 0x0 ]                        
n1196n      lloadi  jdk/internal/reflect/UnsafeQualifiedIntegerFieldAccessorImpl.field...
n1197n        aload  <temp slot 3>[#541  Auto] [flags 0x20000007 0x0 ] (X!=0 X>=0 ) 
n1198n    ificmpne --> block_110 BBStart at n1165n ()                               
n1199n      iand (X>=0 cannotOverflow )                                             
n1200n        l2i                                                                   
n1196n          ==>lloadi
n1201n        iconst 1 (X!=0 X>=0 )                                                 
n1201n      ==>iconst 1
n1202n    BBEnd </block_114> =====                                                  

n1203n    BBStart <block_115> (freq 4903)                                           
n1205n    goto --> block_110 BBStart at n1165n                                      
n1204n    BBEnd </block_115> =====                                                  

       114 [0x7efebf2af2e0] BBStart at 0x7efebf1863b0, frequency = 6
                 in        = [113(0) ]
                 out       = [115(0) ]
                 exception in  = []
                 exception out = []
       115 [0x7efebf2af4f0] BBStart at 0x7efebf1867c0, frequency = 4903
                 in        = [119(4871) 114(0) ]
                 out       = [110(4903) ]
                 exception in  = []
                 exception out = []

@vijaysun-omr
Copy link
Contributor

Maybe it is better to run with tree and CFG verification enabled to make sure everything was valid before block splitter ran. It would be good to make sure of that since it would be a bit surprising to have it do transformations correctly in every other case, but start failing to do so after the recent changes in strategy.

@ymanton
Copy link
Member Author

ymanton commented Sep 16, 2024

CFG verification doesn't find any problem unfortunately. This bug is only reproducible with #20090 in the mix, which went in a couple of weeks ago and seems to change the IL around unsafe ops. Without those changes in the build I can't reproduce the crash we're seeing here, so it's possible that we're seeing some new CFG shapes that we weren't previously. Continuing to look.

@ymanton
Copy link
Member Author

ymanton commented Sep 17, 2024

I believe the bug is in this part of BlockSplitter:

https://github.com/eclipse/omr/blob/e136a7a853c0afb4e0a392b68158aa4d3827f671/compiler/optimizer/LocalOpts.cpp#L6674-L6701

      TR::Node* lastRealNode = itr->_to->getExit()->getPrevRealTreeTop()->getNode();
      TR::TreeTop* nextTree = itr->_from->getExit()->getNextTreeTop();
      if (
           lastRealNode->getOpCode().isBranch() &&
           lastRealNode->getOpCode().isIf()
         )
         {
          //if we are at the end of the clone we have to get back to the origional execution stream so fall through
          //to a goto
          //
          int32_t newFreq = nextTree->getNode()->getBlock()->getFrequency();
          if (newFreq > cloneEnd->getFrequency())
             newFreq = cloneEnd->getFrequency();


          TR::Block *newBlock = TR::Block::createEmptyBlock(lastRealNode, comp(), newFreq, nextTree->getNode()->getBlock());
          if (nextTree->getNode()->getBlock()->isCold() || cloneEnd->isCold())
             newBlock->setIsCold(true);
          newBlock->getExit()->join(itr->_to->getExit()->getNextTreeTop());
          itr->_to->getExit()->join(newBlock->getEntry());
          cfg->addNode(newBlock);


          newBlock->append(TR::TreeTop::create(comp(), TR::Node::create(lastRealNode, TR::Goto, 0, nextTree)));
          cfg->addEdge(cloneEnd, newBlock);
          cfg->addEdge(newBlock, nextTree->getNode()->getBlock());
          cfg->removeEdge(cloneEnd, nextTree->getNode()->getBlock());


          if (trace())
             traceMsg(comp(), "   Create extra goto block_%d --> %d\n", newBlock->getNumber(), nextTree->getNode()->getBlock()->getNumber());

The above code wants cloned blocks C to rejoin the original path of execution E, so it creates a goto block G, adds edges C -> G and G -> E, and removes C -> E. The problem looks to me like it does not recognize the case where the conditional branch target of C also goes to E, so the C -> E edge represents both the fall-through and branch target and can't be removed. Tracing the above code I can see that in cases where the branch target is not the fall-through path it works as intended, the fall-through edge is removed and the branch target edge remains and the CFG stays correct, but when the branch target is the fall-through path the C -> E edge is always missing after BlockSplitter runs.

The options I see are

  1. to check if the branch target is E and, if so, not remove the edge
  2. to redirect the branch target to G and remove the edge as before.

With the change in eclipse/omr#7456 Simplifier will now recognize (1) and be able to eliminate it, but (2) might be better because I think more opts will spot it and be able to eliminate it (e.g. RedundantGotoElimination).

@ymanton
Copy link
Member Author

ymanton commented Sep 17, 2024

Tested both options locally and they appear to work as expected. Pushed option 2 to eclipse/omr#7456 for PR testing.

@ymanton
Copy link
Member Author

ymanton commented Sep 17, 2024

Jenkins test sanity.functional xlinux,plinux,zlinux jdk11,jdk17 depends eclipse/omr#7456

@ymanton
Copy link
Member Author

ymanton commented Sep 18, 2024

The s390x jobs failed because of instances of #20150.
The ppc64le job failed because some JITServer+CRIU tests saw the following output and consider it cause for failure:

>> Failure condition was found: [Output match: AOT load and compilation disabled post restore.]

which is not related to the changes in these PRs as far as I can tell.

@ymanton ymanton marked this pull request as ready for review September 18, 2024 14:04
@vijaysun-omr
Copy link
Contributor

In above example you mentioned, is there an assumption that G will be added as a fall through block for C ?

@ymanton
Copy link
Member Author

ymanton commented Sep 18, 2024

Yes, G is always being made to follow C in the above code, so it will always represent the fall-through path.

@vijaysun-omr
Copy link
Contributor

The change itself is quite safe (just adds a couple of passes to optimization strategies). Should we also make similar changes for very hot and scorching optimization strategies ?

@vijaysun-omr
Copy link
Contributor

@mpirvu and @dsouzai since this affects warm optimization and hence compile time for Liberty (or similarly large apps). Though the passes in question are quite simple/light, so I don't expect the impact to be much beyond noise. Still, I thought I would let you know.

@ymanton
Copy link
Member Author

ymanton commented Sep 18, 2024

The change itself is quite safe (just adds a couple of passes to optimization strategies). Should we also make similar changes for very hot and scorching optimization strategies ?

Those strategies already have these passes happening in between GVP and BlockSplitter, so we're good there.

@vijaysun-omr vijaysun-omr merged commit e5af31d into eclipse-openj9:master Sep 18, 2024
11 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants