JIT: initial implementation of profile synthesis (#82926)
Implements a profile synthesis algorithm based on the classic Wu-Larus
paper (Static branch frequency and program profile analysis, Micro-27,
1994), with a simple set of heuristics.

The first step is constructing a depth-first spanning tree (DFST) for the
flowgraph and the corresponding reverse postorder (RPO). Together these drive
loop recognition; currently we only recognize reducible loops. We use DFST
(non-)ancestry as a proxy for (non-)domination: every dominator of a node must
be a DFST ancestor of that node, so a block that is not a DFST ancestor cannot
dominate it, and no explicit dominance computation is needed. Irreducible loops
are noted but ignored. Loop features such as entry, back, and exit edges, body
sets, and nesting are computed and saved.
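A minimal C++ sketch of DFST-based loop recognition, assuming a toy flowgraph stored as successor lists with block 0 as the entry. The names `Graph`, `BuildDfst`, `IsDfstAncestor`, and `FindLoopBody` are hypothetical and are not the JIT's API. Ancestry is tested with preorder/postorder intervals. An edge `u -> h` is treated as a back edge when `h` is a DFST ancestor of `u`, and the loop body is collected by walking predecessors backwards from the back-edge sources. If that walk reaches a block that `h` is not an ancestor of, `h` cannot dominate it, so the region is reported as irreducible.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy flowgraph: g[b] lists the successors of block b; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

struct Dfst
{
    std::vector<int> pre;  // preorder number per block (-1 if unreachable)
    std::vector<int> post; // postorder number per block (-1 if unreachable)
    std::vector<int> rpo;  // reachable blocks in reverse postorder
};

static void DfsVisit(const Graph& g, int b, Dfst& d, int& preCount, int& postCount)
{
    d.pre[b] = preCount++;
    for (int s : g[b])
    {
        if (d.pre[s] == -1)
            DfsVisit(g, s, d, preCount, postCount);
    }
    d.post[b] = postCount++;
    d.rpo.push_back(b); // postorder for now; reversed in BuildDfst
}

Dfst BuildDfst(const Graph& g)
{
    Dfst d;
    d.pre.assign(g.size(), -1);
    d.post.assign(g.size(), -1);
    int preCount = 0, postCount = 0;
    DfsVisit(g, 0, d, preCount, postCount);
    std::reverse(d.rpo.begin(), d.rpo.end());
    return d;
}

// a is a DFST ancestor of b (or b itself) iff a's [pre, post] interval encloses b's.
bool IsDfstAncestor(const Dfst& d, int a, int b)
{
    return d.pre[a] >= 0 && d.pre[b] >= 0 && d.pre[a] <= d.pre[b] && d.post[b] <= d.post[a];
}

// Collects the loop headed by h into inBody. Returns false if the region is
// irreducible: h cannot dominate a body block that it is not a DFST ancestor of.
bool FindLoopBody(const Graph& g, const Dfst& d, int h, std::vector<bool>& inBody)
{
    const int n = static_cast<int>(g.size());
    std::vector<std::vector<int>> preds(n);
    for (int u = 0; u < n; u++)
        for (int v : g[u])
            preds[v].push_back(u);

    inBody.assign(n, false);
    inBody[h] = true;
    std::vector<int> worklist;
    for (int u : preds[h]) // seed with sources of back edges u -> h
    {
        if (IsDfstAncestor(d, h, u) && !inBody[u])
        {
            inBody[u] = true;
            worklist.push_back(u);
        }
    }
    while (!worklist.empty()) // walk predecessors backwards, stopping at h
    {
        const int b = worklist.back();
        worklist.pop_back();
        if (!IsDfstAncestor(d, h, b))
            return false; // irreducible: noted, but not recorded as a loop
        for (int p : preds[b])
        {
            if (d.pre[p] != -1 && !inBody[p])
            {
                inBody[p] = true;
                worklist.push_back(p);
            }
        }
    }
    return true;
}
```

For example, with edges 0 -> 1, 1 -> 2, 2 -> 1, and 2 -> 3, the edge 2 -> 1 is a back edge and the loop headed by block 1 has body {1, 2}. With edges 0 -> 1, 0 -> 2, 1 -> 2, and 2 -> 1, the cycle can be entered at either block, and the walk reports it as irreducible.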

The next step is assigning edge likelihoods. Here we use some simple local
heuristics based on loop structure, returns, and throws. A final heuristic
gives slight preference to conditional branches that fall through to the
next IL offset.
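A sketch of how local heuristics like these can assign likelihoods to a two-way branch. It assumes a hypothetical `EdgeInfo` summary of each successor edge and a fixed priority order in which the first heuristic that tells the two edges apart decides. Both the order and the probability values are illustrative assumptions, not the values this commit uses.

```cpp
#include <cassert>

// Hypothetical per-edge facts; a real implementation would derive similar facts
// from block flags and the recognized loop structure.
struct EdgeInfo
{
    bool isBackEdge;    // target is the header of a loop containing the source
    bool isLoopExit;    // edge leaves a loop containing the source
    bool targetThrows;  // target block ends in a throw
    bool targetReturns; // target block ends in a return
    bool isFallThrough; // target is the next IL offset
};

// Likelihood that a two-way branch takes edge `a` rather than edge `b`.
// The first heuristic that distinguishes the edges decides; numbers are illustrative.
double TwoWayLikelihood(const EdgeInfo& a, const EdgeInfo& b)
{
    if (a.isBackEdge != b.isBackEdge)
        return a.isBackEdge ? 0.9 : 0.1; // prefer staying in the loop
    if (a.isLoopExit != b.isLoopExit)
        return a.isLoopExit ? 0.1 : 0.9; // avoid leaving the loop
    if (a.targetThrows != b.targetThrows)
        return a.targetThrows ? 0.01 : 0.99; // throws are rare
    if (a.targetReturns != b.targetReturns)
        return a.targetReturns ? 0.2 : 0.8; // returns are somewhat unlikely
    if (a.isFallThrough != b.isFallThrough)
        return a.isFallThrough ? 0.55 : 0.45; // slight fall-through preference
    return 0.5; // no heuristic applies
}
```

The other edge's likelihood is one minus the returned value. A strict priority order is the simplest design choice; Wu and Larus instead combine the predictions of several heuristics using Dempster-Shafer theory.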

After that we use loop nesting to compute the "cyclic probability" $cp$ for
each loop, visiting loops from inner to outer and the blocks within each loop
in RPO. $cp$ summarizes the effect of flow through the loop and around its
back edges. We cap $cp$ at 1000. When computing $cp$ for an outer loop we use
the already-computed $cp$ values of its inner loops.
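A sketch of one way to compute $cp$, assuming the Wu-Larus formulation (the commit does not spell out the formula). Send one unit of flow into the loop header, propagate it through the loop body in RPO using the edge likelihoods, and let $b$ be the total flow that returns to the header along back edges. Then $cp = \min\left(1000, 1/(1-b)\right)$. `Edge`, `CyclicProbability`, and the `innerCp` parameter are hypothetical names. `innerCp` carries the already-computed $cp$ of inner loops, which is why loops are processed inner to outer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flow edge with its already-assigned likelihood.
struct Edge
{
    int from;
    int to;
    double likelihood;
};

// bodyRpo lists the loop's blocks in RPO, header first. innerCp has one entry per
// block: the cp of the inner loop headed by that block, or 1.0 if it heads none.
double CyclicProbability(const std::vector<int>& bodyRpo, const std::vector<Edge>& edges,
                         const std::vector<double>& innerCp)
{
    const double maxCp  = 1000.0;
    const int    header = bodyRpo[0];

    std::vector<int> pos(innerCp.size(), -1); // RPO position within this loop, -1 if outside
    for (std::size_t i = 0; i < bodyRpo.size(); i++)
        pos[bodyRpo[i]] = static_cast<int>(i);

    std::vector<double> weight(innerCp.size(), 0.0);
    weight[header] = 1.0; // one unit of flow enters the header
    for (std::size_t i = 1; i < bodyRpo.size(); i++)
    {
        const int b = bodyRpo[i];
        double    w = 0.0;
        for (const Edge& e : edges)
        {
            // Forward in-loop edges only; inner back edges are summarized by innerCp.
            if (e.to == b && pos[e.from] != -1 && pos[e.from] < pos[b])
                w += weight[e.from] * e.likelihood;
        }
        weight[b] = w * innerCp[b];
    }

    double back = 0.0; // flow returning to the header along back edges
    for (const Edge& e : edges)
    {
        if (e.to == header && pos[e.from] != -1)
            back += weight[e.from] * e.likelihood;
    }
    if (back >= 1.0 - 1.0 / maxCp)
        return maxCp; // cap at 1000
    return 1.0 / (1.0 - back);
}
```

With a single latch that returns to the header with likelihood 0.9, $b = 0.9$ and $cp = 10$, so each entry into the loop executes the header ten times on average. A loop whose back edge is always taken hits the cap of 1000.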

Once all $cp$ values are known, we assign "external" input weights to method
and EH entry points, and then a final RPO walk computes the expected weight
of each block (and, via edge likelihoods, each edge).
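A sketch of the final RPO walk. It assumes blocks are already numbered in RPO, so in a reducible flowgraph an edge is a back edge exactly when its target does not come after its source. It also assumes that likelihoods and $cp$ values are known and that the caller chooses the external input weights (the weights given to EH entries are not specified here). `BlockWeights` and `EdgeWeight` are hypothetical names. Each block's weight is its external input plus the flow along forward in-edges, scaled by $cp$ when the block heads a loop.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical flow edge with its already-assigned likelihood.
struct Edge
{
    int from;
    int to;
    double likelihood;
};

// Assumes block i is the i-th block in RPO. external[b] is the input weight for
// method and EH entry points (0 elsewhere); cp[b] is the cp of the loop headed by b
// (1.0 if b heads no loop).
std::vector<double> BlockWeights(const std::vector<Edge>& edges, const std::vector<double>& external,
                                 const std::vector<double>& cp)
{
    std::vector<double> weight(external.size(), 0.0);
    for (int b = 0; b < static_cast<int>(external.size()); b++)
    {
        double w = external[b];
        for (const Edge& e : edges)
        {
            // Forward edges only; back-edge flow is already folded into cp.
            if (e.to == b && e.from < b)
                w += weight[e.from] * e.likelihood;
        }
        weight[b] = w * cp[b];
    }
    return weight;
}

// An edge's expected weight is its source block's weight times its likelihood.
double EdgeWeight(const std::vector<double>& weight, const Edge& e)
{
    return weight[e.from] * e.likelihood;
}
```

For a loop 1 -> 2 -> 1 entered from block 0 and exited to block 3 with likelihood 0.1, the header gets weight 10: one unit from the entry plus 9 along the back edge 2 -> 1. The exit block gets back the single unit that entered, so flow is conserved.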

We use the existing DFS code to build the DFST and the RPO, augmented by
some fixes to ensure all blocks (even ones in isolated cycles) get numbered.
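One way to make sure every block gets numbered, sketched here for a toy successor-list flowgraph: restart the DFS from any block that the walk from the entry did not reach. This is an assumption about the approach, not the JIT's actual fix, and `NumberAllBlocks` is a hypothetical name. The walk is iterative to avoid deep native recursion on large flowgraphs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy flowgraph: g[b] lists the successors of block b; block 0 is the entry.
using Graph = std::vector<std::vector<int>>;

// Assigns a postorder number to every block, starting from the entry and then
// restarting from each still-unvisited block (e.g. an isolated cycle).
std::vector<int> NumberAllBlocks(const Graph& g)
{
    const int         n = static_cast<int>(g.size());
    std::vector<int>  post(n, -1);
    std::vector<bool> visited(n, false);
    int               postCount = 0;

    for (int root = 0; root < n; root++)
    {
        if (visited[root])
            continue;

        // Each stack entry is (block, index of the next successor to try).
        std::vector<std::pair<int, std::size_t>> stack;
        stack.push_back({root, 0});
        visited[root] = true;
        while (!stack.empty())
        {
            const int    b    = stack.back().first;
            std::size_t& next = stack.back().second;
            if (next < g[b].size())
            {
                const int s = g[b][next];
                ++next;
                if (!visited[s])
                {
                    visited[s] = true;
                    stack.push_back({s, 0}); // may invalidate `next`; not used again this iteration
                }
            }
            else
            {
                post[b] = postCount++;
                stack.pop_back();
            }
        }
    }
    return post;
}
```

With edges 0 -> 1 plus an isolated cycle 2 -> 3 -> 2, a walk from the entry alone would leave blocks 2 and 3 unnumbered. Restarting the walk gives them postorder numbers after the reachable blocks.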

This initial version is intended to establish the right functionality, enable
wider correctness testing, and provide a foundation for refining the
heuristics. It is not yet as efficient as it could be.

The loop recognition and recording done here overlaps with similar code
elsewhere in the JIT. The version here is structural and not sensitive to IL
patterns, so it is arguably more general and, I believe, a good deal simpler
than the lexically driven recognition used by the current loop optimizer.
I aspire to reconcile the two in future work.

All of this is disabled by default. A new config option, JitSynthesizeCounts
(honored only in DEBUG builds), enables using synthesis to set block weights
either for all root methods (1) or only for those without PGO data (2).

Synthesis for inlinees is not yet enabled; progress here is blocked by #82755.
AndyAyersMS committed Mar 6, 2023
1 parent 340d1f2 commit 73b2a0f
Showing 7 changed files with 1,051 additions and 3 deletions.
src/coreclr/jit/CMakeLists.txt (2 changes: 2 additions, 0 deletions)
@@ -113,6 +113,7 @@ set( JIT_SOURCES
   fginline.cpp
   fgopt.cpp
   fgprofile.cpp
+  fgprofilesynthesis.cpp
   fgstmt.cpp
   flowgraph.cpp
   forwardsub.cpp
@@ -295,6 +296,7 @@ set( JIT_HEADERS
   emitjmps.h
   emitpub.h
   error.h
+  fgprofilesynthesis.h
   gentree.h
   gentreeopsdef.h
   gtlist.h
src/coreclr/jit/block.cpp (4 changes: 2 additions, 2 deletions)
@@ -1431,15 +1431,15 @@ BasicBlock* Compiler::bbNewBasicBlock(BBjumpKinds jumpKind)
     /* Give the block a number, set the ancestor count and weight */

     ++fgBBcount;
-    ++fgBBNumMax;

     if (compIsForInlining())
     {
         block->bbNum = ++impInlineInfo->InlinerCompiler->fgBBNumMax;
+        fgBBNumMax = block->bbNum;
     }
     else
     {
-        block->bbNum = fgBBNumMax;
+        block->bbNum = ++fgBBNumMax;
     }

     if (compRationalIRForm)
src/coreclr/jit/compiler.h (2 changes: 2 additions, 0 deletions)
@@ -86,6 +86,7 @@ class CSE_DataFlow; // defined in OptCSE.cpp
 class OptBoolsDsc; // defined in optimizer.cpp
 struct RelopImplicationInfo; // defined in redundantbranchopts.cpp
 struct JumpThreadInfo; // defined in redundantbranchopts.cpp
+class ProfileSynthesis; // defined in profilesynthesis.h
 #ifdef DEBUG
 struct IndentStack;
 #endif
@@ -1993,6 +1994,7 @@ class Compiler
     friend class SharedTempsScope;
     friend class CallArgs;
     friend class IndirectCallTransformer;
+    friend class ProfileSynthesis;

 #ifdef FEATURE_HW_INTRINSICS
     friend struct HWIntrinsicInfo;
src/coreclr/jit/fgprofile.cpp (16 changes: 16 additions, 0 deletions)
@@ -7,6 +7,8 @@
 #pragma hdrstop
 #endif

+#include "fgprofilesynthesis.h"
+
 // Flowgraph Profile Support

 //------------------------------------------------------------------------
@@ -2425,6 +2427,20 @@ PhaseStatus Compiler::fgIncorporateProfileData()
         return PhaseStatus::MODIFIED_EVERYTHING;
     }

+#ifdef DEBUG
+    // Optionally just run synthesis
+    //
+    if ((JitConfig.JitSynthesizeCounts() > 0) && !compIsForInlining())
+    {
+        if ((JitConfig.JitSynthesizeCounts() == 1) || ((JitConfig.JitSynthesizeCounts() == 2) && !fgHaveProfileData()))
+        {
+            JITDUMP("Synthesizing profile data\n");
+            ProfileSynthesis::Run(this, ProfileSynthesisOption::AssignLikelihoods);
+            return PhaseStatus::MODIFIED_EVERYTHING;
+        }
+    }
+#endif
+
     // Do we have profile data?
     //
     if (!fgHaveProfileData())