JIT: initial implementation of profile synthesis #82926

Merged Mar 6, 2023 (4 commits)

Commits on Mar 3, 2023

  1. JIT: initial implementation of profile synthesis

    Implements a profile synthesis algorithm based on the classic Wu-Larus
    paper (Static branch frequency and program profile analysis, Micro-27,
    1994), with a simple set of heuristics.
    
    The first step is construction of a depth-first spanning tree (DFST) for the
    flowgraph and the corresponding reverse postorder (RPO). Together these drive
    loop recognition; currently we only recognize reducible loops. We use DFST
    (non-) ancestry as a proxy for (non-) domination: every dominator of a node
    must be a DFST ancestor of that node, so no explicit dominance computation is
    needed. Irreducible loops are noted but ignored. Loop features like entry,
    back, and exit edges, body sets, and nesting are computed and saved.
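
The DFST, RPO, and ancestry-based loop recognition described above can be sketched roughly as follows. This is a minimal illustration in Python, not the JIT's C++ code: the function names (`dfs_orders`, `is_ancestor`, `natural_loop`, `find_loops`) and the dictionary-based graph representation are assumptions made for the example. An edge is treated as a back edge when its target is a DFST ancestor of its source, and a loop is reported as irreducible when a block that reaches the back edge is not a DFST descendant of the header.

```python
def dfs_orders(succs, entry):
    """Iterative DFS. Returns (preorder numbers, postorder numbers, RPO list)."""
    pre, post, order = {entry: 0}, {}, []
    stack = [(entry, iter(succs.get(entry, ())))]
    while stack:
        node, it = stack[-1]
        for s in it:
            if s not in pre:
                pre[s] = len(pre)
                stack.append((s, iter(succs.get(s, ()))))
                break
        else:
            # All successors visited: assign the postorder number.
            post[node] = len(post)
            order.append(node)
            stack.pop()
    return pre, post, order[::-1]


def is_ancestor(a, b, pre, post):
    """True if a is a DFST ancestor of b (a block is its own ancestor)."""
    return pre[a] <= pre[b] and post[a] >= post[b]


def natural_loop(header, back_sources, preds, pre, post):
    """Walk predecessors backward from the back-edge sources.
    Returns the body set, or None if the walk escapes the header's subtree."""
    body = {header}
    work = list(back_sources)
    while work:
        n = work.pop()
        if n in body:
            continue
        if not is_ancestor(header, n, pre, post):
            return None  # entered other than via the header: irreducible
        body.add(n)
        work.extend(preds.get(n, ()))
    return body


def find_loops(succs, entry):
    """Returns (rpo, {header: body set}, [headers of irreducible loops])."""
    pre, post, rpo = dfs_orders(succs, entry)
    preds, back = {}, {}
    for src in rpo:
        for dst in succs.get(src, ()):
            preds.setdefault(dst, []).append(src)
            if is_ancestor(dst, src, pre, post):
                back.setdefault(dst, []).append(src)  # back edge src -> dst
    loops, irreducible = {}, []
    for header in rpo:
        if header in back:
            body = natural_loop(header, back[header], preds, pre, post)
            if body is None:
                irreducible.append(header)  # noted but ignored
            else:
                loops[header] = body
    return rpo, loops, irreducible
```

For a graph such as `{"A": ["B"], "B": ["C", "D"], "C": ["B"], "D": []}`, `find_loops` reports a single loop headed by `B` with body `{"B", "C"}`.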
    
    Next step is assignment of edge likelihoods. Here we use some simple local
    heuristics based on loop structure, returns, and throws. A final heuristic
    gives slight preference to conditional branches that fall through to the
    next IL offset.
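
As a rough illustration of how local heuristics like these might assign likelihoods to the two edges of a conditional branch, consider the sketch below. Everything in it is an assumption made for the example: the flag names, the rule ordering, the direction of each preference, and all numeric values are hypothetical and are not the constants used by the JIT.

```python
# Hypothetical rules, checked in order. Each entry is (flag, likelihood given
# to the edge that has the flag when the other edge lacks it).
RULES = [
    ("throws", 0.0),     # assumed: edges leading to a throw are almost never taken
    ("back_edge", 0.9),  # assumed: loops tend to iterate
    ("loop_exit", 0.1),  # assumed: loop exits are unlikely
    ("returns", 0.2),    # assumed: edges leading to a return are less likely
]
FALL_THROUGH_BIAS = 0.52  # assumed slight preference for the fall-through edge


def cond_likelihoods(taken, fall_through):
    """taken / fall_through: sets of flags describing each successor edge.
    Returns (likelihood of taken edge, likelihood of fall-through edge)."""
    for flag, p in RULES:
        # A rule applies only when it distinguishes the two edges.
        if (flag in taken) != (flag in fall_through):
            return (p, 1.0 - p) if flag in taken else (1.0 - p, p)
    # No rule applies: slightly prefer falling through to the next IL offset.
    return (1.0 - FALL_THROUGH_BIAS, FALL_THROUGH_BIAS)
```

The two likelihoods always sum to 1, so the result can be fed directly into the later propagation steps.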
    
    After that we use loop nesting to compute the "cyclic probability" $cp$ for
    each loop, working from inner loops to outer loops and in RPO within each
    loop. $cp$ summarizes the effect of flow through the loop and around loop back
    edges. We cap $cp$ at 1000. When finding $cp$ for outer loops we use the
    already-computed $cp$ of the inner loops.
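
A sketch of the $cp$ computation for one loop, following the Wu-Larus formulation: give the header weight 1, propagate through the body in RPO, sum the flow $p$ that returns to the header along back edges, and take $cp = 1/(1-p)$. The function name, argument shapes, and `CP_CAP` constant are hypothetical; only the cap value of 1000 comes from the text above. The sketch assumes a reducible loop whose body is given in RPO with the header first.

```python
CP_CAP = 1000.0  # cap on cp, as stated in the description


def cyclic_probability(header, body_rpo, edges, inner_cp=None):
    """header: loop header block.
    body_rpo: loop blocks in reverse postorder, header first.
    edges: {block: [(successor, likelihood), ...]}.
    inner_cp: {inner loop header: cp} for nested loops already processed."""
    inner_cp = inner_cp or {}
    pos = {b: i for i, b in enumerate(body_rpo)}
    weight = {b: 0.0 for b in body_rpo}
    weight[header] = 1.0
    back_flow = 0.0
    for block in body_rpo:
        if block != header and block in inner_cp:
            # Entering a nested loop: scale by its cyclic probability.
            weight[block] *= inner_cp[block]
        w = weight[block]
        for succ, p in edges.get(block, ()):
            if succ == header:
                back_flow += w * p          # flow around a back edge
            elif succ in pos and pos[succ] > pos[block]:
                weight[succ] += w * p       # forward edge inside the loop
            # Inner-loop back edges and loop exits contribute nothing here.
    if back_flow >= 1.0:
        return CP_CAP                       # loop never exits per the likelihoods
    return min(1.0 / (1.0 - back_flow), CP_CAP)
```

For example, a loop whose only back edge has likelihood 0.9 gets a $cp$ of about 10, meaning the header is expected to execute roughly ten times per loop entry.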
    
    Once all $cp$ values are known, we assign "external" input weights to method
    and EH entry points, and then a final RPO walk computes the expected weight
    of each block (and, via edge likelihoods, each edge).
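
The final walk can be sketched as below, assuming the $cp$ values and edge likelihoods are already available. The function name and data shapes are hypothetical. Each block accumulates weight from its forward incoming edges; a loop header's accumulated weight is scaled by its $cp$, which accounts for the back edges, and each edge weight is the source block's weight times the edge likelihood.

```python
def propagate_weights(rpo, edges, cp, entry_weights):
    """rpo: all blocks in reverse postorder.
    edges: {block: [(successor, likelihood), ...]}.
    cp: {loop header: cyclic probability}.
    entry_weights: {entry block: external input weight} (method and EH entries).
    Returns ({block: weight}, {(src, dst): edge weight})."""
    pos = {b: i for i, b in enumerate(rpo)}
    weight = {b: float(entry_weights.get(b, 0.0)) for b in rpo}
    edge_weight = {}
    for block in rpo:
        # All forward predecessors come earlier in RPO, so the incoming
        # weight is complete here; loop headers are scaled by their cp.
        weight[block] *= cp.get(block, 1.0)
        for succ, p in edges.get(block, ()):
            edge_weight[(block, succ)] = weight[block] * p
            if pos[succ] > pos[block]:
                weight[succ] += weight[block] * p  # forward edge only
    return weight, edge_weight
```

With an entry weight of 100, a loop with $cp$ of 10, and an exit likelihood of 0.1, the header gets a weight of about 1000 and the block after the loop gets about 100, so the flow leaving the loop matches the flow entering it.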
    
    We use the existing DFS code to build the DFST and the RPO, augmented by
    some fixes to ensure all blocks (even ones in isolated cycles) get numbered.
    
    This initial version is intended to establish the right functionality, enable
    wider correctness testing, and to provide a foundation for refinement of the
    heuristics. It is not yet as efficient as it could be.
    
    The loop recognition and recording done here overlaps with similar code
    elsewhere in the JIT. The version here is structural and not sensitive to IL
    patterns, so it is arguably more general and, I believe, a good deal simpler
    than the lexically driven recognition we use for the current loop optimizer.
    I aspire to reconcile the two in future work.
    
    All this is disabled by default; a new config option enables synthesis to
    set block weights either for all root methods or only for those without PGO
    data.
    
    Synthesis for inlinees is not yet enabled; progress here is blocked by dotnet#82755.
    AndyAyersMS committed Mar 3, 2023
    a2ecd56
  2. fix

    AndyAyersMS committed Mar 3, 2023
    84cf8ed

Commits on Mar 6, 2023

  1. 1e90ed7
  2. review feedback

    AndyAyersMS committed Mar 6, 2023
    bfb77ad