Dynamic PGO work planned for .NET 7 #64659

Closed
10 tasks done
AndyAyersMS opened this issue Feb 2, 2022 · 5 comments
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), User Story (a single user-facing feature; can be grouped under an epic)
Milestone: 7.0.0


@AndyAyersMS (Member)

AndyAyersMS commented Feb 2, 2022

This issue captures the planned work for .NET 7. This list is expected to change throughout the release cycle according to ongoing planning and discussions, with possible additions and subtractions to the scope.

Summary

Dynamic PGO is an opt-in feature introduced in .NET 6. In .NET 7 we will work on making Dynamic PGO easier to use, applicable to more scenarios, and offering more performance improvements.

As part of this work, we will also enable On-Stack Replacement (OSR) and turn QuickJitForLoops on by default for x64 and arm64. This will improve Dynamic PGO by including methods with loops in the Tier0 set. It will also notably improve startup time for apps with a JIT-intensive startup phase.

Planned for .NET 7

Under Consideration

  • (EgorBo, WIP for Native AOT) Add no-fallback guarded devirtualization (GDV) when the runtime can provide the full set of possible classes

Cut from .NET 7 Plans

AndyAyersMS added the Epic and area-CodeGen-coreclr labels Feb 2, 2022
AndyAyersMS added this to the 7.0.0 milestone Feb 2, 2022
@ghost

ghost commented Feb 2, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.


dotnet-issue-labeler bot added the untriaged label Feb 2, 2022
JulieLeeMSFT added the User Story label and removed the Epic and untriaged labels Feb 2, 2022
@manofstick (Contributor)

manofstick commented Apr 12, 2022

Hi!

...Sorry if this isn't the correct place for this post...

I've been playing around with PGO in .NET 6 with mostly excellent results, but...

I've hit one case where it appears (to ignorant me) that the statistics-collecting method isn't being replaced by the optimized version (at least, that is my simple understanding of what PGO should be doing...).

Anyway, I'm not sure what information I can provide. I tried creating a simple test case (with BenchmarkDotNet), but in the simple case it appeared to do the "correct thing", i.e. it ran at roughly equivalent speed.

But in my "production" case it consistently, without fail, produces the slow behavior.

The method is generated using expression trees (i.e. a compiled lambda), but it is just a very simple object construction from a BinaryReader, equivalent to the following C#:

SomeBusinessType ReadItem(BinaryReader reader)
{
    return new SomeBusinessType (
        reader.ReadUInt16(),
        CustomSerializationMethods.Methods.binaryReadCustomType(reader),
        reader.ReadSingle()
    );
}

SomeBusinessType is an F# struct, but it doesn't appear to be anything out of the ordinary.

It works fine and performs well in "normal" .NET 6, or with the PGO environment variables set in a Debug build (although I assume that's just because a Debug build doesn't engage the PGO engine?).

[Profiler screenshot]

As you can see in the profiler screenshot, the compiled lambda and a function named JIT_ClassProfile32 completely dominate.

(It's called in a tight loop, populating an ImmutableArray<>.Builder, running on multiple threads in Tasks started with Task.Run(). After a successful, albeit slow, load, restarting the loading process from within the same process still runs slowly, i.e. the function is still not being swapped out.)

Anyway, it's proprietary code, and given that my simplified version didn't exhibit the same issue, I'm not sure what I can offer, but let me know if you need further information.

Edit: the code that generates the deserializer is used on numerous types, i.e. standard reflection-based expression tree creation. In the "production" run it seems to work fine for about a dozen types, running slowly for only a single one. And as noted above, even that one worked as expected when run in an isolated project. Hmm.
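For readers unfamiliar with the pattern, here is a minimal sketch of building such a deserializer with reflection and expression trees. The proprietary SomeBusinessType and CustomSerializationMethods are not shown in the thread, so `Sample` and `ReaderFactory` below are hypothetical stand-ins, and the custom-type read is omitted; the sketch only shows how a lambda equivalent to the C# above is assembled and compiled.

```csharp
using System;
using System.IO;
using System.Linq.Expressions;

// Hypothetical stand-in for SomeBusinessType (the real type is proprietary).
public readonly struct Sample
{
    public Sample(ushort id, float value) { Id = id; Value = value; }
    public ushort Id { get; }
    public float Value { get; }
}

public static class ReaderFactory
{
    // Uses reflection to build and compile the equivalent of:
    //   reader => new Sample(reader.ReadUInt16(), reader.ReadSingle())
    public static Func<BinaryReader, Sample> Build()
    {
        ParameterExpression reader = Expression.Parameter(typeof(BinaryReader), "reader");
        var ctor = typeof(Sample).GetConstructor(new[] { typeof(ushort), typeof(float) })!;
        var readUInt16 = typeof(BinaryReader).GetMethod(nameof(BinaryReader.ReadUInt16), Type.EmptyTypes)!;
        var readSingle = typeof(BinaryReader).GetMethod(nameof(BinaryReader.ReadSingle), Type.EmptyTypes)!;

        // Constructor arguments are evaluated left to right, matching the stream layout.
        NewExpression body = Expression.New(
            ctor,
            Expression.Call(reader, readUInt16),
            Expression.Call(reader, readSingle));

        return Expression.Lambda<Func<BinaryReader, Sample>>(body, reader).Compile();
    }

    static void Main()
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream, System.Text.Encoding.UTF8, leaveOpen: true))
        {
            writer.Write((ushort)7);
            writer.Write(1.5f);
        }
        stream.Position = 0;

        Func<BinaryReader, Sample> read = Build();
        using var binaryReader = new BinaryReader(stream);
        Sample s = read(binaryReader);
        Console.WriteLine($"{s.Id} {s.Value}");
    }
}
```

The compiled delegate is built once per type and then invoked once per record inside the loading loop.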

@AndyAyersMS (Member, Author)

...Sorry if this isn't the correct place for this post...

No worries, this is a good place.

What environment variables are you setting? If you've set COMPlus_TC_QuickJitForLoops=1, then methods can get "trapped" in Tier0 code for long stretches of time, which would be consistent with what you're seeing above.

If so, the general fix for this is On-Stack Replacement (OSR), which will be on by default in .NET 7 Preview 4.

There is an unsupported early version of OSR in .NET 6, which may or may not work... feel free to give it a try by setting COMPlus_TC_OnStackReplacement=1.

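You can also mitigate this in source if you can restructure the code so that your loop is tiled into a two-level loop, where the inner loop is in a helper method and the outer loop runs at least 30 iterations (e.g. split 10,000 iterations into 100 x 100). That way the helper is called often enough to reach the tier-up call count threshold (30 calls by default) and can be rejitted with optimizations. I realize this may not be easy if you don't have an explicit loop.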
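The tiling suggested above might look roughly like this in C#. `TiledLoop`, `Process`, `SumChunk` and the chunk size of 100 are hypothetical, chosen to mirror the 100 x 100 split in the comment; this is a sketch of the restructuring, not code from the thread.

```csharp
using System;

// Hypothetical example: the names below are illustrative, not from the issue.
static class TiledLoop
{
    // Stand-in for the real per-item work (e.g. deserializing one record).
    static long Process(int i) => (long)i * i;

    // Original shape: a single method containing one long loop. With
    // QuickJitForLoops on and no OSR, this method can stay in Tier0 code
    // for the entire loop, since it is only called once.
    public static long SumFlat(int count)
    {
        long total = 0;
        for (int i = 0; i < count; i++)
            total += Process(i);
        return total;
    }

    // Inner loop moved into a helper. The helper is called once per chunk,
    // so after enough calls it becomes eligible to be rejitted with optimizations.
    static long SumChunk(int start, int end)
    {
        long total = 0;
        for (int i = start; i < end; i++)
            total += Process(i);
        return total;
    }

    // Outer loop: 10,000 items with chunkSize 100 gives 100 helper calls,
    // comfortably above the 30 iterations suggested in the thread.
    public static long SumTiled(int count, int chunkSize = 100)
    {
        long total = 0;
        for (int start = 0; start < count; start += chunkSize)
            total += SumChunk(start, Math.Min(start + chunkSize, count));
        return total;
    }

    static void Main()
    {
        // Both shapes compute the same result; only the tiering behavior differs.
        Console.WriteLine(SumFlat(10_000) == SumTiled(10_000)); // True
    }
}
```

Only the helper benefits: the outer loop still runs as Tier0 code, but it does very little work per iteration. Call counting and rejitting are handled by the runtime in the background, so the first few chunks may still run the slower Tier0 version of the helper.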

@manofstick (Contributor)

Yes, indeed I had COMPlus_TC_QuickJitForLoops set, and setting COMPlus_TC_OnStackReplacement did solve this. Thanks.

I did try splitting the loop, but couldn't seem to force it out of Tier0. I also looked at setting AggressiveOptimization on the method, but the method was in F#, and I found that the F# compiler currently doesn't propagate that attribute to the generated IL, so that was a dead end too.
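For comparison, this is how the attribute is applied in C#, plus a reflection check that one could use to see whether a compiler actually emitted the flag into the method's metadata (the situation described above for F#). `HotLoop` and `HasAggressiveOptimization` are hypothetical names. As we understand it, a method marked this way is compiled fully optimized on its first call and skips tiering, which also means it is not instrumented by Dynamic PGO.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical example class; only the attribute usage is the point.
static class HotLoop
{
    // Asks the JIT to compile this method fully optimized right away,
    // bypassing Tier0 (and, with it, Tier0 PGO instrumentation).
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static long SumSquares(int count)
    {
        long total = 0;
        for (int i = 0; i < count; i++)
            total += (long)i * i;
        return total;
    }

    // MethodImpl is stored as method implementation flags in metadata,
    // so this reports whether the compiler actually emitted the flag.
    public static bool HasAggressiveOptimization(MethodInfo method) =>
        (method.MethodImplementationFlags & MethodImplAttributes.AggressiveOptimization) != 0;

    static void Main()
    {
        var method = typeof(HotLoop).GetMethod(nameof(SumSquares))!;
        Console.WriteLine(HasAggressiveOptimization(method)); // True
        Console.WriteLine(SumSquares(10_000));                // 333283335000
    }
}
```

Pointing the same check at the F# method would show whether the flag is missing from its IL metadata.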

Anyway, my main concern, that this could have been an issue going forward, has been alleviated, so thanks for that.

@AndyAyersMS (Member, Author)

Work for .NET 7 is completed.

ghost locked this issue as resolved and limited conversation to collaborators Aug 31, 2022
Projects: Archived in project
Development: No branches or pull requests
3 participants