Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible regression with [ThreadStatic] in debug builds on .NET 9 RC1 #107728

Closed
alexyakunin opened this issue Sep 12, 2024 · 5 comments · Fixed by #107806
Closed

Possible regression with [ThreadStatic] in debug builds on .NET 9 RC1 #107728

alexyakunin opened this issue Sep 12, 2024 · 5 comments · Fixed by #107806
Assignees
Labels
area-VM-coreclr in-pr There is an active PR which will close this issue when it is merged tenet-performance Performance related issue

Comments

@alexyakunin
Copy link

alexyakunin commented Sep 12, 2024

Description

One of our tests started to run ~20x slower on .NET 9 RC1, if it's compiled in Debug configuration.

That's where it spends most of its time:

image

There is another branch with similar ~38%, and it's also on the same ThreadRandom.get_Instance call.

ThreadRandom looks as follows:

public static class ThreadRandom
{
    [ThreadStatic] private static Random? _instance;

    public static Random Instance {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _instance ??= CreateInstance();
    }

    private static Random CreateInstance()
        => new(Random.Shared.Next());
}

What's weird is that all of my attempts to reproduce this perf. regression on isolated console app failed, even though a console app running the code of this specific unit test also manifests the regression.

Configuration

Windows 11 machine.

Regression?

Yes, it looks like this is a perf. regression.

Data

No additional data.

Analysis

Not sure what it could be.

@alexyakunin alexyakunin added the tenet-performance Performance related issue label Sep 12, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Sep 12, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Sep 12, 2024
@alexyakunin alexyakunin changed the title Possible regression with [ThreadStatic] in debug builds Possible regression with [ThreadStatic] in debug builds on .NET 9 RC1 Sep 12, 2024
@teo-tsirpanis teo-tsirpanis added area-VM-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Sep 12, 2024
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@mangod9
Copy link
Member

mangod9 commented Sep 12, 2024

Thanks for reporting the issue @alexyakunin. Can you please clarify whether you are comparing the perf with .NET 8 or a previous preview version of 9? That would help narrow down the issue.

Adding @davidwrighton since there was some work on ThreadStatics recently.

@davidwrighton davidwrighton self-assigned this Sep 12, 2024
@alexyakunin
Copy link
Author

Yes, I'm comparing the perf. of .NET 9 RC1 vs .NET 8 (8.0.8).

@davidwrighton
Copy link
Member

I've confirmed this is a regression in CheckRunClassInitThrowing and several parts of our system could use some checks before doing expensive work. The regression would only likely be visible in a multithreaded application where multiple threads are accessing thread statics, and all of the thread static access was compiled as a DEBUG build, and a few other details..., which is probably why your isolated repro didn't show the slowdown. Our current performance tests do not cover this scenario.

davidwrighton added a commit to davidwrighton/runtime that referenced this issue Sep 13, 2024
- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes dotnet#107728
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Sep 13, 2024
davidwrighton added a commit that referenced this issue Sep 13, 2024
- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes #107728
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Sep 13, 2024
github-actions bot pushed a commit that referenced this issue Sep 13, 2024
…ClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes #107728
jkotas pushed a commit that referenced this issue Sep 14, 2024
…ClassInitThrowing` didn't check to see if the class had been initialized before taking a lock - `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated - And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all. (#107817)

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes #107728

Co-authored-by: David Wrighton <davidwr@microsoft.com>
@alexyakunin
Copy link
Author

Thanks, guys - that was quick!

jtschuster pushed a commit to jtschuster/runtime that referenced this issue Sep 17, 2024
…et#107806)

- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes dotnet#107728
sirntar pushed a commit to sirntar/runtime that referenced this issue Sep 30, 2024
…et#107806)

- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes dotnet#107728
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-VM-coreclr in-pr There is an active PR which will close this issue when it is merged tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants