Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix perf problems found in investigation of issue #107728 #107806

Merged
merged 1 commit into from
Sep 13, 2024

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Sep 13, 2024

  • CheckRunClassInitThrowing didn't check to see if the class had been initialized before taking a lock
  • EnsureTlsIndexAllocated didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
  • And finally JIT_GetNonGCThreadStaticBaseOptimized and JIT_GetGCThreadStaticBaseOptimized were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does multithreaded work and is forced into the paths which hit this issue...

Runtime Time
.NET 8 00.9414682 s
.NET 9 before this fix 22.8079382 s
.NET 9 with this fix 00.2004539 s

Fixes #107728

- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes dotnet#107728
@davidwrighton
Copy link
Member Author

Benchmark app for reference

using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace DebugThreadStaticPerf
{
    internal class Program
    {
        [ThreadStatic]
        public static int X;
        public volatile static int ReadyThreads;
        public static int ReadyThreadCount = 6;
        public static int IterationCount = 100000;
        public static bool PreJit = WaitAndExecuteOnReadyThreads(1, 1);

        public volatile static int DoneThreads;

        [MethodImpl(MethodImplOptions.NoOptimization)]
        public static ref int GetThreadStatic()
        {
            return ref X;
        }

        [MethodImpl(MethodImplOptions.NoOptimization)]
        public static void ThreadProc()
        {
            WaitAndExecuteOnReadyThreads(ReadyThreadCount, IterationCount);
        }

        [MethodImpl(MethodImplOptions.NoOptimization)]
        public static bool WaitAndExecuteOnReadyThreads(int readyThreadCount, int iterationCount)
        {
            Interlocked.Increment(ref ReadyThreads);
            while (ReadyThreads != readyThreadCount)
            {

            }

            for (int i = 0; i < iterationCount; i++)
            {
                GetThreadStatic() = GetThreadStatic() + 1;
            }

            Interlocked.Increment(ref DoneThreads);

            return true;
        }

        [MethodImpl(MethodImplOptions.NoOptimization)]
        public static void TestFunction()
        {
            DoneThreads = 0;
            ReadyThreads = 0;

            List<Thread> threads = new List<Thread>();
            for (int i = 0; i < ReadyThreadCount; i++)
            {
                threads.Add(new Thread(ThreadProc));
                threads[i].Start();
            }

            while (DoneThreads != ReadyThreadCount)
            { }
        }

        [MethodImpl(MethodImplOptions.NoOptimization)]
        static Stopwatch RunTests(int iterationCount)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (int i = 0; i < iterationCount; i++)
                TestFunction();
            sw.Stop();
            return sw;
        }

        static void Main(string[] args)
        {
            RunTests(1);

            Console.WriteLine(RunTests(100).ToString());
            Console.WriteLine(RunTests(100).ToString());
            Console.WriteLine(RunTests(100).ToString());
            Console.WriteLine(RunTests(100).ToString());
            Console.WriteLine(RunTests(100).ToString());
        }
    }
}

@davidwrighton davidwrighton merged commit d58b1ca into dotnet:main Sep 13, 2024
88 of 90 checks passed
@davidwrighton
Copy link
Member Author

/backport to release/9.0

Copy link
Contributor

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10857243727

jtschuster pushed a commit to jtschuster/runtime that referenced this pull request Sep 17, 2024
…et#107806)

- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes dotnet#107728
sirntar pushed a commit to sirntar/runtime that referenced this pull request Sep 30, 2024
…et#107806)

- `CheckRunClassInitThrowing` didn't check to see if the class had been initialized before taking a lock
- `EnsureTlsIndexAllocated` didn't check if the Tls index had been allocated before setting the flag via an expensive Interlocked call to indicate that it had been allocated
- And finally `JIT_GetNonGCThreadStaticBaseOptimized` and `JIT_GetGCThreadStaticBaseOptimized` were missing the fast paths which avoided even calling those apis at all.

Perf with a small benchmark which does complex multithreaded work...

| Runtime | Time |
| ---- | ---- |
| .NET 8 | 00.9414682 s |
| .NET 9 before this fix |  22.8079382 |
| .NET 9 with this fix | 00.2004539 |

Fixes dotnet#107728
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible regression with [ThreadStatic] in debug builds on .NET 9 RC1
2 participants