Culture sensitive comparison performance on Linux #26054

Open
sebastienros opened this issue May 2, 2018 · 22 comments
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue

@sebastienros
Member

After noticing that the choice of string comparison algorithm has a very large impact when sorting a list of business objects, I decided to run a benchmark to analyze the differences between Linux and Windows.
The code is here: https://github.com/sebastienros/stringbenchmarks/blob/master/Startup.cs

Result:

              Description |     RPS |
------------------------- |--------:|
        Linux - CompareTo | 147,252 |
   Linux - CompareOrdinal | 317,025 |
      Windows - CompareTo | 293,785 |
 Windows - CompareOrdinal | 364,015 |

CompareTo is expected to be slower than CompareOrdinal, and I am not questioning that, but on Linux CompareTo reaches only 46% of the CompareOrdinal throughput, while on Windows it reaches 81%. This could have a significant impact on ASP.NET, which uses it extensively. In the TechEmpower Fortunes scenario, on our 12-core machine, we saw throughput improve by a factor of 3 when sorting the results using ordinal comparison (70K RPS to 216K RPS), so the impact seems to be even bigger than these micro-benchmark differences suggest.
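For readers who want to see the two sort variants side by side, here is a minimal sketch. It is not the code from the linked Startup.cs; the class name `SortDemo`, the helper `SortWith`, and the sample strings are made up for illustration. It only shows how a culture-sensitive sort (the rules `CompareTo` follows) and an ordinal sort can be requested explicitly.

```csharp
using System;
using System.Collections.Generic;

public static class SortDemo
{
    // Hypothetical helper: sorts a copy of the input with the given comparer.
    public static List<string> SortWith(IEnumerable<string> items, IComparer<string> comparer)
    {
        var copy = new List<string>(items);
        copy.Sort(comparer);
        return copy;
    }

    public static void Main()
    {
        var messages = new[] { "b", "A", "a", "B" };

        // Culture-sensitive: the same rules string.CompareTo(string) uses,
        // and what List<string>.Sort() does when no comparer is passed.
        var cultural = SortWith(messages, StringComparer.CurrentCulture);

        // Ordinal: compares UTF-16 code units, so uppercase ASCII letters
        // sort before lowercase ones.
        var ordinal = SortWith(messages, StringComparer.Ordinal);

        Console.WriteLine(string.Join(", ", cultural)); // order depends on culture data
        Console.WriteLine(string.Join(", ", ordinal));  // A, B, a, b
    }
}
```

The ordinal order can differ from what users expect in a UI, so switching comparers is only appropriate when the sort order is not user-visible or is defined ordinally.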

@danmoseley
Member

@stephentoub

@danmoseley
Member

Trying to find our perf tests for these..

@danmoseley
Member

danmoseley commented May 3, 2018

The tests are under Tests.System.Tests...

@sebastienros do you have evidence that this ratio is any different to 2.0? From the benchmarks we have, I do see that string comparison on Linux is significantly slower than Windows, but much faster than 2.0, and the ratio has improved.

[edit: that's comparing Linux with Windows for the same comparer, which was not what you flagged. Nevertheless, the question remains: has this changed since 2.0?]

@sebastienros
Member Author

I could find some culture tests on benchview, but none for string comparison.
I ran the same application on .NET Core 2.0 too, and I can see some regression in an ASP.NET app, so maybe a micro-benchmark would show different results.

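              Description | RPS - 2.0 | RPS - 2.1 | Delta |
------------------------- |----------:|----------:|------:|
        Linux - CompareTo |    167931 |    147252 |  -12% |
   Linux - CompareOrdinal |    319786 |    317025 |   -1% |
      Windows - CompareTo |    371181 |    293785 |  -21% |
 Windows - CompareOrdinal |    471728 |    364015 |  -23% |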

So you are right that the gap is smaller on 2.1 than on 2.0, but not for a good reason.

@danmoseley
Member

@sebastienros the link above shows CoreFX perf results. Of course the tests may be poor (they certainly run too few iterations), but they show improvements over 2.0 for Linux.

Can you repeat your benchmark, without ASP.NET in the picture -- just a console app?

@danmoseley
Member

Ideally with BenchmarkDotNet.

@sebastienros
Member Author

With BenchmarkDotNet we can also see regressions on CompareTo but not on CompareOrdinal.

Linux 2.0

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT


         Method |       Mean |     Error |    StdDev |
--------------- |-----------:|----------:|----------:|
      CompareTo | 4,320.8 ns | 59.219 ns | 55.394 ns |
 CompareOrdinal |   468.5 ns |  2.515 ns |  2.229 ns |

Linux 2.1

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT


         Method |       Mean |     Error |    StdDev |
--------------- |-----------:|----------:|----------:|
      CompareTo | 4,910.9 ns | 18.723 ns | 14.617 ns |
 CompareOrdinal |   443.1 ns |  3.340 ns |  2.961 ns |

Windows 2.0

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.14393.2189 (1607/AnniversaryUpdate/Redstone1)
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
Frequency=3410079 Hz, Resolution=293.2483 ns, Timer=TSC
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.0.6 (CoreCLR 4.6.26212.01, CoreFX 4.6.26212.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.26212.01, CoreFX 4.6.26212.01), 64bit RyuJIT


         Method |       Mean |      Error |     StdDev |
--------------- |-----------:|-----------:|-----------:|
      CompareTo | 3,277.2 ns | 12.3949 ns | 11.5942 ns |
 CompareOrdinal |   467.8 ns |  0.5474 ns |  0.5120 ns |

Windows 2.1

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.14393.2189 (1607/AnniversaryUpdate/Redstone1)
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
Frequency=3410079 Hz, Resolution=293.2483 ns, Timer=TSC
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT


         Method |       Mean |     Error |    StdDev |
--------------- |-----------:|----------:|----------:|
      CompareTo | 3,495.5 ns | 1.2315 ns | 1.1520 ns |
 CompareOrdinal |   402.1 ns | 0.2601 ns | 0.2433 ns |

@stephentoub
Member

Can you please share the benchmark?

@sebastienros
Member Author

https://github.com/sebastienros/stringbenchmarks, branch console. If you want to adapt it to measure other things (it's doing a Sort, for instance), I can easily re-run everything on all environments; that might save you some time.

@sebastienros
Member Author

Snap, I pushed a change to remove the Sort just before your comment. One commit before in the 'console' branch then.

@stephentoub
Member

@sebastienros, thanks for sharing.

First I'm surprised by some of the absolute values your benchmark shows. That's saying it took almost 5 microseconds to do that comparison on Linux? That must be a very slow machine, or something else is going on, as it's at least an order of magnitude more than I'd expect. I just tried your [Benchmarks]:

[Benchmark]
public int CompareTo() => Fortune1.CompareTo(Fortune2);

[Benchmark]
public int CompareOrdinal() => String.CompareOrdinal(Fortune1, Fortune2);

private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";

by plugging them into the harness I described in https://blogs.msdn.microsoft.com/dotnet/2018/04/18/performance-improvements-in-net-core-2-1/, and I get these results on my Ubuntu 16.04 VM:

         Method |     Toolchain |       Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
      CompareTo | .NET Core 2.0 | 157.709 ns |       0 B |
      CompareTo | .NET Core 2.1 | 175.951 ns |       0 B |
 CompareOrdinal | .NET Core 2.0 |   3.876 ns |       0 B |
 CompareOrdinal | .NET Core 2.1 |   3.123 ns |       0 B |

and this on my Windows 10 machine:

         Method |     Toolchain |      Mean | Allocated |
--------------- |-------------- |----------:|----------:|
      CompareTo | .NET Core 2.0 | 70.046 ns |       0 B |
      CompareTo | .NET Core 2.1 | 76.217 ns |       0 B |
 CompareOrdinal | .NET Core 2.0 |  2.403 ns |       0 B |
 CompareOrdinal | .NET Core 2.1 |  2.108 ns |       0 B |

so significantly smaller numbers in magnitude than what you got.

Second, extrapolating from a single micro-benchmark can be misleading. The particular micro-benchmark you've chosen has the strings entirely different, which means it's really just testing the overhead involved in setting up the comparison, that'll end up failing on the very first character examined. For culture-based comparisons, there is a tiny bit more overhead there in 2.1 due to spans being used internally, converting from strings to spans, etc. Once the comparison gets going, though, 2.1's implementation is better, e.g. try making the beginning of your two strings equal, and you should see 2.1 outshine 2.0. For example, I just changed the above to the following that has some differences in the middle of the short strings being compared:

private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "fortune: No such file is directory";

On Linux I got:

         Method |     Toolchain |       Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
      CompareTo | .NET Core 2.0 | 197.154 ns |       0 B |
      CompareTo | .NET Core 2.1 | 181.274 ns |       0 B |
 CompareOrdinal | .NET Core 2.0 |  10.242 ns |       0 B |
 CompareOrdinal | .NET Core 2.1 |   3.088 ns |       0 B |

and on Windows:

         Method |     Toolchain |       Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
      CompareTo | .NET Core 2.0 | 109.057 ns |       0 B |
      CompareTo | .NET Core 2.1 |  79.227 ns |       0 B |
 CompareOrdinal | .NET Core 2.0 |   6.173 ns |       0 B |
 CompareOrdinal | .NET Core 2.1 |   2.018 ns |       0 B |

showing 2.1 beating out 2.0. Then I further changed it to be:

private const string Fortune1 = "A computer scientist is someone who fixes things that aren''t broken!";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";

so that the strings differ only by the last character and are slightly longer; on Linux I got:

         Method |     Toolchain |       Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
      CompareTo | .NET Core 2.0 | 262.782 ns |       0 B |
      CompareTo | .NET Core 2.1 | 178.342 ns |       0 B |
 CompareOrdinal | .NET Core 2.0 |  15.097 ns |       0 B |
 CompareOrdinal | .NET Core 2.1 |   3.117 ns |       0 B |

and on Windows:

         Method |     Toolchain |       Mean | Allocated |
--------------- |-------------- |-----------:|----------:|
      CompareTo | .NET Core 2.0 | 149.206 ns |       0 B |
      CompareTo | .NET Core 2.1 |  85.108 ns |       0 B |
 CompareOrdinal | .NET Core 2.0 |  10.366 ns |       0 B |
 CompareOrdinal | .NET Core 2.1 |   1.911 ns |       0 B |

showing 2.1 being significantly better on these inputs than 2.0.
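To make the point about input shape concrete without a benchmarking harness, here is a rough Stopwatch-based sketch. It is not the harness from the blog post, and it is far less rigorous than BenchmarkDotNet: the delegate call adds overhead that will dominate the ordinal case, so only the relative trend across the three input pairs is meaningful. The names `CompareTiming` and `NanosecondsPerCall` are invented for this example.

```csharp
using System;
using System.Diagnostics;

public static class CompareTiming
{
    // Average nanoseconds per call of compare(a, b); includes delegate overhead.
    public static double NanosecondsPerCall(
        Func<string, string, int> compare, string a, string b, int iterations)
    {
        int sink = 0;
        for (int i = 0; i < 10_000; i++) sink += compare(a, b); // warm-up for the JIT

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += compare(a, b);
        sw.Stop();

        GC.KeepAlive(sink); // keep the result observable so the loop is not removed
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void Main()
    {
        const string noSuchFile = "fortune: No such file or directory";
        const string scientist = "A computer scientist is someone who fixes things that aren''t broken.";

        // The three input shapes discussed above: different at the first character,
        // different in the middle, different only at the last character.
        var pairs = new (string Name, string A, string B)[]
        {
            ("first char differs", noSuchFile, scientist),
            ("middle differs", noSuchFile, "fortune: No such file is directory"),
            ("last char differs", scientist.TrimEnd('.') + "!", scientist),
        };

        foreach (var p in pairs)
        {
            double culture = NanosecondsPerCall((x, y) => x.CompareTo(y), p.A, p.B, 1_000_000);
            double ordinal = NanosecondsPerCall(string.CompareOrdinal, p.A, p.B, 1_000_000);
            Console.WriteLine($"{p.Name}: CompareTo {culture:F1} ns, CompareOrdinal {ordinal:F1} ns");
        }
    }
}
```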

@stephentoub
Member

cc: @tarekgh, @jkotas

@sebastienros
Member Author

sebastienros commented May 4, 2018

As I mentioned in my previous comment, the results I pasted are not from the HEAD commit of the repository but from the commit before it, which had a List.Sort call on a list of strings using the two comparers, hence the 5 µs. This was to do exactly the same thing as Fortunes is doing. Then I realized that it would make more sense to measure only a single comparison, so I changed it; I should have created another branch. For what it's worth, I just created that branch as sort.

I ran the same tests as you then, but with a different outcome. That's problematic, but I can run it on a different set of machines to get numbers we can be confident with.

Compare

private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";

CompareSame

private const string Fortune1 = "fortune: No such file or directory";
private const string Fortune2 = "fortune: No such file is directory";

CompareSimilar

private const string Fortune1 = "A computer scientist is someone who fixes things that aren''t broken!";
private const string Fortune2 = "A computer scientist is someone who fixes things that aren''t broken.";

Linux 2.0

                Method |        Mean |     Error |    StdDev |
---------------------- |------------:|----------:|----------:|
             CompareTo |  94.5751 ns | 0.4293 ns | 0.3585 ns |
         CompareToSame |   1.1872 ns | 0.0187 ns | 0.0175 ns |
      CompareToSimilar | 174.3142 ns | 1.0688 ns | 0.8925 ns |
        CompareOrdinal |   2.5018 ns | 0.0353 ns | 0.0313 ns |
    CompareOrdinalSame |   0.8260 ns | 0.0164 ns | 0.0128 ns |
 CompareOrdinalSimilar |   9.8285 ns | 0.2129 ns | 0.1887 ns |

Linux 2.1

                Method |        Mean |     Error |    StdDev |
---------------------- |------------:|----------:|----------:|
             CompareTo | 108.3260 ns | 0.0919 ns | 0.0717 ns |
         CompareToSame |   8.5684 ns | 0.0065 ns | 0.0050 ns |
      CompareToSimilar | 196.7931 ns | 3.3594 ns | 2.9781 ns |
        CompareOrdinal |   2.0004 ns | 0.0293 ns | 0.0274 ns |
    CompareOrdinalSame |   0.7804 ns | 0.0310 ns | 0.0290 ns |
 CompareOrdinalSimilar |   9.7766 ns | 0.0813 ns | 0.0721 ns |

Windows 2.0

                Method |       Mean |     Error |    StdDev |
---------------------- |-----------:|----------:|----------:|
             CompareTo |  71.282 ns | 0.0295 ns | 0.0262 ns |
         CompareToSame |   1.379 ns | 0.0104 ns | 0.0097 ns |
      CompareToSimilar | 142.187 ns | 0.0927 ns | 0.0867 ns |
        CompareOrdinal |   2.623 ns | 0.0052 ns | 0.0048 ns |
    CompareOrdinalSame |   1.096 ns | 0.0025 ns | 0.0022 ns |
 CompareOrdinalSimilar |   8.868 ns | 0.0019 ns | 0.0017 ns |

Windows 2.1

                Method |        Mean |     Error |    StdDev |
---------------------- |------------:|----------:|----------:|
             CompareTo |  85.3825 ns | 0.0098 ns | 0.0082 ns |
         CompareToSame |   8.4219 ns | 0.0150 ns | 0.0133 ns |
      CompareToSimilar | 143.2853 ns | 0.0322 ns | 0.0301 ns |
        CompareOrdinal |   2.0223 ns | 0.0067 ns | 0.0059 ns |
    CompareOrdinalSame |   0.7535 ns | 0.0020 ns | 0.0018 ns |
 CompareOrdinalSimilar |  10.8721 ns | 0.0816 ns | 0.0724 ns |

Note that I am running Linux and Windows on two identical physical machines, without a VM (docker in the case of Linux) so the comparisons between Linux and Windows are fair.

Sample BenchmarkDotNet framework summary, to show the framework versions are correct:

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT

and

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT

Also, on Windows the differences are mostly within the noise margin even in my results, so I assume we can set Windows aside and focus on the Linux case. That is not quite true for CompareTo, though.

I will update this thread with results from Azure VMs to exclude any environment specificity.

@sebastienros
Member Author

More data:

Linux - Azure - 2.0

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT


                Method |       Mean |     Error |     StdDev |
---------------------- |-----------:|----------:|-----------:|
             CompareTo | 132.176 ns | 2.7644 ns |  8.1510 ns |
         CompareToSame |   1.637 ns | 0.0748 ns |  0.1931 ns |
      CompareToSimilar | 237.060 ns | 5.2528 ns | 15.4879 ns |
        CompareOrdinal |   3.508 ns | 0.1128 ns |  0.3182 ns |
    CompareOrdinalSame |   1.204 ns | 0.0671 ns |  0.1732 ns |
 CompareOrdinalSimilar |  14.108 ns | 0.4070 ns |  1.1999 ns |

Linux - Azure - 2.1

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT


                Method |       Mean |     Error |     StdDev |
---------------------- |-----------:|----------:|-----------:|
             CompareTo | 157.820 ns | 3.4567 ns | 10.1922 ns |
         CompareToSame |  11.910 ns | 0.2834 ns |  0.6101 ns |
      CompareToSimilar | 281.970 ns | 6.2934 ns | 18.5564 ns |
        CompareOrdinal |   2.708 ns | 0.1058 ns |  0.3119 ns |
    CompareOrdinalSame |   1.127 ns | 0.0656 ns |  0.1622 ns |
 CompareOrdinalSimilar |  14.563 ns | 0.4333 ns |  1.2776 ns |

Linux - "Citrine (same hardware as TechEmpower)" - 2.0

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon Gold 5120 CPU 2.20GHz, 1 CPU, 28 logical and 14 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.6 (CoreCLR 4.6.0.0, CoreFX 4.6.26212.01), 64bit RyuJIT


                Method |       Mean |     Error |    StdDev |
---------------------- |-----------:|----------:|----------:|
             CompareTo | 133.469 ns | 0.0141 ns | 0.0132 ns |
         CompareToSame |   1.577 ns | 0.0013 ns | 0.0011 ns |
      CompareToSimilar | 234.991 ns | 0.0242 ns | 0.0202 ns |
        CompareOrdinal |   3.029 ns | 0.0115 ns | 0.0096 ns |
    CompareOrdinalSame |   1.248 ns | 0.0040 ns | 0.0033 ns |
 CompareOrdinalSimilar |  13.876 ns | 0.1108 ns | 0.0925 ns |

Linux - "Citrine (same hardware as TechEmpower)" - 2.1

BenchmarkDotNet=v0.10.14, OS=debian 8
Intel Xeon Gold 5120 CPU 2.20GHz, 1 CPU, 28 logical and 14 physical cores
.NET Core SDK=2.1.300-rc1-008673
  [Host]     : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-rc1-26428-03 (CoreCLR 4.6.26428.03, CoreFX 4.6.26428.03), 64bit RyuJIT


                Method |        Mean |     Error |    StdDev |
---------------------- |------------:|----------:|----------:|
             CompareTo | 158.5255 ns | 0.2813 ns | 0.2493 ns |
         CompareToSame |  12.5199 ns | 0.0007 ns | 0.0006 ns |
      CompareToSimilar | 278.3703 ns | 0.0449 ns | 0.0375 ns |
        CompareOrdinal |   2.5301 ns | 0.0141 ns | 0.0125 ns |
    CompareOrdinalSame |   0.7521 ns | 0.0016 ns | 0.0013 ns |
 CompareOrdinalSimilar |  13.9922 ns | 0.0529 ns | 0.0495 ns |

Interestingly, the Citrine machines, which are much more powerful, have the same results as the Azure VMs we are using. Note that the Citrine servers don't have Page Table Isolation disabled (Meltdown security vulnerability).

@stephentoub
Member

stephentoub commented May 4, 2018

The CompareToSame lines appear to be highlighting something else. Note that what you wrote above for the "CompareSame" case is not actually what your benchmark is testing: for "CompareSame" you copied what I had, which was almost the same text but swapping the word "or" for "is" so that there was a difference in the middle. But your benchmark CompareToSame is actually doing what its name says and is comparing not only identical strings, but identical references. As such, it should be hitting the same fast path in both 2.0:
https://github.com/dotnet/coreclr/blob/19b74c1ea20102b4882a7e034e0ba8cd2ab88b82/src/mscorlib/src/System/String.Comparison.cs#L387-L396
and 2.1:
https://github.com/dotnet/coreclr/blob/ad0f22cd2f011e7112588415b13d615238b5acb4/src/mscorlib/shared/System/String.Comparison.cs#L270-L274
That there's a difference of ~2ns vs ~13ns is strange. That said, it's a fixed difference: this test only measures the case where the strings are the same reference.
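As an illustration of the distinction between identical references and merely equal content, here is a small sketch. `CompareWithFastPath` is a hypothetical wrapper written for this example; it mimics the idea of a reference-equality short-circuit but is not the CoreCLR source linked above.

```csharp
using System;

public static class FastPathDemo
{
    // Hypothetical wrapper illustrating a reference-equality short-circuit.
    public static int CompareWithFastPath(string a, string b)
    {
        if (object.ReferenceEquals(a, b)) return 0; // same object, or both null
        if (a is null) return -1;
        if (b is null) return 1;
        return string.Compare(a, b, StringComparison.CurrentCulture);
    }

    public static void Main()
    {
        string s1 = "fortune: No such file or directory";
        string sameReference = s1;                        // same object
        string equalCopy = new string(s1.ToCharArray());  // equal content, distinct object

        Console.WriteLine(object.ReferenceEquals(s1, sameReference)); // True
        Console.WriteLine(object.ReferenceEquals(s1, equalCopy));     // False

        // Both calls return 0, but only the first can skip the culture-aware comparison.
        Console.WriteLine(CompareWithFastPath(s1, sameReference)); // 0
        Console.WriteLine(CompareWithFastPath(s1, equalCopy));     // 0
    }
}
```

A benchmark that passes the same constant for both arguments exercises only the first case, which is why it says little about comparing two distinct but similar strings.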

@sebastienros
Member Author

Correct, I didn't see this difference; I will add it. I won't add the results here unless you want me to, as I think I have already drowned this thread in too much data.

@tarekgh
Member

tarekgh commented Jun 9, 2019

@sebastienros

@adamsitnik has just merged a fix which likely fixes this one too. Could you try it with your scenario and see whether there is any improvement?

@adamsitnik
Member

Ok, I have forked the example provided by @sebastienros in https://github.com/dotnet/corefx/issues/37691 and extended it with the benchmarks provided here (I could not use https://github.com/sebastienros/stringbenchmarks/blob/master/Startup.cs because it targets 2.0):

https://github.com/adamsitnik/invariantcultureperf/blob/a2c7baf3c0d301eec993728b69d27209f941e886/Controllers/ValuesController.cs#L39-L80

sample command:

dotnet run -- --server http://asp-perf-lin:5001 --client http://asp-perf-load:5002 --repository https://github.com/adamsitnik/invariantcultureperf --project-file InvariantCulture.csproj --path /api/values/CompareOrdinal --warmup 1 --duration 5 --runtime 3.0.0-*             

The results are RPS for asp-perf-lin and asp-perf-win machines:

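                     Path | Windows | Linux with 2 fixes | Linux with 3 fixes |
------------------------- |--------:|-------------------:|-------------------:|
                CompareTo |     166 |                144 |                188 |
           CompareOrdinal |     183 |                189 |                192 |
 CompareOrdinalIgnoreCase |     171 |                188 |                208 |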

The results are RPS for the Citrix machines:

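                     Path | Windows | Linux with 2 fixes | Linux with 3 fixes |
------------------------- |--------:|-------------------:|-------------------:|
                CompareTo |     516 |                105 |                377 |
           CompareOrdinal |     580 |                390 |                390 |
 CompareOrdinalIgnoreCase |     575 |                400 |                400 |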

I am going to take a look at the traces from Citrix machines

@sebastienros
Member Author

Citrine (not citrix)

@adamsitnik
Member

adamsitnik commented Jun 19, 2019

I have run the StringComparer benchmarks from the performance repo using latest CoreCLR bits with my 3 fixes. (https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/corefx/System.Runtime/Perf.StringComparer.cs)

OS=Windows 10.0.17763.107 (1809/October2018Update/Redstone5)
OS=debian 10

Intel Xeon CPU E5-1650 v3 3.50GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.0.100-preview7-012507

      Method |  Count |                 Comparison |  Mean Windows |   Mean Debian | Ratio (Windows/Debian) |
------------ |-------:|---------------------------:|--------------:|--------------:|-----------------------:|
 CompareSame |    128 |             CurrentCulture |     268.61 ns |     127.63 ns |                   2.10 |
 CompareSame |    128 |   CurrentCultureIgnoreCase |     265.39 ns |     127.87 ns |                   2.08 |
 CompareSame |    128 |           InvariantCulture |     266.37 ns |     127.47 ns |                   2.09 |
 CompareSame |    128 | InvariantCultureIgnoreCase |     264.94 ns |     127.91 ns |                   2.07 |
 CompareSame |    128 |                    Ordinal |      15.50 ns |      15.04 ns |                   1.03 |
 CompareSame |    128 |          OrdinalIgnoreCase |      53.60 ns |     242.72 ns |                   0.22 |
 CompareSame | 262144 |             CurrentCulture | 428,407.75 ns | 140,836.14 ns |                   3.04 |
 CompareSame | 262144 |   CurrentCultureIgnoreCase | 427,232.45 ns | 140,973.42 ns |                   3.03 |
 CompareSame | 262144 |           InvariantCulture | 427,156.27 ns | 139,343.72 ns |                   3.07 |
 CompareSame | 262144 | InvariantCultureIgnoreCase | 425,779.88 ns | 147,502.32 ns |                   2.89 |
 CompareSame | 262144 |                    Ordinal |  36,734.94 ns |  33,936.02 ns |                   1.08 |
 CompareSame | 262144 |          OrdinalIgnoreCase |  89,270.31 ns | 420,274.81 ns |                   0.21 |

Linux is on par for Ordinal, two to three times faster for CurrentCulture, CurrentCultureIgnoreCase, InvariantCulture, InvariantCultureIgnoreCase and five times slower for OrdinalIgnoreCase.
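For context on what the comparison modes in the table actually do, here is a small sketch contrasting them on tiny inputs. It is only an illustration of the public `StringComparison` options, not part of the benchmark in the performance repo; the class name `ComparisonModes` and the helper `Sign` are made up for this example.

```csharp
using System;

public static class ComparisonModes
{
    // Hypothetical helper: normalizes the result of string.Compare to -1, 0, or 1.
    public static int Sign(string a, string b, StringComparison mode)
        => Math.Sign(string.Compare(a, b, mode));

    public static void Main()
    {
        // Ordinal compares UTF-16 code units: 'a' (U+0061) is greater than 'B' (U+0042).
        Console.WriteLine(Sign("a", "B", StringComparison.Ordinal));               // 1

        // OrdinalIgnoreCase compares as if both sides were uppercased: 'A' vs 'B'.
        Console.WriteLine(Sign("a", "B", StringComparison.OrdinalIgnoreCase));     // -1
        Console.WriteLine(Sign("abc", "ABC", StringComparison.OrdinalIgnoreCase)); // 0

        // Culture-sensitive modes go through the platform's globalization library
        // (NLS on Windows, ICU on Linux), so the result depends on that data.
        Console.WriteLine(Sign("a", "B", StringComparison.InvariantCulture));
        Console.WriteLine(Sign("a", "B", StringComparison.CurrentCultureIgnoreCase));
    }
}
```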

I am going to do some research and remove the gap for OrdinalIgnoreCase.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Jun 15, 2020
@tarekgh
Member

tarekgh commented Aug 17, 2020

The PR #40910 is addressing the ordinal cases.


7 participants