This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Intrinsicify .SequenceCompareTo(byte, ...) #22127

Merged: 5 commits, merged Jan 25, 2019

@benaadams (Member) commented Jan 22, 2019

Using benchmarks from https://github.com/dotnet/coreclr/issues/22078#issuecomment-455843668

Different

|  Method |   N |      Mean |
|-------- |---- |----------:|
|  Native |  16 | 10.751 ns |
| Managed |  16 |  5.141 ns |
|  Native |  17 | 10.705 ns |
| Managed |  17 |  4.582 ns |
|  Native |  25 | 10.564 ns |
| Managed |  25 |  4.435 ns |
|  Native |  32 | 10.551 ns |
| Managed |  32 |  5.227 ns |
|  Native |  79 | 10.360 ns |
| Managed |  79 |  4.584 ns |
|  Native | 256 | 10.813 ns |
| Managed | 256 |  5.013 ns |

AlmostSameButDifferent

|  Method |   N |      Mean |
|-------- |---- |----------:|
|  Native |  16 | 11.298 ns |
| Managed |  16 |  4.717 ns |
|  Native |  17 | 13.002 ns |
| Managed |  17 |  5.557 ns |
|  Native |  25 | 12.173 ns |
| Managed |  25 |  5.548 ns |
|  Native |  32 | 12.470 ns |
| Managed |  32 |  4.613 ns |
|  Native |  79 | 16.639 ns |
| Managed |  79 |  7.081 ns |
|  Native | 256 | 21.392 ns |
| Managed | 256 | 11.007 ns |

Same

|  Method |   N |      Mean |
|-------- |---- |----------:|
|  Native |  16 | 11.471 ns |
| Managed |  16 |  4.686 ns |
|  Native |  17 | 11.454 ns |
| Managed |  17 |  4.527 ns |
|  Native |  25 | 11.966 ns |
| Managed |  25 |  4.552 ns |
|  Native |  32 | 11.751 ns |
| Managed |  32 |  3.999 ns |
|  Native |  79 | 16.089 ns |
| Managed |  79 |  5.850 ns |
|  Native | 256 | 21.342 ns |
| Managed | 256 | 10.994 ns |

Fixes https://github.com/dotnet/coreclr/issues/22078

/cc @CarolEidt @fiigii @tannergooding @ahsonkhan

/cc @redknightlois @ayende

@benaadams (Member, Author)

Test extensions dotnet/corefx#34742

@benaadams benaadams changed the title [WIP] Speedup SpanHelpers.IndexOf{Any}(byte, ...) and .SequenceCompareTo(byte, ...) [WIP] Speedup .SequenceCompareTo(byte, ...) Jan 22, 2019
@benaadams benaadams changed the title [WIP] Speedup .SequenceCompareTo(byte, ...) Speedup .SequenceCompareTo(byte, ...) Jan 24, 2019
@benaadams (Member, Author)

Added performance numbers and rebased

@jkotas (Member) commented Jan 24, 2019

@tannergooding Does this look good to you?

@fiigii commented Jan 24, 2019

@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x64 Checked jitnox86hwintrinsic

@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x86 Checked jitnox86hwintrinsic

@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Ubuntu x64 Checked jitnox86hwintrinsic

@fiigii commented Jan 24, 2019

We should always test all the paths of hardware-specific code. For this PR, we can turn off AVX to test the SSE2 path, and turn off all intrinsics to test the software fallback. However, the System.Numerics.Vector path cannot be covered on x86/x64.
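To illustrate what "all the paths" means, here is a minimal C sketch (not the PR's C# code) of a lexicographic byte-sequence compare in the spirit of SequenceCompareTo. It has an SSE2 path that is compiled only when `__SSE2__` is defined, plus a scalar software fallback that is always compiled. The function names are hypothetical, `__builtin_ctz` assumes GCC or Clang, and only the sign of the result (negative, zero, positive) is meaningful.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Software fallback: compare byte by byte, then break ties on length.
   Returns <0, 0 or >0 like a lexicographic compare. */
static int sequence_compare_scalar(const uint8_t *a, size_t a_len,
                                   const uint8_t *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return (a_len > b_len) - (a_len < b_len);
}

#if defined(__SSE2__)
/* SSE2 path: compare 16 bytes per step, find the first mismatch via a
   movemask, and hand the remaining tail (< 16 bytes) to the scalar loop. */
static int sequence_compare_sse2(const uint8_t *a, size_t a_len,
                                 const uint8_t *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Bit k of mask is set when byte k of both blocks is equal. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
        if (mask != 0xFFFF) {
            /* Lowest clear bit = first differing byte (GCC/Clang builtin). */
            unsigned diff = ~(unsigned)mask & 0xFFFFu;
            size_t k = i + (size_t)__builtin_ctz(diff);
            return (int)a[k] - (int)b[k];
        }
    }
    return sequence_compare_scalar(a + i, a_len - i, b + i, b_len - i);
}
#endif

/* Compile-time dispatch: SSE2 when available, otherwise the fallback. */
int sequence_compare(const uint8_t *a, size_t a_len,
                     const uint8_t *b, size_t b_len)
{
#if defined(__SSE2__)
    return sequence_compare_sse2(a, a_len, b, b_len);
#else
    return sequence_compare_scalar(a, a_len, b, b_len);
#endif
}
```

Because the scalar function is always built, a test can run both paths on the same inputs. This is roughly the C analogue of rerunning the suite with AVX or all intrinsics switched off. Inputs should place mismatches both inside a 16-byte block and in the tail.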

@redknightlois

@benaadams can you run both with and without cache misses? Also with larger sizes, from 32K to 1M (with a few in between)?

@benaadams (Member, Author) commented Jan 24, 2019

> Also with high numbers like 32K to 1M (with a few in-between)

Different

|  Method |       N |      Mean |      Error |    StdDev |
|-------- |-------- |----------:|-----------:|----------:|
|  Native |    1024 | 11.928 ns |  4.9578 ns | 0.2718 ns |
| Managed |    1024 |  4.620 ns |  3.3234 ns | 0.1822 ns |
|  Native |   32768 | 10.927 ns | 11.8992 ns | 0.6522 ns |
| Managed |   32768 |  7.585 ns | 31.4300 ns | 1.7228 ns |
|  Native |   65536 | 12.784 ns |  5.7687 ns | 0.3162 ns |
| Managed |   65536 |  4.524 ns |  2.8635 ns | 0.1570 ns |
|  Native |  131072 | 10.691 ns |  3.8721 ns | 0.2122 ns |
| Managed |  131072 |  4.438 ns |  0.4294 ns | 0.0235 ns |
|  Native |  262144 | 10.326 ns |  0.5011 ns | 0.0275 ns |
| Managed |  262144 |  4.971 ns |  0.3389 ns | 0.0186 ns |
|  Native |  524288 | 10.368 ns |  0.3761 ns | 0.0206 ns |
| Managed |  524288 |  4.477 ns |  0.1690 ns | 0.0093 ns |
|  Native | 1048576 | 10.389 ns |  1.4737 ns | 0.0808 ns |
| Managed | 1048576 |  5.013 ns |  0.4726 ns | 0.0259 ns |
|  Native | 8388608 | 10.503 ns |  3.6709 ns | 0.2012 ns |
| Managed | 8388608 |  4.545 ns |  1.6608 ns | 0.0910 ns |

AlmostSameButDifferent

|  Method |       N |            Mean |            Error |         StdDev |
|-------- |-------- |----------------:|-----------------:|---------------:|
|  Native |    1024 |        52.23 ns |         6.140 ns |      0.3366 ns |
| Managed |    1024 |        33.61 ns |        16.190 ns |      0.8875 ns |
|  Native |   32768 |     2,008.78 ns |        67.170 ns |      3.6818 ns |
| Managed |   32768 |     1,114.52 ns |        83.492 ns |      4.5765 ns |
|  Native |   65536 |     4,011.11 ns |       598.936 ns |     32.8297 ns |
| Managed |   65536 |     2,251.27 ns |       626.958 ns |     34.3657 ns |
|  Native |  131072 |     8,021.91 ns |     1,162.816 ns |     63.7379 ns |
| Managed |  131072 |     5,016.38 ns |       122.943 ns |      6.7389 ns |
|  Native |  262144 |    16,741.14 ns |       737.067 ns |     40.4011 ns |
| Managed |  262144 |    12,111.76 ns |     1,401.922 ns |     76.8441 ns |
|  Native |  524288 |    33,311.27 ns |     3,603.438 ns |    197.5166 ns |
| Managed |  524288 |    24,386.69 ns |     5,319.465 ns |    291.5778 ns |
|  Native | 1048576 |    66,513.58 ns |     7,202.769 ns |    394.8081 ns |
| Managed | 1048576 |    50,523.14 ns |    13,804.180 ns |    756.6537 ns |
|  Native | 8388608 | 1,779,989.61 ns |   805,216.188 ns | 44,136.6164 ns |
| Managed | 8388608 | 1,806,247.14 ns | 1,316,669.259 ns | 72,171.0850 ns |

Same

|  Method |       N |            Mean |           Error |         StdDev |
|-------- |-------- |----------------:|----------------:|---------------:|
|  Native |    1024 |        50.10 ns |       3.5770 ns |      0.1961 ns |
| Managed |    1024 |        22.84 ns |       0.1028 ns |      0.0056 ns |
|  Native |   32768 |     1,999.43 ns |     356.4009 ns |     19.5355 ns |
| Managed |   32768 |     1,098.49 ns |      32.1779 ns |      1.7638 ns |
|  Native |   65536 |     4,012.88 ns |     431.7510 ns |     23.6657 ns |
| Managed |   65536 |     2,356.23 ns |   4,815.9938 ns |    263.9809 ns |
|  Native |  131072 |     7,987.18 ns |     342.8520 ns |     18.7929 ns |
| Managed |  131072 |     4,996.34 ns |     211.1606 ns |     11.5744 ns |
|  Native |  262144 |    16,504.62 ns |   1,353.8718 ns |     74.2103 ns |
| Managed |  262144 |    11,966.56 ns |   1,042.4223 ns |     57.1387 ns |
|  Native |  524288 |    32,981.75 ns |   2,399.1959 ns |    131.5080 ns |
| Managed |  524288 |    23,830.30 ns |   2,754.7899 ns |    150.9993 ns |
|  Native | 1048576 |    66,725.47 ns |   7,043.4319 ns |    386.0743 ns |
| Managed | 1048576 |    48,112.26 ns |   7,447.5663 ns |    408.2262 ns |
|  Native | 8388608 | 1,667,792.22 ns |  49,976.4103 ns |  2,739.3757 ns |
| Managed | 8388608 | 1,613,212.61 ns | 252,499.1025 ns | 13,840.3278 ns |

> can you run with both with cache misses and without cache misses?

If you can come up with a benchmark for that where killing the cache doesn't dominate, I can run it.

However, it should affect both implementations similarly (memory-access latency would come to the fore over instruction speed), so does it add value? (It is straightforward sequential access.)

At a guess, the 8388608 size would be evicting the cache as it goes along?

@benaadams (Member, Author)

Could we maybe switch to the CPU compare intrinsics above a certain length? However, I don't know much about their behaviour or usage. /cc @fiigii

@benaadams (Member, Author)

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please

@benaadams (Member, Author)

Looks like an infrastructure error on coreclr-ci for the `Test pri0 Linux_rhel6 x64 checked` job (agent: Hosted Ubuntu 1604 20):

/__w/1/s/Tools/IL.targets(49,5): error MSB6003: The specified task executable "sh" could not be run. Cannot allocate memory [/__w/1/s/tests/src/readytorun/tests/fieldgetter.ilproj]

/__w/1/s/Tools/IL.targets(49,5): error MSB6003: The specified task executable "sh" could not be run. Cannot allocate memory [/__w/1/s/tests/src/reflection/DefaultInterfaceMethods/GetInterfaceMapProvider.ilproj]

/__w/1/s/Tools/IL.targets(49,5): error MSB6003: The specified task executable "sh" could not be run. Cannot allocate memory [/__w/1/s/tests/src/reflection/DefaultInterfaceMethods/InvokeProvider.ilproj]

/__w/1/s/Tools/IL.targets(49,5): error MSB6003: The specified task executable "sh" could not be run. Cannot allocate memory [/__w/1/s/tests/src/reflection/Modifiers/modifiersdata.ilproj]

/__w/1/s/tests/dir.traversal.targets(25,5): error : (No message specified) [/__w/1/s/tests/src/dirs.proj]

/__w/1/s/tests/dir.traversal.targets(25,5): error : (No message specified) [/__w/1/s/tests/build.proj]

I only renamed a label (5f6dcb8) since the previous CI pass, so I doubt it's related.

@redknightlois

@benaadams We have a good starting point IMHO. This tests a "difference at byte n" style of check. As you can see, cache misses do make a difference, especially for smaller sizes.

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.253 (1809/October2018Update/Redstone5)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-009812
  [Host]     : .NET Core 3.0.0-preview-27323-8 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7301), 64bit RyuJIT
  Job-IKWERK : .NET Core 3.0.0-preview-27323-8 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7301), 64bit RyuJIT

Jit=RyuJit  Platform=X64  Runtime=Core  
| Method | KeySize | Mean | Error | StdDev | StdErr | Median | Min | Q1 | Q3 | Max | Op/s |
|------- |--------:|-----:|------:|-------:|-------:|-------:|----:|---:|---:|----:|-----:|
| WithCacheMisses_Scalar | 7 | 2.495 ns | 0.0201 ns | 0.0188 ns | 0.0049 ns | 2.483 ns | 2.468 ns | 2.480 ns | 2.515 ns | 2.522 ns | 400,865,051.4 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 7 | 8.382 ns | 0.0296 ns | 0.0277 ns | 0.0071 ns | 8.374 ns | 8.338 ns | 8.367 ns | 8.406 ns | 8.437 ns | 119,306,634.9 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 7 | 7.401 ns | 0.0253 ns | 0.0236 ns | 0.0061 ns | 7.397 ns | 7.368 ns | 7.379 ns | 7.415 ns | 7.455 ns | 135,115,277.4 |
| WithCacheMisses_Numerics32 | 7 | 6.487 ns | 0.0587 ns | 0.0549 ns | 0.0142 ns | 6.498 ns | 6.385 ns | 6.469 ns | 6.521 ns | 6.578 ns | 154,160,286.8 |
| NoCacheMisses_Scalar | 7 | 1.882 ns | 0.0254 ns | 0.0238 ns | 0.0061 ns | 1.877 ns | 1.853 ns | 1.862 ns | 1.902 ns | 1.924 ns | 531,399,621.5 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 7 | 13.916 ns | 0.1071 ns | 0.0950 ns | 0.0254 ns | 13.901 ns | 13.783 ns | 13.863 ns | 13.966 ns | 14.143 ns | 71,858,529.6 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 7 | 9.204 ns | 0.1419 ns | 0.1327 ns | 0.0343 ns | 9.243 ns | 8.966 ns | 9.080 ns | 9.299 ns | 9.393 ns | 108,648,109.6 |
| NoCacheMisses_Numerics32 | 7 | 5.234 ns | 0.0511 ns | 0.0478 ns | 0.0123 ns | 5.232 ns | 5.111 ns | 5.217 ns | 5.279 ns | 5.303 ns | 191,062,627.4 |
| WithCacheMisses_Scalar | 8 | 2.356 ns | 0.0099 ns | 0.0093 ns | 0.0024 ns | 2.357 ns | 2.337 ns | 2.347 ns | 2.362 ns | 2.370 ns | 424,448,178.3 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 8 | 9.413 ns | 0.0545 ns | 0.0509 ns | 0.0132 ns | 9.414 ns | 9.332 ns | 9.366 ns | 9.440 ns | 9.511 ns | 106,230,469.5 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 8 | 8.128 ns | 0.0860 ns | 0.0804 ns | 0.0208 ns | 8.146 ns | 7.965 ns | 8.076 ns | 8.183 ns | 8.232 ns | 123,027,731.3 |
| WithCacheMisses_Numerics32 | 8 | 2.305 ns | 0.0119 ns | 0.0111 ns | 0.0029 ns | 2.302 ns | 2.280 ns | 2.298 ns | 2.312 ns | 2.323 ns | 433,922,620.0 |
| NoCacheMisses_Scalar | 8 | 1.885 ns | 0.0210 ns | 0.0197 ns | 0.0051 ns | 1.881 ns | 1.848 ns | 1.876 ns | 1.896 ns | 1.924 ns | 530,532,673.5 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 8 | 15.381 ns | 0.1350 ns | 0.1263 ns | 0.0326 ns | 15.371 ns | 15.082 ns | 15.301 ns | 15.481 ns | 15.586 ns | 65,013,381.3 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 8 | 10.270 ns | 0.0536 ns | 0.0447 ns | 0.0124 ns | 10.278 ns | 10.171 ns | 10.249 ns | 10.291 ns | 10.356 ns | 97,366,709.3 |
| NoCacheMisses_Numerics32 | 8 | 1.537 ns | 0.0516 ns | 0.0483 ns | 0.0125 ns | 1.549 ns | 1.450 ns | 1.484 ns | 1.579 ns | 1.596 ns | 650,783,037.8 |
| WithCacheMisses_Scalar | 15 | 3.172 ns | 0.0187 ns | 0.0175 ns | 0.0045 ns | 3.174 ns | 3.126 ns | 3.164 ns | 3.186 ns | 3.193 ns | 315,249,246.1 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 15 | 9.720 ns | 0.0542 ns | 0.0507 ns | 0.0131 ns | 9.707 ns | 9.644 ns | 9.688 ns | 9.756 ns | 9.825 ns | 102,876,170.7 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 15 | 7.069 ns | 0.0493 ns | 0.0461 ns | 0.0119 ns | 7.069 ns | 6.982 ns | 7.034 ns | 7.107 ns | 7.165 ns | 141,460,095.0 |
| WithCacheMisses_Numerics32 | 15 | 2.837 ns | 0.0180 ns | 0.0168 ns | 0.0043 ns | 2.845 ns | 2.808 ns | 2.824 ns | 2.852 ns | 2.856 ns | 352,446,330.2 |
| NoCacheMisses_Scalar | 15 | 1.790 ns | 0.0310 ns | 0.0290 ns | 0.0075 ns | 1.790 ns | 1.730 ns | 1.774 ns | 1.814 ns | 1.836 ns | 558,640,584.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 15 | 15.648 ns | 0.1572 ns | 0.1471 ns | 0.0380 ns | 15.688 ns | 15.413 ns | 15.521 ns | 15.729 ns | 15.876 ns | 63,905,921.4 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 15 | 8.598 ns | 0.3790 ns | 0.8240 ns | 0.1091 ns | 8.166 ns | 7.920 ns | 8.109 ns | 9.061 ns | 10.175 ns | 116,299,599.8 |
| NoCacheMisses_Numerics32 | 15 | 2.029 ns | 0.0410 ns | 0.0384 ns | 0.0099 ns | 2.019 ns | 1.968 ns | 1.999 ns | 2.065 ns | 2.090 ns | 492,925,004.7 |
| WithCacheMisses_Scalar | 16 | 2.953 ns | 0.0359 ns | 0.0336 ns | 0.0087 ns | 2.955 ns | 2.906 ns | 2.919 ns | 2.985 ns | 3.017 ns | 338,650,600.3 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 16 | 10.237 ns | 0.0578 ns | 0.0540 ns | 0.0139 ns | 10.250 ns | 10.145 ns | 10.212 ns | 10.269 ns | 10.322 ns | 97,686,228.6 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 16 | 2.112 ns | 0.0153 ns | 0.0143 ns | 0.0037 ns | 2.114 ns | 2.067 ns | 2.109 ns | 2.121 ns | 2.127 ns | 473,581,078.1 |
| WithCacheMisses_Numerics32 | 16 | 2.840 ns | 0.0147 ns | 0.0137 ns | 0.0035 ns | 2.835 ns | 2.817 ns | 2.830 ns | 2.845 ns | 2.867 ns | 352,163,763.8 |
| NoCacheMisses_Scalar | 16 | 2.159 ns | 0.0269 ns | 0.0251 ns | 0.0065 ns | 2.152 ns | 2.130 ns | 2.139 ns | 2.178 ns | 2.204 ns | 463,163,312.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 16 | 16.331 ns | 0.1864 ns | 0.1744 ns | 0.0450 ns | 16.321 ns | 16.014 ns | 16.237 ns | 16.472 ns | 16.601 ns | 61,233,619.1 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 16 | 2.330 ns | 0.0523 ns | 0.0489 ns | 0.0126 ns | 2.358 ns | 2.240 ns | 2.280 ns | 2.370 ns | 2.392 ns | 429,092,956.6 |
| NoCacheMisses_Numerics32 | 16 | 2.092 ns | 0.0249 ns | 0.0233 ns | 0.0060 ns | 2.094 ns | 2.038 ns | 2.077 ns | 2.105 ns | 2.126 ns | 477,994,528.7 |
| WithCacheMisses_Scalar | 31 | 4.509 ns | 0.0336 ns | 0.0314 ns | 0.0081 ns | 4.517 ns | 4.429 ns | 4.489 ns | 4.528 ns | 4.558 ns | 221,788,436.9 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 31 | 11.632 ns | 0.0653 ns | 0.0611 ns | 0.0158 ns | 11.630 ns | 11.491 ns | 11.588 ns | 11.682 ns | 11.721 ns | 85,966,114.3 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 31 | 2.610 ns | 0.0252 ns | 0.0235 ns | 0.0061 ns | 2.608 ns | 2.568 ns | 2.596 ns | 2.627 ns | 2.654 ns | 383,077,742.2 |
| WithCacheMisses_Numerics32 | 31 | 2.690 ns | 0.0147 ns | 0.0137 ns | 0.0035 ns | 2.689 ns | 2.663 ns | 2.685 ns | 2.702 ns | 2.708 ns | 371,708,008.2 |
| NoCacheMisses_Scalar | 31 | 2.770 ns | 0.0419 ns | 0.0392 ns | 0.0101 ns | 2.777 ns | 2.689 ns | 2.749 ns | 2.796 ns | 2.830 ns | 361,073,083.7 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 31 | 16.779 ns | 0.1405 ns | 0.1314 ns | 0.0339 ns | 16.804 ns | 16.488 ns | 16.697 ns | 16.912 ns | 16.922 ns | 59,599,521.4 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 31 | 2.005 ns | 0.3097 ns | 0.5893 ns | 0.0878 ns | 1.674 ns | 1.622 ns | 1.664 ns | 2.360 ns | 3.059 ns | 498,658,416.2 |
| NoCacheMisses_Numerics32 | 31 | 1.396 ns | 0.0417 ns | 0.0390 ns | 0.0101 ns | 1.409 ns | 1.334 ns | 1.339 ns | 1.423 ns | 1.445 ns | 716,378,519.8 |
| WithCacheMisses_Scalar | 32 | 3.644 ns | 0.0240 ns | 0.0225 ns | 0.0058 ns | 3.646 ns | 3.607 ns | 3.622 ns | 3.661 ns | 3.679 ns | 274,430,816.5 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 32 | 12.089 ns | 0.0625 ns | 0.0585 ns | 0.0151 ns | 12.099 ns | 11.959 ns | 12.058 ns | 12.132 ns | 12.163 ns | 82,722,945.1 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 32 | 2.116 ns | 0.0262 ns | 0.0245 ns | 0.0063 ns | 2.115 ns | 2.076 ns | 2.090 ns | 2.138 ns | 2.149 ns | 472,599,139.0 |
| WithCacheMisses_Numerics32 | 32 | 2.748 ns | 0.0146 ns | 0.0136 ns | 0.0035 ns | 2.752 ns | 2.714 ns | 2.739 ns | 2.756 ns | 2.771 ns | 363,935,989.4 |
| NoCacheMisses_Scalar | 32 | 3.435 ns | 0.0276 ns | 0.0258 ns | 0.0067 ns | 3.443 ns | 3.390 ns | 3.414 ns | 3.447 ns | 3.486 ns | 291,113,218.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 32 | 17.164 ns | 0.0885 ns | 0.0828 ns | 0.0214 ns | 17.182 ns | 17.019 ns | 17.102 ns | 17.220 ns | 17.298 ns | 58,260,832.5 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 32 | 2.188 ns | 0.0188 ns | 0.0176 ns | 0.0045 ns | 2.185 ns | 2.168 ns | 2.175 ns | 2.198 ns | 2.230 ns | 456,945,864.9 |
| NoCacheMisses_Numerics32 | 32 | 1.656 ns | 0.0312 ns | 0.0292 ns | 0.0075 ns | 1.666 ns | 1.585 ns | 1.652 ns | 1.673 ns | 1.686 ns | 603,910,483.3 |
| WithCacheMisses_Scalar | 63 | 7.817 ns | 0.0317 ns | 0.0281 ns | 0.0075 ns | 7.820 ns | 7.737 ns | 7.810 ns | 7.828 ns | 7.857 ns | 127,932,016.3 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 63 | 36.606 ns | 0.1041 ns | 0.0974 ns | 0.0251 ns | 36.610 ns | 36.355 ns | 36.576 ns | 36.667 ns | 36.719 ns | 27,318,083.3 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 63 | 3.060 ns | 0.0131 ns | 0.0123 ns | 0.0032 ns | 3.061 ns | 3.037 ns | 3.053 ns | 3.070 ns | 3.077 ns | 326,808,133.8 |
| WithCacheMisses_Numerics32 | 63 | 3.868 ns | 0.0145 ns | 0.0135 ns | 0.0035 ns | 3.872 ns | 3.845 ns | 3.855 ns | 3.877 ns | 3.893 ns | 258,532,864.7 |
| NoCacheMisses_Scalar | 63 | 5.215 ns | 0.0319 ns | 0.0298 ns | 0.0077 ns | 5.214 ns | 5.173 ns | 5.194 ns | 5.247 ns | 5.271 ns | 191,771,595.3 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 63 | 37.978 ns | 0.2159 ns | 0.2020 ns | 0.0522 ns | 38.008 ns | 37.638 ns | 37.806 ns | 38.141 ns | 38.272 ns | 26,330,717.3 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 63 | 2.524 ns | 0.0438 ns | 0.0409 ns | 0.0106 ns | 2.522 ns | 2.461 ns | 2.488 ns | 2.552 ns | 2.604 ns | 396,209,182.4 |
| NoCacheMisses_Numerics32 | 63 | 1.919 ns | 0.0261 ns | 0.0244 ns | 0.0063 ns | 1.928 ns | 1.874 ns | 1.896 ns | 1.943 ns | 1.949 ns | 520,976,071.0 |
| WithCacheMisses_Scalar | 64 | 5.885 ns | 0.0180 ns | 0.0168 ns | 0.0043 ns | 5.884 ns | 5.858 ns | 5.871 ns | 5.895 ns | 5.922 ns | 169,936,582.9 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 64 | 29.755 ns | 0.2103 ns | 0.1967 ns | 0.0508 ns | 29.813 ns | 29.232 ns | 29.724 ns | 29.881 ns | 29.959 ns | 33,607,842.9 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 64 | 2.604 ns | 0.0257 ns | 0.0241 ns | 0.0062 ns | 2.611 ns | 2.568 ns | 2.578 ns | 2.620 ns | 2.642 ns | 384,088,445.4 |
| WithCacheMisses_Numerics32 | 64 | 3.695 ns | 0.0171 ns | 0.0152 ns | 0.0041 ns | 3.698 ns | 3.657 ns | 3.691 ns | 3.703 ns | 3.717 ns | 270,652,756.6 |
| NoCacheMisses_Scalar | 64 | 4.342 ns | 0.0544 ns | 0.0509 ns | 0.0131 ns | 4.354 ns | 4.232 ns | 4.305 ns | 4.374 ns | 4.428 ns | 230,317,195.5 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 64 | 37.705 ns | 0.1402 ns | 0.1243 ns | 0.0332 ns | 37.710 ns | 37.483 ns | 37.641 ns | 37.771 ns | 37.936 ns | 26,521,770.4 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 64 | 2.545 ns | 0.0324 ns | 0.0303 ns | 0.0078 ns | 2.547 ns | 2.492 ns | 2.529 ns | 2.565 ns | 2.601 ns | 392,950,298.1 |
| NoCacheMisses_Numerics32 | 64 | 2.479 ns | 0.0299 ns | 0.0280 ns | 0.0072 ns | 2.478 ns | 2.429 ns | 2.456 ns | 2.500 ns | 2.523 ns | 403,340,387.3 |
| WithCacheMisses_Scalar | 127 | 12.986 ns | 0.0686 ns | 0.0641 ns | 0.0166 ns | 13.012 ns | 12.864 ns | 12.949 ns | 13.039 ns | 13.046 ns | 77,008,155.9 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 127 | 30.511 ns | 0.1011 ns | 0.0945 ns | 0.0244 ns | 30.497 ns | 30.315 ns | 30.475 ns | 30.575 ns | 30.681 ns | 32,774,894.3 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 127 | 5.473 ns | 0.0309 ns | 0.0289 ns | 0.0075 ns | 5.481 ns | 5.418 ns | 5.447 ns | 5.493 ns | 5.520 ns | 182,702,437.5 |
| WithCacheMisses_Numerics32 | 127 | 6.311 ns | 0.0205 ns | 0.0182 ns | 0.0049 ns | 6.305 ns | 6.290 ns | 6.300 ns | 6.316 ns | 6.359 ns | 158,464,738.7 |
| NoCacheMisses_Scalar | 127 | 8.854 ns | 0.0326 ns | 0.0305 ns | 0.0079 ns | 8.845 ns | 8.818 ns | 8.828 ns | 8.887 ns | 8.915 ns | 112,938,761.3 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 127 | 36.703 ns | 0.3040 ns | 0.2844 ns | 0.0734 ns | 36.793 ns | 35.832 ns | 36.592 ns | 36.896 ns | 36.984 ns | 27,245,859.6 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 127 | 4.445 ns | 0.0362 ns | 0.0339 ns | 0.0088 ns | 4.436 ns | 4.390 ns | 4.426 ns | 4.475 ns | 4.523 ns | 224,947,350.5 |
| NoCacheMisses_Numerics32 | 127 | 3.972 ns | 0.0435 ns | 0.0407 ns | 0.0105 ns | 3.984 ns | 3.877 ns | 3.940 ns | 3.993 ns | 4.035 ns | 251,787,457.3 |
| WithCacheMisses_Scalar | 128 | 11.088 ns | 0.0423 ns | 0.0396 ns | 0.0102 ns | 11.092 ns | 10.990 ns | 11.068 ns | 11.117 ns | 11.138 ns | 90,188,786.0 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 128 | 31.403 ns | 0.1366 ns | 0.1277 ns | 0.0330 ns | 31.403 ns | 31.200 ns | 31.326 ns | 31.473 ns | 31.650 ns | 31,844,124.7 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 128 | 4.082 ns | 0.0255 ns | 0.0239 ns | 0.0062 ns | 4.090 ns | 4.037 ns | 4.057 ns | 4.101 ns | 4.112 ns | 244,955,819.6 |
| WithCacheMisses_Numerics32 | 128 | 5.799 ns | 0.0305 ns | 0.0285 ns | 0.0074 ns | 5.802 ns | 5.722 ns | 5.789 ns | 5.810 ns | 5.841 ns | 172,455,043.8 |
| NoCacheMisses_Scalar | 128 | 8.225 ns | 0.0236 ns | 0.0209 ns | 0.0056 ns | 8.223 ns | 8.196 ns | 8.204 ns | 8.243 ns | 8.257 ns | 121,586,856.7 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 128 | 38.654 ns | 0.2179 ns | 0.2038 ns | 0.0526 ns | 38.700 ns | 38.099 ns | 38.565 ns | 38.816 ns | 38.941 ns | 25,870,755.5 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 128 | 4.462 ns | 0.0234 ns | 0.0219 ns | 0.0057 ns | 4.460 ns | 4.432 ns | 4.446 ns | 4.477 ns | 4.504 ns | 224,109,920.2 |
| NoCacheMisses_Numerics32 | 128 | 4.355 ns | 0.0332 ns | 0.0310 ns | 0.0080 ns | 4.359 ns | 4.294 ns | 4.341 ns | 4.377 ns | 4.401 ns | 229,646,402.7 |
| WithCacheMisses_Scalar | 255 | 20.899 ns | 0.0742 ns | 0.0619 ns | 0.0172 ns | 20.891 ns | 20.792 ns | 20.859 ns | 20.941 ns | 21.031 ns | 47,848,074.0 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 255 | 32.555 ns | 0.2270 ns | 0.1772 ns | 0.0512 ns | 32.621 ns | 32.198 ns | 32.483 ns | 32.671 ns | 32.741 ns | 30,717,228.8 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 255 | 10.341 ns | 0.0357 ns | 0.0334 ns | 0.0086 ns | 10.344 ns | 10.273 ns | 10.314 ns | 10.363 ns | 10.392 ns | 96,703,203.0 |
| WithCacheMisses_Numerics32 | 255 | 11.791 ns | 0.0801 ns | 0.0749 ns | 0.0193 ns | 11.779 ns | 11.626 ns | 11.758 ns | 11.860 ns | 11.900 ns | 84,810,322.5 |
| NoCacheMisses_Scalar | 255 | 15.132 ns | 0.2896 ns | 0.2709 ns | 0.0700 ns | 15.261 ns | 14.614 ns | 14.750 ns | 15.298 ns | 15.389 ns | 66,086,357.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 255 | 44.484 ns | 0.2853 ns | 0.2668 ns | 0.0689 ns | 44.486 ns | 43.960 ns | 44.292 ns | 44.693 ns | 44.989 ns | 22,480,115.6 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 255 | 5.991 ns | 0.0382 ns | 0.0338 ns | 0.0090 ns | 6.002 ns | 5.907 ns | 5.979 ns | 6.014 ns | 6.028 ns | 166,908,800.4 |
| NoCacheMisses_Numerics32 | 255 | 9.036 ns | 0.0859 ns | 0.0803 ns | 0.0207 ns | 9.041 ns | 8.880 ns | 8.963 ns | 9.082 ns | 9.201 ns | 110,669,273.6 |
| WithCacheMisses_Scalar | 256 | 18.742 ns | 0.1318 ns | 0.1232 ns | 0.0318 ns | 18.741 ns | 18.551 ns | 18.631 ns | 18.880 ns | 18.926 ns | 53,355,791.4 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 256 | 32.751 ns | 0.0972 ns | 0.0812 ns | 0.0225 ns | 32.764 ns | 32.570 ns | 32.725 ns | 32.810 ns | 32.837 ns | 30,533,366.3 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 256 | 8.362 ns | 0.1067 ns | 0.0998 ns | 0.0258 ns | 8.393 ns | 8.181 ns | 8.260 ns | 8.429 ns | 8.511 ns | 119,592,270.2 |
| WithCacheMisses_Numerics32 | 256 | 9.967 ns | 0.0309 ns | 0.0289 ns | 0.0075 ns | 9.960 ns | 9.910 ns | 9.954 ns | 9.990 ns | 10.023 ns | 100,328,139.0 |
| NoCacheMisses_Scalar | 256 | 12.450 ns | 0.4240 ns | 0.9216 ns | 0.1221 ns | 11.992 ns | 11.674 ns | 11.909 ns | 12.938 ns | 14.187 ns | 80,318,569.3 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 256 | 46.095 ns | 2.3561 ns | 4.7055 ns | 0.6722 ns | 43.575 ns | 42.695 ns | 43.399 ns | 48.911 ns | 54.486 ns | 21,694,150.5 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 256 | 5.932 ns | 0.0351 ns | 0.0328 ns | 0.0085 ns | 5.934 ns | 5.841 ns | 5.921 ns | 5.962 ns | 5.970 ns | 168,573,183.2 |
| NoCacheMisses_Numerics32 | 256 | 9.336 ns | 0.0937 ns | 0.0876 ns | 0.0226 ns | 9.340 ns | 9.151 ns | 9.284 ns | 9.394 ns | 9.453 ns | 107,111,443.5 |
| WithCacheMisses_Scalar | 1024 | 76.956 ns | 0.6305 ns | 0.5898 ns | 0.1523 ns | 76.823 ns | 76.233 ns | 76.480 ns | 77.408 ns | 78.328 ns | 12,994,485.0 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 1024 | 57.619 ns | 1.1388 ns | 2.1389 ns | 0.3225 ns | 58.217 ns | 54.578 ns | 55.414 ns | 59.728 ns | 60.457 ns | 17,355,445.7 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 1024 | 36.635 ns | 0.7323 ns | 1.2826 ns | 0.2054 ns | 36.343 ns | 34.801 ns | 35.286 ns | 38.045 ns | 38.669 ns | 27,296,657.4 |
| WithCacheMisses_Numerics32 | 1024 | 40.593 ns | 0.1193 ns | 0.1058 ns | 0.0283 ns | 40.579 ns | 40.399 ns | 40.531 ns | 40.695 ns | 40.762 ns | 24,634,629.1 |
| NoCacheMisses_Scalar | 1024 | 57.850 ns | 0.3405 ns | 0.3185 ns | 0.0822 ns | 57.836 ns | 57.145 ns | 57.642 ns | 58.036 ns | 58.512 ns | 17,286,216.2 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 1024 | 56.153 ns | 0.1210 ns | 0.1132 ns | 0.0292 ns | 56.192 ns | 55.942 ns | 56.068 ns | 56.241 ns | 56.337 ns | 17,808,441.8 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 1024 | 18.321 ns | 0.1599 ns | 0.1496 ns | 0.0386 ns | 18.373 ns | 18.031 ns | 18.218 ns | 18.419 ns | 18.510 ns | 54,581,348.5 |
| NoCacheMisses_Numerics32 | 1024 | 21.678 ns | 0.9911 ns | 2.1545 ns | 0.2854 ns | 20.512 ns | 20.047 ns | 20.455 ns | 22.966 ns | 25.570 ns | 46,130,081.2 |
| WithCacheMisses_Scalar | 2048 | 144.293 ns | 2.3634 ns | 2.2107 ns | 0.5708 ns | 145.811 ns | 140.908 ns | 141.614 ns | 146.104 ns | 146.463 ns | 6,930,347.2 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 2048 | 99.064 ns | 1.9805 ns | 3.6711 ns | 0.5598 ns | 99.292 ns | 93.607 ns | 95.017 ns | 102.632 ns | 103.966 ns | 10,094,466.0 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 2048 | 66.685 ns | 1.2951 ns | 1.8156 ns | 0.3494 ns | 66.363 ns | 63.594 ns | 65.609 ns | 68.379 ns | 69.918 ns | 14,995,963.2 |
| WithCacheMisses_Numerics32 | 2048 | 78.476 ns | 1.5569 ns | 3.7001 ns | 0.4520 ns | 79.074 ns | 71.849 ns | 75.155 ns | 81.738 ns | 83.972 ns | 12,742,706.1 |
| NoCacheMisses_Scalar | 2048 | 105.810 ns | 0.6481 ns | 0.6063 ns | 0.1565 ns | 105.902 ns | 104.620 ns | 105.251 ns | 106.171 ns | 106.864 ns | 9,450,913.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 2048 | 73.506 ns | 0.4784 ns | 0.4475 ns | 0.1155 ns | 73.490 ns | 72.628 ns | 73.145 ns | 73.805 ns | 74.222 ns | 13,604,403.2 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 2048 | 34.236 ns | 0.4983 ns | 0.4661 ns | 0.1203 ns | 34.325 ns | 33.040 ns | 34.041 ns | 34.535 ns | 34.801 ns | 29,208,777.4 |
| NoCacheMisses_Numerics32 | 2048 | 41.856 ns | 1.2147 ns | 2.4258 ns | 0.3465 ns | 40.639 ns | 39.600 ns | 40.255 ns | 43.499 ns | 46.447 ns | 23,891,672.5 |
| WithCacheMisses_Scalar | 4096 | 272.316 ns | 2.5880 ns | 2.4208 ns | 0.6250 ns | 271.579 ns | 269.424 ns | 270.183 ns | 274.098 ns | 277.075 ns | 3,672,209.9 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 4096 | 166.616 ns | 3.3002 ns | 5.1380 ns | 0.9083 ns | 166.117 ns | 158.517 ns | 163.031 ns | 170.929 ns | 175.939 ns | 6,001,813.1 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 4096 | 131.444 ns | 2.6142 ns | 5.8471 ns | 0.7549 ns | 132.648 ns | 122.321 ns | 124.767 ns | 137.112 ns | 140.848 ns | 7,607,803.4 |
| WithCacheMisses_Numerics32 | 4096 | 144.932 ns | 2.8370 ns | 5.4660 ns | 0.8059 ns | 145.132 ns | 135.731 ns | 140.502 ns | 147.944 ns | 155.528 ns | 6,899,807.8 |
| NoCacheMisses_Scalar | 4096 | 202.598 ns | 0.5425 ns | 0.4809 ns | 0.1285 ns | 202.502 ns | 201.733 ns | 202.313 ns | 202.959 ns | 203.477 ns | 4,935,886.0 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 4096 | 107.508 ns | 1.0976 ns | 1.0267 ns | 0.2651 ns | 107.730 ns | 104.206 ns | 107.475 ns | 108.130 ns | 108.392 ns | 9,301,621.1 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 4096 | 70.332 ns | 0.6096 ns | 0.5702 ns | 0.1472 ns | 70.340 ns | 69.318 ns | 69.891 ns | 70.678 ns | 71.354 ns | 14,218,198.8 |
| NoCacheMisses_Numerics32 | 4096 | 91.636 ns | 0.3744 ns | 0.3319 ns | 0.0887 ns | 91.602 ns | 91.121 ns | 91.379 ns | 91.859 ns | 92.254 ns | 10,912,698.1 |
| WithCacheMisses_Scalar | 65536 | 6,505.342 ns | 108.5906 ns | 101.5757 ns | 26.2267 ns | 6,532.203 ns | 6,355.362 ns | 6,407.766 ns | 6,564.026 ns | 6,706.047 ns | 153,719.8 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 65536 | 5,088.281 ns | 101.6856 ns | 267.8805 ns | 29.7645 ns | 5,053.463 ns | 4,578.139 ns | 4,898.362 ns | 5,245.618 ns | 5,762.948 ns | 196,530.0 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 65536 | 4,735.168 ns | 92.5417 ns | 164.4927 ns | 26.0086 ns | 4,718.358 ns | 4,483.191 ns | 4,616.213 ns | 4,808.853 ns | 5,201.822 ns | 211,185.7 |
| WithCacheMisses_Numerics32 | 65536 | 5,004.260 ns | 100.0020 ns | 259.9184 ns | 29.2431 ns | 4,995.110 ns | 4,598.605 ns | 4,776.285 ns | 5,189.003 ns | 5,799.644 ns | 199,829.8 |
| NoCacheMisses_Scalar | 65536 | 3,258.270 ns | 19.4264 ns | 18.1715 ns | 4.6919 ns | 3,263.016 ns | 3,217.038 ns | 3,243.190 ns | 3,270.452 ns | 3,285.077 ns | 306,911.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 65536 | 1,305.216 ns | 14.8057 ns | 13.8492 ns | 3.5759 ns | 1,307.191 ns | 1,278.557 ns | 1,291.915 ns | 1,314.801 ns | 1,328.725 ns | 766,156.6 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 65536 | 1,284.245 ns | 14.0875 ns | 13.1775 ns | 3.4024 ns | 1,281.738 ns | 1,265.029 ns | 1,275.725 ns | 1,299.082 ns | 1,311.044 ns | 778,667.4 |
| NoCacheMisses_Numerics32 | 65536 | 1,660.970 ns | 18.3391 ns | 17.1544 ns | 4.4292 ns | 1,660.886 ns | 1,632.182 ns | 1,644.318 ns | 1,678.647 ns | 1,686.014 ns | 602,057.7 |
| WithCacheMisses_Scalar | 16777216 | 1,790,063.615 ns | 19,352.9046 ns | 16,160.5518 ns | 4,482.1306 ns | 1,792,927.000 ns | 1,766,844.000 ns | 1,777,034.000 ns | 1,802,822.500 ns | 1,817,697.000 ns | 558.6 |
| WithCacheMisses_Framework_MemoryExtensions_Reference | 16777216 | 1,573,927.409 ns | 31,257.8623 ns | 38,387.4621 ns | 8,184.2344 ns | 1,568,203.500 ns | 1,517,855.000 ns | 1,546,941.000 ns | 1,597,107.000 ns | 1,668,936.000 ns | 635.4 |
| WithCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 16777216 | 1,562,446.125 ns | 30,645.1134 ns | 47,710.7495 ns | 8,434.1486 ns | 1,557,784.500 ns | 1,488,429.000 ns | 1,520,518.000 ns | 1,596,370.500 ns | 1,647,407.000 ns | 640.0 |
| WithCacheMisses_Numerics32 | 16777216 | 1,642,290.783 ns | 32,571.2924 ns | 62,753.8227 ns | 9,252.5464 ns | 1,637,004.500 ns | 1,507,111.000 ns | 1,596,210.000 ns | 1,689,011.000 ns | 1,765,254.000 ns | 608.9 |
| NoCacheMisses_Scalar | 16777216 | 1,816,782.995 ns | 33,357.0441 ns | 31,202.1997 ns | 8,056.3733 ns | 1,828,150.703 ns | 1,748,752.852 ns | 1,786,762.031 ns | 1,839,257.539 ns | 1,863,783.516 ns | 550.4 |
| NoCacheMisses_Framework_MemoryExtensions_Reference | 16777216 | 1,547,955.204 ns | 30,738.0299 ns | 60,673.8723 ns | 8,757.5191 ns | 1,547,845.381 ns | 1,437,365.499 ns | 1,494,152.999 ns | 1,585,455.928 ns | 1,694,405.538 ns | 646.0 |
| NoCacheMisses_Framework_SequenceCompareTo_Internal_Ben5f6dcb8 | 16777216 | 1,436,923.867 ns | 13,795.1501 ns | 12,903.9920 ns | 3,331.7964 ns | 1,436,730.156 ns | 1,416,896.758 ns | 1,427,511.211 ns | 1,447,286.406 ns | 1,460,044.219 ns | 695.9 |
| NoCacheMisses_Numerics32 | 16777216 | 1,563,701.726 ns | 32,323.0548 ns | 61,497.9710 ns | 9,167.5762 ns | 1,547,631.436 ns | 1,474,750.771 ns | 1,514,980.752 ns | 1,606,901.553 ns | 1,738,795.303 ns | 639.5 |
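For readers reproducing the "difference at byte n" measurements with and without cache misses, a fixture along these lines is one option. This is a minimal C sketch under stated assumptions, not the harness that produced the numbers above. It builds a pool of key pairs that differ only at byte `diff_at`. A pool of one pair stays cache-resident, while a pool much larger than the last-level cache makes most comparisons touch cold lines. `memcmp` stands in for the compare under test. `pair_pool`, `pool_init`, `pool_run` and the prime step 7919 are hypothetical choices, and timing is left to whatever timer wraps `pool_run`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixture: `count` pairs of `len`-byte keys stored back to back
   as [a0 b0 a1 b1 ...]. In every pair a and b are equal except at byte
   `diff_at`, where a < b; diff_at >= len gives identical keys ("Same"). */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t count;
} pair_pool;

/* len and count must be non-zero. Returns 0 on success, -1 on allocation failure. */
static int pool_init(pair_pool *p, size_t len, size_t count, size_t diff_at)
{
    p->len = len;
    p->count = count;
    p->data = malloc(2 * len * count);
    if (p->data == NULL)
        return -1;
    for (size_t k = 0; k < count; k++) {
        uint8_t *a = p->data + 2 * len * k;
        uint8_t *b = a + len;
        for (size_t i = 0; i < len; i++)
            a[i] = b[i] = (uint8_t)(k * 131u + i * 31u);
        if (diff_at < len) {
            a[diff_at] &= 0x7F;                        /* top bit clear in a ... */
            b[diff_at] = (uint8_t)(a[diff_at] | 0x80); /* ... set in b, so a < b */
        }
    }
    return 0;
}

/* Runs `iterations` comparisons and returns the sum of their signs.
   count == 1 reuses one hot pair ("NoCacheMisses"); a large pool spreads the
   accesses ("WithCacheMisses"). The prime step scatters the visiting order to
   make hardware prefetching less effective; it reaches every pair only when
   count is not a multiple of 7919. */
static long pool_run(const pair_pool *p, size_t iterations)
{
    long sum = 0;
    size_t idx = 0;
    for (size_t it = 0; it < iterations; it++) {
        const uint8_t *a = p->data + 2 * p->len * idx;
        const uint8_t *b = a + p->len;
        int r = memcmp(a, b, p->len);
        sum += (r > 0) - (r < 0);
        idx = (idx + 7919) % p->count;
    }
    return sum;
}

static void pool_free(pair_pool *p)
{
    free(p->data);
    p->data = NULL;
}
```

Accumulating the result in `pool_run` keeps the compiler from discarding the comparisons. Sequential walking is deliberately avoided, since prefetchers can hide much of the miss cost that this variant is meant to expose.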

@benaadams (Member, Author)

@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please

@redknightlois

Before you say anything, this is my first time with the 3.0 codegen, so I am a bit sketchy on the details of how intrinsics are treated by the 3.0 JIT. So I have a few questions regarding the assembly code I am looking at.

[image: screenshot of the generated assembly code]

Also, I see calls everywhere, which doesn't make sense to me... could tiered JIT be playing hardball with me?

@benaadams
Member Author

You can switch it off with the environment variable:

set COMPlus_TieredCompilation=0
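The `set` form above is Windows `cmd.exe` syntax. On Linux or macOS the same switch can be applied from a POSIX shell; the following is a minimal sketch, assuming bash or zsh and that the benchmark is then launched from that same shell session so it inherits the variable:

```shell
# Assumption: a POSIX shell (bash/zsh) on Linux or macOS.
# With tiered compilation disabled, the JIT generally compiles each method
# fully optimized on first call instead of starting with Tier0 code.
export COMPlus_TieredCompilation=0

# Processes started from this shell (e.g. a subsequent benchmark run)
# inherit the variable; print it to confirm it is set.
echo "COMPlus_TieredCompilation=$COMPlus_TieredCompilation"
```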

@redknightlois

Thanks, I was looking for how to do that and couldn't find the relevant bit...

@fiigii

fiigii commented Jan 24, 2019

@AndyAyersMS Shall we consider https://github.com/dotnet/coreclr/issues/14474 for 3.0? The current Tier0 does not guarantee "lower compilation cost"...

@benaadams
Member Author

Would treating methods marked as [Intrinsic] as if they had [AggressiveOptimization] #20009 help any? Or is Tier0 not going to look at the other methods anyway since it doesn't inline?

/cc @kouvel

@jkotas
Member

jkotas commented Jan 24, 2019

Would treating methods marked as [Intrinsic] as if they had [AggressiveOptimization]

I do not think that would help. The method you want to do the AggressiveOptimization for here is SequenceCompareTo that is not marked as [Intrinsic].

I think it may be a good idea to mark these low-level optimized methods like SequenceCompareTo with AggressiveOptimization since they are likely to be called a lot, and their performance without optimizations is pathetic.

@fiigii

fiigii commented Jan 24, 2019

At a guess the 8388608 size one would be evicting cache as it goes along?

I guess that is related to alignment, but we need more detailed data (e.g., cache-line split counters via VTune) for further analysis.


@fiigii fiigii left a comment

LGTM

@AndyAyersMS
Member

@fiigii I still think doing some cheap/obvious opts in Tier0 makes sense, but am reluctant at this point to make changes in Tier0 for 3.0. I don't think we're ready to have Tier0 diverge from minopts just yet (to keep the test matrix somewhat contained), and we don't want to risk changes in minopts... we have made what appeared to be simple/safe changes there in the past and been surprised to find we'd gotten it wrong, e.g. #13245 led to #16928 and #20047.

@benaadams you'd want the callers of intrinsic methods to be optimized aggressively by default. Seems cleaner to just call out the specific methods that should always be optimized.

@benaadams
Member Author

I think it may be a good idea to mark these low-level optimized methods like SequenceCompareTo with AggressiveOptimization since they are likely to be called a lot, and their performance without optimizations is pathetic.

Added PR: #22191

@tannergooding
Member

Would treating methods marked as [Intrinsic] as if they had [AggressiveOptimization]

The majority of the hardware intrinsics are force-expanded and have to be inlined anyway (as is evident from the actual codegen). It looks like the actual inefficiency comes from the JIT not dropping the "dead" code paths.


@CarolEidt CarolEidt left a comment

Awesome! Thanks for adding the clarifying comments.

@jkotas jkotas merged commit 665593e into dotnet:master Jan 25, 2019
@benaadams benaadams deleted the IndexOf+SequenceCompareTo branch January 25, 2019 02:51
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corert that referenced this pull request Jan 25, 2019
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Jan 25, 2019
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/mono that referenced this pull request Jan 25, 2019
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
marek-safar pushed a commit to mono/mono that referenced this pull request Jan 25, 2019
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot pushed a commit to Dotnet-GitSync-Bot/corefx that referenced this pull request Jan 25, 2019
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub pushed a commit to dotnet/corefx that referenced this pull request Jan 25, 2019
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@benaadams benaadams changed the title Speedup .SequenceCompareTo(byte, ...) Intrinsicify .SequenceCompareTo(byte, ...) Feb 3, 2019
@fiigii

fiigii commented Feb 14, 2019

I don't think we're ready to have Tier0 to diverge from minopts just yet (to keep the test matrix somewhat contained) and we don't want to risk changes in minopts...

@AndyAyersMS If we want to diverge Tier0 from minopts in the future, can we consider creating a new compiler for Tier0? Perhaps a simpler (and faster) one-pass compiler (similar to Rotor's FJIT) would be better for this pure-JIT environment.

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Speedup .SequenceCompareTo(byte, ...)

* Rename jump location

* Better annotations for clarity


Commit migrated from dotnet/coreclr@665593e
7 participants