Any perf numbers?
Method | Position | Mean |
-------------------- |--------- |-----------:|
- String_IndexOf | 7 | 5.031 ns |
+ String_IndexOf | 7 | 4.403 ns |
- String_IndexOfAny_2 | 7 | 9.353 ns |
+ String_IndexOfAny_2 | 7 | 9.097 ns |
- String_IndexOfAny_3 | 7 | 11.177 ns |
+ String_IndexOfAny_3 | 7 | 10.270 ns |
- String_IndexOfAny_4 | 7 | 13.400 ns |
+ String_IndexOfAny_4 | 7 | 11.986 ns |
- String_IndexOfAny_5 | 7 | 14.540 ns |
+ String_IndexOfAny_5 | 7 | 12.865 ns |
- String_IndexOf | 8 | 5.053 ns |
+ String_IndexOf | 8 | 4.400 ns |
- String_IndexOfAny_2 | 8 | 9.716 ns |
+ String_IndexOfAny_2 | 8 | 9.133 ns |
- String_IndexOfAny_3 | 8 | 11.476 ns |
+ String_IndexOfAny_3 | 8 | 11.137 ns |
- String_IndexOfAny_4 | 8 | 14.236 ns |
+ String_IndexOfAny_4 | 8 | 13.529 ns |
- String_IndexOfAny_5 | 8 | 15.362 ns |
+ String_IndexOfAny_5 | 8 | 13.997 ns |
- String_IndexOf | 15 | 8.817 ns |
+ String_IndexOf | 15 | 4.577 ns |
- String_IndexOfAny_2 | 15 | 11.978 ns |
+ String_IndexOfAny_2 | 15 | 8.620 ns |
- String_IndexOfAny_3 | 15 | 12.835 ns |
+ String_IndexOfAny_3 | 15 | 10.462 ns |
- String_IndexOfAny_4 | 15 | 14.271 ns |
+ String_IndexOfAny_4 | 15 | 11.411 ns |
- String_IndexOfAny_5 | 15 | 15.660 ns |
+ String_IndexOfAny_5 | 15 | 11.894 ns |
- String_IndexOf | 16 | 8.223 ns |
+ String_IndexOf | 16 | 4.697 ns |
- String_IndexOfAny_2 | 16 | 11.919 ns |
+ String_IndexOfAny_2 | 16 | 8.616 ns |
- String_IndexOfAny_3 | 16 | 12.857 ns |
+ String_IndexOfAny_3 | 16 | 10.538 ns |
- String_IndexOfAny_4 | 16 | 14.544 ns |
+ String_IndexOfAny_4 | 16 | 11.411 ns |
- String_IndexOfAny_5 | 16 | 15.032 ns |
+ String_IndexOfAny_5 | 16 | 11.899 ns |
- String_IndexOf | 31 | 9.292 ns |
+ String_IndexOf | 31 | 5.226 ns |
- String_IndexOfAny_2 | 31 | 12.521 ns |
+ String_IndexOfAny_2 | 31 | 9.206 ns |
- String_IndexOfAny_3 | 31 | 14.332 ns |
+ String_IndexOfAny_3 | 31 | 11.389 ns |
- String_IndexOfAny_4 | 31 | 15.877 ns |
+ String_IndexOfAny_4 | 31 | 12.327 ns |
- String_IndexOfAny_5 | 31 | 17.119 ns |
+ String_IndexOfAny_5 | 31 | 13.023 ns |
- String_IndexOf | 32 | 9.276 ns |
+ String_IndexOf | 32 | 5.224 ns |
- String_IndexOfAny_2 | 32 | 12.519 ns |
+ String_IndexOfAny_2 | 32 | 9.208 ns |
- String_IndexOfAny_3 | 32 | 14.990 ns |
+ String_IndexOfAny_3 | 32 | 11.387 ns |
- String_IndexOfAny_4 | 32 | 15.861 ns |
+ String_IndexOfAny_4 | 32 | 13.056 ns |
- String_IndexOfAny_5 | 32 | 16.744 ns |
+ String_IndexOfAny_5 | 32 | 13.114 ns |
- String_IndexOf | 63 | 11.307 ns |
+ String_IndexOf | 63 | 6.991 ns |
- String_IndexOfAny_2 | 63 | 14.935 ns |
+ String_IndexOfAny_2 | 63 | 10.207 ns |
- String_IndexOfAny_3 | 63 | 16.546 ns |
+ String_IndexOfAny_3 | 63 | 12.833 ns |
- String_IndexOfAny_4 | 63 | 18.757 ns |
+ String_IndexOfAny_4 | 63 | 14.488 ns |
- String_IndexOfAny_5 | 63 | 20.503 ns |
+ String_IndexOfAny_5 | 63 | 16.153 ns |
- String_IndexOf | 64 | 11.196 ns |
+ String_IndexOf | 64 | 6.990 ns |
- String_IndexOfAny_2 | 64 | 14.828 ns |
+ String_IndexOfAny_2 | 64 | 10.332 ns |
- String_IndexOfAny_3 | 64 | 17.834 ns |
+ String_IndexOfAny_3 | 64 | 13.441 ns |
- String_IndexOfAny_4 | 64 | 18.735 ns |
+ String_IndexOfAny_4 | 64 | 14.534 ns |
- String_IndexOfAny_5 | 64 | 20.200 ns |
+ String_IndexOfAny_5 | 64 | 15.891 ns |
- String_IndexOf | 127 | 14.386 ns |
+ String_IndexOf | 127 | 9.050 ns |
- String_IndexOfAny_2 | 127 | 21.806 ns |
+ String_IndexOfAny_2 | 127 | 12.716 ns |
- String_IndexOfAny_3 | 127 | 24.639 ns |
+ String_IndexOfAny_3 | 127 | 15.658 ns |
- String_IndexOfAny_4 | 127 | 29.164 ns |
+ String_IndexOfAny_4 | 127 | 19.461 ns |
- String_IndexOfAny_5 | 127 | 33.225 ns |
+ String_IndexOfAny_5 | 127 | 23.109 ns |
- String_IndexOf | 128 | 14.643 ns |
+ String_IndexOf | 128 | 9.228 ns |
- String_IndexOfAny_2 | 128 | 21.848 ns |
+ String_IndexOfAny_2 | 128 | 12.544 ns |
- String_IndexOfAny_3 | 128 | 27.099 ns |
+ String_IndexOfAny_3 | 128 | 15.608 ns |
- String_IndexOfAny_4 | 128 | 29.107 ns |
+ String_IndexOfAny_4 | 128 | 19.520 ns |
- String_IndexOfAny_5 | 128 | 33.325 ns |
+ String_IndexOfAny_5 | 128 | 23.233 ns |
- String_IndexOf | 255 | 21.395 ns |
+ String_IndexOf | 255 | 13.919 ns |
- String_IndexOfAny_2 | 255 | 30.815 ns |
+ String_IndexOfAny_2 | 255 | 17.898 ns |
- String_IndexOfAny_3 | 255 | 38.530 ns |
+ String_IndexOfAny_3 | 255 | 22.278 ns |
- String_IndexOfAny_4 | 255 | 40.402 ns |
+ String_IndexOfAny_4 | 255 | 30.889 ns |
- String_IndexOfAny_5 | 255 | 47.078 ns |
+ String_IndexOfAny_5 | 255 | 36.649 ns |
- String_IndexOf | 256 | 21.452 ns |
+ String_IndexOf | 256 | 13.768 ns |
- String_IndexOfAny_2 | 256 | 30.896 ns |
+ String_IndexOfAny_2 | 256 | 17.715 ns |
- String_IndexOfAny_3 | 256 | 33.635 ns |
+ String_IndexOfAny_3 | 256 | 22.412 ns |
- String_IndexOfAny_4 | 256 | 40.521 ns |
+ String_IndexOfAny_4 | 256 | 30.565 ns |
- String_IndexOfAny_5 | 256 | 45.958 ns |
+ String_IndexOfAny_5 | 256 | 36.775 ns |
- String_IndexOf | 1023 | 68.075 ns |
+ String_IndexOf | 1023 | 49.797 ns |
- String_IndexOfAny_2 | 1023 | 88.839 ns |
+ String_IndexOfAny_2 | 1023 | 68.134 ns |
- String_IndexOfAny_3 | 1023 | 97.165 ns |
+ String_IndexOfAny_3 | 1023 | 76.172 ns |
- String_IndexOfAny_4 | 1023 | 116.962 ns |
+ String_IndexOfAny_4 | 1023 | 113.140 ns |
- String_IndexOfAny_5 | 1023 | 131.489 ns |
+ String_IndexOfAny_5 | 1023 | 133.238 ns |
- String_IndexOf | 1024 | 68.530 ns |
+ String_IndexOf | 1024 | 49.741 ns |
- String_IndexOfAny_2 | 1024 | 89.023 ns |
+ String_IndexOfAny_2 | 1024 | 68.261 ns |
- String_IndexOfAny_3 | 1024 | 97.223 ns |
+ String_IndexOfAny_3 | 1024 | 76.760 ns |
- String_IndexOfAny_4 | 1024 | 117.060 ns |
+ String_IndexOfAny_4 | 1024 | 113.144 ns |
- String_IndexOfAny_5 | 1024 | 131.619 ns |
+ String_IndexOfAny_5 | 1024 | 132.635 ns |
coreclr-ci error is
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
// Note that MoveMask has converted the equal vector elements into a set of bit flags,
// so the bit position in 'matches' corresponds to the element offset.
// We perform the Or at non-Vector level as we are using the maximum number of non-preserved registers,
// and more causes them first to be pushed to stack and then popped on exit to preserve their values.
Does the prolog/epilog code have critical perf impact? I suggest doing vector-level OR to avoid the long latency MoveMask.
Btw, only Windows has callee-saved SIMD registers, so vector-level OR should improve more on Linux.
The issue is that the pushes and pops are a fixed cost: they add 64 bytes of writes to the stack and 64 bytes of reads from the stack even if the string is only 8 bytes long, so they have a significant impact for short lengths.
For longer lengths (> 512 bytes) it gets amortized; but it's used to check for invalid chars on short lengths quite often (HTML chars in an element name, special chars in a file search pattern, etc.).
Doesn't mean there isn't scope for a follow up 😉
LGTM overall. But I guess we probably can replace more MoveMask by Or.
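For anyone following the MoveMask vs. Or thread, here is a minimal sketch of the two variants being compared, for a single 8-char block only. It is not the code in this PR: `MoveMaskSketch`, `IndexOfAny2InFirstVector` and the `orAtVectorLevel` switch are hypothetical names for illustration, and the real implementation loops over the string and handles more values. It shows why the bit position in `matches` maps to a char offset (MoveMask yields one bit per byte, so two bits per char).

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class MoveMaskSketch
{
    // Hypothetical helper (not from the PR): index of the first value0/value1
    // within the first 8 chars of 'span', or -1 if neither occurs there.
    public static int IndexOfAny2InFirstVector(
        ReadOnlySpan<char> span, char value0, char value1, bool orAtVectorLevel)
    {
        if (!Sse2.IsSupported || span.Length < Vector128<ushort>.Count)
        {
            // Scalar fallback so the sketch runs on any hardware.
            int n = Math.Min(span.Length, Vector128<ushort>.Count);
            for (int i = 0; i < n; i++)
            {
                if (span[i] == value0 || span[i] == value1) return i;
            }
            return -1;
        }

        // Read the first 16 bytes (8 chars) as one vector.
        Vector128<ushort> data =
            MemoryMarshal.Read<Vector128<ushort>>(MemoryMarshal.AsBytes(span));
        Vector128<ushort> eq0 = Sse2.CompareEqual(data, Vector128.Create((ushort)value0));
        Vector128<ushort> eq1 = Sse2.CompareEqual(data, Vector128.Create((ushort)value1));

        int matches = orAtVectorLevel
            // Variant suggested in review: one vector Or, one MoveMask.
            ? Sse2.MoveMask(Sse2.Or(eq0, eq1).AsByte())
            // Variant described in the code comment: MoveMask each, Or the ints.
            : Sse2.MoveMask(eq0.AsByte()) | Sse2.MoveMask(eq1.AsByte());

        if (matches == 0) return -1;

        // One mask bit per byte, so divide the bit position by sizeof(char).
        return BitOperations.TrailingZeroCount(matches) / sizeof(char);
    }
}
```

Both variants return the same index; the trade-off discussed above is only about register pressure versus MoveMask latency. For example, `MoveMaskSketch.IndexOfAny2InFirstVector("abcdefgh", 'g', 'd', false)` finds `'d'` first.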
@jkotas you asked in #21730 about the performance effects of switching wcslen over to this implementation (last commit c7836d5)
I tried a preamble to align to UIntPtr, then processed UIntPtr sizes with bit-twiddling in the Sequential section; but that made it worse (branch forwards were better than the always-run extra instructions)
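For context, the kind of word-at-a-time bit-twiddling referred to above can be sketched as follows. This is an assumption about the general technique (the classic "has a zero lane" trick applied to 16-bit chars), not the code that was actually tried; `ZeroLaneSketch`, `HasZeroChar` and `Length` are hypothetical names, and the alignment preamble is left out for brevity.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ZeroLaneSketch
{
    private const ulong Ones = 0x0001000100010001UL;  // 1 in each 16-bit lane
    private const ulong Highs = 0x8000800080008000UL; // top bit of each lane

    // True if any of the four 16-bit lanes in 'fourChars' is zero.
    public static bool HasZeroChar(ulong fourChars) =>
        ((fourChars - Ones) & ~fourChars & Highs) != 0;

    // wcslen-like sketch: index of the first '\0' in 'buffer',
    // or buffer.Length if there is none.
    public static int Length(ReadOnlySpan<char> buffer)
    {
        // View the buffer four chars at a time (no alignment preamble here).
        ReadOnlySpan<ulong> words = MemoryMarshal.Cast<char, ulong>(buffer);

        int i = 0;
        foreach (ulong word in words)
        {
            if (HasZeroChar(word)) break; // terminator is somewhere in this word
            i += 4;
        }

        // Finish sequentially: locate the exact char, or scan the leftover tail.
        for (; i < buffer.Length; i++)
        {
            if (buffer[i] == '\0') return i;
        }
        return buffer.Length;
    }
}
```

The extra subtract/and-not/and per word always executes, which is consistent with the observation above that plain forward branches did better on short inputs.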
Should be good to go/review; mostly copying byte version, though also changed byte version to do the
Going to split this into separate PRs
Use System.Runtime.Intrinsics.X86 in addition to System.Numerics.Vector and apply the learnings of byte #22118 and #22127 to char.
Split this into separate commits so it's easier to follow; though the first one is kinda messy.
nLength to lengthToExamine as it was confusing me)
Methods improved
Perf numbers #22187 (comment)
/cc @CarolEidt @fiigii @tannergooding @jkotas