Add a SearchValues implementation for values with unique low nibbles #106900
Tagging subscribers to this area: @dotnet/area-system-memory
src/libraries/System.Private.CoreLib/src/System/SearchValues/IndexOfAnyAsciiSearcher.cs
{
    // Avoid false positives for the zero character if no other character has a low nibble of zero.
    // We can replace it with any other byte that has a non-zero low nibble.
    valuesByLowNibble.SetElementUnsafe(0, (byte)1);
I didn't fully grok this. Why don't we need to check if 1 is already being used?
All vector elements start out as 0, and not all of them may be initialized.
We map every input character to an element based on its lower nibble.
0, 16, 32 ... => valuesByLowNibble[0]
1, 17, 33 ... => valuesByLowNibble[1]
15, 31, 47 ... => valuesByLowNibble[15]
The search works by first picking a potential match based on the low nibble (Shuffle) and then confirming it (Equals).
This means that input characters with a given low nibble only care about the element of valuesByLowNibble for that nibble. Values like 1 or 2 don't care about what the value of valuesByLowNibble[7] is since they'll never be mapped to it.
This also means that it's okay for valuesByLowNibble to be left uninitialized at 0.
The Equals could only match for an input character 0, but those will always be mapped to valuesByLowNibble[0] by the shuffle instead.
The edge case is the 0th nibble since the character 0 could be a false positive there.
But it'll only be a false positive if we don't have the character 0 in our values.
That's the valuesByLowNibble.GetElement(0) == 0 && !lookup.Contains(0) check above.
To avoid false positives for 0, we can use the same trick of setting the element to some "unreachable" value.
We can use any value with a non-zero nibble, as the shuffle will map any inputs with those values to a different element. 1 is just an arbitrary choice.
Edit: I tweaked the comment a bit, hopefully, it's decipherable.
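The reasoning in this reply can be checked against a scalar model. The following is a minimal sketch, not the PR's vectorized code: the class and method names are hypothetical, and it assumes the values are ASCII and that no two of them share a low nibble.

```csharp
using System;

// Hypothetical scalar model of the lookup discussed above.
public static class LowNibbleTableSketch
{
    // Builds a 16-entry table indexed by low nibble.
    // Assumes 'values' are ASCII and have pairwise distinct low nibbles.
    public static byte[] BuildTable(ReadOnlySpan<byte> values)
    {
        var table = new byte[16]; // every slot starts out as 0
        bool containsZero = false;

        foreach (byte value in values)
        {
            table[value & 0xF] = value;
            containsZero |= value == 0;
        }

        // Slot 0 is the only slot the input byte 0 can reach. If it still holds 0
        // but 0 is not one of the values, store a byte with a non-zero low nibble
        // instead: an input equal to that byte is always looked up in another slot.
        if (table[0] == 0 && !containsZero)
        {
            table[0] = 1;
        }

        return table;
    }

    // The index step plays the role of Shuffle, the comparison the role of Equals.
    public static bool IsMatch(byte[] table, byte input) => table[input & 0xF] == input;
}
```

With a table built from values that include 1 but not 0, the input 1 still matches through slot 1, while the input 0 reads the placeholder in slot 0 and is rejected, so the placeholder never needs to be checked against the values.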
…otnet#106900) * Add SearchValues implementation for values with unique low nibbles * More generics * Tweak comment * Remove extra empty line * Update comment
Based on http://0x80.pl/articles/simd-byte-lookup.html#special-case-3-unique-lower-and-higher-nibbles
If all of the values have a different low nibble, we can use a faster search that takes advantage of that fact.
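That precondition is cheap to test with a 16-bit mask. A minimal sketch follows. The helper name is hypothetical, and this is not necessarily how the PR performs the check.

```csharp
using System;

public static class UniqueLowNibbleCheck
{
    // Returns true when no two values share the same low 4 bits.
    public static bool HasUniqueLowNibbles(ReadOnlySpan<byte> values)
    {
        int seen = 0; // bit i is set once a value with low nibble i has been seen

        foreach (byte value in values)
        {
            int bit = 1 << (value & 0xF);
            if ((seen & bit) != 0)
            {
                return false; // a second value landed on the same nibble
            }
            seen |= bit;
        }

        return true;
    }
}
```

Because there are only 16 possible low nibbles, any set with more than 16 values fails this check.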
For example, this applies to the "Sherlock|Holmes|Watson|Irene|Adler|John|Baker" regex pattern, which uses SearchValues.Create("ABHIJSW").
As a comparison, the current core lookup for an ASCII set on AVX2 uses 2 ANDs, 1 shift, and 2 shuffles, whereas the core lookup for values with unique low nibbles uses 1 comparison and 1 shuffle.
(Code-wise, most of the implementation in this PR is a copy-paste of the existing ASCII logic, swapping out this core lookup routine.)
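To make the shuffle-then-compare lookup concrete, here is a sketch written against the portable Vector128 helpers (.NET 7 or later). It is an illustration under stated assumptions, not the PR's routine: the type and method names are hypothetical, the values are assumed to be ASCII with distinct low nibbles, and because Vector128.Shuffle yields 0 for any index outside 0..15, the sketch masks the input first, which costs one AND more than the lookup described above.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

public static class UniqueLowNibbleVectorSketch
{
    // Places each value in the lane selected by its low nibble.
    public static Vector128<byte> CreateTable(ReadOnlySpan<byte> values)
    {
        var slots = new byte[16];
        bool containsZero = false;
        foreach (byte value in values)
        {
            slots[value & 0xF] = value;
            containsZero |= value == 0;
        }

        // Keep an unused lane 0 from matching the input byte 0.
        if (slots[0] == 0 && !containsZero) slots[0] = 1;

        return Vector128.Create((ReadOnlySpan<byte>)slots);
    }

    public static int IndexOfAny(ReadOnlySpan<byte> input, Vector128<byte> table)
    {
        int i = 0;
        for (; i + Vector128<byte>.Count <= input.Length; i += Vector128<byte>.Count)
        {
            Vector128<byte> source = Vector128.Create(input.Slice(i, Vector128<byte>.Count));

            // Shuffle: pick the candidate value for each input byte by its low nibble.
            Vector128<byte> candidates = Vector128.Shuffle(table, source & Vector128.Create((byte)0xF));

            // Equals: a lane matches only if the candidate is the input byte itself.
            uint mask = Vector128.Equals(candidates, source).ExtractMostSignificantBits();
            if (mask != 0) return i + BitOperations.TrailingZeroCount(mask);
        }

        // Scalar tail for the last (fewer than 16) bytes.
        for (; i < input.Length; i++)
        {
            if (table.GetElement(input[i] & 0xF) == input[i]) return i;
        }

        return -1;
    }
}
```

A caller would build the table once with CreateTable and reuse it across searches, since the table depends only on the set of values.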
Consider a benchmark inspired by @lemire's https://lemire.me/blog/2024/07/05/scan-html-faster-with-simd-instructions-net-c-edition/
In this case, we're scanning UTF8 input for bytes relevant to HTML (<, &, \r and \0).
Previously, SearchValues would pick the same implementation as span.IndexOfAny(4 values).
The blog post highlights that a hand-written approach can beat SearchValues in this case -- not anymore :)
This approach doubles the searching performance on my AVX2 CPU (Ryzen 1700).
On ARM, it's a 1.6x improvement.
Compared to the implementation for an arbitrary ASCII set, this improves throughput between 1.2x and 1.5x depending on the hardware (see more numbers below).
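From the caller's side, the HTML scan looks like the sketch below, using the public SearchValues API (.NET 8 or later). The class and method names are hypothetical. Which internal implementation SearchValues selects is not visible here and depends on the runtime version and hardware.

```csharp
using System;
using System.Buffers;

public static class HtmlScanSketch
{
    // The four bytes from the benchmark description: '<', '&', '\r' and '\0'.
    private static readonly SearchValues<byte> s_htmlBytes =
        SearchValues.Create("<&\r\0"u8);

    // Returns the index of the first byte of interest, or -1 if there is none.
    public static int IndexOfHtmlByte(ReadOnlySpan<byte> utf8) =>
        utf8.IndexOfAny(s_htmlBytes);
}
```

The SearchValues instance is created once and cached in a static field because construction is where the choice of search strategy happens. The per-call cost is only the search itself.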
The UniqueLowNibble approach could be used a lot more aggressively (see benchmarks below).
I conservatively placed it between 3 and 4 values to minimize the risk of regressions for now.
In practice, we're currently only using SearchValues with 4 or more values across runtime/aspnet.
As a follow-up, I plan on changing our heuristics around which approach we pick in SearchValues depending on the platform.
After that, we may want to consider using it even with fewer values (e.g. 2 or 3).
We should also consider using PackedSpanHelpers on ARM.
Searching for any subset of ASCII is currently faster than a basic IndexOf('a') on M1 hardware because we're not doing that.
Throughput numbers for scanning through 10k elements (10k bytes or 10k chars).
Rows are ordered from fastest to slowest.
ARM (Apple M1)
ARM (Azure D8plsv5 VM)
X64 with Vector256 (i9-10900X - no full Avx512)
X64 with Vector256 (Ryzen 1700)
X64 with Vector512 (Xeon Platinum 8370C)