-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Vectorize some Guid APIs (ctor, TryWriteBytes) #21336
Conversation
@@ -841,8 +841,12 @@ public bool Equals(Guid g) | |||
// Now compare each of the elements | |||
return g._a == _a && | |||
Unsafe.Add(ref g._a, 1) == Unsafe.Add(ref _a, 1) && | |||
#if BIT64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to do this for the first check as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas in case if two guids are different (both random) - it becomes slower (see Equals_2
in the table - it does two ulong-checks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- changed to be two ulong *
@@ -841,8 +841,12 @@ public bool Equals(Guid g) | |||
// Now compare each of the elements | |||
return g._a == _a && | |||
Unsafe.Add(ref g._a, 1) == Unsafe.Add(ref _a, 1) && | |||
#if BIT64 | |||
Unsafe.Add(ref Unsafe.As<int, ulong>(ref g._a), 1) == Unsafe.Add(ref Unsafe.As<int, ulong>(ref _a), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has potential portability problem: It assumes that accessing misaligned ulong is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 🙁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this should rather use Unsafe.ReadUnaligned
I has AggressiveInlining on everything. The actual implementation does not have it. Are the results representative? |
@jkotas oh, indeed, let me re-run the benchmark without it. Initially I wanted to replace duplicated code to compare guids with |
cc @OlegAxenow |
Turns out |
destination[13] = _i; | ||
destination[14] = _j; | ||
destination[15] = _k; | ||
if (Sse2.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it make sense to be special casing a simple copying like this for SSE2. Instead, this should be special cased for all little-endian platforms, like:
if (BitConverter.IsLittleEndian)
{
MemoryMarshal.Write(destination, ref this);
}
else
{
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, it's even faster than the sse impl!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Folding WriteByteHelper into TryWriteBytes would make this even leaner for little endian platforms:
public byte[] ToByteArray()
{
var g = new byte[16];
if (BitConverter.IsLittleEndian)
{
MemoryMarshal.TryWrite<Guid>(g, ref this);
}
else
{
TryWriteBytes(g);
}
return g;
}
public bool TryWriteBytes(Span<byte> destination)
{
if (BitConverter.IsLittleEndian)
{
return MemoryMarshal.TryWrite<Guid>(destination, ref this);
}
else
{
... the slower path to write the bytes one by one ...
... you make consider using BinaryPrimitives.WriteInt32LittleEndian and BinaryPrimitives.WriteInt16LittleEndian to implement it ...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just call TryWriteBytes
from ToByteArray
directly. ToByteArray
is slow allocating method. It is not going to be used on performance critical paths. It is better to have smaller code for it.
// Returns whether bytes are sucessfully written to given span. | ||
public bool TryWriteBytes(Span<byte> destination) | ||
{ | ||
if ((uint)destination.Length < 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this bounds check duplicated in MemoryMarshal.Write
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas I guess MemoryMarshal.Write has it's own bounds checks (according to output) and this uint-hack doesn't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why not to use TryWrite for the little endian path like I have suggested above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas oh, I didn't notice it, thanks!
} | ||
|
||
// slower path for BigEndian | ||
if ((uint)destination.Length >= 16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Does writing it this way help anything? It is generally more readable to use early returns, like:
if (destination.Length < 16)
return false;
destination[0] = ...;
....
WriteByteHelper(destination); | ||
return true; | ||
if (BitConverter.IsLittleEndian) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor that takes Span can be optimized the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but will it compile? C# requires stucts' constructors to init all fields by hands (however Decimal
compiles fine hm...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok it works
this = MemoryMarshal.Read<MyGuid>(b);
destination[12] = _h; | ||
destination[13] = _i; | ||
destination[14] = _j; | ||
destination[15] = _k; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to write this one first like you are doing in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought since it's surrounded with if ((uint)destination.Length >= 16)
JIT would not insert bounds checks but turns out it inserts them anyway (I think there is a feature-request for it somewhere...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this optimization is very sensitive to exact pattern today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it still write bounds checks if you specify 16u
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grant-d unfortunately yes it ignores (uint) > 16 (and 16u)
…zation (move to separate PR).
@jkotas I've removed |
Could you please also add a short comment like |
* Optimize some Guid APIs * get rid of WriteByteHelper * use TryWrite instead of Write * Optimize ctor `Guid(ReadOnlySpan<byte> b)` and remove `Equals` optimization (move to separate PR). Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@@ -841,8 +841,12 @@ public bool Equals(Guid g) | |||
// Now compare each of the elements | |||
return g._a == _a && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be a dumb question that I could have just looked up, but I wanted to ask here so I was sure... What part of the GUID structure is the LSB here? Is it different based on big-endian vs. little-endian? The reason I ask is that if someone is comparing lots of sequentially assigned GUIDs, it's probably faster to compare LSB first, rather than MSB, right? Seems like that would be faster on average than the other short-circuit eval and not slower in the non-sequential case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kellypleahy it depends on what you mean by "sequentially assigned GUIDs". For instance MS SQL Server increment first int field something like this:
00000000-0000-0000-0000-000000000000
00010000-0000-0000-0000-000000000000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. It appears that UuidCreateSequential at least seems to update _a when generating sequential guids (and SQL Server just reverses the bytes of _a to big-endian when storing so as to ensure sequential order of the byte array) so in any case, comparing _a first is best by my argument above. Thanks for indulging me.
* Optimize some Guid APIs * get rid of WriteByteHelper * use TryWrite instead of Write * Optimize ctor `Guid(ReadOnlySpan<byte> b)` and remove `Equals` optimization (move to separate PR). Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Optimize some Guid APIs * get rid of WriteByteHelper * use TryWrite instead of Write * Optimize ctor `Guid(ReadOnlySpan<byte> b)` and remove `Equals` optimization (move to separate PR). Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Optimize some Guid APIs * get rid of WriteByteHelper * use TryWrite instead of Write * Optimize ctor `Guid(ReadOnlySpan<byte> b)` and remove `Equals` optimization (move to separate PR). Commit migrated from dotnet/coreclr@248449d
Guid.TryWriteBytes(Span<byte>)
:new Guid(byte[])
andnew Guid(ReadOnlySpan<byte>)
:Guid.ToByteArray()
:Environment: