Vectorize some Guid APIs (ctor, TryWriteBytes) #21336

EgorBo · 2018-12-03T12:26:11Z

Guid.TryWriteBytes(Span<byte>):

[Benchmark]
[ArgumentsSource(nameof(TestData))]
public Span<byte> TryWriteBytes(Guid id)
{
    Span<byte> s = new byte[20];
    id.TryWriteBytes(s);
    return s;
}

Method	Mean	Error	StdDev
TryWriteBytes_new	4.960 ns	0.0818 ns	0.0292 ns
TryWriteBytes_old	8.541 ns	0.0362 ns	0.0129 ns

new Guid(byte[]) and new Guid(ReadOnlySpan<byte>):

[Benchmark]
[ArgumentsSource(nameof(TestData))]
public Guid GuidCtor(byte[] data)
{
    return new Guid(data);
}

Method	Mean	Error	StdDev
GuidCtor_new	2.615 ns	0.0275 ns	0.0098 ns
GuidCtor_old	7.483 ns	0.0146 ns	0.0052 ns

Guid.ToByteArray():

[Benchmark]
[ArgumentsSource(nameof(TestData))]
public byte[] ToByteArray(Guid id)
{
    return id.ToByteArray();
}

Method	Mean	Error	StdDev
ToByteArray_new	3.569 ns	0.0192 ns	0.0068 ns
ToByteArray_old	6.265 ns	0.0776 ns	0.0202 ns

Environment:

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
Frequency=3609372 Hz, Resolution=277.0565 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-009812
  [Host]     : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT
  Job-CIYVXD : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT

IterationCount=6  WarmupCount=6

jkotas · 2018-12-03T12:35:16Z

src/System.Private.CoreLib/shared/System/Guid.cs

@@ -841,8 +841,12 @@ public bool Equals(Guid g)
            // Now compare each of the elements
            return g._a == _a &&
                Unsafe.Add(ref g._a, 1) == Unsafe.Add(ref _a, 1) &&
+#if BIT64


Why not do this for the first check as well?

@jkotas If the two GUIDs are different (both random), it becomes slower (see Equals_2 in the table: it does two ulong checks).

changed to be two ulong *

jkotas · 2018-12-03T12:38:43Z

src/System.Private.CoreLib/shared/System/Guid.cs

@@ -841,8 +841,12 @@ public bool Equals(Guid g)
            // Now compare each of the elements
            return g._a == _a &&
                Unsafe.Add(ref g._a, 1) == Unsafe.Add(ref _a, 1) &&
+#if BIT64
+                Unsafe.Add(ref Unsafe.As<int, ulong>(ref g._a), 1) == Unsafe.Add(ref Unsafe.As<int, ulong>(ref _a), 1);


This has a potential portability problem: it assumes that accessing a misaligned ulong is OK.

I agree 🙁

So this should rather use Unsafe.ReadUnaligned
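For illustration, here is a minimal sketch of what an unaligned-safe two-ulong comparison could look like. `GuidCompareSketch` and `EqualsAsTwoUlongs` are hypothetical names that operate on a `Guid` from the outside; this is not the code from the PR, only one way to apply `Unsafe.ReadUnaligned` as suggested above.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class GuidCompareSketch
{
    // Hypothetical helper: compares two Guids as two 64-bit halves.
    // ReadUnaligned makes no assumption about the alignment of the data.
    public static bool EqualsAsTwoUlongs(Guid left, Guid right)
    {
        // Reinterpret each 16-byte Guid as a reference to its first byte.
        ref byte l = ref Unsafe.As<Guid, byte>(ref left);
        ref byte r = ref Unsafe.As<Guid, byte>(ref right);

        // Compare bytes 0..7, then bytes 8..15.
        return Unsafe.ReadUnaligned<ulong>(ref l) == Unsafe.ReadUnaligned<ulong>(ref r)
            && Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref l, 8))
               == Unsafe.ReadUnaligned<ulong>(ref Unsafe.Add(ref r, 8));
    }
}
```

Whether this beats the field-by-field comparison depends on the inputs and on inlining, as the benchmark discussion in this thread shows.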

jkotas · 2018-12-03T12:52:26Z

https://gist.github.com/EgorBo/75348d5f194cbe91342d1b77d73b6c04

It has AggressiveInlining on everything. The actual implementation does not have it. Are the results representative?

EgorBo · 2018-12-03T12:55:56Z

@jkotas Oh, indeed, let me re-run the benchmark without it. Initially I wanted to replace the duplicated Guid-comparison code with calls to Equals marked with AggressiveInlining, e.g. https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Guid.cs#L833-L845 (plus operator ==/!=).

jkotas · 2018-12-03T13:06:11Z

cc @OlegAxenow

EgorBo · 2018-12-03T13:11:05Z

hm.. without AggressiveInlining results are a little bit different:

comparing Guids as two ulongs is the most efficient way now (tried several times on a clean env).

PS: probably it makes sense to mark this method with AggressiveInlining anyway.

EgorBo · 2018-12-03T16:04:14Z

Turns out WriteByteHelper can be optimized with SSE2 (it makes it twice as fast in my cases).

jkotas · 2018-12-03T16:17:31Z

src/System.Private.CoreLib/shared/System/Guid.cs

-            destination[13] = _i;
-            destination[14] = _j;
-            destination[15] = _k;
+            if (Sse2.IsSupported)


I do not think it makes sense to special-case simple copying like this for SSE2. Instead, this should be special-cased for all little-endian platforms, like:

if (BitConverter.IsLittleEndian) { MemoryMarshal.Write(destination, ref this); } else { ...

nice, it's even faster than the sse impl!

Folding WriteByteHelper into TryWriteBytes would make this even leaner for little-endian platforms:

    public byte[] ToByteArray()
    {
        var g = new byte[16];
        if (BitConverter.IsLittleEndian)
        {
            MemoryMarshal.TryWrite<Guid>(g, ref this);
        }
        else
        {
            TryWriteBytes(g);
        }
        return g;
    }

    public bool TryWriteBytes(Span<byte> destination)
    {
        if (BitConverter.IsLittleEndian)
        {
            return MemoryMarshal.TryWrite<Guid>(destination, ref this);
        }
        else
        {
            ... the slower path to write the bytes one by one ...
            ... you may consider using BinaryPrimitives.WriteInt32LittleEndian and BinaryPrimitives.WriteInt16LittleEndian to implement it ...
        }
    }

Maybe just call TryWriteBytes from ToByteArray directly. ToByteArray is a slow, allocating method. It is not going to be used on performance-critical paths. It is better to have smaller code for it.
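As a standalone illustration of the little-endian fast path discussed above, here is a minimal sketch that can be compiled outside CoreLib. `GuidWriteSketch` and `TryWriteGuid` are hypothetical names, and the big-endian fallback simply reuses `Guid.ToByteArray()` rather than the byte-by-byte path from the PR; that substitution is an assumption made to keep the example short.

```csharp
using System;
using System.Runtime.InteropServices;

public static class GuidWriteSketch
{
    // Hypothetical helper: writes the 16 bytes of a Guid into destination.
    public static bool TryWriteGuid(Guid value, Span<byte> destination)
    {
        if (BitConverter.IsLittleEndian)
        {
            // On little-endian platforms the in-memory layout of Guid matches
            // the byte order produced by ToByteArray, so a single struct copy
            // is enough. TryWrite returns false if the span is too short.
            return MemoryMarshal.TryWrite(destination, ref value);
        }

        // Assumed fallback for big-endian platforms (allocates; illustration only).
        if (destination.Length < 16)
            return false;

        value.ToByteArray().CopyTo(destination);
        return true;
    }
}
```

Because `TryWrite` performs its own length check, no separate bounds check is needed on the fast path.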

jkotas · 2018-12-03T17:03:45Z

src/System.Private.CoreLib/shared/System/Guid.cs

+        // Returns whether bytes are sucessfully written to given span.
+        public bool TryWriteBytes(Span<byte> destination)
+        {
+            if ((uint)destination.Length < 16)


Is this bounds check duplicated in MemoryMarshal.Write ?

@jkotas I guess MemoryMarshal.Write has its own bounds checks (according to the output), so this uint hack doesn't help.

So why not use TryWrite for the little-endian path, like I suggested above?

@jkotas oh, I didn't notice it, thanks!

jkotas · 2018-12-03T17:29:54Z

src/System.Private.CoreLib/shared/System/Guid.cs

+            }
+
+            // slower path for BigEndian
+            if ((uint)destination.Length >= 16)


Nit: Does writing it this way help anything? It is generally more readable to use early returns, like:

    if (destination.Length < 16)
        return false;
    destination[0] = ...;
    ....

jkotas · 2018-12-03T17:31:11Z

src/System.Private.CoreLib/shared/System/Guid.cs

-            WriteByteHelper(destination);
-            return true;
+            if (BitConverter.IsLittleEndian)
+            {


The constructor that takes Span can be optimized the same way.

But will it compile? C# requires struct constructors to initialize all fields by hand (however, Decimal compiles fine, hm...).

ok it works

this = MemoryMarshal.Read<MyGuid>(b);
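To make the `this = ...` trick concrete, here is a minimal sketch with a hypothetical 16-byte struct. `MyGuid` reuses the name from the line above; its field layout, the length check, and the `A` property are assumptions for illustration and do not reproduce the real `Guid`. Assigning to `this` satisfies the definite-assignment rule because it initializes every field at once.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 16-byte struct: 4 + 2 + 2 + 8 bytes, laid out sequentially.
[StructLayout(LayoutKind.Sequential)]
public struct MyGuid
{
    private int _a;
    private short _b;
    private short _c;
    private long _rest;

    public MyGuid(ReadOnlySpan<byte> b)
    {
        if (b.Length != 16)
            throw new ArgumentException("Expected exactly 16 bytes.", nameof(b));

        // Copies all 16 bytes into the struct in one step. The resulting field
        // values depend on the platform's endianness, which is why the PR
        // guards this path with BitConverter.IsLittleEndian.
        this = MemoryMarshal.Read<MyGuid>(b);
    }

    public int A => _a;
}
```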

jkotas · 2018-12-03T17:31:35Z

src/System.Private.CoreLib/shared/System/Guid.cs

+                destination[12] = _h;
+                destination[13] = _i;
+                destination[14] = _j;
+                destination[15] = _k;


Why not write this one first, like you are doing in other places?

I thought that since it's surrounded by if ((uint)destination.Length >= 16) the JIT would not insert bounds checks, but it turns out it inserts them anyway (I think there is a feature request for it somewhere...).

Right, this optimization is very sensitive to exact pattern today.

Does it still write bounds checks if you specify 16u?

@grant-d unfortunately yes it ignores (uint) > 16 (and 16u)
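The pattern being discussed, touching the highest index first so that the remaining writes are provably in range, can be sketched on a smaller example. `BoundsCheckSketch` and `TryWriteUInt32LittleEndian` are hypothetical names. Whether the JIT actually drops the later bounds checks depends on the runtime version, so treat this as an illustration of the code shape rather than a guaranteed optimization.

```csharp
using System;

public static class BoundsCheckSketch
{
    // Hypothetical helper: writes value as 4 little-endian bytes.
    public static bool TryWriteUInt32LittleEndian(Span<byte> destination, uint value)
    {
        if (destination.Length < 4)
            return false;

        // Hoist bounds checks: write the last element first so that a single
        // check on index 3 can cover the writes to indices 0..2.
        destination[3] = (byte)(value >> 24);
        destination[0] = (byte)value;
        destination[1] = (byte)(value >> 8);
        destination[2] = (byte)(value >> 16);
        return true;
    }
}
```

The early return keeps the method readable, as suggested in the review, and the out-of-order write is worth a comment so that later readers do not "fix" it.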


EgorBo · 2018-12-03T18:07:02Z

@jkotas I've removed Equals fix (will do that in a separate PR with other places).
I'll rewrite the PR description in an hour and attach better benchmarks for what we've done now 🙂

jkotas · 2018-12-03T18:12:25Z

Could you please also add a short comment like // Hoist bounds checks next to all of the out-of-order writes so that somebody looking at this will get a hint that it is done intentionally this way?

* Optimize some Guid APIs
* get rid of WriteByteHelper
* use TryWrite instead of Write
* Optimize ctor `Guid(ReadOnlySpan<byte> b)` and remove `Equals` optimization (move to separate PR).

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

kellypleahy · 2018-12-04T07:08:35Z

src/System.Private.CoreLib/shared/System/Guid.cs

@@ -841,8 +841,12 @@ public bool Equals(Guid g)
            // Now compare each of the elements
            return g._a == _a &&


This may be a dumb question that I could have just looked up, but I wanted to ask here so I was sure... What part of the GUID structure is the LSB here? Is it different based on big-endian vs. little-endian? The reason I ask is that if someone is comparing lots of sequentially assigned GUIDs, it's probably faster to compare LSB first, rather than MSB, right? Seems like that would be faster on average than the other short-circuit eval and not slower in the non-sequential case.

@kellypleahy It depends on what you mean by "sequentially assigned GUIDs". For instance, MS SQL Server increments the first int field, something like this:
00000000-0000-0000-0000-000000000000
00010000-0000-0000-0000-000000000000

Ah, thanks. It appears that UuidCreateSequential at least seems to update _a when generating sequential guids (and SQL Server just reverses the bytes of _a to big-endian when storing so as to ensure sequential order of the byte array) so in any case, comparing _a first is best by my argument above. Thanks for indulging me.

* Optimize some Guid APIs
* get rid of WriteByteHelper
* use TryWrite instead of Write
* Optimize ctor `Guid(ReadOnlySpan<byte> b)` and remove `Equals` optimization (move to separate PR).

Commit migrated from dotnet/coreclr@248449d

Optimize some Guid APIs

a4aec27

jkotas reviewed Dec 3, 2018


EgorBo added 2 commits December 3, 2018 16:15

compare as two ulong

be3ac7f

Optimize WriteByteHelper with SSE2

0feddf2

jkotas reviewed Dec 3, 2018


EgorBo added 2 commits December 3, 2018 19:24

replace SSE2 with MemoryMarshal.Write

90a7eb3

get rid of WriteByteHelper

e4d6a6a

jkotas reviewed Dec 3, 2018


EgorBo added 2 commits December 3, 2018 20:07

(uint) hack didn't work

fbd651d

use TryWrite instead of Write

e3a44a9

jkotas reviewed Dec 3, 2018


Optimize ctor Guid(ReadOnlySpan<byte> b) and remove Equals optimization (move to separate PR).

0080ec8

fix compilation error and add comments

448130d

EgorBo changed the title ~~Optimize Guid.Equals and remove some bounds checks~~ Vectorize some Guid APIs (ctor, TryWriteBytes) Dec 3, 2018

stephentoub approved these changes Dec 3, 2018


jkotas approved these changes Dec 3, 2018


jkotas merged commit 248449d into dotnet:master Dec 3, 2018

kellypleahy reviewed Dec 4, 2018


EgorBo mentioned this pull request Jan 31, 2020

JIT doesn't eliminate bounds checks sometimes dotnet/runtime#11623

Closed
