Improve Dictionary perfomance #27149

benaadams · 2019-10-12T00:45:33Z

Dictionary avoid second bounds check in Get methods
Hoist collision max count check
Avoid mod operator when fast alternative available

|        Method | Toolchain |      Mean | Ratio |
|-------------- |---------- |----------:|------:|
| KNucleotide_1 |      base | 368.1  ms |  1.00 |
| KNucleotide_1 |      diff | 336.9  ms |  0.92 |
| KNucleotide_9 |      base | 104.05 ms |  1.00 |
| KNucleotide_9 |      diff |  89.71 ms |  0.86 |

System.Collections.TryGetValueTrue_String, String

|     Method | Toolchain | Size |     Mean | Ratio |
|----------- |---------- |----- |---------:|------:|
| Dictionary |      base |  512 | 15.72 us |  1.00 |
| Dictionary |      diff |  512 | 12.87 us |  0.82 |

System.Collections.TryGetValueFalse_String, String

|     Method | Toolchain | Size |     Mean | Ratio |
|----------- |---------- |----- |---------:|------:|
| Dictionary |      base |  512 | 13.84 us |  1.00 |
| Dictionary |      diff |  512 | 10.61 us |  0.77 |

System.Collections.TryGetValueTrue_Int32, Int32

|     Method | Toolchain | Size |     Mean | Ratio |
|----------- |---------- |----- |---------:|------:|
| Dictionary |      base |  512 | 5.901 us |  1.00 |
| Dictionary |      diff |  512 | 3.642 us |  0.62 |

System.Collections.TryGetValueFalse_Int32, Int32

|     Method | Toolchain | Size |     Mean | Ratio |
|----------- |---------- |----- |---------:|------:|
| Dictionary |      base |  512 | 8.511 us |  1.00 |
| Dictionary |      diff |  512 | 4.194 us |  0.49 |

More #27149 (comment)

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

GrabYourPitchforks · 2019-10-12T02:16:43Z

For reference, here's the before and after codegen for the get_Item method, courtesy Sharplab. If you drill into the linked code on Sharplab, I mocked up Ben's changes inside a dummy type just to see what would happen to the codegen.

public static int Before(Dictionary<int, int> d) {
    return d[42];
}

public static int After(MockDictionary<int, int> d)
{
    return d[42];
}

C.Before(System.Collections.Generic.Dictionary`2<Int32,Int32>)
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov rsi, rcx
    L0008: mov ecx, [rsi]
    L000a: mov rcx, rsi
    L000d: mov edx, 0x2a
    L0012: call System.Collections.Generic.Dictionary`2[[System.Int32, System.Private.CoreLib],[System.Int32, System.Private.CoreLib]].FindEntry(Int32)
    L0017: test eax, eax
    L0019: jl L0035
    L001b: mov rcx, [rsi+0x10]
    L001f: cmp eax, [rcx+0x8]
    L0022: jae L0040
    L0024: movsxd rax, eax
    L0027: shl rax, 0x4
    L002b: mov eax, [rcx+rax+0x1c]
    L002f: add rsp, 0x20
    L0033: pop rsi
    L0034: ret
    L0035: mov ecx, 0x2a
    L003a: call System.ThrowHelper.ThrowKeyNotFoundException[[System.Int32, System.Private.CoreLib]](Int32)
    L003f: int3
    L0040: call 0x7ff95393ef00
    L0045: int3

C.After(MockDictionary`2<Int32,Int32>)
    L0000: sub rsp, 0x28
    L0004: mov edx, [rcx]
    L0006: mov edx, 0x2a
    L000b: call MockDictionary`2[[System.Int32, System.Private.CoreLib],[System.Int32, System.Private.CoreLib]].FindValueOrNull(Int32)
    L0010: test rax, rax
    L0013: jz L001c
    L0015: mov eax, [rax]
    L0017: add rsp, 0x28
    L001b: ret
    L001c: mov ecx, 0x2a
    L0021: call MockDictionary`2[[System.Int32, System.Private.CoreLib],[System.Int32, System.Private.CoreLib]].ThrowKeyNotFoundException[[System.Int32, System.Private.CoreLib]](Int32)
    L0026: int3

src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/RuntimeHelpers.cs

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

benaadams · 2019-10-13T02:32:00Z

Wasn't happy with the asm this was generating; and the gnarly bool handling logic that it generated whenever it inlined. So trying a different approach

The side-effect of the new that I'm not sure about is how it effects uses of ContainsKey as it introduces an extra byref param, and again introduces some unnecessary bool handling which looks in part due to wanting the register to clear the out value? e.g.

-   mov      rdx, rsi
-   call     [Dictionary`2:FindEntry(ref):int:this]
-   test     eax, eax
-   jge      SHORT G_M9502_IG08
+   lea      r8, bword ptr [rsp+20H]
+   mov      rdx, rsi
+   call     [Dictionary`2:TryGetValue(ref,byref):bool:this]
+   movzx    rcx, al
+   xor      rax, rax
+   mov      gword ptr [rsp+20H], rax
+   test     ecx, ecx
+   jne      SHORT G_M9502_IG08

Otherwise the diffs look good

Total bytes of diff: -891 (-0.03% of base)
    diff is an improvement.

Total byte diff includes -3935 bytes from reconciling methods
  Base had    4 unique methods,     3935 unique bytes
  Diff had    0 unique methods,  0 unique bytes

Top file improvements by size (bytes):
  -891 : System.Private.CoreLib.dasm (-0.03% of base)

IDisposable

Looks great, one minor nit

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

jkotas · 2019-10-13T05:02:01Z

Wasn't happy with the asm this was generating; and the gnarly bool handling logic that it generated whenever it inlined.

It is not unusual for the JIT to generate sub-optimal code for rarely used or brand new patterns. The JIT typically needs tweaking for best results.

benaadams · 2019-10-13T05:48:47Z

Shaved off a little more

Total bytes of diff: -1053 (-0.04% of base)
   diff is an improvement.

Total byte diff includes -3935 bytes from reconciling methods
   Base had    4 unique methods,     3935 unique bytes
   Diff had    0 unique methods,   0 unique bytes

Top file improvements by size (bytes):
  -1053 : System.Private.CoreLib.dasm (-0.04% of base)

1 total files with size differences (1 improved, 0 regressed), 0 unchanged.

Top method regressions by size (bytes):
   1845 (444.58% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(ref,byref):bool:this (5 methods)
    861 (337.65% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(long,byref):bool:this (3 methods)
    269 (316.47% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(int,byref):bool:this
    195 (110.17% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryGetValue(struct,byref):bool:this (2 base, 1 diff methods)
    108 ( 9.35% of base) : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.IDictionary.Contains(ref):bool:this (11 methods)
     92 ( 7.91% of base) : System.Private.CoreLib.dasm - PseudoCustomAttribute:GetCustomAttributes(ref,ref,byref) (4 methods)
     92 (10.17% of base) : System.Private.CoreLib.dasm - PseudoCustomAttribute:IsDefined(ref,ref):bool (4 methods)
     45 (51.72% of base) : System.Private.CoreLib.dasm - KeyCollection:System.Collections.Generic.ICollection<TKey>.Contains(long):bool:this (3 methods)
     40 (24.39% of base) : System.Private.CoreLib.dasm - Unsafe:AsRef(long):byref (41 base, 51 diff methods)
     33 (16.50% of base) : System.Private.CoreLib.dasm - KeyCollection:System.Collections.Generic.ICollection<TKey>.Contains(ref):bool:this (6 methods)
     31 (18.67% of base) : System.Private.CoreLib.dasm - Attribute:CopyToArrayList(ref,ref,ref)
     27 ( 3.77% of base) : System.Private.CoreLib.dasm - MemberInfoCache`1:PopulateEvents(struct,ref,ref,byref):this
     24 (25.26% of base) : System.Private.CoreLib.dasm - KeyCollection:System.Collections.Generic.ICollection<TKey>.Contains(struct):bool:this (2 methods)
     23 (11.79% of base) : System.Private.CoreLib.dasm - EventRegistrationTokenTable`1:AddEventHandlerNoLock(ref):struct:this
     23 ( 2.15% of base) : System.Private.CoreLib.dasm - EventSource:DebugCheckEvent(byref,ref,ref,ref,ref,int)

Top method improvements by size (bytes):
  -2270 (-100.00% of base) : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(ref):int:this (5 base, 0 diff methods)
  -1008 (-100.00% of base) : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(long):int:this (3 base, 0 diff methods)
   -336 (-100.00% of base) : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(struct):int:this (1 base, 0 diff methods)
   -321 (-100.00% of base) : System.Private.CoreLib.dasm - Dictionary`2:FindEntry(int):int:this (1 base, 0 diff methods)
   -234 (-15.50% of base) : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<TKey,TValue>>.Contains(struct):bool:this (11 methods)
   -196 (-11.48% of base) : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<TKey,TValue>>.Remove(struct):bool:this (11 methods)
   -128 (-8.34% of base) : System.Private.CoreLib.dasm - Dictionary`2:System.Collections.IDictionary.get_Item(ref):ref:this (11 methods)
   -114 (-7.88% of base) : System.Private.CoreLib.dasm - Dictionary`2:ContainsValue(ref):bool:this (6 methods)
    -75 (-3.24% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(long,ref,ubyte):bool:this (3 methods)
    -54 (-25.00% of base) : System.Private.CoreLib.dasm - Dictionary`2:get_Item(long):ref:this (3 methods)
    -26 (-1.30% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(ref,struct,ubyte):bool:this (2 methods)
    -25 (-3.24% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(struct,ref,ubyte):bool:this
    -24 (-3.18% of base) : System.Private.CoreLib.dasm - Dictionary`2:TryInsert(int,ref,ubyte):bool:this
    -20 (-8.77% of base) : System.Private.CoreLib.dasm - Dictionary`2:get_Item(ref):struct:this (2 methods)
    -18 (-26.09% of base) : System.Private.CoreLib.dasm - Dictionary`2:get_Item(int):ref:this

65 total methods with size differences (30 improved, 35 regressed), 18583 unchanged.

benaadams · 2019-10-13T06:00:30Z

It is not unusual for the JIT to generate sub-optimal code for rarely used or brand new patterns. The JIT typically needs tweaking for best results.

I see this pattern come up a bit; I think its post inlining and in this case from EqualityComparer<TKey>.Default.Equals

   mov      ecx, dword ptr [r13+12]
   cmp      ecx, esi
   sete     cl
   movzx    rcx, cl
   test     ecx, ecx
   jne      G_M59263_IG12

Would I correct in thinking it could just be:

   mov      ecx, dword ptr [r13+12]
   cmp      ecx, esi
   je       G_M59263_IG12

Where the cmp and je would microsfuse to 1 instuction? Will raise issue if so

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

benaadams · 2019-10-13T18:14:39Z

There is a further optimisation to be had here as the test could be dropped from the collision testing sequences

dec      r13d
test     r13d, r13d
jl       SHORT G_M59263_IG15

though doesn't look like the Jit currently does that optimization https://github.com/dotnet/coreclr/issues/7566

src/System.Private.CoreLib/shared/System/Collections/HashHelpers.cs

benaadams · 2019-10-14T02:49:00Z

ContainsKey which I was worried about looks ok

System.Collections.ContainsKeyTrue_String, String

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 14.86 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 13.01 us |  0.88 |

System.Collections.ContainsKeyFalse_String, String

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 11.48 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 10.79 us |  0.94 |

System.Collections.ContainsKeyTrue_Int32, Int32

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 4.655 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 3.826 us |  0.82 |

System.Collections.ContainsKeyFalse_Int32, Int32

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 4.815 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 3.336 us |  0.69 |

TryGetValue is happy

System.Collections.TryGetValueTrue_String, String

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 15.72 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 12.87 us |  0.82 |

System.Collections.TryGetValueFalse_String, String

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 13.84 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 10.61 us |  0.77 |

System.Collections.TryGetValueTrue_Int32, Int32

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 5.901 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 3.642 us |  0.62 |

System.Collections.TryGetValueFalse_Int32, Int32

|     Method |                   Toolchain | Size |     Mean | Ratio |
|----------- |---------------------------- |----- |---------:|------:|
| Dictionary | \Core_Root-base\CoreRun.exe |  512 | 8.511 us |  1.00 |
| Dictionary | \Core_Root-diff\CoreRun.exe |  512 | 4.194 us |  0.49 |

benaadams · 2019-10-14T02:57:59Z

KNucleotide changes

|        Method |                   Toolchain |      Mean | Ratio |
|-------------- |---------------------------- |----------:|------:|
| KNucleotide_1 | \Core_Root-base\CoreRun.exe | 368.1  ms |  1.00 |
| KNucleotide_1 | \Core_Root-diff\CoreRun.exe | 336.9  ms |  0.92 |
| KNucleotide_9 | \Core_Root-base\CoreRun.exe | 104.05 ms |  1.00 |
| KNucleotide_9 | \Core_Root-diff\CoreRun.exe |  89.71 ms |  0.86 |

/cc @danmosemsft

benaadams · 2019-10-14T03:15:29Z

src/System.Private.CoreLib/shared/System/Collections/HashHelpers.cs

+                // more costly idiv instruction.
+
+                const uint CrcReflected = 0xEDB88320;
+                return (uint)((Sse42.Crc32(CrcReflected, index) * (ulong)range) >> 32);


@mikedn @AndyAyersMS running the hashcode through Crc32 gets around the distribution issue with unknown hashcodes being high entropy in their LSB when using @lemire's fastrange. (int being an example here)

Though alas we can only do this when hardware accelerated as otherwise Crc32 is too slow; so fall back to modulo when its not supported in hardware.

I can confirm from my experience that fastrange is very dangerous when used as a hash reducer. I tried it in the past and it resulted in massive collisions in common cases like sequential numbers, zip codes, etc. - i.e. stuff that has entropy in low bits.

I did not consider hardware-accelerated mixer like Crc32 - cannot comment on that. It was not available at the time. It is an interesting idea.

My guess it may still regress cases like when dictionary is used as a sparse list and keys have clusters. Modulus, for good or bad, happens to preserve the locality.

Need to reverse the bits to reverse the entropy; and then you get something closer to the behaviour of modulus with fastrange. Alas there still isn't a fast way to do that in hardware; and reversing the endianness still leaves the wrong order entropy per byte.

Re: locality #27149 (comment) for Dictionary specifically it has a lesser impact; than for example Hashtable, as its only changing the locality of the smaller int[] array that contains the indexes and the larger Entry[] array would maintain the same very packed locality. But yes it would have an impact.

One approach could be to use a bucket array that was of the correct type for the capacity (e.g. dictionary size 1-255 could be a byte[], 256-65536 could be an ushort[], etc)

benaadams · 2019-10-14T04:41:45Z

One wrinkle with this change is it also does Crc32 on top of Marvin32 for strings which is a bit weird; but shouldn't really make it it more predictable? /cc @GrabYourPitchforks

GrabYourPitchforks · 2019-10-14T04:58:25Z

We'd need to run the CRC32 logic by the crypto board to make sure that we're not improperly compressing the range of the hash function. The final step of CRC32 is a finite field polynomial division, which likely introduces a slight bias toward zero. It's probably safe but needs an extra set of eyes from somebody more steeped in this field.

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

src/System.Private.CoreLib/shared/System/Collections/HashHelpers.cs

jkotas · 2019-10-14T05:25:16Z

src/System.Private.CoreLib/shared/System/Collections/HashHelpers.cs

+
+                const uint CrcReflected = 0xEDB88320;
+                return (uint)((Sse42.Crc32(CrcReflected, index) * (ulong)range) >> 32);
+            }


It is a neat idea. I can think about two potential problems with this:

This will make Dictionary slower when tiered JIT is disabled, or during startup until tiered JIT kicks in. @mjsabby Are you concerned about Dictionary being slower with tiered JIT disabled?

There may be cases today where the div gives a good locality for Dictionary access. This locality will disappear with crc32 and the overall performance will regress. I do not know what is the chance of this happening in real world workloads.

This will make Dictionary slower when tiered JIT is disabled, or during startup until tiered JIT kicks in.

Does the support check get cached in a static rather than always being a CPUID? Looking at the R2R asm (call to jmp) still have quite a few cycles to spare vs the div branch:

call [Sse42:get_IsSupported():bool] test al, al je SHORT G_M59265_IG10 mov eax, 0xD1FFAB1E crc32 eax, r12d mov edx, r13d imul rax, rdx shr rax, 32 mov ecx, eax jmp SHORT G_M59265_IG11 G_M59265_IG10: ;; bbWeight=0.50 mov eax, r12d xor rdx, rdx div edx:eax, r13d mov ecx, edx G_M59265_IG11: ;; bbWeight=0.50

Actually, can test it. Will COMPlus_TieredCompilation=0 mean it will always use the R2R version?

There may be cases today where the div gives a good locality for Dictionary access. This locality will disappear with crc32 and the overall performance will regress.

This should have a lower impact with Dictionary; than for example Hashtable as its only changing the locality of the smaller int[] array that contains the indexes and the larger Entry[] array would maintain the same very packed locality. But yes it would have an impact.

Probably couldn't use in in Hashtable due to this greater effect as it would additionally scatter the data. Although... Hashtable calls modulo for every collision (whereas Dictionary only calls it once); so might it might balance?

However; I wouldn't want to do the work to verify that 😄

Yes, Dictionary is one of the hottest pieces of code we run and we run with tiered jitting disabled and would have a pretty negative perf impact. I would love to see this without a regression to Dictionary in the non tiered case.

Still is an improvement for non-tiered (I assume Sse42 is available, its just tiering that's disabled?)

| Method | Toolchain | Mean | Ratio | |-------------- |---------- |----------:|------:| | KNucleotide_1 | base | 375.6 ms | 1.00 | | KNucleotide_1 | diff | 340.6 ms | 0.91 | | KNucleotide_9 | base | 100.13 ms | 1.00 | | KNucleotide_9 | diff | 97.05 ms | 0.97 |

System.Collections.TryGetValueFalse_Int, Int

| Method | Toolchain | Size | Mean | Ratio | |----------- |----------- |----- |---------:|------:| | Dictionary | base | 512 | 7.033 us | 1.00 | | Dictionary | diff | 512 | 3.381 us | 0.48 |

System.Collections.TryGetValueFalse_String, String

| Method | Toolchain | Size | Mean | Ratio | |----------- |----------- |----- |---------:|------:| | Dictionary | base | 512 | 16.46 us | 1.00 | | Dictionary | diff | 512 | 13.72 us | 0.83 |

System.Collections.TryGetValueTrue_Int, Int

| Method | Toolchain | Size | Mean | Ratio | |----------- |----------- |----- |---------:|------:| | Dictionary | base | 512 | 5.965 us | 1.00 | | Dictionary | diff | 512 | 3.755 us | 0.63 |

System.Collections.TryGetValueTrue_String, String

| Method | Toolchain | Size | Mean | Ratio | |----------- |----------- |----- |---------:|------:| | Dictionary | base | 512 | 17.92 us | 1.00 | | Dictionary | diff | 512 | 15.09 us | 0.84 |

@adamsitnik does setting the COMPlus_TieredCompilation=0 environment variable stick for the https://github.com/dotnet/performance repository or should I be passing a flag?

Also seems like the perf repo tests are using unique randomly distributed keys. Might be worth running some perf examples where we have less well distributed key sets?

I would prefer this hashcode-to-bucket mapping change to be a separate PR.

I am not quite sure why we are using the reflected polynomial of CRC-32 (CrcReflected) as the initial hash value for CRC-32C calculation implemented by the intrinsic.

Also seems like the perf repo tests are using unique randomly distributed keys. Might be worth running some perf examples where we have less well distributed key sets?

Added some test for that, and large values dotnet/performance#938

Yes, Dictionary is one of the hottest pieces of code we run

Makes my top 6 (the two waits don't count as they are the main thread and blocked forever)

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

Typos Co-Authored-By: Joni <jonijpn@gmail.com>

adamsitnik · 2019-10-14T17:05:32Z

does setting the COMPlus_TieredCompilation=0 environment variable stick for the https://github.com/dotnet/performance repository or should I be passing a flag?

Yes, this should work fine (the host process is going to derive the env var from your shell and the benchmark process from the host process).

The alternative is to use our python3 script which allows for even more:

  --dotnet-compilation-mode {Default,Tiered,NoTiering,FullyJittedNoTiering,MinOpt}
                        Different compilation modes that can be set to change
                        the .NET compilation behavior. The default
                        configurations have changed between releases of .NET.
                        These flags enable ensuring consistency when running
                        more than one runtime. The different modes are:
                        Default: no environment variables are set; Tiered:
                        tiering is enabled. NoTiering: tiering is disabled,
                        but includes R2R code, and it is useful for comparison
                        against Tiered; FullyJittedNoTiering: This is JIT-
                        only, useful for comparison against Tiered and NoTier
                        for changes to R2R code or tiering; MinOpt: uses
                        minopt-JIT for methods that do not have pregenerated
                        code, and useful for startup time comparisons in
                        scenario benchmarks that include a startup time
                        measurement (probably not for microbenchmarks),
                        probably not useful for a PR.

py .\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter $theFilter --coreRun $coreRuns --dotnet-compilation-mode NoTiering

VSadov · 2019-10-14T17:56:15Z

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

        private struct Entry
        {
+            public uint hashCode;


Reordered the fields ( I assume by the order of access) and made layout Auto. Just curious if this was on purpose or a typo.

On purpose; order of access, but also Auto in case the TKey and TValue are unusual size structs so the reordering doesn't work for the packing (e.g. byte, byte)

If Auto considers the initial layout (I do not know), it would be a nice trick to learn.

I wondered too, maybe worth a comment.

benaadams · 2019-10-14T21:29:55Z

Also seems like the perf repo tests are using unique randomly distributed keys. Might be worth running some perf examples where we have less well distributed key sets?

Sequential keys
Dictionary sizes 17 and 3000
Keys are: int
Values are: int, (long, long, long, long) and (object, object, object, object)

TryGetValue looks good;

|                             Method | Toolchain |      Mean | Ratio |
|----------------------------------- |---------- |----------:|------:|                                            
|             TryGetValue_17_Int_Int |      base | 10.157 ns |  1.00 |
|             TryGetValue_17_Int_Int |      diff |  6.109 ns |  0.60 |
|             TryGetValue_3k_Int_Int |      base |  9.563 ns |  1.00 |
|             TryGetValue_3k_Int_Int |      diff |  6.651 ns |  0.70 |
|                                    |           |           |       |
|     TryGetValue_17_Int_32ByteValue |      base | 10.633 ns |  1.00 |
|     TryGetValue_17_Int_32ByteValue |      diff |  6.772 ns |  0.64 |
|     TryGetValue_3k_Int_32ByteValue |      base | 10.132 ns |  1.00 |
|     TryGetValue_3k_Int_32ByteValue |      diff |  7.526 ns |  0.74 |
|                                    |           |           |       |
| TryGetValue_17_Int_32ByteRefsValue |      base | 14.538 ns |  1.00 |
| TryGetValue_17_Int_32ByteRefsValue |      diff | 11.143 ns |  0.77 |
| TryGetValue_3k_Int_32ByteRefsValue |      base | 14.161 ns |  1.00 |
| TryGetValue_3k_Int_32ByteRefsValue |      diff | 12.178 ns |  0.86 |

What is the performance going to be for larger TValue? E.g. ValueTuple<object, object, object, object> ? I would expect that e.g. ContainsKey will regress for this case.

Indeed it does

|                             Method | Toolchain |      Mean | Ratio |
|----------------------------------- |---------- |----------:|------:|
|           ContainsValue_17_Int_Int |      base |  8.196 ns |  1.00 |
|           ContainsValue_17_Int_Int |      diff |  6.019 ns |  0.74 |
|           ContainsValue_3k_Int_Int |      base |  7.773 ns |  1.00 |
|           ContainsValue_3k_Int_Int |      diff |  6.716 ns |  0.86 |
|                                    |           |           |       |
|     ContainsKey_17_Int_32ByteValue |      base |  7.713 ns |  1.00 |
|     ContainsKey_17_Int_32ByteValue |      diff |  7.209 ns |  0.93 |
|     ContainsKey_3k_Int_32ByteValue |      base |  7.518 ns |  1.00 |
|     ContainsKey_3k_Int_32ByteValue |      diff |  8.251 ns |  1.10 |
|                                    |           |           |       |
| ContainsKey_17_Int_32ByteRefsValue |      base |  7.624 ns |  1.00 |
| ContainsKey_17_Int_32ByteRefsValue |      diff | 11.860 ns |  1.56 |
| ContainsKey_3k_Int_32ByteRefsValue |      base |  7.583 ns |  1.00 |
| ContainsKey_3k_Int_32ByteRefsValue |      diff | 12.622 ns |  1.67 |

AntonLapounov · 2019-10-14T22:47:43Z

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs

@@ -228,26 +232,36 @@ public void Clear()
        }

        public bool ContainsKey(TKey key)
-            => FindEntry(key) >= 0;
+            => TryGetValue(key, out _);


It seems this would do an unnecessary TValue copy on success.

VSadov · 2019-10-14T23:59:27Z

Another way to avoid copying the value could be by using bool FindValue(key, ref value) - with ref instead of out, so that calee could just return false.

It may not be as efficient as dealing with unsafe, but could be not that much worse.

benaadams · 2019-10-15T00:03:59Z

Another way to avoid copying the value could be by using bool FindValue(key, ref value)

Doesn't help with ContainsKey as the caller then has to default the ref; and when true it would additionally do the assignment (unless additionally passed an enum to indicate mode, but then that eats a register).

Returning ref TValue allows ContainsKey to avoid any struct copies.

VSadov · 2019-10-15T00:15:57Z

Yes, that is why it may not be as efficient as unsafe version.

TryGet code path will be ok, but in the case of ContainsKey, caller will need to have space for the value and positive case will do a redundant copy.

That is just in case you (or JIT) stop liking unsafe for some reason :-)

benaadams · 2019-10-15T00:17:46Z

#27149 (comment)

I liked your previous approach where this returned ref TValue better.

😉

VSadov · 2019-10-15T00:35:59Z

My C# background cries when I see ref Unsafe.AsRef<TValue>(null).

From IL point of view it is ok though and I see how it is safe and efficient in this case.

AntonLapounov · 2019-10-15T00:52:38Z

My C# background cries when I see ref Unsafe.AsRef(null).

@VSadov That should be ref? TValue value = default;. ;-)

benaadams · 2019-10-15T02:42:00Z

bool FindValue(key, ref ref value) would work too 😉

VSadov · 2019-10-15T03:11:55Z

I think ref? was actually briefly considered as a nullable ref. But then why stop half way and not bring in a real notion of a byref type, with math and comparisons, etc.

Then TValue^ pValue = null; starts looking attractive...

We are digressing :-)

benaadams · 2019-10-15T05:05:14Z

I would prefer this hashcode-to-bucket mapping change to be a separate PR.

PR without the hashcode-to-bucket mapping change, will follow up with that in a different PR #27195

gahe88 · 2019-10-15T21:13:50Z

@benaadams I just came across an interesting way to map hash values to smaller ranges. Linked it in case you want to have a look:
https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/

benaadams · 2019-10-19T04:34:15Z

Opened PR for the other half #27299

benaadams commented Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

GrabYourPitchforks reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

jkotas reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/RuntimeHelpers.cs Outdated Show resolved Hide resolved

jkotas reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/RuntimeHelpers.cs Outdated Show resolved Hide resolved

GrabYourPitchforks reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

benaadams force-pushed the dict-entry branch from 28fa945 to 5d04012 Compare October 12, 2019 07:24

AntonLapounov reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

AntonLapounov reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

AntonLapounov reviewed Oct 12, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

benaadams force-pushed the dict-entry branch from 5d04012 to 17d666c Compare October 13, 2019 02:22

benaadams added 2 commits October 13, 2019 03:54

Dictionary avoid second bounds check in Get methods

1928570

ContainsValue

97871c2

benaadams force-pushed the dict-entry branch from 17d666c to 97871c2 Compare October 13, 2019 02:54

IDisposable approved these changes Oct 13, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Show resolved Hide resolved

IDisposable reviewed Oct 13, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

benaadams added 2 commits October 14, 2019 01:19

Hoist collision max count check

4d9639c

Avoid mod operator when fast alternative available

7c5b390

benaadams force-pushed the dict-entry branch from 1539b22 to 7c5b390 Compare October 14, 2019 01:42

benaadams commented Oct 14, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/HashHelpers.cs Show resolved Hide resolved

benaadams changed the title ~~Dictionary avoid second bounds check in Get methods~~ Improve Dictionary perfomance Oct 14, 2019

benaadams marked this pull request as ready for review October 14, 2019 03:02

benaadams commented Oct 14, 2019

View reviewed changes

jkotas reviewed Oct 14, 2019

View reviewed changes

jo-ninja reviewed Oct 14, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

jo-ninja reviewed Oct 14, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

jo-ninja reviewed Oct 14, 2019

View reviewed changes

src/System.Private.CoreLib/shared/System/Collections/Generic/Dictionary.cs Outdated Show resolved Hide resolved

Apply suggestions from code review

7dd02dd

Typos Co-Authored-By: Joni <jonijpn@gmail.com>

benaadams force-pushed the dict-entry branch from e0a81d5 to 7dd02dd Compare October 14, 2019 06:07

Change back from allowedCollisions to collisionCount

2b2cc47

VSadov reviewed Oct 14, 2019

View reviewed changes

AntonLapounov reviewed Oct 14, 2019

View reviewed changes

return ref

48b9586

benaadams mentioned this pull request Oct 15, 2019

Improve Dictionary TryGetValue size/perfomance #27195

Merged

benaadams mentioned this pull request Oct 19, 2019

Avoid mod operator when fast alternative available #27299

Merged

benaadams closed this Oct 19, 2019

benaadams deleted the dict-entry branch October 19, 2019 04:34

theolivenbaum mentioned this pull request Dec 6, 2019

Dictionary throwing OOM dotnet/runtime#621

Closed

Improve Dictionary perfomance #27149

Improve Dictionary perfomance #27149

Conversation

benaadams commented Oct 12, 2019 • edited Loading

GrabYourPitchforks commented Oct 12, 2019 • edited Loading

benaadams commented Oct 13, 2019 • edited Loading

IDisposable left a comment

Choose a reason for hiding this comment

jkotas commented Oct 13, 2019

benaadams commented Oct 13, 2019

benaadams commented Oct 13, 2019 • edited Loading

benaadams commented Oct 13, 2019

benaadams commented Oct 14, 2019

benaadams commented Oct 14, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benaadams Oct 14, 2019 • edited Loading

Choose a reason for hiding this comment

benaadams commented Oct 14, 2019

GrabYourPitchforks commented Oct 14, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamsitnik commented Oct 14, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benaadams commented Oct 14, 2019

Choose a reason for hiding this comment

VSadov commented Oct 14, 2019

benaadams commented Oct 15, 2019

VSadov commented Oct 15, 2019

benaadams commented Oct 15, 2019

VSadov commented Oct 15, 2019

AntonLapounov commented Oct 15, 2019

benaadams commented Oct 15, 2019

VSadov commented Oct 15, 2019 • edited Loading

benaadams commented Oct 15, 2019

gahe88 commented Oct 15, 2019

benaadams commented Oct 19, 2019

benaadams commented Oct 12, 2019 •

edited

Loading

GrabYourPitchforks commented Oct 12, 2019 •

edited

Loading

benaadams commented Oct 13, 2019 •

edited

Loading

benaadams commented Oct 13, 2019 •

edited

Loading

benaadams Oct 14, 2019 •

edited

Loading

VSadov commented Oct 15, 2019 •

edited

Loading