dotnet/corefx (public archive)
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] Pipelines - Reduce write lock contention #35267

Closed
wants to merge 8 commits

Conversation

benaadams
Member

@benaadams benaadams commented Feb 12, 2019

BufferSegment:ResetMemory() is not an inexpensive operation (it blanks 80 bytes and returns the backing data to its pool); however, it doesn't need to be done under the Pipe's reader/writer sync lock.

Also, the scheduling/marking of completion can happen before the Segments are reset and returned to the pool, allowing for lower latency.

This change resets the segments outside of the lock, then uses the pool itself as a separate lock to return the segments to the pool (and to get them from it), so it's a fast lock held only for the pooled segments and doesn't lock the whole Pipe.

It also reduces the scope of the lock on the write side when acquiring memory, to only when it modifies the readHead or needs to change state to Writing (so most write Advances will skip the lock).

It also shrinks the Flush lock when writing, since Flush can modify the write head; the lock is then re-acquired afterwards when signalling the Reader.
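
In sketch form (hedged: the helper and count-field names below are illustrative, not this PR's exact code; ResetMemory and NextSegment are existing BufferSegment members), the return path looks roughly like this, with the reader/writer continuation already scheduled before it runs:

// Sketch only: shows the ordering described above, not the actual diff.
private void ReturnSegments(BufferSegment from, BufferSegment toExclusive)
{
    BufferSegment segment = from;
    while (segment != toExclusive)
    {
        BufferSegment next = segment.NextSegment;

        // Expensive part: clears ~80 bytes of state and hands the backing memory
        // back to its ArrayPool/MemoryPool. No Pipe reader/writer lock is held here.
        segment.ResetMemory();

        // Cheap, narrowly scoped lock: only guards the pooled-segment array,
        // so the reader and writer are not blocked on the Pipe's sync object.
        lock (_bufferSegmentPool)
        {
            if (_pooledSegmentCount < SegmentPoolSize)
            {
                _bufferSegmentPool[_pooledSegmentCount++] = segment;
            }
        }

        segment = next;
    }
}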

/cc @davidfowl @pakrym @jkotalik

@stephentoub
Member

Doesn't this mean that a segment which might otherwise have been available for the continuation is now much less likely to be? Is that not a problem at scale?

@benaadams
Member Author

benaadams commented Feb 12, 2019

Is that not a problem at scale?

Pool size is 16 and segments are 4096 bytes, so that would only affect a run of many small writes without a Flush totalling more than ~60kB (15 segments * 4096B), or more if the ArrayPool is being used.

If you are doing single-shot large writes, then it would take 16 writes without a Flush (say 1MB x 16 = 16MB) to hit this.

On the flip side, holding the lock (which is shared by both reader and writer) and delaying the signalling of the continuation is more of an issue, as ResetMemory is pretty chunky and needs to be called for every block returned, followed by returning the data via ArrayPool.Return (or a ConcurrentQueue for Kestrel); so currently the writer wouldn't be able to make progress during this time.

e.g. (called for every block returned)

; Assembly listing for method BufferSegment:ResetMemory():this
; Lcl frame size = 40

G_M401_IG01:
       57                   push     rdi
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488BF1               mov      rsi, rcx

G_M404_IG02:
       488B5628             mov      rdx, gword ptr [rsi+40]
       48B9B8D075C9FA7F0000 mov      rcx, 0x7FFAC975D0B8
       E8546A615F           call     CORINFO_HELP_ISINSTANCEOFINTERFACE
       4885C0               test     rax, rax
       7417                 je       SHORT G_M404_IG03
       488BC8               mov      rcx, rax
       49BB081536C9FA7F0000 mov      r11, 0x7FFAC9361508
       3909                 cmp      dword ptr [rcx], ecx
       FF1552B89AFF         call     [IDisposable:Dispose():this]
       EB49                 jmp      SHORT G_M404_IG04

G_M404_IG03:
       488B5628             mov      rdx, gword ptr [rsi+40]
       48B9020D50C9FA7F0000 mov      rcx, 0x7FFAC9500D02
       E845274F5F           call     CORINFO_HELP_ISINSTANCEOFARRAY
       488BF8               mov      rdi, rax
       4885FF               test     rdi, rdi
       742E                 je       SHORT G_M404_IG04
       48B9003F45C9FA7F0000 mov      rcx, 0x7FFAC9453F00
       BA77000000           mov      edx, 119
       E8B96C5F5F           call     CORINFO_HELP_CLASSINIT_SHARED_DYNAMICCLASS
       48B9E02E951095010000 mov      rcx, 0x19510952EE0
       488B09               mov      rcx, gword ptr [rcx]
       488BD7               mov      rdx, rdi
       4533C0               xor      r8d, r8d
       3909                 cmp      dword ptr [rcx], ecx
       E89795FFFF           call     TlsOverPerCoreLockedStacksArrayPool`1:Return(ref,bool):this

G_M405_IG04:
       33C0                 xor      rax, rax
       48894608             mov      gword ptr [rsi+8], rax
       33C0                 xor      rax, rax
       48894610             mov      qword ptr [rsi+16], rax
       33C0                 xor      rax, rax
       33D2                 xor      edx, edx
       488D4E18             lea      rcx, bword ptr [rsi+24]
       488901               mov      gword ptr [rcx], rax
       895108               mov      dword ptr [rcx+8], edx
       89510C               mov      dword ptr [rcx+12], edx
       33C0                 xor      rax, rax
       48894628             mov      gword ptr [rsi+40], rax
       48894630             mov      gword ptr [rsi+48], rax
       33C0                 xor      eax, eax
       894638               mov      dword ptr [rsi+56], eax
       33C0                 xor      rax, rax
       33D2                 xor      edx, edx
       4883C640             add      rsi, 64
       488906               mov      gword ptr [rsi], rax
       895608               mov      dword ptr [rsi+8], edx
       89560C               mov      dword ptr [rsi+12], edx

G_M407_IG05:
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret      

; Total bytes of code 197, prolog size 6 for method BufferSegment:ResetMemory():this

@benaadams benaadams force-pushed the buffers branch 2 times, most recently from c175b05 to e778d5b on February 12, 2019 at 22:31
@pakrym

pakrym commented Feb 12, 2019

Is there a way to make the segment pool non-locking? It doesn't have to be strict; losing some segments to the GC is acceptable.
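
For illustration, one way such a "lossy" non-locking pool could look (a sketch under the relaxation suggested here; the type and members are made up, not from this PR): each slot is claimed with a compare-exchange, and a full pool simply drops the segment for the GC.

using System.Threading;

internal sealed class LossySegmentPool
{
    private readonly BufferSegment[] _slots = new BufferSegment[16];

    public void Return(BufferSegment segment)
    {
        for (int i = 0; i < _slots.Length; i++)
        {
            // Claim an empty slot atomically; if another thread wins the race, try the next one.
            if (_slots[i] == null &&
                Interlocked.CompareExchange(ref _slots[i], segment, null) == null)
            {
                return;
            }
        }
        // Every slot was occupied: drop the segment and let the GC reclaim it.
    }

    public BufferSegment TryRent()
    {
        for (int i = 0; i < _slots.Length; i++)
        {
            BufferSegment segment = _slots[i];
            if (segment != null &&
                Interlocked.CompareExchange(ref _slots[i], null, segment) == segment)
            {
                return segment;
            }
        }
        return null; // pool empty: the caller allocates a fresh BufferSegment
    }
}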

@davidfowl
Member

Before we merge this one, we need to see the impact on our benchmarks.

@benaadams
Member Author

benaadams commented Feb 12, 2019

Is there a way to make the segment pool non-locking?

The easiest way is to use ConcurrentQueueSegment, though it's a bit chunky.

Though it's single reader, single writer, so we might be able to do something simpler.

SingleProducerSingleConsumerQueue looks like the ticket?
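
Roughly what that could look like for the segment pool (a sketch, assuming the Pipe's single-reader/single-writer usage; not this PR's code and not the actual SingleProducerSingleConsumerQueue implementation): the reader is the only thread returning segments and the writer is the only thread renting them, so a fixed ring with volatile publishes needs no lock.

using System.Threading;

// Lamport-style single-producer/single-consumer ring used as a segment pool.
internal sealed class SpscSegmentPool
{
    private readonly BufferSegment[] _slots;
    private int _head; // only advanced by the renting (writer) thread
    private int _tail; // only advanced by the returning (reader) thread

    public SpscSegmentPool(int capacity) => _slots = new BufferSegment[capacity + 1];

    // Called only by the returning thread.
    public bool TryReturn(BufferSegment segment)
    {
        int tail = _tail;
        int next = (tail + 1) % _slots.Length;
        if (next == Volatile.Read(ref _head))
        {
            return false; // pool full: let the GC take the segment
        }

        _slots[tail] = segment;
        Volatile.Write(ref _tail, next); // publish only after the slot is written
        return true;
    }

    // Called only by the renting thread.
    public bool TryRent(out BufferSegment segment)
    {
        int head = _head;
        if (head == Volatile.Read(ref _tail))
        {
            segment = null;
            return false; // pool empty
        }

        segment = _slots[head];
        _slots[head] = null;
        Volatile.Write(ref _head, (head + 1) % _slots.Length);
        return true;
    }
}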

@benaadams
Member Author

Made it lock-free and interleaved the Resets with the Returns

@benaadams
Member Author

Updated summary

This change makes the segment pool lock-free so the whole Pipe doesn't need to be locked to return segments; it interleaves ResetMemory with the return, eagerly making the BufferSegments available, and triggers completion prior to returning the segments.

@benaadams
Member Author

@stephentoub does the conversion of SingleProducerSingleConsumerQueue to SingleProducerSingleConsumerPool look correct?

@benaadams benaadams changed the title Schedule completion prior to pooling segments [WIP] Schedule completion prior to pooling segments Feb 13, 2019
@benaadams benaadams changed the title [WIP] Schedule completion prior to pooling segments Pipelines - Reduce lock contention Feb 22, 2019
@benaadams
Member Author

Removed the lock-free segment pool as SingleProducerSingleConsumerQueue is kinda chunky and it can be a future optimization.

Reduced the lock in the write path further following discussion in #35484

@benaadams
Member Author

Summary change

This change resets the segments outside of the lock, then uses the pool itself as a separate lock to return the segments to the pool (and to get them from it), so it's a fast lock held only for the pooled segments and doesn't lock the whole Pipe.

It also reduces the scope of the lock on the write side when acquiring memory, to only when it modifies the readHead or needs to change state to Writing.
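
A sketch of that narrower write-side path (illustrative shape only; WritableBytes/AvailableMemory/End follow the existing BufferSegment members, while AllocateWriteHeadUnderLock is a made-up placeholder for the slow path):

internal Memory<byte> GetMemory(int sizeHint)
{
    // Fast path: a write head already exists with enough space, so no lock is taken.
    BufferSegment head = _writingHead;
    if (head != null && head.WritableBytes >= sizeHint)
    {
        return head.AvailableMemory.Slice(head.End);
    }

    // Slow path: a segment must be appended, the readHead touched, or the state
    // moved to Writing, so the Pipe's sync lock is required.
    lock (_sync)
    {
        AllocateWriteHeadUnderLock(sizeHint);
        return _writingHead.AvailableMemory.Slice(_writingHead.End);
    }
}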

@benaadams
Member Author

benaadams commented Feb 23, 2019

Fixed a race between setting writing active and _writingHead being set to null (i.e. it needs to be checked again after setting the state to Writing, as it may have been set to null in the meantime).
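
In sketch form (placeholders, not the actual diff), the fix amounts to re-reading _writingHead after the state transition, because the lock-free fast path can observe a head that the reader then returns before writing is marked active:

lock (_sync)
{
    BeginWrite(); // state is now Writing, so the reader can no longer return the write head

    // Re-check under the lock: the head seen on the lock-free fast path may have been
    // set to null (returned to the pool) before the state changed.
    BufferSegment head = _writingHead;
    if (head == null || head.WritableBytes < sizeHint)
    {
        AllocateWriteHeadUnderLock(sizeHint); // placeholder for the allocation path
    }
}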

@benaadams benaadams changed the title Pipelines - Reduce lock contention Pipelines - Reduce write lock contention Feb 23, 2019
@benaadams
Member Author

*Note: the race condition was in this PR, not in the current version.

@benaadams
Member Author

Now using a stack array for pooling rather than a heap array; this adds a chunk of zeroing, however it's now done after completion and outside the lock, so it shouldn't hit perf:

G_M48697_IG02:
       lea      rcx, bword ptr [rbp-B8H]
       vxorps   xmm0, xmm0
       vmovdqu  qword ptr [rcx], xmm0
       vmovdqu  qword ptr [rcx+16], xmm0
       vmovdqu  qword ptr [rcx+32], xmm0
       vmovdqu  qword ptr [rcx+48], xmm0
       vmovdqu  qword ptr [rcx+64], xmm0
       vmovdqu  qword ptr [rcx+80], xmm0
       vmovdqu  qword ptr [rcx+96], xmm0
       vmovdqu  qword ptr [rcx+112], xmm0
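
For context, that zeroing is the JIT clearing the stack slot before use; a struct of roughly this shape (hypothetical, not the PR's actual type) holding SegmentPoolSize (16) references would account for the 8 x 16-byte stores (128 bytes) above on x64:

// Hypothetical illustration only: a fixed-size, struct-based group of segment slots.
// Because it contains object references, the JIT zero-initializes the whole local up front.
private struct SegmentSlots
{
    public BufferSegment Slot0, Slot1, Slot2, Slot3,
                         Slot4, Slot5, Slot6, Slot7,
                         Slot8, Slot9, Slot10, Slot11,
                         Slot12, Slot13, Slot14, Slot15;
}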

@benaadams
Member Author

benaadams commented Feb 23, 2019

This uses a CORINFO_HELP_CHECKED_ASSIGN_REF to write to the ValueArray; I have added it as an example to the coreclr JIT issue: https://github.com/dotnet/coreclr/issues/15755#issuecomment-466683856

@benaadams
Member Author

Reverted the ValueArray as it didn't help perf and had issues (above)

@benaadams
Member Author

The Debian.8.Amd64.Open-x64-Release failures are in System.Net.NameResolution.Functional.Tests:
https://github.com/dotnet/corefx/issues/24355#issuecomment-466807679

@benaadams benaadams changed the title Pipelines - Reduce write lock contention [WIP] Pipelines - Reduce write lock contention Feb 24, 2019
@@ -97,6 +98,7 @@ public Pipe(PipeOptions options)
}

_bufferSegmentPool = new BufferSegment[SegmentPoolSize];
_bufferSegmentsToReturn = new BufferSegment[SegmentPoolSize];

Ugh, larger pipe and more allocations per pipe.

@pakrym

pakrym commented Feb 25, 2019

Any benchmark for the last iteration?

@benaadams
Member Author

Any benchmark for the last iteration?

Yea... Is why I added WIP 😉

@pakrym

pakrym commented Feb 25, 2019

Yea... Is why I added WIP 😉

Whoops, sorry.

@benaadams
Member Author

This has a mild regression in some scenarios on the Pipe throughput benchmark, and I don't fully understand why; I've dropped down to the asm and am tweaking to improve the asm code quality (CQ).

May fold them back in as a separate PR if I don't find the root cause.

@karelz
Member

karelz commented Mar 4, 2019

@benaadams any update?

@karelz
Member

karelz commented Mar 18, 2019

@benaadams any update / plans? (I assume you will be busy at MVP Summit this week)

@karelz
Member

karelz commented Apr 1, 2019

1 month no update, closing for now. Please feel free to reopen it once you're ready to push it forward. Thanks!

@karelz karelz closed this Apr 1, 2019
@karelz karelz added this to the 3.0 milestone Apr 1, 2019