
fix for mutex slim being released too early during flush operation #1585

Merged: 4 commits, Nov 13, 2020

Conversation

@arsnyder16 commented Oct 14, 2020

Fixes #1438

When flushing during FlushSync we specify a timeout on the Task wait, but when this timeout is hit the underlying task is still running. The thread that invoked FlushSync continues and releases the MutexSlim protecting the io pipe, allowing another thread to obtain the MutexSlim and run operations on the pipe concurrently.

To fix this, use the cancellation token that can be passed to PipeWriter.FlushAsync.

An alternative would be to not specify a timeout at all. There are two additional locations that flush the pipe; neither specifies a timeout, they simply await the Task. Generally, the timeouts are driven by how long an operation waits to obtain the MutexSlim.
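To illustrate the hazard and the fix described above, here is a minimal Python sketch. The library itself is C#; the lock, the `flush` function, and all names below are hypothetical stand-ins, not StackExchange.Redis code. Timing out the *wait* leaves the flush running; signalling cancellation stops the flush itself before the lock is released.

```python
# Hypothetical stand-ins for the MutexSlim and the pipe flush; not library code.
import concurrent.futures as cf
import threading
import time

pipe_lock = threading.Lock()             # plays the role of the MutexSlim
flush_outlived_wait = threading.Event()  # set only if the flush ran to the end
                                         # after the timed-out wait gave up on it

def flush(cancel: threading.Event) -> str:
    """Simulate a slow pipe flush that honors a cancellation signal."""
    for _ in range(10):
        if cancel.is_set():
            return "cancelled"
        time.sleep(0.05)
    flush_outlived_wait.set()
    return "flushed"

with cf.ThreadPoolExecutor(max_workers=1) as pool:
    cancel = threading.Event()
    with pipe_lock:
        future = pool.submit(flush, cancel)
        try:
            # Buggy pattern: timing out the wait does not stop the flush,
            # which would keep touching the pipe after the lock is released.
            future.result(timeout=0.1)
        except cf.TimeoutError:
            # Fixed pattern: cancel the flush itself, then wait for it to
            # stop before releasing the lock that protects the pipe.
            cancel.set()
            future.result()
    print(flush_outlived_wait.is_set())  # → False: flush ended before release
```

In the actual C# fix, the role of the `cancel` event is played by a CancellationToken passed to PipeWriter.FlushAsync.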

@arsnyder16 (Author)

Looks like the CI checks are failing for an unrelated issue; the other PRs seem to have the same failure.

@mgravell (Collaborator)

mgravell commented Oct 14, 2020 via email

@arsnyder16 (Author)

Thanks, I would be interested in hearing your thoughts, since I am not overly familiar with the library. Feel free to adjust it in another way; this approach seemed to make the most sense to me.

@ghost commented Oct 14, 2020

I can't speak to the exact fix, but the described behavior matches exactly the symptoms we've seen provoke the issue.

@ghost commented Oct 14, 2020

Okay, I've walked through the precise fixes. Normally I'd like to see a continuation bound for the on-cancel case, but that's not really the model or style of the lib, I think?

@Plasma commented Oct 17, 2020

I agree the timeout can cause the lock to be released (incorrectly) while the task is still making progress in the background, so I think this change is correct.

@NickCraver (Collaborator)

Merging main in here for updated build checks

@JKurzer commented Nov 3, 2020

Do we have a timeline for accepting the PR? This issue continues to block us from rolling our version forward.

@mgravell (Collaborator) commented Nov 4, 2020

Sorry, I had planned to already look at it, but things got ... busy. I hope to look tomorrow (although I appreciate that I've said that before)

@JKurzer commented Nov 4, 2020

No worries, we're in a stable-ish state on an older version, and life is pretty complicated for all of us right now.

@mgravell (Collaborator) commented Nov 5, 2020

I concur with the problem, but I'm very dubious of allocating a CTS with a scheduled timeout on every call; I wonder instead if we could allocate and reuse a CTS, and only schedule the timeout (dooming it for reuse) when we get a non-sync response; something like this: e1e7189

(I recommend viewing without whitespace deltas: e1e7189?w=1)

thoughts? if you agree that this minor tweak makes sense, feel free to cherry-pick or merge it into your branch, and we can get this merged.

A very nuanced spot - congrats for finding this and offering a fix.

@mgravell mgravell merged commit 3d43209 into StackExchange:main Nov 13, 2020
@mgravell (Collaborator)

merged with the CTS tweak as proposed

Successfully merging this pull request may close these issues: Get & Set Failing After Error Cascade

5 participants