
(Another) performance issue after migrating from v1 to v2 #2000

Closed
andreminelli opened this issue Feb 18, 2022 · 29 comments

@andreminelli

So we have been using v1.2.6 for at least the last 4 years, and we have tried (again) to update to v2.2.88 (the latest at this time).

This time we have more metrics in place to share, hoping this can help us all understand why the average "execution" time (we measure full 'module' execution time; each module calls a few Redis operations as part of its own inner logic) is about 2 times slower with v2 than with v1.
Note: these measurements were obtained by updating a single 'worker' instance to V2 (in theory this was the only change - we just updated the package, with no other code changes except to fix broken/changed method signatures) and running it side by side with the existing V1 'worker' instances. The volume of requests handled by the updated worker can be considered stable during this period.

Besides this 'slowness', we observed several internal performance metrics getting worse (the worker was updated on the morning of February 17th).
Our 'reading' of those metrics at peak hours, in a nutshell, is that V2 is allocating more memory (we see many more collections in all generations), making the GC work harder (time in GC is consistently close to 15% and suspension above 8%). Thread usage is worse too, but it's not clear to me whether this is a symptom of the GC pressure or a cause...

Any thoughts?

[Screenshots: module execution time, GC collections / time in GC / suspension, and thread usage metrics for the V2 worker vs. the V1 workers]

@andre-ss6

I think it's worth noting that this is the 4th time we have tried to upgrade since 2.0 launched, and every time we have had performance regressions, mainly higher latency.

@NickCraver
Collaborator

Can y'all provide any thread dumps or memory comparisons from those peak loads? Any information on where the time's being spent would be very helpful. Some constructs are very similar and some paths are completely rewritten, so knowing which features/paths you're using matters a lot. It may also be that we can suggest alternative setups, for example tweaks to the socket manager thread pool for burst loads, or generally moving off it.
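
As a minimal sketch of that last suggestion - assuming the 2.x API, where ConfigurationOptions.SocketManager and the built-in SocketManager.ThreadPool instance are available, and with a placeholder connection string - moving the multiplexer off the dedicated socket-manager pool looks roughly like this:

```csharp
using StackExchange.Redis;

// Sketch: route socket I/O through the regular .NET thread pool instead of
// the library's dedicated socket-manager pool ("my-redis:6379" is a placeholder).
var options = ConfigurationOptions.Parse("my-redis:6379");
options.SocketManager = SocketManager.ThreadPool;
var muxer = ConnectionMultiplexer.Connect(options);
```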

@andre-ss6

andre-ss6 commented Feb 18, 2022

Sure! Will do! We already have a process memory dump from two machines (one on v2 of the library, another on v1); we will soon also get a report of CPU usage by method. We can then send you all of that plus any other info we manage to collect :)

@NickCraver
Collaborator

Excellent - thank you! I'd like to improve perf however we can, so looking forward to any more info.

@andre-ss6

andre-ss6 commented Feb 23, 2022

Hey @NickCraver. These are some of the data that we've collected so far:

After we upgraded StackExchange.Redis from 1.2.6 to 2.2.88, we experienced a slowdown in some parts of our code that use Redis extensively, sometimes reaching a 2x slowdown. The following graphs show the average processing time of some of our modules that use Redis, where the green line represents the instance where the library has been upgraded (all the others are on 1.2.6):

[Graph: average processing time of the Redis-using modules; green line = upgraded instance]

This is a small sample size, but it is consistent over every timespan we analyzed: the instance where the library is upgraded is always slower by roughly that same margin.

For comparison, these are the average processing times of our modules that do not use Redis, where, again, the green line represents the upgraded instance:

[Graph: average processing time of modules that do not use Redis; green line = upgraded instance]

We can observe a difference between the instances here too - the green one is even higher than a few of them. However, the difference is not only much smaller, it also doesn't correlate with the metrics presented later in this document when we compare other instances (including the orange and purple ones).

We also have client-side code in place to measure the latency of Redis requests. It is simply a decorator on StackExchange.Redis.IDatabase, and it looks like this:

[Screenshot: the IDatabase decorator that times each call]

Do note that all methods in the interface are decorated in that same way, not only the ones shown here.

The tracer referenced there comes from a Dynatrace library.
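
Since the screenshot doesn't reproduce here, a minimal sketch of that decorator pattern (not the exact code from the screenshot; only one method is shown, and the recordLatency callback is a stand-in for the Dynatrace tracer) looks roughly like:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using StackExchange.Redis;

// Sketch only: the real decorator wraps every member of IDatabase in the
// same way; a single method is shown here for brevity.
public sealed class TimedDatabase
{
    private readonly IDatabase _inner;
    private readonly Action<string, TimeSpan> _recordLatency; // stand-in for the tracer

    public TimedDatabase(IDatabase inner, Action<string, TimeSpan> recordLatency)
    {
        _inner = inner;
        _recordLatency = recordLatency;
    }

    public async Task<RedisValue> StringGetAsync(RedisKey key, CommandFlags flags = CommandFlags.None)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            return await _inner.StringGetAsync(key, flags).ConfigureAwait(false);
        }
        finally
        {
            _recordLatency(nameof(StringGetAsync), sw.Elapsed);
        }
    }
}
```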

These are the results from Dynatrace:

Response Times

[Screenshots: average, median, and 99th percentile response times for 2.2.88 vs. 1.2.6 (Instance 1) and 1.2.6 (Instance 2)]

GC and Thread pool differences

We also noticed a moderate increase in GC suspension time and time spent in GC, as well as in both the number of active thread pool worker threads and the thread pool queue length. The following graphs show these metrics for some of our instances, one running the upgraded version of the library while the rest are using 1.2.6.

[Screenshots: GC suspension, time in GC, and thread pool worker/queue metrics for 2.2.88 vs. 1.2.6 (Instances 1-3)]
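
For reference, the same kind of counters can be sampled in-process as a cross-check of the APM graphs; a rough sketch, assuming .NET Core 3.0+ (where ThreadPool.ThreadCount and ThreadPool.PendingWorkItemCount exist):

```csharp
using System;
using System.Threading;

// Sketch: snapshot of GC and thread pool counters comparable to the graphs above.
static class RuntimeCounters
{
    public static void PrintSnapshot()
    {
        Console.WriteLine($"Gen0/1/2 collections: {GC.CollectionCount(0)}/{GC.CollectionCount(1)}/{GC.CollectionCount(2)}");
        Console.WriteLine($"Total allocated bytes: {GC.GetTotalAllocatedBytes()}");
        Console.WriteLine($"Thread pool threads:   {ThreadPool.ThreadCount}");
        Console.WriteLine($"Thread pool queue:     {ThreadPool.PendingWorkItemCount}");
    }
}
```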

@andre-ss6

@NickCraver Also, do you know any way we could anonymize (basically remove strings from) our process memory dump in order to send it to you?

@NickCraver
Collaborator

@andre-ss6 Thanks for these! I formatted the comment just a bit for easier comparison than scrolling. I wouldn't trust any tool to scrub data, so let me see about allocations between these 2 scenarios overall. Is this under a lot of concurrent load per instance? (I'm assuming yes)

@andre-ss6

andre-ss6 commented Feb 23, 2022

> @andre-ss6 Thanks for these! I formatted the comment just a bit for easier comparison than scrolling.

Thanks!

> I wouldn't trust any tool to scrub data, so let me see about allocations between these 2 scenarios overall.

Aight. You want screenshots as well?

> Is this under a lot of concurrent load per instance? (I'm assuming yes)

Yes. We do use a singleton IConnectionMultiplexer per instance, though. By the way, we have also tried setting the Client Workers count to 20 (based on a comment in another issue) using a custom SocketManager in the upgraded version; however, it seems not to have had any effect at all.
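
For context, the singleton-multiplexer setup mentioned above is typically just a lazily created shared connection; a minimal sketch (the connection string is a placeholder):

```csharp
using System;
using StackExchange.Redis;

// Sketch: one IConnectionMultiplexer shared per process.
public static class RedisConnection
{
    private static readonly Lazy<ConnectionMultiplexer> Lazy =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect("my-redis:6379")); // placeholder

    public static IConnectionMultiplexer Multiplexer => Lazy.Value;

    public static IDatabase GetDatabase() => Multiplexer.GetDatabase();
}
```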

@andreminelli
Author

> By the way, we have also tried setting the Client Workers count to 20 (based on a comment in another issue) using a custom SocketManager in the upgraded version

For the sake of completeness, the issue mentioned by @andre-ss6 is in fact a PR: #1115

@NickCraver
Collaborator

@andre-ss6 @andreminelli pssssst, you're going to want to peek at #2008. I'll aim to get it in with a prerelease up on NuGet ASAP - any chance you could give that a go in your scenario when available? It'll be on MyGet right after the PR's merged as well.

@andre-ss6

@NickCraver #2009 is taking a while to show up on MyGet. Does anything need to be done manually?

@andre-ss6

@NickCraver Ah, I just noticed some tests failed :/

@NickCraver
Collaborator

@andre-ss6 I kicked that build, and I'm making that new Envoy test (which isn't present on AppVeyor, because Windows) more resilient.

@NickCraver
Collaborator

@andre-ss6 You'll want to grab the latest off MyGet for testing what 2.5.x will have. I just ran a lot of old scenarios from issues that were performance gaps vs. 1.2.6 but now look way, way better. Very curious what performance y'all see, especially with such great monitoring in place!

@andre-ss6

> @andre-ss6 You'll want to grab the latest off MyGet for testing what 2.5.x will have. I just ran a lot of old scenarios from issues that were performance gaps vs. 1.2.6 but now look way, way better. Very curious what performance y'all see, especially with such great monitoring in place!

I'm also very curious! I'll be working on this for the next few days. Thanks a lot for your effort!

@andreminelli
Author

@NickCraver, is there an ETA for a 2.5 release?

@NickCraver
Collaborator

@andreminelli There's pretty much the final build up on NuGet now as https://www.nuget.org/packages/StackExchange.Redis/2.5.34-prerelease, but I'm hoping to fold in #configuration and release 2.5 tomorrow if time allows; I may get to it tonight, but it's a busy day so I'm unsure.

@andreminelli
Author

> There's pretty much the final build up on NuGet now as https://www.nuget.org/packages/StackExchange.Redis/2.5.34-prerelease, but I'm hoping to fold in #configuration and release 2.5 tomorrow if time allows; I may get to it tonight, but it's a busy day so I'm unsure.

Wow, this soon? 😃
I must say I was expecting to get this only after a few weeks or so.
Thanks!

@NickCraver
Collaborator

@andreminelli You're welcome! If y'all are able to give that 2.5.34 a test, I'd really appreciate it - it would give us full confirmation that the changes in 2.5 make an impact in your scenario (we've been testing with partner teams elsewhere and are seeing good things).

@andre-ss6

> @andreminelli There's pretty much the final build up on NuGet now as https://www.nuget.org/packages/StackExchange.Redis/2.5.34-prerelease, but I'm hoping to fold in #configuration and release 2.5 tomorrow if time allows; I may get to it tonight, but it's a busy day so I'm unsure.

Does this one include the follow-up PR? I ask since the latest one on MyGet is 2.5.39.

@NickCraver
Collaborator

2.5.39 has all of the performance changes; only the config bits are left to follow up on, but those shouldn't have any impact. I may add some connection logging as well, but in any case: any performance assessments you have of 2.5.39 would be super helpful!

@NickCraver
Collaborator

@andre-ss6 @andreminelli 2.5.43 is up on NuGet now :)

@andre-ss6

andre-ss6 commented Mar 5, 2022

> @andre-ss6 @andreminelli 2.5.43 is up on NuGet now :)

Nice, thanks! We are in the process of acceptance testing (homologation) of 2.5.39; it's behaving well so far. We will try deploying it to canary next Monday.

@andre-ss6

andre-ss6 commented Mar 8, 2022

@NickCraver ... I got some good news!

The green line is still the canary instance, with 2.5.39 now!

[Graphs: module processing times with the canary instance (green line) on 2.5.39]

We will probably deploy 2.5.43 to production next week!

@NickCraver
Collaborator

Wooo, thanks for the update @andre-ss6!

@andre-ss6

Thank you for your continued contributions, your fast responses, and your efforts! This was very important for us!

@andreminelli
Author

andreminelli commented Mar 9, 2022

> Thank you for your continued contributions, your fast responses, and your efforts! This was very important for us!

I couldn't agree more with @andre-ss6.
Thank you so much, @NickCraver.

We should close this issue right after deploying the 2.5.43 release, OK?

@NickCraver
Collaborator

I appreciate all the diagnostics and testing - it's a help to us too <3

I'll leave this be; when y'all are deployed and good on 2.5.43, you can close it at will, or let me know if something unexpected happens :)

@andreminelli
Author

We have deployed successfully using version 2.5.43, with nearly no performance difference from 1.2.6.
Thanks again, @NickCraver 👏👏
