Fix incorrect HashSlot calculation for XREAD and XREADGROUP commands #2093

nielsderdaele · 2022-04-14T08:39:56Z

I have created a fix for the incorrect HashSlot calculation for the XREAD and XREADGROUP commands (#2086) Since these differ a lot from other messages I have created some new CommandMessage classes:

SingleStreamReadGroupCommandMessage
MultiStreamReadGroupCommandMessage
SingleStreamReadCommandMessage
MultiStreamReadCommandMessage

The single CommandMessage classes could be removed by only using the multi command messages, but then we have to create a new StreamPostion array and a new StreamPostion object. I wasn't sure which option too choose, so feel free to give input on this (GC vs duplicate code).

There is also some code duplication between the Read and ReadGroup Command messages as the ReadGroup command is very similar to the XREAD but has some additional values in the front of the command (group, consumer, noack). A base class could be created as well to reduce some of the duplication (calculating the hashslot in case of multiple streams and writing the streams part of the commands).

The method CommandAndKey is also not overriden for these Messages as I wasn't which argument of the command to include in the string.

…m (StreamReadGroup/StreamReadGroupAsync)

…ms (StreamReadGroup/StreamReadGroupAsync)

…ync)

NickCraver · 2022-04-16T16:27:35Z

@nielsderdaele I think this looks good at a glance (needs some dedicated eyes), it'd be good if we could ensure this is working in tests though. For example we can attach a profiler and see if any messages were re-sent against cluster (also proving it's broken today). I'll try and grok this and propose some tests to ensure this doesn't regress and have a pattern where we can assess this anywhere else by using that.

nielsderdaele · 2022-04-18T06:27:13Z

@NickCraver Sounds good. I was also thinking about writing some test for the problem but didn't get to it yet. I was considering listening to the HashSlotMoved event on the ConnectionMultiplexer. This shouldn't be raised if the HashSlots are calculated correctly. In an ideal situation these tests should be implemented for each command.

For now the code has only been tested manually with a debugger and by watching statistics of redis (rejected_calls) while running some software with the modified code. We have some software components which read high volume of streams using the StreamReadGroup commands (and acknowledges the messages after processing). This is how we stumbled upon the problem as we were seeing a huge number of rejected_calls in redis statistiscs.

Note that I have only looked at the Stream commands but the same problem exists for multiple commands (eg. SortedSetCombine, SetIntersectionLength). Most of these commands are easier to fix as they have static key positions (eg. multiple keys start at position 1 -> RedisValueKeysValue(s) Commands).

NickCraver · 2022-04-18T15:40:49Z

@nielsderdaele Good idea! I'm on docs at the moment just playing (mostly away this week), but like we have expected errors in the test base class, we could set an expected number of moves to globally find and fail all these cases perhaps. If a class is intentionally getting a move, then it could declare so, SetExpectedAmbientFailureCount style.

Thoughts?

NickCraver · 2022-04-18T18:07:13Z

hmmm, I don't think the event handler is going to work because it doesn't fire in all cases. I'll get some counters in so we can observe this in tests. I need to explore this anyway because we're like to trigger a reconfigure after any wave of MOVED responses to solve the topology change case some are seeing on cluster resizing anyhow.

mgravell

looks good to me; not necessarily for this PR, but I wonder if our test code has access to the internals here sufficiently that we could new the message and validate the slot calculation, which would have shown the original problem... just a thought

Niels Derdaele added 3 commits April 14, 2022 09:28

Fix incorrect HashSlot calculation for XREADGROUP with a single strea…

5dc1b21

…m (StreamReadGroup/StreamReadGroupAsync)

Fix incorrect HashSlot calculation for XREADGROUP with multiple strea…

cb1fdea

…ms (StreamReadGroup/StreamReadGroupAsync)

Fix incorrect HashSlot calculation for XREAD (StreamRead/StreamReadAs…

6357b7e

…ync)

NickCraver added ➕ enhancement ⚙️ area:commands ⚙️ area:API labels Apr 16, 2022

NickCraver added 2 commits June 14, 2022 10:08

Merge remote-tracking branch 'origin/main' into pr/2093

d94f269

Formatting and release notes

dbc1cdd

mgravell approved these changes Jun 14, 2022

View reviewed changes

NickCraver merged commit 8c9cb1b into StackExchange:main Jun 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect HashSlot calculation for XREAD and XREADGROUP commands #2093

Fix incorrect HashSlot calculation for XREAD and XREADGROUP commands #2093

nielsderdaele commented Apr 14, 2022 •

edited

Loading

NickCraver commented Apr 16, 2022

nielsderdaele commented Apr 18, 2022

NickCraver commented Apr 18, 2022

NickCraver commented Apr 18, 2022

mgravell left a comment

Fix incorrect HashSlot calculation for XREAD and XREADGROUP commands #2093

Fix incorrect HashSlot calculation for XREAD and XREADGROUP commands #2093

Conversation

nielsderdaele commented Apr 14, 2022 • edited Loading

NickCraver commented Apr 16, 2022

nielsderdaele commented Apr 18, 2022

NickCraver commented Apr 18, 2022

NickCraver commented Apr 18, 2022

mgravell left a comment

Choose a reason for hiding this comment

nielsderdaele commented Apr 14, 2022 •

edited

Loading