When the master in a cluster is failed over, the writes fail with InternalError until the next auto discovery cycle #1660
Comments
Several potential solutions come to mind, but I'm not an expert in the library and would love some feedback. One solution could be to add an override flag that will allow the library to write to the … Alternatively, we could kick off an auto discovery operation (…). Thoughts?
Meant to help address #1520, #1660, #2074, and #2020. I'm not 100% sure about this, because if a MOVED keeps happening (e.g. a bad proxy somewhere) this would just continually re-run, though only once every 5 seconds. Overall, though, today we linger in a bad state, retrying moves until a discovery happens, and with this change that state could be resolved much faster.
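To make the "only once every 5 seconds" idea concrete, here is a minimal sketch of a throttled refresh trigger. It is written in Python purely as an illustration and is not the library's code: `DiscoveryThrottle`, `on_moved`, and `refresh_topology` are hypothetical names, and the 5-second interval is simply the figure mentioned above.

```python
import time


class DiscoveryThrottle:
    """Hypothetical helper: permits a topology refresh at most once per interval."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the behaviour can be tested
        self._last_run = None    # time of the last permitted refresh

    def try_acquire(self):
        """Return True if a refresh may run now, and record the time if so."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self.min_interval:
            return False
        self._last_run = now
        return True


def on_moved(throttle, refresh_topology):
    """Called when a MOVED reply is seen; refreshes only if the throttle allows it."""
    if throttle.try_acquire():
        refresh_topology()
        return True
    return False
```

With a throttle like this, a persistent MOVED (the bad-proxy case) costs at most one extra discovery per interval rather than one per failed command.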
I've been testing the StackExchange.Redis library against a Redis cluster with three shards. My topology consists of three physical nodes each running three Redis instances, i.e. one master and two replicas (for the other two shards) on each physical node, resulting in a total of nine Redis instances. Each of the physical nodes has its own public IP address and each Redis instance.
When the master is failed over to one of its replicas, the library doesn't seem to handle the MOVED response properly on set commands. Specifically, when the library processes the MOVED response as it tries to set a value on the old master (which is now a replica), it correctly updates ServerSelectionStrategy.map for the given hash slot (i.e. it changes the slot in the array to the ServerEndpoint of the new master), but when it tries to re-send the set command, the logic in ServerSelectionStrategy.Select() causes the old master to be chosen again because the ServerEndpoint isn't marked as a master yet.
Interestingly, ServerSelectionStrategy.Select() has a comment that states that all the entries in the 'map' are masters, which is correct, so why do we need to even call FindMaster() on the node if we could just use it directly?
The end result of the current logic is that the set operation ultimately fails with an InternalServer error since the same endpoint is tried twice and it's no longer the master.
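The failure mode described above can be reproduced with a small toy model. This is a sketch in Python based only on the description in this issue, not a port of the library's code: `Endpoint`, `handle_moved`, `find_master`, and the two `select_*` functions are hypothetical stand-ins, and slot 1234 is an arbitrary example value. The key assumption is that the endpoint flags are stale until the next discovery cycle, so the new master is still marked as a replica of the old one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Endpoint:
    """Toy stand-in for a server endpoint with possibly stale role flags."""
    name: str
    is_replica: bool
    master: Optional["Endpoint"] = None


def handle_moved(slot_map, slot, new_owner):
    """Apply a MOVED reply: point the hash slot at the endpoint it names."""
    slot_map[slot] = new_owner


def find_master(endpoint, max_hops=5):
    """Follow the (possibly stale) replica -> master links."""
    hops = 0
    while endpoint.is_replica and endpoint.master is not None and hops < max_hops:
        endpoint = endpoint.master
        hops += 1
    return endpoint


def select_with_find_master(slot_map, slot):
    """Models the behaviour described in the issue: re-resolve the master."""
    return find_master(slot_map[slot])


def select_direct(slot_map, slot):
    """Models the suggested alternative: trust the slot map entry as-is."""
    return slot_map[slot]


# State before failover, as last seen by auto discovery.
old_master = Endpoint("node-a", is_replica=False)
new_master = Endpoint("node-b", is_replica=True, master=old_master)  # stale flags
slot_map = {1234: old_master}

# Failover happens; the next SET gets a MOVED reply pointing at node-b.
handle_moved(slot_map, 1234, new_master)

retry_target = select_with_find_master(slot_map, 1234)
direct_target = select_direct(slot_map, 1234)
```

In this model `retry_target` is the old master again, because the stale replica flag on node-b sends `find_master` back to node-a, while `direct_target` is the endpoint named in the MOVED reply. Whether skipping the master lookup is safe in every case in the real library is exactly the open question raised above.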