
v4: Try to select a replica with available connections when calling DoSecondary #346

Open
woodsbury opened this issue Aug 28, 2023 · 4 comments


@woodsbury

We had an issue recently where one of the replica nodes in our Redis cluster became unresponsive. Radix seemed to do a pretty good job at detecting that the connections it had open were failing and cleaning them out of the pool, but since Redis was still reporting the node in the output from "CLUSTER SLOTS" it understandably kept the pool around.

This led to an issue where all the commands we were running via DoSecondary on that shard slowed down considerably. We have two replicas for each shard in our cluster, and because the pool for the unresponsive node was still around, Radix had a 50% chance of sending commands to it (which would inevitably fail after some timeout). We retry those failures, and we'd then have another 50% chance of hitting the bad node again, and so on.

The improvement we're suggesting is to update the replica-selection code to prefer replicas whose pools are non-empty. I've created an implementation of this that, in some limited local testing, removed all the errors we were seeing beforehand by making Radix consistently choose the good node in this kind of scenario. If all nodes are bad (or simply have their pools drained at that moment), it effectively falls back to the previous behaviour of selecting one at random.
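In rough terms the selection looks something like this (a minimal sketch of the concept only, with made-up names; the actual changes are in the commit linked below):

```go
package main

import (
	"fmt"
	"math/rand"
)

// replica is a stand-in for whatever the cluster tracks per secondary node.
// availableConns would come from the underlying pool; both names are
// invented for illustration.
type replica struct {
	addr           string
	availableConns int
}

// pickSecondary prefers replicas that currently have connections available
// in their pool, and only falls back to a purely random choice when none do.
func pickSecondary(replicas []replica) replica {
	healthy := make([]replica, 0, len(replicas))
	for _, r := range replicas {
		if r.availableConns > 0 {
			healthy = append(healthy, r)
		}
	}
	if len(healthy) > 0 {
		return healthy[rand.Intn(len(healthy))]
	}
	// All pools are empty (or drained at this instant): behave as before and
	// pick any replica at random.
	return replicas[rand.Intn(len(replicas))]
}

func main() {
	replicas := []replica{
		{addr: "10.0.0.1:6379", availableConns: 0}, // unresponsive node, pool drained
		{addr: "10.0.0.2:6379", availableConns: 5},
	}
	fmt.Println(pickSecondary(replicas).addr)
}
```

A node whose connections keep failing ends up with an empty pool and so is naturally avoided, while a healthy but momentarily busy node is still reachable through the random fallback.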

The commit with the changes on it can be found here: woodsbury@62c0519

I wanted to present it here before creating a pull request in case there are any issues you can see with the general concept. The current changes do access the connections inside a pool directly, as that was the easiest way to implement it, but there may be ways to clean that up further.

@mediocregopher
Owner

Hi @woodsbury, I agree that this is a problem worth solving, and thank you for digging into it and coming up with a solution. I do have a couple comments about your implementation:

A minor one: the conns field on the pool is not thread-safe on its own, and Cluster's DoSecondary itself uses an RLock, so multiple goroutines could be making calls against the pools at the same time. You'll want to access the pool within a pool.proc.WithRLock. That method returns an error if the pool has been closed, so you'll probably want to treat that case as equivalent to an empty pool.
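For illustration, the shape being described is roughly this (a sketch only; proc, WithRLock, and conns here are stand-ins modeled on the description above, not radix's actual internals):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// proc models the locking helper described above: WithRLock runs fn under a
// read lock and returns an error once the owner has been closed.
type proc struct {
	mu     sync.RWMutex
	closed bool
}

var errClosed = errors.New("closed")

func (p *proc) WithRLock(fn func() error) error {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.closed {
		return errClosed
	}
	return fn()
}

// pool models just enough of a connection pool for the example.
type pool struct {
	proc  *proc
	conns []struct{} // placeholder for pooled connections
}

// availableConns reads len(conns) under the read lock; a closed pool is
// treated the same as an empty one, per the suggestion above.
func (p *pool) availableConns() int {
	n := 0
	if err := p.proc.WithRLock(func() error {
		n = len(p.conns)
		return nil
	}); err != nil {
		return 0
	}
	return n
}

func main() {
	p := &pool{proc: &proc{}, conns: make([]struct{}, 3)}
	fmt.Println(p.availableConns()) // 3
}
```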

Another minor comment: whatever solution gets introduced here I would want to also add to the Sentinel implementation, because it surely has the same problem.

The bigger concern of mine is what you've touched on already, that you're accessing the pool directly. As it stands now there are no components of radix which downcast to any of radix's private types, everything is done via the public interfaces. This allows components to be wrapped or have their implementations completely replaced by users. It's something which is promised by the documentation, and I don't want to undercut that promise by introducing special behavior for the radix types.

A different solution which comes to mind would be to have Cluster track the result of calling doInner on each Client, using some kind of moving window. The random pool selection could be weighted based on how many errors have occurred in the window for each pool. This would have the downside of being a bit more complicated, but have the upside of working for any Client implementation.
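A rough sketch of what that could look like (entirely hypothetical names, not radix code):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// errorWindow counts recent errors per client address within a fixed window.
// This only illustrates the "moving window" idea described above.
type errorWindow struct {
	window time.Duration
	errs   map[string][]time.Time
}

func newErrorWindow(window time.Duration) *errorWindow {
	return &errorWindow{window: window, errs: map[string][]time.Time{}}
}

// observe records the outcome of a call made against addr.
func (w *errorWindow) observe(addr string, err error) {
	if err == nil {
		return
	}
	w.errs[addr] = append(w.errs[addr], time.Now())
}

// recent returns how many errors addr has produced within the window,
// discarding entries that have aged out.
func (w *errorWindow) recent(addr string) int {
	cutoff := time.Now().Add(-w.window)
	kept := w.errs[addr][:0]
	for _, t := range w.errs[addr] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	w.errs[addr] = kept
	return len(kept)
}

// pick chooses an address at random, weighting each by 1/(1+recentErrors) so
// clients which have been failing are chosen less often but never excluded.
func (w *errorWindow) pick(addrs []string) string {
	weights := make([]float64, len(addrs))
	var total float64
	for i, addr := range addrs {
		weights[i] = 1.0 / float64(1+w.recent(addr))
		total += weights[i]
	}
	r := rand.Float64() * total
	for i, weight := range weights {
		if r < weight {
			return addrs[i]
		}
		r -= weight
	}
	return addrs[len(addrs)-1]
}

func main() {
	w := newErrorWindow(30 * time.Second)
	w.observe("10.0.0.1:6379", fmt.Errorf("i/o timeout"))
	w.observe("10.0.0.1:6379", fmt.Errorf("i/o timeout"))
	fmt.Println(w.pick([]string{"10.0.0.1:6379", "10.0.0.2:6379"}))
}
```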

I wouldn't blame you if you don't want to implement this though; it could be a bit tricky. But maybe you have some other ideas/comments as well. Wdyt?

@woodsbury
Author

Yeah, I was worried that just accessing the length like that wasn't thread-safe.

Having the cluster track which of its clients seem to be misbehaving might work. My main thought is that if the cluster has to determine that all by itself, it can only ever do so after the fact: it would have to first run a command against the client to see whether it returned an error or took longer than usual to respond, and once it decided a client was unhealthy it wouldn't be able to tell when the client had recovered without just trying it again and hoping it didn't get another error. It seems to me the client itself is in a better position to determine whether it's healthy, and could therefore react more quickly both when it starts having problems and when it has recovered. Tracking this in the cluster for each of its clients also sounds like it could be a trickier implementation on top of that.

With the implementation I've got, the idea I had to remove the explicit reference to the pool type was to introduce a new, optional interface. Any client could then implement that interface to opt in to the behaviour, or ignore it to continue using the current behaviour.
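Conceptually it's something like this (the names here are placeholders, not the ones used in the commit below):

```go
package main

import "fmt"

// Client stands in for radix's Client interface for this sketch.
type Client interface {
	Addr() string
}

// ready is the optional interface a Client could implement to report whether
// it currently has capacity to take a command.
type ready interface {
	Ready() bool
}

// poolClient is an example implementation that opts in.
type poolClient struct {
	addr      string
	idleConns int
}

func (p *poolClient) Addr() string { return p.addr }
func (p *poolClient) Ready() bool  { return p.idleConns > 0 }

// isReady treats any Client that doesn't implement the optional interface as
// ready, preserving the current behaviour for those implementations.
func isReady(c Client) bool {
	r, ok := c.(ready)
	return !ok || r.Ready()
}

func main() {
	clients := []Client{
		&poolClient{addr: "10.0.0.1:6379", idleConns: 0},
		&poolClient{addr: "10.0.0.2:6379", idleConns: 4},
	}
	for _, c := range clients {
		fmt.Println(c.Addr(), isReady(c))
	}
}
```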

I've got an updated commit here with the concept: woodsbury@e3f42da

Like before, I'm not married to this specific implementation or naming, just presenting it here to hopefully make it clearer what I'm thinking. I made the interface private just because that was simpler, but it could just as easily be a new public interface if that was better. I can also see that a way for a client to report whether it is healthy or ready might be useful outside of a cluster as well. Do you have any thoughts on that?

@woodsbury
Author

Continuing our testing of different failure states in a Redis cluster and how Radix responds to them (this time failures of master nodes), we saw a similar issue, but this time with the cluster resyncing its topology.

When a master node dies, sending a command to it will fail. We can detect that we've got an error back and call Sync to try to update the topology, but when the cluster is small there's a good chance that the node Radix chooses to call CLUSTER SLOTS on will be the same master that we've just detected as dead.
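The same ready-check idea applies when picking the node to ask for CLUSTER SLOTS; roughly (a sketch with made-up names, not the linked commit):

```go
package main

import (
	"fmt"
	"math/rand"
)

// node models a cluster node for this sketch.
type node struct {
	addr  string
	ready bool // e.g. the Ready() result from the earlier sketch
}

// pickForTopologyFetch chooses which node to ask for CLUSTER SLOTS during a
// resync, preferring nodes that still look healthy so the master that just
// died isn't the one asked to describe the new topology.
func pickForTopologyFetch(nodes []node) node {
	healthy := nodes[:0:0]
	for _, n := range nodes {
		if n.ready {
			healthy = append(healthy, n)
		}
	}
	if len(healthy) == 0 {
		healthy = nodes
	}
	return healthy[rand.Intn(len(healthy))]
}

func main() {
	nodes := []node{
		{addr: "10.0.0.1:6379", ready: false}, // the master that just died
		{addr: "10.0.0.2:6379", ready: true},
	}
	fmt.Println(pickForTopologyFetch(nodes).addr)
}
```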

Using the same logic as that previous commit I was able to improve that scenario too: woodsbury@2e5f2c2

I've just duplicated the interface here because that was the easiest thing to do; it's not necessarily intended to be a proper implementation. I just wanted to point out a related issue while we're discussing solutions.

@mediocregopher
Owner

Hey @woodsbury, the idea of an optional interface had occurred to me as well. There are two reasons I'm hesitant about it:

  • I worked hard to factor out all the optional interfaces that had crept into v3, and as far as I remember I was successful in doing so. This would be the first one added to v4. Every optional interface is more surface area on the API, so it needs to be worth it.
  • When wrapping one interface implementation with another, it's quite easy to negate an optional interface that the inner implementation might be implementing. This is especially true if the optional interface got added later. Maybe there aren't so many interface wrappers being used with radix, but if it's possible to provide a benefit for everyone I would like to do so.

Which isn't to say I'm against the idea entirely, just that I'd like some more time to think on it.

It would be feasible to write a Client wrapper which implements the behavior I originally described, and use that to decide the return value from Ready(). This could be used by Cluster/Sentinel to wrap any Client implementations which don't implement the optional interface. By doing this we'd get the best of both worlds. The default Pool implementation would implement the optional interface as you've suggested.
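Sketching that out (just an illustration; every name here is invented, nothing from radix itself):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Client and ready mirror the interfaces from the earlier sketches.
type Client interface {
	Do(cmd string) error
}

type ready interface {
	Ready() bool
}

// trackedClient wraps any Client and derives Ready() from how many errors the
// wrapped client has returned within a moving window. Cluster/Sentinel could
// apply this wrapper to implementations that don't provide Ready() themselves.
type trackedClient struct {
	Client
	mu     sync.Mutex
	window time.Duration
	errs   []time.Time
}

// wrap leaves Clients that already report their own readiness untouched.
func wrap(c Client, window time.Duration) Client {
	if _, ok := c.(ready); ok {
		return c
	}
	return &trackedClient{Client: c, window: window}
}

func (t *trackedClient) Do(cmd string) error {
	err := t.Client.Do(cmd)
	if err != nil {
		t.mu.Lock()
		t.errs = append(t.errs, time.Now())
		t.mu.Unlock()
	}
	return err
}

// Ready reports true only if no errors have been observed within the window.
func (t *trackedClient) Ready() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	cutoff := time.Now().Add(-t.window)
	kept := t.errs[:0]
	for _, ts := range t.errs {
		if ts.After(cutoff) {
			kept = append(kept, ts)
		}
	}
	t.errs = kept
	return len(t.errs) == 0
}

// fakeClient is a trivial Client used to exercise the wrapper.
type fakeClient struct{ fail bool }

func (f *fakeClient) Do(string) error {
	if f.fail {
		return fmt.Errorf("connection refused")
	}
	return nil
}

func main() {
	c := wrap(&fakeClient{fail: true}, 30*time.Second)
	_ = c.Do("PING")
	fmt.Println(c.(ready).Ready()) // false: a recent error was observed
}
```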
