
Proxy middleware should detect broken backends and switch to healthy backends #2372

Closed
yonas opened this issue Dec 28, 2022 · 16 comments

@yonas

yonas commented Dec 28, 2022

Issue Description

Currently, if a backend is down, Echo will return an error when it attempts to connect to it.

Expected behaviour

It would be a great feature for Echo's Proxy middleware to automatically iterate over remaining backends until one returns a successful connection.

Actual behaviour

Echo will return an error when it attempts to connect to a backend that is down.

Version/commit

4.9.0

@yonas yonas changed the title Load balancer should detect broken backends and switch to healthy backends Proxy middleware should detect broken backends and switch to healthy backends Dec 28, 2022
@aldas aldas added the feature label Dec 28, 2022
@mikemherron
Contributor

mikemherron commented Mar 3, 2023

I'd be interested in taking a look at this and have some time available if this is still a desired feature.

Thinking through the potential implementation, there could be a few options. Simply moving on to the next backend on failure would work well for intermittent, occasional failures, but in the case of a catastrophic failure with a backend it might be wasteful to keep repeatedly trying it on every request, especially in high volume environments. In those cases, it might be better to remove the backend entirely from the pool for a period of time before trying to reintroduce it, essentially implementing circuit-breaker style logic.

Perhaps this could be broken down into 2 features...

  • Connection failure policy - what to do if unable to connect to a configured backend. Options are nothing (current behaviour), try next (try the next available backend), try all (keep trying backends until we can connect)
  • Backend eviction policy - what to do with a backend when unable to connect. Options are nothing (current behaviour) or a configurable circuit breaker (remove after X consecutive failures, after X failures over Y period, etc.)

Perhaps the first feature, implementing a simple connection failure policy, is a good starting point?
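
For concreteness, here is a purely illustrative Go sketch of how the two policies described above could be expressed as configuration. None of these type or field names exist in Echo; they are assumptions for discussion only.

```go
package proxypolicy

import "time"

// ConnectionFailurePolicy describes what to do when a backend cannot be reached.
// The names are hypothetical, for illustration only.
type ConnectionFailurePolicy int

const (
	FailureDoNothing ConnectionFailurePolicy = iota // current behaviour: return the error
	FailureTryNext                                  // try the next available backend
	FailureTryAll                                   // keep trying backends until one connects
)

// EvictionPolicy describes circuit-breaker style removal of failing backends.
type EvictionPolicy struct {
	// Remove a backend after this many consecutive connection failures;
	// zero keeps the current behaviour (no eviction).
	MaxConsecutiveFailures int
	// How long an evicted backend stays out of the pool before it is retried.
	Cooldown time.Duration
}
```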

@aldas
Contributor

aldas commented Mar 3, 2023

It is probably better to implement this as a separate library: load balancing is quite a huge domain in itself, and trying to fit it into the Echo core repository would probably cause problems. As a separate library it can have its own release plan, breaking changes, own dependencies, etc.

The rules/behaviour could be picked from similar implementations - every major HTTP server has load-balancing functionality.

Their docs probably have hints about how these implementations behave when problems occur.

@aldas
Contributor

aldas commented Mar 3, 2023

Echo's README.md has a third-party libraries section where we could add this library, and maybe even recommend it on the website under the proxy pages as a more advanced option.

@yonas
Author

yonas commented Mar 5, 2023

I'd be interested in taking a look at this and have some time available if this is still a desired feature.

@mikemherron It's still a desired feature for me. Thanks for working on this.

@aldas
Contributor

aldas commented Mar 5, 2023

@mikemherron if you are developing this, consider adding (additional) support for using this proxy middleware with/as a pure Handler.ServeHTTP method - that way, other frameworks based on the Go standard library could also benefit from it.

I tried a quick search on GitHub for "golang proxy load-balancer" and did not find anything advanced. It would be nice if the whole Go ecosystem could benefit from something like that.
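
As a rough, framework-agnostic sketch of the pure Handler.ServeHTTP idea (not code from this thread or from Echo), a failover proxy can be built on the standard library alone. The failoverProxy name and structure below are assumptions for illustration.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// failoverProxy is a plain http.Handler that tries each backend in order and
// stops at the first one the transport can actually reach. Request bodies are
// not buffered here, so retrying requests with bodies is unsafe in this sketch.
type failoverProxy struct {
	backends []*url.URL
}

func (p *failoverProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for _, target := range p.backends {
		failed := false
		rp := httputil.NewSingleHostReverseProxy(target)
		rp.ErrorHandler = func(rw http.ResponseWriter, req *http.Request, err error) {
			log.Printf("backend %s unreachable: %v", target, err)
			failed = true // remember the failure instead of writing a 502 here
		}
		rp.ServeHTTP(w, r)
		if !failed {
			return // the proxy already wrote the backend's response
		}
	}
	http.Error(w, "all backends unavailable", http.StatusBadGateway)
}

func main() {
	u1, _ := url.Parse("http://localhost:8081")
	u2, _ := url.Parse("http://localhost:8082")
	log.Fatal(http.ListenAndServe(":8080", &failoverProxy{backends: []*url.URL{u1, u2}}))
}
```

Because each backend is tried synchronously before the loop advances, capturing target and failed in the ErrorHandler closure is safe here.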

@mikemherron
Contributor

Thanks for the input @aldas - my main motivation was to understand the internals of Echo a bit better, and hopefully make a worthwhile contribution to an incredibly useful community project. I was reviewing the issues list and this seemed like a good one to pick up.

However, I completely understand your concern over adding that complexity to Echo core. Implementing a separate lib is less appealing, as it's probably quite a bit broader in scope and also more likely to become stale if not part of a wider community project. I may still give it a shot, but is there any (even minimal) implementation of this logic you would see as acceptable for core?

@aldas
Contributor

aldas commented Mar 6, 2023

If it's not huge/very complex, doesn't have many "moving parts", doesn't try to do too many things, and doesn't depend on 3rd-party libraries - I think we can merge it into core.

This is a domain that potentially attracts many requests for different features, and as maintainers we need to be careful not to scope-creep Echo too much. The sad truth is: maintainers come and go, but the lines of code only grow.

@aldas
Contributor

aldas commented Mar 6, 2023

@mikemherron would it be OK with you to limit the feature set to only

Connection failure policy - what to do if unable to connect to a configured backend. Options are nothing (current behaviour), try next (try the next available backend), try all (keep trying backends until we can connect)

  • when we cannot connect to our target, try to connect to the next target,
  • maybe have a configurable number - how many further tries we allow before giving up (cannot be more than the number of configured backends) - or a callback that the user can provide, which takes in the connection error and returns an error or nil. When nil, the middleware will choose the next backend; when an error is returned, the middleware returns with that error (a sketch of such a callback follows below).
  • no eviction (just to limit our scope)
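
To make the callback idea concrete, here is a hypothetical sketch; the RetryDecider name and LimitedRetries helper are assumptions, not Echo API. The middleware would pass the connection error in; a nil return means "pick the next backend", a non-nil return is propagated to the caller.

```go
package proxyretry

import "fmt"

// RetryDecider is a hypothetical callback type: the middleware calls it with
// the connection error; returning nil means "try the next backend", returning
// an error means "stop and return this error".
type RetryDecider func(connErr error) error

// LimitedRetries gives up after maxRetries failed connection attempts.
// Note: a real implementation would need per-request state rather than a
// shared counter like this one - which is exactly the concern discussed
// later in this thread.
func LimitedRetries(maxRetries int) RetryDecider {
	attempts := 0
	return func(connErr error) error {
		attempts++
		if attempts > maxRetries {
			return fmt.Errorf("giving up after %d failed attempts: %w", attempts, connErr)
		}
		return nil // let the middleware pick the next backend
	}
}
```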

@mikemherron
Contributor

That makes sense and sounds good to me, thanks @aldas. I will have some time this week for a first pass and see how it goes. If it feels like too much for Echo core then of course we can always decide not to merge it.

@mikemherron
Contributor

I linked a draft PR with a rough initial implementation - any thoughts or comments appreciated. I went with the callback idea you proposed @aldas. LMK any feedback - if it seems reasonable I can continue work on it and get it ready for a full PR. If you feel it's outside of the scope of core Echo then no problem.

@aldas
Contributor

aldas commented Mar 7, 2023

It's small and not that complex - I think it has a very high potential to land in core.

I can give more comments on Thursday, as it is quite late here and I have an extremely long workday tomorrow with traveling, so I'd probably be too mentally exhausted to give thought-out feedback.

But some quick feedback from skimming diagonally through the PR:

  • implementing a ProxyRetryProvider struct may be ergonomic overkill compared to a ProxyRetryHandler that only needs a function closure. Don't throw it out immediately, but think of the user - let's say retry is based on some specific error, then creating a struct for it is a little too much work. I may be wrong here - for creating a per-request retryHandler instance, maybe an interface is better. This is my first gut reaction without thinking through all the different aspects.
  • maybe retryHandler(c, e) should be more "generic" - not inside these 2 IFs. Maybe the assumption of checking for echo.HTTPError + http.StatusBadGateway is something that the default retryHandler should implement, but if you provide your own we should not limit your choices to this condition (see the sketch below).
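
A rough sketch of that "generic by default" suggestion, assuming a handler shape along the lines discussed in the draft PR (the defaultRetryHandler name and bool return are illustrative, not a final Echo API): the default only retries on the bad-gateway error the proxy produces for connection failures, while a user-supplied handler could retry on any condition it likes.

```go
package proxydefaults

import (
	"errors"
	"net/http"

	"github.com/labstack/echo/v4"
)

// defaultRetryHandler (illustrative name) retries only when the proxy failed
// with a bad-gateway error, i.e. the backend could not be reached. A custom
// handler with the same shape could use any other condition.
func defaultRetryHandler(c echo.Context, err error) bool {
	var httpErr *echo.HTTPError
	if errors.As(err, &httpErr) {
		return httpErr.Code == http.StatusBadGateway
	}
	return false
}
```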

@mikemherron
Contributor

Perfect, thanks for taking the time to look - I appreciate the feedback. No rush, I'll consider these points and do some more detailed testing in the meantime.

@mikemherron
Contributor

Thinking on your feedback @aldas, I think you are right - the struct is asking a bit much of the user. I went with a struct because the retry implementation needs to store the retry state per request; rather than having nested closures, it felt nicer for the retry config value to be stored on a struct.

However, thinking through the whole approach, it seems users providing their own implementation would not want to do unbounded retries forever - that seems dangerous. Looking through the docs for other HTTP servers, most allow retrying N times or for a set period of time. Retrying for a set period of time seems difficult to implement well without channels, which I think are overkill. So it doesn't seem likely to me that a user would ever want to provide a custom implementation WITHOUT a fixed number of retries (and in fact it might be too easy/dangerous for a user to do this by accident with the current code).

Therefore, I think it will make more sense to:

  1. Make a fixed number of retries a built-in feature of the middleware, configurable with a simple int (default 0).
  2. Have a separate function that can be used to check if the request should be retried (I agree with your point that it should be more generic) - this would default to checking for bad gateway errors only, but users could change this.

This way, the custom function doesn't need to track per-request state at all (no structs or closures within closures), and overall it feels like a simpler interface for users to understand. The disadvantages are that it leaves 2 new things on the public ProxyConfig struct (retry count and handler function) and that the only retry behaviour supported would be to retry N times - but I don't think (without a channel interface) it would be possible to do much else that would be useful.

Apologies for the long post! I will update the PR either tonight or tomorrow and it will hopefully be clear.
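
As a hedged illustration of this final shape (a fixed retry count plus a pluggable retry check), configuring the proxy middleware could look roughly like the sketch below. The RetryCount and RetryFilter names follow the design described in this comment and are illustrative; check the merged PR (#2414) for the actual fields.

```go
package main

import (
	"net/http"
	"net/url"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()

	u1, _ := url.Parse("http://localhost:8081")
	u2, _ := url.Parse("http://localhost:8082")
	balancer := middleware.NewRoundRobinBalancer([]*middleware.ProxyTarget{
		{URL: u1},
		{URL: u2},
	})

	e.Use(middleware.ProxyWithConfig(middleware.ProxyConfig{
		Balancer: balancer,
		// RetryCount and RetryFilter are illustrative names for the design
		// described above; the merged PR may differ.
		RetryCount: 1, // retry one additional backend before giving up (default would be 0)
		RetryFilter: func(c echo.Context, err error) bool {
			// Default behaviour described above: retry only on bad-gateway errors.
			httpErr, ok := err.(*echo.HTTPError)
			return ok && httpErr.Code == http.StatusBadGateway
		},
	}))

	e.Logger.Fatal(e.Start(":8080"))
}
```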

@mikemherron
Contributor

I updated my PR and marked it ready for review. I ended up keeping the error check outside of the handler function - I have elaborated in the PR.

@aldas
Contributor

aldas commented May 23, 2023

closing, #2414 has been merged

@aldas aldas closed this as completed May 23, 2023
@yonas
Author

yonas commented May 24, 2023

@mikemherron @aldas Thanks! 🥳 I'm going to try it out.
