
Proxy middleware should detect broken backends and switch to healthy backends #2372

Closed
yonas opened this issue Dec 28, 2022 · 16 comments

@yonas

yonas commented Dec 28, 2022

Issue Description

Currently, if a backend is down, Echo will return an error when it attempts to connect to it.

Expected behaviour

It would be a great feature for Echo's Proxy middleware to automatically iterate over remaining backends until one returns a successful connection.

Actual behaviour

Echo will return an error when it attempts to connect to a backend that is down.

Version/commit

4.9.0

@yonas yonas changed the title Load balancer should detect broken backends and switch to healthy backends Proxy middleware should detect broken backends and switch to healthy backends Dec 28, 2022
@aldas aldas added the feature label Dec 28, 2022
@mikemherron
Contributor

mikemherron commented Mar 3, 2023

I'd be interested in taking a look at this and have some time available if this is still a desired feature.

Thinking through the potential implementation, there could be a few options. Simply moving on to the next backend on failure would work well for intermittent, occasional failures, but in the case of a catastrophic failure with a backend it might be wasteful to keep repeatedly trying it on every request, especially in high volume environments. In those cases, it might be better to remove the backend entirely from the pool for a period of time before trying to reintroduce it, essentially implementing circuit-breaker style logic.

Perhaps this could be broken down into 2 features...

  • Connection failure policy - what to do if unable to connect to a configured backend. Options are nothing (current behaviour), try next (try the next available backend), try all (keep trying backends until we can connect)
  • Backend eviction policy - what to do with a backend when unable to connect. Options are nothing (current behaviour) or a configurable circuit breaker (remove after X consecutive failures, after X failures over Y period, etc.)

Perhaps the first feature, implementing a simple connection failure policy, is a good starting point?
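
For concreteness, here is a purely illustrative Go sketch of how the two policies described above could be expressed as configuration. None of these type or field names exist in Echo; they are assumptions for discussion only.

```go
package proxypolicy

import "time"

// ConnectionFailurePolicy describes what to do when a backend cannot be reached.
// The names are hypothetical, for illustration only.
type ConnectionFailurePolicy int

const (
	FailureDoNothing ConnectionFailurePolicy = iota // current behaviour: return the error
	FailureTryNext                                  // try the next available backend
	FailureTryAll                                   // keep trying backends until one connects
)

// EvictionPolicy describes circuit-breaker style removal of failing backends.
type EvictionPolicy struct {
	// Remove a backend after this many consecutive connection failures;
	// zero keeps the current behaviour (no eviction).
	MaxConsecutiveFailures int
	// How long an evicted backend stays out of the pool before it is retried.
	Cooldown time.Duration
}
```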

@aldas
Contributor

aldas commented Mar 3, 2023

It is probably better to implement this as a separate library: load balancing is quite a huge domain in itself, and trying to fit it into the Echo core repository would probably cause problems. As a separate library it can have its own release plan, breaking changes, own dependencies, etc.

The rules/behaviour could be picked from similar implementations - every major HTTP server has load-balancing functionality.

Their docs probably have hints about how these implementations behave when problems occur.

@aldas
Contributor

aldas commented Mar 3, 2023

Echo's README.md has a third-party libraries section where we could add this library, and maybe even recommend it on the website under the proxy pages as a more advanced option.

@yonas
Author

yonas commented Mar 5, 2023

I'd be interested in taking a look at this and have some time available if this is still a desired feature.

@mikemherron It's still a desired feature for me. Thanks for working on this.

@aldas
Contributor

aldas commented Mar 5, 2023

@mikemherron if you are developing this, consider adding (additional) support for using this proxy middleware with/as a pure Handler.ServeHTTP method - that way, other frameworks based on the Go standard library could also benefit from it.

I tried a quick search on GitHub for "golang proxy load-balancer" and did not find anything advanced. It would be nice if the whole Go ecosystem could benefit from something like that.
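
As a rough, framework-agnostic sketch of the pure Handler.ServeHTTP idea (not code from this thread or from Echo), a failover proxy can be built on the standard library alone. The failoverProxy name and structure below are assumptions for illustration.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// failoverProxy is a plain http.Handler that tries each backend in order and
// stops at the first one the transport can actually reach. Request bodies are
// not buffered here, so retrying requests with bodies is unsafe in this sketch.
type failoverProxy struct {
	backends []*url.URL
}

func (p *failoverProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for _, target := range p.backends {
		failed := false
		rp := httputil.NewSingleHostReverseProxy(target)
		rp.ErrorHandler = func(rw http.ResponseWriter, req *http.Request, err error) {
			log.Printf("backend %s unreachable: %v", target, err)
			failed = true // remember the failure instead of writing a 502 here
		}
		rp.ServeHTTP(w, r)
		if !failed {
			return // the proxy already wrote the backend's response
		}
	}
	http.Error(w, "all backends unavailable", http.StatusBadGateway)
}

func main() {
	u1, _ := url.Parse("http://localhost:8081")
	u2, _ := url.Parse("http://localhost:8082")
	log.Fatal(http.ListenAndServe(":8080", &failoverProxy{backends: []*url.URL{u1, u2}}))
}
```

Because each backend is tried synchronously before the loop advances, capturing target and failed in the ErrorHandler closure is safe here.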

@mikemherron
Contributor

Thanks for the input @aldas - my main motivation was to understand the internals of Echo a bit better, and hopefully make a worthwhile contribution to an incredibly useful community project. I was reviewing the issues list and this seemed like a good one to pick up.

However, I completely understand your concern over adding that complexity to Echo core. Implementing a separate lib is less appealing, as it's probably quite a bit broader in scope and also more likely to become stale if not part of a wider community project. I may still give it a shot, but is there any (even minimal) implementation of this logic you would see as acceptable for core?

@aldas
Contributor

aldas commented Mar 6, 2023

If it's not huge/very complex, doesn't have many "moving parts", doesn't try to do too many things, and doesn't depend on 3rd-party libraries - I think we can merge it into core.

This is a domain that potentially attracts many requests for different features, and as maintainers we need to be careful not to scope-creep Echo too much. The sad truth is: maintainers come and go, but the lines of code only grow.

@aldas
Contributor

aldas commented Mar 6, 2023

@mikemherron would it be OK with you to limit the feature set to only

Connection failure policy - what to do if unable to connect to a configured backend. Options are nothing (current behaviour), try next (try the next available backend), try all (keep trying backends until we can connect)

  • when we cannot connect to our target, try to connect to the next target,
  • maybe have a configurable number - how many further tries we allow before giving up (cannot be more than the number of configured backends) - or a callback that the user can provide, which takes in the connection error and returns an error or nil. When nil, the middleware will choose the next backend; when an error is returned, the middleware returns with that error (a sketch of such a callback follows below).
  • no eviction (just to limit our scope)
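
To make the callback idea concrete, here is a hypothetical sketch; the RetryDecider name and LimitedRetries helper are assumptions, not Echo API. The middleware would pass the connection error in; a nil return means "pick the next backend", a non-nil return is propagated to the caller.

```go
package proxyretry

import "fmt"

// RetryDecider is a hypothetical callback type: the middleware calls it with
// the connection error; returning nil means "try the next backend", returning
// an error means "stop and return this error".
type RetryDecider func(connErr error) error

// LimitedRetries gives up after maxRetries failed connection attempts.
// Note: a real implementation would need per-request state rather than a
// shared counter like this one - which is exactly the concern discussed
// later in this thread.
func LimitedRetries(maxRetries int) RetryDecider {
	attempts := 0
	return func(connErr error) error {
		attempts++
		if attempts > maxRetries {
			return fmt.Errorf("giving up after %d failed attempts: %w", attempts, connErr)
		}
		return nil // let the middleware pick the next backend
	}
}
```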

@mikemherron
Contributor

That makes sense and sounds good to me, thanks @aldas. I will have some time this week for a first pass and see how it goes. If it feels like too much for Echo core then of course we can always decide not to merge it.

@mikemherron
Contributor

I linked a draft PR with a rough initial implementation - any thoughts or comments appreciated. I went with the callback idea you proposed @aldas. LMK any feedback - if it seems reasonable I can continue work on it and get it ready for a full PR. If you feel it's outside of the scope of core Echo then no problem.

@aldas
Contributor

aldas commented Mar 7, 2023

It's small and not that complex - I think it has a very high potential to land in core.

I can give more comments on Thursday, as it is quite late here and I have an extremely long workday tomorrow with traveling, so I'd probably be too mentally exhausted to give thought-out feedback.

But some quick feedback from skimming diagonally through the PR:

  • implementing a ProxyRetryProvider struct may be ergonomic overkill compared to a ProxyRetryHandler that only needs a function closure. Don't throw it out immediately, but think of the user - let's say retry is based on some specific error, then creating a struct for it is a little too much work. I may be wrong here - for creating a per-request retryHandler instance, maybe an interface is better. This is my first gut reaction without thinking through all the different aspects.
  • maybe retryHandler(c, e) should be more "generic" - not inside these 2 IFs. Maybe the assumption of checking for echo.HTTPError + http.StatusBadGateway is something that the default retryHandler should implement, but if you provide your own we should not limit your choices to this condition (see the sketch below).
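
A rough sketch of that "generic by default" suggestion, assuming a handler shape along the lines discussed in the draft PR (the defaultRetryHandler name and bool return are illustrative, not a final Echo API): the default only retries on the bad-gateway error the proxy produces for connection failures, while a user-supplied handler could retry on any condition it likes.

```go
package proxydefaults

import (
	"errors"
	"net/http"

	"github.com/labstack/echo/v4"
)

// defaultRetryHandler (illustrative name) retries only when the proxy failed
// with a bad-gateway error, i.e. the backend could not be reached. A custom
// handler with the same shape could use any other condition.
func defaultRetryHandler(c echo.Context, err error) bool {
	var httpErr *echo.HTTPError
	if errors.As(err, &httpErr) {
		return httpErr.Code == http.StatusBadGateway
	}
	return false
}
```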

@mikemherron
Contributor

Perfect, thanks for taking the time to look - I appreciate the feedback. No rush, I'll consider these points and do some more detailed testing in the meantime.

@mikemherron
Contributor

Thinking on your feedback @aldas, I think you are right - the struct is asking a bit much of the user. I went with a struct because the retry implementation needs to store the retry state per request; rather than having nested closures, it felt nicer for the retry config value to be stored on a struct.

However, thinking through the whole approach, it seems users providing their own implementation would not want to do unbounded retries forever - that seems dangerous. Looking through the docs for other HTTP servers, most allow retrying N times or for a set period of time. Retrying for a set period of time seems difficult to implement well without channels, which I think are overkill. So it doesn't seem likely to me that a user would ever want to provide a custom implementation WITHOUT a fixed number of retries (and in fact it might be too easy/dangerous for a user to do this by accident with the current code).

Therefore, I think it will make more sense to:

  1. Make a fixed number of retries a built-in feature of the middleware, configurable with a simple int (default 0).
  2. Have a separate function that can be used to check if the request should be retried (I agree with your point that it should be more generic) - this would default to checking for bad gateway errors only, but users could change this.

This way, the custom function doesn't need to track per-request state at all (no structs or closures within closures), and overall it feels like a simpler interface for users to understand. The disadvantages are that it leaves 2 new things on the public ProxyConfig struct (retry count and handler function) and that the only retry behaviour supported would be to retry N times - but I don't think (without a channel interface) it would be possible to do much else that would be useful.

Apologies for the long post! I will update the PR either tonight or tomorrow and it will hopefully be clear.
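
As a hedged illustration of this final shape (a fixed retry count plus a pluggable retry check), configuring the proxy middleware could look roughly like the sketch below. The RetryCount and RetryFilter names follow the design described in this comment and are illustrative; check the merged PR (#2414) for the actual fields.

```go
package main

import (
	"net/http"
	"net/url"

	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	e := echo.New()

	u1, _ := url.Parse("http://localhost:8081")
	u2, _ := url.Parse("http://localhost:8082")
	balancer := middleware.NewRoundRobinBalancer([]*middleware.ProxyTarget{
		{URL: u1},
		{URL: u2},
	})

	e.Use(middleware.ProxyWithConfig(middleware.ProxyConfig{
		Balancer: balancer,
		// RetryCount and RetryFilter are illustrative names for the design
		// described above; the merged PR may differ.
		RetryCount: 1, // retry one additional backend before giving up (default would be 0)
		RetryFilter: func(c echo.Context, err error) bool {
			// Default behaviour described above: retry only on bad-gateway errors.
			httpErr, ok := err.(*echo.HTTPError)
			return ok && httpErr.Code == http.StatusBadGateway
		},
	}))

	e.Logger.Fatal(e.Start(":8080"))
}
```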

@mikemherron
Contributor

I updated my PR and marked it ready for review. I ended up keeping the error check outside of the handler function - I have elaborated in the PR.

@aldas
Contributor

aldas commented May 23, 2023

closing, #2414 has been merged

@aldas aldas closed this as completed May 23, 2023
@yonas
Author

yonas commented May 24, 2023

@mikemherron @aldas Thanks! 🥳 I'm going to try it out.
