Basic load balancing #78

Merged
Tratcher merged 3 commits into master from tratcher/loadrandom on Apr 23, 2020

Conversation

Tratcher
Member

I've added the config model to select a load balancing strategy per backend, and I've also implemented most of the strategies.

Fixes #75 - Power of two choices - This is the default; it has the best cost-benefit trade-off (a minimal sketch follows this list).
Fixes #74 - Least requests - Has the cost of examining all endpoints.
Fixes #73 - Random - Has degenerate cases where it may assign work to already-busy endpoints.

First - This was the only one available before. It doesn't seem like actual load balancing, but it may be useful in the scenario where you have two endpoints and you only want to switch to the second one if the first goes down. First handles that because load balancing only considers healthy endpoints.

#72 - Round-Robin - This is the only one I skipped. It's not clearly better than any of the above strategies, and it's difficult to implement over an unstable list of endpoints. Nor do we have a place to store the per-backend state needed to track which endpoint to select next.
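For illustration only, here is a minimal sketch of the power-of-two-choices selection; the type and member names are placeholders, not the PR's actual code:

using System;
using System.Collections.Generic;

static class PowerOfTwoChoicesSketch
{
    // Pick two healthy endpoints at random and keep the one with fewer in-flight
    // requests. The two samples may coincide; that's acceptable for this approximation.
    public static T Pick<T>(IReadOnlyList<T> healthyEndpoints, Func<T, int> inFlightRequests, Random random)
    {
        var first = healthyEndpoints[random.Next(healthyEndpoints.Count)];
        var second = healthyEndpoints[random.Next(healthyEndpoints.Count)];
        return inFlightRequests(first) <= inFlightRequests(second) ? first : second;
    }
}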

Note this will have merge conflicts with #50.

@Tratcher Tratcher added this to the 1.0.0-preview1 milestone Apr 21, 2020
@Tratcher Tratcher self-assigned this Apr 21, 2020
@Tratcher Tratcher requested a review from davidni April 21, 2020 18:25
/// <summary>
/// Load balancing strategies for endpoint selection.
/// </summary>
public enum LoadBalancingMode
Contributor

Custom?

Member Author

Custom load balancing would require a few different things.
A) You'd have to replace/wrap the ILoadBalancer service or middleware.
B) Would you only want that for some backends?
C) What if you have more than one type of custom load balancing for different backends?

I did a quick test and you can put "Mode": "-1" into config and it successfully binds to the enum. That means you don't need a custom value; people can just define their own entries.
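For reference, a minimal sketch of that quick test, using illustrative enum member names rather than the PR's actual definition:

using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

enum LoadBalancingMode { First, Random, PowerOfTwoChoices, LeastRequests }

class EnumBindingDemo
{
    static void Main()
    {
        // "-1" is not a defined member, but configuration binding still succeeds
        // and yields (LoadBalancingMode)(-1), so callers can define their own values.
        var config = new ConfigurationBuilder()
            .AddInMemoryCollection(new Dictionary<string, string> { ["Mode"] = "-1" })
            .Build();
        var mode = config.GetValue<LoadBalancingMode>("Mode");
        Console.WriteLine(mode); // prints -1
    }
}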

// Licensed under the MIT License.

using Microsoft.ReverseProxy.Core.RuntimeModel;

namespace Microsoft.ReverseProxy.Core.Abstractions
{
/// <summary>
/// Load balancing options.
/// </summary>
public sealed class LoadBalancingOptions
Member

Should we have mode-specific options? I could imagine wanting to add a PowerOfNChoices mode where N was a configurable option, for example, but I wouldn't want that cluttering up LoadBalancingOptions. Maybe the mode name should be a key in config, and the mode-specific options, if any, could be in the value.
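Purely as a hypothetical shape (not something proposed in this PR), that could look something like:

using System.Collections.Generic;

// Hypothetical options class: the mode name is the dictionary key and any
// mode-specific settings live in the value, so new modes don't add properties here.
public sealed class ModeKeyedLoadBalancingOptions
{
    // e.g. config: "LoadBalancing": { "PowerOfNChoices": { "N": "3" } }
    public IDictionary<string, IDictionary<string, string>> Modes { get; } =
        new Dictionary<string, IDictionary<string, string>>();
}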

Member Author

I'd rather not tackle that until we have a concrete need. I am curious what kind of config would be needed; these algorithms are pretty self-contained. The discussion I saw around PowerOfNChoices was that the benefits dropped off steeply when doing more than 2 choices. If 3 choices were still a good option but more than that wasn't, then I could see just adding a separate enum value for it.

case LoadBalancingMode.LeastRequests:
var leastRequestsEndpoint = healthyEndpoints[0];
var leastRequestsCount = leastRequestsEndpoint.ConcurrencyCounter.Value;
for (var i = 1; i < endpointCount; i++)
Contributor

It might be worth storing healthy endpoints in a min-heap data structure instead of a list when endpointCount is big enough (e.g. > 20). That would allow finding the endpoint with the least requests in O(1), in contrast to O(n) for searching a plain list. The heap could be rebuilt in BackendProber.ProbeEndpointsAsync after each probing iteration.

Member Author

The trade-off is how frequently you have to build and update that structure. Having to update it every time a request starts or completes seems like a lot of overhead.

Contributor

Yes, I understand that. But do we really need every ConcurrencyCounter update to be reflected on each request? The current linear search algorithm isn't atomic anyway, so it might return an endpoint that isn't actually the least loaded.

Member Author

Fair point, it's an approximation of least busy without locking.

Our goal at this point is getting basic implementations we can measure and compare. We'll use that data to decide what to refine later.

Contributor

@alnikola We should file a bug for this for sure; that's a good idea.

Member Author

Thinking about this more, that kind of structure would be disruptive to the other load balancing algorithms, not to mention unnecessary overhead when you're not using LeastRequests. It's also unclear how it would fit with the winnowing middleware approach.

If the heap were not updated frequently, wouldn't you end up overloading the first node?

Contributor

Keep in mind that Power of Two Choices is already O(1) and adds no overhead to ConcurrencyCounter updates. It is a good default choice when there is a large number of endpoints.

Contributor

> Thinking about this more, that kind of structure would be disruptive to the other load balancing algorithms, not to mention unnecessary overhead when you're not using LeastRequests.

We can introduce a new abstraction for the endpoint collection (e.g. IEndpointCollection : IReadOnlyList<EndpointInfo>) exposing a GetLeastLoadedEndpoint method, with at least two implementations: ListEndpointCollection and MinHeapEndpointCollection. Then, at the end of each probing iteration, if the current BackendConfig.Mode is LeastRequests and the endpoint count is big enough, BackendProber would tell IEndpointManager to swap the underlying endpoint collection to a new MinHeapEndpointCollection, or to a ListEndpointCollection otherwise.
The worst case is the mode changing to LeastRequests right after BackendProber updated the endpoint list; ListEndpointCollection would then have to fall back to a linear search to find the target item. However, that is the same as the current behavior.

> If the heap were not updated frequently, wouldn't you end up overloading the first node?

To avoid overloading, the heap update should be triggered by a heuristic based on the real request counts tracked by ProxyInvokerMiddleware rather than by a simple timer.

To sum up, I will create a separate issue for this improvement, as @davidfowl suggested, so it can be discussed separately from this PR.
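A rough sketch of the abstraction described above, with placeholder types (this EndpointInfo and its InFlightRequests member stand in for the repo's actual endpoint type and ConcurrencyCounter):

using System.Collections.Generic;
using System.Linq;

public interface IEndpointCollection : IReadOnlyList<EndpointInfo>
{
    EndpointInfo GetLeastLoadedEndpoint();
}

// List-backed implementation: linear O(n) scan, matching the current behavior and
// assuming a non-empty collection. A MinHeapEndpointCollection would instead keep a
// heap keyed by in-flight count and answer GetLeastLoadedEndpoint from the root in O(1).
public sealed class ListEndpointCollection : List<EndpointInfo>, IEndpointCollection
{
    public ListEndpointCollection(IEnumerable<EndpointInfo> endpoints) : base(endpoints) { }

    public EndpointInfo GetLeastLoadedEndpoint() =>
        this.OrderBy(e => e.InFlightRequests).First();
}

// Placeholder endpoint type for this sketch only.
public sealed class EndpointInfo
{
    public int InFlightRequests { get; set; }
}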

Contributor

Can the same endpoint appear in more than one backend cluster? If so, I'm assuming the load needs to be tracked per endpoint rather than as part of the backend structure. Since the list of backends can change dynamically, managing sorted lists for least-loaded is going to be expensive.
This is where picking the least loaded of two is beneficial. I wonder if we should add other algorithms that don't involve sorting all the endpoints, like one that assesses load relative to the average and picks the first endpoint below it? That way it can drop out on the first good candidate. Combined with a round-robin seed, so the next search starts from the last position found, it would spread the load across the servers.
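A minimal sketch of that idea, with illustrative names only: start from a rotating seed and return the first endpoint at or below the average load.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class BelowAverageSketch
{
    private static int _seed; // rotates so consecutive picks start at different offsets

    public static T Pick<T>(IReadOnlyList<T> healthyEndpoints, Func<T, int> inFlightRequests)
    {
        var average = healthyEndpoints.Average(inFlightRequests);
        var start = (Interlocked.Increment(ref _seed) & int.MaxValue) % healthyEndpoints.Count;
        for (var i = 0; i < healthyEndpoints.Count; i++)
        {
            var candidate = healthyEndpoints[(start + i) % healthyEndpoints.Count];
            if (inFlightRequests(candidate) <= average)
            {
                return candidate; // drop out on the first good candidate
            }
        }
        // Unreachable: the minimum is always at or below the average, but the compiler needs a return.
        return healthyEndpoints[start];
    }
}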

Contributor

Submitted issue #94

@Tratcher Tratcher mentioned this pull request Apr 23, 2020
@Tratcher Tratcher merged commit fee1ada into master Apr 23, 2020
@Tratcher Tratcher deleted the tratcher/loadrandom branch April 23, 2020 22:22