Load scheduling by actual request concurrency and response times #2333

Open

akevdmeer opened this issue Jul 7, 2023 · 3 comments

@akevdmeer

We have a business-critical service characterized by low request concurrency and high response-time variance, say 100 ms to 10 s (when not overloaded). Estimating the response time from the request itself is infeasible. We need to apply (adaptive) load scheduling to arbitrate between contending workloads, but we don't see how this can be done effectively with Aperture at the moment.

Is it possible to do load scheduling based on actual request concurrency, without needing to determine the request cost in tokens upfront? The flow.End() calls would seem to make it possible to track which flows are active.
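
For illustration, here is a minimal sketch of the bookkeeping I have in mind; the names are hypothetical, not the Aperture SDK's API:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// concurrencyLimiter admits a new flow only while the number of in-flight
// flows is below a fixed limit. Names here are hypothetical, not the
// Aperture SDK's API.
type concurrencyLimiter struct {
	inFlight atomic.Int64
	limit    int64
}

// Begin reserves a slot for a flow; every successful Begin must be paired
// with exactly one End, e.g. via defer.
func (c *concurrencyLimiter) Begin() bool {
	if c.inFlight.Add(1) > c.limit {
		c.inFlight.Add(-1) // over the limit: roll back and reject
		return false
	}
	return true
}

// End releases the slot taken by Begin, mirroring flow.End().
func (c *concurrencyLimiter) End() {
	c.inFlight.Add(-1)
}

func main() {
	l := &concurrencyLimiter{limit: 2}
	fmt.Println(l.Begin(), l.Begin(), l.Begin()) // true true false
	l.End()
	fmt.Println(l.Begin()) // true again after a flow ends
}
```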

@kwapik (Contributor) commented Jul 7, 2023

@tanveergill PTAL

@hdkshingala (Contributor)

@akevdmeer Please join our Slack community; we can discuss this or any other issues you have faced or need help with there.

CC: @jaidesai-fn

@harjotgill (Contributor)

@akevdmeer - Using flow.End() to track exact concurrency is certainly feasible. The bookkeeping is fairly simple, though we would have to build a token-audit mechanism to replenish tokens in case they are lost to intermittent bookkeeping failures.
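
A rough sketch of that audit idea (hypothetical types, not Aperture code): each admitted flow holds a lease, and a periodic sweep reclaims tokens for leases whose End() never arrived:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// leaseTable sketches the token-audit idea: each admitted flow holds a
// lease, and a periodic sweep replenishes tokens whose End() was lost.
// All names are hypothetical, not Aperture internals.
type leaseTable struct {
	mu     sync.Mutex
	leases map[string]time.Time // flow ID -> admission time
	ttl    time.Duration        // longest plausible flow duration
}

func newLeaseTable(ttl time.Duration) *leaseTable {
	return &leaseTable{leases: make(map[string]time.Time), ttl: ttl}
}

// admit records that a flow took a token.
func (t *leaseTable) admit(flowID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.leases[flowID] = time.Now()
}

// end releases the lease when flow.End() arrives normally.
func (t *leaseTable) end(flowID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.leases, flowID)
}

// sweep returns how many tokens to replenish: leases older than ttl are
// assumed lost because their End() never arrived.
func (t *leaseTable) sweep() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	lost := 0
	for id, started := range t.leases {
		if time.Since(started) > t.ttl {
			delete(t.leases, id)
			lost++
		}
	}
	return lost
}

func main() {
	t := newLeaseTable(50 * time.Millisecond)
	t.admit("flow-1") // this flow will "forget" to call end()
	t.admit("flow-2")
	t.end("flow-2") // normal completion
	time.Sleep(100 * time.Millisecond)
	fmt.Println("tokens to replenish:", t.sweep()) // 1
}
```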

However, in most cases (including hard concurrency-limit scenarios), Aperture's current mechanism works, provided we can reliably detect overload from some health metric(s). At a high level, there are two parts to adaptive load scheduling:

  1. Overload detection: Health signals (latency, concurrent connections, queue-depth metrics) can be used to detect overloads, optionally along with confirmatory signals (such as a CPU metric) to reduce false positives. I am not aware of your exact setup, but we can very likely find metrics that reliably detect the overload buildup. Based on the severity of the overload, the Aperture Controller adjusts the token-bucket fill rate (a sketch of this feedback loop follows the list). We have found that, using latency as the overload-detection signal in a hard-concurrency-limit scenario (in our playground), Aperture can adaptively adjust the request rate to "discover" the concurrency limit without exact bookkeeping (see attached graph). In the same spirit, we can swap out the latency-based feedback for some other signal and adjust the request rate to match the inherent concurrency limit of your service.

  2. Scheduling workloads (prioritization): If your workloads have high response-time variance, you can switch off the latency-based token estimation algorithm and use priority levels and/or hard-coded workload tokens to schedule requests (see the scheduling sketch below). The automatic token estimation determines the "size" of each request relative to other workloads based on their response times. E.g., a low-priority but lightweight request will have a better chance of getting scheduled than another low-priority but heavier request.
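
On item 1, a minimal sketch of the feedback idea; the constants are made up for illustration, not the actual controller parameters:

```go
package main

import "fmt"

// adjustFillRate sketches the overload feedback loop from item 1: compare a
// health signal (here, smoothed latency) against a setpoint and scale the
// token-bucket fill rate by the ratio, clamped so the loop sheds load
// quickly but recovers gradually. Constants are illustrative, not
// Aperture's actual controller parameters.
func adjustFillRate(fillRate, latency, setpoint float64) float64 {
	gradient := setpoint / latency // < 1 under overload, > 1 when healthy
	if gradient < 0.5 {
		gradient = 0.5 // cap how hard one step can cut the rate
	}
	if gradient > 1.05 {
		gradient = 1.05 // ramp back up slowly to avoid oscillation
	}
	return fillRate * gradient
}

func main() {
	rate := 100.0
	for _, latencyMs := range []float64{200, 400, 800, 250, 180} {
		rate = adjustFillRate(rate, latencyMs, 250) // setpoint: 250 ms
		fmt.Printf("latency %.0fms -> fill rate %.1f\n", latencyMs, rate)
	}
}
```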
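
And on item 2, a sketch of how priorities and hard-coded tokens could interact in scheduling order; this mirrors the intent described above, not the scheduler's actual internals:

```go
package main

import (
	"fmt"
	"sort"
)

// request carries a hard-coded token cost and a priority, as in item 2.
// The queue is ordered by tokens/priority, so a lightweight low-priority
// request can still be scheduled ahead of a heavier one at the same
// priority. Illustrative only, not Aperture's scheduler internals.
type request struct {
	name     string
	tokens   float64 // hard-coded workload cost
	priority float64 // higher means more important
}

func scheduleOrder(queue []request) []request {
	sort.SliceStable(queue, func(i, j int) bool {
		return queue[i].tokens/queue[i].priority <
			queue[j].tokens/queue[j].priority
	})
	return queue
}

func main() {
	queue := []request{
		{"low-pri-heavy", 10, 1},
		{"low-pri-light", 1, 1},
		{"high-pri-heavy", 10, 5},
	}
	for _, r := range scheduleOrder(queue) {
		fmt.Println(r.name) // low-pri-light, high-pri-heavy, low-pri-heavy
	}
}
```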

We would be happy to learn more about your scenario and be more prescriptive with our advice. We can certainly build exact concurrency bookkeeping if that is what your scenario needs. But first, we are curious how the current request-rate-based system would behave in this low-concurrency, high-latency-variance scenario when using metrics other than latency to detect overloads.

PS: @tanveergill suggested that, if no other signal can alert us to overloads, we can perhaps use latency percentiles to get stable readings. The Aperture FluxMeters collect latency metrics as Prometheus summaries.
