YARP has a higher cpu usage than Nginx #2427

Open
doddgu opened this issue Mar 4, 2024 · 13 comments

@doddgu

doddgu commented Mar 4, 2024

Sorry, I don't know if it is a bug.

Describe the bug

I deployed 3 Nginx instances in Hong Kong and 3 YARP instances in Hangzhou.

Client -> Nginx -> YARP -> Service

Nginx forwards traffic for several services, and YARP forwards one of them.

Nginx CPU
image

YARP CPU
image

YARP other metrics
image

Htop (Cat.Service.dll is based on YARP)
image

I tried to analyze the CPU usage in Visual Studio.
Top functions
image

Module View
image

To Reproduce

No exception.

Further technical details

  • Include the version of the packages you are using
    2.1.0
  • The platform (Linux/macOS/Windows)
    Linux

All machines have 4 cores and 8 GB of RAM. YARP runs on Ubuntu 22.04 and Nginx runs on CentOS.
YARP 2.1.0 runs on .NET 8.

@doddgu doddgu added the Type: Bug Something isn't working label Mar 4, 2024
@Tratcher
Member

Tratcher commented Mar 4, 2024

How does the load / RPS compare?

@doddgu
Author

doddgu commented Mar 5, 2024

How does the load / RPS compare?

Each YARP instance handles almost 4,000 RPS.
image

@doddgu doddgu changed the title Yapr has a higher cpu usage than Nginx Yarp has a higher cpu usage than Nginx Mar 5, 2024
@doddgu doddgu changed the title Yarp has a higher cpu usage than Nginx YARP has a higher cpu usage than Nginx Mar 5, 2024
@doddgu
Author

doddgu commented Mar 5, 2024

I loaded the PDB symbols.
I found that the threads spend their time in the WorkerThreadStart method, where the call to Thread.CurrentThread.SetThreadPoolWorkerThreadName() takes up a lot of CPU.

I don't know why WorkerThreadStart has to be called so many times.

image

image

image

@doddgu
Author

doddgu commented Mar 6, 2024

I profiled against the YARP source code and found that YARP itself does not account for the high CPU usage.

image

@doddgu
Author

doddgu commented Mar 8, 2024

Hi @MihaZupan , any news?

@doddgu
Author

doddgu commented Mar 11, 2024

Is this related to dotnet/runtime#70098?
I see there is a PR to fix it.

@MihaZupan MihaZupan added this to the Backlog milestone Apr 9, 2024
@doddgu
Author

doddgu commented Aug 12, 2024

@MihaZupan hi, is there any news?
In my case, I have a service that handles 120,000 QPS. It needs only 3 Nginx instances but 40 YARP instances, which is a real problem for us.
I tried .NET 9 and saw a performance improvement of about 20%, but the gap is still large.
Are there any temporary workarounds we could try? I'm happy to test them.
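
For scale, the per-instance load implied by these figures can be worked out with a few lines of shell arithmetic. This is only a rough sketch: it assumes traffic is spread evenly across instances, which is not stated above, and the variable names are illustrative.

```shell
# Figures taken from the comment above; even load distribution is an assumption.
total_qps=120000
nginx_instances=3
yarp_instances=40

# Integer division gives the approximate load per instance.
nginx_per=$((total_qps / nginx_instances))
yarp_per=$((total_qps / yarp_instances))

echo "nginx: ${nginx_per} qps per instance"
echo "yarp: ${yarp_per} qps per instance"
echo "ratio: about $((nginx_per / yarp_per))x"
```

Under that assumption each Nginx instance carries about 40,000 QPS and each YARP instance about 3,000 QPS, a gap of roughly 13x per instance.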

@zhenlei520

The performance gap is very obvious. Is there any room for improvement?

@zhenlei520

How does the load / RPS compare?

Is there any news about this issue?
Over the past few days we observed that the proxy comes under heavy pressure whenever the response time of downstream services fluctuates. Put simply, requests that normally need about 100 threads need more threads when the downstream slows down. Work piles up, and the thread pool quickly starts more threads to handle it. This rapid change in thread count over a short period causes obvious CPU fluctuations, and once the downstream stabilizes, threads that have been idle for a long time are destroyed. As a result, downstream fluctuations have a large impact on the proxy. Setting the minimum number of threads does not prevent the thread pool from reclaiming threads later; it only allows more threads to be started quickly. We would like to keep these threads alive permanently so that frequent thread startups do not cause large CPU fluctuations.

 ThreadPool.SetMinThreads(500, 500);

@Tratcher @MihaZupan

@zhenlei520

image
image
image

@doddgu
Author

doddgu commented Sep 10, 2024

We upgraded from .NET 8 to a .NET 9 preview and set some environment variables.

The most obvious improvement in .NET 9 is that memory usage is halved.

DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT = 500
DOTNET_ThreadPool_UnfairSemaphoreSpinLimit = 0
DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS = 1
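
As a sketch of how these settings might be applied on Linux, they can be exported in the shell that starts the process. The values are the ones listed above. The launch command is an assumption based on the Cat.Service.dll name from the htop screenshot, so it is left commented out.

```shell
# Values copied from the settings listed above; tune them for your own workload.
export DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT=500
export DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0
export DOTNET_SYSTEM_NET_SOCKETS_INLINE_COMPLETIONS=1

# Show what a child process started from this shell will inherit.
env | grep '^DOTNET_' | sort

# Hypothetical launch command, not taken from the thread:
# dotnet Cat.Service.dll
```

Shell syntax does not allow spaces around `=`. The variables need to be set before the process starts, since they are part of the environment it inherits at launch.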

Overall, it does consume less CPU (around 30% less), and there are no longer minute-long blockages causing widespread timeouts when Current Requests suddenly increases. However, a small fraction of requests still time out, and CPU fluctuations have become very frequent. We traced this and found that the downstream service responds quickly and that the occasional timeouts are caused by YARP, but because the QPS is relatively high, these timeouts are not visible on the dashboard. We have another upstream service that is particularly sensitive to abnormal requests, and there we see these low-probability timeouts occurring very frequently.

First, let's look at the performance of YARP, which has indeed improved.
image

These are abnormal requests detected upstream, all of which are SocketExceptions.
image

In summary: Setting thread-related parameters can reduce CPU usage but will introduce more instability, and there is still a significant gap compared to Nginx.

@zhenlei520

Later we made some adjustments to the configuration:

<PropertyGroup>
  <TargetFramework>net9.0</TargetFramework>
  <GarbageCollectionAdaptationMode>0</GarbageCollectionAdaptationMode>
</PropertyGroup>

Environment variables

DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0

After turning off spinning, CPU performance improved by nearly 40%, which is indeed a big improvement. According to the available information, however, this setting can affect QPS. We have not yet added tracing, so the impact on QPS is not yet known. From the perspective of upstream requests, the average response time is not greatly affected.

image

However, compared with Nginx, YARP still has a lot of room for improvement. We hope to use it instead of other reverse proxy products.

@doddgu
Author

doddgu commented Sep 18, 2024

@Tratcher @MihaZupan
Sorry, I have to ask for your help again. Because CPU usage still does not meet our expectations, we may have to choose another reverse proxy. This is a tough decision, as we are all .NET developers and had high hopes for YARP. We do not strictly require YARP to match the performance figures of Nginx. However, if three Nginx servers can handle all the traffic stably, I cannot convince our team to choose YARP, which needs more than 40 servers to run stably. I hope the .NET team can see this message and respond to us. Our biggest uncertainty right now is not knowing when this will be resolved; even just knowing that it has been prioritized would be very helpful. Thank you.
