Low RPS with lots of response data + chunking #572
Also, could you share a repo/gist (client)? I'd be interested to try it.
After talking to @rynowak, there are two major suggestions. One is to allocate less; I have a commit that avoids an array allocation in …. The second suggestion is to delay the actual chunking until right before we make the call to …. For these reasons, I think this should be done post-RTM.
I want to add to the discussion that Razor is always going to do chunking. We probably need to get a sense of how big a page results in degraded performance, and just how degraded that is.
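For context on why chunking adds cost at all: HTTP/1.1 chunked transfer coding frames every chunk with a hex length line and CRLFs, plus a terminating zero-length chunk. A minimal sketch of that framing in Python (illustrative only; Kestrel's actual implementation is C#):

```python
def chunk_frame(data: bytes) -> bytes:
    """Frame one chunk per HTTP/1.1 chunked transfer coding:
    hex size, CRLF, payload, CRLF."""
    return b"%x\r\n" % len(data) + data + b"\r\n"

def chunked_body(payload: bytes, chunk_size: int = 1024) -> bytes:
    """Split a payload into fixed-size chunks plus the terminating 0-chunk."""
    out = bytearray()
    for i in range(0, len(payload), chunk_size):
        out += chunk_frame(payload[i:i + chunk_size])
    out += b"0\r\n\r\n"  # zero-length last-chunk ends the body
    return bytes(out)

# A 1024-byte chunk costs 7 extra bytes of framing ("400\r\n" + "\r\n"),
# but the bigger cost is one write per chunk rather than one per response.
body = chunked_body(b"x" * 2048, chunk_size=1024)
```

The framing bytes are cheap; it's the per-chunk writes (and, as discussed below, whether they are awaited) that dominate.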
Using a variation of the synthetic with a wrk -> Windows setup, I'm seeing a lot of blockage on the sync …. Think it's resolvable though; will try to have a fix before the end of the weekend.
Think I've a fix for this; will see if there are any other areas I can tweak. But wow! With the change the RPS is only up to 4,573, which I still didn't think was that impressive; but then I looked at the data rate, and I've never seen Kestrel go so high! It's outputting 11.6 Gbps.
Though that's at 32% CPU, so just over 5 cores of a 16-core machine to do it.
Testing other connection counts: 16 connections is 112 rps (33.23 MB/s, or 265.84 Mbit/s), so 500 rps is over 1 Gbit/s!
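The arithmetic behind that claim checks out; a quick sanity check using only the figures reported above:

```python
mb_per_s = 33.23   # reported throughput at 16 connections
rps = 112          # reported requests/sec at 16 connections

mbit_per_s = mb_per_s * 8
assert abs(mbit_per_s - 265.84) < 0.01   # matches the reported 265.84 Mbit/s

per_request_mb = mb_per_s / rps          # ~0.30 MB of response data per request
at_500_rps_mbit = 500 * per_request_mb * 8
print(round(at_500_rps_mbit))            # ~1187 Mbit/s, i.e. over 1 Gbit/s
```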
Next hotspot is CopyFrom; adding #585 gives higher peaks, but CPU drops are also at play (GC). The two changes give an additional 123 rps (4,573 -> 4,694), though that is another 40 Mbit/s. I have a feeling the network is saturated at this point.
Then it looks like we are back to the usual suspects, which are post-RTM? (The three combined are https://github.com/benaadams/KestrelHttpServer/tree/chunking )
Running a longer 5-minute test to be sure; turns out better: peaking at 12.6 Gbps, average range 11.5 - 12.4 Gbps; 2 very brief GC drops over the 5 minutes.
A single connection is interesting, as its bandwidth is quite variable: 17 Mbps - 62 Mbps; there might be more optimal chunk sizes than 1024 B. Though it does use 0% CPU, apparently.
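One way to reason about chunk size: each chunk carries a fixed framing cost (hex length line plus two CRLFs), so smaller chunks mean proportionally more overhead bytes and, more importantly, more writes per response. A rough estimate of the framing overhead alone (illustrative arithmetic, not a Kestrel measurement):

```python
def framing_overhead(total_bytes: int, chunk_size: int) -> float:
    """Fraction of extra bytes added by chunked framing, assuming
    full chunks: hex-size digits + 4 bytes of CRLFs per chunk."""
    chunks = -(-total_bytes // chunk_size)   # ceiling division
    per_chunk = len(f"{chunk_size:x}") + 4   # e.g. "400" + \r\n ... \r\n
    return chunks * per_chunk / total_bytes

# Overhead for a 1 MiB response at various chunk sizes
for size in (256, 1024, 16384):
    print(size, f"{framing_overhead(1 << 20, size):.4%}")
```

Even at 1024 B the byte overhead is under 1%, which suggests the variable single-connection bandwidth is more about write frequency and scheduling than wire bytes.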
@benaadams How did you modify the plaintext benchmark to do chunking? What were your wrk numbers before the change? |
See #589 (comment). Basically, before, it died... as it got caught up in the chunk writing going sync - I think that might have been the delay added by the network vs loopback? Once the chunking also went async it was fine - that was the main change for this.
Might also have been the effect of running 1024 connections and them all flipping sync.
Yikes. I now see that this was the key change. I knew we weren't awaiting the suffix, but I didn't think we were blocking either. I guess you wouldn't notice until you surpassed the write-behind buffer. Good catch!
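The failure mode described here, where an un-awaited chunk write silently turns into a blocking one once the write-behind buffer fills, can be sketched generically. This is a Python asyncio analogue with hypothetical names, not Kestrel's C# code; the point is that every chunk write, including the terminating suffix, must be awaited:

```python
import asyncio

class ChunkedWriter:
    """Toy chunked-response writer: every write is awaited so that
    back-pressure pauses the coroutine instead of blocking a thread."""

    def __init__(self, writer: asyncio.StreamWriter):
        self._w = writer

    async def write_chunk(self, data: bytes) -> None:
        self._w.write(b"%x\r\n" % len(data) + data + b"\r\n")
        # drain() completes only when the transport buffer has room again;
        # skipping this await is the bug pattern discussed above
        await self._w.drain()

    async def finish(self) -> None:
        self._w.write(b"0\r\n\r\n")  # terminating chunk: await this too
        await self._w.drain()
```

With small responses you never fill the buffer, so the missing await is invisible; past the write-behind limit, the sync fallback stalls the whole connection.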
@benaadams's change has now been merged: d3d9c8d. I think this was the major cause of the low RPS with lots of response data + chunking. @rynowak, reopen this if you think there is more that needs to be done.
I'm only able to get about 500-550 rps using the following middleware over the loopback interface. I can get 1200+ with a Content-Length set.
Note that this is intentionally exceeding the write-behind buffer and going through the chunked path. This only gets the CPU to ~15%.
This is a synthetic version of a Razor benchmark I've been trying to improve.
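The two paths being compared can be sketched side by side: with a Content-Length the server frames the body once and can write it in large contiguous pieces, while without one it must fall back to chunked framing with many small writes. A rough Python illustration of the wire difference (hypothetical helper, not the actual middleware, which was C#):

```python
def response_bytes(payload: bytes, use_content_length: bool,
                   chunk_size: int = 1024) -> bytes:
    """Build an HTTP/1.1 response two ways: Content-Length (one
    contiguous body) vs Transfer-Encoding: chunked (framed pieces)."""
    if use_content_length:
        head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(payload)
        return head + payload
    head = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
    body = b"".join(
        b"%x\r\n%s\r\n" % (len(payload[i:i + chunk_size]),
                           payload[i:i + chunk_size])
        for i in range(0, len(payload), chunk_size)
    ) + b"0\r\n\r\n"
    return head + body

fixed = response_bytes(b"x" * (1 << 20), True)
chunked = response_bytes(b"x" * (1 << 20), False)
# chunked is slightly larger on the wire, but the real cost is
# issuing one write per chunk instead of one per response
```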