Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve TLS perf on macOS #32338

Merged
merged 8 commits into from
Apr 16, 2020
Merged

improve TLS perf on macOS #32338

merged 8 commits into from
Apr 16, 2020

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Feb 14, 2020

Thos removes old data processing and uses ArrayBuffers instead. I think the lock is not needed, but keeping it in place has a negligible impact so it seems safer. I may do more testing with upcoming locking changes to support TLS13.
The biggest winner (no surprise) is LargeBuffer test with 189us -> 35us.

Old Code:

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
HandshakeAsync 91,433.62 us 3,873.713 us 4,460.974 us 89,684.15 us 86,362.65 us 101,266.67 us - - - 44946 B
TLS12HandshakeECDSA256CertAsync 18,295.40 us 2,111.528 us 2,259.312 us 17,413.53 us 15,792.05 us 23,607.42 us - - - 36128 B
TLS12HandshakeRSA1024CertAsync 13,471.60 us 5,008.974 us 5,143.848 us 12,002.27 us 9,580.23 us 30,215.76 us - - - 37448 B
TLS12HandshakeRSA2048CertAsync 22,991.82 us 1,834.026 us 1,962.387 us 22,450.63 us 20,409.81 us 27,256.41 us - - - 42136 B
TLS12HandshakeRSA4096CertAsync 144,988.34 us 24,471.431 us 28,181.336 us 142,633.76 us 103,928.26 us 203,175.22 us - - - 47280 B
WriteReadAsync 32.20 us 3.944 us 3.873 us 30.69 us 28.16 us 40.77 us 0.0800 - - 300 B
ReadWriteAsync 45.66 us 12.723 us 14.651 us 39.42 us 32.99 us 78.06 us 0.1000 - - 360 B
ConcurrentReadWrite 41.34 us 2.887 us 3.325 us 40.64 us 37.01 us 47.14 us 0.1600 - - 526 B
ConcurrentReadWriteLargeBuffer 189.07 us 20.563 us 22.002 us 187.04 us 164.12 us 252.14 us 0.1000 - - 539 B

This change:

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
HandshakeAsync 85,647.06 us 1,671.304 us 1,788.276 us 85,147.41 us 83,299.81 us 89,488.52 us - - - 105024 B
TLS12HandshakeECDSA256CertAsync 18,209.13 us 1,428.538 us 1,587.816 us 18,156.97 us 15,652.98 us 21,043.62 us - - - 105184 B
TLS12HandshakeRSA1024CertAsync 9,549.63 us 258.889 us 277.008 us 9,456.88 us 9,254.65 us 10,367.09 us - - - 106408 B
TLS12HandshakeRSA2048CertAsync 20,166.37 us 382.840 us 358.109 us 20,037.48 us 19,836.90 us 20,962.80 us - - - 107985 B
TLS12HandshakeRSA4096CertAsync 86,057.27 us 1,293.336 us 1,079.994 us 86,264.44 us 84,164.35 us 87,723.55 us - - - 111048 B
WriteReadAsync 20.81 us 0.306 us 0.286 us 20.74 us 20.37 us 21.41 us 0.1000 - - 351 B
ReadWriteAsync 24.26 us 0.449 us 0.375 us 24.15 us 23.82 us 25.18 us 0.1000 - - 360 B
ConcurrentReadWrite 30.00 us 0.350 us 0.327 us 29.93 us 29.33 us 30.48 us 0.1600 - - 526 B
ConcurrentReadWriteLargeBuffer 34.05 us 0.889 us 0.988 us 33.91 us 32.32 us 36.05 us 0.1000 - - 529 B

@wfurt wfurt requested review from stephentoub, bartonjs and a team February 14, 2020 23:17
@wfurt wfurt self-assigned this Feb 14, 2020
Copy link
Contributor

@scalablecory scalablecory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Small notes.

@scalablecory
Copy link
Contributor

It looks like there is more optimization opportunity here if you bypass the ArrayBuffer and copy bytes directly into a user's buffer when there is a pending ReadAsync.

@wfurt wfurt requested a review from scalablecory April 9, 2020 03:35
@wfurt
Copy link
Member Author

wfurt commented Apr 9, 2020

With #32925 in place, handshake, EncryptMessage and DecyptMessage are all mutually exclusive. The way how it flows is that managed code read TLS frame and than we put it to _inputBuffer using context.Write(). Then we call OS crypto function and that consumes the data as needed. When data needs to be sent out, WriteToConnection() puts it to _outputBuffer and when the call returns we collect that using ReadPendingWrites(). In this whole process there is no concurrency so the locks are not needed any more.
As far as the dispose, disposing ArrayBuffer buffer does not release underlying buffers, it just discards pending data. If this happens during read/write that should be ok as we are aborting and in inconsistent state anyway.

This should be ready for final reviews.

@@ -14,11 +14,12 @@ namespace System.Net
{
internal sealed class SafeDeleteSslContext : SafeDeleteContext
{
private const int InitialBufferSize = 2048;
Copy link
Member

@stephentoub stephentoub Apr 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often are these SafeDeleteSslContexts created? 4K seems fairly large (with two of these 2K sizes allocated below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are created once for each SslStream. TLS frame can be up to 16k but I wanted to fit at least typical handshake. Some other parts for SslStream would release the buffer every time when remaining data drop to zero. I can possibly do that as part of this change or as follow-up. With OpenSSL we simply write data to BIO and magic happens. But I don';t think it shrinks the buffers - I can take closer look as well.

@wfurt wfurt merged commit b44492d into dotnet:master Apr 16, 2020
@wfurt wfurt deleted the sslMac branch April 16, 2020 04:54
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants