Privately expose MsQuic performance counters #98422

Merged: rzikm merged 10 commits into dotnet:main on Feb 22, 2024

@rzikm (Member) commented Feb 14, 2024

Contributes to #55979.

The main problem is that we can only get the MsQuic perf counters in bulk, so any observable (polling) metrics/counters API would have to fetch the entire array (32 items) for each separate counter, and there is no reasonable way to extend the existing API without public API changes.

Since this is for internal use only, I capped the refresh rate of the background counters at once every 50 ms. Current monitoring tools usually poll on the order of seconds, so 50 ms is short enough that no monitoring tool is confused by stale values, and long enough that all counters requested in one poll are read from the same snapshot before the next refresh.
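The throttling idea can be sketched as a cache around the bulk query: every per-counter callback reads from one shared snapshot, and the snapshot is refreshed only when it is older than 50 ms. This is a minimal Python sketch, not the PR's C# code; `fetch_all` is a hypothetical stand-in for the native bulk read of the whole counter array, and the lock is there because several metric listeners may invoke callbacks concurrently.

```python
import threading
import time
from typing import Callable, List, Optional


class ThrottledCounterSnapshot:
    """Caches one bulk read of all perf counters and refreshes it at most once per interval."""

    def __init__(self, fetch_all: Callable[[], List[int]],
                 min_interval: float = 0.050,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all          # hypothetical bulk query returning the whole array
        self._min_interval = min_interval    # 50 ms cap on the refresh rate
        self._clock = clock
        self._lock = threading.Lock()
        self._values: List[int] = []
        self._last_refresh: Optional[float] = None

    def get(self, index: int) -> int:
        """Return one counter, re-fetching the whole array only if the cached copy is stale."""
        with self._lock:
            now = self._clock()
            if self._last_refresh is None or now - self._last_refresh >= self._min_interval:
                self._values = self._fetch_all()
                self._last_refresh = now
            return self._values[index]
```

With this shape, a tool that polls all 32 instruments within one 50 ms window triggers a single bulk read instead of 32.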

Example output when monitoring a functional test run:

dotnet counters monitor Private.InternalDiagnostics.System.Net.Quic.MsQuic -p (gps dotnet | ? { $_.Path -like '*testhost*' }).Id
Press p to pause, r to resume, q to quit.
    Status: Running

[Private.InternalDiagnostics.System.Net.Quic.MsQuic]
    msquic.app.recv_bytes (By / 1 sec)                               594,652
    msquic.app.send_bytes (By / 1 sec)                             1,056,986
    msquic.connection.allocated ({connection})                             2
    msquic.connection.app_rejected ({connection} / 1 sec)                  0
    msquic.connection.connected ({connection})                             2
    msquic.connection.created ({connection} / 1 sec)                      48
    msquic.connection.handshake_failures ({connection} / 1 sec)            0   
    msquic.connection.no_alpn ({connection} / 1 sec)                       0
    msquic.connection.protocol_errors ({connection} / 1 sec)               0
    msquic.connection.resumed ({connection} / 1 sec)                       0
    msquic.datapath.path_failure ({challenge} / 1 sec)                     0
    msquic.datapath.path_validated ({challenge} / 1 sec)                   0
    msquic.datapath.send_stateless_reset ({packet} / 1 sec)                0
    msquic.datapath.send_stateless_retry ({packet} / 1 sec)                0
    msquic.packet.decryption_failures ({packet} / 1 sec)                   0
    msquic.packet.dropped ({packet} / 1 sec)                               0
    msquic.packet.suspected_lost ({packet} / 1 sec)                        0
    msquic.stream.allocated ({stream})                                     1
    msquic.threadpool.conn_oper_completed ({operation} / 1 sec)        1,540
    msquic.threadpool.conn_oper_queue_depth ({operation})                  0
    msquic.threadpool.conn_oper_queued ({operation} / 1 sec)           1,540
    msquic.threadpool.conn_queue_depth ({connection})                      0
    msquic.threadpool.work_oper_completed ({operation} / 1 sec)            0
    msquic.threadpool.work_oper_queue_depth ({operation})                  0
    msquic.threadpool.work_oper_queued ({operation} / 1 sec)               0
    msquic.udp.recv_bytes (By / 1 sec)                               984,227
    msquic.udp.recv_datagrams ({datagram} / 1 sec)                     1,006
    msquic.udp.recv_events ({event} / 1 sec)                           1,006
    msquic.udp.send_bytes (By / 1 sec)                               984,227
    msquic.udp.send_calls ({call} / 1 sec)                               596
    msquic.udp.send_datagrams ({datagram} / 1 sec)                     1,117
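In the listing above, instruments shown with "/ 1 sec" are cumulative counters that dotnet-counters displays as a rate (with its default 1-second refresh), while the rest report a current value. The sketch below illustrates how one bulk snapshot keyed by MsQuic counter name could fan out into separately named instruments. The pairing of enum names (from the older listing further down) with instrument names is inferred by matching the two listings and is a hypothetical subset, not copied from the PR's source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InstrumentInfo:
    name: str         # instrument name as displayed by dotnet-counters
    unit: str         # unit shown in parentheses
    cumulative: bool  # True: monotonic counter shown as a rate; False: current value


# Hypothetical subset of the mapping, inferred from the two listings in this PR.
INSTRUMENTS: Dict[str, InstrumentInfo] = {
    "APP_RECV_BYTES": InstrumentInfo("msquic.app.recv_bytes", "By", True),
    "CONN_CREATED": InstrumentInfo("msquic.connection.created", "{connection}", True),
    "CONN_CONNECTED": InstrumentInfo("msquic.connection.connected", "{connection}", False),
    "CONN_QUEUE_DEPTH": InstrumentInfo("msquic.threadpool.conn_queue_depth", "{connection}", False),
    "UDP_RECV": InstrumentInfo("msquic.udp.recv_datagrams", "{datagram}", True),
    "UDP_SEND_BYTES": InstrumentInfo("msquic.udp.send_bytes", "By", True),
}


def fan_out(snapshot: Dict[str, int]) -> List[Tuple[str, int]]:
    """Turn one bulk snapshot into one (instrument name, value) pair per instrument."""
    return [(info.name, snapshot.get(key, 0)) for key, info in INSTRUMENTS.items()]


def display_label(info: InstrumentInfo) -> str:
    """Mimic the dotnet-counters label: rates get '/ 1 sec', current values do not."""
    if info.cumulative:
        return f"{info.name} ({info.unit} / 1 sec)"
    return f"{info.name} ({info.unit})"
```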

This PR also adds a dump of selected metrics after the QUIC functional tests run.

@ghost commented Feb 14, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

I think using the new Metrics API is the best middle-ground solution. We can use one ObservableGauge and one ObservableCounter and report multiple counters by tagging each measurement with the perf counter name. While we can't specify a description and unit for each individual counter, I think this is acceptable for internal diagnostics.
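The tagging approach can be sketched as two callbacks over the same snapshot: one yields the current-value counters (for the gauge), the other the cumulative ones (for the counter), each measurement tagged `Name=<counter>`. In .NET this would correspond to the callbacks passed to `Meter.CreateObservableGauge` and `Meter.CreateObservableCounter` returning `Measurement<long>` values with a `Name` tag. The Python below only models the data flow, and the split in `GAUGE_NAMES` is a hypothetical example, not the PR's actual classification.

```python
from typing import Dict, List, Set, Tuple

# (value, tags) pair, mirroring a tagged measurement.
Measurement = Tuple[int, Dict[str, str]]

# Hypothetical set of counters treated as current values (gauge);
# everything else is treated as cumulative (counter).
GAUGE_NAMES: Set[str] = {"CONN_ACTIVE", "CONN_CONNECTED", "CONN_QUEUE_DEPTH", "STRM_ACTIVE"}


def tagged_measurements(snapshot: Dict[str, int], gauges: bool) -> List[Measurement]:
    """Measurements one observable instrument would report, each tagged with Name=<counter>."""
    return [
        (value, {"Name": name})
        for name, value in sorted(snapshot.items())
        if (name in GAUGE_NAMES) == gauges
    ]
```

The trade-off mentioned above shows up directly here: both instruments share one name, one description, and one unit, so per-counter metadata can only travel in the tag.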

Example output when monitoring a functional test run:

dotnet counters monitor Private.InternalDiagnostics.System.Net.Quic -p (gps dotnet | ? { $_.Path -like '*testhost*' }).Id
Press p to pause, r to resume, q to quit.
    Status: Running

[Private.InternalDiagnostics.System.Net.Quic]
    MsQuic
        Name=APP_RECV_BYTES                                       26,368,237
        Name=APP_SEND_BYTES                                       26,368,237
        Name=CONN_ACTIVE                                                   1
        Name=CONN_APP_REJECT                                               0
        Name=CONN_CONNECTED                                                0
        Name=CONN_CREATED                                                 22
        Name=CONN_HANDSHAKE_FAIL                                           2
        Name=CONN_LOAD_REJECT                                              0
        Name=CONN_NO_ALPN                                                  0
        Name=CONN_OPER_COMPLETED                                       3,047
        Name=CONN_OPER_QUEUE_DEPTH                                         0
        Name=CONN_OPER_QUEUED                                          3,046
        Name=CONN_PROTOCOL_ERRORS                                          0
        Name=CONN_QUEUE_DEPTH                                              0
        Name=CONN_RESUMED                                                  0
        Name=PATH_FAILURE                                                  0
        Name=PATH_VALIDATED                                                0
        Name=PKTS_DECRYPTION_FAIL                                          0
        Name=PKTS_DROPPED                                                  0
        Name=PKTS_SUSPECTED_LOST                                         644
        Name=SEND_STATELESS_RESET                                          0
        Name=SEND_STATELESS_RETRY                                          0
        Name=STRM_ACTIVE                                                   0
        Name=UDP_RECV                                                 19,000
        Name=UDP_RECV_BYTES                                       27,227,902
        Name=UDP_RECV_EVENTS                                           1,151
        Name=UDP_SEND                                                 19,690
        Name=UDP_SEND_BYTES                                       28,174,485
        Name=UDP_SEND_CALLS                                            1,168
        Name=WORK_OPER_COMPLETED                                           0
        Name=WORK_OPER_QUEUE_DEPTH                                         0
Author: rzikm
Assignees: -
Labels: area-System.Net.Quic
Milestone: -

@rzikm rzikm requested a review from a team February 14, 2024 14:02
@@ -929,6 +929,7 @@ internal enum QUIC_PERFORMANCE_COUNTERS
     PATH_FAILURE,
     SEND_STATELESS_RESET,
     SEND_STATELESS_RETRY,
+    CONN_LOAD_REJECT,
@rzikm (Member Author) commented:

This is related to microsoft/msquic#4120; we should wait until that one merges.

We don't need to wait for the MsQuic update before merging; the counter will read 0 until a version that supports it is loaded.
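Why the new counter simply reads 0 can be sketched as follows, assuming the managed side passes a zero-initialized buffer sized for its own enum and an older native MsQuic build writes only the counters it knows about. `query_counters` and `native_count` are hypothetical, and the ordinal values are placeholders for the tail of the enum in the diff hunk above, not MsQuic's real indices.

```python
from enum import IntEnum
from typing import List


class PerfCounter(IntEnum):
    # Placeholder ordinals for the enum tail shown in the diff hunk.
    PATH_FAILURE = 0
    SEND_STATELESS_RESET = 1
    SEND_STATELESS_RETRY = 2
    CONN_LOAD_REJECT = 3  # newly added; older native builds never write this slot


def query_counters(native_count: int) -> List[int]:
    """Simulate a bulk query into a zero-initialized buffer sized for the managed enum.

    native_count models how many counters the loaded native build supports;
    only that many leading slots receive (fake) values.
    """
    buffer = [0] * len(PerfCounter)
    for i in range(min(native_count, len(buffer))):
        buffer[i] = 100 + i  # fake values written by the "native" side
    return buffer
```

Under this assumption, running against an older MsQuic is harmless: the extra slot stays at its initial 0 rather than holding garbage.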

@ManickaP (Member) left a comment:

In general LGTM, thanks.

@danmoseley (Member) commented:

Cc @JamesNK in case this is relevant.

@ManickaP (Member) left a comment:

Thank you!

@rzikm (Member Author) commented Feb 21, 2024

/azp run runtime-libraries-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

@rzikm (Member Author) commented Feb 22, 2024

CI failures are unrelated.

@rzikm rzikm merged commit dcc66a7 into dotnet:main Feb 22, 2024
108 of 119 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2024
@karelz karelz added this to the 9.0.0 milestone May 14, 2024
5 participants