Improve parser performance by reducing tracing overhead #10533

Merged
lhecker merged 3 commits into main from dev/lhecker/perf-vt-tracing
Jun 30, 2021

Conversation

lhecker
Member

@lhecker lhecker commented Jun 29, 2021

Under Microsoft's x64 calling convention, a structure larger than a
register (8 bytes) is not passed in registers: the caller copies it to
its own stack and passes a pointer to that copy, which is expensive on
a hot path. A string-view is such a structure. We could reduce the
overhead by passing the string-view by reference, but this forces us
to allocate the parameters as static string-views in the data
segment of our binary. I've found that passing them as classic
C-strings is more ergonomic and fits the need for
high performance in this particular code.
This improves performance for VT-heavy output by 15-20%.
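
As a rough illustration of the trade-off described above, here is a minimal sketch in portable C++. The names `TraceSink`, `TraceStateByView`, and `TraceStateByCStr` are hypothetical and are not the functions touched by this PR; the sketch only contrasts a by-value string-view parameter (two machine words on common 64-bit standard libraries) with a plain C-string pointer (one machine word).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for a tracing sink; not the real tracing API.
struct TraceSink
{
    std::size_t calls = 0;
    std::size_t bytes = 0;
};

// A string_view carries a pointer and a length, so it is wider than one register.
static_assert(sizeof(std::string_view) > sizeof(void*), "string_view is wider than a pointer");

// "Before" shape: the view is passed by value. On the Microsoft x64 ABI the
// caller has to materialize a copy in memory and pass its address.
void TraceStateByView(TraceSink& sink, std::string_view name)
{
    sink.calls++;
    sink.bytes += name.size();
}

// "After" shape: a single pointer fits in one register.
// NOTE: the raw pointer is deliberate (performance). It is expected to point
// at a NUL-terminated string literal, so lifetime is not a concern here.
void TraceStateByCStr(TraceSink& sink, const char* name)
{
    assert(name != nullptr);
    sink.calls++;
    sink.bytes += std::strlen(name);
}
```

Both functions record the same information; only the parameter-passing cost differs, and that difference only matters when the call sits on a very hot path.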

PR Checklist

  • I work here
  • Tests added/passed

@lhecker lhecker added Area-Performance Performance-related issue Product-Conpty For console issues specifically related to conpty labels Jun 29, 2021
@lhecker lhecker requested a review from miniksa June 29, 2021 22:22
@lhecker
Member Author

lhecker commented Jun 29, 2021

I believe it might be worthwhile to re-evaluate in the future whether we still need such detailed tracing profiles.
An alternative that I previously suggested was to post only one trace per fully parsed VT sequence instead. This would make ETW tracing more expensive, but potentially remove tracing from our hot path entirely (for instance StateMachine::_EventCsiParam).
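
To make the "one trace per fully parsed VT sequence" idea concrete, here is a minimal sketch. `TraceLog`, `CsiTracer`, and `FeedCsi` are hypothetical names invented for illustration, and the sketch assumes the parser can hand over the sequence body at dispatch time; it is not how the actual state machine is structured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical trace sink that just collects emitted lines.
struct TraceLog
{
    std::vector<std::string> lines;
};

// Hypothetical tracer: per-character events only buffer, and a single
// trace line is emitted when the sequence is dispatched.
class CsiTracer
{
public:
    explicit CsiTracer(TraceLog& log) : _log(log) {}

    // Hot path: remember the character, emit nothing.
    void OnChar(char ch) { _pending.push_back(ch); }

    // Called once per sequence, when the final byte arrives.
    void OnDispatch(char finalByte)
    {
        _pending.push_back(finalByte);
        _log.lines.push_back("CSI " + _pending);
        _pending.clear();
    }

private:
    TraceLog& _log;
    std::string _pending;
};

// Feeds the body of one CSI sequence (everything after "ESC [") to the tracer.
void FeedCsi(CsiTracer& tracer, const std::string& body)
{
    assert(!body.empty());
    for (std::size_t i = 0; i + 1 < body.size(); ++i)
    {
        tracer.OnChar(body[i]);
    }
    tracer.OnDispatch(body.back());
}
```

The single emitted event is larger than any of the per-state events it replaces, which matches the trade-off mentioned above: each trace costs more, but far fewer of them are issued.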

@miniksa
Member

miniksa commented Jun 29, 2021

I insist that you leave a comment on the method headers or bodies explaining that the unsafe-looking pointer is there for perf reasons and is justified, so that no future do-gooder changes it back thinking they're doing something righteous for memory safety.

Also as we discussed offline, I am good with you compressing the csi trace to a string for further perf or even compiling out all of this for release builds as the intricacies of the parser are probably not useful for in-the-wild diagnostics anyway. I'm not sold on total removal or asymmetric removal of only one path's tracing.
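
For the "compile it out for release builds" option, a minimal sketch could look like the following. The `PARSER_TRACING` switch, the `TRACE_PARSER_EVENT` macro, `TraceCounter`, and `ParseDigits` are all hypothetical names used only for illustration; the project's real build flags and tracing macros may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch: a release build would define PARSER_TRACING as 0.
#ifndef PARSER_TRACING
#define PARSER_TRACING 1
#endif

// Hypothetical stand-in for a trace provider.
struct TraceCounter
{
    std::size_t events = 0;
};

#if PARSER_TRACING
// Tracing build: record one event per call site hit.
#define TRACE_PARSER_EVENT(counter, name) \
    do { (void)(name); ++(counter).events; } while (0)
#else
// Release build: expands to an empty statement, arguments are never evaluated.
#define TRACE_PARSER_EVENT(counter, name) \
    do { } while (0)
#endif

// Hypothetical hot-path routine that parses a run of decimal digits.
std::size_t ParseDigits(TraceCounter& counter, const char* text)
{
    assert(text != nullptr);
    std::size_t value = 0;
    for (; *text >= '0' && *text <= '9'; ++text)
    {
        TRACE_PARSER_EVENT(counter, "CsiParam");
        value = value * 10 + static_cast<std::size_t>(*text - '0');
    }
    return value;
}
```

With the switch set to 0 the parser loop contains no tracing code at all, at the cost of losing parser-level diagnostics in shipped binaries.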

Collaborator

@skyline75489 skyline75489 left a comment

Incredible!

@lhecker
Member Author

lhecker commented Jun 29, 2021

@miniksa I'm sorry, I must've misunderstood your message somehow...
But I believe it's useful to continue having ETW support. As such I'd like to go the way of compressing the CSI traces. I might open a PR for this later. This change already improves the tracing performance so much that we can focus on other performance areas for now before we finally circle back to tracing again, in my opinion. 🙂

@miniksa
Member

miniksa commented Jun 29, 2021

> @miniksa I'm sorry, I must've misunderstood your message somehow...
> But I believe it's useful to continue having ETW support. As such I'd like to go the way of compressing the CSI traces. I might open a PR for this later. This change already improves the tracing performance so much that we can focus on other performance areas for now before we finally circle back to tracing again, in my opinion. 🙂

Alright, that works for me. I just thought that yesterday you briefly asked if we could dump the Csi ones completely and leave the rest, and that I didn't like. I do like the compression idea (behind an IsTraceloggingEnabled check on the provider, of course).

And I offered "Def out in release" as an option in case the structure overhead thing turned out to be a major player in the "is ETW on?" test portion of the logging macros. When I looked in the code behind them, the instance variable of the ETW channel is a structure of sorts, so I thought it could have been subject to your "Microsoft is bad perf-wise at passing structs on x64" assertion.
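
The "compress the CSI trace to a string, behind an enabled check" idea could be sketched roughly as follows. `TraceProvider`, `CompressCsi`, and `TraceCsi` are hypothetical names, and `IsEnabled()` merely stands in for whatever enabled check the real provider offers; this is an assumption-laden illustration, not the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical provider: `enabled` models "is a trace session listening?".
struct TraceProvider
{
    bool enabled = false;
    std::size_t formatCalls = 0;
    std::vector<std::string> events;

    bool IsEnabled() const noexcept { return enabled; }
};

// Joins the parameters and the final byte into one compact string, e.g. "1;31m".
std::string CompressCsi(const std::vector<int>& params, char finalByte)
{
    // CSI final bytes lie in the range 0x40..0x7E.
    assert(finalByte >= 0x40 && finalByte <= 0x7E);
    std::string out;
    for (std::size_t i = 0; i < params.size(); ++i)
    {
        if (i != 0)
        {
            out += ';';
        }
        out += std::to_string(params[i]);
    }
    out += finalByte;
    return out;
}

// Emits a single event per sequence, and only pays for formatting when enabled.
void TraceCsi(TraceProvider& provider, const std::vector<int>& params, char finalByte)
{
    if (!provider.IsEnabled())
    {
        return; // cheap early-out keeps the string building off the common path
    }
    provider.formatCalls++;
    provider.events.push_back(CompressCsi(params, finalByte));
}
```

The ordering is the point of the sketch: the enabled check comes first, so the comparatively expensive string building only happens while a session is actually listening.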

@lhecker lhecker added the AutoMerge Marked for automatic merge by the bot when requirements are met label Jun 30, 2021
@ghost

ghost commented Jun 30, 2021

Hello @lhecker!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

Do note that I've been instructed to only help merge pull requests of this repository that have been opened for at least 8 hours, a condition that will be fulfilled in about 6 hours 1 minute. No worries though, I will be back when the time is right! 😉

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@lhecker lhecker merged commit ee32598 into main Jun 30, 2021
@lhecker lhecker deleted the dev/lhecker/perf-vt-tracing branch June 30, 2021 00:28
DHowett pushed a commit that referenced this pull request Jul 7, 2021
(cherry picked from commit ee32598)
@ghost

ghost commented Jul 14, 2021

🎉 Windows Terminal v1.9.1942.0 has been released which incorporates this pull request. 🎉

@ghost

ghost commented Jul 14, 2021

🎉 Windows Terminal Preview v1.10.1933.0 has been released which incorporates this pull request. 🎉

Labels
Area-Performance Performance-related issue AutoMerge Marked for automatic merge by the bot when requirements are met Product-Conpty For console issues specifically related to conpty
4 participants