Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make DispatcherQueueSynchronizationContext faster and zero-alloc #1058

Conversation

Sergio0694
Copy link
Member

Overview

The current DispatcherQueueSynchronizationContext implementation is rather inefficient, for two reasons:

  • It's creating a delegate with closure every single time (!), capturing both the callback and the state
  • There's also additional overhead in general due to how the actual callback is then scheduled internally

Here's the specific line that's troubling:

This type is used extremely frequently in applications, especially for those that eg. rely on portable libraries for things such as MVVM support etc. that also deal with proper dispatching. The way that works is by capturing the synchronization context during setup and then dispatch through that, meaning this Post method will get invoked every time.

The proposed solution

This PR optimizes the Post implementation by making it faster and zero-alloc.
There are no public API changes.
Opening this as draft as I'm not entirely sure of the procedure to add tests for this for this specific repo.
Pinging @jkoritzinsky as we discussed this feature idea previously on Discord as well 😄

I run some benchmarks for this a while back, got some nice numbers:

Method Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated
TryEnqueueWithClosure 7.062 us 0.1353 us 0.1852 us 1.00 0.0687 - - 312 B
TryEnqueueWithState 1.139 us 0.0229 us 0.0676 us 0.15 - - - -

In this scenario dispatching with this system was about 7x faster, and also completely allocation free.
I'd argue it'd be nice to have such a basic building block like this one to be as efficient as possible.

I'm sure it might be a good idea to setup up some new ones in this repo as well, though I'd need guidance on that 😅

Opening as draft for now because of this, can switch this once everyone's happy with coverage and overall setup.

Related issues/PRs

@Sergio0694 Sergio0694 force-pushed the feature/DispatcherQueueSynchronizationContext-opts branch from 4873d69 to 5cf6ccd Compare November 30, 2021 10:30
@manodasanW
Copy link
Member

Thanks @Sergio0694 for this contribution. A ~7x improvement with reduced memory pressure is definitely great to see in something that can be frequently used. The version of this type that you have edited here is actually a copy of the one in WinAppSDK and is here for test purposes. Adding @jeffstall and @ionvasilian who own this type and should be able to consider and update the actual one in WinAppSDK.

I looked over the change and it looks good to me. It basically looks like you use a natively allocated proxy instance to hold onto the callback and state and make it look like a delegate by implementing Invoke and then fetch them later when invoked to make the call. The only thing I wonder is if the delegate proxy should also implement IReferenceTrackerTarget. I am not sure if XAML reference tracking is used for this type or not, but Jeff or Ion might be able to comment on that.

@Sergio0694
Copy link
Member Author

Hi @manodasanW, thank you for taking a look, really glad to hear this is something that could be interesting to have! 🎉
I wasn't aware this type was just a copy and not the real one (whoops!), but then again the main point of this PR was to show a proof of concept, share some numbers and start a conversation about this. Happy to help migrate this if needed! 😄

"I looked over the change and it looks good to me. It basically looks like you use a natively allocated proxy instance to hold onto the callback and state and make it look like a delegate by implementing Invoke and then fetch them later when invoked to make the call."

Yup, that's correct! The underlying WinRT API for IDispatcherQueue::TryEnqueue is just this one:

int TryEnqueue(IDispatcherQueueHandler* callback, byte* result);

Where that callback is this an object inheriting from IDispatcherQueueHandler, which is:

class IDispatcherQueueHandler : IUnknown
{
    int Invoke();
}

So in this PR I'm just creating my own implementation instead of letting CsWinRT create one wrapping that managed delegate/closure (we're essentially defining a C# struct that implements an object derived from IDispatcherQueueHandler). This means I can then also make it blittable and just store a couple GC handles in it to track the managed objects to invoke later (delegate and optional state), without extra allocations. And the Invoke() method which is then invoked by the WinRT runtime when the callback is dispatched simply grabs the target callback and state back and invokes them, and then ensures exceptions are tracked correctly, if any. That IReferenceTrackerTarget interface hasn't come up while investigating this so far (as in, IDispatcherQueueHandler didn't inherit from it, and when testing this and checking with a breakpoint on QueryInterface, that interface was never queried), but for sure, waiting for Jeff to confirm this or suggest any changes 😊

@Sergio0694
Copy link
Member Author

Publishing this PR as the issue raised by @jkotas has been resolved, and there doesn't seem to be other outstanding ones. Chatted with @jkoritzinsky on Discord and he said it seems all obvious concerns have been addressed. @manodasanW would these changes still also need to be merged here to mirror the actual type in WASDK? @jeffstall, @ionvasilian happy to help eventually migrating these changes to the right place in WASDK if these changes are accepted 😊

@Sergio0694 Sergio0694 marked this pull request as ready for review January 11, 2022 21:18
@Sergio0694 Sergio0694 force-pushed the feature/DispatcherQueueSynchronizationContext-opts branch from 3b3fb83 to ff357bb Compare January 11, 2022 21:21
This relies on Roslyn just compiling ReadOnlySpan<byte> arrays directly into a span mapping to .data in the loaded assembly. Those properties will be JITted to a single mov reading from a constant address, and the class will no longer have a static constructor at all
IDispatcherQueue is just a type directly mapping to the native COM object, so the field is really just used to easily map to the vtable on that object
…mpl"

This reverts commit d952216.
This optimization currently works but results in fragile code as it's relying on an implementation detail which could change in the future. This could be reverted again latere on if/when we got explicit support for alignment for RVA fields (or, support for constant unmanaged buffers built-in).
@Sergio0694 Sergio0694 force-pushed the feature/DispatcherQueueSynchronizationContext-opts branch from ff357bb to 2869a87 Compare February 18, 2022 17:36
using WinRT;

#nullable enable
#pragma warning disable CA1416
Copy link

@benwestprog benwestprog Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CA1416

Could you add a comment documenting why this is disabled? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 52dd2dd 👍

{
if (d == null)
#if NET6_0
Copy link

@benwestprog benwestprog Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, could this whole if/else/endif block be

if (d is null)
{
ThrowArgumentNullException(nameof(d));
}

Looks like we get the same code but much easier to read. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 91f5a08 🙂

/// <returns>The <c>HRESULT</c> for the operation.</returns>
/// <remarks>The <paramref name="callback"/> parameter is assumed to be a pointer to an <c>IDispatcherQueueHandler</c> object.</remarks>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public int TryEnqueue(void* callback, byte* result)
Copy link

@benwestprog benwestprog Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte

Can this be bool to match the native definition more closely? #Closed

Copy link
Member Author

@Sergio0694 Sergio0694 Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use bool* in the actual function pointer signature, or it will cause issues on .NET Standard 2.0 (see here). I've changed the parameter to bool* though as request, and just changed it to byte* internally before invoking the API, which is fine. Change done in a71a63d 🙂

#if NET6_0
DispatcherQueueProxyHandler* @this = (DispatcherQueueProxyHandler*)NativeMemory.Alloc((nuint)sizeof(DispatcherQueueProxyHandler));
#else
DispatcherQueueProxyHandler* @this = (DispatcherQueueProxyHandler*)Marshal.AllocHGlobal(sizeof(DispatcherQueueProxyHandler));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of making a distinction here? Do we need two possible implementations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marshal.AllocHGlobal is particularly slow and generally just not recommended for use anymore (see here). NativeMemory has been introduced in .NET 6 and it's just a lightweigt set of wrappers for the relative C APIs. Once .NET 5 goes out of support (which is like in less than a month anyway) and WASDK drops it, we can just remove all of those extra paths anyway and just leave the ones for .NS2.0 (the default one) and the .NET 6 one. Does that make sense? 🙂

Copy link

@benwestprog benwestprog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@AreaZR
Copy link
Contributor

AreaZR commented Oct 20, 2022

Any updates on this?

@benwestprog
Copy link

We ultimately decided not to port these changes into WinAppSDK. They add considerable code complexity, and we could not justify the added risk.


In reply to: 1285755982

@Sergio0694 Sergio0694 closed this Oct 20, 2022
@mqudsi
Copy link

mqudsi commented Oct 20, 2022

Thanks @Sergio0694 for trying anyway. It would have been nice to have!

(Too bad GH doesn't have a sad-face emoji to express sadness without disapproval. A 👎 or 😕do not convey the same meaning!)

@Sergio0694
Copy link
Member Author

@mqudsi Thank you! 😊

For the record, we've been using this implementation in the Microsoft Store for months already (just slightly different, as the Store uses UWP and .NET Native, not CsWinRT, but it's effectively identical minus a different GUID), and it's something I do want us to open source eventually. If CsWinRT is not the right path, we might look into open sourcing it in the Windows Community Toolkit at some point, maybe packaged in some higher-level API (like the dispatcher service we showed at //BUILD 2022) that people could use like we're doing, we'll see. Of course, you can also just copy this implementation for yourself in the meantime 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants