-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make DispatcherQueueSynchronizationContext faster and zero-alloc #1058
Make DispatcherQueueSynchronizationContext faster and zero-alloc #1058
Conversation
4873d69
to
5cf6ccd
Compare
Thanks @Sergio0694 for this contribution. A ~7x improvement with reduced memory pressure is definitely great to see in something that can be frequently used. The version of this type that you have edited here is actually a copy of the one in WinAppSDK and is here for test purposes. Adding @jeffstall and @ionvasilian who own this type and should be able to consider and update the actual one in WinAppSDK. I looked over the change and it looks good to me. It basically looks like you use a natively allocated proxy instance to hold onto the callback and state and make it look like a delegate by implementing Invoke and then fetch them later when invoked to make the call. The only thing I wonder is if the delegate proxy should also implement IReferenceTrackerTarget. I am not sure if XAML reference tracking is used for this type or not, but Jeff or Ion might be able to comment on that. |
Hi @manodasanW, thank you for taking a look, really glad to hear this is something that could be interesting to have! 🎉
Yup, that's correct! The underlying WinRT API for int TryEnqueue(IDispatcherQueueHandler* callback, byte* result); Where that class IDispatcherQueueHandler : IUnknown
{
int Invoke();
} So in this PR I'm just creating my own implementation instead of letting CsWinRT create one wrapping that managed delegate/closure (we're essentially defining a C# struct that implements an object derived from |
...icrosoft.UI.Dispatching.DispatcherQueueSynchronizationContext.DispatcherQueueProxyHandler.cs
Outdated
Show resolved
Hide resolved
Publishing this PR as the issue raised by @jkotas has been resolved, and there doesn't seem to be other outstanding ones. Chatted with @jkoritzinsky on Discord and he said it seems all obvious concerns have been addressed. @manodasanW would these changes still also need to be merged here to mirror the actual type in WASDK? @jeffstall, @ionvasilian happy to help eventually migrating these changes to the right place in WASDK if these changes are accepted 😊 |
3b3fb83
to
ff357bb
Compare
This relies on Roslyn just compiling ReadOnlySpan<byte> arrays directly into a span mapping to .data in the loaded assembly. Those properties will be JITted to a single mov reading from a constant address, and the class will no longer have a static constructor at all
IDispatcherQueue is just a type directly mapping to the native COM object, so the field is really just used to easily map to the vtable on that object
…mpl" This reverts commit d952216. This optimization currently works but results in fragile code as it's relying on an implementation detail which could change in the future. This could be reverted again latere on if/when we got explicit support for alignment for RVA fields (or, support for constant unmanaged buffers built-in).
ff357bb
to
2869a87
Compare
using WinRT; | ||
|
||
#nullable enable | ||
#pragma warning disable CA1416 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 52dd2dd 👍
{ | ||
if (d == null) | ||
#if NET6_0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simplicity, could this whole if/else/endif block be
if (d is null)
{
ThrowArgumentNullException(nameof(d));
}
Looks like we get the same code but much easier to read. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 91f5a08 🙂
/// <returns>The <c>HRESULT</c> for the operation.</returns> | ||
/// <remarks>The <paramref name="callback"/> parameter is assumed to be a pointer to an <c>IDispatcherQueueHandler</c> object.</remarks> | ||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
public int TryEnqueue(void* callback, byte* result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...s/Reunion/Microsoft.UI.Dispatching.DispatcherQueueSynchronizationContext.IDispatcherQueue.cs
Outdated
Show resolved
Hide resolved
#if NET6_0 | ||
DispatcherQueueProxyHandler* @this = (DispatcherQueueProxyHandler*)NativeMemory.Alloc((nuint)sizeof(DispatcherQueueProxyHandler)); | ||
#else | ||
DispatcherQueueProxyHandler* @this = (DispatcherQueueProxyHandler*)Marshal.AllocHGlobal(sizeof(DispatcherQueueProxyHandler)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of making a distinction here? Do we need two possible implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marshal.AllocHGlobal
is particularly slow and generally just not recommended for use anymore (see here). NativeMemory
has been introduced in .NET 6 and it's just a lightweigt set of wrappers for the relative C APIs. Once .NET 5 goes out of support (which is like in less than a month anyway) and WASDK drops it, we can just remove all of those extra paths anyway and just leave the ones for .NS2.0 (the default one) and the .NET 6 one. Does that make sense? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Projections/Reunion/Microsoft.UI.Dispatching.DispatcherQueueSynchronizationContext.cs
Outdated
Show resolved
Hide resolved
Any updates on this? |
We ultimately decided not to port these changes into WinAppSDK. They add considerable code complexity, and we could not justify the added risk. In reply to: 1285755982 |
Thanks @Sergio0694 for trying anyway. It would have been nice to have! (Too bad GH doesn't have a sad-face emoji to express sadness without disapproval. A 👎 or 😕do not convey the same meaning!) |
@mqudsi Thank you! 😊 For the record, we've been using this implementation in the Microsoft Store for months already (just slightly different, as the Store uses UWP and .NET Native, not CsWinRT, but it's effectively identical minus a different GUID), and it's something I do want us to open source eventually. If CsWinRT is not the right path, we might look into open sourcing it in the Windows Community Toolkit at some point, maybe packaged in some higher-level API (like the dispatcher service we showed at //BUILD 2022) that people could use like we're doing, we'll see. Of course, you can also just copy this implementation for yourself in the meantime 🙂 |
Overview
The current
DispatcherQueueSynchronizationContext
implementation is rather inefficient, for two reasons:Here's the specific line that's troubling:
CsWinRT/src/Projections/Reunion/Microsoft.UI.Dispatching.DispatcherQueueSynchronizationContext.cs
Line 25 in 31a191d
This type is used extremely frequently in applications, especially for those that eg. rely on portable libraries for things such as MVVM support etc. that also deal with proper dispatching. The way that works is by capturing the synchronization context during setup and then dispatch through that, meaning this
Post
method will get invoked every time.The proposed solution
This PR optimizes the
Post
implementation by making it faster and zero-alloc.There are no public API changes.
Opening this as draft as I'm not entirely sure of the procedure to add tests for this for this specific repo.
Pinging @jkoritzinsky as we discussed this feature idea previously on Discord as well 😄
I run some benchmarks for this a while back, got some nice numbers:
In this scenario dispatching with this system was about 7x faster, and also completely allocation free.
I'd argue it'd be nice to have such a basic building block like this one to be as efficient as possible.
I'm sure it might be a good idea to setup up some new ones in this repo as well, though I'd need guidance on that 😅
Opening as draft for now because of this, can switch this once everyone's happy with coverage and overall setup.
Related issues/PRs