Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make delegates immutable #99200

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

MichalPetryka
Copy link
Contributor

First attempt at making delegates immutable in CoreCLR so that they can be allocated on the NonGC heap.

I've checked it with a simple app and corerun locally with a delegate from an unloadable ALC and it seemed to not crash, assert nor unload the ALC from under the delegate, however I couldn't actually find any runtime tests that would verify delegates from unloadable ALCs work so the CI coverage might be missing.

One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.

Contributes to #85014.

cc @jkotas

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Mar 4, 2024

I'm not sure what's up with the failures here, tests that are failing on the CI seem to pass on my machine.
EDIT: I was testing with R2R disabled locally since VS kept complaining about being unable to load PDBs for it.

src/coreclr/vm/object.cpp Outdated Show resolved Hide resolved
MichalPetryka and others added 3 commits March 4, 2024 16:40
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@MichalPetryka MichalPetryka marked this pull request as ready for review March 4, 2024 23:28
@AndyAyersMS
Copy link
Member

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Mar 5, 2024

Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions.

@jkotas
Copy link
Member

jkotas commented Mar 5, 2024

One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.

The existing code tries to compare MethodInfos as a cheap fast path. Most delegates do not have cached MethodInfo, so this fast path is hit rarely - but it is very cheap, so it is still worth it.

This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Mar 5, 2024

Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions.

I think the thing that'd need benchmarking here are equality checks and maybe the impact of collectible delegates being stored in the CWT on the GC, the rest of things shouldn't be performance sensitive enough to matter I think? I'm not fully sure what'd be the proper way for benchmarking the latter.

This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.

I am going to benchmark the impact of the equality change tomorrow, I feel like if the impact won't be big, potential optimizations to it can be done later.

@jkotas
Copy link
Member

jkotas commented Mar 5, 2024

I'm not fully sure what'd be the proper way for benchmarking the latter.

Write a small program that loads an assembly as collectible and calls a method in it. The method in collectible assembly can create delegates in a loop. (If you would like to do it under benchmarknet, it works too - but it is probably more work.)

@jkotas
Copy link
Member

jkotas commented Mar 6, 2024

We should be able to minimize the perf hit for collectible assemblies by piggy backing on the _invocationList field in MulticastDelegate. The field is used to keep things alive in some cases already - it may contain LoaderAllocator - https://github.com/dotnet/runtime/pull/99200/files#diff-1686e89b85836d3c27bf44e0d986c29cec82fba34b66bc76f719d5b1e22cfac9R32.

@jkotas
Copy link
Member

jkotas commented Mar 6, 2024

And if correctness for collectible assemblies can be taken care of by reusing the existing delegate field, ConditionalWeakTable becomes just a throughput optimization for repeated Delegate.Method calls. It may be fine to take a throughput regression for this case. I think it rare for Delegate.Method to be called repeatedly. And when it is called, it is going to be used together with other reflection that is generally slow so it does not need to be super-fast.

@MichalPetryka
Copy link
Contributor Author

One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.

The existing code tries to compare MethodInfos as a cheap fast path. Most delegates do not have cached MethodInfo, so this fast path is hit rarely - but it is very cheap, so it is still worth it.

This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.

I've finally gotten around to benchmarking the Equality here and the perf regression here is noticeable:


BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 9 7900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XDNOLE : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-ITZOXA : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 8.0   : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Runtime=.NET 8.0  

Method Job Toolchain Mean Error StdDev Ratio Code Size Allocated Alloc Ratio
TwoCached Job-XDNOLE main 4.439 ns 0.0050 ns 0.0039 ns 0.27 523 B - NA
LeftCached Job-XDNOLE main 16.540 ns 0.1013 ns 0.0898 ns 0.99 523 B - NA
RightCached Job-XDNOLE main 16.529 ns 0.1024 ns 0.0855 ns 0.99 523 B - NA
TwoUncached Job-XDNOLE main 16.682 ns 0.0575 ns 0.0510 ns 1.00 523 B - NA
TwoCached Job-ITZOXA pr 13.864 ns 0.0261 ns 0.0203 ns 0.83 523 B - NA
LeftCached Job-ITZOXA pr 18.714 ns 0.0916 ns 0.0812 ns 1.12 523 B - NA
RightCached Job-ITZOXA pr 23.233 ns 0.0723 ns 0.0677 ns 1.39 523 B - NA
TwoUncached Job-ITZOXA pr 21.247 ns 0.0837 ns 0.0783 ns 1.27 523 B - NA
TwoCached .NET 8.0 Default 5.162 ns 0.0643 ns 0.0537 ns 0.31 565 B - NA
LeftCached .NET 8.0 Default 17.194 ns 0.0628 ns 0.0557 ns 1.03 565 B - NA
RightCached .NET 8.0 Default 17.534 ns 0.0705 ns 0.0588 ns 1.05 565 B - NA
TwoUncached .NET 8.0 Default 17.410 ns 0.0725 ns 0.0678 ns 1.04 565 B - NA

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Mar 23, 2024

@jkotas Would it be possible to repurpose _methodPtrAux to store the VM MethodDesc* and just make equals compare that? IIRC it's fetched already on delegate creation anyway and the pointer seems to be only used for equality today?

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Mar 24, 2024

@jkotas Would it be possible to repurpose _methodPtrAux to store the VM MethodDesc* and just make equals compare that? IIRC it's fetched already on delegate creation anyway and the pointer seems to be only used for equality today?

Hmm, it says object? _target; // Initialized by VM as needed; null if static delegate in the code but apparently this is wrong and the target is the delegate itself for static methods, for my idea to work this would need to be fixed but today setting it to null causes the runtime to crash for some reason.
EDIT: seems like static delegates work completely different than I thought (and this comment suggests), I thought that the runtime creates a new stub for every static method passed to the delegate constructor but it seems to use a single shared stub per delegate type that calls the pointer in _methodPtrAux. This makes my idea impossible to execute then.

@jkotas
Copy link
Member

jkotas commented Mar 24, 2024

it seems to use a single shared stub per delegate type that calls the pointer in _methodPtrAux

Right.

This table has the basic delegate kinds. In addition to that, there are multicast delegates and marshalled interop delegates.

The delegate fields and implementation are split between Delegate and MulticastDelegate for historic reasons. If you would like to explore playing tricks with overloading the fields, I think it would help to move all fields to Delegate in a separate prep-work PR. Similar mechanical refactoring was done for native AOT in #97959.

@MichalPetryka
Copy link
Contributor Author

If you would like to explore playing tricks with overloading the fields, I think it would help to move all fields to Delegate in a separate prep-work PR.

One issue with doing that is that it'd cause the runtime to reorder fields and put the invocationList before the pointer fields, we'd need to use some other tricks like fixed layout to prevent this.

@jkotas
Copy link
Member

jkotas commented Mar 25, 2024

One issue with doing that is that it'd cause the runtime to reorder fields and put the invocationList before the pointer fields, we'd need to use some other tricks like fixed layout to prevent this.

Yes, that's fine. It would be a temporary measure until something forces us to take R2R version reset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants