Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014

Open
3 tasks
MichalPetryka opened this issue Apr 18, 2023 · 15 comments
Open
3 tasks
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.CompilerServices help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Apr 18, 2023

Background and motivation

Roslyn currently caches lambda delegates in static mutable fields in nested classes and initializes them lazily at the use site (guarded by a null check).
This pattern is inefficient for current runtimes, being hard to detect and optimize correctly, leading to dead code being left in generated assembly and making optimizations like delegate inlining more tricky.
Discussion here ended up with two possible solutions: creating a runtime intrinsic that'd create and cache the delegate for Roslyn or caching the delegates in static readonly fields which CoreCLR and Native AOT can analyze easily even today but that'd come with some metadata cost.

cc @jkotas @EgorBo @jaredpar @CyrusNajmabadi

API Proposal

From @jkotas in dotnet/csharplang#6746 (reply in thread):

public class RuntimeHelpers
{
    [Intrinsic]
    // ldftn instruction must immediately precede call of this method. The target method must
    // match the delegate signature and it must be an instance method on the TScope reference. TScope should
    // have no instance fields. Given function pointer can be only mapped to one delegate type using this method.
    static TDelegate GetLambdaSingleton<TScope, TDelegate>(IntPtr ftn) where TScope: class, new(), TDelegate: delegate
    {
        // This is mock implementation. This method is always going to be expanded as JIT intrinsic.
        lock (s_table)
        {
            MethodInfo mi = MapEntryPointToMethodInfo(typeof(TScope), ftn);

            if (s_table.TryGetValue(mi, out Delegate del))
                return (TDelegate)del;

            Delegate ret = typeof(TScope).IsCollectible ? 
                 InternalCreateDelegate(typeof(TDelegate), new TScope(), ftn) :
                 InternalCreateFrozenDelegate(typeof(TDelegate), new-frozen TScope(), ftn);
            s_table.Add(mi, ret);
            return (TDelegate)ret;
        }
    }

    static readonly ConditionalWeakTable<MethodInfo, Delegate> s_table;
}

API Usage

This would be used internally by Roslyn when emitting lambdas.

Alternative Designs

Have Roslyn store delegates in static readonly fields - would work with CoreCLR and Native AOT, not require any runtime changes, but it'd introduce either more metadata bloat or more eager delegate object allocation (we'd either need a separate nested class per lambda or calling one lambda would then create delegates for all the lambdas in the nested class).

Risks

All runtimes including Mono would need to have this implemented.


Tasks:

  • Refactor delegates to be immutable (see below)
  • Build a functional-enough prototype
  • Submit an API proposal for review, incorporating learnings from the prototype
@MichalPetryka MichalPetryka added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 18, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 18, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 18, 2023
@tannergooding tannergooding added area-System.Runtime.CompilerServices and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 18, 2023
@ghost
Copy link

ghost commented Apr 18, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-compilerservices
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Roslyn currently caches lambda delegates in static mutable fields in nested classes and initializes them lazily at the use site (guarded by a null check).
This pattern is inefficient for current runtimes, being hard to detect and optimize correctly, leading to dead code being left in generated assembly and making optimizations like delegate inlining more tricky.
Discussion here ended up with two possible solutions: creating a runtime intrinsic that'd create and cache the delegate for Roslyn or caching the delegates in static readonly fields which CoreCLR and Native AOT can analyze easily even today but that'd come with some metadata cost.

cc @jkotas @EgorBo @jaredpar @CyrusNajmabadi

API Proposal

From @jkotas in dotnet/csharplang#6746 (reply in thread):

public class RuntimeHelpers
{
    [Intrinsic]
    // ldftn instruction must immediately precede call of this method. The target method must
    // match the delegate signature and it must be an instance method on the TScope reference. TScope should
    // have no instance fields. Given function pointer can be only mapped to one delegate type using this method.
    static TDelegate GetLambdaSingleton<TScope, TDelegate>(IntPtr ftn) where TScope: class, TDelegate: delegate
    {
        // This is mock implementation. This method is always going to be expanded as JIT intrinsic.
        lock (s_table)
        {
            MethodInfo mi = MapEntryPointToMethodInfo(typeof(TScope), ftn);

            if (s_table.TryGetValue(mi, out Delegate del))
                return (TDelegate)del;

            Delegate ret = typeof(TScope).IsCollectible ? 
                 InternalCreateDelegate(typeof(TDelegate), new TScope(), ftn) :
                 InternalCreateFrozenDelegate(typeof(TDelegate), new-frozen TScope(), ftn);
            s_table.Add(mi, ret);
            return (TDelegate)ret;
        }
    }

    static readonly ConditionalWeakTable<MethodInfo, Delegate> s_table;
}

API Usage

This would be used internally by Roslyn when emitting lambdas.

Alternative Designs

Have Roslyn store delegates in static readonly fields - would work with CoreCLR and Native AOT, not require any runtime changes, but it'd introduce either more metadata bloat or more eager delegate object allocation (we'd either need a separate nested class per lambda or calling one lambda would then create delegates for all the lambdas in the nested class).

Risks

All runtimes including Mono would need to have this implemented.

Author: MichalPetryka
Assignees: -
Labels:

api-suggestion, area-System.Runtime.CompilerServices, untriaged

Milestone: -

@jaredpar
Copy link
Member

static TDelegate GetLambdaSingleton<TScope, TDelegate>(IntPtr ftn) where TScope: class,

Needs to include the new() constraint.

@jaredpar
Copy link
Member

jaredpar commented Apr 18, 2023

This will require a bit of refactoring on Roslyn's part. Today our emit strategy puts all lambdas for a given scope into the same closure type. That means it mixes capturing and non-capturing lambdas. In order to meet the constraints of this API we'll need to break up our closures such that non-capturing lambdas are never mixed with capturing ones.

I think this can be done by having a single closure per method for all of our non-capturing lambdas vs. having up to two per scope. Hopefully that is a bit cheaper given our implementation.

@EgorBo
Copy link
Member

EgorBo commented Apr 19, 2023

Have Roslyn store delegates in static readonly fields - would work with CoreCLR and Native AOT, not require any runtime changes

The plan is to allocate them on a frozen (nongc) segment which is not supported for static readonly in JIT today (but it can be done, just wanted to point out that there will be some runtime(jit) work any way).

@MichalPetryka
Copy link
Contributor Author

Today our emit strategy puts all lambdas for a given scope into the same closure type.

It doesn't, it emits a type for all methods from the type.

@jaredpar
Copy link
Member

It doesn't, it emits a type for all methods from the type.

My bad. It does look like we implemented that optimization already. I thought we had a number of cases where we still combined them with the capturing lambdas but does not appear so.

@MichalPetryka
Copy link
Contributor Author

This should also be used for creation of delegates from method groups.

@MichalPetryka
Copy link
Contributor Author

@jkotas @EgorBo While working on #85349, I've accidentally made delegate creation be regarded as slightly cheaper for the inliner, which triggers huge diffs for all the lambdas with the mostly dead initialization branch. This means that changing this will also trigger huge diffs due to removal of that branch.

@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label May 26, 2023
@tannergooding
Copy link
Member

I'll leave this open for the time being, but until I get the green light from Jan and Jared, this will remain an api-suggestion.

@jkotas
Copy link
Member

jkotas commented May 27, 2023

In order to be able to allocate delegates as frozen heap, they must be immutable, or more precisely they cannot point to objects that are not allocated on frozen heap.

The delegates are not immutable in CoreCLR today. They contain cached MethodInfo that is initialized lazily:

if ((_methodBase == null) || !(_methodBase is MethodInfo))
{
IRuntimeMethodInfo method = FindMethodHandle();
RuntimeType? declaringType = RuntimeMethodHandle.GetDeclaringType(method);
// need a proper declaring type instance method on a generic type
if (declaringType.IsGenericType)
{
bool isStatic = (RuntimeMethodHandle.GetAttributes(method) & MethodAttributes.Static) != (MethodAttributes)0;
if (!isStatic)
{
if (_methodPtrAux == IntPtr.Zero)
{
// The target may be of a derived type that doesn't have visibility onto the
// target method. We don't want to call RuntimeType.GetMethodBase below with that
// or reflection can end up generating a MethodInfo where the ReflectedType cannot
// see the MethodInfo itself and that breaks an important invariant. But the
// target type could include important generic type information we need in order
// to work out what the exact instantiation of the method's declaring type is. So
// we'll walk up the inheritance chain (which will yield exactly instantiated
// types at each step) until we find the declaring type. Since the declaring type
// we get from the method is probably shared and those in the hierarchy we're
// walking won't be we compare using the generic type definition forms instead.
Type? currentType = _target!.GetType();
Type targetType = declaringType.GetGenericTypeDefinition();
while (currentType != null)
{
if (currentType.IsGenericType &&
currentType.GetGenericTypeDefinition() == targetType)
{
declaringType = currentType as RuntimeType;
break;
}
currentType = currentType.BaseType;
}
// RCWs don't need to be "strongly-typed" in which case we don't find a base type
// that matches the declaring type of the method. This is fine because interop needs
// to work with exact methods anyway so declaringType is never shared at this point.
Debug.Assert(currentType != null || _target.GetType().IsCOMObject, "The class hierarchy should declare the method");
}
else
{
// it's an open one, need to fetch the first arg of the instantiation
MethodInfo invoke = this.GetType().GetMethod("Invoke")!;
declaringType = (RuntimeType)invoke.GetParameters()[0].ParameterType;
}
}
}
_methodBase = (MethodInfo)RuntimeType.GetMethodBase(declaringType, method)!;
. So the first step is to figure out what to do about this cache. Is it possible to get rid of it and make delegate objects smaller? It should be fine to make Delegate.Method property slower if it makes Delegate objects smaller.

@jkotas jkotas added the help wanted [up-for-grabs] Good issue for external contributors label May 27, 2023
@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented May 27, 2023

Could the cache maybe use RuntimeMethodHandle.Value instead? Then the field will just be an IntPtr.

@jkotas
Copy link
Member

jkotas commented May 27, 2023

Could the cache maybe use RuntimeMethodHandle.Value instead? Then the field will just be an IntPtr.

If we are not going to store MethodInfo directly in the delegate, there needs to be a lookup or mapping operation involved. If we are going to pay for a mapping or lookup operation, we can as well get rid of the field and use the existing fields or the delegate instance as the key for the lookup (notice that the existing implementation is able to compute the MethodInfo from existing fields). Some options:

  • Cache it using the method pointer as the key. Skip caching for collectible assemblies.
  • Cache it in ConditionalWeakTable using delegate instance as the key.
  • Do not cache it at all. Declare Delegate.Method to be a slow operation.

None that the field is overloaded for other purposes, but those other purposes do not look that hard to refactor away.

@MichalPetryka
Copy link
Contributor Author

  • Cache it in ConditionalWeakTable using delegate instance as the key.

Would the object that keeps the assembly alive for collectible assemblies also be moved there to keep them alive?

@EgorBo
Copy link
Member

EgorBo commented May 29, 2023

  • Cache it in ConditionalWeakTable using delegate instance as the key.

Would the object that keeps the assembly alive for collectible assemblies also be moved there to keep them alive?

delegates from collectible assemblies won't be allocated on frozen segments anyway

@jkotas
Copy link
Member

jkotas commented May 29, 2023

Would the object that keeps the assembly alive for collectible assemblies also be moved there to keep them alive?

It should be possible to store the object reference that keeps collectible assemblies alive in _invocationList field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.CompilerServices help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests

6 participants