Speed up typeof(Foo) and Object.GetType by 50+% #8095
Conversation
This adds a general purpose concept of "writable data" to the EEType data structure. This points to a block of memory whose purpose is defined by the class library (we can potentially cache all sorts of stuff there). The writable data is used to hold a GC handle to the runtime `System.Type` instance associated with the EEType. This bypasses the hashtable lookup that has been used before. I'm seeing more than 50% TP improvement against best case scenarios with the hashtable (best case defined as "the thing we're looking for is the first entry in the hashtable bucket"). Both CoreCLR and Mono use a similar approach. CoreRT was lagging in perf because of the hashtable lookup: this brings us much closer. We're still not as fast as CoreCLR, but getting faster than this would have impacts elsewhere (in terms of working set and size on disk). The implementation I chose for CoreRT (optional relative pointer field) has minimal working set impact.
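The idea can be pictured with a small C# sketch. This is illustrative only: `FakeEEType`, the handle field, and `GetType` below are hypothetical stand-ins, not the actual CoreRT EEType layout (which uses an optional relative pointer into a writable data block).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical model of per-EEType "writable data" caching a GC handle
// to the unified System.Type instance. Names are illustrative.
class FakeEEType
{
    // In CoreRT this would be reached through a couple of indirections
    // from the EETypePtr; here it is simply a field.
    public GCHandle RuntimeTypeCache;
}

static class TypeCache
{
    public static Type GetType(FakeEEType eeType, Func<Type> slowLookup)
    {
        // Fast path: no hashtable probe, just a handle dereference.
        if (eeType.RuntimeTypeCache.IsAllocated)
            return (Type)eeType.RuntimeTypeCache.Target;

        // Slow path: resolve once (the real runtime race-protects this),
        // then cache via a strong handle for all future lookups.
        Type result = slowLookup();
        eeType.RuntimeTypeCache = GCHandle.Alloc(result);
        return result;
    }
}
```

The strong handle means the cached `Type` is never collected, which is what makes the fast path a plain load-and-check with no liveness test.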
{
    // If writable data is supported, we shouldn't be using the hashtable - the runtime type
    // is accessible through a couple indirections from the EETypePtr which is much faster.
    Debug.Assert(Internal.Runtime.EEType.SupportsWritableData);
Suggested change:
- Debug.Assert(Internal.Runtime.EEType.SupportsWritableData);
+ Debug.Assert(!Internal.Runtime.EEType.SupportsWritableData);
?
Oops, good catch. I guess CppCodegen would quickly point that out.
src/System.Private.CoreLib/src/Internal/Reflection/Core/NonPortable/RuntimeTypeUnifier.cs
"Well sure let's debug this Mac specific issue."
😤 This was a bug in ObjectWriter. I also had to turn off warnings-as-errors when building the fixed ObjectWriter, because the LLVM build we use for the ObjectWriter library generates warnings with the latest Xcode tools.
Should probably mention that the only reason why I'm even doing this pull request is the discussion around #7962 (comment). Finally the saga is coming to an end.
// identity of EEType pointers). There is another unifier behind that cache
// that ensures this code is race-free.
Type result = RuntimeTypeUnifier.GetRuntimeTypeBypassCache(eeType);
GCHandle tempHandle = GCHandle.Alloc(result);
One thing that I keep coming back to is whether there would be any advantage in making this a weak handle. The fast retrieval path would get slower because now we need to check both whether the handle is allocated and whether the target wasn't collected. But we might be able to reclaim memory. @jkotas do you have an opinion?
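The tradeoff in question can be sketched out (hypothetical names, not the runtime's actual code): with a weak handle, the fast path needs an extra liveness check, and the cache may need refilling after a collection.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch of the weak-handle variant under discussion.
// The fast path now has two checks: handle allocated AND target alive.
static class WeakCachedType
{
    static GCHandle s_handle; // would live in the EEType's writable data

    public static Type Get(Func<Type> slowLookup)
    {
        if (s_handle.IsAllocated)
        {
            // Target may have been collected; must check before use.
            if (s_handle.Target is Type alive)
                return alive;
            s_handle.Free(); // stale weak handle; fall through to refill
        }
        Type result = slowLookup();
        s_handle = GCHandle.Alloc(result, GCHandleType.Weak);
        return result;
    }
}
```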
It is hard to tell whether it would be an improvement.
I think it would be best to match the CoreCLR caching strategy: keep a lightweight System.RuntimeType instance referenced via a strong handle, and cache the heavy reflection state inside this RuntimeType via a weak handle. This is the caching strategy that was found to work best for real-world code (IIRC, there was an investigation into this done for .NET Framework 4.0).
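A rough sketch of that two-level strategy (illustrative names throughout; `HeavyReflectionData` is a stand-in for the heavy reflection state, not a real CoreCLR type):

```csharp
using System;

// Illustrative: the lightweight object stays alive via a strong handle,
// while the heavy reflection state hangs off it through a WeakReference
// and can be reclaimed by the GC and rebuilt on demand.
sealed class LightweightRuntimeType
{
    readonly IntPtr _eeTypePtr;                 // cheap identity
    WeakReference<HeavyReflectionData> _heavy;  // reclaimable cache

    public LightweightRuntimeType(IntPtr eeTypePtr) => _eeTypePtr = eeTypePtr;

    public HeavyReflectionData GetHeavyData()
    {
        if (_heavy != null && _heavy.TryGetTarget(out var data))
            return data;
        data = new HeavyReflectionData(_eeTypePtr); // rebuild after collection
        _heavy = new WeakReference<HeavyReflectionData>(data);
        return data;
    }
}

sealed class HeavyReflectionData
{
    public HeavyReflectionData(IntPtr eeTypePtr) { /* member tables, names, ... */ }
}
```

The point of the split is that `typeof`/`GetType` only ever pays for the small object; the expensive part is only materialized when reflection actually asks for it.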
I think this is similar to how the reflection stack is structured - the lightweight RuntimeType instances hold onto a couple of things (e.g. for named types: the RuntimeTypeHandle (EETypePtr), a lazy cache, the metadata reader, the type definition and type definition handle, and a "namespace chain"). The rest of the "heavy" things are cached in the lazy cache (TypeComponentsCache).

The cache currently caches things in a ConcurrentUnifier, but maybe it could be a ConcurrentUnifierW (the flavor that holds things using a weak reference).
We can look at that at some point.
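The weak-value-unifier idea can be approximated with a small table keyed to `WeakReference<V>` (a sketch only; this is not the actual ConcurrentUnifierW API):

```csharp
using System;
using System.Collections.Concurrent;

// Rough stand-in for a weak-value unifier: keys map to WeakReference<V>,
// so unified values can be collected and re-created on demand.
sealed class WeakValueUnifier<K, V> where V : class
{
    readonly ConcurrentDictionary<K, WeakReference<V>> _map = new();

    public V GetOrAdd(K key, Func<K, V> factory)
    {
        V created = null; // strong local keeps a fresh value alive
        while (true)
        {
            var weak = _map.GetOrAdd(key, _ =>
                new WeakReference<V>(created ??= factory(key)));
            if (weak.TryGetTarget(out var value))
                return value;
            _map.TryRemove(key, out _); // entry's target was collected; retry
        }
    }
}
```

Callers that hold onto the returned value keep it unified; once all callers drop it, the GC can reclaim it and the next lookup rebuilds it.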
a lazy cache, the metadata reader, the type definition and type definition handle, and a "namespace chain"
That is a lot more than in the CoreCLR case. I like the CoreCLR model much better.