Use hashcodes when looking up the JsonSerializerOptions global cache. #76782
Conversation
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis

Issue Details: Following up on #76607, this PR adds a hashcode comparison when looking up the cache. This should minimize the number of full-blown equality comparisons when traversing a saturated cache.
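To make the idea concrete, here is a minimal sketch of a hashcode-gated lookup over a small, linearly scanned cache. All names (`OptionsCache`, `ComputeHashCode`, `OptionsEqual`) and the set of hashed properties are illustrative assumptions, not the actual implementation:

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical sketch: each entry stores a precomputed hash code so the
// expensive structural equality check only runs when the hash codes match.
internal sealed class OptionsCache
{
    private const int MaxEntries = 4;
    private readonly (int HashCode, JsonSerializerOptions Options)?[] _entries =
        new (int, JsonSerializerOptions)?[MaxEntries];

    public JsonSerializerOptions? Lookup(JsonSerializerOptions key)
    {
        int hashCode = ComputeHashCode(key); // computed once per lookup

        foreach (var entry in _entries)
        {
            if (entry is null)
            {
                break; // entries are packed from the front; nothing further down
            }

            // Cheap int comparison first; the full equality check only
            // runs on a hash match (or a rare collision).
            if (entry.Value.HashCode == hashCode &&
                OptionsEqual(entry.Value.Options, key))
            {
                return entry.Value.Options;
            }
        }

        return null;
    }

    private static int ComputeHashCode(JsonSerializerOptions options)
    {
        // Illustrative only: the real implementation would hash every
        // setting that participates in the equality comparison.
        HashCode hc = default;
        hc.Add(options.PropertyNameCaseInsensitive);
        hc.Add(options.DefaultIgnoreCondition);
        hc.Add(options.PropertyNamingPolicy); // identity-based hash
        foreach (JsonConverter converter in options.Converters)
        {
            hc.Add(converter); // identity-based hash
        }
        return hc.ToHashCode();
    }

    private static bool OptionsEqual(JsonSerializerOptions left, JsonSerializerOptions right)
    {
        // Member-by-member comparison, heavily elided. Converters and
        // naming policies compare by reference, per the discussion below.
        return left.PropertyNameCaseInsensitive == right.PropertyNameCaseInsensitive
            && left.DefaultIgnoreCondition == right.DefaultIgnoreCondition
            && ReferenceEquals(left.PropertyNamingPolicy, right.PropertyNamingPolicy)
            && left.Converters.SequenceEqual(right.Converters);
    }
}
```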
This PR seems like it's complicating the implementation, but a saturated cache doesn't seem like it's going to be a common scenario, and in any other case we do the comparison just once.
Is this actually worth doing? I'd rather have cleaner code.
A saturated cache is precisely the scenario we're expecting in cases where users aren't reusing their JsonSerializerOptions instances. An added hashcode calculation does theoretically add more work for cases where the new JsonSerializerOptions instance always hits the one and only entry in the global cache, but that is really more of an artificial scenario that only surfaces in benchmarks.
@eiriktsarpalis not reusing isn't going to cause saturation, because even different instances should match the equality check, meaning they use the same cache entry. The only exception is if they add custom converters or a type info resolver. We should not optimize for cases where the user does more advanced things while not using the APIs correctly (i.e. optimizing the default case/simple changes of bool flags etc. sounds good to me, but converters/resolvers are not simple scenarios, so users should re-use options in those cases).
That's not always true: the equality check uses reference equality for converters, contract resolvers, naming policies, etc. (because it can't reliably use any other form of equality). In such cases it is pretty much expected that the cache will fill up quickly with logically identical entries, and that's precisely the scenario this change is meant to address.
Using a custom converter is not "advanced" or uncommon. Arguably the entire global cache is an optimization for improper use of our APIs, but such uses are widespread enough to warrant mitigation.
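For instance, a pattern along these lines (an illustrative sketch, not code from the thread) produces a fresh converter instance on every call, so no two options instances compare equal and each one occupies its own slot in the global cache:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

for (int i = 0; i < 100; i++)
{
    // New options + new converter on every iteration: reference equality
    // on converters means none of these instances match a cached entry.
    var options = new JsonSerializerOptions
    {
        Converters = { new JsonStringEnumConverter() }
    };
    JsonSerializer.Serialize(DayOfWeek.Monday, options);
}
```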
But will a hash code help in those cases? Should we instead special-case JsonStringEnumConverter? (Or implement Equals/GetHashCode for those known public converters?)
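For context, that suggestion would amount to something like the following hypothetical sketch (System.Text.Json converters do not override these members today, and the cache would also have to consult them for this to matter):

```csharp
using System;
using System.Text.Json.Serialization;

// Hypothetical: a converter opting into structural equality so that two
// separately constructed but identically configured instances would
// compare equal during an options cache lookup.
public sealed class ComparableStringEnumConverter : JsonStringEnumConverter
{
    private readonly bool _allowIntegerValues;

    public ComparableStringEnumConverter(bool allowIntegerValues = true)
        : base(namingPolicy: null, allowIntegerValues)
        => _allowIntegerValues = allowIntegerValues;

    public override bool Equals(object? obj)
        => obj is ComparableStringEnumConverter other
            && other._allowIntegerValues == _allowIntegerValues;

    public override int GetHashCode()
        => HashCode.Combine(typeof(ComparableStringEnumConverter), _allowIntegerValues);
}
```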
Generally, for TypeInfoResolver and custom converters (those not public in STJ, as well as potentially frequently used ones like JsonStringEnumConverter) I'd consider these advanced scenarios, and I'd expect users to re-use options in those cases. Only in simple scenarios would I be more forgiving about misuse.
Any perf numbers for the cases this change makes better and the cases it makes worse?
That's an independent concern, I would think. I don't think we should be implementing special equality semantics on converters just to improve this cache.

It regresses performance when hitting the first entry in the cache, while improving it when hitting entries further down. Generally it appears to even out performance in the case of cache hits. Cache misses are largely unchanged, since the cost of rebuilding metadata is by far the dominant factor.
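A sketch of how the cache-hit position could be measured with BenchmarkDotNet; the setup, parameter values, and the assumption that filler entries push the target deeper into the cache are all illustrative, not the PR's actual benchmarks:

```csharp
using BenchmarkDotNet.Attributes;
using System.Text.Json;

public class CacheHitPosition
{
    // Approximates how deep in the cache the hit occurs by warming the
    // cache with this many distinct configurations first.
    [Params(0, 3)]
    public int EntriesAhead;

    [GlobalSetup]
    public void Setup()
    {
        for (int i = 0; i < EntriesAhead; i++)
        {
            var filler = new JsonSerializerOptions { MaxDepth = 10 + i };
            JsonSerializer.Serialize(42, filler); // populate a cache slot
        }

        // Insert the configuration under test last.
        JsonSerializer.Serialize(42, new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
    }

    [Benchmark]
    public string SerializeWithFreshOptions()
    {
        // Fresh-but-equal instance: forces a full cache lookup per call.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Serialize(42, options);
    }
}
```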
I do not have a strong opinion on this. The code delta looks OK. I am not sure whether it is worth it based on your data: it makes the case that feels more common a bit slower, and the case that feels less common a bit faster.
I would argue that it regresses the case we're more likely to see in an isolated microbenchmark, while improving scenarios we're more likely to see in an application that would saturate the cache with variable configurations/false-negative equality comparisons. It's making performance more predictable, but as you point out, not drastically so.
If somebody does not understand the importance of caching the options and uses converters, I think they are likely to write code like this, which is not improved at all: https://github.com/dotnet/performance/blob/4dad164bbeebd3356f35ecfe2b42a10b75561427/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ColdStartSerialization.cs#L56-L60
The global cache exists to help with the most common patterns of JsonSerializerOptions misuse. I expect that the likelihood of the different patterns (sorted from the most common to the least common) is:
My point is that this change makes pattern 2, which I believe to be more common, slower, and pattern 4, which I believe to be less common, faster. It does not sound right to be optimizing for the less common cases at the cost of making the more common cases slower.
Why would somebody want to pass a default instance without parameterization? It's functionally equivalent to passing no options parameter at all.
The example is completely artificial and unlikely to ever materialize in real code; what the benchmark does represent, however, is a cache hit with nontrivial configuration.
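To illustrate the equivalence mentioned above (a minimal snippet):

```csharp
using System.Text.Json;

// Functionally equivalent: a fresh default options instance adds an
// allocation and a cache lookup but changes no serialization behavior.
string a = JsonSerializer.Serialize(new { Id = 1 });
string b = JsonSerializer.Serialize(new { Id = 1 }, new JsonSerializerOptions());
```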
We should do the analyzer regardless of this cache. Also, while instance JsonSerializer methods might help, I fully expect they'll just kick the can down the road a bit. We'll start to see folks new'ing up a JsonSerializer for each operation, just as they do today for an options bag. It might be a bit more intuitive that the serializer itself has resources associated with it, but these resources are so expensive comparatively that I think we'll still find a non-trivial number of folks new'ing it up each time because "what's one more object".
Maybe, but I generally believe that API names have suggestive power. A class called "options" tends to be interpreted as a throwaway property bag (and in fact most users express surprise when we tell them that instances are meant to be reused). But then again, perhaps this ship has sailed.
Ok, you can delete this case from my list. Do we have examples of real code where people used non-trivial configuration with non-cached json options that would be improved by this PR? The Blazor code that motivated this change looks like the …
The wider point, though, is that this change clearly improves performance in all cases where you're not expected to get a cache hit within the first couple of elements of the cache array.
Makes sense.