
Use hashcodes when looking up the JsonSerializerOptions global cache. #76782

Merged · 2 commits · Oct 17, 2022

Conversation

eiriktsarpalis (Member)

Following up on #76607, this PR adds a hashcode comparison when looking up the cache. This should minimize the number of full-blown equality comparisons when traversing a saturated cache.
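For readers skimming the thread, here is a minimal sketch of the pattern in question (names like `SmallEqualityCache` are illustrative; this is not the actual System.Text.Json internal code): each cache entry stores its key's hashcode, so a cheap integer comparison can rule out most candidates before the full, comparatively expensive `Equals` call runs.

```csharp
using System.Collections.Generic;

// Minimal sketch: entries carry their key's hashcode so lookups can
// short-circuit before invoking the full equality comparison.
public sealed class SmallEqualityCache<TKey, TValue> where TKey : notnull
{
    private readonly IEqualityComparer<TKey> _comparer;
    private readonly List<(int HashCode, TKey Key, TValue Value)> _entries = new();

    public SmallEqualityCache(IEqualityComparer<TKey> comparer) => _comparer = comparer;

    public bool TryGetValue(TKey key, out TValue? value)
    {
        int hashCode = _comparer.GetHashCode(key);

        foreach ((int entryHash, TKey entryKey, TValue entryValue) in _entries)
        {
            // Differing hashcodes mean the keys cannot be equal,
            // so the expensive full comparison is skipped entirely.
            if (entryHash == hashCode && _comparer.Equals(entryKey, key))
            {
                value = entryValue;
                return true;
            }
        }

        value = default;
        return false;
    }

    public void Add(TKey key, TValue value)
        => _entries.Add((_comparer.GetHashCode(key), key, value));
}
```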


eiriktsarpalis added this to the 8.0.0 milestone on Oct 8, 2022
krwq (Member) left a comment

This PR seems to complicate the implementation, but a saturated cache doesn't seem like it will be a common scenario, and in any other case we do the comparison just once.

Is this actually worth doing? I'd rather have cleaner code.

eiriktsarpalis (Member, Author) commented Oct 11, 2022

> a saturated cache doesn't seem like it will be a common scenario

A saturated cache is precisely the scenario we're expecting when users aren't reusing their JsonSerializerOptions instances, specifically here:

https://github.com/dotnet/performance/blob/4dad164bbeebd3356f35ecfe2b42a10b75561427/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ColdStartSerialization.cs#L53-L60

An added hashcode calculation does theoretically add more work for cases where the new JsonSerializerOptions instance always hits the one and only entry in the global cache:

https://github.com/dotnet/performance/blob/4dad164bbeebd3356f35ecfe2b42a10b75561427/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ColdStartSerialization.cs#L40-L41

But that is really more of an artificial scenario that only surfaces in benchmarks.

krwq (Member) commented Oct 11, 2022

@eiriktsarpalis not reusing won't cause saturation, because even different instances should match the equality check and therefore use the same cache entry; the only exception is when users add custom converters or a type info resolver. We should not optimize for cases where the user is doing more advanced things while not using the APIs correctly (optimizing the default case and simple changes of bool flags sounds good to me, but converters/resolvers are not simple scenarios, so users should reuse options in those cases).

eiriktsarpalis (Member, Author) commented Oct 11, 2022

> even different instances should match the equality check and therefore use the same cache entry

That's not always true; the equality check uses reference equality for converters, contract resolvers, naming policies, etc. (because it can't reliably use any other form of equality). In such cases it's pretty much expected that the cache will quickly fill up with functionally identical entries, and that's precisely the scenario this change is meant to address.
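A short illustration of that failure mode, using JsonStringEnumConverter as a stand-in for any custom converter:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Two identically configured options instances: the cache's equality check
// can only compare their converter lists by reference, so these two never
// share a cache entry.
var options1 = new JsonSerializerOptions { Converters = { new JsonStringEnumConverter() } };
var options2 = new JsonSerializerOptions { Converters = { new JsonStringEnumConverter() } };

// Code that constructs options like this on every call keeps adding
// functionally identical entries until the global cache is saturated.
JsonSerializer.Serialize(DayOfWeek.Monday, options1);
JsonSerializer.Serialize(DayOfWeek.Monday, options2);
```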

> We should not optimize for cases where the user is doing more advanced things while not using the APIs correctly

Using a custom converter is not "advanced" or uncommon. Arguably the entire global cache is an optimization for improper use of our APIs, but such uses are widespread enough to warrant mitigation.

krwq (Member) commented Oct 11, 2022

But will the hashcode help in those cases? Should we instead special-case JsonStringEnumConverter? (Or implement Equals/GetHashCode for the known public converters?)
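A hypothetical sketch of what that alternative could look like (RoundtripDateTimeConverter is an invented example, not a proposed STJ change): a stateless converter overrides Equals/GetHashCode so that separately constructed instances compare equal and can share a cache entry.

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical sketch only: a stateless converter that opts into value
// equality so two separately constructed instances can share a cache entry.
public sealed class RoundtripDateTimeConverter : JsonConverter<DateTime>
{
    public override DateTime Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => DateTime.Parse(reader.GetString()!, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);

    public override void Write(Utf8JsonWriter writer, DateTime value, JsonSerializerOptions options)
        => writer.WriteStringValue(value.ToString("O", CultureInfo.InvariantCulture));

    // Any two instances of this stateless converter are interchangeable,
    // so report them as equal to equality-based caches.
    public override bool Equals(object? obj) => obj is RoundtripDateTimeConverter;
    public override int GetHashCode() => typeof(RoundtripDateTimeConverter).GetHashCode();
}
```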

krwq (Member) commented Oct 11, 2022

Generally, I'd consider TypeInfoResolver and custom converters (ones that are not public in STJ, unlike potentially frequently used converters such as JsonStringEnumConverter) an advanced scenario, and I'd expect users to reuse options in those cases. Only in simple scenarios would I be more forgiving about misuse.

jkotas (Member) commented Oct 11, 2022

Any perf numbers for cases that this change makes better and cases that this change makes worse?

eiriktsarpalis (Member, Author) commented Oct 11, 2022

> Should we instead special-case JsonStringEnumConverter?

That's an independent concern, I would think. I don't think we should implement special equality semantics on converters just to improve this cache.

> But will the hashcode help in those cases?

> Any perf numbers for cases that this change makes better and cases that this change makes worse?

It regresses performance when hitting the first entry in the cache, while improving it when hitting entries further down. Overall it appears to balance out performance in the case of cache hits. Cache misses are largely unchanged, since the cost of rebuilding the metadata is by far the dominant factor.

| Method | Job | Branch | Mean | Error | StdDev | Median | Min | Max | Ratio | MannWhitney(3%) | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NewDefaultOptions | Job-VONQSL | main | 914.9 ns | 49.30 ns | 56.77 ns | 923.1 ns | 817.6 ns | 998.6 ns | 1.00 | Base | 0.00 | 0.0426 | 0.0033 | 0.0033 | 460 B | 1.00 |
| NewDefaultOptions | Job-TPJGAG | PR | 1,039.9 ns | 48.13 ns | 55.43 ns | 1,033.1 ns | 958.0 ns | 1,167.7 ns | 1.14 | Slower | 0.09 | 0.0391 | 0.0039 | 0.0039 | 460 B | 1.00 |
| NewCustomizedOptions | Job-VONQSL | main | 1,198.2 ns | 71.32 ns | 82.13 ns | 1,186.6 ns | 1,073.6 ns | 1,359.2 ns | 1.00 | Base | 0.00 | 0.0436 | 0.0044 | 0.0044 | 485 B | 1.00 |
| NewCustomizedOptions | Job-TPJGAG | PR | 1,216.8 ns | 54.65 ns | 62.93 ns | 1,232.2 ns | 1,098.2 ns | 1,331.7 ns | 1.02 | Same | 0.09 | 0.0450 | 0.0045 | 0.0045 | 486 B | 1.00 |
| NewCustomConverter | Job-VONQSL | main | 19,317.6 ns | 422.06 ns | 469.12 ns | 19,359.9 ns | 18,459.7 ns | 20,078.9 ns | 1.00 | Base | 0.00 | 0.7585 | 0.0758 | - | 8003 B | 1.00 |
| NewCustomConverter | Job-TPJGAG | PR | 19,038.8 ns | 330.91 ns | 293.35 ns | 19,082.1 ns | 18,337.0 ns | 19,371.3 ns | 0.98 | Same | 0.03 | 0.7327 | 0.0733 | - | 8013 B | 1.00 |
| NewCachedCustomConverter | Job-VONQSL | main | 1,391.5 ns | 51.00 ns | 58.73 ns | 1,392.6 ns | 1,297.3 ns | 1,511.8 ns | 1.00 | Base | 0.00 | 0.0475 | 0.0053 | 0.0053 | 543 B | 1.00 |
| NewCachedCustomConverter | Job-TPJGAG | PR | 1,234.4 ns | 55.97 ns | 64.45 ns | 1,234.7 ns | 1,119.0 ns | 1,359.3 ns | 0.89 | Faster | 0.06 | 0.0489 | 0.0049 | 0.0049 | 544 B | 1.00 |

jkotas (Member) commented Oct 13, 2022

I do not have a strong opinion on this. The code delta looks ok. I am not sure whether it is worth it based on your data. It makes the case that feels more common a bit slower, and the case that feels less common a bit faster.

eiriktsarpalis (Member, Author) commented Oct 13, 2022

I would argue that it regresses the case we're more likely to see in an isolated microbenchmark, while improving the scenarios we're more likely to see in an application that saturates the cache with variable configurations and false-negative equality comparisons. It makes performance more predictable, though, as you point out, not drastically so.

jkotas (Member) commented Oct 13, 2022

If somebody does not understand the importance of caching the options and uses converters, I think they are likely to write code like this, which is not getting improved at all: https://github.com/dotnet/performance/blob/4dad164bbeebd3356f35ecfe2b42a10b75561427/src/benchmarks/micro/libraries/System.Text.Json/Serializer/ColdStartSerialization.cs#L56-L60

eiriktsarpalis (Member, Author)

That's a more general criticism of the global cache. We can certainly have a conversation about whether such a cache is even necessary, and pivot our strategy towards an analyzer or even advocate for APIs that avoid that particular pit of failure.

jkotas (Member) commented Oct 13, 2022

The global cache exists to help with the most common patterns of JsonSerializerOptions misuse. I expect that the likelihood of the different patterns (sorted from the most common to the least common) is:

  1. NewDefaultOptions
  2. NewCustomizedOptions
  3. NewCustomConverter
  4. NewCachedCustomConverter

My point is that this change makes case 2, which I believe to be more common, slower, and case 4, which I believe to be less common, faster. It does not sound right to optimize for the less common cases at the cost of making the more common cases slower.

eiriktsarpalis (Member, Author) commented Oct 13, 2022

> 1. NewDefaultOptions

Why would somebody want to pass a default instance without parameterization? It's functionally equivalent to passing no options parameter at all.
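For concreteness, a small sketch of that equivalence (the anonymous payload is just for illustration):

```csharp
using System.Text.Json;

var value = new { Name = "example" };

// Functionally equivalent: a fresh default options instance adds nothing
// over omitting the parameter, beyond an extra cache lookup.
string withoutOptions = JsonSerializer.Serialize(value);
string withDefaults = JsonSerializer.Serialize(value, new JsonSerializerOptions());
```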

> makes case 4, which I believe to be less common, faster

The example is completely artificial and unlikely to ever materialize in real code; what the benchmark does represent, however, is a cache hit with nontrivial configuration.

stephentoub (Member) commented Oct 13, 2022

> whether such a cache is even necessary, and pivot our strategy towards an analyzer (#65396) or even advocate for APIs (#74492) that avoid that particular pit of failure

We should do the analyzer regardless of this cache.

Also, while instance JsonSerializer methods might help, I fully expect they'll just kick the can down the road a bit. We'll start to see folks new'ing up a JsonSerializer for each operation, just as they do today for an options bag. It might be a bit more intuitive that the serializer itself has resources associated with it, but these resources are so expensive comparatively that I think we'll still find a non-trivial number of folks new'ing it up each time because "what's one more object".

eiriktsarpalis (Member, Author)

Maybe, but I generally believe that API names have suggestive power. A class called "options" tends to be interpreted as a throwaway property bag (in fact, most users express surprise when we tell them that JsonSerializerOptions encapsulates a caching context). A type called "serializer", on the other hand, could suggest something stateful that is expensive to create and therefore worth reusing. It wouldn't completely prevent folks from doing this, however (just as nothing prevented folks from creating new HttpClient instances on every request).

But then again, perhaps this ship has sailed.

jkotas (Member) commented Oct 13, 2022

> Why would somebody want to pass a default instance without parameterization? It's functionally equivalent to passing no options parameter at all.

Ok, you can delete this case from my list.

Do we have examples of real code where people used non-trivial configuration with non-cached json options that would be improved by this PR?

The Blazor code that motivated this change looks like the NewCustomConverter case and so it won't see any improvements: https://github.com/dotnet/aspnetcore/blob/442a380854c25b85252a743c3ba2e21f08fa6879/src/JSInterop/Microsoft.JSInterop/src/JSRuntime.cs#L31-L44

eiriktsarpalis (Member, Author)

NB: NewCustomConverter does register a performance improvement, even though percentage-wise it is negligible given the exorbitant cost of metadata recalculation. The wider point, though, is that this change clearly improves performance in all cases where you're not expected to get a cache hit in the first couple of elements of the cache array.

jkotas (Member) left a comment

> The wider point, though, is that this change clearly improves performance in all cases where you're not expected to get a cache hit in the first couple of elements of the cache array.

Makes sense.
