1.0.68 introduces memory leak #546
Upon further testing, it seems the memory leak was introduced in 1.0.68. Reverting to 1.0.67 fixes the problem.
Can you still please try to run the JDK 17 JVM you seem to be using as the Scala/Ammonite runtime platform with the java command-line option "-XX:+HeapDumpOnOutOfMemoryError"? 😄 This would allow Java developers to pinpoint the place where the memory is lost rather than looking for a needle in a haystack... 😢
Thanks for the quick response. I can't send a heap dump based on the current JSON source I'm using; I'll need to run something over the JSON to replace many strings with random characters. But I can work on that. I was hoping that you might have some JSON files and schemas nearby yourself to use with the script I posted, as I doubt the exact content of my JSON matters for triggering the bug, just that the file is fairly large.
If you can't disclose the whole heap dump due to the private contents of its strings, can you maybe just list the major findings from opening it in a heap-dump analyzer (like Eclipse MAT), so that we have a chance to see objects of which class, and in which map, accumulate without limit over the course of time, regardless of their particular content?
Sure, I can do that. It's 1:30am here though, so it'll be "tomorrow" for me when I get to it. Thanks.
Also, I have built a redacted JSON file, so I'm happy to share a heap dump with you privately if you like.
Sorry, I myself don't have time until this evening (Tue, and I'm in Europe/MET) to look into it (just in case nobody else wants to take over in the meantime... 😀), but the heap dump should compress quite well, so I assume you should even be able to reduce -Xmx and attach the redacted/"anonymized" one here... Use the max compression level of e.g. xz, bz2 or zip and attach the compressed dump (max file size here seems to be 25MB). If that's not sufficient, can you prepare/provide a download link for the heap dump? Thx! 😄 But I think even the info from the screenshot in your previous comment will most likely help to find out what's going wrong here in the AnyOfValidator; it's a pretty good starting point... 👍
Sorry - I still haven't received a heap dump from @TJC... 😞 Please be informed that without a readily prepared heap dump to analyze, I will only find time to reproduce the issue myself over the weekend at the earliest... Maybe @prashanthjos / @prashanth-chaitanya, as the authors of the most likely faulty commit mentioned above (identified by bisecting), can help earlier?
My previous offer to share it with you privately still stands. And as I said in an earlier comment, I strongly suspect this issue will be easy to reproduce using your own collection of JSON and JSON Schemas. Maybe just ensure your test schema includes a couple of nested
@TJC so please indeed share it privately with me - how do you plan to do it? I can provide you an https upload link to my GMX web space, or you put it onto some location where I can download it via https. I'll immediately send a private e-mail to the address in your GitHub user profile offering my upload URL. In order to separate credentials from the link for security reasons: the password for the (only privately shared) upload URL contained in the e-mail is "networkNTjson". 😉
Hi @AndreasALoew, @networknt/json-schema-validator - apologies for my late entry into the issue. I made a change to add evaluated and unevaluated properties to the CollectorContext (which is a thread-level context that uses a ThreadLocal). It looks like you are repeatedly calling JSON schema validate in a loop. Can you try resetting the CollectorContext before every call to validate, using CollectorContext.getInstance().reset()? The CollectorContext is not automatically reset or removed from the ThreadLocal; that decision was deliberately left to the user of the framework. Please let me know your thoughts - always happy to help.
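To make the accumulation behaviour described above concrete, here is a minimal, self-contained sketch of a thread-local context that grows on every "validation" unless explicitly reset. `DemoContext`, `validateOnce`, and the collected strings are all hypothetical stand-ins, not the library's real `CollectorContext` API; only the `getInstance()`/`reset()` naming mirrors what is quoted in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's thread-local context; it models
// why state accumulates across validate calls on the same thread.
class DemoContext {
    private static final ThreadLocal<DemoContext> INSTANCE =
            ThreadLocal.withInitial(DemoContext::new);

    // Grows on every "validation" unless reset() is called.
    private final List<String> evaluatedProperties = new ArrayList<>();

    static DemoContext getInstance() { return INSTANCE.get(); }

    void collect(String property) { evaluatedProperties.add(property); }

    int size() { return evaluatedProperties.size(); }

    void reset() { evaluatedProperties.clear(); }
}

public class ResetDemo {
    // Simulates one validate() call that records two evaluated properties.
    static void validateOnce() {
        DemoContext ctx = DemoContext.getInstance();
        ctx.collect("foo");
        ctx.collect("bar");
    }

    public static void main(String[] args) {
        // Without reset: entries accumulate across calls on the same thread.
        for (int i = 0; i < 3; i++) validateOnce();
        System.out.println("without reset: " + DemoContext.getInstance().size());

        // With reset before each call: the context stays bounded.
        for (int i = 0; i < 3; i++) {
            DemoContext.getInstance().reset();
            validateOnce();
        }
        System.out.println("with reset: " + DemoContext.getInstance().size());
    }
}
```

Calling reset before each validation keeps the per-thread state bounded, which is exactly the workaround suggested above.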
Hello @prashanthjos, many thanks for the explanation 😄 My immediate follow-up question to you would be: what is the advantage of not resetting, i.e. "reusing", a CollectorContext across two subsequent calls to validate()? If there is no such advantage, wouldn't it make more sense to clean the thread-local CollectorContext in a finally block immediately before returning from any call to validate()? Or, in case there is a valid scenario, I'd like to propose that we change the default behaviour so that the CollectorContext is cleaned immediately before returning from validate(), but add an additional validate() overload taking a boolean parameter that explicitly tells the framework to keep and NOT clean the CollectorContext at the end of validate(), thereby leaving the decision of when to clean it to the caller (as you seem to have intended). What do you think?
Agreed, @AndreasALoew - we can add additional validate and walk methods that will NOT reset the CollectorContext based on user input. @TJC, can you quickly verify whether resetting the CollectorContext fixes your issue?
... such that the default (i.e. if you continue to use the framework as in previous releases) cleans the context, and if you explicitly would like to keep the CollectorContext (the question remains: why would you want to do so - please explain), then this can be expressed by an additional optional boolean parameter as proposed earlier... The big advantage of doing it this way is that it won't break (i.e. cause OutOfMemory in) existing code using the validator... So @prashanthjos, would you be willing to propose such an additional PR? That'd be just great! 👍
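The API shape proposed above can be sketched as follows. This is only an illustration of the pattern under discussion (clean-by-default in a finally block, with an opt-out overload); `ValidatorSketch` and its methods are hypothetical names, not the actual signatures that ended up in the library.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed API shape: the default validate() cleans the
// thread-local context in a finally block, while an overload lets advanced
// callers keep the collected state across invocations.
public class ValidatorSketch {
    private static final ThreadLocal<List<String>> CONTEXT =
            ThreadLocal.withInitial(ArrayList::new);

    // Default: the context is always cleaned before returning, even on errors.
    public static List<String> validate(String json) {
        return validate(json, true);
    }

    // Opt-out overload: pass false to keep the collected state for the
    // next call on this thread (the uncommon, advanced use case).
    public static List<String> validate(String json, boolean resetContext) {
        try {
            List<String> ctx = CONTEXT.get();
            ctx.add("evaluated:" + json);  // stand-in for real collection
            return new ArrayList<>(ctx);   // snapshot returned to the caller
        } finally {
            if (resetContext) {
                CONTEXT.get().clear();     // default cleanup, as proposed
            }
        }
    }

    public static void main(String[] args) {
        validate("a");
        validate("b");
        // Default behaviour: no accumulation between calls.
        System.out.println("default context size: " + CONTEXT.get().size());

        validate("a", false);
        validate("b", false);
        // Opt-out: state intentionally carries over between calls.
        System.out.println("kept context size: " + CONTEXT.get().size());
    }
}
```

Because the cleanup sits in a finally block, existing callers keep the pre-1.0.68 memory behaviour without code changes, while the boolean overload preserves the cross-validation collection use case.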
The information collected in the CollectorContext is used beyond validation. A simple example could be collecting information about references ($ref) and their data while walking over the schema; or we may want to store specific node information into our database after we validate, since the validation already walks over every schema node and its corresponding data node. I am out of town till Monday (11th April 2022). I will be happy to create a PR once I am back. In the meantime I would request @TJC to test with the CollectorContext reset fix and see if it works.
Thanks for helping out with that - if you check your email now, you should find details about how to access the heap dump :)
@TJC can you please confirm here once you test if CollectorContext.getInstance().reset() works. Thanks!
My demonstration script is calling validate in a loop, yes.

We deployed some code to production that included several dependency updates, including the minor patch version of the JSON Schema validator. In the middle of the night, the systems started running out of memory, causing us to get people out of bed to investigate the issue.
Normally when a significant breaking change like this is made to software, the major version number gets incremented so that users of the software know to pay more attention, e.g. changing the version from 1.0.67 to 2.0.0. Because the version just incremented from 1.0.67 to 1.0.68, we thought it only included minor changes and bugfixes, and would not require us to change how we use the software. I also note that the documentation doesn't seem to reflect this new requirement.
I can confirm that calling reset on the CollectorContext does indeed seem to fix the memory issue. However, I would encourage you to make the reset a default experience rather than an optional one, as I'm sure my company is not the only one that will step on this landmine.
No, it isn't. 😉
@TJC sorry for the inconvenience, and thank you for confirming. The CollectorContext was added long ago, and it is noted in the documentation that data is stored in a ThreadLocal. Please note the issue could be because of the UnevaluatedProperties accumulation in the CollectorContext. The CollectorContext is used in this library for several use cases, like adding defaults. Also, HTTP servers based on Java Servlets tend to reuse the same threads for processing requests, so the underlying ThreadLocal might also be reused, thus not clearing the older CollectorContext from the ThreadLocal. It is always recommended to reset the CollectorContext explicitly. Please note that another class from the library, ValidatorState, is also kept in a ThreadLocal. Also, I see there is no documentation for every new keyword that is added; I can add one for this, though. I would also recommend testing in your test/perf environments with an approximate load replicating production in future. Yes, reset will be the default experience in my PR next week.
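The servlet-container point above is worth seeing in isolation: pooled worker threads are reused across requests, so anything one request leaves in a ThreadLocal is still there when the next request runs on the same thread. The sketch below models this with a single-thread executor; it is a simplified illustration, not servlet or json-schema-validator code, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrates ThreadLocal state surviving across "requests" when the
// same pooled thread handles them all.
public class ThreadReuseDemo {
    private static final ThreadLocal<List<String>> PER_THREAD_STATE =
            ThreadLocal.withInitial(ArrayList::new);

    // A "request handler" that appends to thread-local state without clearing it.
    static int handleRequest(String payload) {
        List<String> state = PER_THREAD_STATE.get();
        state.add(payload);
        return state.size();  // how much state this thread has accumulated so far
    }

    public static void main(String[] args) throws Exception {
        // One worker thread, reused for every task, like a pooled servlet thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            int first  = pool.submit(() -> handleRequest("req-1")).get();
            int second = pool.submit(() -> handleRequest("req-2")).get();
            int third  = pool.submit(() -> handleRequest("req-3")).get();
            // State from earlier "requests" leaks into later ones.
            System.out.println(first + " " + second + " " + third);
        } finally {
            pool.shutdown();
        }
    }
}
```

Each successive "request" sees one more accumulated entry than the last, which is the shape of the leak reported in this issue when validations run on reused server threads.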
When you say it was always recommended... where was that recommendation made? Let me check the documentation again.
Anyway, thank you for making a fix. I look forward to it. |
Yes, it was supposed to be updated. However, we did clearly mention that there are ThreadLocal usages in the library. I will try to make it more explicit in the documentation.
@prashanthjos I can now finally definitively confirm from heap-dump analysis that the memory leak observed is indeed caused by the "com.networknt.schema.CollectorKey" CollectorContext instance and its "collectorMap" HashMap entry with key "com.networknt.schema.UnEvaluatedPropertiesValidator.EvaluatedProperties", which points to an ArrayList that grows indefinitely in thread-local storage when executing multiple JSON validations in a loop from the same worker thread. See the following screenshot from the YourKit profiler for the details. So I fully second the request to do an implicit default reset of the CollectorContext in a finally block immediately before returning from the (common) default validate (and possibly walk) method(s), and to add an additional variant of the validate (and possibly walk) method with a boolean "reset" flag that can be used for the non-default (uncommon) use case of explicitly wanting to collect CollectorContext information throughout the sequential validation of several JSON documents in a row. Many thanks! 😄
FYI, there are some discussions in another thread about how to fix this memory leak issue and proposed design solutions. Please let us know what you think about the solution.
I have merged @prashanthjos' code to the master branch. With the newly introduced config flag, most users won't need to worry about resetting the context. Advanced users can set the flag to false to allow the context to live across multiple validators. I am wondering if we should update the documentation for this flag. Also, let me know if we are ready for a quick release. Thank you, everybody, for taking the effort to get it implemented.
Running with the current master version, I'm not seeing memory exhaustion. That's good. |
@stevehu thank you for merging. I will update the documentation in a couple of days. Thank you @TJC and @AndreasALoew for your patience and for testing with the master change.
This issue is resolved in the 1.0.69 release. Thank you to everyone who worked hard to get this issue resolved.
Version 1.0.68 introduces a memory leak.
When called repeatedly, we see the JSON Schema validator consumes all available memory, until it crashes with
java.lang.OutOfMemoryError: Java heap space
The following script demonstrates it (uses Ammonite on JDK 17, Scala 2.13).
Run the script after setting
JAVA_OPTS=-Xmx256M
This assumes you're parsing some fairly large JSON files (500 kB in my case) and that the JSON Schema is of some complexity. I assume you have some example files like this around to use.
Note that if you revert the version back to 1.0.67, the script runs successfully.