Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Oss-Fuzz] Adding fuzzer-stl_operations #2160

Closed
wants to merge 26 commits into from
Closed

[Oss-Fuzz] Adding fuzzer-stl_operations #2160

wants to merge 26 commits into from

Conversation

tanuj208
Copy link
Contributor

@tanuj208 tanuj208 commented Jun 3, 2020

I went through the codebase and added a new fuzz target - fuzzer-stl_operations to improve the coverage of the existing fuzz targets.

This fuzz target tests the following functionality -

  • conversion from various STL containers (vector, deque, list, forward_list, sets and maps) to json
  • Various looping methods in json vectors and json maps
  • get() and at() methods for value access
  • push_back() and emplace_back() methods for json vectors
  • key() and value() methods for accessing elements inside json maps
  • parsing directly from vectors

By doing so, the coverage goes up by
Line coverage: 284 lines
Function coverage: 31 functions
Region coverage: 89 regions

Coverage report

To get this coverage report, I did the following:

  • Forked this repo, added the new fuzz target & updated Makefiles
  • Cloned google'e repo - oss-fuzz, changed the destination of json's repo to my forked repo in dockerfile and used their script helper.py to build fuzzer.
  • Downloaded the test corpus from oss-fuzz for existing fuzz targets.
  • Ran the new fuzz target for approximately 30-40 minutes and ran the same helper.py script to get coverage stats.

Let me know if there is any incorrect usage of the API

@tanuj208 tanuj208 requested a review from nlohmann as a code owner June 3, 2020 19:35
@coveralls
Copy link

coveralls commented Jun 3, 2020

Coverage Status

Coverage remained the same at 100.0% when pulling 0afebec on tanuj208:develop into 7444c7f on nlohmann:develop.

@nlohmann
Copy link
Owner

nlohmann commented Jun 4, 2020

I may not understand the issue completely, but that change does not really increase the coverage of the parsers, but only adds some more constructors. In the end, any JSON input will be processed by the same parser - regardless whether it was read from a file, a string, a vector, etc.

How did you come up with the coverage numbers? Do you have insights into fuzzing? I never digged too deep into this.

@FrancoisChabot
Copy link
Contributor

FrancoisChabot commented Jun 4, 2020

I do think that these targets bring a tiny bit of value because the parser is technically a different block of code in each of them. @nlohmann is correct that they will never catch programmer errors within the library's codebase that will not be caught by the existing fuzz tests. However, it's theoretically possible to run into a compiler bug that would exhibit itself through the interaction of a complex-ish iterator implementation and the parser.

Whether this is something worth throwing compute at is debatable considering how unlikely such a scenario is.

Regardless of all that, these coverage numbers are "slightly" misleading since the tests causes additional template instantiations that are never-ever going to be seen in the wild. The ordered containers: vector, list, deque, forward_list I can buy into. But the set ones are just nonsensical.

Edit: wait, I'm suddenly confused. From the title of this PR, I thought this was fuzzing the parser, but it makes no sense since parsing non-contiguous containers is not merged into the develop branch yet (#2145). Attempting the fuzz the parser on any of these except for std::vector<> should cause a compile error.

This is fuzzing the adl_serializer<>, which is a different thing from the parser. I don't think you are fuzzing what you think you are fuzzing. Either that, or you need to rename things.

@nlohmann
Copy link
Owner

nlohmann commented Jun 5, 2020

So far, all value from the fuzzers were to detect issues during parsing. I do not see value in extending the fuzzing to the input adapters, because they can be unit tested 100%. Though, technically, we would just wasting Google's resources here, I'd rather improve the coverage of the existing fuzz targets if it is not 100% yet.

@tanuj208 tanuj208 marked this pull request as draft June 5, 2020 15:38
Tanuj Garg and others added 24 commits June 7, 2020 14:46
@tanuj208 tanuj208 changed the title [Oss-Fuzz] Adding parse_stl_fuzzer [Oss-Fuzz] Adding fuzzer-stl_operations Jun 7, 2020
@tanuj208
Copy link
Contributor Author

tanuj208 commented Jun 7, 2020

@nlohmann and @FrancoisChabot thank you very much for your comments. I apologize for the wrong and confusing name of the fuzz target and lack of explanation. I have updated this pull request with more explanation about the functions that this fuzz target tests. I have also added how did I come up with the coverage stats.

I am an intern at Google and writing/improving fuzz target for this repo is a part of my internship project. I learned about fuzzing only a few days back.

I got the coverage stats for each of the functions in this fuzz target, and I found out that the improvement in coverage because of parsing or direct conversion from STL containers was very little, like you mentioned. The main improvement came because of loops in JSON map & JSON vectors, and because of methods like push_back and get.

I request you to please go through the pull request again and decide if it is of value.

@tanuj208 tanuj208 marked this pull request as ready for review June 7, 2020 11:28
@nlohmann
Copy link
Owner

nlohmann commented Jun 8, 2020

@tanuj208 That is cool to hear! I really appreciate the effort Google already put in checking this project with OSS-Fuzz.

I have a question: How exactly is the project integrated into OSS Fuzz? Do you just execute make fuzzers and then execute the generated binaries? I'm always afraid to break anything, hence I never touched these lines. It would be great to know more about this so I can document this properly for the future.

@FrancoisChabot
Copy link
Contributor

FrancoisChabot commented Jun 8, 2020

I'm going to reiterate (and hopefully better reword) my concern that I seriously suspect this new fuzzer does not fuzz what you think it fuzzes:

If I compile and run the following:

#include "nlohmann/json.hpp"
#include <iostream>

int main() {
    std::string data = "[1, 3, 4]";
    std::vector<uint8_t> vec(data.begin(), data.end());

    // parsing from STL containers
    nlohmann::json j_vector(vec);

    std::cout << j_vector << "\n";
}

The result is:

[91,49,44,32,51,44,32,52,93]

That's because initializing a json object with a vector of characters does not parse these characters, it creates a json list with one number entry per character. The values in the data are not interpreted, just stored as-is inside the json object.

Changing the contents of the vector (and the same goes for any stl container passed this way) cannot change which code paths are executed, and that makes it a bad candidate for fuzzing.

Now. if you did: auto j_vector = nlohmann::json::parse(vec); Then that would be a whole different matter, and fuzzing would make a lot of sense, since the data in the vector is actually interpreted.

@nlohmann
Copy link
Owner

nlohmann commented Jun 8, 2020

I agree with @FrancoisChabot - however, exactly this check (parsing from a vector of bytes from OSS Fuzz) is already performed by parse_afl_fuzzer.

Copy link
Owner

@nlohmann nlohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fuzzing makes no sense like this.

std::unordered_multiset<uint8_t> umultiset(data, data + size);

// parsing from STL containers
json j_vector(vec);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a constructor call, not a parse call. Instead of interpreting the bytes in the containers to create a JSON value, the containers are interpreted as JSON array. As such, no real library code is tested, but just copy/move constructors from STL containers.

@tanuj208
Copy link
Contributor Author

tanuj208 commented Jun 9, 2020

@nlohmann and @FrancoisChabot, I understand what you were trying to explain and why the fuzz target I wrote cannot change code paths with different inputs. I will close this pull request for now. I am very thankful for all the quick replies.

@nlohmann regarding your question about OSS Fuzz, @oliverchang has more context, I will let him respond.

@nlohmann
Copy link
Owner

nlohmann commented Jun 9, 2020

If there is any way to improve the current fuzzing, e.g. with a grammar or so, please let us know!

@FrancoisChabot
Copy link
Contributor

A huge +1 to adding a grammar to the fuzzer as per https://github.com/google/AFL#9-fuzzer-dictionaries, especially considering afl appears to come pre-packaged with a json dictionary.

That being said, this is a property of how the fuzzer is invoked rather than how it is built, which seems to be external to this repo.

@oliverchang
Copy link

@tanuj208 That is cool to hear! I really appreciate the effort Google already put in checking this project with OSS-Fuzz.

I have a question: How exactly is the project integrated into OSS Fuzz? Do you just execute make fuzzers and then execute the generated binaries? I'm always afraid to break anything, hence I never touched these lines. It would be great to know more about this so I can document this properly for the future.

Json is integrated into OSS-Fuzz here. We do essentially just execute make fuzzers.

@tanuj208
Copy link
Contributor Author

After adding the JSON dictionary, the coverage did not increase since fuzzer was running for a long time and after trying millions of random inputs, it picked up the JSON format by itself. But since it is good to have a dictionary, I added that in oss-fuzz repository. It should get merged soon.

@tanuj208 tanuj208 closed this Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants