Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coda_expression_fuzzer: Stack-overflow in print_expression #66

Closed
schwehr opened this issue May 7, 2020 · 5 comments
Closed

coda_expression_fuzzer: Stack-overflow in print_expression #66

schwehr opened this issue May 7, 2020 · 5 comments
Milestone

Comments

@schwehr
Copy link
Contributor

schwehr commented May 7, 2020

(gdb) bt
#0  print_expression (expr=<optimized out>, print=<optimized out>, xml=<optimized out>, html=<optimized out>, precedence=<optimized out>)
    at third_party/stcorp_coda/libcoda/coda-expr.c:4196
#1  0x00005555560b9a3b in coda_expression_print (expr=0x0, print=0xffffe9a4b80) at third_party/stcorp_coda/libcoda/coda-expr.c:4876
#2  0x0000555556053145 in LLVMFuzzerTestOneInput (
    data=0x7ffff49dc800 "1--1--1--1--1--1--1--1--1--1--1--1--3--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--0--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1-1--1--1--1--1--1--1--1--1--"..., size=142921) at third_party/stcorp_coda/fuzz/coda_expression_fuzzer.cc:27
AddressSanitizer:DEADLYSIGNAL
=================================================================
==60967==ERROR: AddressSanitizer: stack-overflow on address 0x7ffd986efff8 (pc 0x7faf5348d055 bp 0x7ffd986f0050 sp 0x7ffd986f0000 T0)
    #3 0x5556b767602b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4198:5
    #4 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #5 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #6 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #7 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #8 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #9 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #10 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #11 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #12 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #13 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #14 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
[SNIP]
    #250 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #251 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #252 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #253 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #254 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #255 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13

SUMMARY: AddressSanitizer: stack-overflow 
==60967==ABORTING

One possible solution is to modify print_expression to track depth

static int print_expression(const coda_expression *expr, int (*print) (const char *, ...), int xml, int html,
                            int precedence)
{

Would become

static int print_expression(const coda_expression *expr, int (*print) (const char *, ...), int xml, int html,
                            int precedence, int depth)
{
    if (depth > 32) {
        coda_set_error(CODA_ERROR_INVALID_FORMAT, "Too many sub expressions");
        return -1;
    }
    depth++;

And it would start with things like this:

int coda_expression_print_html(const coda_expression *expr, int (*print) (const char *, ...))
{
    const int depth = 0;
    return print_expression(expr, print, 1, 1, 15, depth);
}

The fuzzer:

int printf_black_hole(const char* fmt, ...) {
  va_list args;
  va_end(args);
  return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  coda_init();
  auto done = absl::MakeCleanup([] { coda_done(); });

  const std::string exprstring(reinterpret_cast<const char *>(data), size);

  coda_expression *expr = nullptr;
  coda_expression_from_string(exprstring.c_str(), &expr);

  if (!expr) return 0;

  coda_expression_print(expr, printf_black_hole);
  coda_expression_delete(expr);

  return 0;
}

testcase-4902997930278912.zip

@svniemeijer
Copy link
Contributor

I am not sure that this is something we can 'resolve' easily in the code. There are no guarantees about the size of the stack. This is purely compiler and os dependent.
Even some arbitrary depth limit of 32 is not guaranteed to solve the problem (with the added downside that it will result in valid expressions to become unprintable, which is definitely a bug).

As far as I know, stack overflow errors are just a thing that can happen when using recursive functions. The only real solution would be to not use recursive functions and rewrite these functions using loops that use the heap.
But is this really worth the effort? How problematic is the raising of a stack overflow error in case of malconstructed input?

@schwehr
Copy link
Contributor Author

schwehr commented May 8, 2020

In my situation, a stack overflow is something I can't allow processes to do. So I have to put a fix in and I'd rather it match what is in the upstream repository. A non-recursive solution isn't critical for the short term. Why not make a solution that will almost always allow the default build to print a bogus expression, but that I can easily override in my build? In my compiler settings, I can just add "-DCODA_MAX_RECURSION=64" (or whatever number works for my memory configuration) e.g.

#ifndef CODA_MAX_RECURSION
// May OOM or stack-overflow with this many levels of recursion
#define CODA_MAX_RECURSION 100000
#endif

and

static int print_expression(const coda_expression *expr, int (*print) (const char *, ...), int xml, int html,
                            int precedence, int depth)
{
    if (depth > CODA_MAX_RECURSION) {
        coda_set_error(CODA_ERROR_INVALID_FORMAT, "too many levels of recursion in expression");
        return -1;
    }
    depth++;

Some of the reasons behind limiting the recursion:

  1. Crashing is worse than returning a cannot finish printing error for most users in my experience
  2. A stack overflow is a security risk
  3. Deep recursion can be used as a denial-of-service (DOS) for servers using coda
  4. Chasing down a bug in production can be really difficult when there are thousands of workers traversing millions of files. Being able to log an error is easier to track than dealing with a stack trace from a crash
  5. In a highly threaded environment, allowing a larger stack size is expensive (e.g. most of my processes have > 50 threads)
  6. Unlimited recursion slows down fuzzer progress for finding other bugs

See also this discussion which is similar:

nlohmann/json#832 (comment)

@svniemeijer
Copy link
Contributor

Ok. I understand the rationale.

The problem is that this is not just about the print_expression() function. It also applies to the evaluation and delete functions for expressions.
So, if we want to limit the depth we should do this in the construction of the expression tree (i.e. somewhere within coda_expression_from_string). I will have a go at this.

@svniemeijer
Copy link
Contributor

Implemented in 2cde1d8

@schwehr
Copy link
Contributor Author

schwehr commented May 10, 2020

Verified

@svniemeijer svniemeijer added this to the 2.22 milestone May 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants