-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reimplement PCRE2_UNREACHABLE() assertions with a safer approach #490
base: master
Are you sure you want to change the base?
Conversation
One comment: the
I think the implementation in this PR only meets point 1 but not point 2; but if that is not correct, please mention. |
You are correct, there is no production implementation and therefore there is no optimization hint either (except for DEBUG builds by the mentioned attribute that is expected from abort(), but that of course is implementation dependent). Interestingly enough from all the places where it was used, the only place where it would make a difference was the function fixed in #489, and the difference wouldn't be good for our users. If the assertion was ever reached, and their library was built with a recent enough compiler in an x86 system, they will get a crash, instead of getting an error, because as you clearly pointed out, the "return" call itself would be removed at compile time; was that behaviour what was expected from that "optimization"? |
In answer to that question, what was expected was that if we assert a certain path is unreachable, that path should really be unreachable, and because the compiler is told that is unreachable, it can (in some cases) generate better code. For example, if you tell the compiler that the If the unreachable path is really unreachable, then there will be no crashes in production (because it will never be reached). On the other hand, if we are not able to guarantee, through whatever method, that our "unreachable" code is really unreachable... that is a problem. 😦 Since PCRE2 has significant installed base, it would be nice if the codebase was at a high enough level of quality that we could have full confidence that the "unreachable" paths are really unreachable. In that case, they could safely benefit from However, if the codebase is not at that level of quality yet, then you are very right that In any case, thanks very much for looking into this issue. Another thought: It would be nice if, in production builds, all instances of |
Just one suggestion: I think the assertion failure message in this PR would be easier to understand if it said something like "Execution must not reach this point" or "Execution reached unexpected point". |
Allow showing the internal value for PCRE2_DEBUG in the summary, just like is done for ./configure
The original asserts weren't very useful in debug mode as they were lacking information on where they were being triggered and were also unreliable and dangerous as they could result in important code being removed and trigger crashes. Instead of implementing one generic assert for both modes, build a more useful one for each mode, but to make sure that the non production paths are not being unnecessarily eliminated, allow for a parameter that could be used to indicate if that functionality is desired or not. Reinstate all original assertions to use that instead, and make sure to set the parameter to a safe value in the one that is known to cause problems with the previous code.
Most of the uses of `PCRE2_UNREACHABLE(0)` are at the end of `case` and therefore in non debug builds, the assertion "should" tell the compiler that a "fall back" is not possible, but the version of gcc used in Ubuntu 22.04 has a bug and will instead see the assertion as additional code that doesn't have a `break` after and therefore trigger `-Wimplicit-fallthrough` warnings instead. Update the job to use Ubuntu 24.04 that provides gcc 13.2 and that doesn't have the bug anymore, and while at it update all jobs to error on warnings so that failures will be more visible.
Reaching an unreachable code is always a big problem, since you cannot write tests to see what happens. Hence whatever we do, we probably have issues. I don't mind unreachable in matching code, because its purpose is high performance. For pattern compilation, we could use errors, although the problem is the same: we cannot test them. |
I would prefer no unreachables in production, as they are additional code, their behaviour changes from compiler to compiler, and even between versions of the same compiler and have bugs, but agree that since they are meant to be really unreachable (assuming the original selection was done carefully enough) they could be used to maybe improve performance. Anyway, as requested by Alex updated the series for further discussion. |
Btw I would prefer an assertion before all internal errors. When I develop code, and get those errors, it is too hard to find their sources. |
#endif | ||
|
||
#ifdef PCRE2_DEBUG | ||
|
||
#if defined(HAVE_ASSERT_H) && !defined(NDEBUG) | ||
#include <assert.h> | ||
#define PCRE2_ASSERT(x) assert(x) | ||
#elif defined(HAVE_STDLIB_H) && defined(HAVE_STDIO_H) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this additional checks were apparently introduced by mistake, which is why the final version might also remove them and why they are not being used in the new debug assert for PCRE2_UNREACHABLE() below.
the code was just reformatted so it can be easier to read and also compare with the proposed similar implementation.
Carlo, how would this be:
Just a suggestion. |
Suggestion: Aside from PCRE2_ASSERT, can we add another macro which expands to a failing assertion (abort) in debug builds, but This macro must only be used in functions which return |
|
||
#ifndef PCRE2_UNREACHABLE | ||
#ifdef PCRE2_DEADTRAP | ||
#define PCRE2_UNREACHABLE(d) if (d == 0) PCRE2_DEADTRAP() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. So the parameter to PCRE2_UNREACHABLE indicates whether we have confidence that the code is "really unreachable" or not.
Interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess that is a way to put it, but what it really indicates to me is, "this unreachable should NEVER abort in non debug builds and should NEVER be considered as a hint for code elimination either, it is there ONLY to be used in debug builds to catch bugs, as all assertions should be.
sure, fork away and go ahead, I promise not to rebase this until all discussion is settled and review your code to suggest improvements. note though that having different assertion per type of function is going to be messy, since each function has different ways to report those errors, so IMHO might be easier to open code each as needed using |
Posted mainly for discussion on the implementation details and split on three patches to make discussion easier.
First patch is just a somehow unrelated cleanup that is useful to test the change with
cmake
Second patch shows the proposed changes that implement safe assertions that could be also used in non debug builds as optimization hints, and adds a simpler version of the fix proposed in #489 on top as a prove of concept. Ideally, this series (once agreed upon, fully audited for safety and tested) could be rebased on top of that change.
Third patch is a workaround for a gcc bug that the proposed fix will trigger and that affects CI.