Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new API function pcre2_set_optimization() for controlling enabled optimizations #471

Merged
merged 1 commit into from
Sep 21, 2024

Conversation

alexdowad
Copy link
Contributor

It is anticipated that over time, more and more optimizations will be added to PCRE2, and we want to be able to switch optimizations off/on, both for testing purposes and to be able to work around bugs in a released library version.

The number of free bits left in the compile options word is very small. Hence, we will start putting all optimization enable/disable flags in a separate word. To switch these off/on, the new API function pcre2_set_optimization() will be used.

The values which can be passed to pcre2_set_optimization() are different from the internal flag bit values. The values accepted by pcre2_set_optimization() are contiguous integers, so there is no danger of ever running out of them. This means in the future, the internal representation can be changed at any time without breaking backwards compatibility. Further, the 'directives' passed to pcre2_set_optimization() are not restricted to control a single, specific optimization. As an example, passing PCRE2_FULL_OPTIMIZATION will turn on all optimizations supported by whatever version of PCRE2 the client program happens to be linked with.

@alexdowad
Copy link
Contributor Author

There are obviously a couple of things missing here (notably, documentation), but I am pushing it to allow for review and comments.

@alexdowad
Copy link
Contributor Author

One thing I would like to ask: Are there any other existing option bits which are used to turn optimizations on/off? Is it just the 3 which I have already identified?

src/pcre2test.c Outdated Show resolved Hide resolved
src/pcre2test.c Outdated Show resolved Hide resolved
src/pcre2_match.c Show resolved Hide resolved
@@ -646,6 +647,7 @@ typedef struct pcre2_real_code {
uint16_t top_backref; /* Highest numbered back reference */
uint16_t name_entry_size; /* Size (code units) of table entries */
uint16_t name_count; /* Number of name entries in the table */
uint32_t optimization_flags; /* Optimizations enabled at compile time */
Copy link
Contributor

@carenas carenas Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this code, but this shouldn't be needed here, and indeed the fact that it is needed so that PCRE2_NO_START_OPTIMIZE doesn't break is telling us IMHO there is a problem somewhere else.

this is also why I didn't like the idea of getting all optimizations together, since there are a lot more but those 3 (and JIT) are the only ones that are externally visible because have specific flags to disable them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's more than one way to look at this issue. In any case, any specific suggestion for improvement would be welcome.

src/pcre2_internal.h Outdated Show resolved Hide resolved
src/pcre2_context.c Outdated Show resolved Hide resolved
src/pcre2_compile.c Outdated Show resolved Hide resolved
src/pcre2_compile.c Outdated Show resolved Hide resolved
src/pcre2.h.generic Show resolved Hide resolved
src/pcre2.h.generic Outdated Show resolved Hide resolved
@alexdowad
Copy link
Contributor Author

Philip, Carlo, Zoltan: As usual, the code review on this project is great!
Thanks very much... I will go through the comments one by one now.

@alexdowad
Copy link
Contributor Author

Pushing an update which addresses many of the points raised by reviewers.
I would love to ask once more: Do we have any other existing toggles which turn optimizations on/off? Is it just the 3 which I have included already?

src/pcre2.h.generic Outdated Show resolved Hide resolved
@zherczeg
Copy link
Collaborator

Conversion is perfect, since it happens internally. During compile, the optimizations flags, which are already present as compile flags are "or'ed" to compile flags, so whichever is set, it activates the optimization (they don't need to have the same index). This should be easy since only a few flags are affected. For these optimizations the already present infrastructure will handle them (the optimization flags can be overwritten by compile flags). For newer optimizations, the optimization flagset is directly used.

@alexdowad
Copy link
Contributor Author

Just a couple of thoughts here...

Right now, the default optimization behavior of PCRE2 is "enable all optimizations by default, disable selectively using option bits". For various reasons, it might become desirable to deviate from this behavior in the future.

Reason 1: There is more than one metric which can be optimized for. It's possible to optimize for small memory footprint at the cost of CPU time, or vice versa. Also, it's possible to optimize for low average CPU time at the cost of worst-case CPU time, or vice versa.

Reason 2: There may be certain "dangerous" optimizations, which improve performance at the cost of disabling certain features, or causing observable behavior changes. (For example: While DFA-based matching is desirable for consistent worst-case performance, the documentation for pcre2_dfa_match states that some features may behave slightly differently as compared to pcre2_match.)

For both of these reasons, I don't think we can assume that the default behavior will always be to enable all optimizations. Sooner or later, some may need to be opt-in rather than opt-out.

For this reason, I am thinking of adding a directive called PCRE2_OPTIMIZATION_DEFAULT which will restore the "default" settings, whatever those are. For now, PCRE2_OPTIMIZATION_DEFAULT would effectively be the same as PCRE2_OPTIMIZATION_FULL, but that might not be true forever.

Any comments?

src/pcre2.h.in Outdated Show resolved Hide resolved
@alexdowad
Copy link
Contributor Author

OK, latest contributions from Carlo and Zoltan have been incorporated.

I hope to add tests and documentation tomorrow.

Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few questions.

@@ -11005,7 +11051,7 @@ used in this code because at least one compiler gives a warning about loss of
"const" attribute if the cast (PCRE2_UCHAR *)codestart is used directly in the
function call. */

if (errorcode == 0 && (re->overall_options & PCRE2_NO_AUTO_POSSESS) == 0)
if (errorcode == 0 && (optim_flags & PCRE2_OPTIM_AUTO_POSSESS))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many cases the == 0 and != 0 are removed. Is this a new coding style? We should eventually update all cases them, except for pointers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just trying to make the code shorter and didn't see any reason to include != 0. However, I can add it back, that isn't a problem.

I am a new contributor to this project and don't know all the local conventions yet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of my "fetishes" -- I only ever use if (x) when x is a Boolean; if it's not I use if (x != 0) to emphasize that it is a numerical value. Similar to my use of if (pointer != NULL) to emphasize that the variable is a pointer (though in principle, NULL need not be zero, so this is more correct C).

src/pcre2_compile.c Show resolved Hide resolved
@PhilipHazel
Copy link
Collaborator

Alex wrote:

I wonder if it might not be a good idea to have some unit tests which are written in C, rather than as pcre2test scripts?? Just a thought...

I have no objection - we do already have pcre2_posix_test.c and pcre2_jit_test.c, but remember that any such program has to work with every possible combination of 8, 16, or 32 bit libraries.

@carenas
Copy link
Contributor

carenas commented Sep 15, 2024

For this reason, I am thinking of adding a directive called PCRE2_OPTIMIZATION_DEFAULT which will restore the "default" settings, whatever those are. For now, PCRE2_OPTIMIZATION_DEFAULT would effectively be the same as PCRE2_OPTIMIZATION_FULL, but that might not be true forever.

I don't see a use for this and even believe it is likely to cause confusion. The default is currently all on and when that changes you will still use PCRE2_OPTIMIZATION_FULL to override that default if you so desire so.

This API call (just like the other pcre2_set_* calls) are meant to be called only in a context that hasn't been used to compile the expression you want to affect and that will be followed by a pcre2_compile() call that uses that context. For the optimization flags, that compilation might disable some of them (using VERBS in the expression), but that context is done. Having that option (specially as some optimizations have effect at match time) might seem to imply that you can use it to affect the compiled expression optimizations further IMHO.

On second thought, if you are concerned about reusing the context WHEN the default is no longer full, it might be better to wait until then to add this IMHO.

@alexdowad
Copy link
Contributor Author

OK, well, we have tests now.

Next is to add documentation.

@carenas
Copy link
Contributor

carenas commented Sep 16, 2024

OK, well, we have tests now.

The code is now using the new field, so why is pcre2test still setting the bits in the pcre2_compile() option parameter instead of calling the new API? for the original modifiers?

Sure; we can only do the API call if we have a context, so to test the "legacy" codepath need to probably piggyback on the null_context modifier, and we need a way to test turning off all optimizations so there is a need for the optimization_none modifier, but something more generic like optimization=0 might work better long term IMHO.

Still think that an API with an extra BOOL parameter to turn on/off the optimization makes more sense as well.

@alexdowad
Copy link
Contributor Author

The code is now using the new field, so why is pcre2test still setting the bits in the pcre2_compile() option parameter instead of calling the new API?

PCRE2 needs to both support the new pcre2_set_optimization API, as well as the legacy option bits, so pcre2test now has modifiers which allow us to test both of those.

@zherczeg
Copy link
Collaborator

We should also test combinations, when a compile and optimization flag contradict to each other (compile flag should be stronger).

Since we might need to add lots of test for different optimizations in the future, we could even start a new input/output file for them.

@carenas
Copy link
Contributor

carenas commented Sep 16, 2024

now has modifiers which allow us to test both of those.

I wasn't arguing about having more modifiers, although was explaining that we could do with almost none by abusing what we currently have, but that the additional modifiers are only used sparingly and therefore the new code is barely exercised. But changing what the current modifiers do testing will be more comprehensive, and of course adding tests that go through the "legacy" path was also expected.

@alexdowad
Copy link
Contributor Author

We should also test combinations, when a compile and optimization flag contradict to each other (compile flag should be stronger).

Ah, good point. Let me add that right now.

@alexdowad
Copy link
Contributor Author

But changing what the current modifiers do testing will be more comprehensive, and of course adding tests that go through the "legacy" path was also expected.

Ah, true. That is a good point.

Hmm. I do think that for PCRE2 developers, it makes things a bit easier if the pcre2test modifiers have the same name as the corresponding option, and the names of the pcre2_set_optimization() directives are currently like PCRE2_AUTO_POSSESS_OFF...

Hmm. 🤔

@alexdowad
Copy link
Contributor Author

Ah, good point. Let me add that right now.

Done, you'll see it on the next push.

@alexdowad
Copy link
Contributor Author

But changing what the current modifiers do testing will be more comprehensive

Just thinking about this...

The tests do demonstrate that pcre2_set_optimization turns the optimization flags on and off as expected. Since the code for pcre2_set_optimization is almost trivially simple, is that not enough?

I think what needs more thorough testing is that the actual optimizations behave as expected and do not break anything. But we already have existing tests for that. Right now we are just testing a different way of switching them on and off.

@alexdowad
Copy link
Contributor Author

Documentation changes. I have not scrutinized these in detail, but what I noticed is that you are changing the overall style of the documentation...

Thanks for the feedback. 👍 Please check the update which I have just pushed.

@PhilipHazel
Copy link
Collaborator

On a fairly quick read (I know we want to get on with this) that looks good. Minor style nitpick is that you refer to pcre2_match rather than pcre2_match(). Are we now ready to merge this PR?

Copy link
Collaborator

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good patch, only a few minor things remained.

src/pcre2_context.c Show resolved Hide resolved
src/pcre2_context.c Show resolved Hide resolved
doc/html/pcre2api.html Show resolved Hide resolved
@alexdowad
Copy link
Contributor Author

On a fairly quick read (I know we want to get on with this) that looks good. Minor style nitpick is that you refer to pcre2_match rather than pcre2_match().

Thanks for noting that. Fixed.

@alexdowad
Copy link
Contributor Author

I know we want to get on with this

Yes, we do want to get on with this, but to the extent possible, we also want to get the API "right" the first time rather than making breaking changes later. So by all means, let's invite more comments from reviewers especially if they concern the design of the API, as visible to client programmers.

Non-breaking improvements to the implementation can be done any time.

doc/html/pcre2api.html Show resolved Hide resolved
backtracks into a+ that can never be successful. However, if callouts are in
use, auto-possessification means that some callouts are never taken. You can
set this option if you want the matching functions to do a full unoptimized
search and run all the callouts, but it is mainly provided for testing
purposes.
purposes. (It is recommended to use <b>pcre2_set_optimize</b> instead.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should mention the directive to use as well, and probably also mention that if both are given this one takes precedence.

the message should also be in its own paragraph instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know why I can't add a reply to the above comment about PCRE2_ERROR_BADOPTION. Anyways, what I would like to say is that, while the other "set" functions return PCRE2_ERROR_BADDATA on invalid input, I feel that the symbolic name BADDATA does not describe the actual error well in this case. For other "set" functions it does make sense; for example, if the value passed for "maximum heap size" or "maximum variable lookbehind length" is invalid, we could (in natural human language) describe that as "bad data". In this case, my preferred name for the error code would be something like PCRE2_UNKNOWN_DIRECTIVE. Do you think it would be a good idea to add a new error code for this reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion re: PCRE2_ERROR_NULL is fine and I have implemented it.

doc/html/pcre2api.html Show resolved Hide resolved
src/pcre2_context.c Show resolved Hide resolved
src/pcre2_dfa_match.c Outdated Show resolved Hide resolved
src/pcre2_jit_compile.c Outdated Show resolved Hide resolved
src/pcre2_match.c Outdated Show resolved Hide resolved
@@ -3925,6 +3934,21 @@ for (;;)
else *((uint32_t *)field) |= m->value;
break;

case MOD_OPTMZ:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI there is a macro to help with these code, but agree is not needed if it is only called once (which I was hoping would change, though)

testdata/testinput2 Show resolved Hide resolved
@alexdowad
Copy link
Contributor Author

Converted pcre2_set_optimize to use switch as suggested by Carlo.

@alexdowad
Copy link
Contributor Author

Just pushed another update.

doc/pcre2api.3 Show resolved Hide resolved
doc/pcre2api.3 Outdated Show resolved Hide resolved
doc/pcre2pattern.3 Outdated Show resolved Hide resolved
src/pcre2_compile.c Outdated Show resolved Hide resolved
src/pcre2_compile.c Outdated Show resolved Hide resolved
@@ -10432,6 +10439,18 @@ if (patlen > ccontext->max_pattern_length)
return NULL;
Copy link
Contributor

@carenas carenas Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic above that restrict PCRE2_LITERAL needs to be also likely updated for when the optimizations are disabled through the new API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. So it shouldn't be allowed to enable/disable optimizations when the pattern is literal? But some optimizations, like START_OPTIMIZE, still make sense on a literal pattern.

I appreciate the suggestion but don't see a good reason right now to make a change in this area.

Copy link
Contributor

@carenas carenas Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it shouldn't be allowed to enable/disable optimizations when the pattern is literal?

I think you misread the code; it is more like you can't use the PCRE2_LITERAL option unless those two optimizations are enabled. Not sure if there might be another similar restrictions but @PhilipHazel could probably explain the rationale for this one better, or if there are any other.

don't see a good reason right now to make a change in this area.

It would be safer if you do IMHO, specially considering that the "legacy" modifiers still set the "legacy" bits ONLY, so we will need to construct a new test to exercise this new codepath an therefore there is nothing in the current tests that would cover it, even if it is obviously broken, as this (and their equivalent for the other optimization) should behave the same and don't:

PCRE2 version 10.45-DEV 2024-06-09 (8-bit)
  re> //literal,no_auto_possess
Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL
  re> //literal,auto_possess_off
data> 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding further detail. I will read the relevant section of code again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I looked at the code again and it still seems that my first reading was correct.

PUBLIC_COMPILE_OPTIONS lists all the bits which can normally be set in options; but if PCRE2_LITERAL is set, then only the bits in PUBLIC_LITERAL_COMPILE_OPTIONS can be set, or else compilation fails with an error. PCRE2_NO_AUTO_POSSESS and PCRE2_NO_DOTSTAR_ANCHOR are not in PUBLIC_LITERAL_COMPILE_OPTIONS, so those two options cannot be used to disable optimizations for literal patterns.

Doubtless this is because the 2 optimizations concerned do not apply to literal patterns anyways, so it doesn't make much sense to specifically disable them in that case.

Again, this raises the question: should we block users from enabling/disabling certain optimizations in the compile context for literal patterns? And again, my opinion is: no, I don't think there is a good reason to do that.

One reason is because the compile context can be re-used for multiple calls to pcre2_compile(). It might happen that some of those calls are for literal patterns, while others are not.

So again, from my viewpoint, this is a desired difference in behavior between the legacy option bits and the new API function. However, if other key contributors agree that the use of pcre2_set_optimize() should be restricted if the compile context is subsequently used for a literal pattern... well and good, I can implement it in that case.

Copy link
Contributor

@carenas carenas Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The seeming "inconsistency" in this case has already been looked

My bad, should had used a different word; the "inconsistency" I am referring to, is that the code and documentation don't match, let me know if you need me to provide a patch to help drive this discussion.

Zoltan's suggestion is the same as the current behavior of this PR.

I know, and that is why I mentioned it as the first option that we could take.
We could also decide on a combination of them, for example letting the PCRE2_NO_DOTSTAR_ANCHOR to not fail pcre2_compile(), is also a very minimal change and apparently safe, from what you reported earlier. Eitherway all the assumptions we are making should be documented.

Tests have already been added confirming that the new API does not cause an error

A test that shows the error from pcre2_compile() while using the "legacy" modifiers as I proposed in the original comments would be nice as well, since we are "committing" to that as the expected behaviour going forward.
Tests that show possible edge cases reflected from your auditing of the code even better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carlo, I'm happy to update the documentation, but I'm not sure which part is currently not matching the code. Can you clarify?

I will add tests showing the error when the legacy modifiers are used.

Copy link
Contributor

@carenas carenas Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new optimization flags should not change this. I.e. if an optimization is not allowed in LITERAL, that optimization is ignored in the new option flags.

Instead of ignoring to disable the optimization, it is actually disabling it (even if it is known to conflict with LITERAL)

@alexdowad: could this be corrected and documented?

FWIW the test part of that documentation should show (maybe using BI modifiers, as it is done for other cases) that the optimization isn't disabled when the literal modifier is used as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carlo, thanks for explaining further what you meant. Further tests have been added showing what happens when the legacy options are used.

I'm glad that you are carefully checking what is needed for documentation. I wholeheartedly agree that for a library like PCRE2, documentation is of utmost importance. We both feel the same on that issue. 👍🏻

For the documentation, I can say there is nothing which needs to be added regarding this issue. I hope this does not give the (false) impression that I want to push this PR through quickly, while sweeping issues under the rug; that is not the case. I'm not sure to what extent you understand the auto-possess and dotstar anchor optimizations, but please bear with me as I explain these. I believe that if the following points are understood, we will be able to successfully conclude this discussion, with everyone on the same page.

Let's start with AUTO_POSSESS. What does this optimization do? It takes a regex like /x+y/ and converts it to /x++y/. This does not help performance when the regex matches, but when it doesn't match, the amount of useless backtracking is reduced, so the match can fail faster and move on to another position.

Now, could that optimization ever apply to a literal regex? No, it is impossible! Why? Because a literal regex can never contain any quantifiers! So how would it be possible to convert quantifiers to possessive ones? Remember, a literal regex can only contain literal characters. The regex /x+y/, if compiled with PCRE2_LITERAL, will just match a literal x, a literal +, and a literal y.

How about DOTSTAR_ANCHOR? This optimization takes a regex like /.*abc/ and marks it as matching only at the beginning of a line. Then the regex engine won't try matching it in the middle of a line; it will skip to the beginning of the next line instead.

Could that ever apply to a literal regex? No, it is impossible! But why? Think about it: a literal regex can never contain META_DOT (it can only contain a literal period). And a literal regex can never include the quantifier star either (it can only contain a literal asterisk). So how could dotstar anchoring ever be done to a literal regex? It cannot!

So both AUTO_POSSESS and DOTSTAR_ANCHOR, regardless of whether they are enabled or disabled, could never have any effect whatsoever on a literal regex. They are completely just no-ops in that case.

Does it make more sense now why pcre2_compile() does not allow the PCRE2_NO_AUTO_POSSESS and PCRE2_NO_DOTSTAR_ANCHOR options to be used for literal patterns? It is because these options are useless on literal regexes.

Does it also make more sense now why pcre2_set_optimize() allows PCRE2_{AUTO_POSSESS,DOTSTAR_ANCHOR}{,_OFF} regardless of whether we are compiling literal regexes or not? It is because the pcre2_compile_context can be re-used multiple times, and even if it is used to compile a literal regex now, the same compile context might be used for a non-literal regex next time.

Note this point: The options argument to pcre2_compile fundamentally applies to one compilation only. However, the directive argument to pcre2_set_optimize applies to a series of one or more compilations. So it is not surprising that while certain useless combinations are restricted from appearing in options, directive is handled a bit differently.

The documentation already accurately explains that only a few options may be used together with PCRE2_LITERAL, not including PCRE2_NO_{AUTO_POSSESS,DOTSTAR_ANCHOR}. That is good! At the same time, the documentation does not say anything about a relationship between pcre2_set_optimize() and PCRE2_LITERAL, because no such relationship exists. That is correct, and intended.

Hopefully it is clearer now that pcre2_set_optimize is not "disabling the optimization rather than ignoring to disable it". It is also not true that these optimizations "are known to conflict with PCRE2_LITERAL".

Carlo, your code review on this PR has been great! While this particular issue was a "red herring" (meaning something which was distracting and not relevant), much of your other feedback was excellent. I can only say "2 thumbs up". 👍🏻 👍🏻

Copy link
Contributor

@carenas carenas Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, and indeed (again) kudos in a very good "first" contribution.

I don't think it is a "red herring" though, as we both missed the point Zoltan was making originally and that implied the code also needed an update. Note that the proposed code changes are on pcre2_compile() so I hope it helps clarify what the intended behaviour is and why "allowing" or not to disable those two optimizations in pcre2_set_optimize() was never a question.

To make discussion on the specifics of that easier had prepared it as alexdowad#8, which completes the suggested behavior in the code and documents it so users of the new API aren't surprised by the "apparent" difference of behaviour between the legacy and new entrypoints.

I agree with you that the behaviour might be different, but must be documented somewhere.

Also agree with you that the code might be able to work with or without the optimization anyway, but since the code without the optimizations was blocked by the pcre2_compile() error, then making sure we keep using the same code paths for the new API, only makes this proposed API "migration" safer and allow us to merge it now without further delay or the need of additional code audits in the current code.

@alexdowad
Copy link
Contributor Author

Pushed another update.

@alexdowad
Copy link
Contributor Author

Can others share any further comments on what error code to return from pcre2_set_optimize() if an invalid directive is passed? Right now it's PCRE2_ERROR_BADOPTION. Should I switch to another one, or create a new error code specifically for this case?

@zherczeg
Copy link
Collaborator

https://github.com/PCRE2Project/pcre2/blob/master/src/pcre2_match.c#L6764

PCRE2_ERROR_BADOPTION means the option is unknown, although the name is a bit strong. PCRE2_ERROR_NULL is returned if the pattern is not set. That means the current return values are ok for me.

For the function body, the large switch does not look nice for me. For generic options, such as FULL, the switch is perfect, but for option bits, I liked the old code (which could be placed in the default case, and returns with BADOPTION if the bit is not known).

As for LITERAL, it is ok if the compile limits the flags (we should not change this), but for compile context, which is a generic configuration store, there should be no limitations. The unrelated bits can be simply ignored. This is true for other members: if malloc is not needed by a function, it simply does not use it.

@alexdowad
Copy link
Contributor Author

Just updating code to match Zoltan's preference on the use of switch...

@alexdowad
Copy link
Contributor Author

Pushed another update.

@carenas
Copy link
Contributor

carenas commented Sep 19, 2024

Can anyone see anything else which has been missed??

A minor issue and easy to miss is that the generated html documentation is not linked correctly; had prepared alexdowad#7 where the first commit show the missing bits that could be cherry-picked over to fix it.

That branch is a little old, and so the continuous rebasing of this PR makes it an odd target for a simple merge/rebase but hope you find also the other commits useful otherwise.

… optimizations

It is anticipated that over time, more and more optimizations will be
added to PCRE2, and we want to be able to switch optimizations off/on,
both for testing purposes and to be able to work around bugs in a
released library version.

The number of free bits left in the compile options word is very small.
Hence, we will start putting all optimization enable/disable flags in
a separate word. To switch these off/on, the new API function
pcre2_set_optimization() will be used.

The values which can be passed to pcre2_set_optimization() are
different from the internal flag bit values. The values accepted by
pcre2_set_optimization() are contiguous integers, so there is no
danger of ever running out of them. This means in the future, the
internal representation can be changed at any time without breaking
backwards compatibility. Further, the 'directives' passed to
pcre2_set_optimization() are not restricted to control a single,
specific optimization. As an example, passing PCRE2_OPTIMIZATION_FULL
will turn on all optimizations supported by whatever version of
PCRE2 the client program happens to be linked with.

Co-Authored-By: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Co-Authored-by: Zoltan Herczeg <hzmester@freemail.hu>
@alexdowad
Copy link
Contributor Author

Incorporated the addition to index.html (sorry that I missed that point) and some other helpful additions from Carlo. Pushed another update.

@PhilipHazel
Copy link
Collaborator

Is this ready to go now?

@zherczeg
Copy link
Collaborator

It is good for me. If something comes up, we can fix it later. Before the next release, we can even change the api if needed, but I think we discussed everything.

@carenas
Copy link
Contributor

carenas commented Sep 20, 2024

I think we discussed everything

I think that considering there are almost 150 comments, we probably discussed too much, but also run the risk of having overlooked, misunderstood or not resolved some of the points made as well.

Merging it now, will hopefully help having a less volatile base to work on though, and hopefully will lead to less clutter in the communication about them.

@alexdowad
Copy link
Contributor Author

Is this ready to go now?

Yes, it's ready.

@PhilipHazel PhilipHazel merged commit b868b41 into PCRE2Project:master Sep 21, 2024
13 checks passed
PhilipHazel added a commit that referenced this pull request Sep 21, 2024
@alexdowad alexdowad deleted the optflags branch September 21, 2024 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants