Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/7.0] Fix handling of backtracking stack with some loops #79361

Merged
merged 3 commits into from
Jan 4, 2023

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Dec 7, 2022

Backport of #79353 to release/7.0

/cc @stephentoub

Customer Impact

Some more complicated regex patterns fail to execute properly when using RegexOptions.Compiled or the regex source generator. This can amount to false positive matches, false negative matches, or even unexpected exceptions. This was reported in an issue using a pattern @"(ver\.? |[_ ]+)?\d+(\.\d+){2,3}$". It's a regression between .NET 6 and .NET 7 with RegexOptions.Compiled.

This can affect a pattern containing a greedy loop with:

  • a minimum bound of at least 2
  • no child constructs that backtrack
  • and a child that's more than a one/notone/set (aka things that match a single character)

That loop can end up leaving extraneous state on the backtracking stack when:

  • at least one iteration of the loop successfully matches
  • but not enough iterations match to make the loop successful such that matching the loop fails

In that case, if a previous construct in the pattern pushed any state onto the backtracking stack such that it expects to be able to pop off and use that state upon backtracking to it, it will potentially pop the erroneously leftover state and try to use it.

We already have the ability to remember the backtracking stack position when we initially enter the loop so that we can reset to that position later on. The fix is simply to extend that to also perform that reset when failing the match of such a loop in such circumstances.

Testing

Existing inner and outer loop regex tests, plus new test cases based on the repro case.

Risk

The change is very isolated and relatively straightforward to prove is correct from a reasoned perspective, but this is also a complicated region of code, and given all the possible combinations involved, it's feasible but unlikely something unexpected will be affected. More likely is there are additional such corner cases that we're still missing; this won't fix those, but it won't make them worse.

With both RegexOptions.Compiled and the Regex source generator, Regex greedy loops with
- a minimum bound of at least 2
- no child constructs that backtrack
- and a child that's more than a one/notone/set (aka things that match a single character)

are possibly leaving state on the backtracking stack when:
- at least one iteration of the loop successfully matches
- but not enough iterations match to make the loop successful such that matching the loop fails

In that case, if a previous construct in the pattern pushed any state onto the backtracking stack such that it expects to be able to pop off and use that state upon backtracking to it, it will potentially pop the erroneously leftover state.  This can then cause execution to go awry, as it's getting back an unexpected value.  That can lead to false positives, false negatives, or exceptions such as an IndexOutOfRangeException due to trying to pop too much from the backtracking stack.

We already have the ability to remember the backtracking stack position when we initially enter the loop so that we can reset to that position later on.  The fix is simply to extend that to also perform that reset when failing the match of such a loop in such circumstances.
@ghost
Copy link

ghost commented Dec 7, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #79353 to release/7.0

/cc @stephentoub

Customer Impact

Testing

Risk

IMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

Author: github-actions[bot]
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -

@stephentoub stephentoub added the Servicing-consider Issue for next servicing release review label Dec 7, 2022
Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation and code comments; helped me understand what otherwise would have been obscure to me.

@stephentoub stephentoub added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Dec 8, 2022
This extends the fix made earlier around proper handling of the backtracking stack in EmitLoop.  If this loop is inside of another loop, then we need to preserve the startingStackpos on the backtracking stack upon successfully completing the loop, as the outer loop might iterate and cause another occurrence of this loop to run, which will then overwrite our single startingStackpos local.  If that iteration backtracks, we then need to be able to pop the previous iterations startingStackpos and restore its value.
@stephentoub stephentoub removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Dec 8, 2022
@stephentoub stephentoub added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Dec 8, 2022
@stephentoub stephentoub added this to the 7.0.x milestone Dec 8, 2022
@carlossanlop carlossanlop modified the milestones: 7.0.x, 7.0.3 Jan 4, 2023
@carlossanlop
Copy link
Member

Approved by Tactics (7.0.3).
Signed off by area owners.
CI failure is known/unrelated/already fixed: #78778 .
No OOB changes needed: neither the src nor the gen csprojs have IsPackable=true.
Ready to merge. :shipit:

@carlossanlop carlossanlop merged commit 235508d into release/7.0 Jan 4, 2023
@carlossanlop carlossanlop deleted the backport/pr-79353-to-release/7.0 branch January 4, 2023 21:42
@ghost ghost locked as resolved and limited conversation to collaborators Feb 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants