Fix handling of backtracking stack with some loops #79353

Merged (1 commit) on Dec 7, 2022

stephentoub
Member

@stephentoub stephentoub commented Dec 7, 2022

With both RegexOptions.Compiled and the Regex source generator, Regex greedy loops with

  • a minimum bound of at least 2
  • no child constructs that backtrack
  • and a child that's more than a one/notone/set (aka things that match a single character)

can leave state on the backtracking stack when:

  • at least one iteration of the loop successfully matches
  • but too few iterations match to satisfy the minimum bound, so matching the loop fails

In that case, if an earlier construct in the pattern pushed state onto the backtracking stack and expects to pop and use that state when it is backtracked to, it may instead pop the erroneously leftover state. Execution can then go awry because it gets back an unexpected value. That can lead to false positives, false negatives, or exceptions such as an IndexOutOfRangeException from trying to pop too much from the backtracking stack.

We already remember the backtracking stack position when we initially enter the loop so that we can reset to that position later on. The fix is simply to also perform that reset when the match of such a loop fails in these circumstances.
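
To illustrate the failure mode and the fix described above, here is a minimal toy model in Python. It is not the actual code emitted by RegexOptions.Compiled or the source generator; the names `run_loop`, `backtrack_to_previous`, and `reset_on_failure` are hypothetical, and the assumption that each iteration pushes exactly one value is a simplification.

```python
def run_loop(stack, iterations_matched, min_iterations, reset_on_failure):
    """Toy model of a greedy loop with a minimum bound.

    Assumption: each successful iteration pushes one bookkeeping value
    onto the shared backtracking stack. Returns True if the loop matched.
    """
    start_pos = len(stack)  # stack position remembered on entering the loop
    for i in range(iterations_matched):
        stack.append(("loop-iteration", i))

    if iterations_matched < min_iterations:
        # The loop fails: some iterations matched, but not enough.
        if reset_on_failure:
            del stack[start_pos:]  # the fix: reset to the remembered position
        return False
    return True


def backtrack_to_previous(reset_on_failure):
    """An earlier construct pushes state, the loop fails, then the
    earlier construct pops what it believes is its own state."""
    stack = [("previous-construct", 42)]
    matched = run_loop(stack, iterations_matched=1, min_iterations=2,
                       reset_on_failure=reset_on_failure)
    assert not matched
    return stack.pop()


buggy = backtrack_to_previous(reset_on_failure=False)
fixed = backtrack_to_previous(reset_on_failure=True)
print(buggy)  # leftover loop state is popped instead of the expected value
print(fixed)  # the earlier construct gets its own state back
```

Without the reset, the earlier construct pops the loop's leftover entry. In the real engine, a mismatch like this is what can surface as a wrong match result or as an attempt to pop past the bottom of the stack.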

Fixes #79298, but I think we should backport this to release/7.0.

@ghost

ghost commented Dec 7, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels: area-System.Text.RegularExpressions

Milestone: 8.0.0

@stephentoub stephentoub merged commit 13a9a3c into dotnet:main Dec 7, 2022
@stephentoub stephentoub deleted the fixregexloopstackreset branch December 7, 2022 20:33
@stephentoub
Member Author

/backport to release/7.0

@github-actions
Contributor

github-actions bot commented Dec 7, 2022

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3642612581

@ghost ghost locked as resolved and limited conversation to collaborators Jan 7, 2023
Successfully merging this pull request may close these issues.

IndexOutOfRangeException when using compiled and ignorecase regex in .NET 7.0
2 participants