Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegexOptions.NonBacktracking fixed-length markers never triggered #65532

Closed
stephentoub opened this issue Feb 17, 2022 · 10 comments
Closed

RegexOptions.NonBacktracking fixed-length markers never triggered #65532

stephentoub opened this issue Feb 17, 2022 · 10 comments

Comments

@stephentoub
Copy link
Member

stephentoub commented Feb 17, 2022

Running our regex tests with code coverage shows this for SymbolicRegexMatcher.FindMatch:
image
This suggests that there's something very wrong with that logic, as we definitely have tests that should trigger it, and it should either be fixed or all the "watchdog" logic deleted.

cc: @olsaarik, @veanes

@stephentoub stephentoub added this to the 7.0.0 milestone Feb 17, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 17, 2022
@ghost
Copy link

ghost commented Feb 17, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

Running our regex tests with code coverage shows this for SymbolicRegexMatcher.FindMatch:
image
This suggests that there's something very wrong with that logic, as we definitely have tests that should trigger it, and it should either be fixed or deleted.

cc: @olsaarik, @veanes

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

@veanes
Copy link
Contributor

veanes commented Feb 18, 2022

@olsaarik The watchdog logic may have become something that is no longer triggered because of all the other optimizations.
If this is the case it should be deleted, it may also cause conflict inside ordered-or. We'll discuss it also next week (Tuesday) on our separate weekly meeting with Olli.

@stephentoub
Copy link
Member Author

If we end up deleting it, presumably it would be sound to extend the phase 3 skipping based on whole-pattern fixed length from #65531 to also skip phase 2. That's not as comprehensive as the "watchdog" was meant to be, but it will get some of the cases, and it'd be easy.

@veanes
Copy link
Contributor

veanes commented Feb 21, 2022

I did some debugging. It seems there is a change in the converter code that at some point turned this feature off, so watchdogs are not inserted although they should be inserted:
RegexNodeToSymbolicConverter.cs:
where the concatenation topLevel:false in line 240, I think this is a bug
not sure when or why this happened:

                case RegexNodeKind.Capture when node.N == -1:
                    int captureNum;
                    if (_caps == null || !_caps.TryGetValue(node.M, out captureNum))
                        captureNum = node.M;
                    return _builder.MkCapture(Convert(node.Child(0), topLevel: false), captureNum);

I believe the correct code should be the following where topLevel is the input parameter of the Convert method, unless there is another reason why topLevel was turned off (related to captures maybe?)
@olsaarik could also take a look at this.

                case RegexNodeKind.Capture when node.N == -1:
                    int captureNum;
                    if (_caps == null || !_caps.TryGetValue(node.M, out captureNum))
                        captureNum = node.M;
                    return _builder.MkCapture(Convert(node.Child(0), topLevel), captureNum);

Watchdogs were only appended after top level concatenations, but here this info is turned off, so that later on, any potential watchdog is never appended even when fixed length is clearly detected.

@olsaarik
Copy link
Contributor

@veanes Good find. This was not intended and that Convert call should indeed just pass in topLevel.

@veanes
Copy link
Contributor

veanes commented Feb 22, 2022

@olsaarik thanks for confirming this. I'll fix it with my next edits and also add an extra unit test I was using that should clearly trigger it, no matter what other optimizations are in place.

@stephentoub
Copy link
Member Author

The value is now being just passed along for a capture:

return _builder.CreateCapture(ConvertToSymbolicRegexNode(node.Child(0), tryCreateFixedLengthMarker), captureNum);

but it's still not being hit:
image

@stephentoub stephentoub added bug and removed untriaged New issue has not been triaged by the area owner labels Feb 28, 2022
@stephentoub
Copy link
Member Author

I was hoping this may have been addressed by #68199, but alas, it was not:
image

@olsaarik
Copy link
Contributor

olsaarik commented Jun 7, 2022

This was fixed in #69839. Fixed length marker use was also further improved in #70217.

@olsaarik olsaarik closed this as completed Jun 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants