Add RegexFindOptimization for embedded strings #67907
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue Details
Currently, if a pattern begins with a multi-character string, we'll use IndexOf with that substring to find the next possible match location. But if that multi-character string is a non-zero fixed distance into the pattern, we won't see it. This changes that, letting us find such strings based on the fixed-distance sets we're already gathering, and using IndexOf for it.
From dotnet/performance:
There are ~170 patterns in our corpus that benefit from this.
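To illustrate the idea described above, here is a minimal Python sketch. It is not the RegexFindOptimizations code from this PR; the function name `count_with_fixed_offset_literal` and its parameters are hypothetical, and Python's `str.find` stands in for IndexOf. It assumes the caller already knows a literal that appears at a fixed offset from the start of every match, for example "shing" at offset 1 in a pattern such as `[a-z]shing`.

```python
import re


def count_with_fixed_offset_literal(pattern, literal, offset, text):
    """Count non-overlapping matches of `pattern` in `text`.

    Assumption (supplied by the caller, not derived here): every match
    contains `literal` exactly `offset` characters after its start.
    """
    regex = re.compile(pattern)
    count = 0
    pos = 0  # earliest index at which the next match may start
    while True:
        # A match starting at or after `pos` has its literal at or after
        # `pos + offset`, so a plain substring search can skip ahead.
        hit = text.find(literal, pos + offset)
        if hit < 0:
            return count
        start = hit - offset  # candidate match start implied by the literal
        m = regex.match(text, start)
        if m:
            count += 1
            pos = m.end()  # resume after the match (matches are non-empty)
        else:
            pos = start + 1  # candidate failed; try the next occurrence


if __name__ == "__main__":
    sample = "fishing, washing and 1shing; pushing"
    print(count_with_fixed_offset_literal(r"[a-z]shing", "shing", 1, sample))
```

Searching from `pos + offset` rather than `pos` keeps the candidate start from landing before `pos`, so an occurrence of the literal at the very beginning of the input is skipped instead of producing a negative start index.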
3eab3c6 to b3076dd
.../System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexFindOptimizations.cs
...ibraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Small comments but looks good otherwise. Great to see those nice improvements.
Improvements on ubuntu x64: dotnet/perf-autofiling-issues#4747
Improvements on all configurations for a variety of tests. Here's the most drastic improvement: System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig.Count(Pattern: "[a-z]shing", Options: None)