Port Regex Boyer-Moore fix to Preview 8 #39422
Conversation
@dotnet/dnceng ?
That's strange. Let me check.
Have seen this before in both github and AzDO, it's usually a replication delay in the backing storage where one side thinks it's created files and the other side is fetching from a replicated version. Not much we can do when it's in github other than retry.
@MattGal was it retrying and I somehow interrupted it? or it retried a few times and still no luck?
Unlucky snap point?
@eiriktsarpalis these errors seem related to cd8759d ?
By the time I looked that build was deleted from more commits being pushed, but this can succeed on retry.
@danmosemsft the build error is tracked by this issue #39444
@jaredpar one of the builds here did not complete in 60 minutes and timed out - OSX: https://dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_apis/build/builds/734197/logs/22
Summary
Fix #39390
Some simple regex patterns fail to match. In this case the pattern "H#" would not match "#H#" iff both options were specified: RegexOptions.IgnoreCase | RegexOptions.Compiled.
Because the pattern contains a literal prefix (indeed it is the entire pattern) we will use Boyer-Moore to find the first instance of it. (One could imagine a more efficient way to search for a 2-character prefix.) Because IgnoreCase was passed, we lowercase the pattern immediately to "h#", and when we compare against a character in the text, we must lowercase that character as well.
As a performance optimization, in the Compiled path, we avoid calling ToLower on the text candidate if we can cheaply verify that the character we are searching for is not affected by case conversion. In this case, for example, we need not bother to lowercase the text candidate character when we are searching for "#" because it is in a UnicodeCategory ("OtherPunctuation") which we know is not affected by case conversion. This optimization, like many others, does not exist in the non-Compiled path.
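The category check described above can be modeled in a few lines. This is only an illustrative Python sketch, not the .NET code: `is_case_invariant` and the set `CASE_INVARIANT_CATEGORIES` are hypothetical names, and the exact list of categories the Compiled path treats as safe is an assumption, since the description names only OtherPunctuation (`Po` in Python's `unicodedata`).

```python
import unicodedata

# Assumed set of general categories whose characters have no case mapping.
# Only "Po" (OtherPunctuation) is named in the description above; the rest
# are a conservative guess for illustration.
CASE_INVARIANT_CATEGORIES = {"Nd", "Po", "Pd", "Ps", "Pe", "Zs"}


def is_case_invariant(ch: str) -> bool:
    """Return True if lowercasing a text candidate can be skipped
    when searching for the pattern character `ch`."""
    return unicodedata.category(ch) in CASE_INVARIANT_CATEGORIES


# '#' is OtherPunctuation, so no lowercasing is needed when searching for it.
print(is_case_invariant("#"))
# 'h' is a lowercase letter, so the text candidate must be lowercased.
print(is_case_invariant("h"))
```

The point of the check is that it is made against the pattern character being searched for, which is known up front, so the cost of lowercasing each text character is avoided wherever it cannot change the outcome.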
The bug was that when deciding whether to lowercase the text candidate, we were examining the last character of the prefix rather than the character we were actually searching for. In this repro case the last character is "#", so when searching for "h" we would not lowercase the text candidate "H", and the comparison failed.
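To make the failure concrete, here is a small Python model of the comparison logic. It is a sketch under stated assumptions, not the actual Regex implementation: it uses a plain right-to-left comparison at every position instead of real Boyer-Moore skip tables, the names `find_prefix`, `is_case_invariant`, and the `buggy` flag are hypothetical, and the category set is the same assumed one as before.

```python
import unicodedata

# Assumed set of categories unaffected by case conversion (illustrative only).
CASE_INVARIANT_CATEGORIES = {"Nd", "Po", "Pd", "Ps", "Pe", "Zs"}


def is_case_invariant(ch: str) -> bool:
    return unicodedata.category(ch) in CASE_INVARIANT_CATEGORIES


def find_prefix(text: str, pattern: str, buggy: bool) -> int:
    """Case-insensitive search for `pattern` in `text`; returns the index or -1.

    With buggy=True, the decision to skip lowercasing is based on the LAST
    character of the prefix instead of the character being compared.
    """
    prefix = pattern.lower()  # the pattern is lowercased once, up front
    last = prefix[-1]
    for start in range(len(text) - len(prefix) + 1):
        matched = True
        # Compare right to left, as Boyer-Moore does.
        for i in range(len(prefix) - 1, -1, -1):
            wanted = prefix[i]
            decider = last if buggy else wanted
            candidate = text[start + i]
            if not is_case_invariant(decider):
                candidate = candidate.lower()
            if candidate != wanted:
                matched = False
                break
        if matched:
            return start
    return -1


print(find_prefix("#H#", "H#", buggy=True))   # -1: "H" is never lowercased
print(find_prefix("#H#", "H#", buggy=False))  # 1: "H" is lowercased to "h"
```

In the buggy mode the decision is always driven by "#", so the "H" in the text is compared against "h" without being lowercased and the search reports no match. Driving the decision from the character actually being compared restores the match at index 1.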
Customer Impact
This blocks Bing updating to the latest build.
Regression
Certainly yes
Risk
Very low. The change is localized and the problem well understood. I added a test that fails without this fix.