Handle single-node branches in ExtractCommonPrefixNode by danmoseley · Pull Request #124881 · dotnet/runtime

danmoseley · 2026-02-26T02:16:21Z

When alternation branches are reduced to single nodes (e.g., Set[Pp] from a single-child Concatenation after prior prefix extraction), ExtractCommonPrefixNode previously bailed because it required all branches to be Concatenations. This caused IgnoreCase alternation prefix extraction to stop one character short (e.g., htt instead of http for (http|https)).

Fix: Remove the upfront gate check and handle both Concatenation and single-node branches throughout the extraction loop. When a single-node branch matches the common prefix, it is replaced with Empty.

Fixes #124871

Tests: 10 new test cases covering http/https with IgnoreCase, shorter-branch-is-prefix, multi-char difference, 3-branch variants, case-sensitive regression guard, behavioral match correctness, 4-branch mixed single-node/Concatenation, and non-IgnoreCase Set-node branches.
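As a rough illustration of the "behavioral match correctness" case, the sketch below checks only the observable results of the public `Regex` API, which the optimization must leave unchanged. The input strings are chosen here for illustration and are not taken from the PR's test files.

```csharp
using System;
using System.Text.RegularExpressions;

// Prefix extraction is an internal optimization, so match results must stay the same.
var regex = new Regex(@"(?:http|https)://foo", RegexOptions.IgnoreCase);

// Illustrative inputs (assumed for this sketch, not copied from the PR).
string[] inputs = { "http://foo", "https://foo", "HTTP://FOO", "HtTpS://Foo", "htt://foo", "httpx://foo" };

foreach (string input in inputs)
{
    // The first four inputs match; "htt://foo" and "httpx://foo" do not.
    Console.WriteLine($"{input} -> {regex.IsMatch(input)}");
}
```

A check like this cannot tell whether the prefix was extracted as `htt` or `http`; it only guards against the optimization changing what the pattern matches.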

When alternation branches are reduced to single nodes (e.g., Set[Pp] from a single-child Concatenation after prior prefix extraction), ExtractCommonPrefixNode previously bailed because it required all branches to be Concatenations. This caused IgnoreCase alternation prefix extraction to stop one character short (e.g., 'htt' instead of 'http' for (http|https)).

Fix: remove the upfront gate check, and handle both Concatenation and single-node branches throughout the extraction loop.

Fixes dotnet#124871

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Behavioral correctness test for (?:http|https)://foo with IgnoreCase
- 4-branch regression test exercising single-node branch after recursive prefix extraction
- Non-IgnoreCase Set-node branch test via character class alternation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-02-26T02:17:27Z

I believe I follow this, so opening a PR. At least one test fails before the change - the others are just completeness.

-- Dan

Copilot

Pull request overview

This PR fixes a regex optimization issue where alternation prefix extraction stopped prematurely when branches were reduced to single nodes (e.g., Set[Pp]) after prior IgnoreCase prefix extraction. Previously, ExtractCommonPrefixNode required all branches to be Concatenations with at least 2 children, causing patterns like (http|https) with IgnoreCase to extract only "htt" instead of "http".

Changes:

- Removed the upfront gate check that required all branches to be multi-child Concatenations
- Modified extraction logic to handle both Concatenation and single-node branches by checking branch type before accessing children
- When a single-node branch matches the common prefix and is fully extracted, it's replaced with an Empty node
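
The effect of removing the gate can be sketched with a toy model. This is not the `RegexNode` code: here a branch is just a list of node labels, a one-element list stands in for a single-node branch, an empty list stands in for Empty, and the `requireConcatenations` flag is a hypothetical switch that mimics the old gate. The real method works on `RegexNode` trees and has further conditions that are not modelled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, not the RegexNode API): peel off leading nodes
// shared by every branch and return them together with what is left of each branch.
static (List<string> Prefix, List<List<string>> Rest) ExtractCommonPrefix(
    IEnumerable<IEnumerable<string>> alternation, bool requireConcatenations)
{
    var branches = alternation.Select(b => b.ToList()).ToList();
    var prefix = new List<string>();

    while (branches.Count > 0)
    {
        // Old behavior: stop unless every branch still has at least two nodes.
        if (requireConcatenations && branches.Any(b => b.Count < 2))
            break;

        // Nothing left to compare once any branch has been fully consumed.
        if (branches.Any(b => b.Count == 0))
            break;

        string first = branches[0][0];
        if (branches.Any(b => b[0] != first))
            break;

        // Every branch starts with the same node: move it into the prefix.
        prefix.Add(first);
        foreach (var b in branches)
            b.RemoveAt(0);
    }

    return (prefix, branches);
}

// (http|https) under IgnoreCase, written as one set node per character.
string[][] httpOrHttps =
{
    new[] { "[Hh]", "[Tt]", "[Tt]", "[Pp]" },
    new[] { "[Hh]", "[Tt]", "[Tt]", "[Pp]", "[Ss]" },
};

var before = ExtractCommonPrefix(httpOrHttps, requireConcatenations: true);
var after = ExtractCommonPrefix(httpOrHttps, requireConcatenations: false);

Console.WriteLine(string.Concat(before.Prefix)); // [Hh][Tt][Tt]
Console.WriteLine(string.Concat(after.Prefix));  // [Hh][Tt][Tt][Pp]
```

With the gate on, extraction stops as soon as the first branch is down to the single node `[Pp]`. With the gate off, `[Pp]` is extracted too, the first branch becomes empty (the Empty stand-in), and the second is left with `[Ss]`.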

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| RegexNode.cs | Core logic changes to handle single-node branches in `ExtractCommonPrefixNode`, removing gate check and adding branch type checks at lines 1205, 1226, and 1248-1257 |
| RegexFindOptimizationsTests.cs | 10 new test cases covering http/https IC, various alternation scenarios with shared prefixes, case-sensitive regression guard, and Set-node branches |
| Regex.Match.Tests.cs | 3 functional tests verifying correctness of http/https pattern matching with IgnoreCase after the optimization |

danmoseley · 2026-02-26T03:20:21Z

@MihuBot benchmark Regex

danmoseley · 2026-02-26T03:20:27Z

@MihuBot regexdiff

MihuBot · 2026-02-26T03:40:08Z

396 out of 18857 patterns have generated source code changes.

Examples of GeneratedRegex source diffs

"[^'\",]+'[^^']+'|[^'\",]+\"[^\"]+\"|[^,]+" (21563 uses)

[GeneratedRegex("[^'\",]+'[^^']+'|[^'\",]+\"[^\"]+\"|[^,]+")]

  /// <code>[^'",]+'[^^']+'|[^'",]+"[^"]+"|[^,]+</code><br/>
  /// Explanation:<br/>
  /// <code>
-   /// ○ Match with 3 alternative expressions, atomically.<br/>
+   /// ○ Match with 2 alternative expressions, atomically.<br/>
  ///     ○ Match a sequence of expressions.<br/>
  ///         ○ Match a character in the set [^"',] atomically at least once.<br/>
-   ///         ○ Match '\''.<br/>
-   ///         ○ Match a character in the set [^'^] atomically at least once.<br/>
-   ///         ○ Match '\''.<br/>
-   ///     ○ Match a sequence of expressions.<br/>
-   ///         ○ Match a character in the set [^"',] atomically at least once.<br/>
-   ///         ○ Match '"'.<br/>
-   ///         ○ Match a character other than '"' atomically at least once.<br/>
-   ///         ○ Match '"'.<br/>
+   ///         ○ Match with 2 alternative expressions, atomically.<br/>
+   ///             ○ Match a sequence of expressions.<br/>
+   ///                 ○ Match '\''.<br/>
+   ///                 ○ Match a character in the set [^'^] atomically at least once.<br/>
+   ///                 ○ Match '\''.<br/>
+   ///             ○ Match a sequence of expressions.<br/>
+   ///                 ○ Match '"'.<br/>
+   ///                 ○ Match a character other than '"' atomically at least once.<br/>
+   ///                 ○ Match '"'.<br/>
  ///     ○ Match a character other than ',' atomically at least once.<br/>
  /// </code>
  /// </remarks>
                  int matchStart = pos;
                  ReadOnlySpan<char> slice = inputSpan.Slice(pos);
                  
-                   // Match with 3 alternative expressions, atomically.
+                   // Match with 2 alternative expressions, atomically.
                  {
                      int alternation_starting_pos = pos;
                      
                              pos += iteration;
                          }
                          
-                           // Match '\''.
-                           if (slice.IsEmpty || slice[0] != '\'')
+                           // Match with 2 alternative expressions, atomically.
                          {
-                               goto AlternationBranch;
-                           }
-                           
-                           // Match a character in the set [^'^] atomically at least once.
-                           {
-                               int iteration1 = slice.Slice(1).IndexOfAny('\'', '^');
-                               if (iteration1 < 0)
-                               {
-                                   iteration1 = slice.Length - 1;
-                               }
-                               
-                               if (iteration1 == 0)
+                               if (slice.IsEmpty)
                              {
                                  goto AlternationBranch;
                              }
                              
-                               slice = slice.Slice(iteration1);
-                               pos += iteration1;
+                               switch (slice[0])
+                               {
+                                   case '\'':
+                                       
+                                       // Match a character in the set [^'^] atomically at least once.
+                                       {
+                                           int iteration1 = slice.Slice(1).IndexOfAny('\'', '^');
+                                           if (iteration1 < 0)
+                                           {
+                                               iteration1 = slice.Length - 1;
+                                           }
+                                           
+                                           if (iteration1 == 0)
+                                           {
+                                               goto AlternationBranch;
+                                           }
+                                           
+                                           slice = slice.Slice(iteration1);
+                                           pos += iteration1;
+                                       }
+                                       
+                                       // Match '\''.
+                                       if ((uint)slice.Length < 2 || slice[1] != '\'')
+                                       {
+                                           goto AlternationBranch;
+                                       }
+                                       
+                                       pos += 2;
+                                       slice = inputSpan.Slice(pos);
+                                       break;
+                                       
+                                   case '"':
+                                       
+                                       // Match a character other than '"' atomically at least once.
+                                       {
+                                           int iteration2 = slice.Slice(1).IndexOf('"');
+                                           if (iteration2 < 0)
+                                           {
+                                               iteration2 = slice.Length - 1;
+                                           }
+                                           
+                                           if (iteration2 == 0)
+                                           {
+                                               goto AlternationBranch;
+                                           }
+                                           
+                                           slice = slice.Slice(iteration2);
+                                           pos += iteration2;
+                                       }
+                                       
+                                       // Match '"'.
+                                       if ((uint)slice.Length < 2 || slice[1] != '"')
+                                       {
+                                           goto AlternationBranch;
+                                       }
+                                       
+                                       pos += 2;
+                                       slice = inputSpan.Slice(pos);
+                                       break;
+                                       
+                                   default:
+                                       goto AlternationBranch;
+                               }
                          }
                          
-                           // Match '\''.
-                           if ((uint)slice.Length < 2 || slice[1] != '\'')
-                           {
-                               goto AlternationBranch;
-                           }
-                           
-                           pos += 2;
-                           slice = inputSpan.Slice(pos);
                          goto AlternationMatch;
                          
                          AlternationBranch:
                      
                      // Branch 1
                      {
-                           // Match a character in the set [^"',] atomically at least once.
+                           // Match a character other than ',' atomically at least once.
                          {
-                               int iteration2 = slice.IndexOfAny('"', '\'', ',');
-                               if (iteration2 < 0)
-                               {
-                                   iteration2 = slice.Length;
-                               }
-                               
-                               if (iteration2 == 0)
-                               {
-                                   goto AlternationBranch1;
-                               }
-                               
-                               slice = slice.Slice(iteration2);
-                               pos += iteration2;
-                           }
-                           
-                           // Match '"'.
-                           if (slice.IsEmpty || slice[0] != '"')
-                           {
-                               goto AlternationBranch1;
-                           }
-                           
-                           // Match a character other than '"' atomically at least once.
-                           {
-                               int iteration3 = slice.Slice(1).IndexOf('"');
+                               int iteration3 = slice.IndexOf(',');
                              if (iteration3 < 0)
                              {
-                                   iteration3 = slice.Length - 1;
+                                   iteration3 = slice.Length;
                              }
                              
                              if (iteration3 == 0)
                              {
-                                   goto AlternationBranch1;
+                                   return false; // The input didn't match.
                              }
                              
                              slice = slice.Slice(iteration3);
                              pos += iteration3;
                          }
                          
-                           // Match '"'.
-                           if ((uint)slice.Length < 2 || slice[1] != '"')
-                           {
-                               goto AlternationBranch1;
-                           }
-                           
-                           pos += 2;
-                           slice = inputSpan.Slice(pos);
-                           goto AlternationMatch;
-                           
-                           AlternationBranch1:
-                           pos = alternation_starting_pos;
-                           slice = inputSpan.Slice(pos);
-                       }
-                       
-                       // Branch 2
-                       {
-                           // Match a character other than ',' atomically at least once.
-                           {
-                               int iteration4 = slice.IndexOf(',');
-                               if (iteration4 < 0)
-                               {
-                                   iteration4 = slice.Length;
-                               }
-                               
-                               if (iteration4 == 0)
-                               {
-                                   return false; // The input didn't match.
-                               }
-                               
-                               slice = slice.Slice(iteration4);
-                               pos += iteration4;
-                           }
-                           
                      }
                      
                      AlternationMatch:;

"^(http|https)\\://[a-zA-Z0-9\\-\\.]+(:[a-zA- ..." (821 uses)

[GeneratedRegex("^(http|https)\\://[a-zA-Z0-9\\-\\.]+(:[a-zA-Z0-9]*)?(/[a-zA-Z0-9\\-\\._]*)*$", RegexOptions.IgnoreCase)]

  /// ○ 1st capture group.<br/>
  ///     ○ Match a character in the set [Hh].<br/>
  ///     ○ Match a character in the set [Tt] exactly 2 times.<br/>
-   ///     ○ Match with 2 alternative expressions.<br/>
-   ///         ○ Match a character in the set [Pp].<br/>
-   ///         ○ Match a sequence of expressions.<br/>
-   ///             ○ Match a character in the set [Pp].<br/>
-   ///             ○ Match a character in the set [Ss].<br/>
+   ///     ○ Match a character in the set [Pp].<br/>
+   ///     ○ Match a character in the set [Ss] atomically, optionally.<br/>
  /// ○ Match the string "://".<br/>
  /// ○ Match a character in the set [\-.0-9A-Za-z\u212A] greedily at least once.<br/>
  /// ○ Optional (greedy).<br/>
              {
                  int pos = base.runtextpos;
                  int matchStart = pos;
-                   int alternation_branch = 0;
-                   int alternation_starting_capturepos = 0;
-                   int alternation_starting_pos = 0;
                  int capture_starting_pos = 0;
                  int charloop_capture_pos = 0;
                  int charloop_starting_pos = 0, charloop_ending_pos = 0;
                  }
                  
                  // 1st capture group.
-                   //{
+                   {
                      capture_starting_pos = pos;
                      
-                       if ((uint)slice.Length < 3 ||
-                           !slice.StartsWith("htt", StringComparison.OrdinalIgnoreCase)) // Match the string "htt" (ordinal case-insensitive)
+                       if ((uint)slice.Length < 4 ||
+                           !slice.StartsWith("http", StringComparison.OrdinalIgnoreCase)) // Match the string "http" (ordinal case-insensitive)
                      {
                          UncaptureUntil(0);
                          return false; // The input didn't match.
                      }
                      
-                       // Match with 2 alternative expressions.
-                       //{
-                           alternation_starting_pos = pos;
-                           alternation_starting_capturepos = base.Crawlpos();
-                           
-                           // Branch 0
-                           //{
-                               // Match a character in the set [Pp].
-                               if ((uint)slice.Length < 4 || ((slice[3] | 0x20) != 'p'))
-                               {
-                                   goto AlternationBranch;
-                               }
-                               
-                               alternation_branch = 0;
-                               pos += 4;
-                               slice = inputSpan.Slice(pos);
-                               goto AlternationMatch;
-                               
-                               AlternationBranch:
-                               pos = alternation_starting_pos;
-                               slice = inputSpan.Slice(pos);
-                               UncaptureUntil(alternation_starting_capturepos);
-                           //}
-                           
-                           // Branch 1
-                           //{
-                               if ((uint)slice.Length < 5 ||
-                                   !slice.Slice(3).StartsWith("ps", StringComparison.OrdinalIgnoreCase)) // Match the string "ps" (ordinal case-insensitive)
-                               {
-                                   UncaptureUntil(0);
-                                   return false; // The input didn't match.
-                               }
-                               
-                               alternation_branch = 1;
-                               pos += 5;
-                               slice = inputSpan.Slice(pos);
-                               goto AlternationMatch;
-                           //}
-                           
-                           AlternationBacktrack:
-                           if (Utilities.s_hasTimeout)
+                       // Match a character in the set [Ss] atomically, optionally.
+                       {
+                           if ((uint)slice.Length > (uint)4 && ((slice[4] | 0x20) == 's'))
                          {
-                               base.CheckTimeout();
+                               slice = slice.Slice(1);
+                               pos++;
                          }
-                           
-                           switch (alternation_branch)
-                           {
-                               case 0:
-                                   goto AlternationBranch;
-                               case 1:
-                                   UncaptureUntil(0);
-                                   return false; // The input didn't match.
-                           }
-                           
-                           AlternationMatch:;
-                       //}
+                       }
                      
+                       pos += 4;
+                       slice = inputSpan.Slice(pos);
                      base.Capture(1, capture_starting_pos, pos);
-                       
-                       goto CaptureSkipBacktrack;
-                       
-                       CaptureBacktrack:
-                       goto AlternationBacktrack;
-                       
-                       CaptureSkipBacktrack:;
-                   //}
+                   }
                  
                  // Match the string "://".
                  if (!slice.StartsWith("://"))
                  {
-                       goto CaptureBacktrack;
+                       UncaptureUntil(0);
+                       return false; // The input didn't match.
                  }
                  
                  // Match a character in the set [\-.0-9A-Za-z\u212A] greedily at least once.
                      
                      if (iteration == 0)
                      {
-                           goto CaptureBacktrack;
+                           UncaptureUntil(0);
+                           return false; // The input didn't match.
                      }
                      
                      slice = slice.Slice(iteration);
                      
                      if (charloop_starting_pos >= charloop_ending_pos)
                      {
-                           goto CaptureBacktrack;
+                           UncaptureUntil(0);
+                           return false; // The input didn't match.
                      }
                      pos = --charloop_ending_pos;
                      slice = inputSpan.Slice(pos);
                          base.Capture(2, capture_starting_pos1, pos);
                          
                          Utilities.StackPush(ref base.runstack!, ref stackpos, capture_starting_pos1);
-                           goto CaptureSkipBacktrack1;
+                           goto CaptureSkipBacktrack;
                          
-                           CaptureBacktrack1:
+                           CaptureBacktrack:
                          capture_starting_pos1 = base.runstack![--stackpos];
                          goto CharLoopBacktrack1;
                          
-                           CaptureSkipBacktrack1:;
+                           CaptureSkipBacktrack:;
                      //}
                      
                      
                          // No iterations of the loop remain to backtrack into. Fail the loop.
                          goto CharLoopBacktrack;
                      }
-                       goto CaptureBacktrack1;
+                       goto CaptureBacktrack;
                      LoopEnd:;
                  //}

"^[\\s\\S]+?(?=[\\\\<!\\[*`~\\:]|\\b_|\\bhttp ..." (774 uses)

[GeneratedRegex("^[\\s\\S]+?(?=[\\\\<!\\[*`~\\:]|\\b_|\\bhttps?:\\/\\/| {2,}\\n|$)")]

  /// ○ Match if at the beginning of the string.<br/>
  /// ○ Match any character lazily at least once.<br/>
  /// ○ Zero-width positive lookahead.<br/>
-   ///     ○ Match with 5 alternative expressions, atomically.<br/>
+   ///     ○ Match with 4 alternative expressions, atomically.<br/>
  ///         ○ Match a character in the set [!*:&lt;[\\`~].<br/>
  ///         ○ Match a sequence of expressions.<br/>
  ///             ○ Match if at a word boundary.<br/>
-   ///             ○ Match '_'.<br/>
-   ///         ○ Match a sequence of expressions.<br/>
-   ///             ○ Match if at a word boundary.<br/>
-   ///             ○ Match the string "http".<br/>
-   ///             ○ Match 's' atomically, optionally.<br/>
-   ///             ○ Match the string "://".<br/>
+   ///             ○ Match with 2 alternative expressions, atomically.<br/>
+   ///                 ○ Match '_'.<br/>
+   ///                 ○ Match a sequence of expressions.<br/>
+   ///                     ○ Match the string "http".<br/>
+   ///                     ○ Match 's' atomically, optionally.<br/>
+   ///                     ○ Match the string "://".<br/>
  ///         ○ Match a sequence of expressions.<br/>
  ///             ○ Match ' ' atomically at least twice.<br/>
  ///             ○ Match '\n'.<br/>
                      
                      int atomic_stackpos = stackpos;
                      
-                       // Match with 5 alternative expressions, atomically.
+                       // Match with 4 alternative expressions, atomically.
                      {
                          int alternation_starting_pos = pos;
                          
                                  goto AlternationBranch1;
                              }
                              
-                               // Match '_'.
-                               if (slice.IsEmpty || slice[0] != '_')
+                               // Match with 2 alternative expressions, atomically.
                              {
-                                   goto AlternationBranch1;
+                                   if (slice.IsEmpty)
+                                   {
+                                       goto AlternationBranch1;
+                                   }
+                                   
+                                   switch (slice[0])
+                                   {
+                                       case '_':
+                                           pos++;
+                                           slice = inputSpan.Slice(pos);
+                                           break;
+                                           
+                                       case 'h':
+                                           // Match the string "ttp".
+                                           if (!slice.Slice(1).StartsWith("ttp"))
+                                           {
+                                               goto AlternationBranch1;
+                                           }
+                                           
+                                           // Match 's' atomically, optionally.
+                                           {
+                                               if ((uint)slice.Length > (uint)4 && slice[4] == 's')
+                                               {
+                                                   slice = slice.Slice(1);
+                                                   pos++;
+                                               }
+                                           }
+                                           
+                                           // Match the string "://".
+                                           if (!slice.Slice(4).StartsWith("://"))
+                                           {
+                                               goto AlternationBranch1;
+                                           }
+                                           
+                                           pos += 7;
+                                           slice = inputSpan.Slice(pos);
+                                           break;
+                                           
+                                       default:
+                                           goto AlternationBranch1;
+                                   }
                              }
                              
-                               pos++;
-                               slice = inputSpan.Slice(pos);
                              goto AlternationMatch;
                              
                              AlternationBranch1:
                          }
                          
                          // Branch 2
-                           {
-                               // Match if at a word boundary.
-                               if (!Utilities.IsPreWordCharBoundary(inputSpan, pos))
-                               {
-                                   goto AlternationBranch2;
-                               }
-                               
-                               // Match the string "http".
-                               if (!slice.StartsWith("http"))
-                               {
-                                   goto AlternationBranch2;
-                               }
-                               
-                               // Match 's' atomically, optionally.
-                               {
-                                   if ((uint)slice.Length > (uint)4 && slice[4] == 's')
-                                   {
-                                       slice = slice.Slice(1);
-                                       pos++;
-                                   }
-                               }
-                               
-                               // Match the string "://".
-                               if (!slice.Slice(4).StartsWith("://"))
-                               {
-                                   goto AlternationBranch2;
-                               }
-                               
-                               pos += 7;
-                               slice = inputSpan.Slice(pos);
-                               goto AlternationMatch;
-                               
-                               AlternationBranch2:
-                               pos = alternation_starting_pos;
-                               slice = inputSpan.Slice(pos);
-                           }
-                           
-                           // Branch 3
                          {
                              // Match ' ' atomically at least twice.
                              {
                                  
                                  if (iteration < 2)
                                  {
-                                       goto AlternationBranch3;
+                                       goto AlternationBranch2;
                                  }
                                  
                                  slice = slice.Slice(iteration);
                              // Match '\n'.
                              if (slice.IsEmpty || slice[0] != '\n')
                              {
-                                   goto AlternationBranch3;
+                                   goto AlternationBranch2;
                              }
                              
                              pos++;
                              slice = inputSpan.Slice(pos);
                              goto AlternationMatch;
                              
-                               AlternationBranch3:
+                               AlternationBranch2:
                              pos = alternation_starting_pos;
                              slice = inputSpan.Slice(pos);
                          }
                          
-                           // Branch 4
+                           // Branch 3
                          {
                              // Match if at the end of the string or if before an ending newline.
                              if (pos < inputSpan.Length - 1 || ((uint)pos < (uint)inputSpan.Length && inputSpan[pos] != '\n'))

"\\G(?:[\"“”]|\\s|\\\\[@#*]|\\\\[@#*bfmv])" (33 uses)

[GeneratedRegex("\\G(?:[\"“”]|\\s|\\\\[@#*]|\\\\[@#*bfmv])")]

  /// Explanation:<br/>
  /// <code>
  /// ○ Match if at the start position.<br/>
-   /// ○ Match with 3 alternative expressions, atomically.<br/>
+   /// ○ Match with 2 alternative expressions, atomically.<br/>
  ///     ○ Match a character in the set ["\u201C\u201D\s].<br/>
  ///     ○ Match a sequence of expressions.<br/>
  ///         ○ Match '\\'.<br/>
-   ///         ○ Match a character in the set [#*@].<br/>
-   ///     ○ Match a sequence of expressions.<br/>
-   ///         ○ Match '\\'.<br/>
  ///         ○ Match a character in the set [#*@bfmv].<br/>
  /// </code>
  /// </remarks>
                      return false; // The input didn't match.
                  }
                  
-                   // Match with 3 alternative expressions, atomically.
+                   // Match with 2 alternative expressions, atomically.
                  {
                      int alternation_starting_pos = pos;
                      
                      }
                      
                      // Branch 1
-                       {
-                           if ((uint)slice.Length < 2 ||
-                               slice[0] != '\\' || // Match '\\'.
-                               (((ch = slice[1]) != '#') & (ch != '*') & (ch != '@'))) // Match a character in the set [#*@].
-                           {
-                               goto AlternationBranch1;
-                           }
-                           
-                           pos += 2;
-                           slice = inputSpan.Slice(pos);
-                           goto AlternationMatch;
-                           
-                           AlternationBranch1:
-                           pos = alternation_starting_pos;
-                           slice = inputSpan.Slice(pos);
-                       }
-                       
-                       // Branch 2
                      {
                          if ((uint)slice.Length < 2 ||
                              slice[0] != '\\' || // Match '\\'.

For more diff examples, see https://gist.github.com/MihuBot/8b933494bb1466554d325dc4ca9fc8d4

JIT assembly changes

Total bytes of base: 54284087
Total bytes of diff: 54418439
Total bytes of delta: 134352 (0.25 % of base)
Total relative delta: 50.97
    diff is a regression.
    relative diff is a regression.

For a list of JIT diff regressions, see Regressions.md
For a list of JIT diff improvements, see Improvements.md

Sample source code for further analysis

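using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net.Http;
using System.Text.Json;
using System.Text.RegularExpressions;

// Download and cache the results archive on first run.
const string JsonPath = "RegexResults-1798.json";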
if (!File.Exists(JsonPath))
{
    await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FH1OjmHA");
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
    archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}

using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");



record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
    public string? FullDiff { get; set; }
    public string? ShortDiff { get; set; }
    public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
    public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}

MihuBot · 2026-02-26T05:00:37Z

See benchmark results at https://gist.github.com/MihuBot/0dcd9383958d2b13883e0441fc6791f3

danmoseley · 2026-02-26T05:48:25Z

Diffs seem to be expected as far as I can tell. Some of them show cascading improvements.

E.g., [GeneratedRegex("\\G(?:[\"“”]|\\s|\\\\[@#*]|\\\\[@#*bfmv])")] above goes from 3 branches to 2: it now sees that the last two branches both start with \, extracts that common prefix, and the remaining parts then collapse into the single set [@#*bfmv] instead of forming their own alternation.

-- Dan
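
The collapse described above can be checked from the outside with a small sketch. It compares the original pattern against a hand-written reduced form. The `Reduced` pattern string, the `SameMatch` helper, and the sample inputs are illustrative assumptions, not output of the optimizer and not part of the PR.

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern from the diff above: four branches, the last two both start with '\'.
const string Original = "\\G(?:[\"“”]|\\s|\\\\[@#*]|\\\\[@#*bfmv])";

// Hand-written equivalent (an assumption for illustration): the shared '\' prefix is
// factored out, and [@#*] | [@#*bfmv] merges into the single set [@#*bfmv].
const string Reduced = "\\G(?:[\"“”\\s]|\\\\[@#*bfmv])";

// Hypothetical helper: true when both patterns agree on success and on the matched text.
bool SameMatch(string input)
{
    Match a = Regex.Match(input, Original);
    Match b = Regex.Match(input, Reduced);
    return a.Success == b.Success && a.Value == b.Value;
}

// Made-up inputs covering each branch plus a few non-matches.
string[] samples = { "\\@", "\\*", "\\b", "\\v", "\\x", "\"", "“", " ", "a", "" };
foreach (string sample in samples)
{
    Console.WriteLine($"'{sample}' agrees: {SameMatch(sample)}");
}
```

This only compares observable matching behavior on a handful of inputs. It says nothing about the shape of the tree the optimizer actually produces.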

danmoseley · 2026-02-26T05:56:12Z

AI digestion of the mihubot perf run. As expected, none of our benchmarks really benefit from this; for a start, most have unchanged generated code. The reason to take this change would be to get the diffs on the more real-world patterns shown by the mihubot diffs. Almost by definition, they are more relevant to customers than many of our perf test scenarios.

-- Dan

======

| Suite | Test | Pattern/Options | Ratio | Notes |
| --- | --- | --- | --- | --- |
| Sherlock | Count | `.*` / None | 1.05 | not changed by PR |
| Sherlock | Count | `(?i)Sher[a-z]+\|Hol[a-z]+` / Compiled | 1.17 | not changed by PR (no shared prefix) |
| Sherlock | Count | `(?i)Sherlock` / NonBacktracking | 0.77 | not changed by PR |
| Sherlock | Count | `(?i)Sherlock\|Holmes\|Watson` / Compiled | 1.04 | not changed by PR (no shared prefix) |
| Sherlock | Count | `(?i)Sherlock\|Holmes\|Watson` / NonBacktracking | 1.09 | not changed by PR (no shared prefix) |
| Sherlock | Count | `(?i)Sherlock\|...\|Baker` / Compiled | 1.10 | not changed by PR (no shared prefix) |
| Sherlock | Count | `(?i)the` / None | 1.08 | not changed by PR |
| Sherlock | Count | `\p{L}` / NonBacktracking | 1.08 | not changed by PR |
| Sherlock | Count | `\p{Lu}` / Compiled | 1.04 | not changed by PR |
| Sherlock | Count | `\w+` / NonBacktracking | 1.04 | not changed by PR |
| Sherlock | Count | `\w+\s+Holmes\s+\w+` / Compiled | 0.95 | not changed by PR |
| Sherlock | Count | `(?s).*` / Compiled | 0.97 | not changed by PR |
| Russian | Count | `Шерлок Холмс` / None | 0.84 | not changed by PR |
| Russian | Count | `Шерлок Холмс` / NonBacktracking | 0.96 | not changed by PR |
| Mariomkas | Ctor | IP pattern / None | 0.97 | not changed by PR |
| Mariomkas | Count | IP pattern / NonBacktracking | 1.04 | not changed by PR |
| Mariomkas | Ctor | email / NonBacktracking | 0.93 | not changed by PR |
| BoostDocs | IsMatch | #2 / NonBacktracking | 1.09 | not changed by PR (unknown pattern) |
| BoostDocs | IsMatch | #6 / None | 1.11 | not changed by PR (unknown pattern) |
| BoostDocs | IsMatch | #12 / NonBacktracking | 1.04 | not changed by PR (unknown pattern) |
| Common | Email_IsMatch / None | | 1.08 | not changed by PR |
| Common | Uri_IsNotMatch / Compiled | | 0.94 | not changed by PR |
| Common | IP_IsNotMatch / Compiled | | 0.97 | not changed by PR |
| Common | ReplaceWords / Compiled | | 0.91 | not changed by PR |
| Common | SplitWords / Compiled | | 0.94 | not changed by PR |
| Common | MatchesWords / IC, Compiled | | 1.21 | not changed by PR |
| Common | ReplaceWords / IC, Compiled | | 0.82 | not changed by PR |
| Common | SplitWords / IC, Compiled | | 0.66 | not changed by PR |
| Common | Date_IsMatch / IC, Compiled | | 0.90 | not changed by PR |
| Common | MatchWord / IC, Compiled | | 1.11 | not changed by PR |

Bottom line: Every significant benchmark deviation is noise. None of the benchmark patterns exercise the single-node branch fix. The wildly swinging Common results (SplitWords IC Compiled at 0.66, MatchesWords IC Compiled at 1.21) in opposite directions confirm these are measurement artifacts, not real effects. The real impact of this PR is shown by the 396 source code diffs, not this benchmark suite.
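
As a rough sanity check on the "noise" reading, the Ratio column can be tallied by which side of 1.0 each row falls on. This is a minimal sketch, not a statistical test: the values are transcribed by hand from the table above, and the source does not state which direction of the ratio means faster, so the code only counts rows above and below 1.0.

```csharp
using System;
using System.Linq;

// Ratio column transcribed from the table above, in row order.
double[] ratios =
{
    1.05, 1.17, 0.77, 1.04, 1.09, 1.10, 1.08, 1.08, 1.04, 1.04, 0.95, 0.97, // Sherlock
    0.84, 0.96,                                                             // Russian
    0.97, 1.04, 0.93,                                                       // Mariomkas
    1.09, 1.11, 1.04,                                                       // BoostDocs
    1.08, 0.94, 0.97, 0.91, 0.94, 1.21, 0.82, 0.66, 0.90, 1.11,             // Common
};

// Count the rows on each side of 1.0.
int above = ratios.Count(r => r > 1.0);
int below = ratios.Count(r => r < 1.0);

// The geometric mean is the usual way to average ratios.
double geoMean = Math.Exp(ratios.Average(r => Math.Log(r)));

Console.WriteLine($"{ratios.Length} rows: {above} above 1.0, {below} below 1.0");
Console.WriteLine($"Geometric mean ratio: {geoMean:F3}");
```

A roughly even split with a geometric mean near 1.0 would be consistent with unchanged generated code plus run-to-run variance, which is the argument made above.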

src/libraries/System.Text.RegularExpressions/tests/UnitTests/RegexFindOptimizationsTests.cs

stephentoub

Thanks. The improvement looks good. I'd like to see some more tests as commented.

Address review feedback:
- Add PatternsReduceIdentically tests validating reduced tree shapes for IgnoreCase and non-IgnoreCase node prefix extraction
- Add PatternsReduceDifferently tests confirming optional vs required patterns produce distinct trees
- Add non-IgnoreCase variants of LeadingPrefix test cases
- Add LeadingSet tests for reversed branch order and IgnoreCase sets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
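
The tests listed above validate reduced tree shapes, which the public API does not expose. A behavior-level check is still possible with the public `Regex` class alone. The sketch below is not one of the PR's tests: the pattern comes from the PR description, while the `Matches` helper and the inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern taken from the PR description.
const string Pattern = "(?:http|https)://foo";

// Hypothetical helper wrapping the public Regex.IsMatch API.
bool Matches(string input, RegexOptions options) => Regex.IsMatch(input, Pattern, options);

// With IgnoreCase, both branches must still match regardless of letter case.
Console.WriteLine(Matches("http://foo", RegexOptions.IgnoreCase));
Console.WriteLine(Matches("HTTPS://FOO", RegexOptions.IgnoreCase));

// A truncated scheme must not match, even though "htt" is a shared prefix.
Console.WriteLine(Matches("htt://foo", RegexOptions.IgnoreCase));

// Without IgnoreCase, matching stays case-sensitive.
Console.WriteLine(Matches("HTTPS://foo", RegexOptions.None));
Console.WriteLine(Matches("https://foo", RegexOptions.None));
```

A check like this guards match correctness only. It would pass both before and after the change, so it cannot show whether the longer prefix was actually extracted.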

danmoseley · 2026-02-26T16:58:05Z

tests added. I'll merge on green.

danmoseley and others added 2 commits February 25, 2026 15:22

Copilot AI review requested due to automatic review settings February 26, 2026 02:16

danmoseley mentioned this pull request Feb 26, 2026

Handle single-node branches in ExtractCommonPrefixNode danmoseley/runtime#34

Closed

github-actions bot added the area-System.Text.RegularExpressions label Feb 26, 2026

dotnet-policy-service bot assigned danmoseley Feb 26, 2026

Copilot started reviewing on behalf of danmoseley February 26, 2026 02:17 View session

Copilot AI reviewed Feb 26, 2026

This was referenced Feb 26, 2026

[RegexDiff X64] [danmoseley] Handle single-node branches in ExtractCommonPre ... MihuBot/runtime-utils#1798

Open

[Benchmark X64] [danmoseley] Handle single-node branches in ExtractCommonPre ... MihuBot/runtime-utils#1799

Open

build-analysis bot mentioned this pull request Feb 26, 2026

[android][clr] No peer certificates when executing System.Net.Http.Functional.Tests on Android emulator #124526

Open

stephentoub reviewed Feb 26, 2026

src/libraries/System.Text.RegularExpressions/tests/UnitTests/RegexFindOptimizationsTests.cs

stephentoub reviewed Feb 26, 2026

src/libraries/System.Text.RegularExpressions/tests/UnitTests/RegexFindOptimizationsTests.cs

stephentoub approved these changes Feb 26, 2026

danmoseley enabled auto-merge (squash) February 26, 2026 16:58

danmoseley merged commit 371aca3 into dotnet:main Feb 26, 2026
87 of 90 checks passed

build-analysis bot mentioned this pull request Feb 26, 2026

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

dotnet-maestro bot mentioned this pull request Feb 27, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#5129

Merged

danmoseley merged 3 commits into dotnet:main from danmoseley:regex-redux/fix-alternation-prefix

4 participants
