Handle Capture nodes in TryGetOrdinalCaseInsensitiveString #124842

Merged
danmoseley merged 12 commits into dotnet:main from danmoseley:regex-redux/fix-ignorecase-capture-gap on Mar 18, 2026

@danmoseley
Member

TryGetOrdinalCaseInsensitiveString iterates the direct children of a Concatenate node to extract an ordinal case-insensitive prefix string. It handles One, Multi, Set, Empty, and zero-width assertions — but when it encounters a Capture node, it breaks out of the loop, never examining the content inside.

For a pattern like \b(in)\b with IgnoreCase, the regex tree after lowering is:

Capture(0) → Concatenate(Boundary, Capture(1) → Concatenate(Set([Ii]), Set([Nn])), Boundary)

FindPrefixOrdinalCaseInsensitive descends through Capture(0) and calls TryGetOrdinalCaseInsensitiveString on the inner Concatenate. At child index 1 (Capture(1)), the method breaks — it never finds "in". The pattern falls through to the slower FixedDistanceSets path (or, after #124736, uses the multi-string ordinal SearchValues path with 4 case variants).

This change unwraps Capture nodes transparently and recurses into nested Concatenate children, matching the behavior already present in FindPrefixesCore. This allows \b(in)\b with IgnoreCase to use the optimal LeadingString_OrdinalIgnoreCase_LeftToRight strategy with a single "in" string and OrdinalIgnoreCase comparison.
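The walk described above can be sketched with a toy tree. This is a minimal illustration, not the code in RegexNode.cs: `NodeKind`, `Node`, and `PrefixSketch` are hypothetical names, each `Set` node is reduced to one lowercase representative character, and the outer Capture(0) is assumed to have been stripped by the caller.

```csharp
using System;
using System.Text;

// Hypothetical model of the lowered tree for \b(in)\b with IgnoreCase.
Node tree = Node.Concat(
    Node.Boundary(),
    Node.Capture(Node.Concat(Node.Set('i'), Node.Set('n'))),
    Node.Boundary());

string before = PrefixSketch.Extract(tree, unwrap: false);
string after = PrefixSketch.Extract(tree, unwrap: true);
Console.WriteLine($"without unwrapping: \"{before}\""); // prints an empty prefix
Console.WriteLine($"with unwrapping:    \"{after}\"");  // prints "in"

enum NodeKind { Set, Boundary, Capture, Concatenate }

sealed class Node
{
    public NodeKind Kind { get; init; }
    public char Ch { get; init; }
    public Node[] Children { get; init; } = Array.Empty<Node>();

    public static Node Set(char c) => new() { Kind = NodeKind.Set, Ch = c };
    public static Node Boundary() => new() { Kind = NodeKind.Boundary };
    public static Node Capture(Node child) => new() { Kind = NodeKind.Capture, Children = new[] { child } };
    public static Node Concat(params Node[] children) => new() { Kind = NodeKind.Concatenate, Children = children };
}

static class PrefixSketch
{
    public static string Extract(Node concat, bool unwrap)
    {
        var sb = new StringBuilder();
        Append(concat, sb, unwrap);
        return sb.ToString();
    }

    // Returns true only if every child was fully consumed,
    // so a caller knows whether it may keep appending its own siblings.
    private static bool Append(Node concat, StringBuilder sb, bool unwrap)
    {
        foreach (Node child in concat.Children)
        {
            Node node = child;
            while (unwrap && node.Kind == NodeKind.Capture)
            {
                node = node.Children[0]; // a capture does not change what text is matched
            }

            switch (node.Kind)
            {
                case NodeKind.Boundary:
                    continue; // zero-width, contributes no characters
                case NodeKind.Set:
                    sb.Append(node.Ch);
                    continue;
                case NodeKind.Concatenate when unwrap:
                    if (!Append(node, sb, unwrap)) return false;
                    continue;
                default:
                    return false; // unsupported node: the prefix ends here
            }
        }
        return true;
    }
}
```

With `unwrap: false` the walk stops at the Capture child and yields nothing, which mirrors the gap this PR describes.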

Follows up on a codegen diff observed in #124736.

Source-generated code diff for [GeneratedRegex(@"\b(in)\b", RegexOptions.IgnoreCase)]
 private bool TryFindNextPossibleStartingPosition(ReadOnlySpan<char> inputSpan)
 {
     int pos = base.runtextpos;

     // Any possible match is at least 2 characters.
     if (pos <= inputSpan.Length - 2)
     {
-        // The pattern has multiple strings that could begin the match. Search for any of them.
-        // If none can be found, there's no match.
-        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_...);
+        // The pattern has the literal "in" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+        // If it can't be found, there's no match.
+        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_in_OrdinalIgnoreCase);
         if (i >= 0)
         {
             base.runtextpos = pos + i;
             return true;
         }
     }

     base.runtextpos = inputSpan.Length;
     return false;
 }
-/// Supports searching for the specified strings.
-internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_... =
-    SearchValues.Create(["IN", "iN", "In", "in"], StringComparison.Ordinal);
+/// Supports searching for the string "in".
+internal static readonly SearchValues<string> s_indexOfString_in_OrdinalIgnoreCase =
+    SearchValues.Create(["in"], StringComparison.OrdinalIgnoreCase);

TryGetOrdinalCaseInsensitiveString iterates the children of a
Concatenate node to extract an ordinal case-insensitive prefix string.
Previously it did not handle Capture or nested Concatenate nodes,
causing patterns like \b(in)\b with IgnoreCase to miss the optimal
LeadingString_OrdinalIgnoreCase search path and fall through to the
slower FixedDistanceSets path.

Unwrap Capture nodes transparently and recurse into Concatenate
children, matching the behavior of FindPrefixesCore.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR improves regex prefix analysis for ordinal case-insensitive searches by making TryGetOrdinalCaseInsensitiveString treat Capture nodes as transparent, enabling more patterns (e.g., \b(in)\b with IgnoreCase) to use the faster leading-string search strategy.

Changes:

  • Unwrap Capture nodes during ordinal ignore-case prefix extraction and handle nested Concatenate nodes via recursion.
  • Add unit tests validating that capture groups don’t prevent ordinal ignore-case leading-prefix detection.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs | Makes prefix extraction skip over Capture nodes and recurse into nested Concatenate nodes to find an ordinal ignore-case prefix.
src/libraries/System.Text.RegularExpressions/tests/UnitTests/RegexFindOptimizationsTests.cs | Adds test cases ensuring capture groups are transparent to ordinal ignore-case prefix extraction.

danmoseley and others added 2 commits February 24, 2026 20:45
Add TryEnsureSufficientExecutionStack check before the recursive call
in TryGetOrdinalCaseInsensitiveString to safely handle deeply nested
capture patterns like ((((ab)))) without risking a stack overflow.

Add OuterLoop test exercising 2000-deep capture nesting.
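The guard pattern this commit describes can be sketched as follows. This is a minimal illustration on a hypothetical linked `Node` type, not the RegexNode code; only `RuntimeHelpers.TryEnsureSufficientExecutionStack` is a real API here, and the helper name `TryCountDepth` is made up for the example.

```csharp
using System;
using System.Runtime.CompilerServices;

// Build a chain 2000 nodes deep, loosely mirroring 2000 nested captures.
Node? root = null;
for (int i = 0; i < 2000; i++)
{
    root = new Node { Inner = root };
}

bool ok = TryCountDepth(root, out int depth);
Console.WriteLine($"ok={ok}, depth={depth}");

// Recursive walk that gives up (returns false) rather than overflowing the stack.
static bool TryCountDepth(Node? node, out int depth)
{
    depth = 0;
    if (node is null)
    {
        return true;
    }

    // Check remaining stack space before recursing any deeper.
    if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
    {
        return false;
    }

    if (!TryCountDepth(node.Inner, out int inner))
    {
        return false;
    }

    depth = inner + 1;
    return true;
}

sealed class Node
{
    public Node? Inner;
}
```

Returning a failure value lets the caller fall back to a less specific result, which for prefix analysis would presumably mean a shorter prefix or none at all.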

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TryGetOrdinalCaseInsensitiveString is also called from the compiler and
source generator (EmitConcatenation). Although TryGetJoinableLengthCheckChildRange
currently excludes Capture nodes from the joinable range, add an explicit
unwrapCaptures parameter (default false) as defense-in-depth so only the
prefix analysis caller opts into Capture unwrapping.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley
Member Author

@MihuBot regexdiff

@MihuBot

MihuBot commented Feb 25, 2026

180 out of 18857 patterns have generated source code changes.

Examples of GeneratedRegex source diffs
"\\b(in)\\b" (658 uses)
[GeneratedRegex("\\b(in)\\b", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 2 characters.
                     if (pos <= inputSpan.Length - 2)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_409072BF36F03A4496ACC585815833300ABA306360D979616ACDCED385DDC8FB);
+                        // The pattern has the literal "in" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_in_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_409072BF36F03A4496ACC585815833300ABA306360D979616ACDCED385DDC8FB = SearchValues.Create(["IN", "iN", "In", "in"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "in".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_in_OrdinalIgnoreCase = SearchValues.Create(["in"], StringComparison.OrdinalIgnoreCase);
     }
 }
"\\b(from).+(to)\\b.+" (316 uses)
[GeneratedRegex("\\b(from).+(to)\\b.+", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 8 characters.
                     if (pos <= inputSpan.Length - 8)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_DA0DF7757216159252C4FA00AB5982AAA4403D2C43304873401C53E36F92CA04);
+                        // The pattern has the literal "from" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_from_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_DA0DF7757216159252C4FA00AB5982AAA4403D2C43304873401C53E36F92CA04 = SearchValues.Create(["FROM", "fROM", "FrOM", "frOM", "FRoM", "fRoM", "FroM", "froM", "FROm", "fROm", "FrOm", "frOm", "FRom", "fRom", "From", "from"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "from".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_from_OrdinalIgnoreCase = SearchValues.Create(["from"], StringComparison.OrdinalIgnoreCase);
     }
 }
"(DATEADD|DATEPART)\\(\\s*(YEAR|Y|YY|YYYY|MON ..." (294 uses)
[GeneratedRegex("(DATEADD|DATEPART)\\(\\s*(YEAR|Y|YY|YYYY|MONTH|MM|M|DAYOFYEAR|DY|DAY|DD|D|WEEKDAY|DW|HOUR|HH|MINUTE|MI|N|SECOND|SS|S|MILLISECOND|MS)\\s*\\,", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)]
                     // Any possible match is at least 10 characters.
                     if (pos <= inputSpan.Length - 10)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_OrdinalIgnoreCase_2AC5E9CD8492EE9AF8BE2E7D112B6E7B0E2EB16F4F0FF47ECAA2B811EE26A081);
+                        // The pattern has the literal "date(" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_1DE7C48BB4BC0E30E65E38B4F39A75CA57C22461AE122A6380A42312C9E67BCA);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_OrdinalIgnoreCase_2AC5E9CD8492EE9AF8BE2E7D112B6E7B0E2EB16F4F0FF47ECAA2B811EE26A081 = SearchValues.Create(["dateadd", "datepart"], StringComparison.OrdinalIgnoreCase);
+        /// <summary>Supports searching for the string "date(".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_1DE7C48BB4BC0E30E65E38B4F39A75CA57C22461AE122A6380A42312C9E67BCA = SearchValues.Create(["date("], StringComparison.OrdinalIgnoreCase);
     }
 }
"\\b(et\\s*(le|la(s)?)?)\\b.+" (291 uses)
[GeneratedRegex("\\b(et\\s*(le|la(s)?)?)\\b.+", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 3 characters.
                     if (pos <= inputSpan.Length - 3)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_40190A5AE82B92C9577FE9A45CD09B22413116F9859390E6536F6EF2E5085EA1);
+                        // The pattern has the literal "et" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_et_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_40190A5AE82B92C9577FE9A45CD09B22413116F9859390E6536F6EF2E5085EA1 = SearchValues.Create(["ET", "eT", "Et", "et"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "et".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_et_OrdinalIgnoreCase = SearchValues.Create(["et"], StringComparison.OrdinalIgnoreCase);
     }
 }
"\\b(em)\\b" (200 uses)
[GeneratedRegex("\\b(em)\\b", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 2 characters.
                     if (pos <= inputSpan.Length - 2)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_00298CB1C9B37035848F363BE27E1EB54A4FE98FE07EEFB24B812417AC25856B);
+                        // The pattern has the literal "em" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_em_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_00298CB1C9B37035848F363BE27E1EB54A4FE98FE07EEFB24B812417AC25856B = SearchValues.Create(["EM", "eM", "Em", "em"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "em".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_em_OrdinalIgnoreCase = SearchValues.Create(["em"], StringComparison.OrdinalIgnoreCase);
     }
 }
"\\b(avant)\\b" (195 uses)
[GeneratedRegex("\\b(avant)\\b", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 5 characters.
                     if (pos <= inputSpan.Length - 5)
                     {
-                        // The pattern matches a character in the set [Vv] at index 1.
-                        // Find the next occurrence. If it can't be found, there's no match.
-                        ReadOnlySpan<char> span = inputSpan.Slice(pos);
-                        for (int i = 0; i < span.Length - 4; i++)
+                        // The pattern has the literal "avant" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_avant_OrdinalIgnoreCase);
+                        if (i >= 0)
                         {
-                            int indexOfPos = span.Slice(i + 1).IndexOfAny('V', 'v');
-                            if (indexOfPos < 0)
-                            {
-                                goto NoMatchFound;
-                            }
-                            i += indexOfPos;
-                            
-                            // The primary set being searched for was found. 2 more sets will be checked so as
-                            // to minimize the number of places TryMatchAtCurrentPosition is run unnecessarily.
-                            // Make sure they fit in the remainder of the input.
-                            if ((uint)(i + 3) >= (uint)span.Length)
-                            {
-                                goto NoMatchFound;
-                            }
-                            
-                            if (((span[i + 3] | 0x20) == 'n') &&
-                                ((span[i] | 0x20) == 'a'))
-                            {
-                                base.runtextpos = pos + i;
-                                return true;
-                            }
+                            base.runtextpos = pos + i;
+                            return true;
                         }
                     }
                     
                     // No match found.
-                    NoMatchFound:
                     base.runtextpos = inputSpan.Length;
                     return false;
                 }
             0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFF, 0x03,
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
+        
+        /// <summary>Supports searching for the string "avant".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_avant_OrdinalIgnoreCase = SearchValues.Create(["avant"], StringComparison.OrdinalIgnoreCase);
     }
 }
"(week)(\\s*)(?<number>\\d\\d|\\d|0\\d)" (194 uses)
[GeneratedRegex("(week)(\\s*)(?<number>\\d\\d|\\d|0\\d)", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 5 characters.
                     if (pos <= inputSpan.Length - 5)
                     {
-                        // The pattern matches a character in the set [Kk\u212A] at index 3.
-                        // Find the next occurrence. If it can't be found, there's no match.
-                        ReadOnlySpan<char> span = inputSpan.Slice(pos);
-                        for (int i = 0; i < span.Length - 4; i++)
+                        // The pattern has the literal "wee" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_wee_OrdinalIgnoreCase);
+                        if (i >= 0)
                         {
-                            int indexOfPos = span.Slice(i + 3).IndexOfAny('K', 'k', 'K');
-                            if (indexOfPos < 0)
-                            {
-                                goto NoMatchFound;
-                            }
-                            i += indexOfPos;
-                            
-                            if (((span[i] | 0x20) == 'w') &&
-                                ((span[i + 1] | 0x20) == 'e'))
-                            {
-                                base.runtextpos = pos + i;
-                                return true;
-                            }
+                            base.runtextpos = pos + i;
+                            return true;
                         }
                     }
                     
                     // No match found.
-                    NoMatchFound:
                     base.runtextpos = inputSpan.Length;
                     return false;
                 }
         
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
+        
+        /// <summary>Supports searching for the string "wee".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_wee_OrdinalIgnoreCase = SearchValues.Create(["wee"], StringComparison.OrdinalIgnoreCase);
     }
 }
"\\b(entre\\s*(le|la(s)?)?)\\b" (194 uses)
[GeneratedRegex("\\b(entre\\s*(le|la(s)?)?)\\b", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 5 characters.
                     if (pos <= inputSpan.Length - 5)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_3200475DE471EA58FF8C7B5F0CA4A9515EFACDBAA912EFAC506148E560A6D596);
+                        // The pattern has the literal "entre" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_entre_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
             0xFE, 0xFF, 0xFF, 0x87, 0xFE, 0xFF, 0xFF, 0x07
         };
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_3200475DE471EA58FF8C7B5F0CA4A9515EFACDBAA912EFAC506148E560A6D596 = SearchValues.Create(["ENTR", "eNTR", "EnTR", "enTR", "ENtR", "eNtR", "EntR", "entR", "ENTr", "eNTr", "EnTr", "enTr", "ENtr", "eNtr", "Entr", "entr"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "entre".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_entre_OrdinalIgnoreCase = SearchValues.Create(["entre"], StringComparison.OrdinalIgnoreCase);
     }
 }
"(mes)(\\s*)((do|da|de))" (193 uses)
[GeneratedRegex("(mes)(\\s*)((do|da|de))", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 5 characters.
                     if (pos <= inputSpan.Length - 5)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_DC6FBF049DFCA75A0085CE45822CFFFBACDEEEF2607AA4096D769AC2377EF021);
+                        // The pattern has the literal "mes" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_mes_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_DC6FBF049DFCA75A0085CE45822CFFFBACDEEEF2607AA4096D769AC2377EF021 = SearchValues.Create(["MES", "mES", "MeS", "meS", "MEs", "mEs", "Mes", "mes"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "mes".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_mes_OrdinalIgnoreCase = SearchValues.Create(["mes"], StringComparison.OrdinalIgnoreCase);
     }
 }
"(semana)(\\s*)((do|da|de))" (193 uses)
[GeneratedRegex("(semana)(\\s*)((do|da|de))", RegexOptions.IgnoreCase | RegexOptions.Singleline)]
                     // Any possible match is at least 8 characters.
                     if (pos <= inputSpan.Length - 8)
                     {
-                        // The pattern has multiple strings that could begin the match. Search for any of them.
-                        // If none can be found, there's no match.
-                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfAnyStrings_Ordinal_1B7E1CD8AF955A2769ABD6F7FC469F9212B5B795E7DC6CF668A8EE08D2419045);
+                        // The pattern has the literal "semana" ordinal case-insensitive at the beginning of the pattern. Find the next occurrence.
+                        // If it can't be found, there's no match.
+                        int i = inputSpan.Slice(pos).IndexOfAny(Utilities.s_indexOfString_semana_OrdinalIgnoreCase);
                         if (i >= 0)
                         {
                             base.runtextpos = pos + i;
         /// <summary>Whether <see cref="s_defaultTimeout"/> is non-infinite.</summary>
         internal static readonly bool s_hasTimeout = s_defaultTimeout != Regex.InfiniteMatchTimeout;
         
-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_1B7E1CD8AF955A2769ABD6F7FC469F9212B5B795E7DC6CF668A8EE08D2419045 = SearchValues.Create(["SEMA", "sEMA", "SeMA", "seMA", "SEmA", "sEmA", "SemA", "semA", "SEMa", "sEMa", "SeMa", "seMa", "SEma", "sEma", "Sema", "sema"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "semana".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_semana_OrdinalIgnoreCase = SearchValues.Create(["semana"], StringComparison.OrdinalIgnoreCase);
     }
 }

For more diff examples, see https://gist.github.com/MihuBot/4212adf85284694d34d378af2233fa23

JIT assembly changes
Total bytes of base: 54284087
Total bytes of diff: 54262264
Total bytes of delta: -21823 (-0.04 % of base)
Total relative delta: -31.17
    diff is an improvement.
    relative diff is an improvement.

For a list of JIT diff regressions, see Regressions.md
For a list of JIT diff improvements, see Improvements.md

Sample source code for further analysis
using System.IO.Compression;
using System.Text.Json;
using System.Text.RegularExpressions;

const string JsonPath = "RegexResults-1792.json";
if (!File.Exists(JsonPath))
{
    await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FHwNbpHA");
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
    archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}

using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");



record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
    public string? FullDiff { get; set; }
    public string? ShortDiff { get; set; }
    public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
    public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}

@danmoseley
Member Author

danmoseley commented Feb 25, 2026

Incidentally, regarding this part of the diff:

-        /// <summary>Supports searching for the specified strings.</summary>
-        internal static readonly SearchValues<string> s_indexOfAnyStrings_Ordinal_DA0DF7757216159252C4FA00AB5982AAA4403D2C43304873401C53E36F92CA04 = SearchValues.Create(["FROM", "fROM", "FrOM", "frOM", "FRoM", "fRoM", "FroM", "froM", "FROm", "fROm", "FrOm", "frOm", "FRom", "fRom", "From", "from"], StringComparison.Ordinal);
+        /// <summary>Supports searching for the string "from".</summary>
+        internal static readonly SearchValues<string> s_indexOfString_from_OrdinalIgnoreCase = SearchValues.Create(["from"], StringComparison.OrdinalIgnoreCase);

Separately, should SearchValues itself recognize when it has been given every case variation of a string and collapse them into a single entry compared with OrdinalIgnoreCase?
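For reference, the two forms in the diff can be compared side by side. This is a small sketch assuming a runtime where `SearchValues<string>` is available (.NET 9 or later) and C# 12 collection expressions. The sample inputs are made up, and the comparison says nothing about what SearchValues does internally.

```csharp
using System;
using System.Buffers;

// Old shape: every ASCII case variant, compared ordinally.
SearchValues<string> variants =
    SearchValues.Create(["IN", "iN", "In", "in"], StringComparison.Ordinal);

// New shape: one string, compared with OrdinalIgnoreCase.
SearchValues<string> ignoreCase =
    SearchValues.Create(["in"], StringComparison.OrdinalIgnoreCase);

bool allAgree = true;
foreach (string input in new[] { "begIN here", "no match", "Within", "xIn" })
{
    int viaVariants = input.AsSpan().IndexOfAny(variants);
    int viaIgnoreCase = input.AsSpan().IndexOfAny(ignoreCase);
    allAgree &= viaVariants == viaIgnoreCase;
    Console.WriteLine($"{input}: {viaVariants} vs {viaIgnoreCase}");
}

Console.WriteLine($"all agree: {allAgree}");
```

For these ASCII inputs both searches report the same index, so any difference between the two shapes would be in cost, not in results.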

When TryGetOrdinalCaseInsensitiveString recurses into an inner
Concatenation (from an unwrapped Capture) and only partially consumes
it, stop iterating the outer Concatenation. Otherwise subsequent
siblings are incorrectly appended to the prefix string. For example
(abcde|abcfg)\( was producing 'abc(' instead of 'abc'.
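The following check uses only the public Regex API to show why the over-long prefix would be wrong. It is a sketch: the input string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"(abcde|abcfg)\(";
const string Input = "call ABCDE(1)";

bool matches = Regex.IsMatch(Input, Pattern, RegexOptions.IgnoreCase);
bool hasCorrectPrefix = Input.Contains("abc", StringComparison.OrdinalIgnoreCase);
bool hasWrongPrefix = Input.Contains("abc(", StringComparison.OrdinalIgnoreCase);

Console.WriteLine($"regex matches:       {matches}");          // True
Console.WriteLine($"contains \"abc\":      {hasCorrectPrefix}"); // True
Console.WriteLine($"contains \"abc(\":     {hasWrongPrefix}");   // False
```

The input matches the pattern but never contains "abc(", so a leading-string search for "abc(" would skip a valid match. Stopping at "abc" keeps the search a correct filter.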

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 25, 2026 18:30
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

danmoseley and others added 3 commits February 25, 2026 11:51
Add tests for adjacent captures, captures at non-zero position,
single-char captures (unwrap to Set), empty captures (unwrap to Empty),
partial inner Concatenate consumption with different trailing node kinds,
and Atomic groups (documents current conservative behavior).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Atomic groups only affect backtracking behavior, not what text is
matched, so they can safely be unwrapped during prefix analysis just
like Capture nodes. This allows patterns like ab(?>cd)ef with
IgnoreCase to extract the full prefix 'abcdef'.
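A quick way to see the claim for this specific pattern is to compare it against the plain literal through the public Regex API. This is a sketch with made-up inputs. It checks agreement for `ab(?>cd)ef` only and is not a general proof about atomic groups.

```csharp
using System;
using System.Text.RegularExpressions;

string[] inputs = { "abcdef", "ABCDEF", "xxAbCdEfyy", "abcde", "ab cd ef" };

bool allAgree = true;
foreach (string input in inputs)
{
    bool atomic = Regex.IsMatch(input, "ab(?>cd)ef", RegexOptions.IgnoreCase);
    bool plain = Regex.IsMatch(input, "abcdef", RegexOptions.IgnoreCase);
    allAgree &= atomic == plain;
    Console.WriteLine($"{input}: atomic={atomic}, plain={plain}");
}

Console.WriteLine($"all agree: {allAgree}");
```

Because the group holds a fixed literal there is nothing to backtrack into, so every match has to start with "abcdef" in some casing.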

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 25, 2026 19:11
@danmoseley
Member Author

With AI assistance we spotted that the Atomic case can easily be handled here as well, and we added all the tests we could collectively think of.

@danmoseley
Member Author

@MihuBot regexdiff

Accept both PR's capture-group test cases and upstream's alternation
prefix-extraction test cases in RegexFindOptimizationsTests.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 3, 2026 00:03
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Split into two tests:
- DeepCaptureNesting: pure nested captures (((...ab...))) exercises the
  iterative Capture-walking loop in FindPrefixOrdinalCaseInsensitive.
- InterleavedCaptureNesting: interleaved Capture+Concatenate pattern
  (...(ab)ab...)ab exercises the recursive Capture-unwrapping and
  inner-Concatenate path in TryGetOrdinalCaseInsensitiveString.
  Uses depth 5 (must succeed) and 5000 (tests stack guard, must not crash).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

…st pattern construction

- Update unwrapCaptures param docs and inline comment to accurately
  explain why unwrapping Atomic is safe for prefix analysis (atomicity
  affects backtracking, not which characters are matched at a position)
- Replace O(n^2) string concatenation in stress test with O(n)
  StringBuilder construction

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley danmoseley force-pushed the regex-redux/fix-ignorecase-capture-gap branch from f9ab080 to 6cbaa67 Compare March 3, 2026 21:42
@danmoseley
Member Author

The remaining issues are the "dead letter" problem. They're in unrelated libraries, so once this is signed off we can bypass them.

…c nesting test

- Rename unwrapCaptures parameter to forPrefixAnalysis per review feedback
  (it now also unwraps Atomic, not just Capture)
- Add Atomic group variants to the interleaved nesting stress test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 17, 2026 19:58
@danmoseley danmoseley enabled auto-merge (squash) March 17, 2026 19:59
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@danmoseley danmoseley merged commit cf53805 into dotnet:main Mar 18, 2026
91 of 94 checks passed