Vectorize RegexInterpreter opcode loops for Oneloop, Onerep, Notonerep, and MatchString by danmoseley · Pull Request #124628 · dotnet/runtime

danmoseley · 2026-02-20T07:13:03Z

The RegexInterpreter already had a precedent for vectorizing per-character loops: the Notoneloop/Notoneloopatomic opcode used IndexOf for left-to-right matching. This PR extends that pattern to four more opcodes:

Oneloop/Oneloopatomic (a+, a*): Use IndexOfAnyExcept(ch) instead of a per-char loop
Onerep (a{N}): Use ContainsAnyExcept(ch) instead of a per-char equality loop
Notonerep ([^x]{N}): Use Contains(ch) instead of a per-char inequality loop
MatchString (literal strings): Use SequenceEqual instead of a per-char comparison loop

All optimizations apply only to left-to-right matching paths. Right-to-left paths (rare) are left unchanged as they can't benefit from forward-scanning vectorization.

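These methods (IndexOfAnyExcept, ContainsAnyExcept, Contains, SequenceEqual) are SIMD-accelerated in .NET. Depending on the vector width the hardware supports, they process 8 (128-bit), 16 (256-bit), or 32 (512-bit) chars per vector operation, versus one at a time in the original loops.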
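The four replacements described above can be sketched in isolation. This is a minimal illustration, not the interpreter's actual code: the class and method names are hypothetical, and each helper assumes the caller has already checked that `pos + max` (or `pos + count`) does not exceed the input length.

```csharp
using System;

// Hypothetical helpers; names are illustrative, not taken from RegexInterpreter.
static class VectorizedOpcodeSketch
{
    // Oneloop (a+, a*), original shape: advance while the char equals ch.
    public static int CountRunScalar(ReadOnlySpan<char> input, int pos, int max, char ch)
    {
        int i = 0;
        while (i < max && input[pos + i] == ch)
        {
            i++;
        }
        return i;
    }

    // Oneloop, vectorized shape: find the first char that is not ch.
    public static int CountRunVectorized(ReadOnlySpan<char> input, int pos, int max, char ch)
    {
        int i = input.Slice(pos, max).IndexOfAnyExcept(ch);
        return i < 0 ? max : i; // -1 means every char in the slice equals ch
    }

    // Onerep (a{N}): succeeds only if all count chars equal ch.
    public static bool AllEqual(ReadOnlySpan<char> input, int pos, int count, char ch) =>
        !input.Slice(pos, count).ContainsAnyExcept(ch);

    // Notonerep ([^x]{N}): succeeds only if none of the count chars equals ch.
    public static bool NoneEqual(ReadOnlySpan<char> input, int pos, int count, char ch) =>
        !input.Slice(pos, count).Contains(ch);

    // MatchString: compare the literal against the input at pos.
    public static bool MatchesLiteral(ReadOnlySpan<char> input, int pos, string literal) =>
        input.Length - pos >= literal.Length &&
        input.Slice(pos, literal.Length).SequenceEqual(literal.AsSpan());

    public static void Main()
    {
        ReadOnlySpan<char> text = "aaaab";
        Console.WriteLine(CountRunScalar(text, 0, 5, 'a') == CountRunVectorized(text, 0, 5, 'a'));
        Console.WriteLine(AllEqual(text, 0, 4, 'a'));
        Console.WriteLine(NoneEqual(text, 0, 4, 'b'));
        Console.WriteLine(MatchesLiteral(text, 3, "ab"));
    }
}
```

The scalar and vectorized run counters return the same value for any in-bounds slice; the difference is only in how many chars are examined per step.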

Benchmark Results

Tested on Intel Core i9-14900K, .NET 11.0.0-dev, using BenchmarkDotNet with --corerun comparing before and after builds:

Benchmark	Before	After	Speedup
Oneloop `a+` (64 chars)	89 ns	81 ns	~1.1x
Oneloop `a+` (256 chars)	180 ns	85 ns	~2.1x
Oneloop `a+` (1024 chars)	430 ns	62 ns	~7x
Oneloop `a*` (256 chars)	144 ns	43 ns	~3.3x
Onerep `a{64}`	58 ns	28 ns	~2x
Onerep `a{256}`	245 ns	52 ns	~4.7x
Notonerep `[^x]{64}`	87 ns	28 ns	~3.1x
Notonerep `[^x]{256}`	216 ns	30 ns	~7.2x
MatchString (8 chars)	29 ns	26 ns	~1.1x
MatchString (16 chars)	31 ns	28 ns	~1.1x
MatchString (52 chars)	52 ns	29 ns	~1.8x

Zero regressions. Zero allocation changes. Improvements scale with input length as expected from SIMD vectorization.

Benchmark source code

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using BenchmarkDotNet.Attributes;
using MicroBenchmarks;

namespace System.Text.RegularExpressions.Tests
{
    /// <summary>
    /// Benchmarks targeting specific interpreter opcode paths:
    /// Oneloop, Onerep, Notonerep, and literal string matching (MatchString).
    /// Uses RegexOptions.None to force the interpreter engine.
    /// </summary>
    [BenchmarkCategory(Categories.Libraries, Categories.Regex)]
    public class Perf_Regex_Interpreter_Vectorize
    {
        // --- Inputs ---
        // Short input (64 chars) to measure per-call overhead
        private const string ShortA = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"; // 64 'a's
        // Medium input (256 chars)
        private const string MediumA = ShortA + ShortA + ShortA + ShortA; // 256 'a's
        // Long input (1024 chars)
        private const string LongA = MediumA + MediumA + MediumA + MediumA; // 1024 'a's

        private const string ShortText = "Sherlock Holmes lived at 221B Baker Street in London";
        private const string MediumText = ShortText + " and was known as the greatest detective of all time. His companion Dr. Watson chronicled their many adventures together through foggy London nights.";
        private const string LongText = MediumText + MediumText + MediumText + MediumText;

        // No 'x' chars - for Notonerep [^x]{N}
        private const string NoXShort = "abcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmn"; // 64 chars, no 'x'
        private const string NoXMedium = NoXShort + NoXShort + NoXShort + NoXShort; // 256 chars

        // === Oneloop: greedy single-char loops like a+, a* ===
        // These use IndexOfAnyExcept in the optimized path

        private Regex _oneloopPlus64, _oneloopPlus256, _oneloopPlus1024;
        private Regex _oneloopStar256;

        [GlobalSetup(Target = nameof(Oneloop_Plus_64))]
        public void Setup_Oneloop_Plus_64() => _oneloopPlus64 = new Regex("a+", RegexOptions.None);

        [GlobalSetup(Target = nameof(Oneloop_Plus_256))]
        public void Setup_Oneloop_Plus_256() => _oneloopPlus256 = new Regex("a+", RegexOptions.None);

        [GlobalSetup(Target = nameof(Oneloop_Plus_1024))]
        public void Setup_Oneloop_Plus_1024() => _oneloopPlus1024 = new Regex("a+", RegexOptions.None);

        [GlobalSetup(Target = nameof(Oneloop_Star_256))]
        public void Setup_Oneloop_Star_256() => _oneloopStar256 = new Regex("a*", RegexOptions.None);

        [Benchmark]
        public Match Oneloop_Plus_64() => _oneloopPlus64.Match(ShortA);

        [Benchmark]
        public Match Oneloop_Plus_256() => _oneloopPlus256.Match(MediumA);

        [Benchmark]
        public Match Oneloop_Plus_1024() => _oneloopPlus1024.Match(LongA);

        [Benchmark]
        public Match Oneloop_Star_256() => _oneloopStar256.Match(MediumA);

        // === Onerep: fixed-count single-char like a{64}, a{256} ===
        // These use ContainsAnyExcept in the optimized path

        private Regex _onerep64, _onerep256;

        [GlobalSetup(Target = nameof(Onerep_64))]
        public void Setup_Onerep_64() => _onerep64 = new Regex("a{64}", RegexOptions.None);

        [GlobalSetup(Target = nameof(Onerep_256))]
        public void Setup_Onerep_256() => _onerep256 = new Regex("a{256}", RegexOptions.None);

        [Benchmark]
        public bool Onerep_64() => _onerep64.IsMatch(ShortA);

        [Benchmark]
        public bool Onerep_256() => _onerep256.IsMatch(MediumA);

        // === Notonerep: fixed-count not-char like [^x]{64}, [^x]{256} ===
        // These use Contains in the optimized path

        private Regex _notonerep64, _notonerep256;

        [GlobalSetup(Target = nameof(Notonerep_64))]
        public void Setup_Notonerep_64() => _notonerep64 = new Regex("[^x]{64}", RegexOptions.None);

        [GlobalSetup(Target = nameof(Notonerep_256))]
        public void Setup_Notonerep_256() => _notonerep256 = new Regex("[^x]{256}", RegexOptions.None);

        [Benchmark]
        public bool Notonerep_64() => _notonerep64.IsMatch(NoXShort);

        [Benchmark]
        public bool Notonerep_256() => _notonerep256.IsMatch(NoXMedium);

        // === MatchString: literal string matching ===
        // These use SequenceEqual in the optimized path

        private Regex _matchStr8, _matchStr16, _matchStr52;

        [GlobalSetup(Target = nameof(MatchString_8))]
        public void Setup_MatchString_8() => _matchStr8 = new Regex("Sherlock", RegexOptions.None);

        [GlobalSetup(Target = nameof(MatchString_16))]
        public void Setup_MatchString_16() => _matchStr16 = new Regex("Sherlock Holmes ", RegexOptions.None);

        [GlobalSetup(Target = nameof(MatchString_52))]
        public void Setup_MatchString_52() => _matchStr52 = new Regex("Sherlock Holmes lived at 221B Baker Street in Lo", RegexOptions.None);

        [Benchmark]
        public bool MatchString_8() => _matchStr8.IsMatch(ShortText);

        [Benchmark]
        public bool MatchString_16() => _matchStr16.IsMatch(ShortText);

        [Benchmark]
        public bool MatchString_52() => _matchStr52.IsMatch(LongText);
    }
}

Replace the per-character loop in the Oneloop/Oneloopatomic opcode handler with a vectorized IndexOfAnyExcept call for left-to-right matching. This mirrors the existing optimization already applied to Notoneloop (which uses IndexOf), enabling SIMD-accelerated scanning when matching repeated occurrences of a single character (e.g. a+ or a{3,}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace the per-character loop in the Onerep opcode handler with a vectorized ContainsAnyExcept call for left-to-right matching. This enables SIMD-accelerated verification when matching a fixed number of occurrences of a single character (e.g. the minimum repetitions of a{5,}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace the per-character loop in the Notonerep opcode handler with a vectorized Contains call for left-to-right matching. This enables SIMD-accelerated verification when matching a fixed number of characters that must not be a specific character (e.g. [^a]{5}).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace the per-character backwards comparison loop in MatchString with a vectorized SequenceEqual call for left-to-right matching. This enables SIMD-accelerated string comparison when matching literal multi-character strings within regex patterns.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR optimizes hot-path opcode handling in RegexInterpreter by replacing per-character loops with SIMD-accelerated span operations for left-to-right matching, extending the existing vectorization precedent in the interpreter.

Changes:

Vectorize literal string matching (Multi / MatchString) using ReadOnlySpan<char>.SequenceEqual.
Vectorize fixed-count opcodes Onerep and Notonerep using ContainsAnyExcept / Contains for left-to-right paths.
Vectorize greedy single-char loops Oneloop / Oneloopatomic using IndexOfAnyExcept for left-to-right paths.

danmoseley · 2026-02-20T08:21:08Z

Real-world impact estimate: Analyzing the 15,817 unique patterns in the regex test corpus (assuming interpreter engine):

Multi/SequenceEqual is the most broadly impactful: ~38% of patterns contain literal substrings of 8+ chars (one SIMD register width), where vectorization provides clear wins. At 16+ chars it's ~10%.
Oneloop (a+, x*) appears in ~1% of patterns; actual benefit is input-length-dependent.
For +/* quantifiers generally, the speedup depends on matched length at runtime — a pattern like [^:]+ could match 1 char or 1000.

Follow-up PR #124630 adds SearchValues-based vectorization for Setloop/Setrep character class opcodes ([a-z]+, [0-9]{4}, etc.), covering an additional ~35% of patterns with explicit character classes (though again, benefit scales with matched length).
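To illustrate the general idea of a SearchValues-based scan for a character-class loop such as [a-z]+, here is a minimal sketch. It is not the code from PR #124630: the class name, field name, and the hard-coded lowercase set are assumptions made for the example, and the caller is assumed to pass an in-bounds `pos` and `max`.

```csharp
using System;
using System.Buffers;

// Hypothetical sketch; not taken from PR #124630.
static class SetloopSketch
{
    // Stand-in for a precomputed character class such as [a-z].
    private static readonly SearchValues<char> s_lowerAscii =
        SearchValues.Create("abcdefghijklmnopqrstuvwxyz");

    // Counts how many chars starting at pos belong to the set, capped at max.
    public static int CountSetRun(ReadOnlySpan<char> input, int pos, int max)
    {
        int i = input.Slice(pos, max).IndexOfAnyExcept(s_lowerAscii);
        return i < 0 ? max : i; // -1 means the whole slice is in the set
    }

    public static void Main()
    {
        ReadOnlySpan<char> text = "hello world";
        Console.WriteLine(CountSetRun(text, 0, text.Length));
    }
}
```

As with the single-character opcodes, any gain from this shape depends on how long the matched run is at runtime.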

stephentoub · 2026-02-20T12:23:29Z

@MihuBot benchmark Regex

MihuBot · 2026-02-20T13:59:40Z

See benchmark results at https://gist.github.com/MihuBot/2004009ab9c5dbd509e407cc62b2d5a5

danmoseley · 2026-02-20T19:01:50Z

MihuBot Benchmark Analysis

Compiled and NonBacktracking paths are entirely unaffected (ratios 0.98–1.02 across all suites), as expected since the PR only modifies interpreter opcodes.

Interpreter regressions flagged by MihuBot

Benchmark	Main	PR	Ratio
Email_IsMatch None	222.5 ns	249.2 ns	1.12
BoostDocs Id=5 None	213.5 ns	236.6 ns	1.11
BoostDocs Id=9 None	70.3 ns	77.3 ns	1.10
MatchWord None	879.5 ns	962.2 ns	1.09
BoostDocs Id=6 None	70.5 ns	75.0 ns	1.06
SliceSlice IgnoreCase None	680.9 ms	717.4 ms	1.05
Backtracking None	814.7 ns	855.4 ns	1.05
Cache 400K/7/15	27.9 ms	31.0 ms	1.11

Investigation: do these hit modified opcodes?

I mapped each regressed benchmark's pattern to the interpreter opcodes it exercises:

Email_IsMatch ^([a-zA-Z0-9_\-\.]+)@... → uses Setloop for character classes — not modified by this PR
BoostDocs Id=5 (same email pattern) → Setloop — not modified
BoostDocs Id=9 ^\d{1,2}/\d{1,2}/\d{4}$ → Setloop/Setrep for \d, One for / — not modified
BoostDocs Id=6 ^[a-zA-Z]{1,2}[0-9]... {0,1}... → Setloop/Setrep for char classes; Oneloop only for {0,1} with len≤1 — marginally touched
MatchWord tempus|magna|semper → alternation + MatchString for 5-6 char literals — touched, but SequenceEqual overhead negligible at this length
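Backtracking .*(ss) → Notoneloop for .* (without RegexOptions.Singleline, . is "not \n"; this opcode already used IndexOf before this PR), MatchString for 2-char "ss" — marginally touched, dominated by backtracking cost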
SliceSlice IgnoreCase (every word, case-insensitive) → IgnoreCase converts single chars to Set opcodes — not modified
Cache 400K/7/15 → cache lookup benchmark, not pattern-matching bound — not modified

5 of 8 regressions don't exercise any modified opcode. The 3 that marginally touch modified code are dominated by other costs (backtracking, alternation, cache behavior).

Root cause: JIT code layout effects

TryMatchAtCurrentPosition is an ~830-line method with a 40+ case switch. Adding if (!_rightToLeft) branches to 3 case arms changes the JIT-compiled native code layout for the entire method — shifting instruction cache boundaries, branch predictor state, and basic block alignment for all opcodes including unmodified ones. The same effect causes the improvement on \w+\s+Holmes\s+\w+ None (0.89 ratio, 11% faster) and the noise in the IgnoreCase Compiled suite (ReplaceWords 1.28 but SplitWords 0.84 — clearly not real).

These are interpreter-only, sub-microsecond-scale, on shared cloud VMs, affecting unmodified code paths — classic JIT layout noise.

danmoseley · 2026-02-20T23:25:47Z

build analysis is green - test failures are unrelated. ready for review?

danmoseley · 2026-02-23T05:42:44Z

MihuBot Results vs. Local Benchmarks

The MihuBot standard benchmark suites (Sherlock, Leipzig, BoostDocs, etc.) don't directly validate the 2x-7x local speedups, for two reasons. First, they use complex real-world patterns whose hot paths are mostly Setloop/Setrep (character classes) rather than the Oneloop/Onerep/Notonerep/MatchString opcodes modified here. Second, the literal strings in those patterns are short (e.g. Sherlock = 8 chars, where the local benchmarks show only ~1.1x).

What MihuBot does confirm:

Compiled/NonBacktracking paths are flat (0.98-1.02 ratios across all suites) -- expected since only interpreter opcodes were changed.
No real regressions -- of the 8 flagged interpreter regressions (1.05-1.12x), 5 don't exercise modified opcodes at all (they hit Setloop/Setrep/cache paths) and the other 3 touch modified code only marginally; see analysis above.
Directionally positive interpreter results:
- \w+\s+Holmes\s+\w+ None: 0.89 ratio (11% faster) -- plausibly from MatchString on Holmes
- the None: 0.97, Sherlock Holmes None: 0.98, Sherlock\s+Holmes None: 0.97 -- consistent with small MatchString wins on short strings
- the\s+\w+ None: 0.97

The local microbenchmarks are the right tool for validating these specific codepaths since they isolate the modified opcodes with long enough inputs to show the SIMD gains.

stephentoub · 2026-03-15T00:53:49Z

...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs

                if (inputSpan.Length - runtextpos < c)
                {
                    return false;
                }

-                pos = runtextpos + c;
+                if (!inputSpan.Slice(runtextpos, c).SequenceEqual(str.AsSpan()))


Can this be simplified to:

Suggested change

-                if (!inputSpan.Slice(runtextpos, c).SequenceEqual(str.AsSpan()))
+                if (runtextpos > inputSpan.Length ||
+                    !inputSpan.Slice(runtextpos).StartsWith(str.AsSpan()))

? Then you also wouldn't need the earlier int c = str.Length for this code path.
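The two shapes under discussion can be compared side by side. This is a standalone sketch with hypothetical method names, assuming a non-negative `pos`; it mirrors the diff and the suggestion above but is not the interpreter's code.

```csharp
using System;

// Hypothetical helpers comparing the two forms of the left-to-right literal check.
static class LiteralMatchSketch
{
    // Shape from the PR diff: explicit length check, then Slice + SequenceEqual.
    public static bool MatchSequenceEqual(ReadOnlySpan<char> input, int pos, string str)
    {
        int c = str.Length;
        if (input.Length - pos < c)
        {
            return false;
        }
        return input.Slice(pos, c).SequenceEqual(str.AsSpan());
    }

    // Shape from the review suggestion: StartsWith does its own length check,
    // so only the slice start needs guarding.
    public static bool MatchStartsWith(ReadOnlySpan<char> input, int pos, string str) =>
        pos <= input.Length && input.Slice(pos).StartsWith(str.AsSpan());

    public static void Main()
    {
        ReadOnlySpan<char> text = "Sherlock Holmes";
        Console.WriteLine(MatchSequenceEqual(text, 9, "Holmes") == MatchStartsWith(text, 9, "Holmes"));
        Console.WriteLine(MatchSequenceEqual(text, 12, "Holmes") == MatchStartsWith(text, 12, "Holmes"));
    }
}
```

StartsWith returns false when the literal is longer than the remaining input, which is why the separate length variable is no longer needed in the second form.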

stephentoub · 2026-03-15T00:55:18Z

...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs

+                while (c != 0)
                {
-                    return false;
+                    if (str[--c] != inputSpan[--pos])
+                    {
+                        return false;
+                    }
                }


Can/should this also be a SequenceEqual or EndsWith or equivalent?
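One way the right-to-left loop could be expressed with EndsWith is sketched below. This is an assumption-laden illustration, not a proposed patch: the method names are hypothetical, the `pos < c` guard in the loop version is assumed rather than shown in the diff, and `pos` is assumed to lie within `0..input.Length`.

```csharp
using System;

// Hypothetical helpers comparing the backward loop with an EndsWith-based check.
static class RightToLeftSketch
{
    // Loop shape from the diff: compare str against the chars just before pos,
    // walking both backwards.
    public static bool MatchBackwardLoop(ReadOnlySpan<char> input, int pos, string str)
    {
        int c = str.Length;
        if (pos < c) // assumed guard: not enough chars before pos
        {
            return false;
        }
        while (c != 0)
        {
            if (str[--c] != input[--pos])
            {
                return false;
            }
        }
        return true;
    }

    // Equivalent check: the text before pos must end with str.
    public static bool MatchEndsWith(ReadOnlySpan<char> input, int pos, string str) =>
        input.Slice(0, pos).EndsWith(str.AsSpan());

    public static void Main()
    {
        ReadOnlySpan<char> text = "Sherlock Holmes";
        Console.WriteLine(MatchBackwardLoop(text, 15, "Holmes") == MatchEndsWith(text, 15, "Holmes"));
        Console.WriteLine(MatchBackwardLoop(text, 3, "Sherlock") == MatchEndsWith(text, 3, "Sherlock"));
    }
}
```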

stephentoub · 2026-03-15T00:59:14Z

...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs

+                            if (!_rightToLeft)
                            {
-                                if (Forwardcharnext(inputSpan) != ch)
+                                if (inputSpan.Slice(runtextpos, c).ContainsAnyExcept(ch))


Is this one actually beneficial? It's pretty rare to have long runs of a single specific character.

stephentoub · 2026-03-15T01:00:47Z

...raries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs

-                                if (Forwardcharnext(inputSpan) != ch)
+                                // We're left-to-right, so we can employ the vectorized IndexOfAnyExcept
+                                // to search for any character that isn't the target.
+                                i = inputSpan.Slice(runtextpos, len).IndexOfAnyExcept(ch);


Same question.

If these onerep/loop/loopatomic actually improve performance on real regexes rather than microbenchmarks targeting these cases, great. Otherwise, though, I'd rather avoid adding more specialized code paths for things that only help with fake scenarios.

danmoseley and others added 4 commits February 19, 2026 22:23

Copilot AI review requested due to automatic review settings February 20, 2026 07:13

github-actions bot added the area-System.Text.RegularExpressions label Feb 20, 2026

dotnet-policy-service bot assigned danmoseley Feb 20, 2026

Copilot started reviewing on behalf of danmoseley February 20, 2026 07:13

Copilot AI reviewed Feb 20, 2026

danmoseley mentioned this pull request Feb 20, 2026

Use SearchValues for Setrep/Setloop regex interpreter opcodes #124630

Closed

build-analysis bot mentioned this pull request Feb 20, 2026

[android][clr] No peer certificates when executing System.Net.Http.Functional.Tests on Android emulator #124526

Open

MihuBot mentioned this pull request Feb 20, 2026

[Benchmark X64] [danmoseley] Vectorize RegexInterpreter opcode loops for One ... MihuBot/runtime-utils#1773

Open

danmoseley mentioned this pull request Feb 20, 2026

FileSystemWatcher_SymbolicLink_TargetsDirectory_Create_IncludeSubdirectories failed with missed event #124677

Open

stephentoub reviewed Mar 15, 2026

danmoseley wants to merge 4 commits into dotnet:main from danmoseley:vectorize-regex-interpreter
