
Use SearchValues for Setrep/Setloop regex interpreter opcodes#124630

Closed
danmoseley wants to merge 4 commits into dotnet:main from danmoseley:interpreter-searchvalues

Conversation

@danmoseley
Member

Summary

Precompute SearchValues<char> for character class strings at regex construction time, and use them in the Setrep and Setloop/Setloopatomic interpreter opcode handlers to replace per-character CharInClass loops with vectorized SIMD-accelerated span operations.

Character classes that use Unicode categories, subtraction+negation, or have more than 128 characters fall back to the existing per-character path. A SetSearchValues wrapper struct encapsulates SearchValues<char> and the set's negation flag so the interpreter doesn't need to know whether the class was defined as negated.
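The wrapper described above might look roughly like this. This is a sketch, not the actual dotnet/runtime code: the type name, field names, and `IndexOfLoopEnd` are assumptions made for illustration.

```csharp
using System;
using System.Buffers;

// Sketch of the SetSearchValues idea: bundle the precomputed matcher with the
// negation flag so callers run one vectorized scan either way.
internal readonly struct SetSearchValuesSketch
{
    private readonly SearchValues<char> _values; // chars enumerated from the class
    private readonly bool _negated;              // true if the class was written [^...]

    public SetSearchValuesSketch(SearchValues<char> values, bool negated) =>
        (_values, _negated) = (values, negated);

    // Returns the index of the first char that ends the loop, or the span
    // length if every char matched. For a negated class the enumerated chars
    // are the *excluded* ones, so the loop ends at the first char IN the set.
    public int IndexOfLoopEnd(ReadOnlySpan<char> span)
    {
        int i = _negated ? span.IndexOfAny(_values) : span.IndexOfAnyExcept(_values);
        return i < 0 ? span.Length : i;
    }
}
```

With a shape like this, a loop handler can slice the remaining input, call `IndexOfLoopEnd`, and advance, removing the per-character CharInClass call from the hot loop.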

This is a follow-up to #124628 which vectorized the Oneloop, Onerep, Notonerep, and MatchString opcodes.

Benchmark Results

| Benchmark | Before (ns) | After (ns) | Ratio | Speedup |
|---|---|---|---|---|
| Setloop_AZ_64 | 115.12 | 47.04 | 0.41 | 2.4x |
| Setloop_AZ_256 | 290.71 | 47.36 | 0.16 | 6.1x |
| Setloop_Digit_256 | 285.67 | 51.36 | 0.18 | 5.6x |
| Setrep_AZ_64 | 91.59 | 30.67 | 0.33 | 3.0x |
| Setrep_AZ_256 | 247.85 | 32.99 | 0.13 | 7.5x |
Benchmark code:

```csharp
using BenchmarkDotNet.Attributes;
using MicroBenchmarks;

namespace System.Text.RegularExpressions.Tests
{
    [BenchmarkCategory(Categories.Libraries, Categories.Regex)]
    public class Perf_Regex_Interpreter_Vectorize
    {
        // === Setloop: greedy character class loops like [a-z]+, \w+, [A-Za-z0-9]+ ===
        // These use SearchValues + IndexOfAnyExcept in the optimized path

        private Regex _setloopAZ64, _setloopAZ256, _setloopDigit256;

        [GlobalSetup(Target = nameof(Setloop_AZ_64))]
        public void Setup_Setloop_AZ_64() => _setloopAZ64 = new Regex("[a-z]+", RegexOptions.None);

        [GlobalSetup(Target = nameof(Setloop_AZ_256))]
        public void Setup_Setloop_AZ_256() => _setloopAZ256 = new Regex("[a-z]+", RegexOptions.None);

        [GlobalSetup(Target = nameof(Setloop_Digit_256))]
        public void Setup_Setloop_Digit_256() => _setloopDigit256 = new Regex("[0-9]+", RegexOptions.None);

        private const string LowerAZ64 = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijkl"; // 64 chars
        private const string LowerAZ256 = LowerAZ64 + LowerAZ64 + LowerAZ64 + LowerAZ64;
        private const string Digits256 = "1234567890123456789012345678901234567890123456789012345678901234" +
                                         "1234567890123456789012345678901234567890123456789012345678901234" +
                                         "1234567890123456789012345678901234567890123456789012345678901234" +
                                         "1234567890123456789012345678901234567890123456789012345678901234";

        [Benchmark]
        public Match Setloop_AZ_64() => _setloopAZ64.Match(LowerAZ64);

        [Benchmark]
        public Match Setloop_AZ_256() => _setloopAZ256.Match(LowerAZ256);

        [Benchmark]
        public Match Setloop_Digit_256() => _setloopDigit256.Match(Digits256);

        // === Setrep: fixed-count character class like [a-z]{64}, [a-z]{256} ===
        // These use SearchValues + ContainsAnyExcept in the optimized path

        private Regex _setrepAZ64, _setrepAZ256;

        [GlobalSetup(Target = nameof(Setrep_AZ_64))]
        public void Setup_Setrep_AZ_64() => _setrepAZ64 = new Regex("[a-z]{64}", RegexOptions.None);

        [GlobalSetup(Target = nameof(Setrep_AZ_256))]
        public void Setup_Setrep_AZ_256() => _setrepAZ256 = new Regex("[a-z]{256}", RegexOptions.None);

        [Benchmark]
        public bool Setrep_AZ_64() => _setrepAZ64.IsMatch(LowerAZ64);

        [Benchmark]
        public bool Setrep_AZ_256() => _setrepAZ256.IsMatch(LowerAZ256);
    }
}
```

Precompute SearchValues<char> for character class strings at regex
construction time. Use them in the Setrep and Setloop/Setloopatomic
opcode handlers to replace per-character CharInClass loops with
vectorized SIMD-accelerated span operations.

Character classes that use Unicode categories, subtraction+negation,
or have more than 128 characters fall back to the existing per-character
path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 20, 2026 07:56
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment


Pull request overview

This PR extends the regex interpreter’s SIMD/vectorization work by precomputing SearchValues<char> for eligible character-class strings at construction time, then using those precomputed matchers in the Setrep and Setloop/Setloopatomic opcode handlers to replace per-character CharInClass loops with span-based vectorized operations.

Changes:

  • Add RegexInterpreterCode.StringsSetSearchValues to precompute SearchValues<char> (plus negation) for small/enumerable character classes.
  • Introduce SetSearchValues helper struct to encapsulate the SearchValues<char> and negation semantics.
  • Update RegexInterpreter Setrep and Setloop handlers to use ContainsAnyExcept / IndexOfAnyExcept (or inverted forms for negated sets) when running left-to-right.
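As a concrete illustration of the last bullet, a greedy set loop over a span reduces to a single `IndexOfAnyExcept` call. This is a simplified sketch under assumed types, not the interpreter's actual handler:

```csharp
using System;
using System.Buffers;

internal static class SetLoopSketch
{
    // Count how many leading chars of `input` (capped at `max`) are in the set.
    // IndexOfAnyExcept is the vectorized replacement for a per-char
    // CharInClass loop; -1 means the whole slice matched.
    public static int CountMatching(ReadOnlySpan<char> input, SearchValues<char> set, int max)
    {
        ReadOnlySpan<char> slice = input.Slice(0, Math.Min(max, input.Length));
        int i = slice.IndexOfAnyExcept(set);
        return i < 0 ? slice.Length : i;
    }
}
```

Setrep is the same idea with `ContainsAnyExcept`: it only needs to know whether all `count` chars are in the set, not where the first mismatch is.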

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreterCode.cs | Precomputes `SearchValues<char>`-based matchers for eligible set strings and exposes them to the interpreter. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs | Uses precomputed set matchers to vectorize Setrep and Setloop opcode execution for left-to-right matching. |

@danmoseley
Member Author

Real-world impact estimate: Analyzing the 15,817 unique patterns in the regex test corpus (assuming interpreter engine):

  • Setloop ([a-z]+, [^:=]+, etc.): ~35% of patterns have explicit character class loops eligible for SearchValues vectorization. Actual speedup is input-length-dependent — the 2.4x–7.5x benchmark wins above reflect 64–256 char matches.
  • Setrep ([a-f0-9]{32}, etc.): ~2% of patterns have fixed-count character classes at 8+ chars.
  • Another ~35% of patterns use shorthand classes (\w+, \d+, \s+) which use Unicode categories — GetSetChars returns 0 for these so they fall back to per-character CharInClass. These could be a future optimization opportunity.

danmoseley and others added 2 commits February 20, 2026 01:28
The Strings table contains both character class strings and Multi
literal strings. Add validation that the string has a well-formed
char-class encoding (valid flags byte, consistent lengths) before
calling GetSetChars, which assumes well-formed input. Also clarify
the comment about GetSetChars behavior for negated sets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A Multi literal starting with \0 and having an even-valued second byte
and \0 at index 2 satisfies all CanEasilyEnumerateSetContents checks,
causing GetSetChars to enumerate past the end of the string. This test
verifies CreateSetSearchValues validates the encoding first.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 20, 2026 08:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@stephentoub
Member

What is the impact on Regex construction time?

We can make the Regex interpreter much faster by not using the regex interpreter, e.g. implicitly adding .Compiled, so the main performance benefit of the interpreter is faster construction for cases where it will be rarely used.

Also, can you share the output of benchmarks for shorter runs in the input, like where a set+ will match only 1, 2, 3 characters?

@stephentoub
Member

@MihuBot benchmark Regex

@MihuBot

MihuBot commented Feb 20, 2026

@danmoseley
Member Author

@stephentoub Here are benchmarks addressing both concerns:

Construction cost:

| Benchmark | Before (ns) | After (ns) | Ratio | Extra alloc |
|---|---|---|---|---|
| Ctor_NoCharClass | 224 | 225 | 1.00 | +56 B (5%) |
| Ctor_1CharClass | 326 | 573 | 1.76 | +88 B (6%) |
| Ctor_3CharClasses | 768 | 1,457 | 1.90 | +200 B (9%) |
| Ctor_5CharClasses | 1,249 | 2,174 | 1.74 | +304 B (10%) |
| Ctor_10CharClasses | 2,025 | 3,578 | 1.77 | +576 B (11%) |
| Ctor_RealWorld_Email | 2,707 | 4,470 | 1.65 | +240 B (4%) |
| Ctor_UnicodeCategories (`\w+\s+\d+`) | 581 | 866 | 1.49 | +104 B (6%) |

SearchValues.Create adds ~200-250ns per character class. For a pattern with 10 classes, that's ~1.5µs extra construction cost.

Short match lengths ([a-z]+ matching N chars):

| Match length | Before (ns) | After (ns) | Ratio |
|---|---|---|---|
| 1 | 42.4 | 42.1 | 0.99 |
| 2 | 43.0 | 43.7 | 1.02 |
| 3 | 45.2 | 43.1 | 0.95 |
| 8 | 50.0 | 44.0 | 0.88 |
| 16 | 56.6 | 44.4 | 0.79 |

Short matches (1-3 chars) are neutral — no regression. The SearchValues dispatch overhead is negligible because CharInClass per iteration is already relatively expensive (ASCII table lookup + bit ops). Wins start at ~8 chars.

So the tradeoff is ~1.7x slower construction vs 2-7x faster matching for longer character class runs. For a rarely-used regex where construction dominates, this is a net cost.

Benchmark code:

```csharp
[BenchmarkCategory(Categories.Libraries, Categories.Regex)]
public class Perf_Regex_Interpreter_SearchValues_Impact
{
    [Benchmark]
    public Regex Ctor_NoCharClass() => new Regex("hello world");

    [Benchmark]
    public Regex Ctor_1CharClass() => new Regex("[a-z]+");

    [Benchmark]
    public Regex Ctor_3CharClasses() => new Regex("[a-z]+[0-9]+[A-Z]+");

    [Benchmark]
    public Regex Ctor_5CharClasses() => new Regex("[a-z]+[0-9]+[A-Z]+[a-f]+[!@#]+");

    [Benchmark]
    public Regex Ctor_10CharClasses() => new Regex("[a-z]+[0-9]+[A-Z]+[a-f]+[!@#]+[g-m]+[4-8]+[N-T]+[x-z]+[,;:]+");

    [Benchmark]
    public Regex Ctor_RealWorld_Email() => new Regex(@"[a-z0-9]+(?:\.[a-z0-9]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?");

    [Benchmark]
    public Regex Ctor_UnicodeCategories() => new Regex(@"\w+\s+\d+");

    private Regex _setloop;

    [GlobalSetup(Targets = new[] {
        nameof(Setloop_Match1), nameof(Setloop_Match2), nameof(Setloop_Match3),
        nameof(Setloop_Match8), nameof(Setloop_Match16) })]
    public void Setup_Setloop() => _setloop = new Regex("[a-z]+", RegexOptions.None);

    [Benchmark]
    public Match Setloop_Match1() => _setloop.Match("a.");

    [Benchmark]
    public Match Setloop_Match2() => _setloop.Match("ab.");

    [Benchmark]
    public Match Setloop_Match3() => _setloop.Match("abc.");

    [Benchmark]
    public Match Setloop_Match8() => _setloop.Match("abcdefgh.");

    [Benchmark]
    public Match Setloop_Match16() => _setloop.Match("abcdefghijklmnop.");
}
```

@danmoseley
Member Author

danmoseley commented Feb 20, 2026

So maybe this comes down to whether we prefer to optimize for

  • quick, throwaway interpreted Regexes newed up many times (not reused) that have several char classes and run on small inputs only, so construction time is significant. These will regress, though the absolute size of the regression is around a microsecond each.
    or
  • interpreted Regexes that spend significant time matching char classes. These will improve with no upper limit on the improvement; pathological cases can get dramatically better.

I'm inclined to say it's a worthwhile tradeoff because it can improve performance dramatically, without limit, in cases where significant time is being spent, against a very small construction impact.

In general, changing from interpreted to source-generated (when the pattern is static, which is generally the case) is a no-brainer improvement over interpreted. So the relevant scenario here is long-standing libraries that are not being actively maintained. (I'm ignoring throwaway code where interpreted is used just to type slightly less.) Not sure how that impacts the calculus here. It is a good reason to care about optimizing the interpreter even though it's rarely the right choice for perf (not suggesting you're arguing otherwise).

thoughts?

@danmoseley
Member Author

Analysis of the MihuBot benchmark results, focusing on interpreter (Options=None):

Matching wins (Sherlock corpus, interpreter only):

| Pattern | Main (ns) | PR (ns) | Ratio |
|---|---|---|---|
| `[a-q][^u-z]{13}x` | 35,355 | 32,073 | 0.91 |
| `(?s).*` | 977,442 | 894,681 | 0.92 |
| `[a-zA-Z]+ing` | 9,478,381 | 8,917,699 | 0.94 |
| `\s[a-zA-Z]{0,12}ing\s` | 10,070,462 | 9,533,484 | 0.95 |
| `\w+\s+Holmes` | 7,916,515 | 7,599,869 | 0.96 |
| `Sher[a-z]+\|Hol[a-z]+` | 123,404 | 121,523 | 0.98 |
| SliceSlice IgnoreCase | 678.2 ms | 628.7 ms | 0.93 |

Neutral / no benefit (as expected — \w, \d, \s use Unicode categories, no SearchValues):

| Pattern | Ratio |
|---|---|
| `\w+` | 1.02 |
| `\b\w+n\b` | 0.99 |
| `\p{L}` | 1.01 |
| `Huck[a-zA-Z]+\|Saw[a-zA-Z]+` | 1.00 |

Construction overhead (Mariomkas, real-world patterns):

| Pattern | Ctor Main (µs) | Ctor PR (µs) | Ratio |
|---|---|---|---|
| IP validator (87 chars, many classes) | 5.47 | 5.71 | 1.04 |
| URL pattern (51 chars) | 2.58 | 2.63 | 1.02 |
| Email pattern | 1.45 | 1.46 | 1.01 |

The real-world construction overhead is only 1-4%, much smaller than the ~1.7x my micro-benchmarks showed for minimal patterns. This is because SearchValues.Create is a small fraction of total construction cost once you include parsing, tree optimization, and opcode generation.

Summary: 5-9% matching wins for explicit character class patterns on realistic inputs, 1-4% construction overhead for real-world patterns, neutral for \w/\d/\s patterns.

@MihaZupan
Member

MihaZupan commented Feb 20, 2026

If this goes beyond just the process startup overhead and you're creating SearchValues often, we could look into optimizing the Create more.
I did a simple test of vectorizing the min/max computation we do as the first step in #124667, and that makes Create ~2x cheaper (for a set with 64 chars). EgorBot/runtime-utils#654 (comment)

@danmoseley
Member Author

@MihaZupan interesting, any reason we shouldn't take that change? seems pretty localized complexity.

otherwise, this is ready for review I think. (Optionally, we could wait on the proposed SearchValues change.)

@MihaZupan
Member

Just that it's more (unsafe) code. I wouldn't block any changes on that though.

@danmoseley
Member Author

Just that it's more (unsafe) code. I wouldn't block any changes on that though.

true. agreed

@danmoseley
Member Author

danmoseley commented Mar 12, 2026

I will do an experiment on the worst case: one-shot `new Regex(pattern).IsMatch(input)` (where the interpreter is likely common, and any construction perf regression would be most impactful), using various real-world patterns with both short and long inputs, match and non-match.

Then measure with BDN the overall cost of `new Regex(pattern).IsMatch(input)` in each case. This combines construction + search. If generally faster, this change is a win because search dominates the construction cost. If generally slower, we should not take it.
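The one-shot shape under test, as a minimal BenchmarkDotNet sketch. The pattern and input here are placeholders, not entries from the real-world corpus:

```csharp
using BenchmarkDotNet.Attributes;
using System.Text.RegularExpressions;

public class OneShotSketch
{
    // Construction + match measured together, as in the experiment described
    // above: the interpreter's construction overhead is paid on every call.
    [Benchmark]
    public bool OneShot() => new Regex("[a-z0-9]+").IsMatch("abc123xyz!");

    // For contrast: matching only, with construction amortized away.
    private static readonly Regex s_cached = new Regex("[a-z0-9]+");

    [Benchmark]
    public bool MatchOnly() => s_cached.IsMatch("abc123xyz!");
}
```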

@danmoseley
Member Author

SearchValues One-Shot Regex Performance Report

PR: #124630 — Add SearchValues.Create() precomputation to regex interpreter
Date: 2026-03-12/13
Machine: Windows 11, Intel Core i9-14900K, X64 RyuJIT AVX2

Executive Summary

The PR adds SearchValues.Create() precomputation during regex construction for character classes with explicit ranges. This costs extra construction time (~10-30%) but can dramatically speed up matching (up to 95% faster) for patterns with eligible char classes on long inputs.

Key question from Stephen Toub: Does the construction overhead pay for itself in one-shot
(construct + match) scenarios?

Answer: It depends on input length and pattern type:

  • Long inputs (2000+ chars): YES for most eligible patterns — OneShot is 28-95% faster
  • Short/medium inputs (<100 chars): NO — OneShot is 5-50% slower due to construction overhead
  • Some patterns regress regardless — negated char classes and patterns where char-class search
    isn't the bottleneck show pure overhead

Methodology

  • 20 real-world patterns from Regex_RealWorldPatterns.json, selected by NuGet download count
  • Grouped by eligible char class count: A (0, control), B (1), C (2-3), D (4+)
  • 6 input tiers per pattern: VeryShortNonMatch(5), ShortMatch(30), MediumNonMatch(100),
    MediumMatch(100), LongNonMatch(2000), LongMatch(2000)
  • A/B via BDN --coreRun with baseline at merge-base ab11a456596 and PR at 195e844920e
  • DLL hashes verified before and after: baseline DF945E46... vs PR EC83F5F6... (confirmed different)
  • 18/20 patterns completed successfully (C4, D4 failed due to input validation bugs)

Group Average OneShot Ratios (PR / Baseline)

< 1.0 = PR faster, > 1.0 = PR slower

| Group | VShort NM | Short M | Med NM | Med M | Long NM | Long M | Count Long | Construct |
|---|---|---|---|---|---|---|---|---|
| A: 0 eligible (control) | 1.02 | 1.04 | 0.99 | 1.03 | 1.01 | 1.05 | 1.02 | 1.03 |
| B: 1 eligible | 1.11 | 1.09 | 1.07 | 1.08 | 0.94 | 0.90 | 0.87 | 1.15 |
| C: 2-3 eligible | 1.07 | 1.04 | 1.07 | 1.02 | 0.96 | 0.88 | 0.90 | 1.10 |
| D: 4+ eligible | 1.12 | 1.05 | 0.96 | 1.05 | 0.60 | 0.72 | 0.58 | 1.17 |

Group A (control) shows ~1-4% noise, confirming no regression for patterns with 0 eligible classes.

Group D (4+ eligible) shows the clearest trend: Long-NonMatch averages 0.60 (40% faster),
Count-LongMatch averages 0.58 (42% faster).

Construction Overhead

| Group | Avg Construct Ratio | Interpretation |
|---|---|---|
| A (control) | 1.026 | ~2.6% noise — no meaningful overhead |
| B (1 eligible) | 1.154 | ~15% overhead |
| C (2-3 eligible) | 1.102 | ~10% overhead |
| D (4+ eligible) | 1.170 | ~17% overhead |

Absolute overhead ranges from ~1ns (A1) to +1.8us (D3 email validation).
Typical: +48-214ns for most patterns.

MatchOnly Ratios Reveal the Truth

MatchOnly benchmarks use a pre-constructed Regex, isolating the matching speedup from construction overhead.
This is the clearest signal of whether SearchValues helps a pattern.

Patterns where matching is dramatically faster (MatchOnly LongMatch ratio):

| Pattern | Regex | MatchOnly Long | OneShot Long | Why |
|---|---|---|---|---|
| B4 | `^(?<LINE>[0-9]*)$` | 0.037 (27x) | 0.320 (3x) | Simple `[0-9]*` — SearchValues dominates |
| C2 | `^...[0-9]*-...[0-9]*$` | 0.042 (24x) | 0.432 (2.3x) | Two `[0-9]*` groups |
| D5 | `^[A-Za-z0-9-_]+\....` | 0.046 (22x) | 0.448 (2.2x) | Large char class with + quantifier |
| D2 | `^...[0-9]*,...[0-9]*,...` | 0.053 (19x) | 0.534 (1.9x) | Four `[0-9]*` groups |
| D1 | complex CSV-like | 0.054 (18x) | 0.052 (19x)* | Multiple char classes, long scan |

*D1 LongNonMatch: 1.8ms down to 94.1us — the biggest absolute win

Patterns where matching is NOT faster (MatchOnly ~1.0):

| Pattern | Regex | MatchOnly Long | OneShot Long | Why |
|---|---|---|---|---|
| B1 | `[^a-zA-Z0-9_.]` | 0.995 | 1.166 | Negated class — SearchValues doesn't help |
| B3 | `^[.]` | 1.005 | 0.973 | Single char — trivial class |
| C3 | `(?<lang>[a-z]{2,8})...` | 0.985 | 1.082 | Bounded quantifier {2,8} — few iterations |
| D3 | email validation | 0.955 | 1.171 | Complex pattern — backtracking dominates |

Key Individual Results

Biggest Wins (OneShot)

| Pattern | Input Tier | Baseline | PR | Ratio | Saved |
|---|---|---|---|---|---|
| D1 | LongNonMatch | 1.8 ms | 94.1 us | 0.052 | 1.72 ms |
| D1 | Count_Long | 376.9 us | 55.6 us | 0.148 | 321 us |
| B4 | LongMatch | 2.6 us | 833 ns | 0.320 | 1.77 us |
| C2 | Count_Long | 3.0 us | 1.3 us | 0.427 | 1.7 us |
| D5 | LongMatch | 2.8 us | 1.3 us | 0.448 | 1.5 us |
| B4 | LongNonMatch | 1.7 us | 811 ns | 0.477 | 889 ns |
| D1 | MediumNonMatch | 9.5 us | 4.7 us | 0.499 | 4.8 us |

Biggest Regressions (OneShot)

| Pattern | Input Tier | Baseline | PR | Ratio | Added |
|---|---|---|---|---|---|
| B1 | VeryShortNM | 463.9 ns | 695.4 ns | 1.499 | +232 ns |
| B1 | MediumNonMatch | 531.7 ns | 775.7 ns | 1.459 | +244 ns |
| D3 | VeryShortNM | 6.5 us | 8.1 us | 1.244 | +1.6 us |
| C5 | MediumNonMatch | 602.3 ns | 718.4 ns | 1.193 | +116 ns |

B1 stands out: [^a-zA-Z0-9_.] (negated class) has 56.5% construction overhead with zero
matching benefit. This is the worst pattern — pure regression on all input sizes.

Conclusions

1. The optimization works spectacularly for the right patterns

Patterns with unbounded quantifiers (`*`, `+`) on non-negated char classes see 90-96% faster matching on long inputs. Even in one-shot mode (including construction overhead), these are 50-95% faster on long inputs.

2. Construction overhead is real but modest

~10-17% construction overhead for eligible patterns. In absolute terms, typically 50-200ns.
Group A (control) confirms this is specific to eligible patterns, not a general regression.

3. Short-input regression is consistent

For inputs under ~100 chars, construction overhead dominates and OneShot is ~5-15% slower.
This is the unavoidable cost of precomputation.

4. Negated classes are a concern

B1 ([^a-zA-Z0-9_.]) shows 50% regression with zero benefit. The PR should consider
excluding negated char classes from SearchValues optimization, or the overhead should be
investigated.

5. Breakeven analysis

The one-shot breakeven point depends on the pattern, but roughly:

  • Patterns with large char class + unbounded quantifier: breakeven at ~100-500 chars
  • Patterns with small/few eligible classes: breakeven at ~1000-2000 chars
  • Patterns with negated classes or bounded quantifiers: may never break even

6. Overall verdict

For the interpreter one-shot path, the optimization is a net positive for real-world workloads
where inputs tend to be medium-to-long. The dramatic wins on long inputs (19x for D1!) far
outweigh the modest short-input overhead. However, negated char classes should be investigated
as a potential exclusion.

Failed Patterns

  • C4 (Content-Disposition=...): LongMatch input didn't actually match the regex — input data bug
  • D4 (IP address validation): VeryShortNonMatch input accidentally matched — input data bug

These are test harness issues, not PR issues. 18/20 patterns provide solid data.

Appendix: All OneShot Ratios

| Pat | VShort NM | Short M | Med NM | Med M | Long NM | Long M | Count | Construct | Pattern |
|---|---|---|---|---|---|---|---|---|---|
| A1 | 1.053 | 1.035 | 1.035 | 1.044 | 1.001 | 1.006 | 1.008 | 1.003 | `\s+` |
| A2 | 1.016 | 1.065 | 1.049 | 1.069 | 1.069 | 1.089 | 0.937 | 1.079 | `(\(.*?\))` |
| A3 | 1.030 | 1.043 | 1.017 | 1.006 | 1.001 | 1.101 | 1.102 | 0.984 | `<.*>` |
| A4 | 1.002 | 1.057 | 0.979 | 1.036 | 1.009 | 1.022 | 1.048 | 1.041 | `\%(\d+)!.*?!` |
| A5 | 1.001 | 1.006 | 0.886 | 0.979 | 0.987 | 1.018 | 0.995 | 1.024 | `^[^ ]*$` |
| B1 | 1.499 | 1.416 | 1.459 | 1.452 | 1.125 | 1.166 | 1.093 | 1.565 | `[^a-zA-Z0-9_.]` |
| B2 | 1.015 | 1.065 | 1.015 | 1.040 | 1.016 | 1.023 | 0.979 | 1.027 | `^-?([^-+/*\(\)\^\s]+)` |
| B3 | 0.943 | 0.979 | 0.980 | 0.962 | 1.010 | 0.973 | 0.930 | 0.992 | `^[.]` |
| B4 | 1.062 | 0.993 | 0.930 | 0.939 | 0.477 | 0.320 | 0.325 | 1.122 | `^(?<LINE>[0-9]*)$` |
| B5 | 1.035 | 1.007 | 0.956 | 1.025 | 1.054 | 1.006 | 1.011 | 1.066 | ^\{([^\} ]&#124;\}\})*\}$ |
| C1 | 0.996 | 1.032 | 0.997 | 1.015 | 0.988 | 0.986 | 1.069 | 0.993 | CLI flag parser |
| C2 | 1.031 | 1.006 | 0.971 | 0.907 | 0.596 | 0.432 | 0.427 | 1.039 | Line range `[0-9]*-[0-9]*` |
| C3 | 1.096 | 1.078 | 1.115 | 1.099 | 1.091 | 1.082 | 1.063 | 1.155 | Language tag |
| C5 | 1.155 | 1.051 | 1.193 | 1.046 | 1.145 | 1.027 | 1.054 | 1.223 | `trackId[^0-9]*([0-9]*)` |
| D1 | 1.011 | 0.983 | 0.499 | 0.999 | 0.052 | 0.724 | 0.148 | 1.041 | CSV-like parser |
| D2 | 1.028 | 0.997 | 1.016 | 0.985 | 0.708 | 0.534 | 0.540 | 1.061 | Line,col range |
| D3 | 1.244 | 1.177 | 1.180 | 1.159 | 0.984 | 1.171 | 1.192 | 1.284 | Email validation |
| D5 | 1.186 | 1.054 | 1.142 | 1.049 | 0.642 | 0.448 | 0.449 | 1.292 | JWT/dotted name |

@danmoseley
Member Author

Follow-up on the one-shot benchmark results: negated class overhead

While investigating why [^a-zA-Z0-9_.] showed 56% construction overhead with zero matching benefit, I looked at the code paths.

CreateSetSearchValues() builds a SearchValues for every eligible char class string unconditionally. But on the matching side, only the bulk-scanning opcodes use it (Setloop, Setloopatomic, Setrep). The single-char Set opcode and lazy Setlazy skip it entirely — reasonably, since vectorized search over 1 char isn't useful.

So for patterns like [^a-zA-Z0-9_.] (no quantifier → Set opcode), the SearchValues.Create() cost is pure overhead. This isn't specific to negation — it affects any unquantified or lazy-quantified char class.

A possible fix: only create SearchValues for class strings that are actually referenced by Setloop/Setrep/Setloopatomic opcodes.
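A minimal sketch of that fix, using stand-in types rather than the interpreter's real internal opcode representation (the `Op` enum and tuple-based program below are illustrative assumptions):

```csharp
using System;
using System.Buffers;

// Stand-in for the interpreter's opcode stream; not the real RegexOpcode enum.
internal enum Op { Set, Setlazy, Setloop, Setloopatomic, Setrep }

internal static class SearchValuesGating
{
    // Precompute SearchValues only for set strings that a bulk-scanning opcode
    // (Setloop/Setloopatomic/Setrep) actually references. Classes used only by
    // Set or Setlazy stay null and pay no SearchValues.Create cost.
    public static SearchValues<char>?[] PrecomputeReferencedSets(
        (Op op, int stringIndex)[] program, string[] setChars)
    {
        var result = new SearchValues<char>?[setChars.Length];
        foreach (var (op, idx) in program)
        {
            if ((op is Op.Setloop or Op.Setloopatomic or Op.Setrep) && result[idx] is null)
            {
                result[idx] = SearchValues.Create(setChars[idx]);
            }
        }
        return result;
    }
}
```

Under this gating, an unquantified `[^a-zA-Z0-9_.]` (Set opcode only) would no longer pay the Create cost at construction time.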

@danmoseley
Member Author

Overall, the construction overhead makes this a net loss for short inputs across the board, and even for eligible patterns the benefit only kicks in on longer inputs.

Closing

@danmoseley danmoseley closed this Mar 13, 2026