Use SearchValues for Setrep/Setloop regex interpreter opcodes #124630
danmoseley wants to merge 4 commits into dotnet:main
Conversation
Precompute SearchValues<char> for character class strings at regex construction time. Use them in the Setrep and Setloop/Setloopatomic opcode handlers to replace per-character CharInClass loops with vectorized SIMD-accelerated span operations. Character classes that use Unicode categories, subtraction+negation, or have more than 128 characters fall back to the existing per-character path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
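As a rough illustration of the idea described above (not the PR's actual code), the sketch below compares a per-character loop with a single span search over a precomputed SearchValues<char>. The class name SetLoopSketch, its method names, and the choice of the set [a-z] are hypothetical; only SearchValues.Create and IndexOfAnyExcept are real .NET 8+ APIs.

```csharp
using System;
using System.Buffers;

// Hypothetical helper, not taken from the PR: measures how many leading
// characters of the input belong to the set [a-z].
public static class SetLoopSketch
{
    // Built once (analogous to regex construction time) and reused for every match.
    private static readonly SearchValues<char> s_lowerAscii =
        SearchValues.Create("abcdefghijklmnopqrstuvwxyz");

    // Baseline: test one character at a time, like a per-character class check.
    public static int RunLengthScalar(ReadOnlySpan<char> input)
    {
        int i = 0;
        while (i < input.Length && input[i] is >= 'a' and <= 'z')
        {
            i++;
        }
        return i;
    }

    // Vectorized: find the first char NOT in the set; its index is the run length.
    public static int RunLengthVectorized(ReadOnlySpan<char> input)
    {
        int firstMiss = input.IndexOfAnyExcept(s_lowerAscii);
        return firstMiss < 0 ? input.Length : firstMiss;
    }
}
```

Both methods should return the same value for any input; the second hands the scan to the runtime's vectorized search, which is where a speedup on long runs would come from.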
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
Pull request overview
This PR extends the regex interpreter’s SIMD/vectorization work by precomputing SearchValues<char> for eligible character-class strings at construction time, then using those precomputed matchers in the Setrep and Setloop/Setloopatomic opcode handlers to replace per-character CharInClass loops with span-based vectorized operations.
Changes:
- Add RegexInterpreterCode.StringsSetSearchValues to precompute SearchValues<char> (plus negation) for small/enumerable character classes.
- Introduce SetSearchValues helper struct to encapsulate the SearchValues<char> and negation semantics.
- Update RegexInterpreter Setrep and Setloop handlers to use ContainsAnyExcept / IndexOfAnyExcept (or inverted forms for negated sets) when running left-to-right.
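The negation handling listed above could look roughly like the following. This is a hypothetical stand-in: the real SetSearchValues struct's members are not shown in this conversation, so the type name SetMatcherSketch and its methods are assumptions made for illustration. The span methods it calls (IndexOfAny, IndexOfAnyExcept, ContainsAny, ContainsAnyExcept) are existing .NET 8+ APIs.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-in for a "SearchValues plus negation flag" wrapper.
public readonly struct SetMatcherSketch
{
    private readonly SearchValues<char> _values;
    private readonly bool _negated;

    public SetMatcherSketch(ReadOnlySpan<char> chars, bool negated)
    {
        _values = SearchValues.Create(chars);
        _negated = negated;
    }

    // Setloop-style: length of the leading run of chars that are in the class.
    public int LeadingRunLength(ReadOnlySpan<char> input)
    {
        // For [abc], stop at the first char outside the values;
        // for [^abc], stop at the first char inside the values.
        int stop = _negated
            ? input.IndexOfAny(_values)
            : input.IndexOfAnyExcept(_values);
        return stop < 0 ? input.Length : stop;
    }

    // Setrep-style: does every char in the span belong to the class?
    public bool AllInClass(ReadOnlySpan<char> input) =>
        _negated
            ? !input.ContainsAny(_values)
            : !input.ContainsAnyExcept(_values);
}
```

Keeping the flag inside the wrapper means the opcode handler can call one method regardless of whether the class was written as [abc] or [^abc].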
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreterCode.cs | Precomputes SearchValues<char>-based matchers for eligible set strings and exposes them to the interpreter. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexInterpreter.cs | Uses precomputed set matchers to vectorize Setrep and Setloop opcode execution for left-to-right matching. |
|
Real-world impact estimate: Analyzing the 15,817 unique patterns in the regex test corpus (assuming interpreter engine):
|
The Strings table contains both character class strings and Multi literal strings. Add validation that the string has a well-formed char-class encoding (valid flags byte, consistent lengths) before calling GetSetChars, which assumes well-formed input. Also clarify the comment about GetSetChars behavior for negated sets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A Multi literal starting with \0 and having an even-valued second byte and \0 at index 2 satisfies all CanEasilyEnumerateSetContents checks, causing GetSetChars to enumerate past the end of the string. This test verifies CreateSetSearchValues validates the encoding first. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
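A minimal sketch of the kind of shape check these two commits describe. The header layout used here (flags at index 0, set length at index 1, category length at index 2, ranges starting at index 3) is an assumption about the internal char-class encoding, and CharClassEncodingSketch and LooksLikeSimpleCharClass are hypothetical names; the PR's real validation may differ.

```csharp
using System;

public static class CharClassEncodingSketch
{
    // Assumed header layout of an encoded character class (an assumption of this sketch).
    private const int FlagsIndex = 0;          // 0 = normal, 1 = negated
    private const int SetLengthIndex = 1;      // number of range-boundary chars
    private const int CategoryLengthIndex = 2; // number of Unicode-category chars
    private const int SetStartIndex = 3;       // first range-boundary char

    // Returns true only if the string's declared lengths agree with its actual
    // length, so enumerating its ranges cannot run past the end of the string.
    public static bool LooksLikeSimpleCharClass(string s)
    {
        if (s.Length <= SetStartIndex) return false;
        if (s[FlagsIndex] > 1) return false;               // invalid flags value

        int setLength = s[SetLengthIndex];
        int categoryLength = s[CategoryLengthIndex];

        // This sketch only accepts classes made of complete range pairs
        // with no Unicode categories.
        if (setLength == 0 || setLength % 2 != 0) return false;
        if (categoryLength != 0) return false;

        // Exact match also rejects strings with extra trailing data.
        return s.Length == SetStartIndex + setLength + categoryLength;
    }
}
```

Under these assumptions, a literal such as "\0\u0004\0ab" has a plausible-looking header but claims four range chars while only two follow, so the final length comparison rejects it.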
|
What is the impact on Regex construction time? We can make the Regex interpreter much faster by not using the regex interpreter, e.g. implicitly adding .Compiled, so the main performance benefit of the interpreter is faster construction for cases where it will be rarely used. Also, can you share the output of benchmarks for shorter runs in the input, like where a set+ will match only 1, 2, 3 characters? |
|
@MihuBot benchmark Regex |
|
See benchmark results at https://gist.github.com/MihuBot/e83bbd17cae6db6cc0670f22e9d19b38 |
|
@stephentoub Here are benchmarks addressing both concerns: Construction cost:
Short match lengths (
Short matches (1-3 chars) are neutral, with no regression. The SearchValues dispatch overhead is negligible because

So the tradeoff is ~1.7x slower construction vs 2-7x faster matching for longer character class runs. For a rarely-used regex where construction dominates, this is a net cost.

Benchmark code

```csharp
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using MicroBenchmarks; // Categories (assumed to be the dotnet/performance helper class)

[BenchmarkCategory(Categories.Libraries, Categories.Regex)]
public class Perf_Regex_Interpreter_SearchValues_Impact
{
    // Construction cost: patterns with an increasing number of character classes.
    [Benchmark]
    public Regex Ctor_NoCharClass() => new Regex("hello world");

    [Benchmark]
    public Regex Ctor_1CharClass() => new Regex("[a-z]+");

    [Benchmark]
    public Regex Ctor_3CharClasses() => new Regex("[a-z]+[0-9]+[A-Z]+");

    [Benchmark]
    public Regex Ctor_5CharClasses() => new Regex("[a-z]+[0-9]+[A-Z]+[a-f]+[!@#]+");

    [Benchmark]
    public Regex Ctor_10CharClasses() => new Regex("[a-z]+[0-9]+[A-Z]+[a-f]+[!@#]+[g-m]+[4-8]+[N-T]+[x-z]+[,;:]+");

    [Benchmark]
    public Regex Ctor_RealWorld_Email() => new Regex(@"[a-z0-9]+(?:\.[a-z0-9]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?");

    [Benchmark]
    public Regex Ctor_UnicodeCategories() => new Regex(@"\w+\s+\d+");

    // Match cost: a set loop that matches 1, 2, 3, 8, or 16 characters.
    private Regex _setloop;

    [GlobalSetup(Targets = new[] {
        nameof(Setloop_Match1), nameof(Setloop_Match2), nameof(Setloop_Match3),
        nameof(Setloop_Match8), nameof(Setloop_Match16) })]
    public void Setup_Setloop() => _setloop = new Regex("[a-z]+", RegexOptions.None);

    [Benchmark]
    public Match Setloop_Match1() => _setloop.Match("a.");

    [Benchmark]
    public Match Setloop_Match2() => _setloop.Match("ab.");

    [Benchmark]
    public Match Setloop_Match3() => _setloop.Match("abc.");

    [Benchmark]
    public Match Setloop_Match8() => _setloop.Match("abcdefgh.");

    [Benchmark]
    public Match Setloop_Match16() => _setloop.Match("abcdefghijklmnop.");
}
```
|
So maybe this comes down to whether we prefer to optimize for
I'm inclined to say it's a worthwhile tradeoff: it can improve performance dramatically, without limit in cases where significant time is being spent, against a very small construction impact. In general, changing from interpreted to generated (if the pattern is static, which is generally the case) is a no-brainer improvement. So the relevant scenario here is long-standing libraries that are not being actively maintained. (I'm ignoring throwaway code where they're using interpreted just to type slightly less.) Not sure how that impacts the calculus here. It is a good reason for caring about optimizing the interpreter even though it's rarely the right choice to use for perf (not suggesting you're arguing otherwise). Thoughts?
|
Analysis of the MihuBot benchmark results, focusing on interpreter (Options=None): Matching wins (Sherlock corpus, interpreter only):
Neutral / no benefit (as expected —
Construction overhead (Mariomkas, real-world patterns):
The real-world construction overhead is only 1-4%, much smaller than the ~1.7x my micro-benchmarks showed for minimal patterns. This is because Summary: 5-9% matching wins for explicit character class patterns on realistic inputs, 1-4% construction overhead for real-world patterns, neutral for |
|
If this goes beyond just the process startup overhead and you're creating SearchValues often, we could look into optimizing the |
|
@MihaZupan interesting, any reason we shouldn't take that change? seems pretty localized complexity. otherwise, this is ready for review I think. (optionally, we could wait on change to SearchValues proposed) |
|
Just that it's more (unsafe) code. I wouldn't block any changes on that though. |
true. agreed |
|
I will do an experiment of the worst case -- one shot, Then measure with BDN the overall cost of |
SearchValues One-Shot Regex Performance Report
PR: #124630 - Add
Executive Summary
The PR adds
Key question from Stephen Toub: Does the construction overhead pay for itself in one-shot
Answer: It depends on input length and pattern type:
Methodology
Group Average OneShot Ratios (PR / Baseline)
< 1.0 = PR faster, > 1.0 = PR slower
Group A (control) shows ~1-4% noise, confirming no regression for patterns with 0 eligible classes. Group D (4+ eligible) shows the clearest trend: Long-NonMatch averages 0.60 (40% faster),
Construction Overhead
Absolute overhead ranges from ~1ns (A1) to +1.8us (D3 email validation).
MatchOnly Ratios Reveal the Truth
MatchOnly benchmarks use a pre-constructed Regex, isolating the matching speedup from construction overhead.
Patterns where matching is dramatically faster (MatchOnly LongMatch ratio):
*D1 LongNonMatch: 1.8ms down to 94.1us, the biggest absolute win
Patterns where matching is NOT faster (MatchOnly ~1.0):
Key Individual Results
Biggest Wins (OneShot)
Biggest Regressions (OneShot)
B1 stands out:
Conclusions
1. The optimization works spectacularly for the right patterns
Patterns with unbounded quantifiers (
2. Construction overhead is real but modest
~10-17% construction overhead for eligible patterns. In absolute terms, typically 50-200ns.
3. Short-input regression is consistent
For inputs under ~100 chars, construction overhead dominates and OneShot is ~5-15% slower.
4. Negated classes are a concern
B1 (
5. Breakeven analysis
The one-shot breakeven point depends on the pattern, but roughly:
6. Overall verdict
For the interpreter one-shot path, the optimization is a net positive for real-world workloads
Failed Patterns
These are test harness issues, not PR issues. 18/20 patterns provide solid data.
Appendix: All OneShot Ratios
|
|
Follow-up on the one-shot benchmark results: negated class overhead
While investigating why
So for patterns like
A possible fix: only create
|
Overall construction overhead is a net loss for short inputs across the board, and even for eligible patterns the benefit only kicks in on longer inputs. Closing |
Summary
Precompute SearchValues<char> for character class strings at regex construction time, and use them in the Setrep and Setloop/Setloopatomic interpreter opcode handlers to replace per-character CharInClass loops with vectorized SIMD-accelerated span operations.

Character classes that use Unicode categories, subtraction+negation, or have more than 128 characters fall back to the existing per-character path. A SetSearchValues wrapper struct encapsulates SearchValues<char> and the set's negation flag so the interpreter doesn't need to know whether the class was defined as negated.

This is a follow-up to #124628, which vectorized the Oneloop, Onerep, Notonerep, and MatchString opcodes.

Benchmark Results
Benchmark code