Rework RegexGenerator incremental source model to support value equality#125438
Rework RegexGenerator incremental source model to support value equality#125438eiriktsarpalis wants to merge 20 commits intomainfrom
Conversation
Add RegexGeneratorIncrementalTests.cs with 18 tests covering: - Same compilation caching (SameInput_DoesNotRegenerate) - Unrelated code changes (UnrelatedCodeChange_DoesNotRegenerate) - Pattern/options/timeout/culture changes triggering regeneration - Declaring type and member name changes - Method add/remove scenarios - Property vs method support - Complex patterns with unrelated changes (Theory with 5 patterns) - Limited support regex (NonBacktracking) caching - Multiple regex methods with partial changes These tests establish the baseline behavior before refactoring the generator's incremental pipeline for proper value equality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gex generator Add ImmutableEquatableDictionary<TKey, TValue> to Common/src/SourceGenerators/ following the same pattern as the existing ImmutableEquatableArray<T>. Link ImmutableEquatableArray, ImmutableEquatableDictionary, and HashHelpers to the Regex generator project for use in the upcoming equatable model types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…line Add gen/Model/ directory with immutable, structurally equatable record types: - RegexGenerationSpec: top-level envelope (incremental cache boundary) - RegexMethodSpec: per-method data with declaring type, pattern, options, tree - RegexTreeSpec: equatable regex tree with baked-in analysis results - RegexNodeSpec: equatable tree node with analysis flags per node - FindOptimizationsSpec: equatable find optimizations - FixedDistanceSetSpec: equatable fixed-distance set - RegexTypeSpec: declaring type hierarchy Also add required member polyfills for netstandard2.0 compatibility. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssion Rewrites the RegexGenerator Initialize() pipeline to: 1. Collect per-method results into a single batch 2. Run a Parser that builds an equatable RegexGenerationSpec model 3. Split into source model (with value equality) and diagnostics providers 4. Register separate source outputs for model and diagnostics The equatable model (RegexGenerationSpec -> RegexMethodSpec -> RegexTreeSpec -> RegexNodeSpec) enables Roslyn to skip the emitter when regex patterns/options haven't changed. During emission, the spec is converted back to the mutable RegexMethod/RegexTree types by re-parsing the regex pattern. This re-parse is fast (~1ms) and only occurs on incremental cache misses, while the expensive emission (10-100ms) is fully skipped on cache hits. Also fixes CultureChange_Regenerates test to check incremental step reason rather than output string equality, since culture changes may not always produce different generated code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
There was a problem hiding this comment.
Pull request overview
This pull request refactors the RegexGenerator incremental source generator pipeline to use deeply equatable model types, enabling Roslyn incremental caching to skip expensive emission when regex inputs haven’t changed.
Changes:
- Replaced the heterogeneous
ImmutableArray<object>incremental model with equatable*Specmodel records (usingImmutableEquatableArray/ newImmutableEquatableDictionary). - Reworked the pipeline to split source emission from diagnostic reporting and re-parse the regex during emission only on cache misses.
- Added a new incremental-caching-focused test suite and wired it into the functional test project.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/System.Text.RegularExpressions.Tests.csproj | Adds the new incremental generator tests to the test project. |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs | New tests validating incremental caching behavior via tracked generator steps. |
| src/libraries/System.Text.RegularExpressions/gen/System.Text.RegularExpressions.Generator.csproj | Links shared equatable collection helpers and required-member polyfills into the generator project. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs | Pipeline rewrite to produce a structurally equatable model and re-parse on emission. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Conversion.cs | Converts RegexTree / analysis into equatable *Spec snapshots. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexGenerationSpec.cs | New top-level equatable incremental cache boundary model. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.cs | New per-member equatable model for generation inputs. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexTreeSpec.cs | New equatable representation of regex trees and baked-in analysis flags. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexNodeSpec.cs | New equatable representation of regex parse nodes. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexTypeSpec.cs | New equatable representation of declaring type hierarchy. |
| src/libraries/System.Text.RegularExpressions/gen/Model/FindOptimizationsSpec.cs | New equatable representation of find-optimization state. |
| src/libraries/System.Text.RegularExpressions/gen/Model/FixedDistanceSetSpec.cs | New equatable representation of fixed-distance set optimization data. |
| src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs | New shared equatable dictionary helper for structural equality in generator models. |
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/Model/RegexNodeSpec.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs
Show resolved
Hide resolved
…mitter - F1: Extract LiteralAfterLoopSpec record from nested tuple in FindOptimizationsSpec for clarity. - F2+F3: Rewrite RegexGenerator.Parser.cs with unified parsing flow. Parse() is the outer method accepting GeneratorAttributeSyntaxContext batch; ParseMethod() handles each member end-to-end (attribute extraction + regex parsing + spec conversion). Eliminates the intermediate RegexPatternAndSyntax type and ImmutableArray<object> boxing. - F4: Move Emit() orchestration and ConvertRegexTypeSpecToRegexType to RegexGenerator.Emitter.cs. RegexGenerator.cs now only contains Initialize() pipeline wiring, constants, and CompilationData. - Fix parent chain construction for immutable RegexTypeSpec records (build outside-in instead of mutating). - Remove unused usings from slimmed RegexGenerator.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ToImmutableEquatableDictionary<TKey, TValue>(this Hashtable) extension method following the existing factory pattern, and simplify the ConvertHashtable caller in Conversion.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
You can also share your feedback on Copilot code review. Take the survey.
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs
Show resolved
Hide resolved
src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Show resolved
Hide resolved
- Add ImmutableEquatableSet<T> (from PolyType) providing order- independent structural equality via HashSet<T>. Use it for RegexGenerationSpec.RegexMethods to prevent spurious cache misses from non-deterministic ordering of GeneratorAttributeSyntaxContext. - Fix null-safety bug in ImmutableEquatableDictionary.Equals: use EqualityComparer<TValue>.Default.Equals instead of direct call that would NRE on null values. - Remove unused keySelector/valueSelector and IEnumerable<KVP> overloads from ImmutableEquatableDictionary factory. - Inline ConvertHashtable calls in Conversion.cs. - Extract AddDiagnostic helper to reduce noise in Parser.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…culture - Replace var with explicit types throughout changed code - Change RegexMethod/RegexType from internal sealed record to private sealed class - Add using var to runnerSw/runnerWriter for proper disposal - Include CultureName in dedup key for emitted expressions - Remove unused using SourceGenerators from Emitter.cs - Add using System.Text to incremental tests - Strengthen SameInput test with cached/unchanged assertion Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
53e38f5 to
93dd4a4
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs:150
- The spec currently doesn’t preserve the
cultureNameargument from[GeneratedRegex].CreateRegexTreeSpeclater relies ontree.Culture?.Name, butRegexParser.ParsesetsRegexTree.Cultureto non-null only for ignore-case backreferences; for most ignore-case regexes the tree culture will be null. That means the emitter may re-parse usingInvariantCultureeven when a culture was specified, potentially generating incorrect code for culture-sensitive case folding. Store the requested/selected culture name directly onRegexMethodSpec(or alongsideTree) and use that for emission and de-duping.
string? pattern = items[0].Value as string;
int? options = null;
int? matchTimeout = null;
string? cultureName = string.Empty;
if (items.Length >= 2)
{
options = items[1].Value as int?;
if (items.Length == 4)
{
matchTimeout = items[2].Value as int?;
cultureName = items[3].Value as string;
}
// If there are 3 parameters, we need to check if the third argument is
// int matchTimeoutMilliseconds, or string cultureName.
else if (items.Length == 3)
{
if (items[2].Type?.SpecialType == SpecialType.System_Int32)
{
matchTimeout = items[2].Value as int?;
}
else
{
cultureName = items[2].Value as string;
}
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs
Outdated
Show resolved
Hide resolved
Sort method specs by fully-qualified type name before ID assignment and emission to ensure deterministic output regardless of HashSet iteration order (which varies with string hash randomization). Add test for CompilationData (AllowUnsafe) change triggering regeneration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move CompilationData record struct to gen/Model/CompilationData.cs - Change Parse() accumulators from HashSet/ImmutableArray.Builder to List - Reorder declarations so methods come before diagnostics Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Conversion.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
Inline diagnostic accumulation using (diagnostics ??= []).Add(...). Remove sort in Emit() since ImmutableEquatableSet iteration order is deterministic within a process and the emitter only runs on cache misses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6421f6a to
4740968
Compare
…llect" This reverts commit fb66f83.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
You can also share your feedback on Copilot code review. Take the survey.
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Conversion.cs
Show resolved
Hide resolved
RegexTree.Culture is only set for patterns with backreferences, so tree.Culture?.Name was null for most patterns. The emitter then defaulted to InvariantCulture instead of the original culture, breaking case-insensitive matching with Turkish İ characters. Pass the validated effectiveCultureName from the Parser directly to CreateRegexTreeSpec instead of reading it from RegexTree.Culture. Only store the culture name when it was actually validated and used (IgnoreCase active + non-empty culture name), avoiding crashes on invalid culture names that were never validated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verify that two methods with the same pattern and options but different cultures (en-US vs tr-TR) produce distinct generated implementations and are not incorrectly deduped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
You can also share your feedback on Copilot code review. Take the survey.
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
Replace broad TrackedOutputSteps assertions with targeted TrackedSteps checks using a named SourceGenerationSpec step. This ensures tests verify the source model step specifically, not unrelated diagnostic output steps. Extract shared AssertSourceModelCached/Modified helpers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
Fixes #125409
Discovered while working on #125404 — the RegexGenerator's incremental source model lacks proper value equality, causing the source output callback to fire on every compilation change regardless of whether regex patterns have actually changed.
Approach
Refactors the generator following the STJ/Configuration Binder pattern with distinct Parser/Emitter files and deeply equatable model types (
*Specrecords usingImmutableEquatableArray,ImmutableEquatableSet, andImmutableEquatableDictionary).The emitter re-parses the regex from
Pattern + Options + CultureNameduring emission to reconstruct the mutableRegexTree/RegexNodeobjects rather than porting the 5800-line emitter to use*Spectypes. This re-parse is fast (~1ms) and only runs on cache misses, while the expensive emission (10-100ms) is fully skipped on cache hits.19 new incremental compilation tests verify caching behavior via
TrackedOutputSteps.