Skip to content

Rework RegexGenerator incremental source model to support value equality#125438

Open
eiriktsarpalis wants to merge 20 commits intomainfrom
fix/regex-generator-incrementality
Open

Rework RegexGenerator incremental source model to support value equality#125438
eiriktsarpalis wants to merge 20 commits intomainfrom
fix/regex-generator-incrementality

Conversation

@eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Mar 11, 2026

Fixes #125409

Discovered while working on #125404 — the RegexGenerator's incremental source model lacks proper value equality, causing the source output callback to fire on every compilation change regardless of whether regex patterns have actually changed.

Approach

Refactors the generator following the STJ/Configuration Binder pattern with distinct Parser/Emitter files and deeply equatable model types (*Spec records using ImmutableEquatableArray, ImmutableEquatableSet, and ImmutableEquatableDictionary).

The emitter re-parses the regex from Pattern + Options + CultureName during emission to reconstruct the mutable RegexTree/RegexNode objects rather than porting the 5800-line emitter to use *Spec types. This re-parse is fast (~1ms) and only runs on cache misses, while the expensive emission (10-100ms) is fully skipped on cache hits.

19 new incremental compilation tests verify caching behavior via TrackedOutputSteps.

eiriktsarpalis and others added 4 commits March 11, 2026 13:05
Add RegexGeneratorIncrementalTests.cs with 18 tests covering:
- Same compilation caching (SameInput_DoesNotRegenerate)
- Unrelated code changes (UnrelatedCodeChange_DoesNotRegenerate)
- Pattern/options/timeout/culture changes triggering regeneration
- Declaring type and member name changes
- Method add/remove scenarios
- Property vs method support
- Complex patterns with unrelated changes (Theory with 5 patterns)
- Limited support regex (NonBacktracking) caching
- Multiple regex methods with partial changes

These tests establish the baseline behavior before refactoring the
generator's incremental pipeline for proper value equality.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gex generator

Add ImmutableEquatableDictionary<TKey, TValue> to Common/src/SourceGenerators/
following the same pattern as the existing ImmutableEquatableArray<T>.

Link ImmutableEquatableArray, ImmutableEquatableDictionary, and HashHelpers
to the Regex generator project for use in the upcoming equatable model types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…line

Add gen/Model/ directory with immutable, structurally equatable record types:
- RegexGenerationSpec: top-level envelope (incremental cache boundary)
- RegexMethodSpec: per-method data with declaring type, pattern, options, tree
- RegexTreeSpec: equatable regex tree with baked-in analysis results
- RegexNodeSpec: equatable tree node with analysis flags per node
- FindOptimizationsSpec: equatable find optimizations
- FixedDistanceSetSpec: equatable fixed-distance set
- RegexTypeSpec: declaring type hierarchy

Also add required member polyfills for netstandard2.0 compatibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssion

Rewrites the RegexGenerator Initialize() pipeline to:
1. Collect per-method results into a single batch
2. Run a Parser that builds an equatable RegexGenerationSpec model
3. Split into source model (with value equality) and diagnostics providers
4. Register separate source outputs for model and diagnostics

The equatable model (RegexGenerationSpec -> RegexMethodSpec -> RegexTreeSpec ->
RegexNodeSpec) enables Roslyn to skip the emitter when regex patterns/options
haven't changed.

During emission, the spec is converted back to the mutable RegexMethod/RegexTree
types by re-parsing the regex pattern. This re-parse is fast (~1ms) and only
occurs on incremental cache misses, while the expensive emission (10-100ms) is
fully skipped on cache hits.

Also fixes CultureChange_Regenerates test to check incremental step reason
rather than output string equality, since culture changes may not always
produce different generated code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 12:55
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the RegexGenerator incremental source generator pipeline to use deeply equatable model types, enabling Roslyn incremental caching to skip expensive emission when regex inputs haven’t changed.

Changes:

  • Replaced the heterogeneous ImmutableArray<object> incremental model with equatable *Spec model records (using ImmutableEquatableArray / new ImmutableEquatableDictionary).
  • Reworked the pipeline to split source emission from diagnostic reporting and re-parse the regex during emission only on cache misses.
  • Added a new incremental-caching-focused test suite and wired it into the functional test project.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/System.Text.RegularExpressions.Tests.csproj Adds the new incremental generator tests to the test project.
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs New tests validating incremental caching behavior via tracked generator steps.
src/libraries/System.Text.RegularExpressions/gen/System.Text.RegularExpressions.Generator.csproj Links shared equatable collection helpers and required-member polyfills into the generator project.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs Pipeline rewrite to produce a structurally equatable model and re-parse on emission.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Conversion.cs Converts RegexTree / analysis into equatable *Spec snapshots.
src/libraries/System.Text.RegularExpressions/gen/Model/RegexGenerationSpec.cs New top-level equatable incremental cache boundary model.
src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.cs New per-member equatable model for generation inputs.
src/libraries/System.Text.RegularExpressions/gen/Model/RegexTreeSpec.cs New equatable representation of regex trees and baked-in analysis flags.
src/libraries/System.Text.RegularExpressions/gen/Model/RegexNodeSpec.cs New equatable representation of regex parse nodes.
src/libraries/System.Text.RegularExpressions/gen/Model/RegexTypeSpec.cs New equatable representation of declaring type hierarchy.
src/libraries/System.Text.RegularExpressions/gen/Model/FindOptimizationsSpec.cs New equatable representation of find-optimization state.
src/libraries/System.Text.RegularExpressions/gen/Model/FixedDistanceSetSpec.cs New equatable representation of fixed-distance set optimization data.
src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs New shared equatable dictionary helper for structural equality in generator models.

You can also share your feedback on Copilot code review. Take the survey.

eiriktsarpalis and others added 2 commits March 11, 2026 15:29
…mitter

- F1: Extract LiteralAfterLoopSpec record from nested tuple in
  FindOptimizationsSpec for clarity.
- F2+F3: Rewrite RegexGenerator.Parser.cs with unified parsing flow.
  Parse() is the outer method accepting GeneratorAttributeSyntaxContext
  batch; ParseMethod() handles each member end-to-end (attribute
  extraction + regex parsing + spec conversion). Eliminates the
  intermediate RegexPatternAndSyntax type and ImmutableArray<object>
  boxing.
- F4: Move Emit() orchestration and ConvertRegexTypeSpecToRegexType to
  RegexGenerator.Emitter.cs. RegexGenerator.cs now only contains
  Initialize() pipeline wiring, constants, and CompilationData.
- Fix parent chain construction for immutable RegexTypeSpec records
  (build outside-in instead of mutating).
- Remove unused usings from slimmed RegexGenerator.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 13:58
Add ToImmutableEquatableDictionary<TKey, TValue>(this Hashtable)
extension method following the existing factory pattern, and simplify
the ConvertHashtable caller in Conversion.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.


You can also share your feedback on Copilot code review. Take the survey.

- Add ImmutableEquatableSet<T> (from PolyType) providing order-
  independent structural equality via HashSet<T>. Use it for
  RegexGenerationSpec.RegexMethods to prevent spurious cache misses
  from non-deterministic ordering of GeneratorAttributeSyntaxContext.
- Fix null-safety bug in ImmutableEquatableDictionary.Equals: use
  EqualityComparer<TValue>.Default.Equals instead of direct call
  that would NRE on null values.
- Remove unused keySelector/valueSelector and IEnumerable<KVP>
  overloads from ImmutableEquatableDictionary factory.
- Inline ConvertHashtable calls in Conversion.cs.
- Extract AddDiagnostic helper to reduce noise in Parser.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 15:29
…culture

- Replace var with explicit types throughout changed code
- Change RegexMethod/RegexType from internal sealed record to private sealed class
- Add using var to runnerSw/runnerWriter for proper disposal
- Include CultureName in dedup key for emitted expressions
- Remove unused using SourceGenerators from Emitter.cs
- Add using System.Text to incremental tests
- Strengthen SameInput test with cached/unchanged assertion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eiriktsarpalis eiriktsarpalis force-pushed the fix/regex-generator-incrementality branch from 53e38f5 to 93dd4a4 Compare March 11, 2026 15:31
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs:150

  • The spec currently doesn’t preserve the cultureName argument from [GeneratedRegex]. CreateRegexTreeSpec later relies on tree.Culture?.Name, but RegexParser.Parse sets RegexTree.Culture to non-null only for ignore-case backreferences; for most ignore-case regexes the tree culture will be null. That means the emitter may re-parse using InvariantCulture even when a culture was specified, potentially generating incorrect code for culture-sensitive case folding. Store the requested/selected culture name directly on RegexMethodSpec (or alongside Tree) and use that for emission and de-duping.
            string? pattern = items[0].Value as string;
            int? options = null;
            int? matchTimeout = null;
            string? cultureName = string.Empty;
            if (items.Length >= 2)
            {
                options = items[1].Value as int?;
                if (items.Length == 4)
                {
                    matchTimeout = items[2].Value as int?;
                    cultureName = items[3].Value as string;
                }
                // If there are 3 parameters, we need to check if the third argument is
                // int matchTimeoutMilliseconds, or string cultureName.
                else if (items.Length == 3)
                {
                    if (items[2].Type?.SpecialType == SpecialType.System_Int32)
                    {
                        matchTimeout = items[2].Value as int?;
                    }
                    else
                    {
                        cultureName = items[2].Value as string;
                    }

You can also share your feedback on Copilot code review. Take the survey.

eiriktsarpalis and others added 2 commits March 11, 2026 17:48
Sort method specs by fully-qualified type name before ID assignment
and emission to ensure deterministic output regardless of HashSet
iteration order (which varies with string hash randomization).

Add test for CompilationData (AllowUnsafe) change triggering regeneration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move CompilationData record struct to gen/Model/CompilationData.cs
- Change Parse() accumulators from HashSet/ImmutableArray.Builder to List
- Reorder declarations so methods come before diagnostics

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 11, 2026 16:05
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


You can also share your feedback on Copilot code review. Take the survey.

Inline diagnostic accumulation using (diagnostics ??= []).Add(...).
Remove sort in Emit() since ImmutableEquatableSet iteration order
is deterministic within a process and the emitter only runs on
cache misses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eiriktsarpalis eiriktsarpalis force-pushed the fix/regex-generator-incrementality branch from 6421f6a to 4740968 Compare March 11, 2026 16:14
@eiriktsarpalis eiriktsarpalis changed the title Fix RegexGenerator incremental source model to support value equality Rework RegexGenerator incremental source model to support value equality Mar 11, 2026
@eiriktsarpalis eiriktsarpalis marked this pull request as ready for review March 11, 2026 16:40
@eiriktsarpalis eiriktsarpalis added the source-generator Indicates an issue with a source generator feature label Mar 11, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.


You can also share your feedback on Copilot code review. Take the survey.

@eiriktsarpalis
Copy link
Member Author

FYI @ericstj @tarekgh

eiriktsarpalis and others added 2 commits March 12, 2026 10:12
RegexTree.Culture is only set for patterns with backreferences, so
tree.Culture?.Name was null for most patterns. The emitter then
defaulted to InvariantCulture instead of the original culture,
breaking case-insensitive matching with Turkish İ characters.

Pass the validated effectiveCultureName from the Parser directly to
CreateRegexTreeSpec instead of reading it from RegexTree.Culture.
Only store the culture name when it was actually validated and used
(IgnoreCase active + non-empty culture name), avoiding crashes on
invalid culture names that were never validated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verify that two methods with the same pattern and options but
different cultures (en-US vs tr-TR) produce distinct generated
implementations and are not incorrectly deduped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 12, 2026 08:49
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.


You can also share your feedback on Copilot code review. Take the survey.

Replace broad TrackedOutputSteps assertions with targeted
TrackedSteps checks using a named SourceGenerationSpec step.
This ensures tests verify the source model step specifically,
not unrelated diagnostic output steps. Extract shared
AssertSourceModelCached/Modified helpers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 12, 2026 15:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.


You can also share your feedback on Copilot code review. Take the survey.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Text.RegularExpressions source-generator Indicates an issue with a source generator feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RegexGenerator source model lacks value equality, preventing effective incremental caching

2 participants