Fix NonBacktracking correctness issues and exponential blowup on nested loops by danmoseley · Pull Request #125457 · dotnet/runtime

danmoseley · 2026-03-11T21:18:21Z

Fix NonBacktracking correctness issues and exponential blowup on nested loops

Problem

The NonBacktracking regex engine has exponential time complexity on deeply nested loop patterns like ((a)*)*. The root cause is in the Brzozowski derivative computation: when the left side of a Concat is nullable (as with * loops), the derivative creates a 2-branch Alternate. With N levels of nesting, this produces O(2^N) intermediate nodes per derivative step.

Nesting depth	Input length	Time (before)
15	1000	1.62s
17	1000	5.93s
20	1000	>60s
2000	1	10.7s

The last row shows that even matching a single character takes 10.7s at depth 2000 --- the bottleneck is the first derivative computation, not cumulative per-character cost.

Fix

Three complementary changes:

1. Loop flattening in CreateLoop --- Simplify (R*)* to R* (and variants (R+)*, (R*)+) during symbolic AST construction. This directly eliminates the exponential state space. The simplification is restricted to cases where it preserves semantics:

Inner lower bound must be 0 or 1 (with outer lower 0). Counterexample: (R{2,})* cannot match a single R, so must not become R*.
Both loops must have the same laziness. Counterexample: (?:R*)+? (greedy inner, lazy outer) has different match behavior than R*?.
No capture effects, since collapsing loops would change capture group bindings.

2. StripEffects caching --- The derivative of deeply nested loop patterns with captures produces a DAG with shared sub-trees wrapped in different Effect nodes. Without caching, StripEffects traverses each shared sub-tree once per reference rather than once per unique node, leading to exponential work. Adding a cache (following the existing pattern of _pruneLowerPriorityThanNullabilityCache) fixes this.

3. CountSingletons defense-in-depth --- Improve the NFA size estimate so that nested unbounded loops multiply the estimate by 2 per nesting level, matching the actual derivative branching cost. This causes the existing safe-size threshold (10,000) to reject patterns with deeply nested unbounded loops that survive loop flattening (e.g., alternating lazy/greedy nesting like (?:(?:a*)*?)* at 15+ levels). With loop flattening, same-laziness nesting is already collapsed before the estimate runs, so the multiplier only fires for patterns that would actually cause exponential blowup.

Note on linear-time guarantee

The NonBacktracking engine guarantees linear time in the length of the input, and this guarantee was never violated by the nested loop issue. The engine is O(C * N) where N is the input length and C is a pattern-dependent constant representing the per-character cost. For deeply nested patterns like ((a)*)*, the constant C grows exponentially with the nesting depth of the pattern, but for any given pattern the matching time remains linear in the input length.

In other words, this was an exponential blowup in the pattern complexity constant, not in input-length scaling. In practice, triggering the blowup requires deliberately nesting redundant quantifiers 15+ levels deep, which does not arise naturally.

Tests

Re-enables StressTestDeepNestingOfLoops for NonBacktracking at depth 120 (previously disabled citing NonBacktracking engine is pathologically slow on patterns like "((((((((a)*)*)*)*)*)*)*" and input like "aaaaaaa" which other engines handle fine #84188; even depth 20 was intractable before the fix).
Adds correctness test cases for nested loop simplification edge cases, verified to fail without the fix and pass with it:
- ^(?:(?:a){2,})*$ on "a" --- 0 matches (would be 1 if body._lower >= 2 check were missing)
- ^(?:(?:a){2})*$ on "a", "aaa" --- 0 matches (would be 1 if incorrectly simplified to a*)
- (?:(?:a*)+?) on "aaa" --- 2 matches (would be 4 if laziness guard were missing)
- Sanity checks for patterns that are correctly simplified
Adds unit tests for the CountSingletons defense-in-depth:
- Alternating-laziness depth-20 pattern exceeds unsafe threshold
- Same-laziness depth-100 stays safe after loop flattening

The NonBacktracking regex engine suffered exponential time complexity when processing deeply nested loop patterns like ((a)*)*. Even depth 20 could take >60 seconds, and depth 2000 with a single-character input took ~11s. Two changes fix this: 1. Loop flattening in CreateLoop: simplify (R*)* to R* (and variants like (R+)*, (R*)+) when both loops have unbounded upper, compatible lower bounds, same laziness, and no capture effects. This eliminates the exponential state space in derivative computation. 2. StripEffects caching: cache results of StripEffects to avoid repeatedly traversing shared sub-trees in the derivative DAG, which otherwise causes exponential work even after loop flattening for patterns with capture groups. The loop flattening condition is carefully restricted to preserve correctness: - body._lower must be 0 or 1 (with outer lower 0): (R{2,})* cannot match a single R, so it must not be simplified to R*. - Both loops must have the same laziness: (?:R*)+? (greedy inner, lazy outer) has different match semantics than R*?. - No capture effects: collapsing loops would change capture group bindings. Re-enables StressTestDeepNestingOfLoops for NonBacktracking (depth 120). Fixes dotnet#84188 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Change the nested loop flattening in CreateLoop from a recursive call (if + return CreateLoop) to a while loop that mutates lower/body in place. This avoids consuming stack proportional to the nesting depth, which could cause a stack overflow for deeply nested patterns. Also note in comments that the exponential blowup occurs during lazy DFA construction within matching, not just pattern compilation, so it violates the engine's linear-time-in-input-length guarantee. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The engine remains O(input_length) for any given pattern. The blowup is in the per-character cost (pattern-dependent constant), not in input-length scaling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR improves the NonBacktracking (symbolic) regex engine’s behavior on deeply nested quantifiers by reducing derivative-state explosion and redundant effect-stripping work, and updates tests to validate both correctness and stress behavior.

Changes:

Add loop-flattening in symbolic AST construction to simplify redundant nested unbounded loops when semantics are preserved.
Add caching for StripEffects to avoid repeated traversal of shared derivative subtrees.
Extend/adjust regex tests to cover nested-loop simplification correctness and re-enable NonBacktracking stress coverage with a safer nesting depth.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs	Adds match-count correctness cases for nested-loop simplification and updates deep loop stress data to include NonBacktracking with reduced depth.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs	Implements nested-loop flattening in `CreateLoop` and adds `StripEffects` caching (plus improved alternation handling) to prevent exponential blowups.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexBuilder.cs	Introduces `_stripEffectsCache` to support the new cached `StripEffects` behavior.

...tem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs

danmoseley · 2026-03-11T21:39:45Z

@olsaarik are you comfortable signing off? I responded to your comment above; no code changes, just comment tweak.

Improve CountSingletons to multiply the estimate by 2 when an unbounded loop's body is itself an unbounded loop. This causes the NFA size estimate to grow exponentially with nesting depth, so patterns with alternating lazy/greedy nesting at 15+ levels are rejected upfront. Add tests: - Alternating-laziness depth-20 pattern exceeds unsafe threshold - Same-laziness depth-100 stays safe (loop flattening collapses it) - (a{2})* correctness: odd-length strings must not match Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-03-12T01:18:36Z

Added defense-in-depth in the last commit: CountSingletons now multiplies the NFA size estimate by 2 when an unbounded loop's body is itself an unbounded loop. This makes the estimate grow exponentially with nesting depth, matching the actual derivative cost.

With loop flattening, same-laziness nesting (e.g. (?:a*)*) is already collapsed before the check runs, so the multiplier doesn't fire for those. It primarily catches patterns that survive flattening — alternating lazy/greedy nesting like (?:(?:a*)*?)* — rejecting them at ~15 levels via the existing 10K threshold.

I verified that with loop flattening disabled, the CountSingletons change alone is sufficient to reject the pathological patterns.

danmoseley · 2026-03-19T23:51:34Z

/azp cancel

azure-pipelines · 2026-03-19T23:51:42Z

Command 'cancel' is not supported by Azure Pipelines. Supported commands help: Get descriptions, examples and documentation about supported commands Example: help "command_name" list: List all pipelines for this repository using a comment. Example: "list" run: Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run. Example: "run" or "run pipeline_name, pipeline_name, pipeline_name" where: Report back the Azure DevOps orgs that are related to this repository and org Example: "where" See additional documentation.

danmoseley · 2026-03-19T23:52:13Z

/azp help

azure-pipelines · 2026-03-19T23:52:19Z

Supported commands help: Get descriptions, examples and documentation about supported commands Example: help "command_name" list: List all pipelines for this repository using a comment. Example: "list" run: Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run. Example: "run" or "run pipeline_name, pipeline_name, pipeline_name" where: Report back the Azure DevOps orgs that are related to this repository and org Example: "where" See additional documentation.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Copilot · 2026-03-20T00:08:05Z

src/libraries/System.Text.RegularExpressions/tests/UnitTests/SymbolicRegexTests.cs

+            // (which requires matching laziness) and cause exponential derivative blowup.
+            // CountSingletons accounts for this by multiplying by 2 per nesting level, so
+            // at depth 15+ the estimate exceeds the default 10,000 threshold.
+            // Build: (?:(?:(?:a)*?)*)*? ... with alternating greedy/lazy at each level


The comment showing the example pattern doesn't match the actual pattern constructed by the loops: the closing loop runs from depth-1 down to 0, making the outermost quantifier greedy ("") rather than lazy ("?") as implied by ...)*? .... Please update the comment to reflect the real generated pattern (or describe the alternation without suggesting a specific outermost suffix).

Suggested change

// Build: (?:(?:(?:a)*?)*)*? ... with alternating greedy/lazy at each level

// Build: (?:(?:(?:a)*)*?)* ... with alternating greedy/lazy quantifiers at each level

danmoseley and others added 4 commits March 6, 2026 13:04

Remove issue URL from code comment

1c017ca

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 11, 2026 21:18

github-actions bot added the area-System.Text.RegularExpressions label Mar 11, 2026

dotnet-policy-service bot assigned danmoseley Mar 11, 2026

Copilot started reviewing on behalf of danmoseley March 11, 2026 21:20 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

danmoseley commented Mar 11, 2026

View reviewed changes

...tem.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexNode.cs Show resolved Hide resolved

build-analysis bot mentioned this pull request Mar 12, 2026

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

danmoseley requested a review from stephentoub March 18, 2026 16:18

Merge branch 'main' into fix-nested-loop-perf

92b7c77

Copilot AI review requested due to automatic review settings March 19, 2026 23:57

Copilot started reviewing on behalf of danmoseley March 20, 2026 00:00 View session

This was referenced Mar 20, 2026

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

[android-arm64] The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#6408

Open

Copilot AI reviewed Mar 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix NonBacktracking correctness issues and exponential blowup on nested loops#125457

Fix NonBacktracking correctness issues and exponential blowup on nested loops#125457
danmoseley wants to merge 6 commits intodotnet:mainfrom
danmoseley:fix-nested-loop-perf

danmoseley commented Mar 11, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

danmoseley commented Mar 11, 2026

Uh oh!

danmoseley commented Mar 12, 2026

Uh oh!

danmoseley commented Mar 19, 2026

Uh oh!

azure-pipelines bot commented Mar 19, 2026

Uh oh!

danmoseley commented Mar 19, 2026

Uh oh!

azure-pipelines bot commented Mar 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	// Build: (?:(?:(?:a)?))*? ... with alternating greedy/lazy at each level
	// Build: (?:(?:(?:a))?)* ... with alternating greedy/lazy quantifiers at each level

Conversation

danmoseley commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!