Handle single-node branches in ExtractCommonPrefixNode#34

Closed
danmoseley wants to merge 2 commits into main from regex-redux/fix-alternation-prefix

@danmoseley
Owner

When alternation branches are reduced to single nodes (e.g., Set[Pp] from a single-child Concatenation after prior prefix extraction), ExtractCommonPrefixNode previously bailed because it required all branches to be Concatenations. This caused IgnoreCase alternation prefix extraction to stop one character short (e.g., htt instead of http for (http|https)).

Fix: Remove the upfront gate check and handle both Concatenation and single-node branches throughout the extraction loop. When a single-node branch matches the common prefix, it is replaced with Empty.
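The difference between the old and new behavior can be illustrated with a small toy model. This is only a sketch, not the actual .NET `RegexNode` code: the names `ic`, `collapse`, `first_node`, and `extract_common_prefix`, the `allow_single_node` flag, and the tuple/frozenset representation are all assumptions made for illustration. A Set node is modeled as a `frozenset`, a Concatenation as a tuple of sets, and Empty as the empty tuple.

```python
# Toy model (hypothetical, not the real RegexNode implementation).
EMPTY = ()  # stands in for the Empty node


def ic(text):
    """Model an IgnoreCase literal as a Concatenation of sets like Set[Pp]."""
    return tuple(frozenset({c.lower(), c.upper()}) for c in text)


def collapse(rest):
    """A single-child Concatenation reduces to its only child."""
    return rest[0] if len(rest) == 1 else rest


def first_node(branch):
    """Leading node of a branch; None for Empty."""
    if isinstance(branch, frozenset):
        return branch  # a single-node branch is its own leading node
    return branch[0] if branch else None


def extract_common_prefix(branches, allow_single_node):
    prefix = []
    while True:
        # Old behavior: bail unless every branch is a non-empty Concatenation.
        if not allow_single_node and not all(
            isinstance(b, tuple) and b for b in branches
        ):
            break
        heads = [first_node(b) for b in branches]
        if heads[0] is None or any(h != heads[0] for h in heads):
            break
        prefix.append(heads[0])
        # New behavior: a single-node branch that matched becomes Empty.
        branches = [
            EMPTY if isinstance(b, frozenset) else collapse(b[1:])
            for b in branches
        ]
    return prefix, branches


def show(prefix):
    """Render a prefix using the lowercase member of each set."""
    return "".join(max(node) for node in prefix)


branches = [ic("http"), ic("https")]
old_prefix, _ = extract_common_prefix(branches, allow_single_node=False)
new_prefix, remaining = extract_common_prefix(branches, allow_single_node=True)
print(show(old_prefix))  # htt
print(show(new_prefix))  # http
```

In this model the old gate trips on the fourth iteration, because the first branch has already collapsed from a one-element Concatenation to the bare set `Set[Pp]`. With the gate removed, that branch is matched and replaced with Empty, leaving `Set[Ss]` as the only non-empty remainder.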

Fixes dotnet#124871

Tests: 7 new test cases covering http/https with IgnoreCase, the case where the shorter branch is a prefix of the longer one, branches that differ by more than one character, 3-branch variants, and a case-sensitive regression guard.

When alternation branches are reduced to single nodes (e.g., Set[Pp]
from a single-child Concatenation after prior prefix extraction),
ExtractCommonPrefixNode previously bailed because it required all
branches to be Concatenations. This caused IgnoreCase alternation
prefix extraction to stop one character short (e.g., 'htt' instead
of 'http' for (http|https)).

Fix: remove the upfront gate check, and handle both Concatenation and
single-node branches throughout the extraction loop.

Fixes dotnet#124871

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Behavioral correctness test for (?:http|https)://foo with IgnoreCase
- 4-branch regression test exercising single-node branch after recursive prefix extraction
- Non-IgnoreCase Set-node branch test via character class alternation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley
Owner Author

Moved to dotnet#124881

@danmoseley danmoseley closed this Feb 26, 2026

Development

Successfully merging this pull request may close these issues.

Patterns with literal string in ASCII alternation of different length do not fast-search for longest string