Skip to content

JIT: Convert multi-target switches to branchless checks#124567

Open
JulieLeeMSFT wants to merge 4 commits intodotnet:mainfrom
JulieLeeMSFT:optPatternMatchSwitch
Open

JIT: Convert multi-target switches to branchless checks#124567
JulieLeeMSFT wants to merge 4 commits intodotnet:mainfrom
JulieLeeMSFT:optPatternMatchSwitch

Conversation

@JulieLeeMSFT
Copy link
Member

@JulieLeeMSFT JulieLeeMSFT commented Feb 18, 2026

Fixes #123858.

This PR improves JIT codegen for multi-target switch-style tests by converting eligible cases into branchless checks. The goal is to produce optimal codegen that matches the intent of C# pattern matching with or (e.g., x is A or B or C).

When a switch has exactly 2 unique successors (all non-default cases target one block, default targets another), convert it from Switch to an unsigned range comparison (In fgOptimizeSwitchBranches).
When both targets are simple return blocks, fgFoldCondToReturnBlock further folds this into branchless return.

Example

        private static bool IsLetterCategory(UnicodeCategory uc)
        {
            return uc == UnicodeCategory.UppercaseLetter
                 || uc == UnicodeCategory.LowercaseLetter
                 || uc == UnicodeCategory.TitlecaseLetter
                 || uc == UnicodeCategory.ModifierLetter
                 || uc == UnicodeCategory.OtherLetter;
        }

Before

cmp      ecx, 4
ja       SHORT G_M22758_IG05
mov      eax, 1
ret
G_M22758_IG05:
xor      eax, eax
ret

After

cmp      ecx, 4
setbe    al
movzx    rax, al
ret

Details

  • The comparison direction (GT_LE vs GT_GT) is chosen to favor fall-through to the lexically next block.
  • Switches with 3 or fewer cases are skipped — the backend's native switch lowering already generates efficient comparison chains for those, and converting early can produce worse code via if-conversion (cmov).
  • Edge dup counts and profile weights are fixed up after conversion.
  • Added tests for zero-based and non-zero-based consecutive ranges.

ASMDiffs

  • SPMI asmdiffs show code size improvements of 83 bytes.

@JulieLeeMSFT JulieLeeMSFT added this to the 11.0.0 milestone Feb 18, 2026
Copilot AI review requested due to automatic review settings February 18, 2026 19:25
@JulieLeeMSFT JulieLeeMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 18, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@JulieLeeMSFT
Copy link
Member Author

@EgorBo, PTAL.
CC @dotnet/jit-contrib.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes multi-target switches where all non-default cases target a single block (distinct from the default target) by converting them into unsigned range comparisons. This enables branchless code generation when both targets are simple return blocks, addressing the disparity between pattern matching and explicit equality comparisons noted in issue #123858.

Changes:

  • Adds logic in fgOptimizeSwitchBranches to detect and optimize switches with exactly 2 unique successors
  • Transforms such switches into unsigned LE/GT comparisons, choosing the direction to favor fall-through
  • Adds comprehensive test coverage for both zero-based and non-zero-based consecutive ranges

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/jit/fgopt.cpp Implements the switch-to-range-check optimization with edge weight/dup count fixup and profile weight updates
src/tests/JIT/opt/OptSwitchRecognition/optSwitchRecognition.cs Adds tests for zero-based (0-4) and non-zero-based (10-14) consecutive value ranges with comprehensive boundary cases

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 19:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 19:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +169 to +189
// Test with non-zero-based consecutive values
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool IsInRange10To14(int val)
{
return val == 10
|| val == 11
|| val == 12
|| val == 13
|| val == 14;
}

[Theory]
[InlineData(9, false)]
[InlineData(10, true)]
[InlineData(11, true)]
[InlineData(12, true)]
[InlineData(13, true)]
[InlineData(14, true)]
[InlineData(15, false)]
[InlineData(-1, false)]
public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange10To14(arg1));
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This “non-zero-based consecutive values” test uses 10..14, but SwitchRecognition forces minValue=0 when maxValue <= SWITCH_MAX_DISTANCE, so the recognized switch will likely cover 0..14 with many entries targeting the false block. That shape is not eligible for the new switch→range-check simplification (which requires all non-default cases to share one target). If the intent is to exercise the new optimization, consider using a range where maxValue > SWITCH_MAX_DISTANCE (e.g., 100..104) so recognition keeps a non-zero minValue and emits a dense 0..4 switch after subtracting.

Suggested change
// Test with non-zero-based consecutive values
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool IsInRange10To14(int val)
{
return val == 10
|| val == 11
|| val == 12
|| val == 13
|| val == 14;
}
[Theory]
[InlineData(9, false)]
[InlineData(10, true)]
[InlineData(11, true)]
[InlineData(12, true)]
[InlineData(13, true)]
[InlineData(14, true)]
[InlineData(15, false)]
[InlineData(-1, false)]
public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange10To14(arg1));
// Test with non-zero-based consecutive values (range 100 to 104)
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool IsInRange100To104(int val)
{
return val == 100
|| val == 101
|| val == 102
|| val == 103
|| val == 104;
}
[Theory]
[InlineData(99, false)]
[InlineData(100, true)]
[InlineData(101, true)]
[InlineData(102, true)]
[InlineData(103, true)]
[InlineData(104, true)]
[InlineData(105, false)]
[InlineData(-1, false)]
public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange100To104(arg1));

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 18, 2026 23:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

// For small switches with 2 or fewer non-default cases (caseCount <= 3 including the default),
// the backend's switch lowering already generates efficient comparison chains. Converting
// early can enable if-conversion (cmov) that produces worse code for these cases.
if (caseCount <= 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case or just an assumption?

return true;
}
else if (block->GetSwitchTargets()->GetSuccCount() == 2 && block->GetSwitchTargets()->HasDefaultCase() &&
(block->IsLIR() || fgNodeThreading == NodeThreading::AllTrees))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we only do this for !block->IsLIR() I don't think it's beneficial to support LIR (Lower will handle such switches anyway). So we can simplify code by removing special cases for IsLIR()

FlowEdge* trueEdge;
FlowEdge* falseEdge;

if (block->NextIs(defaultDest))
Copy link
Member

@EgorBo EgorBo Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally avoid relying on lexical order, you can delete this if and unconditionally use the GT_LE path. We used to do it, but it's no longer required.

if (caseDest != defaultDest)
{
bool allCasesSameTarget = true;
for (unsigned i = 1; i < caseCount - 1; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you only need to check defaultEdge->getDupCount() being 1 here

// The highest case index is caseCount - 2 (caseCount includes the default case).
// Using an unsigned GT comparison handles negative values correctly because they
// wrap to large unsigned values, making them greater than maxCaseIndex as expected.
const unsigned maxCaseIndex = caseCount - 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid this math and the explanation by using maxCaseIndex = firstCaseEdge->getDupCount() and then GT_LT instead of GT_LE

const unsigned trueDupCount = trueEdge->getDupCount();
const unsigned falseDupCount = falseEdge->getDupCount();

if (trueDupCount > 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow this logic. We basically only need to adjust the firstCaseEdge's dupCount - set it to 1. and adjust bbRefs of firstCaseEdge's destionation block.
The defaultEdge and its destionation block is unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IL and asm differences from pattern matching vs comparisons

2 participants

Comments