JIT: Convert multi-target switches to branchless checks#124567
JIT: Convert multi-target switches to branchless checks#124567JulieLeeMSFT wants to merge 4 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
@EgorBo, PTAL. |
There was a problem hiding this comment.
Pull request overview
This PR optimizes multi-target switches where all non-default cases target a single block (distinct from the default target) by converting them into unsigned range comparisons. This enables branchless code generation when both targets are simple return blocks, addressing the disparity between pattern matching and explicit equality comparisons noted in issue #123858.
Changes:
- Adds logic in
fgOptimizeSwitchBranchesto detect and optimize switches with exactly 2 unique successors - Transforms such switches into unsigned LE/GT comparisons, choosing the direction to favor fall-through
- Adds comprehensive test coverage for both zero-based and non-zero-based consecutive ranges
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/fgopt.cpp | Implements the switch-to-range-check optimization with edge weight/dup count fixup and profile weight updates |
| src/tests/JIT/opt/OptSwitchRecognition/optSwitchRecognition.cs | Adds tests for zero-based (0-4) and non-zero-based (10-14) consecutive value ranges with comprehensive boundary cases |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| // Test with non-zero-based consecutive values | ||
| [MethodImpl(MethodImplOptions.NoInlining)] | ||
| private static bool IsInRange10To14(int val) | ||
| { | ||
| return val == 10 | ||
| || val == 11 | ||
| || val == 12 | ||
| || val == 13 | ||
| || val == 14; | ||
| } | ||
|
|
||
| [Theory] | ||
| [InlineData(9, false)] | ||
| [InlineData(10, true)] | ||
| [InlineData(11, true)] | ||
| [InlineData(12, true)] | ||
| [InlineData(13, true)] | ||
| [InlineData(14, true)] | ||
| [InlineData(15, false)] | ||
| [InlineData(-1, false)] | ||
| public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange10To14(arg1)); |
There was a problem hiding this comment.
This “non-zero-based consecutive values” test uses 10..14, but SwitchRecognition forces minValue=0 when maxValue <= SWITCH_MAX_DISTANCE, so the recognized switch will likely cover 0..14 with many entries targeting the false block. That shape is not eligible for the new switch→range-check simplification (which requires all non-default cases to share one target). If the intent is to exercise the new optimization, consider using a range where maxValue > SWITCH_MAX_DISTANCE (e.g., 100..104) so recognition keeps a non-zero minValue and emits a dense 0..4 switch after subtracting.
| // Test with non-zero-based consecutive values | |
| [MethodImpl(MethodImplOptions.NoInlining)] | |
| private static bool IsInRange10To14(int val) | |
| { | |
| return val == 10 | |
| || val == 11 | |
| || val == 12 | |
| || val == 13 | |
| || val == 14; | |
| } | |
| [Theory] | |
| [InlineData(9, false)] | |
| [InlineData(10, true)] | |
| [InlineData(11, true)] | |
| [InlineData(12, true)] | |
| [InlineData(13, true)] | |
| [InlineData(14, true)] | |
| [InlineData(15, false)] | |
| [InlineData(-1, false)] | |
| public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange10To14(arg1)); | |
| // Test with non-zero-based consecutive values (range 100 to 104) | |
| [MethodImpl(MethodImplOptions.NoInlining)] | |
| private static bool IsInRange100To104(int val) | |
| { | |
| return val == 100 | |
| || val == 101 | |
| || val == 102 | |
| || val == 103 | |
| || val == 104; | |
| } | |
| [Theory] | |
| [InlineData(99, false)] | |
| [InlineData(100, true)] | |
| [InlineData(101, true)] | |
| [InlineData(102, true)] | |
| [InlineData(103, true)] | |
| [InlineData(104, true)] | |
| [InlineData(105, false)] | |
| [InlineData(-1, false)] | |
| public static void TestSwitchToRangeCheckNonZeroBased(int arg1, bool expected) => Assert.Equal(expected, IsInRange100To104(arg1)); |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| // For small switches with 2 or fewer non-default cases (caseCount <= 3 including the default), | ||
| // the backend's switch lowering already generates efficient comparison chains. Converting | ||
| // early can enable if-conversion (cmov) that produces worse code for these cases. | ||
| if (caseCount <= 3) |
There was a problem hiding this comment.
Is it the case or just an assumption?
| return true; | ||
| } | ||
| else if (block->GetSwitchTargets()->GetSuccCount() == 2 && block->GetSwitchTargets()->HasDefaultCase() && | ||
| (block->IsLIR() || fgNodeThreading == NodeThreading::AllTrees)) |
There was a problem hiding this comment.
Can we only do this for !block->IsLIR() I don't think it's beneficial to support LIR (Lower will handle such switches anyway). So we can simplify code by removing special cases for IsLIR()
| FlowEdge* trueEdge; | ||
| FlowEdge* falseEdge; | ||
|
|
||
| if (block->NextIs(defaultDest)) |
There was a problem hiding this comment.
We generally avoid relying on lexical order, you can delete this if and unconditionally use the GT_LE path. We used to do it, but it's no longer required.
| if (caseDest != defaultDest) | ||
| { | ||
| bool allCasesSameTarget = true; | ||
| for (unsigned i = 1; i < caseCount - 1; i++) |
There was a problem hiding this comment.
I think you only need to check defaultEdge->getDupCount() being 1 here
| // The highest case index is caseCount - 2 (caseCount includes the default case). | ||
| // Using an unsigned GT comparison handles negative values correctly because they | ||
| // wrap to large unsigned values, making them greater than maxCaseIndex as expected. | ||
| const unsigned maxCaseIndex = caseCount - 2; |
There was a problem hiding this comment.
I think you can avoid this math and the explanation by using maxCaseIndex = firstCaseEdge->getDupCount() and then GT_LT instead of GT_LE
| const unsigned trueDupCount = trueEdge->getDupCount(); | ||
| const unsigned falseDupCount = falseEdge->getDupCount(); | ||
|
|
||
| if (trueDupCount > 1) |
There was a problem hiding this comment.
I don't follow this logic. We basically only need to adjust the firstCaseEdge's dupCount - set it to 1. and adjust bbRefs of firstCaseEdge's destionation block.
The defaultEdge and its destionation block is unchanged.
Fixes #123858.
This PR improves JIT codegen for multi-target
switch-style tests by converting eligible cases into branchless checks. The goal is to produce optimal codegen that matches the intent of C# pattern matching withor(e.g.,x is A or B or C).When a switch has exactly 2 unique successors (all non-default cases target one block, default targets another), convert it from Switch to an unsigned range comparison (In
fgOptimizeSwitchBranches).When both targets are simple return blocks,
fgFoldCondToReturnBlockfurther folds this into branchless return.Example
Before
After
Details
ASMDiffs