[release/5.0] Prevent emitting Avx2 instruction for Vector256<T>.AllBitsSet and Vector256.Create(-1) when Avx2 is not supported#48630
Merged
Anipik merged 1 commit intodotnet:release/5.0from Mar 10, 2021
Conversation
…ector256<T>.AllBitsSet` intrinsic when Avx2 is not supported. Emit `vcmptrueps ymmReg, ymmReg, ymmReg` instead
Contributor
Author
|
@dotnet/jit-contrib @tannergooding PTAL |
tannergooding
approved these changes
Feb 22, 2021
Contributor
Author
|
Can I have sign-off from someone on @dotnet/jit-contrib please? |
EgorBo
approved these changes
Feb 25, 2021
Contributor
Author
jeffschwMSFT
approved these changes
Mar 1, 2021
Member
jeffschwMSFT
left a comment
There was a problem hiding this comment.
Approved. We will take for consideration in .NET 5.0.x
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is backporting of #48383 to 5.0
Closes #48283
Customer Impact
In the two following cases
and
on x86 and x64 the JIT emits
vpcmpeqd ymmReg, ymmReg, ymmReginstruction (that is supported by Avx2 CPU feature - see pcmpeqb:pcmpeqw:pcmpeqd) even on a machine that doesn't support the feature.For example, the issue occurs with CPUs that belong to Sandy Bridge or Ivy Bridge microarchitectures.
As you can see in #48283, a customer originally observed the issue on an Ivy Bridge machine. The execution of such code on their machine resulted in crash followed by terminating of the process by OS.
Validation
I don't have access to a machine that doesn't support Avx2. Therefore I used the same methodology we use in CI for ISA-related test scenarios (e.g. runtime-coreclr jitstress-isas-x86) where we set corresponding COMPlus environment variables to simulate that a CPU feature is not supported.
I generated all possible variations of
Vector128/256<T>.AllBitsSetandVector128/256.Create((T)-1)(in the latter caseTis a signed integer type) (see Runtime_48283.cs) and manually validated that the JIT doesn't emit the forbidden instructions.You can find below results of the JIT output for the following three cases - noavx (simulated with
COMPlus_EnableAVX=0), avx (simulated withCOMPlus_EnableAVX2=0) and avx2 (when none of the ISAs were forbidden) for both x86 and x64.noavx-win-x64.asm
avx-win-x64.asm
avx2-win-x64.asm
noavx-win-x86.asm
avx-win-x86.asm
avx2-win-x86.asm
After the change, the JIT will emit
vcmptrueps ymmReg, ymmReg, ymmRegfor such cases. This is similar to what Clang does - https://godbolt.org/z/erd5sf.Risk
Low, the changes impacts only code generation of hardware intrinsics
Vector128/256<int>.AllBitsSetthat were introduces in 5.0 and optimization in the JIT ofVector256.Create(-1)that uses the mentioned intrinsics (that was also done in 5.0).Regression
Yes, the
Vector256.Create(-1)on 3.1 wouldn't use Avx2 instruction.There is no
Vector128/256<int>.AllBitsSetAPI in 3.1.Note
The change depends on #48613 being merged - where another problem with
Vector256.Create(-1)optimization was fixed.The problem is that, at the moment, the JIT lowers both
Vector128.Create(-1)andVector256.Create(-1)toVector128<T>.AllBitsSetwithout taking into account the vector size (see src/coreclr/src/jit/lowerxarch.cpp