Simplifying the emitter handling of 4-byte encoded SSE instructions by tannergooding · Pull Request #21528 · dotnet/coreclr

tannergooding · 2018-12-13T18:41:12Z

This is an incremental cleanup on the emitter around the 4-byte encoded SSE instruction handling.

Currently, we set UseSSE4=true whenever the compiler supports any ISA that requires such encoding. In the emitter, we then check this value along with some other metadata/values to determine if we need to increase the estimated bytes emitted or if we have an extra byte to actually emit.

We can simplify this logic greatly by getting rid of UseSSE4 and relying only on EncodedBySSE38orSSE3A and UseVEXEncoding.
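As a rough illustration of the idea (not the actual emitter code), the "needs an extra opcode byte" question reduces to combining the two predicates named above. In this sketch the `instruction` enum, the `Emitter` struct, and the choice of `pshufb`/`roundps` as the SSE38/SSE3A representatives are assumptions made only for the example.

```cpp
#include <cassert>

// Hypothetical, tiny instruction set used only for this sketch.
enum instruction
{
    INS_addps,   // encoded as 0F 58, no 0F 38 / 0F 3A escape
    INS_pshufb,  // encoded as 66 0F 38 00
    INS_roundps, // encoded as 66 0F 3A 08
};

struct Emitter
{
    bool useVexEncoding = false; // stand-in for the emitter's VEX setting

    bool UseVEXEncoding() const { return useVexEncoding; }

    // Stand-in for the real predicate: true for instructions whose legacy
    // encoding goes through the 0F 38 or 0F 3A opcode maps.
    static bool EncodedBySSE38orSSE3A(instruction ins)
    {
        return (ins == INS_pshufb) || (ins == INS_roundps);
    }

    // The extra byte is only needed for the legacy (non-VEX) encoding.
    bool Is4ByteSSEInstruction(instruction ins) const
    {
        return !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins);
    }
};
```

With this shape there is no separate UseSSE4-style flag to keep in sync: the answer depends only on the instruction and on whether VEX encoding is in use.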

tannergooding · 2018-12-13T18:41:20Z

CC. @CarolEidt, @fiigii

tannergooding · 2018-12-13T18:41:50Z

 #include "codegen.h"

-bool IsSSE2Instruction(instruction ins)
+bool IsSSEInstruction(instruction ins)


SSE2 wasn't an accurate name, since this was also checking SSE and SSE3 instructions.

tannergooding · 2018-12-13T18:42:32Z

-    return (ins >= INS_FIRST_SSE2_INSTRUCTION) && (ins <= INS_LAST_SSE2_INSTRUCTION);
-}
-
-bool IsSSE4Instruction(instruction ins)


We don't need this grouping anymore, since we now rely on EncodedBySSE38orSSE3A (and since some instructions in this grouping weren't 4-byte encoded).

tannergooding · 2018-12-13T18:43:08Z

-//
-// Note that this should be true for any of the instructions in instrsXArch.h
-// that use the SSE38 or SSE3A macro.
-bool emitter::Is4ByteSSE4OrAVXInstruction(instruction ins)


This is simplified to just EncodedBySSE38orSSE3A.

tannergooding · 2018-12-13T18:44:57Z

 #define VEX3FLT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)

-INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
+INST3(FIRST_SSE_INSTRUCTION, "FIRST_SSE_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)


We can simplify this grouping to just SSE, which covers everything prior to AVX.
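For readers unfamiliar with the sentinel pattern, here is a minimal sketch of how a single SSE grouping supports a range check. The enum contents and their order are invented for the example, and `INS_LAST_SSE_INSTRUCTION` is an assumed name mirroring the `FIRST_SSE_INSTRUCTION` entry in the diff.

```cpp
#include <cassert>

// Hypothetical instruction enum; in the real JIT these entries are generated
// from the instruction table, and the list is far longer.
enum instruction
{
    INS_mov,                   // illustrative non-SSE instruction
    INS_FIRST_SSE_INSTRUCTION, // sentinel marking the start of the SSE group
    INS_addps,
    INS_packuswb,
    INS_pshufb,
    INS_LAST_SSE_INSTRUCTION,  // sentinel marking the end of the SSE group (assumed name)
    INS_FIRST_AVX_INSTRUCTION, // a later group would follow the same pattern
};

// One contiguous group means membership is a simple range comparison.
bool IsSSEInstruction(instruction ins)
{
    return (ins >= INS_FIRST_SSE_INSTRUCTION) && (ins <= INS_LAST_SSE_INSTRUCTION);
}
```

Because membership depends only on position in the enum, merging the former SSE2 and SSE4 groups removes one pair of sentinels and one range check.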

tannergooding · 2018-12-13T18:45:35Z

 #define VEX3INT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)
 #define VEX3FLT(c1,c2)   PACK4(c1, 0xc5, 0x02, c2)

-INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)


SSE2 wasn't accurate since we also had SSE and SSE3 instructions.

tannergooding · 2018-12-13T18:46:15Z

 INST3(packuswb,         "packuswb",         IUM_WR, BAD_CODE,     BAD_CODE,     PCKDBL(0x67),                            INS_Flags_IsDstDstSrcAVXInstruction)    // Pack (narrow) short to unsigned byte with saturation
-INST3(LAST_SSE2_INSTRUCTION, "LAST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)

-INST3(FIRST_SSE4_INSTRUCTION, "FIRST_SSE4_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)


SSE4 wasn't accurate, since this covered anything that was SSE38 or SSE3A encoded (including some SSSE3 and other instructions). The grouping also included instructions that did not require the 4-byte encoding.

tannergooding · 2018-12-13T18:47:53Z


    size_t insCode = 0;

+    if (!IsSSEOrAVXInstruction(ins))


This covers the additional check that Is4ByteSSE4Instruction and Is4ByteSSE4OrAVXInstruction were doing, but as a single check (rather than multiple separate ones).
It also short-circuits the path for any instruction that can't be SSE38/SSE3A.
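A sketch of that shape, under stated assumptions: the packed opcode layout, the table contents, and the mask below are invented for illustration and are not the values used by the JIT. Only the early return on `IsSSEOrAVXInstruction` and the `insCode` local come from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum instruction { INS_mov, INS_addps, INS_pshufb, INS_roundps, INS_count };

// Invented layout for this sketch: [prefix][0x0F escape][map byte][opcode].
constexpr std::uint32_t kPackedCode[INS_count] = {
    0x00000089, // mov:     89
    0x000F0058, // addps:   0F 58 (no map byte)
    0x660F3800, // pshufb:  66 0F 38 00
    0x660F3A08, // roundps: 66 0F 3A 08
};

// Stand-in: everything except mov is treated as an SSE or AVX instruction.
bool IsSSEOrAVXInstruction(instruction ins)
{
    return ins != INS_mov;
}

bool EncodedBySSE38orSSE3A(instruction ins)
{
    const std::uint32_t SSE38 = 0x660F3800;
    const std::uint32_t SSE3A = 0x660F3A00;
    const std::uint32_t MASK  = 0xFFFFFF00; // ignore the final opcode byte

    std::size_t insCode = 0;

    // Short-circuit: instructions outside the SSE/AVX groups can never match.
    if (!IsSSEOrAVXInstruction(ins))
    {
        return false;
    }

    insCode = kPackedCode[ins] & MASK;
    return (insCode == SSE38) || (insCode == SSE3A);
}
```

The early return keeps the table lookup off the path for general-purpose instructions, and the single masked comparison replaces the separate per-group checks.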

fiigii

LGTM overall, just one naming suggestion.

fiigii · 2018-12-13T20:28:41Z

-bool emitter::Is4ByteSSE4Instruction(instruction ins)
+// that use the SSE38 or SSE3A macro but returns false if the VEX encoding is
+// in use, since that encoding does not require an additional byte.
+bool emitter::Is4ByteSSEInstruction(instruction ins)


The code now mixes EncodedBySSE38orSSE3A and Is4ByteSSEInstruction, which looks confusing. Perhaps we can rename this function to EncodedByLegacySSE38orSSE3A or something similar.

tannergooding · 2018-12-13 (edited)

I think it might be okay to keep it as is, and instead follow up on the code calling Is4ByteSSEInstruction so we can remove this function as well.

Currently, the majority of places calling Is4ByteSSEInstruction do so to determine whether they should update the sz estimate (7/11 references). This logic would be better centralized (as most of the other size calculations are).

Of the remaining 4 usages, 2 are assert(IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins));, which can be simplified to assert(IsAVXInstruction(ins) || EncodedBySSE38orSSE3A(ins));, and the other 2 determine whether the additional byte needs to be output (these could be simplified to !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins) or centralized).

It would be a slightly more involved change, however, so I think it would be better as a follow-up fix.
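One possible shape for that follow-up, sketched under assumptions: `ExtraOpcodeBytes`, `EstimateSize`, and `Emit` are hypothetical helpers invented here, the emitted bytes are placeholders rather than real encodings, and only `pshufb` is treated as SSE38/SSE3A encoded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum instruction { INS_addps, INS_pshufb };

struct Emitter
{
    bool useVex = false;

    bool UseVEXEncoding() const { return useVex; }

    // Stand-in predicate for this sketch.
    static bool EncodedBySSE38orSSE3A(instruction ins) { return ins == INS_pshufb; }

    // Hypothetical single source of truth for the additional byte.
    unsigned ExtraOpcodeBytes(instruction ins) const
    {
        return (!UseVEXEncoding() && EncodedBySSE38orSSE3A(ins)) ? 1u : 0u;
    }

    // Size estimation: every caller gets the same adjustment from one place.
    unsigned EstimateSize(instruction ins, unsigned baseSize) const
    {
        return baseSize + ExtraOpcodeBytes(ins);
    }

    // Emission: writes placeholder bytes and returns how many were written.
    unsigned Emit(instruction ins, unsigned baseSize, std::vector<std::uint8_t>& out) const
    {
        const std::size_t before = out.size();
        out.insert(out.end(), baseSize, static_cast<std::uint8_t>(0x90)); // placeholder payload
        if (ExtraOpcodeBytes(ins) != 0)
        {
            out.push_back(static_cast<std::uint8_t>(0x38)); // placeholder for the extra byte
        }
        return static_cast<unsigned>(out.size() - before);
    }
};
```

Routing both the estimate and the emission through one helper means the two cannot disagree about whether the extra byte exists, which is the main risk when the check is repeated at many call sites.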

fiigii · 2018-12-13T20:32:02Z

Please also run noavx and sse2only CI tests.

tannergooding · 2018-12-13T21:09:27Z

test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic
test Windows_NT x64 Checked jitsse2only

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic
test Windows_NT x86 Checked jitsse2only

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic
test Ubuntu x64 Checked jitsse2only

tannergooding · 2018-12-14T21:01:08Z

CC. @CarolEidt. This should be ready for review.
As per the title, this is just an iterative improvement trying to simplify some of the emitter logic around the SSE types.

CarolEidt

LGTM - I'm kind of surprised that it's possible to make incremental improvements like this!

Simplifying the emitter handling of 4-byte encoded SSE instructions

4b978fe

fiigii approved these changes Dec 13, 2018

CarolEidt approved these changes Dec 14, 2018

tannergooding merged commit 813bd6e into dotnet:master Dec 14, 2018

mikedn mentioned this pull request Mar 8, 2019

Handle addressing modes for HW intrinsics #22944

Merged

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Simplifying the emitter handling of 4-byte encoded SSE instructions (d…

51d747c

…otnet/coreclr#21528) Commit migrated from dotnet/coreclr@813bd6e
