Simplifying the emitter handling of 4-byte encoded SSE instructions#21528
Conversation
|
CC. @CarolEidt, @fiigii |
| #include "codegen.h" | ||
|
|
||
| bool IsSSE2Instruction(instruction ins) | ||
| bool IsSSEInstruction(instruction ins) |
There was a problem hiding this comment.
SSE2 wasn't an accurate name, since this was also checking SSE and SSE3 instructions
| return (ins >= INS_FIRST_SSE2_INSTRUCTION) && (ins <= INS_LAST_SSE2_INSTRUCTION); | ||
| } | ||
|
|
||
| bool IsSSE4Instruction(instruction ins) |
There was a problem hiding this comment.
We don't need this grouping anymore, since we just rely on EncodedBySSE38OrSSE3A (and since we had some instructions in this grouping that weren't 4-byte encoded).
| // | ||
| // Note that this should be true for any of the instructions in instrsXArch.h | ||
| // that use the SSE38 or SSE3A macro. | ||
| bool emitter::Is4ByteSSE4OrAVXInstruction(instruction ins) |
There was a problem hiding this comment.
This is just simplified to EncodedBySSE38OrSSE3A
| #define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | ||
|
|
||
| INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) | ||
| INST3(FIRST_SSE_INSTRUCTION, "FIRST_SSE_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) |
There was a problem hiding this comment.
We can simplify this grouping to just SSE, which covers everything prior to AVX
| #define VEX3INT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | ||
| #define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2) | ||
|
|
||
| INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) |
There was a problem hiding this comment.
SSE2 wasn't accurate since we also had SSE and SSE3 instructions.
| INST3(packuswb, "packuswb", IUM_WR, BAD_CODE, BAD_CODE, PCKDBL(0x67), INS_Flags_IsDstDstSrcAVXInstruction) // Pack (narrow) short to unsigned byte with saturation | ||
| INST3(LAST_SSE2_INSTRUCTION, "LAST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) | ||
|
|
||
| INST3(FIRST_SSE4_INSTRUCTION, "FIRST_SSE4_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None) |
There was a problem hiding this comment.
SSE4 wasn't accurate, since this covered anything that was SSE38 or SSE3A encoded (including some SSSE3 and other instructions). It also didn't only cover instructions that required the 4-byte encoding.
|
|
||
| size_t insCode = 0; | ||
|
|
||
| if (!IsSSEOrAVXInstruction(ins)) |
There was a problem hiding this comment.
This covers the additional check that Is4ByteSSE4Instruction and Is4ByteSSE4OrAVXInstruction were doing, but as a single check (rather than multiple separate ones).
It also short-circuits the path for any instruction that can't be SSE38/SSE3A.
fiigii
left a comment
There was a problem hiding this comment.
LGTM overall, just one naming suggestion.
| bool emitter::Is4ByteSSE4Instruction(instruction ins) | ||
| // that use the SSE38 or SSE3A macro but returns false if the VEX encoding is | ||
| // in use, since that encoding does not require an additional byte. | ||
| bool emitter::Is4ByteSSEInstruction(instruction ins) |
There was a problem hiding this comment.
Now, the code mixes using EncodedBySSE38orSSE3A and Is4ByteSSEInstruction that looks confusing. Perhaps, we can rename this function to EncodedByLegacySSE38orSSE3A or something similar.
There was a problem hiding this comment.
I think it might be okay to keep it as is, and instead follow up with the code calling Is4ByteSSEInstruction so we can remove this as well.
Currently, the majority of places calling Is4ByteSSEInstruction are doing so to determine if they should update the sz estimate (7/11 references). This logic would be better suited to be centralized (as most of the other size calculations are).
Of the remaining 4 usages, 2 are assert(IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins));, which can be simplified to assert(IsAVXInstruction(ins) || EncodedBySSE38OrSSE3A(ins)); and the remaining two are for determining if the additional byte needs to be output (and could just be simplified to !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins) or centralized).
It would be a slightly more involved change, however; so I think it would be better as a followup fix.
|
Please also run noavx and sse2only CI tests. |
|
test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0) test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic |
|
CC. @CarolEidt. This should be ready for review. |
CarolEidt
left a comment
There was a problem hiding this comment.
LGTM - I'm kind of surprised that it's possible to make incremental improvements like this!
This is an incremental cleanup on the emitter around the 4-byte encoded SSE instruction handling.
Currently, we set
UseSSE4=truewhenever the compiler supports any ISA that requires such encoding. In the emitter, we then check this value along with some other metadata/values to determine if we need to increase the estimated bytes emitted or if we have an extra byte to actually emit.We can simplify this logic greatly by just getting rid of
UseSSE4and relying only onEncodedBySSE38OrSSE3AandUseVexEncoding.