Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Simplifying the emitter handling of 4-byte encoded SSE instructions#21528

Merged
tannergooding merged 1 commit into
dotnet:masterfrom
tannergooding:simplify-sse4
Dec 14, 2018
Merged

Simplifying the emitter handling of 4-byte encoded SSE instructions#21528
tannergooding merged 1 commit into
dotnet:masterfrom
tannergooding:simplify-sse4

Conversation

@tannergooding
Copy link
Copy Markdown
Member

This is an incremental cleanup on the emitter around the 4-byte encoded SSE instruction handling.

Currently, we set UseSSE4=true whenever the compiler supports any ISA that requires such encoding. In the emitter, we then check this value along with some other metadata/values to determine if we need to increase the estimated bytes emitted or if we have an extra byte to actually emit.

We can simplify this logic greatly by just getting rid of UseSSE4 and relying only on EncodedBySSE38OrSSE3A and UseVexEncoding.

@tannergooding
Copy link
Copy Markdown
Member Author

CC. @CarolEidt, @fiigii

Comment thread src/jit/emitxarch.cpp
#include "codegen.h"

bool IsSSE2Instruction(instruction ins)
bool IsSSEInstruction(instruction ins)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE2 wasn't an accurate name, since this was also checking SSE and SSE3 instructions

Comment thread src/jit/emitxarch.cpp
return (ins >= INS_FIRST_SSE2_INSTRUCTION) && (ins <= INS_LAST_SSE2_INSTRUCTION);
}

bool IsSSE4Instruction(instruction ins)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this grouping anymore, since we just rely on EncodedBySSE38OrSSE3A (and since we had some instructions in this grouping that weren't 4-byte encoded).

Comment thread src/jit/emitxarch.cpp
//
// Note that this should be true for any of the instructions in instrsXArch.h
// that use the SSE38 or SSE3A macro.
bool emitter::Is4ByteSSE4OrAVXInstruction(instruction ins)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just simplified to EncodedBySSE38OrSSE3A

Comment thread src/jit/instrsxarch.h
#define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)

INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
INST3(FIRST_SSE_INSTRUCTION, "FIRST_SSE_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify this grouping to just SSE, which covers everything prior to AVX

Comment thread src/jit/instrsxarch.h
#define VEX3INT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)
#define VEX3FLT(c1,c2) PACK4(c1, 0xc5, 0x02, c2)

INST3(FIRST_SSE2_INSTRUCTION, "FIRST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE2 wasn't accurate since we also had SSE and SSE3 instructions.

Comment thread src/jit/instrsxarch.h
INST3(packuswb, "packuswb", IUM_WR, BAD_CODE, BAD_CODE, PCKDBL(0x67), INS_Flags_IsDstDstSrcAVXInstruction) // Pack (narrow) short to unsigned byte with saturation
INST3(LAST_SSE2_INSTRUCTION, "LAST_SSE2_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)

INST3(FIRST_SSE4_INSTRUCTION, "FIRST_SSE4_INSTRUCTION", IUM_WR, BAD_CODE, BAD_CODE, BAD_CODE, INS_FLAGS_None)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE4 wasn't accurate, since this covered anything that was SSE38 or SSE3A encoded (including some SSSE3 and other instructions). It also didn't only cover instructions that required the 4-byte encoding.

Comment thread src/jit/emitxarch.cpp

size_t insCode = 0;

if (!IsSSEOrAVXInstruction(ins))
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers the additional check that Is4ByteSSE4Instruction and Is4ByteSSE4OrAVXInstruction were doing, but as a single check (rather than multiple separate ones).
It also short-circuits the path for any instruction that can't be SSE38/SSE3A.

Copy link
Copy Markdown

@fiigii fiigii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just one naming suggestion.

Comment thread src/jit/emitxarch.cpp
bool emitter::Is4ByteSSE4Instruction(instruction ins)
// that use the SSE38 or SSE3A macro but returns false if the VEX encoding is
// in use, since that encoding does not require an additional byte.
bool emitter::Is4ByteSSEInstruction(instruction ins)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, the code mixes using EncodedBySSE38orSSE3A and Is4ByteSSEInstruction that looks confusing. Perhaps, we can rename this function to EncodedByLegacySSE38orSSE3A or something similar.

Copy link
Copy Markdown
Member Author

@tannergooding tannergooding Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be okay to keep it as is, and instead follow up with the code calling Is4ByteSSEInstruction so we can remove this as well.

Currently, the majority of places calling Is4ByteSSEInstruction are doing so to determine if they should update the sz estimate (7/11 references). This logic would be better suited to be centralized (as most of the other size calculations are).

Of the remaining 4 usages, 2 are assert(IsAVXInstruction(ins) || Is4ByteSSEInstruction(ins));, which can be simplified to assert(IsAVXInstruction(ins) || EncodedBySSE38OrSSE3A(ins)); and the remaining two are for determining if the additional byte needs to be output (and could just be simplified to !UseVEXEncoding() && EncodedBySSE38orSSE3A(ins) or centralized).

It would be a slightly more involved change, however; so I think it would be better as a followup fix.

@fiigii
Copy link
Copy Markdown

fiigii commented Dec 13, 2018

Please also run noavx and sse2only CI tests.

@tannergooding
Copy link
Copy Markdown
Member Author

test Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic
test Windows_NT x64 Checked jitsse2only

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic
test Windows_NT x86 Checked jitsse2only

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic
test Ubuntu x64 Checked jitsse2only

@tannergooding
Copy link
Copy Markdown
Member Author

CC. @CarolEidt. This should be ready for review.
As per the title, this is just an iterative improvement trying to simplify some of the emitter logic around the SSE types.

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'm kind of surprised that it's possible to make incremental improvements like this!

@tannergooding tannergooding merged commit 813bd6e into dotnet:master Dec 14, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants