Updating the emitter to more generally handle 4-Byte SSE4 instructions.#16249
Conversation
CarolEidt
left a comment
There was a problem hiding this comment.
This LGTM except that I can't figure out why you're not emitting an extra byte in emitOutputRR() (probably something I've missed).
| } | ||
|
|
||
| if ((id->idInsFmt() != IF_RWR_RRD_ARD) && (id->idInsFmt() != IF_RWR_RRD_ARD_CNS)) | ||
| // encode source operand reg in 'vvvv' bits in 1's compliement form |
There was a problem hiding this comment.
While you're here you could fix the typo: "compliement" should be "complement"
There was a problem hiding this comment.
I fixed it here, and the 12 other places it occurred in this document.
| if ((code & 0xFF00) == 0xC000) | ||
| { | ||
| dst += emitOutputByte(dst, (0xC0 | regCode)); | ||
| dst += emitOutputWord(dst, code | (regCode << 8)); |
There was a problem hiding this comment.
There's probably something I'm missing here, but how did this go from emitting a byte to emitting a word?
There was a problem hiding this comment.
From what I can tell based on the comment above, this was trying to support the smaller encoding (which isn't supported anywhere else in the emitter).
I thought I had a comment indicating I was still looking at this in particular, but can't seem to find it anymore (maybe I just forget to submit).
There was a problem hiding this comment.
@CarolEidt, I'm finishing validating locally, but it looks like this is a dead code path and is not currently hit (for emitOutputRR). I can only speculate (based on the comment above this if block) that it was meant to support the smaller 2-byte VEX prefix scenario, which isn't actually working.
However, for emitOutputInstr it special cases IF_RRW_RRW_CNS, that does hit the equivalent code path, and that requires it to emitOutputWord.
I believe the correct fix is to refactor all three cases where this particular code pattern is (emitOutputRR, emitOutputRRR, and IF_RRW_RRW_CNS in emitOutputInstr) to do the following:
// TODO-XArch-CQ: Right now support 4-byte opcode instructions only
if ((code & 0xFF00) == 0xC000)
{
dst += emitOutputWord(dst, code | (regCode << 8));
}
else if ((code & 0xFF) == 0x00)
{
// This case happens for SSE4/AVX instructions only
assert(IsAVXInstruction(ins) || IsSSE4Instruction(ins));
dst += emitOutputByte(dst, (code >> 8) & 0xFF);
dst += emitOutputByte(dst, (0xC0 | regCode));
}
else
{
dst += emitOutputWord(dst, code);
dst += emitOutputByte(dst, (0xC0 | regcode));
}There was a problem hiding this comment.
Changed to the above. I validated that the original code path was never hit for emitOutputRR and emitOutputRRR.
There was a problem hiding this comment.
Awesome! Thanks for checking this out and cleaning it up.
|
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic test OSX10.12 x64 Checked jitincompletehwintrinsic |
|
The following are a separate issue, tracked by https://github.com/dotnet/coreclr/issues/16236:
The following timed out and have been reset:
The following is due to an existing issue: #16249 (comment)
|
|
x64_checked_windows_nt_jitstressregs4 is not related. The I've logged a bug (https://github.com/dotnet/coreclr/issues/16286) and should have a fix up tonight. |
seems to be related to |
FYI. @CarolEidt, @fiigii, @eerhardt
This should mostly resolve https://github.com/dotnet/coreclr/issues/15908 and https://github.com/dotnet/coreclr/issues/16216, at least for the code paths currently being executed.