Updating the emitter to more generally handle 4-Byte SSE4 instructions. by tannergooding · Pull Request #16249 · dotnet/coreclr

tannergooding · 2018-02-07T04:45:28Z

This should mostly resolve https://github.com/dotnet/coreclr/issues/15908 and https://github.com/dotnet/coreclr/issues/16216, at least for the code paths currently being executed.

…d, Math.Ceiling, and Math.Floor

CarolEidt

This LGTM except that I can't figure out why you're not emitting an extra byte in emitOutputRR() (probably something I've missed).

CarolEidt · 2018-02-07T19:13:24Z

+            }

-        if ((id->idInsFmt() != IF_RWR_RRD_ARD) && (id->idInsFmt() != IF_RWR_RRD_ARD_CNS))
+            // encode source operand reg in 'vvvv' bits in 1's compliement form


While you're here you could fix the typo: "compliement" should be "complement"

I fixed it here, and the 12 other places it occurred in this document.
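
For readers unfamiliar with the comment in the diff above: the `vvvv` field occupies bits 6:3 of the last VEX prefix byte and holds the register number inverted (ones' complement). A minimal sketch of that bit manipulation follows. `encodeVvvv` is a hypothetical helper written for illustration and is not the emitter's actual function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the coreclr emitter): stores a 4-bit
// register number, inverted, into bits 6:3 of the final VEX prefix byte.
inline uint8_t encodeVvvv(uint8_t vexByte, unsigned reg)
{
    assert(reg < 16); // vvvv is only 4 bits wide

    // Ones' complement of the register number, truncated to 4 bits.
    const unsigned inverted = ~reg & 0xFu;

    // Clear bits 6:3 (mask 0x78), then insert the inverted value there.
    return static_cast<uint8_t>((vexByte & ~0x78u) | (inverted << 3));
}
```

Under this scheme register 0 is stored as `1111` and register 15 as `0000`, which is why the field reads as all ones when no extra source operand is encoded.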

CarolEidt · 2018-02-07T19:18:03Z

        if ((code & 0xFF00) == 0xC000)
        {
-            dst += emitOutputByte(dst, (0xC0 | regCode));
+            dst += emitOutputWord(dst, code | (regCode << 8));


There's probably something I'm missing here, but how did this go from emitting a byte to emitting a word?

From what I can tell based on the comment above, this was trying to support the smaller encoding (which isn't supported anywhere else in the emitter).

I thought I had a comment indicating I was still looking at this in particular, but can't seem to find it anymore (maybe I just forgot to submit).

@CarolEidt, I'm finishing validating locally, but it looks like this is a dead code path and is not currently hit (for emitOutputRR). I can only speculate (based on the comment above this if block) that it was meant to support the smaller 2-byte VEX prefix scenario, which isn't actually working.

However, emitOutputInstr special-cases IF_RRW_RRW_CNS, which does hit the equivalent code path and requires the emitOutputWord call.

I believe the correct fix is to refactor all three cases where this particular code pattern is (emitOutputRR, emitOutputRRR, and IF_RRW_RRW_CNS in emitOutputInstr) to do the following:

    // TODO-XArch-CQ: Right now support 4-byte opcode instructions only
    if ((code & 0xFF00) == 0xC000)
    {
        dst += emitOutputWord(dst, code | (regCode << 8));
    }
    else if ((code & 0xFF) == 0x00)
    {
        // This case happens for SSE4/AVX instructions only
        assert(IsAVXInstruction(ins) || IsSSE4Instruction(ins));

        dst += emitOutputByte(dst, (code >> 8) & 0xFF);
        dst += emitOutputByte(dst, (0xC0 | regCode));
    }
    else
    {
        dst += emitOutputWord(dst, code);
        dst += emitOutputByte(dst, (0xC0 | regCode));
    }

Changed to the above. I validated that the original code path was never hit for emitOutputRR and emitOutputRRR.

Awesome! Thanks for checking this out and cleaning it up.
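
To make the byte-versus-word question in this thread concrete, here is a standalone sketch of the three-branch pattern proposed above, writing into a `std::vector` instead of the JIT's output buffer. Everything in it is an assumption made for illustration: `outputByte`, `outputWord`, and `emitTail` are hypothetical stand-ins, the word is assumed to be written low byte first (as on x86), and `regCode` is assumed to hold only the low six ModRM bits. The opcode values used with it are arbitrary, not real instruction encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for emitOutputByte: append one byte.
static void outputByte(std::vector<uint8_t>& dst, unsigned val)
{
    dst.push_back(static_cast<uint8_t>(val & 0xFF));
}

// Hypothetical stand-in for emitOutputWord: append 16 bits, low byte first.
static void outputWord(std::vector<uint8_t>& dst, unsigned val)
{
    outputByte(dst, val);
    outputByte(dst, val >> 8);
}

// Emits the trailing opcode bytes plus the register-form ModRM byte.
std::vector<uint8_t> emitTail(unsigned code, unsigned regCode)
{
    assert(regCode <= 0x3F); // assumed: only the reg and r/m fields are set
    std::vector<uint8_t> dst;

    if ((code & 0xFF00) == 0xC000)
    {
        // The ModRM template (0xC0) already sits in the high byte, so one
        // word write produces both the opcode byte and the ModRM byte.
        outputWord(dst, code | (regCode << 8));
    }
    else if ((code & 0xFF) == 0x00)
    {
        // Only the high byte carries an opcode byte; ModRM follows it.
        outputByte(dst, (code >> 8) & 0xFF);
        outputByte(dst, 0xC0 | regCode);
    }
    else
    {
        // Both bytes carry opcode bytes; ModRM follows them.
        outputWord(dst, code);
        outputByte(dst, 0xC0 | regCode);
    }
    return dst;
}
```

In this sketch the first two branches each produce two bytes and the last produces three, which shows why the `0xC000` case needs a word write rather than a single byte: the opcode byte and the ModRM byte are packed into the same 16-bit value.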

tannergooding · 2018-02-08T16:00:37Z

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

test OSX10.12 x64 Checked jitincompletehwintrinsic
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx2
test OSX10.12 x64 Checked jitx86hwintrinsicnosimd
test OSX10.12 x64 Checked jitnox86hwintrinsic

tannergooding · 2018-02-08T20:28:53Z

The following are a separate issue, tracked by https://github.com/dotnet/coreclr/issues/16236:

x64_checked_windows_nt_jitstress2
x64_checked_windows_nt_jitstress2_jitstressregs8
x64_checked_windows_nt_jitstress2_jitstressregs4
x64_checked_windows_nt_jitstress2_jitstressregs3
x64_checked_windows_nt_jitstress2_jitstressregs2
x64_checked_windows_nt_jitstress2_jitstressregs1
x64_checked_windows_nt_jitstress2_jitstressregs0x1000
x64_checked_windows_nt_jitstress2_jitstressregs0x80
x64_checked_windows_nt_jitstress2_jitstressregs0x10
x64_checked_windows_nt_jitstress1

The following timed out and have been reset:

x64_checked_osx10.12_jitx86hwintrinsicnoavx_flow

The following is due to an existing issue: #16249 (comment)

x64_checked_windows_nt_jitstressregs4

tannergooding · 2018-02-08T22:24:52Z

x64_checked_windows_nt_jitstressregs4 is not related. The AddRex*Prefix checks need to be updated to account for the prefetch instructions (which are SSE instructions, but which need the actual REX prefix, rather than the VEX prefix).

I've logged a bug (https://github.com/dotnet/coreclr/issues/16286) and should have a fix up tonight.
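
As background for the REX prefix mentioned above: a REX prefix is a single byte of the form `0100WRXB`. The sketch below builds one from its four flag bits. `makeRex` is a hypothetical helper written for illustration; it is not one of the emitter's AddRex*Prefix functions and does not reproduce their checks.

```cpp
#include <cstdint>

// Hypothetical helper (not the emitter's AddRex*Prefix functions):
// builds a REX prefix byte, 0100WRXB, from its four flag bits.
inline uint8_t makeRex(bool w, bool r, bool x, bool b)
{
    return static_cast<uint8_t>(0x40 | (w << 3) | (r << 2) | (x << 1) | (b ? 1 : 0));
}
```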

4creators · 2018-02-09T01:25:35Z

The following has been reset, tests failed due to OOM, looks unrelated:

seems to be related to Test Infrastructure Failure: The paging file is too small for this operation to complete. failures in #16237

tannergooding added 2 commits February 7, 2018 08:55

Updating the emitter to more generally handle 4-Byte SSE4 instructions.

d27217b

Enabling the named intrinsic support on SSE4.1 hardware for Math.Roun…

cfe1ad4

…d, Math.Ceiling, and Math.Floor

CarolEidt reviewed Feb 7, 2018

tannergooding changed the title ~~[WIP] Updating the emitter to more generally handle 4-Byte SSE4 instructions.~~ Updating the emitter to more generally handle 4-Byte SSE4 instructions. Feb 8, 2018

tannergooding added 2 commits February 8, 2018 08:00

Fixing up the emitOutput handling for RR/RRR instructions

88e9032

Change compliement to complement

8c2f3ec

CarolEidt approved these changes Feb 8, 2018

tannergooding merged commit 0991c4f into dotnet:master Feb 9, 2018

tannergooding deleted the 4ByteSSE4 branch May 30, 2018 04:12

mikedn mentioned this pull request Mar 8, 2019

Handle addressing modes for HW intrinsics #22944

Merged

fiigii mentioned this pull request Jan 31, 2020

[RyuJIT] Update the emitter for VEX-encoded SSE4.1/4.2 instructions with containment dotnet/runtime#9697

Closed

tannergooding merged 4 commits into dotnet:master from tannergooding:4ByteSSE4
