[Arm64] Implement Simd.Extract + by sdmaclea · Pull Request #16085 · dotnet/coreclr

sdmaclea · 2018-01-29T23:37:35Z

No description provided.

sdmaclea

This implements the first intrinsic with a const immediate. It handles the non-const case by generating a switch table.

@CarolEidt @RussKeldorph @tannergooding PTAL
@dotnet/arm64-contrib @dotnet/jit-contrib FYI

sdmaclea · 2018-01-29T23:41:35Z

+        inst_JMP(EJ_jmp, labelBreakTarget);
+    }
+    genDefineTempLabel(labelBreakTarget);
+}


This genHWIntrinsicSwitchTable is intended to be reusable for any intrinsic requiring a switch table. It is designed to work with single instruction intrinsics, so the case spacing is hard coded to two instructions (8 bytes).

Why hardcode it, instead of defining a label per offset, so it can be dynamically sized, if required?

That is what I did for the x86 codegen (https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsiccodegenxarch.cpp#L655).

I do like the approach to making it reusable 😄

> Why hardcode it, instead of defining a label per offset, so it can be dynamically sized, if required?

Arm64 instructions are fixed size, so hardcoding the spacing makes sense. A single instruction per case is the expected use case. (I would add a per-case instruction count parameter if needed.)

A label per offset would require more complexity: either a separate direct branch table so the target address could be calculated, or an if/else compare-and-branch chain. Or ...

Looks like you chose the separate direct branch table.

The generated code will be smaller, since the jump table is not needed.

Certainly, if variable-sized cases were needed, we could add the jump table. Right now I expect homogeneous cases of single instructions.

> Looks like you chose the separate direct branch table

Yes, x86 instructions can potentially range in size from 1 byte to 16+ bytes (depending on prefixes, encoding, etc).

Although, within a given HWIntrinsic jump table, they will likely be fairly consistent.

It seems that for Arm64 this approach is best. If it were found to be necessary, it could be parameterized based on the instruction (though I don't see us needing this for anything that would require more than a single instruction per case).
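
The trade-off discussed in this thread can be sketched outside the JIT. This is a minimal illustration, not the PR's code: `fixedStrideOffset` and `buildLabelTable` are hypothetical names, and the two-instruction case size (8 bytes) is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Arm64 instructions are always 4 bytes wide.
constexpr std::size_t kArm64InsSize = 4;

// Fixed-stride table: every case has the same size, so the byte offset of
// case `index` from the first case label is a single multiplication.
// insPerCase defaults to 2 (8 bytes per case), as described in the thread.
inline std::size_t fixedStrideOffset(std::size_t index, std::size_t insPerCase = 2)
{
    assert(insPerCase > 0);
    return index * insPerCase * kArm64InsSize;
}

// Variable-size alternative: each case may have its own size, so a separate
// table of offsets (one "label" per case) has to be built and consulted.
inline std::vector<std::size_t> buildLabelTable(const std::vector<std::size_t>& caseSizesInBytes)
{
    std::vector<std::size_t> offsets;
    std::size_t              next = 0;
    for (std::size_t size : caseSizesInBytes)
    {
        offsets.push_back(next);
        next += size;
    }
    return offsets;
}
```

With a fixed stride the dispatch needs no data table at all, which is the code-size saving mentioned above; the label table only pays off when case sizes differ, as they can on x86.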

sdmaclea · 2018-01-29T23:42:39Z

+
+    int lanes = emitTypeSize(simdType) / baseTypeSize;
+
+    auto emitSwCase = [&](int lane) {


Using a lambda to populate instructions in the switch table

sdmaclea · 2018-01-29T23:44:48Z

+    {
+        int lane = op2->AsIntConCommon()->IconValue();
+
+        emitSwCase(lane);


Lambda used even when not generating the switch.
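
A rough sketch of the pattern in these two comments: one lambda describes how to emit the instruction for a given element, and it is called either once (constant index) or once per case (switch table). The function name `genExtractSketch`, the use of a negative index to mean "not constant", and the instruction strings are assumptions made for illustration; the real code calls into the JIT emitter instead of building strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns illustrative instruction text instead of emitting machine code.
// elementIndex < 0 stands in for "the index is not a compile-time constant".
inline std::vector<std::string> genExtractSketch(int elementIndex, int elements)
{
    std::vector<std::string> code;

    // The single place that knows how to emit the instruction for one element.
    auto emitSwCase = [&](int element) {
        assert(element >= 0 && element < elements);
        code.push_back("umov w0, v1.s[" + std::to_string(element) + "]");
    };

    if (elementIndex >= 0)
    {
        // Constant index: emit the one instruction directly.
        emitSwCase(elementIndex);
    }
    else
    {
        // Non-constant index: emit one case per possible element, each
        // followed by a branch to the common exit label.
        for (int i = 0; i < elements; ++i)
        {
            emitSwCase(i);
            code.push_back("b done");
        }
    }
    return code;
}
```

Sharing the lambda keeps the constant and non-constant paths from drifting apart, since both go through the same emission logic.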

sdmaclea · 2018-01-29T23:48:35Z

Tested with new tests added to #16008

sdmaclea · 2018-01-30T14:34:45Z

+
+    if (op2->isContainedIntOrIImmed())
+    {
+        int lane = op2->AsIntConCommon()->IconValue();


src\jit\codegenarm64.cpp(5147): warning C4244: 'initializing': conversion from 'ssize_t' to 'int', possible loss of data [D:\j\workspace\x64_checked_w---76911707\bin\obj\Windows_NT.x64.Checked\src\jit\protononjit\protononjit.vcxproj]
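
The warning above flags an implicit narrowing from a pointer-sized integer to `int`. A common remedy, which is not necessarily the one applied in this PR, is an explicit cast guarded by a range check. In this sketch `checkedNarrowToInt` is a hypothetical helper and `std::ptrdiff_t` stands in for the JIT's `ssize_t`.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Narrow a pointer-sized signed value to int, asserting that nothing is lost.
// std::ptrdiff_t is used here as a portable stand-in for ssize_t.
inline int checkedNarrowToInt(std::ptrdiff_t value)
{
    assert(value >= INT_MIN && value <= INT_MAX);
    return static_cast<int>(value);
}
```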

sdmaclea · 2018-01-30T23:49:24Z

Added Simd.BitwiseSelect() support.
Added #16097 & #16102 to avoid merge conflicts

sdmaclea · 2018-01-31T16:57:54Z

Simple rebase to fix merge conflict.

sdmaclea · 2018-01-31T19:02:07Z

@CarolEidt ping. Can this be reviewed/merged?

CarolEidt

I would like to see an assert verifying the single-instruction requirement, and function headers are missing from even some of the pre-existing methods.

CarolEidt · 2018-01-31T19:05:24Z

    genProduceReg(node);
 }

+template <typename HWIntrinsicSwitchCaseBody>


This needs a function header. See https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#94-function-header-comment

I would in particular describe the scenario this supports, as it may be confusing at first why we have a HW intrinsic switch.

CarolEidt · 2018-01-31T19:11:54Z

+        inst_JMP(EJ_jmp, labelBreakTarget);
+    }
+    genDefineTempLabel(labelBreakTarget);
+}


It seems that for Arm64 this approach is best. If it were found to be necessary, it could be parameterized based on the instruction (though I don't see us needing this for anything that would require more than a single instruction per case).

CarolEidt · 2018-01-31T19:24:41Z

+    genDefineTempLabel(labelFirst);
+    for (int i = 0; i < swMax; ++i)
+    {
+        emitSwCase(i);


Since you are assuming that this generates a single instruction, you might do something like:

    // This code assumes that emitSwCase() generates a single instruction.
    unsigned prevInsCount = getEmitter()->emitInsCount;
    for (int i = 0; i < swMax; ++i)
    {
        emitSwCase(i);
        unsigned newInsCount = getEmitter()->emitInsCount;
        assert(newInsCount == (prevInsCount + 1));
        prevInsCount = newInsCount;
    }

I think that emitInsCount is public.
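
The suggested check can be tried in isolation with a stand-in emitter. Everything here is hypothetical scaffolding (`MockEmitter`, `emitsOneInsPerCase`); only the counter comparison mirrors the suggestion above, and it returns a `bool` rather than asserting so that both outcomes can be observed.

```cpp
#include <cassert>

// Hypothetical stand-in for the JIT emitter; only the counter matters here.
struct MockEmitter
{
    unsigned emitInsCount = 0;
    void     emitIns() { ++emitInsCount; }
};

// Runs swMax cases and reports whether each one added exactly one instruction.
template <typename SwitchCaseBody>
bool emitsOneInsPerCase(MockEmitter& emitter, int swMax, SwitchCaseBody emitSwCase)
{
    assert(swMax >= 0);
    unsigned prevInsCount = emitter.emitInsCount;
    for (int i = 0; i < swMax; ++i)
    {
        emitSwCase(i);
        unsigned newInsCount = emitter.emitInsCount;
        if (newInsCount != prevInsCount + 1)
        {
            return false; // a case emitted zero or several instructions
        }
        prevInsCount = newInsCount;
    }
    return true;
}
```

A case body that emits two instructions would break the fixed 8-byte case spacing, which is exactly what the check is meant to catch early.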

CarolEidt · 2018-01-31T19:26:19Z

+    int lanes = emitTypeSize(simdType) / baseTypeSize;
+
+    auto emitSwCase = [&](int lane) {
+        assert(lane >= 0);


To me, lane is not very mnemonic, though others may disagree. Something like caseImmediate or caseImm?

@CarolEidt
The immediate is the vector element lane for this instruction.

caseImm ... feels too generic.

lane and lanes could become:

element & elements

vectorElement & vectorElements

vectorIndex & vectorLength

Preference?

Right - I hadn't really internalized that this is the specific case (not the general case as in genHWIntrinsicSwitchTable()). I think element and elements would be good.
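
For reference, the quantity being named is the vector size divided by the element size, and the immediate must fall inside that range. A small sketch with hypothetical helper names (`elementCount`, `isValidElement`):

```cpp
#include <cassert>

// Number of elements in a SIMD vector: total bytes divided by element bytes.
inline int elementCount(int simdSizeInBytes, int baseTypeSizeInBytes)
{
    assert(baseTypeSizeInBytes > 0);
    assert(simdSizeInBytes % baseTypeSizeInBytes == 0);
    return simdSizeInBytes / baseTypeSizeInBytes;
}

// An element index is valid when it addresses one of those elements.
inline bool isValidElement(int element, int elements)
{
    return (element >= 0) && (element < elements);
}
```

For example, a 16-byte vector of 4-byte elements has four elements, so valid indices are 0 through 3.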

CarolEidt · 2018-01-31T19:27:26Z

+    bool     is16Byte = (node->gtSIMDSize > 8);
+    emitAttr attr     = is16Byte ? EA_16BYTE : EA_8BYTE;
+
+    // Arm64 has three bit select forms each use three source registers


nit: I would add a ';' after "forms":

// Arm64 has three bit select forms; each uses three source registers
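
The three forms referred to in that comment are the Arm64 `BSL`, `BIT`, and `BIF` instructions. The scalar model below shows what each computes per bit; it is a sketch of the instruction semantics on a 64-bit value, not the JIT codegen, and the function names simply reuse the mnemonics.

```cpp
#include <cassert>
#include <cstdint>

// BSL (bitwise select): the destination holds the mask on entry.
// Result takes bits of n where the mask is 1, bits of m where it is 0.
inline std::uint64_t bsl(std::uint64_t dMask, std::uint64_t n, std::uint64_t m)
{
    return (dMask & n) | (~dMask & m);
}

// BIT (bitwise insert if true): insert bits of n into d where m is 1.
inline std::uint64_t bit(std::uint64_t d, std::uint64_t n, std::uint64_t m)
{
    return (d & ~m) | (n & m);
}

// BIF (bitwise insert if false): insert bits of n into d where m is 0.
inline std::uint64_t bif(std::uint64_t d, std::uint64_t n, std::uint64_t m)
{
    return (d & m) | (n & ~m);
}
```

All three read the destination register as well as two sources, which is why each form involves three source registers; they differ only in which register plays the role of the mask.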

sdmaclea · 2018-01-31T23:13:22Z

@CarolEidt @tannergooding

Adding Unsigned compare zero lowering. Unsigned compare zero tests are now all passing.
Added Simd.SetAllVector* support. Tests are passing.
Added/fixed comments per @CarolEidt request
Added asserts per @CarolEidt request
Renamed lane to element per @CarolEidt request.

PTAL

tannergooding · 2018-01-31T23:24:12Z

+//------------------------------------------------------------------------
+// genHWIntrinsicSimdBinaryOp:
+//
+// Produce code for a GT_HWIntrinsic node with form SimdBinaryOp.


Does the SIMD size matter?

Was just wanting to make sure since there will be Vector64 and Vector128 types, and eventually Vector256+ (if SVE is supported).

All bets are off on SVE

tannergooding · 2018-01-31T23:24:59Z

+// need to generate functionally correct code when the operand is not constant
+//
+// This is required by the HW Intrinsic design to handle:
+//   debugger calls


It might be better to list this as: to handle indirect calls, such as:

tannergooding · 2018-01-31T23:30:29Z

+            // op1 is the first operand
+            // op2 is the second operand
+            // op3 is the third operand
+            op3 = impSIMDPopStack(simdType);


It might be good to have a general helper method for popping/validating the types, as was requested/done for x86.

I am not sure why. Each form has different requirements. Not sure how a helper would help.

Maybe impScalarPopStack()

x86 has this method: https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsicxarch.cpp#L276

It validates that the type of a struct is a SIMD type and that the type of a scalar matches what the signature expects.

I believe @CarolEidt was the one who requested we validate the type of the scalar values as well, rather than just calling impPopStack().val

I was assuming we relied on the C# (et al) compiler to validate this.
Perhaps this is to check hand-written IL.
Or perhaps it is just defensive programming.

In any case I think this can safely be a separate PR.

> Or perhaps it is just defensive programming

Yes, that's the idea (that's true of many of the asserts and checks in the JIT, but it's surprising how often the "obvious" checks catch a problem).

> In any case I think this can safely be a separate PR.

Me too

tannergooding · 2018-01-31T23:31:14Z

    enum Flags
    {
-        None
+        None          = 0,


Prefix with HW_Flag_?

It is inside HWIntrinsicInfo so it is HWIntrinsicInfo::None
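
The point about scoping can be shown with a small sketch. The struct name `HWIntrinsicInfoSketch`, the numeric value given to `LowerCmpUZero`, and the `hasFlag` helper are assumptions for illustration; only `None = 0` and the `LowerCmpUZero` flag name appear in the PR.

```cpp
#include <cassert>

// An unscoped enum nested in a struct is already qualified by the struct
// name at use sites, so an extra HW_Flag_ prefix would be redundant.
struct HWIntrinsicInfoSketch
{
    enum Flags
    {
        None          = 0,
        LowerCmpUZero = 1, // hypothetical value, chosen for this sketch
    };

    int flags;

    bool hasFlag(Flags flag) const
    {
        return (flags & flag) != 0;
    }
};
```

Call sites then read `HWIntrinsicInfoSketch::None` or `HWIntrinsicInfoSketch::LowerCmpUZero`, which carries the same information a prefix would.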

tannergooding · 2018-01-31T23:33:25Z

+    GenTree*        op1     = intrinsicTree->gtOp.gtOp1;
+    GenTree*        op2     = intrinsicTree->gtOp.gtOp2;
+
+    if (op1->OperIs(GT_LIST))


OperIsList()?

It is an explicit helper method for checking OperIs(GT_LIST). We have a few of them and they look to be the most prevalent in the codebase (or at least the most prevalent in the code I've touched so far).

Why? OperIs() is pretty standard in Arm64 lower.

FWIW I don't think we have a guideline or even a lot of consistency on this. OperIs() is very useful for making checks for multiple oper values more concise and easier to read. For a single value, I think either is fine.

I don't have a strong preference either, I was just mostly wondering why one over the other was used here.
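
To make the comparison concrete, here is a minimal sketch of the two styles on a stand-in node type. `NodeSketch` and the `OPER_*` values are hypothetical; the variadic `OperIs` is written the way such a helper is commonly implemented, without claiming it matches the JIT's definition line for line.

```cpp
#include <cassert>

// Hypothetical operator kinds standing in for the JIT's GT_* values.
enum Oper
{
    OPER_LIST,
    OPER_ADD,
    OPER_SUB
};

struct NodeSketch
{
    Oper oper;

    // Single-value check.
    bool OperIs(Oper o) const
    {
        return oper == o;
    }

    // Multi-value check: true if the node matches any of the listed opers.
    template <typename... Rest>
    bool OperIs(Oper o, Rest... rest) const
    {
        return (oper == o) || OperIs(rest...);
    }

    // Dedicated helper, equivalent to OperIs(OPER_LIST).
    bool OperIsList() const
    {
        return OperIs(OPER_LIST);
    }
};
```

For a single value the two spellings are interchangeable; the variadic form only pays off when several opers are tested at once.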

tannergooding · 2018-01-31T23:34:44Z

+    }
+    else
+    {
+        info->srcCount += GetOperandInfo(op1);


Is ARM not going to have 0 operand nodes or just not yet?

I do not see a reason. Unless we need to support barriers.

> Unless we need to support barriers.

x86 added StoreFence, LoadFence, and MemoryFence. It also has a couple of helper methods (such as Sse.SetZeroVector128) which take 0 args.

Separate PR if needed.

Sounds fine to me.

CarolEidt

LGTM

CarolEidt · 2018-01-31T23:40:38Z

+    auto intrinsicID   = node->gtHWIntrinsicId;
+    auto intrinsicInfo = comp->getHWIntrinsicInfo(node->gtHWIntrinsicId);
+
+    if ((intrinsicInfo.flags & HWIntrinsicInfo::LowerCmpUZero) && varTypeIsUnsigned(node->gtSIMDBaseType))


This could really use a conspicuous comment above it. This is all about handling unsigned, and it's easy to miss that.

@sdmaclea - I'd love to see the additional comment, but I'm OK with merging now and you can add with another PR. Let me know.
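
The thread does not spell out the lowering itself, so the following only illustrates why unsigned comparisons against zero are a special case: for an unsigned element, some of them are constant and the rest reduce to an equality test. The enum `CmpZero` and the function `cmpZeroUnsigned` are hypothetical names, and this is a reference model of the arithmetic rather than the PR's implementation.

```cpp
#include <cassert>
#include <cstdint>

enum class CmpZero
{
    Equal,
    GreaterThan,
    GreaterThanOrEqual,
    LessThan,
    LessThanOrEqual
};

// Result of comparing an unsigned element against zero, written in the
// simplified form each comparison reduces to.
inline bool cmpZeroUnsigned(CmpZero op, std::uint32_t x)
{
    switch (op)
    {
        case CmpZero::Equal:
            return x == 0;
        case CmpZero::GreaterThan:
            return x != 0; // an unsigned value is > 0 exactly when it is nonzero
        case CmpZero::GreaterThanOrEqual:
            return true; // always true for unsigned
        case CmpZero::LessThan:
            return false; // never true for unsigned
        case CmpZero::LessThanOrEqual:
            return x == 0; // only zero is <= 0
    }
    return false;
}
```

A signed compare-against-zero instruction applied to unsigned data would give wrong answers for elements with the top bit set, which is presumably why the unsigned case needs its own handling and a conspicuous comment.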

tannergooding

This LGTM as well.

Just had a few questions about the differences between this and the x86 implementation.

sdmaclea · 2018-02-01T13:26:24Z

@CarolEidt @tannergooding Pushed a final comment patch.

Based on 340f055 test results above this could be merged once format checks pass.

sdmaclea · 2018-02-01T15:12:07Z

All checks passed. This can be merged.

tannergooding · 2018-02-01T15:14:19Z

@sdmaclea, Thanks!

sdmaclea mentioned this pull request Jan 30, 2018

Create hwintrinsic.cpp #16097

Merged

sdmaclea force-pushed the PR-ARM64-SIMD-Extract branch from 98f3450 to e1e2bb4 Compare January 30, 2018 23:36

sdmaclea changed the title ~~[Arm64] Implement Simd.Extract~~ [Arm64] Implement Simd.Extract + Jan 30, 2018

sdmaclea force-pushed the PR-ARM64-SIMD-Extract branch 2 times, most recently from 10ae965 to 0cf806a Compare January 31, 2018 16:56

CarolEidt reviewed Jan 31, 2018

sdmaclea added 5 commits January 31, 2018 18:06

[Arm64] Implement Simd.Extract

cbc7c9c

[Arm64] Implement Simd.BitwiseSelect

2a96b0f

[Arm64] Implement Simd.SetAllVector*

c940207

[Arm64] Lower Unsigned Compare Zero

5abae39

[Arm64] HWIntrinsic codegen function headers

340f055

sdmaclea force-pushed the PR-ARM64-SIMD-Extract branch from 0cf806a to 340f055 Compare January 31, 2018 23:07

tannergooding reviewed Jan 31, 2018

CarolEidt approved these changes Jan 31, 2018

tannergooding approved these changes Feb 1, 2018

[Arm64] Add Lower Compare Zero comments

b49feee

tannergooding merged commit 58d5f55 into dotnet:master Feb 1, 2018

sdmaclea deleted the PR-ARM64-SIMD-Extract branch May 24, 2018 19:21

