Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64] Implement Simd.Extract +#16085

Merged
tannergooding merged 6 commits into
dotnet:masterfrom
sdmaclea:PR-ARM64-SIMD-Extract
Feb 1, 2018
Merged

[Arm64] Implement Simd.Extract +#16085
tannergooding merged 6 commits into
dotnet:masterfrom
sdmaclea:PR-ARM64-SIMD-Extract

Conversation

@sdmaclea
Copy link
Copy Markdown

No description provided.

Copy link
Copy Markdown
Author

@sdmaclea sdmaclea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implements the first intrinsic with const immediate. It handles the non-const case by generating a switch table

@CarolEidt @RussKeldorph @tannergooding PTAL
@dotnet/arm64-contrib @dotnet/jit-contrib FYI

Comment thread src/jit/codegenarm64.cpp
inst_JMP(EJ_jmp, labelBreakTarget);
}
genDefineTempLabel(labelBreakTarget);
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This genHWIntrinsicSwitchTable is intended to be reusable for any intrinsic requiring a switch table. It is designed to work with single instruction intrinsics, so the case spacing is hard coded to two instructions (8 bytes).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hardcode it, instead of defining a label per offset, so it can be dynamically sized, if required?

That is what I did for the x86 codegen (https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsiccodegenxarch.cpp#L655).

I do like the approach to making it reusable 😄

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hardcode it, instead of defining a label per offset, so it can be dynamically sized, if required?

Arm64 instructions are fixed size. So it makes sense. Single instruction per case is the expected use case. (I would add a case instruction count parameter if needed).

Label per offset would require more complexity. Either a separate direct branch table so the target address could be calculated, or an if else compare and branch chain. Or ...

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you chose the separate direct branch table.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated code will be smaller. Since the jump table is not needed.

Certainly if a variable sized case was needed, We could add the jump table. Right now I expect homogeneous cases of single instructions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you chose the separate direct branch table

Yes, x86 instructions can potentially range in size from 1 byte to 16+ bytes (depending on prefixes, encoding, etc).

Although, within a given HWIntrinsic jump table, they will likely be fairly consistent.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that for Arm64 this approach is best. If it were found to be necessary, it could be parameterized based on the instruction (though I don't see us needing this for anything that would require more than a single instruction per case).

Comment thread src/jit/codegenarm64.cpp Outdated

int lanes = emitTypeSize(simdType) / baseTypeSize;

auto emitSwCase = [&](int lane) {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a lambda to populate instructions in the switch table

Comment thread src/jit/codegenarm64.cpp Outdated
{
int lane = op2->AsIntConCommon()->IconValue();

emitSwCase(lane);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lambda used even when not generating the switch.

@sdmaclea
Copy link
Copy Markdown
Author

Tested with new tests added to #16008

Comment thread src/jit/codegenarm64.cpp Outdated

if (op2->isContainedIntOrIImmed())
{
int lane = op2->AsIntConCommon()->IconValue();
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src\jit\codegenarm64.cpp(5147): warning C4244: 'initializing': conversion from 'ssize_t' to 'int', possible loss of data [D:\j\workspace\x64_checked_w---76911707\bin\obj\Windows_NT.x64.Checked\src\jit\protononjit\protononjit.vcxproj]

@sdmaclea sdmaclea force-pushed the PR-ARM64-SIMD-Extract branch from 98f3450 to e1e2bb4 Compare January 30, 2018 23:36
@sdmaclea sdmaclea changed the title [Arm64] Implement Simd.Extract [Arm64] Implement Simd.Extract + Jan 30, 2018
@sdmaclea
Copy link
Copy Markdown
Author

Added Simd.BitwiseSelect() support.
Added #16097 & #16102 to avoid merge conflicts

@sdmaclea sdmaclea force-pushed the PR-ARM64-SIMD-Extract branch 2 times, most recently from 10ae965 to 0cf806a Compare January 31, 2018 16:56
@sdmaclea
Copy link
Copy Markdown
Author

Simple rebase to fix merge conflict.

@sdmaclea
Copy link
Copy Markdown
Author

@CarolEidt ping. Can this be reviewed/merged?

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see an assert verifying the single-instruction requirement, and function headers are missing from even some of the pre-existing methods.

Comment thread src/jit/codegenarm64.cpp
genProduceReg(node);
}

template <typename HWIntrinsicSwitchCaseBody>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a function header. See https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-jit-coding-conventions.md#94-function-header-comment

I would in particular describe the scenario this supports, as it may be confusing at first why we have a HW intrinsic switch.

Comment thread src/jit/codegenarm64.cpp
inst_JMP(EJ_jmp, labelBreakTarget);
}
genDefineTempLabel(labelBreakTarget);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that for Arm64 this approach is best. If it were found to be necessary, it could be parameterized based on the instruction (though I don't see us needing this for anything that would require more than a single instruction per case).

Comment thread src/jit/codegenarm64.cpp
genDefineTempLabel(labelFirst);
for (int i = 0; i < swMax; ++i)
{
emitSwCase(i);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are assuming that this generates a single instruction, you might do something like:

// This code assumes that emitSwCase() generates a single instruction.
unsigned prevInsCount = getEmitter()->emitInsCount;
for (int i = 0; i < swMax; ++i)
{
    emitSwCase(i);
    newInsCount = getEmitter()->emitInsCount;
    assert(newInsCount == (prevInsCount + 1));
    prevInsCount = newInsCount;
}

I think that emitInsCount is public.

Comment thread src/jit/codegenarm64.cpp Outdated
int lanes = emitTypeSize(simdType) / baseTypeSize;

auto emitSwCase = [&](int lane) {
assert(lane >= 0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, lane is not very mnemonic, though others may disagree. Something like caseImmediate or caseImm?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt
The immediate is the vector element lane for this instruction.

caseImm ... feels too generic.

lane and lanes could become:

  • element & elements
  • vectorElement & vectorElements
  • vectorIndex & vectorLength

Preference?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - I hadn't really internalized that this is the specific case (not the general case as in genHWIntrinsicSwitchTable()). I think element and elements would be good.

Comment thread src/jit/codegenarm64.cpp Outdated
bool is16Byte = (node->gtSIMDSize > 8);
emitAttr attr = is16Byte ? EA_16BYTE : EA_8BYTE;

// Arm64 has three bit select forms each use three source registers
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would add a ';' after "forms':

// Arm64 has three bit select forms; each use three source registers

@sdmaclea sdmaclea force-pushed the PR-ARM64-SIMD-Extract branch from 0cf806a to 340f055 Compare January 31, 2018 23:07
@sdmaclea
Copy link
Copy Markdown
Author

@CarolEidt @tannergooding

Adding Unsigned compare zero lowering. Unsigned compare zero tests are now all passing.
Added Simd.SetAllVector* support. Tests are passing.
Added/fixed comments per @CarolEidt request
Added asserts per @CarolEidt request
Renamed lane to element per @CarolEidt request.

PTAL

Comment thread src/jit/codegenarm64.cpp
//------------------------------------------------------------------------
// genHWIntrinsicSimdBinaryOp:
//
// Produce code for a GT_HWIntrinsic node with form SimdBinaryOp.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the SIMD size matter?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No why?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just wanting to make sure since there will be Vector64 and Vector128 types, and eventually Vector256+ (if SVE is supported).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All bets are off on SVE

Comment thread src/jit/codegenarm64.cpp
// need to generate functionally correct code when the operand is not constant
//
// This is required by the HW Intrinsic design to handle:
// debugger calls
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to list this as: to handle indirect calls, such as:

// op1 is the first operand
// op2 is the second operand
// op3 is the third operand
op3 = impSIMDPopStack(simdType);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to have a general helper method for popping/validating the types, as was requested/done for x86.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why. Each form has different requirements. Not sure how a helper would help.

Maybe impScalarPopStack()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x86 has this method: https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsicxarch.cpp#L276

Which validates the type of a struct is a SIMD Type and the type of a scalar matches what the signature expects.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @CarolEidt was the one who requested we validate the type of the scalar values as well, rather than just calling impPopStack().val

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming we relied on the C# (et al) compiler to validate this.
Perhaps this is to check hand written IL.
Or perhaps it is just defensive programming,

In any case I think this can safely be a separate PR.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps it is just defensive programming,

Yes, that's the idea (that's true of many of the asserts and checks in the JIT, but it's surprising how often the "obvious" checks catch a problem.

In any case I think this can safely be a separate PR.

Me too

enum Flags
{
None
None = 0,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix with HW_Flag_?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is inside HWIntrinsicInfo so it is HWIntrinsicInfo::None

Comment thread src/jit/lsraarm64.cpp
GenTree* op1 = intrinsicTree->gtOp.gtOp1;
GenTree* op2 = intrinsicTree->gtOp.gtOp2;

if (op1->OperIs(GT_LIST))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperIsList()?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an explicit helper method for checking OperIs(GT_LIST). We have a few of them and they look to be the most prevalent in the codebase (or at least the most prevalent in the code I've touched so far).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? OperIs() is pretty standard in Arm64 lower.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't think we have a guidelines or even a lot of consistency on this. OperIs() is very useful for making checks for multiple oper values more concise and easier to read. For a single value, I think either is fine.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference either, I was just mostly wondering why one over the other was used here.

Comment thread src/jit/lsraarm64.cpp
}
else
{
info->srcCount += GetOperandInfo(op1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ARM not going to have 0 operand nodes or just not yet?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a reason. Unless we need to support barriers.

Copy link
Copy Markdown
Member

@tannergooding tannergooding Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we need to support barriers.

x86 added StoreFence, LoadFence, and MemoryFence. It also has a couple of helper methods (such as Sse.SetZeroVector128) which takes 0 args.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate PR if needed.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me.

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread src/jit/lowerarmarch.cpp
auto intrinsicID = node->gtHWIntrinsicId;
auto intrinsicInfo = comp->getHWIntrinsicInfo(node->gtHWIntrinsicId);

if ((intrinsicInfo.flags & HWIntrinsicInfo::LowerCmpUZero) && varTypeIsUnsigned(node->gtSIMDBaseType))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could really use a conspicuous comment above it. This is all about handling unsigned, and it's easy to miss that.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sdmaclea - I'd love to see the additional comment, but I'm OK with merging now and you can add with another PR. Let me know.

Copy link
Copy Markdown
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM as well.

Just had a few questions about the differences between this and the x86 implementation.

@sdmaclea
Copy link
Copy Markdown
Author

sdmaclea commented Feb 1, 2018

@CarolEidt @tannergooding Pushed a final comment patch.

Based on 340f055 test results above this could be merged once format checks pass.

@sdmaclea
Copy link
Copy Markdown
Author

sdmaclea commented Feb 1, 2018

All checks passed. This can be merged.

@tannergooding tannergooding merged commit 58d5f55 into dotnet:master Feb 1, 2018
@tannergooding
Copy link
Copy Markdown
Member

@sdmaclea, Thanks!

@sdmaclea sdmaclea deleted the PR-ARM64-SIMD-Extract branch May 24, 2018 19:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants