Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64] Initial HW intrinsic framework#15833

Merged
CarolEidt merged 1 commit into
dotnet:masterfrom
sdmaclea:PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK
Jan 29, 2018
Merged

[Arm64] Initial HW intrinsic framework#15833
CarolEidt merged 1 commit into
dotnet:masterfrom
sdmaclea:PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK

Conversation

@sdmaclea
Copy link
Copy Markdown

This is the basic framework for a table driven ARM64 HW Intrinsic framework

It has only been tested for the SIMD add case, but it does work end to end

@CarolEidt @tannergooding @4creators
cc/ @fiigii @dotnet/jit-contrib

Comment thread src/jit/compiler.cpp
Vector64IntHandle = nullptr;
Vector64ShortHandle = nullptr;
Vector64ByteHandle = nullptr;
Vector64LongHandle = nullptr;
Copy link
Copy Markdown
Author

@sdmaclea sdmaclea Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the dimunitive cases where Vector64<T> == T

Comment thread src/jit/compiler.h
#endif // _TARGET_XARCH_
#ifdef _TARGET_ARM64_
NamedIntrinsic lookupHWIntrinsic(const char* className, const char* methodName);
GenTree* impHWIntrinsic(NamedIntrinsic intrinsic,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice for X86 to use this too, but I left that for a separate PR.

Comment thread src/jit/compiler.cpp
if (jitFlags.IsSet(JitFlags::flag)) \
opts.setSupportedISA(InstructionSet_##isa);
#include "hwintrinsiclistArm64.h"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also in #15798, here to allow compilation.

return retNode;
}

CORINFO_CLASS_HANDLE Compiler::gtGetStructHandleForHWSIMD(var_types simdType, var_types simdBaseType)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be merged with X86's copy and moved to a shared file. Saved for later

// Return Value:
// a gtNewMustThrowException if mustExpand is true; otherwise, nullptr
//
GenTree* Compiler::impUnsupportedHWIntrinsic(unsigned helper,
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an exact copy of X86's.
I think this could be merged with X86's copy and moved to a shared file. Saved for later

Comment thread src/jit/compiler.h
CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig,
bool mustExpand);
GenTree* impUnsupportedHWIntrinsic(unsigned helper,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature looks identical between the two, can we share it?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should move to a shared file, but I was goiing to make that a separate PR. So it could be reviewed faster.

Comment thread src/jit/importer.cpp
if ((namespaceName != nullptr) && strcmp(namespaceName, "System.Runtime.Intrinsics.Arm.Arm64") == 0)
{
result = lookupHWIntrinsic(className, methodName);
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could agree on a common interface here. Move namespace name inside.

lookupHWIntrinsic(namespaceName, className, methodName)

This allows IsSupported and Platform not supported exceptions to be handled in JIT.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be converted into switch this way as well

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ex. start with System.Runtime.Intrinsics, switch on X86, Arm, and than next switch statements if possible,

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we should eventually be using an unordered/hash dictionary lookup

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is one preference - to get it as fast as possible. @tannergooding and @fiigii were thinking about dictionary or switch, whichever would better work here. From this what I see it could be split into couple of switches or as you suggest lookup could be dictionary based.

Comment thread src/jit/compiler.h
#endif // _TARGET_XARCH_
#ifdef _TARGET_ARM64_
NamedIntrinsic lookupHWIntrinsic(const char* className, const char* methodName);
GenTree* impHWIntrinsic(NamedIntrinsic intrinsic,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this one, identical other than the name. Would be good to share if we can.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had initially renamed the X86 version, but I had too many merge conflicts. So I dropped it.

Comment thread src/jit/importer.cpp
impInlineInfo->retExprClassHndIsExact = false;
}
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a merge conflict resolve error

Comment thread src/jit/importer.cpp Outdated
(unsigned)CHECK_SPILL_ALL);

GenTree* tmpOp2 = gtNewLclvNode(lvaInlineeReturnSpillTemp, op2->TypeGet());
GenTreePtr tmpOp2 = gtNewLclvNode(lvaInlineeReturnSpillTemp, op2->TypeGet());
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another merge conflict resolve error

@sdmaclea sdmaclea force-pushed the PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK branch from 3996b95 to 8384c3c Compare January 12, 2018 00:43
Comment thread src/jit/codegenarm64.cpp Outdated
instruction ins = getOpForHWIntrinsic(node, baseType);
assert(ins != INS_invalid);

bool is16B = (node->gtSIMDSize > 8);
Copy link
Copy Markdown

@4creators 4creators Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: name is16B could be confusing - B may mean bit or byte, it would be better to use is16Byte

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread src/jit/codegenarm64.cpp Outdated
instruction ins = getOpForHWIntrinsic(node, baseType);
assert(ins != INS_invalid);

bool is16B = (node->gtSIMDSize > 8);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@sdmaclea sdmaclea force-pushed the PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK branch from 8384c3c to 5d42803 Compare January 15, 2018 16:43
Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall; a couple of minor things and I think the IL implementation needs to check IsSupported

Comment thread src/jit/codegenarm64.cpp
instruction ins = getOpForHWIntrinsic(node, baseType);
assert(ins != INS_invalid);

bool is16Byte = (node->gtSIMDSize > 8);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why this wouldn't be node->gtSIMDSize == 16?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was drafted from the SIMD bin op code where 12 Byte vectors are treated as 16 byte. It is probably not needed here. Let me know if you want me to remove.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; I was just curious. Thanks

Comment thread src/jit/hwintrinsicArm64.h Outdated
{
enum Form
{
IsSupported,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these should have a short comment describing them. Also, I'm not sure what the Unsupported value of Form would be used for.

/// Vector add
/// Corresponds to vector forms of ARM64 ADD & FADD
/// </summary>
public static Vector128<byte> Add(Vector128<byte> left, Vector128<byte> right) => Add(left, right);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that these should first check IsSupported and throw PNSE if not - then do the recursive call.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt PlatformNotSupportedException is handled in the JIT to avoid replicating the code into every intrinsic.

Same with type checking for generics.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PlatformNotSupportedException is handled in the JIT to avoid replicating the code into every intrinsic.

The issue is that we would like to support the scenario where the JIT doesn't have any support for intrinsics. In that case, the existing code would eventually get a stack overflow as it continues to recurse.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will require more discussion. I think this implementation is consistent with current X86 implementation.

cc/ @tannergooding @4creators @fiigii

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is consistent with the current implementation, but not what is currently proposed in dotnet/designs#28

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we would like to support the scenario where the JIT doesn't have any support for intrinsics. In that case, the existing code would eventually get a stack overflow as it continues to recurse.

Currently, this is handled by XXX.PlatformNotSupported.cs files that are used by unsupported platform build. For example, x64/x86 should build with Arm64/Simd.PlatformNotSupported.cs rather than Arm64/Simd.cs .

Copy link
Copy Markdown
Author

@sdmaclea sdmaclea Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt You own the design, but I thought that proposal was still in dispute.

So if we proceesd. I would add a function to every class

public static void ThrowIfUnsupported() 
{ 
    if (IsSupported == false) 
    { 
        throw new PlatformNotSupportedException(); 
    }
}

and methods would become

public static Vector128<byte>   Add(Vector128<byte>   left, Vector128<byte>   right) 
{ 
    ThrowIfUnsupported(); 
    return Add(left, right); 
}

This looks like it is simple enough

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course IsSupported still wouldn't work on platfoms without JIT support.

We could then get rid of the redundant C# files like this

#if ARM64
private static bool _IsSupported { get => _IsSupported; }
public static bool IsSupported { get => JITSupportsIntrinsics() && _IsSupported; }
#else
public static bool IsSupported { get => false; }
#endif

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And of course if the JIT was Intrinsic namespace aware it could easily support eliminating #if

private static bool _IsSupported { get => _IsSupported; }
public static bool IsSupported { get => JITSupportsIntrinsics() && _IsSupported; }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiigii @sdmaclea - thanks for the clarifications. I haven't yet merged the design doc, and it seems there are further revisions needed.

If the XXX.PlatformNotSupported.cs approach is used, then it seems that we don't need to defend against the scenario I was describing. It would be good, though, if we had a way to catch the case where an expected intrinsics is not recognized, aside from stack overflow. Clearly that would be a bug, but the failure mode is pretty bad.

For the time being, I'l good with leaving these as-is.

static const HWIntrinsicInfo hwIntrinsicInfoArray[] = {
{NI_ARM64_IsSupported_True, "get_IsSupported", IsaFlag::EveryISA, HWIntrinsicInfo::IsSupported, HWIntrinsicInfo::None, {}},
{NI_ARM64_IsSupported_False, "::NI_ARM64_IsSupported_False", IsaFlag::EveryISA, HWIntrinsicInfo::IsSupported, HWIntrinsicInfo::None, {}},
{NI_ARM64_PlatformNotSupported, "::NI_ARM64_PlatformNotSupported", IsaFlag::EveryISA, HWIntrinsicInfo::Unsupported, HWIntrinsicInfo::None, {}},
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt This is the use of the Unsupported category.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I get it; thanks. This is for intrinsics that are for other platforms. A comment to that effect (where enum Form is declared) would be useful.

Copy link
Copy Markdown
Author

@sdmaclea sdmaclea Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is for intrinsics which are not supported on the running platform. So if the class reports IsSupported == false, all intrinsics of the class will return this intrinsic. This would include unsupported extensions, or JIT ISA flags which were disabled. It could also include other targets, but it does not yet.

result = (hwIntrinsicInfoArray[i].intrinsicID == NI_ARM64_IsSupported_True)
? NI_ARM64_PlatformNotSupported
: NI_ARM64_IsSupported_False;
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt This is the logic to handle the unsupported classes.

The result = (hwIntrinsicInfoArray[i].intrinsicID == NI_ARM64_IsSupported_True) line is incorrect.
It should be result = (hwIntrinsicInfoArray[i].intrinsicID != NI_ARM64_IsSupported_True)

It probably warrants a few comments

return gtNewIconNode((intrinsic == NI_ARM64_IsSupported_True) ? 1 : 0);

case HWIntrinsicInfo::Unsupported:
return impUnsupportedHWIntrinsic(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, method, sig, mustExpand);
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt Here is the other part of the logic to handle unsupported.

Comment thread src/jit/hwintrinsiclistArm64.h Outdated
HARDWARE_INTRINSIC(NI_ARM64_SIMD_BitwiseAndNot, Simd, AndNot, SimdBinaryOp, INS_bic, INS_bic, INS_bic, None )
HARDWARE_INTRINSIC(NI_ARM64_SIMD_BitwiseOr, Simd, Or, SimdBinaryOp, INS_orr, INS_orr, INS_orr, None )
HARDWARE_INTRINSIC(NI_ARM64_SIMD_BitwiseOrNot, Simd, OrNot, SimdBinaryOp, INS_orn, INS_orn, INS_orn, None )
HARDWARE_INTRINSIC(NI_ARM64_SIMD_BitwiseNot, Simd, Not, SimdBinaryOp, INS_not, INS_not, INS_not, None )
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not should be a SimdUnaryOp

@sdmaclea sdmaclea force-pushed the PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK branch 3 times, most recently from 4201eb2 to ddaf4ec Compare January 25, 2018 03:58
@sdmaclea
Copy link
Copy Markdown
Author

@CarolEidt I made the changes you suggested.

Tested many intrinsics using dotnet/corefx#26580 with #16008 . All implemented tests except Vector64 tests are passing.

Vector64 is still a work in progress. It has issues with return. For instance return Simd.Add(), the intrinsic is imported to be TYP_SIMD8. Because Short Vector ABI is not implemented, it falls back to return struct Vector64 as TYP_LONG. JIT to fixup the return type changes the HW intrinsic gtType to TYP_LONG. This breaks LSRA causing it to fail to assign a vector target register.

Vector128 tests are passing, but it is also not using the Short Vector ABI, so generated arg/return code is non-compliant and ugly.

I think Short Vector ABI support can be in separate PR. It will likely be complicated.

@sdmaclea sdmaclea force-pushed the PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK branch from ddaf4ec to 0198a07 Compare January 26, 2018 18:31
Copy link
Copy Markdown
Author

@sdmaclea sdmaclea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All enabled #16008 tests are now passing.

Vector64<T> tests were fixed w/o adding ShortVector ABI support. ShortVector ABI is separate issue #16021. I will not be working on that immediately.

CompareLessThanZero will fail when T is unsigned. These tests are commented out in #16008. I will work on that in a subsequent PR.

Insert, Extract, SetAllVector*, and BitwiseSelect are NYI. I will work on that in a subsequent PR.

@CarolEidt As I mentioned previously, I believe all your issues have been addressed. PTAL

Comment thread src/jit/importer.cpp
op = gtNewScalarHWIntrinsicNode(info.compRetNativeType, op, NI_ARM64_NONE_MOV);
}
}
#endif
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this workaround to make Vector64<T> work.

@sdmaclea sdmaclea force-pushed the PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK branch from 0198a07 to 27173f5 Compare January 26, 2018 22:30
@sdmaclea
Copy link
Copy Markdown
Author

test this please

@sdmaclea
Copy link
Copy Markdown
Author

@CarolEidt Ping

@sdmaclea
Copy link
Copy Markdown
Author

timeout
test Tizen armel Cross Checked Innerloop Build and Test

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@CarolEidt CarolEidt merged commit 19c4e7b into dotnet:master Jan 29, 2018
@sdmaclea sdmaclea deleted the PR-ARM64-INITIAL-INTRINSIC-FRAMEWORK branch May 24, 2018 19:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants