[Arm64] HW Intrinsics API #26580
8933799 to 27976a0
Needs to be rebased on #27567
27976a0 to 774683f
@eerhardt @CarolEidt @RussKeldorph @tannergooding @terrajobst @4creators @debayang Since we are moving this to an experimental package, the consensus seems to be that this can now be merged without a formal review. PTAL. This represents what is currently implemented and tested in CoreCLR. It can merge before or after #27567 renames this directory structure.
774683f to a878804
Rebased now that #27567 has merged.
Added the crypto APIs now that they have been merged into CoreCLR. This was probably premature, as the CoreCLR bits need to propagate to CoreFX first. Removing them for now. I will push them in a separate PR, since this one should merge before the CoreCLR crypto bits propagate to CoreFX.
e273b7b to a878804
Looks like these are either methods that don't exist in CoreCLR, or methods whose signatures don't match. @sdmaclea - can you fix this?
@eerhardt I think the methods were later removed from CoreCLR because the underlying instruction forms did not exist. I will revise.
a878804 to 4401ae7
public static Vector64<int> LeadingSignCount(Vector64<int> value) { throw null; }
public static Vector128<sbyte> LeadingSignCount(Vector128<sbyte> value) { throw null; }
public static Vector128<short> LeadingSignCount(Vector128<short> value) { throw null; }
public static Vector128<int> LeadingSignCount(Vector128<int> value) { throw null; }
Is there a reason why there is no Vector128<long> overload for this and the other methods that have Vector128<int>? Do ARM processors not support that type?
We seem to only have 2 of them:
public static Vector128<ulong> Abs(Vector128<long> value)
public static Vector128<long> Negate(Vector128<long> value)
Many Arm64 SIMD instructions do not support 64-bit elements. When an instruction does not support every element size, the API must spell out exactly which primitive types are supported.
If the intrinsic is generic and the vector is at least 128 bits long, long and ulong are supported.
So Add<T>, And<T>, AndNot<T> ... all support long and ulong for Vector128<T>.
Vector64<T> never supports 64-bit elements because it would not be a vector.
I'm not sure I follow.
Why do we support using long in Abs, Negate, Add, And, AndNot ..., but not in LeadingSignCount?
Because the processor explicitly doesn't support LeadingSignCount with long?
> Because the processor explicitly doesn't support LeadingSignCount with long?
The processor does support LeadingSignCount with long in the base (scalar) instruction set.
#if ARM64_HW_INTRINSIC_NYI
    public static class Base
    {
        public static bool IsSupported { get { throw null; } }
        public static uint LeadingSignCount(int value) { throw null; }
        public static ulong LeadingSignCount(long value) { throw null; }
        public static uint LeadingZeroCount(uint value) { throw null; }
        public static ulong LeadingZeroCount(ulong value) { throw null; }
    }
#endif
The processor doesn't support LeadingSignCount on SIMD elements of type long.
The Arm64 SIMD instructions smin, smax, umin, umax, mul, cls, clz ... do not support an integer element size of 64 bits. Apparently ARM decided the cost/benefit was not strong enough to include those forms in the initial SIMD ISA.
We did not expose them in the API because they would not be intrinsics, but rather a sequence of instructions.
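To make the "sequence of instructions" point concrete, here is a minimal scalar sketch of what a leading-sign-count on one 64-bit lane has to compute. The class and method names are hypothetical and are not part of the proposed API; the sketch only assumes `System.Numerics.BitOperations` from .NET Core 3.0 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of the API in this PR.
public static class LeadingSignCountSketch
{
    // Counts the consecutive bits directly below the sign bit that equal the sign bit.
    public static int LeadingSignCount(long value)
    {
        // XOR with the arithmetic right shift sets a bit wherever two
        // neighbouring bits differ, so the first set bit marks the end of the sign run.
        ulong changes = (ulong)(value ^ (value >> 1));
        // LeadingZeroCount returns 64 for zero; subtract 1 to exclude the sign bit itself.
        return BitOperations.LeadingZeroCount(changes) - 1;
    }

    // Applies the scalar helper to each lane of a software "vector" of 64-bit elements.
    public static int[] LeadingSignCount(long[] lanes)
    {
        var result = new int[lanes.Length];
        for (int i = 0; i < lanes.Length; i++)
        {
            result[i] = LeadingSignCount(lanes[i]);
        }
        return result;
    }
}
```

A vectorized version would need a shift, an XOR, a leading-zero count and a subtract per lane, which is why it would be a helper rather than a single-instruction intrinsic.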
Completely unrelated to this change: OSX fails because of an infrastructure issue.
test NETFX x86 Release Build
{
    public static bool IsSupported { get { throw null; } }
    public static Vector64<byte> Abs(Vector64<sbyte> value) { throw null; }
    public static Vector64<ushort> Abs(Vector64<short> value) { throw null; }
The API follows a pattern in Abs that converts signed integral types to unsigned ones. IMO it would be better to keep the original signed type as the return vector type, since it is very probable that users will continue using signed integrals of the same type as the argument type.
Furthermore, Abs return values will always fit in signed integrals of the same type.
This is what is implemented in CoreCLR. I am happy to change it in CoreCLR, but I will wait for consensus first.
If we want to change Abs(), it should be removed from this PR so the other APIs can be merged.
This is entirely orthogonal to this PR, as this is what is being done for the x86 intrinsics. I think we should merge this PR as-is, and then consider whether or not to change it for all targets.
@4creators, @sdmaclea. The reason x86 does it this way is that the instructions are explicitly documented as: "Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1."
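One edge case is worth keeping in mind for this discussion: the absolute value of the most negative signed value does not fit in the signed type of the same width. The following is a minimal scalar sketch of a single sbyte lane; the class and method names are hypothetical and only model the arithmetic, not the actual intrinsic.

```csharp
using System;

// Hypothetical scalar model of one lane; not the intrinsic itself.
public static class AbsSketch
{
    // Absolute value returned as the unsigned type of the same width.
    public static byte AbsToUnsigned(sbyte value)
    {
        // The operand is promoted to int, so negating sbyte.MinValue yields 128.
        return (byte)(value < 0 ? -value : value);
    }

    // The same bit pattern reinterpreted as the signed type.
    public static sbyte AbsToSigned(sbyte value)
    {
        return unchecked((sbyte)AbsToUnsigned(value));
    }
}
```

With these helpers, `AbsToUnsigned(sbyte.MinValue)` is 128, while `AbsToSigned(sbyte.MinValue)` wraps back to -128, so only the unsigned return type represents every result.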
Looks like all the OSX jobs are currently failing.
All UWP CoreCLR x64 Debug builds are also failing. Can this be merged?
public static class Simd
{
    public static bool IsSupported { get { throw null; } }
    public static Vector64<byte> Abs(Vector64<sbyte> value) { throw null; }
nit: I don't think we are aligning the names/etc elsewhere. (CC. @eerhardt to make sure this is ok).
This style isn't mentioned in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/coding-style.md, so I wouldn't block on it.
But in general, my suggestion is to not use formatting that is going to fight with VS's auto-formatting features (CTRL+K, CTRL+D). It just causes hassle for the next editor(s) of the file.
I'll clean up whitespace in a separate PR.
test OSX x64 Debug Build
eerhardt left a comment
Just waiting on clean CI. Will merge if the only failures are unrelated to this change.
public static Vector128<T> Or<T>(Vector128<T> left, Vector128<T> right) where T : struct { throw null; }
public static Vector64<T> OrNot<T>(Vector64<T> left, Vector64<T> right) where T : struct { throw null; }
public static Vector128<T> OrNot<T>(Vector128<T> left, Vector128<T> right) where T : struct { throw null; }
public static Vector64<byte> PopCount(Vector64<byte> value) { throw null; }
Does this instruction really only support bytes and not uint or ulong?
Unfortunately, the Arm64 SIMD cnt instruction is a population count per byte.
The wider forms can be synthesized in C# once we add AddPairwise(), StaticCast() and optionally WidenLo(), i.e.
public static Vector64<ushort> PopCount(Vector64<ushort> value)
{
    // cnt counts bits per byte, so reinterpret the input as bytes first.
    Vector64<byte> popCountPerByte = PopCount(StaticCast<byte>(value));
    // Sum adjacent byte counts; the sums land in the low half of the result.
    Vector64<byte> popCountPerElement = AddPairwise(popCountPerByte, SetAllVector64(0));
    return WidenLo(popCountPerElement);
}
public static Vector64<uint> PopCount(Vector64<uint> value)
{
    Vector64<byte> popCountPerByte = PopCount(StaticCast<byte>(value));
    // Two pairwise adds: bytes to 16-bit counts, then to 32-bit counts.
    Vector64<byte> popCountPerShort = AddPairwise(popCountPerByte, SetAllVector64(0));
    Vector64<byte> popCountPerElement = AddPairwise(popCountPerShort, SetAllVector64(0));
    return WidenLo(WidenLo(popCountPerElement));
}
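The reduction above can be checked with a scalar model that uses plain arrays in place of vectors. This is only a sketch: the class and method names are hypothetical, the zero-operand pairwise add is modelled by leaving the upper half of the result zero, and `BitOperations.PopCount` assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Numerics;

// Hypothetical array-based model of the synthesis; not the proposed intrinsics.
public static class PopCountSketch
{
    // Models cnt: one population count per byte lane.
    public static byte[] PopCountPerByte(byte[] lanes)
    {
        var counts = new byte[lanes.Length];
        for (int i = 0; i < lanes.Length; i++)
        {
            counts[i] = (byte)BitOperations.PopCount((uint)lanes[i]);
        }
        return counts;
    }

    // Models a pairwise add whose second operand is zero:
    // adjacent lanes are summed into the low half, the upper half stays zero.
    public static byte[] AddPairwiseWithZero(byte[] lanes)
    {
        var result = new byte[lanes.Length];
        for (int i = 0; i < lanes.Length / 2; i++)
        {
            result[i] = (byte)(lanes[2 * i] + lanes[2 * i + 1]);
        }
        return result;
    }

    // Population count per 32-bit element, built from the two helpers above.
    public static uint[] PopCountPerUInt32(uint[] elements)
    {
        // Split every element into its four bytes (the "static cast" step).
        var bytes = new byte[elements.Length * 4];
        for (int i = 0; i < elements.Length; i++)
        {
            for (int b = 0; b < 4; b++)
            {
                bytes[4 * i + b] = (byte)(elements[i] >> (8 * b));
            }
        }

        byte[] perByte = PopCountPerByte(bytes);
        byte[] perShort = AddPairwiseWithZero(perByte);
        byte[] perElement = AddPairwiseWithZero(perShort);

        // Widen the low lanes back to the element type.
        var result = new uint[elements.Length];
        for (int i = 0; i < elements.Length; i++)
        {
            result[i] = perElement[i];
        }
        return result;
    }
}
```

The byte lanes cannot overflow during the reduction, because the largest possible count for a 32-bit element is 32.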
We could eventually add (or revise to)
Vector64<byte> PopCountPerByte<T>(Vector64<T> value);
It would eliminate the need for the static casts.
Thanks @sdmaclea!