[Arm64] HW Intrinsics API by sdmaclea · Pull Request #26580 · dotnet/corefx

sdmaclea · 2018-01-25T01:37:19Z

No description provided.

sdmaclea · 2018-03-01T03:07:02Z

Needs to be rebased on #27567

sdmaclea · 2018-03-01T16:29:10Z

@eerhardt @CarolEidt @RussKeldorph @tannergooding @terrajobst @4creators @debayang

Since we are moving this to an experimental package, it seems the consensus is that this can now be merged without a formal review.

PTAL. This represents what is currently implemented and tested in CoreCLR.

This can merge before or after #27567 renames this directory structure.

sdmaclea · 2018-03-01T18:24:14Z

Rebased now that #27567 merged

sdmaclea · 2018-03-01T18:51:08Z

Added crypto APIs now that they were merged into CoreCLR. This was probably premature, as the CoreCLR bits need to propagate to CoreFX first.

Removing for now. Will push to a separate PR, as this should merge before the CoreCLR crypto bits propagate to CoreFX.

eerhardt · 2018-03-01T19:46:06Z

11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(56,5): error : MembersMustExist : Member 'System.Runtime.Intrinsics.Arm.Arm64.Simd.LeadingSignCount(System.Runtime.Intrinsics.Vector128<System.Int64>)' does not exist in the implementation but it does exist in the contract. [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]
11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(56,5): error : MembersMustExist : Member 'System.Runtime.Intrinsics.Arm.Arm64.Simd.LeadingZeroCount(System.Runtime.Intrinsics.Vector128<System.Int64>)' does not exist in the implementation but it does exist in the contract. [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]
11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(56,5): error : MembersMustExist : Member 'System.Runtime.Intrinsics.Arm.Arm64.Simd.LeadingZeroCount(System.Runtime.Intrinsics.Vector128<System.UInt64>)' does not exist in the implementation but it does exist in the contract. [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]
11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(70,5): error : ApiCompat failed for '/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/bin/Unix.AnyCPU.Release/System.Runtime.Intrinsics.Experimental/netcoreapp/System.Runtime.Intrinsics.Experimental.dll' [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]

Looks like either methods that don't exist in coreclr, or the signatures don't match. @sdmaclea - can you fix this?

sdmaclea · 2018-03-01T19:49:44Z

@eerhardt I think the methods were later removed from CoreCLR because the underlying instruction forms do not exist. I will revise.

eerhardt · 2018-03-01T20:34:43Z

+        public static Vector64<int>    LeadingSignCount(Vector64<int>    value) { throw null; }
+        public static Vector128<sbyte> LeadingSignCount(Vector128<sbyte> value) { throw null; }
+        public static Vector128<short> LeadingSignCount(Vector128<short> value) { throw null; }
+        public static Vector128<int>   LeadingSignCount(Vector128<int>   value) { throw null; }


Is there a reason why there is no Vector128<long> overload for this and the other methods that have Vector128<int>? Do ARM processors not support that type?

We seem to only have 2 of them:

public static Vector128<ulong> Abs(Vector128<long> value)
public static Vector128<long> Negate(Vector128<long> value)

Many Arm64 SIMD instructions do not support 64-bit elements. Where that is the case, the API must spell out exactly which primitives are supported.

So if the intrinsic is generic and the vector is at least 128 bits long, long and ulong are supported.

So Add<T>, And<T>, AndNot<T> ... all support long & ulong for Vector128<T>.

Vector64<T> never supports 64-bit elements because a single element would not be a vector.

I'm not sure I follow.

Why do we support using long in Abs, Negate , Add, And, AndNot ...., but we don't support LeadingSignCount with long?

Because the processor explicitly doesn't support LeadingSignCount with long?

Because the processor explicitly doesn't support LeadingSignCount with long?

The processor does support LeadingSignCount with long in the base instruction set.

#if ARM64_HW_INTRINSIC_NYI
    public static class Base
    {
        public static bool IsSupported { get { throw null; } }
        public static uint LeadingSignCount(int value) { throw null; }
        public static ulong LeadingSignCount(long value) { throw null; }
        public static uint LeadingZeroCount(uint value) { throw null; }
        public static ulong LeadingZeroCount(ulong value) { throw null; }
    }
#endif

The processor doesn't support LeadingSignCount with SIMD elements of long.

The Arm64 SIMD instructions smin, smax, umin, umax, mul, cls, clz ... do not support an integer element size of 64 bits. Apparently ARM decided the cost/benefit was not strong enough to include them in the initial SIMD ISA.

We did not expose them in the API because they would not be intrinsics, but rather a sequence of instructions.
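To illustrate what "a sequence of instructions" would mean for 64-bit elements, here is a minimal scalar sketch of a software fallback. The class and method names are hypothetical and are not part of the API in this PR; a 128-bit vector of long is modeled as a plain two-element array, and the count follows the usual definition of cls (the number of bits directly below the sign bit that equal the sign bit).

```csharp
using System;

// Hypothetical helper for illustration only; not part of the proposed API.
public static class LeadingSignCountFallback
{
    // Scalar model of a 64-bit leading-sign count: the number of bits
    // directly below the sign bit that are equal to it (range 0..63).
    public static int LeadingSignCount(long value)
    {
        long sign = value >> 63;             // 0 if non-negative, -1 (all ones) if negative
        ulong diff = (ulong)(value ^ sign);  // set bits mark positions that differ from the sign bit
        int count = 0;
        // Walk down from bit 62 until the first bit that differs from the sign.
        for (int bit = 62; bit >= 0 && ((diff >> bit) & 1UL) == 0UL; bit--)
        {
            count++;
        }
        return count;
    }

    // Applies the scalar count to each lane; the array stands in for a
    // vector with 64-bit elements (an assumption made for this sketch).
    public static long[] LeadingSignCount(long[] lanes)
    {
        var result = new long[lanes.Length];
        for (int i = 0; i < lanes.Length; i++)
        {
            result[i] = LeadingSignCount(lanes[i]);
        }
        return result;
    }
}
```

A per-lane loop like this is exactly the kind of multi-instruction expansion that the comment above argues should stay out of an API meant to map one method to one instruction.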

eerhardt · 2018-03-01T21:15:00Z

NETFX is failing:

     System.Drawing.Tests.IconTests.Ctor_FilePath_Size(fileName: "48x48_multiple_entries_4bit.ico", size: {Width=16, Height=16}, expectedSize: {Width=16, Height=16}) [FAIL]
      System.Linq.Tests.ToArrayTests.ToArray_FailOnExtremelyLargeCollection [SKIP]
  === TEST EXECUTION SUMMARY ===
        Valid test but too intensive to enable even in OuterLoop
     System.Resources.ResourceManager.Tests  Total: 67, Errors: 0, Failed: 0, Skipped: 0, Time: 0.561s
  ----- end 12:49:12.85 ----- exit code 0 ----------------------------------------------------------
        System.IO.IOException : The process cannot access the file 'D:\j\workspace\windows-TGrou---2a8f9c29\bin\tests\System.Drawing.Common.Tests\netfx-Windows_NT-Release-x86\bitmaps\48x48_multiple_entries_4bit.ico' because it is being used by another process.
        Stack Trace:
             at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
             at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
             at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
             at System.Drawing.Icon..ctor(String fileName, Int32 width, Int32 height)
             at System.Drawing.Icon..ctor(String fileName, Size size)
             at System.Drawing.Tests.IconTests.Ctor_FilePath_Size(String fileName, Size size, Size expectedSize)

Completely unrelated to this change.

OSX fails for infrastructure issue.

test NETFX x86 Release Build
test OSX x64 Debug Build

4creators · 2018-03-02T00:34:27Z

+    {
+        public static bool IsSupported { get { throw null; } }
+        public static Vector64<byte>    Abs(Vector64<sbyte>   value) { throw null; }
+        public static Vector64<ushort>  Abs(Vector64<short>   value) { throw null; }


The API follows a pattern in Abs that converts signed integrals to unsigned ones. IMO it would be better to keep the original signed type as the return vector type, since users will very probably continue using signed integrals of the same type as the argument type.

Furthermore, Abs return values will always fit in signed integrals of the same type.

This is what is implemented in CoreCLR. I am happy to change in CoreCLR, but I will wait for consensus first.

If we want to change Abs(), it should be removed from this PR so the others can be merged.

This is entirely orthogonal to this PR, as this is what is being done for the x86 intrinsics. I think we should merge this PR as-is, and then consider whether or not to change it for all targets.

@4creators, @sdmaclea. The reason x86 does it this way is because the instructions explicitly document themselves as: "Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1."
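One edge case is worth keeping in mind in this discussion: the magnitude of the most negative value of a signed type (for sbyte, -128) is one larger than the type's maximum, so it fits in the unsigned type of the same width but not in the signed one. The following scalar sketch models a single lane; the names are hypothetical and it makes no claim about how CoreCLR implements the intrinsic.

```csharp
using System;

// Hypothetical scalar model of one sbyte lane; for illustration only.
public static class AbsSketch
{
    // Unsigned result: every input, including sbyte.MinValue, has a
    // representable magnitude in the range 0..128.
    public static byte AbsToUnsigned(sbyte value)
    {
        int wide = value;                         // widen so negation cannot overflow
        int magnitude = wide < 0 ? -wide : wide;  // 0..128
        return (byte)magnitude;
    }

    // Signed result: the same bit pattern reinterpreted as sbyte, so the
    // magnitude 128 wraps back around to -128.
    public static sbyte AbsWrapped(sbyte value)
    {
        return unchecked((sbyte)AbsToUnsigned(value));
    }
}
```

With these helpers, `AbsSketch.AbsToUnsigned(sbyte.MinValue)` yields 128, while `AbsSketch.AbsWrapped(sbyte.MinValue)` yields -128. For comparison, `Math.Abs(sbyte.MinValue)` throws an OverflowException for the same reason.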

CarolEidt

LGTM

CarolEidt · 2018-03-02T05:33:36Z

+    {
+        public static bool IsSupported { get { throw null; } }
+        public static Vector64<byte>    Abs(Vector64<sbyte>   value) { throw null; }
+        public static Vector64<ushort>  Abs(Vector64<short>   value) { throw null; }


This is entirely orthogonal to this PR, as this is what is being done for the x86 intrinsics. I think we should merge this PR as-is, and then consider whether or not to change it for all targets.

sdmaclea · 2018-03-02T06:23:20Z

Looks like all the OSX jobs are currently failing.

sdmaclea · 2018-03-02T17:30:45Z

All UWP CoreCLR x64 Debug Build jobs are also failing.

Can this be merged?

tannergooding · 2018-03-02T17:43:40Z

+    public static class Simd
+    {
+        public static bool IsSupported { get { throw null; } }
+        public static Vector64<byte>    Abs(Vector64<sbyte>   value) { throw null; }


nit: I don't think we are aligning the names/etc elsewhere. (CC. @eerhardt to make sure this is ok).

This style isn't mentioned in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/coding-style.md, so I wouldn't block on it.

But in general, my suggestion is to not use formatting that is going to fight with VS's auto-formatting features (CTRL+K, CTRL+D). It just causes hassle for the next editor(s) of the file.

I'll clean up whitespace in a separate PR.

eerhardt · 2018-03-02T18:21:35Z

test OSX x64 Debug Build
test UWP CoreCLR x64 Debug Build

eerhardt

Just waiting on clean CI. Will merge if the only failures are unrelated to this change.

eerhardt · 2018-03-02T18:32:45Z

+        public static Vector128<T> Or<T>(Vector128<T> left, Vector128<T> right) where T : struct { throw null; }
+        public static Vector64<T>  OrNot<T>(Vector64<T>  left, Vector64<T>  right) where T : struct { throw null; }
+        public static Vector128<T> OrNot<T>(Vector128<T> left, Vector128<T> right) where T : struct { throw null; }
+        public static Vector64<byte>    PopCount(Vector64<byte>    value) { throw null; }


Does this instruction really only support bytes and not uint or ulong?

Unfortunately, the Arm64 SIMD cnt instruction is population count per byte.

The wider forms can be synthesized in C# when we add AddPairwise(), StaticCast() and optionally WidenLo()

i.e.

public Vector64<ushort> PopCount(Vector64<ushort> value)
{
    Vector64<byte> popCountPerByte = PopCount(StaticCast<byte>(value));
    Vector64<byte> popCountPerElement = AddPairwise(popCountPerByte, SetAllVector64(0));
    return WidenLo(popCountPerElement);
}

public Vector64<uint> PopCount(Vector64<uint> value)
{
    // PopCount only accepts byte vectors, so the input is cast to byte here as well.
    Vector64<byte> popCountPerByte = PopCount(StaticCast<byte>(value));
    Vector64<byte> popCountPerShort = AddPairwise(popCountPerByte, SetAllVector64(0));
    Vector64<byte> popCountPerElement = AddPairwise(popCountPerShort, SetAllVector64(0));
    return WidenLo(WidenLo(popCountPerElement));
}

We could eventually add (or revise to)
Vector64<byte> PopCountPerByte<T>(Vector64<T> value);
It would eliminate the need for the static casts.
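The arithmetic behind this synthesis can be checked with a scalar model that runs on any platform: count bits per byte, then add adjacent counts pairwise until one count per element remains. This is only a sketch under that assumption; the class and method names are hypothetical, and no vector API is used.

```csharp
using System;

// Hypothetical scalar model of the per-byte count plus pairwise-add synthesis.
public static class PopCountSketch
{
    // Models one byte lane of a per-byte population count.
    public static byte PopCountByte(byte value)
    {
        int count = 0;
        // Clearing the lowest set bit each iteration counts the set bits.
        for (int v = value; v != 0; v &= v - 1)
        {
            count++;
        }
        return (byte)count;
    }

    // One 16-bit element: count each byte, then add the adjacent pair.
    public static ushort PopCount16(ushort value)
    {
        byte lo = PopCountByte((byte)(value & 0xFF));
        byte hi = PopCountByte((byte)(value >> 8));
        return (ushort)(lo + hi);
    }

    // One 32-bit element: a second pairwise add combines the two 16-bit counts.
    public static uint PopCount32(uint value)
    {
        ushort lo = PopCount16((ushort)(value & 0xFFFF));
        ushort hi = PopCount16((ushort)(value >> 16));
        return (uint)(lo + hi);
    }
}
```

Each extra doubling of the element width costs one more pairwise add, which matches the one AddPairwise step in the ushort overload and the two steps in the uint overload above.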

eerhardt · 2018-03-02T19:40:57Z

Thanks @sdmaclea !

sdmaclea mentioned this pull request Jan 25, 2018

[Arm64] Initial HW intrinsic framework dotnet/coreclr#15833

Merged

karelz assigned sdmaclea, CarolEidt and eerhardt Jan 25, 2018

karelz added the area-System.Runtime.Intrinsics label Jan 25, 2018

sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from 8933799 to 27976a0 Compare January 26, 2018 18:28

sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from 27976a0 to 774683f Compare March 1, 2018 16:22

sdmaclea changed the title ~~WIP No Merge [Arm64] HW Intrinsics API~~ [Arm64] HW Intrinsics API Mar 1, 2018

sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from 774683f to a878804 Compare March 1, 2018 18:23

sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from e273b7b to a878804 Compare March 1, 2018 18:55

sdmaclea mentioned this pull request Mar 1, 2018

[Arm64] Add crypto intrinsics #27616

Merged

[Arm64] Add initial Simd HW intrinsics

4401ae7

sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from a878804 to 4401ae7 Compare March 1, 2018 20:21

eerhardt reviewed Mar 1, 2018

4creators reviewed Mar 2, 2018

CarolEidt approved these changes Mar 2, 2018

tannergooding reviewed Mar 2, 2018

tannergooding approved these changes Mar 2, 2018

eerhardt approved these changes Mar 2, 2018

eerhardt merged commit 4ecc7f7 into dotnet:master Mar 2, 2018

karelz added this to the 2.1.0 milestone Mar 10, 2018
