Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64] HW Intrinsics API#26580

Merged
eerhardt merged 1 commit into
dotnet:masterfrom
sdmaclea:PR-ARM64-HW-INTRINSIC-WIP
Mar 2, 2018
Merged

[Arm64] HW Intrinsics API#26580
eerhardt merged 1 commit into
dotnet:masterfrom
sdmaclea:PR-ARM64-HW-INTRINSIC-WIP

Conversation

@sdmaclea
Copy link
Copy Markdown
Contributor

No description provided.

@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 1, 2018

Needs to be rebased on #27567

@sdmaclea sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from 27976a0 to 774683f Compare March 1, 2018 16:22
@sdmaclea sdmaclea changed the title WIP No Merge [Arm64] HW Intrinsics API [Arm64] HW Intrinsics API Mar 1, 2018
@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 1, 2018

@eerhardt @CarolEidt @RussKeldorph @tannergooding @terrajobst @4creators @debayang

Since we are moving this to an experimental package, it seems the consensus is that this can now be merged without a formal review.

PTAL This represents what is currently implemented and tested in CoreCLR.

This can merge before or after #27567 renames this directory structure.

@sdmaclea sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from 774683f to a878804 Compare March 1, 2018 18:23
@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 1, 2018

Rebased now that #27567 merged

@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 1, 2018

Added crypto API's now that they were merged into CoreClr. This was probably premature. As CoreCLR bits need to propagate to corefx first.

Removing for now. Will push to separate PR as this should merge before CoreCLR crypto bits propagate to CoreFX

@sdmaclea sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from e273b7b to a878804 Compare March 1, 2018 18:55
@eerhardt
Copy link
Copy Markdown
Member

eerhardt commented Mar 1, 2018

11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(56,5): error : MembersMustExist : Member 'System.Runtime.Intrinsics.Arm.Arm64.Simd.LeadingSignCount(System.Runtime.Intrinsics.Vector128<System.Int64>)' does not exist in the implementation but it does exist in the contract. [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]
11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(56,5): error : MembersMustExist : Member 'System.Runtime.Intrinsics.Arm.Arm64.Simd.LeadingZeroCount(System.Runtime.Intrinsics.Vector128<System.Int64>)' does not exist in the implementation but it does exist in the contract. [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]
11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(56,5): error : MembersMustExist : Member 'System.Runtime.Intrinsics.Arm.Arm64.Simd.LeadingZeroCount(System.Runtime.Intrinsics.Vector128<System.UInt64>)' does not exist in the implementation but it does exist in the contract. [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]
11:08:51 /mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/Tools/ApiCompat.targets(70,5): error : ApiCompat failed for '/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/bin/Unix.AnyCPU.Release/System.Runtime.Intrinsics.Experimental/netcoreapp/System.Runtime.Intrinsics.Experimental.dll' [/mnt/j/workspace/dotnet_corefx/master/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/src/System.Runtime.Intrinsics.Experimental/src/System.Runtime.Intrinsics.Experimental.csproj]

Looks like either methods that don't exist in coreclr, or the signatures don't match. @sdmaclea - can you fix this?

@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 1, 2018

@eerhardt I think the methods were removed after from coreclr because the underlying instruction forms did not exist. I will revise.

@sdmaclea sdmaclea force-pushed the PR-ARM64-HW-INTRINSIC-WIP branch from a878804 to 4401ae7 Compare March 1, 2018 20:21
public static Vector64<int> LeadingSignCount(Vector64<int> value) { throw null; }
public static Vector128<sbyte> LeadingSignCount(Vector128<sbyte> value) { throw null; }
public static Vector128<short> LeadingSignCount(Vector128<short> value) { throw null; }
public static Vector128<int> LeadingSignCount(Vector128<int> value) { throw null; }
Copy link
Copy Markdown
Member

@eerhardt eerhardt Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why there is no Vector128<long> overload for this and the other methods that have Vector128<int>? Do ARM processors not support that type?

We seem to only have 2 of them:

public static Vector128<ulong> Abs(Vector128<long> value)
public static Vector128<long> Negate(Vector128<long> value)

Copy link
Copy Markdown
Contributor Author

@sdmaclea sdmaclea Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many Arm64 simd instruction do not support 64-bit elements. If it is not supported, The API must spell out exactly which primitives are supported.

So if the intrinsic is generic and the vector is at least 128 bits long long and ulong are supported.

So Add<T>, And<T>, AndNot<T> ... all support long & ulong for Vector128<T>

Vector64<T> never supports 64 bit elements because it would not be a vector....

Copy link
Copy Markdown
Member

@eerhardt eerhardt Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow.

Why do we support using long in Abs, Negate , Add, And, AndNot ...., but we don't support LeadingSignCount with long?

Because the processor explicitly doesn't support LeadingSignCount with long?

Copy link
Copy Markdown
Contributor Author

@sdmaclea sdmaclea Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the processor explicitly doesn't support LeadingSignCount with long?

The processor does support LeadingSignCount with long in the base instruction set.

#if ARM64_HW_INTRINSIC_NYI
    public static class Base
    {
        public static bool IsSupported { get { throw null; } }

        public static uint  LeadingSignCount(int  value) { throw null; }
        public static ulong LeadingSignCount(long value) { throw null; }
        public static uint  LeadingZeroCount(uint  value) { throw null; }
        public static ulong LeadingZeroCount(ulong value) { throw null; }
    }
#endif

The processor doesn't support LeadingSignCount with Simd elements of long.

The Arm64 simd instructions smin, smax, umin, umax, mul, cls, clz ... do not support an integer element size of 64 bits. Apparently ARM decided cost/benefit was not strong enough to include in the initial Simd ISA.

We did not expose in the API, because they would not be intrinsics, but rather a sequence of instructions.

@eerhardt
Copy link
Copy Markdown
Member

eerhardt commented Mar 1, 2018

NETFX is failing:

     System.Drawing.Tests.IconTests.Ctor_FilePath_Size(fileName: "48x48_multiple_entries_4bit.ico", size: {Width=16, Height=16}, expectedSize: {Width=16, Height=16}) [FAIL]
      System.Linq.Tests.ToArrayTests.ToArray_FailOnExtremelyLargeCollection [SKIP]
  === TEST EXECUTION SUMMARY ===
        Valid test but too intensive to enable even in OuterLoop
     System.Resources.ResourceManager.Tests  Total: 67, Errors: 0, Failed: 0, Skipped: 0, Time: 0.561s
  ----- end 12:49:12.85 ----- exit code 0 ----------------------------------------------------------
        System.IO.IOException : The process cannot access the file 'D:\j\workspace\windows-TGrou---2a8f9c29\bin\tests\System.Drawing.Common.Tests\netfx-Windows_NT-Release-x86\bitmaps\48x48_multiple_entries_4bit.ico' because it is being used by another process.
        Stack Trace:
             at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
             at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
             at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
             at System.Drawing.Icon..ctor(String fileName, Int32 width, Int32 height)
             at System.Drawing.Icon..ctor(String fileName, Size size)
             at System.Drawing.Tests.IconTests.Ctor_FilePath_Size(String fileName, Size size, Size expectedSize)

Completely unrelated to this change.

OSX fails for infrastructure issue.

test NETFX x86 Release Build
test OSX x64 Debug Build

{
public static bool IsSupported { get { throw null; } }
public static Vector64<byte> Abs(Vector64<sbyte> value) { throw null; }
public static Vector64<ushort> Abs(Vector64<short> value) { throw null; }
Copy link
Copy Markdown
Contributor

@4creators 4creators Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API follows pattern in Abs that converts signed integrals to unsigned ones. IMO it would be better to keep original signed type as return vector type since it is very probable that users will continue using signed integrals of the same type as argument type.

Furthermore, Abs return values will always fit in signed integrals of the same type.

Copy link
Copy Markdown
Contributor Author

@sdmaclea sdmaclea Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what is implemented in CoreCLR. I am happy to change in CoreCLR, but I will wait for consensus first.

If we want to change abs(), it should be removed from this PR so others can be merged,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is entirely orthogonal to this PR, as this is what is being done for the x86 intrinsics. I think we should merge this PR as-is, and then consider whether or not to change it for all targets.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@4creators, @sdmaclea. The reason x86 does it this way is because the instructions explicitly document themselves as: Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1.

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

{
public static bool IsSupported { get { throw null; } }
public static Vector64<byte> Abs(Vector64<sbyte> value) { throw null; }
public static Vector64<ushort> Abs(Vector64<short> value) { throw null; }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is entirely orthogonal to this PR, as this is what is being done for the x86 intrinsics. I think we should merge this PR as-is, and then consider whether or not to change it for all targets.

@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 2, 2018

Looks like all the OSX jobs are currently failing.

@sdmaclea
Copy link
Copy Markdown
Contributor Author

sdmaclea commented Mar 2, 2018

All UWP CoreCLR x64 Debug Build are also failing.

Can this be merged?

public static class Simd
{
public static bool IsSupported { get { throw null; } }
public static Vector64<byte> Abs(Vector64<sbyte> value) { throw null; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think we are aligning the names/etc elsewhere. (CC. @eerhardt to make sure this is ok).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style isn't mentioned in https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/coding-style.md, so I wouldn't block on it.

But in general, my suggestion is to not use formatting that is going to fight with VS's auto-formatting features (CTRL+K, CTRL+D). It just causes hassle for the next editor(s) of the file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clean up whitespace in a separate PR.

@eerhardt
Copy link
Copy Markdown
Member

eerhardt commented Mar 2, 2018

test OSX x64 Debug Build
test UWP CoreCLR x64 Debug Build

Copy link
Copy Markdown
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just waiting on clean CI. Will merge if there are failures not related to this change.

public static Vector128<T> Or<T>(Vector128<T> left, Vector128<T> right) where T : struct { throw null; }
public static Vector64<T> OrNot<T>(Vector64<T> left, Vector64<T> right) where T : struct { throw null; }
public static Vector128<T> OrNot<T>(Vector128<T> left, Vector128<T> right) where T : struct { throw null; }
public static Vector64<byte> PopCount(Vector64<byte> value) { throw null; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this instruction really only support bytes and not uint or ulong?

Copy link
Copy Markdown
Contributor Author

@sdmaclea sdmaclea Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, Arm64 simd cnt is Population Count per byte

The wider forms can be synthesized in C# when we add AddPairwise(), StaticCast() and optionally WidenLo()

i.e.

public Vector64<ushort> PopCount(Vector64<ushort> value)
{
  Vector64<byte> popCountPerByte = PopCount(StaticCast<byte>(value));
  Vector64<byte> popCountPerElement = AddPairwise(popCountPerByte, SetAllVector64(0));
  return WidenLo(popCountPerElement);
}
public Vector64<uint> PopCount(Vector64<uint> value)
{
  Vector64<byte> popCountPerByte = PopCount(StaticCast<ushort>(value));
  Vector64<byte> popCountPerShort = AddPairwise(popCountPerByte, SetAllVector64(0));
  Vector64<byte> popCountPerElement = AddPairwise(popCountPerShort, SetAllVector64(0));
  return WidenLo(WidenLo(popCountPerElement));
}

Copy link
Copy Markdown
Contributor Author

@sdmaclea sdmaclea Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could eventually add (or revise to)
vector64<byte> PopCountPerByte<T>(vector64<T> value);
It would eliminate the need for the static casts.

@eerhardt eerhardt merged commit 4ecc7f7 into dotnet:master Mar 2, 2018
@eerhardt
Copy link
Copy Markdown
Member

eerhardt commented Mar 2, 2018

Thanks @sdmaclea !

@karelz karelz added this to the 2.1.0 milestone Mar 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants