Fix method names of hardware intrinsic APIs by fiigii · Pull Request #25965 · dotnet/corefx

fiigii · 2017-12-17T01:37:22Z

Matching the CoreCLR change dotnet/coreclr#15471

jkotas · 2017-12-17T13:37:53Z

@@ -198,7 +197,7 @@ public static class Avx
        public static Vector256<float> Set(float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0) { throw null; }
        public static Vector256<double> Set(double e3, double e2, double e1, double e0) { throw null; }
        public static Vector256<T> Set1<T>(T value) where T : struct { throw null; }


Should this rather be called SetOne? For consistency with SetZero, Vector<T>.One, Vector<T>.Zero.

I thought that Set1 should not be consistent with SetZero or Vector<T>.One.

SetZero stands for "set all elements to value of zero";

Vector<T>.One stands for "set all elements to value of one";

Set1 stands for "set all elements to one value of XX";

Set stands for "set elements to multiple values of XX, YY, ZZ, ...";

But I know Set1 is not a good name; it just follows the Intel C++ intrinsic naming. Do you have suggestions for a better name?

Just Set or SetAll might make sense.

Just Set or SetAll might make sense.

We have Set for multi-value initialization. SetAll makes sense to me. Thank you.

jkotas · 2017-12-17T13:41:38Z

-        public static Vector256<float> Permute2x128(Vector256<float> left, Vector256<float> right, byte control) { throw null; }
-        public static Vector256<double> Permute2x128(Vector256<double> left, Vector256<double> right, byte control) { throw null; }
+        public static Vector256<T> Permute2x128<T>(Vector256<T> left, Vector256<T> right, byte control) where T : struct { throw null; }
        public static Vector128<float> PermuteVar(Vector128<float> left, Vector128<float> mask) { throw null; }


Should this be called PermuteVariable? For consistency with BlendVariable.

Or do these two methods need the Var/Variable suffix at all? Would an overload be sufficient?

We are using the Variable suffix only for v-suffixed instructions, e.g., BlendVariable -> vblendvp*, ShiftLeftLogicalVariable -> vpsllv*, etc.
PermuteVar and PermuteVar8x32 are special cases that generate vpermilp* and vperm*, which breaks the above convention, so they follow the Intel C++ intrinsic naming.
We can change them to the Variable suffix; I have no strong preference here.

jkotas · 2017-12-17T13:42:43Z

-        public static Vector128<float> ReciprocalSquareRoot(Vector128<float> value) { throw new NotImplementedException(); }
+        public static Vector128<float> ReciprocalSqrt(Vector128<float> value) { throw new NotImplementedException(); }
        public static Vector128<float> Set(float e3, float e2, float e1, float e0) { throw new NotImplementedException(); }
        public static Vector128<float> Set1(float value) { throw new NotImplementedException(); }


jkotas · 2017-12-17T14:00:21Z

For my education, what was the rule used to make the method generic vs. non-generic?

For example, I am wondering about these:

Vector256<T> Set1<T>(T value) is generic, Vector128<int> Set1 is a series of non-generic methods. Should it be the same?
What is the point of e.g. float ExtractSingle<T>(Vector128<T> value, byte index) being generic? If I want to extract single, it should better be Vector128<float>.

jkotas · 2017-12-17T14:02:26Z

I am closing this because I have cherry-picked it into #25969. We can continue the discussion about the naming though.

fiigii · 2017-12-17T18:29:42Z

Vector256<T> Set1<T>(T value) is generic, Vector128<int> Set1 is a series of non-generic methods. Should it be the same?

SSE only has float instructions, so the other types (Vector128<int> Set1, Vector128<double> Set1, ...) have to be in Sse2.
Another solution that would be more consistent is to provide a generic Vector256<T> Set1<T>(T value) in Sse2 and Vector128<float> Set1 in Sse. Thoughts?

fiigii · 2017-12-17T18:33:52Z

What is the point of e.g. float ExtractSingle<T>(Vector128<T> value, byte index) being generic? If I want to extract single, it should better be Vector128<float>.

Sometimes, ExtractSingle<T> can save one cast compared to ExtractSingle.

jkotas · 2017-12-18T01:43:07Z

Sometimes, ExtractSingle<T> can save one cast compared to ExtractSingle.

Is this a common pattern? It does not sound right to be optimizing for case where folks have e.g. Vector128<int> and they want to extract the int as float.

fiigii · 2017-12-18T07:16:33Z

Vector128<int> and they want to extract the int as float.

@jkotas Ok, I will fix this and Set1.

4creators · 2017-12-18T17:09:40Z

It does not sound right to be optimizing for case where folks have e.g. Vector128<int> and they want to extract the int as float.

@jkotas @fiigii
This is not an optimization. This is a feature of the underlying processor instructions, which cannot discern whether an xmm operand is of float or int type. I do not think we should change that behavior. It would be OK for System.Numerics.Vector, but intrinsics are just a raw, very advanced API to the underlying processor instructions, and IMO we should just expose them as they are.

tannergooding · 2017-12-18T18:23:09Z

@4creators, it is an optimization. It just avoids the consumer needing to insert a StaticCast (which will be a no-op anyway). ExtractSingle should just take Vector128<float>.

fiigii · 2017-12-18T18:29:50Z

I believe that these generic intrinsics are a "legacy" design from the long design process, and the original motivation is gone due to other changes. I will fix it soon; thanks for pointing it out!

4creators · 2017-12-18T18:45:29Z

@tannergooding my point is that the (E)(V)EXTRACTPS instruction does not check whether the operand is of xmm packed float type, and it does not throw any type of floating-point exception either; therefore, it can be treated as a general extraction of 32 bits from an xmm vector. Introducing any limitation that restricts the pure instruction functionality is wrong for a raw API.

4creators · 2017-12-18T18:50:00Z

float ExtractSingle<T>(Vector128<T> value, byte index) where T : struct { throw null; }
Vector128<Int64> src = //... //
var f = Sse41.ExtractSingle<Int64>(src, 3);

This should be perfectly legal: it saves 1 CPU cycle by doing the extraction and the cast together (a strange binary one, but still), and I may want to use Int64 for loading to get an atomic load of two adjacent 32-bit values and do some precalculation step with it (again at the binary level).

tannergooding · 2017-12-18T18:58:59Z

We are not providing a raw API, however. We are providing a managed abstraction over the underlying hardware instructions.

Because it is an abstraction and not raw access to the underlying instructions, some helper functions are provided (like static cast, set 1, etc.) and some limitations are imposed by design (such as no MMX instructions being exposed).

fiigii · 2017-12-18T19:06:52Z

@4creators Thank you for explaining my original design proposal 😄. However, after I added StaticCast<T,U>, this kind of "raw operation" can be unified.

Vector128<Int64> src = //... //
Vector128<Single> srcFloat = Sse.StaticCast<Int64, Single>(src);
var f = Sse41.ExtractSingle(srcFloat, 3);

StaticCast<T,U> does not generate any runtime code (just makes the type system happy), so do not worry about "1 CPU cycle by doing extraction and cast".

fiigii · 2017-12-18T19:10:51Z

Updated the above code example.

4creators · 2017-12-19T01:49:45Z

Ahh my ... that was a good one on my side 😆

Fix method names of hardware intrinsic APIs

444bf56

fiigii mentioned this pull request Dec 17, 2017

Fix method names of hardware intrinsic APIs dotnet/coreclr#15471

Merged

jkotas reviewed Dec 17, 2017


jkotas closed this Dec 17, 2017

karelz added this to the 2.1.0 milestone Dec 28, 2017

6 participants
