Moving a number of the Sse2 hwintrinsic tests to use the test template.#16192
Conversation
…LoadScalarVector128 intrinsics.
…adScalarVector128 intrinsics.
|
FYI. @fiigii, @CarolEidt |
| HARDWARE_INTRINSIC(SSE2_ConvertToVector128Single, "ConvertToVector128Single", SSE2, -1, 16, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cvtdq2ps, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cvtpd2ps}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromArg) | ||
| HARDWARE_INTRINSIC(SSE2_Divide, "Divide", SSE2, -1, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_divpd}, HW_Category_SimpleSIMD, HW_Flag_NoFlag) | ||
| HARDWARE_INTRINSIC(SSE2_LoadAlignedVector128, "LoadAlignedVector128", SSE2, -1, 16, 1, {INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_invalid, INS_movapd}, HW_Category_MemoryLoad, HW_Flag_NoFlag) | ||
| HARDWARE_INTRINSIC(SSE2_LoadScalarVector128, "LoadScalarVector128", SSE2, -1, 16, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_movd, INS_movd, INS_movq, INS_movq, INS_invalid, INS_movsdsse2}, HW_Category_MemoryLoad, HW_Flag_NoFlag) |
There was a problem hiding this comment.
@fiigii, the 8-bit and 16-bit overloads in particular don't have any backing instruction support (there is no movb or movw, or equivalent that I can find), was this intentional?
There was a problem hiding this comment.
I have no idea about ISA design. AFAIK, the performance of partial register access is tricky on Intel architectures, especially <32-bit.
There was a problem hiding this comment.
I was asking since you exposed these particular LoadScalar APIs. I was wondering if it was intentional (and you expect a movd+clear unnecessary bits) or if it was just an oversight at the time (and we should remove these 4 APIs in particular).
There was a problem hiding this comment.
it was just an oversight at the time
Yes, I think that is an oversight. LoadScalarVector128 should just work on float/double.
Looks like we should remove integer LoadScalarVector128 and add Vector128<long/ulong> LoadLow(Vector128<long/ulong> upper, double* address).
There was a problem hiding this comment.
I think we want LoadScalar for int/unit and long/ulong.
The movd and movq instructions will explicitly clear upper, rather than setting it from one of the source operands
There was a problem hiding this comment.
I think we want LoadScalar for int/unit and long/ulong.
Yes, I just looked that there are a few Scalar APIs work on int/long.
The movd and movq instructions will explicitly clear upper, rather than setting it from one of the source operands
That is okay, which makes consistent with C++ intrinsic semantics.
There was a problem hiding this comment.
That is okay, which makes consistent with C++ intrinsic semantics.
Yes, I was just indicating why we couldn't use them for LoadLower or LoadUpper, since those explicitly take a lower and upper value for the "other" bits.
I'll submit a separate PR to remove the 4 overloads that are not valid.
There was a problem hiding this comment.
@fiigii, did you have any other feedback or does this look good to you?
There was a problem hiding this comment.
Looks pretty good. Thanks for indicating this mistake.
|
The product changes (+20/-9) are all in the first commit. The last two commits (+813/-0 and +25,955/-3,718, respectively) are purely test changes. |
|
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic test OSX10.12 x64 Checked jitincompletehwintrinsic |
|
@tannergooding I am going to wait for this PR before submitting my PRs with remaining SSE2 intrinsics modulo #16200 |
This also implements the LoadVector128, LoadAlignedVector128, and LoadScalarVector128 intrinscis for Sse2.