Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Moving a number of the Sse2 hwintrinsic tests to use the test template.#16192

Merged
tannergooding merged 3 commits into
dotnet:masterfrom
tannergooding:hwintrin-tests
Feb 6, 2018
Merged

Moving a number of the Sse2 hwintrinsic tests to use the test template.#16192
tannergooding merged 3 commits into
dotnet:masterfrom
tannergooding:hwintrin-tests

Conversation

@tannergooding
Copy link
Copy Markdown
Member

This also implements the LoadVector128, LoadAlignedVector128, and LoadScalarVector128 intrinscis for Sse2.

@tannergooding
Copy link
Copy Markdown
Member Author

FYI. @fiigii, @CarolEidt

HARDWARE_INTRINSIC(SSE2_ConvertToVector128Single, "ConvertToVector128Single", SSE2, -1, 16, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cvtdq2ps, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cvtpd2ps}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromArg)
HARDWARE_INTRINSIC(SSE2_Divide, "Divide", SSE2, -1, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_divpd}, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(SSE2_LoadAlignedVector128, "LoadAlignedVector128", SSE2, -1, 16, 1, {INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_movdqa, INS_invalid, INS_movapd}, HW_Category_MemoryLoad, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(SSE2_LoadScalarVector128, "LoadScalarVector128", SSE2, -1, 16, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_movd, INS_movd, INS_movq, INS_movq, INS_invalid, INS_movsdsse2}, HW_Category_MemoryLoad, HW_Flag_NoFlag)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiigii, the 8-bit and 16-bit overloads in particular don't have any backing instruction support (there is no movb or movw, or equivalent that I can find), was this intentional?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea about ISA design. AFAIK, the performance of partial register access is tricky on Intel architectures, especially <32-bit.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asking since you exposed these particular LoadScalar APIs. I was wondering if it was intentional (and you expect a movd+clear unnecessary bits) or if it was just an oversight at the time (and we should remove these 4 APIs in particular).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was just an oversight at the time

Yes, I think that is an oversight. LoadScalarVector128 should just work on float/double.

Looks like we should remove integer LoadScalarVector128 and add Vector128<long/ulong> LoadLow(Vector128<long/ulong> upper, double* address).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want LoadScalar for int/unit and long/ulong.

The movd and movq instructions will explicitly clear upper, rather than setting it from one of the source operands

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want LoadScalar for int/unit and long/ulong.

Yes, I just looked that there are a few Scalar APIs work on int/long.

The movd and movq instructions will explicitly clear upper, rather than setting it from one of the source operands

That is okay, which makes consistent with C++ intrinsic semantics.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is okay, which makes consistent with C++ intrinsic semantics.

Yes, I was just indicating why we couldn't use them for LoadLower or LoadUpper, since those explicitly take a lower and upper value for the "other" bits.

I'll submit a separate PR to remove the 4 overloads that are not valid.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiigii, did you have any other feedback or does this look good to you?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Thanks for indicating this mistake.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding
Copy link
Copy Markdown
Member Author

The product changes (+20/-9) are all in the first commit. The last two commits (+813/-0 and +25,955/-3,718, respectively) are purely test changes.

@tannergooding
Copy link
Copy Markdown
Member Author

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

test OSX10.12 x64 Checked jitincompletehwintrinsic
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx2
test OSX10.12 x64 Checked jitx86hwintrinsicnosimd
test OSX10.12 x64 Checked jitnox86hwintrinsic

@4creators
Copy link
Copy Markdown

@tannergooding I am going to wait for this PR before submitting my PRs with remaining SSE2 intrinsics modulo #16200

Copy link
Copy Markdown

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants