This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Implement AVX/AVX2/SSE3 Load* intrinsics #16200

Merged: CarolEidt merged 3 commits into dotnet:master from fiigii:loadstore on Feb 6, 2018

Conversation

fiigii commented Feb 4, 2018

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

test OSX10.12 x64 Checked jitincompletehwintrinsic
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx2
test OSX10.12 x64 Checked jitx86hwintrinsicnosimd
test OSX10.12 x64 Checked jitnox86hwintrinsic

Comment thread src/jit/hwintrinsiclistxarch.h Outdated
HARDWARE_INTRINSIC(SSE41_IsSupported, "get_IsSupported", SSE41, -1, 0, 0, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IsSupportedProperty, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(SSE41_Multiply, "Multiply", SSE41, -1, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_pmuldq, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_Commutative)
HARDWARE_INTRINSIC(SSE41_BlendVariable, "BlendVariable", SSE41, -1, 16, 3, {INS_pblendvb, INS_pblendvb, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_blendvps, INS_blendvpd}, HW_Category_SimpleSIMD, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(SSE41_LoadAlignedVector128NonTemporal, "LoadAlignedVector128NonTemporal", SSE41, -1, 16, 1, {INS_movntdqa, INS_movntdqa, INS_movntdqa, INS_movntdqa, INS_movntdqa, INS_movntdqa, INS_movntdqa, INS_movntdqa, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_NoFlag)
Member

nit: alignment of fields is off

Author

Thanks, will fix.
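
For readers unfamiliar with the rows quoted from src/jit/hwintrinsiclistxarch.h in this thread: each HARDWARE_INTRINSIC(...) line looks like one entry in a macro-driven table, where the braces hold one instruction per element type (blendvps and blendvpd sit in the last two slots, which suggests float and double). Below is a minimal sketch of that table pattern, not the JIT's actual code. The names INTRINSIC_ROWS, ROW, IntrinsicInfo, kIntrinsics, and findIntrinsic are hypothetical, the enum is cut down to three instructions, and the assumption that the seventh slot is the 64-bit integer slot is a reading of the quoted rows rather than something stated in this PR.

```cpp
#include <cstring>

// Hypothetical, cut-down instruction enum; the real JIT defines many more.
enum Instruction { INS_invalid, INS_pmuldq, INS_blendvps, INS_blendvpd };

// Simplified rows: (id, managed name, SIMD size in bytes, argument count,
// instruction for long, for float, for double). The real table has ten slots.
#define INTRINSIC_ROWS(ROW)                                                            \
    ROW(SSE41_Multiply,      "Multiply",      16, 2, INS_pmuldq,  INS_invalid,  INS_invalid)  \
    ROW(SSE41_BlendVariable, "BlendVariable", 16, 3, INS_invalid, INS_blendvps, INS_blendvpd)

// First expansion: one enum member per row.
enum IntrinsicId {
#define ROW(id, name, size, argc, insLong, insFloat, insDouble) id,
    INTRINSIC_ROWS(ROW)
#undef ROW
    INTRINSIC_COUNT
};

struct IntrinsicInfo {
    const char* name;      // managed method name
    int         simdSize;  // vector size in bytes
    int         numArgs;   // number of arguments
    Instruction insLong;   // instruction used for long elements
    Instruction insFloat;  // instruction used for float elements
    Instruction insDouble; // instruction used for double elements
};

// Second expansion: one table entry per row, in the same order as the enum.
static const IntrinsicInfo kIntrinsics[] = {
#define ROW(id, name, size, argc, insLong, insFloat, insDouble) \
    {name, size, argc, insLong, insFloat, insDouble},
    INTRINSIC_ROWS(ROW)
#undef ROW
};

// Linear lookup by managed name; returns -1 when the name is not in the table.
int findIntrinsic(const char* name) {
    for (int i = 0; i < INTRINSIC_COUNT; ++i) {
        if (std::strcmp(kIntrinsics[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Because the enum and the table are generated from the same row list, adding or removing a row (as happened here with Sse41.LoadAlignedVector128NonTemporal) keeps the two in sync without further edits.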

@tannergooding
Member

It would be useful to uncomment the ProcessInputs lines for Avx and Avx2 and to rerun the script: https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/HardwareIntrinsics/X86/Shared/GenerateTests.csx#L169

The only thing blocking them (to my knowledge) was the LoadVector256 and LoadAlignedVector256 implementations.

@tannergooding
Member

Are you going to handle the Store intrinsics in a separate PR?

@fiigii
Author

fiigii commented Feb 5, 2018

> Are you going to handle the Store intrinsics in a separate PR?

Yes, I will submit the store PR tomorrow.

@4creators

> Yes, I will submit the store PR tomorrow.

@fiigii Will it cover Sse2 as well?

@fiigii
Author

fiigii commented Feb 5, 2018

> will it comprise Sse2 as well?

Yes, I will implement all Sse2.Store*.

@fiigii
Author

fiigii commented Feb 5, 2018

@CarolEidt @tannergooding The RyuJIT emitter does not seem to support the SSE4.1 movntdqa instruction without VEX encoding. I removed Sse41.LoadAlignedVector128NonTemporal from this PR and logged the issue at https://github.com/dotnet/coreclr/issues/16216.

Can we merge these Load intrinsics first and implement Sse41.LoadAlignedVector128NonTemporal later? Merging them would help with testing containment on Load*.

@tannergooding
Member

I agree that dropping the Sse41 4-byte instruction here and implementing it later is fine.

The SSE4.1 4-byte instructions in particular are blocked until https://github.com/dotnet/coreclr/issues/15908 is resolved.

@fiigii
Author

fiigii commented Feb 5, 2018

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

test OSX10.12 x64 Checked jitincompletehwintrinsic
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx
test OSX10.12 x64 Checked jitx86hwintrinsicnoavx2
test OSX10.12 x64 Checked jitx86hwintrinsicnosimd
test OSX10.12 x64 Checked jitnox86hwintrinsic

@fiigii fiigii changed the title from "Implement AVX/AVX2/SSE4.1/SSE3 Load* intrinsics" to "Implement AVX/AVX2/SSE3 Load* intrinsics" on Feb 6, 2018
@fiigii
Author

fiigii commented Feb 6, 2018

@tannergooding Do you know why these OSX tests fail? There is no error message.

@tannergooding
Member

Looks like the machines had network issues. I have requeued the jobs.

@fiigii
Author

fiigii commented Feb 6, 2018

Thank you. These look like unrelated failures.
Rebased to resolve the conflict with the SSE2 load change.

@fiigii
Author

fiigii commented Feb 6, 2018

@CarolEidt Does this PR look good to you?

@CarolEidt

> It would be useful to uncomment the ProcessInputs lines for Avx and Avx2 and to rerun the script

It would be good to know if you have done that. If not, could you open an issue to enable the template for those?

@CarolEidt CarolEidt left a comment

LGTM, but the conflicts need to be resolved before merging.

@fiigii
Author

fiigii commented Feb 6, 2018

@tannergooding @CarolEidt Thanks for the review. I logged the containment test work at https://github.com/dotnet/coreclr/issues/16244 and will submit a PR soon.

@fiigii
Author

fiigii commented Feb 6, 2018

@CarolEidt May I have permission to add PRs/issues to the "Hardware Intrinsic Project", if possible? That may save you time 😄

@CarolEidt

> May I have the permission to add PR/issues to "Hardware Intrinsic Project" if possible?

I'm not sure if that's possible. I don't see any permission settings, so I suspect that it uses the repo permissions.

@CarolEidt CarolEidt merged commit 5e94fd1 into dotnet:master Feb 6, 2018
@fiigii fiigii deleted the loadstore branch February 6, 2018 23:04
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Implement AVX/AVX2/SSE3 Load* intrinsics

Commit migrated from dotnet/coreclr@5e94fd1