Thanks for your pull request, @MartinNowak! Bugzilla references
I've disabled the OSX and Windows asm comparison tests completely. They use the same backend code, but due to ABI differences the generated assembly is different. It seems more pragmatic to just disable those tests than to invent some fuzzy matching.
- for all supported types and combinations of SSE, AVX, AVX2
- includes a small script to update the expected output using objdump
- should use shorter 0xC5 encoding
…regs
- as the upper 128 bits are no longer zero, the CPU will save/restore them when that register is used with legacy SSE instructions
- avoid using vbroadcastsd, which is an AVX-256-only instruction, to initialize 128-bit XMM vectors
- should use shorter 0xC5 encoding
- better use 2 instructions shuffle & vinsertf128
- the Eoper == OPvar && !isregvar heuristic didn't work for ref/pointer parameters
- also replace vbroadcastss/d YMM,XMM AVX2 insn with vinsertf128 in AVX1 mode
- avoid using temporary on stack
- avoid temporary stack usage
- use dedicated vpbroadcastq AVX2 instruction
- remove superfluous eax moves
- replace AVX2 vbroadcastss YMM,XMM insn in AVX1 mode
- use dedicated AVX2 vpbroadcastd insn
- use short insn sequence punpcklwd & pshufd
- use dedicated AVX2 vpbroadcastw insn
- use pshufb with AVX1 (>=SSSE3)
- use dedicated AVX2 vpbroadcastb insn
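
For anyone reviewing the 16-bit case, the punpcklwd & pshufd sequence mentioned above can be sketched with C intrinsics. This is only an illustration of the instruction sequence, not the DMD backend code: `broadcast_u16` and `check_broadcast` are hypothetical names, and the scalar fallback is an addition so the sketch still compiles on targets without SSE2.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: fill all eight 16-bit lanes of a 128-bit vector with v. */
void broadcast_u16(uint16_t v, uint16_t out[8])
{
#if defined(__SSE2__)
    __m128i x = _mm_cvtsi32_si128(v);    /* movd: v lands in word 0, rest is zero */
    x = _mm_unpacklo_epi16(x, x);        /* punpcklwd: words 0 and 1 both hold v */
    x = _mm_shuffle_epi32(x, 0);         /* pshufd imm8=0: dword 0 copied to all dwords */
    _mm_storeu_si128((__m128i *)out, x); /* unaligned store of the 128-bit result */
#else
    /* Assumed fallback for non-SSE2 targets: same result, plain loop. */
    for (int i = 0; i < 8; i++)
        out[i] = v;
#endif
}

/* Hypothetical self-check: every lane must equal the input value. */
void check_broadcast(uint16_t v)
{
    uint16_t lanes[8];
    broadcast_u16(v, lanes);
    for (int i = 0; i < 8; i++)
        assert(lanes[i] == v);
}
```

With AVX2 available, the whole sequence would collapse into the single vpbroadcastw instruction named in the commit message; the two-step form is what remains for plain SSE2 targets.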
Review commit by commit!
Also see #6315 and #6394, which added the problematic AVX code paths.