rewrite cdvecfill #7003

Merged
WalterBright merged 10 commits into dlang:master from MartinNowak:fix17484
Jul 17, 2017

MartinNowak (Member) commented Jul 17, 2017

  • fix penalties for AVX-256 insns with XMM registers (see https://software.intel.com/en-us/articles/intel-avx-state-transitions-migrating-sse-code-to-avx)
  • stop using AVX2 insns in AVX1 mode (to init 32-byte registers)
  • avoid using temporary stack values
  • improve instruction sequences for int loading
  • use dedicated AVX2 int broadcast insns
  • optimize fixresult to avoid temporary stack values and superfluous movs
  • remove unnecessary REX.W prefixes for SIMD movs
  • add asm test to test generated code against regressions

Review commit by commit!
Also see #6315 and #6394 which added the problematic AVX code paths.
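
The REX.W bullet above and the "shorter 0xC5 encoding" notes in the commits plausibly describe the same effect: a VEX-encoded instruction can use the 2-byte `0xC5` prefix only when W = 0, the X and B extension bits are unused, and the opcode lives in the `0F` map. The sketch below is a standalone model of that rule, not DMD's backend code; the function name `vex_prefix` and its parameters are hypothetical.

```python
def vex_prefix(r=0, x=0, b=0, w=0, mmmmm=1, vvvv=0, l=0, pp=0):
    """Return the shortest VEX prefix for the given fields.

    r, x, b: register-extension bits (1 = extended register is used)
    w:       VEX.W
    mmmmm:   opcode map (1 = 0F, 2 = 0F38, 3 = 0F3A)
    vvvv:    number of the extra source register (0 if unused)
    l:       vector length (0 = 128-bit, 1 = 256-bit)
    pp:      implied legacy prefix (0 = none, 1 = 66, 2 = F3, 3 = F2)
    """
    inv_vvvv = (~vvvv) & 0xF                  # vvvv is stored inverted
    tail = (inv_vvvv << 3) | (l << 2) | pp
    if x == 0 and b == 0 and w == 0 and mmmmm == 1:
        # 2-byte form: C5, then ~R vvvv L pp
        return bytes([0xC5, ((r ^ 1) << 7) | tail])
    # 3-byte form: C4, then ~R ~X ~B mmmmm, then W vvvv L pp
    byte1 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | mmmmm
    return bytes([0xC4, byte1, (w << 7) | tail])


# vmovaps xmm0, xmm1 is opcode 28 with ModRM C1; the instruction ignores W.
short_form = vex_prefix() + bytes([0x28, 0xC1])      # c5 f8 28 c1
long_form = vex_prefix(w=1) + bytes([0x28, 0xC1])    # c4 e1 f8 28 c1
```

Setting W on an instruction that ignores it costs one byte per instruction, which is the kind of difference an asm comparison test can pin down.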

dlang-bot (Contributor) commented Jul 17, 2017

Thanks for your pull request, @MartinNowak!

Bugzilla references

| Auto-close | Bugzilla | Description |
|---|---|---|
| | 17484 | high penalty for vbroadcastsd with -mcpu=avx |

WalterBright (Member) left a comment

Nice work!

MartinNowak (Member, Author) commented

I've disabled the OSX and Windows asm comparison tests completely. They use the same backend code, but due to ABI differences the assembly is different. Seems more pragmatic to just disable those tests instead of inventing some fuzzy matching.

- for all supported types and combinations of SSE, AVX, AVX2
- includes a small script to update the expected output using objdump
- should use shorter 0xC5 encoding
…regs

- as the upper 128-bits are no longer zero, the CPU will save/restore
  them when that register is used with legacy SSE instructions
- avoid using vbroadcastsd, which is an AVX-256-only instruction, to
  initialize 128-bit XMM vectors
- should use shorter 0xC5 encoding
- better use 2 instructions shuffle & vinsertf128
- the Eoper == OPvar && !isregvar heuristic didn't work for ref/pointer parameters
- also replace vbroadcastss/d YMM,XMM AVX2 insn with vinsertf128 in AVX1 mode
- avoid temporary stack usage
- use dedicated vpbroadcastq AVX2 instruction
- remove superfluous eax moves
- replace AVX2 vbroadcastss YMM,XMM insn in AVX1 mode
- use dedicated AVX2 vpbroadcastd insn
- use short insn sequence punpcklwd & pshufd
- use dedicated AVX2 vpbroadcastw insn
- use pshufb with AVX1 (>=SSSE3)
- use dedicated AVX2 vpbroadcastb insn
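
To see why the non-AVX2 sequences named above fill every lane, here is a small lane-level model of `punpcklwd` + `pshufd`, `pshufb`, and an in-lane shuffle followed by `vinsertf128`. The commits name only the instructions; the operands, the immediate `0`, the all-zero `pshufb` mask, and the helper names `broadcast_word`, `broadcast_byte` and `broadcast_double_256` are assumptions made for illustration, not the code the backend emits.

```python
def punpcklwd(dst, src):
    """Interleave the four low 16-bit words of dst and src."""
    out = []
    for d, s in zip(dst[:4], src[:4]):
        out += [d, s]
    return out


def pshufd(src, imm8):
    """Build each 32-bit lane from the source lane picked by 2 bits of imm8."""
    dwords = [src[i:i + 2] for i in range(0, 8, 2)]  # 4 lanes of 2 words
    out = []
    for lane in range(4):
        out += dwords[(imm8 >> (2 * lane)) & 3]
    return out


def pshufb(src, mask):
    """Per byte: zero if mask bit 7 is set, else src[mask & 15]."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]


def vinsertf128(ymm, xmm, imm8):
    """Copy ymm and replace its low (imm8=0) or high (imm8=1) half by xmm."""
    half = len(xmm)
    out = list(ymm)
    start = (imm8 & 1) * half
    out[start:start + half] = xmm
    return out


def broadcast_word(w, junk=0xBEEF):
    # After a 32-bit move into the register, word 1 may hold unrelated bits.
    x = [w, junk, 0, 0, 0, 0, 0, 0]
    x = punpcklwd(x, x)          # -> [w, w, junk, junk, 0, 0, 0, 0]
    return pshufd(x, 0)          # dword 0 copied to all four dwords


def broadcast_byte(b):
    x = [b] + [0] * 15
    return pshufb(x, [0] * 16)   # an all-zero mask selects byte 0 everywhere


def broadcast_double_256(d):
    low = [d, d]                 # XMM half after an in-lane shuffle
    return vinsertf128(low + [0.0, 0.0], low, 1)
```

In this model the word broadcast needs no prior zero-extension: `punpcklwd` duplicates word 0 into the whole low dword, so whatever sat in word 1 is discarded by the following `pshufd`.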