[JIT] Optimize constant V512 vector with broadcast#92017
BruceForstall merged 8 commits into dotnet:main
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Ran the test suite twice. The remaining failures look like known or random ones, so I am marking the PR as ready for review.
Is this going to perform better or just save on the size of the rodata section? What about on hardware without AVX-512 ( What about For AVX-512, the |
The expected improvement is a smaller memory footprint for constant values.
I had intended to use
I presume the scope is to compress a larger existing constant vector into a smaller one in a pure memory operation, which embedded broadcast might not be able to handle. If we also want to consider compressing to a scalar, there may be a further opportunity: V128/256/512 -> Byte/Word/DWord/QWord.
From my understanding of #90328, the issue concerns the pure store instruction case, where the codegen is mostly: the optimization essentially replaces the first load with a broadcast instruction that takes a smaller constant operand. I may have understood the issue incorrectly or incompletely, so please correct me if I have misunderstood anything.
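To make the "compress to a scalar" idea above concrete, here is a minimal sketch (not the JIT's code) that finds the smallest power-of-two unit whose repetition reproduces a constant. The function name `SmallestRepeatingUnit` and the raw byte-buffer interface are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the smallest power-of-two width (in bytes)
// such that the whole constant is that leading unit repeated back to back.
// Returns totalBytes when no smaller unit repeats.
size_t SmallestRepeatingUnit(const uint8_t* data, size_t totalBytes)
{
    for (size_t unit = 1; unit < totalBytes; unit *= 2)
    {
        if (totalBytes % unit != 0)
        {
            continue; // unit must tile the constant exactly
        }

        bool repeats = true;
        for (size_t offset = unit; offset < totalBytes; offset += unit)
        {
            if (std::memcmp(data, data + offset, unit) != 0)
            {
                repeats = false;
                break;
            }
        }

        if (repeats)
        {
            return unit;
        }
    }
    return totalBytes;
}
```

Under this sketch, a result of 1, 2, 4, or 8 would correspond to the Byte/Word/DWord/QWord cases, while 16 or 32 would correspond to the V128 and V256 cases.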
👍, if it's primarily for the case where we'd otherwise have a I was initially concerned it would also change: into
I think it wouldn't cover that case (at least, it is not intended to), as the entry point of this optimization is
The failure should be unrelated. Hi @tannergooding @EgorBo, I added the optimization for V512->V256 and V256->V128. I think it now reaches the expected coverage and is ready for review.
@tannergooding, this community PR is ready for review. PTAL.
src/coreclr/jit/lowerxarch.cpp
I believe you can just do:
    - if (!node->Data()->AsVecCon()->TypeIs(TYP_SIMD32) && !node->Data()->AsVecCon()->TypeIs(TYP_SIMD64))
    + if (!node->Data()->AsVecCon()->TypeIs(TYP_SIMD32, TYP_SIMD64))
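The suggestion relies on `TypeIs` accepting several types and returning true if any of them matches. The toy model below sketches why the two conditions are equivalent. `ToyType` and `ToyNode` are hypothetical stand-ins invented for this example and are not the JIT's real definitions.

```cpp
// Toy stand-ins for illustration only; not the JIT's real types.
enum ToyType
{
    TOY_SIMD16,
    TOY_SIMD32,
    TOY_SIMD64
};

struct ToyNode
{
    ToyType type;

    // Single-type check.
    bool TypeIs(ToyType t) const
    {
        return type == t;
    }

    // Multi-type check: true if the node's type matches any argument.
    template <typename... Rest>
    bool TypeIs(ToyType t, Rest... rest) const
    {
        return TypeIs(t) || TypeIs(rest...);
    }
};
```

By De Morgan's law, `!n.TypeIs(A) && !n.TypeIs(B)` is the same as `!(n.TypeIs(A) || n.TypeIs(B))`, which the multi-argument form expresses as `!n.TypeIs(A, B)`.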
tannergooding left a comment
LGTM. This should get a secondary review from someone on the JIT team
CC. @dotnet/jit-contrib
CC. @jakobbotsch, @EgorBo in particular
56041f3 to a165107
a165107 to a01fd58
Hi @jakobbotsch @EgorBo, this PR is ready for review. Would you please take a look? Thanks!
|
Since this is AVX-512 backend work I think @BruceForstall should take a look... On my quick glance it seemed a bit odd to do it during |
Thanks everyone for reviewing this!
This PR is trying to solve #90328.
The optimization is implemented by replacing the constant V512 vector with a V128 and a broadcasti128 node when lowering GT_STOREIND with an eligible constant V512 vector as its operand.
Currently, the implementation only covers V512 -> broadcasti128(V128). We are open to adjusting the implementation or bringing more cases into this PR, ideally V512/256 -> broadcasti128(V128), when AVX512 is available. (Possibly plus V512 -> broadcast64x4(V256).)
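A constant is only eligible for this rewrite when broadcasting the narrower vector reproduces every byte of the original. The following is a minimal sketch of such an eligibility test, assuming the constant is available as a raw byte buffer. `IsRepeatedChunk` is a hypothetical name and is not taken from the JIT sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not the JIT's actual code: returns true when a
// totalBytes-wide constant is just its first chunkBytes repeated, i.e.
// when a narrower constant plus a broadcast can reproduce it exactly.
bool IsRepeatedChunk(const uint8_t* data, size_t totalBytes, size_t chunkBytes)
{
    assert(chunkBytes > 0 && totalBytes % chunkBytes == 0);

    // Compare every later chunk against the leading one.
    for (size_t offset = chunkBytes; offset < totalBytes; offset += chunkBytes)
    {
        if (std::memcmp(data, data + offset, chunkBytes) != 0)
        {
            return false;
        }
    }
    return true;
}
```

In this sketch, `IsRepeatedChunk(bytes, 64, 16)` models the V512 -> V128 case and `IsRepeatedChunk(bytes, 64, 32)` models the V512 -> V256 case. A constant that fails the check would be left as a full-width load.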