Enable block init unroll on ARM32 #27450
Diff summary (x86 crossgen + arm32 altjit):
The diff should be better if local destination address containment is enabled (it is enabled for block copy but not for block init), but I'll do that in a separate PR because it affects all targets.
#ifdef _TARGET_ARM64_
    attr = EA_1BYTE;
#else
    attr = EA_4BYTE;
Now we have this code both here and in genCodeForCpBlkUnroll.
And it takes me time to recall why it is like this. Maybe it can be a separate method with an explanation?
Also, a question unrelated to this change: it is now unclear to me why the other places where we call emitActualTypeSize(short) on arm32 work.
For example, inst_SETCC could have a short or byte type. In that case it will call GetEmitter()->emitIns_R_I(INS_mov, emitActualTypeSize(type), dstReg, 0); with emitActualTypeSize(type) == 1 or 2, that should fail with an assert in the emitter, why doesn't it happen?
> And it takes me time to recall why it is like this. Maybe it can be a separate method with an explanation?

Well, the proper fix would be to change the ARM64 emitter to not demand EA_1BYTE/EA_2BYTE. Actually I need to double-check this because I remember making a change like this a while ago; maybe I missed a case.

> with emitActualTypeSize(type) == 1 or 2, that should fail with an assert in the emitter, why doesn't it happen?

Not sure I understand, emitActualTypeSize can't be 1 or 2, the "actual" types start with TYP_INT so it's at least 4. In fact, the "actual type" IR model matches well the ARM ISA, which does not have small int type registers nor small int type operations.
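To illustrate the "actual type" point, here is a minimal sketch of how small integer types widen before a size is computed. The `VarType` enum and the `ActualType`/`ActualTypeSize` helpers are hypothetical stand-ins written for this example; they are not the JIT's real definitions and only model the behavior described above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the JIT's type enum (not the real one).
enum class VarType { Byte, UByte, Short, UShort, Int, Long };

// Small integer types widen to Int; Int and Long are already "actual" types.
constexpr VarType ActualType(VarType type)
{
    switch (type)
    {
        case VarType::Byte:
        case VarType::UByte:
        case VarType::Short:
        case VarType::UShort:
            return VarType::Int;
        default:
            return type;
    }
}

// Size in bytes of the actual type. Because of the widening above,
// this can never return 1 or 2.
constexpr unsigned ActualTypeSize(VarType type)
{
    return ActualType(type) == VarType::Long ? 8u : 4u;
}

static_assert(ActualTypeSize(VarType::Short) == 4, "small ints widen to 4 bytes");
```

Under this model, a caller that passes a short or byte type still hands the emitter a 4-byte size, so no 1- or 2-byte size attribute reaches it.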
Scratching head - if I remove the ifdefs it works fine with EA_4BYTE on ARM64. It is possible that I wrote this code before my PR that fixed the ARM64 emitter inconsistency was merged. Oh well, I'll check again tonight when I get back from work.
> Not sure I understand, emitActualTypeSize can't be 1 or 2, the "actual" types start with TYP_INT so it's at least 4. In fact, the "actual type" IR model matches well the ARM ISA, which does not have small int type registers nor small int type operations.

Makes sense, so there are no other places where we could call `emit->emitIns_smth(*, EA_2BYTE, *)`. Thanks.
Well, it turns out that I managed to confuse myself. The inconsistency that I fixed was actually in the ARM32 emitter: it insisted that ldrb/ldrh/strb/strh use EA_1BYTE/EA_2BYTE. That was changed in #20126 so it now requires EA_4BYTE.
The ARM64 emitter never required this; it only cares about whether the size is EA_8BYTE or something else:
Lines 7857 to 7879 in 4795f24
So the ifdefs are actually useless. They were copied from the copy block code and the copy block code basically preserved them from the pre-cleanup code, when we had a huge ifdef for the whole copy block loop:
coreclr/src/jit/codegenarmarch.cpp
Line 1990 in 667222e
I'll remove these ifdefs in the next PR.
Extracted from #21711
On ARM32, unrolling was already done for block copy but not for block init. This was the only target that didn't unroll block init.
I've lowered the existing init unroll limit from 32 to 16. I'm not familiar with ARM perf characteristics, but the limit seemed quite high considering that the current implementation uses only 32-bit stores. Other targets have unroll limits of 64 and even 128, but they also use 128-bit stores.
16 also looks like a reasonable code size trade-off: with the 32-byte limit the diff ends up 5 kbytes in the red (a regression), but with 16 it is a 15 kbytes improvement.
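As a rough illustration of what unrolling a block init with at most 32-bit stores means, the sketch below plans the store sequence for a given block size. It is not the JIT's codegen: `PlanInitUnroll`, `Store`, and `kInitUnrollLimit` are hypothetical names, and treating the 16-byte limit as inclusive is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical threshold mirroring the 16-byte limit discussed above.
// Whether the real limit is inclusive is an assumption of this sketch.
constexpr unsigned kInitUnrollLimit = 16;

// One planned store: byte offset into the block and store width in bytes.
struct Store
{
    unsigned offset;
    unsigned size;
};

// Plans an unrolled init of `size` bytes using 4-byte stores first, then a
// 2-byte and/or 1-byte store for the tail. Returns an empty plan when the
// block is larger than the limit (a real compiler would then fall back to
// some non-unrolled strategy).
std::vector<Store> PlanInitUnroll(unsigned size)
{
    std::vector<Store> stores;
    if (size > kInitUnrollLimit)
    {
        return stores;
    }

    unsigned offset = 0;
    for (unsigned width = 4; width >= 1; width /= 2)
    {
        while (size - offset >= width)
        {
            stores.push_back({offset, width});
            offset += width;
        }
    }
    return stores;
}
```

With this model a 16-byte block needs four 4-byte stores, while a 32-byte block would need eight, which gives a feel for why a lower limit can come out ahead on code size.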