Fix block store local address containment on ARM#27338
Conversation
When the source/destination address was local, genCodeForCpBlkUnroll was folding the local offset into the address mode of the generated load/store instructions as if the local address was contained. But lowering didn't actually contain the address so useless ADD instructions were still being generated.
|
FX diff summary: arm32: |
|
The couple of ARM32 regressions are due to different register choices that result in longer instruction encoding (e.g. r12 vs. r4). --- D:\Projects\jit\diffs\dasmset_2\base\System.Reflection.Metadata.dasm 2019-10-21 20:12:54.000000000 +-0300
+++ D:\Projects\jit\diffs\dasmset_2\diff\System.Reflection.Metadata.dasm 2019-10-21 20:12:54.000000000 +-0300
@@ -143152,10 +143120,9 @@
add r2, lr
- addw r12, sp, 24 // [V05 tmp1]
G_M23755_IG04: ;; bbWeight=2
- ldr r4, [r2]
- str r4, [sp+0x18] // [V05 tmp1]
- ldr r4, [r2+4]
- str r4, [sp+0x1c] // [V05 tmp1+0x04]
- ldr r4, [r2+8]
- str r4, [sp+0x20] // [V05 tmp1+0x08]
+ ldr r12, [r2]
+ str r12, [sp+0x18]
+ ldr r12, [r2+4]
+ str r12, [sp+0x1c]
+ ldr r12, [r2+8]
+ str r12, [sp+0x20]The diff is enormous so I cannot check all of it but all the diffs I saw look like above - a removed |
sandreenko
left a comment
There was a problem hiding this comment.
LGTM, thank you.
The CQ improvement is impressive.
| @@ -326,6 +326,20 @@ void Lowering::LowerBlockStore(GenTreeBlk* blkNode) | |||
| if (!blkNode->OperIs(GT_STORE_DYN_BLK) && (size <= CPBLK_UNROLL_LIMIT)) | |||
There was a problem hiding this comment.
while you are here could you please change the condition to if (!blkNode->OperIs(GT_STORE_BLK) && (size <= CPBLK_UNROLL_LIMIT))?
There was a problem hiding this comment.
Sure! But it can probably wait until the next PR (immediately after this one). Let's not kill the CI more than it's needed. 😁
Looks like it can be even better
if (obj) {
}
else if (blk && unroll) {
}
else {
}
The less nesting the better.
Extracted from #21711
When the source/destination address was local, genCodeForCpBlkUnroll was folding the local offset into the address mode of the generated load/store instructions as if the local address was contained. But lowering didn't actually contain the address so useless ADD instructions were still being generated.