Fix genCodeForIndexAddr #22528
FX minopts diff summary: Improvements come from using 32 bit compares (that may no longer need a REX prefix) and use of 3 operand IMUL. Regressions come from the extra
x86 has only improvements from the 3 operand IMUL: |
This does some weird things - treats the array length as 64 bit when it's in fact 32 bit, fails to zero extend TYP_INT indices, creates new GT_IND/GT_LEA nodes out of thin air.
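To illustrate the first two points, here is a minimal sketch, not the JIT's code. It assumes a 64-bit register whose low 32 bits hold a TYP_INT index and whose upper 32 bits are unspecified. `Reg64`, `inBounds64`, `inBounds32` and `elemAddr` are hypothetical names used only for this model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit register holding a TYP_INT value:
// only the low 32 bits are meaningful, the upper 32 bits may be garbage.
struct Reg64
{
    std::uint64_t bits;
};

// Bounds check done as a 64-bit unsigned compare (models `cmp` with EA_8BYTE).
// Garbage in the upper half of the index changes the result.
bool inBounds64(Reg64 index, Reg64 length)
{
    return index.bits < length.bits;
}

// Bounds check done as a 32-bit unsigned compare (models `cmp` with EA_4BYTE).
// Only the meaningful low halves take part.
bool inBounds32(Reg64 index, Reg64 length)
{
    return static_cast<std::uint32_t>(index.bits) < static_cast<std::uint32_t>(length.bits);
}

// Element address as a 64-bit LEA would compute it, after zero extending
// the 32-bit index: base + zext(index) * elemSize + firstElemOffset.
std::uint64_t elemAddr(std::uint64_t base, Reg64 index, std::uint32_t elemSize, std::uint32_t firstElemOffset)
{
    std::uint64_t widened = static_cast<std::uint32_t>(index.bits); // zero extension
    return base + widened * elemSize + firstElemOffset;
}
```

With an index register of `0xDEADBEEF00000002` and a length of 10, the 64-bit compare reports the index as out of range, while the 32-bit compare accepts index 2 as expected.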
|
@dotnet/jit-contrib |
| getEmitter()->emitInsLoadInd(ins_Load(TYP_INT), EA_4BYTE, arrLen.gtRegNum, &arrLen); | ||
| tmpReg = node->GetSingleTempReg(); | ||
| getEmitter()->emitIns_R_AR(INS_mov, EA_4BYTE, tmpReg, baseReg, static_cast<int>(node->gtLenOffset)); | ||
| getEmitter()->emitIns_R_R(INS_cmp, EA_8BYTE, indexReg, tmpReg); |
Nice. I much prefer not modifying the register-related properties during codegen.
|
test OSX10.12 x64 Checked Innerloop Build and Test |
| // LEA needs 64-bit operands so we need to widen the index if it's TYP_INT. | ||
| // Since it's TYP_INT the upper 32 bits aren't used so we should be able | ||
| // to widen in place, without needing a temporary register. | ||
| getEmitter()->emitIns_R_R(INS_mov, EA_4BYTE, indexReg, indexReg); |
I'm a bit concerned about this. Sure, it's a TYP_INT value so the upper 32 bits should not be used.
But what if LSRA in the future does something like assign a 64 bit register variable to a 32 bit use - a[(int)longVarReg] - and this code ends up zeroing out the upper 32 bits of longVarReg.
@CarolEidt Opinions ?
Ah, very good point. I believe that today that would be a cast, so we would only reuse the register if the source was a last use (and index would be a cast). To be conservative, one could assert that index is not a register candidate lclVar, or it is a TYP_INT, with a comment that, in future, if we make the cast "implicit" for this case, we need to use a temp if the lclVar is not a last use.
Looks like this kind of register "reinterpretation" already occurs, but in a different manner:
N014 ( 2, 3) [000007] ------------ t7 = * CAST long <- int REG rdx
/--* t5 long
+--* t7 long
N016 ( 10, 11) [000008] ---X-------- t8 = * ADD long REG rax
/--* t8 long
N018 ( 14, 14) [000010] DA-X-------- * STORE_LCL_VAR long V02 tmp1 NA REG NA
N020 ( 3, 4) [000018] -c--G--N---- t18 = CLS_VAR_ADDR byref Hnd=0xe8001558 REG NA
; V02 is long but here it is used as int and of course, the use gets the same register
N022 ( 3, 2) [000013] C----------- t13 = LCL_VAR int V02 tmp1 rax REG rax
N024 ( 1, 1) [000014] -c---------- t14 = CNS_INT int 3 REG NA
/--* t13 int
+--* t14 int
N026 ( 8, 6) [000016] ------------ t16 = * MUL int REG rax
This happens even in minopts, so it looks like I'll need to actually allocate and use the temp register when the array index needs to be widened; it cannot be done in place.
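A rough model of the hazard discussed in this thread, as a sketch only: it assumes a long variable lives in a register that is also read as an int index. `Regs`, `widenInPlace` and `widenIntoTemp` are hypothetical names, not JIT functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register file: rax holds a long variable that is also
// used as a TYP_INT index, tmp is a separately allocated temp register.
struct Regs
{
    std::uint64_t rax;
    std::uint64_t tmp;
};

// Models `mov eax, eax`: a 32-bit register move zero extends into the
// full 64-bit register, so the upper half of the long value is lost.
void widenInPlace(Regs& r)
{
    r.rax = static_cast<std::uint32_t>(r.rax);
}

// Models `mov tmp32, eax`: the widened index lands in the temp and the
// original 64-bit value in rax is left untouched.
void widenIntoTemp(Regs& r)
{
    r.tmp = static_cast<std::uint32_t>(r.rax);
}
```

Starting from `rax = 0x0000000100000005`, widening in place leaves `rax` equal to 5, so any later use of the long variable reads the wrong value. Widening into the temp yields the same index without disturbing `rax`.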
Thanks for investigating this. This is rather troubling, but probably too fundamental to be easily changed.
Yes, IMO it would be better to not have LCLVAR nodes with a type other than that of the variable itself, but I'm not sure if it's possible to avoid all the CQ fallout from adding the required cast nodes. I would guess it should be possible to make those cast nodes contained, though it's probably a bit complicated.
But what bugs me the most is that this kind of reinterpretation occurs even in minopts mode.
|
Added a commit to deal with the index widening issue. New x64 FX diff: Worse than before. Oh well, it's minopts code. |
|
x86 FX diff improved: It turns out that LSRA build code was also messed up, it was allocating a temp register whenever the index type was |
|
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
|
Meh, OSX and java.nio.channels.ClosedChannelException again... @dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test |
Fix genCodeForIndexAddr Commit migrated from dotnet/coreclr@e3d4b9c
This does some weird things - treats the array length as 64 bit when it's in fact 32 bit, fails to zero extend TYP_INT indices, creates new GT_IND/GT_LEA nodes out of thin air.
#20126 has a similar fix for ARM.
Sample generated code for a[i + 2]:
It's possible to construct contrived examples where the upper 32 bits are not zero: a[(int)checked(longVar + 2)]: