Improve init/copy block codegen#21711
Conversation
a4a2204 to
512b6ca
Compare
|
x64: x86: arm64: arm32: Not sure what Microsoft.Diagnostics.Tracing.TraceEvent is doing. Presumably it has a secret plan to take over the world by copying itself tens of thousands of times :) |
c3a516f to
6377fdd
Compare
6377fdd to
f735ccc
Compare
4461b9b to
db5d596
Compare
448d186 to
be19099
Compare
81dde33 to
49dd036
Compare
2985116 to
1b99f2d
Compare
801a7d6 to
04544c7
Compare
3cf090a to
698b556
Compare
4b694f9 to
de44e29
Compare
e440d2f to
c47445e
Compare
14604e7 to
44466c2
Compare
2d91c84 to
65e840a
Compare
65e840a to
e00ac4f
Compare
|
Whew, after a bunch of cleanup PRs I think this is done now. There are still CQ improvements that could be done but that's all I have at the moment. |
Could you please update the header with was is left in this PR? |
Right, it was still mentioning stuff done in other PRs or stuff that wasn't done at the time of its writing but that now is. |
|
/azp run coreclr-outerloop |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| } | ||
| #endif | ||
|
|
||
| if (!IsSafeToContainMem(blkNode, addr)) |
There was a problem hiding this comment.
From the description of IsSafeToContainMem it is not obvious that it allows non-immediate children, but from the code it is obvious that it does so calls like ContainBlockStoreAddress(blkNode, size, src->AsIndir()->Addr()); are correct.
There was a problem hiding this comment.
Yeah, the parent/child naming doesn't make a lot of sense for IsSafeToContainMem. This more or less checks if it's safe to move a "childNode" forward to the "parentNode" point. The parent/child appeared because it was used for containment.
| return; | ||
| } | ||
|
|
||
| if (!addr->OperIs(GT_ADD) || addr->gtOverflow() || !addr->AsOp()->gtGetOp2()->OperIs(GT_CNS_INT)) |
There was a problem hiding this comment.
Why TryCreateAddrMode can't be used here as it happens on XARCH?
I see that it does different checks, but it is unclear if this case is unique and other calls to TryCreateAddrMode on arm are correct or all other calls to ``TryCreateAddrModeneed to be changed to checkif ((size >= 2 * REGSIZE_BYTES) && (offset % REGSIZE_BYTES != 0))` for example.
There was a problem hiding this comment.
The address mode situation on ARM is pretty messy - TryCreateAddrMode (well, genCreateAddrMode really), was written for x86 and despite having tons of ifdefs in it (and I've rarely seen such an abuse of ifdefs) still generates address modes that aren't really valid on ARM and relies on the emitter to sort things out - for example:
Line 11823 in 5a381e7
And the emitter sorts things out only in certain cases (if you use emitInsLoadStoreOp and that only works with GT_IND nodes, does not support LDP/STP, requires temporary registers) so it's not suitable for block init/copy purposes.
So I preferred to avoid TryCreateAddrMode. It may be possible to use it to create an address mode node and then do extra checks before containing it. But there doesn't seem to be any point in doing that - it's a lot of extra code that needs to run only to produce a LEA that may not be useful and the original ADD should have been left alone.
This improved block op codegen by better containment of the source/destination address in the unrolled case:
Enable block init unroll on ARM32 (it was already enabled on ARM64 and the differences between the 2 architectures are trivial).(Done in Enable block init unroll on ARM32 #27450)Fix block copy local address containment on ARM. Block store lowering didn't mark local addresses as contained yet the codegen did containment on its own by ignoring the address computed in a register and instead emitting load/stores using the frame pointer register directly.(Done in Fix block store local address containment on ARM #27338)Examples:
With this change the following code is generated instead:
This also applies to array elements and local variables. For example, initializing a struct local var may currently generate:
After this change the generated code is: