Skip to content

Conversation

@jakobbotsch
Copy link
Member

We can almost always avoid allocating an internal register, which may be expensive if lr is picked since we cannot use thumb encoding for it.

The only case where we need an internal register is when we have a source that is in a register, and we have a single taget register to place that conflicts with that source register. The to-stack copies then need an intermediate scratch register to avoid clobbering the source register.

We can almost always avoid allocating an internal register, which may be
expensive if lr is picked since we cannot use thumb encoding for it.

The only case where we need an internal register is when we have a
source that is in a register, and we have a single taget register to
place that conflicts with that source register. The to-stack copies then
need an intermediate scratch register to avoid clobbering the source
register.
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 30, 2022
@ghost ghost assigned jakobbotsch Sep 30, 2022
@jakobbotsch
Copy link
Member Author

Should fix regression seen in #76017 (comment).

@ghost
Copy link

ghost commented Sep 30, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We can almost always avoid allocating an internal register, which may be expensive if lr is picked since we cannot use thumb encoding for it.

The only case where we need an internal register is when we have a source that is in a register, and we have a single taget register to place that conflicts with that source register. The to-stack copies then need an intermediate scratch register to avoid clobbering the source register.

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

Nice arm32 diffs::

Collection Base size (bytes) Diff size (bytes)
benchmarks.run.Linux.arm.checked.mch 16,418,348 -1,940
coreclr_tests.run.Linux.arm.checked.mch 268,893,112 -94,424
libraries.crossgen2.Linux.arm.checked.mch 20,903,482 -4,048
libraries.pmi.Linux.arm.checked.mch 45,081,540 -11,660
libraries_tests.pmi.Linux.arm.checked.mch 115,952,560 -31,282

cc @dotnet/jit-contrib PTAL @BruceForstall

}
}

// Find first register to place. If we are placing addrReg, then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that I understand, we basically pick the first register to place after addrReg and round robin back so addrReg is placed last?

For r0, r1, addrReg, r3, r4, we will place: r3, r4, r0, r1, addrReg?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, can you call that out in comment with an example. Initially it was not clear to me, if that's what we meant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I'll add a comment like you suggested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL

{
if (treeNode->GetRegNumByIdx(i) == addrReg)
{
firstRegToPlace = i + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
firstRegToPlace = i + 1;
firstRegToPlace = (i + 1) % treeNode->gtNumRegs;

That way, you can remove the code from below loop:

            if (curRegIndex == treeNode->gtNumRegs)
            {
                curRegIndex  = 0;
                structOffset = 0;
            }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still needs to be handled in below loop (since the loop there also increments curRegIndex). So I think I'll leave it as is.

Copy link
Contributor

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jakobbotsch jakobbotsch merged commit 452fd84 into dotnet:main Oct 1, 2022
@jakobbotsch jakobbotsch deleted the avoid-internal-reg-PUTARG_SPLIT branch October 1, 2022 09:48
@ghost ghost locked as resolved and limited conversation to collaborators Oct 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants