Skip to content

Conversation

@kunalspathak
Copy link
Contributor

In #93223, we added StoreSelectedScalar intrinsic and some of the variants of that API that takes tuple as input needs consecutive registers. However, in the hwintrinsinc table, we were falsely marking the existing API as "NeedsConsecutiveRegister". Due to this, in jitstressregs, when we limit the available registers, we were prohibiting restricting the limited registers for a definition of internal temp, but were setting limitations on the use of that internal temp.

if (newRefPosition->isLiveAtConsecutiveRegistersLoc(consecutiveRegistersLocation))
{
// If we are assigning to refPositions that has consecutive registers requirements, skip the
// limit stress for them, because there are high chances that many registers are busy for
// consecutive requirements and
// we do not have enough remaining for other refpositions (like operands). Likewise, skip for the
// definition node that comes after that, for which, all the registers are in "delayRegFree" state.
}
else
#endif // TARGET_ARM64
{
newRefPosition->registerAssignment =
getConstrainedRegMask(newRefPosition, oldAssignment, calleeSaveMask, minRegCountForRef);
}

In codegen, we expect to see a single bit set for "one register" assigned to the temp, but in this case, we were assiging a different "copy" register to the use.

[000007] 20.#10 I6   Def    ORDER(A) x0   │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
         20.#11 I0   Use *  Keep     x19  │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
         20.#12 I1   Use *  Keep     d8   │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
         20.#13 I5   Use *  Keep     x20  │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
──────────────────────────────────────────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
TreeID   LocRP# Name Type  Action    Reg  │x0  │x1  │x2  │x3  │x4  │x5  │x6  │x7  │x19 │x20 │x21 │d0  │d1  │d2  │d8  │d9  │
──────────────────────────────────────────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
         20.#14 I6   Use *  ORDER(C) x21  │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│I6 a│    │    │    │I1 a│    │

I have fixed by introducing new intrinsic ids for various StoreSelectedScalarVectorNxM variants, similar to how we do for LoadAndInsertScalarVectorNxM APIs. For existing StoreSelectedScalar API that doesn't need consecutive registers, we now do not mark with that flag anymore.

Fixes: #95025

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 21, 2023
@ghost ghost assigned kunalspathak Nov 21, 2023
@ghost
Copy link

ghost commented Nov 21, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

In #93223, we added StoreSelectedScalar intrinsic and some of the variants of that API that takes tuple as input needs consecutive registers. However, in the hwintrinsinc table, we were falsely marking the existing API as "NeedsConsecutiveRegister". Due to this, in jitstressregs, when we limit the available registers, we were prohibiting restricting the limited registers for a definition of internal temp, but were setting limitations on the use of that internal temp.

if (newRefPosition->isLiveAtConsecutiveRegistersLoc(consecutiveRegistersLocation))
{
// If we are assigning to refPositions that has consecutive registers requirements, skip the
// limit stress for them, because there are high chances that many registers are busy for
// consecutive requirements and
// we do not have enough remaining for other refpositions (like operands). Likewise, skip for the
// definition node that comes after that, for which, all the registers are in "delayRegFree" state.
}
else
#endif // TARGET_ARM64
{
newRefPosition->registerAssignment =
getConstrainedRegMask(newRefPosition, oldAssignment, calleeSaveMask, minRegCountForRef);
}

In codegen, we expect to see a single bit set for "one register" assigned to the temp, but in this case, we were assiging a different "copy" register to the use.

[000007] 20.#10 I6   Def    ORDER(A) x0   │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
         20.#11 I0   Use *  Keep     x19  │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
         20.#12 I1   Use *  Keep     d8   │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
         20.#13 I5   Use *  Keep     x20  │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│    │    │    │I1 a│    │
──────────────────────────────────────────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
TreeID   LocRP# Name Type  Action    Reg  │x0  │x1  │x2  │x3  │x4  │x5  │x6  │x7  │x19 │x20 │x21 │d0  │d1  │d2  │d8  │d9  │
──────────────────────────────────────────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┤
         20.#14 I6   Use *  ORDER(C) x21  │I6 a│    │    │    │    │    │    │    │I0 a│I5 a│I6 a│    │    │    │I1 a│    │

I have fixed by introducing new intrinsic ids for various StoreSelectedScalarVectorNxM variants, similar to how we do for LoadAndInsertScalarVectorNxM APIs. For existing StoreSelectedScalar API that doesn't need consecutive registers, we now do not mark with that flag anymore.

Fixes: #95025

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Contributor Author

@SwapnilGaikwad

@SwapnilGaikwad
Copy link
Contributor

The change LGTM! Thanks for fixing it.

@kunalspathak kunalspathak marked this pull request as ready for review November 21, 2023 17:51
@kunalspathak
Copy link
Contributor Author

@dotnet/jit-contrib

@kunalspathak kunalspathak merged commit 52670e8 into dotnet:main Nov 21, 2023
@kunalspathak kunalspathak deleted the store-selected-scalar branch November 21, 2023 21:14
@kunalspathak kunalspathak added arm-sve Work related to arm64 SVE/SVE2 support and removed arm-sve Work related to arm64 SVE/SVE2 support labels Nov 22, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Assertion failed 'genCountBits(availableSet) == 1'

3 participants