Create RefPositions without TreeNodeInfo#16517
Conversation
|
This results in the following throughput improvements (crossgen of SPC.dll) as measured by pin instruction count:
|
|
@dotnet/jit-contrib PTAL (for preliminary feedback) |
| } | ||
| #endif | ||
| break; | ||
| RefPosition* def = BuildDef(tree); |
There was a problem hiding this comment.
Did you intend to use the def variable for anything?
There was a problem hiding this comment.
No; at one point I had thought I would need to keep the defs around, but realized it was not necessary.
| assert(info->dstCount == 1); | ||
| srcCount = 0; | ||
| assert(dstCount == 1); | ||
| dstCandidates = RBM_NONE; |
There was a problem hiding this comment.
Placing this in a "else" ifdef would be better IMO
| // to special case them. | ||
| // These tree nodes will have their op1 marked as isDelayFree=true. | ||
| // That is, op1's reg remains in use until the subsequent instruction. | ||
| GenTree* addr = tree->gtOp.gtOp1; |
There was a problem hiding this comment.
I would suggest using functions like gtGetOp1 and gtGetOp2 consistently
There was a problem hiding this comment.
Will do; I was programmed to avoid it back before we added gtGetOp2IfPresent(), and I guess I'm a creature of habit.
| dstCount = 1; | ||
| if (!data->isContained()) | ||
| { | ||
| RefPosition* dataUse = dataUse = BuildUse(data); |
There was a problem hiding this comment.
There's something odd with this line, dataUse appears twice. And it's not used anywhere anyway.
There was a problem hiding this comment.
I mistakenly was initially setting delay free on the data, not addr, which was a mistake - no clue about the double assignment, though.
| for (; sourceInfo != nullptr; sourceInfo = sourceInfo->Next()) | ||
| srcCandidates = allRegs(TYP_INT) & ~RBM_RCX; | ||
| dstCandidates = allRegs(TYP_INT) & ~RBM_RCX; | ||
| if (tree->IsReverseOp()) |
There was a problem hiding this comment.
Reverse op? Wasn't this removed from LIR?
There was a problem hiding this comment.
Well, I feel a little silly - I had either forgotten or had missed that this happened. There's code in LSRA that attempts to handle it, but I see that the Rationalizer is clearing it on all nodes, and CheckLIR is validating it. I added an issue #16528
|
|
||
| assert(!call->isContained()); | ||
| info->srcCount = 0; | ||
| srcCount = 0; |
| } | ||
| if (!tree->isContained()) | ||
| { | ||
| info->srcCount = srcCount; | ||
| RefPosition* def = BuildDef(tree, dstCandidates); |
| bool isUnsignedMultiply = ((tree->gtFlags & GTF_UNSIGNED) != 0); | ||
| bool requiresOverflowCheck = tree->gtOverflowEx(); | ||
| // Only non-floating point mul has special requirements | ||
| if (!varTypeIsFloating(tree->TypeGet())) |
There was a problem hiding this comment.
Make it consistent with BuildDivMod that does if (is float) { build simple; return; }? Or move the float check inside the switch case and rename BuildMul and BuildDivMod to BuildIntegerMul and BuildIntegerDivMod?
Note that FP add/sub/mul/div use the same instruction format but they're handled more or less differently throughout lower/ra/codegen. I have a PR to clean this up but I was waiting for your lsra refactoring to finish it.
| reinterpret_cast<LocationInfoListNode*>(compiler->compGetMem(preallocateSize, CMK_LSRA)); | ||
| size_t preallocateSize = sizeof(RefInfoListNode) * preallocate; | ||
| RefInfoListNode* preallocatedNodes = | ||
| reinterpret_cast<RefInfoListNode*>(compiler->compGetMem(preallocateSize, CMK_LSRA)); |
There was a problem hiding this comment.
Existing code but note that casting from void* to X* is normally done via static_cast. reinterpret_cast is usually reserved for "weird" stuff such as casting int* to float*.
|
@mikedn - thanks so much for the quick and thorough review; it's really appreciated! Clearly I need to fix some bugs ;-) and will incorporate your feedback in the next round. |
Far from thorough, I've only took a quick look in the morning and then went to work. Maybe I'll take another look this evening :) |
| } | ||
|
|
||
| //------------------------------------------------------------------------ | ||
| // getKillSetForMul: Determine the liveness kill set for a mod or div node. |
There was a problem hiding this comment.
Wrong function name in comment. Applies to subsequent functions as well.
| case GT_MUL_LONG: | ||
| #endif | ||
| killMask = RBM_RAX | RBM_RDX; | ||
| killMask = getKillSetForMul(tree->AsOp()); |
There was a problem hiding this comment.
Not an issue with this particular line per se but this whole function seemed a bit strange to me when I looked at it in the past. I would have expected BuildNode/TreeNodeInfoInit to somehow deal with this instead of having this whole switch here, it looks like a special case.
Looking now through this function's uses I see that one is likely redundant: buildUpperVectorSaveRefPositions is only called from BuildDefsWithKills and this one has a killMask that's supposed to be the same as the one returned by getKillSetForNode due to an assert.
And then there's the said assert in BuildDefsWithKills. Does this function actually needs the killMask parameter?
Apparently the only "real" use of getKillSetForNode is in some stress related code in buildRefPositionsForNode. Maybe there's should be another way to communicate the kill mask from build node (e.g. via a class member similar to pendingDelayFree)?
This may then allow moving functions like getKillSetForMul to where they really belong - lsraxarch.cpp.
There was a problem hiding this comment.
I don't think it's necessary to maintain the kill mask as a class member. I'm making the general getKillSetForNode DEBUG-only; it will only be used in the assert and in the stress modes where we constrain nodes.
| #ifdef _TARGET_XARCH_ | ||
| if (tgtPrefUse != nullptr) | ||
| { | ||
| defRefPosition->getInterval()->assignRelatedIntervalIfUnassigned(tgtPrefUse->getInterval()); |
There was a problem hiding this comment.
Isn't defRefPosition->getInterval() same as interval?
| RefPosition* useRefPos = newRefPosition(interval, currentLoc, RefTypeUse, operand, candidates, multiRegIdx); | ||
| if (regOptional) | ||
| { | ||
| useRefPos->setAllocateIfProfitable(true); |
There was a problem hiding this comment.
It might be better performance wise to always call setAllocateIfProfitable, always setting that bit might very well be cheaper than a branch.
| // The number of use RefPositions created | ||
| // | ||
| void LinearScan::BuildSimple(GenTree* tree) | ||
| int LinearScan::BuildSimple(GenTree* tree) |
There was a problem hiding this comment.
Is this function really useful? For example, I look at xarch build code and I see that it handles GT_CNS_INT, GT_CND_DBL, GT_CNS_LNG (huh? looks like this one shouldn't reach LSRA due to decomposition) so attempting to handle GTK_CONST in BuildSimple is probably useless.
In general, BuildNode's switch already handles a ton of opers, I might be better (from a performance point of view and even for clarity) to add whatever opers are missing to those BuildNode switches and make default case unreached.
There was a problem hiding this comment.
Sounds reasonable, but probably not something I want to undertake with this PR.
| { | ||
| #ifdef _TARGET_XARCH_ | ||
| RefPosition* tgtPrefUse = nullptr; | ||
| if (node->OperIsBinary() && isRMWRegOper(node)) |
There was a problem hiding this comment.
Similar to BuildSimple, it's not clear if this kind of special casing in a what's supposed to be general purpose function is useful.
It may be better to simply call BuildRMWUses directly from BuildNode's switch relevant cases (e.g. case GT_XOR) rather than calling BuildBinaryUses from many places and having to decide again what to do.
| int dstCount = 0; | ||
| regMaskTP dstCandidates = RBM_NONE; | ||
| regMaskTP killMask = RBM_NONE; | ||
| pendingDelayFree = false; |
There was a problem hiding this comment.
These member assignments should be visually separated from the local variable above, I'd move them above locals and add a blank line. Or, perhaps they should not even be in BuildNode but in its caller.
Also, the need for these data members is a bit unfortunate. It's kind of difficult to track down where they are assigned values and where they are used, the delay free ones in particular.
There was a problem hiding this comment.
I agree that they are somewhat unfortunate; I tried to come up with an approach that didn't require these data members on LinearScan, but could only come up with models that involved excessive passing around of state.
I'd prefer to keep them in BuildNode, however, as that's the main entry point for all the building functionality. I'll separate them and document them more clearly.
| assert(!tree->IsUnusedValue() || (dstCount != 0)); | ||
| assert(dstCount == tree->GetRegisterDstCount()); | ||
| INDEBUG(dumpNodeInfo(tree, dstCandidates, srcCount, dstCount)); | ||
| return srcCount; |
There was a problem hiding this comment.
It looks like the return value of this function is used only in an assert. I hope that assert is worth all the added complication :)
There was a problem hiding this comment.
Yes - it's something I think should probably be removed, but I found it useful for debugging and comparison as I was making the transition from TreeNodeInfo
| isLocalDefUse = true; | ||
| tree->SetUnusedValue(); | ||
| } | ||
| RefPosition* def = BuildDef(tree); |
There was a problem hiding this comment.
Vertical alignment of all assignments is the worst formatting rule the JIT code uses...
| #endif // _TARGET_X86_ | ||
| BuildDef(tree, dstCandidates); |
There was a problem hiding this comment.
Or just pass RBM_BYTE_REGS directly? It's RBM_ALLINT on x64 so it should do the right thing.
| int srcCount = 0; | ||
| int internalCount = 0; | ||
| const int MaxInternalCount = 4; | ||
| RefPosition* internalDefs[MaxInternalCount]; |
There was a problem hiding this comment.
Not used it seems. Same for internalCount above. Strange that they have the same name as 2 LinearScan data members.
There was a problem hiding this comment.
Yes; I had originally added these to each method needing internal temps, but it was quite a bit messier than handling it pseudo-globally. I'll delete these.
| bool hasMultiRegRetVal = false; | ||
| ReturnTypeDesc* retTypeDesc = nullptr; | ||
| RefPosition* internalDefs[MAX_ARG_REG_COUNT]; |
There was a problem hiding this comment.
This is populated in the code below but then it doesn't appear to be used.
| { | ||
| return; | ||
| } | ||
| int srcCount = BuildRMWUses(tree->AsOp()); |
There was a problem hiding this comment.
Hmm, calling BuildRMWUses without calling isRMWRegOper that has special handling for GT_MUL?
|
Hmm, I left a few comments related to RMW but I'm not sure I'm seeing the big picture, there may be a few issues that need to be considered:
Now, neither issue is directly related to this PR but you may want to consider the impact they have on your refactoring. It may be that |
| // Example1: GT_EQ(int, op1 of type ubyte, op2 of type ubyte) - in this case codegen uses | ||
| // ubyte as the result of comparison and if the result needs to be materialized into a reg | ||
| // simply zero extend it to TYP_INT size. Here is an example of generated code: | ||
| // cmp dl, byte ptr[addr mode] |
There was a problem hiding this comment.
The setcc instruction is missing from the example
There was a problem hiding this comment.
Also, I'm not sure why "in this case codegen uses ubyte as the result of comparison and if the result needs to be materialized into a reg simply zero extend it to TYP_INT size" appears here, this comment is more suitable where dstCandidates is set.
| // Example2: GT_EQ(int, op1 of type ubyte, op2 is GT_CNS_INT) - in this case codegen uses | ||
| // ubyte as the result of the comparison and if the result needs to be materialized into a reg | ||
| // simply zero extend it to TYP_INT size. | ||
| else if (varTypeIsByte(op1) && op2->IsCnsIntOrI()) |
There was a problem hiding this comment.
Why was this case added? Codegen should not attempt to generate a byte instruction unless both operands are byte. If the constant isn't a byte then an int compare instruction should be generated even if the first operand is a byte, it's supposed to have been extended to int.
There was a problem hiding this comment.
These cases where copied from LinearScan::ExcludeNonByteableRegisters(), previously in lsraxarch.cpp. It may be that these can be improved/eliminated, but I am aiming for zero-diffs for now.
| RefPosition* sourceHiUse = BuildUse(sourceHi, srcCandidates); | ||
|
|
||
| if (tree->OperGet() == GT_LSH_HI) | ||
| if (!tree->isContained()) |
There was a problem hiding this comment.
A shift node can be contained under a STOREIND when it is part of a memory RMW op. In most cases this is just handled in BuildIndir, but shift ops have special register requirements.
| unsigned size = blkNode->gtBlkSize; | ||
| GenTree* source = blkNode->Data(); | ||
| int srcCount = 0; | ||
| int internalCount = 0; |
| if(WIN32) | ||
| # set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELEASE " /LTCG /USEPROFILE:PGD=${ProfilePath}") | ||
| # set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELWITHDEBINFO " /LTCG /USEPROFILE:PGD=${ProfilePath}") | ||
| set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELEASE " /LTCG /USEPROFILE:PGD=${ProfilePath}") |
There was a problem hiding this comment.
Presumably you wanted to build without PGO and accidentally checked in the commented out lines? There's a build.cmd if you need that - -nopgooptimize.
e1d63a4 to
238a4dc
Compare
|
@dotnet/jit-contrib ping. |
This was fixed recently @dotnet-bot test Windows_NT x64_arm64_altjit Checked jitstressregs0x1000 |
|
The 'x64_arm64_altjit Checked jitstressregs3' failures also occur in master. |
|
@dotnet-bot test Windows_NT x64_arm64_altjit Checked jitstressregs3 |
BruceForstall
left a comment
There was a problem hiding this comment.
couple minor things noticed so far
| @@ -82,118 +82,117 @@ inline regMaskTP calleeSaveRegs(RegisterType rt) | |||
| //------------------------------------------------------------------------ | |||
| // LocationInfo: Captures the necessary information for a node that is "in-flight" | |||
| } | ||
| prevListNode = listNode; | ||
| } | ||
| assert(!"GetRefPosition didn't find the node"); |
| } | ||
| prevListNode = listNode; | ||
| } | ||
| assert(!"GetRefPosition didn't find the node"); |
| // | ||
| // Return Value: a register mask of the registers killed | ||
| // | ||
| regMaskTP LinearScan::getKillSetForProfilerHook() |
There was a problem hiding this comment.
Seems like this should be named getKillSetForProfilerHookTailcall
There was a problem hiding this comment.
Except that the node name is GT_PROF_HOOK, so this seems like the better name to me.
| static bool OperIsMul(genTreeOps gtOper) | ||
| { | ||
| return (gtOper == GT_MUL) || (gtOper == GT_MULHI) | ||
| #if !defined(_TARGET_64BIT_) && !defined(LEGACY_BACKEND) |
There was a problem hiding this comment.
This will be a conflict with @BruceForstall change to remove LEGACY_BACKEND
There was a problem hiding this comment.
Yes, there are probably more conflicts as well; that's how it goes ;-)
|
|
||
| if (op2 != nullptr) | ||
| { | ||
| return 2; |
There was a problem hiding this comment.
You may want to add:
assert(!op1->OperIsList());
There was a problem hiding this comment.
Are the scenarios where op1->OperIsList() and op2 != nullptr?
There was a problem hiding this comment.
No, if op1 is a list, op2 must be null.
| return 2; | ||
| } | ||
|
|
||
| if (op1 != nullptr) |
There was a problem hiding this comment.
nit: This would be easier to read (less nesting) if we did:
if (op1 == nullptr)
{
return 0;
}
if (op1->OperIsList())
{
// logic
}
else
{
return 1;
}
There was a problem hiding this comment.
Need to account for op2, but will restructure.
| @@ -2263,9 +2290,8 @@ void LinearScan::BuildSIMD(GenTreeSIMD* simdTree) | |||
| // Return Value: | |||
| // None. | |||
|
|
|||
| void LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree) | |||
| int LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree) | |||
There was a problem hiding this comment.
I cleaned up this method here: #18078
It might be worthwhile merging the PRs together (some of the changes were fixing "correctness" issues I found, and the other changes are required to support the FMA instruction set, where any of the three operands can be contained).
There was a problem hiding this comment.
The changes I did to support FMA are here: https://github.com/tannergooding/coreclr/commit/e402548087f2bf9554e83052911561c016c60346#diff-498fd4859cb147c9bf082f5ebf32ca8fR2507
(The PR isn't up as it requires the CoreFX change adding back the unimplemented APIs to the ref assembly).
There was a problem hiding this comment.
It might be worthwhile merging the PRs together
I think it might be easier to serialize them.
| list = list->Rest(); | ||
| } | ||
|
|
||
| assert(numArgs > 0); |
There was a problem hiding this comment.
Shouldn't this be: assert(numArgs >= 3) or are there cases where we (inefficiently) use a list for less than 2 args?
| { | ||
| info->srcCount = appendBinaryLocationInfoToList(tree->AsOp()); | ||
| assert((kind & GTK_SMPOP) != 0); | ||
| int srcCount = BuildBinaryUses(tree->AsOp()); |
There was a problem hiding this comment.
This definition hides the enclosing one; that seems wrong, or at least confusing.
There was a problem hiding this comment.
Fixed. (This default is rarely used, and @mikedn recommended we eliminate it, which is probably a good idea, but perhaps for another day.)
| } | ||
| #endif // DEBUG | ||
| int newDefListCount = defList.Count(); | ||
| int produce = newDefListCount - oldDefListCount; |
There was a problem hiding this comment.
So the only use (and assert using) produce is for amd64? Should you define this below, then?
There was a problem hiding this comment.
This assert was pre-existing, though before there were other uses of produce. However, I'd like to strengthen this assert to apply more broadly, but it requires some cleanup of the various nodes that produce multiple results. I've added a note to issue #13183 that deals with multi-reg nodes to strengthen this assert as well.
| { | ||
| internalCandidates = allRegs(TYP_INT); | ||
| } | ||
| printf(" +<TreeNodeInfo %d=%d %di %df", dstCount, srcCount, internalIntCount, internalFloatCount); |
There was a problem hiding this comment.
There are currently still many references in comments to the TreeNodeInfo struct -- and the type itself is still there. But the type is unused. Can you update the comments, and remove the references and type?
| // destination and internal [temp] register counts). | ||
| // | ||
| void LinearScan::BuildNode(GenTree* tree) | ||
| int LinearScan::BuildNode(GenTree* tree) |
There was a problem hiding this comment.
The return type isn't described in the comments above
| } | ||
|
|
||
| //------------------------------------------------------------------------ | ||
| // GetOperandInfo: Get the source registers for an operand that might be contained. | ||
| // BuildDef: Build one or more RefTypeDef RefPositions for the given node, |
There was a problem hiding this comment.
name in comment doesn't match function name
| } | ||
|
|
||
| //------------------------------------------------------------------------ | ||
| // GetOperandInfo: Get the source registers for an operand that might be contained. | ||
| // BuildDef: Build one or more RefTypeDef RefPositions for the given node |
There was a problem hiding this comment.
name in comment doesn't match function name
BruceForstall
left a comment
There was a problem hiding this comment.
LGTM
I've only had a few minor comments, but overall it looks good to me.
|
@dotnet-bot test Windows_NT x64 Formatting |
This is no longer used after dotnet#16517
|
@CarolEidt We need to update ryujit-overview.md that still refers to |
|
@erozenfeld - thanks for pointing that out. I'm going to open an issue and assign it to myself so that it doesn't get lost. |
This is no longer used after dotnet/coreclr#16517 Commit migrated from dotnet/coreclr@5d01d2d
This is the next phase of building RefPositions incrementally.
The big pictures is that, instead of creating
TreeNodeInfowith the register requirements for each node, theBuildmethods inLinearScanbuild theRefPositions directly, putting the defs in aDefListfor when the consuming node builds the corresponding uses.There are zero diffs for crossgen of frameworks & tests across all the x64 & x86 + altjits (arm64, arm and x64/ux), aside from a small number of improvements due to some RMW handling changes.