-
Notifications
You must be signed in to change notification settings - Fork 5.3k
[WIP] Stop using LIST nodes for SIMD operand lists #1141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
782b105 to
7c30d0d
Compare
|
@CarolEidt @sandreenko @echesakovMSFT This completes the removal of For We have space for 3 operands inside the SIMD node itself, that's good because it covers 99% of the intrinsic needs. For 4 operands or more (e.g. union {
Use m_inlineUses[3];
Use* m_uses;
};For some HW intrinsics the number of operands isn't fixed and in the current implementation it is computed by counting the number of nodes in the list. That doesn't work in the array implementation so we also need to store the number of operands in the node. uint16_t m_numOps;With these, the operands can be accessed using the following API: unsigned GetNumOps();
GenTree* GetOp(unsigned index);
void SetOp(unsigned index, GenTree* node);
UseArray Uses(); // range-based for loop support
Use& GetUse(unsigned index);So:
Other issues worth mentioning: Biggest remaining issue: Comments? |
958fa15 to
2c1feee
Compare
|
General Comments (will do more detailed review next):
I'm not sure how much simplification this would buy us, and would certainly cost some non-trivial implementation work, if not additional complexity. On the issue of 0-based vs 1-based indexing, I would strongly favor maintaining 0-based indexing. I'm sure that I would be more confused by having to remember that the indexed form is 1-based to match the operand names. But I'd be interested in others' opinions on this.
The only thing that's unfortunate about this is more conceptual than practical. That is, one would like to consider (most of?) these to have functional operator semantics, which to me seems to be implied by On the surface, the idea of sharing more between
Agreed. I'm sure I'm one of the guilty parties in getting us here. |
I think it would have been useful to represent true unary/binary intrinsics using What's there to lose by not using |
Ha ha, everyone is, me included :). Sometimes it's difficult to figure out when to move away from existing code base practices. Doing so can make new code better at the cost of becoming inconsistent with the old code. |
CarolEidt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the direction looks good to me, with some comments, suggestions and questions.
src/coreclr/src/jit/gentree.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to reduce confusion it might be good to give this a different name such as GetIndexedOp or GetOpAtIndex, even though they're verbose it would at least reduce confusion with GetOp1 and GetOp2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense but I'm a bit concerned about the longer names as these are commonly used functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I get that - I'd be interested in others' thoughts on this. @dotnet/jit-contrib
I agree; there's not really a lot of actual value there.
The fact that the SIMD and HWIntrinsic enums are distinct seems like something that could be distinguished based on opcode. That is, if I have a |
Well, yes, this problem is solvable: struct GenTreeJitIntrinsic {
private:
union {
Use m_inlineUses[3];
Use* m_uses;
};
uint16_t m_numOps;
protected:
uint16_t m_intrinsic;
uint32_t m_unused;
};
struct GenTreeSIMD : public GenTreeJitIntrinsic {
SIMDIntrinsicID GetIntrinsic() const {
return static_cast<SIMDIntrinsicID>(m_intrinsic);
}
unsigned GetSIMDSize() const {
return (m_unused & 0xFFFF);
}
var_types GetSIMDBaseType() const {
return static_cast<var_types>((m_unused >> 16) & 0xFF);
}
};and similar The only problem is that I need to change a couple more hundreds of lines to replace |
Speaking of Also, I don't like the current intrinsic situation very much:
I've no idea if and when this situation could be improved. In the meantime we should ensure we're not making it worse somehow. |
src/coreclr/src/jit/gentree.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to double check this, it's not equivalent to the old gtSetListOrder code. On the other hand it's not clear if what gtSetListOrder did makes sense for SIMDIntrinsicInitN. Also, when looking more closely at the way SIMDIntrinsicInitN and gtSetEvalOrder work it seems that things are quite messy:
SIMDIntrinsicInitNhas horrible register requirements - up to 5 registers - because it needs to first evaluate all operands and then it stitches everything together. This approach will never fly if we try to make intrinsics forCreatemethods likeVector128<byte> Create(byte e0, byte e1, byte e2, byte e3, byte e4, byte e5, byte e6, byte e7, byte e8, byte e9, byte e10, byte e11, byte e12, byte e13, byte e14, byte e15)...gtSetEvalOrderruns before other optimizations so it has no idea that some trees may turn into constants or variables thus significantly reducing register requirements. This includesSIMDIntrinsicInitNbecoming a pseudo-constant in lowering...- We can't control the evaluation order for nodes with more than 2 operands unless we add more information to
GenTreeSIMD. And we're more or less out of space for any new information (e.g. an array of integers that would indicate the execution order of eveyr operand).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tweaked "level" and costs a bit but "level" is still different from what gtSetListOrder does. But what gtSetListOrder does is dubious anyway:
runtime/src/coreclr/src/jit/gentree.cpp
Lines 2522 to 2533 in c1a51e2
| if (lvl < 1) | |
| { | |
| level = nxtlvl; | |
| } | |
| else if (lvl == nxtlvl) | |
| { | |
| level = lvl + 1; | |
| } | |
| else | |
| { | |
| level = lvl; | |
| } |
(l1 == l2) ? (l1 + 1) : max(l1, l2) and there's no trace of max anywhere in gtSetListOrder. I suspect max was supposed to be result of setting GTF_REVERSE_OPS and swapping levels:runtime/src/coreclr/src/jit/gentree.cpp
Lines 2501 to 2508 in c1a51e2
| if (list->gtFlags & GTF_REVERSE_OPS) | |
| { | |
| unsigned tmpl; | |
| tmpl = lvl; | |
| lvl = nxtlvl; | |
| nxtlvl = tmpl; | |
| } |
GTF_REVERSE_OPS is never set on list nodes.Whatever. Doesn't really matter and
gtSetEvalOrder is a mess anyway.
|
I think that for now I'm going to ignore the Instead I'm going to work on removing the far worse pre-existing duplication produced by the bajillion custom tree traversals that exist today. With |
Seems reasonable for now.
Those look quite promising, though when you're ready for final review it would be nice to limit this PR to a more minimal set of changes. |
With the fact that GenTreeHWIntrinsic are not longer GenTreeOp, can't we forbid bashing and changing them to other tree operands completely? It will allow us to allocate exact size for each HWIntrinsic node and we will be able to make a template parameter that means number of arguments up to 32.
I am not sure why the new sizes will be between SMALL and LARGE, can they be smaller than SMALL and larger than LARGE? |
I suppose we could but I'm not sure we want to go to such an extreme. Currently SIMD/HWINTRINSIC trees are not optimized but perhaps in the future we want do something about that. At a minimum, we should be able to convert such nodes to
Hmm, I don't think there's any real reason to limit the node size to LARGE, not sure what I was thinking. I think all nodes should be at least SMALL size so we can convert anything to |
We always can do a replacement instead of bashing. |
It depends. The JIR IR wasn't really designed for that, there are cases where you'll need to use |
Contributes to https://github.com/dotnet/coreclr/issues/19876