Stop using LIST nodes for FIELD_LIST#26800
Conversation
ee461d8 to
199b6f9
Compare
|
@sandreenko Does it sound reasonable to remove |
Yes, feel free to remove that.
Yes, also there were noway asserts and buggy code in assertion propagation that used it for correctness, that was cleaned. So now |
|
Thanks! (and oops, I asked the question in the wrong PR, it should have been in #24294) |
|
@dotnet/jit-contrib |
199b6f9 to
5e61a01
Compare
CarolEidt
left a comment
There was a problem hiding this comment.
Just comments and questions at this point, but I really love this change - I've never been a fan of the FIELD_LIST representation; I think it was one of the worst list cases.
| (void)new (comp, GT_FIELD_LIST) GenTreeFieldList(argHi, 4, TYP_INT, fieldList); | ||
| GenTreeFieldList* fieldList = new (comp, GT_FIELD_LIST) GenTreeFieldList(); | ||
| fieldList->gtUses = new (comp, CMK_ASTNode) GenTreeFieldList::Use(argLo, 0, TYP_INT); | ||
| fieldList->gtUses->SetNext(new (comp, CMK_ASTNode) GenTreeFieldList::Use(argHi, 4, TYP_INT)); |
There was a problem hiding this comment.
Did you consider adding a method to GenTreeFieldList to add fields?
There was a problem hiding this comment.
Yes, every time I did one of these "stop using GT_LIST" changes. The problem is that it's kind of risky to have something like "AddUse" because we only keep track of the head of the linked list and appending would require traversing the list to find the tail. I think there are 2 ways to handle this:
- add something similar to
gtNewCallArgs-gtNewFieldListor something like that - that hides away this manual list manipulation in those cases where the number of fields is statically known - add something like
UseListBuilder- a small class that keeps track of both the head and tail and can be used in the rest of the cases.
This should cover almost all cases. AFAIR the only place where we still need to do some manual linked list manipulation in long decomposition (and that's the main reason I maintained the existing linked list approach, otherwise arrays would be a very tempting alternative).
There was a problem hiding this comment.
Right. Another option would be to have constructor overloads for the known-number-of-fields cases. But I'm OK with deferring consideration of such a change.
There was a problem hiding this comment.
Though in the case of field list (and PHIs) I don't think there's anything stopping us from adding the tail of the list to the node, there's enough space in GenTreeFieldList for that. It's only calls that are problematic, they're large enough even without adding more pointers to them.
There was a problem hiding this comment.
Well, I added AddField and InsertField. Not sure I like all this too much, in part due to the side effect inheritance. Sure, GenTreeArgList did this automatically (and so do other GenTree derived classes). And I now realize that this happens even in LIR, where side effects are not supposed to be inherited. So I also added AddFieldLIR and InsertFieldLIR to deal with this case.
Opinions?
@sandreenko too
| GenTreeFieldList::Use(argx->gtOp.gtOp2, OFFSETOF__CORINFO_TypedReference__type, TYP_I_IMPL)); | ||
| fieldList->gtFlags |= | ||
| (fieldList->gtUses->GetNode()->gtFlags | fieldList->gtUses->GetNext()->GetNode()->gtFlags) & | ||
| GTF_ALL_EFFECT; |
There was a problem hiding this comment.
This flag inheritance was previously done by the GenTreeArgList constructor, and that's generally how it's done. It seems like it might be better to have an AddField method on GenTreeFieldList to take care of this.
| noway_assert(newArg->OperIs(GT_FIELD_LIST)); | ||
|
|
||
| // We need to propagate any GTF_ALL_EFFECT flags from the end of the list back to the beginning. | ||
| // This is verified in fgDebugCheckFlags(). |
There was a problem hiding this comment.
This comment could be updated; we only need to update the FIELD_LIST node, where previously we actually had to propagate the flags up.
|
|
||
| // First, insert the field node into the sorted list. | ||
| GenTreeFieldList* prev = nullptr; | ||
| for (GenTreeFieldList* cursor = head;; cursor = cursor->Rest()) |
There was a problem hiding this comment.
Sorting code was broken, it never updated prev. The reason why it worked is that the field list was already sorted by increasing offset so the bug here effectively turned this code into list reversal and we ended up getting the expected order. I don't see any reason to create field list with a different initial order so it seems preferable to just reverse the list and avoid the quadratic complexity concerns.
CarolEidt
left a comment
There was a problem hiding this comment.
This looks great now; thanks for the updates. Ping me when you're ready to remove the [WIP] and we can trigger the failing CI tests (for now, I believe there are still issues in the queue).
|
@CarolEidt Thanks, I've run diffs again (none) and done another pass checking for possible fallout from the change to Off to finish the #21711 monster 😁 |
PIN should work on x86, just run in from SysWOW64 cmd ("C:\Windows\SysWOW64\cmd.exe"), if it doesn't work then I will measure the PIN numbers for this change later today. |
|
@sandreenko AFAIR the problem I had with pin on x86 was that I couldn't get it to compile. If that worked for you I can try again one day. Thanks for the offer to run it! |
sandreenko
left a comment
There was a problem hiding this comment.
LGTM in general.
a few comments/questions.
PIN for x86 (release, no pgo, 5 runs average):
base: 22057891557,
diff: 22021891998,
improvement is 0.1%.
| use = use->GetNext(); | ||
| } | ||
|
|
||
| bool operator==(const UseIterator& other) |
There was a problem hiding this comment.
these operators should be const, shouldn't they?
There was a problem hiding this comment.
Though given that it took 2 retries to get CI clean it can probably wait until the next GT_LIST related PR.
| // | ||
| static bool Equals(GenTreeFieldList* list1, GenTreeFieldList* list2) | ||
| { | ||
| // TODO-Cleanup: The list type is actually ignored. It should always be TYP_STRUCT |
There was a problem hiding this comment.
what about gtFlags? I understand that it will be checked in the loop, but it could be a fast exit if they are not equal.
There was a problem hiding this comment.
Currently this is only called from Compare and that does its own flag check already. That check is kind of weird:
Lines 1248 to 1252 in f777668
Even if we want to be able to use Equals directly (IMO it would be a reasonable thing to do) it's not clear what flags should be checked.
|
|
||
| if (src->OperIs(GT_FIELD_LIST)) | ||
| { | ||
| #ifdef _TARGET_X86_ |
There was a problem hiding this comment.
Why the condition and #ifdef were switched?
so now x64 returns when it sees GT_FIELD_LIST in this method.
There was a problem hiding this comment.
Side effect of changing the field list type to TYP_STRUCT from TYP_INT or whatever happens to be the first field type.
With the original ifdef code was reaching an if (src->TypeGet() != TYP_STRUCT) that's below. Inside that if the only thing that happens is an attempt to contain constant operands so in the case of FIELD_LIST nothing really happened.
With the change to TYP_STRUCT, that if is no longer entered and instead some code that's expecting an OBJ is reached.
So it was always the case that FIELD_LIST was supposed to be ignored on x64 but the way this worked was a bit "by accident".
Thanks! It's interesting that PIN shows the same improvement as on x64. So it's very likely the improvements are indeed due to the removed field list checks rather than any field list handling code changes. I'm curious if we'll get anything more out of it when I remove the final |
* Stop using LIST nodes for FIELD_LIST * Change GT_FIELD_LIST type to TYP_STRUCT * Smaller GenTreeField::Use * Delete out of date comment * Cleanup LowerArg duplicated code * Add AddField/InsertField
Changes
GT_FIELD_LISTto be a special node with a list of "uses" for fields, similar toGT_PHI.Contributes to #19876
On x64 (which doesn't really use
GT_FIELD_LIST) PIN shows a 0.1% improvement. That's probably due to a few rationalizer checks that now only test forGT_LIST. It would probably be more interesting to see results for x86 but I couldn't get PIN to work on x86. On x86 there's a tiny drop in memory usage (0.03% or something like that).