-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Don't retype struct as primitive types in import. #33225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
96ff2ec to
6cb6b6c
Compare
|
PTAL @CarolEidt @dotnet/jit-contrib, I think that is ready for the first round of review. |
f7875df to
2660e14
Compare
CarolEidt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good.
It's awesome to remove the retyping from the front-end, but I think that in Lowering when we have something that's returned in a single register, we should go ahead and retype. I think it would simplify things and would not lose information that the backend would need.
src/coreclr/src/jit/lower.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see us, perhaps in future, use BITCAST only where we actually require that the bits get moved to a different register file. We should be able to handle those cases much like the way multireg returns are handled. In fact, it's not clear why we need separate handling for that.
25f7779 to
2e00f95
Compare
|
I think this is ready for the second round, maybe expect No diffs if but the biggest part of it due to I have checked SPMI/crossgen/pmi locally. I will kick appropriate CI testing this night. |
src/coreclr/src/jit/gentree.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not happy with that solution, but the cost of having fieldSeq with information about overlapping fields was too high, both in terms of TP (checking the new flag in each access to it), in terms of memory consumption (make it 24 bytes instead of 16) and in implementation cost(there are dozens of places where we check for NotAField, many of them are very old and it was unclear how they should work with the new type.
Also, we don't often have overlapping fields, so it felt like doing I was spending too much on such a rare case. The frameworks showed -12 bytes of improvements from my fix that was an awful regression in TP and memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test uses a class and a struct to do the same logic, guess what case we optimize better. I should probably create an issue about that, but again, overlapping fields are rare.
Hm, I think we are thinking about two different possible representations, with these changes we have the following new nodes in lowering: I want to keep this struct representation visible even after lower, so for Another approach would be to produce the old LIR after lowering, meaning replace all such |
|
ping @dotnet/jit-contrib |
CarolEidt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still just a bit apprehensive about using GT_BITCAST for structs, but I think the idea is growing on me.
I'd like to better understand why the elimination of retyping is tied to FEATURE_MULTIREG_RET but otherwise it loosk good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will disable this for x64/windows and not for any other targets - is that due to the regressions you described in your PR comments? I'm not sure why this would be tied to FEATURE_MULTIREG_RET.
Also, sorry for the focus on naming - but I think it might be good to give this a name that more clearly reflects that it's the "old" or "bad" way. "Allow" sounds too nice. I always described this as "lying about the types" but maybe that's just a bit too negative. Maybe "JitDoOldStructRetyping"? Or is that too verbose and/or negative?
2e00f95 to
c215a49
Compare
c215a49 to
e131051
Compare
|
How will this impact |
I'll let @sandreenko reply to the specific question of how this work impacts those types (I think it's largely agnostic to those types). In future, however, now that we have a |
The change is agnostic to |
e131051 to
6cc7061
Compare
|
Ok, the tests are green now, there were a few new failures due to tail call changes. The PR is ready for review. @AndyAyersMS of course, I will repeat analysis for the diffs and post results here. |
CarolEidt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly comments and some non-blocking suggestions or questions.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new name - thanks :-)
src/coreclr/src/jit/gentree.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this TODO
src/coreclr/src/jit/lower.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit unfortunate that we have to transform the call's user. I wonder whether it would be feasible and reasonable to handle these when lowering the user. I think this is fine for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we have agreed to do that in the user lowering but when I did so I saw that:
- we have to process call in
LowerCallwhen it is unused, so compilation time is spent to find a user in any case; - the processing goes into 2 functions (in addition to
LowerCall)LowerBlockStore, LowerStoreIndirand they are platform-specific, so the logic to find these cases should be duplicated for different platforms; - it takes more code and operations to find these cases from the user, because each struct call has its user that should be modified, but not all struct stores have such calls as they src.
so after a while, I changed it to process it in one place, I think it is more visible.
6cc7061 to
342b12d
Compare
|
@AndyAyersMS the first type, -580 (-40.93% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable`1This is the first type of improvements. where optimizations of both Then we apply CSE and propagation optimizations that benefit from asm after: the second type, -203 (-20.86% of base) : Newtonsoft.Json.dasm - DefaultContractResolver:CreatePropertyFromConstructorParameter(JsonProperty,ParameterInfo):JsonProperty:thisThis is the second type of improvements, that was the initial motivation for that work (from SIMD bench). We are inlining a method that returns a small struct: and before we were transforming it to: that was blocking after: the third type, -13 (-81.25% of base) : System.Private.CoreLib.dasm - ValueTuple:Create():ValueTupleThat is a funny one, for IL like: we were generating now it is (copy propagation for structs was unblocked in one of the preparation PRs). we now have when we create a After #34105 and #11413 are fixed we will probably see some other patterns as well instead of the current regressions. |
There are 2 issues that prevent it from being enbaled by default. They are causing significant asm regressions.
e06016f to
169ce1a
Compare
@sandreenko looks good. I would expect |
yes, all returns and calls that return structs in a register should benefit. I have added |
This change adds
COMPlus_JitNoStructRetypingthat prevents struct retyping forcall structandreturn structcases. Currently, this retyping is happening in importer, we want to move it to lower.The current retyping forbids later phases to do optimization with these values, for example, it affects code for inlined methods
if we inline
foo()we won't be able to promote or enregister fields ofnativeSizeStructbecause importer would access it asLCL_FIELD longand setdoNotEnreginimpFixupCallStructReturn.Why can't we do that right after inlining? Because other optimizations also need to know real types, for example, CSE and VN can't propagate a sequence like that:
but can do it if we don't do retyping.
Notes:
Note1: right after inlining is the right time to do retyping for methods that use return buffer, in that PR this is left in importer.Note2: that change doesn't fix retyping in
impNormStructVal, for cases likemethodWithStructArgs(foo(), foo())we currently always create a local var for each struct local argument and retype it as a primitive type. That would be fixed in a separate PR.Note3: that change doesn't fix retyping for limited struct promotion, when for a struct like:
struct PromotedStruct
{
StructWithOneField a; <- we will change the type of this field from struct to int, because we don't have recursive struct promotion.
int b;
}
struct StructWithOneField
{
int a;
}
but that change will get us closer to it.
So the phase in which we should retype struct types into ABI specific types is lowering. See details in https://github.com/dotnet/runtime/blob/master/docs/design/features/first-class-structs.md and #1231.
The PR is done to have no diffs if
COMPlus_JitNoStructRetyping=0, it helps to catch unwanted side-effects and makes merging safer. I have checked that there are no diffs on all windows platforms using altjit and framework assemblies. The change with the flag enabled was tested on SPMI collections, crossgen of framework libraries, and in a pri1 test run.The overall design is simple: keep structs as struct until lowering, then do retyping for returns and calls, but insert
BITCASTback to struct types to keep IR correct. Then teach the next phases (lsra, codegen) to work with the new struct nodes.New cases of struct nodes force us to have struct handle on all trees (except the right side of ASG), so
gtGetStructHandleIfPresentbecomes more important.Some initial diffs:
for the benchmark that motivated this change:
some other benchmarks are also winning, but not that significant:
code size changes for my small
StructABI\structreturn.cstest, improvements happen when we inline the constructor method:Overall, right now, it is a regression, I will start fixing them in the next change. Maybe I will push them to this PR or merge this PR with the flag disabled and fix the regressions in the next.