[skip ci] Update first-class-structs doc#26416
Conversation
|
@dotnet/jit-contrib @mikedn @tannergooding PTAL |
| Normalizing Struct Types | ||
| ------------------------ | ||
| We would like to facilitate full enregistration of structs with the following properties: | ||
| 1. Its fields are infrequently accessed, and |
There was a problem hiding this comment.
I'm skeptical that enregistering structs only when fields are infrequently accessed is going to provide a lot of value. If fields aren't accessed then it all mostly resumes to struct copying. So that
x = f.a;
y = f.a;with CSE and enregistration would hopefully generate just
mov eax, [rsi+8]
mov ebx, eaxNice but I'm not sure how common such scenarios are.
Ultimately we'll probably need to consider the possibility of LCL_FLD working on enregistered struct by generating appropriate shift and masking code.
There was a problem hiding this comment.
The issue arises when a struct is only passed or returned in registers. We currently don't have a mechanism for keeping such values in a register, unless they're enregisterable structs (i.e. SIMD* today) or have a single field of the same type as the register.
There was a problem hiding this comment.
Ah, right, that case is really ugly today. Speaking of struct return, I also have concerns about the general case where the struct is returned via a buffer and this transform is done pretty early in phase ordering. My initial impression is that it would be better to postpone that to lowering.
There was a problem hiding this comment.
That makes sense, though I confess that I've less experience in making changes to the handling of struct returns.
|
|
||
| We propose to create the following additional types to be used where struct types of the given size are | ||
| passed and/or returned in registers: | ||
| * `TYP_STRUCT1`, `TYP_STRUCT2`, `TYP_STRUCT4`, `TYP_STRUCT8` (on 64-bit systems) |
There was a problem hiding this comment.
I'm not sure that adding more struct types is a good thing. If anything, there are places in the JIT that have tables indexed by var_types and those tables will grow bigger. In particular, VN has AFAIR a bidimensional array where one of the dimensions is var_types and it's largish enough that adding/remove a type registers in the memory allocation stats.
I'm also not sure how these would interact with ClassLayout - would we need layout for a node having TYP_STRUCT2? I'd say yes because discarding layout/handle information has occassionally proved to be problematic. But then we end up with duplicate info - the size is encoded in both node's type and layout.
There was a problem hiding this comment.
What about TYP_STRUCT16?
The System V ABI defines __int128 as an ABI primitive, and while it is passed in register as if it were struct __int128 { int64_t low, high; };, it has additional packing/alignment requirements (much like TYP_SIMD16 and TYP_SIMD32).
There was a problem hiding this comment.
I would envision that these types would be handled, going forward, in the same way that SIMD types are today. Today you can have multiple TYP_SIMD16 values with different class handles. The specialized struct types would only imply size and register file.
It may be the case that this kind of thing could be deferred to register allocation, and dealt with without adding new types, but I think that would add complexity.
There was a problem hiding this comment.
I suspect that __int128 would be best handled as a special case, since, unlike the other special structs it would not be referenced as a single entity, adding, I think, undue complexity in backend components.
There was a problem hiding this comment.
Struct with specific alignment requirements - I guess we can simply add alignment info to ClassLayout :). Well, adding more info to it would make it bigger but the typical number of layouts a method needs is usually very low. And even if it's not it's still likely to be better than querying the VM multiple times or storing the same info in multiple places (e.g. in fgArgTabEntry objects).
There was a problem hiding this comment.
I've checked the memory usage issue I remembered about - it's there. Simply adding these 4 types to the list results in a 0.5% regression in memory usage, mostly thanks to VN. Perhaps there's a way to avoid that but...
At least on Linux x64, more structs can be returned in registers than on Windows x64 - a 3 byte struct is returned in eax and passed in, say, edi.
More generally, IMO it should be possible to enregister any structure small enough to fit in a register. You never known when it could be useful (think that for example on ARM you have ubfx/sbfx instructions which makes accessing fields of an enregistered structs cheap compared to x64 where you typically need 2 instructions to shift and zero/sign extended).
As such I don't think specific size TYP_STRUCT types help. It's not going to be very practical to later add even more TYP_STRUCT3, TYP_STRUCT5 and so on.
I'll have to take a closer look at LSRA code but my expectancy is that for a OBJ node of suitable size the ref position/interval building code simply builds a ref position that asks for a suitable GPR (or why not, XMM) register and that's all there is to it.
I noticed that LSRA's interval has a RegisterType registerType where RegisterType is actually var_types. I find that a bit strange considering that ultimately there are just GPR and XMM registers and the finer distinction between types (TYP_SHORT vs. TYP_USHORT) is irrelevant to LSRA. Even requesting a byte sized registers on x86 isn't done by using TYP_BYTE but rather by restricting the register candidate set to those that support byte access.
And the funny thing is that LSRA too contributes to the memory regression that occurs when adding more types. It too has arrays indexed by type (e.g. maxSpill) even though only a couple of entries are actually used.
|
|
||
| These struct-typed nodes are created by the importer, but transformed in morph, and so are not | ||
| encountered by most phases of the JIT: | ||
| * `GT_INDEX`: This is transformed to a `GT_IND` |
There was a problem hiding this comment.
It may be good to mention somewhere the horrible complexity associated with array of structs element access - the IND must be preserved due to the array info map where it acts as a key. And then the whole element access tree tends to end up wrapped in OBJ(ADDR(...)). FWIW I'd love to get rid of that. At a minimum we should be able to change the IND to OBJ and avoid wrapping. We should also be able to remove the class handle from the array info map and instead used the one provided by OBJ.
There was a problem hiding this comment.
I'll add that, though it's been a while since I've run into that complexity.
| * Additional child `gtDynamicSize` | ||
| * Note that these aren't really struct types; they represent dynamically sized blocks | ||
| of arbitrary data. | ||
| * For `GT_LCL_VAR` nodes, the shape information is obtained via the `LclVarDsc` |
There was a problem hiding this comment.
Mention LCL_FLD as well? ClassLayout was started with the goal of allowing LCL_FLD struct typed nodes.
There was a problem hiding this comment.
Right. That was an accidental omission.
| * `GT_IND`: in this case the LHS is expected to provide the struct handle | ||
| * Proposed: `GT_IND` would no longer be used for struct types | ||
| * `GT_CALL` | ||
| * `GT_RET_EXPR` |
There was a problem hiding this comment.
Oops. It should probably be GTK_NOTLIR
There was a problem hiding this comment.
Yes, and GTK_NOTMIR if we had such a thing. Lacking that we should probably assert that we don't see it in global morph, it's supposed to be gone after inlining.
There was a problem hiding this comment.
And apparently the case GT_RET_EXPR in gtGetStructHandleIfPresent isn't reached, though it's hard to be 100% sure as there are quite a few callers of gtGetStructHandleIfPresent.
| * `GT_RET_EXPR` | ||
| * `GT_LCL_VAR` | ||
| * `GT_LCL_FLD` | ||
| * Proposed: These would only be used for actual fields. |
There was a problem hiding this comment.
I'm going to disagree about the "actual fields" part. I don't like field sequences too much but recently I played enough with them and concluded that they work well enough at least for now. That includes NotAField.
Another thing I'm chasing is the removal of IsLocalAddrExpression checks. They're horrible in many ways, including blocking the removal of ASG nodes. And for that I need that tracked/SSA variables are only those that are always accessed via LCL_VAR and LCL_FLD nodes, any indirect access would leave the variable address exposed. To minimize such circumstances I need LCL_FLD to be flexible and handle all kind of odd scenarios. From what I've seen so far it can do that just fine.
There was a problem hiding this comment.
Right. I think the statement is too strong. What I'm mostly interested in is eliminating the use of GT_LCL_FLD in situations where it is referring to the entire struct, but by a different type (e.g. as the LHS of an assignment, or as an argument to a call). I hope that we can get to the point where, at least prior to Lowering, any full reference to a local struct is a GT_LCL_VAR.
There was a problem hiding this comment.
Right, the case of accessing an entire var using LCL_FLD is a bit more weird. Perhaps we can wrap the LCL_VAR in a BITCAST or other similar, new node to signal reinterpretation. Or perhaps just allow some level of type mismatching to occur in IR, though that would probably require a bit more care. In any case, I would like to have a way that avoids the use of indirections to perform reinterpretation.
There was a problem hiding this comment.
Yes, the idea would be that the cases where we currently use LCL_FLD would be either
- handled in
LoweringviaBITCASTwhen they are used to load a struct into a register, or - handled in codegen by deferring the handling of register-sized copies
|
|
||
| * Importer | ||
| * Vector types are normalized to the appropriate `TYP_SIMD*` type. Other struct nodes have `TYP_STRUCT`. | ||
| * Proposed: all struct-valued nodes that are created with a class handle will retain an index into |
There was a problem hiding this comment.
Small nit - the ClassLayout index is only needed when there are space constraints (e.g. there's no room in LclFld for a new pointer but we can fit in a uint16_t index. It can also be useful in VN. Otherwise using a ClassLayout pointer is preferrable.
| * Proposed: This transformation would be made even if it is passed in non-matching | ||
| registers. The necessary transformations for correct code generation would be | ||
| made in `Lowering`. | ||
| * If it is passed by reference, a copy is made, and that copy is passed according to |
There was a problem hiding this comment.
I'm wondering if copies required by implicit by ref parameters shouldn't be done in lowering. It's not clear if there's any advtange in doing those copies earlier as the required temps are address exposed and thus ignored by optimizer for most purposes. Perhaps we can insert these copies after the liveness pass before LSRA - the last use information in produces should allow the elision of some copies.
There was a problem hiding this comment.
That makes sense. I'll add it.
| * Transform struct values that are passed to or returned from calls by creating one or more of the following: | ||
| * `GT_FIELD_LIST` of `GT_LCL_FLD` when the struct is non-enregisterable and is passed in multiple | ||
| registers. | ||
| * `GT_BITCAST` when a promoted floating point field of a single-field struct is passed in an integer register. |
There was a problem hiding this comment.
BITCAST nodes could also be inserted during morph, not clear if it's useful. I suppose some could be CSE candidates.
There was a problem hiding this comment.
I don't think they would be all that useful as CSE candidates, though I suppose it could potentially eliminate a move between register files. I'd propose that we leave this as proposed, to avoid moving BITCAST nodes up in the phase ordering. We can change that later if it looks worthwhile.
There was a problem hiding this comment.
In genral, BITCAST nodes should anyway move up in phase ordering. I already have a change that produces such node early, to avoid blocking variable enregistration due to float/int reinterpretation and it does show some CQ improvements. The only problem I've yet to solve is decomposition of the double/long case on 32 bit targets.
There was a problem hiding this comment.
I'm OK with having them appear early, but I'd still prefer that we defer any code that's purely ABI-related to Lowering.
| * Fully promote structs of pointer size or less that have fewer field references than calls | ||
| or returns. | ||
| * Investigate whether it would be useful to re-type single-field structs, rather than creating new lclVars. | ||
| This would complicate type analysis when copied, passed or returned, but would avoid unnecessarily expanding |
There was a problem hiding this comment.
For passing IMO we should see if we can allow some sort of implicit reinterpretation to occur in IR (e.g. pass TYP_INT to single field struct typed call arg node). Of course, that means that the call arg should no longer attempt to recover the handle from the arg tree but rather keep track of the original type from the call signature. ClassLayout + no-GT_LIST based experiment here: 317ca47
There was a problem hiding this comment.
I like that idea, though I suspect it would be a fair bit of work to tease out all the dependencies on the args retaining their struct types.
On a related note, the HFA queries seem to result in a lot of JIT/EE traffic when HFA types are in use. It might be nice to cache that information.
There was a problem hiding this comment.
I don't see any problem to include HFA related info in ClassLayout. In fact the commit mentioned above does just that as an experiment. It doesn't actually cache anything right now but it did move HFA related queries to ClassLayout just to get an idea how it may look.
|
|
||
| ### <a name="Block-Assignments"/>Improve and Simplify Block Assignment Morphing | ||
|
|
||
| * `fgMorphOneAsgBlockOp` should probably be eliminated, and its functionality either moved to |
There was a problem hiding this comment.
Tried. Main issue is that the current codegen for unrolled block ops is kind of poor. Codegen fix mostly done here: #21711
| * The cache includes entries for opaque "struct" values of a given size, as well as for | ||
| structs with known class handles | ||
|
|
||
| ### Defer ABI-specific transformations to `Lowering`. |
There was a problem hiding this comment.
Any thoughts about getting in my removal of GT_LIST node from calls and field lists before doing any more significant work in call morphing and start getting conflicts left and right? 😁 It's nearly done (e.g #26392)
There was a problem hiding this comment.
I'll work on reviewing that today. It's another big one ;-)
There was a problem hiding this comment.
Hmm, might be a bit too early to do that. I'll have to remove a ton of commits because only the first 10 or so are relevant. The rest are experiments that I wanted to pass through CI testing. Also, the ARM part is missing (I have it locally but it's buggy ATM).
There was a problem hiding this comment.
OK, then I'll have a quick look and you can ping me when you think it's ready for an actual review.
There was a problem hiding this comment.
Ah, sorry about the confusion. #26392 is actually fine, I was thinking about a different PR that needs cleanup.
|
I'm going to merge this, further changes can be made as needed. |
Update first-class-structs doc Commit migrated from dotnet/coreclr@930a7b9
No description provided.