Updating hardware intrinsics codegen to be table driven#15642
Updating hardware intrinsics codegen to be table driven#15642tannergooding wants to merge 3 commits into
Conversation
|
FYI. @fiigii, @mikedn, @CarolEidt. This was discussed on a few of the threads. This is similar to the format already used by |
| InstructionSet lookupHWIntrinsicISA(const char* className); | ||
| NamedIntrinsic lookupHWIntrinsic(const char* methodName, InstructionSet isa); | ||
| instruction insOfHWIntrinsic(NamedIntrinsic intrinsicID, var_types baseType); | ||
| ssize_t ivalOfHWIntrinsic(NamedIntrinsic intrinsicID); |
There was a problem hiding this comment.
It looks like these 2 functions should be static. Also, it's not clear why they're members of Compiler. At least insOfHWIntrinsic should be moved to CodeGen probably.
There was a problem hiding this comment.
Several of these methods could probably be static and could likely be not in the compiler.
I put them here for consistency with the other existing methods that are driven off the HWIntrinsicInfo table I extended.
There was a problem hiding this comment.
It looks like the rest of the methods are related to intrinsic importation so they naturally belong here (for the lack of a better place). Consistency is usually a good thing, when the existing code isn't a mess. Unfortunately for us, the JIT code is a mess and attempting to be consistent with it is not always a useful goal.
There was a problem hiding this comment.
Moved them to be static functions defined in hwintrinsiccodegenxarch.
| case NI_SSE2_Add: | ||
| { | ||
| op2Reg = op2->gtRegNum; | ||
| assert((baseType == TYP_BYTE) || (baseType == TYP_UBYTE) || |
There was a problem hiding this comment.
This check should be a side effect of insOfHWIntrinsic. If the type is not supported (e.g. TYP_REF) insOfHWIntrinsic should assert. If a specific intrinsic does not support a specific type then the corresponding instruction should be INS_invalid and that too should assert.
| break; | ||
| } | ||
| ins = compiler->insOfHWIntrinsic(intrinsicID, baseType); | ||
| op2Reg = op2->gtRegNum; |
There was a problem hiding this comment.
Seems like this op2Reg variable and the corresponding assignment is pointless.
There was a problem hiding this comment.
Not always - for Sse2.Insert there are 3 arguments Vector128, short/ushort, immediate.
Avx and above use almost always 3 args.
There was a problem hiding this comment.
I don't understand what that has to do with my comment.
There was a problem hiding this comment.
I can change it to pass op2Reg directly for the binary instructions.
| // Return Value: | ||
| // ival for the intrinsic | ||
| // | ||
| ssize_t Compiler::ivalOfHWIntrinsic(NamedIntrinsic intrinsicID) |
There was a problem hiding this comment.
In generating instructions - immediate bytes are a part of the instruction encoding
There was a problem hiding this comment.
I suspect that but I don't see it used anywhere in this PR. It's problematic to review code that it isn't used.
There was a problem hiding this comment.
There are 2, possibly 3, SxS PRs going on that will need to use this method. It exists so that we don't hit merge conflicts.
I can give examples and link to one of the existing PRs if that helps.
There was a problem hiding this comment.
I think the main question is if this belongs in Compiler or in CodeGen. I suspect that it will only be needed in CodeGen so...
There was a problem hiding this comment.
Working on an update to move it into CodeGen and make it static now.
| // SSE Intrinsics | ||
| HARDWARE_INTRINSIC(SSE_IsSupported, "get_IsSupported", SSE) | ||
| HARDWARE_INTRINSIC(SSE_Add, "Add", SSE) | ||
| // Intrinsic ID Function name ISA Instructions (flt, dbl, i8, i16, i32, i64) ival |
There was a problem hiding this comment.
IMO we should support 128 bit operands (some insertions and extractions work on them) and make a reserved value/values for future use (binary16 et al IEEE 754:2008)
There was a problem hiding this comment.
Additionally there are conversions between base types: explicit and implicit i.e. unpack../pack.. etc.
There was a problem hiding this comment.
Perhaps it could be useful to differentiate between mem/reg/imm operands already here - just a general idea.
There was a problem hiding this comment.
128-bit operands, if we have a way to emit constants for them (and I'm not sure if we do right now), can be handled based off a single ssize_t variable (all fields are duplicated).
I can clarify, but the instruction encoding is made to be keyed off the baseType, not the targetType.
I don't think differentiating between mem/reg/imm here is needed. The instruction emitted should be the same and it can be handled in codegen.
| { | ||
| assert(intrinsicID != NI_Illegal); | ||
| assert(intrinsicID > NI_HW_INTRINSIC_START && intrinsicID < NI_HW_INTRINSIC_END); | ||
| return hwIntrinsicInfoArray[intrinsicID - NI_HW_INTRINSIC_START - 1].ival; |
There was a problem hiding this comment.
note: I don't think we will have any ivals where the hardcoded value is -1, so I could probably assert that we are not returning -1.
There was a problem hiding this comment.
I think that all immediate values are 8 bit so it makes sense to use int -1 as a special value. BTW, it looks like ival type is ssize_t now, this seems unnecessary.
| // POPCNT Intrinsics | ||
| HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT) | ||
| HARDWARE_INTRINSIC(POPCNT_PopCount, "PopCount", POPCNT) | ||
| HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, -1) |
There was a problem hiding this comment.
Thanks for the work. I have a incomplete implementation in my local repo that I think it would be better to have a “category” field (e.g., simple intrinsic, helper, memory access, etc.).
There was a problem hiding this comment.
What is the category field useful for? Also, is it in addition to the instruction array?
There was a problem hiding this comment.
For example, simple intrinsics can be easily table driven, and r64 intrinsics should throw exceptions on x86, and helpers should be manually treated.
There was a problem hiding this comment.
Okay, so it's moving a bit of the switch statement to also be table driven, so instead of case add, case sub, etc, it would be case simple_intrin`?
There was a problem hiding this comment.
Would it end up being better to have these categorized as Helper, SIMD_R_R, SIMD_R_R_R, SIMD_R_R_R_I, etc?
There are some that are simple (such as the compare instructions) but that have a different emit path (R_R_R_I instead of R_R_R).
Each emit path will need to have fairly consistent containment analysis done, so it may be a better "categorization"/split for the jump table.
There was a problem hiding this comment.
I agree that a category field will be good.
However, after having implemented most of the SSE intrinsics and having started looking at the containment support, I am pretty sure having them categorized by "supported encoding" would be ideal.
So, instructions like "Add, And, AndNot, etc" should be categorized as "R, R/M" instructions. And instructions like movlhps should be categorized as "R, R" instructions.
With this type of categorization, we can still cut out most of the switch table by using switch (category) rather than switch (intrinsicID). However, we can also have a genHWIntrinsic<category> method which centralizes the operand handling for a particular category of instructions.
For example, genHWIntrinsicR_RM(instruction ins, GenTree* op1, GenTree* op2) (used by Add, And, AndNot, etc) will know that op1 cannot be contained, but op2 can optionally be contained and call emitIns_R_R_R or emitIns_R_R_A as appropriate. Likewise, genHWIntrinsic_R_R (used by MoveHighToLow and MoveLowToHigh) will know that neither op can be contained and can skip those checks entirely.
We can also use the category check in lowerxarch to mark nodes as contained or registers as optional.
Thoughts?
|
I've added the SSE intrinsics to the list as part of this PR. It gives a better example of how this is used across a range of instructions. |
|
I took @fiigii's suggestion to add a category field and played with it a bit. I have a prototype here: https://github.com/tannergooding/coreclr/commits/hwintrin-category (single commit). I found that having the category based on the method signature and target codegen works well as it allows you to have consistent code paths and operand handling in order to cut down on duplicate code. For example: However,
There might be other opportunities to combine some categories and cut down duplicated code further, but we can come to that as things continue to get implemented. |
|
I have a branch (https://github.com/tannergooding/coreclr/tree/hwintrin-instrlookup-sse) which merges this PR with #15538 (the https://github.com/tannergooding/coreclr/blob/hwintrin-instrlookup-sse/src/jit/hwintrinsiccodegenxarch.cpp#L173 shows just how much duplicated code gets cut out with the table based approach (and it could probably be improved more). |
|
@tannergooding Thanks for the work. Our code is slightly different. I will upload my branch once I get back. |
This just updates the HardwareIntrinsicInfo struct to also carry the instruction per base type and an ival per instruction.
This makes it really easy to remove the duplicated code for most basic instructions.
It isn't quite a constant-time lookup, due to multiple overloads mapping to the same intrinsic, but it should still be much improved.