From 960667542c6f30d04193a4ef34d454a20709ce3d Mon Sep 17 00:00:00 2001 From: Carol Eidt Date: Fri, 12 Jan 2018 18:00:05 -0800 Subject: [PATCH 1/3] Update intrinsics doc based on recent discussions --- accepted/platform-intrinsics.md | 68 ++++++++++++++++++++++++++------- 1 file changed, 54 insertions(+), 14 deletions(-) diff --git a/accepted/platform-intrinsics.md b/accepted/platform-intrinsics.md index 018bf00a4..b6f924aa4 100644 --- a/accepted/platform-intrinsics.md +++ b/accepted/platform-intrinsics.md @@ -14,17 +14,16 @@ consistent, imposes a performance penalty unacceptable to implementors seeking maximum app/service throughput. To enable this last bit of performance improvement dotnet defines platform dependent intrinsics. This allows platform/hardware providers to define low level intrinsics that map to -particular hardware features that are not consistent - i.e. target dependent. +particular hardware features that are not available on all platforms. This document outlines the process for proposing and implementing these intrinsics. This process is intended to be open and can be initiated and implemented by any contributor or partner. ## Guidelines for Platform Dependent Intrinsics -1. A platform intrinsic should expose a specific feature or semantic that is -not in common with other platforms or hardware implementations. +1. A platform intrinsic should expose a specific feature or semantic that is not universally available on all platforms, and for which the implemented function is not easily recognizable by the JIT. * If the functionality is common and performant - make it platform - independent. + independent. * If instead the semantic is platform dependent, or there is a platform dependent high performance implementation, then there is a clear argument for making an assuming that the next point holds. @@ -32,12 +31,13 @@ not in common with other platforms or hardware implementations. user problem. Stated another way, platform intrinsics add complexity to the runtime implementation and language stack so they should help a concrete user scenario. + * If a set of intrinsics are logically associated (e.g. a vector intrinsic that operates over multiple base types), those intrinsics should generally be implemented together, even if only a subset have compelling impact. 3. A platform independent way of determining whether the current executing platform supports dependent functionality needs to be included. Users need -to be able to easily check for hardware acceleration. +to be able to easily check for hardware acceleration. This is accomplished by grouping into a single class all the instrutions that are exposed as a group by the platform (e.g. `SSE4.2`), and providing an `IsSupported` property to indicate that they are available. 4. Executing platform dependent APIs on a non supporting platform may result in a System.PlatformNotSupportedException or invalid instruction fault. Fallback -implementations maybe provided but are not required. +implementations may be provided but are not required, except as needed to support diagnostic tools (e.g. via reflection). See [Supporting Diagnostic Tools](#supporting-diagnostic-tools) ### Example: @@ -77,7 +77,7 @@ forward. ## Process -1. Design the API in the `System.Runtime.CompilerServices.Intrinsics` namespace. +1. Design the API in the `System.Runtime.Intrinsics` namespace. Take care to try and reuse what you can from the current system and ensure that other platforms can implement their functionality as well. 2. Open an issue in CoreFX for @@ -96,30 +96,70 @@ implementations needs to be available. As a consequence of this there can be multiple ways to implement functionality so careful consideration of the trade-offs needs to be made. +### Commonality across platforms + +Platform intrinsics are intended to provide a 1-to-1 mapping to instructions on the target platform. While many of these will be highly platform-specific, some functionality is likely to be common across platforms. It is desirable to use consistent names and signatures where possible. + +## Supporting Generic Intrinsics + +Many platform intrinsics support vectors over all primitive numeric types, in which case a generic API is the clear choice. However, some intrinsics are only available for a limited set of types, or the different element types are not all provided in the same platform class. In this case, there is no clear choice between offering a generic API and "exploding" the specific supported types. Some examples: +- On Intel architecture, there are intrinsics that are only available on float vectors in earlier classes, but are offered for a broader range of types in later classes (e.g. Add over 256-bit vectors on AVX vs. AVX2). For these, it makes sense to explicitly declare the supported types. +- ARM64 intrinsics include comparisons against zero, but they do not support unsigned types. For these, a trivial JIT expansion could be used to enable all the primitive types to be supported. + +In general, it is preferable for a particular instantiated type to not be visible in the API, rather than for the use of an unsupported type to result in a runtime error. + +## Versioning and Partial Implementation + +The platform intrinsics will be versioned with the runtime. New intrinsics can be added within an existing platform class, but all intrinsics declared in the platform class must be implemented in the JIT at the same time that it is exposed in the library. + +## Optionality of Expansion + +In general, it is expected that platform intrinsics will be expanded regardless of optimization level. This differs from the handling of other kinds of intrinsics primarily due to the fact that the performance difference between direct execution and fallback (even simply introducing a call) can be quite high, along with the fact that the JIT uses the non-optimizing path for methods that are very large. The combination presents an undesirable performance cliff. + +## Supporting Diagnostic Tools + +The implementation of the intrinsic must enable indirect invocation (e.g. via diagnostic tools or reflection). This is accomplished as follows: + * The source implementation (e.g. in C#) first checks whether the ISA Class is supported, and throws `PlatformNotSupportedException` if not. Otherwise, it performs any necessary checks on the arguments. Finally, if all checks pass, it invokes itself recursively. This call is recursive, and will be expanded by the JIT. + * When the JIT encounters an intrinsic that it recognizes, it checks the constraints that it requires to generate the simple expansion. This may include: + * Checking whether one or more operands is an immediate. This is required for some instructions, such as shuffle, that require an immediate and have no corresponding register form. + * Checking whether the generic type parameters, if any, match those + supported for the current architecture. See [Supporting Generic Intrinsics](#suporting-generic-intrinsics) above. + * If the intrinsic fails to meet the necessary criteria: + * If the method is recursive, i.e. `gtIsRecursiveCall()` returns true: + * If the constraint is one that the JIT must expand in order to support reflection and diagnostic tools (currently only the immediate operand case), it will produce an expanded implementation (e.g. a switchtable of possible immediate values). + * Otherwise, the JIT throws a `BADCODE` exception. Note that this case should not occur, as the recursive IL implementation should have checked all contraints other than those the JIT is expected to expand. + * Otherwise (the method is not recursive), the JIT generates a regular call. This will cause the IL implementation to be invoked, and will trigger the recursive case above if it has not yet been JIT'd. + * Otherwise the intrinsic will be expanded in its default simple form. + +If the intrinsic is supported only for the constant case, and the argument passed is not a constant, then it should be treated as a regular call. This will then cause the IL implementation to be JIT'd (if it has not already been). Then we will land in the "we are recursive" case and the JIT will generate the switch statement. +If the intrinsic is not supported for some other reason, e.g. for the given generic base type, again it should be treated as a regular call. However, in this case, the IL implementation, in addition to being recursive, should check for a valid base type and throw if it is not. +If the JIT is unable to optimize away the intrinsic in the case where the base type doesn't match, then it should assert in Checked/Debug mode, and either generate the throw, or (simpler to implement) throw a BAD_CODE exception to the VM. +I'm not sure if there are other preconditions to check for, but if so it would be desirable if they could fit into the same "treat it as a regular call unless it is recursive". + + ### FAQ: Q: What type and namespace should be used? -A: System.Runtime.CompilerServices.Intrinsics is the top namespace, but below that +A: System.Runtime.Intrinsics is the top namespace, but below that we would expect a breakout based on architecture and platform. Care should be taken to avoid collisions - an example might be an architecture like ARM, where there are multiple licensees - with the best recommendation being getting feedback from the community early. -Q: What should the method/parameters be named +Q: What should the method/parameters be named? A: Simple and clear is the best bet but that doesn't always help. There are a lot of cases where there is prior art in C++ for these intrinsics so having something that parallels that implementation (maybe a bit more clearly named) can make the intrinsics easier to use. That being said developers should try and follow regular C# naming conventions and choose names that -indicate the semantic usage. +indicate the semantic usage. In addition, if an equivalent intrinsic has been defined for another platform, *with the same functionality*, the naming should be consistent. Q: Is a software fallback implementation allowed? (Discussed above a bit) -A: Fallback is allowed but not required. For very low level +A: Fallback is allowed but not required, except in the immediate operand as described above. For very low level implementations a fallback could even be misleading. -Q: How are immediate operands handled? -A: We're planning to add support for this through Roslyn but haven't -settled on an implementation yet. Stay tuned. +Q: If C# adds support for a const qualifier, will this eliminate the need for switchtable expansion in the JIT? +A: Probably not. While it would improve the usability. Q: How are overloads handled? A: Overloads are allowed but expected to be rare. Lots of intrinsics are From f86b6718adc62a2f56defe600632c99a233f8a07 Mon Sep 17 00:00:00 2001 From: Carol Eidt Date: Tue, 20 Mar 2018 18:49:51 -0700 Subject: [PATCH 2/3] Update platform-intrinsics.md --- accepted/platform-intrinsics.md | 127 +++++++++++++++++++++----------- 1 file changed, 84 insertions(+), 43 deletions(-) diff --git a/accepted/platform-intrinsics.md b/accepted/platform-intrinsics.md index b6f924aa4..ba6e038d0 100644 --- a/accepted/platform-intrinsics.md +++ b/accepted/platform-intrinsics.md @@ -7,7 +7,7 @@ performance on different platform/hardware combinations. This consistency is one of the key selling points of the dotnet stack, but for some high end apps platform specific functionality needs to be available to achieve peak performance. Examples of this are particular hardware accelerated encoders -like crc32 on intel, or the particulars of the NEON SIMD instructions. At the +like crc32 on Intel, or the particulars of the NEON SIMD instructions. At the low level these are not consistent across platforms so a general functionality is not possible, and a higher-level, more abstract implementation, while consistent, imposes a performance penalty unacceptable to implementors seeking @@ -24,6 +24,10 @@ implemented by any contributor or partner. 1. A platform intrinsic should expose a specific feature or semantic that is not universally available on all platforms, and for which the implemented function is not easily recognizable by the JIT. * If the functionality is common and performant - make it platform independent. + * Note that, in some cases, there may be intrinsics that are broadly available, but + which differ enough that providing only a platform-independent API constrains its use. + In these cases, it may be best to provide both: a platform-independent API that + calls the platform-dependent implementation that provides full flexibility. * If instead the semantic is platform dependent, or there is a platform dependent high performance implementation, then there is a clear argument for making an assuming that the next point holds. @@ -31,50 +35,84 @@ implemented by any contributor or partner. user problem. Stated another way, platform intrinsics add complexity to the runtime implementation and language stack so they should help a concrete user scenario. - * If a set of intrinsics are logically associated (e.g. a vector intrinsic that operates over multiple base types), those intrinsics should generally be implemented together, even if only a subset have compelling impact. + * If a set of intrinsics are logically associated (e.g. a vector intrinsic that operates over multiple base types), those intrinsics should generally be implemented together, time and resources permitting, even if only a subset have compelling impact. 3. A platform independent way of determining whether the current executing platform supports dependent functionality needs to be included. Users need to be able to easily check for hardware acceleration. This is accomplished by grouping into a single class all the instrutions that are exposed as a group by the platform (e.g. `SSE4.2`), and providing an `IsSupported` property to indicate that they are available. -4. Executing platform dependent APIs on a non supporting platform may result in -a System.PlatformNotSupportedException or invalid instruction fault. Fallback -implementations may be provided but are not required, except as needed to support diagnostic tools (e.g. via reflection). See [Supporting Diagnostic Tools](#supporting-diagnostic-tools) +4. Executing platform dependent APIs on a non supporting platform will result in a System.PlatformNotSupportedException. Fallback +implementations are provided only to support diagnostic tools (e.g. via reflection). See [Supporting Diagnostic Tools](#supporting-diagnostic-tools) +5. The C# implementation of a platform intrinsic is recursive. When a call to such an intrinsic is originally encountered, if it +doesn't meet the constraints for directly generating the target instruction, the JIT will not expand the intrinsic. When the JIT +is called to compile the recursive method, the VM will tell the compiler that it `mustExpand` the recursive intrinsic call, +in which case the JIT will either generate a full expansion (in the case of a non-constant argument for an instruction that +requires an immediate), or generate the code to throw the appropriate exception. + +### Usage + +It is expected that the platform intrinsics will be used by developers for whom runtime performance is critical, and who will +carefully analyze their usage to ensure that they are meeting their requirements. + +The following exceptions may be thrown by platform intrinsics: +* `PlatformNotSupportedException` - when the intrinsic is not supported on the current hardware. +* `TypeNotSupportedException` - when the intrinsic is instantiated with a type parameter that is not supported on the current hardware. +* `ArgumentRangeException` - when an argument of the intrinsic (e.g. an immediate) is out of the expected range of values. ### Example: -On the intel platform there is a built in CRC32 implementation that the below -example would expose for use in C#. At high-level intrinsics are methods of -static classes that are marked with the [Intrinsic] attribute. +On the Intel platform there is a built in CRC32 implementation that the below +example would expose for use in C#. +This code resides in the coreclr/src/mscorlib/src/System/Runtime/Intrinsics/X86 directory. +The first snippet shows a partial implementation for the Intel (X86) platform. +The second snippet shows a partial implementation for other platforms, for which the +JIT need not recognize these as unsupported intrinsics. ```csharp // SSE42.cs -namespace System.Runtime.CompilerServices.Intrinsics.X86 +namespace System.Runtime.Intrinsics.X86 { public static class SSE42 { - public static bool IsSupported() { throw new NotImplementedException(); } - - // unsigned int _mm_crc32_u8 (unsigned int crc, unsigned char v) - [Intrinsic] - public static uint Crc32(uint crc, byte data) { throw new NotImplementedException(); } - // unsigned int _mm_crc32_u16 (unsigned int crc, unsigned short v) - [Intrinsic] - public static uint Crc32(uint crc, ushort data) { throw new NotImplementedException(); } - // unsigned int _mm_crc32_u32 (unsigned int crc, unsigned int v) - [Intrinsic] - public static uint Crc32(uint crc, uint data) { throw new NotImplementedException(); } - // unsigned __int64 _mm_crc32_u64 (unsigned __int64 crc, unsigned __int64 v) - [Intrinsic] - public static ulong Crc32(ulong crc, ulong data) { throw new NotImplementedException(); } - - ...... + public static bool IsSupported { get => IsSupported; } + ... + /// + /// unsigned int _mm_crc32_u8 (unsigned int crc, unsigned char v) + /// CRC32 reg, reg/m8 + /// + public static uint Crc32(uint crc, byte data) => Crc32(crc, data); + /// + /// unsigned int _mm_crc32_u16 (unsigned int crc, unsigned short v) + /// CRC32 reg, reg/m16 + /// + public static uint Crc32(uint crc, ushort data) => Crc32(crc, data); + /// + /// unsigned int _mm_crc32_u32 (unsigned int crc, unsigned int v) + /// CRC32 reg, reg/m32 + /// + public static uint Crc32(uint crc, uint data) => Crc32(crc, data); + /// + /// unsigned __int64 _mm_crc32_u64 (unsigned __int64 crc, unsigned __int64 v) + /// CRC32 reg, reg/m64 + /// + public static ulong Crc32(ulong crc, ulong data) => Crc32(crc, data); + + ... } } -``` - -Note: This example hasn't been implemented yet - thus NotImplementedException -rather than PlatformNotSupportedException. So details could change going -forward. +// SSE42.PlatformNotSupported.cs +namespace System.Runtime.Intrinsics.X86 +{ + public static class SSE42 + { + public static bool IsSupported { get { return false; } } + ... + /// + /// unsigned int _mm_crc32_u8 (unsigned int crc, unsigned char v) + /// CRC32 reg, reg/m8 + ... + } +} +``` ## Process 1. Design the API in the `System.Runtime.Intrinsics` namespace. @@ -106,11 +144,11 @@ Many platform intrinsics support vectors over all primitive numeric types, in wh - On Intel architecture, there are intrinsics that are only available on float vectors in earlier classes, but are offered for a broader range of types in later classes (e.g. Add over 256-bit vectors on AVX vs. AVX2). For these, it makes sense to explicitly declare the supported types. - ARM64 intrinsics include comparisons against zero, but they do not support unsigned types. For these, a trivial JIT expansion could be used to enable all the primitive types to be supported. -In general, it is preferable for a particular instantiated type to not be visible in the API, rather than for the use of an unsupported type to result in a runtime error. +In general, it is preferable for a particular instantiation not to be visible in the API, rather than for the use of an unsupported type parameter to result in a runtime error. ## Versioning and Partial Implementation -The platform intrinsics will be versioned with the runtime. New intrinsics can be added within an existing platform class, but all intrinsics declared in the platform class must be implemented in the JIT at the same time that it is exposed in the library. +The platform intrinsics will be versioned with the Target Framework. New intrinsics can be added within an existing platform class, but all intrinsics declared in the platform class must be implemented in the JIT at the same time that it is exposed in the library. ## Optionality of Expansion @@ -119,23 +157,23 @@ In general, it is expected that platform intrinsics will be expanded regardless ## Supporting Diagnostic Tools The implementation of the intrinsic must enable indirect invocation (e.g. via diagnostic tools or reflection). This is accomplished as follows: - * The source implementation (e.g. in C#) first checks whether the ISA Class is supported, and throws `PlatformNotSupportedException` if not. Otherwise, it performs any necessary checks on the arguments. Finally, if all checks pass, it invokes itself recursively. This call is recursive, and will be expanded by the JIT. + * The source implementation (e.g. in C#) invokes itself recursively for the target platform. On other platforms, the source implementation directly throws `NotImplementedException`. * When the JIT encounters an intrinsic that it recognizes, it checks the constraints that it requires to generate the simple expansion. This may include: * Checking whether one or more operands is an immediate. This is required for some instructions, such as shuffle, that require an immediate and have no corresponding register form. * Checking whether the generic type parameters, if any, match those supported for the current architecture. See [Supporting Generic Intrinsics](#suporting-generic-intrinsics) above. * If the intrinsic fails to meet the necessary criteria: - * If the method is recursive, i.e. `gtIsRecursiveCall()` returns true: + * If the method is recursive, i.e. `mustExpand` is true: * If the constraint is one that the JIT must expand in order to support reflection and diagnostic tools (currently only the immediate operand case), it will produce an expanded implementation (e.g. a switchtable of possible immediate values). - * Otherwise, the JIT throws a `BADCODE` exception. Note that this case should not occur, as the recursive IL implementation should have checked all contraints other than those the JIT is expected to expand. + * Note that it is undesirable for the reflection and diagnostic tool scenario to be supported via a software emulation + of the instruction in managed code. This is because the switchtable approach ensures that each case of the + switch will execute using the target instruction with the given inputs, ensuring fidelity. + * Otherwise, the JIT throws the appropriate exception. * Otherwise (the method is not recursive), the JIT generates a regular call. This will cause the IL implementation to be invoked, and will trigger the recursive case above if it has not yet been JIT'd. * Otherwise the intrinsic will be expanded in its default simple form. If the intrinsic is supported only for the constant case, and the argument passed is not a constant, then it should be treated as a regular call. This will then cause the IL implementation to be JIT'd (if it has not already been). Then we will land in the "we are recursive" case and the JIT will generate the switch statement. -If the intrinsic is not supported for some other reason, e.g. for the given generic base type, again it should be treated as a regular call. However, in this case, the IL implementation, in addition to being recursive, should check for a valid base type and throw if it is not. -If the JIT is unable to optimize away the intrinsic in the case where the base type doesn't match, then it should assert in Checked/Debug mode, and either generate the throw, or (simpler to implement) throw a BAD_CODE exception to the VM. -I'm not sure if there are other preconditions to check for, but if so it would be desirable if they could fit into the same "treat it as a regular call unless it is recursive". - +If the intrinsic is not supported for some other reason, e.g. for the given generic base type, the JIT must generate code to throw the appropriate exception. ### FAQ: @@ -155,11 +193,14 @@ should try and follow regular C# naming conventions and choose names that indicate the semantic usage. In addition, if an equivalent intrinsic has been defined for another platform, *with the same functionality*, the naming should be consistent. Q: Is a software fallback implementation allowed? (Discussed above a bit) -A: Fallback is allowed but not required, except in the immediate operand as described above. For very low level -implementations a fallback could even be misleading. +A: Fallback is expressly discouraged (note that the immediate operand described above is not strictly a "software" fallback, +as it utilizes the same target instruction, generated by the same JIT). Ensuring fidelity with the +hardware semantics is difficult to ensure, and for very low level implementations a fallback could be misleading. Q: If C# adds support for a const qualifier, will this eliminate the need for switchtable expansion in the JIT? -A: Probably not. While it would improve the usability. +A: No. While it might improve the usability, not all compilers may recognize the attribute, and we would still have +to support the non-constant case for the reflection and diagnostic tools scenarios. It is expected that analysis tools +will be developed to identify cases where an intrinsic is called with a non-constant value when a constant is expected. Q: How are overloads handled? A: Overloads are allowed but expected to be rare. Lots of intrinsics are @@ -173,4 +214,4 @@ accelerated code as well as the user provided independent fallback are preserved in the output and selected between based on the runtime check result. Second, an unguarded platform intrinsic is used. If this is run on an platform that doesn't support it then an low level illegal instruction fault is -generated. \ No newline at end of file +generated. From b44acb3c5d1083387ef06687f413ba3441625011 Mon Sep 17 00:00:00 2001 From: Carol Eidt Date: Wed, 21 Mar 2018 07:45:16 -0700 Subject: [PATCH 3/3] Update platform-intrinsics.md --- accepted/platform-intrinsics.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/accepted/platform-intrinsics.md b/accepted/platform-intrinsics.md index ba6e038d0..7197a2f4e 100644 --- a/accepted/platform-intrinsics.md +++ b/accepted/platform-intrinsics.md @@ -157,7 +157,7 @@ In general, it is expected that platform intrinsics will be expanded regardless ## Supporting Diagnostic Tools The implementation of the intrinsic must enable indirect invocation (e.g. via diagnostic tools or reflection). This is accomplished as follows: - * The source implementation (e.g. in C#) invokes itself recursively for the target platform. On other platforms, the source implementation directly throws `NotImplementedException`. + * The source implementation (e.g. in C#) invokes itself recursively for the target platform. On other platforms, the source implementation directly throws `PlatformNotSupportedException`. * When the JIT encounters an intrinsic that it recognizes, it checks the constraints that it requires to generate the simple expansion. This may include: * Checking whether one or more operands is an immediate. This is required for some instructions, such as shuffle, that require an immediate and have no corresponding register form. * Checking whether the generic type parameters, if any, match those @@ -178,8 +178,8 @@ If the intrinsic is not supported for some other reason, e.g. for the given gene ### FAQ: Q: What type and namespace should be used? -A: System.Runtime.Intrinsics is the top namespace, but below that -we would expect a breakout based on architecture and platform. Care should be +A: System.Runtime.Intrinsics is the top namespace. Below that +is a breakout based on architecture and platform. Care should be taken to avoid collisions - an example might be an architecture like ARM, where there are multiple licensees - with the best recommendation being getting feedback from the community early. @@ -193,7 +193,7 @@ should try and follow regular C# naming conventions and choose names that indicate the semantic usage. In addition, if an equivalent intrinsic has been defined for another platform, *with the same functionality*, the naming should be consistent. Q: Is a software fallback implementation allowed? (Discussed above a bit) -A: Fallback is expressly discouraged (note that the immediate operand described above is not strictly a "software" fallback, +A: Fallback is expressly discouraged (note that the immediate operand fallback described above is not strictly a "software" fallback, as it utilizes the same target instruction, generated by the same JIT). Ensuring fidelity with the hardware semantics is difficult to ensure, and for very low level implementations a fallback could be misleading.