New kernel set for Arm SVE using assembly #396
This PR adds two kernels for the Arm SVE vector extension:

1. a gemm kernel for double precision, of size 8x8;
2. a packm kernel for double precision, of dimension 8xk.

To achieve the best performance, vector-length-agnostic programming is not used. Both kernels mandate a vector length (VL) of 256 bits. Kernels supporting other VLs can be added later.

"SVE is a vector extension for AArch64 execution mode for the A64 instruction set of the Armv8 architecture. Unlike other SIMD architectures, SVE does not define the size of the vector registers, but constrains into a range of possible values, from a minimum of 128 bits up to a maximum of 2048 in 128-bit wide units. Therefore, any CPU vendor can implement the extension by choosing the vector register size that better suits the workloads the CPU is targeting. Instructions are provided specifically to query an implementation for its register size, to guarantee that the applications can run on different implementations of the ISA without the need to recompile the code." [1]

[1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning

Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
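For readers less familiar with the two operations named in the description, here is a minimal scalar sketch in C of what an 8xk packm and an 8x8 double-precision gemm microkernel compute. It is not the assembly from this PR. The function names `ref_dpackm_8xk` and `ref_dgemm_8x8`, the column-major layout, and the packed-panel layouts are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 8, NR = 8 }; /* tile sizes taken from the PR description */

/* Hypothetical reference packm: copy an 8 x k panel of a column-major
 * matrix (leading dimension lda) into a contiguous buffer, one column
 * of 8 doubles after another. */
void ref_dpackm_8xk(size_t k, const double *a, size_t lda, double *packed)
{
    assert(lda >= MR);
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < MR; ++i)
            packed[j * MR + i] = a[j * lda + i];
}

/* Hypothetical reference gemm microkernel: C := beta*C + alpha*A*B.
 * a: packed 8 x k panel, column l stored at a + l*MR (assumed layout).
 * b: packed k x 8 panel, row l stored at b + l*NR (assumed layout).
 * c: 8 x 8 tile, column-major with leading dimension ldc. */
void ref_dgemm_8x8(size_t k, double alpha, const double *a, const double *b,
                   double beta, double *c, size_t ldc)
{
    double ab[MR * NR] = { 0.0 };

    assert(ldc >= MR);

    /* Accumulate k rank-1 updates into the 8x8 scratch tile. */
    for (size_t l = 0; l < k; ++l)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[j * MR + i] += a[l * MR + i] * b[l * NR + j];

    /* Scale and write back to C. */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[j * ldc + i] = beta * c[j * ldc + i] + alpha * ab[j * MR + i];
}
```

With a 256-bit VL each vector register holds four doubles, so under this layout one column of the 8x8 tile spans two registers. A hand-written kernel would keep the accumulator tile in registers rather than in a scratch array as this sketch does.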
|
@docularxu Sorry for the delayed response. Thank you for this contribution! I checked the new files and it appears that you found, read, and followed the section of the ConfigurationHowTo regarding adding new kernel sets. Thanks for making my review easy! We don't have access to ARM SVE hardware, so we can't test out the code. Furthermore, we have limited experience on ARM, so please consider yourself nominated as the maintainer of these new kernels going forward. :) But feel free to ask us questions should the need arise. |
|
@fgvanzee https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator, but I completely understand why you would not want to mess with this. |
Yeah, unfortunately I don't have time to take on a new project like this. But it's good to know such resources are available for others and/or for my own use in the future. |
|
Unfortunately, we have to prioritize work for which we are actually sponsored. ARM architectures do not qualify at the moment due to that restriction.
On May 8, 2020, at 2:02 PM, Jeff Hammond wrote:
> @fgvanzee https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator, but I completely understand why you would not want to mess with this.
|
I am using 'qemu-aarch64' [1] to verify these two kernels, and I will keep optimizing them once I get my hands on real hardware. So far, the only publicly and commercially available hardware with Arm SVE is Fujitsu's A64FX [2]. The A64FX has a 512-bit vector length, which differs from the 256 bits these kernels require.

[1] https://www.linaro.org/blog/sve-in-qemu-linux-user/
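To make the 256-bit versus 512-bit difference concrete, here is a small arithmetic sketch in C. It is not part of the submitted kernels: the helper names are hypothetical, and `vectors_per_tile` assumes each tile column is held in whole vector registers.

```c
#include <assert.h>
#include <stdbool.h>

/* SVE permits vector lengths from 128 to 2048 bits, in 128-bit steps. */
bool sve_vl_is_valid(unsigned vl_bits)
{
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

/* Number of 64-bit double lanes in one vector of the given length. */
unsigned doubles_per_vector(unsigned vl_bits)
{
    assert(sve_vl_is_valid(vl_bits));
    return vl_bits / 64;
}

/* Vector registers needed to hold an mr x nr tile of doubles, assuming
 * each column of mr elements occupies ceil(mr / lanes) registers. */
unsigned vectors_per_tile(unsigned vl_bits, unsigned mr, unsigned nr)
{
    unsigned lanes = doubles_per_vector(vl_bits);
    return ((mr + lanes - 1) / lanes) * nr;
}
```

Under these assumptions, `vectors_per_tile(256, 8, 8)` gives 16 registers for the 8x8 accumulator tile, while `vectors_per_tile(512, 8, 8)` gives 8, which is one reason a kernel tuned for one VL would likely want a different register blocking on the other.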