New kernel set for Arm SVE using assembly by docularxu · Pull Request #396 · flame/blis

docularxu · 2020-04-28T03:36:00Z

This PR adds two kernels for the Arm SVE vector extension:

1. a gemm kernel for double precision with an 8x8 block size.
2. a packm kernel for double precision with dimension 8xk.

To achieve the best performance, vector-length-agnostic programming
is not used. A vector length (VL) of 256 bits is mandated in both kernels.
Kernels supporting other VLs can be added later.
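
As a rough illustration of what the two kernels compute, here is a minimal plain-C reference sketch. It is not the assembly from this PR: the function names `ref_dpackm_8xk` and `ref_dgemm_8x8`, their simplified signatures, and the panel layouts (A packed column by column, B packed row by row) are assumptions made for the example.

```c
#include <stddef.h>

enum { MR = 8, NR = 8 };  /* block sizes named in the description */

/* Hypothetical reference for the 8xk packing step: copy an 8 x k block of a
   strided matrix into a contiguous panel, one 8-element column after another. */
static void ref_dpackm_8xk(size_t k, const double *a, size_t rs_a, size_t cs_a,
                           double *panel)
{
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < MR; ++i)
            panel[j * MR + i] = a[i * rs_a + j * cs_a];
}

/* Hypothetical reference for the 8x8 update: C := beta*C + alpha*A*B.
   a is a packed 8 x k panel (column by column, assumed layout),
   b is a packed k x 8 panel (row by row, assumed layout),
   c is an 8 x 8 tile addressed through row stride rs_c and column stride cs_c. */
static void ref_dgemm_8x8(size_t k, double alpha, const double *a,
                          const double *b, double beta,
                          double *c, size_t rs_c, size_t cs_c)
{
    double ab[MR * NR] = {0.0};  /* accumulator tile, column-major */

    /* k rank-1 updates: ab += a(:,p) * b(p,:) */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[j * MR + i] += a[p * MR + i] * b[p * NR + j];

    /* scale and write back into the caller's tile */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = beta * (*cij) + alpha * ab[j * MR + i];
        }
}
```

With a 256-bit VL, one SVE vector holds 4 doubles, so each 8-element column in this sketch would span two vectors and the 8x8 accumulator tile would occupy 16 of the 32 SVE vector registers.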

"SVE is a vector extension for AArch64 execution mode for the A64
instruction set of the Armv8 architecture. Unlike other SIMD architectures,
SVE does not define the size of the vector registers, but constrains into
a range of possible values, from a minimum of 128 bits up to a maximum of
2048 in 128-bit wide units. Therefore, any CPU vendor can implement the
extension by choosing the vector register size that better suits the
workloads the CPU is targeting. Instructions are provided specifically
to query an implementation for its register size, to guarantee that
the applications can run on different implementations of the ISA without
the need to recompile the code." [1]

[1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning

Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
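
Because both kernels assume a fixed 256-bit VL, a caller would presumably want to check the hardware's vector length before selecting them. The sketch below shows one way to do that with the ACLE intrinsic `svcntb()`; the helper names and the `REQUIRED_VL_BITS` macro are hypothetical and not part of this PR, and the code falls back to reporting 0 when compiled without SVE support.

```c
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Vector length the kernels are written for, per the description above. */
#define REQUIRED_VL_BITS 256

/* Returns the SVE vector length in bits, or 0 when built without SVE. */
static int sve_vl_bits(void)
{
#if defined(__ARM_FEATURE_SVE)
    return (int)svcntb() * 8;  /* svcntb() gives the bytes per SVE vector */
#else
    return 0;
#endif
}

/* Number of 64-bit doubles that fit in one vector of the given length. */
static int doubles_per_vector(int vl_bits)
{
    return vl_bits / 64;
}

/* 1 if kernels hard-coded for REQUIRED_VL_BITS match this VL, else 0. */
static int fixed_vl_kernels_usable(int vl_bits)
{
    return vl_bits == REQUIRED_VL_BITS;
}
```

For example, `fixed_vl_kernels_usable(sve_vl_bits())` would be false on a 512-bit implementation, where `doubles_per_vector(512)` is 8 rather than the 4 that a 256-bit kernel expects.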

fgvanzee · 2020-04-29T17:07:49Z

@docularxu Sorry for the delayed response. Thank you for this contribution! I checked the new files and it appears that you found, read, and followed the section of the ConfigurationHowTo regarding adding new kernel sets. Thanks for making my review easy!

We don't have access to ARM SVE hardware, so we can't test out the code. Furthermore, we have limited experience on ARM, so please consider yourself nominated as the maintainer of these new kernels going forward. :) But feel free to ask us questions should the need arise.

jeffhammond · 2020-05-08T19:02:07Z

@fgvanzee https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator, but I completely understand why you would not want to mess with this.

fgvanzee · 2020-05-08T19:32:28Z

> but I completely understand why you would not want to mess with this.

Yeah, unfortunately I don't have time to take on a new project like this. But it's good to know such resources are available for others and/or for my own use in the future.

rvdg · 2020-05-08T19:36:58Z

Unfortunately, we have to prioritize work for which we are actually sponsored. ARM architectures do not qualify at the moment due to that restriction.

docularxu · 2020-05-15T08:13:04Z

I am using 'qemu-aarch64' [1] to verify these two kernels, and I will keep optimizing them once I get my hands on real hardware.

So far, the only publicly and commercially available SVE-enabled Arm hardware is Fujitsu's A64FX [2]. The A64FX has a 512-bit vector length, which differs from the 256-bit VL of the kernels I submitted.

[1] https://www.linaro.org/blog/sve-in-qemu-linux-user/
[2] https://github.com/fujitsu/A64FX

fgvanzee merged commit f032d5d into flame:master Apr 29, 2020

xrq-phys mentioned this pull request Jan 1, 2022

Enable user-customization of GEMM-like kernels and microkernels #583

Merged

fgvanzee merged 1 commit into flame:master from docularxu:sve-upstream-staging
